Optical Character Recognition in Images: Have your say
The buzz in every major SEO news source this past week has to be that a patent Google filed in June 2007, covering optical character recognition (OCR) in static images and video, has just become publicly available.
Hello Liberated Web Design
For once, something that makes an SEO/web designer’s life easier! Up until this point, the web design community had two options when tackling headings: opt for a font that is installed on all or most PCs (a very limited and very plain selection) and rely on stylesheets to spice it up a bit, or create the heading as an image in any given font and hope that the search engines read the alternative text provided in the image tag.
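To make those two options concrete, here is a minimal Python sketch of what a crawler without OCR could pick up from each kind of heading. The class and function names, the sample markup and the file name heading.png are all made up for illustration; this is not how any real search engine is known to work.

```python
from html.parser import HTMLParser


class HeadingTextExtractor(HTMLParser):
    """Collects the text a non-OCR crawler could read from <h1> headings.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.in_heading = False
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.in_heading = True
        elif tag == "img" and self.in_heading:
            # An image heading only contributes its alt text, if it has any.
            alt = dict(attrs).get("alt")
            if alt:
                self.found.append(alt)

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_heading = False

    def handle_data(self, data):
        # Plain text inside the heading is readable as-is.
        if self.in_heading and data.strip():
            self.found.append(data.strip())


def readable_heading_text(html):
    """Return the list of heading strings readable without OCR."""
    parser = HeadingTextExtractor()
    parser.feed(html)
    return parser.found


# Option 1: a common font styled with CSS. The text is in the markup.
text_heading = '<h1 style="font-family: Verdana, sans-serif">Hello Liberated Web Design</h1>'
# Option 2: the heading drawn as an image, relying on alt text.
image_heading = '<h1><img src="heading.png" alt="Hello Liberated Web Design"></h1>'
# Option 2 gone wrong: no alt text, so nothing is readable without OCR.
bare_image_heading = '<h1><img src="heading.png"></h1>'

for markup in (text_heading, image_heading, bare_image_heading):
    print(readable_heading_text(markup))
```

The last case is the one OCR would change: the words are only in the pixels, so a parser like this comes back empty-handed.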
With this new application, image headings can be read and indexed, as can videos, Flash presentations and even (and I dare to be bold) a whole web page consisting of one big image! Duncan Riley from TechCrunch added further scope on the development:
"If Google has found a way to index text in static images and video this is a great leap forward in the progression of search technology. This will make every book in the Google Books database really searchable, with the next step being YouTube, Flickr (or Picasa Web) and more. The search capabilities of the future just became seriously advanced."
Goodbye CAPTCHA
With extreme technical advances also come drawbacks. A Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) is a script that generates a random word in an image and uses a variety of methods to distort it, e.g. an angled line or crowded symbols, making it harder for bots to decipher. This method is widely used by sites such as MySpace and Blogger to reduce the amount of spam posted.
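For anyone who has not looked under the bonnet, the logic around the image is very small. The Python sketch below is only an assumption of how such a script might be laid out: the function names are hypothetical, and the part that draws and distorts the word in an image is left out because it needs an imaging library.

```python
import secrets
import string

# Characters the random word is built from (an arbitrary choice for this sketch).
ALPHABET = string.ascii_uppercase + string.digits


def new_challenge(length=6):
    """Return a random word that would then be drawn into a distorted image."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def check_response(expected, response):
    """Compare the visitor's typed answer with the stored word, ignoring case."""
    return expected.upper() == response.strip().upper()


word = new_challenge()
# A human reading the image types the word back and gets through.
print(check_response(word, word.lower()))
# A bot guessing blindly has to get every character right.
print(check_response("ABC123", "ABC124"))
```

Note that check_response has no idea who, or what, read the image. The distortion is the only thing standing between the word and a bot, which is why readable-by-OCR is bad news here.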
If and when this technology comes into use, it could leave the CAPTCHA with its days numbered. If it can be developed at Google Central, then what’s to say the same software can’t be developed elsewhere? Once the spamming community get their mitts on this hot potato, it will be across the community quicker than nausea at a Mika concert.
I have seen the damage this minority of the underworld can do on a forum with no security measures: using their shared scripts to automatically generate numerous fake users with their spammy web links for all to see. Fair to say, I lost the battle of keeping that forum rid of spammers. This could become more commonplace if forum/blog owners don't act quickly enough to protect their fort.
Even if Google has the patent giving them exclusive rights to this OCR method, are they going to nail every spammer, scammer, splogger, tinker, tailor, soldier, spy in cyberspace? It would be great if they could (with a long-awaited revolution where spammers are placed in stocks and pelted with rancid fruit), but let's live in the real world for now and assume that is not going to happen.
So what do you all think: OCR sweet or OCR sour?
Labels: captcha, google, ocr, optical character recognition


