Plusnet

I thought the computer was supposed to do the work: Part 1

September 17th, 2007 at 11:56 by Dean

The Internet was always destined to be a great leveller, but a number of small apps and widgets are now turning the general belief of “how things should work” on its head.

One such widget is reCAPTCHA. reCAPTCHA is basically an anti-spam solution for websites. But it's a clever one :-)

It uses OCR-scanned images from real books and gets you, the human user, to type in the word, so that over time hundreds, thousands, or millions of real people are actually training the OCR software to become better. Its tag line of “Stop SPAM, Read Books” sums it up. The following paragraph from their website says it even better:

“Over 60 million CAPTCHAs are solved every day by people around the world. reCAPTCHA channels this human effort into helping to digitize books from the Internet Archive. When you solve a reCAPTCHA, you help preserve literature by deciphering a word that was not readable by computers.”

The second project / application is the Rosetta Project. This project aims to document all known human languages.

Now imagine these two applications getting together to create a single “mashup” application. All the world's books could be digitised by an OCR solution that is nearly as good as a human (after all, it was trained by millions of them) and then translated into every language ever known to man.

How powerful would that be? Every book available in every known language.

That's the power of the Internet :-)

Regards

Dean


This entry was posted by Dean on Monday, September 17th, 2007 at 11:56 am in the category PlusNet News.


One comment on "I thought the computer was supposed to do the work: Part 1"

Tamlyn

Neat idea but...

Q: But if a computer can't read such a CAPTCHA, how does the system know the correct answer to the puzzle?
A: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words.

So they're not really tapping into the estimated 150,000 hours of human effort spent on CAPTCHAs each day, they're actually *adding* to it (albeit for a useful purpose)!
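The pairing described in that answer can be sketched in a few lines of Python. This is only an illustration, not reCAPTCHA's actual code: the class name, the word identifiers and the idea of waiting for a fixed number of matching answers before accepting a reading are all assumptions made for the example. The only part taken from the quoted answer is that an unreadable word is shown alongside a word whose answer is already known, and the user reads both.

```python
from collections import Counter, defaultdict

# Hypothetical threshold: how many matching answers we want before
# accepting a reading. Not taken from the reCAPTCHA documentation.
VOTES_NEEDED = 3


class WordPairChallenge:
    """Pairs a known control word with a word the OCR could not read."""

    def __init__(self, votes_needed=VOTES_NEEDED):
        self.votes_needed = votes_needed
        # unknown word id -> counts of each answer users have typed
        self.votes = defaultdict(Counter)

    def submit(self, control_answer, control_truth, unknown_id, unknown_answer):
        """Return True if the user passes, and record a vote for the unknown word."""
        if control_answer.strip().lower() != control_truth.strip().lower():
            # Failed the known word, so the other answer is not trusted either.
            return False
        self.votes[unknown_id][unknown_answer.strip().lower()] += 1
        return True

    def transcription(self, unknown_id):
        """Return the agreed reading, or None if too few answers match so far."""
        counts = self.votes.get(unknown_id)
        if not counts:
            return None
        answer, count = counts.most_common(1)[0]
        return answer if count >= self.votes_needed else None


if __name__ == "__main__":
    challenge = WordPairChallenge(votes_needed=2)
    challenge.submit("morning", "morning", "scan-17", "upon")
    challenge.submit("mornin", "morning", "scan-17", "spam")  # control word wrong
    challenge.submit("Morning", "morning", "scan-17", "Upon")
    print(challenge.transcription("scan-17"))
```

The system only ever checks the control word. The answer for the unreadable word is trusted because the same person got the known word right, which is why every user has to type two words rather than one.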

