I thought the computer was supposed to do the work: Part 1

The Internet was always destined to be a great leveller, but now there are a number of small apps and widgets that are turning the general belief of "how things should work" on its head.

One such widget is reCAPTCHA. reCAPTCHA is basically an anti-SPAM solution for websites, but it's a clever one :-) It uses actual OCR-scanned images from real books and gets you, the human user, to type in the word, so that over time hundreds, thousands, or millions of real people are effectively training the OCR software to become better. Its tag line, "Stop SPAM, Read Books", sums it up. The following paragraph from its website says it even better:

"Over 60 million CAPTCHAs are solved every day by people around the world. reCAPTCHA channels this human effort into helping to digitize books from the Internet Archive. When you solve a reCAPTCHA, you help preserve literature by deciphering a word that was not readable by computers."

The second project is the Rosetta Project, which aims to document all known human languages.

Now imagine these two applications getting together to create a single "mashup" application. You could picture a scenario where all the world's books are digitised by an OCR solution that is nearly as good as a human (after all, it was trained by millions of them) and then translated into every language ever known to man. How powerful would that be? Every book available in every known language. That's the power of the Internet :-)

Regards
Dean

1 Comment
Grafter
Neat idea, but...

Q: But if a computer can't read such a CAPTCHA, how does the system know the correct answer to the puzzle?
A: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words.

So they're not really tapping into the estimated 150,000 hours of human effort spent on CAPTCHAs each day; they're actually *adding* to it (albeit for a useful purpose)!
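The control-word scheme from the quoted FAQ can be sketched in a few lines of Python. This is a hypothetical illustration, not reCAPTCHA's actual code. The names `submit` and `transcription`, the in-memory stores, and the three-vote agreement threshold are all assumptions. The idea is that a user passes only if they type the known word correctly, and only then is their reading of the unknown word counted as a "vote".

```python
from collections import Counter, defaultdict

# Hypothetical in-memory stores; a real service would persist these.
KNOWN_ANSWERS = {"ctrl-1": "morning"}      # control words whose text is already verified
unknown_votes = defaultdict(Counter)       # unknown word id -> counts of human readings
VOTES_NEEDED = 3                           # assumed number of agreeing readings required


def submit(control_id, control_answer, unknown_id, unknown_answer):
    """Pass the user if the control word is right; only then count their reading of the unknown word."""
    expected = KNOWN_ANSWERS[control_id]
    if control_answer.strip().lower() != expected:
        return False  # failed the control word, so the unknown-word reading is discarded
    unknown_votes[unknown_id][unknown_answer.strip().lower()] += 1
    return True


def transcription(unknown_id):
    """Return the agreed reading once enough users typed the same text, else None."""
    votes = unknown_votes.get(unknown_id)
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count >= VOTES_NEEDED else None
```

Note that every call to `submit` still costs a human two words of effort, which is the commenter's point: the useful work is extracted from the CAPTCHA rather than the CAPTCHA being removed.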