Things that ... make you go hmm: technology, music, video, art, news, reviews and muse on the web

May 24, 2007

reCAPTCHA helps improve OCR for internet library

Books and Writing, blogs and podcasting, spam, linkdump — by TDavid @ 8:00 am PST

You’ve seen CAPTCHAs before: they’re the challenges where you type the characters shown in a picture to prove you’re human.

The two major issues I’ve experienced with CAPTCHA implementations are:

1. garbled characters
2. image-only CAPTCHAs aren’t accessible to sight-impaired readers

The latter can be solved by offering a clear text-to-speech voice option. Unfortunately, most CAPTCHAs (including my own Form Sentinel) don’t offer voice functionality. Adding voice capability to Form Sentinel is on my someday to-do list.

There have been some creative CAPTCHA variations, like the hotornot CAPTCHA, and now there’s a new one from Carnegie Mellon University called reCAPTCHA, which uses the CAPTCHA to fix OCR (optical character recognition) problems in digitizing books:

reCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. More specifically, each word that cannot be read correctly by OCR is placed on an image and used as a CAPTCHA. This is possible because most OCR programs alert you when a word cannot be read correctly.
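The quote doesn’t say how reCAPTCHA can trust a stranger’s reading of a word the computer couldn’t read. As I understand the published design, each challenge pairs a control word whose answer is already known with an unknown word from the OCR failures; the answer for the unknown word only counts as a vote if the control word was typed correctly, and a reading is accepted once enough independent answers agree. Here’s a minimal Python sketch of that idea. The class, its methods and the three-vote threshold are hypothetical illustrations, not reCAPTCHA’s actual code.

```python
from collections import Counter, defaultdict


class TwoWordChallenge:
    """Hypothetical sketch of a control-word + unknown-word CAPTCHA scheme."""

    def __init__(self, votes_needed=3):
        # Number of agreeing answers before an unknown word counts as "read".
        self.votes_needed = votes_needed
        # word_id -> Counter of normalized human answers
        self.votes = defaultdict(Counter)
        # word_id -> accepted transcription
        self.accepted = {}

    @staticmethod
    def _normalize(text):
        return text.strip().lower()

    def submit(self, control_answer, control_expected, unknown_id, unknown_answer):
        """Return True if the user passes the CAPTCHA.

        Only the control word decides pass/fail; the unknown word's answer is
        recorded as a vote only when the control word was typed correctly.
        """
        if self._normalize(control_answer) != self._normalize(control_expected):
            return False
        guess = self._normalize(unknown_answer)
        self.votes[unknown_id][guess] += 1
        if self.votes[unknown_id][guess] >= self.votes_needed:
            # Keep the first reading that reached the threshold.
            self.accepted.setdefault(unknown_id, guess)
        return True


if __name__ == "__main__":
    c = TwoWordChallenge(votes_needed=3)
    # Three users read the smudged word the same way; one misreads it.
    for answer in ["morning", "morning", "mourning", "morning"]:
        c.submit("upon", "upon", "page12-word7", answer)
    # A bot that fails the control word contributes no vote.
    c.submit("xyz", "upon", "page12-word7", "spam")
    print(c.accepted)  # {'page12-word7': 'morning'}
```

The control word is what keeps spam bots from poisoning the book text: a bot that can’t read the known word never gets a vote on the unknown one.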

This isn’t planned as a permanent change (if you don’t like it, please speak up below), but I’ve re-implemented a CAPTCHA on the comments using reCAPTCHA. It’s already up and working in the comments below. Fellow bloggers on WordPress can grab the reCAPTCHA WordPress plugin. Note: you’ll need to register at reCAPTCHA to get a public and a private key.
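For anyone wiring this up outside WordPress: the public key goes into the page that renders the challenge, while the private key stays on your server, which forwards the visitor’s answer to reCAPTCHA for checking. The sketch below assumes Google’s current `siteverify` endpoint (where the private key is called the “secret”), not the 2007-era API this post was written against. The function names are hypothetical, and the code only builds the request body and parses the JSON reply, so no network call is made.

```python
import json
from urllib.parse import urlencode

# Google's current verification endpoint (assumption: not the 2007 API).
VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_body(private_key, user_response, remote_ip=None):
    """Form-encode the POST body for a server-side reCAPTCHA check."""
    params = {"secret": private_key, "response": user_response}
    if remote_ip:
        params["remoteip"] = remote_ip  # optional in the siteverify API
    return urlencode(params)


def is_human(reply_text):
    """Parse the JSON reply; only an explicit "success": true passes."""
    try:
        reply = json.loads(reply_text)
    except ValueError:
        return False
    return isinstance(reply, dict) and reply.get("success") is True


if __name__ == "__main__":
    body = build_verify_body("my-private-key", "token-from-form", "203.0.113.5")
    print(body)
    # secret=my-private-key&response=token-from-form&remoteip=203.0.113.5
    print(is_human('{"success": false, "error-codes": ["invalid-input-response"]}'))
    # False
```

In a real comment handler you would POST `body` to `VERIFY_URL` (for example with `urllib.request.urlopen(VERIFY_URL, data=body.encode())`) and pass the reply text to `is_human`. Treating anything other than an explicit success as a failure keeps a garbled reply from letting spam through.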

We tried a CAPTCHA in the comments here once before, and I took it away after receiving complaints from sight-impaired readers. As someone whose sight is getting progressively worse as the years pile on, I sympathize with these readers. If any sight-impaired readers are disappointed that I’ve re-added a CAPTCHA, you’ll be delighted to know that reCAPTCHA offers an audio challenge, woohoo!

I want to help this book-digitizing project, and I’d also like sight-impaired readers to be able to leave comments. Readers, please let me know what you think of this project and of re-enabling the CAPTCHA in the comments area.

Will this cause you to leave fewer comments? Do you mind typing in the two extra words? Hmm doesn’t require registration to leave comments (to me that’s worse than a CAPTCHA), and if you allow it, it will remember your details in a cookie, so all you’d need to type is your comment and the two extra words. I’m particularly interested in hearing from those who feel this will make them leave fewer comments than they would without a CAPTCHA. The last time I tried a CAPTCHA here, it had no negative impact on the number of comments being left and significantly reduced the amount of comment spam. I removed it because of the valid accessibility complaints.

Personally, I don’t mind using a CAPTCHA as long as I can read it. I despise CAPTCHAs I can’t read, and I won’t keep refreshing the page to get a better-looking one; I’ll just pass on leaving a comment.

I like using my Sony PS3 to help find cures, and this seems like a good use of the comment feature: it helps accurately digitize more books and reduces the growing onslaught of automated comment spam. Sight-impaired people benefit too, from having more digitized books that their text-to-speech readers can read. Agree? Disagree?


11 Comments

  1. Type in what two words? huh? :p You’re not making me type that crap in. haha…. damn you! I had to hit back and type the stupid words in! Why don’t you just provide the definitions while you’re at it. Stupid words. :p

    Comment by darkmoon — May 24, 2007 @ 10:23 am PST

  2. Did you try the audio challenge? I just did. It reads a series of numbers to you with background noise. Not very clear to follow :( I hope readers choosing that option can make sense of it.

    Comment by TDavid — May 24, 2007 @ 10:27 am PST

  3. I thought I would do my fair share and correct a scanning problem.

    Comment by weirdharold — May 25, 2007 @ 1:11 pm PST

  4. Based on a suggestion from another reader, I’ve made the reCAPTCHA show only for those who have fewer than five approved comments.

    Comment by TDavid — May 28, 2007 @ 12:47 pm PST

  5. OK, the reCAPTCHA should be working fine now: if you have more than 4 posts, you won’t see it. Thanks for the heads-up that it wasn’t working right for these folks.

    Comment by TDavid — May 29, 2007 @ 8:35 am PST

  6. Hmmm…. TDavid, I am certain that I have more than 4 approved posts in the system and I still see the CAPTCHA. Personally I don’t mind helping a worthwhile project by using it, and wish it was used on VTOR. But thought I would mention that I still see it.

    Comment by weirdharold — May 29, 2007 @ 8:52 am PST

  7. Then again, once I entered that comment… I no longer see the CAPTCHA… Go figure! But when I hit Submit Comment, I received the message that the CAPTCHA was incorrect. I had to refresh the page to get a CAPTCHA back up to enter this message.

    Comment by weirdharold — May 29, 2007 @ 8:55 am PST

  8. I just checked the database, Harold, and it shows you’ve made 4 comments. Make one more and let’s see what happens.

    Comment by TDavid — May 29, 2007 @ 9:01 am PST

  9. testing to see if I get the CAPTCHA after this post.

    Comment by weirdharold — May 29, 2007 @ 9:04 am PST

  10. My bad, I forgot I made most of my comments under my name rather than weirdharold.

    Comment by weirdharold — May 29, 2007 @ 9:06 am PST

  11. […] commenting area as a means of protecting against the onslaught of comment spam and helping the good reCAPTCHA cause. At the time this blog was receiving over 1,000 comment spams a day, no doubt increased because we […]

    Pingback by Does adding CAPTCHA reduce the number of comments? » Make You Go Hmm — November 14, 2007 @ 6:52 am PST
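The exchange in the comments above boils down to a simple rule: show the CAPTCHA only to commenters with fewer than five approved comments. Here’s a minimal Python sketch, assuming the count is keyed on the commenter’s name (as the exchange suggests) and that comments are stored as (name, approved) pairs; the function names, data layout and sample history are hypothetical, not the plugin’s actual code. It also shows why Harold kept seeing the CAPTCHA: comments left under a different name don’t add to the same count.

```python
from collections import Counter

APPROVED_THRESHOLD = 5  # CAPTCHA shown to anyone with fewer approved comments


def approved_counts(comments):
    """Count approved comments per commenter name.

    `comments` is an iterable of (name, approved) pairs; this layout is an
    assumption for illustration only.
    """
    return Counter(name for name, approved in comments if approved)


def needs_captcha(name, comments, threshold=APPROVED_THRESHOLD):
    """True if `name` has fewer than `threshold` approved comments."""
    return approved_counts(comments)[name] < threshold


if __name__ == "__main__":
    # Hypothetical history: 4 comments under one name, 6 under another.
    history = [("weirdharold", True)] * 4 + [("Harold", True)] * 6
    print(needs_captcha("weirdharold", history))  # True: only 4 under this name
    print(needs_captcha("Harold", history))       # False: 6 under this name
```

Keying the count on something more stable than the display name, such as the commenter’s email address, would avoid the split, but that’s a design choice the comments don’t confirm.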


TrackBack URI: http://www.makeyougohmm.com/20070524/4523/trackback/



Copyright 2003-2008 KMR Enterprises All Rights Reserved