Craigslist says no thanks to Oodle scraping
Oodle, founded in the spring of 2004, is one of those mashup services that send out bots to scrape classified ads data from a variety of sources, including Craigslist. The benefit Oodle feels it offers is giving consumers a wider variety of choices in a centralized location, similar to what Froogle does to facilitate comparison shopping. Oodle makes money by placing ads around the scraped content.
If Oodle got permission from the sources it scrapes, then I think they'd be in good shape here, but it seems like they might be skipping that all-important step in the process.
Case in point: Craig Newmark from Craigslist either wasn’t asked or didn’t like the deal (?) and has asked Oodle to stop scraping. Oodle has complied, but from this reader’s perspective it seems like they did so a bit begrudgingly, playing the whole “what’s best for the customers?” card:
I think it’s important to keep in mind what’s best for consumers. And I think being open is good for consumers.
It’s one thing if Oodle was a truly open source project with no ads and no money making, or somebody’s personal project that had no commercial activity, but it is quite another when they are running this mashup as a business, complete with rolling out the stench of greasy slick buzzwords. Craig said that the Web 2.0 speak thing was just a joke, but the timing could hardly be any worse.
Jason Calacanis vents on the situation in the comments area of John Battelle’s blog:
I for one am sick of people saying something is Meta or Web 2.0 when what they really means is it’s based on stealing … Fair use is one thing, wholesale scraping/syndication is another. Oodle, Indeed, etc. should a) get permission and b) consider paying Craig a licensing fee for his content.
I’m curious whether Oodle ever asked or just assumed the data was there for the taking. I’ve seen the whole “but Google does something like this too” argument, but Google is a search engine for many things, not primarily and very specifically a classified ads search script. And anybody can block Google just by adding a rule to their robots.txt file. Was the Oodle spider obeying these rules?
Wonder if it is now?
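For anyone wondering what "obeying these rules" looks like from the spider's side, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt contents, the bot name ExampleClassifiedsBot, and the example.org URLs are all made up for illustration; this is not Craigslist's actual robots.txt or Oodle's actual crawler, just the general idea of checking before fetching.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt (an assumption for illustration only):
# it shuts out one named crawler entirely and allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleClassifiedsBot
Disallow: /

User-agent: *
Disallow:
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; a real crawler would instead call
    # set_url() and read() to download the site's live robots.txt.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.org/listings/123"
    print("blocked bot allowed?", may_fetch("ExampleClassifiedsBot", page))
    print("other bot allowed?", may_fetch("SomeOtherBot", page))
```

A well-behaved spider runs a check like this before every request and skips anything disallowed. Nothing technically forces a bot to do so, which is exactly why the question matters.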
Craig from Oodle (odd coincidence that they share the same first name, I guess) adds the comment:
Moreover, I’ve heard Craig Newmark describe Craiglist as a public commons. I take that mean that everyone is invited.
Everyone is invited, sure, but that doesn’t mean you can clean out the beer and pizza in the fridge and say, “hey, we got it over at Craig’s place, go check them out.”
[…] I do realize these mashup services need to make money somehow, but this real world example is just more of the dark side of mashups that build their entire model on third parties for data. They aren’t a search engine, so please don’t anybody make that comparison. Search engines send out spiders to sources directly. They aren’t ping receptors either like Technorati or Feedster or Weblogs. Yeah, gada.be provides a service by sending traffic to the source, but they are more parasitic in nature than search engines and ping receptors who get their data directly from the source. […]
Pingback by Make You Go Hmm: » gada.be spammed by blogspot splogs — October 16, 2005 @ 12:37 pm PST
Craigslist is a devil dressed in an angel suit.
Comment by pat — February 9, 2006 @ 9:39 pm PST
Given that my site performs similar activities to those of Oodle, albeit on a much reduced scale, I have researched the topic of content scraping and copyright infringement and written this article:
lancebot.com/disclaimer.jsp
It is odd that CraigsList complains about the number of its pages in Oodle while Google has over 9 million pages indexed from their site.
Comment by Carlos — July 9, 2007 @ 6:33 am PST
I dunno, Carlos, this doesn’t sound accurate to me:
The copyright for any website is on the collection and presentation of information. You aren’t getting permission from the copyright holder of the site you’re scraping, you’re using their bandwidth to steal information from them and then trying to say they don’t own the copyright to the job listings, so what you’re doing is allowed within copyright law. That’s like saying, the backdoor is open so you went in and stole the cookies off the table. The cookies were made by Nabisco, after all, they weren’t created by the homeowner. Now you can resell those cookies and it’s no big deal. Hmm, doesn’t sound right to me, but I’m no lawyer.
Comment by TDavid — July 9, 2007 @ 6:58 am PST
“The copyright for any website is on the collection and presentation of information.”
Yes, collections are copyrightable. But who is creating the collection of ads? Does CraigsList have an editorial board, a group of paid staff selecting ads? No, the users are interacting with a computer system and leaving their advertisements, which are automatically published by a machine. Is the computer itself doing any editorial work at all? None are filtered, right? To what extent can the work of a computer system in collecting these ads be considered an original work of authorship? If it cannot, and I highly doubt it can, copyright law does not apply to it.
The cost of bandwidth is a non-issue here. I get 100 GB/month for less than five dollars. I think that Oodle got scared away, and otherwise we could have seen an interesting court decision on this issue. The real problem is that the law was not devised with the Internet in mind and it lags way behind by now. There may be moral issues, and I can agree with you that Oodle’s profiting on the back of CraigsList may be unethical, but it may be legal after all. Businessmen are scoundrels.
By the way, Lancebot is my personal creature and I am not turning any profit on it.
I hope that what you said here still applies even if we hold opposite opinions on this particular item.
“It’s one thing if Oodle was a truly open source project with no ads and no money making, or somebody’s personal project that had no commercial activity”
Thanks for the fruitful discussion.
Comment by Carlos — July 9, 2007 @ 12:45 pm PST
It’s one thing, yes, but I never said it was a LEGAL thing, so don’t get too excited about your position. Just because something doesn’t have ads and/or generates no revenue and/or is labeled a “personal project” doesn’t mean it’s not copyright infringing.
And what you really meant to say is you haven’t added ads yet, Carlos. Check back here in 12-24 months and if you are still doing this (doubtful) tell me you are still completely moneymaking free (no paid ads, placement or otherwise).
If you STEAL something somebody else makes money on, Carlos, and give it away, you are essentially taking money from somebody else’s pockets, which could have financial consequences. If you don’t want your personal project to impact your personal finances, I’d strongly suggest talking to a skilled copyright attorney.
Bottom line, this is between you and Craigslist. If you think what you’re doing is so noble then contact Craig Newmark and get permission to scrape his site. That’s his request and you seem to think you don’t need it. If you are afraid of asking for that permission then that pretty much tells the rest of us how legitimate your operation is.
Comment by TDavid — July 9, 2007 @ 1:07 pm PST
I am not scraping content from CraigsList; I just took the Oodle case as an example because of the similarities. The fact that I am not making money off it is important, given that it counts towards fair use. I may end up profiting from the traffic indirectly by selling software for freelancers (I am a programmer), but that would not compete with the scraped sites and it still complies with fair use.
In an unrelated item that you may have missed, I did ask Guru.com (a major freelance marketplace) for permission and they declined. I accepted it and will not serve results from their site.
“If you STEAL something somebody else makes money on, Carlos, and give it away, you are essentially taking money from somebody else’s pockets, which could have financial consequences”.
Yes, this would not be fair use because I would be impacting the market of the copyright holder. But note that freelance marketplaces have a different business model from that of CraigsList: they make money off commissions from the projects that I would be publicizing to my search engine users. It is in their best interest that more and more people view those projects, and I am not detracting any market value from them.
“Bottom line, this is between you and Craigslist. If you think what you’re doing is so noble then contact Craig Newmark and get permission to scrape his site. That’s his request and you seem to think you don’t need it.”
The whole point of fair use is that you do not have to ask for permission. CraigsList could still have gone after SimplyHired.com, which scrapes their job listings, and they haven’t. Google does exactly the same thing, without permission, and CraigsList has remained silent because being deindexed would surely hurt their earnings.
Comment by Carlos — July 9, 2007 @ 1:31 pm PST
Carlos, there can’t be a more wrong statement than this: “The fact that I am not making money off it is important, given that it counts towards fair use.”
Whether or not you make a penny has no impact on Fair Use. You’re correct that you don’t have to ask for permission for fair use when it abides by certain criteria. Fair use was me quoting you above because I was doing it as a point of criticism and review. If I scrape your site and use your entire page verbatim and then link to it on this page and criticize it, that’s not fair use.
Again, and this is my last comment to you on the matter: contact a copyright attorney. You need one. Badly.
Comment by TDavid — July 9, 2007 @ 1:37 pm PST