Things that ... make you go hmm: technology, music, video, art, news, reviews and muse on the web

November 30, 2007

IRSeeK protects privacy and respects others in IRC how?

chat, search engines — by TDavid @ 8:40 am PST

In this morning’s reading I came across a new search engine called IRSeeK.com, which has built a searchable index by spying on (listening to) IRC chats, most likely employing bots to do the dirty work.

IRSeeK search engine says they protect privacy ... how?

In the IRSeeK About Us tab I became curious about this passage (emphasis mine):

By constantly archiving thousands of active, highly-focused, public chat-rooms in a wide variety of topics (e.g. Linux, soccer, Christianity, poker, business and others) then indexing, processing and publishing the content on the web using advanced Web 2.0 technologies while maintaining the privacy of the users, we are creating a knowledge base different from any other.

A couple of issues here. One, they are “creating” a knowledge base? No, they are becoming the peeping Toms of IRC channels, without the permission of the people chatting. Two, how are they maintaining the privacy of the users when their entire conversations are being logged? Try searching for one or more handles you’ve used in IRC. Are you finding messages you’ve posted in various IRC channels? I even found messages I hadn’t posted in IRC at all, but on Twitter, that were picked up by an IRC bot somewhere.

Strangely enough, the IRC server we’ve been running for over five years didn’t yield any hits. It appears our server isn’t on the IRSeeK server list. In the spirit of being open it would be nice to see a complete server export list of the IRC servers IRSeeK is mining.

Good idea, bad idea?
Mining public IRC channels for information and making it searchable isn’t an altogether bad idea. There is a goldmine of information being shared in IRC, but the execution here by IRSeeK is a bit questionable. Do their bots identify themselves as spies? Let’s put this in the context of a respectful search engine spider. Good bots, like Googlebot, identify themselves and give you the ability to refuse them access. Is IRSeeK following these same principles?
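
For comparison, here is a rough sketch of how a polite web crawler decides whether it is welcome, using the robots.txt parser in Python's standard library. IRC has no direct equivalent of robots.txt, so this only illustrates the web-side convention. The bot name `ExampleLogBot`, the sample rules, and the `is_allowed` helper are all made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish.
# "ExampleLogBot" is an invented crawler name, used only for illustration.
ROBOTS_TXT = """\
User-agent: ExampleLogBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The site owner has refused ExampleLogBot everywhere.
    print(is_allowed("ExampleLogBot", "http://example.com/logs/"))
    # Other crawlers may read public pages but not /private/.
    print(is_allowed("Googlebot", "http://example.com/logs/"))
    print(is_allowed("Googlebot", "http://example.com/private/chat.html"))
```

The whole scheme depends on the crawler announcing an honest name and checking the rules before fetching anything. A bot that hides its identity gives the site owner nothing to refuse.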

What IRSeeK is doing might be entirely legal, so don’t misconstrue my comments. Just because you can do something on the web doesn’t mean you should.

My problem with IRSeeK is one of manners. Taking without permission on the web is bad netiquette. It’s like screenscraping or hotlinking without permission. There is a lot of great information on IRC, and that’s what there is to love about it, but there are also some semi-private conversations that people in niche groups have, yes, even out in the open “public” channels.

Speaking from experience as a channel op and IRC server administrator, I wouldn’t feel comfortable logging every word in the public channels and making it searchable without notifying the people in the channels, the second they joined, that this was happening. Why not? I can think of a couple of cases where we’ve posted the public IRC channel logs of the live radio show we do on Fridays, and people have come back to me and commented about something they said in the channel being published. People assume, either rightly or wrongly, that what goes on in IRC stays in IRC unless it’s made very clear otherwise. IRC etiquette.
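
To make the on-join notice concrete, here is a minimal sketch of the logic a logging bot could use. It only builds the reply text from a raw server line and is not a full IRC client. The notice wording, the bot nick, and the `join_notice` helper are assumptions for illustration, not anything IRSeeK or our own server actually runs.

```python
from typing import Optional

# Hypothetical wording; a real channel would choose its own text.
LOG_NOTICE = "This channel is publicly logged and the logs are searchable on the web."


def join_notice(raw_line: str, bot_nick: str) -> Optional[str]:
    """Return a NOTICE command for the user who just joined, or None.

    Expects a raw server line such as ':alice!a@host JOIN #chat'.
    """
    parts = raw_line.strip().split()
    # Only react to lines of the form ':<prefix> JOIN <channel>'.
    if len(parts) < 3 or not parts[0].startswith(":") or parts[1].upper() != "JOIN":
        return None
    # The nick is the part of the prefix before the first '!'.
    nick = parts[0][1:].split("!", 1)[0]
    if nick.lower() == bot_nick.lower():
        return None  # no need to notify the bot about itself
    # Some servers send the channel with a leading ':'.
    channel = parts[2].lstrip(":")
    return f"NOTICE {nick} :{channel}: {LOG_NOTICE}"


if __name__ == "__main__":
    print(join_notice(":alice!a@host JOIN #chat", "logbot"))
```

A bot wired up this way would send the returned line back to the server each time someone joins, so nobody in the channel could say they weren't told.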

I would be interested in hearing a reply from somebody on the IRSeeK team as to how they are addressing these sensitive issues, as well as how they are currently disclosing their bot activity. If they aren’t, then do they plan to do so in the future? If they don’t feel compelled to disclose their intentions, then expect IRC server administrators to be up in arms, and this is certainly not an organization or site I’d recommend to anybody. Their “idea” of making select IRC servers’ chat logs searchable isn’t original; it’s just something few have had the stones to do because of the obvious privacy implications.


5 Comments

  1. I think this is wrong, wrong, wrong to be doing.

    None of these people are aware that they are being ‘recorded’. There is no onjoin message letting them know this. No disclaimers etc. Granted there is rarely anonymity on the www, but this is blatant.

    Comment by ^Lestat — November 30, 2007 @ 9:18 am PST

  2. I guess they call it IRSeeK and yet I have it as IRCSeek in the post everywhere. I’m going to leave it that way because it matches the domain.

    Comment by TDavid — November 30, 2007 @ 9:28 am PST

  3. I fixed the broken link and the name of the service in the title.

    Comment by TDavid — November 30, 2007 @ 12:37 pm PST

  4. IRSeek is ‘temporarily’ taken down by its own operators. Quote via the IRSeek blog in a post entitled “Open letter to IRC Operators”:

    Due to the concerns of our users, we’ve decided that for the time being, until we figure out a satisfactory solution(s) to the user’s concerns, we have disabled the site.

    I still don’t like the idea of ‘recording’ and indexing these chats for the web. Especially if WE don’t know about it. Yes it’s public, but I still don’t like it. Especially if I’m not aware if/when it’s being done.

    A few items listed in the letter:

    1. Do our best to anonymize the nicknames.
    2. Allow channel operators to opt-out (by verifying their identity on IRC).

    How is it channel ops can’t opt out? Is this logging or censoring?!

    Don’t like it one bit.

    Comment by ^Lestat — December 3, 2007 @ 11:32 am PST

  5. Lestat - it appears from their message that when the service comes back they won’t be using any more stealth bots and will only be indexing and archiving channels with permission. It’s going to be a bit hard to trust these guys now that they seemingly had to be guilted into doing the right thing.

    More info about IRSeeK’s data gathering habits, courtesy of this Freenode blog post:

    However, currently there is no way to opt-in, or even to opt out. The bots aren’t easily identifiable and you’re not aware that they are present in your channel.

    Freenode took action to temporarily disable all Tor connections, which was how IRSeeK was connecting and listening to crosstalk: “The logging bots primarily connect through tor, seem to have no distinguishing characteristics that we can identify, and so far the company has not been willing to remove them voluntarily.”

    Eventually they did suspend the service, but the sequence of events and how the original data was collected remains disturbing.

    Comment by TDavid — December 3, 2007 @ 4:44 pm PST




Copyright 2003-2008 KMR Enterprises All Rights Reserved