Things that ... make you go hmm
technology music video art news reviews and muse on the web

March 20, 2006

RSS Feed cleanup part deux, 250 feeds target

blogs and podcasting, How To — by TDavid @ 1:26 pm PST

In September 2005 I wrote that Sunday seemed like a good day for an RSS cleanup and listed some of my criteria for removing feeds.

Yesterday, I spent a good part of the day combing through my list of nearly 400 RSS feeds, which had not been fully cleaned since November, so it was well past time for me to scare up the axe from the garage.

Program used
I downloaded the server-side program reblog, installed it, and followed along with Matt Haughey’s excellent Lifehacker how-to post on reblog. Reblog is a lot like Bloglines, but adds some AJAXy goodness so you can use your keyboard to pound through feeds and archive or publish items.

The published items go into a separate RSS feed and web view so you can essentially filter hundreds of RSS feeds into one feed that interests you and then pick from that feed the items you want to blog about. There are plugins available to post directly to your Wordpress or Movable Type blog from your reblog feed. Reblog also publishes an OPML file of your feeds which is handy for exporting to other feed readers.
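
Since an OPML file is just XML, pulling the feed URLs back out of an export takes only a few lines. This is a minimal sketch using Python's standard library, not anything reblog ships; the sample document and its example.com URLs are made up for illustration, and it assumes the common convention of `outline` elements carrying an `xmlUrl` attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical OPML export; the feed titles and URLs are placeholders.
SAMPLE_OPML = """<?xml version="1.0"?>
<opml version="1.1">
  <head><title>Example subscriptions</title></head>
  <body>
    <outline text="Example Blog" type="rss" xmlUrl="http://example.com/feed.xml"/>
    <outline text="News">
      <outline text="Example News" type="rss" xmlUrl="http://example.org/rss"/>
    </outline>
  </body>
</opml>
"""

def feed_urls(opml_text):
    """Return every feed URL in the OPML text, in document order."""
    root = ET.fromstring(opml_text)
    # Folders are outline elements without xmlUrl, so skip those.
    return [o.get("xmlUrl") for o in root.iter("outline") if o.get("xmlUrl")]

for url in feed_urls(SAMPLE_OPML):
    print(url)
```

Walking every `outline` element rather than only the top level means feeds nested inside folders are picked up too.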

I didn’t like the idea of bots picking up my reblogged web feed so I added a block for spiders to robots.txt to keep them out of there.
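
A robots.txt block like that can be sanity-checked before relying on it. The sketch below is an assumption-heavy illustration: the `/reblog/` path, the bot name and the example.com URLs are hypothetical, not my actual setup. It uses Python's standard `urllib.robotparser` to confirm that well-behaved spiders would be kept out of the reblog directory while the rest of the site stays crawlable.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep all spiders out of an assumed /reblog/ directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /reblog/
"""

def is_allowed(robots_txt, user_agent, url):
    """Report whether robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

print(is_allowed(ROBOTS_TXT, "ExampleBot", "http://example.com/reblog/out/rss.php"))
print(is_allowed(ROBOTS_TXT, "ExampleBot", "http://example.com/index.html"))
```

Keep in mind that robots.txt is only a request; spiders that ignore it will still fetch the feed.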

The downside I see with this program is that you could look like a splogger, since it republishes the content you choose to “publish” in a public location and creates an RSS feed for that content with your name on it. The upside is that curious readers can see the posts that interested you but that you chose not to blog about for whatever reason. It’s sort of like following along with a linkblog, but one that actually contains the blog posts you saw and marked as interesting enough to ‘publish’.

For example, in going through the hundreds of RSS feeds I published 233 items. Out of those 233 posts I only blogged 10 of them (so far). So it’s filtering from a list of over 3,000 posts to 233, then down to 10 which ultimately became blog posts.

Once an hour (configurable, of course) a cron job updates the feeds and brings in new posts to review. The thing I don’t like about this is that most feeds don’t update once an hour. A suggested future feature for some aggregator out there: polling frequency based on post frequency. What I mean is: analyze how many posts a feed has had in, say, the last week and derive a polling interval from that, with a minimum of, say, one check a day. This way, if a feed that doesn’t usually have a lot of updates starts having many in one day, the aggregator can alter how often it checks for new posts.
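
Here is a rough sketch of how that frequency-based polling idea might look. It is not a reblog feature; the function name, the one-week window and the one-hour and one-day limits are my own assumptions for illustration. The interval is simply the average gap between posts in the window, clamped so a feed is never checked more often than the hourly cron run and never less than once a day.

```python
from datetime import datetime, timedelta

MIN_INTERVAL = timedelta(hours=1)  # assumed: the cron job runs hourly
MAX_INTERVAL = timedelta(days=1)   # assumed: check every feed at least daily

def poll_interval(post_times, now, window=timedelta(days=7)):
    """Suggest how long to wait between checks of one feed."""
    # Only posts inside the look-back window count toward the frequency.
    recent = [t for t in post_times if now - window <= t <= now]
    if not recent:
        return MAX_INTERVAL
    average_gap = window / len(recent)
    # Clamp the average gap between the hourly and daily limits.
    return max(MIN_INTERVAL, min(MAX_INTERVAL, average_gap))

now = datetime(2006, 3, 20, 12, 0)
busy_feed = [now - timedelta(hours=2 * i) for i in range(1, 85)]  # a post every 2 hours
print(poll_interval(busy_feed, now))
print(poll_interval([], now))
```

Because the interval is recomputed from recent posts each time, a quiet feed that suddenly gets busy is checked more often on the next pass.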

The picture below displays what happens in reblog when new posts are available from feeds. For each new post you can either archive and publish it or just archive it. With the AJAX and keyboard shortcuts the process moves pretty swiftly. I timed myself pounding through a dozen feeds and 80 or so posts in less than five minutes. The ones that are marked for publish can then be read more closely.

Criteria used to weed out feeds
This time I was even more picky. Had to be, really, as the list is getting a little too long to manage. Who can comb through 3,000+ posts a day, even just the headlines, and get any other work done? I think 250 is a good target number of feeds for me. After the paring I’m down to about 160 (here’s my updated 3/20/2006 OPML file), so I have room to add more feeds. And there are a few feeds I want to add, so if anybody reading doesn’t see their site there, check again in a week or two. You can always tell me why I should be subscribed to your blog in the comments/trackback area too. I’ll look seriously at adding any blog that isn’t spam/splog, contains a good mix of unique content (or a unique view) and is updated at least once per week on average. Preferably 2-3 updates a day.

With my target of 250 feeds, if the average feed has 10 posts, that would mean I’d be reviewing a max of 2,500 posts at one time. Since most of the feeds I read don’t have 10 updates a day, that puts the number of total posts I’d skim on a daily basis under 2,000. This doesn’t count the feeds I’m reading that I’m not subscribed to, so you can see the numbers can get pretty large. The bottom line is I don’t think it’s realistic for anybody who isn’t a full-time blogger to be able to keep up with much more than 250 feeds. Scoble has pared his list back to around 800 but I think before too long he’ll be back down around 500, and he’s a pretty voracious feed consumer.
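
The worst-case numbers are easy to check. This is a trivial sketch; the function name is made up, and 10 posts per feed is the assumed average from the paragraph above.

```python
def max_backlog(feed_count, posts_per_feed=10):
    """Worst-case number of posts waiting if every feed is full."""
    return feed_count * posts_per_feed

target = max_backlog(250)   # the 250-feed target
current = max_backlog(160)  # the roughly 160 feeds left after the cleanup
print(target, current)
```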

Here are the criteria I used for unsubscribing:

- any feed without a new post in the last 60 days. I found quite a few feeds which hadn’t been updated in 2006 yet.
- any feed where the set of most recent posts had nothing that interested me. I marked all interesting items with “publish” in the reblog interface. If not even one of the most recent posts was interesting, then I unsubscribed. Chances are good that if this blog/website updates with something good again I’ll be back. If I were a blogger I’d make sure each feed contained at least 10 items, but don’t go crazy with some number like more than, say, 25 items.
- clickthru does not lead directly to the article. Those inserting full page ads in between permalinks deserve to be sent out to the woodshed and whipped. Send your readers directly to the content. Mainstream Media (MSM) was the most guilty in this department.
- popups used. I intentionally disabled my popup blocker to see which sites were using popups. These were dead in 2000, but some continue to assault readers with them. Goodbye.
- feeds that have moved. I resubscribed if the other criteria above were met.
- duplicate information. Seems like some of the feeds I subscribed to had crossover information like the CNET feeds for example. I got rid of most of the niche feeds and stayed subscribed to the broader feed instead.
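
The first two criteria are mechanical enough to sketch in code. This is only an illustration of the rule, not something reblog does for you: the `Feed` record and its field names are hypothetical, and the other criteria (interstitial ads, popups, moved or duplicate feeds) still need a human looking at the site.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Feed:
    title: str
    last_post: datetime    # when the newest item was posted
    recent_published: int  # recent items I marked "publish"

def should_unsubscribe(feed, now, max_age=timedelta(days=60)):
    """Apply the staleness and nothing-interesting criteria to one feed."""
    # Criterion 1: no new post in the last 60 days.
    if now - feed.last_post > max_age:
        return True
    # Criterion 2: none of the recent posts was worth marking "publish".
    return feed.recent_published == 0

now = datetime(2006, 3, 20)
feeds = [
    Feed("Stale", datetime(2005, 12, 15), 3),
    Feed("Dull", datetime(2006, 3, 19), 0),
    Feed("Keeper", datetime(2006, 3, 18), 2),
]
for feed in feeds:
    print(feed.title, should_unsubscribe(feed, now))
```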

I didn’t remove partial-text RSS feeds. Those don’t bother me like they do Scoble and some others. I like clicking through to the site and reading the content there. I also didn’t remove feeds that contained ads. I don’t mind ads in feeds as long as they are complementary to the content and not invasive. I want people with good content on the web to make money so they will continue to be compelled to create new content.

Now I’ll toss this over to readers again. What are your criteria for subscribing/unsubscribing? To stay subscribed, do you have to see at least one interesting thing a week, a month, a quarter? How many feeds can you realistically keep up with?

11 Comments

  1. Well, I feel like a real amateur — I subscribe to a grand total of 14 feeds, and I have trouble keeping up with them and getting any work done.

    Comment by Sterling Camden — March 20, 2006 @ 5:29 pm PST

  2. Nahhh Sterling, you will find that as time goes on you’ll get more and more feeds. Some of the feeds aren’t blogs but newspapers, search query results, etc. BTW, your blog was one that I wanted to get added (and have now done). Looking forward to reading what you have to say in the coming days :)

    Comment by TDavid — March 20, 2006 @ 9:29 pm PST

  3. Interesting that one of your criteria for unsubbing from a feed is lack of updates. I’ve always thought that RSS was a big help in staying on top of sites that update very infrequently, especially when there are a lot of them! I totally agree that an asynchronous, non cron-based update mechanism is needed. This is obviously not possible with Apache & PHP, and we’ve been targeting Reblog for mere mortals. But recent experiments I’ve done with Twisted Python suggest that it would be a good environment to use for a feed agent that polls on your behalf, for advanced users. Thanks for the kind & exhaustive comments, don’t hesitate to talk in the RB Forum on sourceforge.

    Comment by Michal Migurski — March 20, 2006 @ 10:17 pm PST

  4. I use other checkers (filemod) for checking when sites with content older than 60 days are changed.

    Cron could still work to update as I outlined. My thinking is that cron hits a main script and then only the feeds that are due get updated, based on the algorithm I described above. This way feeds that only update once a week aren’t polled 24 times a day, and perhaps no more than once per day. I realize it’s a very small hit, but if every aggregator does that it does start to add up and it’s a waste.

    I’ll keep the sourceforge thing in mind, thanks for stopping by :)

    Comment by TDavid — March 20, 2006 @ 11:33 pm PST

  5. Wow, I’m honored — now I’ll have to really work on content for my rambling blog. One of my to-do’s for the site is to improve the feed. The site uses bbcode for posts and comments, and the feed just dumps the bbcode as text. Besides the obvious downsides of not making images and links live, it also has the potentially dangerous side-effect of interpreting embedded code as live. In other words, if my post contains an example javascript enclosed in script tags, it could execute in your feed reader! Not that I would do that sort of thing…usually.

    Comment by Sterling Camden — March 21, 2006 @ 12:16 pm PST

  6. OK, I got off my __ and fixed the feed content: http://www.chipstips.com/microblog/index.php/post/256/

    Comment by Sterling Camden — March 21, 2006 @ 4:20 pm PST

  7. […] The breakdown with aggregators BTW, I don’t think tech.memeorandum should have no competition, but let’s be realistic here: how many tech bloggers want or need or actually can follow more than one tech news aggregator? I follow digg more closely than Slashdot simply because there’s no time to cover both extensively. I like boing boing and engadget, but I don’t like sucking on their firehoses. If I tried to cover TM and its numerous clones, I’d miss reading some individual blogs simply from the nature of trying to keep up with the constantly evolving nature of these sites. I don’t even visit TM every day any more, which in my opinion is the least disruptive of the ones I’ve seen, for this very reason. I’ve gone back to reading my RSS list more (with the help of reblog) than following the aggregators. It’s too easy to get sucked into the echo chamber of the same group of bloggers if any one aggregator source — and that includes TM — is used too extensively for too long a period of time. This is no grand revelation, other bloggers have noticed this too. […]

    Pingback by Make You Go Hmm: » Stop using Alexa for serious traffic analysis — April 6, 2006 @ 10:38 am PST

  8. […] started with reBlog as primary RSS Reader. I’m not sure if reBlog was available back in 2003 when this blog was created, but reBlog has become the best RSS reading experience for bloggers that I’ve tried to date. And I’ve tried at least a dozen different systems including web-based, desktop and readers that integrate with mail programs. I’m able to pore through the maximum amount of RSS content in a very short amount of time using reBlog. On a busy news/RSS day, literally thousands of posts roll through my aggregator. My earlier RSS reading wasn’t nearly as efficient or organized and wasted time that could have been spent relaxing, doing other work and/or creating more posts. Get a system going with reading that maximizes your time. […]

    Pingback by Make You Go Hmm: » Look more forward than back, two million in the distance — July 26, 2006 @ 10:20 am PST

  9. […] I’ve been speaking fondly about reBlog since March. It’s a great RSS reader and tool for bloggers. It’s assisted me in poring through well over 75,000 posts total and 500-1,000+ new posts every day. The screencast below takes you through the filtering process of a couple hundred posts. […]

    Pingback by Make You Go Hmm: » Filtering 6,000+ posts [screencast] — August 11, 2006 @ 11:11 am PST

  10. […] The most subscribed I’ve ever been is in the 400+ feeds range. At the time I felt I couldn’t keep up with that many so I went through a major cleaning and have since learned about an efficient system that helps me scan and read through a significant number of posts every day. The system I’m currently using allows me to sift through nearly 200 feeds daily without getting swamped or taking too much time. I feel like the number could be 250 or even 300+ without burying me, but that hasn’t been tested yet. With a couple vacations and AFK time I did get as much as nearly 10,000 posts behind, but I was able to get caught back up without surrendering to the reblog equivalent of “mark all as read.” […]

    Pingback by Make You Go Hmm: » Hmm quickies #40 and reaching 10,000 posts marked ‘publish’ — September 6, 2006 @ 9:59 pm PST

  11. […] Reblog is a free RSS reader that can be installed on your own server [Hmm review: Reblog]. In August of this year I made a Reblog screencast showing how fast you can skim through posts. If you have a little extra time, install this program and then try it for a month. Try reading your RSS feeds this way and get back to me with how it works for you. Hands down to date this is the most efficient way I’ve found to skim/read the maximum amount of material in the minimum amount of time. […]

    Pingback by Get your own Megite news page with RSS, also showing some RoN love » Make You Go Hmm — November 22, 2006 @ 10:15 am PST



Copyright 2003-2008 KMR Enterprises All Rights Reserved