I’m somewhat of a skeptical early technology adopter, I think. I’m always interested in seeing what’s new, but it has to pass either my “what’s in it for my business” or “what’s in it for my customers” filter before I’m interested in adopting it. So I realize that I’m not exactly the poster blogger for the “Web 2.0” phenomenon. Heck, I’m in the Microsoft Windows Vista beta group and still haven’t installed the OS on any of my computers!
More and more, though, I’m recognizing how dissatisfied I am becoming with RSS and, in particular, with RSS readers. The promise of being able to assemble my own personal newspaper just isn’t being realized and I fear that the entire world of RSS is starting to slowly sink beneath the waves of geekdom and bad interface design.
The tip of the proverbial iceberg is duplication of content, but as someone who has stood on stage and argued about the irrelevance of the “partial versus full feed RSS” debate with people like Robert Scoble, I have to say that I’m starting to see a much more insidious problem…
The real crisis brewing in the RSS world is that RSS tools aren’t letting readers gain full control over their reading experience, and it’s crippling the potential of the “RSS-based Web”.
One example is with partial versus full feeds: Apple solved that by letting you specify just how much of each feed you’d like to see in your reader (in this case, it’s Apple’s Safari browser with built-in RSS capabilities). This frees up information providers to decide how much they want to make available in their feed, but – and this is critically important – individual readers have control over how much they view rather than being at the mercy of this full feed versus that partial feed.
Where is this capability in other RSS readers, though?
The lack of adoption of this elegant solution highlights a limitation of RSS readers more than anything else, and the fact that we still have the debate (and high-visibility bloggers are still complaining about partial feeds!) shows exactly what I’m concerned about in terms of innovation and adoption of solutions: if feed detail controls were a common user-adjusted configuration option, then there’d be no debate.
But I know, I know, partial versus full is old news, so let’s talk about something that I haven’t seen discussed before: the duplication of content in RSS feeds.
Duplication of RSS Content
Now I’m not talking about how a wire story from Associated Press appears on a dozen newspaper RSS feeds in the following 24 hours. I’m talking about how the same entry in the same RSS feed appears time and time again in my RSS reader, without any obvious rhyme or reason.
The Washington Post, Business Week’s Blogspotting, SixApart’s blog, WIRED – none of them are immune: I catch up on my feeds, come back later, and see the same darn story appear again in my reader even though nothing’s changed.
It’s like buying the Wall Street Journal and seeing the same story appear three days running. Where’s the new material? More importantly, where’s the promise of “only what’s new since last time you read your feeds” that’s the basic premise of RSS readers in the first place?
Is this a problem with RSS itself? I don’t think so.
The problem is with RSS readers that can’t differentiate between an updated version of an entry you’ve already seen, an entry that has had a typo fixed or some other minor – often trivial – correction, and a completely new article. In the first case, I do not want to see the article again. In the second case, I probably don’t care about corrections unless they’re major. And in the last case, well, obviously, that’s the whole reason I’m using an RSS reader in the first place.
The Path to a Solution
In keeping with my theme of user control, though, I don’t want a programmer to solve this for me; I want a user-adjustable setting where I can tweak how sensitive my RSS reader is to changes in previously seen entries. Imagine a slider where one side says “show me everything, even if it’s just a freshened feed of ancient articles” and the other side says “minimum new content required: 100%”.
Or, better, a series of different criteria and a smart back-end content analysis system that would let me indicate “don’t show me syndicated wire content more than once” and “don’t waste my time with spam blogs: if I’ve already seen the content and they don’t add any commentary, ignore it.”
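To make the idea concrete, here’s a minimal sketch of what such a setting might look like under the hood, assuming the reader keeps the last version of each entry it displayed. Everything here (the `seen_entries` store, the `novelty_threshold` slider value, the dict-style entries) is hypothetical, just one way a reader could implement that slider:

```python
import difflib

# Hypothetical sketch: a user-adjustable "novelty threshold" for changed
# entries. 0.0 would re-display every refreshed entry; 1.0 would demand
# entirely new text. Entries are plain dicts; seen_entries is an
# illustrative in-memory store, not any real reader's API.

seen_entries = {}  # entry key -> text last shown to the user

def entry_key(entry):
    # Prefer a unique ID if the feed provides one; fall back to the link.
    return entry.get("id") or entry.get("link")

def should_display(entry, novelty_threshold=0.3):
    text = entry.get("summary", "")
    old_text = seen_entries.get(entry_key(entry))
    if old_text is None:
        seen_entries[entry_key(entry)] = text
        return True                      # genuinely new entry
    novelty = 1.0 - difflib.SequenceMatcher(None, old_text, text).ratio()
    if novelty >= novelty_threshold:
        seen_entries[entry_key(entry)] = text
        return True                      # changed enough to re-read
    return False                         # typo-level edit: suppress it
```

A slider in the UI would simply map to `novelty_threshold`; the wire-content and spam-blog filters could layer extra rules on top of the same comparison.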
You might not like this idea, but that’s why we need to have this as a user-adjustable setting. Don’t impose your information filtering needs on me and I’ll do my best not to impose mine on you either.
What Happened to Innovation, Anyway?
There’s more amiss in the world of RSS readers than just duplicate content and feed length, though: there seems to also be less and less innovation and choice as companies like NewsGator keep gobbling up little guys to normalize them all with a shared backend. Great concept, but where’s the innovation in the marketplace?
I’m actually a big fan of NewsGator but recently switched from NewsGator Online to competitor Bloglines as my Web-based reader because I just couldn’t deal with the incessant duplication of content (often 10% or more of the entries I was shown were ones I’d seen in an earlier viewing) and the frequent server errors.
Let me say that Bloglines has lots of problems too, not the least of which is that it’s astonishingly easy to click on a link in an article just to have your entire session vanish: you go to the new content, read it, back up, and all the other pending articles have been marked as read. I get bitten by this at least every two or three days.
It’s not rocket science to create an “RSS reader toolkit” where we could adjust all of these different settings, but somehow I have yet to find an online reader that lets me truly gain control over my RSS experience, and that’s worrying.
With all the hype about Web 2.0 companies and all the mashups and such, are we yet again as an industry forgetting to just pay attention to the basics of usability and functionality?
I just can’t believe I’m the only person who sees these problems.
Or am I? What would you do to improve the state of RSS readers and the general experience of reading RSS-based content?
It’s not about the technology, not about RSS versus Atom. Even a simple interface idea like letting me specify how many columns of content I’d like in my ‘virtual newspaper’ is something I have yet to see, let alone letting me tweak the semantics of the reader so that I can show or hide corrections when feeds are republished, filter out any content more than X days old so adding new feeds is less painful, get highly sophisticated search results tightly integrated into the reader, and of course, have some sort of anti-duplicate-content settings too.
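That age filter, at least, is trivial to build. Here’s a minimal sketch using the feedparser library (the feed URL and the seven-day cutoff are placeholders), just to show how little code the feature would take:

```python
import time
import feedparser  # third-party library: pip install feedparser

# Sketch of the "ignore anything older than X days" option. The cutoff
# and feed URL are illustrative, not from any real reader.

def fresh_entries(feed_url, max_age_days=7):
    cutoff = time.time() - max_age_days * 86400
    for entry in feedparser.parse(feed_url).entries:
        stamp = entry.get("published_parsed") or entry.get("updated_parsed")
        if stamp is None or time.mktime(stamp) >= cutoff:
            yield entry  # keep undated entries rather than silently drop them

for entry in fresh_entries("http://example.org/feed.xml"):
    print(entry.get("title", "(untitled)"))
```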
Am I shooting for the sky, or is this something that we should be able to attain today?
I don’t recall anywhere in the RSS specifications where it says anything about user interface. That may be part of the problem.
The duplication problem could be something the RSS provider is doing to spoof the system by making it seem like there is more coming out than there is. Reuters is rather adept at that. Or the newsreaders are doing a poor job of implementing the logic to determine what is new.
I agree completely Dave, and I have many more reservations and complaints about RSS/Atom.
The only post I ever deleted on Vaspers the Grate was “Against RSS” of about 2 years ago. I was flamed very severely on my anti-RSS post, especially by Rich…! of Hello World.
I re-thought my position, after arguing a bit, and decided I was a Luddite on this topic, lagging behind the learning curve. So I deleted the post, and added a Feedburner widget to VTG blog.
I opposed feed syndication as Push vs. Pull marketing. I felt that feeds pushed content at users, while good content pulls readers to you. I still have this perception.
But I have tried some feed readers: the Firefox Wizz extension, the Avant Browser feed reader, and the Awasu feed scraper.
I vastly prefer Awasu, and have given them a lot of publicity and free marketing assistance and advice. It is a feed scraper, meaning it can create a feed from the HTML of any web site.
I like feedrolls, like Digg and LockerGnome. A new service has contacted me – I forget the name right now – but it will create a feedroll for a blog’s sidebar from any RSS/Atom URL you give it. I suggested that they build a feed scraper, like Awasu’s, into their service, so users won’t have to know the feed URL of a blog they want to add to a feedroll.
I don’t like reading blogs in feed readers, though. I prefer to actually go to the site. In an RSS/Atom feed reader, you don’t see the ads or sidebar, etc.
I bought the Extreme Tech/Wiley book “Hacking RSS and Atom” and have learned a great deal, especially concerning your problem of redundant info, the unnecessary updates, where a new ping is derived from a simple typo correction.
It’s called “polling”, and it can be solved.
[QUOTE–p.76]
“Removing the description property, for example, causes the aggregator to ignore any small changes made to feed entries (for example,…small edits made by a blog author)….
On the other hand, some aggregators pay attention only to the title of a feed entry, and so they sometimes miss new entries that might use the same title for a series of entries…”
You basically don’t want your feed aggregator to poll a site more than once an hour, since minor edits often occur within that time span. I often publish a post, then spot a typo or an unclear phrase, edit it, rinse, and repeat.
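Something like this, I’m told, is all that rule takes. A rough illustrative sketch (the Python feedparser library and the names here are my assumptions, not from the book):

```python
import time
import feedparser  # third-party library: pip install feedparser

MIN_POLL_SECONDS = 3600   # the once-an-hour rule suggested above
last_polled = {}          # feed URL -> time of the last fetch

def poll(feed_url):
    now = time.time()
    if now - last_polled.get(feed_url, 0.0) < MIN_POLL_SECONDS:
        return None  # too soon: give the author time to fix typos
    last_polled[feed_url] = now
    return feedparser.parse(feed_url)
```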
The good thing about RSS/Atom is how you can have content delivered to you, like email messages.
It’s just going to take some time to iron out all the problems, which are many.
RSS/Atom feed syndication is not what it’s cracked up to be.
I shut up now, for I could go on and on with observations and advice.
Get that book and you’ll learn a lot, since you are far more geeky than I am.
Your post echoes Jakob Nielsen’s Alertbox of today, about focusing on simplicity and usability, and ignoring the Web 2.0 hype, the web sites that are singularities, rarities, eccentric successes.
I saw a blog post recently about “converting RSS subscribers to manual readers” to get more traffic to your blog ads, too.
… I fear that the entire world of RSS is starting to slowly sink beneath the waves of geekdom and bad interface design. …
a) RSS is an XML specification – you’re looking for love in all the wrong places. :-)
b) While blogging has popularized RSS, influential bloggers have set unreasonable expectations for RSS behaviors (i.e., they forgot it was simply a specification for creating awareness).
c) Seeing it sink beneath the waves is a clear indication that the adoption rate is accelerating. :-)
My app’s testers refer to duplicate content as “re-runs.” (I’m an aggregator developer.)
It absolutely *is* a problem with RSS — or, at least, a problem with many feeds.
The problem is that RSS doesn’t mandate unique IDs.
So you have situations where items that seem to be duplicates (but really aren’t) are displayed as brand-new items.
For instance:
Say your aggregator’s database has a news item like this:
Title — Kittens Born
Link — http://example.org/
Description — A litter of six kittens were born today to a cat.
Then say the editor later decides it would be a good idea to mention what kind of cat. So the feed updates, and the item looks like this:
Title — Kittens Born
Link — http://example.org/
Description — A litter of six kittens were born today to a tabby cat.
You will probably see this as a brand-new item in your aggregator, and you will think it’s a duplicate. But it’s not a duplicate to the computer, because it’s different.
The situation is completely different if the item has a unique ID. Then, when it updates, your aggregator knows it’s an update (because the unique IDs are exactly the same) as opposed to a brand-new item, and it can mark it as read or unread depending on your preferences.
Some RSS feeds do have unique IDs. It’s part of the spec; it’s just not required. (Atom *requires* unique IDs, which is the coolest thing about Atom.) So it’s worth looking at your aggregator’s preferences to see if you can have updated items marked as read, at least for the cases where there are unique IDs.
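To make the guid point concrete, here’s a rough sketch of the identity logic (not any real aggregator’s code, just an illustration with plain dicts whose fields mirror the RSS item elements above):

```python
import hashlib

def item_identity(entry):
    guid = entry.get("guid")
    if guid:
        return guid  # stable: edits to title/description stay the same item
    # No unique ID: fall back to hashing the visible content. Editing
    # "a cat" to "a tabby cat" changes the hash, hence the "re-run".
    raw = (entry.get("title", "") + entry.get("link", "")
           + entry.get("description", ""))
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()

before = {"title": "Kittens Born", "link": "http://example.org/",
          "description": "A litter of six kittens were born today to a cat."}
after = dict(before,
             description="A litter of six kittens were born today to a tabby cat.")

print(item_identity(before) == item_identity(after))   # False: looks brand-new
print(item_identity(dict(before, guid="tag:example.org,2006:kittens")) ==
      item_identity(dict(after, guid="tag:example.org,2006:kittens")))  # True: recognized as an update
```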
I’ve used Awasu as my feed reader for a few years now, and I love it. It has a lot of extensibility built in, supporting downloadable plug-ins as well as user-created ones.
I very rarely get duplicate ‘stories’ from a single source via Awasu.
Hi Dave,
My main problem with RSS feeds is also duplicate content. That, and the inability to finely filter stuff from a site automatically based on dynamic keywords.
Also don’t you just wish your RSS reader of choice could reliably decide whether something was interesting before you started to read it?!
Jim
I’m sort of disappointed with RSS. I was expecting to be able to assemble something very similar to a custom newspaper. But with most readers, all you get is basically a shortcut to a webpage, which then renders when you click on it. My first reader was the Firefox Sage extension. Then I started using Firefox’s Live Bookmarks, which I still find useful. But the best reader I’ve found is Planet. I have a cron job that runs Planet to update the page, so all I do is turn on the computer. I know when the job is scheduled, so I just wait a few minutes afterwards and then read my “newspaper.”
Maybe with some creativity, instead of having Planet generate a single page, it could be used to generate a front page with what a user considers the important blogs, and… well, you know the rest.
Overall, just like the Web, which is a lousy implementation of the Semantic Web, I believe RSS was misused and misinterpreted. Oh well.
Look, guys, you’re really messing up the definitions.
As some have already mentioned here:
RSS is a specification, not an interface, and that is worth remembering.
It is like a blueprint, and the implementation depends completely on the manager, the engineer, and the workers.
Now let me explain.
First, my experience — I worked for an online aggregator service provider (not mentioning their name for now) for a year and a half. I started as a media-intelligence developer, then moved to team leader of the same team for a year. Along the way I developed our internal ticket system in Perl and had the task of building an RSS feed for our tickets, for internal use. That is where I became somewhat familiar with RSS.
Now, your primary big horror — the duplicated content — may arise for several (yep) reasons. First of all, it may come not from the reader’s inability to differentiate new content, but from the messed-up XML the feed-providing site streams. Since some documentation is not very well written, I used to make big messes with my own feed, until I realized it was not the reader’s fault, but mine. So who knows what newbie coders the website hired to program its RSS routines.
Second, it may be because the article changed. But here is the problem — I do not remember the RSS specification being strict about this, since it mainly describes the format. So the logic is implemented mostly according to the programmer’s vision, and we are back to the feed developers, or feed implementers.
Third — the reader. Now we finally get to the end user and their reader of choice. One gets duplicates, another does not. Here it really is the reader’s fault: its implementation and its algorithms. Who knows what code lies in there.
Fourth — unlike Atom (I am not familiar with Atom; I have just read a few words about it), RSS exists in several versions, some incompatible with each other, as I have read. I personally decided to be strict and go only with the latest one, which as far as I know is 2.0. OK, but what about the fact that there are so many sites with so many feeds? Which RSS version have they implemented? Did they strictly implement the specification for their version of choice? Who knows. Back to “First”.
As you can see, this is a complicated issue. During that year and a half my team got a lot (really A LOT!) of complaints about duplicated content. Our case was a bit different, since we had our own web crawler for fetching some of the content; it was not coded well and caused my team a lot of trouble, but we were not in charge of it (another team was). Yep — here you get another issue: his majesty, the service provider’s web crawler. Believe me, this can be a nightmare if it is not coded well. The programmer rewrote the crawler a few times, or so HE claimed, but who knows what he actually did. I must say that coding a crawler to fetch content from two or three sites could be done with, say, wget and a scheduler, but coding a crawler for thousands of sources is not an easy task. And often, as happened when I was writing our ticketing system, the programmer’s vision collides with the manager’s vision, and no matter how you want to write an application, it is the manager who decides in the end. This is how commercial software gets developed, which is very frustrating for new programmers.
I have tried a few readers, each for only a short time, and only FeedReader and Abilon seemed good to me (keep in mind that I have tried only a FEW readers, and cool software is developed and released EVERY single day worldwide). The default feed-reading implementations in Firefox and Thunderbird are also very cool (I am kind of a Mozilla fan). I personally use Abilon for now. I must stress that readers behave differently, but that is not always the application’s fault. The unique ID issue mentioned here is a very important thing — some readers rely on it completely, since it is in the specification; some do not, and luckily so, because some feeds do not implement it correctly. Their developers should really apply the famous “RTFM” algorithm, since this is a total ID-10T problem :-)
The online aggregators are the interesting ones to me, since desktop applications like FeedReader will always be limited by the end user’s platform, while an online application can be backed by a good database and other engines, which expands its capabilities with tons of possibilities. Of course, everything is limited by the imagination of the project manager and the developers.
To finish my post, I would just like to point out to Jim Symcox, who commented a few entries above, that the information-evaluation feature he wishes for was already implemented by a few online service providers 18 months ago. I do not remember their names, but I remember they were rivals of our company. As I said — everything is limited by the developer’s imagination, the manager’s vision, the end users’ criticism… and of course the server’s hardware and software.
…(but mostly by the manager’s vision)…
By the way, it was very interesting to read your opinions and wishes about the technology. I would be glad if everybody shared at least a little of their vision. Thanks.
Regards:
Ognen_Demon
I would agree with your concerns over RSS readers. I now rarely use Bloglines because of the issues you mentioned. I actually prefer to read blogs by clicking the blogroll links on The PDA Pro blog. But that does nothing to filter the content for me from a productivity standpoint.
I think it’s a reflection of the issues we’re having because of the growth of global conversations, which will serve to explode the amount of online content over the next decade. It’s not necessarily a good thing, but perhaps inevitable.
I agree completely that too many of us (myself included) have focused too much on the geeky details and not enough on basic usability. I don’t think you’re shooting for the sky at all – the companies that figure out how to simplify this stuff without dumbing it down will be the ones that succeed.
I also agree that the whole “full vs. partial feeds” debate is overblown, and that the issue could be more easily resolved by enabling users to control their reading experience. This is why FeedDemon 2.0 enables choosing how much content to show for each post – you can view the entire post, an excerpt of the post (with all HTML except hyperlinks removed), or just headlines.
Duplication of content is a bigger problem, but it doesn’t need to be. IMO the RSS aggregator should only consider a post updated if the feed *explicitly* states that the post has changed (through dc:modified or atom:updated). Of course, this ignores the whole “uniquely identifying posts” issue that’s at the root of the dup problem, but that issue is perhaps too geeky for this thread :-)
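To sketch what I mean (this is an illustration, not FeedDemon’s actual code): the Python feedparser library exposes atom:updated as `entry.updated_parsed`, and I’m assuming dc:modified surfaces the same way; the `seen_updates` store is made up for the example.

```python
import feedparser  # third-party library: pip install feedparser

seen_updates = {}  # entry id -> the update timestamp we last displayed

def classify(entry):
    # Key on the unique ID when present, else the link (imperfect, as noted).
    key = entry.get("id") or entry.get("link")
    # feedparser normalizes atom:updated into updated_parsed; assuming
    # dc:modified is surfaced the same way.
    updated = entry.get("updated_parsed")
    if key not in seen_updates:
        seen_updates[key] = updated
        return "new"
    if updated and updated != seen_updates[key]:
        seen_updates[key] = updated
        return "updated"   # the feed explicitly declared a change
    return "seen"          # no declared change: don't re-run it
```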
I suppose what I’m really looking for is the Holy Grail of news readers. It would be flexible software that lets me work the way I want to and doesn’t lock me into someone else’s preconceived notions. I want a news reader, browser, and blogging client all seamlessly wrapped into one, with all the bells and whistles. Is that too much to ask for? Here are my recommendations in my quest for the perfect news reader…
It’s worth noting that the Mac news-reader, endo, has an option so that one can view a feed using the original CSS styling of the home site.
Brent actually writes one of the best RSS aggregators out there – NetNewsWire (Mac OS X).
I haven’t had much trouble with duplicate entries with NetNewsWire – Brent must be doing something right. His explanation of why to prefer Atom was excellent – ironically, NetNewsWire was very slow to incorporate Atom, gaining support only in version 2.0.1.
I can’t understand why you fellows prefer the online news readers. One of the joys of NetNewsWire is that everything is local on your computer and you click or scroll through articles with nary a pause.
As far as full or partial feeds go – nothing is more aggravating than those teaser feeds. Unless a site is very, very important to me, a teaser feed doesn’t stay in the news reader for long.
For those looking for advanced newspapering, check out newsAlloy.com.