What fraction of Web pages are original content?

While perusing my RSS subscriptions in NewsGator this evening, I bumped into what sounded like it might be an interesting article at Lockergnome. The piece was two paragraphs of commentary followed by a link to the “source article” at RealTechNews. But that wasn’t the origin of the story… the trail actually goes further and further back until the original piece is finally unearthed. Here’s the trail I followed for this article “Ten signs your son is a hacker”:

What I find so interesting is that, time and again, people don’t track a story back to its original source and link to that, in what I’d describe as a “wheel and spokes” model. Instead, Web authors link in more of a “daisy chain” fashion, perhaps never going all the way back to the original (where it’d be quite clear that the piece was written back in December 2001 and was obviously a deliberate attempt to provoke the hacker community into a debate).

In some sense, I think this is one of the dark sides of the linking mania that the Web has always had, but that has become far more rampant with so-called linkblogs, where people post links to what they consider interesting content, often twenty or more times a day. They’re serving as screeners, as selectors, but how they’re selecting content is what I find worthy of note.
In particular, it’s a good question to ask yourself: how often do you actually create original content for the Web and Internet, and how often do you just copy and paste or simply point to other material without offering even a minimal explanation of why your reader should care?
What’s most interesting about this resurgence of a tired meme, if you will, is that the original article is really rather sloppy, poorly thought out, and rife with factual errors. If it isn’t a hoax, it’s certainly not demonstrative of any intelligent thinking about the interesting question of how you would know your child is a ‘hacker’. Yet it’s being linked to by bloggers and web authors, without any of them digging into the link chain and finding out what’s really going on.
Instead, after the fact, Alice Hill, the reporter for RealTechNews who “broke” this story, issued an update on her story hours after posting it:
“Update: Our suspicions were correct, the posting was a hoax that originated on a a site called Adequacy.org (now defunct) that took pride in posting things that sparked controversy and outrage. I’ll leave the segment here, but maybe someone can come up with a real list of signs because teen hacking is something many parents aren’t aware of. D’oh. Thanks to Rob for pointing this out!”
If Ms. Hill was suspicious, why didn’t she dig into the story before posting it to the site, then?
My point isn’t to embarrass Ms. Hill at all, however, but to simply highlight both the challenge of Web readers finding quality original content, consistently, from Web sites, and the challenge Web authors face writing articles in a world where everyone can fact check everyone else.
At a bare minimum, if you are producing content for the Web, either in a weblog or some other type of site, please spend the few extra minutes digging through your citations and references to ensure that you point to original material, not indirect links. It’ll improve the veracity of your prose and the satisfaction of your readers.
It also makes me wonder what percentage of the billions of Web pages that modern search engines index have at least 75% original content. My guess: less than 15%.
What do you think?

4 comments on “What fraction of Web pages are original content?”

  1. Thanks for another good, thought-provoking post. I see a somewhat different motivation behind bloggers linking to where they found content rather than the original source: they want to give credit to the intermediate source for digging up something interesting. I’ll also see links to the original content preceded with “hat tip: [intermediate source].”
    But most blogs I follow don’t reproduce other content wholesale. They’ll excerpt a bit and comment, linking to their source for the full background. Usually I’m as interested in the commentary as the original content. As such, I appreciate the “wheel and spokes” model, at least when it’s adding value in this manner.
    As for blogs that just repost other content: it’s usually pretty easy to judge when a blogger isn’t contributing anything of their own, and I won’t bother visiting their blog further. And when someone is duped into reproducing satire or flame bait as “real news,” I appreciate knowing when someone isn’t checking their sources. It helps me judge their credibility, or lack thereof.
    I’m slightly more optimistic about the percentage of original content. I’d take a wild guess at 30%. When I search for things it seems like I find the same content (i.e. same wording, etc.) in about 3 different places. Though there is certain content that throws the averages off, e.g. *nix man pages, which show up everywhere. Maybe 15% is the better estimate after all…
    Best regards,

  2. Excellent post, Dave. Sometimes I think many bloggers are just lazy, and don’t dig for the facts (or at least the origin of a post or article). At other times, the person may be new to blogging, I suspect, and isn’t aware that he needs to do a little research before repeating what someone else has said.
    If people continue to repeat info without checking, I fear the blogosphere will soon resemble the emails I used to get from my aunt, telling me about the Neiman-Marcus cookie story over and over. 😉
    Josh is right, too. Most folks will quickly stop visiting a blog that just reposts recycled material, unless the blogger has a unique way of packaging it, synthesizing it or commenting on it.

  3. Perhaps even more distressing, in terms of content originality, are people who flat-out copy content.
    I’m an admin at various discussion forums, and it’s amazing how brazen individuals are in simply duplicating entire articles. Sometimes they will attempt to credit the source (polite, but not an excuse for copyright violation), and other times present it unattributed.
    There are many thousands of websites built entirely on content taken from other sites. In some cases they can attempt a dubious justification that the content is in the public domain, e.g., press releases; in other cases, the content is flat-out stolen. (In your original example of the hacker article, I’m surprised that most of the dupes weren’t complete ripoffs of the original.)
    To their credit, bloggers usually try to link to their source, though your point about taking the time to find the original source is an important one. Otherwise, blog dialog will turn into an internet game of “telephone” (the old party game where a message is passed from person to person, and is rendered unrecognizable by the time it returns to the starting point).

  4. Interesting question. Getting a representative sample, and a comprehensive answer, looks doable, if time-consuming.
    My gut instinct is that a lot of the new bloggers seem to post original content, not having read anything or reported anything from others. These confessional/venting blogs are original, at least in not quoting or pointing. Long-term bloggers who tend to be more introspective also tend to make the blog a thinking space.
    The percent that are pundits or those who have run out of personal stories they have been itching to tell and have now told, may swing to be primarily link point and remark/snark.
    Does an unconsidered, brief opinion make it unoriginal? Where’s the line? Some people aren’t interested in journalistic thought or research. They are driven by their purpose and just want to amuse themselves and caper for applause or commiseration, or to step back from talking and provide screened news to promote a rallying point for social reasons. It’s another way of relating, of communicating. It can make for waves of duplicated subjects as people bandwagon.
