Here’s an idea: what if when I wrote weblog entries about General Motors, I included a special tag, a keyword tag, that let everyone who wanted to read blog entries about General Motors read my weblog article, without otherwise having to subscribe to my blog? Makes sense. Now, should it be “gm” or “GM” or “generalmotors” or “general motors” or “General Motors” or “GM Corporation” or … ?
Therein lies the fundamental problem with Technorati Tags, as promoted by the popular weblog search system and utilized by a small percentage of bloggers.
Librarians are very familiar with this problem, though at a library it shows up as the “keyword problem”: having keywords assigned to a particular book can be very helpful as long as you agree on the subset of all words that comprise the entire keyword dictionary. Stultifying though the standards may seem, rules like “use the formal company name that appears on their annual SEC filings” or “search the tags database before creating a new tag” do alleviate some of this trouble. But not all of it.
Instead, Technorati advertises that they’re now tracking 466,951 different tags, which is pretty darn impressive when you consider that a typical dictionary has around 75,000 entries (caveat: I’m relying on memory here, so I might be way off on this number).
Perhaps I don’t get it. Perhaps the Technorati tags are actually working better than I think because they’re traveling as “memes”: if I use and clearly cite a specific tag in my weblog articles, then you’ll use the same one so our articles are linked.
But, surprise, that doesn’t work either because you end up with subcommunities who are standardized, but against a different de facto standard than other subcommunities. In that situation, does one group then change their tags retroactively, or does the person surfing for tagged articles have to know about both? Or three different tags? Or a hundred?
With almost a half-million tags and with an online community that loves to engage in keyword and key phrase pollution to be more “search engine friendly”, I posit that the Technorati tags are a failed experiment and are just going to become increasingly irrelevant as the namespace continues to grow without bounds.
But I could be completely wrong. Neville Hobson is clearly supportive, Technorati’s CEO Dave Sifry is clearly a fan of tags, and even lawyer J. Matthew Buchanan is a fan.
What’s wrong with this picture? What don’t I get here? What do you think?
no time to really dig into this, (I’m 2 weeks into a 25 day trip), but here’s my notes from a relevant eTech conference session:
my personal take is that it’s too soon to underestimate the power of this so-called folksonomy movement. another related post:
I haven’t thought deeply about the various tagging experiments [yet], but it strikes me that this idea is approximately on par with taxonomies, which are okay in a small (or closed) environment, especially if there’s only one user: you. However, as you point out, the bigger the tagging space, the less likely this idea will maintain the benefits experienced on the ground floor.
I predict that these concepts will continue to grow in popularity because people want to organize information. However, they will paint themselves into a corner and the only way out is through the transformation of tagging schemes into ontologies. Don’t get me wrong – I think these early tagging systems are useful because they will educate us to the requirements beyond the near horizon and just how low the ceiling is for emergent (taxonomy-based) tagging.
As such, I predict that from a broad and divergent set of terms will emerge groups that create ontological overlays (see XTM 1.0) of agreed-open definitions. I suspect these groups will form around a domain of expertise and a desire to eliminate chaos created by ground-up taxonomies. This scenario requires the availability of tagging APIs that allow a higher-order architecture to emerge.
Indeed, we are witnessing the errant experiment that will lead to the dawn of ontological awareness. 😉
I think you’re right, Bill, and I certainly agree that there’s a clear need for some sort of taxonomy of discussions that’s above and beyond the pedestrian “Google search” or “Technorati search” that we have nowadays. Tags are a stepping stone, but I just can’t get too excited about having yet another manual step required before I can post a “fully findable” weblog entry.
It seems to me that the concept of a folksonomy (which is basically what you see with Technorati tags, and also with services like del.icio.us and Furl, where users are free to create their own tags at will) is pretty different from that of a taxonomy.
A folksonomy merges, diverges, and evolves much the way language does, through usage and interaction. A taxonomy, in contrast, is more like a master plan, rigid and fixed to a certain extent.
In practice, taxonomies are often a pain in the butt to use. They require people to expend effort to abandon their own perceived context and connections (which is what any labeling scheme is about) and instead fit something into someone else’s (often) ill-fitting box.
Yes, the lack of standardization you find with folksonomies is a problem for people who want to do one search and find every relevant result immediately. So folksonomies are not a good idea for libraries, archives, some business systems, etc.
That said, the strength I find with folksonomies is serendipity. With Technorati tags (which I have yet to implement) and Furl and del.icio.us (which I use avidly) I am often pleasantly surprised with the connections I find through tags. Generally, someone has applied a tag to a link I wouldn’t have, so I get to see how they made that connection and often my world gets a bit wider as a result. Or I locate individuals with tag lists that are intriguingly similar to or different from mine, and I use this as a way to start exploring their world.
So I guess, in short, folksonomies enhance exploration; taxonomies enhance searching.
That’s how I look at it, anyway.
I wrote more about folksonomies here: http://snipurl.com/cs4r
Enjoy all that oxygen down at sea level!
– Amy Gahran
I think one of the major assumptions in your argument (or at least, your expectations) is invalid: that your weblog entries will be “fully findable.”
First, the tags are there for the end user, not the provider. Any given tag is there first and foremost for User Numero Uno, myself. Anyone who gets additional value from that tag through organization, filtering, sorting, storing, etc. is a pure bonus.
At the same time, extensions will inevitably evolve tagging from a purely individual activity into one governed in a sense by the tagging mechanism itself. Already pages have sprung up to help you correct “stemming issues” in del.icio.us (having some entries tagged COMPUTERS and others tagged COMPUTER). Firefox extensions such as Scrumptious provide you with a list of popular tags for any particular URL, and a simple point-and-click interface to bookmark that URL with those tags yourself.
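To make the “stemming issues” point concrete, here is a minimal Python sketch that groups tags differing only in case or a trailing “s”. The function names and the crude plural rule are assumptions made up for illustration; this is not how del.icio.us or any of the helper pages mentioned above actually work.

```python
from collections import defaultdict

def naive_stem(tag):
    """Lowercase a tag and strip one trailing 's' (a deliberately crude rule)."""
    tag = tag.strip().lower()
    # Leave short tags and words ending in 'ss' (e.g. 'class') alone.
    if len(tag) > 3 and tag.endswith("s") and not tag.endswith("ss"):
        tag = tag[:-1]
    return tag

def group_variants(tags):
    """Map each stem to the sorted list of raw tags that collapse onto it."""
    groups = defaultdict(set)
    for tag in tags:
        groups[naive_stem(tag)].add(tag)
    return {stem: sorted(raw) for stem, raw in groups.items()}

# Hypothetical sample tags; any stem with more than one raw variant
# is a candidate for cleanup.
groups = group_variants(["COMPUTERS", "COMPUTER", "computer", "linux", "class"])
collisions = {stem: raw for stem, raw in groups.items() if len(raw) > 1}
```

A rule this simple will get some words wrong (an English stemmer would do better), but it is enough to surface the COMPUTERS versus COMPUTER kind of split so a person can decide which spelling to keep.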
Finally, and I think this is perhaps the most important rebuttal to your argument from a user’s point of view, the long tail (http://www.everything2.com/index.pl?node_id=1679291) does exist and is relevant. The Internet has room for everybody. We can all present our messages; we can all be heard.
P.S. I found this entry by doing a del.icio.us tag search for technorati.
Thanks for your posting, but I can’t see what makes adding a tag suddenly make an entry more findable than it would otherwise be. The tags are there for the end user, but the provider, the content creator, the author still has to create and add them, and that’s the crux of my argument. If you rely on users to accurately categorize and identify the three or four key concepts in an article, you’re setting yourself up for namespace collisions and confusion. Long tail or not, I never said that the Web wasn’t egalitarian, too, so I’m not sure why you think that your rebuttal needs to indicate that the ‘net has room for everybody. I already agree with you on that point!
I suppose what I meant is that, from a very broad philosophical standpoint, it seems that you are expecting tags or folksonomies or whatever Next Big Thing comes down to help out content creators and providers with being found and increasing visibility. My argument is that the new tagging Web is built for the user – not the content creator. Obviously you’re a user, too, but the central point is that tags were made for users, not creators, and that can be frustrating to content creators.
I referenced the long tail to make a larger point, but ultimately I whittled it down to the final one. My point about the long tail actually entailed the subtle unstated point about it: that most of the tail falls in the top 20% (aka Pareto’s Principle). I’m sure if someone ever conducts an academic study of tag choices by users, we’ll see that there is a minimum of collision because most tags are straightforward and unadorned: “web”, “blog”, and “online” no doubt dominate, with tags like “linux”, “tv”, and “music” following right behind. Along the same lines, a number of quote unquote personal tags will all be entirely worthless for searching anyway. Personally, I have a Live Bookmark of the del.icio.us RSS feed of the “to_read” tag, because people want to read some interesting stuff, but who’s going to search for “to_read” in an engine?
In all fairness, though, an ontology must be developed in order to move beyond the neverending crush of tags. One solution that I proposed would be similar to the “stable” Wikipedia version proposed by Jimmy Wales. This would have an additional level of peer editing and review to merge and delete colliding namespaces, while at the same time providing a base-level taxonomy in the mechanism itself for others to emulate. I think using this to cover the basics (stemming issues, “mac” vs. “macintosh”) should be the first major upgrade of the tagging systems.
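As a rough sketch of what that first upgrade might look like, the Python below folds variant tags onto one canonical name through a reviewed alias table and then merges their counts. Everything here is hypothetical: the table entries, the function names, and the sample counts are invented for illustration, and no existing tagging service is known to work this way.

```python
# Hypothetical, hand-maintained alias table: variant -> canonical tag.
# In the scheme described above, peer reviewers would curate this table.
ALIASES = {
    "mac": "macintosh",
    "gm": "generalmotors",
    "general motors": "generalmotors",
    "gm corporation": "generalmotors",
}

def canonical(tag, aliases=ALIASES):
    """Lowercase, collapse whitespace, then apply the alias table."""
    key = " ".join(tag.lower().split())
    return aliases.get(key, key)

def merge_counts(tag_counts, aliases=ALIASES):
    """Sum usage counts of all variants under their canonical tag."""
    merged = {}
    for tag, count in tag_counts.items():
        name = canonical(tag, aliases)
        merged[name] = merged.get(name, 0) + count
    return merged

# Made-up usage counts, purely for demonstration.
merged = merge_counts(
    {"mac": 3, "Macintosh": 2, "GM": 1, "General  Motors": 4, "linux": 5}
)
```

Tags that are not in the table pass through unchanged (apart from case and spacing), so the table only has to cover the collisions reviewers have actually agreed on.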
Since the year began, I have seen a variety of sites which implement tagging emerge on the net: a dating site ( http://www.consumating.com ), an open message board site ( http://www.tagsurf.com ), a new Wiki ( http://www.schtuff.com ), and of course, the ubiquitous Flickr and del.icio.us. And these sites are where your original post resonates the most: tagging may be difficult to scale, but the thriving subcommunities developing at these sites can use it to create more sophisticated and relevant linkages for themselves. Which sounds like what the Web was all about in the first place : )
I see the tagging technology as a way to create my own personal view of the web. The point about folksonomy being for users is a correct one in my opinion, and that’s folksonomy’s main strength, its main selling point.
I predict that once tagging has spread and moved past the phase of early adoption, it can be combined with structural link analysis to provide a better search experience.
In some sense, this can be viewed as a first step towards the semantic web: resources on the web receive a semantic value that can later be used by automated agents.
There are currently 2 problems I see with this technology:
1. Wide adoption is still a problem. A known fact is that early technology adopters are usually people more fluent with computers, communications etc. The wide public isn’t like that. Before thinking about aggregating folksonomy and structural link analysis, I have to make sure that the emergent taxonomy indeed reflects a wide range of views (no “group thinking”, please).
2. Tag spamming. I think there are a few ways to deal with this problem, but I haven’t seen anyone address this issue yet. Spammers can easily use folksonomy tools to do exactly that (ever thought of a porn site tagged as “fun”, “family”, etc.?). To reach wide adoption, this issue must be addressed to ensure stability and reliability.
What makes tags relevant to mass sites like Technorati, del.icio.us or flickr is the possibility to add at least some method of classification. As one pointed out earlier, a huge percentage of those tags is used by a very small fraction of users. These users will probably be the ones who just tag for personal use and not for the sake of the community. Lost by definition.
The other fraction of the community is well aware of the fact that they had better choose commonly accepted tags, which will thus serve as a common denominator in the long run.
You’ve got to look into the structures of dmoz.org to get an idea of how a large site can classify content which is generated by the unwashed masses without a common taxonomy, while adhering to strict rules for expanding the established classifications: It is even worse, as you have two causes contributing to the chaos: amateur editors, and lots of them.
Which may add to the fact that I know no one who relies on any kind of directory to discover content related to a certain topic.
Tags, keyword searches, the bane of my life at the moment :o/. But as someone else mentioned, it’s a stepping stone. Keywords without content are a waste of time, and the same goes the other way round.