The New York Times has a very interesting story [sub required] about how the Department of Homeland Security plans to invest $2.4 million over the next three years to fund Cornell, the University of Pittsburgh, and the University of Utah to build a “sentiment analysis” system.
The goal of the program is to create a software system that can monitor and analyze what foreign newspapers and journalists are saying, and therefore thinking, about the United States. Researchers are going to test the system on “hundreds of articles published in 2001 and 2002 on topics like President Bush’s use of the term ‘axis of evil’”.
There are two concerns I have after reading this story: first, I don’t necessarily expect Homeland Security to understand the blogosphere, but surely someone at one of these three universities could have said, “uh, guys, what about tracking blogs?” Second, they don’t seem to have done their homework and identified companies that already have content analysis systems in place, like the Colorado firm Umbria.
It’s this lack of industry savvy that constantly makes me sigh when I read about government-funded (read “our tax money funded”) research projects. Now maybe I’m wrong and these college researchers are already well informed about the state of the art here in the blogosphere, but it’s darn hard to imagine it.
As the story explains: “The new software would allow much more rapid and comprehensive monitoring of the global news media, as the Homeland Security Department and, perhaps, intelligence agencies look “to identify common patterns from numerous sources of information which might be indicative of potential threats to the nation,” a statement by the department said. It could take several years for such a monitoring system to be in place, said Joe Kielman, coordinator of the research effort.”
Now compare that with the description of one of the key services offered by Umbria: “Sentiment and Satisfaction Analysis: Umbria provides insights into the discussion and the context of discussion about brands and whether or not the discussion is positive, negative or neutral.”
It’s not quite the same, but it’s darn close. And a little birdie tells me that there are even more startups that’ll be introducing semantic and content analysis systems to track the millions of new articles added to the blogosphere every day.
Project supervisor Joe Kielman of the Department of Homeland Security sounds like the kind of guy who doesn’t spend much time on the blogosphere, frankly. He’s officially Science Advisor to the Under Secretary for Science and Technology within the Department of Homeland Security, which is quite impressive, but his prior experience is that he “worked for 20 years at the FBI, where he was successively chief of the Advanced Technology Group, chief of the Research and Engineering Unit in the Engineering Section, and Chief Scientist for the Information Resources Division.” [ref] Joe, before you and the Department spend $2.4 million on this project, it’d make a lot of us taxpayers happy to know you’re aware that:
- Just about every newspaper now has an RSS feed (including foreign papers)
- RSS feeds are easily aggregated
- There are many tools to monitor these aggregated feeds
- Vendors already have tools that do semantic analysis of large bodies of text
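To make the point concrete, here’s a rough sketch — plain stdlib Python, with a made-up toy feed and made-up sentiment word lists — of what “aggregate RSS feeds and score the sentiment of articles” can look like. This is purely illustrative; it is not how Umbria’s system or the DHS project actually works, and real semantic analysis goes far beyond keyword counting:

```python
# Illustrative sketch only: a minimal RSS aggregator with naive
# keyword-based sentiment scoring. The feed XML, word lists, and
# scoring scheme are hypothetical stand-ins for real vendor tools.
import xml.etree.ElementTree as ET

POSITIVE = {"praise", "support", "ally", "cooperation"}
NEGATIVE = {"condemn", "threat", "evil", "hostile"}

def parse_feed(xml_text):
    """Pull (title, description) pairs out of one RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        desc = item.findtext("description", default="")
        yield title, desc

def sentiment(text):
    """Classify text as positive, negative, or neutral by word counts."""
    words = {w.strip(".,!?\"'").lower() for w in text.split()}
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def aggregate(feeds):
    """Aggregate many feeds into a list of (title, sentiment) pairs."""
    results = []
    for xml_text in feeds:
        for title, desc in parse_feed(xml_text):
            results.append((title, sentiment(title + " " + desc)))
    return results

# A toy two-item feed standing in for a foreign newspaper's RSS output.
feed = """<rss version="2.0"><channel>
  <item><title>Editorial on the axis of evil speech</title>
        <description>A hostile reaction to the threat rhetoric.</description></item>
  <item><title>Paper welcomes new cooperation deal</title>
        <description>Broad support among the public.</description></item>
</channel></rss>"""

print(aggregate([feed]))
```

The whole pipeline — fetch feeds, pool the items, run each through a classifier — is exactly the kind of plumbing the bullet points above describe, and it’s the part that’s already commodity technology.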
Maybe I’m just being picky, but I’d rather see the $2.4 million spent on new research and new tools, not reinventions of existing commercial systems.
What do you think?