On November 9th, I was privileged to have the opportunity to speak at the Rocky Mountain Internet User’s Group meeting, on weblogs and RSS. I’m including the minutes of the meeting since they should prove quite interesting reading.
“If the Internet is the medium of the Information Age,
then content is the matter. According to Netcraft
there are more than 55 million websites in existence.
Although there are no hard statistics, some estimate
that the number of web pages is well into the hundreds
of billions. Google, even with its incredible breadth
and reach, only catalogues eight billion of those
pages. It’s amazing that we ever find anything!”
The following are the minutes from the 11/9/04 RMIUG
Meeting, “Content Management: Wrangling the Beast”:
Josh Zapin, Dan Murray, and Jeff Finkelstein
represented the RMIUG executive committee at the
meeting. About 50 people attended. Jeremy Kohler
recorded the minutes and Josh moderated, thanking the
RMIUG sponsors for their support.
INTRODUCTION
If the Internet is the medium of the Information Age,
then content is the matter. According to Netcraft
there are more than 55 million websites in existence.
Although there are no hard statistics, some estimate
that the number of web pages is well into the hundreds
of billions. Google, even with its incredible breadth
and reach, only catalogues eight billion of those
pages. It’s amazing that we ever find anything!
Since the dawn of the Internet, many netizens have
wrestled with the challenge of making content more
digestible, manageable, and ultimately, usable.
Companies such as Vignette, Documentum, and of course
Microsoft, have spent millions inventing systems to
help organizations get a handle on their information.
According to IDC, worldwide enterprise content
management (ECM) will grow at a double-digit pace from
$2.7 billion in 2003 to $3.8 billion by 2007.
SPEAKERS
DAVE TAYLOR is the founder of Intuitive Systems, an executive management
and communications consultancy. He’s been involved
with the Internet for over 25 years, including a stint
as a research scientist at HP’s R&D Labs and another
as senior editor of Advanced Systems magazine. He’s
written 16 books including the best-selling Wicked
Cool Shell Scripts, Creating Cool Web Sites, and
Learning Unix for Mac OS X. He has extensive
experience as an entrepreneur, including having
founded four startups in the last ten years. Dave will
talk about RSS (Really Simple Syndication, a content
publishing “dialect” of XML) and weblog (or blog)
technology and how they are used in content
management. His weblogs include: The Intuitive Life,
Ask Dave Taylor!,
Real Life Debt,
Free Web Money, and
Attachment Parenting.
ROBERT A. DICKINSON is Product
Architect at Xaffire and founder of Reasonably Sane
Inc. Trained as both a software
engineer and writer, Rob has always been fascinated by
automated authoring and publishing. Having led
technical development of collaborative information
systems for several local startups, including
CorpMed.com and Achieve.com, Rob founded Reasonably
Sane in 2001 to advance multi-author, multi-format
publishing. Rob will speak about the role of document
compilers, including Wiki engines and the RSane
Compiler, in managing large structured publications
with many contributors.
MEETING MINUTES
An informal survey of the attendees revealed that most
have managed a website and maintained its content, but
hardly anyone had used a content management system.
DAVE TAYLOR
I’m intrigued–it looks like the majority of you
manage websites without any tools. I wonder how often
you manage to add content. If you’re keeping a website
up to date, how do you survive?
It’s easy enough to build a website, but when you have
a flow of data like a page per day, you don’t have the
tools to get it in there. What happens when you have
to make global changes to all of the pages, like when
your logo or copyright date changes? Many people use
Cascading Style Sheets to manage some of these
changes, and although CSS helps, it’s not the answer.
Years ago people used template-based page builders
(like Geocities) to manage content. But it didn’t work
very well because you had to force everyone into a
simple, restrictive template. And it led to web pages
that became static and rarely updated. Google and
other search engines hate static pages and put them
way at the bottom. Go to the last page of a Google
search result and you’ll find stuff that hasn’t been
updated in eons.
—Weblogs—
But what if we had a template language where you could
actually focus on the content? That’s what weblogs
are.
With weblog tools, you just type text into a box and
click submit, and the blog does all the work. This
lets you just focus on the text itself. When you click
submit, all pages, indexes, referring links, site
maps, etc get updated automatically without you having
to do any HTML. You can even set it up to
automatically notify other sites that there is a new
posting.
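To make the rebuild step concrete, here is a minimal Python sketch. It is
not how any particular weblog tool works: the Post class, the PAGE template,
and the rebuild function are all made up for illustration, and a real tool
has its own template language and storage.

```python
from dataclasses import dataclass
from html import escape
from string import Template

# Hypothetical page template; a real weblog tool has its own template language.
PAGE = Template("<html><head><title>$title</title></head>"
                "<body><h1>$title</h1>$body<p>$footer</p></body></html>")


@dataclass
class Post:
    slug: str    # used to build the file name
    title: str
    text: str    # plain text typed into the posting box


def rebuild(posts, footer):
    """Return {filename: html} for every post page plus the index page."""
    pages = {}
    links = []
    for post in posts:
        # Blank lines in the typed text become paragraph breaks.
        paragraphs = "".join(
            f"<p>{escape(p)}</p>" for p in post.text.split("\n\n"))
        name = f"{post.slug}.html"
        pages[name] = PAGE.substitute(
            title=escape(post.title), body=paragraphs, footer=escape(footer))
        links.append(f'<li><a href="{name}">{escape(post.title)}</a></li>')
    pages["index.html"] = PAGE.substitute(
        title="Index", body="<ul>" + "".join(links) + "</ul>",
        footer=escape(footer))
    return pages


posts = [Post("hello", "Hello", "First post.\n\nSecond paragraph.")]
site = rebuild(posts, "Copyright 2004")
# A global change (such as a new copyright date) is one argument here,
# not an edit to every page.
site = rebuild(posts, "Copyright 2005")
print(sorted(site))
```

The author only supplies the text; every page and the index come out of the
same template, which is why a site-wide change is cheap.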
A Wiki uses a similar system, except it’s just a
website that everyone can edit. Weblogs are more under
control, only allowing people to add content to a
predetermined format; usually you can’t change things
like navigation in a weblog.
I can also build a web page and leave it unpublished
until I’m ready. All I have to do is change a setting
from “draft” to “publish”.
You can manage multiple websites from a single blog
interface program, like MoveableType, which can be
very sophisticated. By learning and using a weblog
management tool, I have so simplified my life,
converting hours-long jobs to a few seconds.
Some people just add a weblog as an extension of a
static site, but I think it’s better to manage your
entire site with the weblog tool. Of course, getting
the templates set up can be a large task, but there
are default templates out there that you can use as
is.
You can also translate your existing web page into a
template. With a little work, you can pour your
existing HTML into a weblog template.
If you want to track changes in a weblog, you usually
have to set it up to save the old pages each time you
update. Wikis are much better at versioning.
If you use a blog, your hosting service has to support
it. Some hosting services even offer their own blog
services.
—Really Simple Syndication (RSS)—
There’s a bit of controversy over what RSS actually
stands for.
One commonly accepted version is “Really Simple
Syndication.”
When you use RSS, the servers do all the work so I
don’t have to worry about my PC.
NewsGator for Windows and Web and
NetNewsWire for Mac are good RSS
readers, but my favorite is NewsGator Online, which
offers a Web-based interface to let me read all of my
RSS feeds in one place.
With an RSS reader, I can know what’s happening on
other blogs without actually visiting the sites. Many
blogs and even nonblogs have RSS feeds, so the reader
lets me monitor all of them in one place. It shows me
what’s new–it’s the personalized newspaper that has
finally come into reality. I can aggregate everything
I’m interested in on one page. Typically, you get a
sentence or two as a teaser, with a link to the whole
article.
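As a rough sketch of what a reader does with a feed, the Python below pulls
the title, a short teaser, and the link out of each item. The sample feed
and the teasers function are invented for this example; the element names
(channel, item, title, link, description) are the standard RSS 2.0 ones.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed in RSS 2.0 form; a reader would download this from a site.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Weblog</title>
    <link>http://example.com/</link>
    <description>A sample feed</description>
    <item>
      <title>First post</title>
      <link>http://example.com/first.html</link>
      <description>Here is a fairly long description of the first post that a reader would trim.</description>
    </item>
  </channel>
</rss>"""


def teasers(feed_xml, max_chars=40):
    """Return (title, teaser, link) for each item in an RSS 2.0 document."""
    root = ET.fromstring(feed_xml)
    result = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        text = item.findtext("description", default="")
        # Trim long descriptions so the listing stays a one-line teaser.
        if len(text) > max_chars:
            text = text[:max_chars].rstrip() + "..."
        result.append((title, text, link))
    return result


for title, teaser, link in teasers(SAMPLE_FEED):
    print(f"{title}: {teaser} ({link})")
```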
Feel free to ask me any questions you may have about the wonderful world of RSS.
ROB DICKINSON
Document Compilers: Empowering Collaborative
Publishing through WYSIWYM (What You See Is What You
Mean)
Most of the information we encounter (online or
offline) is heavily structured. There are two
authoring approaches to creating this structured
information:
1) WYSIWYG (What You See Is What You Get) is “visual
typesetting,” and is lightly automated with a word
processor. It turns our PCs into very good
typewriters.
2) WYSIWYM (What You See Is What You Mean) authoring,
by contrast, compiles unformatted source content into
the final docs, highly automating the formatting
process so that authors focus on writing, not
formatting.
Collaborative projects with multiple authors present
many challenges including:
- asset distribution and tracking: where are all the
versions?
- consistent formatting
- consistent voice/terminology
- overlapping work boundaries
- review and approval processes
Regardless of which authoring approach you use, these
challenges persist and get worse with more people and
larger documents.
–Content Management Systems–
Traditional CMSs like Vignette or Documentum are
essentially ‘WYSIWYG for workgroups,’ extending the
visual typesetting model by offering:
- management and search for docs in various native
formats
- online authoring tools
- auto format conversion
- change tracking
- workflow and notification
- security & rights management
These CMSs are good at handling a large “swarm” of
different documents within an enterprise. But they
don’t do as well in cases where many people need to
work simultaneously on one integrated document.
–WYSIWYM: Document Compilers–
This “What You See Is What You Mean” approach
requires a “document compiler” program. Like other
compilers, it takes raw content from authors and does
all the work to produce a finished document.
Authors can set formatting policy through compiler
options, so they can focus on writing, accuracy, and
quality. The compiler can dynamically transform the doc
as it’s being built: content can be filtered, mined,
and indexed to personalize the content.
Many challenges with collaborative environments
(multiple authors) have to do with consistency. For
this reason, document compilers are better at handling
single, large, integrated documents rather than a
swarm of different documents (where CMSs excel).
The compiler receives a set of source files and then
digests them to produce multiple output formats:
websites, PDFs, or text. A server can manage a
collection of source files shared among
multiple authors, and the compiler generates the
multiple outputs.
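The idea of one set of sources feeding several outputs can be sketched in a
few lines of Python. This is only a toy under stated assumptions: the source
syntax (a block starting with "= " is a heading, anything else is a
paragraph) and the function names are hypothetical, and none of it is the
syntax or API of TeX, DocBook, any Wiki engine, or the RSane Compiler.

```python
from html import escape


def parse(source):
    """Turn raw source into a list of (kind, text) blocks.

    Hypothetical syntax for this sketch: a block starting with "= " is a
    heading, anything else is a paragraph; blank lines separate blocks.
    """
    blocks = []
    for chunk in source.strip().split("\n\n"):
        text = " ".join(
            line.strip() for line in chunk.splitlines() if line.strip())
        if text.startswith("= "):
            blocks.append(("heading", text[2:]))
        elif text:
            blocks.append(("para", text))
    return blocks


def to_html(blocks):
    """Render blocks as HTML; formatting policy lives here, not in the source."""
    tags = {"heading": "h1", "para": "p"}
    return "\n".join(
        f"<{tags[kind]}>{escape(text)}</{tags[kind]}>" for kind, text in blocks)


def to_text(blocks):
    """Render the same blocks as plain text with upper-case headings."""
    return "\n\n".join(
        text.upper() if kind == "heading" else text for kind, text in blocks)


def compile_document(sources):
    """Merge the authors' source files in order and emit every output format."""
    blocks = [block for source in sources for block in parse(source)]
    return {"html": to_html(blocks), "text": to_text(blocks)}


chapter_one = "= Overview\n\nWritten by the first author."
chapter_two = "= Details\n\nWritten by the\nsecond author."
outputs = compile_document([chapter_one, chapter_two])
print(outputs["html"])
print(outputs["text"])
```

Because the authors never touch the formatting code, changing the look of
every output means changing to_html or to_text once and recompiling.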
Compilers don’t require you to learn any software.
TeX, DocBook, Wiki, and the RSane Compiler are all
document compilers. When it comes to doc compilers,
“It’s all about the source syntax,” which is what the
authors have to use to create the content.
TeX: (tug.org) —
An early typesetting system. It’s popular
in science and academia because it’s good with
formulas. TeX outputs DVI files that get converted to
PDF, HTML, etc. The source format is tagged text, with
backslashes and brackets. A pretty cryptic format,
but good in the scientific world.
DocBook (docbook.org) —
This is just an XML standard, and it uses XSLT and/or
XSL-FO for output. It was originally an interchange
format to create docs portable between different press
systems. Now it outputs PDF, HTML, and text. The
source syntax is XML, which is complicated. Generally
you’ll need a professional tech writer to produce it.
Wiki —
Originally these were just editable websites, which
come in many open-source flavors. Wikis allowed a
bunch of people to brainstorm online. More recently,
Wikis are being used for more structured
collaborative documentation. But they only output to
the web. On the plus side, the source syntax is very
simple. No tags or code to learn, just a few simple
conventions. I can just write content, more or less.
RSane Compiler —
RSane combines ideas from all of the above, and is
more geared to use in your own custom applications.
It’s a Java compiler used in server applications, with
multiple outputs (web, print, text) and versioning,
and it is good for large structured documents. The
source syntax is also simple, so people are able to
zero in on the content.
It includes a management area where you can check in
and out the source docs. And if you want you can do it
right in your browser like a Wiki. It lets you
increase your delivery opportunities while dealing
with collaborative creation.
Wikis and RSane (soon, at least) will track and save
source file history. RSane uses passwords to
protect the docs from unauthorized changes, but
visitors can browse published content without
registering first.
We wanted to publish a 400-page book four times
a year with updated content each time; RSane
is designed to support that kind of
effort. Free downloads are available at RSane.com.
QUESTIONS AND ANSWERS
Who is RSane targeted to?
Rob: We’re looking at several vertical opportunities,
but large regulatory (government) docs are a
good target market. We’re generally targeting any big
documents that represent an ongoing investment of
work, where multiple authors are involved.
MoveableType is easy to use, but how does it perform
in search engines?
Dave: Google loves blogs. Search engines like things
that change a lot. When you add content or tweak the
templates, the entire site gets rebuilt and the
engines love it.
Rob: Separating content from presentation is a very
good thing, you just have to pick the right tool to do
it.
Do blogs use another language?
Dave: Yes, but the output to the web is always
HTML–the languages are server includes. MoveableType
is very easy, but it is a very rich and sophisticated
environment with lots to do. There is some work, but
once you set up the templates, you never have to worry
about it again.
If all the stuff is on the server, I have to use a
browser and a connection, with no backups, it gets
hard to track stuff.
Dave: There are some apps that let you work locally
and have the app do all the server interaction. Email
me for suggestions.
Rob: With many CMSs, those are common problems. With
doc compilers like RSane, you can work either online
or offline, whatever is more convenient. The server
application still provides the main repository, but
you shouldn’t be chained to it.
Enterprises don’t host wiki and blog servers, do
they?
Rob/Dave: No, generally they don’t. (A few commercial
ventures do.)
When will RSS come of age, where I can use it to
distribute info from my enterprise, like sending out
tech bulletins?
Dave: It’s coming of age right now–CNN, New York
Times, Wall Street Journal use it. Currently there are
wars about what format it will take. It’s aggressively
evolving right now. We are just at the beginning of
the adoption curve. Within 2-3 years most pages will
have RSS feeds on them, I predict. Even cell phones
might have RSS. It saves you space because it’s all on
the servers.
Rob: Adoption always depends on who’s reading it and
what tools they are using. Maybe RSS is appropriate
for your audience. It does make a nice alternative to
email and avoids spam.
What about selling to other kinds of people, like the
corporate world?
Dave: Corporate culture might find it threatening,
where anyone can add content. Companies don’t
understand this. There is an inherent lag in adoption
of new technologies by corporations. You’ll need
champions in the company to push alternatives.
Rob: With adoption of open source, suddenly
corporations turned to Linux because the license is
free. In particular, open source makes adoption
easier.
Since evaluating or even adopting open tools doesn’t
require royalty payments, it’s easier to get rolling
in corporate environments. The big-dollar CMSs are an
entirely different story.
Do you know any good open source content management
apps?
Audience input: RSane, Slight Project, openCMS, Twiki,
Xoops.
How about social networking? Or when you have a very
large group?
Rob: It helps to have some strong leadership with some
centralized control. There’s no magic technical
bullet. This lesson comes directly from the open
source community–pick any large project, and there’s
a relatively small group of folks at the core that are
coordinating and leading the effort.
Dave: I’m in the decentralized camp. On social
networking, blogs can bring together a community
around a topic of interest, and topical searches hit
my site because other similar topic sites are static.
There are models out there that let you create all
kinds of online communities. And you can have enough
control to facilitate a little.
Can MoveableType coexist with existing webpages, or do
you have to convert it all?
Dave: Coexistence is ok, you can have old material sit
in the blog statically and only reformat some of it.
Or just do a little work at a time to convert it–you
don’t have to do it all at once. You’ll have to rename
pages, but Apache can do name mapping (more
work) on the server if you don’t want old links from
outside to break.
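One way to do that name mapping, assuming Apache's mod_alias module is
enabled, is a Redirect directive per renamed page. The Python sketch below
just generates those lines from a table of old and new paths; the paths, the
site address, and the redirect_lines helper are hypothetical.

```python
def redirect_lines(renamed, site="http://www.example.com"):
    """Build one mod_alias Redirect directive per renamed page.

    renamed maps the old URL path to the new one; site is a placeholder
    address that would be replaced with the real host name.
    """
    return [f"Redirect permanent {old} {site}{new}"
            for old, new in sorted(renamed.items())]


renamed = {
    "/rss.html": "/archives/what-is-rss.html",
    "/wiki.html": "/archives/wikis.html",
}
# The output is meant to be pasted into the server configuration
# (or an .htaccess file, where the host allows it).
print("\n".join(redirect_lines(renamed)))
```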
What about RSS to make government info and public
meetings more accessible?
Dave: I imagine the city of Boulder could have an RSS
feed that aggregates all city info (from multiple RSS
feeds) into one place. That’s a great idea for RSS.
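A minimal sketch of that kind of aggregation in Python, using made-up
department feeds in place of real ones. The feed and aggregate functions,
the feed titles, and the example.com links are all assumptions for
illustration; a real aggregator would also fetch the feeds over HTTP and
sort items by date.

```python
import xml.etree.ElementTree as ET


def new_channel(title, link):
    """Create an empty RSS 2.0 document and return (rss, channel)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = title
    return rss, channel


def feed(title, link, items):
    """Build a tiny feed (a stand-in for one department's real feed)."""
    rss, channel = new_channel(title, link)
    for item_title, item_link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
    return ET.tostring(rss, encoding="unicode")


def aggregate(title, link, feeds):
    """Copy every item from the given feeds into one combined feed."""
    rss, channel = new_channel(title, link)
    for feed_xml in feeds:
        source = ET.fromstring(feed_xml)
        source_title = source.findtext("channel/title", default="")
        for item in source.iter("item"):
            # Prefix the item title so readers can tell where it came from.
            item_title = item.find("title")
            if item_title is not None:
                item_title.text = f"[{source_title}] {item_title.text or ''}"
            channel.append(item)
    return ET.tostring(rss, encoding="unicode")


parks = feed("Parks", "http://example.com/parks",
             [("Trail closed", "http://example.com/parks/1")])
council = feed("Council", "http://example.com/council",
               [("Agenda posted", "http://example.com/council/1")])
print(aggregate("City news", "http://example.com/", [parks, council]))
```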
and so the meeting ended…
We are working on developing an external interface to our humongous data warehouse. We are developing subscription-based access. Any pointers or thoughts on subscription-based access to business intelligence applications?
Russ Zink
Hi Russell. I’ll strongly suggest that you check out the platform that Bill French and Andy Seidl have been building at http://www.myst-technology.com/ It has all the capabilities you need and quite a bit more too. Plenty of very large tech companies are finding their system a very good way to structure and disseminate their information too. I’ll ask Bill to post a note here too, so you can get a heads up on their app.
Russ:
We build business intelligence systems ourselves, but more importantly, we use the MyST platform to provide a unified integration layer for business intelligence content that must be available in a variety of syndication formats (securely of course). It’s not enough (as you might expect) to secure an RSS feed. To meet rigid security requirements of enterprises, especially with regard to business and competitive intelligence, you have to create a granular permissions model that provides a security context at the RSS item level. And once you achieve that, you’ll want that context to apply whether you’re using a corporate Blog for BI, or an Atom feed of recent product announcements on a competitor’s site.
Give me a shout if you’d like to learn more about the things we’ve done with blogs, RSS and BI/CI applications.
[[[If all the stuff is on the server, I have to use a browser and a connection, with no backups, it gets hard to track stuff.]]]
Regarding backups – I agree – this is disconcerting. However, most blog products have an API of one form or another, so the data is there – you just have to know how to script a process to get it periodically. Some blog tools vendors will say “Just use the RSS feed as the backup mechanism.” – and in some cases this may be a suitable model, but only if the RSS feed includes the full content of each post. If that is indeed the case, you can easily set a newsreader to cache all your blog content locally.
[[[What about RSS to make government info and public meetings more accessible?]]]
The most progressive state government doing this is the State of Utah. In fact, they are so pro-RSS/Blogs, I recently presented at a State-sponsored RSS Summit in Salt Lake City. Check out:
http://rssgov.com/rssworkshop.html
http://rssgov.com/
Ray Matthews (raymatthews@utahgov.net) is the most informed person about eGov use of RSS in Utah (and perhaps the country).
[[[ When will RSS come of age, where I can use it to distribute info from my enterprise, like sending out tech bulletins? ]]]
RSS is a specification – either it’s a good implementation strategy for an enterprise problem, or it isn’t. The more relevant question – when will tools and applications learn to take advantage of this specification in a way that enterprises can utilize to improve operations?
Enterprises have started to realize that [public] RSS feeds are important because surfing to see what’s new is a bad idea – it’s wasteful and counterproductive. The impact of RSS on operational efficiency for information workers is likely to be significant. But this requires enterprise-quality, scalable, and security-minded tools and platforms for creating, managing, and hosting RSS for a variety of use cases.
[AD]We presently provide such systems for Intel, VeriSign, and SBS Technologies who use RSS channels for a variety of objectives, both public and private.[/AD]
So, in my view RSS is ready for enterprises, but as Dave suggests, not all enterprises are ready for RSS. 😉