So you want to taxonomize?

In theory, taxonomies beat categories hands down for organizing content.

The main advantages are flexibility and precision.

With taxonomic tagging, you describe content on an atomic level, very precisely, and then you build queries to get lists of the stories (or photos or whatever) that you want.

And because a true taxonomy maintains a parent-child relationship between entities on the taxonomy tree, you can go up or down the taxonomy tree to build general or specific queries.
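Here is a rough sketch of that up-and-down idea in Python. The tag names, the tiny tree and the story titles are all made up for illustration, and a real CMS would keep this in a database rather than a dictionary.

```python
# Hypothetical taxonomy: each tag maps to its parent (None marks the root).
PARENT = {
    "sports": None,
    "basketball": "sports",
    "college basketball": "basketball",
    "pro basketball": "basketball",
    "baseball": "sports",
}

def ancestors(tag):
    """Walk up the tree: the tag's parent, grandparent, and so on."""
    chain = []
    parent = PARENT[tag]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain

def descendants(tag):
    """Walk down the tree: every tag that sits anywhere below this one."""
    return [t for t in PARENT if tag in ancestors(t)]

# Made-up stories, each tagged once at the most specific level.
STORIES = {
    "Wildcats reach the regional final": "college basketball",
    "Trade deadline roundup": "pro basketball",
    "Opening day preview": "baseball",
}

def stories_under(tag):
    """A general query: stories tagged with this tag or anything below it."""
    wanted = {tag, *descendants(tag)}
    return sorted(title for title, t in STORIES.items() if t in wanted)
```

Asking for stories_under("sports") picks up all three stories, while stories_under("basketball") narrows it to the two basketball pieces, even though nobody ever tagged anything with the bare "basketball" tag.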

So how does this build flexibility?

First of all, categories change. All of a sudden you may wake up and realize you really need to have a section devoted to the business of sports. If you’ve been taxonomizing well, that’s no problem — you’ve probably got a bunch of stuff that already has both business and sports tags. Now you just pull them together in a query, attach the query to a page or a block, and away you go.
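To make that concrete, here is a minimal sketch of the "business of sports" query in Python. The stories and tags are invented for the example, and with_all_tags is just a hypothetical helper name, not something from any particular CMS.

```python
# Made-up stories with the tags an editor might have applied over the years.
STORIES = [
    {"title": "Stadium naming rights sold", "tags": {"business", "sports"}},
    {"title": "Local bakery expands", "tags": {"business", "food"}},
    {"title": "High school playoffs begin", "tags": {"sports", "high school"}},
    {"title": "Team owner seeks tax break", "tags": {"sports", "business", "politics"}},
]

def with_all_tags(stories, *required):
    """Return titles of stories that carry every one of the required tags."""
    needed = set(required)
    return [s["title"] for s in stories if needed <= s["tags"]]

# The new section is nothing more than a saved query over existing tags.
sports_business = with_all_tags(STORIES, "business", "sports")
```

Nothing gets retagged here. The new section falls out of tags that were already sitting on the stories.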

But if you’ve been a category-based site, and you never had that particular “sports business” category before, then nobody has ever categorized anything helpful. So you’ve got to search through the database and manually recategorize the stuff you want.

Taxonomies also give you much greater precision, because you’re not just categorizing something as “sports,” you’re tagging it “sports>basketball” and you’re also tagging it for professional, college or high school, and you might even be tagging it as a tournament game. Want that Final Four page? No problem.
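A quick sketch of how that kind of multi-part tagging might look, again in Python. The "topic" value uses the path-style tag from the paragraph above. The "level" and "event" field names, the match helper and the stories themselves are assumptions made up for the example.

```python
# Made-up stories. "topic" is the path-style tag; "level" and "event"
# are hypothetical extra facets an editor fills in.
STORIES = [
    {"title": "Semifinal preview", "topic": "sports>basketball",
     "level": "college", "event": "tournament"},
    {"title": "Conference opener recap", "topic": "sports>basketball",
     "level": "college", "event": None},
    {"title": "State title game", "topic": "sports>basketball",
     "level": "high school", "event": "tournament"},
    {"title": "Bowl game preview", "topic": "sports>football",
     "level": "college", "event": None},
]

def match(stories, **facets):
    """Titles of stories whose facets equal every value asked for."""
    return [s["title"] for s in stories
            if all(s.get(key) == value for key, value in facets.items())]

# The tight list: college basketball tournament stories only.
final_four = match(STORIES, topic="sports>basketball",
                   level="college", event="tournament")

# The loose list: anything whose topic path starts at "sports".
all_sports = [s["title"] for s in STORIES
              if s["topic"].split(">")[0] == "sports"]
```

The more facets you fill in, the tighter the list gets. Leave them out and you are back to a plain "sports" category.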

Even if you don’t go around adding new sections or content lists all that often, the inherent precision of taxonomies can help you get much tighter lists of related articles.

That’s the theory, anyway. In reality, there are lots of challenges to making taxonomies work for your site, from uncooperative content management systems to basic human limitations.

More on those challenges later, but for now check out the IPTC’s NewsCodes taxonomy, which is a pretty good starting point if you’re going to use or develop a taxonomy for your site. Also, this (somewhat outdated) history of the IPTC’s work in this area is helpful. It’s focused on image tagging, but it gives a good overview of the evolution of the IPTC standards work. And the Dublin Core Metadata Initiative has some pretty clear explanations of what all this is about.


Tools for online journalists

I’ve started a page of useful tools & sites for online journalists. It’s a work in progress, and still a bit of a grab-bag list, but you’re sure to find something interesting there. Over the next few days & weeks I’ll dig deeper into my bookmarks and try to annotate, illustrate and organize it. Suggestions welcome!


Some people aspire to immortality through art or politics, or by building a business empire. I’d love to have my name attached to a law.

Not a criminal law, mind you, but something like Moore’s Law — something that captures some fundamental insight clearly and simply.

Here’s my current top candidate:

Wilson’s Power Law of Software Intelligence.

(OK, so I may need to work on the name a bit…)

It’s sort of a kissing cousin to Moore’s Law and to the parallel law that states that bandwidth is increasing in a similarly predictable fashion. (I’ve seen variants referred to as simply the bandwidth law, Nielsen’s Law after Jakob Nielsen, and Edholm’s law of bandwidth.)

My law states that software gets twice as smart and costs half as much every two years.
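Taken at face value, the arithmetic is easy to sketch in Python. The function name and the choice of years to print are mine, and the doubling and halving figures are simply the ones stated in the law, not measurements of anything.

```python
def projected(years, doubling_period=2.0):
    """Relative smartness and cost after `years`, taking the law at face value."""
    periods = years / doubling_period
    smartness = 2 ** periods   # twice as smart each period
    cost = 0.5 ** periods      # half the cost each period
    return smartness, cost

# Print a few points on the curve, relative to today (1.0x smart, 100% cost).
for years in (2, 4, 6):
    smart, cost = projected(years)
    print(f"{years} years: {smart:.1f}x as smart at {cost:.1%} of the cost")
```

After three periods, or six years, that works out to software eight times as smart at an eighth of the price.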

Every time I see a Google ad I think of it. And therein lies a story and a great example of the phenomenon…

Back in 2003, as we were signing on with Saxotech to use their Publicus CMS, we helped push them to strike a deal with a company called Applied Semantics. We were moving toward a taxonomy-based site, and we saw the consistent application of relevant meta-data as a potential problem. Applied Semantics had some great black-box technology that used god-only-knows-what kind of fuzzy logic and proprietary algorithms to automatically tag content. It was also crazy expensive: hundreds of thousands of dollars for an installation. Only a few big media companies were using it. Anyway, Saxotech and Applied Semantics struck a deal that would have allowed Saxotech to resell the service to customers like us.

But before we got our site up, Google bought Applied Semantics. Now the Applied Semantics technology is apparently part of what powers AdSense. (Go to and you’ll land on an old Google page touting AdSense.)

So we never got to use it.

But seven years later you can pump your content through any number of free automatic meta-data-tagging services like Calais. In just seven years, this kind of hyper-intelligent software service has gone from crazy expensive to free.

The same dynamic seems to play out in other areas as well, from virtualization to content management systems to web design software.

And if it really holds up as a general rule, my law has huge implications, certainly for the developers of smart software, but also for the consumers. Especially for those of us lower down on the business scale. What we see today in the hands of the big boys is likely to be within reach in two years and ubiquitous two years after that.

The problem with my law, unfortunately, is that it’s inherently hard to quantify software intelligence or complexity. And, yes, it’s true that I did just completely make up the two years bit above, as well as the doubling. Artistic license. But I think there’s something there. I’ll keep working on it. I’m not ready to give up on this shot at immortality yet.

Souper sites

Met today with Richard Anderson of Village Soup, who gave us an overview of the integrated print/online content management system they’ve developed to run their own sites and are now trying to spread to other news organizations.

The CMS is interesting (there’s an open-source version and an enterprise version they’re selling), but what really impresses me is how successful they’ve been in getting local businesses and organizations involved in their online communities. Through what they call the bizMember Program, they give local businesses what amount to directory profile pages that double as blogs. Then they publish the business posts prominently on their home page, in a kind of ticker box that gets equal billing with their own staff content. It seems to set up a self-reinforcing situation where businesses are driven to participate to stay on top (and visible), which not incidentally gives the sites a lot of fresh and locally useful information.

bizBriefs next to staff content at Village Soup site.

The Village Soup group also runs four print weeklies, but they claim to get 21 percent of their revenue from online. Impressive.