Recently I have been interested in creating hierarchical taxonomies from flat tag data. Tagging systems like del.icio.us, Flickr, and CiteULike tend to have (relatively) flat tags. This means that while one can easily browse by a tag, like photography, one cannot as easily see tags which are broader or narrower than that tag. As a result, it is also difficult to get a broad overview of what tags exist in these systems, aside from frequency-based displays like tag clouds.
Some commentators have suggested that ontology is overrated, even irrelevant, and that there is no hierarchy in ideas, only links.
Tagging systems are excellent at the task that they were designed for: allowing a large, disparate group of users to collaboratively label massive, dynamic information systems like the web, media collections of millions of images, and so on. We are working to make these systems better by automatically producing hierarchical taxonomies that describe the data, starting from the raw flat tags generated by users.
I've found some interesting features of tagging datasets from del.icio.us and CiteULike which have in turn suggested reasonably good ways to create hierarchies. An example hierarchy generated using some of these methods from del.icio.us is here: mgfgsm-hierarchy.
|Title:|Collaborative Creation of Communal Hierarchical Taxonomies in Social Tagging Systems|
|Authors:|Paul Heymann and Hector Garcia-Molina|
|Type:|Preliminary Technical Report|
|Accessible:|(info) (ps) (pdf)|
|Description:|This paper describes a simple algorithm for constructing hierarchies in social tagging systems that usually works reasonably well. The main contribution is a notion of generality in social tagging systems based on centrality in a similarity graph.|
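
To make the idea of "generality as centrality in a similarity graph" more concrete, here is a minimal sketch in Python. It is not the algorithm from the report. It assumes cosine similarity between tag vectors, uses plain degree centrality as a stand-in for whatever centrality measure the report uses, and attaches each tag greedily to its most similar already-placed tag. The names `cosine`, `build_hierarchy`, `graph_threshold`, and `attach_threshold`, along with the toy data, are all hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def build_hierarchy(tag_vectors, graph_threshold=0.3, attach_threshold=0.3):
    """Return a {tag: parent} map; top-level tags get the parent "ROOT".

    tag_vectors maps each tag to {document_id: times the tag was applied}.
    Both thresholds are illustrative assumptions, not values from the report.
    """
    tags = sorted(tag_vectors)
    sim = {(a, b): cosine(tag_vectors[a], tag_vectors[b])
           for a in tags for b in tags if a != b}

    # Similarity graph: an edge joins two tags whose similarity meets the
    # threshold. Degree centrality (neighbour count) serves here as a
    # simple proxy for how general a tag is.
    centrality = {a: sum(1 for b in tags if a != b and sim[(a, b)] >= graph_threshold)
                  for a in tags}

    # Place the most central (most general) tags first; ties break by name.
    order = sorted(tags, key=lambda t: (-centrality[t], t))

    parent = {}
    for tag in order:
        # Find the most similar tag that is already in the hierarchy.
        best = max(parent, key=lambda p: sim[(tag, p)], default=None)
        if best is not None and sim[(tag, best)] >= attach_threshold:
            parent[tag] = best
        else:
            parent[tag] = "ROOT"
    return parent


if __name__ == "__main__":
    # Toy data: which documents each tag was applied to.
    toy = {
        "photography": {"d1": 1, "d2": 1, "d3": 1, "d4": 1},
        "camera": {"d1": 1, "d2": 1},
        "portrait": {"d3": 1, "d4": 1},
        "python": {"d5": 1, "d6": 1},
    }
    for tag, par in build_hierarchy(toy).items():
        print(f"{tag} -> {par}")
```

On this toy input, photography co-occurs with both camera and portrait, so it is the most central tag and is placed first; camera and portrait then attach beneath it, while python, which shares no documents with the others, becomes a second top-level tag. A real dataset would need tuned thresholds and probably a more robust centrality measure.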