Louise Spiteri—User-generated metadata: boon or bust for indexing and controlled vocabularies? (ISC conference 2013)

Louise Spiteri is the director of the School of Information Management at Dalhousie University, and she spoke at the ISC conference about social tagging and folksonomies. As a trained cataloguer, Spiteri told us, “I’m a firm believer in controlled vocabularies, but we have to accept the fact that that’s not what our clients use.” She added, “User-generated metadata is here. Let’s accept it and learn to work with it rather than against it.”

Traditionally, a document’s metadata has been the purview of cataloguers, information architects, and professional indexers. Users could search for an item based on its existing classification, but they couldn’t amend that item’s categorization and organization based on their own needs and understanding.

In recent years, however, many blog and social media platforms have made it possible for users to store and categorize items—blog posts, photos, music, articles, and so on—based on their interests. They can organize these items by adding their own keywords, and in many cases they can add further metadata in the form of ratings or reviews.

Users typically add keywords in the form of tags, which are non-hierarchical. Sites such as Delicious, CiteULike, and Flickr popularized a social dimension to tagging: users could not only tag information but also share those tags with a wider community. The collective tagging of such a community produces a folksonomy (a portmanteau of “folk” and “taxonomy”), the set of terms that a group of users has used to tag content. Although such a set is open and uncontrolled, some sites recommend tags based on what others have assigned, which creates the potential for consensus.
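To make the idea of a folksonomy and of tag recommendation more concrete, here is a minimal Python sketch. It assumes a hypothetical list of (user, item, tag) tagging events, treats the folksonomy as per-item tag counts, and recommends the tags other users have assigned most often. The function names and data are illustrative only; this is not how Delicious, CiteULike, Flickr, or any other particular site actually implemented recommendations.

```python
from collections import Counter, defaultdict

# A tagging event is (user, item, tag). This shape is an assumption made
# for illustration, not the data model of any real platform.
TagEvent = tuple[str, str, str]


def build_folksonomy(events: list[TagEvent]) -> dict[str, Counter]:
    """Aggregate all users' tags into per-item tag counts.

    Duplicate events (the same user applying the same tag to the same item
    twice) are counted once; dict.fromkeys removes them while keeping order.
    """
    folksonomy: dict[str, Counter] = defaultdict(Counter)
    for _user, item, tag in dict.fromkeys(events):
        folksonomy[item][tag] += 1
    return folksonomy


def recommend_tags(folksonomy: dict[str, Counter], item: str, k: int = 3) -> list[str]:
    """Suggest the k tags most often assigned to an item by other users."""
    return [tag for tag, _count in folksonomy.get(item, Counter()).most_common(k)]


events = [
    ("ann", "photo1", "sunset"),
    ("bob", "photo1", "sunset"),
    ("cai", "photo1", "beach"),
    ("ann", "photo1", "sunset"),  # repeat by the same user; counted once
]
folksonomy = build_folksonomy(events)
print(recommend_tags(folksonomy, "photo1"))  # ['sunset', 'beach']
```

Counting each user's tag only once is one simple way to keep a single enthusiastic tagger from dominating the suggestions. Because users tend to accept suggested tags, the most frequent tags reinforce themselves over time, which is one plausible route to consensus.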

User tagging has its limitations, of course, from ambiguity and polysemy (does the tag “port” refer to wine, a computer port, or the left side of a ship?) to synonymy (especially spelling variants and singular versus plural nouns) to variation in specificity. But it can also be enormously powerful. In some communities, for example, dedicated users (avid fans who are intimately familiar with the content) can generate tags that are more useful and informative than the classifications offered by a vendor or a cataloguer, who is more likely to do only the minimum level of cataloguing. Social tagging’s major strength is that terms can be tailored to users’ own needs. Further, folksonomies can adapt quickly to changes in user vocabulary, accommodating new terms at virtually no cost to the user or the system. Over time, particularly if the platform recommends tags, an item’s tags tend to stabilize into an organically curated set.
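Some of the synonymy problems listed above can be reduced mechanically. The Python sketch below assumes a hypothetical, indexer-maintained table of preferred forms (`PREFERRED`) and pools variant tags under a single form. Note what it cannot do: “port” stays a single ambiguous tag, because polysemy needs context or human judgment rather than string matching.

```python
from collections import Counter

# Hypothetical variant table an indexer might maintain, mapping spelling
# variants, plurals, and synonyms to one preferred tag. Illustrative only.
PREFERRED = {
    "colour": "color",
    "cats": "cat",
    "photos": "photo",
    "photograph": "photo",
}


def normalize_tag(raw: str) -> str:
    """Case-fold, trim, and collapse whitespace, then apply the variant table."""
    tag = " ".join(raw.casefold().split())
    return PREFERRED.get(tag, tag)


def merge_tag_counts(tags: list[str]) -> Counter:
    """Count tags after normalization so that variants pool their counts."""
    return Counter(normalize_tag(tag) for tag in tags)


# "port" passes through unchanged: normalization cannot tell wine from ships.
print(merge_tag_counts(["Cats", "cat", " CAT ", "port"]))  # Counter({'cat': 3, 'port': 1})
```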

Spiteri also briefly discussed newer forms of social tagging, including hashtags and geotags. Hashtags, common on Twitter, Tumblr, Instagram, and now Facebook, allow users to quickly follow a stream of content about a particular topic. However, they suffer from the same problems as any uncontrolled vocabulary; Spiteri strongly advocated promoting an official hashtag for a public event so that everyone uses the same one and the conversation isn’t split among multiple streams. Geotags, by contrast, add geographic metadata to information, allowing users to follow location-based news or identify where a photo was taken, for example. Because they are often given in numerical form, such as latitude, longitude, and altitude, they are likely to be more consistent.

Social tagging, Spiteri emphasized, isn’t going away. How should we as indexers work with it? Ideally, we would have a system that combines controlled vocabularies and tags. On many blogs, for example, you can assign a post to one or more categories, which can be tightly controlled. User tags can then supplement or complement these categories, serving special user-focused functions. For instance, in multicultural communities, users can tag an item in their own language. Tags can also connect like-minded users, a function that controlled vocabularies don’t readily support. Most importantly, indexers can learn from user tags, adapting their subject headings to the language of their clients.
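One way indexers might learn from user tags is to look for terms that users apply often but that the controlled vocabulary lacks. This Python sketch assumes a hypothetical flat list of user tags and a controlled vocabulary given as a list of preferred terms; the function name `candidate_terms` and the `min_count` threshold are illustrative choices. Its output is only a list of candidates for a human indexer to review, not new subject headings.

```python
from collections import Counter


def candidate_terms(
    tags: list[str], controlled_vocabulary: list[str], min_count: int = 2
) -> list[str]:
    """Return frequent user tags that do not match any controlled term.

    Matching is a simple case-insensitive comparison; results are sorted by
    descending frequency, then alphabetically.
    """
    counts = Counter(" ".join(tag.casefold().split()) for tag in tags)
    vocabulary = {term.casefold() for term in controlled_vocabulary}
    candidates = [
        tag for tag, n in counts.items() if n >= min_count and tag not in vocabulary
    ]
    return sorted(candidates, key=lambda tag: (-counts[tag], tag))


tags = ["Selfie", "selfie", "selfie ", "Portraits", "portraits", "landscape"]
vocabulary = ["Portraits", "Landscapes"]
print(candidate_terms(tags, vocabulary))  # ['selfie']
print(candidate_terms(tags, vocabulary, min_count=1))  # ['selfie', 'landscape']
```

The second call shows a limitation worth noticing: “landscape” is flagged only because the vocabulary uses the plural “Landscapes”, so a real workflow would need to reconcile singular and plural variants before comparing.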

