Blog

ISC Conference 2012, Day 1—Indexing National Film Board of Canada images

NFB librarian Katherine Kasirer showed ISC conference attendees what’s involved in indexing the National Film Board’s collection, particularly its Stock Shot library.

We all know the National Film Board as a Canadian institution. It was established in 1939 and has about 13,000 titles in its catalogue, including feature-length documentaries and short animated films. Only 2,500 are available through the NFB.ca website, and these are the result of the NFB’s ongoing project to digitize all films and make them available for streaming.

The NFB also has what it calls the Stock Shot library (or the “Images” database), which is a collection of discarded footage (outtakes) that can be used in other productions. The database also includes

  • the Canadian Army Film and Photo Units (CAFPU) collection, deposited in 1946
  • the Associated Screen News collection
  • captured materials from World War II (German war propaganda)
  • the Canadian Government Motion Picture collection

Users might be, say, music video or commercial producers, researchers, or documentary and feature filmmakers. The database has very fine subject indexing to allow users to find exactly what they need. Since filmmakers often have to convey a particular mood or show a specific object or event, the indexing must include a number of elements of information to help users retrieve the desired footage, including

  • subject
  • location
  • shooting conditions (e.g., foggy, sunny)
  • time of day, season
  • camera angles (e.g., close-up, aerial shot)
  • year of production
  • special effects (e.g., underwater, time-lapse)
  • camera operator
  • film (title of film that produced the outtakes)
  • technical details
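For readers who think in code, here is a rough sketch of how faceted records like these might be modelled and searched. It is only an illustration: the `StockShot` class, its field names, the `find_shots` helper, and the sample records are all made up for this post and are not the NFB's actual schema or search engine.

```python
from dataclasses import dataclass


@dataclass
class StockShot:
    """One indexed clip. Field names are illustrative, not the NFB's schema."""
    subject: str
    location: str
    shooting_conditions: str
    time_of_day: str
    season: str
    camera_angle: str
    year: int
    special_effects: str = ""
    camera_operator: str = ""
    source_film: str = ""


def find_shots(shots, **criteria):
    """Return the shots whose fields equal every given criterion.

    Text fields are compared case-insensitively; other fields exactly.
    """
    def matches(shot):
        for name, wanted in criteria.items():
            actual = getattr(shot, name)
            if isinstance(actual, str):
                if actual.lower() != str(wanted).lower():
                    return False
            elif actual != wanted:
                return False
        return True

    return [shot for shot in shots if matches(shot)]


# Made-up sample records, purely for demonstration.
shots = [
    StockShot("harbour", "Halifax", "foggy", "dawn", "autumn", "aerial shot", 1958),
    StockShot("harbour", "Vancouver", "sunny", "noon", "summer", "close-up", 1972),
]

# A filmmaker after a moody establishing shot might combine two facets.
foggy_aerials = find_shots(shots, shooting_conditions="foggy", camera_angle="aerial shot")
```

The point of the sketch is simply that each indexing element becomes a separate field, so a search can combine mood, angle, and place rather than relying on a single subject heading.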

The search is, of course, bilingual, and will bring up images and clips, not just a written description. Kasirer’s presentation really drove home how specific and often how nuanced image and footage indexing can be.

ISC Conference 2012, Day 1—Building a bilingual taxonomy for ordinary images indexing

Elaine Ménard gave ISC conference attendees a glimpse into the world of information science research. An assistant professor in the school of information studies at McGill University, Ménard embarked on a project to develop a bilingual taxonomy to see how controlled vocabularies can assist in both indexing and information retrieval. Taxonomies are inherently labour intensive to create, and the bilingualism adds an additional complication.

Ménard’s Taxonomy for Image Indexing And RetrievAl (TIIARA) project consists of three phases:

  1. a best practices review,
  2. development of the taxonomy, and
  3. testing and refinement of the taxonomy.

Phase 3 is currently underway, and she gave us an overview of the first two phases.

In phase 1, Ménard and her team evaluated 150 resources, including 70 image collections held by libraries, museums, image search engines, and commercial stock agencies and 80 image-sharing platforms with user-generated tagging. They discovered that 40% of the metadata dealt with the image’s dimensions, material, and source, and 50% of the metadata addressed copyright information, with the balance devoted to subject classification. This review of best practices constituted the basis of phase 2.

In phase 2, Ménard’s team constructed an image database and developed the top-level categories and subcategories of the taxonomy. To create the database, they solicited voluntary submissions and ended up with a database, called Images DOnated Liberally (IDOL), of over 6,000 photos from 14 contributors. Her taxonomy kept in mind Miller’s Law of 7 +/- 2 and featured (after a series of revisions and refinements) nine top-level categories, designed to help users with retrieval while being as broad as possible, and a further forty-three second-level categories.

After the category headings were translated, two volunteers, one anglophone and the other francophone, tested the preliminary taxonomy through a card-sorting game, in which they were instructed to sort the second-level cards according to whatever structure they desired and provide a heading for each sorted group. This pretest showed a polarization of “splitters” and “lumpers” and didn’t provide any practical recommendations for the taxonomy but did suggest revisions to the card-sorting exercise.

Ten participants (five male, five female; five anglophone, five francophone) were recruited to test the taxonomy to expose problematic categories in the structure. Half of the group was instructed to sort the second-level categories according to the existing first-level structure; the other half could sort the second-level categories as they pleased. Through this test Ménard hoped to assess how well each category and subcategory was understood; the differences between the French and English sorts would reveal nuances that had to be taken into account in the translation of the structure.

Results showed that the first-level categories of “Arts,” “Places,” and “Nature” were well understood but that “Abstractions,” “Activities,” and “Business and Industry” were problematic. Feedback from participants helped researchers clarify the taxonomic structure to seven first-level headings. Interestingly, Ménard found fewer disparities between the languages than expected.

The revised TIIARA structure was refined to include second-, third-, and fourth-level subcategories and was simultaneously developed in English and French.

In phase 3, underway now, two indexers (one English, one French) will work to index all images in the IDOL database according to the TIIARA structure. Iterative user testing will be carried out to validate and refine the taxonomy.

So far the study has shown that language barriers still prevent users from easily accessing information, including visual resources, and a bilingual taxonomy is a definite benefit for image searchers. Eventually the aim is to implement TIIARA in an image search engine.

An amazing honour

Holy crap! I won the Tom Fairley Award for Editorial Excellence!

The award was announced at the Editors’ Association of Canada banquet on the evening of Saturday, June 2, and I was completely surprised. Thanks again to author Florian Werner, translator Doris Ecker, foreworder Temple Grandin and her wonderful assistant Cheryl Miller, proofreader Lara Smith, designer Naomi MacDougall (interior), cover designer Peter Cocking, and the whole D&M production department for making Cow happen. Thanks to Rob Sanders for trusting me with the project, and a million thanks to Nancy Flight for nominating me for the award and for encouraging me along the way.

Thanks to the EAC for having this award in the first place, and thanks to the donors, awards committee, and judges for this tremendous honour.

I had the pleasure of having a long conversation with fellow finalist Peter Midgley about our respective editing projects—both translations, interestingly enough. I’m very sorry he couldn’t have shared in the award with me, because it sounded as though we had parallel experiences. I look forward to reading the book he edited, The Man in Blue Pyjamas.

ISC Conference 2012, Day 1—More to come!

There were four other ISC sessions that I attended today, but I haven’t had the chance to write them up. I’ll post them as soon as I can piece together something coherent out of my notes. Thanks for your patience!

UPDATE (Sunday, June 3): Ack. Now I have three and a half days’ worth of conference sessions—for the ISC and the EAC—that I have to summarize and post. I took a heap of notes, and I got the speakers’ permission to post synopses, so I’ll eventually get everything up here, though perhaps not as quickly as I initially imagined. I’m hoping to work my way through the session notes over the next week or so. Right now, however, brain = toast.

ISC Conference 2012, Day 1—The glory and the nothing of a name

Noeline Bridge is the editor of Indexing Names, a book fresh off the press. She spoke today at the ISC conference about proper noun indexing, particularly the tricky problems that arise from people’s names.

Determining the order of the elements of a name with multiple components is the basic problem that a proper noun indexer must solve. For example, the indexer must know that many medieval names and names that indicate a patronymic are typically left as is and that German names with “von” are traditionally indexed under the part that follows “von.” Bridge gave attendees an extremely useful list of resources that guide the practice with respect to inverting names in a variety of languages.
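To make the two conventions above concrete, here is a toy sketch of how they might be expressed as a rule. It is my own simplification, not anything Bridge presented: the `index_form` function and its `keep_as_is` flag are hypothetical, the example names are mine, and real name indexing involves far more judgment than a few lines of code can capture.

```python
def index_form(name, keep_as_is=False):
    """Return a name in a simple index form.

    keep_as_is: set to True for names the indexer has decided not to
    invert (for example, many medieval names or patronymics).
    """
    parts = name.split()
    if keep_as_is or len(parts) == 1:
        return name

    # German names with "von": index under the part that follows the particle.
    lowered = [part.lower() for part in parts]
    if "von" in lowered:
        i = lowered.index("von")
        if i < len(parts) - 1:
            surname = " ".join(parts[i + 1:])
            forenames = " ".join(parts[:i + 1])
            return f"{surname}, {forenames}"

    # Default for this sketch: invert on the last element.
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


entries = sorted([
    index_form("Otto von Bismarck"),
    index_form("Hildegard of Bingen", keep_as_is=True),
    index_form("Noeline Bridge"),
])
```

Note that the decision to leave a name uninverted is passed in by hand here, which reflects the talk's point: the indexer, not the rule, has to know which kind of name is in front of her.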

Deciding how much information to include and exclude is also an indexer’s judgment call. We have to be sensitive to what a publisher or author may want. For instance, one of Bridge’s publisher clients insisted that all military titles be included. Bridge occasionally adds glosses with qualifying phrases for added specificity. As an index user, she explains, she likes to know right away which entries refer to human beings and which ones do not, and the glosses help establish that.

Watch for parts of a name that may be titles or honorifics. If an author uses only one name to refer to a person (e.g., Batista, versus Fulgencio Batista y Zaldivar), one school of thought is that that’s all you need to include, but Bridge often prefers to look up and include all components of that person’s name for completeness.

Bridge uses glosses to help distinguish between people with similar names (a situation that comes up often in family histories or local histories) by place, by occupation, or by relationship. She uses these to keep them straight for herself and often simply leaves them in to help the reader. Sometimes she uses a family tree program to keep track of whom the text is referring to if there are many generations of people with the same name.

Changes in name can be a complicated category, because in some cases—for instance, when a writer adopts a pseudonym—the person is adopting a different persona, and an argument can be made to index these separately. In cases where a name evolves, once again, the indexer must use judgment to decide whether to use the most recent name/title or the one used predominantly in the book.

In the case of transliteration and romanization, the decision usually has been made for you by the author regarding spelling. An exception is when you have a collection or anthology with different authors on overlapping topics.

A theme throughout Bridge’s talk was that you must be prepared to yield tactfully to an author’s preferences, and you must be sensitive to context. For example, whereas you would usually index a celebrity under the name by which she is most commonly known, at times it may be appropriate to use her birth name if you’re indexing a book about her family history.

ISC Conference 2012, Day 1—Ebook indexes: the devil is in the details

Jan Wright, a leader in the field of ebook indexing, gave the keynote address at the Indexing Society of Canada’s annual conference this morning. We are witnessing a watershed moment, she says, where we are trying to define what the markup for our content should look like, no matter where it is, whether it ends up on paper or on a device like an e-reader or smartphone. This development is in its infancy right now, with conflicting formats on different platforms, and Wright is part of a working group of indexers actively involved in shaping the EPUB 3.0 standard to include indexing concerns.

Current ebook indexing is either nonexistent or ineffectual. Ebook indexes may be missing or static, and there are almost no ebook indexes that index at a paragraph level. They are not an integrated navigational tool, they are difficult to get to, and they are hard to browse, especially if they’re typeset in two columns.

Existing platforms try to mimic certain features of indexing, but they don’t provide all of the functionality of a traditional index. For example, iBooks Author conflates an index with a glossary and limits the function of indexes as navigational tools. Amazon’s X-Ray, currently available only on the Kindle Touch, shows all occurrences of a particular term by page, chapter, or book, but it is merely recall—without the precision of an index—and offers terms in chronological order. In other words, it’s a brute force attempt at indexing.

When considering ebook indexes, we have to take into account a reader’s mental patterns and search behaviours. Some readers have never read the book and need to know if it adequately covers a given topic; some have read the book and know that their search topic is in there, but they have to find it. We must also keep in mind that reading styles differ depending on whether you’re reading for education or for pleasure, fiction or nonfiction. Physical cues for locating content, such as the position in a book or location on a page, and behaviours like skimming are disrupted in ebooks. Some platforms attempt to mimic a paper metaphor, but really, paper is just another interface. The key is to figure out what each interface does best and play to those strengths, because the paper metaphor doesn’t carry over well onto a small screen. The danger with today’s ineffective ebook indexes is that they are training readers to believe that indexes are unpredictable and thus to question why they should bother using them at all.

The ideal ebook index has features that have been implemented in other contexts before and so should be completely feasible. Wright gave us a demo of what an effective ebook index should do. It should be accessible from every page; the “Find” feature should reflect the best hits, as identified by the index; it should show the search results with snippets of text to offer context; it should allow cross-references to help you refine search phrasing; and it should remember that you’ve been there before and let you go back to it. Ebooks would also allow for additional functionality, like bringing up all indexed terms in a highlighted swath of text in a kind of “mind map” that offers additional information showing how concepts are connected.

So what can we do now? The first step, Wright says, is to get ready for the eventual use of scripts and anchors in EPUB 3.0. One goal is to develop a way to add anchors or tags to content at the paragraph level, which would allow for hyperlinking directly to the relevant content. Once prototypes of the interactive ebook index have been developed, we must assess their usability to ascertain what’s best for readers.
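To give a sense of what paragraph-level anchoring could look like in practice, here is a minimal sketch using ordinary XHTML `id` attributes and fragment links. It is not the EPUB 3.0 indexing markup the working group is shaping, and it is not something Wright showed; the function names, the `p0001`-style id scheme, the file name, and the sample text are all assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

XHTML_NS = "http://www.w3.org/1999/xhtml"


def add_paragraph_anchors(xhtml, prefix="p"):
    """Give every <p> that lacks an id a sequential one.

    Returns the modified markup and the list of paragraph ids in order.
    """
    ET.register_namespace("", XHTML_NS)  # keep output free of ns0: prefixes
    root = ET.fromstring(xhtml)
    ids = []
    for n, para in enumerate(root.iter(f"{{{XHTML_NS}}}p"), start=1):
        if "id" not in para.attrib:
            para.set("id", f"{prefix}{n:04d}")
        ids.append(para.get("id"))
    return ET.tostring(root, encoding="unicode"), ids


def index_link(filename, anchor_id, term):
    """Build one hyperlinked index locator pointing at a paragraph anchor."""
    return f'<a href="{filename}#{anchor_id}">{escape(term)}</a>'


# Made-up chapter content for demonstration.
sample = (
    f'<html xmlns="{XHTML_NS}"><body>'
    "<p>Cattle were domesticated early.</p>"
    "<p>Milk production rose sharply.</p>"
    "</body></html>"
)

anchored, ids = add_paragraph_anchors(sample)
link = index_link("chapter1.xhtml", ids[1], "milk production")
```

With anchors like these in place, an index entry can send the reader to the exact paragraph rather than to the top of a chapter or a page-sized chunk of text.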

A big takeaway from this keynote speech is that advocacy and outreach are essential. With the standards at a nascent, malleable stage, this is the time for indexers to have their concerns addressed as the technology develops so that indexers’ workflow can be taken into account. (But more on this in a later post.)

East Meets West VIP launch

I just got back from the VIP launch for Stephanie Yuen’s cookbook East Meets West at Lin’s Chinese Cuisine, where Chef Zhang and his staff treated us to delicious snacks, including Lin’s signature xiao long bao, as well as a tan-tan noodle demonstration. It was great to chat with the ol’ D&M crew and meet the author in person, and each of us came away with a swank gift bag from Sunrise Soya Foods and Hon’s Wun-Tun House.

Exploring Vancouver pubs

Starting tomorrow you can get your own copy of Exploring Vancouver: An Architectural Guide, the story of Vancouver as told through its architecture. The book is organized into fourteen walking/driving tours of the city’s neighbourhoods and its closest suburbs, each showcasing structures of note—for their architectural excellence or for their historical significance. Architectural historian Harold Kalman and architectural critic Robin Ward have put together an authoritative but accessible guide featuring eye-popping photography by John Roaf in a stunning package designed by the fabulous Naomi MacDougall.