Indexing a moving target: Ontario Hansard’s approach (ISC conference 2014)

Rosalind Guldner, Cheryl Caballero, and Erica Smith work together to produce the index for the Ontario Hansard, the official record of the province’s legislative assembly and its standing committees. Their team recently won the Web & Electronic Indexing SIG Award for excellence in web site indexing, and at the ISC conference, they shared their insights on team indexing approaches, indexing a constantly changing and growing text, and adapting to the demand for electronic indexes.

The Hansard is a serial: the House sits Monday through Thursday, and House debates have to be transcribed, edited, and proofed within twenty-four hours. The index and research group aren't on quite such tight timelines (they have to produce final speaker and subject indexes only after the session ends), but they do index and edit as they go, and they also help the Hansard team with fact checking and other research. The index is bilingual, but the Hansard is transcribed and indexed only in the language spoken.

The team has found that assigning one primary indexer and editor to each index (one for the House debates and one for the committees) yields the best consistency. They also keep a subject authority list to help standardize their headings and subheadings. This list grows continually and changes as heading terms go in and out of style. For example, whereas MPPs (Members of Provincial Parliament) used to say “physicians,” they’re more likely today to say “doctors.”

The House index is based on the transcripts of debates about bills, oral questions, members' statements, and statements by the ministry, and the indexer faces a number of challenges. First, Question Period is fast paced, and there isn't always time for context to be provided, so the indexer must keep on top of current events to know what's being discussed. Second, the content can be unpredictable: because nobody knows when the session will end, a topic that seems minor now may later resurface as a substantive one, so it's hard to know how specific the subheadings should be. Third, people read the Hansard to determine legislative intent, so indexers must provide several alternative access points to the information. Finally, indexers have to maintain neutrality. The transcript is substantially verbatim, and editors are not permitted to impose their own interpretation, but MPPs go off topic constantly and are often crafty about using language that is only tenuously related to the topic.

The committee indexer works with transcripts from standing committees and select committees. Sometimes committees are given special mandates, and occasionally the committees will hear from witnesses. Although witness statements are recorded and transcribed, they are not indexed; only members’ questions and reactions are indexed. The committee indexer will often use the House index as a guide, although the subject matter can be discussed in finer detail, so the committee index may have more headings or subheadings.

The Hansard indexing team is constantly editing their index, issuing daily updates to the online House index and twice-weekly updates to the online committee index. Once the session ends, they do a final “big picture” edit before producing final print and online versions of their indexes. The print versions are sent to depository libraries all over the world.

The Hansard is still printed on paper, as it's used as a legal record, and paper indexes have been produced since 1949. For the past dozen years or so, the indexing team at the Ontario Hansard has also provided an online index. They use HTML/Prep and Webprep to convert indexes created in CINDEX to HTML. There's no tagging yet: the locators link to the top of a page, and the user has to use the search function on that page to find what they need.

In the future, the team hopes to tag content directly; create a linked, tagged index to audio or video content; and provide “live” headings, where they listen live during the debate and provide quick access to popular content such as oral questions and members’ statements. They also aim to expand their role, spotlighting their indexing skills and reference resources to create useful reference lists, and maybe one day to index other assembly content, including the Members’ Guide and the Standing Orders (rules of Parliament).

Caroline Diepeveen—Team indexing: The way forward? (ISC conference 2013)

Caroline Diepeveen led a small team that indexed the five-volume Encyclopedia of Jews in the Islamic World (EJIW), published by Brill. Her efforts, along with those of her co-indexers, Pierke Bosschieter and Jacqueline Belder, won the team the Society of Indexers’ Wheatley Medal in 2011. Most gratifying for Diepeveen was the jury’s remark that they couldn’t tell that this index had been composed as a team.

Indexers are used to working in isolation, Diepeveen said, and some seem averse to the idea of working in a team. But her own experience with EJIW was positive, and in a small survey she conducted about team indexing, with eleven indexers responding, she found that 73% had had good experiences, while 27% said that their experience was okay; nobody had found team indexing particularly negative. The respondents had mostly worked in groups of two or three and used such strategies as constant discussion and a controlled vocabulary to achieve consistency in their work. Many teams had one main indexer who was responsible for putting the team together and ensuring the quality of the final product.

In Diepeveen's case, team indexing became a necessity because of EJIW's project deadlines. She had initially signed on as the encyclopedia's sole indexer. In theory, the encyclopedia would be built one article at a time; the editors expected a steady flow of articles from the authors, and Diepeveen could index at her leisure. In reality, the bulk of the articles came at the end, and the publisher's options were to extend the deadline or to bring in more indexers.

Fortunately, the encyclopedia itself was compiled using a sophisticated content management system (CMS) with a fine-tuned workflow. Team members had access only to the parts of the CMS that they needed. Authors from all over the world contributed articles directly into the CMS; the articles were then edited by a team of editors and finally released for indexing. With the CMS, articles could easily be assigned to an indexer and reassigned as needed; there was no need to mail files around. (Brill had attempted to develop a software module that allowed embedded indexing directly in the CMS, but the first version of the indexing module lacked basic indexing features, such as selecting a page range, and so was deemed unacceptable. In the end, the index was not fully embedded; instead, it was compiled using anchors in the text as locators.)

Serving as team captain, Diepeveen not only put together the indexing team but also oversaw her team’s work. She had already done some of the indexing before she brought on the other indexers, so the other team members could use her work as a reference. Helpfully, the articles in the CMS showed all indexed terms highlighted in green, and Diepeveen could easily see whether her teammates were over- or under-indexing and provide feedback as needed. She emphasized the importance of regularly communicating with team members to build trust and a strong working relationship. Geographically separated team members may not be able to meet in person, but teleconferencing and web conferencing go a long way in clarifying roles and tasks, not to mention allowing team members a chance to get to know one another.

To keep the process running smoothly, the team had to lay some groundwork:

  • Diepeveen did a thorough edit of the index near the start of the project so that all team members would have a basic structure to work towards.
  • The team disallowed double postings; cross-references could be converted to double postings at the very end if needed.
  • The team stipulated that all entries must have a subheading. When you see only one part of a publication, you don’t know how much weight or detail is given to a particular subject in another part of the publication. Again, unnecessary subheadings could be edited out at the very end if needed.

Most importantly, Diepeveen said, the team “kept asking questions. EJIW worked almost like peer review on the go. We asked each other, ‘Why did you decide to do things this way?’ We kept each other sharp by asking questions. That improved the quality of the index.”

As larger and larger electronic publications become the norm, Diepeveen said, team indexing will probably become more common. Emerging technological tools may help with the logistics, but the most important aspect of team indexing, she reiterated, is the team itself. It is critical to invest in trust, not only at the beginning of the project but also regularly throughout.