Autumn 2005 / Feature
The Infinite Library

Researchers are welcoming Google’s plan to digitize millions of books, but the implications for libraries are profound


If he traded his khakis and open-collared shirt for a monk’s robe, the bearded Rev. William Craig would not look out of place in a dank medieval library. But this is 2005, and the Anglican priest doesn’t need to roam the dark, mouldy corridors of a medieval library to do research.

In the cafeteria of Robarts Library at the University of Toronto, Craig, a doctoral student in theology, pulls out his wireless laptop computer, launches an Internet browser and calls up a digital book from the library’s electronic collection. The print version of the 221-volume Patrologia Latina, a massive 19th-century collection of a thousand years of church doctrine, would fill dozens of shelves. Yet Craig can scour the full text of every volume simply by entering a few words into the search box on the screen.

Academic research has changed remarkably since Craig began his student career in the 1970s. “When I was an undergrad, we didn’t have personal computers,” he says. Now he can find primary sources central to his thesis – on the 1604 Hampton Court Conference of King James I of England – with a few clicks of his mouse. Digital resources such as Early English Books Online, which contains nearly 100,000 page-by-page reproductions of books published between 1475 and 1700, have made research “infinitely” easier, says Craig.

Thirty years ago, card catalogues and printed periodical indexes ruled the library system. Then the digital revolution arrived. Computers replaced card catalogues and microfiche in the mid-1980s. Encyclopedias that appeared on CD-ROMs in the early 1990s gave way to easy-to-search electronic indexes and other reference databases on the World Wide Web.

But those early milestones were only a prelude. Thanks to the development of more powerful computers and larger databases, the number of library resources available in digital form is expanding rapidly. U of T’s electronic collection now includes almost 40,000 journals, more than 50,000 e-books and 1,000 indexes and other online reference tools. However, U of T’s e-collection is still dwarfed by the 15 million books and periodicals held by the university’s 32 libraries.


Peter Clinton, the director of information technology services for U of T Libraries (UTL), says the last few years have seen a quantum leap in the availability of full-text electronic materials – whole articles and books rather than just brief citations or abstracts. As a result, students and faculty can do an increasing amount of research from their computer desktops. Not surprisingly, they check out materials from the library less often. Over the past decade, the circulation of print items at UTL has dropped 20 per cent, due mostly to falling demand for print journals. In the field of physics, the latest research is published only in electronic form. When it comes to searching for library resources, students “want it now, they want it fast and they want it to work like Google,” says Clinton.

Visit UTL’s Web site and you’ll discover just how influential the popular Internet search engine has become. Front and centre on the home page is Google’s colourful logo. Click the link and you land on a page with the message, “There are limits to searching Google Scholar and you may find better quality information through the University of Toronto Libraries’ databases.” The wording is mild, but it’s evidence of the growing competition between academic libraries and major technology companies, such as Google, Yahoo and Amazon.

Google Scholar, a service started late last year that’s still in its testing phase, is the company’s first foray into academic research. It allows users to search collections of proprietary electronic journals and a variety of online repositories of scholarly papers. A Google Scholar search on “exosolar planets,” for example, returns 54 academic essays on the subject, ranked roughly in order of the number of times they’ve been cited. Within just a few months, Google Scholar has established itself as a rival to powerful multinational companies such as Thomson and Elsevier that offer huge (and, for libraries, hugely expensive) databases of scholarly material. Some librarians say that Google underperforms its rivals in the currency and quantity of its search results, while others declare that its simplicity is a huge advantage. “Google Scholar works. And it works in a way that presents very few of the hoops that we make students jump through to use our library databases,” writes T.J. Sondermann, an academic librarian and prominent blogger on library issues in the U.S.

UTL is not so conciliatory. The library is attempting to teach researchers that its resources are more specialized, in-depth and targeted to particular fields. Carole Moore, UTL’s chief librarian, says the problem is not that students use Google’s main search engine but that they use it primarily because they are unaware of alternatives. “Many students have a limited idea of how to search and of what they’re finding,” she says. UTL is conducting seminars on the use of library materials and Moore notes that once researchers are aware of what the library has to offer, they tend to lose their dependency on the search engine. “If researchers know how to use the databases, then that actually does bring them in for materials because it’s not all online,” she says.

However, Google isn’t standing idly by. The company is developing another service called Google Print that may encroach even more on the traditional turf of libraries. Last December, it announced a partnership with the New York Public Library and four major university libraries – Harvard, Stanford, Oxford and Michigan – to digitize millions of their books. Publishers are expressing concerns about copyright protection, but even if Google limits itself to works in the public domain, the implications for academic libraries are profound. A Google search of the text of millions of instantly available digital books would be a more compelling first choice than even the largest library catalogue of physical volumes.

As Google expands into the academic realm, some argue that libraries should simply bow to its strengths. John Wilkin, a librarian at the University of Michigan in Ann Arbor, believes that the company’s dominance in online searching is inevitable. He says that libraries should “cede the generalist role to Google” and allow it to become a universal search engine for library materials. “We all love Google for its quick and dirty approach,” says Patricia Bellamy, a reference librarian at the Robarts Library. But Moore doesn’t share Wilkin’s view that libraries should relinquish their core business to Google. “I really don’t think we would want a world that was totally controlled by one search engine,” she says, noting that Google’s for-profit status puts it in a compromised position as an academic research tool compared to the non-profit neutrality of UTL. She adds that Google’s mandate is not to support academic research but to make a profit. If a service such as Google Scholar became unprofitable, the company might decide to stop supporting it.

To compete with Google, UTL, like other academic libraries across Canada, is facing short- and long-term challenges. In the short term, UTL must match Google’s ease of use. This means replicating the company’s one-stop search box. For libraries plagued with many different ways of gaining access to their diverse collections, as UTL is, creating a single search box to retrieve a broad range of materials is an obvious step toward making research more convenient for students. Currently, researchers must deal with separate entry points for the print catalogue, electronic index and abstract databases, e-books and e-journals, and UTL’s own scanned digital collections. Since 2002, however, UTL has been working with the libraries at all of Ontario’s universities to create the Ontario Scholars Portal, a single-box search engine that covers 7,300 electronic journals and 65 electronic indexes. Clinton says that the library is about six months away from its ultimate goal of tying its print catalogue, databases and catalogued Web resources to a single search. He admits that Google is innovating quickly, but says that libraries – and the electronic database vendors whose products they buy – are beginning to catch up. “Google, and in particular Google Scholar, has been a wake-up call for many of the information vendors,” he says.

Over the long term, the library is thinking about new ways that digital materials can be stored, packaged and delivered, says Moore. Like Google, UTL is digitizing books, but its focus is on its unique collections used by the U of T community and other Canadian researchers. So far, UTL has scanned thousands of rare illustrations of human anatomy, explorers’ documents and early Canadiana from the university’s rare-book collection. Progress has been slow, but last September UTL began a pilot project with a non-profit organization called Internet Archive to digitize books using a robotic scanning machine provided by the archive. Under the arrangement, U of T pays Internet Archive 10 cents US for every page scanned. Internet Archive, which is based in San Francisco, keeps one copy to add to its digital collection and U of T keeps one copy.

One year into the project, UTL has paid for about 2,000 scanned titles, ranging from a copy of a 1475 edition of St. Augustine’s City of God to war memoirs and literary texts. Over the next two years, the project will digitize all of the known editions of books and print material – about three million pages in total – by John Henry Cardinal Newman, a 19th-century Christian theologian. The effort, a joint project of three American partners and St. Michael’s College at U of T, should enable researchers to detect subtle changes in Newman’s language that would otherwise take years to discover, says Jonathan Bengtson, the chief librarian of John M. Kelly Library at St. Mike’s. “Such analyses will lead to a deeper understanding of the development of Newman’s thought,” he says.

Google’s somewhat grandiose mission is “to organize the world’s information and make it universally accessible and useful.” With a market valuation in July of more than $80 billion US and quarterly earnings of $343 million US, the company is in a far better financial position than UTL. The library system is stretching its resources to the limit to compete with the ease and scope of Google as well as preserve its traditional collections and services. Unlike Google, which publicly announced that it plans to spend up to 30 per cent of its earnings on new product development, UTL has no capital for innovative projects. While UTL has managed to maintain its acquisition spending in real terms, it has had to contend with university-wide funding cuts that in the past decade have led to a 30 per cent reduction in library staff.

In the 2003-04 academic year, UTL’s total acquisition budget was $25 million, of which about half was spent on scholarly journals. Because some members of the university community prefer the print format, while others request electronic, UTL often acquires both – frequently at a higher cost, and with the associated headache of storing all those volumes. The university recently spent $6 million on an off-site, climate-controlled preservation space, to keep two million books. “I spent 10 years seeking this facility, because we’re out of space,” says Moore. Located near U of T’s aerospace building in Downsview, the warehouse is expected to be enlarged to hold five million volumes by 2020.

As UTL attempts to secure a role for itself in the Google Age, it’s reconsidering its traditional reliance on publishers and vendors and beginning to act a little like a publisher itself. Last year, for example, the library developed T-Space, a university-wide digital repository that holds thousands of documents, including course materials and unpublished scholarship that would previously have fallen outside the library’s mandate to collect. Modelled on a similar repository developed by the Massachusetts Institute of Technology, T-Space lets students download specialized course materials and allows faculty to post papers and research findings in a public venue without first having to find a publisher.

As UTL boosts its technological capabilities, Moore likens the direction that the library is heading to a much earlier forebear – a medieval library. These institutions of the Middle Ages were not only book storehouses but places where manuscripts were rewritten and information was combined and republished in new ways, she says.

As for Rev. Craig, who has watched the world of print expand into the broader possibilities of digital texts, the library has become an even more stimulating place to be. Writing a dissertation is still not easy, but he says that research “is a lot more fun” when the wealth of the world’s knowledge is at your fingertips.

Devin Crawley (MISt 2004) is a librarian and writer in Ottawa.

