Presses Have Little to Fear From Google
Michael Jensen
Chronicle of Higher Education, back page "Point of View,"
July 7, 2005
Author's copy; see free authoritative version on CHE site.

    The hot session at the recent annual meeting of the Association of American University Presses had to be moved to a half-ballroom to accommodate interest. Nearly 200 publishers listened patiently as the first two speakers talked about various online opportunities for scholarly presses, but the crowd was mostly there to hear -- and interrogate -- the third presenter: the "Google guy."
    Last December Google announced its Google Library Project, to digitize the books of five major university libraries over the next decade -- millions of volumes, starting with out-of-copyright material, but expected to include out-of-print and other books still in copyright. The project, an expansion of Google Print, a program to make traditionally printed material searchable online, first caused great alarm in many quarters. Soon after the announcement, the university-press association sent a strongly worded letter to the Google leadership about the intellectual-property incursions the Library Project seemed to represent. Then a few weeks ago, word circulated through the corridors at the university-press meeting that the Association of American Publishers had just sent a similar letter: It asked Google to cease scanning copyrighted works for at least six months to enable all parties to sit down to thrash out the issues.
    What many publishers believe is at stake is the possibility that they will lose control of their intellectual property. Behind that, of course, are such questions as whether Google might profit by selling advertisements related to the material in a book it digitizes and presents online, without buying the volume itself -- or sharing the revenue.
    In 2004, when Google first met with publishers to enlist them in Google Print, the response was one of guarded interest. Publishers were told that the program would involve giving a copy of a book to Google, which would scan the pages and make them searchable online. Optical-character recognition would be used to make the scanned text searchable, but strict default limits on the interface would mean that no more than 20 percent of any book would be available to a user; a publisher could set its own constraints on how much of the book could be presented.
    Many university presses considered Google Print a big risk. After all, letting Google digitize huge chunks of what a press had nurtured into print was an act of faith (with plenty of contractual consternation): faith in Google's online security, faith in technology, faith in the limitations Google promised to adhere to. Still, Google's technology did seem to assuage the biggest worry for publishers: losing control of intellectual property in the digital arena. Perhaps that's why so many publishers reacted so viscerally to the unexpected announcement of Google Library. As often happens when the details aren't clear, many of them feared the worst: Google would assert ownership over the digitizations, start becoming a vendor of books, and eventually retail a giant single library that would, in essence, ruin the book market.
    I understand the fears, but I think most of them are unfounded. And I think they could lead publishers to miss some crucial opportunities.

    First, I doubt that participating libraries -- or Google -- would risk directly taking on copyright law in the current pro-ownership political environment. Moreover, it's clear to me that the control of intellectual property by publishers is not threatened by Google Print or Library. My own National Academies Press started participating in Google Print early in the program. Google has already scanned more than 1,300 of our titles, with another 1,000 pending, and we are sending them more all the time. It isn't displaying advertisements beside images of our works, because we asked it not to. It will take down any book we ask it to, and our contract leads us to believe it will always do so. It would be madness not to: Refusing would launch a full-scale copyright war.
    Further, based on our own experience over the last seven years, I can predict with confidence that online-search capabilities will boost book sales: A university press that joins Google will find itself using "print-on-demand" technology to fill orders from its backlist for that 1958 tome on the Maginot Line that it never dreamed would have a life in the 21st century.
    At present, reaching selected audiences is the hardest problem publishers face. How do you find the 300 people who most care about the political genesis of the Maginot Line? It's one of our most expensive pursuits, often dwarfing production costs. Google can help. At our press, we already make more than 3,400 books available free online, and in 1998 we began using the "page-image" approach that Google is now taking: presenting a picture of the page, with searchable text behind it. Every new book we published was scanned and made navigable online, free, at the same time it was available for sale. To our delight, we found that page images with searchable text behind them actually seemed to increase sales, not replace them with online reading. The more visitors we got to our Web site, the more sales. For every 1,000 visitors, about six ordered at least one book in 2001. Google, Yahoo, AltaVista, MSN -- those engines became our most important promotional tool for books appealing to a small market.
    The page images that we made available then, and that Google is offering now, are what we call "non-optimal" online forms of a book: They enhance discovery, but they aren't conducive to extended reading. They are slow, semi-fuzzy, clumsy, not copyable, and hard to work with -- but they are fine for giving readers a sense of a book. For publishers, they're the best of print and digital worlds.
    It's worth noting, however, that our press crossed an "improvement threshold" recently. Because our parent institution, the National Academies, values dissemination of ideas as much as the press's balance sheet, we have been encouraged to continue experimenting. With two-thirds of our publications now displaying their pages as HTML text, instead of as page images, and with a new "chapter skim" feature that presents a reader with the most significant few lines of each page, we seem to be providing enough free value to our audience to satisfy some potential book purchasers who used to find page images not optimal for reading. A number of factors are involved, but as more visitors increase their use of our material, we are now seeing that for every 1,000 viewers, we get only about 2.25 book orders. We do have enough increased visitors (totaling around a million a month) to stay even on sales; at least for now, we are balancing wide dissemination of content with sales.
    But Google isn't proposing crossing that threshold for readability. Directors of publishing houses are often vague about what "digitization" really is. Their operational understanding of the differences among "digital books" in various formats (PDF, HTML, XML, TEI, JPG, TIFF, e-books, etc.) is limited at best. The functional differences in how a book is made available online are profound, with varying levels of ease of reading, copying, repackaging, extracting, storing, and using as an archival copy. Some formats are a threat to current publishing paradigms; others (like page images) are digital representations of the current paradigm.
    Many publishers (myself included) tend to think of a book as the pre-eminent artifact of knowledge. We grew up with paper. But the 30-ish assistant professors grew up with a computer, and today's students have grown up with the Internet. Both of the younger groups simply take online access for granted. For the last few years, I've heard (mostly older) scholars and librarians moan, "If they can't Google it, it doesn't exist for these kids." That's a reality publishers should be loath to deny. In the next three to five years, as the most-productive young faculty members publish their works, I think we'll see what I call a "Google access citation effect."
    Citation indices let us know how well referenced any document is -- what books or articles or essays are most influential -- by measuring their references in new scholarship. If the new digitally driven scholars can Google an essay or book, then they'll use it for further research. If they can't, they may well not. If university-press monographs are not available via Google, then they will be more likely to be losers in the citation-index derby. That will have significant consequences a few years out.
    I can only speak for myself and not my press, but in general I think that it's in the best self-interest of scholarly publishers to relax a bit about how we respond to intellectual-property issues raised by digitization plans like Google's. We need a bit more trust, so that we can take advantage of the new capabilities of a networked society.
    It's vital, too, that university presses, university libraries, and university administrations understand each other and work for the common good of scholarship. Right now we're wary of each other. Libraries, still a key market for scholarly publishers, are driven by their own fears of the predatory pricing of a few huge commercial publishers, and they unfortunately paint university presses with the same brush. Publishers worry about librarians creating huge "e-reserves" by scanning material for local use by students and faculty members, jeopardizing classroom adoptions of books. Administrators fear entering agreements that might involve them in intellectual-property wars fought on their soil. I keep hearing, "What might happen if...?"
    We are all on the same side. Yes, there is much to be afraid of. But it would be a mistake to let our fears derail this opportunity for the mass digitization of our intellectual heritage. Google is offering something marvelous, if imperfect; its model is more likely to help more people find library resources and publishers' works than anything else on the horizon.

Michael Jensen is director of publishing technologies for the National Academies Press and director of Web communications for the National Academies.
