
Google Books Library Project

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Rich Farmbrough (talk | contribs) at 22:12, 31 January 2015 (Academic criticism). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

The Google Books Library Project is an effort by Google to scan and make searchable the collections of several major research libraries.[1] The project, along with Google's Partner Program, makes up Google Books (formerly Google Book Search). In addition to bibliographic information, snippets of a book's text are often viewable. If a book is out of copyright and in the public domain, it is fully available to read or to download.[2][3]

The project is the subject of the Authors Guild v. Google lawsuit, which was filed in 2005 and, as of April 2014, was on appeal.

Participants

The Google Books Library Project continues to evolve;[4] however, only some of the institutional partners are listed on the web page currently maintained by Google.[5]

Initial project partners

The number of libraries participating in the digitization and uploading of books from their collections has grown beyond the original five: Harvard, Michigan, Stanford, Oxford, and the New York Public Library.

Harvard University

Harvard University (and Harvard University Library) is an institutional participant in the project.[6] The Harvard University Library (HUL) today is best understood as a coordinated system of more than 80 libraries with shared holdings. The University Library is also a department of the University's central administration through which the libraries collaborate in the areas of digital acquisitions and collections, information technology, high-density storage, and preservation.[7]

The Harvard University Library and Google conducted a pilot throughout 2005. The project continued, with the aim of increasing online access to the holdings of the Harvard University Library, which includes more than 15.8 million volumes. While physical access to Harvard's library materials is generally restricted to current Harvard students, faculty, and researchers, or to scholars who can come to Cambridge, the Harvard-Google Project has been designed to enable both members of the Harvard community and users everywhere to discover works in the Harvard collection.

New York Public Library

The New York Public Library (NYPL) is an institutional participant in the project.[8]

In this pilot program, NYPL is working with Google to offer a collection of its public domain books, which will be scanned in their entirety and made available for free to the public online. Users will be able to search and browse the full text of these works. When the scanning process is complete, the books may be accessed from both the New York Public Library's website and the Google search engine.[8]

Stanford University

Stanford University and the Stanford University Libraries (SULAIR) are institutional participants in the project.[9]

"Stanford has been digitizing texts for years now to make them more accessible and searchable, but with books, as opposed to journals, such efforts have been severely limited in scope for both technical and financial reasons. The Google arrangement catapults our effective digital output from the boutique scale to the truly industrial. Through this program and others like it, Stanford intends to promote learning and stimulate innovation."
– Michael A. Keller, University Librarian.[5]

University of Michigan

[Image: Notice about the project]

The University of Michigan (and the University of Michigan Library) is an institutional participant in the project.[10]

"The project with Google is core to our mission as a great public university to advance knowledge — on campus and beyond. By joining this partnership that makes our library holdings searchable through Google, UM serves as an agent in an initiative that radically increases the availability of information to the public. The University of Michigan embraces this project as a means to make information available as broadly and conveniently as possible.
"Although we have engaged in large-scale, preservation-based conversion of materials in the Library's collection for several years, and have been a leader in digital preservation efforts among research libraries, we know that only through partnerships of this sort can conversion of this scale be achieved. Our program is strong, and we have been able to digitize approximately 5,000 volumes/year; nevertheless, at this rate, it would take us more than a thousand years to digitize our entire collection."
– John P. Wilkin, Associate University Librarian.[5]

University of Oxford

The University of Oxford is an institutional participant in this project.[11] Oxford is the oldest university in the English-speaking world, and its historic Bodleian Library is the oldest university library.

"The Bodleian Library's mission, from its founding in 1602, has been based on Sir Thomas Bodley's vision of a library serving the worldwide 'Republic of Letters', with the Library's collections open to all who have need to use them. To this day over 60% of readers who use and work in the Bodleian Library have no direct affiliation with the University of Oxford. The Google Library Project in Oxford testifies to our ongoing commitment to enable and facilitate access to our content for the scholarly community and beyond. The initiative will carry forward Sir Thomas Bodley's vision and the ethos of the Bodleian Library into the digital age, allowing readers from around the world to access the Library's collections over the World Wide Web."
– Ronald Milne, former Director of Oxford University Library & Bodleian Librarian.[5]

Additional project partners

Other institutional partners have joined the project in the years since the partnership was first announced.

Criticism

The large-scale goals and methodology of the Google Books Library Project have come under criticism in two major areas: legal and academic.

The project has been digitizing library books somewhat indiscriminately, regardless of copyright status, which has led to a number of lawsuits against Google. As of the end of 2008, Google had reportedly digitized over seven million books. Of these, only about one million were works in the public domain. Of the rest, one million were in copyright and in print, and five million were in copyright but out of print. In 2005 a group of authors and publishers brought a major class-action lawsuit against Google for infringement of their copyrighted works. Google argued that it was preserving "orphaned works": books still under copyright whose copyright holders cannot be located.[28]

The suit was ultimately settled out of court in 2009, but the settlement itself has been controversial, as it could set Google up as the world's largest information broker and virtually free it from any copyright liability.[29][30] There are also questions as to how the settlement, arrived at in US courts, would affect authors and publishers in other countries. In 2011 the judge overseeing the settlement put it on hold,[31][32] and as of April 2014 the settlement decision was being appealed by the original plaintiffs in the class action.

Academic criticism

[Image: A scanning error in De morbis puerorum tractatus locupletissimi]

For many scholars, a far more egregious problem is that the project does not seem to be meeting its fundamental stated goal of preserving orphaned and out-of-print works. Google is apparently releasing huge numbers of scanned and electronic-text books into circulation without correcting the errors introduced during digitization. This problem has been apparent for a number of years,[33][34][35] but it became highly visible in 2014, when Google formed a partnership with the bookseller Barnes & Noble.[36] As part of that partnership, Google made more than half a million public domain texts scanned by the project available to Barnes & Noble, to be offered as "free books" in its Nook Shop for its Nook e-readers.

Customers downloading these books quickly discovered that as many as 80% of them were essentially unreadable, riddled with errors introduced either by the scanning process itself or by the conversion of the scans into electronic text with optical character recognition (OCR) software. Google apparently scanned the texts but did not correct them, and Barnes & Noble compounded the problem by exercising no quality control over the Google texts, simply offering them to customers in its shop unexamined.

These scanning and OCR errors can turn the contents of a book into gibberish. The problem is especially acute in scientific works, where incorrectly rendered, missing, or extraneous characters in pages of equations can make the equations meaningless. There are also concerns that libraries participating in the project may be destroying old texts once they have been scanned. Many of these texts are old and scarce, and replacing physical library books with unreadable scans may effectively destroy some of these works forever rather than preserve them.[37]

Also of concern is the large number of metadata errors in the Google collection. Metadata is the information that identifies a particular text: title, author, publisher, place and date of publication, subject classification, and so on. It is essentially the information that would be found in a library card catalog.

One casual investigator found thousands of such errors in the samples he took. These included publication dates that predated the birth of the books' authors (e.g., 182 works by Charles Dickens supposedly published before his birth in 1812); wildly inappropriate subject classifications (an edition of Moby Dick found under "computers"; a biography of Mae West classified under "religion"); conflicting classifications (10 editions of Whitman's Leaves of Grass all classified as both "Fiction" and "Nonfiction"); misspelled titles, authors, and publishers (Moby Dick : or the White "Wall"); metadata for one book attached to a completely different book (the metadata for an 1818 mathematical work leads to a 1963 romance novel); books about the internet with publication dates before the internet existed; and many more.

Such metadata errors make serious research using the Google Books database virtually impossible, even if all of the scanned texts were edited and corrected. To date, Google has shown only limited interest in cleaning up these errors.[38]


Notes

  1. ^ Stein, Linda L.; Lehu, Peter J. (2009). Literary Research and the American Realism and Naturalism Period: Strategies and Sources. p. 261.
  2. ^ Google Books Library Project – An enhanced card catalog of the world's books
  3. ^ The book may not be readable, however; see "Criticism" section.
  4. ^ O'Sullivan, Joseph and Adam Smith. "All booked up," Googleblog. December 14, 2004.
  5. ^ a b c d e Google Library Partners
  6. ^ "Harvard-Google Project". Harvard University Library. Retrieved 28 August 2013.
  7. ^ HUL summary/overview
  8. ^ a b New York Public Library + Google
  9. ^ "Stanford's Role in Google Books". Stanford University Libraries. Retrieved 28 August 2013.
  10. ^ "Michigan Digitization Project". MLibrary - University of Michigan. Retrieved 28 August 2013.
  11. ^ "Oxford Google Books Project". Bodleian Libraries, University of Oxford. Retrieved 28 August 2013.
  12. ^ a b c d Albanese, Andrew (2007-06-15). "Google Book Search Grows". Library Journal. Retrieved 28 August 2013; Staatsbibliothek + Google (in German)
  13. ^ "Columbia University Libraries Becomes Newest Partner in Google Book Search Library Project". Columbia University Libraries. 2007-12-13. Retrieved 28 August 2013.
  14. ^ CIC + Google
  15. ^ Complutense Universidad + Google (in Spanish)
  16. ^ "Cornell University Library becomes newest partner in Google Book Search Library Project". Cornell University Library. Retrieved 28 August 2013.
  17. ^ Ghent/Gent + Google
  18. ^ "Keio University to partner with Google, Inc. for digitalization and release of its library collection to the world For 'Formation of Knowledge of the digital era'" (PDF). Keio University. 2007-07-06. Retrieved 28 August 2013.
  19. ^ Biblioteca de Catalunya (BNC) + Google (in Spanish)
  20. ^ Koninklijke Bibliotheek and Google sign book digitisation agreement
  21. ^ Cliatt, Cass (2007-02-05). "Library joins Google project to make books available online". Princeton University. Retrieved 30 August 2013.
  22. ^ "UC libraries partner with Google to digitize books". University of California. 2006-08-09. Retrieved 30 August 2013.
  23. ^ Bibliothèque cantonale et universitaire (BCU) + Google (in French)
  24. ^ Anderson, Nate (2007-05-22). "Google to scan 800,000 manuscripts, books from Indian university". Ars Technica. Retrieved 30 August 2013.
  25. ^ "The University of Texas Libraries Partner with Google to Digitize Books". The University of Texas Libraries. 2007-01-19. Retrieved 30 August 2013.
  26. ^ Wood, Carol S. (2006-11-14). "U.Va. Library Joins the Google Books Library Project". University of Virginia. Retrieved 30 August 2013.
  27. ^ "University of Wisconsin-Madison Google Digitization Initiative". University of Wisconsin-Madison. Retrieved 30 August 2013.
  28. ^ Darnton, Robert; Google and the Future of Books; The New York Review of Books; February 12, 2009
  29. ^ http://arstechnica.com/tech-policy/2010/01/the-sequel-stinks-critics-trash-new-google-books-settlement/
  30. ^ http://www.nytimes.com/2009/04/04/technology/internet/04books.html?
  31. ^ pagewanted=all
  32. ^ http://www.theguardian.com/technology/2009/sep/25/google-books-delayed
  33. ^ http://www.mcelhearn.com/ebooks-and-typos-readers-and-consumers-deserve-better/
  34. ^ http://www.washingtonpost.com/wp-dyn/content/article/2009/02/06/AR2009020601069.html
  35. ^ https://musictechpolicy.wordpress.com/2009/09/29/google-books-how-bad-is-the-metadata-let-me-count-the-ways/
  36. ^ http://www.nytimes.com/2014/08/07/business/media/google-and-barnes-noble-unite-to-take-on-amazon.html?_r=0
  37. ^ https://bookclubs.barnesandnoble.com/thread/4616
  38. ^ Nunberg, Geoffrey; Google's Book Search: A Disaster for Scholars; The Chronicle of Higher Education; August 31, 2009

References

  • Lester, June and Wallace C. Koehler. (2007). Fundamentals of Information Studies: Understanding Information and Its Environment. New York: Neal-Schuman Publishers. 13-ISBN 978-1-555-70594-7/10-ISBN 1-555-70594-4; OCLC 122526045
  • Miller, Michael. (2007). Googlepedia: the Ultimate Google Resource. Indianapolis, Indiana: Que. 13-ISBN 978-0-789-73639-0/10-ISBN 0-789-73639-X; OCLC 224762694
  • Stein, Linda L, and Peter J. Lehu. (2009). Literary Research and the American Realism and Naturalism Period: Strategies and Sources. Lanham, Maryland: Scarecrow Press. 13-ISBN 978-0-810-86141-1/10-ISBN 0-810-86141-0; 13-ISBN 978-0-810-86242-5/10-ISBN 0-810-86242-5; OCLC 233798804