Internet Society Vietnam

Last update:
Tuesday, March 19, 2002 2:15 PM

Google to scan books from big libraries. USA Today - Posted 12/14/2004 12:37 AM

The common way to get around the problem is to scan pages as images.

This may actually be necessary when the paper is in poor condition
and OCR software struggles to recognize the text, especially
the diacritic marks.

But this means that the files are large, that the text cannot be
indexed or searched, and that it is not accessible to people
with some disabilities.

This was our experience in a recent exercise in our library,
digitising old UN documents.

Vern

Subject: [OSS] Google to scan books from big libraries. USA Today - Posted 12/14/2004 12:37 AM
Date: Wed, 15 Dec 2004 20:26:18 +0000 (GMT)
From: Giao Ton <gtt00@hotmail.com>
Reply-To: oss@isoc-vn.org
To: vern.weitzel@undp.org

It could be interesting for our Vietnamese UNICODE mail group, because the
USA Today article noted: "The project also poses other prickly issues,
such as how to convert material written in foreign languages, ..":

http://usatoday.com/tech/news/2004-12-14-google-books_x.htm?POE=click-refer

Best Regards.

Posted 12/14/2004 12:37 AM Updated 12/14/2004 3:07 PM

Google to scan books from big libraries

SAN FRANCISCO (AP) — Google is trying to establish an online
reading room for five major libraries by scanning stacks of
hard-to-find books into its widely used Internet search
engine.

The ambitious initiative announced late Monday gives
Mountain View, Calif.-based Google the right to index
material from the New York Public Library as well as
libraries at four universities — Harvard, Stanford, Michigan
and Oxford in England.

The Michigan and Stanford libraries are the only two so far
to agree to submit all their material to Google's scanners.

The New York library is allowing Google to include a small
portion of its books no longer covered by copyright while
Harvard is confining its participation to 40,000 volumes so
it can gauge how well the process works. Oxford wants Google
to scan all its books originally published before 1901.

Scanning books so they can be read through computers isn't
new. Both Google and Amazon.com already have programs that
offer online glimpses of new books, while an assortment of
other sites have for several years provided digital access to
some material in libraries scattered around the country.

But Google's latest commitment could have the biggest impact
yet, given the breadth of material that the company hopes to
put into its search engine, which has become renowned for
its processing speed, ease of use and accuracy.

"It's a significant opportunity to bring our material to the
rest of the world," said Paul LeClerc, president of the New
York Public Library. "It could solve an old problem: If
people can't get to us, how can we get to them?"

Librarians are also excited about the prospect of creating a
digital record for the reams of valuable material written
long before computers were conceived.

"This is the day the world changes," said John Wilkin, a
University of Michigan librarian working with Google. "It
will be disruptive because some people will worry that this
is the beginning of the end of libraries. But this is
something we have to do to revitalize the profession and
make it more meaningful."

The project gives Google's search engine another potential
drawing card as it faces stiffening competition from Yahoo
Inc. and Microsoft Corp.'s MSN. Attracting visitor traffic
is crucial to Google's financial health because the company
depends on revenue generated by people clicking on
advertising links posted next to the main body of search
results.

Scanning the library books figures to be a daunting task,
even for a cutting-edge company such as Google, whose online
index of 8 billion Web pages already has revolutionized the
way people look for information.

Michigan's library alone contains 7 million
volumes — about 132 miles of books. Google hopes to get the
job done at Michigan within six years, Wilkin said.

Harvard's library is even larger with 15 million volumes.
Virtually all of that material will be off limits until Google
shows it can scan the material without losing or damaging
anything, said Harvard professor Sidney Verba, who also is
director of the university's library.

"The librarians at Harvard are very punctilious about
protecting their great treasures," Verba said.

The project also poses other prickly issues, such as how to
convert material written in foreign languages, and the issue
of protecting copyrighted books.

As it does with new books already included in its search
engine, Google will only allow its users to view the
bibliographies or other snippets of copyrighted books
scanned from the libraries. The search engine will provide
unrestricted access to all material in the public domain —
work no longer covered by copyrights.

The books scanned from libraries will be included in the
same Google index that spans the Web. By throwing everything
into the same pot, Google risks burying the library book
results far below the Web documents containing the same
search terms, reducing the usefulness of the feature,
said Danny Sullivan, editor of Search Engine Watch, an
industry newsletter.

Copyright 2004 The Associated Press. All rights reserved.


To facilitate co-ordination regarding the introduction of OSS SW in Vietnam
