My life over the last two years has completely revolved around the digitization of books, but why? Why are we digitizing books? I’d like to think that it’s because we realize that the majority of the world’s knowledge is trapped in the written word. It is trapped because the physicality of books locks the data in a container that has only one delivery vehicle.
Since the invention of the written word, mankind has recorded its history one event at a time by writing it down. As a result, man has spent countless hours poring over cave drawings, scrolls and manuscripts in hopes of learning from those who had the foresight to record the events that were important to them in their microcosm of time.
Technologies developed in the last half of the 20th century have given us the unique opportunity not only to record the vast knowledge accumulated in the world’s libraries through digitization, but also to make that information available to the world via a plethora of delivery methods.
Think of the possibilities: what if all the world’s medical research journals could be digitized and made available to today’s researchers in a data set, allowing them to cross-reference modern medical techniques with Tupachiy, the ancient Peruvian alchemy of gathering and mixing herbs and other elements to create a specific treatment for each individual and malady?
Myriad scanning programs are simultaneously digitizing books, newspapers, family records and the like. Most of these programs are executed under the guise of sharing knowledge when, in actuality, they are digitizing mountains of content to further their own interests.
Google is scanning millions of books with a number of libraries around the world, in hopes of making that content searchable through the Google search engine and ultimately profiting from the sale of the books. Amazon has its own digitization program, which makes snippets of books viewable through Amazon.com solely for the purpose of selling books online, and of repurposing the books and selling them exclusively through its eBook reader, the Kindle.
Google’s efforts, although monumental in size, have fallen far short when it comes to quality. Because those efforts are centered on its search engine, Google is not concerned with capturing all the information contained within the books it is digitizing. Scholar after scholar has expressed disappointment with the results. Researchers are finding numerous pages missing, pages folded over, and poorly executed OCR, all of which render the information woefully incomplete.
It’s baffling to me that quality isn’t more important when it comes to the digitization of recorded history. I shudder to think where we would be today if Nicolaus Copernicus’s De Revolutionibus Orbium Coelestium had had every third page removed before it went to print. Man most likely would never have set foot on the moon. Copernicus’s recorded theories had immense influence on later thinkers of the scientific revolution, including such major figures as Galileo, Descartes, and Newton.
We owe it to our children and future generations to digitize all the world’s printed history at the highest quality available, not only for today’s technologies, but for technologies not yet even conceived, let alone invented.
What is your take on the OCLC document “Shifting Gears” (http://www.oclc.org/programs/publications/reports/2007-02.pdf) and the ideas of access vs. preservation and quantity vs. quality?
“Shifting Gears” does a great job touching on a lot of the issues institutions face when it comes to digitization. We all need to find that balance between quality and quantity, access and preservation.
I think we need to take the long tail into account: how might these images be used 25, 50 or even 100 years from now? I feel it’s important to capture at a quality level that allows for multiple uses at the back end of the workflow.
There is so much information held in climate-controlled vaults all around the world, information that is not readily available to scholars, let alone the general public. I am a firm believer in making all that content available to the masses.
Although access is ultimately the most important outcome of digitization, without a certain level of quality the exercise becomes futile. If you can accurately OCR only 80% of the content digitized, you get only part of the story. What if the cure for cancer is out there, just waiting for two or three documents to be digitized and linked through a data set, and 20% of the information isn’t available to researchers because of poor quality? Our hopes of finding a cure become that much more difficult to realize, or maybe the cure is never found.