Jeffrey Adachi


Book Preservation


This page will describe my process for creating digital masters from paper books. (Of course, you should have the rights to make the copies.) There are two main approaches to creating digital versions of paper books:

Because my goal was to restore old books whose print quality had already degraded, I opted for the second approach.

Some topics to be covered:

  1. Scanning
  2. OCR
  3. Handling of Japanese Characters


If the pages of the original book are separable (e.g. spiral bound or falling apart) then it is probably easiest to feed them through a document scanner.
Home 3-in-1 printer/scanner/copiers sometimes have document feeders, but they can be slow, may not support two-sided scanning, and in my experience are prone to jamming.
An alternative is to take the book to a local photocopying business that offers scanning to a PDF.

If the original is a bound book and you don't want to cut the spine off, you might want to use a camera-based scanner.

Single Camera Systems

One type of scanner photographs the book lying open.
Distortion from the curve of the pages is removed digitally.
Scanners of this type are available in the $500-$600 range.

Dual Camera Systems

Dual camera systems open the book only partially and keep the pages flat by pressing them against glass plates arranged in a "V".
This arrangement eliminates the need for digital distortion detection and correction.
Also, the spine of the book is automatically aligned with the "V" so page rotation is minimized.

For the DIYer, a design for a dual camera scanner is described by the DIY Book Scanner project.
Resolution can be upgraded by swapping out the consumer-grade cameras.
Claimed throughput is 1000 pages per hour.
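
As a rough way to reason about resolution and throughput, the sketch below estimates the effective resolution of a camera-based scan and the time needed to capture a book. The function names, the example sensor size (4000 pixels along the long side), and the 10 inch page height are illustrative assumptions, not specifications of any particular scanner or camera. The default of 1000 pages per hour is the claimed figure, not a measured one.

```python
def effective_dpi(pixels_along_page: int, page_length_inches: float) -> float:
    """Pixels per inch, assuming the page exactly fills the frame along this axis."""
    if page_length_inches <= 0:
        raise ValueError("page length must be positive")
    return pixels_along_page / page_length_inches


def capture_minutes(page_count: int, pages_per_hour: float = 1000.0) -> float:
    """Minutes needed to photograph page_count pages at a given throughput."""
    if pages_per_hour <= 0:
        raise ValueError("throughput must be positive")
    return page_count * 60.0 / pages_per_hour


if __name__ == "__main__":
    # Hypothetical camera: 4000 pixels along a 10 inch tall page.
    print(f"{effective_dpi(4000, 10.0):.0f} dpi")
    # Hypothetical 300-page book at the claimed throughput.
    print(f"{capture_minutes(300):.0f} minutes")
```

In practice the page rarely fills the whole frame, so the real resolution will be somewhat lower than this estimate.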

I have had good luck using one of these scanners, the Archivist Quill, built from a kit.
I have written some image post-processing tools for cropping and background removal.
These are likely to be my first open source project.
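
Those tools are not reproduced here. As a minimal sketch of the kind of processing involved, the example below works on a grayscale page stored as a list of rows (0 is black, 255 is white). The function names, the fixed threshold of 200, and the margin parameter are all hypothetical choices made for illustration. Real scans would more likely be handled with an image library and an adaptive threshold.

```python
from typing import List

Image = List[List[int]]  # rows of grayscale values, 0 (black) to 255 (white)


def remove_background(image: Image, threshold: int = 200) -> Image:
    """Set every pixel at or above the threshold to pure white."""
    return [[255 if p >= threshold else p for p in row] for row in image]


def crop_to_content(image: Image, threshold: int = 200, margin: int = 0) -> Image:
    """Crop to the bounding box of pixels darker than the threshold."""
    rows = [y for y, row in enumerate(image) if any(p < threshold for p in row)]
    if not rows:
        return image  # blank page: nothing to crop to
    width = len(image[0])
    cols = [x for x in range(width) if any(row[x] < threshold for row in image)]
    # Expand the box by the margin, clamped to the image edges.
    top = max(rows[0] - margin, 0)
    bottom = min(rows[-1] + margin, len(image) - 1)
    left = max(cols[0] - margin, 0)
    right = min(cols[-1] + margin, width - 1)
    return [row[left:right + 1] for row in image[top:bottom + 1]]


if __name__ == "__main__":
    # Tiny made-up "page": light gray paper with three dark ink pixels.
    page = [
        [230, 225, 228, 231],
        [229, 40, 60, 227],
        [232, 55, 226, 230],
        [228, 231, 229, 233],
    ]
    for row in crop_to_content(remove_background(page)):
        print(row)
```

Running background removal before cropping means the crop is decided by ink alone, not by shading or discoloration in the paper.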

Things to keep in mind:

Optical Character Recognition (OCR)

[Work in progress.]

Things to keep in mind: