
Tips on Preparing a Scanned Book for eBook Conversion

Posted on 2013-Apr-30

by Paul Salvette

Mission Possible

Often we have clients who have been in the writing game for quite some time and are interested in taking a novel from their backlist and turning it into an eBook for sale at Amazon, Nook, Kobo, and other eBook platforms. If you happen to have a Word version of your novel--great! BB eBooks is happy to offer you our standard rates for eBook conversion. If you have a PDF file from your publisher that was generated from a desktop publishing program like InDesign or Quark, we can usually work with those too but may need to charge a slightly higher rate depending on how the PDF was compiled.

Unfortunately, Murphy’s Law is generally in effect for this situation, and the original publishing company has probably misplaced the computer file or cannot be reached via email or phone. In this case, we would recommend you perform what we call an “OCR Job” (short for Optical Character Recognition), which involves scanning the book, using a computer program to detect the characters, copyediting in Word, Pages, or another word processor of your choosing, and sending the manuscript back to us. Let’s talk about how best to approach this.

Destroying the Binding and Scanning

While book destruction is generally under the purview of bizarre cults, fascist political movements, and other weirdos, in this case we recommend you destroy the binding of your cherished tome to get the best scan results. You can usually finagle the pages out of the binding with some elbow grease if it is only held in with glue or, alternatively, you can use an X-Acto knife or scissors to cut out each page if it is stitched (make sure you don’t cut into the content). Once you have the pages ripped out, you should find a colleague with a decent scanner. Scan each page as a JPEG image in as high a resolution as possible (at least 300 dpi) and number the files on your computer in some sort of logical order (e.g. page1.jpg, page2.jpg, etc.).
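One wrinkle with names like page1.jpg and page2.jpg is that most file browsers sort them alphabetically, so page10.jpg lands before page2.jpg. Below is a minimal Python sketch of one way to get the scans into true page order. The function names and the four-digit padding are our own assumptions for illustration, not part of any particular scanner's software.

```python
import re
from pathlib import Path


def page_number(path: Path) -> int:
    """Return the first run of digits in the file name, e.g. page12.jpg -> 12."""
    match = re.search(r"\d+", path.stem)
    if match is None:
        raise ValueError(f"no page number in {path.name}")
    return int(match.group())


def pages_in_order(folder: str) -> list[Path]:
    """List the JPEG scans in a folder, sorted numerically by page number."""
    scans = [p for p in Path(folder).iterdir()
             if p.suffix.lower() in {".jpg", ".jpeg"}]
    return sorted(scans, key=page_number)


def zero_padded_name(path: Path, width: int = 4) -> str:
    """Build a name such as page0012.jpg, which also sorts correctly alphabetically."""
    return f"page{page_number(path):0{width}d}{path.suffix.lower()}"
```

Renaming each scan to its zero-padded name before combining the files should keep the pages in order no matter which program lists them.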

Alternatively, you can mail your book to a company like 1DollarScan, which will scan your book for a nominal fee. Unfortunately, BB eBooks does not perform the manual process of scanning at this time.

Running OCR or Just Send to BB eBooks

Now that you have hundreds of images, one for each page, you need to compile them into one big PDF file. If you are a client of BB eBooks, please contact us and we can take care of this step for you if you send us a batch of JPEGs through Dropbox. For you DIYers, you will need a decent (i.e. paid) version of Adobe Acrobat. The steps are as follows:

1) Convert Images to PDF: Select all the image files, right click, and select Combine Supported files in Acrobat.

Step 1

2) Finish PDF Conversion: Order the pages as necessary, click on the biggest file size option, and click Combine Files to create the PDF.

Step 2

3) Run the OCR Algorithm: Once in the PDF, click on Tools > Recognize Text > In This File to run the algorithm on all pages.

Step 3

4) Grab a beer, smoke, coffee, latte, whatever: This step takes a while. Adobe’s software is trying to turn text that is stored as an image (just a picture of the letters) into actual character data that a computer can search, select, and reflow. This is critical for an eBook to work properly.

5) Save as a Word Document (or RTF): Now that the OCR algorithm has done its magic, you can save the file in a format your favorite word processor can open for further processing: Save As > Microsoft Word > Word Document.

Step 5

Copyediting an OCR Conversion

Now comes the hard part: actually copyediting the mess that the OCR software left. While this is a better option than retyping everything from scratch, it is still a long and arduous process. BB eBooks, unfortunately, cannot perform this step for you since we do not have skilled copyeditors in our shop (very hard to find in Thailand). You will notice that your manuscript is a bit of a hot damn mess. Below is an example (and we’ve seen much worse):

OCR Results

The first and most obvious problem is typos, and a spell checker is a godsend here. Some common ones are as follows:

  • “VV” (2 Vs) for “W”
  • The number “1” replacing a lowercase “l”
  • The number “0” replacing an uppercase “O”
  • Odd spacing between words
  • Important: If a word in your print book was hyphenated across a line break, you must take the hyphen out and create one word (e.g. turn “in- appropriate” into “inappropriate”). The spell checker will sometimes, but not always, catch this error.
  • Lost formatting: add italicized and bolded text back where necessary
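If you are comfortable with a little scripting, some of the typos listed above can be caught in bulk before you start reading line by line. The following is a rough Python sketch, assuming you have saved the manuscript as plain text. The function names and the exact patterns are our own guesses at sensible rules, they are deliberately conservative, and they will miss plenty, so treat the output as a head start rather than a finished copyedit.

```python
import re

# A word fragment, a hyphen, whitespace, then the rest of the word: "in- appropriate".
SPLIT_WORD = re.compile(r"\b([A-Za-z]+)-\s+([a-z]+)")


def fix_common_ocr_errors(text: str) -> str:
    """Apply a few conservative fixes for typical OCR misreadings."""
    # "VV" at the start of a word is almost certainly a misread "W".
    text = re.sub(r"\bVV(?=[A-Za-z])", "W", text)
    # The digit 1 with a letter on both sides is treated as a lowercase l.
    text = re.sub(r"(?<=[A-Za-z])1+(?=[A-Za-z])", lambda m: "l" * len(m.group()), text)
    # A zero opening a word, or sitting between capitals, is treated as an uppercase O.
    text = re.sub(r"\b0(?=[a-z])", "O", text)
    text = re.sub(r"(?<=[A-Z])0(?=[A-Z])", "O", text)
    # Collapse runs of spaces or tabs into a single space.
    text = re.sub(r"[ \t]{2,}", " ", text)
    return text


def find_split_words(text: str) -> list[str]:
    """List candidates such as "in- appropriate" so they can be reviewed by hand."""
    return [m.group(0) for m in SPLIT_WORD.finditer(text)]


def join_split_words(text: str) -> str:
    """Join every candidate into one word; this also removes genuine hyphens."""
    return SPLIT_WORD.sub(r"\1\2", text)
```

Note that join_split_words cannot tell “in- appropriate” from a real compound like “well- known” that happened to break at the end of a line, so it is safer to review the output of find_split_words first and fix the exceptions by hand.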

The next part is ensuring that paragraphs have a proper carriage return after them and are not “broken” by a carriage return. The best way to do this is to turn on paragraph marks and inspect all your paragraphs:

Paragraph Marks

You can delete the carriage returns in the middle by just pressing the delete key and then adding a space to substitute for the carriage return. To add a carriage return, you just need to press enter.
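For a long book, a script can take a first pass at rejoining broken paragraphs. Here is a minimal Python sketch of one possible heuristic, assuming each paragraph of the exported text is a separate string: if a paragraph does not end with sentence-closing punctuation, it is probably broken and gets glued to the next one with a space. The rule and the names are our own assumptions. It will wrongly merge chapter headings and other lines that legitimately have no closing punctuation, so you still need to inspect the result with paragraph marks turned on.

```python
import re

# Sentence-closing punctuation, optionally followed by a closing quote or bracket.
SENTENCE_END = re.compile(r"[.!?:…][\"'”’)\]]*$")


def merge_broken_paragraphs(paragraphs: list[str]) -> list[str]:
    """Join paragraphs that appear to have been split mid-sentence."""
    merged: list[str] = []
    carry = ""  # text of a paragraph that has not reached a sentence end yet
    for para in paragraphs:
        para = para.strip()
        if not para:
            continue  # skip empty paragraphs left behind by the OCR software
        carry = f"{carry} {para}" if carry else para
        if SENTENCE_END.search(carry):
            merged.append(carry)
            carry = ""
    if carry:
        merged.append(carry)  # keep any trailing text with no closing punctuation
    return merged
```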

Trying to actually format the eBook (proper alignment, chapter headings, etc.) is a mess since Adobe’s software adds all sorts of strange corruption into it. However, if you have text that is correct with a carriage return at the end of each paragraph, we can work with it. Please feel free to send BB eBooks your copyedited OCR manuscript and we’ll take care of this step and get your manuscript turned into an eBook. Good luck!

Label: Technical and Design
