Reading real books aloud using free software

Submitted by Sam on 1 June, 2012 - 16:47

I like to listen to audiobooks and podcasts whilst I draw, but I'm running out of books on my reading list which have audio editions. I hate reading from a book knowing that I could be doing something else at the same time if only it was in another format, so I am exploring ways to create my own audiobooks using text-to-speech software.

If a digital version of the book is available, it is quite trivial to get a screenreader to narrate it. The Amazon Kindle 2 has native support for reading text aloud, and Amazon also provide similar functionality in their Kindle app for Windows, through a very nice accessibility plugin. It can automatically turn the pages of the ebook, and reads in two surprisingly listenable synthesized voices at a range of speeds.

When a digital version of the book is not available (which is often the case with the books I'm interested in), the only option is to digitize a paper copy. Having a computer read a book aloud page by page requires the following sequence of steps:

  1. Capture a digital image of the current page
  2. Process the image using optical character recognition software to extract the text
  3. Read the text aloud using text-to-speech software
  4. Turn the page and start the process again
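The four steps above form a simple loop. As a minimal Python sketch (not the batch script used in this post), assuming each step is supplied as a function, the control flow looks like this; `read_book` and its parameter names are hypothetical.

```python
from typing import Callable, List


def read_book(
    pages: int,
    capture_page: Callable[[int], str],   # hypothetical: returns the path of the page image
    extract_text: Callable[[str], str],   # hypothetical: OCR an image and return its text
    speak: Callable[[str], None],         # hypothetical: read text aloud
    turn_page: Callable[[], None],        # hypothetical: manual or mechanical page turn
) -> List[str]:
    """Run capture -> OCR -> speech for each page, turning the page in between."""
    transcript = []
    for page in range(pages):
        image = capture_page(page)        # step 1: capture a digital image
        text = extract_text(image)        # step 2: optical character recognition
        transcript.append(text)
        speak(text)                       # step 3: text-to-speech
        if page < pages - 1:
            turn_page()                   # step 4: no turn needed after the last page
    return transcript
```

Passing stub functions (for example, ones that simply return file names) is an easy way to check the loop before wiring in the real camera, OCR and speech tools.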

I have written a very small batch script that chains together calls to several pieces of open-source software to carry out the first three steps in this process, and I now have a rather basic system that can read a page of text from a book with a tolerable level of accuracy.

In my setup, I place a book on a flat surface, pull a lamp down to light it as evenly as possible and mount a camera on a tripod above it. I then connect to the camera from a computer, process the image using ImageMagick and Tesseract, and read the text out with eSpeak.

I am currently using my Android phone to capture images as it happened to be the quickest to tether to my computer (using the IP Webcam app for Android), but the whole setup could very easily be tailored for use with a tethered digital camera, perhaps using the remote control capture features of gphoto. I intend to connect my Nikon D5000 and my Nikon D70 in future, as these will provide much better image quality for the text recognition software to work with.
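Swapping the capture device mostly means swapping the command that produces the page image. The Python sketch below is purely illustrative: it builds either the wget request used for the IP Webcam app (port 8080 and `photoaf.jpg`, as in the script further down) or a gphoto2 command for a tethered camera. The function name `capture_command` is hypothetical, and the gphoto2 flags are an assumption to check against `gphoto2 --help` for your installed version and camera.

```python
from typing import List, Optional


def capture_command(source: str, output: str = "photoaf.jpg",
                    phone_ip: Optional[str] = None) -> List[str]:
    """Build the command that captures one page image to `output`.

    source="ipwebcam": fetch a snapshot from the Android IP Webcam app with wget.
    source="gphoto2": trigger a tethered camera and download the shot (assumed flags).
    """
    if source == "ipwebcam":
        if not phone_ip:
            raise ValueError("phone_ip is required for the IP Webcam source")
        # -O overwrites the same file on each run instead of creating photoaf.jpg.1
        return ["wget", "-O", output, f"http://{phone_ip}:8080/photoaf.jpg"]
    if source == "gphoto2":
        return ["gphoto2", "--capture-image-and-download",
                "--filename", output, "--force-overwrite"]
    raise ValueError(f"unknown capture source: {source!r}")
```

The returned list can be passed straight to `subprocess.run(cmd, check=True)`, so the rest of the pipeline does not need to know which device took the photo.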

I have developed the book reader on Windows 7 so far, but all of the software I use is cross-platform. 
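Because the tools are cross-platform, the main thing that changes between systems is how each executable is located. Here is a minimal Python sketch, assuming the Windows install folders used in the script below, and assuming that on Linux or macOS the package manager puts each tool on the PATH under the same name (some newer distributions ship `espeak-ng` instead of `espeak`, which would need adjusting). `tool_paths` and `WINDOWS_TOOLS` are hypothetical names.

```python
import sys
from typing import Dict, Optional

# Install locations taken from speak.bat; adjust to match your own folders.
WINDOWS_TOOLS = {
    "wget": r"C:\Program Files\GnuWin32\bin\wget.exe",
    "convert": r"C:\Program Files\ImageMagick-6.7.7-Q16\convert.exe",
    "tesseract": r"C:\Program Files\Tesseract-OCR\tesseract.exe",
    "espeak": r"C:\Program Files\eSpeak\command_line\espeak.exe",
}


def tool_paths(platform: Optional[str] = None) -> Dict[str, str]:
    """Map each tool name to the executable to invoke on the given platform."""
    platform = platform or sys.platform
    if platform.startswith("win"):
        return dict(WINDOWS_TOOLS)
    # Assumption: on Linux/macOS the commands are on PATH under their plain names
    return {name: name for name in WINDOWS_TOOLS}
```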

Software I used:

  • Wget for Windows 

http://www.gnuwin32.sourceforge.net/packages/wget.htm
Download the Setup option, e.g. "wget-1.11.4-1-setup.exe"
Run the setup to install 

  • ImageMagick for Windows

http://www.imagemagick.org/script/binary-releases.php#windows
Download the self-installer, e.g. "ImageMagick-6.7.7-5-Q16-windows-dll.exe"
Run the setup to install

  • Tesseract-OCR for Windows 

http://www.code.google.com/p/tesseract-ocr/downloads/list
Download the Windows installer, e.g. "tesseract-ocr-setup-3.01-1.exe"
Run the setup to install 

  • eSpeak for Windows

http://www.espeak.sourceforge.net/download.html
Download the zip compiled for Windows, e.g. "espeak-1.46.02-win.zip"
Extract the zip, e.g. to C:\Program Files\eSpeak
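Before running the script, it can help to confirm that each installer put its executable where the script expects it. Here is a small Python sketch, assuming the paths used in the script below (the ImageMagick folder name in particular changes with the version installed). `missing_tools` is a hypothetical helper.

```python
import os
from typing import Dict, List


def missing_tools(paths: Dict[str, str]) -> List[str]:
    """Return, sorted, the names of tools whose executable file does not exist."""
    return sorted(name for name, path in paths.items() if not os.path.isfile(path))


if __name__ == "__main__":
    # Install locations as used in speak.bat; edit to match your own folders.
    configured = {
        "wget": r"C:\Program Files\GnuWin32\bin\wget.exe",
        "convert": r"C:\Program Files\ImageMagick-6.7.7-Q16\convert.exe",
        "tesseract": r"C:\Program Files\Tesseract-OCR\tesseract.exe",
        "espeak": r"C:\Program Files\eSpeak\command_line\espeak.exe",
    }
    for name in missing_tools(configured):
        print(f"not found: {name} ({configured[name]})")
```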

I use the following example script to tie it all together (I saved mine as "speak.bat"):

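@echo off
rem Fetch a photo from the IP Webcam server; -O overwrites photoaf.jpg on each run instead of saving photoaf.jpg.1
call "C:\Program Files\GnuWin32\bin\wget.exe" -O photoaf.jpg http://{IP OF ANDROID WEBCAM SERVER}:8080/photoaf.jpg
rem Convert the JPEG to an uncompressed TIFF for Tesseract
call "C:\Program Files\ImageMagick-6.7.7-Q16\convert.exe" -density 150x150 -compress none photoaf.jpg photoaf.tiff
rem Run OCR on the TIFF; Tesseract writes the recognised text to booktext.txt
call "C:\Program Files\Tesseract-OCR\tesseract.exe" photoaf.tiff booktext -l eng
rem Read the text file aloud with an English voice
call "C:\Program Files\eSpeak\command_line\espeak.exe" -v en -f booktext.txt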

This batch script uses wget to request an image from the IP Webcam server running on the Android phone (fill in the phone's IP address in place of the {} placeholder) and saves it as 'photoaf.jpg' in the folder the batch file is run from. It then passes the JPEG to ImageMagick to convert into a TIFF (compression must be disabled for Tesseract to read it correctly), gives the TIFF to Tesseract for processing, and finally reads the resulting text file, booktext.txt, aloud using eSpeak.
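For anyone who would rather script the same chain in Python, the four commands in the batch file can be expressed as argument lists and run in order. This is a sketch, not the script used here: it assumes the tools are on the PATH (substitute the full Windows paths if they are not), and `pipeline_commands` and `run_pipeline` are hypothetical names. Unlike the batch file, `check=True` makes it stop at the first step that fails.

```python
import subprocess
from typing import List


def pipeline_commands(phone_ip: str, image: str = "photoaf.jpg",
                      tiff: str = "photoaf.tiff", text_base: str = "booktext") -> List[List[str]]:
    """The four speak.bat steps as argument lists (tool names assumed to be on PATH)."""
    return [
        # 1. Fetch a snapshot from the IP Webcam app, overwriting any previous image
        ["wget", "-O", image, f"http://{phone_ip}:8080/photoaf.jpg"],
        # 2. Convert to an uncompressed TIFF for Tesseract
        ["convert", "-density", "150x150", "-compress", "none", image, tiff],
        # 3. OCR; Tesseract appends ".txt" to the output base name
        ["tesseract", tiff, text_base, "-l", "eng"],
        # 4. Read the recognised text aloud with an English voice
        ["espeak", "-v", "en", "-f", f"{text_base}.txt"],
    ]


def run_pipeline(phone_ip: str) -> None:
    """Run each step in turn; check=True raises CalledProcessError on the first failure."""
    for cmd in pipeline_commands(phone_ip):
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes it easy to print or test the commands without a camera attached.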

Here is a video of the results so far:

As I say in the video, I think the accuracy can be improved by using a better camera and better lighting, and I will be looking at ways to automate turning the page after it has been read. A great resource for the kinds of design problems I can anticipate is http://diybookscanner.org, which is a community of people who have made software and hardware to digitize books. 

This Work, Reading real books aloud using free software, by Sam Haskell is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike license.