Recent improvements in e-book making

September 2013

My hobby is making e-books out of nineteenth-century and early twentieth-century material. When I have finished processing a book and it has passed a large number of tests, I arrange for it to be made available to anyone in the world, free of charge, by putting it onto the Project Gutenberg bookshelf. There are over 40,000 books on Project Gutenberg, over 700 of which I have made. It recently dawned on me that this process is becoming much easier, and may become easier still in the near future. I hope to explain this briefly here.

There are several steps to follow in making an e-book.

1. Selecting an author. Once I have selected one, I try to make sure that he or she is fully represented on Gutenberg. An author may have written ten to twenty books, but some of the older authors wrote a hundred or more.

2. Finding out what books that author wrote, which of them are already on Project Gutenberg, and which of them might be added if a copy of the book can be found. To answer these questions I consult Copac (a well-presented catalogue of the holdings of the copyright libraries and other major libraries), NGCOBA (New General Catalogue of Old Books and Authors), Abebooks and eBay. To obtain a hard copy of a book I use Abebooks, a consortium of booksellers all over the world, with software that enables you to find an affordable copy. It may also be possible to find a copy on eBay. Some booksellers get the book to you in little more than a day, while others take more than a week. I keep a list of the slow ones and try not to use them.

3. Scanning the book. Currently I use a flat-bed scanner for books that can be opened out flat, and the very portable Handyscanner for books that have to be scanned one page at a time. The flat-bed scanner does two double-page scans per minute, that is four pages, while the Handyscanner does three to five single-page scans per minute. So the Handyscanner is not always faster than the flat-bed scanner, and it takes practice to make sure it is producing good scans. But I suspect that very soon I shall be using an even better scanner.
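To make the comparison concrete, here is the arithmetic for a hypothetical 320-page book, using only the rates quoted in step 3. The page count is an assumption, and any setup or page-turning time not already reflected in those rates is ignored. At four pages a minute, the flat-bed falls between the Handyscanner's slowest and fastest rates.

```python
"""Compare scanning time for one book, using the rates quoted in step 3."""

FLATBED_SCANS_PER_MIN = 2      # double-page scans per minute
PAGES_PER_FLATBED_SCAN = 2     # each scan captures two facing pages
HANDY_PAGES_PER_MIN = (3, 5)   # single-page scans: slowest and fastest rate


def flatbed_pages_per_min() -> int:
    """Pages per minute for the flat-bed: scans per minute times pages per scan."""
    return FLATBED_SCANS_PER_MIN * PAGES_PER_FLATBED_SCAN


def minutes_for_book(pages: int, pages_per_min: float) -> float:
    """Time to scan a whole book at a steady rate."""
    return pages / pages_per_min


if __name__ == "__main__":
    pages = 320  # hypothetical book length
    flat = flatbed_pages_per_min()
    print(f"flat-bed: {flat} pages/min -> {minutes_for_book(pages, flat):.0f} min")
    for rate in HANDY_PAGES_PER_MIN:
        print(f"Handyscanner: {rate} pages/min -> {minutes_for_book(pages, rate):.0f} min")
```

Run as a script, it reports 80 minutes for the flat-bed, against 107 minutes for the Handyscanner at its slowest and 64 at its fastest.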

4. Cleaning up the scans. I use the excellent and free “Scan Tailor”. This produces tidied-up images of each page, straightened and squared up so that the edges of the text blocks are horizontal and vertical, just as they appeared in the book. Most of this work can be done in the background, so you can get on with something else if you like.

5. OCR (Optical Character Recognition). I use ABBYY FineReader 11, which is much better in both speed and accuracy than its predecessor, version 10.

6. Checking and editing the text. I use my own software, which runs under MS-DOS and which I put into the public domain some six years ago, updating it regularly since. I wrote the first version of this software in 1997. Each upgrade is meant to make some class of error that was not previously looked for easy to find. I cannot say that it invariably produces a totally perfect e-book, only a very nearly perfect one. I rely on people reading the book once it is on Gutenberg and letting me know if they find an error. There are probably two or three left in each book.
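The checking software described in step 6 runs under MS-DOS and is not reproduced here. As an illustration only, the Python sketch below shows the general idea of hunting for classes of error: each class is one named pattern, and every match is reported as a suspect for a human to look at. The patterns are hypothetical examples, not the rules that software uses: a short list of OCR "scannos" such as "tbe" for "the", doubled words, digits inside words, and stray characters.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class Suspect(NamedTuple):
    line_no: int  # 1-based line number in the OCR text
    kind: str     # which class of error matched
    text: str     # the matched text


# Hypothetical examples of common OCR confusions ("scannos"); a real list is far longer.
SCANNOS = ("tbe", "tlie", "arid", "modem", "wliich")

# Each class of error is one (name, pattern) pair; adding a class means adding a pair.
CHECKS = (
    ("scanno", re.compile(r"\b(?:" + "|".join(SCANNOS) + r")\b", re.IGNORECASE)),
    # Doubled words are only caught within a line, not across a line break.
    ("doubled word", re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)),
    # Letters on both sides of a digit, e.g. "t1mes"; ordinals like "1st" are not flagged.
    ("digit in word", re.compile(r"\b[A-Za-z]+[0-9]+[A-Za-z]+\b")),
    ("stray character", re.compile(r"[|~^]")),
)


def find_suspects(lines: Iterable[str]) -> Iterator[Suspect]:
    """Yield every match of every check, line by line, for a human to review."""
    for line_no, line in enumerate(lines, start=1):
        for kind, pattern in CHECKS:
            for match in pattern.finditer(line):
                yield Suspect(line_no, kind, match.group(0))


if __name__ == "__main__":
    sample = ["It was tbe best of times,", "and the the worst of t1mes |"]
    for suspect in find_suspects(sample):
        print(suspect)
```

Matches are suspects, not certain errors: "arid" and "that that" are sometimes correct, which is why the output is meant to be checked by eye.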

7. E-book formats tested here. First and foremost I make the FB2 (FictionBook) format. This is a Russian design, very easy to make with software I have written myself, and easy to test using Haali Reader. Most e-readers use the EPUB format. Until very recently this was slow and not very easy to make, but now it can be made in literally two or three seconds. I use Sigil to test the EPUB. Then there is the RTF format, which I use to make the PDF format, the ideal one for reading a book on the Mebook.
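An FB2 book is a single XML file, which helps explain why it is easy to generate. The minimal sketch below assumes the book has already been split into chapters of plain paragraphs, and uses Python's standard ElementTree to produce an FB2 skeleton. The genre code, the "transcriber" nickname and the fixed date are placeholder assumptions. The output should be checked against the FictionBook 2.0 schema, or simply opened in Haali Reader, rather than taken as complete.

```python
import uuid
import xml.etree.ElementTree as ET
from typing import Iterable, Optional, Tuple

FB2_NS = "http://www.gribuser.ru/xml/fictionbook/2.0"
ET.register_namespace("", FB2_NS)  # serialise FB2 as the default namespace


def q(tag: str) -> str:
    """Qualify a tag name with the FB2 namespace."""
    return f"{{{FB2_NS}}}{tag}"


def sub(parent: ET.Element, tag: str, text: Optional[str] = None) -> ET.Element:
    """Append a namespaced child element, optionally with text."""
    el = ET.SubElement(parent, q(tag))
    if text is not None:
        el.text = text
    return el


def make_fb2(title: str, first: str, last: str,
             chapters: Iterable[Tuple[str, Iterable[str]]], lang: str = "en") -> str:
    """Build a minimal FB2 document from (heading, paragraphs) pairs."""
    root = ET.Element(q("FictionBook"))
    desc = sub(root, "description")
    info = sub(desc, "title-info")
    sub(info, "genre", "prose_classic")  # placeholder genre code
    author = sub(info, "author")
    sub(author, "first-name", first)
    sub(author, "last-name", last)
    sub(info, "book-title", title)
    sub(info, "lang", lang)
    doc = sub(desc, "document-info")
    sub(sub(doc, "author"), "nickname", "transcriber")  # hypothetical maker's name
    sub(doc, "date", "2013")                            # placeholder date
    sub(doc, "id", str(uuid.uuid4()))
    sub(doc, "version", "1.0")
    body = sub(root, "body")
    for heading, paragraphs in chapters:
        section = sub(body, "section")  # one section per chapter
        sub(sub(section, "title"), "p", heading)
        for para in paragraphs:
            sub(section, "p", para)
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(make_fb2("A Title", "Jane", "Doe", [("Chapter I", ["First paragraph."])]))
```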

8. Sending the book to Gutenberg. There is the plain TXT format, the original one favoured by Gutenberg, whose founder used to call it Plain Vanilla ASCII, or PVASCII. Lastly there is the XHTML version, which is the form in which you have to present the book to the team at Gutenberg, after it has been thoroughly tested for high technical quality.
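For the plain-text version, the main jobs are replacing typographic characters with ASCII stand-ins and wrapping each paragraph to a fixed width, with blank lines between paragraphs. The sketch below assumes a 72-column wrap and a small, partial replacement table, neither of which is taken from Gutenberg's own rules. The accent-stripping step is lossy, so it only makes sense when a pure-ASCII file is really wanted.

```python
import textwrap
import unicodedata
from typing import Iterable

# Typographic characters mapped to plain-ASCII stand-ins (an assumed, partial table).
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",    # single curly quotes
    "\u201c": '"', "\u201d": '"',    # double curly quotes
    "\u2014": "--", "\u2013": "-",   # em and en dashes
    "\u2026": "...",                 # ellipsis
}


def to_ascii(text: str) -> str:
    """Replace known typographic characters, strip accents, mark the rest with '?'."""
    for fancy, plain in REPLACEMENTS.items():
        text = text.replace(fancy, plain)
    # NFKD splits accented letters into base letter + combining mark; drop the marks.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ASCII (e.g. a pound sign) becomes '?' so it is easy to spot.
    return stripped.encode("ascii", "replace").decode("ascii")


def to_pvascii(paragraphs: Iterable[str], width: int = 72) -> str:
    """Wrap each paragraph to `width` columns, separated by blank lines."""
    wrapped = [textwrap.fill(to_ascii(p), width=width) for p in paragraphs]
    return "\n\n".join(wrapped) + "\n"


if __name__ == "__main__":
    print(to_pvascii(["\u201cWell,\u201d said the caf\u00e9 owner \u2014 rather slowly."]))
```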

9. Listening to the book. Several apps that run on the iPod and other Apple devices can read a book aloud. However, they seem to have difficulty pausing briefly after each paragraph, which I consider essential. I have no doubt this will be fixed within the next year. Meanwhile you can either make a set of MP3 files for use on an e-reader, or play the book directly on your PC or laptop. I used to make a CD for each book as I finished work on it, and then, once DVDs became available, I could put several books on each disc. Listening to an e-book is an ideal way of improving a car journey.
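Where a speech engine accepts SSML (the W3C Speech Synthesis Markup Language), the pause after each paragraph can be requested explicitly with a break element. This article does not establish whether any particular app accepts SSML, and the 800 ms pause and en-GB language tag are assumed defaults. The sketch only shows what the markup looks like.

```python
from typing import Iterable
from xml.sax.saxutils import escape

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def paragraphs_to_ssml(paragraphs: Iterable[str], pause_ms: int = 800,
                       lang: str = "en-GB") -> str:
    """Wrap each paragraph in <p> and follow it with an explicit pause."""
    parts = [f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="{lang}">']
    for para in paragraphs:
        # escape() turns &, < and > into entities so the markup stays well-formed.
        parts.append(f'<p>{escape(para)}</p><break time="{pause_ms}ms"/>')
    parts.append("</speak>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(paragraphs_to_ssml(["It was a dark night.", "The wind rose."]))
```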

10. Formatting for the Kindle. This device uses the Mobipocket format, MOBI, which is the same as the PRC format. There is a Mobipocket web page for making e-books in this format from various sources, such as your XHTML version. As well as MOBI, the Amazon Kindle has a private format of its own, AZW, in which it discourages you from generating e-books. I have bought three Kindles as presents for friends and relatives, and in each case have added over 100 MOBI e-books for them to read. Some of these I made with the Mobipocket software and some with Calibre, a free e-book-making program. There is now an online converter that is very fast and very accurate. Amazon has brought out a utility, KindleGen, which lies behind many of the FB2-to-MOBI and EPUB-to-MOBI converters, but I cannot make it work properly, as it does not seem to include the all-important Table of Contents. The online converter I have just mentioned does make the Table of Contents, and runs in two to three seconds.