I often add highlights or notes to the digital documents I read, but I found annotating ebooks a nuisance. My main problem is that, unlike pdf, epub (the ebook document format) cannot store annotations in the ebook itself. Instead, most ebook readers (such as iBooks or Marvin) store the annotations you make in a local database. This makes it harder to read my annotated ebooks on another device. There is also always a risk that the annotations somehow get lost or detached from the ebook. Not a nice thought if you rely on your annotations! And most importantly, it locks you into a particular ecosystem: you cannot take your ebooks and their annotations and switch to a different system for reading them (either different software or hardware).

The last issue is actually quite a fundamental problem with ebooks, as most of them are copy-protected. DRM ties your books to the particular platform on which you first open them. It should not surprise you, then, that I prefer non-DRM ebooks. (And I have been using this tool quite successfully to remove any DRM from the ebooks that I buy.)

To solve the annotation problem, I decided to convert my ebooks to pdf right after buying them. The reading experience of a typical pdf reader is slightly worse than that of an ebook reader, but if you use a decent pdf reader the difference is not that big. And you will typically get much better annotation and other editing possibilities in return. For reading and annotating pdf I use PDF Expert on an iPad Pro with an iPencil. (Which, by the way, offers a superb tablet experience compared to the Microsoft Windows based tablets that I used for at least eight years before.)

But how to convert ebooks in epub format to pdf? There are some online tools, but they are inconvenient, and the conversion results are mixed.

I recently started to use the conversion program in Calibre, and I have been quite happy with it. (Calibre itself is supposed to be a good ebook management application, but the user interface is really a horrible mess…) Luckily, the conversion program is a standalone command line tool called ebook-convert. On the Mac it can be found in /Applications/calibre.app/Contents/MacOS/ebook-convert. It has many options, see the manual. One particularly useful option is --enable-heuristics, which fixes epubs that would otherwise be converted into pdfs without a clear chapter separation. So I typically use

ebook-convert input.epub output.pdf --enable-heuristics

(Yes, options come last on the command line…) There is a problem, however. The pdf generated this way does not clearly separate or mark individual words. This means that if you want to highlight a word, you cannot easily select it: you can select the letters of a word one by one, but a single click on a word selects just one letter, not the whole word. Moreover, highlights created this way do not show the text you highlighted in the annotation summary panes of applications like Preview or Acrobat Reader. (Why this is the case, I do not know.) This makes it hard to get an overview of all the annotations you made earlier.

There is a solution, however: convert the pdf to another pdf with a better structure using Ghostscript. The command to use is

# output.pdf is the file produced by ebook-convert; final.pdf is the restructured result
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
   -dPDFSETTINGS=/ebook \
   -dNOPAUSE -dBATCH \
   -sOutputFile=final.pdf output.pdf

This requires ghostscript to be installed on your system. My full script for converting ebooks to pdf is therefore

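# Usage: epub2pdf NAME [extra ebook-convert options]
# NAME is the book's file name without the .epub extension.
function epub2pdf() {
    local book="$1"; shift
    # Run Ghostscript only if the Calibre conversion succeeded
    /Applications/calibre.app/Contents/MacOS/ebook-convert "$book".epub \
        "$book".tmp.pdf \
        --enable-heuristics --output-profile=ipad3 "$@" &&
    gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 \
        -dPDFSETTINGS=/ebook \
        -dNOPAUSE -dBATCH \
        -sOutputFile="$book".pdf "$book".tmp.pdf
    local status=$?
    # Remove the intermediate pdf whether or not the conversion worked
    rm -f "$book".tmp.pdf
    return $status
}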
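The script hard-codes the Mac location of ebook-convert. If you also want it to work on a system where Calibre's command line tools are on the PATH, something like the following could look the program up first. This is only a sketch: the function name find_ebook_convert and the CALIBRE_APP override variable are made up for illustration and are not part of Calibre.

```shell
# Print the full path of ebook-convert, or return 1 if it cannot be found.
# CALIBRE_APP is a hypothetical override for the app bundle location.
find_ebook_convert() {
    candidate="${CALIBRE_APP:-/Applications/calibre.app}/Contents/MacOS/ebook-convert"
    if command -v ebook-convert >/dev/null 2>&1; then
        # Found on the PATH
        command -v ebook-convert
    elif [ -x "$candidate" ]; then
        # Fall back to the Mac app bundle
        printf '%s\n' "$candidate"
    else
        echo "ebook-convert not found" >&2
        return 1
    fi
}
```

Inside the conversion function you could then call "$(find_ebook_convert)" instead of spelling out the full path.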

And with that I can happily annotate my ebooks, with the annotations stored within the ebook, and clearly shown in the annotations summary.
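If you have a whole folder of epubs, a small loop around the function saves some typing. The following is a rough sketch for bash or sh, not something I would call finished: the name epub2pdf_all and its two parameters are assumptions made for this example, and it simply skips every book that already has a pdf next to it.

```shell
# Convert every .epub in a directory that has no matching .pdf yet.
# $1: directory to scan (default: current directory)
# $2: converter command, called with the file name minus .epub (default: epub2pdf)
epub2pdf_all() {
    dir="${1:-.}"
    convert="${2:-epub2pdf}"
    for book in "$dir"/*.epub; do
        [ -e "$book" ] || continue      # the glob matched nothing
        base="${book%.epub}"            # strip the extension
        if [ -e "$base.pdf" ]; then
            echo "skipping $book (pdf exists)"
            continue
        fi
        "$convert" "$base" || echo "failed: $book" >&2
    done
}
```

Calling epub2pdf_all ~/Books would then convert whatever is new in that folder and leave the rest alone.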