EPUB and KindleGen Tutorial

The eBook Design and Development Guide

For complete access to all the templates, tips, and tricks that BB eBooks uses for eBook production, please consider buying the eBook Design and Development Guide at Amazon for only $6.99. In it you will find comprehensive HTML, CSS, and Regular Expression tutorials, as well as a step-by-step workflow, available only in this guide, for turning a sloppy manuscript into a beautiful eBook. A PDF version is available upon request following purchase.



Introduction

For many independent authors and small presses, converting eBooks from HTML source markup with third-party programs such as Calibre, Sigil, or Scrivener may be a “good enough” way to provide a decent reading experience for customers. These programs are useful; however, they may introduce unnecessary metadata, mangle CSS styling, and create anomalies in the Table of Contents. If you are a perfectionist who isn’t afraid to get her hands dirty with some nerdiness, this section discusses the actual structure of an EPUB file and how to convert EPUB files into a MOBI/KF8 eBook with Amazon’s KindleGen program. You will no longer depend on eBook conversion software written by someone else, and you will be able to build your eBooks from the ground up at a professional level of quality that even some of the big publishing houses lack.

The EPUB format is based on the International Digital Publishing Forum’s (IDPF) specification for the EPUB 2.0.1 format, which is an open standard. The actual specification is available on the IDPF website, and it is not a particularly exciting read: it is geared toward programmers and technophiles rather than self-publishers and small presses. This guide presents a basic understanding of the EPUB format so that you can hand-code your own EPUBs.

The EPUB 3.0 specification was released in late 2011. Unfortunately, the eBook stores have been very slow to adopt this standard. The EPUB 3 specification allows for more complex eBook designs that include audio/video embedding, footnote support, and even JavaScript. While new software for eReading devices is being developed, BB eBooks will continue to focus on the EPUB 2.0.1 standard. Understanding the concepts of EPUB 2.0.1 will prepare you well for when EPUB 3 is widely adopted.

The MOBI/KF8 format is loosely based on the IDPF’s EPUB standards, but with Amazon’s own proprietary spin. To produce a MOBI/KF8 file, you convert your EPUB with KindleGen, a free command-line program from Amazon.com. Using KindleGen with a source EPUB is the best way to get a professional MOBI/KF8 file.

This advanced guide may be a little challenging for those without an IT background, and not every self-publisher needs to understand it. However, once you understand the principles of the EPUB format, you will have a solid foundation for designing eBooks. Free EPUB boilerplates, both for regular EPUB eBooks and for the source EPUB used in MOBI/KF8 compilation, are available on the BB eBooks Developers page for your convenience.

In addition to the tools utilized in the previous sections, you need the following software as prerequisites for manually coding EPUB and MOBI/KF8:

  • The command-line zip.exe program [free]
  • The command-line KindleGen.exe program from Amazon [free]

It is also helpful to have the technical guidelines published by the major eBook stores on hand when developing your EPUB package:

  • Amazon Kindle Publishing Guidelines [pdf]
  • Barnes & Noble PubIt! Guidelines [pdf]
  • iBookstore Asset Guidelines [pdf]

Important Note: If you are using Windows, make sure that file extensions are not hidden in your directories (i.e. the file mybook.epub appears as mybook.epub and not as mybook). This is essential because you will be changing file extensions in this section. A tutorial on how to do this in Windows XP/Windows 7 is available from How-To Geek.


Splitting the HTML Source File into Numerous HTML Files

Before delving into the EPUB structure, you need to split your massive source HTML file into numerous small pieces. eReading devices have limited processing power and are not known for parsing HTML quickly. If your eBook is one large source file (e.g. War and Peace is approximately a 3.2MB HTML file), it will cause serious lag and readability issues when a reader tries to open it. Splitting your HTML into numerous smaller files also has the added benefit of automatically inserting a page break between each file. Because each split starts a new page, the convention is to split HTML files at each chapter.

Make sure that each HTML file is smaller than about 300KB. You can use the exact same HTML Head Section for each file; XHTML 1.1 boilerplate is provided for your convenience at the BB eBooks website. Give your individual content files logical names for future reference (e.g. content001.html, content002.html, etc.). The order in which the eBook presents the individual HTML files is defined in the Spine Section of the content.opf file (discussed below).
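If each chapter in your source file begins with an h2 heading, the split and the naming convention can be scripted. The following Python code is a minimal sketch under that assumption: the helper names split_at_chapters and wrap_chunks, the h2 split rule, and the shortened head section are all hypothetical, so paste in the full head section from your own boilerplate.

```python
import re
from html import escape

# Hypothetical, shortened head section; replace it with your full
# XHTML 1.1 boilerplate so every split file shares the same head.
HEAD = """<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8" />
<link type="text/css" rel="stylesheet" href="../css/bbstylesheet-epub.css" />
<title>{title}</title>
</head>
<body>
"""
FOOT = "</body>\n</html>\n"
MAX_BYTES = 300 * 1024  # rough per-file ceiling of about 300KB


def split_at_chapters(body_html: str) -> list[str]:
    """Split body markup immediately before every <h2> chapter heading."""
    # The zero-width lookahead keeps each <h2> at the start of its own chunk.
    parts = re.split(r"(?=<h2[\s>])", body_html)
    return [part for part in parts if part.strip()]


def wrap_chunks(chunks: list[str], title: str) -> dict[str, str]:
    """Name chunks content001.html, content002.html, ... and add head and foot."""
    files = {}
    for number, chunk in enumerate(chunks, start=1):
        name = f"content{number:03d}.html"
        html = HEAD.format(title=escape(title)) + chunk + FOOT
        if len(html.encode("utf-8")) > MAX_BYTES:
            print(f"warning: {name} is larger than about 300KB; split it further")
        files[name] = html
    return files
```

A regular expression is enough here only because the split point is a single, predictable tag; manuscripts with inconsistent headings need a cleanup pass first.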

Splitting one HTML file into numerous smaller ones creates a challenge with internal hyperlinks. For instance, a hyperlink to a footnote may look something like this:

…in regards to this publication.<a class="myfootnotes" href="#footnote23">23</a> Moving on…

If this hyperlink is now in a file called content051.html and the footnote section is now in content077.html, the link to this footnote is now broken. That is why after you split your files, you need to go back and correct all of your internal hyperlinks to ensure that they reference the correct content file. Yes, this is a tedious and frustrating process, but the alternative of having one monster-sized HTML source file will only create frustrated readers.

In the example above with the footnotes section in the content077.html file, your markup would change to the following:

…in regards to this publication.<a class="myfootnotes" href="content077.html#footnote23">23</a> Moving on…
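Tracking down every broken link by hand is tedious. One way to automate it, sketched below in Python, is to record which split file each id now lives in and then rewrite every same-file link (href="#...") whose target moved elsewhere. This is a rough sketch rather than a robust HTML parser: it assumes attributes are written with double quotes, as in the examples above, and the function names are hypothetical.

```python
import re

# Assumes double-quoted attributes, as in the markup examples above.
ID_ATTR = re.compile(r'(?<![\w-])id="([^"]+)"')
LOCAL_HREF = re.compile(r'href="#([^"]+)"')


def build_anchor_map(files: dict[str, str]) -> dict[str, str]:
    """Map each id value to the name of the split file that now contains it."""
    anchor_map = {}
    for name, html in files.items():
        for anchor in ID_ATTR.findall(html):
            anchor_map[anchor] = name
    return anchor_map


def fix_internal_links(files: dict[str, str]) -> dict[str, str]:
    """Point href="#x" at "otherfile.html#x" whenever x moved to another file."""
    anchor_map = build_anchor_map(files)
    fixed = {}
    for name, html in files.items():
        def rewrite(match, current=name):
            anchor = match.group(1)
            target = anchor_map.get(anchor)
            if target is None or target == current:
                return match.group(0)  # unknown id or same file: leave untouched
            return f'href="{target}#{anchor}"'

        fixed[name] = LOCAL_HREF.sub(rewrite, html)
    return fixed
```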

Another gotcha is to make sure that your CSS file has the correct relative path in the link element of every HTML file. BB eBooks uses a sub-directory system where all content files are in a directory called content and CSS files are in a sibling directory called css. The directory structure is discussed in more detail below. Therefore, the link element in the head section of each HTML file would look something like this:

<link type="text/css" rel="stylesheet" href="../css/bbstylesheet-epub.css" />

Please note the relative path ../css/ instructs the HTML file to go one directory up and then into the css directory. The same principle applies to images when working with the src attribute. Ensure that your relative file paths are correct, because this is a frequent source of errors that the EPUB validator may not catch.
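Because a validator may not flag a bad path, it can help to resolve each relative path yourself. This Python sketch (hypothetical function names and a regex-based attribute scan) resolves href and src values against the directory of the HTML file, the same way ../css/ is interpreted, and reports any target that is not among the package paths you supply.

```python
import posixpath
import re

LINK_ATTR = re.compile(r'\b(?:href|src)="([^"]+)"')


def resolve(from_file: str, link: str) -> str:
    """Resolve a relative link against the directory of from_file."""
    path = link.split("#", 1)[0]  # drop any #fragment
    if not path:  # a pure "#anchor" link points into the same file
        return from_file
    return posixpath.normpath(posixpath.join(posixpath.dirname(from_file), path))


def broken_links(from_file: str, html: str, package_files: set[str]) -> list[str]:
    """Return links in html whose resolved target is not in package_files."""
    broken = []
    for link in LINK_ATTR.findall(html):
        if "://" in link or link.startswith("mailto:"):
            continue  # external links are not part of the package
        if resolve(from_file, link) not in package_files:
            broken.append(link)
    return broken
```

For example, resolving ../css/bbstylesheet-epub.css from OEBPS/content/content001.html yields OEBPS/css/bbstylesheet-epub.css.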


Cover Page HTML

Some eReading applications (most notably Adobe Digital Editions) do not automatically display the cover as the first page. To ensure that your cover image is the first page of the eBook, create a simple HTML file that only displays the image, and insert this file into your EPUB package. The cover image is typically a JPG about 800px high and 500-600px wide; however, feel free to change these dimensions based on the eReading device you are targeting with your EPUB.

A sample coverpage.html file (with cover.jpg being the actual cover image) is as follows:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8" />
<title>Sample EPUB 2.0.1 File for Developers</title>
<style type="text/css">
body {text-align: center; padding: 0; margin: 0;}
div {text-align: center; padding: 0; margin: 0;}
img {padding: 0; margin: 0; height: 100%;}
</style>
</head>
<body>
<div>
<img src="cover.jpg" alt="Cover for Sample EPUB 2.0.1 File for Developers" />
</div>
</body>
</html>

There is a sample coverpage.html file provided on the BB eBooks Developers page for your convenience. You will use this file in your EPUB package in later portions of this guide.

Tip: The height: 100%; CSS rule scales the image to fill the full height of the eReading device’s screen.

Important Note: The Cover Page HTML file should not be used when building MOBI/KF8 files for the Kindle. All Kindle eReading devices have a special button in the user interface that allows the reader to access the cover, so this file is not needed. Furthermore, KindleGen throws errors during compilation if you try to reference your cover.jpg file anywhere in your HTML. This will be discussed later in the guide.


Table of Contents HTML (Required)

eBooks actually have two separate Tables of Contents: an NCX (or Meta) Table of Contents and an HTML Table of Contents. Different eReading devices use these two Tables of Contents in different ways. For example, Adobe Digital Editions shows the NCX Table of Contents in the left-hand sidebar, while the HTML Table of Contents appears inside the reading pane.

The iBooks app uses the NCX Table of Contents whenever you tap the small TOC icon in the upper-left corner. The Kindle handles the two Tables of Contents a little differently: the older e-ink Kindles use the NCX Table of Contents for chapter-to-chapter navigation with the 5-way controller, but tapping “Table of Contents” on the Kindle Fire or in Kindle for iOS takes you to the HTML Table of Contents.

Functionally, both Tables of Contents are similar in that they provide a way for the reader to easily navigate the document. However, they are extremely different in terms of syntax: the HTML Table of Contents is a standard HTML file with numerous internal hyperlinks, and the toc.ncx file is an XML oddity based on the EPUB standard discussed below. There is a lot of not-so-good advice online that says you only need one type of Table of Contents or the other. The fact of the matter is that a well-designed eBook will have both NCX and HTML Tables of Contents. It may seem strange for fiction authors to have a Table of Contents in their eBook in the first place, but you have to ensure that an easy-to-use navigation system is available to the reader whether the eBook is a short story, non-fiction work, novel, or otherwise.

You need to create a separate HTML file that contains only internal hyperlinks to different parts of your eBook. Its sole purpose is to function as a Table of Contents that lets the reader click through to different parts of the eBook. The HTML Table of Contents should be standalone and not contain actual content. This ensures that you can easily move it to the front or the back of the eBook based on your or your client’s preference.

Below is a sample HTML Table of Contents, which is part of the EPUB boilerplate on the BB eBooks Developers page:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8" />
<link type="text/css" rel="stylesheet" href="../css/bbstylesheet-epub.css" />
<title>Sample EPUB 2.0.1 File for Developers</title>
</head>
<body>
<h2>Table of Contents</h2>
<p class="toctext"><a href="../coverpage.html">Cover</a></p>
<p class="toctext"><a href="content001.html#h1-1">Sample EPUB 2.0.1 File for Developers</a></p>
<p class="toctext"><a href="content002.html#h2-1">Chapter 1 - Getting Started</a></p>
<p class="toctext"><a href="content003.html#h2-2">Chapter 2 - Media</a></p>
<p class="toctext"><a href="content004.html#h2-3">Chapter 3 - Caution</a></p>
<p class="toctext"><a href="content005.html#h2-4">Chapter 4 - MOBI/KF8</a></p>
</body>
</html>

Please note that the file htmltoc.html is located in the same directory as the content HTML files. However, the coverpage.html file is located in the parent directory. You can adjust this based on your preferences. As you can see, there is a listing of links that go to different anchors within the content files of the eBook. Feel free to be creative with your Table of Contents. Since it is an HTML file, all the same styling dos and don’ts that apply to your eBook’s content HTML apply to the HTML Table of Contents. You may want to consider adding some images, for instance. This HTML Table of Contents will be inserted into the EPUB package as discussed below.
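If you keep your chapter list elsewhere, the toctext paragraphs can be generated instead of typed. The Python function below is a hypothetical helper that emits the same markup as the sample; its include_cover flag exists because the cover link has to be left out of the source EPUB used for MOBI/KF8 compilation.

```python
from html import escape


def toc_entries(chapters: list[tuple[str, str]], include_cover: bool = True) -> str:
    """Build <p class="toctext"> links; chapters is a list of (href, title)."""
    lines = []
    if include_cover:
        # Omit this line in the source EPUB for MOBI/KF8 compilation.
        lines.append('<p class="toctext"><a href="../coverpage.html">Cover</a></p>')
    for href, title in chapters:
        # escape() protects titles containing &, <, or > in the XHTML output.
        lines.append(f'<p class="toctext"><a href="{escape(href)}">{escape(title)}</a></p>')
    return "\n".join(lines)
```

Paste the result between the h2 heading and the closing body tag of htmltoc.html.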

Important Note: Again, do not include a link to the coverpage.html file with the source EPUB for MOBI/KF8 files. When you go to create your EPUB specifically for the MOBI/KF8 file, you need to make sure that you alter your HTML Table of Contents to remove the reference to the cover.


EPUB Structure

The EPUB file with the .epub extension is actually a compressed zip file that contains all of the content, metadata, and media of your eBook in a single, standalone package. However, unlike most zip files, the eBook must be compressed in a very specific way: the mimetype file (described below) has to be the first file in the archive and must be stored without compression. The EPUB standard relies heavily on XML files. The easiest way to think about eXtensible Markup Language (XML) is as well-marked-up data that can be easily understood, or “parsed,” by computer programs. You will recognize the syntax as very similar to HTML in that it uses tags enclosed in < and > with data between the opening and closing tags.

Inside the EPUB package are the following files:

  • The HTML content of your eBook (required)
  • An XML file called toc.ncx which is the NCX Table of Contents (required)
  • An XML file called content.opf which describes exactly how the EPUB is structured, what files are in the EPUB package, and the eBook’s relevant metadata (required)
  • An XML file called container.xml which tells the eReader where the content.opf file is located in the compressed directory structure (required)
  • A text file called mimetype which identifies the file as an EPUB packaged in ZIP format (required)
  • The cover and content images (optional)
  • Audio, Video, Fonts and other media (optional)
  • One or more CSS files (optional)

Important Note: All of these file names and paths are case-sensitive, which may seem unusual to Windows users, so be careful when you are building your EPUB package.

When uncompressed, the EPUB package will display multiple sub-directories. The mimetype file must be located in the root directory, and the container.xml file must be in a directory called META-INF. The content.opf and toc.ncx files are usually stored in a directory called OEBPS, but that directory name is not required. It is considered good practice, but not mandatory, to organize the OEBPS directory into sub-directories for the different parts of your eBook (e.g. content HTML files, images, CSS files, and fonts).

For the purpose of this guide, the files that comprise the EPUB package are organized into the following directory structure:

  • /mimetype – this file must be in the root folder
  • /META-INF/container.xml – points to the location of the content.opf file (must be in this folder)
  • /OEBPS/content.opf – the OPF package (can be in any folder, but OEBPS is recommended)
  • /OEBPS/toc.ncx – the NCX Table of Contents
  • /OEBPS/coverpage.html – the Cover Page HTML file (discussed later)
  • /OEBPS/content/ – content HTML files
  • /OEBPS/images/ – image files
  • /OEBPS/css/ – CSS style sheets
  • /OEBPS/fonts/ – font files (if any)
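Under the EPUB container rules, the mimetype file must be the first entry in the zip archive and must be stored uncompressed, while everything else may be compressed normally. The Python sketch below uses the standard zipfile module to package a directory laid out as above in that order; the function names build_epub_bytes and write_epub are hypothetical.

```python
import io
import zipfile
from pathlib import Path

MIMETYPE = b"application/epub+zip"


def build_epub_bytes(files: dict[str, bytes]) -> bytes:
    """Zip package files (archive path -> contents) into EPUB bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as epub:
        # 1. mimetype goes first and is stored, not deflated.
        epub.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        # 2. Everything else (META-INF/, OEBPS/) can be compressed.
        for arcname in sorted(files):
            if arcname == "mimetype":
                continue  # already written above
            epub.writestr(arcname, files[arcname], compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


def write_epub(src_dir: Path, epub_path: Path) -> None:
    """Read an unzipped package directory from disk and write the .epub file."""
    files = {
        path.relative_to(src_dir).as_posix(): path.read_bytes()
        for path in src_dir.rglob("*")
        if path.is_file()
    }
    epub_path.write_bytes(build_epub_bytes(files))
```

With the command-line zip.exe, the same ordering is commonly achieved from inside the package directory by running zip -X0 mybook.epub mimetype first and then zip -rX9 mybook.epub META-INF OEBPS.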

A sample EPUB package can be viewed at the BB eBooks Developers page and copied freely for your convenience. Please feel free to use it for your own eBook projects (although attribution is always appreciated).


The mimetype and container.xml Files

The mimetype file has no extension, and the content must read exactly as follows:

application/epub+zip

The mimetype file must not contain any spaces or line breaks. It must be exactly 20 bytes and be located in the root directory of the EPUB package. When you save the file in your text editor, ensure that it is encoded in ANSI (plain ASCII) and not as UTF-8 with a byte order mark, which would add three extra bytes.
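Text editors can silently add a byte order mark or a trailing line break. One way to sidestep that, sketched here in Python with hypothetical function names, is to write the file as raw bytes and then check it.

```python
from pathlib import Path

EXPECTED = b"application/epub+zip"  # exactly 20 ASCII bytes, no line break


def write_mimetype(package_root: Path) -> None:
    """Write the mimetype file as raw bytes, so no BOM or newline can sneak in."""
    (package_root / "mimetype").write_bytes(EXPECTED)


def mimetype_problems(raw: bytes) -> list[str]:
    """List what is wrong with the bytes of a mimetype file (empty if valid)."""
    problems = []
    if raw.startswith(b"\xef\xbb\xbf"):
        problems.append("starts with a UTF-8 byte order mark")
    if raw != raw.strip():
        problems.append("has leading or trailing spaces or line breaks")
    if len(raw) != len(EXPECTED):
        problems.append(f"is {len(raw)} bytes long instead of {len(EXPECTED)}")
    if EXPECTED not in raw:
        problems.append("does not contain application/epub+zip")
    return problems
```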

The container.xml file defines where the content.opf file is located within the EPUB package. An example is as follows:

<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>

The only part of the XML you ever need to adjust is the full-path attribute to where the content.opf is located relative to the mimetype file. It is not necessary to label the directory that contains the OPF file OEBPS or the actual OPF file as content.opf, but it is recommended as a best practice.
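A reading system locates the OPF file by reading this full-path attribute. The Python sketch below does the same with the standard library’s ElementTree, which is handy for confirming that the path you typed matches your package; find_opf_path is a hypothetical name.

```python
import xml.etree.ElementTree as ET

NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def find_opf_path(container_xml: str) -> str:
    """Return the full-path of the first rootfile declared in container.xml."""
    root = ET.fromstring(container_xml)
    rootfile = root.find("c:rootfiles/c:rootfile", NS)
    if rootfile is None or not rootfile.get("full-path"):
        raise ValueError("container.xml declares no rootfile full-path")
    return rootfile.get("full-path")
```

The returned path is relative to the root of the package, where the mimetype file lives, not to the META-INF directory.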


The content.opf File

Overview

The content.opf file is the most important part of the EPUB package, because it defines the structure of the eBook and the metadata. It is also the file that will tend to cause EPUB validation errors for newcomers, so please be careful with the syntax and markup. The OPF file is an XML document, and it uses a defined set of tags to encode data (similar to HTML) specified by the IDPF.

The content.opf contains four sections as follows:

  • Metadata Section – This section contains data about the eBook such as the title, author, and product description. The eReading devices vary in how they utilize this metadata, but certain elements are required for a valid EPUB.
  • Manifest Section – This section is a list of all the content files, media, fonts, and stylesheets used in the eBook. The files can be listed in any order. However, you should not include a file in the Manifest Section that is not in the EPUB package. Also, you should not have undeclared files in the EPUB package that have not been declared in the Manifest Section.
  • Spine Section – This section contains linear instructions on how the eBook is ordered. The content files should be listed from top to bottom in the same order in which the book is read from beginning to end.
  • Guide Section – This section contains links to the cover, beginning of the eBook, and the HTML Table of Contents. eReading devices vary widely in how this information is interpreted and rendered.
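The two Manifest rules (no declared file missing from the package, no package file left undeclared) and the requirement that the Spine only reference declared items can be checked mechanically. The Python sketch below does so with ElementTree. It assumes you pass in the path of the OPF file inside the package and the set of all package paths; the function name is hypothetical, and it is not a substitute for a full EPUB validator.

```python
import posixpath
import xml.etree.ElementTree as ET

OPF = {"opf": "http://www.idpf.org/2007/opf"}


def opf_problems(opf_path: str, opf_xml: str, package_files: set[str]) -> list[str]:
    """Cross-check the Manifest and Spine of an OPF file against the package."""
    root = ET.fromstring(opf_xml)
    base_dir = posixpath.dirname(opf_path)
    problems, declared, ids = [], set(), set()

    for item in root.findall("opf:manifest/opf:item", OPF):
        item_id, href = item.get("id"), item.get("href")
        if item_id in ids:
            problems.append(f"duplicate manifest id: {item_id}")
        ids.add(item_id)
        # hrefs are relative to the location of the OPF file itself.
        full = posixpath.normpath(posixpath.join(base_dir, href))
        declared.add(full)
        if full not in package_files:
            problems.append(f"declared but missing: {full}")

    # mimetype, META-INF/ and the OPF itself are not listed in the manifest.
    for path in sorted(package_files - declared):
        if path != "mimetype" and not path.startswith("META-INF/") and path != opf_path:
            problems.append(f"present but undeclared: {path}")

    for itemref in root.findall("opf:spine/opf:itemref", OPF):
        if itemref.get("idref") not in ids:
            problems.append(f"spine references unknown id: {itemref.get('idref')}")
    return problems
```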

The basic XML layout of the content.opf file for the EPUB 2.0.1 standard is as follows:

<?xml version="1.0" encoding="utf-8" ?>
<package version="2.0" xmlns="http://www.idpf.org/2007/opf" unique-identifier="BookId">
  
<!-- Metadata section -->
<!-- Manifest section -->
<!-- Spine section -->
<!-- Guide section -->
</package>

The first statement declares that this is an XML file using UTF-8 encoding. UTF-8 allows characters outside the ASCII character set, such as fancy quotes, accented letters, and words from foreign languages. The package element contains all four sections of the content.opf file. Now, let’s discuss the XML structure section by section.

Metadata Section

Metadata is essentially data about data, and it is not actual content. Different eReading devices render the metadata declared in this section in different ways. For example, the Kindle Fire displays the title specified in the Metadata Section at the top of the viewport on every page of the eBook.

As another example, the iBooks app lists the keywords from the Metadata Section when the user browses their eBook library, although the Kindle ignores them completely. It is difficult to predict how eReading devices will render this metadata in the future, so it is best to be as accurate as possible. Metadata in the EPUB package also has the potential to aid Search Engine Optimization (SEO) for marketing your eBook. However, it is unclear how that will work at this stage, because the separate metadata entered when the eBook is uploaded to the various eBook stores seems to affect their algorithms more than the metadata inside the EPUB package.

Important Note: Much of the metadata entered in the Metadata Section such as author, title, and description has to be re-entered when uploaded to the major eBook stores. This is frustrating, but it’s the way it is right now. However, you should always enter accurate metadata in the content.opf file.

It is essential that your eBook have a unique identifier (a series of digits and/or letters). A Universally Unique Identifier (UUID) is a randomly generated series of numbers and letters with a one-in-a-bazillion chance of the same one being generated twice. You can obtain a UUID online at no cost using the BB eBooks Meta Pad. You can also use an ISBN if you prefer to go that route. However, this guide does not recommend using the ISBN system for eBooks, since most eBook stores do not require ISBNs. Please note that if you want to use the ISBN system, you need one ISBN for the EPUB and another for the MOBI/KF8 file, and neither may be the same as the ISBN of your print edition.
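If you would rather not rely on an online generator, most programming languages can produce a random (version 4) UUID. A minimal Python sketch using the standard uuid module, wrapped in the urn:uuid: form used by the dc:identifier example in this section; both function names are hypothetical.

```python
import uuid


def new_book_identifier() -> str:
    """Return a random version 4 UUID in the urn:uuid: form used in dc:identifier."""
    return f"urn:uuid:{uuid.uuid4()}"


def identifier_element(book_id: str) -> str:
    """Build the dc:identifier element for a UUID-based identifier."""
    return f'<dc:identifier id="BookId" opf:scheme="uuid">{book_id}</dc:identifier>'
```

Generate the value once for the eBook and then reuse that same value everywhere it is needed (including toc.ncx), rather than calling the function on every build.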

Important Note: Besides the IDPF guidelines on the Metadata Section, Amazon, Barnes & Noble, and iBookstore have some additional requirements on how the cover image is referenced inside the EPUB package. Fortunately, they all have the same exact requirement. This is discussed below.

Per the IDPF specification, certain metadata is required inside every single eBook. The IDPF recognizes the open-access metadata standards from an organization called Dublin Core, which is based in Singapore. Unfortunately, trying to read through their requirements is rather challenging unless you received extensive wedgies in high school and/or have the intellectual aptitude of a Singaporean. That is why this section of the guide will go into great detail to help alleviate the confusion.

Tip: Please feel free to use the BB Meta Pad to help you construct the metadata of your eBook and avoid errors. The service is provided by BB eBooks at no cost.

Some sample XML of the Metadata Section for the EPUB 2.0.1 standard is as follows. You can also obtain this information from the EPUB boilerplate on the Developers page on BB eBooks:

  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
    <!--Required metadata-->
    <dc:title>Your Book’s Title</dc:title>
    <dc:language>en-us</dc:language>
    <dc:identifier id="BookId" opf:scheme="uuid">urn:uuid:57b681b5-137b-4791-b5c5-99fdcfaa41ee</dc:identifier>
    <!--Above identifier is an example of a UUID. To use an ISBN, it would look as follows: <dc:identifier id="BookId" opf:scheme="ISBN">urn:isbn:1234567890123</dc:identifier> -->
    <dc:creator opf:role="aut" opf:file-as="Last, First">First Last</dc:creator>
    <!--The Author-->
    <dc:publisher>BB eBooks</dc:publisher>
    <!--Name of Publisher or yourself if self-published-->
    <dc:date>2012-06-15</dc:date>
    <!--Published Date in the format YYYY-MM-DD-->
    <meta name="cover" content="My_Cover_ID" />
    <!--Required for KindleGen, Barnes and Noble, and iBookstore-->
  </metadata>

eBook’s Title: This is the content within the dc:title element. If you want to file your eBook as “Adventures of Huck Finn, The” rather than “The Adventures of Huck Finn”, you can do so by entering it that way. Do not use the file-as attribute in the dc:title markup.

eBook’s Author: This is the name of the primary author, and it should appear within the dc:creator element. The opf:role="aut" attribute is not required by the IDPF, but it is “suggested”; it simply specifies that the creator is the author. You should always put the author’s name in this field and never the publishing company, editor, or someone else. If you want to file the author’s name with the last name first, write it that way in the opf:file-as attribute of the dc:creator element. The opf:file-as attribute is not required and should be omitted if you want to file the eBook under the author’s first name.

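eBook Publisher: Include this metadata even if you are self-publishing. (Strictly speaking, the IDPF specification requires only the title, language, and identifier, but a complete eBook should always name its publisher.) If you do not have a publishing company, simply use the author’s name.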

Publishing Date: This is the date the eBook is published. It should be in the YYYY-MM-DD format (e.g. 2012-07-06 for July 6th, 2012). If you make a modification to your eBook, you should always update this value.

ISBN or UUID: ISBNs are expensive and, as mentioned previously, most eBook stores do not require them. However, you do have to use some sort of unique identifier. A UUID is a good solution for those who don’t want to buy an ISBN. UUIDs are randomly generated combinations of hexadecimal digits (0-9 and a-f) in an 8-4-4-4-12 format. You can generate one for free at the BB eBooks website using the BB Meta Pad. Please note the difference in syntax between the ISBN and UUID identifiers, and feel free to cut and paste the preceding example, replacing the unique identifier with your own. If you insist upon using an ISBN, please keep in mind that it must be different for the EPUB and MOBI versions. Also, ensure that you use this exact same identifier in your toc.ncx Table of Contents, or your EPUB may fail validation. Consult the NCX portion of this guide for more details.
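In toc.ncx, the identifier reappears as the content of a meta element named dtb:uid in the head section. A quick consistency check between the two files could look like the Python sketch below. It assumes the package element names its identifier through unique-identifier (BookId in the example above) and that toc.ncx uses the standard NCX namespace; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
    "ncx": "http://www.daisy.org/z3986/2005/ncx/",
}


def identifiers_match(opf_xml: str, ncx_xml: str) -> bool:
    """True if the OPF unique identifier equals the NCX dtb:uid value."""
    package = ET.fromstring(opf_xml)
    wanted_id = package.get("unique-identifier")  # e.g. "BookId"
    opf_value = None
    for ident in package.findall("opf:metadata/dc:identifier", NS):
        if ident.get("id") == wanted_id:
            opf_value = (ident.text or "").strip()
    uid = ET.fromstring(ncx_xml).find("ncx:head/ncx:meta[@name='dtb:uid']", NS)
    return opf_value is not None and uid is not None and uid.get("content") == opf_value
```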

Language: Enclosing the code for the eBook’s language within the dc:language XML is required within the OPF file. If you are submitting your eBook for sale, this will most likely be en-us for American English or en-gb for British English. Please visit the BB Meta Pad to view the codes for all the world’s languages.

Meta cover: Amazon, Barnes & Noble, and the iBookstore require a reference to the cover image within the Metadata Section. The My_Cover_ID refers to the id of the cover image that you will define in the Manifest Section.

An example of some optional metadata that you are permitted to include in the content.opf file is as follows. Please note that some optional metadata is more important than others:

<!--This is optional metadata that you should include-->
<dc:description>This is classic Russian literature!</dc:description>
<dc:subject>Russian</dc:subject>
<dc:subject>Classics</dc:subject>
<!--This is optional metadata that rarely gets used-->
<dc:rights>All rights reserved</dc:rights>
<dc:type>Text</dc:type>
<dc:source>Can be a URL or ISBN number</dc:source>
<!-- A prior resource from which the publication was derived -->
<dc:relation>Can be a URL or ISBN number</dc:relation>
<!-- An identifier of an auxiliary resource and its relationship to the publication -->
<dc:coverage>Worldwide</dc:coverage>
<!-- This metadata is for additional contributors -->
<dc:contributor opf:file-as="Last, First" opf:role="edt">First Last</dc:contributor>
<!-- example of an editor -->

Keywords: These are labels that apply to your eBook, similar to the way bloggers tag their posts. Every individual one gets placed inside a dc:subject element. So, if your steamy romance could be described with the keywords “Romance, Steamy and Hot, Erotic, Love, Women, Caribbean, Buns”, you would place each keyword within the XML element like <dc:subject>Romance</dc:subject>, <dc:subject>Steamy and Hot</dc:subject>, etc. You can list as many as you like, but seven is generally recommended.
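Typing each dc:subject element by hand gets tedious if you already keep a comma-separated keyword list. A small Python sketch (the function name is hypothetical) that splits the list and escapes characters such as &, which are not allowed raw in XML:

```python
from xml.sax.saxutils import escape


def subject_elements(keywords: str) -> str:
    """Turn "Romance, Steamy and Hot, ..." into one dc:subject per keyword."""
    subjects = [word.strip() for word in keywords.split(",") if word.strip()]
    return "\n".join(f"<dc:subject>{escape(word)}</dc:subject>" for word in subjects)
```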

Description: This is typically the back-jacket description or blurb of the eBook. Keep it to a single paragraph and do not enter HTML here; HTML will cause your EPUB to fail validation. Fancy quotes are okay, but make sure your text editor supports UTF-8 encoding.
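Because the description has to be plain text, it can be worth rejecting stray markup and escaping XML’s special characters before the blurb goes into dc:description. A hypothetical Python helper follows; escaping & and < is what keeps the XML well-formed, while curly quotes can stay as they are in a UTF-8 file.

```python
import re
from xml.sax.saxutils import escape

TAG = re.compile(r"<[^>]+>")


def description_element(blurb: str) -> str:
    """Wrap a plain-text blurb in dc:description, rejecting embedded HTML."""
    if TAG.search(blurb):
        raise ValueError("description contains HTML tags; use plain text only")
    # Collapse line breaks and runs of spaces into a single paragraph.
    text = " ".join(blurb.split())
    return f"<dc:description>{escape(text)}</dc:description>"
```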

Rights/Copyright Information: This is a standard statement on the rights of the eBook such as “All Rights Reserved” or “Public Domain”. It is not widely recognized by eReading devices, but it does no harm to add this metadata. It is placed within the dc:rights XML markup.

Type: This is from the Dublin Core metadata specification for dc:type. For eBooks, this will always be “Text”. Please note that eReading devices rarely use this metadata.

Source: This is more Dublin Core metadata, which defines the dc:source XML markup as “A reference to a resource from which the present resource is derived.” If your eBook is part of a series or a portion of a larger periodical, you may consider adding this metadata by including an ISBN or UUID for the relevant publication. You can also reference the print version of your book by its ISBN. If your eBook is a compilation of blog posts, you can consider placing the URL of your blog here.

Relation: This is some more Dublin Core metadata that also seems rather ambiguous. The dc:relation XML markup is defined as “A reference to a related resource.” If your eBook is a spin-off from another publication, you may consider placing a URL, ISBN, or UUID. However, eReading devices rarely make use of this metadata.

Coverage: Dublin Core defines dc:coverage broadly as the spatial or temporal scope of a resource, or the jurisdiction under which it applies; here it is used to state the coverage of your copyright, which is typically “Worldwide” or “Territorial”. If your eBook is in the public domain, you can simply say “Public Domain.” You don’t often see dc:coverage in eBook metadata; it is part of the Dublin Core specification and included here for completeness.

Contributors: With the dc:contributor element, you can add one or more entries for additional people who contributed to the eBook publication. This can include an illustrator, an editor, and even the rubricator (we’re not sure what that guy does). You state the nature of each contribution with the opf:role attribute and a three-letter code. The example in the above XML is for an editor. The three-letter codes come from the MARC Code List for Relators, an initiative of the United States Library of Congress. Please have a look at the BB Meta Pad to see the full list and to generate these entries quickly. As with the dc:creator element, you can add the opf:file-as attribute if you want to file the last name first.

Manifest Section

The Manifest Section is a listing of all HTML, CSS, media files, and other assets inside the EPUB package. It is divided into self-closing, individual XML elements denoted as item. For each item element there are three required attributes:

  • href – specifies the relative path from the content.opf location to your asset
  • id – a unique identifier in the content.opf file. Each id value should follow the same syntax conventions as id values in HTML (i.e. must start with a letter, must not have special characters, and it must be unique).
  • media-type – the MIME Type (or Internet Media Type) of the asset.

The media-type values specify file formats, and they follow the same standard used by email clients and websites. The MIME Type (now called the “Internet Media Type”) is the convention used in the EPUB standard, and the official registry of all MIME Types is maintained by IANA. Below is a list of MIME Types commonly used in eBook production for your convenience:

  • toc.ncx Meta TOC File - application/x-dtbncx+xml
  • .html Content files - application/xhtml+xml
  • .css Stylesheets – text/css
  • .jpg, .jpeg, or .jpe images – image/jpeg
  • .png images – image/png
  • .gif images – image/gif
  • .svg images – image/svg+xml
  • .ttf True Type Fonts – font/truetype
  • .otf OpenType Fonts – font/opentype
  • .mp3 Audio file – audio/mpeg
  • .mp4 Video File – video/mp4

An example listing of the Manifest Section is below. While the values of the id attributes are arbitrary, you should assign them based on some sort of naming convention so that your XML is human-readable. For example, it is much easier to assign your first HTML content file with an id of content001 rather than xsd324-sd2784f or something else that is meaningless to human eyes.

Important Note: Recall that everything in the content.opf file is case-sensitive.

  <manifest>
    <item href="cover.jpg" id="My_Cover_ID" media-type="image/jpeg" /> <!-- Required for KindleGen, Barnes and Noble, and iBooks -->
    <item href="toc.ncx" id="ncx" media-type="application/x-dtbncx+xml" />
    <item href="coverpage.html" id="htmlcoverpage" media-type="application/xhtml+xml" /> <!-- Remove for Kindle -->
    <item href="content/content001.html" id="htmlcontent001" media-type="application/xhtml+xml" />
    <item href="content/htmltoc.html" id="htmltoc" media-type="application/xhtml+xml" />
    <item href="content/content002.html" id="htmlcontent002" media-type="application/xhtml+xml" />
    <item href="content/content003.html" id="htmlcontent003" media-type="application/xhtml+xml" />
    <item href="content/content004.html" id="htmlcontent004" media-type="application/xhtml+xml" />
    <item href="content/content005.html" id="htmlcontent005" media-type="application/xhtml+xml" />
    <item href="css/bbstylesheet-epub.css" id="cssbbstylesheet-epub" media-type="text/css" />
    <item href="images/BBeBooks.jpg" id="img1" media-type="image/jpeg" />
    <item href="images/someimage.jpg" id="img2" media-type="image/jpeg" />
    <item href="fonts/Chunkfive.otf" id="font1" media-type="font/opentype" />
  </manifest>

This Manifest Section is fairly standard. It contains five HTML content files and an HTML Table of Contents, along with a cover page, a stylesheet, images, and a font. You will notice that the href attribute uses relative paths to declare where the files are located. Recall that this guide uses the directories content, css, images, and fonts as sub-directories of the OEBPS folder. You can use an alternative directory structure if you wish, but try to keep it consistent across eBook projects. Bad links inside the Manifest Section are a common source of errors and will cause EPUB validation to fail.
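Because bad hrefs are so common, it can pay to check them before zipping anything. The following Python sketch assumes a complete content.opf whose package element declares the standard OPF namespace (http://www.idpf.org/2007/opf); the function name is hypothetical. It resolves each manifest href relative to the folder containing content.opf and lists any that do not point at a real file. Note that the Windows file system ignores case, so a mismatch such as Cover.jpg versus cover.jpg will slip past this check even though content.opf is case-sensitive.

```python
import os
import xml.etree.ElementTree as ET
from urllib.parse import unquote

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def missing_manifest_files(opf_path):
    """Return manifest hrefs that do not exist relative to the content.opf folder."""
    opf_dir = os.path.dirname(os.path.abspath(opf_path))
    root = ET.parse(opf_path).getroot()
    missing = []
    for item in root.findall("opf:manifest/opf:item", OPF_NS):
        href = item.get("href")
        # hrefs are URLs, so decode things like %20 before touching the disk
        if not os.path.isfile(os.path.join(opf_dir, unquote(href))):
            missing.append(href)
    return missing


# Example (path assumed from this guide's layout):
# print(missing_manifest_files(r"c:\rootdirectory\OEBPS\content.opf"))
```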

You will also notice that the coverpage.html file is declared in the manifest. Ensure that this is removed if your EPUB will be the source for the MOBI/KF8 compilation with KindleGen.

The cover.jpg file should have an id value of My_Cover_ID, which was originally defined in the Metadata Section under a meta element. This ensures that Kindle, Nook, and iBooks users can access the cover directly from their devices, and it is the image displayed in the reader’s eBook library.

Tip: This guide uses an indentation scheme for XML, but you are not required to use one. However, it certainly helps with readability.

Important Note: The order in which you list the assets in the Manifest Section does not matter. The reading order of the eBook is defined in the Spine Section.

Spine Section

The Spine Section specifies the exact linear order of the eBook. Much like the spine of a print book, the first section listed is the start of the eBook, while the last section listed is the back. This section is constructed entirely of self-closing itemref XML elements. The only required attribute is idref, which must match the value of an id attribute in the Manifest Section.

Below is a sample Spine Section that correlates to the Manifest Section example above:

  <spine toc="ncx">
    <itemref idref="htmlcoverpage" /> <!-- Remove for Kindle -->
    <itemref idref="htmlcontent001" />
    <itemref idref="htmltoc" />
    <itemref idref="htmlcontent002" />
    <itemref idref="htmlcontent003" />
    <itemref idref="htmlcontent004" />
    <itemref idref="htmlcontent005" />
  </spine>

The opening toc="ncx" attribute for the spine element is required. This defines the NCX Table of Contents for the eBook, and the ncx value is the id declared for the toc.ncx file in the Manifest Section. As you can see from this example, when the reader goes to the beginning of the eBook, they will be at the coverpage.html file. As the reader keeps paging down, they will cycle through content001.html, htmltoc.html, and content002.html, until finally they get to the last piece of content in content005.html. Most eReading devices insert an automatic page break as they jump from one itemref element to the other.

eReading devices use the Spine Section to build the reading order of the eBook, so a mistake here can be rather embarrassing (e.g. chapters appearing in the wrong order). Exercise caution when constructing the Spine Section; labeling your id attributes in a logical order in the Manifest Section makes this much easier.
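A quick cross-check between the two sections catches most spine mistakes. This Python sketch (hypothetical function name, again assuming the OPF namespace is declared on the package element) returns every idref in the Spine Section that has no matching id in the Manifest Section. The comparison is case-sensitive, just like the content.opf itself.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def bad_spine_refs(opf_xml):
    """Return spine idref values that have no matching manifest id (case-sensitive)."""
    root = ET.fromstring(opf_xml)
    manifest_ids = {item.get("id") for item in root.findall("opf:manifest/opf:item", OPF_NS)}
    return [
        itemref.get("idref")
        for itemref in root.findall("opf:spine/opf:itemref", OPF_NS)
        if itemref.get("idref") not in manifest_ids
    ]


# Example: read the file as bytes so the XML declaration's encoding is honored
# with open("OEBPS/content.opf", "rb") as f:
#     print(bad_spine_refs(f.read()))
```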

Tip: While not required, you can add the attribute linear="no" to any of the itemref elements. This marks the section as auxiliary content that is skipped when the reader pages through the eBook, although you can still provide access to it with a hyperlink. This may be a useful feature if you want to create an educational eBook with hidden answer keys. The IDPF standard has an example.

Guide Section

The Guide Section provides extra metadata that declares target locations for the extra buttons available on some eReaders, such as “Cover” or “Beginning”. It was intended as a way for eBook designers to annotate where commonly used sections such as footnotes, the bibliography, and the index are located in the HTML, so that the reader could access them easily on her device. A full list of values for the type attribute is available in the IDPF EPUB 2.0.1 standard. Unfortunately, the interpretation of the XML in the Guide Section varies widely from device to device. Due to this confusion, EPUB 3 deprecates the Guide Section in favor of landmarks in the EPUB Navigation Document.

Due to the poor adoption of the standards laid out by the IDPF for this section, this guide recommends using only three XML entries (and only two for Kindle-source EPUBs):

  <guide>
    <reference href="coverpage.html" type="cover" title="Cover" /> <!-- Remove for Kindle -->
    <reference href="content/htmltoc.html" type="toc" title="Table of Contents" />
    <reference href="content/content002.html" type="text" title="Beginning" />
  </guide>

In this example of the Guide Section, clicking “Cover” in the eReader would go to coverpage.html, clicking “Table of Contents” would go to htmltoc.html, and clicking “Beginning” would go to the first part of the story after the front matter (i.e. content002.html). Also, on the Kindle, when the reader opens the MOBI/KF8 file for the first time, it automatically opens at the location defined as “Beginning”.

Complete OPF File

You now have a working knowledge of how the content.opf file for the EPUB works without having to resort to third-party software. Congratulations! This is the essential building block of how eBooks are created. To see the XML for the four sections of the content.opf as one big file, please visit the BB eBooks Developers page.


The toc.ncx File

Overview

The toc.ncx is another file in XML format that is an important part of the EPUB standard. This is where you construct the NCX Table of Contents (aka the Meta Table of Contents), which is required for EPUB validation and as part of the EPUB package at all eBook stores. Constructing this Table of Contents can be time-consuming, and you do not have the same flexibility as you do when crafting the HTML Table of Contents. Ensure that you have already split your files, because you must specify a series of internal hyperlinks to the correct files and relative paths as part of the toc.ncx.

Despite its difficult syntax, a properly formatted toc.ncx will ensure that your Table of Contents is 100% professional. Most automatic conversion programs horribly mangle the toc.ncx, so learning how this XML works will distinguish you as an eBook designer.

Important Note: Be very cautious when hand-coding the NCX Table of Contents. Like the content.opf file, everything in the XML is case-sensitive. For example, it is the <navPoint> element and not the <navpoint> element—note the capital P.

The NCX Table of Contents XML is divided into 3 sections as follows:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE ncx PUBLIC "-//NISO//DTD ncx 2005-1//EN" "http://www.daisy.org/z3986/2005/ncx-2005-1.dtd">
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1" xml:lang="en">
  
<!-- Metadata Section -->
<!-- Title and Author Section -->
<!-- Navigation Map Section -->
</ncx>

Metadata Section

In the Metadata Section you must specify the unique identifier (i.e. the UUID or the ISBN), and it must be exactly the same as the dc:identifier element in your content.opf file. Failing to use the exact same identifier may result in failed EPUB validation. Recall that cut-and-paste is the greatest invention ever.

The Metadata Section is also where you define how many “levels” are in the NCX Table of Contents. For example, if you have a main section with a series of sub-sections, you could have a two-level Table of Contents. Most eReading devices will render the sub-level as indented from the main level.

This method of crafting the Table of Contents with multiple levels is a useful feature if you have a complicated non-fiction eBook. However, a single-level NCX Table of Contents is generally fine for most works of fiction. You specify the number of levels in the meta element with the dtb:depth attribute, and you are permitted up to four levels. The way to annotate which entries are sub-levels of other entries is discussed in the Navigation Map Section.

Important Note: The Barnes & Noble PubIt! guidelines only allow a single-level NCX Table of Contents. However, if you have encoded the Navigation Map Section with multiple levels, PubIt! will flatten the structure rather than delete individual entries.

Some sample XML of the Metadata Section for a single-level Table of Contents is as follows:

  <head>
    <meta content="urn:uuid:57b681b5-137b-4791-b5c5-99fdcfaa41ee" name="dtb:uid" /> <!-- Must be exactly the same as dc:identifier in the content.opf file -->
    <meta content="1" name="dtb:depth" /> <!-- Set to 2 if you want a sub-level. It can go up to 4 -->
    <meta content="0" name="dtb:totalPageCount" /> <!-- Do not change -->
    <meta content="0" name="dtb:maxPageNumber" /> <!-- Do not change -->
  </head>

You only need to change the values inside the first two meta elements. Do not alter anything in the last two.
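Since the dtb:uid value has to match the content.opf exactly, you can compare the two files mechanically. This Python sketch assumes the package element carries a unique-identifier attribute naming the dc:identifier to use (as EPUB 2.0.1 requires) and that both files declare their usual namespaces; uid_matches is a hypothetical name.

```python
import xml.etree.ElementTree as ET

NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
    "ncx": "http://www.daisy.org/z3986/2005/ncx/",
}


def uid_matches(opf_xml, ncx_xml):
    """True if the NCX dtb:uid equals the content.opf's unique identifier."""
    opf = ET.fromstring(opf_xml)
    # unique-identifier on <package> names the id of the dc:identifier to use
    uid_ref = opf.get("unique-identifier")
    opf_uid = None
    for identifier in opf.findall("opf:metadata/dc:identifier", NS):
        if identifier.get("id") == uid_ref:
            opf_uid = (identifier.text or "").strip()
    ncx = ET.fromstring(ncx_xml)
    meta = ncx.find("ncx:head/ncx:meta[@name='dtb:uid']", NS)
    return opf_uid is not None and meta is not None and meta.get("content", "").strip() == opf_uid
```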

Title and Author Section

The Title and Author Section of the toc.ncx simply contains metadata about the title and author of the eBook. While this data is not currently used by any eReading devices, it is required for EPUB validation.

Some sample XML for the Title and Author Section is as follows:

  <docTitle>
    <text>eBook Title</text>
  </docTitle>
  <docAuthor>
    <text>Author Name</text>
  </docAuthor>

Navigation Map Section

The Navigation Map Section is rather complex XML markup that consists of individual navPoint elements. Each navPoint element declares three characteristics: the hyperlink target for the entry (with a relative path from the toc.ncx file), the text of the hyperlink as it appears to the reader, and a linear play order. The “play order” drives navigation when you press the Next/Previous Section button on an eReader, if it has one.

Important Note: The playOrder attributes must be sequentially marked starting at 1.

Some sample Navigation Map XML for six unique entries on a single-level NCX Table of Contents is as follows:

  <navMap>
    <navPoint id="ncxcoverpage" playOrder="1"> <!-- Remove for Kindle -->
      <navLabel>
        <text>Cover</text>
      </navLabel>
      <content src="coverpage.html" />
    </navPoint>
    <navPoint id="ncxcontent001h1-1" playOrder="2">
      <navLabel>
        <text>Sample EPUB 2.0.1 File for Developers</text>
      </navLabel>
      <content src="content/content001.html#h1-1" />
    </navPoint>
    <navPoint id="ncxcontent002h2-1" playOrder="3">
      <navLabel>
        <text>Chapter 1 - Getting Started</text>
      </navLabel>
      <content src="content/content002.html#h2-1" />
    </navPoint>
    <navPoint id="ncxcontent003h2-2" playOrder="4">
      <navLabel>
        <text>Chapter 2 - Media</text>
      </navLabel>
      <content src="content/content003.html#h2-2" />
    </navPoint>
    <navPoint id="ncxcontent004h2-3" playOrder="5">
      <navLabel>
        <text>Chapter 3 - Caution</text>
      </navLabel>
      <content src="content/content004.html#h2-3" />
    </navPoint>
    <navPoint id="ncxcontent005h2-4" playOrder="6">
      <navLabel>
        <text>Chapter 4 - MOBI/KF8</text>
      </navLabel>
      <content src="content/content005.html#h2-4" />
    </navPoint>
  </navMap>

For the attribute id within the navPoint element, its value can be anything as long as there are no duplicates. The playOrder attributes must be sequentially ordered. The self-closing content element points to the target for the hyperlink with the src attribute. You can establish anchors inside your HTML content the same way you do when constructing the HTML Table of Contents. It is considered a good practice to have the anchors in your HTML content be used by both the NCX and HTML Tables of Contents.

Tip: If you are using Notepad++, set the Language to XML when you are creating your toc.ncx file. It greatly helps with readability and editing.

Important Note: For the source EPUB for the Kindle, you need to remove the reference to the coverpage.html file in the NCX. This requires a complete re-ordering of all the playOrder attributes, since the coverpage.html is playOrder="1". It’s not a fun task, we know.
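If you would rather not renumber by hand, a short script can do it. The Python sketch below (hypothetical function name) removes any top-level navPoint whose content src matches a given value, such as coverpage.html, and then renumbers every remaining navPoint from 1 in document order. It assumes every navPoint points at a distinct target. ElementTree also drops the XML declaration and DOCTYPE when it writes the result, so you would need to paste those two lines back in.

```python
import xml.etree.ElementTree as ET

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"
NS = {"ncx": NCX_NS}
ET.register_namespace("", NCX_NS)  # write the NCX namespace without an ns0: prefix


def remove_navpoint_and_renumber(ncx_xml, src):
    """Drop top-level navPoints whose content src equals `src`, then renumber playOrder 1..N."""
    root = ET.fromstring(ncx_xml)
    nav_map = root.find("ncx:navMap", NS)
    # findall() returns a list, so removing elements while looping over it is safe
    for nav_point in nav_map.findall("ncx:navPoint", NS):
        content = nav_point.find("ncx:content", NS)
        if content is not None and content.get("src") == src:
            nav_map.remove(nav_point)
    # iter() walks the tree in document order, which matches the reading order
    for number, nav_point in enumerate(nav_map.iter(f"{{{NCX_NS}}}navPoint"), start=1):
        nav_point.set("playOrder", str(number))
    return ET.tostring(root, encoding="unicode")
```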

In order to make a multiple-level Table of Contents, you need to adjust the nesting of the navPoint elements. Much the same way that ordered and unordered lists are nested in HTML markup, you need to shift the closing </navPoint> tag from the end of a top level navPoint element to after its sub-level elements. The playOrder attributes do not need adjusting. Some example XML is as follows:

    <navPoint id="NCX_Chapter1" playOrder="51">
      <navLabel>
        <text>Chapter 1 - Joshua Tree</text>
      </navLabel>
      <content src="content/content001.html" />
      <!-- closing navPoint tag removed here -->
      <navPoint id="NCX_Chapter1_Section1" playOrder="52">
        <navLabel>
          <text>Chapter 1 - Section I</text>
        </navLabel>
        <content src="content/content001.html#section1" />
      </navPoint>
      <navPoint id="NCX_Chapter1_Section2" playOrder="53">
        <navLabel>
          <text>Chapter 1 - Section II</text>
        </navLabel>
        <content src="content/content001.html#section2" />
      </navPoint>
    </navPoint> <!-- navPoint closing tag from Chapter 1 added down here -->
    <navPoint id="NCX_Chapter2" playOrder="54">
      <navLabel>
        <text>Chapter 2 - Reflections</text>
      </navLabel>
      <content src="content/content002.html" />
    </navPoint>
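Generating the nested markup from an outline keeps the playOrder values sequential and the closing tags in the right place. This Python sketch uses hypothetical names and a hypothetical outline; it numbers playOrder from 1 in reading order (rather than starting at 51 as in the fragment above) and gives every sub-level entry its own unique id. For a two-level outline like this one, remember to set dtb:depth to 2 in the Metadata Section.

```python
from xml.sax.saxutils import escape, quoteattr


def build_nav_map(outline):
    """Return <navMap> markup for a nested outline, numbering playOrder in reading order.

    Each outline entry is a (navpoint_id, label, src, children) tuple, where
    children is a list of entries in the same format (empty for no sub-level).
    """
    lines = ["<navMap>"]
    play_order = 0

    def emit(entries, depth):
        nonlocal play_order
        pad = "  " * depth
        for navpoint_id, label, src, children in entries:
            play_order += 1
            lines.append(f'{pad}<navPoint id={quoteattr(navpoint_id)} playOrder="{play_order}">')
            lines.append(f"{pad}  <navLabel>")
            lines.append(f"{pad}    <text>{escape(label)}</text>")
            lines.append(f"{pad}  </navLabel>")
            lines.append(f"{pad}  <content src={quoteattr(src)} />")
            emit(children, depth + 1)  # sub-levels sit before this entry's closing tag
            lines.append(f"{pad}</navPoint>")

    emit(outline, 1)
    lines.append("</navMap>")
    return "\n".join(lines)


# Hypothetical outline mirroring the two-level example above
outline = [
    ("NCX_Chapter1", "Chapter 1 - Joshua Tree", "content/content001.html", [
        ("NCX_Chapter1_Section1", "Chapter 1 - Section I", "content/content001.html#section1", []),
        ("NCX_Chapter1_Section2", "Chapter 1 - Section II", "content/content001.html#section2", []),
    ]),
    ("NCX_Chapter2", "Chapter 2 - Reflections", "content/content002.html", []),
]
print(build_nav_map(outline))
```

The output is a fragment without a namespace declaration; it is meant to be pasted inside the ncx element, which already declares the NCX namespace.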

A full example of a toc.ncx file can be viewed in the BB eBooks Developers page for your convenience.


Compressing Your EPUB File to Make Your eBook

After you have placed all the assets and XML in the correct directories, you can bundle everything into one EPUB eBook. Please note that there is a very specific way you have to compress the files into the EPUB format. Unfortunately for Windows users, compressing all your files with a GUI-based compression tool like 7-Zip will usually cause your EPUB file to fail validation. Per the IDPF specification, the mimetype file must be the first file added to the zip archive, and it must be “stored” (i.e. uncompressed). That is why you have to use the command line to build your EPUB.

Tip: This guide provides the command line codes for Windows users, but for Linux/Mac users, you can use the Unix commands provided here to compress your EPUB.

For those of you whippersnappers too young to remember MS-DOS, command line prompts were how operating systems on PCs worked before Windows 95 came around. You need to install the zip.exe file somewhere on your computer. This guide installed it in a folder called c:\zip\. The zip.exe program can be downloaded here for free.

Let’s travel back in time and perform the following steps at the command line. This guide assumes the root directory (where the mimetype file is located) is at c:\rootdirectory\. Within rootdirectory, there are all the sub-directories of your EPUB package (i.e. META-INF and OEBPS). Feel free to adjust the command line codes to your preference.

Perform the following steps to build your EPUB from the command line:

  1. Access the command line by typing cmd in the search/run window on Windows
  2. Access your root directory with all your EPUB files by typing cd\rootdirectory
  3. Verify the command line prompt says c:\rootdirectory>
  4. Type c:\zip\zip.exe mybook.epub -DX0 mimetype to add the mimetype file, uncompressed, to a newly created file called mybook.epub
  5. Type c:\zip\zip.exe mybook.epub -rDX9 META-INF OEBPS to compress the META-INF directory and the OEBPS directory (and all its sub-directories) into that same mybook.epub file

Important Note: Do not compress files into the EPUB package that are not declared in the Manifest Section of the content.opf (with the exception of the mimetype and container.xml files).

Step 4 adds the mimetype file first to the new EPUB file. In the -DX0 options, the 0 makes zip store the mimetype file uncompressed (note: this is a “dee-x-zero” not a “dee-x-oh”), while -D and -X keep directory entries and extra file attributes out of the archive. Step 5 adds the META-INF and OEBPS directories, plus any sub-directories, to the same EPUB file. In the -rDX9 options, -r recurses into the directories so the directory structure is preserved in the EPUB file, -D and -X again leave out directory entries and extra attributes that may corrupt your file, and -9 applies maximum compression.
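If you already have Python installed, its standard zipfile module can reproduce both steps. This is a minimal sketch rather than the guide’s workflow: it assumes the c:\rootdirectory layout described above (mimetype, META-INF, and OEBPS in one root folder), and build_epub is a hypothetical name. It writes mimetype first with ZIP_STORED, then adds every file under META-INF and OEBPS with ZIP_DEFLATED at level 9, adding files only and no directory entries.

```python
import os
import zipfile


def build_epub(root_dir, epub_path):
    """Zip an unpacked EPUB folder: mimetype first and stored, then META-INF and OEBPS deflated."""
    with zipfile.ZipFile(epub_path, "w") as epub:
        # Step 4 equivalent: mimetype goes in first, uncompressed
        epub.write(os.path.join(root_dir, "mimetype"), "mimetype", compress_type=zipfile.ZIP_STORED)
        # Step 5 equivalent: add files only (no directory entries), compressed
        for top in ("META-INF", "OEBPS"):
            for folder, _dirs, files in os.walk(os.path.join(root_dir, top)):
                for name in sorted(files):
                    full_path = os.path.join(folder, name)
                    # Archive names use forward slashes relative to the root directory
                    arcname = os.path.relpath(full_path, root_dir).replace(os.sep, "/")
                    epub.write(full_path, arcname,
                               compress_type=zipfile.ZIP_DEFLATED, compresslevel=9)


# Example (paths assumed from this guide):
# build_epub(r"c:\rootdirectory", r"c:\rootdirectory\mybook.epub")
```

Unlike running zip.exe twice against an existing mybook.epub, opening the archive in "w" mode always starts from an empty file, so stale entries from a previous build cannot linger.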

Tip: BB eBooks has a page dedicated to command line scripts if you are having any trouble with this process.

You now have an actual EPUB eBook. Try opening your eBook on your favorite eReading device (Adobe Digital Editions works fine for viewing EPUB eBooks on your PC or Mac). The EPUB may have serious issues and it may not open at all. Don’t be discouraged. You need to troubleshoot the EPUB package and fix all the errors you may have made along the way. Every developer makes mistakes and computers are very unforgiving about them.


Validating Your EPUB File and Troubleshooting

Overview of EpubCheck

Now that you have created your EPUB eBook, it’s time to do some debugging. The EpubCheck tool is an open-source program written in Java that checks your EPUB file for errors. Most eBook stores that sell EPUBs run this same program to check whether the eBooks you upload are valid. Therefore, it is imperative that you make sure your EPUB is valid before selling your eBook or returning the EPUB to the client. The errors that EpubCheck reports may seem strange at first, but careful review of your EPUB files (particularly your content.opf file) will show you where you made mistakes. The same axiom that applies to programmers applies to eBook developers: debugging is part of the job. So don’t be discouraged, and have some patience as you iron out the kinks in your source files.

Installing EpubCheck

EpubCheck requires the Java Runtime Environment to work. You may already have it installed on your computer, and you can check whether Java is installed by going to the Oracle website. If not, simply download the latest Java Runtime Environment and install it on your computer (it’s free). Make a note of the directory in which the java.exe file is installed, because you will need to access it to run EpubCheck.

The next step is to go to the EpubCheck project page, and download the latest version (also free). It will have the extension .jar. Place this file in any directory on your computer. For the purpose of this guide, we have placed the EpubCheck file epubcheck-3.0b4.jar in a directory called e:\bin.

Tip: You can copy the java.exe file to the same directory as EpubCheck to make the command line codes simpler.

Running EpubCheck

For the purpose of this guide, epubcheck-3.0b4.jar and java.exe are in a directory called e:\bin; however, you can change this to any directory based on your preference. The easiest way to decipher the EpubCheck error messages is to output them to a text file. Run the following code at the command line in the same directory as your EPUB to see which errors are generated:

e:\bin\java.exe -jar e:\bin\epubcheck-3.0b4.jar temp.epub > errors.txt 2>&1

Tip: For Linux/Mac users, there are some helpful comments on how to run EpubCheck from the command line at the EpubCheck project page.

The errors from the EPUB validation process will be written to a file called errors.txt. If you are having trouble running this from the command line, keep in mind that the file epubcheck-3.0b4.jar may be named differently, since new versions are released fairly frequently.

As an alternative to installing EpubCheck locally, you can upload your EPUB to the IDPF Validation site. It runs the EpubCheck program on the IDPF’s web server, so the output should be the same. The good folks at the IDPF provide this service free of charge, but your EPUB is limited to 10MB.

Analyzing and Correcting Errors in Your EPUB

While the EpubCheck program is great at finding little mistakes that you may have made while building your EPUB package, its error messages may not be intuitive at first. Luckily, EpubCheck reports a file name and line number for each mistake. If you did not check your HTML files with the W3C Validator and your CSS with the Jigsaw Validator, you may want to do that first. These W3C validators do a better job of finding errors in your HTML and CSS markup, while EpubCheck is geared more toward finding errors in how the EPUB is packaged.

Some sample errors that EpubCheck outputs and a subsequent explanation of the cause of those errors are listed below.

Problem: mimetype issues

ERROR: temp.epub: Length of the first filename in archive must be 8, but was 22 !
ERROR: temp.epub/mimetype: Mimetype file should contain only the string "application/epub+zip".

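Cause: Ensure that your mimetype file is in the root directory of your EPUB package and is the first file added to the archive (the first error above means a file with a 22-character name was added first). It must be named mimetype and contain exactly the 20 bytes application/epub+zip, with no trailing line break.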
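To confirm these requirements without unzipping anything, you can inspect the archive directly. This Python sketch (hypothetical function name) checks that mimetype is the first entry, that it is stored uncompressed, and that it contains exactly application/epub+zip.

```python
import zipfile

EXPECTED = b"application/epub+zip"  # exactly 20 bytes, no trailing newline


def check_mimetype(epub_path):
    """Return a list of mimetype problems in an EPUB archive (empty if it looks right)."""
    problems = []
    with zipfile.ZipFile(epub_path) as epub:
        entries = epub.infolist()
        if not entries:
            return ["archive is empty"]
        first = entries[0]
        if first.filename != "mimetype":
            problems.append(f"first entry is {first.filename!r}, not 'mimetype'")
        elif first.compress_type != zipfile.ZIP_STORED:
            problems.append("mimetype is compressed; it must be stored")
        if "mimetype" in epub.namelist() and epub.read("mimetype") != EXPECTED:
            problems.append("mimetype must contain exactly application/epub+zip")
    return problems


# Example: print(check_mimetype("temp.epub"))
```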

Problem: playOrder issues within the toc.ncx

ERROR: temp.epub/OEBPS/toc.ncx: assertion failed: playOrder sequence has gaps
ERROR: temp.epub/OEBPS/toc.ncx: assertion failed: identical playOrder values for navPoint/navTarget/pageTarget that do not refer to same target

Cause: Verify that you have the correct playOrder values in your toc.ncx file. The first navPoint element should have playOrder="1", the next should be playOrder="2", etc.
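A small checker can point at the first bad value before you rerun EpubCheck. This Python sketch (hypothetical function name) walks every navPoint in document order and reports any whose playOrder is not the next number in the 1, 2, 3, ... sequence; a missing playOrder is treated as 0. It is deliberately stricter than the specification, which allows navPoints that refer to the same target to share a playOrder value.

```python
import xml.etree.ElementTree as ET

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"


def play_order_problems(ncx_xml):
    """Check that navPoint playOrder values run 1, 2, 3, ... in document order."""
    root = ET.fromstring(ncx_xml)
    orders = [int(p.get("playOrder", "0")) for p in root.iter(f"{{{NCX_NS}}}navPoint")]
    problems = []
    for expected, actual in enumerate(orders, start=1):
        if actual != expected:
            problems.append(f"navPoint #{expected} has playOrder={actual}")
    return problems
```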

Problem: Bad references on hyperlinks within the toc.ncx

ERROR: temp.epub/OEBPS/toc.ncx(51,53): 'OEBPS/content/content025.html': referenced resource missing in the package

Cause: Make sure that your hyperlink reference in the content element of the toc.ncx (on line 51 in this example) is valid and that the file content025.html exists.

Problem: Poorly formed XML

ERROR: temp.epub/OEBPS/toc.ncx(5,72): Element type "meta" must be followed by either attribute specifications, ">" or "/>".
ERROR: temp.epub/OEBPS/toc.ncx: Element type "meta" must be followed by either attribute specifications, ">" or "/>".

Cause: Your XML is broken on line 5 somehow. Make sure that all attributes are enclosed in quotes, and that the tags are properly closed. Running your XML through the W3C Validator may provide more helpful error messages.

Problem: Missing assets in the EPUB package

ERROR: temp.epub: image file OEBPS/images/BBeBooks.jpg is missing

Cause: You have declared an asset in the Manifest Section of the content.opf file, but the actual asset (BBeBooks.jpg in this example) does not exist.

Problem: Undeclared assets in the EPUB Package

WARNING: temp.epub: item (OEBPS/BBeBooks.jpg) exists in the zip file, but is not declared in the OPF file

Cause: You have an asset inside the EPUB package (BBeBooks.jpg in this example), but it was not declared in the Manifest Section of the content.opf file.
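You can list undeclared entries yourself by comparing the zip contents with the manifest. This Python sketch assumes the content.opf lives at OEBPS/content.opf, as in this guide (a reading system actually locates it through META-INF/container.xml), and treats mimetype, container.xml, and the content.opf itself as allowed extras; the function name is hypothetical.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}
# Files this guide allows outside the manifest
ALWAYS_ALLOWED = {"mimetype", "META-INF/container.xml"}


def undeclared_entries(epub_path, opf_name="OEBPS/content.opf"):
    """List zip entries that are neither the content.opf nor declared in its manifest."""
    with zipfile.ZipFile(epub_path) as epub:
        root = ET.fromstring(epub.read(opf_name))
        opf_dir = posixpath.dirname(opf_name)
        # Manifest hrefs are relative to the content.opf, so resolve them to zip paths
        declared = {posixpath.normpath(posixpath.join(opf_dir, item.get("href")))
                    for item in root.findall("opf:manifest/opf:item", OPF_NS)}
        allowed = declared | ALWAYS_ALLOWED | {opf_name}
        return [name for name in epub.namelist()
                if not name.endswith("/") and name not in allowed]
```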

Problem: Improperly encoded source file

ERROR: temp.epub: I/O error reading OEBPS/content.opf

Cause: Your content.opf file is saved with ANSI encoding, but there is a non-ASCII character somewhere inside (e.g. a curly quote). For Notepad++ users, make sure the encoding is set to Encode in UTF-8 without BOM.
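To find an encoding problem before EpubCheck does, you can test the raw bytes of a file. This Python sketch (hypothetical function name) reports a UTF-8 byte order mark, following this guide’s “without BOM” recommendation, and the byte offset of the first sequence that is not valid UTF-8, which is typically where a curly quote saved as ANSI shows up.

```python
def encoding_problems(path):
    """Report a UTF-8 byte order mark or the first bytes that are not valid UTF-8."""
    with open(path, "rb") as f:
        data = f.read()
    problems = []
    if data.startswith(b"\xef\xbb\xbf"):
        problems.append("file starts with a UTF-8 BOM")
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        problems.append(f"not valid UTF-8 at byte {err.start}")
    return problems


# Example: print(encoding_problems(r"c:\rootdirectory\OEBPS\content.opf"))
```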

Problem: XML element not recognized by the IDPF standard

ERROR: temp.epub/OEBPS/content.opf(25,16): element "dc:badXML" not allowed anywhere;

Cause: You have used an element name in the content.opf file that is not recognized by the IDPF standard. Double-check the opening and closing tags for your XML elements (the error is on line 25 in this example).

Problem: Invalid Media Type (aka MIME Type)

ERROR: temp.epub: The file OEBPS/images/BBeBooks.jpg does not appear to be of type image/gif

Cause: You have given an asset the wrong media type in the Manifest Section of the content.opf. In this example, BBeBooks.jpg should be image/jpeg, not image/gif.

Problem: Improper reference in the Spine Section

ERROR: temp.epub/OEBPS/content.opf(48,39): item with id 'htmlcontent006' not found
WARNING: temp.epub/OEBPS/toc.ncx(51,53): hyperlink to resource outside spine 'OEBPS/content/content005.html'
WARNING: temp.epub/OEBPS/content/htmltoc.html(16,51): hyperlink to resource outside spine 'OEBPS/content/content005.html'

Cause: Verify that the idref attributes in the Spine Section match the id attributes for your HTML files in the Manifest Section of the content.opf. One mistake will cause a cascade of errors as seen in this example. The error above was caused by mislabeling one idref in the spine with htmlcontent006 rather than htmlcontent005 on line 48; however, there are actually no errors within the toc.ncx.

After tweaking your source file, recompiling the EPUB, and testing with EpubCheck over and over again, you should eventually get a message that looks like this:

Epubcheck Version 3.0b4
Validating against EPUB version 2.0
No errors or warnings detected.

Congratulations! You have successfully packaged a valid EPUB. While you may have taken a bit of a beating along the way, you will most likely feel as satisfied as Zapp Brannigan after he returned from Planet Amazonia. You should never ship an EPUB that does not validate with the latest version of EpubCheck. If you are having problems with some of the error messages, drop us a comment at BB eBooks, and we will try to help. You can do this.


Using KindleGen to make Your MOBI/KF8 eBook for Amazon

Why KindleGen?

Now that you know how to build a valid EPUB eBook, you can convert it into the proprietary MOBI/KF8 format used by Amazon. KindleGen is a free (but not open-source) command line program created by Amazon.com that can convert HTML, OPF, and EPUB files into a MOBI/KF8 eBook readable on all Kindle devices and apps. It is strongly recommended that you run KindleGen on your EPUB file, because this will result in the most professional-looking eBook. The .mobi file is the one you will upload to Amazon for sale at the Amazon Kindle store. Since the format is compiled, you unfortunately cannot edit the .mobi with a text editor. However, advanced users can use the Python script Mobi Unpack to see how the HTML and CSS are built from your EPUB.

Running KindleGen at the Command Line

Just like when working with EpubCheck, you will have to travel back in time to the days of the command prompt to run KindleGen, since it has no GUI support. Download KindleGen from the Amazon.com website and place the kindlegen.exe file in a folder. This guide installed KindleGen in the folder e:\bin, but you are welcome to change the directory based on your personal preferences.

Perform the following steps to convert an EPUB into the MOBI/KF8 format:

  1. Access the command prompt by typing cmd in the find/search window
  2. Go to the directory where your EPUB file is located
  3. Enter e:\bin\kindlegen.exe filename.epub -c1 -verbose

Tip: You can see the different fields available for KindleGen by just typing e:\bin\kindlegen. The -c1 field specifies standard compression (recommended) and the -verbose field outputs data during the conversion process.

The KindleGen compiler will spit out a lot of information, but as long as the last line says MOBI File successfully generated!, you should be good to go. However, you may get some errors during the compilation process, even if you run KindleGen on a valid EPUB. Below is an examination of some of the more common errors that get thrown by KindleGen.

Tip: If you want the KindleGen output saved to a text file for easier review, you can run the following command: e:\bin\kindlegen.exe filename.epub -c1 -verbose > errors.txt, and the output will be saved to a file called errors.txt.

Problem: You have a Cover HTML file

Info(prcgen):I1052: Kindle support cover images but does not support cover HTML. Hence using the cover image specified and suppressing cover HTML in content.

Cause: Kindle uses the cover image specified in the content.opf and suppresses cover HTML, so remove coverpage.html and any other reference to cover.jpg inside your HTML. The only place the cover should be referenced is the content.opf file, in the Metadata and Manifest Sections. Consult the earlier section of this guide for instructions.

Problem: You Are Using a CSS Declaration Not Supported by Kindle

Info(cssparser):I10004: @rules other than @import, @charset and @font-face are not supported.

Cause: Double-check your CSS to ensure you are not using any @ rules that Kindle does not support. For example, the @page declaration that provides viewport padding on EPUB eBooks is not supported by Kindle. Consult the BB eBooks Developers page for CSS boilerplate, and read the Amazon Publishing Guidelines for a full list of unsupported CSS properties.

Problem: You Have Embedded Fonts

Warning(prcfile):W14029: CFF/Type1 (Postscript) embedded font included in your source may not render clearly on all Kindle readers.

Cause: In some cases you can simply ignore this warning. The embedded fonts will display fine on the Kindle Fire, but the text will fall back to the default font (good ole’ Palatino) on other Kindle eReading devices.
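If you saved the output to errors.txt, a few lines of Python can sort it. This sketch assumes every message follows the Level(module):Code: text pattern seen in the examples above, and that success is signaled by the MOBI File successfully generated! line described earlier; other KindleGen versions may word things differently. The function name is hypothetical.

```python
import re

# Assumed line shape, based on the messages quoted in this guide,
# e.g. "Warning(prcfile):W14029: CFF/Type1 ..."
MESSAGE = re.compile(r"^(Info|Warning|Error)\((\w+)\):([A-Z]\d+):\s*(.*)$")
SUCCESS = "MOBI File successfully generated!"


def summarize_kindlegen(output):
    """Group KindleGen messages by level and report whether the last line is the success line."""
    summary = {"Info": [], "Warning": [], "Error": []}
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    for line in lines:
        match = MESSAGE.match(line)
        if match:
            level, _module, code, text = match.groups()
            summary[level].append((code, text))
    succeeded = bool(lines) and SUCCESS in lines[-1]
    return succeeded, summary


# Example:
# with open("errors.txt", encoding="utf-8", errors="replace") as f:
#     ok, summary = summarize_kindlegen(f.read())
```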

Once you have a valid EPUB, you will probably not get too many errors when running KindleGen. You may notice that the .mobi file produced by the compiler is huge, often two to three times the size of your EPUB. This is because the MOBI7 format (for older e-ink Kindles and Kindle for iOS) and the KF8 format (for Kindle Touch and Kindle Fire) are in the same file. When a reader downloads your eBook from the Amazon Kindle store, Amazon ships only the format that their eReading device requires. This is good, because a smaller delivered file helps you avoid extra delivery fees from Amazon.

Previewing Your Kindle eBook

To see how your eBook will look on every Kindle device and app, download the latest Kindle Previewer software from the Amazon website and install it on your computer. With an EPUB eBook you have to design for many different eReading devices, but the MOBI/KF8 eBook is exclusively for Amazon-produced hardware and apps. However, there are many permutations of the Kindle, and your eBook will look very different on an older e-ink Kindle than on the Kindle Fire (as an example).

This is where the troubleshooting really begins. Since the Kindle eBook can only be uploaded to Amazon as a standalone file, you have to ensure that it looks professional across all devices. Consult the tutorials on the BB eBooks Developers page for tips on how to utilize Amazon-specific media queries to ensure that proper CSS styling declarations apply. Some common problems seen when previewing a Kindle eBook include:

  • Erroneous margins on older e-ink Kindles (KindleGen 2.5 has fixed a lot of these problems)
  • Blown-out rendering of lists and tables (only apply CSS to lists and tables for KF8)
  • Drop caps appearing by themselves on one line (only use the float property for KF8)
  • Blown-out formatting when clicking on hyperlinked text (only use div elements to anchor your HTML markup)

Ensure that you properly test your Kindle eBook for maximum quality. Cycle through the sections with the arrows to make sure that your NCX Table of Contents play order is correct. Click on the Go To menu and ensure that Cover takes you to your cover image, Beginning takes you to the location specified in the Guide Section of the content.opf, and Table of Contents takes you to your HTML Table of Contents. Ensure all the links in your Table of Contents are functional. Vary the font size on your Kindle eBook to ensure that the content reflows properly.

The process of compiling the MOBI/KF8, previewing it, and going back to tweak your HTML/CSS/XML in your source EPUB may take a while. Be patient and know that you are creating a professional eBook that is of higher quality than 95% of the other eBooks currently sold at the Kindle store. Many people think that just uploading a .doc file and living with the results is acceptable. You will have a serious competitive advantage in terms of reader appreciation and quality with your technical knowledge of how to properly build an eBook.