GutenMark Home Page
Attractively formatting Project Gutenberg texts



Contents

What is GutenMark?
What is Project Gutenberg?
The Problem
What's the Solution?
How does GutenMark fit in?

What is GutenMark?

GutenMark is a command-line tool for automatically creating high-quality HTML or LaTeX markup from Project Gutenberg etexts.  As of April 2008, there is also a graphical front-end called GUItenMark that greatly simplifies usage for casual users.  Windows, Mac OS X, and Linux are each directly supported with installers.  Not all versions of these operating systems are supported, though, so be sure to read the fine print on the download page to see if the installers will work for you!  I'm sure the program can be compiled from source for other platforms as well (though possibly with limited functionality).  Limited iPhone support is also possible.

In combination with other freely-available conversion tools, GutenMark aims to convert Project Gutenberg etexts into publication-quality PostScript or PDF, for print-on-demand applications.  The goal is for this conversion to be completely automatic, without manual markup or editing, but for the foreseeable future some manual intervention will almost always be needed, at least if your standards are as high as mine.


What is Project Gutenberg?

Project Gutenberg (or PG for short) is a project for freely providing online books.  Thousands of such "etexts" have been made available.  Many are familiar classics, and many others are completely unfamiliar books you're unlikely to find anywhere else.  I've provided a dozen or so of the etexts myself.

(GutenMark is not affiliated with Project Gutenberg in any way.)


The Problem

When I started this project, my principal complaint against Project Gutenberg etexts was that they were not very pretty in comparison to typical hand-held books, or even in comparison to typical web-pages.  PG etexts have traditionally been provided in a format providing most of the content of the books (i.e., what was in the author's mind) but have discarded the attractive formatting (provided by the publisher).  There have been sound reasons for doing so, and I can't quibble with them.

In the intervening years, Project Gutenberg has made great strides in improving this situation, through the mechanism of a sub-organization known as Distributed Proofreaders.  Distributed Proofreaders not only greatly reduces the error rates within the etexts, but also often produces attractive HTML versions of the etexts in addition to the plain-text versions.  My personal assessment is that HTML produced by Distributed Proofreaders is "good enough" for browsing purposes, and could conceivably be adequate for printing in some cases if post-processed properly. 

Thus the need for a program like GutenMark is now somewhat reduced but not entirely eliminated.  With that said, you can now continue to read my original diatribe, which doesn't take such later developments into account.

It turns out, for many of us, that we really do prefer the more attractive printed version over the "plain-vanilla" version provided by Project Gutenberg.  In other words, we'd rather buy the book than read the online etext.  I fear that this effect has limited PG readership somewhat.

The situation has improved somewhat, in several ways.  Special software for reading the online books can make the books appear more attractive on the computer screen.  There is a project of the HTML Writers Guild to provide XML versions of PG etexts.  Even PG itself is now willing to accept formatted versions of etexts, as long as the plain-vanilla version exists also.  I applaud all of these efforts, and I hope it does not denigrate them to add my own efforts to theirs.


What's the Solution?

I think that printing on demand would go a long way towards making PG better for the readers.  It would be great to be able to take any random PG etext and automatically format it so that it is as attractive as a printed book, for either online reading or for printing.


How does GutenMark fit in?

GutenMark is a free program for Windows, Mac OS X, or Linux.  It accepts a Project Gutenberg etext, applies what I hope are intelligent heuristics to it, and produces attractively-formatted HTML or LaTeX.  My definition of "attractively formatted" is that you should be able to easily print it out in a way that's indistinguishable from a published book.

In other words, Project Gutenberg has retained the content of the books in converting them to etexts, but has discarded the formatting.  GutenMark aims to restore the formatting.
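
To give a flavor of what "heuristics" can mean here, the following is a toy Python sketch.  It is not GutenMark's code or its actual algorithm.  The function name etext_to_html and the three rules it applies (blank lines separate paragraphs, hard-wrapped lines are rejoined, and _underscores_ mark emphasis) are assumptions chosen purely for illustration.

```python
import html
import re


def etext_to_html(text: str) -> str:
    """Toy conversion of a plain-text etext fragment to HTML paragraphs.

    Illustrative only: the rules below are assumed conventions,
    not the heuristics GutenMark itself applies.
    """
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    out = []
    for para in paragraphs:
        if not para.strip():
            continue  # skip empty fragments (e.g. empty input)
        # Rejoin hard-wrapped lines into one flowing paragraph.
        flowed = " ".join(line.strip() for line in para.splitlines())
        # Escape HTML metacharacters before adding markup of our own.
        flowed = html.escape(flowed, quote=False)
        # Assumption: _underscores_ around a phrase mark emphasis.
        flowed = re.sub(r"_([^_]+)_", r"<em>\1</em>", flowed)
        out.append(f"<p>{flowed}</p>")
    return "\n".join(out)


if __name__ == "__main__":
    sample = "It was a _very_ dark\nand stormy night.\n\nNobody slept."
    print(etext_to_html(sample))
```

Even this tiny example hints at why fully automatic conversion is hard: an underscore can also appear in a file name, and a blank line does not always mean a new paragraph (it may set off verse, a heading, or a table), so real etexts need far more careful guessing than three fixed rules.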

(In fairness, I should point out that there are other alternatives to GutenMark that you might want to consider.  First, although Project Gutenberg likes to serve you plain text, it may also make other, fancier formats available to you for selected etexts.  Check this out at gutenberg.org.  Also, even when PG does not maintain a fancier version than plain text, the PG volunteers who created the etexts may have such material, and might provide it to you by email; usually, their email addresses can be found within the etexts themselves.)

How well does GutenMark succeed?  It depends on the particular etext; in my view, it works pretty well.  To give you some idea, here is a small, sample etext, processed in various ways:

I've also created a small repository of various Project Gutenberg etexts that I've converted to LaTeX/PDF according to my own particular tastes.

You might need to download Adobe's free Acrobat Reader program to view or print the PDF files.  The PDF files were created using freely available utilities, so that you could see a book-like printout.  It's important to understand that no manual markup or editing was performed at any stage in these sample files.


©2001-2002,2008-2009 Ronald S. Burkey.  Last updated 02/16/2009 by RSB.  Contact me.