GutenMark Home Page
Attractively formatting Project Gutenberg texts



Contents

What is GutenMark?
What is Project Gutenberg?
The Problem
What's the Solution?
How does GutenMark fit in?

What is GutenMark?

GutenMark is a tool for automatically creating high-quality HTML or LaTeX markup from Project Gutenberg etexts.  In combination with other freely-available conversion tools, GutenMark can convert Project Gutenberg etexts into publication-quality PostScript or PDF, for print-on-demand applications.  The goal is for this conversion to be completely automatic, without manual markup or editing.


What is Project Gutenberg?

Project Gutenberg -- or PG for short -- is  a project for freely providing online books.  Thousands of such "etexts" have been made available.  Many are familiar classics, and many others are completely unfamiliar books you're unlikely to find anywhere else.   I've provided a dozen or so of the etexts myself.

(GutenMark is not affiliated with Project Gutenberg in any way.)


The Problem

A problem with PG etexts -- dare I say it? -- is that they are not very pretty, in comparison to typical hand-held books.  PG etexts have traditionally been provided in a format providing most of the content of the books (i.e., what was in the author's mind) but have discarded the attractive formatting (provided by the publisher).  There have been sound reasons for doing so, and I can't quibble with them.

It turns out, for many of us, that we really do prefer the more attractive printed version over the "plain-vanilla" PG version.  In other words, we'd rather buy the book than read the online etext.  I fear that this effect has limited PG readership somewhat.

The situation has improved somewhat in recent years, in several ways.  Special software for reading the online books can make the books appear more attractive on the computer screen.  There is a project of the HTML Writers Guild to provide XML versions of PG etexts.  Even PG itself is now willing to accept formatted versions of etexts, as long as the plain-vanilla version exists also.  I applaud all of these efforts, and I hope it does not denigrate them to add my own efforts to theirs.


What's the Solution?

I think that printing on demand would go a long way towards making PG better for the readers.  It would be great to be able to take any random PG etext and automatically format it so that it is as attractive as a printed book, for either online reading or for printing.


How does GutenMark fit in?

GutenMark is a free command-line utility for Win32 or Linux (or BSD or UNIX or Mac OS X ...).  It accepts a Project Gutenberg etext, applies what I hope are intelligent heuristics to it, and produces attractively-formatted HTML or LaTeX.  My definition of "attractively formatted" is that you should be able to easily print it out in a way that's indistinguishable from a published book.

In other words, Project Gutenberg has retained the content of the books in converting them to etexts, but has discarded the formatting.  GutenMark aims to restore the formatting.
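
To make the idea of "restoring formatting with heuristics" concrete, here is a minimal Python sketch.  It is not GutenMark's actual algorithm, and GutenMark's real heuristics are considerably more involved.  The function name plain_text_to_html and the three rules it applies (blank lines separate paragraphs, hard-wrapped lines are rejoined, and _underscores_ mark emphasis) are assumptions chosen purely for illustration.

```python
import html
import re


def plain_text_to_html(etext: str) -> str:
    """Toy plain-text-to-HTML converter (illustrative only, not GutenMark)."""
    # Heuristic 1 (assumed): one or more blank lines separate paragraphs.
    blocks = re.split(r"\n\s*\n", etext.strip())
    paragraphs = []
    for block in blocks:
        # Heuristic 2 (assumed): hard line breaks inside a paragraph are
        # just wrapping, so rejoin them into one flowing line.
        text = " ".join(line.strip() for line in block.splitlines())
        # Escape characters that are special in HTML (&, <, >).
        text = html.escape(text, quote=False)
        # Heuristic 3 (assumed): _word_ marks emphasis in the plain text.
        text = re.sub(r"_([^_]+)_", r"<i>\1</i>", text)
        paragraphs.append(f"<p>{text}</p>")
    return "\n".join(paragraphs)


if __name__ == "__main__":
    sample = "It was a _dark_\nand stormy night.\n\nThe rain fell."
    print(plain_text_to_html(sample))
```

Even a toy like this shows why the results depend on the particular etext: each rule is a guess about what the plain text meant, and a guess that suits one book may misfire on another.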

(In fairness, I should point out that there are alternatives to GutenMark that you might want to consider.  First, although Project Gutenberg likes to serve you plain text, it may also make other, fancier formats available to you for selected etexts.  Check this out at gutenberg.org.  Also, even when PG does not maintain a fancier version than plain text, the PG volunteers who created the etexts may have such material, and might provide it to you by email; usually, their email addresses can be found within the etexts themselves.)

How well does GutenMark succeed?  It depends on the particular etext; in my view, it works pretty well.  To give you some idea, here is a small, sample etext, processed in various ways:

I've also created a small repository of various Project Gutenberg etexts that I've converted to LaTeX/PDF according to my own particular tastes.

You might need to download Adobe's free Acrobat Reader program to view or print the PDF files.  The PDF files were created using freely available utilities, so that you could see a book-like printout.  It's important to understand that no manual markup or editing was performed at any stage in these sample files.


©2001-2002 Ronald S. Burkey.  Last updated 08/04/02 by RSB.  Contact me.