Attractively formatting Project Gutenberg texts
What is GutenMark?
What is Project Gutenberg?
The Problem
What's the Solution?
How does GutenMark fit in?
(GutenMark is not affiliated with Project Gutenberg in any way.)
When I started this project, my principal complaint against Project Gutenberg etexts was that they were not very pretty in comparison to typical hand-held books, or even in comparison to typical web-pages. PG etexts have traditionally been provided in a format that retains most of the content of the books (i.e., what was in the author's mind) but discards the attractive formatting (provided by the publisher). There have been sound reasons for doing so, and I can't quibble with them.
In the intervening years, Project Gutenberg has made great strides in improving this situation, through the mechanism of a sub-organization known as Distributed Proofreaders. Distributed Proofreaders not only greatly reduces the error rates within the etexts, but also often produces attractive HTML versions of the etexts in addition to the plain-text versions. My personal assessment is that HTML produced by Distributed Proofreaders is "good enough" for browsing purposes, and could conceivably be adequate for printing in some cases if post-processed properly. Thus the need for a program like GutenMark is now somewhat reduced but not entirely eliminated. With that said, you can now continue to read my original diatribe, which doesn't take such later developments into account.
The situation has improved somewhat, in several ways. Special software for reading the online books can make the books appear more attractive on the computer screen. There is a project of the HTML Writers Guild to provide XML versions of PG etexts. Even PG itself is now willing to accept formatted versions of etexts, as long as the plain-vanilla version exists also. I applaud all of these efforts, and I hope it does not denigrate them to add my own efforts to theirs.
In other words, Project Gutenberg has retained the content of the books in converting them to etexts, but has discarded the formatting. GutenMark aims to restore the formatting.
(In fairness, I should point out that there are alternatives to GutenMark that you might want to consider. First, although Project Gutenberg likes to serve you plain text, it may also make other, fancier formats available to you for selected etexts. Check this out at gutenberg.org. Also, even when PG does not maintain a fancier version than plain text, the PG volunteers who created the etexts may have such material, and might provide it to you by email; usually, their email addresses can be found within the etexts themselves.)
How well does GutenMark succeed? It depends on the particular etext; in my view, it works pretty well. To give you some idea, here is a small, sample etext, processed in various ways:
You might need to download Adobe's free Acrobat Reader program to view or print the PDF files. The PDF files were created using freely available utilities, so that you could see a book-like printout. It's important to understand that no manual markup or editing was performed at any stage in these sample files.