GutenMark Software Change Log
Attractively formatting Project Gutenberg texts

Download current development software source code:  GutenMark_source_dev-20090510.tar.gz.


02/16/09.  The Mac OS X version is now believed to be fully working on Mac OS X 10.5 (PPC and Intel), 10.4 (PPC and Intel), and 10.3 (PPC, of course!), though I only have a subset of these systems on which to test.

06/01/08.  Another pot-pourri:

05/28/08.  GUItenMark now compiles and works for Mac OS X, but so far I'm only able to build it for Mac OS X 10.5, so I haven't yet gone through the hassle of adding it to the binary download for Mac OS X.  However, it is quite easy to build following the instructions on the download page.  Also, Jason Pollock has sent some additional changes and instructions for compiling for the iPhone, but I've not yet had a chance to try them or incorporate them here.  Soon, I hope.

05/26/08.  A miscellaneous pot-pourri of changes:

05/09/08.  Well, Mac OS X support is back (due to the addition of Mac OS X support to the IMCROSS development system which I am using to create Windows binaries), but untested, so I won't provide binaries just yet.  More on this later.

04/23/08.  Made the installer programs a little smaller, by removing the Linux-only files from the Windows installer, and vice-versa.

04/22/08.  Reorganized how the installers are created, to avoid overwriting some directories that are useful to me.  Also, the installer isn't built automatically any longer (you have to do 'make snapshot').  But I don't suppose either of those things is of interest to anyone but me.

04/21/08.  Big doings are transpiring!

The GUItenMark graphical front-end program is now fully working on both Windows and Linux 'x86, and seems to work quite well.  Admittedly, I've tested only Windows XP, SuSE 10.0, and Fedora Core 5.

Linux and Win32 installer programs are now available.

The website has been completely revamped to replace all of the outmoded download, installation, and compilation instructions, and to provide the necessary new instructions pertaining to GUI front-end and installer programs.

Direct support for Linux PPC, Mac OS X, FreeBSD, and NetBSD has now been discontinued.  The software (or at least, GutenMark) can presumably still be built for these platforms, but I simply don't have the time or resources that allow me to do it myself.  I doubt this will be much loss to anybody, since it has been years since I've updated the binaries for those platforms anyhow.

04/20/08.  The GutenMark program itself has had the groundwork for several experimental improvements laid, but the changes haven't resulted yet in any quality changes.   More importantly, there is now a GUI front-end for GutenMark, cleverly called GUItenMark, for both Linux and Windows.  There are a couple of improvements I'd like to make in this program, and a couple of bug-fixes for the Windows version of the program, but it's basically working and seems very useful.  At any rate, even I admit that it's enormously easier to use than the GutenMark command-line program.  I don't want to generally send it out into the world until I make the mentioned changes, test it on more computers (so far I've only tried it on SuSE Linux 10.0 and Windows XP Home), have an installer program, and get the web-page verbiage all fixed up.  However you, loyal reader, can try it out now:
  1. Download the complete package; there's nothing else to download unless you want the source code as well!  But it is 13.4 Mbytes.
  2. For Windows, unzip this file in "C:\Program Files\", thus creating the directory "C:\Program Files\GutenMark".  For Linux, unzip it in your home directory, thus creating "~/GutenMark", and rename it to "~/.GutenMark".
  3. Right-click on your desktop to create an icon for the application program:  either "C:\Program Files\GutenMark\binary\GUItenMark.exe" (Windows) or else "~/.GutenMark/binary/GUItenMark" (Linux).
  4. Click the icon with your mouse.
  5. Operation of the program should be pretty darned self-explanatory, and it had better be since I haven't written any instructions for it yet.  If not, let me know.

02/02/08.  A lot of explanation has been added on the usage page about LaTeX, and particularly about using LaTeX on Windows.  Note also that essentially all of the pre-processed etexts have been recently corrected or improved in some way.  (Such changes are normally described on their own "What's New" page rather than being described on this software-change page.)

03/20/04.  Fixed PR #114.  This seemingly hasn't caused anybody a problem, but ....  In conjunction with this, the "prefatory" area is no longer in a smaller font size, except for the message that GutenMark itself adds.

02/21/04.  Fixed PR #113, which caused HTML headers to be omitted in some files created by GutenSplit.

01/21/04.  A new utility program called GutenSplit has been added; this program splits the HTML files created by GutenMark into smaller HTML files, adding a table of contents, and links between all of the small HTML files.  The Makefiles have also been modified so that the various "GutenUtilities" (including GutenSplit) are automatically built when GutenMark is built; previously, this was a separate, manual build.  Also, if a cross-compiler version of MinGW is installed on Linux, then a Linux build tries to create not only the Linux versions of the executables, but also Win32 versions as well.

01/05/03.  LaTeX:  Fixed the incorrect mdash construct I've been using all this time!  (I used "--" at first, and then later decided that "----" looked better.  However, since neither of these is the correct LaTeX construct, namely "---", there were problems with them being arbitrarily broken across lines.  Therefore, I added an \mbox to correct this problem, and then a \linebreak to correct problems with the \mbox, and then ....).
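
As a rough illustration of the fix described above, here is a minimal Python sketch (not GutenMark's actual C code; the function name is hypothetical) that collapses any run of two or more hyphens into the standard LaTeX construct "---".  It assumes every such run is meant as an mdash, which is a simplification.

```python
import re

def normalize_mdashes(text):
    """Replace any run of two or more hyphens with LaTeX's mdash, "---".

    Assumption: every multi-hyphen run is intended as an mdash.
    Single hyphens (as in "well-known") are left alone.
    """
    return re.sub(r"-{2,}", "---", text)

print(normalize_mdashes("wait--what----never mind"))
```

Because "---" is a construct LaTeX itself understands, no \mbox or \linebreak workaround is needed in this sketch.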

12/28/02.  LaTeX:  Supplied a workaround for an importing bug in LyX 1.2 (PR#110, spurious linefeeds inserted when importing a LaTeX command-sequence of the form "\ \ ").

12/25/02.  LaTeX:  Messages about the text having been converted by GutenMark, the software version, and what-not have been moved from the "Prefatory Materials" to a copyright-area on the back-side of the title-page.  Also, added a trick to fool LyX into correctly importing the date change from 12/23.  Added the "--ron" command-line switch to group together various settings that I personally find useful.

12/23/02.  LaTeX:  Fixed PR#109 (hyphenation and linebreak problems with mdashes).  Also, removed the date which had appeared on the title page.

12/16/02.  Fixed PR#108 (inability to compile in NetBSD).

11/24/02.  The source tarball has been corrected to contain the correct html file for the regression test.  More slight improvements were made to special.words.gz.  Added "Hon." to the list of honorifics.

11/22/02.  This version has lots of improvements that--in my view, at least--make producing LaTeX much easier, along with a few other miscellaneous changes:  Corrected the email address displayed by the software.  Added the word "The" to special.words.gz.  Fixed the bulk of the problems (but probably not all) associated with too-long spaces following honorifics and quotes in LaTeX.  Fixed smart single-quotes, so as not to be fooled by words like 'em, 'til, etc.  Fixed smart single-quotes and smart double-quotes to correctly treat cases like this:
    ... and so I says to him"--he paused briefly--"why don't you stop ...
Hopefully, fixed the missing quotes in LaTeX chapter names.  Mdashes in LaTeX are now enclosed in LaTeX \mbox{}, to avoid breaking them across lines; also, the --mdash-size switch has been added to allow longer (or shorter) mdashes.  The LaTeX default is now \raggedbottom.  Now use LaTeX "\ " everywhere that "~" had been used previously (allowing LaTeX to much more easily space shorter lines).
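
The smart single-quote problem mentioned above (words like 'em and 'til) can be sketched as follows.  This is a minimal Python illustration under stated assumptions, not GutenMark's actual logic: the ELISIONS list and the function name are hypothetical, and the output uses the HTML entities &lsquo; and &rsquo;.

```python
import re

# Hypothetical, deliberately incomplete list of words that start with an
# elision apostrophe rather than an opening quote.
ELISIONS = {"em", "til", "tis", "twas"}

def smarten_single_quotes(text):
    """Convert straight single quotes to &lsquo; / &rsquo; entities."""
    def replace(match):
        start = match.start()
        before = text[start - 1] if start > 0 else " "
        word = re.match(r"[A-Za-z]+", text[start + 1:])
        if before.isalnum():
            return "&rsquo;"   # inside or closing a word: don't, dogs'
        if word and word.group(0).lower() in ELISIONS:
            return "&rsquo;"   # elision apostrophe: 'em, 'til
        if word:
            return "&lsquo;"   # opening quote in front of a word
        return "&rsquo;"       # otherwise treat it as a closing quote
    return re.sub(r"'", replace, text)

print(smarten_single_quotes("give 'em 'the works'"))
```

A real implementation would need a much longer elision list and would still be fooled by a quotation that happens to open with one of those words.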

08/26/02.  Fixed PR#99, making the 20020809 ALL-CAPS turnoff a little more reliable.

08/25/02.  A few LaTeX conveniences were added, mainly to eliminate manual post-corrections:  For page headings, the cases of the right- and left-headings are matched; i.e., if one of them is all-caps then the other one is forced to be all-caps also.  Hard-spaces in chapter headings have been eliminated.  When the chapter heading is something like "CHAPTER IV.  THE SEARCH FOR PEACE", the page heading now only shows "THE SEARCH FOR PEACE" and eliminates "CHAPTER IV."  The "\sloppy" markup is now used in place of "\emergencystretch".
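
The two page-heading conveniences described above can be sketched like this.  It is a minimal Python illustration with hypothetical function names; the assumption that chapter numbers are always Roman numerals following the word "CHAPTER" is mine, not something GutenMark guarantees.

```python
import re

def page_heading(chapter_heading):
    """Drop a leading "CHAPTER <roman numeral>." when a title follows it."""
    stripped = re.sub(
        r"^\s*CHAPTER\s+[IVXLCDM]+\.\s+(?=\S)",
        "",
        chapter_heading,
        flags=re.IGNORECASE,
    )
    return stripped.strip()

def match_heading_case(left, right):
    """If either heading is all-caps, force the other to all-caps too."""
    if left.isupper() or right.isupper():
        return left.upper(), right.upper()
    return left, right

print(page_heading("CHAPTER IV.  THE SEARCH FOR PEACE"))
print(match_heading_case("A Tale", "THE SEARCH FOR PEACE"))
```

A heading that consists only of "CHAPTER IV." is left untouched, since there is no title to fall back on.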

08/11/02.  Some LaTeX formatting changes were made to work around bugs in LyX 1.2.  Also, some LaTeX command sequences (like that for the æ ligature) were broken and have now been fixed.  Others may still be broken, for all I know.

08/10/02.  Fixed a segfault in the author-deduction code added two days ago.

08/09/02.  Fixed PR #93 (incorrect treatment of constructs like _[text]_ and [_text_]).  Fixed PR #94 (mixing/matching of ALL-CAPS italicizing mode with other italicizing modes).  [This is handled in two ways:  GutenMark attempts to deduce which emphasis mode is used, but also the "--caps-ok" switch has been added to simply turn off ALL-CAPS conversion.]

08/08/02.  LaTeX-related changes:  Output file now includes a table of contents.  Command-line switches "--no-toc", "--author", and "--title" added.

08/05/02.  Added "--latex-sections" command-line option.

08/04/02.  All LaTeX-related:  Various bugs I found yesterday (see buglist) were fixed.  Also, the hard spaces in sentence breaks have been completely eliminated now.  Chapter headings are now ragged-right in all circumstances.

08/03/02.  Added a new page to the website, for etexts I've converted to LaTeX and PDF.

08/03/02.  Added "Rev.", "Gen.", and "Messrs" to the list of honorifics.  All of the following are LaTeX-only changes:  Trailing spaces and periods (but not ellipses) are now removed from chapter names as used for page headings.  For example, if the chapter name was "CHAPTER III." (as opposed to "CHAPTER III"), it will now continue to appear as "CHAPTER III." except in the page heading where it is now "CHAPTER III".  An "emergencystretch" factor is now added, to eliminate some run-ons into the right margin for small page sizes.  The soft-hyphen mechanism after em-dashes has been changed, because the old one didn't work.  Honorifics and other abbreviations being treated as ends of sentences has been fixed.
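
The page-heading cleanup described above might look something like the following minimal Python sketch.  The function name is hypothetical, and it assumes an ellipsis is written as three unspaced periods ("..."); spaced ellipses are not handled.

```python
def heading_for_page(chapter_name):
    """Strip trailing spaces and periods, but keep a trailing ellipsis.

    Assumption: an ellipsis is exactly the unspaced sequence "...".
    """
    name = chapter_name.rstrip()
    if name.endswith("..."):
        return name
    return name.rstrip(". ")

print(heading_for_page("CHAPTER III. "))
print(heading_for_page("AND THEN..."))
```

The chapter name itself is not modified; only the copy used for the page heading is passed through a function like this.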

07/25/02.  Addressed PR #85, in which abnormally long input lines can cause corruption in the output file.

07/22/02.  Addressed PR #84, hopefully extending yesterday's fixes from Windows 98 to the entire Win32 family.  Sadly -- or perhaps happily -- I only have Windows 98 myself, and therefore can't properly test the fixes, and did not know that the previous fixes didn't work in Win2K.

07/21/02.  Fixed PR #83, in which the PATH environment variable didn't work properly.  Fixed PR #80, in which the default configuration file needed -- unreasonably, I think -- to have the exact paths of the wordlists rather than a more graceful means of finding the wordlists.

07/14/02.  [20020714 snapshot.]  Fixed PR #79, in which configuration files and/or wordlists were not used properly in Win32 when running the program from an alternate directory.  Added a lot of new log messages related to globbing wordlists.

07/13/02.  Added the "--config" command-line switch.  Fixed the problem, in Win32, of the default configuration file being named "GutenMark.exe.cfg".  Improved the warning and log messages associated with searching for the configuration file.

06/30/02.  Added various additional messages associated with wordlist failure to the log created with "--debug".  Also, found and hopefully fixed potential program aborts or wordlist omissions that might occur if not all wordlists are found.

06/16/02.  (All of the following relate only to LaTeX output.)  I think that GutenMark's LaTeX has now reached the point of being nearly as satisfactory (if not as well-tested) as its HTML.  Some details:  Fixed PR#76, bad page headers.  Also, GutenMark no longer creates a table of contents, since the table that it had been creating was flaky.  Fixed PR#77, in which certain italicized text was omitted.  Fixed PR#19, in which the paragraph/line indentation was screwed up when processing certain text files.

06/15/02.  Fixed a shocking but previously undetected indexing bug in the prefatory analysis section.  (All of the following relate to LaTeX output only.)  Fixed PR#20, in which chapter headings incorrectly always added the words "Chapter 1", "Chapter 2", etc.  The default paragraph indentation has changed to non-indented (with a blank line between paragraphs).  To override this (and return instead to indented paragraphs not separated with blanks), the new command-line switch "--no-parskip" has been added.  PR#24 fixed:  The so-called "8-bit ASCII" PG files (i.e., ISO 8859-1 files) are now supported -- not perfectly, but well enough -- and certain bugs previously present in some of the regular 7-bit characters (|, <, >, ^, and maybe others) have been fixed.

03/11/02.  More website face-lifting, some new FAQ text.

02/07/02.  Gave the website a small facelift.

01/18/02.  Added the --no-prefatory and --page-breaks command-line switches, in response to some needs of folk at the Foundation Project.

01/13/02.  Source-code rearrangements:  the code that formerly handled HTML/LaTeX output has been split into separate functions in separate source files:  OutputHtml.c, OutputLatex.c, OutputXml.c, etc., to make it easier to maintain the separate output formats.

01/11/02.  On the download page, there is now a GutenMark *NIX man page, courtesy of Doug Merritt.

12/28/01.  [20011228 snapshot.]

12/27/01.  Can now markup non-PG plain-vanilla ASCII etexts.  Before, there had been bugs that prevented GutenMark from working properly if the PG header/footer could not be found.

12/23/01.  Auto-italicizing foreign phrases has been improved by fixing issues #27 & #65.  This should mean that there is less chance of non-foreign words appearing in foreign phrases; conversely, foreign proper names are now more likely to be italicized when found within the context of a foreign phrase.  This has entailed changes to the format of GutenMark.cfg, but these changes should be backward-compatible.

12/22/01.  Added a database of non-U.S. place names to the wordlists.  Like the U.S. placenames (see below), this is only a preliminary implementation, involving single-word names.  I don't have a lot of confidence in these names, and the database is horrifically large, so I'm not sure how valuable it may be.

12/20/01.  The 'prefatory area' now appears in a slightly smaller font than the remainder of the text.  This is done primarily to reduce the chances of awful-looking word-wrapping (since the prefatory area isn't reformatted).  But it also has the desirable (I think) side-effect of making the prefatory area look different and hence to help hide the fact that it has not participated in the beautifying operations carried out on the rest of the text.

12/18/01.  Added a database of U.S. place names to the wordlists.  These are only single-word names (like "Philadelphia" but not "New Orleans"), but this is at least better than nothing until a full multi-word database can be implemented.

12/16/01.  Fixed a possibly-common case in which a blockquote was misinterpreted as centered text.  HTML-beautifying:  removed spurious soft-space that usually (or always) followed <br>.  Changed the default mode from '--force-symbolic' to '--force-numeric', so that stuff from the prior release won't be broken at the next release.  Fixed a bug with automatic substitution in which 8-bit ASCII could be directly embedded in the HTML rather than being converted to HTML character entities.  Added a bunch of new italicizing styles that were referred to in the gutvol-d newsgroup recently.

12/15/01.  The raw HTML output has been further beautified, as an aid to those who want to use GutenMark output as a starting point for further markup. (The reader's experience within a browser won't be affected.) The changes include limitation of line-length and rejustification of paragraph text, and the elimination of multiple line-feeds (except in preformatted areas).   Also, fixed a bug (#57) in which paragraphs could be started but not internally marked as such, and therefore would receive no closing tag.  Fixed a bug apparently introduced when issue #42 was dealt with, allowing a trailing mdash to appear outside of a paragraph.

12/13/01.  Added means of detecting certain weird sequences of characters that don't appear in normal text:  ".....", ". . . . .", "----", lines beginning or ending with a vertical bar.  When found near the beginning of where a paragraph would ordinarily start, they are interpreted instead as meaning that an area of preformatted text should start.  This seemingly takes care of problem reports #50, #52, and #53.  There was a problem with italicizing and/or 8-bit ASCII conversion of foreign words (#54) which sometimes incorrectly converted the first character of a foreign word to lower case; this has been fixed.  It is annoying when a proper name is interpreted as a foreign word and italicized and/or supplemented with diacriticals; personal names are now partially recognizable contextually, and these features are disabled for them.
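
The detection of "weird sequences" described above could be sketched roughly as follows.  This is a minimal Python illustration with hypothetical names, not the actual GutenMark code; it tests a single line in isolation and ignores the "near the beginning of a paragraph" condition.

```python
import re

# Runs of five or more dots, spaced dot leaders (". . . . ."), or four or
# more hyphens.  The exact thresholds are assumptions taken from the examples.
WEIRD_SEQUENCE = re.compile(r"\.{5,}|(?:\. ){4,}\.|-{4,}")

def looks_preformatted(line):
    """Guess whether a line belongs to a table, index, or similar layout."""
    stripped = line.strip()
    if stripped.startswith("|") or stripped.endswith("|"):
        return True
    return WEIRD_SEQUENCE.search(stripped) is not None

for sample in ("Chapter I ..... 3", "| a | b |", "He paused--then spoke."):
    print(sample, looks_preformatted(sample))
```

Ordinary ellipses ("...") and two-hyphen dashes fall below the thresholds, so normal prose is not flagged.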

12/11/01.  HTML validity bugs fixed:  Bug #49 (bogus "<p></body>" at the end of the output file).  Bug #51 (double italicizing of some words, like <i><i>this</i></i>.)

12/10/01.  Added the command-line switch '--single-space' to allow single blanks between sentences or after colons (rather than the standard default two spaces).  Performed numerous beautifications of the HTML which won't affect the user, including closing issues #42 and #48 on the buglist.  Also, all HTML character entities now default to symbolic rather than numerical (for example, "&lsquo;" rather than "&#8216;"), but can be forced to be numeric instead with the '--force-numeric' command-line switch.  These changes do make the raw HTML a lot easier to read.
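
To illustrate the difference between symbolic and numeric character entities mentioned above, here is a minimal Python sketch using the standard html.entities table.  The function names and the force_numeric parameter are hypothetical stand-ins for the '--force-numeric' switch, and the sketch deliberately leaves plain ASCII (including &, <, and >) untouched.

```python
from html.entities import codepoint2name

def to_entity(ch, force_numeric=False):
    """Return ch as an HTML entity if it is outside 7-bit ASCII."""
    code = ord(ch)
    if code < 128:
        return ch                      # plain ASCII passes through unchanged
    name = codepoint2name.get(code)
    if name and not force_numeric:
        return "&%s;" % name           # symbolic form, e.g. &lsquo;
    return "&#%d;" % code              # numeric form, e.g. &#8216;

def encode(text, force_numeric=False):
    return "".join(to_entity(ch, force_numeric) for ch in text)

sample = "\u2018caf\u00e9\u2019"
print(encode(sample))
print(encode(sample, force_numeric=True))
```

Characters with no symbolic name in the table fall back to the numeric form even when symbolic output is requested.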

12/09/01.  All of the following changes concern the first word of a chapter, if the word is in ALL-CAPS:  A bug in the case-conversion of the words has been fixed.  The default behavior has been changed so that the word is now not italicized (even though its case is fixed).  New command-line switches have been added: '--first-italics' restores the old behavior (i.e., allows italicizing as well as case-conversion) and '--first-capital' allows the word to remain ALL-CAPS (without italicizing).
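
The three behaviors described above for an ALL-CAPS first word can be sketched as follows.  This is a minimal Python illustration, not the real implementation: the function and parameter names are hypothetical stand-ins for the '--first-italics' and '--first-capital' switches, and skipping one-letter words such as "I" or "A" is my own assumption.

```python
def fix_first_word(paragraph, first_italics=False, first_capital=False):
    """Adjust an ALL-CAPS first word at the start of a chapter."""
    first, sep, rest = paragraph.partition(" ")
    if len(first) < 2 or not first.isupper():
        return paragraph               # nothing to do (assumption: skip "I", "A")
    if first_capital:
        return paragraph               # leave the word in ALL-CAPS
    word = first.capitalize()          # default: fix the case only
    if first_italics:
        word = "<i>" + word + "</i>"   # old behavior: italicize as well
    return word + sep + rest

print(fix_first_word("THE night was dark."))
print(fix_first_word("THE night was dark.", first_italics=True))
print(fix_first_word("THE night was dark.", first_capital=True))
```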

12/05/01.  [20011205 source-only snapshot.]  Discovered that the source format had gotten all messed up and unreadable by shuttling it back and forth between Linux and Win32.  Yikes!  Now fixed.

12/05/01.  Removed the compilation date from the disclaimer GutenMark adds to the output file, to allow the regression test to work automatically (without inspection).  Provided for a global configuration file (which can be overridden by a user's configuration file), thus allowing a more sensible and automatic installation procedure.
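
The global-versus-user configuration lookup described above can be sketched as a simple search order.  This is a minimal Python illustration only: the function name and both example paths are hypothetical, and it assumes that "overridden" means the user's file replaces the global file entirely rather than being merged with it setting by setting.

```python
import os

def find_config(user_path, global_path, exists=os.path.isfile):
    """Return the user's configuration file if present, else the global one.

    The exists parameter is injectable so the search order can be tested
    without touching the filesystem.
    """
    for path in (user_path, global_path):
        if path and exists(path):
            return path
    return None

# Hypothetical locations, for illustration only.
chosen = find_config(
    os.path.expanduser("~/.GutenMark/GutenMark.cfg"),
    "/etc/GutenMark.cfg",
)
print(chosen)
```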

12/01/01.  [20011201 snapshot.]  Fixed various bugs that became apparent only in running the Win32 and BSD-based versions.  Added regression testing to the build process.

11/26/01.  Experimental '--latex' command-line switch added.

11/25/01.  Foreign italicizing now pretty acceptable.  '--no-foreign' command-line switch added to turn off the foreign italicizing.  Restoration of 8-bit ASCII (diacritical marks) in 7-bit ASCII etext now implemented, along with the '--no-diacritical' command-line switch.  Embedded GutenMark disclaimer improved.  Detection of verse now slightly modified to reduce the probability of falsely detecting dialog as verse.  (Hopefully, real verse detection wasn't broken by this!)

11/23/01.  Wordlist/namelist extension to the purely internal handling of ALL-CAPS capitalization has been added, and some bugs fixed.  A preliminary form of automatically italicizing foreign words by means of the wordlists has been added.

11/22/01.  Parsing of the configuration file and reading of the wordlists/namelists has been added, along with a related "--profile" command-line switch.  Added a '--debug' command-line switch, which causes a logfile (GutenMark.log) to be created.  The logfile has many potential uses, but right now it lists the wordlists/namelists used, and shows how each word in the wordlist generated from the etext relates to the words in them.

11/19/01.  Fixed a bug in which things like "--" at the end of a line of the input text-file were converted to "-".  Added a notification at the top of the output file that the text had been converted by GutenMark.  Added a preliminary form of handling all-caps italicizing based on contextual clues within the text file.

11/13/01.  [20011113 snapshot.]  Added "smart quotes".

11/12/01.  Now removes the Project Gutenberg file header (and file footer) by default, in order to conform to the requirements described in the Project Gutenberg header itself.  However, the header (and footer) can be retained by means of a command-line option (--yes-header).

11/09/01.  Constructs like " - ", "--", "---", "----", and so on, are now converted to appropriate combinations of m-dashes and n-dashes.  These look great when the HTML is converted to postscript or PDF, but don't render well in any of the browsers I've tried.  Therefore, a command-line option (--no-mdash) has been added that can turn this conversion off.

11/02/01.  [20011102 snapshot.]  This is the first version made available online.  It should be quite useful, but it is certainly a very preliminary version.

08/30/01.  Began.

©2001-2003 Ronald S. Burkey.