GutenMark Software Change Log
Attractively formatting Project Gutenberg texts


Ladders, by Lynnie Rothan

Download current development software: GutenMark_source_dev.tar.gz.

11/24/02.  The source tarball has been corrected to contain the correct html file for the regression test.  More slight improvements were made to special.words.gz.  Added "Hon." to the list of honorifics.

11/22/02.  This version has lots of improvements that--in my view, at least--make producing LaTeX much easier, along with a few other miscellaneous changes:  Corrected the email address displayed by the software.  Added the word "The" to special.words.gz.  Fixed the bulk of the problems (but probably not all) associated with too-long spaces following honorifics and quotes in LaTeX.  Fixed smart single-quotes, so as not to be fooled by words like 'em, 'til, etc.  Fixed smart single-quotes and smart double-quotes to correctly treat cases like this:
    ... and so I says to him"--he paused briefly--"why don't you stop ...
Hopefully, fixed the missing quotes in LaTeX chapter names.  Mdashes in LaTeX are now enclosed in LaTeX \mbox{}, to avoid breaking them across lines; also, the --mdash-size switch has been added to allow longer (or shorter) mdashes.  The LaTeX default is now \raggedbottom.  Now use LaTeX "\ " everywhere that "~" had been used previously (allowing LaTeX to space shorter lines much more easily).
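
As a rough illustration of the quote rules described in this entry, here is a minimal Python sketch (not GutenMark's C code).  It assumes a simple context rule: a quote opens after whitespace, a dash, or an opening bracket when a non-space character follows, and closes otherwise; an apostrophe that begins one of a few elisions is always treated as a closing quote.  The ELISIONS set and the function name smart_quotes are hypothetical.

```python
import re

# Hypothetical elision list for illustration; not GutenMark's actual data.
ELISIONS = {"em", "til", "tis", "twas"}

OPEN = {'"': "\u201c", "'": "\u2018"}
CLOSE = {'"': "\u201d", "'": "\u2019"}


def smart_quotes(text: str) -> str:
    """Replace straight quotes with curly ones, judging direction from neighbours."""
    out = []
    for i, ch in enumerate(text):
        if ch not in OPEN:
            out.append(ch)
            continue
        prev = text[i - 1] if i > 0 else " "
        nxt = text[i + 1] if i + 1 < len(text) else " "
        if ch == "'" and not prev.isalnum():
            word = re.match(r"[A-Za-z]*", text[i + 1:]).group(0)
            if word.lower() in ELISIONS:
                out.append(CLOSE[ch])  # 'em, 'til: an apostrophe, not an opening quote
                continue
        # Treat a preceding dash like whitespace, so that in
        # him"--he paused briefly--"why the first quote closes and the second opens.
        if (prev.isspace() or prev in "-([") and not nxt.isspace():
            out.append(OPEN[ch])
        else:
            out.append(CLOSE[ch])
    return "".join(out)
```

Apostrophes inside words (don't) fall through to the "close" branch, because the preceding character is a letter.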

08/26/02.  Fixed PR#99, making the 20020809 ALL-CAPS turnoff a little more reliable.

08/25/02.  A few LaTeX conveniences were added, mainly to eliminate manual post-corrections:  For page headings, the cases of the right- and left-headings are matched; i.e., if one of them is all-caps then the other one is forced to be all-caps also.  Hard-spaces in chapter headings have been eliminated.  When the chapter heading is something like "CHAPTER IV.  THE SEARCH FOR PEACE", the page heading now only shows "THE SEARCH FOR PEACE" and eliminates "CHAPTER IV."  The "\sloppy" markup is now used in place of "\emergencystretch".
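
A minimal Python sketch of these page-heading rules, which also folds in the trailing-period rule from the 08/03/02 entry.  The regular expression and function names are illustrative only, not GutenMark's actual code, and the example assumes one of the two headings holds the book title.

```python
import re
from typing import Tuple

# Matches a leading "CHAPTER IV." (Roman or Arabic numeral) only when a title follows it.
CHAPTER_PREFIX = re.compile(r"^\s*CHAPTER\s+[IVXLCDM\d]+\.?\s+(?=\S)", re.IGNORECASE)


def page_heading(chapter_heading: str) -> str:
    """Shorten a chapter heading for use as a running page heading."""
    text = CHAPTER_PREFIX.sub("", chapter_heading).rstrip()
    # Drop trailing periods and spaces, but leave an ellipsis intact.
    if not text.endswith(("...", ". . .")):
        text = text.rstrip(". ")
    return text


def match_case(left: str, right: str) -> Tuple[str, str]:
    """If either heading is ALL-CAPS, force the other to ALL-CAPS as well."""
    if left.isupper() or right.isupper():
        return left.upper(), right.upper()
    return left, right
```

A bare "CHAPTER IV." with no title after it is kept (minus its trailing period), since the prefix pattern requires a following title.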

08/11/02.  Some LaTeX formatting changes were made to work around bugs in LyX 1.2.  Also, some LaTeX command sequences (like that for the æ ligature) were broken and have now been fixed.  Others may still be broken, for all I know.

08/10/02.  Fixed a segfault in the author-deduction code added two days ago.

08/09/02.  Fixed PR #93 (incorrect treatment of constructs like _[text]_ and [_text_]).  Fixed PR #94 (mixing/matching of ALL-CAPS italicizing mode with other italicizing modes).  [This is handled in two ways:  GutenMark attempts to deduce which emphasis mode is used, but also the "--caps-ok" switch has been added to simply turn off ALL-CAPS conversion.]

08/08/02.  LaTeX-related changes:  Output file now includes a table of contents.  Command-line switches "--no-toc", "--author", and "--title" added.

08/05/02.  Added "--latex-sections" command-line option.

08/04/02.  All LaTeX-related:  Various bugs I found yesterday (see buglist) were fixed.  Also, the hard spaces in sentence breaks have been completely eliminated now.  Chapter headings are now ragged-right in all circumstances.

08/03/02.  Added a new page to the website, for etexts I've converted to LaTeX and PDF.

08/03/02.  Added "Rev.", "Gen.", and "Messrs" to the list of honorifics.  All of the following are LaTeX-only changes:  Trailing spaces and periods (but not ellipses) are now removed from chapter names as used for page headings.  For example, if the chapter name was "CHAPTER III." (as opposed to "CHAPTER III"), it will continue to appear as "CHAPTER III." except in the page heading, where it is now "CHAPTER III".  An "emergencystretch" factor is now added, to eliminate some run-ons into the right margin for small page sizes.  The soft-hyphen mechanism after em-dashes has been changed, because the old one didn't work.  Fixed a bug in which honorifics and other abbreviations were treated as the ends of sentences.
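
The honorific fix can be sketched in Python as a substitution that places LaTeX's "\ " (an ordinary interword space) after each honorific, so that LaTeX does not add end-of-sentence spacing after its period; unlike "~", "\ " still allows a line break at that point.  The list below contains the honorifics named in this log plus "Mr." and "Mrs.", which are assumptions; this is a sketch, not GutenMark's implementation.

```python
import re

# Honorifics named in this log plus "Mr." and "Mrs." (assumed); not GutenMark's full list.
HONORIFICS = ["Mr.", "Mrs.", "Rev.", "Gen.", "Hon.", "Messrs."]
_PATTERN = re.compile(r"\b(" + "|".join(re.escape(h) for h in HONORIFICS) + r")\s+")


def latex_honorific_spaces(text: str) -> str:
    """Follow each honorific with LaTeX "\\ " (an ordinary, breakable interword space).

    Without it, LaTeX treats the period as a sentence end and adds extra space;
    "~" would also prevent that, but it forbids a line break at that point.
    """
    return _PATTERN.sub(lambda m: m.group(1) + "\\ ", text)
```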

07/25/02.  Addressed PR #85, in which abnormally long input lines can cause corruption in the output file.

07/22/02.  Addressed PR #84, hopefully extending yesterday's fixes from Windows 98 to the entire Win32 family.  Sadly -- or perhaps happily -- I only have Windows 98 myself, and therefore can't properly test the fixes, and did not know that the previous fixes didn't work in Win2K.

07/21/02.  Fixed PR #83, in which the PATH environment variable didn't work properly.  Fixed PR #80, in which the default configuration file needed -- unreasonably, I think -- to have the exact paths of the wordlists rather than a more graceful means of finding the wordlists.

07/14/02.  [20020714 snapshot.]  Fixed PR #79, in which configuration files and/or wordlists were not used properly in Win32 when running the program from an alternate directory.  Added a lot of new log messages related to globbing wordlists.

07/13/02.  Added the "--config" command-line switch.  Fixed the problem, in Win32, of the default configuration file being named "GutenMark.exe.cfg".  Improved the warning and log messages associated with searching for the configuration file.
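
The log does not say how the default configuration-file name is chosen.  One plausible reading, assumed here, is that it is derived from the program's own name, which would explain the stray ".exe".  A minimal Python sketch of that rule (the function name is hypothetical; "--config" would simply bypass it):

```python
import ntpath


def default_config_name(argv0: str) -> str:
    """Derive the default configuration-file name from the program's own name.

    Hypothetical rule: strip any directory and a trailing ".exe", then append ".cfg",
    so that "C:\\bin\\GutenMark.exe" yields "GutenMark.cfg" rather than "GutenMark.exe.cfg".
    """
    base = ntpath.basename(argv0)  # ntpath splits on both "\\" and "/"
    stem, ext = ntpath.splitext(base)
    if ext.lower() == ".exe":
        base = stem
    return base + ".cfg"
```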

06/30/02.  Added various additional messages associated with wordlist failure to the log created with "--debug".  Also, found and hopefully fixed potential program aborts or wordlist omissions that might occur if not all wordlists are found.

06/16/02.  (All of the following relate only to LaTeX output.)  I think that GutenMark's LaTeX has now reached the point of being nearly as satisfactory (if not as well-tested) as its HTML.  Some details:  Fixed PR#76, bad page headers.  Also, GutenMark no longer creates a table of contents, since the table that it had been creating was flaky.  Fixed PR#77, in which certain italicized text was omitted.  Fixed PR#19, in which the paragraph/line indentation was screwed up when processing certain text files.

06/15/02.  Fixed a shocking but previously undetected indexing bug in the prefatory analysis section.  (All of the following relate to LaTeX output only.)  Fixed PR#20, in which chapter headings incorrectly always added the words "Chapter 1", "Chapter 2", etc.  The default paragraph indentation has changed to non-indented (with a blank line between paragraphs).  To override this (and return instead to indented paragraphs not separated by blank lines), use the new command-line switch "--no-parskip".  PR#24 fixed:  The so-called "8-bit ASCII" PG files (i.e., ISO 8859-1 files) are now supported -- not perfectly, but well enough -- and certain bugs previously present in some of the regular 7-bit characters (|, <, >, ^, and maybe others) have been fixed.
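
A minimal Python sketch of text-mode escaping for LaTeX output, covering the 7-bit characters mentioned above (|, <, >, ^) and a few ISO 8859-1 characters.  The table is illustrative rather than GutenMark's own, and it assumes the input has already been decoded from ISO 8859-1 (for example with bytes.decode("latin-1")):

```python
# Illustrative mapping only; GutenMark's own tables are not reproduced here.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "{": r"\{",
    "}": r"\}",
    "$": r"\$",
    "&": r"\&",
    "#": r"\#",
    "%": r"\%",
    "_": r"\_",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
    "|": r"\textbar{}",
    "<": r"\textless{}",
    ">": r"\textgreater{}",
    # A few ISO 8859-1 characters, written with classic LaTeX accent and ligature commands.
    "\u00e6": r"\ae{}",
    "\u00e9": r"\'{e}",
    "\u00fc": r"\"{u}",
}


def latex_escape(text: str) -> str:
    """Escape characters that are special (or render wrongly) in LaTeX text mode."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)
```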

03/11/02.  More website face-lifting, some new FAQ text.

02/07/02.  Gave the website a small facelift.

01/18/02.  Added the --no-prefatory and --page-breaks command-line switches, in response to some needs of folk at the Foundation Project.

01/13/02.  Source-code rearrangements:  the code that formerly handled HTML/LaTeX output has been split into separate functions in separate source files:  OutputHtml.c, OutputLatex.c, OutputXml.c, etc., to make it easier to maintain the separate output formats.

01/11/02.  On the download page, there is now a GutenMark *NIX man page, courtesy of Doug Merritt.

12/28/01.  [20011228 snapshot.]

12/27/01.  Can now mark up non-PG plain-vanilla ASCII etexts.  Before, there had been bugs that prevented GutenMark from working properly if the PG header/footer could not be found.

12/23/01.  Auto-italicizing foreign phrases has been improved by fixing issues #27 & #65.  This should mean that there is less chance of non-foreign words appearing in foreign phrases; conversely, foreign proper names are now more likely to be italicized when found within the context of a foreign phrase.  This has entailed changes to the format of GutenMark.cfg, but these changes should be backward-compatible.

12/22/01.  Added a database of non-U.S. place names to the wordlists.  Like the U.S. placenames (see below), this is only a preliminary implementation, involving single-word names.  I don't have a lot of confidence in these names, and the database is horrifically large, so I'm not sure how valuable it may be.

12/20/01.  The 'prefatory area' now appears in a slightly smaller font than the remainder of the text.  This is done primarily to reduce the chances of awful-looking word-wrapping (since the prefatory area isn't reformatted).  But it also has the desirable (I think) side-effect of making the prefatory area look different and hence to help hide the fact that it has not participated in the beautifying operations carried out on the rest of the text.

12/18/01.  Added a database of U.S. place names to the wordlists.  These are only single-word names (like "Philadelphia" but not "New Orleans"), but this is at least better than nothing until a full multi-word database can be implemented.

12/16/01.  Fixed a possibly-common case in which a blockquote was misinterpreted as centered text.  HTML-beautifying:  removed spurious soft-space that usually (or always) followed <br>.  Changed the default mode from '--force-symbolic' to '--force-numeric', so that stuff from the prior release won't be broken at the next release.  Fixed a bug with automatic substitution in which 8-bit ASCII could be directly embedded in the HTML rather than being converted to HTML character entities.  Added a bunch of new italicizing styles that were referred to in the gutvol-d newsgroup recently.

12/15/01.  The raw HTML output has been further beautified, as an aid to those who want to use GutenMark output as a starting point for further markup. (The reader's experience within a browser won't be affected.) The changes include limitation of line-length and rejustification of paragraph text, and the elimination of multiple line-feeds (except in preformatted areas).   Also, fixed a bug (#57) in which paragraphs could be started but not internally marked as such, and therefore would receive no closing tag.  Fixed a bug apparently introduced when issue #42 was dealt with, allowing a trailing mdash to appear outside of a paragraph.
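
Rejustifying paragraph text for readable raw HTML can be sketched with Python's textwrap module.  The 72-column width and the function name are assumptions, and the sketch presumes any deliberate extra spacing has already been encoded as entities, since it collapses ordinary whitespace:

```python
import textwrap


def html_paragraph(text: str, width: int = 72) -> str:
    """Emit one paragraph as HTML with collapsed whitespace and lines of at most `width` chars."""
    words = " ".join(text.split())  # collapse runs of spaces and line feeds
    body = textwrap.fill(words, width=width)
    return "<p>\n" + body + "\n</p>"
```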

12/13/01.  Added means of detecting certain weird sequences of characters that don't appear in normal text:  ".....", ". . . . .", "----", lines beginning or ending with a vertical bar.  When found near the beginning of where a paragraph would ordinarily start, they are interpreted instead as meaning that an area of preformatted text should start.  This seemingly takes care of problem reports #50, #52, and #53.  There was a problem with italicizing and/or 8-bit ASCII conversion of foreign words (#54) which sometimes incorrectly converted the first character of a foreign word to lower case; this has been fixed.  It is annoying when a proper name is interpreted as a foreign word and italicized and/or supplemented with diacriticals; personal names are now partially recognizable contextually, and these features are disabled for them.
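
A minimal Python sketch of the line test described here.  It checks only whether a line contains one of the listed sequences or begins or ends with a vertical bar; the positional condition (near the start of a would-be paragraph) and GutenMark's actual matching rules are not modeled:

```python
import re

# The "weird" sequences listed in this entry; the exact rules GutenMark applies are assumed.
_WEIRD = [
    re.compile(r"\.{5}"),         # "....."
    re.compile(r"(?:\. ){4}\."),  # ". . . . ."
    re.compile(r"-{4}"),          # "----"
]


def looks_preformatted(line: str) -> bool:
    """Heuristic: does this line suggest a table or other preformatted layout?"""
    s = line.strip()
    if s.startswith("|") or s.endswith("|"):
        return True
    return any(p.search(s) for p in _WEIRD)
```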

12/11/01.  HTML validity bugs fixed:  Bug #49 (bogus "<p></body>" at the end of the output file).  Bug #51 (double italicizing of some words, like <i><i>this</i></i>.)

12/10/01.  Added the command-line switch '--single-space' to allow single blanks between sentences or after colons (rather than the standard default two spaces).  Performed numerous beautifications of the HTML which won't affect the user, including closing issues #42 and #48 on the buglist.  Also, all HTML character entities now default to symbolic rather than numerical (for example, "&lsquo;" rather than "&#8216;"), but can be forced to be numeric instead with the '--force-numeric' command-line switch.  These changes do make the raw HTML a lot easier to read.
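
Python's standard html.entities.codepoint2name table is enough to sketch the symbolic-versus-numeric choice.  The numeric flag here only mirrors the idea behind "--force-numeric" (which, per the 12/16/01 entry, later became the default); the function itself is hypothetical:

```python
from html.entities import codepoint2name


def html_entity(ch: str, numeric: bool = False) -> str:
    """Encode a non-ASCII character as an HTML character entity.

    Symbolic names (e.g. "&lsquo;") are used where HTML defines one, unless
    `numeric` is set; characters without a name always get a numeric entity.
    """
    cp = ord(ch)
    if cp < 128:
        return ch
    name = codepoint2name.get(cp)
    if numeric or name is None:
        return f"&#{cp};"
    return f"&{name};"
```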

12/09/01.  All of the following changes concern the first word of a chapter, if the word is in ALL-CAPS:  A bug in the case-conversion of the words has been fixed.  The default behavior has been changed so that the word is now not italicized (even though its case is fixed).  New command-line switches have been added: '--first-italics' restores the old behavior (i.e., allows italicizing as well as case-conversion) and '--first-capital' allows the word to remain ALL-CAPS (without italicizing).
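
The three behaviors can be summarized in a small Python sketch.  The mode strings mirror the switch names, while the function name and the HTML <i> markup are assumptions:

```python
def first_word(word: str, mode: str = "default") -> str:
    """Handle an ALL-CAPS first word of a chapter.

    mode: "default"        -> fix the case, no italics
          "first-italics"  -> fix the case and italicize (the old behavior)
          "first-capital"  -> leave it ALL-CAPS, no italics
    """
    if not word.isupper() or mode == "first-capital":
        return word
    fixed = word.capitalize()  # e.g. "IT" -> "It"
    if mode == "first-italics":
        return f"<i>{fixed}</i>"
    return fixed
```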

12/05/01.  [20011205 source-only snapshot.]  Discovered that the source format had gotten all messed up and unreadable by shuttling it back and forth between Linux and Win32.  Yikes!  Now fixed.

12/05/01.  Removed the compilation date from the disclaimer GutenMark adds to the output file, to allow the regression test to work automatically (without inspection).  Provided for a global configuration file (which can be overridden by a user's configuration file), thus allowing a more sensible and automatic installation procedure.

12/01/01.  [20011201 snapshot.]  Fixed various bugs that became apparent only in running the Win32 and BSD-based versions.  Added regression testing to the build process.

11/26/01.  Experimental '--latex' command-line switch added.

11/25/01.  Foreign italicizing now pretty acceptable.  '--no-foreign' command-line switch added to turn off the foreign italicizing.  Restoration of 8-bit ASCII (diacritical marks) in 7-bit ASCII etext now implemented, along with the '--no-diacritical' command-line switch.  Embedded GutenMark disclaimer improved.  Detection of verse now slightly modified to reduce the probability of falsely detecting dialog as verse.  (Hopefully, real verse detection wasn't broken by this!)
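
One way to restore diacritical marks from a wordlist, sketched in Python: index each accented word by its accent-free spelling (via Unicode decomposition), then look up the 7-bit words of the etext.  The three-word list is hypothetical, and when two accented words share an accent-free spelling this simple table keeps only the last one:

```python
import unicodedata


def strip_accents(word: str) -> str:
    """Remove combining diacritical marks, e.g. "caf\u00e9" -> "cafe"."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def build_restorer(wordlist):
    """Map each accent-free spelling to its accented form (hypothetical wordlist)."""
    return {strip_accents(w): w for w in wordlist if strip_accents(w) != w}


def restore_diacritics(word: str, table: dict) -> str:
    """Return the accented form if the wordlist knows one, else the word unchanged."""
    return table.get(word, word)


table = build_restorer(["caf\u00e9", "na\u00efve", "table"])
```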

11/23/01.  Wordlist/namelist extension to the purely internal handling of ALL-CAPS capitalization has been added, and some bugs fixed.  A preliminary form of automatically italicizing foreign words by means of the wordlists has been added.
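
A minimal Python sketch of wordlist-assisted case conversion for ALL-CAPS words.  The namelist format (lower-case spelling mapped to proper capitalization) is an assumption, and the italicizing that accompanies ALL-CAPS emphasis is omitted:

```python
def fix_all_caps(word: str, namelist: dict) -> str:
    """Convert an ALL-CAPS word to normal case, consulting a namelist for proper nouns.

    `namelist` maps lower-case spellings to their proper capitalization
    (a hypothetical stand-in for GutenMark's wordlists/namelists).
    """
    if len(word) < 2 or not word.isupper():
        return word  # leave "I", "A", and already mixed-case words alone
    lower = word.lower()
    return namelist.get(lower, lower)
```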

11/22/01.  Parsing of the configuration file and reading of the wordlists/namelists has been added, along with a related "--profile" command-line switch.  Added a '--debug' command-line switch, which causes a logfile (GutenMark.log) to be created.  The logfile has many potential uses, but right now it lists the wordlists/namelists used, and shows how each word in the wordlist generated from the etext relates to the words in them.

11/19/01.  Fixed a bug in which things like "--" at the end of a line of the input text-file were converted to "-".   Added a notification at the top of the output file that the text had been converted by GutenMark.  Added a preliminary form of handling all-caps italicizing based on contextual clues within the text file.
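
The m-dash conversion (see the 11/09/01 entry), including the end-of-line case this entry fixes, can be sketched in Python as below.  The exact mapping of hyphen runs to m-dashes and n-dashes is not documented, so the pairing rule used here is an assumption:

```python
import re

EM, EN = "\u2014", "\u2013"


def convert_dashes(line: str) -> str:
    """Turn ASCII hyphen runs into m-dashes and n-dashes (assumed mapping).

    " - " becomes a spaced n-dash; a run of two or more hyphens becomes one
    m-dash per pair plus an n-dash for any odd hyphen left over.
    """
    line = re.sub(r"(?<=\s)-(?=\s)", EN, line)
    return re.sub(
        r"-{2,}",
        lambda m: EM * (len(m.group(0)) // 2) + EN * (len(m.group(0)) % 2),
        line,
    )
```

Because the run is matched wherever it occurs, a "--" at the very end of a line becomes a full m-dash rather than a single hyphen.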

11/13/01.  [20011113 snapshot.]  Added "smart quotes".

11/12/01.  Now removes the Project Gutenberg file header (and file footer) by default, in order to conform to the requirements described in the Project Gutenberg header itself.  However, the header (and footer) can be retained by means of a command-line option (--yes-header).

11/09/01.  Constructs like " - ", "--", "---", "----", and so on, are now converted to appropriate combinations of m-dashes and n-dashes.  These look great when the HTML is converted to PostScript or PDF, but don't render well in any of the browsers I've tried.  Therefore, a command-line option (--no-mdash) has been added that can turn this conversion off.

11/02/01.  [20011102 snapshot.]  This is the first version made available online.  It should be quite useful, but it is certainly a very preliminary version.

08/30/01.  Began.


©2001-2002 Ronald S. Burkey.