Subject: GutenMark - German Wordlist Unusable
Date: 03 Aug 2003 16:19:26 UT
To: info@sandroid.org

Hi Ronald,

As you mention somewhere in your FAQ, many people come across your site for non-Gutenberg reasons. I am one of them: I was looking for wordlists to play with anagrams.

Just a side note on the German wordlist:

I'm sorry to say that the wordlist available at your site,
./wasftp.GutenMark/Wordlists/german2.words.gz, is completely spoilt! There is so much script-generated nonsense in it that it would take days to clean it up by hand: false inflections, nonsense prefixes and suffixes (some even from other languages), just to name a few. I wasted quite some time trying to clean it up, but now it is half its original size and the harder-to-find errors are still in there. The only method left is to run a spellchecker against it, with manual editing in most cases, and I have decided it is not worth it.

There are two lists I'd recommend as replacements:
ftp://ftp.leo.org/pub/comp/doc/dict/german-wordlist.gz and
ftp://ftp.leo.org/pub/comp/doc/dict/german-wordlist.new.gz .

Actually they are two versions of the same list:
The first one transcribes German umlauts into two-letter combinations, which still gives readable text
('a umlaut' = 'ä' becomes 'ae',
'sz ligature' = 'ß' becomes 'ss', and so on),
while the second one, which may be easier to convert back automatically because 'ae' and 'ss' also occur in ordinary words, uses
('a umlaut' = 'ä' becomes 'a"',
'sz ligature' = 'ß' becomes 'sS')
instead.
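
To make the difference between the two transcriptions concrete, here is a minimal Python sketch. Only the 'ä' and 'ß' mappings are taken from the description above; the entries for 'ö', 'ü' and the capital letters are filled in by analogy and are assumptions about the LEO files, and the names DIGRAPH, QUOTED, encode and decode_quoted are made up for this illustration.

```python
# First list: readable two-letter forms ('ä' -> 'ae', 'ß' -> 'ss').
# Entries other than 'ä' and 'ß' follow the usual German convention.
DIGRAPH = {"ä": "ae", "ö": "oe", "ü": "ue",
           "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}

# Second list: 'ä' -> 'a"' and 'ß' -> 'sS' as described above.
# The remaining entries are ASSUMED by analogy, not checked against the files.
QUOTED = {"ä": 'a"', "ö": 'o"', "ü": 'u"',
          "Ä": 'A"', "Ö": 'O"', "Ü": 'U"', "ß": "sS"}


def encode(word, table):
    """Replace every character found in the table, leave the rest alone."""
    return "".join(table.get(ch, ch) for ch in word)


def decode_quoted(word):
    """Turn the second transcription back into real umlauts and 'ß'."""
    for plain, coded in QUOTED.items():
        word = word.replace(coded, plain)
    return word


# 'Masse' (mass) and 'Maße' (measures) collide in the first scheme,
# so it cannot be reversed reliably.
print(encode("Masse", DIGRAPH), encode("Maße", DIGRAPH))

# The second scheme keeps them apart and converts back cleanly.
print(encode("Maße", QUOTED), decode_quoted(encode("Maße", QUOTED)))
```

The second scheme can be reversed with plain string replacement because the sequences 'a"' and 'sS' should not occur in ordinary lower-case or capitalised words, whereas 'ae' and 'ss' do.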

At first glance I couldn't detect any nonsense in them.

Both files simply come from LEO's public domain software collection.
If you are ever interested in English-German translations, you might also want to look at their really cute English-German dictionary search engine:
http://dict.leo.org/

GutenMark looks great BTW. I knew PG before, but not your program.

Kind regards,
Rainer Scheppelmann
Bremen (Germany)