[htdig] Match Umlauts as ae AND a-umlaut possible?

Discussion:

Karl Beckers

2003-07-22 05:26:11 UTC

Hi all,

German has alternative ways of spelling for certain characters.
An A-Umlaut e.g. can be written using a special character 'ä'
(a\" in the affix file of the dictionary) or ae.
Now, I'd like to find both spellings, no matter what kind of
spelling the user used for his query.

Is that possible?
I've been looking at ispell docs and can't say I've found much
of a clue. Any pointers from this list? Anybody done this,
or is this a no-go?

TIA,

Karl.

Gilles Detillieux

2003-07-22 13:19:14 UTC


Post by Karl Beckers
German has alternative ways of spelling for certain characters.
An A-Umlaut e.g. can be written using a special character 'ä'
(a\" in the affix file of the dictionary) or ae.
Now, I'd like to find both spellings, no matter what kind of
spelling the user used for his query.
Is that possible?
I've been looking at ispell docs and can't say I've found much
of a clue. Any pointers from this list? Anybody done this,
or is this a no-go?

For now it's pretty much a no-go. We had talked previously about expanding
the accents algorithm to be configurable and to handle one-to-many and
many-to-one character conversions, but no one has been able to step up
to the plate and implement this. The only alternative I can see at this
point is to put all the equivalent words containing umlauts and digraphs
together in the synonyms file, but that could get tedious as the matches
must be word by word, and not character by character.
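
To take some of the tedium out of the synonyms-file route, the word pairs
could be generated by a script rather than typed by hand. What follows is
only a rough sketch, not anything that ships with ht://Dig: the helper names
are hypothetical, and it assumes each line of the synonyms file is simply a
whitespace-separated list of equivalent words (please check the documentation
for synonym_dictionary before relying on that). It takes words that contain
umlauts and emits one line per word pairing it with its digraph spelling.

```python
# Hypothetical helper, not part of ht://Dig: build umlaut/digraph word pairs.
# Assumption: one synonym group per line, words separated by whitespace.

UMLAUT_TO_DIGRAPH = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
}


def digraph_spelling(word):
    """Return the word with every umlaut replaced by its digraph."""
    return "".join(UMLAUT_TO_DIGRAPH.get(ch, ch) for ch in word)


def synonym_lines(words):
    """Return one 'umlaut-form digraph-form' line per word that has an umlaut."""
    lines = []
    for word in words:
        alternative = digraph_spelling(word)
        if alternative != word:  # skip words without any umlaut
            lines.append(f"{word} {alternative}")
    return lines


if __name__ == "__main__":
    # Illustrative word list; in practice this would come from your own
    # dictionary or from the words found in the indexed documents.
    for line in synonym_lines(["Mädchen", "Haus", "über"]):
        print(line)
```

The sketch only goes from umlaut to digraph, because the reverse direction is
ambiguous: not every "ae" or "ue" in a word stands for an umlaut ("Israel",
"aktuell"). The output would also have to be saved in whatever character
encoding your locale and index use, and the synonyms database rebuilt
afterwards.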

--
Gilles R. Detillieux E-mail: <***@scrc.umanitoba.ca>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/
Dept. Physiology, U. of Manitoba Winnipeg, MB R3E 3J7 (Canada)

Karl Beckers

2003-07-22 13:53:34 UTC


Post by Gilles Detillieux
[...]
For now it's pretty much a no-go. We had talked previously about expanding
the accents algorithm to be configurable and to handle one-to-many and
many-to-one character conversions, but no one has been able to step up
to the plate and implement this. The only alternative I can see at this
point is to put all the equivalent words containing umlauts and digraphs
together in the synonyms file, but that could get tedious as the matches
must be word by word, and not character by character.

Thanks,

Karl.