A word should be treated as incorrect unless followed by a dot


How does Hunspell handle dots (periods)?

For example, when a text contains e.g. or Dr., should these words be treated as correct? Is there any way to tell Hunspell that Dr is correct only if it is followed by a dot, as in Dr.?

There are 2 best solutions below

2
On BEST ANSWER

You can download any language-specific dictionary (the .dic word list together with its .aff affix file) and replace Dr with Dr. to create your own customised dictionary. For example:

wget -O en_GB.dic https://cgit.freedesktop.org/libreoffice/dictionaries/plain/en/en_GB.dic

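This fetches the British English word list; Hunspell also needs the matching en_GB.aff file alongside it. In this file only Dr is valid, so append a . to make it Dr. and save the result in place of the original. You can then check a document with: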

hunspell -d en_GB myDocument.txt
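The manual replacement described above can also be scripted. The following is only a sketch, assuming each entry sits on its own line and may carry flags after a slash; the file names sample.dic and sample_dotted.dic, the sample words, and the flags /M and /SM are made up for illustration.

```shell
# Hypothetical stand-in for a real .dic file; line 1 is the entry count.
printf '4\nDr\nDrake/M\nMr\ndoctor/SM\n' > sample.dic

# Rewrite only the entry that is exactly "Dr", keeping any /FLAGS after it.
# "Drake" is left alone because the pattern is anchored at both ends.
sed -E 's#^Dr(/.*)?$#Dr.\1#' sample.dic > sample_dotted.dic

# Show the result.
cat sample_dotted.dic
```

Writing to a second file instead of editing in place keeps the original word list available in case the pattern matches more than intended.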

Here is the list of dictionary files available for download.

Edit:

Yes, you can modify the dictionary used by LibreOffice:

cd /usr/share/hunspell
sudo nano en_GB.dic

Press Ctrl + W to search for Dr and replace it with Dr., or add, replace, or delete any other word you like. After saving and closing the file, restart LibreOffice so that Hunspell picks up the modified dictionary.

3
On

Hunspell, the spell checker used in many applications, has specific rules for handling punctuation and abbreviations. The behavior regarding dots (periods) and their association with words like "e.g." or "Dr." depends on the dictionary and affix file used by Hunspell.

Hunspell does not treat "Dr" and "Dr." as the same dictionary entry; a word list can contain either form. If "Dr" is in the dictionary, it will be considered correct regardless of whether it is followed by a dot.

To make Hunspell recognize "Dr" as correct only when it is followed by a dot, you would need to modify the dictionary file to include "Dr." but not "Dr". That way, Hunspell will only recognize the abbreviation as correct when it includes the dot.
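Before editing, it can help to check which form a dictionary currently lists. A minimal sketch, assuming one entry per line with optional flags after a slash; the file name sample.dic, its contents, and the flags /M and /SM are hypothetical.

```shell
# Hypothetical stand-in for a real .dic file; line 1 is the entry count.
printf '3\nDr/M\nDrake\ndoctor/SM\n' > sample.dic

# Print, with line numbers, entries that are exactly "Dr" or "Dr.",
# optionally followed by /FLAGS. "Drake" does not match.
grep -nE '^Dr\.?(/|$)' sample.dic
```

Against a real dictionary, replace sample.dic with the path to the .dic file, for example en_GB.dic.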

You can create or modify a custom dictionary and affix file to define specific rules. That includes specifying which abbreviations are valid and how they should be treated concerning punctuation.

Hunspell's ability to handle complex rules about punctuation is somewhat limited. It primarily relies on the word list in the dictionary file and simple affix rules.

To implement a custom rule for "Dr." in Hunspell, you would typically make sure "Dr." is in the dictionary file, and "Dr" is not, if you want to enforce the dot. After modifying the dictionary, test Hunspell with various inputs to make sure it behaves as expected.

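# Create a custom dictionary that contains only 'Dr.'
# (the first line of a .dic file is the approximate number of entries)
printf '1\nDr.\n' > my_custom_dict.dic
# Hunspell also expects a matching .aff file; a minimal one just sets the encoding
printf 'SET UTF-8\n' > my_custom_dict.aff

# Testing Hunspell with the custom dictionary
echo "Dr" | hunspell -d my_custom_dict
echo "Dr." | hunspell -d my_custom_dict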