Is it possible to pass multiple dictionaries to enchant?

Is there any way to use multiple dictionaries in enchant? This is what I do:

import enchant
d = enchant.Dict("en_US")
d.check("materialise")
>> False

But if I use enchant.Dict("en_UK"), I get True. What is the best way to combine multiple dictionaries so that check() returns True whether the input is materialise or materialize?

There are 3 best solutions below

For Hunspell dictionaries there is a workaround, provided all the dictionaries share the same .aff file. I suppose en_US and en_GB meet that condition.

The Bash script (dic_combine.sh), written by Sergey Kurakin, is as follows:

#!/bin/bash

# Combines two or more hunspell dictionaries.
# (C) 2010 Sergey Kurakin <kurakin_at_altlinux_dot_org>

# Attention! All source dictionaries MUST share the same affix file.

# Usage: dic_combine source1.dic source2.dic [source3.dic...] > combined.dic

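# Temporary file for the merged word list.
TEMPFILE=$(mktemp)

# Merge the inputs, drop duplicates, then delete empty lines and
# all-digit lines (the word-count header of each source file).
cat "$@" | sort --unique | sed -E '/^[0-9]*$/d' > "$TEMPFILE"

# Emit the new word count, then the entries.
wc -l < "$TEMPFILE"
cat "$TEMPFILE"
rm -f "$TEMPFILE"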

So, you have to put those dictionary files in a directory and run:

$ dic_combine en_US.dic en_GB.dic > en.dic
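
For anyone who would rather stay in Python, the same merge can be sketched roughly as below. This is only an illustration of what the shell pipeline does, not part of the original script: the function name combine_dic_lines is made up, and the file names and encoding in the usage comment are assumptions. Python sorts by code point, so the order of the entries may differ from what sort produces under your locale.

```python
import re


def combine_dic_lines(*sources):
    """Merge the lines of several .dic files that share one affix file.

    Each source is any iterable of lines, for example an open file.
    Returns the new word-count header followed by the unique entries.
    """
    entries = set()
    for lines in sources:
        for line in lines:
            line = line.rstrip("\r\n")
            # Skip empty lines and all-digit lines (the old word-count headers),
            # mirroring the sed step of the shell script.
            if not re.fullmatch(r"[0-9]*", line):
                entries.add(line)
    ordered = sorted(entries)
    return [str(len(ordered)), *ordered]


# Usage sketch (file names and encoding are assumptions):
# with open("en_US.dic", encoding="utf-8") as us, \
#         open("en_GB.dic", encoding="utf-8") as gb, \
#         open("en.dic", "w", encoding="utf-8") as out:
#     out.write("\n".join(combine_dic_lines(us, gb)) + "\n")
```

As with the shell version, this only makes sense when every source dictionary uses the same affix file.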

@Mass17 that is actually not correct. The expression "en_US" and "en_UK" is a logical AND on two strings, and its result is "en_UK". Here is how the and operator works in that expression: (1) any non-empty string is considered true; (2) if the left operand is true, the expression evaluates to the right operand, whatever it is. So only "en_UK" is passed on. Read about Python's short-circuit evaluation for some insight into why it works this way.

So:

>>> "en_US" and "en_UK"
'en_UK'

And note, if you switch the order of the strings:

>>> "en_UK" and "en_US"
'en_US'

The words "materialise" and "materialize" BOTH appear in your "en_UK" dictionary, hence the results you got. You haven't actually "combined" the 2 dictionaries yet.
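
If you do want to combine them, one option is to load each dictionary separately and accept a word when any of them accepts it. Below is a minimal sketch of that idea, not an official pyenchant feature: MultiDict and load_enchant_dicts are hypothetical names, and the tags en_US and en_GB in the usage comment assume both dictionaries are installed. It relies only on enchant.Dict, enchant.dict_exists and Dict.check.

```python
class MultiDict:
    """Treat several dictionary objects as one.

    A word is accepted if at least one underlying dictionary accepts it.
    Any object with a check(word) method will do.
    """

    def __init__(self, *dicts):
        self.dicts = dicts

    def check(self, word):
        return any(d.check(word) for d in self.dicts)


def load_enchant_dicts(tags):
    """Build a MultiDict from enchant language tags, skipping missing ones."""
    import enchant  # imported here so MultiDict itself works without enchant

    return MultiDict(*(enchant.Dict(tag) for tag in tags if enchant.dict_exists(tag)))


# Usage sketch (assumes en_US and en_GB are installed):
# combined = load_enchant_dicts(["en_US", "en_GB"])
# combined.check("materialise")
# combined.check("materialize")
```

The trade-off is that every lookup may hit each dictionary in turn, which is usually fine for a handful of dictionaries.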

I may be late here, but this question intrigued me too.

So, the solution for using multiple dialects of the English language in Python's enchant is as below:

    import enchant

    # Use the generic "en" tag to cover all available dialects and
    # word usages of the English language.
    d = enchant.Dict("en")

    d.check("materialise")  # UK (en_GB)
    # True

    d.check("materialize")  # USA (en_US)
    # True

Hope this helps future readers :)