Bitextor/Bicleaner MAX_ORDER Issue


I am trying to score a translation file of English-French sentence pairs with Bicleaner (https://github.com/bitextor/bicleaner). I have a test corpus of ten sentence pairs in the required format, but whenever I run the command that should produce the classified file, I hit the same error.

CODE:

!bicleaner-ai-classify  \
        --scol 3 --tcol 4 \
        corpus.en-de.tsv  \
        corpus.en-de.classifed.tsv  \
        bitextor/bicleaner-ai-full-en-fr

I always receive the following error:


2024-03-26 12:07:26,386 - INFO - Arguments processed
2024-03-26 12:07:26,387 - INFO - Starting process
Traceback (most recent call last):
  File "kenlm.pyx", line 139, in kenlm.Model.__init__
RuntimeError: lm/model.cc:49 in void lm::ngram::detail::{anonymous}::CheckCounts(const std::vector<long unsigned int>&) threw FormatLoadException because `counts.size() > 6'.
This model has order 7 but KenLM was compiled to support up to 6.  If your build system supports changing KENLM_MAX_ORDER, change it there and recompile.  With cmake:
 cmake -DKENLM_MAX_ORDER=10 ..
With Moses:
 bjam --max-kenlm-order=10 -a
Otherwise, edit lm/max_order.hh.

The above exception was the direct cause of the following exception:

Traceback (most recent call last): [...]


I have repeatedly run cmake -DKENLM_MAX_ORDER=10 and recompiled, but the error persists. I have also edited lm/max_order.hh to set the maximum order manually. The file previously contained:

#ifndef LM_MAX_ORDER_H
#define LM_MAX_ORDER_H
/* IF YOUR BUILD SYSTEM PASSES -DKENLM_MAX_ORDER, THEN CHANGE THE BUILD SYSTEM.
 * If not, this is the default maximum order.
 * Having this limit means that State can be
 * (kMaxOrder - 1) * sizeof(float) bytes instead of
 * sizeof(float*) + (kMaxOrder - 1) * sizeof(float) + malloc overhead
 */
#ifndef KENLM_ORDER_MESSAGE
#define KENLM_ORDER_MESSAGE "If your build system supports changing KENLM_MAX_ORDER, change it there and recompile.  With cmake:\n cmake -DKENLM_MAX_ORDER=10 ..\nWith Moses:\n bjam --max-kenlm-order=10 -a\nOtherwise, edit lm/max_order.hh."
#endif

#endif // LM_MAX_ORDER_H

Now it is:

#ifndef LM_MAX_ORDER_H
#define LM_MAX_ORDER_H

#ifndef KENLM_MAX_ORDER
#define KENLM_MAX_ORDER 10
#endif

#ifndef KENLM_ORDER_MESSAGE
#define KENLM_ORDER_MESSAGE "If your build system supports changing KENLM_MAX_ORDER, change it there and recompile.  With cmake:\n cmake -DKENLM_MAX_ORDER=10 ..\nWith Moses:\n bjam --max-kenlm-order=10 -a\nOtherwise, edit lm/max_order.hh."
#endif

#endif // LM_MAX_ORDER_H
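Note that the edited header only defines KENLM_MAX_ORDER when it is not already defined, so a value passed on the compiler command line by the build system would take precedence over the 10 written here. A minimal sketch for checking what CMake actually recorded is below. It assumes the build directory is `build/` (a hypothetical path) and that the project stores KENLM_MAX_ORDER as a CMake cache variable; if it does not, the function simply returns None.

```python
import re
from pathlib import Path
from typing import Optional

# CMakeCache.txt stores entries as NAME:TYPE=VALUE, one per line.
_CACHE_ENTRY = re.compile(r"^KENLM_MAX_ORDER:[A-Z]+=(\d+)\s*$", re.MULTILINE)


def cached_max_order(cache_text: str) -> Optional[int]:
    """Return the KENLM_MAX_ORDER found in CMakeCache.txt text, or None."""
    match = _CACHE_ENTRY.search(cache_text)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    cache_file = Path("build/CMakeCache.txt")  # hypothetical location
    if cache_file.is_file():
        print("cached KENLM_MAX_ORDER:", cached_max_order(cache_file.read_text()))
    else:
        print("no CMakeCache.txt at", cache_file)
```

If the printed value is still 6 after editing the header, the build configuration, not the header, is deciding the order.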

I recompiled afterwards by doing:

cmake ..
make -j4

It still fails. I am on Linux, running the command from a Jupyter notebook with a Python kernel, and using a terminal inside a dedicated conda environment. I have followed every step suggested in the error message and tried several variations. Even after changing the maximum order from 6 to 10 in the build files and recompiling without errors, I get the same exception.
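Since the traceback originates in kenlm.pyx, the failing code is the kenlm Python extension module, which may have been built separately from the C++ binaries produced by cmake and make (that is an assumption, not something the log confirms). A minimal sketch for checking which interpreter the notebook uses and where its kenlm module would be loaded from follows; `module_location` is a hypothetical helper name, and it relies only on the standard library.

```python
import importlib.util
import sys
from typing import Optional


def module_location(name: str) -> Optional[str]:
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # Run this in the same notebook kernel that calls bicleaner-ai-classify.
    print("interpreter:", sys.executable)
    print("kenlm module:", module_location("kenlm"))
```

If the reported path lies inside the conda environment's site-packages rather than the recompiled source tree, the rebuilt binaries are not the ones being imported.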
