In this project, I am trying to use the pycaret package, together with scikit-learn, to analyze some time series. Specifically, my code looks like this:

from pycaret.regression import (setup, compare_models, predict_model, plot_model, finalize_model, load_model)

# setting up the stage to initialize the training environment
s = setup(
    data=train,
    target=target_var,
    ignore_features=['Series'],
    numeric_features=involved_numerics,
    categorical_features=categorics,
    silent=True,
    log_experiment=True,
)

# Now, to train machine learning models, we compare models and pick the best one (lowest MAE)
best_model = compare_models(sort='MAE')

# Making some plots
for id, name in zip(ids, names):
    plot_model(best_model, plot=id, scale=3, save=True)
# ...

I was able to run the code successfully for some of the plots listed in the documentation, but not for all of them. For some specific plots (such as Recursive Feat. Selection), I get the following error message:

Traceback (most recent call last):
  File "c:/Users/username/Desktop/project/project.py", line 55, in <module>
    main()
  File "c:/Users/username/Desktop/project/project.py", line 48, in main
    ml_modelling(data, train, test)
  File "c:\Users\username\Desktop\project\utilities.py", line 1070, in ml_modelling
    plot_model(best_model, plot=id, scale=3, save=True)
  File "C:\Users\username\anaconda3\envs\py38\lib\site-packages\pycaret\regression.py", line 1601, in plot_model
    return pycaret.internal.tabular.plot_model(
  File "C:\Users\username\anaconda3\envs\py38\lib\site-packages\pycaret\internal\tabular.py", line 7712, in plot_model
    ret = locals()[plot]()
  File "C:\Users\username\anaconda3\envs\py38\lib\site-packages\pycaret\internal\tabular.py", line 6293, in residuals_interactive
    resplots.write_html(plot_filename)
  File "C:\Users\username\anaconda3\envs\py38\lib\site-packages\pycaret\internal\plots\residual_plots.py", line 673, in write_html
    f.write(html)
  File "C:\Users\username\anaconda3\envs\py38\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u25c4' in position 276445: character maps to <undefined>

Here are the first rows of the train DataFrame:

   Series  x  y  z     ID  var1  var2  var3  var4  var5  var6
0       1  2  1  3   True    -3    -4     6     7     4     6
1       2  2  1  7  False    22     0     3     5     2     8
2       3  2  1  0   True     3    -6     3     5     4     4
3       4  2  1  4  False    27    -4     8     3    -3     2
...

I am running my Python script from VS Code on a Windows 10 machine. Here is the list of all packages installed in the conda environment:

name: py38
channels:
  - conda-forge
  - defaults
dependencies:
  - bzip2=1.0.8=h8ffe710_4
  - ca-certificates=2022.12.7=h5b45459_0
  - et_xmlfile=1.1.0=pyhd8ed1ab_0
  - libffi=3.4.2=h8ffe710_5
  - libsqlite=3.40.0=hcfcfb64_0
  - libzlib=1.2.13=hcfcfb64_4
  - openpyxl=3.0.10=py38h91455d4_2
  - openssl=3.0.7=hcfcfb64_2
  - pip=22.3.1=pyhd8ed1ab_0
  - python=3.8.15=h4de0772_1_cpython
  - python_abi=3.8=3_cp38
  - setuptools=66.1.1=pyhd8ed1ab_0
  - tk=8.6.12=h8ffe710_0
  - ucrt=10.0.22621.0=h57928b3_0
  - vc=14.3=hb6edc58_10
  - vs2015_runtime=14.34.31931=h4c5c07a_10
  - wheel=0.38.4=pyhd8ed1ab_0
  - xz=5.2.6=h8d14728_0
  - pip:
      - alembic==1.9.2
      - asttokens==2.2.1
      - attrs==22.2.0
      - backcall==0.2.0
      - blis==0.7.9
      - boruta==0.3
      - catalogue==1.0.2
      - certifi==2022.12.7
      - charset-normalizer==3.0.1
      - click==8.1.3
      - cloudpickle==2.2.1
      - colorama==0.4.6
      - colorlover==0.3.0
      - comm==0.1.2
      - contourpy==1.0.7
      - cufflinks==0.17.3
      - cycler==0.11.0
      - cymem==2.0.7
      - cython==0.29.14
      - databricks-cli==0.17.4
      - debugpy==1.6.6
      - decorator==5.1.1
      - docker==6.0.1
      - entrypoints==0.4
      - executing==1.2.0
      - flask==2.2.2
      - fonttools==4.38.0
      - funcy==1.18
      - future==0.18.3
      - gensim==3.8.3
      - gitdb==4.0.10
      - gitpython==3.1.30
      - greenlet==2.0.2
      - htmlmin==0.1.12
      - idna==3.4
      - imagehash==4.3.1
      - imbalanced-learn==0.7.0
      - importlib-metadata==5.2.0
      - importlib-resources==5.10.2
      - ipykernel==6.20.2
      - ipython==8.9.0
      - ipywidgets==8.0.4
      - itsdangerous==2.1.2
      - jedi==0.18.2
      - jinja2==3.1.2
      - joblib==1.2.0
      - jupyter-client==8.0.1
      - jupyter-core==5.1.5
      - jupyterlab-widgets==3.0.5
      - kiwisolver==1.4.4
      - kmodes==0.12.2
      - lightgbm==3.3.5
      - llvmlite==0.37.0
      - mako==1.2.4
      - markdown==3.4.1
      - markupsafe==2.1.2
      - matplotlib==3.6.3
      - matplotlib-inline==0.1.6
      - mlflow==2.1.1
      - mlxtend==0.19.0
      - multimethod==1.9.1
      - murmurhash==1.0.9
      - nest-asyncio==1.5.6
      - networkx==3.0
      - nltk==3.8.1
      - numba==0.54.1
      - numexpr==2.8.4
      - numpy==1.20.3
      - oauthlib==3.2.2
      - packaging==22.0
      - pandas==1.5.3
      - pandas-profiling==3.6.3
      - parso==0.8.3
      - patsy==0.5.3
      - phik==0.12.3
      - pickleshare==0.7.5
      - pillow==9.4.0
      - plac==1.1.3
      - platformdirs==2.6.2
      - plotly==5.13.0
      - preshed==3.0.8
      - prompt-toolkit==3.0.36
      - protobuf==4.21.12
      - psutil==5.9.4
      - pure-eval==0.2.2
      - pyarrow==10.0.1
      - pycaret==2.3.10
      - pydantic==1.10.4
      - pygments==2.14.0
      - pyjwt==2.6.0
      - pyldavis==3.3.1
      - pynndescent==0.5.8
      - pyod==1.0.7
      - pyparsing==3.0.9
      - python-dateutil==2.8.2
      - pytz==2022.7.1
      - pywavelets==1.4.1
      - pywin32==305
      - pyyaml==5.4.1
      - pyzmq==25.0.0
      - querystring-parser==1.2.4
      - regex==2022.10.31
      - requests==2.28.2
      - scikit-learn==0.23.2
      - scikit-plot==0.3.7
      - scipy==1.5.4
      - seaborn==0.12.2
      - shap==0.41.0
      - six==1.16.0
      - sklearn==0.0.post1
      - slicer==0.0.7
      - smart-open==6.3.0
      - smmap==5.0.0
      - spacy==2.3.9
      - sqlalchemy==1.4.46
      - sqlparse==0.4.3
      - srsly==1.0.6
      - stack-data==0.6.2
      - statsmodels==0.13.5
      - tabulate==0.9.0
      - tangled-up-in-unicode==0.2.0
      - tenacity==8.1.0
      - textblob==0.17.1
      - thinc==7.4.6
      - threadpoolctl==3.1.0
      - tornado==6.2
      - tqdm==4.64.1
      - traitlets==5.8.1
      - typeguard==2.13.3
      - typing-extensions==4.4.0
      - umap-learn==0.5.3
      - urllib3==1.26.14
      - visions==0.7.5
      - waitress==2.1.2
      - wasabi==0.10.1
      - wcwidth==0.2.6
      - websocket-client==1.5.0
      - werkzeug==2.2.2
      - widgetsnbextension==4.0.5
      - wordcloud==1.8.2.2
      - yellowbrick==1.2.1
      - zipp==3.12.0
prefix: C:\Users\username\anaconda3\envs\py38

Answer


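This is probably an issue in the library rather than in your code: the HTML file that pycaret generates contains a non-ASCII character ('\u25c4', i.e. ◄), and on Windows the file is opened with the default cp1252 encoding, which cannot represent that character.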
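The error can be reproduced without pycaret. The following is a minimal sketch using only the standard library: encoding the same character with cp1252 (the codec in the traceback) fails with the same message, while UTF-8 handles it. The helper encode_or_error and the sample string html_fragment are hypothetical, chosen only for this illustration.

```python
def encode_or_error(text: str, encoding: str):
    """Return the encoded bytes, or the UnicodeEncodeError that was raised (hypothetical helper)."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc


# Same character as in the traceback: U+25C4 BLACK LEFT-POINTING POINTER.
html_fragment = "<div>\u25c4</div>"

# cp1252 (the Windows default seen in the traceback) has no mapping for it.
err = encode_or_error(html_fragment, "cp1252")
print(type(err).__name__, err)
# UnicodeEncodeError 'charmap' codec can't encode character '\u25c4' in position 5: character maps to <undefined>

# UTF-8 can encode every Unicode character; this one becomes three bytes.
print(encode_or_error("\u25c4", "utf-8"))  # b'\xe2\x97\x84'
```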

Here is the referenced part of pycaret's source code (pycaret\internal\plots\residual_plots.py, the file shown in the traceback):

    def write_html(self, plot_filename):
        """
        Write the current plots to a file in HTML format.
        Parameters
        ----------
        plot_filename: str
            name of the file
        """

        html = self.get_html()

        with open(plot_filename, "w") as f:
            f.write(html)

As mentioned in this Stack Overflow question, it can be solved by specifying the encoding when opening the file:

with open(plot_filename, "w", encoding="utf-8") as f:
    f.write(html)
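If the library file cannot be edited, the same fix could be applied at runtime by replacing the method from your own script before calling plot_model. The sketch below is self-contained and uses a hypothetical stand-in class, FakeResidualsPlot, that mirrors the write_html method quoted above. With the real library you would assign to the actual class in pycaret.internal.plots.residual_plots; its name is an assumption you would need to look up in that module (pycaret 2.3.10).

```python
import os
import tempfile


class FakeResidualsPlot:
    """Hypothetical stand-in for the pycaret class whose write_html is quoted above."""

    def get_html(self) -> str:
        # Contains the same non-ASCII character as in the traceback.
        return "<html><body>\u25c4</body></html>"

    def write_html(self, plot_filename):
        # Same as the library: no encoding=, so the platform default (cp1252 on that machine) is used.
        html = self.get_html()
        with open(plot_filename, "w") as f:
            f.write(html)


def write_html_utf8(self, plot_filename):
    """Drop-in replacement that always writes the HTML as UTF-8."""
    html = self.get_html()
    with open(plot_filename, "w", encoding="utf-8") as f:
        f.write(html)


# Monkeypatch the method on the class, so every instance created later uses UTF-8.
FakeResidualsPlot.write_html = write_html_utf8

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Residuals.html")
    FakeResidualsPlot().write_html(path)
    with open(path, encoding="utf-8") as f:
        content = f.read()

print("\u25c4" in content)  # True
```

Patching the class rather than an instance matters here because pycaret creates the plot object internally inside plot_model, so your script never gets a handle on the instance itself.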

But since you cannot change the library's code, try running the following in the console (cmd.exe) before running your script, as mentioned in this answer:

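chcp 65001
set PYTHONIOENCODING=utf-8

Note that PYTHONIOENCODING only changes the encoding of the standard streams (stdin, stdout, stderr), not the default encoding used by open(). To make open() default to UTF-8 as well, enable Python's UTF-8 mode (Python 3.7+) in the same console:

set PYTHONUTF8=1

or run the script with python -X utf8 project.py.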
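To check whether a given console setup actually changes the encoding that open() uses (and therefore whether pycaret's write_html will succeed), a small standard-library script like the one below can be run in the same console first. It is a minimal sketch with hypothetical function names; it mimics the library by calling open() without encoding=. In the environment from the question, the last line would presumably print False unless UTF-8 mode is enabled.

```python
import locale
import os
import sys
import tempfile


def default_open_encoding() -> str:
    """Encoding that open() falls back to when encoding= is omitted."""
    # UTF-8 mode (PYTHONUTF8=1 or python -X utf8) makes this UTF-8.
    return locale.getpreferredencoding(False)


def can_write_without_encoding(text: str) -> bool:
    """Write text the way pycaret's write_html does: open(path, "w") with no encoding=."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "probe.html")
        try:
            with open(path, "w") as f:
                f.write(text)
        except UnicodeEncodeError:
            return False
    return True


if __name__ == "__main__":
    print("UTF-8 mode enabled:", bool(sys.flags.utf8_mode))
    print("open() default encoding:", default_open_encoding())
    print("can write \\u25c4:", can_write_without_encoding("\u25c4"))
```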