Unable to train a self-supervised (SSL) model using the Lightly CLI


I am unable to train a self-supervised (SSL) model to create image embeddings using the Lightly CLI: Lightly Platform Link. I intend to select diverse examples from my dataset to build an object detection model downstream, and the image embeddings created with the SSL model will help me perform active learning. I have reproduced the error in a publicly accessible notebook: lightly_app_troubleshooting_stackoverflow.ipynb Link.

In the notebook shared above, this command raises an exception:

!source /content/venv_1/bin/activate;lightly-magic \
    input_dir="/content/Sunflowers" trainer.max_epochs=20 \
    token='< your lightly token(free account) >' \
    new_dataset_name="sunflowers_dataset" loader.batch_size=64

It produces the following stack trace:

    /content/venv_1/lib/python3.7/site-packages/hydra/_internal/hydra.py:127: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/next/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  configure_logging=with_log_configuration,
########## Starting to train an embedding model.
/content/venv_1/lib/python3.7/site-packages/pytorch_lightning/core/lightning.py:23: LightningDeprecationWarning: pytorch_lightning.core.lightning.LightningModule has been deprecated in v1.7 and will be removed in v1.9. Use the equivalent class from the pytorch_lightning.core.module.LightningModule class instead.
  "pytorch_lightning.core.lightning.LightningModule has been deprecated in v1.7"
Error executing job with overrides: ['input_dir=/content/Sunflowers', 'trainer.max_epochs=20', 'token=5bbcf60e3a5c7c266dcd4e0e9056c8301684e0f2f8922bc5', 'new_dataset_name=sunflowers_dataset', 'loader.batch_size=64']
Traceback (most recent call last):
  File "/content/venv_1/lib/python3.7/site-packages/lightly/cli/lightly_cli.py", line 115, in lightly_cli
    return _lightly_cli(cfg)
  File "/content/venv_1/lib/python3.7/site-packages/lightly/cli/lightly_cli.py", line 52, in _lightly_cli
    checkpoint = _train_cli(cfg, is_cli_call)
  File "/content/venv_1/lib/python3.7/site-packages/lightly/cli/train_cli.py", line 137, in _train_cli
    encoder.train_embedding(**cfg['trainer'], strategy=distributed_strategy)
  File "/content/venv_1/lib/python3.7/site-packages/lightly/embedding/_base.py", line 88, in train_embedding
    trainer = pl.Trainer(**kwargs, callbacks=[self.checkpoint_callback])
  File "/content/venv_1/lib/python3.7/site-packages/pytorch_lightning/utilities/argparse.py", line 345, in insert_env_defaults
    return fn(self, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'weights_summary'

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

I could not create a new "lightly" tag because I lack the Stack Overflow reputation points to do so.


There is 1 best solution below


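The error comes from an incompatibility with the latest PyTorch Lightning version (1.7 at the time of writing): as the traceback shows, Lightly still passes the weights_summary argument to pl.Trainer, which 1.7 no longer accepts. A quick fix is to use a lower version (e.g. 1.6). We are working on a fix :)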
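
The quick fix could be sketched as a small shell helper. This is only an illustration under stated assumptions: the function names pl_needs_downgrade and pin_pl_below_1_7 are made up for this example, pip is assumed to be the one from the active environment (/content/venv_1 in the question), and the <1.7 bound comes from the suggestion to use a lower version rather than from testing against a particular Lightly release.

```shell
# Hypothetical helper: succeeds (exit status 0) when the given PyTorch
# Lightning version is 1.7 or newer, i.e. a release whose Trainer rejects
# the weights_summary argument seen in the traceback.
pl_needs_downgrade() {
    major=${1%%.*}      # "1.7.0" -> "1"
    rest=${1#*.}        # "1.7.0" -> "7.0"
    minor=${rest%%.*}   # "7.0"   -> "7"
    [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 7 ]; }
}

# Hypothetical helper: pins PyTorch Lightning below 1.7, but only when the
# installed version is actually affected.
pin_pl_below_1_7() {
    installed=$(pip show pytorch-lightning 2>/dev/null | awk '/^Version:/ {print $2}')
    if [ -n "$installed" ] && pl_needs_downgrade "$installed"; then
        pip install "pytorch-lightning<1.7"
    fi
}
```

With the environment activated and both functions defined in the same shell session, calling pin_pl_below_1_7 and then re-running the lightly-magic command should avoid the TypeError, provided the older release is otherwise compatible with the installed Lightly version. The one-line equivalent is pip install "pytorch-lightning<1.7".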

Let me know in case that does not work for you!