Set up a base dir for the Data Catalog in Kedro

I'm working on a project where, because of the company's compliance rules, the data has to stay in a shared directory that is synchronized among the programmers. The project's code, on the other hand, cannot live in that shared directory; otherwise we wouldn't be able to version it and work together, since everything there is synchronized. The path to the shared folder is pretty much the same for everyone, C:\Users\<employee name>\<path to data>. Is there a way to set up C:\Users\<employee name> as a base path for my data catalog in Kedro?

I tried creating a catalog.py file that has the following code:

from kedro.io import DataCatalog
from kedro.extras.datasets.pandas import (
    CSVDataSet,
    ExcelDataSet,
)
from pathlib import Path

DEFAULT_DATA_PATH = Path.expanduser(
    Path(
        "~", 
        "Path to Data"
    )
)

DATA_CATALOG = DataCatalog(
    {
        "data": ExcelDataSet(
            filepath=Path(DEFAULT_DATA_PATH, "data.xlsx").as_uri()
        )
    }
)

And then in settings.py I added this:

from .catalog import DATA_CATALOG
DATA_CATALOG_CLASS = DATA_CATALOG

but then I get the following error:

Traceback (most recent call last):
  File "...\Miniconda3\Scripts\kedro-script.py", line 9, in <module>
    sys.exit(main())
  File "...\Miniconda3\lib\site-packages\kedro\framework\cli\cli.py", line 205, in main 
    cli_collection = KedroCLI(project_path=Path.cwd())
  File "...\Miniconda3\lib\site-packages\kedro\framework\cli\cli.py", line 114, in __init__
    self._metadata = bootstrap_project(project_path)
  File "...\Miniconda3\lib\site-packages\kedro\framework\startup.py", line 155, in bootstrap_project
    configure_project(metadata.package_name)
  File "...\Miniconda3\lib\site-packages\kedro\framework\project\__init__.py", line 166, in configure_project
    settings.configure(settings_module)
  File "...\Miniconda3\lib\site-packages\dynaconf\base.py", line 223, in configure      
    self._wrapped = Settings(settings_module=settings_module, **kwargs)
  File "...\Miniconda3\lib\site-packages\dynaconf\base.py", line 271, in __init__       
    self.validators.validate()
  File "...\Miniconda3\lib\site-packages\dynaconf\validator.py", line 318, in validate  
    validator.validate(self.settings)
  File "...\Miniconda3\lib\site-packages\kedro\framework\project\__init__.py", line 34, in validate
    if not issubclass(setting_value, default_class):
TypeError: issubclass() arg 1 must be a class

Best answer

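DATA_CATALOG_CLASS expects a class (DataCatalog or a subclass of it), but you are passing an instance of DataCatalog. The validator calls issubclass() on the setting's value, which is why you get the TypeError.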
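
To illustrate the difference, here is a minimal sketch that reproduces the check from the last frames of the traceback without needing Kedro installed. The DataCatalog class below is only a stand-in, and SharedDataCatalog and is_valid_setting are hypothetical names I made up for the example:

```python
class DataCatalog:
    """Stand-in for kedro.io.DataCatalog, used only for illustration."""


class SharedDataCatalog(DataCatalog):
    """Hypothetical subclass; a class like this is what the setting accepts."""


def is_valid_setting(value) -> bool:
    """Mimic the issubclass() check shown in the traceback."""
    try:
        return issubclass(value, DataCatalog)
    except TypeError:
        # issubclass() raises TypeError when its first argument
        # is an instance rather than a class.
        return False


print(is_valid_setting(SharedDataCatalog))  # True: a class was passed
print(is_valid_setting(DataCatalog()))      # False: an instance was passed
```

So the setting can take something like SharedDataCatalog itself, but never an already-built catalog object.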

I think the way to go here is to use TemplatedConfigLoader and pass the shared directory in as a variable. You would supply this SHARE_DIR either through a globals.yml file or directly as a variable (the loader's globals_dict argument).

In your catalog.yml:

some_data:
  type: pandas.CSVDataSet
  filepath: ${SHARE_DIR}/file_name

See the documentation for more: https://kedro.readthedocs.io/en/stable/kedro.config.TemplatedConfigLoader.html
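
To make the substitution concrete, here is a rough sketch of what resolving ${SHARE_DIR} amounts to. It uses only the standard library (string.Template) rather than Kedro's own loader, so treat it as an illustration of the idea and not as Kedro's implementation. The resolve_entry helper is a hypothetical name, and "Path to Data" is just the placeholder from the question:

```python
from pathlib import Path
from string import Template

# Per-user shared folder, built from the home directory.
# "Path to Data" is the placeholder used in the question.
SHARE_DIR = Path.home() / "Path to Data"


def resolve_entry(entry: dict, variables: dict) -> dict:
    """Substitute ${NAME} placeholders in every string value of an entry."""
    return {
        key: Template(value).substitute(variables) if isinstance(value, str) else value
        for key, value in entry.items()
    }


# Mirrors the catalog.yml entry, written as a plain dict.
catalog_entry = {
    "type": "pandas.CSVDataSet",
    "filepath": "${SHARE_DIR}/file_name",
}

resolved = resolve_entry(catalog_entry, {"SHARE_DIR": SHARE_DIR.as_posix()})
print(resolved["filepath"])
```

In a real project you would not write this yourself. As far as I know, in a Kedro 0.18-style settings.py you set CONFIG_LOADER_CLASS = TemplatedConfigLoader and pass the value through CONFIG_LOADER_ARGS, for example {"globals_dict": {"SHARE_DIR": ...}}, but check the linked documentation against the version you have installed.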