I'm working on a project where, because of the company's compliance rules, the data has to stay in a shared directory that is synchronized among the programmers. The project's code, on the other hand, cannot live in that shared directory; otherwise we wouldn't be able to version it and work on it together, since everything there is synchronized. The path to the shared folder is nearly the same for everyone, C:\Users\<employee name>\<path to data>. Is there a way to set up C:\Users\<employee name> as a base path for my data catalog in Kedro?

I tried creating a catalog.py file with the following code:
from pathlib import Path

from kedro.io import DataCatalog
from kedro.extras.datasets.pandas import (
    CSVDataSet,
    ExcelDataSet,
)

# Per-user base path, e.g. C:\Users\<employee name>\Path to Data
DEFAULT_DATA_PATH = Path("~", "Path to Data").expanduser()

DATA_CATALOG = DataCatalog(
    {
        "data": ExcelDataSet(
            filepath=Path(DEFAULT_DATA_PATH, "data.xlsx").as_uri()
        )
    }
)
And then in settings.py I added this:
from .catalog import DATA_CATALOG
DATA_CATALOG_CLASS = DATA_CATALOG
But then I get the following error:
Traceback (most recent call last):
  File "...\Miniconda3\Scripts\kedro-script.py", line 9, in <module>
    sys.exit(main())
  File "...\Miniconda3\lib\site-packages\kedro\framework\cli\cli.py", line 205, in main
    cli_collection = KedroCLI(project_path=Path.cwd())
  File "...\Miniconda3\lib\site-packages\kedro\framework\cli\cli.py", line 114, in __init__
    self._metadata = bootstrap_project(project_path)
  File "...\Miniconda3\lib\site-packages\kedro\framework\startup.py", line 155, in bootstrap_project
    configure_project(metadata.package_name)
  File "...\Miniconda3\lib\site-packages\kedro\framework\project\__init__.py", line 166, in configure_project
    settings.configure(settings_module)
  File "...\Miniconda3\lib\site-packages\dynaconf\base.py", line 223, in configure
    self._wrapped = Settings(settings_module=settings_module, **kwargs)
  File "...\Miniconda3\lib\site-packages\dynaconf\base.py", line 271, in __init__
    self.validators.validate()
  File "...\Miniconda3\lib\site-packages\dynaconf\validator.py", line 318, in validate
    validator.validate(self.settings)
  File "...\Miniconda3\lib\site-packages\kedro\framework\project\__init__.py", line 34, in validate
    if not issubclass(setting_value, default_class):
TypeError: issubclass() arg 1 must be a class
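The last frame of the traceback is a plain issubclass() check on the value of the setting. The sketch below reproduces that failure without Kedro. DataCatalog and MyCatalog here are hypothetical stand-ins (not kedro.io.DataCatalog), and validate_class_setting only mimics the single line shown in the traceback; the ValueError it raises on a failed check is an assumption for illustration, not Kedro's behaviour.

```python
# Pure-Python reproduction of the last traceback frame; no Kedro required.


class DataCatalog:
    """Hypothetical stand-in for the default class a *_CLASS setting is checked against."""


class MyCatalog(DataCatalog):
    """A subclass: the kind of value a *_CLASS setting expects."""


def validate_class_setting(setting_value, default_class):
    # Same check as in the traceback. issubclass() itself raises TypeError
    # when setting_value is an instance rather than a class.
    if not issubclass(setting_value, default_class):
        # Hypothetical error for a class that is not a subclass.
        raise ValueError(f"{setting_value!r} is not a subclass of {default_class!r}")


def setting_error(setting_value, default_class):
    """Return the TypeError message raised by the check, or None if it passes."""
    try:
        validate_class_setting(setting_value, default_class)
    except TypeError as exc:
        return str(exc)
    return None


print(setting_error(MyCatalog, DataCatalog))      # prints: None
print(setting_error(DataCatalog(), DataCatalog))  # prints: issubclass() arg 1 must be a class
```

Passing the class (MyCatalog) succeeds, while passing an instance (as settings.py does with DATA_CATALOG) fails before any subclass comparison happens.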
DATA_CATALOG_CLASS expects a class, while you are providing an instance of DataCatalog, hence the error.

I think the way to go here is to use TemplatedConfigLoader and pass the shared directory as a variable. You would supply this SHARE_DIR either through a globals.yml file (matched by the loader's globals_pattern) or directly as a variable (the globals_dict argument). In your catalog.yml:

some_data:
  type: pandas.CSVDataSet
  filepath: ${SHARE_DIR}/file_name

See more documentation here: https://kedro.readthedocs.io/en/stable/kedro.config.TemplatedConfigLoader.html
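Because the only per-user part of the path is the home directory, SHARE_DIR can be computed in Python instead of being hard-coded. The sketch below is written under assumptions: "Path to Data" is a placeholder for the real folder name, and string.Template is used only as a stand-in for TemplatedConfigLoader's own ${...} substitution (same syntax, not Kedro's code). In a real project the dict from build_globals() would be handed to TemplatedConfigLoader as globals_dict; on Kedro 0.18 that is typically done in settings.py with CONFIG_LOADER_CLASS = TemplatedConfigLoader and CONFIG_LOADER_ARGS = {"globals_dict": ...}, while Kedro 0.17.x registers the loader through the register_config_loader hook instead. Check the documentation for your version.

```python
# Sketch: derive a per-user SHARE_DIR and resolve a ${SHARE_DIR} placeholder.
# string.Template is a stand-in for TemplatedConfigLoader's substitution.
from pathlib import Path
from string import Template

# Hypothetical folder name: replace with the real <path to data> under the home dir.
DATA_SUBDIR = "Path to Data"


def build_globals(home=None):
    """Return the values a TemplatedConfigLoader could receive as globals_dict."""
    # Path.home() is usually C:\Users\<employee name> on Windows.
    home = Path.home() if home is None else Path(home)
    # as_posix() uses forward slashes, avoiding backslash escapes in YAML strings.
    return {"SHARE_DIR": (home / DATA_SUBDIR).as_posix()}


def resolve(filepath_template, globals_dict):
    """Fill ${...} placeholders the way the catalog entry's filepath would be."""
    return Template(filepath_template).substitute(globals_dict)


if __name__ == "__main__":
    print(resolve("${SHARE_DIR}/file_name", build_globals()))
```

Each programmer then gets their own absolute path while catalog.yml stays identical and under version control.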