Folder Structure for a CI/CD-Compliant Databricks Repo


Are there any best practices for organizing project folders so that the CI/CD pipeline remains simple?

Here, the following structure is used, which seems to be quite complex:

project
│   README.md
│   azure-pipelines.yml   
│   config.json
│   .gitignore
└─── package1
│       │   __init__.py
│       │   setup.py
│       │   README.md
│       │   file.py
│       └── submodule
│       │      │   file.py
│       │      │   file_test.py     
│       └── requirements
│       │      │   common.txt
│       │      │   dev.txt
│       └─  notebooks
│              │   notebook1.txt
│              │   notebook2.txt
└─── package2
│       │   ...
└─── ci_cd_scripts
        │   requirements.py
        │   script1.py
        │   script2.py
        │   ...

Here, the following structure is suggested:

.
├── .dbx
│   └── project.json
├── .github
│   └── workflows
│       ├── onpush.yml
│       └── onrelease.yml
├── .gitignore
├── README.md
├── conf
│   ├── deployment.json
│   └── test
│       └── sample.json
├── pytest.ini
├── sample_project
│   ├── __init__.py
│   ├── common.py
│   └── jobs
│       ├── __init__.py
│       └── sample
│           ├── __init__.py
│           └── entrypoint.py
├── setup.py
├── tests
│   ├── integration
│   │   └── sample_test.py
│   └── unit
│       └── sample_test.py
└── unit-requirements.txt
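
To try this layout out locally, a small script could create the empty skeleton. This is only a sketch: the file names are copied from the tree above, while the `LAYOUT` list and the `scaffold` helper are hypothetical names introduced here, and every file is created empty.

```python
from pathlib import Path
import tempfile

# File paths copied from the suggested tree above; all are created empty.
LAYOUT = [
    ".dbx/project.json",
    ".github/workflows/onpush.yml",
    ".github/workflows/onrelease.yml",
    ".gitignore",
    "README.md",
    "conf/deployment.json",
    "conf/test/sample.json",
    "pytest.ini",
    "sample_project/__init__.py",
    "sample_project/common.py",
    "sample_project/jobs/__init__.py",
    "sample_project/jobs/sample/__init__.py",
    "sample_project/jobs/sample/entrypoint.py",
    "setup.py",
    "tests/integration/sample_test.py",
    "tests/unit/sample_test.py",
    "unit-requirements.txt",
]


def scaffold(root):
    """Create the empty skeleton under `root` and return the created paths."""
    root = Path(root)
    created = []
    for rel in LAYOUT:
        path = root / rel
        # Create intermediate folders such as conf/test or tests/unit.
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    # Demo run in a throwaway directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        for path in scaffold(tmp):
            print(path.relative_to(tmp))
```

Keeping the package (`sample_project`), the tests, and the configuration in separate top-level folders means a pipeline can address each of them by a fixed path.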

Concretely, I want to know:

  • Should I use one repo for all packages and notebooks (as suggested in the first approach), or should I create one repo per library (which makes CI/CD more laborious, since there may be dependencies between the packages)?
  • In both suggested folder structures, where should I place notebooks that are not related to any specific package (e.g. notebooks that contain my business logic and use the packages)?
  • Is there a well-established folder structure?

Databricks used to maintain a repository of project templates (link), but it has since been archived, and template creation is now part of the dbx tool. Maybe these two links will be useful for you: