Get the name of the zip file from Kaggle dataset

When I download a dataset from Kaggle, I use the following:

subprocess.run(["kaggle", "datasets", "download", "-d", DATA_URL, "-p", SAVE_PATH])

When I try to download it again, I get a message saying that I've already downloaded the dataset, and in it I can see the name of the zip file:

ecommerce-dataset.zip: Skipping, found more recently modified local copy (use --force to force download)

How can I find out the name of the dataset or the name of the .zip? I have looked into the Kaggle API, but I didn't find anything that helped me.

There is 1 best solution below

I don't think this is possible with the Kaggle API's Python CLI, although since the .zip file is created by the API, the naming logic should be somewhere in its source code.
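
If you do want to guess the name up front, here is a minimal sketch. It assumes (I have not confirmed this in the docs) that the CLI names the archive after the last part of the "owner/slug" dataset reference, which would match the ecommerce-dataset.zip message in the question. The helper expected_zip_name and the owner "some-owner" are made up for illustration.

```python
def expected_zip_name(dataset_ref: str) -> str:
    """Guess the archive name from an "owner/slug" dataset reference.

    Assumption: the Kaggle CLI saves the download as "<slug>.zip".
    Verify this against your installed CLI version before relying on it.
    """
    # Take the part after the last slash, ignoring any trailing slash.
    slug = dataset_ref.rstrip("/").split("/")[-1]
    return f"{slug}.zip"


# "some-owner" is a hypothetical owner name used only for this example.
print(expected_zip_name("some-owner/ecommerce-dataset"))
```

If the guess turns out to be wrong for your dataset, checking the folder after the download is the safer route.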

However, I think it would be easier to just download the dataset into an empty folder, then look for the .zip file in that folder and use its name.

This could be done with pathlib:

from pathlib import Path
import subprocess

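# Create "new_folder" in the current working directory.
# mkdir() raises FileExistsError if the folder already exists, so it always starts out empty.
path = Path.cwd() / "new_folder"
path.mkdir()

# Download (DATA_URL is the dataset reference from the question)
subprocess.run(["kaggle", "datasets", "download", "-d", DATA_URL, "-p", str(path)], check=True)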

# Get the .zip files in "new_folder" (the pattern needs a wildcard: "*.zip")
zip_files = path.glob("*.zip")
dataset_zip_name = [f.name for f in zip_files][0]
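
If you would rather keep using your existing SAVE_PATH instead of a fresh folder, a variation on the same idea is to compare the folder's .zip files before and after the download. This is only a sketch: zip_names and download_and_find_zip are names I made up, and the kaggle command is the one from the question.

```python
from pathlib import Path
import subprocess


def zip_names(folder: Path) -> set[str]:
    """Return the names of all .zip files directly inside `folder`."""
    return {f.name for f in folder.glob("*.zip")}


def download_and_find_zip(data_url: str, save_path: Path) -> set[str]:
    """Run the download and return the .zip names that were not there before."""
    save_path.mkdir(parents=True, exist_ok=True)
    before = zip_names(save_path)
    # Same command as in the question; check=True raises if the CLI fails.
    subprocess.run(
        ["kaggle", "datasets", "download", "-d", data_url, "-p", str(save_path)],
        check=True,
    )
    # Set difference: only archives created by this download remain.
    return zip_names(save_path) - before
```

Note that the result is empty when the CLI skips the download because a local copy already exists, so this only helps on the first download into that folder.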