Why is sdv.tabular not working for my tabular synthetic model?

I am trying to use SDV and CTGAN to synthesize tabular data, following this tutorial (https://www.kdnuggets.com/2022/03/generate-tabular-synthetic-dataset.html). When I try to initialize the model, I keep getting an error saying the sdv.tabular module was not found. The step is shown on the website, but here is what I ran for that specific cell:

from sdv.tabular import CTGAN

model = CTGAN(primary_key='id',rounding=2)
model.fit(data)
model.save("sdv-ctgan-food-demand.pkl")
new_data = model.sample(200)
new_data.head()

I have tried checking for documentation errors (this may well be the cause, but I don't know what I'm supposed to use instead) and checking and updating my libraries to make sure they are compatible, but nothing fixes the error.

Answer by Neha Patki:

It seems that tutorial is out of date: it uses a version of SDV that is almost 2 years old. Since the SDV 1.0 library was released in early 2023, some of the API has changed. This includes module name changes (e.g. it's sdv.single_table instead of sdv.tabular).
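If you are not sure which side of that change your environment is on, checking the installed version is a quick first step. The snippet below is only a sketch: `sdv_import_hint` is a made-up helper name, and it assumes the rename happened exactly at the 1.0 release mentioned above.

```python
from importlib import metadata


def sdv_import_hint(version):
    """Return the CTGAN import line that matches an SDV version string.

    Hypothetical helper: assumes the module rename happened at SDV 1.0.
    """
    major = int(version.split(".")[0])
    if major >= 1:
        return "from sdv.single_table import CTGANSynthesizer"
    return "from sdv.tabular import CTGAN"


# Look up the installed SDV version without importing the package itself.
try:
    installed = metadata.version("sdv")
except metadata.PackageNotFoundError:
    installed = None

if installed is None:
    print("SDV is not installed in this environment")
else:
    print(f"SDV {installed}: {sdv_import_hint(installed)}")
```

If this reports a 1.x version, the `from sdv.tabular import CTGAN` line from the tutorial cannot work, which matches the error in the question.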

I'd suggest referencing the official SDV docs site, which the team keeps pretty up-to-date. The demos page has tutorial notebooks. The CTGAN demo would be most relevant.

[Edit] The updated code would look something like this. Note that some concepts, such as metadata, are now required; the metadata includes the primary key id.

from sdv.single_table import CTGANSynthesizer
from sdv.metadata import SingleTableMetadata

# write a metadata object to describe the data, primary key, etc.
# this can be done manually or you can auto-detect/update it
metadata = SingleTableMetadata()
metadata.detect_from_dataframe(data)
metadata.set_primary_key(column_name='id')

model = CTGANSynthesizer(metadata)
model.fit(data)

model.save("sdv-ctgan-food-demand.pkl")
new_data = model.sample(num_rows=200)
new_data.head()
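For intuition about what the metadata object holds, calling `metadata.to_dict()` should give you something roughly like the dictionary below. Treat this as an assumption-laden sketch: the layout follows my understanding of the SDV 1.x single-table format, the `price` and `category` columns are hypothetical stand-ins for whatever is in your data, and `primary_key_is_declared` is a made-up helper for illustration.

```python
# Illustrative only: column names other than 'id' are hypothetical, and the
# layout is assumed to follow SDV 1.x's single-table metadata format.
metadata_dict = {
    "METADATA_SPEC_VERSION": "SINGLE_TABLE_V1",
    "primary_key": "id",
    "columns": {
        "id": {"sdtype": "id"},
        "price": {"sdtype": "numerical"},        # hypothetical column
        "category": {"sdtype": "categorical"},   # hypothetical column
    },
}


def primary_key_is_declared(meta):
    """Check that the primary key names a column listed in the metadata."""
    return meta.get("primary_key") in meta.get("columns", {})


print(primary_key_is_declared(metadata_dict))
```

It is worth looking over the auto-detected column types before fitting, since detection is a best guess and the primary key has to refer to a column that actually exists in the metadata.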