I am trying to use SDV and CTGAN to synthesize tabular data, and I am following this tutorial to guide me through the process (https://www.kdnuggets.com/2022/03/generate-tabular-synthetic-dataset.html). When I try to initialize the model, I keep getting an error saying that the sdv.tabular module was not found. You can see the step on the website, but here is what I ran for that specific cell:
from sdv.tabular import CTGAN
model = CTGAN(primary_key='id',rounding=2)
model.fit(data)
model.save("sdv-ctgan-food-demand.pkl")
new_data = model.sample(200)
new_data.head()
I have tried checking for documentation errors (this might be the cause, but I don't know what I'm supposed to use instead) and checking and updating libraries to make sure they are compatible, but nothing seems to fix the error.
It seems that the tutorial is out of date, using a version of SDV that is almost 2 years old. Since the SDV 1.0 library was released in early 2023, some of the API has changed. This includes module name changes (e.g. it's sdv.single_table instead of sdv.tabular).
I'd suggest referencing the official SDV docs site, which the team keeps pretty up to date. The demos page has tutorial notebooks. The CTGAN demo would be most relevant.
[Edit] The updated code would look something like this. Note that some concepts, such as metadata, are now required. The metadata will include the primary key, id.
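A minimal sketch of that update, assuming SDV 1.x is installed and that `data` is the same pandas DataFrame from the question, with an `id` column. The wrapper function `fit_and_sample` and its parameter names are hypothetical, added only to keep the example self-contained. As far as I know, there is no direct equivalent of the old `rounding=2` argument; `enforce_rounding=True` instead keeps the number of decimal places found in the real data. Newer SDV releases may also show a deprecation warning for `SingleTableMetadata` and point to a newer `Metadata` class, so check the docs for the version you have.

```python
def fit_and_sample(data, num_rows=200, model_path="sdv-ctgan-food-demand.pkl"):
    """Fit CTGAN with the SDV 1.x API and return `num_rows` synthetic rows.

    `data` is assumed to be a pandas DataFrame containing an `id` column.
    """
    # SDV 1.x module names: `sdv.single_table` replaces `sdv.tabular`.
    from sdv.metadata import SingleTableMetadata
    from sdv.single_table import CTGANSynthesizer

    # Metadata is now required; it replaces the old `primary_key` argument.
    metadata = SingleTableMetadata()
    metadata.detect_from_dataframe(data)
    # Auto-detection may guess a numerical type for `id`, so mark it explicitly
    # before declaring it the primary key.
    metadata.update_column(column_name="id", sdtype="id")
    metadata.set_primary_key(column_name="id")

    # Assumption: `enforce_rounding=True` is the closest match to `rounding=2`.
    synthesizer = CTGANSynthesizer(metadata, enforce_rounding=True)
    synthesizer.fit(data)
    synthesizer.save(model_path)

    return synthesizer.sample(num_rows=num_rows)
```

With that in place, the last two lines of the original cell become `new_data = fit_and_sample(data)` followed by `new_data.head()`.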