How can I back up dynamic content in a FiftyOne dataset? Tags are the most important data that needs to be backed up. Several of my users will spend quality time manually creating tags in the UI, and I'd like to make sure we back up their work. I do not need to back up static content such as the images themselves.
It would also be nice to back up detections and segmentations. For smaller datasets, I could regenerate these from a script, but for larger datasets, or situations where the source data (e.g. detections) change, it would be nice not to have to reconstruct them.
And, once I back up this data, how would I restore it?
FiftyOne Teams
This workflow sounds like it would be solved with FiftyOne Teams, the enterprise version of FiftyOne designed for team-based collaboration on the same datasets. It is relevant not only because it supports multiple users working on a dataset simultaneously, but also because dataset versioning is on the near-term roadmap for FiftyOne Teams.
Dataset and Field cloning
In FiftyOne, the current recommended method for backing up a Dataset, DatasetView, or Field is to "clone" it.

For both a Dataset and a DatasetView, the clone() method takes the existing fields in the given samples and copies them into a new Dataset. When cloning a DatasetView, only the fields that exist in the filtered view are cloned. You can also use clone_sample_field() to copy the contents of a view's field into a new field of the underlying Dataset. This applies to any sample field, including tags and labels.

FiftyOne only stores the paths to images in the database; no media is ever copied. This means that when cloning a Dataset, only the media file paths are duplicated, not the media itself.

Restoring cloned data
Cloned fields
Restoring a cloned field is as simple as renaming the field back to the original name.
For a more nuanced restoration, you can always iterate over the samples in a simple Python loop to restore exactly what you need:
Cloned datasets
In order to restore a field from a cloned dataset, you can merge the samples from one dataset into another:
Dataset persistence
When you plan to work with a Dataset more than once, the persistent option should be set. When a Dataset is persistent, it will not be deleted even when the Python kernel and backing MongoDB database are shut down, and it can be reloaded at a future time.

For example, to persist a dataset:
Now you can close Python, reopen it, and load the Dataset.

App tagging
When a tag is created and applied in the FiftyOne App, it is automatically stored in the Dataset and therefore in the backing MongoDB database.

For example, to create a custom_tag in the Dataset loaded above, you can launch the App, select samples or labels, enter the tag, and hit "Apply".

Back in Python, the Dataset has been updated, and the tags created in the App can be queried or backed up as shown above.