Backup/restore of FiftyOne dynamic data (e.g. tags, detections, and segmentations)


How can I back up dynamic content in a FiftyOne dataset? Tags are the most important data that needs to be backed up. Several of my users will spend quality time manually creating tags in the UI, and I'd like to make sure we back up their work. I do not need to back up static content such as the images themselves.

It would also be nice to back up detections and segmentations. For smaller datasets, I could regenerate these from a script, but for larger datasets, or situations where the source data (e.g. detections) changes, it would be nice to not have to reconstruct these.

And, once I back up this data, how would I restore?


FiftyOne Teams

This workflow sounds like it would be solved with FiftyOne Teams, the enterprise version of FiftyOne designed for team-based collaboration on the same datasets. It is relevant not only because it supports multiple users working on a dataset simultaneously, but also because dataset versioning is on the near-term roadmap for FiftyOne Teams.

Dataset and Field cloning

In FiftyOne, the current recommended method for backing up a Dataset, DatasetView, or Field is to "clone" it.

For both a Dataset and a DatasetView, the clone() method copies the samples and their fields into a new Dataset. When cloning a DatasetView, only the samples and fields present in the view are cloned.

You can also use clone_sample_field() to copy the contents of a view’s field into a new field of the underlying Dataset. This applies to any sample field including tags and labels.

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset("quickstart")

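# Create a view with only the "ground_truth" field (default fields such as
# "tags" are always included) and clone it into a new Dataset
view = dataset.select_fields("ground_truth")
bu_dataset = view.clone()

# Mark the backup as persistent so it survives a database shutdown
bu_dataset.persistent = True

# Clone the "tags" field into a new "tags_backup" field on the backup dataset
bu_dataset.clone_sample_field("tags", "tags_backup")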

FiftyOne stores only the paths to media files in the database; the media itself is never copied. Cloning a Dataset therefore duplicates only the file paths, not the media.
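Because a clone lives in the same MongoDB instance as the original, it can also help to keep a copy of the tags outside the database. The following is a minimal sketch, assuming filepaths are unique within the dataset. The helper names (snapshot_tags, save_snapshot, load_snapshot, restore_tags) are hypothetical; the only FiftyOne calls used are values() and set_values().

```python
import json


def snapshot_tags(dataset):
    """Return a {filepath: [tags]} mapping for every sample in the dataset."""
    filepaths = dataset.values("filepath")
    tags = dataset.values("tags")
    return {fp: list(t or []) for fp, t in zip(filepaths, tags)}


def save_snapshot(snapshot, path):
    """Write the snapshot to a JSON file outside the database."""
    with open(path, "w") as f:
        json.dump(snapshot, f, indent=2)


def load_snapshot(path):
    """Read a snapshot previously written by save_snapshot()."""
    with open(path) as f:
        return json.load(f)


def restore_tags(dataset, snapshot):
    """Overwrite each sample's tags with the value stored in the snapshot.

    Samples whose filepath is missing from the snapshot keep their current tags.
    """
    filepaths = dataset.values("filepath")
    current = dataset.values("tags")
    restored = [snapshot.get(fp, cur) for fp, cur in zip(filepaths, current)]
    dataset.set_values("tags", restored)
```

With a real dataset this would be used as save_snapshot(snapshot_tags(dataset), "tags_backup.json") and later restore_tags(dataset, load_snapshot("tags_backup.json")). It covers sample-level tags only; label tags and the labels themselves would still need a clone.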

Restoring cloned data

Cloned fields

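Restoring a cloned field means copying the backup values back into the original field. A built-in field such as tags cannot be deleted, and a rename onto an existing field name is rejected, so set the values directly:

# Overwrite "tags" on every sample with the values saved in "tags_backup"
bu_dataset.set_values("tags", bu_dataset.values("tags_backup"))

For a non-default field, you can alternatively delete the damaged field and rename the backup into its place with rename_sample_field().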

For a more nuanced restoration, you can always iterate over the samples in a simple Python loop to restore exactly what you need:

for sample in bu_dataset:
    # Read the saved values from the backup field, not from "tags" itself
    backup_tags = sample["tags_backup"] or []
    if "validation" in backup_tags:
        sample.tags = backup_tags
        sample.save()

Cloned datasets

In order to restore a field from a cloned dataset, you can merge the samples from one dataset into another:

merge_view = bu_dataset.select_fields("ground_truth")
dataset.merge_samples(merge_view)
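By default, merge_samples() matches samples between the two datasets on their filepath. A narrower restore can be sketched with the key_field, fields, and insert_new parameters of merge_samples(); the wrapper name restore_fields is hypothetical and exists only for illustration.

```python
def restore_fields(dataset, backup, fields):
    """Copy `fields` from `backup` into the matching samples of `dataset`.

    Samples are matched on filepath. Samples that exist only in the
    backup are not added to the dataset (insert_new=False).
    """
    dataset.merge_samples(
        backup,
        key_field="filepath",
        fields=list(fields),
        insert_new=False,
    )
```

For example, restore_fields(dataset, bu_dataset, ["ground_truth"]). Note that merge_samples() merges the elements of list fields such as tags and Detections by default (merge_lists=True); if the backup should replace the current lists entirely, merge_lists=False is the parameter to look at.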

Dataset persistence

If you plan to work with a Dataset more than once, or to use it as a backup, set its persistent property to True. Datasets are non-persistent by default and are deleted when the database shuts down. A persistent Dataset remains in the database after the Python kernel and the backing MongoDB database are shut down, and can be reloaded later.

For example, to persist a dataset:

import fiftyone as fo
import fiftyone.zoo as foz

# Create your dataset
dataset = foz.load_zoo_dataset(
    "coco-2017",
    split="validation",
    max_samples=10,
    dataset_name="my_dataset",
)

dataset.persistent = True

Now you can close Python, reopen it, and load the Dataset.

import fiftyone as fo

print(fo.list_datasets())
# ["my_dataset"]

dataset = fo.load_dataset("my_dataset")
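Persistence matters for backups in particular, since a clone is only useful if it is still there later. Here is a small sketch of a timestamped, persistent backup clone. The helper names backup_name and clone_as_backup and the naming scheme are assumptions; dataset.name, clone(), and persistent are the FiftyOne pieces.

```python
from datetime import datetime


def backup_name(dataset_name, existing_names, now=None):
    """Build a unique name such as 'my_dataset-backup-20240131-154500'."""
    now = now or datetime.now()
    base = f"{dataset_name}-backup-{now:%Y%m%d-%H%M%S}"
    name, suffix = base, 1
    # Append -2, -3, ... if a dataset with this name already exists
    while name in existing_names:
        suffix += 1
        name = f"{base}-{suffix}"
    return name


def clone_as_backup(dataset, existing_names, now=None):
    """Clone `dataset` under a timestamped name and mark the clone persistent."""
    name = backup_name(dataset.name, existing_names, now)
    backup = dataset.clone(name)
    backup.persistent = True  # keep the clone after the database shuts down
    return backup
```

A typical call would be clone_as_backup(dataset, fo.list_datasets()), run before each tagging session or on a schedule.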

App tagging

When a tag is created and applied in the FiftyOne App, it is written to the Dataset, and therefore to the backing MongoDB database. No separate save step is needed.

For example, to create a custom_tag in the Dataset loaded above you can launch the App, select samples or labels, enter a tag, and hit "Apply":

session = fo.launch_app(dataset)

[Screenshot: applying a tag in the FiftyOne App]

Back in Python, the Dataset has been updated and the tags created in the App can be queried or backed up as shown above.

tagged_view = dataset.match_tags("custom_tag")
print(tagged_view)

Dataset:     my_dataset
Media type:  image
Num samples: 3
Tags:        ['custom_tag', 'validation']
Sample fields:
    id:           fiftyone.core.fields.ObjectIdField
    filepath:     fiftyone.core.fields.StringField
    tags:         fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)
    metadata:     fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)
    ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)
View stages:
    1. MatchTags(tags=['custom_tag'], bool=True)
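To check that a backup still reflects the tags applied in the App, the per-tag counts of the two datasets can be compared. This is a sketch only: tag_count_diff is a hypothetical helper that works on plain dicts, which in FiftyOne would come from count_sample_tags().

```python
def tag_count_diff(current_counts, backup_counts):
    """Return {tag: (current, backup)} for every tag whose counts differ."""
    all_tags = set(current_counts) | set(backup_counts)
    return {
        tag: (current_counts.get(tag, 0), backup_counts.get(tag, 0))
        for tag in sorted(all_tags)
        if current_counts.get(tag, 0) != backup_counts.get(tag, 0)
    }
```

For example, tag_count_diff(dataset.count_sample_tags(), bu_dataset.count_sample_tags()) returns an empty dict when the tag counts match, and otherwise lists the tags that have drifted since the backup was taken.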