dataset adding new items and saving #660
Comments
Hi, thank you for coming to us! I'll try to answer your questions.

Basically, a "project" currently represents a repository and a "build tree" for one or more datasets (called "sources") stored on disk in the project directory. It is described in detail here.

from datumaro.components.project import Project

with Project('project/dir/') as project:
    dataset = project.working_tree.make_dataset('source_name')
    # do stuff with the dataset, e.g.
    # dataset.add(item)
    # dataset.remove(id)
    # dataset.get(id)

    # then you can update it using save() or export()
    # Depending on the format, the command will only update what was changed -
    # add, replace or remove some images and related annotation files
    dataset.save(save_images=True)

All these operations can be done without a project as well:

from datumaro.components.dataset import Dataset
dataset = Dataset.from_iterable([], categories=['cat', 'dog', ...])
dataset.save('path/') # or export('path/', 'datumaro')
# modify dataset
dataset.save()

You can also use the

You can find more info in the Dataset API docs and examples in the tests.
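To make the "only update what was changed" idea from the comments above more concrete, here is a toy sketch in plain Python. It is not Datumaro's implementation and does not use its API: the `TrackedDataset` class, its methods, and the one-JSON-file-per-item layout are all hypothetical, chosen only to show how change tracking lets `save()` touch just the files that correspond to added, modified, or removed items.

```python
import json
from pathlib import Path


class TrackedDataset:
    """Hypothetical toy dataset that remembers what changed since the last save.

    Item ids are assumed to be plain file-name-safe strings.
    """

    def __init__(self, save_dir):
        self.save_dir = Path(save_dir)
        self.items = {}         # item id -> annotation dict
        self._updated = set()   # ids added or modified since the last save
        self._removed = set()   # ids removed since the last save

    def put(self, item_id, annotations):
        """Add a new item or replace an existing one."""
        self.items[item_id] = annotations
        self._updated.add(item_id)
        self._removed.discard(item_id)

    def remove(self, item_id):
        """Remove an item and remember that its file must be deleted."""
        self.items.pop(item_id, None)
        self._updated.discard(item_id)
        self._removed.add(item_id)

    def save(self):
        """Write or delete only the files for changed items; return their names."""
        self.save_dir.mkdir(parents=True, exist_ok=True)
        touched = []
        for item_id in sorted(self._updated):
            path = self.save_dir / f"{item_id}.json"
            path.write_text(json.dumps(self.items[item_id]))
            touched.append(path.name)
        for item_id in sorted(self._removed):
            path = self.save_dir / f"{item_id}.json"
            if path.exists():
                path.unlink()
                touched.append(path.name)
        self._updated.clear()
        self._removed.clear()
        return touched

    @classmethod
    def load(cls, save_dir):
        """Rebuild the in-memory state from disk, e.g. after an application restart."""
        dataset = cls(save_dir)
        for path in sorted(Path(save_dir).glob("*.json")):
            dataset.items[path.stem] = json.loads(path.read_text())
        return dataset
```

In this sketch, calling `save()` after every user change is cheap because untouched items are never rewritten, and `load()` restores the state on the next start. A format that keeps all annotations in a single file would still have to rewrite that file, so how much is actually rewritten depends on the format.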
Basically, it should work like this. If you made the changes manually, maybe you added the image to the image directory but not to a subset image list (like this)? "Reloading" in this case means loading the dataset from the project:

Project('path/').working_tree.make_dataset('source_name')
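If the missing subset list entry is indeed the cause, a small helper along these lines could bring the list back in sync after images are copied in by hand. This is only a sketch under assumptions: it presumes the standard Pascal VOC layout (`JPEGImages/` and `ImageSets/Main/<subset>.txt`), `.jpg` images, and that every unlisted image belongs to the given subset. The function name `sync_voc_subset_list` is made up for illustration and is not part of Datumaro.

```python
from pathlib import Path


def sync_voc_subset_list(voc_root, subset="train"):
    """Append ids of images found in JPEGImages/ but missing from the subset list.

    Returns the list of ids that were appended.
    """
    voc_root = Path(voc_root)
    list_path = voc_root / "ImageSets" / "Main" / f"{subset}.txt"

    existing = list_path.read_text() if list_path.exists() else ""
    # Use the first token of each non-empty line as the image id
    # (some list files carry an extra flag after the id).
    known = {line.split()[0] for line in existing.splitlines() if line.strip()}

    image_ids = sorted(p.stem for p in (voc_root / "JPEGImages").glob("*.jpg"))
    missing = [image_id for image_id in image_ids if image_id not in known]

    if missing:
        list_path.parent.mkdir(parents=True, exist_ok=True)
        with list_path.open("a") as f:
            # Make sure the new ids start on their own line.
            if existing and not existing.endswith("\n"):
                f.write("\n")
            for image_id in missing:
                f.write(image_id + "\n")
    return missing
```

After running it on the source directory, loading the dataset from the project again should see the new ids in the list. Whether an item without an annotation file is picked up is a separate question that this sketch does not address.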
This method works with project files, but updates to the project tree need to be saved with |
Hi,
For the last few days I have been learning to use Datumaro, but I have a few questions that I believe are not covered in the documentation.
I am working on a personal project similar to CVAT (labelling images and training an object detection model) for self-learning purposes. Specifically, I am trying to integrate an "Active Learning" loop.
I don't fully understand how to integrate datumaro to my project.
My idea is to allow the user to upload images during training (after each training loop), which means that the dataset changes while my application is in use.
I have been using Datumaro project (adding sources) and Datumaro datasets (datumaro format), but I am unable to "persist" the changes.
I thought that by adding a "source" to a project, it would automatically "update" the dataset every time I reload the project.
For example, if I am using a "VOC" source directory, I was expecting that a change in the source directory (for example, adding a new image) would be reflected in the dataset the next time I reload the project.
I am also not sure how to "reload" a source; the only way I found was to delete the source and create a new one.
Now, my intention is to manipulate (add, remove, update) the dataset in memory (by modifying the dataset variable).
The only way I found to persist them was by calling "dataset.save()", but I feel this is not the right way since (if I understand correctly) it overwrites (deletes and recreates) the original files, rather than just "updating" the changes.
I am not sure what the right term is. If this were a database, it feels like calling the "save()" method deletes the table and creates it again. My expectation is that only the current dataset file is modified, i.e. in the case of the "datumaro format", that the json file is updated.
The workflow would be like:
While the application is running, the changes are kept in memory. The problem comes when the application is restarted.
So, every time the user makes a change to the dataset (adds new items, modifies an item's annotations), do I have to call the "save()" method of the dataset class to persist the changes?
I tried the "commit" method of the Project class, but the changes in the dataset are not saved.
I am a little bit lost here. Can someone point me in the right direction?
Thanks.