[FEATURE REQUEST] use of dataset in tfsim.callbacks.EvalCallback #293

Open
Lunatik00 opened this issue Sep 22, 2022 · 1 comment

Comments

@Lunatik00
Contributor

Hi, I have a dataset that is fairly large relative to the RAM I have available. I currently have access to machines that can handle it, so this is not a problem for me, but because the RAM use is so high I checked whether EvalCallback can take a dataset (a tf.data.Dataset, the same way one can be passed to model.fit()), and it can't. Supporting this would help people with fewer compute resources use the callback with their datasets. (I read my dataset with tf.keras.utils.image_dataset_from_directory(); it can be batched or unbatched.)

@owenvallis
Collaborator

We do provide the TFRecord sampler for handling datasets that are too large to fit in memory. There are some quirks to setting up the TFRecords: the sampler requires that each TFRecord file contain contiguous blocks of classes, where the size of each block is a multiple of example_per_class.
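
To make the layout constraint concrete, here is a minimal sketch of one way to order examples before writing them to a file, based on my reading of the requirement above. The helper name `order_into_class_blocks` is hypothetical and not part of the library, and dropping leftover examples is just one possible choice. It works on plain label lists and does not write any TFRecords itself.

```python
from collections import defaultdict


def order_into_class_blocks(labels, example_per_class):
    """Return example indices grouped into contiguous per-class blocks.

    Hypothetical helper (not a library function). Each class contributes
    one block whose length is a multiple of example_per_class; leftover
    examples of a class are dropped so the multiple-of constraint holds.
    """
    if example_per_class < 1:
        raise ValueError("example_per_class must be >= 1")

    # Group the original indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    ordered = []
    for label in sorted(by_class):
        indices = by_class[label]
        # Trim the block to the largest multiple of example_per_class.
        usable = len(indices) - len(indices) % example_per_class
        ordered.extend(indices[:usable])
    return ordered


labels = [0, 1, 0, 2, 1, 0, 2, 1, 1]
order = order_into_class_blocks(labels, example_per_class=2)
print([labels[i] for i in order])  # [0, 0, 1, 1, 1, 1, 2, 2]
```

The examples would then be serialized in this order, one file (shard) at a time.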

Regarding the EvalCallback: it was meant to hold a smaller subset of the data in memory, because we need to rebuild the index every time the callback is called. Since this is pretty expensive, the expectation is that this is a small eval set.
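
Given that expectation, a possible workaround is to pull a capped number of examples out of the streamed dataset and keep only those in memory for evaluation. The sketch below assumes the dataset yields `(inputs, labels)` batches; `take_eval_subset` is a hypothetical helper, not a library function. It takes any iterable of batches, so for a batched tf.data.Dataset one could probably pass `dataset.as_numpy_iterator()`.

```python
def take_eval_subset(batches, max_examples):
    """Collect up to max_examples (input, label) pairs from batched data.

    Hypothetical helper: `batches` is any iterable yielding
    (batch_inputs, batch_labels) pairs of equal length.
    """
    if max_examples < 0:
        raise ValueError("max_examples must be >= 0")

    xs, ys = [], []
    for batch_x, batch_y in batches:
        remaining = max_examples - len(xs)
        # Take only as many examples as still fit under the cap.
        xs.extend(list(batch_x)[:remaining])
        ys.extend(list(batch_y)[:remaining])
        if len(xs) >= max_examples:
            break  # stop reading; the rest of the dataset is never loaded
    return xs, ys


# Stand-in for a streamed dataset: three batches of (inputs, labels).
fake_batches = [
    (["a", "b", "c"], [0, 0, 1]),
    (["d", "e", "f"], [1, 2, 2]),
    (["g", "h", "i"], [3, 3, 3]),
]
xs, ys = take_eval_subset(fake_batches, max_examples=5)
print(xs, ys)  # ['a', 'b', 'c', 'd', 'e'] [0, 0, 1, 1, 2]
```

The collected examples would still need to be converted to arrays and split into the query and target sets before being handed to the callback.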
