
Preprocess the RVL-CDIP dataset into a grouped-by-category format


ch-hristov/Preprocess-RVL-CDIP


Download RVL-CDIP from the original website: https://www.cs.cmu.edu/~aharley/rvl-cdip/. This repository is particularly useful if you want to train a model on all of the images without worrying about benchmarks, because it does not keep the original train/test/val split! For example, you could use it to train a network for later transfer learning :)
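Dropping the split amounts to reading every label file and bucketing images by class. The following is a minimal Python sketch of that idea, not the repository's compose.py. It assumes the extracted archive contains a labels/ folder with train.txt, val.txt and test.txt, that each line holds an image path (relative to images/) followed by an integer label, and that the labels map to the 16 class names in the order listed on the dataset website. CATEGORIES, SPLIT_FILES and all function names are hypothetical.

```python
from collections.abc import Iterable, Iterator
from pathlib import Path

# Assumption: label indices follow the class order listed on the RVL-CDIP website.
CATEGORIES = [
    "letter", "form", "email", "handwritten", "advertisement",
    "scientific report", "scientific publication", "specification",
    "file folder", "news article", "budget", "invoice",
    "presentation", "questionnaire", "resume", "memo",
]

# Assumption (hypothetical name): one label file per split inside labels/.
SPLIT_FILES = ("train.txt", "val.txt", "test.txt")


def parse_label_line(line: str) -> tuple[str, int]:
    """Split '<relative image path> <label index>' into (path, index)."""
    path, label = line.strip().rsplit(maxsplit=1)
    return path, int(label)


def read_all_splits(labels_dir: Path) -> Iterator[str]:
    """Yield the lines of every split file one after another."""
    for name in SPLIT_FILES:
        with open(labels_dir / name, encoding="utf-8") as fh:
            yield from fh


def group_by_category(lines: Iterable[str]) -> dict[str, list[str]]:
    """Bucket image paths by category name; the split a line came from is ignored."""
    groups: dict[str, list[str]] = {name: [] for name in CATEGORIES}
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        path, label = parse_label_line(line)
        groups[CATEGORIES[label]].append(path)
    return groups
```

Usage, run from the folder that contains labels/: `groups = group_by_category(read_all_splits(Path("labels")))`. Because all three split files feed the same buckets, nothing in the result records which split an image originally belonged to.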

  1. Move the downloaded archive into a folder with plenty of free disk space.

  2. Extract the archive by running tar -xvzf "./rvl-cdip.tar.gz"

  3. The directory should now look something like the image below, minus the dataset folder (this script creates that one automatically).

[Image: how your folder should look before running this script]
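Since the screenshot of the expected layout is not reproduced here, a quick pre-flight check can stand in for it. This is a hypothetical Python helper, not part of the repository. It assumes the extracted archive yields images/ and labels/ folders with train.txt, val.txt and test.txt inside labels/; only the dataset output folder name comes from this README. check_layout and the constants are illustrative names.

```python
from pathlib import Path

# Assumptions (not confirmed by this README): the archive extracts to images/ and
# labels/, with one label file per split. Only "dataset" is named in the README.
EXPECTED_DIRS = ("images", "labels")
EXPECTED_LABEL_FILES = ("train.txt", "val.txt", "test.txt")
OUTPUT_DIR = "dataset"


def check_layout(root: Path) -> list[str]:
    """Return human-readable problems with the folder layout (empty list = looks OK)."""
    problems = []
    for name in EXPECTED_DIRS:
        if not (root / name).is_dir():
            problems.append(f"missing directory: {name}/")
    for name in EXPECTED_LABEL_FILES:
        if not (root / "labels" / name).is_file():
            problems.append(f"missing label file: labels/{name}")
    # The README says the dataset folder is created by the script, so it should not exist yet.
    if (root / OUTPUT_DIR).exists():
        problems.append(f"{OUTPUT_DIR}/ already exists; the script is expected to create it")
    return problems
```

For example, `print(check_layout(Path(".")))` from the extraction folder prints an empty list when the layout matches these assumptions.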

  4. Run compose.py to build the per-category dataset. Note that this might take a while.

python compose.py
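To make the per-category output concrete, here is a minimal sketch of how images could be copied into dataset/<category>/ folders. It is an illustration, not the actual compose.py. It assumes a mapping from category name to image paths relative to images/ (for example, one built from the label files). plan_copies, run_copies, the underscore folder naming and the flattened file names are all hypothetical choices.

```python
import shutil
from pathlib import Path


def plan_copies(
    groups: dict[str, list[str]], images_dir: Path, out_dir: Path
) -> list[tuple[Path, Path]]:
    """Map each (category, relative image path) to a (source, destination) pair."""
    plan = []
    for category, rel_paths in groups.items():
        # Hypothetical naming: spaces in category names become underscores.
        target_dir = out_dir / category.replace(" ", "_")
        for rel in rel_paths:
            # Flatten the nested source path into the file name to avoid collisions.
            plan.append((images_dir / rel, target_dir / rel.replace("/", "_")))
    return plan


def run_copies(plan: list[tuple[Path, Path]]) -> None:
    """Create destination folders as needed and copy each file with its metadata."""
    for src, dst in plan:
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
```

Splitting planning from copying lets you inspect or count the plan before touching the disk. Copying roughly doubles the space the images take, which fits the advice to work in a folder with plenty of room; symbolic links would be a cheaper alternative where the filesystem supports them.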

  5. Done!
