Classification of 365 scene categories by fine-tuning a Vision Transformer on the Places365-Standard dataset.
The Places365 dataset contains 365 scene categories. Below are a few example scenes.
Labels | Images |
---|---|
Baseball Field | |
Balcony Interior | |
Embassy | |
Fire Escape | |
Kitchen | |
Lake Natural | |
Skyscraper | |
Office Cubicles | |
Reception | |
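The labels above are prettified versions of the raw Places365 category strings, which in the official `categories_places365.txt` file look like `/b/baseball_field` or `/b/balcony/interior`. A minimal sketch of the conversion (the function name is illustrative, not part of this repo):

```python
def pretty_label(raw: str) -> str:
    """Convert a raw Places365 category string such as '/b/baseball_field'
    into a human-readable label such as 'Baseball Field'."""
    name = raw[3:]  # drop the leading '/x/' letter prefix
    # Nested categories use '/' (e.g. '/l/lake/natural'); words use '_'.
    return name.replace("/", " ").replace("_", " ").title()

print(pretty_label("/b/baseball_field"))   # Baseball Field
print(pretty_label("/l/lake/natural"))     # Lake Natural
```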
The Vision Transformer model was fine-tuned on the Places365 dataset with the following hyperparameters:
Hyperparameter | Value |
---|---|
Batch Size | 32 |
Learning Rate | 2e-4 |
Optimizer | AdamW |
No. of Epochs | 5 |
Evaluate Validation After | 5000 batches |
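The hyperparameters above map onto a standard PyTorch training loop. A minimal sketch, with a dummy linear layer standing in for the actual ViT (whose classification head would be resized to 365 classes); the function and loader names are illustrative:

```python
import torch
from torch import nn

# Stand-in for the Vision Transformer; the real run fine-tunes a ViT
# with a 365-way classification head.
model = nn.Linear(768, 365)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
criterion = nn.CrossEntropyLoss()

NUM_EPOCHS = 5
EVAL_EVERY = 5000  # run validation every 5000 training batches

def train(loader, validate):
    step = 0
    for epoch in range(NUM_EPOCHS):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
            step += 1
            if step % EVAL_EVERY == 0:
                validate()
```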
The metrics were logged using Weights & Biases. This specific run can be found here.
The trained Vision Transformer model was evaluated on the Places365 test dataset and obtained the following results:
Metric | Value (%) |
---|---|
AUROC | 98.90 |
Accuracy Top 5 | 83.52 |
Accuracy Top 1 | 52.47 |
F1-Score | 51.71 |
Precision | 52.70 |
Recall | 52.47 |
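Top-1 and Top-5 accuracy differ only in how many of the highest-scoring classes are checked against the ground-truth label, which is why Top-5 (83.52%) is much higher than Top-1 (52.47%). A minimal sketch in plain Python (names illustrative):

```python
def topk_accuracy(logits, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores.

    logits: list of per-class score lists, one per sample
    labels: list of true class indices
    """
    correct = 0
    for scores, label in zip(logits, labels):
        # Indices of the k largest scores for this sample.
        topk = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
        correct += label in topk
    return correct / len(labels)

logits = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
labels = [2, 0]
print(topk_accuracy(logits, labels, k=1))  # 0.5
print(topk_accuracy(logits, labels, k=2))  # 1.0
```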
- Download the Places365 Standard dataset.
- Install the requirements from requirements.txt.
- Update the following paths:
  - DATASET_TRAIN_PATH : Path of the Places365-Standard training dataset.
  - DATASET_TEST_PATH : Path of the Places365-Standard validation dataset.
  - DATASET_MAPPINGS_PATH : Path to store the dataset mappings for the train and test datasets.
  - WANDB_PATH : Path to initialize Weights & Biases runs.
- Run preprocess_dataset.py to create a mapping of images.
- Train the model by running the train.py script.
- Evaluate the model on the test dataset by running the evaluate_test.py script.
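The path constants in step 3 are module-level settings; a sketch with placeholder values (the actual locations depend on where you downloaded the dataset):

```python
# Placeholder locations; replace with the paths on your machine.
DATASET_TRAIN_PATH = "/data/places365_standard/train"
DATASET_TEST_PATH = "/data/places365_standard/val"
DATASET_MAPPINGS_PATH = "/data/places365_standard/mappings"
WANDB_PATH = "/data/wandb_runs"
```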