Showing 14 changed files with 3,582 additions and 0 deletions.
@@ -1,6 +1,11 @@
MIT License

5th place solution
"Turtle Recall: Conservation Challenge"
https://zindi.africa/competitions/turtle-recall-conservation-challenge

Copyright (c) 2022 Igor Ivanov
Email: [email protected]

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal

@@ -0,0 +1,159 @@
Turtle Recall: Conservation Challenge. 5th place solution
=========================================================

Competition: [link](https://zindi.africa/competitions/turtle-recall-conservation-challenge)
Author: Igor Ivanov
License: MIT


Solution overview
=================

To ensure generalization ability, I built my solution as an ensemble
of 6 models, each trained on a 5-fold stratified split.
For the same reason I chose large, deep architectures with
enough capacity to capture the important features of this diverse dataset.
All models share the same multiclass classification formulation over 2265 classes,
with average pooling and softmax on top. Optimization was performed with
categorical cross-entropy loss and the Adam optimizer.
I used all available data for training, i.e. the joint set of training and extra images.
A raw model prediction contains 2265 probabilities. Any predicted `turtle_id`
that does not belong to the 100 original training individuals is considered a `new_turtle`.
The ensemble is computed as the arithmetic average of 30 predictions (6 models by 5 folds).

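As a rough illustration of this shared formulation (a sketch only, not the actual
`run.py` code; the backbone choice, input size, and learning rate below are
placeholder assumptions):

```
import tensorflow as tf
import efficientnet.tfkeras as efn  # https://github.com/qubvel/efficientnet

NUM_CLASSES = 2265  # 100 training individuals + extra ids

def build_model(input_size=512):  # input size is a placeholder
    # Backbone without its original classification head
    backbone = efn.EfficientNetB7(include_top=False,
                                  weights='imagenet',
                                  input_shape=(input_size, input_size, 3))
    # Average pooling + 2265-way softmax on top
    x = tf.keras.layers.GlobalAveragePooling2D()(backbone.output)
    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation='softmax')(x)
    model = tf.keras.Model(backbone.input, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),  # assumed learning rate
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```
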
Architectures used:
- EfficientNet-v1-B7
- EfficientNet-v1-L2
- EfficientNet-v1-L2
- EfficientNet-v2-L
- EfficientNet-v2-XL
- BEiT-L

Architectures are implemented in the following repositories:
- https://github.com/qubvel/efficientnet
- https://github.com/leondgarse/keras_cv_attention_models

For augmentation I used rotations by multiples of 45 degrees (with a central crop) and flips.
For validation I measured accuracy and MAP5 over the 2265 classes.
The software stack is based on TensorFlow and Keras.
All hyperparameters are listed in a dedicated section at the top
of the `run.py` file and can be passed as command line arguments.

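For reference, MAP5 on single-label ground truth scores each example as 1/k when
the true id is the k-th ranked prediction (k <= 5) and 0 otherwise. A minimal
NumPy sketch of the metric (not the project's actual validation code):

```
import numpy as np

def map5(y_true, probas):
    """Mean average precision at 5 for single-label ground truth.

    y_true : (N,) integer class labels.
    probas : (N, C) predicted class probabilities.
    """
    # Indices of the 5 highest-probability classes, best first
    top5 = np.argsort(probas, axis=1)[:, ::-1][:, :5]
    scores = []
    for label, row in zip(y_true, top5):
        hits = np.where(row == label)[0]
        scores.append(1.0 / (hits[0] + 1) if hits.size else 0.0)
    return float(np.mean(scores))

# Top-1 accuracy is the same idea restricted to the best prediction:
# acc1 = np.mean(np.argmax(probas, axis=1) == y_true)
```
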
Results
=======

Each score in the table is an average of 5 folds.
The suffix `2265` means that the metric uses 2265 unique turtle ids (100 training + extra).
The suffix `101` means that the metric uses 101 unique turtle ids (100 training + 1 `new_turtle`).

| Model                    | CV-acc1-2265 | CV-map5-2265 | Public-LB-map5-101 | Private-LB-map5-101 |
|--------------------------|--------------|--------------|--------------------|---------------------|
| run-20220310-1926-ef1b7  | 0.8731       | 0.9067       | 0.9523             | 0.9567              |
| run-20220316-1310-beitl  | 0.8896       | 0.9202       | 0.9611             | 0.9317              |
| run-20220317-1954-ef1l2  | 0.8782       | 0.9112       | 0.9543             | 0.9501              |
| run-20220318-1121-ef2xl  | 0.8553       | 0.8928       | 0.9421             | 0.9332              |
| run-20220322-2024-ef1l2  | 0.8720       | 0.9056       | 0.9625             | 0.9514              |
| run-20220325-1527-ef2l   | 0.8829       | 0.9151       | 0.9557             | 0.9545              |
| Ensemble                 | 0.9320       | 0.9503       | 0.9875             | 0.9648              |


Conclusions
===========

1) The solution generalizes well between the public and private test sets
despite the very small test size (147 and 343 examples respectively).
As a result I was able to retain a high position on both leaderboards:
2nd place public, 5th place private.

2) Ensembling gives a stable, significant improvement (about 0.01-0.03)
across all metrics on all subsets of the data (public/private).

3) The combination of GeM pooling and ArcFace loss is a popular approach in image
similarity tasks, but in my experiments it did not give an improvement on this task.
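
For context, GeM (generalized mean) pooling is a learnable power mean over the
spatial dimensions. The layer below is only an illustrative sketch of the idea
that was tested, not code from this repository:

```
import tensorflow as tf

class GeMPooling2D(tf.keras.layers.Layer):
    """Generalized mean pooling over spatial dimensions.

    Computes ((1/HW) * sum(x**p))**(1/p); p = 1 recovers average pooling,
    large p approaches max pooling. p is learned.
    """
    def __init__(self, p_init=3.0, eps=1e-6, **kwargs):
        super().__init__(**kwargs)
        self.p_init = p_init
        self.eps = eps

    def build(self, input_shape):
        self.p = self.add_weight(
            name='p', shape=(),
            initializer=tf.keras.initializers.Constant(self.p_init),
            trainable=True)

    def call(self, x):
        x = tf.maximum(x, self.eps)  # avoid raising 0 to a fractional power
        pooled = tf.reduce_mean(tf.pow(x, self.p), axis=[1, 2])
        return tf.pow(pooled, 1.0 / self.p)
```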


Hardware
========

Training: TPUv3-8, 4 CPU, 16 GB RAM, 500 GB HDD
Training time: 100 hours total

Inference: V100-16GB GPU, 4 CPU, 16 GB RAM, 500 GB HDD
Inference time: 30 minutes total


Software
========

- Ubuntu 18.04
- Python: 3.9.7
- CUDA: 11.2
- cuDNN: 8.1.1
- TensorFlow: 2.8.0


Demo
====

The notebook `solution/notebook/notebook.ipynb` demonstrates
how to run inference on a single image using the pretrained weights.
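
In outline, single-image inference looks roughly like the sketch below. The weight
path, input size, and preprocessing here are placeholders; the actual values live
in the notebook:

```
import numpy as np
import tensorflow as tf

# Placeholder weight path and image size
model = tf.keras.models.load_model('models/run-20220310-1926-ef1b7/model-fold-0.h5',
                                   compile=False)

img = tf.io.decode_jpeg(tf.io.read_file('turtle_example.jpg'), channels=3)
img = tf.image.resize(img, (512, 512))
img = tf.cast(img, tf.float32) / 255.0             # assumed preprocessing

probas = model.predict(tf.expand_dims(img, 0))[0]  # 2265 class probabilities
top5 = np.argsort(probas)[::-1][:5]                # 5 most likely turtle ids
print(top5, probas[top5])
```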


Steps to reproduce
==================

```
# Install
cd $HOME
unzip solution.zip
conda create -y --name py397 python=3.9.7
conda activate py397
pip install tensorflow==2.8.0 tensorflow-addons numpy pandas \
    scikit-learn h5py efficientnet keras-cv-attention-models cloud-tpu-client

# Prepare data
cd $HOME/solution/data
curl -L -O https://storage.googleapis.com/dm-turtle-recall/train.csv
curl -L -O https://storage.googleapis.com/dm-turtle-recall/extra_images.csv
curl -L -O https://storage.googleapis.com/dm-turtle-recall/test.csv
curl -L -O https://storage.googleapis.com/dm-turtle-recall/sample_submission.csv
curl -L -O https://storage.googleapis.com/dm-turtle-recall/images.tar
mkdir images
tar xf images.tar -C images
rm images.tar
cd $HOME/solution
python3 create_tfrecords.py --data_dir=$HOME/solution/data --out_dir=$HOME/solution/data/tfrec

# Training
# Please remove all weights from previous runs if present.
# All hyperparameters are configured for training on TPUv3-8.
# To train on GPU (or several GPUs) set the following arguments in `run_training.sh`:
#   --tpu_ip_or_name=None
#   --data_tfrec_dir=$HOME/solution/data/tfrec
# and adjust batch size and learning rate accordingly.
# To use mixed precision set:
#   --mixed_precision=mixed_float16
bash run_training.sh

# Inference
bash run_inference.sh
# Submission will appear as $HOME/solution/submission.csv
```

Acknowledgement
===============

Thanks to the [TRC program](https://sites.research.google/trc/about/),
I had the opportunity to run experiments on a TPUv3-8.

@@ -0,0 +1,116 @@
#------------------------------------------------------------------------------
#------------------------------------------------------------------------------

import os
import sys
sys.path.append('lib')
import glob
import warnings
warnings.simplefilter('ignore', UserWarning)
import collections
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import GroupKFold
import tensorflow as tf
print('tf:', tf.__version__)
from vecxoz_utils import create_cv_split
from argparse import ArgumentParser

parser = ArgumentParser()
parser.add_argument('--data_dir', default='data', type=str, help='Data directory')
parser.add_argument('--out_dir', default='data/tfrec', type=str, help='Out directory')
args = parser.parse_args()

os.makedirs(args.out_dir, exist_ok=True)

#------------------------------------------------------------------------------
#------------------------------------------------------------------------------

class TFRecordProcessor(object):
    def __init__(self):
        self.n_examples = 0
    #
    def _bytes_feature(self, value):
        if isinstance(value, type(tf.constant(0))):
            value = value.numpy()  # BytesList won't unpack a string from an EagerTensor.
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
    #
    def _int_feature(self, value):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
    #
    def _float_feature(self, value):
        return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))
    #
    def _process_example(self, ind, A, B, C, D):
        self.n_examples += 1
        feature = collections.OrderedDict()
        #
        feature['image_id'] = self._bytes_feature(A[ind].encode('utf-8'))
        feature['image'] = self._bytes_feature(tf.io.read_file(B[ind]))
        feature['label_id'] = self._bytes_feature(C[ind].encode('utf-8'))
        feature['label'] = self._int_feature(D[ind])
        #
        example_proto = tf.train.Example(features=tf.train.Features(feature=feature))
        self._writer.write(example_proto.SerializeToString())
    #
    def write_tfrecords(self, A, B, C, D, n_shards=1, file_out='train.tfrecord'):
        n_examples_per_shard = A.shape[0] // n_shards
        n_examples_remainder = A.shape[0] % n_shards
        self.n_examples = 0
        #
        for shard in range(n_shards):
            self._writer = tf.io.TFRecordWriter('%s-%05d-of-%05d' % (file_out, shard, n_shards))
            #
            start = shard * n_examples_per_shard
            if shard == (n_shards - 1):
                end = (shard + 1) * n_examples_per_shard + n_examples_remainder
            else:
                end = (shard + 1) * n_examples_per_shard
            #
            print('Shard %d of %d: (%d examples)' % (shard, n_shards, (end - start)))
            for i in range(start, end):
                self._process_example(i, A, B, C, D)
                print(i, end='\r')
            #
            self._writer.close()
        #
        return self.n_examples

#------------------------------------------------------------------------------
#------------------------------------------------------------------------------

train_df, test_df = create_cv_split(args.data_dir, n_splits=5)

tfrp = TFRecordProcessor()

#------------------------------------------------------------------------------
#------------------------------------------------------------------------------

for fold_id in range(len(train_df['fold_id'].unique())):
    print('Fold:', fold_id)
    n_written = tfrp.write_tfrecords(
        train_df[train_df['fold_id'] == fold_id]['image_id'].values,
        train_df[train_df['fold_id'] == fold_id]['image'].values,
        train_df[train_df['fold_id'] == fold_id]['turtle_id'].values,
        train_df[train_df['fold_id'] == fold_id]['label'].values,
        #
        n_shards=1,
        file_out=os.path.join(args.out_dir, 'fold.%d.tfrecord' % fold_id))

n_written = tfrp.write_tfrecords(
    test_df['image_id'].values,
    test_df['image'].values,
    test_df['turtle_id'].values,
    test_df['label'].values,
    #
    n_shards=1,
    file_out=os.path.join(args.out_dir, 'test.tfrecord'))

#------------------------------------------------------------------------------
#------------------------------------------------------------------------------

@@ -0,0 +1,96 @@
#------------------------------------------------------------------------------
#------------------------------------------------------------------------------

import os
import sys
sys.path.append('lib')
import warnings
warnings.simplefilter('ignore', UserWarning)
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from vecxoz_utils import create_cv_split

# List of models to ensemble
dirs = [
    'run-20220310-1926-ef1b7',
    'run-20220316-1310-beitl',
    'run-20220317-1954-ef1l2',
    'run-20220318-1121-ef2xl',
    'run-20220322-2024-ef1l2',
    'run-20220325-1527-ef2l',
]

model_dir = 'models'
data_dir = 'data'
n_folds = 5
n_tta = 0

#------------------------------------------------------------------------------
#------------------------------------------------------------------------------

# Load predictions from all models
y_preds_test = []
for counter, d in enumerate(dirs):
    for tta_id in range(n_tta + 1):
        for fold_id in range(n_folds):
            y_preds_test.append(np.load(os.path.join(model_dir, d, 'preds', 'y_pred_test_fold_%d_tta_%d.npy' % (fold_id, tta_id))))
    print(counter, end='\r')
assert len(y_preds_test) == (n_tta + 1) * len(dirs) * n_folds

#------------------------------------------------------------------------------
#------------------------------------------------------------------------------

# Compute mean and argsort
probas = np.mean(y_preds_test, axis=0)
preds = np.argsort(probas, axis=1)[:, ::-1]

# train_df contains train + extra data of 2265 classes
# train_orig_df contains 100 original classes
train_df, _ = create_cv_split(data_dir, 5)
train_orig_df = pd.read_csv(os.path.join(data_dir, 'train.csv'))
turtle_ids_orig = sorted(train_orig_df['turtle_id'].unique())  # 100 unique

# Fit LabelEncoder on 2265 classes to decode our predictions
le = LabelEncoder()
le = le.fit(train_df['turtle_id'])

# Replace all predicted labels outside of the 100 train ids with "new_turtle"
label_str = []
for row in preds:  # 490 test examples
    turtle_ids_predicted = le.inverse_transform(row)  # transform a row of length 2265
    turtle_ids_replaced = []
    for turtle_id in turtle_ids_predicted:
        if turtle_id in turtle_ids_orig:
            turtle_ids_replaced.append(turtle_id)
        else:
            turtle_ids_replaced.append('new_turtle')
    label_str.append(turtle_ids_replaced)
label_str_npy = np.array(label_str)  # (490, 2265)

# There may be more than one "new_turtle" prediction for a given example.
# Keep only the first occurrence and fill the remaining slots with the
# most probable predictions from the 100 train ids.
rows_by_5 = []
for row in label_str_npy:
    cand = [x for x in row[row != 'new_turtle'] if x not in row[:5]][:4]
    row_new = []
    for t_id in row[:5]:
        if t_id not in row_new:
            row_new.append(t_id)
    for _ in range(5 - len(row_new)):
        row_new.append(cand.pop(0))
    rows_by_5.append(np.array(row_new))
rows_by_5_npy = np.array(rows_by_5)

# Create submission file (sample_submission.csv is downloaded into data_dir, see README)
subm_df = pd.read_csv(os.path.join(data_dir, 'sample_submission.csv'))
subm_df['prediction1'] = rows_by_5_npy[:, 0]
subm_df['prediction2'] = rows_by_5_npy[:, 1]
subm_df['prediction3'] = rows_by_5_npy[:, 2]
subm_df['prediction4'] = rows_by_5_npy[:, 3]
subm_df['prediction5'] = rows_by_5_npy[:, 4]

subm_df.to_csv('submission.csv', index=False)

#------------------------------------------------------------------------------
#------------------------------------------------------------------------------