Pyrodataset: Create wildfire dataset

This repository gathers our work on wildfire dataset creation.

The main objectives are the following:

Gather the datasets we have created or collected.
Gather several annotations for these datasets. Annotating a smoke cloud is complex, and several annotation strategies are possible.
Visualize datasets and their annotations using fiftyone.
Create new datasets by combining available ones.
Benchmark our models on these datasets.

The annotations here are object detection annotations, so models can be evaluated on both classification and object detection.
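The link between the two tasks can be sketched as follows: an image counts as a smoke image as soon as it carries at least one bounding box. This is a minimal illustration, not code from this repository; the `Box` tuple layout and the function name are assumptions.

```python
from typing import List, Tuple

# Hypothetical box layout: (class_id, x_center, y_center, width, height).
Box = Tuple[int, float, float, float, float]


def image_level_label(boxes: List[Box]) -> int:
    """Derive a classification label from detection annotations.

    Returns 1 (smoke) if the image has at least one box, 0 otherwise.
    """
    return 1 if len(boxes) > 0 else 0


# An image with one annotated smoke plume and an image with none.
smoke_image = [(0, 0.5, 0.4, 0.2, 0.1)]
empty_image: List[Box] = []
labels = [image_level_label(smoke_image), image_level_label(empty_image)]
```

The same annotations can therefore feed a detection benchmark (boxes) and a classification benchmark (one label per image).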

This repository uses DVC to store data. To use it fully, you need access to our DVC storage, which is currently reserved for Pyronear members. We hope to make it public soon. In the meantime, you can access all the public data listed below.

Setup

First, clone the repository and install the requirements:

git clone https://github.com/pyronear/pyro-dataset.git
cd pyro-dataset
pip install -r requirements.txt

Then pull the data using DVC:

dvc pull

Visualize datasets using fiftyone

FiftyOne is an open-source tool for building and visualizing datasets. Please refer to the FiftyOne documentation for more information.

To load datasets, run

python fiftyone/create_datasets.py

Then go to http://localhost:5151 to use the FiftyOne app.

Once the datasets are created in FiftyOne, you can relaunch the app using

python fiftyone/run.py

You can add a new dataset using:

python fiftyone/add_dataset.py
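When preparing annotations for FiftyOne, one detail is worth illustrating: FiftyOne stores a detection box as relative `[top-left-x, top-left-y, width, height]`. The sketch below converts a label line to that layout, assuming the labels are stored in YOLO text format (`class x_center y_center width height`, all relative). The format used by this repository is not stated here, so treat the YOLO assumption and both function names as hypothetical.

```python
from typing import List, Tuple


def yolo_to_fiftyone_box(
    x_center: float, y_center: float, width: float, height: float
) -> List[float]:
    """Convert a center-based relative box to a top-left-based relative box."""
    return [x_center - width / 2, y_center - height / 2, width, height]


def parse_yolo_line(line: str) -> Tuple[int, List[float]]:
    """Parse one 'class xc yc w h' line (assumed YOLO format)."""
    parts = line.split()
    class_id = int(parts[0])
    xc, yc, w, h = (float(value) for value in parts[1:5])
    return class_id, yolo_to_fiftyone_box(xc, yc, w, h)


class_id, box = parse_yolo_line("0 0.5 0.5 0.25 0.5")
```

The resulting list can be passed as the `bounding_box` argument of `fiftyone.Detection`, together with a `label` string of your choice.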

Create a dataset

You can create a combination of available datasets using

python datasets/make_dataset.py

This combination is defined by the configuration file dataset_config.yaml.

You can preview the combination with the --dry option

python datasets/make_dataset.py --dry

Each dataset has 3 folders:

images, with the images of the dataset
Labels, with one or more subfolders holding the various annotations for this dataset
Subset, with text files that each list the images of the dataset forming a subset

Each dataset can have several annotations. This makes it possible to propose a new annotation when the task or the annotation strategy changes.
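To see which annotations and subsets a dataset offers, the folder layout described above can be inspected with a few lines of Python. This is a sketch under assumptions: the lowercase folder names `labels` and `subset` and the `.txt` extension for subset files are guesses, and `describe_dataset` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path
from typing import Dict, List


def describe_dataset(root: Path) -> Dict[str, List[str]]:
    """List the annotation sets and subsets found under a dataset folder."""
    labels_dir = root / "labels"  # assumed folder name
    subset_dir = root / "subset"  # assumed folder name
    annotations: List[str] = []
    subsets: List[str] = []
    if labels_dir.is_dir():
        # Each subfolder of labels is one annotation of the dataset.
        annotations = sorted(p.name for p in labels_dir.iterdir() if p.is_dir())
    if subset_dir.is_dir():
        # Each text file lists the images of one subset.
        subsets = sorted(p.stem for p in subset_dir.glob("*.txt"))
    return {"annotations": annotations, "subsets": subsets}
```

Calling `describe_dataset(Path("datasets/some_dataset"))` on a real folder (the path here is only illustrative) returns the names you can then reference in the configuration file.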

To create our combined dataset, we fill in the dataset_config file with three parameters for each dataset:

Labels, the name of the labeling to use
Ratio, the percentage of the dataset to use
Subset, an optional subset to use
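How Ratio and Subset might interact can be sketched as follows. This is not the logic of make_dataset.py, which is not shown here: the function name, the fixed seed, and the choice to express the ratio as a fraction between 0 and 1 (rather than 0 to 100) are all assumptions made for illustration.

```python
import random
from typing import List, Optional


def select_images(
    images: List[str],
    ratio: float,
    subset: Optional[List[str]] = None,
    seed: int = 0,
) -> List[str]:
    """Pick a reproducible sample of images for a combined dataset.

    ratio is assumed to be a fraction in [0, 1]; subset, when given,
    restricts the candidates before sampling.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    if subset is None:
        candidates = sorted(images)
    else:
        candidates = sorted(set(images) & set(subset))
    count = int(len(candidates) * ratio)
    rng = random.Random(seed)  # local generator, so results are repeatable
    return sorted(rng.sample(candidates, count))


all_images = [f"img_{i}.jpg" for i in range(10)]
half = select_images(all_images, ratio=0.5)
only_subset = select_images(all_images, ratio=1.0, subset=["img_1.jpg", "img_3.jpg"])
```

Restricting to the subset first and sampling second keeps the ratio meaningful relative to the subset rather than to the full dataset; the real script may make a different choice.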

Datasets

Sources of Data

Today we have identified 3 main data sources, two of which are publicly available (ALERTWildfire and HPWREN):

ALERTWildfire

ALERTWildfire is a consortium of three universities -- The University of Nevada, Reno (UNR), University of California San Diego (UCSD), and the University of Oregon (UO) -- providing access to state-of-the-art Pan-Tilt-Zoom (PTZ) fire cameras and associated tools to help firefighters and first responders.

HPWREN:

The High Performance Wireless Research and Education Network (HPWREN) is a network research program funded by the National Science Foundation. The program includes the creation, demonstration, and evaluation of a non-commercial, prototype, high-performance, wide-area, wireless network in its Southern California service area.

PYRONEAR

Our camera network is under development, which allows us to start building an image database. For the moment this database does not contain any fire images, but it does contain a large number of false positive cases, which are quite challenging for a detection model.

Pyronear aims to become, one day, a public data source as important as the two presented above.

UNKNOWN

In addition to these 3 sources, we gather under the name UNKNOWN all other images coming from the internet without a properly defined source, or from sources too small to list separately. Among these are images from the Center for Wildfire Research of the University of Split, Croatia.

Datasets

From these data sources we have created or collected several datasets:

ALERTWildfire

A dataset was created from the Nevada Seismological Laboratory YouTube channel by Rodrigue de Schaetzen, Raphael Chang Menoni, Yifu Chen, and Drijon Hasani of the University of British Columbia, Canada. Their research paper detailing this work is available here.

They semi-automatically labeled 1.3M frames by video interpolation; you can download the whole dataset here. The code for their experiments is available here and makes it possible to extract a subset of 56K frames.

We added to this repository an extract of this 56K-frame subset, keeping only 2807 frames.

ALERTWildfire	Size	Smoke Images	Non Smoke Images
Nvseismolab_set1	2807	1375	1432

HPWREN

5 datasets have been created by AI For Mankind:

Two training datasets were created during two hackathons. We name these datasets AiForManKind_v1 (hackathon 1) and AiForManKind_v2 (hackathon 2).

To test the performance of their models on challenging false positive examples, AI For Mankind also proposes 3 small datasets, each containing one of the main error sources in automated forest fire detection. We called these datasets AiForManKind_sunrise, AiForManKind_fog and AiForManKind_clouds.

A dataset is also proposed by the Fuego project.

HPWREN	Size	Smoke Images
Fuego	1739	1739
AiForManKind_v1	744	744
AiForManKind_v2	2191	2191
AiForManKind_cloud	1080	0
AiForManKind_sunrise	180	0
AiForManKind_fog	180	0

PYRONEAR

Pyronear is starting to deploy its camera network, which allows us to create new datasets. We propose here two datasets, ardeche_set0 and gironde_set0, named after the French regions where the cameras are located. These datasets do not contain any smoke images, but they contain many potential false positives, which are quite challenging.

PYRONEAR	Size	Smoke Images	Non Smoke Images
ardeche_set0	20587	0	20587
gironde_set0	1205	0	1205
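Because these datasets contain no smoke, every smoke prediction a model makes on them is a false positive. A benchmark on such a set can therefore be reduced to a single rate, as in the sketch below. The function name and the toy predictions are illustrative assumptions, not results from any model.

```python
from typing import Sequence


def false_positive_rate(predicted_smoke: Sequence[bool]) -> float:
    """Share of images flagged as smoke on a dataset known to be smoke-free."""
    if len(predicted_smoke) == 0:
        raise ValueError("at least one prediction is required")
    return sum(predicted_smoke) / len(predicted_smoke)


# Toy example: 1 alert raised over 4 smoke-free images.
rate = false_positive_rate([True, False, False, False])
```

The same helper applies to any other smoke-free evaluation set.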

UNKNOWN

We propose here two datasets built from a mix of images collected on the internet: fog_clouds, to evaluate a model on challenging non-smoke images, and smoke, to test the ability of a model to detect a wildfire.

UNKNOWN	Size	Smoke Images	Non Smoke Images
fog_clouds	453	0	453
smoke	333	333	0

What else

Citation

If you wish to cite this project, feel free to use this BibTeX reference:

@misc{pyrodataset2019,
    title={Pyrodataset: wildfire early detection},
    author={Pyronear contributors},
    year={2019},
    month={October},
    publisher = {GitHub},
    howpublished = {\url{https://github.com/pyronear/pyro-dataset}}
}

Contributing

Please refer to CONTRIBUTING to help grow this project!

License

Distributed under the Apache 2 License. See LICENSE for more information.
