Curator, the guide 🌎

This guide covers SpaceML’s machine learning pipeline, which has seven components, summarized below. Each program plays a different role in the pipeline: downloading satellite images, labeling images, training a machine learning model, improving an existing model, and running image similarity search. The programs can be used together, but you can also use just one or a few of them, depending on your needs. Throughout this guide, we showcase a few ways to combine them.

Program description & guide

1. GIBS Downloader

A tool for downloading Earth imagery. You can download NASA satellite imagery for the regions and time periods you specify, which makes it useful for building an Earth image dataset.

2. Self-Supervised Learner (SSL)

A self-supervised learning program for training a machine learning model with less labeled data. You can train an encoder on unlabeled data, then train a classifier with less labeled data than supervised learning would need.

Colab guide

3. Image Similarity Search

A reverse image search app. Once you have a dataset and a model trained on it, Image Similarity Search computes similarities between the images in the dataset and shows you the images most similar to one you pick. This can serve as a sanity check that your model is trained well.

Guide
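To make the idea of "similar images" concrete, here is a minimal sketch of similarity ranking over embeddings. It is not the app's actual implementation: the cosine metric, the function names, the file names, and the toy three-dimensional vectors are all assumptions made for illustration. In practice the embeddings would come from your trained model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query_name, embeddings, top_k=3):
    """Rank every other image by similarity to the query image."""
    query = embeddings[query_name]
    scored = [
        (name, cosine_similarity(query, vec))
        for name, vec in embeddings.items()
        if name != query_name
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy embeddings: made-up values and hypothetical file names.
embeddings = {
    "river_01.png": [0.9, 0.1, 0.0],
    "river_02.png": [0.8, 0.2, 0.1],
    "desert_01.png": [0.0, 0.1, 0.9],
}

results = most_similar("river_01.png", embeddings, top_k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

If the images ranked highest for a query look unrelated to it, that is a hint that the encoder has not learned useful features yet.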

4. Index & Search (GCP)

The Image Similarity Search app works well with up to 3 million images. For scalable image similarity search on larger datasets, we used Index & Search (GCP), which runs on Google Cloud Platform. First, we saved the dataset and model obtained from GIBS Downloader and Self-Supervised Learner to a Google Cloud Storage bucket. We then built two APIs: ① an Index API and ② a Search API. With the Index API, we generated embeddings, an indexer file, and a metadata file on a Google Compute Engine VM, using NVIDIA DALI and FAISS to make the process more efficient. We then deployed the Search API, built with FastAPI for minimal latency, to Google App Engine for live image similarity search. Google Cloud Functions helped keep GCP usage easy and smooth throughout the process. To get a glimpse of how the Index API works, check out this sample notebook.

5. Swipe Labeler

A GUI-based image labeling program. You can label images by swiping right/left, clicking accept/reject, or pressing the right/left arrow keys. Multiple people can use Swipe Labeler at the same time without overwriting each other's labels, so you can label quickly with your teammates.

Guide

6. Active Labeler

A program designed to improve your model efficiently. Once you have a trained model, Active Labeler picks out the images the model has the most difficulty with. You then label those images with Swipe Labeler and retrain the model on the newly labeled images so that it can overcome its weaknesses.
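One common way to decide which images a model "has the most difficulty with" is uncertainty sampling. The sketch below ranks images by the entropy of their predicted class probabilities. This guide does not state which selection strategy Active Labeler uses, so treat entropy as an assumed stand-in; the function names, file names, and probabilities are likewise hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy of a probability distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def pick_hardest(predictions, n):
    """Return the n image names whose predictions are the most uncertain."""
    ranked = sorted(
        predictions,
        key=lambda name: entropy(predictions[name]),
        reverse=True,
    )
    return ranked[:n]


# Hypothetical softmax outputs of a two-class model for three images.
predictions = {
    "img_a.png": [0.98, 0.02],  # confident
    "img_b.png": [0.55, 0.45],  # nearly a coin flip
    "img_c.png": [0.70, 0.30],  # somewhat unsure
}

to_label = pick_hardest(predictions, n=2)
print(to_label)
```

The selected images would then go to Swipe Labeler, and the model would be retrained on the newly labeled images.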

7. Worldview Search Chrome Extension

A Chrome extension for finding similar images on the NASA Worldview website. Take a snapshot of a particular scene in a satellite image on the website, and the extension will show you satellite images similar to it.

Guide

Combination guide

1. GIBS Downloader + Self-Supervised Learner

Guide

2. Self-Supervised Learner + Image Similarity Search + Swipe Labeler + Active Labeler

Guide

Required dataset format

Self-Supervised Learner, Image Similarity Search, Index & Search (GCP), and Active Labeler require the dataset to be organized in PyTorch ImageFolder format, like this:

/Dataset
    /Class 1
        Image1.png
        Image2.png
    /Class 2
        Image3.png
        Image4.png

The UC Merced Land Use dataset, which is used in some of our guide notebooks, is a good example:

/UCMerced_LandUse
    /Images
        /agricultural
            agricultural00.tif
            agricultural01.tif
            ...
        /airplane
            airplane00.tif
            airplane01.tif
            ...
        /...
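Before passing a dataset to one of the programs, it can help to confirm that it really follows this layout. The following is a minimal sketch using only the Python standard library. The helper name `summarize_image_folder` and the extension list are assumptions for illustration and are not part of any Curator program. The demo builds a tiny throwaway dataset so that the snippet runs on its own.

```python
import tempfile
from pathlib import Path

# Assumed set of image extensions; adjust to match your data.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def summarize_image_folder(root):
    """Return {class_name: image_count} for a root/class/image layout."""
    root = Path(root)
    summary = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        summary[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return summary


# Demo on a temporary dataset with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    layout = {
        "agricultural": ["agricultural00.tif", "agricultural01.tif"],
        "airplane": ["airplane00.tif"],
    }
    for class_name, file_names in layout.items():
        (Path(tmp) / class_name).mkdir()
        for file_name in file_names:
            (Path(tmp) / class_name / file_name).touch()
    summary = summarize_image_folder(tmp)

print(summary)
```

For the UC Merced example above, the directory to pass would be `UCMerced_LandUse/Images`, since that is the level that contains the class folders.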

If there are no labels, you can organize the images like this:

/Dataset
    /Unlabelled
        Image1.png
        Image2.png
        Image3.png
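If your unlabeled images currently sit directly in the dataset directory, a few lines of Python can move them into a single class folder. This is only a sketch: the helper name `wrap_unlabelled` is hypothetical, and the folder name `Unlabelled` simply mirrors the example above. The demo runs on a temporary directory so that nothing on disk is touched.

```python
import shutil
import tempfile
from pathlib import Path


def wrap_unlabelled(dataset_dir, folder_name="Unlabelled"):
    """Move files that sit directly in dataset_dir into one subfolder."""
    dataset_dir = Path(dataset_dir)
    target = dataset_dir / folder_name
    target.mkdir(exist_ok=True)
    moved = []
    # sorted() materializes the listing before any file is moved.
    for item in sorted(dataset_dir.iterdir()):
        if item.is_file():
            shutil.move(str(item), str(target / item.name))
            moved.append(item.name)
    return moved


# Demo on a temporary directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for file_name in ["Image1.png", "Image2.png"]:
        (Path(tmp) / file_name).touch()
    moved = wrap_unlabelled(tmp)
    remaining = sorted(p.name for p in Path(tmp).iterdir())
    inside = sorted(p.name for p in (Path(tmp) / "Unlabelled").iterdir())

print(moved, remaining, inside)
```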

Citation

If you find Curator useful in your research, please consider citing the GitHub repository for this tool:

@code{
  title={Curator: A No-Code, Self-Supervised Learning and Active Labeling Tool to Create Labeled Image Datasets from Petabyte-Scale Imagery},
  url={https://github.com/spaceml-org/Curator-Unlabeled-Image-Search-Guide},
  year={2021}
}
