Cleanup the codebase to make it easier to understand and use #11

Open · wants to merge 11 commits into main
12 changes: 12 additions & 0 deletions .gitignore
@@ -0,0 +1,12 @@
__pycache__
.ipynb_checkpoints

test_imgs
nsfw_testset
clip_autokeras_binary_nsfw
clip_autokeras_nsfw_b32
TODO
nsfw-detector.ipynb
nude-net.ipynb
test.jpg
nsfw_trainset
504 changes: 504 additions & 0 deletions CLIP_based_NSFW_detector.ipynb

Large diffs are not rendered by default.

99 changes: 39 additions & 60 deletions README.md
@@ -1,67 +1,45 @@
# CLIP-based-NSFW-Detector

This 2-class NSFW detector is a lightweight AutoKeras model that takes CLIP ViT-L/14 embeddings as input.
It estimates a value between 0 and 1 (1 = NSFW) and works well with embeddings from images.

DEMO-Colab:
https://colab.research.google.com/drive/19Acr4grlk5oQws7BHTqNIK-80XGw2u8Z?usp=sharing

The training CLIP ViT-L/14 embeddings can be downloaded here:
https://drive.google.com/file/d/1yenil0R4GqmTOFQ_GVw__x61ofZ-OBcS/view?usp=sharing (not fully manually annotated, so it cannot be used as a test set)


The (manually annotated) test set is available at https://github.com/LAION-AI/CLIP-based-NSFW-Detector/blob/main/nsfw_testset.zip

Inference on LAION-5B: https://github.com/rom1504/embedding-reader/blob/main/examples/inference_example.py

Example usage of the model:

```python
# Requires: autokeras, tensorflow, numpy.
# get_cache_folder(clip_model) is expected to return a local cache directory (defined elsewhere in the calling code).
import os
from functools import lru_cache

import numpy as np


@lru_cache(maxsize=None)
def load_safety_model(clip_model):
    """load the safety model"""
    import autokeras as ak  # pylint: disable=import-outside-toplevel
    from tensorflow.keras.models import load_model  # pylint: disable=import-outside-toplevel

    cache_folder = get_cache_folder(clip_model)

    if clip_model == "ViT-L/14":
        model_dir = cache_folder + "/clip_autokeras_binary_nsfw"
        dim = 768
    elif clip_model == "ViT-B/32":
        model_dir = cache_folder + "/clip_autokeras_nsfw_b32"
        dim = 512
    else:
        raise ValueError("Unknown clip model")
    if not os.path.exists(model_dir):
        os.makedirs(cache_folder, exist_ok=True)

        from urllib.request import urlretrieve  # pylint: disable=import-outside-toplevel

        path_to_zip_file = cache_folder + "/clip_autokeras_binary_nsfw.zip"
        if clip_model == "ViT-L/14":
            url_model = "https://raw.githubusercontent.com/LAION-AI/CLIP-based-NSFW-Detector/main/clip_autokeras_binary_nsfw.zip"
        elif clip_model == "ViT-B/32":
            url_model = (
                "https://raw.githubusercontent.com/LAION-AI/CLIP-based-NSFW-Detector/main/clip_autokeras_nsfw_b32.zip"
            )
        else:
            raise ValueError("Unknown model {}".format(clip_model))  # pylint: disable=consider-using-f-string
        urlretrieve(url_model, path_to_zip_file)
        import zipfile  # pylint: disable=import-outside-toplevel

        with zipfile.ZipFile(path_to_zip_file, "r") as zip_ref:
            zip_ref.extractall(cache_folder)

    loaded_model = load_model(model_dir, custom_objects=ak.CUSTOM_OBJECTS)
    # Warm up the model with a dummy batch so the first real prediction is fast.
    loaded_model.predict(np.random.rand(10**3, dim).astype("float32"), batch_size=10**3)

    return loaded_model


safety_model = load_safety_model("ViT-L/14")
# embeddings: float32 numpy array of CLIP image embeddings, shape (n, 768) for ViT-L/14
nsfw_values = safety_model.predict(embeddings, batch_size=embeddings.shape[0])
```
The CLIP-based NSFW Detector is a 2-class model primarily trained to detect nudity or pornographic content. It outputs a score between 0 and 1, where 1 indicates NSFW content. The detector works well with image embeddings.

Different models are available, ranging from small (`ViT-B-32`) to large models (`ViT-H-14`). Please refer to [models/README.md](models/README.md) for more details.
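
Each detector head expects the raw CLIP image embedding of its matching CLIP variant, so the input dimensionality differs per model. The minimal sketch below records those sizes and shows how a score might be thresholded; the 0.5 cut-off is an illustrative choice, not a value prescribed by this project:

```python
# CLIP image-embedding sizes expected by the corresponding detector heads.
EMBEDDING_DIM = {
    "ViT-B-32": 512,
    "ViT-L-14": 768,
    "ViT-H-14": 1024,
}


def is_nsfw(score: float, threshold: float = 0.5) -> bool:
    """Turn the detector's 0-1 score into a boolean decision.

    The 0.5 default is only illustrative; tune it to your own
    precision/recall requirements.
    """
    return score > threshold
```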

> **Note**
> The model files (`clip_autokeras_binary_nsfw.zip`, `clip_autokeras_nsfw_b32.zip`, `h14_nsfw.pth`, `violence_detection_vit_b_32.npy`, `violence_detection_vit_l_14.npy`) need to stay where they are, because they are used by [clip-retrieval](https://github.com/rom1504/clip-retrieval/tree/main); see [this code search](https://github.com/search?q=repo%3Arom1504%2Fclip-retrieval%20CLIP-based-NSFW-Detector&type=code).

# Local Development

To get started with local development, install the dependencies by running the following command:

```bash
pip install -r requirements.txt
```

# Training

We provide an example for training and testing the `ViT-L-14` openai model, as it's the only model for which we provide the training embeddings. You can find the training embeddings in the [Google Drive link](https://drive.google.com/file/d/1yenil0R4GqmTOFQ_GVw__x61ofZ-OBcS/view?usp=sharing). Please see [data/README.md](data/README.md) for more information.

For training and testing, refer to the notebook [Traininig_Test.ipynb](Traininig_Test.ipynb).
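
As a rough complement to the notebook, here is a minimal training sketch. It assumes the downloaded archive has been unpacked into two numpy files, one with ViT-L-14 image embeddings and one with binary labels; the file names are hypothetical, and the scikit-learn classifier is a stand-in for the AutoKeras model actually used in the notebook:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical file names; adapt to however you unpacked the training archive.
X = np.load("train_embeddings.npy").astype("float32")  # (n, 768) ViT-L/14 image embeddings
y = np.load("train_labels.npy")                        # (n,) 0 = safe, 1 = NSFW

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.1, random_state=0)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print("validation accuracy:", clf.score(X_val, y_val))
```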

# Inference

You can find inference examples in the notebook [CLIP_based_NSFW_detector.ipynb](CLIP_based_NSFW_detector.ipynb).
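
For a quick end-to-end check outside the notebook, the sketch below embeds a single image with OpenAI's `clip` package and scores it with the `load_safety_model` helper shown earlier on this page. The image path is a placeholder, and the L2 normalization follows the usual convention for CLIP embeddings rather than anything specific to this repository:

```python
import clip  # pip install git+https://github.com/openai/CLIP.git
import numpy as np
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-L/14", device=device)
safety_model = load_safety_model("ViT-L/14")  # helper from the example above

# "example.jpg" is a placeholder path.
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
with torch.no_grad():
    emb = model.encode_image(image)
emb = emb / emb.norm(dim=-1, keepdim=True)  # L2-normalize the embedding

nsfw_score = float(safety_model.predict(emb.cpu().numpy().astype("float32"), batch_size=1)[0][0])
print(f"NSFW score: {nsfw_score:.3f}")  # closer to 1 means more likely NSFW
```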

# Additional Resources

Here are some other useful NSFW detectors:

* https://github.com/GantMan/nsfw_model
* https://github.com/notAI-tech/NudeNet

For NSFW detection datasets, you can refer to:

* https://github.com/alex000kim/nsfw_data_scraper
* https://archive.org/details/NudeNet_classifier_dataset_v1


# LICENSE

This code and these models are released under the MIT license:

Copyright 2022, Christoph Schuhmann
@@ -71,3 +49,4 @@ Permission is hereby granted, free of charge, to any person obtaining a copy of
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
