Image Captioning Cli

This is a mod of fancyfeast/joy-caption-alpha-two.

Install Required

Joy Caption Alpha Two: https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-two/tree/main

Overview

This application generates descriptive captions for images using advanced ML models. It processes single images or entire directories, leveraging CLIP and LLM models for accurate and contextual captions. It has NSFW captioning support with natural language. This is just an extension of the original author's efforts to improve performance. Their repo is located here: https://huggingface.co/spaces/fancyfeast/joy-caption-alpha-two.

Features

Single image and batch processing
Multiple directory support
Custom output directory
Adjustable caption generation options
Progress tracking

Usage

python app_cli.py [--input_image INPUT_IMAGE | --input_dir INPUT_DIR] --output OUTPUT [options]

Arguments

Argument	Description
`--input_image`	Path to a single input image.
`--input_dir`	Path to a directory containing input images. Mutually exclusive with `--input_image`.
`--output`	(Required) Path to the output directory where captions will be saved.
`--caption_type`	Type of caption to generate. Options: `descriptive`, `descriptive_informal`, `training_prompt`, `midjourney`, `booru_tag_list`, `booru_like_tag_list`, `art_critic`, `product_listing`, `social_media_post`. Default: `descriptive`.
`--caption_length`	Length of the caption (e.g., `any`, `very_short`, `short`, `medium_length`, `long`, `very_long`, or a specific number of words). Default: `long`.
`--extra_options`	Extra options to include in the caption generation by specifying their numeric indices. Available options are listed below.
`--name_input`	Name of a person or character if applicable.
`--custom_prompt`	Custom prompt to override default settings.

Extra Options Mapping

Option Number	Description
`1`	If there is a person/character in the image you must refer to them as `{name}`.
`2`	Do NOT include information about people/characters that cannot be changed (like ethnicity, gender, etc), but do still include changeable attributes (like hair style).
`3`	Include information about lighting.
`4`	Include information about camera angle.
`5`	Include information about whether there is a watermark or not.
`6`	Include information about whether there are JPEG artifacts or not.
`7`	If it is a photo you MUST include information about what camera was likely used and details such as aperture, shutter speed, ISO, etc.
`8`	Do NOT include anything sexual; keep it PG.
`9`	Do NOT mention the image's resolution.
`10`	You MUST include information about the subjective aesthetic quality of the image from low to very high.
`11`	Include information on the image's composition style, such as leading lines, rule of thirds, or symmetry.
`12`	Do NOT mention any text that is in the image.
`13`	Specify the depth of field and whether the background is in focus or blurred.
`14`	If applicable, mention the likely use of artificial or natural lighting sources.
`15`	Do NOT use any ambiguous language.
`16`	Include whether the image is sfw, suggestive, or nsfw.
`17`	ONLY describe the most important elements of the image.

Examples

Process a single image:

python app_cli.py --input_image path/to/image.jpg --output path/to/output

Process all images in a directory:

python app_cli.py --input_dir path/to/directory --output path/to/output

Process multiple directories:

python app_cli.py --input_dir path/to/dir1 --input_dir path/to/dir2 --output path/to/output

Specify caption type and length:

python app_cli.py --input_dir path/to/directory --output path/to/output --caption_type art_critic --caption_length medium_length

Include extra options and custom prompt:

python app_cli.py --input_image image.jpg --output captions/ --extra_options 1 3 5 --custom_prompt "Provide a detailed art critique for this image."

Specify the name of a character in the image:

python app_cli.py --input_image image.jpg --output captions/ --name_input "Alice"

Technical Details

Models: CLIP (vision), LLM (language), custom ImageAdapter
Optimization: CUDA-enabled GPU support
Error Handling: Skips problematic images in batch processing

Requirements

Python 3.x
PyTorch
Transformers library
PIL (Pillow)
CUDA-capable GPU (recommended)

Installation

Windows

git clone https://github.com/diegocaumont/image-captions
cd image-captions
python -m venv venv OR conda create -n captions python==3.10 | conda activate captions
.\venv\Scripts\activate
# Change as per https://pytorch.org/get-started/locally/
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt

Linux

git clone https://github.com/diegocaumont/image-captions
cd image-captions
python -m venv venv OR conda create -n captions python==3.10 | conda activate captions
source venv/bin/activate
pip3 install torch torchvision torchaudio
pip3 install -r requirements.txt

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
LICENSE		LICENSE
README.md		README.md
app_cli_cpu.py		app_cli_cpu.py
app_cli_cuda.py		app_cli_cuda.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Image Captioning Cli

Install Required

Overview

Features

Usage

Arguments

Extra Options Mapping

Examples

Technical Details

Requirements

Installation

Windows

Linux

About

Releases

Packages

Languages

License

diegocaumont/image-captions

Folders and files

Latest commit

History

Repository files navigation

Image Captioning Cli

Install Required

Overview

Features

Usage

Arguments

Extra Options Mapping

Examples

Technical Details

Requirements

Installation

Windows

Linux

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages