This repo studies how to use prompt engineering for computer vision tasks.
It uses state-of-the-art models such as Diffusion, along with other baseline models, to generate, inpaint, and paint images.
streamlit run basic_app.py --server.port 5555 --server.enableCORS false
streamlit run sam_app.py --server.port 5555 --server.enableCORS false
streamlit run sam_inpaint_app.py --server.port 5555 --server.enableCORS false
Image: assets/van.jpg
- [Van] A Volkswagen California van, parked on a beach, with a surfboard on the roof.
- [Ground] Big grassland, a lot grass, green or grey grass.
- [Between sky and ground] Endless grassland.
- [Sky] Clear night sky with stars and full moon.
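Each prompt above pairs a region label with the text used to repaint that region. As a minimal sketch of how such a list could be held in code, the snippet below parses `- [Region] prompt` lines into a dictionary. The helper name `parse_region_prompts` and the dictionary layout are assumptions made for illustration, not code from this repo.

```python
import re

# Matches lines of the form "- [Region] prompt text".
REGION_LINE = re.compile(r"^\s*-\s*\[(?P<region>[^\]]+)\]\s*(?P<prompt>.+?)\s*$")


def parse_region_prompts(text):
    """Return {region: prompt} for every '- [Region] prompt' line in text.

    Hypothetical helper: lines that do not match the pattern are ignored.
    """
    prompts = {}
    for line in text.splitlines():
        match = REGION_LINE.match(line)
        if match:
            prompts[match.group("region")] = match.group("prompt")
    return prompts


EXAMPLE = """\
- [Van] A Volkswagen California van, parked on a beach, with a surfboard on the roof.
- [Sky] Clear night sky with stars and full moon.
"""

region_prompts = parse_region_prompts(EXAMPLE)
```

Keeping the prompts keyed by region makes it easy to look up the text for whichever mask is currently selected.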
Use SAM to get a mask of the object, then use that mask to track the object through all frames.
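The first half of this step, getting a mask from a single click, might look roughly like the sketch below. It assumes the `segment-anything` package and a downloaded checkpoint. The helper names, the `checkpoint_path` argument, and the default `vit_h` model type are illustrative choices, not taken from `sam_tracker.py`. Propagating the mask through later frames depends on the tracker used and is not shown.

```python
import numpy as np


def click_to_point_prompt(x, y, foreground=True):
    """Convert one mouse click into the (coords, labels) arrays used as a SAM point prompt."""
    point_coords = np.array([[x, y]], dtype=np.float32)  # shape (1, 2), pixel (x, y)
    point_labels = np.array([1 if foreground else 0], dtype=np.int32)  # 1 = object, 0 = background
    return point_coords, point_labels


def best_mask(masks, scores):
    """Pick the highest-scoring mask from SAM's multi-mask output."""
    return masks[int(np.argmax(scores))]


def segment_first_frame(frame_rgb, x, y, checkpoint_path, model_type="vit_h"):
    """Run SAM on one frame (HxWx3 uint8 RGB) and return a boolean mask.

    Assumption: checkpoint_path points to a SAM checkpoint matching model_type.
    """
    # Imported lazily so the helpers above work without the package installed.
    from segment_anything import SamPredictor, sam_model_registry

    sam = sam_model_registry[model_type](checkpoint=checkpoint_path)
    predictor = SamPredictor(sam)
    predictor.set_image(frame_rgb)
    coords, labels = click_to_point_prompt(x, y)
    masks, scores, _ = predictor.predict(
        point_coords=coords,
        point_labels=labels,
        multimask_output=True,
    )
    return best_mask(masks, scores)
```

The returned mask for the first frame is what a tracker would then take as its starting point.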
streamlit run sam_tracker.py --server.port 5556 --server.enableCORS false
Video src: https://dl.dropbox.com/s/0lalmh95tylyw4s/sculpture.mp4
I personally think that prompting is a new programming approach. Don’t assume that guiding models with natural language is easy; I believe the opposite is true. Natural language programming lacks the syntax of traditional programming languages, which means there are no type checks or other protective mechanisms in place. If the model (AI) receives an inappropriate prompt, the generated results can be completely different from what was expected.
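One way to recover a small part of that protection is to check a prompt before it reaches the model. The sketch below is only an illustration of the idea and is not part of this repo. The function `render_prompt` and the template format are assumptions. It raises an error when a placeholder is missing or empty instead of silently sending a malformed prompt.

```python
from string import Formatter


def render_prompt(template, **fields):
    """Fill a prompt template, failing loudly on missing or empty fields.

    Hypothetical helper: placeholders use str.format syntax, e.g. "{subject}".
    """
    # Collect every named placeholder that appears in the template.
    required = {name for _, name, _, _ in Formatter().parse(template) if name}

    missing = sorted(required - fields.keys())
    if missing:
        raise ValueError(f"missing prompt fields: {missing}")

    empty = sorted(name for name in required if not str(fields[name]).strip())
    if empty:
        raise ValueError(f"empty prompt fields: {empty}")

    return template.format(**fields)


prompt = render_prompt(
    "A {subject}, parked on a {place}.",
    subject="Volkswagen California van",
    place="beach",
)
```

A check like this catches structural mistakes only. It cannot tell whether the wording will steer the model as intended.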
Here is a prompt I have used with the Diffusion model in computer vision. Although it has produced some surprises, it is not actually my ultimate goal.
AI new trend, prompt engineering
Mouse interactive prompt engineering
# setup
docker build --no-cache --tag cv-prompt-engineering -f Dockerfile .
# run
docker run --gpus all -v /home/ubuntu/work/cv-prompt-engineering/:/workspace/ -p 5555:5555 --rm -it --shm-size=55gb -d cv-prompt-engineering tail -f /dev/null
streamlit run basic_app.py --server.port 5555 --server.enableCORS false