Add multimodal to possible tests (pytorch#1382)
* Update multimodal.md

Complete markup for testing

* Update run-docs

Add ability to run on docs/multimodal.md

* Update run-readme-pr.yml
mikekgfb authored Nov 19, 2024
1 parent 826c0c6 commit edc2cfb
Showing 3 changed files with 70 additions and 1 deletion.
20 changes: 20 additions & 0 deletions .ci/scripts/run-docs
@@ -91,3 +91,23 @@ if [ "$1" == "evaluation" ]; then
echo "*******************************************"
bash -x ./run-evaluation.sh
fi

if [ "$1" == "multimodal" ]; then

# Expecting that this test might fail as-is, because it's the first
# on-PR test that depends on GitHub secrets for HF token access

echo "::group::Create script to run multimodal"
python3 torchchat/utils/scripts/updown.py --file docs/multimodal.md > ./run-multimodal.sh
# for good measure, if something happened to updown processor,
# and it did not error out, fail with an exit 1
echo "exit 1" >> ./run-multimodal.sh
echo "::endgroup::"

echo "::group::Run multimodal"
echo "*******************************************"
cat ./run-multimodal.sh
echo "*******************************************"
bash -x ./run-multimodal.sh
echo "::endgroup::"
fi
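The appended `exit 1` is a sentinel: the generated script is assumed to end with its own explicit exit on success, so the sentinel only runs if the updown processor silently produced an empty or truncated script. A minimal sketch of the pattern (file names here are illustrative, not real updown.py output):

```shell
# demo.sh stands in for the generated run-multimodal.sh (hypothetical).
cat > demo.sh <<'EOF'
echo "commands extracted from docs/multimodal.md would run here"
exit 0  # a well-formed generated script is assumed to exit explicitly
EOF
echo "exit 1" >> demo.sh  # sentinel: only reached if generation went wrong
bash demo.sh              # succeeds; the sentinel line is never executed
```

If generation yields an empty script, execution falls through to the sentinel and the CI step fails with status 1.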
45 changes: 44 additions & 1 deletion .github/workflows/run-readme-pr.yml
@@ -243,4 +243,47 @@ jobs:
echo "::group::Completion"
echo "tests complete"
echo "*******************************************"
echo "::endgroup::"
echo "::endgroup::"
test-multimodal-any:
uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
with:
runner: linux.g5.4xlarge.nvidia.gpu
gpu-arch-type: cuda
gpu-arch-version: "12.1"
timeout: 60
script: |
echo "::group::Print machine info"
uname -a
echo "::endgroup::"
echo "::group::Install newer objcopy that supports --set-section-alignment"
yum install -y devtoolset-10-binutils
export PATH=/opt/rh/devtoolset-10/root/usr/bin/:$PATH
echo "::endgroup::"
.ci/scripts/run-docs multimodal
echo "::group::Completion"
echo "tests complete"
echo "*******************************************"
echo "::endgroup::"
test-multimodal-cpu:
uses: pytorch/test-infra/.github/workflows/linux_job.yml@main
with:
runner: linux.g5.4xlarge.nvidia.gpu
gpu-arch-type: cuda
gpu-arch-version: "12.1"
timeout: 60
script: |
echo "::group::Print machine info"
uname -a
echo "::endgroup::"
echo "::group::Install newer objcopy that supports --set-section-alignment"
yum install -y devtoolset-10-binutils
export PATH=/opt/rh/devtoolset-10/root/usr/bin/:$PATH
echo "::endgroup::"
TORCHCHAT_DEVICE=cpu .ci/scripts/run-docs multimodal
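The CPU job reuses the same entry point but sets `TORCHCHAT_DEVICE=cpu` in the environment. A hedged sketch of how a script can honor such an override with a fallback (the cuda default here is an assumption for illustration, not the actual run-docs logic):

```shell
# TORCHCHAT_DEVICE comes from the workflow above; defaulting to cuda when
# it is unset is an illustrative assumption.
DEVICE="${TORCHCHAT_DEVICE:-cuda}"
echo "running multimodal docs test on device: ${DEVICE}"
```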
6 changes: 6 additions & 0 deletions docs/multimodal.md
@@ -14,9 +14,11 @@ This page goes over the different commands you can run with Llama 3.2 11B Vision

While we strongly encourage you to use the Hugging Face checkpoint (which is the default for torchchat when utilizing the commands with the argument `llama3.2-11B`), we also provide support for manually providing the checkpoint. This can be done by replacing the `llama3.2-11B` argument in the commands below with the following:

[skip default]: begin
```
--checkpoint-path <file.pth> --tokenizer-path <tokenizer.model> --params-path torchchat/model_params/Llama-3.2-11B-Vision.json
```
[skip default]: end
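The `[skip default]: begin` / `[skip default]: end` pairs are directives for the updown.py processor, marking fenced blocks to exclude when commands are extracted for the default test run. A rough awk sketch of that extraction behavior, run against a stand-in file (illustrative only, not the real updown.py logic):

````shell
# sample.md stands in for docs/multimodal.md (hypothetical content).
cat > sample.md <<'EOF'
[skip default]: begin
```
echo "skipped in default runs"
```
[skip default]: end
```
echo "extracted"
```
EOF

awk '
  /^\[skip default\]: begin/ { skip = 1; next }
  /^\[skip default\]: end/   { skip = 0; next }
  /^```/                     { infence = !infence; next }
  infence && !skip           { print }   # keep only unskipped fenced lines
' sample.md > extracted.sh
````

Here `extracted.sh` receives only the second block's command; the skipped block never reaches the generated test script.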

## Generation
This generates text output based on a text prompt and (optional) image prompt.
@@ -48,6 +50,7 @@ Setting `stream` to "true" in the request emits a response in chunks. If `stream

**Example Input + Output**

[skip default]: begin
```
curl http://127.0.0.1:5000/v1/chat/completions \
-H "Content-Type: application/json" \
Expand Down Expand Up @@ -75,6 +78,7 @@ curl http://127.0.0.1:5000/v1/chat/completions \
```
{"id": "chatcmpl-cb7b39af-a22e-4f71-94a8-17753fa0d00c", "choices": [{"message": {"role": "assistant", "content": "The image depicts a simple black and white cartoon-style drawing of an animal face. It features a profile view, complete with two ears, expressive eyes, and a partial snout. The animal looks to the left, with its eye and mouth implied, suggesting that the drawn face might belong to a rabbit, dog, or pig. The graphic face has a bold black outline and a smaller, solid black nose. A small circle, forming part of the face, has a white background with two black quirkly short and long curved lines forming an outline of what was likely a mouth, complete with two teeth. The presence of the curve lines give the impression that the animal is smiling or speaking. Grey and black shadows behind the right ear and mouth suggest that this face is looking left and upwards. Given the prominent outline of the head and the outline of the nose, it appears that the depicted face is most likely from the side profile of a pig, although the ears make it seem like a dog and the shape of the nose makes it seem like a rabbit. Overall, it seems that this image, possibly part of a character illustration, is conveying a playful or expressive mood through its design and positioning."}, "finish_reason": "stop"}], "created": 1727487574, "model": "llama3.2", "system_fingerprint": "cpu_torch.float16", "object": "chat.completion"}%
```
[skip default]: end

</details>

@@ -90,6 +94,8 @@ First, follow the steps in the Server section above to start a local server. The
streamlit run torchchat/usages/browser.py
```

[skip default]: end
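The browser UI talks to the local server started in the Server section, so the server must be reachable before launching streamlit. A small illustrative helper (not part of torchchat) that polls the endpoint from the curl example above:

```shell
# Hypothetical helper, not a real torchchat command: poll the chat endpoint
# until any HTTP response (even an error status) shows the server is up.
wait_for_server() {
  host_port="$1"; tries="${2:-10}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    if curl -s -o /dev/null "http://${host_port}/v1/chat/completions"; then
      return 0
    fi
    i=$((i + 1)); sleep 1
  done
  return 1
}
# usage: wait_for_server 127.0.0.1:5000 && streamlit run torchchat/usages/browser.py
```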

---

# Future Work
