Update vision README.md #305

Merged · 2 commits · Oct 17, 2023
Changes from all commits
marimbabot_vision/README.md: 7 changes (5 additions, 2 deletions)
@@ -1,5 +1,8 @@
# TAMS Master Project 2022/2023 - Vision

+This part of the repository stores all the code to run and train the transformer-based end-to-end vision pipeline.
+The latest trained model can be found on [Huggingface](https://huggingface.co/Flova/omr_transformer), which also offers a simple browser-based demo.
+
## Scripts
For more information on the usage of the scripts, please refer to the [README](../marimbabot_vision/scripts/README.md).

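For readers who want to try the published model directly, here is a minimal sketch of loading it and transcribing an image. It assumes the model follows the TrOCR-style `VisionEncoderDecoderModel` API in `transformers`; the processor class and generation settings are assumptions, not taken from this PR.

```python
# Minimal sketch: load the published model from HuggingFace and transcribe
# a sheet-music image. Assumes a TrOCR-style VisionEncoderDecoderModel with
# a matching processor; class choice and settings are assumptions.
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("Flova/omr_transformer")
model = VisionEncoderDecoderModel.from_pretrained("Flova/omr_transformer")

image = Image.open("sheet.png").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values, max_length=256)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```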
@@ -10,8 +13,8 @@ This will create a dataset of N samples. The dataset is saved in the `data`, `da
## Src
### [vision_node.py](src/vision_node.py)

-This ROS node is responsible for processing images from a camera source and recognizing notes in the images using a pre-trained model. It converts the image data into a textual representation of recognized musical notes and publishes them as ROS messages.
+This ROS node is responsible for processing images from a camera source and recognizing notes in the images using a pre-trained model. It converts the image data into a textual LilyPond representation and publishes it as ROS messages.
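A bare-bones sketch of that subscribe-infer-publish plumbing follows; the topic names and the `recognize` stub are placeholders for illustration, not the node's actual interface.

```python
# Bare-bones sketch of the node's ROS plumbing (rospy).
# Topic names and the `recognize` stub are placeholders, not the real code.
import rospy
from sensor_msgs.msg import Image
from std_msgs.msg import String

def recognize(image_msg: Image) -> String:
    """Placeholder for the transformer inference that returns LilyPond text."""
    return String(data="c'4 d'4 e'2")

def main():
    rospy.init_node("vision_node")
    pub = rospy.Publisher("recognized_notes", String, queue_size=1)
    rospy.Subscriber("image_raw", Image, lambda msg: pub.publish(recognize(msg)))
    rospy.spin()

if __name__ == "__main__":
    main()
```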


### [visualization_node.py](src/visualization_node.py)
-The ROS node receives recognized notes from the vision_node and generates visual representations of the musical notations. It uses the LilyPond library to create musical staff notation and publishes the resulting images as ROS messages for visualization.
+The ROS node receives recognized notes from the vision_node and generates a visual representation of the musical notation. It uses the LilyPond library to create musical staff notation and publishes the resulting images as ROS messages for visualization.
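To illustrate the rendering step, here is a standalone sketch that turns a LilyPond music expression into a PNG by shelling out to the `lilypond` CLI. The file names and version header are arbitrary choices; the node's actual code may differ.

```python
# Standalone sketch: render a LilyPond music expression to a PNG via the
# `lilypond` CLI (must be installed). Paths and the version header are
# illustrative, not taken from the node's code.
import pathlib
import subprocess
import tempfile

def render_lilypond(notes: str) -> pathlib.Path:
    workdir = pathlib.Path(tempfile.mkdtemp())
    source = workdir / "score.ly"
    source.write_text('\\version "2.24.0"\n{ %s }\n' % notes)
    # `-o` sets the output basename; `--png` writes score.png instead of a PDF.
    subprocess.run(
        ["lilypond", "--png", "-o", str(workdir / "score"), str(source)],
        check=True,
    )
    return workdir / "score.png"

print(render_lilypond("c'4 d'4 e'2 c'1"))
```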
marimbabot_vision/scripts/README.md: 8 changes (7 additions, 1 deletion)
@@ -51,4 +51,10 @@ Trains a model on a set of given `train_data_paths`.
Trains the tokenizer on all text files defined by a glob expression.

### `detect.py`
-This script is used for live detection of notes. A trained model can be used to initialize. The current model is stored at HuggingFace and its path/name is set by the `MODEL_PATH` parameter inside `config/vision_node.yaml` The detected notes are shown in a window.
+This script is used to detect notes in a given image file. The current model stored on HuggingFace is downloaded and used by default. The detected notes are printed to the terminal.
+
+### `attention_viz.py`
+This script is similar to `detect.py`, but it additionally shows the cross-attention to the image encoder as a heatmap, roughly as sketched below.
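A hedged sketch of how such a heatmap can be obtained from a `transformers` encoder-decoder model, continuing the loading sketch above (`model`, `pixel_values`). The layer choice, patch-grid size, and the presence of a CLS token are assumptions about the model, not facts from the script.

```python
# Continues the loading sketch above (model, processor, pixel_values).
# `generate` can return cross-attentions: one tuple per generated token,
# each holding one tensor per decoder layer of shape
# (batch, heads, query_len, num_encoder_tokens).
out = model.generate(
    pixel_values,
    max_length=256,
    output_attentions=True,
    return_dict_in_generate=True,
)

step = 0                                 # first generated token
layer = out.cross_attentions[step][-1]   # last decoder layer
weights = layer.mean(dim=1)[0, -1]       # average heads: one weight per encoder token
# Assumed: a 384x384 ViT-style encoder with 16x16 patches (24x24 grid)
# plus a leading CLS token that carries no spatial information.
heatmap = weights[1:].reshape(24, 24).detach().numpy()
print(heatmap.shape)  # (24, 24); overlay on the input image to visualize
```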
+
+### `eval.py`
+This script can be used to evaluate a given model on a given dataset.
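As a rough illustration of such an evaluation, the sketch below scores prediction/reference pairs with character error rate; the metric and data layout are guesses, not the script's actual behavior.

```python
# Rough sketch of an evaluation loop in the spirit of `eval.py`: compare
# model transcriptions against ground-truth LilyPond strings using
# character error rate (CER). Metric and data layout are assumptions.
import editdistance  # third-party: pip install editdistance

def character_error_rate(prediction: str, reference: str) -> float:
    return editdistance.eval(prediction, reference) / max(len(reference), 1)

pairs = [
    ("c'4 d'4 e'2", "c'4 d'4 e'2"),  # perfect transcription: CER 0.0
    ("c'4 d'4", "c'4 d'4 e'2"),      # missing notes: CER > 0
]
scores = [character_error_rate(pred, ref) for pred, ref in pairs]
print(f"mean CER: {sum(scores) / len(scores):.3f}")
```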