Merge pull request #305 from UHHRobotics22-23/Flova-patch-1
Update vision README.md
ateRstones authored Oct 17, 2023
2 parents 8c61a67 + 80782fb commit e494af2
Showing 2 changed files with 12 additions and 3 deletions.
7 changes: 5 additions & 2 deletions marimbabot_vision/README.md
@@ -1,5 +1,8 @@
# TAMS Master Project 2022/2023 - Vision

This part of the repository stores all the code to run and train the transformer-based end-to-end vision pipeline.
The latest trained model is available on [Huggingface](https://huggingface.co/Flova/omr_transformer), which also provides a simple browser-based demo.

## Scripts
For more information on the usage of the scripts, please refer to the [README](../marimbabot_vision/scripts/README.md).

@@ -10,8 +13,8 @@ This will create a dataset of N samples. The dataset is saved in the `data`, `da
## Src
### [vision_node.py](src/vision_node.py)

-This ROS node is responsible for processing images from a camera source and recognizing notes in the images using a pre-trained model. It converts the image data into a textual representation of recognized musical notes and publishes them as ROS messages.
+This ROS node is responsible for processing images from a camera source and recognizing notes in the images using a pre-trained model. It converts the image data into a textual LilyPond representation and publishes it as ROS messages.
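The textual LilyPond representation mentioned above can be sketched with a small helper. This is a hypothetical illustration, not the node's actual code; the function name and token format are assumptions:

```python
# Hypothetical sketch: join recognized note tokens into a LilyPond
# snippet, the kind of string such a vision node could publish as a
# ROS String message.
def notes_to_lilypond(notes):
    """Wrap space-separated LilyPond note tokens in a relative block."""
    body = " ".join(notes)
    return "\\relative c' { " + body + " }"

print(notes_to_lilypond(["c4", "d4", "e4", "f4"]))
```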


### [visualization_node.py](src/visualization_node.py)
This ROS node receives recognized notes from the vision_node and generates visual representations of the musical notation. It uses the LilyPond library to create musical staff notation and publishes the resulting images as ROS messages for visualization.
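Rendering staff notation from a LilyPond string typically means writing the source to a `.ly` file and invoking the `lilypond` CLI. The helper below only builds the command list; the file layout and output base name are assumptions, not the node's actual implementation:

```python
# Hypothetical sketch: build the command one could pass to
# subprocess.run() to render a LilyPond source file to a PNG image.
# Requires the `lilypond` binary to be installed to actually run it.
def lilypond_cmd(ly_path, out_base):
    """Return the lilypond invocation that renders ly_path to out_base.png."""
    return ["lilypond", "--png", "-o", out_base, ly_path]

print(lilypond_cmd("score.ly", "score"))
```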
8 changes: 7 additions & 1 deletion marimbabot_vision/scripts/README.md
@@ -51,4 +51,10 @@ Trains a model on a set of given `train_data_paths`.
Trains the tokenizer on all text files defined by a glob expression.

### `detect.py`
-This script is used for live detection of notes. A trained model can be used to initialize. The current model is stored at HuggingFace and its path/name is set by the `MODEL_PATH` parameter inside `config/vision_node.yaml`. The detected notes are shown in a window.
+This script is used for detection of notes in a given image file. The current model stored on HuggingFace is downloaded and used by default. The detected notes are printed to the terminal.

### `attention_viz.py`
This script is similar to `detect.py`, but additionally shows the cross-attention to the image encoder as a heatmap.

### `eval.py`
This script can be used to evaluate a given model on a given dataset.
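Evaluating an OMR model usually boils down to comparing predicted token sequences against ground truth. The metric below is a hypothetical illustration (eval.py may compute something different, e.g. an edit-distance-based score):

```python
# Hypothetical sketch: positional token accuracy between a predicted
# and a reference LilyPond string, normalized by the longer sequence.
def token_accuracy(pred, target):
    """Return the share of matching tokens between pred and target."""
    p, t = pred.split(), target.split()
    if not p and not t:
        return 1.0
    matches = sum(a == b for a, b in zip(p, t))
    return matches / max(len(p), len(t))

print(token_accuracy("c4 d4 e4", "c4 d4 f4"))
```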
