chore: add citation file
See if it works
amitkparekh authored Jul 8, 2024
1 parent 235c77a commit 3b8ab7f
Showing 1 changed file with 32 additions and 0 deletions.
32 changes: 32 additions & 0 deletions CITATION.cff
@@ -0,0 +1,32 @@
cff-version: 1.2.0
title: &title "Investigating the Role of Instruction Variety and Task Difficulty in Robotic Manipulation Tasks"
message: "If you use this software, please cite the software and the paper"
authors: &authors
  - given-names: Amit
    family-names: Parekh
    email: [email protected]
    affiliation: Heriot-Watt University
  - given-names: Nikolas
    family-names: Vitsakis
    email: [email protected]
    affiliation: Heriot-Watt University
  - given-names: Alessandro
    family-names: Suglia
    email: [email protected]
    affiliation: Heriot-Watt University
  - given-names: Ioannis
    family-names: Konstas
    email: [email protected]
    affiliation: Heriot-Watt University
date-released: 2024-07-04
references:
  - type: article
    authors: *authors
    title: *title
    year: 2024
    journal: arXiv
    url: https://arxiv.org/abs/2407.03967

abstract: >-
  Evaluating the generalisation capabilities of multimodal models based solely on their performance on out-of-distribution data fails to capture their true robustness. This work introduces a comprehensive evaluation framework that systematically examines the role of instructions and inputs in the generalisation abilities of such models, considering architectural design, input perturbations across language and vision modalities, and increased task complexity. The proposed framework uncovers the resilience of multimodal models to extreme instruction perturbations and their vulnerability to observational changes, raising concerns about overfitting to spurious correlations. By employing this evaluation framework on current Transformer-based multimodal models for robotic manipulation tasks, we uncover limitations and suggest future advancements should focus on architectural and training innovations that better integrate multimodal inputs, enhancing a model's generalisation prowess by prioritising sensitivity to input content over incidental correlations.
license: MIT