# PentominoDescription

This repository was part of the CogSys Master's project module (PM2: Project in Machine Learning; Multimodal Dialogue in Human-Robot Interaction), which I took in the Summer Semester 2022 at the University of Potsdam, Germany.

The repository contains only the sub-tasks I was responsible for: 1) preprocessing textual data (TakeCV and Survey), 2.2) building verbal classifiers that match a textual description to a pentomino piece, and 2.3) combining the results from the vision model and the language model. The resulting accuracy scores of two types of classifiers (Naive Bayes and LSTM) are reported.
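As a toy illustration of what such a verbal classifier does, here is a minimal Naive Bayes sketch in Python (using scikit-learn) that maps a free-text description to a pentomino piece label. The example descriptions and labels below are made up; the project's classifiers are trained on the preprocessed TakeCV and Survey data.

```python
# Minimal sketch of a verbal classifier: free-text description -> piece label.
# The training examples are hypothetical; the real models use the corpora above.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

descriptions = [
    "the long straight piece",
    "the cross-shaped piece in the middle",
    "the piece that looks like a capital t",
    "the zigzag piece on the left",
]
labels = ["I", "X", "T", "Z"]  # pentomino pieces are conventionally named after letters

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(descriptions, labels)

print(clf.predict(["take the long straight one"]))  # e.g. ['I']
```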

## Project description

I am part of Group D: Language and Vision. Our goal is to build a multimodal model that correctly detects the pentomino piece a human participant describes verbally in a real-time visual scene and sends the pick-up coordinates to the robot arm. The project overview is as follows:

1. Corpora
   - TakeCV
   - Survey
   - Augmented data
2. Experiments
   - Vision
     - Fast R-CNN
     - YOLO
     - Grabbing Point
   - Language
     - Naive Bayes
     - CNNs
     - LSTMs
   - Combining LV Models (see the fusion sketch below)
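The README does not specify how the LV combination works internally, so the following is only a minimal late-fusion sketch, under the assumption that both models score the same set of candidate pieces in the scene; the function name `combine_lv` and the scores are hypothetical.

```python
import numpy as np

def combine_lv(p_vision: np.ndarray, p_language: np.ndarray) -> int:
    """Pick the piece with the highest joint score (late fusion)."""
    joint = p_vision * p_language  # elementwise product of the two distributions
    joint /= joint.sum()           # renormalize to a proper distribution
    return int(np.argmax(joint))

# Hypothetical scores over three candidate pieces in one scene.
p_vis = np.array([0.2, 0.5, 0.3])   # e.g. from the vision model (YOLO)
p_lang = np.array([0.6, 0.3, 0.1])  # e.g. from the verbal classifier
print(combine_lv(p_vis, p_lang))    # -> 1 (the second candidate wins)
```

The winning index would then be mapped to that piece's pick-up coordinates for the robot arm.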

The project consists of four notebooks; the data itself is not uploaded to this repository.