Papers List

This repository lists papers authored by Focoos AI.

2024

| Title | Venue | Code |
| --- | --- | --- |
| 📜 PEM: Prototype-based Efficient MaskFormer for Image Segmentation<br>Niccolò Cavagnero, Gabriele Rosi, Claudia Cuttano, Francesca Pistilli, Marco Ciccone, Giuseppe Averta, Fabio Cermelli<br><br>Prototype-based Efficient MaskFormer (PEM) is a transformer-based architecture for image segmentation that improves efficiency without sacrificing performance. It uses prototype-based cross-attention and a multi-scale feature pyramid network to reduce computation. PEM outperforms task-specific models while being more computationally efficient. | CVPR 2024 | 🌐 Project Page, GitHub repository |
| 📜 The Revenge of BiSeNet: Efficient Multi-Task Image Segmentation<br>Gabriele Rosi, Claudia Cuttano, Niccolò Cavagnero, Giuseppe Averta, Fabio Cermelli<br><br>BiSeNetFormer is a multi-task image segmentation architecture designed for efficiency and accuracy, supporting semantic and panoptic segmentation. It combines two-stream architectures with a transformer-based segmentation head, achieving high inference speeds and competitive accuracy on datasets like Cityscapes and ADE20K. | CVPR 2024 (Workshop) | - |
| 📜 What does CLIP know about peeling a banana?<br>Claudia Cuttano, Gabriele Rosi, Gabriele Trivigno, Giuseppe Averta<br><br>AffordanceCLIP leverages pre-trained Vision-Language models like CLIP to improve affordance segmentation for robots, bypassing the need for costly annotations or predefined actions. It achieves competitive zero-shot performance, works with any action prompt, and requires minimal additional training, enabling scalable, flexible models. | CVPR 2024 (Workshop) | - |
| 📜 SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation<br>Claudia Cuttano, Gabriele Trivigno, Gabriele Rosi, Carlo Masone, Giuseppe Averta<br><br>SAMWISE is a Referring Video Object Segmentation (RVOS) method that overcomes limitations of previous models by enabling streaming processing while retaining context. Built on the Segment-Anything 2 (SAM2) model, it integrates natural language understanding and temporal modeling, achieving state-of-the-art performance with minimal overhead. | 📝 Under submission | GitHub repository |

Feel free to explore the papers and reach out for collaborations or inquiries!