Vision and Roadmap

Why?

We believe artificial intelligence is going to be the next big thing in the near future. It will bring humanity into a new era of access to, and organization of, information.

Language translation is probably one of the most complex human tasks for a machine to learn, but it is also the one with the greatest potential to make the world a single family.

With this project we want to contribute to the evolution of machine translation toward the singularity.

We want to consolidate the current state of the art into a single, easy-to-use product, evolve it, and keep it open so it can integrate the next great opportunities in machine intelligence, such as deep learning.

To achieve our goals we need better MT technology that can extract more from data, adapt to context, and deploy easily. Like every AI, it needs data, and we are working to create the tools that make all the world's translated information available to everyone.

We know the challenge is big, but the potential reward is so large that we think it is worth trying hard.

How?

We brought together many of the people who created the current state-of-the-art machine translation technology, and added great engineers and a few challengers to rethink the problem.

If you feel you can contribute, you are welcome to join.

What? - MMT Dirty Hands Todo

This document is a recap of the most important activities.

Its purpose is to keep the team aligned on vision and strategy, and to inform users of what to expect next.

Items are listed in order of priority.

Quality

Goal: with 2 billion words of training data, perform better than commercially available technology.

  • 0.14 - Incremental Training (Online Learning) Beta Released

  • 0.14 - Adding more languages and improving tokenization quality; 45 languages supported so far.

  • 0.15 - Integration with the Matecat CAT tool, able to adapt to the corrections and translation memories (TMs) used by the translator.

  • 0.16 - Releasing commercial-quality baseline engines for the top 5 languages.

Speed

Training

Goal: keep initial training below 8 hours per 1 billion words (on 36 cores), i.e. a sustained throughput of roughly 35,000 words per second. Make incremental training available.

All done :) Waiting for new goals here.

Translate

Goal: with 2 billion words of training data, stay below 400 ms for a sentence of average length (15 words).

Current latency is 1-5 s per sentence, so reaching the 400 ms target requires roughly a 2.5x to 12.5x speedup.

  • Pruning of models.
  • Better caching (a sketch follows this list).
  • Fine-tuning the maximum number of LM requests per translation, and other parameters.
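
To illustrate the "better caching" item, here is a minimal sketch of an LRU cache for translated sentences, built on Java's standard LinkedHashMap. The TranslationCache class and the decoder callback are hypothetical names for illustration only; this is not MMT's actual caching layer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Minimal sketch of an LRU cache for translated sentences.
// Hypothetical illustration only; not MMT's actual caching layer.
public class TranslationCache {

    private final Map<String, String> cache;

    public TranslationCache(final int maxEntries) {
        // accessOrder = true keeps entries ordered by most recent access,
        // so the "eldest" entry is the least recently used one.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries; // evict the LRU entry once over capacity
            }
        };
    }

    // Returns the cached translation, invoking the decoder only on a cache miss.
    public synchronized String translate(String source, Function<String, String> decoder) {
        return cache.computeIfAbsent(source, decoder);
    }
}
```

Wired in front of the decoder, a call like cache.translate(sentence, s -> decoder.decode(s)) (where decoder is a hypothetical decoder object) returns repeated sentences from memory instead of triggering a full decode, which matters in CAT workflows where near-duplicate segments are common.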

Product-Market Fit

Research

All done :) Need input from Marcello and Philipp.

Industry

All done :) Need input from Marco and Alessandro.

Donate

MMT is free and open source, and it welcomes contributions and donations.

MMT is currently sponsored by its founding members (Translated, FBK, UEDIN and TAUS) and the European Commission.

For donations, customizations, and words of encouragement, contact [email protected]