MT Quality Estimation

COMET models are trained to predict quality scores for translations.

Getting started

Clone this repository:

gh repo clone capstanlqc/comet-api

Create a virtual environment and activate it:

cd /path/to/comet-api
python -m venv venv && source venv/bin/activate

Install dependencies:

pip install -r requirements.txt

Copy .env.example to .env:

cp .env.example .env

Update the environment variables you need to use in .env.

Download required models, e.g.

python code/get_models.py --model Unbabel/wmt22-cometkiwi-da

Run the API (for testing and development):

fastapi dev code/main.py

Test

The mandatory model parameter accepts values such as provider/model. The optional mode parameter accepts either mock (default value, if missing) or any other value (e.g. dev or prod, not yet defined).

The following call will get mock random scores (same as without mode parameter):

curl --location --request GET 'http://127.0.0.1:8000/api/scores?model=Unbabel/wmt22-cometkiwi-da&mode=mock' \
--header 'Content-Type: application/json' \
--data '[
    {"src": "How to Demonstrate Your Strategic Thinking Skills", "mt": "Cómo demostrar su capacidad de pensamiento estratégico" },{ "src": "Why is Accuracy important in the workplace?", "mt": "¿Por qué es importante la precisión en el trabajo" }, { "src": "When faced with a large amount of analysis ask for support setting up a team to approach the issue in different ways.", "mt": "Cuando se enfrente a una gran cantidad de análisis, pida ayuda para crear un equipo que aborde la cuestión de diferentes maneras." }
]'

The following call

curl --location --request GET 'http://127.0.0.1:8000/api/scores?model=Unbabel/wmt22-cometkiwi-da&mode=dev' \
--header 'Content-Type: application/json' \
--data '[
    {"src": "How to Demonstrate Your Strategic Thinking Skills", "mt": "Cómo demostrar su capacidad de pensamiento estratégico" },{ "src": "Why is Accuracy important in the workplace?", "mt": "¿Por qué es importante la precisión en el trabajo" }, { "src": "When faced with a large amount of analysis ask for support setting up a team to approach the issue in different ways.", "mt": "Cuando se enfrente a una gran cantidad de análisis, pida ayuda para crear un equipo que aborde la cuestión de diferentes maneras." }
]'

will have the following response with real scores:

{
    "data": [
        {
            "src": "How to Demonstrate Your Strategic Thinking Skills",
            "mt": "Cómo demostrar su capacidad de pensamiento estratégico",
            "score": 0.5087478756904602
        },
        {
            "src": "Why is Accuracy important in the workplace?",
            "mt": "¿Por qué es importante la precisión en el trabajo",
            "score": 0.6997313499450684
        },
        {
            "src": "When faced with a large amount of analysis ask for support setting up a team to approach the issue in different ways.",
            "mt": "Cuando se enfrente a una gran cantidad de análisis, pida ayuda para crear un equipo que aborde la cuestión de diferentes maneras.",
            "score": 0.41031646728515625
        }
    ],
    "model_output_system_score": 0.5395985643068949
}

References

https://github.com/Unbabel/COMET https://huggingface.co/docs/transformers/installation#offline-mode https://huggingface.co/docs/hub/models-adding-libraries#download-files-from-the-hub

Caveats / Warnings / To spruce up

auth?

Lightning automatically upgraded your loaded checkpoint from v1.8.2 to v2.4.0. To apply the upgrade to your files permanently, run python -m pytorch_lightning.utilities.upgrade_checkpoint ../.cache/huggingface/hub/models--Unbabel--wmt22-cometkiwi-da/snapshots/b3a8aea5a5fc22db68a554b92b3d96eb6ea75cc9/checkpoints/model.ckpt

python -m pytorch_lightning.utilities.upgrade_checkpoint ../.cache/huggingface/hub/models--Unbabel--wmt22-cometkiwi-da/snapshots/b3a8aea5a5fc22db68a554b92b3d96eb6ea75cc9/checkpoints/model.ckpt

/run/media/souto/257-FLASH/python-comet/venv/lib/python3.12/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: clean_up_tokenization_spaces was not set. It will be set to True by default. This behavior will be depracted in transformers v4.45, and will be then set to False by default. For more details check this issue: huggingface/transformers#31884 warnings.warn( Encoder model frozen. /run/media/souto/257-FLASH/python-comet/venv/lib/python3.12/site-packages/pytorch_lightning/core/saving.py:195: Found keys that are not in the model state dict but in the checkpoint: ['encoder.model.embeddings.position_ids']

/run/media/souto/257-FLASH/python-comet/venv/lib/python3.12/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:419: Consider setting persistent_workers=True in 'predict_dataloader' to speed up the dataloader worker initialization. Predicting DataLoader 0: 100%|███████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.63it/s]

todo

Add logging
Add auth
Fix pytorch issues
Deploy in GPU-powered server

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
code		code
config		config
test		test
.env-example		.env-example
.gitignore		.gitignore
.python-version		.python-version
README.md		README.md
mockup.py		mockup.py
mockup_auth.py		mockup_auth.py
requirements.txt		requirements.txt
test.py		test.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MT Quality Estimation

Getting started

Test

References

Caveats / Warnings / To spruce up

todo

About

Releases

Packages

Languages

capstanlqc/comet-api

Folders and files

Latest commit

History

Repository files navigation

MT Quality Estimation

Getting started

Test

References

Caveats / Warnings / To spruce up

todo

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages