License of the model files #158

Open
dlippold opened this issue Nov 12, 2023 · 3 comments


@dlippold

Where can I find the information about the license of the model files which can be loaded by ./translateLocally -d <model-name>?

Please add that information to the project page.

@XapaJIaMnu
Owner

Sorry for the late reply. The Bergamot models are licensed under CC-BY-SA, as you can see here: https://github.com/browsermt/students

@dlippold
Author

Thank you for the information.

I took a closer look and realized that translateLocally downloads model data as tar.gz files, which are listed in the file https://translatelocally.com/models.json . These tar.gz files contain a file catalog-entry.yml, and this file specifies a different license, CC-BY-SA-NC-4.0, which is more restrictive (no commercial use).
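For anyone who wants to repeat this check, here is a minimal sketch of how the catalog-entry.yml file could be read out of an already downloaded archive. It makes no assumption about the keys inside the file and simply returns its text. The function name and the archive path in the usage note are hypothetical, not part of translateLocally.

```python
import tarfile


def read_catalog_entries(archive_path):
    """Return {member_name: text} for every catalog-entry.yml in a tar.gz archive."""
    entries = {}
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            # Match on the file name only, whatever directory it sits in.
            basename = member.name.rsplit("/", 1)[-1]
            if member.isfile() and basename == "catalog-entry.yml":
                with archive.extractfile(member) as handle:
                    text = handle.read().decode("utf-8", errors="replace")
                entries[member.name] = text
    return entries
```

Calling `read_catalog_entries("some-model.tar.gz")` (hypothetical file name) and printing the returned text should show whatever license the archive itself declares.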

Then I downloaded the project https://github.com/browsermt/students , executed the script deploy.students.sh and compared the content of some tar.gz files with the files from the project. I found that some directories contain the same files (e.g. deen.student.base) but some directories contain different files (e.g. ende.student.base).

Is it correct that the files from the project https://github.com/browsermt/students are newer and therefore give slightly better results than those from the tar.gz files which translateLocally currently uses?

Is it planned to create and use updated tar.gz files, in particular with the new license?

@dlippold
Author

I investigated the differences further for the following directories:

  • ende.student.base/
  • deen.student.base/
  • encs.student.base/
  • csen.student.base/
  • enes.student.tiny11/
  • esen.student.tiny11/
  • enet.student.tiny11/
  • eten.student.tiny11/
  • enfr.student.tiny11/
  • fren.student.tiny11/
  • enpl.student.tiny11/
  • plen.student.tiny11/

Apart from the catalog-entry.yml files, which carry the different licenses, I found differences only in the following four files:

  • ende.student.base/lex.s2t.bin
  • ende.student.base/model.intgemm.alphas.bin
  • ende.student.base/vocab.deen.spm
  • enes.student.tiny11/lex.s2t.bin

This information may reduce the effort needed for the update.
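A comparison like the one above can be sketched roughly as follows, assuming both the tar.gz contents and the output of deploy.students.sh have been unpacked into two local directory trees. The function names and the directory paths in the usage note are hypothetical. Files are compared by SHA-256 digest, which is one possible method and not necessarily how the list above was produced.

```python
import hashlib
from pathlib import Path


def file_digests(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def differing_files(dir_a, dir_b):
    """Return sorted relative paths that differ or exist in only one tree."""
    a = file_digests(dir_a)
    b = file_digests(dir_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

For example, `differing_files("from-tarballs/", "from-students-repo/")` (hypothetical paths) would list every file whose content differs between the two trees, including files present on only one side.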
