Replace bespoke translation toolset with more standards-based options #4779

Closed
hsluoyz opened this issue May 18, 2019 · 112 comments
@hsluoyz

hsluoyz commented May 18, 2019

Currently, all translations are done in XML files, as mentioned in #4029 (comment). This is very inefficient for translators, both for translating and for syncing a few items across dozens of XML files.

Is there any chance we could use a more advanced online translation platform like https://crowdin.com/? With Crowdin, translators only need to translate in a web browser and don't need to track which strings haven't been translated yet. Translations are deployed automatically as a new git commit. Could we consider it?

@hsluoyz hsluoyz changed the title Missing a lot of translations in Chinese Missing a lot of translations in Chinese, can we use https://crowdin.com/? May 18, 2019
@asmecher
Member

@hsluoyz, we have been hoping to replace our own translation tools with something else for some time, but unfortunately have not been able to make much progress on it. I was not aware of Crowdin, which does appear to have a free open source/academic plan:

Can I get an Open Source or a free Academic License?

Yes. If you want to use Crowdin for an Open Source project, sign up for a free account, set up your project and send us a request. Apply for an Academic License if your project has educational purposes. Each granted license will include an unlimited number of projects, strings, and members.

We're hesitant to use freemium services for necessary elements of the software, but some of the other translation options we've been considering are freemium as well.

(Tagging @mtub and @marcbria)

@marcbria
Collaborator

marcbria commented May 18, 2019

As you know, I'm not a fan of proprietary software, so my vote will always be no... especially when we have free alternatives (free as in freedom; I'm OK if it's a paid service) that cover all the requirements (GitHub/GitLab integration, import/export, translation memories, glossaries, multiple formats...). If anybody is interested, we made a comparison of the requirements.

In Heidelberg we started a Weblate instance, which is still up and running.
I can keep it in production for PKP if you need hosting... but we need somebody with time to set it all up correctly. OJS native files need to be parsed and converted to XLIFF, and then mapped in the tool.

IMHO, it's a huge task up front, but it will make translation a piece of cake and make it easier for non-technical people to join the translation team.

If we are not going to host our own tools (which IMHO is a mistake ;-)), SaaS could be an option, but...
a) it needs to be based on free software...
b) and we need to be completely sure we can move everything out of the tool if, in the future, we don't like the terms of service.

Weblate's SaaS meets both conditions, while Crowdin doesn't.

You all know what happens when we trust proprietary tools that start out "free of charge" to attract enough people and then move to a restrictive business model.

@hsluoyz
Author

hsluoyz commented May 19, 2019

Hi @asmecher,

I was not aware of Crowdin, which does appear to have a free open source/academic plan:

In fact, I'm using Crowdin for the docs site of my own project: https://crowdin.com/project/xxx (the site is here: https://xxx.org/). You can see an "English" button at the top for switching languages. Many popular projects use it (see here), including Minecraft, Khan Academy, and GitLab. Other people have also recommended it to me, and it currently seems to be the most popular online translation platform (correct me if I'm wrong).

Hi @marcbria,

As you know, I'm not a fan of proprietary software, so my vote will always be no... especially when we have free alternatives (free as in freedom; I'm OK if it's a paid service) that cover all the requirements (GitHub/GitLab integration, import/export, translation memories, glossaries, multiple formats...). If anybody is interested, we made a comparison of the requirements.

I think Crowdin covers these requirements (I haven't checked GitLab integration, as I only use GitHub).

a) it needs to be based on free software...

Using a self-hosted translation tool does give us more control. But it also takes a lot more effort. The main focus of this project is academic journal manuscript software, not translation software. We don't need to build or host all services on our own. GitHub is actually non-free software, but we still use it for free and open-source projects, right? GitLab is still not as popular as GitHub.

We can let professionals do their job. This project (OJS) is already short-handed, as many translations are incomplete (at least in Chinese, from what I checked). We don't have the capacity to build and host a translation system.

b) and we need to be completely sure we can move everything out of the tool if, in the future, we don't like the terms of service.

I understand your concern, but as I said above, much larger and more popular projects like Minecraft, Khan Academy, and GitLab already use Crowdin. We wouldn't be the first to be hit if the sky fell. Even if Crowdin shut down one day, we would still have all the translation files (which would be stored in our repository). That's no worse than the current situation. We have nothing to lose.

@marcbria
Collaborator

marcbria commented May 19, 2019

Using a self-hosted translation tool does give us more control.

Sounds like a good idea to me.

But it also takes a lot more effort.

Not necessarily. Weblate offers free hosting for free software projects.
Otherwise, I can offer my servers for free.

The main focus of this project is academic journal manuscript software, not translation software.

Thanks for sharing your thoughts about the goal of OJS and PKP, but I think you are missing the whole picture.
From my perspective, the PKP project is not only about tools... it's mainly about "Public Knowledge", and, as I said, when we have free alternatives, I have no doubt they are the way to go.
We need to ensure we don't depend on proprietary initiatives and supporting free software is also a way to empower the whole community.

We don't need to build or host all services on our own.

Of course we don't, but we can if we like.
In the end, moving from our own translation tool to a community-built one is also a way to optimize our dev resources.

GitHub is actually non-free software, but we still use it for free and open-source projects, right?

And I think that's a mistake, but don't get me started... ;-)

@hsluoyz
Author

hsluoyz commented May 20, 2019

I'm also OK with Weblate. I didn't know about it before, but after some googling it looks excellent. I hope the platform will be ready soon so we can start translating.

@marcbria
Collaborator

marcbria commented May 20, 2019

I'm also OK with Weblate. I didn't know about it before, but after some googling it looks excellent.

Great. 👍

I hope the platform will be ready soon so we can start translating.

Me too, but we're short of hands. :-(

No matter which platform we use... in every case we need to convert our native XML to something standard (XLIFF sounds like a good plan), then set up the tool to define translation units and configure the export to git (or whatever), and after that change OJS (and every other OxS tool) to read XLIFF instead of our native XML... and right now I have my hands full.

If somebody is interested in doing the job, I'm pretty sure he/she will make Marco very happy. ;-)

Until then, I'm sorry, but editing the XML files or using the native translation tool are the only ways the community can contribute translations.

@mtub, sorry to bother you with this, but... you're the boss? ;-P
Is Weblate fine, or do you prefer something else?
Something to address at the Pittsburgh or Barcelona sprint this year?

asmecher added a commit to asmecher/ojs that referenced this issue May 28, 2019
@asmecher
Member

Here are two draft PRs that alter OJS and pkp-lib to use XLIFF sources instead of the current PKP-specific XML files:

To use them...

  1. Pull in the above modifications to your installation
  2. Go into lib/pkp and update your composer dependencies (composer update)
  3. Convert your locale files from PKP XML into XLIFF:
      for name in $( (find locale/*/*.xml && find lib/pkp/locale/*/*.xml) | sed -e 's/xml$//' |
              grep -Ev 'bic21|countries|currencies|languages|emailTemplates' ); do
          php lib/pkp/tools/xmlToXliff.php "${name}xml" "${name}xliff"
      done

(This is equivalent to running php lib/pkp/tools/xmlToXliff.php path/to/source-locale-file.xml path/to/target-xliff-file.xliff for every locale file present, except plugin locale files and the excluded bic21, countries, currencies, languages, and emailTemplates files.)

  4. Flush your file cache: rm -f cache/*.php

This is a work in progress, but should allow experimentation with XLIFF-based translation tools to see how well they work with this toolset.
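To make the conversion step more concrete, here is a minimal Python sketch of a PKP-XML-to-XLIFF conversion. It is not a port of `xmlToXliff.php`: it assumes each locale file is a `<locale>` root holding `<message key="...">text</message>` elements, and it reproduces the draft layout shown later in this thread, with the symbolic key in `<source>` and the localized text in `<target>`. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:2.0"

def read_pkp_locale(xml_text):
    """Return {key: text} from a PKP-style locale file.

    Assumes the layout <locale ...><message key="...">text</message></locale>.
    """
    root = ET.fromstring(xml_text)
    return {msg.get("key"): (msg.text or "") for msg in root.iter("message")}

def pkp_xml_to_xliff(xml_text, src_lang, trg_lang):
    """Emit XLIFF 2.0 in the draft layout: key in <source>, localized text in <target>."""
    xliff = ET.Element("xliff", {"xmlns": XLIFF_NS, "version": "2.0",
                                 "srcLang": src_lang, "trgLang": trg_lang})
    file_el = ET.SubElement(xliff, "file", {"id": "f1"})
    for n, (key, text) in enumerate(read_pkp_locale(xml_text).items(), start=1):
        unit = ET.SubElement(file_el, "unit", {"id": f"u{n}"})
        segment = ET.SubElement(unit, "segment")
        ET.SubElement(segment, "source").text = key   # symbolic key, not English text
        ET.SubElement(segment, "target").text = text  # the localized string
    return ET.tostring(xliff, encoding="unicode")

sample = """<locale name="fr_FR">
  <message key="author.submit.submissionCitations">Fournir une liste structurée de références pour les travaux cités dans cette soumission.</message>
</locale>"""
print(pkp_xml_to_xliff(sample, "en-US", "fr-FR"))
```

Note that the English text appears nowhere in this output; only the key and the localized string do.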

@asmecher asmecher changed the title Missing a lot of translations in Chinese, can we use https://crowdin.com/? Replace bespoke translation toolset with more standards-based options May 28, 2019
@asmecher
Member

(@mbria, @marco: #4779 (comment))

asmecher added a commit to asmecher/pkp-lib that referenced this issue May 28, 2019
@marcbria
Collaborator

marcbria commented May 29, 2019

@asmecher that looks great.

If I read this correctly, with your changes OJS is now ready to read XLIFF and, for the same price, we also have a PHP helper script to convert the local XML files to XLIFF, right?

You are a really fast coder!! ;-) Thanks a lot!!

BTW, Travis is complaining about something here: pkp/ojs#2413
Do we need to worry?

@mtub, with Alec's changes, we now only need somebody to configure Weblate correctly and do some testing to see if we can integrate Weblate with GitLab.

I won't have time for this until at least the end of next month. :-(
Is there anybody at PKP who can do the job, or some money to hire someone?

Cheers,
m.

@asmecher
Member

Hi @marcbria,

BTW, Travis is complaining about something here: pkp/ojs#2413
Do we need to worry?

No, don't worry -- I didn't include the converted xliff files with the commits, so the tests will break because of untranslated locale keys. (It won't make sense to commit/maintain converted files until we're ready to take the plunge.)

I think the next step would be to get confirmation from someone who has worked with xliff files that the automatically-converted ones aren't totally crazy. I've attached one here for reference:
submission.xliff.txt

asmecher added a commit to asmecher/ojs that referenced this issue Jun 1, 2019
@marcbria
Collaborator

marcbria commented Jun 4, 2019

I forwarded your question to our CAT expert, and I hope he will answer in a couple of days.
Thanks a lot for your work Alec.

@asmecher
Member

asmecher commented Jun 5, 2019

@marcbria, a few questions I'd want them to consider:

  1. Symbolic vs. English-language keys

We use symbolic locale keys in the code (e.g. navigation.journalHelp), then all locales, including English, are specified in locale files. This differs a bit from the Gettext standard in that usually English-language text would be embedded in the code, then the locale files would provide translations from English into other languages.

As a result, the XLIFF will have translations like this (for French):

      <segment>
        <source>author.submit.submissionCitations</source>
        <target>Fournir une liste structurée de références pour les travaux cités dans cette soumission.</target>
      </segment>

...instead of...

      <segment>
        <source>Provide a formatted list of references for works cited in this submission. Please separate individual references with a blank line.</source>
        <target>Fournir une liste structurée de références pour les travaux cités dans cette soumission.</target>
      </segment>

Will this work e.g. with Weblate?

  2. The distribution of locale files into various directories and repositories

The translations are split between a number of Git repositories:

Within the Application and pkp-lib repositories, there are several locale files (for example, in pkp-lib), divided roughly into topics. (I'm open to change on this, if it's not a good fit for standard practices.)

Tools like Pootle and Weblate appear to support Projects and Components. Will that mapping match well against our use of multiple repositories and sometimes multiple locale files within them?
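One way to think about the Projects/Components question: Weblate components are configured with a file mask in which `*` stands for the language code, so a topic file that exists once per locale directory collapses into a single component. The sketch below groups locale file paths that way. The `(repository, topic)` naming, the `locale/<LOCALE>/<topic>.<ext>` layout, and the example file list are assumptions for illustration, not a tested Weblate configuration.

```python
from pathlib import PurePosixPath

def weblate_components(files):
    """Group (repository, path) pairs into candidate Weblate components.

    Assumes paths look like locale/<LOCALE>/<topic>.<ext>; returns
    {(repository, topic): file_mask}, with '*' in place of the locale code.
    """
    components = {}
    for repo, raw_path in files:
        path = PurePosixPath(raw_path)
        base_dir = path.parent.parent           # "locale", above the locale directory
        mask = str(base_dir / "*" / path.name)  # e.g. "locale/*/submission.xml"
        components.setdefault((repo, path.stem), mask)
    return components

# Hypothetical file list spanning two repositories.
files = [
    ("pkp-lib", "locale/en_US/submission.xml"),
    ("pkp-lib", "locale/fr_FR/submission.xml"),
    ("pkp-lib", "locale/en_US/common.xml"),
    ("ojs", "locale/en_US/editor.xml"),
]
for (repo, topic), mask in sorted(weblate_components(files).items()):
    print(f"{repo:8} {topic:12} {mask}")
```

Under this mapping, each repository could become a Weblate project, with one component per topic file, and the English and French copies of `submission.xml` share a single component.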

@marcbria
Collaborator

marcbria commented Jun 7, 2019

Hi @asmecher

I've been out of the office for a couple of days and missed your last comment.
I will read it all in depth next Tuesday, but let me pass along some comments from Adrià (the CAT expert).

He needs more time, but at first sight he very much agrees with you on this point:

"We use symbolic locale keys in the code (e.g. navigation.journalHelp), then all locales, including English, are specified in locale files. This differs a bit from the Gettext standard in that usually English-language text would be embedded in the code, then the locale files would provide translations from English into other languages."

And he adds:

"Of course, if the XLIFF file does not contain the original segments, the translation programs will not correctly recognize the file structure and the translators will not be able to translate.

I understand that the problem stems from the conversion process to XLIFF. If I remember correctly, when creating XLIFF you should ask the converter to leave the targets blank. If you like, send me the original file (the one behind submission.xliff) and I'll try to take a look."
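For comparison, here is a sketch of the layout Adrià describes: the English string goes in `<source>`, the translation in `<target>`, and the target is left out (one way of leaving it blank) when a string has not been translated yet, so a CAT tool sees it as pending. The symbolic key moves into the unit's `id` attribute so strings can still be looked up by key. This is a hypothetical Python helper, not what `xmlToXliff.php` does; it assumes the keys are valid XML name tokens, which dotted keys like `navigation.journalHelp` are.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:2.0"

def bilingual_xliff(english, translated, trg_lang):
    """Build XLIFF 2.0 with English in <source> and the translation in <target>.

    english / translated are {key: text} dicts (e.g. parsed from en_US and fr_FR files).
    Keys missing from `translated` get no <target>, marking them as untranslated.
    """
    xliff = ET.Element("xliff", {"xmlns": XLIFF_NS, "version": "2.0",
                                 "srcLang": "en-US", "trgLang": trg_lang})
    file_el = ET.SubElement(xliff, "file", {"id": "f1"})
    for key, source_text in english.items():
        # The symbolic key becomes the unit id instead of the segment source.
        unit = ET.SubElement(file_el, "unit", {"id": key})
        segment = ET.SubElement(unit, "segment")
        ET.SubElement(segment, "source").text = source_text
        if key in translated:
            ET.SubElement(segment, "target").text = translated[key]
    return ET.tostring(xliff, encoding="unicode")

english = {
    "author.submit.submissionCitations": "Provide a formatted list of references for works cited "
        "in this submission. Please separate individual references with a blank line.",
    "navigation.journalHelp": "Journal Help",  # placeholder English text for the demo
}
french = {
    "author.submit.submissionCitations": "Fournir une liste structurée de références pour les "
        "travaux cités dans cette soumission.",
}
print(bilingual_xliff(english, french, "fr-FR"))
```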

I sent him this one: https://github.com/pkp/pkp-lib/blob/master/locale/es_ES/submission.xml

I plan to meet him next week and look at Weblate together, to see if we can make PKP a proposal that I think you won't be able to refuse. (I can't say more right now.) ;-)

Cheers,
m.

@asmecher
Member

asmecher commented Jun 7, 2019

Thanks, @marcbria, sounds very intriguing! The XLIFF conversion tool was put together fairly quickly and there are surely a lot of ways of adjusting it. The submission.xliff file linked above comes from https://github.com/pkp/pkp-lib/blob/master/locale/en_US/submission.xml.

@hsluoyz
Author

hsluoyz commented Jun 16, 2019

Hi, any update on this?

@marcbria
Collaborator

Not yet, sorry. Give us one or two more weeks.

@veotax

veotax commented Jun 27, 2019

Hey guys, any progress on this issue?

@marcbria
Collaborator

marcbria commented Jul 1, 2019

Nope. Thanks for your interest, and sorry again.
We have a meeting next week that will (hopefully) shed some light on some issues we still need to resolve.

@marcbria
Collaborator

marcbria commented Jul 8, 2019

We arranged a meeting with some CAT experts for tomorrow night.
BTW, if somebody is an expert translator (good knowledge of translation formats and tools) opinions and suggestions are welcome.

@veotax

veotax commented Jul 22, 2019

Any update?

@marcbria
Collaborator

marcbria commented Jul 23, 2019

Sorry again for the silence. I'm overwhelmed and sometimes it's difficult to find time to write down what happened.

Long story short:

  • A colleague who is a chair in the OASIS XLIFF consortium expressed doubts about XLIFF being the best format for native OJS files and recommends PO instead.
  • Another colleague (also a CAT expert) wants to help us with the format decision (we'd still like to do more in-depth research) and also with the Weblate configuration.
  • UAB offers its resources to host the PKP translation server for free.
  • If PKP is OK with waiting until then, we will start working on it after the summer vacation (with luck, finishing at the Barcelona sprint in November).
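For context on the PO suggestion: gettext-based tooling conventionally uses the English text itself as the `msgid`, so a missing translation falls back to readable English. With symbolic keys as `msgid`s, the same fallback surfaces the raw key instead. The Python sketch below illustrates the difference with the standard `gettext` module; `DictTranslations` is a hypothetical in-memory stand-in for a compiled `.mo` catalogue, and "Journal Help" is placeholder text.

```python
import gettext

class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalogue standing in for a compiled .mo file."""

    def __init__(self, catalog):
        super().__init__()
        self._catalog = dict(catalog)

    def gettext(self, message):
        # Same rule as gettext: an unknown msgid is returned unchanged.
        return self._catalog.get(message, message)

# Conventional gettext: the English source text is the msgid.
english_msgids = DictTranslations({
    "Provide a formatted list of references for works cited in this submission.":
        "Fournir une liste structurée de références pour les travaux cités dans cette soumission.",
})
# Symbolic-key style: the key is the msgid; this French catalogue lacks the entry.
symbolic_keys = DictTranslations({})

print(english_msgids.gettext("Journal Help"))           # untranslated: readable English
print(symbolic_keys.gettext("navigation.journalHelp"))  # untranslated: raw key
```

It also shows why, with symbolic keys, the English locale file has to be complete: English is just another translation of the keys.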

@asmecher and @mtub, what do you think about discussing this at the next technical meeting?
Or do you prefer a different venue?

@ctgraham
Collaborator

ctgraham commented Jul 29, 2019 via email

asmecher added a commit to pkp/texture that referenced this issue Nov 28, 2019
asmecher added a commit to asmecher/pkp-lib that referenced this issue Dec 3, 2019
asmecher added a commit to asmecher/ojs that referenced this issue Dec 3, 2019
@asmecher
Member

asmecher commented Dec 3, 2019

Converting email templates to the PO format!

PRs:

This preserves the old XML format (locale/en_US/emailTemplates.xml), but replaces the (localized) contents with {translate ...} calls, e.g.:

        <email_text key="NOTIFICATION">
                <subject>{translate key="emails.notification.subject"}</subject>
                <body>{translate key="emails.notification.body"}</body>
                <description>{translate key="emails.notification.description"}</description>
        </email_text>

Then the translations themselves come from a new PO file, e.g. locale/en_US/emails.po for English.

There's a new conversion tool to help with this in lib/pkp/tools/xmlEmailsToPo.php. It generates the new PO file and changes the old files over to {translate ...} calls.
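As a rough illustration of what such a conversion involves, here is a Python sketch that rewrites each `<email_text>` entry to `{translate ...}` calls and collects the original strings as PO entries. It is not `xmlEmailsToPo.php`. The key scheme `emails.<key>.<field>` follows the single NOTIFICATION example above; converting multi-word keys such as `SUBMISSION_ACK` to camelCase, the `<email_texts>` root element, and omitting the PO header entry are all assumptions.

```python
import xml.etree.ElementTree as ET

FIELDS = ("subject", "body", "description")

def key_to_camel(key):
    """NOTIFICATION -> notification; SUBMISSION_ACK -> submissionAck (assumed scheme)."""
    first, *rest = key.lower().split("_")
    return first + "".join(word.capitalize() for word in rest)

def po_quote(text):
    """Quote a string for a PO file, escaping backslashes, quotes, tabs, and newlines."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\t", "\\t").replace("\n", "\\n"))
    return f'"{escaped}"'

def split_email_templates(xml_text):
    """Return (rewritten_xml, po_text) for an emailTemplates-style XML document."""
    root = ET.fromstring(xml_text)
    entries = []
    for email in root.iter("email_text"):
        prefix = "emails." + key_to_camel(email.get("key"))
        for field in FIELDS:
            element = email.find(field)
            if element is None:
                continue
            msgid = f"{prefix}.{field}"
            entries.append(f"msgid {po_quote(msgid)}\nmsgstr {po_quote(element.text or '')}\n")
            # Replace the localized text with a call that defers to the PO file.
            element.text = f'{{translate key="{msgid}"}}'
    return ET.tostring(root, encoding="unicode"), "\n".join(entries)

source = """<email_texts locale="en_US">
  <email_text key="NOTIFICATION">
    <subject>New notification</subject>
    <body>Hello,
you have a new notification.</body>
    <description>Sent to users when a notification is generated.</description>
  </email_text>
</email_texts>"""
xml_out, po_out = split_email_templates(source)
print(xml_out)
print(po_out)
```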

@marcbria, could you take a quick look? Does this seem like a workable approach? If so, I can merge and set up a new "Emails" component in Weblate.

@marcbria
Collaborator

marcbria commented Dec 3, 2019

@asmecher I'm missing something important here.

Does it mean that we will need to convert XML to PO (1), load it into Weblate (2), do the translation (3), export it out of Weblate (4), and finally convert from PO back to XML (5) and push to GitHub (6)?

Why not move from XML to PO and work with a single format?
That way Weblate would be able to pull and push without any conversion work, right?

Sorry in advance for the Mr. Obvious comment...
Knowing you, I'm missing the real issue here.

@asmecher
Member

asmecher commented Dec 3, 2019

@marcbria, not to worry, it's complicated :)

No, the translations can be managed inside Weblate as with everything else. There's no need to round-trip the files manually. The only reason we're keeping the old-style emailTemplates.xml is to map the three pieces of each translated template -- description, title, and body -- to the email template key. The addition of the {translate ...} calls to that file delegates the translation to the .po file.
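The delegation described above can be pictured as a lookup step. The Python sketch below parses single-line PO entries into a dictionary and substitutes each `{translate key="..."}` call with the matching `msgstr`. It is a hypothetical model of the idea, not PKP's actual implementation (which this thread doesn't show), and the minimal parser ignores plurals, `msgctxt`, and multi-line strings.

```python
import re

_PO_ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}
TRANSLATE_CALL = re.compile(r'\{translate key="([^"]+)"\}')

def _unquote(literal):
    """Turn a quoted PO string literal back into plain text, undoing escapes."""
    body = literal.strip()[1:-1]
    return re.sub(r"\\(.)", lambda m: _PO_ESCAPES.get(m.group(1), m.group(1)), body)

def parse_simple_po(po_text):
    """Map msgid -> msgstr for single-line entries (no plurals, msgctxt, or continuations)."""
    catalog, msgid = {}, None
    for line in po_text.splitlines():
        line = line.strip()
        if line.startswith("msgid "):
            msgid = _unquote(line[len("msgid "):])
        elif line.startswith("msgstr ") and msgid is not None:
            catalog[msgid] = _unquote(line[len("msgstr "):])
            msgid = None
    return catalog

def resolve_translate_calls(text, catalog):
    """Replace each {translate key="..."} with its msgstr; unknown keys are left as the bare key."""
    return TRANSLATE_CALL.sub(lambda m: catalog.get(m.group(1), m.group(1)), text)

po = '''msgid "emails.notification.subject"
msgstr "New notification"

msgid "emails.notification.body"
msgstr "Hello,\\nyou have a new notification."
'''
catalog = parse_simple_po(po)
template = '<subject>{translate key="emails.notification.subject"}</subject>'
print(resolve_translate_calls(template, catalog))  # prints <subject>New notification</subject>
```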

The reason for the conversion tool is so that we can easily migrate the translation files when we upgrade translations to 3.2.

asmecher added a commit to asmecher/pkp-lib that referenced this issue Dec 3, 2019
asmecher added a commit to asmecher/ojs that referenced this issue Dec 3, 2019
@marcbria
Collaborator

marcbria commented Dec 3, 2019

Thanks Alec. Clear now.

BTW, I'm kind of worried about how we are going to work with Weblate.

I mean, is somebody testing the translation memory features, translation workflows, glossaries, CAT tools, CLA agreements...?

I think it's important to review this before we open the platform to translators... and maybe write some documentation.

Are you planning to do this? Do you need help?

Cheers,
m.

@asmecher
Member

asmecher commented Dec 3, 2019

For the moment we can't do anything with Weblate because it still can't deliver emails, so it can't register users. I'm working to get that resolved.

I don't have experience with the translation memory features but I know those exist -- some expert feedback/testing on those would be welcome.

As for documentation, I have some provisional documentation but it needs to be co-developed with some of our translators.

I plan to manage merges of translations manually, at least for now, and intend to manage CLA agreements manually for the first while (when granting translator accounts translation privileges during the registration process). I'd like to automate this later, but baby steps :)

asmecher added a commit that referenced this issue Dec 3, 2019
#4779 Move email templates to PO structure
asmecher added a commit to pkp/ojs that referenced this issue Dec 3, 2019
@asmecher
Member

asmecher commented Dec 3, 2019

Committed the email text to PO format PRs and adapted and committed the same changes for OMP.

@ajnyga, this will require a conversion to the PPS email files too -- would you like me to create a PR for that?

@ajnyga
Collaborator

ajnyga commented Dec 4, 2019

Thanks, go ahead. There are only a couple of templates there now, and it could be that not even those are all in use right now. But go ahead with the conversion.

asmecher added a commit to asmecher/ojs that referenced this issue Dec 4, 2019
@asmecher
Member

asmecher commented Dec 4, 2019

@ajnyga, I opened a PR for that: ajnyga/ojs#7

ajnyga added a commit to ajnyga/ojs that referenced this issue Dec 5, 2019
@asmecher
Member

asmecher commented Jan 8, 2020

Done and dusted! https://translate.pkp.sfu.ca/

@asmecher asmecher closed this as completed Jan 8, 2020
@hsluoyz
Author

hsluoyz commented Jan 9, 2020

@asmecher Thanks for the amazing work! So what's the process to translate? I thought it would be:

  1. Pull pkp-lib and ojs from origin master. Or do I only pull ojs, with the latest pkp-lib (which contains the latest translations) pulled automatically via Git submodules?
  2. Check for missing or wrong strings in my OJS instance.
  3. Translate the words on https://translate.pkp.sfu.ca/

What do I do next? How do I get the words I translated on Weblate synced to my instance?

@asmecher
Member

asmecher commented Jan 9, 2020

Hi @hsluoyz!

I periodically merge the latest translations that have been provided via Weblate into the official repos. We'll sometimes get translation contributions via other means and will merge them there as well. You can also download the latest files directly from Weblate.

Weblate itself pushes translations up to these repos: https://github.com/pkp-translations/
...so if you really want the latest content that's in weblate, you can get it there. If you're interested in working with a git master installation and round-tripping translation work with weblate, that might be the easiest way.

@hsluoyz
Author

hsluoyz commented Jan 9, 2020

Thanks! I see translation commits in https://github.com/pkp-translations. So I think I can use the following steps:

  1. Pull master from https://github.com/pkp-translations/pkp-lib and https://github.com/pkp-translations/ojs.
  2. Check for missing or wrong strings in my OJS instance.
  3. Translate the words on https://translate.pkp.sfu.ca/. My translation will become commits in https://github.com/pkp-translation
  4. Pull the code again, my OJS instance will have the latest translation.

Is this correct? BTW, is https://github.com/pkp-translations a mirror of the latest master branch or of a stable release?

@asmecher
Member

asmecher commented Jan 9, 2020

The pkp-translations repos are mirrors of the current master branch. Once we release OJS 3.2, I expect we'll flip the translation toolset over to the stable branch until our next big push for a release from the master branch (that'll eventually be 3.3). Yes, your summary is correct. We're still working out the kinks in our Weblate workflow, but my sense is that Weblate should be pushing up to pkp-translations more or less immediately.

JackBlackLight pushed a commit to cul/ojs-plugin-doi that referenced this issue Sep 15, 2021