Translation Workflow
- High-level overview
- Automatic workflow
- Manually triggering workflow:
- Scenarios
- Key concepts / Glossary
- AWS Amplify
- Context
Translation Management System (TMS)
To keep track of every file edited in our repository, we maintain a queue of all changed .mdx files that have a ‘translate’ property. On every merge into main (i.e. every time we deploy to production or run a “release”), the files that are both of the .mdx type AND have the translate property in the frontmatter are added to this queue. Every two weeks, a GitHub Action sends these queued files to our vendor and creates a job. A unique identifier for this job is then stored in a separate table. While the vendor is translating the job, another GitHub Action runs daily to determine whether the job has been completed. If it has, the GitHub Action takes the newly translated content, replaces the old translated content with it, and creates a pull request. This pull request can then be reviewed before being merged in.
This section contains a lot of diagrams explaining each part of the workflow. For reference, some items are color-coded to make them easier to understand. Here is the color key for understanding the diagrams. To become more familiar with the GitHub-related terminology or any of the technologies used in the workflow, visit the key concepts section.
Throughout the week, content contributors add changes to our site by creating pull requests off of the develop branch. This happens dozens of times per day. Less frequently (from once to several times a day), we release those changes to production through a pull request to our main branch (i.e. the production environment).
When we merge into our main branch, we execute the first of the localization GitHub Actions:
This action looks through all of the files being merged into the main branch and finds the files that are of the MDX type and have the ‘translate’ frontmatter. It then adds the file names (i.e. the slugs) to a data store (also referred to as a table or queue) in DynamoDB, keyed by the language they are to be translated into.
Over the next two weeks, all of the changed files will be added to this queue continuously on every merge to main.
- Executes: on every merge to the main branch
- Can be manually triggered? No
- Steps:
- Looks at the files that have been changed on the merge
- Saves the filenames (or slugs) of the files that have the translate property in the frontmatter
- For each file with the translate property, saves the filename for each language listed under the translate property
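The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual action: the function names, the hand-rolled frontmatter parsing, and the language codes (`jp`, `kr`) are assumptions, and a plain dict stands in for the DynamoDB table.

```python
from collections import defaultdict


def parse_translate_languages(mdx_text):
    """Return the languages listed under `translate` in the frontmatter.

    Assumes (hypothetically) frontmatter delimited by `---` lines with a
    YAML list under a `translate:` key. A real implementation would use
    a proper YAML parser.
    """
    lines = mdx_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []
    languages = []
    in_translate = False
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # end of frontmatter
            break
        if stripped.startswith("translate:"):
            in_translate = True
            continue
        if in_translate and stripped.startswith("- "):
            languages.append(stripped[2:].strip())
        elif in_translate and stripped:
            in_translate = False  # a new key ends the translate list
    return languages


def slugs_to_queue(changed_files):
    """Group changed .mdx files by each language they should be translated into.

    `changed_files` maps a file path to its text content.
    """
    queue = defaultdict(list)
    for path, text in changed_files.items():
        if not path.endswith(".mdx"):
            continue  # only MDX files are eligible
        for language in parse_translate_languages(text):
            queue[language].append(path)
    return dict(queue)


changed = {
    "docs/a.mdx": "---\ntitle: A\ntranslate:\n  - jp\n  - kr\n---\nBody",
    "docs/b.mdx": "---\ntitle: B\n---\nBody",
}
print(slugs_to_queue(changed))
```

Keying the result by language mirrors the way the queue is described above: one file listed under two languages produces two entries.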
This action takes all of the saved file names from the translation table in DynamoDB, serializes the content into a form that the vendor can consume, sends the visual context for each page (taken from what is currently on docs.newrelic.com), and then creates a job for each language specified in the queue. Lastly, it saves a unique identifier for each job in another DynamoDB table (called job_ids), which is used later to check on the status of the job.
- Executes: bi-weekly
- Can be manually triggered? Yes
- Steps:
- Get saved filenames from translation table
- Get files from filenames and serialize the content
- Send serialized content to vendor and create job
- Sends visual context by fetching the latest version of page from docs.newrelic.com
- Save job id to job ids table
- Clears filenames from translation table
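As a rough sketch of how these steps fit together (not the real workflow code), the example below uses in-memory stand-ins for everything external. `FakeVendor`, `create_job`, `send_context`, and the shape of the tables are all hypothetical; the vendor's actual API is not described here.

```python
class FakeVendor:
    """In-memory stand-in for the translation vendor's API (hypothetical)."""

    def __init__(self):
        self.jobs = {}
        self.contexts = {}

    def create_job(self, language, documents):
        job_id = f"job-{len(self.jobs) + 1}"
        self.jobs[job_id] = (language, documents)
        return job_id

    def send_context(self, job_id, filename):
        self.contexts.setdefault(job_id, []).append(filename)


def send_content_for_translation(translation_table, job_ids_table,
                                 read_file, serialize, vendor):
    """Create one vendor job per queued language, then clear that queue entry.

    translation_table: dict mapping language -> list of filenames
    job_ids_table: list that receives the created job ids
    read_file, serialize: callables that load and serialize an MDX file
    """
    for language, filenames in list(translation_table.items()):
        if not filenames:
            continue
        documents = {name: serialize(read_file(name)) for name in filenames}
        job_id = vendor.create_job(language, documents)
        for name in filenames:
            vendor.send_context(job_id, name)
        job_ids_table.append(job_id)
        # Clear the queue entry only after the job id has been saved.
        del translation_table[language]
    return job_ids_table


table = {"jp": ["a.mdx", "b.mdx"], "kr": ["a.mdx"]}
job_ids = send_content_for_translation(
    table, [],
    read_file=lambda name: f"content of {name}",
    serialize=lambda text: f"<div>{text}</div>",
    vendor=FakeVendor(),
)
print(job_ids, table)
```

Clearing each queue entry only after its job id is stored means that a failure partway through leaves the remaining filenames queued for the next run.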
This action executes every day at a specific time (time TBD). It first checks whether any of the saved jobs are complete. If a job is complete, it downloads the translated files, deserializes them back into the MDX format, and creates a pull request into the ‘develop’ branch with the translated files. After the pull request is successfully created, it removes the job id from the table and removes the context for that id from the vendor’s store of contexts.
- Executes: daily
- Can be manually triggered? Yes
- Steps:
- Get saved job ids from job id table
- Checks TMS if the job is completed by job id
- If the job is not completed, nothing happens and the job will stop on the “Fetch translated content and deserialize” step.
- If completed:
- Downloads translated files
- Saves translated files
- Creates a pull request to develop with the newly translated files
- Deletes job id from job id table
- Deletes context
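The branching in these steps can be sketched as below. This is only an illustration under stated assumptions: `FakeStatusVendor` and its methods (`is_complete`, `download`, `delete_context`) are hypothetical stand-ins for the vendor, and `open_pull_request` is a placeholder for whatever creates the pull request.

```python
class FakeStatusVendor:
    """In-memory stand-in for the vendor's job status API (hypothetical)."""

    def __init__(self, completed, files):
        self.completed = set(completed)
        self.files = files
        self.deleted_contexts = []

    def is_complete(self, job_id):
        return job_id in self.completed

    def download(self, job_id):
        return self.files[job_id]

    def delete_context(self, job_id):
        self.deleted_contexts.append(job_id)


def check_translation_jobs(job_ids_table, vendor, deserialize, open_pull_request):
    """Process completed jobs and leave pending ones in the table."""
    still_pending = []
    for job_id in job_ids_table:
        if not vendor.is_complete(job_id):
            still_pending.append(job_id)  # nothing to do yet
            continue
        translated = vendor.download(job_id)
        files = {name: deserialize(html) for name, html in translated.items()}
        open_pull_request(files)
        # Clean up only after the pull request has been created.
        vendor.delete_context(job_id)
    job_ids_table[:] = still_pending
    return still_pending


vendor = FakeStatusVendor(
    completed=["job-1"],
    files={"job-1": {"a.mdx": "<p>translated</p>"}},
)
pull_requests = []
pending = check_translation_jobs(
    ["job-1", "job-2"], vendor,
    deserialize=lambda html: html,
    open_pull_request=pull_requests.append,
)
print(pending, pull_requests)
```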
And there you have it! The files will be fully translated and added to our staging environment. In order to deploy it to production, another PR from develop to main needs to be created (i.e. a release).
If there are time-sensitive jobs that you would like to have translated, a manual trigger might be necessary! To manually trigger any workflow, go to the repository: https://github.com/newrelic/docs-website
Go to the “Actions” tab from the homepage of the repository.
You can then see all of the workflows that we have available. There are three specific to the localization workflow: Add Slugs to Translation Queue, Check status of translation jobs, and Send content to be translated.
Select the workflow that you would like to run manually.
Click on Run workflow, which will open a dropdown.
This should always be run from the develop branch. That is the default branch shown, so no changes need to be made in the dropdown. Just click Run workflow!
Clicking on Run workflow will start the workflow. You will see it pop up in the workflow history. It might take a few seconds to pop up. If you see a yellow dot appear next to it, that means it is being queued.
Once you see a spinning yellow circle appear around the yellow dot, it means that the workflow has begun executing. When it has started executing, you can then click on it to see the progress of the workflow.
It will open up the menu below. To see the progress of the content, you will need to click on the box that says “Send content” or has some other name. This, in the context of GitHub Actions, is a job. All of the localization workflows have only one job. Click on the job to see the workflow executing.
In the job, you will see the steps of the workflow. You can click on the steps to see the logging output.
When the job is complete, the yellow mark will turn to a green checkmark. If the result is not what you expected (there isn’t a PR made, the content didn’t get sent to the vendor) you can look through the logging to see what happened.
Some scenarios for running a manual workflow.
There’s a bunch of content that has been deployed to production (i.e. merged into the main branch) and you want to send those files for translation sooner than the two-week interval.
If you only want to send everything that is in the translation queue earlier, you just need to trigger the Send content to be translated workflow, following the steps above. The result should be a job in TMS. If there is no job, this could be for a couple of reasons:
- There are no files in the translation table.
- The job didn’t execute correctly but didn’t fail. Reach out to an engineer on the Developer Enablement team to find out what went wrong (maybe some edge case we missed).
If you would like to get content from a completed job into GitHub sooner than the daily execution of the workflow, you will need to trigger the Check status of translation jobs workflow. You can do this by following the guidelines above for triggering a manual workflow.
There are two cases to consider when adding files to the translation queue.
- There is a file that hasn’t been translated before into that language, and you want to set it up to be translated into that language.
- There is a file that hasn’t been edited recently, and you want to re-run its translation.
In the first case, you will need to make a PR to configure that file to be translated. Specifically, you will want to add that language to the translate property in the frontmatter. If the translate property does not exist, you will want to add the property and then list the language underneath.
To make a PR from the GitHub UI, you will need to navigate to the page that you want to translate in the GitHub repo. If you don’t know where it resides in the repo but do know where it resides on the site, you can click on the “edit this page” button in the right navigation of the site, which will take you to the file in GitHub.
Once you are in the file, click the little pen icon in the right-hand corner to edit the page. Then you can use the in-browser code editor to make changes and create a new branch. For example, if you would like to have the content translated into Japanese, you will need to add the frontmatter for that. For more information on creating a PR, reach out to the documentation team! They are leading workshops on contributing to the docs site.
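To make the frontmatter change concrete, here is a small sketch that adds a language to the translate list of a file's text. It is illustrative only: the frontmatter layout (a YAML list with two-space indentation between `---` lines) and the language code `jp` are assumptions, and the helper name is hypothetical. In practice the same edit is made by hand in the in-browser editor.

```python
def add_translate_language(mdx_text, language):
    """Add `language` to the `translate` list in the frontmatter.

    Assumes frontmatter between `---` lines and a YAML list under
    `translate:`. Adding a language that is already listed is a no-op.
    """
    lines = mdx_text.split("\n")
    if not lines or lines[0] != "---":
        raise ValueError("file has no frontmatter")
    end = lines.index("---", 1)  # closing delimiter
    header = lines[1:end]
    item = f"  - {language}"
    if "translate:" in header:
        start = header.index("translate:")
        stop = start + 1
        # Walk to the end of the existing list items.
        while stop < len(header) and header[stop].lstrip().startswith("- "):
            stop += 1
        existing = [entry.strip() for entry in header[start + 1:stop]]
        if item.strip() not in existing:
            header.insert(stop, item)
    else:
        header += ["translate:", item]
    return "\n".join(["---"] + header + lines[end:])


before = "---\ntitle: Example page\n---\nBody text\n"
print(add_translate_language(before, "jp"))
```

Under these assumptions, the printed frontmatter gains a `translate:` key with `- jp` listed underneath it, which is the shape the workflow looks for on the next merge to main.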
This is a case we hopefully won’t run into, since we constantly keep track of all the files that have been edited and regularly send them to be translated. If, however, there is some need to run a translation on a file that hasn’t been edited recently (no changes to frontmatter, no editing of the content), reach out to someone on the Developer Enablement team. We may suggest running the translation separately, outside of the workflow (i.e. uploading the files manually, downloading them, and then creating a PR). We can also add the filenames manually to the translation queue, but this is not recommended.
A branch holds the source code for a specific version of the codebase. You usually branch off of the default branch, which is the source of truth for your codebase. When you branch off of the default branch and make some changes, you can create a pull request to add those changes back in. In a pull request, you are requesting that the changes you made in your branch get merged into the default or base branch.
On our site, we have a slightly different setup. We have two branches that are the source of truth for our codebase. We have the develop branch, which holds the code for our “staging environment” (meaning everything that we want to change before publishing it to the whole world), and the main branch, which holds the code for our “production environment.”
When we make changes to our codebase, we branch off the develop branch and then merge pull requests into the develop branch. When we want to release all these changes to production, we then create a pull request to merge develop into the main branch. We call the pull requests from the develop branch into the main branch “releases” since they are releasing the content to production.
GitHub Actions are essentially scripts that run in one of three ways:
- By some other interaction on GitHub (creating a pull request, merging one branch into another, etc.).
- On a time interval (cron job: every day at 6pm, every two weeks at 1am, etc.).
- By manually triggering the action (or “workflow” or “script”) from the GitHub UI.
GitHub Actions are also called workflows, and inside those workflows are jobs (separate scripts that run to complete the workflow). Most of our workflows have only one job, which contains a series of steps.
We have two tables in DynamoDB for keeping track of our localization workflow. The first table is the to_translate table, or the translation table. It keeps track of all the filenames that have changed since the last job was created and the languages that the files need to be translated into. This is an example of what that table looks like:
The other table keeps track of the job ids. These are stored once the job is created and are used to check for completion. In DynamoDB, it is named being_translated. This is an example of what that table looks like:
In order for TMS to be able to process our files, we need to convert them into a format that it accepts. For that reason, we serialize our MDX files into HTML. This means that we convert most of our custom components into something that looks like this:
Here, we take our MDX components and convert them into plain HTML. Specifically, we turn our components into divs and their props into serialized strings, stored either as attributes of the div or as children of the div. That way, only the content that is relevant to the translator is exposed, while the rest is hidden in the attributes of the HTML elements.
When we get this content back from TMS, it is in the same format that we sent, with the strings replaced by the translated strings. We then convert all the “serialized” components back into MDX and save the file as MDX.
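The round trip described above can be illustrated with a deliberately simplified sketch. It is not the site's real serializer: the attribute names (`data-component`, `data-props`), the base64-encoded JSON for props, and the regex-based parsing are all assumptions, and it only handles flat components with string props (no nesting).

```python
import base64
import json
import re

# Matches a flat component such as <Callout variant="tip">text</Callout>.
COMPONENT = re.compile(r'<([A-Z]\w*)((?:\s+\w+="[^"]*")*)\s*>(.*?)</\1>', re.S)
PROP = re.compile(r'(\w+)="([^"]*)"')
DIV = re.compile(r'<div data-component="(\w+)" data-props="([^"]*)">(.*?)</div>', re.S)


def serialize(mdx):
    """Turn components into divs, hiding props inside an attribute."""
    def to_div(match):
        name, raw_props, children = match.groups()
        props = dict(PROP.findall(raw_props))
        encoded = base64.b64encode(json.dumps(props).encode()).decode()
        return f'<div data-component="{name}" data-props="{encoded}">{children}</div>'
    return COMPONENT.sub(to_div, mdx)


def deserialize(html):
    """Turn serialized divs back into the original components."""
    def to_component(match):
        name, encoded, children = match.groups()
        props = json.loads(base64.b64decode(encoded))
        attrs = "".join(f' {key}="{value}"' for key, value in props.items())
        return f"<{name}{attrs}>{children}</{name}>"
    return DIV.sub(to_component, html)


source = 'Intro\n\n<Callout variant="tip">Remember to save.</Callout>\n'
html = serialize(source)
# Simulate the vendor replacing only the visible string.
translated = html.replace("Remember to save.", "Recuerda guardar.")
print(deserialize(translated))
```

Encoding the props keeps them out of the translatable text, so only the children of the div are exposed to the translator, and the component can be rebuilt unchanged afterwards.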
Although not listed on the diagrams above, AWS Amplify is where we host and deploy our site. Every time we merge into the main branch, Amplify is triggered to build the latest updates from GitHub. It fetches the code from the main branch and builds the site. Once the site has passed the checks, it deploys the built site to docs.newrelic.com. If the site fails to build, the new version is not deployed and the engineers are notified.
Although it is not described in detail in the Automatic workflow section, we also send visual context to TMS during the Send content to be translated workflow. This context is the HTML taken straight from the current version of the docs site available on docs.newrelic.com. When we send the context to TMS, it runs automatic matching to pair the files that we have uploaded with the contexts that we have uploaded. In the Check status of translation jobs workflow, we delete the contexts for the job that we created. This ensures that no outdated contexts are matched. Read more about visual context and context matching here.