Internationalising CSV output #341

Open
3 of 5 tasks
markbrough opened this issue Jan 25, 2022 · 8 comments
Labels
for-odsc Issues for Open Data Services to work on

@markbrough
Member

markbrough commented Jan 25, 2022

Is your feature request related to a problem? Please describe.

Users can now select different languages in the front-end interface. However, the data is still output in EN.

Describe the solution you’d like

For CSV and Excel exports:

  • receive the language parameter from filters
  • output CSV headers in the relevant language
  • where there are multiple titles/descriptions, output the relevant language title/description, or fallback to EN, or fallback to any other language (note: this depends on Capture Multiple Titles and Descriptions #167)
  • where there are multiple languages for the name of the participating / provider / receiver orgs, output the relevant language name, or fallback to EN, or fallback to any other language
  • output other codelists (sectors, etc.) using the relevant language (NB this might require more substantial changes to the way that codelists are modelled as Enums: https://github.com/codeforIATI/iati-datastore/blob/main/iati_datastore/iatilib/codelists/__init__.py)

NB relevant codelists can be retrieved from here: https://codelists.codeforiati.org/

The initially-supported languages should be: EN, FR, ES, PT
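
The fallback order described above (requested language, then EN, then any other language) could be sketched roughly as follows. This is only an illustration: `pick_narrative` and the `(language, text)` pair representation are hypothetical and not taken from the datastore code.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical representation: each title/description is a (language_code, text) pair.
Narrative = Tuple[str, str]


def pick_narrative(
    narratives: Sequence[Narrative], requested: str, default: str = "en"
) -> Optional[str]:
    """Return the text in the requested language, else the default language,
    else the first narrative available in any language."""
    by_lang = {}
    for lang, text in narratives:
        # Keep the first narrative seen for each language.
        by_lang.setdefault(lang.lower(), text)
    for lang in (requested.lower(), default.lower()):
        if lang in by_lang:
            return by_lang[lang]
    # Fall back to any other language, in document order.
    return narratives[0][1] if narratives else None
```

The same helper would work for organisation names, assuming they are stored in the same shape.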

Note that we don't yet have complete translations for every codelist in each of these languages, but we will work on this separately.

Describe alternatives you’ve considered

Using CDFD: https://countrydata.iatistandard.org

Additional context

Add any other context or screenshots about the feature request here.

@markbrough markbrough added the for-odsc Issues for Open Data Services to work on label Jan 25, 2022
@markbrough markbrough added this to the Milestone 2 milestone Jan 25, 2022
@odscjames
Collaborator

Also Excel.

& also headers of Excel/CSV

@markbrough
Member Author

@odscjames
Collaborator

Can roll out a bit at a time - do other 2 first then ENUM one. May want to rewrite ENUM handling at same time.

@markbrough
Member Author

I updated the description a little bit providing a breakdown of required work

odscjames added a commit that referenced this issue May 18, 2022
#341

Alter JS so locale key added to URL's

Ignore locale key when checking for valid filters

Start using flask-babel

Set up our 2 current translations, with only one translated string (from Google Translate)

Translate headers in CSV & XLSX

So headers are picked up, duplicate at the top of iati_datastore/iatilib/frontend/serialize/csv.py
(Added TODO to fix this later)

Update README with instructions

Add compile strings stage to deploy
odscjames added a commit that referenced this issue Jul 20, 2022
#341

Alter JS so locale key added to URL's

Ignore locale key when checking for valid filters

Start using flask-babel

Set up our 2 current translations, with only one translated string (from Google Translate)

Translate headers in CSV & XLSX

So headers are picked up, duplicate at the top of iati_datastore/iatilib/frontend/serialize/csv.py
(Added TODO to fix this later)

Update README with instructions

Add compile strings stage to deploy
odscjames added a commit that referenced this issue Jul 20, 2022
#341

Alter JS so locale key added to URL's

Ignore locale key when checking for valid filters

Start using flask-babel

Set up our 2 current translations, with no translated strings.

Translate headers in CSV & XLSX

So headers are picked up, duplicate at the top of iati_datastore/iatilib/frontend/serialize/csv.py
(Added TODO to fix this later)

Update README with instructions

Add compile strings stage to deploy

Fix tests
odscjames added a commit that referenced this issue Jul 20, 2022
#341

Alter JS so locale key added to URL's

Ignore locale key when checking for valid filters

Start using flask-babel

Set up our 2 current translations, with no translated strings.

Set up custom extractor to get CSV column headings.

Translate headers in CSV & XLSX

Update README with instructions

Add compile strings stage to deploy

Fix tests
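
For readers following along, the header translation step in these commits might look roughly like the sketch below. The commits use flask-babel; to stay self-contained this example swaps in a plain dictionary catalogue instead, and `HEADER_TRANSLATIONS`, `translate_headers` and `write_csv` are hypothetical names with illustrative (not real) translations.

```python
import csv
import io

# Hypothetical catalogue: locale -> {English header: translated header}.
# The entries are illustrative only, not the project's actual translations.
HEADER_TRANSLATIONS = {
    "fr": {"title": "titre"},
}


def translate_headers(headers, locale):
    """Translate each header, keeping the English header when no translation exists."""
    catalogue = HEADER_TRANSLATIONS.get(locale, {})
    return [catalogue.get(header, header) for header in headers]


def write_csv(headers, rows, locale):
    """Serialise rows to CSV text with headers in the requested locale."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(translate_headers(headers, locale))
    writer.writerows(rows)
    return buffer.getvalue()
```

An untranslated header or an unknown locale simply falls through to the English text, which mirrors how a gettext-style catalogue behaves when a string is missing.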
@radix0000
Collaborator

I have begun looking at this and have a couple of clarifying questions; see below.

@radix0000
Collaborator

radix0000 commented Sep 9, 2022

Description types

  • Roughly 30 percent of activities seem to have multiple descriptions of different types (multiple languages complicate the counting, so this is a rough number)
  • Currently the first description in the XML is put into the description field, and all the descriptions are put into description_all_values
  • We could keep grabbing a single description as the current code does (but language-specific), though it would be better for General to override Objectives etc. (basically, the lowest code wins)
  • However, throwing away data doesn't seem like the right way to go. Should we instead add more fields to the output (CSV etc.), e.g. description_general, description_objectives etc.?
  • There is then a question about which CSV fields would be best: either description, description_general, description_objectives etc., where description is a copy of the most general description when there are multiple types; or description_general, description_objectives etc., where a single default description is put in description_general

@radix0000
Collaborator

Locales

  • Currently the "locale" is set through the language drop-down at the top of the page (EN, FR or PT), whose primary function is to translate the page. That seems very limiting: there are other languages in the datastore (I have seen ES, for a start), and there are plenty of use cases where someone would want to download data in a different language from the one they find convenient for reading the page. It is also not particularly explicit to the user about what is going to happen
  • An alternative would be for the language set at the top of the page to set the default for a "Preferred Language" form field for data downloads. The user could then change it as required, and its value is what gets passed to Python through the locale HTTP parameter
  • Ideally the "Preferred Language" drop-down would contain all the languages currently in the database
  • A possible enhancement: on the default EN homepage, which I assume most users would be using, the contents of the Accept-Language request header could be used to set the default "Preferred Language" rather than assuming EN (the user could still change it)
  • A possible further enhancement at some point would be to store the list of available languages in a database table and append new languages as they are encountered during the data import process, so that the drop-down of available languages stays up to date with what is actually in the database
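
To illustrate the Accept-Language idea, here is a minimal sketch of picking a default "Preferred Language" from the header. The function names are hypothetical, the parsing is deliberately simplified (it only looks at the `q` parameter), and the set of available languages is assumed to be supplied by the caller.

```python
def parse_accept_language(header):
    """Return the language tags in an Accept-Language header, most preferred first."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0  # A tag without a q parameter has the highest weight.
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # Treat a malformed weight as "not acceptable".
        if quality > 0:
            # Sort by descending quality, then by original position in the header.
            weighted.append((-quality, position, tag.strip().lower()))
    return [tag for _, _, tag in sorted(weighted)]


def default_preferred_language(header, available, fallback="en"):
    """Pick the first acceptable language that is available, else the fallback."""
    for tag in parse_accept_language(header):
        primary = tag.split("-")[0]  # "fr-CH" counts as "fr"
        if primary in available:
            return primary
    return fallback
```

For example, `default_preferred_language("fr-CH, fr;q=0.9, en;q=0.8", {"en", "fr", "pt"})` would select `"fr"`, and an empty or unmatched header falls back to `"en"`.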

@markbrough
Member Author

Hello @radix0000 and welcome! Looking forward to working together, especially after such an insightful first few comments here :)

Description types - you are right. We should keep all description types, and yes we should add these as new columns. Your column headers are good. Also agree with description_general containing a fall-back to the most general description if there is not a description of type 1.
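
A rough sketch of the per-type columns, under some assumptions: the codes follow the IATI DescriptionType codelist (1 General, 2 Objectives, 3 Target Groups, 4 Other); the column names `description_target_groups` and `description_other` are guesses extending the two names mentioned in this thread; and descriptions with no type or an unrecognised type are treated as the most general candidates for the fallback.

```python
# Column names for types 3 and 4 are assumed; only description_general and
# description_objectives are named in the discussion.
DESCRIPTION_COLUMNS = {
    "1": "description_general",
    "2": "description_objectives",
    "3": "description_target_groups",
    "4": "description_other",
}


def description_columns(descriptions):
    """Build the description columns from (type_code_or_None, text) pairs."""
    row = {column: "" for column in DESCRIPTION_COLUMNS.values()}
    untyped = []
    for type_code, text in descriptions:
        column = DESCRIPTION_COLUMNS.get(type_code or "")
        if column is None:
            # No type attribute, or a code outside the codelist.
            untyped.append(text)
        elif not row[column]:
            # Keep the first description seen for each type.
            row[column] = text
    if not row["description_general"]:
        # Fallback: untyped descriptions first, then the lowest type code present.
        candidates = untyped + [
            row[DESCRIPTION_COLUMNS[code]] for code in sorted(DESCRIPTION_COLUMNS)
        ]
        row["description_general"] = next((text for text in candidates if text), "")
    return row
```

Language selection is left out here; in practice each type would presumably first be narrowed to the preferred language before the columns are filled.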

Locale - I agree that it is nice to be flexible with which languages we can support, but there are a number of constraints:

  1. the IATI data - we will use this for titles and descriptions, and are dependent on whether or not a publisher has published these in a particular language
  2. codelists - so far, codelists have been translated into:
  3. interface - so far, the Datastore interface has been translated into:
  • French
  • Portuguese
  4. CSV headers - we haven't yet translated any of these but we can use some text we have translated elsewhere to do this quite quickly at least for FR, and I think probably for ES and PT.

The internationalisation of the output will depend particularly on (2) above, and the accessibility will depend on (3), so requesting another language might not get the user very far. But if it is not complicated to implement then why not allow other languages and fall back to EN CSV headers and codelists if these are not available in the requested language?
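
The fallback suggested here could be sketched as below for codelists. The in-memory structure, the `codelist_name` helper and the sample entries are all hypothetical and purely illustrative; they are not the datastore's Enum-based model or authoritative codelist data.

```python
# Hypothetical in-memory codelist: code -> {language: name}.
# Entries are illustrative, not authoritative codelist data.
SECTOR_NAMES = {
    "11110": {"en": "Education policy", "fr": "Politique de l'education"},
    "14010": {"en": "Water sector policy"},
}


def codelist_name(codelist, code, locale, default="en"):
    """Return the name for a code in the requested locale, else in the default
    language, else the raw code itself."""
    names = codelist.get(code)
    if names is None:
        # Unknown code: output the raw code rather than failing.
        return code
    return names.get(locale) or names.get(default) or code
```

With this shape, requesting a language with no translations at all (say `"de"`) degrades to EN names for every code, so allowing arbitrary languages costs little.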

NB the preferred language is already automatically set if the user's language is other than English -- try changing the preferred language in your browser and then going to the interface. I can see why you might want to download French versions of the files using the English interface but I don't think we should bother worrying about that right now, as the user can just switch to the French interface.

3 participants