Source XML linting

This script looks for patterns or literal strings in the input files and replaces them with the corresponding text.

Business logic in a nutshell

Linting for signoff

Given the following input:

  • a list of search and replace patterns (in an Excel file)
  • a folder containing text-based files (e.g. XML)

For every file, the script will:

  • look for every search pattern
  • replace it with the replacement pattern
  • write an output file in the output folder specified
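
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in str_subs.py: the function names apply_rules and lint_folder are hypothetical, the rules are passed as an in-memory list instead of being read from the Excel config, and treating every search pattern as a regular expression is an assumption.

```python
import re
from pathlib import Path


def apply_rules(text, rules):
    """Apply each (search, replace) pair to the text, in order.

    Assumption: every search pattern is a regular expression.
    """
    for search, replace in rules:
        text = re.sub(search, replace, text)
    return text


def lint_folder(input_dir, output_dir, rules):
    """Write a processed copy of every file in input_dir to output_dir."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(Path(input_dir).iterdir()):
        if not path.is_file():
            continue  # subfolders are ignored in this sketch
        original = path.read_text(encoding="utf-8")
        target = out / path.name
        target.write_text(apply_rules(original, rules), encoding="utf-8")
        written.append(target)
    return written
```

Because the rules are applied in order, a later rule sees the output of the earlier ones, so the order of rows in the config may matter.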

Auxiliary tasks

  • Adding files to correct batch folder
  • Comparing batch/unit config file with folders/files in repo

Getting started

Clone this repo and change directory to it:

gh repo clone capstanlqc/source-xml-linter
cd source-xml-linter

Create a virtual environment in the root folder of the repo (only once):

python -m venv venv

Activate the virtual environment (each time before you run the script):

source venv/bin/activate

Install all dependencies in the virtual environment:

pip install -r requirements.txt

How to run the scripts

Linting

To lint XML files, run the script as follows (the Help section below describes each parameter):

python str_subs.py \
    -i /path/input/folder \
    -o /path/to/output/folder \
    -c /path/to/config.xlsx

You may exit the virtual environment when you're done running the code:

deactivate

Help

The help will show you what input parameters are needed:

$ python str_subs.py --help

usage: str_subs.py [-h] [-V] [-i INPUT] [-o OUTPUT] [-c CONFIG]

String substitution in XML files

options:
  -h, --help            show this help message and exit
  -V, --version         show program version
  -i INPUT, --input INPUT
                        specify path to the folder containing the files to be processed
  -o OUTPUT, --output OUTPUT
                        specify path to the folder where the processed files should be saved
  -c CONFIG, --config CONFIG
                        specify path to the config file containing patterns etc.
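
For readers who want to see how a command line like this is typically declared, here is a minimal argparse sketch that mirrors the help text above. It is an illustration under assumptions, not the parser in str_subs.py: the function name build_parser is hypothetical and the version string is a placeholder.

```python
import argparse


def build_parser():
    """Build a parser that accepts the options listed in the help text."""
    parser = argparse.ArgumentParser(
        prog="str_subs.py",
        description="String substitution in XML files",
    )
    # Placeholder version string; the real version is not stated in this README.
    parser.add_argument("-V", "--version", action="version",
                        version="%(prog)s 0.0.0",
                        help="show program version")
    parser.add_argument("-i", "--input",
                        help="specify path to the folder containing the files to be processed")
    parser.add_argument("-o", "--output",
                        help="specify path to the folder where the processed files should be saved")
    parser.add_argument("-c", "--config",
                        help="specify path to the config file containing patterns etc.")
    return parser
```

Calling build_parser().parse_args(["-i", "/in", "-o", "/out", "-c", "/cfg.xlsx"]) returns a namespace with the attributes input, output and config.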

Examples

There are three different config files, one for each of the following:

  • new content
  • trend content
  • XYZ batch

In practice, that means running a different command depending on which files need to be linted and signed off.

For new content:

python $app/str_subs.py -i $tolint_new -o $linted_new -c $app/config_new.xlsx 

For trend content:

python $app/str_subs.py -i $tolint_trend -o $linted_trend -c $app/config_trend.xlsx 

For XYZ batch:

python $app/str_subs.py -i $tolint_xyz -o $linted_xyz -c $app/config_xyz.xlsx 

A log file with an account of what was done is written to the logs folder.

Convert TSV to YAML

To update config files:

  1. Save the PISA2025ft-batches data (first two columns) to source/files.tsv
  2. Convert that TSV file to YAML format:
python $app/tsv2yml.py -i $files_tsv -o $files_yml 

where the variables stand for the following values:

  • app: /path/to/source-xml-linter
  • files_tsv: /path/to/pisa_2025ms_translation_common/source/files.tsv
  • files_yml: /path/to/pisa_2025ms_translation_common/source/files.yaml
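
As a rough idea of what such a conversion involves, the sketch below groups the second TSV column by the first and renders the result as a YAML mapping of lists. Everything about the layout is an assumption: this README does not say what the two columns contain or how tsv2yml.py structures its output, so the batch-then-file column order, the absence of a header row, and the function name tsv_to_yaml are all hypothetical.

```python
import csv
import io
from collections import defaultdict


def tsv_to_yaml(tsv_text):
    """Group the second column by the first and render a simple YAML mapping.

    Assumptions: column 1 is a batch name, column 2 is a file name, there is
    no header row, and names contain no characters that need YAML quoting.
    """
    groups = defaultdict(list)
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 2 or not row[0].strip():
            continue  # skip blank or incomplete rows
        groups[row[0].strip()].append(row[1].strip())
    lines = []
    for batch, files in groups.items():
        lines.append(f"{batch}:")
        lines.extend(f"  - {name}" for name in files)
    return "\n".join(lines) + "\n"
```

The output is written by hand to keep the sketch free of third-party dependencies; a YAML library would handle quoting and special characters more safely.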

Copy files to their batch folder

To copy linted files to their assigned batch folder:

bash $app/add_file_to_batch.sh -a copy -o $linted -d $source -c $files_yml

Run the script without parameters to see the help:

usage: add_file_to_batch.sh [-a ACTION] [-c CONFIG] [-o ORIGIN] [-d DESTINATION]

Puts each file in the specified batch folder inside the destination parent folder.

parameters:
    -a ACTION
                action requested: either 'move' or 'copy'
    -c CONFIG
                absolute path to the configuration yaml file that indicates which folder each file belongs to,
    -o ORIGIN
                origin parent directory containing the files to be arranged in folders,
    -d DESTINATION
                destination parent directory where the folders containing the files should be written.
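
The behaviour described by this help text can be sketched in Python as follows. This is not a port of add_file_to_batch.sh: the function name arrange_files is hypothetical, and the config is assumed to be already loaded into a dict that maps each batch folder name to a list of file names.

```python
import shutil
from pathlib import Path


def arrange_files(action, origin, destination, batches):
    """Copy or move each listed file from origin into destination/<batch>/.

    batches is assumed to map a batch folder name to a list of file names.
    Returns the paths that were placed and the names not found in origin.
    """
    if action not in ("copy", "move"):
        raise ValueError("action must be 'copy' or 'move'")
    transfer = shutil.copy2 if action == "copy" else shutil.move
    placed, missing = [], []
    for batch, names in batches.items():
        folder = Path(destination) / batch
        folder.mkdir(parents=True, exist_ok=True)
        for name in names:
            source = Path(origin) / name
            if source.is_file():
                transfer(str(source), str(folder / name))
                placed.append(folder / name)
            else:
                missing.append(name)  # listed in the config but absent in origin
    return placed, missing
```

Returning the missing names instead of raising lets a caller report every absent file in one pass.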

Analyse correspondence between config and repo

To confirm that there are no mismatches between the batches-files config and the common repository, run:

bash $app/check_files_sync.sh -d $source -c $files_yml

Output will be written to file source/file_sync_YYYYMMDD.log

Run the script without parameters to see the help:

usage: check_files_sync.sh [-c CONFIG] [-d DIRECTORY]

Looks for mismatches between the batches-units config file and the folders/files in the common repository.

parameters:
    -c CONFIG
                absolute path to the configuration yaml file that indicates which folder each file belongs to,
    -d DIRECTORY
                directory containing the batch folders containing unit files.
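
A comparison of this kind boils down to two set differences. The sketch below illustrates the idea and is not a port of check_files_sync.sh: the function name find_mismatches is hypothetical, and the config is assumed to be already loaded into a dict that maps each batch folder name to a list of file names.

```python
from pathlib import Path


def find_mismatches(directory, batches):
    """Compare the batch/file layout in the config with what is on disk.

    Returns two sorted lists of (batch, file) pairs: those listed in the
    config but missing on disk, and those on disk but not in the config.
    """
    expected = {(batch, name) for batch, names in batches.items() for name in names}
    actual = {
        (folder.name, item.name)
        for folder in Path(directory).iterdir() if folder.is_dir()
        for item in folder.iterdir() if item.is_file()
    }
    return sorted(expected - actual), sorted(actual - expected)
```

Only the first level of batch folders is inspected; files nested deeper are ignored in this sketch.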

Backlog

Tentative todo list:

  • Parse XML input file and run the substitution only inside the text node (e.g. <label>)
  • Make it a requirement that argument paths are absolute paths
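
One possible shape for the first backlog item is sketched below with the standard library's xml.etree.ElementTree. It is only an illustration of the idea, not planned or existing code: the function name substitute_in_text_nodes is hypothetical, and restricting the substitution to a single tag name is an assumption.

```python
import re
import xml.etree.ElementTree as ET


def substitute_in_text_nodes(xml_string, rules, tag="label"):
    """Apply (search, replace) rules only to the text of elements named tag.

    Attribute values and other elements are left untouched. Only the text
    before an element's first child is handled; tail text is not.
    """
    root = ET.fromstring(xml_string)
    for element in root.iter(tag):
        if element.text:
            for search, replace in rules:
                element.text = re.sub(search, replace, element.text)
    return ET.tostring(root, encoding="unicode")
```

Note that ElementTree discards comments by default and does not preserve the XML declaration, so a parser that round-trips the original file more faithfully may be preferable in practice.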