Source XML linting

This script looks for patterns or literal strings in the input files and replaces them with the corresponding text.

Business logic in a nutshell

Linting for signoff

Provided the following as input:

  • a list of search and replace patterns (in an Excel file)
  • a folder containing text-based files (e.g. XML)

For every file, the script will:

  • look for every search pattern
  • replace it with the replacement pattern
  • write an output file in the output folder specified
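The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual code of str_subs.py: the function names `lint_text` and `lint_folder` are hypothetical, the rules are passed in as a plain list of (search, replacement) pairs rather than read from the Excel file, and every search entry is treated as a regular expression (a literal string would need `re.escape` first).

```python
import re
from pathlib import Path


def lint_text(text, rules):
    """Apply every (pattern, replacement) rule to the text, in order."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


def lint_folder(input_dir, output_dir, rules):
    """Lint every file in input_dir and write the result to output_dir."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(Path(input_dir).iterdir()):
        if not path.is_file():
            continue  # subfolders are ignored in this sketch
        linted = lint_text(path.read_text(encoding="utf-8"), rules)
        target = out / path.name
        target.write_text(linted, encoding="utf-8")
        written.append(target)
    return written
```

Because the rules are applied in order, a later rule sees the output of the earlier ones, so the order of rows in the config may matter.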

Auxiliary tasks

  • Adding files to correct batch folder
  • Comparing batch/unit config file with folders/files in repo

Getting started

Clone this repo and change directory to it:

gh repo clone capstanlqc/source-xml-linter
cd source-xml-linter

Create a virtual environment in the root folder of the repo (only once):

python -m venv venv

Activate the virtual environment (each time before you run the script):

source venv/bin/activate

Install all dependencies in the virtual environment:

pip install -r requirements.txt

How to run the scripts

Linting

To lint XML files, run the code (see below for details):

python str_subs.py \
    -i /path/input/folder \
    -o /path/to_output/folder \
    -c /path/to/config.xlsx

You may exit the virtual environment when you're done running the code:

deactivate

Help

The help will show you what input parameters are needed:

$ python str_subs.py --help

usage: str_subs.py [-h] [-V] [-i INPUT] [-o OUTPUT] [-c CONFIG]

String substitution in XML files

options:
  -h, --help            show this help message and exit
  -V, --version         show program version
  -i INPUT, --input INPUT
                        specify path to the folder containing the files to be processed
  -o OUTPUT, --output OUTPUT
                        specify path to the folder where the processed files should be saved
  -c CONFIG, --config CONFIG
                        specify path to the config file containing patterns etc.
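
For reference, a command-line interface like the one shown in this help text could be declared with `argparse` along these lines. This is only a sketch based on the help output, not the script's actual source: the function name `build_parser` and the version string are placeholders.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the options listed in the help text."""
    parser = argparse.ArgumentParser(
        prog="str_subs.py", description="String substitution in XML files"
    )
    parser.add_argument(
        "-V", "--version", action="version",
        version="%(prog)s 0.0.0",  # placeholder, the real version is not documented here
        help="show program version",
    )
    parser.add_argument(
        "-i", "--input",
        help="specify path to the folder containing the files to be processed",
    )
    parser.add_argument(
        "-o", "--output",
        help="specify path to the folder where the processed files should be saved",
    )
    parser.add_argument(
        "-c", "--config",
        help="specify path to the config file containing patterns etc.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.input, args.output, args.config)
```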

Examples

There are three different config files for:

  • new content
  • trend content
  • XYZ batch

In practice, that means running three different commands depending on the files that need to be linted and signed off.

For new content:

python $app/str_subs.py -i $tolint_new -o $linted_new -c $app/config_new.xlsx 

For trend content:

python $app/str_subs.py -i $tolint_trend -o $linted_trend -c $app/config_trend.xlsx 

For XYZ batch:

python $app/str_subs.py -i $tolint_xyz -o $linted_xyz -c $app/config_xyz.xlsx 

A log file with an account of what was done will be written to the logs folder.

Convert TSV to YAML

To update config files:

  1. Save the PISA2025ft-batches data (first two columns) to source/files.tsv
  2. Convert that TSV file to YAML format:

python $app/tsv2yml.py -i $files_tsv -o $files_yml

where the variables stand for the following values:

  • app: /path/to/source-xml-linter
  • files_tsv: /path/to/pisa_2025ms_translation_common/source/files.tsv
  • files_yml: /path/to/pisa_2025ms_translation_common/source/files.yaml
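
To give an idea of what such a conversion involves, here is a minimal sketch using only the standard library. It is not the code of tsv2yml.py, and it rests on assumptions that this document does not confirm: the first column holds the batch name, the second holds the file name, there is no header row, and the YAML output maps each batch to a list of its files. The function name `tsv_to_yaml` is hypothetical.

```python
import csv
import io
from collections import defaultdict


def tsv_to_yaml(tsv_text):
    """Group (batch, file) rows by batch and render them as a YAML mapping of lists."""
    batches = defaultdict(list)
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 2 or not row[0].strip():
            continue  # skip blank or incomplete rows
        batches[row[0].strip()].append(row[1].strip())
    lines = []
    for batch, files in batches.items():
        lines.append(f"{batch}:")
        lines.extend(f"  - {name}" for name in files)
    return "\n".join(lines) + "\n"
```

Names containing characters that are special in YAML (for example a colon or a leading hyphen) would need quoting, which this sketch does not do; a YAML library would handle that.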

Copy files to their batch folder

To copy linted files to their assigned batch folder:

bash $app/add_file_to_batch.sh -a copy -o $linted -d $source -c $files_yml

Run the script without parameters to see the help:

usage: add_file_to_batch.sh [-a ACTION] [-c CONFIG] [-o ORIGIN] [-d DESTINATION]

Puts each file in the specified batch folder inside the destination parent folder.

parameters:
    -a ACTION
                action requested: either 'move' or 'copy'
    -c CONFIG
                absolute path to the configuration yaml file that indicates which folder each file belongs to,
    -o ORIGIN
                origin parent directory containing the files to be arranged in folders,
    -d DESTINATION
                destination parent directory where the folders containing the files should be written.
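
The behaviour described in this help text can be approximated in Python as shown below. This is a hedged sketch, not a port of add_file_to_batch.sh: the function name `add_files_to_batches` is hypothetical, and the config is assumed to be already loaded into a dictionary that maps each batch folder name to a list of file names.

```python
import shutil
from pathlib import Path


def add_files_to_batches(mapping, origin, destination, action="copy"):
    """Copy or move each listed file from origin into destination/<batch>/."""
    if action not in ("copy", "move"):
        raise ValueError("action must be 'copy' or 'move'")
    transfer = shutil.copy2 if action == "copy" else shutil.move
    placed, missing = [], []
    for batch, names in mapping.items():
        batch_dir = Path(destination) / batch
        for name in names:
            src = Path(origin) / name
            if not src.is_file():
                missing.append(name)  # listed in the config but absent from origin
                continue
            batch_dir.mkdir(parents=True, exist_ok=True)
            transfer(str(src), str(batch_dir / name))
            placed.append(batch_dir / name)
    return placed, missing
```

Returning the list of missing files, rather than failing on the first one, makes it easy to report everything that was listed in the config but not found in the origin folder.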

Analyse correspondence between config and repo

To confirm that there are no mismatches between the batches-files config and the common repository, run:

bash $app/check_files_sync.sh -d $source -c $files_yml

Output will be written to file source/file_sync_YYYYMMDD.log

Run the script without parameters to see the help:

usage: check_files_sync.sh [-c CONFIG] [-d DIRECTORY]

Looks for mismatches between the batches-units config file and the folders/files in the common repository.

parameters:
    -c CONFIG
                absolute path to the configuration yaml file that indicates which folder each file belongs to,
    -d DIRECTORY
                directory containing the batch folders containing unit files.
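
The kind of comparison this script performs can be illustrated with the Python sketch below. It does not reproduce check_files_sync.sh: the function name `find_mismatches` is hypothetical, the config is assumed to be already loaded as a dictionary mapping batch folder names to lists of file names, and only files directly inside each batch folder are considered.

```python
from pathlib import Path


def find_mismatches(mapping, directory):
    """Return (missing, unexpected) lists of (batch, file) pairs."""
    expected = {(batch, name) for batch, names in mapping.items() for name in names}
    actual = {
        (folder.name, path.name)
        for folder in Path(directory).iterdir() if folder.is_dir()
        for path in folder.iterdir() if path.is_file()
    }
    missing = sorted(expected - actual)      # in the config, not on disk
    unexpected = sorted(actual - expected)   # on disk, not in the config
    return missing, unexpected
```

Two empty lists mean the config and the repository folders agree.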

Backlog

Tentative todo list:

  • Parse the XML input file and run the substitution only inside text nodes (e.g. <label>)
  • Make it a requirement that argument paths are absolute paths
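
As a possible starting point for the first backlog item, the sketch below restricts substitutions to the text of a given element using the standard library's `xml.etree.ElementTree`. It is an assumption about how this could be approached, not a planned design: the function name `lint_text_nodes` and the default tag `label` are illustrative, only the text before an element's first child is touched, and ElementTree does not preserve comments or the XML declaration.

```python
import re
import xml.etree.ElementTree as ET


def lint_text_nodes(xml_text, rules, tag="label"):
    """Apply (pattern, replacement) rules only to the text of <tag> elements."""
    root = ET.fromstring(xml_text)
    for element in root.iter(tag):
        if element.text:
            for pattern, replacement in rules:
                element.text = re.sub(pattern, replacement, element.text)
    return ET.tostring(root, encoding="unicode")
```

With this approach, a pattern that also happens to match an attribute name or value is left alone there, which a plain string substitution over the whole file cannot guarantee.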
