This script looks for patterns or literal strings in the input files and replaces them with the corresponding text.
It takes the following as input:
- a list of search and replace patterns (in an Excel file)
- a folder containing text-based files (e.g. XML)
For every file, the script will:
- look for every search pattern
- replace it with the replacement pattern
- write an output file in the output folder specified
The repo also includes helper scripts for:
- adding files to the correct batch folder
- comparing the batch/unit config file with the folders/files in the repo
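The per-file steps listed above (look for each search pattern, replace it, write the result to the output folder) can be sketched as follows. This is only an illustration, not the code in str_subs.py: the function names and the rule format, a list of (search, replace, is_regex) tuples, are assumptions, and reading the rules from the Excel file is left out because its column layout is not described here.

```python
import re
from pathlib import Path


def apply_rules(text, rules):
    """Apply each (search, replace, is_regex) rule to the text, in order."""
    for search, replace, is_regex in rules:
        if is_regex:
            text = re.sub(search, replace, text)
        else:
            text = text.replace(search, replace)
    return text


def lint_folder(input_dir, output_dir, rules):
    """Write a processed copy of every file in input_dir to output_dir."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(input_dir).iterdir()):
        if path.is_file():
            text = path.read_text(encoding="utf-8")
            (out / path.name).write_text(apply_rules(text, rules), encoding="utf-8")


# Hypothetical rules; the real ones come from the Excel config file.
rules = [
    ("&nbsp;", " ", False),          # literal string
    (r"[ \t]+(</p>)", r"\1", True),  # regular expression
]
result = apply_rules("<p>Hello&nbsp;world  </p>", rules)
print(result)
```

Rules are applied in the order given, so a later rule sees the output of the earlier ones.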
Clone this repo and change directory to it:
gh repo clone capstanlqc/source-xml-linter
cd source-xml-linter
Install a virtual environment in the root folder of the repo (only once):
python -m venv venv
Activate the virtual environment (before every time you run the script):
source venv/bin/activate
Install all dependencies in the virtual environment:
pip install -r requirements.txt
To lint XML files, run the code (see below for details):
python str_subs.py \
-i /path/input/folder \
-o /path/to_output/folder \
-c /path/to/config.xlsx
You may exit the virtual environment when you're done running the code:
deactivate
The help will show you what input parameters are needed:
$ python str_subs.py --help
usage: str_subs.py [-h] [-V] [-i INPUT] [-o OUTPUT] [-c CONFIG]
String substitution in XML files
options:
-h, --help show this help message and exit
-V, --version show program version
-i INPUT, --input INPUT
specify path to the folder containing the files to be processed
-o OUTPUT, --output OUTPUT
specify path to the folder where the processed files should be saved
-c CONFIG, --config CONFIG
specify path to the config file containing patterns etc.
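For reference, a command-line interface with the options shown in the help above could be declared with argparse roughly like this. It is a sketch reconstructed from the help text, not the script's actual source; the function name build_parser and the version string are placeholders.

```python
import argparse


def build_parser():
    """Declare the options listed in the help output above."""
    parser = argparse.ArgumentParser(
        prog="str_subs.py",
        description="String substitution in XML files",
    )
    # The version string below is a placeholder, not the script's real version.
    parser.add_argument("-V", "--version", action="version",
                        version="%(prog)s 0.0.0", help="show program version")
    parser.add_argument("-i", "--input",
                        help="specify path to the folder containing the files to be processed")
    parser.add_argument("-o", "--output",
                        help="specify path to the folder where the processed files should be saved")
    parser.add_argument("-c", "--config",
                        help="specify path to the config file containing patterns etc.")
    return parser


# Parse an explicit argument list instead of sys.argv, for demonstration.
args = build_parser().parse_args(
    ["-i", "/path/input/folder", "-o", "/path/to_output/folder", "-c", "/path/to/config.xlsx"]
)
print(args.input, args.output, args.config)
```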
There are three different config files for:
- new content
- trend content
- XYZ batch
In practice, that means running three different commands depending on the files that need to be linted and signed off.
For new content:
python $app/str_subs.py -i $tolint_new -o $linted_new -c $app/config_new.xlsx
For trend content:
python $app/str_subs.py -i $tolint_trend -o $linted_trend -c $app/config_trend.xlsx
For XYZ batch:
python $app/str_subs.py -i $tolint_xyz -o $linted_xyz -c $app/config_xyz.xlsx
A log file will be written to the logs folder with an account of what was done.
To update config files:
- Save the PISA2025ft-batches data (first two columns) to
source/files.tsv
- Convert that TSV file to YAML format
python $app/tsv2yml.py -i $files_tsv -o $files_yml
where the variables stand for the following values:
- app:
/path/to/source-xml-linter
- files_tsv:
/path/to/pisa_2025ms_translation_common/source/files.tsv
- files_yml:
/path/to/pisa_2025ms_translation_common/source/files.yaml
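To give an idea of what the conversion involves, here is a minimal sketch that groups file names by batch and prints them as YAML. It is not the code in tsv2yml.py. It assumes the first column holds the batch name, the second the file name, that there is no header row, and that the YAML file maps each batch to a list of files; the real layout may differ. The sample names are made up.

```python
import csv
import io
from collections import defaultdict


def tsv_to_batches(tsv_text):
    """Group file names (column 2) under their batch name (column 1)."""
    batches = defaultdict(list)
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        # Skip rows that are empty or incomplete.
        if len(row) >= 2 and row[0].strip() and row[1].strip():
            batches[row[0].strip()].append(row[1].strip())
    return dict(batches)


def to_yaml(batches):
    """Serialise the mapping as plain YAML (names are written unquoted)."""
    lines = []
    for batch, files in batches.items():
        lines.append(f"{batch}:")
        lines.extend(f"  - {name}" for name in files)
    return "\n".join(lines) + "\n"


# Made-up sample data standing in for source/files.tsv.
sample = "batch_01\tunit_a.xml\nbatch_01\tunit_b.xml\nbatch_02\tunit_c.xml\n"
yaml_text = to_yaml(tsv_to_batches(sample))
print(yaml_text)
```

Writing the YAML by hand keeps the sketch free of dependencies, but names containing characters that are special in YAML (such as a colon followed by a space) would need quoting or a proper YAML library.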
To copy linted files to the assigned batch folder:
bash $app/add_file_to_batch.sh -a copy -o $linted -d $source -c $files_yml
Run the script without parameters to see the help:
usage: add_file_to_batch.sh [-a ACTION] [-c CONFIG] [-o ORIGIN] [-d DESTINATION]
Puts each file in the specified batch folder inside the destination parent folder.
parameters:
-a ACTION
action requested: either 'move' or 'copy'
-c CONFIG
absolute path to the configuration yaml file that indicates which folder each file belongs to,
-o ORIGIN
origin parent directory containing the files to be arranged in folders,
-d DESTINATION
destination parent directory where the folders containing the files should be written.
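The behaviour described in this help text can be illustrated in Python as below. This is a sketch of the idea only, not a port of add_file_to_batch.sh: the function name is hypothetical, the config is assumed to be already loaded as a mapping from batch name to file names, files are assumed to sit directly in the origin folder, and files listed in the config but absent from the origin are simply skipped.

```python
import shutil
import tempfile
from pathlib import Path


def add_files_to_batches(action, origin, destination, batches):
    """Copy or move each listed file from origin into destination/<batch>/."""
    if action not in ("copy", "move"):
        raise ValueError("action must be 'copy' or 'move'")
    transfer = shutil.copy2 if action == "copy" else shutil.move
    placed = []
    for batch, names in batches.items():
        for name in names:
            src = Path(origin) / name
            if not src.is_file():
                continue  # listed in the config but not present in origin
            target_dir = Path(destination) / batch
            target_dir.mkdir(parents=True, exist_ok=True)
            transfer(str(src), str(target_dir / name))
            placed.append(f"{batch}/{name}")
    return placed


# Demonstration in a temporary folder with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    origin = Path(tmp, "linted")
    origin.mkdir()
    (origin / "unit_a.xml").write_text("<unit/>", encoding="utf-8")
    config = {"batch_01": ["unit_a.xml", "unit_b.xml"]}
    placed = add_files_to_batches("copy", origin, Path(tmp, "source"), config)
    copied_ok = Path(tmp, "source", "batch_01", "unit_a.xml").is_file()
print(placed, copied_ok)
```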
To confirm that there are no mismatches between the batches-files config and the common repository, run:
bash $app/check_files_sync.sh -d $source -c $files_yml
Output will be written to file source/file_sync_YYYYMMDD.log
Run the script without parameters to see the help:
usage: check_files_sync.sh [-c CONFIG] [-d DIRECTORY]
Looks for mismatches between the batches-units config file and the folders/files in the common repository.
parameters:
-c CONFIG
absolute path to the configuration yaml file that indicates which folder each file belongs to,
-d DIRECTORY
directory containing the batch folders containing unit files.
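The kind of comparison this check performs can be sketched as a set difference between what the config expects and what is on disk. The code below is an illustration under assumptions, not the logic of check_files_sync.sh: the function name is hypothetical, the config is taken as a mapping from batch name to file names, and every subfolder of the directory is treated as a batch folder.

```python
import tempfile
from pathlib import Path


def find_mismatches(directory, batches):
    """Return (missing, unexpected) sets of 'batch/file' relative paths."""
    expected = {
        f"{batch}/{name}" for batch, names in batches.items() for name in names
    }
    actual = {
        f"{folder.name}/{path.name}"
        for folder in Path(directory).iterdir() if folder.is_dir()
        for path in folder.iterdir() if path.is_file()
    }
    # missing: in the config but not on disk; unexpected: on disk but not in the config
    return expected - actual, actual - expected


# Demonstration in a temporary folder with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    batch_dir = Path(tmp, "batch_01")
    batch_dir.mkdir()
    (batch_dir / "unit_a.xml").write_text("<unit/>", encoding="utf-8")
    (batch_dir / "extra.xml").write_text("<unit/>", encoding="utf-8")
    config = {"batch_01": ["unit_a.xml", "unit_b.xml"]}
    missing, unexpected = find_mismatches(tmp, config)
print(sorted(missing), sorted(unexpected))
```

In a real repository, hidden folders such as .git would need to be excluded before comparing.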
Tentative todo list:
- Parse XML input file and run the substitution only inside the text node (e.g. <label>)
- Make it a requirement that argument paths are absolute paths
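One possible approach to the first item, shown here only as a sketch and not as a planned implementation, is to parse the file with the standard library's xml.etree.ElementTree and apply the substitution to element text and tail strings, leaving tag names and attribute values untouched. The function name and the sample XML are made up. ElementTree discards comments and the XML declaration by default, so a real implementation might need a different parser.

```python
import re
import xml.etree.ElementTree as ET


def substitute_in_text_nodes(xml_string, pattern, replacement):
    """Apply a regex substitution to text content only, not tags or attributes."""
    root = ET.fromstring(xml_string)
    for element in root.iter():
        if element.text:   # text directly after the element's opening tag
            element.text = re.sub(pattern, replacement, element.text)
        if element.tail:   # text following the element's closing tag
            element.tail = re.sub(pattern, replacement, element.tail)
    return ET.tostring(root, encoding="unicode")


# The word "label" appears in a tag name, an attribute and a text node;
# only the text node should change.
sample = '<item label="label"><label>Old label</label></item>'
result = substitute_in_text_nodes(sample, r"label", "caption")
print(result)
```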