This folder contains the Python scripts used during the project.
- decoupage_pliegos: Script to crop and rename the images of the Pliegos Varios corpus.
- creation_masterfiles: Concatenation of several XML files into a single XML file, called a masterfile, using XInclude.
- iiif_tei_additions: Insertion of IIIF URIs (stored in a CSV file) into XML-TEI files.
- extractionWoodcutsAltoXml: Extraction of data about woodcuts from ALTO-XML files (coordinates, document identifier, page number) and creation of an Excel worksheet.
- extractionWoodcutsPageXML: Extraction of data about woodcuts from PAGE-XML files (coordinates, document identifier, page number) and creation of an Excel worksheet.
- Excel_To_TEI: Creation of XML-TEI files from an Excel worksheet (used for the woodcuts catalogue).
- TEI_To_Excel: Extraction of data from XML-TEI files (title, date, printer) and addition of these data to a pre-existing Excel worksheet (used for the woodcuts catalogue).
- iiif_tei_illustration: Insertion of IIIF URIs (stored in a CSV file) into XML-TEI files, and modification of each URI to target a specific portion of an image, based on the coordinates given by the XML files.
- webScrapping: This code automatically extracts information from the metadata of a digital library (in our case, title of documents and IIIF links).
- NerToWkd: This code can be used to launch several SPARQL queries on the Wikidata endpoint from a list of names.
- csv2json: This code generates a JSON file in Linked Places format from a CSV file. The resulting JSON file is used to display places on a map created with the Peripleo application.
- ner2csv: This code transforms an IOB (Inside-Outside-Beginning) format file into a CSV file. It keeps only the named entities and reassembles entities that span several tokens (e.g. "Santa B-LOC Ana I-LOC" becomes "Santa Ana" in the CSV).
- ner2tei: This code enriches a CSV file with new information extracted from TEI XML files. It also inserts geographic information contained in the CSV into these TEI files: in the body of the text with `<name>` elements and in the `<teiHeader>` with a `<listPlace>` element.
- ner2tei_index: This code creates a TEI index of place names from a CSV file.
- ner_count: This code counts the number of occurrences of a word in a list and outputs the results in a spreadsheet.
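
To illustrate the kind of transformation ner2csv performs, here is a minimal sketch of merging IOB tags into whole entities. It is not the project's script: the function names, the CSV column names, and the assumed input layout (one token and one tag per line, separated by whitespace) are all hypothetical.

```python
import csv
import io


def iob_to_entities(lines):
    """Merge IOB-tagged tokens into (entity, label) pairs.

    Assumption: each line holds one token and one tag, e.g. "Santa B-LOC".
    """
    entities = []
    tokens, label = [], None

    def flush():
        # Store the entity collected so far, if any, and reset the buffer.
        nonlocal tokens, label
        if tokens:
            entities.append((" ".join(tokens), label))
        tokens, label = [], None

    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            # Blank or malformed line: close the current entity.
            flush()
            continue
        token, tag = parts
        if tag.startswith("B-"):
            # A new entity begins; close the previous one first.
            flush()
            tokens, label = [token], tag[2:]
        elif tag.startswith("I-") and tokens and tag[2:] == label:
            # Continuation of the current entity.
            tokens.append(token)
        else:
            # "O" tag, or an "I-" tag with no matching "B-": not kept.
            flush()
    flush()
    return entities


def entities_to_csv(entities):
    """Serialise the entities as CSV text (hypothetical column names)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["entity", "label"])
    writer.writerows(entities)
    return buffer.getvalue()


sample = ["Fue O", "a O", "Santa B-LOC", "Ana I-LOC", "con O", "Pedro B-PER"]
print(entities_to_csv(iob_to_entities(sample)))
```

In this sketch an `I-` tag that does not follow a matching `B-` tag is discarded rather than promoted to a new entity; the actual script may handle that case differently.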
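
The URI modification described for iiif_tei_illustration can be sketched as follows. This is only an illustration under stated assumptions: it supposes that the stored URIs follow the IIIF Image API pattern `{identifier}/{region}/{size}/{rotation}/{quality}.{format}` and that the woodcut coordinates are pixel values. The function name and the example URI are hypothetical.

```python
def iiif_region_uri(image_uri, x, y, width, height):
    """Return a copy of an IIIF Image API URI that targets a pixel region.

    Assumption: the URI ends with the four Image API segments
    region/size/rotation/quality.format, e.g. ".../full/full/0/default.jpg".
    """
    parts = image_uri.rstrip("/").split("/")
    if len(parts) < 5:
        raise ValueError("not enough path segments for an IIIF image URI")
    # The region is the fourth segment from the end; replace it with x,y,w,h.
    parts[-4] = f"{x},{y},{width},{height}"
    return "/".join(parts)


# Hypothetical URI, used only to show the transformation.
full_uri = "https://example.org/iiif/abc123/full/full/0/default.jpg"
print(iiif_region_uri(full_uri, 10, 20, 300, 400))
```

If the coordinates in the XML files are expressed in a unit other than pixels, they would need to be converted before being written into the region segment.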