Python scripts

This folder contains the Python scripts used during the project.

Chapbook processing

  • decoupage_pliegos: Crops and renames the images of the Pliegos Varios corpus.
  • creation_masterfiles: Concatenates several XML files into a single XML file, called a masterfile, using XInclude.
  • iiif_tei_additions: Inserts IIIF URIs, stored in a CSV file, into XML-TEI files.
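
The masterfile idea behind creation_masterfiles can be sketched as follows. This is only an illustration, not the script itself: the function name `build_masterfile`, the `teiCorpus` root element and the file names are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

XI_NS = "http://www.w3.org/2001/XInclude"
TEI_NS = "http://www.tei-c.org/ns/1.0"


def build_masterfile(hrefs):
    """Return an XML string whose root holds one xi:include per file.

    The <teiCorpus> root is an assumption made for this sketch; the
    actual script may use a different wrapper element.
    """
    # Register prefixes so the output uses xmlns and xmlns:xi
    # instead of auto-generated ns0/ns1 prefixes.
    ET.register_namespace("", TEI_NS)
    ET.register_namespace("xi", XI_NS)

    root = ET.Element(f"{{{TEI_NS}}}teiCorpus")
    for href in hrefs:
        # Each include points at one of the XML files to concatenate.
        ET.SubElement(root, f"{{{XI_NS}}}include", {"href": href})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(build_masterfile(["chapbook_001.xml", "chapbook_002.xml"]))
```

The includes are only resolved when the masterfile is read by an XInclude-aware processor; the sketch just writes the references.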

Woodcut processing

  • extractionWoodcutsAltoXml: Extracts data about woodcuts (coordinates, document identifier, page number) from ALTO-XML files and writes them to an Excel worksheet.
  • extractionWoodcutsPageXML: Extracts data about woodcuts (coordinates, document identifier, page number) from PAGE-XML files and writes them to an Excel worksheet.
  • Excel_To_TEI: Creates XML-TEI files from an Excel worksheet (used for the woodcuts catalogue).
  • TEI_To_Excel: Extracts data (title, date, printer) from XML-TEI files and adds them to an existing Excel worksheet (used for the woodcuts catalogue).
  • iiif_tei_illustration: Inserts IIIF URIs, stored in a CSV file, into an XML-TEI file and modifies each URI to target a specific portion of an image, based on the coordinates given by the XML files.
  • webScrapping: Automatically extracts information from the metadata of a digital library (in our case, document titles and IIIF links).
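
The URI modification described for iiif_tei_illustration can be illustrated with a minimal sketch. It assumes the URIs follow the IIIF Image API pattern `{base}/{region}/{size}/{rotation}/{quality}.{format}`; the function name `iiif_region_uri` and the example URI are hypothetical and not taken from the script.

```python
def iiif_region_uri(uri, x, y, w, h):
    """Replace the region segment of a IIIF Image API URI with x,y,w,h.

    Assumes the URI ends with /{region}/{size}/{rotation}/{quality}.{format}.
    """
    # Split off the last four path segments; everything before them
    # (scheme, server, prefix, identifier) stays untouched.
    parts = uri.rsplit("/", 4)
    if len(parts) != 5:
        raise ValueError(f"not a IIIF image URI: {uri!r}")
    base, _region, size, rotation, quality = parts
    region = f"{x},{y},{w},{h}"  # pixel coordinates of the woodcut
    return "/".join([base, region, size, rotation, quality])


if __name__ == "__main__":
    # Hypothetical URI and coordinates, for illustration only.
    full = "https://example.org/iiif/img1/full/full/0/default.jpg"
    print(iiif_region_uri(full, 10, 20, 300, 400))
```

Keeping the size, rotation and quality segments unchanged means only the cropped area differs from the full-page URI.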

NER processing

  • NerToWkd: Runs several SPARQL queries against the Wikidata endpoint from a list of names.
  • csv2json: Generates a JSON file in Linked Places format from a CSV file. The resulting JSON file is used to display places on a map created with the Peripleo application.
  • ner2csv: Transforms a file in IOB (Inside-Outside-Beginning) format into a CSV file. It keeps only the named entities and reassembles entities that span several tokens (e.g. "Santa B-LOC Ana I-LOC" becomes "Santa Ana" in the CSV).
  • ner2tei: Enriches a CSV file with new information extracted from TEI XML files. It also inserts the geographic information contained in the CSV into these TEI files: in the body of the text with <name> elements and in the <teiHeader> with a <listPlace> element.
  • ner2tei_index: Creates a TEI index of place names from a CSV file.
  • ner_count: Counts the number of occurrences of a word in a list and writes the results to a spreadsheet.
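
The entity reconstruction described for ner2csv can be sketched as below. This is a minimal illustration, assuming the IOB input has already been read into (token, tag) pairs; the function name `iob_to_entities` is hypothetical, and the real script may handle edge cases differently.

```python
def iob_to_entities(pairs):
    """Group (token, tag) pairs in IOB format into (text, label) entities."""
    entities = []
    tokens, label = [], None

    def flush():
        # Store the entity collected so far, if any.
        if tokens:
            entities.append((" ".join(tokens), label))

    for token, tag in pairs:
        if tag.startswith("B-"):
            # A B- tag always opens a new entity.
            flush()
            tokens, label = [token], tag[2:]
        elif tag.startswith("I-") and tokens and tag[2:] == label:
            # An I- tag with the same label continues the open entity.
            tokens.append(token)
        else:
            # "O" tags and stray I- tags close the open entity and are dropped.
            flush()
            tokens, label = [], None
    flush()
    return entities


if __name__ == "__main__":
    sample = [("Santa", "B-LOC"), ("Ana", "I-LOC"), ("es", "O")]
    print(iob_to_entities(sample))  # [('Santa Ana', 'LOC')]
```

Each resulting pair can then be written as one CSV row, for example with `csv.writer` from the standard library.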