Commit

Merge pull request #44 from aarhusstadsarkiv/aca-fmt-23
aca-fmt/23 - Microsoft Word XML Document
clausjuhl authored Jun 20, 2024
2 parents cf397c6 + 9e7f7bc commit 88d37b0
Showing 2 changed files with 20 additions and 1 deletion.
7 changes: 7 additions & 0 deletions custom_signatures.yml
@@ -112,3 +112,10 @@
puid: aca-fmt/22
signature: Windows Compressed Enhanced Metafile, usually image file
description: .emz files are actually .gz files, which are identified with a 10-byte header, containing a magic number (1f 8b), a compression ID (08 for DEFLATE which is normal), and a variety of timestamps and flags. .emz files are compressed image files that we can convert directly with libreoffice.
- bof: (?i)^3c3f786d6c(3[^e]|[^3].)*3e(0a|20)*3c3f6d736f2d6170706c69636174696f6e2070726f6769643d22576f72642e446f63756d656e74223f3e
plain_bof: |
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<?mso-application progid="Word.Document"?>
puid: aca-fmt/23
signature: Microsoft Word XML Document
description: Microsoft Word allows exporting documents as standalone XML files, which PRONOM incorrectly identifies as plain XML (fmt/101)
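
The `bof` pattern above reads as a regular expression over hex digits rather than raw bytes: `3c3f786d6c` is `<?xml`, `(3[^e]|[^3].)*` consumes byte pairs up to the first `>` (`3e`), and the tail spells `<?mso-application progid="Word.Document"?>`. A minimal sketch of how such a pattern could be checked, assuming the tool hex-encodes the beginning of the file before matching (the helper name `matches_word_xml` and the sample inputs are hypothetical, not taken from the repository):

```python
import re

# Pattern copied from the aca-fmt/23 entry in custom_signatures.yml,
# split over several string literals for readability only.
BOF_PATTERN = re.compile(
    r"(?i)^3c3f786d6c(3[^e]|[^3].)*3e(0a|20)*"
    r"3c3f6d736f2d6170706c69636174696f6e2070726f6769643d22"
    r"576f72642e446f63756d656e74223f3e"
)


def matches_word_xml(head: bytes) -> bool:
    """Hypothetical helper: hex-encode the first bytes and test the pattern.

    Assumption: signatures are matched against a lowercase hex string of
    the start of the file; the real tool may read or encode it differently.
    """
    return BOF_PATTERN.match(head.hex()) is not None


# Sample inputs made up for illustration.
word_xml = (
    b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
    b'<?mso-application progid="Word.Document"?>\n<w:wordDocument/>'
)
plain_xml = b'<?xml version="1.0"?>\n<root/>'

print(matches_word_xml(word_xml))
print(matches_word_xml(plain_xml))
```

Because every alternative in `(3[^e]|[^3].)*` consumes exactly two hex digits, the match stays aligned on byte boundaries and cannot run past the end of the XML declaration.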
14 changes: 13 additions & 1 deletion fileformats.yml
@@ -115,6 +115,15 @@ aca-fmt/22:
converter_type: master
outputs:
- jpg
aca-fmt/23:
name: Microsoft Word XML Document
description: A Microsoft Word document saved as a standalone XML file
action: convert
convert:
- converter: libre
converter_type: master
outputs:
- odt
fmt/3:
name: Graphics Interchange Format 87a
action: convert
@@ -858,7 +867,7 @@ fmt/100:
- pdf
fmt/101:
name: Extensible Markup Language 1.0
- action: convert
+ action: reidentify
convert:
- converter: text
converter_type: statutory
@@ -868,6 +877,9 @@ fmt/101:
converter_type: master
outputs:
- _
reidentify:
reason: Some applications allow saving documents as XML and can re-open them
onfail: convert
fmt/102:
name: Extensible Hypertext Markup Language 1.0
action: convert
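
The fmt/101 change above switches the action to `reidentify` with `onfail: convert`. One plausible reading, inferred only from the key names and the `reason` text, is that the file is checked again against the custom signatures and falls back to the `onfail` action when nothing more specific is found. A rough sketch of that flow; `FormatRule`, `resolve_action` and the `reidentify` callback are hypothetical names and do not come from the repository:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class FormatRule:
    """Hypothetical, reduced view of one entry in fileformats.yml."""
    action: str
    onfail: Optional[str] = None


def resolve_action(
    puid: str,
    rules: Dict[str, FormatRule],
    reidentify: Callable[[], Optional[str]],
) -> Tuple[str, Optional[str]]:
    """Return the (puid, action) pair to use for a file.

    Assumption: "reidentify" means "try the custom signatures again";
    if that yields no known PUID, the rule's onfail action applies.
    """
    rule = rules[puid]
    if rule.action != "reidentify":
        return puid, rule.action
    new_puid = reidentify()
    if new_puid is not None and new_puid in rules:
        return new_puid, rules[new_puid].action
    return puid, rule.onfail


rules = {
    "fmt/101": FormatRule(action="reidentify", onfail="convert"),
    "aca-fmt/23": FormatRule(action="convert"),
}

# A Word XML export is picked up by the custom signature.
print(resolve_action("fmt/101", rules, lambda: "aca-fmt/23"))
# Ordinary XML matches no custom signature and keeps the fmt/101 converters.
print(resolve_action("fmt/101", rules, lambda: None))
```

Under this reading, the existing `convert` block of fmt/101 stays in place because it is still needed for the fallback case.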