A Node.js-based converter that translates GEDCOM files into JSON, with a Linked Data (JSON-LD) context as well.

What?

The GEDCOM genealogy data file format is text-based but defines a hierarchical structure: the first value on each line is the "indent level" of that line's data. It therefore translates very easily into a JSON structure, which many services in this booming age of REST APIs understand more readily than GEDCOM files.
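As an illustration of how the level number drives the nesting, here is a minimal sketch of a level-based parser. It is not the code in convert.js; the function name `parseGedcom` and the node shape `{ tag, id, value, children }` are assumptions made for this example, and it expects well-formed input.

```javascript
// Minimal sketch: build a nested tree from GEDCOM lines using the leading level number.
// Assumes each line looks like "LEVEL [@XREF@] TAG [VALUE]".
function parseGedcom(text) {
  const root = { children: [] };
  const stack = [root]; // stack[n + 1] holds the most recent node seen at level n

  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === '') continue;

    const match = /^(\d+)\s+(?:(@[^@]+@)\s+)?(\S+)(?:\s(.*))?$/.exec(line);
    if (!match) continue; // skip lines this sketch does not understand

    const level = Number(match[1]);
    if (level + 1 > stack.length) continue; // level skipped ahead: no parent to attach to

    const node = { tag: match[3], children: [] };
    if (match[2]) node.id = match[2];
    if (match[4] !== undefined) node.value = match[4];

    stack.length = level + 1;          // drop back to this level's parent
    stack[level].children.push(node);  // attach to the parent
    stack.push(node);                  // this node is the parent for level + 1
  }
  return root.children;
}

const sample = [
  '0 @I101@ INDI',
  '1 NAME John /Smith/',
  '1 BIRT',
  '2 DATE 1 APR 1900',
].join('\n');
console.log(JSON.stringify(parseGedcom(sample), null, 2));
```

Each level-0 line starts a new top-level record, and every deeper line becomes a child of the most recent line one level above it.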

The JSON-LD specification is an extension of JSON that adds context for associating the data with the Semantic Web/Linked Data web. This converter maps terms from a few ontologies onto various GEDCOM tags:

  • Friend of a Friend (foaf): People and common relations between them.
  • Relationship (rel): Deeper relationship terms for relating two people.
  • Biography (bio): Vocabulary for enumerating events in a person's life and participants in those events (GitHub Source).
  • Dublin Core (dc): Vocabulary for citing sources and dates.

Usage

Output JSON:

node convert.js myFamilyTree.ged

Save JSON to a file:

node convert.js myFamilyTree.ged > myFamilyTree.json

The output structure of the convert.js script looks like:

{
  "@context": {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rel": "http://purl.org/vocab/relationship",
    "bio": "http://purl.org/vocab/bio/0.1/",
    "dc": "http://purl.org/dc/elements/1.1/"
  },
  "@graph": [
    {
      "@id": "_:I101",
      "@type": "foaf:Person",
      "foaf:name": "John /Smith/",
      "foaf:gender": "M",
      "bio:event": {
        "@type": "bio:Birth",
        "DATE": "1 APR 1900",
        "bio:principal": {
          "@id": "_:I101"
        }
      },
      "bio:relationship": {
        "@id": "_:F101"
      }
    },
    {
      "@id": "_:F101",
      "@type": "bio:Relationship",
      "bio:participant": [
        {
          "@id": "_:I101"
        },
        {
          "@id": "_:I102"
        }
      ]
    }
  ]
}

To be parsed into RDF, it will need an output structure like:

[
  {
    "@context": {
      "foaf": "http://xmlns.com/foaf/0.1/",
      "rel": "http://purl.org/vocab/relationship",
      "bio": "http://purl.org/vocab/bio/0.1/",
      "dc": "http://purl.org/dc/elements/1.1/"
    },
    "@id": "_:I101",
    "@type": "foaf:Person",
    "bio:relationship": {
      "@id": "_:F101"
    },
    "foaf:gender": "F",
    "foaf:name": "Jane /Smith/"
  },
  {
    "@context": {
      "foaf": "http://xmlns.com/foaf/0.1/",
      "rel": "http://purl.org/vocab/relationship",
      "bio": "http://purl.org/vocab/bio/0.1/",
      "dc": "http://purl.org/dc/elements/1.1/"
    },
    "@id": "_:I102",
    "@type": "foaf:Person",
    "bio:child": {
      "@id": "_:I103"
    },
    "bio:relationship": {
      "@id": "_:F101"
    },
    "foaf:gender": "F",
    "foaf:name": "Betty /Smith/"
  }
]

That is, return a list of objects in which every object has its own @context set. A converter such as riot --output=RDF/XML ged.jsonld can then convert it to RDF/XML. (TODO)
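Since this step is still marked TODO, the following is only a sketch of one way to reshape the current output into that list form. The function name `toNodeList` is hypothetical and is not part of convert.js.

```javascript
// Sketch: copy the shared @context onto every node of @graph and return a plain list.
function toNodeList(doc) {
  const context = doc['@context'];
  const graph = doc['@graph'] || [];
  return graph.map((node) => ({ '@context': context, ...node }));
}

const converted = {
  '@context': { foaf: 'http://xmlns.com/foaf/0.1/' },
  '@graph': [
    { '@id': '_:I101', '@type': 'foaf:Person', 'foaf:name': 'Jane /Smith/' },
    { '@id': '_:I102', '@type': 'foaf:Person', 'foaf:name': 'Betty /Smith/' },
  ],
};
console.log(JSON.stringify(toNodeList(converted), null, 2));
```

Every node shares the same context object in memory, but once serialized each entry carries its own full copy.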

Don't care about Semantic data?

Grab the @graph property from the resulting JSON; it is an array of JSON objects. Objects with a @type of foaf:Person are INDI records in the original GEDCOM, and objects with a @type of bio:Relationship are FAM records. Between those two types, all the properties of the original data file should be present.
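A minimal sketch of that filtering, using an inline document in place of real converter output. The helper names `hasType` and `splitByType` are made up for this example.

```javascript
// Sketch: separate the @graph array into individuals (INDI) and families (FAM).
function hasType(node, type) {
  const value = node['@type'];
  // @type may be a single string or an array of strings in JSON-LD.
  return Array.isArray(value) ? value.includes(type) : value === type;
}

function splitByType(doc) {
  const graph = doc['@graph'] || [];
  return {
    individuals: graph.filter((node) => hasType(node, 'foaf:Person')),
    families: graph.filter((node) => hasType(node, 'bio:Relationship')),
  };
}

const result = splitByType({
  '@graph': [
    { '@id': '_:I101', '@type': 'foaf:Person', 'foaf:name': 'John /Smith/' },
    { '@id': '_:F101', '@type': 'bio:Relationship' },
  ],
});
console.log(result.individuals.length, result.families.length);
```

With a saved file, the same function could be applied to `JSON.parse(fs.readFileSync('myFamilyTree.json', 'utf8'))`.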

Mapping

  • CONT items are concatenated onto their parent items with a line break
  • TIME items are concatenated onto their parent DATE items with a space
  • Events on an INDI have that individual as bio:principal

| GEDCOM | Linked Data | Note |
| --- | --- | --- |
| INDI | foaf:Person | |
| INDI.NAME | foaf:name | |
| INDI.SEX | foaf:gender | |
| INDI.BIRT | bio:Birth | |
| INDI.CHR | bio:Baptism | |
| INDI.CHRA | bio:Baptism | |
| INDI.BAPM | bio:Baptism | |
| INDI.BLES | bio:Baptism | |
| INDI.DEAT | bio:Death | |
| INDI.BURI | bio:Burial | |
| INDI.CREM | bio:Cremation | |
| INDI.ADOP | bio:Adoption | |
| INDI.BARM | bio:BarMitzvah | |
| INDI.BASM | bio:BasMitzvah | |
| INDI.CONF | bio:IndividualEvent | Confirmation |
| INDI.FCOM | bio:IndividualEvent | First Communion |
| INDI.ORDN | bio:Ordination | |
| INDI.NATU | bio:Naturalization | |
| INDI.EMIG | bio:Emigration | |
| INDI.IMMI | bio:IndividualEvent | Immigration |
| INDI.CENS | bio:GroupEvent | Census |
| INDI.PROB | bio:IndividualEvent | Probate |
| INDI.WILL | bio:IndividualEvent | Will |
| INDI.GRAD | bio:Graduation | |
| INDI.RETI | bio:Retirement | |
| INDI.EVEN | bio:IndividualEvent | |
| FAM | bio:Relationship | |
| FAM.HUSB | bio:participant | Both husband and wife become bio:participants on the FAM Relationship; to find the gender, reference the related foaf:Person. |
| FAM.WIFE | bio:participant | Both husband and wife become bio:participants on the FAM Relationship; to find the gender, reference the related foaf:Person. |
| FAM.ANUL | bio:Annulment | |
| FAM.CENS | bio:GroupEvent | Census |
| FAM.DIV | bio:Divorce | |
| FAM.DIVF | bio:GroupEvent | Divorce filed |
| FAM.ENGA | bio:GroupEvent | Engagement |
| FAM.MARR | bio:Marriage | |
| FAM.MARB | bio:GroupEvent | Marriage Announcement |
| FAM.MARC | bio:GroupEvent | Marriage Contract |
| FAM.MARL | bio:GroupEvent | Marriage License |
| FAM.MARS | bio:GroupEvent | Marriage Settlement |
| FAM.EVEN | bio:GroupEvent | |
| DATE | dc:date | |
| SOUR | dc:source | Property on an object that points to the Source object |
| SOUR | dc:BibliographicResource | Class that the above points to |
| SOUR.DATA | dc:coverage | |
| SOUR.DATA.DATE | dc:temporal | |
| SOUR.AUTH | dc:creator | |
| SOUR.TITL | dc:title | |
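The CONT and TIME rules listed at the top of this section can be sketched as a small tree pass. This is an illustration rather than the converter's actual code: the node shape `{ tag, value, children }` and the name `mergeContinuations` are assumptions.

```javascript
// Sketch: fold CONT and TIME children into their parent's value.
// Assumed node shape: { tag: string, value?: string, children: Node[] }.
function mergeContinuations(node) {
  const kept = [];
  for (const child of node.children || []) {
    if (child.tag === 'CONT') {
      // CONT: append to the parent with a line break.
      node.value = (node.value || '') + '\n' + (child.value || '');
    } else if (child.tag === 'TIME' && node.tag === 'DATE') {
      // TIME: append to the parent DATE with a space.
      node.value = (node.value || '') + ' ' + (child.value || '');
    } else {
      mergeContinuations(child);
      kept.push(child);
    }
  }
  node.children = kept;
  return node;
}

const note = {
  tag: 'NOTE',
  value: 'First line',
  children: [{ tag: 'CONT', value: 'Second line', children: [] }],
};
console.log(JSON.stringify(mergeContinuations(note).value));
```

The merged CONT and TIME nodes are removed from `children`, so later mapping steps only see the combined value.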

Linkages

The GEDCOM format links individuals through FAM objects, with the HUSB, WIFE, and CHIL references pointing to the various individuals, rather than individuals referencing each other. This is useful for drawing family tree diagrams, as the parents are usually arranged horizontally and joined to a central node, which the children's lines sprout from.

But for traversing person-to-person relationships, it adds a needless step. The conversion script adds rel:childOf, rel:siblingOf, rel:spouseOf, and rel:parentOf to the individual (foaf:Person) objects, so FAM/bio:Marriage objects can be bypassed if desired. Where applicable, the stricter bio:child, bio:father, and bio:mother are used instead.

  • CHIL tags are left on the FAM (bio:Relationship) object to preserve the data of which marriage a child came from.

  • If the FAM object has an ANUL tag, no rel:spouseOf relations are generated. (TODO)

  • If the FAM object has an ENGA tag, but no MARR tag, rel:engagedTo is used instead of rel:spouseOf.

  • If the FAM object has no ENGA and no MARR tag, no rel:spouseOf or rel:engagedTo are created between the parents, but any children get the proper rel:childOf and rel:siblingOf relations added.

  • If the INDI object has an FAMC tag with PEDI set to 'natural' or 'birth', bio:child/father/mother tags are used instead of rel:childOf/parentOf.

  • If the FAM.CHIL object has _MREL or _FREL attributes (used by Family Tree Maker software to indicate pedigree) set to 'natural', bio:child/father/mother tags are used instead of rel:childOf/parentOf.

  • If an ANUL, DIV, or DIVF exists on a FAM object, the bio:concludingEvent of that bio:Marriage is set to that event. If both DIV and DIVF exist, DIV takes precedence as the concluding event. (TODO)

  • If one of the partners in a bio:Marriage has a Death event (or, if both do, whichever Death occurred first), that Death event is set as the bio:concludingEvent for the bio:Marriage if no ANUL, DIV, or DIVF exists. (TODO)

  • If DEAT and BURI or CREM exist, bio:followingEvent and bio:precedingEvent relationships are added. (TODO)
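The rules above for choosing relationship predicates can be condensed into two small decision functions. This is a hedged sketch, not the converter's code: the function names, the `famTags` array, and the returned object shape are all hypothetical. It also simplifies by treating PEDI and _MREL/_FREL values the same way and comparing them case-insensitively.

```javascript
// Sketch: which predicate links the two partners of a FAM record?
// `famTags` is a hypothetical list of event tags present on the FAM, e.g. ['ENGA', 'MARR'].
function partnerPredicate(famTags) {
  const tags = new Set(famTags);
  if (tags.has('MARR')) {
    // An annulled marriage produces no rel:spouseOf (this rule is marked TODO above).
    return tags.has('ANUL') ? null : 'rel:spouseOf';
  }
  if (tags.has('ENGA')) return 'rel:engagedTo';
  return null; // neither ENGA nor MARR: no partner relation
}

// Sketch: which predicates link a parent and a child?
// `pedigree` is the PEDI (or _MREL/_FREL) value; `parentSex` is the parent's SEX value.
function parentChildPredicates(pedigree, parentSex) {
  const value = String(pedigree || '').toLowerCase();
  const isBirthParent = value === 'natural' || value === 'birth';
  if (!isBirthParent) {
    return { onParent: 'rel:parentOf', onChild: 'rel:childOf' };
  }
  // Assumption: fall back to rel:childOf when the parent's sex is unknown.
  const onChild =
    parentSex === 'M' ? 'bio:father' : parentSex === 'F' ? 'bio:mother' : 'rel:childOf';
  return { onParent: 'bio:child', onChild };
}

console.log(partnerPredicate(['ENGA']));
console.log(parentChildPredicates('natural', 'F'));
```

Children still receive rel:childOf and rel:siblingOf when `partnerPredicate` returns null, since those relations do not depend on the parents being married.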

There are a few places in the GEDCOM structure that break the standard linkage between nodes that an RDF graph has. Namely, the INDI.FAMC.PEDI (Pedigree) and INDI.FAMC.STAT (Status) tags break the standard INDI.FAMC linkage. The PEDI and STAT attributes are not attributes of the FAM referenced by the FAMC ID, but rather attributes of the link that individual has with that family, which doesn't work well in JSON-LD. Technically, it's a reification of the link.

SOUR tags have the same situation; they are added onto a link to another node, and modify the link, rather than either of the nodes.

So, to get that to work properly, when an object (e.g. a foaf:name property on a foaf:Person) has a SOUR property, the parent object (foaf:Person in this example) gets a GEDREIF property with a value of:

{
  "@type": "rdf:Statement",
  "rdf:subject": "_:I101",
  "rdf:predicate": "foaf:name",
  "rdf:object": "John Smith",
  "dc:source": "_:S101"
}

If there are multiple SOUR references for that object, that property becomes an array of objects. If multiple SOUR references have the same ID, the rdf:predicate for that SOUR becomes an array of properties that source affects. (TODO)
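A sketch of how such a statement could be built and attached, including the switch from a single object to an array when a second SOUR reference arrives. The helper names `reifySource` and `addReification` are hypothetical, and the same-ID merging marked TODO above is not attempted here.

```javascript
// Sketch: build an rdf:Statement that attaches a source to one property of a node.
function reifySource(subjectId, predicate, object, sourceId) {
  return {
    '@type': 'rdf:Statement',
    'rdf:subject': subjectId,
    'rdf:predicate': predicate,
    'rdf:object': object,
    'dc:source': sourceId,
  };
}

// Sketch: store the statement under GEDREIF, promoting to an array on the second one.
function addReification(node, statement) {
  const existing = node.GEDREIF;
  if (existing === undefined) {
    node.GEDREIF = statement;
  } else if (Array.isArray(existing)) {
    existing.push(statement);
  } else {
    node.GEDREIF = [existing, statement];
  }
  return node;
}

const person = { '@id': '_:I101', '@type': 'foaf:Person', 'foaf:name': 'John Smith' };
addReification(person, reifySource('_:I101', 'foaf:name', 'John Smith', '_:S101'));
addReification(person, reifySource('_:I101', 'foaf:name', 'John Smith', '_:S102'));
console.log(JSON.stringify(person.GEDREIF, null, 2));
```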

For pedigree information on an INDI.FAMC, the INDI object gets a GEDREIF attribute, which is set to: (TODO)

{
  "@type": "rdf:Statement",
  "rdf:subject": "_:I101",
  "rdf:predicate": "FAMC",
  "rdf:object": "_:F101",
  "dc:description": "natural"
}

The GEDCOM specification also allows an INDI.NAME to be broken down into more specific parts. For example, an INDI whose NAME carries additional GIVN and SURN tags:

{
  "@type": "rdf:Statement",
  "rdf:subject": "_:I101",
  "rdf:predicate": "foaf:name",
  "rdf:object": "John Smith",
  "GIVN": "John",
  "SURN": "Smith"
}

Visualization ideas:

  • Pedigree tree: D3 "elbow dendrogram" using the "tree" D3 layout.
  • D3 smart force labels: Adding functionality to have labels "orbit" their node and repel each other, so they stay out of each other's way.

Other Resources