Fill in unused identifiers on semantic structures #274

Open
goodmami opened this issue Jan 20, 2020 · 0 comments

For a long time PyDelphin has included a slot on MRS, DMRS, and EDS for the 'identifier' field ('ident' in the original DTDs), which is basically unused. The field only gets filled in if it is encoded in a representation that is read in. There are a few comments about it in the LKB code. Here's one from lingo/lkb/src/rmrs/dtd-notes.txt:

ident is an attribute on rmrs's to identify which utterance they
belong with. The HoG currently uses a wrapper around the RMRS, with
identifying information there instead. Hinoki uses the ident
identifier but may switch to a wrapper, in which case ident may be
removed. In any case it is optional.

(note that XMT uses the HoG strategy)

And in lingo/lkb/src/tsdb/lisp/redwoods.lisp, it seems to be formatted using a few other fields:

      for ident = (format nil "~a @ ~a~@[ @ ~a~]" i-id result-id i-comment)
      [...]
              (mrs::output-rmrs1
               (mrs::mrs-to-rmrs mrs)
               'mrs::xml out nil nil i-input ident)
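
For reference, here is a rough Python reading of that `format` call. In Common Lisp, `~@[ ... ~]` only emits its clause when the next argument is non-nil, so the comment part is optional. The function name `format_ident` and the use of `None` for a missing comment are assumptions for illustration, not anything in PyDelphin or the LKB:

```python
from typing import Optional


def format_ident(i_id: int, result_id: int, i_comment: Optional[str] = None) -> str:
    """Mimic (format nil "~a @ ~a~@[ @ ~a~]" i-id result-id i-comment).

    Hypothetical helper: the " @ <comment>" suffix is appended only when
    a comment is present, mirroring the ~@[ ... ~] conditional directive.
    """
    ident = f"{i_id} @ {result_id}"
    if i_comment is not None:
        ident += f" @ {i_comment}"
    return ident


print(format_ident(10, 0))
print(format_ident(10, 0, "some comment"))
```

Note that an empty string is still non-nil in Lisp, which is why the sketch tests `is not None` rather than truthiness.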

And in lingo/lkb/src/rmrs/dmrs.lisp, it (as far as I can tell) uses the first column of a [incr tsdb()] file:

              (let ((scount (extract-fine-system-number fsout))
              [...]
                        (setf (dmrs-ident dmrs) (format nil "~A" scount))
[...]
(defun extract-fine-system-number (str)
  ;;; compare extract-fine-system-sentence
  (let ((apos (position #\@ str)))
        (if apos
            (parse-integer (subseq str 0 apos) :junk-allowed t))))
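
As I read it, that function takes whatever precedes the first `@` and parses a leading integer out of it, returning nil when there is no `@` or no digits. A rough Python sketch of the same logic follows; the regex is my approximation of `parse-integer` with `:junk-allowed t`, not an exact port:

```python
import re
from typing import Optional


def extract_fine_system_number(s: str) -> Optional[int]:
    """Approximate the Lisp extract-fine-system-number shown above.

    Returns the integer at the start of the text before the first '@',
    or None if there is no '@' or no leading integer.
    """
    apos = s.find("@")
    if apos == -1:
        return None
    # Like :junk-allowed t, skip leading whitespace, accept an optional
    # sign and digits, and ignore any trailing junk.
    match = re.match(r"\s*([+-]?\d+)", s[:apos])
    return int(match.group(1)) if match else None


print(extract_fine_system_number("10 @ 0 @ some comment"))
```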

In PyDelphin, not all codecs can handle identifiers (the PENMAN ones don't, nor do any EDS ones). These identifiers could be useful for, e.g., exporting a corpus of *MRS representations in which each one records the item it came from.

It seems like the appropriate form of the identifier may depend on the task. In some cases, just an i-id from a profile would be enough, while for others a parse-id and result-id may be needed to distinguish among multiple MRSs from one item.
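
To make the task-dependence concrete, here is one possible shape for it. This is only a sketch: `make_identifier` is a hypothetical helper, and the `" @ "` separator is borrowed from the LKB format above as an assumption, not a settled format:

```python
from typing import Optional


def make_identifier(
    i_id: int,
    parse_id: Optional[int] = None,
    result_id: Optional[int] = None,
) -> str:
    """Build an identifier from whichever profile fields the task needs.

    Hypothetical: i_id alone is enough when there is one *MRS per item;
    parse_id and result_id can be added to tell apart multiple results.
    """
    parts = [i_id, parse_id, result_id]
    return " @ ".join(str(p) for p in parts if p is not None)


print(make_identifier(10))
print(make_identifier(10, parse_id=10, result_id=2))
```

One drawback of joining only the fields that are present is that the result is ambiguous to parse back (is the second field a parse-id or a result-id?), so a real format would probably need fixed positions or labels.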
