Add support for TDL :begin ... :end environments #183

Closed
fcbond opened this issue Sep 28, 2018 · 3 comments

fcbond commented Sep 28, 2018

TDL is also used in the setup files for PET and ACE, with slightly different expectations.

PyDelphin cannot parse these files, e.g., zhong/cmn/zhs/zhs-pet-mal.tdl:
delphin.exceptions.TdlParsingError: At ?:9 (type/rule definition)
Syntax error:
:begin :instance :status lexical-filtering-rule.

In an ideal world it would be nice to be able to parse these and use them to decide which files to parse to build a grammar model, ...

@goodmami

Arguably those are not part of "DELPH-IN TDL" (as I'm calling the TDL subset in use by our grammars), despite the use of the .tdl suffix. They are defined by Krieger, Hans-Ulrich, and Ulrich Schäfer. "TDL: a type description language for HPSG.-Part 2: user guide." (1994), hereafter "K&S 1994b", but not by Copestake 2002. That is, if you are designing a grammar in the LKB, these :begin...:end blocks ("environments" as called by K&S 1994b) are not part of the language.

I've come to understand TDL as a "modal" description language: parts of the syntax can be turned on or off per file (e.g., lexical rules and morphological patterns), and the interpretation of the same TDL forms can differ depending on how the file was loaded (e.g., whether it is a type file or an instance file). In the LKB, these modes are determined by the Lisp function used to read the TDL file, whereas PET uses the environments defined in K&S 1994b. ACE and, I believe, agree also adopt the PET-style environments (presumably because they are easier to support than writing a full Lisp interpreter in C or C#).

These things are perhaps the closest to Stephan's long-desired "universal configuration" for grammars, so I think we can adopt a subset of them for DELPH-IN TDL, and also then include them in PyDelphin. Indeed, this was part of my vague plan if I ever get type hierarchies (#93, #94) or unification working.

The top-level definitions for environments in K&S 1994b are as follows:

<start> -> { <block> | <statement> }*
<block> -> "begin" ":control" "." { <type-def> | <instance-def> | <start> }* "end" ":control" "."
         | "begin" ":declare" "." { <declare> | <start> }* "end" ":declare" "."
         | "begin" ":domain" <domain> "." { <start> }* "end" ":domain" <domain> "."
         | "begin" ":instance" "." { <instance-def> | <start> }* "end" ":instance" "."
         | "begin" ":lisp" "." { <Common-Lisp-Expression> }* "end" ":lisp" "."
         | "begin" ":template" "." { <template-def> | <start> }* "end" ":template" "."
         | "begin" ":type" "." { <type-def> | <start> }* "end" ":type" "."

The only ones used in our grammars (having surveyed Jacy, ERG, and GG) are begin :type and begin :instance. We don't do templates, and allowing for arbitrary Lisp code is not a good idea. I'm not certain what domains do, and I don't think we need control blocks. Declare blocks could be useful for the "universal configuration" goal.

I also note that K&S 1994b uses begin :type whereas our grammars use :begin :type (note the colon before begin). Inside of these, we use :status and :include but not much more.

The pet/*.set files also use include (but without the initial colon, and without environments). I also don't see anyone putting type or instance definitions directly within the :begin :type and :begin :instance environments, except in pet/qc.tdl, so I suppose it is possible.
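The forms mentioned above (:begin/:end for :type and :instance, an optional :status, and :include) are regular enough to recognize line by line. The following is only a rough sketch under that assumption: ENV_MARKER, INCLUDE, and classify are hypothetical names for illustration, not PyDelphin's implementation, and the optional leading colon is there to accept both the K&S 1994b spelling (begin :type.) and the one our grammars use (:begin :type.).

```python
import re

# Hypothetical patterns for illustration only (not PyDelphin's parser).
# The leading colon is optional so both "begin :type." and ":begin :type."
# are accepted; only the :type and :instance environments are covered.
ENV_MARKER = re.compile(
    r'^\s*:?(?P<edge>begin|end)\s+:(?P<kind>type|instance)'
    r'(?:\s+:status\s+(?P<status>[\w*-]+))?\s*\.\s*$'
)
INCLUDE = re.compile(r'^\s*:?include\s+"(?P<path>[^"]+)"\s*\.\s*$')


def classify(line):
    """Return a tuple describing an environment or include line, else None."""
    m = ENV_MARKER.match(line)
    if m:
        return (m.group('edge'), m.group('kind'), m.group('status'))
    m = INCLUDE.match(line)
    if m:
        return ('include', m.group('path'), None)
    return None  # e.g. an ordinary type or instance definition
```

For example, classify(':begin :instance :status lexical-filtering-rule.') gives ('begin', 'instance', 'lexical-filtering-rule'), while a plain definition such as 't := a & b.' gives None. A line-based check like this ignores comments and multi-line statements, so it is only suitable for a quick survey of top-level files.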

@goodmami goodmami changed the title tdl is also used in the setup files for PET and ACE Add support for TDL :begin ... :end environments Sep 28, 2018
@goodmami goodmami added this to the v0.9.0 milestone Sep 30, 2018

fcbond commented Sep 30, 2018

Great, thanks for the clear analysis.


goodmami commented Oct 8, 2018

Ok, I think I'm happy to consider these part of DELPH-IN TDL. They are syntactically and semantically similar enough to the K&S 1994b definition of TDL, and used by enough processors, that they don't seem like a PET-specific feature (although I'm not prepared to include .set files, which exhibit even more differences from TDL and are PET-specific).

I've updated the wiki with a syntax description that includes only the forms I've seen, but it does allow type definitions to appear directly in the blocks and not just in included files, and similarly environment blocks and file includes can appear in any TDL file. Actually doing so, however, would break compatibility, so it should be recommended to keep the current convention of defining environments in top-level files.

In PyDelphin, these are handled much like XML elements in Python's xml.etree.ElementTree.iterparse(): you'll see an event for the start of the environment when :begin occurs, but its entries list will be empty. By the time the event for the end of the environment occurs, the list will have been filled. This way one can still inspect the TDL as it is parsed:

>>> from io import StringIO
>>> from delphin import tdl
>>> g = tdl.iterparse(StringIO('''
... :begin :type.
...   t := a & b & [ ATTR "val" ].
...   :include "file.tdl".
... :end :type.'''))
>>> event, env, lineno = next(g)
>>> event
'BeginEnvironment'
>>> env.entries
[]
>>> next(g)
('TypeDefinition', <TypeDefinition object 't' at 140577670422880>, 3)
>>> next(g)
('FileInclude', <delphin.tdl.FileInclude object at 0x7fdaca1be0b8>, 4)
>>> next(g)
('EndEnvironment', <delphin.tdl.TypeEnvironment object at 0x7fdaca1ca400>, 5)
>>> env.entries
[<TypeDefinition object 't' at 140577670422880>, <delphin.tdl.FileInclude object at 0x7fdaca1be0b8>]

This should be sufficient for traversing through a grammar from its top-level TDL file.
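The event pattern described above (an empty entries list at the begin event, a filled one at the end event) can be mimicked without PyDelphin. The sketch below is a toy under loud assumptions: Environment and iter_events are hypothetical names, statements are assumed to sit one per line, and entries are kept as plain strings. It is not how delphin.tdl is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    """Hypothetical stand-in for an environment object with an entries list."""
    kind: str
    entries: list = field(default_factory=list)


def iter_events(lines):
    """Yield (event, obj, lineno) triples; entries fill in as lines are read."""
    stack = []  # currently open environments, innermost last
    for lineno, raw in enumerate(lines, 1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith(':begin '):
            env = Environment(line[len(':begin '):].rstrip('.').strip())
            if stack:
                stack[-1].entries.append(env)  # nested environment
            stack.append(env)
            yield ('BeginEnvironment', env, lineno)  # entries still empty here
        elif line.startswith(':end '):
            yield ('EndEnvironment', stack.pop(), lineno)
        else:
            # Anything else is kept as an opaque string in this sketch.
            if stack:
                stack[-1].entries.append(line)
            yield ('Entry', line, lineno)
```

Because the generator is lazy, the object received with the begin event is the same one that later arrives with the end event, so a caller can hold on to it and watch its entries grow while deciding, for instance, which included files to open next.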
