
integrating Redwoods #193

Open
rimvydasrub opened this issue Oct 25, 2018 · 9 comments

Comments

@rimvydasrub

Hi,

I am a final-year undergraduate student at the University of Edinburgh. For my dissertation (supervised by Alex Lascarides), I am using the Redwoods Treebank to train a semantic parser that produces DMRS/EDS logical forms. I chose PyDelphin because it offers a nice interface and conversion between these forms for further analysis and model training.

To obtain the data, I currently use the LOGON SVN repository together with the redwoods bash script. The script writes the results to a file, which I then read and parse to create the input for Dmrs/Eds objects.

I am probably not the first or last person with this need, so I was thinking of implementing a submodule of PyDelphin (pydelphin.redwoods) that would serve Redwoods data directly as Dmrs/Eds objects. Would the DELPH-IN community be interested in this as an integration for PyDelphin?

Cheers
Rimvydas

@goodmami
Member

Hi Rimvydas, thanks for the feature request.

PyDelphin is made to accommodate the data from any DELPH-IN grammar and not one (e.g., the ERG) in particular, so at this time I would rather not implement something so specific to the ERG and Redwoods. However, in the somewhat near future there may be room for namespace packages (see #182) which would make it possible to distribute delphin.redwoods as a plugin. Either way, the data itself should not be packaged with PyDelphin or this hypothetical plugin, but it could automatically retrieve the data from SVN and serve it in a convenient interface.

In the meantime, you can also see https://github.com/goodmami/mrs-to-penman/ which I've used to convert Redwoods data into DMRS (the PENMAN serialization of DMRS, but it could be altered to output DMRX or something else).

Your feature request would certainly be useful for many people, so I'll leave this issue open for now.

@goodmami
Member

@rimvydasrub are you still interested in this? I've been working a bit on the next version of PyDelphin and I think there's now space to make this happen. Let me know if you'd like to help out.

@rimvydasrub
Author

@goodmami yes, I am still interested in contributing Redwoods support to the PyDelphin library. Could you describe how you envision this addition so that it best fits the design of PyDelphin? In particular, I currently see three possible options:

  1. Including a separate delphin.redwoods module
  2. Extending delphin.mrs with a submodule delphin.mrs.redwoods
  3. Extending delphin.interfaces with a submodule delphin.interfaces.redwoods

Personally, I see option 3 as the most suitable, since this plugin would act as the interface for retrieving Redwoods parses.

@goodmami
Member

I'm glad to hear you're still interested. In terms of the package structure, it would be best suited as a namespace package, as the delphin package of PyDelphin will become a namespace (see #222). That is, there would be a separate repository called delphin.redwoods that implements the following structure:

delphin.redwoods/
├── delphin          # no __init__.py
│   └── redwoods.py
└── setup.py
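The layout above relies on PEP 420 implicit namespace packages: because no distribution ships a `delphin/__init__.py`, Python merges every `delphin/` directory it finds on `sys.path` into a single package. The stdlib-only sketch below simulates two separately installed distributions in temporary directories and shows that both modules import under `delphin`. The module names `core` and `redwoods` and the `NAME` attribute are hypothetical stand-ins, not PyDelphin's real contents.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_distribution(root: Path, module: str, value: str) -> None:
    """Create root/delphin/<module>.py without a delphin/__init__.py."""
    pkg = root / "delphin"  # a namespace portion: deliberately no __init__.py
    pkg.mkdir(parents=True)
    (pkg / f"{module}.py").write_text(f"NAME = {value!r}\n")


with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    # Two separate "installations", standing in for pydelphin and delphin.redwoods.
    make_distribution(Path(a), "core", "pydelphin")
    make_distribution(Path(b), "redwoods", "delphin.redwoods")
    sys.path[:0] = [a, b]
    importlib.invalidate_caches()

    import delphin.core
    import delphin.redwoods

    # Both portions are merged into one namespace package with several search paths.
    names = (delphin.core.NAME, delphin.redwoods.NAME)
    portions = sorted(str(p) for p in delphin.__path__)

print(names)  # ('pydelphin', 'delphin.redwoods')
```

This only works if every distribution, including PyDelphin itself, omits `delphin/__init__.py`; a single regular package named `delphin` on the path would take precedence over the namespace portions.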

If you want it to depend on anything in PyDelphin, such as the interface classes used by ACE and the web API, you'll need to list pydelphin as a dependency in setup.py. The package should then be uploaded to PyPI so it can be installed, e.g.:

$ pip install delphin.redwoods

Also check the v1.0.0 branch of PyDelphin because many things have changed. E.g., for the interface classes, you'll import from delphin.interface:

from delphin.interface import Response, Result

I've also thought about using these interfaces for [incr tsdb()] test suites, but I have not implemented it yet. And in general things are still a little fluid as I prepare the v1.0.0 release, but I'm happy to help guide you through the new API.

Since the new module would not be bundled with PyDelphin directly I cannot really dictate its design (although import delphin.redwoods would not work without the above), but I have some suggestions:

  • Do not include the Redwoods data in the repository; instead, the code will retrieve them from the official host when requested and store them locally
  • Allow the user to specify where data will be stored and provide a sensible default (maybe just a temporary directory, or something like ~/redwoods if it doesn't already exist); this should be done in a portable way so that, e.g., Windows users can also use it
  • Maybe check if the $LOGONROOT environment variable is set; if so, the data might already exist locally
  • Provide a data structure that specifies which profiles are designated train, dev, and test
  • If you provide the delphin.interface method of retrieving data, then:
    • either yield Result objects for each line in the results file, or accumulate all data for an input item and yield Response objects
    • also provide the standard TestSuite interface
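A minimal sketch of the first three suggestions (fetch on demand, portable default location, reuse of a LOGON checkout), using only the standard library. It assumes a hypothetical layout in which each profile is a subdirectory named after the profile and `~/redwoods` is the default cache; `logon_subdir` is a caller-supplied path relative to `$LOGONROOT`, since where a LOGON tree keeps the Redwoods profiles is not stated here. The actual download (for example, an SVN export from the official host) is injected as a hypothetical `fetch` callback rather than guessed at.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, List, Optional

# Hypothetical downloader signature: fetch(profile_name, destination_dir).
Fetcher = Callable[[str, Path], None]


def default_data_dir() -> Path:
    """Portable default cache location: a 'redwoods' folder in the user's home."""
    return Path.home() / "redwoods"


def logon_data_dir(logon_subdir: str) -> Optional[Path]:
    """Return $LOGONROOT/<logon_subdir> if LOGONROOT is set and that directory exists."""
    root = os.environ.get("LOGONROOT")
    if root:
        candidate = Path(root) / logon_subdir
        if candidate.is_dir():
            return candidate
    return None


def ensure_profile(name: str, fetch: Fetcher,
                   data_dir: Optional[Path] = None,
                   logon_subdir: Optional[str] = None) -> Path:
    """Return a local directory for profile `name`, fetching it only if missing."""
    if logon_subdir is not None:
        local = logon_data_dir(logon_subdir)
        if local is not None and (local / name).is_dir():
            return local / name  # reuse the copy in an existing LOGON checkout
    target = (data_dir or default_data_dir()) / name
    if not target.is_dir():
        target.parent.mkdir(parents=True, exist_ok=True)
        fetch(name, target)  # download on demand; later calls hit the local cache
    return target


# Usage with a fake fetcher that just creates the directory and records the call.
calls: List[str] = []


def fake_fetch(name: str, dest: Path) -> None:
    calls.append(name)
    dest.mkdir()


with tempfile.TemporaryDirectory() as tmp:
    first = ensure_profile("example", fake_fetch, data_dir=Path(tmp))
    second = ensure_profile("example", fake_fetch, data_dir=Path(tmp))
print(first == second, calls)  # True ['example']: the second call used the cache
```

Using `pathlib.Path.home()` rather than a hard-coded `~` keeps the default valid on Windows as well as Unix-like systems.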
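For the train/dev/test designation, a read-only mapping from split name to profile names may be enough. The profile names below are placeholders, not the actual Redwoods assignments, which would have to come from the official release; `check_disjoint` and `split_of` are hypothetical helpers that guard against a profile leaking into two splits and look up a profile's split.

```python
from types import MappingProxyType
from typing import Dict, Mapping, Tuple

# Placeholder profile names only: substitute the official Redwoods assignments.
SPLITS: Mapping[str, Tuple[str, ...]] = MappingProxyType({
    "train": ("profile-a", "profile-b"),
    "dev": ("profile-c",),
    "test": ("profile-d",),
})


def check_disjoint(splits: Mapping[str, Tuple[str, ...]]) -> None:
    """Raise ValueError if any profile is assigned to more than one split."""
    seen: Dict[str, str] = {}
    for split, profiles in splits.items():
        for profile in profiles:
            if profile in seen:
                raise ValueError(f"{profile!r} is in both {seen[profile]!r} and {split!r}")
            seen[profile] = split


def split_of(profile: str) -> str:
    """Return 'train', 'dev', or 'test' for a profile, or raise KeyError."""
    for split, profiles in SPLITS.items():
        if profile in profiles:
            return split
    raise KeyError(profile)


check_disjoint(SPLITS)
print(split_of("profile-c"))  # dev
```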
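The two options in the last bullet differ only in granularity: one object per result line, or one object per input item collecting all of its results. In the sketch below, plain dicts stand in for `delphin.interface.Result` and `Response`. The keys (`item_id`, `input`, `results`, `mrs`) and the row shape are hypothetical, not PyDelphin's documented fields, and the MRS strings are placeholders. Because `itertools.groupby` only merges adjacent rows, it assumes the rows arrive sorted by item.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical row shape: (item_id, input_sentence, mrs_string), one per result line.
Row = Tuple[int, str, str]


def iter_results(rows: Iterable[Row]) -> Iterator[Dict]:
    """Option 1: yield one result-like dict per line of the results file."""
    for item_id, _sentence, mrs in rows:
        yield {"item_id": item_id, "mrs": mrs}


def iter_responses(rows: Iterable[Row]) -> Iterator[Dict]:
    """Option 2: accumulate all results for an input item into one response-like dict."""
    for item_id, group in groupby(rows, key=itemgetter(0)):
        members = list(group)
        yield {
            "item_id": item_id,
            "input": members[0][1],
            "results": [{"mrs": mrs} for _, _, mrs in members],
        }


rows = [
    (10, "Abrams barked.", "[ mrs 1 ]"),
    (10, "Abrams barked.", "[ mrs 2 ]"),
    (20, "It rained.", "[ mrs 3 ]"),
]
responses = list(iter_responses(rows))
print(len(list(iter_results(rows))), [len(r["results"]) for r in responses])
# prints: 3 [2, 1]
```

The per-item form costs a little buffering but matches how a parser interface returns all readings for one sentence at once.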

Does this seem doable? If you like the proposal, you can either create a repository on your own account or I can create one at the delph-in organization and add you as a contributor or owner.

@rimvydasrub
Author

The recommendations, both for the package structure and for the functionality of the plugin, seem great! I believe it is doable in a couple of weeks' time.

I would prefer the option where the repository is created under the delph-in organization and I am added as a contributor/owner, mostly because it gives the tool better exposure to the community.

Note that even though the plugin will not be part of PyDelphin, I will still follow its contribution guidelines to keep things consistent.

@goodmami
Member

> I would prefer an option with repository created by delphin-in organization and me added as contributor/owner mostly because of better exposure of the tool developed for the community.

OK, I've created https://github.com/delph-in/delphin.redwoods and added you as a collaborator with write access. I added the basic files as we discussed and modeled the setup.py on PyDelphin's, but some fields are still blank.

> Note even though the plugin would not be in pydelphin I will be still following/using contribution guidelines namely to make it consistent.

Good idea. You might look into a code style checker like flake8 to help with following PEP 8. Also note that I have not updated the contribution guidelines yet, but they are mostly relevant for 1.0.0 except for the part about tox.

Let me know if you have any questions :)

@goodmami
Member

goodmami commented Jul 4, 2019

@rimvydasrub I haven't seen any commits to the other repository. Do you perhaps have some work that hasn't been pushed to GitHub? The DELPH-IN summit is in a week and it would be great to make an announcement.

@rimvydasrub
Author

@goodmami the v0.1.0 branch of the delphin.redwoods repository contains the initial package design, which has not yet been merged into master or released as a package because:

  • it depends on delphin v1.0.0 which has not been released yet
  • as mentioned, the v1.0.0 design was still in development when we discussed this, especially concerning interfaces and tsdb, so after the initial design I decided to wait for v1.0.0 in order to align with it and add further useful functionality.

I assume you may release v1.0.0 around the time of the summit, which would mean the current design is fairly stable; in that case I could finish the functionality in the coming week, in time for the summit.

@goodmami
Member

goodmami commented Jul 4, 2019

Oh great, I hadn't seen the v0.1.0 branch as I hadn't gotten a notification. I'll take a look.

And while the v1.0.0 design is mostly stable, the tsdb stuff has been a little turbulent lately. I'll try to help with any porting issues.
