Support Link-Grammar-based parsing #43

Open
akolonin opened this issue Sep 29, 2021 · 1 comment
Labels: enhancement, help wanted, on hold

akolonin (Member) commented Sep 29, 2021

We want to be able to parse any language supported by Link Grammar, starting with English, with parsing available both internally in the Aigents framework and via the Aigents Language API.

Specs:

  1. Integrate https://github.com/aigents/aigents-java-nlp into https://github.com/aigents/aigents-java as a dependency (the simpler the better; an extra jar file built from the former and required by the latter is fine).
    1.1. Link Grammar dictionaries are assumed to be deployed in the same folder structure as in https://github.com/aigents/aigents-java-nlp/tree/master/ and https://github.com/opencog/link-grammar/tree/master (./data/en/*).
    1.2. aigents-java-nlp can be A) built as a separate jar, B) built as an external dependency from source files, or C) merged by copying the contents of "/aigents/aigents-java-nlp/src/main/java" to "/aigents/src/main/java" (fixing the package names to "org.aigents" along the way), whichever is easier and more logical.
    1.3. Tests from aigents-java-nlp should not be part of the jar (A above) or of the Aigents build (B above).
  2. Have an internal package in https://github.com/aigents/aigents-java responsible for NLP, and parsing in particular, and add wrapper(s) for the Link Grammar loader and Link Parser to it (based on https://github.com/aigents/aigents-java-nlp).
    2.1. Parsing here means "parsing" proper, not the "generation" or "segmentation" functionality of aigents-java-nlp.
    2.2. Parsing is what the conventional Link Grammar parser (C++) does: it turns a single sentence into a graph of linked words. This is close to what the Segmentation code does, but different, so the Segmentation code can be consulted while the parsing code itself should be separate.
    2.3. The code should be placed in the "net.webstructor.nlp" package of the aigents-java project as a class called LinkGrammarParser, which wraps a new class org.aigents.nlp.Parser created as a modified/extended version of main.java.org.aigents.nlp.gen.Segment.
  3. Load the dictionary only once per application startup, in the constructor or init function of the new LinkGrammarParser, which should implement the GrammarParser interface. The LangPack class should initialize it as a member in the LangPack constructor, so it can be reused later for parsing.
  4. Set up default storage for the Link Grammar dictionary in the Aigents Server deployment, and update the project documentation accordingly.
  5. Implement Link Grammar parser-based parsing by extending the existing parsing API, tryParse (https://github.com/aigents/aigents-java/blob/master/src/main/java/net/webstructor/peer/Conversation.java#L814), which will get an extra "mode" option with the value "link-grammar", "link grammar", or "lg" for that.
  6. Add integration tests, extending the existing ones at https://github.com/aigents/aigents-java/blob/master/php/agent/agent_cat.php#L404
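The wrapper, the one-time dictionary load, the default dictionary location, and the mode switch from specs 2 to 5 could fit together roughly as below. This is a minimal sketch under stated assumptions, not the project's code: the shape of `GrammarParser`, the `Link` record, `LinkParserBackend` (a stand-in for `org.aigents.nlp.Parser`, whose API the issue does not show), `dictionaryDir`, the `aigents.lg.data` system property, and the simplified `tryParse` signature are all hypothetical. The dummy backend merely links adjacent words where a real implementation would run the Link Grammar parser.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.*;

public class LinkGrammarSketch {
    // One link of a parse: two word indexes plus a connector label (hypothetical result type).
    record Link(int left, int right, String label) {}

    // Hypothetical shape; the real GrammarParser interface in aigents-java may differ.
    interface GrammarParser {
        List<Link> parse(List<String> words);
    }

    // Hypothetical stand-in for org.aigents.nlp.Parser.
    interface LinkParserBackend {
        List<Link> parse(List<String> words);
    }

    // Default dictionary folder ./data/<lang>, overridable by a hypothetical system property.
    static Path dictionaryDir(String lang) {
        return Paths.get(System.getProperty("aigents.lg.data", "./data"), lang);
    }

    // Wrapper that loads the dictionary once, in its constructor, then reuses it for every parse.
    static final class LinkGrammarParser implements GrammarParser {
        static int loads = 0; // counts dictionary loads, for illustration only
        private final LinkParserBackend backend;

        LinkGrammarParser(Path dictDir) {
            loads++;
            backend = loadDictionary(dictDir);
        }

        private static LinkParserBackend loadDictionary(Path dictDir) {
            // Placeholder: a real loader would read the dictionary files under dictDir.
            // This dummy backend just links every word to the next one.
            return words -> {
                List<Link> links = new ArrayList<>();
                for (int i = 0; i + 1 < words.size(); i++) links.add(new Link(i, i + 1, "X"));
                return links;
            };
        }

        @Override
        public List<Link> parse(List<String> words) {
            return backend.parse(words);
        }
    }

    // LangPack creates the parser once, as a member, in its constructor.
    static final class LangPack {
        final GrammarParser linkGrammar;

        LangPack(String lang) {
            linkGrammar = new LinkGrammarParser(dictionaryDir(lang));
        }
    }

    // Mode values from spec 5; case-insensitive matching is an assumption.
    static boolean isLinkGrammarMode(String mode) {
        if (mode == null) return false;
        String m = mode.trim().toLowerCase(Locale.ROOT);
        return m.equals("link-grammar") || m.equals("link grammar") || m.equals("lg");
    }

    // Hypothetical simplification of tryParse: only the Link Grammar branch is sketched.
    static List<Link> tryParse(LangPack pack, String sentence, String mode) {
        if (!isLinkGrammarMode(mode)) {
            throw new IllegalArgumentException("mode not covered by this sketch: " + mode);
        }
        return pack.linkGrammar.parse(Arrays.asList(sentence.trim().split("\\s+")));
    }

    public static void main(String[] args) {
        LangPack english = new LangPack("en"); // dictionary loaded here, once
        System.out.println(tryParse(english, "dogs bark loudly", "lg"));
        System.out.println(tryParse(english, "cats sleep", "Link Grammar"));
        System.out.println("dictionary loads: " + LinkGrammarParser.loads);
    }
}
```

Running `main` parses two sentences but reports a single dictionary load, because the `LangPack` constructor builds the parser once and every later `tryParse` call reuses it.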

Use the existing Java implementation of Link Grammar: https://arxiv.org/pdf/2105.00830.pdf

Subtasks:

  1. Basic porting without accounting for cost - done in b2ae519
  2. Assemble based on disjuncts - 2 weeks
  3. Assemble with cost accounting - 2 weeks
  4. Upgrade to support the latest Link Grammar? - ? weeks
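Subtasks 2 and 3 assemble linkages from disjuncts and then rank them by cost. The toy sketch below illustrates only those two ideas, under simplifying assumptions: it uses the classic connector-matching rule (identical uppercase head; lowercase subscripts compared position by position, with `*` or a missing character matching anything), ignores newer connector features, allows at most one connector per side of a disjunct, and handles only two-word sentences. The class and method names and the mini-dictionary entries are hypothetical, not taken from `./data/en`.

```java
import java.util.*;

public class DisjunctCostSketch {
    // A connector such as "Sp+": type "Sp" plus direction '+' (points right) or '-' (points left).
    record Connector(String type, char dir) {
        static Connector parse(String s) {
            return new Connector(s.substring(0, s.length() - 1), s.charAt(s.length() - 1));
        }
    }

    // Simplified disjunct: at most one connector on each side (null if none), plus a cost.
    record Disjunct(Connector left, Connector right, double cost) {}

    // Classic matching rule (simplified): uppercase heads must be identical; lowercase
    // subscripts are compared position by position, where '*' or a missing character matches.
    static boolean typesMatch(String a, String b) {
        int ha = headLength(a), hb = headLength(b);
        if (ha != hb || !a.regionMatches(0, b, 0, ha)) return false;
        for (int i = ha; i < Math.min(a.length(), b.length()); i++) {
            char x = a.charAt(i), y = b.charAt(i);
            if (x != '*' && y != '*' && x != y) return false;
        }
        return true;
    }

    private static int headLength(String s) {
        int i = 0;
        while (i < s.length() && Character.isUpperCase(s.charAt(i))) i++;
        return i;
    }

    // A '+' connector on the left word can link to a matching '-' connector on the right word.
    static boolean canLink(Connector fromLeft, Connector fromRight) {
        return fromLeft != null && fromRight != null
                && fromLeft.dir() == '+' && fromRight.dir() == '-'
                && typesMatch(fromLeft.type(), fromRight.type());
    }

    // Toy "assemble with cost": for a two-word sentence, pick the linkable pair of
    // disjuncts with the lowest total cost, or return null if no pair links.
    static Disjunct[] cheapestLinkage(List<Disjunct> first, List<Disjunct> second) {
        Disjunct[] best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (Disjunct a : first) {
            if (a.left() != null) continue;      // nothing lies to the left of the first word
            for (Disjunct b : second) {
                if (b.right() != null) continue; // nothing lies to the right of the second word
                double cost = a.cost() + b.cost();
                if (canLink(a.right(), b.left()) && cost < bestCost) {
                    best = new Disjunct[] {a, b};
                    bestCost = cost;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical mini-dictionary entries for "dogs bark".
        List<Disjunct> dogs = List.of(new Disjunct(null, Connector.parse("Sp+"), 0.0));
        List<Disjunct> bark = List.of(
                new Disjunct(Connector.parse("Ss-"), null, 0.0),  // singular subject: no match
                new Disjunct(Connector.parse("S-"), null, 1.0),   // generic subject, higher cost
                new Disjunct(Connector.parse("Sp-"), null, 0.0)); // plural subject
        Disjunct[] best = cheapestLinkage(dogs, bark);
        System.out.println(best[1].left().type() + " cost " + (best[0].cost() + best[1].cost()));
    }
}
```

Both `S-` and `Sp-` can link to `Sp+` here, so cost is what decides between them; a real assembler would apply the same ranking across full linkages of longer sentences.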

Extension for segmentation and punctuation - subtasks:

  5. Segmentation by sentence - 4 weeks
  6. Adding punctuation - 4 weeks
  7. Russian dictionary loading - 2 weeks (needed only for Russian)
  8. Assemble with morphology taken into account - 2 weeks (needed only for Russian)
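As a possible baseline for subtasks 5 and 6, the JDK's rule-based `java.text.BreakIterator` can split text into sentences, and a regular expression can split ASCII punctuation off words so that marks such as `,` and `!` become separate tokens for dictionary lookup. This is a swapped-in technique for illustration, not the Link Grammar-based segmentation the subtasks refer to; `SegmentSketch`, `sentences`, and `tokens` are hypothetical names.

```java
import java.text.BreakIterator;
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SegmentSketch {
    // Split text into sentences with the JDK's rule-based BreakIterator (baseline only).
    static List<String> sentences(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = text.substring(start, end).trim();
            if (!s.isEmpty()) out.add(s);
        }
        return out;
    }

    // Words are runs of letters, digits, and apostrophes (so Cyrillic words also match);
    // each ASCII punctuation mark becomes its own token. Non-ASCII marks are dropped.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}']+|\\p{Punct}");

    static List<String> tokens(String sentence) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(sentence);
        while (m.find()) out.add(m.group());
        return out;
    }

    public static void main(String[] args) {
        String text = "Dogs bark. Cats sleep, mostly!";
        for (String s : sentences(text, Locale.ENGLISH)) {
            System.out.println(tokens(s));
        }
    }
}
```

Each printed token list could then be handed to the parser one sentence at a time, which matches the single-sentence input described in spec 2.2.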

akolonin added the enhancement and help wanted labels Sep 29, 2021
akolonin self-assigned this Nov 7, 2021
akolonin added the progress label Nov 7, 2021
akolonin removed their assignment Nov 15, 2021
akolonin removed the help wanted label Nov 22, 2021
akolonin (Member, Author) commented:

@rvignav, further fixes and improvements to Segmentation, Parsing, QA, and the rest will have to build on this.

akolonin added the help wanted label Nov 23, 2021
akolonin added the on hold label and removed the progress label Mar 10, 2022