- Write the different workpackages.
- Gantt chart.
Workpackages:
- Communication/Organization (Tom)
- Router (Valentin)
- Web interface (Thomas)
- NLP (Raphaël)
- ML (Quentin)
- Wikidata (Thomas)
- Add-ons (Marc)
Aim: transform the output of the NLP library (typically a syntactic tree, with nodes such as "nominal group"; this output differs from one library to another) into a normal form (a semantic tree, with nodes such as "subject").
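A toy sketch of this conversion, assuming a hypothetical syntactic tree encoded as nested `(label, children)` tuples with illustrative labels (`SBARQ`, `VP`, `NP`, `PP`). Real NLP libraries each produce their own format, which is why a library-independent normal form is wanted. The function `to_normal_form` is hypothetical and only handles questions shaped like "What is the X of Y?", returning a `(subject, predicate, object)` triple where `None` marks the unknown.

```python
from typing import List, Tuple, Union

# Hypothetical syntactic tree: a leaf is a word, a node is (label, children).
# The labels are illustrative only; each NLP library uses its own.
Tree = Union[str, Tuple[str, list]]


def leaves(tree: Tree) -> List[str]:
    """Return the words under a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    words: List[str] = []
    for child in children:
        words.extend(leaves(child))
    return words


def to_normal_form(tree: Tree) -> tuple:
    """Map the shape "What is the X of Y?" to the triple (Y, X, None)."""
    label, children = tree
    if label != "SBARQ":
        raise ValueError("unsupported question shape")
    _, (_verb, noun_phrase) = children[1]   # ("VP", ["is", NP])
    _, (head, prep_phrase) = noun_phrase    # ("NP", [NP, PP])
    _, (_of, complement) = prep_phrase      # ("PP", ["of", NP])
    predicate = " ".join(w for w in leaves(head) if w != "the")
    subject = " ".join(leaves(complement))
    return (subject, predicate, None)       # None marks the unknown


question = ("SBARQ", [
    ("WHNP", ["What"]),
    ("VP", ["is", ("NP", [
        ("NP", ["the", "birth", "date"]),
        ("PP", ["of", ("NP", ["GW"])]),
    ])]),
])
print(to_normal_form(question))  # ('GW', 'birth date', None)
```

A hand-written rule like this breaks as soon as the parser or the question shape changes, which is the argument for isolating such rules per library behind the common normal form.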
Some ideas written on the board:
What is the birth date of GW? -> ("GW", "birth date", ?)
["GW","birth date", null]
What is the birth date of the first president of the USA?
[[...],"...",null]
first([null,"is president","US"]AND[null,"is","black"])
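One possible JSON encoding of the board notation, sketched in Python. The encoding is an assumption, since the normal form is not documented yet: a triple is a three-element list `[subject, predicate, object]` with `null` (`None`) for the unknown, any slot may hold a nested node, and operators such as `first` and `AND` become `{"op": ..., "args": [...]}` objects. The validator `is_node` is hypothetical.

```python
import json
from typing import Any

# Hypothetical encoding, for illustration only:
# - a triple is a list [subject, predicate, object]; None (JSON null) is the unknown;
# - any slot may itself hold a nested node;
# - an operator is {"op": name, "args": [node, ...]}.
OPERATORS = {"AND", "first"}


def is_node(node: Any) -> bool:
    """Recursively check that `node` follows the encoding above."""
    if node is None or isinstance(node, str):
        return True  # the unknown, or a literal value
    if isinstance(node, list):
        return len(node) == 3 and all(is_node(slot) for slot in node)
    if isinstance(node, dict):
        return (node.get("op") in OPERATORS
                and isinstance(node.get("args"), list)
                and all(is_node(arg) for arg in node["args"]))
    return False


simple = ["GW", "birth date", None]
first_and = {"op": "first", "args": [
    {"op": "AND", "args": [[None, "is president", "US"],
                           [None, "is", "black"]]},
]}
print(json.dumps(simple))                   # ["GW", "birth date", null]
print(is_node(simple), is_node(first_and))  # True True
```

Because the structure only uses lists, objects, strings and `null`, it round-trips through JSON unchanged, so modules written in different languages could exchange it.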
We need documentation of the normal form, with examples.
Marc: finish proposal.
Thomas & Valentin: doc for normal form. Once it is done, implementation in several languages.
Tom: website.
Marc, Quentin, Raphaël, Yassine: NLP libraries and bibliography.
Quickly provide a website to present our project. No public announcement for the moment (wait until we have something more substantial). Use GitHub Pages. Built with Jekyll?
Router written in Python (NLP libraries written in Python or Java).
Languages: HTML, CSS, JavaScript.
It would be nice to test several existing libraries before choosing one. Must work on Linux.
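The router's exact role is not specified in these notes. A minimal Python sketch, assuming it forwards each question to a set of registered modules (for instance the NLP or Wikidata modules) and collects whichever answers they return. The `Router` class and the `Module` signature are hypothetical; a module backed by a Java NLP library could sit behind the same callable interface, for example by invoking it as a separate process.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical module signature: takes a question, returns an answer or None.
Module = Callable[[str], Optional[str]]


class Router:
    """Forwards each question to every registered module (assumed design)."""

    def __init__(self) -> None:
        self.modules: Dict[str, Module] = {}

    def register(self, name: str, module: Module) -> None:
        self.modules[name] = module

    def route(self, question: str) -> List[Tuple[str, str]]:
        """Return (module name, answer) pairs, skipping modules with no answer."""
        answers: List[Tuple[str, str]] = []
        for name, module in self.modules.items():  # insertion order
            answer = module(question)
            if answer is not None:
                answers.append((name, answer))
        return answers


router = Router()
router.register("echo", lambda q: q.upper())  # stand-in module
router.register("silent", lambda q: None)     # module with no answer
print(router.route("hi"))  # [('echo', 'HI')]
```

Keeping modules as plain callables keeps the router independent of which NLP library is eventually chosen.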
Remark (Yassine & Marc): is machine learning useless for NLP?
Improvements: when a user uses the service, store the question, the answer provided, and the user's feedback (was the answer satisfactory?), in order to improve the tool (not necessarily with ML). This could allow us to "hardcode" answers to some questions.
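A minimal sketch of such a feedback log in Python, with in-memory storage standing in for a real database (an assumption). All names (`Feedback`, `FeedbackLog`, `normalize`, `hardcode`, `lookup`) are hypothetical. Unsatisfied entries are the natural candidates for hand-written ("hardcoded") answers, which can be looked up before running the full pipeline.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def normalize(question: str) -> str:
    """Crude lookup key: lowercase, collapse spaces, drop a trailing '?'."""
    return " ".join(question.lower().split()).rstrip("?").strip()


@dataclass
class Feedback:
    question: str
    answer: str
    satisfied: bool  # the user's verdict: was the answer satisfactory?


@dataclass
class FeedbackLog:
    """In-memory stand-in for whatever storage the project picks."""
    entries: List[Feedback] = field(default_factory=list)
    hardcoded: Dict[str, str] = field(default_factory=dict)

    def record(self, question: str, answer: str, satisfied: bool) -> None:
        self.entries.append(Feedback(question, answer, satisfied))

    def unsatisfied(self) -> List[Feedback]:
        """Entries worth reviewing by hand, e.g. to hardcode an answer."""
        return [e for e in self.entries if not e.satisfied]

    def hardcode(self, question: str, answer: str) -> None:
        self.hardcoded[normalize(question)] = answer

    def lookup(self, question: str) -> Optional[str]:
        """Return a hardcoded answer if one exists, else None."""
        return self.hardcoded.get(normalize(question))
```

Usage: call `record` after each answer, periodically review `unsatisfied()`, and call `hardcode` for the questions the pipeline keeps getting wrong.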
Papers on the pad.
Queries on the data:
- Wikibase query engine (PHP, will be maintained in the future). Uses an SQL database. Well suited to birth dates. Catastrophic for recursive queries.
- Wikidata query (unclear whether it will be maintained in the future). Well suited to recursive queries.
- Wikidata Toolkit (Java library, cannot perform queries yet).
Remark: Thomas prefers Wikidata query. In any case, the work has to be modular, so that the query engine can be changed "easily".
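One way to keep the query engine swappable, sketched in Python: callers depend only on an abstract `QueryEngine` interface, and each backend (Wikibase query engine, Wikidata query, Wikidata Toolkit) would get its own subclass. All names are hypothetical, and `InMemoryEngine` is a toy over a list of facts, not a wrapper around any of the real engines.

```python
from abc import ABC, abstractmethod
from typing import List, Optional, Tuple

# A triple in the normal form; None marks the unknown slot.
Triple = Tuple[Optional[str], Optional[str], Optional[str]]


class QueryEngine(ABC):
    """Interface every query backend would implement (hypothetical)."""

    @abstractmethod
    def answer(self, triple: Triple) -> List[str]:
        """Return the values that fill the unknown slot of `triple`."""


class InMemoryEngine(QueryEngine):
    """Toy backend over a list of (subject, predicate, object) facts."""

    def __init__(self, facts: List[Tuple[str, str, str]]) -> None:
        self.facts = facts

    def answer(self, triple: Triple) -> List[str]:
        results: List[str] = []
        for fact in self.facts:
            # A fact matches if it agrees with every known slot.
            if all(q is None or q == f for q, f in zip(triple, fact)):
                # Collect the value sitting in the unknown slot.
                results.extend(f for q, f in zip(triple, fact) if q is None)
        return results


def ask(engine: QueryEngine, triple: Triple) -> List[str]:
    # Only the interface is used here, so backends can be swapped freely.
    return engine.answer(triple)


engine = InMemoryEngine([("A", "p", "1"), ("B", "p", "2"), ("A", "q", "3")])
print(ask(engine, ("A", "p", None)))  # ['1']
```

Switching engines then means writing one new subclass; code that calls `ask` does not change.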
Will be done after.