Status of openephyra #89
Comments
Possible alternatives (didn't look at these that much) |
I spoke to the yodaqa owner a while back about using it as an alternative to open ephyra. At the time it was still heavy in dev and he was still working on the core but it'd be great to explore it and see the current status. |
Good to know. Do you have an opinion about the others ? |
OK, so I asked the YodaQA dev about YodaQA's current status and performance, and here is his answer: As far as I know, there have been no formal comparisons between YodaQA But my informal experience is that YodaQA should be able to answer |
We've looked into a few other deep learning options but we're still working on them: To answer your original question, we don't plan to maintain OE beyond using what currently exists. The vision is to be able to swap QA systems in and out using the flexible Thrift interface we proposed, which allows the different systems in Lucida to communicate easily. Cool you reached out to the developer, have you played around with Yodaqa yet? A first step would definitely be benchmarking a few questions as the dev mentions and figuring out how easily we can inject new knowledge into YodaQA and then querying that DB. |
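The idea of swapping QA systems in and out behind one interface can be sketched roughly as follows. This is only an illustration in plain Python, not Lucida's actual Thrift definition: `QABackend`, `CannedBackend`, `QARegistry`, and the method name `ask` are all hypothetical names chosen for the example.

```python
from typing import Dict, Protocol


class QABackend(Protocol):
    """Hypothetical common interface; not Lucida's actual Thrift definition."""

    def ask(self, question: str) -> str: ...


class CannedBackend:
    """Stand-in for a real QA system such as OpenEphyra or YodaQA."""

    def __init__(self, answers: Dict[str, str]) -> None:
        self._answers = answers

    def ask(self, question: str) -> str:
        # Look the question up verbatim; a real backend would run its pipeline here.
        return self._answers.get(question, "no answer")


class QARegistry:
    """Keeps named backends so callers can switch between them at runtime."""

    def __init__(self) -> None:
        self._backends: Dict[str, QABackend] = {}

    def register(self, name: str, backend: QABackend) -> None:
        self._backends[name] = backend

    def ask(self, name: str, question: str) -> str:
        # The caller only picks a name; it never depends on a concrete backend class.
        return self._backends[name].ask(question)


registry = QARegistry()
registry.register("oe", CannedBackend({"What is the speed of light?": "299,792,458 m/s"}))
registry.register("yoda", CannedBackend({}))
```

Because callers only see the shared `ask` method, replacing one backend with another is a matter of registering a different object under the same name.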
Not yet. I'm trying to figure out how to play with YodaQA. Unfortunately, Maven is not available on openSUSE, so I'm not able to build it yet. I'm working on it anyway. Once it's done, I'll tell you. |
Sounds good, when I get the chance I'll play around with it as well. An alternative route would be to strip or replace the elements of OE that are obsolete. There's a lot in OE that's actually not needed for the basic functionality (of course that's more work, and if there's a better system that already works then that's more desirable)! |
I wouldn't bet on OE. Besides, there is a lot of work to be done on OE. But I'll help you the best I can. |
Hi, there is a generic Java toolkit for building dialogue systems, http://opendial-toolkit.net, and there is also https://github.com/BotLibre/BotLibre. These could be integrated into Lucida to get a Siri-, Cortana-, or OK Google-like experience. |
@kartx22 yep. Would be more user-friendly than the web interface. It was more a little qt widget in my mind. |
@posophe ya, trying to integrate botlibre to lucida |
Just a sample of future (awesome) yodaqa |
We should get started comparing this with other systems and moving it into a thrift service to integrate it into the Lucida ecosystem of services. @yunshengb have you looked at YodaQA? Can you also post the list of QA systems you're currently working on |
I am working on Qanta, Jacana, and YodaQA. |
After some tweaks, YodaQA is still slower than OpenEphyra, and the accuracy is roughly the same as OpenEphyra. |
However, YodaQA is still under development, and we will probably bring in YodaQA soon, and let users choose which one to use. |
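A comparison of this kind (accuracy and time per query) could be reproduced with a small harness like the one below. It is a minimal sketch, not the procedure actually used in this thread: `benchmark` and `toy_system` are hypothetical names, the questions are made up, and the exact-match scoring is an assumption that a real evaluation would likely relax.

```python
import time
from typing import Callable, Iterable, Tuple


def benchmark(ask: Callable[[str], str], qa_pairs: Iterable[Tuple[str, str]]) -> dict:
    """Run each question once and report accuracy plus mean and worst latency."""
    latencies = []
    correct = 0
    for question, expected in qa_pairs:
        start = time.perf_counter()
        answer = ask(question)
        latencies.append(time.perf_counter() - start)
        # Exact, case-insensitive match; real evaluations usually need fuzzier matching.
        if answer.strip().lower() == expected.strip().lower():
            correct += 1
    n = len(latencies)
    return {
        "accuracy": correct / n if n else 0.0,
        "mean_seconds": sum(latencies) / n if n else 0.0,
        "worst_seconds": max(latencies, default=0.0),
    }


def toy_system(question: str) -> str:
    # Placeholder for a call into a real QA system.
    return {"capital of france?": "Paris"}.get(question.lower(), "unknown")
```

Running the same question list through each system with `benchmark` gives numbers that can be compared directly, provided both systems run on the same hardware and against the same data sources.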
Maybe I can help with this. I have been able to improve Yoda's recall, precision and response time for my thesis with some simple tricks (in a dedicated REST backend that can be deployed as a Docker container) to answer 1.5x-2x as many questions as the standard version. I can provide source code the moment I hand in my thesis, which will be before the end of next month. In my "Grand Challenge" brmson/yodaqa#38 the system beat all (27) users so far (most of them PhD students). While none of these additions are groundbreaking [I also present significantly more contributions in my thesis], they should be of practical interest for the issues you describe. Furthermore, I have taken a close look at adding own material (including a wiki), which is not quite ready for productive use (since it requires distributional semantics), but relevant to your project and to the best of my knowledge not available in OpenEphyra. |
Sounds good! On average, how long does a query take on your version of Yoda? I can make it ~40 seconds per query, but OpenEphyra takes ~30 seconds. |
Huh?! I'm at 9 seconds average, worst case 26s. What are your system specs? |
3.1 GHz, 16 GB RAM . And it connects to clarity server for Wikipedia and DBpedia. Are you running them on your own machine, or connecting to the author's machine? |
First a quick correction: I'm at under 8s average and 19s worst case - the other values were old. I can already reveal some extremely simple results that should help you out: I run everything myself including Freebase (and both label services), which helps. While I do have more RAM (32GB) and maybe a faster CPU, the bottleneck is not computational power, but I/O. Hence, I've run Yoda in several SSD combinations for my thesis and they help tremendously, at least to a certain extent. To give you a taste here are some averages:
I also ran everything with less RAM - almost no difference to 16GB. So you can see that running everything locally yields 12s and using SSDs gives another 4s, which makes a total speedup compared to Yoda's default configuration of ~3x. However, using even faster hardware (and a professional NVMe SSD over PCIe is as fast as it gets on my system - it has 4x the throughput or so, I have the measurements somewhere) does not(!) yield significantly better results. Accordingly, you can get a significant speedup with cheap consumer hardware. Setting up your own Freebase node takes about 5 days. If you'd like to use / cite this / read more take a look at "Leveraging Question Answering Systems for e-Learning and Research", Falk Pollok, RWTH Aachen, 2016. I'll provide the (much more interesting) details about my QA extensions once the thesis is handed in. After that speedup it becomes obvious that Yoda does not scale too well. At some point it will need to use UIMA-AS + DUCC to go beyond current performance limitations. We have been discussing this here: brmson/yodaqa#21 |
Thanks for the information. Unfortunately, our clarity server (at least the one I have access to) ran out of space when I tried deploying Freebase, so currently my version uses only Wikipedia and DBpedia running on that server. I notice that adding Freebase can improve the accuracy by ~10%. I agree that some small tricks can improve the speed dramatically. Personally I tried some and they worked. Currently my focus is rewriting the existing services with a new Thrift interface, which will also be the one for Yoda. We hope to make the command center more intelligent by adding a query classifier. However, I am very interested in investigating Yoda further, and I appreciate your work. I look forward to seeing a better Yoda. |
Query classifier? Like according to topic? I have written one as one of the additional contributions I've mentioned, since I've also been using IBM Watson, e.g. for the healthcare domain. Maybe we should collaborate closer to avoid more redundant work... PS: |
I am working at the level of the command center, so the query classifier is basically a controller that tells the command center which services should be used. I think the one you mentioned is dedicated to YodaQA, but perhaps they overlap. For instance, if the query is "What is the age of this person?" along with a photo, the classifier should say "facial recognition" and "OpenEphyra" (which has been provided with the information "John Smith is 20 years old."); if the query is "What is the speed of light?", it should say OpenEphyra or Yoda and nothing else. Maybe query classifier is a confusing name -- it should really be just part of the command center. |
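The routing behaviour described in the two example queries can be sketched as a tiny rule-based function. This is only an illustration under stated assumptions: the service names, the `route` function, and the "this person" heuristic are hypothetical, and a real classifier would presumably be trained rather than hand-written.

```python
from typing import List

# Hypothetical service names; the thread only names "facial recognition",
# OpenEphyra, and Yoda as examples.
FACE = "facial_recognition"
QA = "question_answering"


def route(query: str, has_image: bool = False) -> List[str]:
    """Return the services a command center might call for this query."""
    services = []
    # Assumed heuristic: a query about "this person" that comes with an image
    # needs face recognition before the question can be answered.
    if has_image and "this person" in query.lower():
        services.append(FACE)
    # Any non-empty textual question ends up at a QA backend.
    if query.strip():
        services.append(QA)
    return services
```

With this sketch, the photo question maps to both services while the speed-of-light question maps to the QA backend alone, which matches the two cases in the comment above.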
Two last comments for now [it's 23:40 here and I need to be back around 8:00, so I'll head home after this]: Freebase only takes approximately 260GB. Take any consumer machine with a 2TB HDD and 16GB RAM, set up Freebase over 5-6 days, then simply copy the d-freebase directory to your server and just execute Fuseki with it: Running seems to be much less resource-intensive than setting it up. Worked fine for me, unless your server is really at its max. What you describe seems rather close to my classifier (which is - just like most of my system - independent of Yoda). While I don't consider additional media like photos, that should be just another feature to add. Otherwise, it takes a ton of questions as a training set and prints which backend to use. Not sure which algorithm I picked (I compared quite a few including naive Bayes, random forests, decision trees etc.) and which features I used, but I can look that up easily. I think I got around 96% accuracy with 100-fold cross validation, but I might be mixing this up. Maybe we should team up over Easter and build a prototype over the weekend? I'll be working anyway, and I can ask tomorrow whether we can make an exception and provide you with my results before the deadline - I mean it's only a Master's thesis, you're good colleagues and I just showed in my evaluation that users want an integration with a dialog system. Or (what I'd actually prefer) we do this once I join you in a month or so. I can try to wrap my thesis up asap and come over around April 24th. I have no problem with working a week for free. It should also be fairly straightforward to plug my additional backend into Thrift and instantly get up to twice as many correct answers. I can already add papers and wikis and mine conferences, so once we solve distributional semantics and another topic I can discuss soon, we have quite a system. |
Update: |
@yunshengb, thanks for comparing YodaQA to OpenEphyra. You mentioned you found the accuracy comparable (albeit that was without Freebase enabled in YodaQA). Do you still have details on how you compared them? Did you use any specific dataset, or do you still have the list of questions (and possibly system answers) around? Or was it just an informal comparison? |
@k0105 have you made any progress on this? I see that it was put on hold. Are you planning on picking this back up? |
This? As in YodaQA? Sure, YodaQA works. Inside Lucida and out. |
Do you think YodaQA works better than our current QA system? Should we replace OpenEphyra with YodaQA? |
Depends. YodaQA is definitely a much better open-domain factoid QA system, yes. In terms of precision/recall, maintainability / software design and response time. And while it still largely relies on search, it features (one of?) the best open QA pipeline(s) I have seen. The one feature it doesn't offer out-of-the-box to my knowledge is to add facts to a knowledge base that can then be queried, i.e. the functionality you need for your previous demos to ask questions about people or the Eiffel tower. I'm not sure you need to worry about that, because I have YodaQA and the ensemble covered in the stuff I package [just like the Bluemix backend]. Thus, it is already available to us. Another consideration is backend size - with Freebase, enwiki and DBpedia it needs several hundred gigabytes of dependencies, which should ideally reside on an SSD. I have those available, but shipping them with every Lucida download is a challenge. And manually indexing it will take about 6 days on an average machine. Bottom line: I love YodaQA, actively use it and recommend it to everyone who needs a decent open QA system, but you will have to see whether it matches your requirements. Since I still don't know what your team is working towards, you'll have to draw that final decision yourself. I do think that accelerating YodaQA and adding the capability to add own knowledge would be crazy cool extensions, but both are not low-hanging fruit. If you have detailed questions, I'm always willing to help. |
Hi !
I'm an openSUSE maintainer and have a lot of interest in Lucida. Unfortunately, while I have been able to build the other dependencies, OpenEphyra is pretty annoying, mainly because of obsolete dependencies.
First, maxent : the current API version is 3.0.2, provided by opennlp 1.6.0.
Then, minorthird : the most recent version, coming from https://github.com/TeamCohen/MinorThird is 20120608
JWNL is dead upstream. Prefer http://extjwnl.sourceforge.net/ instead.
Lingpipe : 4.1.0 now provided by javatools
stanford-ner : version 3.6.0
stanford-parser : version 3.6.0
stanford-postagger : version 3.6.0
javelin : 2015.x
bing-search-java : superseded by https://code.google.com/p/azure-bing-search-java/
googleapi : superseded by ajaxsearch
indri : 5.10
yahoosearch : API no longer available
commons-codec : 1.10
commons-logging : 1.2
gson : version 2.5
htmlparser : no idea. > jericho html ?
log4j : 2.5
So, for all of these reasons, it is very hard to package OpenEphyra. I would love to help, but I don't speak a word of Java. Do you plan to maintain OpenEphyra yourselves?
Anyway, thanks for your awesome work !
Regards.
Benjamin