Feature/send to pypln #65
base: master
Conversation
I think I really ran out of ideas on how to speed this up in PyPLN. Can @turicas help? This code looks ok to me. I'd only remove the commented out code (for the corpus name option). From what I can see, this code can be merged (since downloader doesn't call load() yet). Can I delete the commented out code and merge? Or is it better to wait for us to find a solution on PyPLN? |
I think we need to fix the performance issue first, to avoid slowing down the downloader.
Now we do: NAMD/pypln.web#118. We can also think of a solution that doesn't involve blocking the rest of the downloader process while we wait for the upload. Maybe we can leave the downloader as it is now and have another process upload to pypln?
@fccoelho, I think the best approach is to separate the downloading and uploading processes, so there will be no bottleneck between them. After downloading, a flag can be set on the database so the uploader script can find which documents to upload in the next run.
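A minimal sketch of that decoupled approach could look like the following. The `documents` collection, the `sent_to_pypln` flag, the database name, and the exact `pypln.api` calls (including the `add_document` signature) are all assumptions for illustration, not the project's actual schema or client usage.

```python
# Sketch of a standalone uploader script: it picks up documents the
# downloader flagged as not yet sent and uploads them to PyPLN, so the
# download and upload steps never block each other.
# NOTE: the database layout (a `sent_to_pypln` flag on a `documents`
# collection) and the pypln.api calls are assumptions for illustration.
import pymongo
from pypln.api import PyPLN  # assumed client import; check pypln.api for the real interface


def upload_pending(db, corpus):
    """Send every document still flagged as pending to PyPLN."""
    for doc in db.documents.find({'sent_to_pypln': False}):
        # add_document is the call discussed in this thread; its exact
        # signature here is an assumption.
        corpus.add_document(doc['content'], filename=doc.get('filename', 'document.txt'))
        # Flag the document so the next run of this script skips it.
        db.documents.update_one({'_id': doc['_id']},
                                {'$set': {'sent_to_pypln': True}})


if __name__ == '__main__':
    db = pymongo.MongoClient('localhost', 27017)['downloader']  # hypothetical database name
    # Endpoint, credentials, auth format and corpus below are placeholders.
    pypln = PyPLN('http://demo.pypln.org/', ('user', 'password'))
    corpus = pypln.add_corpus(name='downloaded-articles',
                              description='documents from the downloader')
    upload_pending(db, corpus)
```

The script can then be run periodically (e.g. from cron), independently of the downloader.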
We can do this to remove this block for now. But if pypln is to succeed as
This feature is basically done. However, the add_document call in PyPLN.api is taking too long to return, so before putting this into production we need to sort out this performance issue on PyPLN.api. @flavioamieiro, any ideas?
After this is done, a line must be added to the downloader so that each downloaded document is sent to pypln immediately.
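As a rough illustration of what that addition could look like, here is a hedged sketch: the hook name `on_document_downloaded`, the `corpus` object, and the `add_document` signature are assumptions, and the upload runs in a background thread so the performance issue discussed above does not block the download loop.

```python
# Sketch: upload each document to PyPLN right after it is downloaded,
# without blocking the downloader. Function names and the add_document
# signature are assumptions for illustration.
import threading


def send_to_pypln(corpus, text, filename):
    """Upload a single downloaded document to PyPLN."""
    # add_document is the pypln.api call discussed above; while it is
    # still slow, keeping it off the main thread avoids stalling downloads.
    corpus.add_document(text, filename=filename)


def on_document_downloaded(corpus, text, filename):
    """Hypothetical hook called by the downloader after each download."""
    # Fire-and-forget: the download loop continues immediately.
    threading.Thread(target=send_to_pypln,
                     args=(corpus, text, filename),
                     daemon=True).start()
```

A daemon thread keeps the example short, but it drops pending uploads if the process exits; the database-flag approach sketched earlier is the safer option for production.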