Feature/send to pypln #65

Open
wants to merge 4 commits into
base: master

Conversation

fccoelho
Member

This feature is basically done. However, the add_document call in PyPLN.api is taking too long to return, so before putting this into production we need to sort out that performance issue in PyPLN.api. @flavioamieiro, any ideas?

After this is done, a line must be added to downloader so that each downloaded document is sent to pypln immediately.

@flavioamieiro
Member

I think I really ran out of ideas on how to speed this up in PyPLN. Can @turicas help?

This code looks ok to me. I'd only remove the commented out code (for the corpus name option).

From what I can see, this code can be merged (since downloader doesn't call load() yet). Can I delete the commented out code and merge? Or is it better to wait for us to find a solution on PyPLN?

@fccoelho
Member Author

I think we need to fix the performance issue first, to avoid slowing down the capture. Do we have an issue open in pypln.api for this?

@flavioamieiro
Member

Now we do: NAMD/pypln.web#118. We can also think of a solution that doesn't block the rest of the downloader process while we wait for the upload. Maybe we can leave the downloader as it is now and have a separate process upload to pypln?

@turicas
Contributor

turicas commented Jun 29, 2014

@fccoelho, I think the best approach is to separate the downloading and uploading processes, so there will be no bottleneck between them. After downloading, a flag can be set in the database so the uploader script can find which documents to upload on its next run.

@fccoelho
Member Author

We can do this to remove this block for now. But if pypln is to succeed as an API, it must perform much better.
