Feature/send to pypln #65
base: master
Conversation
I think I really ran out of ideas on how to speed this up in PyPLN. Can @turicas help? This code looks ok to me. I'd only remove the commented out code (for the corpus name option). From what I can see, this code can be merged (since downloader doesn't call load() yet). Can I delete the commented out code and merge? Or is it better to wait for us to find a solution on PyPLN? |
I think we need to fix the performance issue first, to avoid slowing down the downloader.
Now we do: NAMD/pypln.web#118. We can also think of a solution that doesn't involve blocking the rest of the downloader process while we wait for the upload. Maybe we can leave the downloader as it is now and have another process upload to pypln?
@fccoelho, I think the best approach is to separate the downloading and uploading processes, so there will be no bottleneck between them. After downloading, a flag can be set on the database so the uploader script can find which documents to upload in the next run.
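A minimal sketch of that decoupled approach could look like the following. The `documents` collection, the `sent_to_pypln` flag, the database name, and the exact `pypln.api` calls (including the `add_document` signature) are all assumptions for illustration, not the project's actual schema or client usage.

```python
# Sketch of a standalone uploader script: it picks up documents the
# downloader flagged as not yet sent and uploads them to PyPLN, so the
# download and upload steps never block each other.
# NOTE: the database layout (a `sent_to_pypln` flag on a `documents`
# collection) and the pypln.api calls are assumptions for illustration.
import pymongo
from pypln.api import PyPLN  # assumed client import; check pypln.api for the real interface


def upload_pending(db, corpus):
    """Send every document still flagged as pending to PyPLN."""
    for doc in db.documents.find({'sent_to_pypln': False}):
        # add_document is the call discussed in this thread; its exact
        # signature here is an assumption.
        corpus.add_document(doc['content'], filename=doc.get('filename', 'document.txt'))
        # Flag the document so the next run of this script skips it.
        db.documents.update_one({'_id': doc['_id']},
                                {'$set': {'sent_to_pypln': True}})


if __name__ == '__main__':
    db = pymongo.MongoClient('localhost', 27017)['downloader']  # hypothetical database name
    # Endpoint, credentials, auth format and corpus below are placeholders.
    pypln = PyPLN('http://demo.pypln.org/', ('user', 'password'))
    corpus = pypln.add_corpus(name='downloaded-articles',
                              description='documents from the downloader')
    upload_pending(db, corpus)
```

The script can then be run periodically (e.g. from cron), independently of the downloader.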
We can do this to remove this block for now. But if pypln is to succeed as
This feature is basically done. However, the add_document call in PyPLN.api is taking too long to return, so before putting this into production we need to sort out this performance issue on PyPLN.api. @flavioamieiro, any ideas?
After this is done, a line must be added to the downloader so that each downloaded document is sent to pypln immediately.
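As a rough illustration of what that addition could look like, here is a hedged sketch: the hook name `on_document_downloaded`, the `corpus` object, and the `add_document` signature are assumptions, and the upload runs in a background thread so the performance issue discussed above does not block the download loop.

```python
# Sketch: upload each document to PyPLN right after it is downloaded,
# without blocking the downloader. Function names and the add_document
# signature are assumptions for illustration.
import threading


def send_to_pypln(corpus, text, filename):
    """Upload a single downloaded document to PyPLN."""
    # add_document is the pypln.api call discussed above; while it is
    # still slow, keeping it off the main thread avoids stalling downloads.
    corpus.add_document(text, filename=filename)


def on_document_downloaded(corpus, text, filename):
    """Hypothetical hook called by the downloader after each download."""
    # Fire-and-forget: the download loop continues immediately.
    threading.Thread(target=send_to_pypln,
                     args=(corpus, text, filename),
                     daemon=True).start()
```

A daemon thread keeps the example short, but it drops pending uploads if the process exits; the database-flag approach sketched earlier is the safer option for production.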