Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Send extracted links from extractor pool via channel #193

Closed

Conversation

TimoFreiberg
Copy link
Contributor

@TimoFreiberg TimoFreiberg commented Mar 26, 2021

I tried to make it look decent, maybe a different style while waiting for the first link would be sensible?

I recorded an example here:
https://user-images.githubusercontent.com/5281645/112684773-cf958d00-8e73-11eb-832b-3e116dba3545.mp4

And all those nitpick commits are just small things I noticed, I can drop them if you don't want them

src/collector.rs Outdated Show resolved Hide resolved
src/collector.rs Outdated Show resolved Hide resolved
"{spinner:.red.bright} {pos}/{len:.dim} [{elapsed_precise}] {bar:25} {wide_msg}",
));
bar.set_length(0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed that the progress bar is a bit "bumpy" in the beginning. (There's a better word for what I mean, but I can't seem to find it, heh.)
Keeping the initial length of links.len() is not possible, of course because at this point we don't have the length (since links is a channel now, not a HashSet), but I wonder if there is another way to avoid that "bumpy" behavior. Maybe we can get the initial length from the channel somehow to give a better estimate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you talking about the initial period where no links were extracted yet? Or that, when links come streaming in, the progress bar can jump around?
I don't see a way to get the amount of current messages already waiting in the stream, did you mean using the capacity of the stream as the initial value?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I meant the jumping when links start streaming in. Thought there was a length method for mpsc, but it doesn't exist. Probably mixed it up with another channel impl. Anyway, if we can't find a solution for the jumpiness, it's fine, too. 🤷‍♀️

Copy link
Member

@MichaIng MichaIng Sep 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The downside of concurrency 😄. I think its fine when the bar represents the n/m checked/gathered URLs in the front, else it may be even more confusing. Only other idea would be to not show any bar until all inputs have been processed, but this example (video in OP) shows that sometimes the checker is waiting for the extractor, so no process bar at all then 😢.

Should be fine as it is.

@mre
Copy link
Member

mre commented Mar 28, 2021

Thanks! Looks good so far. Added a few thoughts. 😃

@TimoFreiberg TimoFreiberg force-pushed the 55-extractor-pool branch 2 times, most recently from 8528735 to b3a9e91 Compare March 28, 2021 17:17
@mre
Copy link
Member

mre commented Sep 4, 2021

@TimoFreiberg we went through quite a bit of a refactor lately. Would you like to rebase on top of master and continue working on this or do you think it makes sense to close the PR and alternatively open another one?

@mre mre added the question Further information is requested label Sep 4, 2021
@TimoFreiberg
Copy link
Contributor Author

Oh. I totally forgot about this, as far as I remember we ended up at "it's fine like this", right? (last comment was "Anyway, if we can't find a solution for the jumpiness, it's fine, too. 🤷‍♀️")

I'll try to rebase/reimplement it in the next few weeks, and comment again

@mre
Copy link
Member

mre commented Sep 5, 2021

Thanks a lot!

@mre
Copy link
Member

mre commented Sep 6, 2021

@TimoFreiberg heads up: we have some perf problems with the way the collector is implemented. See the last comments in #21.
Specifically, loading all files before checking the links clearly doesn't work:

time lychee --offline -b . '**/*.html'                                                 ✘
Error: Too many open files (os error 24)

As a consequence I'd like to strip out deadpool and any sort of channel from lychee-lib as I think it's not helping for our use-case.
Instead I'd rather use vanilla tokio streams.

Some more links provided by @lebensterben in #165 (comment):

If you want to help with that, I'd be thankful for a PR. If you don't find the time that's also totally understandable. I just wanted to let you know right away that this PR might not make it into lychee as we have to do some refactoring due to perf problems. 😕

@TimoFreiberg
Copy link
Contributor Author

Hey @mre , I finally found the time and energy to start looking into this again. I'm interested in working with you on a refactoring that introduces streams :)

@mre
Copy link
Member

mre commented Sep 14, 2021

Nice! I'm working on this in the stream branch. Feel free to check it out. I'll create a draft PR for review and collaboration.

@mre mre mentioned this pull request Sep 14, 2021
6 tasks
@mre
Copy link
Member

mre commented Sep 16, 2021

@TimoFreiberg let's focus on #330 and move the discussion over there. I'll close this to keep the issue tracker clean. We can draw some inspiration from here in the future, though.

@mre mre closed this Sep 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants