Domain limiting #3

Open
clojens opened this issue Aug 20, 2013 · 1 comment
Comments

@clojens

clojens commented Aug 20, 2013

Hey dakrone, you mention Itsy's domain-limiting capabilities; can you elaborate? In this case, I'd like to extract only pages->text for pages whose domain matches a certain pattern. Of course I can hack this in somewhere, but I was wondering if Itsy already has something like that. You may already know frak, but in case you don't, it might be of interest. Thanks for all your work.

Cheers,
Rob (supersym)

@dakrone
Owner

dakrone commented Aug 20, 2013

Sure, the host limiter allows you to limit the URLs that Itsy fetches based on a hostname.

If you set the :host-limit option to true, Itsy only fetches URLs on the host of the original seed URL (so if you specify http://example.com/foo, Itsy would limit itself to example.com). If you set :host-limit to a string instead, Itsy will match URLs whose host contains that string.
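To make the two modes concrete, here is a rough sketch of the matching rule in Python. It is not Itsy's code (Itsy is a Clojure library); `host_allowed` and its parameters are hypothetical names used only for illustration. The sketch also assumes that the `true` case compares hosts exactly, which the description above doesn't spell out.

```python
from urllib.parse import urlparse


def host_allowed(url, seed_url, host_limit):
    """Hypothetical helper: decide whether `url` passes the host limit.

    host_limit mirrors the :host-limit option described above:
      True  -> only the seed URL's host is allowed (exact match assumed)
      str   -> any host containing that string is allowed
      other -> no limiting at all
    """
    host = urlparse(url).hostname or ""
    if host_limit is True:
        seed_host = urlparse(seed_url).hostname or ""
        return host == seed_host
    if isinstance(host_limit, str):
        return host_limit in host
    return True


seed = "http://example.com/foo"
same_host = host_allowed("http://example.com/bar", seed, True)
other_host = host_allowed("http://other.org/", seed, True)
substring = host_allowed("http://blog.example.com/x", seed, "example.com")
```

Note that the string mode is a plain substring test on the host, so a limit of "example.com" would also let through subdomains such as blog.example.com.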

Hopefully that helps explain it a bit more. It would be neat to use frak and parse the URL; I'll have to keep that in mind for the future. Thanks!
