
Authentication mechanism on the REST API of scrapyrt #68

Open
aleroot opened this issue Oct 16, 2017 · 2 comments


aleroot commented Oct 16, 2017

Basically, I want to prevent unauthorized clients from accessing the scrapyrt API.
I would like to secure a scrapyrt API: is there anything built in that handles an authorization mechanism?

What kind of approach do you suggest?

In addition, I would like to know whether there is some mechanism to limit the maximum number of requests per client.


pawelmhm commented Oct 19, 2017

hey @aleroot

is there anything built in that handles an authorization mechanism?

No, nothing is built in. You can do it in different ways. One way is to put scrapyrt behind another web server, for example nginx, and configure rate limiting and auth in nginx.
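For instance, a minimal nginx sketch (untested; it assumes scrapyrt is listening on its default port 9080 and that an htpasswd file already exists) could look like this:

# hypothetical /etc/nginx/conf.d/scrapyrt.conf
# allow roughly 10 requests per second per client IP
limit_req_zone $binary_remote_addr zone=scrapyrt:10m rate=10r/s;

server {
    listen 80;

    location / {
        auth_basic           "scrapyrt";                # HTTP Basic auth
        auth_basic_user_file /etc/nginx/.htpasswd;      # created with htpasswd
        limit_req            zone=scrapyrt burst=20;    # per-client rate limit
        proxy_pass           http://127.0.0.1:9080;     # scrapyrt behind the proxy
    }
}

This would also cover your second question: limit_req rejects clients that exceed the configured rate, so scrapyrt itself never sees the excess traffic.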

Another option is to write some Python code and override scrapyrt's default resource.

There is an option to create your own "resources", i.e. your own request handlers. You can do that by subclassing CrawlResource, overriding some methods, e.g. render_GET, and then calling super().

Adding resources is described here: http://scrapyrt.readthedocs.io/en/latest/api.html#resources
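Per those docs, you register your resource by pointing scrapyrt at a settings module via the RESOURCES setting; a minimal sketch (the module and class paths here are hypothetical) would be:

# mysettings.py -- hypothetical module, used with: scrapyrt -S mysettings
# map the endpoint path to your own resource class instead of the default
RESOURCES = {
    'crawl.json': 'myproject.resources.AleRootCrawlResource',
}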

For example, you can write a resource like this:

from scrapyrt.resources import CrawlResource


class AleRootCrawlResource(CrawlResource):

    def render_GET(self, request, **kwargs):
        # your code goes here, e.g. fetch the Basic auth header etc.
        ...
        return super(AleRootCrawlResource, self).render_GET(
            request, **kwargs)

I'll think about adding some more extensive examples to the docs with a Basic auth header; it could be useful for others.
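In the meantime, here is a rough sketch of such a check (untested; the credentials are placeholders, and returning a dict from the error branch assumes scrapyrt serializes dicts to JSON the same way it does for the normal response):

from scrapyrt.resources import CrawlResource


class BasicAuthCrawlResource(CrawlResource):
    # placeholder credentials for illustration only -- load real ones securely
    USERNAME = b'api'
    PASSWORD = b'secret'

    def render_GET(self, request, **kwargs):
        # Twisted parses the "Authorization: Basic ..." header into
        # request.getUser() / request.getPassword() (empty bytes if absent)
        if (request.getUser() != self.USERNAME
                or request.getPassword() != self.PASSWORD):
            request.setResponseCode(401)
            request.setHeader(b'WWW-Authenticate', b'Basic realm="scrapyrt"')
            return {'status': 'error', 'code': 401, 'message': 'Unauthorized'}
        return super(BasicAuthCrawlResource, self).render_GET(
            request, **kwargs)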

@oscarcontrerasnavas

Hi, I know this thread is a bit old, but bear with me. I explored this solution and created my own resource, but when I tried to add it according to the documentation, the only way that worked for me was to point to a specific settings.py file on the command line, like this:

scrapyrt -S nist_scraper.scrapyrt.settings

and it worked in my local environment; now the main CrawlResource is the one I coded. But when I tried to do the same on Heroku, in the Procfile, as follows:

web: scrapyrt -S nist_scraper.scrapyrt.settings -i 0.0.0.0 -p $PORT

the ScrapyRT part still uses the default resource. I do not know whether I cannot start scrapyrt with arguments on Heroku or whether there is another way to override the resources safely.

Repo here: https://github.com/oscarcontrerasnavas/nist-webbook-scrapyrt-spider
