My web scraper repo, which I hope to update fairly often.
As I play around with getting data off the internet and onto my drives, I'll share all of my stuff with the world because, well, why the hell not? I'll do my best to comment everything I write so that it's educational should anyone ever run into it on the internet. As more scripts are added, I'll list them below with short descriptions.
Super small script, as well as my first one of this kind. It does what it's meant to: it gets the contents of a specific div, identified here by its class, and then extracts the text of a paragraph from that div. Write it all out to a file and presto! Around 10k godawful jokes ready for you to feed to your NN or your neighbour's printer.
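For anyone curious what the div-then-paragraph idea looks like in code, here is a rough sketch using only Python's standard library. It is not the actual script from this repo: the class name `joke`, the names `JokeParser`, `extract_jokes` and `save_jokes`, and the one-joke-per-line output format are all assumptions made up for illustration.

```python
from html.parser import HTMLParser


class JokeParser(HTMLParser):
    """Collect the text of every <p> found inside a div with the target class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class  # hypothetical class name, e.g. "joke"
        self.div_depth = 0                # > 0 while we are inside the target div
        self.in_paragraph = False
        self.buffer = []
        self.jokes = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            # Track nested divs so we know when the target div really ends.
            if self.div_depth or self.target_class in classes:
                self.div_depth += 1
        elif tag == "p" and self.div_depth:
            self.in_paragraph = True
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "p" and self.in_paragraph:
            self.in_paragraph = False
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.jokes.append(text)

    def handle_data(self, data):
        if self.in_paragraph:
            self.buffer.append(data)


def extract_jokes(html, target_class="joke"):
    """Return the paragraph texts found inside divs of the given class."""
    parser = JokeParser(target_class)
    parser.feed(html)
    parser.close()
    return parser.jokes


def save_jokes(jokes, path):
    """Write one joke per line to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as handle:
        for joke in jokes:
            handle.write(joke + "\n")
```

Feed `extract_jokes` the HTML of a page, then hand the result to `save_jokes` along with a file name. A third-party parser would make this shorter; the standard library is used here only so the sketch runs without installing anything.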
This baby downloads PDF files from the Croatian Mathematics Society and finds one with specific keywords. The page can be changed, and it's probably useful for many similar websites; just make sure to brush up on your RegEx to get as many hits as possible.
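The RegEx part could look something like the sketch below. It is only a guess at the general approach, not the script itself: it assumes the keywords are matched against the PDF link (the script may well search the file contents instead), and the names `PDF_LINK`, `find_pdf_links` and `matches_keywords` are invented for this example.

```python
import re
from urllib.parse import urljoin

# Matches href="...pdf" or href='...pdf', case-insensitively.
PDF_LINK = re.compile(r'href=["\']([^"\']+\.pdf)["\']', re.IGNORECASE)


def find_pdf_links(html, base_url):
    """Return absolute URLs for every PDF link found in the page source."""
    return [urljoin(base_url, href) for href in PDF_LINK.findall(html)]


def matches_keywords(text, keywords):
    """True if any keyword appears in the text, ignoring case."""
    if not keywords:
        return False
    # Escape each keyword so it is matched literally, then OR them together.
    pattern = re.compile("|".join(re.escape(word) for word in keywords), re.IGNORECASE)
    return pattern.search(text) is not None
```

Run `find_pdf_links` on the page source, keep the links for which `matches_keywords` returns true, and download those, for example with `urllib.request.urlretrieve`. Loosening the pattern or adding keyword variants is where the extra hits come from.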