Systematic Approach To List Creation #9

jakem742 · Collaborator · 2022-05-26T23:53:33Z

The more I think about this project, the more I feel like we need a systematic approach to the scripting. We're literally going to end up with our own little app that can create/update lists from HTML info that has been scraped into a db like btx's. I wonder if it would be a good idea to do something similar for CBRO so that we can maintain it more consistently. The end goal would be to scrape, parse, import and inject the entries into our existing CBL files where they are intended to be. At that point, we would manually review the entries to check that there are no problems.
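A minimal sketch of the "inject into our existing CBL files" step. It assumes the common ComicRack `.cbl` layout (a `ReadingList` root holding `Books/Book` elements with `Series` and `Number` attributes) and a hypothetical `inject_after` helper that places new entries directly after a known anchor issue. Real lists may carry extra IDs or database references that this sketch ignores.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def inject_after(cbl_xml: str, anchor: tuple[str, str], new_books: list[dict]) -> str:
    """Hypothetical helper: insert new <Book> entries right after the book whose
    (Series, Number) matches `anchor`, keeping the order of `new_books`."""
    root = ET.fromstring(cbl_xml)
    books = root.find("Books")
    if books is None:
        raise ValueError("no <Books> element in list")
    for idx, book in enumerate(list(books)):
        if (book.get("Series"), book.get("Number")) == anchor:
            break
    else:
        # Loop finished without a match: refuse to guess where the entries go.
        raise ValueError(f"anchor {anchor} not found")
    for offset, attrs in enumerate(new_books, start=1):
        # XML attribute values must be strings, so convert e.g. issue numbers.
        new = ET.Element("Book", {k: str(v) for k, v in attrs.items()})
        books.insert(idx + offset, new)
    return ET.tostring(root, encoding="unicode")
```

ElementTree does not keep the XML declaration or unused namespace declarations, so the output is best diffed against the original file during the manual review pass rather than committed blindly.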

If we did try to build a system like this, we would need scripts to handle the following:

1. Scraping data
2. Parsing HTML info (different for each website we choose to include), possibly pausing for a manual check at this point to review any problem data
3. Importing and merging standardised data into a central db
4. Making manual edits to the data in the db
5. Exporting a new CBL file
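One way steps 3 to 5 could fit together, sketched with SQLite as the central db. The `entries` table, its columns, the `locked` flag and the `merge`/`export_cbl` helpers are all hypothetical; the idea is that a re-import can refresh scraped rows without overwriting rows someone has edited by hand. The upsert syntax needs SQLite 3.24 or newer, and the exported XML assumes the same `ReadingList`/`Books`/`Book` layout as a typical `.cbl` file.

```python
import sqlite3
import xml.etree.ElementTree as ET

SCHEMA = """
CREATE TABLE IF NOT EXISTS entries (
    list_name TEXT NOT NULL,
    position  INTEGER NOT NULL,
    series    TEXT NOT NULL,
    volume    TEXT NOT NULL,
    number    TEXT NOT NULL,
    source    TEXT NOT NULL,
    locked    INTEGER NOT NULL DEFAULT 0,  -- 1 = manually edited, skip on re-import
    PRIMARY KEY (list_name, series, volume, number)
)
"""


def merge(conn, records):
    """Step 3: upsert standardised records (dicts); locked rows keep their manual edits."""
    conn.executemany(
        """
        INSERT INTO entries (list_name, position, series, volume, number, source)
        VALUES (:list_name, :position, :series, :volume, :number, :source)
        ON CONFLICT (list_name, series, volume, number) DO UPDATE SET
            position = excluded.position,
            source = excluded.source
        WHERE entries.locked = 0
        """,
        records,
    )
    conn.commit()


def export_cbl(conn, list_name):
    """Step 5: write one list back out as CBL-style XML, ordered by position."""
    root = ET.Element("ReadingList")
    ET.SubElement(root, "Name").text = list_name
    books = ET.SubElement(root, "Books")
    rows = conn.execute(
        "SELECT series, volume, number FROM entries WHERE list_name = ? ORDER BY position",
        (list_name,),
    )
    for series, volume, number in rows:
        ET.SubElement(books, "Book", Series=series, Volume=volume, Number=number)
    return ET.tostring(root, encoding="unicode")
```

Step 4 is then just an `UPDATE` that also sets `locked = 1`, so the next scrape-and-merge run leaves that row alone.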

This is obviously a lot of work, but at the minute all we're doing is creating a single static list with no easy way to update/maintain it. While it's possible to manually update everything, it would be nice to create a system that minimises the need for our input.

Unless this seems like too much, and we'd rather just go with a static list for now. The project I'm describing will take a team to implement, so it's only an option if we're actually keen to make it happen.

DieselTech · Maintainer · 2022-05-27T03:38:35Z

I'd like to try to make the scrapers as comprehensive as possible. That way they serve two purposes: list creation, and a backup of each site if something should ever happen to it. Currently, I would consider one of the sites to be in jeopardy. From there on, you have the right idea. We can do the parsing against our own data source and not have to worry about hitting the main sites with what amount to miniature DDoS attacks. I think if we spend the most time making sure point 1 is done correctly, the rest of the points will fall into line naturally.
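A sketch of the "scrape once, parse against our own copy" idea: a hypothetical `CachedScraper` that stores each page's raw HTML on disk, keyed by a hash of its URL, and throttles live requests. The two-second default delay is an arbitrary placeholder, and the `fetch` parameter is an assumption added so the fetching function can be swapped out (for example, in tests).

```python
import hashlib
import time
import urllib.request
from pathlib import Path
from typing import Callable


def http_get(url: str) -> bytes:
    """Plain HTTP GET using only the standard library."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


class CachedScraper:
    """Hypothetical helper: fetch each page from the live site at most once and
    keep the raw HTML on disk so parsing can be rerun against our own copy."""

    def __init__(self, cache_dir, delay: float = 2.0,
                 fetch: Callable[[str], bytes] = http_get):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.delay = delay          # seconds between live requests (placeholder value)
        self.fetch = fetch          # injectable so tests never hit the network
        self._last_request = None   # monotonic time of the last live request

    def path_for(self, url: str) -> Path:
        # Hash the URL so every URL maps to a stable, filesystem-safe filename.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.html"

    def get(self, url: str) -> bytes:
        path = self.path_for(url)
        if path.exists():
            return path.read_bytes()  # cache hit: the live site is not contacted
        if self._last_request is not None:
            wait = self.delay - (time.monotonic() - self._last_request)
            if wait > 0:
                time.sleep(wait)      # throttle consecutive live requests
        body = self.fetch(url)
        self._last_request = time.monotonic()
        path.write_bytes(body)
        return body
```

Because the raw pages stay in the cache directory, that directory doubles as the site backup, and parser changes can be rerun over it as often as needed without touching the live site.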
