Add function to load external html files #33

Closed
@wanLo wants to merge 2 commits

Contributor

@wanLo wanLo commented Mar 24, 2023

We maintain WebBot, a browser extension that simulates browsing behavior in various search engines. I just came across WebSearcher and thought it would be awesome to use its advanced parsing capabilities on search result HTML files obtained with the WebBot extension. This is why I would like to propose a load_serp function that mirrors the existing save_serp function.

The other commit is rather minor, replacing the placeholder package bs4 with the official beautifulsoup4 package (see https://pypi.org/project/bs4/)
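
For illustration, a minimal sketch of what a load_serp-style helper could look like. This is not the code from the commit: the signature, the `encoding` parameter, and the error handling are assumptions made for the example, and it only covers plain (uncompressed) HTML files.

```python
from pathlib import Path


def load_serp(fp, encoding="utf-8"):
    """Hypothetical helper: read a saved SERP HTML file into a string.

    `fp` is the path to an HTML file on disk (for example, one exported
    by another tool). The name mirrors the proposal; the body is a sketch.
    """
    path = Path(fp)
    if not path.is_file():
        # Fail early with a clear message instead of a bare open() error.
        raise FileNotFoundError(f"No HTML file found at {path}")
    return path.read_text(encoding=encoding)
```

The returned string could then be handed to whatever parser is in use, for instance `BeautifulSoup(html, "html.parser")` when beautifulsoup4 is installed.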

gitronald added a commit that referenced this pull request Nov 14, 2023
* update: component classifier functions

* add: banner component parser

* fix: return model as dict, require pydantic

* fix: nonetype has no attrs; ignore if no img

* fix: filter empty rso component

* update: modularized component classifier

* Bump to 0.3.2

* Replace dummy package bs4 with official name

* Add function to load external html files

* update: removed load_serp, see load_html in ws.webutils

---------

Co-authored-by: wanLo <[email protected]>
@gitronald
Owner

Thanks! Left out the load_serp function because we already have something similar via ws.webutils.load_html. Merged the other parts in #43

@gitronald gitronald closed this Nov 14, 2023