using LLM to detect website structure #133

Open
berkbirkan opened this issue Aug 25, 2024 · 0 comments

Hello. I am not sure whether this is the right place to ask. I am quite new to this, so I apologize in advance if I am doing something wrong.

I need the following:

  1. build a full-content RSS/JSON feed from an existing RSS feed URL
  2. build an RSS feed from a website URL (with or without AI)
  3. extract the content of a single page from its URL (e.g. a news article or blog post)
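To make item 1 more concrete, here is a rough sketch of the kind of output I have in mind. It only uses the Python standard library and assumes a plain RSS 2.0 feed (no Atom, no namespaces). The `fetch_article` callable and the JSON field names are placeholders I made up for illustration; they are not part of morss.

```python
import json
import xml.etree.ElementTree as ET
from typing import Callable


def feed_to_full_json(rss_xml: str, fetch_article: Callable[[str], str]) -> str:
    """Turn RSS 2.0 XML into JSON where every item carries its full content.

    fetch_article is a hypothetical callable: it receives an item link and
    returns the extracted article text (however that extraction is done).
    """
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": link,
            "summary": (item.findtext("description") or "").strip(),
            # Items without a link cannot be fetched, so content stays empty.
            "content": fetch_article(link) if link else "",
        })
    return json.dumps({"items": items}, ensure_ascii=False, indent=2)
```

The extraction step is passed in as a function on purpose, so the same feed code could work with a rule-based extractor or an AI-based one.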

All three should work on any site, because I am building a product for end users and everyone's needs will be different. In my tests, some sites are extracted without any problems, while others fail. I think morss cannot handle those sites because their structure is very different from what it expects. As far as I can tell, morss recognizes certain structures through fixed, hand-written rules and cannot adapt to a new structure dynamically. Could artificial intelligence be used to detect the structure of such sites and then extract the content (similar to what the scrapegraph-ai library does)?
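Here is a minimal sketch of the idea, in case it helps. It assumes the model is asked only once per site for the CSS class of the main content container, and the cached answer is then applied with an ordinary parser. `ask_model` is a hypothetical callable standing in for whatever LLM client is used; nothing here is the actual API of morss or scrapegraph-ai.

```python
from html.parser import HTMLParser
from typing import Callable, Dict
from urllib.parse import urlparse

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class _ClassTextExtractor(HTMLParser):
    """Collects the text inside the first element carrying a given class."""

    def __init__(self, css_class: str):
        super().__init__()
        self.css_class = css_class
        self.depth = 0      # > 0 while we are inside the target element
        self.done = False   # True once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif not self.done:
            classes = (dict(attrs).get("class") or "").split()
            if self.css_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_with_learned_rule(url: str, html: str,
                              ask_model: Callable[[str], str],
                              cache: Dict[str, str]) -> str:
    """Ask the (hypothetical) model for a class name once per host, then reuse it."""
    host = urlparse(url).netloc
    if host not in cache:
        prompt = ("Reply with only the CSS class of the element that wraps "
                  "the main article text in this HTML:\n" + html[:4000])
        cache[host] = ask_model(prompt).strip().lstrip(".")
    parser = _ClassTextExtractor(cache[host])
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left over from the markup.
    return " ".join(" ".join(parser.chunks).split())
```

The depth counting assumes reasonably well-formed HTML (every non-void tag is closed), so a real version would need a more forgiving parser. The point is only that the expensive AI step would run once per site to learn a rule, not on every page.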

thank you
