The current implementation is messy and very hard to maintain or change. The new implementation should be compatible with the current one and add new features:
Should normalize relative links
Should validate links and discard invalid ones
Should extract deep web .onion links
Should extract anchor text
Should extract text around links
Should extract meta-tags (description, keywords, etc)
Should decode HTML entities to regular characters (turn &amp; into &) in links
Should decode HTML entities to regular characters (turn &amp; into &) in text
Should remove the fragment portion of the URL (anything after the character #)
Should do basic link normalization (lowercase domain, reorder query parameters, etc)
NEW: Extract links to images and regular links separately
NEW: Allow for easy extension, e.g. extracting additional meta tags such as og:description, og:title, etc
etc
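
To make the requirements above more concrete, here is a minimal sketch of how several of them could fit together using only the Python standard library. It is not the existing implementation: the names `normalize_link`, `is_onion` and `LinkExtractor` are hypothetical, "valid" is assumed to mean an http(s) URL with a host, and extraction of text around links is left out. Note that `html.parser` already decodes HTML entities in attribute values and (with `convert_charrefs=True`) in text, so no separate decoding step is shown.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qsl, urlencode, urljoin, urlsplit, urlunsplit


def normalize_link(base_url, href):
    """Return a normalized absolute URL, or None if the link is invalid."""
    try:
        # Resolve relative links against the page URL.
        parts = urlsplit(urljoin(base_url, href.strip()))
        # Assumption: only http(s) URLs with a host count as valid.
        if parts.scheme not in ("http", "https") or not parts.hostname:
            return None
        # urlsplit().hostname is already lowercased; keep an explicit port.
        netloc = parts.hostname
        if parts.port:
            netloc += f":{parts.port}"
    except ValueError:
        # Raised for malformed hosts or ports.
        return None
    # Sort query parameters so equivalent URLs compare equal.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # An empty last component drops the fragment (anything after '#').
    return urlunsplit((parts.scheme, netloc, parts.path or "/", query, ""))


def is_onion(url):
    """True if the URL's host is a .onion address."""
    return (urlsplit(url).hostname or "").endswith(".onion")


class LinkExtractor(HTMLParser):
    """Collects regular links (with anchor text), image links and meta tags."""

    def __init__(self, base_url):
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.links = []    # dicts: {"url", "text", "onion"}
        self.images = []   # normalized image URLs
        self.meta = {}     # e.g. {"description": ..., "og:title": ...}
        self._current = None  # link whose anchor text is being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            url = normalize_link(self.base_url, attrs["href"])
            if url:
                self._current = {"url": url, "text": "", "onion": is_onion(url)}
        elif tag == "img" and attrs.get("src"):
            url = normalize_link(self.base_url, attrs["src"])
            if url:
                self.images.append(url)
        elif tag == "meta":
            # <meta name=...> and Open Graph <meta property=...> share one path.
            key = attrs.get("name") or attrs.get("property")
            if key and attrs.get("content") is not None:
                self.meta[key.lower()] = attrs["content"]

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            # Collapse whitespace in the collected anchor text.
            self._current["text"] = " ".join(self._current["text"].split())
            self.links.append(self._current)
            self._current = None
```

Usage would be along the lines of `p = LinkExtractor(page_url); p.feed(html); p.close()`, after which `p.links`, `p.images` and `p.meta` hold the results. Two caveats of this sketch: any `user:password@` part of a URL is dropped, and rebuilding the query with `urlencode` may change how parameters are percent-encoded. New meta tags need no code changes because every `name` or `property` key is stored.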