Skip to content
/ xmlv Public

XMLVectorizer turn elements of a web page to numpy vectors

Notifications You must be signed in to change notification settings

kerighan/xmlv

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

22 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

XMLVectorizer

XMLVectorizer brings web pages to deep learning. Given a web page, this library returns a vector for each element as well as the xml graph structure. This library allows for elements to be passed to a graph neural network.

from xmlv import XMLV


xmlv = XMLV(vectorize_text=True, vectorize_link=True)
# attr is a dataframe where each line is an element and columns properties
# G is the networkx graph extracted from the web page
attr, G = xmlv.get("https://en.wikipedia.org/wiki/Lists_of_ambassadors")

# this fits the model using the attributes and vectorizes the properties
# X is a numpy matrix
X = xmlv.fit_transform(attr)
# G and X can the be used to fit a graph neural network

# model persistence
xmlv.save("xmlv.p")
xmlv = XMLV.load("xmlv.p")

About

XMLVectorizer turn elements of a web page to numpy vectors

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages