How do I run DBTransformer with my own datasets? #28

Open
BarclayII opened this issue Feb 2, 2024 · 1 comment

@BarclayII

Thanks for the interesting work! I'd like to test the scalability of this framework, but I'm having trouble finding the entry point for running DBTransformer. Do you have any instructions or an example script showing how I could load data and train a DBTransformer on it?

Also, I found your paper saying that the data loading is done by online SQL queries. Does that mean (1) the entire dataset should reside in SQL, and (2) you do minibatch training with things like neighbor/subgraph sampling?

@jakubpeleska
Collaborator

Hi, thanks for taking an interest in this project. Here is a script that is a modified version of main.py, with the DB server URL and target database config exposed. Hope this helps with testing DBTransformer on your own data.

For the second question, we currently have two data-loading approaches, chosen by whether the dataset fits into memory. If the dataset is too large, we use online SQL queries (ideally against a local copy of the dataset) with BFS to get the subgraph. If the dataset is small enough, it is possible to work with the graph as a whole, which makes training considerably faster.
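
To make the first approach more concrete, here is a minimal sketch of BFS subgraph extraction through online SQL queries. It is not the project's actual loader: the schema (`customer`, `orders`, `order_item`), the `FOREIGN_KEYS` list, and the helpers `build_demo_db`, `neighbors`, and `bfs_subgraph` are all hypothetical, and it uses an in-memory SQLite database from the Python standard library in place of a real DB server. Rows are treated as nodes and foreign-key references as edges.

```python
import sqlite3
from collections import deque

# Hypothetical schema: (child_table, fk_column, parent_table).
# Assumption: every table has an integer primary key named "id".
FOREIGN_KEYS = [
    ("orders", "customer_id", "customer"),
    ("order_item", "order_id", "orders"),
]


def build_demo_db():
    """Create a tiny in-memory database standing in for a real DB server."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
        CREATE TABLE order_item (id INTEGER PRIMARY KEY, order_id INTEGER);
        INSERT INTO customer VALUES (1), (2);
        INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
        INSERT INTO order_item VALUES (100, 10), (101, 12);
    """)
    return conn


def neighbors(conn, table, pk):
    """Yield (table, pk) for rows one foreign-key hop away, in either direction."""
    # Table and column names come from the fixed FOREIGN_KEYS list above,
    # never from user input, so formatting them into the query is safe here.
    for child, fk_col, parent in FOREIGN_KEYS:
        if table == parent:
            # Rows in the child table that reference this row.
            query = f"SELECT id FROM {child} WHERE {fk_col} = ?"
            for (child_pk,) in conn.execute(query, (pk,)):
                yield child, child_pk
        if table == child:
            # The parent row this row references (skip NULL foreign keys).
            query = f"SELECT {fk_col} FROM {child} WHERE id = ?"
            for (parent_pk,) in conn.execute(query, (pk,)):
                if parent_pk is not None:
                    yield parent, parent_pk


def bfs_subgraph(conn, seed, max_hops):
    """Collect all rows within max_hops foreign-key hops of the seed row."""
    visited = {seed}
    edges = set()
    queue = deque([(seed, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for nb in neighbors(conn, *node):
            # Store each undirected edge once, in a canonical order.
            edges.add(tuple(sorted((node, nb))))
            if nb not in visited:
                visited.add(nb)
                queue.append((nb, depth + 1))
    return visited, edges


if __name__ == "__main__":
    conn = build_demo_db()
    nodes, edges = bfs_subgraph(conn, ("customer", 1), max_hops=2)
    print(sorted(nodes))
    print(sorted(edges))
```

Each minibatch would call something like `bfs_subgraph` once per seed row, so only the sampled neighborhood is ever pulled from the database. The hop limit bounds the subgraph size; a real loader would likely also cap the number of neighbors per hop and fetch feature columns alongside the keys.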
