The `HeteroDataBuilder` currently does the following:
- loads each table (in full) using `pd.read_sql`
- computes the `edge_index` for each relation using Pandas on top of the loaded tables
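
To make the two steps above concrete, here is a minimal sketch of how they could look. It is not the actual `HeteroDataBuilder` code: the in-memory SQLite database, the `customer`/`orders` tables, and the `edge_index` helper are all hypothetical and only there for illustration.

```python
import sqlite3

import numpy as np
import pandas as pd

# Hypothetical toy database; the real builder would use the project's connection.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customer VALUES (10, 'a'), (20, 'b');
    INSERT INTO orders VALUES (1, 10), (2, 20), (3, 10);
    """
)

# Step 1: load each table in full with pd.read_sql.
tables = {
    name: pd.read_sql(f"SELECT * FROM {name} ORDER BY id", conn)
    for name in ("customer", "orders")
}


def edge_index(src, dst, fk, pk):
    """Hypothetical helper: map foreign-key values in src to row positions in dst."""
    # get_indexer returns -1 for foreign keys without a matching primary key.
    dst_rows = pd.Index(dst[pk]).get_indexer(src[fk])
    keep = dst_rows >= 0
    # Row 0 holds source row positions, row 1 holds destination row positions.
    return np.stack([np.flatnonzero(keep), dst_rows[keep]])


# Step 2: compute the edge_index for the orders -> customer relation.
ei = edge_index(tables["orders"], tables["customer"], "customer_id", "id")
```

Because the mapping is done on whole columns at once, no Python-level loop over rows is needed, which is presumably where most of the speed comes from.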
The `HeteroDataBuilder` approach is really fast. Since Pandas also supports `pd.read_sql_query` for any SQL query built using SQLAlchemy, I propose to rewrite `BFSStrategy` using Pandas as well. I expect the benefits to be speed (hopefully), cleaner code, and results that are more consistent with `HeteroDataBuilder` (the new type converters already use Pandas, also for speed reasons).
I think the new `BFSStrategy` could work as follows:
1. load the target table (or a batch from the target table) using a single call to `pd.read_sql`
2. load the joined tables the same way, one query per step of the BFS
3. compute the `edge_index` at the end, similarly to how I do it in `HeteroDataBuilder` (hopefully)
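
A rough sketch of what these steps might look like, under several assumptions: the `FOREIGN_KEYS` list, the `fetch` and `bfs_load` helpers, and the toy SQLite schema are hypothetical, and plain SQL strings stand in for SQLAlchemy queries to keep the example self-contained. It also visits each table at most once, which a real `BFSStrategy` may not want.

```python
import sqlite3
from collections import deque

import pandas as pd

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE item (id INTEGER PRIMARY KEY, order_id INTEGER);
    INSERT INTO customer VALUES (10, 'a'), (20, 'b'), (30, 'c');
    INSERT INTO orders VALUES (1, 10), (2, 20), (3, 10), (4, 30);
    INSERT INTO item VALUES (100, 1), (101, 3), (102, 4), (103, 2);
    """
)

# Hypothetical schema description: (child table, fk column, parent table, pk column).
FOREIGN_KEYS = [
    ("orders", "customer_id", "customer", "id"),
    ("item", "order_id", "orders", "id"),
]


def fetch(table, column, values):
    """Load the rows of `table` whose `column` is in `values` with one query."""
    if not values:
        return pd.read_sql_query(f"SELECT * FROM {table} WHERE 0", conn)
    placeholders = ",".join("?" * len(values))
    sql = f"SELECT * FROM {table} WHERE {column} IN ({placeholders}) ORDER BY id"
    return pd.read_sql_query(sql, conn, params=list(values))


def bfs_load(target, pk, batch_ids, max_depth=2):
    """Load a batch of the target table, then follow foreign keys breadth-first."""
    frames = {target: fetch(target, pk, batch_ids)}
    queue = deque([(target, 0)])
    while queue:
        table, depth = queue.popleft()
        if depth == max_depth:
            continue
        for child, fk, parent, parent_pk in FOREIGN_KEYS:
            if parent == table and child not in frames:
                # Follow the relation from parent rows to the children referencing them.
                ids = frames[table][parent_pk].tolist()
                frames[child] = fetch(child, fk, ids)
                queue.append((child, depth + 1))
            elif child == table and parent not in frames:
                # Follow the relation from child rows to the parents they reference.
                ids = frames[table][fk].dropna().unique().tolist()
                frames[parent] = fetch(parent, parent_pk, ids)
                queue.append((parent, depth + 1))
    return frames


frames = bfs_load("customer", "id", [10])
```

Each BFS step issues exactly one query per relation, and the resulting DataFrames contain only the rows reachable from the batch, so the `edge_index` for every relation could be computed on them once the traversal is done.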
Finally, we should probably merge `HeteroDataBuilder` with `Dataset` and find a clean way to offer the two approaches as different strategies for the dataset ("full strategy" vs "bfs strategy").