Creating sfnetwork from GRAPHML with missing geometries #257

loreabad6 · 2023-09-19T16:58:32Z

loreabad6
Sep 19, 2023
Collaborator

From #256 @JosiahParry shared a dataset from the Geographic Data Science with Python book.

If I follow the way the graph was created there, using tidygraph, you will notice that the edges actually have a geometry column.

library(sf)
library(dplyr)
library(igraph)
library(tidyr) # for drop_na(), used further down
library(sfnetworks)

gurl <- "https://raw.githubusercontent.com/gdsbook/book/master/data/cache/yoyogi_park_graph.graphml"

g <- igraph::read_graph(gurl, format = "graphml")

# convert to tbl_graph (despite the name, `nodes` holds the whole graph)
nodes <- tidygraph::as_tbl_graph(g)
nodes
#> # A tbl_graph: 106 nodes and 287 edges
#> #
#> # A directed multigraph with 1 component
#> #
#> # A tibble: 106 × 5
#>   highway street_count x           y          id       
#>   <chr>   <chr>        <chr>       <chr>      <chr>    
#> 1 ""      3            139.6943335 35.6700868 886196069
#> 2 ""      3            139.6995077 35.669725  886196073
#> 3 ""      3            139.6997081 35.6694423 886196100
#> 4 ""      4            139.6985635 35.6704217 886196106
#> 5 ""      3            139.6974699 35.6712558 886196117
#> 6 ""      3            139.697167  35.6714092 886196121
#> # ℹ 100 more rows
#> #
#> # A tibble: 287 × 11
#>    from    to access name  bridge geometry     length oneway highway osmid id   
#>   <int> <int> <chr>  <chr> <chr>  <chr>        <chr>  <chr>  <chr>   <chr> <chr>
#> 1     1    29 ""     ""    ""     LINESTRING … 191.3… False  footway 7508… 0    
#> 2     1     8 ""     ""    ""     LINESTRING … 134.2… False  footway 7508… 0    
#> 3     1    19 ""     ""    ""     LINESTRING … 65.16… False  footway 1385… 0    
#> # ℹ 284 more rows

Josiah built the sfnetwork from the nodes by parsing their x and y columns into an sf geometry object and copying it back in:

# create nodes geometry
n_geo <- as_tibble(nodes) |>
  transmute(across(c(x, y), as.numeric)) |>
  st_as_sf(coords = c("x", "y"), crs = 4326) |>
  st_geometry()

# move geometry into sfn
g_nodes <- nodes |>
  mutate(geometry = n_geo) 

g_sf <- sfnetworks::as_sfnetwork(g_nodes)

This works, BUT it ignores the original geometry stored on the edges. Why? Because the edge geometry is a character column of WKT strings, not an sf list-column (sfc), so it is not recognized as geometry. Calling g_sf |> tidygraph::convert(to_spatial_explicit) then only draws straight lines between the node coordinates, losing the original line geometry stored in the edges.
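To make the list-column point concrete, here is a minimal sketch on made-up data (the two-row `edges_chr` table is an assumption, not the Yoyogi Park graph). A WKT character column is not an `sfc` object until it is parsed, which is presumably why the edge geometry gets skipped.

```r
library(sf)

# Toy edge table (made-up data): geometry stored as WKT text,
# similar to what comes out of the GRAPHML file
edges_chr <- data.frame(
  from = c(1L, 2L),
  to = c(2L, 3L),
  geometry = c("LINESTRING (0 0, 1 1)", "LINESTRING (1 1, 2 0)")
)

# A plain character column is not an sf geometry column
is_sfc_before <- inherits(edges_chr$geometry, "sfc")

# Parsing the WKT gives an sfc list-column
geom <- st_as_sfc(edges_chr$geometry, crs = 4326)
is_sfc_after <- inherits(geom, "sfc")

c(before = is_sfc_before, after = is_sfc_after)
```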

So instead of building the sfnetwork from the nodes as Josiah did, I tried to build it from the edges, but ran into problems. In essence, the geometry column contains empty strings, and sf does not support this (see r-spatial/sf#1034).

# extract edges as tibble
edges = nodes |> activate(edges) |> as_tibble()

# edges have a geometry column! but it is character
edges |> glimpse()
#> Rows: 287
#> Columns: 11
#> $ from     <int> 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7…
#> $ to       <int> 29, 8, 19, 3, 4, 67, 2, 10, 64, 20, 2, 5, 67, 9, 6, 4, 9, 5, …
#> $ access   <chr> "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "…
#> $ name     <chr> "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "…
#> $ bridge   <chr> "", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "…
#> $ geometry <chr> "LINESTRING (139.6943335 35.6700868, 139.6942113 35.6701483, …
#> $ length   <chr> "191.36999999999998", "134.209", "65.16300000000001", "36.275…
#> $ oneway   <chr> "False", "False", "False", "False", "False", "False", "False"…
#> $ highway  <chr> "footway", "footway", "footway", "footway", "footway", "footw…
#> $ osmid    <chr> "75089904", "75089904", "138582748", "75089904", "75089904", …
#> $ id       <chr> "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "0", "…

# if we try to transform that to an sf object
edges |> st_as_sf(wkt = "geometry")
#> OGR: Unsupported geometry type
#> Error in eval(expr, envir, enclos): OGR error

# we get an error, the reason?
edges |> pull(geometry) |> head()
#> [1] "LINESTRING (139.6943335 35.6700868, 139.6942113 35.6701483, 139.6940878 35.6702175, 139.6939624 35.670293, 139.6938106 35.6704168, 139.6937429 35.6704875, 139.693679 35.6705662, 139.6936109 35.6706645, 139.693546 35.670771, 139.6934539 35.6709768, 139.6933705 35.6712665, 139.6933134 35.671511)"
#> [2] "LINESTRING (139.6943335 35.6700868, 139.6945793 35.6700204, 139.6947102 35.6699975, 139.6948197 35.6699821, 139.6949671 35.6699792, 139.6951141 35.669985, 139.6952919 35.6700054, 139.6954672 35.6700381, 139.6957673 35.6701448)"                                                                    
#> [3] "LINESTRING (139.6943335 35.6700868, 139.6942783 35.6699959, 139.6940377 35.6695524)"                                                                                                                                                                                                                   
#> [4] ""                                                                                                                                                                                                                                                                                                      
#> [5] "LINESTRING (139.6995077 35.669725, 139.6993668 35.6698732, 139.6992525 35.6699782, 139.6991484 35.6700611, 139.6987601 35.6703108, 139.6985635 35.6704217)"                                                                                                                                            
#> [6] "LINESTRING (139.6995077 35.669725, 139.6996204 35.6698159, 139.6997011 35.6698984, 139.6997476 35.6699517, 139.6997997 35.6700207)"

# there are empty geometries
# how to handle this?

One way to solve this would be to drop all edges with empty geometries:

edges_clean = edges |> 
  mutate(geometry = na_if(geometry, ""))

edges_sf = edges_clean |> 
  drop_na(geometry) |> 
  st_as_sf(wkt = "geometry")

But this breaks the link to the original data, since we are dropping information (every edge with an empty geometry disappears). Another option would be to draw straight lines between the end nodes of the edges with empty geometries, since those coordinates are available from the original node data.
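The straight-line idea could look roughly like this. It is only a sketch on made-up data: `fill_empty_edges()` is a hypothetical helper, not an sf or sfnetworks function. It assumes the edge geometries are already parsed into an sfc column (with EMPTY entries) and that `from` and `to` are row indices into the node table, as in the tbl_graph printout above.

```r
library(sf)

# Made-up nodes: three points
nodes_toy <- st_sf(
  id = 1:3,
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)), st_point(c(2, 0)),
                    crs = 4326)
)

# Made-up edges: the second one has an EMPTY geometry
edges_toy <- st_sf(
  from = c(1L, 2L),
  to = c(2L, 3L),
  geometry = st_sfc(
    st_linestring(rbind(c(0, 0), c(0.5, 0.8), c(1, 1))),
    st_linestring(),
    crs = 4326
  )
)

# Hypothetical helper: replace EMPTY edge geometries with a straight
# line between the edge's end nodes, keep all other geometries as-is
fill_empty_edges <- function(edges, nodes) {
  xy <- unname(st_coordinates(nodes)) # one row of x, y per node
  geom <- st_geometry(edges)
  empty <- st_is_empty(geom)
  filled <- lapply(seq_along(geom), function(i) {
    if (!empty[i]) return(geom[[i]])
    st_linestring(rbind(xy[edges$from[i], ], xy[edges$to[i], ]))
  })
  st_set_geometry(edges, st_sfc(filled, crs = st_crs(edges)))
}

edges_filled <- fill_empty_edges(edges_toy, nodes_toy)
```

This keeps every edge, at the cost of replacing the unknown shape with a straight segment.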

I am not sure what the rest of the analysis looks like for your example, Josiah, but I guess that, at least in terms of plotting, you are not getting the same results as in the Python book.

Also, how do they handle the empty geometries? I would be curious to know that.

JosiahParry · 2023-09-19T19:52:05Z

JosiahParry
Sep 19, 2023

It's possible to convert the WKT to sfc geometry using {wk} like so. I think having an EMPTY GEOMETRY doesn't really make sense for an edge? Should that be possible? Probably not. So skipping those might be necessary.

library(sf)
#> Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
library(dplyr)
library(igraph)
library(sfnetworks)

gurl <- "https://raw.githubusercontent.com/gdsbook/book/master/data/cache/yoyogi_park_graph.graphml"

g <- igraph::read_graph(gurl, format = "graphml")

tg <- tidygraph::as_tbl_graph(g) 

nodes <- tg |> 
  as_tibble() |>
  mutate(across(c(x, y), as.numeric)) |>
  st_as_sf(coords = c("x", "y"), crs = 4326) 

# extract edges as tibble
edges <- tg |> 
  activate(edges) |> 
  as_tibble() |> 
  mutate(geometry = st_as_sfc(wk::wkt(na_if(geometry, "")))) |> 
  st_as_sf(crs = 4326)

st_geometry(edges)
#> Geometry set for 287 features  (with 85 geometries empty)
#> Geometry type: LINESTRING
#> Dimension:     XY
#> Bounding box:  xmin: 139.6909 ymin: 35.66601 xmax: 139.7014 ymax: 35.67426
#> Geodetic CRS:  WGS 84
#> First 5 geometries:
#> LINESTRING (139.6943 35.67009, 139.6942 35.6701...
#> LINESTRING (139.6943 35.67009, 139.6946 35.6700...
#> LINESTRING (139.6943 35.67009, 139.6943 35.67, ...
#> LINESTRING EMPTY
#> LINESTRING (139.6995 35.66972, 139.6994 35.6698...

Created on 2023-09-19 with reprex v2.0.2
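The reason this works seems to be that `wk::wkt()` accepts `NA`, and the conversion to sfc turns it into an empty geometry instead of failing. A toy check, with a made-up two-element vector (not the actual edge data):

```r
library(sf)
library(dplyr) # for na_if()

# Made-up WKT vector: one valid linestring, one empty string
wkt_chr <- c("LINESTRING (0 0, 1 1)", "")

# Turn "" into NA first, then let {wk} handle the conversion to sfc
geoms <- st_as_sfc(wk::wkt(na_if(wkt_chr, "")))

# Flags which elements ended up as empty geometries
st_is_empty(geoms)
```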

1 reply

loreabad6 Sep 19, 2023
Collaborator Author

Ah, that's a nice trick with {wk}. I also think dropping them makes sense, but I'm curious to know how this affects the original network. The tbl_graph does show, for instance, that edge 4, which is empty, goes from node 2 to node 3, and both have a geometry. So what is the network supposed to look like? Are we losing information if we skip those edges? I'll explore more tomorrow ;)

loreabad6 · 2023-09-20T09:40:59Z

loreabad6
Sep 20, 2023
Collaborator Author

library(sf)
library(tidygraph)
library(dplyr)
library(igraph)
library(tidyr)
library(sfnetworks)

gurl <- "https://raw.githubusercontent.com/gdsbook/book/master/data/cache/yoyogi_park_graph.graphml"

g <- igraph::read_graph(gurl, format = "graphml")

# convert to tbl_graph
tg = tidygraph::as_tbl_graph(g)

So after playing a bit more with this example, I found a couple of ways to create the sfnetwork.

First, I extract the network components:

# create geometry column from nodes X and Y
nodes_coords = as_tibble(tg) |>
  transmute(across(c(x, y), as.numeric)) |>
  st_as_sf(coords = c("x", "y"), crs = 4326) |>
  st_geometry()

# extract only nodes as sf
nodes_sf = tg |>
  as_tibble() |> 
  mutate(geometry = nodes_coords) |> 
  st_as_sf(crs = 4326)

# add geometry to the `tbl_graph` nodes 
nodes_tg = tg |>
  activate(nodes) |> 
  mutate(geometry = nodes_coords) 

# extract only edges as tibble
edges = tg |>
  activate(edges) |> 
  as_tibble()

# create an sf object from the edges including empty geometries
edges_empty = edges |> 
  mutate(geometry = st_as_sfc(wk::wkt(na_if(geometry, "")))) |> 
  st_as_sf(crs = 4326)

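# create an sf object from the edges excluding empty geometries
# (drop_na() is restricted to the geometry column, so NAs in other
# columns can never remove an edge)
edges_filt = edges |> 
  mutate(geometry = na_if(geometry, "")) |> 
  drop_na(geometry) |> 
  mutate(geometry = st_as_sfc(wk::wkt(geometry))) |> 
  st_as_sf(crs = 4326)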

Now we can create sfnetwork objects in a couple of ways:

# using the nodes as sf objects and the empty geometries
# we need to use force=TRUE otherwise building the network fails
sfn_empty = sfnetwork(nodes_sf, edges_empty, force = TRUE)

# using the nodes as sf objects and the filtered geometries
sfn_filt = sfnetwork(nodes_sf, edges_filt, force = FALSE)
#> Checking if spatial network structure is valid...
#> Spatial network structure is valid

# using the tbl_graph object which contains nodes and edges
# but the edges don't have an explicit geometry
sfn_tg = as_sfnetwork(nodes_tg) 
#> Checking if spatial network structure is valid...
#> Spatial network structure is valid

# using only the nodes as sf objects and letting sfnetworks 
# build the edges
sfn_nodes = as_sfnetwork(nodes_sf)

par(mar = c(1, 1, 1, 1), mfrow = c(2,2))
plot(sfn_empty, main = "Includes EMPTY geoms")
plot(sfn_filt, main = "Excludes EMPTY geoms")  
plot(sfn_tg, main = "Built based on tidygraph with added geom column")
plot(sfn_nodes, main = "Built only from node geom information")

Now the first two networks look quite OK, especially when we compare them to how the Geographic Data Science with Python book renders the network.

But you will see there are some edges missing, which correspond, I assume, to those empty geometries.

We can see that when checking the number of nodes and edges:

# edges
ecount(sfn_empty)
#> [1] 287
ecount(sfn_filt)
#> [1] 202
# nodes
vcount(sfn_empty)
#> [1] 106
vcount(sfn_filt)
#> [1] 106

I am not sure how osmnx builds the network, but with sfnetworks some extra steps are needed to get all the information from the GRAPHML file.

This is one way I solved this, starting from the sfnetwork that includes the EMPTY edge geometries:

# Filter the network to contain edges with a geometry
# and assign to a new network
sfn_explicit_edges = sfn_empty |>
  activate(edges) |> 
  filter(!st_is_empty(geometry))

# Filter the network for EMPTY edge geometry
# drop the geometry column for the edges
# remove isolated nodes 
# and finally convert the edges to spatially
# explicit with a morpher
sfn_empty_edges = sfn_empty |>
  activate(edges) |> 
  filter(st_is_empty(geometry)) |> 
  st_drop_geometry() |> 
  activate("nodes") |> 
  filter(!node_is_isolated()) |> 
  convert(to_spatial_explicit)

# Finally, join both networks
sfn_complete = st_network_join(sfn_empty_edges, sfn_explicit_edges)
vcount(sfn_complete)
#> [1] 106
ecount(sfn_complete)
#> [1] 287

par(mar = c(1, 1, 1, 1), mfrow = c(1,1))
plot(sfn_complete)

After all this exploration, a couple of ideas popped up as to how to handle these cases:

Create an argument on creation that makes EMPTY edge geometries explicit
Create a spatial morpher that directly makes EMPTY edge geometries explicit

@luukvdmeer do you think this is something we could support? Otherwise, keeping this discussion and maybe turning it into a vignette could be worthwhile for these cases?
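To give an idea of what such a helper might do, here is a rough sketch that simply wraps the filter, convert and join steps from this post in a function. `explicitize_empty_edges()` is a hypothetical name, not an existing sfnetworks function, the toy network is made up, and the sketch assumes the sfnetworks version used in this thread and a network that has at least one EMPTY edge.

```r
library(sf)
library(tidygraph)
library(sfnetworks)

# Hypothetical helper, not part of sfnetworks
explicitize_empty_edges <- function(x) {
  # edges that already carry a geometry
  with_geom <- x |>
    activate("edges") |>
    filter(!st_is_empty(geometry))
  # edges with EMPTY geometry: drop the column, remove isolated nodes,
  # and let the morpher draw straight lines between the end nodes
  without_geom <- x |>
    activate("edges") |>
    filter(st_is_empty(geometry)) |>
    st_drop_geometry() |>
    activate("nodes") |>
    filter(!node_is_isolated()) |>
    convert(to_spatial_explicit)
  st_network_join(without_geom, with_geom)
}

# Made-up toy network: three nodes, two edges, the second edge is EMPTY
toy_nodes <- st_sf(
  geometry = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)), st_point(c(2, 0)),
                    crs = 4326)
)
toy_edges <- st_sf(
  from = c(1L, 2L),
  to = c(2L, 3L),
  geometry = st_sfc(st_linestring(rbind(c(0, 0), c(1, 1))), st_linestring(),
                    crs = 4326)
)
toy_net <- sfnetwork(toy_nodes, toy_edges, force = TRUE)
toy_full <- explicitize_empty_edges(toy_net)
```

Whether this belongs in an argument at creation time or in a morpher is exactly the open question above; the sketch only shows that the steps can be bundled.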
