While experimenting with running Melbourne using an official Australian 1 km population grid instead of the GHSL grid (that can be done; I've now verified it works, #213), I observed a bug in the neighbourhood analysis subprocess: points that were not matched in the spatial join were retained, with NA values as their grid ID. As a result, the grid ID column is treated as float64 and cannot be cast to int (or int64) at a later processing stage.
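A minimal illustration of the dtype problem, using a toy pandas Series rather than the project's data (the IDs and variable names here are made up for the example):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the joined grid ID column: the second point matched no grid cell.
grid_id = pd.Series([101, np.nan, 103], name="grid_id")
print(grid_id.dtype)  # the NA value forces the integer IDs into a float dtype

# Casting to an integer dtype fails while NA values are still present.
try:
    grid_id.astype("int64")
    cast_failed = False
except (ValueError, TypeError) as error:
    cast_failed = True
    print(f"Cast failed: {error}")

# Dropping the non-matches first makes the cast safe.
clean_ids = grid_id.dropna().astype("int64")
print(clean_ids.tolist())
```

The same pattern (drop non-matches, then cast) is what the updated function applies to the join result.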
Here is an updated version of spatial_join_index_to_gdf, configured to drop NA values (non-matches) by default:
import geopandas as gpd


def spatial_join_index_to_gdf(
    gdf, join_gdf, join_type='within', dropna=True,
):
    """Append to a geodataframe the named index of another using spatial join.

    Parameters
    ----------
    gdf: GeoDataFrame
    join_gdf: GeoDataFrame
    join_type: str (default 'within')
    dropna: bool (default True)
        Drop rows of gdf that were not matched to any feature of join_gdf.

    Returns
    -------
    GeoDataFrame
    """
    gdf_columns = list(gdf.columns)
    gdf = gpd.sjoin(gdf, join_gdf, how='left', predicate=join_type)
    gdf = gdf[gdf_columns + ['index_right']]
    gdf.columns = gdf_columns + [join_gdf.index.name]
    if dropna:
        # Remove non-matches, then restore the index dtype
        # (NA values would otherwise leave the column as float64)
        gdf = gdf[~gdf[join_gdf.index.name].isna()].copy()
        gdf[join_gdf.index.name] = gdf[join_gdf.index.name].astype(
            join_gdf.index.dtype,
        )
    return gdf
This also removes the explicit 'right_index_name' argument: it doesn't need to be provided, because it can be identified from the data itself (join_gdf.index.name).
While writing this, it occurred to me that an inner join might have done the same thing. In any case, the change above meant the code ran successfully for the Melbourne test case, and I also confirmed that re-running a new analysis for Las Palmas produced the same results as before.
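A rough way to check the inner-join idea on toy data. This sketch uses a plain pandas merge as a stand-in for the spatial join, so it is only an analogy; the frames and column names are hypothetical:

```python
import pandas as pd

# Hypothetical toy data: three sample points, one of which matches no grid cell.
points = pd.DataFrame({"point_id": [1, 2, 3], "cell": [10, 99, 11]})
grid = pd.DataFrame({"cell": [10, 11], "grid_id": [100, 101]})

# Left join, then drop non-matches and restore the integer dtype.
left = points.merge(grid, on="cell", how="left")
left = left.dropna(subset=["grid_id"]).copy()
left["grid_id"] = left["grid_id"].astype(grid["grid_id"].dtype)

# Inner join: non-matches never enter the result, so there is no NA and no upcast.
inner = points.merge(grid, on="cell", how="inner")

# Compare the two results, ignoring the row labels left over from the left join.
same_rows = left.reset_index(drop=True).equals(inner.reset_index(drop=True))
print(same_rows)
```

On this toy case the two approaches agree, which is consistent with the hunch above; the left join with an explicit dropna flag keeps the option of retaining unmatched points when that is wanted.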
While doing the above I also refactored ID filtering into a filter_ids function in setup_sp.py. Here is the code that uses it in _12_neighbourhood_analysis.py:
# IDs of grid cells below the population threshold; .tolist() yields plain
# Python ints, so the list renders cleanly inside the query string
low_pop_grid_ids = grid.query(
    f"pop_est < {population['pop_min_threshold']}",
).index.tolist()
samplePointsData = filter_ids(
    df=samplePointsData,
    query=f'grid_id not in {low_pop_grid_ids}',
    message='Restrict sample points to those not located in grids with a population below '
    f"the minimum threshold value ({population['pop_min_threshold']})...",
)
node_ids = gdf_nodes_simple.index.tolist()
samplePointsData = filter_ids(
    df=samplePointsData,
    query=f'n1 in {node_ids} and n2 in {node_ids}',
    message='Restrict sample points to those with two associated sample nodes...',
)
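To show what such a query does on a small example: the real filter_ids lives in setup_sp.py and is not reproduced here, so the version below is a hypothetical stand-in that only assumes the df, query, and message arguments seen in the calls above. The toy frames and threshold are also made up.

```python
import pandas as pd


def filter_ids(df, query, message=""):
    """Hypothetical stand-in: print the message, apply the query, report rows removed."""
    print(message)
    before = len(df)
    df = df.query(query).copy()
    print(f"  {before - len(df)} of {before} rows removed")
    return df


# Toy grid indexed by grid_id, and sample points that reference those IDs.
grid = pd.DataFrame(
    {"pop_est": [5, 250, 40]}, index=pd.Index([1, 2, 3], name="grid_id"),
)
sample_points = pd.DataFrame(
    {"grid_id": [1, 2, 2, 3]}, index=pd.Index([10, 11, 12, 13], name="point_id"),
)
pop_min_threshold = 50  # assumed value for illustration

# .tolist() gives plain Python ints, which format cleanly into the query string.
low_pop_ids = grid.query(f"pop_est < {pop_min_threshold}").index.tolist()
kept = filter_ids(
    df=sample_points,
    query=f"grid_id not in {low_pop_ids}",
    message="Restrict sample points to grids at or above the population threshold...",
)
print(kept.index.tolist())
```

Only the points in grid cell 2 survive here, since cells 1 and 3 fall below the assumed threshold.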
I'll do a commit and pull request referencing this issue with the above fixes shortly.
For reference, here is the function as currently implemented (before the update above):
https://github.com/global-healthy-liveable-cities/global-indicators/blob/7c7b974b2c0b47dba20646c3d9c0e77a5b0b0b93/process/subprocesses/setup_sp.py#L19-L40
The ID filtering refactor replaced the following code:
https://github.com/global-healthy-liveable-cities/global-indicators/blob/7c7b974b2c0b47dba20646c3d9c0e77a5b0b0b93/process/subprocesses/_12_neighbourhood_analysis.py#L267-L297
with the filter_ids function in setup_sp.py.