spatial join in neighbourhood analysis code can retain points with no grid ID, causing type issues #214

carlhiggs · 2023-03-13T07:22:20Z

While experimenting with running Melbourne using an official Australian 1 km population grid instead of the GHSL grid (this can be done; I've now verified it works, see #213), I observed a bug in the neighbourhood analysis subprocess: sample points that did not join to any grid cell were retained, with the grid ID set to NA. As a result, the grid ID column is treated as float64, which in turn can't be cast to int (or int64) at a later processing stage.
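The dtype problem described above can be reproduced in isolation. The following is a minimal sketch in plain pandas, using made-up grid IDs rather than the project's data; the series names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical grid IDs for three sample points (illustrative values only).
matched = pd.Series([101, 102, 103], name='grid_id')
# Same points, but the second one found no grid cell in the join.
unmatched = pd.Series([101, np.nan, 103], name='grid_id')

print(matched.dtype)    # int64
print(unmatched.dtype)  # float64

# Casting a column that still contains NaN to an integer dtype raises.
try:
    unmatched.astype('int64')
    cast_failed = False
except (ValueError, TypeError) as err:
    cast_failed = True
    print(f'cast failed: {err}')

# Dropping the missing values first makes the cast safe.
cleaned = unmatched.dropna().astype('int64')
print(cleaned.tolist())
```

A single missing value is enough to promote the whole column to float64, which is why dropping the unmatched points before casting avoids the failure.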

Here is the function as currently implemented:

https://github.com/global-healthy-liveable-cities/global-indicators/blob/7c7b974b2c0b47dba20646c3d9c0e77a5b0b0b93/process/subprocesses/setup_sp.py#L19-L40

Here is an updated version, which drops points with no match (dropna) by default:

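import geopandas as gpd


def spatial_join_index_to_gdf(
    gdf, join_gdf, join_type='within', dropna=True
):
    """Append to a geodataframe the named index of another using spatial join.

    Parameters
    ----------
    gdf: GeoDataFrame
    join_gdf: GeoDataFrame
    join_type: str (default 'within')
    dropna: bool (default True)
        Drop records of gdf that have no spatial match in join_gdf.

    Returns
    -------
    GeoDataFrame
    """
    index_name = join_gdf.index.name
    gdf_columns = list(gdf.columns)
    gdf = gpd.sjoin(gdf, join_gdf, how='left', predicate=join_type)
    gdf = gdf[gdf_columns + ['index_right']]
    gdf.columns = gdf_columns + [index_name]
    if dropna:
        # Discard records that were not matched to any feature of join_gdf
        gdf = gdf[~gdf[index_name].isna()].copy()
    # Restore the index dtype (NA forces float64); this is skipped if
    # unmatched records remain, as NA cannot be cast to an integer dtype
    if not gdf[index_name].isna().any():
        gdf[index_name] = gdf[index_name].astype(join_gdf.index.dtype)
    return gdf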

This also gets rid of the explicit 'right_index_name' argument -- as that doesn't need to be provided, it is identifiable from the data itself.

While writing this, it occurred to me that an inner join might have done the same thing -- but in any case, with the above change the code ran successfully for the Melbourne test case, and I also confirmed that re-running the analysis for Las Palmas produced the same results as before.
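To illustrate the left-versus-inner join point, here is a small sketch that uses an ordinary pandas attribute merge as a stand-in for the spatial join. The data frames and the `cell` key are hypothetical, and whether `gpd.sjoin(..., how='inner')` preserves the integer dtype in the same way is an assumption that has not been checked here.

```python
import pandas as pd

# Hypothetical sample points, each tagged with the key of the cell it falls in.
points = pd.DataFrame({'point_id': [1, 2, 3], 'cell': ['a', 'b', 'z']})
# Hypothetical grid lookup; cell 'z' does not exist in the grid.
grid = pd.DataFrame({'cell': ['a', 'b'], 'grid_id': [10, 20]})

# A left join keeps the unmatched point and fills its grid_id with NaN.
left = points.merge(grid, on='cell', how='left')
# An inner join keeps only the points that found a match.
inner = points.merge(grid, on='cell', how='inner')

print(left['grid_id'].dtype, len(left))
print(inner['grid_id'].dtype, len(inner))
```

In this stand-in, the inner join leaves no missing values, so the integer dtype survives without an explicit drop and cast.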

While doing the above, I also refactored the ID filtering.

This replaced the following code:

https://github.com/global-healthy-liveable-cities/global-indicators/blob/7c7b974b2c0b47dba20646c3d9c0e77a5b0b0b93/process/subprocesses/_12_neighbourhood_analysis.py#L267-L297

with this function in setup_sp.py

def filter_ids(df, query, message):
    """Filter a dataframe using a query, reporting how many records were discarded."""
    print(message)
    pre_discard = len(df)
    df = df.query(query)
    post_discard = len(df)
    print(
        f'  {pre_discard - post_discard} sample points discarded, '
        f'leaving {post_discard} remaining.',
    )
    return df

and this code that uses it in _12_neighbourhood_analysis.py

    # tolist() yields plain Python ints, which interpolate cleanly into the query string
    low_pop_ids = grid.query(
        f'pop_est < {population["pop_min_threshold"]}'
    ).index.tolist()
    node_ids = gdf_nodes_simple.index.tolist()
    samplePointsData = filter_ids(
        df=samplePointsData,
        query=f'grid_id not in {low_pop_ids}',
        message='Restrict sample points to those not located in grids with a population below '
        f"the minimum threshold value ({population['pop_min_threshold']})...",
    )
    samplePointsData = filter_ids(
        df=samplePointsData,
        query=f'n1 in {node_ids} and n2 in {node_ids}',
        message='Restrict sample points to those with two associated sample nodes...',
    )
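As a quick check of how the two filters behave, here is a self-contained sketch on toy data. The frames `sample_points` and `grid`, the threshold, and the node IDs are all made up for illustration. The sketch uses `Index.tolist()` so that plain Python integers end up in the query string; under NumPy 2 or later, NumPy scalars print as `np.int64(2)`, which is unlikely to evaluate inside a query string.

```python
import pandas as pd


def filter_ids(df, query, message):
    """Filter a dataframe using a query, reporting how many records were discarded."""
    print(message)
    pre_discard = len(df)
    df = df.query(query)
    post_discard = len(df)
    print(
        f'  {pre_discard - post_discard} sample points discarded, '
        f'leaving {post_discard} remaining.',
    )
    return df


# Hypothetical sample points with their grid cell and two nearest nodes.
sample_points = pd.DataFrame(
    {'grid_id': [1, 1, 2, 3], 'n1': [10, 11, 12, 13], 'n2': [11, 12, 13, 99]},
    index=pd.Index([100, 101, 102, 103], name='point_id'),
)
# Hypothetical population grid; cell 2 falls below the threshold.
grid = pd.DataFrame(
    {'pop_est': [50, 2, 80]}, index=pd.Index([1, 2, 3], name='grid_id'),
)
population = {'pop_min_threshold': 5}
node_ids = [10, 11, 12, 13]  # node 99 is deliberately absent

low_pop_ids = grid.query(
    f'pop_est < {population["pop_min_threshold"]}'
).index.tolist()

sample_points = filter_ids(
    df=sample_points,
    query=f'grid_id not in {low_pop_ids}',
    message='Dropping points in low-population cells...',
)
sample_points = filter_ids(
    df=sample_points,
    query=f'n1 in {node_ids} and n2 in {node_ids}',
    message='Dropping points without two known nodes...',
)
print(sample_points.index.tolist())
```

The first call removes the point in the low-population cell, and the second removes the point whose `n2` node is not in the node list.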

I'll do a commit and pull request referencing this issue with the above fixes shortly.

carlhiggs added and removed the bug label Mar 13, 2023

carlhiggs added a commit that referenced this issue Mar 14, 2023

updated 12_neighbourhood_analysis.py and setup_sp.py as per #214

0bf3d12

carlhiggs mentioned this issue Mar 14, 2023

Address an edge case issue where sample points aren't associated with grid ids, causing a type error and failure to run #215

Merged

carlhiggs closed this as completed Mar 16, 2023
