diff --git a/lectures/w05_dataarch.qmd b/lectures/w05_dataarch.qmd
new file mode 100644
index 0000000..f696b13
--- /dev/null
+++ b/lectures/w05_dataarch.qmd
@@ -0,0 +1,214 @@
+---
+title: "Geographic Data Science"
+subtitle: "Point Patterns"
+author: "Elisabetta Pietrostefani & Carmen Cabrera-Arnau"
+format:
+  revealjs:
+    navigation-mode: grid
+align-items: center;
+---
+
+# The *point* of points
+
+# Points like polygons
+
+- Points *can* represent "fixed" entities
+- In this case, points are qualitatively similar to polygons/lines
+- The goal here is, taking location as fixed, to model other aspects of the data
+
+# Points like polygons
+
+Examples:
+
+- Cities (in most cases)
+- Buildings
+- Polygons represented as their centroid
+- ...
+
+# When points are not polygons
+
+Point data are not only a different geometry than polygons or lines...
+... Points can also represent a fundamentally different way to approach spatial analysis
+
+# Points unlike polygons
+
+# A few examples
+
+#
+
+
+ +centered image + +
+ +# + +
+ +centered image + +
+ +# + +
+ +centered image + +
+
+# Point patterns
+
+# Point patterns
+
+Distribution of **points** over a portion of **space**.
+
+The assumption is that a point could occur anywhere in that space, but only occurs at specific locations.
+
+- **Unmarked**: locations only
+- **Marked**: values attached to each point
+
+# Point Pattern Analysis
+
+Describe, characterize, and explain point patterns, focusing on their **generating process**
+
+- Visual exploration
+- Clustering properties and clusters
+- Statistical modeling of the underlying processes
+
+# Visualization of Point Patterns
+
+# Visualization of PPs
+
+Four routes (today):
+
+- One-to-one mapping -- "Scatter plot"
+- Aggregate -- "Histogram"
+- Smooth -- KDE
+- Smooth -- Interpolation
+
+# One-to-one
+
+- Intuitive
+- Effective in small datasets
+- Less useful as the dataset grows, eventually becoming unreadable
+
+# One-to-one
+
+
+ +centered image + +
+
+# Aggregation
+
+# Points meet polygons
+
+- Use polygon boundaries and count points per area \[Insert your skills for choropleth mapping here!!!\]
+- But the polygons need to *"make sense"* (their delineation needs to relate to the point-generating process)
+
+#
+
+
+ +    + +
+
+# Hex-binning
+
+If no polygon boundary seems like a good candidate for aggregation... draw a hexagonal (or square) tessellation!!!
+
+Hexagons...
+
+- Are regular
+- Exhaust the space (unlike circles)
+- Have many sides (minimize boundary problems)
+
+#
+
+
+ +       + +
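A minimal sketch of aggregation over a regular tessellation. It assumes a square grid rather than hexagons, only because the cell index is simpler to compute; the function name `grid_counts` and the sample coordinates are made up for illustration.

```python
import math
from collections import Counter


def grid_counts(points, cell_size):
    """Count points per square cell of side `cell_size`.

    Each cell is identified by its integer (column, row) index.
    """
    counts = Counter()
    for x, y in points:
        # floor() keeps negative coordinates in the correct cell
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        counts[cell] += 1
    return counts


# Hypothetical point locations, in arbitrary map units
sample_points = [(0.2, 0.3), (0.8, 0.9), (1.5, 0.1), (-0.5, 0.5)]
cell_totals = grid_counts(sample_points, cell_size=1.0)
print(dict(cell_totals))  # {(0, 0): 2, (1, 0): 1, (-1, 0): 1}
```

Changing `cell_size` changes the counts and the picture they give, which is the aggregation sensitivity the next slide warns about.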
+
+# But
+
+- (Arbitrary) aggregation may induce the modifiable areal unit problem (MAUP)
+- Points usually represent events that affect only part of the population, and hence are best considered as rates
+
+# Kernel Density Estimation (KDE)
+
+# KDE
+
+Estimate the **(continuous)** observed distribution of a variable
+
+- Probability of finding an observation at a given point
+- "Continuous histogram"
+- Solves (much of) the MAUP, but not the underlying population issue
+
+# Bivariate (spatial) KDE
+
+Probability of finding observations at a given point in space
+
+- **Bivariate** version: distribution of pairs of values
+- In **space**: values are coordinates (XY), locations
+- Continuous "version" of a choropleth
+
+#
+
+
+ +centered image + +
+ +# + +
+ +    + +
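A minimal sketch of a bivariate KDE evaluated at a single location. It assumes an isotropic Gaussian kernel with one hand-picked bandwidth; the slides do not specify a kernel or bandwidth, and the name `kde_2d` and the coordinates below are illustrative only.

```python
import math


def kde_2d(x, y, points, bandwidth):
    """Gaussian kernel density estimate at (x, y).

    `points` is a list of (x, y) locations and `bandwidth` is the
    standard deviation of the kernel, in the same units as the coordinates.
    """
    # Normalising constant so the estimated surface integrates to 1
    norm = 1.0 / (len(points) * 2.0 * math.pi * bandwidth ** 2)
    total = 0.0
    for px, py in points:
        squared_distance = (x - px) ** 2 + (y - py) ** 2
        total += math.exp(-squared_distance / (2.0 * bandwidth ** 2))
    return norm * total


# Hypothetical events: three close together and one far away
events = [(0.0, 0.0), (0.1, 0.2), (-0.2, 0.1), (5.0, 5.0)]
density_near_cluster = kde_2d(0.0, 0.0, events, bandwidth=0.5)
density_in_gap = kde_2d(2.5, 2.5, events, bandwidth=0.5)
print(density_near_cluster > density_in_gap)  # True
```

Evaluating the function over a fine grid of (x, y) locations gives the continuous surface that a KDE map displays. A larger bandwidth gives a smoother surface.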
+
+# Interpolation
+
+- Estimating the values of spatially continuous variables at locations where they **have not** been observed, based on observations made elsewhere.
+- **Geostatistics** is concerned with the modelling, prediction and simulation of spatially continuous phenomena.
+
+# Inverse Distance Weighting (IDW)
+
+- We observe a property of a phenomenon $Z(s)$ at a **limited** number of sample locations, and are interested in the property value at **all** locations.
+- We therefore have to predict it for unobserved locations.
+
+# Inverse Distance Weighting (IDW)
+
+If we were predicting prices
+
+$$\text{Price}_i = \sum_{j=1}^{N} w_{ij} \, \text{Price}_j + \epsilon_i$$
+
+- with $w_{ij} = \left(\frac{1}{d_{ij}}\right)^2$ for all $i$ and $j \neq i$
+- $d_{ij}$ is the distance between $i$ and $j$.
+
+#
+
+
+ +centered image + +
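A minimal sketch of an inverse-distance-weighted prediction for one unobserved location. Two things here are assumptions rather than part of the slides: the weights are divided by their sum so that they add up to one, and the optional `k` argument restricts the prediction to the nearest observations. The function name `idw_predict` and the prices are made up for illustration.

```python
import math


def idw_predict(target, samples, idp=2.0, k=None):
    """Predict a value at `target` = (x, y) by inverse distance weighting.

    `samples` is a list of (x, y, value) observations, `idp` is the
    inverse distance power, and `k` optionally keeps only the k nearest
    observations.
    """
    distances = []
    for x, y, value in samples:
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return value  # the target coincides with an observation
        distances.append((d, value))

    # Sort by distance so the first k entries are the nearest neighbours
    distances.sort(key=lambda pair: pair[0])
    if k is not None:
        distances = distances[:k]

    weights = [(1.0 / d) ** idp for d, _ in distances]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, distances))
    return weighted_sum / sum(weights)


# Hypothetical observed prices at two locations on a line
observed = [(0.0, 0.0, 100.0), (3.0, 0.0, 200.0)]
print(idw_predict((1.0, 0.0), observed, idp=2))  # 120.0
```

With `idp=1` the same call gives a prediction closer to the plain average of the two prices, because a lower power lets distant observations keep more influence.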
+
+# Parameters
+
+- **Variable**: for example, price
+- **Nearest Neighbours**: the number of nearest observations to use
+- **idp**: the inverse distance power (set to 2 here)
+
+A super useful link [here](https://gisgeography.com/inverse-distance-weighting-idw-interpolation/)
+
+# Parameters
+
+idp = 1
+idp = 2
+
+# Density-Based Spatial Clustering of Applications with Noise, or DBSCAN
+
+# Questions
+
+#
+
+Creative Commons License
+[Geographic Data Science]{xmlns:dct="http://purl.org/dc/terms/" property="dct:title"} by Elisabetta Pietrostefani is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.