-
Notifications
You must be signed in to change notification settings - Fork 1
/
589Project.Rmd
671 lines (496 loc) · 29.7 KB
/
589Project.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
---
title: "Red Foxes in BC: an Analysis of the Spatial Point Process"
author: "Nowshaba Durrani, Ricky Heinrich, Viji Rajagopalan"
date: "2023-04-30"
output:
pdf_document:
extra_dependencies: ["float"]
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = F,fig.width=8, fig.height=6, warning = F, cache = T, message = F,fig.pos = "H", out.extra = "", fig.align = 'center')
```
```{r message=FALSE, include=F}
library(rgbif) #allows searching and retrieving data from GBIF
library(ggplot2) #use ggplot2 to add layer for visualization
library(sp) #Standardized Support for Spatial Vector Data
library(sf)
library(spatstat)
library(maptools)
```
```{r}
#occ_count() # occurance count for all the species in GBIF (Global Biodiversity Information Facility) - rgbif
redFox <- name_backbone(name="Vulpes vulpes")
redFoxList <- occ_data(taxonKey = redFox$speciesKey, hasCoordinate=TRUE, stateProvince='British Columbia', limit=2000)
mydata <- redFoxList$data
n_row <- nrow(redFoxList$data)
n_col <- ncol(redFoxList$data)
#n_row
#n_col
```
```{r}
load("BC_Covariates.Rda")
# Create a spatial points data frame from the longitude and latitude columns
coordinates <- mydata[,c("decimalLongitude", "decimalLatitude")]
dat.sp <- SpatialPointsDataFrame(c(mydata[,c('decimalLongitude','decimalLatitude')]), data = mydata)
# Set the current CRS
proj4string(dat.sp)<- CRS("+proj=longlat +datum=WGS84")
# Define the new CRS you want to transform to
new_crs <- CRS("+proj=aea +lat_0=45 +lon_0=-126 +lat_1=50 +lat_2=58.5 +x_0=1000000
+y_0=0 +datum=NAD83 +units=m +no_defs")
# Transform the data to the new CRS
data.sp_trans <- spTransform(dat.sp, new_crs)
#data_transformed
#data.sp_trans
#plot(data.sp_trans, main = "Locations in BC", cex = 0.8, col ="blue")
```
# Introduction
Red Foxes (Scientific Name - Vulpes Vulpes) are a widespread animal in Canada, and are present in British Columbia. In the Global Biodiversity Information Facility (GBIF) database, there are approximately 640,000+ geo-references records for this species around the world, and `r n_row` number of entries are classified as from British Columbia. According to the Canadian Wildlife Federation (CWF), farmers used to regard red foxes as pests, as they would steal chickens. This might suggest that they hang around farms and human activity to take advantage of food sources. As well, the CWF says that they "prefer open fields", which suggests that they are more likely to be found in areas with less forest cover. We are interested in finding out if the occurrence of red fox data agrees with these claims, and if there are more factors that affect their presence. Treating the occurrence of red foxes as a point process, we will investigate its characteristics and attempt to model its intensity.
```{r warning=FALSE, fig.height=5, fig.cap = "Occurence of Red Foxes in BC"}
parks_ppp <- ppp(x = data.sp_trans@coords[,1], # X coordinates
y = data.sp_trans@coords[,2], # Y coordinates
window = as.owin( DATA[["Window"]]),# Observation window
)
col_pal <- c("maroon")
# FIGURE 1
plot(parks_ppp,
main = "",
cex = 0.9,
col ="white",
border = 3,
cols = col_pal,
par(cex.main = 1.6))
```
# Methods
The data comes from the GBIF databases. We used the package `rgbif` to access the 'Vulpes Vulpes' data from R directly, sorting by instances occurring in BC. We've extracted the longitude and latitude data from this, and converted it appropriately using the `sp` package.
Our second source of data contained the BC Window object, as well as possible covariate data: `Elevation`, `Forest Cover`,`HFI` and `Distance to Water`.
We mainly used the `spatstat` package functions to conduct our analyses. Other supporting libraries for spatial data used are `maptools`, `sp` and `sf`. First we used `ppp` function to build a point pattern process object with the converted coordinates of the Red Fox locations from the GBIF data and the window from our second data source.
To conduct first moment analysis, we used functions from the aforementioned `spatstat` package. We did a quadrat test as well as hotspot analysis to gain insight into the homogeneity assumption of the point process. For second moment analysis, we looked into Ripley's K-function and pair correlation function using functions from `spatstat`. This provides us with insight into possible clustering tendencies of the point process.
Next we looked into the relationship of the intensity with each covariate. For smoothing estimate of the 4 covariates, `rhohat` function was used which helped us to see the relationship between red fox occurrence and each of the covariates. We used correlation function to study the correlation or collinearity of the covariates. Simple models with each of the covariates is first fit and based on the model results and the relationship of covariates with red fox data, we proceeded to fit models with a combination of covariates.
Fitted models were then visualized using `plot` function. We investigated patters in the residuals using `diagnose` function and the partial residuals with each of the covariate is studied using `parres` function to understand the fit of each covariate in the model.
The `splines` package was used to build a GAMs model using functions `ppm` from spatstat package along with `bs` function that generates a basis matrix for representing the family of piecewise polynomials. The purpose was to build a non-parametric model for the data. This model was also diagnosed using techniques mentioned above.
We then assessed the models with `quadrat.test` and `anova` functions to understand any significant deviates from given occurrences and compare and contrast the models. Using `effectfun()`, we also looked at the influence of individual covariates. This computes the intensity of a fitted point process model as a function of one of its covariates. Using AIC as model metric and considering model parsimony and simplicity, we selected the final model for the red fox data.
# Exploratory Analysis
## First Moment Analysis
We start with investigating whether the occurrence of red foxes in BC seems homogeneous, as it will inform our steps to define the intensity. We have conducted a quadrat test of homogeneity with both 5 x 5 and 10 x 10 quadrats. These quadrats are shown in Figure 2, where we can visually tell that the intensity in each quadrats are not the same. The quadrat test for both the 5 x 5 and the 10 x 10 quadrats provide a p-value of 2.2e-16, confirming that the varied intensities are not due to chance alone, but rather due to an inhomogeneous point process.
```{r}
#Split into a 5 by 5 quadrat and count points
Q5 <- quadratcount(parks_ppp,
nx = 5,
ny = 5)
Q10 <- quadratcount(parks_ppp,
nx = 10,
ny = 10)
```
```{r, fig.cap = "Quadrat counts of Red Fox occurences, left 5x5, right 10x10", eval=F}
# FIGURE 2
par(mfrow=c(1,2), bty = "n")
plot(parks_ppp,
pch = 12,
cex = 0.5,
cols = "#046C9A",
main = "")
plot(Q5, cex = 1, col = "red", add = T)
plot(parks_ppp,
pch = 12,
cex = 0.5,
cols = "#046C9A",
main = "")
plot(Q10, cex = 1, col = "red", add = T)
```
```{r, fig.width=8, fig.height=4,fig.cap = "Intensity of Quadrat counts of Red Fox occurences, left 5x5, right 10x10"}
#Plot the output
par(mfrow=c(1,2))
plot(intensity(Q5, image = T),
main = "")
plot(parks_ppp,
pch = 12,
cex = 0.5,
cols = "#046C9A", add = T)
plot(intensity(Q10, image = T), main = "" )
plot(parks_ppp,
pch = 12,
cex = 0.5,
cols = "#046C9A",
add = T)
```
```{r warning=FALSE, eval=F}
#Quadrat test of homogeneity
quadrat.test(Q10)
```
As the next step, we investigate for any hot spots in the occurrences of red foxes. In Figure 3, we can see that hotspots appear scattered and of moderately high density. It seems like these occurrences are more inland.
```{r fig.height=5, fig.cap = "Hotspot of Red Foxes"}
# Estimate R
R <- bw.ppl(parks_ppp)
#Calculate test statistic
LR <- scanLRTS(parks_ppp, r = R)
#Plot the output
plot(LR, main = "")
plot(parks_ppp[["window"]], border = "white", add = T)
#Compute local p-values
pvals <- eval.im(pchisq(LR,
df = 1,
lower.tail = FALSE))
#Plot the output
#plot(pvals, main = "Local p-values")
#plot(parks_ppp[["window"]], add = T)
```
## 2nd Moment Analysis
Ripley's K-function provides information on whether there are significant deviations from independence between points. Taking into account that the intensity of red fox occurrences appear inhomogeneous, we can see in Figure 4 that there is some evidence of clustering up to a certain distance, as the black line, indicating the observed data, is separate from the 95% confidence bands of the values expected with no clustering.
```{r fig.cap="Ripley's K function with border correction assuming homogeneity", eval=F}
E_fox_homo <- envelope(parks_ppp,
Kest,
correction="border",
rank = 1,
nsim = 19, # aka alpha of 0.05
fix.n = T,
verbose = F)
plot(E_fox_homo, lwd = 2, main = "")
# Assuming homogeneity in the point process, we see that the actual occurence lines appear clustered, as the black line does not overlap with the expected red line confidence intervals (significance level of 0.05).
```
```{r fig.cap="Ripley's K function with border correction assuming inhomogeneity"}
E_fox <- envelope(parks_ppp,
Kinhom,
correction="border",
rank = 1,
nsim = 19, # aka alpha of 0.05
fix.n = T,
verbose = F)
plot(E_fox, lwd = 2, main = "")
```
To get a sense of the distances for which clustering occurs, we used the pair correlation function.
```{r fig.cap="Pair correlation function assuming inhomogeneity"}
# Estimate the g function
#pcf_fox <- pcfinhom(parks_ppp) # assumes inhomogeneity
# estimate a strickly pos density
lambda_pos <- density(parks_ppp,sigma=bw.ppl,positive=TRUE)
# estimating with bootstraped ci
pcf_fox <- envelope(parks_ppp, pcfinhom, simulate = expression(rpoispp(lambda_pos)),rank = 1, nsim = 19,verbose = F)
# Default plot method
#plot(pcf_fox, lwd = 2)
# visualise the results
plot(pcf_fox, main = "")
#plot(pcf_fox,theo ~ r,ylim = c(0,20),main = "",col = "grey70",lwd = 2,lty = "dashed")
#plot(pcf_fox,iso ~ r, col = c("#046C9A"),lwd = 2, add = T)
```
Figure 5 shows evidence for clustering at distances smaller than around 23 000m, or 23km but after that the observed values are not significantly different that those expected from a random spatial process.
## Relationship with Covariates
Our data includes 4 covariates which we are exploring: the `Elevation`, `Forest Cover`, human footprint inventory (`HFI`), and `Distance to Water`. Given our research questions, we expect `HFI` and `Forest Cover` to have a relationship with red fox occurrences, however we also investigate the other two covariates.
```{r fig.cap="Effect of `HFI` on intensity of red foxes"}
# Smoothing Estimate of the 4 Covariates Transformation
rho_HFI <- rhohat(parks_ppp, DATA$HFI)
par(mfrow=c(1,2))
#Estimate Rho for HFI
plot(rho_HFI,
main = "",
xlab = "HFI")
plot(rho_HFI,
main = "",
xlab = "HFI", xlim = c(0,0.8))
```
```{r fig.cap= "Effect of `Forest Cover`, `Elevation`, and `Distance to Water` on intensity of red foxes"}
#Estimate Rho for Forest Cover
rho_forest <- rhohat(parks_ppp, DATA$Forest)
#Estimate Rho for elevation
rho_elev <- rhohat(parks_ppp, DATA$Elevation)
#Estimate Rho for water distance
rho_water <- rhohat(parks_ppp, DATA$Dist_Water)
par(mfrow=c(1,3))
plot(rho_forest,
main = "",
xlab = "Forest cover")
#plot(rho_elev,main = "",xlab = "Elevation")
plot(rho_elev,
main = "",
xlab = "Elevation", xlim = c(0,max(DATA$Elevation)))
plot(rho_water,
main = "",
xlab = "Distance from Water")
```
In the left plot of Figure 6, we could be fooled into thinking that there is no relationship between `HFI` and intensity of red foxes up to around `HFI` = 0.4, until which it seems like an exponential relationship. However, zooming in from `HFI` 0 to 0.8, we see that the confidence bands don't intersect at all with the red line, which is the expected value given no relationship. This relationship appears non-linear and possibly exponential, where the greatest intensity of observed red foxes occurs at high `HFI`s. This relationship was expected, as our dataset is not exhaustive but rather is crowdsourced, and naturally foxes are more likely to be observed by humans in spaces with higher `HFI`s.
In Figure 7, we see that there seems to be non-linear relationship between `Forest Cover` and number of red foxes observed. The observance increases with increase in `Forest Cover` at intermediate coverage and then it decreases. We also see that there is non-linear relationship with `Elevation`. The relationship appears to be non-linear as the graph is showing different results for different `Elevation` and we cannot see any type of specific pattern from the same. In case of `Distance to Water`, we don't observe a significant deviation in observed red foxes than expected by chance, indicating that it is not a useful covariate to model.
### Fit models for the covariets
```{r, warning=F}
#Fit the PPP model for HFI
fitHFI <- ppm(parks_ppp ~ HFI,data=DATA)
fitHFI
fitHFIexp <- ppm(parks_ppp~HFI + exp(HFI), data=DATA)
fitHFIexp
#Fit the PPP model for forest
fit <- ppm(parks_ppp ~ Forest + I(Forest^2),data=DATA)
fit
#Fit the PPP model for Elevation
fitElev <- ppm(parks_ppp ~ Elevation, data=DATA)
fitElev
#Fit the PPP model for Distance to water
fitWater <- ppm(parks_ppp ~ Dist_Water, data=DATA)
fitWater
```
We have fitted 6 models and we came to observe that `HFI`, exp(HFI), `Forest Cover`, I(Forest^2) and `Elevation` seems to be highly significant however `Distance to Water` seems to be in-significant for the occurrence of red foxes in the BC area. When we check the AIC values and it is seen that `HFI` and `HFI`(exp) has lower values so we can consider these models to be a better fit.
```{r}
HFIAIC <- AIC(fitHFI)
HFIexpAIC <- AIC(fitHFIexp)
ForestAIC <- AIC(fit)
ElevAIC <- AIC(fitElev)
WaterAIC <- AIC(fitWater)
# Create table
table <- data.frame(Model = c("HFI", "HFI(Exp)", "Forest Cover", "Elevation", "Dist to Water"), AIC = c(HFIAIC, HFIexpAIC, ForestAIC, ElevAIC, WaterAIC))
# Print table
print(table)
```
```{r eval=FALSE}
## Plot the fitted models
par(mfrow=c(2,2))
#Plot the model predictions for HFI(Exp) & overlay the red fox locations
plot(fitHFIexp, se = F, superimpose = F, main="Fitted Model for HFI(Exp)")
plot(parks_ppp,
pch = 16,
cex = 0.6,
cols = "white",
add = TRUE)
#Plot the model predictions for Forest Cover & overlay the red fox locations
plot(fit, se = F, superimpose = F, main="Fitted Model for Forest Cover")
plot(parks_ppp,
pch = 16,
cex = 0.6,
cols = "white",
add = TRUE)
#Plot the model predictions for Elevation & overlay the red fox locations
plot(fitElev, se = F, superimpose = F, main="Fitted Model for Elevation")
plot(parks_ppp,
pch = 16,
cex = 0.6,
cols = "white",
add = TRUE)
#Plot the model predictions for Distance to Water & overlay the red fox locations
plot(fitWater, se = FALSE, superimpose = FALSE, main="Fitted Model for Distance to Water)")
plot(parks_ppp,
pch = 16,
cex = 0.6,
cols = "white",
add = TRUE)
#Does not look like a good fit for all points in forest and also not easy to interpret. Here Elevation is significant but Distance to Water is not significant.
```
```{r}
### 2. Checking correlation
cor.im(DATA$Forest, DATA$HFI, DATA$Elevation, DATA$Dist_Water, use = "complete.obs")
```
We see that no covariate is strongly correlated with another, and so we can move on without taking so into account and treating the covariates as independent.
## Model Fitting
From our assessment on individual covariates, we see that there is a non-linear relationship between the covariates and the red fox point data. Based on this knowledge, we move forward to fit the first base model with covariates that are showing strong trends with red fox data and also of research interest. For this purpose, the selected covariates are 1. `Elevation` 2. `HFI` and 3. `Forest Cover`.
We built the model with the linear and quadratic terms of covariates `Elevation`, `HFI` and `Forest Cover`. We call this model as `Model 1`.
```{r fig.cap="Model 1 Details"}
#Fit the PPP model
fit_red <- ppm(parks_ppp ~ Forest + I(Forest^2) + HFI + I(HFI^2)+ Elevation + I(Elevation^2), data = DATA)
fit_red
```
The results indicate that covariates `HFI` and `Elevation` show strong significance and explain occurrence of red fox data in both linear and quadratic terms. Though `Forest Cover` showed non-linear relationship in second moment analysis, in this combined model with other covariates, its not significant.
As our next step, we drop the covariate `Forest Cover` from the model and build the next model with `HFI` and `Elevation` with linear and quadratic terms. We call this as `Model 2`.
```{r, fig.cap="Model 2"}
#Fit the PPP model
fit_red2 <- ppm(parks_ppp ~ HFI + I(HFI^2)+ Elevation + I(Elevation^2), data = DATA)
fit_red2
```
Based on the model results, both `HFI` and `Elevation` show strong significance and explain occurrence of red fox data in both linear and quadratic terms. For understanding the fit of the model, we plot the model and red fox locations together by overlaying the observed data on the predicted values from the model.
```{r fig.cap="Red fox occurences vs predicted model fit"}
#Plot the model predictions
plot(fit_red2,
se = FALSE,
superimpose = FALSE,log=TRUE,n=500, main="")
#Overlay the red fox locations
plot(parks_ppp, pch = 16, cex = 1, col = "white",use.marks = F, add = T)
plot(parks_ppp, pch = 16, cex = 0.8, col = "black",use.marks = F, add = T)
```
We see in Figure 8 that the red fox locations are captured well with yellow color in the background which is an indicator of high intensity area as predicted by the model. As the model seem to capture the red fox locations reasonably well, we then proceed with other diagnostics to validate the model and compare with other models to select the best.
First we use the `diagnose` function to validate `Model 2` based on residuals. To the top right of Figure 9, we can see the residual plot of the cumulative sum of raw residuals against y coordinates. The residuals are showing a good fit in the intermediate y coordinates range and the negative high residuals for high and low coordinates indicates the model predictions are higher than actuals. The the bottom left is the residual plot of the cumulative sum of raw residuals against x coordinates. The model overall has a good fit as seen in the plot with residuals mostly within the dotted band and the prediction is low when compared to actual in the higher x coordinate zones.
Overall, the model is providing a good fit in the intermediate x and y coordinate areas and has tendency to deviate in the high and low coordinate areas of BC.
```{r fig.cap="Plotted residuals of Model 2"}
#Calculate the residuals
#res <- residuals(fit_red)
#Visualise
#plot(res, cols = "transparent",)
#Due to error with residuals function for this data, diagnose.ppm is used instead
diagnose.ppm(fit_red2)
```
```{r}
#Run the quadrat test
quadrat.test(fit_red2, nx = 5, ny = 5)
```
Additionally, we do a quadrat test on the model which uses a chi-squared test. We get a small p-value indicating the model has significant deviations from the observed data. So we can conclude that this model is useful one however it is not a perfect fit.
We move on to assess this model deeply against a few other models and also investigate further to decide if this is the best choice among the models evaluated.
## Model Selection and Validation
With a model that fits the data identified, we proceed to do a thorough validation of this model and also compare with a few other models to select the best one for our data.
First, we start with a simple test evaluating the AIC score of `Model 1` and `Model 2`. The AIC scores are 10401.62 and 10403.57 respectively. We can see that there is not a huge difference in terms of this score between the models. A likelihood ratio test also suggests that there is no evidence of significant difference in performance between the two models.
As we are interested in a parsimonious model, we prefer `Model 2` out of the two options, as it has only two covariates: `Elevation` and `HFI`.
```{r include=FALSE}
#AIC(fitHFI); AIC(fitHFIexp); AIC(fitHFIquad); AIC(fit_for); AIC(fit_red)
AIC(fit_red); AIC(fit_red2)
```
```{r}
#Likelihood ratio test
anova(fit_red2,fit_red, test = "LRT")
```
```{r}
## Rhohat function for this data errors out as this has NA. But we use other functions
## like funceffect and parres to validate each covariate.
#Calculate the relative intensity as a function of elevation
#rh_elev <- rhohat(fit_red2, DATA$Elevation)
#Calculate the relative intensity as a function of forest
#rh_for <- rhohat(fit_red, DATA$Forest)
#Calculate the relative intensity as a function of forest
#rh_HFI <- rhohat(fit_red, DATA$HFI)
#Side by side plotting
#par(mfrow = c(1,2))
#plot(rh_elev,
# legend = FALSE,
# main = "",
# xlab = "Elevation (m)")
#plot(rh_for,
# legend = FALSE,
# main = "",
# xlab = "Forest")
#plot(rh_HFI,
# legend = FALSE,
# main = "",
# xlab = "HFI")
```
Next, next we look at the partial residuals of `Model 2` for each of the covariates, showing the fitted effect of a covariate alongside the observed effect.
```{r fig.cap="Partial residuals for Model 2 covariates"}
#Calculate the partial residuals as a function of elevation
par_res_elev <- parres(fit_red2, "Elevation")
#Calculate the relative intensity as a function of HFI
par_res_HFI <- parres(fit_red2, "HFI")
#Side by side plotting
par(mfrow = c(1,2))
plot(par_res_elev,
legend = FALSE,
lwd = 2,
main = "",
xlab = "Elevation (m)")
plot(par_res_HFI,
legend = FALSE,
lwd = 2,
main = "",
xlab = "HFI")
```
Based on the plot for `HFI`, on the right side of Figure 10, we see that the model is capturing the patterns in the data really well. Looking at the plot for `Elevation` on the left side, the model is capturing the patterns in the data very well except for higher `Elevation`.
We compute the intensity of a fitted point process model as a function of one of its covariates to look at the influence of the individual covariates.
```{r fig.cap="Right: Elevation effect at mean HFI, Left: HFI effect at mean elevation"}
#Mean HFI
E_hfi <- mean(DATA$HFI)
#Elevational effect on lambda at mean HFI
elev_effect <- effectfun(fit_red2, "Elevation", HFI = E_hfi, se.fit = T)
#Mean elevation
E_elev <- mean(DATA$Elevation)
#HFI effect on lambda at mean elevation
hfi_effect <- effectfun(fit_red2, "HFI", Elevation = E_elev, se.fit = T)
#Side by side plotting
par(mfrow = c(1,2))
#Plot the `Elevation` effect
plot(elev_effect,
legend = FALSE,
main = "")
#Plot the slope effect
plot(hfi_effect,
legend = FALSE,
main = "")
```
From Figure 11, we can see that intensity of the model can be described well as a function of `Elevation` and `HFI`.
Though the `Model 2` evaluation so far is very promising, we see that the intensity or occurrence of red fox at higher `Elevation` is not captured well. We try to improve it by adding a higher order polynomial for `Elevation` but it results in convergence error. So, we compare with a non parametric alternative using an additive modelling framework (GAMs) as it allows more flexibility. We call this as our `Model 3`.
```{r}
library(splines)
#Fit the PPP model
fit_smooth <- ppm(parks_ppp ~ bs(Elevation,12) + bs(HFI, 5), data = DATA, use.gam = TRUE)
fit_smooth
```
For a quick assessment, we compare the AIC score of both models and do a quadrat test to validate if one model is superior to the other. The resulting AIC score for the GAMs model is 10408.13 which is higher than the `Model 2`. The quadrat test has a p-value greater than a significance value of 0.05 which tells us that any one model is not superior to the other.
```{r include=F}
#Delta AIC
AIC(fit_red2)
AIC(fit_smooth)
```
```{r fig.cap="Partial residuals for Model 3 covariates"}
#Calculate the partial residuals as a function of elevation
par_res_elev <- parres(fit_smooth, "Elevation")
#Calculate the relative intensity as a function of HFI
par_res_HFI <- parres(fit_smooth, "HFI")
#Side by side plotting
par(mfrow = c(1,2))
plot(par_res_elev,
legend = FALSE,
lwd = 2,
main = "",
xlab = "Elevation (m)")
plot(par_res_HFI,
legend = FALSE,
lwd = 2,
main = "",
xlab = "HFI")
```
Additionally, the partial residuals of `Model 3` shown in Figure 12 don't look particularly well. Relatively, `Model 2` has captured the underlying data better and even in the higher `Elevation` area, the output from `Model 3` is still not convincing.
```{r}
#Likelihood ratio test
anova(fit_red2,fit_smooth, test = "LRT")
```
Since the LRT test p-value is above a significance level of 0.05, we can conclude that the additional complexity brought on by `Model 3` is not warranted. We conclude `Model 2` as the winner to describe the red fox intensity in British Columbia.
# Discussion:
First look at the occurrence data of red fox on BC map showed that the species is spread across the province and look clustered in some places. First moments analysis clearly showed that the intensity of data is inhomogeneous and there are a few hot spots in the province. Second moment analysis using Ripley's K-function showed there is some clustering of the data but the pair correlation which focuses on contribution of all inter-point distances confirmed that there is no significant clustering in data past 23km. Based on this, we conclude there is very little or no clustering in the occurrences of red fox.
Analysis of covariates `HFI`, `Forest Cover`, `Elevation` with this data shows the variables have a non-linear relationship with the data. The fourth covariate available to us is `Distance to Water` and this does not show promising relationship in our initial assessment plots. Individual models with all the covariates is created and studied. `Distance to Water` is not significant and so is excluded from further analysis. We also discovered that there is no significant correlation between the variables and this allows us to combine them in further modeling the given red fox data.
First combined model (`Model 1`) with `Forest Cover`, `Elevation` and `HFI` is fitted with linear and quadratic terms and we discovered that `Forest Cover` is not a significant predictor. So, we removed `Forest Cover` and fitted the next model (`Model 2`) with `Elevation` and `HFI` including linear and quadratic terms. For a good comparison, Model 3 is also fit which is a GAMs model. AIC scores from the three models are tabulated below.
| | **3 covariates (Model 1)** | **2 covariates (Model 2)** | **GAM w 2 covariates (Model 3)** |
|-----------|----------|-----------|----------|
| **AIC** | 10401.62 | 10403.57 | 10408.13 |
As seen above, `Model 2` has comparatively a lower AIC score and a quadrat test between all three models showed that not any one of the models is superior. We decide to select a parsimonious model and `Model 2` is selected. The residual, partial residual and covariate effect plots support the goodness of fit for this model. They also reveal that there are opportunities for improvements with this model especially at high and low `Elevation` points.
Going back to our research questions, we have found that in this dataset, red foxes are more likely to be observed in areas of higher human activity, as we saw that `HFI` was a significant covariate in all of our models. We have not found that forest cover is significant in modeling the intensity of red foxes in our final model, but the individual effect of forest cover seemed to be significant, suggesting that red foxes were observed more often in places with a forest cover between 10-60%. We found there is significant relationship with `Elevation`, and that there is no evidence of a significant relationship with distance to water.
There are some interesting challenges and insights gathered during the analysis and modeling process and we would like to share to help with future research. The Rho plots and standard residuals plot for this data error out due to NAs in the data. The `Model 2` which is the selected model does not converge if higher order of `Elevation` variables are included. GAMs model is used as a comparison here and only limited tuning is performed in the interest of time. An in depth tuning in our opinion could help build a second model for this data.
# Appendix
### Plotting red fox occurances with covariates
```{r, warning=F}
par(mfrow=c(2,2))
plot(DATA$HFI, box = F, par(cex.main = 1), main = "HFI")
plot(parks_ppp,
pch = 16,
cex = 0.6,
cols = "white",
add = TRUE)
plot(parks_ppp,
pch = 16,
cex = 0.5,
cols = "black",
add = TRUE)
plot(DATA$Forest, box = F, par(cex.main = 1), main = "Forest Cover")
plot(parks_ppp,
pch = 16,
cex = 0.6,
cols = "white",
add = TRUE)
plot(parks_ppp,
pch = 16,
cex = 0.5,
cols = "black",
add = TRUE)
plot(DATA$Elevation, box = F, main = "Elevation")
plot(parks_ppp,
pch = 16,
cex = 0.6,
cols = "white",
add = TRUE)
plot(parks_ppp,
pch = 16,
cex = 0.5,
cols = "black",
add = TRUE)
plot(DATA$Dist_Water, box = F, main = "Distance from Water")
plot(parks_ppp,
pch = 16,
cex = 0.6,
cols = "white",
add = TRUE)
plot(parks_ppp,
pch = 16,
cex = 0.5,
cols = "black",
add = TRUE)
```
# References: Include references to all necessary literature.
1. Data: https://www.gbif.org/species/5219243, accessed through `rgbif` package
2. Research topics: Canadian Wildlife Federation, https://cwf-fcf.org/en/resources/encyclopedias/fauna/mammals/red-fox.html