Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spatSample() slowness #1353

Closed
mdsumner opened this issue Nov 23, 2023 · 5 comments
Closed

spatSample() slowness #1353

mdsumner opened this issue Nov 23, 2023 · 5 comments

Comments

@mdsumner
Copy link
Contributor

mdsumner commented Nov 23, 2023

spatSample() is very slow for this moderate sized COG GeoTIFF (there's no overviews)

## 2268, 2193, 3  (nrow, ncol, nlyr)
dsn <- "/vsicurl/https://github.com/mdsumner/cog-example/raw/main/cog/sentinel-image.tif"

library(terra)
r <- rast(dsn)

this spatSample call takes forever, and in particular affects the performance of plotRGB(r) :

spatSample(r, 500000, method = "regular", as.raster = TRUE, warn = F)

in fact it's better to do plotRGB(r, maxcell = Inf) but that's obviously a bad idea for very much larger sources.

I can't see why since it's a standard call to RasterIO but I haven't explored more. Using RasterIO outside of terra works quickly as expected.

These are also fast, read the entire thing into memory, or sample to very much smaller:

r.mem <- r * 1
spatSample(r, c(20, 20), method = "regular", as.raster = TRUE, warn = F)
@kadyb
Copy link
Contributor

kadyb commented Nov 23, 2023

We can also download the entire file (by removing the vsicurl) without loading it into memory and it works ok:

system.time({
  dsn <- "https://github.com/mdsumner/cog-example/raw/main/cog/sentinel-image.tif"
  r <- rast(dsn)
  spatSample(r, 500000)
})
#> user  system elapsed 
#> 0.21    0.08    4.08

@mdsumner
Copy link
Contributor Author

mdsumner commented Nov 24, 2023

I think it's the OVERVIEW_LEVEL=NONE setting, and appears to be a GDAL problem. The slowness is entirely in RasterIO(), I will follow up with GDAL itself, I could PR to avoid setting that open option if no overviews exist, or perhaps just remove it.

I can reproduce the slowness problem with sf like this:

d2 <- c(219, 226)
dsn <- "/vsicurl/https://github.com/mdsumner/cog-example/raw/main/cog/sentinel-image.tif"

# ## uses RasterIO via sf (equivalent to the spatSample() call above if we include this open option
sf::gdal_utils("translate", dsn, tf <- tempfile(tmpdir = "/vsimem", fileext = ".tif"), options = c("-outsize", d2[1], d2[2], "-ot", "Float64",  "-oo", "OVERVIEW_LEVEL=NONE"))
##a2 <- rast(stars::read_stars(tf))

@mdsumner
Copy link
Contributor Author

also, it's not a COG - there's no overviews, an oversight on my part

@mdsumner
Copy link
Contributor Author

mdsumner commented Nov 25, 2023

lovely and fast now as expected since 42ce57ea1edea81f6c4d2e0eb80aa9af3a2beecf

dsn <- "/vsicurl/https://github.com/mdsumner/cog-example/raw/main/cog/sentinel-image.tif"
library(terra)
r <- rast(dsn)
plotRGB(r)

(will be backported to GDAL 3.8.0 by the looks).

there could be some changes in terra, but ultimately I think these RasterIO calls should be replaced by GDALWarp() which will target the right overview, and also work for very large rasters - perhaps terra doesn't want to allow that, or perhaps maxcell should include some way to allow choice of overviews ... so will leave this open for discussion.

@rhijmans
Copy link
Member

sampling for plotting now uses the overviews. That made the example in #1412 much faster. I do not think that sampling should generally use the overviews as it would not be a sample anymore; but other data visualization/exploration methods could also use them, I suppose.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants