vignettes/articles/perf_comp.Rmd
perf_comp.Rmd
library(modisfast)
library(appeears)
library(MODIStsp)
library(MODISTools)
library(sf)
library(terra)
library(benchmarkme)
The goal of this exercise is to evaluate the performance of various R packages designed for accessing MODIS data, specifically focusing on data access and import times. To do this, we will download and import a one-year monthly time series of MODIS Normalized Difference Vegetation Index (NDVI) at a 1 km spatial resolution for the entire country of Madagascar.
The MODIS
package offers access to some MODIS data through global online data
archives, but it lacks comprehensive documentation and was removed from
the CRAN repository in 2023 for policy violations. Furthermore, some of
its dependencies are not available anymore on CRAN.
The MODIStsp
package provides both a command-line and a user-friendly graphical
interface to extract MODIS data in standard TIFF or original HDF
formats. However, it does not allow to extract data at a sub-tile level
(i.e. spatial subsetting capabilities are limited), and it was also
removed from the CRAN repository in 2023 at the maintainer’s
request.
The MODIStools
package serves as a programmatic interface to the ‘MODIS Land
Products Subsets’ web service, providing access to 46 MODIS and
VIIRS collections. This package, available on CRAN, extracts data at
specific points or buffer zones around a point. However, for area-based
queries, the maximum area size is 200 km x 200 km. This makes it
suitable for point-based data extraction but less effective for wide
area-based queries.
The appeears
package acts as a programmatic interface to the NASA AppEEARS API
services. AppEEARS is a NASA-built application that offers a simple and
efficient way to access and transform geospatial data from a variety of
federal data archives (including MODIS and VIIRS, but not GPM). AppEEARS
allows accessing data from various NASA federal archives, including
MODIS and VIIRS, and enables users to subset geospatial datasets using
spatial, temporal, and band/layer parameters. Indeed, as for
modisfast
, AppEEARS uses OPeNDAP. While similar to
modisfast
, appeears
offers a broader range of
data sources but has a latency period (ranging from minutes to hours)
for query processing due to server-side post-processing (mosaicking,
reprojection, etc.).
Finally, some R packages, such as rgee
[@rgee], rely on proprietary software or data
access protocols and are not discussed here for that reason.
Before proceeding to the test, some system benchmarking :
Sys.time()
#> [1] "2024-11-09 10:08:28 CET"
Sys.info()
#> sysname
#> "Linux"
#> release
#> "6.8.0-48-generic"
#> version
#> "#48~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Oct 7 11:24:13 UTC 2"
#> nodename
#> "ptaconet-Latitude-7490"
#> machine
#> "x86_64"
#> login
#> "ptaconet"
#> user
#> "ptaconet"
#> effective_user
#> "ptaconet"
benchmarkme::get_r_version()
#> $platform
#> [1] "x86_64-pc-linux-gnu"
#>
#> $arch
#> [1] "x86_64"
#>
#> $os
#> [1] "linux-gnu"
#>
#> $system
#> [1] "x86_64, linux-gnu"
#>
#> $status
#> [1] ""
#>
#> $major
#> [1] "4"
#>
#> $minor
#> [1] "4.0"
#>
#> $year
#> [1] "2024"
#>
#> $month
#> [1] "04"
#>
#> $day
#> [1] "24"
#>
#> $`svn rev`
#> [1] "86474"
#>
#> $language
#> [1] "R"
#>
#> $version.string
#> [1] "R version 4.4.0 (2024-04-24)"
#>
#> $nickname
#> [1] "Puppy Cup"
benchmarkme::get_cpu()
#> $vendor_id
#> [1] "GenuineIntel"
#>
#> $model_name
#> [1] "Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz"
#>
#> $no_of_cores
#> [1] 8
benchmarkme::get_ram()
#> 33.5 GB
appeears
appears_start_time <- Sys.time()
roi <- st_as_sf(data.frame(id = "madagascar", geom = "POLYGON((41.95 -11.37,51.26 -11.37,51.26 -26.17,41.95 -26.17,41.95 -11.37))"), wkt="geom", crs = 4326)
time_range <- as.Date(c("2023-01-01","2023-12-31"))
collection <- "MOD13A3.061"
variables <- c("_1_km_monthly_NDVI")
# Replace Sys.getenv("earthdata_un") and Sys.getenv("earthdata_pw") below by your Earthdata credentials (see https://github.com/bluegreen-labs/appeears#use to setup an appeears connection)
rs_set_key(
user = Sys.getenv("earthdata_un"),
password = Sys.getenv("earthdata_pw")
)
token <- rs_login(user = Sys.getenv("earthdata_un"))
df <- data.frame(
task = "polygon_test",
subtask = "subtask",
start = as.character(time_range[1]),
end = as.character(time_range[2]),
product = collection,
layer = variables
)
# create the task
task <- rs_build_task(
df = df,
roi = roi,
format = "netcdf4"
)
# request the task to be executed
rs_request(
request = task,
user = Sys.getenv("earthdata_un"),
transfer = TRUE,
verbose = TRUE
)
r <- terra::rast(file.path(tempdir(),df$task,"MOD13A3.061_1km_aid0001.nc"))
r
#> class : SpatRaster
#> dimensions : 1777, 1118, 24 (nrow, ncol, nlyr)
#> resolution : 0.008333333, 0.008333333 (x, y)
#> extent : 41.95, 51.26667, -26.175, -11.36667 (xmin, xmax, ymin, ymax)
#> coord. ref. : +proj=longlat +datum=WGS84 +no_defs
#> sources : MOD13A3.061_1km_aid0001.nc:_1_km_monthly_NDVI (12 layers)
#> MOD13A3.061_1km_aid0001.nc:_1_km_monthly_VI_Quality (12 layers)
#> varnames : _1_km_monthly_NDVI (1 km monthly NDVI)
#> _1_km_monthly_VI_Quality (1 km monthly VI Quality)
#> names : _1_km~DVI_1, _1_km~DVI_2, _1_km~DVI_3, _1_km~DVI_4, _1_km~DVI_5, _1_km~DVI_6, ...
#> unit : NDVI, NDVI, NDVI, NDVI, NDVI, NDVI, ...
#> time (days) : 2023-01-01 to 2023-12-01
appears_end_time <- Sys.time()
modisfast
modisfast_start_time <- Sys.time()
roi <- st_as_sf(data.frame(id = "madagascar", geom = "POLYGON((41.95 -11.37,51.26 -11.37,51.26 -26.17,41.95 -26.17,41.95 -11.37))"), wkt="geom", crs = 4326)
time_range <- as.Date(c("2023-01-01","2023-12-31"))
collection <- "MOD13A3.061"
variables <- c("_1_km_monthly_NDVI")
log <- mf_login(credentials = c(Sys.getenv("earthdata_un"), Sys.getenv("earthdata_pw")))
#> Checking credentials...
urls <- mf_get_url(
collection = collection,
variables = variables,
roi = roi,
time_range = time_range
)
#> Building the URLs...
#> Estimated maximum size of data to be downloaded is 97 Mb
res_dl <- mf_download_data(urls, parallel = T)
#> 5 datasets in total : 0 already downloaded and 5 datasets to download
#> Downloading the data ... (destination folder: /tmp/RtmpbWJaGX/modisfast_405f7f2ea9cf )
#> Estimated maximum size of data to be downloaded is ~ 97 Mb
#>
#> Actual size of downloaded data is 24 Mb
r <- mf_import_data(
path = dirname(res_dl$destfile[1]),
collection = collection,
proj_epsg = 4326
)
#> Importing the dataset as a SpatRaster object...
r
#> class : SpatRaster
#> dimensions : 1508, 1796, 12 (nrow, ncol, nlyr)
#> resolution : 0.009861233, 0.009861233 (x, y)
#> extent : 38.40805, 56.11883, -26.24157, -11.37083 (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (EPSG:4326)
#> source(s) : memory
#> names : _1_km~DVI_1, _1_km~DVI_2, _1_km~DVI_3, _1_km~DVI_4, _1_km~DVI_5, _1_km~DVI_6, ...
#> min values : -0.1992, -0.1877457, -0.1953000, -0.1842000, -0.197400, -0.1993000, ...
#> max values : 0.9764, 0.9764000, 0.9748999, 0.9725385, 0.973778, 0.9645233, ...
#> time (days) : 2023-01-01 to 2023-12-01
modisfast_end_time <- Sys.time()
MODIStsp
modistsp_start_time <- Sys.time()
MODIStsp(gui = FALSE,
parallel = TRUE,
selprod = "Vegetation_Indexes_Monthly_1Km (M*D13A3)",
prod_version = "061",
bandsel = "NDVI",
sensor = "Terra",
start_date = gsub("-",".",as.character(time_range[1])),
end_date = gsub("-",".",as.character(time_range[2])),
user = Sys.getenv("earthdata_un"),
password = Sys.getenv("earthdata_pw"),
bbox = c(41.95,-26.17,51.26,-11.37),
download_range = "Full",
spatmeth = "bbox",
out_projsel = "User Defined",
output_proj = "4326",
out_res_sel = "Native",
out_format = "GTiff"
)
modistsp_end_time <- Sys.time()
The system returns that “Username and/or password are not valid”,
although they are those successfully used with both
modisfast
and appeears
. We could
therefore not finish the test with MODIStsp
.
MODIStools
MOD13A3 is not available with MODIStools
, so we run the
exercise with MOD21A2 (spatial resolution : 1 km ; temporal resolution :
8 day). Because the temporal resolution is 4 fold lower than MOD13A3 (8
days instead of 1 month), we also divide the time range by 4 to have
comparable data.
modistools_start_time <- Sys.time()
subset <- mt_subset(product = "MOD21A2",
lat = -19.5,
lon = 46.5,
band = "LST_Day_1KM",
start = as.character(time_range[1]),
end = "2023-03-12",
km_lr = 500,
km_ab = 900,
site_name = "madagascar",
internal = TRUE,
progress = FALSE)
modistools_end_time <- Sys.time()
The system returns that “Number of kilometers above or below the
subset location must be less than or equal to 100.”, which is not enough
for the whole size of Madagascar. MODIStools
enables to
download data for 200 km x 200 km areas maximum. We could
therefore not finish the test with MODIStools
.
modisfast
and appeears
passed the test,
MODIStools
and MODIStsp
did not (for different
reasons). Let’s see the duration of data processing access :
# with modisfast
round(modisfast_end_time - modisfast_start_time,2)
#> Time difference of 1.21 mins
# with appeears
round(appears_end_time - appears_start_time,2)
#> Time difference of 3.91 mins
We see here that accessing and importing the time-series with
modisfast
was much faster than with appeears
!
Once again, appeears
has a variable latency period for
query processing, ranging from minutes to hours, due to server-side
post-processing. modisfast
does not have such latency.