Introduction

The goal of this exercise is to evaluate the performance of various R packages designed for accessing MODIS data, specifically focusing on data access and import times. To do this, we will download and import a one-year monthly time series of MODIS Normalized Difference Vegetation Index (NDVI) at a 1 km spatial resolution for the entire country of Madagascar.

Other R packages available for accessing MODIS data

The MODIS package offers access to some MODIS data through global online data archives, but it lacks comprehensive documentation and was removed from the CRAN repository in 2023 for policy violations. Furthermore, some of its dependencies are not available anymore on CRAN.

The MODIStsp package provides both a command-line and a user-friendly graphical interface to extract MODIS data in standard TIFF or original HDF formats. However, it does not allow to extract data at a sub-tile level (i.e. spatial subsetting capabilities are limited), and it was also removed from the CRAN repository in 2023 at the maintainer’s request.

The MODIStools package serves as a programmatic interface to the ‘MODIS Land Products Subsets’ web service, providing access to 46 MODIS and VIIRS collections. This package, available on CRAN, extracts data at specific points or buffer zones around a point. However, for area-based queries, the maximum area size is 200 km x 200 km. This makes it suitable for point-based data extraction but less effective for wide area-based queries.

The appeears package acts as a programmatic interface to the NASA AppEEARS API services. AppEEARS is a NASA-built application that offers a simple and efficient way to access and transform geospatial data from a variety of federal data archives (including MODIS and VIIRS, but not GPM). AppEEARS allows accessing data from various NASA federal archives, including MODIS and VIIRS, and enables users to subset geospatial datasets using spatial, temporal, and band/layer parameters. Indeed, as for modisfast, AppEEARS uses OPeNDAP. While similar to modisfast, appeears offers a broader range of data sources but has a latency period (ranging from minutes to hours) for query processing due to server-side post-processing (mosaicking, reprojection, etc.).

Finally, some R packages, such as rgee [@rgee], rely on proprietary software or data access protocols and are not discussed here for that reason.

System benchmarking

Before proceeding to the test, some system benchmarking :

library(benchmarkme)
Sys.time()
#> [1] "2024-08-07 14:22:47 CEST"
Sys.info()
#>                                                            sysname                                                            release 
#>                                                            "Linux"                                                 "6.5.0-44-generic" 
#>                                                            version                                                           nodename 
#> "#44~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Tue Jun 18 14:36:16 UTC 2"                                           "ptaconet-Latitude-7490" 
#>                                                            machine                                                              login 
#>                                                           "x86_64"                                                         "ptaconet" 
#>                                                               user                                                     effective_user 
#>                                                         "ptaconet"                                                         "ptaconet"
benchmarkme::get_r_version()
#> $platform
#> [1] "x86_64-pc-linux-gnu"
#> 
#> $arch
#> [1] "x86_64"
#> 
#> $os
#> [1] "linux-gnu"
#> 
#> $system
#> [1] "x86_64, linux-gnu"
#> 
#> $status
#> [1] ""
#> 
#> $major
#> [1] "4"
#> 
#> $minor
#> [1] "4.0"
#> 
#> $year
#> [1] "2024"
#> 
#> $month
#> [1] "04"
#> 
#> $day
#> [1] "24"
#> 
#> $`svn rev`
#> [1] "86474"
#> 
#> $language
#> [1] "R"
#> 
#> $version.string
#> [1] "R version 4.4.0 (2024-04-24)"
#> 
#> $nickname
#> [1] "Puppy Cup"
benchmarkme::get_cpu()
#> $vendor_id
#> [1] "GenuineIntel"
#> 
#> $model_name
#> [1] "Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz"
#> 
#> $no_of_cores
#> [1] 8
benchmarkme::get_ram()
#> 33.5 GB

appeears

appears_start_time <- Sys.time()

library(appeears)
library(sf)
library(terra)

roi <- st_as_sf(data.frame(id = "madagascar", geom = "POLYGON((41.95 -11.37,51.26 -11.37,51.26 -26.17,41.95 -26.17,41.95 -11.37))"), wkt="geom", crs = 4326) 
time_range <- as.Date(c("2023-01-01","2023-12-31"))
collection <- "MOD13A3.061"  
variables <- c("_1_km_monthly_NDVI") 

token <- rs_login(user = Sys.getenv("earthdata_un"))

df <- data.frame(
  task = "polygon_test",
  subtask = "subtask",
  start = as.character(time_range[1]),
  end = as.character(time_range[2]),
  product = collection,
  layer = variables
)

# create the task
task <- rs_build_task(
  df = df,
  roi = roi,
  format = "netcdf4"
)

# request the task to be executed
rs_request(
  request = task,
  user = Sys.getenv("earthdata_un"),
  transfer = TRUE,
  verbose = TRUE
)

r <- terra::rast(file.path(tempdir(),df$task,"MOD13A3.061_1km_aid0001.nc"))

r
#> class       : SpatRaster 
#> dimensions  : 1777, 1118, 24  (nrow, ncol, nlyr)
#> resolution  : 0.008333333, 0.008333333  (x, y)
#> extent      : 41.95, 51.26667, -26.175, -11.36667  (xmin, xmax, ymin, ymax)
#> coord. ref. : +proj=longlat +datum=WGS84 +no_defs 
#> sources     : MOD13A3.061_1km_aid0001.nc:_1_km_monthly_NDVI  (12 layers) 
#>               MOD13A3.061_1km_aid0001.nc:_1_km_monthly_VI_Quality  (12 layers) 
#> varnames    : _1_km_monthly_NDVI (1 km monthly NDVI) 
#>               _1_km_monthly_VI_Quality (1 km monthly VI Quality) 
#> names       : _1_km~DVI_1, _1_km~DVI_2, _1_km~DVI_3, _1_km~DVI_4, _1_km~DVI_5, _1_km~DVI_6, ... 
#> unit        :        NDVI,        NDVI,        NDVI,        NDVI,        NDVI,        NDVI, ... 
#> time (days) : 2023-01-01 to 2023-12-01

appears_end_time <- Sys.time()

modisfast

modisfast_start_time <- Sys.time()

library(modisfast)
library(sf)
library(terra)

roi <- st_as_sf(data.frame(id = "madagascar", geom = "POLYGON((41.95 -11.37,51.26 -11.37,51.26 -26.17,41.95 -26.17,41.95 -11.37))"), wkt="geom", crs = 4326) 
time_range <- as.Date(c("2023-01-01","2023-12-31"))
collection <- "MOD13A3.061"  
variables <- c("_1_km_monthly_NDVI") 

log <- mf_login(credentials = c(Sys.getenv("earthdata_un"), Sys.getenv("earthdata_pw")))
#> Checking credentials...
#> Successfull login to Earthdata

urls <- mf_get_url(
  collection = collection,
  variables = variables,
  roi = roi,
  time_range = time_range
 )
#> Building the URLs...
#> OK

res_dl <- mf_download_data(urls, parallel = T)
#> 5  datasets in total :  0  already downloaded and  5  datasets to download
#> Downloading the data...
#> 
#> Data were all properly downloaded under the folder(s)  /tmp/RtmpVL1j3M/modisfast_6e25477426a4/data/madagascar/MOD13A3.061 
#> **To import the data in R, use the function modisfast::mf_import_data() rather than terra::rast() or stars::read_stars(). More info at help(mf_import_data)**

r <- mf_import_data(
  path = dirname(res_dl$destfile[1]),
  collection = collection, 
  proj_epsg = 4326
  )

r
#> class       : SpatRaster 
#> dimensions  : 1509, 1789, 12  (nrow, ncol, nlyr)
#> resolution  : 0.009858459, 0.009858459  (x, y)
#> extent      : 38.40805, 56.04484, -26.24725, -11.37083  (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84 (EPSG:4326) 
#> source(s)   : memory
#> names       : _1_km~DVI_1, _1_km~DVI_2, _1_km~DVI_3, _1_km~DVI_4, _1_km~DVI_5, _1_km~DVI_6, ... 
#> min values  :     -0.1992,  -0.1876327,  -0.1933123,  -0.1836101,     -0.1974,     -0.1914, ... 
#> max values  :      0.9764,   0.9764000,   0.9722000,   0.9912000,      0.9912,      0.9608, ... 
#> time (days) : 2023-01-01 to 2023-12-01

modisfast_end_time <- Sys.time()

MODIS

We were not able to install the MODIS package, because some of the dependencies were not available for our version of R. We could therefore not finish the test with MODIS.

MODIStsp

modistsp_start_time <- Sys.time()

library(MODIStsp)

MODIStsp(gui = FALSE, 
         parallel = TRUE,
         selprod = "Vegetation_Indexes_Monthly_1Km (M*D13A3)",
         prod_version = "061",
         bandsel = "NDVI",
         sensor = "Terra",
         start_date = gsub("-",".",as.character(time_range[1])),
         end_date   = gsub("-",".",as.character(time_range[2])),
         user = Sys.getenv("earthdata_un"),
         password = Sys.getenv("earthdata_pw"),
         bbox = c(41.95,-26.17,51.26,-11.37),
         download_range = "Full",
         spatmeth = "bbox",
         out_projsel = "User Defined",
         output_proj = "4326",
         out_res_sel = "Native",
         out_format = "GTiff"
)
#> Error in MODIStsp_download(modislist, proc_opts$out_folder_mod, download_server, : Username and/or password are not valid. Please provide
#>              valid ones!

modistsp_end_time <- Sys.time()

The system returns that “Username and/or password are not valid”, although they are those successfully used with both modisfast and appeears. We could therefore not finish the test with MODIStsp.

MODIStools

MOD13A3 is not available with MODIStools, so we run the exercise with MOD21A2 (spatial resolution : 1 km ; temporal resolution : 8 day). Because the temporal resolution is 4 fold lower than MOD13A3 (8 days instead of 1 month), we also divide the time range by 4 to have comparable data.

modistools_start_time <- Sys.time()

library(MODISTools)

subset <- mt_subset(product = "MOD21A2",
                    lat = -19.5,
                    lon = 46.5,
                    band = "LST_Day_1KM",
                    start = as.character(time_range[1]),
                    end = "2023-03-12",
                    km_lr = 500,
                    km_ab = 900,
                    site_name = "madagascar",
                    internal = TRUE,
                    progress = FALSE)
#> Error in do.call("rbind", subset_data$data): le second argument doit être une liste

modistools_end_time <- Sys.time()

The system returns that “Number of kilometers above or below the subset location must be less than or equal to 100.”, which is not enough for the whole size of Madagascar. MODIStools enables to download data for 200 km x 200 km areas maximum. We could therefore not finish the test with MODIStools.

Results

modisfast and appeears passed the test, while MODIS, MODIStools and MODIStsp did not (for different reasons). Let’s see the duration of data processing access :


# with modisfast
round(modisfast_end_time - modisfast_start_time,2)
#> Time difference of 1.21 mins

# with appeears
round(appears_end_time - appears_start_time,2)
#> Time difference of 3.91 mins

We see here that accessing and importing the time-series with modisfast was much faster than with appeears ! Once again, appeears has a variable latency period for query processing, ranging from minutes to hours, due to server-side post-processing. modisfast does not have such latency.