Introduction

In vignette("get_started"), we imported data:

  • from several collections (MOD13A3.061 and GPM_3IMERGM.07);
  • over a single region of interest (ROI);
  • for a single time frame of interest (2017-01-01 to 2017-01-30).

So far so good, but what if we need multiple regions of interest and/or multiple time frames of interest? These cases are likely to happen, for instance:

  • multiple time frames of interest: we have spatiotemporal sampling data - e.g. species occurrences - collected over a long time frame, and we want to study how local past environmental or climatic conditions influence the occurrence;
  • multiple regions of interest: we want to compare two areas in terms of their environmental or climatic conditions.

We could use for loops or similar constructs to do the job. However, this would not be efficient. In this vignette, we explain why, and we show how to optimize the data import in the case of multiple regions or multiple time periods of interest. Let's start!

Get data over multiple regions of interest simultaneously

It is very easy to get data over multiple regions of interest, as mf_get_url() natively supports multiple ROIs, as shown in the example below:


# Define multiple regions of interest
roi <- st_as_sf(data.frame(id=c("Korhogo","Diebougou"),
                           geom=c("POLYGON ((-5.82 9.54, -5.42 9.55, -5.41 8.84, -5.81 8.84, -5.82 9.54))",
                                   "POLYGON ((-3.62 11.03, -3.13 11.04, -3.11 10.60, -3.60 10.60, -3.62 11.03))"
                                   )),wkt="geom",crs = 4326)
                                   
time_range <- as.Date(c("2017-01-01","2017-01-30"))

# and then execute the classical workflow

log <- mf_login(credentials = c(Sys.getenv("earthdata_un"),Sys.getenv("earthdata_pw")))
#> Checking credentials...

urls_mod11a1 <- mf_get_url(
        collection = "MOD11A1.061",
        variables = c("LST_Day_1km","LST_Night_1km","QC_Day","QC_Night"),
        roi = roi,
        time_range = time_range)
#> Building the URLs...
#> Estimated maximum size of data to be downloaded is 3 Mb

res_dl <- mf_download_data(urls_mod11a1, parallel = TRUE)
#> 2  datasets in total :  0  already downloaded and  2  datasets to download
#> Downloading the data ... (destination folder: /tmp/Rtmp75vlp4/modisfast_3ef0635b9fe3 )
#> Estimated maximum size of data to be downloaded is ~ 3 Mb
#> 
#> Actual size of downloaded data is 1 Mb

# The data are downloaded in one subfolder per ROI
modis_ts_korhogo <- mf_import_data(
  path = dirname(list.files(path = tempdir(), pattern = "MOD11A1.061",
                            recursive = TRUE, full.names = TRUE))[2],
  collection = "MOD11A1.061")
#> Importing the dataset as a SpatRaster object...

modis_ts_diebougou <- mf_import_data(
  path = dirname(list.files(path = tempdir(), pattern = "MOD11A1.061",
                            recursive = TRUE, full.names = TRUE))[1],
  collection = "MOD11A1.061")
#> Importing the dataset as a SpatRaster object...
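Rather than locating each ROI's subfolder by hand and hard-coding its position in the list, the folders can also be recovered from the download report. This is only a sketch: it assumes that the destfile column of the object returned by mf_download_data() holds the local paths of the downloaded files, with one subfolder per ROI (as in the two manual calls above).

```r
# Hypothetical generalization: import every ROI in one pass.
# Assumes res_dl$destfile holds the local paths of the downloaded files.
roi_folders <- unique(dirname(res_dl$destfile))
modis_ts_list <- lapply(roi_folders, mf_import_data, collection = "MOD11A1.061")
names(modis_ts_list) <- basename(roi_folders)
```

This avoids relying on the alphabetical order in which list.files() happens to return the folders.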

Get data over multiple time periods of interest simultaneously

Here, things are slightly different, as shown in this example.

We first set up the time ranges of interest (and, as usual, the ROI):

roi <- st_as_sf(data.frame(id = "Korhogo",
                           geom = "POLYGON ((-5.82 9.54, -5.42 9.55, -5.41 8.84, -5.81 8.84, -5.82 9.54))"),
                wkt = "geom", crs = 4326)

time_ranges <- list(as.Date(c("2016-01-01","2016-01-31")),
                    as.Date(c("2017-01-01","2017-01-31")),
                    as.Date(c("2018-01-01","2018-01-31")),
                    as.Date(c("2019-01-01","2019-01-31")))
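As a side note, a longer series of such one-month windows could be generated programmatically rather than typed out; the base-R snippet below is just a sketch producing the same list as the hand-written one above.

```r
# Equivalent to the hand-written list above: one January window per year
years <- 2016:2019
time_ranges <- lapply(years, function(y) {
  as.Date(c(paste0(y, "-01-01"), paste0(y, "-01-31")))
})
```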

log <- mf_login(credentials = c(Sys.getenv("earthdata_un"),Sys.getenv("earthdata_pw")))
#> Checking credentials...

Of course, we could loop over mf_get_url() with the time ranges of interest and get the data. However, mf_get_url() queries the OPeNDAP servers each time it is called. This query internally retrieves various metadata, including the OPeNDAP time, latitude, and longitude vectors, and this process takes some time. If you loop over the function for the same ROI and multiple time frames of interest, it will retrieve the same metadata again and again, which is wasteful.

This is where the function mf_get_opt_param() comes into play. For a given collection and ROI, this function queries the OPeNDAP server and retrieves the metadata mentioned in the previous paragraph. It is actually run internally by mf_get_url(), but its output can also be provided to mf_get_url() through the opt_param argument. If mf_get_url() is called multiple times for the same {collection, ROI} pair, it is hence more efficient to pre-compute the opt_param argument only once with mf_get_opt_param() and then pass it to mf_get_url() within a for loop or, e.g., a purrr::map() function.

To summarize: when we have multiple time frames of interest, we first execute mf_get_opt_param(). Then we execute mf_get_url(), passing the result of mf_get_opt_param() through the opt_param argument.

First, execute mf_get_opt_param():


opt_param_mod11a1 <- mf_get_opt_param("MOD11A1.061", roi)

Then execute mf_get_url(), passing the opt_param argument:


urls_mod11a1 <- purrr::map_dfr(time_ranges, ~mf_get_url( 
  collection = "MOD11A1.061",
  variables = c("LST_Day_1km","LST_Night_1km","QC_Day","QC_Night"),
  roi = roi,
  time_range = .,
  opt_param = opt_param_mod11a1)
  )
#> Building the URLs...
#> Estimated maximum size of data to be downloaded is 2 Mb
#> Building the URLs...
#> Estimated maximum size of data to be downloaded is 2 Mb
#> Building the URLs...
#> Estimated maximum size of data to be downloaded is 2 Mb
#> Building the URLs...
#> Estimated maximum size of data to be downloaded is 2 Mb
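For readers who prefer base R, a for loop accumulating the results is a sketch of an equivalent to the purrr::map_dfr() call above (assuming, as that call does, that mf_get_url() returns one data frame per time range):

```r
# Base-R equivalent of the purrr::map_dfr() call above
urls_list <- list()
for (i in seq_along(time_ranges)) {
  urls_list[[i]] <- mf_get_url(
    collection = "MOD11A1.061",
    variables = c("LST_Day_1km", "LST_Night_1km", "QC_Day", "QC_Night"),
    roi = roi,
    time_range = time_ranges[[i]],
    opt_param = opt_param_mod11a1)
}
urls_mod11a1 <- do.call(rbind, urls_list)
```

Either way, opt_param_mod11a1 is computed once and reused, so the OPeNDAP metadata are not re-downloaded at each iteration.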

str(urls_mod11a1)
#> 'data.frame':    4 obs. of  6 variables:
#>  $ id_roi              : chr  "Korhogo" "Korhogo" "Korhogo" "Korhogo"
#>  $ time_start          : Date, format: "2016-01-01" "2017-01-01" ...
#>  $ collection          : chr  "MOD11A1.061" "MOD11A1.061" "MOD11A1.061" "MOD11A1.061"
#>  $ name                : chr  "MOD11A1.061.2016001_2016031.h17v08.nc4" "MOD11A1.061.2017001_2017031.h17v08.nc4" "MOD11A1.061.2018001_2018031.h17v08.nc4" "MOD11A1.061.2019001_2019031.h17v08.nc4"
#>  $ url                 : chr  "https://opendap.cr.usgs.gov/opendap/hyrax/MOD11A1.061/h17v08.ncml.nc4?MODIS_Grid_Daily_1km_LST_eos_cf_projectio"| __truncated__ "https://opendap.cr.usgs.gov/opendap/hyrax/MOD11A1.061/h17v08.ncml.nc4?MODIS_Grid_Daily_1km_LST_eos_cf_projectio"| __truncated__ "https://opendap.cr.usgs.gov/opendap/hyrax/MOD11A1.061/h17v08.ncml.nc4?MODIS_Grid_Daily_1km_LST_eos_cf_projectio"| __truncated__ "https://opendap.cr.usgs.gov/opendap/hyrax/MOD11A1.061/h17v08.ncml.nc4?MODIS_Grid_Daily_1km_LST_eos_cf_projectio"| __truncated__
#>  $ maxFileSizeEstimated: num  2040000 2040000 2040000 2040000

Now, download the data and import them in R:


res_dl <- mf_download_data(urls_mod11a1, parallel = TRUE)
#> 4  datasets in total :  0  already downloaded and  4  datasets to download
#> Downloading the data ... (destination folder: /tmp/Rtmp75vlp4/modisfast_3ef090bce3a )
#> Estimated maximum size of data to be downloaded is ~ 8 Mb
#> 
#> Actual size of downloaded data is 1 Mb

modis_ts <- mf_import_data(path = dirname(res_dl$destfile[1]), 
                           collection = "MOD11A1.061")
#> Importing the dataset as a SpatRaster object...

modis_ts
#> class       : SpatRaster 
#> dimensions  : 86, 51, 496  (nrow, ncol, nlyr)
#> resolution  : 926.6254, 926.6254  (x, y)
#> extent      : -638908.2, -591650.3, 981759.6, 1061449  (xmin, xmax, ymin, ymax)
#> coord. ref. : +proj=sinu +lon_0=0 +x_0=0 +y_0=0 +R=6371007.181 +units=m +no_defs 
#> sources     : MOD11A1.061.2016001_2016031.h17v08.nc4:LST_Night_1km  (31 layers) 
#>               MOD11A1.061.2016001_2016031.h17v08.nc4:QC_Day  (31 layers) 
#>               MOD11A1.061.2016001_2016031.h17v08.nc4:QC_Night  (31 layers) 
#>               ... and 13 more source(s)
#> varnames    : LST_Night_1km (Daily nighttime 1km grid Land-surface Temperature) 
#>               QC_Day (Quality control for daytime LST and emissivity) 
#>               QC_Night (Quality control for nighttime LST and emissivity) 
#>               ...
#> names       : LST_N~1km_1, LST_N~1km_2, LST_N~1km_3, LST_N~1km_4, LST_N~1km_5, LST_N~1km_6, ... 
#> unit        :           K,           K,           K,           K,           K,           K, ... 
#> time (days) : 2016-01-01 to 2019-01-31