Spatiotemporal data often comes in the form of dense arrays, with space and time being array dimensions. Examples include
This R package provides classes and methods for reading, manipulating, plotting and writing such data cubes, to the extent that there are proper formats for doing so.
The canonical data cube most of us have in mind is that where two dimensions represent spatial raster dimensions, and the third time (or band), as e.g. shown here:
By data cubes however we also consider higher-dimensional cubes (hypercubes) such as a five-dimensional cube where in addition to time, spectral band and sensor form dimensions:
or lower-dimensional cubes such as a raster image:
suppressPackageStartupMessages(library(dplyr)) library(stars) # Loading required package: abind # Loading required package: sf # Linking to GEOS 3.9.0, GDAL 3.2.0, PROJ 7.2.0 tif = system.file("tif/L7_ETMs.tif", package = "stars") read_stars(tif) %>% slice(index = 1, along = "band") %>% plot()
Raster data do not need to be regular and aligned with North/East, and
stars supports besides regular also rotated, sheared,
rectilinear and curvilinear rasters:
Vector data cubes arise when we do not have two regularly discretized spatial dimensions, but a single dimension that points to distinct spatial feature geometries, such as polygons (e.g. denoting administrative regions):
or points (e.g. denoting sensor locations):
NetCDF’s CF-convention calls this a discrete axis.
stars provides two functions to read data:
read_stars, where the latter reads through GDAL. (In the future, both
will be integrated in
read_stars.) For reading NetCDF files, package
RNetCDF is used, for reading through GDAL, package
sf provides the
binary linking to GDAL.
For vector and raster operations,
stars uses as much as possible the
routines available in GDAL and PROJ (e.g.
warp). Read more about this in the vignette on
vector-raster conversions, reprojection,
stars_proxy objects (currently only when read
through GDAL), which contain only the dimensions metadata and pointers
to the files on disk. These objects work lazily: reading and processing
data is postponed to the moment that pixels are really needed (at plot
time, or when writing to disk), and is done at the lowest spatial
resolution possible that still fulfills the resolution of the graphics
device. More details are found in the stars proxy
The following methods are currently available for
methods(class = "stars_proxy") #  [ adrop aggregate aperm as.data.frame #  c coerce dim droplevels filter #  initialize Math merge mutate Ops #  plot predict print pull select #  show slice slotsFromS3 split st_apply #  st_as_stars st_crop st_mosaic st_redimension st_sample #  st_set_bbox transmute write_stars # see '?methods' for accessing help and source code
In the following, a curvilinear grid with hourly precipitation values of a hurricane is imported and the first 12 time steps are plotted:
prec_file = system.file("nc/test_stageiv_xyt.nc", package = "stars") (prec = read_ncdf(prec_file, curvilinear = c("lon", "lat"), ignore_bounds = TRUE)) # no 'var' specified, using Total_precipitation_surface_1_Hour_Accumulation # other available variables: # time_bounds, lon, lat, time # No projection information found in nc file. # Coordinate variable units found to be degrees, # assuming WGS84 Lat/Lon. # stars object with 3 dimensions and 1 attribute # attribute(s): # Total_precipitation_surface_1_Hour_Accumulation [kg/m^2] # Min. : 0.000 # 1st Qu.: 0.000 # Median : 0.750 # Mean : 4.143 # 3rd Qu.: 4.630 # Max. :163.750 # dimension(s): # from to offset delta refsys point # x 1 87 NA NA WGS 84 NA # y 1 118 NA NA WGS 84 NA # time 1 23 2018-09-13 18:30:00 UTC 1 hours POSIXct NA # values x/y # x [87x118] -80.6113,...,-74.8822 [x] # y [87x118] 32.4413,...,37.6193 [y] # time NULL # curvilinear grid sf::read_sf(system.file("gpkg/nc.gpkg", package = "sf"), "nc.gpkg") %>% st_transform(st_crs(prec)) -> nc # transform from NAD27 to WGS84 nc_outline = st_union(st_geometry(nc)) # although coordinates are longitude/latitude, st_union assumes that they are planar plot_hook = function() plot(nc_outline, border = 'red', add = TRUE) prec %>% slice(index = 1:12, along = "time") %>% plot(downsample = c(5, 5, 1), hook = plot_hook)
and next, intersected with with the counties of North Carolina, where the maximum precipitation intensity was obtained per county, and plotted:
a = aggregate(prec, by = nc, FUN = max) # although coordinates are longitude/latitude, st_intersects assumes that they are planar # although coordinates are longitude/latitude, st_intersects assumes that they are planar plot(a, max.plot = 23, border = 'grey', lwd = .5)
We can integrate over (reduce) time, for instance to find out when the maximum precipitation occurred. The following code finds the time index, and then the corresponding time value:
index_max = function(x) ifelse(all(is.na(x)), NA, which.max(x)) st_apply(a, "geom", index_max) %>% mutate(when = st_get_dimension_values(a, "time")[.$index_max]) %>% select(when) %>% plot(key.pos = 1, main = "time of maximum precipitation")
gdalcubes can be used to create data cubes (or functions from
them) from image collections, sets of multi-band images with varying
and does this by resampling and/or aggregating over space and/or time. It reuses GDAL VRT’s and gdalwarp for spatial resampling and/or warping, and handles temporal resampling or aggregation itself.
ncdfgeom reads and writes vector data cubes from and to netcdf files
in a standards-compliant way.
raster is a powerful package for handling raster maps and
stacks of raster maps both in memory and on disk, but does not address
A list of
stars commands matching existing
raster commands is found
A list of translations in the opposite direction (from
raster) still needs to be made.
This project has been realized with financial support from the