GRASS application for real world big data

Chris Reudenbach

2024-10-28

Real world example

A typical example is the use of an existing project database in GRASS. GRASS organizes all data into an internal file structure known as a gisdbase folder, a mapset, and one or more locations within that mapset. All raster and vector data is stored within this structure and is organized by GRASS. So a typical task might be to work on datasets that are already stored in an existing GRASS structure.

Create GRASS project

Downloading census data

First, we need data from the real world. In this case, it’s the 2011 German microcensus data, which, with over 35 million points and several dozen characteristics, can definitely be called Big Data.

Download the data:

# (https://www.zensus2011.de/EN/Service/Contact/contact_node.html)
wget https://www.zensus2011.de/SharedDocs/Downloads/DE/Pressemitteilung/DemografischeGrunddaten/csv_Bevoelkerung_100m_Gitter.zip?__blob=publicationFile&v=3

The presentation of a real example has some interesting aspects:

We also need to download the metadata description file from the website mentioned above to get information about projection, data concepts, etc.

 # we need some additional packages
 require(link2GI)
 require(curl)

# first of all we create  a project folder structure 
  dirs = link2GI::createFolders(root_folder = paste0(tempdir(),"/link2GI_examples"), 
                                  folders =  c("run/"))

# set runtime directory
  setwd(dirs$path_run)

# get some typical authority generated data 
  url<-"https://www.zensus2011.de/SharedDocs/Downloads/DE/Pressemitteilung/DemografischeGrunddaten/csv_Bevoelkerung_100m_Gitter.zip;jsessionid=294313DDBB57914D6636DE373897A3F2.2_cid389?__blob=publicationFile&v=3"
 res <- curl::curl_download(url, paste0(path_run,"testdata.zip"))

# unzip it
 unzip(res,files = grep(".csv", unzip(res,list = TRUE)$Name,value = TRUE),
       junkpaths = TRUE, overwrite = TRUE)
fn <- list.files(pattern = "[.]csv$", path = getwd(), full.names = TRUE)

Preprocessing the data

After downloading the data, we will use it for some demonstration stuff. If you have a look, the data is nothing but x,y,z with some projection information assumed.

# fast read with data.table 
 xyz <- data.table::fread(paste0(path_run,"/Zensus_Bevoelkerung_100m-Gitter.csv"))

 head(xyz)

We can easily rasterize this data because it is intentionally gridded data. that means we have a value in at a grid size of 100 by 100 meters.

 require(RColorBrewer)
 require(terra)
 require(mapview)


# clean dataframe
 xyz <- xyz[,-1]

# rasterize it according to the projection 
  r <- terra::rast(xyz, type="xyz")
 terra::crs(r) <- 3035

# map it
 p <- colorRampPalette(brewer.pal(8, "Reds"))
 # aet resolution to 1 sqkm
 mapview::mapviewOptions(mapview.maxpixels = r@ncols*r@nrows/10)
 mapview::mapview(r, col.regions = p, 
                  at = c(-1,10,25,50,100,500,1000,2500), 
                  legend = TRUE)

Setting up a GRASS project

Nothing new so far. Now we create a new but permanent GRASS gisbase using the spatial parameters from the raster object. As you know, the linkGRASS function performs a full search for one or more existing GRASS installations. If a valid GRASS installation exists, all parameters are set and the rgrass package is linked.

Due to the fact that gisdbase_exist is set to FALSE by default, it will create a new structure according to the R object.

require(link2GI)
# initialize GRASS and set up a permanent structure  
link2GI::linkGRASS(x = r, 
                    gisdbase = paste0(tempdir(),"/link2GI_examples"),
                    location = "microzensus2011")   

Finally, we can import the data into the GRASS gisdbase using the functionality of the rgass package.

First we need to convert the raster/terra object into a GeoTIFF file. Any GDAL format is possible, but GeoTIFF is very common and stable.

require(link2GI)
require(raster)
require(rgrass)

# write it to geotiff
  terra::writeRaster(r, paste0(path_run,"/Zensus_Bevoelkerung_100m-Gitter.tif"), 
       x               overwrite = TRUE)

# import raster to GRASS
rgrass::execGRASS('r.external',
                   flags=c('o',"overwrite","quiet"),
                   input=paste0(path_run,"/Zensus_Bevoelkerung_100m-Gitter.tif"),
                   output="Zensus_Bevoelkerung_100m_Gitter",
                   band=1)

# check imported data set
rgrass::execGRASS('r.info',
                   map = "Zensus_Bevoelkerung_100m_Gitter") 

Let’s do now the same import as a vector data set. First we create a sf object.

 xyz_sf = st_as_sf(xyz,
                    coords = c("x_mp_100m", "y_mp_100m"),
                    crs = 3035,
                    agr = "constant")

#map points
 sf::plot_sf(xyz_sf)

The GRASS gisdbase already exists. So we pass linkGRASS the argument gisdbase_exist=TRUE and import the xyz data as generic GRASS vector points. Note that this will take some time.

 require(sf)
 require(sp)
 require(link2GI)
 
  sf2gvec(x =  xyz_sf,
           obj_name = "Zensus_Bevoelkerung_100m_",
           gisdbase = paste0(tempdir(),"/link2GI_examples"),
           location = "microzensus2011",
           gisdbase_exist = TRUE
          
           )
 
# check imported data set
rgrass::execGRASS('v.info', map = "Zensus_Bevoelkerung_100m_")