Developer Implementation Details

This guide describes selected core mechanisms of the openEO package. It is aimed at interested developers, and it is highly recommended to read the source code alongside it. The explanations are abstracted from the code and are meant to introduce new developers to the concepts and routines of this package.

Process Graph Building

The ProcessCollection class is the toolbox for creating a process graph in openEO. In contrast to the S3 class ProcessList, which list_processes() creates from the metadata returned by the back-end, the ProcessCollection interprets the process metadata, e.g. the process name and the available parameters with their names and types, and derives builder functions such as p$load_collection() from this information. The builder functions in turn create ProcessNode objects from the chosen process and the values passed as arguments.

Note: the ProcessCollection may be reused in several places, so it had to be an R6 class. Otherwise the potentially large, list-based object would be copied multiple times, which could lead to memory issues.
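The difference between copied and shared objects can be illustrated with base R alone. The sketch below uses a plain environment as a stand-in, since R6 objects are built on environments; the names process_list and process_env are made up for this illustration and do not appear in the package.

```r
# A list is copied when it is modified inside a function.
add_entry_list <- function(x) {
  x$extra <- 1
  x
}

# An environment is passed by reference, so the caller sees the change.
add_entry_env <- function(e) {
  e$extra <- 1
  invisible(e)
}

process_list <- list(a = 1)
modified_list <- add_entry_list(process_list)
list_changed <- "extra" %in% names(process_list)

process_env <- new.env()
add_entry_env(process_env)
env_changed <- exists("extra", envir = process_env, inherits = FALSE)
```

Here list_changed is FALSE because the function worked on a copy, while env_changed is TRUE because both names refer to the same object.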

The classes related to the process graph, like ProcessNode and Process, are contained in process_graph_building.R. The argument- and parameter-related classes are in argument_types.R, and the ProcessCollection is located in predefined_processes.R.

`ProcessCollection`

The first important detail is that the R6 object is unlocked, which means that it can be changed at runtime. This is required because the builder functions are added dynamically during the initialization of the R6 object.

ProcessCollection = R6Class(
    "ProcessCollection",
    lock_objects = FALSE,
    ...)
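What locking means can be shown without R6. The following sketch uses plain environments and lockEnvironment() from base R to mimic a locked and an unlocked object; the function name load_collection is only a placeholder here and the example is not taken from the package code.

```r
# A locked environment rejects new bindings, similar to a locked R6 object.
locked <- new.env()
lockEnvironment(locked)
locked_result <- tryCatch({
  assign("load_collection", function() "node", envir = locked)
  "added"
}, error = function(e) "rejected")

# An unlocked environment accepts functions that are added at runtime.
unlocked <- new.env()
assign("load_collection", function() "node", envir = unlocked)
unlocked_result <- unlocked$load_collection()
```

In the locked case the assignment fails and locked_result is "rejected", while the unlocked environment accepts the new function.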

Now, during the initialization (ProcessCollection$initialize()) of the ProcessCollection, the ProcessList is translated into a list of Process objects (1) and based on that the builder functions are derived (2).

This operation is done within the private function private$createListOfProcesses(), where the main work is done by the utility function processFromJson().

In R, a function is composed of its formals and its body. The formals are obtained from Process$getFormals(), which retrieves the parameter names and default values from the metadata. For the function body, a ProcessNode is created from the respective Process via a deep copy. A deep copy creates a new object and copies all of its fields, including nested Argument objects; otherwise two instances of the same process would share their arguments. Once the builder function is invoked, this process node receives the values passed to the builder function as its arguments.

The builder functions are created in a for-loop that uses the variable index. For the functions to work properly, the variable has to be replaced by its actual value in the function body. Otherwise the correct process cannot be accessed later, because index is either unknown or holds a different value by the time the function is called.
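The following base R sketch shows one way to assemble functions from formals and a body inside a loop, and to replace the loop variable with its value. It is a simplified illustration, not the package implementation: the processes list is made-up metadata, plain lists stand in for ProcessNode objects, and bquote() is just one possible substitution technique.

```r
# Hypothetical, heavily simplified process metadata.
processes <- list(
  list(id = "load_collection", parameters = c("id", "spatial_extent")),
  list(id = "ndvi", parameters = c("data"))
)

builders <- list()
for (index in seq_along(processes)) {
  parameter_names <- processes[[index]]$parameters
  f <- function() NULL

  # Formals: one argument per parameter, each defaulting to NULL.
  formals(f) <- setNames(vector("list", length(parameter_names)),
                         parameter_names)

  # Body: bquote() replaces .(index) with the current value of index,
  # so the function no longer depends on the loop variable.
  body(f) <- bquote({
    process <- processes[[.(index)]]
    arguments <- mget(process$parameters)
    list(process_id = process$id, arguments = arguments)
  })

  builders[[processes[[index]]$id]] <- f
}

# After the loop, index is 2, yet each builder still finds its own process.
node <- builders$load_collection(id = "S2")
```

Without the substitution both builders would look up processes[[index]] when called and would therefore both resolve to the last process of the loop.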

`processFromJson` and `parameterFromJson`

processFromJson() creates a Process object from the JSON metadata. Strictly speaking, the JSON metadata has already been transformed into an R list at this point, but it is referred to as JSON metadata throughout because it always originates from the back-end response. The function itself does little more than feed the correct parts of the JSON metadata to the Process constructor. One of the constructor parameters is a list of Argument objects. In the conceptual view of the package, a parameter is the descriptive part, and an argument is essentially a parameter that can hold a value. parameterFromJson() translates the JSON parameter metadata into an Argument object. The translation is done by comparing the type and schema of the metadata with the implemented Argument representations. For this reason each implemented Argument is assigned its own schema and type upon creation.

URI = R6Class(
  "uri",
  inherit=Argument,
  public = list(
    initialize=function(name=character(),description=character(),required=FALSE) {
      private$name = name
      private$description = description
      private$required = required
      private$schema$type = "string"
      private$schema$subtype = "uri"
    }
  ), ...)

The matching of the parameter metadata is handled in findParameterGenerator(). Once a suitable Argument has been found, additional restrictive information is transferred from the metadata to the Argument, e.g. not-null constraints, patterns, enumerations and default values.

To complete this section: findParameterGenerator() creates a single instance of every registered Argument class and invokes Parameter$matchesSchema() on each instance with the given schema. If none matches, a generic Argument object with few constraints of its own is created. If more than one matches, the first one in the list is chosen; if exactly one matches, it is selected as the suitable Argument.
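The matching logic described above can be sketched in base R as follows. This is a minimal illustration under simplifying assumptions: the generators list, matches_schema() and find_generator() are hypothetical names, plain lists stand in for the Argument classes, and the comparison only looks at type and subtype.

```r
# Hypothetical registry: each entry knows the schema it represents.
generators <- list(
  uri     = list(type = "string", subtype = "uri"),
  string  = list(type = "string", subtype = NULL),
  integer = list(type = "integer", subtype = NULL)
)

# TRUE if type and subtype of the candidate equal those of the schema.
matches_schema <- function(generator, schema) {
  identical(generator$type, schema$type) &&
    identical(generator$subtype, schema$subtype)
}

find_generator <- function(schema) {
  hits <- Filter(function(g) matches_schema(g, schema), generators)
  if (length(hits) == 0) {
    # Nothing matched: fall back to a generic, mostly unconstrained argument.
    return(list(type = "any", subtype = NULL))
  }
  # One or more matches: take the first one in registration order.
  hits[[1]]
}

uri_match <- find_generator(list(type = "string", subtype = "uri"))
fallback  <- find_generator(list(type = "raster-cube"))
```

The order of the registry matters in this design, because the first match wins when several candidates fit.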

Inheritance

During the development of this package several functions were needed again and again, especially validate() and serialize() on the Argument objects. In general these functions work very similarly for all types, so R6 inheritance was used to unify this behavior. Each type implements private$typeCheck() and private$typeSerialization() according to its specific needs, and these are called by their public counterparts.
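The pattern of a shared public function that delegates to a type-specific check can be sketched as follows. To stay runnable without extra packages, the sketch uses closures instead of R6 classes; new_argument(), new_integer() and the return convention of validate() (NULL on success, the error message otherwise) are assumptions made for this illustration only.

```r
# Shared behaviour, written once for all argument types.
new_argument <- function(type_check, type_serialization) {
  value <- NULL
  list(
    set_value = function(x) value <<- x,
    # validate() wraps the type-specific check and reports its message.
    validate = function() {
      tryCatch({
        type_check(value)
        NULL
      }, error = function(e) conditionMessage(e))
    },
    serialize = function() type_serialization(value)
  )
}

# A concrete type only supplies the two type-specific pieces.
new_integer <- function() {
  new_argument(
    type_check = function(v) {
      if (!is.numeric(v) || v != as.integer(v)) stop("not an integer")
    },
    type_serialization = function(v) as.integer(v)
  )
}

arg <- new_integer()
arg$set_value(4)
valid_result <- arg$validate()
serialized <- arg$serialize()
```

Adding a new type then only requires a new pair of check and serialization functions, while validate() and serialize() stay untouched.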

Similar considerations were made between Process and ProcessNode. Essentially the node is a process, but carries a unique id that is used in a process graph.

Package environment variables

At some point it became tedious to pass the active OpenEOConnection to every function that interacts with the back-end. The currently active components of an openEO session are therefore stored in an internal package environment (openeo:::pkgEnvironment). This environment should not be accessed directly by the user; instead, active_connection(), active_data_collection() and active_process_collection() were implemented to access or set those environment variables.
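A combined getter and setter on top of an internal environment might look like the sketch below. The names demo_env and demo_active_connection() are hypothetical and chosen so that they cannot be confused with the real package functions, whose signatures may differ.

```r
# Stand-in for the internal package environment.
demo_env <- new.env()

# Called without an argument it returns the stored connection (or NULL),
# called with an argument it stores the connection.
demo_active_connection <- function(con = NULL) {
  if (is.null(con)) {
    return(get0("connection", envir = demo_env, inherits = FALSE))
  }
  assign("connection", con, envir = demo_env)
  invisible(con)
}

before <- demo_active_connection()
demo_active_connection("my-connection")
after <- demo_active_connection()
```

Functions that talk to the back-end can then default their connection argument to the stored value instead of requiring it on every call.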

Function coercion

Another interesting and somewhat complex aspect is the coercion of an R function into an openEO process graph. This is done by .function_to_graph() (in process_graph_building.R), which is called by the respective coerce function as.Graph.function(). The routine looks like this:

1. extract the function's formals, which are the variables to be used
2. create a variable with create_variable() for each parameter of the function
3. run do.call() with the function and the parameters (which are all of type ProcessGraphParameter)
4. the function evaluation returns a ProcessNode, which becomes the final node
5. create a graph from the final node
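The steps above can be sketched in base R. The sketch does not use the package classes: make_variable() and make_node() are hypothetical stand-ins for create_variable() and the process builder functions, and the resulting "graph" is just a list.

```r
# Hypothetical stand-ins for process graph parameters and process nodes.
make_variable <- function(name) {
  structure(list(name = name), class = "demo_parameter")
}
make_node <- function(process_id, ...) {
  structure(list(process_id = process_id, arguments = list(...)),
            class = "demo_node")
}

function_to_graph <- function(f) {
  # Steps 1 and 2: one variable per formal argument of the function.
  parameter_names <- names(formals(f))
  variables <- lapply(parameter_names, make_variable)
  names(variables) <- parameter_names

  # Steps 3 and 4: evaluating the function yields the final node.
  final_node <- do.call(f, variables)
  if (!inherits(final_node, "demo_node")) {
    stop("the function must return a process node")
  }

  # Step 5: the graph is described by its parameters and its final node.
  list(parameters = variables, final_node = final_node)
}

user_function <- function(x, y) {
  make_node("divide",
            x = make_node("subtract", x = x, y = y),
            y = make_node("add", x = x, y = y))
}
graph <- function_to_graph(user_function)
```

Because the function is evaluated with placeholder variables instead of data, the nested calls record the structure of the computation rather than computing a result.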

When a function is passed as a reducer or aggregation function, the procedure is basically the same. In this case, however, the ProcessGraphArgument already provides a set of process graph parameters, which are used instead of those from create_variable(). If the number of formals of the function does not match the number of parameters of the ProcessGraphArgument, the coercion fails.

HTML widgets

In some contexts objects are rendered as HTML documents. For example, in a Jupyter notebook, an R Markdown document or an R Notebook, the metadata objects of collections, processes and their graphs are rendered in HTML. The HTML rendering needs an internet connection, because JavaScript files and styles are loaded from a content delivery network. The openEO ecosystem already provides these components, because the openEO Web Editor uses them. They are distributed on npm as vue-components.

The visualization is controlled via the print functions (print-functions.R), which check whether the current session runs in an HTML environment and, if so, invoke the internal print_html() instead of printing to the console.
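A print method that switches between console and HTML output could be sketched as below. All names are hypothetical (is_html_environment(), render_html(), the class demo_collection), and the check relies on the assumption that knitr sets the option knitr.in.progress while rendering; the package's own detection may differ and would need further checks, for example for Jupyter.

```r
# Assumption: knitr sets this option to TRUE while a document is rendered.
is_html_environment <- function() {
  isTRUE(getOption("knitr.in.progress"))
}

# Placeholder for the real HTML rendering.
render_html <- function(x) {
  paste0("<pre>", x$id, "</pre>")
}

# S3 print method that picks the output form depending on the environment.
print.demo_collection <- function(x, ...) {
  if (is_html_environment()) {
    cat(render_html(x), "\n")
  } else {
    cat("Collection:", x$id, "\n")
  }
  invisible(x)
}

collection <- structure(list(id = "S2"), class = "demo_collection")
```

Keeping the decision inside the print method means that users call print() as usual and get the suitable representation.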

Authentication

The authentication has changed a lot over the years. Basic authentication was the initial mechanism, followed by various OpenID Connect mechanisms, which are all based on the OAuth 2.0 framework. For legacy reasons all of these approaches are kept and are available in authentication.R. The authentication classes again use inheritance, so that OpenEOConnection can make the same function calls on each of them. The main points are that an access_token has to be provided for authentication and that login() and logout() are available. Depending on the access token grants offered by the back-end's identity provider, different procedures have to be performed, some of which require user interaction. For example, OIDCAuthCodeFlow spawns a local web service and waits for a call from the local internet browser, based on a redirect that has to be registered at the authentication provider. Other flows like OIDCAuthDeviceCodeFlow poll a certain endpoint at the authentication provider with a device code until the user has entered the code and consented to sharing their personal data. The flows are implemented with the httr2 package, which is used to retrieve the access_token required for authorized services at the back-end.
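The polling idea behind the device code flow can be illustrated without any network access. In the package this work is delegated to httr2; the sketch below only mimics the loop with a mocked endpoint. The functions poll_for_token() and make_mock_endpoint() are hypothetical, and "authorization_pending" is the error code that the OAuth 2.0 device authorization grant uses while the user has not yet confirmed.

```r
# Poll until a token arrives, an unexpected error occurs or attempts run out.
poll_for_token <- function(poll_once, interval = 0, max_attempts = 10) {
  for (attempt in seq_len(max_attempts)) {
    response <- poll_once()
    if (!is.null(response$access_token)) {
      return(response$access_token)
    }
    if (!identical(response$error, "authorization_pending")) {
      stop("authentication failed: ", response$error)
    }
    Sys.sleep(interval)
  }
  stop("timed out waiting for user consent")
}

# Mocked token endpoint: answers "pending" a few times, then grants a token.
make_mock_endpoint <- function(pending_rounds) {
  calls <- 0
  function() {
    calls <<- calls + 1
    if (calls <= pending_rounds) {
      list(error = "authorization_pending")
    } else {
      list(access_token = "demo-token")
    }
  }
}

token <- poll_for_token(make_mock_endpoint(2))
```

A real client would wait for the interval announced by the authentication provider between two polls instead of the zero seconds used here.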

RStudio Connection Contract

For RStudio users an additional feature allows inspecting the available data sources of a connected back-end: RStudio's Connection Contract is used to populate the Connections Pane. The connection contract is implemented in .fill_rstudio_connection_observer() in client.R. After connecting, the contract's listObjects function is called, which lists all available data sets. When the view of a specific collection is expanded, the contract's listColumns function is invoked. It interacts with the back-end to describe the collection (describe_collection()), and the result is parsed into the table structure shown below.

+ <Collection>
  - <dimension>: <description>
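The two callbacks can be sketched as plain functions that return data frames with a name and a type column, which is the shape the Connections Pane works with. The sketch runs outside RStudio and does not contact a back-end: mock_describe_collection() is a hypothetical replacement for describe_collection(), and the collection and dimension values are invented.

```r
# Mocked collection description with three dimensions.
mock_describe_collection <- function(id) {
  list(
    id = id,
    dimensions = list(
      x = list(type = "spatial"),
      t = list(type = "temporal"),
      bands = list(type = "bands")
    )
  )
}

# Counterpart of listObjects: one row per available collection.
list_objects <- function(collection_ids) {
  data.frame(name = collection_ids, type = "Collection",
             stringsAsFactors = FALSE)
}

# Counterpart of listColumns: one row per dimension of the collection.
list_columns <- function(collection_id) {
  dims <- mock_describe_collection(collection_id)$dimensions
  data.frame(
    name = names(dims),
    type = vapply(dims, function(d) d$type, character(1)),
    stringsAsFactors = FALSE,
    row.names = NULL
  )
}

objects <- list_objects(c("S2", "S1"))
columns <- list_columns("S2")
```

In the real contract these functions are passed to the connection observer when the connection is opened, so RStudio can call them lazily as the user expands the tree.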