References

In this notebook we show which types of references are supported by semantique. Remember that result instructions in query recipes are formulated by combining basic building blocks into processing chains. Each processing chain starts with a reference. At the recipe construction stage, such a reference is nothing more than a small piece of text. When the recipe is executed, the query processor resolves this reference and evaluates it internally into a multi-dimensional array filled with data values. Several actions can then be applied to this array; these are described in the Verbs notebook. The same building blocks can also be used to construct a set of mapping rules with semantique's native mapping configuration.

Prepare

Import packages:

import semantique as sq
import geopandas as gpd
import json

Referencing semantic concepts

The most common reference in a query recipe is a reference to a semantic concept, i.e. a conceptualization of something that exists in the real world. The application expert writing the recipe can refer to any semantic concept with the general concept() function, which should be given the index of the semantic concept in the mapping against which the query will be processed. The depth of this index depends on the structure of the ontology that the mapping formalizes. Usually, an ontology formalizes not only the semantic concepts themselves, but also a categorization of these concepts. That is, a reference to a specific semantic concept usually consists of the name of the category it belongs to and the name of the concept itself. There may also be multiple levels of categories. The concept() function accepts as many levels as needed, starting with the lowest-level category and ending with the name of the semantic concept itself.
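To make the idea of index depth concrete, here is a minimal sketch that assumes the mapping behaves like a nested Python dictionary keyed first by category and then by concept name. The toy `mapping` contents and the `resolve` helper are hypothetical illustrations, not semantique's internal implementation.

```python
# Hypothetical toy mapping: category -> concept name -> rule placeholder.
# Real semantique mappings store processing rules, not strings.
mapping = {
    "entity": {
        "water": "rules for water",
        "vegetation": "rules for vegetation",
    },
    "event": {
        "fire": "rules for fire",
    },
}


def resolve(reference, tree):
    """Walk a nested dict one key per reference level (hypothetical helper)."""
    node = tree
    for key in reference:
        node = node[key]  # raises KeyError if a level does not exist
    return node


# A two-level reference: category first, concept name last.
print(resolve(["entity", "vegetation"], mapping))  # rules for vegetation
```

Each extra category level in the ontology simply adds one more key to walk through.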

If no categorization exists, one can refer to a semantic concept solely by its name:

ref = sq.concept("vegetation")
print(json.dumps(ref, indent = 2))
{
  "type": "concept",
  "reference": [
    "vegetation"
  ]
}

With ontologies that formalize a categorization, the same function is used, with the category name(s) placed before the concept name, as shown below. The number of category levels in a reference is not limited; it depends entirely on the ontology.

ref = sq.concept("entity", "vegetation")
print(json.dumps(ref, indent = 2))
{
  "type": "concept",
  "reference": [
    "entity",
    "vegetation"
  ]
}

For convenience, commonly used semantic concept categories are also implemented as separate construction functions, such that they can be called directly:

  • entity(): A phenomenon with a distinct and independent existence, e.g. a forest or a lake. We also use this category for land-cover-like concepts such as vegetated areas (i.e. vegetation) and water bodies (i.e. water).

  • event(): A phenomenon that takes place, e.g. a fire or a flood.

Hence, the snippet below produces the same output as the snippet above.

ref = sq.entity("vegetation")
print(json.dumps(ref, indent = 2))
{
  "type": "concept",
  "reference": [
    "entity",
    "vegetation"
  ]
}
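The relation between the general concept() function and a shortcut such as entity() can be sketched in plain Python, assuming a shortcut simply prepends its category name to the reference. The functions below are hypothetical stand-ins that only mimic the JSON structure printed in this notebook; they are not semantique's own code.

```python
def concept(*reference, property=None):
    """Hypothetical stand-in: build a concept reference as a plain dict."""
    # The parameter name mirrors the "property" argument used in this notebook.
    obj = {"type": "concept", "reference": list(reference)}
    if property is not None:
        obj["property"] = property
    return obj


def entity(*reference, property=None):
    # Shortcut: prepend the "entity" category to the reference.
    return concept("entity", *reference, property=property)


def event(*reference, property=None):
    # Shortcut: prepend the "event" category to the reference.
    return concept("event", *reference, property=property)


# Both calls build identical references.
print(entity("vegetation") == concept("entity", "vegetation"))  # True
```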

Each semantic concept is defined by one or more named properties. For example, an entity such as a lake may be defined by its color (a blueish, water-like color) in combination with its topography (an approximately flat surface). To reference only a single property of a semantic concept, specify the “property” argument of the concept() function, which is also accepted by shortcuts such as entity():

ref = sq.entity("lake", property = "color")
print(json.dumps(ref, indent = 2))
{
  "type": "concept",
  "reference": [
    "entity",
    "lake"
  ],
  "property": "color"
}
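Conceptually, a property reference narrows the lookup down to one named property of the concept. A minimal sketch, assuming a hypothetical toy mapping in which each concept is a dict of its properties; the `resolve_concept` helper and the placeholder rule strings are illustrations only.

```python
# Hypothetical toy mapping: each concept is defined by named properties.
mapping = {
    "entity": {
        "lake": {
            "color": "blueish, water-like color",
            "topography": "approximately flat surface",
        },
    },
}


def resolve_concept(ref, tree):
    """Return all properties of a concept, or only the requested one."""
    node = tree
    for key in ref["reference"]:
        node = node[key]
    prop = ref.get("property")
    if prop is None:
        return node  # full concept: every property it is defined by
    return {prop: node[prop]}  # only the single referenced property


ref = {"type": "concept", "reference": ["entity", "lake"], "property": "color"}
print(resolve_concept(ref, mapping))  # {'color': 'blueish, water-like color'}
```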

Referencing data layers

When an EO expert constructs a mapping with semantique's native mapping configuration, they reference data layers in an EO data cube. They can do so with the general layer() function, which should be given the index of the data layer in the layout file of the EO data cube against which the query will be processed. The depth of this index depends on the structure of the layout. Usually, a layout not only lists the data layers themselves, but also formalizes a categorization of these layers. That is, a reference to a data layer usually consists of the name of the category it belongs to and the name of the layer itself. There may also be multiple levels of categories. The layer() function accepts as many levels as needed, starting with the lowest-level category and ending with the name of the data layer itself.

If no categorization exists, one can refer to a data layer solely by its name:

ref = sq.layer("s2_band02")
print(json.dumps(ref, indent = 2))
{
  "type": "layer",
  "reference": [
    "s2_band02"
  ]
}

With layouts that formalize a categorization, the same function is used, with the category name(s) placed before the layer name, as shown below. The number of category levels in a reference is not limited; it depends entirely on the layout.

ref = sq.layer("reflectance", "s2_band02")
print(json.dumps(ref, indent = 2))
{
  "type": "layer",
  "reference": [
    "reflectance",
    "s2_band02"
  ]
}

For convenience, commonly used data layer categories are also implemented as separate construction functions, such that they can be called directly:

  • appearance(): Data layers describing what the observed phenomenon looks like.

  • reflectance(): Data layers describing how intensely certain types of radiation are reflected by the observed phenomenon.

  • topography(): Data layers describing the surface form of the observed phenomenon.

  • atmosphere(): Data layers describing what the atmosphere above the observed phenomenon looks like.

  • artifacts(): Data layers that label erroneous observations.

Hence, the code below produces the same output as above.

ref = sq.reflectance("s2_band02")
print(json.dumps(ref, indent = 2))
{
  "type": "layer",
  "reference": [
    "reflectance",
    "s2_band02"
  ]
}

The way semantique parses the layout also enables easy auto-completion, for example in Jupyter notebooks:

with open("files/layout_gtiff.json", "r") as file:
    dc = sq.datacube.GeotiffArchive(json.load(file), src = "files/layers_gtiff.zip")
layout = dc.layout
ref = sq.layer(*layout["reflectance"]["s2_band02"]["reference"])
print(json.dumps(ref, indent = 2))
{
  "type": "layer",
  "reference": [
    "reflectance",
    "s2_band02"
  ]
}
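One way such a lookup can work is if every layer in the parsed layout stores its own index path under a "reference" key, as the layout["reflectance"]["s2_band02"]["reference"] lookup above suggests. The sketch below assumes a toy layout dictionary and a hypothetical annotate_references() helper; it is a simplification that treats any dict containing further dicts as a category, the real GeotiffArchive layout holds more metadata per layer, and semantique's parser may work differently.

```python
import json

# Hypothetical toy layout: category -> layer name -> layer metadata.
layout = {
    "reflectance": {
        "s2_band02": {"name": "B02", "dtype": "float32"},
        "s2_band03": {"name": "B03", "dtype": "float32"},
    },
    "appearance": {
        "colortype": {"name": "color", "dtype": "uint8"},
    },
}


def annotate_references(tree, path=()):
    """Store each layer's index path under a "reference" key (in place)."""
    for key, value in tree.items():
        if not isinstance(value, dict):
            continue
        if any(isinstance(v, dict) for v in value.values()):
            annotate_references(value, path + (key,))  # a category: recurse
        else:
            value["reference"] = list(path + (key,))  # a layer: record path


annotate_references(layout)
ref = {"type": "layer", "reference": layout["reflectance"]["s2_band02"]["reference"]}
print(json.dumps(ref))  # {"type": "layer", "reference": ["reflectance", "s2_band02"]}
```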

Although query recipes are primarily meant to stay within the semantic domain, without any image-domain terminology, it is also possible to reference data layers directly in a query recipe. This can be useful in more advanced use cases, for example when querying a cloud-free composite.

Referencing results

A query recipe can contain instructions for multiple results. Whenever instructions for a certain result get too long or complicated to be defined in one line, they can of course be broken up into several assignment operations. For example:

water_count = sq.entity("water").reduce("count", "time")
vegetation_count = sq.entity("vegetation").reduce("count", "time")
recipe = sq.QueryRecipe({"summed_count": water_count.evaluate("add", vegetation_count)})

After executing this recipe, the response will contain a single result, i.e. summed_count. However, one might also want to obtain the water count and vegetation count maps as separate results. For that, processing chains can start with a reference to a previously defined result, using the result() function. This makes it possible to reuse other result definitions inside a new result definition. For example:

ref = sq.result("water_count")
print(json.dumps(ref, indent = 2))
{
  "type": "result",
  "name": "water_count"
}

The summed_count result shown above can thus also be formulated as shown below. The only difference is that, next to summed_count, the response will now also contain water_count and vegetation_count as separate results.

recipe = sq.QueryRecipe()
recipe["water_count"] = sq.entity("water").reduce("count", "time")
recipe["vegetation_count"] = sq.entity("vegetation").reduce("count", "time")
recipe["summed_count"] = sq.result("water_count").evaluate("add", sq.result("vegetation_count"))
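A result reference works because a result defined earlier is already available by name when a later result is evaluated. A minimal sketch, assuming results are evaluated in the order they are defined, and using plain numbers and hypothetical Python callables instead of semantique's processing chains and arrays:

```python
# Hypothetical toy "recipe": each result is a function of already computed results.
# Plain numbers stand in for the water and vegetation count maps.
recipe = {
    "water_count": lambda results: 12,
    "vegetation_count": lambda results: 30,
    # Result references: look up earlier results by name.
    "summed_count": lambda results: results["water_count"] + results["vegetation_count"],
}


def execute(recipe):
    """Evaluate results in insertion order, storing each one by name."""
    results = {}
    for name, instruction in recipe.items():
        results[name] = instruction(results)
    return results


response = execute(recipe)
print(response)  # {'water_count': 12, 'vegetation_count': 30, 'summed_count': 42}
```

Because every intermediate result is stored, the response contains all three results, mirroring the behaviour described above.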

Referencing the active evaluation object

Before explaining this type of reference, we need to introduce some slightly more advanced processing chain structures. Remember that a processing chain starts with a reference, and that this reference is internally evaluated into an array when the query recipe is processed. Specific actions, called verbs, can then be applied to this array. This can be a single verb, but also a chain of multiple verbs. The array is continuously wrangled as it moves through the verbs of the processing chain. It starts as the evaluated reference. This object is the input to the first verb, which wrangles it into a different array. That array is then the input to the second verb, which again wrangles it into a different array, and so on. We use the term active evaluation object to refer to the input object at each stage of the processing chain. Hence, at the first verb, the active evaluation object is the evaluated reference. At the second verb, the active evaluation object is the output of the first verb, and so on.
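The hand-over between verbs can be pictured as a simple loop: the evaluated reference becomes the first active evaluation object, and each verb's output becomes the next one. A minimal sketch with plain Python lists standing in for arrays; the verb functions here are hypothetical and far simpler than semantique's verbs.

```python
# Hypothetical verbs: each takes the active evaluation object and returns a new one.
def threshold(obj):
    return [1 if v > 0.5 else 0 for v in obj]


def count(obj):
    return sum(obj)


def run_chain(evaluated_reference, verbs):
    """Pass the active evaluation object through each verb in turn."""
    active = evaluated_reference  # at the first verb: the evaluated reference
    for verb in verbs:
        active = verb(active)  # the output becomes the next active object
    return active


print(run_chain([0.2, 0.7, 0.9, 0.4], [threshold, count]))  # 2
```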

Some verbs combine information of multiple inputs. For example, the evaluate() verb lets you evaluate bivariate expressions in which the left-hand side of the expression is the active evaluation object, and the right-hand side of the expression is another array. This leads to nested processing chains. For example, adding the values of a vegetation count map to the values of a water count map:

sq.entity("water").reduce("count", "time").evaluate("add", sq.entity("vegetation").reduce("count", "time"))

But now what if we want to add the values of the water count map to themselves? Our recipe would look like this:

sq.entity("water").reduce("count", "time").evaluate("add", sq.entity("water").reduce("count", "time"))

Here the same processing chain that constructs the water count map occurs twice in the recipe. This not only makes the code longer and less readable, it also increases processing time, since resolving the water reference and reducing the resulting array has to be done twice instead of once. This is where it becomes useful to reference the active evaluation object itself. The self() function can be used for that, without any arguments.

ref = sq.self()
print(json.dumps(ref, indent = 2))
{
  "type": "self"
}

This allows us to simplify the recipe above (and improve processing speed!):

sq.entity("water").reduce("count", "time").evaluate("add", sq.self())

Of course, the example above is trivial: why add the active evaluation object to itself when you can simply multiply it by 2? However, there are cases where the self-reference is of real use, for example when applying a self-filter, when filtering based on dimension coordinates, or when grouping an array along one of its dimensions. These verbs are described in the Verbs notebook.
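Why the self-reference saves work can be sketched as follows: a self reference is not evaluated from scratch, it simply stands for the object that is already active. This is a minimal sketch of an evaluate-like step, assuming a hypothetical resolve_operand() helper and a SELF marker that plays the role of sq.self(); the call counter only demonstrates how often the expensive part runs.

```python
SELF = {"type": "self"}  # mirrors the JSON produced by sq.self()

calls = {"water_count": 0}


def water_count():
    """Stand-in for resolving the water reference and reducing it (costly)."""
    calls["water_count"] += 1
    return [3, 0, 5]


def resolve_operand(operand, active):
    # A self reference points at the active evaluation object itself;
    # anything else has to be evaluated separately.
    if operand == SELF:
        return active
    return operand()


def evaluate_add(active, operand):
    other = resolve_operand(operand, active)
    return [a + b for a, b in zip(active, other)]


doubled = evaluate_add(water_count(), SELF)
print(doubled, calls["water_count"])  # [6, 0, 10] 1
```

Passing water_count itself as the operand instead of SELF gives the same values but runs the costly step a second time.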

Referencing collections

Up to now we have only talked about references to single objects (e.g. a single semantic concept or a single data layer), that will result in a single array during query processing. It is also possible to reference a collection of multiple objects at once. Such collections have dedicated verbs that can be applied to them.

Consider for example a case where someone wants to sum five arrays together. This could be done by starting a processing chain with a single reference and calling the evaluate() verb four times to add the other arrays step by step. It is much easier to reference all five arrays together in a collection and call the merge() verb to sum them in a single step. Furthermore, a chained structure always implies some kind of hierarchy, in which you start with a main input object and add the other objects along the way. This does not suit every use case.
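The difference between the two approaches can be sketched with plain Python lists standing in for five arrays. The pairwise loop mimics four separate evaluate("add", ...) steps, while the single reduction over all elements mimics, conceptually, what merging a collection with a sum does; neither is semantique code.

```python
# Five toy "arrays" with the same shape.
arrays = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
    [1, 1, 1],
    [0, 2, 4],
]


def add(x, y):
    return [a + b for a, b in zip(x, y)]


# Chained style: start from a main input and add the others one by one.
chained = arrays[0]
for other in arrays[1:]:
    chained = add(chained, other)  # four separate "evaluate" steps

# Collection style: reduce all elements together in one step, no main input.
merged = [sum(values) for values in zip(*arrays)]

print(chained, merged)  # [13, 18, 23] [13, 18, 23]
```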

The collection() function can be used to reference collections. It can be provided with as many singular references as needed. For example:

ref = sq.collection(sq.entity("water"), sq.entity("vegetation"))
print(json.dumps(ref, indent = 2))
{
  "type": "collection",
  "elements": [
    {
      "type": "concept",
      "reference": [
        "entity",
        "water"
      ]
    },
    {
      "type": "concept",
      "reference": [
        "entity",
        "vegetation"
      ]
    }
  ]
}
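During processing, a collection reference can be thought of as a list whose elements are each evaluated like a single reference. A minimal sketch that walks the JSON structure printed above; the toy `evaluated` lookup table and the evaluate_reference() helper are hypothetical and stand in for the real query processor.

```python
# The collection reference as printed above.
collection_ref = {
    "type": "collection",
    "elements": [
        {"type": "concept", "reference": ["entity", "water"]},
        {"type": "concept", "reference": ["entity", "vegetation"]},
    ],
}

# Hypothetical toy results of evaluating each single concept reference.
evaluated = {
    ("entity", "water"): [1, 0, 0],
    ("entity", "vegetation"): [0, 1, 1],
}


def evaluate_reference(ref):
    """Evaluate a single concept reference or, recursively, a collection."""
    if ref["type"] == "collection":
        return [evaluate_reference(element) for element in ref["elements"]]
    return evaluated[tuple(ref["reference"])]


print(evaluate_reference(collection_ref))  # [[1, 0, 0], [0, 1, 1]]
```

The result is a list of arrays rather than a single array, which is why collections get their own dedicated verbs.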