Explore this notebook interactively: Binder

Verbs#

In this notebook we will describe the different verbs that semantique offers. Remember that result instructions in query recipes can be formulated by combining basic building blocks into processing chains. These processing chains start with a reference. For a description of those, see the References notebook. At the query recipe construction stage, the reference is nothing more than a small piece of text. When executing the recipe, the query processor solves this reference and evaluates it internally into a multi-dimensional array filled with data values. Several actions can then be applied to this array. These actions are all labeled with an action word that should intuitively describe the operation they are performing. That is why we call them verbs. The same building blocks can also be used when constructing a set of mapping rules according to semantiques native mapping configuration.

Content#

Prepare#

Import packages:

[1]:
import semantique as sq
[2]:
import geopandas as gpd
import pandas as pd
import numpy as np
import xarray as xr
import json
import copy

Set the context of query execution:

[3]:
# Load a mapping.
with open("files/mapping.json", "r") as file:
    mapping = sq.mapping.Semantique(json.load(file))

# Represent an EO data cube.
with open("files/layout_gtiff.json", "r") as file:
    dc = sq.datacube.GeotiffArchive(json.load(file), src = "files/layers_gtiff.zip")

# Set the spatio-temporal extent.
space = sq.SpatialExtent(gpd.read_file("files/footprint.geojson"))
time = sq.TemporalExtent("2019-01-01", "2020-12-31")

# Collect the full context.
# Including additional configuration parameters.
context = {
    "datacube": dc,
    "mapping": mapping,
    "space": space,
    "time": time,
    "crs": 3035,
    "tz": "UTC",
    "spatial_resolution": [-1800, 1800]
}

Verbs for single arrays#

Most verbs in semantique are verbs that apply an action to a single array. The currently implemented verbs in this category are:

  • Evaluate: Evaluates an expression for each pixel in an array.

  • Extract: Extracts dimension coordinates as a new one-dimensional array.

  • Filter: Filters values from an array.

  • Assign: Assign a new value to each pixel in an array.

  • Reduce: Reduces the dimensionality of an array.

  • Shift: Shifts array values a given number of steps along a dimension.

  • Smooth: Smoothes array values by applying a moving window function.

  • Trim: Trims the dimensions of an array.

  • Delineate: Delineates spatio-temporal objects in a binary array.

  • Fill: Fills nodata values in an array.

  • Groupby: Splits an array into multiple groups.

Evaluate#

The evaluate verb evaluates an expression for each pixel in an array. These expressions can take many different forms, but each of them accepts the value of a specific pixel in the input array and applies some operator to it. The result of that operation is the new value of that particular pixel in the output array. That is, the output array always has the same shape as the input array. Below, we will show different forms of expressions, and the built-in operators that semantique offers for them. For advanced users, it is also possible to define their own custom operators, which is explained in the notebook on Internal query processing.

You can specify an operator function simply by its name:

sq.entity("water").evaluate("not")

To be autocomplete-friendly, you can also use built-in constants that refer to an operator function. These are stored in the operators module of semantique. Hence, the snippet below is equal to the one above:

sq.entity("water").evaluate(sq.operators.NOT)

When using the evaluate verb, it is important to be aware of the value type of the input array(s). For example, they may contain nominal, ordinal, binary or numerical data. Different operators may only support specific (combinations of) value types. For details, see here.

Univariate expressions#

The simplest form of expressions are the univariate expressions. For each pixel value \(x_{i}\) in input array \(X\), these expressions only consider \(x_{i}\), and apply an operator to it. That is, the expression is of the following form.

\[expression = operator(x_{i})\]

This means that the output array has the same shape as the input array, and that each pixel value in the output array is the result of the univariate expression evaluated on the value of the corresponding pixel in the input array.

evaluate_univariate

The built-in operators for this purpose include the numerical univariate operators, which are intended for usage on numerical arrays:

  • absolute: Computes the absolute value of \(x_{i}\).

  • floor: Returns the largest integer \(k\) such that \(k \leq x_{i}\).

  • ceiling: Returns the smallest integer \(k\) such that \(k \geq x_{i}\).

  • exponential: Computes the exponential value of \(x_{i}\), i.e. \(e^{x_{i}}\).

  • natural_logarithm: Computes the natural logarithm of \(x_{i}\).

  • square_root: Computes the square root of \(x_{i}\).

  • cube_root: Computes the cube root of \(x_{i}\).

Next to those there are the boolean univariate operators, which are intended for usage on binary arrays:

  • not: Returns \(1\) if \(x_{i} = 0\), and \(0\) if \(x_{i} \neq 0\).

There are also the trigonometric univariate operators, which are intended for usage on numerical arrays in which the numbers are angles, which may for example occur in data layers containing sun zenith angles. All these functions assume the angles are in radians, except the conversion function “to_radians”, which excepts the angles to be in degrees.

  • cosine: Computes the cosine of \(x_{i}\).

  • sine: Computes the sine of \(x_{i}\).

  • tangent: Computes the tangent of \(x_{i}\).

  • secant: Computes the secant of \(x_{i}\), which is equal to \(\frac{1}{cos(x_{i})}\)

  • cosecant: Computes the cosecant of \(x_{i}\), which is equal to \(\frac{1}{sin(x_{i})}\)

  • cotangent: Computes the cotangent of \(x_{i}\), which is equal to \(\frac{1}{tan(x_{i})}\)

  • to_degrees: Converts angles in radians to angles in degrees.

  • to_radians: Converts angles in degrees to angles in radians.

Finally, there are two operators that allow to locate nodata values in an array:

  • is_missing: Returns \(1\) if \(x_{i}\) is a missing observation, and \(0\) otherwise.

  • not_missing: Returns \(1\) if \(x_{i}\) is valid observation, and \(0\) otherwise.

For example, applying the not operator to the translated semantic concept water marks all non-water pixels as 1 and all water pixels as 0.

[4]:
recipe = sq.QueryRecipe()
[5]:
recipe["water"] = sq.entity("water")
recipe["not_water"] = sq.result("water").evaluate("not")
[6]:
out = recipe.execute(**context)
[7]:
out["water"]
[7]:
<xarray.DataArray 'water' (time: 3, y: 3, x: 3)>
array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[0., 1., 0.],
        [0., 1., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary
[8]:
out["not_water"]
[8]:
<xarray.DataArray 'not_water' (time: 3, y: 3, x: 3)>
array([[[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]],

       [[1., 0., 1.],
        [1., 0., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [0., 1., 1.],
        [0., 1., 1.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary

Bivariate expressions#

Bivariate expressions consist of a left-hand side value, a right-hand side value and an operator that in some way combines these values. For each pixel value \(x_{i}\) in input cube \(X\), the left-hand side value of the expression is \(x_{i}\), and the right-hand side value of the expression is another value \(y_{i}\).

\[expression = x_{i} \; operator \; y_{i}\]

The right-hand side value \(y_{i}\) can be a constant, meaning that the same right-hand side value is used for all pixels in \(X\).

evaluate_constant

The right-hand side value \(y_{i}\) can also be a pixel value from another array \(Y\). In that case, \(Y\) should have the same shape as \(X\) (or able to be aligned to that shape, see below), such that each pixel \(x_{i} \in X\) has a matching pixel \(y_{i} \in Y\), i.e. a pixel that has exactly the same coordinates for each dimension.

evaluate_multivariate

The built-in operators for bivariate expressions can be subdivided into different categories. The algebraic operators are intended for usage on numerical arrays and perform an operation of arithmetic:

  • add: Adds \(y_{i}\) to \(x_{i}\)..

  • subtract: Subtracts \(y_{i}\) from \(x_{i}\).

  • multiply: Multiplies \(x_{i}\) by \(y_{i}\).

  • divide: Divides \(x_{i}\) by \(y_{i}\).

  • power: Raises \(x_{i}\) to the \(y_{i}\)th power.

  • normalized_difference: Calculates \(\frac{x_{i} - y_{i}}{x_{i} + y_{i}}\).

[9]:
recipe = sq.QueryRecipe()
[10]:
recipe["count"] = sq.entity("vegetation").reduce("count", "time")
recipe["twice"] = sq.result("count").evaluate("multiply", 2)
[11]:
out = recipe.execute(**context)
[12]:
out["count"]
[12]:
<xarray.DataArray 'count' (y: 3, x: 3)>
array([[2., 0., 1.],
       [1., 0., 2.],
       [1., 1., 2.]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    value_type:  discrete
[13]:
out["twice"]
[13]:
<xarray.DataArray 'twice' (y: 3, x: 3)>
array([[4., 0., 2.],
       [2., 0., 4.],
       [2., 2., 4.]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    value_type:  discrete

The relational operators evaluate a condition. The result is always binary, i.e. true (1) when the condition holds and false (2) when it doesn’t. Some of the conditions test for equality, and hence can be used on any array whenever the value type of the left-hand values is the same as the value type of the right-hand values:

  • equal: Returns \(1\) if \(x_{i} = y_{i}\), and \(0\) otherwise.

  • not_equal: Returns \(1\) if \(x_{i} \neq y_{i}\), and \(0\) otherwise.

The other conditions imply a fixed order among the values, and hence should not be used on nominal arrays:

  • greater: Returns \(1\) if \(x_{i} > y_{i}\), and \(0\) otherwise.

  • less: Returns \(1\) if \(x_{i} < y_{i}\), and \(0\) otherwise.

  • greater_equal: Returns \(1\) if \(x_{i} \geq y_{i}\), and \(0\) otherwise.

  • less_equal:Returns \(1\) if \(x_{i} \leq y_{i}\), and \(0\) otherwise.

[14]:
recipe = sq.QueryRecipe()
[15]:
recipe["count"] = sq.entity("vegetation").reduce("count", "time")
recipe["high"] = sq.result("count").evaluate("greater_equal", 2)
[16]:
out = recipe.execute(**context)
[17]:
out["count"]
[17]:
<xarray.DataArray 'count' (y: 3, x: 3)>
array([[2., 0., 1.],
       [1., 0., 2.],
       [1., 1., 2.]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    value_type:  discrete
[18]:
out["high"]
[18]:
<xarray.DataArray 'high' (y: 3, x: 3)>
array([[1., 0., 0.],
       [0., 0., 1.],
       [0., 0., 1.]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    value_type:  binary

The boolean operators are intended to be used in expressions involving two binary values.

  • and: Returns \(1\) when both \(x_{i} \neq 0\) and \(y_{i} \neq 0\), and \(0\) otherwise.

  • or: Returns \(1\) when \(x_{i} \neq 0\), \(y_{i} \neq 0\), or both, and \(0\) otherwise.

  • exclusive_or: Returns \(1\) when either \(x_{i} \neq 0\) or \(y_{i} \neq 0\), but not both, and \(0\) otherwise.

[19]:
recipe = sq.QueryRecipe()
[20]:
recipe["water"] = sq.entity("water")
recipe["vegetation"] = sq.entity("vegetation")
recipe["both"] = sq.result("water").evaluate("or", sq.result("vegetation"))
[21]:
out = recipe.execute(**context)
[22]:
out["water"]
[22]:
<xarray.DataArray 'water' (time: 3, y: 3, x: 3)>
array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[0., 1., 0.],
        [0., 1., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary
[23]:
out["vegetation"]
[23]:
<xarray.DataArray 'vegetation' (time: 3, y: 3, x: 3)>
array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[1., 0., 1.],
        [1., 0., 1.],
        [1., 1., 1.]],

       [[1., 0., 0.],
        [0., 0., 1.],
        [0., 0., 1.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary
[24]:
out["both"]
[24]:
<xarray.DataArray 'both' (time: 3, y: 3, x: 3)>
array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]],

       [[1., 0., 0.],
        [1., 0., 1.],
        [1., 0., 1.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary

The membership operators are a special category of operators that test for set membership. That is, they test for each pixel value \(x_{i}\) in cube \(X\) if it is or is not member of a set of values \(Y\). \(Y\) may be another array, but in that case it will be treated as being a set. That is, for each pixel value \(x_{i} \in X\) it will be tested if it occurs anywhere in array \(Y\).

  • in: Returns \(1\) if \(x_{i} \in Y\), and \(0\) otherwise. Here, \(Y\) is a finite set of values that remains constant for each \(x_{i}\).

  • not_in: Returns \(1\) if \(x_{i} \notin Y\), and \(0\) otherwise. Here, \(Y\) is a finite set of values that remains constant for each \(x_{i}\).

To represent a set, you can use the set() function of semantique. Alternatively, you can use Pythons built-in set, list or tuple objects.

[25]:
recipe = sq.QueryRecipe()
[26]:
recipe["colors"] = sq.appearance("colortype")
recipe["water"] = sq.result("colors").evaluate("in", sq.set(21, 22, 23, 24))
[27]:
out = recipe.execute(**context)
[28]:
out["colors"]
[28]:
<xarray.DataArray 'colors' (time: 3, y: 3, x: 3)>
array([[[29., 25., 29.],
        [29., 25., 29.],
        [29., 29., 29.]],

       [[ 4., 21.,  1.],
        [ 4., 21.,  1.],
        [ 4.,  3.,  4.]],

       [[ 3., 28.,  7.],
        [21., 28.,  3.],
        [23., 28.,  4.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     ordinal
    value_labels:   {1: 'SVHNIR', 2: 'SVLNIR', 3: 'AVHNIR', 4: 'AVLNIR', 5: '...
[29]:
out["water"]
[29]:
<xarray.DataArray 'water' (time: 3, y: 3, x: 3)>
array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[0., 1., 0.],
        [0., 1., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary

To make your lives easier, semantique also includes the interval() function, allowing you to specify a set of values as an interval between a lower bound and an upper bound. The interval is assumed to be closed, meaning that both the lower and upper bounds are included. Do note that intervals are only supported for numerical or ordinal data.

[30]:
recipe["water"] = sq.result("colors").evaluate("in", sq.interval(21, 24))
[31]:
out = recipe.execute(**context)
[32]:
out["water"]
[32]:
<xarray.DataArray 'water' (time: 3, y: 3, x: 3)>
array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[0., 1., 0.],
        [0., 1., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary

When the values in \(X\) have labels (i.e. when category indices are mapped to category labels), you can also use the labels as set members, instead of the indices. For that, use the label() function:

[33]:
labels = [sq.label(x) for x in ["DPWASH", "SLWASH", "TWASH", "SASLWA"]]
recipe["water"] = sq.result("colors").evaluate("in", labels)
[34]:
out = recipe.execute(**context)
[35]:
out["water"]
[35]:
<xarray.DataArray 'water' (time: 3, y: 3, x: 3)>
array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[0., 1., 0.],
        [0., 1., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary

The temporal relational operators are relational operators specifically designed to deal with time instants and time intervals as operand values. Hence, it requires each pixel value \(x_{i}\) in input array \(X\) to be a time instant. The right-hand side of the expression can either be a single time instant, a time interval (i.e. a list of two time instants representing the start and end of an interval) or another array \(Y\) in which each pixel value \(y_{i}\) is a time instant. The latter will in practice be evaluated as being a time interval with the earliest time instant being the start of the interval, and the latest time instant the end of the interval.

The currently implemented temporal relational operators are:

  • after: When \(y\) is a time instant: returns \(1\) if \(x_{i} > y\), and \(0\) otherwise. When \(y\) is a time interval: returns \(1\) if \(x_{i} > max(y)\), and \(0\) otherwise.

  • before: When \(y\) is a time instant: returns \(1\) if \(x_{i} < y\), and \(0\) otherwise. When \(y\) is a time interval: returns \(1\) if \(x_{i} < min(y)\), and \(0\) otherwise.

  • during: Returns \(1\) if \(min(y) \leq x_{i} \leq max(y)\), and \(0\) otherwise. Only intended for time intervals as \(y\).

To construct time instants and time intervals for usage as right-hand operand, you can use the time_instant() and time_interval() functions that semantique provides. The first expects a single datetime value, while the latter expects two datetime values (i.e. the start and end of the interval, with the interval being closed at both sides). You can provide datetimes in formats as “2020-12-31” or “2020/12/31”, but also complete ISO8601 timestamps such as “2020-12-31T14:37:22”. As long as the Timestamp initializer of the pandas package can understand it, it is supported by semantique. Any additional keyword arguments will be forwarded to this initializer.

[36]:
recipe = sq.QueryRecipe()
[37]:
recipe["times"] = sq.entity("water").extract("time")
recipe["early"] = sq.result("times").evaluate("before", sq.time_instant("2019-12-31"))
recipe["late"] = sq.result("times").evaluate("during", sq.time_interval("2020-01-01", "2020-12-31"))
[38]:
out = recipe.execute(**context)
[39]:
out["early"]
[39]:
<xarray.DataArray 'early' (time: 3)>
array([1., 0., 0.])
Coordinates:
    spatial_ref   int64 0
  * time          (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-1...
    temporal_ref  int64 0
Attributes:
    value_type:  binary
[40]:
out["late"]
[40]:
<xarray.DataArray 'late' (time: 3)>
array([0., 1., 1.])
Coordinates:
    spatial_ref   int64 0
  * time          (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-1...
    temporal_ref  int64 0
Attributes:
    value_type:  binary

The spatial relational operators are relational operators specifically designed to deal with geometries as operand values. It requires each pixel value \(x_{i}\) in input array \(X\) to be a tuple of spatial (x,y) coordinates. The right-hand side of the expression can either be one or more spatial geometries, or another array \(Y\) in which each pixel value \(y_{i}\) is a coordinate tuple. The latter will in practice be evaluated as being a geometry covering the spatial bounding box of the array.

The currently implemented spatial relational operators are:

  • intersects: Returns \(1\) if the spatial point with the coordinates of \(x_{i}\) spatially intersects with geometry \(y\), and \(0\) otherwise.

To construct a spatial geometry for usage as right-hand operand, you can use the geometry() function that semantique offers. It expects an object that can be read by the GeoDataFrame initializer of the geopandas package. Any additional keyword arguments will be forwarded to this initializer. In practice, this means you can read any GDAL-supported file format with geopandas.read_file(), and then use that object to create spatial geometries.

[41]:
recipe = sq.QueryRecipe()
[42]:
parcels = gpd.read_file("files/parcels.geojson")
recipe["coords"] = sq.entity("water").extract("space")
recipe["in_parcel"] = sq.result("coords").evaluate("intersects", sq.geometry(parcels))
[43]:
out = recipe.execute(**context)
/home/luuk/miniconda3/envs/semantique/lib/python3.11/site-packages/numpy/core/numeric.py:407: RuntimeWarning: invalid value encountered in cast
  multiarray.copyto(res, fill_value, casting='unsafe')
[44]:
out["in_parcel"]
[44]:
<xarray.DataArray 'in_parcel' (y: 3, x: 3)>
array([[0, 1, 0],
       [0, 0, 0],
       [0, 0, 0]])
Coordinates:
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
    spatial_ref    int64 0
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    value_type:  binary

Aligning two arrays#

In bivariate expressions involving two arrays, the second array \(Y\) does not necessarily have to be of the same shape as input array \(X\). Instead, it should be possible to align it to that shape. This can be done in two ways.

First consider the case where \(Y\) has the same dimensions as \(X\), but not all coordinate values of \(X\) are present in \(Y\). In that case, we can align \(Y\) with \(X\) such that pixel values at position \(i\) in both arrays, i.e. \(x_{i}\) and \(y_{i}\) respectively, belong to pixels with the same coordinates. If \(y_{i}\) was not originally part of \(Y\), we assign it a nodata value. This also works vice-versa, with coordinate values in \(Y\) that are not present in \(X\).

evaluate_align_same_dims

Secondly, consider a case where \(Y\) has one or more dimensions with exactly the same coordinate values as \(X\), but does not have all the dimensions that \(X\) has. In that case, we can align \(Y\) with \(X\) by duplicating its values along those dimensions that are missing. This does not work vice versa. When cube \(Y\) has more dimensions that cube \(X\), there is no clear way to define how to subset the values in \(Y\) to match the shape of \(X\).

evaluate_align_missing_dims

Alignment is something you have to be aware of to understand how bivariate expressions are evaluated. However, it is not something you have to do for yourself. Internally, the query processor takes care of it.

Extract#

The extract verbs extracts the coordinates of a specified dimension from an array, and returns them as a new, one-dimensional array.

extract

Dimensions can be referred to by their name. For example:

sq.entity("water").extract("time")

Semantique also contains built-in constants storing the reserved names for the spatio-temporal dimensions. These include the time dimensions, the spatial x and y dimensions, as well as the space dimension as a whole, which is a multi-indexed dimension containing both the y and x coordinates. These constants are stored in the dimensions module can be used to refer to the spatio-temporal dimensions without knowing their exact name. Hence, the snippet below is equivalent to the one above:

sq.entity("water").extract(sq.dimensions.TIME)
[45]:
recipe = sq.QueryRecipe()
[46]:
recipe["time"] = sq.entity("water").extract("time")
recipe["x"] = sq.entity("water").extract("x")
[47]:
out = recipe.execute(**context)
[48]:
out["time"]
[48]:
<xarray.DataArray 'time' (time: 3)>
array(['2019-12-15T10:17:33.408715000', '2020-09-05T10:17:43.167942000',
       '2020-12-19T10:17:34.610661000'], dtype='datetime64[ns]')
Coordinates:
    spatial_ref   int64 0
  * time          (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-1...
    temporal_ref  int64 0
Attributes:
    value_type:  datetime
[49]:
out["x"]
[49]:
<xarray.DataArray 'x' (x: 3)>
array([4531500., 4533300., 4535100.])
Coordinates:
  * x             (x) float64 4.532e+06 4.533e+06 4.535e+06
    spatial_ref   int64 0
    temporal_ref  int64 0
Attributes:
    axis:           X
    long_name:      x coordinate of projection
    standard_name:  projection_x_coordinate
    units:          metre
    resolution:     1800
    value_type:     continuous

When extracting the spatial dimension as a whole, both the x and y dimensions are returned, with coordinate tuples as pixel values.

[50]:
recipe["space"] = sq.entity("water").extract("space")
[51]:
out = recipe.execute(**context)
[52]:
out["space"]
[52]:
<xarray.DataArray 'space' (y: 3, x: 3)>
array([[(2695500.0, 4531500.0), (2695500.0, 4533300.0),
        (2695500.0, 4535100.0)],
       [(2693700.0, 4531500.0), (2693700.0, 4533300.0),
        (2693700.0, 4535100.0)],
       [(2691900.0, 4531500.0), (2691900.0, 4533300.0),
        (2691900.0, 4535100.0)]], dtype=object)
Coordinates:
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
    spatial_ref    int64 0
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    value_type:  coords

Extracting a component of dimension coordinates#

Coordinate values of some dimensions may consist of multiple components. For example, the time coordinates are made up of a year, a month, a day, an hour, etc, while the multi-indexed space coordinates are made up of an x and a y coordinate. If you are only interested in a single component of a dimension, you can specify that through the second, optional argument of the extract verb. A component can be referred to directly by its name. For example:

sq.entity("water").extract("time", "year")

To be autocomplete-friendly, you can also use built-in constants that refer to a dimension component. These are stored in the components module of semantique. The snippet below is equal to the snippet above:

sq.entity("water").extract(sq.dimensions.TIME, sq.components.time.YEAR)
[53]:
recipe = sq.QueryRecipe()
[54]:
recipe["years"] = sq.entity("water").extract("time", "year")
[55]:
out = recipe.execute(**context)
[56]:
out["years"]
[56]:
<xarray.DataArray 'years' (time: 3)>
array([2019, 2020, 2020])
Coordinates:
    spatial_ref   int64 0
  * time          (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-1...
    temporal_ref  int64 0
Attributes:
    value_type:  discrete

The spatial dimension has one additional component that may be of use: for each tuple of (x,y)-coordinates, it stores the index of the spatial feature to which that coordinate pair belongs. This is especially helpful when the spatial extent consists of multiple disconnected features (which in our example is not the case, we just have a single polygon as spatial extent).

[57]:
recipe = sq.QueryRecipe()
[58]:
recipe["feats"] = sq.entity("water").extract("space", "feature")
[59]:
out = recipe.execute(**context)
[60]:
out["feats"]
[60]:
<xarray.DataArray 'feats' (y: 3, x: 3)>
array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    value_type:    nominal
    value_labels:  {1: 'feature_1'}

Filter#

The filter verb filters values from an array. That is, the output array is a subset of the input array. Which values in the input array are kept, and which are removed, is defined by a second, binary array which we call the filterer. The filterer should have the same shape as the input array (or able to be aligned to that shape, see below), such that each pixel \(x_{i}\) in input cube \(X\) has a matching pixel \(y_{i}\) in filterer \(Y\), i.e. a pixel that has exactly the same coordinates for each dimension. Then:

  • \(x_{i}\) is kept if \(y_{i} \neq 0\).

  • \(x_{i}\) is removed if \(y_{i} = 0\).

Here, kept means that the pixel value remains unchanged, while removed means that the pixel gets a nodata value assigned.

filter

For example, we may filter only those pixels in the translated semantic concept water that where not covered by clouds.

[61]:
recipe = sq.QueryRecipe()
[62]:
recipe["water"] = sq.entity("water")
recipe["cloud"] = sq.entity("cloud")
recipe["not_cloud"] = sq.result("cloud").evaluate("not")
recipe["filtered"] = sq.result("water").filter(sq.result("not_cloud"))
[63]:
out = recipe.execute(**context)
[64]:
out["water"]
[64]:
<xarray.DataArray 'water' (time: 3, y: 3, x: 3)>
array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[0., 1., 0.],
        [0., 1., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary
[65]:
out["not_cloud"]
[65]:
<xarray.DataArray 'not_cloud' (time: 3, y: 3, x: 3)>
array([[[1., 0., 1.],
        [1., 0., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary
[66]:
out["filtered"]
[66]:
<xarray.DataArray 'filtered' (time: 3, y: 3, x: 3)>
array([[[ 0., nan,  0.],
        [ 0., nan,  0.],
        [ 0.,  0.,  0.]],

       [[ 0.,  1.,  0.],
        [ 0.,  1.,  0.],
        [ 0.,  0.,  0.]],

       [[ 0.,  0.,  0.],
        [ 1.,  0.,  0.],
        [ 1.,  0.,  0.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary

We could also filter a vegetation count map keeping only those pixels with a value above 1. We first evaluate that condition using the evaluate() verb, and use that output as the filterer.

[67]:
recipe = sq.QueryRecipe()
[68]:
recipe["count"] = sq.entity("vegetation").reduce("count", "time")
recipe["high"] = sq.result("count").filter(sq.self().evaluate("greater", 1))
[69]:
out = recipe.execute(**context)
[70]:
out["count"]
[70]:
<xarray.DataArray 'count' (y: 3, x: 3)>
array([[2., 0., 1.],
       [1., 0., 2.],
       [1., 1., 2.]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    value_type:  discrete
[71]:
out["high"]
[71]:
<xarray.DataArray 'high' (y: 3, x: 3)>
array([[ 2., nan, nan],
       [nan, nan,  2.],
       [nan, nan,  2.]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    value_type:  discrete

Filtering by dimension coordinates#

The filterer does not have to be of the same shape as the input array, as long as we can align it to that shape. See this section for more details on how this works. In practice, it means that we can also filter values in an array based on the coordinates of a dimension. All we have to do is to construct a filterer for that dimension. Hence, a binary, one-dimensional array that specifies for each of the coordinates if values of pixels having that coordinate should be kept (i.e. 1) or removed (i.e. 0).

filter_dim

For example, when we only want to keep pixel values observed in 2020:

[72]:
recipe = sq.QueryRecipe()
[73]:
recipe["2020"] = sq.entity("water").filter(sq.self().extract("time", "year").evaluate("equal", 2020))
[74]:
out = recipe.execute(**context)
[75]:
out["2020"]
[75]:
<xarray.DataArray '2020' (time: 3, y: 3, x: 3)>
array([[[nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan]],

       [[ 0.,  1.,  0.],
        [ 0.,  1.,  0.],
        [ 0.,  0.,  0.]],

       [[ 0.,  0.,  0.],
        [ 1.,  0.,  0.],
        [ 1.,  0.,  0.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary

You can also use a handy shortcut for the above formulation: the filter_time verb. This verb allows you to apply a temporal filter without having to explicitly extract the time coordinates from the array and evaluating a comparison expression on them.

The filter_time verb is only a “shortcut” verb, not an independent verb on its own. This means that when calling the filter_time verb, it is internally translated into a textual query recipe containing the self reference and the extract and evaluate verbs instead. In the same way, you can also use the shortcut verb filter_space for spatial filters.

[76]:
recipe["2020"] = sq.entity("water").filter_time("year", "equal", 2020)
[77]:
out = recipe.execute(**context)
[78]:
out["2020"]
[78]:
<xarray.DataArray '2020' (time: 3, y: 3, x: 3)>
array([[[nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan]],

       [[ 0.,  1.,  0.],
        [ 0.,  1.,  0.],
        [ 0.,  0.,  0.]],

       [[ 0.,  0.,  0.],
        [ 1.,  0.,  0.],
        [ 1.,  0.,  0.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary

Self-filtering#

A special type of a filtering operation is self-filtering, i.e. filtering an array by itself. In this case, the input array should be binary. In the output, the “true” values will be preserved, while the “false” values are removed.

filter_self

[79]:
recipe = sq.QueryRecipe()
[80]:
recipe["vegetation"] = sq.entity("vegetation")
recipe["true_vegetation"] = sq.result("vegetation").filter(sq.self())
[81]:
out = recipe.execute(**context)
[82]:
out["vegetation"]
[82]:
<xarray.DataArray 'vegetation' (time: 3, y: 3, x: 3)>
array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[1., 0., 1.],
        [1., 0., 1.],
        [1., 1., 1.]],

       [[1., 0., 0.],
        [0., 0., 1.],
        [0., 0., 1.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary
[83]:
out["true_vegetation"]
[83]:
<xarray.DataArray 'true_vegetation' (time: 3, y: 3, x: 3)>
array([[[nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan]],

       [[ 1., nan,  1.],
        [ 1., nan,  1.],
        [ 1.,  1.,  1.]],

       [[ 1., nan, nan],
        [nan, nan,  1.],
        [nan, nan,  1.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary

Assign#

The assign verb assigns each pixel in an array a new value that is not a function of the original value. The new value can be a constant, meaning that the same right-hand side value is used for all pixels in the input array. The new value can also be a pixel value from another array \(Y\). In that case, \(Y\) should have the same shape as input array \(X\) (or able to be aligned to that shape, see here), such that each pixel \(x_{i} \in X\) has a matching pixel \(y_{i} \in Y\), i.e. a pixel that has exactly the same coordinates for each dimension. Missing observations (i.e. pixels with a nodata value) in the input are preserved: they never get assigned a new value, unless specifically stated through the at argument (see below).

assign

A trivial example:

[84]:
recipe = sq.QueryRecipe()
[85]:
recipe["vegetation"] = sq.entity("vegetation")
recipe["foo"] = sq.result("vegetation").assign(-99)
[86]:
out = recipe.execute(**context)
[87]:
out["vegetation"]
[87]:
<xarray.DataArray 'vegetation' (time: 3, y: 3, x: 3)>
array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[1., 0., 1.],
        [1., 0., 1.],
        [1., 1., 1.]],

       [[1., 0., 0.],
        [0., 0., 1.],
        [0., 0., 1.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary
[88]:
out["foo"]
[88]:
<xarray.DataArray 'foo' (time: 3, y: 3, x: 3)>
array([[[-99., -99., -99.],
        [-99., -99., -99.],
        [-99., -99., -99.]],

       [[-99., -99., -99.],
        [-99., -99., -99.],
        [-99., -99., -99.]],

       [[-99., -99., -99.],
        [-99., -99., -99.],
        [-99., -99., -99.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     continuous

Optionally, you can assign a new value to only a subset of the pixels in the input array. In that case, the assigned values should always be of the same value type as the input array! The subset of pixels can be specified by providing a binary array as parameter at. Note that this will also fill nodata pixels if they correspond to a true value in the provided binary array.

[89]:
recipe = sq.QueryRecipe()
[90]:
recipe["count"] = sq.entity("vegetation").reduce("count", "time")
recipe["foo"] = sq.result("count").assign(0, at = sq.self().evaluate("less", 2))
[91]:
out = recipe.execute(**context)
[92]:
out["count"]
[92]:
<xarray.DataArray 'count' (y: 3, x: 3)>
array([[2., 0., 1.],
       [1., 0., 2.],
       [1., 1., 2.]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    value_type:  discrete
[93]:
out["foo"]
[93]:
<xarray.DataArray 'foo' (y: 3, x: 3)>
array([[2., 0., 0.],
       [0., 0., 2.],
       [0., 0., 2.]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    value_type:  discrete

Assigning dimension coordinates#

When the values to be assigned are taken from another array, this array does not have to be of the same shape as the input array, as long as we can align it to that shape. See this section for more details on how this works. In practice, it means that we can also assign dimension coordinates as new values (or optionally, a specific component of dimension coordinates).

assign_dim

For example, for each observation, we want to store the month in which the observation was made:

[94]:
recipe = sq.QueryRecipe()
[95]:
recipe["vegetation"] = sq.entity("vegetation")
recipe["months"] = sq.result("vegetation").assign(sq.self().extract("time", "month"))
[96]:
out = recipe.execute(**context)
[97]:
out["vegetation"]
[97]:
<xarray.DataArray 'vegetation' (time: 3, y: 3, x: 3)>
array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[1., 0., 1.],
        [1., 0., 1.],
        [1., 1., 1.]],

       [[1., 0., 0.],
        [0., 0., 1.],
        [0., 0., 1.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary
[98]:
out["months"]
[98]:
<xarray.DataArray 'months' (time: 3, y: 3, x: 3)>
array([[[12., 12., 12.],
        [12., 12., 12.],
        [12., 12., 12.]],

       [[ 9.,  9.,  9.],
        [ 9.,  9.,  9.],
        [ 9.,  9.,  9.]],

       [[12., 12., 12.],
        [12., 12., 12.],
        [12., 12., 12.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     ordinal
    value_labels:   {1: 'January', 2: 'February', 3: 'March', 4: 'April', 5: ...

You can also use a handy shortcut for the above formulation: the assign_time verb. This verb allows you to assign temporal coordinates without having to explicitly extract them from the array.

This verb is only a “shortcut” verb, not an independent verb on its own. This means that when calling it, it is internally translated into a textual query recipe containing the self reference and the extract verb. In the same way, you can also use the shortcut verb assign_space to assign (components of) spatial coordinates.

[99]:
recipe["months"] = sq.result("vegetation").assign_time("month")
[100]:
out = recipe.execute(**context)
[101]:
out["months"]
[101]:
<xarray.DataArray 'months' (time: 3, y: 3, x: 3)>
array([[[12., 12., 12.],
        [12., 12., 12.],
        [12., 12., 12.]],

       [[ 9.,  9.,  9.],
        [ 9.,  9.,  9.],
        [ 9.,  9.,  9.]],

       [[12., 12., 12.],
        [12., 12., 12.],
        [12., 12., 12.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     ordinal
    value_labels:   {1: 'January', 2: 'February', 3: 'March', 4: 'April', 5: ...

Assigning to nodata values#

Using the at argument of the assign verb in combination with the is_missing reducer, the assign verb can also be used to assign fixed values to pixels with missing data (see the fill verb for a more sophisticated way to fill missing data based on interpolation). A trivial example:

[102]:
recipe = sq.QueryRecipe()
[103]:
recipe["foo"] = sq.entity("vegetation").filter(sq.self()).reduce("count", "time")
recipe["bar"] = sq.result("foo").assign(-999, sq.self().evaluate("is_missing"))
[104]:
out = recipe.execute(**context)
[105]:
out["foo"]
[105]:
<xarray.DataArray 'foo' (y: 3, x: 3)>
array([[ 2., nan,  1.],
       [ 1., nan,  2.],
       [ 1.,  1.,  2.]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    value_type:  discrete
[106]:
out["bar"]
[106]:
<xarray.DataArray 'bar' (y: 3, x: 3)>
array([[   2., -999.,    1.],
       [   1., -999.,    2.],
       [   1.,    1.,    2.]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    value_type:  discrete

Reduce#

The reduce verb applies a reducer function along a dimension and subsequently drops the reduced dimension. That is, the output array always has one dimension less than the input array. Hence, the reduce verb reduces the dimensionality of an array.

To reduce a dimension, the reducer function operates on each slice of values along the axis of the dimension. Such a slice contains one value for each coordinate label of the dimension to reduce over, while the coordinate labels of all other dimensions are constant within each slice. The reducer function always returns a single value, such that each slice gets reduced from \(n\) values to one value.

For example: an array with a spatial and a temporal dimension contains for each location in space \(n\) values, where \(n\) is the number of timestamps in the temporal dimension. When we reduce the temporal dimension of this array, the reducer function reduces these \(n\) values for each location in space to one value. The resulting array has a single value per location in space, and no temporal dimension anymore.

reduce

The are many different types of reducers available in semantique. For advanced users, it is also possible to define their own custom reducers, which is explained in the notebook on Internal query processing.

You can specify a reducer function simply by its name:

sq.entity("water").reducer("mean", "time")

To be autocomplete-friendly, you can also use built-in constants that refer to a reducer function. These are stored in the reducers module of semantique. In the same way, you can use the built-in constants stored in the dimensions module to refer to one of the spatio-temporal dimensions, without having to know its exact name. The snippet below is equal to the snippet above:

sq.entity("water").reduce(sq.reducers.MEAN, sq.dimensions.TIME)

The built-in reducer functions of semantique currently are:

  • mean: Returns the average value of each slice \(S\).

  • median: Returns the median value of each slice \(S\).

  • mode: Returns the most occuring value in each slice \(S\).

  • max: Returns the largest value in each slice \(S\).

  • min: Returns the smallest value in each slice \(S\).

  • range: Returns the difference between the largest and smallest value in each slice \(S\).

  • n: Returns the number of observations in each slice \(S\).

  • product: Returns the product of the values in each slice \(S\).

  • sum: Returns the sum of the values in each slice \(S\).

  • standard_deviation: Returns the standard deviation of the values in each slice \(S\).

  • variance: Returns the variance of the values in each slice \(S\).

  • all: For each slice \(S\), returns \(1\) if all \(x_{i} \in S \neq 0\), and \(0\) otherwise.

  • any: For each slice \(S\), returns \(1\) if any \(x_{i} \in S \neq 0\), and \(0\) otherwise.

  • none: For each slice \(S\), returns \(1\) if all \(x_{i} \in S \eq 0\), and \(0\) otherwise.

  • count: Counts the number of non-zero values in each slice \(S\).

  • percentage: Calculates the percentage of non-zero values in each slice \(S\).

  • first: Returns the first value of each slice \(S\).

  • last: Returns the last value of each slice \(S\).

It is important to mention that nodata values are ignored by the reducer functions! That is, for example, when a slice has the values [1, 1, nan, 1] the all reducer will return “true” and the percentage reducer will return 100. A reducer will only return a nodata value when all values in a slice are nodata values.

Also, when using the reduce verb, it is important to be aware of the value type of the input array(s). For example, they may contain nominal, ordinal, binary or numerical data. Different reducers may only support specific value types. For details, see here.

Having said that, lets show some examples. The reduce verb takes as first argument the reducer function to be applied, and as second argument the name of the dimension to be reduced over.

[107]:
recipe = sq.QueryRecipe()
[108]:
recipe["vegetation"] = sq.entity("vegetation")
recipe["map"] = sq.result("vegetation").reduce("count", "time")
recipe["series"] = sq.result("vegetation").reduce("count", "space")
[109]:
out = recipe.execute(**context)
[110]:
out["vegetation"]
[110]:
<xarray.DataArray 'vegetation' (time: 3, y: 3, x: 3)>
array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[1., 0., 1.],
        [1., 0., 1.],
        [1., 1., 1.]],

       [[1., 0., 0.],
        [0., 0., 1.],
        [0., 0., 1.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary
[111]:
out["map"]
[111]:
<xarray.DataArray 'map' (y: 3, x: 3)>
array([[2., 0., 1.],
       [1., 0., 2.],
       [1., 1., 2.]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    value_type:  discrete
[112]:
out["series"]
[112]:
<xarray.DataArray 'series' (time: 3)>
array([0., 7., 3.])
Coordinates:
    spatial_ref   int64 0
  * time          (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-1...
    temporal_ref  int64 0
Attributes:
    value_type:  discrete

The dimension is an optional argument to the reduce verb. If no dimension is specified, the reducer will reduce the whole array at once to a single value. That is, the output is a dimensionless array.

[113]:
recipe["stat"] = sq.result("vegetation").reduce("count")
[114]:
out = recipe.execute(**context)
[115]:
out["stat"]
[115]:
<xarray.DataArray 'stat' ()>
array(10.)
Coordinates:
    spatial_ref   int64 0
    temporal_ref  int64 0
Attributes:
    value_type:  discrete

Shift#

The shift verb shifts values in an array with a given offset along a given dimension. This allows to compare pixel values with their lagging or leading values along a dimension, e.g. using a comparison operator with the evaluate() verb.

shift

The offset can be specified by an integer, defining how many steps each pixel value should be shifted. A negative offset implies a shift to the left, while a positive offset implies a shift to the right. For the time dimension, this means that an offset of -1 shifts all pixel values one step into the past, while an offset of 1 shifts all pixel values one step into the future. For the spatial dimensions, the direction of the shift depends on the direction given by the CRS. Normally, for the X dimension, an offset of -1 shifts all pixel values one step (i.e. one time the spatial resolution) to the west, while an offset of 1 shifts all pixel values one step to the east. For the Y dimension, a negative offset normally shifts to the north and a positive offset to the south. A shift along the multi-indexed spatial (x,y) dimension follows the pixel order given by the CRS, normally starting in the top-left corner of the spatial extent and moving down each column of distinct y-coordinates.

The dimension to shift along can be referred to by its name. For example:

sq.entity("water").shift("time", 1)

You can also use the built-in constants stored in the dimensions module to refer to one of the spatio-temporal dimensions, without having to know its exact name. The snippet below is equal to the snippet above:

sq.entity("water").shift(sq.dimensions.TIME, 1)

The pixels at the start of the dimension to be shifted along are filled with nodata values, since there are no values preceeding/succeeding them.

[116]:
recipe = sq.QueryRecipe()
[117]:
recipe["water"] = sq.entity("water")
recipe["shifted"] = sq.entity("water").shift("time", 1)
[118]:
out = recipe.execute(**context)
[119]:
out["water"]
[119]:
<xarray.DataArray 'water' (time: 3, y: 3, x: 3)>
array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[0., 1., 0.],
        [0., 1., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary
[120]:
out["shifted"]
[120]:
<xarray.DataArray 'shifted' (time: 3, y: 3, x: 3)>
array([[[nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan]],

       [[ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.]],

       [[ 0.,  1.,  0.],
        [ 0.,  1.,  0.],
        [ 0.,  0.,  0.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary

Smooth#

The smooth verbs smoothes values in array by applying a window function over the neighbourhood of each pixel (i.e. a moving window function). The window always moves along a given dimension. The size of the neighbourhood of each pixel can be given as an integer, which specifies how many neighbours at each side of the pixel should be considered. Hence, the pixel being smoothed will always be in the center of the window, with k pixels at its left and k pixels at its right. If the dimension to smooth over is the multi-indexed spatial (x,y) dimension, the size will be used for both the X and Y dimension, forming a square window with the smoothed pixel in the middle.

smooth

The moving window function reduces the values in the neighborhood of a pixel (including the smoothed pixel itself) to a single value, which will be the new value of that pixel. This means that the same functions can be used as for the reduce() verb. They can be referred to by their name. For example:

sq.entity("water").smooth("mean", "time", size = 1)

To be autocomplete-friendly, you can also use built-in constants that refer to a reducer function. These are stored in the reducers module of semantique. In the same way, you can use the built-in constants stored in the dimensions module to refer to one of the spatio-temporal dimensions, without having to know its exact name. The snippet below is equal to the snippet above:

sq.entity("water").smooth(sq.reducers.MEAN, sq.dimensions.TIME, size = 1)

By default, the smooth verb requires each window to contain at least two valid data values (i.e. nodata values not included). This can be tuned by setting the limit parameter, which defined the minimum number of valid data values each window should have. Pixels that do not have enough data values in their window are assigned a nodata value. This effects the pixels towards the edges of the array, as well as those surrounded by many nodata values.

[121]:
recipe = sq.QueryRecipe()
[122]:
recipe["vegetation"] = sq.entity("vegetation")
recipe["time_smoothed"] = sq.result("vegetation").smooth("sum", "time", size = 1)
recipe["space_smoothed"] = sq.result("vegetation").smooth("sum", "space", size = 1, limit = 9)
[123]:
out = recipe.execute(**context)
[124]:
out["vegetation"]
[124]:
<xarray.DataArray 'vegetation' (time: 3, y: 3, x: 3)>
array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[1., 0., 1.],
        [1., 0., 1.],
        [1., 1., 1.]],

       [[1., 0., 0.],
        [0., 0., 1.],
        [0., 0., 1.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary
[125]:
out["time_smoothed"]
[125]:
<xarray.DataArray 'time_smoothed' (time: 3, y: 3, x: 3)>
array([[[1., 0., 1.],
        [1., 0., 1.],
        [1., 1., 1.]],

       [[2., 0., 1.],
        [1., 0., 2.],
        [1., 1., 2.]],

       [[2., 0., 1.],
        [1., 0., 2.],
        [1., 1., 2.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary
[126]:
out["space_smoothed"]
[126]:
<xarray.DataArray 'space_smoothed' (time: 3, y: 3, x: 3)>
array([[[nan, nan, nan],
        [nan,  0., nan],
        [nan, nan, nan]],

       [[nan, nan, nan],
        [nan,  7., nan],
        [nan, nan, nan]],

       [[nan, nan, nan],
        [nan,  3., nan],
        [nan, nan, nan]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary

The smooth verb preserves nodata values of the input, and hence, does not smooth them. This behaviour can be changed by setting the parameter fill to True. If you want to smooth only nodata values, use the fill() verb instead.

Trim#

The trim verbs trims an array by removing those dimension coordinates for which all values are nodata. The spatial dimensions get special treatment, by trimming them only at their edges, and thus preserving the regularity of those dimensions.

trim

You can apply the trim verb for example after filtering, shifting or smoothing.

[127]:
recipe = sq.QueryRecipe()
[128]:
recipe["filtered"] = sq.entity("water").filter(sq.self())
recipe["trimmed"] = sq.result("filtered").trim()
[129]:
out = recipe.execute(**context)
[130]:
out["filtered"]
[130]:
<xarray.DataArray 'filtered' (time: 3, y: 3, x: 3)>
array([[[nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan]],

       [[nan,  1., nan],
        [nan,  1., nan],
        [nan, nan, nan]],

       [[nan, nan, nan],
        [ 1., nan, nan],
        [ 1., nan, nan]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary
[131]:
out["trimmed"]
[131]:
<xarray.DataArray 'trimmed' (time: 2, y: 3, x: 2)>
array([[[nan,  1.],
        [nan,  1.],
        [nan, nan]],

       [[nan, nan],
        [ 1., nan],
        [ 1., nan]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2020-09-05T10:17:43.167942 2020-12-1...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary

Optionally, you can trim a single dimension only. As with the other verbs, you can refer to any dimension by its name, or use the built-in constants for the spatio-temporal dimensions.

[132]:
recipe["time_trimmed"] = sq.result("filtered").trim("time")
[133]:
out = recipe.execute(**context)
[134]:
out["time_trimmed"]
[134]:
<xarray.DataArray 'time_trimmed' (time: 2, y: 3, x: 3)>
array([[[nan,  1., nan],
        [nan,  1., nan],
        [nan, nan, nan]],

       [[nan, nan, nan],
        [ 1., nan, nan],
        [ 1., nan, nan]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2020-09-05T10:17:43.167942 2020-12-1...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary

Delineate#

The delineate verb delineates sets of “true” pixels in a binary array that are connected in space-time. These sets are called spatio-temporal objects. In two-dimensional space (i.e. the spatial X and Y dimensions), pixels are considered to be connected when they are neighbours according to the queen criterion. In one-dimensional time, pixels are considered to be connected when they are directly preceeding/succeeding each other. Pixels belonging to the same object get the same integer index assigned, starting from 1. All pixels that do not belong to an object (i.e. those that are “false” in the input) get an index of 0. Nodata values in the input are always preserved.

delineate

[135]:
recipe = sq.QueryRecipe()
[136]:
recipe["water"] = sq.entity("water")
recipe["objects"] = sq.result("water").delineate()
[137]:
out = recipe.execute(**context)
[138]:
out["water"]
[138]:
<xarray.DataArray 'water' (time: 3, y: 3, x: 3)>
array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[0., 1., 0.],
        [0., 1., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary
[139]:
out["objects"]
[139]:
<xarray.DataArray 'objects' (time: 3, y: 3, x: 3)>
array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[0., 1., 0.],
        [0., 1., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [2., 0., 0.],
        [2., 0., 0.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    value_type:  ordinal

Fill#

The fill verbs fills nodata values in an array by interpolating the valid data values along a given dimension. There are several interpolation methods to choose from:

  • nearest: Finds the nearest pixel (based on the dimension coordinates) to the pixel with missing data and copies its value.

  • linear: Draws straight lines between the valid data points to determine the values of the missing data points.

  • cubic: Fits a cubic polynomial function to the valid data points to determine the values of the missing data points.

When interpolation along the stacked space dimensions, the two-dimensional versions of these interpolation methods are used, i.e. 2D nearest neighbour, bilinear and bicubic. See here for a figure that clearly explains the different methods. By default the one-dimensional interpolation methods also extrapolate to pixels with missing data at the edges, but this behaviour can be changed by setting extrapolate = False.

fill

A trivial example of one-dimensional interpolation using the nearest method:

[140]:
recipe = sq.QueryRecipe()
[141]:
recipe["foo"] = sq.entity("vegetation").filter_time("month", "equal", 12)
recipe["bar"] = sq.result("foo").fill("time", "nearest")
[142]:
out = recipe.execute(**context)
[143]:
out["foo"]
[143]:
<xarray.DataArray 'foo' (time: 3, y: 3, x: 3)>
array([[[ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.]],

       [[nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan]],

       [[ 1.,  0.,  0.],
        [ 0.,  0.,  1.],
        [ 0.,  0.,  1.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary
[144]:
out["bar"]
[144]:
<xarray.DataArray 'bar' (time: 3, y: 3, x: 3)>
array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[1., 0., 0.],
        [0., 0., 1.],
        [0., 0., 1.]],

       [[1., 0., 0.],
        [0., 0., 1.],
        [0., 0., 1.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary

And of two-dimensional interpolation using the linear method:

[145]:
recipe = sq.QueryRecipe()
[146]:
recipe["foo"] = sq.entity("vegetation").reduce("count", "time").filter(sq.self().evaluate("greater", 0))
recipe["bar"] = sq.result("foo").fill("space", "linear")
[147]:
out = recipe.execute(**context)
[148]:
out["foo"]
[148]:
<xarray.DataArray 'foo' (y: 3, x: 3)>
array([[ 2., nan,  1.],
       [ 1., nan,  2.],
       [ 1.,  1.,  2.]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    value_type:  discrete
[149]:
out["bar"]
[149]:
<xarray.DataArray 'bar' (y: 3, x: 3)>
array([[2. , 1.5, 1. ],
       [1. , 1.5, 2. ],
       [1. , 1. , 2. ]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
    spatial_ref    int64 0
Attributes:
    value_type:  discrete
    _FillValue:  nan

Another way to fill missing data value is by assigning fixed values using the assign verb.

Groupby#

The groupby verb splits an array into multiple smaller subsets, called groups. That is, the output array is a collection of multiple subsets of the input array. Which pixel ends up in which group is defined by a second array which we call the grouper. The grouper should have the same shape as the input array (or able to be aligned to that shape, see below), such that each pixel \(x_{i}\) in input cube \(X\) has a matching pixel \(y_{i}\) in grouper \(Y\), i.e. a pixel that has exactly the same coordinates for each dimension. Then, pixels in \(X\) that have an equal value for their corresponding pixel in \(Y\) are grouped together. Hence, given grouper \(Y\), pixels \(x_{i}\) and \(x_{j}\) in input array \(X\) are grouped together if and only if \(y_{i} = y_{j}\).

groupby

For example, we can group the translated semantic concept water such that each group comprises a set of of water pixels that are connected in space-time (i.e. we use the delineate() verb to create the grouper). The result of this operation is a collection of multiple arrays, i.e. one for each object. These array collections object have specific verbs to combine their elements into a single array again. For a description of those, see the next section.

[150]:
recipe = sq.QueryRecipe()
[151]:
recipe["water"] = sq.entity("water")
recipe["objects"] = sq.result("water").delineate()
recipe["groups"] = sq.result("water").groupby(sq.result("objects"))
[152]:
out = recipe.execute(**context)
[153]:
out["water"]
[153]:
<xarray.DataArray 'water' (time: 3, y: 3, x: 3)>
array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[0., 1., 0.],
        [0., 1., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary
[154]:
out["objects"]
[154]:
<xarray.DataArray 'objects' (time: 3, y: 3, x: 3)>
array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[0., 1., 0.],
        [0., 1., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [2., 0., 0.],
        [2., 0., 0.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    value_type:  ordinal
[155]:
out["groups"]
[155]:
[<xarray.DataArray 0.0 (time: 3, y: 3, x: 3)>
 array([[[ 0.,  0.,  0.],
         [ 0.,  0.,  0.],
         [ 0.,  0.,  0.]],

        [[ 0., nan,  0.],
         [ 0., nan,  0.],
         [ 0.,  0.,  0.]],

        [[ 0.,  0.,  0.],
         [nan,  0.,  0.],
         [nan,  0.,  0.]]])
 Coordinates:
   * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
   * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
   * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
     spatial_ref    int64 0
     temporal_ref   int64 0
     spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
 Attributes:
     AREA_OR_POINT:  Area
     scale_factor:   1.0
     add_offset:     0.0
     _FillValue:     1.7976931348623157e+308
     value_type:     binary,
 <xarray.DataArray 1.0 (time: 1, y: 2, x: 1)>
 array([[[1.],
         [1.]]])
 Coordinates:
   * time           (time) datetime64[ns] 2020-09-05T10:17:43.167942
   * y              (y) float64 2.696e+06 2.694e+06
   * x              (x) float64 4.533e+06
     spatial_ref    int64 0
     temporal_ref   int64 0
     spatial_feats  (y, x) float64 1.0 1.0
 Attributes:
     AREA_OR_POINT:  Area
     scale_factor:   1.0
     add_offset:     0.0
     _FillValue:     1.7976931348623157e+308
     value_type:     binary,
 <xarray.DataArray 2.0 (time: 1, y: 2, x: 1)>
 array([[[1.],
         [1.]]])
 Coordinates:
   * time           (time) datetime64[ns] 2020-12-19T10:17:34.610661
   * y              (y) float64 2.694e+06 2.692e+06
   * x              (x) float64 4.532e+06
     spatial_ref    int64 0
     temporal_ref   int64 0
     spatial_feats  (y, x) float64 1.0 1.0
 Attributes:
     AREA_OR_POINT:  Area
     scale_factor:   1.0
     add_offset:     0.0
     _FillValue:     1.7976931348623157e+308
     value_type:     binary]

Grouping by dimension coordinates#

The grouper does not have to be of the same shape as the input array, as long as we can align it to that shape. See this section for more details on how this works. In practice, it means that we can also group pixels based on the coordinates of a dimension (or optionally, a specific component of dimension coordinates). All we have to do is to construct a grouper for that dimension. Hence, a one-dimensional array containing the coordinates of that dimension as pixel values, such that pixels in the input array having the same coordinates for that dimension are grouped together.

groupby_dim

For example, when want to create groups of pixels based on the season in which they where observed:

[156]:
recipe = sq.QueryRecipe()
[157]:
recipe["seasons"] = sq.entity("water").groupby(sq.self().extract("time", "season"))
[158]:
out = recipe.execute(**context)
[159]:
out["seasons"][0]
[159]:
<xarray.DataArray 'September, October, November' (time: 1, y: 3, x: 3)>
array([[[0., 1., 0.],
        [0., 1., 0.],
        [0., 0., 0.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2020-09-05T10:17:43.167942
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary
[160]:
out["seasons"][1]
[160]:
<xarray.DataArray 'December, January, February' (time: 2, y: 3, x: 3)>
array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 2020-12-1...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary

You can also use a handy shortcut for the above formulation: the groupby_time verb. This verb allows you to group along the temporal dimension without having to explicitly extract the time coordinates from the array.

This is only a “shortcut” verb, not an independent verb on its own. This means that when calling it, it is internally translated into a textual query recipe containing the self reference and the extract verb. In the same way, you can also use the shortcut verb groupby_space for grouping directly along the spatial dimension.

[161]:
recipe["seasons"] = sq.entity("water").groupby_time("season")
[162]:
out = recipe.execute(**context)
[163]:
out["seasons"][0]
[163]:
<xarray.DataArray 'September, October, November' (time: 1, y: 3, x: 3)>
array([[[0., 1., 0.],
        [0., 1., 0.],
        [0., 0., 0.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2020-09-05T10:17:43.167942
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary
[164]:
out["seasons"][1]
[164]:
<xarray.DataArray 'December, January, February' (time: 2, y: 3, x: 3)>
array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 2020-12-1...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary

Multiple groupers#

It is also possible to provide a collection of groupers to the groupby verb, as long as their dimensions match. In that case, groups are formed as follows: given grouper \(Y\) and grouper \(Z\) with matching coordinates, pixels \(x_{i}\) and \(x_{j}\) in input array \(X\) are grouped together if and only if \(y_{i} = y_{j}\) and \(z_{i} = z_{j}\).

groupby_list

That means for example that we can group the input array along the time dimension in a way that two pixels observed in the same month but a different year end up in a different group.

[165]:
recipe = sq.QueryRecipe()
[166]:
multigrouper = sq.collection(sq.self().extract("time", "year"), sq.self().extract("time", "month"))
recipe["groups"] = sq.entity("water").groupby(multigrouper)
[167]:
out = recipe.execute(**context)
[168]:
out["groups"][0]
[168]:
<xarray.DataArray (2019, 'December') (y: 3, x: 3)>
array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
    time           datetime64[ns] 2019-12-15T10:17:33.408715
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary
[169]:
out["groups"][1]
[169]:
<xarray.DataArray (2020, 'September') (y: 3, x: 3)>
array([[0., 1., 0.],
       [0., 1., 0.],
       [0., 0., 0.]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
    time           datetime64[ns] 2020-09-05T10:17:43.167942
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary
[170]:
out["groups"][2]
[170]:
<xarray.DataArray (2020, 'December') (y: 3, x: 3)>
array([[0., 0., 0.],
       [1., 0., 0.],
       [1., 0., 0.]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
    time           datetime64[ns] 2020-12-19T10:17:34.610661
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary

A shorter formulation for the above statement would be:

[171]:
recipe["groups"] = sq.entity("water").groupby_time(["year", "month"])

Verbs for collections of arrays#

When constructing a query recipe, you can start a processing chain with a reference to a collection of multiple arrays. These array collections have specific verbs that all in some way combine the elements of the collections back into a single array. The currently implemented verbs in this category are:

  • Concatenate: Concatenates multiple arrays over a new or an existing dimension.

  • Compose: Creates a categorical composition of multiple binary arrays.

  • Merge: Merges values of corresponding pixels in multiple arrays into one by applying a reducer function.

It is important to mention that the verbs are intended for arrays that all have the same dimensions (but not necessarily the same coordinates)! They will also work on arrays that do not have the same dimensions, as long as they can all be aligned to each other. However, in these cases you should be aware of the pecularities of alignment, see here for details. To summarize: When the arrays in the collection have the same dimensions, but don’t share all of their coordinate labels, they get aligned to each other by filling the missing pixels in either of them with nodata values. When one of the arrays in the collection (e.g. \(C_{1}\)) is missing a dimension that is present in another array in the collection (e.g. \(C_{2}\)), they get aligned to each other by duplicating the values of \(C_{1}\) for each coordinate of the missing dimension. In any case, the coordinates of the output array are always the union of all coordinates from the input arrays. Only when the input arrays can in no way be aligned to each other, the verb will throw an error.

Concatenate#

The concatenate verb concatenates multiple arrays along a given dimension. There are two main ways in which you can do this: either you concatenate along a new dimension, or you concatenate along an existing dimension.

Concatenating along a new dimension#

Concatenating multiple arrays along a new dimension is a relatively simple process. Each of the input arrays becomes a dimension in the output array. Lets consider a collection with two two-dimensional arrays \(A\) and \(B\) that have matching coordinates along the dimensions \(\Gamma\) and \(\Delta\). Concatenating them along a new dimension \(E\) will result in a new three-dimensional array \(C\) with dimensions \(\Gamma\), \(\Delta\) and \(E\). A pixel with coordinates \((\gamma_{i}, \delta_{i})\) in array \(A\) becomes a pixel with coordinates \((\gamma_{i}, \delta_{i}, \epsilon = A)\) in array \(C\), while the pixel with the same coordinates \((\gamma_{i}, \delta_{i})\) in array \(B\) becomes a pixel with coordinates \((\gamma_{i}, \delta_{i}, \epsilon = B)\) in array \(C\).

concatenate_new

All you have to provide to the concatenate verb is the name of the new dimension. Note that this cannot be one of the names semantique has reserved for its spatio-temporal dimensions, see here. The coordinate labels of the new dimension will be the names of the input arrays.

[172]:
recipe = sq.QueryRecipe()
[173]:
recipe["concepts"] = sq.collection(sq.entity("water"), sq.entity("snow"), sq.entity("vegetation")).\
    concatenate("concept")
[174]:
out = recipe.execute(**context)
[175]:
out["concepts"]
[175]:
<xarray.DataArray 'concepts' (concept: 3, time: 3, y: 3, x: 3)>
array([[[[0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.]],

        [[0., 1., 0.],
         [0., 1., 0.],
         [0., 0., 0.]],

        [[0., 0., 0.],
         [1., 0., 0.],
         [1., 0., 0.]]],


       [[[1., 0., 1.],
         [1., 0., 1.],
         [1., 1., 1.]],

        [[0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.]],

        [[0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.]]],


       [[[0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.]],

        [[1., 0., 1.],
         [1., 0., 1.],
         [1., 1., 1.]],

        [[1., 0., 0.],
         [0., 0., 1.],
         [0., 0., 1.]]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
  * concept        (concept) object 'water' 'snow' 'vegetation'
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary

Concatenating over an existing dimension#

Concatenating over an existing dimension is mainly meant for cases where each of the input arrays has different coordinate labels for that dimension. For example, we have one array with a time dimension containing dates in 2019, and another one with a time dimension containing dates in 2020. Then, concatenating them over the time dimension gives us a single array with a time dimension containing both the dates from 2019 and 2020.

concatenate_existing

[176]:
recipe = sq.QueryRecipe()
[177]:
recipe["water"] = sq.entity("water").\
    groupby_time("year").\
    concatenate("time")
[178]:
out = recipe.execute(**context)
[179]:
out["water"]
[179]:
<xarray.DataArray 'water' (time: 3, y: 3, x: 3)>
array([[[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]],

       [[0., 1., 0.],
        [0., 1., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [1., 0., 0.],
        [1., 0., 0.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary

We can also concatenate arrays that share coordinate labels of the dimension to concatenate over. However, for these coordinates, only the values of the first array in the collection that contains that coordinate, will end up in the output array. For the others, these values will simply be dropped. Only if the value of the first array is a missing value, the value of the second array will be considered, et cetera.

Compose#

The compose verb is primarily meant for collections of binary arrays, i.e. arrays that only have “true” (i.e. 1) and “false” (i.e. 0) values. Then, a pixel in the output array gets a value of 1 when it was “true” in the first array of the collection, a value of 2 of it was “true” in the second array of the collection, a value of 3 if it was “true” in the third array of the collection, et cetera. Hence, with the compose verb you convert a set of binary arrays into one categorical array.

When a pixel is “true” in more than one array in the collection, it gets the index of that array that comes first in the collection. Hence, if a pixel is “true” in both the second and third array in a collection, it gets a value of 2 in the output array. When a pixel is not “true” for any of the arrays in the collection, it gets a nodata value in the output array.

compose

[180]:
recipe = sq.QueryRecipe()
[181]:
recipe["land_cover"] = sq.collection(sq.entity("water"), sq.entity("snow"), sq.entity("vegetation")).\
    compose()
[182]:
out = recipe.execute(**context)
[183]:
out["land_cover"]
[183]:
<xarray.DataArray 'land_cover' (time: 3, y: 3, x: 3)>
array([[[ 2., nan,  2.],
        [ 2., nan,  2.],
        [ 2.,  2.,  2.]],

       [[ 3.,  1.,  3.],
        [ 3.,  1.,  3.],
        [ 3.,  3.,  3.]],

       [[ 3., nan, nan],
        [ 1., nan,  3.],
        [ 1., nan,  3.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    value_type:    nominal
    value_labels:  {1: 'water', 2: 'snow', 3: 'vegetation'}

Merge#

The merge verb is actually a combination of two other verbs. First, it concatenates the arrays in the collection along a new dimension, and then it reduces the output of that over this new dimension. In practice, that means that the merge verb applies a reduction function to each set of pixels that have the same dimension coordinates but are stored in different arrays in the collection. For example, if we merge the translated semantic concepts water, snow and vegetation using the any reducer, we get an output cube that contains a “true” value (i.e. 1) for a pixel if the value of that pixel was “true” in at least one of the water, snow or vegetation arrays, and a “false” value (i.e. 0) if the value of that pixel was not “true” in any of those.

merge

The only argument you need to provide to the verb is the reducer function. See the reduce() verb for an overview of them.

[184]:
recipe = sq.QueryRecipe()
[185]:
recipe["any_concept"] = sq.collection(sq.entity("water"), sq.entity("snow"), sq.entity("vegetation")).\
    merge("any")
[186]:
out = recipe.execute(**context)
[187]:
out["any_concept"]
[187]:
<xarray.DataArray 'any_concept' (time: 3, y: 3, x: 3)>
array([[[1., 0., 1.],
        [1., 0., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]],

       [[1., 0., 0.],
        [1., 0., 1.],
        [1., 0., 1.]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
Attributes:
    value_type:  binary

Note that the process of merging a collection of two arrays usually can be modelled as well with the evaluate() verb. For example, the following lines produce identical results:

sq.collection(A, B).merge("any")
A.evaluate("or", B)

However, where the evaluate verb can “merge” one other array into a given input cube, the merge verb allows to combine an unrestricted number of arrays in one go.

Split-apply-combine structures#

All verbs for single arrays (except the groupby verb) can also be applied to array collections. In that case, they will simply be applied to each element of the collection seperately. Hence, the output will again be an array collection, with the same amount of members.

This allows to model well-know “split-apply-combine” processes, such as aggregation. You start with a single array, split it with the groupby() verb into a collection, apply one of the verbs for single arrays to each of its members, and then combine them back together using one of the dedicated verbs for array collections.

For example: we want to know the average water count over space for each year in our time dimension separately.

[188]:
recipe = sq.QueryRecipe()
[189]:
recipe["avg_count"] = sq.entity("water").\
    groupby_time("year").\
    reduce("count", "space").\
    reduce("mean", "time").\
    concatenate("year")
[190]:
out = recipe.execute(**context)
[191]:
out["avg_count"]
[191]:
<xarray.DataArray 'avg_count' (year: 2)>
array([0., 2.])
Coordinates:
    spatial_ref   int64 0
    temporal_ref  int64 0
  * year          (year) int64 2019 2020
Attributes:
    value_type:  continuous

Another example: when we have a spatial extent consisting of multiple distinct spatial features, we might want to know the water count at each timestamp for each feature separately.

[192]:
recipe = sq.QueryRecipe()
[193]:
recipe["count_per_feat"] = sq.entity("water").\
    groupby_space("feature").\
    reduce("count", "space").\
    concatenate("feat")
[194]:
parcels = gpd.read_file("files/parcels.geojson")
parcels.explore()
[194]:
Make this Notebook Trusted to load map: File -> Trust Notebook
[195]:
new_context = copy.deepcopy(context)
new_context["space"] = sq.SpatialExtent(parcels)
new_context["spatial_resolution"] = [-100, 100]

out = recipe.execute(**new_context)
[196]:
out["count_per_feat"]
[196]:
<xarray.DataArray 'count_per_feat' (feat: 2, time: 3)>
array([[  0.,  61.,  18.],
       [  0., 102.,  67.]])
Coordinates:
    spatial_ref   int64 0
  * time          (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-1...
    temporal_ref  int64 0
  * feat          (feat) object 'Northern' 'Southern'
Attributes:
    value_type:  discrete

Utility verbs#

An additional set of verbs are the utility verbs. These are verbs that do not affect the values of an array themselves, but rather update the attributes of the array. The currently implemented verbs in this category are:

  • Name: Give a (new) name to an array.

Name#

The name verbs gives a name to an array. In some cases this can be particularly useful. For example, when concatenating multiple arrays together along a new dimension, the names of these arrays will be used as coordinate labels of this new dimension.

[197]:
recipe = sq.QueryRecipe()
[198]:
water = sq.entity("water").name("W")
snow = sq.entity("snow").name("S")
vegetation = sq.entity("vegetation").name("V")

recipe["concepts"] = sq.collection(water, snow, vegetation).concatenate("concept")
[199]:
out = recipe.execute(**context)
[200]:
out["concepts"]
[200]:
<xarray.DataArray 'concepts' (concept: 3, time: 3, y: 3, x: 3)>
array([[[[0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.]],

        [[0., 1., 0.],
         [0., 1., 0.],
         [0., 0., 0.]],

        [[0., 0., 0.],
         [1., 0., 0.],
         [1., 0., 0.]]],


       [[[1., 0., 1.],
         [1., 0., 1.],
         [1., 1., 1.]],

        [[0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.]],

        [[0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.]]],


       [[[0., 0., 0.],
         [0., 0., 0.],
         [0., 0., 0.]],

        [[1., 0., 1.],
         [1., 0., 1.],
         [1., 1., 1.]],

        [[1., 0., 0.],
         [0., 0., 1.],
         [0., 0., 1.]]]])
Coordinates:
  * x              (x) float64 4.532e+06 4.533e+06 4.535e+06
  * y              (y) float64 2.696e+06 2.694e+06 2.692e+06
    spatial_ref    int64 0
  * time           (time) datetime64[ns] 2019-12-15T10:17:33.408715 ... 2020-...
    temporal_ref   int64 0
    spatial_feats  (y, x) float64 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
  * concept        (concept) object 'W' 'S' 'V'
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0
    _FillValue:     1.7976931348623157e+308
    value_type:     binary