Skip to article frontmatterSkip to article content

ESA Project Results Repository: Generating STAC collections with zarr files

This notebook shows how to generate a valid STAC collection, which is a requirement to upload research outcomes to the ESA Project Results Repository (PRR). It focuses on generating metadata for a project with zarr data. The product has two zarr files, covering different regions, created using Sentinel 1 and Sentinel 2 data respectively.

Check the EarthCODE documentation, and PRR STAC introduction example for a more general introduction to STAC and the ESA PRR.

The code below demonstrates how to perform the necessary steps using real data from the ESA project Yield Prediction and Estimation from Earth Observation (YIPEEO). The focus of YIPEEO is to improve field-scale crop yield forecasts through the usage of high-resolution remote sensing data and cutting edge scientific methods.

🔗 Check the project website: Yield Prediction and Estimation from Earth Observation (YIPEEO) – Website

# import libraries
import xarray as xr
from pystac import Item, Collection
import pystac
from datetime import datetime
from shapely.geometry import box, mapping
from xstac import xarray_to_stac
import glob
import json
import shapely
import numpy as np
import geopandas as gpd
import pandas as pd

1. Generate the parent collection

The root STAC Collection provides a general description of all project outputs which will be stored on the PRR. The PRR STAC Collection template enforces some required fields that you need to provide in order to build its valid description. Most of these metadata fields should already be available and can be extracted from your data.

# create the parent collection
collectionid = "yipeeo-cropyields"


collection = Collection.from_dict(
    
{
  "type": "Collection",
  "id": collectionid,
  "stac_version": "1.1.0",
  "title": "Yield Prediction and Estimation features from Sentinel1 and Sentinel2 data",
  "description": "This dataset contains the processed Sentinel 1 and Sentinel 2 features used for yield rediction  in the Yield Prediction and Estimation from Earth Observation (YIPEEO) project. Sentinel-2 L2A collection is used to compute a set of features based on the provided bands as well as various vegetation indices. Sentinel-1 data for the years 2016-2023 was pre-processed by TUW RS on the Vienna Scientific Cluster using the software SNAP8 and software packages developed by the TUW RS group.",
  "extent": {
    "spatial": {
      "bbox": [
        [
          4.844270319251073,
          49.040729923617775,
          31.01967739451807,
          52.869947524440924
        ]
      ]
    },
    "temporal": {
      "interval": [
        [
          "2016-01-01T00:00:00Z",
          "2022-12-31T00:00:00Z"
        ]
      ]
    }
  },
  "license": "various",
  "links": []

}

)

collection # visualise the metadata of your collection 

2. Create STAC Items and STAC Assets from original dataset

The second step is to describe the different files as STAC Items and Assets. Take your time to decide how your data should be categorised to improve usability of the data, and ensure intuitive navigation through different items in the collections. There are multiple strategies for doing this and this tutorial demonstrate one of the possible ways of doing that. Examples of how other ESA projects are doing this are available in the EarthCODE documentation .

2.1 Add the Sentinel 1 features to a STAC Item

sentinel1_url = 'https://objectstore.eodc.eu:2222/68e13833a1624f43ba2cac01376a18af:ASP_ZARR/S1_out.zarr'
ds = xr.open_zarr(sentinel1_url)
ds
bbox = (
    float(ds.min_lon.min().values), 
    float(ds.min_lat.min().values), 
    float(ds.max_lon.max().values), 
    float(ds.max_lat.max().values)
)
geometry = json.loads(json.dumps(shapely.box(*bbox).__geo_interface__))

template = {
    "id": f"{collectionid}-sentinel1-features",
    "type": "Feature",
    "stac_version": "1.0.0",
    "properties": {
        "title": "Sentinel-1 Features",
        "description": 'Sentinel 1 features for crop yield prediction and estimatation from 2015 to 2022. The processing workflow consists of the following steps:\n1. Apply precise orbit data\n2. Border-noise removal\n3. Radiometric calibration\n4. Radiometric terrain-flattening\n5. Range-Doppler terrain correction\nFor steps 4. and 5. the 30 m Copernicus Digital Elevation Model (DEM) was used. To extract time series on field level from the pre-processed Sentinel-1 data, several further processing steps were performed to mitigate the impact of the viewing geometry and undesired objects within or near the fields. In a first step, an incidence angle normalization to 40\u00b0 was performed. Afterwards, all pixels below a standard deviation of 5dB within one year were filtered out as they are typically stemming from radar shadow pixels or are no crop pixels. Finally, the cross-ratio was calculated by subtracting VV and VH polarized backscatter. ',
        "start_datetime": pd.to_datetime(ds.time.min().values).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "end_datetime": pd.to_datetime(ds.time.max().values).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "license": "CC-BY-4.0",
        "platform": "sentinel-1",
        "instruments": ["c-sar"],
        "created": datetime.utcnow().isoformat() + "Z"
    },
    "geometry": geometry,
    "bbox": bbox,
    "assets": {
        "data": {
            "href": "f'/d/{collectionid}/S1_out.zarr",  # or local path
            "type": "application/vnd+zarr",
            "roles": ["data"],
            "title": "Zarr Store of Sentinel1 Field Stats"
        }
    }
}
# 3. Generate the STAC Item
sentinel1_item = xarray_to_stac(
    ds,
    template,
    temporal_dimension="time",
    x_dimension=False,
    y_dimension=False
)

sentinel1_item.validate()
sentinel1_item

2.2 Add the sentinel 2 features

sentinel2_url = 'https://objectstore.eodc.eu:2222/68e13833a1624f43ba2cac01376a18af:ASP_ZARR/S2_out.zarr'
ds = xr.open_dataset(sentinel2_url, engine='zarr')
ds
bbox = (
    float(ds.min_lon.min().values), 
    float(ds.min_lat.min().values), 
    float(ds.max_lon.max().values), 
    float(ds.max_lat.max().values)
)
geometry = json.loads(json.dumps(shapely.box(*bbox).__geo_interface__))

template = {
    "id": f"{collectionid}-sentinel2-features",
    "type": "Feature",
    "stac_version": "1.0.0",
    "properties": {
        "title": "Sentinel-2 Features",
        "description": 'Sentinel 2 features based on the provided bands as well as various vegetation indices. The Sentinel-2 L2A data cube is dynamically created by utilising the STAC API. The datacube is pre-filter with scenes of a cloud cover less than 80%. The following features are extracted per field and timestamp: Band Medians and Standard Deviations: B02, B03, B04, B05, B06, B07, B08, B8A, B11, B12; Vegetation indices based on median bands of NDVI, EVI, NDWI, NMDI. An outlier removal was added on a field scale level utilising the SCL band and outlier removal based on 2 x inter quartile range (IQR).',
        "start_datetime": pd.to_datetime(ds.time.min().values).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "end_datetime": pd.to_datetime(ds.time.max().values).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "license": "CC-BY-4.0",
        "platform": "sentinel-2",
        "instruments": ["msi"],
        "created": datetime.utcnow().isoformat() + "Z"
    },
    "geometry": geometry,
    "bbox": bbox,
    "assets": {
        "data": {
            "href": "f'/d/{collectionid}/S2_out.zarr",  # or local path
            "type": "application/vnd+zarr",
            "roles": ["data"],
            "title": "Zarr Store of Sentinel2 Field Stats"
        }
    }
}
# 3. Generate the STAC Item
sentinel2_item = xarray_to_stac(
    ds,
    template,
    temporal_dimension="time",
    x_dimension=False,
    y_dimension=False
)
sentinel2_item.validate()
sentinel2_item

3. Add the Items to the collection and Save the metadata as a self-contained collection

collection.add_items([sentinel1_item, sentinel2_item])
# save the full self-contained collection
collection.normalize_and_save(
    root_href='../../data/yippeo_collection/',
    catalog_type=pystac.CatalogType.SELF_CONTAINED
)
collection

Acknowledgments

We gratefully acknowledge the Yield Prediction and Estimation from Earth Observation (YIPEEO) team for providing access to the data used in this example, as well as support in creating it.