
Example from SRAL Processing over Land Ice Dataset

This is an example notebook for creating the STAC Items uploaded to ESA Project Results Repository and made available at: https://eoresults.esa.int/browser/#/external/eoresults.esa.int/stac/collections/sentinel3-ampli-ice-sheet-elevation

The dataset is also discoverable via the Open Science Catalogue, which provides access to the collection created in this tutorial and stored in the ESA Project Results Repository (PRR): https://opensciencedata.esa.int/products/sentinel3-ampli-ice-sheet-elevation/collection

It focuses on generating metadata for a project with hundreds of items, each of which has hundreds of NetCDF assets.

Check the EarthCODE documentation and the PRR STAC introduction example for a more general introduction to STAC and the ESA PRR.

The code below demonstrates how to perform the necessary steps using real data from the ESA project **SRAL Processing over Land Ice**, which focuses on improving Sentinel-3 altimetry performance over land ice.

🔗 Check the User Handbook

🔗 Check the scientific publication

Steps described in this notebook

This notebook presents the workflow for generating a PRR Collection for the entire dataset produced by the project. To create valid STAC Items and a Collection, follow the steps described below:

  1. Generate a root STAC Collection
  2. Group your dataset files into STAC Items and STAC Assets
  3. Add the Items to the collection
  4. Save the normalised collection

Due to the complexity of the project and the time it takes to process the data, the STAC Items are generated first and stored locally. They are added to the collection afterwards. Furthermore, since we are working with thousands of files, we are using the links from the PRR directly. When the notebook was originally created, all the files were available locally.

This notebook can be used as an example for following scenario(s):

  1. Creating the STAC Items from the files stored locally
  2. Creating the STAC Items from files stored in an S3 bucket or another cloud repository
  3. Creating the STAC Items from files already ingested into PRR

Of course, if your files are stored locally or in a different S3 bucket, access to them (the root_url and item paths) should be adapted to your dataset location.
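For instance, a minimal sketch of how the access paths differ between the scenarios above (all paths shown are hypothetical placeholders, not real asset locations):

```python
# Sketch: adapting file access per scenario. The paths below are
# hypothetical placeholders - replace them with your dataset location.

def build_hrefs(root_url, relative_paths):
    """Join a root URL (or local directory) with relative asset paths."""
    return [root_url.rstrip('/') + '/' + p.lstrip('/') for p in relative_paths]

# Scenario 3: files already ingested into the PRR
prr_hrefs = build_hrefs('https://eoresults.esa.int',
                        ['/d/some/asset/file.nc'])

# Scenario 1: the same relative layout on a local disk
local_hrefs = build_hrefs('/data/ampli', ['some/asset/file.nc'])

print(prr_hrefs[0])    # https://eoresults.esa.int/d/some/asset/file.nc
print(local_hrefs[0])  # /data/ampli/some/asset/file.nc
```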

Note: due to the original size of the dataset (~100 GB), running this notebook end to end may take hours. We therefore advise trying it on your own datasets by changing the file paths, so you can produce a valid STAC Collection and STAC Items.

Loading Libraries

import json
import pystac
import pandas as pd
import xarray as xr
from datetime import datetime, timezone
from dateutil.parser import isoparse
from shapely import box

2. Load product files stored in the ESA Project Results Repository

root_url = 'https://eoresults.esa.int'  # root URL for the dataset items
# get all items for the S3 AMPLI collection from the PRR STAC API
items = pystac.ItemCollection.from_file('https://eoresults.esa.int/stac/collections/sentinel3-ampli-ice-sheet-elevation/items?limit=10000')
# get the paths to all the data

# using a dictionary is faster than using pystac
items_dict = items.to_dict()
all_item_paths = []
for item in items_dict['features']:
    assets = item['assets']
    for asset_name, asset_dict in assets.items():
        if 'data' in asset_dict.get('roles', []):
            all_item_paths.append(asset_dict['href'])
# Create a list of EO Missions and instruments as well as region of the dataset and cycles
instruments = ['sentinel-3a', 'sentinel-3b']
regions = ['antarctica', 'greenland']
cycles = [f"cycle{str(i).zfill(3)}" for i in range(5, 112)]  # Cycle005 to Cycle111
# Assign the instrument name based on the acronym used in the file name
renaming = {
    'S3A': 'sentinel-3a',
    'S3B': 'sentinel-3b',
    'ANT': 'antarctica',
    'GRE': 'greenland'
}

Define the geometries, which are the same for all items within the same region. If they differ, they have to be extracted from the assets inside each item.

# Define the spatial extent (bbox) for each region of interest
greenland_bbox = [-74.0, 59.0, -10.0, 84.0]
greenland_geometry = json.loads(json.dumps(box(*greenland_bbox).__geo_interface__))

antarctica_bbox = [-180.0, -90.0, 180.0, -60.0]
antarctica_geometry = json.loads(json.dumps(box(*antarctica_bbox).__geo_interface__))
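The shapely round-trip above simply yields a plain GeoJSON Polygon dict. To make explicit what ends up in the item geometry, the same structure can be built directly with the standard library (a sketch; the ring order mirrors shapely's default counter-clockwise box):

```python
def bbox_to_geojson_polygon(bbox):
    """Build a GeoJSON Polygon dict from a [west, south, east, north] bbox."""
    west, south, east, north = bbox
    return {
        "type": "Polygon",
        "coordinates": [[
            [east, south], [east, north], [west, north],
            [west, south], [east, south],  # ring closed on the first vertex
        ]],
    }

greenland = bbox_to_geojson_polygon([-74.0, 59.0, -10.0, 84.0])
print(greenland["type"])  # Polygon
```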

2.1 Group the files by instrument, region and cycle

data = []

for ipath in all_item_paths:
    splitname = ipath.split('/')[-1].split('_')
    instrument = splitname[0]
    cycle = splitname[9]
    region = splitname[-2]

    data.append((renaming[instrument], renaming[region], cycle, ipath))


filedata = pd.DataFrame(data, columns=['instrument', 'region', 'cycle', 'path'])
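The index-based split above relies on the fixed token positions of the product file names. The filename below is synthetic, constructed only to illustrate those positions (index 0 = platform, index 9 = cycle, index -2 = region); it is not a real AMPLI product name:

```python
renaming = {'S3A': 'sentinel-3a', 'S3B': 'sentinel-3b',
            'ANT': 'antarctica', 'GRE': 'greenland'}

# Synthetic filename with placeholder tokens t1..t8 standing in for the
# timestamp and orbit fields of the real naming convention
fname = '_'.join(['S3A', 't1', 't2', 't3', 't4', 't5', 't6', 't7', 't8',
                  'cycle054', 'ANT', 'file.nc'])

splitname = fname.split('_')
instrument = renaming[splitname[0]]   # platform acronym -> full name
cycle = splitname[9]                  # cycle token
region = renaming[splitname[-2]]      # region acronym -> full name
print(instrument, region, cycle)      # sentinel-3a antarctica cycle054
```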

3. Create the STAC Items with the metadata from the original files loaded from the PRR

# group all files into items from the same instrument, region and cycle
for (instrument, region, cycle), links in filedata.groupby(['instrument', 'region', 'cycle']):
    
    # open the metadata attributes for each file in the group
    datasets = [xr.open_dataset(root_url + link + '#mode=bytes') for link in links['path']]


    # Define the Temporal extent
    first_item = datasets[0]
    last_item = datasets[-1]
    props = first_item.attrs
    props2 = last_item.attrs

    start_datetime = props.get("first_meas_time")
    end_datetime = props2.get("last_meas_time")

    # Define the geometry
    if props['zone'] == 'Antarctica':
        bbox = antarctica_bbox
        geometry = antarctica_geometry
    elif props['zone'] == 'Greenland':
        bbox = greenland_bbox
        geometry = greenland_geometry
    else:
        raise ValueError(f"Unexpected zone in metadata: {props['zone']}")


    # Shared properties
    properties = {
        "start_datetime": start_datetime,
        "end_datetime": end_datetime,
        "created": props.get("processing_date"),
        "description": f"Sentinel-3 AMPLI Land Ice Level-2 product acquired by {instrument.capitalize()} platform derived from the SRAL altimeter in Earth Observation mode over {region} region.",
        "conventions": props.get("Conventions"),
        "platform_name": props.get("platform_name"),
        "platform_serial_identifier": props.get("platform_serial_identifier"),
        "altimeter_sensor_name": props.get("altimeter_sensor_name"),
        "operational_mode": props.get("operational_mode"),
        "cycle_number": props.get("cycle_number"),
        "netcdf_version": props.get("netcdf_version"),
        "product_type": props.get("product_type"),
        "timeliness": props.get("timeliness"),
        "institution": props.get("institution"),
        "processing_level": props.get("processing_level"),
        "processor_name": props.get("processor_name"),
        "processor_version": props.get("processor_version"),
        "references": props.get("references"),
        "zone": props.get("zone"),
    }


    # Create STAC item for the cycle
    item = pystac.Item(
        id=f"sentinel-3{props.get('platform_serial_identifier').lower()}-{props.get('zone').lower()}-{cycle.lower()}",
        geometry=geometry,
        bbox=bbox,
        datetime=isoparse(start_datetime),
        properties=properties
    )

    item.stac_version = "1.1.0"
    item.stac_extensions = [
        "https://stac-extensions.github.io/projection/v1.1.0/schema.json",
        "https://stac-extensions.github.io/raster/v1.1.0/schema.json",
        "https://stac-extensions.github.io/eo/v1.1.0/schema.json"
    ]

    item.assets = {}

    # Add assets from that cycle
    for nc_href, ds in zip(links['path'], datasets):

        asset_title = ds.attrs['product_name']
        extra_fields = {
            "cycle_number": str(ds.attrs.get("cycle_number")),
            "orbit_number": str(ds.attrs.get("orbit_number")),
            "relative_orbit_number": str(ds.attrs.get("relative_orbit_number")),
            "orbit_direction": ds.attrs.get("orbit_direction"),
        }

        item.add_asset(
            key=asset_title,
            asset=pystac.Asset(
                href=nc_href,
                media_type="application/x-netcdf",
                roles=["data"],
                extra_fields=extra_fields
            )
        )

    # Save STAC item per cycle
    json_filename = f"sentinel-3{props.get('platform_serial_identifier').lower()}-{props.get('zone').lower()}-{cycle.lower()}.json"
    item.save_object(dest_href='examples/' + json_filename, include_self_link=False)
    print(f" Saved {json_filename}")
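Before adding the saved items to a collection, a quick structural sanity check over the JSON files can catch missing fields early. This is only a lightweight sketch, not a replacement for full pystac.Item.validate(), which checks against the actual JSON schemas:

```python
import glob
import json

# Required top-level keys of a STAC Item document
REQUIRED_KEYS = {"id", "type", "stac_version", "properties", "assets", "links"}

def missing_item_keys(item_dict):
    """Return the required top-level STAC Item keys absent from a dict."""
    return sorted(REQUIRED_KEYS - item_dict.keys())

for fpath in glob.glob('examples/*.json'):
    with open(fpath) as f:
        missing = missing_item_keys(json.load(f))
    if missing:
        print(f"{fpath} is missing: {missing}")
```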

3.1 Parse the documentation issue date

from datetime import datetime, timezone

date_str = "07/05/2025"

# Convert to ISO format string (YYYY-MM-DD)
iso_like_str = datetime.strptime(date_str, "%d/%m/%Y").strftime("%Y-%m-%d")

# Parse with isoparse and attach UTC timezone
dt_utc = isoparse(iso_like_str).replace(tzinfo=timezone.utc)

print(dt_utc.isoformat())
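The same conversion can also be done in a single step with the standard library alone, without the intermediate ISO string:

```python
from datetime import datetime, timezone

date_str = "07/05/2025"
# Parse the DD/MM/YYYY string and attach the UTC timezone directly
dt_utc = datetime.strptime(date_str, "%d/%m/%Y").replace(tzinfo=timezone.utc)
print(dt_utc.isoformat())  # 2025-05-07T00:00:00+00:00
```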

3.2 Create STAC Item for the documentation associated to the dataset

# Basic metadata
doc_href = "/d/S3_AMPLI_User_Handbook.pdf"  # Relative or absolute href
doc_title = "Sentinel-3 Altimetry over Land Ice: AMPLI level-2 Products"
doc_description = "User Handbook for Sentinel-3 Altimetry over Land Ice: AMPLI level-2 Products"

# Create STAC item
item = pystac.Item(
    id="sentinel-3-ampli-user-handbook",
    geometry=None,
    bbox=None,
    datetime=dt_utc,
    properties={
        "title": doc_title,
        "description": doc_description,
        "reference": "CLS-ENV-MU-24-0389",
        "issue_n": dt_utc.isoformat()
    }
)

# Add asset for the PDF
item.add_asset(
    key="documentation",
    asset=pystac.Asset(
        href=doc_href,
        media_type="application/pdf",
        roles=["documentation"],
        title=doc_title
    )
)

# Save to file
item.set_self_href("examples/sentinel-3-ampli-user-handbook.json")
item.save_object(include_self_link=False)

print("📄 STAC Item for documentation created: sentinel-3-ampli-user-handbook.json")

4. Generate valid STAC collection

Once all the assets are processed, create the parent collection for all Items created in the previous step.

collection = pystac.Collection.from_dict(

{
  "id": "sentinel3-ampli-ice-sheet-elevation",
  "type": "Collection",
  "links": [
  ],
  "title": "Sentinel-3 AMPLI Ice Sheet Elevation",
  "extent": {
    "spatial": {
      "bbox": [
        [-180, -90, 180, 90]
      ]
    },
    "temporal": {
      "interval": [
        [
          "2016-06-01T00:00:00Z",
          "2024-05-09T00:00:00Z"
        ]
      ]
    }
  },
  "license": "CC-BY-4.0",
  "summaries": {
    "references": [
      "https://doi.org/10.5194/egusphere-2024-1323"
    ],
    "institution": [
      "CNES"
    ],
    "platform_name": [
      "SENTINEL-3"
    ],
    "processor_name": [
      "Altimeter data Modelling and Processing for Land Ice (AMPLI)"
    ],
    "operational_mode": [
      "Earth Observation"
    ],
    "processing_level": [
      "2"
    ],
    "processor_version": [
      "v1.0"
    ],
    "altimeter_sensor_name": [
      "SRAL"
    ]
  },
  "description": "Ice sheet elevation estimated along the Sentinel-3 satellite track, as retrieved with the Altimeter data Modelling and Processing for Land Ice (AMPLI). The products cover Antarctica and Greenland.",
  "stac_version": "1.1.0"
}
)
collection

4.1. Add items to collection

Once the collection is created, read all the items from disk and add the necessary links.

import glob
# read the item JSON files saved in the previous steps
for fpath in glob.glob('examples/*.json'):
    collection.add_item(pystac.Item.from_file(fpath))
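After adding the items, pystac can refresh the collection's extent from them via collection.update_extent_from_items(). If you prefer to compute the temporal interval yourself, a standard-library sketch over the items' start/end strings (the values below are hypothetical) looks like this:

```python
from datetime import datetime

def parse_utc(ts):
    """Parse an ISO timestamp with a trailing 'Z' (UTC) suffix."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

# Hypothetical (start, end) pairs gathered from the items' properties
intervals = [
    ("2016-06-01T00:00:00Z", "2016-06-27T23:59:59Z"),
    ("2024-04-12T00:00:00Z", "2024-05-09T00:00:00Z"),
]

start = min(parse_utc(s) for s, _ in intervals)
end = max(parse_utc(e) for _, e in intervals)
print(start.isoformat(), end.isoformat())
```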

4.2 Save the normalised collection

# save the full self-contained collection
collection.normalize_and_save(
    root_href='../data/example_catalog_ampli/',
    catalog_type=pystac.CatalogType.SELF_CONTAINED
)

Acknowledgments

We gratefully acknowledge the SRAL Processing over Land Ice team for providing access to the data used in this example, as well as support in creating it.

References
  1. European Space Agency. (2025). Sentinel-3 AMPLI Ice Sheet Elevation [Data set]. https://doi.org/10.57780/S3D-83AD619