Generating STAC collections for the CAREHeat project

This notebook shows how to generate a valid STAC collection, which is required to upload research outcomes to the ESA Project Results Repository (PRR). The code below demonstrates the necessary steps using real data from the ESA project deteCtion and threAts of maRinE HEAT waves (CAREHeat). CAREHeat focuses on improving existing algorithms for detecting extreme marine heatwaves (MHWs), contributing to a better understanding of their impacts.

Check the EarthCODE documentation and the PRR STAC introduction example for a more general introduction to STAC and the ESA PRR.

🔗 Check the project website: deteCtion and threAts of maRinE HEAT waves (CAREHeat) – Website

🔗 Check the eo4society page: deteCtion and threAts of maRinE HEAT waves (CAREHeat) – eo4society

CAREHeat dataset source: check out the Dataset of Marine heatwaves and cold spells events based on ESA-CCI SSTs

# import libraries (only those used in this notebook)
import json

import pandas as pd
import pystac
import shapely
import xarray as xr
from pystac import Collection
from xstac import xarray_to_stac

1. Generate the parent collection

The root STAC Collection provides a general description of all project outputs that will be stored on the PRR. The PRR STAC Collection template requires some fields that you must provide to build a valid description. Most of these metadata fields should already be available and can be extracted from your data.

# create the parent collection
collectionid = "careheat-marine-heatwaves-cold-spells"


collection = Collection.from_dict(
    
{
  "type": "Collection",
  "id": collectionid,
  "stac_version": "1.1.0",
  "title": "Marine heatwaves and cold spells events based on ESA-CCI SSTs",
  "description": "Marine heatwaves (MHWs) and cold spells (MCSs) prepared by the National Research Council - Institute of Marine Sciences (CNR-ISMAR, Italy) within the ESA-funded CAREHeat project. The catalogues are based on the ESA-CCI sea surface temperature (SST) dataset (available from https://doi.org/10.24381/cds.cf608234) for the period 1982-2022, on a regular 1°x1° longitude-latitude grid.",
  "extent": {
    "spatial": {
      "bbox": [
         [-180, -90, 180, 90]
      ]
    },
    "temporal": {
      "interval": [
        [
          "1982-01-01T00:00:00Z",
          "2022-12-31T23:59:59Z"
        ]
      ]
    }
  },
  "license": "CC-BY-4.0",
  "links": []

}

)

collection # visualise the metadata of your collection
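Before passing a dictionary to `Collection.from_dict`, it can help to confirm that none of the top-level fields required by the core STAC Collection specification are missing. The helper below is a minimal sketch using only the standard library. The name `missing_collection_fields` is hypothetical and not part of pystac, the check covers only the presence of fields and not their validity, and the PRR may expect fields beyond this core set.

```python
# Top-level fields that the core STAC Collection specification marks as required.
REQUIRED_COLLECTION_FIELDS = (
    "type", "stac_version", "id", "description", "license", "extent", "links",
)


def missing_collection_fields(collection_dict):
    """Return the names of required fields absent from a collection dictionary."""
    missing = [f for f in REQUIRED_COLLECTION_FIELDS if f not in collection_dict]
    # When an extent is present, it must contain both a spatial and a temporal part.
    extent = collection_dict.get("extent")
    if isinstance(extent, dict):
        missing += [f"extent.{part}" for part in ("spatial", "temporal") if part not in extent]
    return missing


# Illustrative dictionary mirroring the structure used in this notebook.
example = {
    "type": "Collection",
    "id": "careheat-marine-heatwaves-cold-spells",
    "stac_version": "1.1.0",
    "description": "Example description",
    "license": "CC-BY-4.0",
    "extent": {
        "spatial": {"bbox": [[-180, -90, 180, 90]]},
        "temporal": {"interval": [["1982-01-01T00:00:00Z", "2022-12-31T23:59:59Z"]]},
    },
    "links": [],
}

print(missing_collection_fields(example))
```

An empty list means every required field is present; any names returned point to what still needs to be filled in.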

2. Create STAC Items and STAC Assets from original dataset

The second step is to describe the individual files as STAC Items and Assets. Take your time to decide how your data should be categorised, so that it is easy to use and the items in the collection are intuitive to navigate. There are multiple strategies for doing this, and this tutorial demonstrates one of them. Examples of how other ESA projects approach this are available in the EarthCODE documentation.

# global bounding box as [west, south, east, north]
bbox = [-180, -90, 180, 90]
# GeoJSON polygon for the bounding box; the JSON round trip converts coordinate tuples to lists
geometry = json.loads(json.dumps(shapely.box(*bbox).__geo_interface__))

from pathlib import Path
baseurl = Path('../../data/careheat-marine-heatwaves-cold-spells/')
files = list(baseurl.glob("*.nc"))

# Convert to POSIX-style strings
file_paths = [f.as_posix() for f in files]

for file in file_paths:
    print(file)

../../data/careheat-marine-heatwaves-cold-spells/CCI2D_1x1_ssa_MCS_categories_glo_1982-2022.nc
../../data/careheat-marine-heatwaves-cold-spells/CCI2D_1x1_ssa_MCS_metrics_glo_1982-2022.nc
../../data/careheat-marine-heatwaves-cold-spells/CCI2D_1x1_ssa_MHW_metrics_glo_1982-2022.nc
../../data/careheat-marine-heatwaves-cold-spells/CCI2D_1x1_ssa_MHW_categories_glo_1982-2022.nc
../../data/careheat-marine-heatwaves-cold-spells/CCI2D_1x1_MHW_categories_glo_1982-2022.nc
../../data/careheat-marine-heatwaves-cold-spells/CCI2D_1x1_MHW_metrics_glo_1982-2022.nc
../../data/careheat-marine-heatwaves-cold-spells/CCI2D_1x1_MCS_metrics_glo_1982-2022.nc
../../data/careheat-marine-heatwaves-cold-spells/CCI2D_1x1_MCS_categories_glo_1982-2022.nc
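The file names listed above appear to encode the event type (MHW or MCS), the product (metrics or categories), whether the data are detrended (`ssa`), and the period covered. The sketch below shows one way such names could be parsed to help decide how to group files into items. The regular expression is inferred from these eight file names only and is an assumption, not a documented naming convention, and `describe_file` is a hypothetical helper.

```python
import re
from pathlib import PurePosixPath

# Pattern inferred from the file names printed above (an assumption).
FILENAME_PATTERN = re.compile(
    r"^CCI2D_1x1_(?P<ssa>ssa_)?(?P<event>MHW|MCS)_(?P<product>metrics|categories)"
    r"_glo_(?P<start>\d{4})-(?P<end>\d{4})$"
)


def describe_file(path):
    """Split a CAREHeat file name into its descriptive parts."""
    stem = PurePosixPath(path).stem  # file name without directory and ".nc"
    match = FILENAME_PATTERN.match(stem)
    if match is None:
        raise ValueError(f"Unexpected file name: {path}")
    return {
        "event": match["event"],
        "product": match["product"],
        "detrended": match["ssa"] is not None,
        "start_year": int(match["start"]),
        "end_year": int(match["end"]),
    }


info = describe_file(
    "../../data/careheat-marine-heatwaves-cold-spells/"
    "CCI2D_1x1_ssa_MCS_categories_glo_1982-2022.nc"
)
print(info)
```

Raising an error on unmatched names makes it obvious when a new file does not follow the assumed pattern, rather than silently producing incomplete metadata.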

for file in file_paths:

    # open the dataset and read metadata + convert to STAC
    ds = xr.open_dataset(file)
    detrended = 'Detrended (SSA) ' if '_ssa_' in file else ''
    
    template = {

        "id": f"{collectionid}-{file.split('/')[-1][:-3].lower()}",
        "type": "Feature",
        "stac_version": "1.0.0",
        "properties": {
            "title": detrended + ds.attrs['product'],
            "description": detrended + ds.attrs['description'],
            "start_datetime": pd.to_datetime(ds.attrs['climatologyPeriod'][0], format='%Y').strftime("%Y-%m-%dT%H:%M:%SZ"),
            "end_datetime": pd.to_datetime(ds.attrs['climatologyPeriod'][-1], format='%Y').strftime("%Y-%m-%dT%H:%M:%SZ"),
            "license": "CC-BY-4.0",
            "created": pd.to_datetime(ds.attrs['date'], format='%Y-%m').strftime("%Y-%m-%dT%H:%M:%SZ"),
            "git_information": ds.attrs['git_information'],
            "website": ds.attrs['website'],
            "version": ds.attrs['version'],
            "changelog": ds.attrs['changelog'],
            "institution": ds.attrs['institution'],
            "author": ds.attrs['author'],
            "contact": ds.attrs['contact'],
        },
        "geometry": geometry,
        "bbox": bbox,
        "assets": {
            "data": {
                "href": f"./{collectionid}/{file.split('/')[-1]}",  # or local path
                "type": "application/x-netcdf",
                "roles": ["data"],
                "title": detrended + ds.attrs['product']
            }
        }
    }
    # generate the STAC Item from the dataset and the template
    item = xarray_to_stac(
        ds,
        template,
        temporal_dimension="time" if 'time' in ds.coords else False,
        x_dimension='lon',
        y_dimension='lat',
        reference_system=False
    )

    # validate and add the STAC Item to the collection
    item.validate()
    collection.add_item(item)

collection
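The `id`, `start_datetime` and `end_datetime` values in the item template are derived from the file name and from year values. The sketch below reproduces that formatting with the standard library instead of pandas, so the resulting strings are easy to inspect in isolation. The helper names `build_item_id` and `year_to_stac_datetime` are hypothetical, and the years used are illustrative values rather than attributes read from the files.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def year_to_stac_datetime(year):
    """Format a year as an RFC 3339 UTC timestamp at 1 January, 00:00:00."""
    return datetime(int(year), 1, 1, tzinfo=timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_item_id(collection_id, file_path):
    """Combine the collection id with the lower-cased file name without extension."""
    return f"{collection_id}-{PurePosixPath(file_path).stem.lower()}"


item_id = build_item_id(
    "careheat-marine-heatwaves-cold-spells",
    "../../data/careheat-marine-heatwaves-cold-spells/CCI2D_1x1_MHW_metrics_glo_1982-2022.nc",
)
start = year_to_stac_datetime(1982)
end = year_to_stac_datetime(2022)
print(item_id, start, end)
```

Note that a year-only value always maps to 1 January of that year, so an end year formatted this way marks the start of that year rather than its end.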

# save the full self-contained collection
collection.normalize_and_save(
    root_href=f'../../prr_preview/{collectionid}',
    catalog_type=pystac.CatalogType.SELF_CONTAINED
)

collection
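Once the collection has been saved, the output can be checked by reading the JSON back without pystac. The sketch below assumes pystac's default layout, in which the root file is named `collection.json` and every item is referenced by a link whose `rel` is `item`. The function `list_item_hrefs` is a hypothetical helper, and the temporary directory with a hand-written `collection.json` only stands in for the real output folder.

```python
import json
import tempfile
from pathlib import Path


def list_item_hrefs(collection_dir):
    """Return the hrefs of all item links found in a saved collection.json."""
    collection_path = Path(collection_dir) / "collection.json"
    with collection_path.open(encoding="utf-8") as fh:
        saved = json.load(fh)
    return [link["href"] for link in saved.get("links", []) if link.get("rel") == "item"]


# Stand-in for the saved output: a minimal collection.json with one item link.
with tempfile.TemporaryDirectory() as tmp:
    stand_in = {
        "type": "Collection",
        "id": "demo",
        "links": [
            {"rel": "root", "href": "./collection.json"},
            {"rel": "item", "href": "./demo-item/demo-item.json"},
        ],
    }
    (Path(tmp) / "collection.json").write_text(json.dumps(stand_in), encoding="utf-8")
    demo_hrefs = list_item_hrefs(tmp)

print(demo_hrefs)
```

For the collection saved in this notebook, the same function could be called with the `root_href` passed to `normalize_and_save`; the number of hrefs returned should match the number of NetCDF files processed.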

Acknowledgments

We gratefully acknowledge the deteCtion and threAts of maRinE HEAT waves (CAREHeat) project team for providing access to the data used in this example, as well as for their support in creating it.