2.0. Product metadata

The product STAC Collection provides a general metadata description of each project output that will be discoverable in the Open Science Catalogue (OSC). Most of these metadata fields should already be available and can be extracted from your data or documentation.

This notebook shows how to create an OSC product entry using the current earthcode API and how to save it in a local clone of the OSC catalog. Adding items (file-level metadata) to the product and validating the full catalog are done in later steps.

You can attach one or more products to a single project! If you have more than one product, run cells [2-4] once for each product you want to add.

To compare the list of required parameters and their format, see the example product metadata in the open-science-catalog-metadata repository on GitHub. Example product: WAPOSAL Dataset

LICENSE: In this step you are required to select one of the available licenses for each of your products.
Please look at the list of available licenses and pick the one that applies to your datasets: osc-licence schemas.

If your product does not have a defined license, we cannot proceed with publishing the dataset. Please consult the SPDX license list and select the most appropriate one.
Visit EarthCODE Best Practices to learn more about Open Data & Licensing.
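As a rough illustration of the license rule described in the code comments further down (a listed license needs no link, while 'other' needs a link to the license text), the check can be sketched in plain Python. The `check_license` helper and the `KNOWN_LICENSES` set are hypothetical: the set is only a small illustrative subset of SPDX ids, not the actual OSC list, so always consult the osc-licence schema for the real options.

```python
# Hypothetical, illustrative subset of SPDX ids; the real list is the osc-licence schema.
KNOWN_LICENSES = {"CC-BY-4.0", "CC-BY-SA-4.0", "CC0-1.0", "MIT", "Apache-2.0"}


def check_license(product_license, license_link=None, known=KNOWN_LICENSES):
    """Return a list of problems with the license settings (empty if none found)."""
    problems = []
    if product_license == "other":
        # A license outside the list needs a link to its full text.
        if not license_link:
            problems.append("license 'other' needs a license_link to the license text")
    elif product_license in known:
        # A listed license needs no extra link.
        if license_link is not None:
            problems.append("leave license_link as None for a listed license")
    else:
        problems.append(f"unknown license id {product_license!r}: pick one from the list or use 'other'")
    return problems


print(check_license("CC-BY-4.0"))  # []
```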

PRODUCT EO-MISSIONS: In this step you are required to select one or more EO missions that you used to generate your product. Please look at the list of EO missions defined in the OSC (ESA-EarthCODE/open-science-catalog-metadata), which is also searchable at: https://opensciencedata.esa.int/eo-missions/catalog

If your product uses or complements in-situ data collections, or is the result of numerical models, please select ["in-situ-observations"] or ["numerical-models"] respectively.

PRODUCT VARIABLES: In this step you are required to select one or more variables that your product describes. Please look at the list of geophysical variables defined in the OSC (ESA-EarthCODE/open-science-catalog-metadata). You can also explore the list of variables at: https://opensciencedata.esa.int/variables/catalog

In the OSC, variables are geophysical, climate and environmental variables selected from the WMO OSCAR database and complemented by the GCMD Keywords database.

NOTE: You can use the EarthCODE search functionality to find variables and EO missions relevant to your data.
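If you already have a local clone of open-science-catalog-metadata, a quick offline alternative is to list the folder names under `variables/` or `eo-missions/`. This is a sketch that assumes each entry is stored as its own sub-folder named after its id (as the GitHub tree links in this notebook suggest); `find_local_ids` is a hypothetical helper, not part of the earthcode package.

```python
from pathlib import Path


def find_local_ids(catalog_root, kind, keyword=""):
    """List entry ids under <catalog_root>/<kind>/ whose folder name contains keyword.

    Assumes every entry (e.g. a variable or an EO mission) is its own sub-folder
    named after its id, such as variables/<id>/catalog.json.
    """
    base = Path(catalog_root) / kind
    if not base.is_dir():
        return []
    keyword = keyword.lower()
    # Case-insensitive substring match on the folder names, returned in sorted order.
    return sorted(p.name for p in base.iterdir() if p.is_dir() and keyword in p.name.lower())
```

For example, `find_local_ids("../open-science-catalog-metadata", "eo-missions", "sentinel")` would list the Sentinel mission ids in your clone, provided the layout matches this assumption.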

PRODUCT PARAMETER: Please provide one or more parameters describing the product, using CF Conventions standard names. See the full list at: https://cfconventions.org/
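CF standard names follow a simple lexical rule: lower-case letters, digits and underscores, starting with a letter. The hypothetical helper below checks only that form, so it catches typos such as spaces or capital letters, but it cannot tell you whether the name actually exists in the CF Standard Name Table.

```python
import re

# CF standard names: lower-case letters, digits and underscores, starting with a letter.
CF_NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]*")


def looks_like_cf_standard_name(name):
    """Check only the *form* of a name; it does not confirm the name is in the CF table."""
    return CF_NAME_PATTERN.fullmatch(name) is not None


print(looks_like_cf_standard_name("leaf_area_index"))  # True
print(looks_like_cf_standard_name("Leaf Area Index"))  # False
```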

PRODUCT DOI: EarthCODE now offers DOI assignment for products/datasets published in the Open Science Catalogue. The process is still manual: it goes through the ESA TEllUs service and is handled by the EarthCODE Data Stewardship team on behalf of the Project PI.

If you would like to assign a DOI to your data, please contact the EarthCODE Team at earth-code@esa.int, who will support you in this process.
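If you paste an existing DOI into `product_doi`, it can help to normalise it first. The sketch below assumes the field expects the bare DOI (e.g. `10.1000/xyz123`) rather than a resolver URL; check this against the example product before relying on it. `normalise_doi` is a hypothetical helper, and its pattern is a loose sanity check, not full DOI validation.

```python
import re

# Loose sanity check for "10.<registrant>/<suffix>"; this is not full DOI validation.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")
RESOLVER_PREFIXES = ("https://doi.org/", "http://doi.org/", "doi:")


def normalise_doi(value):
    """Return the bare DOI, or None when no DOI was given."""
    if value is None:
        return None
    doi = value.strip()
    # Drop a resolver prefix if a full DOI link was pasted in.
    for prefix in RESOLVER_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.fullmatch(doi):
        raise ValueError(f"not a recognisable DOI: {value!r}")
    return doi


print(normalise_doi("https://doi.org/10.1000/xyz123"))  # 10.1000/xyz123
```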

from datetime import datetime
from pathlib import Path

from earthcode.metadata_input_definitions import ProductCollectionMetadata
from earthcode.static import create_product_collection
from earthcode.git_add import save_product_collection_to_catalog

from earthcode.search import search

# Root path of your local OSC clone (assumed to sit in the parent directory of the current working directory)
catalog_root = str(Path("../open-science-catalog-metadata").resolve())

# A custom id for the product (must be different from the project id!). It can be derived from the title, e.g. "4datlantic-ohc-dataset". Use the dash "-" symbol to separate words in the id.
product_id = ""
product_title = ""
product_description = ""
# Product status: either "ongoing" or "completed"
product_status = ""

# Define the product license, e.g. CC-BY-4.0. See the note in the markdown cell above for the full list of available licenses.
# If your license is not in the list, set product_license to 'other' and provide a link to the license text in license_link.
# If your license is in the list, leave license_link as None.
product_license = "CC-BY-4.0"
license_link = None

# Define up to five keywords for the product. Use short terms that help users discover your product.
product_keywords = ["", ""]

# Define the spatial extent of the PRODUCT/DATASET in WGS84 degrees (EPSG:4326): south/north are latitudes, west/east are longitudes.
# If the dataset covers discontinuous regions, add one entry per region to each of the four lists.
product_s = [-90.0]
product_w = [-180.0]
product_n = [90.0]
product_e = [180.0]

# Define the temporal extent of the PRODUCT/DATASET
product_start_year = 2021
product_start_month = 1
product_start_day = 1
product_end_year = 2021
product_end_month = 12
product_end_day = 31

# Define the semantic region covered by this product, e.g. Belgium, Global, etc.
product_region = ""

# Define the product themes, e.g. land. Pick one or more from:
# - atmosphere, cryosphere, land, magnetosphere-ionosphere, oceans, solid-earth.
# See the list here: https://github.com/ESA-EarthCODE/open-science-catalog-metadata/blob/main/themes/catalog.json
product_themes = [""]

# Define the EO mission(s) used to generate the product, e.g. "sentinel-2"
# Pick one or more from - https://github.com/ESA-EarthCODE/open-science-catalog-metadata/tree/main/eo-missions
product_missions = []

# Define the variables that best describe your product/dataset.
# Pick one or more from https://github.com/ESA-EarthCODE/open-science-catalog-metadata/tree/main/variables
product_variables = []

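# Optional: uncomment and replace "description" with your own search text to look up matching variables
# search("description", type="variable")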

# Define the parameters describing your product as CF standard names, e.g. "leaf_area_index".
product_parameters = []

# Provide the DOI assigned to your product. If your product does not have one, leave it as None.
product_doi = None

# Define the related project id and title.
# These must match a new or an already existing project in the catalog; otherwise, correct links cannot be produced!
project_id = ""
project_title = ""
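Before creating the collection, the constraints stated in the comments of the cell above can be checked in one go. This is a minimal sketch with a hypothetical `preflight` helper; the lowercase, dash-separated id pattern is an assumption based on the example id `4datlantic-ohc-dataset`, and the theme set is copied from the comment listing the available themes.

```python
import re

# Themes copied from the comment in the cell above.
THEMES = {"atmosphere", "cryosphere", "land", "magnetosphere-ionosphere", "oceans", "solid-earth"}
# Assumption: ids are lowercase letters/digits, with words joined by a single "-".
ID_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def preflight(product_id, project_id, status, keywords, themes, start, end):
    """Return a list of human-readable problems; an empty list means no issues were found."""
    problems = []
    if not ID_PATTERN.fullmatch(product_id):
        problems.append("product_id should be lowercase words separated by '-'")
    if product_id == project_id:
        problems.append("product_id must be different from project_id")
    if status not in {"ongoing", "completed"}:
        problems.append("product_status must be 'ongoing' or 'completed'")
    if len(keywords) > 5 or any(not k.strip() for k in keywords):
        problems.append("use at most five non-empty keywords")
    unknown = sorted(set(themes) - THEMES)
    if unknown:
        problems.append(f"unknown themes: {unknown}")
    if start > end:
        problems.append("start date is after end date")
    return problems
```

You could call it as `preflight(product_id, project_id, product_status, product_keywords, product_themes, datetime(product_start_year, product_start_month, product_start_day), datetime(product_end_year, product_end_month, product_end_day))` and fix anything it reports before running the next cell. With the empty placeholder values above, it reports five problems.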

Create product collection

ℹ️ Note
This function creates a product collection and automatically generates the STAC Collection.json and all required STAC links.
It connects the product with its related project, themes, and missions, and updates existing entries as needed.
Run the cell below to create the new entry.
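The spatial and temporal inputs end up in the `extent` object of the STAC Collection. The sketch below shows that structure as defined by the STAC Collection specification, built from lists shaped like `product_s`/`product_w`/`product_n`/`product_e`; it is not earthcode's internal code, and `stac_extent` is a hypothetical helper. It also rejects latitudes and longitudes that have been swapped, an easy mistake with four separate lists.

```python
from datetime import datetime, timezone


def stac_extent(south, west, north, east, start, end):
    """Build a STAC Collection 'extent' dict from per-region bound lists.

    STAC orders each bbox as [west, south, east, north] in WGS84 lon/lat degrees.
    """
    bboxes = [[w, s, e, n] for s, w, n, e in zip(south, west, north, east)]
    for w, s, e, n in bboxes:
        # Latitudes must lie in [-90, 90] with south <= north.
        if not -90 <= s <= n <= 90:
            raise ValueError(f"bad latitudes: south={s}, north={n}")
        # Longitudes must lie in [-180, 180]; west > east is allowed (antimeridian).
        if not (-180 <= w <= 180 and -180 <= e <= 180):
            raise ValueError(f"bad longitudes: west={w}, east={e}")

    def to_rfc3339(d):
        # Treat naive datetimes as UTC, convert aware ones to UTC, then format with a "Z" suffix.
        d = d.replace(tzinfo=timezone.utc) if d.tzinfo is None else d.astimezone(timezone.utc)
        return d.strftime("%Y-%m-%dT%H:%M:%SZ")

    return {
        "spatial": {"bbox": bboxes},
        "temporal": {"interval": [[to_rfc3339(start), to_rfc3339(end)]]},
    }


print(stac_extent([-90.0], [-180.0], [90.0], [180.0], datetime(2021, 1, 1), datetime(2021, 12, 31)))
```

West may legitimately be greater than east for a box crossing the antimeridian, so the sketch does not compare them. Also note that the STAC specification treats the first bbox as the overall extent of the data when several are given.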

# Create the product; one STAC bounding box per region, ordered [west, south, east, north]
product_bbox = [[w, s, e, n] for s, w, n, e in zip(product_s, product_w, product_n, product_e)]

product_metadata = ProductCollectionMetadata(
    product_id=product_id,
    product_title=product_title,
    product_description=product_description,
    product_bbox=product_bbox,
    product_start_datetime=datetime(product_start_year, product_start_month, product_start_day),
    product_end_datetime=datetime(product_end_year, product_end_month, product_end_day),
    product_license=product_license,
    product_keywords=product_keywords,
    product_status=product_status,
    product_region=product_region,
    product_themes=product_themes,
    product_missions=product_missions,
    product_variables=product_variables,
    project_id=project_id,
    project_title=project_title,
    product_parameters=product_parameters,
    product_doi=product_doi,
    license_link=license_link or None,
)

product_collection = create_product_collection(product_metadata)

Save the product entry into your local fork of the open-science-catalog-metadata repository

# Save the product into your local fork of the open-science-catalog-metadata repository
catalog_root = Path(catalog_root)
save_product_collection_to_catalog(product_collection, catalog_root)
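To confirm that the save worked, you can open the written file. This sketch assumes the collection is written to `products/<product_id>/collection.json` inside the clone; that layout is an assumption about the repository, not something this notebook guarantees. `check_saved_product` is a hypothetical helper.

```python
import json
from pathlib import Path


def check_saved_product(catalog_root, product_id):
    """Load products/<product_id>/collection.json from a local OSC clone and sanity-check it.

    The products/<id>/collection.json layout is an assumption about the repository.
    """
    path = Path(catalog_root) / "products" / product_id / "collection.json"
    if not path.is_file():
        raise FileNotFoundError(f"no collection found at {path}")
    with path.open(encoding="utf-8") as f:
        collection = json.load(f)
    # A STAC Collection declares type "Collection" and should carry the product id.
    if collection.get("type") != "Collection":
        raise ValueError(f"{path} is not a STAC Collection")
    if collection.get("id") != product_id:
        raise ValueError(f"id mismatch: {collection.get('id')!r} != {product_id!r}")
    return path
```

For example, `check_saved_product(catalog_root, product_id)` returns the file path if the file exists and its `id` matches.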

At this stage we do not run validation yet. To complete the addition of your products to the OSC, you also need to provide file-level (asset) metadata.
There are three requirements for this:

  1. Storage. Your research data and workflows/code must be hosted on remote, persistent storage that allows discovery. Examples include:

  • A repository provided by ESA >>> see the guide/2.1.Product_files_PRR.ipynb notebook

  • S3-compatible object storage (permanent and public)

  • Zenodo, CEDA, Dataverse, or other persistent archives used by the academic community

  2. Format. We encourage you to use cloud-optimised data formats (e.g. Zarr or Cloud Optimized GeoTIFF), since they make storing and accessing the products much easier.

  3. File-level metadata. To add your data to the Open Science Catalog, you have to generate STAC Items that describe your files, code (if applicable) and documentation (if applicable).

  • If you already meet requirements 1 and 2, you can now generate STAC Items for your dataset >>> guide/2.2.Product_files_self_hosted.ipynb
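To give a feel for what file-level metadata looks like, here is a minimal STAC Item for one remotely hosted file, written as a plain dictionary following the STAC Item specification. It is only a sketch: the helper name, the `stac_version` value, the example URL and the asset key `data` are illustrative assumptions, and the 2.1/2.2 notebooks remain the way to actually generate items for the OSC.

```python
import json


def minimal_stac_item(item_id, bbox, datetime_str, asset_href, media_type):
    """Return a minimal STAC Item (as a dict) describing one remotely hosted file."""
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",  # assumed version; match what your catalog uses
        "id": item_id,
        "bbox": bbox,
        # Footprint polygon of the bbox; the ring is closed by repeating the first corner.
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north], [west, north], [west, south],
            ]],
        },
        "properties": {"datetime": datetime_str},
        "links": [],
        "assets": {
            "data": {"href": asset_href, "type": media_type, "roles": ["data"]},
        },
    }


# Hypothetical file URL, shown only to illustrate the structure.
item = minimal_stac_item(
    "my-product-2021",
    [-180.0, -90.0, 180.0, 90.0],
    "2021-01-01T00:00:00Z",
    "https://example.com/my-product-2021.tif",
    "image/tiff; application=geotiff; profile=cloud-optimized",
)
print(json.dumps(item, indent=2))
```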

ℹ️ To request permanent storage of your data on ESA cloud storage for long-term preservation, go to >>> guide/2.1.Product_files_PRR.ipynb.

We can support you through all stages of this process: just contact us or post in the FORUM!