Adding new content to the Open Science Catalog with a Pull Request (PR)

The Open Science Catalog (OSC) is a key component of the ESA EO Open Science framework. It is a publicly available web-based application designed to provide easy access to scientific resources including geoscience products, workflows, experiments and documentation from activities and projects funded by ESA under the EO Programme.

The Open Science Catalog is built on the SpatioTemporal Asset Catalog (STAC) specification, a standardised format for describing geospatial data. Through the open-source STAC Browser, the catalog lets users browse and explore interlinked elements such as themes, variables, EO missions, projects, products, workflows, and experiments, all described using STAC-compliant JSON files. This schema ensures that the entries can be easily reused by other scientists and automated workflows, and that they are displayed correctly in the web browser. Data, workflows, and experiments are documented in the catalog primarily through enriched metadata and direct links to the corresponding research outcomes. The physical location of these resources is typically indicated via the Project Results Repository or other secure external repositories. Further details on the OSC format can be found here.

Adding information to the OSC

There are three ways to add information to the OSC.

  • Manually opening a pull request (this tutorial)
  • Using the GUI editor
  • Using one of the platform-specific tools

This notebook describes how you can add information to the OSC by manually creating and editing the JSON files that describe STAC Collections and Catalogs. The steps to add information in this way are:

  1. Fork the repository
  2. Add the information about the project/product/workflow/variables in STAC JSON format.
  3. Open a PR to merge the new information into the OSC.

In general, most of the information you need is already in your data or project documentation, so you will NOT need to generate anything new. All information that you provide will be automatically validated and then manually verified by an EarthCODE team member. You can therefore use the automatic validation from the CI to correct the format or content of what you provide.

1. Forking the repository

The OSC metadata is fully hosted on GitHub, so use your personal GitHub account to contribute to the catalog. If you do not have an account, you need to set up a new GitHub account: https://docs.github.com/en/get-started/start-your-journey/creating-an-account-on-github.

To contribute your research outputs, you need to create valid STAC objects and commit them to the open-science-catalog-metadata repository on GitHub. The first step is to fork the open-science-catalog-metadata repository, which creates your own copy of the Open Science Catalog. (More information about how to do this in GitHub is available here: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo)

Once you have your copy of the OSC, have a look at the folder structure and at the information for an existing item of the same type as the one you want to add (product, project, workflow, variable, etc.). This will give you an idea of the information required for a valid STAC object. These STAC objects, stored as JSON files, are automatically processed and rendered in the catalog viewer.
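
To get a feel for what an entry contains, it can help to load an existing file and list its top-level fields. The following is a minimal sketch, assuming a local clone of your fork; the helper name summarise_entry is illustrative and not part of any OSC tooling.

```python
import json
from pathlib import Path


def summarise_entry(path):
    """Return the id, OSC type and sorted top-level keys of an OSC JSON file."""
    entry = json.loads(Path(path).read_text(encoding="utf-8"))
    # Workflows keep most metadata under "properties";
    # projects and products keep it at the top level.
    props = entry.get("properties", entry)
    return {
        "id": entry.get("id"),
        "osc_type": props.get("osc:type"),
        "keys": sorted(entry.keys()),
    }


if __name__ == "__main__":
    # Path inside a local clone of your fork (adjust to an entry that exists).
    example = Path("products/sentinel3-ampli-ice-sheet-elevation/collection.json")
    if example.exists():
        print(summarise_entry(example))
```

Comparing the keys of two or three existing entries of the same type is a quick way to see which fields recur.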

2. Add the information about project/product/workflow/experiments/variables.

After you have forked the repository, you can start adding the required information. This document explains the Open Science Catalog Extension to the SpatioTemporal Asset Catalog (STAC) specification. The requirements differ depending on the type of catalog entry you want to add.

Sometimes it is easier to copy the folder of an existing project/product/workflow, rename it, and then change its information.
For example, you can copy the contents of the folder products/sentinel3-ampli-ice-sheet-elevation/, rename the copy to products/New_Project_Name/, and edit its values.
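
The copy-and-rename step can also be scripted. This is a rough sketch, assuming a local clone of your fork; copy_entry is a hypothetical helper, and it only updates the id, so every other field still has to be edited by hand.

```python
import json
import shutil
from pathlib import Path


def copy_entry(repo_root, kind, source_id, new_id):
    """Copy <kind>/<source_id>/ to <kind>/<new_id>/ and set the new id in its JSON file."""
    src = Path(repo_root) / kind / source_id
    dst = Path(repo_root) / kind / new_id
    shutil.copytree(src, dst)  # raises FileExistsError if new_id is already taken
    for json_file in dst.glob("*.json"):
        entry = json.loads(json_file.read_text(encoding="utf-8"))
        entry["id"] = new_id  # the folder name and the id should match
        json_file.write_text(
            json.dumps(entry, indent=2, ensure_ascii=False) + "\n",
            encoding="utf-8",
        )
    return dst
```

For example, copy_entry(".", "products", "sentinel3-ampli-ice-sheet-elevation", "my-new-product") would create products/my-new-product/ as a starting point; the new id here is only a placeholder.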

2.1 Add new Project

Projects are the containers that hold the top-level information about your work. A project is the first type of entry you should provide. Typically, an OSC project corresponds to a project financed by the European Space Agency's Earth Observation programme. Before creating a new project, check whether your project is already on the list of onboarded projects. If it is, you can use the existing project entry and only update it where needed.

| Field | Description | STAC representation |
| --- | --- | --- |
| Project_ID | Numeric identifier | |
| Status | “ongoing” or “completed” | osc:status property |
| Project_Name | Name | title property |
| Short_Description | | description property |
| Website | | link |
| Eo4Society_link | | link |
| Consortium | | contacts[].name property |
| Start_Date_Project | | extent.temporal[] property |
| End_Date_Project | | extent.temporal[] property |
| TO | | contacts[].name property |
| TO_E-mail | | contacts[].emails[].value property |
| Theme1 - Theme6 | Theme identifiers | osc:themes property |

The metadata of each project is stored in a folder named after its unique id (the collection id). Each folder contains one file, collection.json, which holds all the project information (metadata). Have a look at the structure of a Project entry below (with example values filled in):

{
    // Annotated example: JSON itself does not allow comments, so remove every // comment before saving collection.json.
    "type": "Collection", // STAC type specification. You don't need to change this.
    "stac_version": "1.1.0", // STAC version specification. You don't need to change this.
    "id": "worldcereal2", // Your project id. It must be unique. The parent folder of collection.json must have the same name as this id (not displayed in the browser).
    "title": "WorldCereal2", // Title of your project (displayed publicly). The official project acronym may be used as well.
    "description": "WorldCereal is an ESA initiative that provides global cropland and crop type maps at 10-meter resolution, offering seasonally updated data on temporary crops, croptypes (maize, winter cereals and spring cereals), and irrigation.", // A short but meaningful description of your project.
    "links": [ // Links to related elements of the catalog. The first two links should always be present and are always the same.
        {"href": "../../catalog.json",
         "rel": "root",
         "title": "Open Science Catalog",
         "type": "application/json"},
        {"href": "../catalog.json",
         "rel": "parent",
         "title": "Projects",
         "type": "application/json"},
        // The next two links are external links to project websites. They are mandatory and you have to adapt them to your project.
        {"href": "https://esa-worldcereal.org/en", // your dedicated project page
         "rel": "via",
         "title": "Website"},
        {"href": "https://eo4society.esa.int/projects/worldcereal-global-crop-monitoring-at-field-scale/", // link to the project page on the EO4Society website
         "rel": "via",
         "title": "EO4Society Link"},
        // The next link points to the theme specified in the themes field below. A link is mandatory for every theme listed in the themes array.
        {"href": "../../themes/land/catalog.json", // related theme of the project
         "rel": "related",
         "title": "Theme: Land",
         "type": "application/json"}
    ],

    "themes": [ // Array of the ESA themes the project relates to. Values are restricted to the themes available in the OSC, and at least one theme is mandatory. Check the available themes here: https://opensciencedata.esa.int/themes/catalog
        {"concepts": [{"id": "land"}],
         "scheme": "https://github.com/stac-extensions/osc#theme"}
    ],

    "stac_extensions": [ // Schemas the project information is validated against. Typically you would not change these.
        "https://stac-extensions.github.io/osc/v1.0.0/schema.json",
        "https://stac-extensions.github.io/themes/v1.0.0/schema.json",
        "https://stac-extensions.github.io/contacts/v0.1.1/schema.json"
    ],
    "osc:status": "completed", // Status of the project. Select from: completed, ongoing, scheduled.
    "osc:type": "project", // Type of OSC STAC collection. For projects this should always be project.
    "updated": "2025-07-14T17:03:29Z", // when the last update was made
    "extent": {"spatial": {"bbox": [[-180.0, -90.0, 180.0, 90.0]]}, // the study area of the project and its planned duration
               "temporal": {"interval": [["2021-01-01T00:00:00Z",
                                          "2021-12-31T23:59:59Z"]]}},
    "license": "proprietary", // Top-level license of the project outcomes. Should be one of https://github.com/ESA-EarthCODE/open-science-catalog-validation/blob/main/schemas/license.json

    // List of consortium members working on the project and the contact of the ESA TO following the project. This field is required.
    "contacts": [
        {"emails": [{"value": "Zoltan.Szantoi@esa.int"}],
         "name": "Zoltan Szantoi",
         "roles": ["technical_officer"]},
        {"name": "VITO Remote Sensing", "roles": ["consortium_member"]}
    ]
}
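
Because every theme listed in the themes array needs a matching related link, a small consistency check can catch omissions before you commit. This is a sketch only: missing_theme_links is a hypothetical helper, and it assumes theme links follow the ../../themes/<id>/catalog.json pattern used in the example above.

```python
def missing_theme_links(entry):
    """Return theme ids listed under "themes" that have no matching "related" link."""
    theme_ids = [
        concept["id"]
        for theme in entry.get("themes", [])
        for concept in theme.get("concepts", [])
    ]
    linked = {
        link.get("href")
        for link in entry.get("links", [])
        if link.get("rel") == "related"
    }
    # Assumed href pattern, taken from the example entry in this tutorial.
    return [t for t in theme_ids if f"../../themes/{t}/catalog.json" not in linked]
```

An empty list means every theme has its link; any ids returned still need a related link added to the links array.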

In addition to specifying the links within the project's collection.json (created above), you also need to add an entry to the parent catalog, which lists all projects, so that your project is rendered correctly in the STAC Browser. To do so, add the link below to the links array in projects/catalog.json, just after the last project entry.
You can edit catalog.json directly by copying and pasting the following link, updated with the values from your collection.json:

{
    "rel": "child",
    "href": "./{project_id}/collection.json", // use the collection id of the project
    "type": "application/json",
    "title": "{project_title}" // title of the project, as given in the collection.json file created before
}
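
If you prefer not to edit catalog.json by hand, the same link can be appended with a few lines of Python. This is a minimal sketch, assuming a local clone of your fork; add_child_link is a hypothetical helper, and it writes the link with the href and type keys that STAC link objects use in JSON.

```python
import json
from pathlib import Path


def add_child_link(catalog_path, entry_id, title, filename="collection.json"):
    """Append a child link for entry_id to a catalog.json unless it is already listed."""
    path = Path(catalog_path)
    catalog = json.loads(path.read_text(encoding="utf-8"))
    href = f"./{entry_id}/{filename}"
    links = catalog.setdefault("links", [])
    if any(link.get("href") == href for link in links):
        return False  # already present, nothing to do
    links.append(
        {"rel": "child", "href": href, "type": "application/json", "title": title}
    )
    path.write_text(
        json.dumps(catalog, indent=2, ensure_ascii=False) + "\n", encoding="utf-8"
    )
    return True
```

For the example project this would be called as add_child_link("projects/catalog.json", "worldcereal2", "WorldCereal2"). Note that rewriting the file with json.dumps may change its indentation, so review the resulting diff before committing.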

2.2 Add new Product

Products represent the outputs of your projects and typically reference datasets. Like Projects, they are STAC Collections and follow a similar structure, with some additional fields that improve their findability.

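| Field | Description | STAC representation |
| --- | --- | --- |
| ID | Numeric identifier | |
| Status | “ongoing” or “completed” | osc:status property |
| Project | The project identifier | osc:project property, collection link |
| Website | | link |
| Product | Name | link |
| Short_Name | | identifier |
| Description | | description property |
| Access | URL | link |
| Documentation | URL | link |
| Version | | version property |
| DOI | Digital Object Identifier | sci:doi property and cite-as link |
| Variable | Variable identifier | collection link |
| Start | | extent.temporal[] |
| End | | extent.temporal[] |
| Region | | osc:region property |
| Polygon | | geometry |
| Released | | created property |
| Theme1 - Theme6 | Theme identifiers | osc:themes property |
| EO_Missions | Semi-colon separated list of missions | osc:missions property |
| Standard_Name | | cf:parameter.name property |

As with projects, the metadata of each product is stored in a collection.json file inside a folder named after the product id. The structure of a Product entry is shown below (with example values filled in):
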
{
    // Annotated example: JSON itself does not allow comments, so remove every // comment before saving collection.json.
    "type": "Collection", // STAC type specification. You don't need to change this.
    "id": "worldcereal-crop-extent-belgium2", // Unique id of the product. Typically contains the dataset title + project name (or acronym).
    "stac_version": "1.0.0", // STAC version specification. You don't need to change this.
    "stac_extensions": [ // Schemas the product information is validated against. Typically you would not change these.
        "https://stac-extensions.github.io/osc/v1.0.0/schema.json",
        "https://stac-extensions.github.io/themes/v1.0.0/schema.json",
        "https://stac-extensions.github.io/cf/v0.2.0/schema.json"
    ],
    "created": "2025-07-14T17:37:16Z", // initial creation date
    "updated": "2025-07-14T17:37:16Z", // date of the last update
    "title": "WorldCereal Crop Extent - Belgium2", // product title
    "description": "WorldCereal is an ESA initiative that provides global cropland and crop type maps at 10-meter resolution, offering seasonally updated data on temporary crops, croptypes (maize, winter cereals and spring cereals), and irrigation. This dataset provides the outputs for Belgium.", // Short but meaningful product description. It should give external users enough information about the specific product.

    "extent": {"spatial": {"bbox": [[-180.0, -90.0, 180.0, 90.0]]}, // the spatial and temporal extent of the product
               "temporal": {"interval": [["2021-01-01T00:00:00Z",
                                          "2021-12-31T23:59:59Z"]]}},
    "keywords": ["Crops", "Cereal"], // List of keywords associated with the product. These are expected to be in line with the description.

    "osc:project": "worldcereal2", // Unique id of the OSC project this product is associated with. It must be the id used in ./projects/(collectionid).
    "osc:region": "Belgium", // text description of the study area
    "osc:status": "ongoing", // product status
    "osc:type": "product", // Type of OSC STAC collection. For products this should always be product.

    "links": [ // Links to different elements of the catalog. The first two links should always be present and are always the same.
        {"href": "../../catalog.json",
         "rel": "root",
         "title": "Open Science Catalog",
         "type": "application/json"},
        {"href": "../catalog.json",
         "rel": "parent",
         "title": "Products",
         "type": "application/json"},
        {"href": "../../projects/worldcereal2/collection.json", // link to the parent (associated) project
         "rel": "related",
         "title": "Project: WorldCereal2",
         "type": "application/json"},

        {"href": "../../themes/land/catalog.json", // link to the theme (scientific domain) this product is associated with
         "rel": "related",
         "title": "Theme: Land",
         "type": "application/json"},
        {"href": "../../eo-missions/sentinel-2/catalog.json", // link to the EO missions used to produce the outcomes
         "rel": "related",
         "title": "EO Mission: Sentinel-2",
         "type": "application/json"},
        {"href": "../../variables/crop-yield-forecast/catalog.json", // link to the variables specified below
         "rel": "related",
         "title": "Variable: Crop Yield Forecast",
         "type": "application/json"},

        {"href": "https://eoresults.esa.int/browser/#/external/eoresults.esa.int/stac/collections/ESA_WORLDCEREAL_SPRINGCEREALS", // link to the dataset hosted in the ESA Project Results Repository (PRR)
         "rel": "child",
         "title": "ESA WorldCereal Spring Cereals"},

        {"href": "https://eoresults.esa.int/browser/#/external/eoresults.esa.int/stac/collections/ESA_WORLDCEREAL_SPRINGCEREALS", // external link to the actual data
         "rel": "via",
         "title": "Access"},
        {"href": "https://worldcereal.github.io/worldcereal-documentation/", // external link to the data documentation
         "rel": "via",
         "title": "Documentation"}
    ],
    "osc:missions": ["sentinel-2"], // Array of ESA missions related to the product. It is mandatory and limited to missions that already exist in the OSC. To associate your product with a mission that is not on the list, create the eo-mission entry first.
    "osc:variables": ["crop-yield-forecast"], // Array of variables related to the product. It is mandatory and limited to variables that already exist in the OSC. To associate your product with a variable that is not on the list, create the variable entry first.
    "cf:parameter": [{"name": "crop-yield-forecast"}], // optional parameters following the CF conventions

    "sci:doi": "https://doi.org/10.57780/s3d-83ad619", // DOI, if already assigned

    "themes": [ // Array of the ESA themes the product relates to. Values are restricted to the themes available in the OSC, and at least one theme is mandatory.
        {"concepts": [{"id": "land"}],
         "scheme": "https://github.com/stac-extensions/osc#theme"}
    ],

    "license": "proprietary" // License of the product. Should be one of https://github.com/ESA-EarthCODE/open-science-catalog-validation/blob/main/schemas/license.json
}

In addition to specifying the links from the product to other parts of the catalog, you must also add the reverse links, as for Projects. The following links are required:

  • From products/catalog.json (listing all products in the OSC) to the Product collection.json
  • From the associated Project to the Product
  • From the associated EO-Missions catalog to the Product
  • From the associated Variables catalog to the Product
  • From the associated Themes catalog to the Product

  1. Add the Product link to products/catalog.json by pasting the following into the links array:
{
    "rel": "child",
    "href": "./worldcereal-crop-extent-belgium2/collection.json", // use the collection id of the product
    "type": "application/json",
    "title": "WorldCereal Crop Extent - Belgium2" // title of the product, as given in the collection.json file created before
}
  2. Add a link to the product in the links array of each associated element of the OSC. For example, add the following link to the parent project:
{
      "rel": "related",
      "href": "../../products/worldcereal-crop-extent-belgium2/collection.json",
      "type": "application/json",
      "title": "Product: WorldCereal Crop Extent - Belgium2"
}

Similarly, add links to the other associated OSC elements, such as eo-missions, variables, and themes.
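
The reverse links can be derived from the product itself, since every related link in the product points at a file that should link back. The sketch below illustrates the idea; both function names are hypothetical, and it assumes that internal links use the ../../ prefix shown in the example entries.

```python
def reverse_link_targets(product):
    """List the files (relative to the repository root) that should link back to a product."""
    targets = ["products/catalog.json"]
    for link in product.get("links", []):
        href = link.get("href", "")
        # Internal "related" links point at the project, themes, missions and variables.
        if link.get("rel") == "related" and href.startswith("../../"):
            targets.append(href[len("../../"):])
    return targets


def reverse_link(product):
    """Build the "related" link that the associated entries should contain."""
    return {
        "rel": "related",
        "href": f"../../products/{product['id']}/collection.json",
        "type": "application/json",
        "title": f"Product: {product['title']}",
    }
```

For each path returned by reverse_link_targets (except products/catalog.json, which takes a child link instead), add the object returned by reverse_link to that file's links array.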

2.3 Add new Workflow

Workflows are the code and processing steps associated with a project that have been used to generate a specific product. In contrast to OSC Project and Product entries, workflows follow the OGC API Records specification. However, the metadata of a workflow is also expressed in JSON format.

| Field Name | Description |
| --- | --- |
| conformsTo | An array of URIs indicating which OGC API Records specifications this record conforms to. |
| type | Indicates the GeoJSON object type. Required to be "Feature" for OGC compliance. |
| geometry | Spatial representation of the item. Set to null here, as the workflow may not be spatially explicit. |
| linkTemplates | An array of link templates as per the OGC API. Used for dynamic link generation. |
| id | Unique identifier for the workflow ('worldcereal-workflow2'). |
| links | List of external and internal references, including catalog navigation, project association, theme association, process graph, source code, and service endpoint. |
| properties.contacts | List of individuals or organizations associated with the workflow. Each contact may include name, email, and roles such as technical_officer or consortium_member. |
| properties.created | Timestamp representing when the workflow was first created (2025-07-14T18:02:13Z). |
| properties.updated | Timestamp of the most recent update to the workflow (2025-07-14T18:02:13Z). |
| properties.version | The version number of the workflow (1). |
| properties.title | A concise, descriptive title of the workflow: “ESA worldcereal global crop extent detector2”. |
| properties.description | A summary of what the workflow does: “Detects crop land at 10m resolution, trained for global use...”. |
| properties.keywords | Array of keywords to support discoverability (e.g., agriculture, crops). |
| properties.themes | Array of themes the workflow relates to. Each entry includes a concepts array with IDs (e.g., 'land') and a scheme URL. |
| properties.formats | Output formats of the workflow (e.g., GeoTIFF). |
| properties.osc:project | Project ID associated with the workflow (worldcereal2). |
| properties.osc:status | Current status of the workflow (e.g., completed). |
| properties.osc:type | Type of OSC object, expected to be workflow. |
| properties.license | License for the workflow (e.g., various). |

All data is stored in a record.json file, within a folder that has the same name as the workflow id.

{
    // Annotated example: JSON itself does not allow comments, so remove every // comment before saving record.json.
    "conformsTo": [ // OGC spec, does not need to change
        "http://www.opengis.net/spec/ogcapi-records-1/1.0/req/record-core"
    ],
    "type": "Feature", // OGC spec requirement, does not need to change
    "geometry": null, // OGC spec requirement, does not need to change
    "linkTemplates": [], // OGC spec, does not need to change
    "id": "worldcereal-workflow2", // unique workflow id

    "links": [ // Links to different parts of the catalog. The first two links should always be present and are always the same.
        {"href": "../../catalog.json",
         "rel": "root",
         "title": "Open Science Catalog",
         "type": "application/json"},
        {"href": "../catalog.json",
         "rel": "parent",
         "title": "Workflows",
         "type": "application/json"},
        {"href": "../../projects/worldcereal2/collection.json", // link to the associated project
         "rel": "related",
         "title": "Project: WorldCereal2",
         "type": "application/json"},
        {"href": "../../themes/land/catalog.json", // link to the associated themes in the themes array specified below
         "rel": "related",
         "title": "Theme: Land",
         "type": "application/json"},
        { // link to the openEO process graph that describes the workflow
         "href": "https://raw.githubusercontent.com/WorldCereal/worldcereal-classification/refs/tags/worldcereal_crop_extent_v1.0.1/src/worldcereal/udp/worldcereal_crop_extent.json",
         "rel": "openeo-process",
         "title": "openEO Process Definition",
         "type": "application/json"},
        { // external link to the full workflow codebase
         "href": "https://github.com/WorldCereal/worldcereal-classification.git",
         "rel": "git",
         "title": "Git source repository",
         "type": "application/json"},
        { // external link to the service used to run the workflow
         "href": "https://openeofed.dataspace.copernicus.eu",
         "rel": "service",
         "title": "CDSE openEO federation",
         "type": "application/json"}
    ],

    // OGC spec requirement: a properties field that contains most of the workflow metadata
    "properties": {
        "contacts": [
            {"emails": [{"value": "marie-helene.rio@esa.int"}],
             "name": "Marie-Helene Rio",
             "roles": ["technical_officer"]},
            {"name": "CNR-INSTITUTE OF MARINE SCIENCES-ISMAR (IT)",
             "roles": ["consortium_member"]},
            {"name": "+ATLANTIC – Association for an Atla (PT)",
             "roles": ["consortium_member"]}
        ],
        "created": "2025-07-14T18:02:13Z", // date of workflow creation
        "updated": "2025-07-14T18:02:13Z", // date of the last workflow update
        "version": "1", // workflow version
        "title": "ESA worldcereal global crop extent detector2", // short and meaningful title of the workflow
        "description": "Detects crop land at 10m resolution, trained for global use. Based on Sentinel-1 and 2 data...", // Short and meaningful workflow description. It should specify what the workflow does and how it can be executed.
        "keywords": ["agriculture", "crops"], // workflow keywords (to enhance the findability of the workflow)
        "themes": [ // Array of the ESA themes the workflow relates to. Values are restricted to the themes available in the OSC, and at least one theme is mandatory.
            {"concepts": [{"id": "land"}],
             "scheme": "https://github.com/stac-extensions/osc#theme"}
        ],
        "formats": [{"name": "GeoTIFF"}], // format of the workflow output
        "osc:project": "worldcereal2", // project related to the workflow
        "osc:status": "completed", // workflow status
        "osc:type": "workflow", // OSC type. For workflows this should always be workflow.
        "license": "various" // workflow license
    }
}
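
Since a workflow record has more nesting than a project or product entry, it is easy to leave a field out. The following sketch reports which of the fields described in the table above are absent from a record; missing_workflow_fields and the two constant names are hypothetical, and the list simply mirrors the table rather than the official validation schema.

```python
# Field names taken from the workflow table in this tutorial (not from the schema).
EXPECTED_TOP_LEVEL = (
    "conformsTo", "type", "geometry", "linkTemplates", "id", "links", "properties",
)
EXPECTED_PROPERTIES = (
    "contacts", "created", "updated", "version", "title", "description",
    "keywords", "themes", "formats", "osc:project", "osc:status", "osc:type",
    "license",
)


def missing_workflow_fields(record):
    """Return the names of tabulated fields that are absent from a workflow record."""
    missing = [key for key in EXPECTED_TOP_LEVEL if key not in record]
    props = record.get("properties", {})
    missing += [f"properties.{key}" for key in EXPECTED_PROPERTIES if key not in props]
    return missing
```

A field whose value is null, such as geometry, still counts as present, because the check only looks at the keys.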

In addition to specifying the links from the workflow to other parts of the catalog, it is required to add the reverse links:

  • From workflows/catalog.json (listing all workflows in the OSC) to the Workflow record.json
  • From the associated Project to the Workflow
  • From the associated Themes to the Workflow

3. Open a PR to merge the new information into the OSC.

After you have added all the information, commit and push your changes to your fork and open a pull request against the OSC repository: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request

Once you open the PR, an automatic validation is run against the information you added. If it fails, you will have to change some of the added information. You can see whether the validation succeeded from the status of the CI run on the PR page. If you click on the red X, the validator will give you the specific reason for the failure.
Once a pull request is submitted to the open-science-catalog-metadata repository, it is reviewed by members of the EarthCODE team, who evaluate the content for completeness and accuracy. If any additional information or modifications are required, you may be asked to update your PR accordingly. All communication related to the review takes place through comments within the PR.
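
Before pushing, a quick local check can catch the most basic problems, such as a file that is not valid JSON or an id that differs from its folder name. This is only a sketch and not a substitute for the CI validation; check_entries is a hypothetical helper, and the id-equals-folder rule for products is an assumption based on the examples in this tutorial.

```python
import json
from pathlib import Path


def check_entries(repo_root, kinds=("projects", "products", "workflows")):
    """Return a list of human-readable problems found in the given entry folders."""
    problems = []
    for kind in kinds:
        base = Path(repo_root) / kind
        if not base.is_dir():
            continue  # nothing to check for this kind
        for json_file in sorted(base.glob("*/*.json")):
            try:
                entry = json.loads(json_file.read_text(encoding="utf-8"))
            except json.JSONDecodeError as err:
                problems.append(
                    f"{json_file}: invalid JSON ({err.msg}, line {err.lineno})"
                )
                continue
            if entry.get("id") != json_file.parent.name:
                problems.append(
                    f"{json_file}: id {entry.get('id')!r} does not match folder name"
                )
    return problems


if __name__ == "__main__":
    for problem in check_entries("."):
        print(problem)
```

Run it from the root of your local clone; no output means that none of these basic problems were found.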

Alternatives

  • EarthCODE provides a GUI editor that automatically creates the links and opens a PR for you.
  • If you are using one of the EarthCODE platforms, they provide specialised tools to automate this work.
  • You can use libraries like pystac to automate some of the required work. This tutorial shows how.