3. Workflow
Workflows are formal wrappers for scientific code that can run in a specific container or environment in the cloud. In the Open Science Catalogue, workflows are associated with a project and a scientific product to provide transparent information about the process used to generate the scientific results. Unlike OSC Project and Product entries, workflows follow the OGC record specification. Workflow metadata is nevertheless also expressed in JSON. It describes how a workflow can be used in general by experiments, for example the required input parameters and their acceptable values.
To discover the specification used in the workflows, explore the documentation here: https://esa-earthcode.github.io/tutorials/osc-pr-manual/#id-2-3-add-new-workflow
To compare the list of required parameters and their format, see the example workflow metadata in the Open Science Catalogue metadata repository on GitHub: FAIRSenDD workflow
LICENSE: In this step you are required to select one of the available licenses for your workflow. Please have a look at the list of available licenses and pick the one that fits your needs and those of potential users: osc-licence schemas.
If your workflow has no defined license, we cannot proceed with publishing it. Please use the SPDX license list and select the most appropriate one.
Visit EarthCODE Best Practices to learn more about Open Data & Licensing.
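The license requirement can be checked before any record is built. The sketch below is illustrative only: `KNOWN_SPDX_IDS` is a hypothetical, deliberately small subset of SPDX identifiers, and `check_license` is not part of the earthcode package. The set of licenses actually accepted by the catalog is the one defined in the osc-licence schemas.

```python
# Hypothetical, deliberately tiny subset of SPDX identifiers for illustration.
# The catalog's own license schema is authoritative, not this set.
KNOWN_SPDX_IDS = {
    "CC-BY-4.0",
    "CC0-1.0",
    "MIT",
    "Apache-2.0",
    "BSD-3-Clause",
    "GPL-3.0-only",
}


def check_license(license_id: str) -> str:
    """Return the cleaned identifier if it looks acceptable, else raise ValueError."""
    candidate = license_id.strip()
    if not candidate:
        raise ValueError("A workflow without a license cannot be published.")
    # The notebook comments mention 'various' for workflows with multiple licenses.
    if candidate == "various" or candidate in KNOWN_SPDX_IDS:
        return candidate
    raise ValueError(f"Unknown license identifier: {candidate!r}")


print(check_license("CC-BY-4.0"))
```

Failing early on an empty or misspelled identifier is cheaper than discovering the problem during catalog validation.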
This notebook shows how to create an OSC workflow record using the current earthcode API, save it in a local OSC catalog clone, and validate the full catalog.
from datetime import datetime
from pathlib import Path
from earthcode.metadata_input_definitions import WorkflowMetadata
from earthcode.static import create_workflow_record
from earthcode.git_add import save_workflow_record_to_osc
from earthcode.validator import validate_catalog
# BASIC INFORMATION ABOUT THE WORKFLOW
# A custom id for the workflow (must differ from the project and product ids!). It can be related to the title, e.g. world-cereal-algorithm.
# Use the dash "-" symbol to separate words in the id.
workflow_id = ""
workflow_title = ""
workflow_description = ""
# Define at most five keywords for the workflow. You can use any short text that allows users to discover your workflow.
workflow_keywords = ["", ""]
# Define the license of the workflow, e.g. CC-BY-4.0. See the note in the markdown cell above for the full list of available licenses.
# If you have multiple licenses, you can pick 'various'
workflow_license = "CC-BY-4.0"
# The data formats the workflow takes as input and produces as output, e.g. GeoTIFF, NetCDF
workflow_formats = ["netcdf64"]
# Define which project the workflow is associated with
# If you are adding to an existing project, see the ids and titles here:
# - https://github.com/ESA-EarthCODE/open-science-catalog-metadata/projects/
# These must match a new or an already existing project in the catalog! Otherwise correct links cannot be produced.
project_id = ""
project_title = ""
# Define workflow themes, e.g. land. Pick one or more from:
# - atmosphere, cryosphere, land, magnetosphere-ionosphere, oceans, solid-earth.
# See the list here: https://github.com/ESA-EarthCODE/open-science-catalog-metadata/blob/main/themes/catalog.json
workflow_themes = [""]
# List the contacts as tuples of the form (name, contact_email), for example ('Magellium', "contact@magellium.fr")
workflow_contracts_info = [("", "")]
# URL of the repository where the workflow code can be accessed. Provide an active URL below:
codeurl = ""
# Optional workflow record fields
workflow_doi = None
# Global extent: latitudes (south/north) lie in [-90, 90], longitudes (west/east) in [-180, 180]
workflow_s = -90.0
workflow_w = -180.0
workflow_n = 90.0
workflow_e = 180.0
workflow_start_year = 2021
workflow_start_month = 1
workflow_start_day = 1
workflow_end_year = 2021
workflow_end_month = 12
workflow_end_day = 31
include_workflow_bbox = False
include_workflow_time = False
# Local OSC clone root path (assumed one folder above repository root)
catalog_root = str(Path("../open-science-catalog-metadata").resolve())
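Before creating the record, the values entered above can be sanity-checked. The following is a minimal sketch, not part of the earthcode package: `check_workflow_inputs` and `ID_PATTERN` are hypothetical names, and the lowercase dash-separated id pattern is an assumption drawn from the `world-cereal-algorithm` example rather than an official rule. The bounding box is expected in the order [west, south, east, north], the same order used when the record is built.

```python
import re

# Lowercase words separated by single dashes, e.g. "world-cereal-algorithm".
# Assumed pattern for illustration; the catalog may accept other ids.
ID_PATTERN = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def check_workflow_inputs(workflow_id, keywords, bbox=None):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    if not ID_PATTERN.match(workflow_id):
        problems.append(f"id {workflow_id!r} should be dash-separated lowercase words")
    # Ignore empty placeholder strings when counting keywords.
    non_empty = [k for k in keywords if k.strip()]
    if len(non_empty) > 5:
        problems.append("at most five keywords are allowed")
    if bbox is not None:
        west, south, east, north = bbox
        if not (-180.0 <= west <= 180.0 and -180.0 <= east <= 180.0):
            problems.append("longitudes must lie in [-180, 180]")
        if not (-90.0 <= south <= north <= 90.0):
            problems.append("latitudes must lie in [-90, 90] with south <= north")
    return problems


print(check_workflow_inputs("world-cereal-algorithm", ["crops", "land"],
                            [-180.0, -90.0, 180.0, 90.0]))
```

Returning a list of problems instead of raising on the first one lets all issues be fixed in a single pass.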
Create Workflow record
ℹ️ Note
This function creates a workflow record and automatically generates record.json and all required links.
It connects the workflow with the related project and themes, and updates existing entries as needed.
Run the cell below to create the new entry automatically.
catalog_root = Path(catalog_root)
workflow_metadata = WorkflowMetadata(
    workflow_id=workflow_id,
    workflow_title=workflow_title,
    workflow_description=workflow_description,
    workflow_license=workflow_license,
    workflow_keywords=workflow_keywords,
    workflow_formats=workflow_formats,
    workflow_themes=workflow_themes,
    codeurl=codeurl,
    project_id=project_id,
    project_title=project_title,
    workflow_doi=workflow_doi,
    workflow_bbox=[[workflow_w, workflow_s, workflow_e, workflow_n]] if include_workflow_bbox else None,
    workflow_start_datetime=datetime(workflow_start_year, workflow_start_month, workflow_start_day) if include_workflow_time else None,
    workflow_end_datetime=datetime(workflow_end_year, workflow_end_month, workflow_end_day) if include_workflow_time else None,
)
workflow_record = create_workflow_record(workflow_metadata)
save_workflow_record_to_osc(workflow_record, catalog_root)
errors, error_files = validate_catalog(catalog_root)
if errors or error_files:
    raise AssertionError(f"Catalog validation failed. errors={len(errors)} files={len(error_files)}")
print(f"Saved workflow: {workflow_record['id']}")
print("Catalog validation passed.")
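Once the record has been saved, it can be useful to read it back and inspect what was written. The sketch below assumes the record is stored at `<catalog_root>/workflows/<workflow_id>/record.json`. That layout and the `load_workflow_record` helper are assumptions for illustration and are not confirmed by the earthcode API. To stay runnable without a catalog clone, the demo writes a placeholder record into a temporary directory.

```python
import json
import tempfile
from pathlib import Path


def load_workflow_record(catalog_root, workflow_id):
    """Read a saved workflow record from a local catalog clone.

    Assumes the layout <catalog_root>/workflows/<workflow_id>/record.json.
    """
    record_path = Path(catalog_root) / "workflows" / workflow_id / "record.json"
    with record_path.open(encoding="utf-8") as handle:
        return json.load(handle)


# Self-contained demo against a throwaway directory with a placeholder record.
# The placeholder content is hypothetical and far smaller than a real record.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp) / "workflows" / "demo-workflow"
    demo_dir.mkdir(parents=True)
    placeholder = {"id": "demo-workflow", "links": []}
    (demo_dir / "record.json").write_text(json.dumps(placeholder), encoding="utf-8")

    record = load_workflow_record(tmp, "demo-workflow")
    print(record["id"], len(record["links"]))
```

With a real clone, the call would take the `catalog_root` and `workflow_id` values defined earlier in the notebook.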