2.1 Request permanent storage of data on ESA cloud storage

1. Storage

EarthCODE provides long-term storage of research outcomes from ESA-funded EO projects and activities, including datasets and associated documents and workflows.
Research outcomes are organised in STAC Objects (Collections), accessible via the STAC API and the Open Science Catalogue Browser.
Each Collection contains STAC Items, with their related Assets stored within the repository.

The upload process to the Project Results Repository (PRR) is currently manual, so you will have to email us with the following information:

  • Data type

  • Data size

  • ESA contract and ESA TO for your project

  • STAC items describing the files

A prerequisite for uploading the data is a description of each file as a STAC Item. For examples of how to do this, see section 3.
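To fill in the data type and data size entries of the list above, a small inventory script can help. The following is a minimal sketch using only the Python standard library; the function name `summarise_dataset` and the directory name `my_dataset` are hypothetical and not part of any EarthCODE tooling.

```python
from collections import Counter
from pathlib import Path


def summarise_dataset(root):
    """Return (total size in bytes, number of files per extension) under root."""
    total_bytes = 0
    extensions = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            total_bytes += path.stat().st_size
            # Files without an extension are grouped under "(none)".
            extensions[path.suffix.lower() or "(none)"] += 1
    return total_bytes, dict(extensions)


if __name__ == "__main__":
    # "my_dataset" is a placeholder; point it at your own data directory.
    size, types = summarise_dataset("my_dataset")
    print(f"Total size: {size / 1e9:.2f} GB")
    for ext, count in sorted(types.items()):
        print(f"{ext}: {count} file(s)")
```

The extension counts are only a rough indication of the data type; a Zarr store, for instance, is a directory containing many chunk files, so it is worth describing such layouts explicitly in your email.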

2. Format

We strongly encourage cloud-optimised formats for your data, since they make storage and access much easier. If your data is already in one of the preferred formats (COG, Parquet, Zarr, etc.), there is nothing to do for this step.

If the data is not in a cloud-optimised format, we encourage you to convert it yourselves, or to contact us so that we can help with the conversion.

3. File-level metadata

This is the most time-consuming step. There are multiple strategies for doing this. We are flexible, and it is up to you to decide how to do it, so long as the metadata conforms to the standard STAC specification.
The main consideration should be the usability of the data!
You can learn more about the STAC specification here: https://stacspec.org/en

If you are new to the STAC specification and how it applies to your dataset, we have many tutorials available from the EarthCODE Portal that can be executed from a designated workspace. The tutorials show how to generate STAC Items from the most commonly used data formats, such as netCDF, TIFF and Zarr files. You can start with the introductory tutorial, which also gives an overview of all the information provided here: https://esa-earthcode.github.io/tutorials/prr-stac-introduction/ . Note that the code in the examples does not generalise fully, so we only offer a few libraries and pointers to get you started. You have to tailor the code to your data, but the list of tutorials should generally facilitate this task. You can run all examples in the earthcode library environment.

A more manual way to create STAC Items and Asset-level data is shown in the following example (applicable to all file types, including documentation).
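As one possible illustration of the manual route, the sketch below builds a single STAC Item as a plain Python dictionary and prints it as JSON. It uses only the standard library; the helper `make_stac_item`, the item ID, the bounding box, the file name and the media type are all hypothetical placeholders, and the EarthCODE tutorials may rely on dedicated STAC libraries instead.

```python
import json
from datetime import datetime, timezone


def make_stac_item(item_id, bbox, when, asset_href, media_type):
    """Build a minimal STAC Item dict describing one file.

    bbox is [west, south, east, north]; when must be a timezone-aware datetime.
    """
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        "bbox": [west, south, east, north],
        # Footprint as a closed, counter-clockwise GeoJSON polygon ring.
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north],
                [west, north], [west, south],
            ]],
        },
        "properties": {
            # STAC expects an RFC 3339 timestamp, here expressed in UTC.
            "datetime": when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        },
        "links": [],
        "assets": {
            "data": {"href": asset_href, "type": media_type, "roles": ["data"]},
        },
    }


# All values below are placeholders; replace them with your own.
item = make_stac_item(
    item_id="example-file-2021",
    bbox=[-10.0, 35.0, 5.0, 45.0],
    when=datetime(2021, 1, 1, tzinfo=timezone.utc),
    asset_href="example-file-2021.nc",
    media_type="application/x-netcdf",
)
print(json.dumps(item, indent=2))
```

For files without a spatial footprint, such as documentation, the STAC specification allows `geometry` to be `null`, in which case `bbox` can be omitted.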

The provided examples use the Python programming language, but you are free to explore options in other programming languages if you are more comfortable with them. In that case, please share with us the STAC Collections generated by your script.

We can support you through all stages of this process; just contact us or post in the FORUM!

2.1.2.1 Adding PRR files

  • After you have sent us the data with the STAC items, and we have hosted your data and returned a link to it, you can continue the OSC process.

Fill in the URL below with the one we returned to you, then run the next section of the notebook.

from pathlib import Path

from earthcode.git_add import save_item_links_to_product_collection
from earthcode.validator import validate_catalog

# Define the relevant data links to be added manually.
# Link to the external data collection (the URL we returned to you)
item_link = ''
# Link for accessing the data; this link is required
access_link = f'https://opensciencedata.esa.int/stac-browser/#/external/{item_link}'
# Link to the documentation; set to None if not available
documentation_link = ''

# The ID of the product you created in the previous steps
product_id = ''

# Root of the local open-science-catalog-metadata folder
catalog_root = Path("../open-science-catalog-metadata").resolve()
save_item_links_to_product_collection(catalog_root, product_id, item_link, access_link, documentation_link)
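
Because `item_link` starts out as an empty placeholder, it is easy to run the cell above without filling it in. The following is a minimal sketch of a guard that builds the access link only when the input looks like a complete URL. The helper `build_access_link` is hypothetical and not part of the earthcode library, and it assumes that the link returned to you is a full `http(s)` URL.

```python
from urllib.parse import urlparse

# Prefix taken from the access_link pattern used in the notebook cell.
STAC_BROWSER_PREFIX = "https://opensciencedata.esa.int/stac-browser/#/external/"


def build_access_link(item_link):
    """Return the STAC Browser link for item_link, or raise ValueError if it looks unfilled."""
    cleaned = item_link.strip()
    parsed = urlparse(cleaned)
    # Assumption: a valid item link has an http(s) scheme and a host name.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"item_link does not look like a full URL: {item_link!r}")
    return STAC_BROWSER_PREFIX + cleaned
```

Calling `build_access_link(item_link)` in place of the f-string would then fail early with a clear message if the placeholder was left empty.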

Run validation to make sure everything works

# Run validation and stop if any errors are reported
errors, error_files = validate_catalog(catalog_root)
if errors or error_files:
    raise AssertionError(f"Catalog validation failed. errors={len(errors)} files={len(error_files)}")