Workflow Best Practices
🛠️ Page Under Development
Content is being actively developed and updated for this page. EarthCODE's documentation is a living document and will be continuously updated with detailed reviews.
This page describes the design decisions and best practices for creating and distributing your scientific workflows to maximize their value and impact within the EarthCODE ecosystem. A high-quality workflow is more than just code that runs; it is a complete, transparent, and robust scientific narrative, packaged so that others (and your future self) can easily understand, reuse, and reproduce it. This guide provides practical recommendations based on widely accepted community standards and established software development principles. The suggested guidelines are summarized in the EarthCODE Quality Workflows checklist below.
The effort put into quality assurance for research code should be proportionate to the complexity and risk of the analysis. While not every script needs production-level rigor, reproducibility is the minimum standard.
When developing and publishing your workflow, consider these best practices:
- Structure Your Project Logically — Organize files consistently.
- Use Version Control Effectively — Track changes using Git.
- Explicitly Define the Environment — List dependencies and versions.
- Tell a Story (in Notebooks) — Explain context/methods/results in Markdown.
- Modularize and Refactor Code — Avoid duplication; use functions/modules.
- Adopt a Consistent Coding Style — Follow style guides (e.g., PEP 8).
- Build a Reproducible Analytical Pipeline — Design for automation; configure externally.
- Implement Basic Testing — Include basic code checks.
- Ensure Executability — Package code and environment for reuse.
- Link Code Version to Results — Link code versions to results via Experiments.
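As a rough illustration of three items from the checklist (external configuration, modular functions, and a basic test), the following minimal sketch may help. It is not an EarthCODE API: the names `Config`, `load_config`, `rescale`, `mask_below`, and `run_pipeline`, and the parameters `threshold` and `scale`, are all hypothetical and chosen only for the example.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Analysis parameters kept outside the code (hypothetical names)."""
    threshold: float
    scale: float


def load_config(text: str) -> Config:
    """Parse a JSON configuration string into a Config object."""
    raw = json.loads(text)
    return Config(threshold=float(raw["threshold"]), scale=float(raw["scale"]))


def rescale(values, scale):
    """Multiply every value by a constant factor."""
    return [v * scale for v in values]


def mask_below(values, threshold):
    """Keep only the values greater than or equal to the threshold."""
    return [v for v in values if v >= threshold]


def run_pipeline(values, config):
    """Compose the small steps; parameters come from the config, not the code."""
    return mask_below(rescale(values, config.scale), config.threshold)


def test_run_pipeline():
    """A basic check that can be run with pytest or called directly."""
    config = load_config('{"threshold": 1.0, "scale": 2.0}')
    assert run_pipeline([0.25, 0.5, 1.0], config) == [1.0, 2.0]


if __name__ == "__main__":
    test_run_pipeline()
```

In a real project the configuration would more likely live in a file under version control, so that a parameter change is recorded alongside the code that used it.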
How Research Code Differs
Research code often differs significantly from traditional software development. It's frequently written by domain experts, like scientists or analysts, whose main goal is to answer a specific research question, generate insights from data, or test a hypothesis. This contrasts with building a long-lasting production service.
A key characteristic is its exploratory nature; much research code starts this way, evolving rapidly as understanding grows, which can initially lead to less structured code compared to production software. The primary focus is often on obtaining scientifically correct results and insights, sometimes taking precedence over optimal software engineering practices like extensive testing or user interface design. While some research code might be developed for a single analysis or publication, increasingly, workflows are designed for reuse and adaptation. Crucially, unlike many commercial applications, the ability for others (and the original author) to exactly reproduce the results from the code and data is a fundamental requirement for scientific validity. Understanding these differences helps in applying quality assurance practices appropriately.
Why Focus on Quality Research Code
Although research code has unique characteristics, focusing on its quality is vital. High-quality, well-documented code is essential for others, including your future self, to trust your results. It forms the foundation for reproducibility – the ability to run the same analysis with the same data and get the same outcome, which is the cornerstone of scientific validation.
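One simple way to check this property in practice is to run the analysis twice with the same inputs and compare a fingerprint of the results. The sketch below assumes a toy analysis (a bootstrap mean) and hypothetical helper names `run_analysis` and `fingerprint`; it only illustrates the idea of fixing the random seed and hashing the output, and is not a prescribed EarthCODE procedure.

```python
import hashlib
import json
import random


def run_analysis(data, seed):
    """Toy bootstrap mean, standing in for a real analysis step."""
    rng = random.Random(seed)  # local generator, so no hidden global state
    means = []
    for _ in range(100):
        sample = [rng.choice(data) for _ in data]
        means.append(sum(sample) / len(sample))
    return {"bootstrap_mean": sum(means) / len(means)}


def fingerprint(result):
    """Hash a JSON-serialisable result so that two runs can be compared."""
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0]
    first = fingerprint(run_analysis(data, seed=42))
    second = fingerprint(run_analysis(data, seed=42))
    # Same data and same seed should give an identical fingerprint.
    print("reproducible:", first == second)
```

Byte-identical results are a strict test. For floating-point output computed on different machines or library versions, comparing within a tolerance may be more appropriate.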
Clean, understandable code makes it easier for peers and collaborators to review your methods, verify your implementation, and identify potential errors or improvements. Well-structured and documented code is also easier to adapt for new datasets or research questions. Investing time in quality upfront prevents "technical debt" and saves significant effort later by avoiding the need to rewrite or debug poorly written code, enabling efficient building upon previous work.
Sharing high-quality code alongside data supports transparency and open science, allowing the broader community to understand, scrutinize, and benefit from your work. It also aligns with funding agency requirements for quality and auditability. Applying proportionate quality assurance practices, even to exploratory code, ultimately increases the reliability, impact, and longevity of your research.