Overview

DOI: https://doi.org/10.5281/zenodo.17122517

The Lifecycle Model and Guidelines component provides structured, workflow-oriented guidance for addressing bias at each stage of the dataset creation process.

Structure and Content

This document provides guidelines for tasks that feature throughout the dataset lifecycle. It is organised into five key stages [1]:

1. Set Up: Project conceptualisation, planning, and funding

2. Collection: Gathering and organising relevant data

3. Process: Preparing data for analysis

4. Analyse: Exploring relationships and patterns in the data

5. Preserve & Share: Publishing, preserving, and contextualising the dataset

Each stage starts with a recommendation for what actions to take and what to include in your documentation for that part of the dataset creation process. Each task then presents expressions of bias to consider, framed as questions, together with recommended actions and resources to help you articulate bias in your work.

Documentation

We strongly recommend documenting how you address each of the questions and actions outlined in this framework. Each stage of the lifecycle includes a ‘recommended documentation’ section to ensure methodological rigour and transparency. This documentation should be published and openly available for future reference and audit.

Comprehensive Data Documentation

Good documentation should detail how data is curated: who collected it and when, the collection methodology, funding sources, and how data was reused, integrated, or preprocessed. Include any limitations or known biases in original sources and document your team’s rationale for key decisions. This transparency addresses the crisis of shared understanding around bias across disciplines and helps identify both discrimination (unfair treatment or representation) and opacity (obscured processes or perspectives).
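As a rough illustration, the items listed above could be captured in a small structured record that flags which parts of the documentation are still empty. This is only a sketch: the class name, the field names, and the `missing_fields` helper are hypothetical and are not prescribed by the guidelines, and the example values are invented.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DatasetDocumentation:
    """Hypothetical record; field names are illustrative assumptions only."""
    collected_by: str = ""
    collected_when: str = ""
    collection_methodology: str = ""
    funding_sources: list[str] = field(default_factory=list)
    reuse_and_preprocessing: str = ""
    known_limitations: list[str] = field(default_factory=list)
    decision_rationale: str = ""


def missing_fields(doc: DatasetDocumentation) -> list[str]:
    """Return the names of fields that are still empty, in declaration order."""
    return [f.name for f in fields(doc) if not getattr(doc, f.name)]


# Invented example values, for illustration only.
doc = DatasetDocumentation(
    collected_by="Example project team",
    collected_when="2023-2024",
    collection_methodology="Transcription of digitised archival records",
)

print(missing_fields(doc))
```

A check like this does not judge the quality of the documentation; it only makes gaps visible so the team can discuss them before publication.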

Implementation Framework

The guidelines provide concrete steps that researchers can implement at each stage of their work. By following this structured approach, teams can systematically identify and address biases that might otherwise go unnoticed until later stages of the research process. Document your bias mitigation strategies using a Good-Better-Best approach, which sets out progressively more demanding strategies according to the resources available. We recommend creating your own schema (see template) that aligns with your research project. This process helps teams collectively identify and discuss project-specific biases while contributing to shared methodological understanding across the research community.
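One possible way to hold a Good-Better-Best schema in a machine-readable form is sketched below, keyed by the five lifecycle stages. It does not reproduce the project's template: the tier keys, the function names, and the example strategies are all assumptions made for illustration.

```python
from enum import Enum


class Stage(Enum):
    """The five lifecycle stages named in this document."""
    SET_UP = "Set Up"
    COLLECTION = "Collection"
    PROCESS = "Process"
    ANALYSE = "Analyse"
    PRESERVE_AND_SHARE = "Preserve & Share"


# Hypothetical tier keys for a Good-Better-Best schema.
TIERS = ("good", "better", "best")


def make_schema() -> dict:
    """Create an empty schema: one list of strategies per stage and tier."""
    return {stage: {tier: [] for tier in TIERS} for stage in Stage}


def add_strategy(schema: dict, stage: Stage, tier: str, strategy: str) -> None:
    """Record a mitigation strategy under the given stage and tier."""
    if tier not in TIERS:
        raise ValueError(f"Unknown tier: {tier!r}")
    schema[stage][tier].append(strategy)


def stages_without_strategies(schema: dict) -> list[str]:
    """List the stages for which no strategy has been recorded yet."""
    return [stage.value for stage, tiers in schema.items()
            if not any(tiers.values())]


# Invented example strategies, for illustration only.
schema = make_schema()
add_strategy(schema, Stage.COLLECTION, "good",
             "Record known gaps in source coverage")
add_strategy(schema, Stage.COLLECTION, "best",
             "Commission an external review of source selection")

print(stages_without_strategies(schema))
```

Keeping the tiers as separate lists lets a team start with whatever is feasible and add more demanding strategies later without restructuring the record.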

Relationship to Other Components

While the Bias Vocabulary examines bias expressions conceptually, this Lifecycle Model takes a process-oriented approach, integrating bias considerations directly into standard dataset creation workflows. Together, these components provide both theoretical understanding and practical implementation strategies.

Limitations

We acknowledge that we will have inadvertently missed certain considerations of bias, as well as forms of bias that appear in other fields, such as cognitive biases. This overview is not intended to be, nor should it be interpreted as, conclusive or comprehensive. For any queries, comments, or feedback, please feel free to contact us.


  1. This model builds upon Research Data Alliance (July 2024), The creation of a harmonised research data lifecycle (RDL) model and crosswalk to existing models. https://www.rd-alliance.org/wp-content/uploads/2024/09/D1_The-creation-of-a-harmonised-research-data-lifecycle-RDL-model-and-crosswalk-to-existing-models-.pdf