SDRF-Proteomics

1. Status of this Template

This document provides guidelines for annotating cell line-based proteomics experiments in SDRF-Proteomics format. This template extends the core SDRF-Proteomics specification with cell line-specific metadata fields and resources.

Status: Released

Version: 1.1.0 - 2026-01

2. Abstract

Cell lines are extensively used in proteomics research for biological studies and technology development. They present unique annotation challenges: metadata (sex, age, disease) refers to the original donor, multiple naming conventions exist (HeLa, HELA, He-La), and cell lines may be misidentified or have undergone genetic drift.

This template defines standardized approaches for annotating cell line experiments, using Cellosaurus as the PRIMARY standard for cell line identification and metadata retrieval.

Important
Cellosaurus is the REQUIRED primary standard for cell line annotation in SDRF-Proteomics. The characteristics[cellosaurus accession] column (CVCL_XXXX format) is REQUIRED for all cell line experiments. Do NOT use EFO, CLO, or BTO as primary identifiers - these are accepted only for legacy compatibility and cross-reference purposes.

3. Cellosaurus: The Primary Cell Line Resource

Cellosaurus is a comprehensive knowledge resource on cell lines maintained by the SIB Swiss Institute of Bioinformatics. It provides:

  • Standardized cell line names

  • Unique accession numbers (CVCL_XXXX format)

  • Cross-references to other databases

  • Information about species, diseases, and cell types

  • Documentation of cell line problems (contamination, misidentification)

Important
While Cellosaurus is not a formal ontology, it is the most comprehensive and actively maintained resource for cell line information. SDRF-Proteomics therefore REQUIRES Cellosaurus accession numbers for cell line identification.

3.1. Cellosaurus Accession Format

Cellosaurus accessions follow the format CVCL_XXXX where XXXX is a unique identifier:

Cell Line | Cellosaurus Name | Cellosaurus Accession
HeLa | HeLa | CVCL_0030
HEK293 | HEK293 | CVCL_0045
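A format check can catch typos in accessions before validation. The following is a minimal sketch, assuming that XXXX is always four uppercase letters or digits; this pattern and the helper name `is_cellosaurus_accession` are illustrative assumptions, not part of this specification.

```python
import re

# Assumed pattern: "CVCL_" followed by four uppercase letters or digits.
CVCL_PATTERN = re.compile(r"CVCL_[A-Z0-9]{4}")


def is_cellosaurus_accession(value: str) -> bool:
    """Return True if value has the CVCL_XXXX shape (surrounding spaces ignored)."""
    return CVCL_PATTERN.fullmatch(value.strip()) is not None


for candidate in ["CVCL_0030", "CVCL_0045", "cvcl_0030", "CVCL-0030", "HeLa"]:
    print(candidate, is_cellosaurus_accession(candidate))
```

A match only shows that the string is well formed; it does not confirm that the accession exists in Cellosaurus or belongs to the named cell line.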

4. SDRF Cell Line Metadata Database

To facilitate consistent annotation of cell line experiments, a curated database of cell line metadata is available:

5. Checklist

This section defines the metadata columns required and recommended for cell line experiments.

5.1. Required Columns

The following columns are REQUIRED for cell line experiments:

Column | Requirement | Description | Ontology | Example Values
characteristics[cell line] | REQUIRED | Name of the cell line | Cellosaurus | HeLa, HEK293, K562
characteristics[organism] | REQUIRED | Species of the cell line | NCBITaxon | homo sapiens, mus musculus
characteristics[organism part] | REQUIRED | Tissue/organ of origin | UBERON | cervix, kidney, blood
characteristics[disease] | REQUIRED | Disease state of original tissue | MONDO | cervical adenocarcinoma, normal, chronic myelogenous leukemia

5.2. Required Cellosaurus Columns

The following Cellosaurus columns are REQUIRED for cell line experiments:

Column | Requirement | Description | Ontology | Example Values
characteristics[cellosaurus accession] | REQUIRED | Cellosaurus unique identifier | Cellosaurus | CVCL_0030, CVCL_0045, CVCL_0004
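The required columns from Sections 5.1 and 5.2 can be checked against a file header before running full validation. This is a minimal sketch, not a replacement for the sdrf-pipelines validator: it assumes a tab-separated file whose first line is the header, and it compares column names case-insensitively (an assumption of this example). The function name `missing_required_columns` is hypothetical.

```python
import csv
import io

# Required columns listed in Sections 5.1 and 5.2 of this template.
REQUIRED_COLUMNS = [
    "characteristics[cell line]",
    "characteristics[organism]",
    "characteristics[organism part]",
    "characteristics[disease]",
    "characteristics[cellosaurus accession]",
]


def missing_required_columns(sdrf_text: str) -> list[str]:
    """Return the required columns that are absent from the header line."""
    reader = csv.reader(io.StringIO(sdrf_text), delimiter="\t")
    header = next(reader, [])
    present = {name.strip().lower() for name in header}
    return [column for column in REQUIRED_COLUMNS if column not in present]


example = (
    "source name\tcharacteristics[cell line]\t"
    "characteristics[cellosaurus accession]\tcharacteristics[organism]\t"
    "characteristics[disease]\tassay name\n"
    "HeLa_ctrl_rep1\tHeLa\tCVCL_0030\thomo sapiens\t"
    "cervical adenocarcinoma\tHeLa_ctrl_run1\n"
)
print(missing_required_columns(example))
# prints ['characteristics[organism part]']
```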

5.3. Recommended Columns

The following columns are RECOMMENDED for cell line experiments:

Column | Requirement | Description | Ontology | Example Values
characteristics[cellosaurus name] | RECOMMENDED | Official Cellosaurus name | Cellosaurus | HeLa, HEK293, K-562
characteristics[cell type] | RECOMMENDED | Cell type classification | Cell Ontology (CL) | epithelial cell, fibroblast, lymphoblast
characteristics[sex] | RECOMMENDED | Sex of original donor | PATO | female, male, not available
characteristics[age] | RECOMMENDED | Age of donor at sample collection | Free text | 31Y, not available, fetal
characteristics[sampling site] | RECOMMENDED | Specific anatomical sampling location | UBERON | cervix uteri, fetal kidney, peripheral blood
comment[passage number] | RECOMMENDED | Passage number of cells used | EFO:0007061 | P15, P20-25, low passage
comment[cell line source] | RECOMMENDED | Source/provider of cell line | EFO:0004443 | ATCC, DSMZ, ECACC, in-house

5.4. Optional Columns

Column | Requirement | Description | Ontology | Example Values
characteristics[developmental stage] | OPTIONAL | Developmental stage of donor | UBERON | adult, fetus, embryo
characteristics[ancestry category] | OPTIONAL | Ancestry of donor if known | HANCESTRO | African American, European, not available
comment[cell line modifications] | OPTIONAL | Genetic or other modifications | EFO:0000510 | CRISPR knockout of TP53, GFP-tagged, parental

6. Understanding Metadata Sources

Cell line metadata comes from two sources:

6.1. Database-Derived Metadata

These fields describe the original cell line and SHOULD be obtained from Cellosaurus or the SDRF Cell Line Metadata Database:

  • characteristics[organism] - Species

  • characteristics[organism part] - Tissue of origin

  • characteristics[disease] - Disease of original tissue

  • characteristics[sex] - Sex of donor

  • characteristics[age] - Age of donor

  • characteristics[cell type] - Cell type

  • characteristics[ancestry category] - Donor ancestry

  • characteristics[developmental stage] - Donor developmental stage

  • characteristics[cellosaurus accession] - Identifier

  • characteristics[sampling site] - Anatomical location

Important
For database-derived fields, use the values from the cell line database, NOT values specific to your experiment. For example, characteristics[age] should be the age of the original donor (e.g., "31Y" for HeLa), not the "age" of your cell culture.
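The rule above can be sketched as a lookup keyed by Cellosaurus accession whose values overwrite whatever the annotator typed for database-derived fields. The `DONOR_METADATA` dictionary below is a hypothetical in-memory stand-in for a real database export, and `apply_database_fields` is an illustrative helper name; only the HeLa values are taken from this document.

```python
# Hypothetical stand-in for values retrieved from a cell line database.
DONOR_METADATA = {
    "CVCL_0030": {
        "characteristics[organism]": "homo sapiens",
        "characteristics[organism part]": "cervix",
        "characteristics[disease]": "cervical adenocarcinoma",
        "characteristics[age]": "31Y",
    },
}


def apply_database_fields(row: dict[str, str]) -> dict[str, str]:
    """Return a copy of row with database-derived fields taken from the lookup."""
    accession = row.get("characteristics[cellosaurus accession]", "")
    updated = dict(row)
    # Unknown accessions leave the row unchanged.
    updated.update(DONOR_METADATA.get(accession, {}))
    return updated


row = {
    "source name": "HeLa_ctrl_rep1",
    "characteristics[cellosaurus accession]": "CVCL_0030",
    "characteristics[age]": "3 days",  # culture age entered by mistake
    "comment[passage number]": "P18",  # experiment-specific, left untouched
}
print(apply_database_fields(row)["characteristics[age]"])
# prints 31Y
```

Experiment-specific columns such as comment[passage number] are not in the lookup, so they pass through unchanged.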

6.2. Experiment-Specific Metadata

These fields describe your specific experiment and SHOULD be provided by the user:

7. Ontology Mapping

Cellosaurus is the REQUIRED primary resource. For legacy compatibility and cross-referencing, SDRF-Proteomics also accepts terms from:

Resource | Use Case | OLS Link | Example
Cellosaurus | Primary cell line identification (REQUIRED) | Cellosaurus | CVCL_0030
BTO (BRENDA Tissue Ontology) | Alternative ontology terms for cell lines | OLS - BTO | BTO:0000567
CLO (Cell Line Ontology) | Legacy support (not actively maintained) | OLS - CLO | CLO:0003684
EFO (Experimental Factor Ontology) | General experimental factors | OLS - EFO | EFO:0001185

When multiple identifiers are available, we RECOMMEND including at minimum:

  1. characteristics[cell line] - Common name

  2. characteristics[cellosaurus accession] - Cellosaurus ID

8. Example SDRF Files

8.1. Basic Cell Line Experiment

source name | characteristics[cell line] | characteristics[cellosaurus accession] | characteristics[organism] | characteristics[disease] | ... | comment[passage number] | assay name | comment[data file]
HeLa_ctrl_rep1 | HeLa | CVCL_0030 | homo sapiens | cervical adenocarcinoma | ... | P18 | HeLa_ctrl_run1 | HeLa_ctrl_1.raw
HeLa_ctrl_rep2 | HeLa | CVCL_0030 | homo sapiens | cervical adenocarcinoma | ... | P18 | HeLa_ctrl_run2 | HeLa_ctrl_2.raw
HeLa_treat_rep1 | HeLa | CVCL_0030 | homo sapiens | cervical adenocarcinoma | ... | P18 | HeLa_treat_run1 | HeLa_treat_1.raw

Column groups shown: sample metadata, data file metadata.

8.2. Multi-Cell Line Comparison

source name | characteristics[cell line] | characteristics[cellosaurus accession] | characteristics[organism] | characteristics[disease] | ... | assay name | comment[data file] | factor value[cell line]
MCF7_rep1 | MCF7 | CVCL_0031 | homo sapiens | breast adenocarcinoma | ... | MCF7_run1 | MCF7_1.raw | MCF7
MCF7_rep2 | MCF7 | CVCL_0031 | homo sapiens | breast adenocarcinoma | ... | MCF7_run2 | MCF7_2.raw | MCF7
MDA-MB-231_rep1 | MDA-MB-231 | CVCL_0062 | homo sapiens | breast adenocarcinoma | ... | MDAMB231_run1 | MDAMB231_1.raw | MDA-MB-231
MDA-MB-231_rep2 | MDA-MB-231 | CVCL_0062 | homo sapiens | breast adenocarcinoma | ... | MDAMB231_run2 | MDAMB231_2.raw | MDA-MB-231

Column groups shown: sample metadata, data file metadata, factor value.
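Rows like those in the multi-cell line example can be generated programmatically so that factor value[cell line] always matches characteristics[cell line]. This is a minimal sketch with hypothetical helper names (`replicate_rows`, `to_tsv`). It writes only the columns shown in the example, so the columns elided by "..." (for instance characteristics[organism part]) would still need to be added for a valid file.

```python
import csv
import io

COLUMNS = [
    "source name",
    "characteristics[cell line]",
    "characteristics[cellosaurus accession]",
    "characteristics[organism]",
    "characteristics[disease]",
    "assay name",
    "comment[data file]",
    "factor value[cell line]",
]


def replicate_rows(cell_line, accession, disease, run_prefix, replicates=2):
    """Build one row per replicate; run_prefix names the assays and files."""
    rows = []
    for rep in range(1, replicates + 1):
        rows.append({
            "source name": f"{cell_line}_rep{rep}",
            "characteristics[cell line]": cell_line,
            "characteristics[cellosaurus accession]": accession,
            "characteristics[organism]": "homo sapiens",
            "characteristics[disease]": disease,
            "assay name": f"{run_prefix}_run{rep}",
            "comment[data file]": f"{run_prefix}_{rep}.raw",
            # The cell line is the variable under study, so it is repeated here.
            "factor value[cell line]": cell_line,
        })
    return rows


def to_tsv(rows):
    """Serialize rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = (
    replicate_rows("MCF7", "CVCL_0031", "breast adenocarcinoma", "MCF7")
    + replicate_rows("MDA-MB-231", "CVCL_0062", "breast adenocarcinoma", "MDAMB231")
)
print(to_tsv(rows))
```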

9. Common Cell Line Issues

9.1. Misidentified Cell Lines

Some cell lines have known issues with misidentification or contamination. Cellosaurus documents these problems. When using cell lines with known issues:

  1. Check Cellosaurus for any documented problems

  2. Document your authentication method in comment[authentication method]

  3. Consider noting issues in comment[cell line notes]

9.2. Cell Line Variants and Sublines

For cell line variants or sublines:

Parent Cell Line | Variant | How to Annotate
HeLa | HeLa S3 | Use specific Cellosaurus accession (CVCL_0058)
HEK293 | HEK293T | Use HEK293T accession (CVCL_0063)
Jurkat | Jurkat E6-1 | Use Jurkat E6-1 accession (CVCL_0367)

Always use the most specific Cellosaurus accession for your cell line variant.
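One way to enforce this is to flag rows where a known subline is annotated with a different accession, such as its parent's. The sketch below assumes a small hand-maintained mapping built from the table above; `SUBLINE_ACCESSIONS` and `check_subline_accession` are hypothetical names, and a real check would draw on Cellosaurus itself.

```python
from typing import Optional

# Subline names and accessions taken from the variants table above.
SUBLINE_ACCESSIONS = {
    "HeLa S3": "CVCL_0058",
    "HEK293T": "CVCL_0063",
    "Jurkat E6-1": "CVCL_0367",
}


def check_subline_accession(cell_line: str, accession: str) -> Optional[str]:
    """Return a warning if a known subline carries a different accession."""
    expected = SUBLINE_ACCESSIONS.get(cell_line)
    if expected is not None and accession != expected:
        return f"{cell_line}: expected {expected}, found {accession}"
    # Either the accession matches or the name is not in the mapping.
    return None


# HEK293T annotated with the parent HEK293 accession is flagged.
print(check_subline_accession("HEK293T", "CVCL_0045"))
# prints HEK293T: expected CVCL_0063, found CVCL_0045
```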

9.3. Cell Lines with Unknown Donor Information

For cell lines where donor information is unknown:

  • Use the value "not available" for unknown fields (age, sex, ancestry)

  • Do NOT leave cells empty

  • Document what is known from Cellosaurus
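The first two rules can be applied mechanically. The following minimal sketch replaces empty donor cells with "not available"; the choice of columns in `DONOR_FIELDS` follows the fields named in the list above, and the helper name `fill_unknown_donor_fields` is hypothetical.

```python
# Donor fields named in the list above (age, sex, ancestry).
DONOR_FIELDS = [
    "characteristics[age]",
    "characteristics[sex]",
    "characteristics[ancestry category]",
]


def fill_unknown_donor_fields(row: dict[str, str]) -> dict[str, str]:
    """Return a copy of row with empty donor cells set to 'not available'."""
    filled = dict(row)
    for field in DONOR_FIELDS:
        # Only columns already present in the row are touched.
        if field in filled and not filled[field].strip():
            filled[field] = "not available"
    return filled


row = {
    "characteristics[cell line]": "HEK293",
    "characteristics[sex]": "female",
    "characteristics[age]": "",
    "characteristics[ancestry category]": "  ",
}
print(fill_unknown_donor_fields(row))
```

Values that are already filled in are left unchanged, so anything Cellosaurus does document is preserved.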

10. Best Practices

  1. Use Cellosaurus accessions: Always include characteristics[cellosaurus accession] when available.

  2. Retrieve metadata from databases: Use the SDRF Cell Line Metadata Database for consistent annotation.

  3. Document passage number: Include comment[passage number] for reproducibility.

  4. Use official names: Prefer Cellosaurus names over informal abbreviations.

  5. Separate database vs. experiment metadata: Understand which fields come from databases vs. your experiment.

  6. Check for known issues: Review Cellosaurus for contamination or misidentification reports.

  7. Include source information: Document where cells were obtained with comment[cell line source].

11. Template File

The cell line SDRF template file is available in this directory:

12. Validation

Cell line SDRF files should be validated using the sdrf-pipelines tool:

pip install sdrf-pipelines
parse_sdrf validate-sdrf --sdrf_file your_file.sdrf.tsv

13. Authors and Maintainers

This template was developed by the SDRF-Proteomics community.

For questions or suggestions, please open an issue on the GitHub repository.

14. References