1. Status of this Template
This document provides guidelines for annotating cell line-based proteomics experiments in SDRF-Proteomics format. This template extends the core SDRF-Proteomics specification with cell line-specific metadata fields and resources.
Status: Released
Version: 1.1.0 - 2026-01
2. Abstract
Cell lines are extensively used in proteomics research for biological studies and technology development. They present unique annotation challenges: metadata (sex, age, disease) refers to the original donor, multiple naming conventions exist (HeLa, HELA, He-La), and cell lines may be misidentified or have undergone genetic drift.
This template defines standardized approaches for annotating cell line experiments, using Cellosaurus as the PRIMARY standard for cell line identification and metadata retrieval.
Important: Cellosaurus is the REQUIRED primary standard for cell line annotation in SDRF-Proteomics. The characteristics[cellosaurus accession] column (CVCL_XXXX format) is REQUIRED for all cell line experiments. Do NOT use EFO, CLO, or BTO as primary identifiers; these are accepted only for legacy compatibility and cross-reference purposes.
3. Cellosaurus: The Primary Cell Line Resource
Cellosaurus is a comprehensive knowledge resource on cell lines maintained by the SIB Swiss Institute of Bioinformatics. It provides:
- Standardized cell line names
- Unique accession numbers (CVCL_XXXX format)
- Cross-references to other databases
- Information about species, diseases, and cell types
- Documentation of cell line problems (contamination, misidentification)
Important: While Cellosaurus is not a formal ontology, it is the most comprehensive and actively maintained resource for cell line information. SDRF-Proteomics REQUIRES Cellosaurus accession numbers for cell line identification.
3.1. Cellosaurus Accession Format
Cellosaurus accessions follow the format CVCL_XXXX where XXXX is a unique identifier:
| Cell Line | Cellosaurus Name | Cellosaurus Accession |
|---|---|---|
| HeLa | HeLa | CVCL_0030 |
| HEK293 | HEK293 | CVCL_0045 |
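A quick syntactic check can catch typos such as lowercase prefixes or truncated identifiers before validation. The sketch below assumes that current Cellosaurus accessions are "CVCL_" followed by exactly four uppercase letters or digits; confirm this against the Cellosaurus documentation. The function name is hypothetical, and a passing check says nothing about whether the accession actually exists.

```python
import re

# Assumption: "CVCL_" plus four uppercase alphanumeric characters (e.g. CVCL_0030).
CELLOSAURUS_AC = re.compile(r"CVCL_[0-9A-Z]{4}")


def is_cellosaurus_accession(value: str) -> bool:
    """Return True if value has the shape of a Cellosaurus accession (format only)."""
    return CELLOSAURUS_AC.fullmatch(value.strip()) is not None


if __name__ == "__main__":
    for candidate in ["CVCL_0030", "CVCL_0045", "cvcl_0030", "HeLa", "CVCL_30"]:
        print(candidate, is_cellosaurus_accession(candidate))
```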
4. SDRF Cell Line Metadata Database
To facilitate consistent annotation of cell line experiments, a curated database of cell line metadata is available as a tab-separated file at: https://raw.githubusercontent.com/bigbio/sdrf-cellline-metadata-db/main/cl-annotations-db.tsv
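Because the database is a plain TSV file, the standard library is enough to load it and look up a cell line. This is a minimal sketch: the column names used in the demo (`cellosaurus accession`, `cell line`, `organism`) are hypothetical, so inspect the real header of cl-annotations-db.tsv and pass the matching key column.

```python
import csv
import io
import urllib.request

DB_URL = "https://raw.githubusercontent.com/bigbio/sdrf-cellline-metadata-db/main/cl-annotations-db.tsv"


def index_by(tsv_text: str, key_column: str) -> dict[str, dict[str, str]]:
    """Parse tab-separated text and index its rows by the value in key_column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    if reader.fieldnames is None or key_column not in reader.fieldnames:
        raise KeyError(f"column {key_column!r} not found in header")
    # Skip rows whose key cell is empty or missing.
    return {row[key_column].strip(): row for row in reader if row[key_column]}


def download_db(url: str = DB_URL) -> str:
    """Fetch the database file as UTF-8 text (requires network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Offline demo with a hypothetical header; use download_db() for the real file.
    sample = "cellosaurus accession\tcell line\torganism\nCVCL_0030\tHeLa\thomo sapiens\n"
    db = index_by(sample, "cellosaurus accession")
    print(db["CVCL_0030"]["cell line"])
```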
5. Checklist
This section defines the metadata columns required and recommended for cell line experiments.
5.1. Required Columns
The following columns are REQUIRED for cell line experiments:
| Column | Requirement | Description | Ontology | Example Values |
|---|---|---|---|---|
| characteristics[cell line] | REQUIRED | Name of the cell line | Cellosaurus | HeLa, HEK293, K562 |
| characteristics[organism] | REQUIRED | Species of the cell line | NCBITaxon | homo sapiens, mus musculus |
| characteristics[organism part] | REQUIRED | Tissue/organ of origin | UBERON | cervix, kidney, blood |
| characteristics[disease] | REQUIRED | Disease state of original tissue | MONDO | cervical adenocarcinoma, normal, chronic myelogenous leukemia |
5.2. Required Cellosaurus Columns
The following Cellosaurus columns are REQUIRED for cell line experiments:
| Column | Requirement | Description | Ontology | Example Values |
|---|---|---|---|---|
| characteristics[cellosaurus accession] | REQUIRED | Cellosaurus unique identifier | Cellosaurus | CVCL_0030, CVCL_0045, CVCL_0004 |
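The required columns from sections 5.1 and 5.2 can be checked against an SDRF header before running the full validator. The sketch below is a hypothetical helper, not part of sdrf-pipelines; it compares column names case-insensitively, which is a choice made here rather than a documented rule.

```python
import csv

# REQUIRED columns from sections 5.1 and 5.2 of this template.
REQUIRED_CELL_LINE_COLUMNS = [
    "characteristics[cell line]",
    "characteristics[organism]",
    "characteristics[organism part]",
    "characteristics[disease]",
    "characteristics[cellosaurus accession]",
]


def missing_required_columns(header: list[str]) -> list[str]:
    """Return the required cell line columns absent from an SDRF header."""
    present = {name.strip().lower() for name in header}
    return [column for column in REQUIRED_CELL_LINE_COLUMNS if column not in present]


def read_header(path: str) -> list[str]:
    """Read the first (header) row of a tab-separated SDRF file."""
    with open(path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle, delimiter="\t"))


if __name__ == "__main__":
    header = ["source name", "characteristics[cell line]", "characteristics[organism]"]
    print(missing_required_columns(header))
```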
5.3. Recommended Columns
The following columns are RECOMMENDED for cell line experiments:
| Column | Requirement | Description | Ontology | Example Values |
|---|---|---|---|---|
| characteristics[cellosaurus name] | RECOMMENDED | Official Cellosaurus name | Cellosaurus | HeLa, HEK293, K-562 |
| characteristics[cell type] | RECOMMENDED | Cell type classification | Cell Ontology (CL) | epithelial cell, fibroblast, lymphoblast |
| characteristics[sex] | RECOMMENDED | Sex of original donor | PATO | female, male, not available |
| characteristics[age] | RECOMMENDED | Age of donor at sample collection | Free text | 31Y, not available, fetal |
| characteristics[sampling site] | RECOMMENDED | Specific anatomical sampling location | UBERON | cervix uteri, fetal kidney, peripheral blood |
| comment[passage number] | RECOMMENDED | Passage number of cells used | EFO:0007061 | P15, P20-25, low passage |
| comment[cell line source] | RECOMMENDED | Source/provider of cell line | EFO:0004443 | ATCC, DSMZ, ECACC, in-house |
5.4. Optional Columns
The following columns are OPTIONAL for cell line experiments:
| Column | Requirement | Description | Ontology | Example Values |
|---|---|---|---|---|
| characteristics[developmental stage] | OPTIONAL | Developmental stage of donor | UBERON | adult, fetus, embryo |
| characteristics[ancestry category] | OPTIONAL | Ancestry of donor if known | HANCESTRO | African American, European, not available |
| comment[cell line modifications] | OPTIONAL | Genetic or other modifications | EFO:0000510 | CRISPR knockout of TP53, GFP-tagged, parental |
6. Understanding Metadata Sources
Cell line metadata comes from two sources:
6.1. Database-Derived Metadata
These fields describe the original cell line and SHOULD be obtained from Cellosaurus or the SDRF Cell Line Metadata Database:
- characteristics[organism]: species
- characteristics[organism part]: tissue of origin
- characteristics[disease]: disease of the original tissue
- characteristics[sex]: sex of the donor
- characteristics[age]: age of the donor
- characteristics[cell type]: cell type
- characteristics[ancestry category]: donor ancestry
- characteristics[developmental stage]: donor developmental stage
- characteristics[cellosaurus accession]: Cellosaurus identifier
- characteristics[sampling site]: anatomical location
Important: For database-derived fields, use the values from the cell line database, NOT values specific to your experiment. For example, characteristics[age] should be the age of the original donor at sampling (e.g., "30Y6M" for HeLa), not the "age" of your cell culture.
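The split between database-derived and experiment-specific fields can be enforced mechanically: take the fields listed in section 6.1 from the database record and leave everything else as the user wrote it. This is a minimal sketch assuming the record is already keyed by SDRF column names (hypothetical; map your database's real headers first). Missing database values become "not available" rather than empty cells.

```python
# Database-derived columns as listed in section 6.1 of this template.
DATABASE_DERIVED_COLUMNS = [
    "characteristics[organism]",
    "characteristics[organism part]",
    "characteristics[disease]",
    "characteristics[sex]",
    "characteristics[age]",
    "characteristics[cell type]",
    "characteristics[ancestry category]",
    "characteristics[developmental stage]",
    "characteristics[cellosaurus accession]",
    "characteristics[sampling site]",
]


def apply_database_metadata(row: dict[str, str], record: dict[str, str]) -> dict[str, str]:
    """Return a copy of an SDRF row whose database-derived columns come from record.

    Experiment-specific columns (e.g. comment[passage number]) are not touched.
    Assumption: record uses SDRF column names as keys (hypothetical mapping).
    """
    updated = dict(row)
    for column in DATABASE_DERIVED_COLUMNS:
        value = (record.get(column) or "").strip()
        updated[column] = value or "not available"
    return updated
```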
6.2. Experiment-Specific Metadata
These fields describe your specific experiment and SHOULD be provided by the user:
- comment[passage number]: your passage number (EFO:0007061)
- comment[cell line source]: where you obtained the cells (EFO:0004443, Material Supplier)
- comment[cell line modifications]: any modifications made (EFO:0000510, Genetic modification)
- characteristics[treatment]: experimental treatments (EFO:0000727)
- characteristics[compound]: drugs/compounds applied (EFO:0000369, Compound Based Treatment)
7. Ontology Mapping
While Cellosaurus is the REQUIRED primary resource, SDRF-Proteomics also accepts terms from the following resources for legacy compatibility and cross-referencing:
| Resource | Use Case | Example Term |
|---|---|---|
| Cellosaurus | Primary cell line identification (REQUIRED) | CVCL_0030 |
| BTO (BRENDA Tissue Ontology) | Alternative ontology terms for cell lines | BTO:0000567 |
| CLO (Cell Line Ontology) | Legacy support (not actively maintained) | CLO:0003684 |
| EFO (Experimental Factor Ontology) | General experimental factors | EFO:0001185 |
When identifiers from several resources are available, always include at minimum the following two columns (both are REQUIRED):
1. characteristics[cell line] - Common name
2. characteristics[cellosaurus accession] - Cellosaurus ID
8. Example SDRF Files
8.1. Basic Cell Line Experiment
| source name | characteristics[cell line] | characteristics[cellosaurus accession] | characteristics[organism] | characteristics[disease] | ... | comment[passage number] | assay name | comment[data file] |
|---|---|---|---|---|---|---|---|---|
| HeLa_ctrl_rep1 | HeLa | CVCL_0030 | homo sapiens | cervical adenocarcinoma | ... | P18 | HeLa_ctrl_run1 | HeLa_ctrl_1.raw |
| HeLa_ctrl_rep2 | HeLa | CVCL_0030 | homo sapiens | cervical adenocarcinoma | ... | P18 | HeLa_ctrl_run2 | HeLa_ctrl_2.raw |
| HeLa_treat_rep1 | HeLa | CVCL_0030 | homo sapiens | cervical adenocarcinoma | ... | P18 | HeLa_treat_run1 | HeLa_treat_1.raw |
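Rows like these are repetitive, so generating them reduces copy-and-paste errors in replicate numbering. The sketch below reproduces the visible columns of the example above with Python's csv module; the columns elided as "..." are omitted, so its output is illustrative and would not pass validation without the other required columns (e.g. characteristics[organism part]). The helper names are hypothetical.

```python
import csv
import io

# Visible columns of the basic example; the "..." columns are deliberately omitted.
COLUMNS = [
    "source name",
    "characteristics[cell line]",
    "characteristics[cellosaurus accession]",
    "characteristics[organism]",
    "characteristics[disease]",
    "comment[passage number]",
    "assay name",
    "comment[data file]",
]


def build_rows(condition: str, replicates: int, passage: str) -> list[list[str]]:
    """Build one HeLa row per replicate for a given condition."""
    rows = []
    for rep in range(1, replicates + 1):
        rows.append([
            f"HeLa_{condition}_rep{rep}",  # source name
            "HeLa",                        # characteristics[cell line]
            "CVCL_0030",                   # characteristics[cellosaurus accession]
            "homo sapiens",                # characteristics[organism]
            "cervical adenocarcinoma",     # characteristics[disease]
            passage,                       # comment[passage number]
            f"HeLa_{condition}_run{rep}",  # assay name
            f"HeLa_{condition}_{rep}.raw", # comment[data file]
        ])
    return rows


def to_tsv(rows: list[list[str]]) -> str:
    """Serialize the header plus rows as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = build_rows("ctrl", 2, "P18") + build_rows("treat", 1, "P18")
    print(to_tsv(rows), end="")
```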
8.2. Multi-Cell Line Comparison
| source name | characteristics[cell line] | characteristics[cellosaurus accession] | characteristics[organism] | characteristics[disease] | ... | assay name | comment[data file] | factor value[cell line] |
|---|---|---|---|---|---|---|---|---|
| MCF7_rep1 | MCF7 | CVCL_0031 | homo sapiens | breast adenocarcinoma | ... | MCF7_run1 | MCF7_1.raw | MCF7 |
| MCF7_rep2 | MCF7 | CVCL_0031 | homo sapiens | breast adenocarcinoma | ... | MCF7_run2 | MCF7_2.raw | MCF7 |
| MDA-MB-231_rep1 | MDA-MB-231 | CVCL_0062 | homo sapiens | breast adenocarcinoma | ... | MDAMB231_run1 | MDAMB231_1.raw | MDA-MB-231 |
| MDA-MB-231_rep2 | MDA-MB-231 | CVCL_0062 | homo sapiens | breast adenocarcinoma | ... | MDAMB231_run2 | MDAMB231_2.raw | MDA-MB-231 |
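In a comparison like this, factor value[cell line] repeats characteristics[cell line], so a mismatch usually signals a typo (for example "MDAMB231" in one column and "MDA-MB-231" in the other). The hypothetical helper below flags such rows; it assumes rows are dictionaries keyed by column name, as produced by csv.DictReader, and is not a check taken from sdrf-pipelines.

```python
def inconsistent_factor_rows(rows: list[dict[str, str]], name: str = "cell line") -> list[str]:
    """List source names whose factor value[name] differs from characteristics[name].

    Rows without a factor value[name] entry are skipped.
    """
    characteristic = f"characteristics[{name}]"
    factor = f"factor value[{name}]"
    return [
        row["source name"]
        for row in rows
        if factor in row and row[factor] != row.get(characteristic)
    ]


if __name__ == "__main__":
    rows = [
        {"source name": "MCF7_rep1", "characteristics[cell line]": "MCF7", "factor value[cell line]": "MCF7"},
        {"source name": "MDA-MB-231_rep1", "characteristics[cell line]": "MDA-MB-231", "factor value[cell line]": "MDAMB231"},
    ]
    print(inconsistent_factor_rows(rows))
```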
9. Common Cell Line Issues
9.1. Misidentified Cell Lines
Some cell lines have known issues with misidentification or contamination. Cellosaurus documents these problems. When using cell lines with known issues:
- Check Cellosaurus for any documented problems.
- Document your authentication method in comment[authentication method].
- Consider noting issues in comment[cell line notes].
9.2. Cell Line Variants and Sublines
For cell line variants or sublines:
| Parent Cell Line | Variant | How to Annotate |
|---|---|---|
| HeLa | HeLa S3 | Use specific Cellosaurus accession (CVCL_0058) |
| HEK293 | HEK293T | Use HEK293T accession (CVCL_0063) |
| Jurkat | Jurkat E6-1 | Use Jurkat E6-1 accession (CVCL_0367) |
Always use the most specific Cellosaurus accession for your cell line variant.
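Name variants (HeLa, HELA, He-La) and sublines interact: a lookup must tolerate spelling differences without collapsing HEK293T onto HEK293. The sketch below normalizes names by uppercasing and dropping non-alphanumeric characters, then requires an exact match. The accession table is an illustrative subset copied from this template's tables, not a Cellosaurus mirror, and aggressive normalization can in principle conflate distinct names, so confirm any hit in Cellosaurus.

```python
import re
from typing import Optional

# Accessions from the tables in sections 3.1 and 9.2, keyed by normalized name.
KNOWN_ACCESSIONS = {
    "HELA": "CVCL_0030",
    "HELAS3": "CVCL_0058",
    "HEK293": "CVCL_0045",
    "HEK293T": "CVCL_0063",
    "JURKATE61": "CVCL_0367",
}


def normalize_name(name: str) -> str:
    """Uppercase a name and keep only letters and digits ("He-La" -> "HELA")."""
    return re.sub(r"[^0-9A-Za-z]", "", name).upper()


def lookup_accession(name: str) -> Optional[str]:
    """Return the accession for an exact normalized match, or None.

    Exact matching (no prefix search) keeps a variant such as HEK293T
    from resolving to its parent HEK293.
    """
    return KNOWN_ACCESSIONS.get(normalize_name(name))
```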
9.3. Cell Lines with Unknown Donor Information
For cell lines where donor information is unknown:
- Use "not available" for unknown fields (age, sex, ancestry).
- Do NOT leave cells empty.
- Document what is known from Cellosaurus.
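Empty cells are easy to miss in a spreadsheet but easy to find programmatically. The hypothetical helper below reports every empty or whitespace-only cell in SDRF text, including cells missing from short rows, so they can be replaced with "not available".

```python
import csv
import io


def empty_cells(tsv_text: str) -> list[tuple[int, str]]:
    """Return (data row number, column name) for each empty or whitespace-only cell."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    problems = []
    for row_number, row in enumerate(reader, start=1):
        for column, value in row.items():
            if column is None:
                continue  # extra cells beyond the header have no column name
            # value is None when the row is shorter than the header
            if value is None or not value.strip():
                problems.append((row_number, column))
    return problems


if __name__ == "__main__":
    text = "source name\tcharacteristics[age]\nHeLa_1\t\nHeLa_2\tnot available\n"
    print(empty_cells(text))
```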
10. Best Practices
- Use Cellosaurus accessions: Always include characteristics[cellosaurus accession]; it is REQUIRED for cell line experiments.
- Retrieve metadata from databases: Use the SDRF Cell Line Metadata Database for consistent annotation.
- Document passage number: Include comment[passage number] for reproducibility.
- Use official names: Prefer Cellosaurus names over informal abbreviations.
- Separate database vs. experiment metadata: Understand which fields come from databases and which describe your experiment.
- Check for known issues: Review Cellosaurus for contamination or misidentification reports.
- Include source information: Document where cells were obtained with comment[cell line source].
11. Template File
The cell line SDRF template file is available in this directory:
12. Validation
Cell line SDRF files should be validated using the sdrf-pipelines tool:
```shell
pip install sdrf-pipelines
parse_sdrf validate-sdrf --sdrf_file your_file.sdrf.tsv
```
13. Authors and Maintainers
This template was developed by the SDRF-Proteomics community.
For questions or suggestions, please open an issue on the GitHub repository.
14. References
- Cellosaurus: https://www.cellosaurus.org/
- SDRF Cell Line Metadata Database: https://github.com/bigbio/sdrf-cellline-metadata-db
- Bairoch A. (2018) The Cellosaurus, a cell-line knowledge resource. Journal of Biomolecular Techniques.
- ICLAC (International Cell Line Authentication Committee): https://iclac.org/