1. Introduction
This document provides guidelines for annotating general sample metadata in SDRF-Proteomics format. These guidelines apply to samples from any organism (human, animal, plant, microorganism).
For human-specific metadata (disease staging, comorbidities, treatment history), see Human Template.
Version 1.1.0 - 2026-01
2. Columns Beyond the YAML Templates
The YAML templates define the most common columns with validators and requirement levels. However, not every recognised SDRF column needs a YAML definition. The file TERMS.tsv serves as a broader registry of column names, ontology mappings, and allowed values. Submitters may add any term listed in TERMS.tsv (e.g., characteristics[xenograft], characteristics[mass]) to their SDRF file even if no YAML template includes it. Such columns will not be subject to template-level validation but remain part of the specification.
3. Best Practices
- Use lowercase for all controlled vocabulary values (except proper nouns in disease names).
- Use ontology terms mapped to MONDO (diseases), CL/BTO/CLO (cell types), UBERON (anatomy), PATO (phenotypes).
- Be consistent with format across all samples in a dataset.
- Document unknowns using not available (unknown) or not applicable (not relevant); never leave cells empty.
- Validate before submission using sdrf-pipelines to check ontology mappings.
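The "never leave cells empty" rule can be checked before running a full validator. The following is a minimal sketch, assuming the SDRF is a tab-separated text file with a single header row; the function name `find_empty_cells` is hypothetical and is not part of sdrf-pipelines.

```python
import csv
import io


def find_empty_cells(sdrf_text: str) -> list[tuple[int, str]]:
    """Return (row_number, column_name) for every empty cell.

    csv.reader is used instead of csv.DictReader because an SDRF file
    may repeat the same column name several times.
    """
    reader = csv.reader(io.StringIO(sdrf_text), delimiter="\t")
    header = next(reader)
    problems = []
    for row_number, row in enumerate(reader, start=1):
        # Pad short rows so that missing trailing cells are reported too.
        padded = row + [""] * (len(header) - len(row))
        for column, value in zip(header, padded):
            if not value.strip():
                problems.append((row_number, column))
    return problems


sample = (
    "source name\tcharacteristics[organism]\tcharacteristics[sex]\n"
    "sample_1\thomo sapiens\t\n"
    "sample_2\thomo sapiens\tnot available\n"
)
print(find_empty_cells(sample))
```

Each reported cell should then be filled with not available or not applicable, whichever matches the situation.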
4. General Formatting Conventions
4.1. Capitalization Rules
Most controlled vocabulary values are recommended to be lowercase:
- Organism names: homo sapiens, mus musculus
- Organism parts: blood, liver, brain
- Sex values: male, female
Exceptions (retain proper noun capitalization):
- Ancestry categories (geographic populations): African, European, South Asian
- Cell line names: HeLa, HEK293, K562
Note: Validators should normalize common capitalization variations (e.g., accept both Homo sapiens and homo sapiens), but submitters should use lowercase for consistency.
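The capitalization rules above can be sketched as a small normalization helper. This is an illustration only: the set `LOWERCASE_COLUMNS` and the function `normalize_value` are hypothetical names, and restricting lowercasing to the organism, organism part, and sex columns is an assumption drawn from the examples in this section. Other columns are left untouched because they may legitimately contain capitals (cell line names, ancestry categories, proper nouns in disease names).

```python
# Columns whose values are lowercased (assumption based on the examples above).
LOWERCASE_COLUMNS = {
    "characteristics[organism]",
    "characteristics[organism part]",
    "characteristics[sex]",
}


def normalize_value(column: str, value: str) -> str:
    """Trim whitespace and lowercase the value only for selected columns."""
    cleaned = value.strip()
    if column.strip().lower() in LOWERCASE_COLUMNS:
        return cleaned.lower()
    return cleaned


print(normalize_value("characteristics[organism]", "Homo sapiens"))
print(normalize_value("characteristics[cell line]", "HeLa"))
```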
5. Organism
Column: characteristics[organism]
Ontology: NCBI Taxonomy (NCBITaxon)
Use the scientific name in lowercase. The validator will map to the correct ontology term.
| Value | NCBITaxon ID | Description |
|---|---|---|
| homo sapiens | | Human |
| mus musculus | | Mouse |
6. Organism Part / Tissue
Column: characteristics[organism part]
Use lowercase for all values. For cell line samples, use not applicable or specify the tissue of origin (e.g., cervix for HeLa).
|
Note
|
Do NOT use The |
| Value | UBERON ID | Description |
|---|---|---|
| blood plasma | | Blood plasma |
| liver | | Liver tissue |
| brain | | Brain |
| heart | | Heart |
7. Age
Column: characteristics[age]
Format: {Number}{Unit} where Unit is: Y (Years), M (Months), W (Weeks), D (Days).
7.1. Age Formats
| Age Type | Format | Example | Description |
|---|---|---|---|
| Exact age | | | 40 years old |
| Exact age | | | 8 weeks old |
| Compound age | | | 30 years and 6 months (strict order Y>M>W>D) |
| Compound age | | | 2 years, 3 months, 1 week |
| Age range | | | Between 40 and 50 years |
| Greater than | | | Older than 18 years |
| Less than | | | Younger than 65 years |
| Greater or equal | | | 21 years or older |
Important: When the exact age is unavailable or cannot be determined precisely, consider recording the developmental stage instead. Developmental stage is useful for animal studies, pooled samples, or historical samples where exact age is unavailable.
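The exact and compound age formats described above ({Number}{Unit}, strict order Y>M>W>D) can be checked with a regular expression. This is a minimal sketch under stated assumptions: numbers are whole integers, units are uppercase, and the names `AGE_PATTERN` and `parse_age` are hypothetical. Range and comparison forms are deliberately not handled here.

```python
import re
from typing import Optional

# Optional year, month, week and day parts, in that fixed order.
AGE_PATTERN = re.compile(r"^(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)W)?(?:(\d+)D)?$")


def parse_age(value: str) -> Optional[dict[str, int]]:
    """Parse an exact or compound age; return None if it does not conform."""
    match = AGE_PATTERN.match(value)
    # An empty string matches the pattern, so require at least one part.
    if match is None or not any(match.groups()):
        return None
    units = ("Y", "M", "W", "D")
    return {
        unit: int(number)
        for unit, number in zip(units, match.groups())
        if number is not None
    }


print(parse_age("40Y"))
print(parse_age("30Y6M"))
print(parse_age("6M30Y"))  # wrong order, rejected
```

Returning the parsed parts rather than a boolean makes it easy to convert ages to a common unit later if a dataset mixes weeks and years.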
8. Sex
Column: characteristics[sex]
Requirement: REQUIRED for human samples, RECOMMENDED for vertebrates, OPTIONAL for other organisms. Allowed values come from the PATO ontology, under PATO:0000047 (biological sex).
Allowed values:
| Value | Description |
|---|---|
| male | Male biological sex |
| female | Female biological sex |
| intersex | Intersex biological sex |
| hermaphrodite | Hermaphrodite (for non-human organisms) |
Note: For cell lines, use the sex of the original donor if known (e.g., female for HeLa cells), otherwise use not available. Use anonymized when the information exists but has been redacted for privacy reasons (e.g., clinical studies with de-identified data).
9. Developmental Stage
Column: characteristics[developmental stage]
Ontology: EFO
| Value | Description |
|---|---|
| adult | Sexually mature organism |
| embryonic | Pre-birth/pre-hatching |
For model organisms, use specific stages when applicable: embryonic day 14 (mouse E14), 24 hpf (zebrafish 24 hours post fertilization).
10. Disease
Column: characteristics[disease]
Ontology: MONDO Disease Ontology (preferred), EFO, Human Disease Ontology (DOID), PATO (for normal, i.e. healthy, samples)
10.1. Healthy Samples: normal vs healthy vs control
For samples without disease, the terminology matters for standardization:
normal (PATO:0000461) - REQUIRED term for healthy samples
- Standard term in pathology ("normal tissue" vs "tumor tissue")
- Well-defined ontology mapping to PATO:0000461
- Widely used in existing proteomics datasets
- This is the single canonical term for healthy/control samples
- Avoid using "Control" as a disease state: it is an experimental design concept, not a disease state
- "Control" is also ambiguous: a control could be healthy, vehicle-treated, time-zero, or even a different disease used for comparison
The characteristics[disease] column captures the actual disease state of the sample, while factor value[disease] indicates the experimental comparison groups:
| source name | … | characteristics[disease] | … | factor value[disease] |
|---|---|---|---|---|
| healthy_1 | … | normal | … | normal |
| tumor_1 | … | breast carcinoma | … | breast carcinoma |
| adjacent_1 | … | normal | … | normal |
In this example:
- healthy_1: Healthy control sample from a healthy individual
- tumor_1: Disease case sample (tumor tissue)
- adjacent_1: Adjacent normal tissue from a cancer patient
Both healthy_1 and adjacent_1 have normal as their disease state (no tumor cells), but they come from different individuals. The experimental design and comparison groups are defined by the factor values.
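The terminology rule for healthy samples can be illustrated with a simple check on characteristics[disease] values. This is a sketch, not the behavior of any existing validator: the names `DISCOURAGED_HEALTHY_TERMS` and `check_disease_value` are hypothetical, and treating "healthy" as discouraged is an interpretation of normal being the single canonical term.

```python
from typing import Optional

# Terms that should be replaced by "normal" in characteristics[disease]
# (assumption: "healthy" is flagged alongside "control").
DISCOURAGED_HEALTHY_TERMS = {"control", "healthy"}


def check_disease_value(value: str) -> Optional[str]:
    """Return a warning message for discouraged terms, or None if acceptable."""
    cleaned = value.strip().lower()
    if cleaned in DISCOURAGED_HEALTHY_TERMS:
        return (
            f"'{value}' is not a disease state; "
            "use 'normal' (PATO:0000461) for samples without disease"
        )
    return None


for candidate in ("normal", "Control", "breast carcinoma"):
    print(candidate, "->", check_disease_value(candidate))
```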
10.2. Disease Examples
| Disease | SDRF Value | Ontology Link |
|---|---|---|
| Healthy/no disease | normal | PATO:0000461 |
| Breast cancer | breast carcinoma | |
Tip: For animal disease models, use the human disease name to facilitate cross-species comparisons.
11. Phenotype
Column: characteristics[phenotype]
Ontology: EFO:0000651
Requirement: Optional
Describes observable characteristics or traits of a sample that result from genotype, environment, and treatment interactions. Captures how the sample behaves or appears, not what disease it has.
| Column | Purpose | Example |
|---|---|---|
| characteristics[disease] | What disease/pathology is present | "breast cancer", "Alzheimer disease" |
| characteristics[genotype] | What genetic variant is present | "KRAS G12D mutant", "wild type" |
| characteristics[phenotype] | How the sample behaves/appears | "gefitinib sensitive", "HER2 positive" |
Common phenotype categories:
- Drug response: gefitinib sensitive, cisplatin resistant
- Molecular markers: HER2 positive, ER positive
- Expression states: overexpressing Neurogenin3, FOXP3 expression
- Functional traits: undifferentiated, adipogenic
- Environmental responses: high fat diet, heat shock response
When NOT to use phenotype: For disease names use characteristics[disease], for genetic variants use characteristics[genotype], for treatments use characteristics[compound] or characteristics[treatment].
12. Cell Type
Column: characteristics[cell type]
For cell lines, optionally include the cell type of origin.
| Value | CL Link |
|---|---|
| epithelial cell | |
| T cell | |
| B cell | |
| macrophage | |
| fibroblast | |
For detailed immune cell studies, use specific subtypes: CD4-positive, alpha-beta T cell (CL:0000624), regulatory T cell (CL:0000815).
13. Material Type
Column: characteristics[material type]
Ontology: PRIDE:0000837
Material type describes the nature of the biological material being analyzed.
Requirement: Optional
Allowed values:
| Value | Description | When to Use |
|---|---|---|
| tissue | Solid tissue sample from an organism | Biopsies, surgical specimens, autopsy samples |
| cell | Individual cells or cell suspensions | FACS-sorted cells, primary cell cultures, dissociated tissue |
| cell line | Established immortalized cell line | HeLa, HEK293, A549, etc. |
| organism part | A part of an organism’s anatomy | Organ samples, body fluids (when not whole tissue) |
| whole organism | Complete organism sample | Single-celled organisms, small model organisms (C. elegans, yeast) |
| synthetic | Artificially synthesized material | Synthetic peptide libraries, recombinant proteins |
Special values:
- not available: When material type is unknown
- not applicable: When the concept doesn’t apply (e.g., for computational datasets)
Examples:
| Sample Type | characteristics[material type] | characteristics[organism part] | characteristics[cell line] |
|---|---|---|---|
| Liver biopsy | tissue | liver | not applicable |
| HeLa cells | cell line | cervix | HeLa |
| Sorted T cells | cell | blood | not applicable |
| Mouse whole brain | tissue | brain | not applicable |
| E. coli culture | whole organism | not applicable | not applicable |
| Plasma sample | organism part | blood | not applicable |
| Synthetic peptides | synthetic | not applicable | not applicable |
Note: For cell line samples, use cell line as the material type. The specific cell line name goes in characteristics[cell line]. The tissue of origin can be specified in characteristics[organism part].
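The allowed values and the cell line rule above can be combined into a small consistency check. This is a sketch under assumptions: the function `check_material_type` and both constant names are hypothetical, and flagging a cell line sample whose characteristics[cell line] is not applicable is one reading of the note above rather than a documented validator rule.

```python
ALLOWED_MATERIAL_TYPES = {
    "tissue",
    "cell",
    "cell line",
    "organism part",
    "whole organism",
    "synthetic",
}
SPECIAL_VALUES = {"not available", "not applicable"}


def check_material_type(material_type: str, cell_line: str) -> list[str]:
    """Return a list of problems for one sample row (empty if consistent)."""
    problems = []
    if material_type not in ALLOWED_MATERIAL_TYPES | SPECIAL_VALUES:
        problems.append(f"unknown material type: '{material_type}'")
    # A cell line sample is expected to name the line in characteristics[cell line].
    if material_type == "cell line" and cell_line in {"", "not applicable"}:
        problems.append("material type is 'cell line' but no cell line name is given")
    return problems


print(check_material_type("cell line", "HeLa"))
print(check_material_type("cell line", "not applicable"))
print(check_material_type("organ", "not applicable"))
```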
13.1. Additional Sample-Related Columns
| Column | Description | Example Values | When to Use | Example |
|---|---|---|---|---|
| | Treatment applied to the sample | dexamethasone, vehicle control, untreated | Drug treatment studies | |
| | Time of sample collection | 0 hour, 24 hour, 7 day | Time-course experiments | |
| | Dose of treatment if applicable. Prefer quantitative values (number + unit). Defined in clinical-metadata template. | 10 mg/kg, 100 nM, 50 uM | Dose-response studies | — |
| | Body mass index (EFO:0004340). Defined in clinical-metadata template. | 24.5, 31.2, 18.7 | Metabolic or obesity-related studies | |
| | Smoking history of the patient (NCIT:C19796). Use child terms of NCIT:C19796. Defined in clinical-metadata template. | never smoker, current smoker, former smoker | Lung or cardiovascular studies | — |
For additional columns, see the SDRF Terms Reference and the Human Template.
14. PTM Enrichment
Column: characteristics[enrichment process]
Ontology: EFO and PRIDE (children of EFO:0009090)
| Value | Description |
|---|---|
| enrichment of phosphorylated protein | Phosphoproteomics enrichment |
| not applicable | No PTM enrichment performed |
15. Depletion
Column: characteristics[depletion]
Template: Defined in ms-proteomics template.
Use this column for blood/plasma samples to indicate whether abundant proteins were depleted.
Values: no depletion, depletion, not applicable.
16. Patient-Derived Xenografts (PDX)
Column: characteristics[xenograft]
When annotating PDX samples, metadata (age, sex) MUST refer to the original patient, not the host organism.
| source name | characteristics[organism] | characteristics[xenograft] | characteristics[age] |
|---|---|---|---|
| tumor_001 | homo sapiens | not applicable | 65Y |
| pdx_001 | homo sapiens | pancreatic adenocarcinoma grown in nude mice | 65Y |
17. Synthetic Peptide Libraries
Column: characteristics[synthetic peptide]
Values: synthetic (sample is a synthetic peptide library), not synthetic (biological sample).
For synthetic libraries, most sample metadata can be not applicable. The organism MAY be specified if the library was designed from specific species peptides.
18. Spiked-in Samples
Column: characteristics[spiked compound]
For samples spiked with peptides, proteins, or mixtures (e.g., for quantification standards or retention time alignment), use key-value pairs:
| Key | Meaning | Example | Required for |
|---|---|---|---|
| CT | Compound type | peptide, protein, mixture | All |
| QY | Quantity | 10 fmol, 20 nmol | All |
| PS | Peptide sequence | PEPTIDESEQ | Peptides |
| AC | UniProt accession | A9WZ33 | Proteins |
| CN | Compound name | iRT mixture | Optional |
| CV | Compound vendor | Biognosys | Mixtures (required) |
Example: characteristics[spiked compound]: CT=peptide;PS=PEPTIDESEQ;QY=10 fmol
The injected mass of the main sample SHOULD be specified in characteristics[mass]. For multiple spiked components, repeat the column. If the spiked component is another biological sample (e.g., E. coli lysate), annotate it in its own row with characteristics[mass] specified for both components.
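The key-value syntax of characteristics[spiked compound] can be parsed and checked against the "Required for" column of the table above. This is a minimal sketch: `parse_spiked_compound`, `missing_keys`, and `REQUIRED_KEYS` are hypothetical names, and the code assumes that pairs are separated by `;` and that keys and values are separated by `=`, as in the example value.

```python
# Required keys per compound type, taken from the "Required for" column.
REQUIRED_KEYS = {
    "peptide": {"CT", "QY", "PS"},
    "protein": {"CT", "QY", "AC"},
    "mixture": {"CT", "QY", "CV"},
}


def parse_spiked_compound(value: str) -> dict[str, str]:
    """Split 'KEY=value;KEY=value' into a dictionary."""
    fields = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        key, separator, content = part.partition("=")
        if not separator:
            raise ValueError(f"expected KEY=value, got '{part}'")
        fields[key.strip()] = content.strip()
    return fields


def missing_keys(fields: dict[str, str]) -> set[str]:
    """Return the required keys that are absent for the declared compound type."""
    required = REQUIRED_KEYS.get(fields.get("CT", ""), {"CT", "QY"})
    return required - set(fields)


parsed = parse_spiked_compound("CT=peptide;PS=PEPTIDESEQ;QY=10 fmol")
print(parsed)
print(missing_keys(parsed))
```

When a sample has several spiked components, each repeated column value would be parsed separately with the same function.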
19. Ontologies
- MONDO Disease Ontology: https://mondo.monarchinitiative.org/
- Cell Ontology (CL): https://obofoundry.org/ontology/cl.html
- BRENDA Tissue Ontology (BTO): https://www.ebi.ac.uk/ols4/ontologies/bto
- Cell Line Ontology (CLO): https://www.ebi.ac.uk/ols4/ontologies/clo
- UBERON Anatomy Ontology: https://obofoundry.org/ontology/uberon.html
- PATO Phenotype Ontology: https://obofoundry.org/ontology/pato.html
- NCBI Taxonomy: https://www.ncbi.nlm.nih.gov/taxonomy
- EFO Experimental Factor Ontology: https://www.ebi.ac.uk/efo/