1. Introduction
This document provides guidelines for annotating general sample metadata in SDRF-Proteomics format. These guidelines apply to samples from any organism (human, animal, plant, microorganism).
For human-specific metadata (disease staging, comorbidities, treatment history), see the Human Template.
Version 1.1.0 - 2026-01
2. Best Practices
- Use lowercase for all controlled vocabulary values (except proper nouns in disease names).
- Use ontology terms mapped to MONDO (diseases), CL (cell types), UBERON (anatomy), PATO (phenotypes).
- Be consistent with format across all samples in a dataset.
- Document unknowns using not available (unknown) or not applicable (not relevant); never leave cells empty.
- Validate before submission using sdrf-pipelines to check ontology mappings.
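The rule against empty cells can be checked with a few lines of code before running a full validator. The following is a minimal sketch, assuming the SDRF file is tab-separated with a single header row; find_empty_cells is a hypothetical helper written for illustration and is not part of sdrf-pipelines.

```python
import csv
import io


def find_empty_cells(sdrf_text: str) -> list[tuple[int, str]]:
    """Return (row_number, column_name) for every blank cell.

    Assumption: the SDRF content is tab-separated and the first line is the header.
    Row numbers count data rows, starting at 1.
    """
    reader = csv.reader(io.StringIO(sdrf_text), delimiter="\t")
    header = next(reader)
    problems = []
    for row_number, row in enumerate(reader, start=1):
        # zip() keeps duplicate column names intact, unlike a dict-based reader.
        for column, value in zip(header, row):
            if not value.strip():
                problems.append((row_number, column))
    return problems


example = (
    "source name\tcharacteristics[organism]\tcharacteristics[age]\n"
    "sample_1\thomo sapiens\t40Y\n"
    "sample_2\thomo sapiens\t\n"
)
print(find_empty_cells(example))
```

A blank cell reported here would be filled with not available or not applicable, depending on whether the information is unknown or irrelevant.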
3. General Formatting Conventions
3.1. Capitalization Rules
Most controlled vocabulary values should be lowercase:
- Organism names: homo sapiens, mus musculus
- Organism parts: blood, liver, brain
- Sex values: male, female
Exceptions (retain proper noun capitalization):
- Ancestry categories (geographic populations): African, European, South Asian
- Cell line names: HeLa, HEK293, K562
Note: Validators should normalize common capitalization variations (e.g., accept both Homo sapiens and homo sapiens), but submitters should use lowercase for consistency.
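One way a submitter-side script might apply these capitalization rules is sketched below. The set of columns to lowercase is an assumption drawn from the examples in this section, and normalize_value is a hypothetical helper, not a validator API.

```python
# Assumption: these are the columns whose values are written in lowercase.
# Columns not listed here (e.g., cell line names) keep their capitalization.
LOWERCASE_COLUMNS = {
    "characteristics[organism]",
    "characteristics[organism part]",
    "characteristics[sex]",
}


def normalize_value(column: str, value: str) -> str:
    """Lowercase values of selected columns; only trim whitespace elsewhere."""
    cleaned = value.strip()
    if column.strip().lower() in LOWERCASE_COLUMNS:
        return cleaned.lower()
    return cleaned


print(normalize_value("characteristics[organism]", "Homo sapiens"))
print(normalize_value("characteristics[cell line]", "HeLa"))
```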
4. Organism
Column: characteristics[organism]
Ontology: NCBI Taxonomy (NCBITaxon)
Use the scientific name in lowercase. The validator will map to the correct ontology term.
| Value | NCBITaxon ID | Description |
|---|---|---|
| homo sapiens | | Human |
| mus musculus | | Mouse |
5. Organism Part / Tissue
Column: characteristics[organism part]
Ontology: UBERON for mammals/vertebrates, Plant Ontology (PO) for plants, FlyBase Anatomy (FBbt) for Drosophila.
Use lowercase for all values. For cell line samples, use not applicable or specify the tissue of origin (e.g., cervix for HeLa).
|
Note
|
Do NOT use The |
| Value | UBERON ID | Description |
|---|---|---|
| blood plasma | | Blood plasma |
| liver | | Liver tissue |
| brain | | Brain |
| heart | | Heart |
6. Age
Column: characteristics[age]
Format: {Number}{Unit} where Unit is: Y (Years), M (Months), W (Weeks), D (Days).
6.1. Age Formats
| Age Type | Format | Example | Description |
|---|---|---|---|
| Exact age | | | 40 years old |
| Exact age | | | 8 weeks old |
| Age range | | | Between 40 and 50 years |
| Greater than | | | Older than 18 years |
| Less than | | | Younger than 65 years |
| Greater or equal | | | 21 years or older |
Important: When the exact age cannot be determined precisely, consider an alternative such as developmental stage, which is useful for animal studies, pooled samples, or historical samples.
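A rough format check for the age column might look like the following sketch. It assumes integer ages, ranges written as two ages joined by a hyphen (40Y-50Y), and bounds prefixed with >, < or >= (for example >18Y). Those spellings are assumptions made for illustration, as are the names AGE_PATTERN and is_valid_age.

```python
import re

# A single age: a whole number followed by Y, M, W or D.
SINGLE_AGE = r"\d+[YMWD]"

# Assumption: ranges use a hyphen and bounds use >, < or >= as a prefix.
AGE_PATTERN = re.compile(
    rf"(?:{SINGLE_AGE}"              # exact age, e.g. 40Y
    rf"|{SINGLE_AGE}-{SINGLE_AGE}"   # age range, e.g. 40Y-50Y
    rf"|(?:>=|>|<){SINGLE_AGE})"     # bound, e.g. >18Y or >=21Y
)

# Placeholder values used when the age is unknown or irrelevant.
SPECIAL_VALUES = {"not available", "not applicable"}


def is_valid_age(value: str) -> bool:
    """Return True if the value matches one of the assumed age spellings."""
    return value in SPECIAL_VALUES or AGE_PATTERN.fullmatch(value) is not None


for candidate in ["40Y", "8W", "40Y-50Y", ">18Y", "40 years"]:
    print(candidate, is_valid_age(candidate))
```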
7. Sex
Column: characteristics[sex]
Requirement: REQUIRED for human samples, OPTIONAL for other organisms. Allowed values come from the PATO ontology under PATO:0000047 (biological sex).
Allowed values:
| Value | Description |
|---|---|
| male | Male biological sex |
| female | Female biological sex |
Note: For cell lines, use the sex of the original donor if known (e.g., female for HeLa cells), otherwise use not available. Use anonymized when the information exists but has been redacted for privacy reasons (e.g., clinical studies with de-identified data).
9. Disease
Column: characteristics[disease]
Ontology: MONDO Disease Ontology (preferred), EFO, Human Disease Ontology (DOID)
9.1. Healthy Samples: normal vs healthy vs control
For samples without disease, the terminology matters for standardization:
normal (PATO:0000461) - Recommended
- Standard term in pathology ("normal tissue" vs "tumor tissue")
- Well-defined ontology mapping to PATO:0000461
- Widely used in existing proteomics datasets
healthy (SIO:001012) - Accepted alternative
- More intuitive for clinical/human samples
- Has valid ontology support (Semanticscience Integrated Ontology)
- Validators should accept both normal and healthy
control - Avoid
- "Control" is an experimental design concept, not a disease state
- Ambiguous: a control could be healthy, vehicle-treated, time-zero, or even a different disease used for comparison
The characteristics[disease] column captures the actual disease state of the sample, while factor value[disease] indicates the experimental comparison groups:
| source name | ... | characteristics[disease] | ... | factor value[disease] |
|---|---|---|---|---|
| healthy_1 | ... | normal | ... | normal |
| tumor_1 | ... | breast carcinoma | ... | breast carcinoma |
| adjacent_1 | ... | normal | ... | normal |
In this example:
- healthy_1: Healthy control sample from a healthy individual
- tumor_1: Disease case sample (tumor tissue)
- adjacent_1: Adjacent normal tissue from a cancer patient
Both healthy_1 and adjacent_1 have normal as their disease state (no tumor cells), but they come from different individuals. The experimental design and comparison groups are defined by the factor values.
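The distinction between normal, healthy, and control can be turned into a simple pre-submission check. The sketch below is illustrative only: classify_disease_value and its return labels are hypothetical, and it does not look anything up in MONDO or any other ontology.

```python
def classify_disease_value(value: str) -> str:
    """Classify a characteristics[disease] value using the guidance above.

    The returned labels are hypothetical and chosen for this example only.
    """
    cleaned = value.strip().lower()
    if cleaned == "control":
        # An experimental design concept, not a disease state.
        return "rejected"
    if cleaned in {"normal", "healthy"}:
        return "no disease"
    if cleaned in {"not available", "not applicable"}:
        return "unknown"
    # Anything else is assumed to be a disease name to map to an ontology term.
    return "disease term"


for value in ["normal", "healthy", "Control", "breast carcinoma"]:
    print(value, "->", classify_disease_value(value))
```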
9.2. Disease Examples
| Disease | SDRF Value | Ontology Link |
|---|---|---|
| Healthy/no disease | normal | |
| Healthy (alternative) | healthy | |
| Breast cancer | breast carcinoma | |
Tip: For animal disease models, use the human disease name to facilitate cross-species comparisons.
10. Phenotype
Column: characteristics[phenotype]
Ontology: EFO:0000651
Requirement: Optional
Describes observable characteristics or traits of a sample that result from interactions between genotype, environment, and treatment. This column captures how the sample behaves or appears, not what disease it has.
| Column | Purpose | Example |
|---|---|---|
| characteristics[disease] | What disease/pathology is present | "breast cancer", "Alzheimer disease" |
| characteristics[genotype] | What genetic variant is present | "KRAS G12D mutant", "wild type" |
| characteristics[phenotype] | How the sample behaves/appears | "gefitinib sensitive", "HER2 positive" |
Common phenotype categories:
- Drug response: gefitinib sensitive, cisplatin resistant
- Molecular markers: HER2 positive, ER positive
- Expression states: overexpressing Neurogenin3, FOXP3 expression
- Functional traits: undifferentiated, adipogenic
- Environmental responses: high fat diet, heat shock response
When NOT to use phenotype: For disease names use characteristics[disease], for genetic variants use characteristics[genotype], for treatments use characteristics[compound] or characteristics[treatment].
11. Cell Type
Column: characteristics[cell type]
Ontology: Cell Ontology (CL)
For cell lines, optionally include the cell type of origin.
| Value | CL Link |
|---|---|
| epithelial cell | |
| T cell | |
| B cell | |
| macrophage | |
| fibroblast | |
For detailed immune cell studies, use specific subtypes: CD4-positive, alpha-beta T cell (CL:0000624), regulatory T cell (CL:0000815).
12. Material Type
Column: characteristics[material type]
Ontology: PRIDE:0000837
Material type describes the nature of the biological material being analyzed.
Requirement: Optional
Allowed values:
| Value | Description | When to Use |
|---|---|---|
| tissue | Solid tissue sample from an organism | Biopsies, surgical specimens, autopsy samples |
| cell | Individual cells or cell suspensions | FACS-sorted cells, primary cell cultures, dissociated tissue |
| cell line | Established immortalized cell line | HeLa, HEK293, A549, etc. |
| organism part | A part of an organism’s anatomy | Organ samples, body fluids (when not whole tissue) |
| whole organism | Complete organism sample | Single-celled organisms, small model organisms (C. elegans, yeast) |
| synthetic | Artificially synthesized material | Synthetic peptide libraries, recombinant proteins |
Special values:
- not available: When material type is unknown
- not applicable: When the concept doesn’t apply (e.g., for computational datasets)
Examples:
| Sample Type | characteristics[material type] | characteristics[organism part] | characteristics[cell line] |
|---|---|---|---|
| Liver biopsy | tissue | liver | not applicable |
| HeLa cells | cell line | cervix | HeLa |
| Sorted T cells | cell | blood | not applicable |
| Mouse whole brain | tissue | brain | not applicable |
| E. coli culture | whole organism | not applicable | not applicable |
| Synthetic peptides | synthetic | not applicable | not applicable |
Note: For cell line samples, use cell line as the material type. The specific cell line name goes in characteristics[cell line]. The tissue of origin can be specified in characteristics[organism part].
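The relationship between material type and the cell line column can be checked row by row. The following sketch assumes each row is available as a dictionary keyed by column name; check_material_type and its messages are hypothetical and only cover the rules stated in this section.

```python
# Allowed values as listed in this section, plus the two special values.
ALLOWED_MATERIAL_TYPES = {
    "tissue", "cell", "cell line", "organism part",
    "whole organism", "synthetic",
    "not available", "not applicable",
}


def check_material_type(row: dict[str, str]) -> list[str]:
    """Return a list of human-readable issues for one sample row."""
    issues = []
    material = row.get("characteristics[material type]", "").strip()
    if material not in ALLOWED_MATERIAL_TYPES:
        issues.append(f"unexpected material type: {material!r}")
    cell_line = row.get("characteristics[cell line]", "").strip()
    # A cell line sample is expected to name the line in its own column.
    if material == "cell line" and cell_line in {"", "not applicable"}:
        issues.append("material type is 'cell line' but no cell line is named")
    return issues


hela = {
    "characteristics[material type]": "cell line",
    "characteristics[organism part]": "cervix",
    "characteristics[cell line]": "HeLa",
}
print(check_material_type(hela))
```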
12.1. Additional Sample-Related Columns
| Column | Description | Example Values | When to Use | Example |
|---|---|---|---|---|
| | Treatment applied to the sample | dexamethasone, vehicle control, untreated | Drug treatment studies | |
| | Time of sample collection | 0h, 24h, 7d, baseline | Time-course experiments | |
| | Dose of treatment if applicable | 10 mg/kg, 100 nM, high dose | Dose-response studies | — |
| | Body mass index (for human studies) | 25.3 kg/m2, 30.1 kg/m2 | Metabolic or obesity-related studies | |
| | Smoking history of the patient | never smoked, current smoker, former smoker | Lung or cardiovascular studies | — |
For additional columns, see the SDRF Terms Reference and the Human Template.
13. PTM Enrichment
Column: characteristics[enrichment process]
Ontology: EFO
| Value | Description |
|---|---|
| enrichment of phosphorylated Protein | Phosphoproteomics enrichment |
| not applicable | No PTM enrichment performed |
14. Depletion
Column: characteristics[depletion]
Indicates whether abundant proteins were depleted; used for blood/plasma samples.
Values: no depletion, depletion, not applicable.
15. Patient-Derived Xenografts (PDX)
Column: characteristics[xenograft]
When annotating PDX samples, metadata (age, sex) MUST refer to the original patient, not the host organism.
| source name | characteristics[organism] | characteristics[xenograft] | characteristics[age] |
|---|---|---|---|
| tumor_001 | homo sapiens | not applicable | 65Y |
| pdx_001 | homo sapiens | pancreatic adenocarcinoma grown in nude mice | 65Y |
16. Synthetic Peptide Libraries
Column: characteristics[synthetic peptide]
Values: synthetic (sample is a synthetic peptide library), not synthetic (biological sample).
For synthetic libraries, most sample metadata can be not applicable. The organism MAY be specified if the library was designed from the peptides of a specific species.
17. Spiked-in Samples
Column: characteristics[spiked compound]
For samples spiked with peptides, proteins, or mixtures (e.g., for quantification standards or retention time alignment), use key-value pairs:
| Key | Meaning | Example | Required for |
|---|---|---|---|
| CT | Compound type | peptide, protein, mixture | All |
| QY | Quantity | 10 fmol, 20 nmol | All |
| PS | Peptide sequence | PEPTIDESEQ | Peptides |
| AC | UniProt accession | A9WZ33 | Proteins |
| CN | Compound name | iRT mixture | Optional |
| CV | Compound vendor | Biognosys | Mixtures (required) |
Example: characteristics[spiked compound]: CT=peptide;PS=PEPTIDESEQ;QY=10 fmol
The injected mass of the main sample SHOULD be specified in characteristics[mass]. For multiple spiked components, repeat the column. If the spiked component is another biological sample (e.g., E. coli lysate), annotate it in its own row with characteristics[mass] specified for both components.
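The key-value syntax of the spiked compound column is straightforward to parse. The sketch below splits a value on semicolons and checks the keys listed as required in the table above; parse_spiked_compound and missing_required_keys are hypothetical helper names, and no quantity units or accessions are verified.

```python
def parse_spiked_compound(value: str) -> dict[str, str]:
    """Split a value such as 'CT=peptide;PS=PEPTIDESEQ;QY=10 fmol' into a dict."""
    fields = {}
    for part in value.split(";"):
        part = part.strip()
        if not part:
            continue
        key, separator, content = part.partition("=")
        if not separator:
            raise ValueError(f"expected KEY=value, got {part!r}")
        fields[key.strip().upper()] = content.strip()
    return fields


def missing_required_keys(fields: dict[str, str]) -> list[str]:
    """List required keys that are absent, based on the compound type (CT)."""
    required = ["CT", "QY"]  # required for all compound types
    compound_type = fields.get("CT", "").lower()
    if compound_type == "peptide":
        required.append("PS")  # peptide sequence
    elif compound_type == "protein":
        required.append("AC")  # UniProt accession
    elif compound_type == "mixture":
        required.append("CV")  # compound vendor
    return [key for key in required if key not in fields]


parsed = parse_spiked_compound("CT=peptide;PS=PEPTIDESEQ;QY=10 fmol")
print(parsed)
print(missing_required_keys(parsed))
```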
18. Ontologies
- MONDO Disease Ontology: https://mondo.monarchinitiative.org/
- Cell Ontology (CL): https://obofoundry.org/ontology/cl.html
- UBERON Anatomy Ontology: https://obofoundry.org/ontology/uberon.html
- PATO Phenotype Ontology: https://obofoundry.org/ontology/pato.html
- NCBI Taxonomy: https://www.ncbi.nlm.nih.gov/taxonomy
- EFO Experimental Factor Ontology: https://www.ebi.ac.uk/efo/