What is an SDRF file?
SDRF (Sample and Data Relationship Format) is a tab-delimited file that describes your proteomics samples and links them to data files. Every row is one sample-to-file relationship, and every column captures a piece of metadata — from the organism you studied to the instrument that acquired the data.
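Columns fall into three categories (characteristics, comment, and factor value), plus the special source name and assay name columns. A minimal example: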
| source name | characteristics[organism] | characteristics[disease] | assay name | comment[instrument] | comment[data file] | factor value[disease] |
|---|---|---|---|---|---|---|
| Sample-1 | Homo sapiens | lung cancer | run_01 | Orbitrap Exploris 480 | sample1.raw | lung cancer |
| Sample-2 | Homo sapiens | normal | run_02 | Orbitrap Exploris 480 | sample2.raw | normal |
One row = one sample in one data file
Each row in an SDRF file links one data file (e.g. a .raw or .mzML file) to one biological sample measured in it. If the same sample was run twice, it gets two rows. If a TMT multiplex packs 10 samples into one file, that file gets 10 rows, one per channel.
This design makes the relationship between samples and data explicit, so analysis tools can automatically match files to conditions without manual configuration.
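As a rough illustration of how a tool might match files to conditions, the sketch below reads the two-row example table with Python's standard csv module and groups data files by the factor value column. The function name files_by_condition is hypothetical and not part of any SDRF tool.

```python
import csv
import io
from collections import defaultdict

# Inline copy of the two-row example table, tab-delimited as in an SDRF file.
SDRF_TEXT = (
    "source name\tcharacteristics[organism]\tcharacteristics[disease]\t"
    "assay name\tcomment[instrument]\tcomment[data file]\tfactor value[disease]\n"
    "Sample-1\tHomo sapiens\tlung cancer\trun_01\tOrbitrap Exploris 480\t"
    "sample1.raw\tlung cancer\n"
    "Sample-2\tHomo sapiens\tnormal\trun_02\tOrbitrap Exploris 480\t"
    "sample2.raw\tnormal\n"
)


def files_by_condition(sdrf_text, factor="factor value[disease]"):
    """Group data file names by the value of one factor column (hypothetical helper)."""
    groups = defaultdict(list)
    reader = csv.DictReader(io.StringIO(sdrf_text), delimiter="\t")
    for row in reader:
        # Each row links exactly one sample to one data file.
        groups[row[factor]].append(row["comment[data file]"])
    return dict(groups)


groups = files_by_condition(SDRF_TEXT)
print(groups)
```

Because the relationship lives in the rows themselves, no separate mapping of files to conditions is needed.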
Three types of columns
Every SDRF column belongs to one of three categories. Understanding these makes filling in a template straightforward.
characteristics[…]
Describe the biological sample: what organism, tissue, disease state, or treatment group it belongs to.
Examples: characteristics[organism], characteristics[disease], characteristics[cell type]
comment[…]
Describe the data acquisition: instrument, data file name, modifications, fragmentation method, and other technical details.
Examples: comment[instrument], comment[data file], comment[label]
factor value[…]
Mark which characteristics are the experimental variables you are comparing. Each factor value column mirrors a characteristics column.
Example: if characteristics[disease] varies, add factor value[disease]
source name & assay name
Two special columns: source name identifies the biological sample, and assay name identifies the instrument run.
source name is always the first column, and assay name sits in the middle, after the characteristics columns and before the comment columns.
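The column categories can be told apart from the header text alone. The following is a minimal sketch that classifies a header by its prefix; column_category is a hypothetical helper, and the "unknown" fallback is an assumption made for illustration.

```python
def column_category(name):
    """Return the category of an SDRF column based on its header text."""
    key = name.strip().lower()
    if key in ("source name", "assay name"):
        return "special"
    for prefix in ("characteristics", "comment", "factor value"):
        # Category columns look like prefix[...], e.g. comment[instrument].
        if key.startswith(prefix + "[") and key.endswith("]"):
            return prefix
    return "unknown"


header = [
    "source name",
    "characteristics[organism]",
    "characteristics[disease]",
    "assay name",
    "comment[instrument]",
    "comment[data file]",
    "factor value[disease]",
]
categories = {column: column_category(column) for column in header}
for column, category in categories.items():
    print(f"{column}: {category}")
```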
Use controlled vocabulary
SDRF values should come from established ontologies whenever possible. This ensures that "breast cancer" in your file means the same thing as "breast cancer" in someone else's file, enabling cross-study analysis.
- Organisms (use NCBI Taxonomy names): Homo sapiens, Mus musculus
- Diseases (use MONDO or EFO terms): breast cancer, normal
- Instruments (use PSI-MS terms): Orbitrap Exploris 480, Q Exactive HF
- Modifications (use UNIMOD names): Oxidation, Carbamidomethyl
- Tissues (use UBERON or BTO terms): liver, blood plasma
The SDRF Terms page lists all allowed values and their source ontologies. The SDRF Editor provides autocomplete for ontology terms.
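To show why free-text values cause trouble, here is a small sketch that flags values missing from an allow-list. The ALLOWED table is a tiny illustrative subset built only from the examples above, not the real ontologies, and unknown_terms is a hypothetical helper. Real checks should rely on the SDRF Terms page or the validator.

```python
# Illustrative allow-list only: a handful of terms taken from the examples above.
ALLOWED = {
    "characteristics[organism]": {"Homo sapiens", "Mus musculus"},
    "characteristics[disease]": {"breast cancer", "normal"},
    "comment[instrument]": {"Orbitrap Exploris 480", "Q Exactive HF"},
}


def unknown_terms(rows, allowed=ALLOWED):
    """Return (row number, column, value) for each value not in the allow-list."""
    problems = []
    for index, row in enumerate(rows, start=1):
        for column, terms in allowed.items():
            value = row.get(column)
            if value is not None and value not in terms:
                problems.append((index, column, value))
    return problems


rows = [
    {"characteristics[organism]": "Homo sapiens", "characteristics[disease]": "normal"},
    {"characteristics[organism]": "human", "characteristics[disease]": "breast cancer"},
]
problems = unknown_terms(rows)
print(problems)
```

Here "human" is flagged because only the taxonomy name Homo sapiens is in the illustrative list.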
Start from a template
You don't need to build an SDRF from scratch. Templates define which columns are required and recommended for your experiment type. They are organized in layers:
- Organism templates (human, vertebrates, plants) add species-specific metadata like age, sex, ancestry, or growth conditions.
- Experiment templates (DIA, single-cell, crosslinking) add technique-specific columns like isolation windows or crosslinker details.
- Templates are composable — combine an organism template with an experiment template to get all the right columns.
Browse available templates on the home page or use the Template Builder to interactively pick the right combination.
Factor values tell tools what you're comparing
Factor value columns are what make an SDRF file useful for analysis. They tell tools like quantms which sample properties differ between your experimental groups.
If you're comparing healthy vs. disease samples, add factor value[disease] with the same values as characteristics[disease]. If you're comparing treatment vs. control, add factor value[compound].
Multiple factor values are allowed when comparing more than one variable.
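One way to keep a factor value column in sync with its characteristics column is to derive it programmatically. This is a sketch under the assumption that rows are held as plain dictionaries; add_factor_value is a hypothetical helper.

```python
def add_factor_value(rows, variable):
    """Copy characteristics[variable] into factor value[variable] for every row."""
    source = f"characteristics[{variable}]"
    target = f"factor value[{variable}]"
    updated = []
    for row in rows:
        if source not in row:
            raise KeyError(f"missing column: {source}")
        new_row = dict(row)  # leave the input rows unmodified
        new_row[target] = row[source]
        updated.append(new_row)
    return updated


rows = [
    {"source name": "Sample-1", "characteristics[disease]": "lung cancer"},
    {"source name": "Sample-2", "characteristics[disease]": "normal"},
]
with_factor = add_factor_value(rows, "disease")
for row in with_factor:
    print(row["source name"], row["factor value[disease]"])
```

Calling the helper once per variable gives multiple factor value columns when more than one variable is compared.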
Common patterns
Label-free experiment
One sample per file. Set comment[label] to label free sample.
Each row is one .raw file from one biological sample.
TMT / iTRAQ multiplexing
Multiple samples per file. Each channel gets its own row with the same comment[data file] but different comment[label] values (e.g. TMT126, TMT127N).
Fractionation
One sample split across multiple files. Each fraction is a separate row with the same source name but different comment[fraction identifier] values.
Technical replicates
Same sample run multiple times. Each replicate is its own row with the same source name but different comment[technical replicate] values.
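The multiplexing pattern can be sketched as a small row generator: one data file expands into one row per channel. The helper multiplexed_rows, the file name plex1.raw, and the channel-to-sample mapping are all hypothetical, and only a few of the columns are shown.

```python
def multiplexed_rows(data_file, assay_name, channel_to_sample):
    """Build one row per label channel, all pointing at the same data file."""
    rows = []
    for label, sample in channel_to_sample.items():
        rows.append(
            {
                "source name": sample,
                "assay name": assay_name,
                "comment[label]": label,
                "comment[data file]": data_file,
            }
        )
    return rows


rows = multiplexed_rows(
    "plex1.raw",  # hypothetical file name
    "run_01",
    {"TMT126": "Sample-1", "TMT127N": "Sample-2"},
)
for row in rows:
    print("\t".join(row.values()))
```

The fractionation and technical replicate patterns follow the same idea in reverse: the source name stays fixed while the data file and the fraction or replicate value change per row.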
Validate before submitting
The sdrf-pipelines validator checks your file against the specification: required columns, valid ontology terms, consistent labels across rows, and more.
Install and run:
pip install sdrf-pipelines
parse_sdrf validate-sdrf --sdrf_file your-file.sdrf.tsv
Fix any errors reported, then validate again. A passing validation means your SDRF is ready for submission to PRIDE or other ProteomeXchange repositories.
Ready to build your SDRF?
Use the interactive Template Builder to select your technology, organism, and experiment type — and generate a customized template in seconds.