What is an SDRF file?

SDRF (Sample and Data Relationship Format) is a tab-delimited file that describes your proteomics samples and links them to data files. Every row is one sample-to-file relationship, and every column captures a piece of metadata — from the organism you studied to the instrument that acquired the data.

Columns fall into three categories:

source name  characteristics[organism]  characteristics[disease]  assay name  comment[instrument]    comment[data file]  factor value[disease]
Sample-1     Homo sapiens               lung cancer               run_01      Orbitrap Exploris 480  sample1.raw         lung cancer
Sample-2     Homo sapiens               normal                    run_02      Orbitrap Exploris 480  sample2.raw         normal

Sample metadata: characteristics[organism], characteristics[disease]
Data / instrument: comment[instrument], comment[data file]
Experimental factors: factor value[disease]
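Because an SDRF file is plain tab-delimited text, the example rows above can be read with a few lines of Python. This is a minimal sketch using only the standard library; the function name read_sdrf is an illustration, not part of any SDRF tool. Rows are kept as lists because SDRF files can repeat a column name, which a plain dictionary reader would silently collapse.

```python
import csv
import io

# The example rows from this page, written out as tab-delimited text.
SDRF_TEXT = (
    "source name\tcharacteristics[organism]\tcharacteristics[disease]\t"
    "assay name\tcomment[instrument]\tcomment[data file]\tfactor value[disease]\n"
    "Sample-1\tHomo sapiens\tlung cancer\trun_01\tOrbitrap Exploris 480\t"
    "sample1.raw\tlung cancer\n"
    "Sample-2\tHomo sapiens\tnormal\trun_02\tOrbitrap Exploris 480\t"
    "sample2.raw\tnormal\n"
)


def read_sdrf(text):
    """Return (header, rows) from tab-delimited SDRF text (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    return header, rows


header, rows = read_sdrf(SDRF_TEXT)
for row in rows:
    # Safe here because this small example has no repeated column names.
    record = dict(zip(header, row))
    print(record["source name"], "->", record["comment[data file]"])
```

For a file on disk, the same reader works with open(path, newline="") in place of io.StringIO.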

One row = one data file

Each row in an SDRF file links one data file (e.g. a .raw or .mzML file) to one biological sample measured in it. If the same sample was run twice, it gets two rows. If a TMT multiplex packs 10 samples into one file, that file gets 10 rows, one per channel.

This design makes the relationship between samples and data explicit, so analysis tools can automatically match files to conditions without manual configuration.
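As a rough illustration of that automatic matching, the sketch below groups data files by condition. It assumes the rows have already been parsed into dictionaries keyed by column name; the third row and the helper name files_by_condition are hypothetical and only serve the example.

```python
from collections import defaultdict

# Hypothetical rows, already parsed from an SDRF file (column name -> value).
rows = [
    {"comment[data file]": "sample1.raw", "factor value[disease]": "lung cancer"},
    {"comment[data file]": "sample2.raw", "factor value[disease]": "normal"},
    {"comment[data file]": "sample3.raw", "factor value[disease]": "lung cancer"},
]


def files_by_condition(rows, factor_column):
    """Group data file names by the value in one factor value column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[factor_column]].append(row["comment[data file]"])
    return dict(groups)


groups = files_by_condition(rows, "factor value[disease]")
for condition, files in groups.items():
    print(condition, files)
```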

Three types of columns

Every SDRF column belongs to one of three categories. Understanding these makes filling in a template straightforward.

characteristics[…]

Describe the biological sample: what organism, tissue, disease state, or treatment group it belongs to.

Examples: characteristics[organism], characteristics[disease], characteristics[cell type]

comment[…]

Describe the data acquisition: instrument, data file name, modifications, fragmentation method, and other technical details.

Examples: comment[instrument], comment[data file], comment[label]

factor value[…]

Mark which characteristics are the experimental variables you are comparing. Each factor value column mirrors a characteristics column.

Example: if characteristics[disease] varies, add factor value[disease]

source name & assay name

Two special columns: source name identifies the biological sample, and assay name identifies the instrument run.

source name is always the first column, and assay name sits between the characteristics and comment columns.
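The naming convention makes it possible to sort a header into these groups by prefix alone. The following sketch assumes only the column types described on this page; the function column_category is hypothetical, and any other column name is reported as "unknown".

```python
def column_category(name):
    """Classify one SDRF column name by its prefix (illustrative helper)."""
    name = name.strip().lower()
    if name in ("source name", "assay name"):
        return "special"
    for prefix in ("characteristics", "comment", "factor value"):
        if name.startswith(prefix + "[") and name.endswith("]"):
            return prefix
    return "unknown"


header = [
    "source name",
    "characteristics[organism]",
    "characteristics[disease]",
    "assay name",
    "comment[instrument]",
    "comment[data file]",
    "factor value[disease]",
]

categories = {column: column_category(column) for column in header}
for column, category in categories.items():
    print(f"{category:16} {column}")
```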

Use controlled vocabulary

SDRF values should come from established ontologies whenever possible. This ensures that "breast cancer" in your file means the same thing as "breast cancer" in someone else's file, enabling cross-study analysis.

The SDRF Terms page lists all allowed values and their source ontologies. The SDRF Editor provides autocomplete for ontology terms.

Start from a template

You don't need to build an SDRF from scratch. Templates define which columns are required and recommended for your experiment type. They are organized in layers that can be combined.

Browse available templates on the home page or use the Template Builder to interactively pick the right combination.

Factor values tell tools what you're comparing

Factor value columns are what make an SDRF file useful for analysis. They tell tools like quantms which sample properties differ between your experimental groups.

If you're comparing healthy vs. disease samples, add factor value[disease] with the same values as characteristics[disease]. If you're comparing treatment vs. control, add factor value[compound]. Multiple factor values are allowed when comparing more than one variable.
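Since a factor value column is expected to carry the same values as the characteristics column it mirrors, a quick consistency check can catch copy-and-paste slips before validation. This is a sketch under that assumption; the helper mismatched_factor_values and the Sample-3 row (deliberately inconsistent) are hypothetical.

```python
def mismatched_factor_values(rows, variable):
    """Return source names whose factor value differs from the characteristic."""
    factor_column = f"factor value[{variable}]"
    characteristic_column = f"characteristics[{variable}]"
    return [
        row["source name"]
        for row in rows
        if row[factor_column] != row[characteristic_column]
    ]


# Hypothetical parsed rows; Sample-3 is inconsistent on purpose.
rows = [
    {"source name": "Sample-1", "characteristics[disease]": "lung cancer",
     "factor value[disease]": "lung cancer"},
    {"source name": "Sample-2", "characteristics[disease]": "normal",
     "factor value[disease]": "normal"},
    {"source name": "Sample-3", "characteristics[disease]": "normal",
     "factor value[disease]": "lung cancer"},
]

mismatches = mismatched_factor_values(rows, "disease")
print(mismatches)
```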

Common patterns

Label-free experiment

One sample per file. Set comment[label] to label free sample. Each row is one .raw file from one biological sample.

TMT / iTRAQ multiplexing

Multiple samples per file. Each channel gets its own row with the same comment[data file] but different comment[label] values (e.g. TMT126, TMT127N).
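The multiplexing pattern can be sketched as one row per channel that repeats the data file and changes only the sample and label. The helper tmt_rows, the file name plex1.raw, and the channel-to-sample assignment are hypothetical; a real SDRF row would also carry the characteristics and remaining comment columns.

```python
def tmt_rows(data_file, assay_name, samples_by_channel):
    """Build one partial SDRF row per channel of a multiplexed file."""
    return [
        {
            "source name": sample,
            "assay name": assay_name,
            "comment[label]": channel,
            "comment[data file]": data_file,
        }
        for channel, sample in samples_by_channel.items()
    ]


# Hypothetical assignment of samples to the first two channels.
rows = tmt_rows(
    "plex1.raw",
    "run_01",
    {"TMT126": "Sample-1", "TMT127N": "Sample-2"},
)
for row in rows:
    print("\t".join(row.values()))
```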

Fractionation

One sample split across multiple files. Each fraction is a separate row with the same source name but different comment[fraction identifier] values.

Technical replicates

Same sample run multiple times. Each replicate is its own row with the same source name but different comment[technical replicate] values.
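To see which of these patterns a file follows, it can help to count the distinct fraction and replicate identifiers recorded for each source name. The sketch below assumes parsed rows that contain both columns; the rows, file names, and the helper describe_samples are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: Sample-1 is fractionated, Sample-2 has two replicates.
rows = [
    {"source name": "Sample-1", "comment[data file]": "s1_f1.raw",
     "comment[fraction identifier]": "1", "comment[technical replicate]": "1"},
    {"source name": "Sample-1", "comment[data file]": "s1_f2.raw",
     "comment[fraction identifier]": "2", "comment[technical replicate]": "1"},
    {"source name": "Sample-2", "comment[data file]": "s2_r1.raw",
     "comment[fraction identifier]": "1", "comment[technical replicate]": "1"},
    {"source name": "Sample-2", "comment[data file]": "s2_r2.raw",
     "comment[fraction identifier]": "1", "comment[technical replicate]": "2"},
]


def describe_samples(rows):
    """Count distinct fractions and technical replicates per source name."""
    fractions = defaultdict(set)
    replicates = defaultdict(set)
    for row in rows:
        name = row["source name"]
        fractions[name].add(row["comment[fraction identifier]"])
        replicates[name].add(row["comment[technical replicate]"])
    return {
        name: {
            "fractions": len(fractions[name]),
            "technical replicates": len(replicates[name]),
        }
        for name in fractions
    }


summary = describe_samples(rows)
for name, counts in summary.items():
    print(name, counts)
```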

Validate before submitting

The sdrf-pipelines validator checks your file against the specification: required columns, valid ontology terms, consistent labels across rows, and more.

Install and run:

pip install sdrf-pipelines
parse_sdrf validate-sdrf --sdrf_file your-file.sdrf.tsv

Fix any errors reported, then validate again. A passing validation means your SDRF is ready for submission to PRIDE or other ProteomeXchange repositories.
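When several files need checking, the same command can be driven from a script. The sketch below only wraps the command line shown above with Python's subprocess module; the helper names are hypothetical, and how the validator reports failure (exit code versus printed messages) is an assumption to confirm against its own output before relying on it.

```python
import shutil
import subprocess


def validator_command(sdrf_path):
    """Build the command line shown above for one SDRF file."""
    return ["parse_sdrf", "validate-sdrf", "--sdrf_file", str(sdrf_path)]


def run_validator(sdrf_path):
    """Run the validator and return (exit_code, combined_output).

    Returns (None, message) when parse_sdrf is not on the PATH.
    Assumption: inspect the output as well as the exit code, since the
    exact failure reporting is not described on this page.
    """
    if shutil.which("parse_sdrf") is None:
        return None, "parse_sdrf not found; run 'pip install sdrf-pipelines' first"
    result = subprocess.run(
        validator_command(sdrf_path), capture_output=True, text=True
    )
    return result.returncode, result.stdout + result.stderr


command = validator_command("your-file.sdrf.tsv")
print(" ".join(command))
```

Calling run_validator in a loop over a folder of .sdrf.tsv files gives one report per file without retyping the command.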

Ready to build your SDRF?

Use the interactive Template Builder to select your technology, organism, and experiment type — and generate a customized template in seconds.
