Create Your SDRF File

Work through the steps below to find the right template and examples for your experiment.

What is SDRF?

SDRF (Sample and Data Relationship Format) is a tab-separated text file that describes your proteomics experiment. You can open and edit it in any spreadsheet program, such as Excel. It connects your biological samples to your mass spectrometry data files.

The Core Concept

Think of SDRF as a table where:

  • Each row = one sample-to-file relationship
  • Each column = one piece of information about that sample or file
Why use SDRF? When you submit data to repositories like PRIDE, SDRF ensures your experiment is fully described and can be reanalyzed by others. It's becoming a standard requirement for proteomics data submission. See the specification to learn more.
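The following minimal Python sketch shows the row-and-column idea in practice. It reads a tab-separated SDRF snippet with the standard-library `csv` module and prints each sample-to-file link. The three-column snippet is a hypothetical, trimmed-down example, not a complete valid SDRF file.

```python
import csv
import io

# Hypothetical, trimmed SDRF snippet: two samples, one raw file each.
SDRF_TEXT = (
    "source name\tcharacteristics[organism]\tcomment[data file]\n"
    "patient_001\thomo sapiens\tpatient_001.raw\n"
    "control_001\thomo sapiens\tcontrol_001.raw\n"
)


def read_sdrf(handle):
    """Return one dict per row, keyed by the column names in the header."""
    return list(csv.DictReader(handle, delimiter="\t"))


rows = read_sdrf(io.StringIO(SDRF_TEXT))
for row in rows:
    # Each row links one sample (source name) to one file (comment[data file]).
    print(row["source name"], "->", row["comment[data file]"])
```

`csv.DictReader` keys each row by column name. If a file repeats a column name (some SDRF columns can appear more than once), only the last value for that name survives, so use `csv.reader` for such files.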

Learn from Real Examples

The best way to learn is by example. Browse real SDRF files from published proteomics datasets and use them as templates for your own experiments.

Tip: Find datasets similar to yours by organism, instrument, or acquisition method. You can view the complete SDRF structure and use it as a starting point. Browse examples in the SDRF Explorer.

Step 1: Choose Your Template

Templates are pre-made SDRF files with the right columns already set up for your experiment type. Pick one based on your organism and experiment:

Core Templates (by organism)

Your samples | Template | Includes
Human samples | human | age, sex, ancestry, ethnicity
Mouse, rat, zebrafish | vertebrates | developmental stage, strain
Insects, worms | invertebrates | developmental stage
Plants | plants | cultivar, growth conditions
Other / not sure | ms-proteomics | minimal required columns

Specialized Templates (by experiment type)

Experiment type | Template | Adds
Cell line studies | cell-lines | cell line name, Cellosaurus ID
DIA acquisition | dia-acquisition | DIA isolation window
Immunopeptidomics | immunopeptidomics | HLA alleles, MHC class
Cross-linking MS | crosslinking | crosslinker, cleavability
Single-cell proteomics | single-cell | single-cell identifier

Tip: You can combine templates! Start with a core template (e.g., "human") and add columns from specialized templates as needed.
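If you prefer to assemble the header programmatically, the sketch below merges a core template's columns with those of a specialized template. It skips duplicates and keeps the sample (characteristics) columns grouped ahead of the data-file columns, mirroring the column order of the complete example later in this guide. Both column lists are hypothetical stand-ins; take the real columns from the templates themselves.

```python
# Hypothetical, trimmed column lists; copy the real ones from the templates.
HUMAN_CORE = [
    "source name",
    "characteristics[organism]",
    "characteristics[age]",
    "characteristics[sex]",
    "assay name",
    "comment[data file]",
]
CELL_LINES_EXTRA = [
    "characteristics[cell line]",
    "characteristics[cellosaurus accession]",
]


def combine_columns(core, *extras):
    """Merge template column lists, skipping columns that are already present.

    New characteristics[...] columns go right after the last existing
    characteristics column; any other new column is appended at the end.
    """
    combined = list(core)
    for extra in extras:
        for column in extra:
            if column in combined:
                continue  # already supplied by an earlier template
            if column.startswith("characteristics["):
                positions = [i for i, c in enumerate(combined)
                             if c.startswith("characteristics[")]
                at = positions[-1] + 1 if positions else len(combined)
                combined.insert(at, column)
            else:
                combined.append(column)
    return combined


print(combine_columns(HUMAN_CORE, CELL_LINES_EXTRA))
```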

Step 2: Understand Column Types

SDRF columns follow a naming pattern that tells you what kind of information they contain:

  • characteristics[...] (sample metadata): describe the biological sample. Examples: characteristics[organism], characteristics[disease], characteristics[organism part]
  • comment[...] (data file metadata): describe the data file or MS run. Examples: comment[data file], comment[instrument], comment[label]
  • factor value[...] (what you're studying): the experimental variable you're comparing. Examples: factor value[disease], factor value[compound], factor value[time]

Important: Column names are case-sensitive and spacing matters!
  • characteristics[organism] — Correct
  • Characteristics[organism] — Wrong (capital C)
  • characteristics [organism] — Wrong (space before bracket)
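The naming rules above are easy to check mechanically. The sketch below is a rough, self-written check, not the validator's own logic. It checks only the column-type prefix and the spacing, so text inside the brackets may contain capitals (for example a hypothetical comment[MS2 analyzer type]). The short list of plain, unbracketed column names is limited to those used in this guide.

```python
import re

# Bracketed columns: lowercase type prefix, no space before "[", non-empty name inside.
BRACKETED = re.compile(r"^(characteristics|comment|factor value)\[[^\[\]]+\]$")
# Unbracketed columns used in this guide (not an exhaustive list).
PLAIN = {"source name", "assay name", "technology type"}
# A word or phrase followed by whitespace and then "[".
SPACE_BEFORE_BRACKET = re.compile(r"^[\w ]*\w\s+\[")


def check_column_name(column):
    """Return None if the name looks well formed, otherwise a short reason."""
    if column in PLAIN or BRACKETED.match(column):
        return None
    if SPACE_BEFORE_BRACKET.match(column):
        return "remove the space before '['"
    if column.lower() in PLAIN or BRACKETED.match(column.lower()):
        return "use lowercase"
    return "unrecognized column name"


for name in ["characteristics[organism]", "Characteristics[organism]",
             "characteristics [organism]", "Source Name"]:
    print(f"{name!r}: {check_column_name(name) or 'ok'}")
```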

Step 3: Fill in Sample Information

Open your template in Excel, Google Sheets, or any spreadsheet software. For each sample, fill in the biological information:

Column | What to write | Example | Notes
source name | A unique identifier for your biological sample | patient_001 | Unique to each sample; repeated on every row for that sample's fractions or replicates
characteristics[organism] | Species name (lowercase) | homo sapiens | Use the scientific name from NCBI Taxonomy
characteristics[organism part] | Tissue or body part | liver | Use terms from UBERON
characteristics[disease] | Disease name, or "normal" | hepatocellular carcinoma | Use "normal" for healthy samples
Tip: Don't stress about finding exact ontology terms. Write the common name (e.g., "liver", "breast cancer") and the validator will check it for you. You can always refine later.

Step 4: Add Data File Information

For each row, also specify information about the raw file and how it was acquired:

Column | What to write | Example | Notes
assay name | A name for this MS run | run_001 | Often the same as the source name
comment[label] | Type of labeling | label free sample | Or TMT126, TMT127N, etc.
comment[instrument] | Mass spectrometer used | Q Exactive HF | From the PSI-MS ontology
comment[data file] | Your raw file name | sample_001.raw | Exact filename, including the extension

One row = one sample-to-file relationship. In multiplexed experiments (TMT/iTRAQ), multiple samples share the same file, so you'll have multiple rows pointing to the same raw file. In fractionated experiments, one sample spans multiple files, so you'll have multiple rows for the same sample. See the SDRF documentation for more about file structure.
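To see the one-row-per-relationship rule in action, the sketch below groups hypothetical rows by data file and by sample. A file listed under several samples is multiplexed. A sample listed with several files is fractionated or was injected more than once (technical replicates). The rows reuse names from the examples in this guide and are trimmed to two columns.

```python
from collections import defaultdict

# Hypothetical rows: a TMT file shared by two samples, and one sample split into two fractions.
rows = [
    {"source name": "sample_A", "comment[data file]": "multiplex_1.raw"},
    {"source name": "sample_B", "comment[data file]": "multiplex_1.raw"},
    {"source name": "sample_001", "comment[data file]": "sample_001_F01.raw"},
    {"source name": "sample_001", "comment[data file]": "sample_001_F02.raw"},
]

samples_per_file = defaultdict(set)
files_per_sample = defaultdict(set)
for row in rows:
    samples_per_file[row["comment[data file]"]].add(row["source name"])
    files_per_sample[row["source name"]].add(row["comment[data file]"])

# Files shared by several samples (multiplexing).
shared_files = {f: sorted(s) for f, s in samples_per_file.items() if len(s) > 1}
# Samples spread over several files (fractions or technical replicates).
multi_file_samples = {s: sorted(f) for s, f in files_per_sample.items() if len(f) > 1}

print(shared_files)        # {'multiplex_1.raw': ['sample_A', 'sample_B']}
print(multi_file_samples)  # {'sample_001': ['sample_001_F01.raw', 'sample_001_F02.raw']}
```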

Step 5: Define Your Experimental Variables

Factor values tell analysis tools what you're comparing in your experiment. This is crucial for downstream analysis!

What is a Factor Value?

A factor value is the experimental variable you're studying. If your experiment compares cancer vs. healthy tissue, then disease is your factor. The values would be "hepatocellular carcinoma" and "normal".

Experiment type | Factor value column | Example values
Disease vs. healthy | factor value[disease] | cancer, normal
Drug treatment | factor value[compound] | aspirin, DMSO
Time course | factor value[time] | 0 hour, 6 hour, 24 hour
Tissue comparison | factor value[organism part] | liver, kidney, heart
Multiple variables | One factor value column per variable | e.g., both disease and time

Common confusion: factor values often duplicate information from characteristics columns, and that's correct! The factor value explicitly marks which characteristic is the experimental variable.
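Because a factor value usually mirrors a characteristics column, you can fill it in by copying. The sketch below uses a hypothetical helper, `add_factor_value`. It assumes each row already has the matching `characteristics[...]` column and raises `KeyError` if one does not.

```python
def add_factor_value(rows, name):
    """Mark characteristics[<name>] as the experimental variable.

    Copies each row's characteristics[<name>] value into factor value[<name>].
    Raises KeyError if a row lacks the characteristics column.
    """
    source = f"characteristics[{name}]"
    target = f"factor value[{name}]"
    for row in rows:
        row[target] = row[source]
    return rows


rows = [
    {"source name": "patient_001", "characteristics[disease]": "hepatocellular carcinoma"},
    {"source name": "control_001", "characteristics[disease]": "normal"},
]
add_factor_value(rows, "disease")
for row in rows:
    print(row["source name"], "|", row["factor value[disease]"])
```

For several variables, call it once per variable. Factors without a matching characteristics column have to be filled in directly.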

Complete Example

Here's a minimal valid SDRF file for a human liver cancer study, including all required columns from the human template. For readability, columns are separated by | here; in the actual file they are separated by tabs:

source name | characteristics[organism] | characteristics[organism part] | characteristics[disease] | characteristics[biological replicate] | characteristics[age] | characteristics[sex] | assay name | technology type | comment[proteomics data acquisition method] | comment[label] | comment[instrument] | comment[cleavage agent details] | comment[fraction identifier] | comment[technical replicate] | comment[data file] | factor value[disease]
patient_001 | homo sapiens | liver | hepatocellular carcinoma | 1 | 55Y | male | run_001 | proteomic profiling by mass spectrometry | Data-dependent acquisition | label free sample | Q Exactive HF | NT=Trypsin;AC=MS:1001251 | 1 | 1 | patient_001.raw | hepatocellular carcinoma
patient_002 | homo sapiens | liver | hepatocellular carcinoma | 2 | 62Y | female | run_002 | proteomic profiling by mass spectrometry | Data-dependent acquisition | label free sample | Q Exactive HF | NT=Trypsin;AC=MS:1001251 | 1 | 1 | patient_002.raw | hepatocellular carcinoma
control_001 | homo sapiens | liver | normal | 1 | 48Y | male | run_003 | proteomic profiling by mass spectrometry | Data-dependent acquisition | label free sample | Q Exactive HF | NT=Trypsin;AC=MS:1001251 | 1 | 1 | control_001.raw | normal
control_002 | homo sapiens | liver | normal | 2 | 51Y | female | run_004 | proteomic profiling by mass spectrometry | Data-dependent acquisition | label free sample | Q Exactive HF | NT=Trypsin;AC=MS:1001251 | 1 | 1 | control_002.raw | normal

Required columns in this example:

  • Sample metadata: source name, organism, organism part, disease, biological replicate, age, sex
  • Data file metadata: assay name, technology type, proteomics data acquisition method, label, instrument, cleavage agent details, fraction identifier, technical replicate, data file
  • Factor value: the experimental variable being compared (disease)

What this example tells us:

  • 2 biological replicates per condition (numbered 1-2 within each factor value group)
  • No fractionation (fraction identifier = 1 for all)
  • Single injection per sample (technical replicate = 1)
  • Label-free DDA proteomics with trypsin digestion on Q Exactive HF
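The complete example can also be generated rather than typed. The sketch below rebuilds the same four rows from a compact per-sample list and writes them as tab-separated text with Python's `csv` module. Values shared by every run (organism, instrument, enzyme, and so on) are written only once. It reproduces this particular table and is not a general-purpose SDRF generator.

```python
import csv
import io

COLUMNS = [
    "source name", "characteristics[organism]", "characteristics[organism part]",
    "characteristics[disease]", "characteristics[biological replicate]",
    "characteristics[age]", "characteristics[sex]", "assay name", "technology type",
    "comment[proteomics data acquisition method]", "comment[label]",
    "comment[instrument]", "comment[cleavage agent details]",
    "comment[fraction identifier]", "comment[technical replicate]",
    "comment[data file]", "factor value[disease]",
]

# Per-sample values: (source name, disease, biological replicate, age, sex, assay name)
SAMPLES = [
    ("patient_001", "hepatocellular carcinoma", "1", "55Y", "male", "run_001"),
    ("patient_002", "hepatocellular carcinoma", "2", "62Y", "female", "run_002"),
    ("control_001", "normal", "1", "48Y", "male", "run_003"),
    ("control_002", "normal", "2", "51Y", "female", "run_004"),
]


def build_rows():
    """Yield one SDRF row per sample, filling in the values all runs share."""
    for source, disease, bio_rep, age, sex, assay in SAMPLES:
        yield [
            source, "homo sapiens", "liver", disease, bio_rep, age, sex, assay,
            "proteomic profiling by mass spectrometry",
            "Data-dependent acquisition",
            "label free sample",
            "Q Exactive HF",
            "NT=Trypsin;AC=MS:1001251",
            "1",  # fraction identifier: no fractionation
            "1",  # technical replicate: single injection
            f"{source}.raw",
            disease,  # factor value[disease] mirrors characteristics[disease]
        ]


buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerow(COLUMNS)
writer.writerows(build_rows())
sdrf_text = buffer.getvalue()
print(sdrf_text)
```

To write a file instead, open it with `open("liver_cancer.sdrf.tsv", "w", newline="")` (a hypothetical filename) and pass that handle to `csv.writer`.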

Step 6: Validate Your File

Before submission, validate your SDRF to catch errors early:

Option 1: Command Line (Recommended)

# Install the validator
pip install sdrf-pipelines

# Validate your file
parse_sdrf validate-sdrf --sdrf_file your_file.sdrf.tsv

Best for batch validation and integration into pipelines.

Option 2: With Template Check

# Validate against a specific template
parse_sdrf validate-sdrf --sdrf_file your_file.sdrf.tsv --template human

Checks that all required columns for your template are present.

Validation checks for:
  • Correct column names and formatting
  • Valid ontology terms (organism, disease, etc.)
  • Required columns present
  • No empty cells where values are required
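The validator is the authoritative check. For a quick first pass while editing, a few of these rules (required columns present, no empty cells, consistent row length) are easy to script, as this sketch does. Its `REQUIRED` list is a hypothetical, incomplete subset, and it does not check ontology terms.

```python
import csv
import io

# Hypothetical, incomplete list; use the template's real required columns.
REQUIRED = ["source name", "characteristics[organism]", "assay name", "comment[data file]"]


def precheck(handle, required=REQUIRED):
    """Return a list of problems: missing columns, ragged rows, empty cells."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, None)
    if header is None:
        return ["file is empty"]
    problems = [f"missing column: {c}" for c in required if c not in header]
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            problems.append(f"line {line_no}: expected {len(header)} cells, found {len(row)}")
        for column, value in zip(header, row):
            if not value.strip():
                problems.append(f"line {line_no}: empty cell in '{column}'")
    return problems


sample = (
    "source name\tcharacteristics[organism]\tcomment[data file]\n"
    "sample_001\t\tsample_001.raw\n"
)
for problem in precheck(io.StringIO(sample)):
    print(problem)
```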

Common Scenarios

TMT/iTRAQ Multiplexed Samples

For multiplexed experiments, multiple samples share the same raw file. Each sample gets its own row with a different label:

source name | comment[label] | comment[data file]
sample_A | TMT126 | multiplex_1.raw
sample_B | TMT127N | multiplex_1.raw
sample_C | TMT127C | multiplex_1.raw
See the full TMT documentation for details.
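For larger plexes, generating the rows avoids copy-paste errors. This sketch assumes a TMT10plex (channels TMT126 through TMT131) and assigns channels to samples in list order. Your real sample-to-channel assignment is whatever was used in the lab, so reorder the sample list to match it. `multiplex_rows` is a hypothetical helper.

```python
# TMT10plex reporter channels, written the way this guide writes labels.
TMT10_CHANNELS = [
    "TMT126", "TMT127N", "TMT127C", "TMT128N", "TMT128C",
    "TMT129N", "TMT129C", "TMT130N", "TMT130C", "TMT131",
]


def multiplex_rows(samples, data_file, channels=TMT10_CHANNELS):
    """One row per sample, all pointing at the same raw file, channels in order."""
    if len(samples) > len(channels):
        raise ValueError(f"{len(samples)} samples but only {len(channels)} channels")
    return [
        {"source name": sample, "comment[label]": channel, "comment[data file]": data_file}
        for sample, channel in zip(samples, channels)
    ]


for row in multiplex_rows(["sample_A", "sample_B", "sample_C"], "multiplex_1.raw"):
    print(row["source name"], row["comment[label]"], row["comment[data file]"])
```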

Fractionated Samples

If you fractionated your sample before MS, give each fraction its own row and number it in the fraction identifier column:

source name | comment[fraction identifier] | comment[data file]
sample_001 | 1 | sample_001_F01.raw
sample_001 | 2 | sample_001_F02.raw
sample_001 | 3 | sample_001_F03.raw
See the fractionation documentation for details.
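When a sample has many fractions, the rows can be generated. This sketch uses a hypothetical helper, `fraction_rows`. It assumes raw files follow the `<source name>_F<NN>.raw` pattern from the table above, which is only an illustrative convention; substitute your real file names.

```python
def fraction_rows(source_name, n_fractions):
    """One row per fraction of a single sample, numbered from 1.

    The <source name>_F<NN>.raw pattern mirrors the example table and is a
    hypothetical convention; use your real raw file names.
    """
    return [
        {
            "source name": source_name,
            "comment[fraction identifier]": str(i),
            "comment[data file]": f"{source_name}_F{i:02d}.raw",
        }
        for i in range(1, n_fractions + 1)
    ]


for row in fraction_rows("sample_001", 3):
    print(row["source name"], row["comment[fraction identifier]"], row["comment[data file]"])
```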

Technical Replicates

Same sample run multiple times? Use the same source name with different assay names and data files:

source name | assay name | comment[technical replicate] | comment[data file]
sample_001 | sample_001_rep1 | 1 | sample_001_rep1.raw
sample_001 | sample_001_rep2 | 2 | sample_001_rep2.raw
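A common slip with repeated injections is reusing a replicate number or a file name. This sketch flags both within each sample. It assumes that when a sample is also fractionated, replicate numbers restart within each fraction, so it groups rows by (source name, fraction identifier). Rows without a fraction column are treated as fraction "1". `check_technical_replicates` is a hypothetical helper.

```python
from collections import defaultdict


def check_technical_replicates(rows):
    """Flag repeated replicate numbers or data files within one sample and fraction."""
    groups = defaultdict(list)
    for row in rows:
        # Rows without a fraction identifier are treated as a single fraction ("1").
        key = (row["source name"], row.get("comment[fraction identifier]", "1"))
        groups[key].append(row)
    problems = []
    for (source, fraction), group in groups.items():
        reps = [r["comment[technical replicate]"] for r in group]
        files = [r["comment[data file]"] for r in group]
        if len(set(reps)) != len(reps):
            problems.append(f"{source} (fraction {fraction}): repeated technical replicate number")
        if len(set(files)) != len(files):
            problems.append(f"{source} (fraction {fraction}): repeated data file")
    return problems


rows = [
    {"source name": "sample_001", "assay name": "sample_001_rep1",
     "comment[technical replicate]": "1", "comment[data file]": "sample_001_rep1.raw"},
    # Second injection mistakenly numbered 1 instead of 2.
    {"source name": "sample_001", "assay name": "sample_001_rep2",
     "comment[technical replicate]": "1", "comment[data file]": "sample_001_rep2.raw"},
]
print(check_technical_replicates(rows))
```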

Cell Line Experiments

For cell lines, include the cell line name and Cellosaurus accession:

source name | characteristics[cell line] | characteristics[cellosaurus accession]
hela_001 | HeLa | CVCL_0030
hek_001 | HEK293 | CVCL_0045

Find accessions at Cellosaurus.

See the cell-lines template.
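Typos in accessions are easy to make and hard to spot by eye. This sketch checks only the format and the internal consistency of the accessions. It assumes the "CVCL_ plus four letters or digits" pattern seen in accessions like CVCL_0030. It flags accessions that don't match that pattern, and cell line names paired with more than one accession. It cannot confirm that an accession exists or belongs to the named cell line; look that up at Cellosaurus.

```python
import re

# Assumed format, based on accessions such as CVCL_0030: "CVCL_" plus four letters or digits.
CELLOSAURUS_AC = re.compile(r"^CVCL_[A-Z0-9]{4}$")


def check_cell_lines(rows):
    """Flag malformed accessions and cell lines paired with more than one accession."""
    problems = []
    accession_for = {}
    for row in rows:
        name = row["characteristics[cell line]"]
        accession = row["characteristics[cellosaurus accession]"]
        if not CELLOSAURUS_AC.match(accession):
            problems.append(f"{name}: {accession!r} does not look like a Cellosaurus accession")
        first = accession_for.setdefault(name, accession)
        if first != accession:
            problems.append(f"{name}: paired with both {first} and {accession}")
    return problems


rows = [
    {"characteristics[cell line]": "HeLa", "characteristics[cellosaurus accession]": "CVCL_0030"},
    {"characteristics[cell line]": "HEK293", "characteristics[cellosaurus accession]": "CVCL_0045"},
    # Typo: hyphen instead of underscore.
    {"characteristics[cell line]": "HeLa", "characteristics[cellosaurus accession]": "CVCL-0030"},
]
for problem in check_cell_lines(rows):
    print(problem)
```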

Common Mistakes to Avoid

  • Wrong: Source Name. Correct: source name. Column names must be lowercase.
  • Wrong: characteristics [organism]. Correct: characteristics[organism]. No space before the bracket.
  • Wrong: control (for healthy samples). Correct: normal. Use "normal" for healthy tissue/samples.
  • Wrong: empty cells. Correct: not available. Never leave cells empty; use "not available" or "not applicable".
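Three of these mistakes can be cleaned up mechanically: header casing, a stray space before the bracket, and empty cells. The sketch below uses two hypothetical helpers for them. Replacing "control" with "normal" needs a human decision, so it is left alone. The default placeholder "not available" is an assumption; where a value genuinely does not apply, "not applicable" is the better choice, so review each filled cell.

```python
def normalize_column_name(column):
    """Lowercase the column-type prefix and drop whitespace before '['.

    Text inside the brackets is left untouched.
    """
    if "[" in column:
        prefix, rest = column.split("[", 1)
        return prefix.strip().lower() + "[" + rest
    return column.strip().lower()


def fill_empty_cells(row, placeholder="not available"):
    """Replace blank cells with a placeholder (review each one afterwards)."""
    return [value if value.strip() else placeholder for value in row]


print(normalize_column_name("Source Name"))
print(normalize_column_name("Characteristics [organism]"))
print(fill_empty_cells(["patient_001", "", "liver"]))
```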

Finding the Right Terms

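SDRF uses ontology terms to ensure consistency. The sources referenced in this guide are:

  • Organism: NCBI Taxonomy
  • Organism part (tissue): UBERON
  • Instrument and cleavage agent: PSI-MS ontology
  • Cell lines: Cellosaurus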

Next Steps

  • Browse Examples: see real SDRF files from published ProteomeXchange datasets in the SDRF Explorer.
  • All SDRF Terms: the SDRF Terms Reference lists all columns, their requirements, and valid values.
  • Full Specification: detailed documentation for advanced use cases and edge cases.
  • Get Help: questions? Open an issue on GitHub to reach the bigbio team.