Create Your SDRF File
What is SDRF?
SDRF (Sample and Data Relationship Format) is a simple tab-separated text file that describes your proteomics experiment. You can open and edit it like a spreadsheet in Excel. It connects your biological samples to your mass spectrometry data files.
The Core Concept
Think of SDRF as a table where:
- Each row = one sample-to-file relationship
- Each column = one piece of information about that sample or file
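As a minimal sketch of this row-and-column view, the Python snippet below parses a tiny tab-separated SDRF fragment using only the standard library. The sample and file names are illustrative, not taken from a real dataset.

```python
import csv
import io

# A tiny SDRF fragment: one header line, then one row per
# sample-to-file relationship. Values are separated by tabs.
sdrf_text = (
    "source name\tcharacteristics[organism]\tcomment[data file]\n"
    "patient_001\thomo sapiens\tpatient_001.raw\n"
    "control_001\thomo sapiens\tcontrol_001.raw\n"
)

reader = csv.DictReader(io.StringIO(sdrf_text), delimiter="\t")
rows = list(reader)

for row in rows:
    # Each row links one sample (source name) to one data file.
    print(row["source name"], "->", row["comment[data file]"])
```

Note that `csv.DictReader` keeps only the last value when a header name is repeated, so a file with repeated column names is safer to read with `csv.reader`.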
Learn from Real Examples
The best way to learn is by example. Browse real SDRF files from published proteomics datasets and use them as templates for your own experiments.
Step 1: Choose Your Template
Templates are pre-made SDRF files with the right columns already set up for your experiment type. Pick one based on your organism and experiment:
Core Templates (by organism)
| Your Samples | Template | Includes |
|---|---|---|
| Human samples | human | age, sex, ancestry, ethnicity |
| Mouse, rat, zebrafish | vertebrates | developmental stage, strain |
| Insects, worms | invertebrates | developmental stage |
| Plants | plants | cultivar, growth conditions |
| Other / Not sure | ms-proteomics | minimal required columns |
Specialized Templates (by experiment type)
| Experiment Type | Template | Adds |
|---|---|---|
| Cell line studies | cell-lines | cell line name, Cellosaurus ID |
| DIA acquisition | dia-acquisition | DIA isolation window |
| Immunopeptidomics | immunopeptidomics | HLA alleles, MHC class |
| Cross-linking MS | crosslinking | crosslinker, cleavability |
| Single-cell proteomics | single-cell | single cell identifier |
Step 2: Understand Column Types
SDRF columns follow a naming pattern that tells you what kind of information they contain:
| Column pattern | Category | Describes | Examples |
|---|---|---|---|
| characteristics[...] | Sample Metadata | The biological sample | characteristics[organism], characteristics[disease], characteristics[organism part] |
| comment[...] | Data File Metadata | The data file or MS run | comment[data file], comment[instrument], comment[label] |
| factor value[...] | What You're Studying | The experimental variable you're comparing | factor value[disease], factor value[compound], factor value[time] |
- characteristics[organism]: Correct
- Characteristics[organism]: Wrong (capital C)
- characteristics [organism]: Wrong (space before bracket)
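The naming pattern can be checked mechanically. The sketch below is one possible check, assuming the pattern is exactly what the examples above show: a lowercase prefix immediately followed by a bracketed term. The function name `classify_column` and the `PLAIN_COLUMNS` set are hypothetical helpers for illustration, not part of any SDRF tool.

```python
import re

# Pattern assumed from the examples above: a lowercase prefix immediately
# followed by a bracketed term, with no space before the bracket.
BRACKETED = re.compile(r"^(characteristics|comment|factor value)\[([^\[\]]+)\]$")

# Columns shown in this guide that carry no bracketed term.
PLAIN_COLUMNS = {"source name", "assay name", "technology type"}


def classify_column(name: str) -> str:
    """Return the column category, or 'invalid' if the name breaks the pattern."""
    if name in PLAIN_COLUMNS:
        return "plain"
    match = BRACKETED.match(name)
    return match.group(1) if match else "invalid"


for header in [
    "characteristics[organism]",
    "Characteristics[organism]",
    "characteristics [organism]",
    "factor value[disease]",
]:
    print(f"{header!r}: {classify_column(header)}")
```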
Step 3: Fill in Sample Information
Open your template in Excel, Google Sheets, or any spreadsheet software. For each sample, fill in the biological information:
| Column | What to Write | Example | Notes |
|---|---|---|---|
| source name | A unique identifier for your biological sample | patient_001 | Unique to the sample; repeats for fractions/replicates |
| characteristics[organism] | Species name (lowercase) | homo sapiens | Use scientific name from NCBI Taxonomy |
| characteristics[organism part] | Tissue or body part | liver | Use terms from UBERON |
| characteristics[disease] | Disease name, or "normal" | hepatocellular carcinoma | Use "normal" for healthy samples |
Step 4: Add Data File Information
For each row, also specify information about the raw file and how it was acquired:
| Column | What to Write | Example | Notes |
|---|---|---|---|
| assay name | A name for this MS run | run_001 | Often same as source name |
| comment[label] | Type of labeling | label free sample | Or TMT126, TMT127N, etc. |
| comment[instrument] | Mass spectrometer used | Q Exactive HF | From PSI-MS ontology |
| comment[data file] | Your raw file name | sample_001.raw | Exact filename including extension |
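If you prefer a script to a spreadsheet, the columns from Steps 3 and 4 can be written with Python's standard `csv` module. This is only a sketch: it uses the example values from the two tables and covers just these eight columns, so the result is not a complete SDRF file.

```python
import csv
import io

# Column order follows the tables in Steps 3 and 4.
columns = [
    "source name",
    "characteristics[organism]",
    "characteristics[organism part]",
    "characteristics[disease]",
    "assay name",
    "comment[label]",
    "comment[instrument]",
    "comment[data file]",
]

# One sample-to-file row, filled with the example values from the tables.
row = {
    "source name": "patient_001",
    "characteristics[organism]": "homo sapiens",
    "characteristics[organism part]": "liver",
    "characteristics[disease]": "hepatocellular carcinoma",
    "assay name": "run_001",
    "comment[label]": "label free sample",
    "comment[instrument]": "Q Exactive HF",
    "comment[data file]": "sample_001.raw",
}

# Write to an in-memory buffer; swap in open("my.sdrf.tsv", "w", newline="")
# to write a real file (the file name is a placeholder).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns, delimiter="\t", lineterminator="\n")
writer.writeheader()
writer.writerow(row)

sdrf_text = buffer.getvalue()
print(sdrf_text)
```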
Step 5: Define Your Experimental Variables
Factor values tell analysis tools what you're comparing in your experiment. This is crucial for downstream analysis!
What is a Factor Value?
A factor value is the experimental variable you're studying. If your experiment compares cancer vs. healthy tissue, then disease is your factor. The values would be "hepatocellular carcinoma" and "normal".
| Experiment Type | Factor Value Column | Example Values |
|---|---|---|
| Disease vs. healthy | factor value[disease] | cancer, normal |
| Drug treatment | factor value[compound] | aspirin, DMSO |
| Time course | factor value[time] | 0 hour, 6 hour, 24 hour |
| Tissue comparison | factor value[organism part] | liver, kidney, heart |
| Multiple variables | Multiple factor columns | Both disease AND time |
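To illustrate why factor values matter, here is a sketch of how a downstream script might split samples into comparison groups. The helper `group_by_factor` and the sample names are hypothetical; real analysis tools may handle this differently.

```python
from collections import defaultdict

# Illustrative rows, reduced to the two columns needed for grouping.
rows = [
    {"source name": "patient_001", "factor value[disease]": "hepatocellular carcinoma"},
    {"source name": "patient_002", "factor value[disease]": "hepatocellular carcinoma"},
    {"source name": "control_001", "factor value[disease]": "normal"},
    {"source name": "control_002", "factor value[disease]": "normal"},
]


def group_by_factor(rows, factor_column):
    """Collect source names under each distinct value of the factor column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[factor_column]].append(row["source name"])
    return dict(groups)


groups = group_by_factor(rows, "factor value[disease]")
for value, samples in groups.items():
    print(value, "->", samples)
```

With several factor columns, the same idea applies by grouping on a tuple of values, one per factor.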
Complete Example
Here's a minimal valid SDRF file for a human liver cancer study, including all required columns from the human template:
| source name | characteristics[organism] | characteristics[organism part] | characteristics[disease] | characteristics[biological replicate] | characteristics[age] | characteristics[sex] | assay name | technology type | comment[proteomics data acquisition method] | comment[label] | comment[instrument] | comment[cleavage agent details] | comment[fraction identifier] | comment[technical replicate] | comment[data file] | factor value[disease] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| patient_001 | homo sapiens | liver | hepatocellular carcinoma | 1 | 55Y | male | run_001 | proteomic profiling by mass spectrometry | Data-dependent acquisition | label free sample | Q Exactive HF | NT=Trypsin;AC=MS:1001251 | 1 | 1 | patient_001.raw | hepatocellular carcinoma |
| patient_002 | homo sapiens | liver | hepatocellular carcinoma | 2 | 62Y | female | run_002 | proteomic profiling by mass spectrometry | Data-dependent acquisition | label free sample | Q Exactive HF | NT=Trypsin;AC=MS:1001251 | 1 | 1 | patient_002.raw | hepatocellular carcinoma |
| control_001 | homo sapiens | liver | normal | 1 | 48Y | male | run_003 | proteomic profiling by mass spectrometry | Data-dependent acquisition | label free sample | Q Exactive HF | NT=Trypsin;AC=MS:1001251 | 1 | 1 | control_001.raw | normal |
| control_002 | homo sapiens | liver | normal | 2 | 51Y | female | run_004 | proteomic profiling by mass spectrometry | Data-dependent acquisition | label free sample | Q Exactive HF | NT=Trypsin;AC=MS:1001251 | 1 | 1 | control_002.raw | normal |
Required columns in this example:
- Sample metadata: source name, organism, organism part, disease, biological replicate, age, sex
- Data file metadata: assay name, technology type, proteomics data acquisition method, label, instrument, cleavage agent, fraction identifier, technical replicate, data file
- Factor value: the experimental variable being compared (disease)
What this example tells us:
- 2 biological replicates per condition (numbered 1-2 within each factor value group)
- No fractionation (fraction identifier = 1 for all)
- Single injection per sample (technical replicate = 1)
- Label-free DDA proteomics with trypsin digestion on Q Exactive HF
Step 6: Validate Your File
Before submission, validate your SDRF to catch errors early. Validation checks for:
- Correct column names and formatting
- Valid ontology terms (organism, disease, etc.)
- Required columns present
- No empty cells where values are required
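For a rough idea of what such checks look like, the sketch below tests three of the points above on tab-separated text. It is a hand-written illustration, not the official validator: `check_sdrf` and the `REQUIRED` list are hypothetical, the required set really depends on your template, and ontology terms are not checked at all.

```python
import csv
import io

# Hypothetical minimal requirement list for illustration; the real required
# set depends on the template you chose.
REQUIRED = ["source name", "characteristics[organism]", "assay name", "comment[data file]"]


def check_sdrf(text, required=REQUIRED):
    """Return a list of human-readable problems found in tab-separated SDRF text."""
    table = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not table:
        return ["file is empty"]
    header, data = table[0], table[1:]
    problems = []

    # Required columns present
    for name in required:
        if name not in header:
            problems.append(f"missing required column: {name}")

    # Column name formatting: the part before '[' must be lowercase
    # and must not end with a space.
    for name in header:
        prefix = name.split("[", 1)[0]
        if prefix != prefix.strip().lower():
            problems.append(f"badly formatted column name: {name!r}")

    # No empty cells (line numbers count the header as line 1)
    for line_no, cells in enumerate(data, start=2):
        for name, value in zip(header, cells):
            if not value.strip():
                problems.append(f"line {line_no}: empty cell in {name!r}")
    return problems


bad = (
    "Source Name\tcharacteristics[organism]\tassay name\tcomment[data file]\n"
    "sample_001\t\trun_001\tsample_001.raw\n"
)
problems = check_sdrf(bad)
for problem in problems:
    print(problem)
```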
Common Scenarios
TMT/iTRAQ Multiplexed Samples
For multiplexed experiments, multiple samples share the same raw file. Each sample gets its own row with a different label:
| source name | comment[label] | comment[data file] |
|---|---|---|
| sample_A | TMT126 | multiplex_1.raw |
| sample_B | TMT127N | multiplex_1.raw |
| sample_C | TMT127C | multiplex_1.raw |
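One way to sanity-check a multiplexed table is to collect the labels used in each raw file and make sure none repeats. The sketch below assumes, based on the description above, that a label should appear at most once per data file; `labels_per_file` is a hypothetical helper.

```python
from collections import defaultdict

# The three rows from the table above.
rows = [
    {"source name": "sample_A", "comment[label]": "TMT126", "comment[data file]": "multiplex_1.raw"},
    {"source name": "sample_B", "comment[label]": "TMT127N", "comment[data file]": "multiplex_1.raw"},
    {"source name": "sample_C", "comment[label]": "TMT127C", "comment[data file]": "multiplex_1.raw"},
]


def labels_per_file(rows):
    """Map each data file to its labels, rejecting a label used twice in one file."""
    seen = defaultdict(list)
    for row in rows:
        data_file = row["comment[data file]"]
        label = row["comment[label]"]
        if label in seen[data_file]:
            raise ValueError(f"{label} appears twice in {data_file}")
        seen[data_file].append(label)
    return dict(seen)


print(labels_per_file(rows))
```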
Fractionated Samples
If you fractionated your sample before MS, add a fraction identifier column:
| source name | comment[fraction identifier] | comment[data file] |
|---|---|---|
| sample_001 | 1 | sample_001_F01.raw |
| sample_001 | 2 | sample_001_F02.raw |
| sample_001 | 3 | sample_001_F03.raw |
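Fraction rows are repetitive, so they are easy to generate. The sketch below builds one row per fraction; the `_F01` file-name scheme simply mirrors the table above and is an assumption about how your files are named, and `fraction_rows` is a hypothetical helper.

```python
def fraction_rows(source_name, n_fractions):
    """Build one row per fraction, all sharing the same source name."""
    return [
        {
            "source name": source_name,
            "comment[fraction identifier]": str(i),
            # Zero-padded fraction number, as in sample_001_F01.raw
            "comment[data file]": f"{source_name}_F{i:02d}.raw",
        }
        for i in range(1, n_fractions + 1)
    ]


rows = fraction_rows("sample_001", 3)
for row in rows:
    print(row["comment[fraction identifier]"], row["comment[data file]"])
```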
Technical Replicates
Same sample run multiple times? Use the same source name with different assay names and data files:
| source name | assay name | comment[technical replicate] | comment[data file] |
|---|---|---|---|
| sample_001 | sample_001_rep1 | 1 | sample_001_rep1.raw |
| sample_001 | sample_001_rep2 | 2 | sample_001_rep2.raw |
Cell Line Experiments
For cell lines, include the cell line name and Cellosaurus accession:
| source name | characteristics[cell line] | characteristics[cellosaurus accession] |
|---|---|---|
| hela_001 | HeLa | CVCL_0030 |
| hek_001 | HEK293 | CVCL_0045 |
Find accessions at Cellosaurus.
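A quick format check can catch typos in accessions before validation. The pattern below is an assumption inferred from the two examples above (the prefix `CVCL_` followed by four uppercase letters or digits). It checks the shape only, not whether the accession exists or matches the cell line name.

```python
import re

# Assumed accession shape, inferred from CVCL_0030 and CVCL_0045:
# "CVCL_" followed by four uppercase letters or digits.
CVCL_PATTERN = re.compile(r"^CVCL_[A-Z0-9]{4}$")


def looks_like_cellosaurus(accession):
    """Return True if the text has the assumed Cellosaurus accession shape."""
    return bool(CVCL_PATTERN.match(accession))


for accession in ["CVCL_0030", "CVCL_0045", "cvcl_0030", "CVCL-0030"]:
    print(accession, looks_like_cellosaurus(accession))
```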
Cell lines template →
Common Mistakes to Avoid
| Wrong | Correct | Why |
|---|---|---|
| Source Name | source name | Column names must be lowercase |
| characteristics [organism] | characteristics[organism] | No space before the bracket |
| control for healthy | normal for healthy | Use "normal" for healthy tissue/samples |
| (empty cell) | not available | Never leave cells empty; use "not available" or "not applicable" |
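Some of the mistakes above can be repaired automatically. The sketch below shows two hypothetical helpers, `fix_header` and `fix_cell`, that lowercase the part of a column name before the bracket, remove a space before the bracket, and fill empty cells. Choosing between "not available" and "not applicable", or replacing "control" with "normal", still needs your judgment.

```python
def fix_header(name):
    """Lowercase the part before the bracket and drop any space before '['."""
    prefix, bracket, rest = name.strip().partition("[")
    return prefix.strip().lower() + bracket + rest


def fix_cell(value, placeholder="not available"):
    """Replace an empty or whitespace-only cell with a placeholder."""
    return value.strip() or placeholder


for header in ["Source Name", "characteristics [organism]", "Characteristics[organism]"]:
    print(repr(header), "->", repr(fix_header(header)))

print(repr(fix_cell("")), repr(fix_cell("liver")))
```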
Finding the Right Terms
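SDRF uses ontology terms to ensure consistency. The sources referenced in this guide are:
- Organism: NCBI Taxonomy
- Organism part: UBERON
- Instrument: PSI-MS ontology
- Cell line: Cellosaurus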
Next Steps
All SDRF Terms
Complete reference of all columns, their requirements, and valid values