SDRF-Proteomics

1. Status

Status: Released

Version: 1.1.0 - 2026-01

2. Abstract

The human template provides standardized metadata fields for human proteomics experiments. It extends the base template with clinical and demographic metadata required for human samples.

This template is appropriate for:

  • Clinical proteomics studies

  • Healthy human donor samples

  • Cancer and disease biomarker studies

  • Drug treatment and pharmacoproteomics studies

3. Template Hierarchy

base (construction artifact)
  └── human (SAMPLE layer - must combine with TECHNOLOGY template)

The human template extends the base template, adding disease annotation and human-specific clinical metadata fields.

Important

The human template cannot be used alone. You must combine it with a technology template:

  • ms-proteomics + human - For MS-based proteomics of human samples

  • affinity-proteomics + human - For affinity proteomics (Olink, SomaScan) of human samples

4. Checklist

4.1. Inherited Columns

All columns from the base template are inherited (source name, organism, organism part, biological replicate, assay name, technology type, instrument, technical replicate, data file). Technology-specific columns (cleavage agent, label, fraction identifier, etc.) come from the technology template (ms-proteomics or affinity-proteomics).

4.2. Human-Specific Columns

Column | Requirement | Description | Section
characteristics[disease] | REQUIRED | Disease state of the sample | Sample Metadata
characteristics[age] | REQUIRED | Age of the donor at sample collection | Sample Metadata
characteristics[sex] | REQUIRED | Biological sex of the donor | Sample Metadata
characteristics[individual] | RECOMMENDED | Unique identifier for the donor | Individual Identifier
characteristics[ancestry category] | RECOMMENDED | Ancestry or ethnic background | Ancestry Category
characteristics[developmental stage] | OPTIONAL | Developmental stage (adult, fetal, etc.) | Sample Metadata

4.3. Clinical Study Columns (Optional)

Column | Requirement | Description | Section
characteristics[pre-existing condition] | OPTIONAL | Pre-existing medical conditions | Pre-existing Condition
characteristics[disease staging] | OPTIONAL | Disease stage (I, II, III, IV) | Disease Staging
characteristics[clinical data] | OPTIONAL | Clinical measurements (e.g., glucose level) | Clinical Data
characteristics[tumor stage] | OPTIONAL | TNM staging for cancer | Tumor Stage
characteristics[tumor grade] | OPTIONAL | Tumor differentiation grade | Tumor Grading
characteristics[treatment] | OPTIONAL | Treatment administered | Treatment
characteristics[compound] | OPTIONAL | Drug or compound used | Compound

5. Validation

Validate SDRF files using sdrf-pipelines:

pip install sdrf-pipelines
parse_sdrf validate-sdrf --sdrf_file your_file.sdrf.tsv --template human
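
Before running the validator, a quick header check can catch missing required columns early. The following is a minimal sketch, not part of sdrf-pipelines: the helper name missing_human_columns is hypothetical, and the required set is copied from the Human-Specific Columns table in section 4.2.

```python
import csv
import io

# Required human-specific columns, copied from the checklist in section 4.2.
REQUIRED_HUMAN_COLUMNS = (
    "characteristics[disease]",
    "characteristics[age]",
    "characteristics[sex]",
)


def missing_human_columns(sdrf_text: str) -> list[str]:
    """Return the required human columns absent from the SDRF header.

    Hypothetical helper for illustration; it only inspects the first line.
    """
    reader = csv.reader(io.StringIO(sdrf_text), delimiter="\t")
    header = next(reader, [])
    present = {name.strip().lower() for name in header}
    return [col for col in REQUIRED_HUMAN_COLUMNS if col not in present]


# A header that lacks characteristics[age].
example = (
    "source name\tcharacteristics[organism]\t"
    "characteristics[disease]\tcharacteristics[sex]\n"
)
print(missing_human_columns(example))
```

This check covers only the three REQUIRED columns; the full validation rules remain those applied by the validator itself.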

6. Example

source name | characteristics[organism] | characteristics[disease] | characteristics[organism part] | characteristics[age] | characteristics[sex] | characteristics[individual] | characteristics[biological replicate] | assay name | comment[data file] | factor value[disease]
patient_001 | homo sapiens | hepatocellular carcinoma | liver | 55Y | male | P001 | 1 | run_001 | patient_001.raw | hepatocellular carcinoma
patient_002 | homo sapiens | hepatocellular carcinoma | liver | 62Y | female | P002 | 1 | run_002 | patient_002.raw | hepatocellular carcinoma
control_001 | homo sapiens | normal | liver | 48Y | male | C001 | 1 | run_003 | control_001.raw | normal

7. Detailed Column Guidelines

The following sections provide detailed guidelines for each human-specific metadata column, organized by category.

8. General Human Metadata

This section describes basic metadata columns applicable to all human samples, including healthy donors and patients.

8.1. Individual Identifier

Format: Alphanumeric pseudonymized identifier (e.g., patient_001, P023455, donor_1).

Important
Never use actual patient names. Use pseudonymized identifiers only, and comply with applicable data protection regulations such as GDPR and HIPAA.

Enables tracking of:

  • Biological replicates from the same individual

  • Matched samples (tumor/normal pairs from same patient)

  • Longitudinal samples (same patient over time)

  • Family/pedigree studies

Examples:

characteristics[individual]: patient_001
characteristics[individual]: individual 234
characteristics[individual]: donor_1

When multiple samples come from the same patient, use the same identifier for all of them:

source name | characteristics[individual] | characteristics[organism part]
sample_001 | patient_001 | tumor
sample_002 | patient_001 | adjacent normal
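
Consistent identifiers make matched samples easy to recover programmatically. The sketch below groups samples by characteristics[individual]; the function name group_by_individual is hypothetical, and the third row (sample_003, patient_002) is an invented example added only to show an unmatched donor.

```python
from collections import defaultdict

# Rows as (source name, individual, organism part).
# The first two rows mirror the example above; the third is hypothetical.
rows = [
    ("sample_001", "patient_001", "tumor"),
    ("sample_002", "patient_001", "adjacent normal"),
    ("sample_003", "patient_002", "tumor"),
]


def group_by_individual(records):
    """Map each individual identifier to the source names from that donor."""
    groups = defaultdict(list)
    for source_name, individual, _organism_part in records:
        groups[individual].append(source_name)
    return dict(groups)


groups = group_by_individual(rows)
# Donors with more than one sample are candidates for matched analysis.
matched = {ind: names for ind, names in groups.items() if len(names) > 1}
print(matched)
```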

8.2. Ancestry Category

Value | HANCESTRO ID | Description
African | HANCESTRO:0010 | African ancestry
European | HANCESTRO:0005 | European ancestry
East Asian | HANCESTRO:0009 | East Asian ancestry
South Asian | HANCESTRO:0006 | South Asian ancestry
American | HANCESTRO:0013 | American ancestry (Indigenous peoples of the Americas)

Note
Ancestry category values retain proper noun capitalization (e.g., African, European) since they refer to geographic populations. This is an exception to the general lowercase rule for controlled vocabulary terms. Use the short form (e.g., African) rather than the full label (e.g., African ancestry).

Use not available when ancestry is unknown, and not applicable for non-human samples or cell lines where ancestry doesn’t apply.
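
The capitalization and short-form rules above can be applied mechanically. The following is a minimal sketch under the assumption that only the five values listed in the table are in use; the names ANCESTRY_TERMS and normalize_ancestry are hypothetical.

```python
# Short-form values and HANCESTRO IDs as listed in the table above.
ANCESTRY_TERMS = {
    "African": "HANCESTRO:0010",
    "European": "HANCESTRO:0005",
    "East Asian": "HANCESTRO:0009",
    "South Asian": "HANCESTRO:0006",
    "American": "HANCESTRO:0013",
}
RESERVED_VALUES = {"not available", "not applicable"}


def normalize_ancestry(value: str) -> str:
    """Return the short, capitalized form of an ancestry value."""
    text = " ".join(value.split())  # collapse stray whitespace
    lowered = text.lower()
    if lowered in RESERVED_VALUES:
        return lowered
    # Accept the full label ("African ancestry") but emit the short form.
    if lowered.endswith(" ancestry"):
        lowered = lowered[: -len(" ancestry")]
    for term in ANCESTRY_TERMS:
        if term.lower() == lowered:
            return term
    raise ValueError(f"unrecognized ancestry category: {value!r}")


print(normalize_ancestry("east asian ancestry"))
print(ANCESTRY_TERMS[normalize_ancestry("african")])
```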

8.3. Body Measurements

Column | Ontology | Format | Example
characteristics[body mass index] | EFO:0004340 | Numeric (kg/m²) | 25.5
characteristics[weight] | EFO:0004324 | {value} kg | 70 kg
characteristics[height] | EFO:0004339 | {value} cm | 175 cm
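
The three columns are related: body mass index is weight in kilograms divided by the square of height in metres. The sketch below derives it from values in the {value} kg and {value} cm formats; the function names are hypothetical, and rounding to one decimal is an assumption rather than a requirement of the template.

```python
def parse_measurement(value: str, expected_unit: str) -> float:
    """Parse a '{value} {unit}' string and check the unit."""
    number, unit = value.split()
    if unit != expected_unit:
        raise ValueError(f"expected unit {expected_unit!r}, got {unit!r}")
    return float(number)


def body_mass_index(weight: str, height: str) -> float:
    """Compute BMI (kg/m²) from SDRF-style weight and height values."""
    kilograms = parse_measurement(weight, "kg")
    metres = parse_measurement(height, "cm") / 100
    return round(kilograms / metres**2, 1)


# Uses the weight and height examples from the table above.
print(body_mass_index("70 kg", "175 cm"))  # 22.9
```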

8.4. Sampling Time Point

For longitudinal studies, this column indicates when the sample was collected relative to a reference point.

Values: baseline, week 1, day 7, post-treatment, follow-up.

9. Clinical-Specific Terms

This section describes columns for capturing clinical metadata in human studies, applicable to patient samples regardless of disease type.

9.1. Pre-existing Condition

Comorbidities or pre-existing medical conditions. For multiple conditions, use multiple columns (preferred) or semicolon-separated values.

Condition | SDRF Value | MONDO ID
High blood pressure | hypertension | MONDO:0005044
Diabetes | type 2 diabetes mellitus | MONDO:0005148
Heart disease | coronary artery disease | MONDO:0005010
Obesity | obesity | MONDO:0011122

9.2. Disease Staging

Disease staging captures overall disease progression, including cancer staging, chronic disease phases, and severity classifications.

Value | NCIT ID | Description | Use Case
stage I | NCIT:C27966 | Early localized disease | Solid tumors
stage II | ? | Locally advanced disease | Solid tumors (use disease-specific staging)
stage III | NCIT:C27970 | Regional spread | Solid tumors
stage IV | NCIT:C27971 | Distant metastasis | Solid tumors
chronic phase | ? | Stable chronic disease | Leukemia, chronic conditions (use disease-specific terms)
end stage | ? | Terminal disease phase | End-stage organ failure (use disease-specific terms)

Note
Some staging terms like "stage II", "chronic phase", and "end stage" do not have generic NCIT codes. For these, use disease-specific staging terms when available (e.g., NCIT:C3175 for Chronic Phase CML).

Examples:

characteristics[disease staging]: chronic phase
characteristics[disease staging]: end stage
characteristics[disease staging]: stage I
characteristics[disease staging]: mild inflammation

Use not available when staging is unknown, and not applicable for non-disease samples.

9.3. Clinical Data

Free-text or structured clinical details about samples that don’t fit into other structured fields. Use this column for receptor status, treatment history, surgical details, and other clinical annotations.

Note
Use the term clinical data (EFO:0030083) rather than clinical information to align with EFO ontology.

Examples:

characteristics[clinical data]: breast reduction mammoplasty tissue explant
characteristics[clinical data]: lesional
characteristics[clinical data]: estrogen-receptor-positive, progesterone-receptor-positive, epidermal growth-factor-2-negative
characteristics[clinical data]: no cytogenetic response to imatinib

Best Practice: Consider using structured columns (e.g., separate columns for ER/PR/HER2 status) when possible. Use clinical data for additional context that doesn’t fit elsewhere.

9.4. Clinical History

Captures relevant medical history information for the patient.

10. Cancer-Specific Terms

This section describes columns specific to cancer proteomics studies. These terms are distinct from general clinical staging and provide detailed tumor characterization.

10.1. Tumor Stage (TNM Staging)

TNM staging describes the extent of cancer spread. Format: T{0-4}N{0-3}M{0-1} where T = tumor size/extent, N = lymph node involvement, M = distant metastasis. Prefix p indicates pathological staging (post-surgery), c indicates clinical staging.

Value | Description | Staging Type
T1N0M0 | Small tumor, no lymph nodes, no metastasis | Clinical or pathological
pT2pN1M0 | Pathological staging after surgery | Pathological
Type IV | Stage IV disease | Roman numeral notation
IIB | Stage IIB | Roman numeral with substage

Examples:

characteristics[tumor stage]: Type IV
characteristics[tumor stage]: IIB
characteristics[tumor stage]: T3N0M0
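
The TNM format described above can be checked with a regular expression. The following is a minimal sketch that covers only the T{0-4}N{0-3}M{0-1} pattern with optional p or c prefixes as stated in this section; it does not handle other TNM notations (for example subcategories), and the name parse_tnm is hypothetical.

```python
import re

# Optional p/c prefix before each component, then the digit ranges
# given in this section: T0-4, N0-3, M0-1.
TNM_PATTERN = re.compile(r"^([pc]?)T([0-4])[pc]?N([0-3])[pc]?M([01])$")

STAGING_TYPES = {"p": "pathological", "c": "clinical", "": "unspecified"}


def parse_tnm(value: str):
    """Split a TNM string into its components, or return None if it is
    not in TNM notation (e.g. a Roman numeral stage such as 'IIB')."""
    match = TNM_PATTERN.match(value.strip())
    if match is None:
        return None
    prefix, t, n, m = match.groups()
    return {
        "staging": STAGING_TYPES[prefix],
        "T": int(t),
        "N": int(n),
        "M": int(m),
    }


print(parse_tnm("pT2pN1M0"))
print(parse_tnm("T3N0M0"))
print(parse_tnm("IIB"))
```

Values that return None are not necessarily invalid; Roman numeral stages are also shown as accepted values in the table above.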

10.2. Tumor Grading

Histological tumor grade describes how abnormal cancer cells look under a microscope. Grade correlates with tumor aggressiveness.

Value | NCIT ID | Description
grade 1 | NCIT:C28077 | Well differentiated (low grade)
grade 2 | NCIT:C28078 | Moderately differentiated (intermediate grade)
grade 3 | NCIT:C28079 | Poorly differentiated (high grade)
grade 4 | NCIT:C28082 | Undifferentiated (high grade)

Examples:

characteristics[tumor grading]: grade 2
characteristics[tumor grading]: grade 3

10.3. Metastasis Site

Location where cancer has spread from the primary site.

Value | Description
lymph node | Metastasis to lymph nodes
liver | Hepatic metastasis
lung | Pulmonary metastasis
bone | Bone metastasis
brain | Brain metastasis
pleura | Pleural metastasis

10.4. Tumor Size and Mass

Format: {value} {unit} (e.g., 2.5 cm, 15 mm, 50 g)

10.5. Biopsy Site

Specific anatomical location where biopsy was performed.

Examples: left leg, right breast, Brodmann area 46, liver segment 7

10.6. Survival Time

Format: {value} {unit} (e.g., 24 months, 3 years)

Used for survival analysis studies to track patient outcomes.

10.7. Last Follow Up

Format: {value} {unit} (e.g., 12 months, 2 years)

Time of last clinical follow-up for longitudinal studies.

10.8. Mitotic Rate

Number of mitoses per high-power field. Used for histological grading in some cancer types.

10.9. Cancer Subtype Classifications

Breast cancer subtype (characteristics[breast cancer subtype]):

Value | NCIT ID | Description
luminal A | NCIT:C53554 | ER+/PR+, HER2-, low Ki67
luminal B | NCIT:C53555 | ER+/PR+, HER2+/-, high Ki67
HER2-enriched | NCIT:C53556 | ER-/PR-, HER2+
triple-negative | NCIT:C71732 | ER-/PR-/HER2-

Other cancer-specific staging systems:

  • characteristics[Dukes stage] - Colorectal cancer staging (A, B, C, D)

  • characteristics[Ann Arbor stage] - Lymphoma staging (I, II, III, IV)

  • characteristics[Gleason score] - Prostate cancer grading

  • characteristics[Weiss grade] - Adrenal cortical carcinoma (low, high)

11. Sampling Site

Specifies the exact sampling location within an organ or tissue, providing more detail than characteristics[organism part]. This column is critical for cancer proteomics studies to distinguish tumor tissue from normal tissue.

11.1. Relationship with characteristics[organism part]

These two columns work together to provide hierarchical anatomical information:

  • characteristics[organism part] = General anatomical structure (e.g., "breast", "liver", "heart")

  • characteristics[sampling site] = Specific location or context within that structure (e.g., "tumor", "normal tissue adjacent to tumor", "left ventricle")

11.2. When to Use Sampling Site

Use characteristics[sampling site] when you need to specify:

  • Tumor vs. normal distinction - Critical for matched tumor/normal studies

  • Specific anatomical sub-regions - e.g., "left ventricle" within "heart"

  • Disease-specific locations - e.g., "adenoma" within "colorectum"

  • Developmental zones - e.g., "root differentiation zone" (for plant studies)

11.3. Usage Patterns

Pattern 1: Cancer Studies (Tumor/Normal/Adjacent)

characteristics[organism part] | characteristics[sampling site] | Use Case
breast | tumor | Primary tumor tissue
breast | normal tissue adjacent to tumor | Adjacent normal tissue (may have field effects)
breast | tumor-distant section | Normal tissue from same organ, distant from tumor
colorectum | tumor | Colorectal tumor
colorectum | normal tissue | Normal colorectal tissue
liver | normal tissue | Unaffected liver tissue

Examples:

characteristics[organism part]: breast
characteristics[sampling site]: tumor-distant section

characteristics[organism part]: colorectum
characteristics[sampling site]: normal tissue adjacent to tumor

Pattern 2: Anatomical Sub-Regions

characteristics[organism part] | characteristics[sampling site] | Use Case
heart | left ventricle | Specific heart chamber
brain | frontal cortex | Specific brain region
cerebellum | posterior inferior cerebellum | Detailed cerebellar location
colonic mucosa | sigmoid colon | Specific colon region

11.4. Annotation Preferences

There are two valid approaches for anatomical annotation:

Option 1: Most specific level in organism part (preferred for simple cases)

When no additional context is needed, use the most specific anatomical term directly:

characteristics[organism part]: left ventricle
characteristics[organism part]: cerebral cortex
characteristics[organism part]: sigmoid colon

Option 2: General + sampling site (preferred for complex cases)

When you need additional context (e.g., tumor vs. normal), use both columns:

characteristics[organism part]: heart
characteristics[sampling site]: left ventricle

characteristics[organism part]: breast
characteristics[sampling site]: tumor

Guidelines:

  1. Prefer most specific level in organism part when simple anatomical annotation is sufficient

  2. Use sampling site when you need to capture tumor/normal distinction, disease context, or other qualitative information

  3. Do NOT use multiple organism part columns - This pattern is not recommended

  4. Use lowercase for all values
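
Guidelines 3 and 4 lend themselves to an automated check. The sketch below flags duplicate organism part columns and non-lowercase values in the two anatomy columns; the function name check_anatomy_annotation and the in-memory header/rows representation are assumptions made for illustration.

```python
ANATOMY_COLUMNS = {
    "characteristics[organism part]",
    "characteristics[sampling site]",
}


def check_anatomy_annotation(header, rows):
    """Return a list of guideline violations for the anatomy columns.

    header is a list of column names; rows is a list of lists of cell
    values in the same order as header.
    """
    issues = []
    # Guideline 3: only one organism part column.
    if header.count("characteristics[organism part]") > 1:
        issues.append("more than one organism part column")
    # Guideline 4: lowercase values.
    for index, column in enumerate(header):
        if column not in ANATOMY_COLUMNS:
            continue
        for row in rows:
            if row[index] != row[index].lower():
                issues.append(f"{column}: '{row[index]}' is not lowercase")
    return issues


header = [
    "source name",
    "characteristics[organism part]",
    "characteristics[sampling site]",
]
rows = [
    ["s1", "breast", "Tumor"],
    ["s2", "breast", "normal tissue adjacent to tumor"],
]
print(check_anatomy_annotation(header, rows))
```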

12. Phenotype and Biomarkers

For general phenotype annotation guidelines applicable to all organisms, see Sample Metadata Guidelines - Phenotype.

This section focuses on human-specific biomarker applications of the phenotype column.

12.1. Clinical Biomarkers

In clinical proteomics, phenotype is commonly used to capture biomarker status for patient stratification:

Biomarker Type | Example Values | Clinical Application
Hormone receptor status | ER positive, PR negative | Breast cancer treatment selection
Growth factor receptors | HER2 positive, EGFR positive | Targeted therapy eligibility
Immune markers | PD-L1 high, CD8+ infiltrating | Immunotherapy response prediction
Stem cell markers | CD34 positive, ALDH+ | Stem cell identification

Examples:

characteristics[phenotype]: HER2 positive
characteristics[phenotype]: ER positive
characteristics[phenotype]: triple negative
characteristics[phenotype]: PD-L1 high expression

12.2. Drug Response in Human Studies

For pharmacoproteomics and precision medicine studies:

characteristics[phenotype]: gefitinib sensitive
characteristics[phenotype]: cisplatin resistant
characteristics[phenotype]: tamoxifen resistant
characteristics[phenotype]: imatinib responder

12.3. Genotype for Human Variants

For human genetic variants, use genotype to capture specific mutations:

characteristics[genotype]: BRCA1 mutation carrier
characteristics[genotype]: KRAS G12D mutant
characteristics[genotype]: wild type
characteristics[genotype]: TP53 R175H

12.4. Biomarker vs Genotype vs Phenotype

Concept | Column | Example | Use Case
Genetic mutation | characteristics[genotype] | BRCA1 mutation | Known genetic variant
Protein expression status | characteristics[phenotype] | HER2 positive | Biomarker measured by IHC/proteomics
Drug sensitivity | characteristics[phenotype] | cisplatin resistant | Functional drug response

13. Treatment and Perturbation Metadata

This section describes columns for capturing drug treatment, compound exposure, and perturbation information. These fields are critical for pharmacoproteomics studies and treatment comparison experiments.

13.1. Treatment

General treatment or intervention applied to samples. Use this column for treatment categories that are broader than a specific compound.

Value | NCIT ID | Description
chemotherapy | NCIT:C15632 | Chemical drug treatment
immunotherapy | NCIT:C15262 | Immune-based treatment
radiation therapy | NCIT:C15313 | Radiation-based treatment
surgery | NCIT:C17173 | Surgical intervention
untreated | NCIT:C48660 | No treatment applied

Examples:

characteristics[treatment]: untreated
characteristics[treatment]: Mycobacterium tuberculosis H37Rv whole cell lysate
characteristics[treatment]: LPS stimulation

Use untreated when no treatment was applied, and not available when the treatment is unknown.

13.2. Compound

Chemical compound, drug, or biological agent applied to samples. Use ChEBI ontology for standardized compound identification.

Compound | ChEBI ID | Drug Class
doxorubicin | CHEBI:28748 | Anthracycline chemotherapy
cisplatin | CHEBI:27899 | Platinum chemotherapy
tamoxifen | CHEBI:41774 | Selective estrogen receptor modulator
dexamethasone | CHEBI:41879 | Glucocorticoid
none | EFO:0001461 | Control (no compound)

Examples:

characteristics[compound]: tamoxifen (CHEBI:41774)
characteristics[compound]: dexamethasone
characteristics[compound]: none (EFO:0001461)
characteristics[compound]: lipopolysaccharide
characteristics[compound]: interferon gamma

Best Practice: Include ChEBI accession in format compound_name (CHEBI:XXXXX) when available.
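
Values written in the compound_name (CHEBI:XXXXX) format can be split into name and accession with a small parser. This is a sketch under the assumption that the accession, when present, is a PREFIX:digits token in trailing parentheses; the name split_compound is hypothetical.

```python
import re

# Name, optional whitespace, then an accession such as CHEBI:41774
# or EFO:0001461 in parentheses at the end of the value.
COMPOUND_PATTERN = re.compile(
    r"^(?P<name>.+?)\s*\((?P<accession>[A-Za-z]+:\d+)\)$"
)


def split_compound(value: str):
    """Return (name, accession); accession is None when not given."""
    text = value.strip()
    match = COMPOUND_PATTERN.match(text)
    if match:
        return match.group("name"), match.group("accession")
    return text, None


print(split_compound("tamoxifen (CHEBI:41774)"))
print(split_compound("dexamethasone"))
print(split_compound("none (EFO:0001461)"))
```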

13.3. Dose

Format: {value} {unit}

Concentration Unit | Example | Use Case
nanomolar | 100 nanomolar | Cell culture experiments
micromolar | 10 micromolar | In vitro studies
millimolar | 1 millimolar | High concentration studies
mg/kg | 5 mg/kg | Animal studies (body weight-based)
ng/ml | 50 ng/ml | Serum/media concentration
μg/ml | 10 μg/ml | Protein/antibody concentration

Examples:

characteristics[dose]: 100 nanomolar
characteristics[dose]: 10 micromolar
characteristics[dose]: 50 nanogram per millilitre

13.4. Treatment Time

Format: {value} {unit} where unit is hour, day, week, month (or h, d, w abbreviations)

Duration of treatment exposure.

Value | Description
24 hour | One day treatment
4 hours | Short-term treatment
5 days | Multi-day treatment
2 weeks | Extended treatment

Examples:

characteristics[treatment time]: 24 hour
characteristics[treatment time]: 4 hours
characteristics[treatment time]: 5 days
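
Because singular, plural, and abbreviated units are all allowed, downstream tools may want to normalize treatment times before comparing them. The following sketch maps the unit spellings mentioned in this section to a singular form; the alias table and the name normalize_treatment_time are assumptions, not part of the specification.

```python
# Unit spellings mentioned in this section, mapped to the singular form.
UNIT_ALIASES = {
    "h": "hour", "hour": "hour", "hours": "hour",
    "d": "day", "day": "day", "days": "day",
    "w": "week", "week": "week", "weeks": "week",
    "month": "month", "months": "month",
}


def normalize_treatment_time(value: str) -> tuple[float, str]:
    """Parse '{value} {unit}' and return (number, singular unit)."""
    parts = value.split()
    if len(parts) != 2:
        raise ValueError(f"expected '{{value}} {{unit}}', got {value!r}")
    number, unit = parts
    key = unit.lower()
    if key not in UNIT_ALIASES:
        raise ValueError(f"unknown time unit: {unit!r}")
    return float(number), UNIT_ALIASES[key]


print(normalize_treatment_time("24 hour"))
print(normalize_treatment_time("4 hours"))
print(normalize_treatment_time("2 w"))
```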

13.5. Treatment Status

Values: pre-treatment, on treatment, post-treatment, treatment naive.

13.6. Treatment Response

Value | NCIT ID | Description
complete response | NCIT:C4870 | No detectable disease (complete remission)
partial response | NCIT:C18058 | Tumor size reduced (partial remission)
stable disease | NCIT:C18213 | No significant change
progressive disease | NCIT:C17747 | Disease progression

13.7. Genetic Modification

Describes the method or type of genetic modification applied to samples (particularly relevant for engineered cell lines and model organisms).

Note
This is distinct from characteristics[genotype] which describes WHAT variant is present. genetic modification describes HOW the modification was done.

Value | EFO ID | Description
gene knock out | EFO:0000506 | Complete gene deletion
gene knock down | EFO:0000513 | Gene expression reduction (siRNA, shRNA)
gene overexpression | EFO:0000514 | Increased gene expression
transfection | EFO:0000515 | DNA introduction (non-viral)
transduction | EFO:0000516 | DNA introduction (viral)
wild type genotype | EFO:0005168 | No genetic modification

Examples:

characteristics[genetic modification]: gene knock out (EFO:0000506)
characteristics[genetic modification]: AML1-ETO fusion gene transduction
characteristics[genetic modification]: transduction of hTERT, HPVE6
characteristics[genotype]: DNMT3a R882 mutant
characteristics[genotype]: wild type genotype

13.8. Complete Treatment Annotation Example

For comprehensive pharmacoproteomics annotation, combine compound, dose, and treatment time:

source name | characteristics[compound] | characteristics[dose] | characteristics[treatment time] | characteristics[phenotype]
sample_treated_1 | tamoxifen (CHEBI:41774) | 1 micromolar | 48 hour | tamoxifen sensitive
sample_treated_2 | tamoxifen (CHEBI:41774) | 1 micromolar | 48 hour | tamoxifen resistant
sample_control | none (EFO:0001461) | not applicable | not applicable | control

14. Best Practices

14.1. General Guidelines

  1. Protect privacy - Never include identifiable patient information. Use pseudonymized identifiers only.

  2. Use ontology terms - Prefer MONDO for diseases, NCIT for staging/treatments, ChEBI for compounds, PATO for phenotypes.

  3. Be consistent - Use the same format for all samples in a dataset.

  4. Document unknowns - Use not available rather than leaving cells empty.

  5. Include temporal context - Specify when measurements were taken relative to treatment.

  6. Ensure regulatory compliance - Follow GDPR, HIPAA, and other data protection regulations.
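
Guideline 4 (document unknowns) is straightforward to check automatically. The sketch below reports empty cells in tab-separated SDRF text so they can be replaced with not available; the function name find_empty_cells is hypothetical, and row numbers count the header as row 1.

```python
import csv
import io


def find_empty_cells(sdrf_text: str):
    """Return (row number, column name) for every empty cell."""
    reader = csv.reader(io.StringIO(sdrf_text), delimiter="\t")
    header = next(reader)
    problems = []
    # Data rows start at 2 because the header is row 1.
    for row_number, row in enumerate(reader, start=2):
        for column, cell in zip(header, row):
            if not cell.strip():
                problems.append((row_number, column))
    return problems


example = (
    "source name\tcharacteristics[age]\n"
    "sample_001\t55Y\n"
    "sample_002\t\n"
)
print(find_empty_cells(example))
```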

14.2. Clinical Proteomics Best Practices

  1. Separate disease from staging - Use characteristics[disease] for the condition and characteristics[disease staging] for progression.

  2. Use sampling site for tumor/normal - Distinguish tissue types using characteristics[sampling site] (e.g., "tumor", "normal tissue adjacent to tumor").

  3. Track matched samples - Use consistent characteristics[individual] identifiers for samples from the same patient.

  4. Document treatment context - Combine characteristics[compound], characteristics[dose], and characteristics[treatment time] for complete treatment annotation.

  5. Use phenotype for biomarkers - Capture molecular markers and drug response status in characteristics[phenotype].

14.3. Cancer-Specific Best Practices

  1. Avoid embedding staging in disease - Don’t use "high grade serous ovarian cancer" in disease column; use separate staging columns.

  2. Use standard staging systems - Prefer TNM notation for solid tumors, Ann Arbor for lymphomas.

  3. Include grading when available - Tumor grade provides important prognostic information.

  4. Document metastasis - Use characteristics[metastasis site] for metastatic samples.

15. Additional Terms

This section contains optional metadata terms that may be useful in specific contexts.

15.1. Smoking Status

Value | NCIT ID | Description
never smoker | NCIT:C65108 | Never smoked
former smoker | NCIT:C65107 | Previously smoked, now quit
current smoker | NCIT:C65106 | Currently smoking

15.2. Menopausal Status

For female patients. Values: pre-menopausal, peri-menopausal, post-menopausal.

16. References