Dictionary

Attribute Description DependsOn Required
Assay A planned process with the objective to produce information about the material entity that is the evaluant, by physically examining it or its proxies.[OBI_0000070] nan False
Device A thing made or adapted for a particular purpose, especially a piece of mechanical or electronic equipment nan False
Sequencing Module for next generation sequencing assays nan False
Component Category of metadata (e.g. Diagnosis, Biospecimen, scRNA-seq Level 1, etc.); provide the same one for all items/rows. nan True
Patient HTAN patient Component, HTAN Participant ID False
File A type of Information Content Entity specific to OS nan False
Filename Name of a file nan True
File Format Format of a file (e.g. txt, csv, fastq, bam, etc.) nan True
CDS Sequencing Template CDS compatible template file, includes attributes for Genomic Reference, Library Layout, Data Type, Sequencing Platform, Library Selection Method Component, Filename, File Format, HTAN Data File ID, HTAN Parent Biospecimen ID, CDS library_id, CDS library_strategy, CDS library_source, CDS library_selection, CDS library_layout, CDS platform, CDS instrument_model, CDS design_description, CDS reference_genome_assembly, CDS custom_assembly_fasta_file_for_alignment, CDS bases, CDS number_of_reads, CDS coverage, CDS avg_read_length, CDS sequence_alignment_software True
CDS library_id Short unique identifier for the sequencing library. nan True
CDS library_strategy Library strategy nan True
CDS library_source The Library Source specifies the type of source material that is being sequenced nan True
CDS library_selection Library Selection Method nan True
CDS library_layout Paired-end or Single nan True
CDS platform Sequencing Platform used for Sequencing nan True
CDS instrument_model Instrument model used for sequencing nan True
CDS design_description Free-form description of the methods used to create the sequencing library; a brief 'materials and methods' section. nan False
CDS reference_genome_assembly Provide only if you are submitting a BAM file aligned against an NCBI assembly. nan False
CDS custom_assembly_fasta_file_for_alignment Please provide the name of the custom assembly fasta file used during alignment nan False
CDS bases Count of unique basecalls present in the data. Please count each base only once if using secondary alignments. nan False
CDS number_of_reads Count of the number of reads in the data. Please count each read only once if using secondary alignments. nan False
CDS coverage Depth of coverage on assembly used. Found by (Unique Aligned Basecalls)/(Reference Length) nan False
CDS avg_read_length Found by (Bases)/(Reads) nan False
CDS sequence_alignment_software The name of the software program used to align nucleotide sequencing data. nan False
Checksum MD5 checksum of the BAM file nan True
HTAN Data File ID Self-identifier for this data file: the HTAN ID of this file, based on the HTAN ID SOP (e.g. HTANx_yyy_zzz) nan True
HTAN Participant ID HTAN ID associated with a patient based on HTAN ID SOP (e.g. HTANx_yyy) nan True
HTAN Biospecimen ID HTAN ID associated with a biosample based on HTAN ID SOP (e.g. HTANx_yyy_zzz) nan True
HTAN Parent ID HTAN ID of parent from which the biospecimen was obtained. Parent could be another biospecimen or a research participant. nan True
HTAN Parent Biospecimen ID HTAN Biospecimen Identifier (e.g. HTANx_yyy_zzz) indicating the biospecimen(s) from which these files were derived; multiple parent biospecimens should be comma-separated nan True
HTAN Parent Data File ID HTAN Data File Identifier indicating the file(s) from which these files were derived nan True
Clinical Data Tier 2 Tier 2 Cancer Data Component, HTAN Participant ID, Timepoint Label, Start Days from Index, Stop Days from Index, Sentinel Lymph Node Count, Sentinel Node Positive Assessment Count, Tumor Extranodal Extension Indicator, Satellite Metastasis Present Indicator, Other Biopsy Resection Site, Extent of Tumor Resection, Prior Sites of Radiation, Immunosuppression, Concomitant Medication Received Type, Family Member Vital Status Indicator, COVID19 Occurrence Indicator, COVID19 Current Status, COVID19 Positive Lab Test Indicator, COVID19 Antibody Testing, COVID19 Complications Severity, COVID19 Cancer Treatment Followup, Ecig vape use, Ecig vape 30 day use num, Ecig vape times per day, Type of smoke exposure cumulative years, Chewing tobacco daily use count, Second hand smoke exposure years, Known Genetic Predisposition Mutation, Hereditary Cancer Predisposition Syndrome, Cancer Associated Gene Mutations, Mutational Signatures, Mismatch Repair System Status, Lab Tests for MMR Status, Mode of Cancer Detection, Education Level, Country of Birth, Medically Underserved Area, Rural vs Urban, Cancer Incidence, Cancer Incidence Location False
SRRS Clinical Data Tier 2 Cancer related clinical data specific to SRRS Component, HTAN Participant ID, Timepoint Label, Start Days from Index, Stop Days from Index, Education Level, Country of Birth, Medically Underserved Area, Rural vs Urban, Cancer Incidence, Cancer Incidence Location, Ethnicity, Gender, Race, Vital Status, Age at Diagnosis, Days to Last Follow up, Days to Last Known Disease Status, Days to Recurrence, Last Known Disease Status, Morphology, Primary Diagnosis, Progression or Recurrence, Site of Resection or Biopsy, Tissue or Organ of Origin, NCI Atlas Cancer Site, Tumor Grade, Pack Years Smoked, Years Smoked, Days to Follow Up, Gene Symbol, Molecular Analysis Method, Test Result, Treatment Type, Tumor Largest Dimension Diameter False
Lung Cancer Tier 3 Lung cancer specific attributes in Clinical Data Tier 3 Component, HTAN Participant ID, Timepoint Label, Start Days from Index, Stop Days from Index, Lung Cancer Detection Method Type, Lung Cancer Participant Procedure History, Lung Adjacent Histology Type, Lung Tumor Location Anatomic Site, Lung Tumor Lobe Bronchial Location, Current Lung Cancer Symptoms, Lung Topography, Lung Cancer Harboring Genomic Aberrations False
Colorectal Cancer Tier 3 Colorectal cancer specific attributes in Clinical Data Tier 3 Component, HTAN Participant ID, Timepoint Label, Start Days from Index, Stop Days from Index, Colorectal Cancer Detection Method Type, History of Prior Colon Polyps, Family Colon Cancer History Indicator, Family Medical History Colorectal Polyp Diagnosis, Immediate Family History Endometrial Cancer, Immediate Family History Ovarian Cancer, Patient Inflammatory Bowel Disease Personal Medica History, Patient Colonoscopy Performed Indicator, Colorectal Cancer Tumor Border Configuration, MLH1 Promoter Methylation Status, Colorectal Cancer KRAS Indicator, Colon Polyp Occurence Indicator, Family History Colorectal Polyp, Colorectal Polyp New Indicator, Colorectal Polyp Shape, Size of Polyp Removed, Colorectal Polyp Count, Colorectal Polyp Type, Colorectal Polyp Adenoma Type False
Breast Cancer Tier 3 Breast cancer specific attributes in Clinical Data Tier 3 Component, HTAN Participant ID, Timepoint Label, Start Days from Index, Stop Days from Index, Breast Carcinoma Detection Method Type, Breast Carcinoma Histology Category, Invasive Lobular Breast Carcinoma Histologic Category, Invasive Ductal Breast Carcinoma Histologic Category, Breast Biopsy Procedure Finding Type, Breast Quadrant Site, Breast Cancer Assessment Tests, Breast Cancer Genomic Test Performed, Mammaprint Risk Group, Oncotype Risk Group, Breast Carcinoma Estrogen Receptor Status, Breast Carcinoma Progesteroner Receptor Status, Breast Cancer Allred Estrogen Receptor Score, Prior Invasive Breast Disease, Breast Carcinoma ER Status Percentage Value, Breast Carcinoma PR Status Percentage Value, HER2 Breast Carcinoma Copy Number Total, Breast Carcinoma Centromere 17 Copy Number, Breast Carcinoma HER2 Centromere17 Copynumber Total, Breast Carcinoma HER2 Chromosome17 Ratio, Breast Carcinoma Surgical Procedure Name, Breast Carcinoma HER2 Ratio Diagnosis, Breast Carcinoma HER2 Status, Hormone Therapy Breast Cancer Prevention Indicator, Breast Carcinoma ER Staining Intensity, Breast Carcinoma PR Staining Intensity, Oncotype Score, Breast Imaging Performed Type, Multifocal Breast Carcinoma Present Indicator, Multicentric Breast Carcinoma Present Indicator, BIRADS Mammography Breast Density Category False
Neuroblastoma and Glioma Tier 3 Brain cancer specific attributes in Clinical Data Tier 3 Component, HTAN Participant ID, Timepoint Label, Start Days from Index, Stop Days from Index, CNS Tumor Primary Anatomic Site, Glioma Specific Metastasis Sites, Glioma Specific Radiation Field, Supra Tentorial Ependymoma Molecular Subgroup, Infra Tentorial Ependymoma Molecular Subgroup, Neuroblastoma MYCN Gene Amplification Status False
Acute Lymphoblastic Leukemia Tier 3 Acute Lymphoblastic Leukemia attributes in Clinical Data Tier 3 Component, HTAN Participant ID, Timepoint Label, Start Days from Index, Stop Days from Index, Specimen Blast Count Percentage Value, NCI ALL Risk Group, MRD ALL Diagnostic Sensitivity, CNS Leukemia Status False
Ovarian Cancer Tier 3 Ovarian cancer specific attributes in Clinical Data Tier 3 Component, HTAN Participant ID, Timepoint Label, Start Days from Index, Stop Days from Index, Ovarian Cancer Histologic Subtype, Ovarian Cancer Surgical Outcome, Ovarian Cancer Platinum Status False
Prostate Cancer Tier 3 Prostate cancer specific attributes in Clinical Data Tier 3 Component, HTAN Participant ID, Timepoint Label, Start Days from Index, Stop Days from Index, Location Extent Extraprostatic Extension, Location Nature Positive Margins, Seminal Vesicle Invasion, Prostate Carcinoma Histologic Type, Prostate Cancer Local Extent, Additonal Findings Uninvolved Prostate, Prostate Cancer Cytologic Morphologic Subtypes False
Sarcoma Tier 3 Sarcoma specific attributes in Clinical Data Tier 3 Component, HTAN Participant ID, Timepoint Label, Start Days from Index, Stop Days from Index, Sarcoma Subtype, Sarcoma Diagnosis Classification Category, Sarcoma Tumor Extension Type False
Pancreatic Cancer Tier 3 Pancreatic cancer specific attributes in Clinical Data Tier 3 Component, HTAN Participant ID, Timepoint Label, Start Days from Index, Stop Days from Index, Pancreas Precancer Histopathologic Grade, Pancreatic IPMN Pathology Epithelial Subtype, Pancreatic Duct Final Pathology Type False
Melanoma Tier 3 Melanoma specific attributes in Clinical Data Tier 3 Component, HTAN Participant ID, Timepoint Label, Start Days from Index, Stop Days from Index, Cutaneous Melanoma Tumor Infiltrating Lymphocytes, Cutaneous Melanoma Tumor Regression Range, Melanoma Specimen Clark Level Value, Cutaneous Melanoma Surgical Margins, Melanoma Lesion Size, History of Atypical Nevi, Fitzpatrick Skin Tone, History of Chronic UV Exposure, History of Blistering Sunburn, History of Tanning Bed Use, Immediate Family History Melanoma, Melanoma Biopsy Resection Sites, Cutaneous Melanoma Ulceration, Cutaneous Melanoma Additional Findings False
Demographics Demographic attributes Component, HTAN Participant ID, Ethnicity, Gender, Race, Vital Status, Days to Birth, Country of Residence, Age Is Obfuscated, Year Of Birth, Occupation Duration Years, Premature At Birth, Weeks Gestation at Birth False
Family History Family cancer history Component, HTAN Participant ID, Relative with Cancer History False
Exposure Exposure to carcinogens Component, HTAN Participant ID, Start Days from Index, Smoking Exposure, Alcohol Exposure, Asbestos Exposure, Coal Dust Exposure, Environmental Tobacco Smoke Exposure, Radon Exposure, Respirable Crystalline Silica Exposure False
Follow Up Follow up clinical visits Component, HTAN Participant ID, Days to Follow Up, Adverse Event, Progression or Recurrence, Barretts Esophagus Goblet Cells Present, BMI, Cause of Response, Comorbidity, Comorbidity Method of Diagnosis, Days to Adverse Event, Days to Comorbidity, Diabetes Treatment Type, Disease Response, DLCO Ref Predictive Percent, ECOG Performance Status, FEV1 FVC Post Bronch Percent, FEV 1 FVC Pre Bronch Percent, FEV1 Ref Post Bronch Percent, FEV1 Ref Pre Bronch Percent, Height, Hepatitis Sustained Virological Response, HPV Positive Type, Karnofsky Performance Status, Menopause Status, Pancreatitis Onset Year, Reflux Treatment Type, Risk Factor, Risk Factor Treatment, Viral Hepatitis Serologies, Weight, Adverse Event Grade, AIDS Risk Factors, Body Surface Area, CD4 Count, CDC HIV Risk Factors, Days to Imaging, Evidence of Recurrence Type, HAART Treatment Indicator, HIV Viral Load, Hormonal Contraceptive Use, Hysterectomy Margins Involved, Hysterectomy Type, Imaging Result, Imaging Type, Immunosuppressive Treatment Type, Nadir CD4 Count, Pregnancy Outcome, Recist Targeted Regions Number, Recist Targeted Regions Sum, Scan Tracer Used False
Therapy Clinical therapy or treatment Component, HTAN Participant ID, Treatment or Therapy, Treatment Type, Treatment Effect, Treatment Outcome, Days to Treatment End, Treatment Anatomic Site, Days to Treatment Start, Initial Disease Status, Regimen or Line of Therapy, Therapeutic Agents, Treatment Intent Type, Chemo Concurrent to Radiation, Number of Cycles, Reason Treatment Ended, Treatment Arm, Treatment Dose, Treatment Dose Units, Treatment Effect Indicator, Treatment Frequency False
Diagnosis Disease diagnosis Component, HTAN Participant ID, Age at Diagnosis, Year of Diagnosis, Primary Diagnosis, Precancerous Condition Type, Site of Resection or Biopsy, Tissue or Organ of Origin, Morphology, Tumor Grade, Progression or Recurrence, Last Known Disease Status, Days to Last Follow up, Days to Last Known Disease Status, Method of Diagnosis, Prior Malignancy, Prior Treatment, Metastasis at Diagnosis, Metastasis at Diagnosis Site, First Symptom Prior to Diagnosis, Days to Diagnosis, Percent Tumor Invasion, Residual Disease, Synchronous Malignancy, Tumor Confined to Organ of Origin, Tumor Focality, Tumor Largest Dimension Diameter, Gross Tumor Weight, Breslow Thickness, Vascular Invasion Present, Vascular Invasion Type, Anaplasia Present, Anaplasia Present Type, Laterality, Perineural Invasion Present, Lymphatic Invasion Present, Lymph Nodes Positive, Lymph Nodes Tested, Peritoneal Fluid Cytological Status, Classification of Tumor, Best Overall Response, Mitotic Count, AJCC Clinical M, AJCC Clinical N, AJCC Clinical Stage, AJCC Clinical T, AJCC Pathologic M, AJCC Pathologic N, AJCC Pathologic Stage, AJCC Pathologic T, AJCC Staging System Edition, Cog Neuroblastoma Risk Group, Cog Rhabdomyosarcoma Risk Group, Gleason Grade Group, Gleason Grade Tertiary, Gleason Patterns Percent, Greatest Tumor Dimension, IGCCCG Stage, INPC Grade, INPC Histologic Group, INRG Stage, INSS Stage, International Prognostic Index, IRS Group, IRS Stage, ISS Stage, Lymph Node Involved Site, Margin Distance, Margins Involved Site, Medulloblastoma Molecular Classification, Micropapillary Features, Mitosis Karyorrhexis Index, Non Nodal Regional Disease, Non Nodal Tumor Deposits, Ovarian Specimen Status, Ovarian Surface Involvement, Pregnant at Diagnosis, Primary Gleason Grade, Secondary Gleason Grade, Supratentorial Localization, Tumor Depth, WHO CNS Grade, WHO NTE Grade False
Molecular Test Clinical molecular test data Component, HTAN Participant ID, Timepoint Label, Start Days from Index, Stop Days from Index, Gene Symbol, Molecular Analysis Method, Test Result, AA Change, Antigen, Clinical Biospecimen Type, Blood Test Normal Range Upper, Blood Test Normal Range Lower, Cell Count, Chromosome, Clonality, Copy Number, Cytoband, Exon, Histone Family, Histone Variant, Intron, Laboratory Test, Loci Abnormal Count, Loci Count, Locus, Mismatch Repair Mutation, Molecular Consequence, Pathogenicity, Ploidy, Second Exon, Second Gene Symbol, Specialized Molecular Test, Test Analyte Type, Test Units, Test Value, Transcript, Variant Origin, Variant Type, Zygosity False
Biospecimen HTAN biological entity; this can be tissue, blood, analyte and subsamples of those Component, HTAN Biospecimen ID, Source HTAN Biospecimen ID, HTAN Parent ID, Timepoint Label, Collection Days from Index, Adjacent Biospecimen IDs, Biospecimen Type, Acquisition Method Type, Fixative Type, Storage Method, Processing Days from Index, Protocol Link, Site Data Source, Collection Media, Mounting Medium, Processing Location, Histology Assessment By, Histology Assessment Medium, Preinvasive Morphology, Tumor Infiltrating Lymphocytes, Degree of Dysplasia, Dysplasia Fraction, Number Proliferating Cells, Percent Eosinophil Infiltration, Percent Granulocyte Infiltration, Percent Inflam Infiltration, Percent Lymphocyte Infiltration, Percent Monocyte Infiltration, Percent Necrosis, Percent Neutrophil Infiltration, Percent Normal Cells, Percent Stromal Cells, Percent Tumor Cells, Percent Tumor Nuclei, Fiducial Marker, Slicing Method, Lysis Buffer, Method of Nucleic Acid Isolation False
SRRS Biospecimen SRRS-specific HTAN biological entity; this can be tissue, blood, analyte and subsamples of those, however it can be described via fewer attributes than a standard HTAN specimen Component, HTAN Biospecimen ID, Source HTAN Biospecimen ID, HTAN Parent ID, Adjacent Biospecimen IDs, Biospecimen Type, Timepoint Label, Collection Days from Index, Acquisition Method Type, Ischemic Time, Ischemic Temperature, Collection Media, Topography Code, Additional Topography, Fixative Type, Storage Method, Preinvasive Morphology, Histologic Morphology Code, Preservation Method, Processing Days from Index, Protocol Link False
Source HTAN Biospecimen ID This is the HTAN ID that may have been assigned to the biospecimen at the site of biospecimen origin (e.g. BU). nan False
Other Assay Metadata applying to any assay without standard descriptors. Can be used as a placeholder for minimal amount of metadata until the assay descriptors are standardized Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, Assay Type False
ExSeq Minimal Minimal metadata for the ExSeq assay Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, Assay Type False
Assay Type The type and level of assay this metadata applies to (e.g. RPPA, NanoString DSP, etc.) nan True
scRNA-seq Level 1 Single-cell RNA-seq [EFO_0008913] Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, Nucleic Acid Source, Cryopreserved Cells in Sample, Single Cell Isolation Method, Dissociation Method, Library Construction Method, Read Indicator, Read1, Read2, End Bias, Reverse Transcription Primer, Spike In, Sequencing Platform, Total Number of Input Cells, Input Cells and Nuclei, Library Preparation Days from Index, Single Cell Dissociation Days from Index, Sequencing Library Construction Days from Index, Nucleic Acid Capture Days from Index, Protocol Link, Technical Replicate Group False
scRNA-seq Level 2 Alignment workflows downstream of scRNA-seq Level 1 Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, scRNAseq Workflow Type, Workflow Version, scRNAseq Workflow Parameters Description, Workflow Link, Genomic Reference, Genomic Reference URL, Genome Annotation URL, Checksum, Whitelist Cell Barcode File Link, Cell Barcode Tag, UMI Tag, Applied Hard Trimming False
scRNA-seq Level 3 Gene and Isoform expression files Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, Data Category, Matrix Type, Linked Matrices, Cell Median Number Reads, Cell Median Number Genes, Cell Total, scRNAseq Workflow Type, scRNAseq Workflow Parameters Description, Workflow Link, Workflow Version False
scRNA-seq Level 4 Data represents the relationships between cells derived from Level 3 expression data and shown as tSNE or UMAP coordinates per cell, plus all other cell-specific meta information (e.g., cell type) Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, scRNAseq Workflow Type, scRNAseq Workflow Parameters Description, Workflow Version, Workflow Link False
Slide-seq Level 1 Raw sequencing files for the Slide-seq assay. Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, Nucleic Acid Source, Read Indicator, Spatial Read1, Spatial Read2, End Bias, Reverse Transcription Primer, Spatial Barcode Offset, Spatial Barcode and UMI, Spike In, Sequencing Platform, Technical Replicate Group, Protocol Link, Spatial Library Construction Method, Library Preparation Days from Index, Sequencing Library Construction Days from Index, Nucleic Acid Capture Days from Index False
Slide-seq Level 2 Aligned sequencing files and QC for the Slide-seq assay. Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, Slide-seq Workflow Type, Workflow Version, Slide-seq Workflow Parameter Description, Workflow Link, Genomic Reference, Genomic Reference URL, Genome Annotation URL, Checksum, Spatial Barcode Tag, Matched Spatial Barcode Tag, UMI Tag, Applied Hard Trimming False
Slide-seq Level 3 Gene matrices with features and barcodes for Slide-seq as well as spatial information (bead location files). Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, Run ID, Sequencing Batch ID, Data Category, Matrix Type, Slide-seq Workflow Type, Workflow Version, Slide-seq Workflow Parameter Description, Workflow Link, Beads Total, Median UMI Counts per Spot, Median Number Genes per Spatial Spot, Slide-seq Bead File Type, Slide-seq Fragment Size False
Slide-seq Fragment Size Average cDNA length associated with the experiment. Integer nan False
Matched Spatial Barcode Tag SAM tag for matched spot barcode field; please provide a valid spot barcode tag (e.g. CB:Z) (Slide-seq specific) nan True
Beads Total Number of sequenced beads. Applies to raw counts matrix only. Integer nan False
Slide-seq Workflow Type Generic name for the workflow used to analyze the Slide-seq data set. String nan True
Slide-seq Workflow Parameter Description Parameters used to run the Slide-seq workflow. String nan True
Slide-seq Bead File Type The type of Level 3 file submitted as part of the Slide-seq workflow. nan True
Bulk RNA-seq Level 1 Bulk RNA-seq [EFO_0003738] Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, Library Layout, Read Indicator, Nucleic Acid Source, Micro-region Seq Platform, ROI Tag, Sequencing Platform, Sequencing Batch ID, Read Length, Library Selection Method, Library Preparation Kit Name, Library Preparation Kit Vendor, Library Preparation Kit Version, Library Preparation Days from Index, Spike In, Adapter Name, Adapter Sequence, Base Caller Name, Base Caller Version, Flow Cell Barcode, Fragment Maximum Length, Fragment Mean Length, Fragment Minimum Length, Fragment Standard Deviation Length, Lane Number, Library Strand, Multiplex Barcode, Size Selection Range, Target Depth, To Trim Adapter Sequence, Transcript Integrity Number, RIN, DV200, Adapter Content, Basic Statistics, Encoding, Kmer Content, Overrepresented Sequences, Per Base N Content, Per Base Sequence Content, Per Base Sequence Quality, Per Sequence GC Content, Per Sequence Quality Score, Per Tile Sequence Quality, Percent GC Content, Sequence Duplication Levels, Sequence Length Distribution, Total Reads, QC Workflow Type, QC Workflow Version, QC Workflow Link False
Bulk RNA-seq Level 2 Bulk RNA-seq alignment protocol description Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, Alignment Workflow Url, Alignment Workflow Type, Genomic Reference, Genomic Reference URL, Index File Name, Average Base Quality, Average Insert Size, Average Read Length, Contamination, Contamination Error, Mean Coverage, MSI Workflow Link, MSI Score, MSI Status, Pairs On Diff CHR, Total Reads, Total Uniquely Mapped, Total Unmapped reads, Proportion Reads Duplicated, Proportion Reads Mapped, Proportion Targets No Coverage, Proportion Base Mismatch, Short Reads, Is lowest level False
Bulk RNA-seq Level 3 Bulk RNA-seq gene expression matrices Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, Pseudo Alignment Used, Data Category, Expression Units, Matrix Type, Fusion Gene Detected, Fusion Gene Identity False
Bulk WES Level 1 Bulk Whole Exome Sequencing raw files Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, Sequencing Batch ID, Library Layout, Read Indicator, Library Selection Method, Read Length, Target Capture Kit, Library Preparation Kit Name, Library Preparation Kit Vendor, Library Preparation Kit Version, Sequencing Platform, Adapter Name, Adapter Sequence, Base Caller Name, Base Caller Version, Flow Cell Barcode, Fragment Maximum Length, Fragment Mean Length, Fragment Minimum Length, Fragment Standard Deviation Length, Lane Number, Multiplex Barcode, Library Preparation Days from Index, Size Selection Range, Target Depth, To Trim Adapter Sequence False
Bulk WES Level 2 Bulk Whole Exome Sequencing aligned files and QC Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, Alignment Workflow Type, Genomic Reference, Genomic Reference URL, Index File Name, Average Base Quality, Average Insert Size, Average Read Length, Contamination, Contamination Error, Mean Coverage, Adapter Content, Basic Statistics, Encoding, Overrepresented Sequences, Per Base N Content, Per Base Sequence Content, Per Base Sequence Quality, Per Sequence GC Content, Per Sequence Quality Score, Per Tile Sequence Quality, Percent GC Content, Sequence Duplication Levels, Sequence Length Distribution, QC Workflow Type, QC Workflow Version, QC Workflow Link, MSI Workflow Link, MSI Score, MSI Status, Pairs On Diff CHR, Total Reads, Total Uniquely Mapped, Total Unmapped reads, Proportion Reads Duplicated, Proportion Reads Mapped, Proportion Targets No Coverage, Proportion Base Mismatch, Short Reads, Proportion Coverage 10x, Proportion Coverage 30X, Is lowest level False
Bulk WES Level 3 Bulk Whole Exome Sequencing called variants Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, Genomic Reference, Genomic Reference URL, Germline Variants Workflow URL, Germline Variants Workflow Type, Somatic Variants Workflow URL, Somatic Variants Workflow Type, Somatic Variants Sample Type, Structural Variant Workflow URL, Structural Variant Workflow Type False
Microarray Level 1 Microarray Level 1 refers to the raw text table of probe level intensities Component, Filename, File Format, HTAN Data File ID, HTAN Participant ID, HTAN Parent Biospecimen ID, Nucleic Acid Source, Microarray Platform ID, Microarray Molecule, Microarray Label, Microarray Value Definition, Microarray Protocol Auxiliary File False
Microarray Level 2 Microarray Level 2 provides a normalized matrix of values. Component, Filename, File Format, HTAN Participant ID, HTAN Parent Biospecimen ID, HTAN Parent Data File ID, HTAN Data File ID, Microarray Platform ID, Normalization Method False
scATAC-seq Level 1 scATAC-seq files containing sequence read information, with or without alignment, as FASTQ or BAM files Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, Nucleic Acid Source, Dissociation Method, Single Nucleus Buffer, Single Cell Isolation Method, Transposition Reaction, scATACseq Library Layout, Nucleus Identifier, Nuclei Barcode Length, Nuclei Barcode Read, scATACseq Read1, scATACseq Read2, scATACseq Read3, Library Construction Method, Sequencing Platform, Threshold for Minimum Passing Reads, Total Number of Passing Nuclei, Median Fraction of Reads in Peaks, Median Fraction of Reads in Annotated cis DNA Elements, Median Passing Read Percentage, Median Percentage of Mitochondrial Reads per Nucleus, Technical Replicate Group, Total Reads, Protocol Link False
scATAC-seq Level 2 scATAC-seq files containing aligned sequence data, as a BAM file Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, Alignment Workflow Url, Alignment Workflow Type, Genomic Reference, Genomic Reference URL, Index File Name, Average Base Quality, Average Insert Size, Average Read Length, Mean Coverage, Pairs On Diff CHR, Total Reads, Proportion Reads Mapped, MapQ30, Total Uniquely Mapped, Total Unmapped reads, Proportion Reads Duplicated, Short Reads, Proportion Coverage 10x, Proportion Coverage 30X, Proportion Targets No Coverage, Proportion Base Mismatch, Median Percentage of Mitochondrial Reads per Nucleus, Contamination, Contamination Error False
scATAC-seq Level 3 Processed data files containing peak information for cells Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, scATAC-seq Object ID, nCount Peaks, nFeature Peaks, Total Read-Pairs, Duplicate Read-Pairs, Chimeric Read-Pairs, Unmapped Read-Pairs, LowMapQ, Mitochondrial Read-Pairs, Passed Filters, TSS Fragments, DNase Sensitive Region Fragments, Enhancer Region Fragments, Promoter Region Fragments, On Target Fragments, Blacklist Region Fragments, Peak Region Fragments, Peak Region Cutsites, Nucleosome Signal, Nucleosome Percentile, TSS Enrichment, TSS Percentile, Pct Reads in Peaks, Blacklist Ratio, Seurat Clusters, nCount RNA, nFeature RNA, MACS2 Seqnames, MACS2 Start, MACS2 End, MACS2 Width, MACS2 Strand, MACS2 Name, MACS2 Score, MACS2 Fold Change, MACS2 Neg Log10 pvalue Summit, MACS2 Neg Log10 qvalue Summit, MACS2 Relative Summit Position False
scmC-seq Level 1 Files contain raw scmC-seq data. Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, Nucleic Acid Source, scmCseq Read1, scmCseq Read2, scmCseq Read3, Single Cell Isolation Method, Single Nucleus Buffer, Single Nucleus Capture, Bisulfite Conversion, Library Layout, Nucleus Identifier, Sequencing Platform, Technical Replicate Group, Median Fraction of Reads in Peaks, Median Passing Read Percentage, Peaks Calling Software, Median Percentage of Mitochondrial Reads per Nucleus, Threshold for Minimum Passing Reads, Total Number of Passing Nuclei, Total Reads False
scmC-seq Level 2 scmC-seq files containing aligned sequence data, as a BAM file. Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, Alignment Workflow Url, Alignment Workflow Type, Genomic Reference, Genomic Reference URL, Index File Name, Average Base Quality, Average Insert Size, Average Read Length, Contamination, Contamination Error, Mean Coverage, Pairs On Diff CHR, Total Reads, Total Uniquely Mapped, Total Unmapped reads, Proportion Reads Duplicated, Proportion Reads Mapped, Proportion Targets No Coverage, Proportion Base Mismatch, Short Reads False
scATAC-seq Level 4 Data represents the relationships between cells derived from Level 3 expression data and shown as tSNE or UMAP coordinates per cell, plus all other cell-specific meta information (e.g., cell type) Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, scATACseq Workflow Type, scATACseq Workflow Parameters Description, Workflow Version, Workflow Link False
scDNA-seq Level 1 Single-cell DNA-seq Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, Sequencing Batch ID, Library Layout, Nucleic Acid Source, Library Selection Method, Read Length, Library Preparation Kit Name, Library Preparation Kit Vendor, Library Preparation Kit Version, Adapter Name, Adapter Sequence, Base Caller Name, Base Caller Version, Flow Cell Barcode, Fragment Maximum Length, Fragment Mean Length, Fragment Minimum Length, Fragment Standard Deviation Length, Lane Number, Library Strand, Multiplex Barcode, Size Selection Range, Target Depth, To Trim Adapter Sequence, Adapter Content, Basic Statistics, Encoding, Kmer Content, Overrepresented Sequences, Per Base N Content, Per Base Sequence Content, Per Base Sequence Quality, Per Sequence GC Content, Per Sequence Quality Score, Per Tile Sequence Quality, Percent GC Content, Sequence Duplication Levels, Sequence Length Distribution, Total Reads, QC Workflow Type, QC Workflow Version, QC Workflow Link False
scDNA-seq Level 2 Alignment workflows downstream of scDNA-seq Level 1 Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, Alignment Workflow Url, Alignment Workflow Type, Genomic Reference, Genomic Reference URL, Index File Name, Average Base Quality, Average Insert Size, Average Read Length, Mean Coverage, Pairs On Diff CHR, Total Reads, Proportion Reads Mapped, MapQ30, Total Uniquely Mapped, Total Unmapped reads, Proportion Reads Duplicated, Short Reads, Proportion Coverage 10x, Proportion Coverage 30X, Proportion Targets No Coverage, Proportion Base Mismatch, Proportion Mitochondrial Reads, Contamination, Contamination Error False
Multiplexed CITE-seq Level 1 Raw sequencing files for the multiplexed CITE-seq assay Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, Nucleic Acid Source, Cryopreserved Cells in Sample, Single Cell Isolation Method, Dissociation Method, Library Construction Method, Read Indicator, Read1, Read2, End Bias, Reverse Transcription Primer, Spike In, Spike In Concentration, Sequencing Platform, Total Number of Input Cells, Input Cells and Nuclei, Library Preparation Days from Index, Single Cell Dissociation Days from Index, Sequencing Library Construction Days from Index, Nucleic Acid Capture Days from Index, Protocol Link, Technical Replicate Group, Empty Well Barcode, Well Index, Feature Reference Id, Associated mRNA Library Data File ID, Single Cell Barcode Method Applied, Feature Barcode Library Type, Barcode Folder Synapse ID, Barcode Folder File List False
Multiplexed CITE-seq Level 2 Alignment workflows downstream of Multiplexed CITE-seq Level 1 Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Parent Data File ID, HTAN Data File ID, Associated mRNA Library Data File ID, scRNAseq Workflow Type, Workflow Version, scRNAseq Workflow Parameters Description, Workflow Link, Genomic Reference, Genomic Reference URL, Genome Annotation URL, Checksum, Whitelist Cell Barcode File Link, Cell Barcode Tag, UMI Tag, Applied Hard Trimming False
Multiplexed CITE-seq Level 3 Gene and Isoform expression files Component, Filename, File Format, HTAN Parent Data File ID, HTAN Parent Biospecimen ID, HTAN Data File ID, Associated mRNA Library Data File ID, Data Category, Matrix Type, Linked Matrices, Cell Median Number Reads, Cell Median Number Genes, Cell Total, scRNAseq Workflow Type, scRNAseq Workflow Parameters Description, Workflow Link, Workflow Version False
Multiplexed CITE-seq Level 4 Data represents the relationships between cells derived from Level 3 expression data and shown as tSNE or UMAP coordinates per cell, plus all other cell-specific meta information (e.g., cell type) Component, Filename, File Format, HTAN Parent Data File ID, HTAN Parent Biospecimen ID, HTAN Data File ID, Associated mRNA Library Data File ID, scRNAseq Workflow Type, scRNAseq Workflow Parameters Description, Workflow Version, Workflow Link False
Bulk Methylation-seq Level 1 Raw data for bulk methylation sequencing, such as FASTQs and unaligned BAMs Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, Nucleic Acid Source, Bisulfite Conversion, Sequencing Platform, Replicate Type, Bulk Methylation Assay Type, Total DNA Input False
Bulk Methylation-seq Level 2 Aligned primary data for bulk methylation sequencing, such as gene expression matrix files, VCFs, etc. Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, Alignment Workflow Url, Trimmer, Bulk Methylation Genomic Reference, Genomic Reference URL, Index File Name, Alignment Workflow Type, Duplicate Removal Software, Mean Coverage, Library Layout, Average Base Quality, Average Insert Size, Average Read Length, Contamination, Contamination Error, Pairs On Diff CHR, Total Reads, Total Uniquely Mapped, Total Unmapped reads, Proportion Reads Duplicated, Proportion Reads Mapped, Proportion Targets No Coverage, Proportion Base Mismatch, Short Reads, Proportion of Minimum CpG Coverage 10X, Proportion Coverage 30X False
Bulk Methylation-seq Level 3 Sample level summary data for bulk methylation sequencing, such as t-SNE plot coordinates, etc. Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, DMC Calling Tool, DMC Calling Workflow URL, DMR Calling Tool, DMR Calling Workflow URL, pUC19 methylation ratio, Lambda methylation ratio, DMC data file format, DMR data file Format False
Imaging Level 1 Raw imaging data Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, Imaging Assay Type, Protocol Link, Software and Version, Commit SHA, Pre-processing Completed, Pre-processing Required, Comment False
Imaging Level 2 Raw and pre-processed image data Component, Filename, File Format, HTAN Participant ID, HTAN Parent Biospecimen ID, HTAN Data File ID, Channel Metadata Filename, Imaging Assay Type, Protocol Link, Software and Version, Microscope, Objective, NominalMagnification, LensNA, WorkingDistance, WorkingDistanceUnit, Immersion, Pyramid, Zstack, Tseries, Passed QC, Comment, FOV number, FOVX, FOVXUnit, FOVY, FOVYUnit, Frame Averaging, Image ID, DimensionOrder, PhysicalSizeX, PhysicalSizeXUnit, PhysicalSizeY, PhysicalSizeYUnit, PhysicalSizeZ, PhysicalSizeZUnit, Pixels BigEndian, PlaneCount, SizeC, SizeT, SizeX, SizeY, SizeZ, PixelType, MERFISH Positions File, MERFISH Codebook File False
MERFISH Positions File The positions file is an auxiliary MERFISH file that describes the location of bead positions in the assay. nan False
MERFISH Codebook File The codebook is an auxiliary MERFISH file that describes how each grouping of bits is converted to a gene name. nan False
Imaging Level 3 Segmentation Object segmentations Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, Imaging Segmentation Data Type, Parameter file, Software and Version, Commit SHA, Imaging Object Class, Number of Objects False
Imaging Level 3 Image Quality controlled imaging data Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Parent Data File ID, HTAN Parent Channel Metadata ID, HTAN Data File ID, Imaging Assay Type, Protocol Link, Software and Version, Microscope, Objective, NominalMagnification, LensNA, WorkingDistance, Immersion, Pyramid, Zstack, Tseries, Passed QC, Comment, FOV number, FOVX, FOVY, Frame Averaging False
10x Visium Spatial Transcriptomics - RNA-seq Level 1 Files contain raw RNA-seq data associated with spot/slide data. Component, Filename, Run ID, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, Read Indicator, Spatial Read1, Spatial Read2, Spatial Library Construction Method, Library Preparation Days from Index, Sequencing Library Construction Days from Index, End Bias, Reverse Transcription Primer, Sequencing Platform, Capture Area, Protocol Link, Slide Version, Slide ID, Image Re-orientation, Permeabilization Time, RIN, DV200 False
10x Visium Spatial Transcriptomics - RNA-seq Level 2 Alignment workflows downstream of Spatial Transcriptomics RNA-seq Level 1. Component, Filename, File Format, Checksum, HTAN Parent Data File ID, HTAN Data File ID, UMI Tag, Whitelist Spatial Barcode File Link, Spatial Barcode Tag, Applied Hard Trimming, Workflow Version, Workflow Link, Genomic Reference, Genomic Reference URL, Genome Annotation URL, HTAN Parent Biospecimen ID, Run ID, Capture Area False
10x Visium Spatial Transcriptomics - Auxiliary Files Auxiliary data associated with spot/slide analysis (aligned Images, quality control files, etc) from Spatial Transcriptomics. Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Parent Data File ID, HTAN Data File ID, Run ID, Visium File Type, Slide ID, Capture Area, Workflow Version, Workflow Link False
10x Visium Spatial Transcriptomics - RNA-seq Level 3 Processed data files based on Spatial Transcriptomics RNA-seq Level 2 and Spatial Transcriptomics Auxiliary files. Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Parent Data File ID, HTAN Data File ID, Run ID, Visium File Type, Workflow Version, Workflow Link, Capture Area, Spots under tissue, Mean Reads per Spatial Spot, Median Number Genes per Spatial Spot, Sequencing Saturation, Proportion Reads Mapped, Proportion Reads Mapped to Transcriptome, Median UMI Counts per Spot False
10x Visium Spatial Transcriptomics - RNA-seq Level 4 Processed data files based on Spatial Transcriptomics RNA-seq Level 3. Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Parent Data File ID, HTAN Data File ID, Run ID, Workflow Version, Workflow Link, Visium Workflow Type, Visium Workflow Parameters Description False
Visium File Type The file type generated for the visium experiment. nan True
Run ID A unique identifier for this individual run (typically associated with a single slide) of the spatial transcriptomic processing workflow. nan True
Capture Area Area (or Capture Area) - One of either four or two active regions where tissue can be placed on a Visium slide. Each area is intended to contain only one tissue sample. Slide areas are named consecutively from top to bottom: A1, B1, C1, D1 for Visium slides with 6.5 mm Capture Area and A, B for CytAssist slides with 11 mm Capture Area. Both CytAssist slides with 6.5 mm Capture Area and Gateway Slides contain only two slide areas, A1 and D1. nan False
Slide Version Version of imaging slide used. Slide version is critical for the analysis of the sequencing data as different slides have different capture area layouts. nan False
Slide ID For Visium, this is the unique identifier printed on the label of each Visium slide. The serial number starts with V followed by a number ranging from one through five and ends with a dash and a three-digit number, such as 123. For CosMx, this refers to the loaded Flow Cell ID. For Xenium, this ID indicates the slide orientation, as it matches the relative location of the ID on the physical Xenium slide. nan False
Image Re-orientation To ensure good fiducial alignment and tissue spots detection, it is important to correct for this shift in orientation. nan False
Permeabilization Time Fixed and stained tissue sections are permeabilized for different times. Each Capture Area captures polyadenylated mRNA from the attached tissue section. Measure is provided in minutes. nan False
Whitelist Spatial Barcode File Link Link to file listing all possible spatial barcodes. URL nan True
Spatial Barcode Tag SAM tag for spot barcode field; please provide a valid spot barcode tag (e.g. CB:Z) nan True
Spatial Barcode Offset Offset in sequence for spot barcode read (in bp): number nan True
Spatial Barcode Length Length of spot barcode read (in bp): number nan True
Spatial Read1 Read 1 content description nan True
Spatial Read2 Read 2 content description nan True
Spatial Library Construction Method Process which results in the creation of a library from fragments of DNA using cloning vectors or oligonucleotides with the role of adaptors [OBI_0000711] nan True
Spatial Barcode and UMI Spot and transcript identifiers Spatial Barcode Offset, Spatial Barcode Length, UMI Barcode Offset, UMI Barcode Length True
Mean Reads per Spatial Spot The number of reads, both under and outside of tissue, divided by the number of barcodes associated with a spot under tissue. nan True
Visium Workflow Type Generic name for the workflow used to analyze the visium data set. nan True
Visium Workflow Parameters Description Parameters used to run the workflow. nan True
Spots under tissue The number of barcodes associated with a spot under tissue. nan True
Median UMI Counts per Spot The median number of UMI counts per tissue covered spot. nan True
Sequencing Saturation The fraction of reads originating from an already-observed UMI. This is a function of library complexity and sequencing depth. More specifically, this is the fraction of confidently mapped, valid spot-barcode, valid UMI reads that had a non-unique (spot-barcode, UMI, gene). nan True
Proportion Reads Mapped to Transcriptome Fraction of reads that mapped to a unique gene in the transcriptome. The read must be consistent with annotated splice junctions. These reads are considered for UMI counting. nan True
Median Number Genes per Spatial Spot The median number of genes detected per spot under tissue-associated barcode. Detection is defined as the presence of at least 1 UMI count. nan True
NanoString GeoMx DSP Spatial Transcriptomics Level 1 Files contain raw data output from the NanoString GeoMx DSP Pipeline. These can include RCC or DCC Files. Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, Synapse ID of GeoMx DSP PKC File, GeoMx DSP NGS Sequencing Platform, GeoMx DSP NGS Library Selection Method, GeoMx DSP NGS Library Preparation Kit Name, GeoMx DSP Library Preparation Kit Vendor, GeoMx DSP Library Preparation Kit Version, Synapse ID of GeoMx Lab Worksheet File, Software and Version False
GeoMx DSP Assay Type The assay type which was used for the GeoMx DSP pipeline. nan True
Synapse ID of GeoMx DSP PKC File The Synapse ID(s) associated with the PKC mapping file for the assay. Multiple files are listed as comma separated values. nan True
GeoMx DSP NGS Sequencing Platform A platform is an object aggregate that is the set of instruments and software needed to perform a process [OBI_0000050]. Specific model of the sequencing instrument. nan False
GeoMx DSP NGS Library Selection Method How RNA molecules are isolated. nan False
GeoMx DSP NGS Library Preparation Kit Name Name of Library Preparation Kit. String nan False
GeoMx DSP Library Preparation Kit Vendor Vendor of Library Preparation Kit. String nan False
GeoMx DSP Library Preparation Kit Version Version of Library Preparation Kit. String nan False
Synapse ID of GeoMx Lab Worksheet File Synapse ID(s) of Lab Worksheet Files output from the GeoMx DSP workflow. Multiple files are listed as comma separated values. nan False
NanoString GeoMx DSP Spatial Transcriptomics Level 3 Files contain processed data from the NanoString GeoMx DSP Pipeline. This level depends on GeoMx Level 1 and Imaging Level 2. Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, GeoMx DSP Assay Type, Synapse ID of GeoMx DSP ROI Segment Annotation File, GeoMx DSP Unique Probe Count, GeoMx DSP Unique Target Count, GeoMx DSP Genomic Reference, Matrix Type, GeoMx DSP Workflow Type, GeoMx DSP Workflow Parameter Description, GeoMx DSP Workflow Link False
Synapse ID of GeoMx DSP ROI Segment Annotation File Synapse ID(s) for ROI/Segmentation annotations in the GeoMx DSP experiment. nan True
GeoMx DSP Genomic Reference Exact version of the human genome reference used in the alignment of reads (e.g. https://www.gencodegenes.org/human/). Only applicable to some applications in GeoMx nan False
GeoMx DSP Unique Probe Count Total number of unique probes reported. nan False
GeoMx DSP Unique Target Count Total number of unique genes reported. nan False
GeoMx DSP Workflow Type Generic name for the workflow used to analyze the GeoMx DSP data set. nan False
GeoMx DSP Workflow Parameter Description Parameters used to run the GeoMx DSP workflow. nan False
GeoMx DSP Workflow Link Link to workflow or command. Dockstore.org recommended. URL nan False
NanoString GeoMx DSP ROI RCC Segment Annotation Metadata GeoMx ROI and Segment Metadata Attributes. The assayed biospecimen should be reported one per row with the associated ROI coordinates. HTAN Parent Biospecimen ID, Scan name, ROI name, Segment name, ROI X Coordinate, ROI Y Coordinate, Tags, QC status, Scan Height, Scan Width, Scan Offset X, Scan Offset Y, Binding Density, Positive norm factor, Surface area, Nuclei count, Tissue Stain False
Scan name GeoMx Scan name (as appears in Segment Summary) nan True
ROI name ROI name (application generated). For Xenium this is referred to as the “region name” nan True
Segment name Name given to segment at time of generation nan True
Tags Unique descriptor of a variable group (e.g. MAPK+) nan True
ROI X Coordinate X location within the image nan True
ROI Y Coordinate Y location within the image nan True
QC status ROI quality control flag as reported by the application nan False
Scan Height Height of the scan for GeoMx Analysis nan True
Scan Width Width of the scan for GeoMx Analysis nan True
Scan Offset X Offset X of the scan for GeoMx Analysis nan True
Scan Offset Y Offset Y of the scan for GeoMx Analysis nan True
Binding Density The binding density as reported by the application nan False
Positive norm factor The Positive Control Normalization factor calculated using pos-hyb controls nan False
Surface area Surface area of the ROI in square microns (µm^2). In CosMx, this is referred to as the Scan Area. In Xenium, this is referred to as the Region Area nan True
Nuclei count Number of nuclei detected in the segment (if applicable) nan True
Tissue Stain e.g. CD45 or PanCK (if masking was performed) nan False
NanoString GeoMx DSP ROI DCC Segment Annotation Metadata GeoMx ROI and Segment Metadata Attributes. The assayed biospecimen should be reported one per row with the associated ROI coordinates. HTAN Parent Biospecimen ID, Scan name, Slide name, ROI name, Segment name, ROI X Coordinate, ROI Y Coordinate, Tags, Scan Height, Scan Width, Scan Offset X, Scan Offset Y, Surface area, Nuclei count, Sequencing Saturation, MapQ30, Raw reads, Stitched reads, Aligned reads, Deduplicated reads, In Situ Negative median, Biological probe median False
Slide name Similar to a Run ID, the slide name indicates the slide a given ROI is linked to (as reported in Segment Summary). nan False
Raw reads Reads not yet analyzed in any way to be used for data analysis. The number of reads that pass filter from the flow cell represented in the FASTQ file. nan False
Stitched reads Consensus reads built from the overlapping sequence of read 1 and read 2. Reported as the percentage of aligned reads that overlapped and were consensus-confirmed, usually upward of 80%, although the number of stitched reads is lower than the number of aligned reads. nan False
Aligned reads Reads whose sequence has been aligned to a gene/probe. Typically these reads number from the hundreds of thousands to tens of millions. In GeoMx, alignment is performed by mapping the RTS ID to a whitelist of sequences that represent targets. nan False
Deduplicated reads The replacement of blocks of duplicate data with a Virtual Index Pointer linking the new sub-block to the existing block of data in a duplicate repository. This is used to reduce the amount of space needed to store the data. nan False
In Situ Negative median The median of all negative control probes for a given segment. A measure of signal to background for each segment. nan False
Biological probe median The median count from all probes except the negative control probes. A measure of signal to background for each segment. nan False
HI-C-seq Level 1 Unaligned sequence data Component, HTAN Parent Biospecimen ID, HTAN Data File ID, Filename, File Format, Genomic Reference, Sequencing Platform, Nucleic Acid Source, Technical Replicate Group, Transposition Reaction, Crosslinking Condtion, DNA Digestion Condition, Nuclei Permeabilization Method, Ligation Condition, Biotin Enrichment, DNA Input Amount, Total Reads, Protocol Link False
HI-C-seq Level 2 Aligned read pairs, contact matrix Component, HTAN Data File ID, HTAN Parent Data File ID, Filename, File Format, Genomic Reference, Aligned Read Length, Tool, Resolution, Normalization Method False
HI-C-seq Level 3 Summary data for the HI-C-seq assay. Component, HTAN Parent Data File ID, HTAN Data File ID, Filename, File Format, Genomic Reference, Stripe Calling, Loop Window, Stripe Window, Loop Calling False
Crosslinking Condtion Detailed condition for DNA crosslinking nan True
DNA Digestion Condition Enzymes and treatment length/temperature for genome digestion nan True
Nuclei Permeabilization Method Detergent and treatment condition for nuclei permeabilization and crosslinking softening nan True
Ligation Condition Name of ligase and condition for proximity ligation nan True
Biotin Enrichment Whether biotin is used for enriching ligation product nan True
DNA Input Amount Amount of DNA for library construction, in nanograms. nan True
Resolution Binning size used for generating contact matrix, in base pairs. nan True
Stripe Calling Tool used for identifying architectural stripe-forming interaction hotspots. nan True
Loop Window Binning size used for calling significant dot interactions (loops) nan True
Stripe Window Binning size used for calling significant architectural stripes. Can be an integer or comma-separated list of integers indicating bin size and sliding window size if different. nan True
Loop Calling Tool used for identifying loop interactions nan True
Imaging Level 4 Derived imaging data: Object-by-feature array Component, Filename, File Format, HTAN Parent Data File ID, HTAN Parent Channel Metadata ID, HTAN Data File ID, Parameter file, Software and Version, Commit SHA, Number of Objects, Number of Features, Imaging Object Class, Imaging Summary Statistic False
SRRS Imaging Level 2 SRRS-specific HTAN raw and pre-processed image data Component, Filename, File Format, HTAN Participant ID, HTAN Parent Biospecimen ID, HTAN Data File ID, Channel Metadata Filename, Imaging Assay Type, Protocol Link, Software and Version, Microscope, Objective, NominalMagnification, Pyramid, Zstack, Tseries, Passed QC, Frame Averaging, Image ID, DimensionOrder, PhysicalSizeX, PhysicalSizeXUnit, PhysicalSizeY, PhysicalSizeYUnit, Pixels BigEndian, PlaneCount, SizeC, SizeT, SizeX, SizeY, SizeZ, PixelType False
10X Genomics Xenium ISS Experiment All data pertaining to the 10X Genomics Xenium In-Situ Hybridization experiment Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, Xenium Bundle Contents, Slide ID, ROI name, Panel Name, Protocol Link, Software and Version, Total Number of Cells, Total Number of Targets, Surface area, Experiment IF Channels, Transcripts per Cell, Percent of Transcripts within Cells, Decoded Transcripts, Xenium IF image HTAN File ID, Xenium HE image HTAN File ID False
Xenium Bundle Contents A comma separated list of filenames within the Xenium bundle zip file nan True
Panel Name The human-readable panel name. This could be the Gene Panel name or Protein Panel name. In Xenium, this refers to the string entered as the name in panel specification (e.g. Xenium Human Immuno-Oncology Add-on B Gene Expression). In CosMx, this refers to the panel name as it appears in the CosMx catalog (e.g. CosMx Human Universal Cell Characterization Panel (1000-plex)) nan True
Total Number of Cells The total number of cells analyzed on the flow cell nan True
Total Number of Targets Refers to the target of an assay. Can be genes/transcripts or probes nan True
Experiment IF Channels A comma-separated list with any number of channels the user deems appropriate (Example: PanCK, CD45, CD3, DAPI) nan True
Transcripts per Cell Mean or Median transcript count per cell analyzed on the flow cell or slide nan True
Percent of Transcripts within Cells The percentage of transcripts assigned to assayed cells nan True
Decoded Transcripts In Xenium, this is the number of high-quality, decoded-to-gene nuclear transcripts divided by the total segmented nuclear area to get a transcript density (units are reported in 100um^2). nan True
Xenium IF image HTAN File ID The HTAN Data File ID of an Imaging Level 2 file nan False
Xenium HE image HTAN File ID The HTAN Data File ID of an Imaging Level 2 file nan False
RPPA Level 2 Array-based proteomics. Each dilution curve of spot intensities is fitted using the monotone increasing B-spline model in the SuperCurve R package. This fits a single curve using all the samples on a slide with the signal intensity as the response variable and the dilution steps as independent variables. The fitted curve is plotted with the signal intensities on the y-axis and the log2-concentration of proteins on the x-axis for diagnostic purposes. Component, Filename, File Format, HTAN Participant ID, HTAN Parent Biospecimen ID, HTAN Parent Data File ID, HTAN Data File ID, HTAN RPPA Antibody Table, Assay Type, Protocol Link, Software and Version False
HTAN RPPA Antibody Table A table containing antibody level metadata for RPPA HTAN RPPA Antibody Table ID, Filename, File Format, Ab Name Reported on Dataset, GENCODE Gene Symbol Target, UNIPROT Protein ID Target, Phosphoprotein Flag, Vendor, Catalog Number, Internal Ab ID, Species, RPPA Dilution, Phospho Site, RPPA Validation Status, Clone, Clonality, Antibody Notes True
RPPA Level 3 Level 3 Reverse Phase Protein Array (RPPA) data contains intra-batch normalized intensities. Component, Filename, File Format, HTAN Participant ID, HTAN Parent Biospecimen ID, HTAN Parent Data File ID, HTAN Data File ID, Assay Type, Software and Version, Normalization Method False
RPPA Level 4 Level 4 Reverse Phase Protein Array (RPPA) data contains intra-batch corrected intensities. Component, Filename, File Format, HTAN Participant ID, HTAN Parent Biospecimen ID, HTAN Parent Data File ID, HTAN Data File ID, Assay Type, Batch Correction Method False
Nanostring CosMx SMI Experiment RNA and Protein Panel assays applied as part of Nanostring CosMx Spatial Molecular Imager (SMI) Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, CosMx Bundle Contents, Slide ID, CosMx Assay Type, Panel Name, Protocol Link, Software and Version, Total Number of Cells, Total Number of Targets, Number of FOVs, Surface area, Experiment IF Channels, Transcripts per Cell, Percent of Transcripts within Cells, Mean Total Transcripts per Area, Unique Genes, Total Negative Probe Counts False
CosMx Bundle Contents A comma separated list of filenames within the CosMx bundle zip file nan True
CosMx Assay Type The specification for barcodes on each image. Either RNA probe or protein antibody according to the assay nan True
Number of FOVs The total number of FOVs recorded for the sample on a single flow cell nan True
Mean Total Transcripts per Area The mean total transcripts per um3 nan True
Unique Genes The total unique genes detected above background nan False
Total Negative Probe Counts Mean Total Negative probe counts/cell nan True
Mass Spectrometry Level 1 Mass Spectrometry derived data that includes proteomics, metabolomics, and lipidomics, level 1 Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, MS Batch ID, MS-based Assay Type, Analyte Type, MS-based Targeted, MS Instrument Vendor and Model, MS Source, Polarity, Mass Range Low Value, Mass Range High Value, Data Collection Mode, MS Scan Mode, MS Labeling, Protocol Link, LC Instrument Vendor and Model, LC Column Vendor and Model, LC Resin, LC Length Value, LC Temp Value, LC ID Value, LC Flow Rate, LC Gradient, LC Mobile Phase A, LC Mobile Phase B, Software and Version, MS Instrument Metadata File False
Mass Spectrometry Level 2 Mass Spectrometry derived data that includes proteomics, metabolomics, and lipidomics, level 2 Component, Filename, File Format, HTAN Data File ID, HTAN Parent Biospecimen ID, HTAN Parent Data File ID, MS Assay Category, Software and Version, Mass Spectrometry Auxiliary File False
Mass Spectrometry Level 3 Mass Spectrometry derived data that includes proteomics, metabolomics, and lipidomics, level 3 Component, Filename, File Format, HTAN Data File ID, HTAN Parent Biospecimen ID, HTAN Parent Data File ID, MS Assay Category, Software and Version, Mass Spectrometry Auxiliary File False
Mass Spectrometry Level 4 Mass Spectrometry derived data that includes proteomics, metabolomics, and lipidomics, level 4 Component, Filename, File Format, HTAN Data File ID, HTAN Parent Biospecimen ID, HTAN Parent Data File ID, MS Assay Category, Software and Version, Mass Spectrometry Auxiliary File False
Mass Spectrometry Auxiliary File Auxiliary software parameter file used in mass spectrometry data processing, recorded as synapse ID (syn12345). Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID False
Imaging Level 3 Channels Channel-level Metadata Attributes HTAN Channel Metadata ID, Channel ID, Channel Name, Channel Passed QC, Cycle Number, Sub Cycle Number, Antibody Role, Target Name, Antibody Name, RRID identifier, Fluorophore, Clone, Lot, Vendor, Catalog Number, Excitation Wavelength, Emission Wavelength, Excitation Bandwidth, Emission Bandwidth, Metal Isotope Element, Metal Isotope Mass, Oligo Barcode Upper Strand, Oligo Barcode Lower Strand, Dilution, Concentration False
HTAN Channel Metadata ID HTAN ID for this channel metadata table (same for all rows) nan True
Channel ID This must match the corresponding field in the OME-XML / TIFF header. (eg 'Channel:0:1') nan True
Channel Name This must match the corresponding field in the OME-XML / TIFF header. (eg 'Blue' or 'CD45' or 'E-cadherin') nan True
Channel Passed QC Identify stains that did not pass QC but are included in the dataset. nan True
No - Channel Failed QC Channel failed QC Channel QC Failure Type False
Channel QC Failure Type Reason the channel failed QC nan False
Other/multiple channel QC faliure types QC failure type not specified Channel QC Failure Comment False
Channel QC Failure Comment Custom comment on channel QC failure nan False
Cycle Number The cycle # in which the co-listed reagent(s) was(were) used. Integer >= 1 (up to number of cycles) nan False
Sub Cycle Number Sub cycle number nan False
Target Name Short descriptive name (abbreviation) for this target (antigen) nan True
Antibody Role Is this antibody acting as a primary or secondary antibody? nan True
Antibody Name Antibody Name (free text (eg “Keratin”, “CD163”, “DNA”)) nan True
RRID identifier Research Resource Identifier (eg “RRID: AB_394606”) nan True
Fluorophore Fluorescent dye label (eg Alexa Fluor 488) nan False
Clone Clone nan False
Lot Lot number from vendor nan False
Vendor Vendor nan False
Catalog Number Catalog Number nan False
Excitation Wavelength Center/peak of the excitation spectrum (nm) nan False
Emission Wavelength Center/peak of the emission spectrum (nm) nan False
Excitation Bandwidth Nominal width of excitation spectrum (nm) nan False
Emission Bandwidth Nominal width of emission spectrum (nm) nan False
Metal Isotope Element Element abbreviation. eg “La” or “Nd” nan False
Metal Isotope Mass Element mass number nan False
Oligo Barcode Upper Strand Oligo Barcode - Upper Strand nan False
Oligo Barcode Lower Strand Oligo Barcode - Lower Strand nan False
Dilution Dilution (eg 1:1000) nan False
Concentration Concentration (eg 10ug/mL) nan False
Imaging Assay Type Type of imaging assay nan True
Channel Metadata Filename Full path within Synapse project of uploaded companion CSV file containing channel-level metadata details nan True
Microscope Microscope type (manufacturer, model, etc) used for this experiment nan True
Objective Objective nan False
NominalMagnification The magnification of the lens as specified by the manufacturer - i.e. '60' is a 60X lens. Floating point value > 1 (no units) nan True
LensNA The numerical aperture of the lens. Floating point value > 0. nan False
WorkingDistance The working distance of the lens, expressed as a floating point number. Floating point > 0. WorkingDistanceUnit False
WorkingDistanceUnit The units of the working distance. See OME enumeration of allowed values for the UnitsLength attribute -- default: microns (um) nan False
Immersion Immersion medium nan False
Pyramid Does data file contain pyramid of images nan True
Zstack Does data file contain a Z-stack of images nan True
Tseries Does data file contain a time-series of images nan True
Passed QC Did all channels pass QC (if not add free text Comment) nan True
No - Channels QC Not all channels passed QC Comment False
Comment Free text field (generally for QC comment) nan False
FOV number Index of FOV (as it pertains to its sequence order). Integer >= 1 nan False
FOVX Field of view X dimension. Floating point FOVXUnit False
FOVXUnit Field of view X dimension units. See OME enumeration of allowed values for the UnitsLength attribute -- default: microns (um) nan False
FOVY Field of view Y dimension. Floating point value FOVYUnit False
FOVYUnit Field of view Y dimension units. See OME enumeration of allowed values for the UnitsLength attribute -- default: microns (um) nan False
Frame Averaging Number of frames averaged together (if no averaging, set to 1). Integer >= 1 nan False
Image ID Unique internal image identifier. eg "Image:0". (To be extracted from OME-XML) nan True
DimensionOrder The order in which the individual planes of data are interleaved. nan True
PhysicalSizeX Physical size (X-dimension) of a pixel. Units are set by PhysicalSizeXUnit. Floating point value > 0. PhysicalSizeXUnit True
PhysicalSizeXUnit The units of the physical size of a pixel. See OME enumeration of allowed values for the UnitsLength attribute -- default: microns (um) nan True
PhysicalSizeY Physical size (Y-dimension) of a pixel. Units are set by PhysicalSizeYUnit. Floating point value > 0. PhysicalSizeYUnit True
PhysicalSizeYUnit The units of the physical size of a pixel. See OME enumeration of allowed values for the UnitsLength attribute -- default: microns (um) nan True
PhysicalSizeZ Physical size (Z-dimension) of a pixel. Units are set by PhysicalSizeZUnit. Floating point value > 0. PhysicalSizeZUnit True
PhysicalSizeZUnit The units of the physical size of a pixel. See OME enumeration of allowed values for the UnitsLength attribute -- default: microns (um) nan True
Pixels BigEndian Boolean (True/False) nan True
PlaneCount Number of Z-planes (not to be confused with downsampled "pyramid"). Integer >= 1 nan True
SizeC Number of channels. Integer >= 1 nan True
SizeT Number of time points. Integer >= 1 nan True
SizeX Size of image: X dimension (in pixels). Integer >= 1 nan True
SizeY Size of image: Y dimension (in pixels). Integer >= 1 nan True
SizeZ Size of image: Z dimension (in pixels). Integer >= 1 nan True
PixelType Data type for each pixel value. E.g. "uint16" nan True
Imaging Segmentation Data Type Specifies how the segmentation is stored nan True
Parameter file Path in Synapse to a text file listing algorithm version numbers and relevant parameters needed to reproduce the analysis nan False
Commit SHA Short SHA for software version [8 hexadecimal characters (for GitHub), comma separated if multiple] nan False
Imaging Object Class Defines the structure that the mask delineates nan True
Imaging Object Class Other Imaging Object Class Other Imaging Object Class Description False
Imaging Object Class Description Free text description of object class [string] nan True
Number of Objects The number of objects (e.g. cells) described nan True
Number of Features The number of features (e.g. channels) described nan True
Imaging Summary Statistic Function used to summarize object/feature intensity nan False
Nucleic Acid Source The source of the input nucleic molecule nan True
Micro-region Seq Platform The platform used for micro-regional RNA sequencing (if applicable) nan False
ROI Tag The tag or grouping used to identify the ROI in micro-regional RNA sequencing (if applicable). Must match the ROI tag within the count matrix in level 3. nan False
Single Cell Isolation Method The method by which cells are isolated into individual reaction containers at a single cell resolution (e.g. wells, micro-droplets) nan True
Dissociation Method The tissue dissociation method used for scRNASeq or scATAC-seq assays nan True
Library Layout Sequencing read type nan True
Nucleus Identifier Unique nuclei barcode; added at transposition step. Determines which nucleus the reads originated from nan True
Nuclei Barcode Nuclei Barcode nan False
scATACseq Library Layout Sequencing read type nan True
Nuclei Barcode Read Nuclei Barcode Read nan True
Nuclei Barcode Length Nuclei Barcode Length nan True
scATACseq Paired End A library layout type nan False
scATACseq Read1 Read 1 content description nan True
scATACseq Read2 Read 2 content description nan True
scATACseq Read3 Read 3 content description nan False
scmCseq Read1 Read 1 content description nan True
scmCseq Read2 Read 2 content description nan True
scmCseq Read3 Read 3 content description nan True
Threshold for Minimum Passing Reads Threshold for calling cells nan True
Total Number of Passing Nuclei Number of nuclei sequenced nan True
Median Fraction of Reads in Peaks Median fraction of reads in peaks (FRIP) Peaks Calling Software True
Median Fraction of Reads in Annotated cis DNA Elements Median fraction of reads in annotated cis-DNA elements (FRIADE) Peaks Calling Software True
Median Passing Read Percentage Non-PCR duplicate nuclear genomic sequence reads not aligning to unanchored contigs out of total reads assigned to the nucleus barcode nan True
Median Percentage of Mitochondrial Reads per Nucleus Contamination from mitochondrial sequences nan True
Peaks Calling Software Generic name of peaks calling tool nan True
Read Indicator Indicate if this is Read 1 (R1), Read 2 (R2), Index Reads (I1), or Other nan True
Read1 Read 1 content description nan True
Read2 Read 2 content description nan True
cDNA Complementary DNA. A DNA copy of an mRNA or complex sample of mRNAs, made using reverse transcriptase cDNA Offset, cDNA Length False
cDNA Offset Offset in sequence for cDNA read (in bp): number nan True
cDNA Length Length of cDNA read (in bp): number nan True
Cell Barcode and UMI Cell and transcript identifiers UMI Barcode Offset, UMI Barcode Length, Median UMIs per Cell Number, Cell Barcode Offset, Cell Barcode Length, Valid Barcodes Cell Number False
Cell Barcode Offset Offset in sequence for cell barcode read (in bp): number nan True
Cell Barcode Length Length of cell barcode read (in bp): number nan True
Valid Barcodes Cell Number Number nan True
UMI Barcode Offset Start position of UMI barcode in the sequence. Values: number, 0 for start of read nan True
UMI Barcode Length Length of UMI barcode read (in bp): number nan True
Median UMIs per Cell Number Number nan True
Cell Median Number Reads Median number of reads per cell. Number nan True
Cell Median Number Genes Median number of genes detected per cell. Number nan True
Cell Total Number of sequenced cells. Applies to raw counts matrix only. nan True
Library Construction Method Process which results in the creation of a library from fragments of DNA using cloning vectors or oligonucleotides with the role of adaptors [OBI_0000711] nan True
Input Cells and Nuclei Number of cells and number of nuclei input; entry format: number, number nan True
CEL-seq2 Highly-multiplexed plate-based single-cell RNA-Seq assay Empty Well Barcode, Well Index False
Empty Well Barcode Unique cell barcode assigned to empty cells used as controls in CEL-seq2 assays. nan True
Well Index Indicate if protein expression (EPCAM/CD45) positive/negative data is available for each cell in CEL-seq2 assays nan False
Library Preparation Days from Index Number of days between when the sample for the assay was received in the lab and when the libraries were prepared for sequencing [number]. If not applicable please enter 'Not Applicable' nan False
Single Cell Dissociation Days from Index Number of days between when the sample for the single cell assay was received in the lab and when the sample was dissociated and cells were isolated [number]. If not applicable please enter 'Not Applicable' nan True
Sequencing Library Construction Days from Index Number of days between when the sample for the assay was received in the lab and the day of sequencing library construction [number]. If not applicable please enter 'Not Applicable' nan True
Nucleic Acid Capture Days from Index Number of days between when the sample for the single cell assay was received in the lab and the day of the nucleic acid capture part of library construction [number]. If not applicable please enter 'Not Applicable' nan True
Cryopreserved Cells in Sample Indicate if library preparation was based on revived frozen cells. nan True
End Bias The end of the cDNA molecule that is preferentially sequenced, e.g. 3/5 prime tag/end or the full length transcript nan True
Reverse Transcription Primer An oligo to which new deoxyribonucleotides can be added by DNA polymerase [SO_0000112]. The type of primer used for reverse transcription, e.g. oligo-dT or random primer. This allows users to identify content of the cDNA library input e.g. enriched for mRNA nan True
Feature barcoding A method for adding extra channels of information to cells by running single-cell gene expression in parallel with other assays [https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/feature-bc] Feature Reference Id False
Feature Reference Id Unique ID for this feature. Must not contain whitespace, quote or comma characters. Each ID must be unique and must not collide with a gene identifier from the transcriptome [https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/feature-bc-analysis#feature-ref] nan True
Spike In A set of known synthetic RNA molecules with known sequence that are added to the cell lysis mix nan True
ERCC The External RNA Controls Consortium (ERCC) spike in set is commonly used in single-cell experiments for normalization Spike In Concentration False
Spike In Concentration The final concentration or dilution (for commercial sets) of the spike in mix [PMID:21816910] nan True
Sequencing Platform A platform is an object aggregate that is the set of instruments and software needed to perform a process [OBI_0000050]. Specific model of the sequencing instrument. nan True
Technical Replicate Group A common term for all files belonging to the same cell or library. Provide a numbering of each library prep batch (can differ from encapsulation and sequencing batch) nan False
Total Number of Input Cells Number of cells loaded/placed on plates nan True
Sequencing Batch ID Links samples to a specific local sequencer run. Can be string or 'null' nan True
Single Nucleus Buffer Nuclei isolation buffer nan True
Transposition Reaction Name of the transposase, transposon sequences nan True
Read Length The length of the sequencing reads. Can be integer, null nan True
Target Capture Kit Description that can uniquely identify a target capture kit. Suggested value is a combination of vendor, kit name, and kit version. nan True
Library Selection Method How RNA molecules are isolated. nan True
Library Preparation Kit Name Name of Library Preparation Kit. String nan True
Library Preparation Kit Vendor Vendor of Library Preparation Kit. String nan True
Library Preparation Kit Version Version of Library Preparation Kit. String nan True
Adapter Name Name of the sequencing adapter. String nan False
Adapter Sequence Base sequence of the sequencing adapter. String nan False
Base Caller Name Name of the base caller. String nan False
Base Caller Version Version of the base caller. String nan False
Flow Cell Barcode Flow cell barcode. Wrong or missing information may affect analysis results. String nan False
Fragment Maximum Length Maximum length of the sequenced fragments (e.g., as predicted by Agilent Bioanalyzer). Integer nan False
Fragment Mean Length Mean length of the sequenced fragments (e.g., as predicted by Agilent Bioanalyzer). Number nan False
Fragment Minimum Length Minimum length of the sequenced fragments (e.g., as predicted by Agilent Bioanalyzer). Integer nan False
Fragment Standard Deviation Length Standard deviation of the sequenced fragments length (e.g., as predicted by Agilent Bioanalyzer). Number nan False
Lane Number The basic machine unit for sequencing. For Illumina machines, this reflects the physical lane number. Wrong or missing information may affect analysis results. Integer nan False
Library Strand Library stranded-ness. nan False
Multiplex Barcode The barcode/index sequence used. Wrong or missing information may affect analysis results. String nan False
Size Selection Range Range of size selection. String nan False
Target Depth The targeted read depth prior to sequencing. Integer nan False
To Trim Adapter Sequence Does the user suggest adapter trimming? nan False
Yes - Trim Adapter Sequence Trim adapter sequence nan False
Adapter Trimmer Name Name of adapter trimmer nan False
Adapter Trimmer Version Version of the adapter trimmer nan False
Adapter Trimmer Options Options used by adapter trimmer nan False
Transcript Integrity Number Used to describe the quality of the starting material, esp. in regards to FFPE samples. Number nan False
RIN A numerical assessment of the integrity of RNA based on the entire electrophoretic trace of the RNA sample including the presence or absence of degradation products. Number nan False
DV200 Represents the percentage of RNA fragments that are >200 nucleotides in size. Number nan False
Adapter Content State classification given by FASTQC for the metric. Metric specific details about the states are available on their website. nan False
Basic Statistics State classification given by FASTQC for the metric. Metric specific details about the states are available on their website. nan False
Encoding Version of ASCII encoding of quality values found in the file. String nan False
Kmer Content State classification given by FASTQC for the metric. Metric specific details about the states are available on their website. nan False
Overrepresented Sequences State classification given by FASTQC for the metric. Metric specific details about the states are available on their website. nan False
Per Base N Content State classification given by FASTQC for the metric. Metric specific details about the states are available on their website. nan False
Per Base Sequence Content State classification given by FASTQC for the metric. Metric specific details about the states are available on their website. nan False
Per Base Sequence Quality State classification given by FASTQC for the metric. Metric specific details about the states are available on their website. nan False
Per Sequence GC Content State classification given by FASTQC for the metric. Metric specific details about the states are available on their website. nan False
Per Sequence Quality Score State classification given by FASTQC for the metric. Metric specific details about the states are available on their website. nan False
Per Tile Sequence Quality State classification given by FASTQC for the metric. Metric specific details about the states are available on their website. nan False
Percent GC Content The overall %GC of all bases in all sequences. Integer nan False
Sequence Duplication Levels State classification given by FASTQC for the metric. Metric specific details about the states are available on their website. nan False
Sequence Length Distribution State classification given by FASTQC for the metric. Metric specific details about the states are available on their website. nan False
Total Reads Total number of reads per sample. Integer nan False
Whitelist Cell Barcode File Link Link to file listing all possible cell barcodes. URL nan True
Cell Barcode Tag SAM tag for cell barcode field; please provide a valid cell barcode tag (e.g. CB:Z) nan True
UMI Tag SAM tag for the UMI field; please provide a valid UMI tag (e.g. UB:Z or UR:Z) nan True
Applied Hard Trimming Was Hard Trimming applied nan True
Yes - Applied Hard Trimming Hard Trimming was applied Aligned Read Length False
Aligned Read Length Read length used for alignment if hard trimming was applied nan True
scRNAseq Workflow Type Generic name for the workflow used to analyze a data set. nan True
Workflow Version Major version of the workflow (e.g. Cell Ranger v3.1) nan True
scRNAseq Workflow Parameters Description Parameters used to run the workflow. scRNA-seq level 3: e.g. Normalization and log transformation, ran empty drops or doublet detection, used filter on # genes/cell, etc. scRNA-seq Level 4: dimensionality reduction with PCA and 50 components, nearest-neighbor graph with k = 20 and Leiden clustering with resolution = 1, UMAP visualization using 50 PCA components, marker genes used to annotate cell types, information about droplet matrix (all barcodes) to cell matrix (only informative barcodes representing real cells) conversion nan True
scATACseq Workflow Type Generic name for the workflow used to analyze a data set. nan True
scATACseq Workflow Parameters Description Parameters used to run the scATAC-seq workflow. nan True
Workflow Link Link to workflow or command. DockStore.org recommended. URL nan True
QC Workflow Type Generic name for the workflow used to analyze a data set. String nan False
QC Workflow Version Major version for a workflow. String nan False
QC Workflow Link Link to workflow used. String nan False
Germline Variants Workflow URL Link to workflow document, e.g. Github, DockStore.org recommended nan True
Germline Variants Workflow Type Generic name for the workflow used to analyze a data set nan False
Other Germline Variants Workflow Type Other Germline Variants Workflow Type Custom Germline Variants Workflow Type False
Custom Germline Variants Workflow Type Specify the name of a custom germline variants workflow nan True
Somatic Variants Workflow URL Link to the workflow used for somatic variant analysis. URL nan True
Somatic Variants Workflow Type Generic name for the workflow used to analyze a data set. nan False
Other Somatic Variants Workflow Type Other Somatic Variants Workflow Type Custom Somatic Variants Workflow Type False
Custom Somatic Variants Workflow Type Specify the name of a custom workflow nan True
Somatic Variants Sample Type Is the sample case or control in somatic variant analysis nan True
Structural Variant Workflow URL Link to workflow document. DockStore.org recommended. URL nan True
Structural Variant Workflow Type Generic name for the workflow used to analyze a data set. nan False
Other Structural Variant Workflow Type Other Structural Variant Workflow Type Custom Structural Variant Workflow Type False
Custom Structural Variant Workflow Type Specify the name of a custom workflow nan True
Alignment Workflow Url Link to workflow used for read alignment. DockStore.org recommended. String nan True
Alignment Workflow Type Generic name for the workflow used to analyze a data set. nan True
Other Alignment Workflow Other Alignment Workflow Custom Alignment Workflow False
Custom Alignment Workflow Specify the name of a custom alignment workflow nan True
MSI Workflow Link Link to method workflow (or command) used in estimating the MSI. URL nan False
MSI Score Numeric score denoting the aligned reads file's MSI score from MSIsensor. Number nan False
MSI Status MSIsensor determination of either microsatellite stability or instability. nan False
Genomic Reference Exact version of the human genome reference used in the alignment of reads (e.g. GCF_000001405.39) nan True
Genomic Reference URL Link to human genome sequence (e.g. ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_34/GRCh38.primary_assembly.genome.fa.gz) nan True
Genome Annotation URL Link to the human genome annotation (GTF) file (e.g. ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_34/gencode.v34.annotation.gtf.gz) nan True
Index File Name The name (or part of a name) of a file (of any type). String nan True
Average Base Quality Average base quality collected from samtools. Number nan False
Average Insert Size Average insert size collected from samtools. Integer nan False
Average Read Length Average read length collected from samtools. Integer nan False
Contamination Fraction of reads coming from cross-sample contamination collected from GATK4. Number nan False
Contamination Error Estimation error of cross-sample contamination collected from GATK4. Number nan False
Mean Coverage Mean coverage for whole genome sequencing, or mean target coverage for whole exome and targeted sequencing, collected from Picard. Number nan False
Pairs On Diff CHR Pairs on different chromosomes collected from samtools. Integer nan False
Total Uniquely Mapped Number of reads that map to genome. Integer nan False
Total Unmapped reads Number of reads that did not map to genome. Integer nan False
Proportion Reads Duplicated Proportion of duplicated reads collected from samtools. Number nan False
Proportion Reads Mapped Proportion of mapped reads collected from samtools. Number nan False
Proportion Targets No Coverage Proportion of targets that did not reach 1X coverage over any base from Picard Tools. Number nan False
Proportion Base Mismatch Proportion of mismatched bases collected from samtools. Number nan False
Proportion Coverage 10x Proportion of all reference bases for whole genome sequencing, or targeted bases for whole exome and targeted sequencing, that achieves 10X or greater coverage from Picard Tools. nan False
Proportion Mitochondrial Reads Proportion of reads mapping to mitochondria. nan False
Proportion Coverage 30X Proportion of all reference bases for whole genome sequencing, or targeted bases for whole exome and targeted sequencing, that achieves 30X or greater coverage from Picard Tools. nan False
Short Reads Number of reads that were too short. Integer nan False
Pseudo Alignment Used Pseudo aligners such as Kallisto or Salmon do not produce aligned reads BAM files. True indicates pseudoalignment was used. nan True
Software and Version Name of software used to generate expression values. String nan True
Yes - Pseudo Alignment Used Pseudo aligner was used Workflow Link, Software and Version, Genomic Reference, Genomic Reference URL False
Data Category Specific content type of the data file. nan True
Expression Units How quantities are corrected for gene length nan True
Fusion Gene Detected Was a fusion gene identified? nan False
Yes - Fusion Gene Detected A fusion gene was detected Fusion Gene Identity False
Fusion Gene Identity The gene symbols of fused genes. nan False
Other Fusion Gene Other fusion gene detected. Specify Other Fusion Gene False
Specify Other Fusion Gene Specify fusion gene detected, if not in list nan False
Matrix Type Type of data stored in matrix. nan True
Linked Matrices All matrices associated with every part of a SingleCellExperiment object. Comma-delimited list of filenames nan False
Biospecimen Type Biospecimen Type nan True
Analyte Biospecimen Type A molecular derivative (e.g. RNA, DNA, Protein Lysate) obtained from a specimen Analyte Type, Fixation Duration, Slide Charge Type, Section Thickness Value, Sectioning Days from Index, Shipping Condition Type, Ischemic Time, Ischemic Temperature False
Tissue Biospecimen Type Tissue biospecimen Ischemic Time, Ischemic Temperature, Site of Resection or Biopsy, Specimen Laterality, Portion Weight, Total Volume, Tumor Tissue Type, Histologic Morphology Code, Preservation Method, Biospecimen Dimension 1, Biospecimen Dimension 2, Biospecimen Dimension 3, Section Number in Sequence False
Bone Marrow Biospecimen Type Bone Marrow biospecimen Ischemic Time, Ischemic Temperature, Site of Resection or Biopsy, Specimen Laterality, Portion Weight, Total Volume, Tumor Tissue Type, Histologic Morphology Code, Preservation Method, Biospecimen Dimension 1, Biospecimen Dimension 2, Biospecimen Dimension 3, Section Number in Sequence False
Urine Biospecimen Type Urine biospecimen Ischemic Time, Ischemic Temperature, Site of Resection or Biopsy, Specimen Laterality, Portion Weight, Total Volume, Tumor Tissue Type, Histologic Morphology Code, Preservation Method False
Blood Biospecimen Type Blood biospecimen Shipping Condition Type False
Timepoint Label Label to identify the time point at which the clinical data or biospecimen was obtained (e.g. Baseline, End of Treatment, Overall survival, Final). NO PHI/PII INFORMATION IS ALLOWED. nan True
Collection Days from Index Number of days from the research participant's index date that the biospecimen was obtained. If not applicable please enter 'Not Applicable' nan True
Protocol Link Protocols.io ID or DOI link to a free/open protocol resource describing in detail the assay protocol (e.g. surface markers used in Smart-seq, dissociation duration, lot/batch numbers for key reagents such as primers, sequencing reagent kits, etc.) or the protocol by which the sample was obtained or generated. nan True
Adjacent Biospecimen IDs List of HTAN Identifiers (separated by commas) of adjacent biospecimens cut from the same sample; for example HTA3_3000_3, HTA3_3000_4, ... nan False
Mounting Medium The solution in which the specimen is embedded, generally under a cover glass. It may be liquid, gum or resinous, soluble in water, alcohol or other solvents and be sealed from the external atmosphere by non-soluble ringing media nan False
Analyte Type The kind of molecular specimen analyte: a molecular derivative (e.g. RNA, DNA, Protein Lysate) obtained from a specimen nan True
Acquisition Method Type Records the method of acquisition or source for the specimen under consideration. nan True
Other Acquisition Method A custom acquisition method Acquisition Method Other Specify False
Acquisition Method Other Specify A custom acquisition method [Text - max length 100 characters] nan True
Preservation Method Text term that represents the method used to preserve the sample. nan True
Fixative Type Text term to identify the type of fixative used to preserve a tissue specimen nan True
Fixation Duration The length of time, from beginning to end, required to process or preserve biospecimens in fixative (measured in minutes) nan True
Ischemic Time Duration of time, in seconds, between when the specimen stopped receiving oxygen and when it was preserved or processed. Integer value. nan False
Ischemic Temperature Specify whether specimen experienced warm or cold ischemia. nan False
Collection Media Material into which the specimen is collected after the procedure nan False
Specimen Laterality For tumors in paired organs, designates the side on which the specimen was obtained. nan True
Portion Weight Numeric value that represents the sample portion weight, measured in milligrams. nan False
Total Volume Numeric value for the total amount of sample or specimen Total Volume Unit False
Total Volume Unit Unit of measurement used for the total amount of sample or specimen nan False
Tumor Tissue Type Text that describes the kind of disease present in the tumor specimen as related to a specific timepoint (add rows to select multiple values along with timepoints) nan True
Histologic Morphology Code The microscopic anatomy of normal and abnormal cells and tissues of the specimen as captured in the morphology codes of the International Classification of Diseases for Oncology, 3rd Edition (ICD-O-3). Example - 8010/0 nan True
Preinvasive Morphology Histologic Morphology not included in ICD-O-3 morphology codes, for preinvasive lesions included in the HTAN nan False
Slide Charge Type A description of the charge on the glass slide. nan True
Section Thickness Value Numeric value to describe the thickness of a slice of tissue taken from a biospecimen, measured in microns (um). nan True
Sectioning Days from Index Number of days from the research participant's index date that the biospecimen was sectioned after collection. If not applicable please enter 'Not Applicable' nan True
Storage Method The method by which a biomaterial was stored after preservation or before another protocol was used. nan True
Processing Days from Index Number of days from the research participant's index date that the biospecimen was processed. If not applicable please enter 'Not Applicable' nan True
Shipping Condition Type Text descriptor of the shipping environment of a biospecimen. nan True
Site Data Source Text to identify the data source for the specimen/sample from within the HTAN center, if applicable. Any identifier used within the center to identify data sources. No PHI/PII is allowed. nan False
Processing Location Site within an HTAN center where specimen processing occurs, if applicable. Any identifier used within the center to identify processing location. No PHI/PII is allowed. nan False
Histology Assessment By Text term describing who (in what role) made the histological assessments of the sample nan False
Histology Assessment Medium The method of assessment used to characterize histology nan False
Tumor Infiltrating Lymphocytes Measure of Tumor-Infiltrating Lymphocytes [Number] nan False
Degree of Dysplasia Information related to the presence of cells that look abnormal under a microscope but are not cancer. Records the degree of dysplasia for the cyst or lesion under consideration. nan False
Dysplasia Fraction Resulting value to represent the number of pieces of dysplasia divided by the total number of pieces. [Text: max length 5] nan False
Number Proliferating Cells Numeric value that represents the count of proliferating cells determined during pathologic review of the sample slide(s). nan False
Percent Eosinophil Infiltration Numeric value to represent the percentage of infiltration by eosinophils in a tumor sample or specimen. nan False
Percent Granulocyte Infiltration Numeric value to represent the percentage of infiltration by granulocytes in a tumor sample or specimen. nan False
Percent Inflam Infiltration Numeric value to represent local response to cellular injury, marked by capillary dilatation, edema and leukocyte infiltration; clinically, inflammation is manifest by redness, heat, pain, swelling and loss of function, with the need to heal damaged tissue. nan False
Percent Lymphocyte Infiltration Numeric value to represent the percentage of infiltration by lymphocytes in a solid tissue normal sample or specimen. nan False
Percent Monocyte Infiltration Numeric value to represent the percentage of monocyte infiltration in a sample or specimen. nan False
Percent Necrosis Numeric value to represent the percentage of cell death in a malignant tumor sample or specimen. nan False
Percent Neutrophil Infiltration Numeric value to represent the percentage of infiltration by neutrophils in a tumor sample or specimen. nan False
Percent Normal Cells Numeric value to represent the percentage of normal cell content in a malignant tumor sample or specimen. nan False
Percent Stromal Cells Numeric value to represent the percentage of reactive cells that are present in a malignant tumor sample or specimen but are not malignant such as fibroblasts, vascular structures, etc. nan False
Percent Tumor Cells Numeric value that represents the percentage of infiltration by tumor cells in a sample. nan False
Percent Tumor Nuclei Numeric value to represent the percentage of tumor nuclei in a malignant neoplasm sample or specimen. nan False
Fiducial Marker Imaging specific: fiducial markers for the alignment of images taken across multiple rounds of imaging. nan False
Slicing Method Imaging specific: the method by which the tissue was sliced. nan False
Lysis Buffer scRNA-seq specific: Type of lysis buffer used nan False
Method of Nucleic Acid Isolation Bulk RNA & DNA-seq specific: method used for nucleic acid isolation. E.g. Qiagen Allprep, Qiagen miRNAeasy. [Text - max length 100] nan False
Biospecimen Dimension 1 First dimension of tissue fragment (number, up to one decimal place) measured in units as defined by the "dimensions_unit" CDE Dimensions Unit False
Biospecimen Dimension 2 Second dimension of tissue fragment (number, up to one decimal place) measured in units as defined by the "dimensions_unit" CDE nan False
Biospecimen Dimension 3 Third dimension of tissue fragment (number, up to one decimal place) measured in units as defined by the "dimensions_unit" CDE nan False
Dimensions Unit Unit of measurement used for dimension CDEs in metric system (e.g. cm, mm) nan False
Section Number in Sequence Numeric value (integer, including ranges) provided to a sample in a series of sections (list all adjacent sections in the Adjacent Biospecimen IDs field) nan False
Start Days from Index Number of days from the date of birth (index date) to the date of an event (e.g. exposure to environmental factor, treatment start, etc.). If not applicable please enter 'Not Applicable' nan True
Stop Days from Index Number of days from the date of birth (index date) to the end date of the event (e.g. exposure to environmental factor, treatment start, etc.). Note: if the event occurs at a single time point, e.g. a diagnosis or a lab test, the value for this column is 'Not Applicable' nan False
Ethnicity An individual's self-described social and cultural grouping, specifically whether an individual describes themselves as Hispanic or Latino. The provided values are based on the categories defined by the U.S. Office of Management and Budget and used by the U.S. Census Bureau. nan True
Gender Text designations that identify gender. Gender is described as the assemblage of properties that distinguish people on the basis of their societal roles. [Identification of gender is based upon self-report and may come from a form, questionnaire, interview, etc.] nan True
Race An arbitrary classification of a taxonomic group that is a division of a species. It usually arises as a consequence of geographical isolation within a species and is characterized by shared heredity, physical attributes and behavior, and in the case of humans, by common history, nationality, or geographic distribution. nan True
Vital Status The survival state of the person registered on the protocol. nan True
Dead This indicates the participant is dead and defines further required metadata Year of Death, Cause of Death, Cause of Death Source, Days to Death False
Days to Birth Number of days between the date used for index and the date from a person's date of birth represented as a calculated negative number of days. If not applicable please enter 'Not Applicable' nan False
Year of Death Numeric value to represent the year of the death of an individual. nan True
Country of Residence Country of Residence at enrollment nan False
Age Is Obfuscated The age of the patient has been modified for compliance reasons. The actual age differs from what is reported. Other date intervals for this patient may also be modified. nan False
Year Of Birth Numeric value to represent the calendar year in which an individual was born. nan False
Cause of Death The cause of death nan True
Cause of Death Source The text term used to describe the source used to determine the patient's cause of death. nan False
Days to Death Number of days between the date used for index and the date from a person's date of death represented as a calculated number of days. If not applicable please enter 'Not Applicable' nan False
Occupation Duration Years The number of years a patient worked in a specific occupation. nan False
Premature At Birth The yes/no/unknown indicator used to describe whether the patient was premature (less than 37 weeks gestation) at birth. nan False
Weeks Gestation at Birth Numeric value used to describe the number of weeks starting from the approximate date of the biological mother's last menstrual period and ending with the birth of the patient. nan False
Education Level Highest level of education that the patient completed (direct patient-derived information) nan False
Country of Birth Country where the patient was born. nan False
Medically Underserved Area Areas or populations designated by HRSA as having too few primary care providers, high infant mortality, high poverty or a high elderly population: Use patient zip code to find the county the patient lives in by going to https://www.unitedstateszipcodes.org/ - enter the zip code in the main text field and use the associated county on the right side of the result field. Go to data.hrsa.gov website and select "Query Data". Pick the Medically Underserved Areas/Populations (MUA/P) data source in the step 1 menu and select "View Data". Enter the name of the county (_ county) in the first "Service Area" column, adding the state in the 5th column may help direct you to the data. If the designation type in the third column is "medically underserved area" enter "Yes" as the value. If the county generates a "No data available in table" enter "No" as the value. A value of "Unknown" indicates that sufficient data was not available to look up the value. If value is yes, complete the Medically_underserved_score data element. nan False
Medically Underserved Area - Yes Patient's zip code is in a medically underserved area Medically Underserved Score False
Medically Underserved Area - No Patient's zip code is not in a medically underserved area nan False
Medically Underserved Area - Unknown Insufficient data to look up the Medically Underserved Area value nan False
Medically Underserved Score Index of Medical Underservice (IMU) score, a number between 0 and 100, where 0 represents completely underserved and 100 represents best served or least underserved. Use patient zip code to find the county the patient lives in by going to https://www.unitedstateszipcodes.org/. Enter the zip code in the main text field and use the associated county on the right side of the result field. Go to data.hrsa.gov website and select Query Data. Pick the Medically Underserved Areas/Populations (MUA/P) data source in the step 1 menu and select View Data. Enter the name of the county (______ county) in the first "Service Area" column, adding the state in the 5th column may help direct you to the data. Enter the Index of Medical Underservice Score in the fourth column to one decimal place as the value. nan False
Rural vs Urban Density of population in the county of residence, based on census data (updated last on 4/28/20). Use patient zip code to find the county the patient lives in by going to https://www.unitedstateszipcodes.org/. Enter the zip code in the main text field and use the associated county on the right side of the result field. Go to https://www2.census.gov/programs-surveys/acs/data/covid_19/Data_Profiles_for_HHS/050-County_By_State/. Select the dp02_XX.csv file where XX = the two letter abbreviation for the appropriate state. On row 166 find the total population for the appropriate county. If the total population is <2,500 enter a value of "Rural Population"; if 2,500 - 50,000 enter a value of "Urban Cluster"; or if >50,000 enter "Urban Population" nan False
Cancer Incidence Incidence of specific cancer type in a defined area (a number between 0 and 100). The rate of incident cases per population of 100,000 persons of a specific type of cancer as designated in the "primary_diagnosis" data element in the county where the patient resides, using the most recent 2013-2017 NCI Cancer Atlas derived data. Use patient zip code to find the county the patient lives in by going to https://www.unitedstateszipcodes.org/. Enter the zip code in the main text field and use the associated county on the right side of the result field. On the https://gis.cancer.gov/canceratlas/tableview/ website, choose "Incidence" from the Topic dropdown menu, state of interest from the Area menu, "All Races" from the Race menu, and the cancer type ("Both Sexes" when possible) from the Statistic menu. Find the county of interest and enter the numeric Age-Adjusted Rate per 100,000 as the value. nan False
Cancer Incidence Location The county and state in which the patient lives and to which the cancer_incidence data correlates. Record as "County, State" as they appear in the incidence box from which the cancer_incidence data is obtained in the https://gis.cancer.gov/canceratlas/tableview/ website nan False
Relationship Gender The text term used to describe the gender of the patient's relative with a history of cancer. nan False
Relationship Age at Diagnosis The age (in years) when the patient's relative was first diagnosed. nan False
Relationship Primary Diagnosis The text term used to describe the malignant diagnosis of the patient's relative with a history of cancer. nan False
Relationship Type The subgroup that describes the state of connectedness between members of the unit of society organized around kinship ties. nan False
Relative with Cancer History The yes/no/unknown indicator used to describe whether any of the patient's relatives have a history of cancer. nan False
Relatives with Cancer History Count The number of relatives the patient has with a known history of cancer. nan False
Yes - Cancer History Relative Individual has a relative with cancer history Relatives with Cancer History Count, Relationship Type, Relationship Primary Diagnosis, Relationship Gender, Relationship Age at Diagnosis False
Smoking Exposure Indicate if individual has smoking exposure nan True
Yes - Smoking Exposure Individual has been exposed to smoke; requires additional metadata Years Smoked, Pack Years Smoked, Cigarettes per Day, Smoking Frequency, Type of Smoke Exposure, Time between Waking and First Smoke, Tobacco Smoking Onset Year, Tobacco Smoking Quit Year, Tobacco Smoking Status, Type of Tobacco Used, Secondhand Smoke as Child, Smoke Exposure Duration, Tobacco Use per Day, Smokeless Tobacco Quit Age False
Pack Years Smoked Numeric computed value to represent lifetime tobacco exposure defined as number of cigarettes smoked per day x number of years smoked divided by 20. nan True
Years Smoked Numeric value (or unknown) to represent the number of years a person has been smoking. nan True
Alcohol Exposure Indicate if individual has alcohol exposure nan True
Yes - Alcohol Exposure Individual has been exposed to alcohol Alcohol Days Per Week, Alcohol Drinks Per Day, Alcohol History, Alcohol Intensity, Alcohol Type False
Alcohol Days Per Week Numeric value used to describe the average number of days each week that a person consumes an alcoholic beverage. nan False
Alcohol Drinks Per Day Numeric value used to describe the average number of alcoholic beverages a person consumes per day. nan False
Alcohol History A response to a question that asks whether the participant has consumed at least 12 drinks of any kind of alcoholic beverage in their lifetime. nan False
Alcohol Intensity Category to describe the patient's current level of alcohol use as self-reported by the patient. nan False
Alcohol Type Type of alcohol use nan False
Asbestos Exposure The yes/no/unknown indicator used to describe whether the patient was exposed to asbestos. nan False
Cigarettes per Day The average number of cigarettes smoked per day. nan False
Coal Dust Exposure The yes/no/unknown indicator used to describe whether a patient was exposed to fine powder derived by the crushing of coal. nan False
Environmental Tobacco Smoke Exposure The yes/no/unknown indicator used to describe whether a patient was exposed to smoke that is emitted from burning tobacco, including cigarettes, pipes, and cigars. This includes tobacco smoke exhaled by smokers. nan False
Radon Exposure The yes/no/unknown indicator used to describe whether the patient was exposed to radon. nan False
Respirable Crystalline Silica Exposure The yes/no/unknown indicator used to describe whether a patient was exposed to respirable crystalline silica, a widespread, naturally occurring, crystalline metal oxide that consists of different forms including quartz, cristobalite, tridymite, tripoli, ganister, chert and novaculite. nan False
Smoking Frequency The text term used to generally describe how often the patient smokes. nan False
Secondhand Smoke as Child The text term used to indicate whether the patient was exposed to secondhand smoke as a child. nan False
Smoke Exposure Duration Text term used to describe the length of time the patient was exposed to an environmental factor. nan False
Type of Smoke Exposure The text term used to describe the patient's specific type of smoke exposure. nan False
Marijuana smoke Marijuana smoke exposure Marijuana Use Per Week False
Marijuana Use Per Week Numeric value that represents the number of times the patient uses marijuana each week. nan False
Tobacco Use per Day Numeric value that represents the number of times the patient uses tobacco each day. nan False
Smokeless Tobacco Quit Age Smokeless tobacco quit age nan False
Time between Waking and First Smoke The text term used to describe the approximate amount of time elapsed between the time the patient wakes up in the morning to the time they smoke their first cigarette. nan False
Tobacco Smoking Onset Year The year in which the participant began smoking. nan False
Tobacco Smoking Quit Year The year in which the participant quit smoking. nan False
Tobacco Smoking Status Category describing current smoking status and smoking history as self-reported by a patient nan False
Type of Tobacco Used The text term used to describe the specific type of tobacco used by the patient. nan False
Days to Follow Up Number of days between the date used for index and the date of the patient's last follow-up appointment or contact. If not applicable please enter 'Not Applicable' nan True
Adverse Event Text that represents the Common Terminology Criteria for Adverse Events low level term name for an adverse event. nan False
BMI A calculated numerical quantity that represents an individual's weight to height ratio. nan False
Cause of Response The text term used to describe the suspected cause or reason for the patient disease response. nan False
Comorbidity The text term used to describe a comorbidity disease, which coexists with the patient's malignant disease. nan False
Comorbidity Method of Diagnosis The text term used to describe the method used to diagnose the patient's comorbidity disease. nan False
Days to Adverse Event Number of days between the date used for index and the date of the patient's adverse event. If not applicable please enter 'Not Applicable' nan False
Days to Comorbidity Number of days between the date used for index and the date the patient was diagnosed with a comorbidity. If not applicable please enter 'Not Applicable' nan False
Days to Progression Number of days between the date used for index and the date the patient's disease progressed. If not applicable please enter 'Not Applicable' nan False
Days to Progression Free Number of days between the date used for index and the date the patient's disease was formally confirmed as progression-free. If not applicable please enter 'Not Applicable' nan False
Days to Recurrence Number of days between the date used for index and the date the patient's disease recurred. If not applicable please enter 'Not Applicable' nan True
Diabetes Treatment Type Text term used to describe the types of treatment used to manage diabetes. nan False
Disease Response Code assigned to describe the patient's response or outcome to the disease. nan False
DLCO Ref Predictive Percent The value, as a percentage of predicted lung volume, measuring the amount of carbon monoxide detected in a patient's lungs. nan False
ECOG Performance Status The ECOG functional performance status of the patient/participant. nan False
FEV1 FVC Post Bronch Percent Percentage value to represent result of Forced Expiratory Volume in 1 second (FEV1) divided by the Forced Vital Capacity (FVC) post-bronchodilator. nan False
FEV 1 FVC Pre Bronch Percent Percentage value to represent result of Forced Expiratory Volume in 1 second (FEV1) divided by the Forced Vital Capacity (FVC) pre-bronchodilator. nan False
FEV1 Ref Post Bronch Percent The percentage comparison to a normal value reference range of the volume of air that a patient can forcibly exhale from the lungs in one second post-bronchodilator. nan False
FEV1 Ref Pre Bronch Percent The percentage comparison to a normal value reference range of the volume of air that a patient can forcibly exhale from the lungs in one second pre-bronchodilator. nan False
Height The height of the patient in centimeters. nan False
Hepatitis Sustained Virological Response The yes/no/unknown indicator used to describe whether the patient achieved a sustained virological response to hepatitis treatment. nan False
HPV Positive Type Text classification to represent the strain or type of human papillomavirus identified in an individual. nan False
Karnofsky Performance Status Text term used to describe the classification of the functional capabilities of a person. nan False
Menopause Status Text term used to describe the patient's menopause status. nan False
Adverse Event Grade The text term used to describe the grade, or severity, of the patient's adverse event. nan False
AIDS Risk Factors The text term used to describe a risk factor of the acquired immunodeficiency syndrome (AIDS) that the patient either had at the time of the study or experienced in the past. nan False
Body Surface Area Numeric value used to represent the 2-dimensional extent of the body surface relating height to weight. nan False
CD4 Count The text term used to describe the outcome of the procedure to determine the amount of the CD4 expressing cells in a sample. nan False
CDC HIV Risk Factors The text term used to describe a risk factor for human immunodeficiency virus, as described by the Centers for Disease Control and Prevention. nan False
Days to Imaging Number of days between the date used for index and the date the imaging or scan was performed on the patient. If not applicable please enter 'Not Applicable' nan False
Evidence of Recurrence Type The text term used to describe the type of evidence used to determine whether the patient's disease recurred nan False
HAART Treatment Indicator The text term used to indicate whether the patient received Highly Active Antiretroviral Therapy (HAART). nan False
HIV Viral Load Numeric value that represents the concentration of an analyte or aliquot extracted from the sample or sample portion, measured in milligrams per milliliter. nan False
Hormonal Contraceptive Use The text term used to indicate whether the patient used hormonal contraceptives. nan False
Hysterectomy Margins Involved The text term used to indicate whether the patient's disease was determined to be involved based on the surgical margins of the hysterectomy. nan False
Hysterectomy Type The text term used to describe the type of hysterectomy the patient had. nan False
Imaging Result The text term used to describe the result of the imaging or scan performed on the patient. nan False
Imaging Type The text term used to describe the type of imaging or scan performed on the patient. nan False
Immunosuppressive Treatment Type The text term used to describe the type of immunosuppressive treatment the patient received. nan False
Nadir CD4 Count Numeric value that represents the lowest point to which the CD4 count has dropped (nadir). nan False
Pregnancy Outcome The text term used to describe the type of pregnancy the patient had nan False
Recist Targeted Regions Number Numeric value that represents the number of baseline target lesions, as described by the Response Evaluation Criteria in Solid Tumours (RECIST) criteria nan False
Recist Targeted Regions Sum Numeric value that represents the sum of baseline target lesions, as described by the Response Evaluation Criteria in Solid Tumours (RECIST) criteria. nan False
Scan Tracer Used The text term used to describe the type of tracer used during the imaging or scan of the patient. nan False
Progression or Recurrence Yes/No/unknown indicator to identify whether a patient has had a new tumor event after initial treatment. nan True
Yes - Progression or Recurrence The patient has had a new tumor event after initial treatment Progression or Recurrence Type, Days to Progression, Days to Progression Free, Days to Recurrence, Progression or Recurrence Anatomic Site False
Progression or Recurrence Anatomic Site The text term used to describe the anatomic site of resection; biopsy; tissue or organ of biospecimen origin; progression or recurrent disease; treatment nan False
Treatment Anatomic Site The text term used to describe the anatomic site of resection; biopsy; tissue or organ of biospecimen origin; progression or recurrent disease; treatment nan False
NCI Atlas Cancer Site The primary tumor site used to calculate the incidence rate using the NCI Cancer Atlas, a digital atlas which provides geographical data related to cancer utilizing the Surveillance, Epidemiology, and End Results (SEER) Program cancer incidence rates for 2013 to 2017 nan False
Progression or Recurrence Type The text term used to describe the type of progressive or recurrent disease or relapsed disease. nan False
Reflux Treatment Type Text term used to describe the types of treatment used to manage gastroesophageal reflux disease (GERD). nan False
Risk Factor The text term used to describe a risk factor the patient had at the time of or prior to their diagnosis. nan False
Risk Factor Treatment The yes/no/unknown indicator used to describe whether the patient received treatment for a risk factor the patient had at the time of or prior to their diagnosis. nan False
Viral Hepatitis Serologies Text term that describes the kind of serological laboratory test used to determine the patient's hepatitis status. nan False
Weight The weight of the patient measured in kilograms. nan False
Days to Treatment End Number of days between the date used for index and the date the treatment ended. If not applicable please enter 'Not Applicable' nan False
Days to Treatment Start Number of days between the date used for index and the date the treatment started. If not applicable please enter 'Not Applicable' nan False
Initial Disease Status The text term used to describe the status of the patient's malignancy when the treatment began. nan False
Regimen or Line of Therapy The text term used to describe the regimen or line of therapy. nan False
Therapeutic Agents Text identification of the individual agent(s) used as part of a treatment regimen. nan False
Treatment Effect The text term used to describe the pathologic effect a treatment(s) had on the tumor. nan False
Treatment Intent Type Text term to identify the reason for the administration of a treatment regimen. [Manually-curated] nan False
Treatment or Therapy A yes/no/unknown/not applicable indicator related to the administration of therapeutic agents received. nan False
Treatment Outcome Text term that describes the patient's final outcome after the treatment was administered. nan False
Treatment Type Text term that describes the kind of treatment administered. nan False
Chemo Concurrent to Radiation The text term used to describe whether the patient was receiving chemotherapy concurrent to radiation. nan False
Number of Cycles The numeric value used to describe the number of cycles of a specific treatment or regimen the patient received. nan False
Reason Treatment Ended The text term used to describe the reason a specific treatment or regimen ended. nan False
Treatment Arm Text term used to describe the treatment arm assigned to a patient at the time eligibility is determined. nan False
Treatment Dose The numeric value used to describe the dose of an agent the patient received. nan False
Treatment Dose Units The text term used to describe the dose units of an agent the patient received. nan False
Treatment Effect Indicator The text term used to indicate whether the treatment had an effect on the patient. nan False
Treatment Frequency The text term used to describe the frequency the patient received an agent or regimen. nan False
Sentinel Lymph Node Count Numeric count of sentinel lymph nodes. nan False
Sentinel Node Positive Assessment Count The number or amount of metastatic neoplasms related to the confirmed presence of disease or specific microorganisms during examination of the first rounded mass of lymphatic tissue to which cancer is likely to spread from the primary tumor. nan False
Tumor Extranodal Extension Indicator The indicator to determine extranodal involvement or extent of the disease. nan False
Satellite Metastasis Present Indicator A yes/no indicator to ask if intransit metastases or satellite lesions are present. nan False
Other Biopsy Resection Site A description of the location on or within the human body where the surgical biopsy/resection procedure was performed (Not covered under HTAN Clinical Data Tier 1) nan False
Extent of Tumor Resection The degree to which the lesion has been cut out, or resected. nan False
Precancerous Condition Type The classification of pre-cancerous cells found in a specific collection of data being studied by the Consortium for Molecular and Cellular Characterization of Screen-Detected Lesions (MCL). nan False
Prior Sites of Radiation The anatomic location to which radiation treatment was administered to a patient prior to enrollment on a protocol. nan False
Immunosuppression The indicator that describes whether or not immunosuppressive therapy was administered. nan False
Concomitant Medication Received Type An enumerated list of the type of concomitant medication received by the patient. nan False
Family Member Vital Status Indicator The response indicates whether the family member of the patient with a history of cancer is alive. (Extension to GDC attributes in Family History Tier 1) nan False
COVID19 Occurrence Indicator The indicator that describes whether or not a COVID-19 infectious disorder occurred. nan False
COVID19 Current Status The patient's current COVID-19 status of sign or symptom events or interventions nan False
COVID19 Positive Lab Test Indicator The indicator that describes whether or not there was a COVID-19 positive test result. nan False
COVID19 Antibody Testing Text term that demonstrates the test results of immunoglobulin M (IgM) and immunoglobulin G (IgG) antibodies to the SARS-CoV-2 virus in subject serum samples. nan False
COVID19 Complications Severity Text term that retrospectively indicates the worst complications during COVID-19 infectious disorder in the patient. nan False
COVID19 Cancer Treatment Followup Indicator that describes if cancer treatment was modified for the patient due to COVID-19 infectious disorder nan False
Ecig vape use Use of non-traditional cigarette nicotine delivery device (electronic cigarette, ENDS - electronic nicotine delivery system) nan False
Ecig vape 30 day use num Number of days e-cigarettes or vaping device was used in the last 30 days nan False
Ecig vape times per day e-cig frequency of use (times per day—one “time” consists of around 15 puffs or lasts around 10 minutes) nan False
Type of smoke exposure cumulative years The number of cumulative years of the patient's specific type of smoke exposure nan False
Chewing tobacco daily use count The quantity of daily use of tobacco, in the form of a plug, usually flavored, for chewing rather than smoking. nan False
Second hand smoke exposure years The number of cumulative years of the patient's exposure to second-hand cigarette smoke nan False
Known Genetic Predisposition Mutation A yes/no/unknown indicator to identify whether there is a known genetic predisposition mutation present in the patient. nan False
Hereditary Cancer Predisposition Syndrome History of presence of inherited genetic predisposition syndrome that confers heightened susceptibility to cancer in the patient. nan False
Cancer Associated Gene Mutations Type of inherited germline or other gene mutations that confers heightened susceptibility to cancer identified in patient history nan False
Mutational Signatures Mutational signatures identified in the patient, includes signatures linked to selected exogenous carcinogens, endogenous and enzymatic modification of DNA or defective DNA repair. Note: Include only outputs of tests that were completed clinically for the participant and only include data from a diagnostic array that was completed before research sequencing was done. nan False
Mismatch Repair System Status The text that best describes the condition or state of MMR (mismatch repair system) in the patient nan False
Lab Tests for MMR Status Laboratory tests used to evaluate the status of mismatch repair pathways nan False
Mode of Cancer Detection Text term used to describe the mode of cancer detection, like standard of care screening or random detection nan False
Gene Symbol The text term used to describe a gene targeted or included in molecular analysis. For rearrangements, this should be used to represent the reference gene. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from a diagnostic array that was completed before research sequencing was done. Do not include data related to research assay outputs here. nan True
Molecular Analysis Method The text term used to describe the method used for molecular analysis. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from a diagnostic array that was completed before research sequencing was done. Do not include data related to research assay outputs here. nan True
Test Result The text term used to describe the result of the molecular test. If the test result was a numeric value see test_value. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from a diagnostic array that was completed before research sequencing was done. Do not include data related to research assay outputs here. nan True
AA Change Alphanumeric value used to describe the amino acid change for a specific genetic variant. Example: R116Q. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from a diagnostic array that was completed before research sequencing was done. Do not include data related to research assay outputs here. nan False
Antigen The text term used to describe an antigen included in molecular testing. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from a diagnostic array that was completed before research sequencing was done. Do not include data related to research assay outputs here. nan False
Clinical Biospecimen Type The text term used to describe the biological material used for testing, diagnostic, treatment or research purposes. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from a diagnostic array that was completed before research sequencing was done. Do not include data related to research assay outputs here. nan False
Blood Test Normal Range Upper Numeric value used to describe the upper limit of the normal range used to describe a healthy individual at the institution where the test was completed. nan False
Blood Test Normal Range Lower Numeric value used to describe the lower limit of the normal range used to describe a healthy individual at the institution where the test was completed. nan False
Cell Count Numeric value used to describe the number of cells used for molecular testing. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from a diagnostic array that was completed before research sequencing was done. Do not include data related to research assay outputs here. nan False
Chromosome The text term used to describe a chromosome targeted or included in molecular testing. If a specific genetic variant is being reported, this property can be used to capture the chromosome where that variant is located. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from a diagnostic array that was completed before research sequencing was done. Do not include data related to research assay outputs here. nan False
Clonality The text term used to describe whether a genomic variant is related by descent from a single progenitor cell. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from a diagnostic array that was completed before research sequencing was done. Do not include data related to research assay outputs here. nan False
Copy Number Numeric value used to describe the number of times a section of the genome is repeated or copied within an insertion, duplication or deletion variant. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from a diagnostic array that was completed before research sequencing was done. Do not include data related to research assay outputs here. nan False
Cytoband Alphanumeric value used to describe the cytoband or chromosomal location targeted or included in molecular analysis. If a specific genetic variant is being reported, this property can be used to capture the cytoband where the variant is located. Format: [chromosome][chromosome arm].[band+sub-bands]. Example: 17p13.1. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from a diagnostic array that was completed before research sequencing was done. Do not include data related to research assay outputs here. nan False
Exon Exon number targeted or included in a molecular analysis. If a specific genetic variant is being reported, this property can be used to capture the exon where that variant is located. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from a diagnostic array that was completed before research sequencing was done. Do not include data related to research assay outputs here. nan False
Histone Family The text term used to describe the family, or classification of a group of basic proteins found in chromatin, called histones. nan False
Histone Variant The text term used to describe specific histone variants, which are proteins that substitute for the core canonical histones. nan False
Intron Intron number targeted or included in molecular analysis. If a specific genetic variant is being reported, this property can be used to capture the intron where that variant is located. nan False
Laboratory Test The text term used to describe the medical testing used to diagnose, treat or further understand a patient's disease. nan False
Loci Abnormal Count Numeric value used to describe the number of loci determined to be abnormal. nan False
Loci Count Numeric value used to describe the number of loci tested. nan False
Locus Alphanumeric value used to describe the locus of a specific genetic variant. Example: NM_001126114. nan False
Mismatch Repair Mutation The yes/no/unknown indicator used to describe whether the mutation included in molecular testing was known to have an effect on the mismatch repair process. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from diagnostic array that was completed prior to research sequencing was done. Do not include data related to research assay outputs here. nan False
Molecular Consequence The text term used to describe the molecular consequence of genetic variation. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from diagnostic array that was completed prior to research sequencing was done. Do not include data related to research assay outputs here. nan False
Pathogenicity The text used to describe a variant's level of involvement in the cause of the patient's disease according to the standards outlined by the American College of Medical Genetics and Genomics (ACMG). nan False
Ploidy Text term used to describe the number of sets of homologous chromosomes. nan False
Second Exon The second exon number involved in molecular variation. If a specific genetic variant is being reported, this property can be used to capture the second exon where that variant is located. This property is typically used for a translocation where two different locations are involved in the variation. nan False
Second Gene Symbol The text term used to describe a secondary gene targeted or included in molecular analysis. For rearrangements, this should represent the location of the variant. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from diagnostic array that was completed prior to research sequencing was done. Do not include data related to research assay outputs here. nan False
Specialized Molecular Test Text term used to describe a specific test that is not covered in the list of molecular analysis methods. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from diagnostic array that was completed prior to research sequencing was done. Do not include data related to research assay outputs here. nan False
Test Analyte Type The text term used to describe the type of analyte used for molecular testing. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from diagnostic array that was completed prior to research sequencing was done. Do not include data related to research assay outputs here. nan False
Test Units The text term used to describe the units of the test value for a molecular test. This property is used in conjunction with test_value. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from diagnostic array that was completed prior to research sequencing was done. Do not include data related to research assay outputs here. nan False
Test Value The text term or numeric value used to describe a specific result of a molecular test. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from diagnostic array that was completed prior to research sequencing was done. Do not include data related to research assay outputs here. nan False
Transcript Alphanumeric value used to describe the transcript of a specific genetic variant. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from diagnostic array that was completed prior to research sequencing was done. Do not include data related to research assay outputs here. nan False
Variant Origin The text term used to describe the biological origin of a specific genetic variant. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from diagnostic array that was completed prior to research sequencing was done. Do not include data related to research assay outputs here. nan False
Variant Type The text term used to describe the type of genetic variation. nan False
Zygosity The text term used to describe the zygosity of a specific genetic variant. nan False
Cog Neuroblastoma Risk Group Text term that represents the categorization of patients on the basis of prognostic factors per a system developed by Children's Oncology Group (COG). Risk level is used to assign treatment intensity. nan False
Cog Rhabdomyosarcoma Risk Group Text term used to describe the classification of rhabdomyosarcoma, as defined by the Children's Oncology Group (COG). nan False
Gleason Grade Group The text term used to describe the overall grouping of grades defined by the Gleason grading classification, which is used to determine the aggressiveness of prostate cancer. Note that this grade describes the entire prostatectomy specimen and is not specific to the sample used for sequencing. nan False
Gleason Grade Tertiary The text term used to describe the tertiary pattern as described by the Gleason Grading System. nan False
Gleason Patterns Percent Numeric value that represents the percentage of Patterns 4 and 5, which is used when the Gleason score is greater than 7 to predict prognosis. nan False
Greatest Tumor Dimension Numeric value that represents the measurement of the widest portion of the tumor in centimeters. nan False
IGCCCG Stage The text term used to describe the International Germ Cell Cancer Collaborative Group (IGCCCG), a grouping used to further classify metastatic testicular tumors. nan False
INPC Grade Text term used to describe the classification of neuroblastic differentiation within neuroblastoma tumors, as defined by the International Neuroblastoma Pathology Classification (INPC). nan False
INPC Histologic Group The text term used to describe the classification of neuroblastomas distinguishing between favorable and unfavorable histologic groups. The histologic score, defined by the International Neuroblastoma Pathology Classification (INPC), is based on age, mitosis-karyorrhexis index (MKI), stromal content and degree of tumor cell differentiation. nan False
INRG Stage The text term used to describe the staging classification of neuroblastic tumors, as defined by the International Neuroblastoma Risk Group (INRG). nan False
INSS Stage Text term used to describe the staging classification of neuroblastic tumors, as defined by the International Neuroblastoma Staging System (INSS). nan False
International Prognostic Index The text term used to describe the International Prognostic Index, which classifies the prognosis of patients with aggressive non-Hodgkin's lymphoma. nan False
IRS Group Text term used to describe the classification of rhabdomyosarcoma tumors, as defined by the Intergroup Rhabdomyosarcoma Study (IRS). nan False
IRS Stage The text term used to describe the classification of rhabdomyosarcoma tumors, as defined by the Intergroup Rhabdomyosarcoma Study (IRS). nan False
ISS Stage The multiple myeloma disease stage at diagnosis. nan False
Lymph Node Involved Site The text term used to describe the anatomic site of lymph node involvement. nan False
Margin Distance Numeric value (in centimeters) that represents the distance between the tumor and the surgical margin. nan False
Margins Involved Site The text term used to describe the anatomic sites that were involved in the surgical margins. nan False
Medulloblastoma Molecular Classification The text term used to describe the classification of medulloblastoma tumors based on molecular features. nan False
Micropapillary Features The yes/no/unknown indicator used to describe whether micropapillary features were determined to be present. nan False
Mitosis Karyorrhexis Index Text term that represents the component of the International Neuroblastoma Pathology Classification (INPC) for mitosis-karyorrhexis index (MKI). nan False
Non Nodal Regional Disease The text term used to describe whether the patient had non-nodal regional disease. nan False
Non Nodal Tumor Deposits The yes/no/unknown indicator used to describe the presence of tumor deposits in the pericolic or perirectal fat or in adjacent mesentery away from the leading edge of the tumor. nan False
Ovarian Specimen Status The text term used to describe the physical condition of the involved ovary. nan False
Ovarian Surface Involvement The text term that describes whether the surface tissue (outer boundary) of the ovary shows evidence of involvement or presence of cancer. nan False
Pregnant at Diagnosis The text term used to indicate whether the patient was pregnant at the time they were diagnosed. nan False
Primary Gleason Grade The text term used to describe the primary Gleason score, which describes the pattern of cells making up the largest area of the tumor. The primary and secondary Gleason pattern grades are combined to determine the patient's Gleason grade group, which is used to determine the aggressiveness of prostate cancer. Note that this grade describes the entire prostatectomy specimen and is not specific to the sample used for sequencing. nan False
Secondary Gleason Grade The text term used to describe the secondary Gleason score, which describes the pattern of cells making up the second largest area of the tumor. The primary and secondary Gleason pattern grades are combined to determine the patient's Gleason grade group, which is used to determine the aggressiveness of prostate cancer. Note that this grade describes the entire prostatectomy specimen and is not specific to the sample used for sequencing. nan False
Supratentorial Localization Text term to specify the location of the supratentorial tumor. nan False
Tumor Depth Numeric value that represents the depth of tumor invasion, measured in millimeters (mm). nan False
WHO CNS Grade WHO CNS Grade nan False
WHO NTE Grade WHO NTE Grade nan False
Age at Diagnosis Age at the time of diagnosis expressed in number of days since birth. nan True
Days to Last Follow up Time interval from the date of last follow up to the date of initial pathologic diagnosis, represented as a calculated number of days. If not applicable please enter 'Not Applicable' nan True
Days to Last Known Disease Status Time interval from the date of last follow up to the date of initial pathologic diagnosis, represented as a calculated number of days. If not applicable please enter 'Not Applicable' nan True
Last Known Disease Status Text term that describes the last known state or condition of an individual's neoplasm. nan True
Primary Diagnosis Text term used to describe the patient's histologic diagnosis, as described by the World Health Organization's (WHO) International Classification of Diseases for Oncology (ICD-O). nan True
Prior Malignancy The yes/no/unknown indicator used to describe the patient's history of prior cancer diagnosis. nan False
Prior Treatment A yes/no/unknown/not applicable indicator related to the administration of therapeutic agents received before the body specimen was collected. nan False
Site of Resection or Biopsy The text term used to describe the anatomic site of the resection or biopsy of the patient's malignant disease, as described by the World Health Organization's (WHO) International Classification of Diseases for Oncology (ICD-O). nan True
Tissue or Organ of Origin The text term used to describe the anatomic site of origin, of the patient's malignant disease, as described by the World Health Organization's (WHO) International Classification of Diseases for Oncology (ICD-O). nan True
Tumor Grade Numeric value to express the degree of abnormality of cancer cells, a measure of differentiation and aggressiveness. nan False
AJCC Clinical M Extent of the distant metastasis for the cancer based on evidence obtained from clinical assessment parameters determined prior to treatment. nan False
AJCC Clinical N Extent of the regional lymph node involvement for the cancer based on evidence obtained from clinical assessment parameters determined prior to treatment. nan False
AJCC Clinical Stage Stage group determined from clinical information on the tumor (T), regional node (N) and metastases (M) and by grouping cases with similar prognosis for cancer. nan False
AJCC Clinical T Extent of the primary cancer based on evidence obtained from clinical assessment parameters determined prior to treatment. nan False
AJCC Pathologic M Code to represent the defined absence or presence of distant spread or metastases (M) to locations via vascular channels or lymphatics beyond the regional lymph nodes, using criteria established by the American Joint Committee on Cancer (AJCC). nan False
AJCC Pathologic N The codes that represent the stage of cancer based on the nodes present (N stage) according to criteria based on multiple editions of the AJCC's Cancer Staging Manual. nan False
AJCC Pathologic Stage The extent of a cancer, especially whether the disease has spread from the original site to other parts of the body based on AJCC staging criteria. nan False
AJCC Pathologic T Code of pathological T (primary tumor) to define the size or contiguous extension of the primary tumor (T), using staging criteria from the American Joint Committee on Cancer (AJCC). nan False
AJCC Staging System Edition The text term used to describe the version or edition of the American Joint Committee on Cancer Staging Handbooks, a publication by the group formed for the purpose of developing a system of staging for cancer that is acceptable to the American medical profession and is compatible with other accepted classifications. nan False
Anaplasia Present Yes/no/unknown/Not Reported indicator used to describe whether anaplasia was present at the time of diagnosis. nan False
Yes - Anaplasia Present Indicates anaplasia is present Anaplasia Present Type False
Anaplasia Present Type The text term used to describe the morphologic findings indicating the presence of a malignant cellular infiltrate characterized by the presence of large pleomorphic cells, necrosis, and high mitotic activity in a tissue sample. nan False
Best Overall Response The best improvement achieved throughout the entire course of protocol treatment. nan False
Breslow Thickness The number that describes the distance, in millimeters, between the upper layer of the epidermis and the deepest point of tumor penetration. nan False
Classification of Tumor Text that describes the kind of disease present in the tumor specimen as related to a specific timepoint. nan False
Days to Diagnosis Number of days between the date used for index and the date the patient was diagnosed with the malignant disease. If not applicable please enter 'Not Applicable' nan False
First Symptom Prior to Diagnosis Text term used to describe the patient's first symptom experienced prior to diagnosis and thought to be related to the disease. nan False
Gross Tumor Weight Numeric value used to describe the gross pathologic tumor weight, measured in grams. nan False
Laterality For tumors in paired organs, designates the side on which the cancer originates. nan False
Lymph Nodes Positive The number of lymph nodes involved with disease as determined by pathologic examination. nan False
Lymph Nodes Tested The number of lymph nodes tested to determine whether lymph nodes were involved with disease as determined by a pathologic examination. nan False
Lymphatic Invasion Present A yes/no indicator to ask if small or thin-walled vessel invasion is present, indicating lymphatic involvement nan False
Metastasis at Diagnosis The text term used to describe the extent of metastatic disease present at diagnosis. nan False
Metastasis at Diagnosis Site Text term to identify an anatomic site in which metastatic disease involvement is found. nan False
Method of Diagnosis Text term used to describe the method used to confirm the patient's malignant diagnosis. nan False
Mitotic Count The number of mitoses identified under the microscope in tumors. The method of counting varies, according to the specific tumor examined. Usually, the mitotic count is determined based on the number of mitoses per high power field (40X) or 10 high power fields. nan False
Percent Tumor Invasion The percentage of tumor cells spread locally in a malignant neoplasm through infiltration or destruction of adjacent tissue. nan False
Peritoneal Fluid Cytological Status The text term used to describe the malignant status of the peritoneal fluid determined by cytologic testing. nan False
Perineural Invasion Present A yes/no indicator to ask if perineural invasion or infiltration of tumor or cancer is present. nan False
Residual Disease Text terms to describe the status of a tissue margin following surgical resection. nan False
Synchronous Malignancy A yes/no/unknown indicator used to describe whether the patient had an additional malignant diagnosis at the same time the tumor used for sequencing was diagnosed. If both tumors were sequenced, both tumors would have synchronous malignancies. nan False
Tumor Confined to Organ of Origin The yes/no/unknown indicator used to describe whether the tumor is confined to the organ where it originated and did not spread to a proximal or distant location within the body. nan False
Tumor Focality The text term used to describe whether the patient's disease originated in a single location or multiple locations. nan False
Tumor Largest Dimension Diameter Numeric value used to describe the maximum diameter or dimension of the primary tumor, measured in centimeters. nan False
Vascular Invasion Present The yes/no indicator to ask if large vessel or venous invasion was detected by surgery or presence in a tumor specimen. nan False
Yes - Vascular Invasion Present Indicates venous invasion was detected by surgery or presence in a tumor specimen Vascular Invasion Type False
Vascular Invasion Type Text term that represents the type of vascular tumor invasion. nan False
Year of Diagnosis Numeric value to represent the year of an individual's initial pathologic diagnosis of cancer. nan False
Morphology The third edition of the International Classification of Diseases for Oncology, published in 2000 used principally in tumor and cancer registries for coding the site (topography) and the histology (morphology) of neoplasms. The study of the structure of the cells and their arrangement to constitute tissues and, finally, the association among these to form organs. In pathology, the microscopic process of identifying normal and abnormal morphologic characteristics in tissues, by employing various cytochemical and immunocytochemical stains. A system of numbered categories for representation of data. nan True
Topography Code Topography Code, indicating site within the body, based on ICD-O-3. nan False
Additional Topography Topography not included in the ICD-O-3 Topography codes. nan False
Lung Cancer Detection Method Type The means, manner of procedure, or systematic course of actions performed in order to discover or identify lung cancer nan False
Lung Cancer Participant Procedure History Text name of a surgical or operative procedure used in a natural history protocol of a lung cancer participant. nan False
Lung Adjacent Histology Type The type of morphologic characteristics observed by microscope in the tissue next to a benign or malignant tissue growth nan False
Lung Tumor Location Anatomic Site Anatomic location of the tumor inside the lung nan False
Lung Tumor Lobe Bronchial Location Anatomic lobe and bronchial location of the tumor inside the lung nan False
Current Lung Cancer Symptoms Reported lung cancer related symptoms person is currently experiencing nan False
Lung Topography Lung PCA specific topography (not covered in previous tiers) nan False
Lung Cancer Harboring Genomic Aberrations Genomic aberrations in participants with lung cancer (specific lung cancer associated gene mutations not covered in Tiers 1 and 2) nan False
Colorectal Cancer Detection Method Type The means, manner of procedure, or systematic course of actions performed in order to discover or identify colorectal cancer nan False
History of Prior Colon Polyps Yes/No indicator to describe if the subject had a previous history of colon polyps as noted in the history/physical or previous endoscopic report(s). nan False
Family Colon Cancer History Indicator The indicator to designate if any first degree relative has a history of colorectal cancer. nan False
Family Medical History Colorectal Polyp Diagnosis A yes/no/unknown/not applicable indicator related to family medical history diagnosis of polypoid lesion that arises from the colon or rectum and protrudes into the lumen. nan False
Immediate Family History Endometrial Cancer Text that describes the age at which the family member was diagnosed with endometrial or uterine cancer in relationship to their 50th birthday. nan False
Immediate Family History Ovarian Cancer Text that describes the age at which the family member was diagnosed with ovarian cancer in relationship to their 50th birthday. nan False
Patient Inflammatory Bowel Disease Personal Medica History The indicator for patient's personal medical history of inflammatory bowel disease (chronic, non-specific disorders of unknown etiology, including Crohn disease and ulcerative colitis). nan False
Patient Colonoscopy Performed Indicator The yes/no indicator that records if the subject has undergone a previous colonoscopy. nan False
Colorectal Cancer Tumor Border Configuration The description of the border configuration of a colorectal tumor at pathologic assessment. nan False
MLH1 Promoter Methylation Status Text term to define the status of promoter methylation for the MLH1 gene. Note: MLH1 gene is commonly associated with hereditary nonpolyposis colorectal cancer. Testing for methylation of the MLH1 promoter can help distinguish sporadic from inherited cancers. nan False
Colorectal Cancer KRAS Indicator The yes/no/not applicable indicator that describes if patient has diagnosis of colorectal cancer with known KRAS. nan False
Colon Polyp Occurence Indicator Yes/No indicator to describe if the subject had a previous history of colon polyps as noted in the history/physical or previous endoscopic report(s). nan False
Family History Colorectal Polyp A yes/no/unknown/not applicable indicator related to family medical history diagnosis of polypoid lesion that arises from the colon or rectum and protrudes into the lumen. nan False
Colorectal Polyp New Indicator A yes/no response to a question that asks whether any new polyps greater than or equal to two millimeters were identified. nan False
Colorectal Polyp Shape Shape of polyp identified in the participant nan False
Size of Polyp Removed Size of the polyp removed in cm nan False
Colorectal Polyp Count The total number of polyps detected nan False
Colorectal Polyp Type Type of polyp found in the participant nan False
Colorectal Polyp Adenoma Type Type of adenoma associated with the polyp nan False
Breast Carcinoma Detection Method Type The means, manner of procedure, or systematic course of actions performed in order to discover or identify breast cancer. nan False
Breast Carcinoma Histology Category Classification of the type of invasive breast carcinoma diagnosed based on histologic attributes. nan False
Invasive Lobular Breast Carcinoma Histologic Category The histologic subtype for an infiltrating lobular carcinoma of the breast. nan False
Invasive Ductal Breast Carcinoma Histologic Category The histologic subtype for the most common type of invasive breast carcinoma. nan False
Breast Biopsy Procedure Finding Type Text term to describe the result of the examination of the breast tissue specimen or fluid as related to the presence and nature of disease. nan False
Breast Quadrant Site The breast quadrant or structure from which the breast tissue specimen was removed for microscopic examination. nan False
Breast Cancer Assessment Tests Text term to identify assessment tests done in participants during diagnosis nan False
Breast Cancer Genomic Test Performed Text term that represents the name of the genomic test performed for breast cancer. nan False
Mammaprint Risk Group Text term that represents the risk group for breast cancer as determined by assessment of the MammaPrint test. nan False
Oncotype Risk Group Text term that represents the risk group for breast cancer as determined by assessment of the Oncotype recurrence score. nan False
Breast Carcinoma Estrogen Receptor Status Text term to represent the overall result of Estrogen Receptor (ER) testing in a participant with breast cancer nan False
Breast Carcinoma Progesteroner Receptor Status Text term to represent the overall result of Progesterone Receptor (PR) testing in a participant with breast cancer nan False
Breast Cancer Allred Estrogen Receptor Score The numeric Allred score, that is cell staining percentage plus intensity, to determine estrogen receptor status. nan False
Prior Invasive Breast Disease Text term to indicate prior invasive breast condition in the participant nan False
Breast Carcinoma ER Status Percentage Value A numerical quantity measured or assigned or computed which captures the estrogen receptor level measured in a participant with breast cancer nan False
Breast Carcinoma PR Status Percentage Value A numerical quantity measured or assigned or computed which captures the progesterone receptor level measured in a participant with breast cancer nan False
HER2 Breast Carcinoma Copy Number Total Result of HER2 Copy Number testing (in a participant with breast cancer), expressed as a range of values. nan False
Breast Carcinoma Centromere 17 Copy Number Result of Centromere 17 testing in a sample or specimen of metastatic breast carcinoma, expressed as a range of values. nan False
Breast Carcinoma HER2 Centromere17 Copynumber Total Number of Cells Counted for HER2 & Centromere 17 Copy Numbers in a participant with breast cancer nan False
Breast Carcinoma HER2 Chromosome17 Ratio HER2 chromosome 17 ratio in participants with breast cancer nan False
Breast Carcinoma Surgical Procedure Name Text name of a surgical procedure performed for a person with a diagnosis of breast cancer nan False
Breast Carcinoma HER2 Ratio Diagnosis HER2 ratio of the participant at diagnosis nan False
Breast Carcinoma HER2 Status Text term to signify the result of the medical procedure that involves testing a sample of blood or tissue for HER2 in a participant with breast cancer nan False
Hormone Therapy Breast Cancer Prevention Indicator Did the patient receive hormonal therapy for prevention of breast cancer? nan False
Breast Carcinoma ER Staining Intensity Text term to indicate the ER staining intensity on pathology assessment in a participant with breast cancer nan False
Breast Carcinoma PR Staining Intensity Text term to indicate the PR staining intensity on pathology assessment in a participant with breast cancer nan False
Oncotype Score OncotypeDX recurrence score nan False
Breast Imaging Performed Type The kind of technology or method performed for screening, diagnosis, surgical procedures or therapy that aids in the visualization of the breast(s). nan False
Multifocal Breast Carcinoma Present Indicator A response to indicate if there is breast cancer characterized by the presence of multiple cancerous tumors that originate from the same clone and usually located in the same quadrant of the breast. nan False
Multicentric Breast Carcinoma Present Indicator A response to indicate if there is breast cancer characterized by the presence of multiple cancerous tumors that originate from different clones and usually located in different quadrants of the breast. nan False
BIRADS Mammography Breast Density Category The category that describes the relative amount of different tissues present in the breast on a mammogram based on the updated 2015 edition of the American College of Radiology's Breast Imaging Reporting and Data System (BI-RADS) reporting guidelines. nan False
CNS Tumor Primary Anatomic Site Primary tumor location within the tissues of the central nervous system (brain and spinal cord), not covered in Tiers 1 and 2 nan False
Glioma Specific Metastasis Sites Evidence of active brain metastasis including leptomeningeal involvement nan False
Glioma Specific Radiation Field A description of the location on or within the CNS where radiation was administered in a participant with glioma nan False
Supra Tentorial Ependymoma Molecular Subgroup Text term to identify the molecular subgroup in a supra tentorial ependymoma nan False
Infra Tentorial Ependymoma Molecular Subgroup Text term to identify the molecular subgroup in an infra tentorial ependymoma nan False
Neuroblastoma MYCN Gene Amplification Status Neuroblastoma MYCN amplification or over-expression status nan False
Specimen Blast Count Percentage Value The value, in percent (%), of the medical procedure that involves testing a sample of blood for blast cells, immature (undifferentiated) cells, during diagnosis nan False
NCI ALL Risk Group The NCI risk group assigned to a patient at initial diagnosis with Acute Lymphoblastic Leukemia. nan False
MRD ALL Diagnostic Sensitivity The assay sensitivity results of a diagnostic assessment of Minimal Residual Disease in patients diagnosed with Acute Lymphoblastic Leukemia. nan False
CNS Leukemia Status The status of central nervous system leukemia at the time of diagnosis. nan False
Ovarian Cancer Histologic Subtype Text term to describe the histological subtype of ovarian cancer in the participant nan False
Ovarian Cancer Surgical Outcome Text term that describes the kind of surgical treatment administered. nan False
Ovarian Cancer Platinum Status Text term to indicate the status of treatment with platinum in participant with ovarian cancer nan False
Location Extent Extraprostatic Extension Location and extent of extraprostatic extension nan False
Location Nature Positive Margins Location and nature of positive margins nan False
Seminal Vesicle Invasion An anatomic position identifying a side of the body where local spread of malignant neoplasm is found to infiltrate tissue in the saclike glandular diverticulum on the ductus deferens in a male. nan False
Prostate Carcinoma Histologic Type The diagnostic subclassification of an invasive prostate carcinoma. nan False
Prostate Cancer Local Extent The response used to categorize the local extent of disease for prostate cancer. nan False
Additonal Findings Uninvolved Prostate Additional findings, uninvolved prostate nan False
Prostate Cancer Cytologic Morphologic Subtypes Text term that describes various morphological and cytological subtypes in prostate tumors. nan False
Sarcoma Subtype The subtype related to the scientific determination and investigation, analysis and recognition of the presence and nature of disease, condition, or injury from expressed signs and symptoms of tissue growth resulting from uncontrolled cell proliferation. nan False
Sarcoma Diagnosis Classification Category High level grouping to describe a diagnostic grouping or category for sarcoma, a malignant mesenchymal cell tumor most commonly arising from muscle, fat, fibrous tissue, bone, cartilage, and blood vessels. nan False
Sarcoma Tumor Extension Type The field to indicate the organs and structures to which the tumor has become adherent or has invaded. nan False
Pancreas Precancer Histopathologic Grade The grade of precancerous pancreatic tissue based on microscopic study of characteristic tissue abnormalities by employing various cytochemical and immunocytochemical stains. nan False
Pancreatic IPMN Pathology Epithelial Subtype The Intraductal Papillary Mucinous Neoplasm (IPMN) epithelial cell subtype based on the gross and microscopic examination of a pancreatic neoplasm specimen nan False
Pancreatic Duct Final Pathology Type The final pathology result of the pancreatic duct communication type. nan False
Cutaneous Melanoma Tumor Infiltrating Lymphocytes Description of degree of lymphocytic infiltration surrounding and disrupting tumor cells of the vertical growth phase in a cutaneous melanoma. nan False
Cutaneous Melanoma Tumor Regression Range Description of the degree to which tumor cells are replaced by lymphocytic inflammation with or without dermal melanophages and fibrosis. Range: the difference between the lowest and highest numerical values. nan False
Melanoma Specimen Clark Level Value Definition of the Clark level or depth of involvement of a melanoma in the skin or a specimen. nan False
Cutaneous Melanoma Surgical Margins Text term to indicate presence of tumor at resection margins nan False
Melanoma Lesion Size Diameter of lesion determined on skin examination (pre-bx), in mm nan False
History of Atypical Nevi Patient has a history of atypical nevi nan False
Fitzpatrick Skin Tone The Fitzpatrick classification of skin phototype nan False
History of Chronic UV Exposure History of chronic UV exposure nan False
History of Blistering Sunburn Patient has history of blistering sunburn nan False
History of Tanning Bed Use History of tanning bed use of the patient nan False
Immediate Family History Melanoma Text that describes the age at which the family member was diagnosed with melanoma skin cancer in relationship to their 50th birthday. nan False
Melanoma Biopsy Resection Sites Biopsy resection sites specific to melanoma (not covered in Tiers 1 and 2) nan False
Cutaneous Melanoma Ulceration Description of extent of disruption to the surface of the skin caused by the cutaneous melanoma. nan False
Cutaneous Melanoma Additional Findings Significant pathologic finding present in addition to the cutaneous melanoma. nan False
HTAN RPPA Antibody Table ID HTAN identifier associated with RPPA antibody level metadata. Identical for every row of the table. nan True
Ab Name Reported on Dataset The antibody name. nan True
GENCODE Gene Symbol Target The comma separated list of gene symbols targeted by the antibody. nan True
UNIPROT Protein ID Target The comma separated list of UNIPROT IDs targeted by the antibody. nan True
Phosphoprotein Flag A flag that denotes whether an antibody targets a phosphoprotein. nan True
Internal Ab ID Internal lab ID for an antibody. nan True
Species Host animal. nan True
RPPA Dilution The dilution ratio. nan False
Phospho Site The protein site for a phosphoprotein-targeting antibody. Report AA and site (e.g. S442) Phosphoprotein Flag False
RPPA Validation Status Valid = RPPA and WB correlation > 0.7; Use with Caution = RPPA and WB correlation < 0.7; Under Evaluation = Antibody has given mixed results and/or evaluated by another lab; We are in the process of (re)validating; Used for QC = These antibodies are used for tissue sample quality control (QC) nan False
Antibody Notes Notes on antibody replacements and antibody recognition observations. nan False
Pre-processing Completed Pre-processing steps completed to convert level 1 raw data to a single level 2 image nan True
Pre-processing Required Pre-processing steps required to convert level 1 raw data to a single level 2 image nan True
Publication An empty parent attribute for publications nan False
Publication Manifest Publication specific attributes. Component,Publication-associated HTAN Parent Data File ID, HTAN Grant ID, HTAN Center ID, Publication Content Type, DOI, Title, Authors, Corresponding Author, Corresponding Author ORCID, Year of Publication, Location of Publication, Publication Abstract, License, PMID, Publication contains HTAN ID, Data Type, Tool, Supporting Link, Supporting Link Description False
Publication-associated HTAN Parent Data File ID HTAN Data File Identifier(s) of the files associated with the content presented/published. Should be comma-separated lists. nan True
HTAN Grant ID HTAN grant number(s) (i.e. CA------ format) associated with the content presented/published. nan True
HTAN Center ID List of HTAN Center ID(s) associated with the content presented/published. nan True
Publication Content Type The type of content presented or published. nan True
DOI The digital object identifier (DOI) of the content in the form of https://www.doi.org/{doi} to comply with CrossRef DOI display guidelines. nan True
Corresponding Author The name(s) of the corresponding author(s) of the content presented/published. If more than one corresponding author, please list in the order they appear in the author list. nan True
Corresponding Author ORCID The ORCiD(s) of the corresponding author(s) of the content presented/published. Should be a valid ORCiD url starting with https://orcid.org/ followed by a 16 digit identifier in dash separated groups of 4 (for example https://orcid.org/0000-0002-1825-0097). If more than one corresponding author, please list ORCiDs in the order the authors appear in the author list. nan True
Title The title of the content presented or published. nan True
Authors The names of the author(s) of the content presented/published, in the order they appear. nan True
Year of Publication The year the content was presented or published (format YYYY). nan True
Location of Publication The name of the preprint server, journal, or conference where the content was presented/published. nan True
Publication Abstract The abstract or short description of the content presented/published. nan True
License The type of license applicable to the content. nan False
PMID The PubMed identifier associated with the publication (applicable to published manuscripts). Provide as a URL of the form https://pubmed.ncbi.nlm.nih.gov/{pmid} nan False
Data Type Types of data associated with the content. Fill out Other Data Type Specified, if not on the list. nan True
Other Data Type Specified Other types of data associated with the content. nan False
Supporting Link Relevant external links associated with the content (e.g. external datasets used for validation). Please note: Supporting Links and Supporting Link Descriptions are provided by authors and are not verified by the NIH NCI or the HTAN DCC. This information and any linked data should only be shared by an authorized individual(s) in accordance with the terms of the HTAN data sharing agreements and policies and/or any other applicable agreement(s). Validated as URL nan False
Supporting Link Description Description of relevant external links associated with the publication (e.g. an external mouse dataset used for validation). Please note: Supporting Links and Supporting Link Descriptions are provided by authors and are not verified by the NIH NCI or the HTAN DCC. This information and any linked data should only be shared by an authorized individual(s) in accordance with the terms of the HTAN data sharing agreements and policies and/or any other applicable agreement(s). nan False
Tool Were any software or computational tools generated for this content? nan True
Accessory Data Type Accessory-specific data type nan False
Accessory An empty parent attribute for accessory nan False
Accessory Manifest Accessory specific attributes Component,Dataset Name,Accessory Synapse ID,Accessory Description, Accessory Data Type,HTAN Center ID,HTAN Parent Biospecimen ID,Accessory-associated HTAN Parent Data File ID False
Dataset Name Name of a dataset (e.g. a Synapse folder) nan True
Accessory Synapse ID Synapse ID of folder containing accessory files nan True
Accessory Description Free text field containing description of accessory file(s) nan True
Accessory-associated HTAN Parent Data File ID HTAN Data File Identifier(s) of the files associated with the accessory content. Should be comma-separated lists. nan False
MapQ30 Number of reads with Quality >= 30. nan False
scATAC-seq Object ID Orig.Ident or scATAC-seq Object ID nan False
nCount Peaks Total number of fragments in peaks nan False
nFeature Peaks Number of peaks with at least one read nan False
Total Read-Pairs Total read-pairs nan False
Duplicate Read-Pairs Number of duplicate read-pairs nan False
Chimeric Read-Pairs Number of chimerically mapped read-pairs nan False
Unmapped Read-Pairs Number of read-pairs with at least one end not mapped nan False
LowMapQ Number of read-pairs with <30 mapq on at least one end nan False
Mitochondrial Read-Pairs Number of read-pairs mapping to mitochondria and non-nuclear contigs nan False
Passed Filters Number of non-duplicate, usable read-pairs i.e. fragments nan False
TSS Fragments Number of fragments overlapping with TSS regions nan False
DNase Sensitive Region Fragments Number of fragments overlapping with DNase sensitive regions nan False
Enhancer Region Fragments Number of fragments overlapping enhancer regions nan False
Promoter Region Fragments Number of fragments overlapping promoter regions nan False
On Target Fragments Number of fragments overlapping any of TSS, enhancer, promoter and DNase hypersensitivity sites (counted with multiplicity) nan False
Blacklist Region Fragments Number of fragments overlapping blacklisted regions nan False
Peak Region Fragments Number of fragments overlapping peaks nan False
Peak Region Cutsites Number of ends of fragments in peak regions nan False
Nucleosome Signal Nucleosome signal score (strength of the nucleosome signal per cell, computed as the ratio of fragments between 147 bp and 294 bp (mononucleosome) to fragments < 147 bp (nucleosome-free)) nan False
Nucleosome Percentile Percentile rank of nucleosome score nan False
TSS Enrichment Transcription start site (TSS) enrichment score nan False
TSS Percentile Percentile rank of TSS score nan False
Pct Reads in Peaks Percentage of reads in peaks nan False
Blacklist Ratio Ratio of reads in blacklist regions nan False
Seurat Clusters Clusters of cells by a shared nearest neighbor (SNN) modularity optimization based clustering algorithm nan False
nCount RNA Total number of fragments in genes nan False
nFeature RNA Number of genes detected in cell nan False
MACS2 Seqnames Chromosome id nan False
MACS2 Start Genomic starting position in MACS2 nan False
MACS2 End Genomic ending position in MACS2 nan False
MACS2 Width Width of the peak in bases in MACS2 nan False
MACS2 Strand DNA strand aligned with in MACS2 nan False
MACS2 Name Name of the peak in MACS2 nan False
MACS2 Score Peak score (proportional to q-value) in MACS2 nan False
MACS2 Fold Change Fold enrichment for this peak summit against random Poisson distribution with local lambda in MACS2 nan False
MACS2 Neg Log10 pvalue Summit Negative log10 p-value for the peak summit in MACS2 nan False
MACS2 Neg Log10 qvalue Summit Negative log10 q-value for the peak summit in MACS2 nan False
MACS2 Relative Summit Position Position of the peak summit relative to the start position in MACS2 nan False
Is lowest level Denotes that the manifest represents the lowest data level submitted. Use when L1 data is missing nan False
Yes - Is lowest level If the manifest is the lowest level, require HTAN Parent Biospecimen ID HTAN Parent Biospecimen ID False
Normalization Method Description of Normalization Process nan False
Batch Correction Method Method that was used to batch correct Level 3 data nan False
MS Batch ID Batch ID indicating a set of samples that were run together. nan True
MS-based Assay Type Analytes are the target molecules being measured with the assay. nan True
MS-based Targeted Specifies whether or not a specific molecule(s) is/are targeted for detection/measurement by the assay. Example: The MALDI Imaging analyte is lipids. nan True
MS Instrument Vendor and Model An acquisition instrument is the device that contains the signal detection hardware and signal processing software. Assays generate signals such as light of various intensities or color or signals representing the molecular mass. nan True
MS Source The ion source type used for surface sampling (MALDI, MALDI-2, DESI, or SIMS) or LC-MS/MS data acquisition (nESI) nan True
Polarity The polarity of the mass analysis (positive or negative ion modes) nan True
Mass Range Low Value The low value of the scanned mass range for MS1 in m/z. nan True
Mass Range High Value The high value of the scanned mass range for MS1 in m/z. nan True
Data Collection Mode Mode of data collection in tandem MS assays. Either DDA (data-dependent acquisition) or DIA (data-independent acquisition). nan True
MS Scan Mode Indicates whether experiment is MS, MS/MS, or other (possibly MS3 for TMT) nan True
MS Labeling Indicates whether samples were labeled prior to MS analysis (e.g., TMT) nan True
LC Instrument Vendor and Model The manufacturer of the instrument used for LC. nan True
LC Column Vendor and Model The manufacturer and the model number/name of the LC column. If a custom self-packed, pulled tip capillary is used, enter 'Pulled tip capilary' nan True
LC Resin Details of the resin used for LC, including vendor, particle size, pore size nan True
LC Length Value LC column length in cm. nan True
LC Temp Value LC temperature in °C. nan True
LC ID Value LC column inner diameter in microns. nan True
LC Flow Rate LC flow rate in nL/min. nan True
LC Gradient The program dictates the mobile phase solvent composition over the course of the chromatographic run. nan True
LC Mobile Phase A Composition of mobile phase A nan True
LC Mobile Phase B Composition of mobile phase B nan True
MS Instrument Metadata File Additional file containing instrument metadata details. Use either synapse_path or entity_Id nan False
Bisulfite Conversion Name of the kit used in bisulfite conversion. nan True
Replicate Type A common term for all files belonging to the same sample. We suggest using a stable sample accession from a biosample archive like BioSamples. nan True
Bulk Methylation Assay Type Assay types normally determine genomic coverage Targeted Genome, Beadchip Array True
Targeted Genome Assay for analyzing specific mutations in a given sample nan False
Beadchip Array Assay that uses beads to target a specific locus on the genome. nan False
Total DNA Input Overall number of reads for a given sample in digits (microgram, nanogram). nan False
Trimmer Software used for trimming nan True
Bulk Methylation Genomic Reference The human genome reference used in the alignment of reads nan True
Duplicate Removal Software Software used to remove duplicate reads nan True
Proportion of Minimum CpG Coverage 10X Proportion of all reference bases for whole genome sequencing, or targeted sequencing, that achieves 10X or greater coverage per CpG. nan False
DMC Calling Tool Software used for calling differentially methylated CpG (DMC) and differentially methylated region (DMR) nan True
DMC Calling Workflow URL Generic name for the workflow used to analyze a data set nan True
DMR Calling Tool Software used for calling differentially methylated CpG (DMC) and differentially methylated region (DMR) nan True
DMR Calling Workflow URL Generic name for the workflow used to analyze a data set nan True
pUC19 methylation ratio Methylation ratio of mostly methylated pUC19 control, as a percentage nan True
Lambda methylation ratio Methylation ratio of mostly unmethylated lambda control, as a percentage nan True
DMC data file format Format of the data files nan True
DMR data file Format Format of the data files. nan True
MS Assay Category Type of Mass Spectrometry performed. nan True
Publication contains HTAN ID HTAN IDs are used in the publication. nan True
Electron Microscopy Level 1 Raw electron microscopy data as one TIFF file per plane for a 3D image stack or per tile for a 2D large area montage Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, EM method, EM signal or contrast mech, EM instrument, Protocol Link, Software and Version, SizeX, SizeY, SizeC, SizeZ, PhysicalSizeX, PhysicalSizeY, PhysicalSizeZ, EM dwell or exposure time,EM voltage, EM beam current, EM spot size, EM stage tilt, EM signal processing, EM contrast type False
Electron Microscopy Level 2 Processed electron microscopy data as one OME-TIFF image per plane or montage Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID,Tile overlap X, Tile overlap Y,EM contrast type False
Electron Microscopy Level 3 Segmented electron microscopy data as .am or .tiff formats Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, Imaging Object Class False
Electron Microscopy Level 4 Movies or other derived files from electron microscopy data Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, Comment False
EM instrument Make and model of the EM instrument used nan True
EM method Electron microscopy method used nan True
EM signal or contrast mech How the electron microscopy signal is generated from the sample nan True
EM dwell or exposure time Duration in microseconds (µs) of electron beam data collection per pixel or frame nan False
EM voltage Accelerating voltage in kiloelectronvolts (keV) nan False
EM beam current Beam current in nanoamps (nA) nan False
EM spot size Beam spot size in micrometers (µm) nan False
EM stage tilt Physical stage tilt in degrees with respect to the electron beam nan False
EM signal processing SNR improvement strategies used nan False
EM contrast type Does the image use standard SEM contrast or TEM contrast nan False
Tile overlap X Percentage of image overlap to allow tile stitching in x direction nan True
Tile overlap Y Percentage of image overlap to allow tile stitching in y direction nan True
Barretts Esophagus Goblet Cells Present Presence or absence of Barretts esophagus goblet cells. nan False
Pancreatitis Onset Year Date of onset of pancreatitis. nan False
HTAN Parent Channel Metadata ID HTAN ID for a level 3 channels table. nan True
Single Nucleus Capture Nuclei isolation method nan False
Associated mRNA Library Data File ID Sample Level HTAN Data File ID for the associated level - HTAN ID of this file HTAN ID SOP (eg HTANx_yyy_zzz) nan True
Single Cell Barcode Method Applied The method by which cells are multiplexed or labeled with cell surface markers or probes nan True
Feature Barcode Library Type The library construction methods for the feature barcode library nan True
Barcode Folder Synapse ID Synapse ID of the folder containing the barcode lists nan True
Barcode Folder File List A comma separated list of filenames in the gzipped folder detailing what barcodes are specific to demultiplexing samples versus providing surface protein data nan True
Microarray Platform ID The NCBI GEO Microarray Platform ID that links to the table containing the array definition nan True
Microarray Molecule Microarray is measuring this kind of molecule nan True
Microarray Label Microarray used this kind of label nan True
Microarray Value Definition What the provided value signifies nan True
Microarray Protocol Auxiliary File Auxiliary file describing the experimental protocols used, as described in the NCBI GEO microarray template, recorded as synapse ID (syn12345). nan True
Participant Vital Status Update Updates to a participant's vital status Component, HTAN Participant ID, Vital Status False
Precancer Diagnosis Diagnosis of a precancerous condition Component, HTAN Participant ID, Precancer Case False
Alive This indicates the participant is alive and defines further required metadata Days to Vital Status Reference False
Days to Vital Status Reference Number of days between the date used for index and the reference date for designation of vital status nan True
Precancer Case Yes/No indicator to designate the participant for whom precancerous lesion(s) was identified (premalignancy only). nan True
Yes - Precancer Case Indicates that the participant is a precancer case Precancerous Condition Type, Days to Precancer Case Designation, WHO Precursor Lesion Code False
Days to Precancer Case Designation Number of days between the date used for index and the reference date for designation of precancer status. nan False
WHO Precursor Lesion Code World Health Organization Classification of Tumour cytopathology-based coding system, includes 'precursor lesion' designations for precancers. ICD-O-3 morphology axis format, e.g. 1234/1 nan False