Assay |
A planned process with the objective to produce information about the material entity that is the evaluant, by physically examining it or its proxies. [OBI_0000070] |
nan |
False |
Device |
A thing made or adapted for a particular purpose, especially a piece of mechanical or electronic equipment |
nan |
False |
Sequencing |
Module for next generation sequencing assays |
nan |
False |
Component |
Category of metadata (e.g. Diagnosis, Biospecimen, scRNA-seq Level 1, etc.); provide the same one for all items/rows. |
nan |
True |
Patient |
HTAN patient |
Component, HTAN Participant ID |
False |
File |
A type of Information Content Entity specific to OS |
nan |
False |
Filename |
Name of a file |
nan |
True |
File Format |
Format of a file (e.g. txt, csv, fastq, bam, etc.) |
nan |
True |
CDS Sequencing Template |
CDS compatible template file, includes attributes for Genomic Reference, Library Layout, Data Type, Sequencing Platform, Library Selection Method |
Component, Filename, File Format, HTAN Data File ID, HTAN Parent Biospecimen ID, CDS library_id, CDS library_strategy, CDS library_source, CDS library_selection, CDS library_layout, CDS platform, CDS instrument_model, CDS design_description, CDS reference_genome_assembly, CDS custom_assembly_fasta_file_for_alignment, CDS bases, CDS number_of_reads, CDS coverage, CDS avg_read_length, CDS sequence_alignment_software |
True |
CDS library_id |
Short unique identifier for the sequencing library. |
nan |
True |
CDS library_strategy |
Library strategy |
nan |
True |
CDS library_source |
The Library Source specifies the type of source material that is being sequenced |
nan |
True |
CDS library_selection |
Library Selection Method |
nan |
True |
CDS library_layout |
Paired-end or Single |
nan |
True |
CDS platform |
Platform used for sequencing |
nan |
True |
CDS instrument_model |
Instrument model used for sequencing |
nan |
True |
CDS design_description |
Free-form description of the methods used to create the sequencing library; a brief 'materials and methods' section. |
nan |
False |
CDS reference_genome_assembly |
Reference genome assembly. Provide only if you are submitting a BAM file aligned against an NCBI assembly. |
nan |
False |
CDS custom_assembly_fasta_file_for_alignment |
Please provide the name of the custom assembly fasta file used during alignment |
nan |
False |
CDS bases |
Count of unique basecalls present in the data. Please count each base only once if using secondary alignments. |
nan |
False |
CDS number_of_reads |
Count of the number of reads in the data. Please count each read only once if using secondary alignments. |
nan |
False |
CDS coverage |
Depth of coverage on assembly used. Found by (Unique Aligned Basecalls)/(Reference Length) |
nan |
False |
CDS avg_read_length |
Average read length. Found by (Bases)/(Reads) |
nan |
False |
CDS sequence_alignment_software |
The name of the software program used to align nucleotide sequencing data. |
nan |
False |
Checksum |
MD5 checksum of the BAM file |
nan |
True |
HTAN Data File ID |
Self-identifier for this data file: the HTAN ID of this file, based on HTAN ID SOP (e.g. HTANx_yyy_zzz) |
nan |
True |
HTAN Participant ID |
HTAN ID associated with a patient based on HTAN ID SOP (e.g. HTANx_yyy) |
nan |
True |
HTAN Biospecimen ID |
HTAN ID associated with a biosample based on HTAN ID SOP (e.g. HTANx_yyy_zzz) |
nan |
True |
HTAN Parent ID |
HTAN ID of parent from which the biospecimen was obtained. Parent could be another biospecimen or a research participant. |
nan |
True |
HTAN Parent Biospecimen ID |
HTAN Biospecimen Identifier (e.g. HTANx_yyy_zzz) indicating the biospecimen(s) from which these files were derived; multiple parent biospecimens should be comma-separated |
nan |
True |
HTAN Parent Data File ID |
HTAN Data File Identifier indicating the file(s) from which these files were derived |
nan |
True |
Clinical Data Tier 2 |
Tier 2 Cancer Data |
Component, HTAN Participant ID, Timepoint Label, Start Days from Index, Stop Days from Index, Sentinel Lymph Node Count, Sentinel Node Positive Assessment Count, Tumor Extranodal Extension Indicator, Satellite Metastasis Present Indicator, Other Biopsy Resection Site, Extent of Tumor Resection, Prior Sites of Radiation, Immunosuppression, Concomitant Medication Received Type, Family Member Vital Status Indicator, COVID19 Occurrence Indicator, COVID19 Current Status, COVID19 Positive Lab Test Indicator, COVID19 Antibody Testing, COVID19 Complications Severity, COVID19 Cancer Treatment Followup, Ecig vape use, Ecig vape 30 day use num, Ecig vape times per day, Type of smoke exposure cumulative years, Chewing tobacco daily use count, Second hand smoke exposure years, Known Genetic Predisposition Mutation, Hereditary Cancer Predisposition Syndrome, Cancer Associated Gene Mutations, Mutational Signatures, Mismatch Repair System Status, Lab Tests for MMR Status, Mode of Cancer Detection, Education Level, Country of Birth, Medically Underserved Area, Rural vs Urban, Cancer Incidence, Cancer Incidence Location |
False |
SRRS Clinical Data Tier 2 |
Cancer related clinical data specific to SRRS |
Component, HTAN Participant ID, Timepoint Label, Start Days from Index, Stop Days from Index, Education Level, Country of Birth, Medically Underserved Area, Rural vs Urban, Cancer Incidence, Cancer Incidence Location, Ethnicity, Gender, Race, Vital Status, Age at Diagnosis, Days to Last Follow up, Days to Last Known Disease Status, Days to Recurrence, Last Known Disease Status, Morphology, Primary Diagnosis, Progression or Recurrence, Site of Resection or Biopsy, Tissue or Organ of Origin, NCI Atlas Cancer Site, Tumor Grade, Pack Years Smoked, Years Smoked, Days to Follow Up, Gene Symbol, Molecular Analysis Method, Test Result, Treatment Type, Tumor Largest Dimension Diameter |
False |
Lung Cancer Tier 3 |
Lung cancer specific attributes in Clinical Data Tier 3 |
Component, HTAN Participant ID, Timepoint Label, Start Days from Index, Stop Days from Index, Lung Cancer Detection Method Type, Lung Cancer Participant Procedure History, Lung Adjacent Histology Type, Lung Tumor Location Anatomic Site, Lung Tumor Lobe Bronchial Location, Current Lung Cancer Symptoms, Lung Topography, Lung Cancer Harboring Genomic Aberrations |
False |
Colorectal Cancer Tier 3 |
Colorectal cancer specific attributes in Clinical Data Tier 3 |
Component, HTAN Participant ID, Timepoint Label, Start Days from Index, Stop Days from Index, Colorectal Cancer Detection Method Type, History of Prior Colon Polyps, Family Colon Cancer History Indicator, Family Medical History Colorectal Polyp Diagnosis, Immediate Family History Endometrial Cancer, Immediate Family History Ovarian Cancer, Patient Inflammatory Bowel Disease Personal Medica History, Patient Colonoscopy Performed Indicator, Colorectal Cancer Tumor Border Configuration, MLH1 Promoter Methylation Status, Colorectal Cancer KRAS Indicator, Colon Polyp Occurence Indicator, Family History Colorectal Polyp, Colorectal Polyp New Indicator, Colorectal Polyp Shape, Size of Polyp Removed, Colorectal Polyp Count, Colorectal Polyp Type, Colorectal Polyp Adenoma Type |
False |
Breast Cancer Tier 3 |
Breast cancer specific attributes in Clinical Data Tier 3 |
Component, HTAN Participant ID, Timepoint Label, Start Days from Index, Stop Days from Index, Breast Carcinoma Detection Method Type, Breast Carcinoma Histology Category, Invasive Lobular Breast Carcinoma Histologic Category, Invasive Ductal Breast Carcinoma Histologic Category, Breast Biopsy Procedure Finding Type, Breast Quadrant Site, Breast Cancer Assessment Tests, Breast Cancer Genomic Test Performed, Mammaprint Risk Group, Oncotype Risk Group, Breast Carcinoma Estrogen Receptor Status, Breast Carcinoma Progesteroner Receptor Status, Breast Cancer Allred Estrogen Receptor Score, Prior Invasive Breast Disease, Breast Carcinoma ER Status Percentage Value, Breast Carcinoma PR Status Percentage Value, HER2 Breast Carcinoma Copy Number Total, Breast Carcinoma Centromere 17 Copy Number, Breast Carcinoma HER2 Centromere17 Copynumber Total, Breast Carcinoma HER2 Chromosome17 Ratio, Breast Carcinoma Surgical Procedure Name, Breast Carcinoma HER2 Ratio Diagnosis, Breast Carcinoma HER2 Status, Hormone Therapy Breast Cancer Prevention Indicator, Breast Carcinoma ER Staining Intensity, Breast Carcinoma PR Staining Intensity, Oncotype Score, Breast Imaging Performed Type, Multifocal Breast Carcinoma Present Indicator, Multicentric Breast Carcinoma Present Indicator, BIRADS Mammography Breast Density Category |
False |
Neuroblastoma and Glioma Tier 3 |
Brain cancer specific attributes in Clinical Data Tier 3 |
Component, HTAN Participant ID, Timepoint Label, Start Days from Index, Stop Days from Index, CNS Tumor Primary Anatomic Site, Glioma Specific Metastasis Sites, Glioma Specific Radiation Field, Supra Tentorial Ependymoma Molecular Subgroup, Infra Tentorial Ependymoma Molecular Subgroup, Neuroblastoma MYCN Gene Amplification Status |
False |
Acute Lymphoblastic Leukemia Tier 3 |
Acute Lymphoblastic Leukemia attributes in Clinical Data Tier 3 |
Component, HTAN Participant ID, Timepoint Label, Start Days from Index, Stop Days from Index, Specimen Blast Count Percentage Value, NCI ALL Risk Group, MRD ALL Diagnostic Sensitivity, CNS Leukemia Status |
False |
Ovarian Cancer Tier 3 |
Ovarian cancer specific attributes in Clinical Data Tier 3 |
Component, HTAN Participant ID, Timepoint Label, Start Days from Index, Stop Days from Index, Ovarian Cancer Histologic Subtype, Ovarian Cancer Surgical Outcome, Ovarian Cancer Platinum Status |
False |
Prostate Cancer Tier 3 |
Prostate cancer specific attributes in Clinical Data Tier 3 |
Component, HTAN Participant ID, Timepoint Label, Start Days from Index, Stop Days from Index, Location Extent Extraprostatic Extension, Location Nature Positive Margins, Seminal Vesicle Invasion, Prostate Carcinoma Histologic Type, Prostate Cancer Local Extent, Additonal Findings Uninvolved Prostate, Prostate Cancer Cytologic Morphologic Subtypes |
False |
Sarcoma Tier 3 |
Sarcoma specific attributes in Clinical Data Tier 3 |
Component, HTAN Participant ID, Timepoint Label, Start Days from Index, Stop Days from Index, Sarcoma Subtype, Sarcoma Diagnosis Classification Category, Sarcoma Tumor Extension Type |
False |
Pancreatic Cancer Tier 3 |
Pancreatic cancer specific attributes in Clinical Data Tier 3 |
Component, HTAN Participant ID, Timepoint Label, Start Days from Index, Stop Days from Index, Pancreas Precancer Histopathologic Grade, Pancreatic IPMN Pathology Epithelial Subtype, Pancreatic Duct Final Pathology Type |
False |
Melanoma Tier 3 |
Melanoma specific attributes in Clinical Data Tier 3 |
Component, HTAN Participant ID, Timepoint Label, Start Days from Index, Stop Days from Index, Cutaneous Melanoma Tumor Infiltrating Lymphocytes, Cutaneous Melanoma Tumor Regression Range, Melanoma Specimen Clark Level Value, Cutaneous Melanoma Surgical Margins, Melanoma Lesion Size, History of Atypical Nevi, Fitzpatrick Skin Tone, History of Chronic UV Exposure, History of Blistering Sunburn, History of Tanning Bed Use, Immediate Family History Melanoma, Melanoma Biopsy Resection Sites, Cutaneous Melanoma Ulceration, Cutaneous Melanoma Additional Findings |
False |
Demographics |
Demographic attributes |
Component, HTAN Participant ID, Ethnicity, Gender, Race, Vital Status, Days to Birth, Country of Residence, Age Is Obfuscated, Year Of Birth, Occupation Duration Years, Premature At Birth, Weeks Gestation at Birth |
False |
Family History |
Family cancer history |
Component, HTAN Participant ID, Relative with Cancer History |
False |
Exposure |
Exposure to carcinogens |
Component, HTAN Participant ID, Start Days from Index, Smoking Exposure, Alcohol Exposure, Asbestos Exposure, Coal Dust Exposure, Environmental Tobacco Smoke Exposure, Radon Exposure, Respirable Crystalline Silica Exposure |
False |
Follow Up |
Follow up clinical visits |
Component, HTAN Participant ID, Days to Follow Up, Adverse Event, Progression or Recurrence, Barretts Esophagus Goblet Cells Present, BMI, Cause of Response, Comorbidity, Comorbidity Method of Diagnosis, Days to Adverse Event, Days to Comorbidity, Diabetes Treatment Type, Disease Response, DLCO Ref Predictive Percent, ECOG Performance Status, FEV1 FVC Post Bronch Percent, FEV 1 FVC Pre Bronch Percent, FEV1 Ref Post Bronch Percent, FEV1 Ref Pre Bronch Percent, Height, Hepatitis Sustained Virological Response, HPV Positive Type, Karnofsky Performance Status, Menopause Status, Pancreatitis Onset Year, Reflux Treatment Type, Risk Factor, Risk Factor Treatment, Viral Hepatitis Serologies, Weight, Adverse Event Grade, AIDS Risk Factors, Body Surface Area, CD4 Count, CDC HIV Risk Factors, Days to Imaging, Evidence of Recurrence Type, HAART Treatment Indicator, HIV Viral Load, Hormonal Contraceptive Use, Hysterectomy Margins Involved, Hysterectomy Type, Imaging Result, Imaging Type, Immunosuppressive Treatment Type, Nadir CD4 Count, Pregnancy Outcome, Recist Targeted Regions Number, Recist Targeted Regions Sum, Scan Tracer Used |
False |
Therapy |
Clinical therapy or treatment |
Component, HTAN Participant ID, Treatment or Therapy, Treatment Type, Treatment Effect, Treatment Outcome, Days to Treatment End, Treatment Anatomic Site, Days to Treatment Start, Initial Disease Status, Regimen or Line of Therapy, Therapeutic Agents, Treatment Intent Type, Chemo Concurrent to Radiation, Number of Cycles, Reason Treatment Ended, Treatment Arm, Treatment Dose, Treatment Dose Units, Treatment Effect Indicator, Treatment Frequency |
False |
Diagnosis |
Disease diagnosis |
Component, HTAN Participant ID, Age at Diagnosis, Year of Diagnosis, Primary Diagnosis, Precancerous Condition Type, Site of Resection or Biopsy, Tissue or Organ of Origin, Morphology, Tumor Grade, Progression or Recurrence, Last Known Disease Status, Days to Last Follow up, Days to Last Known Disease Status, Method of Diagnosis, Prior Malignancy, Prior Treatment, Metastasis at Diagnosis, Metastasis at Diagnosis Site, First Symptom Prior to Diagnosis, Days to Diagnosis, Percent Tumor Invasion, Residual Disease, Synchronous Malignancy, Tumor Confined to Organ of Origin, Tumor Focality, Tumor Largest Dimension Diameter, Gross Tumor Weight, Breslow Thickness, Vascular Invasion Present, Vascular Invasion Type, Anaplasia Present, Anaplasia Present Type, Laterality, Perineural Invasion Present, Lymphatic Invasion Present, Lymph Nodes Positive, Lymph Nodes Tested, Peritoneal Fluid Cytological Status, Classification of Tumor, Best Overall Response, Mitotic Count, AJCC Clinical M, AJCC Clinical N, AJCC Clinical Stage, AJCC Clinical T, AJCC Pathologic M, AJCC Pathologic N, AJCC Pathologic Stage, AJCC Pathologic T, AJCC Staging System Edition, Cog Neuroblastoma Risk Group, Cog Rhabdomyosarcoma Risk Group, Gleason Grade Group, Gleason Grade Tertiary, Gleason Patterns Percent, Greatest Tumor Dimension, IGCCCG Stage, INPC Grade, INPC Histologic Group, INRG Stage, INSS Stage, International Prognostic Index, IRS Group, IRS Stage, ISS Stage, Lymph Node Involved Site, Margin Distance, Margins Involved Site, Medulloblastoma Molecular Classification, Micropapillary Features, Mitosis Karyorrhexis Index, Non Nodal Regional Disease, Non Nodal Tumor Deposits, Ovarian Specimen Status, Ovarian Surface Involvement, Pregnant at Diagnosis, Primary Gleason Grade, Secondary Gleason Grade, Supratentorial Localization, Tumor Depth, WHO CNS Grade, WHO NTE Grade |
False |
Molecular Test |
Clinical molecular test data |
Component, HTAN Participant ID, Timepoint Label, Start Days from Index, Stop Days from Index, Gene Symbol, Molecular Analysis Method, Test Result, AA Change, Antigen, Clinical Biospecimen Type, Blood Test Normal Range Upper, Blood Test Normal Range Lower, Cell Count, Chromosome, Clonality, Copy Number, Cytoband, Exon, Histone Family, Histone Variant, Intron, Laboratory Test, Loci Abnormal Count, Loci Count, Locus, Mismatch Repair Mutation, Molecular Consequence, Pathogenicity, Ploidy, Second Exon, Second Gene Symbol, Specialized Molecular Test, Test Analyte Type, Test Units, Test Value, Transcript, Variant Origin, Variant Type, Zygosity |
False |
Biospecimen |
HTAN biological entity; this can be tissue, blood, analyte and subsamples of those |
Component, HTAN Biospecimen ID, Source HTAN Biospecimen ID, HTAN Parent ID, Timepoint Label, Collection Days from Index, Adjacent Biospecimen IDs, Biospecimen Type, Acquisition Method Type, Fixative Type, Storage Method, Processing Days from Index, Protocol Link, Site Data Source, Collection Media, Mounting Medium, Processing Location, Histology Assessment By, Histology Assessment Medium, Preinvasive Morphology, Tumor Infiltrating Lymphocytes, Degree of Dysplasia, Dysplasia Fraction, Number Proliferating Cells, Percent Eosinophil Infiltration, Percent Granulocyte Infiltration, Percent Inflam Infiltration, Percent Lymphocyte Infiltration, Percent Monocyte Infiltration, Percent Necrosis, Percent Neutrophil Infiltration, Percent Normal Cells, Percent Stromal Cells, Percent Tumor Cells, Percent Tumor Nuclei, Fiducial Marker, Slicing Method, Lysis Buffer, Method of Nucleic Acid Isolation |
False |
SRRS Biospecimen |
SRRS-specific HTAN biological entity; this can be tissue, blood, analyte and subsamples of those. It can be described with fewer attributes than a standard HTAN specimen. |
Component, HTAN Biospecimen ID, Source HTAN Biospecimen ID, HTAN Parent ID, Adjacent Biospecimen IDs, Biospecimen Type, Timepoint Label, Collection Days from Index, Acquisition Method Type, Ischemic Time, Ischemic Temperature, Collection Media, Topography Code, Additional Topography, Fixative Type, Storage Method, Preinvasive Morphology, Histologic Morphology Code, Preservation Method, Processing Days from Index, Protocol Link |
False |
Source HTAN Biospecimen ID |
This is the HTAN ID that may have been assigned to the biospecimen at the site of biospecimen origin (e.g. BU). |
nan |
False |
Other Assay |
Metadata applying to any assay without standard descriptors. Can be used as a placeholder for a minimal amount of metadata until the assay descriptors are standardized |
Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, Assay Type |
False |
ExSeq Minimal |
Minimal metadata for the ExSeq assay |
Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, Assay Type |
False |
Assay Type |
The type and level of assay this metadata applies to (e.g. RPPA, NanoString DSP, etc.) |
nan |
True |
scRNA-seq Level 1 |
Single-cell RNA-seq [EFO_0008913] |
Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, Nucleic Acid Source, Cryopreserved Cells in Sample, Single Cell Isolation Method, Dissociation Method, Library Construction Method, Read Indicator, Read1, Read2, End Bias, Reverse Transcription Primer, Spike In, Sequencing Platform, Total Number of Input Cells, Input Cells and Nuclei, Library Preparation Days from Index, Single Cell Dissociation Days from Index, Sequencing Library Construction Days from Index, Nucleic Acid Capture Days from Index, Protocol Link, Technical Replicate Group |
False |
scRNA-seq Level 2 |
Alignment workflows downstream of scRNA-seq Level 1 |
Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, scRNAseq Workflow Type, Workflow Version, scRNAseq Workflow Parameters Description, Workflow Link, Genomic Reference, Genomic Reference URL, Genome Annotation URL, Checksum, Whitelist Cell Barcode File Link, Cell Barcode Tag, UMI Tag, Applied Hard Trimming |
False |
scRNA-seq Level 3 |
Gene and Isoform expression files |
Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, Data Category, Matrix Type, Linked Matrices, Cell Median Number Reads, Cell Median Number Genes, Cell Total, scRNAseq Workflow Type, scRNAseq Workflow Parameters Description, Workflow Link, Workflow Version |
False |
scRNA-seq Level 4 |
Data represents the relationships between cells derived from Level 3 expression data and shown as tSNE or UMAP coordinates per cell, plus all other cell-specific meta information (e.g., cell type) |
Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, scRNAseq Workflow Type, scRNAseq Workflow Parameters Description, Workflow Version, Workflow Link |
False |
Slide-seq Level 1 |
Raw sequencing files for the Slide-seq assay. |
Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, Nucleic Acid Source, Read Indicator, Spatial Read1, Spatial Read2, End Bias, Reverse Transcription Primer, Spatial Barcode Offset, Spatial Barcode and UMI, Spike In, Sequencing Platform, Technical Replicate Group, Protocol Link, Spatial Library Construction Method, Library Preparation Days from Index, Sequencing Library Construction Days from Index, Nucleic Acid Capture Days from Index |
False |
Slide-seq Level 2 |
Aligned sequencing files and QC for the Slide-seq assay. |
Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, Slide-seq Workflow Type, Workflow Version, Slide-seq Workflow Parameter Description, Workflow Link, Genomic Reference, Genomic Reference URL, Genome Annotation URL, Checksum, Spatial Barcode Tag, Matched Spatial Barcode Tag, UMI Tag, Applied Hard Trimming |
False |
Slide-seq Level 3 |
Gene matrices with features and barcodes for Slide-seq as well as spatial information (bead location files). |
Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, Run ID, Sequencing Batch ID, Data Category, Matrix Type, Slide-seq Workflow Type, Workflow Version, Slide-seq Workflow Parameter Description, Workflow Link, Beads Total, Median UMI Counts per Spot, Median Number Genes per Spatial Spot, Slide-seq Bead File Type, Slide-seq Fragment Size |
False |
Slide-seq Fragment Size |
Average cDNA length associated with the experiment. Integer |
nan |
False |
Matched Spatial Barcode Tag |
SAM tag for matched spot barcode field; please provide a valid spot barcode tag (e.g. CB:Z) (Slide-seq specific) |
nan |
True |
Beads Total |
Number of sequenced beads. Applies to raw counts matrix only. Integer |
nan |
False |
Slide-seq Workflow Type |
Generic name for the workflow used to analyze the Slide-seq data set. String |
nan |
True |
Slide-seq Workflow Parameter Description |
Parameters used to run the Slide-seq workflow. String |
nan |
True |
Slide-seq Bead File Type |
The type of Level 3 file submitted as part of the Slide-seq workflow. |
nan |
True |
Bulk RNA-seq Level 1 |
Bulk RNA-seq [EFO_0003738] |
Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, Library Layout, Read Indicator, Nucleic Acid Source, Micro-region Seq Platform, ROI Tag, Sequencing Platform, Sequencing Batch ID, Read Length, Library Selection Method, Library Preparation Kit Name, Library Preparation Kit Vendor, Library Preparation Kit Version, Library Preparation Days from Index, Spike In, Adapter Name, Adapter Sequence, Base Caller Name, Base Caller Version, Flow Cell Barcode, Fragment Maximum Length, Fragment Mean Length, Fragment Minimum Length, Fragment Standard Deviation Length, Lane Number, Library Strand, Multiplex Barcode, Size Selection Range, Target Depth, To Trim Adapter Sequence, Transcript Integrity Number, RIN, DV200, Adapter Content, Basic Statistics, Encoding, Kmer Content, Overrepresented Sequences, Per Base N Content, Per Base Sequence Content, Per Base Sequence Quality, Per Sequence GC Content, Per Sequence Quality Score, Per Tile Sequence Quality, Percent GC Content, Sequence Duplication Levels, Sequence Length Distribution, Total Reads, QC Workflow Type, QC Workflow Version, QC Workflow Link |
False |
Bulk RNA-seq Level 2 |
Bulk RNA-seq alignment protocol description |
Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, Alignment Workflow Url, Alignment Workflow Type, Genomic Reference, Genomic Reference URL, Index File Name, Average Base Quality, Average Insert Size, Average Read Length, Contamination, Contamination Error, Mean Coverage, MSI Workflow Link, MSI Score, MSI Status, Pairs On Diff CHR, Total Reads, Total Uniquely Mapped, Total Unmapped reads, Proportion Reads Duplicated, Proportion Reads Mapped, Proportion Targets No Coverage, Proportion Base Mismatch, Short Reads, Is lowest level |
False |
Bulk RNA-seq Level 3 |
Bulk RNA-seq gene expression matrices |
Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, Pseudo Alignment Used, Data Category, Expression Units, Matrix Type, Fusion Gene Detected, Fusion Gene Identity |
False |
Bulk WES Level 1 |
Bulk Whole Exome Sequencing raw files |
Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, Sequencing Batch ID, Library Layout, Read Indicator, Library Selection Method, Read Length, Target Capture Kit, Library Preparation Kit Name, Library Preparation Kit Vendor, Library Preparation Kit Version, Sequencing Platform, Adapter Name, Adapter Sequence, Base Caller Name, Base Caller Version, Flow Cell Barcode, Fragment Maximum Length, Fragment Mean Length, Fragment Minimum Length, Fragment Standard Deviation Length, Lane Number, Multiplex Barcode, Library Preparation Days from Index, Size Selection Range, Target Depth, To Trim Adapter Sequence |
False |
Bulk WES Level 2 |
Bulk Whole Exome Sequencing aligned files and QC |
Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, Alignment Workflow Type, Genomic Reference, Genomic Reference URL, Index File Name, Average Base Quality, Average Insert Size, Average Read Length, Contamination, Contamination Error, Mean Coverage, Adapter Content, Basic Statistics, Encoding, Overrepresented Sequences, Per Base N Content, Per Base Sequence Content, Per Base Sequence Quality, Per Sequence GC Content, Per Sequence Quality Score, Per Tile Sequence Quality, Percent GC Content, Sequence Duplication Levels, Sequence Length Distribution, QC Workflow Type, QC Workflow Version, QC Workflow Link, MSI Workflow Link, MSI Score, MSI Status, Pairs On Diff CHR, Total Reads, Total Uniquely Mapped, Total Unmapped reads, Proportion Reads Duplicated, Proportion Reads Mapped, Proportion Targets No Coverage, Proportion Base Mismatch, Short Reads, Proportion Coverage 10x, Proportion Coverage 30X, Is lowest level |
False |
Bulk WES Level 3 |
Bulk Whole Exome Sequencing called variants |
Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, Genomic Reference, Genomic Reference URL, Germline Variants Workflow URL, Germline Variants Workflow Type, Somatic Variants Workflow URL, Somatic Variants Workflow Type, Somatic Variants Sample Type, Structural Variant Workflow URL, Structural Variant Workflow Type |
False |
Microarray Level 1 |
Microarray Level 1 refers to the raw text table of probe level intensities |
Component, Filename, File Format, HTAN Data File ID, HTAN Participant ID, HTAN Parent Biospecimen ID, Nucleic Acid Source, Microarray Platform ID, Microarray Molecule, Microarray Label, Microarray Value Definition, Microarray Protocol Auxiliary File |
False |
Microarray Level 2 |
Microarray Level 2 provides a normalized matrix of values. |
Component, Filename, File Format, HTAN Participant ID, HTAN Parent Biospecimen ID, HTAN Parent Data File ID, HTAN Data File ID, Microarray Platform ID, Normalization Method |
False |
scATAC-seq Level 1 |
scATAC-seq files containing sequence read information, with or without alignment, as FASTQ or BAM files |
Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, Nucleic Acid Source, Dissociation Method, Single Nucleus Buffer, Single Cell Isolation Method, Transposition Reaction, scATACseq Library Layout, Nucleus Identifier, Nuclei Barcode Length, Nuclei Barcode Read, scATACseq Read1, scATACseq Read2, scATACseq Read3, Library Construction Method, Sequencing Platform, Threshold for Minimum Passing Reads, Total Number of Passing Nuclei, Median Fraction of Reads in Peaks, Median Fraction of Reads in Annotated cis DNA Elements, Median Passing Read Percentage, Median Percentage of Mitochondrial Reads per Nucleus, Technical Replicate Group, Total Reads, Protocol Link |
False |
scATAC-seq Level 2 |
scATAC-seq files containing aligned sequence data, as a BAM file |
Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, Alignment Workflow Url, Alignment Workflow Type, Genomic Reference, Genomic Reference URL, Index File Name, Average Base Quality, Average Insert Size, Average Read Length, Mean Coverage, Pairs On Diff CHR, Total Reads, Proportion Reads Mapped, MapQ30, Total Uniquely Mapped, Total Unmapped reads, Proportion Reads Duplicated, Short Reads, Proportion Coverage 10x, Proportion Coverage 30X, Proportion Targets No Coverage, Proportion Base Mismatch, Median Percentage of Mitochondrial Reads per Nucleus, Contamination, Contamination Error |
False |
scATAC-seq Level 3 |
Processed data files containing peak information for cells |
Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, scATAC-seq Object ID, nCount Peaks, nFeature Peaks, Total Read-Pairs, Duplicate Read-Pairs, Chimeric Read-Pairs, Unmapped Read-Pairs, LowMapQ, Mitochondrial Read-Pairs, Passed Filters, TSS Fragments, DNase Sensitive Region Fragments, Enhancer Region Fragments, Promoter Region Fragments, On Target Fragments, Blacklist Region Fragments, Peak Region Fragments, Peak Region Cutsites, Nucleosome Signal, Nucleosome Percentile, TSS Enrichment, TSS Percentile, Pct Reads in Peaks, Blacklist Ratio, Seurat Clusters, nCount RNA, nFeature RNA, MACS2 Seqnames, MACS2 Start, MACS2 End, MACS2 Width, MACS2 Strand, MACS2 Name, MACS2 Score, MACS2 Fold Change, MACS2 Neg Log10 pvalue Summit, MACS2 Neg Log10 qvalue Summit, MACS2 Relative Summit Position |
False |
scmC-seq Level 1 |
Files contain raw scmC-seq data. |
Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, Nucleic Acid Source, scmCseq Read1, scmCseq Read2, scmCseq Read3, Single Cell Isolation Method, Single Nucleus Buffer, Single Nucleus Capture, Bisulfite Conversion, Library Layout, Nucleus Identifier, Sequencing Platform, Technical Replicate Group, Median Fraction of Reads in Peaks, Median Passing Read Percentage, Peaks Calling Software, Median Percentage of Mitochondrial Reads per Nucleus, Threshold for Minimum Passing Reads, Total Number of Passing Nuclei, Total Reads |
False |
scmC-seq Level 2 |
scmC-seq files containing aligned sequence data, as a BAM file. |
Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, Alignment Workflow Url, Alignment Workflow Type, Genomic Reference, Genomic Reference URL, Index File Name, Average Base Quality, Average Insert Size, Average Read Length, Contamination, Contamination Error, Mean Coverage, Pairs On Diff CHR, Total Reads, Total Uniquely Mapped, Total Unmapped reads, Proportion Reads Duplicated, Proportion Reads Mapped, Proportion Targets No Coverage, Proportion Base Mismatch, Short Reads |
False |
scATAC-seq Level 4 |
Data represents the relationships between cells derived from Level 3 expression data and shown as tSNE or UMAP coordinates per cell, plus all other cell-specific meta information (e.g., cell type) |
Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, scATACseq Workflow Type, scATACseq Workflow Parameters Description, Workflow Version, Workflow Link |
False |
scDNA-seq Level 1 |
Single-cell DNA-seq |
Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, Sequencing Batch ID, Library Layout, Nucleic Acid Source, Library Selection Method, Read Length, Library Preparation Kit Name, Library Preparation Kit Vendor, Library Preparation Kit Version, Adapter Name, Adapter Sequence, Base Caller Name, Base Caller Version, Flow Cell Barcode, Fragment Maximum Length, Fragment Mean Length, Fragment Minimum Length, Fragment Standard Deviation Length, Lane Number, Library Strand, Multiplex Barcode, Size Selection Range, Target Depth, To Trim Adapter Sequence, Adapter Content, Basic Statistics, Encoding, Kmer Content, Overrepresented Sequences, Per Base N Content, Per Base Sequence Content, Per Base Sequence Quality, Per Sequence GC Content, Per Sequence Quality Score, Per Tile Sequence Quality, Percent GC Content, Sequence Duplication Levels, Sequence Length Distribution, Total Reads, QC Workflow Type, QC Workflow Version, QC Workflow Link |
False |
scDNA-seq Level 2 |
Alignment workflows downstream of scDNA-seq Level 1 |
Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, Alignment Workflow Url, Alignment Workflow Type, Genomic Reference, Genomic Reference URL, Index File Name, Average Base Quality, Average Insert Size, Average Read Length, Mean Coverage, Pairs On Diff CHR, Total Reads, Proportion Reads Mapped, MapQ30, Total Uniquely Mapped, Total Unmapped reads, Proportion Reads Duplicated, Short Reads, Proportion Coverage 10x, Proportion Coverage 30X, Proportion Targets No Coverage, Proportion Base Mismatch, Proportion Mitochondrial Reads, Contamination, Contamination Error |
False |
Multiplexed CITE-seq Level 1 |
Raw sequencing files for the multiplexed CITE-seq assay |
Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, Nucleic Acid Source, Cryopreserved Cells in Sample, Single Cell Isolation Method, Dissociation Method, Library Construction Method, Read Indicator, Read1, Read2, End Bias, Reverse Transcription Primer, Spike In, Spike In Concentration, Sequencing Platform, Total Number of Input Cells, Input Cells and Nuclei, Library Preparation Days from Index, Single Cell Dissociation Days from Index, Sequencing Library Construction Days from Index, Nucleic Acid Capture Days from Index, Protocol Link, Technical Replicate Group, Empty Well Barcode, Well Index, Feature Reference Id, Associated mRNA Library Data File ID, Single Cell Barcode Method Applied, Feature Barcode Library Type, Barcode Folder Synapse ID, Barcode Folder File List |
False |
Multiplexed CITE-seq Level 2 |
Alignment workflows downstream of Multiplexed CITE-seq Level 1 |
Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Parent Data File ID, HTAN Data File ID, Associated mRNA Library Data File ID, scRNAseq Workflow Type, Workflow Version, scRNAseq Workflow Parameters Description, Workflow Link, Genomic Reference, Genomic Reference URL, Genome Annotation URL, Checksum, Whitelist Cell Barcode File Link, Cell Barcode Tag, UMI Tag, Applied Hard Trimming |
False |
Multiplexed CITE-seq Level 3 |
Gene and Isoform expression files |
Component, Filename, File Format, HTAN Parent Data File ID, HTAN Parent Biospecimen ID, HTAN Data File ID, Associated mRNA Library Data File ID, Data Category, Matrix Type, Linked Matrices, Cell Median Number Reads, Cell Median Number Genes, Cell Total, scRNAseq Workflow Type, scRNAseq Workflow Parameters Description, Workflow Link, Workflow Version |
False |
Multiplexed CITE-seq Level 4 |
Data represents the relationships between cells derived from Level 3 expression data and shown as tSNE or UMAP coordinates per cell, plus all other cell-specific meta information (e.g., cell type) |
Component, Filename, File Format, HTAN Parent Data File ID, HTAN Parent Biospecimen ID, HTAN Data File ID, Associated mRNA Library Data File ID, scRNAseq Workflow Type, scRNAseq Workflow Parameters Description, Workflow Version, Workflow Link |
False |
Bulk Methylation-seq Level 1 |
Raw data for bulk methylation sequencing, such as FASTQs and unaligned BAMs |
Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, Nucleic Acid Source, Bisulfite Conversion, Sequencing Platform, Replicate Type, Bulk Methylation Assay Type, Total DNA Input |
False |
Bulk Methylation-seq Level 2 |
Aligned primary data for bulk methylation sequencing, such as gene expression matrix files, VCFs, etc. |
Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, Alignment Workflow Url, Trimmer, Bulk Methylation Genomic Reference, Genomic Reference URL, Index File Name, Alignment Workflow Type, Duplicate Removal Software, Mean Coverage, Library Layout, Average Base Quality, Average Insert Size, Average Read Length, Contamination, Contamination Error, Pairs On Diff CHR, Total Reads, Total Uniquely Mapped, Total Unmapped reads, Proportion Reads Duplicated, Proportion Reads Mapped, Proportion Targets No Coverage, Proportion Base Mismatch, Short Reads, Proportion of Minimum CpG Coverage 10X, Proportion Coverage 30X |
False |
Bulk Methylation-seq Level 3 |
Sample level summary data for bulk methylation sequencing, such as t-SNE plot coordinates, etc. |
Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, DMC Calling Tool, DMC Calling Workflow URL, DMR Calling Tool, DMR Calling Workflow URL, pUC19 methylation ratio, Lambda methylation ratio, DMC data file format, DMR data file Format |
False |
Imaging Level 1 |
Raw imaging data |
Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, Imaging Assay Type, Protocol Link, Software and Version, Commit SHA, Pre-processing Completed, Pre-processing Required, Comment |
False |
Imaging Level 2 |
Raw and pre-processed image data |
Component, Filename, File Format, HTAN Participant ID, HTAN Parent Biospecimen ID, HTAN Data File ID, Channel Metadata Filename, Imaging Assay Type, Protocol Link, Software and Version, Microscope, Objective, NominalMagnification, LensNA, WorkingDistance, WorkingDistanceUnit, Immersion, Pyramid, Zstack, Tseries, Passed QC, Comment, FOV number, FOVX, FOVXUnit, FOVY, FOVYUnit, Frame Averaging, Image ID, DimensionOrder, PhysicalSizeX, PhysicalSizeXUnit, PhysicalSizeY, PhysicalSizeYUnit, PhysicalSizeZ, PhysicalSizeZUnit, Pixels BigEndian, PlaneCount, SizeC, SizeT, SizeX, SizeY, SizeZ, PixelType, MERFISH Positions File, MERFISH Codebook File |
False |
MERFISH Positions File |
The positions file is an auxiliary MERFISH file that describes the location of bead positions in the assay. |
nan |
False |
MERFISH Codebook File |
The codebook is an auxiliary MERFISH file that describes how each grouping of bits is converted to a gene name. |
nan |
False |
Imaging Level 3 Segmentation |
Object segmentations |
Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, Imaging Segmentation Data Type, Parameter file, Software and Version, Commit SHA, Imaging Object Class, Number of Objects |
False |
Imaging Level 3 Image |
Quality controlled imaging data |
Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Parent Data File ID, HTAN Parent Channel Metadata ID, HTAN Data File ID, Imaging Assay Type, Protocol Link, Software and Version, Microscope, Objective, NominalMagnification, LensNA, WorkingDistance, Immersion, Pyramid, Zstack, Tseries, Passed QC, Comment, FOV number, FOVX, FOVY, Frame Averaging |
False |
10x Visium Spatial Transcriptomics - RNA-seq Level 1 |
Files contain raw RNA-seq data associated with spot/slide data. |
Component, Filename, Run ID, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, Read Indicator, Spatial Read1, Spatial Read2, Spatial Library Construction Method, Library Preparation Days from Index, Sequencing Library Construction Days from Index, End Bias, Reverse Transcription Primer, Sequencing Platform, Capture Area, Protocol Link, Slide Version, Slide ID, Image Re-orientation, Permeabilization Time, RIN, DV200 |
False |
10x Visium Spatial Transcriptomics - RNA-seq Level 2 |
Alignment workflows downstream of Spatial Transcriptomics RNA-seq Level 1. |
Component, Filename, File Format, Checksum, HTAN Parent Data File ID, HTAN Data File ID, UMI Tag, Whitelist Spatial Barcode File Link, Spatial Barcode Tag, Applied Hard Trimming, Workflow Version, Workflow Link, Genomic Reference, Genomic Reference URL, Genome Annotation URL, HTAN Parent Biospecimen ID, Run ID, Capture Area |
False |
10x Visium Spatial Transcriptomics - Auxiliary Files |
Auxiliary data associated with spot/slide analysis (aligned images, quality control files, etc.) from Spatial Transcriptomics. |
Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Parent Data File ID, HTAN Data File ID, Run ID, Visium File Type, Slide ID, Capture Area, Workflow Version, Workflow Link |
False |
10x Visium Spatial Transcriptomics - RNA-seq Level 3 |
Processed data files based on Spatial Transcriptomics RNA-seq Level 2 and Spatial Transcriptomics Auxiliary files. |
Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Parent Data File ID, HTAN Data File ID, Run ID, Visium File Type, Workflow Version, Workflow Link, Capture Area, Spots under tissue, Mean Reads per Spatial Spot, Median Number Genes per Spatial Spot, Sequencing Saturation, Proportion Reads Mapped, Proportion Reads Mapped to Transcriptome, Median UMI Counts per Spot |
False |
10x Visium Spatial Transcriptomics - RNA-seq Level 4 |
Processed data files based on Spatial Transcriptomics RNA-seq Level 3. |
Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Parent Data File ID, HTAN Data File ID, Run ID, Workflow Version, Workflow Link, Visium Workflow Type, Visium Workflow Parameters Description |
False |
Visium File Type |
The file type generated for the Visium experiment. |
nan |
True |
Run ID |
A unique identifier for this individual run (typically associated with a single slide) of the spatial transcriptomic processing workflow. |
nan |
True |
Capture Area |
Area (or Capture Area) - One of either four or two active regions where tissue can be placed on a Visium slide. Each area is intended to contain only one tissue sample. Slide areas are named consecutively from top to bottom: A1, B1, C1, D1 for Visium slides with 6.5 mm Capture Area and A, B for CytAssist slides with 11 mm Capture Area. Both CytAssist slides with 6.5 mm Capture Area and Gateway Slides contain only two slide areas, A1 and D1. |
nan |
False |
Slide Version |
Version of imaging slide used. Slide version is critical for the analysis of the sequencing data as different slides have different capture area layouts. |
nan |
False |
Slide ID |
For Visium, it is the unique identifier printed on the label of each Visium slide. The serial number starts with V, followed by a number ranging from one through five, and ends with a dash and a three-digit number, such as 123. For CosMx, this refers to the loaded Flow Cell ID. For Xenium, this ID indicates the slide orientation, as it matches the relative location of the ID on the physical Xenium slide. |
nan |
False |
Image Re-orientation |
To ensure good fiducial alignment and tissue spot detection, it is important to correct for any shift in image orientation. |
nan |
False |
Permeabilization Time |
Fixed and stained tissue sections are permeabilized for different times. Each Capture Area captures polyadenylated mRNA from the attached tissue section. Measure is provided in minutes. |
nan |
False |
Whitelist Spatial Barcode File Link |
Link to file listing all possible spatial barcodes. URL |
nan |
True |
Spatial Barcode Tag |
SAM tag for spot barcode field; please provide a valid spot barcode tag (e.g. CB:Z) |
nan |
True |
Spatial Barcode Offset |
Offset in sequence for spot barcode read (in bp): number |
nan |
True |
Spatial Barcode Length |
Length of spot barcode read (in bp): number |
nan |
True |
Spatial Read1 |
Read 1 content description |
nan |
True |
Spatial Read2 |
Read 2 content description |
nan |
True |
Spatial Library Construction Method |
Process which results in the creation of a library from fragments of DNA using cloning vectors or oligonucleotides with the role of adaptors [OBI_0000711] |
nan |
True |
Spatial Barcode and UMI |
Spot and transcript identifiers |
Spatial Barcode Offset, Spatial Barcode Length, UMI Barcode Offset, UMI Barcode Length |
True |
Mean Reads per Spatial Spot |
The number of reads, both under and outside of tissue, divided by the number of barcodes associated with a spot under tissue. |
nan |
True |
Visium Workflow Type |
Generic name for the workflow used to analyze the Visium data set. |
nan |
True |
Visium Workflow Parameters Description |
Parameters used to run the workflow. |
nan |
True |
Spots under tissue |
The number of barcodes associated with a spot under tissue. |
nan |
True |
Median UMI Counts per Spot |
The median number of UMI counts per tissue covered spot. |
nan |
True |
Sequencing Saturation |
The fraction of reads originating from an already-observed UMI. This is a function of library complexity and sequencing depth. More specifically, this is the fraction of confidently mapped, valid spot-barcode, valid UMI reads that had a non-unique (spot-barcode, UMI, gene). |
nan |
True |
Proportion Reads Mapped to Transcriptome |
Fraction of reads that mapped to a unique gene in the transcriptome. The read must be consistent with annotated splice junctions. These reads are considered for UMI counting. |
nan |
True |
Median Number Genes per Spatial Spot |
The median number of genes detected per spot under tissue-associated barcode. Detection is defined as the presence of at least 1 UMI count. |
nan |
True |
NanoString GeoMx DSP Spatial Transcriptomics Level 1 |
Files contain raw data output from the NanoString GeoMx DSP Pipeline. These can include RCC or DCC Files. |
Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, Synapse ID of GeoMx DSP PKC File, GeoMx DSP NGS Sequencing Platform, GeoMx DSP NGS Library Selection Method, GeoMx DSP NGS Library Preparation Kit Name, GeoMx DSP Library Preparation Kit Vendor, GeoMx DSP Library Preparation Kit Version, Synapse ID of GeoMx Lab Worksheet File, Software and Version |
False |
GeoMx DSP Assay Type |
The assay type which was used for the GeoMx DSP pipeline. |
nan |
True |
Synapse ID of GeoMx DSP PKC File |
The Synapse ID(s) associated with the PKC mapping file for the assay. Multiple files are listed as comma separated values. |
nan |
True |
GeoMx DSP NGS Sequencing Platform |
A platform is an object aggregate that is the set of instruments and software needed to perform a process [OBI_0000050]. Specific model of the sequencing instrument. |
nan |
False |
GeoMx DSP NGS Library Selection Method |
How RNA molecules are isolated. |
nan |
False |
GeoMx DSP NGS Library Preparation Kit Name |
Name of Library Preparation Kit. String |
nan |
False |
GeoMx DSP Library Preparation Kit Vendor |
Vendor of Library Preparation Kit. String |
nan |
False |
GeoMx DSP Library Preparation Kit Version |
Version of Library Preparation Kit. String |
nan |
False |
Synapse ID of GeoMx Lab Worksheet File |
Synapse ID(s) of Lab Worksheet Files output from the GeoMx DSP workflow. Multiple files are listed as comma separated values. |
nan |
False |
NanoString GeoMx DSP Spatial Transcriptomics Level 3 |
Files contain processed data from the NanoString GeoMx DSP Pipeline. This level depends on GeoMx Level 1 and Imaging Level 2. |
Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, GeoMx DSP Assay Type, Synapse ID of GeoMx DSP ROI Segment Annotation File, GeoMx DSP Unique Probe Count, GeoMx DSP Unique Target Count, GeoMx DSP Genomic Reference, Matrix Type, GeoMx DSP Workflow Type, GeoMx DSP Workflow Parameter Description, GeoMx DSP Workflow Link |
False |
Synapse ID of GeoMx DSP ROI Segment Annotation File |
Synapse ID(s) for ROI/Segmentation annotations in the GeoMx DSP experiment. |
nan |
True |
GeoMx DSP Genomic Reference |
Exact version of the human genome reference used in the alignment of reads (e.g. https://www.gencodegenes.org/human/). Only applicable to some applications in GeoMx |
nan |
False |
GeoMx DSP Unique Probe Count |
Total number of unique probes reported. |
nan |
False |
GeoMx DSP Unique Target Count |
Total number of unique genes reported. |
nan |
False |
GeoMx DSP Workflow Type |
Generic name for the workflow used to analyze the GeoMx DSP data set. |
nan |
False |
GeoMx DSP Workflow Parameter Description |
Parameters used to run the GeoMx DSP workflow. |
nan |
False |
GeoMx DSP Workflow Link |
Link to workflow or command. Dockstore.org recommended. URL |
nan |
False |
NanoString GeoMx DSP ROI RCC Segment Annotation Metadata |
GeoMx ROI and Segment Metadata Attributes. The assayed biospecimen should be reported one per row with the associated ROI coordinates. |
HTAN Parent Biospecimen ID, Scan name, ROI name, Segment name, ROI X Coordinate, ROI Y Coordinate, Tags, QC status, Scan Height, Scan Width, Scan Offset X, Scan Offset Y, Binding Density, Positive norm factor, Surface area, Nuclei count, Tissue Stain |
False |
Scan name |
GeoMx Scan name (as appears in Segment Summary) |
nan |
True |
ROI name |
ROI name (application generated). For Xenium this is referred to as the “region name” |
nan |
True |
Segment name |
Name given to segment at time of generation |
nan |
True |
Tags |
Unique descriptor of a variable group (e.g. MAPK+) |
nan |
True |
ROI X Coordinate |
X location within the image |
nan |
True |
ROI Y Coordinate |
Y location within the image |
nan |
True |
QC status |
ROI quality control flag as reported by the application |
nan |
False |
Scan Height |
Height of the scan for GeoMx Analysis |
nan |
True |
Scan Width |
Width of the scan for GeoMx Analysis |
nan |
True |
Scan Offset X |
Offset X of the scan for GeoMx Analysis |
nan |
True |
Scan Offset Y |
Offset Y of the scan for GeoMx Analysis |
nan |
True |
Binding Density |
The binding density as reported by the application |
nan |
False |
Positive norm factor |
The Positive Control Normalization factor calculated using pos-hyb controls |
nan |
False |
Surface area |
Surface area of the ROI in square microns (µm^2). In CosMx, this is referred to as the Scan Area. In Xenium, this is referred to as the Region Area |
nan |
True |
Nuclei count |
Number of nuclei detected in the segment (if applicable) |
nan |
True |
Tissue Stain |
e.g. CD45 or PanCK (if masking was performed) |
nan |
False |
NanoString GeoMx DSP ROI DCC Segment Annotation Metadata |
GeoMx ROI and Segment Metadata Attributes. The assayed biospecimen should be reported one per row with the associated ROI coordinates. |
HTAN Parent Biospecimen ID, Scan name, Slide name, ROI name, Segment name, ROI X Coordinate, ROI Y Coordinate, Tags, Scan Height, Scan Width, Scan Offset X, Scan Offset Y, Surface area, Nuclei count, Sequencing Saturation, MapQ30, Raw reads, Stitched reads, Aligned reads, Deduplicated reads, In Situ Negative median, Biological probe median |
False |
Slide name |
Similar to a Run ID, the slide name indicates the slide a given ROI is linked to (as reported in Segment Summary). |
nan |
False |
Raw reads |
Reads not yet analyzed in any way to be used for data analysis. The number of reads that pass filter from the flow cell represented in the FASTQ file. |
nan |
False |
Stitched reads |
Consensus reads built from the overlapping sequence of read 1 and read 2. Reported as the percentage of aligned reads that overlapped and were consensus-confirmed; usually upward of 80%, though fewer in absolute number than the aligned reads. |
nan |
False |
Aligned reads |
Reads whose sequence has been aligned to a gene/probe. Typically these reads number from the hundreds of thousands to tens of millions. In GeoMx, alignment is performed by mapping the RTS ID to a whitelist of sequences that represent targets. |
nan |
False |
Deduplicated reads |
The replacement of blocks of duplicate data with a Virtual Index Pointer linking the new sub-block to the existing block of data in a duplicate repository. This is used to reduce the amount of space needed to store the data. |
nan |
False |
In Situ Negative median |
The median count of all negative control probes for a given segment. A measure of signal to background for each segment. |
nan |
False |
Biological probe median |
The median count from all probes except the negative control probes. A measure of signal to background for each segment. |
nan |
False |
HI-C-seq Level 1 |
Unaligned sequence data |
Component, HTAN Parent Biospecimen ID, HTAN Data File ID, Filename, File Format, Genomic Reference, Sequencing Platform, Nucleic Acid Source, Technical Replicate Group, Transposition Reaction, Crosslinking Condtion, DNA Digestion Condition, Nuclei Permeabilization Method, Ligation Condition, Biotin Enrichment, DNA Input Amount, Total Reads, Protocol Link |
False |
HI-C-seq Level 2 |
Aligned read pairs, contact matrix |
Component, HTAN Data File ID, HTAN Parent Data File ID, Filename, File Format, Genomic Reference, Aligned Read Length, Tool, Resolution, Normalization Method |
False |
HI-C-seq Level 3 |
Summary data for the HI-C-seq assay. |
Component, HTAN Parent Data File ID, HTAN Data File ID, Filename, File Format, Genomic Reference, Stripe Calling, Loop Window, Stripe Window, Loop Calling |
False |
Crosslinking Condtion |
Detailed condition for DNA crosslinking |
nan |
True |
DNA Digestion Condition |
Enzymes and treatment length/temperature for genome digestion |
nan |
True |
Nuclei Permeabilization Method |
Detergent and treatment condition for nuclei permeabilization and crosslinking softening |
nan |
True |
Ligation Condition |
Name of ligase and condition for proximity ligation |
nan |
True |
Biotin Enrichment |
Whether biotin is used for enriching ligation product |
nan |
True |
DNA Input Amount |
Amount of DNA for library construction, in nanograms. |
nan |
True |
Resolution |
Binning size used for generating the contact matrix, in base pairs. |
nan |
True |
Stripe Calling |
Tool used for identifying architectural stripe-forming interaction hotspots. |
nan |
True |
Loop Window |
Binning size used for calling significant dot interactions (loops) |
nan |
True |
Stripe Window |
Binning size used for calling significant architectural stripes. Can be an integer or comma-separated list of integers indicating bin size and sliding window size if different. |
nan |
True |
Loop Calling |
Tool used for identifying loop interactions |
nan |
True |
Imaging Level 4 |
Derived imaging data: Object-by-feature array |
Component, Filename, File Format, HTAN Parent Data File ID, HTAN Parent Channel Metadata ID, HTAN Data File ID, Parameter file, Software and Version, Commit SHA, Number of Objects, Number of Features, Imaging Object Class, Imaging Summary Statistic |
False |
SRRS Imaging Level 2 |
SRRS-specific HTAN raw and pre-processed image data |
Component, Filename, File Format, HTAN Participant ID, HTAN Parent Biospecimen ID, HTAN Data File ID, Channel Metadata Filename, Imaging Assay Type, Protocol Link, Software and Version, Microscope, Objective, NominalMagnification, Pyramid, Zstack, Tseries, Passed QC, Frame Averaging, Image ID, DimensionOrder, PhysicalSizeX, PhysicalSizeXUnit, PhysicalSizeY, PhysicalSizeYUnit, Pixels BigEndian, PlaneCount, SizeC, SizeT, SizeX, SizeY, SizeZ, PixelType |
False |
10X Genomics Xenium ISS Experiment |
All data pertaining to the 10X Genomics Xenium In-Situ Hybridization experiment |
Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, Xenium Bundle Contents, Slide ID, ROI name, Panel Name, Protocol Link, Software and Version, Total Number of Cells, Total Number of Targets, Surface area, Experiment IF Channels, Transcripts per Cell, Percent of Transcripts within Cells, Decoded Transcripts, Xenium IF image HTAN File ID, Xenium HE image HTAN File ID |
False |
Xenium Bundle Contents |
A comma separated list of filenames within the Xenium bundle zip file |
nan |
True |
Panel Name |
The human-readable panel name. This could be the Gene Panel name or Protein Panel name. In Xenium, this refers to the string entered as the name in panel specification (e.g. Xenium Human Immuno-Oncology Add-on B Gene Expression). In CosMx, this refers to the panel name as it appears in the CosMx catalog (e.g. CosMx Human Universal Cell Characterization Panel (1000-plex)) |
nan |
True |
Total Number of Cells |
The total number of cells analyzed on the flow cell |
nan |
True |
Total Number of Targets |
Refers to the target of an assay. Can be genes/transcripts or probes |
nan |
True |
Experiment IF Channels |
A comma-separated list with any number of channels the user deems appropriate (Example: PanCK, CD45, CD3, DAPI) |
nan |
True |
Transcripts per Cell |
Mean or Median transcript count per cell analyzed on the flow cell or slide |
nan |
True |
Percent of Transcripts within Cells |
The percentage of transcripts assigned to assayed cells |
nan |
True |
Decoded Transcripts |
In Xenium, this is the number of high-quality, decoded-to-gene nuclear transcripts divided by the total segmented nuclear area to get a transcript density (units are reported in 100um^2). |
nan |
True |
Xenium IF image HTAN File ID |
The HTAN Data File ID of an Imaging Level 2 file |
nan |
False |
Xenium HE image HTAN File ID |
The HTAN Data File ID of an Imaging Level 2 file |
nan |
False |
RPPA Level 2 |
Array-based proteomics. Each dilution curve of spot intensities is fitted using the monotone increasing B-spline model in the SuperCurve R package. This fits a single curve using all the samples on a slide with the signal intensity as the response variable and the dilution steps as independent variables. The fitted curve is plotted with the signal intensities on the y-axis and the log2-concentration of proteins on the x-axis for diagnostic purposes. |
Component, Filename, File Format, HTAN Participant ID, HTAN Parent Biospecimen ID, HTAN Parent Data File ID, HTAN Data File ID, HTAN RPPA Antibody Table, Assay Type, Protocol Link, Software and Version |
False |
HTAN RPPA Antibody Table |
A table containing antibody level metadata for RPPA |
HTAN RPPA Antibody Table ID, Filename, File Format, Ab Name Reported on Dataset, GENCODE Gene Symbol Target, UNIPROT Protein ID Target, Phosphoprotein Flag, Vendor, Catalog Number, Internal Ab ID, Species, RPPA Dilution, Phospho Site, RPPA Validation Status, Clone, Clonality, Antibody Notes |
True |
RPPA Level 3 |
Level 3 Reverse Phase Protein Array (RPPA) data contains intra-batch normalized intensities. |
Component, Filename, File Format, HTAN Participant ID, HTAN Parent Biospecimen ID, HTAN Parent Data File ID, HTAN Data File ID, Assay Type, Software and Version, Normalization Method |
False |
RPPA Level 4 |
Level 4 Reverse Phase Protein Array (RPPA) data contains intra-batch corrected intensities. |
Component, Filename, File Format, HTAN Participant ID, HTAN Parent Biospecimen ID, HTAN Parent Data File ID, HTAN Data File ID, Assay Type, Batch Correction Method |
False |
Nanostring CosMx SMI Experiment |
RNA and Protein Panel assays applied as part of Nanostring CosMx Spatial Molecular Imager (SMI) |
Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, CosMx Bundle Contents, Slide ID, CosMx Assay Type, Panel Name, Protocol Link, Software and Version, Total Number of Cells, Total Number of Targets, Number of FOVs, Surface area, Experiment IF Channels, Transcripts per Cell, Percent of Transcripts within Cells, Mean Total Transcripts per Area, Unique Genes, Total Negative Probe Counts |
False |
CosMx Bundle Contents |
A comma separated list of filenames within the CosMx bundle zip file |
nan |
True |
CosMx Assay Type |
The specification for barcodes on each image. Either RNA probe or protein antibody according to the assay |
nan |
True |
Number of FOVs |
The total number of FOVs recorded for the sample on a single flow cell |
nan |
True |
Mean Total Transcripts per Area |
The mean total transcripts per um^3 |
nan |
True |
Unique Genes |
The total unique genes detected above background |
nan |
False |
Total Negative Probe Counts |
Mean Total Negative probe counts/cell |
nan |
True |
Mass Spectrometry Level 1 |
Mass Spectrometry derived data that includes proteomics, metabolomics, and lipidomics, level 1 |
Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, MS Batch ID, MS-based Assay Type, Analyte Type, MS-based Targeted, MS Instrument Vendor and Model, MS Source, Polarity, Mass Range Low Value, Mass Range High Value, Data Collection Mode, MS Scan Mode, MS Labeling, Protocol Link, LC Instrument Vendor and Model, LC Column Vendor and Model, LC Resin, LC Length Value, LC Temp Value, LC ID Value, LC Flow Rate, LC Gradient, LC Mobile Phase A, LC Mobile Phase B, Software and Version, MS Instrument Metadata File |
False |
Mass Spectrometry Level 2 |
Mass Spectrometry derived data that includes proteomics, metabolomics, and lipidomics, level 2 |
Component, Filename, File Format, HTAN Data File ID, HTAN Parent Biospecimen ID, HTAN Parent Data File ID, MS Assay Category, Software and Version, Mass Spectrometry Auxiliary File |
False |
Mass Spectrometry Level 3 |
Mass Spectrometry derived data that includes proteomics, metabolomics, and lipidomics, level 3 |
Component, Filename, File Format, HTAN Data File ID, HTAN Parent Biospecimen ID, HTAN Parent Data File ID, MS Assay Category, Software and Version, Mass Spectrometry Auxiliary File |
False |
Mass Spectrometry Level 4 |
Mass Spectrometry derived data that includes proteomics, metabolomics, and lipidomics, level 4 |
Component, Filename, File Format, HTAN Data File ID, HTAN Parent Biospecimen ID, HTAN Parent Data File ID, MS Assay Category, Software and Version, Mass Spectrometry Auxiliary File |
False |
Mass Spectrometry Auxiliary File |
Auxiliary software parameter file used in mass spectrometry data processing, recorded as a Synapse ID (e.g. syn12345). |
Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID |
False |
Imaging Level 3 Channels |
Channel-level Metadata Attributes |
HTAN Channel Metadata ID, Channel ID, Channel Name, Channel Passed QC, Cycle Number, Sub Cycle Number, Antibody Role, Target Name, Antibody Name, RRID identifier, Fluorophore, Clone, Lot, Vendor, Catalog Number, Excitation Wavelength, Emission Wavelength, Excitation Bandwidth, Emission Bandwidth, Metal Isotope Element, Metal Isotope Mass, Oligo Barcode Upper Strand, Oligo Barcode Lower Strand, Dilution, Concentration |
False |
HTAN Channel Metadata ID |
HTAN ID for this channel metadata table (same for all rows) |
nan |
True |
Channel ID |
This must match the corresponding field in the OME-XML / TIFF header. (eg 'Channel:0:1') |
nan |
True |
Channel Name |
This must match the corresponding field in the OME-XML / TIFF header. (eg 'Blue' or 'CD45' or 'E-cadherin') |
nan |
True |
Channel Passed QC |
Identify stains that did not pass QC but are included in the dataset. |
nan |
True |
No - Channel Failed QC |
Channel failed QC |
Channel QC Failure Type |
False |
Channel QC Failure Type |
Reason the channel failed QC |
nan |
False |
Other/multiple channel QC faliure types |
QC failure type not specified |
Channel QC Failure Comment |
False |
Channel QC Failure Comment |
Custom comment on channel QC failure |
nan |
False |
Cycle Number |
The cycle # in which the co-listed reagent(s) was(were) used. Integer >= 1 (up to number of cycles) |
nan |
False |
Sub Cycle Number |
Sub cycle number |
nan |
False |
Target Name |
Short descriptive name (abbreviation) for this target (antigen) |
nan |
True |
Antibody Role |
Is this antibody acting as a primary or secondary antibody? |
nan |
True |
Antibody Name |
Antibody Name (free text (eg “Keratin”, “CD163”, “DNA”)) |
nan |
True |
RRID identifier |
Research Resource Identifier (eg “RRID: AB_394606”) |
nan |
True |
Fluorophore |
Fluorescent dye label (eg Alexa Fluor 488) |
nan |
False |
Clone |
Clone |
nan |
False |
Lot |
Lot number from vendor |
nan |
False |
Vendor |
Vendor |
nan |
False |
Catalog Number |
Catalog Number |
nan |
False |
Excitation Wavelength |
Center/peak of the excitation spectrum (nm) |
nan |
False |
Emission Wavelength |
Center/peak of the emission spectrum (nm) |
nan |
False |
Excitation Bandwidth |
Nominal width of excitation spectrum (nm) |
nan |
False |
Emission Bandwidth |
Nominal width of emission spectrum (nm) |
nan |
False |
Metal Isotope Element |
Element abbreviation. eg “La” or “Nd” |
nan |
False |
Metal Isotope Mass |
Element mass number |
nan |
False |
Oligo Barcode Upper Strand |
Oligo Barcode - Upper Strand |
nan |
False |
Oligo Barcode Lower Strand |
Oligo Barcode - Lower Strand |
nan |
False |
Dilution |
Dilution (eg 1:1000) |
nan |
False |
Concentration |
Concentration (eg 10ug/mL) |
nan |
False |
Imaging Assay Type |
Type of imaging assay |
nan |
True |
Channel Metadata Filename |
Full path within Synapse project of uploaded companion CSV file containing channel-level metadata details |
nan |
True |
Microscope |
Microscope type (manufacturer, model, etc) used for this experiment |
nan |
True |
Objective |
Objective |
nan |
False |
NominalMagnification |
The magnification of the lens as specified by the manufacturer, e.g. '60' is a 60X lens. Floating point value > 1 (no units) |
nan |
True |
LensNA |
The numerical aperture of the lens. Floating point value > 0. |
nan |
False |
WorkingDistance |
The working distance of the lens, expressed as a floating point number. Floating point > 0. |
WorkingDistanceUnit |
False |
WorkingDistanceUnit |
The units of the working distance. See OME enumeration of allowed values for the UnitsLength attribute -- default: microns (um) |
nan |
False |
Immersion |
Immersion medium |
nan |
False |
Pyramid |
Does data file contain pyramid of images |
nan |
True |
Zstack |
Does data file contain a Z-stack of images |
nan |
True |
Tseries |
Does data file contain a time-series of images |
nan |
True |
Passed QC |
Did all channels pass QC? (If not, add a free-text Comment) |
nan |
True |
No - Channels QC |
Not all channels passed QC |
Comment |
False |
Comment |
Free text field (generally for QC comment) |
nan |
False |
FOV number |
Index of FOV (as it pertains to its sequence order). Integer >= 1 |
nan |
False |
FOVX |
Field of view X dimension. Floating point value |
FOVXUnit |
False |
FOVXUnit |
Field of view X dimension units. See OME enumeration of allowed values for the UnitsLength attribute -- default: microns (um) |
nan |
False |
FOVY |
Field of view Y dimension. Floating point value |
FOVYUnit |
False |
FOVYUnit |
Field of view Y dimension units. See OME enumeration of allowed values for the UnitsLength attribute -- default: microns (um) |
nan |
False |
Frame Averaging |
Number of frames averaged together (if no averaging, set to 1). Integer >= 1 |
nan |
False |
Image ID |
Unique internal image identifier. eg "Image:0". (To be extracted from OME-XML) |
nan |
True |
DimensionOrder |
The order in which the individual planes of data are interleaved. |
nan |
True |
PhysicalSizeX |
Physical size (X-dimension) of a pixel. Units are set by PhysicalSizeXUnit. Floating point value > 0. |
PhysicalSizeXUnit |
True |
PhysicalSizeXUnit |
The units of the physical size of a pixel. See OME enumeration of allowed values for the UnitsLength attribute -- default: microns (um) |
nan |
True |
PhysicalSizeY |
Physical size (Y-dimension) of a pixel. Units are set by PhysicalSizeYUnit. Floating point value > 0. |
PhysicalSizeYUnit |
True |
PhysicalSizeYUnit |
The units of the physical size of a pixel. See OME enumeration of allowed values for the UnitsLength attribute -- default: microns (um) |
nan |
True |
PhysicalSizeZ |
Physical size (Z-dimension) of a pixel. Units are set by PhysicalSizeZUnit. Floating point value > 0. |
PhysicalSizeZUnit |
True |
PhysicalSizeZUnit |
The units of the physical size of a pixel. See OME enumeration of allowed values for the UnitsLength attribute -- default: microns (um) |
nan |
True |
Pixels BigEndian |
Boolean (True/False) |
nan |
True |
PlaneCount |
Number of Z-planes (not to be confused with downsampled "pyramid"). Integer >= 1 |
nan |
True |
SizeC |
Number of channels. Integer >= 1 |
nan |
True |
SizeT |
Number of time points. Integer >= 1 |
nan |
True |
SizeX |
Size of image: X dimension (in pixels). Integer >= 1 |
nan |
True |
SizeY |
Size of image: Y dimension (in pixels). Integer >= 1 |
nan |
True |
SizeZ |
Size of image: Z dimension (in pixels). Integer >= 1 |
nan |
True |
PixelType |
Data type for each pixel value. E.g. "uint16" |
nan |
True |
Imaging Segmentation Data Type |
Specifies how the segmentation is stored |
nan |
True |
Parameter file |
Path in Synapse to a text file listing algorithm version numbers and relevant parameters needed to reproduce the analysis |
nan |
False |
Commit SHA |
Short SHA for software version [8 hexadecimal characters (for github), comma separated if multiple] |
nan |
False |
Imaging Object Class |
Defines the structure that the mask delineates |
nan |
True |
Imaging Object Class Other |
Imaging Object Class Other |
Imaging Object Class Description |
False |
Imaging Object Class Description |
Free text description of object class [string] |
nan |
True |
Number of Objects |
The number of objects (eg cells) described |
nan |
True |
Number of Features |
The number of features (eg channels) described |
nan |
True |
Imaging Summary Statistic |
Function used to summarize object/feature intensity |
nan |
False |
Nucleic Acid Source |
The source of the input nucleic acid molecule |
nan |
True |
Micro-region Seq Platform |
The platform used for micro-regional RNA sequencing (if applicable) |
nan |
False |
ROI Tag |
The tag or grouping used to identify the ROI in micro-regional RNA sequencing (if applicable). Must match the ROI tag within the count matrix in level 3. |
nan |
False |
Single Cell Isolation Method |
The method by which cells are isolated into individual reaction containers at a single cell resolution (e.g. wells, micro-droplets) |
nan |
True |
Dissociation Method |
The tissue dissociation method used for scRNASeq or scATAC-seq assays |
nan |
True |
Library Layout |
Sequencing read type |
nan |
True |
Nucleus Identifier |
Unique nuclei barcode; added at transposition step. Determines which nucleus the reads originated from |
nan |
True |
Nuclei Barcode |
Nuclei Barcode |
nan |
False |
scATACseq Library Layout |
Sequencing read type |
nan |
True |
Nuclei Barcode Read |
Nuclei Barcode Read |
nan |
True |
Nuclei Barcode Length |
Nuclei Barcode Length |
nan |
True |
scATACseq Paired End |
A library layout type |
nan |
False |
scATACseq Read1 |
Read 1 content description |
nan |
True |
scATACseq Read2 |
Read 2 content description |
nan |
True |
scATACseq Read3 |
Read 3 content description |
nan |
False |
scmCseq Read1 |
Read 1 content description |
nan |
True |
scmCseq Read2 |
Read 2 content description |
nan |
True |
scmCseq Read3 |
Read 3 content description |
nan |
True |
Threshold for Minimum Passing Reads |
Threshold for calling cells |
nan |
True |
Total Number of Passing Nuclei |
Number of nuclei sequenced |
nan |
True |
Median Fraction of Reads in Peaks |
Median fraction of reads in peaks (FRIP) |
Peaks Calling Software |
True |
Median Fraction of Reads in Annotated cis DNA Elements |
Median fraction of reads in annotated cis-DNA elements (FRIADE) |
Peaks Calling Software |
True |
Median Passing Read Percentage |
Non-PCR duplicate nuclear genomic sequence reads not aligning to unanchored contigs out of total reads assigned to the nucleus barcode |
nan |
True |
Median Percentage of Mitochondrial Reads per Nucleus |
Contamination from mitochondrial sequences |
nan |
True |
Peaks Calling Software |
Generic name of peaks calling tool |
nan |
True |
Read Indicator |
Indicate if this is Read 1 (R1), Read 2 (R2), Index Reads (I1), or Other |
nan |
True |
Read1 |
Read 1 content description |
nan |
True |
Read2 |
Read 2 content description |
nan |
True |
cDNA |
Complementary DNA. A DNA copy of an mRNA or complex sample of mRNAs, made using reverse transcriptase |
cDNA Offset, cDNA Length |
False |
cDNA Offset |
Offset in sequence for cDNA read (in bp): number |
nan |
True |
cDNA Length |
Length of cDNA read (in bp): number |
nan |
True |
Cell Barcode and UMI |
Cell and transcript identifiers |
UMI Barcode Offset, UMI Barcode Length, Median UMIs per Cell Number, Cell Barcode Offset, Cell Barcode Length, Valid Barcodes Cell Number |
False |
Cell Barcode Offset |
Offset in sequence for cell barcode read (in bp): number |
nan |
True |
Cell Barcode Length |
Length of cell barcode read (in bp): number |
nan |
True |
Valid Barcodes Cell Number |
Number |
nan |
True |
UMI Barcode Offset |
Start position of UMI barcode in the sequence. Values: number, 0 for start of read |
nan |
True |
UMI Barcode Length |
Length of UMI barcode read (in bp): number |
nan |
True |
Median UMIs per Cell Number |
Number |
nan |
True |
Cell Median Number Reads |
Median number of reads per cell. Number |
nan |
True |
Cell Median Number Genes |
Median number of genes detected per cell. Number |
nan |
True |
Cell Total |
Number of sequenced cells. Applies to raw counts matrix only. |
nan |
True |
Library Construction Method |
Process which results in the creation of a library from fragments of DNA using cloning vectors or oligonucleotides with the role of adaptors [OBI_0000711] |
nan |
True |
Input Cells and Nuclei |
Number of cells and number of nuclei input; entry format: number, number |
nan |
True |
CEL-seq2 |
Highly-multiplexed plate-based single-cell RNA-Seq assay |
Empty Well Barcode, Well Index |
False |
Empty Well Barcode |
Unique cell barcode assigned to empty cells used as controls in CEL-seq2 assays. |
nan |
True |
Well Index |
Indicate if protein expression (EPCAM/CD45) positive/negative data is available for each cell in CEL-seq2 assays |
nan |
False |
Library Preparation Days from Index |
Number of days between when the sample for the assay was received in the lab and when the libraries were prepared for sequencing [number]. If not applicable, please enter 'Not Applicable' |
nan |
False |
Single Cell Dissociation Days from Index |
Number of days between when the sample for the single cell assay was received in the lab and when the sample was dissociated and cells were isolated [number]. If not applicable, please enter 'Not Applicable' |
nan |
True |
Sequencing Library Construction Days from Index |
Number of days between when the sample for the assay was received in the lab and the day of sequencing library construction [number]. If not applicable, please enter 'Not Applicable' |
nan |
True |
Nucleic Acid Capture Days from Index |
Number of days between when the sample for the single cell assay was received in the lab and the day of the nucleic acid capture step of library construction [number]. If not applicable, please enter 'Not Applicable' |
nan |
True |
Cryopreserved Cells in Sample |
Indicate if library preparation was based on revived frozen cells. |
nan |
True |
End Bias |
The end of the cDNA molecule that is preferentially sequenced, e.g. 3/5 prime tag/end or the full length transcript |
nan |
True |
Reverse Transcription Primer |
An oligo to which new deoxyribonucleotides can be added by DNA polymerase [SO_0000112]. The type of primer used for reverse transcription, e.g. oligo-dT or random primer. This allows users to identify content of the cDNA library input e.g. enriched for mRNA |
nan |
True |
Feature barcoding |
A method for adding extra channels of information to cells by running single-cell gene expression in parallel with other assays [https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/feature-bc] |
Feature Reference Id |
False |
Feature Reference Id |
Unique ID for this feature. Must not contain whitespace, quote or comma characters. Each ID must be unique and must not collide with a gene identifier from the transcriptome [https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/feature-bc-analysis#feature-ref] |
nan |
True |
Spike In |
A set of synthetic RNA molecules of known sequence that are added to the cell lysis mix |
nan |
True |
ERCC |
The External RNA Controls Consortium (ERCC) spike in set is commonly used in single-cell experiments for normalization |
Spike In Concentration |
False |
Spike In Concentration |
The final concentration or dilution (for commercial sets) of the spike in mix [PMID:21816910] |
nan |
True |
Sequencing Platform |
A platform is an object aggregate that is the set of instruments and software needed to perform a process [OBI_0000050]. Specific model of the sequencing instrument. |
nan |
True |
Technical Replicate Group |
A common term for all files belonging to the same cell or library. Provide a numbering of each library prep batch (can differ from encapsulation and sequencing batch) |
nan |
False |
Total Number of Input Cells |
Number of cells loaded/placed on plates |
nan |
True |
Sequencing Batch ID |
Links samples to a specific local sequencer run. Can be string or 'null' |
nan |
True |
Single Nucleus Buffer |
Nuclei isolation buffer |
nan |
True |
Transposition Reaction |
Name of the transposase, transposon sequences |
nan |
True |
Read Length |
The length of the sequencing reads. Can be integer, null |
nan |
True |
Target Capture Kit |
Description that can uniquely identify a target capture kit. Suggested value is a combination of vendor, kit name, and kit version. |
nan |
True |
Library Selection Method |
How RNA molecules are isolated. |
nan |
True |
Library Preparation Kit Name |
Name of Library Preparation Kit. String |
nan |
True |
Library Preparation Kit Vendor |
Vendor of Library Preparation Kit. String |
nan |
True |
Library Preparation Kit Version |
Version of Library Preparation Kit. String |
nan |
True |
Adapter Name |
Name of the sequencing adapter. String |
nan |
False |
Adapter Sequence |
Base sequence of the sequencing adapter. String |
nan |
False |
Base Caller Name |
Name of the base caller. String |
nan |
False |
Base Caller Version |
Version of the base caller. String |
nan |
False |
Flow Cell Barcode |
Flow cell barcode. Wrong or missing information may affect analysis results. String |
nan |
False |
Fragment Maximum Length |
Maximum length of the sequenced fragments (e.g., as predicted by Agilent Bioanalyzer). Integer |
nan |
False |
Fragment Mean Length |
Mean length of the sequenced fragments (e.g., as predicted by Agilent Bioanalyzer). Number |
nan |
False |
Fragment Minimum Length |
Minimum length of the sequenced fragments (e.g., as predicted by Agilent Bioanalyzer). Integer |
nan |
False |
Fragment Standard Deviation Length |
Standard deviation of the sequenced fragments length (e.g., as predicted by Agilent Bioanalyzer). Number |
nan |
False |
Lane Number |
The basic machine unit for sequencing. For Illumina machines, this reflects the physical lane number. Wrong or missing information may affect analysis results. Integer |
nan |
False |
Library Strand |
Library stranded-ness. |
nan |
False |
Multiplex Barcode |
The barcode/index sequence used. Wrong or missing information may affect analysis results. String |
nan |
False |
Size Selection Range |
Range of size selection. String |
nan |
False |
Target Depth |
The targeted read depth prior to sequencing. Integer |
nan |
False |
To Trim Adapter Sequence |
Does the user suggest adapter trimming? |
nan |
False |
Yes - Trim Adapter Sequence |
Trim adapter sequence |
nan |
False |
Adapter Trimmer Name |
Name of adapter trimmer |
nan |
False |
Adapter Trimmer Version |
Version of the adapter trimmer |
nan |
False |
Adapter Trimmer Options |
Options used by adapter trimmer |
nan |
False |
Transcript Integrity Number |
Used to describe the quality of the starting material, especially with regard to FFPE samples. Number |
nan |
False |
RIN |
A numerical assessment of the integrity of RNA based on the entire electrophoretic trace of the RNA sample including the presence or absence of degradation products. Number |
nan |
False |
DV200 |
Represents the percentage of RNA fragments that are >200 nucleotides in size. Number |
nan |
False |
Adapter Content |
State classification given by FASTQC for the metric. Metric specific details about the states are available on their website. |
nan |
False |
Basic Statistics |
State classification given by FASTQC for the metric. Metric specific details about the states are available on their website. |
nan |
False |
Encoding |
Version of ASCII encoding of quality values found in the file. String |
nan |
False |
Kmer Content |
State classification given by FASTQC for the metric. Metric specific details about the states are available on their website. |
nan |
False |
Overrepresented Sequences |
State classification given by FASTQC for the metric. Metric specific details about the states are available on their website. |
nan |
False |
Per Base N Content |
State classification given by FASTQC for the metric. Metric specific details about the states are available on their website. |
nan |
False |
Per Base Sequence Content |
State classification given by FASTQC for the metric. Metric specific details about the states are available on their website. |
nan |
False |
Per Base Sequence Quality |
State classification given by FASTQC for the metric. Metric specific details about the states are available on their website. |
nan |
False |
Per Sequence GC Content |
State classification given by FASTQC for the metric. Metric specific details about the states are available on their website. |
nan |
False |
Per Sequence Quality Score |
State classification given by FASTQC for the metric. Metric specific details about the states are available on their website. |
nan |
False |
Per Tile Sequence Quality |
State classification given by FASTQC for the metric. Metric specific details about the states are available on their website. |
nan |
False |
Percent GC Content |
The overall %GC of all bases in all sequences. Integer |
nan |
False |
Sequence Duplication Levels |
State classification given by FASTQC for the metric. Metric specific details about the states are available on their website. |
nan |
False |
Sequence Length Distribution |
State classification given by FASTQC for the metric. Metric specific details about the states are available on their website. |
nan |
False |
Total Reads |
Total number of reads per sample. Integer |
nan |
False |
Whitelist Cell Barcode File Link |
Link to file listing all possible cell barcodes. URL |
nan |
True |
Cell Barcode Tag |
SAM tag for cell barcode field; please provide a valid cell barcode tag (e.g. CB:Z) |
nan |
True |
UMI Tag |
SAM tag for the UMI field; please provide a valid UMI tag (e.g. UB:Z or UR:Z) |
nan |
True |
Applied Hard Trimming |
Was Hard Trimming applied |
nan |
True |
Yes - Applied Hard Trimming |
Hard Trimming was applied |
Aligned Read Length |
False |
Aligned Read Length |
Read length used for alignment if hard trimming was applied |
nan |
True |
scRNAseq Workflow Type |
Generic name for the workflow used to analyze a data set. |
nan |
True |
Workflow Version |
Major version of the workflow (e.g. Cell Ranger v3.1) |
nan |
True |
scRNAseq Workflow Parameters Description |
Parameters used to run the workflow. scRNA-seq level 3: e.g. Normalization and log transformation, ran empty drops or doublet detection, used filter on # genes/cell, etc. scRNA-seq Level 4: dimensionality reduction with PCA and 50 components, nearest-neighbor graph with k = 20 and Leiden clustering with resolution = 1, UMAP visualization using 50 PCA components, marker genes used to annotate cell types, information about droplet matrix (all barcodes) to cell matrix (only informative barcodes representing real cells) conversion |
nan |
True |
scATACseq Workflow Type |
Generic name for the workflow used to analyze a data set. |
nan |
True |
scATACseq Workflow Parameters Description |
Parameters used to run the scATAC-seq workflow. |
nan |
True |
Workflow Link |
Link to workflow or command. Dockstore.org recommended. URL |
nan |
True |
QC Workflow Type |
Generic name for the workflow used to analyze a data set. String |
nan |
False |
QC Workflow Version |
Major version for a workflow. String |
nan |
False |
QC Workflow Link |
Link to workflow used. String |
nan |
False |
Germline Variants Workflow URL |
Link to workflow document (e.g. GitHub); Dockstore.org recommended |
nan |
True |
Germline Variants Workflow Type |
Generic name for the workflow used to analyze a data set |
nan |
False |
Other Germline Variants Workflow Type |
Other Germline Variants Workflow Type |
Custom Germline Variants Workflow Type |
False |
Custom Germline Variants Workflow Type |
Specify the name of a custom germline variants workflow |
nan |
True |
Somatic Variants Workflow URL |
Link to workflow document. Dockstore.org recommended. URL |
nan |
True |
Somatic Variants Workflow Type |
Generic name for the workflow used to analyze a data set. |
nan |
False |
Other Somatic Variants Workflow Type |
Other Somatic Variants Workflow Type |
Custom Somatic Variants Workflow Type |
False |
Custom Somatic Variants Workflow Type |
Specify the name of a custom somatic variants workflow |
nan |
True |
Somatic Variants Sample Type |
Is the sample case or control in somatic variant analysis |
nan |
True |
Structural Variant Workflow URL |
Link to workflow document. Dockstore.org recommended. URL |
nan |
True |
Structural Variant Workflow Type |
Generic name for the workflow used to analyze a data set. |
nan |
False |
Other Structural Variant Workflow Type |
Other Structural Variant Workflow Type |
Custom Structural Variant Workflow Type |
False |
Custom Structural Variant Workflow Type |
Specify the name of a custom structural variant workflow |
nan |
True |
Alignment Workflow Url |
Link to workflow used for read alignment. Dockstore.org recommended. String |
nan |
True |
Alignment Workflow Type |
Generic name for the workflow used to analyze a data set. |
nan |
True |
Other Alignment Workflow |
Other Alignment Workflow |
Custom Alignment Workflow |
False |
Custom Alignment Workflow |
Specify the name of a custom alignment workflow |
nan |
True |
MSI Workflow Link |
Link to method workflow (or command) used in estimating the MSI. URL |
nan |
False |
MSI Score |
Numeric score denoting the aligned reads file's MSI score from MSIsensor. Number |
nan |
False |
MSI Status |
MSIsensor determination of either microsatellite stability or instability. |
nan |
False |
Genomic Reference |
Exact version of the human genome reference used in the alignment of reads (e.g. GCF_000001405.39) |
nan |
True |
Genomic Reference URL |
Link to human genome sequence (e.g. ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_34/GRCh38.primary_assembly.genome.fa.gz) |
nan |
True |
Genome Annotation URL |
Link to the human genome annotation (GTF) file (e.g. ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_34/gencode.v34.annotation.gtf.gz) |
nan |
True |
Index File Name |
The name (or part of a name) of a file (of any type). String |
nan |
True |
Average Base Quality |
Average base quality collected from samtools. Number |
nan |
False |
Average Insert Size |
Average insert size collected from samtools. Integer |
nan |
False |
Average Read Length |
Average read length collected from samtools. Integer |
nan |
False |
Contamination |
Fraction of reads coming from cross-sample contamination collected from GATK4. Number |
nan |
False |
Contamination Error |
Estimation error of cross-sample contamination collected from GATK4. Number |
nan |
False |
Mean Coverage |
Mean coverage for whole genome sequencing, or mean target coverage for whole exome and targeted sequencing, collected from Picard. Number |
nan |
False |
Pairs On Diff CHR |
Pairs on different chromosomes collected from samtools. Integer |
nan |
False |
Total Uniquely Mapped |
Number of reads that map to genome. Integer |
nan |
False |
Total Unmapped reads |
Number of reads that did not map to genome. Integer |
nan |
False |
Proportion Reads Duplicated |
Proportion of duplicated reads collected from samtools. Number |
nan |
False |
Proportion Reads Mapped |
Proportion of mapped reads collected from samtools. Number |
nan |
False |
Proportion Targets No Coverage |
Proportion of targets that did not reach 1X coverage over any base from Picard Tools. Number |
nan |
False |
Proportion Base Mismatch |
Proportion of mismatched bases collected from samtools. Number |
nan |
False |
Proportion Coverage 10x |
Proportion of all reference bases for whole genome sequencing, or targeted bases for whole exome and targeted sequencing, that achieves 10X or greater coverage from Picard Tools. |
nan |
False |
Proportion Mitochondrial Reads |
Proportion of reads mapping to mitochondria. |
nan |
False |
Proportion Coverage 30X |
Proportion of all reference bases for whole genome sequencing, or targeted bases for whole exome and targeted sequencing, that achieves 30X or greater coverage from Picard Tools. |
nan |
False |
Short Reads |
Number of reads that were too short. Integer |
nan |
False |
Pseudo Alignment Used |
Pseudo aligners such as Kallisto or Salmon do not produce aligned reads BAM files. True indicates pseudoalignment was used. |
nan |
True |
Software and Version |
Name of software used to generate expression values. String |
nan |
True |
Yes - Pseudo Alignment Used |
Pseudo aligner was used |
Workflow Link, Software and Version, Genomic Reference, Genomic Reference URL |
False |
Data Category |
Specific content type of the data file. |
nan |
True |
Expression Units |
How quantities are corrected for gene length |
nan |
True |
Fusion Gene Detected |
Was a fusion gene identified? |
nan |
False |
Yes - Fusion Gene Detected |
A fusion gene was detected |
Fusion Gene Identity |
False |
Fusion Gene Identity |
The gene symbols of fused genes. |
nan |
False |
Other Fusion Gene |
Other fusion gene detected. |
Specify Other Fusion Gene |
False |
Specify Other Fusion Gene |
Specify fusion gene detected, if not in list |
nan |
False |
Matrix Type |
Type of data stored in matrix. |
nan |
True |
Linked Matrices |
All matrices associated with every part of a SingleCellExperiment object. Comma-delimited list of filenames |
nan |
False |
Biospecimen Type |
Biospecimen Type |
nan |
True |
Analyte Biospecimen Type |
A molecular derivative (i.e. RNA / DNA / Protein Lysate) obtained from a specimen |
Analyte Type, Fixation Duration, Slide Charge Type, Section Thickness Value, Sectioning Days from Index, Shipping Condition Type, Ischemic Time, Ischemic Temperature |
False |
Tissue Biospecimen Type |
Tissue biospecimen |
Ischemic Time, Ischemic Temperature, Site of Resection or Biopsy, Specimen Laterality, Portion Weight, Total Volume, Tumor Tissue Type, Histologic Morphology Code, Preservation Method, Biospecimen Dimension 1, Biospecimen Dimension 2, Biospecimen Dimension 3, Section Number in Sequence |
False |
Bone Marrow Biospecimen Type |
Bone Marrow biospecimen |
Ischemic Time, Ischemic Temperature, Site of Resection or Biopsy, Specimen Laterality, Portion Weight, Total Volume, Tumor Tissue Type, Histologic Morphology Code, Preservation Method, Biospecimen Dimension 1, Biospecimen Dimension 2, Biospecimen Dimension 3, Section Number in Sequence |
False |
Urine Biospecimen Type |
Urine biospecimen |
Ischemic Time, Ischemic Temperature, Site of Resection or Biopsy, Specimen Laterality, Portion Weight, Total Volume, Tumor Tissue Type, Histologic Morphology Code, Preservation Method |
False |
Blood Biospecimen Type |
Blood biospecimen |
Shipping Condition Type |
False |
Timepoint Label |
Label to identify the time point at which the clinical data or biospecimen was obtained (e.g. Baseline, End of Treatment, Overall survival, Final). NO PHI/PII INFORMATION IS ALLOWED. |
nan |
True |
Collection Days from Index |
Number of days from the research participant's index date that the biospecimen was obtained. If not applicable please enter 'Not Applicable' |
nan |
True |
Protocol Link |
Protocols.io ID or DOI link to a free/open protocol resource describing in detail the assay protocol (e.g. surface markers used in Smart-seq, dissociation duration, lot/batch numbers for key reagents such as primers, sequencing reagent kits, etc.) or the protocol by which the sample was obtained or generated. |
nan |
True |
Adjacent Biospecimen IDs |
List of HTAN Identifiers (separated by commas) of adjacent biospecimens cut from the same sample; for example HTA3_3000_3, HTA3_3000_4, ... |
nan |
False |
Mounting Medium |
The solution in which the specimen is embedded, generally under a cover glass. It may be liquid, gum or resinous, soluble in water, alcohol or other solvents and be sealed from the external atmosphere by non-soluble ringing media |
nan |
False |
Analyte Type |
The kind of molecular specimen analyte: a molecular derivative (i.e. RNA / DNA / Protein Lysate) obtained from a specimen |
nan |
True |
Acquisition Method Type |
Records the method of acquisition or source for the specimen under consideration. |
nan |
True |
Other Acquisition Method |
A custom acquisition method |
Acquisition Method Other Specify |
False |
Acquisition Method Other Specify |
A custom acquisition method [Text - max length 100 characters] |
nan |
True |
Preservation Method |
Text term that represents the method used to preserve the sample. |
nan |
True |
Fixative Type |
Text term to identify the type of fixative used to preserve a tissue specimen |
nan |
True |
Fixation Duration |
The length of time, from beginning to end, required to process or preserve biospecimens in fixative (measured in minutes) |
nan |
True |
Ischemic Time |
Duration of time, in seconds, between when the specimen stopped receiving oxygen and when it was preserved or processed. Integer value. |
nan |
False |
Ischemic Temperature |
Specify whether specimen experienced warm or cold ischemia. |
nan |
False |
Collection Media |
Material into which the specimen is collected after the procedure |
nan |
False |
Specimen Laterality |
For tumors in paired organs, designates the side on which the specimen was obtained. |
nan |
True |
Portion Weight |
Numeric value that represents the sample portion weight, measured in milligrams. |
nan |
False |
Total Volume |
Numeric value for the total amount of sample or specimen |
Total Volume Unit |
False |
Total Volume Unit |
Unit of measurement used for the total amount of sample or specimen |
nan |
False |
Tumor Tissue Type |
Text that describes the kind of disease present in the tumor specimen as related to a specific timepoint (add rows to select multiple values along with timepoints) |
nan |
True |
Histologic Morphology Code |
The microscopic anatomy of normal and abnormal cells and tissues of the specimen as captured in the morphology codes of the International Classification of Diseases for Oncology, 3rd Edition (ICD-O-3). Example - 8010/0 |
nan |
True |
Preinvasive Morphology |
Histologic Morphology not included in ICD-O-3 morphology codes, for preinvasive lesions included in the HTAN |
nan |
False |
Slide Charge Type |
A description of the charge on the glass slide. |
nan |
True |
Section Thickness Value |
Numeric value to describe the thickness of a slice of tissue taken from a biospecimen, measured in microns (um). |
nan |
True |
Sectioning Days from Index |
Number of days from the research participant's index date that the biospecimen was sectioned after collection. If not applicable please enter 'Not Applicable' |
nan |
True |
Storage Method |
The method by which a biomaterial was stored after preservation or before another protocol was used. |
nan |
True |
Processing Days from Index |
Number of days from the research participant's index date that the biospecimen was processed. If not applicable please enter 'Not Applicable' |
nan |
True |
Shipping Condition Type |
Text descriptor of the shipping environment of a biospecimen. |
nan |
True |
Site Data Source |
Text to identify the data source for the specimen/sample from within the HTAN center, if applicable. Any identifier used within the center to identify data sources. No PHI/PII is allowed. |
nan |
False |
Processing Location |
Site within an HTAN center where specimen processing occurs, if applicable. Any identifier used within the center to identify processing location. No PHI/PII is allowed. |
nan |
False |
Histology Assessment By |
Text term describing who (in what role) made the histological assessments of the sample |
nan |
False |
Histology Assessment Medium |
The method of assessment used to characterize histology |
nan |
False |
Tumor Infiltrating Lymphocytes |
Measure of Tumor-Infiltrating Lymphocytes [Number] |
nan |
False |
Degree of Dysplasia |
Information related to the presence of cells that look abnormal under a microscope but are not cancer. Records the degree of dysplasia for the cyst or lesion under consideration. |
nan |
False |
Dysplasia Fraction |
Resulting value to represent the number of pieces of dysplasia divided by the total number of pieces. [Text: max length 5] |
nan |
False |
Number Proliferating Cells |
Numeric value that represents the count of proliferating cells determined during pathologic review of the sample slide(s). |
nan |
False |
Percent Eosinophil Infiltration |
Numeric value to represent the percentage of infiltration by eosinophils in a tumor sample or specimen. |
nan |
False |
Percent Granulocyte Infiltration |
Numeric value to represent the percentage of infiltration by granulocytes in a tumor sample or specimen. |
nan |
False |
Percent Inflam Infiltration |
Numeric value to represent local response to cellular injury, marked by capillary dilatation, edema and leukocyte infiltration; clinically, inflammation is manifest by redness, heat, pain, swelling and loss of function, with the need to heal damaged tissue. |
nan |
False |
Percent Lymphocyte Infiltration |
Numeric value to represent the percentage of infiltration by lymphocytes in a solid tissue normal sample or specimen. |
nan |
False |
Percent Monocyte Infiltration |
Numeric value to represent the percentage of monocyte infiltration in a sample or specimen. |
nan |
False |
Percent Necrosis |
Numeric value to represent the percentage of cell death in a malignant tumor sample or specimen. |
nan |
False |
Percent Neutrophil Infiltration |
Numeric value to represent the percentage of infiltration by neutrophils in a tumor sample or specimen. |
nan |
False |
Percent Normal Cells |
Numeric value to represent the percentage of normal cell content in a malignant tumor sample or specimen. |
nan |
False |
Percent Stromal Cells |
Numeric value to represent the percentage of reactive cells that are present in a malignant tumor sample or specimen but are not malignant such as fibroblasts, vascular structures, etc. |
nan |
False |
Percent Tumor Cells |
Numeric value that represents the percentage of infiltration by tumor cells in a sample. |
nan |
False |
Percent Tumor Nuclei |
Numeric value to represent the percentage of tumor nuclei in a malignant neoplasm sample or specimen. |
nan |
False |
Fiducial Marker |
Imaging specific: fiducial markers for the alignment of images taken across multiple rounds of imaging. |
nan |
False |
Slicing Method |
Imaging specific: the method by which the tissue was sliced. |
nan |
False |
Lysis Buffer |
scRNA-seq specific: Type of lysis buffer used |
nan |
False |
Method of Nucleic Acid Isolation |
Bulk RNA & DNA-seq specific: method used for nucleic acid isolation. E.g. Qiagen AllPrep, Qiagen miRNeasy. [Text - max length 100] |
nan |
False |
Biospecimen Dimension 1 |
First dimension of tissue fragment (number, up to one decimal place) measured in units as defined by the "dimensions_unit" CDE |
Dimensions Unit |
False |
Biospecimen Dimension 2 |
Second dimension of tissue fragment (number, up to one decimal place) measured in units as defined by the "dimensions_unit" CDE |
nan |
False |
Biospecimen Dimension 3 |
Third dimension of tissue fragment (number, up to one decimal place) measured in units as defined by the "dimensions_unit" CDE |
nan |
False |
Dimensions Unit |
Unit of measurement used for dimension CDEs in metric system (e.g. cm, mm, etc.) |
nan |
False |
Section Number in Sequence |
Numeric value (integer, including ranges) provided to a sample in a series of sections (list all adjacent sections in the Adjacent Biospecimen IDs field) |
nan |
False |
Start Days from Index |
Number of days from the date of birth (index date) to the date of an event (e.g. exposure to environmental factor, treatment start, etc.). If not applicable please enter 'Not Applicable' |
nan |
True |
Stop Days from Index |
Number of days from the date of birth (index date) to the end date of the event (e.g. exposure to environmental factor, treatment start, etc.). Note: if the event occurs at a single time point, e.g. a diagnosis or a lab test, the value for this column is 'Not Applicable' |
nan |
False |
Ethnicity |
An individual's self-described social and cultural grouping, specifically whether an individual describes themselves as Hispanic or Latino. The provided values are based on the categories defined by the U.S. Office of Management and Budget and used by the U.S. Census Bureau. |
nan |
True |
Gender |
Text designations that identify gender. Gender is described as the assemblage of properties that distinguish people on the basis of their societal roles. [Identification of gender is based upon self-report and may come from a form, questionnaire, interview, etc.] |
nan |
True |
Race |
An arbitrary classification of a taxonomic group that is a division of a species. It usually arises as a consequence of geographical isolation within a species and is characterized by shared heredity, physical attributes and behavior, and in the case of humans, by common history, nationality, or geographic distribution. |
nan |
True |
Vital Status |
The survival state of the person registered on the protocol. |
nan |
True |
Dead |
This indicates the participant is dead and defines further required metadata |
Year of Death, Cause of Death, Cause of Death Source, Days to Death |
False |
Days to Birth |
Number of days between the date used for index and the date from a person's date of birth represented as a calculated negative number of days. If not applicable please enter 'Not Applicable' |
nan |
False |
Year of Death |
Numeric value to represent the year of the death of an individual. |
nan |
True |
Country of Residence |
Country of Residence at enrollment |
nan |
False |
Age Is Obfuscated |
The age of the patient has been modified for compliance reasons. The actual age differs from what is reported. Other date intervals for this patient may also be modified. |
nan |
False |
Year Of Birth |
Numeric value to represent the calendar year in which an individual was born. |
nan |
False |
Cause of Death |
The cause of death |
nan |
True |
Cause of Death Source |
The text term used to describe the source used to determine the patient's cause of death. |
nan |
False |
Days to Death |
Number of days between the date used for index and the date from a person's date of death represented as a calculated number of days. If not applicable please enter 'Not Applicable' |
nan |
False |
Occupation Duration Years |
The number of years a patient worked in a specific occupation. |
nan |
False |
Premature At Birth |
The yes/no/unknown indicator used to describe whether the patient was premature (less than 37 weeks gestation) at birth. |
nan |
False |
Weeks Gestation at Birth |
Numeric value used to describe the number of weeks starting from the approximate date of the biological mother's last menstrual period and ending with the birth of the patient. |
nan |
False |
Education Level |
Highest level of education that the patient completed (direct patient-derived information) |
nan |
False |
Country of Birth |
Country where the patient was born. |
nan |
False |
Medically Underserved Area |
Areas or populations designated by HRSA as having too few primary care providers, high infant mortality, high poverty or a high elderly population: Use patient zip code to find the county the patient lives in by going to https://www.unitedstateszipcodes.org/ - enter the zip code in the main text field and use the associated county on the right side of the result field. Go to data.hrsa.gov website and select "Query Data". Pick the Medically Underserved Areas/Populations (MUA/P) data source in the step 1 menu and select "View Data". Enter the name of the county (_ county) in the first "Service Area" column, adding the state in the 5th column may help direct you to the data. If the designation type in the third column is "medically underserved area" enter "Yes" as the value. If the county generates a "No data available in table" enter "No" as the value. A value of "Unknown" indicates that sufficient data was not available to look up the value. If value is yes, complete the Medically_underserved_score data element. |
nan |
False |
Medically Underserved Area - Yes |
Patient's zip code is in a medically underserved area |
Medically Underserved Score |
False |
Medically Underserved Area - No |
Patient's zip code is not in a medically underserved area |
nan |
False |
Medically Underserved Area - Unknown |
Insufficient data to look up the Medically Underserved Area value |
nan |
False |
Medically Underserved Score |
Index of Medical Underservice (IMU) score, a number between 0 and 100, where 0 represents completely underserved and 100 represents best served or least underserved. Use patient zip code to find the county the patient lives in by going to https://www.unitedstateszipcodes.org/. Enter the zip code in the main text field and use the associated county on the right side of the result field. Go to data.hrsa.gov website and select Query Data. Pick the Medically Underserved Areas/Populations (MUA/P) data source in the step 1 menu and select View Data. Enter the name of the county (______ county) in the first "Service Area" column, adding the state in the 5th column may help direct you to the data. Enter the Index of Medical Underservice Score in the fourth column to one decimal place as the value. |
nan |
False |
Rural vs Urban |
Density of population in the county of residence, based on census data (updated last on 4/28/20). Use patient zip code to find the county the patient lives in by going to https://www.unitedstateszipcodes.org/. Enter the zip code in the main text field and use the associated county on the right side of the result field. Go to https://www2.census.gov/programs-surveys/acs/data/covid_19/Data_Profiles_for_HHS/050-County_By_State/. Select the dp02_XX.csv file where XX = the two letter abbreviation for the appropriate state. On row 166 find the total population for the appropriate county. If the total population is <2,500 enter a value of "Rural Population"; if 2,500 - 50,000 enter a value of "Urban Cluster"; or if >50,000 enter "Urban Population" |
nan |
False |
Cancer Incidence |
Incidence of specific cancer type in a defined area (a number between 0 and 100). The rate of incident cases per population of 100,000 persons of a specific type of cancer as designated in the "primary_diagnosis" data element in the county where the patient resides, using the most recent 2013-2017 NCI Cancer Atlas derived data. Use patient zip code to find the county the patient lives in by going to https://www.unitedstateszipcodes.org/. Enter the zip code in the main text field and use the associated county on the right side of the result field. On the https://gis.cancer.gov/canceratlas/tableview/ website, choose "Incidence" from the Topic dropdown menu, state of interest from the Area menu, "All Races" from the Race menu, and the cancer type ("Both Sexes" when possible) from the Statistic menu. Find the county of interest and enter the numeric Age-Adjusted Rate per 100,000 as the value. |
nan |
False |
Cancer Incidence Location |
The county and state in which the patient lives and to which the cancer_incidence data correlates. Record as "County, State" as they appear in the incidence box from which the cancer_incidence data is obtained in the https://gis.cancer.gov/canceratlas/tableview/ website |
nan |
False |
Relationship Gender |
The text term used to describe the gender of the patient's relative with a history of cancer. |
nan |
False |
Relationship Age at Diagnosis |
The age (in years) when the patient's relative was first diagnosed. |
nan |
False |
Relationship Primary Diagnosis |
The text term used to describe the malignant diagnosis of the patient's relative with a history of cancer. |
nan |
False |
Relationship Type |
The subgroup that describes the state of connectedness between members of the unit of society organized around kinship ties. |
nan |
False |
Relative with Cancer History |
The yes/no/unknown indicator used to describe whether any of the patient's relatives have a history of cancer. |
nan |
False |
Relatives with Cancer History Count |
The number of relatives the patient has with a known history of cancer. |
nan |
False |
Yes - Cancer History Relative |
Individual has a relative with cancer history |
Relatives with Cancer History Count, Relationship Type, Relationship Primary Diagnosis, Relationship Gender, Relationship Age at Diagnosis |
False |
Smoking Exposure |
Indicate if individual has smoking exposure |
nan |
True |
Yes - Smoking Exposure |
Individual has been exposed to smoke; requires additional metadata |
Years Smoked, Pack Years Smoked, Cigarettes per Day, Smoking Frequency, Type of Smoke Exposure, Time between Waking and First Smoke, Tobacco Smoking Onset Year, Tobacco Smoking Quit Year, Tobacco Smoking Status, Type of Tobacco Used, Secondhand Smoke as Child, Smoke Exposure Duration, Tobacco Use per Day, Smokeless Tobacco Quit Age |
False |
Pack Years Smoked |
Numeric computed value to represent lifetime tobacco exposure defined as number of cigarettes smoked per day x number of years smoked divided by 20. |
nan |
True |
Years Smoked |
Numeric value (or unknown) to represent the number of years a person has been smoking. |
nan |
True |
Alcohol Exposure |
Indicate if individual has alcohol exposure |
nan |
True |
Yes - Alcohol Exposure |
Individual has been exposed to alcohol |
Alcohol Days Per Week, Alcohol Drinks Per Day, Alcohol History, Alcohol Intensity, Alcohol Type |
False |
Alcohol Days Per Week |
Numeric value used to describe the average number of days each week that a person consumes an alcoholic beverage. |
nan |
False |
Alcohol Drinks Per Day |
Numeric value used to describe the average number of alcoholic beverages a person consumes per day. |
nan |
False |
Alcohol History |
A response to a question that asks whether the participant has consumed at least 12 drinks of any kind of alcoholic beverage in their lifetime. |
nan |
False |
Alcohol Intensity |
Category to describe the patient's current level of alcohol use as self-reported by the patient. |
nan |
False |
Alcohol Type |
Type of alcohol use |
nan |
False |
Asbestos Exposure |
The yes/no/unknown indicator used to describe whether the patient was exposed to asbestos. |
nan |
False |
Cigarettes per Day |
The average number of cigarettes smoked per day. |
nan |
False |
Coal Dust Exposure |
The yes/no/unknown indicator used to describe whether a patient was exposed to fine powder derived by the crushing of coal. |
nan |
False |
Environmental Tobacco Smoke Exposure |
The yes/no/unknown indicator used to describe whether a patient was exposed to smoke that is emitted from burning tobacco, including cigarettes, pipes, and cigars. This includes tobacco smoke exhaled by smokers. |
nan |
False |
Radon Exposure |
The yes/no/unknown indicator used to describe whether the patient was exposed to radon. |
nan |
False |
Respirable Crystalline Silica Exposure |
The yes/no/unknown indicator used to describe whether a patient was exposed to respirable crystalline silica, a widespread, naturally occurring, crystalline metal oxide that consists of different forms including quartz, cristobalite, tridymite, tripoli, ganister, chert and novaculite. |
nan |
False |
Smoking Frequency |
The text term used to generally describe how often the patient smokes. |
nan |
False |
Secondhand Smoke as Child |
The text term used to indicate whether the patient was exposed to secondhand smoke as a child. |
nan |
False |
Smoke Exposure Duration |
Text term used to describe the length of time the patient was exposed to an environmental factor. |
nan |
False |
Type of Smoke Exposure |
The text term used to describe the patient's specific type of smoke exposure. |
nan |
False |
Marijuana smoke |
Marijuana smoke exposure |
Marijuana Use Per Week |
False |
Marijuana Use Per Week |
Numeric value that represents the number of times the patient uses marijuana each week. |
nan |
False |
Tobacco Use per Day |
Numeric value that represents the number of times the patient uses tobacco each day. |
nan |
False |
Smokeless Tobacco Quit Age |
Smokeless tobacco quit age |
nan |
False |
Time between Waking and First Smoke |
The text term used to describe the approximate amount of time elapsed between the time the patient wakes up in the morning to the time they smoke their first cigarette. |
nan |
False |
Tobacco Smoking Onset Year |
The year in which the participant began smoking. |
nan |
False |
Tobacco Smoking Quit Year |
The year in which the participant quit smoking. |
nan |
False |
Tobacco Smoking Status |
Category describing current smoking status and smoking history as self-reported by a patient |
nan |
False |
Type of Tobacco Used |
The text term used to describe the specific type of tobacco used by the patient. |
nan |
False |
Days to Follow Up |
Number of days between the date used for index and the date of the patient's last follow-up appointment or contact. If not applicable please enter 'Not Applicable' |
nan |
True |
Adverse Event |
Text that represents the Common Terminology Criteria for Adverse Events low level term name for an adverse event. |
nan |
False |
BMI |
A calculated numerical quantity that represents an individual's weight to height ratio. |
nan |
False |
Cause of Response |
The text term used to describe the suspected cause or reason for the patient disease response. |
nan |
False |
Comorbidity |
The text term used to describe a comorbidity disease, which coexists with the patient's malignant disease. |
nan |
False |
Comorbidity Method of Diagnosis |
The text term used to describe the method used to diagnose the patient's comorbidity disease. |
nan |
False |
Days to Adverse Event |
Number of days between the date used for index and the date of the patient's adverse event. If not applicable please enter 'Not Applicable' |
nan |
False |
Days to Comorbidity |
Number of days between the date used for index and the date the patient was diagnosed with a comorbidity. If not applicable please enter 'Not Applicable' |
nan |
False |
Days to Progression |
Number of days between the date used for index and the date the patient's disease progressed. If not applicable please enter 'Not Applicable' |
nan |
False |
Days to Progression Free |
Number of days between the date used for index and the date the patient's disease was formally confirmed as progression-free. If not applicable please enter 'Not Applicable' |
nan |
False |
Days to Recurrence |
Number of days between the date used for index and the date the patient's disease recurred. If not applicable please enter 'Not Applicable' |
nan |
True |
Diabetes Treatment Type |
Text term used to describe the types of treatment used to manage diabetes. |
nan |
False |
Disease Response |
Code assigned to describe the patient's response or outcome to the disease. |
nan |
False |
DLCO Ref Predictive Percent |
The value, as a percentage of predicted lung volume, measuring the amount of carbon monoxide detected in a patient's lungs. |
nan |
False |
ECOG Performance Status |
The ECOG functional performance status of the patient/participant. |
nan |
False |
FEV1 FVC Post Bronch Percent |
Percentage value to represent result of Forced Expiratory Volume in 1 second (FEV1) divided by the Forced Vital Capacity (FVC) post-bronchodilator. |
nan |
False |
FEV 1 FVC Pre Bronch Percent |
Percentage value to represent result of Forced Expiratory Volume in 1 second (FEV1) divided by the Forced Vital Capacity (FVC) pre-bronchodilator. |
nan |
False |
FEV1 Ref Post Bronch Percent |
The percentage comparison to a normal value reference range of the volume of air that a patient can forcibly exhale from the lungs in one second post-bronchodilator. |
nan |
False |
FEV1 Ref Pre Bronch Percent |
The percentage comparison to a normal value reference range of the volume of air that a patient can forcibly exhale from the lungs in one second pre-bronchodilator. |
nan |
False |
Height |
The height of the patient in centimeters. |
nan |
False |
Hepatitis Sustained Virological Response |
The yes/no/unknown indicator used to describe whether the patient received treatment for a risk factor the patient had at the time of or prior to their diagnosis. |
nan |
False |
HPV Positive Type |
Text classification to represent the strain or type of human papillomavirus identified in an individual. |
nan |
False |
Karnofsky Performance Status |
Text term used to describe the classification of the functional capabilities of a person. |
nan |
False |
Menopause Status |
Text term used to describe the patient's menopause status. |
nan |
False |
Adverse Event Grade |
The text term used to describe the grade (severity) of the patient's adverse event. |
nan |
False |
AIDS Risk Factors |
The text term used to describe a risk factor of the acquired immunodeficiency syndrome (AIDS) that the patient either had at the time of the study or experienced in the past. |
nan |
False |
Body Surface Area |
Numeric value used to represent the 2-dimensional extent of the body surface relating height to weight. |
nan |
False |
CD4 Count |
The text term used to describe the outcome of the procedure to determine the amount of the CD4 expressing cells in a sample. |
nan |
False |
CDC HIV Risk Factors |
The text term used to describe a risk factor for human immunodeficiency virus, as described by the Centers for Disease Control and Prevention (CDC). |
nan |
False |
Days to Imaging |
Number of days between the date used for index and the date the imaging or scan was performed on the patient. If not applicable please enter 'Not Applicable' |
nan |
False |
Evidence of Recurrence Type |
The text term used to describe the type of evidence used to determine whether the patient's disease recurred |
nan |
False |
HAART Treatment Indicator |
The text term used to indicate whether the patient received Highly Active Antiretroviral Therapy (HAART). |
nan |
False |
HIV Viral Load |
Numeric value that represents the concentration of an analyte or aliquot extracted from the sample or sample portion, measured in milligrams per milliliter. |
nan |
False |
Hormonal Contraceptive Use |
The text term used to indicate whether the patient used hormonal contraceptives. |
nan |
False |
Hysterectomy Margins Involved |
The text term used to indicate whether the patient's disease was determined to be involved based on the surgical margins of the hysterectomy. |
nan |
False |
Hysterectomy Type |
The text term used to describe the type of hysterectomy the patient had. |
nan |
False |
Imaging Result |
The text term used to describe the result of the imaging or scan performed on the patient. |
nan |
False |
Imaging Type |
The text term used to describe the type of imaging or scan performed on the patient. |
nan |
False |
Immunosuppressive Treatment Type |
The text term used to describe the type of immunosuppressive treatment the patient received. |
nan |
False |
Nadir CD4 Count |
Numeric value that represents the lowest point to which the CD4 count has dropped (nadir). |
nan |
False |
Pregnancy Outcome |
The text term used to describe the type of pregnancy the patient had |
nan |
False |
Recist Targeted Regions Number |
Numeric value that represents the number of baseline target lesions, as described by the Response Evaluation Criteria in Solid Tumours (RECIST) criteria |
nan |
False |
Recist Targeted Regions Sum |
Numeric value that represents the sum of baseline target lesions, as described by the Response Evaluation Criteria in Solid Tumours (RECIST) criteria. |
nan |
False |
Scan Tracer Used |
The text term used to describe the type of tracer used during the imaging or scan of the patient. |
nan |
False |
Progression or Recurrence |
Yes/No/unknown indicator to identify whether a patient has had a new tumor event after initial treatment. |
nan |
True |
Yes - Progression or Recurrence |
The patient has had a new tumor event after initial treatment |
Progression or Recurrence Type, Days to Progression, Days to Progression Free, Days to Recurrence, Progression or Recurrence Anatomic Site |
False |
Progression or Recurrence Anatomic Site |
The text term used to describe the anatomic site of resection; biopsy; tissue or organ of biospecimen origin; progression or recurrent disease; treatment |
nan |
False |
Treatment Anatomic Site |
The text term used to describe the anatomic site of resection; biopsy; tissue or organ of biospecimen origin; progression or recurrent disease; treatment |
nan |
False |
NCI Atlas Cancer Site |
The primary tumor site used to calculate the incidence rate using the NCI Cancer Atlas, a digital atlas which provides geographical data related to cancer utilizing the Surveillance, Epidemiology, and End Results (SEER) Program cancer incidence rates for 2013 to 2017 |
nan |
False |
Progression or Recurrence Type |
The text term used to describe the type of progressive or recurrent disease or relapsed disease. |
nan |
False |
Reflux Treatment Type |
Text term used to describe the types of treatment used to manage gastroesophageal reflux disease (GERD). |
nan |
False |
Risk Factor |
The text term used to describe a risk factor the patient had at the time of or prior to their diagnosis. |
nan |
False |
Risk Factor Treatment |
The yes/no/unknown indicator used to describe whether the patient received treatment for a risk factor the patient had at the time of or prior to their diagnosis. |
nan |
False |
Viral Hepatitis Serologies |
Text term that describes the kind of serological laboratory test used to determine the patient's hepatitis status. |
nan |
False |
Weight |
The weight of the patient measured in kilograms. |
nan |
False |
Days to Treatment End |
Number of days between the date used for index and the date the treatment ended. If not applicable please enter 'Not Applicable' |
nan |
False |
Days to Treatment Start |
Number of days between the date used for index and the date the treatment started. If not applicable please enter 'Not Applicable' |
nan |
False |
Initial Disease Status |
The text term used to describe the status of the patient's malignancy when the treatment began. |
nan |
False |
Regimen or Line of Therapy |
The text term used to describe the regimen or line of therapy. |
nan |
False |
Therapeutic Agents |
Text identification of the individual agent(s) used as part of a treatment regimen. |
nan |
False |
Treatment Effect |
The text term used to describe the pathologic effect a treatment(s) had on the tumor. |
nan |
False |
Treatment Intent Type |
Text term to identify the reason for the administration of a treatment regimen. [Manually-curated] |
nan |
False |
Treatment or Therapy |
A yes/no/unknown/not applicable indicator related to the administration of therapeutic agents received. |
nan |
False |
Treatment Outcome |
Text term that describes the patient's final outcome after the treatment was administered. |
nan |
False |
Treatment Type |
Text term that describes the kind of treatment administered. |
nan |
False |
Chemo Concurrent to Radiation |
The text term used to describe whether the patient was receiving chemotherapy concurrent to radiation. |
nan |
False |
Number of Cycles |
The numeric value used to describe the number of cycles of a specific treatment or regimen the patient received. |
nan |
False |
Reason Treatment Ended |
The text term used to describe the reason a specific treatment or regimen ended. |
nan |
False |
Treatment Arm |
Text term used to describe the treatment arm assigned to a patient at the time eligibility is determined. |
nan |
False |
Treatment Dose |
The numeric value used to describe the dose of an agent the patient received. |
nan |
False |
Treatment Dose Units |
The text term used to describe the dose units of an agent the patient received. |
nan |
False |
Treatment Effect Indicator |
The text term used to indicate whether the treatment had an effect on the patient. |
nan |
False |
Treatment Frequency |
The text term used to describe the frequency the patient received an agent or regimen. |
nan |
False |
Sentinel Lymph Node Count |
Numeric count of sentinel lymph nodes. |
nan |
False |
Sentinel Node Positive Assessment Count |
The number or amount of metastatic neoplasms related to the confirmed presence of disease or specific microorganisms during examination of the first rounded mass of lymphatic tissue to which cancer is likely to spread from the primary tumor. |
nan |
False |
Tumor Extranodal Extension Indicator |
The indicator to determine extranodal involvement or extent of the disease. |
nan |
False |
Satellite Metastasis Present Indicator |
A yes/no indicator to ask if in-transit metastases or satellite lesions are present. |
nan |
False |
Other Biopsy Resection Site |
A description of the location on or within the human body where the surgical biopsy/resection procedure was performed (Not covered under HTAN Clinical Data Tier 1) |
nan |
False |
Extent of Tumor Resection |
The degree to which the lesion has been cut out, or resected. |
nan |
False |
Precancerous Condition Type |
The classification of pre-cancerous cells found in a specific collection of data being studied by the Consortium for Molecular and Cellular Characterization of Screen-Detected Lesions (MCL). |
nan |
False |
Prior Sites of Radiation |
The anatomic location to which radiation treatment was administered to a patient prior to enrollment on a protocol. |
nan |
False |
Immunosuppression |
The indicator that describes whether or not immunosuppressive therapy was administered. |
nan |
False |
Concomitant Medication Received Type |
An enumerated list of the type of concomitant medication received by the patient. |
nan |
False |
Family Member Vital Status Indicator |
The response indicates whether the family member of the patient with a history of cancer is alive. (Extension to GDC attributes in Family History Tier 1) |
nan |
False |
COVID19 Occurrence Indicator |
The indicator that describes whether or not a COVID-19 infectious disorder occurred. |
nan |
False |
COVID19 Current Status |
The patient's current COVID-19 status of sign or symptom events or interventions |
nan |
False |
COVID19 Positive Lab Test Indicator |
The indicator that describes whether or not there was a COVID-19 positive test result. |
nan |
False |
COVID19 Antibody Testing |
Text term that demonstrates the test results of immunoglobulin M (IgM) and immunoglobulin G (IgG) antibodies to the SARS-CoV-2 virus in subject serum samples. |
nan |
False |
COVID19 Complications Severity |
Text term that retrospectively indicates the worst complications during COVID-19 infectious disorder in the patient. |
nan |
False |
COVID19 Cancer Treatment Followup |
Indicator that describes if cancer treatment was modified for the patient due to COVID-19 infectious disorder |
nan |
False |
Ecig vape use |
Use of non-traditional cigarette nicotine delivery device (electronic cigarette, ENDS - electronic nicotine delivery system) |
nan |
False |
Ecig vape 30 day use num |
Number of days e-cigarettes or vaping device was used in the last 30 days |
nan |
False |
Ecig vape times per day |
E-cigarette frequency of use (times per day; one “time” consists of around 15 puffs or lasts around 10 minutes) |
nan |
False |
Type of smoke exposure cumulative years |
The number of cumulative years of the patient's specific type of smoke exposure |
nan |
False |
Chewing tobacco daily use count |
The quantity of daily use of tobacco, in the form of a plug, usually flavored, for chewing rather than smoking. |
nan |
False |
Second hand smoke exposure years |
The number of cumulative years of the patient's exposure to second-hand cigarette smoke |
nan |
False |
Known Genetic Predisposition Mutation |
A yes/no/unknown indicator to identify whether there is a known genetic predisposition mutation present in the patient. |
nan |
False |
Hereditary Cancer Predisposition Syndrome |
History of presence of inherited genetic predisposition syndrome that confers heightened susceptibility to cancer in the patient. |
nan |
False |
Cancer Associated Gene Mutations |
Type of inherited germline or other gene mutations that confer heightened susceptibility to cancer identified in patient history |
nan |
False |
Mutational Signatures |
Mutational signatures identified in the patient, including signatures linked to selected exogenous carcinogens, endogenous and enzymatic modification of DNA, or defective DNA repair. Note: Include only outputs of tests that were completed clinically for the participant and only include data from a diagnostic array that was completed before research sequencing was done. |
nan |
False |
Mismatch Repair System Status |
The text that best describes the condition or state of MMR (mismatch repair system) in the patient |
nan |
False |
Lab Tests for MMR Status |
Laboratory tests used to evaluate the status of mismatch repair pathways |
nan |
False |
Mode of Cancer Detection |
Text term used to describe the mode of cancer detection, like standard of care screening or random detection |
nan |
False |
Gene Symbol |
The text term used to describe a gene targeted or included in molecular analysis. For rearrangements, this should be used to represent the reference gene. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from a diagnostic array that was completed before research sequencing was done. Do not include data related to research assay outputs here. |
nan |
True |
Molecular Analysis Method |
The text term used to describe the method used for molecular analysis. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from a diagnostic array that was completed before research sequencing was done. Do not include data related to research assay outputs here. |
nan |
True |
Test Result |
The text term used to describe the result of the molecular test. If the test result was a numeric value, see test_value. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from a diagnostic array that was completed before research sequencing was done. Do not include data related to research assay outputs here. |
nan |
True |
AA Change |
Alphanumeric value used to describe the amino acid change for a specific genetic variant. Example: R116Q. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from a diagnostic array that was completed before research sequencing was done. Do not include data related to research assay outputs here. |
nan |
False |
Antigen |
The text term used to describe an antigen included in molecular testing. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from a diagnostic array that was completed before research sequencing was done. Do not include data related to research assay outputs here. |
nan |
False |
Clinical Biospecimen Type |
The text term used to describe the biological material used for testing, diagnostic, treatment or research purposes. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from a diagnostic array that was completed before research sequencing was done. Do not include data related to research assay outputs here. |
nan |
False |
Blood Test Normal Range Upper |
Numeric value used to describe the upper limit of the normal range used to describe a healthy individual at the institution where the test was completed. |
nan |
False |
Blood Test Normal Range Lower |
Numeric value used to describe the lower limit of the normal range used to describe a healthy individual at the institution where the test was completed. |
nan |
False |
Cell Count |
Numeric value used to describe the number of cells used for molecular testing. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from a diagnostic array that was completed before research sequencing was done. Do not include data related to research assay outputs here. |
nan |
False |
Chromosome |
The text term used to describe a chromosome targeted or included in molecular testing. If a specific genetic variant is being reported, this property can be used to capture the chromosome where that variant is located. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from a diagnostic array that was completed before research sequencing was done. Do not include data related to research assay outputs here. |
nan |
False |
Clonality |
The text term used to describe whether a genomic variant is related by descent from a single progenitor cell. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from a diagnostic array that was completed before research sequencing was done. Do not include data related to research assay outputs here. |
nan |
False |
Copy Number |
Numeric value used to describe the number of times a section of the genome is repeated or copied within an insertion, duplication or deletion variant. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from a diagnostic array that was completed before research sequencing was done. Do not include data related to research assay outputs here. |
nan |
False |
Cytoband |
Alphanumeric value used to describe the cytoband or chromosomal location targeted or included in molecular analysis. If a specific genetic variant is being reported, this property can be used to capture the cytoband where the variant is located. Format: [chromosome][chromosome arm].[band+sub-bands]. Example: 17p13.1. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from a diagnostic array that was completed before research sequencing was done. Do not include data related to research assay outputs here. |
nan |
False |
Exon |
Exon number targeted or included in a molecular analysis. If a specific genetic variant is being reported, this property can be used to capture the exon where that variant is located. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from a diagnostic array that was completed before research sequencing was done. Do not include data related to research assay outputs here. |
nan |
False |
Histone Family |
The text term used to describe the family, or classification of a group of basic proteins found in chromatin, called histones. |
nan |
False |
Histone Variant |
The text term used to describe a specific histone variant. Histone variants are proteins that substitute for the core canonical histones. |
nan |
False |
Intron |
Intron number targeted or included in molecular analysis. If a specific genetic variant is being reported, this property can be used to capture the intron where that variant is located. |
nan |
False |
Laboratory Test |
The text term used to describe the medical testing used to diagnose, treat or further understand a patient's disease. |
nan |
False |
Loci Abnormal Count |
Numeric value used to describe the number of loci determined to be abnormal. |
nan |
False |
Loci Count |
Numeric value used to describe the number of loci tested. |
nan |
False |
Locus |
Alphanumeric value used to describe the locus of a specific genetic variant. Example: NM_001126114. |
nan |
False |
Mismatch Repair Mutation |
The yes/no/unknown indicator used to describe whether the mutation included in molecular testing was known to have an effect on the mismatch repair process. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from a diagnostic array that was completed before research sequencing was done. Do not include data related to research assay outputs here. |
nan |
False |
Molecular Consequence |
The text term used to describe the molecular consequence of genetic variation. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from a diagnostic array that was completed before research sequencing was done. Do not include data related to research assay outputs here. |
nan |
False |
Pathogenicity |
The text used to describe a variant's level of involvement in the cause of the patient's disease according to the standards outlined by the American College of Medical Genetics and Genomics (ACMG). |
nan |
False |
Ploidy |
Text term used to describe the number of sets of homologous chromosomes. |
nan |
False |
Second Exon |
The second exon number involved in molecular variation. If a specific genetic variant is being reported, this property can be used to capture the second exon where that variant is located. This property is typically used for a translocation where two different locations are involved in the variation. |
nan |
False |
Second Gene Symbol |
The text term used to describe a secondary gene targeted or included in molecular analysis. For rearrangements, this should represent the location of the variant. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from a diagnostic array that was completed before research sequencing was done. Do not include data related to research assay outputs here. |
nan |
False |
Specialized Molecular Test |
Text term used to describe a specific test that is not covered in the list of molecular analysis methods. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from a diagnostic array that was completed before research sequencing was done. Do not include data related to research assay outputs here. |
nan |
False |
Test Analyte Type |
The text term used to describe the type of analyte used for molecular testing. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from a diagnostic array that was completed before research sequencing was done. Do not include data related to research assay outputs here. |
nan |
False |
Test Units |
The text term used to describe the units of the test value for a molecular test. This property is used in conjunction with test_value. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from a diagnostic array that was completed before research sequencing was done. Do not include data related to research assay outputs here. |
nan |
False |
Test Value |
The text term or numeric value used to describe a specific result of a molecular test. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from a diagnostic array that was completed before research sequencing was done. Do not include data related to research assay outputs here. |
nan |
False |
Transcript |
Alphanumeric value used to describe the transcript of a specific genetic variant. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from a diagnostic array that was completed before research sequencing was done. Do not include data related to research assay outputs here. |
nan |
False |
Variant Origin |
The text term used to describe the biological origin of a specific genetic variant. Note: This node is meant to capture molecular tests that were completed clinically for the participant and only includes data from a diagnostic array that was completed before research sequencing was done. Do not include data related to research assay outputs here. |
nan |
False |
Variant Type |
The text term used to describe the type of genetic variation. |
nan |
False |
Zygosity |
The text term used to describe the zygosity of a specific genetic variant. |
nan |
False |
Cog Neuroblastoma Risk Group |
Text term that represents the categorization of patients on the basis of prognostic factors per a system developed by Children's Oncology Group (COG). Risk level is used to assign treatment intensity. |
nan |
False |
Cog Rhabdomyosarcoma Risk Group |
Text term used to describe the classification of rhabdomyosarcoma, as defined by the Children's Oncology Group (COG). |
nan |
False |
Gleason Grade Group |
The text term used to describe the overall grouping of grades defined by the Gleason grading classification, which is used to determine the aggressiveness of prostate cancer. Note that this grade describes the entire prostatectomy specimen and is not specific to the sample used for sequencing. |
nan |
False |
Gleason Grade Tertiary |
The text term used to describe the tertiary pattern as described by the Gleason Grading System. |
nan |
False |
Gleason Patterns Percent |
Numeric value that represents the percentage of Patterns 4 and 5, which is used when the Gleason score is greater than 7 to predict prognosis. |
nan |
False |
Greatest Tumor Dimension |
Numeric value that represents the measurement of the widest portion of the tumor in centimeters. |
nan |
False |
IGCCCG Stage |
The text term used to describe the International Germ Cell Cancer Collaborative Group (IGCCCG), a grouping used to further classify metastatic testicular tumors. |
nan |
False |
INPC Grade |
Text term used to describe the classification of neuroblastic differentiation within neuroblastoma tumors, as defined by the International Neuroblastoma Pathology Classification (INPC). |
nan |
False |
INPC Histologic Group |
The text term used to describe the classification of neuroblastomas distinguishing between favorable and unfavorable histologic groups. The histologic score, defined by the International Neuroblastoma Pathology Classification (INPC), is based on age, mitosis-karyorrhexis index (MKI), stromal content and degree of tumor cell differentiation. |
nan |
False |
INRG Stage |
The text term used to describe the staging classification of neuroblastic tumors, as defined by the International Neuroblastoma Risk Group (INRG). |
nan |
False |
INSS Stage |
Text term used to describe the staging classification of neuroblastic tumors, as defined by the International Neuroblastoma Staging System (INSS). |
nan |
False |
International Prognostic Index |
The text term used to describe the International Prognostic Index, which classifies the prognosis of patients with aggressive non-Hodgkin's lymphoma. |
nan |
False |
IRS Group |
Text term used to describe the classification of rhabdomyosarcoma tumors, as defined by the Intergroup Rhabdomyosarcoma Study (IRS). |
nan |
False |
IRS Stage |
The text term used to describe the classification of rhabdomyosarcoma tumors, as defined by the Intergroup Rhabdomyosarcoma Study (IRS). |
nan |
False |
ISS Stage |
The multiple myeloma disease stage at diagnosis. |
nan |
False |
Lymph Node Involved Site |
The text term used to describe the anatomic site of lymph node involvement. |
nan |
False |
Margin Distance |
Numeric value (in centimeters) that represents the distance between the tumor and the surgical margin. |
nan |
False |
Margins Involved Site |
The text term used to describe the anatomic sites that were involved in the surgical margins. |
nan |
False |
Medulloblastoma Molecular Classification |
The text term used to describe the classification of medulloblastoma tumors based on molecular features. |
nan |
False |
Micropapillary Features |
The yes/no/unknown indicator used to describe whether micropapillary features were determined to be present. |
nan |
False |
Mitosis Karyorrhexis Index |
Text term that represents the component of the International Neuroblastoma Pathology Classification (INPC) for mitosis-karyorrhexis index (MKI). |
nan |
False |
Non Nodal Regional Disease |
The text term used to describe whether the patient had non-nodal regional disease. |
nan |
False |
Non Nodal Tumor Deposits |
The yes/no/unknown indicator used to describe the presence of tumor deposits in the pericolic or perirectal fat or in adjacent mesentery away from the leading edge of the tumor. |
nan |
False |
Ovarian Specimen Status |
The text term used to describe the physical condition of the involved ovary. |
nan |
False |
Ovarian Surface Involvement |
The text term that describes whether the surface tissue (outer boundary) of the ovary shows evidence of involvement or presence of cancer. |
nan |
False |
Pregnant at Diagnosis |
The text term used to indicate whether the patient was pregnant at the time they were diagnosed. |
nan |
False |
Primary Gleason Grade |
The text term used to describe the primary Gleason score, which describes the pattern of cells making up the largest area of the tumor. The primary and secondary Gleason pattern grades are combined to determine the patient's Gleason grade group, which is used to determine the aggressiveness of prostate cancer. Note that this grade describes the entire prostatectomy specimen and is not specific to the sample used for sequencing. |
nan |
False |
Secondary Gleason Grade |
The text term used to describe the secondary Gleason score, which describes the pattern of cells making up the second largest area of the tumor. The primary and secondary Gleason pattern grades are combined to determine the patient's Gleason grade group, which is used to determine the aggressiveness of prostate cancer. Note that this grade describes the entire prostatectomy specimen and is not specific to the sample used for sequencing. |
nan |
False |
Supratentorial Localization |
Text term to specify the location of the supratentorial tumor. |
nan |
False |
Tumor Depth |
Numeric value that represents the depth of tumor invasion, measured in millimeters (mm). |
nan |
False |
WHO CNS Grade |
WHO CNS Grade |
nan |
False |
WHO NTE Grade |
WHO NTE Grade |
nan |
False |
Age at Diagnosis |
Age at the time of diagnosis expressed in number of days since birth. |
nan |
True |
Days to Last Follow up |
Time interval from the date of last follow up to the date of initial pathologic diagnosis, represented as a calculated number of days. If not applicable please enter 'Not Applicable' |
nan |
True |
Days to Last Known Disease Status |
Time interval from the date of last follow up to the date of initial pathologic diagnosis, represented as a calculated number of days. If not applicable please enter 'Not Applicable' |
nan |
True |
Last Known Disease Status |
Text term that describes the last known state or condition of an individual's neoplasm. |
nan |
True |
Primary Diagnosis |
Text term used to describe the patient's histologic diagnosis, as described by the World Health Organization's (WHO) International Classification of Diseases for Oncology (ICD-O). |
nan |
True |
Prior Malignancy |
The yes/no/unknown indicator used to describe the patient's history of prior cancer diagnosis. |
nan |
False |
Prior Treatment |
A yes/no/unknown/not applicable indicator related to the administration of therapeutic agents received before the body specimen was collected. |
nan |
False |
Site of Resection or Biopsy |
The text term used to describe the anatomic site of the resection or biopsy of the patient's malignant disease, as described by the World Health Organization's (WHO) International Classification of Diseases for Oncology (ICD-O). |
nan |
True |
Tissue or Organ of Origin |
The text term used to describe the anatomic site of origin, of the patient's malignant disease, as described by the World Health Organization's (WHO) International Classification of Diseases for Oncology (ICD-O). |
nan |
True |
Tumor Grade |
Numeric value to express the degree of abnormality of cancer cells, a measure of differentiation and aggressiveness. |
nan |
False |
AJCC Clinical M |
Extent of the distant metastasis for the cancer based on evidence obtained from clinical assessment parameters determined prior to treatment. |
nan |
False |
AJCC Clinical N |
Extent of the regional lymph node involvement for the cancer based on evidence obtained from clinical assessment parameters determined prior to treatment. |
nan |
False |
AJCC Clinical Stage |
Stage group determined from clinical information on the tumor (T), regional node (N) and metastases (M) and by grouping cases with similar prognosis for cancer. |
nan |
False |
AJCC Clinical T |
Extent of the primary cancer based on evidence obtained from clinical assessment parameters determined prior to treatment. |
nan |
False |
AJCC Pathologic M |
Code to represent the defined absence or presence of distant spread or metastases (M) to locations via vascular channels or lymphatics beyond the regional lymph nodes, using criteria established by the American Joint Committee on Cancer (AJCC). |
nan |
False |
AJCC Pathologic N |
The codes that represent the stage of cancer based on the nodes present (N stage) according to criteria based on multiple editions of the AJCC's Cancer Staging Manual. |
nan |
False |
AJCC Pathologic Stage |
The extent of a cancer, especially whether the disease has spread from the original site to other parts of the body based on AJCC staging criteria. |
nan |
False |
AJCC Pathologic T |
Code of pathological T (primary tumor) to define the size or contiguous extension of the primary tumor (T), using staging criteria from the American Joint Committee on Cancer (AJCC). |
nan |
False |
AJCC Staging System Edition |
The text term used to describe the version or edition of the American Joint Committee on Cancer Staging Handbooks, a publication by the group formed for the purpose of developing a system of staging for cancer that is acceptable to the American medical profession and is compatible with other accepted classifications. |
nan |
False |
Anaplasia Present |
Yes/no/unknown/Not Reported indicator used to describe whether anaplasia was present at the time of diagnosis. |
nan |
False |
Yes - Anaplasia Present |
Indicates anaplasia is present |
Anaplasia Present Type |
False |
Anaplasia Present Type |
The text term used to describe the morphologic findings indicating the presence of a malignant cellular infiltrate characterized by the presence of large pleomorphic cells, necrosis, and high mitotic activity in a tissue sample. |
nan |
False |
Best Overall Response |
The best improvement achieved throughout the entire course of protocol treatment. |
nan |
False |
Breslow Thickness |
The number that describes the distance, in millimeters, between the upper layer of the epidermis and the deepest point of tumor penetration. |
nan |
False |
Classification of Tumor |
Text that describes the kind of disease present in the tumor specimen as related to a specific timepoint. |
nan |
False |
Days to Diagnosis |
Number of days between the date used for index and the date the patient was diagnosed with the malignant disease. If not applicable please enter 'Not Applicable' |
nan |
False |
First Symptom Prior to Diagnosis |
Text term used to describe the patient's first symptom experienced prior to diagnosis and thought to be related to the disease. |
nan |
False |
Gross Tumor Weight |
Numeric value used to describe the gross pathologic tumor weight, measured in grams. |
nan |
False |
Laterality |
For tumors in paired organs, designates the side on which the cancer originates. |
nan |
False |
Lymph Nodes Positive |
The number of lymph nodes involved with disease as determined by pathologic examination. |
nan |
False |
Lymph Nodes Tested |
The number of lymph nodes tested to determine whether lymph nodes were involved with disease as determined by a pathologic examination. |
nan |
False |
Lymphatic Invasion Present |
A yes/no indicator to ask if small or thin-walled vessel invasion is present, indicating lymphatic involvement |
nan |
False |
Metastasis at Diagnosis |
The text term used to describe the extent of metastatic disease present at diagnosis. |
nan |
False |
Metastasis at Diagnosis Site |
Text term to identify an anatomic site in which metastatic disease involvement is found. |
nan |
False |
Method of Diagnosis |
Text term used to describe the method used to confirm the patient's malignant diagnosis. |
nan |
False |
Mitotic Count |
The number of mitoses identified under the microscope in tumors. The method of counting varies, according to the specific tumor examined. Usually, the mitotic count is determined based on the number of mitoses per high power field (40X) or 10 high power fields. |
nan |
False |
Percent Tumor Invasion |
The percentage of tumor cells spread locally in a malignant neoplasm through infiltration or destruction of adjacent tissue. |
nan |
False |
Peritoneal Fluid Cytological Status |
The text term used to describe the malignant status of the peritoneal fluid determined by cytologic testing. |
nan |
False |
Perineural Invasion Present |
A yes/no indicator to ask if perineural invasion or infiltration of tumor or cancer is present. |
nan |
False |
Residual Disease |
Text terms to describe the status of a tissue margin following surgical resection. |
nan |
False |
Synchronous Malignancy |
A yes/no/unknown indicator used to describe whether the patient had an additional malignant diagnosis at the same time the tumor used for sequencing was diagnosed. If both tumors were sequenced, both tumors would have synchronous malignancies. |
nan |
False |
Tumor Confined to Organ of Origin |
The yes/no/unknown indicator used to describe whether the tumor is confined to the organ where it originated and did not spread to a proximal or distant location within the body. |
nan |
False |
Tumor Focality |
The text term used to describe whether the patient's disease originated in a single location or multiple locations. |
nan |
False |
Tumor Largest Dimension Diameter |
Numeric value used to describe the maximum diameter or dimension of the primary tumor, measured in centimeters. |
nan |
False |
Vascular Invasion Present |
The yes/no indicator to ask if large vessel or venous invasion was detected by surgery or presence in a tumor specimen. |
nan |
False |
Yes - Vascular Invasion Present |
Indicates venous invasion was detected by surgery or presence in a tumor specimen |
Vascular Invasion Type |
False |
Vascular Invasion Type |
Text term that represents the type of vascular tumor invasion. |
nan |
False |
Year of Diagnosis |
Numeric value to represent the year of an individual's initial pathologic diagnosis of cancer. |
nan |
False |
Morphology |
The third edition of the International Classification of Diseases for Oncology, published in 2000 used principally in tumor and cancer registries for coding the site (topography) and the histology (morphology) of neoplasms. The study of the structure of the cells and their arrangement to constitute tissues and, finally, the association among these to form organs. In pathology, the microscopic process of identifying normal and abnormal morphologic characteristics in tissues, by employing various cytochemical and immunocytochemical stains. A system of numbered categories for representation of data. |
nan |
True |
Topography Code |
Topography Code, indicating site within the body, based on ICD-O-3. |
nan |
False |
Additional Topography |
Topography not included in the ICD-O-3 Topography codes. |
nan |
False |
Lung Cancer Detection Method Type |
The means, manner of procedure, or systematic course of actions performed in order to discover or identify lung cancer |
nan |
False |
Lung Cancer Participant Procedure History |
Text name of a surgical or operative procedure used in a natural history protocol of a lung cancer participant. |
nan |
False |
Lung Adjacent Histology Type |
The type of morphologic characteristics observed by microscope in the tissue next to a benign or malignant tissue growth |
nan |
False |
Lung Tumor Location Anatomic Site |
Anatomic location of the tumor inside the lung |
nan |
False |
Lung Tumor Lobe Bronchial Location |
Anatomic lobe and bronchial location of the tumor inside the lung |
nan |
False |
Current Lung Cancer Symptoms |
Reported lung cancer related symptoms person is currently experiencing |
nan |
False |
Lung Topography |
Lung PCA specific topography (not covered in previous tiers) |
nan |
False |
Lung Cancer Harboring Genomic Aberrations |
Genomic aberrations in participants with lung cancer (specific lung cancer associated gene mutations not covered in Tiers 1 and 2) |
nan |
False |
Colorectal Cancer Detection Method Type |
The means, manner of procedure, or systematic course of actions performed in order to discover or identify colorectal cancer |
nan |
False |
History of Prior Colon Polyps |
Yes/No indicator to describe if the subject had a previous history of colon polyps as noted in the history/physical or previous endoscopic report(s). |
nan |
False |
Family Colon Cancer History Indicator |
The indicator to designate if any first degree relative has a history of colorectal cancer. |
nan |
False |
Family Medical History Colorectal Polyp Diagnosis |
A yes/no/unknown/not applicable indicator related to family medical history diagnosis of polypoid lesion that arises from the colon or rectum and protrudes into the lumen. |
nan |
False |
Immediate Family History Endometrial Cancer |
Text that describes the age at which the family member was diagnosed with endometrial or uterine cancer in relationship to their 50th birthday. |
nan |
False |
Immediate Family History Ovarian Cancer |
Text that describes the age at which the family member was diagnosed with ovarian cancer in relationship to their 50th birthday. |
nan |
False |
Patient Inflammatory Bowel Disease Personal Medica History |
The indicator for patient's personal medical history of inflammatory bowel disease (chronic, non-specific disorders of unknown etiology, including Crohn disease and ulcerative colitis). |
nan |
False |
Patient Colonoscopy Performed Indicator |
The yes/no indicator that records if the subject has undergone a previous colonoscopy. |
nan |
False |
Colorectal Cancer Tumor Border Configuration |
The description of the border configuration of a colorectal tumor at pathologic assessment. |
nan |
False |
MLH1 Promoter Methylation Status |
Text term to define the status of promoter methylation for the MLH1 gene. Note: MLH1 gene is commonly associated with hereditary nonpolyposis colorectal cancer. Testing for methylation of the MLH1 promoter can help distinguish sporadic from inherited cancers. |
nan |
False |
Colorectal Cancer KRAS Indicator |
The yes/no/not applicable indicator that describes if patient has diagnosis of colorectal cancer with known KRAS. |
nan |
False |
Colon Polyp Occurence Indicator |
Yes/No indicator to describe if the subject had a previous history of colon polyps as noted in the history/physical or previous endoscopic report(s). |
nan |
False |
Family History Colorectal Polyp |
A yes/no/unknown/not applicable indicator related to family medical history diagnosis of polypoid lesion that arises from the colon or rectum and protrudes into the lumen. |
nan |
False |
Colorectal Polyp New Indicator |
A yes/no response to a question that asks whether any new polyps greater than or equal to two millimeters were identified. |
nan |
False |
Colorectal Polyp Shape |
Shape of polyp identified in the participant |
nan |
False |
Size of Polyp Removed |
Size of the polyp removed in cm |
nan |
False |
Colorectal Polyp Count |
The total number of polyps detected |
nan |
False |
Colorectal Polyp Type |
Type of polyp found in the participant |
nan |
False |
Colorectal Polyp Adenoma Type |
Type of adenoma associated with the polyp |
nan |
False |
Breast Carcinoma Detection Method Type |
The means, manner of procedure, or systematic course of actions performed in order to discover or identify breast cancer. |
nan |
False |
Breast Carcinoma Histology Category |
Classification of the type of invasive breast carcinoma diagnosed based on histologic attributes. |
nan |
False |
Invasive Lobular Breast Carcinoma Histologic Category |
The histologic subtype for an infiltrating lobular carcinoma of the breast. |
nan |
False |
Invasive Ductal Breast Carcinoma Histologic Category |
The histologic subtype for the most common type of invasive breast carcinoma. |
nan |
False |
Breast Biopsy Procedure Finding Type |
Text term to describe the result of the examination of the breast tissue specimen or fluid as related to the presence and nature of disease. |
nan |
False |
Breast Quadrant Site |
The breast quadrant or structure from which the breast tissue specimen was removed for microscopic examination. |
nan |
False |
Breast Cancer Assessment Tests |
Text term to identify assessment tests done in participants during diagnosis |
nan |
False |
Breast Cancer Genomic Test Performed |
Text term that represents the name of the genomic test performed for breast cancer. |
nan |
False |
Mammaprint Risk Group |
Text term that represents the risk group for breast cancer as determined by assessment of the MammaPrint test. |
nan |
False |
Oncotype Risk Group |
Text term that represents the risk group for breast cancer as determined by assessment of the Oncotype recurrence score. |
nan |
False |
Breast Carcinoma Estrogen Receptor Status |
Text term to represent the overall result of Estrogen Receptor (ER) testing in a participant with breast cancer |
nan |
False |
Breast Carcinoma Progesteroner Receptor Status |
Text term to represent the overall result of Progesterone Receptor (PR) testing in a participant with breast cancer |
nan |
False |
Breast Cancer Allred Estrogen Receptor Score |
The numeric Allred score, that is cell staining percentage plus intensity, to determine estrogen receptor status. |
nan |
False |
Prior Invasive Breast Disease |
Text term to indicate prior invasive breast condition in the participant |
nan |
False |
Breast Carcinoma ER Status Percentage Value |
A numerical quantity measured or assigned or computed which captures the estrogen receptor level measured in a participant with breast cancer |
nan |
False |
Breast Carcinoma PR Status Percentage Value |
A numerical quantity measured or assigned or computed which captures the progesterone receptor level measured in a participant with breast cancer |
nan |
False |
HER2 Breast Carcinoma Copy Number Total |
Result of HER2 Copy Number testing (in a participant with breast cancer), expressed as a range of values. |
nan |
False |
Breast Carcinoma Centromere 17 Copy Number |
Result of Centromere 17 testing in a sample or specimen of metastatic breast carcinoma, expressed as a range of values. |
nan |
False |
Breast Carcinoma HER2 Centromere17 Copynumber Total |
Number of Cells Counted for HER2 & Centromere 17 Copy Numbers in a participant with breast cancer |
nan |
False |
Breast Carcinoma HER2 Chromosome17 Ratio |
HER2 chromosome 17 ratio in participants with breast cancer |
nan |
False |
Breast Carcinoma Surgical Procedure Name |
Text name of a surgical procedure performed for a person with a diagnosis of breast cancer |
nan |
False |
Breast Carcinoma HER2 Ratio Diagnosis |
HER2 ratio of the participant at diagnosis |
nan |
False |
Breast Carcinoma HER2 Status |
Text term to signify the result of the medical procedure that involves testing a sample of blood or tissue for HER2 in a participant with breast cancer |
nan |
False |
Hormone Therapy Breast Cancer Prevention Indicator |
Did the patient receive hormonal therapy for prevention of breast cancer? |
nan |
False |
Breast Carcinoma ER Staining Intensity |
Text term to indicate the ER staining intensity on pathology assessment in a participant with breast cancer |
nan |
False |
Breast Carcinoma PR Staining Intensity |
Text term to indicate the PR staining intensity on pathology assessment in a participant with breast cancer |
nan |
False |
Oncotype Score |
OncotypeDX recurrence score |
nan |
False |
Breast Imaging Performed Type |
The kind of technology or method performed for screening, diagnosis, surgical procedures or therapy that aids in the visualization of the breast(s). |
nan |
False |
Multifocal Breast Carcinoma Present Indicator |
A response to indicate if there is breast cancer characterized by the presence of multiple cancerous tumors that originate from the same clone and are usually located in the same quadrant of the breast. |
nan |
False |
Multicentric Breast Carcinoma Present Indicator |
A response to indicate if there is breast cancer characterized by the presence of multiple cancerous tumors that originate from different clones and are usually located in different quadrants of the breast. |
nan |
False |
BIRADS Mammography Breast Density Category |
The category that describes the relative amount of different tissues present in the breast on a mammogram based on the updated 2015 edition of the American College of Radiology's Breast Imaging Reporting and Data System (BI-RADS) reporting guidelines. |
nan |
False |
CNS Tumor Primary Anatomic Site |
Primary tumor location within the tissues of the central nervous system (brain and spinal cord), not covered in Tiers 1 and 2 |
nan |
False |
Glioma Specific Metastasis Sites |
Evidence of active brain metastasis including leptomeningeal involvement |
nan |
False |
Glioma Specific Radiation Field |
A description of the location on or within the CNS where radiation was administered in a participant with glioma |
nan |
False |
Supra Tentorial Ependymoma Molecular Subgroup |
Text term to identify the molecular subgroup in a supra tentorial ependymoma |
nan |
False |
Infra Tentorial Ependymoma Molecular Subgroup |
Text term to identify the molecular subgroup in an infra tentorial ependymoma |
nan |
False |
Neuroblastoma MYCN Gene Amplification Status |
Neuroblastoma MYCN amplification or over-expression status |
nan |
False |
Specimen Blast Count Percentage Value |
The value, in percent (%), of the medical procedure that involves testing a sample of blood for blast cells (immature, undifferentiated cells) during diagnosis |
nan |
False |
NCI ALL Risk Group |
The NCI risk group assigned to a patient at initial diagnosis with Acute Lymphoblastic Leukemia. |
nan |
False |
MRD ALL Diagnostic Sensitivity |
The assay sensitivity results of a diagnostic assessment of Minimal Residual Disease in patients diagnosed with Acute Lymphoblastic Leukemia. |
nan |
False |
CNS Leukemia Status |
The status of central nervous system leukemia at the time of diagnosis. |
nan |
False |
Ovarian Cancer Histologic Subtype |
Text term to describe the histological subtype of ovarian cancer in the participant |
nan |
False |
Ovarian Cancer Surgical Outcome |
Text term that describes the kind of surgical treatment administered. |
nan |
False |
Ovarian Cancer Platinum Status |
Text term to indicate the status of treatment with platinum in participant with ovarian cancer |
nan |
False |
Location Extent Extraprostatic Extension |
Location and extent of extraprostatic extension |
nan |
False |
Location Nature Positive Margins |
Location and nature of positive margins |
nan |
False |
Seminal Vesicle Invasion |
An anatomic position identifying a side of the body where local spread of malignant neoplasm is found to infiltrate tissue in the saclike glandular diverticulum on the ductus deferens in a male. |
nan |
False |
Prostate Carcinoma Histologic Type |
The diagnostic subclassification of an invasive prostate carcinoma. |
nan |
False |
Prostate Cancer Local Extent |
The response used to categorize the local extent of disease for prostate cancer. |
nan |
False |
Additonal Findings Uninvolved Prostate |
Additional findings, uninvolved prostate |
nan |
False |
Prostate Cancer Cytologic Morphologic Subtypes |
Text term that describes various morphological and cytological subtypes in prostate tumors. |
nan |
False |
Sarcoma Subtype |
The subtype related to the scientific determination and investigation, analysis and recognition of the presence and nature of disease, condition, or injury from expressed signs and symptoms of tissue growth resulting from uncontrolled cell proliferation. |
nan |
False |
Sarcoma Diagnosis Classification Category |
High level grouping to describe a diagnostic grouping or category for sarcoma, a malignant mesenchymal cell tumor most commonly arising from muscle, fat, fibrous tissue, bone, cartilage, and blood vessels. |
nan |
False |
Sarcoma Tumor Extension Type |
The field to indicate the organs and structures to which the tumor has become adherent or has invaded. |
nan |
False |
Pancreas Precancer Histopathologic Grade |
The grade of precancerous pancreatic tissue based on microscopic study of characteristic tissue abnormalities by employing various cytochemical and immunocytochemical stains. |
nan |
False |
Pancreatic IPMN Pathology Epithelial Subtype |
The Intraductal Papillary Mucinous Neoplasm (IPMN) epithelial cell subtype based on the gross and microscopic examination of a pancreatic neoplasm specimen |
nan |
False |
Pancreatic Duct Final Pathology Type |
The final pathology result of the pancreatic duct communication type. |
nan |
False |
Cutaneous Melanoma Tumor Infiltrating Lymphocytes |
Description of degree of lymphocytic infiltration surrounding and disrupting tumor cells of the vertical growth phase in a cutaneous melanoma. |
nan |
False |
Cutaneous Melanoma Tumor Regression Range |
Description of the degree to which tumor cells are replaced by lymphocytic inflammation with or without dermal melanophages and fibrosis. Range: the difference between the lowest and highest numerical values. |
nan |
False |
Melanoma Specimen Clark Level Value |
Definition of the Clark level or depth of involvement of a melanoma in the skin or a specimen. |
nan |
False |
Cutaneous Melanoma Surgical Margins |
Text term to indicate presence of tumor at resection margins |
nan |
False |
Melanoma Lesion Size |
Diameter of lesion determined on skin examination (pre-bx), in mm |
nan |
False |
History of Atypical Nevi |
Patient has a history of atypical nevi |
nan |
False |
Fitzpatrick Skin Tone |
The Fitzpatrick classification of skin phototype |
nan |
False |
History of Chronic UV Exposure |
History of chronic UV exposure |
nan |
False |
History of Blistering Sunburn |
Patient has history of blistering sunburn |
nan |
False |
History of Tanning Bed Use |
History of tanning bed use of the patient |
nan |
False |
Immediate Family History Melanoma |
Text that describes the age at which the family member was diagnosed with melanoma skin cancer in relationship to their 50th birthday. |
nan |
False |
Melanoma Biopsy Resection Sites |
Biopsy resection sites specific to melanoma (not covered in Tiers 1 and 2) |
nan |
False |
Cutaneous Melanoma Ulceration |
Description of extent of disruption to the surface of the skin caused by the cutaneous melanoma. |
nan |
False |
Cutaneous Melanoma Additional Findings |
Significant pathologic finding present in addition to the cutaneous melanoma. |
nan |
False |
HTAN RPPA Antibody Table ID |
HTAN identifier associated with RPPA antibody level metadata. Identical for every row of the table. |
nan |
True |
Ab Name Reported on Dataset |
The antibody name. |
nan |
True |
GENCODE Gene Symbol Target |
The comma separated list of gene symbols targeted by the antibody. |
nan |
True |
UNIPROT Protein ID Target |
The comma separated list of UNIPROT IDs targeted by the antibody. |
nan |
True |
Phosphoprotein Flag |
A flag that denotes if an antibody targets a phosphoprotein. |
nan |
True |
Internal Ab ID |
Internal lab ID for an antibody. |
nan |
True |
Species |
Host animal. |
nan |
True |
RPPA Dilution |
The dilution ratio. |
nan |
False |
Phospho Site |
The protein site for a phosphoprotein targeting antibody. Report AA and site (e.g. S442) |
Phosphoprotein Flag |
False |
RPPA Validation Status |
Valid = RPPA and WB correlation > 0.7; Use with Caution = RPPA and WB correlation < 0.7; Under Evaluation = Antibody has given mixed results and/or evaluated by another lab; We are in the process of (re)validating; Used for QC = These antibodies are used for tissue sample quality control (QC) |
nan |
False |
Antibody Notes |
Notes on antibody replacements and antibody recognition observations. |
nan |
False |
Pre-processing Completed |
Pre-processing steps completed to convert level 1 raw data to a single level 2 image |
nan |
True |
Pre-processing Required |
Pre-processing steps required to convert level 1 raw data to a single level 2 image |
nan |
True |
Publication |
An empty parent attribute for publications |
nan |
False |
Publication Manifest |
Publication specific attributes. |
Component, Publication-associated HTAN Parent Data File ID, HTAN Grant ID, HTAN Center ID, Publication Content Type, DOI, Title, Authors, Corresponding Author, Corresponding Author ORCID, Year of Publication, Location of Publication, Publication Abstract, License, PMID, Publication contains HTAN ID, Data Type, Tool, Supporting Link, Supporting Link Description |
False |
Publication-associated HTAN Parent Data File ID |
HTAN Data File Identifier(s) of the files associated with the content presented/published. Should be comma-separated lists. |
nan |
True |
HTAN Grant ID |
HTAN grant number(s) (i.e. CA------ format) associated with the content presented/published. |
nan |
True |
HTAN Center ID |
List of HTAN Center ID(s) associated with the content presented/published. |
nan |
True |
Publication Content Type |
The type of content presented or published. |
nan |
True |
DOI |
The digital object identifier (DOI) of the content in the form of https://www.doi.org/{doi} to comply with CrossRef DOI display guidelines. |
nan |
True |
Corresponding Author |
The name(s) of the corresponding author(s) of the content presented/published. If more than one corresponding author, please list in the order they appear in the author list. |
nan |
True |
Corresponding Author ORCID |
The ORCiD(s) of the corresponding author(s) of the content presented/published. Should be a valid ORCiD url starting with https://orcid.org/ followed by a 16 digit identifier in dash separated groups of 4 (for example https://orcid.org/0000-0002-1825-0097). If more than one corresponding author, please list ORCiDs in the order the authors appear in the author list. |
nan |
True |
Title |
The title of the content presented or published. |
nan |
True |
Authors |
The names of the author(s) of the content presented/published, in the order they appear. |
nan |
True |
Year of Publication |
The year the content was presented or published (format YYYY). |
nan |
True |
Location of Publication |
The name of the preprint server, journal, or conference where the content was presented/published. |
nan |
True |
Publication Abstract |
The abstract or short description of the content presented/published. |
nan |
True |
License |
The type of license applicable to the content. |
nan |
False |
PMID |
The PubMed identifier associated with the publication (applicable to published manuscripts). Provide as a URL of the form https://pubmed.ncbi.nlm.nih.gov/{pmid} |
nan |
False |
Data Type |
Types of data associated with the content. Fill out Other Data Type Specified, if not on the list. |
nan |
True |
Other Data Type Specified |
Other types of data associated with the content. |
nan |
False |
Supporting Link |
Relevant external links associated with the content (e.g. external datasets used for validation). Please note: Supporting Links and Supporting Link Descriptions are provided by authors and are not verified by the NIH NCI or the HTAN DCC. This information and any linked data should only be shared by an authorized individual(s) in accordance with the terms of the HTAN data sharing agreements and policies and/or any other applicable agreement(s). Validated as URL. |
nan |
False |
Supporting Link Description |
Description of relevant external links associated with the publication (e.g. an external mouse dataset used for validation). Please note: Supporting Links and Supporting Link Descriptions are provided by authors and are not verified by the NIH NCI or the HTAN DCC. This information and any linked data should only be shared by an authorized individual(s) in accordance with the terms of the HTAN data sharing agreements and policies and/or any other applicable agreement(s). |
nan |
False |
Tool |
Were any software or computational tools generated for this content? |
nan |
True |
Accessory Data Type |
Accessory specific data type |
nan |
False |
Accessory |
An empty parent attribute for accessory |
nan |
False |
Accessory Manifest |
Accessory specific attributes |
Component, Dataset Name, Accessory Synapse ID, Accessory Description, Accessory Data Type, HTAN Center ID, HTAN Parent Biospecimen ID, Accessory-associated HTAN Parent Data File ID |
False |
Dataset Name |
Name of a dataset (e.g. a Synapse folder) |
nan |
True |
Accessory Synapse ID |
Synapse ID of folder containing accessory files |
nan |
True |
Accessory Description |
Free text field containing description of accessory file(s) |
nan |
True |
Accessory-associated HTAN Parent Data File ID |
HTAN Data File Identifier(s) of the files associated with the accessory content. Should be comma-separated lists. |
nan |
False |
MapQ30 |
Number of reads with Quality >= 30. |
nan |
False |
scATAC-seq Object ID |
Orig.Ident or scATAC-seq Object ID |
nan |
False |
nCount Peaks |
Total number of fragments in peaks |
nan |
False |
nFeature Peaks |
Number of peaks with at least one read |
nan |
False |
Total Read-Pairs |
Total read-pairs |
nan |
False |
Duplicate Read-Pairs |
Number of duplicate read-pairs |
nan |
False |
Chimeric Read-Pairs |
Number of chimerically mapped read-pairs |
nan |
False |
Unmapped Read-Pairs |
Number of read-pairs with at least one end not mapped |
nan |
False |
LowMapQ |
Number of read-pairs with <30 mapq on at least one end |
nan |
False |
Mitochondrial Read-Pairs |
Number of read-pairs mapping to mitochondria and non-nuclear contigs |
nan |
False |
Passed Filters |
Number of non-duplicate, usable read-pairs i.e. fragments |
nan |
False |
TSS Fragments |
Number of fragments overlapping with TSS regions |
nan |
False |
DNase Sensitive Region Fragments |
Number of fragments overlapping with DNase sensitive regions |
nan |
False |
Enhancer Region Fragments |
Number of fragments overlapping enhancer regions |
nan |
False |
Promoter Region Fragments |
Number of fragments overlapping promoter regions |
nan |
False |
On Target Fragments |
Number of fragments overlapping any of TSS, enhancer, promoter and DNase hypersensitivity sites (counted with multiplicity) |
nan |
False |
Blacklist Region Fragments |
Number of fragments overlapping blacklisted regions |
nan |
False |
Peak Region Fragments |
Number of fragments overlapping peaks |
nan |
False |
Peak Region Cutsites |
Number of ends of fragments in peak regions |
nan |
False |
Nucleosome Signal |
Nucleosome signal score (strength of the nucleosome signal per cell, computed as the ratio of fragments between 147 bp and 294 bp (mononucleosome) to fragments < 147 bp (nucleosome-free)) |
nan |
False |
Nucleosome Percentile |
Percentile rank of nucleosome score |
nan |
False |
TSS Enrichment |
Transcription start site (TSS) enrichment score |
nan |
False |
TSS Percentile |
Percentile rank of TSS score |
nan |
False |
Pct Reads in Peaks |
Percentage of reads in peaks |
nan |
False |
Blacklist Ratio |
Ratio of reads in blacklist regions |
nan |
False |
Seurat Clusters |
Clusters of cells by a shared nearest neighbor (SNN) modularity optimization based clustering algorithm |
nan |
False |
nCount RNA |
Total number of fragments in genes |
nan |
False |
nFeature RNA |
Number of genes detected in cell |
nan |
False |
MACS2 Seqnames |
Chromosome id |
nan |
False |
MACS2 Start |
Genomic starting position in MACS2 |
nan |
False |
MACS2 End |
Genomic ending position in MACS2 |
nan |
False |
MACS2 Width |
Width of the peak in bases in MACS2 |
nan |
False |
MACS2 Strand |
DNA strand the peak is aligned with in MACS2 |
nan |
False |
MACS2 Name |
Name of the peak in MACS2 |
nan |
False |
MACS2 Score |
Peak score (proportional to q-value) in MACS2 |
nan |
False |
MACS2 Fold Change |
Fold enrichment for this peak summit against random Poisson distribution with local lambda in MACS2 |
nan |
False |
MACS2 Neg Log10 pvalue Summit |
Negative log10 p-value for the peak summit in MACS2 |
nan |
False |
MACS2 Neg Log10 qvalue Summit |
Negative log10 q-value for the peak summit in MACS2 |
nan |
False |
MACS2 Relative Summit Position |
Position of the peak summit relative to the start position in MACS2 |
nan |
False |
Is lowest level |
Denotes that the manifest represents the lowest data level submitted. Use when L1 data is missing |
nan |
False |
Yes - Is lowest level |
If the manifest is the lowest level, HTAN Parent Biospecimen ID is required |
HTAN Parent Biospecimen ID |
False |
Normalization Method |
Description of Normalization Process |
nan |
False |
Batch Correction Method |
Method that was used to batch correct Level 3 data |
nan |
False |
MS Batch ID |
Batch ID indicating a set of samples that were run together. |
nan |
True |
MS-based Assay Type |
Analytes are the target molecules being measured with the assay. |
nan |
True |
MS-based Targeted |
Specifies whether or not a specific molecule(s) is/are targeted for detection/measurement by the assay. Example: The MALDI Imaging analyte is lipids. |
nan |
True |
MS Instrument Vendor and Model |
An acquisition instrument is the device that contains the signal detection hardware and signal processing software. Assays generate signals such as light of various intensities or color or signals representing the molecular mass. |
nan |
True |
MS Source |
The ion source type used for surface sampling (MALDI, MALDI-2, DESI, or SIMS) or LC-MS/MS data acquisition (nESI) |
nan |
True |
Polarity |
The polarity of the mass analysis (positive or negative ion modes) |
nan |
True |
Mass Range Low Value |
The low value of the scanned mass range for MS1 in m/z. |
nan |
True |
Mass Range High Value |
The high value of the scanned mass range for MS1 in m/z. |
nan |
True |
Data Collection Mode |
Mode of data collection in tandem MS assays. Either DDA (Data-dependent acquisition) or DIA (Data-independent acquisition). |
nan |
True |
MS Scan Mode |
Indicates whether experiment is MS, MS/MS, or other (possibly MS3 for TMT) |
nan |
True |
MS Labeling |
Indicates whether samples were labeled prior to MS analysis (e.g., TMT) |
nan |
True |
LC Instrument Vendor and Model |
The manufacturer of the instrument used for LC. |
nan |
True |
LC Column Vendor and Model |
The manufacturer and model number/name of the LC column. If a custom self-packed, pulled tip capillary is used, enter 'Pulled tip capilary' |
nan |
True |
LC Resin |
Details of the resin used for LC, including vendor, particle size, pore size |
nan |
True |
LC Length Value |
LC column length in cm. |
nan |
True |
LC Temp Value |
LC temperature in degrees Celsius (°C). |
nan |
True |
LC ID Value |
LC column inner diameter in microns. |
nan |
True |
LC Flow Rate |
LC flow rate in nL/min. |
nan |
True |
LC Gradient |
The program dictates the mobile phase solvent composition over the course of the chromatographic run. |
nan |
True |
LC Mobile Phase A |
Composition of mobile phase A |
nan |
True |
LC Mobile Phase B |
Composition of mobile phase B |
nan |
True |
MS Instrument Metadata File |
Additional file containing instrument metadata details. Use either synapse_path or entity_Id |
nan |
False |
Bisulfite Conversion |
Name of the kit used in bisulfite conversion. |
nan |
True |
Replicate Type |
A common term for all files belonging to the same sample. We suggest using a stable sample accession from a biosample archive like BioSamples. |
nan |
True |
Bulk Methylation Assay Type |
Assay types normally determine genomic coverage |
Targeted Genome, Beadchip Array |
True |
Targeted Genome |
Assay for analyzing specific mutations in a given sample |
nan |
False |
Beadchip Array |
Assay that uses beads to target a specific locus on the genome. |
nan |
False |
Total DNA Input |
Overall number of reads for a given sample in digits (microgram, nanogram). |
nan |
False |
Trimmer |
Software used for trimming |
nan |
True |
Bulk Methylation Genomic Reference |
The human genome reference used in the alignment of reads |
nan |
True |
Duplicate Removal Software |
Software used to remove duplicate reads |
nan |
True |
Proportion of Minimum CpG Coverage 10X |
Proportion of all reference bases for whole genome sequencing, or targeted sequencing, that achieves 10X or greater coverage per CpG. |
nan |
False |
DMC Calling Tool |
Software used for calling differentially methylated CpG (DMC) and differentially methylated region (DMR) |
nan |
True |
DMC Calling Workflow URL |
Generic name for the workflow used to analyze a data set |
nan |
True |
DMR Calling Tool |
Software used for calling differentially methylated CpG (DMC) and differentially methylated region (DMR) |
nan |
True |
DMR Calling Workflow URL |
Generic name for the workflow used to analyze a data set |
nan |
True |
pUC19 methylation ratio |
Methylation ratio of mostly methylated pUC19 control, as a percentage |
nan |
True |
Lambda methylation ratio |
Methylation ratio of mostly unmethylated lambda control, as a percentage |
nan |
True |
DMC data file format |
Format of the data files |
nan |
True |
DMR data file Format |
Format of the data files. |
nan |
True |
MS Assay Category |
Type of Mass Spectrometry performed. |
nan |
True |
Publication contains HTAN ID |
HTAN IDs are used in the publication. |
nan |
True |
Electron Microscopy Level 1 |
Raw electron microscopy data as one TIFF file per plane for a 3D image stack or per tile for a 2D large area montage |
Component, Filename, File Format, HTAN Parent Biospecimen ID, HTAN Data File ID, EM method, EM signal or contrast mech, EM instrument, Protocol Link, Software and Version, SizeX, SizeY, SizeC, SizeZ, PhysicalSizeX, PhysicalSizeY, PhysicalSizeZ, EM dwell or exposure time, EM voltage, EM beam current, EM spot size, EM stage tilt, EM signal processing, EM contrast type |
False |
Electron Microscopy Level 2 |
Processed electron microscopy data as one OME-TIFF image per plane or montage |
Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, Tile overlap X, Tile overlap Y, EM contrast type |
False |
Electron Microscopy Level 3 |
Segmented electron microscopy data as .am or .tiff formats |
Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, Imaging Object Class |
False |
Electron Microscopy Level 4 |
Movies or other derived files from electron microscopy data |
Component, Filename, File Format, HTAN Parent Data File ID, HTAN Data File ID, Comment |
False |
EM instrument |
Make and model of the EM instrument used |
nan |
True |
EM method |
Electron microscopy method used |
nan |
True |
EM signal or contrast mech |
How the electron microscopy signal is generated from the sample |
nan |
True |
EM dwell or exposure time |
Duration in microseconds (µs) of electron beam data collection per pixel or frame |
nan |
False |
EM voltage |
Accelerating voltage in kiloelectronvolts (keV) |
nan |
False |
EM beam current |
Beam current in nanoamps (nA) |
nan |
False |
EM spot size |
Beam spot size in micrometers (µm) |
nan |
False |
EM stage tilt |
Physical stage tilt in degrees with respect to the electron beam |
nan |
False |
EM signal processing |
SNR improvement strategies used |
nan |
False |
EM contrast type |
Does the image use standard SEM contrast or TEM contrast |
nan |
False |
Tile overlap X |
Percentage of image overlap to allow tile stitching in x direction |
nan |
True |
Tile overlap Y |
Percentage of image overlap to allow tile stitching in y direction |
nan |
True |
Barretts Esophagus Goblet Cells Present |
Presence or absence of Barretts esophagus goblet cells. |
nan |
False |
Pancreatitis Onset Year |
Year of onset of pancreatitis. |
nan |
False |
HTAN Parent Channel Metadata ID |
HTAN ID for a level 3 channels table. |
nan |
True |
Single Nucleus Capture |
Nuclei isolation method |
nan |
False |
Associated mRNA Library Data File ID |
Sample Level HTAN Data File ID for the associated level - HTAN ID of this file HTAN ID SOP (eg HTANx_yyy_zzz) |
nan |
True |
Single Cell Barcode Method Applied |
The method by which cells are multiplexed or labeled with cell surface markers or probes |
nan |
True |
Feature Barcode Library Type |
The library construction methods for the feature barcode library |
nan |
True |
Barcode Folder Synapse ID |
Synapse ID of the folder containing the barcode lists |
nan |
True |
Barcode Folder File List |
A comma separated list of filenames in the gzipped folder detailing what barcodes are specific to demultiplexing samples versus providing surface protein data |
nan |
True |
Microarray Platform ID |
The NCBI GEO Microarray Platform ID that links to the table containing the array definition |
nan |
True |
Microarray Molecule |
Microarray is measuring this kind of molecule |
nan |
True |
Microarray Label |
Microarray used this kind of label |
nan |
True |
Microarray Value Definition |
What the provided value signifies |
nan |
True |
Microarray Protocol Auxiliary File |
Auxiliary file describing the experimental protocols used, as described in the NCBI GEO microarray template, recorded as synapse ID (syn12345). |
nan |
True |
Participant Vital Status Update |
Updates to a participant's vital status |
Component, HTAN Participant ID, Vital Status |
False |
Precancer Diagnosis |
Diagnosis of a precancerous condition |
Component, HTAN Participant ID, Precancer Case |
False |
Alive |
This indicates the participant is alive and defines further required metadata |
Days to Vital Status Reference |
False |
Days to Vital Status Reference |
Number of days between the date used for index and the reference date for designation of vital status |
nan |
True |
Precancer Case |
Yes/No indicator to designate the participant for whom precancerous lesion(s) was identified (premalignancy only). |
nan |
True |
Yes - Precancer Case |
Indicates that the participant is a precancer case |
Precancerous Condition Type, Days to Precancer Case Designation, WHO Precursor Lesion Code |
False |
Days to Precancer Case Designation |
Number of days between the date used for index and the reference date for designation of precancer status. |
nan |
False |
WHO Precursor Lesion Code |
World Health Organization Classification of Tumour cytopathology-based coding system, includes 'precursor lesion' designations for precancers. ICD-O-3 morphology axis format, e.g. 1234/1 |
nan |
False |