CC-PROMISE effectively integrates two forms of molecular data with multiple biologically related endpoints

Xueyuan Cao, Kristine R. Crews, James Downing, Jatinder Lamba, Stanley B. Pounds

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

Background: As new technologies allow investigators to collect multiple forms of molecular data (genomic, epigenomic, transcriptomic, etc.) and multiple endpoints on a clinical trial cohort, it will become necessary to effectively integrate all these data in a way that reliably identifies biologically important genes.

Methods: We introduce CC-PROMISE as an integrated data analysis method that combines components of canonical correlation (CC) and projection onto the most interesting evidence (PROMISE). For each gene, CC-PROMISE first uses CC to compute scores that represent the association of two forms of molecular data with each other. Next, these scores are substituted into PROMISE to evaluate the statistical evidence that the molecular data show a biologically meaningful relationship with the endpoints.

Results: CC-PROMISE shows outstanding performance in simulation studies and an example application involving pediatric leukemia. In simulation studies, CC-PROMISE controls the type I error (misleading significance) rate very near the nominal level across 100 distinct null settings in which no molecular-endpoint association exists. Also, CC-PROMISE has better statistical power than three other methods that control type I error in 396 of 400 (99%) alternative settings for which a molecular-endpoint association is present; the power advantage of CC-PROMISE exceeds 30% in 127 of the 400 (32%) alternative settings. These advantages of CC-PROMISE are also observed in an example application.

Conclusion: CC-PROMISE very effectively identifies genes for which some form of molecular data shows a biologically meaningful association with multiple related endpoints.

Availability: The R package CCPROMISE is currently available from www.stjuderesearch.org/site/depts/biostats/software.
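The two-step procedure described in the Methods can be illustrated with a minimal Python sketch for a single gene. This is not the CCPROMISE R package and not necessarily the authors' exact statistic. The function names, the use of only the first canonical pair, the averaging of the two scores' projections, the absolute value (taken because the sign of a canonical score pair is arbitrary), and the subject-label permutation scheme are all assumptions made for illustration.

```python
import numpy as np


def first_canonical_scores(X, Y):
    """First pair of canonical scores for two data blocks (rows = subjects).

    Assumes more subjects than columns and full column rank in each block.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)   # orthonormal basis of the centered X columns
    Qy, _ = np.linalg.qr(Yc)   # orthonormal basis of the centered Y columns
    U, s, Vt = np.linalg.svd(Qx.T @ Qy)
    u = Qx @ U[:, 0]           # canonical score for the first data form
    v = Qy @ Vt[0, :]          # canonical score for the second data form
    return u, v, float(s[0])   # s[0] is the first canonical correlation


def projected_stat(score, endpoints, pattern):
    """Project the score-endpoint correlations onto the expected sign pattern."""
    z = (score - score.mean()) / score.std()
    E = (endpoints - endpoints.mean(axis=0)) / endpoints.std(axis=0)
    r = z @ E / len(z)         # Pearson correlation with each endpoint
    return float(r @ pattern) / len(pattern)


def cc_promise_sketch(X, Y, endpoints, pattern, n_perm=200, seed=0):
    """Permutation p-value for one gene (illustrative only)."""
    rng = np.random.default_rng(seed)
    pattern = np.asarray(pattern, dtype=float)
    u, v, rho = first_canonical_scores(X, Y)

    def stat(E):
        # Assumption: average the projections of the two scores and take the
        # absolute value, since the sign of a canonical pair is arbitrary.
        return abs(projected_stat(u, E, pattern) + projected_stat(v, E, pattern)) / 2

    observed = stat(endpoints)
    exceed = 0
    for _ in range(n_perm):
        # Shuffle subjects' endpoint rows to break any molecular-endpoint link.
        shuffled = endpoints[rng.permutation(len(endpoints))]
        exceed += int(stat(shuffled) >= observed)
    return rho, observed, (1 + exceed) / (1 + n_perm)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 100
    latent = rng.normal(size=n)
    X = latent[:, None] + 0.5 * rng.normal(size=(n, 3))   # e.g. epigenomic features
    Y = latent[:, None] + 0.5 * rng.normal(size=(n, 2))   # e.g. transcriptomic features
    endpoints = np.column_stack([latent, -latent]) + 0.5 * rng.normal(size=(n, 2))
    rho, stat, p = cc_promise_sketch(X, Y, endpoints, pattern=[1, -1])
    print(f"canonical correlation={rho:.2f}, statistic={stat:.2f}, p={p:.3f}")
```

The simulated data are synthetic and exist only to exercise the functions. The `pattern` vector encodes the hypothesized directions of association with each endpoint (here, opposite signs for the two endpoints), which is one simple way to express a "biologically meaningful" relationship across related endpoints.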

Original language: English (US)
Article number: 382
Journal: BMC Bioinformatics
Volume: 17
DOI: 10.1186/s12859-016-1217-0
State: Published - Oct 6 2016

All Science Journal Classification (ASJC) codes

  • Structural Biology
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics

Cite this

CC-PROMISE effectively integrates two forms of molecular data with multiple biologically related endpoints. / Cao, Xueyuan; Crews, Kristine R.; Downing, James; Lamba, Jatinder; Pounds, Stanley B.

In: BMC Bioinformatics, Vol. 17, 382, 06.10.2016.

@article{0ec1961612c7489ab07c5c819380b590,
title = "CC-PROMISE effectively integrates two forms of molecular data with multiple biologically related endpoints",
author = "Xueyuan Cao and Crews, {Kristine R.} and James Downing and Jatinder Lamba and Pounds, {Stanley B.}",
year = "2016",
month = "10",
day = "6",
doi = "10.1186/s12859-016-1217-0",
language = "English (US)",
volume = "17",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",

}

TY - JOUR

T1 - CC-PROMISE effectively integrates two forms of molecular data with multiple biologically related endpoints

AU - Cao, Xueyuan

AU - Crews, Kristine R.

AU - Downing, James

AU - Lamba, Jatinder

AU - Pounds, Stanley B.

PY - 2016/10/6

Y1 - 2016/10/6

UR - http://www.scopus.com/inward/record.url?scp=84990841946&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84990841946&partnerID=8YFLogxK

U2 - 10.1186/s12859-016-1217-0

DO - 10.1186/s12859-016-1217-0

M3 - Article

VL - 17

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

M1 - 382

ER -