POST: A framework for set-based association analysis in high-dimensional data

Xueyuan Cao, E. Olusegun George, Mingjuan Wang, Dale B. Armstrong, Cheng Cheng, Susana Raimondi, Jeffrey E. Rubnitz, James R. Downing, Mondira Kundu, Stanley B. Pounds

Research output: Contribution to journal › Article

Abstract

Evaluating the differential expression of a set of genes belonging to a common biological process or ontology has proven to be a very useful tool for biological discovery. However, existing gene-set association methods are limited to applications that evaluate differential expression across k⩾2 treatment groups or biological categories. This limitation precludes researchers from most effectively evaluating the association with other phenotypes that may be more clinically meaningful, such as quantitative variables or censored survival time variables. Projection onto the Orthogonal Space Testing (POST) is proposed as a general procedure that can robustly evaluate the association of a gene-set with several different types of phenotypic data (categorical, ordinal, continuous, or censored). For each gene-set, POST transforms the gene profiles into a set of eigenvectors and then uses statistical modeling to compute a set of z-statistics that measure the association of each eigenvector with the phenotype. The overall gene-set statistic is the sum of squared z-statistics weighted by the corresponding eigenvalues. Finally, bootstrapping is used to compute a p-value. POST may evaluate associations with or without adjustment for covariates. In simulation studies, it is shown that the performance of POST in evaluating the association with a categorical phenotype is similar to or exceeds that of existing methods. In evaluating the association of 875 biological processes with the time to relapse of pediatric acute myeloid leukemia, POST identified the well-known oncogenic WNT signaling pathway as its top hit. These results indicate that POST can be a very useful tool for evaluating the association of a gene-set with a variety of different phenotypes. We have developed an R package named POST which is freely available in Bioconductor.
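The steps described in the abstract (project the gene profiles onto orthogonal components, compute one z-statistic per component, combine the squared z-statistics with eigenvalue weights, then resample for a p-value) can be illustrated with a minimal sketch. It is not the implementation in the POST R package. The function names `post_like_statistic` and `permutation_p_value`, the 80% variance cut-off, and the correlation-based z-statistic for a continuous phenotype are all assumptions made for illustration. The sketch also swaps in a permutation null where the paper uses bootstrapping, because the abstract does not specify the bootstrap scheme.

```python
import numpy as np


def post_like_statistic(expr, phenotype, var_explained=0.8):
    """Eigenvalue-weighted sum of squared z-statistics for one gene set.

    expr: (n_samples, n_genes) expression matrix for the genes in the set.
    phenotype: (n_samples,) continuous phenotype.
    var_explained: assumed rule for how many components to keep.
    """
    # Center each gene so the decomposition reflects the covariance structure.
    centered = expr - expr.mean(axis=0)
    # SVD: columns of u are orthogonal sample-level projections and
    # s**2 is proportional to the eigenvalues of the gene covariance matrix.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    eigvals = s ** 2
    # Keep the leading components that reach the requested variance fraction.
    frac = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(frac, var_explained)) + 1, len(eigvals))
    scores = u[:, :k] - u[:, :k].mean(axis=0)
    weights = eigvals[:k] / eigvals.sum()
    # Assumed z-statistic: Pearson correlation scaled by sqrt(n - 1).
    y = phenotype - phenotype.mean()
    r = scores.T @ y / (np.linalg.norm(scores, axis=0) * np.linalg.norm(y))
    z = r * np.sqrt(len(y) - 1)
    return float(np.sum(weights * z ** 2))


def permutation_p_value(expr, phenotype, n_perm=500, seed=0):
    """Permutation p-value (a stand-in for the bootstrap used in the paper)."""
    rng = np.random.default_rng(seed)
    observed = post_like_statistic(expr, phenotype)
    null = np.array([
        post_like_statistic(expr, rng.permutation(phenotype))
        for _ in range(n_perm)
    ])
    # Add-one correction keeps the p-value strictly positive.
    return float((1 + np.sum(null >= observed)) / (1 + n_perm))


# Simulated example: 60 samples, 20 genes, 5 of which track the phenotype.
rng = np.random.default_rng(1)
n_samples, n_genes = 60, 20
y = rng.normal(size=n_samples)
null_set = rng.normal(size=(n_samples, n_genes))
assoc_set = null_set.copy()
assoc_set[:, :5] += 2.0 * y[:, None]

p_null = permutation_p_value(null_set, y)
p_assoc = permutation_p_value(assoc_set, y)
print(f"unassociated set: p = {p_null:.3f}")
print(f"associated set:   p = {p_assoc:.3f}")
```

For a categorical, ordinal, or censored phenotype, only the z-statistic step would change, for example to a Wald statistic from a regression or Cox model fitted to each projection; the projection and the weighted sum would stay the same.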

Original language: English (US)
Pages (from-to): 76-81
Number of pages: 6
Journal: Methods
Volume: 145
DOIs: https://doi.org/10.1016/j.ymeth.2018.05.011
State: Published - Aug 1 2018

Fingerprint

Genes
Testing
Phenotype
Biological Phenomena
Statistics
Eigenvalues and eigenfunctions
Biological Ontologies
Pediatrics
Acute Myeloid Leukemia
Ontology
Research Personnel
Recurrence

All Science Journal Classification (ASJC) codes

  • Molecular Biology
  • Biochemistry, Genetics and Molecular Biology(all)

Cite this

Cao, X., George, E. O., Wang, M., Armstrong, D. B., Cheng, C., Raimondi, S., ... Pounds, S. B. (2018). POST: A framework for set-based association analysis in high-dimensional data. Methods, 145, 76-81. https://doi.org/10.1016/j.ymeth.2018.05.011

@article{41df8b9f382d47b080bf2e86829dcaf9,
title = "POST: A framework for set-based association analysis in high-dimensional data",
author = "Xueyuan Cao and George, {E. Olusegun} and Mingjuan Wang and Armstrong, {Dale B.} and Cheng Cheng and Susana Raimondi and Rubnitz, {Jeffrey E.} and Downing, {James R.} and Mondira Kundu and Pounds, {Stanley B.}",
year = "2018",
month = "8",
day = "1",
doi = "10.1016/j.ymeth.2018.05.011",
language = "English (US)",
volume = "145",
pages = "76--81",
journal = "Methods",
issn = "1046-2023",
publisher = "Academic Press Inc.",

}

TY - JOUR

T1 - POST

T2 - A framework for set-based association analysis in high-dimensional data

AU - Cao, Xueyuan

AU - George, E. Olusegun

AU - Wang, Mingjuan

AU - Armstrong, Dale B.

AU - Cheng, Cheng

AU - Raimondi, Susana

AU - Rubnitz, Jeffrey E.

AU - Downing, James R.

AU - Kundu, Mondira

AU - Pounds, Stanley B.

PY - 2018/8/1

Y1 - 2018/8/1

AB - Evaluating the differential expression of a set of genes belonging to a common biological process or ontology has proven to be a very useful tool for biological discovery. However, existing gene-set association methods are limited to applications that evaluate differential expression across k⩾2 treatment groups or biological categories. This limitation precludes researchers from most effectively evaluating the association with other phenotypes that may be more clinically meaningful, such as quantitative variables or censored survival time variables. Projection onto the Orthogonal Space Testing (POST) is proposed as a general procedure that can robustly evaluate the association of a gene-set with several different types of phenotypic data (categorical, ordinal, continuous, or censored). For each gene-set, POST transforms the gene profiles into a set of eigenvectors and then uses statistical modeling to compute a set of z-statistics that measure the association of each eigenvector with the phenotype. The overall gene-set statistic is the sum of squared z-statistics weighted by the corresponding eigenvalues. Finally, bootstrapping is used to compute a p-value. POST may evaluate associations with or without adjustment for covariates. In simulation studies, it is shown that the performance of POST in evaluating the association with a categorical phenotype is similar to or exceeds that of existing methods. In evaluating the association of 875 biological processes with the time to relapse of pediatric acute myeloid leukemia, POST identified the well-known oncogenic WNT signaling pathway as its top hit. These results indicate that POST can be a very useful tool for evaluating the association of a gene-set with a variety of different phenotypes. We have developed an R package named POST which is freely available in Bioconductor.

UR - http://www.scopus.com/inward/record.url?scp=85047800078&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85047800078&partnerID=8YFLogxK

U2 - 10.1016/j.ymeth.2018.05.011

DO - 10.1016/j.ymeth.2018.05.011

M3 - Article

VL - 145

SP - 76

EP - 81

JO - Methods

JF - Methods

SN - 1046-2023

ER -