Regression and data mining methods for analyses of multiple rare variants in the Genetic Analysis Workshop 17 mini-exome data

Joan E. Bailey-Wilson, Jennifer S. Brennan, Shelley B. Bull, Robert Culverhouse, Yoonhee Kim, Yuan Jiang, Jeesun Jung, Qing Li, Claudia Lamina, Ying Liu, Reedik Mägi, Yue S. Niu, Claire Simpson, Libo Wang, Yildiz E. Yilmaz, Heping Zhang, Zhaogong Zhang

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Group 14 of Genetic Analysis Workshop 17 examined several issues related to analysis of complex traits using DNA sequence data. These issues included novel methods for analyzing rare genetic variants in an aggregated manner (often termed collapsing rare variants), evaluation of various study designs to increase power to detect effects of rare variants, and the use of machine learning approaches to model highly complex heterogeneous traits. Various published and novel methods for analyzing traits with extreme locus and allelic heterogeneity were applied to the simulated quantitative and disease phenotypes. Overall, we conclude that power is (as expected) dependent on locus-specific heritability or contribution to disease risk, large samples will be required to detect rare causal variants with small effect sizes, extreme phenotype sampling designs may increase power for smaller laboratory costs, methods that allow joint analysis of multiple variants per gene or pathway are more powerful in general than analyses of individual rare variants, population-specific analyses can be optimal when different subpopulations harbor private causal mutations, and machine learning methods may be useful for selecting subsets of predictors for follow-up in the presence of extreme locus heterogeneity and large numbers of potential predictors.
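As an illustration of what "collapsing rare variants" can mean in practice, the following is a minimal sketch and not the method of any particular Group 14 contribution. It assumes genotypes are coded as minor-allele counts (0, 1, 2) and flags each individual as a carrier if they hold at least one variant whose estimated minor allele frequency falls below a threshold. The function names and the threshold value are hypothetical choices made for this example.

```python
from typing import List


def minor_allele_freq(genotypes: List[int]) -> float:
    """Estimate the minor allele frequency of one variant from 0/1/2 counts."""
    return sum(genotypes) / (2 * len(genotypes))


def collapse_rare(genotype_matrix: List[List[int]],
                  maf_threshold: float = 0.01) -> List[int]:
    """Collapse the rare variants of one gene into a 0/1 carrier indicator.

    genotype_matrix[i][j] is the minor-allele count of individual i at
    variant j (an assumed coding). A variant counts as rare when its
    estimated frequency is above 0 and below maf_threshold.
    """
    n_variants = len(genotype_matrix[0])
    rare = [
        j for j in range(n_variants)
        if 0 < minor_allele_freq([row[j] for row in genotype_matrix]) < maf_threshold
    ]
    # An individual is a carrier if any rare variant has a nonzero count.
    return [int(any(row[j] > 0 for j in rare)) for row in genotype_matrix]


# Toy data: 5 individuals, 3 variants; the third variant is common.
toy = [
    [1, 0, 1],
    [0, 0, 1],
    [0, 0, 2],
    [0, 1, 0],
    [0, 0, 0],
]
carriers = collapse_rare(toy, maf_threshold=0.25)
```

The resulting carrier indicator can then be used as a single predictor in a regression on the trait, in place of one test per rare variant.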

Original language: English (US)
Journal: Genetic Epidemiology
Volume: 35
Issue number: SUPPL. 1
DOI: https://doi.org/10.1002/gepi.20657
State: Published - Dec 5 2011

Fingerprint

Exome
Data Mining
Education
Phenotype
Costs and Cost Analysis
Mutation
Population
Genes
Machine Learning

All Science Journal Classification (ASJC) codes

  • Epidemiology
  • Genetics (clinical)

Cite this

Bailey-Wilson, J. E., Brennan, J. S., Bull, S. B., Culverhouse, R., Kim, Y., Jiang, Y., ... Zhang, Z. (2011). Regression and data mining methods for analyses of multiple rare variants in the Genetic Analysis Workshop 17 mini-exome data. Genetic Epidemiology, 35(SUPPL. 1). https://doi.org/10.1002/gepi.20657

@article{24b9f121fe1444a8bd3b6e27f0540687,
title = "Regression and data mining methods for analyses of multiple rare variants in the Genetic Analysis Workshop 17 mini-exome data",
author = "Bailey-Wilson, {Joan E.} and Brennan, {Jennifer S.} and Bull, {Shelley B.} and Robert Culverhouse and Yoonhee Kim and Yuan Jiang and Jeesun Jung and Qing Li and Claudia Lamina and Ying Liu and Reedik M{\"a}gi and Niu, {Yue S.} and Claire Simpson and Libo Wang and Yilmaz, {Yildiz E.} and Heping Zhang and Zhaogong Zhang",
year = "2011",
month = "12",
day = "5",
doi = "10.1002/gepi.20657",
language = "English (US)",
volume = "35",
journal = "Genetic Epidemiology",
issn = "0741-0395",
publisher = "Wiley-Liss Inc.",
number = "SUPPL. 1",
}

TY - JOUR

T1 - Regression and data mining methods for analyses of multiple rare variants in the Genetic Analysis Workshop 17 mini-exome data

AU - Bailey-Wilson, Joan E.

AU - Brennan, Jennifer S.

AU - Bull, Shelley B.

AU - Culverhouse, Robert

AU - Kim, Yoonhee

AU - Jiang, Yuan

AU - Jung, Jeesun

AU - Li, Qing

AU - Lamina, Claudia

AU - Liu, Ying

AU - Mägi, Reedik

AU - Niu, Yue S.

AU - Simpson, Claire

AU - Wang, Libo

AU - Yilmaz, Yildiz E.

AU - Zhang, Heping

AU - Zhang, Zhaogong

PY - 2011/12/5

Y1 - 2011/12/5

AB - Group 14 of Genetic Analysis Workshop 17 examined several issues related to analysis of complex traits using DNA sequence data. These issues included novel methods for analyzing rare genetic variants in an aggregated manner (often termed collapsing rare variants), evaluation of various study designs to increase power to detect effects of rare variants, and the use of machine learning approaches to model highly complex heterogeneous traits. Various published and novel methods for analyzing traits with extreme locus and allelic heterogeneity were applied to the simulated quantitative and disease phenotypes. Overall, we conclude that power is (as expected) dependent on locus-specific heritability or contribution to disease risk, large samples will be required to detect rare causal variants with small effect sizes, extreme phenotype sampling designs may increase power for smaller laboratory costs, methods that allow joint analysis of multiple variants per gene or pathway are more powerful in general than analyses of individual rare variants, population-specific analyses can be optimal when different subpopulations harbor private causal mutations, and machine learning methods may be useful for selecting subsets of predictors for follow-up in the presence of extreme locus heterogeneity and large numbers of potential predictors.

UR - http://www.scopus.com/inward/record.url?scp=82455219111&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=82455219111&partnerID=8YFLogxK

U2 - 10.1002/gepi.20657

DO - 10.1002/gepi.20657

M3 - Article

C2 - 22128066

AN - SCOPUS:82455219111

VL - 35

JO - Genetic Epidemiology

JF - Genetic Epidemiology

SN - 0741-0395

IS - SUPPL. 1

ER -