Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores

Schizophrenia Working Group of the Psychiatric Genomics Consortium, Psychosis Endophenotypes International Consortium, Wellcome Trust Case Control Consortium, Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE) study, Hereditary Breast and Ovarian Cancer Research Group Netherlands (HEBON)

Research output: Contribution to journal › Article

140 Citations (Scopus)

Abstract

Polygenic risk scores have shown great promise in predicting complex disease risk and will become more accurate as training sample sizes increase. The standard approach for calculating risk scores involves linkage disequilibrium (LD)-based marker pruning and applying a p value threshold to association statistics, but this discards information and can reduce predictive accuracy. We introduce LDpred, a method that infers the posterior mean effect size of each marker by using a prior on effect sizes and LD information from an external reference panel. Theory and simulations show that LDpred outperforms the approach of pruning followed by thresholding, particularly at large sample sizes. Accordingly, predicted R² increased from 20.1% to 25.3% in a large schizophrenia dataset and from 9.8% to 12.0% in a large multiple sclerosis dataset. A similar relative improvement in accuracy was observed for three additional large disease datasets and for non-European schizophrenia samples. The advantage of LDpred over existing methods will grow as sample sizes increase.
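
The contrast the abstract draws can be sketched in a few lines of Python. This is a minimal illustration, not the LDpred implementation: it assumes standardized genotypes, a Gaussian prior in which each of M markers has effect variance h2/M, and a known LD (correlation) matrix, under which the posterior mean has the closed form (D + M/(N h2) I)^-1 beta_hat. The function names, the toy LD matrix, and all numeric inputs are hypothetical.

```python
import numpy as np


def prune_and_threshold(beta_hat, p_values, ld, p_cut=0.05, r2_cut=0.2):
    """Greedy LD pruning (most significant marker first), then p-value thresholding.

    Markers that are dropped get weight 0; kept markers keep their marginal estimate.
    """
    kept = []
    for j in np.argsort(p_values):
        if p_values[j] > p_cut:
            break  # remaining markers are even less significant
        if all(ld[j, k] ** 2 < r2_cut for k in kept):
            kept.append(j)
    weights = np.zeros(len(beta_hat))
    idx = np.array(kept, dtype=int)
    weights[idx] = np.asarray(beta_hat, dtype=float)[idx]
    return weights


def ld_aware_posterior_mean(beta_hat, ld, n, h2):
    """Posterior mean effects under an assumed Gaussian prior beta_j ~ N(0, h2 / M).

    Solves (D + M / (n * h2) * I) w = beta_hat, so every marker contributes
    and correlated markers share the signal instead of being discarded.
    """
    m = len(beta_hat)
    return np.linalg.solve(ld + (m / (n * h2)) * np.eye(m), beta_hat)


# Toy example: markers 0 and 1 are in strong LD, marker 2 is independent.
ld = np.array([[1.0, 0.9, 0.0],
               [0.9, 1.0, 0.0],
               [0.0, 0.0, 1.0]])
beta_hat = np.array([0.20, 0.18, 0.01])
p_values = np.array([1e-8, 1e-6, 0.5])

pt_weights = prune_and_threshold(beta_hat, p_values, ld)
ld_weights = ld_aware_posterior_mean(beta_hat, ld, n=1000, h2=0.5)
print("P+T weights:     ", pt_weights)
print("LD-aware weights:", ld_weights)
```

In the toy run, pruning and thresholding keeps only marker 0, while the LD-aware estimate assigns a weight to every marker. For scale, the gains reported in the abstract correspond to relative improvements of roughly 26% (20.1% to 25.3%) and 22% (9.8% to 12.0%).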

Original language: English (US)
Pages (from-to): 576-592
Number of pages: 17
Journal: American Journal of Human Genetics
Volume: 97
Issue number: 4
DOI: https://doi.org/10.1016/j.ajhg.2015.09.001
State: Published - Jan 1 2015

Fingerprint

Linkage Disequilibrium
Sample Size
Schizophrenia
Multiple Sclerosis
Datasets

All Science Journal Classification (ASJC) codes

  • Genetics
  • Genetics(clinical)

Cite this

Schizophrenia Working Group of the Psychiatric Genomics Consortium, Psychosis Endophenotypes International Consortium, Wellcome Trust Case Control Consortium, Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE) study, & Hereditary Breast and Ovarian Cancer Research Group Netherlands (HEBON) (2015). Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores. American Journal of Human Genetics, 97(4), 576-592. https://doi.org/10.1016/j.ajhg.2015.09.001

@article{5bd22f30c3644d048e42d30295e8dd95,
title = "Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores",
author = "{Schizophrenia Working Group of the Psychiatric Genomics Consortium} and {Psychosis Endophenotypes International Consortium} and {Wellcome Trust Case Control Consortium} and {Discovery, Biology, and Risk of Inherited Variants in Breast Cancer (DRIVE) study} and {Hereditary Breast and Ovarian Cancer Research Group Netherlands (HEBON)} and Vilhj{\'a}lmsson, {Bjarni J.} and Jian Yang and Finucane, {Hilary K.} and Alexander Gusev and Sara Lindstr{\"o}m and Stephan Ripke and Giulio Genovese and Loh, {Po Ru} and Gaurav Bhatia and Ron Do and Tristan Hayeck and Won, {Hong Hee} and Neale, {Benjamin M.} and Aiden Corvin and Walters, {James T.R.} and Farh, {Kai How} and Holmans, {Peter A.} and Phil Lee and Brendan Bulik-Sullivan and Collier, {David A.} and Hailiang Huang and Pers, {Tune H.} and Ingrid Agartz and Esben Agerbo and Margot Albus and Madeline Alexander and Farooq Amin and Bacanu, {Silviu A.} and Martin Begemann and Belliveau, {Richard A.} and Judit Bene and Bergen, {Sarah E.} and Elizabeth Bevilacqua and Bigdeli, {Tim B.} and Black, {Donald W.} and Richard Bruggeman and Buccola, {Nancy G.} and Buckner, {Randy L.} and William Byerley and Wiepke Cahn and Guiqing Cai and Dominique Campion and Cantor, {Rita M.} and Carr, {Vaughan J.} and Noa Carrera and Catts, {Stanley V.} and Chambert, {Kimberly D.} and Chan, {Raymond C.K.} and Chen, {Ronald Y.L.} and Aaron Wolen",
year = "2015",
month = "1",
day = "1",
doi = "10.1016/j.ajhg.2015.09.001",
language = "English (US)",
volume = "97",
pages = "576--592",
journal = "American Journal of Human Genetics",
issn = "0002-9297",
publisher = "Cell Press",
number = "4",

}