Methodological aspects of the genetic dissection of gene expression

Ö Carlborg, D. J. De Koning, K. F. Manly, E. Chesler, Robert Williams, Chris S. Haley

Research output: Contribution to journal › Article

49 Citations (Scopus)

Abstract

Motivation: Dissection of the genetics underlying gene expression uses techniques from microarray analysis as well as quantitative trait loci (QTL) mapping. Available QTL mapping methods are not tailored to the highly automated analyses required for the thousands of gene transcripts encountered when mapping QTL that affect gene expression (sometimes referred to as eQTL). This report focuses on adapting QTL mapping methodology to perform automated mapping of QTL affecting gene expression.

Results: Analyses of expression data on >12 000 gene transcripts in BXD recombinant inbred mice found, on average, 629 QTL exceeding the genome-wide 5% threshold. Using additional information on trait repeatabilities and QTL location, 168 of these were classified as 'high confidence' QTL. Current sample sizes of genetical genomics studies make it possible to detect a reasonable number of QTL using simple genetic models, but considerably larger studies are needed to evaluate more complex genetic models. After extensive analyses of real data and additional simulated data (altogether >300 000 genome scans), we make the following recommendations for detecting QTL for gene expression:
(1) For populations with an unbalanced number of replicates per genotype, weighted least squares should be preferred over ordinary least squares. Weights can be based on the repeatability of the trait and the number of replicates.
(2) A genome scan based on multiple-marker information but analysing only at marker locations is a good approximation to a full interval mapping procedure.
(3) Significance testing should be based on empirical genome-wide significance thresholds derived separately for each trait.
(4) The significant QTL can be separated into high- and low-confidence QTL using a false discovery rate that incorporates prior information such as transcript repeatabilities and co-localization of gene transcripts and QTL.
(5) Including observations on the founder lines in the QTL analysis should be avoided, as it inflates the test statistic and increases the Type I error.
(6) To increase the computational efficiency of the study, parallel computing is advised.
These recommendations are summarized in a possible strategy for mapping QTL in a least squares framework.
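Recommendations (1) and (3) can be illustrated with a small sketch. This is not the authors' code, and several details are assumptions: each strain contributes the mean of its replicates, genotypes are biallelic RI codes 0/1 with no missing data, and the weight for a strain mean of n replicates with trait repeatability r is taken as w = n / (1 + (n - 1)r), the inverse of the relative variance of a mean of n repeated measurements. The scan fits a single-marker weighted regression using only the genotype observed at each marker, which is simpler than the multiple-marker scan of recommendation (2), and the genome-wide threshold is the (1 - alpha) quantile of the maximum F statistic over permutations of that one trait. All function names and the toy data are hypothetical.

```python
import random


def strain_weight(n_reps, repeatability):
    """Weight for a strain mean of n_reps replicates (hypothetical helper).

    Var(mean) is proportional to r + (1 - r) / n, so the weight is its inverse,
    n / (1 + (n - 1) * r).
    """
    return n_reps / (1.0 + (n_reps - 1) * repeatability)


def wls_f(y, g, w):
    """F statistic for y = mu + a * g + e fitted by weighted least squares."""
    sw = sum(w)
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    gbar = sum(wi * gi for wi, gi in zip(w, g)) / sw
    sxx = sum(wi * (gi - gbar) ** 2 for wi, gi in zip(w, g))
    # Weighted residual sum of squares of the null (mean-only) model.
    rss0 = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))
    if sxx == 0.0 or rss0 == 0.0:
        return 0.0  # monomorphic marker or constant trait: nothing to test
    sxy = sum(wi * (gi - gbar) * (yi - ybar) for wi, gi, yi in zip(w, g, y))
    rss1 = rss0 - sxy ** 2 / sxx  # weighted RSS after fitting the marker effect
    if rss1 <= 0.0:
        return float("inf")  # perfect fit
    return (rss0 - rss1) / (rss1 / (len(y) - 2))


def genome_scan(y, genotypes, w):
    """F statistic at every marker; genotypes is a list of per-marker 0/1 lists."""
    return [wls_f(y, g, w) for g in genotypes]


def permutation_threshold(y, genotypes, w, n_perm=1000, alpha=0.05, seed=1):
    """Empirical genome-wide threshold for this trait only: (1 - alpha) quantile of max F."""
    rng = random.Random(seed)
    pairs = list(zip(y, w))
    max_f = []
    for _ in range(n_perm):
        # Shuffle (value, weight) pairs together so each strain keeps its precision
        # while its association with the marker genotypes is broken.
        rng.shuffle(pairs)
        max_f.append(max(genome_scan([p[0] for p in pairs], genotypes,
                                     [p[1] for p in pairs])))
    max_f.sort()
    return max_f[min(int((1 - alpha) * n_perm), n_perm - 1)]


if __name__ == "__main__":
    # Hypothetical toy data: 6 strains, 2 markers, unbalanced replicate counts.
    genotypes = [[1, 1, 1, 0, 0, 0], [1, 0, 1, 0, 1, 0]]
    y = [5.1, 5.3, 4.9, 3.0, 3.2, 2.8]
    w = [strain_weight(n, 0.6) for n in [2, 1, 3, 2, 1, 4]]
    f_obs = genome_scan(y, genotypes, w)
    threshold = permutation_threshold(y, genotypes, w, n_perm=200)
    print("F per marker:", f_obs, "5% genome-wide threshold:", threshold)
```

Because every trait is scanned and permuted independently, traits can be distributed across worker processes (for instance with Python's `concurrent.futures.ProcessPoolExecutor`), which is one way to act on recommendation (6).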

Original language: English (US)
Pages (from-to): 2383-2393
Number of pages: 11
Journal: Bioinformatics
Volume: 21
Issue number: 10
DOI: 10.1093/bioinformatics/bti241
State: Published - May 15 2005

All Science Journal Classification (ASJC) codes

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

Carlborg, Ö., De Koning, D. J., Manly, K. F., Chesler, E., Williams, R., & Haley, C. S. (2005). Methodological aspects of the genetic dissection of gene expression. Bioinformatics, 21(10), 2383-2393. https://doi.org/10.1093/bioinformatics/bti241

@article{fb46e6bd196e42f4a3b14068d8cf2570,
title = "Methodological aspects of the genetic dissection of gene expression",
author = "{\"O} Carlborg and {De Koning}, {D. J.} and Manly, {K. F.} and E. Chesler and Robert Williams and Haley, {Chris S.}",
year = "2005",
month = "5",
day = "15",
doi = "10.1093/bioinformatics/bti241",
language = "English (US)",
volume = "21",
pages = "2383--2393",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "10",

}
