Approaches to reduce false positives and false negatives in the analysis of microarray data

Applications in type 1 diabetes research

Jian Wu, Nataliya I. Lenchik, Ivan Gerling

Research output: Contribution to journal, Article

11 Citations (Scopus)

Abstract

Background: As studies of molecular biology systems attempt to achieve a comprehensive understanding of a particular system, Type 2 errors (false negatives) may be a significant problem. However, few investigators are inclined to accept the increase in Type 1 errors (false positives) that may result when less stringent statistical cut-off values are used. To address this dilemma, we developed an analysis strategy that used a stringent statistical analysis to create a list of differentially expressed genes that served as "bait" to "fish out" other genes with similar patterns of expression. Results: Comparing two strains of mice (NOD and C57Bl/6), we identified 93 genes with statistically significant differences in their patterns of expression. Hierarchical clustering identified an additional 39 genes with similar patterns of expression differences between the two strains. Pathway analysis was then employed to: 1) identify the central genes and define biological processes that may be regulated by the genes identified, and 2) identify genes on the lists that could not be connected to each other in pathways (potential false positives). For networks created by both gene lists, the most connected (central) genes were interferon gamma (IFN-γ) and tumor necrosis factor alpha (TNF-α). These two cytokines are relevant to the biological differences between the two strains of mice. Furthermore, the network created by the list of 39 genes also suggested other biological differences between the strains. Conclusion: Taken together, these data demonstrate how stringent statistical analysis, combined with hierarchical clustering and pathway analysis, may offer deeper insight into the biological processes reflected in a set of expression array data. This approach allows us to "recapture" false negative genes that otherwise would have been missed by the statistical analysis.
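
The "bait and fish" strategy described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the t-test, the correlation distance, the average linkage, the function name `bait_and_fish`, and the parameters `alpha`, `n_clusters`, and `min_bait_fraction` are all assumptions chosen for illustration, and the expression matrix is made-up toy data. The pathway-analysis step is not covered.

```python
import numpy as np
from scipy import stats
from scipy.cluster.hierarchy import fcluster, linkage


def bait_and_fish(expr, group_a, group_b, alpha=0.001,
                  n_clusters=2, min_bait_fraction=0.5):
    """Return (bait_idx, fished_idx) for a genes x samples matrix.

    All parameter names and defaults are illustrative assumptions,
    not values taken from the paper.
    """
    expr = np.asarray(expr, dtype=float)

    # Step 1: a stringent per-gene two-sample t-test selects the "bait" genes.
    pvals = stats.ttest_ind(expr[:, group_a], expr[:, group_b], axis=1).pvalue
    bait = pvals < alpha

    # Step 2: average-linkage hierarchical clustering of all genes,
    # using correlation distance between expression profiles.
    tree = linkage(expr, method="average", metric="correlation")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")

    # Step 3: "fish out" non-bait genes from clusters in which at least
    # min_bait_fraction of the members are bait genes.
    fished = np.zeros(bait.shape, dtype=bool)
    for lab in np.unique(labels):
        members = labels == lab
        if bait[members].mean() >= min_bait_fraction:
            fished |= members & ~bait

    return np.flatnonzero(bait), np.flatnonzero(fished)


# Hypothetical toy data: 7 genes x 8 samples
# (samples 0-3 = strain A, samples 4-7 = strain B).
toy = np.array([
    [0.1, 0.1, -0.1, -0.1, 5.1, 5.1, 4.9, 4.9],   # strong difference, low noise
    [1.1, 0.9, 0.9, 1.1, 4.1, 3.9, 3.9, 4.1],     # strong difference, low noise
    [0.0, 0.1, -0.1, 0.0, 6.0, 6.1, 5.9, 6.0],    # strong difference, low noise
    [2.0, 2.0, -2.0, -2.0, 7.0, 7.0, 3.0, 3.0],   # same trend, noisy
    [2.0, -2.0, -2.0, 2.0, 7.0, 3.0, 3.0, 7.0],   # same trend, noisy
    [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0], # no strain difference
    [1.2, -1.0, 0.8, -1.0, 1.0, -1.1, 1.0, -0.9], # no strain difference
])

bait_idx, fished_idx = bait_and_fish(toy, [0, 1, 2, 3], [4, 5, 6, 7])
print("bait genes:", bait_idx)
print("recaptured genes:", fished_idx)
```

On this toy matrix, genes 0 to 2 pass the stringent test and act as bait, while the noisier genes 3 and 4 fail the test but cluster with the bait and are recaptured. Genes 5 and 6 show no strain difference and are left out. Requiring a minimum bait fraction per cluster is one possible rule for deciding which clusters to fish from; other rules may suit real data better.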

Original language: English (US)
Article number: S12
Journal: BMC Genomics
Volume: 9
Issue number: SUPPL. 2
DOI: 10.1186/1471-2164-9-S2-S12
State: Published - Sep 16 2008

Fingerprint

Microarray Analysis
Type 1 Diabetes Mellitus
Research
Genes
Biological Phenomena
Cluster Analysis
Inbred NOD Mouse
Interferon-gamma
Molecular Biology
Fishes
Tumor Necrosis Factor-alpha
Research Personnel
Cytokines

All Science Journal Classification (ASJC) codes

  • Biotechnology
  • Genetics

Cite this

Approaches to reduce false positives and false negatives in the analysis of microarray data : Applications in type 1 diabetes research. / Wu, Jian; Lenchik, Nataliya I.; Gerling, Ivan.

In: BMC Genomics, Vol. 9, No. SUPPL. 2, S12, 16.09.2008.

@article{164b9d61f7d744959eaf80b70deefb47,
title = "Approaches to reduce false positives and false negatives in the analysis of microarray data: Applications in type 1 diabetes research",
abstract = "Background: As studies of molecular biology systems attempt to achieve a comprehensive understanding of a particular system, Type 2 errors (false negatives) may be a significant problem. However, few investigators are inclined to accept the increase in Type 1 errors (false positives) that may result when less stringent statistical cut-off values are used. To address this dilemma, we developed an analysis strategy that used a stringent statistical analysis to create a list of differentially expressed genes that served as {"}bait{"} to {"}fish out{"} other genes with similar patterns of expression. Results: Comparing two strains of mice (NOD and C57Bl/6), we identified 93 genes with statistically significant differences in their patterns of expression. Hierarchical clustering identified an additional 39 genes with similar patterns of expression differences between the two strains. Pathway analysis was then employed to: 1) identify the central genes and define biological processes that may be regulated by the genes identified, and 2) identify genes on the lists that could not be connected to each other in pathways (potential false positives). For networks created by both gene lists, the most connected (central) genes were interferon gamma (IFN-γ) and tumor necrosis factor alpha (TNF-α). These two cytokines are relevant to the biological differences between the two strains of mice. Furthermore, the network created by the list of 39 genes also suggested other biological differences between the strains. Conclusion: Taken together, these data demonstrate how stringent statistical analysis, combined with hierarchical clustering and pathway analysis, may offer deeper insight into the biological processes reflected in a set of expression array data. This approach allows us to {"}recapture{"} false negative genes that otherwise would have been missed by the statistical analysis.",
author = "Jian Wu and Lenchik, {Nataliya I.} and Ivan Gerling",
year = "2008",
month = "9",
day = "16",
doi = "10.1186/1471-2164-9-S2-S12",
language = "English (US)",
volume = "9",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central",
number = "SUPPL. 2",
}

TY - JOUR

T1 - Approaches to reduce false positives and false negatives in the analysis of microarray data

T2 - Applications in type 1 diabetes research

AU - Wu, Jian

AU - Lenchik, Nataliya I.

AU - Gerling, Ivan

PY - 2008/9/16

Y1 - 2008/9/16

N2 - Background: As studies of molecular biology systems attempt to achieve a comprehensive understanding of a particular system, Type 2 errors (false negatives) may be a significant problem. However, few investigators are inclined to accept the increase in Type 1 errors (false positives) that may result when less stringent statistical cut-off values are used. To address this dilemma, we developed an analysis strategy that used a stringent statistical analysis to create a list of differentially expressed genes that served as "bait" to "fish out" other genes with similar patterns of expression. Results: Comparing two strains of mice (NOD and C57Bl/6), we identified 93 genes with statistically significant differences in their patterns of expression. Hierarchical clustering identified an additional 39 genes with similar patterns of expression differences between the two strains. Pathway analysis was then employed to: 1) identify the central genes and define biological processes that may be regulated by the genes identified, and 2) identify genes on the lists that could not be connected to each other in pathways (potential false positives). For networks created by both gene lists, the most connected (central) genes were interferon gamma (IFN-γ) and tumor necrosis factor alpha (TNF-α). These two cytokines are relevant to the biological differences between the two strains of mice. Furthermore, the network created by the list of 39 genes also suggested other biological differences between the strains. Conclusion: Taken together, these data demonstrate how stringent statistical analysis, combined with hierarchical clustering and pathway analysis, may offer deeper insight into the biological processes reflected in a set of expression array data. This approach allows us to "recapture" false negative genes that otherwise would have been missed by the statistical analysis.

AB - Background: As studies of molecular biology systems attempt to achieve a comprehensive understanding of a particular system, Type 2 errors (false negatives) may be a significant problem. However, few investigators are inclined to accept the increase in Type 1 errors (false positives) that may result when less stringent statistical cut-off values are used. To address this dilemma, we developed an analysis strategy that used a stringent statistical analysis to create a list of differentially expressed genes that served as "bait" to "fish out" other genes with similar patterns of expression. Results: Comparing two strains of mice (NOD and C57Bl/6), we identified 93 genes with statistically significant differences in their patterns of expression. Hierarchical clustering identified an additional 39 genes with similar patterns of expression differences between the two strains. Pathway analysis was then employed to: 1) identify the central genes and define biological processes that may be regulated by the genes identified, and 2) identify genes on the lists that could not be connected to each other in pathways (potential false positives). For networks created by both gene lists, the most connected (central) genes were interferon gamma (IFN-γ) and tumor necrosis factor alpha (TNF-α). These two cytokines are relevant to the biological differences between the two strains of mice. Furthermore, the network created by the list of 39 genes also suggested other biological differences between the strains. Conclusion: Taken together, these data demonstrate how stringent statistical analysis, combined with hierarchical clustering and pathway analysis, may offer deeper insight into the biological processes reflected in a set of expression array data. This approach allows us to "recapture" false negative genes that otherwise would have been missed by the statistical analysis.

UR - http://www.scopus.com/inward/record.url?scp=52249121573&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=52249121573&partnerID=8YFLogxK

U2 - 10.1186/1471-2164-9-S2-S12

DO - 10.1186/1471-2164-9-S2-S12

M3 - Article

VL - 9

JO - BMC Genomics

JF - BMC Genomics

SN - 1471-2164

IS - SUPPL. 2

M1 - S12

ER -