Machine Learning Data Imputation and Classification in a Multicohort Hypertension Clinical Study

Minority Health-Grid Network

Research output: Contribution to journal › Article

4 Citations (Scopus)

Abstract

Health-care initiatives are driving the development and use of clinical data for medical discovery and translational research. Machine learning tools built for Big Data have been applied to detect patterns in complex diseases. This study focuses on hypertension and examines phenotype data from a major clinical study, the Minority Health Genomics and Translational Research Repository Database, which comprises self-reported African American (AA) participants combined with related cohorts. Prior genome-wide association studies of hypertension in AAs presumed that the increased disease burden in susceptible populations is due to rare variants. However, genomic analyses of hypertension, even those designed to focus on rare variants, have yielded only marginal genome-wide results across many studies. Machine learning and other nonparametric statistical methods have recently been shown to uncover relationships among complex phenotypes, genotypes, and clinical data. We trained neural networks on phenotype data to impute missing values and thereby increase the usable size of a clinical data set. Validity was established by showing how using the expanded data set affected performance when associating phenotype variables with patients' case/control status. Data mining classification tools were used to generate association rules.
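
The abstract does not specify the network architecture, inputs, or training procedure used for imputation. Purely as an illustration of the general idea, the Python sketch below trains a small neural network on complete rows to predict one phenotype column from the others, then fills that column's missing entries. This is not the authors' method: the function name `impute_column_with_mlp`, the single tanh hidden layer, plain gradient descent, and all hyperparameters are hypothetical choices, and the sketch assumes the predictor columns are fully observed in the rows being imputed.

```python
import numpy as np


def impute_column_with_mlp(X, target_col, hidden=8, epochs=2000, lr=0.05, seed=0):
    """Hypothetical sketch: fill NaNs in X[:, target_col] with a one-hidden-layer
    network trained on rows where every column is observed.
    Rows whose predictor columns contain NaNs are left untouched."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float).copy()          # never modify the caller's array
    features = [c for c in range(X.shape[1]) if c != target_col]
    complete = ~np.isnan(X).any(axis=1)            # rows usable for training
    missing = np.isnan(X[:, target_col]) & ~np.isnan(X[:, features]).any(axis=1)

    A = X[complete][:, features]
    y = X[complete, target_col].reshape(-1, 1)

    # Standardize inputs and target using complete-case statistics.
    mu, sd = A.mean(axis=0), A.std(axis=0) + 1e-8
    ymu, ysd = y.mean(), y.std() + 1e-8
    A = (A - mu) / sd
    yz = (y - ymu) / ysd

    W1 = rng.normal(0.0, 0.5, (A.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)

    n = A.shape[0]
    for _ in range(epochs):
        H = np.tanh(A @ W1 + b1)                   # hidden activations
        pred = H @ W2 + b2                         # linear output layer
        err = pred - yz                            # residuals (MSE gradient up to a factor of 2)
        gW2 = H.T @ err / n
        gb2 = err.mean(axis=0)
        dH = (err @ W2.T) * (1.0 - H ** 2)         # backpropagate through tanh
        gW1 = A.T @ dH / n
        gb1 = dH.mean(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2

    if missing.any():
        Am = (X[missing][:, features] - mu) / sd
        Hm = np.tanh(Am @ W1 + b1)
        X[missing, target_col] = (Hm @ W2 + b2).ravel() * ysd + ymu  # undo scaling
    return X
```

In the spirit of the validation described above, one simple check is to hide known values, impute them, and compare the error against a naive baseline such as filling with the column mean; the study itself instead evaluated the expanded data set through downstream case/control association performance.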

Original language: English (US)
Pages (from-to): 43-54
Number of pages: 12
Journal: Bioinformatics and Biology Insights
Volume: 9s3
DOI: 10.4137/BBI.S29473
State: Published - Jan 1 2015

Fingerprint

Hypertension
Imputation
Learning systems
Machine Learning
Genes
Phenotype
Translational Medical Research
Association rules
Health care
Data mining
Statistical methods
Minority Health
Health
Neural networks
Data Mining
Genome-Wide Association Study
Genomics
African Americans
Genome
Genotype

All Science Journal Classification (ASJC) codes

  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Mathematics
  • Applied Mathematics

Cite this

Machine Learning Data Imputation and Classification in a Multicohort Hypertension Clinical Study. / Minority Health-Grid Network.

In: Bioinformatics and Biology Insights, Vol. 9s3, 01.01.2015, p. 43-54.

@article{52c6d7e66b354c94b1bb40815efbf112,
title = "Machine Learning Data Imputation and Classification in a Multicohort Hypertension Clinical Study",
abstract = "Health-care initiatives are pushing the development and utilization of clinical data for medical discovery and translational research studies. Machine learning tools implemented for Big Data have been applied to detect patterns in complex diseases. This study focuses on hypertension and examines phenotype data across a major clinical study called Minority Health Genomics and Translational Research Repository Database composed of self-reported African American (AA) participants combined with related cohorts. Prior genome-wide association studies for hypertension in AAs presumed that an increase of disease burden in susceptible populations is due to rare variants. But genomic analysis of hypertension, even those designed to focus on rare variants, has yielded marginal genome-wide results over many studies. Machine learning and other nonparametric statistical methods have recently been shown to uncover relationships in complex phenotypes, genotypes, and clinical data. We trained neural networks with phenotype data for missing-data imputation to increase the usable size of a clinical data set. Validity was established by showing performance effects using the expanded data set for the association of phenotype variables with case/control status of patients. Data mining classification tools were used to generate association rules.",
author = "{Minority Health-Grid Network} and William Seffens and Chad Evans and Taylor, {Herman A.} and Quarells, {Rakale Collins} and Arnett, {Donna K.} and Gibbons, {Gary H.} and Davis, {Robert L.} and Leal, {Suzanne M.} and Nickerson, {Deborah A.} and James Perkins and Rotimi, {Charles N.} and Robert Davis and Wilson, {James G.}",
year = "2015",
month = "1",
day = "1",
doi = "10.4137/BBI.S29473",
language = "English (US)",
volume = "9s3",
pages = "43--54",
journal = "Bioinformatics and Biology Insights",
issn = "1177-9322",
publisher = "Libertas Academica Ltd.",

}

TY - JOUR

T1 - Machine Learning Data Imputation and Classification in a Multicohort Hypertension Clinical Study

AU - Minority Health-Grid Network

AU - Seffens, William

AU - Evans, Chad

AU - Taylor, Herman A.

AU - Quarells, Rakale Collins

AU - Arnett, Donna K.

AU - Gibbons, Gary H.

AU - Davis, Robert L.

AU - Leal, Suzanne M.

AU - Nickerson, Deborah A.

AU - Perkins, James

AU - Rotimi, Charles N.

AU - Davis, Robert

AU - Wilson, James G.

PY - 2015/1/1

Y1 - 2015/1/1

N2 - Health-care initiatives are pushing the development and utilization of clinical data for medical discovery and translational research studies. Machine learning tools implemented for Big Data have been applied to detect patterns in complex diseases. This study focuses on hypertension and examines phenotype data across a major clinical study called Minority Health Genomics and Translational Research Repository Database composed of self-reported African American (AA) participants combined with related cohorts. Prior genome-wide association studies for hypertension in AAs presumed that an increase of disease burden in susceptible populations is due to rare variants. But genomic analysis of hypertension, even those designed to focus on rare variants, has yielded marginal genome-wide results over many studies. Machine learning and other nonparametric statistical methods have recently been shown to uncover relationships in complex phenotypes, genotypes, and clinical data. We trained neural networks with phenotype data for missing-data imputation to increase the usable size of a clinical data set. Validity was established by showing performance effects using the expanded data set for the association of phenotype variables with case/control status of patients. Data mining classification tools were used to generate association rules.

AB - Health-care initiatives are pushing the development and utilization of clinical data for medical discovery and translational research studies. Machine learning tools implemented for Big Data have been applied to detect patterns in complex diseases. This study focuses on hypertension and examines phenotype data across a major clinical study called Minority Health Genomics and Translational Research Repository Database composed of self-reported African American (AA) participants combined with related cohorts. Prior genome-wide association studies for hypertension in AAs presumed that an increase of disease burden in susceptible populations is due to rare variants. But genomic analysis of hypertension, even those designed to focus on rare variants, has yielded marginal genome-wide results over many studies. Machine learning and other nonparametric statistical methods have recently been shown to uncover relationships in complex phenotypes, genotypes, and clinical data. We trained neural networks with phenotype data for missing-data imputation to increase the usable size of a clinical data set. Validity was established by showing performance effects using the expanded data set for the association of phenotype variables with case/control status of patients. Data mining classification tools were used to generate association rules.

UR - http://www.scopus.com/inward/record.url?scp=85030831288&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85030831288&partnerID=8YFLogxK

U2 - 10.4137/BBI.S29473

DO - 10.4137/BBI.S29473

M3 - Article

VL - 9s3

SP - 43

EP - 54

JO - Bioinformatics and Biology Insights

JF - Bioinformatics and Biology Insights

SN - 1177-9322

ER -