Purposeful selection of variables in logistic regression

Zoran Bursac, Clinton Heath Gauss, David Keith Williams, David W. Hosmer

Research output: Contribution to journal › Article

847 Citations (Scopus)

Abstract

Background: The main problem in many model-building situations is to choose from a large set of covariates those that should be included in the "best" model. A decision to keep a variable in the model might be based on the clinical or statistical significance. There are several variable selection algorithms in existence. Those methods are mechanical and as such carry some limitations. Hosmer and Lemeshow describe a purposeful selection of covariates within which an analyst makes a variable selection decision at each step of the modeling process. Methods: In this paper we introduce an algorithm which automates that process. We conduct a simulation study to compare the performance of this algorithm with three well documented variable selection procedures in SAS PROC LOGISTIC: FORWARD, BACKWARD, and STEPWISE. Results: We show that the advantage of this approach is when the analyst is interested in risk factor modeling and not just prediction. In addition to significant covariates, this variable selection procedure has the capability of retaining important confounding variables, resulting potentially in a slightly richer model. Application of the macro is further illustrated with the Hosmer and Lemeshow Worcester Heart Attack Study (WHAS) data. Conclusion: If an analyst is in need of an algorithm that will help guide the retention of significant covariates as well as confounding ones they should consider this macro as an alternative tool.
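The abstract says the procedure can retain confounders but does not spell out the rule. One common way to operationalize this is a change-in-estimate check: refit the model without a candidate variable and see whether any remaining coefficient shifts by more than a chosen percentage. The Python sketch below illustrates only that check. It is not the authors' SAS macro; the function names, the 20% default threshold, and the coefficient values are all assumptions made for illustration.

```python
def percent_change(beta_full, beta_reduced):
    """Percent change in each shared coefficient, reduced model vs. full model."""
    return {
        name: 100.0 * (beta_reduced[name] - beta_full[name]) / beta_full[name]
        for name in beta_reduced
        if name in beta_full and beta_full[name] != 0.0
    }


def is_confounder(beta_full, beta_reduced, threshold_pct=20.0):
    """True if dropping the candidate shifts any remaining coefficient
    by more than threshold_pct (an assumed, analyst-chosen cutoff)."""
    changes = percent_change(beta_full, beta_reduced)
    return any(abs(change) > threshold_pct for change in changes.values())


# Hypothetical estimates: the full model includes bmi, the reduced model omits it.
full = {"age": 0.050, "smoker": 0.80, "bmi": 0.030}
reduced = {"age": 0.065, "smoker": 0.82}

# The age coefficient moves by about 30%, so bmi would be kept as a confounder.
print(is_confounder(full, reduced))
```

In practice the two coefficient dictionaries would come from fitted logistic regression models, and the threshold is a judgment call for the analyst.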

Original language: English (US)
Article number: 17
Journal: Source Code for Biology and Medicine
Volume: 3
DOI: 10.1186/1751-0473-3-17
State: Published - Dec 16 2008
Externally published: Yes

Fingerprint

Logistics
Logistic Models
Macros
Confounding Factors (Epidemiology)
Myocardial Infarction
Variable selection
Covariates
Logistic regression
Analysts
Confounding

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Computer Science Applications
  • Health Informatics
  • Information Systems and Management

Cite this

Purposeful selection of variables in logistic regression. / Bursac, Zoran; Gauss, Clinton Heath; Williams, David Keith; Hosmer, David W.

In: Source Code for Biology and Medicine, Vol. 3, 17, 16.12.2008.

Bursac, Zoran ; Gauss, Clinton Heath ; Williams, David Keith ; Hosmer, David W. / Purposeful selection of variables in logistic regression. In: Source Code for Biology and Medicine. 2008 ; Vol. 3.
@article{aec166d7f2a54184bf29936b3335294c,
title = "Purposeful selection of variables in logistic regression",
abstract = "Background: The main problem in many model-building situations is to choose from a large set of covariates those that should be included in the {"}best{"} model. A decision to keep a variable in the model might be based on the clinical or statistical significance. There are several variable selection algorithms in existence. Those methods are mechanical and as such carry some limitations. Hosmer and Lemeshow describe a purposeful selection of covariates within which an analyst makes a variable selection decision at each step of the modeling process. Methods: In this paper we introduce an algorithm which automates that process. We conduct a simulation study to compare the performance of this algorithm with three well documented variable selection procedures in SAS PROC LOGISTIC: FORWARD, BACKWARD, and STEPWISE. Results: We show that the advantage of this approach is when the analyst is interested in risk factor modeling and not just prediction. In addition to significant covariates, this variable selection procedure has the capability of retaining important confounding variables, resulting potentially in a slightly richer model. Application of the macro is further illustrated with the Hosmer and Lemeshow Worcester Heart Attack Study (WHAS) data. Conclusion: If an analyst is in need of an algorithm that will help guide the retention of significant covariates as well as confounding ones they should consider this macro as an alternative tool.",
author = "Zoran Bursac and Gauss, {Clinton Heath} and Williams, {David Keith} and Hosmer, {David W.}",
year = "2008",
month = "12",
day = "16",
doi = "10.1186/1751-0473-3-17",
language = "English (US)",
volume = "3",
journal = "Source Code for Biology and Medicine",
issn = "1751-0473",
publisher = "BioMed Central",

}

TY - JOUR

T1 - Purposeful selection of variables in logistic regression

AU - Bursac, Zoran

AU - Gauss, Clinton Heath

AU - Williams, David Keith

AU - Hosmer, David W.

PY - 2008/12/16

Y1 - 2008/12/16

N2 - Background: The main problem in many model-building situations is to choose from a large set of covariates those that should be included in the "best" model. A decision to keep a variable in the model might be based on the clinical or statistical significance. There are several variable selection algorithms in existence. Those methods are mechanical and as such carry some limitations. Hosmer and Lemeshow describe a purposeful selection of covariates within which an analyst makes a variable selection decision at each step of the modeling process. Methods: In this paper we introduce an algorithm which automates that process. We conduct a simulation study to compare the performance of this algorithm with three well documented variable selection procedures in SAS PROC LOGISTIC: FORWARD, BACKWARD, and STEPWISE. Results: We show that the advantage of this approach is when the analyst is interested in risk factor modeling and not just prediction. In addition to significant covariates, this variable selection procedure has the capability of retaining important confounding variables, resulting potentially in a slightly richer model. Application of the macro is further illustrated with the Hosmer and Lemeshow Worcester Heart Attack Study (WHAS) data. Conclusion: If an analyst is in need of an algorithm that will help guide the retention of significant covariates as well as confounding ones they should consider this macro as an alternative tool.

UR - http://www.scopus.com/inward/record.url?scp=61349120218&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=61349120218&partnerID=8YFLogxK

U2 - 10.1186/1751-0473-3-17

DO - 10.1186/1751-0473-3-17

M3 - Article

VL - 3

JO - Source Code for Biology and Medicine

JF - Source Code for Biology and Medicine

SN - 1751-0473

M1 - 17

ER -