Outliers and Robust Logistic Regression in Health Sciences

Authors

  • Francisco Cutanda Henríquez

Abstract

Logistic regression methods have many applications in the health sciences. There is a vast literature on the procedures to follow and on how to estimate the parameters from the observed values, and these methods are implemented in all the usual statistical packages. The estimators are of the “maximum likelihood” kind, i.e., they are the parameter values that make the observed data the most probable among all the models that could have been used. The good properties of maximum likelihood estimators are well established. However, practical circumstances may produce “outliers”, i.e., observed values that do not correspond to the logistic model assumed as a hypothesis. Occasionally, these anomalous observations have a strong effect on the fit and lead the study to the wrong conclusion. The causes of outliers depend on the particular study, but they include classification errors, observations (subjects) with special features that have not been taken into account, uncertainty in the measurement of some parameters, etc. The problem with maximum likelihood estimators is that they are not “robust”, i.e., their sensitivity to outliers can be arbitrarily large, and a minority of outliers can lead to a wrong logistic model. In this work, we show two cases illustrating the possible consequences, and we discuss the application of robust methods.
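The sensitivity described above can be illustrated with a minimal sketch. It is not taken from the paper: the data are synthetic, the sample size, true slope, and the five mislabeled high-leverage points are assumptions chosen only for illustration, and the helper names `log_likelihood` and `fit_logistic_ml` are hypothetical. The fit is a plain maximum likelihood fit by Newton-Raphson with step halving, not a robust method.

```python
import numpy as np


def log_likelihood(X, y, beta):
    """Bernoulli log-likelihood of a logistic model, computed stably."""
    eta = X @ beta
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))


def fit_logistic_ml(x, y, n_iter=50, tol=1e-10):
    """Maximum likelihood fit of P(y=1|x) = 1 / (1 + exp(-(b0 + b1*x)))."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)                        # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # observed information
        step = np.linalg.solve(hess, grad)
        # Halve the step until the log-likelihood does not decrease.
        t = 1.0
        current = log_likelihood(X, y, beta)
        while log_likelihood(X, y, beta + t * step) < current and t > 1e-8:
            t *= 0.5
        beta = beta + t * step
        if np.max(np.abs(t * step)) < tol:
            break
    return beta


# Synthetic "clean" data (assumption): 200 subjects, true intercept 0, true slope 1.5.
rng = np.random.default_rng(0)
x_clean = rng.uniform(-3.0, 3.0, size=200)
p_true = 1.0 / (1.0 + np.exp(-1.5 * x_clean))
y_clean = (rng.uniform(size=200) < p_true).astype(float)

# Contamination (assumption): 5 subjects with a large covariate value
# recorded in the wrong class, e.g. a classification error.
x_all = np.concatenate([x_clean, np.full(5, 8.0)])
y_all = np.concatenate([y_clean, np.zeros(5)])

beta_clean = fit_logistic_ml(x_clean, y_clean)
beta_contaminated = fit_logistic_ml(x_all, y_all)

print("slope without outliers:", beta_clean[1])
print("slope with 5 outliers: ", beta_contaminated[1])
```

In this sketch the contaminated sample contains fewer than 3% outliers, yet the estimated slope is pulled toward zero, which weakens the apparent association between the covariate and the outcome.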

Published

2009-01-26

Issue

Section

SPECIAL COLLABORATIONS