# PUBH-BIO 7220 Assignment 7

The Ohio State University College of Public Health PUBH-BIO 7220 Assignment 7
You may include output and code as an appendix at the end of your assignment.
When asked to run a test, please include ALL of the following:
1. The null and alternative hypotheses
2. The result, with its p-value
3. Whether you reject the null hypothesis
4. A one-sentence conclusion in non-statistical language
## Questions
I. Using the Cleveland data set, fit a logistic regression model with HD_diag as the response and
sex, trestbps, chol, and exang as the covariates. Include your output here.
A. Evaluate the goodness of fit of the model using both the Hosmer-Lemeshow test (based on
deciles of risk) and the Pearson chi-square statistic. Assess whether the results of the two
tests are consistent.
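
The sketch below shows in Python what the two statistics compute. It assumes you already have the observed 0/1 outcomes, the fitted probabilities, and each subject's covariate values from the fitted model. In Stata, these statistics typically come from `estat gof, group(10)` for Hosmer-Lemeshow and plain `estat gof` for the Pearson statistic. The function names are hypothetical. The equal-size split also ignores the special handling of tied probabilities that statistical packages apply, so its results may differ slightly from Stata's.

```python
import math
from collections import defaultdict


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees
    of freedom, via the closed form exp(-x/2) * sum_{i<df/2} (x/2)^i / i!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form only covers positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow statistic using `groups` groups of (nearly) equal
    size formed after sorting subjects by fitted probability.
    df = groups - 2 must be even for the p-value helper above."""
    n = len(y)
    order = sorted(range(n), key=lambda i: p[i])
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        observed = sum(y[i] for i in idx)   # events in group g
        expected = sum(p[i] for i in idx)   # size * mean fitted probability
        pbar = expected / len(idx)
        stat += (observed - expected) ** 2 / (len(idx) * pbar * (1 - pbar))
    df = groups - 2
    return stat, df, chi2_sf_even_df(stat, df)


def pearson_chi2(y, p, covariates):
    """Pearson chi-square summed over covariate patterns (subjects with
    identical covariate values share one fitted probability)."""
    cells = defaultdict(lambda: [0, 0, 0.0])  # pattern -> [m_j, y_j, pi_j]
    for yi, pi, xi in zip(y, p, covariates):
        cell = cells[tuple(xi)]
        cell[0] += 1
        cell[1] += yi
        cell[2] = pi
    stat = sum((yj - m * pj) ** 2 / (m * pj * (1 - pj))
               for m, yj, pj in cells.values())
    return stat, len(cells)  # statistic and number of covariate patterns J
```

The Pearson statistic is referred to a chi-square distribution with J - (p + 1) degrees of freedom, where p is the number of covariates. When almost every subject has a unique covariate pattern (likely here, since trestbps and chol are continuous), J is close to n and that approximation is poor. This is the usual motivation for grouping by deciles of risk instead.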
II. On the basis of the logistic model with sex, trestbps, chol, and thalach as
covariates, estimate the sensitivity and specificity of classifying subjects as having or not
having a heart disease diagnosis, using a cut-off value of 0.5 for the predicted probability of
heart disease.
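
The classification step can be sketched in Python as follows. It assumes `y` holds the observed 0/1 diagnoses and `p` the fitted probabilities. It also assumes a subject is classified as positive when p ≥ cut-off, which matches the rule reported in Stata's `estat classification` output; check this convention if you use other software. The function name and toy values are hypothetical.

```python
def sens_spec(y, p, cutoff=0.5):
    """Sensitivity and specificity when a subject is classified as having
    the outcome if its fitted probability is >= cutoff."""
    predicted = [pi >= cutoff for pi in p]
    tp = sum(1 for yi, d in zip(y, predicted) if yi == 1 and d)
    fn = sum(1 for yi, d in zip(y, predicted) if yi == 1 and not d)
    tn = sum(1 for yi, d in zip(y, predicted) if yi == 0 and not d)
    fp = sum(1 for yi, d in zip(y, predicted) if yi == 0 and d)
    return tp / (tp + fn), tn / (tn + fp)


# Toy illustration (hypothetical values, not the Cleveland data).
y = [1, 1, 1, 0, 0, 0]
p = [0.9, 0.6, 0.4, 0.55, 0.2, 0.1]
sensitivity, specificity = sens_spec(y, p)
print(f"sensitivity = {sensitivity:.3f}, specificity = {specificity:.3f}")
```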
A. Repeat the previous exercise using each of the cut-off points listed below and fill in the table.
Draw the ROC curve by hand using the values from the table.

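| Cut-off | Sensitivity | Specificity |
|---------|-------------|-------------|
| 0       |             |             |
| 0.1     |             |             |
| 0.2     |             |             |
| 0.3     |             |             |
| 0.4     |             |             |
| 0.6     |             |             |
| 0.8     |             |             |
| 1       |             |             |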
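
Each cut-off gives one point on the ROC curve: sensitivity on the vertical axis against 1 - specificity on the horizontal axis. The 0.5 point from the previous question can be added to these. A minimal Python sketch of filling in the table follows. It uses the same p ≥ cut-off convention as above, plus a hypothetical helper name and toy data.

```python
CUTOFFS = [0, 0.1, 0.2, 0.3, 0.4, 0.6, 0.8, 1]


def roc_table(y, p, cutoffs=CUTOFFS):
    """Return (cutoff, sensitivity, specificity, 1 - specificity) rows,
    classifying a subject as positive when its fitted probability >= cutoff."""
    n_cases = sum(1 for yi in y if yi == 1)
    n_controls = len(y) - n_cases
    rows = []
    for c in cutoffs:
        tp = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi >= c)
        tn = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi < c)
        sens, spec = tp / n_cases, tn / n_controls
        rows.append((c, sens, spec, 1 - spec))
    return rows


# Hypothetical toy data; replace with the observed outcomes and fitted
# probabilities from the model in Part II.
y = [1, 1, 1, 0, 0, 0]
p = [0.9, 0.6, 0.4, 0.55, 0.2, 0.1]
for c, sens, spec, fpr in roc_table(y, p):
    print(f"{c:>4}  sens={sens:.3f}  spec={spec:.3f}  1-spec={fpr:.3f}")
```

At cut-off 0, every subject is classified positive (sensitivity 1, specificity 0). At cut-off 1, essentially no one is (sensitivity 0, specificity 1). These two points anchor the corners of the curve.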

B. Use Stata to obtain the ROC curve. What is the discriminatory power of the model?
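
In Stata, `lroc` run after `logistic` draws the ROC curve and reports the area under it. The area under the empirical ROC curve equals a concordance probability: the proportion of (case, non-case) pairs in which the case has the higher fitted probability, with ties counted as one half. The Python sketch below, a hypothetical helper run on toy data, computes the area that way, and its result should agree with the empirical ROC area.

```python
def auc_concordance(y, p):
    """Area under the empirical ROC curve computed as the probability that a
    randomly chosen case has a higher fitted probability than a randomly
    chosen non-case (ties count as 1/2)."""
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    controls = [pi for yi, pi in zip(y, p) if yi == 0]
    score = 0.0
    for pc in cases:
        for pn in controls:
            if pc > pn:
                score += 1.0
            elif pc == pn:
                score += 0.5
    return score / (len(cases) * len(controls))


# Hypothetical toy data, not the Cleveland data.
y = [1, 1, 1, 0, 0, 0]
p = [0.9, 0.6, 0.4, 0.55, 0.2, 0.1]
print(f"AUC = {auc_concordance(y, p):.3f}")  # 8 of 9 pairs concordant
```

For interpretation, Hosmer and Lemeshow's *Applied Logistic Regression* gives a commonly cited guideline:
- an AUC of 0.5 means no discrimination;
- 0.7 to 0.8 is acceptable;
- 0.8 to 0.9 is excellent;
- 0.9 or above is outstanding.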
C. Suppose someone had fraudulently accessed your computer and altered the data of the
dependent variable in such a way that the coefficients of the model remained the same,
while the predicted probabilities of the outcome were largely affected.
What would happen to the goodness-of-fit statistics?
III. Fit a model with sex and fbs as covariates. Assess the overall fit of the model and its
discriminatory power by conducting the Hosmer-Lemeshow goodness-of-fit test and calculating
the area under the ROC curve.
A. Estimate the predicted probability of the outcome.
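
Assuming sex and fbs are coded 0/1, the predicted probability takes the form below, where the $\hat\beta$ are the fitted coefficients:

$$
\hat{\pi}(\text{sex}, \text{fbs}) = \frac{\exp(\hat{\beta}_0 + \hat{\beta}_1\,\text{sex} + \hat{\beta}_2\,\text{fbs})}{1 + \exp(\hat{\beta}_0 + \hat{\beta}_1\,\text{sex} + \hat{\beta}_2\,\text{fbs})}
$$

With only two binary covariates there are at most four covariate patterns, and therefore at most four distinct predicted probabilities. For example, for sex = 1 and fbs = 0 the formula reduces to $e^{\hat\beta_0+\hat\beta_1}/(1+e^{\hat\beta_0+\hat\beta_1})$. This also limits how finely the deciles-of-risk grouping can separate subjects.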
The data in hyponatremia.dta derive from an epidemiological study of hyponatremia (a
life-threatening condition) among runners in the 2002 Boston Marathon. Hyponatremia is an
electrolyte disturbance in which the serum sodium concentration is lower than normal (<135
mmol/l). The aim of the study was to determine whether a runner experienced hyponatremia and
to identify the principal risk factors. One or two days before the race, participants in the 2002
Boston Marathon completed a survey on demographic and anthropometric characteristics
(including BMI). After the race, runners provided a blood sample to measure their serum sodium
concentration and completed a questionnaire detailing their urine output during the race.
Pre-race and post-race weights were also recorded. Use the hyponatremia dataset for the
following exercises.
IV. Fit a logistic regression model with nas135 as the dependent variable and female and
urinat3p as covariates, and estimate the predicted probability of the outcome. Conduct
the Hosmer-Lemeshow goodness-of-fit test and calculate the area under the ROC curve.
V. Make a three-way frequency table of nas135, female, and urinat3p.
A. Open a new Stata session and create a dataset containing only these variables, in
aggregated form. Generate a variable named freq that holds the frequency of each cell in
the table. The new dataset will have 8 rows, one for each combination of nas135,
female, and urinat3p. Using freq as the frequency weight, run a logistic regression model
with nas135 as the dependent variable and female and urinat3p as covariates. Include your
output here.
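
Fitting with frequency weights (in Stata, something like `logistic nas135 female urinat3p [fweight=freq]`) maximizes the same likelihood as fitting one row per runner, so the estimates should match. The Python sketch below illustrates this with a small hand-written Newton-Raphson logistic fit. The counts are hypothetical placeholders, not the hyponatremia data, and `solve_linear`, `predict`, and `fit_logistic` are illustrative helper names.

```python
import math

# Hypothetical aggregated counts: (nas135, female, urinat3p, freq).
AGGREGATED = [
    (0, 0, 0, 100), (1, 0, 0, 10),
    (0, 1, 0, 80), (1, 1, 0, 20),
    (0, 0, 1, 50), (1, 0, 1, 3),
    (0, 1, 1, 40), (1, 1, 1, 5),
]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def predict(beta, female, urinat3p):
    """Fitted probability from logit(pi) = b0 + b1*female + b2*urinat3p."""
    eta = beta[0] + beta[1] * female + beta[2] * urinat3p
    return 1.0 / (1.0 + math.exp(-eta))


def fit_logistic(rows, iterations=25):
    """Newton-Raphson fit; each row is (y, female, urinat3p, frequency weight)."""
    beta = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        score = [0.0] * 3
        info = [[0.0] * 3 for _ in range(3)]
        for y, female, urinat3p, w in rows:
            x = (1.0, female, urinat3p)
            pi = predict(beta, female, urinat3p)
            for a in range(3):
                score[a] += w * (y - pi) * x[a]
                for k in range(3):
                    info[a][k] += w * pi * (1.0 - pi) * x[a] * x[k]
        step = solve_linear(info, score)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# The same data expanded to one row per runner, each with weight 1.
expanded = [(y, f, u, 1) for y, f, u, n in AGGREGATED for _ in range(n)]

beta_weighted = fit_logistic(AGGREGATED)
beta_expanded = fit_logistic(expanded)
for female in (0, 1):
    for urinat3p in (0, 1):
        print(female, urinat3p, round(predict(beta_weighted, female, urinat3p), 4))
```

At convergence, the fitted event counts reproduce the observed totals overall, among females, and among runners with urinat3p = 1 (these are the score equations). They need not match within each of the four cells, because the model has no female × urinat3p interaction.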
B. What are the estimated predicted probabilities of the outcome?
C. Compare the coefficients and estimated predicted probabilities of the outcome from this model
with those from the model fitted to the original dataset.
D. Alter the odds of the outcome for each of the 4 combinations of female and urinat3p:
create a new variable, named fakefreq, that takes the value 31 (nas135=1, female=0,
urinat3p=0), 45 (nas135=1, female=1, urinat3p=0), 6 (nas135=1, female=0,
urinat3p=1), and 6 (nas135=1, female=1, urinat3p=1). The total number of
observations in each of the 4 subgroups should not change; therefore, the frequencies for
the nas135=0 cells should change accordingly.
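
The bookkeeping for fakefreq can be sketched in Python as follows. For each (female, urinat3p) subgroup, the nas135=0 count is the subgroup total minus the new nas135=1 count. The observed counts are hypothetical placeholders, so substitute the freq values from your own table. `make_fakefreq` is an illustrative name.

```python
# Hypothetical observed counts keyed by (nas135, female, urinat3p);
# placeholders only, not the hyponatremia data.
FREQ = {
    (0, 0, 0): 100, (1, 0, 0): 10,
    (0, 1, 0): 80, (1, 1, 0): 20,
    (0, 0, 1): 50, (1, 0, 1): 3,
    (0, 1, 1): 40, (1, 1, 1): 5,
}

# nas135 = 1 counts required by part D, keyed by (female, urinat3p).
FAKE_CASES = {(0, 0): 31, (1, 0): 45, (0, 1): 6, (1, 1): 6}


def make_fakefreq(freq, fake_cases):
    """Replace the nas135=1 counts while keeping each subgroup total fixed."""
    fake = {}
    for (female, urinat3p), cases in fake_cases.items():
        total = freq[(0, female, urinat3p)] + freq[(1, female, urinat3p)]
        if cases > total:
            raise ValueError(f"subgroup {(female, urinat3p)} has only {total} runners")
        fake[(1, female, urinat3p)] = cases
        fake[(0, female, urinat3p)] = total - cases
    return fake


fakefreq = make_fakefreq(FREQ, FAKE_CASES)
for key in sorted(fakefreq):
    print(key, FREQ[key], "->", fakefreq[key])
```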
E. Fit a model with female and urinat3p as covariates, using fakefreq instead of freq as the
frequency weight. Compute the estimated probabilities of the outcome and compare them with
those estimated from the original data.
F. Conduct the Hosmer-Lemeshow goodness-of-fit test and calculate the area under the
ROC curve. Compare both statistics with those obtained from the original data.
VI. Using the original dataset, fit the model with runtime, wtdiff, bmi, and bmi2 as covariates,
where bmi2 = bmi*bmi. Compute the leverage h, the change in chi-square ΔX², the
change in deviance, and the influence diagnostic ∆