1. A document containing your answers to the questions, including any tables and graphs.

2. A Stata do-file containing the commands you used to obtain the answers.

Materials to be supplied: Data file.

Detailed Instructions:

Answer all questions. Feel free to use all course material, the textbook, etc. to complete this homework.

Submission is online: you must upload both files to the ELE. The submitted document must contain your answers to the questions, including any output, tables and graphs. Also attach the do-file containing the commands you used to obtain your answers. You must indicate which question each set of commands relates to. You may only submit once; it is not possible to alter your documents and re-submit at a later time.
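One possible way to indicate which question each command relates to is a labelled comment block per question. The skeleton below is only a sketch of such a layout, not a required template: the log file name `homework.log` is a hypothetical choice, and the placeholder comments must be replaced with your own commands.

```stata
* Sketch of a do-file layout (an assumption, not a prescribed format).
clear all
set more off
capture log close
log using "homework.log", replace text   // hypothetical log file name

* Load the data supplied on the ELE page
use "ref.dta", clear

*--------------------------------------------------
* Part I, Question 1
*--------------------------------------------------
* your commands for this question go here

*--------------------------------------------------
* Part I, Question 2
*--------------------------------------------------
* your commands for this question go here

* ... one labelled block per remaining question ...

log close
```

Keeping one block per question makes it straightforward to match each piece of output in the answer document to the commands that produced it.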

The data for this homework (ref.dta) are available on the ELE page.

Marking will follow the university’s assessment guidelines. Emphasis is placed on the correct interpretation of summary statistics, regression coefficients and hypothesis tests.

For graphical analysis, emphasis is placed on the conclusions drawn from a graph, as well as a mention of the limitations, if any.

Unless stated otherwise, assume a significance level of 5% (i.e. a 95% confidence level) for all statistical tests. Always state the null hypothesis and the alternative hypothesis of the test, as well as your decision at the stated level of significance.
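As a generic illustration of the expected reporting format, a two-sided test could be written as below. Here $\theta$ and $\theta_0$ are placeholder symbols (assumptions for illustration only) standing for whichever parameter and hypothesised value a given question concerns; a one-sided test would replace $\neq$ with $>$ or $<$.

$$
H_0: \theta = \theta_0 \quad \text{vs.} \quad H_1: \theta \neq \theta_0, \qquad \text{reject } H_0 \text{ if the } p\text{-value} < 0.05 .
$$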

Introduction

People in the U.K. voted in the EU referendum on 23 June 2016. Each voter had a choice between two options: Remain (the U.K. remains a part of the EU) and Leave (the U.K. leaves the EU). You are given a data set (ref.dta) which contains information on 267 cities across England to study the factors that may have driven voters’ decisions. A city is considered to have voted for Remain if the number of votes for Remain is larger than the number of votes for Leave; otherwise the city voted for Leave. The data contain the following variables for each city:

| Variable | Description |
| --- | --- |
| city | Name of the city |
| region | Geographical region of the city |
| pct_turnout | % of eligible voters who voted |
| remain | Number of votes for Remain |
| leave | Number of votes for Leave |
| rejected_ballots | Number of invalid votes |
| pct_remain | % of votes for Remain |
| pct_leave | % of votes for Leave |
| youngerpop | % of population aged 16-29 (Younger Population) |
| midpop | % of population aged 30-44 |
| olderpop | % of population aged 45+ (Older Population) |
| meanage | Mean age |
| medianage | Median age |
| totalpop | Total population |
| ukpop | Population of U.K. citizens |
| unemp | Unemployment rate |
| deprivationindex | A measure of income, education, health and crime (a higher number implies more deprivation) |
| immipop | % of immigrant population |
| vote | =1 if the city has more votes for Leave than for Remain; 0 otherwise |
| turnout | =1 if pct_turnout is larger than 77; 0 otherwise |
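The definitions of the two indicator variables can be checked against the raw columns before starting the analysis. The lines below are a minimal sketch of such a check. They assume that the turnout variable is stored under the name `pct_turnout` (Stata variable names cannot contain spaces); run `describe` first and adjust the name if the data file uses a different one.

```stata
use "ref.dta", clear
describe

* vote should be 1 exactly when Leave received more votes than Remain.
* Missing values are excluded because Stata treats them as larger than any number.
assert vote == (leave > remain) if !missing(leave, remain)

* turnout should be 1 exactly when turnout exceeds 77%.
* pct_turnout is an assumed variable name; check it against the describe output.
assert turnout == (pct_turnout > 77) if !missing(pct_turnout)
```

If either `assert` stops with an error, the stored indicator does not match its stated definition for at least one city, which is worth noting before any question relies on it.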

I Basic Descriptive Statistics [35 marks]

1. [3 marks] How many cities voted for Remain, i.e. received a larger number of votes for Remain than for Leave?

2. [5 marks] What fraction of cities that voted for Leave have a lower than average unemployment rate and also have a lower than average deprivation index? Interpret your answer.

3. [8 marks] Plot a graph of your choice to compare the distribution of the unemployment rate between cities that voted for Remain and cities that voted for Leave. Interpret the graph.

4. [7 marks] How many geographical regions in the data have a mean unemployment rate larger than their median unemployment rate? Interpret your answer.

5. [6 marks] Calculate the total number of invalid votes for each geographical region.

6. [6 marks] Claim: On average, cities that voted for Leave (vote=1) have a higher % of eligible voters who voted (pct_turnout) than cities that voted for Remain (vote=0). True or false?

II Linear Regression Analysis [65 marks]

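$$
\text{pct\_leave} = \beta_0 + \beta_1\,\text{immipop} + \beta_2\,\text{deprivationindex} + \beta_3\,\text{olderpop} + \beta_4\,(\text{olderpop} \times \text{unemp}) + \text{error} \tag{1}
$$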

1. [10 marks] Estimate a linear regression model with pct_leave as the dependent variable and immipop, deprivationindex, olderpop and (olderpop × unemp) as the independent variables (as shown in equation (1)). Interpret the meaning of the coefficients on immipop and (olderpop × unemp).

2. [10 marks] Plot a graph of your choice to comment on whether the residuals of the regression model in equation (1) are normally distributed. Conduct a test to confirm your findings.

3. [10 marks] What is the average elasticity of pct_leave with respect to immipop in the regression model in equation (1)? Construct a 95% confidence interval around the average elasticity.

4. [12 marks] Use the regression model in equation (1) to predict the outcome of voting in Edinburgh City. The characteristics of Edinburgh City are: olderpop=38.2%; unemp=5.4%; deprivationindex=28,000; and immipop=15.8%. Based on the prediction, did Edinburgh City vote for Leave?

5. [8 marks] Draw a scatterplot of the residuals from the regression model in equation (1) against deprivationindex. Discuss what the plot shows.

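$$
\text{pct\_leave} = \beta_0 + \beta_1\,\text{immipop} + \beta_2\,\text{deprivationindex} + \beta_3\,\text{olderpop} + \beta_4\,(\text{olderpop} \times \text{unemp}) + \beta_5\,\text{olderpop}^2 + \text{error} \tag{2}
$$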

6. [15 marks] Estimate the regression model in equation (2). Answer the following questions based on this new regression model:

(a) [5 marks] In terms of goodness-of-fit, is the new regression model in equation (2) preferred over the model in equation (1)?

(b) [10 marks] Suppose the marginal effect of olderpop on pct_leave is zero in a city with an unemployment rate of 7%. What is the percentage of older population in this city?

