POLI 380 (002) 201819 Assignment 2

There are SIX questions for a total of 30 points. The points for each question are indicated in brackets after the question title.

Do not paste raw Stata output into any of your answers to give us a string of numbers to look at. Even if it looks OK when you paste it, it doesn’t look good when we grade it. Work the numbers into sentences when asked to do so.

You may work on the assignment with other students, but all answers must be written up individually. Answers that are substantially identical to those of another student will be treated as plagiarism.

This assignment is going to use THREE different datasets!

  1. The first is the same as last time: your individual sample of the Census Microdata File.
  2. For Questions 3 – 5, you will use the 2015 Canadian National Election Study. It is posted on Canvas under /modules/course materials, in the ‘course materials’ section.
  3. For Question 6, use the “Quality of Government” dataset. It is in the same place as the other data.

After you’ve written the answers in a document that you save for yourself, submit the answers in the appropriate question box on Canvas.

Q1: Proportion mean & CI using census data. [2pts]

(1 point) a. Use your census dataset sample to estimate the NUMBER (not the percentage) of people living in Canada who were born outside of the country. Assume the total population of Canada is exactly 35 million. (You have no other source to help you estimate the number, just your sample).

(1 point) b. Indicate how far you would expect your estimate to be from the true number of people born outside Canada in the full population (the number, not the percentage), 19 times out of 20.
State it as: ± ______ people (NOT ± %). You can calculate this using the formula for the standard error of a proportion, and then convert that result into a number of Canadians, as you did in part a.

In Canvas, just enter one number for a) and another number for b). For this question you do not need to ‘write up’ the answers.
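The arithmetic for Q1 can be sketched in Stata as below. This is only an illustration of the steps, not a worked answer: the proportion and the sample size are made-up placeholders, and the multiplier 1.96 is the usual value for a "19 times out of 20" (95%) margin. Only the population figure of 35 million comes from the question.

```stata
* Illustration only: p and n are made-up values, not results from any sample.
local p = 0.25
local n = 2000
* Total population as given in the question.
local N = 35000000

* Part a: sample proportion scaled up to the population.
display "Estimated number: " %12.0fc (`p' * `N')

* Part b: standard error of a proportion, then the 95% margin in people.
local se = sqrt(`p' * (1 - `p') / `n')
display "Margin of error (people): " %12.0fc (1.96 * `se' * `N')
```

Replace the two placeholder values with the proportion and the number of cases from your own census sample.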

Q2. Means from the census [5 pts]

Find the mean ‘total income’ of people born inside and outside Canada. Restrict your analysis to those aged 25-65. Report the means in a smoothly worded paragraph that summarizes the findings for a reader.
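One possible way to get the two means is sketched below. The variable names are hypothetical (totinc for total income, age for age in years, bornout coded 1 for born outside Canada and 0 for born in Canada); check the names and codes in your own census file. Income variables may also carry special codes for "not applicable" that need to be set to missing first.

```stata
* Hypothetical variable names: totinc, age, bornout. Check your codebook.
* Restrict to ages 25-65 and report the mean, spread, and number of cases
* separately for each birthplace group.
tabstat totinc if inrange(age, 25, 65), by(bornout) statistics(mean sd n)
```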

Questions 3 to 5: Canadian National Election Study.

Now switch to the 2015 Canadian National Election Study.


Draw a random sample of 3,500 cases from the dataset. That way you will all get different samples that I can have my computer replicate.

First, set the random number seed by typing: set seed courseidnumber. Replace ‘courseidnumber’ with your course ID number (the same number as on your census dataset), NOT your real student number.

Use the command sample: sample 3500, count.

If you do not include “count” in your command, Stata thinks you want 3500% of your sample and won’t be able to do anything.

Now use the separate command count to double-check that you have 3500 cases to work with.

Stata should simply report: 3500. (If it’s close, it’s ok).
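Put together, the sampling steps above look roughly like this. The seed 12345 is a placeholder, not a real course ID, and the sketch assumes the election study data are already open in Stata.

```stata
* Assumes the 2015 election study data are already open in Stata.
* 12345 is a placeholder: use your own course ID number instead.
set seed 12345

* Keep a random sample of exactly 3500 cases (count = number, not percent).
sample 3500, count

* Confirm how many cases remain in memory.
count
```

Because sample drops the cases it does not select, run these commands once on a freshly opened copy of the data.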

Q3. Attitudes toward marginalized groups [10 points]

We are going to look at Canadians’ attitudes toward marginalized groups. To start, find the variables that measure survey respondents’ evaluations of the following groups on the standard 0-100 thermometer scale: Aboriginals, Muslims who live here, gays & lesbians, and visible minorities (see footnote 1). NOTE: use the variables with the prefix p_pos_; do not use the variables with the prefix p_like_. Being careful about recoding, create an index variable that indicates a respondent’s average rating of these groups. The index variable should range from 0 to 100.

Next, find the variable that indicates respondents’ province of residence.

In a single, concise, and engaging paragraph, summarize the distribution of the ‘feelings toward marginalized people index’ for a) all survey respondents, b) those who live in Quebec, and c) those who live in BC. Be sure to provide information about how these concepts were measured as well as descriptive information about these distributions (i.e., mean, range, information about spread, information about shape of the distributions).

Your audience is someone reading a newspaper article or op-ed. You’ll need to explain, in general terms, what the index variable measures and what different values/scores mean.
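A minimal sketch of building and describing such an index follows. Everything after the p_pos_ prefix in the variable names is hypothetical, as are the names margindex and province, and the recode assumes that any value outside 0-100 is a non-response code. Confirm all of this against the codebook before relying on it.

```stata
* Hypothetical names: only the p_pos_ prefix comes from the assignment.
local therms p_pos_abor p_pos_musl p_pos_gay p_pos_vismin

* Assumption: values outside 0-100 are non-response codes, not ratings.
foreach v of local therms {
    replace `v' = . if !inrange(`v', 0, 100)
}

* Row mean of the four ratings, so the index stays on the 0-100 scale.
egen margindex = rowmean(`therms')

* Mean, standard deviation, percentiles, and skewness for everyone,
* then separately for each province (province is a hypothetical name).
summarize margindex, detail
bysort province: summarize margindex, detail
```

Note that rowmean() averages whatever ratings are not missing, so a respondent who rated only some of the groups still gets an index score. Whether to keep those respondents is a judgment call you should be able to defend.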

Q4: Partisanship, perceptions of politicians, and crosstabs [5 points]

What is the relationship between beliefs about voting as a choice or a duty and intentions to vote? Start with the variable vote_duty. Exclude people who ‘refused’ to answer the question from your analysis. For the variable lklytovote, exclude those who said don’t know, those who refused, and those coded 1000. Recode those who already voted to be in the ‘certain to vote’ category.

Now run a crosstab telling us how responses to the ‘likely to vote’ variable are distributed within categories of the ‘duty’ question. Be careful about looking at column or row percentages. Report results of this crosstab in a clear and compelling paragraph. Use some of the results from the table, but not all, when explaining what you found. You do not need to run a chi-square test, or report the p-value, for this question.
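The recoding and crosstab steps might be sketched as follows. The numeric codes 99, 98, 5, and 1 and the new variable names duty and likely are hypothetical; only the code 1000 and the names vote_duty and lklytovote come from the assignment. Read the real codes off the first two tables before recoding.

```stata
* Show numeric codes next to the value labels so recodes can be checked.
numlabel _all, add
tabulate vote_duty, missing
tabulate lklytovote, missing

* Hypothetical codes below: replace them with the codes shown above.
* Work on copies so the original variables stay untouched.
clonevar duty = vote_duty
replace duty = . if duty == 99

clonevar likely = lklytovote
replace likely = . if inlist(likely, 98, 99, 1000)
* Move 'already voted' (here 5) into 'certain to vote' (here 1).
replace likely = 1 if likely == 5

* With duty as the column variable, column percentages show how
* 'likely' is distributed within each category of 'duty'.
tabulate likely duty, column
```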

Q5: Interpreting a p-value [3pts]

Create a binary variable that indicates whether people are younger than 35 years old (so 18-34).

Rerun the crosstab you did for Q4, but only among those under 35. When you run the crosstab this time, be sure to add the “, chi” option to get a p-value. In a single sentence, report and interpret the p-value for an intelligent reader who knows little about statistics. Your answer should only talk about the p-value and what it means. Don’t discuss the percentages present in your crosstab for this question.
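A sketch of these two steps is below. It assumes a hypothetical variable named age that records age in years (if the file records year of birth instead, age has to be computed first), and it uses likely and duty as hypothetical names for cleaned versions of lklytovote and vote_duty.

```stata
* Hypothetical variable name: age, in years.
* The if-condition keeps respondents with missing age out of both groups.
generate under35 = age < 35 if !missing(age)

* Same crosstab as before, restricted to under-35s, with a chi-square test.
tabulate likely duty if under35 == 1, column chi2
```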

Q6: Difference of means [5 pts]

Use the Quality of Government dataset. Run a t-test to assess the difference in mean “proportion of seats held by women” between countries with proportional and majoritarian electoral systems. Use wdi_wip to measure women in parliament and gol_est to measure electoral system (note: exclude countries with ‘mixed’ electoral systems). Run the t-test only on countries that are clear democracies (specifically, countries with a p_democ score of 8, 9, or 10). Report results from the t-test in a smoothly worded paragraph that explains your findings. Be sure to provide readers with sufficient information (imagine they know nothing about the dataset or the variables) to understand your results. Talk about whether or not there is a relationship, and its size and direction. Also, be sure to interpret the p-value.
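One way to set up this test is sketched below. It assumes that gol_est codes ‘mixed’ systems as 3, which you should confirm from the first table before running the test. The variable names wdi_wip, gol_est, and p_democ are the ones given in the question.

```stata
* Check how the electoral system categories are coded.
tabulate gol_est

* Assumption: 'mixed' is coded 3. Keep clear democracies (p_democ 8-10)
* and compare mean wdi_wip across the two remaining system types.
ttest wdi_wip if inrange(p_democ, 8, 10) & gol_est != 3 & !missing(gol_est), by(gol_est)
```

The by() option needs exactly two groups, so the test will stop with an error if more than two categories of gol_est remain after the exclusions.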

Footnote 1: The labels for these groups are subjective and contested. Many may also contest the phrasing or use of capitalization. I’ve used labels that match the specific questions asked of survey respondents. Deciding which groups to include in an index of marginalized groups is, of course, grounds for reasonable debate. Some people may think that other groups mentioned in survey questions are marginalized. For instance, the survey also asked about: feminists, francophones, and immigrants. Different people might assert that some or all of the groups listed here or in the question are or aren’t marginalized. As you know, defining your concepts is an important part of social science research and different definitions and operationalizations can lead to different results. For the purpose of the assignment, where the focus is on the mechanics of social science research, we will use the groups I listed above. The choice of groups represents my understanding of a mainstream, and perhaps conservative, conception of marginalization. If you feel strongly that the index I’ve asked you to create lacks validity, I’d be more than happy to hear from you by email, after class, or during my office hours.
