Friday, August 5, 2016

Weighting the Survey Data

There's a new Georgia poll out that shows Hillary Clinton 4 percentage points ahead of Donald Trump, just within the margin of error but signaling that the state is, indeed, in play.

Election surveys are a mix of art and science. The art is in the question wording and the special sauce that goes into deciding who is a likely voter. The science is how to weight the data to reflect the population you're trying to describe. Simply put, survey responses are usually off to some degree -- too many women, too few young people, not enough African-Americans, etc. In Georgia, surveyors can look at the demographics of previous elections and "weight" their results to reflect the population and those who actually vote.
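For the curious, here's a minimal sketch of the basic idea, often called post-stratification. Every number in it is hypothetical, made up only to show the mechanics, and the function name poststrat_weights is mine. Real polling shops typically weight on several variables at once, so treat this as an illustration and not as how any particular firm does it.

```python
# Toy post-stratification example. ALL numbers are hypothetical.

def poststrat_weights(sample_counts, target_shares):
    """Weight for each group = target share / share in the sample."""
    n = sum(sample_counts.values())
    return {g: target_shares[g] / (sample_counts[g] / n) for g in sample_counts}

# Hypothetical sample: too many women compared with the electorate.
sample_counts = {"women": 600, "men": 400}
# Hypothetical electorate, e.g. taken from previous turnout records.
target_shares = {"women": 0.55, "men": 0.45}
# Hypothetical number in each group backing some candidate.
supporters = {"women": 300, "men": 160}

weights = poststrat_weights(sample_counts, target_shares)

# Unweighted estimate: every respondent counts the same.
raw_support = sum(supporters.values()) / sum(sample_counts.values())

# Weighted estimate: each respondent counts as much as the group's weight.
weighted_support = (
    sum(weights[g] * supporters[g] for g in supporters)
    / sum(weights[g] * sample_counts[g] for g in sample_counts)
)

print(f"raw: {raw_support:.3f}, weighted: {weighted_support:.3f}")
```

In this toy case the overrepresented group (women) gets a weight below 1 and the underrepresented group (men) gets a weight above 1, which nudges the candidate's support a bit lower than the raw number.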

Here are the crosstabs of a new AJC poll. Most folks don't bother with this stuff, but you'll find interesting tidbits. See the two rows called Unweighted Base and Total Respondents? Unweighted Base reflects the raw number of respondents, in this case 847. Total Respondents reflects (I'm guessing) the weighted number, 767. It's possible the total reflects "likely" versus "unlikely" voters, but I can find no evidence of that, so I'm gonna assume I'm reading this correctly until told otherwise.

Check this out:
  • Unweighted number of young respondents (ages 18-39): 165, or 19.5 percent of total unweighted respondents.
  • Weighted number of young respondents: 248, or 32.3 percent of total.
How did they invent 83 respondents? They didn't. Again, if I'm reading these xtabs correctly, they've weighted the data to make up for young people who didn't respond to the poll but who do participate in elections. (We can argue about 18-39 being "young" another day.) That means each "young" person's opinion counts a little more in the final calculation. When that happens, someone loses, probably older respondents. In this survey, those 65 and older made up 29.9 percent of the unweighted base, but only 19.6 percent of the "Total Respondents." We see the same upward adjustment for black respondents, though not as dramatic: 26.2 percent of raw respondents in the survey but 30.8 percent in the "Total Respondents" category, meaning they got weighted slightly higher and whites slightly lower.
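To put a rough number on "a little heavier," you can divide each group's weighted share by its unweighted share. Here's a short sketch using only the figures quoted above. It assumes, as I do throughout, that Total Respondents is the weighted total. The ratio is just my back-of-the-envelope measure, not the pollster's actual weighting procedure, which the crosstabs don't spell out.

```python
# Figures quoted from the AJC poll crosstabs.
UNWEIGHTED_BASE = 847
TOTAL_RESPONDENTS = 767  # ASSUMPTION: this row is the weighted total

def share(count, total):
    """Percentage of the total that a group makes up."""
    return 100.0 * count / total

def weight_ratio(weighted_pct, unweighted_pct):
    """Above 1 means the group was weighted up; below 1, weighted down."""
    return weighted_pct / unweighted_pct

young_raw = share(165, UNWEIGHTED_BASE)    # about 19.5 percent
young_wtd = share(248, TOTAL_RESPONDENTS)  # about 32.3 percent

ratios = {
    "ages 18-39": weight_ratio(young_wtd, young_raw),
    "ages 65+": weight_ratio(19.6, 29.9),
    "black": weight_ratio(30.8, 26.2),
}

for group, ratio in ratios.items():
    print(f"{group}: {ratio:.2f}")
```

By this rough measure, a young respondent counts about 1.66 times as much as in the raw data, a respondent 65 or older about 0.66 times, and a black respondent about 1.18 times.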

What's this all mean? In Georgia, not much, as we have good socio-demographics collected at the voting booth, giving us a solid baseline of who voted in previous elections. Some states do this, some don't. Often we have to guess who the "likely voters" are, which is the special sauce I mentioned above. Different polling shops have slightly different methods, and sometimes their methods are so off that they end up badly projecting an election.

By the way, the weighting didn't seriously affect party identification. For example, 29.0 percent were Democrats in the raw data and 28.8 percent in the weighted data.

Also, I can't see in the AJC story how the survey was conducted. Landlines? Cells? Smoke signals? Live callers, or robocalls? Given the firm they use, real pros, I'm assuming a quality survey.
