Tuesday, September 27, 2016

Weird Data

In a dataset of all U.S. colleges and universities, I sorted by the column for undergraduate enrollment. A single school popped up with only one enrolled student: the Institute for Advanced Medical Esthetics in Richmond, Virginia. No, I did not make up that name. Click on the link and see for yourself. The number makes no sense to me; take it as a lesson in doubting data just as you'd scrutinize any source. Maybe there really is only one student, but that seems unlikely. The figure is based on 2013 data, and perhaps the school was just getting started.
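A quick sketch of the idea in Python. The rows here are a toy stand-in, not the real file -- the enrollment figures are the ones mentioned in this post, plus one school I made up:

```python
# Toy stand-in rows; the real dataset has thousands of schools.
# "Typical State U." and its number are my invention.
schools = [
    ("Typical State U.", 12000),
    ("University of Phoenix--Online", 166816),
    ("Institute for Advanced Medical Esthetics", 1),
    ("Ivy Tech Community College", 87017),
]

# Sort by enrollment and eyeball both tails -- outliers live at the extremes.
by_size = sorted(schools, key=lambda s: s[1])
print(by_size[0])   # the one-student oddity
print(by_size[-1])  # the online behemoth
```

Sorting and checking both ends of a column is about the cheapest sanity check there is.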

(By the way, the largest listed, with 166,816 students, was no surprise -- University of Phoenix--Online campus. Next was Ivy Tech Community College in Indiana with 87,017 students. Wow.)

Post-Debate SLOPs

A SLOP is a self-selected opinion poll, usually conducted online at some web site, and is an excellent example of a crappy poll based on a biased sample. The only people who participate are those who frequent the site -- and if it's partisan, you know which direction they'll lean -- and who care enough to take part. A real sample is random, or as close as we can get it, with everyone having a more-or-less equal chance of being included.
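If you want to see why self-selection wrecks a poll, here's a toy Python simulation. Every number in it is hypothetical, just to make the point:

```python
import random

random.seed(0)

# Hypothetical population: 52% support candidate A, 48% candidate B.
population = ["A"] * 5200 + ["B"] * 4800

# Random sample: everyone has an equal chance of inclusion.
random_sample = random.sample(population, 500)

# Self-selected "sample": imagine a partisan site where B fans are
# five times as likely to bother clicking the poll.
slop_sample = [v for v in population
               if random.random() < (0.25 if v == "B" else 0.05)]

def pct_a(sample):
    return 100 * sample.count("A") / len(sample)

print(round(pct_a(random_sample)))  # should land near 52
print(round(pct_a(slop_sample)))    # way off, because of who showed up
```

The random sample recovers the population split; the SLOP mostly measures who bothered to click.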

That brings us to last night's first presidential debate and the subsequent polls -- legitimate, and less so -- that quickly emerged.

CNN/ORC is usually first with these snap polls and it showed Clinton "won" the debate. OK, no surprise there to the casual viewer, especially as Trump unraveled a bit at the end. But what also emerged was a reliance by Trump supporters on a host of SLOPs that showed -- shocker -- Trump won. See a tweet below.


Does it shock anyone, for example, that folks who visit Drudge thought that Trump overwhelmingly won? Or Breitbart? Some more legitimate sites like CNBC, Time, and Fortune have it closer, but these are still SLOPs. They may measure enthusiasm, or a fanbase likely to bother voting on such things, but they don't, by any stretch of the imagination, measure public opinion about who "won" a debate.

Never pay attention to SLOPs. And news orgs should never use them and, if they do for the hell of it, should never report on them as being meaningful.

Was the CNN/ORC poll biased? It does include more Dems than Republicans, but it reflects the population it's trying to describe -- people who reported watching the debate. Maybe more Trump fans were watching NFL football. I dunno. A later poll, using real methodology, also found Clinton won (though not by quite as big a margin). In other words, real polls find Clinton won. Polls with absolutely no methodological rigor find Trump won. You decide which to believe.


Wednesday, September 21, 2016

Earnings and Repaying Loans

Another day of playing with data, this time looking at how Georgia colleges rank in terms of earnings and the percent of graduates who repay their loans. No surprise, the two lists are similar. In other words, the more graduates make from a school, the better that school's loan repayment numbers. But that's not always the case. Below I provide the Top 10 in Georgia by earnings and, in the column next to it, each school's ranking in loan repayment. A few jump out at you. Mercer is 6th in earnings, but 20th in repayment. Spelman is 8th in cashola made, but 33rd in paying that cashola back. Same with Shorter (10th and 38th).


College or University     Rank in Earnings   Rank in Repayment
Georgia Tech                      1                  1
Emory                             2                  2
Southern Polytechnic              3                 14
Creative Circus                   4                 10
Portfolio Center                  5                  6
Mercer                            6                 20
UGA                               7                  4
Spelman                           8                 33
Oglethorpe                        9                 16
Shorter                          10                 38
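If you want a single number for how similar the two rankings are, Spearman's rank correlation does the job. Here's a sketch using the Top 10 figures from this post -- the repayment numbers are statewide ranks, so I re-rank them within the ten first:

```python
# Earnings rank (1-10) and each school's statewide repayment rank, per the table.
schools = ["Georgia Tech", "Emory", "Southern Polytechnic", "Creative Circus",
           "Portfolio Center", "Mercer", "UGA", "Spelman", "Oglethorpe", "Shorter"]
repayment = [1, 2, 14, 10, 6, 20, 4, 33, 16, 38]

# Re-rank the repayment numbers within the Top 10 (1 = best).
order = sorted(range(len(repayment)), key=lambda i: repayment[i])
rrank = [0] * len(repayment)
for rank, i in enumerate(order, start=1):
    rrank[i] = rank

# Spearman's formula (no ties here): rho = 1 - 6*sum(d^2) / (n*(n^2 - 1)).
n = len(repayment)
d2 = sum((e - r) ** 2 for e, r in zip(range(1, n + 1), rrank))
rho = 1 - 6 * d2 / (n * (n * n - 1))
print(round(rho, 2))  # → 0.78
```

A rho of about 0.78 says the lists mostly agree, with Mercer, Spelman, and Shorter doing the disagreeing.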




Tuesday, September 20, 2016

College Data

I've just started playing with a large set of university data. So much data, so little time. Lemme give you a couple of highlights from some quick-and-dirty analyses. There are three columns that contain the average debt of students from low (below $30k), middle ($30-75k), and high (over $75k) income families. I found a few surprises, if I'm reading the numbers correctly.

Let's start with the high income college debt. The leading school is Albany State ($25,000 average debt), followed by SCAD ($21,500) and Georgia Tech ($21,000). Emory and Mercer come in next. UGA, where I teach, is 10th at $17,250. Next is middle income college debt. Leading the way is Morehouse ($27,000), followed by Shorter ($26,000), Spelman ($25,300) and Albany State ($25,187). UGA is 22nd at $17,000. Finally, those lower income families. Spelman leads the list ($25,000), followed by Morehouse ($26,675), Shorter ($25,563) and Albany State ($23,637). Oh, UGA is 27th at $16,224.

So what can we take away from this? There are a lot of consistencies in the list. Albany State, for example. And Morehouse.

When I have time I'm going to correlate the debt to the salaries reported for graduates, see how it all falls. Or perhaps look at completion rate compared to debt incurred (for example, Spelman has the 6th highest completion rate, while Albany State is in the middle of the pack with a 40 percent completion rate). There is lots of data here, accessible to me as a member of Investigative Reporters & Editors.

The list of schools alone is fun. We have the Gupton Jones College of Funeral Service, by far my favorite, though Beauty College of America is a close second.

The data cover the entire country; I just carved out a Georgia slice for some analyses. I can rank-order by any variable, or look at relationships among the variables. Again, so much data, so little time.





Monday, September 19, 2016

Inconsistent Style

We harp on AP style in classes. We demand students be consistent in how they present the news so that inconsistencies don't get in the way of storytelling. Abbreviate words in a similar fashion, for example. Capitalize the same. So look below. In the left story's hed, all the words are in uppercase. On the right, only some are. And let's not forget the subject-verb agreement problem in the story on the right, plus its factual error. The police did not make a sexual assault allegation; a student reported it, and the police later, upon getting more information, said it didn't happen. This hed misleads. Plus there's that lazy stock photo. You're a news site.







Wednesday, September 14, 2016

Minority Faculty

The University of Missouri has set aside $1.6 million to increase its minority faculty. According to the story, the university wants to get up to 13 percent in four years.

So, how does UGA compare?

Looking only at full-time faculty via UGA's data portal, just 5.8 percent of UGA's faculty are black. Among the colleges, Social Work does the best, with 35.5 percent African American, followed by Education (11.6 percent). Who sucks? It's a tie, at 0.0 percent, between Environment & Design and Forestry & Natural Resources, though as you can see from the data presented below, Env & Design makes up for it somewhat with four Hispanic faculty. Again, the data below are just full-time faculty, from Fall 2015. I could do a lot more, but it's hard to squeeze onto this blog.

SCHOOL             Total Faculty   Black Faculty   Hispanic Faculty
Arts & Sciences         790             38               33
Ag & Env Science        572             39               14
Education               215             25                4
Vet Med                 178              8               12
Business                135              2                4
Fam & Con Sci            85              7                4
Pharmacy                 68              4                1
Journalism               63              1                3
Law                      60              5                1
SPIA                     59              1                0
Public Health            56              6                2
Engineering              53              2                0
Forestry                 52              0                1
Env & Design             36              0                4
Social Work              31             11                0
Ecology                  28              1                0
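Those percentages come straight from the table; here's the per-college arithmetic as a quick Python sketch:

```python
# Full-time faculty counts, Fall 2015, as (total, black) per college, from the table.
faculty = {
    "Arts & Sciences": (790, 38), "Ag & Env Science": (572, 39),
    "Education": (215, 25), "Vet Med": (178, 8), "Business": (135, 2),
    "Fam & Con Sci": (85, 7), "Pharmacy": (68, 4), "Journalism": (63, 1),
    "Law": (60, 5), "SPIA": (59, 1), "Public Health": (56, 6),
    "Engineering": (53, 2), "Forestry": (52, 0), "Env & Design": (36, 0),
    "Social Work": (31, 11), "Ecology": (28, 1),
}

# Percent black faculty per college, highest first.
shares = sorted(((100 * black / total, school)
                 for school, (total, black) in faculty.items()), reverse=True)
for pct, school in shares[:2]:
    print(f"{school}: {pct:.1f}%")  # Social Work 35.5%, Education 11.6%
```

Small denominators matter here -- Social Work tops the list partly because 11 of just 31 faculty gets you to 35.5 percent.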




Tuesday, September 13, 2016

Voting


No reason to elaborate. Read this NYTimes piece on who votes, who doesn't, and what predicts the casting of a ballot. It's worth your time.

Friday, September 9, 2016

Georgia Drought Alert

There's an AJC story out about Georgia's drought problem. It's real. To help you see it better, below I provide drought maps from roughly one year apart. The older one is first, the newer one just below. You can see there's a definite problem in northwest Georgia, while south and southeast Georgia seem in better shape. Data from here.





Thursday, September 8, 2016

Georgia Suicides

Much has been written about the increase in suicides among middle-aged white males -- a topic near and dear to my heart as a middle-aged white male. See, for example, this Psychology Today article. So I got to wondering: is this happening in Georgia (where I live and teach and, so far, successfully avoid death)?

The short answer -- yes.

The longer answer -- there's a trend toward older suicides in Georgia, at least in the data I analyzed.

First, the overall trend. Perhaps the best way is to look at the two graphs below, the top one showing suicides by age in 1995, the second in 2015. Clearly 2015 skews a bit more toward older folks. As you can see, in 1995 the 35-39 and 40-44 age groups had the most suicides. In 2015, the 50-54 age group dominates. If I fit a trend line, you'd see the 2015 data lean more to the right -- that is, older.


OK, but what about white males? The story is similar. Again, see the graphs below. As they show, there's a bump to the right in the later data, with the 50-54 age category being the most prominent in 2015 and the younger categories more prominent in 1995.


Now some numbers. In 1995 the 50-54 age group of white males made up only 4.5 percent of all suicides. By 2015 that same group was at 7.2 percent of all suicides. Some more numbers. In general, white males make up most of the suicides in every year, usually hovering around the 60 to 70 percent mark. Simply put, white males tend to make up about two-thirds of all suicides in any given year, and sometimes it's as high as nearly nine out of 10. In 2015, there were 1,245 suicides in Georgia, 787 of them by white males. That's in part because there are more whites than blacks in Georgia, so to really do this I'd need to check it out proportionally, by population, but I'm certain without doing the math it's still higher than you'd expect.
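The two-thirds claim is easy to check against the 2015 figures in this post:

```python
# 2015 Georgia numbers from the post: 1,245 suicides, 787 by white males.
total, white_male = 1245, 787
share = 100 * white_male / total
print(round(share, 1))  # → 63.2
```

63.2 percent, right in that 60-to-70 range.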

There's more I can do with these data, but it's late in the day.





Wednesday, September 7, 2016

Car Color and Parking Tickets (at UGA)

In my continuing mission to mine UGA parking ticket data from the last academic year, we turn today to the color of the car and number of tickets written. This is what I do while waiting for my ride home.

The hard part of this analysis is clear. I don't know how many red or black cars there are on campus, plus cars off campus can also get a ticket. In other words, if black cars get the most tickets (they do), that may simply be a function of there being more black cars available on campus -- therefore they get more tickets.

So how do we deal with this problem? We turn to Wikipedia, of course.

This Wikipedia page lists the North American popularity of car colors. It actually has two rankings from two sources, both similar. Below I compare the ranking of North American car-color popularity (using the DuPont paint list) with the ranking by number of tickets written on campus last academic year.

Rank   North American Popularity   Tickets on UGA Campus
 1     Black                       Black
 2     Silver                      White
 3     White                       Silver
 4     Gray                        Gray
 5     Red                         Blue

As you can see, the lists are remarkably similar. The difference between white and silver is small on the popularity list and close in tickets written as well. Call it a tie.
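One crude way to compare the two Top Fives is to count shared colors and exact position matches -- a quick sketch:

```python
# The two Top Five lists from this post, in rank order.
popularity = ["Black", "Silver", "White", "Gray", "Red"]
tickets = ["Black", "White", "Silver", "Gray", "Blue"]

shared = set(popularity) & set(tickets)                   # colors on both lists
exact = [c for c, t in zip(popularity, tickets) if c == t]  # same rank on both

print(sorted(shared))  # four of five colors appear on both lists
print(exact)           # Black and Gray hold the same rank on both
```

Four shared colors out of five, two in identical spots -- "remarkably similar" holds up.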

You budding methodologists out there see the problem, of course. What's popular in North America isn't necessarily the same pool we're drawing from for cars that happen to park illegally on the University of Georgia campus during the 2015-2016 academic year. Yup, call that a limitation. A caveat. A problem. Other than that, it's kinda interesting that the lists are so similar. And red cars don't make the Top Five at UGA (it's number 6, by the way).

What's it all mean? That I'll do anything to play with data.









Let's Rewrite That Story

I don't often do this, but let's rewrite a student story put online. First off, good job GradyNewsource getting the breaking news online, especially with a pic. It's not a huge story -- mostly scrapes from a bus -- but that's OK. Here's the story. It's about an accident on UGA's east campus. Skim it first, then come back here and let's take a coupla grafs, see if we can improve on them.

The Lede
Earlier today, around 12:30 p.m., a University of Georgia East Campus Express bus hit a student riding her bike on East Campus Road.
Rarely if ever start with the day/time element of a story, and don't use "today" in your story. You don't know when your reader is, ya know, reading the story. Yes, on air it's different, you say "today," but online you put the actual day something happened. Second, do we know the bus hit the student, or vice versa? How about attribution on that point, if you're gonna raise it. Finally, you've put the word East twice in the same sentence. Avoid that by naming the bus route later in the story, if needed, but not in the lede. Instead, write something like: A UGA student suffered minor injuries early Tuesday afternoon when her bike collided with a bus on East Campus. Or something like that. Note how, without attribution, I leave it open as to who is at fault.

Later Graf

I'm pressed for time, so let's skip down the story a bit to this sentence:
There is still no word on if the bus driver has been charged with any crime yet. The name of the student has not been released by authorities.
The way this is written, you assume the bus driver will be charged with a crime. And I especially hate the word yet in this sentence. Too suggestive. Perhaps she biked in front of the bus, or the bus had the right of way. If you know different, tell me why and who says so, but the sentence above suggests it's the bus driver's fault with no supporting information.

This story was cranked out fast, so points for getting it online, but take care in how you write it and later go back and clean it up online. No reason to leave it parked there in its original form. Update the thing.






Trump is Ford, Clinton is Honda?




So this survey breaks it down by candidate support and make/model of vehicles owned. I got there via this Daily Caller story (yes, I sometimes read it, please don't tell anyone). Essentially, Trump supporters drive American. Clinton supporters drive those other cars and *shudder* Prius. That's the message here, for what little it's worth.

For the record, I drive a Ford.

So what do we know about the methodology? Is this a real poll? It's hard to say. Here's what they report (graphic below), with the yellow circle added by me to point out the details. It's heavy on men by about six percentage points, and we have no idea whether the "all 50 states" sample was representative of the states or whether certain less populous states were weighted more heavily than they should be. All it says in the tiny type below is that they "surveyed" folks -- nothing about how they did it, whether it was random, or what. They report a margin of error, but that only works if you have a random sample.
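For what it's worth, the textbook margin of error they're reporting assumes a simple random sample -- which is exactly what we can't verify here. A minimal sketch of the formula (the sample size is my hypothetical):

```python
import math

# Margin of error at 95% confidence for a simple random sample:
# MOE = z * sqrt(p*(1-p)/n), maximized at p = 0.5. The math is only
# meaningful if the sample actually was random -- which is the point.
def margin_of_error(n, p=0.5, z=1.96):
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical sample of 1,000 respondents.
print(round(100 * margin_of_error(1000), 1))  # → 3.1
```

Slap that formula on a self-selected sample and you get a precise-looking number that means nothing.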





















Thursday, September 1, 2016

Telling People Stuff in a Poll

So there's this PPP poll of Wisconsin, a robo-poll best I can tell. Normally I wouldn't spend much time on it, but I found this interesting. Early in the survey they ask respondents about the U.S. Senate race. So:
Q1: The candidates for U.S. Senate are Democrat Russ Feingold and Republican Ron Johnson. If the election were today, who would you vote for?
Straightforward question, right? Feingold, the Democrat, is up 49-42, with 9 percent undecided. Later in the survey they do something interesting. They ask several questions somewhat critical of Republicans (Questions 3, 4, and 7, for example, see them on the link above).

Check out the wording here:
Q9: During his years in Congress, Ron Johnson has opposed every effort to raise the federal minimum wage. He has repeatedly voted against increasing it, and has even called for getting rid of the federal minimum wage altogether. Does this make you much more likely, somewhat more likely, somewhat less likely or much less likely to support Ron Johnson?
OK, that comes kinda close to a push poll, though to be honest that's hard to do in a statewide poll with any effectiveness. And right after this question, there's this:
Q10: Having heard all the information in this poll, let me ask you again: The candidates for U.S. Senate are Democrat Russ Feingold and Republican Ron Johnson. If the election were today, who would you vote for?
So did all this loaded info affect the poll results? Barely. In Q10, the results still favored Feingold, this time 52-40, with 8 percent undecided. Feingold gained three points while Johnson lost two and the undecideds dropped by one, which suggests a few Johnson supporters and undecideds shifted to Feingold. We'd need access to the raw data, or at least more extensive crosstabs, to see if this is truly the case.
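The before-and-after movement is simple arithmetic on the point spreads from Q1 and Q10:

```python
# Q1 and Q10 results from the PPP poll, as reported in this post.
q1 = {"Feingold": 49, "Johnson": 42, "Undecided": 9}
q10 = {"Feingold": 52, "Johnson": 40, "Undecided": 8}

shift = {k: q10[k] - q1[k] for k in q1}
print(shift)  # → {'Feingold': 3, 'Johnson': -2, 'Undecided': -1}
```

All that loaded wording bought Feingold three points on paper -- though without crosstabs we can't say exactly who moved.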

Let me be clear that private polling for the candidates will often do this kind of thing, sometimes to test approaches in the campaign. Will this issue move voters? This one? How about this one? You don't see it that often in public polling -- which is why I bring it up.