Friday, March 28, 2008

More March Madness Math

There seems to be a fair amount of interest in the statistical analysis of basketball these days, although the sport is still way behind baseball in terms of the issues being analyzed. The guys at Basketball Prospectus are doing some good things, but in my opinion a lot of their work is currently simplistic and repetitive, catering to the casual bracket-playing public rather than to those of us who really want to understand basketball.

In truth, we probably must also admit that basketball is less susceptible to sweeping statistical generalizations than baseball is. Does anybody still doubt that the RBI stat is unimportant? But ultimately, baseball is a team sport that, for the most part, sums up the individual actions of separate players.

Basketball, on the other hand, is a team sport played collectively. That is why dogmatic characterizations about who is better or best are rarely appropriate. Recently, one commentator opined that Kevin Love was better than Tyler Hansbrough, with no doubt whatsoever in that commentator's mind. Now, aside from the insufficient data upon which this commentator, John Gasaway, based his conclusion, he also forgot something key: basketball is a team game, not an individual sport.

Last night, the North Carolina Tar Heels defeated the Washington State University Cougars by a score of 68-47. At halftime, Carolina led the game 35-21, and their star player Hansbrough had two points. But how could this be? If Carolina's POY candidate had been completely stifled, how could the Tar Heels possibly be leading at halftime by such a wide margin?

Well, it's a team game. Gasaway might have you believe that Hansbrough had a horrible half. A more sophisticated view, however, might be that Hansbrough had a sensational first half. After all, his team had a lead equal to 67 percent of the Cougars' first-half total. Washington State put so much focus on stopping Hansbrough that they couldn't stop any of the other UNC players. This is a team game, and Gasaway ignores this reality with his brash, unwarranted, sweeping conclusions about which player is best.

Kevin Love and Tyler Hansbrough are both sensational players. UNC deeply wanted to see Love in Carolina Blue, and I doubt UCLA would have turned Hansbrough away had he shown up at their front door.

Second point: Points per possession analysis is a fundamental analytical tool used by basketball coaches to evaluate their team's play. Coaches in the know have been utilizing PPP analysis for over fifty years, as a means of evaluating a team's play independently of how quickly the ball is changing hands from team to team during a game.

It can help provide insight into the relative strengths of teams that play at different tempos or paces. It is not a secret, but like OBP in baseball, some people think it is such a cool way of looking at the sport, that they overwork the concept.

Or, to start with a joke, only a statistician could make us believe that leading a game by ten points is just as good as leading it by twenty points.

PPP differential is important, but it is not as important as absolute point differential, regardless of what anyone tries to tell you. Why not? Let's look at a simple arithmetic example involving two college teams, the Cheetahs and the Terrapins.

The Cheetahs play fast, have lots of possessions on average in their games, and score a lot of points as well. By the end of the year, statistics show the Cheetahs scoring an average of 80 points per game while giving up only 70 to their opponents in the first 38 minutes of their games in the fast and tough African Coast Conference.

The Terrapins play slow. They have a coach who loves the movie Hoosiers and believes every bit of it is true. Their team averages 40 points in the first 38 minutes of their games in the slow and rough Equatorial Coast Conference, while surrendering 35 to their opponents, and they use only half as many possessions per game as the Cheetahs.

Now which team is better? Advocates of strict PPP analysis will tell you that these teams are equal, because their point differential per possession is exactly the same.

They are wrong.
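To see where the "equal" verdict comes from, here is a minimal sketch of the per-possession arithmetic. The possession counts (80 and 40) are assumed for illustration; the example above only says that the Terrapins use half as many possessions as the Cheetahs.

```python
# Possession counts are hypothetical: the example only fixes the 2-to-1 ratio.
CHEETAH_POSSESSIONS = 80
TERRAPIN_POSSESSIONS = 40


def ppp_differential(points_for, points_against, possessions):
    """Points scored minus points allowed, per possession."""
    return (points_for - points_against) / possessions


cheetahs = ppp_differential(80, 70, CHEETAH_POSSESSIONS)
terrapins = ppp_differential(40, 35, TERRAPIN_POSSESSIONS)

print(f"Cheetahs:  {cheetahs:+.3f} points per possession")
print(f"Terrapins: {terrapins:+.3f} points per possession")
print(f"Raw margins after 38 minutes: {80 - 70} vs {40 - 35}")
```

Per possession the two teams are identical, but the raw margin the Cheetahs carry into the final two minutes is twice as large.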


Let's see what can happen to both teams in a game where they are especially unlucky down the stretch in the last two minutes. In the first game, the Cheetahs' opponent, the Salukis, hit three three-pointers in the last two minutes, after having made only three in the entire game to that point. For the Cheetahs, there seems to be a lid on the rim and they don't score a single point down the stretch, laboring to hold on, 80-79.

In the second game, the Terrapins' opponents, the Sloths, hit three three-pointers down the stretch, after having made only three in the entire game to that point. For the Terrapins, there seems to be a lid on the rim and they don't score a single point down the stretch, and they lose a heartbreaker, 44-40.

Why the divergence in outcomes? Simple mathematics. The Cheetahs were able to overcome a somewhat unlikely series of outcomes in the final two minutes because their absolute lead was much greater. The Terrapins could not, even though up to then their PPP differential had been exactly the same as the Cheetahs'.
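The two endings can be checked with a few lines of arithmetic. This is just a sketch of the example's numbers; the function name is made up for illustration.

```python
def final_margin(points_for, points_against, late_for, late_against):
    """Final margin after adding what each side scores in the last two minutes."""
    return (points_for + late_for) - (points_against + late_against)


# The same bad finish for both favorites: the opponent hits three
# three-pointers while the favorite does not score at all.
LATE_OPPONENT_POINTS = 3 * 3

cheetahs = final_margin(80, 70, 0, LATE_OPPONENT_POINTS)
terrapins = final_margin(40, 35, 0, LATE_OPPONENT_POINTS)

print(f"Cheetahs finish at {cheetahs:+d}")
print(f"Terrapins finish at {terrapins:+d}")
```

The identical nine-point swing leaves the Cheetahs ahead by one and the Terrapins behind by four.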

Thus, the mathematics are basically simple. The better (more talented?) a team is, the faster the tempo they should employ, ceteris paribus, because it gives them an extra cushion against unlikely events occurring to take the game's outcome away from the expected mean. Ceteris paribus is Latin for all other things being equal.
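One rough way to put a number on that cushion is a toy model, which is an assumption of this sketch and not anything a real rating system uses: every possession is an independent try worth either two points or nothing, both sides get the same number of possessions, and the better team converts 50 percent of its tries against 43.75 percent for its opponent. Those rates reproduce the 80-70 and 40-35 averages from the example when stretched over 80 and 40 possessions.

```python
from math import comb


def binomial_pmf(n, p):
    """Probability of each possible number of made baskets in n tries."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def win_probability(possessions, p_team, p_opponent):
    """Chance the team makes strictly more baskets than its opponent.

    Ties are not counted as wins in this toy model.
    """
    team = binomial_pmf(possessions, p_team)
    opponent = binomial_pmf(possessions, p_opponent)
    total = 0.0
    opponent_below = 0.0  # P(opponent makes fewer than k baskets)
    for k in range(possessions + 1):
        total += team[k] * opponent_below
        opponent_below += opponent[k]
    return total


# Assumed conversion rates: 1.000 vs 0.875 points per possession.
P_TEAM, P_OPPONENT = 0.5, 0.4375

slow = win_probability(40, P_TEAM, P_OPPONENT)
fast = win_probability(80, P_TEAM, P_OPPONENT)

print(f"40 possessions each: favorite wins outright {slow:.1%} of the time")
print(f"80 possessions each: favorite wins outright {fast:.1%} of the time")
```

Under these assumptions the per-possession edge is the same in both games, yet the favorite wins more often in the longer one, simply because the luck has more possessions over which to even out.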

Let's look at some real-world examples. In the current basketball ratings, Wisconsin is ranked number three and North Carolina is ranked number four. These teams make for good examples because Wisconsin plays much like the Terrapins above, while UNC is more of a Cheetahs-type squad. North Carolina has the better record at 35-2, while Wisconsin has the stronger power index, in spite of having two more losses and having been blown out in one game.

Basically, this is because Pomeroy has not found a way to incorporate (or has chosen not to incorporate) the mathematical notion of standard deviation from the expected outcome into his ratings. I will use his words here:

"Consistency is basically the standard deviation of scoring difference by game for a team. Again, it’s not included in the ratings calculation. It can be an aid in determining which teams are overrated by my system. Highly rated teams that are inconsistent tend to look beatable more often. As of this writing, Georgia is ranked 329 in consistency and Oklahoma is at 334. They’ve played their best games against poor teams, and their worst against good ones.

Ideally, I’d synthesize the consistency and rating into one number, but I haven’t found a way I’m comfortable with. So right now, I’m throwing this system out there with all its warts for everyone to see. The warts tend to decrease as more games are played, but at least I’ve made you aware of them and where they can pop up."
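As a rough illustration of the definition quoted above, consistency can be sketched as a plain standard deviation of game-by-game scoring margins. The margins below are invented for two hypothetical teams, and the real calculation may differ in its details (for example, sample versus population standard deviation, or any adjustment for opponent strength).

```python
from statistics import mean, pstdev

# Made-up scoring margins for two hypothetical teams over five games.
steady = [10, 12, 8, 11, 9]
streaky = [30, -5, 25, -10, 10]

for name, margins in (("steady", steady), ("streaky", streaky)):
    print(
        f"{name}: average margin {mean(margins):+.1f}, "
        f"standard deviation {pstdev(margins):.1f}"
    )
```

Both teams have the same average margin, so a rating built only on averages cannot tell them apart, but the streaky team's much larger spread is what makes it look beatable more often.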

First, I would like to point out, consistent with the above, the fact that WSU was one of the least consistent teams in all of college basketball this year, which may account for a lot of the head scratching by people who could not figure out what was wrong with Pomeroy's rankings here. I would argue that the rankings were wrong with respect to WSU and I think he admits why here, although you have to go trolling into the dark recesses of his fantastic site to find this. I am not sure, however, why he continues with the pretense of multiplying out the logs, given this fundamental defect.

So, getting back to Wisconsin, whom Pomeroy rates above UNC while the USA Today computer rankings have UNC first and Wisconsin fifth: what is the truth of the matter?

Well, there may not be any ultimate truth. If a team is incredibly good but very inconsistent, they might be very likely to be ranked number one all year but then get upset in the tournament, which is a simple one-and-out format, not best-of-seven as in the NBA.

And just so I don't get a lot of I-told-you-so emails from either UNC or Wisconsin fans when one or the other loses: one loss, without more, is not enough to defeat the analysis. Unlikely events happen all the time, and if they didn't, life would be quite strange and boring!

Nevertheless, here, I believe that Wisconsin being rated more highly is due to a flaw in Pomeroy's system. Not only does UNC have, in fact, fewer losses, their style of play seems to insulate them more from upsets. And if we look at his other categories, we see it. Carolina is the 56th most consistent team in the country, second only to Memphis among teams still playing, while Wisconsin is the 205th most consistent team. Wisconsin's slow style of play provides them with less of a cushion against upsets. On the other hand, their style is probably fairly efficiently tailored to their talent composition, and for them to change it would violate the ceteris paribus part of our mathematical example.

So, what does it all mean? A team like Wisconsin is far less likely to be able to go through an entire season unbeaten, compared to a Memphis or UNC, which have high consistency ratings. Thus, when people say UNC is clearly "better," what this should mean is that UNC is likely to pass through the year with fewer losses. But this does not mean that UNC is necessarily likely to beat Wisconsin in a head-to-head matchup.

And to take the analysis one step further, let's ask ourselves who is better between the real Terrapins of Maryland and the inappropriately named Cougars of WSU. Well, surely you say, it has to be WSU, since they made the NCAA tourney and had a fairly high ranking, while Maryland went out in the second round of the NIT. And yet Maryland beat UNC, and most Carolina fans would probably tell you that they would rather face WSU any day over Maryland.

I will let the reader puzzle over that one, but will finish by stating that teams that play slow seem generally to be more at risk at the ends of games when playing inferior opponents. While this might not be a huge problem over the course of an 18-game regular season schedule, it can be huge when the rule is one and done. That may help explain how Georgetown got upset by Davidson and why Wisconsin could be at risk today, as well as UCLA against Xavier.

Ultimately, it's because, no matter how you slice it, leading 50-40 is not as good as leading 100-80, and don't let anybody lie to you with statistics and imply that it is.
