A commenter noted that it was interesting that Petra Majdic showed up as being “statistically significantly” better at longer-distance races (as opposed to sprint races), although just barely.
It turns out this is a good example of the statistical concept of leverage. Check out the following graph that compares skiers at each end of the statistically significant group I highlighted in my last post:
Johaug was near the top of the list and Majdic just barely snuck in at the bottom. What’s going on here is that the “standard” distances, 10-15km, don’t have much of an effect on the model. But really short distances (5k or Prologues) and really long distances (30km and up) sit farthest from the typical race length, so they exert more leverage on the regression line.
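To make the leverage idea concrete, here is a minimal sketch. It assumes a plain one-predictor linear regression on race distance, which is simpler than the model used in the post. In that setting the leverage of each point is 1/n plus its squared deviation from the mean distance divided by the total squared deviation. The `leverage` function and the list of distances are made up for illustration and are not either skier's actual race history.

```python
def leverage(xs):
    """Leverage (hat value) of each point in a simple linear regression on xs."""
    n = len(xs)
    mean = sum(xs) / n
    # Total squared deviation of the predictor from its mean
    sxx = sum((x - mean) ** 2 for x in xs)
    return [1 / n + (x - mean) ** 2 / sxx for x in xs]


# Hypothetical race distances in km (illustrative only, not real results)
distances = [3, 10, 10, 10, 15, 15, 15, 30]
h = leverage(distances)

for d, hi in zip(distances, h):
    print(f"{d:>2} km: leverage {hi:.3f}")
```

In this toy schedule the 3km and 30km races get much larger leverage values than the 10km and 15km races, which is why a handful of very short or very long results can tilt the fitted line.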
So Johaug hasn’t done terribly well in Prologues, but has done well in 30k’s. Conversely, Majdic doesn’t have quite as extreme a split between those two race lengths. It is true that basically all of Majdic’s results at 30km+ are classic races (big surprise!) and she is quite the classic specialist. So it could be that, despite attempting to control for technique in my model, I haven’t entirely removed technique as a confounding factor in Majdic’s case.