In my recap of the Davos distance race I noted the strong performance of Masako Ishida and pointed out what an extreme classic specialist she is. A certain world-famous XC skiing journalist wanted to know whether the conventional wisdom he’d heard, that Japan’s skiers typically do better in classic races overall, was correct.

This was particularly fun to tackle, since it turned out to be a situation where simply graphing the data in a clever way wasn’t enough. We actually have to do statistics! Woo hoo! Don’t worry, though: while the techniques I ended up using for this analysis are fairly sophisticated, the results are pretty easy to understand and explain.

I’ll start with my first pass:

My first thought when approaching this kind of question is always to simply throw the data up on a graph and see what I can see. So this is all of Japan’s World Cup (WC), World Ski Championships (WSC) and Olympic Winter Games (OWG) results back to 1992. Note that I’ve plotted rank, not FIS points, to keep the distance and sprint panels on the same scale.

The classic preference is clear in recent years on the women’s side, but since we already know that Masako Ishida has a monster proclivity for classic skiing, this could just be due to *her* results. And other than the recent results for the women, it’s tough to make out any obvious patterns. That’s because this graph treats all of Japan’s results from every athlete as a single group. But clearly different skiers will have different abilities in skating vs. classic. So we need a way to look at each individual skier’s races. The problem is that there are more than 20 Japanese skiers (in my database at least) with a fair number of WC starts. Can you imagine looking at a similar graph but with more than 40 panels (distance and sprint for each athlete)? That wouldn’t be very illuminating, I think.

One option would be to artificially limit myself to a small number of skiers, but then our answer would apply only to the skiers we picked, not to all of Japan’s skiers. If we want a good answer to this question, we need to include as much data as possible.

The solution is to use a model (gasp!), in particular a hierarchical linear model. I’m not going to bore people with a detailed description of how it works; if you’re really curious, ask questions in the comments. The bottom line is that this tool lets me estimate the difference in race performance between techniques both overall and for each skier individually at the same time (and by doing both at once, it often does a better job at each).
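To give a feel for why estimating the overall and per-skier effects together helps, here is a minimal sketch of the core idea, partial pooling, in plain Python. It is a simplification, not the model used in this post: it assumes each skier’s raw skate-minus-classic difference and its standard error are already in hand, and that the between-skier spread `tau` is known rather than estimated from the data. The function name and the numbers in the example are hypothetical.

```python
def partial_pool(raw_effects, std_errors, tau):
    """Shrink per-skier effect estimates toward a precision-weighted group mean.

    raw_effects: each skier's own (unpooled) skate-minus-classic difference
    std_errors:  the standard error of each raw estimate
    tau:         assumed (known) between-skier SD of the true effects
    """
    # Precision weights for the group-level mean: 1 / (se^2 + tau^2)
    weights = [1.0 / (se ** 2 + tau ** 2) for se in std_errors]
    group_mean = sum(w * d for w, d in zip(weights, raw_effects)) / sum(weights)

    pooled = []
    for d, se in zip(raw_effects, std_errors):
        # Weight on the skier's own data: near 1 when se is small relative to tau
        b = tau ** 2 / (tau ** 2 + se ** 2)
        pooled.append(b * d + (1 - b) * group_mean)
    return group_mean, pooled


# Hypothetical inputs: three skiers, the second one with very noisy data
mu, pooled = partial_pool([10.0, -2.0, 4.0], [2.0, 8.0, 4.0], tau=4.0)
print(round(mu, 2))                      # 6.4
print([round(p, 2) for p in pooled])     # [9.28, 4.72, 5.2]
```

The noisy second skier’s raw estimate of -2 gets pulled most of the way toward the group mean, while the precisely measured first skier barely moves. That borrowing of strength is what lets a hierarchical model say something sensible about skiers with only a handful of races.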

I probably could have squeezed this all into a single model, but I decided it would be easier to explain to folks if I modelled sprint and distance races separately. That also allows me to use FIS points as a measure for distance races and rank for sprinting, which makes somewhat more sense anyway.
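For readers who want a bit more detail, here is one plausible way to write down a model of this kind. The exact specification isn’t given in this post, so the random-intercept-plus-random-slope structure and all of the symbols below are assumptions, not a record of what was actually fit. Here $y_{ij}$ is the result of skier $j$ in race $i$ (FIS points for distance, finishing place for sprints), $\mathrm{skate}_{ij}$ is 1 for a freestyle race and 0 for classic, $\beta_1$ is the group-level technique effect, and $\beta_1 + u_{1j}$ is skier $j$’s own effect (positive meaning worse in skating).

```latex
y_{ij} = \beta_0 + \beta_1\,\mathrm{skate}_{ij} + u_{0j} + u_{1j}\,\mathrm{skate}_{ij} + \varepsilon_{ij},
\qquad
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix} \sim \mathcal{N}(\mathbf{0}, \Sigma),
\qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```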

In distance races, Japanese skiers (men and women) tend to score about 3.49 FIS points worse (higher FIS points are worse) in freestyle races (95% CI: -1.75 to 8.73). That little parenthetical means that the 95% confidence interval for this effect ranges from -1.75 FIS points to 8.73 FIS points. Since this interval includes zero, we would typically say that the effect does not meet the threshold for “statistical significance”, meaning that we can’t say with much confidence that the *real* difference isn’t actually zero. Also, 3.49 FIS points is not a very large difference in practical terms.
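The arithmetic behind that parenthetical is simple enough to sketch. Assuming the interval is a symmetric normal-approximation interval (estimate ± 1.96 standard errors, which fits the reported numbers, since 3.49 sits exactly midway between -1.75 and 8.73), you can recover the standard error and check whether zero falls inside. The helper name is hypothetical.

```python
def summarize_ci(lower, upper, z=1.96):
    """Recover the point estimate and standard error from a symmetric
    normal-approximation 95% CI, and report whether the CI includes zero."""
    estimate = (lower + upper) / 2          # midpoint of the interval
    std_error = (upper - lower) / (2 * z)   # half-width divided by z
    includes_zero = lower <= 0 <= upper
    return estimate, std_error, includes_zero


est, se, has_zero = summarize_ci(-1.75, 8.73)
print(f"estimate={est:.2f}, SE={se:.2f}, z={est / se:.2f}, includes zero: {has_zero}")
# estimate=3.49, SE=2.67, z=1.31, includes zero: True
```

An interval that includes zero is the same thing as the estimate sitting fewer than 1.96 standard errors away from zero; here it is only about 1.3 standard errors away.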

But remember that this fancy-shmancy model I’m using doesn’t just estimate the overall effect, it also estimates this difference for each individual skier. The following graph displays the results, along with their associated 95% confidence intervals:

The dots are the estimated effects: positive values mean that skier does worse in skating races and negative values mean that skier does better. The bars are the 95% CIs.

This rather wonderfully captures several important things about statistics and data analysis! First, you’ll note that here too, basically all of the confidence intervals include zero, meaning that the sticklers among us will simply throw up their hands and declare the results inconclusive.

There are only two skiers who seem to conclusively prefer one technique over the other: the aforementioned Masako Ishida and Mitsuo Horigome. But this misses an important fact: the overwhelming majority of Japanese skiers have estimated effects larger than zero (although many of these effects are also quite small), indicating at least some preference for classic skiing. The important lesson here is that just because your results aren’t “statistically significant” doesn’t mean you can’t learn interesting things from the data.

In this case, the fact that so many Japanese skiers have a nominal preference for classic events (even though most individual effects are small and not statistically significant) plausibly suggests that, as a group, Japan does perform somewhat better in classic races.
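One rough way to put a number on the “lots of small positive effects” argument is a sign test. This is a swapped-in technique for illustration, not part of the analysis in this post. If skiers had no real technique preference, each estimated effect would be equally likely to land above or below zero, so the count of positive effects would follow a Binomial(n, 0.5) distribution. The tally in the example is hypothetical, since the exact count isn’t reported here, and per-skier estimates from a hierarchical model are not truly independent (they are all pulled toward the same group mean), so treat the resulting p-value as a heuristic at best.

```python
from math import comb


def sign_test_p(n_positive, n_total):
    """One-sided sign test p-value: the probability of seeing at least
    n_positive positive effects out of n_total if each sign were a fair
    coin flip (i.e. no skier had any real technique preference)."""
    tail = sum(comb(n_total, k) for k in range(n_positive, n_total + 1))
    return tail / 2 ** n_total


# Hypothetical tally: 16 of 20 skiers with an estimated effect above zero
print(f"{sign_test_p(16, 20):.4f}")  # 0.0059
```

A lopsided split among a modest number of skiers is unlikely under pure coin flips, which is the intuition behind reading many individually non-significant effects as a group-level trend.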

For all those statistical significance sticklers out there, here is our second important lesson: XC skiing is a volatile business, and large amounts of variability are just par for the course. That variability is what’s muddying the waters here and preventing us from getting nice, clean “significant” results. So while it may be fair to say that Japanese skiers are generally better at classic events, we should keep this variability in mind and use caution when we’re watching a specific skier or a particular race.
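To see why volatility gets in the way, consider the simplest possible case: comparing one skier’s average skate result with their average classic result, assuming n independent races of each technique and the same race-to-race standard deviation for both. The width of the 95% interval grows in proportion to that standard deviation but shrinks only with the square root of the number of races. The function name and the numbers below are hypothetical, chosen only to show the scaling.

```python
from math import sqrt


def diff_ci_half_width(sd, n_per_technique, z=1.96):
    """Half-width of a normal-approximation 95% CI for the difference between
    two means, each based on n_per_technique independent races with SD sd."""
    std_error = sd * sqrt(2 / n_per_technique)  # SE of a difference of two means
    return z * std_error


# Hypothetical: results vary by about 20 FIS points from race to race
for n in (8, 32):
    print(n, round(diff_ci_half_width(20, n), 1))
# 8 19.6
# 32 9.8
```

With 8 races of each technique, a true difference of a few FIS points is swamped by an interval of roughly ±20 points, and quadrupling the number of races only halves that width.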

The trend is there, but it’s imprecise.

Ok, what about sprinting? Same deal, but instead of FIS points, we’re just going to use rank:

Remember we’re talking about places now, not points. So an effect estimate of 5 means that skier finishes around 5 places *worse* in freestyle sprints, on average. So positive values would indicate a nominal preference for classic and vice versa.
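For a single skier, the raw (unpooled) version of this quantity is just a difference in average finishing places. The sketch below assumes results are stored as hypothetical (technique, rank) pairs; the model’s per-skier estimates will differ from this raw number because they are shrunk toward the group-level effect.

```python
from statistics import fmean


def raw_technique_effect(results):
    """Unpooled technique effect for one skier: mean freestyle rank minus
    mean classic rank. Positive values mean worse (higher) places in
    freestyle, i.e. a nominal preference for classic."""
    free = [rank for tech, rank in results if tech == "free"]
    classic = [rank for tech, rank in results if tech == "classic"]
    return fmean(free) - fmean(classic)


# Hypothetical sprint results for one skier (not real data)
example = [("classic", 12), ("classic", 18), ("free", 20), ("free", 20)]
print(raw_technique_effect(example))  # 5.0
```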

There are more clear-cut cases here, to be sure, with ~13 different skiers showing a statistically significant preference. The overall effect of skating for Japanese skiers as a group is 1.11 finishing places (95% CI: -2.06 to 4.27). Once again, this effect appears small and is not statistically significant.

However, we still see considerably more Japanese skiers preferring classic races, albeit by small and often statistically insignificant margins. Additionally, among the starker cases, ~10 prefer classic while only ~3 prefer skating.

Of course, now that I’ve done this for Japan, what other countries might be interesting…?

Ok, if you don’t care about annoying technical details, feel free to stop reading now.

I checked for an interaction between technique and gender and failed to see anything worth reporting (let alone anything that was statistically significant). My discussion/interpretation of confidence intervals has been meant for a lay audience, so try to give me a break if I glossed over something you think is important. Feel free to gripe in the comments, though. However, if you complain about multiple comparisons problems, I will not respond and instead will roll my eyes and make fun of you privately with all my stats friends. 😉
