Consistency in World Cup Skiing

FasterSkier, March 11, 2010

Kris Freeman recently commented on his blog, “I am a serious contender for the most volatile and inconsistent skier on the world cup”, in reference to his disappointing races at the Vancouver Olympics.

Every cross-country ski racer knows that you can’t race at your best all the time. Some days you just feel better than others. Often there’s an obvious reason (sickness, fatigue, overtraining etc.) but sometimes not. Racers work very hard to condition their bodies to perform at very high levels, repeatedly, throughout a season. However, it is inevitable that there are some differences from race to race.

These observations lead naturally to a topic that’s not very “sexy”, but it’s what stats geeks think about all the time: variation. Let’s look at some data regarding variability in ski racing and see what we can learn.

Here are the FIS points for individual distance races in the 03-04 season for Andrei Golovko and the 93-94 season for Jari Raesaenen. Golovko was all over the place and Raesaenen was quite consistent.

Calculating a standard deviation (SD) of the FIS points is one way to quantify variability. In my example above, Raesaenen had a SD of 7.1 while Golovko’s was 23.6. Clearly I’ve picked some extreme examples here, so we might ask some follow-up questions: How much variation is normal? Are some racers unusually consistent? Are some racers unusually inconsistent?
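For anyone who wants to try this at home, here is a minimal sketch of the calculation. The FIS points below are made up for illustration and are not the actual Raesaenen or Golovko results. Using the sample form of the SD (dividing by n - 1) is also an assumption, since the calculation above does not specify which form was used.

```python
import statistics

# Hypothetical FIS points for two skiers over one season.
# These are illustrative values, not real race results.
consistent_skier = [22.0, 25.5, 19.8, 28.1, 24.3, 21.7, 26.9, 23.4, 20.6]
erratic_skier = [15.2, 62.8, 34.1, 71.5, 20.9, 48.3, 12.7, 55.0, 80.4]


def fis_sd(points):
    """Sample standard deviation (n - 1 denominator) of a list of FIS points."""
    return statistics.stdev(points)


sd_consistent = fis_sd(consistent_skier)
sd_erratic = fis_sd(erratic_skier)
print(f"consistent: {sd_consistent:.1f}, erratic: {sd_erratic:.1f}")
```

The skier whose results are bunched together gets the smaller SD, which is all the number is meant to capture.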

Let’s look at these questions using data from the distance events at major international ski races: World Cups (WC), Olympics (OWG) and World Championships (WSC). Here’s a more precise description of what I did and why. If you don’t care for technical details, feel free to skip the rest of this paragraph. For each athlete, I looked for seasons where they had received FIS points of less than 150 in at least nine WC, OWG or WSC races. Why less than 150 FIS points and at least nine races? First, there are some athletes with enormously high FIS point races. These results are going to seriously cloud the issue. (Trivia: care to bet what the highest FIS point score in my database is?) Second, some athletes may only race in 1-2 of these events in a season. That will make a racer look very, very consistent! So I’m trying to weed out things like this that will cloud the data. As it is, nine is a small number of data points to use for a SD.

In other words, for each athlete we find seasons where they had at least 9 WC, OWG or WSC races with less than 150 FIS points and then calculate the SD of these results. That’s one data point. Repeat for each athlete and we end up with several hundred SDs.
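The procedure just described can be sketched roughly as follows. The input layout, a list of (athlete, season, FIS points) tuples, and the function name `season_sds` are assumptions made for illustration, not the actual database or code behind this post. The example data is invented.

```python
import statistics
from collections import defaultdict

MAX_POINTS = 150  # keep only races with fewer than 150 FIS points
MIN_RACES = 9     # require at least nine such races in a season


def season_sds(results, max_points=MAX_POINTS, min_races=MIN_RACES):
    """Return {(athlete, season): SD} for every athlete-season that qualifies.

    `results` is an iterable of (athlete, season, fis_points) tuples.
    This layout is hypothetical, chosen only to keep the sketch short.
    """
    grouped = defaultdict(list)
    for athlete, season, points in results:
        if points < max_points:  # drop the enormously high results
            grouped[(athlete, season)].append(points)
    return {
        key: statistics.stdev(pts)
        for key, pts in grouped.items()
        if len(pts) >= min_races  # drop seasons with too few races
    }


# Invented example: Skier A has ten races (one above the cutoff),
# Skier B has only three races and so does not qualify.
example = [("Skier A", "2009-2010", p)
           for p in [30, 35, 40, 45, 50, 55, 60, 65, 70, 200]]
example += [("Skier B", "2009-2010", p) for p in [20, 21, 22]]

sds = season_sds(example)
print(sds)
```

Each value in the returned dictionary is one data point of the kind described above.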

What do we get? The average standard deviation is 17.7 FIS points for men and 18.9 FIS points for women. Now, what the heck does this mean? Suppose, through a stunning and miraculous chain of events, I ended up on the World Cup circuit and an average race for me yields ~50 FIS points. If my SD is a “typical” 17.7, I would expect most of my races to fall between ~14 and ~86 FIS points, i.e. two SDs below and two SDs above my average 50 point race. Anything outside of that range would be fairly unusual.
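The arithmetic behind that range is just the average plus or minus two SDs. A small sketch, where the function name `typical_range` is made up for illustration:

```python
def typical_range(mean_points, sd, k=2):
    """Return (low, high) = mean minus and plus k standard deviations."""
    return mean_points - k * sd, mean_points + k * sd


low, high = typical_range(50, 17.7)
print(f"{low:.1f} to {high:.1f}")  # prints "14.6 to 85.4"
```

This lines up with the rounded ~14 and ~86 figures. Note that "most races" only translates to roughly 95% of them if the results are approximately normally distributed, which is an assumption rather than something checked here.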

So a SD of around 18-19 is typical, but just how typical is it? To answer this we’ll consider the following two histograms of all the SDs for men and women respectively:

These histograms give us a sense of the variation in variability. Confused? That’s ok, here’s the deal. We started out looking at how variable a single athlete’s results are over the course of a season, which we’ll call “within-athlete variation”. The histograms are plots of several hundred examples of within-athlete variation. This gives us a rough sense of just how variable this within-athlete variation might be and helps us to see how typical or unusual different levels of within-athlete variability are. For example, an unusually small amount of variation might correspond to a SD of less than 10, while an unusually large amount of variation might correspond to a SD of at least 28-30.

Finally, let’s return to Kris Freeman’s comments that I noted above. Has he, in fact, been unusually variable this season? Well, his SD for this season (with only six races) is 54.2. However, that is almost entirely due to one race, the Olympic 30k Pursuit. Removing that one race drops his SD for the season down to 27.1. Still high, but not frighteningly high.
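To see how much a single bad race can move the SD in a short season, here is a small sketch. The six results are hypothetical and are not Freeman's actual FIS points.

```python
import statistics

# Hypothetical six-race season with one very bad race.
# Illustrative values only, not real results.
season = [18.0, 35.0, 52.0, 27.0, 60.0, 140.0]

sd_all = statistics.stdev(season)

# Drop the single worst (highest-points) race and recompute.
trimmed = sorted(season)[:-1]
sd_trimmed = statistics.stdev(trimmed)

print(f"all races: {sd_all:.1f}, without worst race: {sd_trimmed:.1f}")
```

With so few races, one outlier accounts for a large share of the total spread, so removing it shrinks the SD considerably.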

Has he been getting more inconsistent over time? Here are the SDs for Freeman’s seasons with at least six races:

Season       Races   SD
---------    -----   ----
2002-2003    6       20.6
2003-2004    16      11.9
2004-2005    10      32.9
2005-2006    7       44.2
2006-2007    11      19.5
2007-2008    11      24.0
2008-2009    7       16.8
2009-2010    6       54.2

I don’t see much of a pattern there. Is it reasonable to say that Freeman is in contention for being one of the most “volatile” racers on the World Cup circuit this season? Perhaps, but if Freeman is significantly less consistent than other skiers, it is almost certainly due to blood sugar issues. As he mentioned in his blog post, maintaining a correct insulin dose is extraordinarily difficult, and the required dose can change over time. One slip-up can result in a single catastrophically bad race that will make an entire season look very “volatile”.

And given that Freeman has had several other quite consistent seasons (2003-2004, 2006-2007, 2008-2009), it might be fairer to say that when he’s on top of his blood sugar management he’s as consistent as, or perhaps more consistent than, his peers.

I should also note that the manner in which I’ve examined skier variability so far is fairly crude, as it relies on calculating SDs based upon anywhere from 6-15 values at a time which, as I noted, is fairly sensitive to a single outlying result. An alternative approach would be to employ some heavier artillery and model skiers’ variability over the course of their entire career. I may revisit this topic later to address this, but for the moment you’ll just have to trust me that it doesn’t dramatically alter the picture with respect to Kris Freeman specifically.

I’ll close with the reminder that skiing consistently is not the same as skiing fast. Go back to my first example and ask yourself which season you’d rather have had, Raesaenen’s or Golovko’s?




  • Ben Page

    March 11, 2010 at 1:32 am

    I’d be curious to know the correlation between a skier’s consistency (what you’ve measured as the SD) and their world cup success. In other words, do the people on world cup podiums have a significantly lower SD than a more average skier?

    If we look at a skier like Andy Newell (a case study of one), we can see that he is quite good at being in the top 10. Obviously sprinting is an odd duck in that FIS points are only calculated from the qualification round, but I would guess that Newell is more consistent than average. And yet, this consistency has not led to a single world cup victory. If we delve further, we find that his one silver medal came after a string of poor performances.

    I’m simply wondering if consistency is really all that great. Maybe our athletes would be better off if they could learn to be a little more ‘volatile’ if that meant popping onto the podium every now and again.

    I’m not suggesting that inconsistency is the key to world cup success; rather, I am simply curious enough to ask the question.

  • nexer

    March 11, 2010 at 7:31 am

    The obvious conclusion when you look at the data is that athletes with low points will have low SD’s, but athletes with high points may or may not have high SD’s. Athletes with high points and high SD’s might (if past performance is an indicator of future results) stand a better chance of getting a better result.

    As Mr. Obvious John Madden will say, the better athletes stand a better chance at winning.

  • Cloxxki

    March 11, 2010 at 9:41 am

    Interesting stuff!

    Are a skier’s DNF’s also somehow taken into account?

  • nexer

    March 11, 2010 at 10:04 am

    Points are based on time back and it’s impossible to assign a time to DNF.

  • JoranElias

    March 11, 2010 at 12:01 pm

    @Ben Page: That depends on how you measure success. (Obviously!) A couple results in the top 10 might constitute a very successful season for many racers, even if the rest of their results are much worse. I’d guess that most people would prefer, in order from least to most preferable: slow and consistent, slow and inconsistent, fast and inconsistent, fast and consistent.

    Also, I specifically focused on distance events because sprinting is somewhat more confusing with qualification vs. final ranking difficulties. Also, there’s simply less data for sprinting, as it has been around on the WC circuit for barely 10 years or so.

    @nexer: In general, yes. The correlation is not rock solid, though. I’ll try to update the post with some pictures showing this relationship.

    @Cloxxki: As @nexer said, DNFs are tough. Obviously in some sense they represent a “bad result” but in a way that is tough to quantify fairly. I made the decision fairly early on to simply ignore DNFs, which I am mildly regretting now. I may revisit this later.

  • Cloxxki

    March 11, 2010 at 4:09 pm

    I see, that’s a tough one.
    One approach I see, is let DNF’s be calculated as being seriously off one’s typical pace. Imagine, you’re on average 3% off the pace, and you hit a day where everything goes wrong. You’re soon off the back, and thinking of the training and racing in the season ahead, you decide to call it a day early, and try to train well tomorrow. Would you soldier on, badly motivated, in bad technique, low heartrates, on slow skis, you might finish the race 10% off the pace, and DONE for the week. Puking, cramping, the works. But, a measurable result.
    Unless someone like Kris hits a blood sugar crisis (serious health issue, definite reason to quit and get better), one would think a skier would normally be able to get himself together, and somehow finish the race like it were a race. It just won’t be pretty. If you could guess a fair rate of performance drop for a racer to get out of a DNF, you could run the results again. A skier may SCORE more consistently than (s)he skis.

    That said, one guy’s bad result is another’s reason to absorb a DNF.

  • JoranElias

    March 11, 2010 at 4:28 pm

    @Cloxxki: Yes, that would, generally speaking, be how I’d do it too. You’d have to tinker a bit with how exactly you assigned times to DNFs, but it could be done.

    Based solely on my impressions from gathering the data, though, I’d be willing to bet it doesn’t change the picture much, at least when you’re looking at the entire group rather than a specific individual. The number of DNFs on the World Cup circuit is (again, just my impression, I haven’t counted) a rather small proportion of all the results. We’re talking maybe 2-4 out of 40-80 for perhaps one/two races in three. Well under 2-3%.

    But believe me, it’s part of a quickly growing list of things I’d like to look at!

  • Doug1

    March 11, 2010 at 6:37 pm

    One of the big flaws that I see with your model is that 150 FIS points is a very low bar. When you set the bar that low you’re capturing many skiers that are on the low end of the world cup. Yes, they’ve skied world cups, but that is not a good definition of an elite skier.
    Since the caliber of skiers that Kris is comparing himself to are the top 30-50 in the world it would make much more sense to set the bar at 50 FIS points. 50 points usually ends up around 40-50th place.
    I bet that if you look at skiers that qualify for that, or that are ranked in the red group, the standard deviation would drop significantly.
    Could you by chance post the data set that you’re working off of? It would be great to actually do some calculation of my own to see what happens.

  • JoranElias

    March 12, 2010 at 2:58 pm


    You’re right, I am capturing skiers at the low end of the world cup. I’m also capturing skiers at the high end, though. My objective was to get a sense of how all wc skiers vary, not just a subset. So in that sense, neither one of us is wrong, we’re just asking different questions.

    However! I do disagree on the usefulness of looking only at sub-50 point races. I tried many different cutoffs looking for one that was reasonable (in the context of my specific question) and settled on 9 and 150.

    What I found was that lowering the point cutoff has two big drawbacks: (1) you end up tossing bad races from good racers, artificially deflating their variability and (2) if you go too low, you end up running out of data.

    For example, using your suggested cutoff of 50, suppose we have a racer with 9 races between 15-30 points and one race at 80 points. A cutoff of 50 seriously underestimates the real variability of this racer by simply omitting their 80 point race.

    Now, the same sort of thing will happen with any cutoff, even at 150. But the higher you go, the less it happens. I was mainly concerned about a small number of very large fis points scores that would make plotting a histogram awkward.

    So I went for a relatively high number of races (9) to weed out some of the sort of people you’re talking about, since people regularly scoring really high FIS points aren’t as likely to compete in a large fraction of the events in a given season, and a fairly generous point cutoff (150) to make sure I’m capturing the “real” variability of the racers.

    Finally, I don’t need to redo the analysis to tell you that lowering the point cutoff will lower the SD values. What you’re suggesting amounts to saying “if we look at values in a less variable range, they will vary less”. Numbers between 0-50 will vary less than numbers between 0-150 no matter what.

    I hope that made sense…

  • Doug1

    March 12, 2010 at 7:06 pm

    I don’t think that including the slower world cup skiers in the mix is the right thing to do because we are looking at what the best are doing. Kris and others don’t want to be like the b or c tier skiers, they want to be the best. So in that context, I think it only makes sense to analyze the best.

    I think that a better model, would be to take the top 50 skiers in the world based on their FIS points profile, and if one of those skiers doesn’t have a minimum number of world cup starts this year, throw them out. Then find the standard deviations of that group. This will eliminate the bias of excluding bad races, and including bad skiers.

  • JoranElias

    March 12, 2010 at 7:57 pm


    I will simply reiterate what I began with before: My objective was to get a sense of how all wc skiers vary, not just a subset. So in that sense, neither one of us is wrong, we’re just asking different questions.

    When you say “I don’t think that including the slower world cup skiers in the mix is the right thing to do because we are looking at what the best are doing” you’re assuming that we’re trying to do the same thing! 😉

    If I had asked your question, then yes, what I did wouldn’t be very relevant. But I did answer the question that I _did_ start with. I just didn’t answer yours.

    See how we can both be right, here?

  • Doug1

    March 12, 2010 at 8:16 pm

    Very true. That is why I wanted to look into it further myself. A fun debate nonetheless.

  • mattmuir

    March 13, 2010 at 5:43 pm

    I wonder what the average number of starts is, then what the average number of starts per nation is, and how the USST is situated in those figures.

  • JoranElias

    March 14, 2010 at 4:02 pm


    I glanced at some numbers but I wasn’t exactly clear on what you were asking (average start per athlete? per season?) so here’s a rough idea of totals. You can infer averages yourself by dividing.

    Looking at WC, OWG and WSC races, Distance and Sprint, Men and Women, the big nations (NOR, SWE, FIN, RUS, GER) tend to have ~120-160 finishes per season (excluding DNF, DSQ). Then there are nations like CAN, CZE, EST, FRA, JPN, SUI, UKR and USA that tend to have ~40-90 finishes per season. SUI, for example, is typically closer to 90 while the USA is typically closer to 50.

    Then there’s lots of nations with 1-30 per year or so.

    I literally just glanced at a quick summary, so don’t necessarily hold me to those numbers, but they should be roughly in the right ballpark.
