US Distance Results: A statistical analysis

An endearing facet of cross country ski racing is the seasonal rhythm it introduces into our lives. Summer, fall and winter each hold unique rituals and landmarks of training, racing and recovery. Spring brings with it its own rituals, among them retrospection on the season just completed. Athletes around the nation use these months to reflect on their season, evaluate their performance and plan for the future.

In recent years, the same has been true for the US skiing community as a whole. In particular, each spring we are treated to a new round of articles discussing the progress, or lack thereof, of US skiing. FasterSkier.com has been a prominent forum for many of these discussions.

I have followed these discussions with some interest and wade into the topic with considerable trepidation. I want to be clear that while I was a ski racer for much of my life (occasionally even achieving a level of not-slowness) I claim no expertise on how a nation ought to develop international caliber athletes. I have no racing or coaching experience that could even remotely be considered close to an international level of competition. However, I do possess a large quantity of data on skiing results, I am more qualified than many as a data analyst, and an obvious thing to do with skiing results data is to look at trends in performance over time. If I want to write data oriented articles on skiing, I’ll have to take up the topic eventually. Hopefully, I won’t make any enemies!

These discussions have elicited a significant amount of anger and vitriol in the past so I feel compelled to qualify this article with the following:

Nothing that appears in this article is intended to constitute a Final True Answer regarding US international skiing performance. No data set can completely capture the entirety of the US skiing community. When used well, data can provide a useful guide for future questions and debate. In that sense my intention is not to settle a controversy but to provide common guideposts for future debate by people more knowledgeable than myself. Finally, nothing that follows should be construed as a commentary on the commitment, dedication or desire of US athletes or coaches. Any impression to the contrary is entirely the fault of sloppy writing on my part.

With that bit of ass-covering out of the way, let’s get down to business.

The motivating question for this article is: Are international results of US skiers improving? Now, while this question seems simple enough, it needs to be unpacked a bit.

First, we have the obvious breakdown into four subcategories of men’s and women’s, distance and sprint results. To keep the length manageable, I’m going to split this article in half, addressing distance results now and sprint results in a few days.

Second, we have to pick a time frame. The data I have limits this to seasons dating back to around the early 1990’s. (I would love nothing more than to extend this back to the heady days of Bill Koch, Tim Caldwell, etc. If anyone knows of repositories of international ski results, with times, that stretch back that far, please let me know!) I’m going to take the broadest possible view and consider all the data I have. So our time span runs from the 1991-1992 to the 2009-2010 seasons.

Third, we need to pick which races to consider. As in my previous articles, I’m going to restrict myself to World Cup (WC), Olympic (OWG) and World Championships (WSC) races. My reason is simply that these are the primary “elite races” that we measure international success by and data on domestic races is harder to come by. There are drawbacks to this approach that I’ll touch on at the end of the article.

Fourth, we need to select a metric by which to measure success. I’m going to use FIS points. No system for measuring performance across different races is perfect and FIS points are no exception. However, they are an accepted standard and probably do a reasonable job, all things considered.

In the interests of full disclosure, since my source for these data is the FIS website, I am largely dependent on them for accuracy. In the process of compiling these data, I discovered many, many errors on the FIS website. Most of these errors would not have much of an effect on this analysis. However, where I could resolve the issue in a verifiable way, I did so. Beyond that, any errors in my data are most likely the fault of FIS. Honest! Also, FIS doesn’t record points for results during much of the 90’s. For these races I have calculated them myself assuming a penalty of zero and current rules on F-values, which while not technically accurate, seems reasonable.
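For readers who have not seen how race points are computed, here is a minimal sketch of the calculation described above: the points are the racer's percent back of the winner, scaled by an F-value, plus a race penalty (zero here). The function name and the F-value table are assumptions for illustration only. The numbers are what I understand the current FIS values to be, so check the FIS rules before relying on them, and note that this is not the code used to build the data set.

```python
# Assumed F-values by race format (illustrative; verify against current FIS rules).
F_VALUES = {
    "interval": 800,     # interval start
    "pursuit": 1200,     # pursuit with break
    "mass_start": 1400,  # mass start / pursuit without break
}


def fis_race_points(racer_seconds, winner_seconds, race_format, penalty=0.0):
    """Return F * (racer time / winner time - 1) + penalty."""
    if winner_seconds <= 0:
        raise ValueError("winner time must be positive")
    f_value = F_VALUES[race_format]
    return f_value * (racer_seconds / winner_seconds - 1.0) + penalty


# A skier finishing 5% behind a 42-minute winning time in an interval start race.
winner = 42 * 60
points = fis_race_points(winner * 1.05, winner, "interval")
print(round(points, 2))
```

Under these assumed F-values, the same 5% gap scores more points in a mass start race than in an interval start race, which is why the choice of F-value matters when comparing seasons.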

The following two graphs show the FIS points (truncated above at 100) for men’s and women’s distance races from six nations, including the US and Canada. (Click on images for larger versions.)

[Figure: Women's distance performance]

Every FIS point result (below 100) in a WC, OWG or WSC distance race for every athlete from each nation during this time period is shown. There’s lots of over-plotting, so I made the dots partially transparent. Darker areas indicate areas with more data points.

I don’t know about you, but I find it exceedingly difficult to spot any trends. There are just too many dots and it overwhelms the eye.

I could plot a trend line using a scatterplot smoother for each nation, but this presents some problems. A typical trend (or regression) line tracks the average (or mean) FIS point result for each nation. Now, that might be of interest in other contexts, but given our question it would be somewhat misleading.

For some, discussing US skiing performance means discussing how our best athletes are doing, not our average ones. Others may want to include a discussion of depth: how many athletes do we have performing at a certain level? Both are legitimate topics of conversation. What we’d like is a way to visualize both of these concepts at the same time.

For this article, I’ve settled on an informal version of quantile regression. Quantile is the term statisticians use for what most people think of as percentiles. As in, “scoring in the 95th percentile” on a test means your score was among the top 5% of all scores. Of course, for FIS points, lower scores are better, so when I say “in the 5th percentile”, I’ll mean in the bottom 5%. The 0.05 (or 5%) quantile refers to the cutoff value that determines which values are among the lowest 5%. The 0.5 quantile is commonly known as the median.

What we’ll do here is look at trend lines for quantiles. This will allow us to look at trends at different levels of performance simultaneously. The following two graphs display the trend lines for the 0.5, 0.25, 0.15 and 0.05 quantiles for FIS points in distance events for men and women in the same six nations as above.

Each line represents the cutoff for that quantile’s best FIS point results in men’s distance races for each nation. So a Canadian guy in the early 90’s who had a race under ~50 FIS points would have had one of the top 5% results for Canadian men that season. Having lines nearer the bottom means your racers are going faster. Having lines that are bunched close together near the bottom means you have lots of skiers going fast. These two elements, the height of each line and the spacing between the lines, give us a rough picture of both the quality and the depth of each nation over time.
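As a rough sketch of the idea (not the smoothed fit behind the graphs), the unsmoothed version is just a set of empirical quantiles per season, with the gap between the 0.5 and 0.05 cutoffs as a crude depth measure. The function names and the data below are made up for illustration, and the linear interpolation between sorted values is one common convention among several.

```python
from collections import defaultdict


def quantile(values, q):
    """Empirical quantile using linear interpolation between sorted values."""
    if not values:
        raise ValueError("need at least one value")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] * (1 - frac) + ordered[upper] * frac


def seasonal_quantiles(results, levels=(0.05, 0.15, 0.25, 0.5)):
    """Map each season to its quantile cutoffs.

    results: iterable of (season, fis_points) pairs for one nation.
    """
    by_season = defaultdict(list)
    for season, fis_points in results:
        by_season[season].append(fis_points)
    return {
        season: {q: quantile(points, q) for q in levels}
        for season, points in sorted(by_season.items())
    }


# Hypothetical results for one nation over two seasons.
results = [(2009, p) for p in (12.0, 25.0, 40.0, 55.0, 90.0)]
results += [(2010, p) for p in (8.0, 20.0, 30.0, 35.0, 60.0)]

lines = seasonal_quantiles(results)
for season, cutoffs in lines.items():
    depth_gap = cutoffs[0.5] - cutoffs[0.05]  # smaller gap suggests more depth
    print(season, cutoffs, round(depth_gap, 1))
```

Lower cutoffs correspond to faster racing and a smaller gap to tighter bunching. A trend line would then be drawn through each level's cutoffs across seasons.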

Now, what can we make of this? Obviously, Norway has been very good for quite some time and has a lot of depth: their trend lines are all closely bunched near the bottom. The dramatic improvement in the German teams during the late 90’s and early 00’s is easy to spot. The Russian men appear to be steadily regaining their form, presumably following a backslide around the break-up of the Soviet Union. Interestingly, the Russian women, while certainly very strong, seem to be headed in the opposite direction. The Swedish men got a bit of a bump during the Per Elofsson era and have remained consistently strong since then. Finally, the Swedish women have been improving over the past 4-5 seasons, which can be seen in their graph as well.

(I chose these countries to join CAN/USA in the graphs not because I thought they made for the best comparisons, but because their data had some interesting features and they serve as a convenient reference point. Obviously I’ve omitted nations whose teams have sizes and budgets more comparable to the USA and CAN. I simply decided that international comparisons were not going to be the focus of this article, so I decided not to worry about it.)

How about Canada? Their men were plugging along at about the same level as the US men until the Salt Lake Olympics, at which point they began a dramatic turn around which has continued through this past season. This trend began long before Ivan Babikov switched nationalities. It is interesting to note that all of the quantile trend lines for the Canadian men have sloped sharply downward. This tells us that they have been improving top to bottom (or at least top to middle, the median). It will be interesting to see what happens to this trend over the next 4-5 seasons. If the lines basically flatten out and hold steady where they are now, the Canadian men may end up being reasonably compared to countries like Germany and Russia. Exciting stuff!

The Canadian women tell a somewhat different story, which can probably be summed up in two words: Beckie Scott. The precipitous dip in their trend lines during the late 90’s and early 00’s roughly corresponds to Scott’s career. Of course, this isn’t entirely fair, but it makes for a pithy opening to a paragraph. The Canadian women had a fairly deep team during this time period (Sara Renner, Milaine Theriault, the Fortiers, etc.) and the graph reflects this, as the lines remained fairly bunched together and improved at all levels. However, with the retirement of Beckie Scott (and several others) the team’s performance seems to be sliding and perhaps losing some depth.

Finally, what can we say about the US performance over time? First, the men: we see some significant improvement from the mid-90’s until around the Salt Lake Olympics, at all levels. This is largely the era of Justin Wadsworth, Marcus Nash, John Bauer, Carl Swenson, and Pat Weaver. Since then, things have essentially plateaued. Additionally, we can see that the trend lines are considerably more spread out than in the past and also compared to the Canadian men. This would indicate a relative lack of depth.

This time, my pithy opening is a little more appropriate: Kris Freeman. Quite simply, he’s been the story of US male distance results for nearly a decade now. Indeed, the 5% trend line, representing the top 5% distance results by US men is basically owned by Kris Freeman for the past decade. To what degree? Well, here’s a list of the number of sub-40 FIS point races in WC, OWG and WSC (a relatively low bar) by American men over the past decade:

FREEMAN Kris         53
SWENSON Carl         20
JOHNSON Andrew        7
WADSWORTH Justin      6
SOUTHAM James         3
BAUER John            2
FLORA Lars            2
KUZZY Garrott         2
CHAMBERLAIN David     1
KOOS Torin            1
LIEBSCH Matthew       1
NASH Marcus           1
NEWELL Andrew         1
WEAVER Patrick        1

More than half belong to one skier, Freeman, and another 30% or so belong to skiers who have since retired. The fact that the lines have remained fairly flat for the last 10 years may surprise some people (at least, it surprised me). The flatness in the 5% line might be underplaying some real improvement by Freeman, since as we’ve discussed earlier, his blood sugar issues can make him somewhat erratic. Hence a “lack of improvement” in this respect probably hides some real gains that Freeman has made over this time period and instead represents some sort of middle ground between him getting generally faster but having a harder time managing his blood sugar. As for the other trend lines, I don’t feel comfortable speculating as to why they may have remained flat over the last decade; my theories about Freeman seem, well, speculative enough as it is.
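The “more than half” figure can be checked directly from the list above. This is only a quick arithmetic sketch, with the counts copied from the list and the variable names chosen for illustration.

```python
# Sub-40 FIS point races by US men, copied from the list above.
sub40_counts = {
    "FREEMAN Kris": 53,
    "SWENSON Carl": 20,
    "JOHNSON Andrew": 7,
    "WADSWORTH Justin": 6,
    "SOUTHAM James": 3,
    "BAUER John": 2,
    "FLORA Lars": 2,
    "KUZZY Garrott": 2,
    "CHAMBERLAIN David": 1,
    "KOOS Torin": 1,
    "LIEBSCH Matthew": 1,
    "NASH Marcus": 1,
    "NEWELL Andrew": 1,
    "WEAVER Patrick": 1,
}

total = sum(sub40_counts.values())
freeman_share = sub40_counts["FREEMAN Kris"] / total

# Share of all sub-40 results that belong to a single skier.
print(total, f"{freeman_share:.1%}")
```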

The chart depicting the quantile trend lines for US women might also surprise some people (again, it surprised me a bit). While the top performances have never really broken much below 50 FIS points, the general trend, at all levels, has been improving over the last several years. Naysayers will point out that this improvement is simply recapturing ground lost in the years immediately following the Salt Lake Olympics. Compared to the US men, the US women don’t appear to have the same lack of depth: the trend lines for the US women don’t seem nearly as spread out compared to the women from other nations. However, this is slightly misleading due to the different scales in the men’s and women’s graphs. Note that the absolute gaps from the 5% line to the 50% line are actually roughly comparable between the US women and US men, encompassing a range of around 50-60 FIS points. These graphs probably aren’t the best for detecting anything more than coarse differences in the spacings of the trend lines, so I won’t pursue this further, other than to warn you not to parse this feature of the graphs too aggressively.

If the single biggest story in the US men’s graph was Kris Freeman, then I’d say that the single biggest story on the women’s side is the small number of starts on the international level. (You thought I was going to say Kikkan Randall, didn’t you!) Here’s a list of the number of sub-100 FIS point races in WC, OWG and WSC (a very low bar) by US women over the past decade:

WAGNER Wendy Kay           22
KEMPPEL Nina               20
RANDALL Kikkan             19
STEPHEN Elizabeth           9
ARRITOLA Morgan             7
KONRAD Sarah                5
DUSSAULT Rebecca            4
BROOKS Holly                2
COMPTON Caitlin             2
BENOIT Tessa                1
JONES Barbara               1
SMITH Aubrey                1
SMYTH Morgan                1
TRYGSTAD-SAARI Kristina     1

Certainly, Kikkan Randall’s improvements in distance events in recent years (I’ll show you a graph in a bit) are an important component in the direction of the trend lines. However, I’d argue that another important story here is that American women just aren’t starting as many international races as the men. I found 355 results since 2000-2001 for US men and only 203 results for US women over the same time period. The smaller number of races makes the trend lines correspondingly more sensitive to the results of a single athlete.

The reason that I highlight the lack of starts on the women’s side is that if you try to look at trends among individual US women (and I have) you find that most simply don’t have enough distance results at the WC, OWG or WSC level to draw any conclusions, at least from my perspective as a statistician. Coaches and athletes may be more adventurous in reading stuff into small amounts of data. That doesn’t mean that there’s too little information there to tell us anything, just that there’s too little information for me, personally, to feel comfortable displaying and commenting on it.

Finally, here’s a graph showing the distance results for Kris Freeman and Kikkan Randall with trend lines (representing the average, not a quantile):

As I warned above, I’m going to avoid engaging in any commentary as to What This All Means. Any impression I may have given that I was doing otherwise was entirely unintentional. What I can do, now that I’ve given you a bunch of data to chew on, is beat you to the punch and list all the reasons I can think of not to trust any of it!

A bit hyperbolic, to be sure, but I want to point out some major deficiencies that prevent this article from being the Comprehensive Story on US Distance Performance:

– By looking only at WC, OWG and WSC races, we are entirely missing any trends (up or down) among US racers not competing at this level. The US has relatively few racers competing on the WC level, so this encompasses a lot of skiers. Indeed, one could argue that these data, at least over the last 8 years, mostly reflect the performance of just two skiers: Kris Freeman and Kikkan Randall.

– It is important to note that there are rules governing the number of starts allocated to each nation, that the US certainly gets fewer than most other nations, and that these rules change over time. If we were interested in explicit comparisons of performance between nations, this should be taken into account.

– The quantile regression method I chose is quite robust, meaning that a small number of exceptionally good or poor races are unlikely to significantly impact the trend lines. Depending on one’s point of view, this is either good or bad.

– The trend lines themselves obscure small changes from one year to the next. Since I wanted to look at a large time frame, I specifically fit these trend lines in a manner that made them fairly resistant to changes. This means that it takes a fair bit of work by the data, over more than one season, to pull one of the lines up or down. Again, depending on your point of view, this is either good or bad.

– Many arguments can be mounted that FIS points, as a measure of skiing performance, suck. (In the future, I might put forward a few of them!) All sorts of alternative metrics can be concocted: World Cup points, FIS point list ranking and so on.

– I haven’t shown you data from nations with small, low budget ski teams, other than the US and Canada. I wasn’t really interested in explicit comparisons between nations, but I will try to revisit this in the future.

– Finally, the data themselves may (and probably do) contain some errors. This may include the FIS points I calculated myself, data entry errors by myself or errors by FIS that I didn’t catch or wasn’t able to fix. Things that were obviously errors (1.5 hour 5k, anyone?) were simply omitted. As a simple example of an unresolved issue, it is unclear to me whether FIS actively returns to old results and recalculates FIS points based upon retroactive disqualifications. Obviously, this only impacts the points for major races if the winner is disqualified, but it has happened.
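To illustrate the robustness point from the list above, here is a small sketch comparing how a single disastrous race moves the mean versus the median (the 0.5 quantile). The FIS point values are invented for the example, and the median stands in for the smoothed quantile lines, which would react even less.

```python
from statistics import mean, median

# Hypothetical FIS points for one nation in one season.
season = [35.0, 42.0, 48.0, 55.0, 61.0, 70.0, 88.0]

# The same season plus one exceptionally poor race.
with_outlier = season + [400.0]

mean_shift = mean(with_outlier) - mean(season)
median_shift = median(with_outlier) - median(season)

# The mean is dragged much further than the median by the single bad race.
print(f"mean shift:   {mean_shift:.1f}")
print(f"median shift: {median_shift:.1f}")
```

Whether that insensitivity is a feature or a flaw depends on whether you think the one-off race is noise or signal.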

That’s a lot of holes, and I’m sure the commenters will think of many more. Still, all data sets and analyses are in some way incomplete. At the very least I hope that this serves as a useful starting point and helps to provide some common ground for discussions regarding US international performances. And of course, I hope it was entertaining and interesting.

Check back in a few days for the next installment in which we’ll examine sprint results…

20 comments

sailguy
April 12, 2010 at 9:49 am

Fascinating analysis, keep these coming.

Just to add a confounding factor to the analysis, there is a challenge with number of racers. Most races that count for this analysis have entry limits (4 per country, fairly often), but the host country gets to add more athletes, sometimes as many as twelve. Part of the Canadian men’s recent improvement may come from getting to start a large number of athletes at WC races in Canada. Norway and Sweden host a lot of major races, which helps them statistically as well as practically. No idea what you can do about this; the fundamental problem with statistics is the crap data available 🙂

For data, I remember that Cross Country Canada talked a few years ago about collecting stories and results from the ‘early’ days. They may have unpublished data as a result. As the Canadians weren’t always on the first page of paper results, there may be a lot of useful data there.

Maybe a fun project for next season would be to see if FS readers can hunt down records from the pre-internet days? I recently discovered that the mother of one of my skiing friends has an amazing scrapbook that covers my entire junior career (and everyone else who raced against her son). It seems to me that the mothers of skiers who actually won races, even regionally, might have similar collections.
peetch
April 12, 2010 at 10:09 am

Nice work.

Remember to be cautious when comparing FIS points over time since the way we calculate them has changed multiple times. Both the F-Value that penalizes time back and the baseline penalty have been calculated in different ways over the years. For example, compare the 15km classics at the 2002 and 2006 Olympics, and note that similar percentages back are receiving considerably different FIS point results. This creates even greater problems when looking at non-OWG/WC races in which there are penalties, since differences in point and penalty calculations affect not only the single race, but also propagate through the results of other races. Thus we are unable to directly and empirically compare the FIS points of an athlete from present with one from ten years ago.

Does anyone know of ways of overcoming this challenge? This would seem to be a key issue for understanding athlete development, so we could empirically compare our athletes of today with the results of successful World Cup athletes when they were younger.
SkiingBear
April 12, 2010 at 11:32 am

Very interesting. One thing you pointed out a couple of times that I’d like to comment on is the team depth issue. I think it was on the Team Today website where Pete Vordenberg was talking about the fact that, other than Kikkan Randall having a couple of great races, the US team fell far short of their goals. He actually attributed that to the lack of depth. For example, if memory serves it was the 30k race where Freeman was looking so strong with the lead pack and then his blood sugar crashed so the top US finisher was in the 30s (if my memory serves again…).
Reese
April 12, 2010 at 12:23 pm

jeeze, this is pretty extensive… and awesome! finally, some fact-based analysis
benji_uffenbeck
April 12, 2010 at 2:50 pm

Nice work, numbers are good! As people have already commented, the variety of methods used to calculate FIS points over time no doubt skews the data. In particular, I’m curious as to the impact of mass start races on FIS points? It would seem that the frequency of bunch finishes in mass starts would result in a slew of low FIS point values – likely far exceeding what you’d normally see in a traditional interval start race. It’s possible that you could find a correlation between the number of mass start races per season and the number of countries with very low FIS point scores. In other words, I’d expect to see fewer people with FIS results under 10 back in the days where every race was interval start.

Another wild card to consider is doping. The mid to late 90’s were well known to be a high point of doping since EPO tests were almost nonexistent. I remember reading something about hemoglobin testing from the 1994 World Championships in Thunder Bay that showed crazy high levels for much of the field. For the skiers that weren’t cheating, you’d have to think their FIS points suffered accordingly. Some of those 50 point races back in the 90’s might well have been much better than that… I doubt we’ll ever get a fair comparison for skiers from that era.
Tim Kelley
April 12, 2010 at 4:43 pm

Benji nailed a key flaw in this data analysis – interval versus mass starts. The point spread for a distance race is usually much less with mass starts. Take a 50 km race that is a tour for 42 kilometers before it becomes a race (like in Torino where AJ was at 19 FIS points). If the 50 km was an interval start, time-trial, slug-fest for the whole 50 kms – a 19 FIS point mass start US score would likely end up at 50, 100 or more FIS points.

I’m not sure why you cut your data set off at the early nineties? I thought the FIS database had results from 1982 on. I personally would not have used FIS data because it is incomplete. 77 to 81 “unofficial” World Cup data is missing and there is no digitized FIS race data pre-77. Instead I would have used just Olympic and WC results and calculated FIS points in a uniform manner. The dataset would be smaller, but the data can actually be found. That way you could start the trend analysis from 1960 or 1970 and include the Koch / Caldwell / Galanes / Dunklee / Peterson / Simoneau days (all these skiers mentioned were consistent WC point scorers, and you only got WC points for the top 20 back then).
JoranElias
April 12, 2010 at 5:33 pm

More great comments!

@sailguy – You’re right that the different number of starts assigned to each nation makes explicit comparisons between nations challenging. It should mess with the trends within nation only to the extent that the number of starts change over time, averaged over each season.

I could be wrong, but don’t both the US and CAN get extra starts for WC’s held in North America?

Several people mentioned the changes in point calculations over the years. First, changes in penalty calculation methods _shouldn’t_ have any effect since by restricting to WC/OWG/WSC races, I’m essentially limiting myself to races that by definition have a penalty of zero.

Changes in F-factor values is certainly a concern. I might have gone back and re-calculated them all myself using current rules, but the FIS website does a terrible job of labeling which pursuits had a break and which were continuous. I could probably fudge it, but I’d never be completely sure I’d gotten it exactly right. The really old results were doable just because there basically weren’t any pursuits back then.

I looked at the 2002 and 2006 OWG 15k men’s classic races, and I’m not seeing a huge difference; but then I’d expect changes in mass start and pursuit races, not interval start races. I could be wrong, though…

There’s certainly a difference in point profiles between mass start and interval start races, particularly lately. Methods of calculating points that fix this problem are a future article topic I’m considering.

In my defense, I don’t think that (a) different F-factor values over time or (b) differences between mass/interval start, are having enough of an impact to radically change the “big picture” I’ve presented here. Why? (a) I’ve spent a lot of time with this data, and I just haven’t seen any evidence of noticeably aberrant fis points in particular years or groups of years, or weird trends over time that can’t otherwise be explained. Most of the data had points calculated with identical rules, and the section that wasn’t I just haven’t noticed anything weird in it. (b) I’d be worried about this only to the extent that I think that particular athletes/nations are deliberately targeting mass start races. That may be happening, but I doubt it’s happening enough to significantly skew things.

My defense on (a) is pretty hand-wavy; I’m just asking you to trust me. I’m somewhat more comfortable not worrying about (b).

@TimKelley – I stopped at the early 90’s because that’s when FIS stops providing full results with times, even for OWG results. If you poke around on their website, you’ll see that earlier results either have no times, or they are clearly truncated, sometimes down to the top 10 or even just the top 3. I do suspect that FIS’s records on just WC races are not 100% complete. (Wasn’t there a year where the Marcialonga was good for WC points? If so, I don’t think it’s indicated on their website.) However, I’m fairly confident that I have enough races that are right to get a good general picture of things.

Now, I’m no defender of FIS points. They certainly aren’t the best. But I think they work pretty well as a crude measure of performance, which is specifically why I took such a bird’s eye view for this analysis. I’d never try to drill down further into the data without using some other measure. Keep in mind that I’m aggregating over all athletes from each nation over an entire season at a time here. The (statistical) robustness of quantiles helps some too and then on top of that I’m smoothing out the quantile data. That’s a lot of barriers for data problems to push their way through.

I guess all I’m really asking people to buy about FIS points is that even over long periods of time, having lots of results under 15 FIS points is better than having your best results between 30-40 FIS points. That’s a pretty crude distinction that will obscure a lot of the (real) issues people are raising.

Doping. Not much I can do about that. :shrug: Caveat Emptor.

Finally, at least you’ll all have to think of something else to pick on me for in the sprint half of the article, since I tossed FIS points out entirely for that one (for obvious reasons). 😉 I’m sure you’ll think of something, though…
JoranElias
April 12, 2010 at 5:44 pm

I just wanted to clarify that my comments above are not meant to deny the validity of the FIS point criticisms. I only wanted to draw a distinction between “Flaw that if fixed could potentially improve the analysis” and “Flaw that until fixed renders analysis utterly worthless”.

So my comments were meant simply to argue that these problems tend more towards the former than the latter. But of course, that’s just my opinion.
Tim Kelley
April 13, 2010 at 11:43 am

Joran: Thinking about this some more – why not simplify the complexity of your analysis by choosing the one constant of Olympic and World Championship xc ski racing over time – the relays. Relays have been 4 of a country’s top skiers going the same distance (10 km men, 5 km women) forever. Yes – the technique has changed for two of the legs over time … but the point here is to determine whether US distance skiers are getting faster in general, not in a specific technique.

This data should be easier to find (Olympics at least – IOC site) than complete FIS race data for World Cups. And you can rid your worries about FIS point data because you don’t need to use it. Chart the percent back of the winning time (or the average time of the medalists) and see what the trend shows over time. If over time the percent back of the US relay teams to the medalists is decreasing, then that will show if US distance skiers, as a country, are getting faster. If the percent out is increasing, then that will tell another story.

I agree with you – sprint analysis will be easier to do using your technique. But your analysis technique on distance results seems to have too large of a missing data window (60s through 80s) and too many unresolved variables – IMO.
JoranElias
April 13, 2010 at 2:07 pm

@TimKelley –

Why not do as you suggest? Well, because if I did, someone else would show up and (correctly) complain that I’m evaluating the entire performances of many athletes based upon a single race by only 4 athletes once every 2-4 years. I’d be willing to bet that most athletes and coaches wouldn’t want to be evaluated on their performance in a single race held once every few years and would (rightly) complain that that isn’t fair. Personally, I agree with that critique and so that’s not the direction I went in.

Can’t please everyone! 😉

The lesson here is that we shouldn’t get too fixated on what the “right” method of analysis is. There are always many different ways to approach a problem and they can all yield useful information. Doing what you outline would surely be interesting (I just haven’t grabbed relay data at all) and I’d love to hear how it turns out if you try it.

On the more specific point of contention here, the changing of FIS point calculations over time, since I do think that’s at least potentially a problem, I went back and redid the graphs using FIS points calculated in a uniform manner (current F-values). This is also “wrong” in that I can’t be totally sure I’ve correctly classified every pursuit as a with break or without break variety, but at least this way we can be certain that we’ve eliminated dramatic changes in scoring from season to season.

What happened? As I expected, the graphs look basically identical. The only change I can detect is that the 5% line for the US men looks (very) _slightly_ angled down, rather than flat, over the last few seasons. But that’s it.

Personally, I consider that pretty strong evidence that the changing F-factor values wasn’t a huge concern for my analysis.

Finally, I think your complaint about missing data from the 60s-80s is simply mistaken. I think we both probably suspect that if we had that data we’d see (for the men anyway) a general decline in performance since the early 80’s. But missing part of time series data doesn’t make the part you have _wrong_, it simply changes the context and the questions you can answer. If what you want to know is the trend in US performance since the Bill Koch era, then my data aren’t helpful. But that doesn’t make it wrong; it just doesn’t answer the question you were hoping it would. So on this point I think we’re just going to have to agree to disagree. :shrug:
Tim Kelley
April 13, 2010 at 9:40 pm

Joran: Your initial premise for doing this analysis was “are US skiers getting faster?” So yes, you are not wrong in choosing the timeline that you did. But you are definitely not answering the question you initially posed in a serious manner. You have chosen an arbitrary data window that is based on data availability, not reality over the long term. To give this analysis credibility for distance racing you have to go back further than the 90s or the dawn of the Internet (and easy data access).

You say that skiers don’t want to be judged by one race every 2-4 years? Excuse me, that’s what the ultimate goal of xc ski racing is – to win Olympic or World Cup medals, or at least to do as well as one can in these major championships. Haven’t you heard of people training for an Olympic cycle? The goal is to be the best that you can be when the Olympics come around.

So I disagree. The Olympics are the best metric to use for this analysis because that is when ski racers are expected and supposed to be skiing their fastest. No top skiers should be dogging an Olympic race, like they might in a World Cup race, to save themselves for a more important race.

If you want to publish something meaningful on fasterskier regarding US distance skier performance over time, you need to use uniform data over an extended time period. Olympic (and World Cup) relay data is likely your answer given that your choice of data sources seems to be garbage for this exercise. If you don’t want to go to the effort of producing some data that is meaningful, like using relay data … perhaps this would be a fine project for that math, computer and data mining whiz named Benji?
FasterSkier
April 13, 2010 at 10:04 pm

If I understand the issues correctly, one of the biggest drawbacks of using only Olympic results is that it is a tiny sample size and therefore would make it difficult to draw meaningful statistical conclusions.

Additionally, the period is no more arbitrary than starting in ’77 – there was ski racing before ’77. As Joran points out, his work is looking at whether US skiing has improved since 1990. There is nothing wrong with asking that question and he shouldn’t be criticized for not asking the question YOU want answered. Finding the data you want, and entering it into a computer, would be a monumental task.

Additionally your idea that relays are the best measurement is flawed. Look at the Olympic relay this year. It is easy to argue that Norway does not have the 2nd best distance team in the world right now. They have Petter Northug, and then a whole bunch of 2nd tier (out of the top-10) World Cup distance skiers.

And what would be the point of a study based on relay data? We know the answer – 2002 would be the high point – since 1980, the US has finished inside the top-6 in the Olympic relay just that once. A great result, but it doesn’t really tell us very much. And as we have seen at other times, the US can’t field four strong relay skiers, but may have a single skier capable of winning a race (e.g. Kris Freeman). If winning medals is the goal, a Freeman medal would seemingly be more valuable than a combined 5th in the relay…
Martin Hall
April 14, 2010 at 7:58 am

Here are a few thoughts for you to chew on when it comes to the mid-70s to the early 80s. I think the Canadian men are now approaching what the US men did during that period of time; they will just have to do what they are doing now for another 4-6 years. Also, the US women during that era would be close behind or equal to the US women right now, while both of those teams would be way behind the Canadian women’s team of 2002 to 2010. Back to the US men. They were loaded with top talent and they had depth: Bill Koch, Tim Caldwell, Doug Peterson, Ron Yeager, Chris Haines, Jim Galanes, Dan Simoneau, Stan Dunklee, Larry Martin and Craig Ward. Of course there was the 76 Olympics with Kochie’s medal and the 6th place relay team and the other high placings they had. The 80 Olympics in Placid were a flop for these guys. But then they really hit their stride for the first 3 years of the 80s: they were winning World Cup relays and placing 3-5 guys in the top 20 in 50% of the World Cups. These guys were a force. Kochie won the overall World Cup one year and was 3rd another time. The year he was third he probably could have won it if he had listened to his coach, Mike Gallagher. We had a World Cup in Anchorage and there was a WC relay and 15 km, and then on the Monday the US National 30 km was held, and on Tuesday we flew from Anchorage to Labrador City (the flight from hell). Then on Thursday Kochie time trialed the 30 km course to see what the 30 was going to be like on the weekend. Well, needless to say, he was flat for the WC 30 and ended up 3rd in the overall WC.
I know my evaluation is subjective, and I sure would like to see the numbers, but I’ve seen all of what is going on and I’d bet the numbers would prove me right.
So, there is some work out there for the current skiers to do yet to beat these guys.
benji_uffenbeck
April 14, 2010 at 2:17 pm

The original question asked in this article was, “Are international results of US skiers improving?”. Although the results shown are certainly interesting, I don’t think they really answer the question.

The biggest problem I see with the analysis is the relationship between FIS points and actual rankings. The most important thing in top level ski racing is what place you get, not how far behind the winner you were. Devon Kershaw had an incredible FIS point result of 0.3 in the Olympic 50km this year, unfortunately for him he was 5th. You’d think 0.3 would equate to a medal, but it didn’t.

I checked the most recent FIS points list for men, and there are currently 38 men with distance points < 15. Next I checked an old FIS points list from 1995; I found 9 men with distance points < 15. I have not done any checking beyond this, but I'd guess that the number of racers with low point values has grown over time, especially with the advent of mass starts, and also just a general increase in the availability of decent points races. If the number of skiers from all countries with low FIS points results is increasing with time, then what is the significance of more US results under a certain threshold? At the least it's watered down a bit.

To put it differently, in 1995 a skier could improve from 30 FIS points to 20 FIS points, and their world ranking would improve from 35th to 16th. In 2010 a similar improvement would move a skier from 92nd to 59th!
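The comparison above boils down to two small lookups on a points list: how many skiers sit under a threshold, and what rank a given points value corresponds to. A minimal sketch follows. The helper names are made up for illustration, and the sample list is hypothetical, not an actual FIS points list.

```python
from bisect import bisect_left


def count_below(points_list, threshold):
    """Number of skiers with points strictly below the threshold."""
    return sum(1 for p in points_list if p < threshold)


def world_rank(points_list, points):
    """Rank a skier with the given points would hold on this list (1 = best)."""
    ordered = sorted(points_list)
    return bisect_left(ordered, points) + 1


# Hypothetical points list, for illustration only.
sample_list = [0.0, 3.2, 7.5, 11.0, 14.9, 18.3, 22.0, 27.5, 31.0]
print(count_below(sample_list, 15))
print(world_rank(sample_list, 20.0), world_rank(sample_list, 30.0))
```

Running the same two functions on a 1995 list and a 2010 list would show how much rank a fixed points improvement buys in each era.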

So, if FIS points of US skiers trend downward recently you'd have to say we're improving, but the rankings associated with that improvement are not nearly as impressive in 2010 as they were in 1995.

Bottom line, there are more people than ever before within a small time gap of the best skier in the world. The 100th ranked male distance skier in the world currently has 31 FIS points! The trends may show that US skiers are getting closer to the winner in terms of percentage back, and you could certainly argue this is a good thing. However, if you want to track real improvement, it all boils down to what place you get. Bill Koch was #1 in the world for an entire season, and no American since has come close in distance racing.
triguy
April 15, 2010 at 12:12 pm

@TimKelley. It sounds like you are the one that doesn’t want to put the effort into doing anything meaningful. The arm-chair quarterbacking is not productive. If the Olympic relay is the be-all and end-all of data analysis for US distance skiing, I suggest you look up the results, summarize and analyse those results and present them to the FS readers for us to debate and critique. If we are using the results you suggest we would come to the obvious conclusion that the 2002 men’s team was in fact the peak of US men’s distance skiing, better than the 1980 men’s team with the so-called dominant skiers like Koch, Caldwell and Galanes (they were 8th vs 5th in ’02). They were also much closer to the leaders in ’02, only 30 sec from a medal.

In general I think the data as presented provides a very good picture of the distance results over the last 15 years. As has been mentioned already, doping was a significant issue until the last 5-10 years, and including data much beyond that time just complicates things and also adds many other significant barriers to drawing any conclusions or interpretation. I think the graphs confirm what many people have thought over the past 10 years about the performance of both the Can/US teams in distance racing. I also don’t really think we need data to show Bill Koch was the best distance skier and that the depth was better at that time for the US men. If we want the ‘entire’ picture we can’t start in the 70’s just to capture Koch; we would need to start in the 20’s or even earlier to capture the real beginning of the sport. Anyone up for that task????

I would argue that looking at FIS points is actually better for this exercise than looking at placing and results. Obviously as pointed out already a 10 point race 20 years ago might have been a medal or at least a high placing when today it might mean you were dropped from the pack and ended up 30th. However, in order to win the race you need to be in the pack at the end, so looking at the low FIS points might be more relevant than looking at a 25th or 30th placing in a bunch sprint. Sure Devon was 5th at the Olympics but he was only a few feet away from a medal so his 0.3 FIS points reflects that accurately. He could have been 5th and 500 yards from a medal, but 5th in and of itself doesn’t tell you that.

When you only look at results you can lose the picture of what happened in the race. Look at the weak fields in some WC races (Whistler ’09, Russia ’10, etc). Skiers are finishing 13th, 14th, etc in a WC that have never even cracked the top 30 or 40 in a full field. Should we look at that 13th or 22nd and say they had the most amazing race of their lives or look at the FIS points and see that the field was weak and they were well off the pace. Look at Southam last year – 22nd at Whistler (69.69 FIS), 33rd at World Champs (32.04 FIS). Placing would say he was better at Whistler but points, time behind says he was much better at World Champs.

@benji. The number of skiers below 15 FIS points will have almost nothing to do with access to lower points races. Under the current rules only WC, World Champs, Olympics can have a start value of 0, all other races have a minimum penalty of 15 (25 and 35 for U23 and Junior WC). The only area that would be affected is the number of Nations group spots for countries like Norway, Sweden, Russia that have huge depth and were previously limited by quota spots (with skiers that can actually score under 15). If one conclusion to what you are suggesting is that we have more depth now and more skiers at a high level, we would have the same challenges when looking at results placing and actually might have more confounding variables. The answer in your case would be to only look at medals, since winning is what matters. The problem with that method would be that medals for CAN/USA are so scarce over the entire history of nordic skiing that the data would be a few blips on a graph spread out over 85 years.

One thing that would be interesting in terms of depth of field would be to look at the number of different skiers on the world cup from year to year and the number of skiers that medal on the world cup every year. In theory if the depth is increasing we should see a higher number of skiers with a podium finish each year (of course correcting for the total number of world cups held each year). It would also be interesting to see if the number of nations represented on the world cup is increasing and also if the number of nations in the top-30 overall standings and on the podium is increasing.

Final point. I think that people like Joran should be applauded for looking at the data and taking the time (huge amount of time) to put the data together and summarize/post this info for people to review. Of course a healthy debate about the pro/cons of the method and what conclusions we can draw is good, but just saying the entire exercise is wrong because it’s not what you want is relatively pointless and will just serve to keep these people from doing the work in the future.
JoranElias
April 15, 2010 at 1:54 pm

@triguy

Here’s a quick answer to one of your questions. The following tables give us some indication of how much turnover there is on WC (and OWG, WSC) podiums. Specifically, the third column is the ratio of the number of distinct racers achieving a podium finish to the total number of podiums available (3 * the number of races that season).

Enjoy!

season     gender  ratio
---------  ------  ------
1991-1992  Men     0.4125
1992-1993  Men     0.33
1993-1994  Men     0.44
1994-1995  Men     0.3575
1995-1996  Men     0.33
1996-1997  Men     0.44
1997-1998  Men     0.495
1998-1999  Men     0.495
1999-2000  Men     0.5775
2000-2001  Men     0.5775
2001-2002  Men     0.55
2002-2003  Men     0.605
2003-2004  Men     0.6875
2004-2005  Men     0.6325
2005-2006  Men     0.77
2006-2007  Men     0.715
2007-2008  Men     0.6325
2008-2009  Men     0.77
2009-2010  Men     0.6325

season     gender  ratio
---------  ------  ------
1991-1992  Women   0.2475
1992-1993  Women   0.2475
1993-1994  Women   0.33
1994-1995  Women   0.33
1995-1996  Women   0.22
1996-1997  Women   0.33
1997-1998  Women   0.3025
1998-1999  Women   0.385
1999-2000  Women   0.3025
2000-2001  Women   0.33
2001-2002  Women   0.44
2002-2003  Women   0.44
2003-2004  Women   0.385
2004-2005  Women   0.3575
2005-2006  Women   0.44
2006-2007  Women   0.44
2007-2008  Women   0.4675
2008-2009  Women   0.4675
2009-2010  Women   0.5225
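For anyone who wants to reproduce a ratio like this from their own results, here is a minimal Python sketch of the definition given above (distinct podium finishers divided by 3 times the number of races). It is not the query used to build the table, and the input layout and names are assumptions for illustration.

```python
def podium_turnover(podiums):
    """Ratio of distinct podium finishers to podium places available.

    `podiums` is a list with one (first, second, third) tuple of
    racer names per race in the season.
    """
    distinct = {name for race in podiums for name in race}
    return len(distinct) / (3 * len(podiums))


# Hypothetical season of three races; racer names are placeholders.
season = [
    ("A", "B", "C"),
    ("A", "D", "B"),
    ("E", "A", "C"),
]
print(round(podium_turnover(season), 4))
```

A ratio near 1 means almost every podium place went to a different racer, while a low ratio means a few racers took most of them.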
Brian Olsen
April 15, 2010 at 7:13 pm

Joran,

Great work! Seems like someone was trying to avoid finishing their taxes today 🙂

Can I ask, did you use R, SPSS, or…?

Brian
benji_uffenbeck
April 15, 2010 at 7:17 pm

For any particular season, it might be possible to come up with an average FIS score for placements 1 – 30 in World Cups, Olympics, and World Championships races combined. 1st place is always worth 0.0 points, but after that each place would have an associated average FIS score for that season. With this list, you could then rank the seasons by how “competitive” they were. This would produce some sort of trend of competitiveness over time.

If you then applied Joran’s FIS points analysis trends on top of the seasonal rankings, you would get a better idea of how much success could be expected from a given FIS score.

For example, in 1995 an FIS score < 10 might regularly result in a top 5 result, while in 2010 it might only average a top 10 result (I'm just guessing with these numbers). Based on the current analysis, we would say that a skier netting 10 points in 1995 and 2010 was equal, but the actual results show otherwise.
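The per-place averaging proposed above could be sketched roughly as follows. The input format, a list of (season, place, FIS points) rows from World Cup, Olympic and World Championship races, is an assumption, and the sample rows are invented purely to show the mechanics.

```python
from collections import defaultdict


def mean_points_by_place(results, max_place=30):
    """Average FIS points for each (season, place), places 1..max_place.

    `results` is an iterable of (season, place, fis_points) rows.
    """
    sums = defaultdict(lambda: [0.0, 0])  # (season, place) -> [total, count]
    for season, place, pts in results:
        if 1 <= place <= max_place:
            acc = sums[(season, place)]
            acc[0] += pts
            acc[1] += 1
    return {key: total / n for key, (total, n) in sums.items()}


# Hypothetical rows, for illustration only.
rows = [
    ("1994-1995", 5, 9.0),
    ("1994-1995", 5, 11.0),
    ("2009-2010", 5, 3.5),
    ("2009-2010", 5, 4.5),
]
averages = mean_points_by_place(rows)
for key in sorted(averages):
    print(key, averages[key])
```

Comparing the average for the same place across seasons gives a rough picture of how tightly packed the field was each year.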

I would think there are cases where a nation's FIS scores are decreasing, yet the number of top 10's etc. does not necessarily increase at the same rate. Until you can make this correlation, you can't really say whether results for a particular country are improving over time. With FIS points analysis alone, you get the trend of percent back over time, but you don't have the context of actual results. Combining the two graphs might give you a better idea of this picture.

Joran – Please know that I'm not trying to criticize your work, just offering up other ideas and opinions. I enjoyed your article and found it really interesting.
JoranElias
April 16, 2010 at 10:17 am

@Brian Olsen – What are you talking about? Doesn’t _everyone_ do their taxes in February, like me? 😉 To answer your question, I’m using a combination of SQLite, Python and R.

@Benji – Don’t sweat it! It’s flattering that people find the article interesting enough to comment on it (more than once!).
teamepokeedsbyn
April 16, 2010 at 5:39 pm

It seems to me, since we are talking about a team and how it stacks up over time against the best skiers in the world, one might combine FIS points for maybe the three best athletes from each country for traditional x-c distances. This might give one a better picture on how a nation is coming along (or not). For an even better picture of the future, do the same (3 skiers as a single data point) for Junior Worlds (bag the silly “U23”, as the best 22-23 year olds are racing WC/Olys).

This, I think, will give a better picture of how our “team” stands against the world’s best, versus relying on a single skier (Freeman or Randall) for results.
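One way the three-best-per-nation idea might be computed for a single race is sketched below. The function name, the row layout and the sample numbers are all hypothetical. Summing the points is just one choice, and a mean would work equally well.

```python
import heapq
from collections import defaultdict


def team_score(results, n_best=3):
    """Sum of the n_best lowest FIS points per nation for one race.

    `results` is an iterable of (nation, athlete, fis_points) rows.
    Nations with fewer than n_best finishers are left out.
    """
    by_nation = defaultdict(list)
    for nation, _athlete, pts in results:
        by_nation[nation].append(pts)
    return {
        nation: sum(heapq.nsmallest(n_best, pts))
        for nation, pts in by_nation.items()
        if len(pts) >= n_best
    }


# Hypothetical race results, for illustration only.
race = [
    ("USA", "skier1", 12.0), ("USA", "skier2", 45.0),
    ("USA", "skier3", 38.0), ("USA", "skier4", 60.0),
    ("NOR", "skier5", 0.0), ("NOR", "skier6", 8.0),
]
print(team_score(race))
```

Each race then contributes one data point per nation, which can be tracked over seasons the same way as individual results.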

I think Tim Kelley is an excellent armchair quarterback, highly productive, but kinda ugly.