Now that I have some athlete similarity code up and running, let’s take it for a spin, shall we?
The basic idea is to pick a skier (Beckie Scott in this case) and then mine my results database for skiers who’ve had similar careers. This is a fairly complicated task with a lot of steps; you can refer to my previous post for more details on the methodology. The important things to remember at the moment, though, are:
– My measure of similarity looks at every result in overlapping age ranges.
– This is an inherently noisy process; we can expect some bad matches.
– This is not a 100% automated process; we should expect to have to make some judgements along the way about when two skiers can reasonably be thought of as “similar”.
– Distance and sprint racing will be treated separately.
– Athletes that are “most similar” to Beckie Scott might still be very different in an absolute sense.
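To make the first and last points a bit more concrete, here is a rough sketch of what a comparison over overlapping ages could look like. This is not my actual similarity code (see the previous post for that); it is a simplified stand-in. The function names, the whole-year age bins, the per-age median, and the assumption that each result is a single number where lower is better are all made up for illustration, as is the toy data.

```python
import math
from collections import defaultdict
from statistics import median


def per_age_medians(results):
    """Collapse (age, result) pairs into one median result per whole-year age."""
    by_age = defaultdict(list)
    for age, value in results:
        by_age[math.floor(age)].append(value)
    return {age: median(values) for age, values in by_age.items()}


def career_distance(a, b, min_shared_ages=3):
    """Mean absolute gap between per-age medians, using only ages both skiers raced at.

    Returns None when the two careers overlap at too few ages to compare.
    The threshold of 3 shared ages is an arbitrary illustrative choice.
    """
    medians_a = per_age_medians(a)
    medians_b = per_age_medians(b)
    shared = sorted(set(medians_a) & set(medians_b))
    if len(shared) < min_shared_ages:
        return None
    return sum(abs(medians_a[age] - medians_b[age]) for age in shared) / len(shared)


def rank_candidates(target, candidates):
    """Sort comparable candidates from most to least similar to the target."""
    scored = []
    for name, results in candidates.items():
        distance = career_distance(target, results)
        if distance is not None:
            scored.append((distance, name))
    return [name for distance, name in sorted(scored)]


# Toy data only: (age, result) pairs, lower result = better race.
target = [(22.1, 80.0), (23.4, 60.0), (24.2, 40.0), (25.7, 30.0)]
candidates = {
    "candidate_a": [(23.0, 62.0), (24.5, 44.0), (25.1, 30.0), (26.3, 28.0)],
    "candidate_b": [(23.2, 120.0), (24.8, 110.0), (25.5, 100.0)],
    "candidate_c": [(30.0, 35.0), (31.0, 33.0)],  # no shared ages, gets dropped
}
ranking = rank_candidates(target, candidates)
```

Note that the ranking is purely relative: whoever has the smallest distance comes out on top even if that distance is large, which is why the “most similar” skiers still need a manual sanity check.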
Now let’s get down to it. My algorithm grabbed around 60 skiers who were plausibly similar to Beckie Scott with respect to distance racing. After analyzing these 60, I identified the following 9 as the most similar (in no particular order):
Scott is shown in the upper left for reference. As I warned, some of these seem like more reasonable matches than others. Remember that what I’m matching here is the similarity of the overall point cloud, not particular success at particular races, etc.
Arianna Follis, Marianna Longa and maybe Elin Ek seem like decent matches, although Ek certainly wasn’t as successful as Scott. Karine Philippot also seems like a decent match, although again she wasn’t quite as successful, and she continued to race at later ages.
Cristina Paluselli looks like a good fit, up to around age 24-25. Scott continued to improve significantly, while Paluselli stalled a bit. Sara Renner is much the same: very similar up to around 24-25 years old, and then just didn’t quite get to the next level like Beckie Scott did.
Petra Majdic and Riitta-Liisa Roponen are being matched for roughly the opposite reason. Their careers look more similar to Scott’s between ages 26-33 than earlier.
How about sprinting?
Things are a little messier here (which I’ve generally found to be the case when looking at sprint race data). Let’s dispense with Manuela Henkel and Pirjo Muranen first. Neither looks like a good match.
Anna Dahlberg, Aino-Kaisa Saarinen and maybe Petra Majdic seem like reasonable matches. Dahlberg certainly wasn’t nearly as successful as Scott, but the general age vs. result pattern is roughly similar at least. Majdic and Saarinen had more success at an earlier age than Scott, but other than that, they’re pretty close.
Beckie Scott had a fair degree of success at sprinting, which means she’s got a bunch of dots down near the bottom of her graph. This will lead my algorithm to select skiers who’ve just generally been good at sprinting. This tells most of the story behind why Bjoergen and Skari are included here. The message here is that Beckie Scott was (for a few years) one of the world’s best sprinters, so the algorithm is going to compare her favorably to other dominant sprinters.
Virpi Kuitunen is an interesting case. Her graph doesn’t look all that similar to Scott’s at first. But partly this is because her graph is so bizarre-looking. That jump from age 24-25 happened in 2001, Kuitunen is Finnish, and, well…then she took some involuntary “time off” for two years for her chemical exploits.
Try something: imagine her development between ages 24 and 27 had proceeded roughly how you think it “should” have, by just filling in the graph with a nice continuous improvement in results. Now it would look more similar to Scott’s, at least between ages 21-32 or so. So maybe, in an alternate doping-free universe, Kuitunen might have been more similar to Scott. Who knows…