Data as the new doping

Never before have athletes produced so many figures.


Devices like the little blue measuring instrument on the bike above are generating a tsunami of data. The challenge is filtering the data.
Photo © Sam Rentmeester 

Skating coach Jac Orie was one of the first to discover this. Over the past fifteen years he has collected training data from forty skaters: heart rate, lap times, and subjective intensity scores. Together with Leiden University data scientist Arno Knobbe, Orie discovered the phenomenon of ‘supercompensation’: after a heavy training session, athletes are first tired, then incredibly fit, before finally falling back to their base level. Orie uses this knowledge in his training sessions to allow skaters to peak at exactly the right moment. This led to Kjeld Nuis winning a gold medal in last winter’s World Championship.

To measure is to win
Cycling team Giant-Alpecin is also monitoring its cyclists’ performances. TU Delft student Marieke de Vries (Applied Mathematics) helps the team to identify anomalous performances in the huge data sets collected by the team managers: the cyclists’ heart rate, pedalling force, speed, and track. If performance is lower than expected, the cyclist is probably coming down with something.
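To give an idea of what spotting a performance that is ‘lower than expected’ can look like, here is a minimal sketch in Python. It is not the team's actual method, which the article does not describe. It assumes one efficiency figure per training session (here a hypothetical watts-per-heartbeat value) and flags any session that falls far below the rider's own recent average. The function name, the window of five sessions, the threshold, and all numbers are illustrative assumptions.

```python
from statistics import mean, stdev


def flag_low_sessions(efficiency, window=5, threshold=2.0):
    """Return the indices of sessions that lie more than `threshold`
    standard deviations below the mean of the preceding `window` sessions.

    Illustrative rule only; the window and threshold are assumptions.
    """
    flagged = []
    for i in range(window, len(efficiency)):
        baseline = efficiency[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        # Skip flat baselines to avoid dividing by zero.
        if sigma > 0 and (efficiency[i] - mu) / sigma < -threshold:
            flagged.append(i)
    return flagged


# Hypothetical watts-per-heartbeat values for one rider, one per session.
sessions = [1.80, 1.82, 1.79, 1.81, 1.83, 1.80, 1.55, 1.81]
low = flag_low_sessions(sessions)
print(low)
```

Comparing each rider with his or her own recent history, rather than with the team average, keeps naturally stronger or weaker riders from being flagged all the time.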

Another TU Delft student, Jeroen Roseboom, worked with embedded scientist Koen Muilwijk from InnoSports Lab The Hague on the current, wind, boat speed and boat position data of boats sailing across the Bay of Rio. The coach, collecting data from a dinghy, noticed that strange values were appearing in the data at the moment of tacking. “There was a lot of white noise in the data,” says Professor Geurt Jongbloed. Jongbloed is Professor of Mathematical Statistics at the Faculty of Electrical Engineering, Mathematics and Computer Science. Roseboom developed a correction for the measurement data based on a mathematical model. “These are measurements of current, wind and speed in relation to the water, which means that these data are interdependent. This redundancy can be used to improve the aberrant measurements.”
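How redundancy can repair a measurement is easiest to see in a toy case. The sketch below is not Roseboom's model, which the article does not spell out. It assumes a single physical relation along one axis: speed over the ground equals speed through the water plus the current. Because the three readings should add up, any mismatch can be shared out among them in proportion to how noisy each sensor is assumed to be (a standard weighted least-squares adjustment). The function name, variances, and readings are all hypothetical.

```python
def reconcile(ground, water, current, var_ground, var_water, var_current):
    """Adjust three one-axis speed readings (m/s) so that
    ground == water + current, moving each reading in proportion
    to its assumed variance (weighted least squares)."""
    residual = ground - water - current        # how far the readings disagree
    total = var_ground + var_water + var_current
    return (
        ground - var_ground * residual / total,
        water + var_water * residual / total,
        current + var_current * residual / total,
    )


# Hypothetical readings during a tack: the water-speed log is assumed
# to be the noisiest sensor, so it gets the largest variance.
g, w, c = reconcile(3.0, 2.0, 0.4,
                    var_ground=0.01, var_water=0.25, var_current=0.04)
print(g, w, c)
```

In this example most of the correction lands on the water-speed reading, because it was declared the least trustworthy, while the more precise ground-speed reading barely moves.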

His colleague Professor Geert-Jan Houben (Professor of Web Information Systems at the Faculty of EEMCS) worked with Ortec Sports (‘creating value from official data’) on the input of football and hockey data. Some data, such as position on the field, speed, and heart rate, are recorded directly through sensors. Other data, such as possession of the ball and number of successful passes, are recorded manually. But how reliable are these data? What can be done about missing data? And how many people do you actually need for tracking? To answer these questions Houben uses the knowledge he acquired when studying the use of crowd-sourcing for the description of drawings at the Rijksmuseum. Houben: “We develop general theories that can be applied to concrete situations. Whether you are looking at a play situation or deciding which people to deploy for interpreting a drawing, the generic principles are the same.”

As an example, Houben mentions the analysis of passes, which can range from fast and short (‘tiki-taka’ style) to long and far (‘kick & rush’). For a coach, it is important to know where the ball that a striker kicks into the goal came from. If a pattern can be found, the coach will know where to place his defending players in order to intercept this kind of pass.

Does this lead to a competitive advantage? Temporarily, yes, according to Houben. If only one of the teams uses this kind of data analysis, this team will have a clear advantage. “But once everyone starts using data science, it will be the end of top sport,” philosophizes Houben. “Because it is precisely the uncertainty that makes top sport exciting. And this is what you lose when you know everything.”

Pack of millions
Cycling app Strava counts more than eight million users worldwide, more than one million of whom are active (i.e. with recent posts). This makes this sports app, used by cyclists and runners to share their performances and routes with other users, one of the best known recreational sports applications.
“In amateur sport, the data per athlete are less comprehensive and less reliable than in professional sport,” says Houben. Data regarding age, gender, heart rate, speed, track, metres climbed and pedalling power are all saved so that you can compare your performance to that of others. But how reliable are these data? Houben argues for data literacy so that people learn to use these applications better. “You shouldn’t blindly trust training advice without knowing what it is based on.” His colleague Jongbloed also sees advantages: “If you see that on a certain track your own heart rate increases much faster than that of other athletes, it may be worthwhile to have it checked out.”

Intersecting lines
Our computational power is increasing according to Moore’s Law, but the quantity of available data grows much faster still. At some point these lines will intersect and we will have more data than we can handle. How can we keep sports data manageable and relevant?

“Devices are generating a tsunami of data,” says Houben. “It’s not that easy to make sense of it all. Our greatest challenge is filtering the data. Data should have a clear meaning and structure. Once you have achieved this, it is much easier to process the data.”

Jongbloed sees the presentation of the results as an important challenge. How do you translate statistical relationships into advice for a coach or trainer? Visualisation can be helpful in this context. Clear visuals can make all the difference. But the question remains: How did a particular result come about? “This still requires a certain level of knowledge,” says Houben.
