Mastering SQL Window Functions: A Step-by-Step Guide to Analyzing Olympic Data

Idrissa Tankari

--

SQL window functions are powerful for running complex queries, especially when working with large datasets. In this article, I'll guide you through a step-by-step tutorial on using SQL window functions to analyze Olympic data, focusing on Summer Olympic medals. Whether you're a beginner or looking to sharpen your SQL skills, this project will give you the skills to analyze real-world data.

We will cover various SQL window functions, including ranking athletes, analyzing medalists over time, and calculating running totals. The dataset used is a table named Summer_Medals, which contains historical Olympic medal data.

Introduction to SQL Window Functions

Window functions allow you to perform calculations across a set of table rows related to the current row. Unlike aggregate functions, window functions do not collapse rows but instead, add result columns to each row. This is especially useful when we want to rank, filter, or perform running totals while keeping all the rows intact.

Basic Setup — Retrieving Olympic Medal Data

We begin by querying the dataset, returning key information such as the year, event, and country for each medal.

SELECT Year, Event, Country, Medal
FROM Summer_Medals
WHERE Medal = 'Gold'
ORDER BY Year, Event;

This query returns a list of gold medal-winning countries grouped by year and event.

Reigning Champions by Gender

The LAG() function allows you to access data from the previous row within the same result set based on a specified ordering. This is helpful for comparing the current row to a previous one without having to join the table on itself.

Now, let's determine the reigning champions by gender using the LAG() window function. This allows us to look back at the previous year's champion for comparison.

WITH Tennis_Gold AS (
SELECT DISTINCT Gender, Year, Country
FROM Summer_Medals
WHERE Year >= 2000 AND Event = 'Javelin Throw' AND Medal = 'Gold'
)
SELECT
Gender, Year, Country AS Champion,
LAG(Country) OVER (PARTITION BY Gender ORDER BY Year ASC) AS Last_Champion
FROM Tennis_Gold
ORDER BY Gender ASC, Year ASC;

This query helps us identify trends in gender-specific events by listing both the current and prior champions.

Reigning Champions by Gender and Event

To take it a step further, we analyze champions by both gender and event using a similar approach. This is particularly useful for identifying patterns in specific athletic events.

WITH Athletics_Gold AS (
SELECT DISTINCT Gender, Year, Event, Country
FROM Summer_Medals
WHERE Year >= 2000 AND Discipline = 'Athletics' AND Event IN ('100M', '10000M') AND Medal = 'Gold'
)
SELECT
Gender, Year, Event, Country AS Champion,
LAG(Country) OVER (PARTITION BY Gender, Event ORDER BY Year ASC) AS Last_Champion
FROM Athletics_Gold
ORDER BY Event ASC, Gender ASC, Year ASC;

This breakdown allows us to see the progression of champions across multiple years and disciplines.

Future Gold Medalists

LEAD() works similarly to LAG(), but instead of looking at previous rows, it allows you to access data from future rows.

Now, let's predict future gold medalists. Using, we can look forward to the next three years' potential champions.

WITH Discus_Medalists AS (
SELECT DISTINCT Year, Athlete
FROM Summer_Medals
WHERE Medal = 'Gold' AND Event = 'Discus Throw' AND Gender = 'Women' AND Year >= 2000
)
SELECT
Year, Athlete,
LEAD(Athlete, 3) OVER (ORDER BY Year ASC) AS Future_Champion
FROM Discus_Medalists
ORDER BY Year ASC;

This query is useful when analyzing potential future champions, helping us to forecast who might dominate in upcoming competitions.

Last City by Year

LAST_VALUE() returns the last value in an ordered set of rows, useful for retrieving the most recent value in a series of records.

This query allows us to see which city last hosted the Olympics in the given dataset, using the LAST_VALUE() window function.

WITH Hosts AS (
SELECT DISTINCT Year, City
FROM Summer_Medals
)
SELECT
Year, City,
LAST_VALUE(City) OVER (ORDER BY Year ASC RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Last_City
FROM Hosts
ORDER BY Year ASC;

This query provides a quick overview of the cities that hosted the Olympic Games over time.

Ranking Athletes by Medals Earned

The RANK() function assigns a rank to rows within a partition based on the specified column. If there are rows with equal values, they will be assigned the same rank, but the next rank will skip by the number of rows tied.

Next, let's rank athletes by the total number of medals they have earned using the RANK() function.

WITH Athlete_Medals AS (
SELECT Athlete, COUNT(*) AS Medals
FROM Summer_Medals
GROUP BY Athlete
)
SELECT
Athlete, Medals,
RANK() OVER (ORDER BY Medals DESC) AS Rank_N
FROM Athlete_Medals
ORDER BY Medals DESC;

This gives us a clear ranking of top athletes based on the number of medals they've won.

Ranking Athletes from Multiple Countries

DENSE_RANK() is similar to RANK(), but it does not skip ranks when there are ties. If multiple rows have the same rank, the next rank is not skipped.

Using the DENSE_RANK() function, we can rank athletes by country to compare performance across nations.

WITH Athlete_Medals AS (
SELECT Country, Athlete, COUNT(*) AS Medals
FROM Summer_Medals
WHERE Country IN ('JPN', 'KOR') AND Year >= 2000
GROUP BY Country, Athlete
)
SELECT
Country, Athlete,
DENSE_RANK() OVER (PARTITION BY Country ORDER BY Medals DESC) AS Rank_N
FROM Athlete_Medals
ORDER BY Country ASC, Rank_N ASC;

This query allows for a detailed comparison between athletes of different countries, making it easy to identify top performers by nation.

Top, Middle, and Bottom Thirds

NTILE() divides rows into a specified number of roughly equal groups, and assigns a group number to each row. It's useful for breaking a dataset into quartiles, deciles, or other groupings.

We can divide athletes into performance tiers by medals using the NTILE() function.

WITH Athlete_Medals AS (
SELECT Athlete, COUNT(*) AS Medals
FROM Summer_Medals
GROUP BY Athlete
HAVING COUNT(*) > 1
)
SELECT
Athlete, Medals,
NTILE(3) OVER (ORDER BY Medals DESC) AS Third
FROM Athlete_Medals
ORDER BY Medals DESC, Athlete ASC;

This categorizes athletes into three distinct groups based on performance.

Running Totals of Athlete Medals

The SUM() function can also be used as a window function to calculate a running total, which accumulates the sum of values for the current and all previous rows.

To calculate the running totals of medals for athletes from a particular country, we use the SUM() window function.

WITH Athlete_Medals AS (
SELECT Athlete, COUNT(*) AS Medals
FROM Summer_Medals
WHERE Country = 'USA' AND Medal = 'Gold' AND Year >= 2000
GROUP BY Athlete
)
SELECT
Athlete, Medals,
SUM(Medals) OVER (ORDER BY Athlete ASC) AS Max_Medals
FROM Athlete_Medals
ORDER BY Athlete ASC;

This query provides cumulative totals for each athlete's medals.

Conclusion

In this article, we covered a variety of SQL window functions and demonstrated their practical use cases in analyzing Olympic medal data. We explored ranking athletes, tracking reigning champions, predicting future winners, and more. Window functions are a powerful addition to any data analyst's toolkit, and practicing these queries will give you the confidence to tackle more complex SQL problems.

SQL window functions are a game-changer when working with large datasets, and mastering them will open up many new possibilities for your analysis work.

Happy querying!

--

--

No responses yet

Write a response