SQL Project: Analysis of 120-years Olympics History Data

Bayonle
2 min readJul 4, 2023

--

by me (designed on Canva)

With the mindset of practising my SQL skill, I took on the SQL Challenge I designed my @Thoufiq Mohammed on his YouTube page (@techFTQ). In this post, I will share;

  1. What the project is about and where to find it (incase you are interested).
  2. How I solved the questions and demonstrated SQL knowledge.

Project Details

The challenge contains 20 analytics questions to be answered with SQL (I used PostgreSQL). The dataset contains information about Olympics games for about 120 years. The datasets were downloaded at Kaggle.

The file athlete_events.csv contains 271116 rows and 15 columns. Each row corresponds to an individual athlete competing in an individual Olympic event (athlete-events). The columns are:

metadata

The noc_regions.csv contains two columns: NOC and region (country).

You can find the Analytics questions on TechTFQ blog.

My Solution

After downloading the datasets, I loaded them in Excel to explore them. Then I:

  1. Set up a Database in PostgreSQL
CREATE TABLE CREATE TABLE IF NOT EXISTS events
(id INT,
name VARCHAR,
sex VARCHAR,
age VARCHAR,
height VARCHAR,
weight VARCHAR,
team VARCHAR,
noc VARCHAR,
games VARCHAR,
year INT,
season VARCHAR,
city VARCHAR,
sport VARCHAR,
event VARCHAR,
medal VARCHAR);

DROP TABLE IF EXISTS region;
CREATE TABLE IF NOT EXISTS region
( noc VARCHAR,
region VARCHAR,
notes VARCHAR);

And I imported the dataset into respective tables.

2. I started querying to answer each questions. The queries are documented in my GitHub repository. Examples of interesting ones there are:

/* Q12: Fetch the top 5 athletes who have won the most medals 
(gold/silver/bronze) */

WITH t1 AS (SELECT name, sex, team, COUNT(*) AS total_medals
FROM events
WHERE medal IN ('Gold','Silver','Bronze')
GROUP BY name, sex,team
ORDER BY total_medals DESC),

t2 AS (SELECT *, DENSE_RANK() OVER(ORDER BY total_medals DESC) AS rnk
FROM t1)

SELECT name, team, total_medals
FROM t2 WHERE rnk <= 5

-- Q6: Identify the sport which was played in all summer olympics

SELECT DISTINCT sport, COUNT(sport)
FROM (SELECT DISTINCT games, sport -- table of summers and each sport
FROM events
WHERE season ='Summer'
ORDER by games) AS t1
GROUP BY sport
HAVING COUNT(sport) = (SELECT COUNT (DISTINCT games) -- total_no of Summers
FROM events
WHERE season = 'Summer')

Skills Involved: CTEs, Aggregate functions, Window Functions, Crosstabs or Pivoting and others.

--

--

Bayonle
Bayonle

Written by Bayonle

I use this as a medium to communicate and share my thoughts and work in the Data Field.

No responses yet