Exploring Surface Temperatures

by

Analysing global surface temperatures over time in SQL


Global land surface temperatures are a critical indicator of climate change and environmental health. By analyzing historical temperature data, we can uncover trends, identify anomalies, and gain insights into the impacts of global warming on different regions. In this post, I will explore some ways to perform analyses on this type of data. The dataset was downloaded from Kaggle here, provided by Masood Ahmed.

The first CSV I’m using is ‘globallandtemperaturesbycountry.csv’, which has the following schema;


Some questions we might want to ask of this data are:
1) Which months are the hottest and coldest?
2) Which countries are the hottest and coldest?
3) Which were the hottest and coldest periods?
4) What do the overall trends suggest?

Let’s start simple by looking at the aggregate average temperature by month across all years and countries. We can see that the hottest three months of the year (to nobody’s surprise) are July, August and June, while the three coldest are January, December and February.

SQL
SELECT 
    MONTH([dt]) AS Month,
    ROUND(AVG([AverageTemperature]), 2) AS AvgMonthlyTemperature
FROM 
    [LH_Raw].[dbo].[globallandtemperaturesbycountry]
GROUP BY 
    MONTH([dt])
ORDER BY 
    AvgMonthlyTemperature DESC;


Before drilling down to a more granular level, there are still some interesting overall aggregate data we could take a look at.

SQL
-- Step 1: Calculate average temperature for each 25-year bucket and assign row numbers for median calculation
WITH AvgTemperatures AS (
    SELECT 
        (YEAR([dt]) / 25) * 25 AS YearBucketStart,  
        AVG([AverageTemperature]) AS AvgTemperature, 
        ROW_NUMBER() OVER (PARTITION BY (YEAR([dt]) / 25) * 25 ORDER BY AVG([AverageTemperature])) AS RowAsc,  --
        COUNT(*) OVER (PARTITION BY (YEAR([dt]) / 25) * 25) AS TotalCount  
    FROM 
        [LH_Raw].[dbo].[globallandtemperaturesbycountry]
    GROUP BY 
        (YEAR([dt]) / 25) * 25, [Country] 
),

-- Step 2: Calculate the Min, Max, and Median temperatures for each 25-year bucket
MedianTemperatures AS (
    SELECT 
        YearBucketStart,
        ROUND(MIN(AvgTemperature), 2) AS MinAvgTemperature,  
        ROUND(MAX(AvgTemperature), 2) AS MaxAvgTemperature, 
        ROUND(CASE 
            WHEN TotalCount % 2 = 1 THEN 
                MAX(CASE WHEN RowAsc = (TotalCount + 1) / 2 THEN AvgTemperature END)
            ELSE  
                AVG(CASE WHEN RowAsc IN ((TotalCount / 2), (TotalCount / 2) + 1) THEN AvgTemperature END)
        END, 2) AS Median
    FROM 
        AvgTemperatures
    GROUP BY 
        YearBucketStart, TotalCount  
),

-- Step 3: Calculate previous period's Min, Max, and Median temperatures
FinalQuery AS (
    SELECT 
        YearBucketStart,
        MinAvgTemperature,
        MaxAvgTemperature,
        Median,
        LAG(MinAvgTemperature) OVER (ORDER BY YearBucketStart) AS PreviousMin,  
        LAG(MaxAvgTemperature) OVER (ORDER BY YearBucketStart) AS PreviousMax,  
        LAG(Median) OVER (ORDER BY YearBucketStart) AS PreviousMedian  
    FROM 
        MedianTemperatures
)

-- Step 4: Calculate the difference between the current and previous periods
SELECT 
    YearBucketStart,
    MinAvgTemperature,
    MaxAvgTemperature,
    Median,
    ROUND((Median - ISNULL(PreviousMedian, Median)), 2) AS VsPrevMed,  
    ROUND((MinAvgTemperature - ISNULL(PreviousMin, MinAvgTemperature)), 2) AS VsPrevMin, 
    ROUND((MaxAvgTemperature - ISNULL(PreviousMax, MaxAvgTemperature)), 2) AS VsPrevMax  
FROM 
    FinalQuery
ORDER BY 
    YearBucketStart;  


In the ‘AvgTemperatures’ CTE, I’ve grouped the data into 25-year buckets to make it easier to manage. Within each bucket, the avg temp is calculated per country. Then, a sequential number is assigned to each row for calculating the median, and COUNT() determines total rows in each bucket (which is necessary for handling even and odd row counts). Then, in the ‘MedianTemperatures’ CTE I’ve calculated the min, max and med of AvgTemperature for each 25-year period conditionally, adjusting for even and odd row counts using window functions. The ‘FinalQuery’ CTE retrieves the previous period’s min, max and median values using LAG().

Looking briefly at the output, it generally suggests a warming trend over the last 300 years, with both median and maximum temperatures increasing. Fluctuations in minimum temperatures might suggest more variability in colder periods. The relatively smaller changes in the more recent periods might indicate that the rate of change is stabilizing, but it remains upward nevertheless.

The next analysis is looking at countries by hottest and coldest temperature. For ease, I’ve continued with the 25-year buckets.

SQL
WITH YearlyAvgTemp AS (
    SELECT 
        YEAR([dt]) AS Year,
        [Country],
        ROUND(AVG([AverageTemperature]), 2) AS AvgYearlyTemperature
    FROM 
        [LH_Raw].[dbo].[globallandtemperaturesbycountry]
    WHERE 
        [AverageTemperature] IS NOT NULL
    GROUP BY 
        YEAR([dt]), [Country]
),
BucketizedTemps AS (
    SELECT 
        Year,
        Country,
        AvgYearlyTemperature,
        (Year / 25) * 25 AS YearBucket
    FROM 
        YearlyAvgTemp
),
RankedBucketizedTemps AS (
    SELECT 
        YearBucket,
        Country,
        AvgYearlyTemperature,
        ROW_NUMBER() OVER (PARTITION BY YearBucket ORDER BY AvgYearlyTemperature DESC) AS HighTempRank,
        ROW_NUMBER() OVER (PARTITION BY YearBucket ORDER BY AvgYearlyTemperature ASC) AS LowTempRank
    FROM 
        BucketizedTemps
)
SELECT 
    YearBucket,
    MAX(CASE WHEN HighTempRank = 1 THEN Country END) AS HottestCountry,
    MAX(CASE WHEN HighTempRank = 1 THEN ROUND(AvgYearlyTemperature, 2) END) AS HottestTemperature,
    MAX(CASE WHEN LowTempRank = 1 THEN Country END) AS ColdestCountry,
    MAX(CASE WHEN LowTempRank = 1 THEN ROUND(AvgYearlyTemperature, 2) END) AS ColdestTemperature
FROM 
    RankedBucketizedTemps
GROUP BY 
    YearBucket
ORDER BY 
    YearBucket;

Djibouti appears frequently as the hottest country in multiple 25-year periods, while Greenland is consistently the coldest country across the periods. I would like to explore the temporal changes in temperature across these two countries a little more:

SQL
WITH YearlyAvgTemp AS (
    SELECT 
        YEAR([dt]) AS Year,
        [Country],
        AVG([AverageTemperature]) AS AvgYearlyTemperature,
        MIN([AverageTemperature]) AS MinYearlyTemperature,
        MAX([AverageTemperature]) AS MaxYearlyTemperature
    FROM 
        [LH_Raw].[dbo].[globallandtemperaturesbycountry]
    WHERE 
        [AverageTemperature] IS NOT NULL
        AND [Country] IN ('Greenland', 'Djibouti')
    GROUP BY 
        YEAR([dt]), [Country]
),
BucketizedTemps AS (
    SELECT 
        Year,
        Country,
        AvgYearlyTemperature,
        MinYearlyTemperature,
        MaxYearlyTemperature,
        (Year / 25) * 25 AS YearBucket
    FROM 
        YearlyAvgTemp
),
AggregatedTemps AS (
    SELECT 
        YearBucket,
        Country,
        ROUND(AVG(AvgYearlyTemperature), 2) AS AvgTemperature,
        ROUND(MIN(MinYearlyTemperature), 2) AS MinTemperature,
        ROUND(MAX(MaxYearlyTemperature), 2) AS MaxTemperature
    FROM 
        BucketizedTemps
    GROUP BY 
        YearBucket, Country
),
AggregatedWithChange AS (
    SELECT 
        YearBucket,
        Country,
        AvgTemperature,
        LAG(AvgTemperature) OVER (PARTITION BY Country ORDER BY YearBucket) AS PrevAvgTemperature,
        MinTemperature,
        MaxTemperature
    FROM 
        AggregatedTemps
)
SELECT 
    YearBucket,
    Country,
    AvgTemperature,
    ROUND(AvgTemperature - PrevAvgTemperature, 2) AS ChangeFromPrevPeriod,
    MinTemperature,
    MaxTemperature
FROM 
    AggregatedWithChange
ORDER BY 
    YearBucket, Country;


Djibouti’s average temperature shows a general upward trend, with the more significant increases happening in the later periods; min and max temperatures reflect this trend. Greenland’s average temperature fluctuates but also shows a notable warming trend in recent periods. The minimum temperatures remain consistently low, but there are also periods of slight warming.

Succinctly, the overarching trend across all analyses is an increase in global surface temperatures over the last 300 years.

Leave a Reply

Your email address will not be published. Required fields are marked *