Exploring the LEGO Legacy: A 53-Year Data Journey with the Maven LEGO Challenge

Iwa Sanjaya
Jan 24, 2024

It took me almost two full days to complete this data analytics project for the monthly Maven challenge. This time the dataset focuses on LEGO, and in this article I’ll walk through the step-by-step approach I took to finish the project.

Challenge Objective

Use your creativity and analytical skills to build an interactive dashboard or visual that allows users to discover the history and changes in LEGO sets over the last 50 years.

About the Dataset

It contains information on LEGO sets released from 1970 to 2022, including details like the theme, number of pieces, recommended age, retail price, and images for each set.

Company Profile

The LEGO® Group is a Danish toy company that was founded in 1932 by Ole Kirk Christiansen. The company is renowned for its iconic interlocking plastic bricks, commonly known as LEGO bricks, which allow users to construct a wide variety of objects and structures. The company’s commitment to creativity, play, and education has made it a global leader in the toy industry. LEGO sets cover various themes, including licensed properties such as Star Wars, Harry Potter, and Marvel, as well as original themes like LEGO City and LEGO Technic.

Introduction

In this challenge, I developed a simple yet interactive dashboard to interpret and present my findings from the dataset to the stakeholders. To do this effectively, I assumed the role of a stakeholder, considering the information that might be crucial to them. This approach helped me identify what needs to be extracted from the data source. Consequently, I formulated the following questions to guide the analytical process:

  1. What has been the trend in LEGO sets over the past 53 years? Is the annual increase (or decrease) in the number of LEGO sets released correlated with the company’s revenue and operating profit?
  2. How many LEGO sets have been produced in each category? Which category has the largest number of sets?
  3. Which piece-count range accounts for the largest number of LEGO sets?
  4. Is there a correlation between the prices of LEGO sets and the minimum age recommendations for those sets? If so, what could be the underlying reasons for this correlation?

Limitations

Missing (blank) values in some fields

I noticed numerous missing values in the dataset, particularly in the ‘pieces,’ ‘minifigs,’ ‘agerange_min,’ and ‘US_retailPrice’ fields, which could potentially compromise the validity of the final results. To address this issue, two approaches can be considered:

  1. Manually complete the missing values from alternative sources available on the internet (if possible), ensuring the validity of gathered data.
  2. Exclude the missing (blank) values from the calculation and visualization process. Filter out these values or entries from the table or charts to prevent the presentation of misleading information.

Each method has its pros and cons. If you proceed with the first approach, you’ll need to put in the effort to gather and check additional data. However, if you do it right, this approach can make your final results more reliable. On the other hand, the second approach is simpler. You just leave out the missing information from your calculations and visuals, but this might affect the accuracy of your end result.

In this case, I decided to proceed with the existing data and leave out the missing values, because I think the remaining records already give a good overall view of the trends over the past 53 years.
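As a rough illustration of the second approach, here is a minimal Python sketch that skips rows with blank values before averaging the retail price by minimum age. It is not part of the Power BI dashboard: the sample rows are invented, `None` stands in for a blank cell, and `average_price_by_min_age` is a hypothetical helper name. Only the field names ‘agerange_min’ and ‘US_retailPrice’ come from the dataset.

```python
from collections import defaultdict

# Hypothetical sample rows; the values are invented for illustration only.
# None stands for a blank cell in the dataset.
sample_sets = [
    {"agerange_min": 4, "US_retailPrice": 9.99},
    {"agerange_min": 4, "US_retailPrice": None},      # blank price: excluded
    {"agerange_min": 8, "US_retailPrice": 49.99},
    {"agerange_min": None, "US_retailPrice": 19.99},  # blank age: excluded
    {"agerange_min": 8, "US_retailPrice": 29.99},
]


def average_price_by_min_age(rows):
    """Average retail price per minimum age, skipping rows with blanks."""
    prices = defaultdict(list)
    for row in rows:
        age, price = row["agerange_min"], row["US_retailPrice"]
        if age is None or price is None:
            continue  # leave incomplete rows out of the calculation
        prices[age].append(price)
    return {age: sum(vals) / len(vals) for age, vals in sorted(prices.items())}


averages = average_price_by_min_age(sample_sets)
print(averages)
```

Because a row is dropped only when one of the two fields it needs is blank, a set with a missing price can still count toward charts that do not use the price.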

Step 1: Data Cleaning

No further cleaning was required: apart from the missing values noted above, the data source is already clean.

Step 2: Data Transformation

  • I went the extra mile to locate financial metrics, such as revenues and operating profits, in LEGO’s annual reports. I established an additional Excel table to systematically document these metrics over the years and managed to gather records from 2003 to 2022. While it’s not a complete dataset, it provides sufficient information to illustrate the historical trend of these metrics in comparison to the quantity of LEGO sets being produced annually.
  • I created a new column named ‘Piece Count’ to group the sets by number of pieces into five ranges: 1–249, 250–499, 500–999, 1,000–1,999, and 2,000 or more. I based the ranges on LEGO’s official website, which uses similar criteria to sort and filter its products. I used the following DAX formula:
Piece Count =
// Assign each set to a piece-count range.
// Blank or zero piece counts match no range and stay blank.
SWITCH(
    TRUE(),
    lego_sets[pieces] >= 1 && lego_sets[pieces] <= 249, "1-249",
    lego_sets[pieces] >= 250 && lego_sets[pieces] <= 499, "250-499",
    lego_sets[pieces] >= 500 && lego_sets[pieces] <= 999, "500-999",
    lego_sets[pieces] >= 1000 && lego_sets[pieces] <= 1999, "1000-1999",
    lego_sets[pieces] >= 2000, "≥2000",
    BLANK()
)
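
For readers who want to check the grouping outside Power BI, the same ranges can be sketched in Python. This is only an illustrative equivalent of the DAX column above, assuming whole-number piece counts. The function name `piece_count_group` is hypothetical, and `None` stands in for a blank value.

```python
def piece_count_group(pieces):
    """Return the piece-count label for a set, or None when no range applies."""
    if pieces is None or pieces < 1:
        return None  # blank or zero piece counts stay uncategorized
    if pieces <= 249:
        return "1-249"
    if pieces <= 499:
        return "250-499"
    if pieces <= 999:
        return "500-999"
    if pieces <= 1999:
        return "1000-1999"
    return "≥2000"


# Boundary values are the ones most worth checking by hand.
for sample in (1, 249, 250, 1999, 2000, None):
    print(sample, piece_count_group(sample))
```

The checks run from the smallest range upward, so each branch only needs an upper bound.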

Step 3: Data Visualization

Page 1: the ‘INSIGHTS’ page acts as the main dashboard and presents the key findings through various charts and graphs.
Page 2: the ‘TRIVIA’ page lists the top 10 biggest and most expensive LEGO sets as of 2022.

The dashboard consists of two pages: ‘Insights’ and ‘Trivia.’ The ‘Insights’ page addresses the key questions that might matter to stakeholders, presenting four insights supported by visuals and brief explanations. The ‘Trivia’ page lists the top 10 biggest and most expensive LEGO sets, with details about the sets that top each list, such as the product description, number of pieces, retail price, and release year. For the visual design, I matched the theme colors to the official LEGO color codes so that the dashboard looks professional and relevant, yet still playful. I kept the layout as simple as possible to emphasize the critical information and make it easy for the audience to read.

Findings

The dashboard reveals four insights that I discovered during my analysis of the data, followed by one piece of trivia:

Figure: a line chart of the number of LEGO sets released each year from 2003 to 2022, above a stacked column chart of revenue and EBIT for the same years. The annual trend in the number of released LEGO sets aligns with the company’s revenue and EBIT, and both show overall growth.

  1. Over the 53 years from 1970 to 2022, the number of LEGO sets released per year grew at a Compound Annual Growth Rate (CAGR) of 6.14%. Between 2003 and 2022, the yearly rise or fall in the number of sets released moves closely with changes in the company’s revenue and operating profit. This suggests that the release of new sets is closely tied to the company’s overall profitability.
  2. As of 2022, LEGO has a total of 7 product categories, 17 theme groups, 152 themes, and 874 sub-themes. Among the categories, ‘Normal’ has the most sets (12,757), while ‘Random’ has the fewest (64).
  3. Most LEGO sets have a piece count between 1 and 249: 10,984 sets, or 75.66% of the total. As the piece count per set increases, the number of sets produced decreases.
  4. In general, the average retail price of LEGO sets tends to rise as the recommended minimum age increases. This suggests that the company offers more complex sets with a higher number of pieces to customers of an older age group, leading to an increased per-set selling price.
  5. Trivia: As of 2022, the LEGO Art World Map is the largest LEGO set by piece count, with 11,695 pieces, and measures over 25.5 inches (65 cm) in height and 40.5 inches (104 cm) in width. Despite its size, the World Map is priced at only $249.99, or roughly 2 cents per brick. By contrast, the LEGO Star Wars Millennium Falcon and AT-AT from the Ultimate Collector Series are the most expensive LEGO sets, priced at $849.99 each.
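
Two of the figures above rest on simple arithmetic that is easy to sanity-check. The Python sketch below shows the standard CAGR formula and the price-per-brick calculation. It is a cross-check rather than the dashboard’s own calculation. The helper names `cagr` and `cents_per_piece` are hypothetical, and treating 1970 to 2022 as 52 yearly growth periods is an assumption, since the period count used for the 6.14% figure is not stated.

```python
def cagr(start_value, end_value, periods):
    """Compound annual growth rate between two values over `periods` years."""
    return (end_value / start_value) ** (1 / periods) - 1


def cents_per_piece(price_usd, pieces):
    """Retail price per piece, in US cents."""
    return price_usd * 100 / pieces


# Figures quoted in the findings above.
REPORTED_CAGR = 0.0614
WORLD_MAP_PRICE_USD = 249.99
WORLD_MAP_PIECES = 11_695

# Assumption: 1970 to 2022 is counted as 52 yearly growth periods.
PERIODS = 52

# Growth multiple implied by the reported CAGR over those periods.
implied_multiple = (1 + REPORTED_CAGR) ** PERIODS

# Round trip: feeding the implied multiple back in recovers the reported rate.
recovered_rate = cagr(1.0, implied_multiple, PERIODS)

world_map_cents = cents_per_piece(WORLD_MAP_PRICE_USD, WORLD_MAP_PIECES)
print(f"Implied growth multiple: {implied_multiple:.1f}x")
print(f"Recovered CAGR: {recovered_rate:.2%}")
print(f"World Map price per piece: {world_map_cents:.2f} cents")
```

The per-piece result comes out a little over 2 cents, which matches the rounded figure in the trivia item.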

Thank you for reading! I hope you get valuable insights from my dashboard.

References

LEGO Annual Report

https://www.lego.com/en-us

https://www.brickeconomy.com/sets/top/most-expensive-lego-sets

https://thecollector.io/features/the-30-biggest-lego-sets-ever
