Data Analysis: Identify the Problem You’re Trying to Solve
August 24, 2020 by
One of the most important steps in data analysis is often missed or not done with enough care: identifying the problem you're trying to solve.
It's like Alice in Alice's Adventures in Wonderland. She follows the rabbit down the hole with no idea where she is going, what she is facing, or how to get home. Once down the hole, she gets distracted by one adventure after another. When she eventually realizes she is lost and asks the Cheshire Cat for directions, he gives her the sage advice that if you don't know where you want to go, it doesn't matter which way you go.
Distractions, Confusion, and Rabbit Holes
Similarly, we can get lost in metaphorical rabbit holes when doing analysis. Going down an analysis rabbit hole means spending time on something that ends up being a waste. Imagine a situation where we are tasked with assessing the success of a bus rapid transit (BRT) system in our community. There are many potential ways to get lost:
- Confusion about purpose — A team member might have a different definition of success from us. We might assume our project is going to focus on the economic benefits of the BRT and our partner could be exploring environmental impact. One of us ends up wasting our time.
- Missing data — We might not fully think our plan through. We could spend hours researching the economic benefits of BRTs before realizing we don't have access to any publicly available economic data.
- Distractions — We could spend too much time on a small part of the analysis — for example, making a graph look pretty — while ignoring the rest of the analysis.
Ultimately, there are lots of rabbit holes we could get lost in. It might not be until we finally finish the project that we realize how much time we spent that wasn't really productive.
Define Your Purpose
What the Cheshire Cat knew was that in order to avoid going down rabbit holes, we need to know our purpose.
Socrata has created the Socrata Data Academy, a series of free online courses designed for government workers, to teach basic to advanced data analysis skills and steer students away from rabbit holes.
The first lesson in the online training is to identify the problem we are trying to solve. In the training, we learn about the importance of doing due diligence up front, which includes the following steps:
- Scope the problem correctly
- Gather information
- Understand the true goals of the analysis
- Define what needs to get done by when
- Put this all together in a clear and concise written problem statement that all stakeholders sign off on
Identifying the problem is the crucial first step of the analysis and is often overlooked. However, doing it well helps us avoid going down analysis rabbit holes.