Lessons in Causality: Measuring Impact in the Superchain Ecosystem

Measuring Impact Is Hard


We all love a good story, especially in crypto, where rapid change and open data make it easy to find patterns and draw conclusions. An incentive program launches, and new addresses follow. A protocol upgrade goes live, and usage spikes. It’s tempting to attribute any shift in key metrics to the most visible intervention. But without a structured approach to measurement, these assumptions are fragile at best and misleading at worst.

Did the incentive actually cause the growth? Would usage have increased anyway? These aren’t just academic questions; they go to the heart of how we allocate resources, design incentives, and evaluate outcomes. Getting these answers right is critical to the long-term success of the Collective: it ensures we reward the right builders and support the contributions that drive sustainable ecosystem growth.

In a space as complex and fast-moving as crypto, correlation is often mistaken for causation. As Randall Munroe humorously captures in one of his xkcd comics, it’s easy to see patterns in data and assume they’re meaningful, even when they’re just coincidences.

Source: https://xkcd.com/925/

At the Optimism Collective, we adopt a deliberately experimental and causal mindset. We design measurement systems and run experiments to go beyond surface metrics so that we can iterate faster, make better decisions, and build what truly works.

Why Observational Data Alone Can Be Misleading

Imagine giving apples to elite athletes before an event, watching them run fast, and concluding that the apples made them fast. But they were probably already fast to begin with. Without a proper counterfactual (what would’ve happened without the apples), we risk mistaking correlation for causation.

The same thing happens in crypto.

Take the example of an incentive targeting users based on their gas fee spending. In this simplified scenario, the x-axis represents gas fees paid, and the y-axis represents user retention. Suppose eligibility is based on crossing a certain gas fee threshold (in reality, criteria are often more complex). The goal is to evaluate whether receiving the incentive improves retention.

Note: This data is for illustrative purposes only

At first glance, it might look like users who spend more on gas also stick around longer, suggesting the incentive is working. But that relationship can be misleading. Those who cross the threshold are likely already more engaged and would’ve stuck around even without the incentive.

That’s selection bias: we’re comparing fundamentally different groups. The incentive may appear effective, but the observed impact could be entirely driven by pre-existing differences, not the program itself.
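To make this concrete, here is a minimal simulation sketch (the variable names and numbers are hypothetical, and the only dependency is numpy): a latent engagement level drives both gas spending and retention, eligibility is a gas-fee threshold, and the incentive has a true effect of zero, yet a naive comparison of eligible versus ineligible users still shows a large apparent lift.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000

# A latent engagement level drives BOTH gas spending and retention.
engagement = rng.normal(0, 1, n)
gas_fees = np.exp(0.8 * engagement + rng.normal(0, 0.5, n))  # skewed, like real fee data
eligible = gas_fees > np.quantile(gas_fees, 0.8)             # top 20% cross the threshold

# The true incentive effect is ZERO: retention depends only on engagement.
retention_prob = 1 / (1 + np.exp(-(engagement - 0.5)))
retained = rng.random(n) < retention_prob

naive_lift = retained[eligible].mean() - retained[~eligible].mean()
print(f"Naive 'impact' of the incentive: {naive_lift:+.1%}")  # large, despite a true effect of zero
```

The naive comparison picks up the pre-existing engagement gap between the two groups, which is exactly the selection bias described above.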

Causal Questions are Everywhere in the Superchain Ecosystem

While regression algorithms are great at identifying correlations and predicting growth trends, understanding why something happened is much harder. Yet causal questions are everywhere in the Superchain ecosystem. Here are a few examples:

| Category | Example Question |
| --- | --- |
| Protocol Design | Did cheaper transaction fees lead to more users and onchain activity? |
| Retro Funding | What impact did Retro Funding have on developer activity, onchain contributions, or TVL? |
| Airdrops | Did airdrop recipients show higher retention than non-recipients? |
| Growth Campaigns | Did a campaign increase TVL, or would it have grown anyway? |
| Governance | Does deliberation lead to more informed or less polarized decision-making? |

While measuring causal impact is difficult, it’s a challenge worth tackling, and one that many other domains have already faced and overcome: public policy has used randomized evaluations to save taxpayers millions of dollars, and tech companies have built non-experimental causal inference tools to measure the benefit of new features.

We don’t need to start from scratch. We can draw from proven methods and real-world examples to build smarter, more accountable systems. To do that, we need a shared way of thinking about causality that’s both practical and accessible.

A Practical Framework for Causal Thinking


Measuring impact in open systems is difficult, but it becomes a bit easier when we approach it with the right mindset. Below is a practical framework for thinking causally, even when we can’t run perfect experiments.

Define the Objective and Measurement Upfront

Before anything, we should ask ourselves: “What’s the decision this is meant to inform?” This idea comes from the Experimentation Prioritization Framework at Optimism, which recommends focusing on experiments (or measurements) that directly inform actionable decisions. 

Just as important is being explicit about how we’ll measure success. What metric(s) matter most for the outcome we care about—retention, growth, revenue, decentralization? Are we optimizing for a short-term spike, or long-term sustainability? Having a clear, shared definition upfront ensures our analysis aligns with what really matters.

It’s tempting to define measurement after an initiative is already live. Doing so opens the door to cherry-picking metrics or rationalizing outcomes after the fact. Instead, we should treat measurement design as part of the initiative itself: planned early, tightly aligned with the decision at hand, and baked into execution from the start.

We can use the decision tree below to ensure our research topic and measurement efforts are actually useful.

Source: How We Experiment: Principles for Designing Experiments

When Randomization Isn’t an Option

In a perfect world, we’d run randomized experiments to cleanly isolate the effect of any intervention. In reality, that’s rarely feasible. Programs like airdrops, Retro Funding, liquidity mining, and new feature launches affect the whole ecosystem at once, making it hard to create clean control groups.

Still, we can learn from structured observation. Methods like regression discontinuity or synthetic control help estimate impact when randomness isn’t possible. Even non-causal tools like descriptive trends, network analysis, sentiment tracking, and simulation can offer valuable insight when interpreted carefully. 

The key is to choose the right method for the question, and to stay honest about what we can (and can’t) conclude.

There are many causal inference methods out there, each suited to different data and decision contexts. To help decide which approach to use, the chart below (while not exhaustive) outlines different analytical methods along two dimensions: (1) strength of causal inference, and (2) data requirements.

Here’s a quick guide to what these methods mean:

| Method | Description | Pros | Cons |
| --- | --- | --- | --- |
| Exploratory Analysis | Trend analysis, dashboards, before/after metrics | Useful for hypothesis generation and identifying potential signals | Doesn’t control for confounding variables; cannot establish causality |
| Regression Discontinuity (RDD) | Compares outcomes just above and below a threshold | Can approximate causal inference if the threshold is sharp and other factors are smooth across it | Requires a clearly defined threshold and enough data around it |
| Synthetic Control | Constructs a counterfactual using a weighted combination of similar entities not exposed to the treatment | Useful when randomized experiments aren’t possible; can model complex interventions | Requires many comparable control entities and strong assumptions |
| Randomized Experiments (A/B Testing) | Randomly assigns treatment to users or entities to isolate impact | Gold standard for causal inference; ensures differences are due to the intervention | Can be expensive, slow, or infeasible in some contexts |
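As a quick illustration of the last row, here is a minimal A/B readout sketch in Python. The retention rates, group sizes, and function name are invented for illustration; in practice you would also check randomization balance and define the metric upfront.

```python
import numpy as np

def ab_difference(treated: np.ndarray, control: np.ndarray, z: float = 1.96):
    """Difference in mean outcome (e.g. 30-day retention) between randomly assigned
    treatment and control groups, with a normal-approximation ~95% confidence interval.
    Random assignment is what lets us read this difference causally."""
    diff = treated.mean() - control.mean()
    se = np.sqrt(treated.var(ddof=1) / len(treated) + control.var(ddof=1) / len(control))
    return diff, (diff - z * se, diff + z * se)

# Illustrative binary retention outcomes (1 = retained after 30 days).
rng = np.random.default_rng(7)
control = rng.binomial(1, 0.30, 5_000)  # 30% baseline retention
treated = rng.binomial(1, 0.34, 5_000)  # +4pp true effect
diff, ci = ab_difference(treated, control)
print(f"Estimated lift: {diff:+.1%} (95% CI {ci[0]:+.1%} to {ci[1]:+.1%})")
```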

In the next section, we’ll walk through a few case studies from the Superchain ecosystem, applying different methods to assess impact and uncover insights.

Observations from the Superchain Ecosystem


While not every initiative is launched with experimental design in mind, we can still learn from them using thoughtful analytical approaches. Below are a few examples within the Superchain ecosystem where we’ve tried to better understand real impact, despite imperfect setups.

| Example | Type of Analysis | Method |
| --- | --- | --- |
| OP Reward Program Exploratory Analysis | Observational | Exploratory and longitudinal analysis |
| Airdrop Retention Analysis | Quasi-experimental | Regression Discontinuity (RDD) |
| Retro Funding Impact Measurement | Quasi-experimental | Synthetic Control |

We will explain each of them in more detail below.

OP Reward Program Exploratory Analysis

We evaluated the effectiveness of OP reward programs across three seasons in the OP Rewards Analytics Update. These programs varied in design, objective, and protocol, so instead of aiming for a unified causal estimate, we took an exploratory approach, analyzing performance during the incentive period and in the 30 days after each program ended. The goal was to compare implementations in terms of retention, usage, and potential strategic tradeoffs. It’s important to note, however, that we cannot attribute the observed increases in TVL or usage solely to the reward programs.
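As a rough illustration of this kind of before/after framing, the sketch below compares average activity in the final 30 days of a program with the 30 days after it ends. The column names, the choice of TVL as the metric, and the ratio itself are assumptions for illustration, not the methodology of the analytics update.

```python
import pandas as pd

def post_program_retention(daily: pd.DataFrame, program_end: str, window: int = 30) -> float:
    """Share of late-program activity retained in the `window` days after a program ends.

    Expects hypothetical columns 'date' and 'tvl' (any daily metric works).
    Values near 1.0 suggest activity held up after incentives stopped; values well
    below 1.0 suggest the lift was largely incentive-driven. This is descriptive,
    not causal: it says nothing about what would have happened without the program.
    """
    daily = daily.assign(date=pd.to_datetime(daily["date"])).set_index("date").sort_index()
    end = pd.Timestamp(program_end)
    during = daily.loc[:end, "tvl"].tail(window).mean()
    after = daily.loc[end + pd.Timedelta(days=1):, "tvl"].head(window).mean()
    return after / during
```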

Airdrop Retention via Regression Discontinuity

To address confounding and estimate the effect of receiving Airdrop 5 on subsequent retention, we used a regression discontinuity design, which estimates the effect of an intervention around an arbitrary threshold or boundary, in this case comparing addresses just above and just below the 50 OP eligibility threshold.

The results suggest that receiving the airdrop led to a 4.2 percentage point (pp) increase in 30-day retention and a 2.8pp increase in 60-day retention, compared to similar addresses that did not receive the airdrop.

Source: Did OP Airdrop 5 Increase User Retention Rates? A Regression Discontinuity Analysis
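For intuition, here is a minimal local-linear RDD sketch in Python. It is not the code behind the linked analysis: the cutoff, bandwidth, column names ('op_score', 'retained_30d'), and the simulated data are assumptions chosen to mirror the setup described above.

```python
import numpy as np
import pandas as pd

def rdd_estimate(df: pd.DataFrame, cutoff: float = 50.0, bandwidth: float = 10.0) -> float:
    """Local-linear regression discontinuity estimate of the jump in retention at the cutoff.

    Hypothetical columns: 'op_score' (the running variable that determined eligibility)
    and 'retained_30d' (1 if the address was still active 30 days later, else 0).
    """
    window = df[np.abs(df["op_score"] - cutoff) <= bandwidth]
    x = window["op_score"].to_numpy() - cutoff       # center the running variable at the cutoff
    treated = (x >= 0).astype(float)                 # at or above the cutoff -> received the airdrop
    y = window["retained_30d"].to_numpy().astype(float)

    # Fit: y ~ intercept + treated + slope*x + slope_change*(treated*x)
    X = np.column_stack([np.ones_like(x), treated, x, treated * x])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs[1]                                  # the discontinuity = estimated local effect

# Simulated data for illustration: retention rises smoothly with activity,
# plus a true +4pp jump for addresses at or above the cutoff.
rng = np.random.default_rng(0)
scores = rng.uniform(20, 80, 5_000)
retained = rng.random(5_000) < (0.25 + 0.004 * scores + 0.04 * (scores >= 50))
df = pd.DataFrame({"op_score": scores, "retained_30d": retained.astype(int)})
print(f"Estimated effect at the cutoff: {rdd_estimate(df):+.3f}")
```

A real analysis would also vary the bandwidth and check that other address characteristics are smooth across the cutoff.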

Measuring Retro Funding Impact with Synthetic Control

Open Source Observer (OSO) used synthetic control to estimate what would have happened to rewarded projects had they not received funding. By constructing a weighted composite of similar projects from peer ecosystems, OSO built a counterfactual against which actual outcomes could be compared, offering a read on program effectiveness despite the lack of randomization.

Source: Early experiments with synthetic controls and causal inference & Github
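To illustrate the mechanics (a generic sketch, not OSO's implementation; the donor pool, constraint set, and simulated series are all assumptions): find non-negative weights that sum to one so that a blend of untreated "donor" projects tracks the funded project before funding, then read the post-funding gap as the estimated effect.

```python
import numpy as np
from scipy.optimize import minimize

def synthetic_control_weights(treated_pre: np.ndarray, donors_pre: np.ndarray) -> np.ndarray:
    """Non-negative weights (summing to 1) over donor projects such that the weighted
    donor combination tracks the treated project's pre-treatment outcomes.

    treated_pre: shape (T_pre,)    pre-funding outcomes for the funded project
    donors_pre:  shape (T_pre, J)  pre-funding outcomes for J comparable, unfunded projects
    """
    J = donors_pre.shape[1]

    def loss(w):
        return np.sum((treated_pre - donors_pre @ w) ** 2)

    result = minimize(
        loss,
        x0=np.full(J, 1.0 / J),
        bounds=[(0.0, 1.0)] * J,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
        method="SLSQP",
    )
    return result.x

# Illustrative monthly activity series (24 months, funding lands after month 12).
rng = np.random.default_rng(1)
donors = np.cumsum(rng.normal(1.0, 0.3, size=(24, 5)), axis=0)   # 5 donor projects
treated = donors[:, :3].mean(axis=1) + rng.normal(0, 0.2, 24)
treated[12:] += 2.0                                              # the "funding effect" we hope to recover

w = synthetic_control_weights(treated[:12], donors[:12])
counterfactual = donors @ w                                      # what the funded project "would have done"
estimated_effect = (treated[12:] - counterfactual[12:]).mean()
print(f"Estimated average post-funding lift: {estimated_effect:.2f}")
```

The quality of the estimate hinges on how well the weighted donors track the funded project before funding, which is why strong, comparable control entities matter so much for this method.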

Final Words


The Optimism Collective should continue to apply these approaches across ecosystem programs, from incentive design to governance to developer funding. What makes crypto especially powerful is the ability to test and learn in real time. With massive amounts of public onchain data, we have a unique opportunity to study human behavior, coordination, incentive response and governance at scale, much like how big data transformed the way we understand and build for the internet today.

It's an ongoing process of iteration and learning, and each step brings us closer to developing a more robust and systematic approach to understanding what truly drives impact.