What do Story Points measure?

Devs in modern agile environments spend a lot of time talking about story points through estimation ceremonies, velocity calculation, and artifacts from the Scrum world like “burn down charts.” There is general agreement that these points should be a measure of the effort to make a certain feature “live”. Beyond that though, they are a slippery thing. They are integral to estimation and planning, but should never be treated as estimates of time. Devs will be held to the delivery commitments inferred from those estimates, but story points should never be used to compare work across developers or across teams. The assigned value should take into account total effort, complexity, risk, and uncertainty, but it should be drawn from one of the first seven or eight values in the Fibonacci sequence. Story points appear to be doing some very heavy lifting.

Despite noble intentions to free developers from date-based planning, story points wind up serving essentially the same function as a deadline. Maybe this sequence sounds familiar:

A generally high-performing team is engaged in the planning process. The business has a number of features they would like to see completed before the end of the year, and they have put those at the top of their priority list. The team follows their usual estimation process and groups the features into Sprints. It looks doable in the allotted time, so they ‘commit’ to that plan. All is well for the first few weeks, but then, in rapid succession, a team they depend on for their feature falls well behind in its delivery, one of the team’s contractors is unexpectedly swapped out for a different contractor who needs to be onboarded, and a new set of security controls is announced that will require updates to several existing codebases in the next six weeks. The team barely completes half of what they committed to. At the next planning session, they are admonished for falling behind on the committed work.

This is an exaggeration, although maybe not by much. Even when teams are not specifically criticized for missing commitments, they still usually need to account for why it happened, even though commonly occurring obstacles such as dependency issues, team changes, and enterprise mandates are publicly known. In any case, developers are encouraged to try to “close the gap” in subsequent sprints. If agile says we can’t have deadlines, well, gosh darnit, we’ll still find a way to hold the developers accountable for their time!

The core problem is the belief that developers can control the amount of effort it takes to deliver a feature, which is true but misleading. The amount of effort to deliver is generally inversely proportional to the amount of technical debt left behind after it is delivered. When developers are forced to absorb external changes while still delivering the same feature outcome, they adjust by cutting back on all of the design, architectural, testing, documentation, and automation work that keeps a codebase sustainable and of high quality. This attempt to hold developers accountable for work commitments only results in reduced quality, which will ultimately come back to destroy team velocity.

A better way

In designing a new system, it is important to highlight the flaws of the current system:

Fails to auto-normalize – Story points are unreliable because the “available pool” is based on the size and composition of the team itself. Every team must determine what their velocity is based on past experience and guesswork, and if the team grows or shrinks, or changes in skillset or experience level, it can dramatically impact apparent velocity in an unpredictable way. Every developer joining a new team must get a sense of what a ‘5-pointer’ looks like to that team.

Too similar to time tracking – Since point values are intended to represent effort and risk, and these are basically just measured by time (expended to build or gained/lost to risk), it’s far too easy to think of points as a rough time metric despite many sources warning against doing so.

Poor at accounting for non-functional work – While features get story points, other things get percentages, e.g. spend 20% on non-functional work, leave 10% for fly-in work. The team then tries to mentally subtract that from the baseline of time available for the story point-tracked work, which makes the estimation process unnecessarily complicated and opaque.

Mismatched Incentives – Developers are held to estimates for work that can change dramatically based on factors entirely out of their control, or that simply has a lot of uncertainty because it has never been done before. Their only remaining option is to cut corners around exactly the kinds of things that create long-term sustainability.

The “100 Story Points” Method

Time is the one truly fixed resource in software development and it is a fool’s errand to keep a schedule-driven industry from trying to reduce everything to time. The problem is in treating delivery as the process of inserting story point coins (aka time) into the bank until a feature comes out. While it is clear that spending more time on a particular task will on average get it completed more quickly, the relationship is far from linear. Instead it is more accurate to say that spending more time on a task will increase the team’s confidence that a given feature will be delivered within a given Sprint, up to a point. Therefore the goal is to make sure the team allocates the most effort towards the most important features so as to make them the most likely to be completed.

The 100 Story Points method assigns 1% of the team’s available capacity to each point, and developers must then negotiate how many of the 100 available points should be applied towards which features, plus non-feature work (automation and developer experience stuff), unexpected fly-in work, or maintenance. Once points are subtracted out for the non-functionals, the trick for the functional work is that rather than estimating feature size, developers are instead asked to give their confidence level for the feature being completed at a certain point allocation level (“I believe that with 20 points this would be 80% confidence, with 30 I would go up to 90%”). This avoids the discussion of a commitment to completion; developers are only committing to the amount of work they should allocate to each item.
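To make the bookkeeping concrete, here is a minimal Python sketch of what such a plan might look like. Everything in it is an assumption for illustration: the `Allocation` and `validate_plan` names, the line items, and most of the numbers are made up, and the rule of reading confidence off the highest quoted level that the allocation covers is just one simple choice. Only the "20 points is 80%, 30 points is 90%" quote comes from the example above.

```python
from __future__ import annotations

from dataclasses import dataclass, field

TOTAL_POINTS = 100  # each point stands for 1% of the team's capacity


@dataclass
class Allocation:
    """One line item in a Sprint plan (hypothetical structure)."""

    name: str
    points: int
    # Confidence quoted by developers at specific point levels,
    # e.g. {20: 0.80, 30: 0.90}. Left empty for non-feature work.
    confidence_by_points: dict[int, float] = field(default_factory=dict)

    def confidence(self) -> float | None:
        """Confidence at the highest quoted level this allocation covers."""
        covered = [p for p in self.confidence_by_points if p <= self.points]
        if not covered:
            return None  # no quote applies (or this is non-feature work)
        return self.confidence_by_points[max(covered)]


def validate_plan(plan: list[Allocation]) -> int:
    """Check that exactly 100 points are spent; return the feature share."""
    spent = sum(item.points for item in plan)
    if spent != TOTAL_POINTS:
        raise ValueError(f"plan spends {spent} points, expected {TOTAL_POINTS}")
    return sum(item.points for item in plan if item.confidence_by_points)


# Illustrative plan: non-feature work is carved out first, and the
# remaining points are negotiated across features.
plan = [
    Allocation("maintenance and automation", 15),
    Allocation("fly-in work", 10),
    Allocation("feature A", 30, {20: 0.80, 30: 0.90}),
    Allocation("feature B", 45, {40: 0.70, 50: 0.85}),
]

feature_points = validate_plan(plan)
print(f"points going to features: {feature_points}")
for item in plan:
    print(item.name, item.points, item.confidence())
```

Note what the plan records: how effort is split and how confident the developers are at that split. Nothing in it is a promise that a feature will be finished.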

The method addresses many of the shortcomings of the standard approach to story points:

Auto-normalizes to change – Estimates don’t need to be adjusted as the size or composition of the team changes as the commitment does not represent an amount of work to be completed. If a team member joins, confidence estimates might go up, and more work might be completed, but nothing “committed” needs to change. There’s always 100 points to spend.

Time commitment, not tracking – Rather than playing a semantic game about whether points are time, the method makes it explicit that they are. The commitment, however, is to spending a percentage of the team’s total time, not to what work will be completed.

Handles feature work similarly to other types – If the team decides they need to allocate 25 points to handle maintenance work and other non-functional upkeep, then they pull that out and allocate the remaining 75 points to features. This can help illustrate the effect of starving that kind of work earlier, when it begins to dominate the point expenditures.

Aligned Incentives – The business wants its highest priority features as quickly as possible; developers want to deliver those features at a sustainable rate. As long as both parties understand how time is to be allocated, it can be assumed that the engineering team will follow those priorities while making sure the overall product health is sound.
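The auto-normalizing property in the list above comes down to simple arithmetic, sketched below. The `hours_for_points` helper, the team sizes, and the 60 focus hours per person per Sprint are all assumed values for illustration, not part of the method itself.

```python
def hours_for_points(points: int, team_size: int, hours_per_person: float) -> float:
    """Convert a point allocation into hours for a given team.

    One point is always 1% of whatever capacity the team has this Sprint.
    """
    capacity_hours = team_size * hours_per_person
    return capacity_hours * points / 100


# The same 20-point allocation before and after a developer joins.
# The plan itself does not change; only the hours behind each point do.
before = hours_for_points(20, team_size=5, hours_per_person=60)
after = hours_for_points(20, team_size=6, hours_per_person=60)
print(before, after)

# Pulling 25 points out for maintenance leaves 75 points for features.
maintenance_points = 25
feature_points = 100 - maintenance_points
print(feature_points)
```

With a bigger team the 20 points represent more hours, so developers may quote higher confidence at the same allocation, but the allocation still adds up to 100.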

Other Advantages

An additional advantage of this approach is that it removes the counterproductive practice of getting “commitment” on feature work, which only serves to hold one side accountable. Instead, the business agrees that this is the correct allocation of effort based on the current priorities and has some expectation of what will be completed based on the developers’ estimates. Certainly the business has some right to ask questions if the team says 95% chance of completion with 50 story points and the feature is nowhere near completion by the end, or if delivery speed is not meeting a critical business need, but it helps frame the discussion around the challenges rather than around why the team ‘failed to deliver’.

It can be nice to show more than just done/not done, and be able to say “With the 20 points spent last Sprint, we increased our likelihood of completion to 95% this Sprint with only 5 points”. It also lends itself nicely to visualizations and other analysis since it works entirely on percentages of effort.

Next Up: Planning with 100 Story Points

One thought on “What do Story Points measure?”

  1. I’d really like our team to try this 100 story points method. I once thought that we should vote using chips on the work that we felt gave us the most business value. The twist that I like with the 100 story point method is that confidence vote that goes with it. I bet after a team used this method for two planning events, the confidence would be implicitly known for that team based on the amount of story points assigned. It would be great to see that on a graph.
