Lessons Learned in Release Management

Released a major feature on a Friday at 4pm. By 6pm, support tickets were flooding in. By Monday morning, we’d rolled back, apologised to customers, and I’d learned an expensive lesson about release timing.

Some lessons you learn from books. Some you learn by breaking production.

The Release That Taught Me Everything

At one startup with plenty of customers, we rebuilt our core engine. It was months of work, tested end-to-end and ready to ship. We decided on a big bang release - switch everyone over at once. Rationale: gradual rollout would create two code paths to maintain. A clean cut was better, of course.

We released on Thursday morning and within two hours, we discovered the new engine was 3x slower for a specific task that represented almost half of customer usage. Not broken - just slow enough to be frustrating.

We couldn’t roll back easily because we’d already migrated the database schema. We had to spend the weekend optimizing under pressure. By Monday it was fixed, but we’d lost customer trust and spent months rebuilding it.

The lessons:

Big bang releases are high-risk gambling
“Thoroughly tested” doesn’t include every edge case
Database migrations are one-way doors
Thursday releases are still Friday releases when things go wrong
Technical success doesn’t equal customer success

Understanding the Fundamentals

What Release Management Actually Is

Most people think release management is deployment mechanics: who pushes the button, what time, what’s the rollback plan. That’s release execution.

Release management is: deciding what to release, when, to whom, with what messaging, with what rollback options, and how to measure success.

At one team we shipped features whenever engineering finished them. We called it “continuous deployment.” It was closer to chaos deployment. Some features launched at 2am, when only two users noticed. Some launched in the afternoon when half our users were online. Some launched with no announcement. Some got blog posts. No consistency.

Changed to deliberate release cadence:

Minor updates: rolled out continuously via feature flags
Medium features: grouped into weekly releases, announced in-app
Major features: monthly releases with email, blog post, webinar

Gave customers predictability. Gave us time to prepare support and marketing. Gave us data on what worked.

Core Principles That Actually Matter

Principle one: Decouple deploy from release. Deploy code to production regularly. Release features to customers deliberately. Feature flags make this possible.

At one company, we deployed to production 10+ times daily. Most deployments were invisible - code behind flags, not released to anyone. When ready to release, we toggled flags for specific customer segments.

Meant we could deploy Friday afternoon without risk. The deploy wasn’t the release.
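Here is a minimal sketch of that split, assuming a home-grown flag object rather than any particular feature flag product. The FeatureFlag class, the new_engine flag name, and the segment names are all hypothetical. The point it tries to show: the code is deployed either way, and releasing is just a change to which segments the flag lists.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureFlag:
    """A deployed feature that is released only to the listed segments."""
    name: str
    enabled_segments: set[str] = field(default_factory=set)

    def is_released_to(self, user_segment: str) -> bool:
        return user_segment in self.enabled_segments


# Deployed on a Friday afternoon: the flag exists but lists no segments,
# so no customer sees the feature yet.
new_engine = FeatureFlag("new_engine")

# Released later, deliberately, by changing data instead of shipping code.
new_engine.enabled_segments.add("internal")
```

Turning the feature off again is the same kind of change in reverse, which is why a flag-guarded deploy carries so little risk.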

Principle two: Release to segments, not everyone. Internal team first, then beta customers, then 10% of users, then everyone. Catch issues when they affect 5 people, not 1000.
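One common way to pick "10% of users" is to hash each user into a stable bucket, so the same people stay in the rollout as the percentage grows. This is a sketch under that assumption, not a description of what we actually ran. The function names and the feature name are hypothetical.

```python
import hashlib


def rollout_bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket from 0 to 99 for one feature."""
    key = f"{feature}:{user_id}".encode("utf-8")
    return int(hashlib.sha256(key).hexdigest(), 16) % 100


def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """True if the user falls inside the first `percent` buckets."""
    return rollout_bucket(user_id, feature) < percent


# Anyone included at 10% is still included at 50%, because the bucket
# for a given user and feature never changes.
early_access = in_rollout("user-42", "new_engine", 10)
```

Including the feature name in the hash means a different slice of users goes first for each feature, so the same customers are not always the guinea pigs.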

Principle three: Measure impact immediately. Not “did it deploy successfully.” Did the feature get used? Did target metrics move? Did support volume spike?

At one company, we had a release dashboard that showed:

Feature adoption rate (first 24 hours)
Impact on key metrics (vs previous week)
Support ticket volume (vs baseline)
Error rates (vs previous deploy)

If anything looked wrong, we paused the rollout and investigated. The point is to catch problems early, before they become disasters.
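The "pause if anything looks wrong" rule can be written down as explicit guardrails. A rough sketch, assuming the four dashboard numbers are already collected somewhere. The ReleaseSnapshot type and every threshold below are made up for illustration, so pick your own.

```python
from dataclasses import dataclass


@dataclass
class ReleaseSnapshot:
    adoption_rate: float      # share of eligible users who used it in the first 24 hours
    key_metric_change: float  # relative change vs previous week (-0.05 means down 5%)
    ticket_ratio: float       # support tickets divided by baseline (1.0 means no change)
    error_ratio: float        # error rate divided by the previous deploy's rate


def reasons_to_pause(
    snap: ReleaseSnapshot,
    min_adoption: float = 0.02,      # hypothetical threshold
    max_metric_drop: float = 0.05,   # hypothetical threshold
    max_ticket_ratio: float = 1.5,   # hypothetical threshold
    max_error_ratio: float = 1.2,    # hypothetical threshold
) -> list[str]:
    """Return every guardrail the release is currently breaching."""
    reasons = []
    if snap.adoption_rate < min_adoption:
        reasons.append("adoption below target")
    if snap.key_metric_change < -max_metric_drop:
        reasons.append("key metric dropped")
    if snap.ticket_ratio > max_ticket_ratio:
        reasons.append("support tickets above baseline")
    if snap.error_ratio > max_error_ratio:
        reasons.append("error rate above previous deploy")
    return reasons
```

An empty list means keep rolling out. Anything else is a reason to pause and investigate.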

A Practical Framework

Step One: Pre-Release Checklist

Not a bureaucratic checklist. A forcing function for thinking through what could go wrong.

Questions to answer:

Can we roll back? How quickly? What’s the cost?
What’s the blast radius if this breaks? 100 users? 10,000?
What does success look like in first 24 hours?
Who needs to know this is launching?
What’s our communication plan if something goes wrong?

10 minutes of thinking prevents hours of firefighting.

Imagine discovering during the checklist that rollback would require a manual data migration for every customer. That would almost certainly change the deployment strategy from big bang to incremental. That’s what the checklist is for.
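If you want the checklist to actually block a release, one lightweight option is to keep the questions in code and refuse to proceed until each has a written answer. A sketch only. The CHECKLIST tuple and the unanswered helper are hypothetical names, and the questions are the ones listed above.

```python
CHECKLIST = (
    "Can we roll back? How quickly? What's the cost?",
    "What's the blast radius if this breaks?",
    "What does success look like in the first 24 hours?",
    "Who needs to know this is launching?",
    "What's our communication plan if something goes wrong?",
)


def unanswered(answers: dict[str, str]) -> list[str]:
    """Return the checklist questions that have no written answer yet."""
    return [q for q in CHECKLIST if not answers.get(q, "").strip()]


# A release with only the rollback question answered still has four gaps.
draft = {CHECKLIST[0]: "Yes, flag off in under a minute, no data cost."}
remaining = unanswered(draft)
```

The release is ready for review when unanswered returns an empty list.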

Step Two: Staged Rollout

Ship to internal team first. Full product team uses it in production for a week. Iron out obvious issues. Then beta customers who opted in. They’re forgiving and give detailed feedback. Then 10% random sample. Watch metrics closely. If something’s wrong, pause. Then 50%. Then everyone.

Once, we found a critical bug at the 10% stage. A small feature caused infinite loops for some users. It affected 3% of that 10%, or 0.3% of all users.

If we’d gone straight to 100%, it would have hit 10x more users. Staged rollouts contain the damage.
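The stage ladder and the blast radius arithmetic are simple enough to sketch. The stage names follow the sequence described above. The next_stage and affected_users helpers are hypothetical, and the user base of 100,000 is an invented number used only to make the 3% and 10% figures concrete.

```python
STAGES = ("internal", "beta", "10%", "50%", "100%")


def next_stage(current: str, metrics_healthy: bool) -> str:
    """Advance one stage only when metrics look healthy; otherwise hold."""
    index = STAGES.index(current)
    if not metrics_healthy or index == len(STAGES) - 1:
        return current
    return STAGES[index + 1]


def affected_users(total_users: int, rollout_percent: int, bug_percent: int) -> int:
    """Users hit by a bug that affects bug_percent of the exposed users."""
    return total_users * rollout_percent * bug_percent // 10_000


# Hypothetical user base of 100,000 with a bug hitting 3% of exposed users.
at_ten_percent = affected_users(100_000, 10, 3)     # 300 users
at_full_rollout = affected_users(100_000, 100, 3)   # 3000 users
```

Same bug, same 3% hit rate, ten times the damage. The only thing that changed is how many people were exposed when it was found.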

Step Three: Communication Cadence

Before release: Announce what’s coming to internal team and beta users. Set expectations.

During release: If it’s gradual, tell customers when they’ll get it. Nothing worse than “new feature!” emails when they don’t have access yet.

After release: Show adoption metrics to team. What worked? What didn’t? What did we learn?

One habit worth adopting: do a release retrospective 72 hours after every major release, while it’s still fresh.

Putting It Into Practice

Managing the Inevitable Release Disaster

Something will go wrong. Not if - when. How you handle it matters more than avoiding it.

The response pattern:

Acknowledge immediately (don’t go silent)
Stop the rollout if you can
Assess impact (how many customers, how bad)
Fix forward or roll back (depends on rollback cost)
Communicate timeline for fix
Do a public post-mortem after

Once we shipped a feature that broke email notifications for users. We discovered it six hours after the release.

Our response:

Posted immediately: “We’re aware email notifications aren’t working for users. We’re investigating.”
Rolled back within 30 minutes.
Posted update: “Issue resolved. Emails should be flowing normally. We’re investigating root cause and will share findings.”
Wrote public post-mortem explaining what happened, why, how we fixed it, what we’re changing to prevent recurrence

Customers appreciated the transparency. Trust recovered quickly because we handled it well.

Release Metrics That Actually Matter

Not “deployment success rate.” That’s an engineering metric.

Product release metrics:

Adoption rate: What percentage of eligible users tried the feature in first week?
Retention of feature: Of users who tried it, what percentage use it again?
Impact on north star metric: Did this move the number we care about?
Support volume: Did we create more work for support?

Key Takeaways

Right, let’s make this concrete:

Decouple deploy from release using feature flags - Ship code frequently. Release features deliberately to specific segments.
Always have a rollback plan - Understand the cost and time to revert. If rollback is expensive, use gradual rollout.
Release to segments, not everyone at once - Internal, beta, 10%, 50%, 100%. Catch problems small.
Measure impact immediately - Adoption, retention, metric movement, support volume. Know if it’s working within 24 hours.
Have a pre-release checklist - Who needs to know? What’s the rollback plan? What does success look like? 10 minutes of thinking saves hours of firefighting.
Do release retrospectives while it’s fresh - 72 hours after release, review what worked and what didn’t. Capture lessons before they’re forgotten.

Final Thoughts

Every product manager has a release disaster story. The Friday afternoon deploy that ruined the weekend. The big bang release that had to be rolled back. The feature that broke production. What’s yours?

The teams that get good at releases aren’t the ones who avoid problems. They’re the ones who contain problems, respond transparently, and learn systematically.

Start with one change this week: if you’re doing big bang releases, try a gradual rollout. If you’re not using feature flags, set them up. If you don’t have a pre-release checklist, create one.

Good release management isn’t about perfect execution. It’s about limiting blast radius when things inevitably go wrong.

Have questions or thoughts? Get in touch - I’d love to hear from you!