CloudNine SaaS: When the Founder Says 'We Have a Debt Problem'
How a non-technical founder's public acknowledgment of tech debt created the psychological safety an engineering team needed to fix a codebase in crisis
Company Profile
CloudNine SaaS
CloudNine SaaS is a B2B project management platform built for mid-market companies managing distributed teams. Founded three years ago by a non-technical founder with deep domain expertise in project management, the company raised a $28M Series B to scale its engineering team and accelerate feature delivery.
The platform runs on a React/Node.js/PostgreSQL stack, serving 2,400 paying customers with $6.2M in annual recurring revenue. The engineering culture from day one was classic startup: move fast and break things. That philosophy got them to product-market fit. It also got them into serious trouble.
- Engineers: 45
- Paying Customers: 2,400
- ARR: $6.2M
- Codebase Age: 3 years
The Situation
Moving Fast, Breaking Everything
CloudNine had shipped aggressively for three years. Early decisions -- choosing quick solutions over sustainable ones, skipping tests to hit demo deadlines, letting each team pick their own patterns -- had compounded into a codebase that was actively resisting new work. Despite hiring 15 more engineers in the past year, the team was shipping less than ever. The velocity charts told a story that no amount of hiring could fix.
Velocity Collapse
Feature velocity dropped 60% over 12 months despite hiring 15 more engineers. The team was growing but output was shrinking -- a classic sign of compounding tech debt.
Review Bottleneck
Average PR review time ballooned from 4 hours at launch to 4.5 days. PRs were so tangled with unrelated concerns that reviewers could not confidently approve anything without deep investigation.
47-Minute Test Suite
The test suite took 47 minutes to run. Developers stopped running it locally, pushing untested code to CI and waiting 47 minutes to find out whether they had broken something. The feedback loop was destroyed.
Database Archaeology
The database had 340 unused columns across 89 tables. Nobody knew which columns were safe to remove because no documentation existed and the ORM layer obscured actual usage patterns.
Six State Management Approaches
The React frontend used six different state management approaches -- Redux, MobX, Context API, local state, custom hooks, and one team's homegrown solution. Every new developer had to learn all six to work across the app.
3-Week Onboarding
Onboarding a new engineer took 3 weeks before their first meaningful commit. The local dev environment was fragile, documentation was outdated, and tribal knowledge was the primary source of truth.
Production Incident Surge
Production incidents climbed from 1-2 per month a year ago to 3-4 per week. The on-call rotation became dreaded. Two engineers asked to be removed from it entirely.
Warning Signs
The signs were accumulating for months. Each one alone seemed like a normal growing pain. Together, they painted a picture of a startup approaching a breaking point.
Five Sprints of Decline
Sprint velocity charts showed steady decline for 5 consecutive sprints. Each retrospective identified "tech debt" as a blocker. Each sprint planning session promised "we'll address it next sprint."
Working Around, Not Through
Senior engineers started working around the codebase instead of through it -- duplicating modules rather than refactoring shared ones, adding new endpoints instead of fixing broken abstractions.
Tech Leads Walking Out
Two tech leads quit within a month of each other. Both cited the same frustration in exit interviews: "We're just patching patches at this point. There's no path to making this codebase healthy."
Uncomfortable Questions
Series B investors asked about "engineering efficiency" in the quarterly review. They had noticed the team grew 50% but feature delivery slowed. The board was starting to wonder where the money was going.
"Why Does Everything Take 3x?"
The product team was openly frustrated. Features that used to take a week now took three. The product manager started adding "tech complexity buffer" to every estimate -- a tax that kept growing.
The Breaking Point
The CTO's Data Dump
The CTO compiled six months of engineering metrics and presented them to the founder: "We are shipping 60% less than a year ago with 50% more engineers. Our cost per feature has tripled. At this trajectory, we will miss our Series C targets by a wide margin."
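The CTO's figures can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes output and headcount are both normalized to 1.0 a year ago and that cost scales linearly with headcount, neither of which is stated in the case study.

```python
def cost_per_feature_multiplier(output_change: float, headcount_change: float) -> float:
    """Return how many times more each feature costs relative to the baseline.

    output_change: fractional change in features shipped (-0.60 means 60% less).
    headcount_change: fractional change in engineers (+0.50 means 50% more).
    Assumes cost is proportional to headcount (a simplification).
    """
    relative_output = 1.0 + output_change
    relative_cost = 1.0 + headcount_change
    if relative_output <= 0:
        raise ValueError("output must stay above zero to compute a ratio")
    return relative_cost / relative_output


multiplier = cost_per_feature_multiplier(output_change=-0.60, headcount_change=0.50)
print(f"Cost per feature: {multiplier:.2f}x baseline")
```

Under these assumptions the multiplier comes out at roughly 3.75, so "tripled" is, if anything, a conservative summary of the data.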
The Founder's Skepticism
The founder -- a non-technical product visionary -- was initially skeptical. "Are we sure this isn't just a people problem? Do we need better engineers?" But the CTO walked through the data methodically: the same engineers who were productive a year ago were now struggling. The codebase had changed, not the people.
The All-Hands Moment
At the next company all-hands, the founder said something no one expected: "We moved fast and broke things. That got us here. But now we need to fix what we broke. This is not an engineering problem -- it is a company priority. And it starts with me admitting that we let this happen."
Psychological Safety Unlocked
That single statement changed everything. Engineers who had been afraid to flag debt -- worried they would be seen as slow or negative -- suddenly felt safe to speak up. Within a week, the team had cataloged 147 debt items they had been silently working around. The backlog that had been invisible was now visible.
The Playbook: 10 Months to Recovery
CloudNine structured their recovery as four phases, each with clear goals and measurable outcomes. The founder stayed visibly involved throughout, attending weekly engineering syncs and publicly celebrating debt reduction milestones.
Phase 1: Fix It February
The founder declared "Fix It February" -- no new features for four weeks. The entire engineering team focused exclusively on the highest-impact debt items. This was controversial with the product team, but the founder's backing made it non-negotiable.
- Reduced test suite from 47 minutes to 8 minutes through parallel execution and removing 200+ flaky tests
- Standardized on a single state management approach (Zustand) with a migration guide for the six legacy patterns
- Implemented PR size limits and required focused, single-concern pull requests
Result: PR review time dropped from 4.5 days to 1.5 days
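A PR size limit is typically enforced as a CI check. The case study does not say how CloudNine implemented theirs, so the following is only a minimal sketch: it assumes the check receives the text output of `git diff --numstat`, and the 400-line threshold, file paths, and function names are hypothetical.

```python
MAX_CHANGED_LINES = 400  # hypothetical threshold, not taken from the case study


def count_changed_lines(numstat_output: str) -> int:
    """Sum added and deleted lines from `git diff --numstat` output.

    Each line holds three tab-separated fields: added, deleted, path.
    Binary files report "-" for both counts and are skipped here.
    """
    total = 0
    for line in numstat_output.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # ignore blank or malformed lines
        added, deleted = parts[0], parts[1]
        if added == "-" or deleted == "-":
            continue  # binary file, no line counts available
        total += int(added) + int(deleted)
    return total


def check_pr_size(numstat_output: str, limit: int = MAX_CHANGED_LINES) -> tuple[bool, int]:
    """Return (within_limit, changed_lines) for a pull request diff."""
    changed = count_changed_lines(numstat_output)
    return changed <= limit, changed


# Hypothetical diff summary for illustration.
sample = "120\t30\tsrc/api/projects.js\n-\t-\tassets/logo.png\n300\t10\tsrc/ui/Board.jsx\n"
ok, changed = check_pr_size(sample)
print(f"{changed} changed lines; within limit: {ok}")
```

A line-count limit is a blunt instrument: it does not guarantee a single-concern PR, but it makes oversized ones visible before a reviewer has to open them.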
Phase 2: Clean the Foundation
With the immediate pain points addressed, the team tackled the structural issues that were silently multiplying every bug and slowing every feature.
- Database audit: removed 280 of 340 unused columns and added proper indexes based on actual query patterns
- Consolidated 6 different API patterns into a consistent RESTful design with shared middleware
- Implemented proper error boundaries and monitoring with Sentry for error tracking and Datadog for performance
Result: Production incidents dropped from 3-4 per week to 1 per week
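The case study does not describe how the database audit identified safe-to-remove columns. One plausible starting point, sketched below as an assumption, is to combine PostgreSQL's `pg_stats` view (whose `null_frac` column reports the fraction of NULL values per column) with a list of columns known to be referenced by the application. The table names, column names, and the `referenced` set are hypothetical sample data.

```python
from typing import Iterable, NamedTuple

# Reference query for collecting the statistics; run ANALYZE first so they are fresh.
COLUMN_STATS_SQL = """
SELECT tablename, attname, null_frac
FROM pg_stats
WHERE schemaname = 'public'
ORDER BY tablename, attname;
"""


class ColumnStat(NamedTuple):
    table: str
    column: str
    null_frac: float  # fraction of NULL rows, as reported by pg_stats


def removal_candidates(
    stats: Iterable[ColumnStat], referenced: set[tuple[str, str]]
) -> list[ColumnStat]:
    """Return columns that are entirely NULL and not referenced by known code.

    These are candidates for review only; a column can be empty today and
    still be written by a rarely used code path.
    """
    return sorted(
        s for s in stats
        if s.null_frac >= 1.0 and (s.table, s.column) not in referenced
    )


# Hypothetical rows, shaped like the result of COLUMN_STATS_SQL.
stats = [
    ColumnStat("projects", "legacy_code", 1.0),
    ColumnStat("projects", "name", 0.0),
    ColumnStat("tasks", "old_flag", 1.0),
    ColumnStat("tasks", "archived_at", 0.97),
]
referenced = {("tasks", "old_flag")}  # e.g. found in query logs or code search
candidates = removal_candidates(stats, referenced)
print([f"{c.table}.{c.column}" for c in candidates])
```

An all-NULL column is only one signal of disuse. Columns that hold stale data would need a different check, such as comparing query logs against the schema.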
Phase 3: Make It Pleasant to Work Here
With the codebase stabilized, the team focused on developer experience -- making it fast and enjoyable to contribute. This phase was critical for retention and for making sure the improvements would stick.
- Rebuilt local development environment with Docker Compose -- full stack running in 2 minutes from a fresh clone
- Created comprehensive API documentation auto-generated from OpenAPI spec with interactive examples
- Built an onboarding guide with a "first PR in 2 days" guarantee -- including a curated list of starter issues
Result: New engineer onboarding dropped from 3 weeks to 3 days
Phase 4: Never Again
The final phase focused on making debt management a permanent part of how CloudNine operates -- not a one-time cleanup but a continuous practice backed by leadership.
- Established the "20% rule" -- one day per week dedicated to tech debt, mandated by the founder and protected from feature pressure
- Created a tech debt backlog visible to the entire company -- not hidden in engineering Jira, but on a shared dashboard anyone could see
- Monthly "State of the Codebase" report shared with investors -- proactively addressing engineering health before they ask
Result: Feature velocity recovered to 110% of pre-debt baseline
Results: Before vs After
Comparison of key metrics before and after the 10-month recovery program
Key Metrics
- Feature Velocity: 40% of baseline (declining) before, 110% of baseline after
- PR Review Time: 4.5 days before, 19 hours after (82% reduction)
- Test Suite: 47 minutes before, 8 minutes after (83% faster)
- Onboarding: 3 weeks before, 3 days after (86% faster)
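The percentage changes quoted for these metrics can be reproduced with a short calculation. This sketch assumes a day means 24 hours and that three weeks means 21 calendar days; the case study does not state which convention it used.

```python
def percent_reduction(before: float, after: float) -> float:
    """Return the reduction from `before` to `after` as a percentage of `before`."""
    return (before - after) / before * 100


# Before/after pairs in a common unit per metric (unit conversions are assumptions).
metrics = {
    "PR review time (hours)": (4.5 * 24, 19),
    "Test suite (minutes)": (47, 8),
    "Onboarding (calendar days)": (21, 3),
}

for name, (before, after) in metrics.items():
    print(f"{name}: {percent_reduction(before, after):.0f}% reduction")
```

With those conventions the three figures round to 82%, 83%, and 86%, matching the reported numbers.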
Lessons Learned
Founder Acknowledgment Creates Safety
When the founder publicly acknowledged debt, it created psychological safety for the entire engineering organization. Engineers stopped hiding problems and started surfacing them. The backlog of invisible debt became visible overnight -- and visible debt is fixable debt.
Dedicated Sprints Break the Cycle
"Fix It February" broke the cycle of endless patching. When the entire team focuses on debt simultaneously, you get compound benefits: fixing one system makes fixing the next one easier. Piecemeal debt work in normal sprints never achieves this momentum.
Visible Backlogs Kill the Sandbagging Narrative
Making the tech debt backlog visible to non-engineers prevented the "engineering is sandbagging" narrative. When product managers and executives can see the debt items and their impact, the conversation shifts from "why is engineering slow" to "how do we prioritize these fixes."
Share Health Metrics With Investors
Sharing codebase health with investors builds trust and preempts uncomfortable questions. CloudNine's monthly "State of the Codebase" report turned a potential investor concern into a strength: "This team understands their technical risk and manages it proactively."
The 20% Rule Only Works From the Top
The 20% rule for tech debt only works when it comes from the top. Engineering-only policies get overridden by product pressure every time. When the founder says "one day a week for debt, no exceptions," it sticks. When engineering says it, it lasts until the next urgent feature request.
"If you're a founder and your engineers keep asking for 'refactoring time,' listen. The longer you wait, the more it costs. And if you can't say 'we have a debt problem' in an all-hands, your team can't fix it. That one sentence -- spoken publicly, by the person at the top -- was worth more than any Jira board or sprint planning session we ever ran."
-- Advice from CloudNine's CTO, reflecting on the turnaround
Frequently Asked Questions
How common is tech debt in fast-growing startups?
Extremely common. Nearly every startup that achieves product-market fit accumulates significant tech debt in the process. The "move fast and break things" culture is effective for finding what works, but it creates a compounding problem: every shortcut taken to ship faster makes the next feature slower to build. The key is not avoiding debt entirely -- that is unrealistic for a startup -- but recognizing when the debt is costing more than the speed it originally bought you. CloudNine hit that point when velocity dropped 60% despite growing the team by 50%.
Why does founder involvement matter for tech debt?
Founder involvement matters because tech debt remediation requires trade-offs that only leadership can authorize. Pausing features, dedicating sprint capacity, and protecting debt-reduction time from product pressure all require top-level backing. More importantly, when a founder publicly acknowledges debt, it signals that flagging technical problems is valued, not punished. Engineers in many organizations stay silent about debt because raising it is seen as being negative or slow. A founder saying "this is my priority" transforms the culture overnight.
Should we run a dedicated debt sprint or allocate ongoing time?
Both have their place. A dedicated debt sprint like CloudNine's "Fix It February" is best when debt has accumulated to the point where incremental fixes cannot keep up. The compound effect of the entire team focusing on debt simultaneously produces results that drip-fed work across normal sprints cannot match. However, dedicated sprints are not sustainable as a long-term strategy. After the initial sprint, CloudNine transitioned to a 20% weekly allocation for ongoing debt management. The ideal approach is an aggressive sprint to stop the bleeding, followed by a sustained weekly allocation to keep the debt from building up again.
How should we talk to investors about tech debt?
Proactively and in business terms. Investors understand that startups accumulate debt -- what concerns them is whether the team is aware of it and managing it. CloudNine's monthly "State of the Codebase" report included metrics investors care about: feature velocity trends, incident rates, onboarding speed, and deployment frequency. Frame debt remediation as an investment in engineering leverage, not a confession of past mistakes. Show the cost of inaction (declining velocity, rising incidents) alongside the cost of remediation and the expected return. Investors who see data-driven engineering leadership are more confident, not less.
What metrics should we track to catch tech debt early?
Focus on leading indicators that reveal debt before it becomes a crisis. Key metrics include: PR review time (rising review times signal growing complexity), test suite duration (slow tests mean developers skip them), time to first commit for new hires (measures onboarding friction), incident frequency and mean time to recovery (measures codebase stability), and sprint velocity trends over 4-6 sprint windows (a single slow sprint means nothing -- five consecutive declining sprints mean everything). Track these on a dashboard visible to the whole team, not buried in engineering reports.
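The distinction between one slow sprint and a sustained decline is easy to automate. The sketch below is one possible way to flag it, assuming velocity is recorded as a single number per sprint; the threshold of five declines mirrors the text, while the function names and sample values are hypothetical.

```python
def longest_declining_streak(velocities: list[float]) -> int:
    """Return the largest number of consecutive sprint-over-sprint declines."""
    longest = current = 0
    for previous, latest in zip(velocities, velocities[1:]):
        if latest < previous:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # a flat or improving sprint resets the streak
    return longest


def is_sustained_decline(velocities: list[float], threshold: int = 5) -> bool:
    """Flag a trend once `threshold` consecutive declines have occurred."""
    return longest_declining_streak(velocities) >= threshold


# Hypothetical story-point totals per sprint.
noisy = [42, 44, 40, 41]
declining = [42, 40, 37, 33, 30, 26]
print(is_sustained_decline(noisy), is_sustained_decline(declining))
```

Note that five consecutive declines require six data points, so a dashboard built on this check needs at least six sprints of history before it can raise the flag.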
How do we prevent tech debt from building up again?
Prevention requires three things working together: protected capacity, visible tracking, and cultural reinforcement. Protected capacity means a non-negotiable percentage of sprint time for debt work -- CloudNine uses 20%, mandated by the founder. Visible tracking means a debt backlog that the entire company can see, not a hidden engineering list. Cultural reinforcement means celebrating debt reduction the same way you celebrate feature launches. CloudNine added a "debt item of the week" to their all-hands meeting and gave quarterly awards for the highest-impact debt fixes. When debt reduction is visible, protected, and celebrated, it becomes part of how the team operates rather than something they do when things get bad enough.