EduPlatform: When Every Feature Touches 14 Services
How an edtech company discovered that splitting a monolith by code file instead of business domain created a distributed monolith worse than the original
Company Profile
EduPlatform Global
EduPlatform Global is an online learning platform serving 2.8 million students and 45,000 instructors across 60 countries. With 110 engineers organized into 8 product teams, the company delivers live classes, recorded courses, assessments, and certification programs for universities and corporate training departments.
Their platform originally ran as a Ruby on Rails monolith. Between 2021 and 2023, the team extracted it into what they called a "microservices" architecture -- 67 services in total. But the migration happened without architectural governance, and the result was something far worse than what they started with.
2.8M Students | 110 Engineers | 67 Services | 8 Product Teams
The Situation
A Distributed Monolith
EduPlatform spent two years migrating from a Rails monolith to microservices. But nobody defined service boundaries based on business domains -- they simply split the monolith by code file. The User model became its own service. The Enrollment model became another. The Payment module became a third. Every business operation that previously called methods in the same process now made network calls across services that were never designed to be independent. The result was a distributed monolith with all the complexity of microservices and none of the benefits.
Tightly Coupled Services
Most features required changes to 8-14 services simultaneously. The student enrollment flow alone touched 14 services -- if any single one failed, enrollment broke completely. Services were not independent; they were a monolith connected by HTTP calls instead of function calls.
Circular Dependencies
23 pairs of services had circular dependencies -- Service A called Service B, which called Service A right back. Debugging a single request meant tracing calls across 6-8 services, often in circles. No API contracts existed; services called each other's internal methods via a shared database.
The Deploy Train
Average feature deployment required coordinated releases across 6+ services. The team ran a "deploy train" every two weeks involving 30+ engineers to orchestrate the release order. Independent deployment -- the primary benefit of microservices -- was completely impossible.
Feature Delivery Collapsed
Feature delivery time increased from 2 weeks to 8 weeks after the microservices migration. What was supposed to make teams faster made them four times slower. Every pull request required approval from at least 3 other teams because changes rippled across service boundaries.
Cascading Failures
The service mesh complexity caused cascading failures that took down the entire platform. Three full outages in a single semester, each triggered by one service failure that propagated through the dependency chain. Students could not access courses during midterm week.
Warning Signs
The symptoms were clear for months, but the team kept attributing them to growing pains rather than fundamental architectural problems.
4x Slower Delivery
Feature delivery time quadrupled from 2 weeks to 8 weeks after the microservices migration. The team was shipping fewer features with more engineers than before the migration started.
Biweekly Deploy Train
Coordinated releases every two weeks involving 30+ engineers. Half a day lost to deployment coordination every sprint. Services could not deploy independently because of shared state and circular calls.
3 Full Outages Per Semester
Service mesh complexity caused cascading failures. One service going down pulled others with it because of synchronous dependency chains and no circuit breakers. Students lost access during critical exam periods.
Enrollment Failures
Students unable to enroll during peak registration periods. The enrollment flow touched 14 services, and peak traffic exposed every weakness in the dependency chain simultaneously.
Cross-Team PR Bottleneck
8 teams, but every pull request required approval from at least 3 other teams. Service boundaries did not align with team boundaries, so every change crossed organizational lines.
The Breaking Point
Fall Semester Enrollment Outage
A 12-hour outage during fall semester enrollment affected 340,000 students. The enrollment service triggered a cascade through 14 dependent services, each failing in sequence. The on-call team spent 8 hours just identifying which service had failed first because the distributed tracing was incomplete and the circular dependencies made the failure path impossible to follow.
University Partner Ultimatum
University partners representing 40% of revenue demanded SLA guarantees. Three major universities began evaluating competing platforms. The enrollment outage happened during their busiest week, and their students were the ones affected. The message was clear: fix this or lose the contracts.
Board Mandate
The board mandated: "Fix the architecture or find a buyer." The revenue at risk from university partner churn exceeded the total cost of remediation. For the first time, architecture debt was framed not as an engineering problem but as a business survival question.
The Playbook: 18 Months to Recovery
EduPlatform structured their remediation as four phases, starting with emergency stabilization and ending with governance that prevents the same mistakes from recurring.
Emergency Stabilization
- Implemented circuit breakers on all inter-service calls to prevent cascading failures
- Created the "enrollment critical path" -- identified and hardened the 14 services that enrollment depended on
- Added comprehensive distributed tracing with Jaeger across all service boundaries
Result: Zero enrollment outages for spring semester
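The circuit breakers in this phase can be sketched in plain Ruby. This is a minimal illustration of the pattern, not EduPlatform's actual implementation (they would more likely use a library such as Resilience4j-style gems or mesh-level policies); the class name, thresholds, and error type are all invented for this example.

```ruby
# Raised instead of making a network call while the circuit is open.
class OpenCircuitError < StandardError; end

# Minimal circuit breaker: after `failure_threshold` consecutive
# failures the circuit opens and calls fail fast for `reset_after`
# seconds, so a struggling downstream service is not hammered into
# a cascading failure. All names and defaults are illustrative.
class CircuitBreaker
  def initialize(failure_threshold: 5, reset_after: 30)
    @failure_threshold = failure_threshold
    @reset_after = reset_after   # seconds before a trial call is allowed
    @failures = 0
    @opened_at = nil
  end

  def call
    raise OpenCircuitError, "circuit open, failing fast" if open?
    result = yield
    @failures = 0                # any success closes the circuit
    result
  rescue OpenCircuitError
    raise
  rescue StandardError
    @failures += 1
    @opened_at = Time.now if @failures >= @failure_threshold
    raise
  end

  def open?
    return false if @opened_at.nil?
    if Time.now - @opened_at > @reset_after
      @opened_at = nil           # half-open: let one trial call through
      @failures = 0
      false
    else
      true
    end
  end
end
```

Wrapping every inter-service call in a breaker like this is what turns "one service down" into a degraded feature instead of a platform-wide outage.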
Domain Redesign
- Hired a domain-driven design consultant to facilitate bounded context workshops with every team
- Redefined service boundaries around business domains (enrollment, content delivery, assessment, payments) instead of code structure
- Identified that 67 services could be consolidated to 12 bounded contexts; broke all circular dependencies with event-driven communication
Result: Feature deployment reduced from 6+ services to 1-2 services per change
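The event-driven decoupling described above can be illustrated with a toy in-process bus. A real deployment would use a broker such as Kafka or RabbitMQ; the bus class, topic names, and payload fields here are all invented for illustration.

```ruby
# Toy in-process event bus showing how events break a circular
# dependency. Before: Enrollment called Billing synchronously, and
# Billing called Enrollment back to confirm -- a cycle. After:
# Enrollment publishes a fact and never hears about Billing again.
class EventBus
  def initialize
    @subscribers = Hash.new { |hash, topic| hash[topic] = [] }
  end

  def subscribe(topic, &handler)
    @subscribers[topic] << handler
  end

  def publish(topic, payload)
    @subscribers[topic].each { |handler| handler.call(payload) }
  end
end

bus = EventBus.new

# Billing reacts to enrollment events; it does not call Enrollment back.
bus.subscribe("enrollment.created") do |event|
  bus.publish("invoice.issued", student_id: event[:student_id])
end

# Enrollment's only responsibility: announce that the enrollment exists.
bus.publish("enrollment.created", student_id: 42, course_id: 7)
```

The dependency now points one way only -- Billing knows about enrollment events, but Enrollment knows nothing about Billing -- which is exactly the property the 23 circular pairs lacked.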
Service Consolidation
- Merged 67 services into 18 well-bounded services -- not the 12 originally planned, but a pragmatic compromise based on team structure and deployment needs
- Implemented proper API contracts using OpenAPI specs and consumer-driven contract testing between all services
- Eliminated shared database access -- each service owns its data, communicating through well-defined APIs and events
Result: Feature delivery from 8 weeks to 2 weeks; deploy train eliminated
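The consumer-driven contract testing mentioned above can be sketched without a dedicated tool (in practice a framework like Pact would manage this). The idea: each consumer records which fields of a provider's response it actually depends on, and the provider verifies those expectations in its own CI before deploying. The endpoint name and fields below are invented for illustration.

```ruby
# Consumer-side contract: the enrollment UI declares the only fields
# it reads from the (hypothetical) enrollments endpoint. Extra fields
# in the response are allowed -- the provider can evolve freely as
# long as these expectations keep holding.
CONTRACT = {
  "GET /enrollments/:id" => {
    "id"         => Integer,
    "student_id" => Integer,
    "status"     => String
  }
}

# Provider-side check, run in the provider's CI: does the handler's
# real output satisfy every consumer expectation?
def satisfies_contract?(endpoint, response)
  CONTRACT.fetch(endpoint).all? do |field, type|
    response.key?(field) && response[field].is_a?(type)
  end
end

response = { "id" => 9, "student_id" => 42, "status" => "active",
             "created_at" => "2024-01-15" }  # extra field: still fine
satisfies_contract?("GET /enrollments/:id", response)
```

Because compatibility is verified asynchronously in each side's pipeline, no coordinated release is needed to prove two services still agree -- which is precisely what makes the deploy train unnecessary.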
Architecture Governance
- Created Architecture Decision Records (ADR) process requiring documented rationale for every new service or major API change
- Implemented automated architecture fitness functions that detect coupling violations, circular dependencies, and shared database access in CI
- Quarterly "Architecture Health" review correlating architecture metrics with business outcomes; team topology reorganized around domains, not layers
Result: Architecture health score tracked and improving quarter over quarter
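A fitness function like the ones in this phase can be as simple as a CI script that walks the service dependency graph and fails the build on a cycle. The detector below is a generic depth-first search; the dependency data would in practice be derived from service manifests or call traces, and the service names here are hypothetical.

```ruby
# Architecture fitness function: return the first dependency cycle
# found in a service graph (as a list of service names), or nil if
# the graph is acyclic. CI can fail the build whenever this is non-nil.
def find_cycle(graph)
  state = Hash.new(:unvisited)
  path = []

  visit = lambda do |node|
    return nil if state[node] == :done
    if state[node] == :in_progress
      return path[path.index(node)..] + [node]   # cycle closed here
    end
    state[node] = :in_progress
    path << node
    graph.fetch(node, []).each do |dep|
      cycle = visit.call(dep)
      return cycle if cycle
    end
    path.pop
    state[node] = :done
    nil
  end

  graph.keys.each do |node|
    cycle = visit.call(node)
    return cycle if cycle
  end
  nil
end

# Example graph (service names invented): billing and enrollment
# still call each other, so CI would flag this as a violation.
deps = {
  "enrollment" => ["billing"],
  "billing"    => ["enrollment"],
  "catalog"    => []
}
violation = find_cycle(deps)
puts "coupling violation: #{violation.join(' -> ')}" if violation
```

Running a check like this on every commit is what turns "architecture governance" from a quarterly review into an automated gate.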
Results: Before vs After
Comparison of key metrics before and after the 18-month architecture remediation
Key Metrics
Feature Delivery: 8 weeks -> 2 weeks (75% reduction)
Services per Feature: 14 -> 1.5 avg (89% reduction)
Service Count: 67 -> 18 (well-bounded)
Coordinated Deploys: every 2 weeks -> none (eliminated)
Lessons Learned
Distributed Monolith Is Worse
A microservices architecture without bounded contexts is just a distributed monolith with network latency added. You get all the operational complexity of microservices -- deployment coordination, distributed tracing, network failures -- with none of the independence benefits.
Code File Is Not a Domain
Splitting by code file instead of business domain creates services that are coupled by definition. The User model, Enrollment model, and Payment model all serve the same business process. Separating them into services did not create independence -- it created network overhead between tightly coupled components.
Architecture Debt Compounds Fastest
Architecture debt compounds faster than code debt. Each new service added more coupling to the existing mesh. Every feature built on top of the wrong boundaries made the boundaries harder to fix. The cost of remediation grew exponentially while the team thought they were making progress.
67 Services Were Not Microservices
Having 67 services did not mean having microservices. It meant having a monolith with extra steps -- network hops, serialization overhead, deployment coordination, and distributed debugging. The number of services is irrelevant; what matters is whether each service can be developed, deployed, and scaled independently.
DDD Before Migration
Domain-Driven Design should happen before the migration, not after. EduPlatform spent two years splitting a monolith without understanding their domains, then spent 18 months undoing the damage. The DDD workshops that eventually fixed the architecture would have cost a fraction of the remediation if done upfront.
The Deploy Train Tells the Truth
The "deploy train" was the clearest symptom that service boundaries were wrong. If you need coordinated releases, your services are not independent. The deploy train is not a process problem to be optimized -- it is an architecture problem to be fixed at the root.
"If adding a feature requires coordinated deployments across more than 2 services, you don't have microservices. You have a distributed monolith. Fix the boundaries before adding more services."
-- Lesson from EduPlatform Global's architecture team, shared at a platform engineering conference
Frequently Asked Questions
What is a distributed monolith, and how do I know if I have one?
A distributed monolith is a system that has the deployment topology of microservices but the coupling characteristics of a monolith. The clearest indicators: you cannot deploy one service without coordinating with others, a change in one service requires changes in multiple other services, services share a database or call each other's internal APIs, and you have circular dependencies between services. If your "microservices" require a deployment train, you almost certainly have a distributed monolith.
How should service boundaries be defined?
Service boundaries should follow business domains, not code structure. Use Domain-Driven Design techniques like event storming and bounded context mapping to identify natural boundaries where business processes are relatively independent. A well-bounded service owns a complete business capability: its data, its logic, and its API. The test is simple -- can this service be developed, tested, deployed, and scaled by a single team without coordinating with others? If not, the boundary is in the wrong place.
When should Domain-Driven Design happen in a migration?
DDD should happen before the first service is extracted from the monolith. The entire point of DDD is to identify the right boundaries -- splitting a monolith without understanding your domains guarantees you will draw boundaries in the wrong places. Invest 4-8 weeks in event storming workshops, context mapping, and domain modeling before writing any migration code. This upfront investment prevents the far more expensive mistake of building 67 tightly coupled services that need to be consolidated later.
How is architecture debt different from code-level debt?
Code-level debt -- duplicate functions, missing tests, outdated libraries -- is localized and can be fixed incrementally. Architecture debt affects the entire system structure: wrong service boundaries, missing API contracts, circular dependencies, shared databases. Architecture debt compounds faster because every feature built on wrong boundaries reinforces those boundaries. Fixing architecture debt requires coordinated, system-wide changes that are orders of magnitude more expensive than fixing code debt. Prevent it with upfront design; code debt can be managed with ongoing cleanup.
Can a distributed monolith be consolidated incrementally?
Yes, and EduPlatform proved it. They used a strangler fig approach to merge services incrementally. Start by identifying which services belong to the same bounded context. Route traffic through a new consolidated service while keeping the old services running. Migrate data ownership one entity at a time. Replace synchronous inter-service calls with internal method calls within the consolidated service. The key is doing it incrementally -- merging two services at a time rather than attempting a big-bang consolidation.
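The routing step of that strangler fig approach can be sketched as a small facade: migrated paths go to the consolidated service, everything else still goes to the legacy services. The router class, path prefixes, and handlers below are all invented for illustration.

```ruby
# Strangler-fig routing facade: a growing allowlist of path prefixes
# is served by the new consolidated service while the legacy services
# keep handling the rest. Consolidation proceeds by moving one prefix
# at a time, with instant rollback (remove the prefix) if needed.
class StranglerRouter
  def initialize(consolidated:, legacy:, migrated_prefixes: [])
    @consolidated = consolidated       # callable for the new service
    @legacy = legacy                   # callable for the old services
    @migrated_prefixes = migrated_prefixes
  end

  def route(path)
    if @migrated_prefixes.any? { |prefix| path.start_with?(prefix) }
      @consolidated.call(path)
    else
      @legacy.call(path)
    end
  end
end

router = StranglerRouter.new(
  consolidated: ->(path) { "consolidated handled #{path}" },
  legacy:       ->(path) { "legacy handled #{path}" },
  migrated_prefixes: ["/enrollments", "/waitlists"]  # moved so far
)
router.route("/enrollments/42")  # new bounded-context service
router.route("/grades/9")        # still the old fine-grained services
```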
How do we eliminate the deploy train?
Deploy coordination is a symptom of coupling, not a process problem. Fix it at the root: implement proper API contracts with versioning so services can evolve independently. Replace synchronous calls with event-driven communication where possible. Eliminate shared database access so each service owns its data. Use consumer-driven contract testing to verify compatibility without coordinated releases. Once services are truly independent -- they own their data, expose stable APIs, and communicate through events -- each team can deploy on their own schedule.
Is Your Architecture Creating Hidden Debt?
Architecture debt compounds faster than code debt. Learn the techniques to identify wrong boundaries, fix coupling, and prevent distributed monolith patterns.