Microservices
Monolith-first thinking: drawing boundaries, data ownership, sync vs async comms, sagas, resilience patterns.
Start with a monolith
Most products don't need microservices. A well-structured modular monolith — clear internal module boundaries, one deploy, one database — is faster to build, easier to debug, and trivially consistent. Split out a service only when you have a concrete reason: independent scaling, team ownership, isolation of a risky/heavy component (e.g. an ML inference service), or a different language/runtime need.
Drawing boundaries
- Split by business capability / bounded context, not by technical layer. "Billing", "Notifications", "Inventory" — not "the database service" and "the API service".
- Each service owns its data. No other service reaches into its database directly; access is only through its API or events. Shared databases recreate all the coupling you were trying to escape.
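A minimal sketch of the data-ownership rule, written as two in-process modules so it applies equally to a modular monolith. The names `InventoryService`, `OrderService`, `reserve` and `place_order` are hypothetical, and the dict stands in for a private database.

```python
from dataclasses import dataclass, field


@dataclass
class InventoryService:
    """Owns stock levels; _stock stands in for this service's private database."""

    _stock: dict[str, int] = field(default_factory=dict)

    def set_stock(self, sku: str, quantity: int) -> None:
        self._stock[sku] = quantity

    def available(self, sku: str) -> int:
        return self._stock.get(sku, 0)

    def reserve(self, sku: str, quantity: int) -> bool:
        """Reserve stock if enough is available. The invariant lives only here."""
        if quantity <= 0 or self.available(sku) < quantity:
            return False
        self._stock[sku] -= quantity
        return True


@dataclass
class OrderService:
    """Depends on Inventory's public API, never on its storage."""

    inventory: InventoryService

    def place_order(self, sku: str, quantity: int) -> str:
        # No direct read or write of inventory data: only the API call.
        return "confirmed" if self.inventory.reserve(sku, quantity) else "rejected"
```

Because `OrderService` only ever calls `reserve`, Inventory can change its storage layout, or later move behind a network API, without Orders noticing.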
Communication
- Sync (REST/gRPC) when the caller needs an answer now. gRPC for internal, high-throughput, typed contracts; REST for external/public.
- Async (events) when the caller can fire-and-forget. Async is the default for cross-service side effects — it removes temporal coupling and improves resilience.
- Avoid deep synchronous call chains (A→B→C→D). One slow link stalls the whole request, and the request succeeds only if every hop does, so failure probability compounds with each added hop.
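To put a rough number on the call-chain point above, here is a small worked calculation. It assumes every hop has the same availability and that hops fail independently, which is a simplification; the function names are illustrative.

```python
def chain_availability(per_hop: float, depth: int) -> float:
    """Probability that all `depth` hops of a synchronous chain succeed,
    assuming each hop succeeds independently with probability `per_hop`."""
    if not 0.0 <= per_hop <= 1.0:
        raise ValueError("per_hop must be a probability between 0 and 1")
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return per_hop ** depth


def chain_failure_rate(per_hop: float, depth: int) -> float:
    """Probability that at least one hop in the chain fails."""
    return 1.0 - chain_availability(per_hop, depth)


if __name__ == "__main__":
    for depth in (1, 2, 4, 8):
        rate = chain_failure_rate(0.999, depth)
        print(f"depth={depth}: failure rate {rate:.4%}")
```

With four hops at 99.9% each, the chain succeeds about 99.6% of the time, so the failure rate is roughly four times that of a single service. Latency adds up the same way, since each hop waits on the next.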
Distributed data & transactions
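- There is no ACID transaction across services. Use the Saga pattern: a sequence of local transactions, each paired with a compensating action that undoes it if a later step fails. Orchestrate sagas with a workflow engine such as Temporal, so that step state and retries survive a crash, rather than hand-rolling that bookkeeping.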
- Accept eventual consistency as the cost of the architecture. Design UX and reads around it.
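The saga idea above can be sketched in plain Python as follows. This is an in-memory illustration only, not Temporal's API: `SagaStep` and `run_saga` are hypothetical names, and a real orchestrator would also persist progress and retry compensations.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SagaStep:
    name: str
    action: Callable[[], None]      # the local transaction in one service
    compensate: Callable[[], None]  # undoes that transaction if a later step fails


def run_saga(steps: list[SagaStep]) -> bool:
    """Run steps in order. On the first failure, compensate the completed
    steps in reverse order and report failure."""
    completed: list[SagaStep] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            # The failed step did not commit, so only earlier steps are undone.
            for done in reversed(completed):
                done.compensate()
            return False
        completed.append(step)
    return True
```

Compensations run in reverse so that each undo sees the state its action left behind. In production they should be idempotent, because an orchestrator may need to retry them.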
Resilience — assume things fail
- Timeouts on every network call (never infinite).
- Retries with exponential backoff + jitter, but only for idempotent operations, or for operations made safe to repeat with an idempotency key.
- Circuit breakers to stop hammering a downed dependency and give it room to recover.
- Bulkheads: isolate resource pools so one failing dependency can't consume all threads/connections.
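Three of the patterns above, sketched with the standard library only. Every name here (`retry`, `CircuitBreaker`, `Bulkhead`, the thresholds and defaults) is an illustrative assumption, and the breaker is deliberately simplified; a maintained library is usually the better choice in production. Timeouts are not shown because they belong to whichever HTTP or RPC client you use.

```python
import random
import threading
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry(call: Callable[[], T], attempts: int = 3, base: float = 0.1,
          cap: float = 5.0, sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry an idempotent call, sleeping a random time in
    [0, min(cap, base * 2**n)] after the n-th failure ("full jitter")."""
    for n in range(attempts):
        try:
            return call()
        except Exception:
            if n == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(random.uniform(0, min(cap, base * 2 ** n)))
    raise ValueError("attempts must be at least 1")


class CircuitOpenError(RuntimeError):
    pass


class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures and rejects calls
    until `reset_after` seconds pass; then one trial call is let through."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._threshold = failure_threshold
        self._reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if (self._opened_at is not None
                and self._clock() - self._opened_at < self._reset_after):
            raise CircuitOpenError("circuit open: failing fast")
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        self._failures = 0  # any success closes the circuit
        self._opened_at = None
        return result


class BulkheadFullError(RuntimeError):
    pass


class Bulkhead:
    """Caps concurrent calls to one dependency so it cannot exhaust shared capacity."""

    def __init__(self, max_concurrent: int) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn: Callable[[], T]) -> T:
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError("no free slot for this dependency")
        try:
            return fn()
        finally:
            self._slots.release()
```

The `sleep` and `clock` parameters exist so the behaviour can be tested without real waiting. One reasonable way to compose the pieces is a bulkhead per dependency on the outside, then the breaker, with retries wrapped around the breaker call so that an open circuit fails fast instead of being retried blindly.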
Operational baseline (non-negotiable before you split)
- Centralized structured logging with a correlation/trace ID threaded through every call.
- Distributed tracing (OpenTelemetry) — without it, debugging across services is guesswork.
- Independent CI/CD per service, plus contract tests so a producer can't silently break a consumer.
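A minimal sketch of threading a correlation ID through structured logs, using only the standard library. The `X-Correlation-ID` header name, the `orders` logger and `handle_request` are assumptions for illustration; with OpenTelemetry in place, trace context is typically propagated through the W3C `traceparent` header instead.

```python
import contextvars
import json
import logging
import uuid

# Holds the ID for the request currently being handled in this context.
correlation_id: contextvars.ContextVar[str] = contextvars.ContextVar(
    "correlation_id", default="-"
)


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line, always including the correlation ID."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
        })


def handle_request(headers: dict[str, str]) -> dict[str, str]:
    """Reuse the inbound ID, or mint one at the edge, and return the headers
    to attach to any outbound call made while serving this request."""
    cid = headers.get("X-Correlation-ID") or uuid.uuid4().hex
    correlation_id.set(cid)
    logging.getLogger("orders").info("order received")
    return {"X-Correlation-ID": cid}


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logging.basicConfig(level=logging.INFO, handlers=[handler])
    handle_request({"X-Correlation-ID": "demo-123"})
```

Because the ID lives in a context variable, any log line written while handling the request carries it without passing it through every function signature. Searching the central log store for one ID then returns the request's path across services.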
The honest trade-off: microservices trade in-process simplicity for operational and network complexity. Only pay that price when the organizational or scaling benefit is real.