Microservices

Monolith-first thinking: drawing boundaries, data ownership, sync vs async comms, sagas, resilience patterns.

Start with a monolith

Most products don't need microservices. A well-structured modular monolith — clear internal module boundaries, one deploy, one database — is faster to build, easier to debug, and trivially consistent. Split out a service only when you have a concrete reason: independent scaling, team ownership, isolation of a risky/heavy component (e.g. an ML inference service), or a different language/runtime need.

Drawing boundaries

  • Split by business capability / bounded context, not by technical layer. "Billing", "Notifications", "Inventory" — not "the database service" and "the API service".
  • Each service owns its data. No other service reaches into its database directly; access is only through its API or events. Shared databases recreate all the coupling you were trying to escape.
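A minimal sketch of the data-ownership rule, using in-process classes as stand-ins for two services. The names (InventoryService, BillingService, get_stock, reserve) and the dict standing in for a database are illustrative assumptions, not part of any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StockLevel:
    sku: str
    available: int


class InventoryService:
    """Owns inventory data; the dict stands in for its private database."""

    def __init__(self) -> None:
        # Private state: no other service reads or writes this directly.
        self._stock: dict[str, int] = {"sku-1": 5}

    def get_stock(self, sku: str) -> StockLevel:
        # Public API: the only supported way for others to read stock.
        return StockLevel(sku=sku, available=self._stock.get(sku, 0))

    def reserve(self, sku: str, quantity: int) -> bool:
        # The owner enforces its own invariant (stock never goes negative).
        if self._stock.get(sku, 0) < quantity:
            return False
        self._stock[sku] -= quantity
        return True


class BillingService:
    """Depends on Inventory's API, never on its tables."""

    def __init__(self, inventory: InventoryService) -> None:
        self._inventory = inventory

    def can_invoice(self, sku: str, quantity: int) -> bool:
        return self._inventory.get_stock(sku).available >= quantity
```

Because Billing only sees get_stock, Inventory can change its storage layout without coordinating a release with Billing. The same shape works for modules inside a monolith, which makes a later split cheaper.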

Communication

  • Sync (REST/gRPC) when the caller needs an answer now. gRPC for internal, high-throughput, typed contracts; REST for external/public.
  • Async (events) when the caller can fire-and-forget. Async is the default for cross-service side effects — it removes temporal coupling and improves resilience.
  • Avoid deep synchronous call chains (A→B→C→D). One slow link stalls the whole request, and availabilities multiply: four services at 99.9% each give roughly 99.6% end to end, assuming independent failures.
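The fire-and-forget style can be sketched with an in-memory queue standing in for a real broker. InMemoryBus, the "order.placed" topic, and place_order are hypothetical names for illustration; a production system would use a durable broker rather than a Python deque.

```python
from collections import defaultdict, deque
from typing import Callable

Event = dict
Handler = Callable[[Event], None]


class InMemoryBus:
    """Stand-in for a broker: events are queued, not handled inline."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)
        self._queue: deque[tuple[str, Event]] = deque()

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> None:
        # Fire-and-forget: the publisher returns before any consumer runs.
        self._queue.append((topic, event))

    def drain(self) -> None:
        # Simulates consumers working through the backlog at their own pace.
        while self._queue:
            topic, event = self._queue.popleft()
            for handler in self._handlers[topic]:
                handler(event)


sent_emails: list[str] = []
bus = InMemoryBus()
# A notifications consumer reacting to the event as a side effect.
bus.subscribe("order.placed", lambda e: sent_emails.append(e["order_id"]))


def place_order(order_id: str) -> str:
    # The order path does not wait for, or even know about, the email.
    bus.publish("order.placed", {"order_id": order_id})
    return "accepted"
```

place_order returns "accepted" even if the notifications consumer is down. The email goes out whenever the queue is next drained, which is the temporal decoupling described above.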

Distributed data & transactions

  • There is no cross-service ACID transaction. Use the Saga pattern: a sequence of local transactions, each paired with a compensating action that undoes it if a later step fails. Orchestrate sagas with a workflow engine such as Temporal rather than hand-rolling the state tracking.
  • Accept eventual consistency as the cost of the architecture. Design UX and reads around it.
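A minimal sketch of an orchestrated saga, assuming each step is a pair of plain callables (an action and its compensation). This is not Temporal's API. It only shows the control flow; a real orchestrator also persists progress so a saga survives a crash, and retries compensations that fail. run_saga and the step names are hypothetical.

```python
from typing import Callable

# (name, action, compensation)
Step = tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: list[Step], log: list[str]) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse."""
    completed: list[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action()  # one local transaction in one service
        except Exception:
            log.append(f"failed:{name}")
            # Compensate newest-first so later effects are undone first.
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"undone:{done_name}")
            return False
        log.append(f"done:{name}")
        completed.append(step)
    return True


def fail_shipping() -> None:
    raise RuntimeError("carrier unavailable")


log: list[str] = []
ok = run_saga(
    [
        ("reserve_stock", lambda: None, lambda: None),
        ("charge_card", lambda: None, lambda: None),
        ("book_shipping", fail_shipping, lambda: None),
    ],
    log,
)
```

Between the failure and the last compensation, other readers can observe the charged-but-not-shipped state. That window is the eventual consistency the UX has to be designed around.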

Resilience — assume things fail

  • Timeouts on every network call (never infinite).
  • Retries with exponential backoff + jitter, but only for idempotent operations: retrying a non-idempotent call can apply the same side effect twice (for example, a double charge).
  • Circuit breakers to stop hammering a downed dependency and give it room to recover.
  • Bulkheads: isolate resource pools so one failing dependency can't consume all threads/connections.
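Two of the patterns above, sketched under simplifying assumptions: a retry helper using capped exponential backoff with full jitter, and a single-threaded circuit breaker. All names and defaults (retry, CircuitBreaker, threshold, reset_after) are illustrative. The wrapped call is assumed to enforce its own timeout, and neither helper is thread-safe.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry(call: Callable[[], T], attempts: int = 4, base: float = 0.1,
          cap: float = 2.0, sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry an idempotent call with capped exponential backoff + jitter."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Full jitter: random delay up to the exponential ceiling.
            sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
    raise AssertionError("unreachable")


class CircuitOpenError(Exception):
    """Raised when the breaker fails fast without calling the dependency."""


class CircuitBreaker:
    def __init__(self, threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._threshold = threshold
        self._reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_after:
                raise CircuitOpenError("dependency marked down")
            # Cool-down elapsed: half-open, let a trial call through.
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if (self._opened_at is not None
                    or self._failures >= self._threshold):
                self._opened_at = self._clock()  # open or re-open
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

The sleep and clock parameters are injected so the behaviour can be tested without real waiting. A bulkhead would be a separate, bounded pool (threads, connections, or a semaphore) per dependency and is not shown here.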

Operational baseline (non-negotiable before you split)

  • Centralized structured logging with a correlation/trace ID threaded through every call.
  • Distributed tracing (OpenTelemetry) — without it, debugging across services is guesswork.
  • Independent CI/CD per service, plus contract tests so a producer can't silently break a consumer.
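A sketch of threading a correlation ID through structured logs using only the Python standard library. The header name X-Correlation-ID, the handle_request function, and the log fields are assumptions for illustration. OpenTelemetry propagates its own context (by default the W3C traceparent header), so treat this as the idea, not a replacement for tracing.

```python
import contextvars
import json
import logging
import uuid

# Holds the ID for the request currently being handled.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line, always including the ID."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
        })


def handle_request(headers: dict[str, str],
                   log: logging.Logger) -> dict[str, str]:
    # Reuse the caller's ID if present; otherwise start a new one here.
    cid = headers.get("X-Correlation-ID") or uuid.uuid4().hex
    token = correlation_id.set(cid)
    try:
        log.info("charging card")
        # Headers to attach to downstream calls so the ID keeps flowing.
        return {"X-Correlation-ID": cid}
    finally:
        correlation_id.reset(token)
```

With every service logging in this shape to one place, a single query on correlation_id reconstructs a request's path across services.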

The honest trade-off: microservices trade in-process simplicity for operational and network complexity. Only pay that price when the organizational or scaling benefit is real.