Built around Fastify + TypeScript, Postgres (no ORM; stored procedures), Kafka or NATS JetStream, Redis, and AWS (ECS Fargate, RDS, MSK or self-managed NATS, Secrets Manager, S3, CloudWatch/Grafana).

Everything below assumes each service is cloned from the enterprise starter you already have, with security, idempotency, OpenAPI, and observability pre-wired.

Executive overview (cross-cutting)

Non-negotiables

Double-entry, immutable ledger; all money movement derives from it.

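Exactly-once effects over at-least-once delivery: HTTP Idempotency-Key, a transactional DB outbox, and idempotent consumers (relays claim outbox rows with FOR UPDATE SKIP LOCKED; consumers ack JetStream messages or commit Kafka offsets only after a deduplicated write).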

Strong multi-tenancy: RLS by tenant_id; ABAC at API; mTLS service-to-service.

Schema governance: OpenAPI 3.1 for HTTP; versioned JSON/Avro for events; backward-compatible changes by default.

Observability: OpenTelemetry traces, Prometheus metrics, Pino logs. SLOs with burn-rate alerts.

Compliance: WORM audit logs, key rotation, CIS Benchmarks, PITR drills, period close.

Developer Experience: per-service repo template; scripted DB migrations; contract tests; semantic-release.
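
Below is a minimal in-memory sketch of the HTTP idempotency rule. It assumes the stored response is keyed by (tenant, Idempotency-Key), alongside a SHA-256 hash of the request body. `IdempotencyStore` and its methods are hypothetical. In the services themselves this state would live in the Postgres idempotency_key table and be written in the same transaction as the business change.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of a stored response kept for replay.
interface StoredResponse {
  requestHash: string; // sha256 of the request body as received
  status: number;
  body: string;
}

type IdempotencyResult =
  | { kind: "execute" }                          // first time: run the handler
  | { kind: "replay"; response: StoredResponse } // same key + same body: return cached response
  | { kind: "conflict" };                        // same key + different body: reject

class IdempotencyStore {
  private entries = new Map<string, StoredResponse>();

  private static hashBody(body: string): string {
    return createHash("sha256").update(body).digest("hex");
  }

  check(tenantId: string, key: string, body: string): IdempotencyResult {
    // Keys are scoped per tenant so two tenants can reuse the same key value.
    const existing = this.entries.get(`${tenantId}:${key}`);
    if (!existing) return { kind: "execute" };
    return existing.requestHash === IdempotencyStore.hashBody(body)
      ? { kind: "replay", response: existing }
      : { kind: "conflict" };
  }

  save(tenantId: string, key: string, body: string, status: number, responseBody: string): void {
    this.entries.set(`${tenantId}:${key}`, {
      requestHash: IdempotencyStore.hashBody(body),
      status,
      body: responseBody,
    });
  }
}
```

A production version also needs an in-progress marker, such as a unique-constraint insert before the handler runs. Without it, two concurrent requests carrying the same key could both execute. Whether to answer a body mismatch with 409 or 422 is a policy choice.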

Core phases

Phase 0 (Foundations): CPS, Ledger, Wallet, Pricing, Limits/Risk.

Phase 1 (Money Movement Happy Path): Orchestrator + C2B/STK + Notifications + Webhooks.

Phase 2 (Ops Loop): Settlement + Reconciliation + Disputes/Reversals.

Phase 3 (Scale): P2P, B2C, B2B, Agent, Reporting, FX (optional).

Service-by-service blueprints

Each service uses the same template: Mission → APIs → Events → Data → Flows → Failure/Idempotency → Security → Observability → Scaling → Testing → Runbooks → Rollout.

1) `cps-service` — Control Plane (Tenancy, Identity, RBAC/ABAC)

Mission: Source of truth for tenants, users, credentials, roles/policies, API keys.

APIs (HTTP):

POST /v1/tenants (create)

POST /v1/users (create; msisdn/email)

POST /v1/users/{id}/credentials/pin (set/rotate; Argon2id)

POST /v1/roles, POST /v1/policies, POST /v1/api-keys

POST /v1/auth/token (for merchant/service-accounts; OIDC-compatible)

Events:

Publishes: tenant.created, identity.user.created, policy.updated

Subscribes: none (foundational)

Data (DDL highlights):

tenant(tenant_id, name, base_currency, created_at)

user(user_id, tenant_id, msisdn, email, status) with RLS (user is a reserved word in Postgres: quote it or use a name such as app_user)

credential(user_id, type, secret_hash, salt, status)

role, policy(subject, resource, action, condition_jsonb)

api_key(id, tenant_id, subject_id, scopes[], secret_hash, status)

Key flows: user onboarding; PIN rotate; policy attach; API key rotate.

Failure/Idempotency: every POST requires an Idempotency-Key. Token issuance enforces strict replay prevention (a nonce bound to subject + aud).

Security: Argon2id credential hashing with key material sealed in HSM/KMS; rate-limited auth endpoints; mTLS for internal traffic; ABAC enforced on every request.
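
Here is a sketch of how ABAC evaluation over policy(subject, resource, action, condition_jsonb) might look. It assumes:

- default deny;
- an explicit deny overrides any allow;
- a hypothetical condition format in which each entry maps a resource attribute to the subject attribute it must equal.

The type and function names are illustrative, not an existing API.

```typescript
// Hypothetical request context: attributes of the caller and of the target resource.
interface AbacSubject { id: string; roles: string[]; attrs: Record<string, string>; }
interface AbacResource { type: string; attrs: Record<string, string>; }

// Mirrors policy(subject, resource, action, condition_jsonb). Assumed condition semantics:
// { resourceAttr: subjectAttr } requires resource.attrs[resourceAttr] === subject.attrs[subjectAttr].
interface AbacPolicy {
  subject: string;  // role name, or "*"
  resource: string; // resource type, or "*"
  action: string;   // e.g. "wallet:debit", or "*"
  effect: "allow" | "deny";
  condition: Record<string, string>;
}

function matches(pattern: string, value: string): boolean {
  return pattern === "*" || pattern === value;
}

function conditionHolds(cond: Record<string, string>, s: AbacSubject, r: AbacResource): boolean {
  return Object.keys(cond).every((resAttr) => {
    const subjectValue = s.attrs[cond[resAttr]];
    return subjectValue !== undefined && r.attrs[resAttr] === subjectValue;
  });
}

// Default deny; any applicable deny policy wins over allows.
function authorize(policies: AbacPolicy[], s: AbacSubject, action: string, r: AbacResource): boolean {
  let allowed = false;
  for (const p of policies) {
    const applies =
      s.roles.some((role) => matches(p.subject, role)) &&
      matches(p.resource, r.type) &&
      matches(p.action, action) &&
      conditionHolds(p.condition, s, r);
    if (!applies) continue;
    if (p.effect === "deny") return false;
    allowed = true;
  }
  return allowed;
}
```

Decisions are pure functions of (policies, subject, action, resource). That makes them easy to cache, as the Scaling note suggests, and easy to property-test.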

Observability: auth latency, token issuance error rate, rate-limit hits.

Scaling: stateless; Postgres connection pool; cache policy decisions (LRU/Redis).

Testing: property tests on ABAC conditions; credential rotation; brute-force lockout.

Runbooks: lockout/unlock flow; key compromise rotation; subject deprovisioning.

Rollout: seed platform admin tenant/user; publish OIDC metadata to internal portal.

2) `kyc-compliance-service` — KYC & AML

Mission: KYC profiles, document orchestration, sanctions/PEP checks, risk scoring.

APIs:

POST /v1/kyc/profiles (start; INDIVIDUAL/BUSINESS)

PATCH /v1/kyc/profiles/{id} (submit docs/answers)

POST /v1/kyc/profiles/{id}:verify (starts async checks; providers report results back to this service's own callback endpoint)

GET /v1/kyc/profiles/{id} (status, risk score)

Events: publishes kyc.profile.verified|rejected|updated.

Data:

kyc_profile(profile_id, tenant_id, subject_id, type, status, risk_score, current_version)

kyc_profile_version(profile_id, version, form_jsonb, documents_jsonb, status)

aml_screening(profile_id, provider, result_jsonb, risk_score)

Flows: start → collect → verify (providers) → emit status event → orchestrator consumes it as a KYC gate.

Failure/Idempotency: multi-provider retries; store provider correlation IDs; dedupe by subject+doc type.

Security: encrypt PII columns; signed file uploads (S3 pre-signed URLs); data retention policy.

Observability: verification turn-around, provider error rates, KYC pass rate by tier.

Scaling: async workers for provider calls; rate-limit/queue per provider SLA.

Runbooks: provider outage—fallback to alternate; manual review queue overflow.

3) `catalog-pricing-service` — Products, Tariffs, Fees, Taxes

Mission: Central pricing engine for all flows (C2B/P2P/B2C/B2B/Agent).

APIs:

POST /v1/catalog/tariffs (+rules)

GET /v1/quotations?product&amount&currency&walletTier&channel (returns fee/tax/commission breakdown)

Events: pricing.tariff.updated.

Data:

tariff(tariff_id, product, currency, effective_from, effective_to)

tariff_rule(tariff_id, min, max, fee_type (PERCENT|FIXED), amount, pct, cap, channel, tier)

commission_rule(tariff_id, agent_tier, pct, floor, cap)

Flows: the orchestrator requests a quote at authorization time and stores it with the payment.
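
Below is a minimal quotation sketch. It assumes:

- amounts are integer minor units;
- percentage rates are in basis points (`pctBps`);
- `null` means "any channel/tier";
- slabs do not overlap, and rounding is half-up;
- a tax is levied on the fee at a hypothetical `taxBps` rate.

Field names are adapted from tariff_rule and are not a fixed schema. `amountMinor * pctBps` must stay below `Number.MAX_SAFE_INTEGER`; very large amounts would need `bigint`.

```typescript
// Hypothetical in-memory form of a tariff_rule row (integer minor units).
interface TariffRule {
  minMinor: number;
  maxMinor: number;
  feeType: "FIXED" | "PERCENT";
  fixedMinor: number;
  pctBps: number;          // 100 bps = 1%
  capMinor: number | null; // null = uncapped
  channel: string | null;  // null = any channel
  tier: string | null;     // null = any wallet tier
}

interface Quote { amountMinor: number; feeMinor: number; taxMinor: number; totalMinor: number; }

// value * bps / 10000, rounded half-up, for non-negative integers.
function applyBps(value: number, bps: number): number {
  return Math.floor((value * bps + 5000) / 10000);
}

function quote(rules: TariffRule[], amountMinor: number, channel: string, tier: string, taxBps: number): Quote {
  const rule = rules.find((r) =>
    amountMinor >= r.minMinor && amountMinor <= r.maxMinor &&
    (r.channel === null || r.channel === channel) &&
    (r.tier === null || r.tier === tier));
  if (!rule) throw new Error(`no tariff slab for amount ${amountMinor}`);

  let fee = rule.feeType === "FIXED" ? rule.fixedMinor : applyBps(amountMinor, rule.pctBps);
  if (rule.capMinor !== null) fee = Math.min(fee, rule.capMinor);
  const tax = applyBps(fee, taxBps); // tax charged on the fee, not on the principal
  return { amountMinor, feeMinor: fee, taxMinor: tax, totalMinor: amountMinor + fee + tax };
}
```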

Security: only privileged roles mutate tariffs; 4-eyes approval on production tariff changes.

Observability: quotation latency, cache hit ratio, tariff publish audit.

Testing: property tests over random slabs; regression fixtures per product.

4) `limits-risk-service` — Limits & Velocity + Risk Decisions

Mission: Hard/soft limits, velocity (count/amount), geofence/device risk.

APIs:

POST /v1/limits (define per tier/actor)

POST /v1/risk/check (sync pre-auth decision: ALLOW/DENY/CHALLENGE)

Data/Cache:

PG: limit_policy(subject_type, scope, window, max_count, max_amount)

Redis: sliding-window counters (risk_counter:{key}), periodically snapshotted to PG.

Flows: the orchestrator calls the pre-auth check; on CHALLENGE it steps up via STK or OTP.
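
Here is an in-memory sliding-window limiter that produces the ALLOW/DENY part of the pre-auth decision; CHALLENGE logic is omitted. `LimitPolicy` loosely mirrors limit_policy(window, max_count, max_amount). The class and field names are hypothetical.

```typescript
type Decision = "ALLOW" | "DENY";

// Hypothetical limit: at most maxCount transactions and maxAmountMinor in total within windowMs.
interface LimitPolicy { windowMs: number; maxCount: number; maxAmountMinor: number; }
interface TxnRecord { at: number; amountMinor: number; }

class SlidingWindowLimiter {
  private readonly policy: LimitPolicy;
  private readonly events = new Map<string, TxnRecord[]>();

  constructor(policy: LimitPolicy) {
    this.policy = policy;
  }

  // Evaluates a candidate transaction; it is recorded only when allowed.
  check(key: string, amountMinor: number, now: number): Decision {
    const cutoff = now - this.policy.windowMs;
    const live = (this.events.get(key) || []).filter((e) => e.at > cutoff); // drop expired entries
    this.events.set(key, live);
    const total = live.reduce((sum, e) => sum + e.amountMinor, 0);
    if (live.length + 1 > this.policy.maxCount || total + amountMinor > this.policy.maxAmountMinor) {
      return "DENY";
    }
    live.push({ at: now, amountMinor });
    return "ALLOW";
  }
}
```

In Redis the same idea is typically a sorted set per key. ZADD adds an entry scored by timestamp, ZREMRANGEBYSCORE drops expired entries, and the remaining members are then counted and summed. The whole sequence should run atomically in a Lua script so concurrent requests cannot both pass the check.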

Observability: decision distribution, false positive rate, counter drift Redis↔PG.

Runbooks: reset counters; quarantine device; promote temporary overrides.

5) `ledger-service` — System of Record

Mission: Immutable double-entry, period close, balances.

APIs:
POST /v1/ledger/journals, POST /v1/ledger/journals/{id}:reverse,
POST /v1/periods/{yyyy-mm}:close, GET /v1/accounts/{code}/balances, GET /v1/journals/{id}

Events: ledger.posted, ledger.period.closed.

Data:
account, journal_entry, journal_line, account_balance, gl_period, idempotency_key, command_outbox.

Flows: sp_post_journal_batch validates each journal (ΣD = ΣC), posts it, updates balances, and extends the hash chain.
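
The checks a journal-posting procedure performs can be mirrored in TypeScript for unit and property tests. This sketch assumes a SHA-256 chain over the previous hash plus the serialized journal. `JSON.stringify` stands in for a canonical serialization; a real chain needs one so the same journal always hashes identically. Shapes and names are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Hypothetical journal shapes; amounts are integer minor units.
interface JournalLine { accountCode: string; side: "D" | "C"; amountMinor: number; currency: string; }
interface Journal { journalId: string; tenantId: string; lines: JournalLine[]; }

// ΣD must equal ΣC per currency; every line must be a positive integer amount.
function validateJournal(j: Journal): void {
  if (j.lines.length < 2) throw new Error("journal needs at least two lines");
  const net = new Map<string, number>();
  for (const l of j.lines) {
    if (!Number.isInteger(l.amountMinor) || l.amountMinor <= 0) {
      throw new Error("amounts must be positive integers");
    }
    const signed = l.side === "D" ? l.amountMinor : -l.amountMinor;
    net.set(l.currency, (net.get(l.currency) || 0) + signed);
  }
  net.forEach((value, currency) => {
    if (value !== 0) throw new Error(`unbalanced journal in ${currency}: ${value}`);
  });
}

// Each hash covers the previous hash, so altering any past journal breaks every later link.
function chainHash(prevHash: string, j: Journal): string {
  return createHash("sha256").update(prevHash).update(JSON.stringify(j)).digest("hex");
}

// Recomputes the chain from the genesis value and compares with the stored hashes.
function verifyChain(genesis: string, journals: Journal[], storedHashes: string[]): boolean {
  let prev = genesis;
  for (let i = 0; i < journals.length; i++) {
    prev = chainHash(prev, journals[i]);
    if (prev !== storedHashes[i]) return false;
  }
  return true;
}
```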

Failure/Idempotency: POST requires Idempotency-Key; outbox semantics for downstream.

Security: RLS by tenant; no UPDATE/DELETE; reversals only; nightly hash verification; WORM export.

Observability: post latency, ΣD=ΣC invariant checks, hash-chain integrity, outbox lag.

Runbooks: integrity failure (freeze posting → verify → recover), period close stuck, hot partition bloat.

6) `wallet-service` — Wallets, Holds, Statements

Mission: Logical customer/merchant/agent wallets mapped to ledger accounts.

APIs:
POST /v1/wallets, GET /v1/wallets/{id}, POST /v1/wallets/{id}/hold, POST /v1/wallets/{id}/release,
GET /v1/wallets/{id}/statement?from&to&cursor, POST /v1/wallets/{id}/lock|unlock

Events: wallet.created|locked, wallet.hold.created|released.

Data:
wallet(wallet_id, owner_type, owner_id, currency, tier, status),
wallet_account_map(wallet_id, ledger_account_id, purpose),
wallet_hold(hold_id, wallet_id, amount_minor, reason, expires_at, status).

Flows: create wallet → map ledger accounts; holds for auth/capture; statements from ledger read models.
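
Below is a sketch of hold-aware available balance. It assumes holds reduce the spendable amount until they are released, captured, or expired. The status values and function names are hypothetical; `ledgerBalanceMinor` would come from the ledger read model.

```typescript
// Hypothetical in-memory form of a wallet_hold row (integer minor units, epoch-ms expiry).
interface WalletHold {
  holdId: string;
  amountMinor: number;
  expiresAt: number;
  status: "ACTIVE" | "RELEASED" | "CAPTURED";
}

// Available = ledger balance minus holds that are still ACTIVE and unexpired at `now`.
function availableBalance(ledgerBalanceMinor: number, holds: WalletHold[], now: number): number {
  const held = holds
    .filter((h) => h.status === "ACTIVE" && h.expiresAt > now)
    .reduce((sum, h) => sum + h.amountMinor, 0);
  return ledgerBalanceMinor - held;
}

// Returns the new hold list, or throws if the hold would exceed the available balance.
function placeHold(ledgerBalanceMinor: number, holds: WalletHold[], hold: WalletHold, now: number): WalletHold[] {
  if (hold.amountMinor <= 0) throw new Error("hold amount must be positive");
  if (availableBalance(ledgerBalanceMinor, holds, now) < hold.amountMinor) {
    throw new Error("insufficient available balance");
  }
  return holds.concat([hold]);
}
```

In Postgres, the check and the hold insert need to run under a row lock on the wallet (for example SELECT ... FOR UPDATE). Otherwise two concurrent holds could both pass against the same balance.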

Security: verify owner + ABAC; PIN verify via CPS for sensitive ops.

Observability: hold expiration backlog, statement latency, wallet lock events.

7) `orchestrator-service` — Payment Switch / State Machines

Mission: Owns payment lifecycle; composes Pricing, Risk, Wallet, Ledger; emits canonical events.

APIs:
POST /v1/payments (init), GET /v1/payments/{id} (status), POST /v1/payments/{id}:cancel, POST /v1/payments/{id}:capture

Events: publishes payment.created|authorized|captured|failed|cancelled.

Data:

payment(payment_id, product, channel, payer_wallet_id, payee_wallet_id, amount, fees, currency, state, idem_key, source_ref), payment_event.

Core state machine (C2B sample):
PENDING → AUTHORIZING (risk+pricing) → AWAITING_APPROVAL (STK) → AUTHORIZED → CAPTURING → CAPTURED
Errors: any non-terminal state → FAILED; cancellation is allowed only before capture (→ CANCELLED).
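
The C2B state machine can be encoded as a transition table so illegal transitions fail fast with 409. Which states may move to FAILED and CANCELLED is an interpretation of the sample above. The names are illustrative.

```typescript
type PaymentState =
  | "PENDING" | "AUTHORIZING" | "AWAITING_APPROVAL" | "AUTHORIZED"
  | "CAPTURING" | "CAPTURED" | "FAILED" | "CANCELLED";

// Assumed reading: FAILED is reachable from any non-terminal state,
// CANCELLED only from states before CAPTURING.
const TRANSITIONS: Record<PaymentState, PaymentState[]> = {
  PENDING: ["AUTHORIZING", "FAILED", "CANCELLED"],
  AUTHORIZING: ["AWAITING_APPROVAL", "FAILED", "CANCELLED"],
  AWAITING_APPROVAL: ["AUTHORIZED", "FAILED", "CANCELLED"],
  AUTHORIZED: ["CAPTURING", "FAILED", "CANCELLED"],
  CAPTURING: ["CAPTURED", "FAILED"],
  CAPTURED: [],  // terminal
  FAILED: [],    // terminal
  CANCELLED: [], // terminal
};

class IllegalTransitionError extends Error {
  readonly httpStatus = 409;
  constructor(from: PaymentState, to: PaymentState) {
    super(`illegal transition ${from} -> ${to}`);
  }
}

// Returns the new state or throws an error carrying httpStatus 409.
function transition(from: PaymentState, to: PaymentState): PaymentState {
  if (TRANSITIONS[from].indexOf(to) === -1) throw new IllegalTransitionError(from, to);
  return to;
}
```

When persisting, `UPDATE payment SET state = $2 WHERE payment_id = $1 AND state = $3` enforces the same guard under concurrency. Zero updated rows is treated as a 409.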

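Idempotency: requests are deduplicated by tenant + Idempotency-Key (stored as idem_key), with a hash of the body kept to reject reuse of a key with a different body; state transitions are guarded (illegal transitions return 409).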

Security: strict ABAC (payer must own wallet); enforce KYC level; per-route rate limits.

Observability: auth rate, capture success, failure reasons (operator, risk, balance).

Runbooks: stuck in PENDING; duplicate callbacks; capture retry policy.

8) `c2b-stk-service` — C2B & STK Push

Mission: Merchant-initiated C2B with handset approval.

APIs:
POST /v1/c2b/initiate (merchant → platform), GET /v1/c2b/{id}, POST /v1/stk/callbacks/operator (operator → us)

Events: stk.prompt.sent, stk.approved|declined; consumes payment.created to initiate STK.

Data: c2b_request(req_id, merchant_id, order_ref, msisdn, amount, currency, status, operator_txn_id, callback_url), stk_push(push_id, req_id, msisdn, prompt_status, approval_code, operator_payload).

Flows: merchant initiation → orchestrator creates payment → STK service calls operator → callback → orchestrator continues.

Failure/Idempotency: dedupe on merchant_id+order_ref; callback replay safe (operator_txn_id).
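
Here is a sketch of the two dedupe rules: merchant_id + order_ref at initiation, and operator_txn_id on callbacks. It also adds an amount check for callbacks. Shapes loosely follow c2b_request. `C2bStore` and the outcome names are hypothetical and in-memory; production would enforce both rules with unique constraints.

```typescript
// Hypothetical shapes loosely mirroring c2b_request and an operator callback payload.
interface C2bRequest {
  reqId: string; merchantId: string; orderRef: string; amountMinor: number;
  status: "PENDING" | "APPROVED" | "DECLINED"; operatorTxnId?: string;
}
interface OperatorCallback { reqId: string; operatorTxnId: string; approved: boolean; amountMinor: number; }

type CallbackOutcome = "APPLIED" | "DUPLICATE" | "UNKNOWN_REQUEST" | "AMOUNT_MISMATCH";

class C2bStore {
  private byId = new Map<string, C2bRequest>();
  private byOrder = new Map<string, string>();         // "merchantId:orderRef" -> reqId
  private seenOperatorTxn = new Map<string, string>(); // operatorTxnId -> reqId

  // The same merchant order always resolves to the first request created for it.
  initiate(req: C2bRequest): C2bRequest {
    const orderKey = `${req.merchantId}:${req.orderRef}`;
    const existingId = this.byOrder.get(orderKey);
    if (existingId) return this.byId.get(existingId)!;
    this.byId.set(req.reqId, req);
    this.byOrder.set(orderKey, req.reqId);
    return req;
  }

  // Replaying a callback is a no-op; mismatched amounts are not applied.
  applyCallback(cb: OperatorCallback): CallbackOutcome {
    if (this.seenOperatorTxn.has(cb.operatorTxnId)) return "DUPLICATE";
    const req = this.byId.get(cb.reqId);
    if (!req) return "UNKNOWN_REQUEST";
    if (req.status !== "PENDING") return "DUPLICATE"; // already finalized by an earlier callback
    if (req.amountMinor !== cb.amountMinor) return "AMOUNT_MISMATCH";
    this.seenOperatorTxn.set(cb.operatorTxnId, cb.reqId);
    req.status = cb.approved ? "APPROVED" : "DECLINED";
    req.operatorTxnId = cb.operatorTxnId;
    return "APPLIED";
  }
}
```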

Security: merchant authZ; HMAC-signed callbacks to merchants (merchants verify the signature).

Observability: prompt success rate, time-to-approval, operator errors.

Runbooks: operator outage; delayed callbacks; amount mismatch.

9) `p2p-service` — Person-to-Person

Mission: Direct wallet-to-wallet transfers with optional fees.

APIs:
POST /v1/p2p/transfers (fromWalletId, toWalletId|msisdn, amount, narration), GET /v1/p2p/transfers/{id}

Events: p2p.authorized|captured|failed.

Data: p2p_transfer(transfer_id, from_wallet_id, to_wallet_id, amount_minor, fees_minor, state).

Flows: verify KYC/limits → ledger posting → notifications/webhooks.

Security: sender PIN verification; blocklisted recipients; AML triggers.

Observability: success rate, average transfer time, fraud flags.

10) `b2c-service` — Disbursements

Mission: Bulk payouts to customers (salary/refunds).

APIs:
POST /v1/b2c/payouts (batch upload+validate), GET /v1/b2c/payouts/{id}

Events: b2c.payout.created|completed; item-level b2c.item.completed|failed.

Data: b2c_batch(batch_id, merchant_wallet_id, count, totals, state), b2c_item(item_id, dest_wallet_id|msisdn, amount_minor, fees_minor, state).

Flows: validate file → quote fees → throttle executions → ledger → per-item status events.

Scaling: segment items by partition keys; parallel workers; idempotent upserts.

11) `b2b-service` — Business-to-Business

Mission: Merchant → merchant transfers; invoice ref, memos, optional FX.

APIs: POST /v1/b2b/payments, GET /v1/b2b/payments/{id}
Data: b2b_payment(id, from_wallet_id, to_wallet_id, amount, currency, state).

Security: higher KYC tier required; per-txn approvals (4-eyes) for large amounts.

12) `agent-ops-service` — Cash-In / Cash-Out / Float

Mission: Agent network operations; commissions.

APIs:
POST /v1/agent/cashin, POST /v1/agent/cashout, GET /v1/agent/float, POST /v1/agent/commission/settle

Data: agent(agent_id, float_wallet_id, status, location), agent_txn(txn_id, agent_id, customer_wallet_id, type, amount, fees, state).

Flows: cash-in (credit e-money, debit agent float); cash-out (reverse); commissions accrue to payable ledger.
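
Below is a sketch of the journal lines those flows could produce. It follows the direction stated above: cash-in debits the agent float and credits the customer's e-money, cash-out is the reverse, and commission accrues to a payable. Account names and the function are hypothetical, and customer-side fees are left out.

```typescript
// Hypothetical journal-line shape; amounts are integer minor units.
interface AgentLine { account: string; side: "D" | "C"; amountMinor: number; }

function agentJournal(type: "CASH_IN" | "CASH_OUT", agentFloat: string, customerWallet: string,
                      amountMinor: number, commissionMinor: number): AgentLine[] {
  const lines: AgentLine[] = type === "CASH_IN"
    // Cash-in: agent float goes down (debit), customer e-money goes up (credit).
    ? [{ account: agentFloat, side: "D", amountMinor }, { account: customerWallet, side: "C", amountMinor }]
    // Cash-out: the mirror image.
    : [{ account: customerWallet, side: "D", amountMinor }, { account: agentFloat, side: "C", amountMinor }];
  if (commissionMinor > 0) {
    // Commission accrues as an expense owed to the agent until settled.
    lines.push({ account: "commission_expense", side: "D", amountMinor: commissionMinor });
    lines.push({ account: "agent_commission_payable", side: "C", amountMinor: commissionMinor });
  }
  return lines;
}
```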

Security: agent device binding; geo-fence; high-risk velocity checks.

13) `settlement-service` — Merchant Settlement

Mission: Generate settlement batches; dispatch to bank; confirm; handle reserves.

APIs:
POST /v1/settlement/runs?merchant&from&to, GET /v1/settlement/batches/{id},
POST /v1/settlement/batches/{id}:dispatch, POST /v1/settlement/batches/{id}:confirm

Events: settlement.batch.created|dispatched|completed.

Data: settlement_batch(batch_id, merchant_id, period_from, period_to, gross, fees, net, reserve, state), settlement_item, settlement_instruction(bank, amount, currency, status, bank_ref).

Flows: compute payable from ledger read model → create batch → generate instruction/file → bank confirm → mirror ledger entries.
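
Here is a sketch of the batch totals, assuming integer minor units and a reserve withheld as a basis-point share of gross, rounded down. That reserve rule is an assumption, since merchant contracts differ. Names are hypothetical.

```typescript
interface SettlementTotals { grossMinor: number; feesMinor: number; reserveMinor: number; netMinor: number; }

// itemAmountsMinor[i] and itemFeesMinor[i] describe the same settlement_item.
function computeSettlement(itemAmountsMinor: number[], itemFeesMinor: number[], reserveBps: number): SettlementTotals {
  if (itemAmountsMinor.length !== itemFeesMinor.length) throw new Error("amount/fee arrays must align");
  const gross = itemAmountsMinor.reduce((s, a) => s + a, 0);
  const fees = itemFeesMinor.reduce((s, f) => s + f, 0);
  const reserve = Math.floor((gross * reserveBps) / 10000); // rounded down in the merchant's favour
  const net = gross - fees - reserve;
  if (net < 0) throw new Error("negative net: hold the batch for review");
  return { grossMinor: gross, feesMinor: fees, reserveMinor: reserve, netMinor: net };
}
```

The invariant gross = fees + reserve + net is a cheap check before dispatch.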

Security: HSM signing for files if required; protect bank credentials in Secrets Manager; 4-eyes for dispatch.

Observability: T+N adherence, batch age, failed instructions.

14) `recon-service` — Reconciliation

Mission: Ingest operator/bank statements; match vs ledger; manage breaks.

APIs:
POST /v1/recon/import (source, format, S3 url), GET /v1/recon/breaks?status, POST /v1/recon/breaks/{id}:resolve

Events: recon.break.created|resolved.

Data: recon_file(file_id, source, meta, imported_at), recon_row(row_id, file_id, value_date, amount, currency, ref, match_key, state), recon_break(break_id, type, our_ref, ext_ref, amount_delta, status).

Flows: normalize → auto-match strategies (exact, ±tolerance, window, key) → breaks → adjustments via ledger.
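
Below is a two-pass matcher covering the exact, tolerance, and date-window strategies. It assumes rows are already normalized and value dates are expressed as day numbers. Greedy first-fit with linear scans is chosen for clarity; a real implementation would index candidates by match_key. Anything left unmatched becomes a recon_break. Names are hypothetical.

```typescript
// Hypothetical normalized row; valueDay is a day number (e.g. days since epoch).
interface ReconRow { ref: string; amountMinor: number; currency: string; valueDay: number; }

interface MatchResult {
  matched: Array<[ReconRow, ReconRow]>; // [ours, theirs]
  unmatchedOurs: ReconRow[];
  unmatchedTheirs: ReconRow[];
}

function reconcile(ours: ReconRow[], theirs: ReconRow[], toleranceMinor: number, windowDays: number): MatchResult {
  const used = new Set<number>(); // indices into `theirs` already matched
  const matched: Array<[ReconRow, ReconRow]> = [];
  const pending: ReconRow[] = [];

  // Pass 1: exact match on ref + amount + currency.
  for (const o of ours) {
    const i = theirs.findIndex((t, idx) => !used.has(idx) &&
      t.ref === o.ref && t.amountMinor === o.amountMinor && t.currency === o.currency);
    if (i >= 0) { used.add(i); matched.push([o, theirs[i]]); } else pending.push(o);
  }

  // Pass 2: same currency, amount within tolerance, value dates within the window.
  const unmatchedOurs: ReconRow[] = [];
  for (const o of pending) {
    const i = theirs.findIndex((t, idx) => !used.has(idx) && t.currency === o.currency &&
      Math.abs(t.amountMinor - o.amountMinor) <= toleranceMinor &&
      Math.abs(t.valueDay - o.valueDay) <= windowDays);
    if (i >= 0) { used.add(i); matched.push([o, theirs[i]]); } else unmatchedOurs.push(o);
  }

  const unmatchedTheirs = theirs.filter((_, idx) => !used.has(idx));
  return { matched, unmatchedOurs, unmatchedTheirs };
}
```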

Observability: match rate, time-to-resolve, recurring break fingerprints.

15) `disputes-service` — Disputes & Chargebacks

Mission: Dispute lifecycle; evidence; reversal integration.

APIs:
POST /v1/disputes (open), PATCH /v1/disputes/{id} (state transitions), GET /v1/disputes/{id}

Events: dispute.opened|evidence.required|resolved|reversed.

Data: dispute(id, payment_id, opened_by, reason_code, state, deadlines), dispute_evidence(id, dispute_id, doc_url, hash, uploaded_by).

Flows: open → gather evidence → adjudicate → execute reversal (ledger) → notify stakeholders.

16) `notifications-service` — Multichannel Messaging

Mission: Template-driven SMS/Email/Push/WhatsApp; retries; logs.

APIs: POST /v1/notify, POST /v1/templates, GET /v1/logs/{id}

Data: message_template(id, channel, locale, body), message_log(id, to, channel, template_id, payload, state, provider_ref).

Events: consumes payment.*, settlement.*, etc.; publishes notify.sent|failed.

Security: HMAC signing for provider webhooks; PII minimization in payloads.

Observability: per-provider success, latency, retry counts.

17) `webhooks-service` — Partner Webhooks

Mission: Register endpoints; sign payloads; ordered, retried delivery.

APIs: POST /v1/webhooks, POST /v1/webhooks/test, GET /v1/webhooks/{id}

Data: webhook_endpoint(id, tenant_id, url, topics[], secret, status), webhook_delivery(id, endpoint_id, topic, payload, attempt, state).

Flows: subscribe → on event → enqueue delivery with HMAC signature (sha256=...) → exponential backoff → dead-letter → operator tooling to replay.
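
Here is a sketch of signing, verification, and retry delay using Node's `crypto` module. The header format follows the sha256=... convention above. The following are assumptions, not part of the blueprint:

- the signed string `${timestamp}.${body}`;
- the 300-second freshness window;
- the backoff parameters.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumed scheme: header value "sha256=<hex>" computed over `${timestamp}.${body}`.
function sign(secret: string, timestamp: number, body: string): string {
  return "sha256=" + createHmac("sha256", secret).update(`${timestamp}.${body}`).digest("hex");
}

// Receiver side: reject stale timestamps (limits replay), then compare in constant time.
function verify(secret: string, timestamp: number, body: string, header: string,
                nowSec: number, toleranceSec = 300): boolean {
  if (Math.abs(nowSec - timestamp) > toleranceSec) return false;
  const expected = Buffer.from(sign(secret, timestamp, body));
  const received = Buffer.from(header);
  // timingSafeEqual throws on unequal lengths, so check length first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}

// Exponential backoff with "full jitter": a random delay in [0, min(cap, base * 2^attempt)).
function backoffMs(attempt: number, baseMs = 1000, capMs = 3600000,
                   rand: () => number = Math.random): number {
  return Math.floor(rand() * Math.min(capMs, baseMs * Math.pow(2, attempt)));
}
```

Injecting `rand` keeps the backoff deterministic in tests. Once a delivery exhausts its retry budget, it moves to dead-letter for operator replay.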

Security: per-tenant isolation; deny-list internal address ranges; TLS validation.
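
Below is a sketch of the internal-address deny-list for IPv4, using `isIP` from `node:net`. The range list is illustrative, not exhaustive. IPv6 handling is deliberately reduced to "deny", because correct prefix matching needs a full address parser. The check should run on every resolved address at connect time, not just on the URL, to resist DNS rebinding.

```typescript
import { isIP } from "node:net";

// Illustrative IPv4 deny-list: RFC 1918 private, loopback, link-local (includes the cloud
// metadata address 169.254.169.254), carrier-grade NAT, and "this network". Not exhaustive.
const DENY_V4: Array<[string, number]> = [
  ["10.0.0.0", 8], ["172.16.0.0", 12], ["192.168.0.0", 16], ["127.0.0.0", 8],
  ["169.254.0.0", 16], ["100.64.0.0", 10], ["0.0.0.0", 8],
];

// Arithmetic instead of bitwise ops avoids 32-bit signed overflow for addresses >= 128.0.0.0.
function v4ToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// Two addresses share a /prefixBits network when they fall in the same block of that size.
function inCidr(ip: string, base: string, prefixBits: number): boolean {
  const blockSize = Math.pow(2, 32 - prefixBits);
  return Math.floor(v4ToInt(ip) / blockSize) === Math.floor(v4ToInt(base) / blockSize);
}

// Returns true when a delivery to this address must be refused.
function isDeniedAddress(ip: string): boolean {
  const family = isIP(ip);
  if (family === 4) return DENY_V4.some(([base, bits]) => inCidr(ip, base, bits));
  if (family === 6) return true; // simplification: this sketch does not parse IPv6 prefixes
  return true;                   // a hostname: resolve it first, then check each address
}
```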

Observability: delivery latency, failure buckets (5xx, timeouts, DNS).

System design glue

Event taxonomy & topics

payment.created|authorized|captured|failed|cancelled (orchestrator)

stk.prompt.sent, stk.approved|declined (c2b-stk)

ledger.posted|period.closed (ledger)

wallet.hold.created|released (wallet)

settlement.batch.created|dispatched|completed (settlement)

recon.break.created|resolved (recon)

dispute.opened|resolved (disputes)

notify.sent|failed (notifications)

Headers: tenantId, traceId, schemaVersion, idempotencyKey, sourceService.

Data governance

Classify: Public, Internal, Restricted (PII), Secret.

Encrypt: at rest (KMS); column-level for PII (msisdn, doc numbers).

Retention: journals forever; PII per policy; logs 90–180d; audit WORM ≥ 7y (jurisdiction-dependent).

Access: break-glass with time-boxed roles; all access audited.

Performance targets (initial)

Ledger post DB time P95 ≤ 50 ms; orchestrated payment end-to-end P95 ≤ 600 ms @ ≥ 2k TPS

STK prompt round-trip median ≤ 8 s (operator-dependent)

Settlement daily run ≤ 30 min; Recon file ≤ 15 min per 100k rows

Testing & quality gates (apply to every repo)

Unit (business logic) • Integration (routes+DB) • Contract (OpenAPI+events) • Property (invariants) • Performance (k6/Gatling) • Chaos (kill workers mid-txn) • Security (SAST/SCA, dependency review) • DR drills (PITR).
CI must fail on: schema drift, breaking change, coverage < threshold, Trivy high vulns, OpenAPI invalid.

Key property tests:

Ledger: ΣD == ΣC per entry/day/tenant/currency; hash chain continuity.

Orchestrator: illegal state transitions rejected.

Limits/Risk: window counters never negative; Redis↔PG snapshot reconciliation.

STK: callback replay idempotent.

Webhooks: signed delivery verifies; replay idempotent.

Runbooks (top incidents)

Payments stuck in PENDING: check orchestrator outbox lag, limits/risk dependency, operator callback queue; replay partition.

Outbox backlog: scale consumers, inspect poison queue, verify DB locks.

Ledger hash-check fail: freeze posting (feature flag), compute diff window, restore from PITR if tamper suspected.

Settlement variance: re-compute payable, compare with recon breaks, halt dispatch.

Provider outage (STK/Email/SMS): failover routing, degrade to queued mode, inform merchants via status page.

RDS pressure: tune pool size and fillfactor, partition hot tables, add a read replica for reports.

Deployment & infra (per service)

ECS Fargate (one service per repo) behind ALB; private subnets; SG least-privilege.

RDS Postgres 16: primary + reader; PITR enabled; pg_partman where needed (ledger).

Kafka/MSK or NATS JetStream: at-least-once; topic per domain; retention policies.

Redis: ElastiCache for velocity counters/caching.

Secrets Manager/KMS for keys; auto rotation.

S3: WORM bucket for audit; bucket per tenant for statements/exports.

IaC: CDK stacks per service (compute, SG, task defs, IAM, alarms).

CD pipeline: blue/green; pre-traffic migrations; auto rollback on health/SLO breach.

Phased delivery plan (aggressive: Phases 0 to 2 in roughly 90 days, then Phase 3A and hardening)

Weeks 1–2 (Phase 0A): CPS, Ledger core, Wallet basics.
Weeks 3–4 (Phase 0B): Pricing, Limits/Risk; end-to-end dry run (no operator).
Weeks 5–6 (Phase 1A): Orchestrator thin slice; C2B/STK with mock operator; Notifications/Webhooks.
Weeks 7–8 (Phase 1B): Real operator integration; production-grade idempotency, retries, circuit breakers.
Weeks 9–10 (Phase 2A): Settlement (single merchant), Recon (CSV); daily cycles live.
Weeks 11–12 (Phase 2B): Disputes/Reversals; audit/reporting basics.
Weeks 13–14 (Phase 3A): P2P, B2C; Agent cash-in/out.
Weeks 15+ (Hardening): Perf @ target TPS, chaos drills, DR rehearsals, pen-test findings.

Exit criteria per phase: documented SLOs met, runbooks updated, contract tests green, security controls verified.

What to do next (actionable)

Clone the enterprise starter for ledger-service, wallet-service, cps-service, catalog-pricing-service, limits-risk-service.

Add the service-specific OpenAPI stubs and DDL listed above; wire the stored procs for ledger.

Stand up orchestrator-service + c2b-stk-service with a mock operator to complete the C2B happy path locally.

Wire notifications + webhooks; add HMAC signatures and replay protection.

Establish schema registry (events), API portal (OpenAPI), and approval workflows for tariff/settlement dispatch.

Add dashboards & alerts: golden signals per service, outbox lag, payment funnel.
