Eliminating Integration Test Flakiness in Microservices

The Challenge

Across a large microservices organization, three problems consistently eroded engineering velocity:

Flakiness at scale. Tests depended on shared infrastructure, live network calls, and uncontrolled data, producing intermittent failures unrelated to the code under test. The practical result: engineers re-ran tests rather than trusting them.
Environment lock-in. Tests written for local or CI environments could not run against staging or production without being rewritten. There was no single test artifact that could validate behavior across the environments where it actually mattered.
Structural unsuitability for AI tooling. Tests were written in an imperative, environment-specific style with no clear separation between intent, dependencies, and data. That made them difficult for AI tools to generate, understand, or maintain, just when teams most needed AI assistance to scale coverage.

The cumulative effect was slow feedback cycles, low trust in the test suite, high maintenance cost, and a test infrastructure that actively resisted modernization.

The Solution

The Pebble Framework is a JUnit-based library written in Kotlin, supporting both Java and Kotlin test code. It is built around several core design decisions:

12-factor as a testability contract. The Pebble Framework treats 12-factor app design as both a requirement for the service under test and a general design guide. Services that externalize configuration and dependencies become substitutable at test time, which is what makes environment-agnostic execution possible.
Write once, run everywhere. A single test can run against fully faked, hermetic dependencies locally or against real dependencies in staging and production — unchanged. The same test providing sub-second local feedback can serve as a post-deploy production verification.
Built-in fakes for real dependency types. Rather than integrating a separate mocking tool per protocol, the Pebble Framework ships fakes for HTTP, GraphQL, Redis, SQL, and Kafka, all configured through one consistent API.
Hot-swapping the service under test. When a test fails, engineers can apply code changes to the running service without a full restart, turning the debug loop from restart-and-wait into a live, interactive cycle.
Ecosystem-wide test data model. The Pebble Framework proposes a unified data model where every service owns its slice of the domain and exposes an internal API for test-data generation. Tests compose data across services through these APIs — a coherent approach that works identically in every environment, including production, where data is synthetic and isolated.
Safety by default via a proxy service. A dedicated proxy sits between the service under test and real dependencies, preventing mutation or leakage of production data while enabling traffic capture and routing between real and faked dependencies.
Traffic recording into reproducible tests. Real (non-production) traffic can be recorded and replayed as reproducible local tests through a separate recorder component, lowering the effort of creating high-fidelity scenarios.
AI-ready structure. Intent, dependencies, and data are declared explicitly and consistently, giving AI tooling the structure needed to generate new tests and maintain existing ones.
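
The first two decisions above (12-factor configuration and write once, run everywhere) can be sketched in plain Java. This is a minimal illustration, not the Pebble Framework API: the names InventoryClient, FakeInventoryClient, RemoteInventoryClient, resolve, and the INVENTORY_URL variable are all hypothetical. The point it shows is that the check itself never changes; only the externalized configuration decides whether a hermetic fake or a real dependency is wired in.

```java
import java.util.Map;

public class DependencyResolution {

    // The dependency as both the service and the test see it (hypothetical).
    interface InventoryClient {
        int stockFor(String sku);
    }

    // Hermetic in-memory fake, used when no real endpoint is configured.
    static final class FakeInventoryClient implements InventoryClient {
        private final Map<String, Integer> stock;

        FakeInventoryClient(Map<String, Integer> stock) {
            this.stock = stock;
        }

        @Override
        public int stockFor(String sku) {
            return stock.getOrDefault(sku, 0);
        }
    }

    // Stand-in for a client that would call the real dependency.
    static final class RemoteInventoryClient implements InventoryClient {
        final String baseUrl;

        RemoteInventoryClient(String baseUrl) {
            this.baseUrl = baseUrl;
        }

        @Override
        public int stockFor(String sku) {
            throw new UnsupportedOperationException("network call omitted in this sketch");
        }
    }

    // 12-factor style: configuration, not the test, selects the implementation.
    // INVENTORY_URL is an assumed variable name.
    static InventoryClient resolve(Map<String, String> env, Map<String, Integer> fakeStock) {
        String url = env.get("INVENTORY_URL");
        if (url == null || url.isEmpty()) {
            return new FakeInventoryClient(fakeStock);
        }
        return new RemoteInventoryClient(url);
    }

    // The scenario under test: identical in every environment.
    static boolean canFulfil(InventoryClient inventory, String sku, int quantity) {
        return inventory.stockFor(sku) >= quantity;
    }

    public static void main(String[] args) {
        // Local run: nothing configured, so the fake is used.
        InventoryClient local = resolve(Map.of(), Map.of("sku-1", 5));
        System.out.println(canFulfil(local, "sku-1", 3)); // prints true

        // Staging-style run: the same resolve call yields the real client.
        InventoryClient staged = resolve(Map.of("INVENTORY_URL", "https://inventory.example"), Map.of());
        System.out.println(staged.getClass().getSimpleName()); // prints RemoteInventoryClient
    }
}
```

In a real run the map passed to resolve would come from System.getenv(); it is passed explicitly here so the sketch stays deterministic.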

The Pebble Framework operates in two execution modes.

Local environment: no proxy; downstream services are replayed from recorded data through built-in fakes.

Non-prod / prod environment: the test process talks directly to the service under test and to test-data generators, while the proxy service guards real dependencies.
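
The guarding role of the proxy in the second mode can be pictured as a per-request decision. The sketch below is an assumption-heavy illustration rather than the actual proxy: the Decision values, the Request shape, and the policy itself (faked hosts are routed to fakes, read-only HTTP methods are forwarded, everything else is blocked) are hypothetical. It covers only mutation protection and traffic capture, not data-leakage prevention.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ProxyGuardSketch {

    // Possible outcomes for an outbound call from the service under test.
    enum Decision { ROUTE_TO_FAKE, FORWARD_TO_REAL, BLOCK }

    // Minimal request shape; a real proxy would see far more detail.
    static final class Request {
        final String method;
        final String host;

        Request(String method, String host) {
            this.method = method;
            this.host = host;
        }
    }

    static final class Proxy {
        // Assumed policy: only these methods may reach a real dependency.
        private static final Set<String> READ_ONLY = Set.of("GET", "HEAD", "OPTIONS");

        private final Set<String> fakedHosts;
        private final List<Request> captured = new ArrayList<>();

        Proxy(Set<String> fakedHosts) {
            this.fakedHosts = fakedHosts;
        }

        Decision decide(Request request) {
            captured.add(request); // traffic capture: every request is recorded
            if (fakedHosts.contains(request.host)) {
                return Decision.ROUTE_TO_FAKE;
            }
            if (READ_ONLY.contains(request.method)) {
                return Decision.FORWARD_TO_REAL;
            }
            return Decision.BLOCK; // a mutating call to a real dependency is refused
        }

        int capturedCount() {
            return captured.size();
        }
    }

    public static void main(String[] args) {
        Proxy proxy = new Proxy(Set.of("payments.internal"));
        System.out.println(proxy.decide(new Request("POST", "payments.internal"))); // prints ROUTE_TO_FAKE
        System.out.println(proxy.decide(new Request("GET", "catalog.internal")));   // prints FORWARD_TO_REAL
        System.out.println(proxy.decide(new Request("DELETE", "catalog.internal"))); // prints BLOCK
    }
}
```

Failing closed, so that anything not explicitly allowed is blocked, is one way to read "safety by default"; the real proxy may apply a different or richer policy.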

The Pebble Framework consists of five components: the JUnit library (test API, fakes, and execution model), the proxy service (safety guard, traffic capture, and routing), test-data generators (per-service internal APIs for synthetic data generation), a separate recorder component (for capturing and replaying real traffic as reproducible local tests), and an IntelliJ plugin currently on the roadmap.
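
The recorder's record-then-replay idea can be illustrated with a small sketch. It assumes, purely for illustration, that a dependency can be modeled as a function from a request string to a response string; the Recorder and Replayer classes are hypothetical and are not the framework's recorder component.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class RecordReplaySketch {

    // Wraps a real dependency and remembers every request/response pair.
    static final class Recorder implements Function<String, String> {
        private final Function<String, String> real;
        private final Map<String, String> tape = new LinkedHashMap<>();

        Recorder(Function<String, String> real) {
            this.real = real;
        }

        @Override
        public String apply(String request) {
            String response = real.apply(request);
            tape.put(request, response);
            return response;
        }

        // Defensive copy so a saved tape cannot change after recording.
        Map<String, String> tape() {
            return new LinkedHashMap<>(tape);
        }
    }

    // Serves responses from a tape only; it never touches the real dependency.
    static final class Replayer implements Function<String, String> {
        private final Map<String, String> tape;

        Replayer(Map<String, String> tape) {
            this.tape = tape;
        }

        @Override
        public String apply(String request) {
            String response = tape.get(request);
            if (response == null) {
                // Failing loudly keeps replayed tests deterministic.
                throw new IllegalStateException("no recorded response for: " + request);
            }
            return response;
        }
    }

    public static void main(String[] args) {
        // Stand-in for a non-production dependency.
        Function<String, String> realService = request -> "echo:" + request;

        Recorder recorder = new Recorder(realService);
        recorder.apply("GET /users/1");

        Replayer replayer = new Replayer(recorder.tape());
        System.out.println(replayer.apply("GET /users/1")); // prints echo:GET /users/1
    }
}
```

Rejecting unrecorded requests, instead of falling through to a live call, is what keeps a replayed test hermetic.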

The Results

The Pebble Framework is described as forward-looking, so the outcomes below are the ones it is designed to deliver:

Near-zero flakiness. Faked dependencies, controlled data, and the guarded proxy make tests deterministic — failures indicate real defects rather than environmental noise.
Lower infrastructure and maintenance cost. A single framework replaces a stack of per-concern tools and bespoke per-service setups, reducing both infrastructure consumption and ongoing maintenance effort.
A single test artifact per scenario. The write-once/run-everywhere model collapses duplication across local, CI, staging, and production into one reusable test.
A test suite ready for AI-assisted development. Explicit, declarative test structure gives AI tooling the input it needs to generate and maintain tests at scale.
