Architecting a 2-Minute Deployment Pipeline: From Legacy to GitLab CI
When I joined Workpay, our deployment pipeline was the single greatest bottleneck to developer velocity. We were shipping a monolithic PHP application that handles payroll computations for thousands of businesses, yet a single deployment cycle took over three hours. The pipeline was bogged down by test suites running sequentially on a single stateful EC2 build runner, no cache reuse between builds, and frequent environment failures caused by host-level dependency drift.
This is an architectural teardown of how we re-engineered our GitLab CI pipeline to run in under two minutes, scaled our testing, and handled zero-downtime database migrations under a Blue/Green ECS deployment model.
The Legacy Bottlenecks
To speed up a pipeline, you must first identify what is stateful and what is sequential. Our legacy setup had three major flaws:
- Sequential Test Execution: We had 1,500 integration and unit tests. Running them sequentially on a single, oversized host took 45 minutes. As the codebase grew, execution time scaled linearly.
- Dependency Drift: Our build host was stateful. A package update on the host (such as an implicit minor version upgrade of libzip-dev or the PHP ext-intl extension) would cause build failures that did not match local development environments.
- Blocking Database Migrations: Run-in-place database migrations would lock critical tables (like payroll_ledgers) during rolling updates. When a container running the new code ran a migration that altered or dropped a column before the old containers had finished draining, the legacy containers would immediately throw SQL errors and crash, breaking the user experience.
Phase 1: Ephemeral Runners and Layer Caching
We first migrated our build runners to Docker-in-Docker (DinD) on ephemeral AWS EC2 spot instances. Every build runs in an isolated container defined by a strict version-locked Docker image (php:8.2-fpm-alpine3.18).
To optimize docker build times, we implemented multi-stage builds and aggressive dependency caching. The first stage installs Composer packages, caching the vendor/ directory based on the cryptographic hash of composer.lock.
Here is the caching configuration we implemented in our .gitlab-ci.yml:
stages:
  - build
  - test
  - deploy

variables:
  COMPOSER_CACHE_DIR: "$CI_PROJECT_DIR/.composer-cache"

# The cache key is derived from composer.lock, so it changes only when dependencies change.
cache:
  key:
    files:
      - composer.lock
  paths:
    - .composer-cache/
    - vendor/

build:dependencies:
  stage: build
  image: composer:2.6
  script:
    - composer install --prefer-dist --no-ansi --no-interaction --no-progress --no-scripts
Because the cache key is a hash of the lock file, the runner restores vendor/ from cache and skips package fetching entirely unless dependencies actually change, cutting the dependency install step from eight minutes to under 15 seconds.
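To make the keying behaviour concrete, here is a minimal PHP sketch of the idea behind a lock-file-derived cache key. GitLab computes its own key internally; the helper name cacheKeyFor, the "composer" prefix, and the choice of SHA-256 are illustrative assumptions, not GitLab's implementation.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: derive a cache key from a lock file's contents.
 * The key changes only when the contents change.
 */
function cacheKeyFor(string $lockFileContents, string $prefix = 'composer'): string
{
    return $prefix . '-' . substr(hash('sha256', $lockFileContents), 0, 16);
}

// Two builds with an identical lock file, then one after a dependency bump.
$before = cacheKeyFor('{"packages":[{"name":"laravel/framework","version":"v10.0.0"}]}');
$same   = cacheKeyFor('{"packages":[{"name":"laravel/framework","version":"v10.0.0"}]}');
$after  = cacheKeyFor('{"packages":[{"name":"laravel/framework","version":"v10.1.0"}]}');

var_dump($before === $same);  // identical lock file: same key, cache hit
var_dump($before === $after); // changed lock file: new key, cache miss
```

The practical consequence is that a commit touching only application code reuses the previous vendor/ directory, while any change to composer.lock forces a fresh install.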
Phase 2: Parallel Test Isolation with Pest
We migrated from PHPUnit to Pest and parallelized test execution. While Pest supports parallel test runs out of the box using --parallel, running database-intensive integration tests concurrently leads to database lockups and dirty state collisions.
To solve this, we avoided shared database state by bootstrapping a separate in-memory SQLite database for each parallel worker process. In our base test case (tests/TestCase.php), we override the connection config during setUp so that every worker migrates and uses its own database:
// In tests/TestCase.php (imports go at the top of the file)
use Illuminate\Support\Facades\Artisan;
use Illuminate\Support\Facades\DB;

protected function setUp(): void
{
    parent::setUp();

    config(['database.connections.testing' => [
        'driver' => 'sqlite',
        // ':memory:' is private to each worker process. For a file-backed database, use
        // database_path('database_test_' . env('TEST_TOKEN', '1') . '.sqlite') instead;
        // TEST_TOKEN is injected per worker by Pest's parallel runner.
        'database' => ':memory:',
        'prefix' => '',
    ]]);

    // Discard any connection opened before the override, then build the schema.
    DB::purge('testing');
    Artisan::call('migrate');
}
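In our .gitlab-ci.yml, we combine Pest's parallel processes with GitLab's parallel keyword to fan the suite out across multiple independent runner nodes. Each job must run only its own shard of the suite, otherwise all four jobs would repeat the same tests. The shard is selected with the CI_NODE_INDEX and CI_NODE_TOTAL variables that GitLab sets for parallel jobs:
test:integration:
  stage: test
  image: our-private-registry/php-test-runner:8.2
  parallel: 4
  script:
    # --shard requires a Pest release that supports test sharding.
    - vendor/bin/pest --parallel --processes=4 --shard=$CI_NODE_INDEX/$CI_NODE_TOTAL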
With four jobs running four processes each, the suite is spread across 16 parallel workers, bringing execution time down from 45 minutes to 90 seconds.
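For the four jobs to reduce wall-clock time, each one has to run a different slice of the suite. The sketch below shows one simple way such a split can work: a deterministic round-robin partition over the sorted list of test files. The function shard and the file names are hypothetical, and this is not a description of Pest's internal algorithm.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical round-robin sharding: job $index (1-based) of $total
 * takes every $total-th test file from the sorted list.
 *
 * @param list<string> $files
 * @return list<string>
 */
function shard(array $files, int $index, int $total): array
{
    if ($total < 1 || $index < 1 || $index > $total) {
        throw new InvalidArgumentException('index must be within 1..total');
    }

    sort($files); // every job must see the files in the same order

    $selected = [];
    foreach ($files as $position => $file) {
        if ($position % $total === $index - 1) {
            $selected[] = $file;
        }
    }

    return $selected;
}

// Illustrative file names only.
$files = [
    'tests/LedgerTest.php',
    'tests/PayslipTest.php',
    'tests/TaxTest.php',
    'tests/AuthTest.php',
    'tests/LeaveTest.php',
];

$total = 4;
for ($index = 1; $index <= $total; $index++) {
    echo "job {$index}: ", implode(', ', shard($files, $index, $total)), "\n";
}
```

In a GitLab job, the index and total would come from the CI_NODE_INDEX and CI_NODE_TOTAL environment variables. Because the partition depends only on the sorted file list, every job computes a disjoint slice without any coordination between runners.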
Phase 3: Zero-Downtime Blue/Green Database Migrations
When deploying to ECS behind an Application Load Balancer (ALB), a Blue/Green deployment spins up the new task group (Green) and directs a fraction of traffic to it while slowly draining active connections from the old task group (Blue). This traffic-shifting window takes around 5 minutes.
During this window, both old and new code are active. If your deployment pipeline automatically runs php artisan migrate first and drops or alters a column, the active Blue tasks will instantly fail when they query that table.
To prevent this, we enforced a strict Expand and Contract (Three-Phase) Migration Pattern:
Phase 1: Expand (Add Only)
We only write migrations that add columns or tables, leaving existing structures untouched. Columns must be nullable or have safe defaults. We deploy code that writes to both the old and the new fields (dual-writing) but continues reading from the old fields.
// Phase 1 Migration
Schema::table('payroll_ledgers', function (Blueprint $table) {
    $table->string('new_currency_code', 3)->nullable(); // Nullable to prevent legacy write failures
});
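The dual-writing rule described for the Expand phase can be sketched in plain PHP as follows. The function names and the amount_minor field are hypothetical, chosen only for illustration; the two column names come from the migrations in this article.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical write path during the Expand phase: every write populates
 * both the old and the new column, so Blue and Green tasks each find the
 * value where they expect it.
 *
 * @return array<string, int|string>
 */
function ledgerAttributes(string $currencyCode, int $amountMinor): array
{
    $code = strtoupper($currencyCode);

    return [
        'amount_minor'      => $amountMinor, // illustrative field
        'old_currency_code' => $code,        // still read by all code
        'new_currency_code' => $code,        // written now, read after Contract
    ];
}

/**
 * Reads stay on the old column until the Contract phase switches them over.
 *
 * @param array<string, int|string> $row
 */
function readCurrency(array $row): string
{
    return (string) $row['old_currency_code'];
}

$row = ledgerAttributes('kes', 125000);
echo readCurrency($row), "\n";
```

Keeping the read path on the old column means a rollback to the previous release is still safe at this stage, because nothing yet depends on the new column being populated.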
Phase 2: Backfill
We run an asynchronous, chunked background job to migrate historical database records from the old schema format to the new format. This runs outside the deployment pipeline to avoid locking production tables.
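A minimal sketch of a chunked backfill, assuming an integer primary key named id and the pdo_sqlite extension for the in-memory demo. The function backfillCurrency and the table layout beyond the two currency columns are assumptions; a production job would run against the real database from a queue worker and pause between chunks.

```php
<?php
declare(strict_types=1);

/**
 * Copy old_currency_code into new_currency_code in primary-key chunks,
 * so that each UPDATE touches a small, bounded range of rows.
 * Returns the number of rows updated.
 */
function backfillCurrency(PDO $db, int $chunkSize = 1000): int
{
    $select = $db->prepare(
        'SELECT id FROM payroll_ledgers
         WHERE id > :after AND new_currency_code IS NULL
         ORDER BY id LIMIT :limit'
    );
    $update = $db->prepare(
        'UPDATE payroll_ledgers SET new_currency_code = old_currency_code
         WHERE id BETWEEN :first AND :last AND new_currency_code IS NULL'
    );

    $after = 0;
    $updated = 0;

    while (true) {
        $select->bindValue(':after', $after, PDO::PARAM_INT);
        $select->bindValue(':limit', $chunkSize, PDO::PARAM_INT);
        $select->execute();
        $ids = $select->fetchAll(PDO::FETCH_COLUMN);

        if ($ids === []) {
            return $updated; // nothing left to backfill
        }

        $last = (int) end($ids);
        $update->bindValue(':first', (int) $ids[0], PDO::PARAM_INT);
        $update->bindValue(':last', $last, PDO::PARAM_INT);
        $update->execute();

        $updated += $update->rowCount();
        $after = $last; // always advance the cursor, even if a row stayed NULL
    }
}

// Demo against an in-memory database with illustrative rows.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec(
    'CREATE TABLE payroll_ledgers (
        id INTEGER PRIMARY KEY,
        old_currency_code TEXT,
        new_currency_code TEXT
    )'
);
$insert = $db->prepare('INSERT INTO payroll_ledgers (old_currency_code) VALUES (?)');
foreach (['KES', 'UGX', 'NGN', 'KES', 'TZS'] as $code) {
    $insert->execute([$code]);
}

$backfilled = backfillCurrency($db, 2);
echo $backfilled, " rows backfilled\n";
```

Walking the table by primary key keeps each statement short, and the IS NULL guard makes the job safe to re-run: rows already filled in by dual-writes or by an earlier run are skipped.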
Phase 3: Contract (Cleanup)
Once the backfill is complete and we confirm all active traffic is routed to the new code, we deploy a release that reads exclusively from the new schema fields. Finally, we execute a post-deployment migration to drop the old column:
// Phase 3 Migration (Run safely after Blue containers are gone)
Schema::table('payroll_ledgers', function (Blueprint $table) {
    $table->dropColumn('old_currency_code');
});
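Before running a destructive migration like this one, it can help to verify that the backfill really is complete. The check below is a hedged sketch rather than part of the original pipeline: the function safeToDropOldColumn is hypothetical, and the demo again assumes the pdo_sqlite extension.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical pre-contract guard: the old column is safe to drop only
 * when no row still holds a value that is missing from the new column.
 */
function safeToDropOldColumn(PDO $db): bool
{
    $mismatched = (int) $db->query(
        'SELECT COUNT(*) FROM payroll_ledgers
         WHERE old_currency_code IS NOT NULL
           AND (new_currency_code IS NULL OR new_currency_code <> old_currency_code)'
    )->fetchColumn();

    return $mismatched === 0;
}

// Demo with illustrative rows: one backfilled, one still pending.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec(
    'CREATE TABLE payroll_ledgers (
        id INTEGER PRIMARY KEY,
        old_currency_code TEXT,
        new_currency_code TEXT
    )'
);
$db->exec(
    "INSERT INTO payroll_ledgers (old_currency_code, new_currency_code)
     VALUES ('KES', 'KES'), ('UGX', NULL)"
);

$beforeBackfill = safeToDropOldColumn($db);
var_dump($beforeBackfill); // one row is not backfilled yet

$db->exec(
    'UPDATE payroll_ledgers SET new_currency_code = old_currency_code
     WHERE new_currency_code IS NULL'
);

$afterBackfill = safeToDropOldColumn($db);
var_dump($afterBackfill);
```

A check of this kind could gate the Contract release, so the drop only runs once the data in the new column is known to be complete.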
This discipline entirely eliminated deployment-related database lockups and runtime SQL crashes.
Retrospective and Key Lessons
Re-engineering a deployment process is never just about installing a new CI runner. It is a commitment to isolation, statelessness, and backward compatibility.
By moving to version-locked, ephemeral containers, parallelizing integration tests using isolated thread databases, and adopting the Expand/Contract migration pattern, we reduced our pipeline times to under two minutes while maintaining 99.99% availability during deployments.