DCS: Giving the power of configuration to the user
If you’re not familiar with Blend, think of us as a modern experience for getting a loan. We offer a guided, personalized front-end application that makes it easy for borrowers to connect their account data and makes processing that data more structured and secure for lenders.
We process loan applications for over 130 financial institutions including Wells Fargo and US Bank. To make the borrowing experience seamless, we integrate behind the scenes with our customers’ in-house tech stacks and dozens of third-party vendors.
We rely on a suite of reusable components to drive our integrations. Building generic components to implement common integration patterns allows us to build new integrations quickly and then test, monitor, and troubleshoot them effectively. Because so many integrations depend on these components, keeping them clean, general, and reliable isn’t just important — it’s critical.
Configurations as the drivers
Integration- and customer-specific configurations drive those generic components. For example, we built an external event system called BONS. BONS configurations contain things like the connection information for receivers of the events and which events to send to each customer. BONS does the heavy lifting of receiving, packaging, and sending events in a generalized way and uses its configurations for the specifics required to tailor those events to each customer.
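Here is a minimal Python sketch of a BONS-style sender that shows how generic logic and per-customer configuration divide the work. The `BonsConfig` fields (`receiver_url`, `subscribed_events`) and the `route_event` function are hypothetical illustrations. BONS's actual schema and interface aren't described in this post.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BonsConfig:
    """Hypothetical per-customer configuration for a BONS-like event sender."""
    customer_id: str
    receiver_url: str                       # where this customer's events are delivered
    subscribed_events: frozenset = field(default_factory=frozenset)


def route_event(event_type: str, payload: dict, config: BonsConfig):
    """Generic logic: package an event, or return None if the customer didn't subscribe."""
    if event_type not in config.subscribed_events:
        return None
    # Packaging is identical for every customer; only the configuration varies.
    return {
        "url": config.receiver_url,
        "body": {"type": event_type, "customer": config.customer_id, "data": payload},
    }


acme = BonsConfig(
    customer_id="acme-bank",
    receiver_url="https://events.example.com/acme",
    subscribed_events=frozenset({"application.submitted"}),
)
print(route_event("application.submitted", {"loan_id": "123"}, acme)["url"])  # https://events.example.com/acme
print(route_event("document.uploaded", {}, acme))                            # None
```

Onboarding a new customer in this model means writing a new `BonsConfig`, not new code. Those configuration objects are what the rest of this post is about.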
Maintaining these configurations became a challenge as we built more integrations to power this consumer lending ecosystem. This is the story of how we tackled that challenge.
The problem with in-code configurations
Customer-specific configurations grew over time as we added integration features and onboarded new customers. Two years ago, an engineer could comfortably manage the configuration for a handful of integration features and customers. Since then, our feature set and customer base have grown faster than our engineering team.
What didn’t change was where these configurations lived — in code. Having these configurations in the same codebase as the service consuming them meant that engineers still needed to make and deploy changes to these configurations. It quickly became a problem for three reasons:
- Changes were frequent and usually small, yet each one required significant engineering effort regardless of its size.
- Because customer-specific configurations change frequently, the backlog of requested changes steadily grew, with engineering bandwidth as a bottleneck.
- Engineers were no longer the main stakeholders for these configuration changes. The changes were requested by customers or by other internal teams at Blend (client support, deployments, etc.). Engineers either had to change configurations in code without much context, or had to track down that context before making the change. This wasted time and led to mistakes.
Storing configurations in DCS
We needed to stop requiring engineers to make changes to customer-specific configurations. We decided to move these configurations to a place where they could be managed by their primary stakeholders: our support and deployment teams, and eventually our customers.
We built a configuration store called Deployment Configuration Service (DCS) which stores configurations and serves them to our integration services. Its key features are:
- An API that allows configurations to be accessed and modified instantly
- Configuration schema management and versioning
- Configuration validation to prevent invalid and/or harmful modifications
- Grouping of configurations by feature and customer
- An audit trail of changes
Key design challenges
Moving configurations from code to a centralized store introduced several design challenges.
Availability
Because our integration services can’t work without these configuration objects, the availability of configurations is critical. With configurations in code, keeping those objects available is as simple as importing them. When configurations are instead being served by a separate service, it’s more difficult to make sure our integrations have the configuration objects they need to work.
DCS runs on the same Kubernetes cluster as the services that use it, which keeps network communication fast and reliable. We also take advantage of Kubernetes autoscaling, dynamically increasing or decreasing service replicas based on load.
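The post doesn't say which autoscaler DCS uses. If it is Kubernetes' Horizontal Pod Autoscaler (HPA), the replica count follows HPA's documented rule, `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. HPA skips scaling when that ratio is within a tolerance of 1.0, which is 0.1 by default. The sketch below reproduces the arithmetic. The `min_replicas`/`max_replicas` bounds and the CPU figures are illustrative assumptions.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 2, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Replica count per the HPA scaling rule, clamped to [min_replicas, max_replicas]."""
    ratio = current_metric / target_metric
    # Skip scaling when the metric is already close enough to the target.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Three replicas averaging 90% CPU against a 60% target: ceil(3 * 1.5) = 5.
print(desired_replicas(3, current_metric=90, target_metric=60))  # 5
# Load drops to 20% CPU on five replicas: ceil(5 * 20 / 60) = 2.
print(desired_replicas(5, current_metric=20, target_metric=60))  # 2
```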
DCS uses Postgres on Amazon RDS. We use RDS snapshots, warm replicas, and hot swaps to make sure we can recover from database-related issues. The service also maintains and scales its own database connection pool to keep queries fast when load increases.
In addition to maintaining high availability of DCS itself, we built a DCS library that allows client services to cache DCS configurations locally. Integration services that implement the cache are further protected from network latency and DCS outages.
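Below is a minimal sketch of how such a client-side cache could behave. It serves cached values while they are fresh, refreshes them after a TTL, and falls back to the last known value if DCS can't be reached. The class name, the `fetch` callback standing in for the DCS API call, and the stale-on-error policy are all assumptions. The post doesn't describe the library's actual interface.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CachedConfigClient:
    """Hypothetical client-side cache in front of DCS.

    Serves cached values while they are fresh, refreshes them once `ttl_seconds`
    have passed, and falls back to the last known value if the refresh fails.
    """

    def __init__(self, fetch: Callable[[str], Any], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch    # stand-in for the network call to DCS
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[Any, float]] = {}  # key -> (value, fetched_at)

    def get(self, key: str) -> Any:
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]               # fresh hit: no network round trip
        try:
            value = self._fetch(key)
        except Exception:
            if cached is not None:
                return cached[0]           # DCS unreachable: serve the stale value
            raise                          # nothing cached yet: surface the error
        self._entries[key] = (value, now)
        return value


# Simulated DCS that answers once and then goes down.
calls = []


def flaky_fetch(key: str) -> dict:
    calls.append(key)
    if len(calls) > 1:
        raise ConnectionError("DCS unavailable")
    return {"receiver_url": "https://events.example.com/acme"}


fake_now = [0.0]
client = CachedConfigClient(flaky_fetch, ttl_seconds=60, clock=lambda: fake_now[0])
first = client.get("bons/acme")    # fetched from DCS and cached
second = client.get("bons/acme")   # served from cache, no fetch
fake_now[0] = 120.0                # entry is now older than the TTL
third = client.get("bons/acme")    # refresh fails, stale value is returned
```

Serving stale values on error trades freshness for availability. Whether that trade is acceptable depends on how quickly a configuration change must take effect in the consuming service.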
Auditability
When configurations were in code, Git provided more than enough auditability. Combine that with pull requests, code reviews, and the release checks in our build pipeline, and you can see how making changes to configurations in code comes with a hefty paper trail. We needed to maintain this level of transparency so that if things went wrong, we could still piece together what happened and when.
To address this, our configuration values table is append-only, and each record stores who made the change. Even small changes result in a new record, so the full history of an integration’s configuration is cataloged. For an additional layer of redundancy, we plan to start hashing and sending each record to a secondary store. In the event of an audit, we would then be able to verify that a configuration’s history was never modified directly in the database.
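The sketch below shows the append-only pattern with Python's built-in `sqlite3`, used here in place of Postgres so the example is self-contained. Several details are assumptions for illustration and not DCS's actual schema:
- the table and column names
- the triggers that reject updates and deletes
- the dictionary standing in for the planned secondary store, which here holds only each record's hash

```python
import hashlib
import json
import sqlite3

# SQLite stands in for Postgres; table and column names are hypothetical.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE config_values (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    config_key TEXT NOT NULL,
    value_json TEXT NOT NULL,
    changed_by TEXT NOT NULL
);
-- Reject in-place edits and deletions: every change must be a new row.
CREATE TRIGGER no_update BEFORE UPDATE ON config_values
BEGIN SELECT RAISE(ABORT, 'config_values is append-only'); END;
CREATE TRIGGER no_delete BEFORE DELETE ON config_values
BEGIN SELECT RAISE(ABORT, 'config_values is append-only'); END;
""")

secondary_store = {}  # hypothetical external store: record id -> SHA-256 of the record
COLUMNS = "id, config_key, value_json, changed_by"


def record_hash(row: tuple) -> str:
    """SHA-256 over a JSON encoding of one stored record."""
    return hashlib.sha256(json.dumps(list(row)).encode()).hexdigest()


def append_value(config_key: str, value: dict, changed_by: str) -> int:
    """Store a new version of a configuration and ship its hash to the secondary store."""
    cur = db.execute(
        "INSERT INTO config_values (config_key, value_json, changed_by) VALUES (?, ?, ?)",
        (config_key, json.dumps(value, sort_keys=True), changed_by),
    )
    db.commit()
    row = db.execute(f"SELECT {COLUMNS} FROM config_values WHERE id = ?",
                     (cur.lastrowid,)).fetchone()
    secondary_store[row[0]] = record_hash(row)
    return row[0]


def latest_value(config_key: str):
    """The current configuration is simply the newest row for that key."""
    row = db.execute(
        "SELECT value_json FROM config_values WHERE config_key = ? ORDER BY id DESC LIMIT 1",
        (config_key,),
    ).fetchone()
    return json.loads(row[0]) if row else None


def history_is_intact() -> bool:
    """Recompute every record's hash and compare it with the copy held outside the database."""
    rows = db.execute(f"SELECT {COLUMNS} FROM config_values ORDER BY id").fetchall()
    return len(rows) == len(secondary_store) and all(
        secondary_store.get(row[0]) == record_hash(row) for row in rows
    )


append_value("bons/acme", {"events": ["application.submitted"]}, "alice@support")
append_value("bons/acme", {"events": ["application.submitted", "document.uploaded"]},
             "bob@deployments")
print(latest_value("bons/acme"))
try:
    db.execute("UPDATE config_values SET changed_by = 'mallory' WHERE id = 1")
except sqlite3.DatabaseError as err:
    print("blocked:", err)
print(history_is_intact())
```

The triggers only guard against edits made through normal SQL. The external hashes are what would reveal tampering that bypasses them, for example a direct edit after the triggers were dropped.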
In addition to making it easier to look through the configuration change history, we wanted to enable active monitoring. When a configuration is changed, DCS notifies several monitoring channels with a summary of the update. This makes updates transparent throughout the company and allows us to spot-check them in real time.
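The post doesn't specify what a change summary contains. This hypothetical sketch compares the top-level fields of the old and new configuration objects and builds a one-line summary. It then sends that summary to a list of channel callbacks, which stand in for chat, email, or logging integrations.

```python
from typing import Callable, Iterable


def summarize_change(config_key: str, old: dict, new: dict, changed_by: str) -> str:
    """One-line summary of which top-level fields were added, removed, or changed."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    parts = [f"{label}: {', '.join(keys)}"
             for label, keys in (("added", added), ("removed", removed), ("changed", changed))
             if keys]
    return f"{config_key} updated by {changed_by} ({'; '.join(parts) or 'no field changes'})"


def notify(channels: Iterable[Callable[[str], None]], message: str) -> None:
    """Send the summary to every monitoring channel."""
    for send in channels:
        send(message)


received = []
msg = summarize_change(
    "bons/acme",
    old={"receiver_url": "https://old.example.com", "retries": 3},
    new={"receiver_url": "https://new.example.com", "timeout_s": 10, "retries": 3},
    changed_by="alice@support",
)
notify([received.append, print], msg)
# bons/acme updated by alice@support (added: timeout_s; changed: receiver_url)
```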
Access and modification control
We were no longer able to use the access and modification controls already in place around making changes to production code. We needed an easy way to control which users viewed and edited configurations while guarding against invalid configuration updates due to human error — things that source control and testing pipelines had been doing for us.
We decided to build a UI for updating configurations. This gave us a single place to manage role-based access and to protect configuration updates at a field-by-field level. Beyond the guardrails the UI provides, we also decided to support encoding configuration schemas in DCS itself. DCS validates each incoming update request against the configuration’s schema and blocks non-conforming updates, so invalid configurations are never stored. This provides additional protection and peace of mind, especially when developers use the API directly.
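The sketch below puts two checks side by side: a field-level permission check like the one the UI performs, and the schema validation DCS applies to every update. The schema format, roles, and function names are hypothetical. A production system might describe schemas with a standard such as JSON Schema instead, but this stdlib-only version stays self-contained.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical schema format: field name -> (expected type, required?)
Schema = Dict[str, Tuple[type, bool]]

BONS_SCHEMA: Schema = {
    "receiver_url": (str, True),
    "subscribed_events": (list, True),
    "timeout_s": (int, False),
}

# Hypothetical field-level permissions: role -> fields that role may edit.
EDITABLE_FIELDS: Dict[str, Set[str]] = {
    "support": {"subscribed_events"},
    "deployments": {"receiver_url", "subscribed_events", "timeout_s"},
}


def validate(config: dict, schema: Schema) -> List[str]:
    """Return schema violations; an empty list means the configuration conforms."""
    errors = []
    for name, (expected_type, required) in schema.items():
        if name not in config:
            if required:
                errors.append(f"missing required field '{name}'")
        # Exact type match, so that e.g. True is not accepted where an int is expected.
        elif type(config[name]) is not expected_type:
            errors.append(f"field '{name}' must be {expected_type.__name__}, "
                          f"got {type(config[name]).__name__}")
    errors.extend(f"unknown field '{name}'" for name in sorted(config.keys() - schema.keys()))
    return errors


def check_update(role: str, old: dict, new: dict, schema: Schema) -> List[str]:
    """Reject the update if the role edits a field it doesn't own or the result is invalid."""
    changed = {k for k in old.keys() | new.keys() if old.get(k) != new.get(k)}
    forbidden = sorted(changed - EDITABLE_FIELDS.get(role, set()))
    errors = [f"role '{role}' may not edit '{name}'" for name in forbidden]
    return errors + validate(new, schema)


current = {"receiver_url": "https://events.example.com/acme",
           "subscribed_events": ["application.submitted"]}
allowed = dict(current, subscribed_events=["application.submitted", "document.uploaded"])
print(check_update("support", current, allowed, BONS_SCHEMA))  # []
print(check_update("support", current, dict(current, receiver_url=42), BONS_SCHEMA))
```

Because DCS validates on the server side, the schema check applies whether an update comes from the UI or directly through the API.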
Configuration schema migration
When configurations are in code, the schema of a configuration object and the code that uses it are tightly coupled. When someone needs to change the schema — adding a new key, for example — they can immediately update the consuming code to read from that new key in the same pull request. Those two changes can therefore be made together, tested together, and deployed together in lockstep. When configuration objects are stored in a separate service from the consuming code, it’s harder to keep the two working in sync. To avoid downtime, developers need to coordinate non-backwards-compatible changes to a configuration with a deploy of the consuming service.
To solve this, we allow each configuration schema to be assigned a version number. The service consuming a configuration object includes the schema version it currently supports in its request to DCS. Each client service can then migrate to a new schema version independently: once it has made the necessary code changes, it increments that version and deploys.
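Here is a minimal sketch of version negotiation, where DCS keeps a configuration for each schema version and returns whichever version the caller asks for. The post doesn't describe the request format. The in-memory `STORE`, the `get_config` function, and the integer versions are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical store: (config_key, schema_version) -> configuration object
STORE: Dict[Tuple[str, int], dict] = {
    ("bons/acme", 1): {"receiver": "https://events.example.com/acme"},
    # v2 renames "receiver" to "receiver_url" and adds "timeout_s": not backwards compatible.
    ("bons/acme", 2): {"receiver_url": "https://events.example.com/acme", "timeout_s": 10},
}


def get_config(config_key: str, schema_version: int) -> dict:
    """Serve the configuration in the schema version the caller says it supports."""
    try:
        return STORE[(config_key, schema_version)]
    except KeyError:
        raise LookupError(
            f"no {config_key} configuration at schema version {schema_version}") from None


# Client code that is already deployed keeps requesting v1 and keeps working.
SUPPORTED_VERSION = 1
print(get_config("bons/acme", SUPPORTED_VERSION)["receiver"])
# The release that starts reading the v2 keys also bumps the requested version.
SUPPORTED_VERSION = 2
print(get_config("bons/acme", SUPPORTED_VERSION)["receiver_url"])
```

Both versions coexist in the store, so a non-backwards-compatible schema change doesn't have to ship in lockstep with the consumer's deploy.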
In addition to the schema version, the DCS API supports requesting either the “latest” configuration object at a particular version or any previously stored configuration object by UUID. Through these request parameters, client services can toggle between any stored configuration objects, whether for testing or for blue/green deploys of updated values.
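The hypothetical sketch below implements both lookups over an append-only history. `latest` returns the newest record at a schema version, and `by_id` returns any earlier record pinned by UUID. A small `resolve` function treats the UUID as an optional request parameter. None of these names come from the DCS API.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class ConfigRecord:
    config_key: str
    schema_version: int
    value: dict
    record_id: uuid.UUID = field(default_factory=uuid.uuid4)


class ConfigHistory:
    """Hypothetical append-only history with 'latest at version' and 'by UUID' lookups."""

    def __init__(self) -> None:
        self._records: List[ConfigRecord] = []

    def put(self, config_key: str, schema_version: int, value: dict) -> uuid.UUID:
        record = ConfigRecord(config_key, schema_version, value)
        self._records.append(record)
        return record.record_id

    def latest(self, config_key: str, schema_version: int) -> ConfigRecord:
        """Newest record for this key at the requested schema version."""
        for record in reversed(self._records):
            if record.config_key == config_key and record.schema_version == schema_version:
                return record
        raise LookupError(f"no {config_key} configuration at version {schema_version}")

    def by_id(self, record_id: uuid.UUID) -> ConfigRecord:
        """Any previously stored record, pinned by its UUID."""
        for record in self._records:
            if record.record_id == record_id:
                return record
        raise LookupError(f"no configuration with id {record_id}")


def resolve(history: ConfigHistory, config_key: str, schema_version: int,
            pinned_id: Optional[uuid.UUID] = None) -> ConfigRecord:
    """Hypothetical request handler: a pinned UUID wins, otherwise serve the latest."""
    if pinned_id is not None:
        return history.by_id(pinned_id)
    return history.latest(config_key, schema_version)


history = ConfigHistory()
blue = history.put("bons/acme", 2, {"timeout_s": 10})
green = history.put("bons/acme", 2, {"timeout_s": 30})
print(resolve(history, "bons/acme", 2).value)                  # {'timeout_s': 30}
print(resolve(history, "bons/acme", 2, pinned_id=blue).value)  # {'timeout_s': 10}
```

In a blue/green rollout, one environment could request the pinned UUID of the previous values while the other requests the latest. Rolling back would then mean changing a request parameter rather than editing stored data.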
Certain configurations, particularly ones that are customer-specific or change frequently, become a burden to maintain in code as they scale. Moving these configurations into a database where they can be modified dynamically can reduce the burden. However, this comes with its own set of challenges. Availability, auditability, access control, and configuration schema updates, all of which you get for free with in-code configurations, become major considerations in designing and implementing a database-backed solution.
We’ve seen dramatic benefits from moving our customer-specific integration configurations into a centralized datastore. Configuration changes that once required pull requests, code reviews, multiple engineers, and sometimes days of effort can now be made instantly by customer-facing teams that have the most context on the change.
We’ve increased the velocity and reduced the engineering effort required to make configuration changes in our integration layer. This allows us to stand up more integrations with more customers and efficiently test and maintain new configuration-driven features with existing customers.
Gabe is a software engineer based out of Blend’s New York office. He’s focused on scaling our integrations and evolving our microservice ecosystem.
Interested in joining Blend in San Francisco or New York? We’re hiring and would love to talk to you!