How we stopped wasting time building custom integrations
At Blend, we’re always working to increase transparency and equity in access to consumer lending and lending-adjacent markets. The current technical ecosystem of consumer lending is disjointed. Much like bridges bring communities together in the real world, much of what Blend does depends on our ability first to construct virtual bridges (integrations) between these existing, disjointed systems and second to create a unified experience for our users. And just as in the real world, building bridges isn’t just about plopping the physical structure into place; the real challenge is connecting the people on either side of that bridge and helping them communicate. The parallel in the world of software is uncanny: hundreds of different systems — many speaking completely different languages, with completely different contexts and databases, across multiple network boundaries — must work together toward a common goal to create clarity for the user. That act — teaching systems a common language and orchestrating their work toward a greater purpose — is at the very core of what the Integrations Engineering team at Blend does every day.
Blend’s operation in the finance space demands a focus on the quality, security, and reliability of production code. Our clients (and our clients’ clients) expect performance and availability from our services. Practically, this translates to a heterogeneous mixture of enterprise- and consumer-grade integrations with low error budgets and tight SLAs. The challenges of such integrations are well-known, well-studied, and well-documented by industry luminaries (I’m an embarrassingly huge fan of Martin Fowler in the area of enterprise architecture thought leadership). Overcoming those challenges is as much art as it is science.
As a fast-growing company (both in the size of our development team and the size of our customer base), we anticipated engineering bandwidth problems nearly a full calendar year ago, as we realized the average integration project for a single system required the energy of a full-time engineer. This one-to-one pattern was not financially scalable if we hoped to transform the entire consumer lending space. We needed to come up with a solution.
Before: each engineer was responsible for maintaining integrations with a single client.
After: each engineer is capable of maintaining multiple client integrations via the standard BONS interface.
While there are a variety of API convergence products on the market, our team felt, after careful review, that no out-of-the-box solutions met our needs. Those with sufficient flexibility to allow the easy integration of custom legacy solutions and modern web solutions did not provide adequate security, and, conversely, those that provided adequate security did not provide the breadth of integration capabilities — or flexibility — that we knew from experience we required.
What did we do?
Enter BONS: the Blend Omnichannel Notification System. BONS (so named because I love the similarly-titled candy) is a hybrid RESTful API/webhook event service that enables our Integrations Engineers to interact with almost any digital interface through configuration rather than customized code. This frees up our developers to drive our product forward, rather than get bogged down in maintenance engineering.
Why did we do it?
You might be thinking to yourself, “Writing a custom integrations platform from scratch sounds like a really large investment… Why should I take that risk?” You’d be right to ask this!
By December of 2016, the integration path with superbank A, our first enterprise customer, was ongoing and demanded a very large time commitment across multiple dimensions of our organization (product, engineering, infrastructure, design, compliance and legal, etc.). We were in the midst of our second design session with our next enterprise customer (we’ll call them superbank B) when we realized: we could not possibly scale both of these engagements simultaneously at the velocity we wanted with our current work pattern!
Adding to our concern was the small size of our team. We were a textbook example of the Keyholder’s Dilemma: Each team member was the expert in her own deployment vertical; a single departure could have caused catastrophic interruptions to our delivery plans, meaning increased risk for both Blend and our clients.
Then, of course, there was the legal and compliance work required to get each new integration off the ground in a sustainable, secure way. Even if we were able to shoulder the superhuman workload of engineering the solutions needed for present and future clients, we’d still need to iterate on those solutions with the compliance and legal teams from both Blend and the client organization. Anyone who has undergone security reviews knows this essential process can take weeks — or months — depending on system complexity, data sensitivity, and the number of stakeholders. It was a risk level we couldn’t accept for a process we couldn’t avoid.
With all of this in mind, some back-of-the-napkin estimation revealed a troubling trend: We’d need to hire one full-stack engineer per work stream at each bank. That was simply not realistic, nor was it aligned with our medium- or long-term growth strategy.
As a team, we sat down to brainstorm: How can we simultaneously design, build, test, and deploy a repeatable, scalable integration system while also working at 150 percent of capacity on only two current customers? How can we create a compliance and security scenario that requires minimal review with each additional client, but maintains the same high standards Blend espouses for all projects?
The answer seemed simple: work with our customers to align our existing workload and our integration enablement roadmap with the Agile build and rollout of what would become the BONS system…integrate our plans into existing client roadmaps to create efficiencies for both them and us! Some quick planning and sketches, followed by conversations with key technical contacts at superbank B, led us to realize that not only did we have their support, but they were excited to innovate and build something new because of the efficiencies it would create over the long-term future of our partnership. (And, candidly, we were infectiously excited about the prospect of building something awesome.)
With whiteboards, pens, and ideas in hand, we embarked on an awesome development journey with superbank B over the course of nearly six months. But back to our original question of risk: we mitigated the cost of development by making the new system a core part of a customer project. So how did we actually build it?
An Introduction to BONS
At its core, BONS is an event-driven system (read more about the Event Sourcing pattern from Martin Fowler). The service sits between the Blend internal microservices ecosystem and our customers’ systems. More generically, it creates a bridge between systems that produce actionable information and systems that consume events related to that information. It coordinates data-change events between those internal microservices for our customers, providing a simple, well-typed interface for change events that may drive downstream actions in a consuming service. The primary goal is to be able to intercept those messages, decorate them with client-specific configuration data, and slingshot them to the appropriate downstream client systems, all the while tracking the state of events to provide an auditable window into the system. What makes BONS truly special, however, is its ability to work in concert with Blend’s API: Events are designed to help an appropriately credentialed and authorized consuming system dynamically call Blend’s API resources in post-processing by providing relative linking as an out-of-the-box attribute of the event messages. Combining these two systems enables external parties to react in near-real time to borrower activities on the Blend platform.
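To make the event shape concrete, here is a minimal sketch of what such a message envelope might look like. All field names, the event type, and the `/applications/{id}` relative link are illustrative assumptions, not Blend’s actual schema:

```python
import json
from datetime import datetime, timezone

def build_event(event_type, object_id):
    """Build a hypothetical BONS-style event envelope (illustrative fields only)."""
    return {
        "eventId": f"evt-{object_id}-{event_type}",            # contextually unique identifier
        "eventType": event_type,
        "occurredAt": datetime.now(timezone.utc).isoformat(),  # timestamp
        "objectRef": {"id": object_id, "type": "application"}, # primary object reference
        # Relative links let an authorized consumer call back into the
        # platform API in post-processing without hard-coding resource URLs.
        "links": {"self": f"/applications/{object_id}"},
        "metadata": {},  # subscriber-specific extras go here
    }

event = build_event("application.updated", "12345")
print(json.dumps(event, indent=2))
```

The relative link is what lets a credentialed consumer react to an event by fetching the full resource, rather than BONS having to push every detail downstream.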
Dynamic Message Configuration
While having an opinionated interface may sound like an obvious idea, it actually becomes rather challenging when all of the services that must consume it speak different languages of different complexities (or, to add another dimension of complexity, a mixture of incompatible versions of the same language alongside completely different languages). In addition to lexical variation, each supported system may be able to handle (or even require) more or less data than similar systems.
In order to tackle this challenge, we introduced the idea of dynamic configurations: database- or code-backed configuration objects that contain information about four key areas: routing, message shaping, dispatching, and event state management.
It’s not enough to simply persist an event — we need to understand from where the event originated to understand where it needs to go next. Internally, we handle origination information with naming conventions that are centrally managed by our service manager, Rolodex. When an event is sent to BONS, it already contains all the background information we need to make future routing decisions. We retrieve any pre-processing steps for event formation from the event routing directives (where do we send this when it’s ready, how do we authenticate, and what do we do when it succeeds or fails?) in the routing layer before passing the decorated context and inbound event to the preprocessor.
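A routing directive lookup might be sketched as follows; the key structure, field names, and `secret-ref://` convention are hypothetical, chosen only to illustrate the idea of decorating an inbound event’s context before it reaches the preprocessor:

```python
# Hypothetical routing directives, keyed by (origin service, event type).
# The "auth" value is a reference to a secret, never the secret itself.
ROUTING_DIRECTIVES = {
    ("loan-service", "application.updated"): {
        "destination": "https://partner.example.com/webhooks",
        "auth": "secret-ref://partner-a/webhook-token",
        "mode": "async",               # synchronous or asynchronous handling
        "on_success": "ACKNOWLEDGED",  # state to record on a successful send
        "on_failure": "FAILED",
    },
}

def route(origin, event_type):
    """Look up the directive and return a decorated context for the shaper."""
    directive = ROUTING_DIRECTIVES.get((origin, event_type))
    if directive is None:
        raise LookupError(f"no route configured for {origin}/{event_type}")
    return {"origin": origin, "eventType": event_type, **directive}

context = route("loan-service", "application.updated")
```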
Once the Message Shaper receives the decorated context from the router, we build the outbound event object that will be shared with subscribing systems. The recipe for constructing the object is built into the configuration context provided by the router. The core event structure is well-defined and typed: We define a few basic attributes that must exist for every event (e.g., a contextually unique identifier, timestamps, primary object reference information, etc.).
We also allow for the addition of a metadata sub-object that may contain information specific to a particular subscriber or message type to allow for flexibility in message content across a wide range of potential consumers. Each configuration contains self-validation information, which is used to validate the finished object before it is passed to the Dispatcher to be sent to the subscribing event consumers.
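The shaping-plus-self-validation step might look roughly like this; the `context` and `inbound` shapes and the required-field set are assumptions for illustration, not the real configuration format:

```python
REQUIRED_FIELDS = {"eventId", "eventType", "occurredAt", "objectRef"}

def shape_message(context, inbound):
    """Build the outbound event described by the router's configuration context."""
    outbound = {
        "eventId": inbound["id"],
        "eventType": inbound["type"],
        "occurredAt": inbound["ts"],
        "objectRef": inbound["object"],
    }
    # Subscriber-specific extras live in an optional metadata sub-object.
    extra = context.get("metadata_fields", [])
    if extra:
        outbound["metadata"] = {k: inbound.get(k) for k in extra}
    # Self-validation: fail fast before the event reaches the Dispatcher.
    missing = REQUIRED_FIELDS - outbound.keys()
    if missing:
        raise ValueError(f"invalid outbound event, missing {sorted(missing)}")
    return outbound

shaped = shape_message(
    {"metadata_fields": ["loanType"]},
    {"id": "evt-1", "type": "application.updated", "ts": "2017-05-01T00:00:00Z",
     "object": {"id": "12345"}, "loanType": "mortgage"},
)
```

Keeping validation rules inside the configuration means a bad config surfaces as a rejected event, not as a malformed message at the subscriber.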
The Dispatcher receives an instance of the Event class, along with the requisite routing configuration context, from the Message Shaper. For increased security, routing configuration contains encrypted information about where to find the necessary secret values required for authorization to external systems. A secondary system handles retrieving, using, and destroying secret information when requests are eventually dispatched. After the request is properly built and the Event object translated into the output format of the consuming system, the Event is sent to the proper set of subscribers within the context of the originating system’s routing configuration. Depending on the synchronous or asynchronous nature of the event passed from the router (also part of the routing directives), BONS’ persistent representation of the Event’s state will be updated.
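As a sketch of the dispatch flow, with the secret store and HTTP client injected as stand-ins so the example stays self-contained (status codes and field names are assumptions):

```python
def dispatch(event, context, resolve_secret, send):
    """Send a shaped event to its subscriber, resolving secrets just-in-time.

    `resolve_secret` and `send` are injected stand-ins for a secret store
    and an HTTP client.
    """
    token = resolve_secret(context["auth"])        # fetch the real credential
    status = send(context["destination"], event, token)
    del token                                      # discard immediately after use
    if context["mode"] == "sync":
        # Synchronous events resolve on the spot.
        return "SUCCEEDED" if status == 200 else "FAILED"
    # Asynchronous events only confirm receipt; completion arrives later.
    return "ACKNOWLEDGED" if status in (200, 202) else "FAILED"

# Stubbed collaborators for illustration:
state = dispatch(
    {"eventId": "evt-1"},
    {"auth": "secret-ref://partner-a/webhook-token",
     "destination": "https://partner.example.com/webhooks",
     "mode": "async"},
    resolve_secret=lambda ref: "s3cr3t",
    send=lambda url, event, token: 202,
)
```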
Event Management and Recovery
Events have a state diagram that permits only certain transitions. Interpreting those states and enforcing their transition rules is the job of the Manager. BONS simultaneously supports synchronous and asynchronous operation based on event-client configuration pairs. In the first case (synchronous operation), the Event’s state is transitioned immediately upon receipt of response data from the Dispatcher’s external systems request. In the second case, we must curate the behavior of mid-process asynchronous Events. After a message is dispatched, it may be immediately transitioned to the ACKNOWLEDGED or FAILED state, depending on the synchronous ACK of the external system. For acknowledged events, BONS will hold the Event open as it waits for confirmation that external downstream processing has begun or that, again, there has been a failure. In the positive case, BONS will wait for confirmation from the external consumer that processing has completed — again with the option of SUCCEEDED or FAILED. If things do not go as planned at any point, the Event may be selected for retry by the Manager. Once retries are exhausted, the Event will be in an unrecoverable state, causing its inclusion in various monitoring dashboards and reports for investigation.
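The transition rules above can be sketched as a simple table; the exact set of states and edges here is an assumption inferred from the state names in this post:

```python
# Minimal sketch of the transitions implied by the states named above.
ALLOWED_TRANSITIONS = {
    "CREATED":       {"DISPATCHED"},
    "DISPATCHED":    {"ACKNOWLEDGED", "FAILED"},
    "ACKNOWLEDGED":  {"SUCCEEDED", "FAILED"},
    "FAILED":        {"DISPATCHED", "UNRECOVERABLE"},  # retry, or give up
    "SUCCEEDED":     set(),                            # terminal
    "UNRECOVERABLE": set(),  # terminal; surfaces in monitoring dashboards
}

def transition(current, nxt):
    """Enforce the Manager's rule that only certain transitions are legal."""
    if nxt not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current} -> {nxt}")
    return nxt
```

Encoding the rules as data makes illegal transitions impossible to record, which is what gives the audit trail its value.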
The development process for the underlying systems took nearly five months to bring to a productionizable state. In that time, we were lucky to have fantastic development partners and willing testers internally at Blend and externally at superbank B. Simultaneously, we were able to develop the individual features required for superbank B as reusable, generic configurations for the BONS system, rather than one-off custom integrations.
In May of 2017, over a very large quantity of pizza and several on-site days with superbank B, we rolled the system into our beta environments for user acceptance testing. We spent several more weeks making iterative changes to the system — the result of heavy QA — to improve ease of monitoring, reliability, and auditability. Finally, in the wee hours of an otherwise normal day in late May, again over copious amounts of pizza, we deployed the entire system to production. Blend and superbank B developers waited in anticipation-laden silence for the first production Events to move through the system. After roughly three very long minutes, the data began to flow: database logs started logging, dashboards sprang to life, and the system was live!
How did it work?
The results of our months of work were more than we could have hoped for. The most obvious benefit and the impetus for our original design — the ability to scale at a better ratio than one developer per system — was almost immediately realized. Instead of managing a single deployment, a software engineer on the Integrations team could now handle five to ten rollouts simultaneously using pre-built BONS configurations. More importantly, the time to go live for previously complex integrations decreased from a scale of quarters to a scale of weeks. Our team had increased capacity and velocity. There were other downstream benefits, as well.
Firstly, the documentation created in partnership with superbank B around BONS configurations was generalized for use by Blend’s Sales, Professional Services, and Account Management teams, allowing them to provide customers with up-to-date, a la carte options for integrations solutions that had defined implementation processes and timelines. Providing a confident timeline and resource estimations for customers meant happier customers.
Secondly, having centralized and standardized documentation for core integration paths meant easier internal conversations between technical and non-technical departments about rollout timelines and effort-of-work estimations. Creating a common language around integrations enabled Blend’s customer-facing teams to speak with one voice.
Finally, the event-driven nature of BONS meant that understanding where problems occurred in the often complex flow of data from borrower to lender — and beyond — was easier than ever; well-defined event transition boundaries meant that the time required for software engineers to investigate issues decreased markedly. More importantly, by providing step-by-step process instructions and troubleshooting guides, Blend was able to empower non-technical team members to troubleshoot perceived issues before escalating to software engineers. This enabled a faster on-the-ground response for our customers and quicker time to resolution when software engineers did need to be involved, thanks to detailed problem reports generated by the system.
What is next?
At Blend, we believe that just because something works doesn’t mean it can’t be improved! To that end, we’ve got big plans for BONS. A few current projects include:
Task Graphs for Asynchronous Events
Many of Blend’s integration tasks are asynchronous in nature. At present, an inbound event may be mapped to only a single consequence within a tenant/instance context; that is, the origin node of the task graph for our current system has, at most, one downstream node. To further enhance our processing abilities, we’re working on a system that will allow for events to dynamically generate instances of pre-configured task graphs (sequences of integration tasks with certain completion dependencies). This will pave the way for Blend to be able to quickly and easily provide our clients with even more comprehensive and complex integration services, while not increasing time to launch or maintenance costs.
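A pre-configured task graph is, at its core, a dependency ordering problem. Here is a minimal sketch using Python’s standard-library `graphlib`; the task names are invented for illustration:

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# A hypothetical task graph: each task maps to the set of tasks that must
# complete before it may run.
TASK_GRAPH = {
    "notify_partner":  set(),
    "fetch_documents": {"notify_partner"},
    "update_crm":      {"notify_partner"},
    "send_summary":    {"fetch_documents", "update_crm"},
}

# An inbound event would instantiate a copy of the graph and execute tasks
# in dependency order (in parallel where dependencies allow).
order = list(TopologicalSorter(TASK_GRAPH).static_order())
```

The origin node having only one downstream node today corresponds to a graph with a single edge; generalizing to arbitrary graphs like the one above is what unlocks multi-step integrations from a single event.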
Declarative Configuration for Non-Engineers
To allow better interaction with the BONS system for non-technical users, we plan to present an internal user interface that will assist in the visualization of event processing graphs for other departments within Blend. In addition, this will address one of our largest opportunities for increasing efficiency: the ability of non-engineering users to declaratively configure integrations with little to no engineering support.
Customizable Retry Strategies
At present, BONS uses a single retry strategy for all event types. Our team plans to incorporate customizable retry strategies in the future. This will enable the system to more quickly stop pointless retries on non-recoverable error classes and alert a human, further decreasing time to resolution for system issues.
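Per-error-class policies might look something like the following sketch; the error classes, retry counts, and delays are all hypothetical:

```python
# Hypothetical per-error-class retry policies replacing a single global one.
RETRY_POLICIES = {
    "timeout":       {"max_retries": 5, "delays": [1, 2, 4, 8, 16]},  # transient
    "rate_limited":  {"max_retries": 3, "delays": [60, 120, 240]},
    "auth_rejected": {"max_retries": 0, "delays": []},  # non-recoverable: alert a human
}

def next_retry(error_class, attempt):
    """Return the delay (seconds) before the next retry, or None to stop."""
    policy = RETRY_POLICIES.get(error_class, {"max_retries": 0, "delays": []})
    if attempt >= policy["max_retries"]:
        return None  # exhausted or non-recoverable: mark the event and page someone
    return policy["delays"][attempt]
```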
As the number of systems that our team touches increases, so, too, will the need for ever-more transparent mechanisms for managing inter- and intra-system communications. Just as in real-world bridge building, leveraging new technologies and design patterns will continue to push the limits of efficiency and effectiveness for Blend’s engineers and customers. Thanks to our growth, our amazing team, and our adventurous, pioneering customers, Blend is uniquely positioned to continue building bridges for years to come.
If you’re interested in helping us build virtual bridges and teach machines to work together, if you love integrations engineering, or if you just absolutely love the idea of Blend and our quest to transform lending through technology, let’s chat.