How to unlock your system’s scalability and maintainability for the demands of modern business.
We’re living in a world where everything is becoming distributed. Data, systems, teams—they’ve been getting more and more distributed for a number of years now.
About a decade ago, the microservices revolution and the advent of the cloud shook up enterprise IT architectures for good. Nowadays, GraphQL is all the rage and making waves in enterprises, in particular thanks to Apollo’s supergraph and its ability to unify data and services in an organization-wide composition layer.
However, in one area of work that my company does, I’ve noticed what could be summed up as follows: GraphQL is the torch shining a light on bad architecture.
What do I mean by this?
Ultimately, building a successful supergraph that can query your entire organization as a graph requires following what I like to call the three commandments of software architecture.
The reason for this is that it’s much easier to put individual domains with clear responsibilities onto a supergraph than to do the same with a legacy monolith in which all the concerns are mixed together.
Therefore, whether you’re looking to adopt GraphQL at your company, or just want to improve your codebase’s maintainability and scalability, you will want to refactor highly intertwined code into smaller and more manageable modules—a process also known as decoupling.
Life after decoupling and the scaling paradox
Let’s imagine we have a growing monolith. This is usually the case with enterprise systems, which is understandable because they didn’t start yesterday. As time goes on, the monolith grows out of all proportion and becomes increasingly difficult to maintain.
At the extreme, it gets to a point where nobody wants to touch it any more, for fear of breaking something. Meanwhile, the business requires ever more features to be delivered quickly. Invariably, frustrations grow and something needs to be done.
Thus, we embark on a journey to split the monolith apart into many specialized modules. Once this is done, the system becomes much more maintainable, as the modules can be developed, tested, and reasoned about independently of one another. We can finally work with our system again. Yay!
As the decoupled system grows further, a new frustration comes along. It is now so big that we need a number of teams to operate in parallel and go through the software development life cycle independently, so as to keep up with the demand for speed of delivery. This means that each team in charge of a particular domain/module requires its own runtime to deploy to.
Consequently, we now face the challenge of creating a distributed system, where modules have to talk to each other across deployment runtimes. Some of the challenges that we will encounter are:
- How do we test all the changes being made in a safe manner?
- How do we make sure the runtimes can depend on each other not to break?
- How do we make sure the systems are performant and don’t deadlock one another?
- How do we bring the data they all produce together for a single consumer?
This is the paradox of scaling software systems. We reduced the complexity of a large monolith through decoupling, only to run into another set of complexities that come from distribution.
What can we do about it then?
Event-driven architecture to the rescue
Luckily, there’s no need to reinvent the wheel to meet the distribution challenge, as there are tried and tested industry practices in place. In my experience, both as a consultant and as the CEO of a high-end software development company, event-driven architecture (EDA) is the preferred approach to building state-of-the-art distributed systems.
EDA’s fundamental concept is that of events, which can be defined as changes in state that are meaningful to the business. Examples could be “item added to basket” or “item purchased”. The occurrence of an event triggers a reaction in the system, e.g. when a user checks out, their credit card will be charged.
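The idea can be sketched in a few lines of TypeScript. The event shapes and the `chargeCard` function below are hypothetical names invented for this illustration, not part of any particular framework:

```typescript
// A minimal sketch of business events and a reaction to one of them.
// All names (CheckoutCompleted, chargeCard, etc.) are illustrative.
type ItemAddedToBasket = { type: "ItemAddedToBasket"; sku: string };
type CheckoutCompleted = {
  type: "CheckoutCompleted";
  cardToken: string;
  amountCents: number;
};
type DomainEvent = ItemAddedToBasket | CheckoutCompleted;

const chargedCards: string[] = [];

// Hypothetical payment call; a real system would hit a payment gateway here.
function chargeCard(cardToken: string, amountCents: number): void {
  chargedCards.push(`${cardToken}:${amountCents}`);
}

// The reaction: the occurrence of an event triggers behavior in the system.
function handle(event: DomainEvent): void {
  if (event.type === "CheckoutCompleted") {
    chargeCard(event.cardToken, event.amountCents);
  }
}

handle({ type: "ItemAddedToBasket", sku: "book-42" });
handle({ type: "CheckoutCompleted", cardToken: "tok_abc", amountCents: 1999 });
```

Note that the code adding items to the basket knows nothing about payments; the reaction lives entirely on the handling side, which is what keeps the modules independent.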
Notably, EDA uses asynchronous communication, which allows for rapid, real-time, reliable communication between multiple independent modules of our distributed system by means of a message broker. As a result, we can design a process so that, if one service is temporarily unavailable, dependent processes don’t fail but continue according to defined policies, such as retrying or queuing messages until the service recovers.
All in all, event-driven architecture brings enormous benefits to the table: scalability, reliability, and the ability to handle high throughput with low latency. And when I say it’s a leading industry practice that’s gaining more and more popularity, you don’t have to take my word for it.
According to a 2021 survey of 840 professionals across 9 countries, 23% of businesses boast a robust event distribution ecosystem, while 13% have achieved EDA maturity. Further, 19% have a central team that promotes EDA across their organizations, 18% are experimenting with EDA in multiple use cases, and 7% are planning to do so.
Only 8% of the respondents had no plans to look into building event-driven systems, and I think it’s safe to assume this number has fallen further by now.
If you’re looking to catch up with this industry-wide trend then I have one piece of advice for you: consider event sourcing.
Event sourcing: the future-proof design pattern
Essentially, event sourcing is event-driven architecture on steroids. Or, you can think of it this way—by applying event sourcing you can get EDA as a side effect, for free!
How’s that? Before I explain, let’s first look into the basics of event sourcing.
Most apps need to store data in some fashion. When storing data in a traditional database, the focus is on the data values, i.e. the current state of the database. The point of event sourcing is that events become the source of truth for our system, so instead of a static view of data, we keep a record of state changes (events). Say goodbye to traditional databases, and welcome event stores.
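To make the contrast concrete, here is a toy in-memory event store, a sketch rather than a production design. Events are appended to a stream and never mutated; current state is always derived from them:

```typescript
// A toy in-memory event store: the append-only event log is the source
// of truth, and state is derived by reading it back. Names are illustrative.
type StoredEvent = { type: string; data: unknown; timestamp: number };

class EventStore {
  private streams = new Map<string, StoredEvent[]>();

  // Append-only: we record state changes, we never update rows in place.
  append(streamId: string, event: StoredEvent): void {
    const stream = this.streams.get(streamId) ?? [];
    stream.push(event);
    this.streams.set(streamId, stream);
  }

  readStream(streamId: string): StoredEvent[] {
    return this.streams.get(streamId) ?? [];
  }
}

const store = new EventStore();
store.append("cart-1", { type: "ItemAdded", data: { sku: "book" }, timestamp: 1 });
store.append("cart-1", { type: "ItemRemoved", data: { sku: "book" }, timestamp: 2 });
store.append("cart-1", { type: "ItemAdded", data: { sku: "pen" }, timestamp: 3 });
```

In a traditional table, the first two events would cancel out and the history would be lost; in the event store, every state change stays on record.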
Now, I know you might be thinking: “Why would I do that?” The short answer is: because event sourcing grants unique superpowers.
Since data is stored as a series of events (state changes), querying the system means playing back the subset of events relevant to that query. We call this “projecting events,” and the result of such an operation is a “projection.”
Now, consider the following use case. As an e-commerce company, you’d like to know how many times users added items to and removed items from the cart before they finally bought something. You simply can’t get that information from a traditional database on the spot. You’d have to build appropriate reporting first and only then start collecting the data from that point onwards.
In an event-sourced system, you just need to run a projection to obtain such information, which is a matter of minutes. You can generate such a report for any period of time. By making events the source of truth in your system, you effectively gain time-machine capabilities when it comes to reporting and data analysis.
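A projection for exactly this use case can be sketched as a function that replays events for a given time window and counts the adds and removes. The event shapes are assumptions for this example:

```typescript
// Sketch of a projection: replay the events in a time window and count
// how often items were added and removed before purchase.
type CartEvent =
  | { type: "ItemAdded"; at: number }
  | { type: "ItemRemoved"; at: number }
  | { type: "ItemsPurchased"; at: number };

function projectCartChurn(events: CartEvent[], from: number, to: number) {
  // Only events inside the requested period feed the projection,
  // which is what gives reporting its time-machine quality.
  const inWindow = events.filter((e) => e.at >= from && e.at <= to);
  return {
    added: inWindow.filter((e) => e.type === "ItemAdded").length,
    removed: inWindow.filter((e) => e.type === "ItemRemoved").length,
    purchases: inWindow.filter((e) => e.type === "ItemsPurchased").length,
  };
}

const history: CartEvent[] = [
  { type: "ItemAdded", at: 1 },
  { type: "ItemRemoved", at: 2 },
  { type: "ItemAdded", at: 3 },
  { type: "ItemsPurchased", at: 4 },
];

const report = projectCartChurn(history, 1, 4);
```

Because the raw events are never thrown away, the same function can answer the question for last week, last year, or any window you choose, even though nobody anticipated the report when the events were recorded.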
The great benefit of this is unprecedented business intelligence. However, the flexibility in handling data granted by event sourcing isn’t limited to internal use cases.
Let’s say there’s new demand for data coming from the frontend. In a traditional database setup, this may require changes in the system to provide a new form of data. Meanwhile, in an event-sourced system it’s very easy to meet this demand by simply replaying events and creating a new read model to suit every purpose.
This has enormous consequences for customer experience. Event sourcing allows you to meet new demand for data far more easily and quickly, greatly facilitating the best practice of demand-oriented GraphQL schema design. Just think of the time and costs saved as a result.
Previously, I mentioned that with event sourcing you can get event-driven architecture as a side effect. This is because if you’re already storing events then it’s just a matter of publishing them so as to use them as a means of communication between distributed modules. There you go, event-driven architecture on steroids!
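The “for free” part can be sketched like this. The broker below is a toy in-memory pub/sub standing in for something like Kafka or RabbitMQ, and all the names are illustrative:

```typescript
// Sketch: once events are already being stored, event-driven integration
// is just a matter of publishing each appended event to a broker.
type BusEvent = { type: string; payload: unknown };
type Handler = (e: BusEvent) => void;

// Toy in-memory broker; in production this would be Kafka, RabbitMQ, etc.
class Broker {
  private subscribers: Handler[] = [];
  subscribe(h: Handler): void {
    this.subscribers.push(h);
  }
  publish(e: BusEvent): void {
    this.subscribers.forEach((h) => h(e));
  }
}

class PublishingEventStore {
  private events: BusEvent[] = [];
  constructor(private broker: Broker) {}

  append(e: BusEvent): void {
    this.events.push(e); // the stored log remains the source of truth
    this.broker.publish(e); // and publishing it yields EDA as a side effect
  }
}

const broker = new Broker();
const received: BusEvent[] = [];

// Another module reacts without any direct coupling to the writer.
broker.subscribe((e) => received.push(e));

const publishingStore = new PublishingEventStore(broker);
publishingStore.append({ type: "ItemPurchased", payload: { sku: "book" } });
```

The module doing the writing and the module reacting share nothing but the broker and the event shape, which is exactly the decoupling a distributed system needs.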
Let go of database-driven development
Nevertheless, with great power comes great responsibility, and it’s true that event sourcing comes with a learning curve. The problem is that most developers are so focused on databases and storage mechanisms that they default to thinking in terms of current state.
You can hardly blame people, as that’s usually how they learned to code in the first place. It’s also true that storing data as events introduces complexity that doesn’t exist with current-state tables.
I strongly believe though that embracing this complexity is a negligible cost when weighed against the great powers acquired in return. Let’s sum some of them up:
- Event sourcing is built for integration from the get-go, which makes it an ideal pattern for modern distributed systems.
- Event sourcing allows for flexible shaping of data, making it a breeze to meet the demand for data from clients.
- Event sourcing projections enable outstanding business intelligence capabilities with the possibility to query systems for any given time period.
And that’s actually just the tip of the iceberg. A 2021 qualitative study of 25 engineers involved in building event-sourced systems found that event sourcing helps decrease the complexity of large systems, making them easier to develop and maintain. Further, thanks to storing state changes in an event store, event-sourced systems boast improved reliability, since engineers can replay the events in a projection after a system failure and fix what went wrong. Last but not least, the high scalability of event-sourced systems enables them to serve large numbers of end users.
Where to start
As I said, event sourcing does come with a learning curve. That’s why at Xolvio, the digital enablement company that helps engineering organizations scale deliberately, we put together a learning repository that you can use as a starting point for building an event-sourced system in TypeScript.
On a final note, if you’re looking to transform your software delivery for the age of distributed systems, then I’m sure you’ll be interested to read a little about Deliberate Scaling—Xolvio’s proprietary approach to modernizing enterprise IT architecture and delivery in a stepwise fashion for maximum scalability and code quality.
Let me know if you have any questions or thoughts in the comments below.
Let us help you on your journey to Quality Faster
We at Xolvio specialize in helping our clients get more for less. We can get you to the holy grail of continuous deployment where every commit can go to production — and yes, even for large enterprises.
Feel free to schedule a call or send us a message below to see how we can help.