Systems design is the process of defining the architecture, modules, interfaces, and data for a system to satisfy specified requirements. Systems design could be seen as the application of systems theory to product development.
CQRS has been around for a long time, but if you’re not familiar with it, it’s new to you. Take a look at a quick introduction to what it is and how it works.
CQRS, Command Query Responsibility Segregation, is a method for optimizing writes to databases (queries) and reads from them (commands). Nowadays, many companies work with one large database. But these databases weren’t originally built to scale. When they were planned in the 1990s, there wasn’t so much data that needed to be consumed quickly.
In the age of Big Data, many databases can’t handle the growing number of complex reads and writes, resulting in errors, bottlenecks, and slow customer service. This is something DevOps engineers need to find a solution for.
Take a ride-sharing service like Uber or Lyft (just as an example; this is not an explanation of how these companies work). Traditionally (before CQRS), the client app queries the service database for drivers, whose profiles they see on their screens. At the same time, drivers can send commands to the service database to update their profile. The service database needs to be able to crosscheck queries for drivers, user locations, and driver commands about their profile and send the cache to client apps and drivers. These kinds of queries can put a strain on the database.
CQRS to the rescue. CQRS dictates the segregation of complex queries and commands. Instead of all queries and commands going to the database by the same code, the queries and data manipulation routines are simplified by adding a process in between for the data processing. This reduces stress on the database by enabling easier reading and writing cache data from the database.
A manifestation of such a segregation would be:
Pushing the commands into a robust and scalable queue platform like Apache Kafka and storing the raw commands in a data warehouse software like AWS Redshift or HP Vertica.
Creating an aggregated cache table where data is queried from and displayed to the users.
The magic happens in the streaming part. Advanced calculations based on Google’s MapReduce technology enable quick and advanced distributed calculations across many machines. As a result, large amounts of data are quickly processed, and the right data gets to the right users, quickly and on time.
CQRS can be used by any service that is based on fast-growing data, whether user data (i.e. user profiles) or machine data (i.e. monitoring metrics). If you want to learn more about CQRS, check out this article.
One of the most common, and efficient, models deployed by enterprises is the Layered Architecture, also called the n-tiered pattern. It packs similar components together in a horizontal manner and is self-independent. What does that mean?
It implies that the layers of the model are interconnected to each other but not interdependent. Similar components stay at the same level allowing the layers to be separated inadvertently based on the nature of the code. It is this isolation, that lends the software layers an independent nature.
Consider an instance, wherein you’d want to switch from an Oracle database to an SQL. This shift may cause you to upend the database layer but will not have a domino effect on any other layer.
Evidently, it serves a challenge for an enterprise software architect to create layers that separate from each other. Nevertheless, since the roles of each layer are clearly distinct, it accredits this software development architecture the following qualities:
It is easily maintainable as enterprise software developers with limited, or should we say pertinent, knowledge can be assigned to operate on a single layer.
You can test changes in layers separately from each other.
Upgraded versions of the software can be implemented effortlessly.
The flow of code is top-down, meaning it enters the presentation layer first and trickles down to the bottom-most layer that is the database layer. Each layer has a designated task based on the nature of the components it preserves. These could be checking the consistency of values within the code or reformatting the code altogether.
Refactoring – a key – way to lower frontend maintenance cost is a software development process by which developers change the internal shape and size of the code. They do it without affecting its external attributes can also be carried out in an n-tiered model.
This software development architecture can be customized to add layers to the presentation, business, persistence, and database levels. Such a model is called a Hybrid Layered architecture.
Amongst the various types of software architecture, the layered variant suits enterprises that don’t want to go overboard with experimenting and want to stick to the traditional software architecture design patterns.
Testing components become relatively easier as inter-dependencies are negligible in this format of software development engineering.
Considering many software frameworks were built with the backdrop of an n-tiered structure, applications built with them, as a result, happen to be in the layered format as well.
Larger applications tend to be resource-intensive if based on this format, therefore for such projects, it is advised to overlook the layered pattern.
Although the layers are independent, yet the entire version of the software is installed as a single unit. Therefore, even if you update a single layer, you would have to re-install the entire apparatus all over again.
Such systems are not scalable due to the coupling between the layers.
The Layered architecture pattern suits the niche of LOB i.e. Line of Business Applications. These are applications that are essential to the functioning of the business itself. For instance, the accounts department of an organization needs software such as QuickBooks, Xero, Sage or Wave Accounting for keeping financial data.
Similarly, the marketing team would demand a customer relationship management software slash tool to help them cope with the volume of interactions. In short, applications that do more than just CRUD (create, read, update, and delete) operations are suited to the layered architecture pattern.
After listening how the community has interpreted Command-Query Responsibility Segregation I think that the time has come for some clarification. Some have been tying it together to Event Sourcing. Most have been overlaying their previous layered architecture assumptions on it. Here I hope to identify CQRS itself, and describe in which places it can connect to other patterns. This is quite a long post. Why CQRS Before describing the details of CQRS we need to understand the two main driving forces behind it: collaboration and staleness. Collaboration refers to circumstances under which multiple actors will be using/modifying the same set of data – whether or not the intention of the actors is actually to collaborate with each other. There are often rules which indicate which user can perform which kind of modification and modifications that may have been acceptable in one case may not be acceptable in others. We’ll give some examples shortly. Actors can be human like normal users, or automated like software.Staleness refers to the fact that in a collaborative environment, once data has been shown to a user, that same data may have been changed by another actor – it is stale. Almost any system which makes use of a cache is serving stale data – often for performance reasons. What this means is that we cannot entirely trust our users decisions, as they could have been made based on out-of-date information.Standard layered architectures don’t explicitly deal with either of these issues. While putting everything in the same database may be one step in the direction of handling collaboration, staleness is usually exacerbated in those architectures by the use of caches as a performance-improving afterthought.A picture for referenceI’ve given some talks about CQRS using this diagram to explain it:The boxes named AC are Autonomous Components. We’ll describe what makes them autonomous when discussing commands. But before we go into the complicated parts, let’s start with queries:QueriesIf the data we’re going to be showing users is stale anyway, is it really necessary to go to the master database and get it from there? Why transform those 3rd normal form structures to domain objects if we just want data – not any rule-preserving behaviors? Why transform those domain objects to DTOs to transfer them across a wire, and who said that wire has to be exactly there? Why transform those DTOs to view model objects?In short, it looks like we’re doing a heck of a lot of unnecessary work based on the assumption that reusing code that has already been written will be easier than just solving the problem at hand. Let’s try a different approach:How about we create an additional data store whose data can be a bit out of sync with the master database – I mean, the data we’re showing the user is stale anyway, so why not reflect in the data store itself. We’ll come up with an approach later to keep this data store more or less in sync.Now, what would be the correct structure for this data store? How about just like the view model? One table for each view. Then our client could simply SELECT * FROM MyViewTable (or possibly pass in an ID in a where clause), and bind the result to the screen. That would be just as simple as can be. You could wrap that up with a thin facade if you feel the need, or with stored procedures, or using AutoMapper which can simply map from a data reader to your view model class. The thing is that the view model structures are already wire-friendly, so you don’t need to transform them to anything else.You could even consider taking that data store and putting it in your web tier. It’s just as secure as an in-memory cache in your web tier. Give your web servers SELECT only permissions on those tables and you should be fine.Query Data StorageWhile you can use a regular database as your query data store it isn’t the only option. Consider that the query schema is in essence identical to your view model. You don’t have any relationships between your various view model classes, so you shouldn’t need any relationships between the tables in the query data store.So do you actually need a relational database?The answer is no, but for all practical purposes and due to organizational inertia, it is probably your best choice (for now).Scaling QueriesSince your queries are now being performed off of a separate data store than your master database, and there is no assumption that the data that’s being served is 100% up to date, you can easily add more instances of these stores without worrying that they don’t contain the exact same data. The same mechanism that updates one instance can be used for many instances, as we’ll see later.This gives you cheap horizontal scaling for your queries. Also, since your not doing nearly as much transformation, the latency per query goes down as well. Simple code is fast code.Data modificationsSince our users are making decisions based on stale data, we need to be more discerning about which things we let through. Here’s a scenario explaining why:Let’s say we have a customer service representative who is one the phone with a customer. This user is looking at the customer’s details on the screen and wants to make them a ‘preferred’ customer, as well as modifying their address, changing their title from Ms to Mrs, changing their last name, and indicating that they’re now married. What the user doesn’t know is that after opening the screen, an event arrived from the billing department indicating that this same customer doesn’t pay their bills – they’re delinquent. At this point, our user submits their changes.Should we accept their changes?Well, we should accept some of them, but not the change to ‘preferred’, since the customer is delinquent. But writing those kinds of checks is a pain – we need to do a diff on the data, infer what the changes mean, which ones are related to each other (name change, title change) and which are separate, identify which data to check against – not just compared to the data the user retrieved, but compared to the current state in the database, and then reject or accept.Unfortunately for our users, we tend to reject the whole thing if any part of it is off. At that point, our users have to refresh their screen to get the up-to-date data, and retype in all the previous changes, hoping that this time we won’t yell at them because of an optimistic concurrency conflict.As we get larger entities with more fields on them, we also get more actors working with those same entities, and the higher the likelihood that something will touch some attribute of them at any given time, increasing the number of concurrency conflicts.If only there was some way for our users to provide us with the right level of granularity and intent when modifying data. That’s what commands are all about.CommandsA core element of CQRS is rethinking the design of the user interface to enable us to capture our users’ intent such that making a customer preferred is a different unit of work for the user than indicating that the customer has moved or that they’ve gotten married. Using an Excel-like UI for data changes doesn’t capture intent, as we saw above.We could even consider allowing our users to submit a new command even before they’ve received confirmation on the previous one. We could have a little widget on the side showing the user their pending commands, checking them off asynchronously as we receive confirmation from the server, or marking them with an X if they fail. The user could then double-click that failed task to find information about what happened.Note that the client sends commands to the server – it doesn’t publish them. Publishing is reserved for events which state a fact – that something has happened, and that the publisher has no concern about what receivers of that event do with it.Commands and ValidationIn thinking through what could make a command fail, one topic that comes up is validation. Validation is different from business rules in that it states a context-independent fact about a command. Either a command is valid, or it isn’t. Business rules on the other hand are context dependent.In the example we saw before, the data our customer service rep submitted was valid, it was only due to the billing event arriving earlier which required the command to be rejected. Had that billing event not arrived, the data would have been accepted.Even though a command may be valid, there still may be reasons to reject it.As such, validation can be performed on the client, checking that all fields required for that command are there, number and date ranges are OK, that kind of thing. The server would still validate all commands that arrive, not trusting clients to do the validation.Rethinking UIs and commands in light of validationThe client can make of the query data store when validating commands. For example, before submitting a command that the customer has moved, we can check that the street name exists in the query data store.At that point, we may rethink the UI and have an auto-completing text box for the street name, thus ensuring that the street name we’ll pass in the command will be valid. But why not take things a step further? Why not pass in the street ID instead of its name? Have the command represent the street not as a string, but as an ID (int, guid, whatever).On the server side, the only reason that such a command would fail would be due to concurrency – that someone had deleted that street and that that hadn’t been reflected in the query store yet; a fairly exceptional set of circumstances.Reasons valid commands fail and what to do about itSo we’ve got a well-behaved client that is sending valid commands, yet the server still decides to reject them. Often the circumstances for the rejection are related to other actors changing state relevant to the processing of that command.In the CRM example above, it is only because the billing event arrived first. But “first” could be a millisecond before our command. What if our user pressed the button a millisecond earlier? Should that actually change the business outcome? Shouldn’t we expect our system to behave the same when observed from the outside?So, if the billing event arrived second, shouldn’t that revert preferred customers to regular ones? Not only that, but shouldn’t the customer be notified of this, like by sending them an email? In which case, why not have this be the behavior for the case where the billing event arrives first? And if we’ve already got a notification model set up, do we really need to return an error to the customer service rep? I mean, it’s not like they can do anything about it other than notifying the customer.So, if we’re not returning errors to the client (who is already sending us valid commands), maybe all we need to do on the client when sending a command is to tell the user “thank you, you will receive confirmation via email shortly”. We don’t even need the UI widget showing pending commands.Commands and AutonomyWhat we see is that in this model, commands don’t need to be processed immediately – they can be queued. How fast they get processed is a question of Service-Level Agreement (SLA) and not architecturally significant. This is one of the things that makes that node that processes commands autonomous from a runtime perspective – we don’t require an always-on connection to the client.Also, we shouldn’t need to access the query store to process commands – any state that is needed should be managed by the autonomous component – that’s part of the meaning of autonomy.Another part is the issue of failed message processing due to the database being down or hitting a deadlock. There is no reason that such errors should be returned to the client – we can just rollback and try again. When an administrator brings the database back up, all the message waiting in the queue will then be processed successfully and our users receive confirmation.The system as a whole is quite a bit more robust to any error conditions.Also, since we don’t have queries going through this database any more, the database itself is able to keep more rows/pages in memory which serve commands, improving performance. When both commands and queries were being served off of the same tables, the database server was always juggling rows between the two.Autonomous ComponentsWhile in the picture above we see all commands going to the same AC, we could logically have each command processed by a different AC, each with it’s own queue. That would give us visibility into which queue was the longest, letting us see very easily which part of the system was the bottleneck. While this is interesting for developers, it is critical for system administrators.Since commands wait in queues, we can now add more processing nodes behind those queues (using the distributor with NServiceBus) so that we’re only scaling the part of the system that’s slow. No need to waste servers on any other requests.Service LayersOur command processing objects in the various autonomous components actually make up our service layer. The reason you don’t see this layer explicitly represented in CQRS is that it isn’t really there, at least not as an identifiable logical collection of related objects – here’s why:In the layered architecture (AKA 3-Tier) approach, there is no statement about dependencies between objects within a layer, or rather it is implied to be allowed. However, when taking a command-oriented view on the service layer, what we see are objects handling different types of commands. Each command is independent of the other, so why should we allow the objects which handle them to depend on each other?Dependencies are things which should be avoided, unless there is good reason for them.Keeping the command handling objects independent of each other will allow us to more easily version our system, one command at a time, not needing even to bring down the entire system, given that the new version is backwards compatible with the previous one.Therefore, keep each command handler in its own VS project, or possibly even in its own solution, thus guiding developers away from introducing dependencies in the name of reuse (it’s a fallacy). If you do decide as a deployment concern, that you want to put them all in the same process feeding off of the same queue, you can ILMerge those assemblies and host them together, but understand that you will be undoing much of the benefits of your autonomous components.Whither the domain model?Although in the diagram above you can see the domain model beside the command-processing autonomous components, it’s actually an implementation detail. There is nothing that states that all commands must be processed by the same domain model. Arguably, you could have some commands be processed by transaction script, others using table module (AKA active record), as well as those using the domain model. Event-sourcing is another possible implementation.Another thing to understand about the domain model is that it now isn’t used to serve queries. So the question is, why do you need to have so many relationships between entities in your domain model?(You may want to take a second to let that sink in.)Do we really need a collection of orders on the customer entity? In what command would we need to navigate that collection? In fact, what kind of command would need any one-to-many relationship? And if that’s the case for one-to-many, many-to-many would definitely be out as well. I mean, most commands only contain one or two IDs in them anyway.Any aggregate operations that may have been calculated by looping over child entities could be pre-calculated and stored as properties on the parent entity. Following this process across all the entities in our domain would result in isolated entities needing nothing more than a couple of properties for the IDs of their related entities – “children” holding the parent ID, like in databases.In this form, commands could be entirely processed by a single entity – viola, an aggregate root that is a consistency boundary.Persistence for command processingGiven that the database used for command processing is not used for querying, and that most (if not all) commands contain the IDs of the rows they’re going to affect, do we really need to have a column for every single domain object property? What if we just serialized the domain entity and put it into a single column, and had another column containing the ID? This sounds quite similar to key-value storage that is available in the various cloud providers. In which case, would you really need an object-relational mapper to persist to this kind of storage?You could also pull out an additional property per piece of data where you’d want the “database” to enforce uniqueness.I’m not suggesting that you do this in all cases – rather just trying to get you to rethink some basic assumptions.Let me reiterateHow you process the commands is an implementation detail of CQRS.Keeping the query store in syncAfter the command-processing autonomous component has decided to accept a command, modifying its persistent store as needed, it publishes an event notifying the world about it. This event often is the “past tense” of the command submitted:MakeCustomerPerferredCommand -> CustomerHasBeenMadePerferredEventThe publishing of the event is done transactionally together with the processing of the command and the changes to its database. That way, any kind of failure on commit will result in the event not being sent. This is something that should be handled by default by your message bus, and if you’re using MSMQ as your underlying transport, requires the use of transactional queues.The autonomous component which processes those events and updates the query data store is fairly simple, translating from the event structure to the persistent view model structure. I suggest having an event handler per view model class (AKA per table).Here’s the picture of all the pieces again: Bounded ContextsWhile CQRS touches on many pieces of software architecture, it is still not at the top of the food chain. CQRS if used is employed within a bounded context (DDD) or a business component (SOA) – a cohesive piece of the problem domain. The events published by one BC are subscribed to by other BCs, each updating their query and command data stores as needed.UI’s from the CQRS found in each BC can be “mashed up” in a single application, providing users a single composite view on all parts of the problem domain. Composite UI frameworks are very useful for these cases.SummaryCQRS is about coming up with an appropriate architecture for multi-user collaborative applications. It explicitly takes into account factors like data staleness and volatility and exploits those characteristics for creating simpler and more scalable constructs.One cannot truly enjoy the benefits of CQRS without considering the user-interface, making it capture user intent explicitly. When taking into account client-side validation, command structures may be somewhat adjusted. Thinking through the order in which commands and events are processed can lead to notification patterns which make returning errors unnecessary.While the result of applying CQRS to a given project is a more maintainable and performant code base, this simplicity and scalability require understanding the detailed business requirements and are not the result of any technical “best practice”. If anything, we can see a plethora of approaches to apparently similar problems being used together – data readers and domain models, one-way messaging and synchronous calls.
Quality Assurance practice was relatively simple when we built the Monolith systems with traditional waterfall development models. The Quality Assurance (QA) teams would start the GUI layer’s validation process after waiting for months for the product development. To enhance the testing process, we would have to spend a lot of efforts and $$$ (commercial tools) in automating the GUI via various tools like Microfocus UFT, Selenium, Test Complete, Coded UI, Ranorex etc., and most often these tests are complex to maintain and scale. Thus, most QA teams would have to restrict their automated tests to smoke and partial regression, ending in inadequate test coverage.
With modern technology, the new era tech companies, including Varo, have widely adopted Microservices-based architecture combined with the Agile/ Dev-Ops development model. This opens up a lot of opportunities for Quality Assurance practice, and in my opinion, this was the origin of the transformation from Quality Assurance to “Quality Engineering.”
The Common Pitfall
While automated testing gives us a massive benefit with the three R’s (Repeatable → Run any number of times, Reliable → Run with confidence, Reusable → Develop, and share), it also comes with maintenance costs. I like to quote Grady Boosh’s — “A fool with a tool is still a fool.” Targeting inappropriate areas would not provide us the desired benefit. We should consider several factors to choose the right candidate for automation. Few to name are the lifespan of the product, the volatility of the requirements, the complexity of the tests, business criticality, and technical feasibility.
It’s well known that the cost of a bug increases toward the right of software lifecycle. So it is necessary to implement several walls of defenses to arrest these software bugs as early as possible (Shift-Left Testing paradigm). By implementing an agile development model with a fail-fast mindset, we have taken care of our first wall of defense. But to move faster in this shorter development cycle, we must build robust automated test suites to take care of the rolled out features and make room for testing the new features.
The 3 Layer Architecture
The Varo architecture comprises three essential layers.
The Frontend layer (Web, iOS, Mobile apps) — User experience
The Orchestration layer (GraphQl) — Makes multiple microservices calls and returns the decision and data to the frontend apps
The Microservice layer (gRPC, Kafka, Postgres) — Core business layer
While understanding the microservice architecture for testing, there were several questions posed.
Which layer to test?
What to test in these layers?
Does testing frontend automatically validate downstream service?
Does testing multiple layers introduce redundancies?
We will try to answer these by analyzing the table below, which provides an overview of what these layers mean for quality engineering.
After inferring the table, we have loosely adopted the Testing Pyramid pattern to invest in automated testing as:
Full feature/ functional validations on Microservices layer
Business process validations on Orchestration layer
E2E validations on Frontend layer
The diagram below best represents our test strategy for each layer.
Note: Though we have automated white-box tests such as unit-test and integration-test, we exclude those in this discussion.
Let’s take the example below for illustration to understand best how this Pyramid works.
The user is presented with a form to submit. The form accepts three inputs — Field A to get the user identifier, Field B a drop-down value, and Field C accepts an Integer value (based on a defined range).
Once the user clicks on the Submit button, the GraphQL API calls Microservice A to get what type of customer. Then it calls the next Microservice B to validate the acceptable range of values for Field C (which depends on the values from Field A and Field B).
1. Feature Validations
✓ Positive behavior (Smoke, Functional, System Integration)
Validating behavior with a valid set of data combinations
Validating integration — Impact on upstream/ downstream systems
✓ Negative behavior
Validating with invalid data (for example: Invalid authorization, Disqualified data)
2. Fluent Validations
✓ Evaluating field definitions — such as
Mandatory field (not empty/not null)
Invalid data types (for example: Int → negative value, String → Junk values with special characters or UUID → Invalid UUID formats)
Let’s look at how the “feature validations” can be written for the above use case by applying one of the test case authoring techniques — Boundary Value Analysis.
To test the scenario above, it would require 54 different combinations of feature validations, and below is the rationale to pick the right candidate for each layer.
Microservice Layer: This is the layer delivered first, enabling us to invest in automated testing as early as possible (Shift-Left). And the scope for our automation would be 100% of all the above scenarios.
Orchestration Layer: This layer translates the information from the microservice to frontend layers; we try to select at least two tests (1 positive & 1 negative) for each scenario. The whole objective is to ensure the integration is working as expected.
Frontend Layer: In this layer, we focus on E2E validations, which means these validations would be a part of the complete user journey. But we would ensure that we have at least one or more positive and negative scenarios embedded in those E2E tests. Business priority (frequently used data by the real-time users) helps us to select the best scenario for our E2E validations.
There are always going to be sets of redundant tests across these layers. But that is the trade-off we had to take to ensure that we have correct quality gates on each of these layers. The pros of this approach are that we achieve safe and faster deployments to Production by enabling quicker testing cycles, better test coverage, and risk-free decisions. In addition, having these functional test suites spread across the layers helps us to isolate the failures in respective areas, thus saving us time to troubleshoot an issue.
However, often, not one size fits all. The decision has to be made based on understanding how the software architecture is built and the supporting infrastructure to facilitate the testing efforts. One of the critical success factors for this implementation is building a good quality engineering team with the right skills and proper tools. But that is another story — Coming soon “Quality Engineering: Redefined.”
There is something going on within the front-end community recently. Server-side rendering is getting more and more traction thanks to React and its built-in server-side hydration feature. But it’s not the only solution to deliver a fast experience to the user with a super fast time-to-first-byte (TTFB) score: Pre-rendering is also a pretty good strategy. What’s the difference between these solutions and a fully client-rendered application?
Under a good and reliable internet connection, it’s pretty fast and works well. But it can be a lot better, and it doesn’t have to be difficult to make it that way. That’s what we will see in the following sections.
Server-side Rendering (SSR)
An SSR solution is something we used to do a lot, many years ago, but tend to forget in favor of a client-side rendering solution.
With old server-side rendering solutions, you built a web page—with PHP for example—the server compiled everything, included the data, and delivered a fully populated HTML page to the client. It was fast and effective.
But… every time you navigated to another route, the server had to do the work all over again: Get the PHP file, compile it, and deliver the HTML, with all the CSS and JS delaying the page load to a few hundred ms or even whole seconds.
What if you could do the first page load with the SSR solution, and then use a framework to do dynamic routing with AJAX, fetching only the necessary data?
This is why SSR is getting more and more traction within the community because React popularized this problem with an easy-to-use solution: The RenderToString method.
This new kind of web application is called a universal app or an isomorphic app. There’s still some controversy over the exact meanings of these terms and the relationship between them, but many people use them interchangeably.
Anyway, the advantage of this solution is being able to develop an app server-side and client-side with the same code and deliver a really fast experience to the user with custom data. The disadvantage is that you need to run a server.
SSR is used to fetch data and pre-populate a page with custom content, leveraging the server’s reliable internet connection. That is, the server’s own internet connection is better than that of a user with lie-fi), so it’s able to prefetch and amalgamate data before delivering it to the user.
With the pre-populated data, using an SSR app can also fix an issue that client-rendered apps have with social sharing and the OpenGraph system. For example, if you have only one index.html file to deliver to the client, they will only have one type of metadata—most likely your homepage metadata. This won’t be contextualized when you want to share a different route, so none of your routes will be shown on other sites with their proper user content (description and preview picture) that users would want to share with the world.
The mandatory server for a universal app can be a deterrent for some and may be overkill for a small application. This is why pre-rendering can be a really nice alternative.
I discovered this solution with Preact and its own CLI that allows you to compile all pre-selected routes so you can store a fully populated HTML file to a static server. This lets you deliver a super-fast experience to the user, thanks to the Preact/React hydration function, without the need for Node.js.
The catch is, because this isn’t SSR, you don’t have user-specific data to show at this point—it’s just a static (and somewhat generic) file sent directly on the first request, as-is. So if you have user-specific data, here is where you can integrate a beautifully designed skeleton to show the user their data is coming, to avoid some frustration on their part:
There is another catch: In order for this technique to work, you still need to have a proxy or something to redirect the user to the right file.
With a single-page application, you need to redirect all requests to the root file, and then the framework redirects the user with its built-in routing system. So the first page load is always the same root file.
In order for a pre-rendering solution to work, you need to tell your proxy that some routes need specific files and not always the root index.html file.
For example, say you have four routes (/, /about, /jobs, and blog) and all of them have different layouts. You need four different HTML files to deliver the skeleton to the user that will then let React/Preact/etc. rehydrate it with data. So if you redirect all those routes to the root index.html file, the page will have an unpleasant, glitchy feel during loading, whereby the user will see the skeleton of the wrong page until it finishes loading and replaces the layout. For example, the user might see a homepage skeleton with only one column, when they had asked for a different page with a Pinterest-like gallery.
The solution is to tell your proxy that each of those four routes needs a specific file:
https://my-website.com → Redirect to the root index.html file
https://my-website.com/about → Redirect to the /about/index.html file
https://my-website.com/jobs → Redirect to the /jobs/index.html file
https://my-website.com/blog → Redirect to the /blog/index.html file
This is why this solution can be useful for small applications—you can see how painful it would be if you had a few hundred pages.
Strictly speaking, it’s not mandatory to do it this way—you could just use a static file directly. For example, https://my-website.com/about/ will work without any redirection because it will automatically search for an index.html inside its directory. But you need this proxy if you have param urls—https://my-website.com/profile/guillaume will need to redirect the request to /profile/index.html with its own layout, because profile/guillaume/index.html doesn’t exist and will trigger a 404 error.
In short, there are three basic views at play with the rendering strategies described above: A loading screen, a skeleton, and the full page once it’s finally rendered.
Depending on the strategy, sometimes we use all three of these views, and sometimes we jump straight to a fully-rendered page. Only in one use case are we forced to use a different approach:
Landing (e.g. /)
Static (e.g. /about)
Fixed Dynamic (e.g. /news)
Parameterized Dynamic (e.g. /users/:user-id)
Loading → Full
Loading → Full
Loading → Skeleton → Full
Loading → Skeleton → Full
Skeleton → Full
HTTP 404 (page not found)
Pre-rendered With Proxy
Skeleton → Full
Skeleton → Full
Client-only Rendering is Often Not Enough
Client-rendered applications are something we should avoid now because we can do better for the user. And doing better, in this case, is as easy as the pre-rendering solution. It’s definitely an improvement over client-only rendering and easier to implement than a fully server-side-rendered application.
An SSR/universal application can be really powerful if you have a large application with a lot of different pages. It allows your content to be focused and relevant when talking to a social crawler. This is also true for search engine robots, which now take your site’s performance into account when ranking it.
Stay tuned for a follow-up tutorial, where I will walk through the transformation of an SPA into pre-rendered and SSR versions, and compare their performance.
Microservices are defined as a self-regulating, and independent codebase that can be written and maintained even by a small team of developers. Microservices Architecture consists of such loosely coupled services with each service responsible for the execution of its associated business logic.
The services are separated from each other based on the nature of their domains and belong to a mini-microservice pool. Enterprise mobile app developers leverage the capabilities of this architecture especially for complex applications.
Microservices Architecture allows developers to release versions of software thanks to sophisticated automation of software building, testing, and deployment – something that acts as a prime differentiation point between Microservices and Monolithic architecture.
Since the services are bifurcated into pools, the architecture design pattern makes the system highly fault-tolerant. In other words, the whole software won’t collapse on its head even if some microservices cease to function.
An enterprise mobile app development company working on such an architecture for clients can deploy multiple programming languages to build different microservices for their specific purpose. Therefore the technology stack can be kept updated with the latest upgrades in computing.
This architecture is a perfect fit for applications that need to scale. Since the services are already independent of each other, they can scale individually rather than overloading the entire system with the need to expand.
Services can be integrated into any application depending upon the scope of work.
Since each service is unique in its ability to contribute to the whole codebase, it could be challenging for an enterprise mobile application development company to interlink all and operate so many distinctive services seamlessly.
Developers must define a standard protocol for all services to adhere to. It is important to do so, as the decentralized approach towards coding microservices in multiple languages can pose serious issues while debugging.
Each microservice with its limited environment is responsible to maintain the integrity of the data. It is up to the architects of such a system to come up with a universally consistent data integrity protocol, wherever possible.
You definitely need the best of breed professionals to design such a system for you as the technology stack keeps changing.
Use Microservices Architecture for apps in which a specific segment will be used heavily than the others and would need a sporadic burst of scaling. Instead of a standalone application you may also deploy this for a service that provides functionality to other applications of the system.
Ever wondered how large enterprise scale systems are designed? Before major software development starts, we have to choose a suitable architecture that will provide us with the desired functionality and quality attributes. Hence, we should understand different architectures, before applying them to our design.
What is an Architectural Pattern?
According to Wikipedia,
An architectural pattern is a general, reusable solution to a commonly occurring problem in software architecture within a given context. Architectural patterns are similar to software design pattern but have a broader scope.
In this article, I will be briefly explaining the following 10 common architectural patterns with their usage, pros and cons.
1. Layered pattern
This pattern can be used to structure programs that can be decomposed into groups of subtasks, each of which is at a particular level of abstraction. Each layer provides services to the next higher layer.
The most commonly found 4 layers of a general information system are as follows.
Presentation layer (also known as UI layer)
Application layer (also known as service layer)
Business logic layer (also known as domain layer)
Data access layer (also known as persistence layer)
General desktop applications.
E commerce web applications.
2. Client-server pattern
This pattern consists of two parties; a server and multiple clients. The server component will provide services to multiple client components. Clients request services from the server and the server provides relevant services to those clients. Furthermore, the server continues to listen to client requests.
Online applications such as email, document sharing and banking.
3. Master-slave pattern
This pattern consists of two parties; master and slaves. The master component distributes the work among identical slave components, and computes a final result from the results which the slaves return.
In database replication, the master database is regarded as the authoritative source, and the slave databases are synchronized to it.
Peripherals connected to a bus in a computer system (master and slave drives).
4. Pipe-filter pattern
This pattern can be used to structure systems which produce and process a stream of data. Each processing step is enclosed within a filter component. Data to be processed is passed through pipes. These pipes can be used for buffering or for synchronization purposes.
Compilers. The consecutive filters perform lexical analysis, parsing, semantic analysis, and code generation.
Workflows in bioinformatics.
5. Broker pattern
This pattern is used to structure distributed systems with decoupled components. These components can interact with each other by remote service invocations. A broker component is responsible for the coordination of communication among components.
Servers publish their capabilities (services and characteristics) to a broker. Clients request a service from the broker, and the broker then redirects the client to a suitable service from its registry.
In this pattern, individual components are known as peers. Peers may function both as a client, requesting services from other peers, and as a server, providing services to other peers. A peer may act as a client or as a server or as both, and it can change its role dynamically with time.
This pattern primarily deals with events and has 4 major components; event source, event listener, channel and event bus. Sources publish messages to particular channels on an event bus. Listeners subscribe to particular channels. Listeners are notified of messages that are published to a channel to which they have subscribed before.
8. Model-view-controller pattern
This pattern, also known as MVC pattern, divides an interactive application in to 3 parts as,
model — contains the core functionality and data
view — displays the information to the user (more than one view may be defined)
controller — handles the input from the user
This is done to separate internal representations of information from the ways information is presented to, and accepted from, the user. It decouples components and allows efficient code reuse.
Architecture for World Wide Web applications in major programming languages.
This pattern is useful for problems for which no deterministic solution strategies are known. The blackboard pattern consists of 3 main components.
blackboard — a structured global memory containing objects from the solution space
knowledge source — specialized modules with their own representation
control component — selects, configures and executes modules.
All the components have access to the blackboard. Components may produce new data objects that are added to the blackboard. Components look for particular kinds of data on the blackboard, and may find these by pattern matching with the existing knowledge source.
Vehicle identification and tracking
Protein structure identification
Sonar signals interpretation.
10. Interpreter pattern
This pattern is used for designing a component that interprets programs written in a dedicated language. It mainly specifies how to evaluate lines of programs, known as sentences or expressions written in a particular language. The basic idea is to have a class for each symbol of the language.
Database query languages such as SQL.
Languages used to describe communication protocols.
Comparison of Architectural Patterns
The table given below summarizes the pros and cons of each architectural pattern.
Hope you found this article useful. I would love to hear your thoughts. 😇
Event sourcing, eventual consistency, microservices, CQRS… These are quickly becoming household names in mainstream application development. But do you know what makes them tick? What are the basic building blocks required to assemble complex, business-centric applications from fine-grained services without turning the lot into a big ball of mud?
This article examines a fundamental building block — event streaming. Leading the charge will be Apache Kafka — the de facto standard in event streaming platforms, which we’ll observe through Kafdrop — a feature-packed web UI.
A Brief Intro
Event streaming platforms reside in the broader class of Message-oriented Middleware (MoM) and are similar to traditional message queues and topics but offer stronger temporal guarantees and typically order-of-magnitude performance gains due to log-structured immutability. In simple terms, write operations are mostly limited to sequential appends, which make them fast. Really fast.
Whereas messages in a traditional Message Queue (MQ) tend to be arbitrarily ordered and generally independent of one another, events (or records) in a stream tend to be strongly ordered, often chronologically or causally. Also, a stream persists its records, whereas an MQ will discard a message as soon as it has been read.
For this reason, event streaming tends to be a better fit for Event-Driven Architectures, encompassing event sourcing, eventual consistency, and CQRS concepts. (Of course, there are FIFO message queues too, but the differences between FIFO queues and fully-fledged event streaming platforms are quite substantial, not limited to ordering alone.)
Event streaming platforms are a comparatively recent paradigm within the broader MoM domain. There are only a handful of mainstream implementations available, compared to hundreds of MQ-style brokers, some going back to the 1980s (e.g. Tuxedo). Compared to established standards such as AMQP, MQTT, XMPP, and JMS, there are no equivalent standards in the streaming space.
Event streaming platforms are an active area of continuous research and experimentation. In spite of this, streaming platforms aren’t just a niche concept or an academic idea with a few esoteric use cases; they can be applied effectively to a broad range of messaging and eventing scenarios, routinely displacing their more traditional counterparts.
The diagram below offers a brief overview of the Kafka component architecture. While the intention isn’t to indoctrinate you with all of Kafka’s inner workings, some appreciation of its design will go a long way in explaining the key concepts that we will cover shortly.
Kafka is a distributed system comprising several key components:
Broker nodes: Responsible for the bulk of I/O operations and durable persistence within the cluster. Brokers accommodate the append-only log files that comprise the topic partitions hosted by the cluster. Partitions can be replicated across multiple brokers for both horizontal scalability and increased durability — these are called replicas. A broker node may act as the leader for certain replicas, while being a follower for others. A single broker node will also be elected as the cluster controller — responsible for the internal management of partition states. This includes the arbitration of the leader-follower roles for any given partition.
ZooKeeper nodes: Under the hood, Kafka needs a way to manage the overall controller status in the cluster. Should the controller drop out for whatever reason, there is a protocol in place to elect another controller from the set of remaining brokers. The actual mechanics of controller election, heart-beating, and so forth, are largely implemented in ZooKeeper. ZooKeeper also acts as a configuration repository of sorts, maintaining cluster metadata, leader-follower states, quotas, user information, ACLs, and other housekeeping items. Due to the underlying gossiping and consensus protocol, the number of ZooKeeper nodes must be odd.
Producers: These are client applications responsible for appending records to Kafka topics. Because of the log-structured nature of Kafka and the ability to share topics across multiple consumer ecosystems, only producers are able to modify the data in the underlying log files. The actual I/O is performed by the broker nodes on behalf of the producer clients. Any number of producers may publish to the same topic, selecting the partitions used to persist the records.
Consumers: These are client applications that read from topics. Any number of consumers may read from the same topic; however, depending on the configuration and grouping of consumers, there are rules governing the distribution of records among the consumers.
Topics, Partitions, Records, and Offsets
A partition is a totally ordered sequence of records and is fundamental to Kafka. A record has an ID — a 64-bit integer offset and a millisecond-precise timestamp. Also, it may have a key and a value; both are byte arrays and both are optional. The term “totally ordered” simply means that for any given producer, records will be written in the order they were emitted by the application. If record P was published before Q, then P will precede Q in the partition. (Assuming P and Q share a partition.)
Furthermore, they will be read in the same order by all consumers; P will always be read before Q, for every possible consumer. This ordering guarantee is vital in a large majority of use cases. Published records will generally correspond to some real-life events, and preserving the timeline of these events is often essential.
Note: Kafka uses the term “record,” where others might use “message” or “event.” In this article, we will use the terms interchangeably, depending on the context. Similarly, you might see the term “stream” as a generic substitute for “topic.”
There is no recognized ordering across producers. If two (or more) producers emit records simultaneously, those records may materialize in arbitrary order. That said, this order will still be observed uniformly across all consumers.
A record’s offset uniquely identifies it in the partition. The offset is a strictly monotonically-increasing integer in a sparse address space, meaning that each successive offset is always higher than its predecessor and there may be varying gaps between neighboring offsets. Gaps might legitimately appear if compaction is enabled or as a result of transactions; we don’t need to delve into the details at this stage. Suffice it to say that offsets need not be contiguous.
Your application shouldn’t attempt to literally interpret an offset or guess what the next offset might be. It may, however, infer the relative order of any record pair based on their offsets, sort the records by their offset, and so forth.
The diagram below shows what a partition looks like on the inside.1
start of partition
|0..00000|First record |
|0..00001|Second record |
|0..00002|Third record |
|0..00003|Fourth record |
|0..00007|Fifth record |
|0..00008|Sixth record |
|0..00010|Seventh record |
|0..56789|Last record |
end of partition
The beginning offset, also called the low-water mark, is the first message that will be presented to a consumer. Due to Kafka’s bounded retention, this is not necessarily the first message that was published. Records may be pruned on the basis of time and/or partition size. When this occurs, the low-water mark will appear to advance, and records earlier than the low-water mark will be truncated.
Conversely, the high-water mark is the offset immediately following the last record in the partition, also known as the end offset. It is the offset that will be assigned to the next record that will be published. It is not the offset of the last record.
A topic is a logical composition of partitions. A topic may have one or more partitions, and a partition must be a part of exactly one topic. Topics are fundamental to Kafka, allowing for both parallelism and load balancing.
Earlier, we said that partitions exhibit total order. Because partitions within a topic are mutually independent, the topic is said to exhibit partial order. In simple terms, this means that certain records may be ordered in relation to one another while being unordered with respect to certain other records. The concepts of total and partial order, while sounding somewhat academic, are hugely important in the construction of performant event streaming pipelines. They enables us to process records in parallel where we can, while maintaining order where we must. We’ll explore the concepts of record order, consumer parallelism, and topic sizing in a short while.
Example: Publishing Messages
Let’s put some of this theory into practice. We are going to spin up a pair of Docker containers — one for Kafka and another for Kafdrop. Rather than launching them individually, we’ll use Docker Compose.
Create a docker-compose.yaml file in a directory of your choice, containing the following:1
Note: We’re using the obsidiandynamics/kafka image for convenience because it neatly bundles Kafka and ZooKeeper into a single image. If you wanted to, you could replace this with images from Confluent or Wurstmeister, but then you’d have to wire it all up properly. The obsidiandynamics/kafka image does all this for you, so it’s highly recommended for beginners (and lazy pros).
Then, start it with docker-compose up. Once it boots, navigate to localhost:9000 in your browser. You should see the Kafdrop landing screen.
You should see our single-broker cluster. It’s a promising start, but there are no topics. Not a problem; let’s create a topic and publish some messages using Kafka’s command-line tools. Conveniently, we already have a Kafka image running as part of our docker-compose stack, so we can shell into it to use the built-in CLI tools.1
docker exec -it kafka-kafdrop_kafka_1 bash
This gets you into a Bash shell. The tools are in the /opt/kafka/bin directory, so let’s cd into it:1
Create a topic named streams-intro with three partitions:1
Note:kafka-topics uses the --bootstrap-server argument to configure the Kafka broker list, while kafka-console-producer uses the --broker-list argument for the same purpose. Also, --property arguments are largely undocumented; be prepared to Google your way around.
Records are separated by newlines. The key and the value parts are delimited by colons, as indicated by the key.separator property. For the sake of an example, type in the following (a copy-paste will do):1
Press CTRL+D when done. Then, switch back to Kafdrop and click on the streams-intro topic. You’ll see an overview of the topic, along with a detailed breakdown of the underlying partitions:
Let’s pause for a moment and dissect what’s been done. We created a topic with three partitions. We then published five records using two unique keys — foo and bar. Kafka uses keys to map records to partitions, such that all records with the same key will always appear on the same partition. Handy, but also important because it lets the publisher dictate the precise order of records. We’ll discuss key hashing and partition assignments in more detail later; in the meanwhile, sit back and enjoy the ride.
Looking at the partitions table, partition #0 has the first and last offsets at zero and two respectively. Partition #2 has them at zero and three, while partition #1 appears to blank. Clicking on #0 in the Kafdrop web UI sends us to a topic viewer:
We can see the two records published under the bar key. Note, they are completely unrelated to the foo records. Other than being collated within the same topic, there is nothing that binds records across partitions.
Note: In case you were wondering, the arrow to the left of the message lets you expand and pretty-print JSON-encoded messages. As our examples didn’t use JSON, there’s nothing to pretty-print.
It can be said without exaggeration that Kafka’s built-in tooling is an abomination. There is no consistency in the naming of command arguments and the simple act of publishing keyed messages requires you to jump through hoops — passing in obscure, undocumented properties. The usability of the built-in tools is a well-known heartache within the Kafka community. This is a real shame. It’s like buying a Ferrari, only to have it delivered with plastic hub caps. Fortunately, there are alternatives — both commercial and open source — that can fill the glaring gaps in tooling and observability.
Consumers and Consumer Groups
So far we have learned that producers emit records into the stream; these records are organized into nicely ordered partitions. Kafka’s pub-sub topology adheres to a flexible multipoint-to-multipoint model, meaning that there may be any number of producers and consumers simultaneously interacting with a stream. Depending on the actual solution context, stream topologies may also be point-to-multipoint, multipoint-to-point, and point-to-point. It’s about time we looked at how records are consumed.
A consumer is a process or a thread that attaches to a Kafka cluster via a client library. (One is available for most languages.) A consumer generally, but not necessarily, operates as part of an encompassing consumer group. The group is specified by the group.id property. Consumer groups are effectively a load-balancing mechanism within Kafka — distributing partition assignments approximately evenly among the individual consumer instances within the group.
When the first consumer in a group joins the topic, it will receive all partitions in that topic. When a second consumer subsequently joins, it will get approximately half of the partitions, relieving the first consumer of half of its prior load. The process runs in reverse when consumers leave (by disconnecting or timing out) — the remaining consumers will absorb a greater number of partitions.
So, a consumer siphons records from a topic, pulling from the share of partitions that have been assigned to it by Kafka, alongside the other consumers in its group. As far as load-balancing goes, this should be fairly straightforward. But here’s the kicker — the act of consuming a record does not remove it. This might seem contradictory at first, especially if you associate the act of consuming with depletion. (If anything, a consumer should have been called a ‘reader’, but let’s not dwell on the choice of terminology.)
The simple fact is, consumers have absolutely no impact on topics and their partitions. A topic is an append-only ledger that may only be mutated by the producer, or by Kafka itself (as part of compaction or cleanup). Consumers are “cheap,” so you can have quite a number of them tail the logs without stressing the cluster. This is yet another point of distinction between an event stream and a traditional message queue, and it’s a crucial one.
A consumer internally maintains an offset that points to the next record in a partition, advancing the offset for every successive read. When a consumer first subscribes to a topic, it may elect to start at either the head-end or the tail-end of the topic. This behavior is controlled by setting the auto.offset.reset property to one of latest, earliest or none. In the latter case, an exception will be thrown if no previous offset exists for the consumer group.
Consumers retain their offset state vector locally. Since consumers across different consumer groups do not interfere, there may be any number of them reading concurrently from the same topic. Consumers run at their own pace — a slow or backlogged consumer has no impact on its peers.
To illustrate this concept, consider a scenario involving a topic with two partitions. Two consumer groups, A and B, are subscribed to the topic. Each group has three instances, the consumers being named A1, A2, A3, B1, B2, and B3. The diagram below illustrates how the two groups might share the topic and how the consumers advance through the records independently of one another.1
Look carefully and you’ll notice something is missing. Consumers A3 and B1 aren’t there. That’s because Kafka guarantees that a partition may only be assigned to at most one consumer within its consumer group. (We say ‘at most’ to cover the case when all consumers are offline.) Because there are three consumers in each group, but only two partitions, one consumer will remain idle — waiting for another consumer in its respective group to depart before being assigned a partition.
In this manner, consumer groups are not only a load-balancing mechanism, but also a fence-like exclusivity control, used to build highly performant pipelines without sacrificing safety, particularly when there is a requirement that a record may only be handled by one thread or process at any given time.
Consumer groups are also used to ensure availability. By periodically pulling records from a topic, the consumer implicitly signals to the cluster that it’s in a ‘healthy’ state, thereby extending the lease over its partition assignment. However, should the consumer fail to read again within the allowable deadline, it will be deemed faulty and its partitions will be reassigned — apportioned among the remaining ‘healthy’ consumers within its group. This deadline is controlled by the max.poll.interval.ms consumer client property, set to five minutes by default.
To use a transportation analogy, a topic is like a highway, while a partition is a lane. A record is the equivalent of a car, and its occupants correspond to the record’s value. Several cars can safely travel on the same highway, providing they keep to their lane. Cars sharing the same line ride in a sequence, forming a queue. Now, suppose each lane leads to an off-ramp, diverting its traffic to some location. If one off-ramp gets banked up, other off-ramps may still flow smoothly.
It’s precisely this highway-lane metaphor that Kafka exploits to achieve its end-to-end throughput, easily reaching millions of records per second on commodity hardware. When creating a topic, one can choose the partition count — the number of lanes, if you will.
The partitions are divided approximately evenly among the individual consumers in a consumer group, with a guarantee that no partition will be assigned to two (or more) consumers at the same time, providing that these consumers are part of the same consumer group. Referring to our analogy, a car will never end up in two off-ramps simultaneously; however, two lanes might conceivably lead to the same off-ramp.
Note: A topic may be resized after creation by increasing the number of partitions. It is not possible, however, to decrease the partition count without recreating the topic.
Records correspond to events, messages, commands, or any other streamable content. Precisely how records are partitioned is left to the discretion of the producer(s). A producer may explicitly assign a partition index when publishing a record, although this approach is rarely used. A much more common approach is to assign a key to a record, as we have done in our earlier example. The key is completely opaque to Kafka. In other words, Kafka doesn’t attempt to interpret the contents of the key, treating it as an array of bytes. These bytes are hashed to derive a partition index, using a consistent hashing technique.
Records sharing the same hash are guaranteed to occupy the same partition. Assuming a topic with multiple partitions, records with a different key will likely end up in different partitions. However, due to hash collisions, records with different hashes may also end up in the same partition. Such is the nature of hashing. If you understand how a hash table works, this is no different.
Producers rarely care which specific partition the records will map to, only that related records end up in the same partition, and that their order is preserved. Similarly, consumers are largely indifferent to their assigned partitions, so long that they receive the records in the same order as they were published, and their partition assignment does not overlap with any other consumer in their group.
We already said that consumers maintain an internal state with respect to their partition offsets. At some point, that state must be shared with Kafka, so that when a partition is reassigned, the new consumer can resume processing from where the outgoing consumer left off. Similarly, if the consumers were to disconnect, upon reconnection they would ideally skip over those records that have already been processed.
Persisting the consumer state back to the Kafka cluster is called committing an offset. Typically, a consumer will read a record (or a batch of records) and commit the offset of the last record, plus one. If a new consumer takes over the topic, it will commence processing from the last committed offset, hence the plus-one step is essential. (Otherwise, the last processed record would be handled a second time.)
Fun fact: Kafka employs a recursive approach to managing committed offsets, elegantly utilising itself to persist and track offsets. When an offset is committed, Kafka will publish a binary record on the internal __consumer_offsets topic. The contents of this topic are compacted in the background, creating an efficient event store that progressively reduces to only the last known commit points for any given consumer group.
Controlling the point when an offset is committed provides a great deal of flexibility around delivery guarantees, handing Kafka a yet another trump card. The term ‘delivery’ assumes not just reading a record, but the full processing cycle, complete with any side-effects. One can shift from an at-most-once to an at-least-once delivery model by simply moving the commit operation from a point before the processing of a record is commenced, to a point sometime after the processing is complete. With this model, should the consumer fail midway through processing a record, the record will be re-read the following partition reassignment.
By default, a Kafka consumer will automatically commit offsets every five seconds, regardless of whether the consumer has finished processing the record. Often, this is not what you want, as it may lead to mixed delivery semantics. For example, in the event of consumer failure, some records might be delivered twice, while others might not be delivered at all. To enable manual offset committing, set the enable.auto.commit property to false.
Note: There are a few gotchas like this in Kafka. Pay close attention to the (producer and consumer) client properties in the official Kafka documentation, particularly to the stated defaults. Don’t assume for a moment that the defaults are sensible, insofar as they ought to favour safety over other competing qualities. Kafka defaults tend to be optimised for performance, and will need to be explicitly overridden on the client when safety is a critical objective. Fortunately, setting the properties to insure safety has only a minor impact on performance — Kafka is still a beast. Remember the first rule of optimisation: Don’t do it. Kafka would have been even better, had their creators given this more thought.
Getting offset committing right can be tricky, and routinely catches out beginners. A committed offset implies that the record one below that offset and all prior records have been dealt with by the consumer. When designing at-least-once or exactly-once applications, an offset should only be committed when the application is dealt with with the record in question and all records before it.
In other words, the record has been processed to the point that any actions that would have resulted from the record have been carried out and finalized. This may include calling other APIs, updating a database, committing transactions, persisting the record’s payload, or publishing more records. Stated otherwise, if the consumer were to fail after committing the record, then not ever seeing this record again must not be detrimental to its correctness.
In the at-least-once (and by extension, the exactly-once) scenario, a typical consumer implementation will commit its offset linearly, in tandem with the processing of the records. That is, read a record, commit it (plus-one), read the next, commit it (plus one), and so on. A common tactic is to process a batch of records concurrently (where this makes sense), using a thread pool, and only confirm the last record when the entire batch is done. The commit process in Kafka is very efficient, the client library will send commit requests asynchronously to the cluster using an in-memory queue, without blocking the consumer. The client application can register an optional callback, notifying it when the commit has been acknowledged by the cluster.
The consumer group is a somewhat understated concept that is pivotal to the versatility of an event streaming platform. By simply varying the affinity of consumers with their groups, one can arrive at vastly different distribution topologies — from a topic-like, pub-sub behavior to an MQ-style, point-to-point model. Because records are never truly consumed (the advancing offset only creates the illusion of consumption), one can concurrently superimpose disparate distribution topologies over a single event stream.
Consumer groups are completely optional; a consumer does not need to be encompassed in a consumer group to pull messages from a topic. A free consumer omits the group.id property. Doing so allows it to operate under relaxed rules, entirely transferring the responsibility for consumer management to the application.
Note: The use of the term ‘free’ to denote a consumer without an encompassing group is not part of the standard Kafka nomenclature. As Kafka lacks a canonical term to describe this, the term ‘free’ was adopted here.
Free consumers do not subscribe to a topic. Instead, the consuming application is responsible for manually assigning a set of topic-partitions to the consumer, individually specifying the starting offset for each topic-partition pair. Free consumers do not commit their offsets to Kafka; it is up to the application to track the progress of such consumers and persist their state as appropriate, using a datastore of their choosing. The concepts of automatic partition assignment, rebalancing, offset persistence, partition exclusivity, consumer heart-beating and failure detection, and other so-called niceties accorded to consumer groups cease to exist in this mode.
Free consumers are not observed in the wild as often as their grouped counterparts. There are predominantly two use cases where a free consumer is an appropriate choice. The first, is when you genuinely need full control of the partition assignment scheme and/or you require an alternative place to store consumer offsets. This is very rare.
Needless to say, it’s also very difficult to implement correctly, given the multitude of scenarios one must account for. The second, more commonly seen use case, is when you have a stateless or ephemeral consumer that needs to monitor a topic. For example, you might be interested in tailing a topic to identify specific records, or just as a debugging tool. You might only care about records that were published when your stateless consumer was online, so concerns such as persisting offsets and resuming from the last processed record are completely irrelevant.
A good example of where this is used routinely is the Kafdrop web UI, which we’ve already seen. When you click on a topic to view the messages, Kafdrop creates a free consumer and assigns the requested partition to it, reading the records from the supplied offsets. Navigating to a different topic or partition will reset the consumer, discarding any prior state.
The illustration below outlines the relationship between producers, topics, partitions, consumers, and consumer groups.1
Topics are subdivided into partitions, each forming an independent, totally-ordered sequence within a wider, partially-ordered stream.
Multiple producers are able to publish to a topic, picking a partition at will. This may be accomplished either directly, by specifying a partition index, or indirectly, by way of a record key, which deterministically hashes to a consistent partition index. (In the diagram above, both Producer 1 and Producer 2 publish to the same topic.)
Partitions in a topic can be load-balanced across a population of consumers in a consumer group, allocating partitions approximately evenly among the members of that group. (Consumer 2 and Consumer 3 each get one partition.)
A consumer in a group is not guaranteed a partition assignment. Where the group’s population outnumbers the partitions, some consumers will remain idle until this balance equalizes or tips in favor of the other side. (Consumer 4 remains partition-less.)
Partitions may be manually assigned to free consumers. If necessary, an entire topic may be assigned to a single free consumer — this is done by individually assigning all partitions. (Consumer 1 can be freely assigned any partition.)
When contrasting at-least-once with at-most-once delivery semantics, an often-asked question is: Why can’t we have it exactly once?
Without delving into the academic details, which involve conjectures and impossibility proofs, it is sufficient to say that exactly-once semantics are not possible without collaboration with the consumer application. What does this mean in practice?
Consumers in event streaming applications must be idempotent. In other words, processing the same record repeatedly should have no net effect on the consumer ecosystem. If a record has no additive effects, the consumer is inherently idempotent. (For example, if the consumer simply overwrites an existing database entry with a new one, then the update is naturally idempotent.) Otherwise, the consumer must check whether a record has already been processed, and to what extent, prior to processing a record. The combination of at-least-once delivery and consumer idempotence collectively leads to exactly-once semantics.
Example: A Trading Platform
With all this theory looming over us like Kubrick’s Monolith, it would be inappropriate to conclude without offering the reader a practical scenario.
Let’s say you were looking for specific price patterns in listed stocks, emitting trading signals when a particular pattern is identified. There are a large number of stocks, and understandably you’d like them processed in parallel. However, the time series for any given ticker code must be processed sequentially on a single consumer.
Kafka makes this use case, and others like it, almost trivial to implement. We would create a pair of topics: prices for the raw price data, and orders for any resulting orders. We can be fairly generous with our partition counts, as the nature of the data gives us ample opportunities for parallelism.
At the feed source, we could publish a record for each price on the prices topic, keyed by the ticker code. Kafka’s automatic partition assignment will ensure that every ticker code is handled by (at most) one consumer in its group. The consumer instances are free to scale in and out to match the processing load. Consumer groups should be meaningfully named, ideally reflecting the purpose of the consuming application. A good example might be trading-strategy.abc, for a fictitious trading strategy named ‘ABC’.
Once a price pattern is identified by the consumer, it can publish another message — the order request — on the orders topic. We’ll muster up another consumer group — order-execution — responsible for reading the orders and forwarding them to the broker.
In this simple example, we have created an end-to-end trading pipeline that is entirely event-driven and highly scalable — at least theoretically, assuming there are no other bottlenecks. We can dynamically add more processing nodes to the individual stages to cope with the increased load where it’s called for.
Now, let’s spice things up a bit. Suppose you need several trading strategies operating concurrently, driven by a common data feed. Furthermore, the trading strategies will be developed by different teams; the objective being to decouple these implementations as much as possible, allowing the teams to operate autonomously — develop and deploy at their individual cadence, perhaps even using different programming languages and tool-chains. That said, you’d ideally want to reuse as much of what’s already been written. So, how would we pull this off?
Kafka’s flexible multipoint-to-multipoint pub-sub architecture combines stateful consumption with broadcast semantics. Using distinct consumer groups, Kafka allows disparate applications to share input topics, processing events at their own pace. The second trading strategy would need a dedicated consumer group — trading-strategy.xyz — applying its specific business logic to the common pricing stream, publishing the resulting orders to the same orders topic. In this fashion, Kafka enables you to construct modular event processing pipelines from discrete elements that are readily reusable and composable.
Note: In the days of service buses and traditional ‘enterprisey’ message brokers, before event sourcing entered the mainstream, you would have had to choose between persistent message queues or transient broadcast topics. In our example, you would likely have created multiple FIFO queues, using the fan-out pattern. Because Kafka generalises pub-sub topics and persistent message queues into a unified model, a single source topic can power a diverse range of consumers without incurring duplication.
Event streaming platforms are a highly effective building block in the construction of modular, loosely-coupled, event-driven applications. Within the world of event streaming, Kafka has solidified its position as the go-to open-source solution that is both amazingly flexible and highly performant. Concurrency and parallelism are at the heart of Kafka’s architecture, forming partially-ordered event streams that can be load-balanced across a scalable consumer ecosystem. A simple reconfiguration of consumers and their encompassing groups can bring about vastly different event distribution and processing semantics; shifting the offset commit point can invert the delivery guarantee from an at-most-once to an at-least-once model.
Of course, Kafka isn’t without its flaws. The tooling is sub-par, to put it mildly; most Kafka practitioners have long abandoned the out-of-the-box CLI utilities in favour of other open-source tools such as Kafdrop, Kafkacat and third-party commercial offerings like Kafka Tool. The breadth of Kafka’s configuration options is overwhelming, with defaults that are riddled with gotchas, ready to shock the unsuspecting first-time user.
All in all, Kafka represents a paradigm shift in how we architect and build complex systems. Its benefits go beyond the superfluous, and they dwarf any of the niggles that are bound to exist in a technology that has undergone such aggressive adoption. Crucially, it paves the way for further progress in its space; Apache Pulsar is a prime example of an alternative platform that has improved on much of Kafka’s shortcomings, yet owes a great deal to its predecessor for laying the cornerstone and bringing the genre to the mainstream.
Microservices architecture has become a hot topic in the software backend development world. The ecosystem carries a profound impact on not just the enterprises’ IT function but also in the digital transformation of an entire app business.
The debate of Microservices vs monolithic architecture defines a revolutionary shift in how an IT team approaches their software development cycle: Whether they go with the approach that brands like Google, Amazon, and Netflix chose or do they go with the simplicity quotient that a startup which is at the development stage demands.
In this article, we are going to get startups an answer to which backend architecture they should choose when they are starting their journey to become a startup.
Table Of Content:
What are Microservices Architecture?
What is Monolithic Architecture?
Microservices vs Monolithic Architecture: Advantages and Disadvantages
How to Choose Between Monolithic and Microservice Architecture?
Migrating from a Monolithic Architecture to a Microservice Ecosystem
What are Microservices Architecture?
Microservices architecture contains a mix of small and autonomous services where every service is self-contained and must be implemented as a single business ability. It is a distinct approach used for development of software systems which focus on developing several single-function modules with clearly-defined operations and interfaces. The approach has become a popular trend in the past several years as more and more Enterprises are looking to become Agile and make a shift towards DevOps.
Components of Microservices architecture that makes it one of the best enterprise architecture:
The services are independent, small, and loosely coupled
Encapsulates a business or customer scenario
Every service is different codebase
Services can be independently deployed
Services interact with each other using APIs
With the question of what are microservices architecture now answered, let us move on to look into what is monolithic architecture.
What is Monolithic Architecture?
Monolithic application has a single codebase having multiple modules. The modules, in turn, are divided into either technical features or business features. The architecture comes with a single build system that helps build complete application. It also comes with a single deployable or executable binary.
Now that we have looked into what is monolithic architecture and microservices architecture, let us look into the disadvantages and benefits that both the backend system offers to get an understanding of what separates them from each other.
Microservices vs Monolithic Architecture: Advantages and Disadvantages
Advantages of Monolithic Architecture
A. Zero Deployment Dependencies
An organized and well-documented Monolith architecture makes it possible for Backend developers to not worry about which version would be compatible with which service, how to find which services are present and what they do, etc.
B. Error Tracing
One of the biggest benefits of monolithic is that all the transactions are logged into one place, making error tracing task a breeze.
C. No Silos
The one factor that works in the favour of monolithic in the microservices vs monolithic architecture debate is absence of silos. It becomes very easy for the developers to work on multiple parts of the app for they are all structured similarly, using the same tools, which makes it okay to have no prior distributed computing knowledge.
D. Cross-cutting concerns:
Spending time in defining the services which do not bleed in each other’s time is the time that you can actually spend in developing things that help the customers.
E. Shared Code:
No shared libraries where the complete scope needed for services to operate is sent along each request.
Limitations of Monolithic Architecture
A. Lack of Flexibility:
Monolithic architectures are not flexible. You cannot use different technologies when you have incorporated Monolithic. The technology stack which have been decided at the beginning have to be followed throughout the project, making upgrades a next to impossible task.
B. Development Speed:
Microservices speed development process is famous when you compare microservices architecture vs monolithic architecture. Development is very slow in monolithic architecture. It can be very difficult for team members to understand and then modify the code of large monolithic applications. Additionally, as the size of codebase increases, the IDE gets overloaded and gets slower. All of this results in a slowed down app development speed.
C. Difficult Scalability:
Scaling monolithic applications becomes difficult when the apps becomes large. While developers can develop new instances of monolith and load balancer to distribute the traffic to new instances, monolithic architecture cannot scale with the increasing load.
Benefits of Microservices Architecture
The biggest factor in favor of microservices in the difference between microservices and monolithic architecture is that it handles complexity issues by decomposing the app into manageable service set that are faster to develop and easier to maintain and understand.
It enables independent service development through a team which is focused on the particular service, which makes the ideal choice of businesses that work with an Agile development approach.
It lowers the barrier of adopting newer technologies as the developers have the freedom to choose whatever technology that makes sense to their project.
It makes it possible for every microservice to be deployed individually. The result of which is that continuous deployment of complex application becomes possible.
Drawbacks of Microservices Architecture
Microservices add a complexity to project simply by the fact that the microservices application is distributed system. To solve the complexities, developers have to select and implement inter-process communication that is based on either RPC or messaging.
They work with partitioned database architecture. The business transactions which update multiple business entities inside the microservices application also have to update different databases that are owned by multiple services.
It is a lot more difficult to implement changes which span across multiple services. While in case of Monolithic architecture, an app development agency only have to change the corresponding modules, integrate all the changes, and then deploy them all in one go.
Deployment of a microservice application is very complex. It consists of a number of services, which individually have multiple runtime instances. In contrast, a monolithic application is deployed on set of identical servers behind load balancer.
The benefits and limitations are prevalent in both monolithic and microservices architecture. This makes it extremely difficult for a startup to gauge which backend architecture to incorporate in their journey.
Let us help you.
How to Choose Between Monolithic and Microservice Architecture?
The fact that both the approaches come with their own set of pros and cons are a sign that there is no one size fits all methodology when it comes to choosing a backend architecture. But there are a few questions that can help you decide which is the right direction to head into.
Are You Working in a Familiar Sector?
When you work in an industry where you know the veins of the sector and you know the demands and the needs of the customers, it becomes easier to enter into the system with a definite structure. The same, however, is not possible with a business that is very new in the industry, for the amount of looming doubts are much greater.
So, the use of microservice architecture in app development is best suited in cases where you know the industry inside out. If that is not the case, go with monolithic approach to develop your app.
How Prepared is Your Team?
Is your team aware with the best practices for implementing microservices? Or are they more comfortable with working around the simplicity of monolithic? Will your team and your business offering expand in the coming time? You will have to find answers to all these questions to gauge whether the people who have to work on a project are even ready to migrate.
What is Your Infrastructure Like?
Everything from the development to the deployment of a monolithic web application would require a cloud-based infrastructure. You will have to make use of Amazon AWS and Google Cloud for deploying even tiny elements. While the cloud technologies make the process easier, The idea of setting up database server for every other microservice and then scaling out is something that startup entrepreneur might not be comfortable with.
Have you Evaluated the Business Risk?
More often than not, businesses take microservices’ side in the Microservices vs Monolithic Architecture thinking it is the right thing for their business. What they forget to factor in is the chance that their application might not become as scalable as they are optimistically expecting and they might have to suffer the risks of adding a highly scalable system in their process.
Here is a short list of pointers that would help you make the decision of choosing to opt for software development processes with microservices vs monolithic architecture:
When to Choose Monolithic Architecture?
When your team is at a founding stage
When you are developing a proof of concept
When you have no experience in microservices
When you have experience in the development on solid frameworks, like the Ruby on Rails, Laravel, etc.
When to Choose Microservices Architecture?
You need independent, quick delivery service
You need to extend your team
Your platform need to be extremely efficient
You don’t have a tight deadline to work with
Migrating from a Monolithic Architecture to a Microservice Ecosystem
The right approach for migrating a monolithic architecture to a microservice ecosystem is to divide the monolith processes and turn them into microservices. The result of this is a two-factor plan:
Identification of existing monolithic elements which can get decoupled
A validation that the new functionality can be developed as microservice
One of the main challenges that can emerge when initiating the migration from a monolithic architecture to a microservice architecture is to design and create an integration between existing system and a new microservice. A solution for this can be to add a glue code which allows them to connect later, something like an API.
API gateway can also help in combining multiple individual service calls in one coarse-grained service, and this in turn would help reduce the integration cost with monolithic system.
When you compare microservices architecture vs monolithic architecture, you will find the former being a hot trend. Every entrepreneur wants to say that their app is based on this architecture. But the temptation to focus only on the problems of monolithic architecture and abandon the architecture should be measured against the actual value of microservice architecture.
The right approach would be to develop new apps using a monolithic approach and move to microservices only when the justification of the move is backed by proper metrics like performance monitoring.
For established businesses, microservices tend to be avenues for continuous deployment, team based development, and an agility to shift to new technologies. But for startups, or companies that are just starting, adopting microservices can impact the software project success very negatively.
FAQs About Microservices vs Monolithic Architecture
Q. What is the Purpose of Microservices?
The Microservice architectures allow you to divide the application in separate independent services, where each of them are managed by different groups in the software development agency. This way, the responsibility gets divided and the application is developed and deployed at a much faster rate.
Q. Does moving from a monolith to a Microservice architecture help with resilience?
Yes. Since microservices enable developers to handle multiple parts of the project at the same time in a streamlined manner, it becomes much easier to identify issues and solve them within time. Something that is next to impossible in case of Monolithic architecture where it is impossible to add new technologies or change the process, mid project.
Q. What is the difference between Microservices vs Monolithic approach?
The difference in microservices and monolithic architecture is the difference of approaches. While in case of Monolithic architecture, there is a single build system, Microservices come with multiple build systems, which makes development and deployment of an application faster.
Q. When To Choose Microservices Over Monolithic Architecture
The choice of going with microservices over monolithic architecture can be decided upon these factors: