Given a sorted array, remove the duplicates from the array in-place such that each element appears at most twice, and return the new length.
Do not allocate extra space for another array, you must do this by modifying the input array in-place with O(1) extra memory.
Example
Given array [1, 1, 1, 3, 5, 5, 7]
The output should be 6, with the first six elements of the array being [1, 1, 3, 5, 5, 7]
Remove Duplicates from Sorted Array II solution in Java
It can also be solved in O(n) time complexity by using two pointers (indexes).
classRemoveDuplicatesSortedArrayII{privatestaticintremoveDuplicates(int[] nums){int n = nums.length;/*
* This index will move when we modify the array in-place to include an element
* so that it is not repeated more than twice.
*/int j =0;for(int i =0; i < n; i++){/*
* If the current element is equal to the element at index i+2, then skip the
* current element because it is repeated more than twice.
*/if(i < n -2&& nums[i]== nums[i +2]){continue;}
nums[j++]= nums[i];}return j;}publicstaticvoidmain(String[] args){int[] nums =newint[]{1,1,1,3,5,5,7};int newLength =removeDuplicates(nums);System.out.println("Length of array after removing duplicates = "+ newLength);System.out.print("Array = ");for(int i =0; i < newLength; i++){System.out.print(nums[i]+" ");}System.out.println();}}
# Output
Length of array after removing duplicates =6
Array =113557
In this quick and short article, you’ll find two examples demonstrating how to check if a File or Directory exists at a given path in Java. We’re going to get familiar with different ways to check the existence of a file or directory.
Check if a File/Directory exists using Java IO’s File.exists()
import java.io.File;
public class CheckFileExists1 {
public static void main(String[] args) {
File file = new File("/Users/callicoder/demo.txt");
if(file.exists()) {
System.out.printf("File %s exists%n", file);
} else {
System.out.printf("File %s doesn't exist%n", file);
}
}
}
Check if a File/Directory exists using Java NIO’s Files.exists() or Files.notExists()
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
public class CheckFileExists {
public static void main(String[] args) {
Path filePath = Paths.get("/Users/callicoder/demo.txt");
// Checking existence using Files.exists
if(Files.exists(filePath)) {
System.out.printf("File %s exists%n", filePath);
} else {
System.out.printf("File %s doesn't exist%n", filePath);
}
// Checking existence using Files.notExists
if(Files.notExists(filePath)) {
System.out.printf("File %s doesn't exist%n", filePath);
} else {
System.out.printf("File %s exists%n", filePath);
}
}
}
So in Part 1 of this article, we’ve learned what micro-frontends are, what they are meant for, and what the main concepts associated with them are. In this part, we are going to dive deep into various micro-frontend architecture types. We will understand what characteristics they share and what they don’t. We’ll see how complex each architecture is to implement. Then through the documents-to-applications continuum concept, we will identify our application type and choose the proper micro-frontend architecture that meets our application’s needs at most.
Micro-frontend architecture characteristics
In order to define micro-frontend architecture types, let’s reiterate over two micro-frontend integration concepts that we learned in the first part: (Fig. 1 and 2).
routing/page transitions
For page transition, we’ve seen that there are server and client-side routings, which in another way are called hard and soft transitions. It’s “hard” because with server-side routing, each page transition leads to a full page reload, which means for a moment the user sees a blank page until the content starts to render and show up. And client-side routing is called soft because on page transition, only some part of the page content is being reloaded (usually header, footer, and some common content stays as it is) by JavaScript. So there is no blank page, and, also the new content is smaller so it arrives earlier than during the full page reload. Therefore, with soft transitions, we undoubtedly have a better user experience than with hard ones.
Figure 1. routing/page transition types
Currently, almost all modern web applications are Single Page Applications (SPAs). This is a type of application where all page transitions are soft and smooth. When speaking about micro-frontends, we can consider each micro-frontend as a separate SPA, so the page transitions within each of them can also be considered soft. From a micro-frontend architectural perspective, by saying page transitions, we meant the transitions from one micro frontend to another and not the local transitions within a micro frontend. Therefore this transition type is one of the key characteristics of a micro-frontend architecture type.
composition/rendering
For composition, we’ve spoken about the server and client-side renderings. In the 2000s, when JavaScript was very poor and limited, almost all websites were fully rendered on the server-side. With the evolution of web applications and JavaScript, client-side rendering became more and more popular. SPAs mentioned earlier also are applications with client-side rendering.
Figure 2. composition/rendering types
Currently, when saying server-side rendering or universal rendering, we don’t assume that the whole application is rendered on the server-side — it’s more like hybrid rendering where some content is rendered on the server-side and the other part on the client-side. In the first article about micro-frontends, we’ve spoken about the cases when universal rendering can be useful. So the rendering is the second characteristic of a micro-frontend architecture type.
Architecture types
Now that we have defined two characteristics of micro-frontend architecture, we can extract four types based on various combinations of those two.
Linked Applications
Hard transitions;
Client-side rendering;
This is the simplest case where the content rendering is being done on the client-side and where the transition between micro-frontends is hard. As an example, we can consider two SPAs that are hosted under the same domain (http://team.com). A simple Nginx proxy server serves the first micro-frontend on the path “/a” and the second micro-frontend on the path “/b.” Page transitions within each micro-frontend are smooth as we assumed they are SPAs.
The transition from one micro-frontend to another happens via links which in fact is a hard transition and leads to a full page reload (Fig. 3). This micro-frontend architecture has the simplest implementation complexity.
Figure 3. Linked application transition model
Linked Universal Applications
Hard transitions;
Universal rendering;
If we add universal rendering features to the previous case, we’ll get the Linked Universal Applications architecture type. This means that some content of either one or many micro frontends renders on the server-side. But still, the transition from one micro to another is done via links. This architecture type is more complex than the first one as we have to implement server-side rendering.
Unified Applications
Soft transitions;
Client-side rendering;
Here comes my favorite part. To have a soft transition between micro-frontends, there should be a shell-application that will act as an umbrella for all micro-frontend applications. It will only have a markup layout and, most importantly, a top-level client routing. So the transition from path “/a” to path “/b” will be handed over to the shell-application (Fig. 4). As it is a client-side routing and is the responsibility of JavaScript, it can be done softly, making a perfect user experience. Though this type of application is fully rendered on the client-side, there is no universal rendering of this architecture type.
Figure 4. Unified application with a shell application
Unified Universal Applications
Soft transitions;
Universal rendering;
And here we reach the most powerful and, at the same time, the most complex architecture type where we have both soft transitions between micro-frontends and universal rendering (Fig. 5).
Figure 5. Micro-frontend architecture types
A logical question can arise like, “oh, why don’t we pick the fourth and the best option?” Simply because these four architecture types have different levels of complexity, and the more complex our architecture is, the more we will pay for it for both implementation and maintenance (Fig. 6). So there is no absolute right or wrong architecture. We just have to find a way to understand how to choose the right architecture for our application, and we are going to do it in the next section.
Figure 6. Micro-frontend architecture types based on their implementation complexities.
Which architecture to choose?
There is an interesting concept called the documents-to-applications continuum (Fig. 7). It’s an axis with two ends where we have more content-centric websites, and on the other side, we have pure web applications that are behavior-centric.
Content-centric: the priority is given to the content. Think of some news or a blog website where users usually open an article. The content is what matters in such websites; the faster it loads, the more time the users save. There are not so many transitions in these websites, so they can easily be hard, and no one will care.
Behavior-centric: the priority is given to the user experience. These are pure applications like online code editors, and playgrounds, etc. In these applications, you are usually in a flow of multiple actions. Thus there are many route/state transitions and user interactions along with the flow. For such types of applications having soft transitions is key for good UX.
Figure 7. Correlation of the document-to-applications continuum and micro-frontend architecture types.
From Fig. 7, we can see that there is an overlap of content and behavior-centric triangles. This is the area where the progressive web applications lay. These are applications where both the content and the user experience are important. A good example is an e-commerce web application.
The faster the first page loads, the more pleasant it will be for the customers to stay on the website and shop.
The first example is that of a content-centric application, while the second one is behavior-centric.
Now let’s see where our architecture types lay in the document-to-applications continuum. Since we initially assumed that each micro-frontend is a SPA by itself, we can assume that all our architecture types lay in the behavior-centric spectrum. Note that this assumption of SPA’s is not a prerogative and is more of an opinionated approach. As both linked universal apps and universal unified apps have universal rendering features, they will lay in progressive web apps combined with both content and behavior-centric spectrums. Linked apps and unified apps are architecture types that are more suited to web applications where the best user experience has higher priority than the content load time.
First of all, we have to identify our application requirements and where it is positioned in the document-to-application continuum. Then it’s all about what complexity level of implementation we can afford.
Conclusion
This time, as we already learned the main concepts of micro-frontends, we dived deep into micro-frontend applications’ high-level architecture. We defined architecture characteristics and came up with four types. And at the end, we learned how and based on what to choose our architecture type. I’d like to mention that I’ve gained most of the knowledge on micro-frontends and especially the things I wrote about in this article from the book by Micheal Geers “ Micro frontends in Action,” which I recommend reading if you want to know much more about this topic.
What’s next?
For the next chapter, we will be going towards the third architecture — Unified apps, as this type of architecture, mostly fits the AI annotation platform. We will be learning how to build unified micro-frontends. So stay tuned for the last third part.
Event sourcing, eventual consistency, microservices, CQRS… These are quickly becoming household names in mainstream application development. But do you know what makes them tick? What are the basic building blocks required to assemble complex, business-centric applications from fine-grained services without turning the lot into a big ball of mud?
This article examines a fundamental building block — event streaming. Leading the charge will be Apache Kafka — the de facto standard in event streaming platforms, which we’ll observe through Kafdrop — a feature-packed web UI.
A Brief Intro
Event streaming platforms reside in the broader class of Message-oriented Middleware (MoM) and are similar to traditional message queues and topics but offer stronger temporal guarantees and typically order-of-magnitude performance gains due to log-structured immutability. In simple terms, write operations are mostly limited to sequential appends, which make them fast. Really fast.
Whereas messages in a traditional Message Queue (MQ) tend to be arbitrarily ordered and generally independent of one another, events (or records) in a stream tend to be strongly ordered, often chronologically or causally. Also, a stream persists its records, whereas an MQ will discard a message as soon as it has been read.
For this reason, event streaming tends to be a better fit for Event-Driven Architectures, encompassing event sourcing, eventual consistency, and CQRS concepts. (Of course, there are FIFO message queues too, but the differences between FIFO queues and fully-fledged event streaming platforms are quite substantial, not limited to ordering alone.)
Event streaming platforms are a comparatively recent paradigm within the broader MoM domain. There are only a handful of mainstream implementations available, compared to hundreds of MQ-style brokers, some going back to the 1980s (e.g. Tuxedo). Compared to established standards such as AMQP, MQTT, XMPP, and JMS, there are no equivalent standards in the streaming space.
Event streaming platforms are an active area of continuous research and experimentation. In spite of this, streaming platforms aren’t just a niche concept or an academic idea with a few esoteric use cases; they can be applied effectively to a broad range of messaging and eventing scenarios, routinely displacing their more traditional counterparts.
The diagram below offers a brief overview of the Kafka component architecture. While the intention isn’t to indoctrinate you with all of Kafka’s inner workings, some appreciation of its design will go a long way in explaining the key concepts that we will cover shortly.
Kafka is a distributed system comprising several key components:
Broker nodes: Responsible for the bulk of I/O operations and durable persistence within the cluster. Brokers accommodate the append-only log files that comprise the topic partitions hosted by the cluster. Partitions can be replicated across multiple brokers for both horizontal scalability and increased durability — these are called replicas. A broker node may act as the leader for certain replicas, while being a follower for others. A single broker node will also be elected as the cluster controller — responsible for the internal management of partition states. This includes the arbitration of the leader-follower roles for any given partition.
ZooKeeper nodes: Under the hood, Kafka needs a way to manage the overall controller status in the cluster. Should the controller drop out for whatever reason, there is a protocol in place to elect another controller from the set of remaining brokers. The actual mechanics of controller election, heart-beating, and so forth, are largely implemented in ZooKeeper. ZooKeeper also acts as a configuration repository of sorts, maintaining cluster metadata, leader-follower states, quotas, user information, ACLs, and other housekeeping items. Due to the underlying gossiping and consensus protocol, the number of ZooKeeper nodes must be odd.
Producers: These are client applications responsible for appending records to Kafka topics. Because of the log-structured nature of Kafka and the ability to share topics across multiple consumer ecosystems, only producers are able to modify the data in the underlying log files. The actual I/O is performed by the broker nodes on behalf of the producer clients. Any number of producers may publish to the same topic, selecting the partitions used to persist the records.
Consumers: These are client applications that read from topics. Any number of consumers may read from the same topic; however, depending on the configuration and grouping of consumers, there are rules governing the distribution of records among the consumers.
Topics, Partitions, Records, and Offsets
A partition is a totally ordered sequence of records and is fundamental to Kafka. A record has an ID — a 64-bit integer offset and a millisecond-precise timestamp. Also, it may have a key and a value; both are byte arrays and both are optional. The term “totally ordered” simply means that for any given producer, records will be written in the order they were emitted by the application. If record P was published before Q, then P will precede Q in the partition. (Assuming P and Q share a partition.)
Furthermore, they will be read in the same order by all consumers; P will always be read before Q, for every possible consumer. This ordering guarantee is vital in a large majority of use cases. Published records will generally correspond to some real-life events, and preserving the timeline of these events is often essential.
Note: Kafka uses the term “record,” where others might use “message” or “event.” In this article, we will use the terms interchangeably, depending on the context. Similarly, you might see the term “stream” as a generic substitute for “topic.”
There is no recognized ordering across producers. If two (or more) producers emit records simultaneously, those records may materialize in arbitrary order. That said, this order will still be observed uniformly across all consumers.
A record’s offset uniquely identifies it in the partition. The offset is a strictly monotonically-increasing integer in a sparse address space, meaning that each successive offset is always higher than its predecessor and there may be varying gaps between neighboring offsets. Gaps might legitimately appear if compaction is enabled or as a result of transactions; we don’t need to delve into the details at this stage. Suffice it to say that offsets need not be contiguous.
Your application shouldn’t attempt to literally interpret an offset or guess what the next offset might be. It may, however, infer the relative order of any record pair based on their offsets, sort the records by their offset, and so forth.
The diagram below shows what a partition looks like on the inside.1
start of partition
2
+--------+-----------------+
3
|0..00000|First record |
4
+--------+-----------------+
5
|0..00001|Second record |
6
+--------+-----------------+
7
|0..00002|Third record |
8
+--------+-----------------+
9
|0..00003|Fourth record |
10
+--------+-----------------+
11
|0..00007|Fifth record |
12
+--------+-----------------+
13
|0..00008|Sixth record |
14
+--------+-----------------+
15
|0..00010|Seventh record |
16
+--------+-----------------+
17
...
18
+--------+-----------------+
19
|0..56789|Last record |
20
+--------+-----------------+
21
end of partition
The beginning offset, also called the low-water mark, is the first message that will be presented to a consumer. Due to Kafka’s bounded retention, this is not necessarily the first message that was published. Records may be pruned on the basis of time and/or partition size. When this occurs, the low-water mark will appear to advance, and records earlier than the low-water mark will be truncated.
Conversely, the high-water mark is the offset immediately following the last record in the partition, also known as the end offset. It is the offset that will be assigned to the next record that will be published. It is not the offset of the last record.
A topic is a logical composition of partitions. A topic may have one or more partitions, and a partition must be a part of exactly one topic. Topics are fundamental to Kafka, allowing for both parallelism and load balancing.
Earlier, we said that partitions exhibit total order. Because partitions within a topic are mutually independent, the topic is said to exhibit partial order. In simple terms, this means that certain records may be ordered in relation to one another while being unordered with respect to certain other records. The concepts of total and partial order, while sounding somewhat academic, are hugely important in the construction of performant event streaming pipelines. They enables us to process records in parallel where we can, while maintaining order where we must. We’ll explore the concepts of record order, consumer parallelism, and topic sizing in a short while.
Example: Publishing Messages
Let’s put some of this theory into practice. We are going to spin up a pair of Docker containers — one for Kafka and another for Kafdrop. Rather than launching them individually, we’ll use Docker Compose.
Create a docker-compose.yaml file in a directory of your choice, containing the following:1
Note: We’re using the obsidiandynamics/kafka image for convenience because it neatly bundles Kafka and ZooKeeper into a single image. If you wanted to, you could replace this with images from Confluent or Wurstmeister, but then you’d have to wire it all up properly. The obsidiandynamics/kafka image does all this for you, so it’s highly recommended for beginners (and lazy pros).
Then, start it with docker-compose up. Once it boots, navigate to localhost:9000 in your browser. You should see the Kafdrop landing screen.
You should see our single-broker cluster. It’s a promising start, but there are no topics. Not a problem; let’s create a topic and publish some messages using Kafka’s command-line tools. Conveniently, we already have a Kafka image running as part of our docker-compose stack, so we can shell into it to use the built-in CLI tools.1
docker exec -it kafka-kafdrop_kafka_1 bash
This gets you into a Bash shell. The tools are in the /opt/kafka/bin directory, so let’s cd into it:1
cd /opt/kafka/bin
Create a topic named streams-intro with three partitions:1
Note:kafka-topics uses the --bootstrap-server argument to configure the Kafka broker list, while kafka-console-producer uses the --broker-list argument for the same purpose. Also, --property arguments are largely undocumented; be prepared to Google your way around.
Records are separated by newlines. The key and the value parts are delimited by colons, as indicated by the key.separator property. For the sake of an example, type in the following (a copy-paste will do):1
foo:first message
2
foo:second message
3
bar:first message
4
foo:third message
5
bar:second message
Press CTRL+D when done. Then, switch back to Kafdrop and click on the streams-intro topic. You’ll see an overview of the topic, along with a detailed breakdown of the underlying partitions:
Let’s pause for a moment and dissect what’s been done. We created a topic with three partitions. We then published five records using two unique keys — foo and bar. Kafka uses keys to map records to partitions, such that all records with the same key will always appear on the same partition. Handy, but also important because it lets the publisher dictate the precise order of records. We’ll discuss key hashing and partition assignments in more detail later; in the meanwhile, sit back and enjoy the ride.
Looking at the partitions table, partition #0 has the first and last offsets at zero and two respectively. Partition #2 has them at zero and three, while partition #1 appears to blank. Clicking on #0 in the Kafdrop web UI sends us to a topic viewer:
We can see the two records published under the bar key. Note, they are completely unrelated to the foo records. Other than being collated within the same topic, there is nothing that binds records across partitions.
Note: In case you were wondering, the arrow to the left of the message lets you expand and pretty-print JSON-encoded messages. As our examples didn’t use JSON, there’s nothing to pretty-print.
It can be said without exaggeration that Kafka’s built-in tooling is an abomination. There is no consistency in the naming of command arguments and the simple act of publishing keyed messages requires you to jump through hoops — passing in obscure, undocumented properties. The usability of the built-in tools is a well-known heartache within the Kafka community. This is a real shame. It’s like buying a Ferrari, only to have it delivered with plastic hub caps. Fortunately, there are alternatives — both commercial and open source — that can fill the glaring gaps in tooling and observability.
Consumers and Consumer Groups
So far we have learned that producers emit records into the stream; these records are organized into nicely ordered partitions. Kafka’s pub-sub topology adheres to a flexible multipoint-to-multipoint model, meaning that there may be any number of producers and consumers simultaneously interacting with a stream. Depending on the actual solution context, stream topologies may also be point-to-multipoint, multipoint-to-point, and point-to-point. It’s about time we looked at how records are consumed.
A consumer is a process or a thread that attaches to a Kafka cluster via a client library. (One is available for most languages.) A consumer generally, but not necessarily, operates as part of an encompassing consumer group. The group is specified by the group.id property. Consumer groups are effectively a load-balancing mechanism within Kafka — distributing partition assignments approximately evenly among the individual consumer instances within the group.
When the first consumer in a group joins the topic, it will receive all partitions in that topic. When a second consumer subsequently joins, it will get approximately half of the partitions, relieving the first consumer of half of its prior load. The process runs in reverse when consumers leave (by disconnecting or timing out) — the remaining consumers will absorb a greater number of partitions.
So, a consumer siphons records from a topic, pulling from the share of partitions that have been assigned to it by Kafka, alongside the other consumers in its group. As far as load-balancing goes, this should be fairly straightforward. But here’s the kicker — the act of consuming a record does not remove it. This might seem contradictory at first, especially if you associate the act of consuming with depletion. (If anything, a consumer should have been called a ‘reader’, but let’s not dwell on the choice of terminology.)
The simple fact is, consumers have absolutely no impact on topics and their partitions. A topic is an append-only ledger that may only be mutated by the producer, or by Kafka itself (as part of compaction or cleanup). Consumers are “cheap,” so you can have quite a number of them tail the logs without stressing the cluster. This is yet another point of distinction between an event stream and a traditional message queue, and it’s a crucial one.
A consumer internally maintains an offset that points to the next record in a partition, advancing the offset for every successive read. When a consumer first subscribes to a topic, it may elect to start at either the head-end or the tail-end of the topic. This behavior is controlled by setting the auto.offset.reset property to one of latest, earliest or none. In the latter case, an exception will be thrown if no previous offset exists for the consumer group.
Consumers retain their offset state vector locally. Since consumers across different consumer groups do not interfere, there may be any number of them reading concurrently from the same topic. Consumers run at their own pace — a slow or backlogged consumer has no impact on its peers.
To illustrate this concept, consider a scenario involving a topic with two partitions. Two consumer groups, A and B, are subscribed to the topic. Each group has three instances, the consumers being named A1, A2, A3, B1, B2, and B3. The diagram below illustrates how the two groups might share the topic and how the consumers advance through the records independently of one another.1
Look carefully and you’ll notice something is missing. Consumers A3 and B1 aren’t there. That’s because Kafka guarantees that a partition may only be assigned to at most one consumer within its consumer group. (We say ‘at most’ to cover the case when all consumers are offline.) Because there are three consumers in each group, but only two partitions, one consumer will remain idle — waiting for another consumer in its respective group to depart before being assigned a partition.
In this manner, consumer groups are not only a load-balancing mechanism, but also a fence-like exclusivity control, used to build highly performant pipelines without sacrificing safety, particularly when there is a requirement that a record may only be handled by one thread or process at any given time.
Consumer groups are also used to ensure availability. By periodically pulling records from a topic, the consumer implicitly signals to the cluster that it’s in a ‘healthy’ state, thereby extending the lease over its partition assignment. However, should the consumer fail to read again within the allowable deadline, it will be deemed faulty and its partitions will be reassigned — apportioned among the remaining ‘healthy’ consumers within its group. This deadline is controlled by the max.poll.interval.ms consumer client property, set to five minutes by default.
To use a transportation analogy, a topic is like a highway, while a partition is a lane. A record is the equivalent of a car, and its occupants correspond to the record’s value. Several cars can safely travel on the same highway, providing they keep to their lane. Cars sharing the same line ride in a sequence, forming a queue. Now, suppose each lane leads to an off-ramp, diverting its traffic to some location. If one off-ramp gets banked up, other off-ramps may still flow smoothly.
It’s precisely this highway-lane metaphor that Kafka exploits to achieve its end-to-end throughput, easily reaching millions of records per second on commodity hardware. When creating a topic, one can choose the partition count — the number of lanes, if you will.
The partitions are divided approximately evenly among the individual consumers in a consumer group, with a guarantee that no partition will be assigned to two (or more) consumers at the same time, providing that these consumers are part of the same consumer group. Referring to our analogy, a car will never end up in two off-ramps simultaneously; however, two lanes might conceivably lead to the same off-ramp.
Note: A topic may be resized after creation by increasing the number of partitions. It is not possible, however, to decrease the partition count without recreating the topic.
Records correspond to events, messages, commands, or any other streamable content. Precisely how records are partitioned is left to the discretion of the producer(s). A producer may explicitly assign a partition index when publishing a record, although this approach is rarely used. A much more common approach is to assign a key to a record, as we have done in our earlier example. The key is completely opaque to Kafka. In other words, Kafka doesn’t attempt to interpret the contents of the key, treating it as an array of bytes. These bytes are hashed to derive a partition index, using a consistent hashing technique.
Records sharing the same hash are guaranteed to occupy the same partition. Assuming a topic with multiple partitions, records with a different key will likely end up in different partitions. However, due to hash collisions, records with different hashes may also end up in the same partition. Such is the nature of hashing. If you understand how a hash table works, this is no different.
Producers rarely care which specific partition the records will map to, only that related records end up in the same partition, and that their order is preserved. Similarly, consumers are largely indifferent to their assigned partitions, so long that they receive the records in the same order as they were published, and their partition assignment does not overlap with any other consumer in their group.
Committing Offsets
We already said that consumers maintain an internal state with respect to their partition offsets. At some point, that state must be shared with Kafka, so that when a partition is reassigned, the new consumer can resume processing from where the outgoing consumer left off. Similarly, if the consumers were to disconnect, upon reconnection they would ideally skip over those records that have already been processed.
Persisting the consumer state back to the Kafka cluster is called committing an offset. Typically, a consumer will read a record (or a batch of records) and commit the offset of the last record, plus one. If a new consumer takes over the topic, it will commence processing from the last committed offset, hence the plus-one step is essential. (Otherwise, the last processed record would be handled a second time.)
Fun fact: Kafka employs a recursive approach to managing committed offsets, elegantly utilising itself to persist and track offsets. When an offset is committed, Kafka will publish a binary record on the internal __consumer_offsets topic. The contents of this topic are compacted in the background, creating an efficient event store that progressively reduces to only the last known commit points for any given consumer group.
Controlling the point when an offset is committed provides a great deal of flexibility around delivery guarantees, handing Kafka a yet another trump card. The term ‘delivery’ assumes not just reading a record, but the full processing cycle, complete with any side-effects. One can shift from an at-most-once to an at-least-once delivery model by simply moving the commit operation from a point before the processing of a record is commenced, to a point sometime after the processing is complete. With this model, should the consumer fail midway through processing a record, the record will be re-read the following partition reassignment.
By default, a Kafka consumer will automatically commit offsets every five seconds, regardless of whether the consumer has finished processing the record. Often, this is not what you want, as it may lead to mixed delivery semantics. For example, in the event of consumer failure, some records might be delivered twice, while others might not be delivered at all. To enable manual offset committing, set the enable.auto.commit property to false.
Note: There are a few gotchas like this in Kafka. Pay close attention to the (producer and consumer) client properties in the official Kafka documentation, particularly to the stated defaults. Don’t assume for a moment that the defaults are sensible, insofar as they ought to favour safety over other competing qualities. Kafka defaults tend to be optimised for performance, and will need to be explicitly overridden on the client when safety is a critical objective. Fortunately, setting the properties to insure safety has only a minor impact on performance — Kafka is still a beast. Remember the first rule of optimisation: Don’t do it. Kafka would have been even better, had their creators given this more thought.
Getting offset committing right can be tricky, and routinely catches out beginners. A committed offset implies that the record one below that offset and all prior records have been dealt with by the consumer. When designing at-least-once or exactly-once applications, an offset should only be committed when the application is dealt with with the record in question and all records before it.
In other words, the record has been processed to the point that any actions that would have resulted from the record have been carried out and finalized. This may include calling other APIs, updating a database, committing transactions, persisting the record’s payload, or publishing more records. Stated otherwise, if the consumer were to fail after committing the record, then not ever seeing this record again must not be detrimental to its correctness.
In the at-least-once (and by extension, the exactly-once) scenario, a typical consumer implementation will commit its offset linearly, in tandem with the processing of the records. That is, read a record, commit it (plus-one), read the next, commit it (plus one), and so on. A common tactic is to process a batch of records concurrently (where this makes sense), using a thread pool, and only confirm the last record when the entire batch is done. The commit process in Kafka is very efficient, the client library will send commit requests asynchronously to the cluster using an in-memory queue, without blocking the consumer. The client application can register an optional callback, notifying it when the commit has been acknowledged by the cluster.
The consumer group is a somewhat understated concept that is pivotal to the versatility of an event streaming platform. By simply varying the affinity of consumers with their groups, one can arrive at vastly different distribution topologies — from a topic-like, pub-sub behavior to an MQ-style, point-to-point model. Because records are never truly consumed (the advancing offset only creates the illusion of consumption), one can concurrently superimpose disparate distribution topologies over a single event stream.
Free Consumers
Consumer groups are completely optional; a consumer does not need to be encompassed in a consumer group to pull messages from a topic. A free consumer omits the group.id property. Doing so allows it to operate under relaxed rules, entirely transferring the responsibility for consumer management to the application.
Note: The use of the term ‘free’ to denote a consumer without an encompassing group is not part of the standard Kafka nomenclature. As Kafka lacks a canonical term to describe this, the term ‘free’ was adopted here.
Free consumers do not subscribe to a topic. Instead, the consuming application is responsible for manually assigning a set of topic-partitions to the consumer, individually specifying the starting offset for each topic-partition pair. Free consumers do not commit their offsets to Kafka; it is up to the application to track the progress of such consumers and persist their state as appropriate, using a datastore of their choosing. The concepts of automatic partition assignment, rebalancing, offset persistence, partition exclusivity, consumer heart-beating and failure detection, and other so-called niceties accorded to consumer groups cease to exist in this mode.
Free consumers are not observed in the wild as often as their grouped counterparts. There are predominantly two use cases where a free consumer is an appropriate choice. The first, is when you genuinely need full control of the partition assignment scheme and/or you require an alternative place to store consumer offsets. This is very rare.
Needless to say, it’s also very difficult to implement correctly, given the multitude of scenarios one must account for. The second, more commonly seen use case, is when you have a stateless or ephemeral consumer that needs to monitor a topic. For example, you might be interested in tailing a topic to identify specific records, or just as a debugging tool. You might only care about records that were published when your stateless consumer was online, so concerns such as persisting offsets and resuming from the last processed record are completely irrelevant.
A good example of where this is used routinely is the Kafdrop web UI, which we’ve already seen. When you click on a topic to view the messages, Kafdrop creates a free consumer and assigns the requested partition to it, reading the records from the supplied offsets. Navigating to a different topic or partition will reset the consumer, discarding any prior state.
The illustration below outlines the relationship between producers, topics, partitions, consumers, and consumer groups.1
Topics are subdivided into partitions, each forming an independent, totally-ordered sequence within a wider, partially-ordered stream.
Multiple producers are able to publish to a topic, picking a partition at will. This may be accomplished either directly, by specifying a partition index, or indirectly, by way of a record key, which deterministically hashes to a consistent partition index. (In the diagram above, both Producer 1 and Producer 2 publish to the same topic.)
Partitions in a topic can be load-balanced across a population of consumers in a consumer group, allocating partitions approximately evenly among the members of that group. (Consumer 2 and Consumer 3 each get one partition.)
A consumer in a group is not guaranteed a partition assignment. Where the group’s population outnumbers the partitions, some consumers will remain idle until this balance equalizes or tips in favor of the other side. (Consumer 4 remains partition-less.)
Partitions may be manually assigned to free consumers. If necessary, an entire topic may be assigned to a single free consumer — this is done by individually assigning all partitions. (Consumer 1 can be freely assigned any partition.)
Exactly-Once Delivery
When contrasting at-least-once with at-most-once delivery semantics, an often-asked question is: Why can’t we have it exactly once?
Without delving into the academic details, which involve conjectures and impossibility proofs, it is sufficient to say that exactly-once semantics are not possible without collaboration with the consumer application. What does this mean in practice?
Consumers in event streaming applications must be idempotent. In other words, processing the same record repeatedly should have no net effect on the consumer ecosystem. If a record has no additive effects, the consumer is inherently idempotent. (For example, if the consumer simply overwrites an existing database entry with a new one, then the update is naturally idempotent.) Otherwise, the consumer must check whether a record has already been processed, and to what extent, prior to processing a record. The combination of at-least-once delivery and consumer idempotence collectively leads to exactly-once semantics.
Example: A Trading Platform
With all this theory looming over us like Kubrick’s Monolith, it would be inappropriate to conclude without offering the reader a practical scenario.
Let’s say you were looking for specific price patterns in listed stocks, emitting trading signals when a particular pattern is identified. There are a large number of stocks, and understandably you’d like them processed in parallel. However, the time series for any given ticker code must be processed sequentially on a single consumer.
Kafka makes this use case, and others like it, almost trivial to implement. We would create a pair of topics: prices for the raw price data, and orders for any resulting orders. We can be fairly generous with our partition counts, as the nature of the data gives us ample opportunities for parallelism.
At the feed source, we could publish a record for each price on the prices topic, keyed by the ticker code. Kafka’s automatic partition assignment will ensure that every ticker code is handled by (at most) one consumer in its group. The consumer instances are free to scale in and out to match the processing load. Consumer groups should be meaningfully named, ideally reflecting the purpose of the consuming application. A good example might be trading-strategy.abc, for a fictitious trading strategy named ‘ABC’.
Once a price pattern is identified by the consumer, it can publish another message — the order request — on the orders topic. We’ll muster up another consumer group — order-execution — responsible for reading the orders and forwarding them to the broker.
In this simple example, we have created an end-to-end trading pipeline that is entirely event-driven and highly scalable — at least theoretically, assuming there are no other bottlenecks. We can dynamically add more processing nodes to the individual stages to cope with the increased load where it’s called for.
Now, let’s spice things up a bit. Suppose you need several trading strategies operating concurrently, driven by a common data feed. Furthermore, the trading strategies will be developed by different teams; the objective being to decouple these implementations as much as possible, allowing the teams to operate autonomously — develop and deploy at their individual cadence, perhaps even using different programming languages and tool-chains. That said, you’d ideally want to reuse as much of what’s already been written. So, how would we pull this off?
Kafka’s flexible multipoint-to-multipoint pub-sub architecture combines stateful consumption with broadcast semantics. Using distinct consumer groups, Kafka allows disparate applications to share input topics, processing events at their own pace. The second trading strategy would need a dedicated consumer group — trading-strategy.xyz — applying its specific business logic to the common pricing stream, publishing the resulting orders to the same orders topic. In this fashion, Kafka enables you to construct modular event processing pipelines from discrete elements that are readily reusable and composable.
Note: In the days of service buses and traditional ‘enterprisey’ message brokers, before event sourcing entered the mainstream, you would have had to choose between persistent message queues or transient broadcast topics. In our example, you would likely have created multiple FIFO queues, using the fan-out pattern. Because Kafka generalises pub-sub topics and persistent message queues into a unified model, a single source topic can power a diverse range of consumers without incurring duplication.
In Conclusion
Event streaming platforms are a highly effective building block in the construction of modular, loosely-coupled, event-driven applications. Within the world of event streaming, Kafka has solidified its position as the go-to open-source solution that is both amazingly flexible and highly performant. Concurrency and parallelism are at the heart of Kafka’s architecture, forming partially-ordered event streams that can be load-balanced across a scalable consumer ecosystem. A simple reconfiguration of consumers and their encompassing groups can bring about vastly different event distribution and processing semantics; shifting the offset commit point can invert the delivery guarantee from an at-most-once to an at-least-once model.
Of course, Kafka isn’t without its flaws. The tooling is sub-par, to put it mildly; most Kafka practitioners have long abandoned the out-of-the-box CLI utilities in favour of other open-source tools such as Kafdrop, Kafkacat and third-party commercial offerings like Kafka Tool. The breadth of Kafka’s configuration options is overwhelming, with defaults that are riddled with gotchas, ready to shock the unsuspecting first-time user.
All in all, Kafka represents a paradigm shift in how we architect and build complex systems. Its benefits go beyond the superfluous, and they dwarf any of the niggles that are bound to exist in a technology that has undergone such aggressive adoption. Crucially, it paves the way for further progress in its space; Apache Pulsar is a prime example of an alternative platform that has improved on much of Kafka’s shortcomings, yet owes a great deal to its predecessor for laying the cornerstone and bringing the genre to the mainstream.
In this spring boot tutorial, we will learn about spring-boot-starter-parent dependency which is used internally by all spring boot dependencies. We will also learn what all configurations this dependency provides, and how to override them.
What is spring-boot-starter-parent dependency?
The spring-boot-starter-parent dependency is the parent POM providing dependency and plugin management for Spring Boot-based applications. It contains the default versions of Java to use, the default versions of dependencies that Spring Boot uses, and the default configuration of the Maven plugins.
Few important configurations provided by this file are as below. Please refer to this link to read the complete configuration.
<?xmlversion="1.0"encoding="UTF-8"?><projectxmlns="http://maven.apache.org/POM/4.0.0"xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"xsi:schemaLocation="http://maven.apache.org/POM/4.0.0http://maven.apache.org/xsd/maven-4.0.0.xsd;<modelVersion>4.0.0</modelVersion><parent><groupId>org.springframework.boot</groupId><artifactId>spring-boot-dependencies</artifactId><version>${revision}</version><relativePath>../../spring-boot-dependencies</relativePath></parent><artifactId>spring-boot-starter-parent</artifactId><packaging>pom</packaging><name>Spring Boot Starter Parent</name><description>Parent pom providing dependency and plugin management for applicationsbuilt with Maven</description><properties><java.version>1.8</java.version><resource.delimiter>@</resource.delimiter> <!-- delimiter that doesn't clash with Spring ${} placeholders --><project.build.sourceEncoding>UTF-8</project.build.sourceEncoding><project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding><maven.compiler.source>${java.version}</maven.compiler.source><maven.compiler.target>${java.version}</maven.compiler.target></properties>...<resource><directory>${basedir}/src/main/resources</directory><filtering>true</filtering><includes><include>**/application*.yml</include><include>**/application*.yaml</include><include>**/application*.properties</include></includes></resource></project>
The spring-boot-starter-parent dependency further inherits from spring-boot-dependencies, which is defined at the top of above POM file at line number : 9.
This file is the actual file which contains the information of default version to use for all libraries. The following code shows the different versions of various dependencies that are configured in spring-boot-dependencies:
Above list is very long and you can read complete list in this link.
How to override default dependency version?
As you see, spring boot has default version to use for most of dependencies. You can override the version of your choice or project need, in properties tag in your project’s pom.xml file.
e.g. Spring boot used default version of google GSON library as 2.8.2.
Now in your eclipse editor, you can see the message as : The managed version is 2.7 The artifact is managed in org.springframework.boot:spring-boot-dependencies:2.0.0.RELEASE.
Almost every website or web-application uses routing. Discovering a website by changing its URL is a very powerful feature that comes standard with the web. How all of this is handled can vary a lot between different websites and web-applications.
All websites and web-applications, whether they use server-side or client-side routing, are accessed from a server. How a website or web-application responds to different URLs is commonly handled server-side, although with the rising popularity of JavaScript frameworks, other ways have been found to manage routing.
Routing
Routing is the mechanism by which requests are connected to some code. It is essentially the way you navigate through a website or web-application. By clicking on a link, the URL changes which provides the user with some new data or a new webpage.
Server-side
When browsing, the adjustment of a URL can make a lot of things happen. This will happen regularly by clicking on a link, which in turn will request a new page from the server. This is what we call a server-side route. A whole new document is served to the user.
A server-side request causes the whole page to refresh. This is because a new GET request is sent to the server which responds with a new document, completely discarding the old page altogether.
Pros
A server-side route will only request the data that’s needed. No more, no less.
Because server-side routing has been the standard for a long time, search engines are optimised for webpages that come from the server.
Cons
Every request results in a full-page refresh. That means that unnecessary data is being requested. A header and a footer of a webpage often stays the same. This isn’t something you would want to request from the server again.
It can take a while for the page to be rendered. However, this is only the case when the document to be rendered is very large or when you have slow internet speed.
Client-side
A client-side route happens when the route is handled internally by the JavaScript that is loaded on the page. When a user clicks on a link, the URL changes but the request to the server is prevented. The adjustment to the URL will result in a changed state of the application. The changed state will ultimately result in a different view of the webpage. This could be the rendering of a new component, or even a request to a server for some data that the application will turn into some HTML elements.
It is important to note that the whole page won’t refresh when using client-side routing. There are just some elements inside the application that will change.
Pros
Because less data is processed, routing between views is generally faster.
Smooth transitions and animations between views are easier to implement.
Cons
The whole website or web-application needs to be loaded on the first request. That’s why the initial loading time usually takes longer.
Because the whole website or web-application is loaded initially, there is a possibility that there is data downloaded for views you won’t even come across.
It requires more setup work or even a library. Because server-side is the standard, extra code must be written to make client-side routing possible.
Search engine crawling is less optimised. Google is making good progress on crawling single-paged-apps, but it isn’t nearly as efficient as server-side routed websites.
Summary
There is no best method to manage your routing. Server-side and client-side routing both have their advantages and weaknesses. It is important to make your decision based on the needs of your website or web-application, or heck, even combine the two.
Learn to create maven web project in eclipse which we should be able to import on eclipse IDE for further development.
To create eclipse supported web project, we will need to create first a normal maven we application and then we will make it compatible to eclipse IDE.
1. Create maven web project in eclipse
Run this maven command to create a maven web project named ‘demoWebApplication‘. Maven archetype used is ‘maven-archetype-webapp‘.
This will create maven web project structure and web application specific files like web.xml.
2. Convert to eclipse dynamic web project
To Convert created maven web project to eclipse dynamic web project, following maven command needs to be run.
$ mvn eclipse:eclipse -Dwtpversion=2.0
Please remember that adding “-Dwtpversion=2.0” is necessary, otherwise using only “mvn eclipse:eclipse” will convert it to only normal Java project (without web support), and you will not be able to run it as web application.
3. Import web project in Eclipse
Click on File menu and click on Import option.
Now, click on “Existing project..” in general section.
Now, browse the project root folder and click OK. Finish.
Above steps will import the project into eclipse work space. You can verify the project structure like this.
In this maven tutorial, we learned how to create maven dynamic web project in eclipse. In this example, I used eclipse oxygen. You may have different eclipse version but the steps to follow will be same.
You can declare properties inside Kotlin classes in the same way you declare any other variable. These properties can be mutable (declared using var) or immutable (declared using val).
Here is an example –
// User class with a Primary constructor that accepts three parameters
class User(_id: Int, _name: String, _age: Int) {
// Properties of User class
val id: Int = _id // Immutable (Read only)
var name: String = _name // Mutable
var age: Int = _age // Mutable
}
You can get or set the properties of an object using the dot(.) notation like so –
val user = User(1, "Jack Sparrow", 44)
// Getting a Property
val name = user.name
// Setting a Property
user.age = 46
// You cannot set read-only properties
user.id = 2 // Error: Val cannot be assigned
Getters and Setters
Kotlin internally generates a default getter and setter for mutable properties, and a getter (only) for read-only properties.
It calls these getters and setters internally whenever you access or modify a property using the dot(.) notation.
Here is how the User class that we saw in the previous section looks like with the default getters and setters –
class User(_id: Int, _name: String, _age: Int) {
val id: Int = _id
get() = field
var name: String = _name
get() = field
set(value) {
field = value
}
var age: Int = _age
get() = field
set(value) {
field = value
}
}
You might have noticed two strange identifiers in all the getter and setter methods – field and value.
Let’s understand what these identifiers are for –
1. value
We use value as the name of the setter parameter. This is the default convention in Kotlin but you’re free to use any other name if you want.
The value parameter contains the value that a property is assigned to. For example, when you write user.name = "Bill Gates", the value parameter contains the assigned value “Bill Gates”.
2. Backing Field (field)
Backing field helps you refer to the property inside the getter and setter methods. This is required because if you use the property directly inside the getter or setter then you’ll run into a recursive call which will generate a StackOverflowError.
Let’s see that in action. Let’s modify the User class and refer to the properties directly inside the getters and setters instead of using the field identifier –
class User(_name: String, _age: Int) {
var name: String = _name
get() = name // Calls the getter recursively
var age: Int = _age
set(value) {
age = value // Calls the setter recursively
}
}
fun main(args: Array<String>) {
val user = User("Jack Sparrow", 44)
// Getting a Property
println("${user.name}") // StackOverflowError
// Setting a Property
user.age = 45 // StackOverflowError
}
The above program will generate a StackOverflowError due to the recursive calls in the getter and setter methods. This is why Kotlin has the concept of backing fields. It makes storing the property value in memory possible. When you initialize a property with a value in the class, the initializer value is written to the backing field of that property –
class User(_id: Int) {
val id: Int = _id // The initializer value is written to the backing field of the property
}
A backing field is generated for a property if
A custom getter/setter references it through the field identifier or,
It uses the default implementation of at least one of the accessors (getter or setter). (Remember, the default getter and setter references the field identifier themselves)
For example, a backing field is not generated for the id property in the following class because it neither uses the default implementation of the getter nor refers to the field identifier in the custom getter –
class User() {
val id // No backing field is generated
get() = 0
}
Since a backing field is not generated, you won’t be able to initialize the above property like so –
class User(_id: Int) {
val id: Int = _id //Error: Initialization not possible because no backing field is generated which could store the initialized value in memory
get() = 0
}
Creating Custom Getters and Setters
You can ditch Kotlin’s default getter/setter and define a custom getter and setter for the properties of your class.
Here is an example –
class User(_id: Int, _name: String, _age: Int) {
val id: Int = _id
var name: String = _name
// Custom Getter
get() {
return field.toUpperCase()
}
var age: Int = _age
// Custom Setter
set(value) {
field = if(value > 0) value else throw IllegalArgumentException("Age must be greater than zero")
}
}
fun main(args: Array<String>) {
val user = User(1, "Jack Sparrow", 44)
println("${user.name}") // JACK SPARROW
user.age = -1 // Throws IllegalArgumentException: Age must be greater that zero
}
Conclusion
Thanks for reading folks. In this article, You learned how Kotlin’s getters and setters work and how to define a custom getter and setter for the properties of a class.
In this article, you’ll learn how to read a text file or binary (image) file in Java using various classes and utility methods provided by Java like:
BufferedReader, LineNumberReader, Files.readAllLines, Files.lines, BufferedInputStream, Files.readAllBytes, etc.
Let’s look at each of the different ways of reading a file in Java with the help of examples.
Java read file using BufferedReader
BufferedReader is a simple and performant way of reading text files in Java. It reads text from a character-input stream. It buffers characters to provide efficient reading.
Java read file line by line using Files.readAllLines()
Files.readAllLines() is a utility method of the Java NIO’s Files class that reads all the lines of a file and returns a List<String> containing each line. It internally uses BufferedReader to read the file.
Java read binary file (image file) using BufferedInputStream
All the examples presented in this article so far read textual data from a character-input stream. If you’re reading a binary data such as an image file then you need to use a byte-input stream.
BufferedInputStream lets you read raw stream of bytes. It also buffers the input for improving performance
import java.io.*;
import java.nio.file.Files;
import java.nio.file.Paths;
public class BufferedInputStreamImageCopyExample {
public static void main(String[] args) {
try(InputStream inputStream = Files.newInputStream(Paths.get("sample.jpg"));
BufferedInputStream bufferedInputStream = new BufferedInputStream(inputStream);
OutputStream outputStream = Files.newOutputStream(Paths.get("sample-copy.jpg"));
BufferedOutputStream bufferedOutputStream = new BufferedOutputStream(outputStream)) {
byte[] buffer = new byte[4096];
int numBytes;
while ((numBytes = bufferedInputStream.read(buffer)) != -1) {
bufferedOutputStream.write(buffer, 0, numBytes);
}
} catch (IOException ex) {
System.out.format("I/O error: %s%n", ex);
}
}
}
Java read file into []byte using Files.readAllBytes()
If you want to read the entire contents of a file in a byte array then you can use the Files.readAllBytes() method.
import com.sun.org.apache.xpath.internal.operations.String;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
public class FilesReadAllBytesExample {
public static void main(String[] args) {
try {
byte[] data = Files.readAllBytes(Paths.get("demo.txt"));
// Use byte data
} catch (IOException ex) {
System.out.format("I/O error: %s%n", ex);
}
}
}
I’m a big fan of Microservices. Does it sound weird, given the title? Well, it should not.
Every now and then, a new paradigm or technology arises. It brings hope and hype. This will save us all! Yeah, it’s true. Maybe it is ground-breaking. Extremely beneficial. However, like everything in life, every solution has its scope.
Have you ever tried to apply Newtonian laws to subatomic particles? Or, have you ever tried to squeeze a television into a sock? Pretty much the same here. Hype makes you forget this little clause at the end of the contract. And results are catastrophic.
In a normal web application, you have a single process serving all requests. Well, you can replicate it for balancing reasons, but in any case there’s a single package/software you have to run. And it bundles all the logic together.
When you use Microservices, instead, you have several, distinct processes, each one serving just a subset of requests. Each process runs a distinct package, with just the code it needs.
Is it really new?
No, it isn’t.
The Microservice pattern is nothing new, actually, like almost every emergent design (besides it has been around for some years). It is Object-Oriented Programming applied on a higher level: application architecture. No coincidence that it often makes use of HTTP APIs, particularly RESTful, which are in turn OOP principles applied to API design. P.S. I hate acronyms.
Let me explain.
When you create a program using Object-Oriented Programming, you split it into several pieces, Classes. Each Class represents a component of the solution and has a specific role. It also works just on its own subset of data. Does it sound familiar?
This is not the right place to discuss about Object-Oriented Programming in general, but I have to point out some aspects. If you are not familiar with it, you can reference online documentation and, soon, my other article about it!
The Two Principles — The Good Tailor
When you design Classes, you must follow two main principles: loose coupling and high cohesion.
To be honest, these should be the guiding stars of each software project, not just OOP ones. But that’s another story. Or is it?
High cohesionis the situation where each component is strongly consistent. All its data and methods refer to a very specific area, or meaning, or role.
Loose couplingis the situation where each component has the least possible amount of interactions with other components. It can work on its own, hiding low-level details, and exposes the essential interfaces.
For Microservices it is the same. I can’t stress this enough. A good OOP design leads to an optimal Microservice design. There is no middle ground.
This should be almost enough to answer why and when to use Microservices. But since there is more, let’s break it down.
Be a Good Tailor. Loosely sew highly-entangled (coherent) fabrics. Photo by Jeff Wade on Unsplash
When to split?
This is exactly as it is for Object-Oriented Programming.
When you can respect high cohesion and loose coupling.
No differently.
You know how to split code into Classes in the right way. Why would you split your Microservices randomly? If a bad Class diagram is painful to maintain and manage, because of the sparsity of code in different files, think of randomly split Microservices application, where you have different processes altogether.
But why on Earth would I split my application into sub-processes?
Imagine how much it’s easy to make two Objects interact. You just have to call an Object’s method and you are good to go. All within a single program.
Microservices, instead, live in different processes. Possibly, on different machines. You have to communicate over a network, using APIs.
It adds complexity! You must have excellent reasons to do so. Not for trend, not for fun. Because, I assure you, if you use Microservices randomly, their management is not fun at all.
Actually, you must have the same reasons you’d have when you decide to use APIs. It’s the same.
APIs let you hide all the implementation details behind them. That’s essential in some cases, and it offers incredible benefits.
1. Scalability for non-uniform traffic
Scenario: you have a Supermarket application. You have a Stock Microservice, that just shows you the amounts of articles in stock, and a Vision Microservice, which uses GPUs to detect your articles given a picture of them.
If you receive thousands of calls for your Stock APIs and just a bunch of requests for your Vision APIs, you could just replicate your Stock Microservice.
No need to double your GPU resources!
Think of it. If you kept everything in a single machine or application, and you still wanted to scale to match all the incoming requests, you would have to replicate it entirely. A waste.
2. Error resilience
Suppose you have a Bank application. Your Transfer Microservice keeps crashing, maybe for a temporary error or, worse, for a newly introduced bug (damn untested releases…).
Why should the entire Bank application be down? Maybe some customers wouldn’t even notice, because they just want to see their movements or they want to use their cards.
3. Separate deployments
Similarly to the reason above, if you have a new feature to add or a bug to fix, you can just deploy the right Microservice. Your build and deployment times should be smaller.
Also, you could put your Microservices on different machines, in different locations, maybe.
4. Complete isolation
Maybe you need complete isolation between Microservices. In this way, you can use different DBs (SQL and noSQL) or different technologies altogether. This will give you complete freedom! And it is of course strictly related to the other points as well, about error resilience or requirements.
5. Different requirements
Do you have a heavy computation to perform? Maybe a Machine Learning one, where you need GPUs, their libraries… Maybe that part would be much better in Python, while the other one could benefit from Java. Or maybe libraries for the same language would create some conflicts, but you desperately need two specific versions for two distinct tasks. And finally, perhaps you want to use two different Cloud products to host your two distinct pieces.
There are plenty of reasons, some more common than others.
In those cases, split the application! Probably it’s easier. But remember to respect the Two Principles! Be a Good Tailor, or you would have hard times sharing data between your Microservices…
Even the highest of Monoliths is made by blocks. It might be atoms, but it is. Photo by Zoltan Tasi on Unsplash
Is it not your case? Go for a Monolith.
If your system is tightly coupled, use a single application. A Monolith.
You’ll have cleaner dependency management, and you won’t need complex orchestrations or distributed systems to trace errors, share data, collect logs, sync calls, secure network interactions… and I can go on.
Is Object-Oriented Programming not suitable for your application architecture? Or, better, is RESTful design not suitable? Microservices won’t be either!
If your application serves a long sequence of API calls, maybe because an API requires the result of the previous ones, this is not ideal. It would introduce also network delays. If your application is a single procedure that requires user interaction, well, the sentence is self-explaining. It is a Procedure! You have to share a state between calls, each component is not independent and a single error in the sequence will block it entirely. There is hardly a benefit in splitting the application.
And remember, Monolith doesn’t mean Mess
Monolith.
Mess.
They are also spelled differently.
Someone sees Monolith as a bunch of unordered code. They claim that Microservice pattern is the only way to have splits, order and all.
Well, I’ll reveal something.
It’s — not — true.
Your code design does not depend on process divisions. Object-Oriented Programming doesn’t either. If you split your code right, if you manage your dependencies in a clear hierarchy, as well as your files, you can achieve the same level of cohesion and decoupling.
And another secret: if you divide your APIs into separate packages/modules/controllers/blueprints/whatever in your code, then you can decide to serve them in a single process or split them in Microservices at build-time, or at run-time too, for free.
Wow.
Conclusions
I’m a big fan of Microservices, as I said.
If used in the right context, for the right reasons, they solve issues and improve performances or costs.
But please, stop making them the one and only solution for all problems. Because they can also be the root of all evil.