Welcome To Fusebes - Dev & Programming Blog

Kotlin Variables and Data Types
07 Mar 2021

In this article, you’ll learn how to declare variables in Kotlin, how Kotlin infers the type of variables, and which basic data types Kotlin supports for creating variables.

You’ll also learn how to work with various data types and how to convert one type to another.

Variables

A variable refers to a memory location that stores some data. It has a name and an associated type. The type of a variable defines the range of values that the variable can hold, and the operations that can be done on those values.

You can declare a variable in Kotlin using var and val keywords.

A variable declared using val keyword is immutable (read-only). It cannot be reassigned after it is initialized –

val name = "Bill Gates"
name = "Satoshi Nakamoto"	// Error: Val cannot be reassigned 

For defining a mutable variable, i.e. a variable whose value can be changed, use the var keyword –

var country = "USA"
country = "India"    // Works

Type inference

Did you notice one thing about the variable declarations in the previous section? We didn’t specify the type of variables.

Although Kotlin is a statically typed language, it doesn’t require you to explicitly specify the type of every variable you declare. It can infer the type of a variable from the initializer expression –

val greeting = "Hello, World"  // type inferred as `String`
val year = 2018                // type inferred as `Int`

If you want to explicitly specify the type of a variable, you can do that like this –

// Explicitly defining the type of variables
val greeting: String = "Hello, World" 
val year: Int = 2018

Note that the type declaration becomes mandatory if you’re not initializing the variable at the time of declaration –

var language   // Error: The variable must either have a Type annotation or be initialized
language =  "French"

The above variable declaration fails because Kotlin has no way to infer the type of the variable without an initializer expression. In this case, you must explicitly specify the type of the variable –

var language: String   // Works
language = "French"

Data Types

Data Types are used to categorize a set of related values and define the operations that can be done on them.

Just like other languages, Kotlin has predefined types like Int, Double, Boolean, Char, etc.

In Kotlin, everything (even the basic types like Int and Boolean) is an object. More specifically, everything behaves like an Object.

Kotlin may represent some of the basic types like numbers, characters and booleans as primitive values at runtime to improve performance, but for the end users, all of them are objects.

This is contrary to languages like Java, which has separate primitive types like int, double, etc., and their corresponding wrapper types like Integer, Double, etc.

Let’s now look at all the basic data types used in Kotlin one by one –

Numbers

Numeric types in Kotlin are similar to Java. They can be categorized into integer and floating point types.

Integers

  • Byte – 8 bit
  • Short– 16 bit
  • Int – 32 bit
  • Long – 64 bit

Floating Point Numbers

  • Float – 32 bit single-precision floating point value.
  • Double – 64 bit double-precision floating point value.

Following are a few examples of numeric types –

// Kotlin Numeric Types Examples
val myByte: Byte = 10
val myShort: Short = 125

val myInt = 1000
val myLong = 1000L	// The suffix 'L' is used to specify a long value

val myFloat = 126.78f   // The suffix 'f' or 'F' represents a Float 
val myDouble = 325.49

You can also use underscores in numeric values to make them more readable –

val hundredThousand = 100_000
val oneMillion = 1_000_000

You can declare hexadecimal and binary values like this –

val myHexa = 0x0A0F  // Hexadecimal values are prefixed with '0x' or '0X'
val myBinary = 0b1010  // Binary values are prefixed with '0b' or '0B'

Note that Kotlin does not support octal literals.

Booleans

The type Boolean is used to represent logical values. It can have two possible values true and false.

val myBoolean = true
val anotherBoolean = false

Characters

Characters are represented using the type Char. Unlike Java, Char types cannot be treated as numbers. They are declared using single quotes like this –

val letterChar = 'A'
val digitChar = '9'

Just like other languages, special characters in Kotlin are escaped using a backslash. Some examples of escaped characters are – \n (newline), \t (tab), \r (carriage return), \b (backspace) etc.

Strings

Strings are represented using the String class. They are immutable, which means you cannot modify a String by changing some of its elements.

You can declare a String like this –

var myStr = "Hello, Kotlin"

You can access the character at a particular index in a String using str[index]. The index starts from zero –

var name = "John"
var firstCharInName = name[0]  // 'J'
var lastCharInName = name[name.length - 1]  // 'n'

The length property is used to get the length of a String.

Escaped String and Raw String

Strings declared in double quotes can have escaped characters like ‘\n’ (new line), ‘\t’ (tab) etc –

var myEscapedString = "Hello Reader,\nWelcome to my Blog"

In Kotlin, you also have an option to declare raw strings. These Strings have no escaping and can span multiple lines –

var myMultilineRawString = """
    The Quick Brown Fox
    Jumped Over a Lazy Dog.
"""

Arrays

Arrays in Kotlin are represented using the Array class. You can create an array in Kotlin either using the library function arrayOf() or using the Array() constructor.

Creating Arrays using the arrayOf library function

You can pass a bunch of values to the arrayOf function to create an array like this –

var numbers = arrayOf(1, 2, 3, 4, 5)
var animals = arrayOf("Cat", "Dog", "Lion", "Tiger")

Note that you can also pass values of mixed types to the arrayOf() function, and it will still work (but don’t do that) –

var mixedArray = arrayOf(1, true, 3, "Hello", 'A')	// Works and creates an array of Objects

You can also enforce a particular type while creating the array like this –

var numArray = arrayOf<Int>(1, 2, 3, 4, "Hello")  // Compiler Error

Accessing the elements of an array by their index

You can access the element at a particular index in an array using array[index]. The index starts from zero –

val myDoubleArray = arrayOf(4.0, 6.9, 1.7, 12.3, 5.4)
val firstElement = myDoubleArray[0]
val lastElement = myDoubleArray[myDoubleArray.size - 1]

Every array has a size property that you can use to get the size of the array.

You can also modify the array element at an index like this –

val a = arrayOf(4, 5, 7)  // [4, 5, 7]
a[1] = 10		          // [4, 10, 7]	

Primitive Arrays

As we learned earlier, everything in Kotlin is an object. But to improve performance it represents some of the basic types like numbers, characters and booleans as primitive types at runtime.

The arrayOf() function creates arrays of boxed/wrapper types. That is, arrayOf(1, 2, 3) corresponds to Java’s Integer[] array.

But, Kotlin provides a way to create arrays of primitive types as well. It contains specialized classes for representing arrays of primitive types. Those classes are IntArray, DoubleArray, CharArray, etc. You can create an array of primitive types using the corresponding library functions – intArrayOf(), doubleArrayOf(), charArrayOf(), etc. –

val myCharArray = charArrayOf('K', 'O', 'T')  // CharArray (corresponds to Java 'char[]')
val myIntArray = intArrayOf(1, 3, 5, 7)		// IntArray (corresponds to Java 'int[]')

Creating Arrays using the Array() constructor

The Array() constructor takes two arguments –

  1. the size of the array, and
  2. a function that takes the array index as an argument and returns the element to be inserted at that index.

var mySquareArray = Array(5, {i -> i * i})	// [0, 1, 4, 9, 16]

The second argument to the Array() constructor is a lambda expression. Lambda expressions are anonymous functions that are declared and passed around as expressions. We’ll learn more about lambda expressions in a future tutorial.

The above lambda expression takes the index of an array element and returns the value that should be inserted at that index, which is the square of the index in this case.

Type Conversions

Unlike Java, Kotlin doesn’t support implicit conversion from smaller types to larger types. For example, an Int value cannot be assigned to a Long or Double variable.

var myInt = 100
var myLong: Long = myInt // Compiler Error

However, every numeric type contains helper functions that can be used to explicitly convert one type to another.

The following helper functions are supported for type conversion between numeric types –

  • toByte()
  • toShort()
  • toInt()
  • toLong()
  • toFloat()
  • toDouble()
  • toChar()

Examples of explicit type conversions

Here is how you can convert an Int to Long –

val myInt = 100
val myLong = myInt.toLong()   // Explicitly converting 'Int' to 'Long'

You can also convert larger types to smaller types –

val doubleValue = 176.80
val intValue = doubleValue.toInt()  // 176

Every type in Kotlin, not just numeric type, supports a helper function called toString() to convert it to String.

val myInt = 1000
myInt.toString()  // "1000"

You can also convert a String to a numeric type like so –

val str = "1000"
val intValue = str.toInt()

If the String-to-Number conversion is not possible then a NumberFormatException is thrown –

val str = "1000ABC"
str.toInt()   // Throws java.lang.NumberFormatException

Conclusion

That’s all folks! In this article, you learned how to declare mutable and immutable variables, how type inference works in Kotlin, what basic data types Kotlin supports, how to work with data types like Int, Long, Double, Char, Boolean, String and Array, and how to convert one type to another.

JPA / Hibernate ElementCollection Example with Spring Boot
03 Mar 2021

In this article, you’ll learn how to map a collection of basic as well as embeddable types using JPA’s @ElementCollection and @CollectionTable annotations.

Let’s say that the users of your application can have multiple phone numbers and addresses. To map this requirement into the database schema, you need to create separate tables for storing the phone numbers and addresses –

(Image: Hibernate / Spring Boot JPA @ElementCollection example table structure)

Both the tables user_phone_numbers and user_addresses contain a foreign key to the users table.

You can implement such a relationship at the object level using JPA’s one-to-many mapping. But for basic and embeddable types like the ones in the above schema, JPA has a simpler solution in the form of ElementCollection.

Let’s create a project from scratch and learn how to use an ElementCollection to map the above schema in your application using hibernate.

Creating the Application

If you have Spring Boot CLI installed, then simply type the following command in your terminal to generate the application –

spring init -n=jpa-element-collection-demo -d=web,jpa,mysql --package-name=com.example.jpa jpa-element-collection-demo

Alternatively, you can use the Spring Initializr web app to generate the application by following the instructions below –

  1. Open http://start.spring.io
  2. Enter Artifact as “jpa-element-collection-demo”
  3. Click Options dropdown to see all the options related to project metadata.
  4. Change Package Name to “com.example.jpa”
  5. Select Web, JPA and MySQL dependencies.
  6. Click Generate to generate and download the project.

Configuring MySQL database and Hibernate Log levels

Let’s first configure the database URL, username, password, hibernate log levels and other properties.

Open src/main/resources/application.properties and add the following properties to it –

# DATASOURCE (DataSourceAutoConfiguration & DataSourceProperties)
spring.datasource.url=jdbc:mysql://localhost:3306/jpa_element_collection_demo?useSSL=false&serverTimezone=UTC&useLegacyDatetimeCode=false
spring.datasource.username=root
spring.datasource.password=root

# Hibernate

# The SQL dialect makes Hibernate generate better SQL for the chosen database
spring.jpa.properties.hibernate.dialect = org.hibernate.dialect.MySQL5InnoDBDialect

# Hibernate ddl auto (create, create-drop, validate, update)
spring.jpa.hibernate.ddl-auto = update

logging.level.org.hibernate.SQL=DEBUG
logging.level.org.hibernate.type=TRACE

Please change spring.datasource.username and spring.datasource.password as per your MySQL installation. Also, create a database named jpa_element_collection_demo before proceeding to the next section.
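
If you have the mysql command-line client handy, creating that database is a one-liner (the database name comes from the datasource URL above):

mysql> CREATE DATABASE jpa_element_collection_demo;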

Defining the Entity classes

Next, We’ll define the Entity classes that will be mapped to the database tables we saw earlier.

Before defining the User entity, let’s first define the Address type which will be embedded inside the User entity.

All the domain models will go inside the package com.example.jpa.model.

1. Address – Embeddable Type

package com.example.jpa.model;

import javax.persistence.Embeddable;
import javax.validation.constraints.NotNull;
import javax.validation.constraints.Size;

@Embeddable
public class Address {
    @NotNull
    @Size(max = 100)
    private String addressLine1;

    @NotNull
    @Size(max = 100)
    private String addressLine2;

    @NotNull
    @Size(max = 100)
    private String city;

    @NotNull
    @Size(max = 100)
    private String state;

    @NotNull
    @Size(max = 100)
    private String country;

    @NotNull
    @Size(max = 100)
    private String zipCode;

    public Address() {

    }

    public Address(String addressLine1, String addressLine2, String city, 
                   String state, String country, String zipCode) {
        this.addressLine1 = addressLine1;
        this.addressLine2 = addressLine2;
        this.city = city;
        this.state = state;
        this.country = country;
        this.zipCode = zipCode;
    }

    // Getters and Setters (Omitted for brevity)
}

2. User Entity

Let’s now see how we can map a collection of basic types (phone numbers) and embeddable types (addresses) using hibernate –

package com.example.jpa.model;

import javax.persistence.*;
import javax.validation.constraints.Email;
import javax.validation.constraints.NotNull;
import javax.validation.constraints.Size;
import java.util.HashSet;
import java.util.Set;

@Entity
@Table(name = "users")
public class User {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @NotNull
    @Size(max = 100)
    private String name;

    @NotNull
    @Email
    @Size(max = 100)
    @Column(unique = true)
    private String email;

    @ElementCollection
    @CollectionTable(name = "user_phone_numbers", joinColumns = @JoinColumn(name = "user_id"))
    @Column(name = "phone_number")
    private Set<String> phoneNumbers = new HashSet<>();

    @ElementCollection(fetch = FetchType.LAZY)
    @CollectionTable(name = "user_addresses", joinColumns = @JoinColumn(name = "user_id"))
    @AttributeOverrides({
            @AttributeOverride(name = "addressLine1", column = @Column(name = "house_number")),
            @AttributeOverride(name = "addressLine2", column = @Column(name = "street"))
    })
    private Set<Address> addresses = new HashSet<>();


    public User() {

    }

    public User(String name, String email, Set<String> phoneNumbers, Set<Address> addresses) {
        this.name = name;
        this.email = email;
        this.phoneNumbers = phoneNumbers;
        this.addresses = addresses;
    }

    // Getters and Setters (Omitted for brevity)
}

We use @ElementCollection annotation to declare an element-collection mapping. All the records of the collection are stored in a separate table. The configuration for this table is specified using the @CollectionTable annotation.

The @CollectionTable annotation is used to specify the name of the table that stores all the records of the collection, and the JoinColumn that refers to the primary table.

Moreover, when you’re using an Embeddable type with an Element Collection, you can use the @AttributeOverrides and @AttributeOverride annotations to override/customize the fields of the embeddable type.

Defining the Repository

Next, Let’s create the repository for accessing the user’s data from the database. You need to create a new package called repository inside com.example.jpa package and add the following interface inside the repository package –

package com.example.jpa.repository;

import com.example.jpa.model.User;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.stereotype.Repository;

@Repository
public interface UserRepository extends JpaRepository<User, Long> {

}

Testing the ElementCollection Setup

Finally, Let’s write the code to test our setup in the main class JpaElementCollectionDemoApplication.java –

package com.example.jpa;

import com.example.jpa.model.Address;
import com.example.jpa.model.User;
import com.example.jpa.repository.UserRepository;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

import java.util.HashSet;
import java.util.Set;

@SpringBootApplication
public class JpaElementCollectionDemoApplication implements CommandLineRunner {

    @Autowired
    private UserRepository userRepository;

    public static void main(String[] args) {
        SpringApplication.run(JpaElementCollectionDemoApplication.class, args);
    }

    @Override
    public void run(String... args) throws Exception {
        // Cleanup database tables.
        userRepository.deleteAll();

        // Insert a user with multiple phone numbers and addresses.
        Set<String> phoneNumbers = new HashSet<>();
        phoneNumbers.add("+91-9999999999");
        phoneNumbers.add("+91-9898989898");

        Set<Address> addresses = new HashSet<>();
        addresses.add(new Address("747", "Golf View Road", "Bangalore",
                "Karnataka", "India", "560008"));
        addresses.add(new Address("Plot No 44", "Electronic City", "Bangalore",
                "Karnataka", "India", "560001"));

        User user = new User("Yaniv Levy", "yaniv@fusebes.com",
                phoneNumbers, addresses);

        userRepository.save(user);
    }
}

The main class implements the CommandLineRunner interface and provides the implementation of its run() method.

The run() method is executed post application startup. In the run() method, we first cleanup all the tables and then insert a new user with multiple phone numbers and addresses.

You can run the application by typing mvn spring-boot:run from the root directory of the project.

After running the application, go ahead and check all the tables in MySQL. The users table will have a new entry, and the user_phone_numbers and user_addresses tables will have two new entries each –

mysql> select * from users;
+----+-----------------------+--------------------+
| id | email                 | name               |
+----+-----------------------+--------------------+
|  3 | yaniv@fusebes.com | Yaniv Levy |
+----+-----------------------+--------------------+
1 row in set (0.01 sec)
mysql> select * from user_phone_numbers;
+---------+----------------+
| user_id | phone_number   |
+---------+----------------+
|       3 | +91-9898989898 |
|       3 | +91-9999999999 |
+---------+----------------+
2 rows in set (0.00 sec)
mysql> select * from user_addresses;
+---------+--------------+-----------------+-----------+---------+-----------+----------+
| user_id | house_number | street          | city      | country | state     | zip_code |
+---------+--------------+-----------------+-----------+---------+-----------+----------+
|       3 | 747          | Golf View Road  | Bangalore | India   | Karnataka | 560008   |
|       3 | Plot No 44   | Electronic City | Bangalore | India   | Karnataka | 560001   |
+---------+--------------+-----------------+-----------+---------+-----------+----------+
2 rows in set (0.00 sec)

Conclusion

That’s all in this article, folks. I hope you learned how to use Hibernate’s element collection with Spring Boot.

Thanks for reading guys. Happy Coding! 🙂

Spring Boot Logging Best Practices Guide
05 May 2021

Logging in Spring Boot can be confusing, and the wide range of tools and frameworks makes it a challenge to even know where to start. This guide talks through the most common best practices for Spring Boot logging and gives five key suggestions to add to your logging tool kit.

What’s in the Spring Boot Box?

The Spring Boot Starters all depend on spring-boot-starter-logging. This is where the majority of the logging dependencies for your application come from. The dependencies involve a facade (SLF4J) and frameworks (Logback). It’s important to know what these are and how they fit together.

SLF4J is a simple front-facing facade supported by several logging frameworks. Its main advantage is that you can easily switch from one logging framework to another. In our case, we can easily switch our logging from Logback to Log4j, Log4j2 or JUL.

The dependencies we use will also write logs. For example, Hibernate uses SLF4J, which fits perfectly as we have that available. However, the AWS SDK for Java uses Apache Commons Logging (JCL). The spring-boot-starter-logging dependency includes the necessary bridges to ensure those logs are delegated to our logging framework out of the box.

SLF4J usage:

At a high level, all the application code has to worry about is:

  1. Getting an instance of an SLF4J logger (Regardless of the underlying framework):
    private static final Logger LOG = LoggerFactory.getLogger(MyClass.class);
  2. Writing some logs:
    LOG.info("My message set at info level");

Logback or Log4j2?

Spring Boot’s default logging framework is Logback. Your application code should interface only with the SLF4J facade so that it’s easy to switch to an alternative framework if necessary.

Log4j2 is newer and claims to improve on the performance of Logback. Log4j2 also supports a wide range of appenders so it can log to files, HTTP, databases, Cassandra, Kafka, as well as supporting asynchronous loggers. If logging performance is of high importance, switching to log4j2 may improve your metrics. Otherwise, for simplicity, you may want to stick with the default Logback implementation.

This guide will provide configuration examples for both frameworks.

Want to use Log4j2? You’ll need to exclude spring-boot-starter-logging and include spring-boot-starter-log4j2.

(Image: Spring Boot logging frameworks)

5 Tips for Getting the Most Out of Your Spring Boot Logging

With your initial set-up out of the way, here are 5 top tips for Spring Boot logging.

1. Configuring Your Log Format

Spring Boot Logging provides default configurations for logback and log4j2. These specify the logging level, the appenders (where to log) and the format of the log messages.

For all but a few specific packages, the default log level is set to INFO, and by default, the only appender used is the Console Appender, so logs will be directed only to the console.

The default format for the logs using logback looks like this:

(Image: the default logback logging format)

Let’s take a look at that last line of log, which was a statement created from within a controller with the message “My message set at info level”.

It looks simple, yet the default log pattern for logback seems “off” at first glance. As much as it looks like it could be, it’s not regex, it doesn’t parse email addresses, and actually, when we break it down it’s not so bad.

%clr(%d{${LOG_DATEFORMAT_PATTERN:-yyyy-MM-dd HH:mm:ss.SSS}}){faint}
%clr(${LOG_LEVEL_PATTERN:-%5p}) %clr(${PID:- }){magenta} %clr(---){faint}
%clr([%15.15t]){faint} %clr(%-40.40logger{39}){cyan} %clr(:){faint}
%m%n${LOG_EXCEPTION_CONVERSION_WORD:-%wEx}

Understanding the Default Logback Pattern

The variables that are available for the log format allow you to create meaningful logs, so let’s look a bit deeper at the ones in the default log pattern example.

Pattern part – What it means

%clr – Specifies a colour. By default, it is based on log levels, e.g. INFO is green. If you want to specify specific colours, you can do that too.

The format is:
%clr(Your message){your colour}

So for example, if we wanted to add “Demo” to the start of every log message, in green, we would write:
%clr(Demo){green}

%d{${LOG_DATEFORMAT_PATTERN:-yyyy-MM-dd HH:mm:ss.SSS}} – %d is the current date, and the part in curly braces is the format. ${VARIABLE:-default} is a way of specifying that we should use the $VARIABLE environment variable for the format, if it is available, and if not, fall back to default. This is handy if you want to override these values in your properties files, by providing arguments, or by setting environment variables.

In this example, the default format is yyyy-MM-dd HH:mm:ss.SSS unless we specify a variable named LOG_DATEFORMAT_PATTERN. In the logs above, we can see 2020-10-19 10:09:58.152 matches the default pattern, meaning we did not specify a custom LOG_DATEFORMAT_PATTERN.

${LOG_LEVEL_PATTERN:-%5p} – Uses the LOG_LEVEL_PATTERN if it is defined, else prints the log level padded to 5 characters (e.g. “INFO” is padded with a leading space, while “TRACE” already fills the width). This keeps the rest of the log aligned, as the level always takes 5 characters.

${PID:- } – The environment variable $PID, if it exists; if not, a space.

%t – The name of the thread triggering the log message.

%logger – The name of the logger (up to 39 characters); in our case this is the class name.

%m – The log message.

%n – The platform-specific line separator.

%wEx – If an exception exists, its stack trace, formatted using Spring Boot’s ExtendedWhitespaceThrowableProxyConverter.


Customising the log format

You can customise the ${} variables that are found in the logback-spring.xml by passing in properties or environment variables. For example, you may set logging.pattern.console to override the whole of the console log pattern. 
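
For example, a minimal override in application.properties might look like this (the pattern shown is just an illustration, not the Spring Boot default):

# application.properties – override the console log pattern (illustrative pattern)
logging.pattern.console=%d{yyyy-MM-dd HH:mm:ss.SSS} %5p %-40.40logger{39} : %m%n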

However, for more control, including adding additional appenders, it is recommended to create your logback-spring.xml and place it inside your resources folder. You can do the same with log4j2 by adding log4j2-spring.xml to your resources folder.

Armed with the ability to customise your logs, you should consider adding:

  • Application name.
  • A request ID.
  • The endpoint being requested (E.g /health).

There are a few items in the default log that I would remove unless you have a specific use case for them:

  • The ‘—’ separator.
  • The thread name.
  • The process ID.

With the ability to customise these through the use of the logback-spring.xml or log4j2-spring.xml, the format of your logs is fully within your control.

2. Configuring the Destination for Your Logs (Appenders and Loggers)

An appender is just a fancy name for the part of the logging framework that sends your logs to a particular target. Both frameworks can output to console, over HTTP, to databases, or over a TCP socket, as well as to many other targets. The way we configure the destination for the logs is by adding, removing and configuring these appenders. 

You have more control over which appenders you use, and the configuration of them, if you create your own custom .xml configuration. However, the default logging configuration does make use of environment properties that allow you to override some parts of it, for example, the date format.

Preset configurations for logging to files are available within Spring Boot Logging. You can use the logback configuration with a file appender or the log4j2 configuration with a file appender if you specify logging.file or logging.path in your application properties.
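
For example, a one-line sketch in application.properties (note that on Spring Boot 2.2 and later the equivalent properties are logging.file.name and logging.file.path):

# Write logs to a file in addition to the console
logging.file=logs/spring-boot-app.log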

The official docs for logback appenders and log4j2 appenders detail the parameters required for each of the appenders available, and how to configure them in your XML file. One tip for choosing the destination for your logs is to have a plan for rotating them. Writing logs to a file always feels like a great idea, until the storage used for that file runs out and brings down the whole service. 

Log4j and logback both have a RollingFileAppender which handles rotating these log files based on file size, or time, and it’s exactly that which Spring Boot Logging uses if you set the logging.file property. 

3. Logging as a Cross-Cutting Concern to Keep Your Code Clean (Using Filters and Aspects)

You might want to log every HTTP request your API receives. That’s a fairly normal requirement, but putting a log statement into every controller is unnecessary duplication. It’s easy to forget and make mistakes. A requirement that you want to log every method within your packages that your application calls would be even more cumbersome. 

I’ve seen developers use this style of logging at trace level so that they can turn it on to see exactly what is happening in a production environment. Adding log statements to the start and end of every method is messy, and there is a better way. This is where filters and aspects save the day and avoid the code duplication.

When to Use a Filter Vs When to Use Aspect-Oriented Programming

If you are looking to create log statements related to specific requests, you should opt for using filters, as they are part of the handling chain that your application already goes through for each request. They are easier to write, easier to test and usually more performant than using aspects. If you are considering more cross-cutting concerns, for example, audit logging, or logging every method that causes an exception to be thrown, use AOP. 

Using a Filter to Log Every Request

Filters can be registered with your web container by creating a class implementing javax.servlet.Filter and annotating it with @Component, or adding it as an @Bean in one of your configuration classes. When your spring-boot-starter application starts up, it will create the Filter and register it with the container.

You can choose to create your own Filter, or to use an existing one. To make use of the existing Filter, you need to supply a CommonsRequestLoggingFilter bean and set your logging level to debug. You’ll get something that looks like:

2020-10-27 18:50:50.427 DEBUG 24168 --- [nio-8080-exec-2] o.a.coyote.http11.Http11InputBuffer      : Received [GET /health HTTP/1.1
tracking-header: my-tracking
User-Agent: PostmanRuntime/7.26.5
Accept: */*
Postman-Token: 04a661b7-209c-43c3-83ea-e09466cf3d92
Host: localhost:8080
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
]
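
For reference, registering that existing filter can be as small as the sketch below (the configuration class name, bean method name and the include/payload settings are assumptions):

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.filter.CommonsRequestLoggingFilter;

@Configuration
public class RequestLoggingConfig {

    // Registers Spring's existing request-logging filter; it writes its messages at DEBUG level
    @Bean
    public CommonsRequestLoggingFilter requestLoggingFilter() {
        CommonsRequestLoggingFilter filter = new CommonsRequestLoggingFilter();
        filter.setIncludeQueryString(true);   // log query parameters
        filter.setIncludeHeaders(true);       // log request headers
        filter.setIncludePayload(true);       // log the request body
        filter.setMaxPayloadLength(10000);    // limit how much of the body is logged
        return filter;
    }
}

For the messages to show up, DEBUG logging must be enabled for the filter, e.g. logging.level.org.springframework.web.filter.CommonsRequestLoggingFilter=DEBUG in application.properties.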

If you use the existing one, you have little control over the message that gets logged. 

If you want more control, create your own Filter, and you then have full control over the content of the log message.

Using Aspects for Cross-Cutting Concerns

Aspect-oriented programming enables you to fulfill cross-cutting concerns, like logging for example, in one place. You can do this without your logging code needing to sprawl across every class.

This approach is great for use cases such as:

  • Logging any exceptions thrown from any method within your packages (See @AfterThrowing)
  • Logging performance metrics by timing before/after each method is run (See @Around)
  • Audit logging. You can log calls to methods that have a custom annotation on them, such as @Audit. You only need to create a pointcut matching calls to methods with that annotation

Let’s start with a simple example – we want to log the name of every public method that we call within our package, com.example.demo. There are only a few steps to writing an Aspect that will run before every public method in a package that you specify.

  1. Include spring-boot-starter-aop in your pom.xml or build.gradle.
  2. Add @EnableAspectJAutoProxy to one of your configuration classes. This line tells spring-boot that you want to enable AspectJ support.
  3. Add your pointcut, which defines a pattern that is matched against method signatures as they run. You can find more about how to construct your matching pattern in the spring boot documentation for AOP. In our example, we match any method inside the com.example.demo package.
  4. Add your Aspect. This defines when you want to run your code in relation to the pointcut (e.g. before, after or around the methods that it matches). In this example, the @Before annotation causes the method to be executed before any methods that match the pointcut (a sketch combining steps 3 and 4 follows this list).
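
Putting steps 3 and 4 together, a minimal sketch might look like the following (the class name MyAspect matches the log output shown below; the exact pointcut expression and log message are assumptions):

import org.aspectj.lang.JoinPoint;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Before;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

@Aspect
@Component
public class MyAspect {

    private static final Logger LOG = LoggerFactory.getLogger(MyAspect.class);

    // Pointcut declared inline: every public method in com.example.demo and its sub-packages
    @Before("execution(public * com.example.demo..*.*(..))")
    public void logMethodCall(JoinPoint joinPoint) {
        LOG.info("Called {}", joinPoint.getSignature().getName());
    }
}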

That’s all there is to logging every method call. The logs will appear as:

2020-10-27 19:26:33.269  INFO 2052 --- [nio-8080-exec-2]
com.example.demo.MyAspect                : Called checkHealth

By making changes to your pointcut, you can write logs for every method annotated with a specific annotation. For example, consider what you can do with:

@annotation(com.example.demo.Audit)

(This would run for every method annotated with a custom annotation, @Audit.)

4. Applying Context to Your Logs Using MDC

MDC (Mapped Diagnostic Context) is a complex-sounding name for a map of key-value pairs, associated with a single thread. Each thread has its own map. You can add keys/values to the map at runtime, and then reference the keys from that map in your logging pattern. 

The approach comes with a warning that threads may be reused, and so you’ll need to make sure to clear your MDC after each request to avoid your context leaking from one request to the next.

MDC is accessible through SLF4J and supported by both Logback and Log4j2, so we don’t need to worry about the specifics of the underlying implementation. 

The MDC section in the SLF4J documentation gives the simplest examples.

Tracking Requests Through Your Application Using Filters and MDC

Want to be able to group logs for a specific request? The Mapped Diagnostic Context (MDC) will help. 

The steps are:

  1. Add a header to each request going to your API, for example, ‘tracking-id’. You can generate this on the fly (I suggest using a UUID) if your client cannot provide one.
  2. Create a filter that runs once per request and stores that value in the MDC.
  3. Update your logging pattern to reference the key in the MDC to retrieve the value.

Using a Filter, you can read values from the request and set them on the MDC. Make sure to clear up after the request by calling MDC.clear(), preferably in a finally block so that it always runs.
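
Here is a minimal sketch of such a filter (the header name tracking-id and the MDC key tracking come from this article; the class name is an assumption, and the servlet imports assume a javax-based Spring Boot 2.x project):

import java.io.IOException;
import java.util.UUID;

import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.slf4j.MDC;
import org.springframework.stereotype.Component;
import org.springframework.web.filter.OncePerRequestFilter;

@Component
public class TrackingIdFilter extends OncePerRequestFilter {

    @Override
    protected void doFilterInternal(HttpServletRequest request, HttpServletResponse response,
                                    FilterChain filterChain) throws ServletException, IOException {
        try {
            // Use the client-supplied tracking id, or generate one if it is missing
            String trackingId = request.getHeader("tracking-id");
            if (trackingId == null || trackingId.isEmpty()) {
                trackingId = UUID.randomUUID().toString();
            }
            MDC.put("tracking", trackingId);
            filterChain.doFilter(request, response);
        } finally {
            // Threads are reused, so always clear the MDC after the request
            MDC.clear();
        }
    }
}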

After setting the value on your MDC, just add %X{tracking} to your logging pattern (replacing the word “tracking” with the key you have put in the MDC) and your logs will contain the value in every log message for that request.
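
For example, extending the earlier console pattern (illustrative):

# Include the MDC value in every console log line
logging.pattern.console=%d{yyyy-MM-dd HH:mm:ss.SSS} %5p [%X{tracking}] %-40.40logger{39} : %m%n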

If a client reports a problem, as long as you can get a unique tracking-id from your client, then you’ll be able to search your logs and pull up every log statement generated from that specific request.

Other values that you may want to put into your MDC and include on every log message include:

  • The application version.
  • Details of the request, for example, the path.
  • Details of the logged-in user, for example, the username.

5. Unit Testing Your Log Statements

Why Test Your Logs?

You can unit test your logging code. Too often this is overlooked because the log statements return void. For example, logger.info("foo"); does not return a value that you can assert against.

It’s easy to make mistakes. Log statements usually involve parameters or formatted strings, and it’s easy to put log statements in the wrong place. Unit testing reassures you that your logs do what you expect and that you’re covered when refactoring to avoid any accidental modifications to your logging behaviour.

The Approach to Testing Your Logs

The Problem

SLF4J’s LoggerFactory.getLogger is static, making it difficult to mock. Searching through any output log files in our unit tests is error-prone (e.g. we need to consider resetting the log files between each unit test). How do we assert against the logs?

The Solution

The trick is to add your own test appender to the logging framework (e.g Logback or Log4j2) that captures the logs from your application in memory, allowing us to assert against the output later. The steps are:

  1. Before each test case, add an appender to your logger.
  2. Within the test, call your application code that logs some output.
  3. The logger will delegate to your test appender.
  4. Assert that your expected logs have been received by your test appender.

Each logging framework has suitable appenders, but referencing those concrete appenders in our tests means we need to depend on the specific framework rather than SLF4J. That’s not ideal, but the alternatives of searching through logged output in files, or implementing our own SLF4J implementation, are overkill, making this the pragmatic choice.
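
As an illustration, here is a minimal JUnit 5 sketch that attaches Logback’s ListAppender directly in the test (it assumes Logback is the backing framework; MyService is a hypothetical class under test, and the JUnit rule/extension approaches below simply extract this set-up code):

import static org.junit.jupiter.api.Assertions.assertEquals;

import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.Logger;
import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.read.ListAppender;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.slf4j.LoggerFactory;

class MyServiceLoggingTest {

    // Hypothetical class under test that writes one INFO log
    static class MyService {
        private static final org.slf4j.Logger LOG = LoggerFactory.getLogger(MyService.class);

        void process() {
            LOG.info("Started processing");
        }
    }

    private ListAppender<ILoggingEvent> listAppender;

    @BeforeEach
    void setUp() {
        // Attach an in-memory appender to the logger used by the class under test
        Logger logger = (Logger) LoggerFactory.getLogger(MyService.class);
        listAppender = new ListAppender<>();
        listAppender.start();
        logger.addAppender(listAppender);
    }

    @Test
    void logsStartMessageAtInfoLevel() {
        new MyService().process();

        // The logger delegated to our test appender, so we can assert on what was captured
        ILoggingEvent event = listAppender.list.get(0);
        assertEquals("Started processing", event.getFormattedMessage());
        assertEquals(Level.INFO, event.getLevel());
    }
}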

Here are a couple of tricks for unit testing using JUnit 4 rules or JUnit 5 extensions that will keep your test classes clean, and reduce the coupling with the logging framework.

Testing Log Statements Using Junit 5 Extensions in Two Steps

JUnit 5 extensions help to avoid code duplicates between your tests. Here’s how to set up your logging tests in two steps:

Step 1: Create your JUnit extension

Create your extension for Logback

Create your extension for Log4j2

Step 2: Use that extension to assert against your log statement with logback or log4j2

Testing Log Statements Using Junit 4 Rules in Two Steps

JUnit 4 rules help to avoid code duplication by extracting the common test code away from the test classes. In our example, we don’t want to duplicate the code for adding a test appender to our logger in every test class.

Step 1: Create your JUnit rule. 

Create your rule for Logback

Create your rule for Log4j2

Step 2: Use that rule to assert against your log statements using logback or log4j2.

With these approaches, you can assert that your log statements have been called with a message and level that you expect. 

Conclusion

The Spring Boot Logging Starter provides everything you need to quickly get started, whilst allowing full control when you need it. We’ve looked at how most logging concerns (formatting, destinations, cross-cutting logging, context and unit tests) can be abstracted away from your core application code.

Any global changes to your logging can be done in one place, and the classes for the rest of your application don’t need to change. At the same time, unit tests for your log statements provide you with reassurance that your log statements are being fired after making any alterations to your business logic.

These are my top 5 tips for configuring Spring Boot Logging. However, when your logging configuration is set up, remember that your logs are only ever as good as the content you put in them. Be mindful of the content you are logging, and make sure you are using the right logging levels.

How to use Log4j 2 with Spring Boot
03 Mar 2021

Apache Log4j 2 is a successor of Log4j 1.x (who would have guessed? :p).

Log4j 2 provides a significant improvement in performance compared to its predecessor. It provides asynchronous loggers and supports multiple APIs including SLF4J, Apache Commons Logging, and java.util.logging.

In this article, you’ll learn how to integrate and configure Log4j 2 in Spring Boot applications. You’ll configure different types of appenders including RollingFileAppender and SMTPAppender. You’ll also learn how to use Async logging capabilities provided by Log4j 2.

The idea is to build a simple Spring Boot application from scratch and demonstrate how to go about integrating and configuring Log4j 2 in the app.

So, Let’s get started!

Creating the Project

Let’s use Spring Boot CLI to generate the project. If you don’t have Spring Boot CLI installed, I highly encourage you to do so. Check out the Official Spring Boot documentation for any help with the installation.

Fire up your terminal and type the following command to generate the project –

spring init --name=log4j2-demo --dependencies=web log4j2-demo

Once the project is generated, import it into your favorite IDE. The project’s directory structure should look like this –

(Image: Spring Boot Log4j2 example directory structure)

Adding Log4j2

All the Spring Boot starters depend on spring-boot-starter-logging, which uses Logback by default. For using Log4j2, you need to exclude spring-boot-starter-logging and add spring-boot-starter-log4j2 dependency.

Open pom.xml file and add the following snippet to the <dependencies> section –

<!-- Exclude Spring Boot's Default Logging -->
<dependency>
	<groupId>org.springframework.boot</groupId>
	<artifactId>spring-boot-starter</artifactId>
	<exclusions>
		<exclusion>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-logging</artifactId>
		</exclusion>
	</exclusions>
</dependency>

<!-- Add Log4j2 Dependency -->
<dependency>
	<groupId>org.springframework.boot</groupId>
	<artifactId>spring-boot-starter-log4j2</artifactId>
</dependency>

Configuring Log4j2

Spring Boot automatically configures Log4j2 if it finds a file named log4j2.xml, log4j2.json or log4j2.yaml in the classpath.

In this article, we’ll configure log4j 2 using XML. Create a new file log4j2.xml inside src/main/resources directory, and add the following configuration to it –

<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN" monitorInterval="30">
    <Properties>
        <Property name="LOG_PATTERN">
            %d{yyyy-MM-dd HH:mm:ss.SSS} %5p ${hostName} --- [%15.15t] %-40.40c{1.} : %m%n%ex
        </Property>
    </Properties>
    <Appenders>
        <Console name="ConsoleAppender" target="SYSTEM_OUT" follow="true">
            <PatternLayout pattern="${LOG_PATTERN}"/>
        </Console>
    </Appenders>
    <Loggers>
        <Logger name="com.example.log4j2demo" level="debug" additivity="false">
            <AppenderRef ref="ConsoleAppender" />
        </Logger>

        <Root level="info">
            <AppenderRef ref="ConsoleAppender" />
        </Root>
    </Loggers>
</Configuration>

The above configuration defines a simple ConsoleAppender and declares two loggers – an application-specific logger and the root logger.

Using Log4j 2 in the app

Let’s add some logging code to check our logger configuration. Open Log4j2DemoApplication.java and replace it with the following code –

package com.example.log4j2demo;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.springframework.boot.ApplicationArguments;
import org.springframework.boot.ApplicationRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class Log4j2DemoApplication implements ApplicationRunner {
    private static final Logger logger = LogManager.getLogger(Log4j2DemoApplication.class);

    public static void main(String[] args) {
        SpringApplication.run(Log4j2DemoApplication.class, args);
    }

    @Override
    public void run(ApplicationArguments applicationArguments) throws Exception {
        logger.debug("Debugging log");
        logger.info("Info log");
        logger.warn("Hey, This is a warning!");
        logger.error("Oops! We have an Error. OK");
        logger.fatal("Damn! Fatal error. Please fix me.");
    }
}

We have added 5 simple logs of different log levels. When you run the app, it logs everything on the console.

Adding a Rolling File Appender

If you want your logs to be written to a file, then you can use a RollingFile appender. RollingFile appender creates a new file whenever the log file reaches a certain threshold specified by the triggering policy.

It stores the old log files with names matching the pattern specified by the filePattern parameter –

<!-- Rolling File Appender -->
<RollingFile name="FileAppender" fileName="logs/log4j2-demo.log" 
             filePattern="logs/log4j2-demo-%d{yyyy-MM-dd}-%i.log">
    <PatternLayout>
        <Pattern>${LOG_PATTERN}</Pattern>
    </PatternLayout>
    <Policies>
        <SizeBasedTriggeringPolicy size="10MB" />
    </Policies>
    <DefaultRolloverStrategy max="10"/>
</RollingFile>

In the above RollingFile configuration, I’ve specified a SizeBasedTriggeringPolicy which will roll files over when the size reaches 10MB. The <DefaultRolloverStrategy max="10"/> element specifies the maximum number of log files that will be kept.

There is another common triggering policy called TimeBasedTriggeringPolicy which is used commonly in Log4j2. You can use TimeBasedTriggeringPolicy to specify how often a rollover should occur based on the most specific time unit in the date/time pattern specified in the filePattern attribute.

Here is how you can use the TimeBasedTriggeringPolicy in the above example to roll files over every day –

<Policies>
    <TimeBasedTriggeringPolicy interval="1" />
</Policies>

The interval attribute specifies the frequency of rollover. So if you want to roll files over every week, you can specify interval="7".

In the above example, the date/time pattern of the file is {yyyy-MM-dd}, where the most specific time unit is dd (date). Therefore, the TimeBasedTriggeringPolicy rolls files over based on date. If the date/time pattern were yyyy-MM-dd-HH, the rollover would occur based on the hour.

Finally, To use the RollingFile appender, you need to add the above RollingFile configuration to the <Appenders> section inside log4j2.xml, and configure one of your loggers to use it like so –

<Root level="info">
    <AppenderRef ref="ConsoleAppender"/>
    <AppenderRef ref="FileAppender"/>
</Root>

Adding an SMTP Appender

SMTP appender is very useful in production systems when you want to be notified about any errors in your application via email.

You can configure an SMTP appender to send ERROR emails to you using an SMTP server like so –

<!-- SMTP Appender -->
<SMTP name="MailAppender"
      subject="Log4j2 Demo [PROD]"
      to="youremail@example.com"
      from="log4j2-demo-alerts@example.com"
      smtpHost="yourSMTPHost"
      smtpPort="587"
      smtpUsername="yourSMTPUsername"
      smtpPassword="yourSMTPPassword"
      bufferSize="1">
    <ThresholdFilter level="ERROR" onMatch="ACCEPT" onMismatch="DENY"/>
    <PatternLayout>
        <Pattern>${LOG_PATTERN}</Pattern>
    </PatternLayout>
</SMTP>

Note that, for the SMTP appender to work, you need to add the spring-boot-starter-mail dependency to your pom.xml file –

<!-- Needed for SMTP appender -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-mail</artifactId>
</dependency>

Asynchronous Logging

Log4j2 supports async loggers. These loggers provide a drastic improvement in performance compared to their synchronous counterparts.

Async Loggers internally use a library called Disruptor for asynchronous logging.

We need to include Disruptor dependency for using async loggers. Add the following to your pom.xml file –

<!-- Needed for Async Logging with Log4j 2 -->
<dependency>
    <groupId>com.lmax</groupId>
    <artifactId>disruptor</artifactId>
    <version>3.3.6</version>
</dependency>

Now you can either make all Loggers asynchronous, or create a mix of sync and async loggers.

1. Making all Loggers Asynchronous

Making all loggers asynchronous is very easy. You just need to set the SystemProperty Log4jContextSelector to org.apache.logging.log4j.core.async.AsyncLoggerContextSelector.

You can do that by adding the System Property at runtime like so –

mvn spring-boot:run -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector

or, when running a packaged jar (note that the system property must come before -jar) –

mvn clean package
java -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector -jar target/log4j2-demo-0.0.1-SNAPSHOT.jar

2. Using a mix of Sync and Async Loggers

You can also use a mix of sync and async loggers with Log4j 2 using the <AsyncLogger> configuration element.

In the following example, all the application-specific logs will be asynchronous, while logs handled by the root logger will remain synchronous –

<Loggers>
    <AsyncLogger name="com.example.log4j2demo" level="debug" additivity="false">
        <AppenderRef ref="ConsoleAppender" />
        <AppenderRef ref="FileAppender" />
    </AsyncLogger>

    <Root level="info">
        <AppenderRef ref="ConsoleAppender" />
        <AppenderRef ref="FileAppender" />
    </Root>
</Loggers>

You don’t need to add any SystemProperty when you’re using <AsyncLogger> element. Just add the above loggers to your log4j2.xml file and you’re ready to go.

Conclusion

Congratulations folks! In this article, we learned how to integrate and configure Log4j 2 in Spring Boot apps. We also configured different types of appenders like ConsoleAppender, RollingFile Appender, and SMTP Appender. Finally, we learned how to use Async loggers provided by Log4j2.

Java Optional Tutorial with Examples
03 Mar 2021

If you’re a Java programmer, then you must have heard about or experienced NullPointerExceptions in your programs.

NullPointerExceptions are runtime exceptions thrown by the JVM. Null checks in programs are often overlooked by developers, causing serious bugs in code.

Java 8 introduced a new type called Optional<T> to help developers deal with null values properly.

The concept of Optional is not new, and other programming languages have similar constructs. For example, Scala has Option[T] and Haskell has the Maybe type.

In this blog post, I’ll explain about Java 8’s Optional type and show you how to use it by giving simple examples.

What is Optional?

Optional is a container type for a value which may be absent. Confused? Let me explain.

Consider the following function which takes a user id, fetches the user’s details with the given id from the database and returns it –

User findUserById(String userId) { ... };

If userId is not present in the database then the above function returns null. Now, let’s consider the following code written by a client –

User user = findUserById("667290");
System.out.println("User's Name = " + user.getName());

A common NullPointerException situation, right? The developer forgot to add the null check in his code. If userId is not present in the database, then the above code snippet will throw a NullPointerException.

Now, let’s understand how Optional will help you mitigate the risk of running into NullPointerException here –

Optional<User> findUserById(String userId) { ... };

By returning Optional<User> from the function, we have made it clear to the clients of this function that there might not be a User with the given userId. Now the clients of this function are explicitly forced to handle this fact.

The client code can now be written as –

Optional<User> optional = findUserById("667290");

optional.ifPresent(user -> {
    System.out.println("User's name = " + user.getName());    
});

Once you have an Optional object, you can use various utility methods to work with the Optional. The ifPresent() method in the above example calls the supplied lambda expression if the user is present, otherwise it does nothing.

Well! You get the idea here right? The client is now forced by the type system to write the Optional check in his code.

Creating an Optional object

1. Create an empty Optional

An empty Optional Object describes the absence of a value.

Optional<User> user = Optional.empty();

2. Create an Optional with a non-null value –

User user = new User("667290", "Rajeev Kumar Singh");
Optional<User> userOptional = Optional.of(user);

If the argument supplied to Optional.of() is null, then it will throw a NullPointerException immediately and the Optional object won’t be created.

3. Create an Optional with a value which may or may not be null –

Optional<User> userOptional = Optional.ofNullable(user);

If the argument passed to Optional.ofNullable() is non-null, then it returns an Optional containing the specified value, otherwise it returns an empty Optional.

Checking the presence of a value

1. isPresent()

isPresent() method returns true if the Optional contains a non-null value, otherwise it returns false.

if(optional.isPresent()) {
    // value is present inside Optional
    System.out.println("Value found - " + optional.get());
} else {
    // value is absent
    System.out.println("Optional is empty");
}	

2. ifPresent()

ifPresent() method allows you to pass a Consumer function that is executed if a value is present inside the Optional object.

It does nothing if the Optional is empty.

optional.ifPresent(value -> {
    System.out.println("Value found - " + value);
});

Note that I have supplied a lambda expression to the ifPresent() method. This makes the code more readable and concise.

Retrieving the value using get() method

Optional’s get() method returns a value if it is present, otherwise it throws NoSuchElementException.

User user = optional.get();

You should avoid using get() method on your Optionals without first checking whether a value is present or not, because it throws an exception if the value is absent.

Returning default value using orElse()

orElse() is great when you want to return a default value if the Optional is empty. Consider the following example –

// return "Unknown User" if user is null
User finalUser = (user != null) ? user : new User("0", "Unknown User");

Now, let’s see how we can write the above logic using Optional’s orElse() construct –

// return "Unknown User" if user is null
User finalUser = optionalUser.orElse(new User("0", "Unknown User"));

Returning default value using orElseGet()

Unlike orElse(), which returns a default value directly if the Optional is empty, orElseGet() allows you to pass a Supplier function which is invoked when the Optional is empty. The result of the Supplier function becomes the default value of the Optional –

User finalUser = optionalUser.orElseGet(() -> {
    return new User("0", "Unknown User");
});

Throw an exception on absence of a value

You can use orElseThrow() to throw an exception if the Optional is empty. A typical scenario in which this might be useful is returning a custom ResourceNotFoundException from your REST API if the object with the specified request parameters does not exist.

@GetMapping("/users/{userId}")
public User getUser(@PathVariable("userId") String userId) {
    return userRepository.findByUserId(userId).orElseThrow(
            () -> new ResourceNotFoundException("User not found with userId " + userId)
    );
}

Filtering values using filter() method

Let’s say you have an Optional object of User. You want to check its gender and call a function if it’s a MALE. Here is how you would do it using the old-school method –

if(user != null && user.getGender().equalsIgnoreCase("MALE")) {
    // call a function
}

Now, let’s use Optional along with filter to achieve the same –

userOptional.filter(user -> user.getGender().equalsIgnoreCase("MALE"))
.ifPresent(user -> {
    // Your function
});

The filter() method takes a predicate as an argument. If the Optional contains a non-null value and the value matches the given predicate, then filter() method returns an Optional with that value, otherwise it returns an empty Optional.

So, the function inside ifPresent() in the above example will be called if and only if the Optional contains a user and user is a MALE.

Extracting and transforming values using map()

Let’s say that you want to get the address of a user if it is present and print it if the user is from India.

Considering the following getAddress() method inside User class –

Address getAddress() {
    return this.address;
}

Here is how you would achieve the desired result –

if(user != null) {
    Address address = user.getAddress();
    if(address != null && address.getCountry().equalsIgnoreCase("India")) {
	    System.out.println("User belongs to India");
    }
}

Now, let’s see how we can get the same result using map() method –

userOptional.map(User::getAddress)
.filter(address -> address.getCountry().equalsIgnoreCase("India"))
.ifPresent(address -> {
    System.out.println("User belongs to India");
});

You see how concise and readable the above code is? Let’s break the above code snippet and understand it in detail –

// Extract User's address using map() method.
Optional<Address> addressOptional = userOptional.map(User::getAddress);

// filter address from India
Optional<Address> indianAddressOptional = addressOptional.filter(address -> address.getCountry().equalsIgnoreCase("India"));

// Print, if country is India
indianAddressOptional.ifPresent(() -> {
    System.out.println("User belongs to India");
});

In the above example, the map() method returns an empty Optional in the following cases: 1. the user is absent in userOptional, or 2. the user is present but getAddress() returns null.

Otherwise, it returns an Optional<Address> containing the user’s address.

Cascading Optionals using flatMap()

Let’s consider the above map() example again. You might ask – if the user’s address can be null, then why the heck aren’t you returning an Optional<Address> instead of a plain Address from the getAddress() method?

And you’re right! Let’s correct that – assume now that getAddress() returns Optional<Address>. Do you think the above code will still work?

The answer is no! The problem is the following line –

Optional<Address> addressOptional = userOptional.map(User::getAddress);

Since getAddress() returns Optional<Address>, the return type of userOptional.map() will be Optional<Optional<Address>>

Optional<Optional<Address>> addressOptional = userOptional.map(User::getAddress);

Oops! We certainly don’t want that nested Optional. Let’s use flatMap() to correct that –

Optional<Address> addressOptional = userOptional.flatMap(User::getAddress);

Cool! So, the rule of thumb here is – if the mapping function returns an Optional, use flatMap() instead of map() to get a flattened result from your Optional.
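
For reference, here is a minimal sketch (the field name is an assumption) of how getAddress() could be declared to return Optional<Address>, which is what makes flatMap() applicable in the first place –

import java.util.Optional;

class User {
    private Address address;    // may legitimately be null

    // Wrapping the nullable field keeps the null check inside the class
    public Optional<Address> getAddress() {
        return Optional.ofNullable(this.address);
    }
}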

Conclusion

Thank you for reading. If you Optional<Liked> this blog post, give an Optional<High Five> in the comment section below.

3 Layer Automated Testing
26
Mar
2021

3 Layer Automated Testing

Birth of Quality Engineering

Quality Assurance practice was relatively simple when we built Monolith systems with traditional waterfall development models. The Quality Assurance (QA) teams would start the GUI layer’s validation process after waiting for months for the product development. To enhance the testing process, we would have to spend a lot of effort and $$$ (commercial tools) in automating the GUI via various tools like Micro Focus UFT, Selenium, TestComplete, Coded UI, Ranorex, etc., and most often these tests are complex to maintain and scale. Thus, most QA teams would have to restrict their automated tests to smoke and partial regression, ending in inadequate test coverage.

With modern technology, the new era tech companies, including Varo, have widely adopted Microservices-based architecture combined with the Agile/DevOps development model. This opens up a lot of opportunities for Quality Assurance practice, and in my opinion, this was the origin of the transformation from Quality Assurance to “Quality Engineering.”

The Common Pitfall

While automated testing gives us a massive benefit with the three R’s (Repeatable → Run any number of times, Reliable → Run with confidence, Reusable → Develop, and share), it also comes with maintenance costs. I like to quote Grady Booch — “A fool with a tool is still a fool.” Targeting inappropriate areas would not provide us the desired benefit. We should consider several factors to choose the right candidates for automation. A few to name are the lifespan of the product, the volatility of the requirements, the complexity of the tests, business criticality, and technical feasibility.

It’s well known that the cost of a bug increases toward the right of the software lifecycle. So it is necessary to implement several walls of defense to arrest these software bugs as early as possible (the Shift-Left Testing paradigm). By implementing an agile development model with a fail-fast mindset, we have taken care of our first wall of defense. But to move faster in this shorter development cycle, we must build robust automated test suites to take care of the rolled-out features and make room for testing the new features.

The 3 Layer Architecture

The Varo architecture comprises three essential layers.

  • The Frontend layer (Web, iOS, Mobile apps) — User experience
  • The Orchestration layer (GraphQl) — Makes multiple microservices calls and returns the decision and data to the frontend apps
  • The Microservice layer (gRPC, Kafka, Postgres) — Core business layer

While working through the microservice architecture from a testing standpoint, several questions were posed.

  • Which layer to test?
  • What to test in these layers?
  • Does testing the frontend automatically validate the downstream services?
  • Does testing multiple layers introduce redundancies?

We will try to answer these by analyzing the table below, which provides an overview of what these layers mean for quality engineering.

After analyzing the table, we have loosely adopted the Testing Pyramid pattern to invest in automated testing as follows:

  • Full feature/ functional validations on Microservices layer
  • Business process validations on Orchestration layer
  • E2E validations on Frontend layer

The diagram below best represents our test strategy for each layer.

Note: Though we have automated white-box tests such as unit tests and integration tests, we exclude those from this discussion.

Use Case

Let’s take the example below for illustration to understand best how this Pyramid works.

The user is presented with a form to submit. The form accepts three inputs — Field A to capture the user identifier, Field B a drop-down value, and Field C an Integer value (based on a defined range).

Once the user clicks on the Submit button, the GraphQL API calls Microservice A to determine the type of customer. Then it calls the next Microservice, B, to validate the acceptable range of values for Field C (which depends on the values from Fields A and B).

Validations:

1. Feature Validations

✓ Positive behavior (Smoke, Functional, System Integration)

  • Validating behavior with a valid set of data combinations
  • Validating database
  • Validating integration — Impact on upstream/ downstream systems

✓ Negative behavior

  • Validating with invalid data (for example: Invalid authorization, Disqualified data)

2. Fluent Validations

✓ Evaluating field definitions — such as

  • Mandatory field (not empty/not null)
  • Invalid data types (for example: Int → negative value, String → Junk values with special characters or UUID → Invalid UUID formats)

Let’s look at how the “feature validations” can be written for the above use case by applying one of the test case authoring techniques — Boundary Value Analysis.
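
As a rough illustration only (the 1–100 range and the validator are assumptions, not the actual service contract), boundary value analysis for Field C could be expressed as a parameterized test –

import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.CsvSource;

class FieldCBoundaryTest {

    // Hypothetical validator: Field C accepts values in the inclusive range 1..100
    static boolean isValidFieldC(int value) {
        return value >= 1 && value <= 100;
    }

    @ParameterizedTest
    @CsvSource({
        "0, false",   // just below the lower boundary
        "1, true",    // lower boundary
        "2, true",    // just above the lower boundary
        "99, true",   // just below the upper boundary
        "100, true",  // upper boundary
        "101, false"  // just above the upper boundary
    })
    void fieldCBoundaries(int value, boolean expected) {
        assertEquals(expected, isValidFieldC(value));
    }
}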

To test the scenario above, it would require 54 different combinations of feature validations, and below is the rationale to pick the right candidate for each layer.

Microservice Layer: This is the layer delivered first, enabling us to invest in automated testing as early as possible (Shift-Left). And the scope for our automation would be 100% of all the above scenarios.

Orchestration Layer: This layer translates the information from the microservice to frontend layers; we try to select at least two tests (1 positive & 1 negative) for each scenario. The whole objective is to ensure the integration is working as expected.

Frontend Layer: In this layer, we focus on E2E validations, which means these validations would be a part of the complete user journey. But we would ensure that we have at least one or more positive and negative scenarios embedded in those E2E tests. Business priority (the data most frequently used by real users) helps us select the best scenarios for our E2E validations.

Conclusion

There are always going to be sets of redundant tests across these layers. But that is the trade-off we had to take to ensure that we have correct quality gates on each of these layers. The pros of this approach are that we achieve safe and faster deployments to Production by enabling quicker testing cycles, better test coverage, and risk-free decisions. In addition, having these functional test suites spread across the layers helps us to isolate the failures in respective areas, thus saving us time to troubleshoot an issue.

However, one size often does not fit all. The decision has to be made based on an understanding of how the software architecture is built and the supporting infrastructure available to facilitate the testing efforts. One of the critical success factors for this implementation is building a good quality engineering team with the right skills and proper tools. But that is another story — coming soon: “Quality Engineering: Redefined.”

A Comparison Between Spring and Spring Boot
26
Mar
2021

A Comparison Between Spring and Spring Boot

What Is Spring?

Simply put, the Spring framework provides comprehensive infrastructure support for developing Java applications.

It’s packed with some nice features like Dependency Injection, and out of the box modules like:

  • Spring JDBC
  • Spring MVC
  • Spring Security
  • Spring AOP
  • Spring ORM
  • Spring Test

These modules can drastically reduce the development time of an application.

For example, in the early days of Java web development, we needed to write a lot of boilerplate code to insert a record into a data source. By using the JdbcTemplate of the Spring JDBC module, we can reduce it to a few lines of code with only a few configurations.
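
As a rough sketch (the users table, its columns, and the configured DataSource are assumptions for illustration), a single-row insert boils down to one call –

import javax.sql.DataSource;
import org.springframework.jdbc.core.JdbcTemplate;

public class UserInserter {
    private final JdbcTemplate jdbcTemplate;

    public UserInserter(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
    }

    // One call replaces the old connection/statement/cleanup boilerplate
    public void insertUser(String id, String name) {
        jdbcTemplate.update("INSERT INTO users (id, name) VALUES (?, ?)", id, name);
    }
}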

3. What Is Spring Boot?

Spring Boot is basically an extension of the Spring framework, which eliminates the boilerplate configurations required for setting up a Spring application.

It takes an opinionated view of the Spring platform, which paves the way for a faster and more efficient development ecosystem.

Here are just a few of the features in Spring Boot:

  • Opinionated ‘starter’ dependencies to simplify the build and application configuration
  • Embedded server to avoid complexity in application deployment
  • Metrics, Health check, and externalized configuration
  • Automatic config for Spring functionality – whenever possible

Let’s get familiar with both of these frameworks step by step.

4. Maven Dependencies

First of all, let’s look at the minimum dependencies required to create a web application using Spring:

<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-web</artifactId>
    <version>5.3.5</version>
</dependency>
<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-webmvc</artifactId>
    <version>5.3.5</version>
</dependency>

Unlike Spring, Spring Boot requires only one dependency to get a web application up and running:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
    <version>2.4.4</version>
</dependency>

All other dependencies are added automatically to the final archive during build time.

Another good example is testing libraries. We usually use the set of Spring Test, JUnit, Hamcrest, and Mockito libraries. In a Spring project, we should add all of these libraries as dependencies.

Alternatively, in Spring Boot we only need the starter dependency for testing to automatically include these libraries.

Spring Boot provides a number of starter dependencies for different Spring modules. Some of the most commonly used ones are:

  • spring-boot-starter-data-jpa
  • spring-boot-starter-security
  • spring-boot-starter-test
  • spring-boot-starter-web
  • spring-boot-starter-thymeleaf

For the full list of starters, also check out the Spring documentation.

5. MVC Configuration

Let’s explore the configuration required to create a JSP web application using both Spring and Spring Boot.

Spring requires defining the dispatcher servlet, mappings, and other supporting configurations. We can do this using either the web.xml file or an Initializer class:

public class MyWebAppInitializer implements WebApplicationInitializer {
 
    @Override
    public void onStartup(ServletContext container) {
        AnnotationConfigWebApplicationContext context
          = new AnnotationConfigWebApplicationContext();
        context.setConfigLocation("com.baeldung");
 
        container.addListener(new ContextLoaderListener(context));
 
        ServletRegistration.Dynamic dispatcher = container
          .addServlet("dispatcher", new DispatcherServlet(context));
         
        dispatcher.setLoadOnStartup(1);
        dispatcher.addMapping("/");
    }
}

We also need to add the @EnableWebMvc annotation to a @Configuration class, and define a view-resolver to resolve the views returned from the controllers:

@EnableWebMvc
@Configuration
public class ClientWebConfig implements WebMvcConfigurer { 
   @Bean
   public ViewResolver viewResolver() {
      InternalResourceViewResolver bean
        = new InternalResourceViewResolver();
      bean.setViewClass(JstlView.class);
      bean.setPrefix("/WEB-INF/view/");
      bean.setSuffix(".jsp");
      return bean;
   }
}

By comparison, Spring Boot only needs a couple of properties to make things work once we add the web starter:

spring.mvc.view.prefix=/WEB-INF/jsp/
spring.mvc.view.suffix=.jsp

All of the Spring configuration above is automatically included by adding the Boot web starter through a process called auto-configuration.

What this means is that Spring Boot will look at the dependencies, properties, and beans that exist in the application and enable configuration based on these.

Of course, if we want to add our own custom configuration, then the Spring Boot auto-configuration will back away.
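
For example, declaring our own ViewResolver bean (a sketch – the prefix and suffix are arbitrary) is enough for Boot to back off its auto-configured view resolution –

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.servlet.ViewResolver;
import org.springframework.web.servlet.view.InternalResourceViewResolver;

@Configuration
public class CustomViewConfig {

    // A user-defined bean takes precedence over the auto-configured one
    @Bean
    public ViewResolver viewResolver() {
        InternalResourceViewResolver resolver = new InternalResourceViewResolver();
        resolver.setPrefix("/WEB-INF/custom/");
        resolver.setSuffix(".jsp");
        return resolver;
    }
}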

5.1. Configuring Template Engine

Now let’s learn how to configure a Thymeleaf template engine in both Spring and Spring Boot.

In Spring, we need to add the thymeleaf-spring5 dependency and some configurations for the view resolver:

@Configuration
@EnableWebMvc
public class MvcWebConfig implements WebMvcConfigurer {

    @Autowired
    private ApplicationContext applicationContext;

    @Bean
    public SpringResourceTemplateResolver templateResolver() {
        SpringResourceTemplateResolver templateResolver = 
          new SpringResourceTemplateResolver();
        templateResolver.setApplicationContext(applicationContext);
        templateResolver.setPrefix("/WEB-INF/views/");
        templateResolver.setSuffix(".html");
        return templateResolver;
    }

    @Bean
    public SpringTemplateEngine templateEngine() {
        SpringTemplateEngine templateEngine = new SpringTemplateEngine();
        templateEngine.setTemplateResolver(templateResolver());
        templateEngine.setEnableSpringELCompiler(true);
        return templateEngine;
    }

    @Override
    public void configureViewResolvers(ViewResolverRegistry registry) {
        ThymeleafViewResolver resolver = new ThymeleafViewResolver();
        resolver.setTemplateEngine(templateEngine());
        registry.viewResolver(resolver);
    }
}

Spring Boot 1 only requires the spring-boot-starter-thymeleaf dependency to enable Thymeleaf support in a web application. Due to the new features in Thymeleaf 3.0, we also have to add thymeleaf-layout-dialect as a dependency in a Spring Boot 2 web application. Alternatively, we can choose to add a spring-boot-starter-thymeleaf dependency that’ll take care of all of this for us.

Once the dependencies are in place, we can add the templates to the src/main/resources/templates folder, and Spring Boot will display them automatically.

6. Spring Security Configuration

For the sake of simplicity, we’ll see how the default HTTP Basic authentication is enabled using these frameworks.

Let’s start by looking at the dependencies and configuration we need to enable Security using Spring.

Spring requires both the standard spring-security-web and spring-security-config dependencies to set up Security in an application.

Next we need to add a class that extends the WebSecurityConfigurerAdapter and makes use of the @EnableWebSecurity annotation:

@Configuration
@EnableWebSecurity
public class CustomWebSecurityConfigurerAdapter extends WebSecurityConfigurerAdapter {
 
    @Autowired
    public void configureGlobal(AuthenticationManagerBuilder auth) throws Exception {
        auth.inMemoryAuthentication()
          .withUser("user1")
            .password(passwordEncoder()
            .encode("user1Pass"))
          .authorities("ROLE_USER");
    }
 
    @Override
    protected void configure(HttpSecurity http) throws Exception {
        http.authorizeRequests()
          .anyRequest().authenticated()
          .and()
          .httpBasic();
    }
    
    @Bean
    public PasswordEncoder passwordEncoder() {
        return new BCryptPasswordEncoder();
    }
}

Here we’re using inMemoryAuthentication to set up the authentication.

Spring Boot also requires these dependencies to make it work, but we only need to define the dependency of spring-boot-starter-security as this will automatically add all the relevant dependencies to the classpath.

The security configuration in Spring Boot is the same as the one above.

To see how the JPA configuration can be achieved in both Spring and Spring Boot, we can check out our article A Guide to JPA with Spring.

7. Application Bootstrap

The basic difference in bootstrapping an application in Spring and Spring Boot lies with the servlet. Spring uses either the web.xml or SpringServletContainerInitializer as its bootstrap entry point.

On the other hand, Spring Boot uses only Servlet 3 features to bootstrap an application. Let’s talk about this in detail.

7.1. How Spring Bootstraps?

Spring supports both the legacy web.xml way of bootstrapping as well as the latest Servlet 3+ method.

Let’s see the web.xml approach in steps:

  1. Servlet container (the server) reads web.xml.
  2. The DispatcherServlet defined in the web.xml is instantiated by the container.
  3. DispatcherServlet creates WebApplicationContext by reading WEB-INF/{servletName}-servlet.xml.
  4. Finally, the DispatcherServlet registers the beans defined in the application context.

Here’s how Spring bootstraps using the Servlet 3+ approach:

  1. The container searches for classes implementing ServletContainerInitializer and executes.
  2. The SpringServletContainerInitializer finds all classes implementing WebApplicationInitializer.
  3. The WebApplicationInitializer creates the context with XML or @Configuration classes.
  4. The WebApplicationInitializer creates the DispatcherServlet with the previously created context.

7.2. How Spring Boot Bootstraps?

The entry point of a Spring Boot application is the class which is annotated with @SpringBootApplication:

@SpringBootApplication
public class Application {
    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }
}

By default, Spring Boot uses an embedded container to run the application. In this case, Spring Boot uses the public static void main entry point to launch an embedded web server.

It also takes care of the binding of the Servlet, Filter, and ServletContextInitializer beans from the application context to the embedded servlet container.

Another feature of Spring Boot is that it automatically scans all the classes in the same package or sub-packages of the main class for components.

Additionally, Spring Boot provides the option of deploying it as a web archive in an external container. In this case, we have to extend the SpringBootServletInitializer:

@SpringBootApplication
public class Application extends SpringBootServletInitializer {
    // ...
}

Here the external servlet container looks for the Main-class defined in the META-INF file of the web archive, and the SpringBootServletInitializer will take care of binding the Servlet, Filter, and ServletContextInitializer.

8. Packaging and Deployment

Finally, let’s see how an application can be packaged and deployed. Both of these frameworks support common package managing technologies like Maven and Gradle; however, when it comes to deployment, these frameworks differ a lot.

For instance, the Spring Boot Maven Plugin provides Spring Boot support in Maven. It also allows packaging executable jar or war archives and running an application “in-place.”

Some of the advantages of Spring Boot over Spring in the context of deployment include:

  • Provides embedded container support
  • Provision to run the jars independently using the command java -jar
  • Option to exclude dependencies to avoid potential jar conflicts when deploying in an external container
  • Option to specify active profiles when deploying
  • Random port generation for integration tests

9. Conclusion

In this article, we learned about the differences between Spring and Spring Boot.

In a few words, we can say that Spring Boot is simply an extension of Spring itself to make development, testing, and deployment more convenient.

Advantages and Disadvantages of Spring Boot
30
Mar
2021

Advantages and Disadvantages of Spring Boot

The two biggest advantages of Spring Boot are simplified, version-conflict-free dependency management through the starter POMs, and opinionated auto-configuration of the most commonly used libraries and behaviors.

The embedded server support enables packaging web applications as jar files that can be run anywhere.

Its Actuator module provides HTTP endpoints to access application internals like detailed metrics, inner workings, health status, etc.

On the disadvantages side, there are very few. Still, many developers may see the transitive dependencies included with the starter POMs as a burden on the deployment package.

Also, its auto-configuration feature may enable many features that we never use in the application’s lifecycle, and they will sit there the whole time, initialized and fully configured. This may cause some unnecessary resource utilization.
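
If a particular auto-configuration is known to be unused, it can be switched off explicitly. A minimal sketch (the excluded class is just an example) –

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.autoconfigure.jdbc.DataSourceAutoConfiguration;

// Opting out of an auto-configuration the application never uses
@SpringBootApplication(exclude = DataSourceAutoConfiguration.class)
public class Application {
    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }
}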

Microservices Architecture
14
May
2021

Microservices Architecture

Microservices are defined as self-regulating, independent codebases that can be written and maintained even by a small team of developers. Microservices Architecture consists of such loosely coupled services, with each service responsible for the execution of its associated business logic.

The services are separated from each other based on the nature of their domains and belong to a mini-microservice pool. Enterprise mobile app developers leverage the capabilities of this architecture especially for complex applications. 

Microservices Architecture allows developers to release new versions of software rapidly, thanks to sophisticated automation of software building, testing, and deployment – something that acts as a prime differentiation point between Microservices and Monolithic architecture.

Microservices Architecture

Benefits

  • Since the services are bifurcated into pools, the architecture design pattern makes the system highly fault-tolerant. In other words, the whole software won’t collapse on its head even if some microservices cease to function. 
  • An enterprise mobile app development company working on such an architecture for clients can use multiple programming languages to build different microservices for their specific purposes. Therefore, the technology stack can be kept up to date with the latest upgrades in computing. 
  • This architecture is a perfect fit for applications that need to scale. Since the services are already independent of each other, they can scale individually rather than overloading the entire system with the need to expand. 
  • Services can be integrated into any application depending upon the scope of work. 

Potential Drawbacks 

  • Since each service is unique in its ability to contribute to the whole codebase, it could be challenging for an enterprise mobile application development company to interlink and operate so many distinct services seamlessly. 
  • Developers must define a standard protocol for all services to adhere to. It is important to do so, as the decentralized approach towards coding microservices in multiple languages can pose serious issues while debugging. 
  • Each microservice with its limited environment is responsible to maintain the integrity of the data. It is up to the architects of such a system to come up with a universally consistent data integrity protocol, wherever possible. 
  • You definitely need the best of breed professionals to design such a system for you as the technology stack keeps changing. 

Ideal For

Use Microservices Architecture for apps in which a specific segment will be used more heavily than the others and would need sporadic bursts of scaling. Instead of a standalone application, you may also deploy this for a service that provides functionality to other applications of the system. 

Introduction to Event Streaming with Kafka and Kafdrop
26
Mar
2021

Introduction to Event Streaming with Kafka and Kafdrop

Event sourcing, eventual consistency, microservices, CQRS… These are quickly becoming household names in mainstream application development. But do you know what makes them tick? What are the basic building blocks required to assemble complex, business-centric applications from fine-grained services without turning the lot into a big ball of mud?

This article examines a fundamental building block — event streaming. Leading the charge will be Apache Kafka — the de facto standard in event streaming platforms, which we’ll observe through Kafdrop — a feature-packed web UI.

A Brief Intro

Event streaming platforms reside in the broader class of Message-oriented Middleware (MoM) and are similar to traditional message queues and topics but offer stronger temporal guarantees and typically order-of-magnitude performance gains due to log-structured immutability. In simple terms, write operations are mostly limited to sequential appends, which make them fast. Really fast.

Whereas messages in a traditional Message Queue (MQ) tend to be arbitrarily ordered and generally independent of one another, events (or records) in a stream tend to be strongly ordered, often chronologically or causally. Also, a stream persists its records, whereas an MQ will discard a message as soon as it has been read.

For this reason, event streaming tends to be a better fit for Event-Driven Architectures, encompassing event sourcing, eventual consistency, and CQRS concepts. (Of course, there are FIFO message queues too, but the differences between FIFO queues and fully-fledged event streaming platforms are quite substantial, not limited to ordering alone.)

Event streaming platforms are a comparatively recent paradigm within the broader MoM domain. There are only a handful of mainstream implementations available, compared to hundreds of MQ-style brokers, some going back to the 1980s (e.g. Tuxedo). Compared to established standards such as AMQP, MQTT, XMPP, and JMS, there are no equivalent standards in the streaming space.

Event streaming platforms are an active area of continuous research and experimentation. In spite of this, streaming platforms aren’t just a niche concept or an academic idea with a few esoteric use cases; they can be applied effectively to a broad range of messaging and eventing scenarios, routinely displacing their more traditional counterparts.


Architecture Overview

The diagram below offers a brief overview of the Kafka component architecture. While the intention isn’t to indoctrinate you with all of Kafka’s inner workings, some appreciation of its design will go a long way in explaining the key concepts that we will cover shortly.

Kafka Architecture Overview

Kafka is a distributed system comprising several key components:

  • Broker nodes: Responsible for the bulk of I/O operations and durable persistence within the cluster. Brokers accommodate the append-only log files that comprise the topic partitions hosted by the cluster. Partitions can be replicated across multiple brokers for both horizontal scalability and increased durability — these are called replicas. A broker node may act as the leader for certain replicas, while being a follower for others. A single broker node will also be elected as the cluster controller — responsible for the internal management of partition states. This includes the arbitration of the leader-follower roles for any given partition.
  • ZooKeeper nodes: Under the hood, Kafka needs a way to manage the overall controller status in the cluster. Should the controller drop out for whatever reason, there is a protocol in place to elect another controller from the set of remaining brokers. The actual mechanics of controller election, heart-beating, and so forth, are largely implemented in ZooKeeper. ZooKeeper also acts as a configuration repository of sorts, maintaining cluster metadata, leader-follower states, quotas, user information, ACLs, and other housekeeping items. Due to the underlying gossiping and consensus protocol, the number of ZooKeeper nodes must be odd.
  • Producers: These are client applications responsible for appending records to Kafka topics. Because of the log-structured nature of Kafka and the ability to share topics across multiple consumer ecosystems, only producers are able to modify the data in the underlying log files. The actual I/O is performed by the broker nodes on behalf of the producer clients. Any number of producers may publish to the same topic, selecting the partitions used to persist the records.
  • Consumers: These are client applications that read from topics. Any number of consumers may read from the same topic; however, depending on the configuration and grouping of consumers, there are rules governing the distribution of records among the consumers.

Topics, Partitions, Records, and Offsets

A partition is a totally ordered sequence of records and is fundamental to Kafka. A record has an ID — a 64-bit integer offset and a millisecond-precise timestamp. Also, it may have a key and a value; both are byte arrays and both are optional. The term “totally ordered” simply means that for any given producer, records will be written in the order they were emitted by the application. If record P was published before Q, then P will precede Q in the partition. (Assuming P and Q share a partition.)

Furthermore, they will be read in the same order by all consumers; P will always be read before Q, for every possible consumer. This ordering guarantee is vital in a large majority of use cases. Published records will generally correspond to some real-life events, and preserving the timeline of these events is often essential.

Note: Kafka uses the term “record,” where others might use “message” or “event.” In this article, we will use the terms interchangeably, depending on the context. Similarly, you might see the term “stream” as a generic substitute for “topic.”

There is no recognized ordering across producers. If two (or more) producers emit records simultaneously, those records may materialize in arbitrary order. That said, this order will still be observed uniformly across all consumers.

A record’s offset uniquely identifies it in the partition. The offset is a strictly monotonically-increasing integer in a sparse address space, meaning that each successive offset is always higher than its predecessor and there may be varying gaps between neighboring offsets. Gaps might legitimately appear if compaction is enabled or as a result of transactions; we don’t need to delve into the details at this stage. Suffice it to say that offsets need not be contiguous.

Your application shouldn’t attempt to literally interpret an offset or guess what the next offset might be. It may, however, infer the relative order of any record pair based on their offsets, sort the records by their offset, and so forth.

The diagram below shows what a partition looks like on the inside.

     start of partition
+--------+-----------------+
|0..00000|First record     |
+--------+-----------------+
|0..00001|Second record    |
+--------+-----------------+
|0..00002|Third record     |
+--------+-----------------+
|0..00003|Fourth record    |
+--------+-----------------+
|0..00007|Fifth record     |
+--------+-----------------+
|0..00008|Sixth record     |
+--------+-----------------+
|0..00010|Seventh record   |
+--------+-----------------+
            ...
+--------+-----------------+
|0..56789|Last record      |
+--------+-----------------+
       end of partition

The beginning offset, also called the low-water mark, is the first message that will be presented to a consumer. Due to Kafka’s bounded retention, this is not necessarily the first message that was published. Records may be pruned on the basis of time and/or partition size. When this occurs, the low-water mark will appear to advance, and records earlier than the low-water mark will be truncated.

Conversely, the high-water mark is the offset immediately following the last record in the partition, also known as the end offset. It is the offset that will be assigned to the next record that will be published. It is not the offset of the last record.

A topic is a logical composition of partitions. A topic may have one or more partitions, and a partition must be a part of exactly one topic. Topics are fundamental to Kafka, allowing for both parallelism and load balancing.

Earlier, we said that partitions exhibit total order. Because partitions within a topic are mutually independent, the topic is said to exhibit partial order. In simple terms, this means that certain records may be ordered in relation to one another while being unordered with respect to certain other records. The concepts of total and partial order, while sounding somewhat academic, are hugely important in the construction of performant event streaming pipelines. They enable us to process records in parallel where we can, while maintaining order where we must. We’ll explore the concepts of record order, consumer parallelism, and topic sizing in a short while.

Example: Publishing Messages

Let’s put some of this theory into practice. We are going to spin up a pair of Docker containers — one for Kafka and another for Kafdrop. Rather than launching them individually, we’ll use Docker Compose.

Create a docker-compose.yaml file in a directory of your choice, containing the following:

version: "2"
services:
  kafdrop:
    image: obsidiandynamics/kafdrop
    restart: "no"
    ports:
      - "9000:9000"
    environment:
      KAFKA_BROKERCONNECT: "kafka:29092"
    depends_on:
      - "kafka"
  kafka:
    image: obsidiandynamics/kafka
    restart: "no"
    ports:
      - "2181:2181"
      - "9092:9092"
    environment:
      KAFKA_LISTENERS: "INTERNAL://:29092,EXTERNAL://:9092"
      KAFKA_ADVERTISED_LISTENERS: "INTERNAL://kafka:29092,EXTERNAL://localhost:9092"
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: "INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT"
      KAFKA_INTER_BROKER_LISTENER_NAME: "INTERNAL"

Note: We’re using the obsidiandynamics/kafka image for convenience because it neatly bundles Kafka and ZooKeeper into a single image. If you wanted to, you could replace this with images from Confluent or Wurstmeister, but then you’d have to wire it all up properly. The obsidiandynamics/kafka image does all this for you, so it’s highly recommended for beginners (and lazy pros).

Then, start it with docker-compose up. Once it boots, navigate to localhost:9000 in your browser. You should see the Kafdrop landing screen.

Kafdrop landing page

You should see our single-broker cluster. It’s a promising start, but there are no topics. Not a problem; let’s create a topic and publish some messages using Kafka’s command-line tools. Conveniently, we already have a Kafka image running as part of our docker-compose stack, so we can shell into it to use the built-in CLI tools.

docker exec -it kafka-kafdrop_kafka_1 bash

This gets you into a Bash shell. The tools are in the /opt/kafka/bin directory, so let’s cd into it:

cd /opt/kafka/bin

Create a topic named streams-intro with three partitions:

./kafka-topics.sh --bootstrap-server localhost:9092 \
    --create --partitions 3 --replication-factor 1 \
    --topic streams-intro

Switching back to Kafdrop, we should now see the new topic in the list.

Kafdrop topics list

Time to publish stuff. We are going to use the kafka-console-producer tool:

./kafka-console-producer.sh --broker-list localhost:9092 \
    --topic streams-intro --property "parse.key=true" \
    --property "key.separator=:"

Note: kafka-topics uses the --bootstrap-server argument to configure the Kafka broker list, while kafka-console-producer uses the --broker-list argument for the same purpose. Also, --property arguments are largely undocumented; be prepared to Google your way around.

Records are separated by newlines. The key and the value parts are delimited by colons, as indicated by the key.separator property. For the sake of an example, type in the following (a copy-paste will do):

foo:first message
foo:second message
bar:first message
foo:third message
bar:second message

Press CTRL+D when done. Then, switch back to Kafdrop and click on the streams-intro topic. You’ll see an overview of the topic, along with a detailed breakdown of the underlying partitions:

Kafdrop topic overview

Let’s pause for a moment and dissect what’s been done. We created a topic with three partitions. We then published five records using two unique keys — foo and bar. Kafka uses keys to map records to partitions, such that all records with the same key will always appear on the same partition. Handy, but also important because it lets the publisher dictate the precise order of records. We’ll discuss key hashing and partition assignments in more detail later; in the meanwhile, sit back and enjoy the ride.

Looking at the partitions table, partition #0 has the first and last offsets at zero and two respectively. Partition #2 has them at zero and three, while partition #1 appears to be blank. Clicking on #0 in the Kafdrop web UI sends us to a topic viewer:

Kafdrop topic viewer

We can see the two records published under the bar key. Note, they are completely unrelated to the foo records. Other than being collated within the same topic, there is nothing that binds records across partitions.

Note: In case you were wondering, the arrow to the left of the message lets you expand and pretty-print JSON-encoded messages. As our examples didn’t use JSON, there’s nothing to pretty-print.

It can be said without exaggeration that Kafka’s built-in tooling is an abomination. There is no consistency in the naming of command arguments and the simple act of publishing keyed messages requires you to jump through hoops — passing in obscure, undocumented properties. The usability of the built-in tools is a well-known heartache within the Kafka community. This is a real shame. It’s like buying a Ferrari, only to have it delivered with plastic hub caps. Fortunately, there are alternatives — both commercial and open source — that can fill the glaring gaps in tooling and observability.

Consumers and Consumer Groups

So far we have learned that producers emit records into the stream; these records are organized into nicely ordered partitions. Kafka’s pub-sub topology adheres to a flexible multipoint-to-multipoint model, meaning that there may be any number of producers and consumers simultaneously interacting with a stream. Depending on the actual solution context, stream topologies may also be point-to-multipoint, multipoint-to-point, and point-to-point. It’s about time we looked at how records are consumed.

A consumer is a process or a thread that attaches to a Kafka cluster via a client library. (One is available for most languages.) A consumer generally, but not necessarily, operates as part of an encompassing consumer group. The group is specified by the group.id property. Consumer groups are effectively a load-balancing mechanism within Kafka — distributing partition assignments approximately evenly among the individual consumer instances within the group.

When the first consumer in a group joins the topic, it will receive all partitions in that topic. When a second consumer subsequently joins, it will get approximately half of the partitions, relieving the first consumer of half of its prior load. The process runs in reverse when consumers leave (by disconnecting or timing out) — the remaining consumers will absorb a greater number of partitions.

So, a consumer siphons records from a topic, pulling from the share of partitions that have been assigned to it by Kafka, alongside the other consumers in its group. As far as load-balancing goes, this should be fairly straightforward. But here’s the kicker — the act of consuming a record does not remove it. This might seem contradictory at first, especially if you associate the act of consuming with depletion. (If anything, a consumer should have been called a ‘reader’, but let’s not dwell on the choice of terminology.)

The simple fact is, consumers have absolutely no impact on topics and their partitions. A topic is an append-only ledger that may only be mutated by the producer, or by Kafka itself (as part of compaction or cleanup). Consumers are “cheap,” so you can have quite a number of them tail the logs without stressing the cluster. This is yet another point of distinction between an event stream and a traditional message queue, and it’s a crucial one.

A consumer internally maintains an offset that points to the next record in a partition, advancing the offset for every successive read. When a consumer first subscribes to a topic, it may elect to start at either the head-end or the tail-end of the topic. This behavior is controlled by setting the auto.offset.reset property to one of latest, earliest, or none. In the latter case, an exception will be thrown if no previous offset exists for the consumer group.

Consumers retain their offset state vector locally. Since consumers across different consumer groups do not interfere, there may be any number of them reading concurrently from the same topic. Consumers run at their own pace — a slow or backlogged consumer has no impact on its peers.
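
To make this concrete, here is a minimal sketch of a grouped consumer in Java (the broker address, topic name, and group id are assumptions) –

import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class GroupedConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "intro-pipeline");              // membership in a consumer group
        props.put("auto.offset.reset", "earliest");           // start from the head if no offset was committed
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("streams-intro"));      // partitions are assigned by the group coordinator
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("%s: %s (offset %d)%n",
                            record.key(), record.value(), record.offset());
                }
            }
        }
    }
}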

To illustrate this concept, consider a scenario involving a topic with two partitions. Two consumer groups, A and B, are subscribed to the topic. Each group has three instances, the consumers being named A1, A2, A3, B1, B2, and B3. The diagram below illustrates how the two groups might share the topic and how the consumers advance through the records independently of one another.

               Partition 0                       Partition 1
               +--------+                        +--------+
               |0..00000|                        |0..00000|
               +--------+                        +--------+
               |0..00001| <= consumer A2         |0..00001|
               +--------+                        +--------+
               |0..00002|                        |0..00002| <= consumer A1
               +--------+                        +--------+
               |0..00003|                        |0..00003|
               +--------+                        +--------+
                  ...                               ...
               +--------+                        +--------+
               |0..00008| <= consumer B3         |0..00008| <= consumer B2
               +--------+                        +--------+
               |0..00009|                        |0..00009|
               +--------+                        +--------+
producer P1 => |0..00010|                        |0..00010|
               +--------+                        +--------+
                                  producer P1 => |0..00011|
                                                 +--------+

Look carefully and you’ll notice something is missing. Consumers A3 and B1 aren’t there. That’s because Kafka guarantees that a partition may only be assigned to at most one consumer within its consumer group. (We say ‘at most’ to cover the case when all consumers are offline.) Because there are three consumers in each group, but only two partitions, one consumer will remain idle — waiting for another consumer in its respective group to depart before being assigned a partition.

In this manner, consumer groups are not only a load-balancing mechanism, but also a fence-like exclusivity control, used to build highly performant pipelines without sacrificing safety, particularly when there is a requirement that a record may only be handled by one thread or process at any given time.

Consumer groups are also used to ensure availability. By periodically pulling records from a topic, the consumer implicitly signals to the cluster that it’s in a ‘healthy’ state, thereby extending the lease over its partition assignment. However, should the consumer fail to read again within the allowable deadline, it will be deemed faulty and its partitions will be reassigned — apportioned among the remaining ‘healthy’ consumers within its group. This deadline is controlled by the max.poll.interval.ms consumer client property, set to five minutes by default.

To use a transportation analogy, a topic is like a highway, while a partition is a lane. A record is the equivalent of a car, and its occupants correspond to the record’s value. Several cars can safely travel on the same highway, providing they keep to their lanes. Cars sharing the same lane ride in a sequence, forming a queue. Now, suppose each lane leads to an off-ramp, diverting its traffic to some location. If one off-ramp gets banked up, other off-ramps may still flow smoothly.

It’s precisely this highway-lane metaphor that Kafka exploits to achieve its end-to-end throughput, easily reaching millions of records per second on commodity hardware. When creating a topic, one can choose the partition count — the number of lanes, if you will.

The partitions are divided approximately evenly among the individual consumers in a consumer group, with a guarantee that no partition will be assigned to two (or more) consumers at the same time, providing that these consumers are part of the same consumer group. Referring to our analogy, a car will never end up in two off-ramps simultaneously; however, two lanes might conceivably lead to the same off-ramp.

Note: A topic may be resized after creation by increasing the number of partitions. It is not possible, however, to decrease the partition count without recreating the topic.

Records correspond to events, messages, commands, or any other streamable content. Precisely how records are partitioned is left to the discretion of the producer(s). A producer may explicitly assign a partition index when publishing a record, although this approach is rarely used. A much more common approach is to assign a key to a record, as we have done in our earlier example. The key is completely opaque to Kafka. In other words, Kafka doesn’t attempt to interpret the contents of the key, treating it as an array of bytes. These bytes are hashed to derive a partition index, using a consistent hashing technique.

Records sharing the same hash are guaranteed to occupy the same partition. Assuming a topic with multiple partitions, records with a different key will likely end up in different partitions. However, due to hash collisions, records with different hashes may also end up in the same partition. Such is the nature of hashing. If you understand how a hash table works, this is no different.

Producers rarely care which specific partition the records will map to, only that related records end up in the same partition and that their order is preserved. Similarly, consumers are largely indifferent to their assigned partitions, so long as they receive the records in the same order as they were published, and their partition assignment does not overlap with any other consumer in their group.
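
A minimal keyed-producer sketch (the broker address and topic are assumptions); both records share the key foo, so they hash to the same partition and retain their relative order –

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KeyedProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Same key => same partition => publication order is preserved for these records
            producer.send(new ProducerRecord<>("streams-intro", "foo", "first message"));
            producer.send(new ProducerRecord<>("streams-intro", "foo", "second message"));
            producer.flush();
        }
    }
}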

Committing Offsets

We already said that consumers maintain an internal state with respect to their partition offsets. At some point, that state must be shared with Kafka, so that when a partition is reassigned, the new consumer can resume processing from where the outgoing consumer left off. Similarly, if the consumers were to disconnect, upon reconnection they would ideally skip over those records that have already been processed.

Persisting the consumer state back to the Kafka cluster is called committing an offset. Typically, a consumer will read a record (or a batch of records) and commit the offset of the last record, plus one. If a new consumer takes over the topic, it will commence processing from the last committed offset, hence the plus-one step is essential. (Otherwise, the last processed record would be handled a second time.)

Fun fact: Kafka employs a recursive approach to managing committed offsets, elegantly utilising itself to persist and track offsets. When an offset is committed, Kafka will publish a binary record on the internal __consumer_offsets topic. The contents of this topic are compacted in the background, creating an efficient event store that progressively reduces to only the last known commit points for any given consumer group.

Controlling the point when an offset is committed provides a great deal of flexibility around delivery guarantees, handing Kafka yet another trump card. The term ‘delivery’ assumes not just reading a record, but the full processing cycle, complete with any side-effects. One can shift from an at-most-once to an at-least-once delivery model by simply moving the commit operation from a point before the processing of a record is commenced, to a point sometime after the processing is complete. With this model, should the consumer fail midway through processing a record, the record will be re-read following the subsequent partition reassignment.

By default, a Kafka consumer will automatically commit offsets every five seconds, regardless of whether the consumer has finished processing the record. Often, this is not what you want, as it may lead to mixed delivery semantics. For example, in the event of consumer failure, some records might be delivered twice, while others might not be delivered at all. To enable manual offset committing, set the enable.auto.commit property to false.

Note: There are a few gotchas like this in Kafka. Pay close attention to the (producer and consumer) client properties in the official Kafka documentation, particularly to the stated defaults. Don’t assume for a moment that the defaults are sensible, insofar as they ought to favour safety over other competing qualities. Kafka defaults tend to be optimised for performance, and will need to be explicitly overridden on the client when safety is a critical objective. Fortunately, setting the properties to ensure safety has only a minor impact on performance — Kafka is still a beast. Remember the first rule of optimisation: don’t do it. Kafka would have been even better, had its creators given this more thought.

Getting offset committing right can be tricky, and it routinely catches out beginners. A committed offset implies that the record one below that offset, and all prior records, have been dealt with by the consumer. When designing at-least-once or exactly-once applications, an offset should only be committed when the application has dealt with the record in question and all records before it.

In other words, the record has been processed to the point that any actions that would have resulted from the record have been carried out and finalized. This may include calling other APIs, updating a database, committing transactions, persisting the record’s payload, or publishing more records. Stated otherwise, if the consumer were to fail after committing the record, then not ever seeing this record again must not be detrimental to its correctness.

In the at-least-once (and by extension, the exactly-once) scenario, a typical consumer implementation will commit its offset linearly, in tandem with the processing of the records. That is, read a record, commit it (plus one), read the next, commit it (plus one), and so on. A common tactic is to process a batch of records concurrently (where this makes sense), using a thread pool, and only confirm the last record when the entire batch is done. The commit process in Kafka is very efficient; the client library will send commit requests asynchronously to the cluster using an in-memory queue, without blocking the consumer. The client application can register an optional callback, notifying it when the commit has been acknowledged by the cluster.
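
A minimal at-least-once sketch (continuing the grouped-consumer setup shown earlier, with enable.auto.commit set to false and a hypothetical process() method) –

// Commit only after the polled records have been fully processed
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
    for (ConsumerRecord<String, String> record : records) {
        process(record);            // hypothetical application logic, including any side-effects
    }
    if (!records.isEmpty()) {
        consumer.commitSync();      // commits the offset of the last polled record, plus one, per partition
    }
}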

The consumer group is a somewhat understated concept that is pivotal to the versatility of an event streaming platform. By simply varying the affinity of consumers with their groups, one can arrive at vastly different distribution topologies — from a topic-like, pub-sub behavior to an MQ-style, point-to-point model. Because records are never truly consumed (the advancing offset only creates the illusion of consumption), one can concurrently superimpose disparate distribution topologies over a single event stream.

Free Consumers

Consumer groups are completely optional; a consumer does not need to be encompassed in a consumer group to pull messages from a topic. A free consumer omits the group.id property. Doing so allows it to operate under relaxed rules, entirely transferring the responsibility for consumer management to the application.

Note: The use of the term ‘free’ to denote a consumer without an encompassing group is not part of the standard Kafka nomenclature. As Kafka lacks a canonical term to describe this, the term ‘free’ was adopted here.

Free consumers do not subscribe to a topic. Instead, the consuming application is responsible for manually assigning a set of topic-partitions to the consumer, individually specifying the starting offset for each topic-partition pair. Free consumers do not commit their offsets to Kafka; it is up to the application to track the progress of such consumers and persist their state as appropriate, using a datastore of their choosing. The concepts of automatic partition assignment, rebalancing, offset persistence, partition exclusivity, consumer heart-beating and failure detection, and other so-called niceties accorded to consumer groups cease to exist in this mode.
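
A minimal free-consumer sketch (no group.id; the broker address, topic, and starting offset are assumptions) –

import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class FreeConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");    // note: no group.id is set
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition partition = new TopicPartition("streams-intro", 0);
            consumer.assign(List.of(partition));              // manual assignment; no subscription, no rebalancing
            consumer.seek(partition, 0);                      // explicitly chosen starting offset
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            records.forEach(r -> System.out.println(r.offset() + ": " + r.value()));
        }
    }
}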

Free consumers are not observed in the wild as often as their grouped counterparts. There are predominantly two use cases where a free consumer is an appropriate choice. The first is when you genuinely need full control of the partition assignment scheme and/or you require an alternative place to store consumer offsets. This is very rare.

Needless to say, it’s also very difficult to implement correctly, given the multitude of scenarios one must account for. The second, more commonly seen use case, is when you have a stateless or ephemeral consumer that needs to monitor a topic. For example, you might be interested in tailing a topic to identify specific records, or just as a debugging tool. You might only care about records that were published when your stateless consumer was online, so concerns such as persisting offsets and resuming from the last processed record are completely irrelevant.

A good example of where this is used routinely is the Kafdrop web UI, which we’ve already seen. When you click on a topic to view the messages, Kafdrop creates a free consumer and assigns the requested partition to it, reading the records from the supplied offsets. Navigating to a different topic or partition will reset the consumer, discarding any prior state.

The illustration below outlines the relationship between producers, topics, partitions, consumers, and consumer groups.

+----------+          +----------+
|PRODUCER 1|          |PRODUCER 2|
+-----v----+          +-----v----+
      |                     |
      |                     |
      |                     |
 +----V---------------------V-----------------------------------------+
 |                            >>> TOPIC >>>                           |
 |            +---------------------------------------------------+   |
 | PARTITION 0|record 0..00|record 0..01|record 0..02|record 0..03|   |
 |            +--------------------v------------------------------+   |
 |                                 |                                  |
 |            +--------------------|------------------------------+   |
 | PARTITION 1|record 0..00|       |    |record 0..02|record 0..03|   |
 |            +--------------------|-------------v----------------+   |
 |                                 |             |                    |
 +----------v----------------------|-------------|--------------------+
            |                      |             |
            |                      |             |
            |                      |             |
            |              +-------|-------------|----------------------+
            |              |       |             |                      |
       +----V-----+        | +-----V----+   +----V-----+   +----------+ |
       |CONSUMER 1|        | |CONSUMER 2|   |CONSUMER 3|   |CONSUMER 4| |
       +----------+        | +----------+   +----------+   +----------+ |
                           |               CONSUMER GROUP               |
                           +--------------------------------------------+

The key takeaways are:

  • Topics are subdivided into partitions, each forming an independent, totally-ordered sequence within a wider, partially-ordered stream.
  • Multiple producers are able to publish to a topic, picking a partition at will. This may be accomplished either directly, by specifying a partition index, or indirectly, by way of a record key, which deterministically hashes to a consistent partition index. (In the diagram above, both Producer 1 and Producer 2 publish to the same topic.)
  • Partitions in a topic can be load-balanced across a population of consumers in a consumer group, allocating partitions approximately evenly among the members of that group. (Consumer 2 and Consumer 3 each get one partition.)
  • A consumer in a group is not guaranteed a partition assignment. Where the group’s population outnumbers the partitions, some consumers will remain idle until this balance equalizes or tips in favor of the other side. (Consumer 4 remains partition-less.)
  • Partitions may be manually assigned to free consumers. If necessary, an entire topic may be assigned to a single free consumer — this is done by individually assigning all partitions. (Consumer 1 can be freely assigned any partition.)

Exactly-Once Delivery

When contrasting at-least-once with at-most-once delivery semantics, an often-asked question is: Why can’t we have it exactly once?

Without delving into the academic details, which involve conjectures and impossibility proofs, it is sufficient to say that exactly-once semantics are not possible without collaboration with the consumer application. What does this mean in practice?

Consumers in event streaming applications must be idempotent. In other words, processing the same record repeatedly should have no net effect on the consumer ecosystem. If a record has no additive effects, the consumer is inherently idempotent. (For example, if the consumer simply overwrites an existing database entry with a new one, then the update is naturally idempotent.) Otherwise, the consumer must check whether a record has already been processed, and to what extent, before acting on it. The combination of at-least-once delivery and consumer idempotence collectively leads to exactly-once semantics.
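
To illustrate the idea, the sketch below guards an additive operation with a check against a record identifier. The in-memory set and the recordId naming are simplifying assumptions; a real consumer would keep this state in a durable store, ideally updated in the same transaction as its other side effects.

class IdempotentHandler {
    // Sketch only: held in memory for brevity; a real consumer would persist this state.
    private val processed = mutableSetOf<String>()

    // Applies the action at most once per record ID; duplicates are silently skipped.
    fun handle(recordId: String, action: () -> Unit) {
        if (processed.add(recordId)) {
            action()
        }
    }
}

fun main() {
    val handler = IdempotentHandler()
    var balance = 0

    // The same record delivered twice (at-least-once) has a single net effect.
    repeat(2) {
        handler.handle("order-42") { balance += 100 }
    }
    println(balance)   // prints 100, not 200
}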

Example: A Trading Platform

With all this theory looming over us like Kubrick’s Monolith, it would be inappropriate to conclude without offering the reader a practical scenario.

Let’s say you were looking for specific price patterns in listed stocks, emitting trading signals when a particular pattern is identified. There are a large number of stocks, and understandably you’d like them processed in parallel. However, the time series for any given ticker code must be processed sequentially on a single consumer.

Kafka makes this use case, and others like it, almost trivial to implement. We would create a pair of topics: prices for the raw price data, and orders for any resulting orders. We can be fairly generous with our partition counts, as the nature of the data gives us ample opportunities for parallelism.

At the feed source, we could publish a record for each price on the prices topic, keyed by the ticker code. Kafka’s automatic partition assignment will ensure that every ticker code is handled by (at most) one consumer in its group. The consumer instances are free to scale in and out to match the processing load. Consumer groups should be meaningfully named, ideally reflecting the purpose of the consuming application. A good example might be trading-strategy.abc, for a fictitious trading strategy named ‘ABC’.
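
As a rough sketch, publishing a single keyed price record from Kotlin might look as follows; the broker address and the string payload are assumptions made purely for brevity.

import org.apache.kafka.clients.producer.KafkaProducer
import org.apache.kafka.clients.producer.ProducerRecord
import java.util.Properties

fun main() {
    val props = Properties().apply {
        put("bootstrap.servers", "localhost:9092")   // assumed broker address
        put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    }

    KafkaProducer<String, String>(props).use { producer ->
        // Keying by ticker code hashes every price for a given symbol to the
        // same partition, preserving per-symbol ordering.
        val record = ProducerRecord("prices", "AAPL", """{"symbol": "AAPL", "bid": 151.20, "ask": 151.25}""")
        producer.send(record)
    }
}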

Once a price pattern is identified by the consumer, it can publish another message — the order request — on the orders topic. We’ll muster up another consumer group — order-execution — responsible for reading the orders and forwarding them to the broker.
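
For contrast with the free consumer shown earlier, a member of the order-execution group simply subscribes to the orders topic and lets Kafka assign partitions. This is only a sketch; forwardToBroker() stands in for whatever order-routing logic the application supplies, and the broker address is again assumed.

import org.apache.kafka.clients.consumer.KafkaConsumer
import java.time.Duration
import java.util.Properties

// Placeholder for the application's order-routing logic.
fun forwardToBroker(order: String) = println("Forwarding order: $order")

fun main() {
    val props = Properties().apply {
        put("bootstrap.servers", "localhost:9092")   // assumed broker address
        put("group.id", "order-execution")           // membership in a consumer group
        put("enable.auto.commit", "false")           // commit manually, after processing
        put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
        put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    }

    KafkaConsumer<String, String>(props).use { consumer ->
        consumer.subscribe(listOf("orders"))         // Kafka handles assignment and rebalancing
        while (true) {
            val records = consumer.poll(Duration.ofMillis(500))
            records.forEach { forwardToBroker(it.value()) }
            consumer.commitSync()                    // committing after processing yields at-least-once delivery
        }
    }
}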

In this simple example, we have created an end-to-end trading pipeline that is entirely event-driven and highly scalable — at least theoretically, assuming there are no other bottlenecks. We can dynamically add more processing nodes to the individual stages to cope with the increased load where it’s called for.

Now, let’s spice things up a bit. Suppose you need several trading strategies operating concurrently, driven by a common data feed. Furthermore, the trading strategies will be developed by different teams; the objective being to decouple these implementations as much as possible, allowing the teams to operate autonomously — develop and deploy at their individual cadence, perhaps even using different programming languages and tool-chains. That said, you’d ideally want to reuse as much of what’s already been written. So, how would we pull this off? 

Trading Platform

Kafka’s flexible multipoint-to-multipoint pub-sub architecture combines stateful consumption with broadcast semantics. Using distinct consumer groups, Kafka allows disparate applications to share input topics, processing events at their own pace. The second trading strategy would need a dedicated consumer group — trading-strategy.xyz — applying its specific business logic to the common pricing stream, publishing the resulting orders to the same orders topic. In this fashion, Kafka enables you to construct modular event processing pipelines from discrete elements that are readily reusable and composable.

Note: In the days of service buses and traditional ‘enterprisey’ message brokers, before event sourcing entered the mainstream, you would have had to choose between persistent message queues or transient broadcast topics. In our example, you would likely have created multiple FIFO queues, using the fan-out pattern. Because Kafka generalises pub-sub topics and persistent message queues into a unified model, a single source topic can power a diverse range of consumers without incurring duplication.

In Conclusion

Event streaming platforms are a highly effective building block in the construction of modular, loosely-coupled, event-driven applications. Within the world of event streaming, Kafka has solidified its position as the go-to open-source solution that is both amazingly flexible and highly performant. Concurrency and parallelism are at the heart of Kafka’s architecture, forming partially-ordered event streams that can be load-balanced across a scalable consumer ecosystem. A simple reconfiguration of consumers and their encompassing groups can bring about vastly different event distribution and processing semantics; shifting the offset commit point can invert the delivery guarantee from an at-most-once to an at-least-once model.

Of course, Kafka isn’t without its flaws. The tooling is sub-par, to put it mildly; most Kafka practitioners have long abandoned the out-of-the-box CLI utilities in favour of other open-source tools such as Kafdrop, Kafkacat, and third-party commercial offerings like Kafka Tool. The breadth of Kafka’s configuration options is overwhelming, with defaults that are riddled with gotchas, ready to shock the unsuspecting first-time user.

All in all, Kafka represents a paradigm shift in how we architect and build complex systems. Its benefits go beyond the superficial, and they dwarf any of the niggles that are bound to exist in a technology that has undergone such aggressive adoption. Crucially, it paves the way for further progress in its space; Apache Pulsar is a prime example of an alternative platform that has addressed many of Kafka’s shortcomings, yet owes a great deal to its predecessor for laying the cornerstone and bringing the genre to the mainstream.