Category: Spring Boot

The Spring Framework is an application framework and inversion of control container for the Java platform. The framework’s core features can be used by any Java application, but there are extensions for building web applications on top of the Java EE platform.

Spring Boot Database Migrations with Flyway

05 Mar 2021

Nothing is perfect and complete when it is built the first time. The database schema of your brand new application is no exception. It is bound to change over time when you try to fit new requirements or add new features.

Flyway is a tool that lets you version control incremental changes to your database so that you can migrate it to a new version easily and confidently.

In this article, you’ll learn how to use Flyway in Spring Boot applications to manage changes to your database.

We’ll build a simple Spring Boot application with MySQL Database & Spring Data JPA, and learn how to integrate Flyway in the app.

Let’s get started!

Creating the Application

Let’s use Spring Boot CLI to generate the application. Fire up your terminal and type the following command to generate the project.

spring init --name=flyway-demo --dependencies=web,mysql,data-jpa,flyway flyway-demo

Once the project is generated, import it into your favorite IDE. The directory structure of the application would look like this –

Spring Boot Flyway Database Migration Example Directory Structure

Configuring MySQL and Hibernate

First create a new MySQL database named flyway_demo.

Then open the src/main/resources/application.properties file and add the following properties to it –

## Spring DATASOURCE (DataSourceAutoConfiguration & DataSourceProperties)
spring.datasource.url = jdbc:mysql://localhost:3306/flyway_demo?useSSL=false
spring.datasource.username = root
spring.datasource.password = root

## Hibernate Properties
# The SQL dialect makes Hibernate generate better SQL for the chosen database
spring.jpa.properties.hibernate.dialect = org.hibernate.dialect.MySQL5InnoDBDialect

## This is important
# Hibernate ddl auto (create, create-drop, validate, update)
spring.jpa.hibernate.ddl-auto = validate

Please change spring.datasource.username and spring.datasource.password as per your MySQL installation.

The property spring.jpa.hibernate.ddl-auto is important. With the value validate, Hibernate validates the database schema against the entities defined in the application and throws an error if the schema doesn’t match the entity specifications.

Creating a Domain Entity

Let’s create a simple Entity in our application so that we can create and test a Flyway migration for it.

First, create a new package called domain inside the com.example.flywaydemo package. Then, create the following User.java file inside the com.example.flywaydemo.domain package –

package com.example.flywaydemo.domain;

import javax.persistence.*;
import javax.validation.constraints.NotBlank;
import javax.validation.constraints.Size;

@Entity
@Table(name = "users")
public class User {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @NotBlank
    @Column(unique = true)
    @Size(min = 1, max = 100)
    private String username;

    @NotBlank
    @Size(max = 50)
    private String firstName;

    @Size(max = 50)
    private String lastName;

    public Long getId() {
        return id;
    }

    public void setId(Long id) {
        this.id = id;
    }

    public String getUsername() {
        return username;
    }

    public void setUsername(String username) {
        this.username = username;
    }

    public String getFirstName() {
        return firstName;
    }

    public void setFirstName(String firstName) {
        this.firstName = firstName;
    }

    public String getLastName() {
        return lastName;
    }

    public void setLastName(String lastName) {
        this.lastName = lastName;
    }
}

Creating a Flyway Migration script

Flyway tries to read database migration scripts from classpath:db/migration folder by default.

All the migration scripts must follow a particular naming convention – V<VERSION_NUMBER>__<NAME>.sql. Check out the official Flyway documentation to learn more about the naming convention.
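For example, the migration scripts for this demo could evolve like this (the third file name is hypothetical, shown only to illustrate the convention) –

V1__init.sql
V2__testdata.sql
V3__add_last_login_column.sql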

Let’s create our very first database migration script. First, create the db/migration folder inside the src/main/resources directory –

mkdir -p src/main/resources/db/migration

Now, create a new file named V1__init.sql inside the src/main/resources/db/migration directory and add the following SQL script –

CREATE TABLE users (
  id bigint(20) NOT NULL AUTO_INCREMENT,
  username varchar(100) NOT NULL,
  first_name varchar(50) NOT NULL,
  last_name varchar(50) DEFAULT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY UK_username (username)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Running the Application

When you run the application, Flyway will automatically check the current database version and apply any pending migrations.
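Spring Boot wires this up for you at startup. For reference, here is a minimal sketch of the roughly equivalent programmatic Flyway call, assuming the same datasource settings as in application.properties (the class name ManualMigration is hypothetical) –

import org.flywaydb.core.Flyway;

public class ManualMigration {
    public static void main(String[] args) {
        // Configure Flyway against the same database and apply any
        // pending migrations found under classpath:db/migration
        Flyway flyway = Flyway.configure()
                .dataSource("jdbc:mysql://localhost:3306/flyway_demo?useSSL=false", "root", "root")
                .load();
        flyway.migrate();
    }
}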

Run the app by typing the following command in terminal –

mvn spring-boot:run

When running the app for the first time, you’ll see the following logs pertaining to Flyway, which says that it has migrated the schema to version 1 – init.

Flyway Migration Logs

How does Flyway manage migrations?

Flyway creates a table called flyway_schema_history when it runs the migration for the first time and stores all the meta-data required for versioning the migrations in this table.

mysql> show tables;
+-----------------------+
| Tables_in_flyway_demo |
+-----------------------+
| flyway_schema_history |
| users                 |
+-----------------------+
2 rows in set (0.00 sec)

You can check the contents of flyway_schema_history table in your mysql database –

mysql> select * from flyway_schema_history;
+----------------+---------+-------------+------+--------------+------------+--------------+---------------------+----------------+---------+
| installed_rank | version | description | type | script       | checksum   | installed_by | installed_on        | execution_time | success |
+----------------+---------+-------------+------+--------------+------------+--------------+---------------------+----------------+---------+
|              1 | 1       | init        | SQL  | V1__init.sql | 1952043475 | root         | 2018-03-06 11:25:58 |             16 |       1 |
+----------------+---------+-------------+------+--------------+------------+--------------+---------------------+----------------+---------+
1 row in set (0.00 sec)

It stores the current migration’s version, script file name, and checksum among other details in the table.

When you run the app, Flyway first validates the already applied migration scripts by calculating their checksum and matching it with the checksum stored in the meta-data table.

So, if you change V1__init.sql after the migration is applied, Flyway will throw an error saying that the checksum doesn’t match.

Therefore, to make any change to the schema, you need to create another migration script file with a new version, V2__<NAME>.sql, and let Flyway apply it when you run the app.

Adding multiple migrations

Let’s create another migration script and see how Flyway migrates the database to the new version automatically.

Create a new script V2__testdata.sql inside src/main/resources/db/migration with the following contents –

INSERT INTO users(username, first_name, last_name) 
VALUES('fusebes', 'Yaniv', 'Levy');
INSERT INTO users(username, first_name, last_name) VALUES('flywaytest', 'Flyway', 'Test');

If you run the app now, Flyway will detect the new migration script and migrate the database to this version.

Open mysql and check the users table. You’ll see that the above two entries are automatically created in the users table –

mysql> select * from users;
+----+------------+-----------+------------+
| id | first_name | last_name | username   |
+----+------------+-----------+------------+
|  4 | Yaniv      | Levy      | fusebes    |
|  5 | Flyway     | Test      | flywaytest |
+----+------------+-----------+------------+
2 rows in set (0.01 sec)

Also, Flyway stores this new schema version in its meta-data table –

mysql> select * from flyway_schema_history;
+----------------+---------+-------------+------+------------------+-------------+--------------+---------------------+----------------+---------+
| installed_rank | version | description | type | script           | checksum    | installed_by | installed_on        | execution_time | success |
+----------------+---------+-------------+------+------------------+-------------+--------------+---------------------+----------------+---------+
|              1 | 1       | init        | SQL  | V1__init.sql     |  1952043475 | root         | 2018-03-06 11:25:58 |             16 |       1 |
|              2 | 2       | testdata    | SQL  | V2__testdata.sql | -1926058189 | root         | 2018-03-06 11:25:58 |              6 |       1 |
+----------------+---------+-------------+------+------------------+-------------+--------------+---------------------+----------------+---------+
2 rows in set (0.00 sec)

Conclusion

That’s all, folks! In this article, you learned how to integrate Flyway in a Spring Boot application for versioning database changes.

Thank you for reading. See you in the next blog post.

Why We Use Spring Boot Maven Plugin?

30 Mar 2021

The Spring Boot Maven plugin provides Spring Boot support in Maven, letting us package executable jar or war archives and run an application “in-place”. To use it, we must use Maven 3.2 (or later).

The plugin provides several goals to work with a Spring Boot application:

  • spring-boot:repackage: create a jar or war file that is auto-executable. It can replace the regular artifact or can be attached to the build lifecycle with a separate classifier.
  • spring-boot:run: run your Spring Boot application with several options to pass parameters to it.
  • spring-boot:start and stop: integrate your Spring Boot application into the integration-test phase so that the application starts before it.
  • spring-boot:build-info: generate build information that can be used by the Actuator.
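For example, these goals can be invoked directly from the command line (a minimal sketch – spring-boot:repackage normally runs as part of the package phase) –

mvn package spring-boot:repackage
mvn spring-boot:run
mvn spring-boot:build-info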

JPA / Hibernate Composite Primary Key Example with Spring Boot

03 Mar 2021

In this article, You’ll learn how to map a composite primary key in Hibernate using JPA’s @Embeddable and @EmbeddedId annotations.

Let’s say that we have an application that manages the employees of various companies. Every employee has a unique employeeId within his company, but the same employeeId can be present in other companies as well, so we cannot uniquely identify an employee just by his employeeId.

To identify an employee uniquely, we need to know both his employeeId and companyId. Check out the following Employees table, which contains a composite primary key covering both the employeeId and companyId columns –

Hibernate Composite Primary Key Example Table Structure

Let’s create a project from scratch and learn how to map such composite primary key using JPA and Hibernate.

Creating the Project

You can generate the project quickly using Spring Boot CLI by typing the following command in the terminal –

spring init -n=jpa-composite-primary-key-demo -d=web,jpa,mysql --package-name=com.example.jpa jpa-composite-primary-key-demo

Alternatively, you can use the Spring Initializr web app to generate the project by following the instructions below –

  1. Open http://start.spring.io
  2. Enter Artifact as “jpa-composite-primary-key-demo”
  3. Click Options dropdown to see all the options related to project metadata.
  4. Change Package Name to “com.example.jpa”
  5. Select Web, JPA and MySQL dependencies.
  6. Click Generate to generate and download the project.

Following is the directory structure of the complete application for your reference –

JPA, Hibernate, Spring Boot Composite Primary Key Example Directory Structure

(Your bootstrapped project won’t have the model and repository packages and all the other classes. We’ll create them as we proceed to the next sections.)

Configuring the Database and Hibernate Log Levels

Let’s add the MySQL database URL, username and password configurations in src/main/resources/application.properties file –

# DATASOURCE (DataSourceAutoConfiguration & DataSourceProperties)
spring.datasource.url=jdbc:mysql://localhost:3306/jpa_composite_pk_demo?useSSL=false&serverTimezone=UTC&useLegacyDatetimeCode=false
spring.datasource.username=root
spring.datasource.password=root

# Hibernate

# The SQL dialect makes Hibernate generate better SQL for the chosen database
spring.jpa.properties.hibernate.dialect = org.hibernate.dialect.MySQL5InnoDBDialect

# Hibernate ddl auto (create, create-drop, validate, update)
spring.jpa.hibernate.ddl-auto = update

logging.level.org.hibernate.SQL=DEBUG
logging.level.org.hibernate.type=TRACE

Apart from the MySQL database configurations, I’ve also specified the Hibernate log levels and other properties.

The property spring.jpa.hibernate.ddl-auto = update keeps the Entity types in your application and the mapped database tables in sync. Whenever you update a domain entity, the corresponding mapped table in the database will also get updated when you restart the application next time.

This is great for development because you don’t need to manually create or update the tables. They will automatically be created/updated based on the Entity classes in your application.

Before proceeding to the next section, please make sure that you create a MySQL database named jpa_composite_pk_demo and change the spring.datasource.username and spring.datasource.password properties as per your MySQL installation.

Defining the Domain model

A composite primary key is mapped using an Embeddable type in Hibernate. We’ll first create an Embeddable type called EmployeeIdentity containing the employeeId and companyId fields, and then create the Employee entity that embeds the EmployeeIdentity type.

Create a new package named model inside com.example.jpa package and then add the following classes inside the model package –

1. EmployeeIdentity – Embeddable Type

package com.example.jpa.model;

import javax.persistence.Embeddable;
import javax.validation.constraints.NotNull;
import javax.validation.constraints.Size;
import java.io.Serializable;

@Embeddable
public class EmployeeIdentity implements Serializable {
    @NotNull
    @Size(max = 20)
    private String employeeId;

    @NotNull
    @Size(max = 20)
    private String companyId;

    public EmployeeIdentity() {

    }

    public EmployeeIdentity(String employeeId, String companyId) {
        this.employeeId = employeeId;
        this.companyId = companyId;
    }

    public String getEmployeeId() {
        return employeeId;
    }

    public void setEmployeeId(String employeeId) {
        this.employeeId = employeeId;
    }

    public String getCompanyId() {
        return companyId;
    }

    public void setCompanyId(String companyId) {
        this.companyId = companyId;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;

        EmployeeIdentity that = (EmployeeIdentity) o;

        if (!employeeId.equals(that.employeeId)) return false;
        return companyId.equals(that.companyId);
    }

    @Override
    public int hashCode() {
        int result = employeeId.hashCode();
        result = 31 * result + companyId.hashCode();
        return result;
    }
}

2. Employee – Domain model

package com.example.jpa.model;

import org.hibernate.annotations.NaturalId;
import javax.persistence.Column;
import javax.persistence.EmbeddedId;
import javax.persistence.Entity;
import javax.persistence.Table;
import javax.validation.constraints.NotNull;
import javax.validation.constraints.Email;
import javax.validation.constraints.Size;

@Entity
@Table(name = "employees")
public class Employee {

    @EmbeddedId
    private EmployeeIdentity employeeIdentity;

    @NotNull
    @Size(max = 60)
    private String name;

    @NaturalId
    @NotNull
    @Email
    @Size(max = 60)
    private String email;

    @Size(max = 15)
    @Column(name = "phone_number", unique = true)
    private String phoneNumber;

    public Employee() {

    }

    public Employee(EmployeeIdentity employeeIdentity, String name, String email, String phoneNumber) {
        this.employeeIdentity = employeeIdentity;
        this.name = name;
        this.email = email;
        this.phoneNumber = phoneNumber;
    }

    // Getters and Setters (Omitted for brevity)
}

In the Employee class, we use the @EmbeddedId annotation to embed the EmployeeIdentity type and mark it as the primary key.

Creating the Repository

Next, let’s create the repository for accessing the Employee data from the database. First, create a new package named repository inside the com.example.jpa package, then add the following interface inside the repository package –

package com.example.jpa.repository;

import com.example.jpa.model.Employee;
import com.example.jpa.model.EmployeeIdentity;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.stereotype.Repository;

@Repository
public interface EmployeeRepository extends JpaRepository<Employee, EmployeeIdentity> {

}

Code to test the Composite Primary Key Mapping

Finally, let’s write some code to test the composite primary key mapping. Open the main class JpaCompositePrimaryKeyDemoApplication.java and replace its contents with the following code –

package com.example.jpa;

import com.example.jpa.model.Employee;
import com.example.jpa.model.EmployeeIdentity;
import com.example.jpa.repository.EmployeeRepository;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class JpaCompositePrimaryKeyDemoApplication implements CommandLineRunner {

    @Autowired
    private EmployeeRepository employeeRepository;

    public static void main(String[] args) {
        SpringApplication.run(JpaCompositePrimaryKeyDemoApplication.class, args);
    }

    @Override
    public void run(String... args) throws Exception {
        // Cleanup employees table
        employeeRepository.deleteAllInBatch();

        // Insert a new Employee in the database
        Employee employee = new Employee(new EmployeeIdentity("E-123", "D-457"),
                "Yaniv Levy",
                "yaniv@fusebes.com",
                "+91-9999999999");

        employeeRepository.save(employee);
    }
}

We first clean up the Employee table and then insert a new Employee record with an employeeId and a companyId to test the setup.

You can run the application by typing mvn spring-boot:run from the root directory of the project. The Employee record will be inserted in the database once the application is successfully started.

Querying using the Composite Primary Key

Let’s now see some query examples using the composite primary key –

1. Retrieving an Employee using the composite primary key – (employeeId and companyId)

// Retrieving an Employee Record with the composite primary key
employeeRepository.findById(new EmployeeIdentity("E-123", "D-457"));
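Note that findById() returns an Optional<Employee>, so a fuller usage might look like this (a minimal sketch, using the getters omitted above for brevity) –

// Print the employee's name if a record with this composite key exists
employeeRepository.findById(new EmployeeIdentity("E-123", "D-457"))
        .ifPresent(employee -> System.out.println(employee.getName()));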

2. Retrieving all employees of a particular company

Let’s say that you want to retrieve all the employees of a company by companyId. To do this, just add the following method to the EmployeeRepository interface.

@Repository
public interface EmployeeRepository extends JpaRepository<Employee, EmployeeIdentity> {
    /* 
       Spring Data JPA will automatically parse this method name 
       and create a query from it
    */
    List<Employee> findByEmployeeIdentityCompanyId(String companyId);
}

That’s all! You don’t need to implement anything. Spring Data JPA will dynamically generate a query using the method name. You can use the above method in the main class to retrieve all the employees of a company like this –

// Retrieving all the employees of a particular company
employeeRepository.findByEmployeeIdentityCompanyId("D-457");

Conclusion

Congratulations guys! In this article, you learned how to implement a composite primary key in Hibernate using the @Embeddable and @EmbeddedId annotations.

Thanks for reading. See you in the next post.

Spring Boot 2 @SpringBootApplication Auto Configuration

30 Mar 2021

Spring Boot is very easy to use, and it does a lot of things under the hood that you might not be aware of. A good developer should know exactly what is going on behind Spring Boot’s auto-configuration, how to use it in their favor, and how to disable the parts they do not want in their project.

To understand the most basic things behind Spring Boot, we will create a minimal Boot application with a single dependency and a single launch class file. We will then analyze the startup logs to get some insights.

Create a Spring Boot application with a launch class

  1. Create a new Maven project in Eclipse with the archetype “maven-archetype-quickstart”.
  2. Update the pom.xml file with the spring-boot-starter-web dependency and plugin information.

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

    <modelVersion>4.0.0</modelVersion>

    <groupId>com.fusebes</groupId>
    <artifactId>springbootdemo</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <packaging>jar</packaging>

    <name>springbootdemo</name>
    <url>http://maven.apache.org</url>

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.0.0.RELEASE</version>
    </parent>

    <properties>
        <java.version>1.8</java.version>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>

    <repositories>
        <repository>
            <id>repository.spring.release</id>
            <name>Spring GA Repository</name>
            <url>http://repo.spring.io/release</url>
        </repository>
    </repositories>
</project>

  3. Create the launch application.

package com.fusebes.demo;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.ApplicationContext;

@SpringBootApplication
public class App {
    public static void main(String[] args) {
        ApplicationContext ctx = SpringApplication.run(App.class, args);
    }
}

What does this launch class do?

The above class is called the Spring Boot application launch class. It is used to bootstrap and launch a Spring application from a Java main() method. It typically does the following things –
    • Create an instance of Spring’s ApplicationContext.
    • Enable the functionality to accept command-line arguments and expose them as Spring properties.
    • Load all the Spring beans as per the configuration. You can perform other operations as well, as your project’s needs arise.

@SpringBootApplication Annotation

This annotation is a shortcut for applying 3 annotations in one statement –

  1. @SpringBootConfiguration – @SpringBootConfiguration is a new annotation in Spring Boot 2. Previously, we had been using the @Configuration annotation, and you can still use @Configuration in its place; both are the same thing. It indicates that a class provides Spring Boot application @Configuration. It simply means that the annotated class is a configuration class and shall be scanned for further configurations and bean definitions.
  2. @EnableAutoConfiguration – This annotation is used to enable auto-configuration of the Spring application context, attempting to guess and configure beans that you are likely to need. Auto-configuration classes are usually applied based on your classpath and the beans you have defined. Auto-configuration tries to be as intelligent as possible and will back away as you define more of your own configuration. You can always manually exclude any configuration that you never want to apply using two methods –
     i) using excludeName(), e.g. @EnableAutoConfiguration(excludeName = {"multipartResolver","mbeanServer"})
     ii) using the spring.autoconfigure.exclude property in the properties file.
     Auto-configuration is always applied after user-defined beans have been registered.
  3. @ComponentScan – This annotation provides support parallel to Spring XML’s <context:component-scan> element. Either basePackageClasses() or basePackages() may be specified to define specific packages to scan. If specific packages are not defined, scanning will occur from the package of the class that declares this annotation, as shown in the sketch below.
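For example, a minimal sketch that restricts component scanning to a specific package (the package name here is hypothetical) –

@SpringBootConfiguration
@EnableAutoConfiguration
@ComponentScan(basePackages = "com.fusebes.demo.web")
public class App {
    // ...
}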

Run the launch application and check logs

Let’s start with the simplest option – running it as a Java application. In your IDE, right-click on the application class and run it as a Java Application. To get insight into the registered beans, I have modified the launch application as below.

package com.fusebes.demo;

import java.util.Arrays;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.ApplicationContext;

@SpringBootApplication
public class App {
    public static void main(String[] args) {
        ApplicationContext ctx = SpringApplication.run(App.class, args);

        // Print the names of all registered beans in sorted order
        String[] beanNames = ctx.getBeanDefinitionNames();
        Arrays.sort(beanNames);

        for (String beanName : beanNames) {
            System.out.println(beanName);
        }
    }
}

Now see the logs –

  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::        (v2.0.0.RELEASE)

2018-04-02 13:09:41.100  INFO 11452 --- [           main] com.fusebes.demo.App                     : Starting App on FFC15B4E9C5AA with PID 11452 (C:\Users\zkpkhua\IDPPaymentTransfers_Integrated\springbootdemo\target\classes started by zkpkhua in C:\Users\zkpkhua\IDPPaymentTransfers_Integrated\springbootdemo)
2018-04-02 13:09:41.108  INFO 11452 --- [           main] com.fusebes.demo.App                     : No active profile set, falling back to default profiles: default
2018-04-02 13:09:41.222  INFO 11452 --- [           main] ConfigServletWebServerApplicationContext : Refreshing org.springframework.boot.web.servlet.context.AnnotationConfigServletWebServerApplicationContext@4450d156: startup date [Mon Apr 02 13:09:41 IST 2018]; root of context hierarchy
2018-04-02 13:09:43.474  INFO 11452 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat initialized with port(s): 8080 (http)
2018-04-02 13:09:43.526  INFO 11452 --- [           main] o.apache.catalina.core.StandardService   : Starting service [Tomcat]
2018-04-02 13:09:43.526  INFO 11452 --- [           main] org.apache.catalina.core.StandardEngine  : Starting Servlet Engine: Apache Tomcat/8.5.28
2018-04-02 13:09:43.748  INFO 11452 --- [ost-startStop-1] o.a.c.c.C.[Tomcat].[localhost].[/]       : Initializing Spring embedded WebApplicationContext
2018-04-02 13:09:43.748  INFO 11452 --- [ost-startStop-1] o.s.web.context.ContextLoader            : Root WebApplicationContext: initialization completed in 2531 ms
2018-04-02 13:09:43.964  INFO 11452 --- [ost-startStop-1] o.s.b.w.servlet.ServletRegistrationBean  : Servlet dispatcherServlet mapped to [/]
2018-04-02 13:09:43.969  INFO 11452 --- [ost-startStop-1] o.s.b.w.servlet.FilterRegistrationBean   : Mapping filter: 'characterEncodingFilter' to: [/*]
2018-04-02 13:09:43.970  INFO 11452 --- [ost-startStop-1] o.s.b.w.servlet.FilterRegistrationBean   : Mapping filter: 'hiddenHttpMethodFilter' to: [/*]
2018-04-02 13:09:43.970  INFO 11452 --- [ost-startStop-1] o.s.b.w.servlet.FilterRegistrationBean   : Mapping filter: 'httpPutFormContentFilter' to: [/*]
2018-04-02 13:09:43.970  INFO 11452 --- [ost-startStop-1] o.s.b.w.servlet.FilterRegistrationBean   : Mapping filter: 'requestContextFilter' to: [/*]
2018-04-02 13:09:44.480  INFO 11452 --- [           main] s.w.s.m.m.a.RequestMappingHandlerAdapter : Looking for @ControllerAdvice: org.springframework.boot.web.servlet.context.AnnotationConfigServletWebServerApplicationContext@4450d156: startup date [Mon Apr 02 13:09:41 IST 2018]; root of context hierarchy
2018-04-02 13:09:44.627  INFO 11452 --- [           main] s.w.s.m.m.a.RequestMappingHandlerMapping : Mapped "{[/error]}" onto public org.springframework.http.ResponseEntity<java.util.Map<java.lang.String, java.lang.Object>> org.springframework.boot.autoconfigure.web.servlet.error.BasicErrorController.error(javax.servlet.http.HttpServletRequest)
2018-04-02 13:09:44.630  INFO 11452 --- [           main] s.w.s.m.m.a.RequestMappingHandlerMapping : Mapped "{[/error],produces=}" onto public org.springframework.web.servlet.ModelAndView org.springframework.boot.autoconfigure.web.servlet.error.BasicErrorController.errorHtml(javax.servlet.http.HttpServletRequest,javax.servlet.http.HttpServletResponse)
2018-04-02 13:09:44.681  INFO 11452 --- [           main] o.s.w.s.handler.SimpleUrlHandlerMapping  : Mapped URL path [/webjars/**] onto handler of type [class org.springframework.web.servlet.resource.ResourceHttpRequestHandler]
2018-04-02 13:09:44.682  INFO 11452 --- [           main] o.s.w.s.handler.SimpleUrlHandlerMapping  : Mapped URL path [/**] onto handler of type [class org.springframework.web.servlet.resource.ResourceHttpRequestHandler]
2018-04-02 13:09:44.747  INFO 11452 --- [           main] o.s.w.s.handler.SimpleUrlHandlerMapping  : Mapped URL path [/**/favicon.ico] onto handler of type [class org.springframework.web.servlet.resource.ResourceHttpRequestHandler]
2018-04-02 13:09:45.002  INFO 11452 --- [           main] o.s.j.e.a.AnnotationMBeanExporter        : Registering beans for JMX exposure on startup
2018-04-02 13:09:45.070  INFO 11452 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port(s): 8080 (http) with context path ''
2018-04-02 13:09:45.076  INFO 11452 --- [           main] com.fusebes.demo.App                     : Started App in 4.609 seconds (JVM running for 5.263)

app
basicErrorController
beanNameHandlerMapping
beanNameViewResolver
characterEncodingFilter
conventionErrorViewResolver
defaultServletHandlerMapping
defaultValidator
defaultViewResolver
dispatcherServlet
dispatcherServletRegistration
error
errorAttributes
errorPageCustomizer
errorPageRegistrarBeanPostProcessor
faviconHandlerMapping
faviconRequestHandler
handlerExceptionResolver
hiddenHttpMethodFilter
httpPutFormContentFilter
httpRequestHandlerAdapter
jacksonCodecCustomizer
jacksonObjectMapper
jacksonObjectMapperBuilder
jsonComponentModule
localeCharsetMappingsCustomizer
mappingJackson2HttpMessageConverter
mbeanExporter
mbeanServer
messageConverters
methodValidationPostProcessor
multipartConfigElement
multipartResolver
mvcContentNegotiationManager
mvcConversionService
mvcHandlerMappingIntrospector
mvcPathMatcher
mvcResourceUrlProvider
mvcUriComponentsContributor
mvcUrlPathHelper
mvcValidator
mvcViewResolver
objectNamingStrategy
org.springframework.boot.autoconfigure.AutoConfigurationPackages
org.springframework.boot.autoconfigure.condition.BeanTypeRegistry
org.springframework.boot.autoconfigure.context.ConfigurationPropertiesAutoConfiguration
org.springframework.boot.autoconfigure.context.PropertyPlaceholderAutoConfiguration
org.springframework.boot.autoconfigure.http.HttpMessageConvertersAutoConfiguration
org.springframework.boot.autoconfigure.jackson.JacksonAutoConfiguration
org.springframework.boot.autoconfigure.jmx.JmxAutoConfiguration
org.springframework.boot.autoconfigure.web.servlet.DispatcherServletAutoConfiguration
org.springframework.boot.autoconfigure.web.servlet.ServletWebServerFactoryAutoConfiguration
org.springframework.boot.autoconfigure.web.servlet.WebMvcAutoConfiguration
org.springframework.boot.autoconfigure.web.servlet.error.ErrorMvcAutoConfiguration
org.springframework.boot.autoconfigure.websocket.servlet.WebSocketServletAutoConfiguration
...
parameterNamesModule
preserveErrorControllerTargetClassPostProcessor
propertySourcesPlaceholderConfigurer
requestContextFilter
requestMappingHandlerAdapter
requestMappingHandlerMapping
resourceHandlerMapping
restTemplateBuilder
server-org.springframework.boot.autoconfigure.web.ServerProperties
servletWebServerFactoryCustomizer
simpleControllerHandlerAdapter
spring.http.encoding-org.springframework.boot.autoconfigure.http.HttpEncodingProperties
spring.info-org.springframework.boot.autoconfigure.info.ProjectInfoProperties
spring.jackson-org.springframework.boot.autoconfigure.jackson.JacksonProperties
spring.mvc-org.springframework.boot.autoconfigure.web.servlet.WebMvcProperties
spring.resources-org.springframework.boot.autoconfigure.web.ResourceProperties
spring.security-org.springframework.boot.autoconfigure.security.SecurityProperties
spring.servlet.multipart-org.springframework.boot.autoconfigure.web.servlet.MultipartProperties
standardJacksonObjectMapperBuilderCustomizer
stringHttpMessageConverter
tomcatServletWebServerFactory
tomcatServletWebServerFactoryCustomizer
tomcatWebServerFactoryCustomizer
viewControllerHandlerMapping
viewResolver
webServerFactoryCustomizerBeanPostProcessor
websocketContainerCustomizer
welcomePageHandlerMapping

You can see how many beans get registered automatically. That’s the beauty of Spring Boot. If you want to dig deeper into why a particular bean got registered, you can see that by putting a debug flag at application startup.

Simply pass -Ddebug=true as a VM argument.
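For example, when running the packaged jar directly (the jar name below comes from this project’s pom.xml) –

java -Ddebug=true -jar target/springbootdemo-0.0.1-SNAPSHOT.jar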

Now when you run the application, you will get lots of debug logs containing information similar to this –

CodecsAutoConfiguration.JacksonCodecConfiguration matched:
   - @ConditionalOnClass found required class 'com.fasterxml.jackson.databind.ObjectMapper'; @ConditionalOnMissingClass did not find unwanted class (OnClassCondition)

CodecsAutoConfiguration.JacksonCodecConfiguration#jacksonCodecCustomizer matched:
   - @ConditionalOnBean (types: com.fasterxml.jackson.databind.ObjectMapper; SearchStrategy: all) found bean 'jacksonObjectMapper' (OnBeanCondition)

DispatcherServletAutoConfiguration.DispatcherServletConfiguration matched:
   - @ConditionalOnClass found required class 'javax.servlet.ServletRegistration'; @ConditionalOnMissingClass did not find unwanted class (OnClassCondition)
   - Default DispatcherServlet did not find dispatcher servlet beans (DispatcherServletAutoConfiguration.DefaultDispatcherServletCondition)

DispatcherServletAutoConfiguration.DispatcherServletRegistrationConfiguration matched:
   - @ConditionalOnClass found required class 'javax.servlet.ServletRegistration'; @ConditionalOnMissingClass did not find unwanted class (OnClassCondition)
   - DispatcherServlet Registration did not find servlet registration bean (DispatcherServletAutoConfiguration.DispatcherServletRegistrationCondition)

.........

The logs above tell you why a particular bean was registered in the Spring context. This information is very useful when you debug issues with auto-configuration.

Similarly, every time we add a new dependency to a Spring Boot project, Spring Boot auto-configuration automatically tries to configure beans based on that dependency.

I hope the information discussed above will help you in the future while debugging Spring Boot related issues.

Happy Learning !!

Spring Boot Annotations

30 Mar 2021

The Spring Boot annotations are mostly placed in the org.springframework.boot.autoconfigure and org.springframework.boot.autoconfigure.condition packages. Let’s learn about some frequently used Spring Boot annotations, as well as those that work behind the scenes.

1. @SpringBootApplication

Spring Boot is mostly about auto-configuration. This auto-configuration is done through component scanning, i.e. finding all classes on the classpath annotated with @Component. It also involves scanning for the @Configuration annotation and initializing some extra beans.

The @SpringBootApplication annotation enables all of this in one step. It enables the following three features:

  1. @EnableAutoConfiguration : enable auto-configuration mechanism
  2. @ComponentScan : enable @Component scan
  3. @SpringBootConfiguration : register extra beans in the context

The Java class annotated with @SpringBootApplication is the main class of a Spring Boot application, and the application starts from here.

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class Application {

    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }
}

2. @EnableAutoConfiguration

This annotation enables auto-configuration of the Spring Application Context, attempting to guess and configure beans that we are likely to need based on the presence of predefined classes in classpath.

For example, if we have tomcat-embedded.jar on the classpath, we are likely to want a TomcatServletWebServerFactory.

As this annotation is already included via @SpringBootApplication, adding it again on the main class has no impact. It is also advised to include this annotation only once, via @SpringBootApplication.

Auto-configuration classes are regular Spring Configuration beans. They are located using the SpringFactoriesLoader mechanism (keyed against this class). Generally auto-configuration beans are @Conditional beans (most often using @ConditionalOnClass and @ConditionalOnMissingBean annotations).
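For example, a minimal sketch that excludes a specific auto-configuration class from being applied (DataSourceAutoConfiguration lives in org.springframework.boot.autoconfigure.jdbc) –

@SpringBootApplication(exclude = {DataSourceAutoConfiguration.class})
public class Application {
    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }
}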

3. @SpringBootConfiguration

It indicates that a class provides Spring Boot application configuration. It can be used as an alternative to Spring’s standard @Configuration annotation so that the configuration can be found automatically.

An application should only ever include one @SpringBootConfiguration, and most idiomatic Spring Boot applications will inherit it from @SpringBootApplication.

The main difference between the two annotations is that @SpringBootConfiguration allows the configuration to be located automatically. This can be especially useful for unit or integration tests.
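For example, a minimal integration test sketch (assuming JUnit on the test classpath) – @SpringBootTest searches upwards from the test package for a class annotated with @SpringBootConfiguration to load the application context –

@SpringBootTest
public class ApplicationTests {

    @Test
    public void contextLoads() {
        // The context loading itself is the assertion here
    }
}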

4. @ImportAutoConfiguration

It imports and applies only the specified auto-configuration classes. The difference between @ImportAutoConfiguration and @EnableAutoConfiguration is that the latter attempts to configure beans that are found on the classpath during scanning, whereas @ImportAutoConfiguration only runs the configuration classes that we provide in the annotation.

We should use @ImportAutoConfiguration when we don’t want to enable the default auto-configuration.

@ComponentScan("path.to.your.controllers")@ImportAutoConfiguration({WebMvcAutoConfiguration.class,DispatcherServletAutoConfiguration.class,EmbeddedServletContainerAutoConfiguration.class,ServerPropertiesAutoConfiguration.class,HttpMessageConvertersAutoConfiguration.class})public class App {public static void main(String[] args) {SpringApplication.run(App.class, args);}}

5. @AutoConfigureBefore, @AutoConfigureAfter, @AutoConfigureOrder

We can use the @AutoConfigureAfter or @AutoConfigureBefore annotations if our configuration needs to be applied in a specific order (before or after).

If we want to order certain auto-configurations that should not have any direct knowledge of each other, we can also use @AutoConfigureOrder. That annotation has the same semantic as the regular @Order annotation but provides a dedicated order for auto-configuration classes.

@Configuration
@AutoConfigureAfter(CacheAutoConfiguration.class)
@ConditionalOnBean(CacheManager.class)
@ConditionalOnClass(CacheStatisticsProvider.class)
public class RedissonCacheStatisticsAutoConfiguration {

    @Bean
    public RedissonCacheStatisticsProvider redissonCacheStatisticsProvider() {
        return new RedissonCacheStatisticsProvider();
    }
}

6. Condition Annotations

All auto-configuration classes generally have one or more @Conditional annotations. These allow a bean to be registered only when the condition is met. The following are some useful conditional annotations.

6.1. @ConditionalOnBean and @ConditionalOnMissingBean

These annotations let a bean be included based on the presence or absence of specific beans.

Its value attribute is used to specify beans by type or by name. Also, the search attribute lets us limit the ApplicationContext hierarchy that should be considered when searching for beans.

Using these annotations at the class level prevents registration of the @Configuration class as a bean if the condition does not match.

In the below example, the JpaTransactionManager bean will only be loaded if a bean of type JpaTransactionManager is not already defined in the application context.

@Bean
@ConditionalOnMissingBean(type = "JpaTransactionManager")
JpaTransactionManager transactionManager(EntityManagerFactory entityManagerFactory) {
    JpaTransactionManager transactionManager = new JpaTransactionManager();
    transactionManager.setEntityManagerFactory(entityManagerFactory);
    return transactionManager;
}

6.2. @ConditionalOnClass and @ConditionalOnMissingClass

These annotations let configuration classes be included based on the presence or absence of specific classes. Note that annotation metadata is parsed using the Spring ASM module, so even if a class is not present at runtime, you can still refer to it in the annotation.

We can use the value attribute to refer to the real class, or the name attribute to specify the class name as a String.

The below configuration will create EmbeddedAcmeService only if this class is available at runtime and no other bean with the same name is present in the application context.

@Configuration
@ConditionalOnClass(EmbeddedAcmeService.class)
static class EmbeddedConfiguration {

    @Bean
    @ConditionalOnMissingBean
    public EmbeddedAcmeService embeddedAcmeService() { ... }
}

6.3. @ConditionalOnNotWebApplication and @ConditionalOnWebApplication

These annotations let configuration be included depending on whether the application is a “web application” or not. In Spring, a web application is one that meets at least one of the following three requirements:

  1. uses a Spring WebApplicationContext
  2. defines a session scope
  3. has a StandardServletEnvironment
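For example, a minimal sketch of a configuration class that is applied only in a web application –

@Configuration
@ConditionalOnWebApplication
public class WebOnlyConfiguration {
    // beans declared here are registered only for web applications
}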

6.4. @ConditionalOnProperty

This annotation lets configuration be included based on the presence and value of a Spring Environment property.

For example, if we have different datasource definitions for different environments, we can use this annotation.

@Bean
@ConditionalOnProperty(name = "env", havingValue = "local")
DataSource dataSource() {
    // ...
}

@Bean
@ConditionalOnProperty(name = "env", havingValue = "prod")
DataSource dataSource() {
    // ...
}

6.5. @ConditionalOnResource

This annotation lets configuration be included only when a specific resource is present in the classpath. Resources can be specified by using the usual Spring conventions.

@ConditionalOnResource(resources = "classpath:vendor.properties")
Properties additionalProperties() {
    // ...
}

6.6. @ConditionalOnExpression

This annotation lets configuration be included based on the result of a SpEL expression. Use this annotation when the condition to evaluate is complex and shall be evaluated as a single condition.

@Bean
@ConditionalOnExpression("${env} && ${havingValue == 'local'}")
DataSource dataSource() {
    // ...
}

6.7. @ConditionalOnCloudPlatform

This annotation lets configuration be included when the specified cloud platform is active.

@Configuration
@ConditionalOnCloudPlatform(CloudPlatform.CLOUD_FOUNDRY)
public class CloudConfigurationExample {

    @Bean
    public MyBean myBean(MyProperties properties) {
        return new MyBean(properties.getParam());
    }
}

Drop me your questions related to Spring Boot annotations in the comments.

Happy Learning !!

Ref: Spring Boot Docs

JPA / Hibernate ElementCollection Example with Spring Boot

03 Mar 2021

In this article, You’ll learn how to map a collection of basic as well as embeddable types using JPA’s @ElementCollection and @CollectionTable annotations.

Let’s say that the users of your application can have multiple phone numbers and addresses. To map this requirement into the database schema, you need to create separate tables for storing the phone numbers and addresses –

Hibernate Spring Boot JPA @ElementCollection example table structure

Both the tables user_phone_numbers and user_addresses contain a foreign key to the users table.

You can implement such a relationship at the object level using JPA’s one-to-many mapping. But for basic and embeddable types like the ones in the above schema, JPA offers a simpler solution in the form of an ElementCollection.

Let’s create a project from scratch and learn how to use an ElementCollection to map the above schema in your application using Hibernate.

Creating the Application

If you have Spring Boot CLI installed, then simply type the following command in your terminal to generate the application –

spring init -n=jpa-element-collection-demo -d=web,jpa,mysql --package-name=com.example.jpa jpa-element-collection-demo

Alternatively, you can use the Spring Initializr web app to generate the application by following the instructions below –

  1. Open http://start.spring.io
  2. Enter Artifact as “jpa-element-collection-demo”
  3. Click Options dropdown to see all the options related to project metadata.
  4. Change Package Name to “com.example.jpa”
  5. Select Web, JPA and MySQL dependencies.
  6. Click Generate to generate and download the project.

Configuring MySQL database and Hibernate Log levels

Let’s first configure the database URL, username, password, Hibernate log levels and other properties.

Open src/main/resources/application.properties and add the following properties to it –

# DATASOURCE (DataSourceAutoConfiguration & DataSourceProperties)
spring.datasource.url=jdbc:mysql://localhost:3306/jpa_element_collection_demo?useSSL=false&serverTimezone=UTC&useLegacyDatetimeCode=false
spring.datasource.username=root
spring.datasource.password=root

# Hibernate

# The SQL dialect makes Hibernate generate better SQL for the chosen database
spring.jpa.properties.hibernate.dialect = org.hibernate.dialect.MySQL5InnoDBDialect

# Hibernate ddl auto (create, create-drop, validate, update)
spring.jpa.hibernate.ddl-auto = update

logging.level.org.hibernate.SQL=DEBUG
logging.level.org.hibernate.type=TRACE

Please change spring.datasource.username and spring.datasource.password as per your MySQL installation. Also, create a database named jpa_element_collection_demo before proceeding to the next section.

Defining the Entity classes

Next, we’ll define the Entity classes that will be mapped to the database tables we saw earlier.

Before defining the User entity, let’s first define the Address type which will be embedded inside the User entity.

All the domain models will go inside the package com.example.jpa.model.

1. Address – Embeddable Type

package com.example.jpa.model;

import javax.persistence.Embeddable;
import javax.validation.constraints.NotNull;
import javax.validation.constraints.Size;

@Embeddable
public class Address {
    @NotNull
    @Size(max = 100)
    private String addressLine1;

    @NotNull
    @Size(max = 100)
    private String addressLine2;

    @NotNull
    @Size(max = 100)
    private String city;

    @NotNull
    @Size(max = 100)
    private String state;

    @NotNull
    @Size(max = 100)
    private String country;

    @NotNull
    @Size(max = 100)
    private String zipCode;

    public Address() {

    }

    public Address(String addressLine1, String addressLine2, String city, 
                   String state, String country, String zipCode) {
        this.addressLine1 = addressLine1;
        this.addressLine2 = addressLine2;
        this.city = city;
        this.state = state;
        this.country = country;
        this.zipCode = zipCode;
    }

    // Getters and Setters (Omitted for brevity)
}

2. User Entity

Let’s now see how we can map a collection of basic types (phone numbers) and embeddable types (addresses) using Hibernate –

package com.example.jpa.model;

import javax.persistence.*;
import javax.validation.constraints.Email;
import javax.validation.constraints.NotNull;
import javax.validation.constraints.Size;
import java.util.HashSet;
import java.util.Set;

@Entity
@Table(name = "users")
public class User {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @NotNull
    @Size(max = 100)
    private String name;

    @NotNull
    @Email
    @Size(max = 100)
    @Column(unique = true)
    private String email;

    @ElementCollection
    @CollectionTable(name = "user_phone_numbers", joinColumns = @JoinColumn(name = "user_id"))
    @Column(name = "phone_number")
    private Set<String> phoneNumbers = new HashSet<>();

    @ElementCollection(fetch = FetchType.LAZY)
    @CollectionTable(name = "user_addresses", joinColumns = @JoinColumn(name = "user_id"))
    @AttributeOverrides({
            @AttributeOverride(name = "addressLine1", column = @Column(name = "house_number")),
            @AttributeOverride(name = "addressLine2", column = @Column(name = "street"))
    })
    private Set<Address> addresses = new HashSet<>();


    public User() {

    }

    public User(String name, String email, Set<String> phoneNumbers, Set<Address> addresses) {
        this.name = name;
        this.email = email;
        this.phoneNumbers = phoneNumbers;
        this.addresses = addresses;
    }

    // Getters and Setters (Omitted for brevity)
}

We use @ElementCollection annotation to declare an element-collection mapping. All the records of the collection are stored in a separate table. The configuration for this table is specified using the @CollectionTable annotation.

The @CollectionTable annotation is used to specify the name of the table that stores all the records of the collection, and the JoinColumn that refers to the primary table.

Moreover, when you’re using an embeddable type with an element collection, you can use the @AttributeOverrides and @AttributeOverride annotations to override/customize the fields of the embeddable type.
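Once mapped, the collections behave like any other persistent field. A minimal usage sketch, assuming the getters omitted above, that the code runs inside a transaction (element collections are fetched lazily by default), and that userId holds an existing id –

// Load a user and print all of their phone numbers; Hibernate fetches
// the records from the user_phone_numbers table on first access
User user = userRepository.findById(userId)
        .orElseThrow(() -> new RuntimeException("User not found"));
user.getPhoneNumbers().forEach(System.out::println);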

Defining the Repository

Next, let’s create the repository for accessing the users’ data from the database. You need to create a new package called repository inside the com.example.jpa package and add the following interface inside the repository package –

package com.example.jpa.repository;

import com.example.jpa.model.User;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.stereotype.Repository;

@Repository
public interface UserRepository extends JpaRepository<User, Long> {

}

Testing the ElementCollection Setup

Finally, Let’s write the code to test our setup in the main class JpaElementCollectionDemoApplication.java –

package com.example.jpa;

import com.example.jpa.model.Address;
import com.example.jpa.model.User;
import com.example.jpa.repository.UserRepository;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

import java.util.HashSet;
import java.util.Set;

@SpringBootApplication
public class JpaElementCollectionDemoApplication implements CommandLineRunner {

    @Autowired
    private UserRepository userRepository;

    public static void main(String[] args) {
        SpringApplication.run(JpaElementCollectionDemoApplication.class, args);
    }

    @Override
    public void run(String... args) throws Exception {
        // Cleanup database tables.
        userRepository.deleteAll();

        // Insert a user with multiple phone numbers and addresses.
        Set<String> phoneNumbers = new HashSet<>();
        phoneNumbers.add("+91-9999999999");
        phoneNumbers.add("+91-9898989898");

        Set<Address> addresses = new HashSet<>();
        addresses.add(new Address("747", "Golf View Road", "Bangalore",
                "Karnataka", "India", "560008"));
        addresses.add(new Address("Plot No 44", "Electronic City", "Bangalore",
                "Karnataka", "India", "560001"));

        User user = new User("Yaniv Levy", "yaniv@fusebes.com",
                phoneNumbers, addresses);

        userRepository.save(user);
    }
}

The main class implements the CommandLineRunner interface and provides the implementation of its run() method.

The run() method is executed after application startup. In it, we first clean up all the tables and then insert a new user with multiple phone numbers and addresses.

You can run the application by typing mvn spring-boot:run from the root directory of the project.

After running the application, go ahead and check all the tables in MySQL. The users table will have a new entry, and the user_phone_numbers and user_addresses tables will each have two new entries –

mysql> select * from users;
+----+-------------------+------------+
| id | email             | name       |
+----+-------------------+------------+
|  3 | yaniv@fusebes.com | Yaniv Levy |
+----+-------------------+------------+
1 row in set (0.01 sec)
mysql> select * from user_phone_numbers;
+---------+----------------+
| user_id | phone_number   |
+---------+----------------+
|       3 | +91-9898989898 |
|       3 | +91-9999999999 |
+---------+----------------+
2 rows in set (0.00 sec)
mysql> select * from user_addresses;
+---------+--------------+-----------------+-----------+---------+-----------+----------+
| user_id | house_number | street          | city      | country | state     | zip_code |
+---------+--------------+-----------------+-----------+---------+-----------+----------+
|       3 | 747          | Golf View Road  | Bangalore | India   | Karnataka | 560008   |
|       3 | Plot No 44   | Electronic City | Bangalore | India   | Karnataka | 560001   |
+---------+--------------+-----------------+-----------+---------+-----------+----------+
2 rows in set (0.00 sec)

Conclusion

That’s all for this article, folks. I hope you learned how to use Hibernate’s element collection with Spring Boot.

Thanks for reading. Happy coding! 🙂

Introduction to Event Streaming with Kafka and Kafdrop
26
Mar
2021

Introduction to Event Streaming with Kafka and Kafdrop

Event sourcing, eventual consistency, microservices, CQRS… These are quickly becoming household names in mainstream application development. But do you know what makes them tick? What are the basic building blocks required to assemble complex, business-centric applications from fine-grained services without turning the lot into a big ball of mud?

This article examines a fundamental building block — event streaming. Leading the charge will be Apache Kafka — the de facto standard in event streaming platforms, which we’ll observe through Kafdrop — a feature-packed web UI.

A Brief Intro

Event streaming platforms reside in the broader class of Message-oriented Middleware (MoM) and are similar to traditional message queues and topics but offer stronger temporal guarantees and typically order-of-magnitude performance gains due to log-structured immutability. In simple terms, write operations are mostly limited to sequential appends, which make them fast. Really fast.

Whereas messages in a traditional Message Queue (MQ) tend to be arbitrarily ordered and generally independent of one another, events (or records) in a stream tend to be strongly ordered, often chronologically or causally. Also, a stream persists its records, whereas an MQ will discard a message as soon as it has been read.

For this reason, event streaming tends to be a better fit for Event-Driven Architectures, encompassing event sourcing, eventual consistency, and CQRS concepts. (Of course, there are FIFO message queues too, but the differences between FIFO queues and fully-fledged event streaming platforms are quite substantial, not limited to ordering alone.)

Event streaming platforms are a comparatively recent paradigm within the broader MoM domain. There are only a handful of mainstream implementations available, compared to hundreds of MQ-style brokers, some going back to the 1980s (e.g. Tuxedo). Compared to established standards such as AMQP, MQTT, XMPP, and JMS, there are no equivalent standards in the streaming space.

Event streaming platforms are an active area of continuous research and experimentation. In spite of this, streaming platforms aren’t just a niche concept or an academic idea with a few esoteric use cases; they can be applied effectively to a broad range of messaging and eventing scenarios, routinely displacing their more traditional counterparts.


Architecture Overview

The diagram below offers a brief overview of the Kafka component architecture. While the intention isn’t to indoctrinate you with all of Kafka’s inner workings, some appreciation of its design will go a long way in explaining the key concepts that we will cover shortly.

Kafka Architecture Overview

Kafka is a distributed system comprising several key components:

  • Broker nodes: Responsible for the bulk of I/O operations and durable persistence within the cluster. Brokers accommodate the append-only log files that comprise the topic partitions hosted by the cluster. Partitions can be replicated across multiple brokers for both horizontal scalability and increased durability — these are called replicas. A broker node may act as the leader for certain replicas, while being a follower for others. A single broker node will also be elected as the cluster controller — responsible for the internal management of partition states. This includes the arbitration of the leader-follower roles for any given partition.
  • ZooKeeper nodes: Under the hood, Kafka needs a way to manage the overall controller status in the cluster. Should the controller drop out for whatever reason, there is a protocol in place to elect another controller from the set of remaining brokers. The actual mechanics of controller election, heart-beating, and so forth, are largely implemented in ZooKeeper. ZooKeeper also acts as a configuration repository of sorts, maintaining cluster metadata, leader-follower states, quotas, user information, ACLs, and other housekeeping items. Due to the underlying gossiping and consensus protocol, the number of ZooKeeper nodes must be odd.
  • Producers: These are client applications responsible for appending records to Kafka topics. Because of the log-structured nature of Kafka and the ability to share topics across multiple consumer ecosystems, only producers are able to modify the data in the underlying log files. The actual I/O is performed by the broker nodes on behalf of the producer clients. Any number of producers may publish to the same topic, selecting the partitions used to persist the records.
  • Consumers: These are client applications that read from topics. Any number of consumers may read from the same topic; however, depending on the configuration and grouping of consumers, there are rules governing the distribution of records among the consumers.

Topics, Partitions, Records, and Offsets

A partition is a totally ordered sequence of records and is fundamental to Kafka. A record has an ID — a 64-bit integer offset and a millisecond-precise timestamp. Also, it may have a key and a value; both are byte arrays and both are optional. The term “totally ordered” simply means that for any given producer, records will be written in the order they were emitted by the application. If record P was published before Q, then P will precede Q in the partition. (Assuming P and Q share a partition.)

Furthermore, they will be read in the same order by all consumers; P will always be read before Q, for every possible consumer. This ordering guarantee is vital in a large majority of use cases. Published records will generally correspond to some real-life events, and preserving the timeline of these events is often essential.

Note: Kafka uses the term “record,” where others might use “message” or “event.” In this article, we will use the terms interchangeably, depending on the context. Similarly, you might see the term “stream” as a generic substitute for “topic.”

There is no recognized ordering across producers. If two (or more) producers emit records simultaneously, those records may materialize in arbitrary order. That said, this order will still be observed uniformly across all consumers.

A record’s offset uniquely identifies it in the partition. The offset is a strictly monotonically-increasing integer in a sparse address space, meaning that each successive offset is always higher than its predecessor and there may be varying gaps between neighboring offsets. Gaps might legitimately appear if compaction is enabled or as a result of transactions; we don’t need to delve into the details at this stage. Suffice it to say that offsets need not be contiguous.

Your application shouldn’t attempt to literally interpret an offset or guess what the next offset might be. It may, however, infer the relative order of any record pair based on their offsets, sort the records by their offset, and so forth.

The diagram below shows what a partition looks like on the inside.

     start of partition
+--------+-----------------+
|0..00000|First record     |
+--------+-----------------+
|0..00001|Second record    |
+--------+-----------------+
|0..00002|Third record     |
+--------+-----------------+
|0..00003|Fourth record    |
+--------+-----------------+
|0..00007|Fifth record     |
+--------+-----------------+
|0..00008|Sixth record     |
+--------+-----------------+
|0..00010|Seventh record   |
+--------+-----------------+
            ...
+--------+-----------------+
|0..56789|Last record      |
+--------+-----------------+
       end of partition

The beginning offset, also called the low-water mark, is the first message that will be presented to a consumer. Due to Kafka’s bounded retention, this is not necessarily the first message that was published. Records may be pruned on the basis of time and/or partition size. When this occurs, the low-water mark will appear to advance, and records earlier than the low-water mark will be truncated.

Conversely, the high-water mark is the offset immediately following the last record in the partition, also known as the end offset. It is the offset that will be assigned to the next record that will be published. It is not the offset of the last record.

A topic is a logical composition of partitions. A topic may have one or more partitions, and a partition must be a part of exactly one topic. Topics are fundamental to Kafka, allowing for both parallelism and load balancing.

Earlier, we said that partitions exhibit total order. Because partitions within a topic are mutually independent, the topic is said to exhibit partial order. In simple terms, this means that certain records may be ordered in relation to one another while being unordered with respect to certain other records. The concepts of total and partial order, while sounding somewhat academic, are hugely important in the construction of performant event streaming pipelines. They enable us to process records in parallel where we can, while maintaining order where we must. We’ll explore the concepts of record order, consumer parallelism, and topic sizing in a short while.

Example: Publishing Messages

Let’s put some of this theory into practice. We are going to spin up a pair of Docker containers — one for Kafka and another for Kafdrop. Rather than launching them individually, we’ll use Docker Compose.

Create a docker-compose.yaml file in a directory of your choice, containing the following:

version: "2"
services:
  kafdrop:
    image: obsidiandynamics/kafdrop
    restart: "no"
    ports:
      - "9000:9000"
    environment:
      KAFKA_BROKERCONNECT: "kafka:29092"
    depends_on:
      - "kafka"
  kafka:
    image: obsidiandynamics/kafka
    restart: "no"
    ports:
      - "2181:2181"
      - "9092:9092"
    environment:
      KAFKA_LISTENERS: "INTERNAL://:29092,EXTERNAL://:9092"
      KAFKA_ADVERTISED_LISTENERS: "INTERNAL://kafka:29092,EXTERNAL://localhost:9092"
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: "INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT"
      KAFKA_INTER_BROKER_LISTENER_NAME: "INTERNAL"
Note: We’re using the obsidiandynamics/kafka image for convenience because it neatly bundles Kafka and ZooKeeper into a single image. If you wanted to, you could replace this with images from Confluent or Wurstmeister, but then you’d have to wire it all up properly. The obsidiandynamics/kafka image does all this for you, so it’s highly recommended for beginners (and lazy pros).

Then, start it with docker-compose up. Once it boots, navigate to localhost:9000 in your browser. You should see the Kafdrop landing screen.

Kafdrop landing page

You should see our single-broker cluster. It’s a promising start, but there are no topics. Not a problem; let’s create a topic and publish some messages using Kafka’s command-line tools. Conveniently, we already have a Kafka image running as part of our docker-compose stack, so we can shell into it to use the built-in CLI tools.

docker exec -it kafka-kafdrop_kafka_1 bash

This gets you into a Bash shell. The tools are in the /opt/kafka/bin directory, so let’s cd into it:

cd /opt/kafka/bin

Create a topic named streams-intro with three partitions:

./kafka-topics.sh --bootstrap-server localhost:9092 \
    --create --partitions 3 --replication-factor 1 \
    --topic streams-intro

Switching back to Kafdrop, we should now see the new topic in the list.

Kafdrop topics list

Time to publish stuff. We are going to use the kafka-console-producer tool:

./kafka-console-producer.sh --broker-list localhost:9092 \
    --topic streams-intro --property "parse.key=true" \
    --property "key.separator=:"

Note: kafka-topics uses the --bootstrap-server argument to configure the Kafka broker list, while kafka-console-producer uses the --broker-list argument for the same purpose. Also, --property arguments are largely undocumented; be prepared to Google your way around.

Records are separated by newlines. The key and the value parts are delimited by colons, as indicated by the key.separator property. For the sake of an example, type in the following (a copy-paste will do):

foo:first message
foo:second message
bar:first message
foo:third message
bar:second message

Press CTRL+D when done. Then, switch back to Kafdrop and click on the streams-intro topic. You’ll see an overview of the topic, along with a detailed breakdown of the underlying partitions:

Kafdrop topic overview

Let’s pause for a moment and dissect what’s been done. We created a topic with three partitions. We then published five records using two unique keys — foo and bar. Kafka uses keys to map records to partitions, such that all records with the same key will always appear on the same partition. Handy, but also important because it lets the publisher dictate the precise order of records. We’ll discuss key hashing and partition assignments in more detail later; in the meanwhile, sit back and enjoy the ride.

Looking at the partitions table, partition #0 has its first and last offsets at zero and two, respectively. Partition #2 has them at zero and three, while partition #1 appears to be blank. Clicking on #0 in the Kafdrop web UI sends us to a topic viewer:

Kafdrop topic viewer

We can see the two records published under the bar key. Note, they are completely unrelated to the foo records. Other than being collated within the same topic, there is nothing that binds records across partitions.

Note: In case you were wondering, the arrow to the left of the message lets you expand and pretty-print JSON-encoded messages. As our examples didn’t use JSON, there’s nothing to pretty-print.

It can be said without exaggeration that Kafka’s built-in tooling is an abomination. There is no consistency in the naming of command arguments and the simple act of publishing keyed messages requires you to jump through hoops — passing in obscure, undocumented properties. The usability of the built-in tools is a well-known heartache within the Kafka community. This is a real shame. It’s like buying a Ferrari, only to have it delivered with plastic hub caps. Fortunately, there are alternatives — both commercial and open source — that can fill the glaring gaps in tooling and observability.

Consumers and Consumer Groups

So far we have learned that producers emit records into the stream; these records are organized into nicely ordered partitions. Kafka’s pub-sub topology adheres to a flexible multipoint-to-multipoint model, meaning that there may be any number of producers and consumers simultaneously interacting with a stream. Depending on the actual solution context, stream topologies may also be point-to-multipoint, multipoint-to-point, and point-to-point. It’s about time we looked at how records are consumed.

A consumer is a process or a thread that attaches to a Kafka cluster via a client library. (One is available for most languages.) A consumer generally, but not necessarily, operates as part of an encompassing consumer group. The group is specified by the group.id property. Consumer groups are effectively a load-balancing mechanism within Kafka — distributing partition assignments approximately evenly among the individual consumer instances within the group.
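
To make the mechanics concrete, here is a minimal sketch of a grouped consumer using the standard Kafka Java client. The topic name matches our earlier example; the group ID and deserializer choices are illustrative:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class GroupedConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "my-consumer-group");  // membership in a consumer group
        props.put("auto.offset.reset", "earliest");  // start from the head-end if no committed offset exists
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("streams-intro")); // Kafka assigns partitions within the group
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                            record.partition(), record.offset(), record.key(), record.value());
                }
            }
        }
    }
}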

When the first consumer in a group joins the topic, it will receive all partitions in that topic. When a second consumer subsequently joins, it will get approximately half of the partitions, relieving the first consumer of half of its prior load. The process runs in reverse when consumers leave (by disconnecting or timing out) — the remaining consumers will absorb a greater number of partitions.

So, a consumer siphons records from a topic, pulling from the share of partitions that have been assigned to it by Kafka, alongside the other consumers in its group. As far as load-balancing goes, this should be fairly straightforward. But here’s the kicker — the act of consuming a record does not remove it. This might seem contradictory at first, especially if you associate the act of consuming with depletion. (If anything, a consumer should have been called a ‘reader’, but let’s not dwell on the choice of terminology.)

The simple fact is, consumers have absolutely no impact on topics and their partitions. A topic is an append-only ledger that may only be mutated by the producer, or by Kafka itself (as part of compaction or cleanup). Consumers are “cheap,” so you can have quite a number of them tail the logs without stressing the cluster. This is yet another point of distinction between an event stream and a traditional message queue, and it’s a crucial one.

A consumer internally maintains an offset that points to the next record in a partition, advancing the offset for every successive read. When a consumer first subscribes to a topic, it may elect to start at either the head-end or the tail-end of the topic. This behavior is controlled by setting the auto.offset.reset property to one of latest, earliest, or none. In the case of none, an exception will be thrown if no previous offset exists for the consumer group.

Consumers retain their offset state vector locally. Since consumers across different consumer groups do not interfere, there may be any number of them reading concurrently from the same topic. Consumers run at their own pace — a slow or backlogged consumer has no impact on its peers.

To illustrate this concept, consider a scenario involving a topic with two partitions. Two consumer groups, A and B, are subscribed to the topic. Each group has three instances, the consumers being named A1, A2, A3, B1, B2, and B3. The diagram below illustrates how the two groups might share the topic and how the consumers advance through the records independently of one another.

               Partition 0                       Partition 1
               +--------+                        +--------+
               |0..00000|                        |0..00000|
               +--------+                        +--------+
               |0..00001| <= consumer A2         |0..00001|
               +--------+                        +--------+
               |0..00002|                        |0..00002| <= consumer A1
               +--------+                        +--------+
               |0..00003|                        |0..00003|
               +--------+                        +--------+
                  ...                               ...
               +--------+                        +--------+
               |0..00008| <= consumer B3         |0..00008| <= consumer B2
               +--------+                        +--------+
               |0..00009|                        |0..00009|
               +--------+                        +--------+
producer P1 => |0..00010|                        |0..00010|
               +--------+                        +--------+
                                  producer P1 => |0..00011|
                                                 +--------+

Look carefully and you’ll notice something is missing. Consumers A3 and B1 aren’t there. That’s because Kafka guarantees that a partition may only be assigned to at most one consumer within its consumer group. (We say ‘at most’ to cover the case when all consumers are offline.) Because there are three consumers in each group, but only two partitions, one consumer will remain idle — waiting for another consumer in its respective group to depart before being assigned a partition.

In this manner, consumer groups are not only a load-balancing mechanism, but also a fence-like exclusivity control, used to build highly performant pipelines without sacrificing safety, particularly when there is a requirement that a record may only be handled by one thread or process at any given time.

Consumer groups are also used to ensure availability. By periodically pulling records from a topic, the consumer implicitly signals to the cluster that it’s in a ‘healthy’ state, thereby extending the lease over its partition assignment. However, should the consumer fail to read again within the allowable deadline, it will be deemed faulty and its partitions will be reassigned — apportioned among the remaining ‘healthy’ consumers within its group. This deadline is controlled by the max.poll.interval.ms consumer client property, set to five minutes by default.

To use a transportation analogy, a topic is like a highway, while a partition is a lane. A record is the equivalent of a car, and its occupants correspond to the record’s value. Several cars can safely travel on the same highway, provided they keep to their lanes. Cars sharing the same lane travel in a sequence, forming a queue. Now, suppose each lane leads to an off-ramp, diverting its traffic to some location. If one off-ramp gets banked up, other off-ramps may still flow smoothly.

It’s precisely this highway-lane metaphor that Kafka exploits to achieve its end-to-end throughput, easily reaching millions of records per second on commodity hardware. When creating a topic, one can choose the partition count — the number of lanes, if you will.

The partitions are divided approximately evenly among the individual consumers in a consumer group, with a guarantee that no partition will be assigned to two (or more) consumers at the same time, providing that these consumers are part of the same consumer group. Referring to our analogy, a car will never end up in two off-ramps simultaneously; however, two lanes might conceivably lead to the same off-ramp.

Note: A topic may be resized after creation by increasing the number of partitions. It is not possible, however, to decrease the partition count without recreating the topic.

Records correspond to events, messages, commands, or any other streamable content. Precisely how records are partitioned is left to the discretion of the producer(s). A producer may explicitly assign a partition index when publishing a record, although this approach is rarely used. A much more common approach is to assign a key to a record, as we have done in our earlier example. The key is completely opaque to Kafka. In other words, Kafka doesn’t attempt to interpret the contents of the key, treating it as an array of bytes. These bytes are hashed to derive a partition index, using a consistent hashing technique.
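
The gist of that mapping can be sketched in a few lines. Note that this illustrates the hash-then-mod idea rather than the client’s actual code: the Java client’s default partitioner uses the murmur2 hash, whereas this sketch substitutes a plain array hash for brevity.

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class KeyToPartition {
    // Maps a record key to a partition index in the spirit of the default
    // partitioner: hash the key bytes, then take the result modulo the
    // partition count. (The real client uses murmur2, not hashCode.)
    static int partitionFor(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return (hash & 0x7fffffff) % numPartitions; // mask the sign bit to stay non-negative
    }

    public static void main(String[] args) {
        // Records with the same key always land on the same partition.
        System.out.println(partitionFor("foo", 3));
        System.out.println(partitionFor("bar", 3));
    }
}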

Records sharing the same hash are guaranteed to occupy the same partition. Assuming a topic with multiple partitions, records with a different key will likely end up in different partitions. However, due to hash collisions, records with different hashes may also end up in the same partition. Such is the nature of hashing. If you understand how a hash table works, this is no different.

Producers rarely care which specific partition the records will map to, only that related records end up in the same partition and that their order is preserved. Similarly, consumers are largely indifferent to their assigned partitions, so long as they receive the records in the same order as they were published and their partition assignment does not overlap with any other consumer in their group.

Committing Offsets

We already said that consumers maintain an internal state with respect to their partition offsets. At some point, that state must be shared with Kafka, so that when a partition is reassigned, the new consumer can resume processing from where the outgoing consumer left off. Similarly, if the consumers were to disconnect, upon reconnection they would ideally skip over those records that have already been processed.

Persisting the consumer state back to the Kafka cluster is called committing an offset. Typically, a consumer will read a record (or a batch of records) and commit the offset of the last record, plus one. If a new consumer takes over the topic, it will commence processing from the last committed offset, hence the plus-one step is essential. (Otherwise, the last processed record would be handled a second time.)

Fun fact: Kafka employs a recursive approach to managing committed offsets, elegantly utilising itself to persist and track offsets. When an offset is committed, Kafka will publish a binary record on the internal __consumer_offsets topic. The contents of this topic are compacted in the background, creating an efficient event store that progressively reduces to only the last known commit points for any given consumer group.

Controlling the point when an offset is committed provides a great deal of flexibility around delivery guarantees, handing Kafka yet another trump card. The term ‘delivery’ assumes not just reading a record, but the full processing cycle, complete with any side-effects. One can shift from an at-most-once to an at-least-once delivery model by simply moving the commit operation from a point before the processing of a record commences to a point sometime after the processing is complete. With this model, should the consumer fail midway through processing a record, the record will be re-read following the subsequent partition reassignment.

By default, a Kafka consumer will automatically commit offsets every five seconds, regardless of whether the consumer has finished processing the record. Often, this is not what you want, as it may lead to mixed delivery semantics. For example, in the event of consumer failure, some records might be delivered twice, while others might not be delivered at all. To enable manual offset committing, set the enable.auto.commit property to false.
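
With auto-commit disabled (enable.auto.commit=false), a minimal at-least-once consumption loop might look like the sketch below; processRecord() is a hypothetical stand-in for your application’s processing logic:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;

public class AtLeastOnceLoop {
    // Reads a batch, processes every record, and only then commits —
    // giving at-least-once semantics under consumer failure.
    static void run(KafkaConsumer<String, String> consumer) {
        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> record : records) {
                processRecord(record);
            }
            if (!records.isEmpty()) {
                consumer.commitSync(); // commits the last polled offsets, plus one, per partition
            }
        }
    }

    static void processRecord(ConsumerRecord<String, String> record) {
        // application-specific side effects go here
    }
}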

Note: There are a few gotchas like this in Kafka. Pay close attention to the (producer and consumer) client properties in the official Kafka documentation, particularly to the stated defaults. Don’t assume for a moment that the defaults are sensible, insofar as they ought to favour safety over other competing qualities. Kafka defaults tend to be optimised for performance, and will need to be explicitly overridden on the client when safety is a critical objective. Fortunately, setting the properties to ensure safety has only a minor impact on performance — Kafka is still a beast. Remember the first rule of optimisation: don’t do it. Kafka would have been even better, had its creators given this more thought.

Getting offset committing right can be tricky, and it routinely catches out beginners. A committed offset implies that the record one below that offset and all prior records have been dealt with by the consumer. When designing at-least-once or exactly-once applications, an offset should only be committed when the application has dealt with the record in question and all records before it.

In other words, the record has been processed to the point that any actions that would have resulted from the record have been carried out and finalized. This may include calling other APIs, updating a database, committing transactions, persisting the record’s payload, or publishing more records. Stated otherwise, if the consumer were to fail after committing the record, then not ever seeing this record again must not be detrimental to its correctness.

In the at-least-once (and by extension, the exactly-once) scenario, a typical consumer implementation will commit its offset linearly, in tandem with the processing of the records. That is, read a record, commit it (plus one), read the next, commit it (plus one), and so on. A common tactic is to process a batch of records concurrently (where this makes sense), using a thread pool, and only commit the last record when the entire batch is done. The commit process in Kafka is very efficient: the client library sends commit requests asynchronously to the cluster using an in-memory queue, without blocking the consumer. The client application can register an optional callback, notifying it when the commit has been acknowledged by the cluster.

The consumer group is a somewhat understated concept that is pivotal to the versatility of an event streaming platform. By simply varying the affinity of consumers with their groups, one can arrive at vastly different distribution topologies — from a topic-like, pub-sub behavior to an MQ-style, point-to-point model. Because records are never truly consumed (the advancing offset only creates the illusion of consumption), one can concurrently superimpose disparate distribution topologies over a single event stream.

Free Consumers

Consumer groups are completely optional; a consumer does not need to be encompassed in a consumer group to pull messages from a topic. A free consumer omits the group.id property. Doing so allows it to operate under relaxed rules, entirely transferring the responsibility for consumer management to the application.

Note: The use of the term ‘free’ to denote a consumer without an encompassing group is not part of the standard Kafka nomenclature. As Kafka lacks a canonical term to describe this, the term ‘free’ was adopted here.

Free consumers do not subscribe to a topic. Instead, the consuming application is responsible for manually assigning a set of topic-partitions to the consumer, individually specifying the starting offset for each topic-partition pair. Free consumers do not commit their offsets to Kafka; it is up to the application to track the progress of such consumers and persist their state as appropriate, using a datastore of their choosing. The concepts of automatic partition assignment, rebalancing, offset persistence, partition exclusivity, consumer heart-beating and failure detection, and other so-called niceties accorded to consumer groups cease to exist in this mode.
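
As a sketch, a free consumer that tails a single partition from its end might be set up as follows; note the absence of group.id, the manual assign() in place of subscribe(), and the explicit seek:

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class FreeConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // No group.id — this consumer is "free" and manages its own position.
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition partition = new TopicPartition("streams-intro", 0);
            consumer.assign(List.of(partition));    // manual assignment instead of subscribe()
            consumer.seekToEnd(List.of(partition)); // only care about records published from now on
            while (true) {
                consumer.poll(Duration.ofMillis(500))
                        .forEach(r -> System.out.printf("offset=%d value=%s%n", r.offset(), r.value()));
            }
        }
    }
}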

Free consumers are not observed in the wild as often as their grouped counterparts. There are predominantly two use cases where a free consumer is an appropriate choice. The first is when you genuinely need full control of the partition assignment scheme and/or you require an alternative place to store consumer offsets. This is very rare.

Needless to say, it’s also very difficult to implement correctly, given the multitude of scenarios one must account for. The second, more commonly seen use case, is when you have a stateless or ephemeral consumer that needs to monitor a topic. For example, you might be interested in tailing a topic to identify specific records, or just as a debugging tool. You might only care about records that were published when your stateless consumer was online, so concerns such as persisting offsets and resuming from the last processed record are completely irrelevant.

A good example of where this is used routinely is the Kafdrop web UI, which we’ve already seen. When you click on a topic to view the messages, Kafdrop creates a free consumer and assigns the requested partition to it, reading the records from the supplied offsets. Navigating to a different topic or partition will reset the consumer, discarding any prior state.

The illustration below outlines the relationship between producers, topics, partitions, consumers, and consumer groups.

+----------+          +----------+
|PRODUCER 1|          |PRODUCER 2|
+-----v----+          +-----v----+
      |                     |
      |                     |
      |                     |
 +----V---------------------V-----------------------------------------+
 |                            >>> TOPIC >>>                           |
 |            +---------------------------------------------------+   |
 | PARTITION 0|record 0..00|record 0..01|record 0..02|record 0..03|   |
 |            +--------------------v------------------------------+   |
 |                                 |                                  |
 |            +--------------------|------------------------------+   |
 | PARTITION 1|record 0..00|       |    |record 0..02|record 0..03|   |
 |            +--------------------|-------------v----------------+   |
 |                                 |             |                    |
 +----------v----------------------|-------------|--------------------+
            |                      |             |
            |                      |             |
            |                      |             |
            |              +-------|-------------|----------------------+
            |              |       |             |                      |
       +----V-----+        | +-----V----+   +----V-----+   +----------+ |
       |CONSUMER 1|        | |CONSUMER 2|   |CONSUMER 3|   |CONSUMER 4| |
       +----------+        | +----------+   +----------+   +----------+ |
                           |               CONSUMER GROUP               |
                           +--------------------------------------------+

The key takeaways are:

  • Topics are subdivided into partitions, each forming an independent, totally-ordered sequence within a wider, partially-ordered stream.
  • Multiple producers are able to publish to a topic, picking a partition at will. This may be accomplished either directly, by specifying a partition index, or indirectly, by way of a record key, which deterministically hashes to a consistent partition index. (In the diagram above, both Producer 1 and Producer 2 publish to the same topic.)
  • Partitions in a topic can be load-balanced across a population of consumers in a consumer group, allocating partitions approximately evenly among the members of that group. (Consumer 2 and Consumer 3 each get one partition.)
  • A consumer in a group is not guaranteed a partition assignment. Where the group’s population outnumbers the partitions, some consumers will remain idle until this balance equalizes or tips in favor of the other side. (Consumer 4 remains partition-less.)
  • Partitions may be manually assigned to free consumers. If necessary, an entire topic may be assigned to a single free consumer — this is done by individually assigning all partitions. (Consumer 1 can be freely assigned any partition.)

Exactly-Once Delivery

When contrasting at-least-once with at-most-once delivery semantics, an often-asked question is: Why can’t we have it exactly once?

Without delving into the academic details, which involve conjectures and impossibility proofs, it is sufficient to say that exactly-once semantics are not possible without collaboration with the consumer application. What does this mean in practice?

Consumers in event streaming applications must be idempotent. In other words, processing the same record repeatedly should have no net effect on the consumer ecosystem. If a record has no additive effects, the consumer is inherently idempotent. (For example, if the consumer simply overwrites an existing database entry with a new one, then the update is naturally idempotent.) Otherwise, the consumer must check whether a record has already been processed, and to what extent, prior to processing a record. The combination of at-least-once delivery and consumer idempotence collectively leads to exactly-once semantics.
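
As an illustration, an idempotency guard might key off a record identifier that the consumer persists between deliveries. The in-memory set below is a stand-in for a durable store such as a database table:

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class IdempotentHandler {
    // Stand-in for a durable store of processed record IDs (e.g. a database table).
    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();

    // Processing is skipped when the record was already handled, so a redelivery
    // under at-least-once semantics has no net effect — which is what yields
    // exactly-once behaviour overall.
    public void handle(String recordId, Runnable sideEffect) {
        if (!processedIds.add(recordId)) {
            return; // duplicate delivery — already processed
        }
        sideEffect.run();
    }
}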

Example: A Trading Platform

With all this theory looming over us like Kubrick’s Monolith, it would be inappropriate to conclude without offering the reader a practical scenario.

Let’s say you were looking for specific price patterns in listed stocks, emitting trading signals when a particular pattern is identified. There are a large number of stocks, and understandably you’d like them processed in parallel. However, the time series for any given ticker code must be processed sequentially on a single consumer.

Kafka makes this use case, and others like it, almost trivial to implement. We would create a pair of topics: prices for the raw price data, and orders for any resulting orders. We can be fairly generous with our partition counts, as the nature of the data gives us ample opportunities for parallelism.

At the feed source, we could publish a record for each price on the prices topic, keyed by the ticker code. Kafka’s automatic partition assignment will ensure that every ticker code is handled by (at most) one consumer in its group. The consumer instances are free to scale in and out to match the processing load. Consumer groups should be meaningfully named, ideally reflecting the purpose of the consuming application. A good example might be trading-strategy.abc, for a fictitious trading strategy named ‘ABC’.
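
A feed-source producer along these lines might look like the sketch below. The prices topic matches our design; the ticker symbols, price values, and serializer choices are illustrative:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class PriceFeedPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Keying by ticker code pins each symbol's time series to one partition,
            // preserving per-ticker order while still allowing parallelism across symbols.
            producer.send(new ProducerRecord<>("prices", "AAPL", "173.25"));
            producer.send(new ProducerRecord<>("prices", "MSFT", "331.10"));
        }
    }
}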

Once a price pattern is identified by the consumer, it can publish another message — the order request — on the orders topic. We’ll muster up another consumer group — order-execution — responsible for reading the orders and forwarding them to the broker.

In this simple example, we have created an end-to-end trading pipeline that is entirely event-driven and highly scalable — at least theoretically, assuming there are no other bottlenecks. We can dynamically add more processing nodes to the individual stages to cope with the increased load where it’s called for.

Now, let’s spice things up a bit. Suppose you need several trading strategies operating concurrently, driven by a common data feed. Furthermore, the trading strategies will be developed by different teams; the objective being to decouple these implementations as much as possible, allowing the teams to operate autonomously — develop and deploy at their individual cadence, perhaps even using different programming languages and tool-chains. That said, you’d ideally want to reuse as much of what’s already been written. So, how would we pull this off? 

Trading Platform

Kafka’s flexible multipoint-to-multipoint pub-sub architecture combines stateful consumption with broadcast semantics. Using distinct consumer groups, Kafka allows disparate applications to share input topics, processing events at their own pace. The second trading strategy would need a dedicated consumer group — trading-strategy.xyz — applying its specific business logic to the common pricing stream, publishing the resulting orders to the same orders topic. In this fashion, Kafka enables you to construct modular event processing pipelines from discrete elements that are readily reusable and composable.

Note: In the days of service buses and traditional ‘enterprisey’ message brokers, before event sourcing entered the mainstream, you would have had to choose between persistent message queues or transient broadcast topics. In our example, you would likely have created multiple FIFO queues, using the fan-out pattern. Because Kafka generalises pub-sub topics and persistent message queues into a unified model, a single source topic can power a diverse range of consumers without incurring duplication.

In Conclusion

Event streaming platforms are a highly effective building block in the construction of modular, loosely-coupled, event-driven applications. Within the world of event streaming, Kafka has solidified its position as the go-to open-source solution that is both amazingly flexible and highly performant. Concurrency and parallelism are at the heart of Kafka’s architecture, forming partially-ordered event streams that can be load-balanced across a scalable consumer ecosystem. A simple reconfiguration of consumers and their encompassing groups can bring about vastly different event distribution and processing semantics; shifting the offset commit point can invert the delivery guarantee from an at-most-once to an at-least-once model.

Of course, Kafka isn’t without its flaws. The tooling is sub-par, to put it mildly; most Kafka practitioners have long abandoned the out-of-the-box CLI utilities in favour of other open-source tools such as Kafdrop and Kafkacat, and third-party commercial offerings like Kafka Tool. The breadth of Kafka’s configuration options is overwhelming, with defaults that are riddled with gotchas, ready to shock the unsuspecting first-time user.

All in all, Kafka represents a paradigm shift in how we architect and build complex systems. Its benefits go beyond the superficial, and they dwarf any of the niggles that are bound to exist in a technology that has undergone such aggressive adoption. Crucially, it paves the way for further progress in its space; Apache Pulsar is a prime example of an alternative platform that has addressed many of Kafka’s shortcomings, yet owes a great deal to its predecessor for laying the cornerstone and bringing the genre to the mainstream.

How to Schedule Tasks with Spring Boot
06
Feb
2021

How to Schedule Tasks with Spring Boot

In this article, You’ll learn how to schedule tasks in Spring Boot using @Scheduled annotation. You’ll also learn how to use a custom thread pool for executing all the scheduled tasks.

The @Scheduled annotation is added to a method along with some information about when to execute it, and Spring Boot takes care of the rest.

Spring Boot internally uses the TaskScheduler interface for scheduling the annotated methods for execution.

The purpose of this article is to build a simple project demonstrating all the concepts related to task scheduling.

Create the Project

Let’s use Spring Boot CLI to create the Project. Fire up your terminal and type the following command to generate the project –

$ spring init --name=scheduler-demo scheduler-demo 

Alternatively, You can generate the project using Spring Initializer web app. Just go to http://start.spring.io/, enter the Artifact’s value as “scheduler-demo” and click Generate to generate and download the project.

Once the project is generated, import it into your favorite IDE. The project’s directory structure should look like this –

Spring Boot Scheduled Annotation Example Directory Structure

Enable Scheduling

You can enable scheduling simply by adding the @EnableScheduling annotation to the main application class or one of the Configuration classes.

Open SchedulerDemoApplication.java and add @EnableScheduling annotation like so –

package com.example.schedulerdemo;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.scheduling.annotation.EnableScheduling;

@SpringBootApplication
@EnableScheduling
public class SchedulerDemoApplication {

	public static void main(String[] args) {
		SpringApplication.run(SchedulerDemoApplication.class, args);
	}
}

Scheduling Tasks

Scheduling a task with Spring Boot is as simple as annotating a method with the @Scheduled annotation and providing a few parameters that decide when the task will run.

Before adding tasks, let’s first create the container for all the scheduled tasks. Create a new class called ScheduledTasks inside the com.example.schedulerdemo package with the following contents –

package com.example.schedulerdemo;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;

import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.TimeUnit;

@Component
public class ScheduledTasks {
    private static final Logger logger = LoggerFactory.getLogger(ScheduledTasks.class);
    private static final DateTimeFormatter dateTimeFormatter = DateTimeFormatter.ofPattern("HH:mm:ss");

    public void scheduleTaskWithFixedRate() {}

    public void scheduleTaskWithFixedDelay() {}

    public void scheduleTaskWithInitialDelay() {}

    public void scheduleTaskWithCronExpression() {}
}

The class contains four empty methods. We’ll look at the implementation of all the methods one by one.

All scheduled methods should satisfy the following two criteria –

  • The method should have a void return type.
  • The method should not accept any arguments.

Cool! Let’s now jump into the implementation.

1. Scheduling a Task with Fixed Rate

You can schedule a method to be executed at a fixed interval by using the fixedRate parameter in the @Scheduled annotation. In the following example, the annotated method will be executed every 2 seconds.

@Scheduled(fixedRate = 2000)
public void scheduleTaskWithFixedRate() {
    logger.info("Fixed Rate Task :: Execution Time - {}", dateTimeFormatter.format(LocalDateTime.now()) );
}
# Sample Output
Fixed Rate Task :: Execution Time - 10:26:58
Fixed Rate Task :: Execution Time - 10:27:00
Fixed Rate Task :: Execution Time - 10:27:02
....
....

The fixedRate task is invoked at the specified interval even if the previous invocation of the task is not finished.

2. Scheduling a Task with Fixed Delay

You can execute a task with a fixed delay between the completion of the last invocation and the start of the next, using the fixedDelay parameter.

The fixedDelay parameter counts the delay from the completion of the last invocation.

Consider the following example –

@Scheduled(fixedDelay = 2000)
public void scheduleTaskWithFixedDelay() {
    logger.info("Fixed Delay Task :: Execution Time - {}", dateTimeFormatter.format(LocalDateTime.now()));
    try {
        TimeUnit.SECONDS.sleep(5);
    } catch (InterruptedException ex) {
        logger.error("Ran into an error {}", ex);
        throw new IllegalStateException(ex);
    }
}

Since the task itself takes 5 seconds to complete and we have specified a delay of 2 seconds between the completion of the last invocation and the start of the next, there will be a delay of 7 seconds between each invocation –

# Sample Output
Fixed Delay Task :: Execution Time - 10:30:01
Fixed Delay Task :: Execution Time - 10:30:08
Fixed Delay Task :: Execution Time - 10:30:15
....
....

3. Scheduling a Task With Fixed Rate and Initial Delay

You can use the initialDelay parameter with fixedRate and fixedDelay to delay the first execution of the task by the specified number of milliseconds.

In the following example, the first execution of the task will be delayed by 5 seconds and then it will be executed normally at a fixed interval of 2 seconds –

@Scheduled(fixedRate = 2000, initialDelay = 5000)
public void scheduleTaskWithInitialDelay() {
    logger.info("Fixed Rate Task with Initial Delay :: Execution Time - {}", dateTimeFormatter.format(LocalDateTime.now()));
}
# Sample output (Server Started at 10:48:46)
Fixed Rate Task with Initial Delay :: Execution Time - 10:48:51
Fixed Rate Task with Initial Delay :: Execution Time - 10:48:53
Fixed Rate Task with Initial Delay :: Execution Time - 10:48:55
....
....

4. Scheduling a Task using Cron Expression

If the above simple parameters cannot fulfill your needs, you can use cron expressions to schedule the execution of your tasks.

In the following example, I have scheduled the task to be executed every minute –

@Scheduled(cron = "0 * * * * ?")
public void scheduleTaskWithCronExpression() {
    logger.info("Cron Task :: Execution Time - {}", dateTimeFormatter.format(LocalDateTime.now()));
}
# Sample Output
Cron Task :: Execution Time - 11:03:00
Cron Task :: Execution Time - 11:04:00
Cron Task :: Execution Time - 11:05:00

Running @Scheduled Tasks in a Custom Thread Pool

By default, all the @Scheduled tasks are executed in a default thread pool of size one created by Spring.

You can verify that by logging the name of the current thread in all the methods –

logger.info("Current Thread : {}", Thread.currentThread().getName());

All the methods will print the following –

Current Thread : pool-1-thread-1

But hey, you can create your own thread pool and configure Spring to use it for executing all the scheduled tasks.

Create a new package config inside com.example.schedulerdemo, and then create a new class called SchedulerConfig inside config package with the following contents –

package com.example.schedulerdemo.config;

import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.SchedulingConfigurer;
import org.springframework.scheduling.concurrent.ThreadPoolTaskScheduler;
import org.springframework.scheduling.config.ScheduledTaskRegistrar;

@Configuration
public class SchedulerConfig implements SchedulingConfigurer {
    private final int POOL_SIZE = 10;

    @Override
    public void configureTasks(ScheduledTaskRegistrar scheduledTaskRegistrar) {
        ThreadPoolTaskScheduler threadPoolTaskScheduler = new ThreadPoolTaskScheduler();

        threadPoolTaskScheduler.setPoolSize(POOL_SIZE);
        threadPoolTaskScheduler.setThreadNamePrefix("my-scheduled-task-pool-");
        threadPoolTaskScheduler.initialize();

        scheduledTaskRegistrar.setTaskScheduler(threadPoolTaskScheduler);
    }
}

That’s all you need to do for configuring Spring to use your own thread pool instead of the default one.

If you log the name of the current thread in the scheduled methods now, you’ll get the output like so –

Current Thread : my-scheduled-task-pool-1
Current Thread : my-scheduled-task-pool-2

# etc...

Conclusion

In this article, you learned how to schedule tasks in Spring Boot using @Scheduled annotation. You also learned how to use a custom thread pool for running these tasks.

The scheduling abstraction provided by Spring Boot works pretty well for simple use cases. But if you have more advanced use cases, like persistent jobs, clustering, or dynamically adding and triggering new jobs, then check out the following article –

Spring Boot Quartz Scheduler Example: Building an Email Scheduling App

Thank you for reading. See you in the next Post!

Spring Boot Logging Best Practices Guide
05
May
2021

Spring Boot Logging Best Practices Guide

Logging in Spring Boot can be confusing, and the wide range of tools and frameworks makes it a challenge to even know where to start. This guide talks through the most common best practices for Spring Boot logging and gives five key suggestions to add to your logging tool kit.

What’s in the Spring Boot Box?

The Spring Boot Starters all depend on spring-boot-starter-logging. This is where the majority of the logging dependencies for your application come from. The dependencies involve a facade (SLF4J) and frameworks (Logback). It’s important to know what these are and how they fit together.

SLF4J is a simple front-facing facade supported by several logging frameworks. Its main advantage is that you can easily switch from one logging framework to another. In our case, we can easily switch our logging from Logback to Log4j, Log4j2 or JUL.

The dependencies we use will also write logs. For example, Hibernate uses SLF4J, which fits perfectly as we have that available. However, the AWS SDK for Java uses Apache Commons Logging (JCL). spring-boot-starter-logging includes the necessary bridges to ensure those logs are delegated to our logging framework out of the box.

SLF4J usage:

At a high level, all the application code has to worry about is:

  1. Getting an instance of an SLF4J logger (regardless of the underlying framework):
    private static final Logger LOG = LoggerFactory.getLogger(MyClass.class);
  2. Writing some logs:
    LOG.info("My message set at info level");

Logback or Log4j2?

Spring Boot’s default logging framework is Logback. Your application code should interface only with the SLF4J facade so that it’s easy to switch to an alternative framework if necessary.

Log4j2 is newer and claims to improve on the performance of Logback. Log4j2 also supports a wide range of appenders so it can log to files, HTTP, databases, Cassandra, Kafka, as well as supporting asynchronous loggers. If logging performance is of high importance, switching to log4j2 may improve your metrics. Otherwise, for simplicity, you may want to stick with the default Logback implementation.

This guide will provide configuration examples for both frameworks.

Want to use Log4j2? You’ll need to exclude spring-boot-starter-logging and include spring-boot-starter-log4j2.
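
In Maven terms, that typically means an exclusion on the starter that pulls in logging (spring-boot-starter-web is shown here as an example), plus the Log4j2 starter:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
    <exclusions>
        <exclusion>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-logging</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-log4j2</artifactId>
</dependency>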

spring boot logging frameworks

5 Tips for Getting the Most Out of Your Spring Boot Logging

With your initial setup out of the way, here are 5 top tips for Spring Boot logging.

1. Configuring Your Log Format

Spring Boot Logging provides default configurations for logback and log4j2. These specify the logging level, the appenders (where to log) and the format of the log messages.

For all but a few specific packages, the default log level is set to INFO, and by default, the only appender used is the Console Appender, so logs will be directed only to the console.

The default format for the logs using logback looks like this:

logback default logging format

Let’s take a look at that last line of log, which was a statement created from within a controller with the message “My message set at info level”.

It looks simple, yet the default log pattern for logback seems “off” at first glance. As much as it looks like it could be, it’s not regex, it doesn’t parse email addresses, and actually, when we break it down it’s not so bad.

%clr(%d{${LOG_DATEFORMAT_PATTERN:-yyyy-MM-dd HH:mm:ss.SSS}}){faint}
%clr(${LOG_LEVEL_PATTERN:-%5p}) %clr(${PID:- }){magenta} %clr(---){faint}
%clr([%15.15t]){faint} %clr(%-40.40logger{39}){cyan} %clr(:){faint}
%m%n${LOG_EXCEPTION_CONVERSION_WORD:-%wEx}

Understanding the Default Logback Pattern

The variables that are available for the log format allow you to create meaningful logs, so let’s look a bit deeper at the ones in the default log pattern example.

  • %clr: specifies a colour. By default, it is based on log levels, e.g. INFO is green. If you want to specify specific colours, you can do that too. The format is %clr(Your message){your colour}. So, for example, if we wanted to add “Demo” to the start of every log message, in green, we would write %clr(Demo){green}.
  • %d{${LOG_DATEFORMAT_PATTERN:-yyyy-MM-dd HH:mm:ss.SSS}}: %d is the current date, and the part in curly braces is the format. ${VARIABLE:-default} is a way of specifying that we should use the $VARIABLE environment variable for the format, if it is available, and if not, fall back to default. This is handy if you want to override these values in your properties files, by providing arguments, or by setting environment variables. In this example, the default format is yyyy-MM-dd HH:mm:ss.SSS unless we specify a variable named LOG_DATEFORMAT_PATTERN. In the logs above, we can see 2020-10-19 10:09:58.152 matches the default pattern, meaning we did not specify a custom LOG_DATEFORMAT_PATTERN.
  • ${LOG_LEVEL_PATTERN:-%5p}: uses the LOG_LEVEL_PATTERN if it is defined, else prints the log level right-padded to 5 characters (e.g. “INFO” becomes “INFO ” but “TRACE” will not have a trailing space). This keeps the rest of the log aligned, as the level always occupies 5 characters.
  • ${PID:- }: the environment variable $PID, if it exists; if not, a space.
  • %t: the name of the thread triggering the log message.
  • %logger: the name of the logger (up to 39 characters); in our case, this is the class name.
  • %m: the log message.
  • %n: the platform-specific line separator.
  • %wEx: the stack trace of any exception (if one exists), formatted using Spring Boot’s ExtendedWhitespaceThrowableProxyConverter.

Customising the Log Format

You can customise the ${} variables that are found in the logback-spring.xml by passing in properties or environment variables. For example, you may set logging.pattern.console to override the whole of the console log pattern. 
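
For instance, a one-line override in application.properties might look like this; the pattern itself is purely illustrative:

logging.pattern.console=%d{yyyy-MM-dd HH:mm:ss.SSS} %-5level [%thread] %logger{36} : %msg%n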

However, for more control, including adding additional appenders, it is recommended to create your logback-spring.xml and place it inside your resources folder. You can do the same with log4j2 by adding log4j2-spring.xml to your resources folder.

Armed with the ability to customise your logs, you should consider adding:

  • Application name.
  • A request ID.
  • The endpoint being requested (e.g. /health).

There are a few items in the default log that I would remove unless you have a specific use case for them:

  • The ‘—’ separator.
  • The thread name.
  • The process ID.

With the ability to customise these through the use of the logback-spring.xml or log4j2-spring.xml, the format of your logs is fully within your control.

2. Configuring the Destination for Your Logs (Appenders and Loggers)

An appender is just a fancy name for the part of the logging framework that sends your logs to a particular target. Both frameworks can output to console, over HTTP, to databases, or over a TCP socket, as well as to many other targets. The way we configure the destination for the logs is by adding, removing and configuring these appenders. 

You have more control over which appenders you use, and the configuration of them, if you create your own custom .xml configuration. However, the default logging configuration does make use of environment properties that allow you to override some parts of it, for example, the date format.

Preset configurations for logging to files are available within Spring Boot Logging. You can use the logback configuration with a file appender, or the log4j2 configuration with a file appender, if you specify logging.file or logging.path in your application properties.
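
For example (a sketch; these property names are from Spring Boot 2.1 and earlier, and later versions renamed them to logging.file.name and logging.file.path):

# Write logs to this file, using Spring Boot's preset file appender
logging.file=logs/application.log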

The official docs for logback appenders and log4j2 appenders detail the parameters required for each of the appenders available, and how to configure them in your XML file. One tip for choosing the destination for your logs is to have a plan for rotating them. Writing logs to a file always feels like a great idea, until the storage used for that file runs out and brings down the whole service. 

Logback and Log4j2 both have a RollingFileAppender, which handles rotating these log files based on file size or time, and it’s exactly that which Spring Boot Logging uses if you set the logging.file property.
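
If you manage the appender yourself in a logback-spring.xml, a rolling file appender sketch looks something like this (the file names, size limit and history are illustrative):

<appender name="ROLLING" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>logs/app.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
        <!-- Roll daily, and within the day once the file exceeds 10MB -->
        <fileNamePattern>logs/app-%d{yyyy-MM-dd}.%i.log</fileNamePattern>
        <maxFileSize>10MB</maxFileSize>
        <!-- Keep at most 7 days of rolled files -->
        <maxHistory>7</maxHistory>
    </rollingPolicy>
    <encoder>
        <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} %5p %logger{39} : %m%n</pattern>
    </encoder>
</appender>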

3. Logging as a Cross-Cutting Concern to Keep Your Code Clean (Using Filters and Aspects)

You might want to log every HTTP request your API receives. That’s a fairly normal requirement, but putting a log statement into every controller is unnecessary duplication: it’s easy to forget, and easy to get wrong. A requirement to log every method call within your packages would be even more cumbersome.

I’ve seen developers use this style of logging at trace level so that they can turn it on to see exactly what is happening in a production environment. Adding log statements to the start and end of every method is messy, and there is a better way. This is where filters and aspects save the day and avoid the code duplication.

When to Use a Filter vs When to Use Aspect-Oriented Programming

If you are looking to create log statements related to specific requests, you should opt for using filters, as they are part of the handling chain that your application already goes through for each request. They are easier to write, easier to test and usually more performant than using aspects. If you are considering more cross-cutting concerns, for example, audit logging, or logging every method that causes an exception to be thrown, use AOP. 

Using a Filter to Log Every Request

Filters can be registered with your web container by creating a class implementing javax.servlet.Filter and annotating it with @Component, or by adding it as an @Bean in one of your configuration classes. When your Spring Boot application starts up, it will create the Filter and register it with the container.

You can choose to create your own Filter, or to use an existing one. To make use of the existing Filter, you need to supply a CommonsRequestLoggingFilter bean and set your logging level to debug. You’ll get something that looks like:

2020-10-27 18:50:50.427 DEBUG 24168 --- [nio-8080-exec-2] o.a.coyote.http11.Http11InputBuffer      : Received [GET /health HTTP/1.1
tracking-header: my-tracking
User-Agent: PostmanRuntime/7.26.5
Accept: */*
Postman-Token: 04a661b7-209c-43c3-83ea-e09466cf3d92
Host: localhost:8080
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
]

If you use the existing one, you have little control over the message that gets logged. 
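
For reference, supplying that bean is only a few lines; here is a minimal sketch (the query string, header and payload switches are optional extras):

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.filter.CommonsRequestLoggingFilter;

@Configuration
public class RequestLoggingConfig {

    @Bean
    public CommonsRequestLoggingFilter requestLoggingFilter() {
        CommonsRequestLoggingFilter filter = new CommonsRequestLoggingFilter();
        filter.setIncludeQueryString(true);  // log the query string
        filter.setIncludeHeaders(true);      // log the request headers
        filter.setIncludePayload(true);      // log the request body
        filter.setMaxPayloadLength(1000);    // cap the logged body size
        return filter;
    }
}

Don’t forget the debug level mentioned above, e.g. logging.level.org.springframework.web.filter.CommonsRequestLoggingFilter=DEBUG in your application properties.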

If you want more control, create your own Filter along the lines of the sketch below, and you then have full control over the content of the log message.
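
A minimal sketch using Spring’s OncePerRequestFilter (what you log is up to you; the message here is illustrative):

import java.io.IOException;

import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;
import org.springframework.web.filter.OncePerRequestFilter;

@Component
public class MyRequestLoggingFilter extends OncePerRequestFilter {

    private static final Logger log = LoggerFactory.getLogger(MyRequestLoggingFilter.class);

    @Override
    protected void doFilterInternal(HttpServletRequest request, HttpServletResponse response,
                                    FilterChain filterChain) throws ServletException, IOException {
        // Log the request before it reaches your controllers
        log.info("Received {} {}", request.getMethod(), request.getRequestURI());
        filterChain.doFilter(request, response);
    }
}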

Using Aspects for Cross-Cutting Concerns

Aspect-oriented programming enables you to address cross-cutting concerns, like logging, in one place. You can do this without your logging code sprawling across every class.

This approach is great for use cases such as:

  • Logging any exceptions thrown from any method within your packages (See @AfterThrowing)
  • Logging performance metrics by timing before/after each method is run (See @Around)
  • Audit logging. You can log calls to methods that carry a custom annotation, such as @Audit. You only need to create a pointcut matching calls to methods with that annotation

Let’s start with a simple example – we want to log the name of every public method that we call within our package, com.example.demo. There are only a few steps to writing an Aspect that will run before every public method in a package that you specify.

  1. Include spring-boot-starter-aop in your pom.xml or build.gradle.
  2. Add @EnableAspectJAutoProxy to one of your configuration classes. This tells Spring Boot that you want to enable AspectJ support.
  3. Add your pointcut, which defines a pattern that is matched against method signatures as they run. You can find more about how to construct your matching pattern in the Spring Boot documentation for AOP. In our example, we match any public method inside the com.example.demo package.
  4. Add your Aspect. This defines when you want to run your code in relation to the pointcut (e.g. before, after or around the methods that it matches). In this example, the @Before annotation causes the method to be executed before any methods that match the pointcut, as shown in the sketch after this list.
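
Putting steps 3 and 4 together, a sketch of such an aspect might look like this (the pointcut targets our example package, com.example.demo; adjust it to your own):

import org.aspectj.lang.JoinPoint;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Before;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

@Aspect
@Component
public class MyAspect {

    private static final Logger log = LoggerFactory.getLogger(MyAspect.class);

    // Runs before every public method in com.example.demo and its sub-packages
    @Before("execution(public * com.example.demo..*.*(..))")
    public void logMethodCall(JoinPoint joinPoint) {
        log.info("Called {}", joinPoint.getSignature().getName());
    }
}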

That’s all there is to logging every method call. The logs will appear as:

2020-10-27 19:26:33.269  INFO 2052 --- [nio-8080-exec-2]
com.example.demo.MyAspect                : Called checkHealth

By making changes to your pointcut, you can write logs for every method annotated with a specific annotation. For example, consider what you can do with:

@annotation(com.example.demo.Audit)

(This would run for every method annotated with a custom annotation, @Audit.)

4. Applying Context to Your Logs Using MDC

MDC (Mapped Diagnostic Context) is a complex-sounding name for a map of key-value pairs, associated with a single thread. Each thread has its own map. You can add keys/values to the map at runtime, and then reference the keys from that map in your logging pattern. 

The approach comes with a warning that threads may be reused, and so you’ll need to make sure to clear your MDC after each request to avoid your context leaking from one request to the next.

MDC is accessible through SLF4J and supported by both Logback and Log4j2, so we don’t need to worry about the specifics of the underlying implementation. 

The MDC section in the SLF4J documentation gives the simplest examples.

Tracking Requests Through Your Application Using Filters and MDC

Want to be able to group logs for a specific request? The Mapped Diagnostic Context (MDC) will help. 

The steps are:

  1. Add a header to each request going to your API, for example, ‘tracking-id’. You can generate this on the fly (I suggest using a UUID) if your client cannot provide one.
  2. Create a filter that runs once per request and stores that value in the MDC.
  3. Update your logging pattern to reference the key in the MDC to retrieve the value.

Using a Filter, you can read values from the request and set them on the MDC, as in the sketch below. Make sure to clean up after the request by calling MDC.clear(), preferably in a finally block so that it always runs.
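
A minimal sketch, again using OncePerRequestFilter (the header name tracking-id and the MDC key tracking are the ones used in this example):

import java.io.IOException;
import java.util.UUID;

import javax.servlet.FilterChain;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.slf4j.MDC;
import org.springframework.stereotype.Component;
import org.springframework.web.filter.OncePerRequestFilter;

@Component
public class TrackingIdFilter extends OncePerRequestFilter {

    @Override
    protected void doFilterInternal(HttpServletRequest request, HttpServletResponse response,
                                    FilterChain filterChain) throws ServletException, IOException {
        // Use the client's tracking id, or generate one on the fly
        String trackingId = request.getHeader("tracking-id");
        if (trackingId == null || trackingId.isEmpty()) {
            trackingId = UUID.randomUUID().toString();
        }
        MDC.put("tracking", trackingId);
        try {
            filterChain.doFilter(request, response);
        } finally {
            MDC.clear(); // avoid leaking context into the next request on this thread
        }
    }
}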

After setting the value on your MDC, just add %X{tracking} to your logging pattern (replacing the word “tracking” with the key you have put in the MDC) and your logs will contain the value in every log message for that request.
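
For example (a sketch; the rest of the pattern is abbreviated):

logging.pattern.console=%d{yyyy-MM-dd HH:mm:ss.SSS} %5p [%X{tracking}] %logger{39} : %m%n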

If a client reports a problem, as long as you can get a unique tracking-id from your client, then you’ll be able to search your logs and pull up every log statement generated from that specific request.

Other values that you may want to put into your MDC and include on every log message include:

  • The application version.
  • Details of the request, for example, the path.
  • Details of the logged-in user, for example, the username.

5. Unit Testing Your Log Statements

Why Test Your Logs?

You can unit test your logging code. Too often this is overlooked because log statements return void. For example, logger.info("foo"); does not return a value that you can assert against.

It’s easy to make mistakes. Log statements usually involve parameters or formatted strings, and it’s easy to put log statements in the wrong place. Unit testing reassures you that your logs do what you expect and that you’re covered when refactoring to avoid any accidental modifications to your logging behaviour.

The Approach to Testing Your Logs

The Problem

SLF4J’s LoggerFactory.getLogger is static, making it difficult to mock. Searching through output log files in our unit tests is error-prone (e.g. we would need to reset the log files between each unit test). So how do we assert against the logs?

The Solution

The trick is to add your own test appender to the logging framework (e.g. Logback or Log4j2) that captures the logs from your application in memory, allowing you to assert against the output later. The steps are:

  1. Before each test case, add an appender to your logger.
  2. Within the test, call your application code that logs some output.
  3. The logger will delegate to your test appender.
  4. Assert that your expected logs have been received by your test appender.

Each logging framework has suitable appenders, but referencing those concrete appenders in our tests means we depend on the specific framework rather than SLF4J. That’s not ideal, but the alternatives (searching through logged output in files, or implementing our own SLF4J backend) are overkill, making this the pragmatic choice.
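
As a minimal sketch with Logback’s ListAppender (this assumes Logback is your backend, as it is with the default starter, and that AssertJ is on the classpath via spring-boot-starter-test; MyService and its message are hypothetical):

import static org.assertj.core.api.Assertions.assertThat;

import ch.qos.logback.classic.Logger;
import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.core.read.ListAppender;
import org.junit.jupiter.api.Test;
import org.slf4j.LoggerFactory;

class MyServiceLoggingTest {

    @Test
    void logsExpectedMessage() {
        // Attach an in-memory appender to the logger under test
        Logger logger = (Logger) LoggerFactory.getLogger(MyService.class);
        ListAppender<ILoggingEvent> appender = new ListAppender<>();
        appender.start();
        logger.addAppender(appender);

        new MyService().doSomething(); // hypothetical code under test that logs

        // Assert against what the appender captured in memory
        assertThat(appender.list)
                .extracting(ILoggingEvent::getFormattedMessage)
                .contains("My message set at info level");

        logger.detachAppender(appender);
    }
}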

Here are a couple of tricks for unit testing using JUnit 4 rules or JUnit 5 extensions that will keep your test classes clean, and reduce the coupling with the logging framework.

Testing Log Statements Using JUnit 5 Extensions in Two Steps

JUnit 5 extensions help to avoid code duplication between your tests. Here’s how to set up your logging tests in two steps:

Step 1: Create your JUnit extension

Create your extension for Logback

Create your extension for Log4j2

Step 2: Use that extension to assert against your log statements with logback or log4j2

Testing Log Statements Using JUnit 4 Rules in Two Steps

JUnit 4 rules help to avoid code duplication by extracting the common test code away from the test classes. In our example, we don’t want to duplicate the code for adding a test appender to our logger in every test class.

Step 1: Create your JUnit rule. 

Create your rule for Logback

Create your rule for Log4j2

Step 2: Use that rule to assert against your log statements using logback or log4j2.

With these approaches, you can assert that your log statements have been called with a message and level that you expect. 

Conclusion

The Spring Boot Logging Starter provides everything you need to quickly get started, whilst allowing full control when you need it. We’ve looked at how most logging concerns (formatting, destinations, cross-cutting logging, context and unit tests) can be abstracted away from your core application code.

Any global changes to your logging can be done in one place, and the classes for the rest of your application don’t need to change. At the same time, unit tests for your log statements provide you with reassurance that your log statements are being fired after making any alterations to your business logic.

These are my top 5 tips for configuring Spring Boot Logging. However, when your logging configuration is set up, remember that your logs are only ever as good as the content you put in them. Be mindful of the content you are logging, and make sure you are using the right logging levels.

Spring Boot Actuator metrics monitoring with Prometheus and Grafana
03
Mar
2021

Spring Boot Actuator metrics monitoring with Prometheus and Grafana

Welcome to the second part of the Spring Boot Actuator tutorial series. In the first part, you learned what the spring-boot-actuator module does, how to configure it in a Spring Boot application, and how to interact with various actuator endpoints.

In this article, you’ll learn how to integrate Spring Boot Actuator with a monitoring system called Prometheus and a graphing solution called Grafana.

At the end of this article, you’ll have a Prometheus as well as a Grafana dashboard set up on your local machine, where you’ll be able to visualize and monitor all the metrics generated from the Spring Boot application.

Prometheus

Prometheus is an open-source monitoring system that was originally built by SoundCloud. It consists of the following core components –

  • A data scraper that pulls metrics data over HTTP periodically at a configured interval.
  • A time-series database to store all the metrics data.
  • A simple user interface where you can visualize, query, and monitor all the metrics.

Grafana

Grafana allows you to bring in data from various data sources like Elasticsearch, Prometheus, Graphite, InfluxDB, etc., and visualize it with beautiful graphs.

It also lets you set alert rules based on your metrics data. When an alert changes state, it can notify you over email, Slack, or various other channels.

Note that the Prometheus dashboard also has simple graphs, but Grafana’s graphs are far better. That’s why, in this post, we’ll integrate Grafana with Prometheus to import and visualize our metrics data.

Adding Micrometer Prometheus Registry to your Spring Boot application

Spring Boot uses Micrometer, an application metrics facade, to integrate actuator metrics with external monitoring systems.

It supports several monitoring systems like Netflix Atlas, AWS Cloudwatch, Datadog, InfluxData, SignalFx, Graphite, Wavefront, Prometheus etc.

To integrate actuator with Prometheus, you need to add the micrometer-registry-prometheus dependency –

<!-- Micrometer Prometheus registry  -->
<dependency>
	<groupId>io.micrometer</groupId>
	<artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

Once you add the above dependency, Spring Boot will automatically configure a PrometheusMeterRegistry and a CollectorRegistry to collect and export metrics data in a format that can be scraped by a Prometheus server.

All the application metrics data are made available at an actuator endpoint called /prometheus. The Prometheus server can scrape this endpoint to get metrics data periodically.
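
One caveat worth checking: Spring Boot 2.x only exposes a few endpoints over HTTP by default, so depending on your configuration you may also need to include prometheus in application.properties (this is the standard Actuator exposure property):

# Expose the prometheus endpoint alongside health and info
management.endpoints.web.exposure.include=health,info,prometheus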

Exploring Spring Boot Actuator’s /prometheus Endpoint

Let’s explore the prometheus endpoint that is exposed by Spring Boot when micrometer-registry-prometheus dependency is available on the classpath.

First of all, you’ll start seeing the prometheus endpoint on the actuator endpoint-discovery page (http://localhost:8080/actuator) –

Spring Boot Actuator Prometheus Endpoint

The prometheus endpoint exposes metrics data in a format that can be scraped by a Prometheus server. You can see the exposed metrics data by navigating to the prometheus endpoint (http://localhost:8080/actuator/prometheus) –

# HELP jvm_buffer_memory_used_bytes An estimate of the memory that the Java virtual machine is using for this buffer pool
# TYPE jvm_buffer_memory_used_bytes gauge
jvm_buffer_memory_used_bytes{id="direct",} 81920.0
jvm_buffer_memory_used_bytes{id="mapped",} 0.0
# HELP jvm_threads_live The current number of live threads including both daemon and non-daemon threads
# TYPE jvm_threads_live gauge
jvm_threads_live 23.0
# HELP tomcat_global_received_bytes_total  
# TYPE tomcat_global_received_bytes_total counter
tomcat_global_received_bytes_total{name="http-nio-8080",} 0.0
# HELP jvm_gc_pause_seconds Time spent in GC pause
# TYPE jvm_gc_pause_seconds summary
jvm_gc_pause_seconds_count{action="end of minor GC",cause="Allocation Failure",} 7.0
jvm_gc_pause_seconds_sum{action="end of minor GC",cause="Allocation Failure",} 0.232
jvm_gc_pause_seconds_count{action="end of minor GC",cause="Metadata GC Threshold",} 1.0
jvm_gc_pause_seconds_sum{action="end of minor GC",cause="Metadata GC Threshold",} 0.01
jvm_gc_pause_seconds_count{action="end of major GC",cause="Metadata GC Threshold",} 1.0
jvm_gc_pause_seconds_sum{action="end of major GC",cause="Metadata GC Threshold",} 0.302
# HELP jvm_gc_pause_seconds_max Time spent in GC pause
# TYPE jvm_gc_pause_seconds_max gauge
jvm_gc_pause_seconds_max{action="end of minor GC",cause="Allocation Failure",} 0.0
jvm_gc_pause_seconds_max{action="end of minor GC",cause="Metadata GC Threshold",} 0.0
jvm_gc_pause_seconds_max{action="end of major GC",cause="Metadata GC Threshold",} 0.0
# HELP jvm_gc_live_data_size_bytes Size of old generation memory pool after a full GC
# TYPE jvm_gc_live_data_size_bytes gauge
jvm_gc_live_data_size_bytes 5.0657472E7

## More data ...... (Omitted for brevity)

Downloading and Running Prometheus using Docker

1. Downloading Prometheus

You can download the Prometheus Docker image using the docker pull command like so –

$ docker pull prom/prometheus

Once the image is downloaded, you can type the docker image ls command to view the list of images present locally –

$ docker image ls
REPOSITORY                                   TAG                 IMAGE ID            CREATED             SIZE
prom/prometheus                              latest              b82ef1f3aa07        5 days ago          119MB

2. Prometheus Configuration (prometheus.yml)

Next, we need to configure Prometheus to scrape metrics data from Spring Boot Actuator’s /prometheus endpoint.

Create a new file called prometheus.yml with the following configurations –

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['127.0.0.1:9090']

  - job_name: 'spring-actuator'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 5s
    static_configs:
    - targets: ['HOST_IP:8080']

The above configuration file is an extension of the basic configuration file available in the Prometheus documentation.

The most important stuff to note in the above configuration file is the spring-actuator job inside scrape_configs section.

The metrics_path is the path of the Actuator’s prometheus endpoint. The targets section contains the HOST and PORT of your Spring Boot application.

Please make sure to replace HOST_IP with the IP address of the machine where your Spring Boot application is running. Note that localhost won’t work here, because we’ll be connecting to the host machine from inside the Docker container; you must specify a network-reachable IP address. (On Docker Desktop for Mac and Windows, the special hostname host.docker.internal also works.)

3. Running Prometheus using Docker

Finally, let’s run Prometheus using Docker. Type the following command to start a Prometheus server in the background –

$ docker run -d --name=prometheus -p 9090:9090 -v <PATH_TO_prometheus.yml_FILE>:/etc/prometheus/prometheus.yml prom/prometheus --config.file=/etc/prometheus/prometheus.yml

Please make sure to replace the <PATH_TO_prometheus.yml_FILE> with the PATH where you have stored the Prometheus configuration file.

After running the above command, Docker will start the Prometheus server inside a container. You can see the list of all running containers with the following command –

$ docker container ls
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                    NAMES
e036eb20b8ad        prom/prometheus     "/bin/prometheus --c…"   4 minutes ago       Up 4 minutes        0.0.0.0:9090->9090/tcp   prometheus

4. Visualizing Spring Boot Metrics from Prometheus dashboard

That’s it! You can now navigate to http://localhost:9090 to explore the Prometheus dashboard.

You can enter a Prometheus query expression inside the Expression text field and visualize all the metrics for that query.
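
For example, both of these are standard Micrometer metric names exposed by the /prometheus endpoint shown earlier (the 5-minute window is just an illustration):

# Current CPU usage of the system
system_cpu_usage

# Per-second rate of HTTP requests, averaged over the last 5 minutes
rate(http_server_requests_seconds_count[5m])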

Following are some Prometheus graphs for our Spring Boot application’s metrics –

  • System’s CPU usage –
Spring Boot Actuator Metrics Dashboard Prometheus
  • Response latency of a slow API –
Spring Boot Actuator Prometheus Dashboard Graph Example

You can check out the official Prometheus documentation to learn more about Prometheus Query Expressions.

Downloading and running Grafana using Docker

Type the following command to download and run Grafana using Docker –

$ docker run -d --name=grafana -p 3000:3000 grafana/grafana 

The above command will start Grafana inside a Docker container and make it available on port 3000 of the host machine.

You can type docker container ls to see the list of Docker containers –

$ docker container ls
CONTAINER ID        IMAGE               COMMAND                  CREATED                  STATUS              PORTS                    NAMES
cf9196b30d0d        grafana/grafana     "/run.sh"                Less than a second ago   Up 5 seconds        0.0.0.0:3000->3000/tcp   grafana
e036eb20b8ad        prom/prometheus     "/bin/prometheus --c…"   16 minutes ago           Up 16 minutes       0.0.0.0:9090->9090/tcp   prometheus

That’s it! You can now navigate to http://localhost:3000 and log in to Grafana with the default username admin and password admin.

Configuring Grafana to import metrics data from Prometheus

Follow the steps below to import metrics from Prometheus and visualize them on Grafana:

1. Add the Prometheus data source in Grafana

Spring Boot Actuator Prometheus Grafana Dashboard

2. Create a new Dashboard with a Graph

Spring Boot Actuator Grafana Dashboard Create Graph

3. Add a Prometheus Query expression in Grafana’s query editor

Spring Boot Actuator Grafana Dashboard Prometheus Metrics Graph

4. Visualize metrics from Grafana’s dashboard

Spring Boot Actuator Grafana Dashboard Visualize Prometheus metrics graph

Read the First Part: Spring Boot Actuator: Health check, Auditing, Metrics gathering and Monitoring

More Learning Resources