Logs play a significant role in any application. They are extremely useful for investigating issues when an application encounters an error or does not behave as expected. Log files contain records of the events and operations performed on a system or application, providing a history that can be analyzed to find the root cause of an issue.
Nowadays, we are moving from monolithic software applications to distributed microservices architectures at an ever-increasing pace. This migration brings a lot of flexibility to an application, but it also brings complexity in terms of maintainability.
Challenges with Microservices Logging
In a microservices architecture, many distributed services communicate with each other, and each service produces its own log files. Managing the log files of multiple services can be time-consuming and painful.
Since the microservices run on different servers, their health must be monitored to avoid failures. If any one of the services fails, it is laborious to find the impacted service because of this distributed nature.
To analyze error logs across microservices, we need to filter and check the logs of individual services one by one until we find the relevant information. This means significant time and effort is spent on debugging and troubleshooting each application and its integrations.
Centralized Logging Solution
The centralized logging approach consolidates the logs from all distributed application services so they can be monitored from one interface. It provides easy access to the logs for analysis in a single dashboard, saving us from having to search each server individually while troubleshooting.
The most popular implementation of a centralized logging solution is the ELK (Elasticsearch, Logstash, Kibana) Stack, although alternatives exist.
Before we go on, we should mention the pros and cons of a Centralized logging approach:
- Pros:
- All application service and system logs are accessible from one interface
- A single dashboard provides an application-wide view
- Better monitoring of the whole microservice application
- Cons:
- Requires a significant one-time effort to set up centralized logging
- Not very useful for monolithic applications
Centralized Logging Components
A centralized logging solution is based on three main components.
- Collect, Parse, Transform:
The first component of a centralized logging mechanism collects incoming log data and parses it into a more structured format. Filters can also be applied to extract the relevant data.
Logstash is the most popular framework for this activity. Alternatives include Fluentd and Flume, and AWS Kinesis can also be used along with CloudWatch in place of Logstash.
- Store, Index, Search:
The second component stores the consolidated and filtered data so that it can be processed for analysis. The data is also indexed to make it searchable. Elasticsearch is a widely used search engine for centralized logging and helps provide speedy search results.
- Analyze, Visualize, Monitor:
The third component provides visualizations of the processed data so it can be analyzed and monitored from one interface, giving single-point access to the logs of all the application services involved. Kibana is a popular choice and provides one dashboard to access all the data. It also supports different chart types for visualizing events, giving better control over large amounts of data.
Best Practices of Centralized Logging
- Use Correlation Ids in services:
Always use correlation or unique transaction IDs when working with microservices. A correlation ID helps in tracking requests as they hop between services, which makes finding a failed request and debugging much easier.
- Add structure to the log:
Adding structural information to the logs and following the same pattern across all microservices is important. It gives a clear view of all the data required for troubleshooting; a sample structured entry is shown after this list.
- Trigger error notification alerts:
Set up automatic error notifications based on the severity of the error. This helps notify us when services are down or not responding.
- Include relevant details in logs:
Adding only relevant details to the log and avoiding unnecessary information significantly reduces the log volume, which in turn reduces the load on the server.
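As a sketch of the first two practices, a structured log entry might look like the following JSON. The field names and values here are purely illustrative, not a required schema:

```
{
  "timestamp": "2024-05-14T10:32:11.532Z",
  "level": "ERROR",
  "service": "order-service",
  "correlationId": "4f2b9c7e-1d3a-4a8e-9b6f-0c1d2e3f4a5b",
  "message": "Payment gateway timed out after 30s",
  "stackTrace": "java.net.SocketTimeoutException: ..."
}
```

Because every service emits the same fields and carries the same correlationId through the call chain, a single filter in the central dashboard can pull up every log line belonging to one request.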
What is Elastic Stack (Formerly ELK Stack)?
ELK Stack is an acronym used to describe a stack that comprises three popular projects: Elasticsearch, Logstash, and Kibana. Now known as the Elastic Stack, it gives you the ability to aggregate logs from all your systems and applications, analyze these logs, and create visualizations for application and infrastructure monitoring, enabling faster troubleshooting, security analytics, and more.
Why ELK Stack?
Having a good log monitoring infrastructure is key when developing any software. In a microservices architecture, for instance, a single operation can trigger a chain of API calls, making it challenging to debug the application when an error occurs.
This is where logs act as essential information, allowing us to better investigate and diagnose errors. They help sysadmins, support teams, and even developers follow the different operations carried out by the different services of the system.
However, maintaining this critical data becomes very complex in a distributed environment where many applications, services, and systems are running. As a solution to this problem, we are going to look at the ELK Stack, a useful toolset for centralized log aggregation and analysis.
What is Elasticsearch?
- Elasticsearch is a distributed, open-source search and analytics engine built on Apache Lucene and developed in Java.
- It started as a scalable version of the Lucene open-source search framework and added the ability to horizontally scale Lucene indices.
- Elasticsearch allows you to store, search, and analyze huge volumes of data quickly and in near real time, returning answers in milliseconds.
- It’s able to achieve fast search responses because instead of searching the text directly, it searches an index.
- It uses a structure based on documents instead of tables and schemas and comes with extensive REST APIs for storing and searching the data.
- At its core, you can think of Elasticsearch as a server that can process JSON requests and give you back JSON data, as in the example below.
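As a minimal sketch of that request/response model, the following calls index a document and then search for it. The index name and fields are made up for illustration, and a local cluster with security disabled is assumed:

```
# Index a JSON document into an index called "logs"
curl -X POST "http://localhost:9200/logs/_doc" \
  -H "Content-Type: application/json" \
  -d '{ "service": "order-service", "level": "ERROR", "message": "Payment timeout" }'

# Search the index and get JSON results back
curl -X GET "http://localhost:9200/logs/_search?q=message:timeout"
```

Both the request bodies and the responses are plain JSON, which is what makes Elasticsearch easy to drive from any language or tool that can speak HTTP.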
How does Elasticsearch work?
- Fundamentally, Elasticsearch organizes data into documents, which are JSON-based units of information representing entities.
- Documents are grouped into indices, similar to databases, based on their characteristics.
- Elasticsearch uses inverted indices, a data structure that maps words to their locations in documents, for efficient search. Its distributed architecture enables rapid search and analysis of massive amounts of data with near real-time performance.
Logical Concepts
Documents:
- Documents are the basic unit of information that can be indexed in Elasticsearch, expressed in JSON, a widely used data interchange format. You can think of a document as a row in a relational database, representing a given entity: the thing you're searching for.
- In Elasticsearch, a document can be more than just text; it can be any structured data encoded in JSON, such as numbers, strings, and dates.
- Each document has a unique ID and a given data type, which describes what kind of entity the document is.
- For example, a document can represent an encyclopedia article or a log entry from a web server.
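As an illustration, a web-server log entry stored in Elasticsearch might come back looking like the snippet below, with the unique ID and the index it belongs to shown alongside the document body in _source. The index name, ID, and fields here are hypothetical:

```
{
  "_index": "weblogs",
  "_id": "9xNfVYkBqk3K",
  "_source": {
    "timestamp": "2024-05-14T10:32:11Z",
    "client_ip": "203.0.113.7",
    "method": "GET",
    "path": "/products/42",
    "status": 404
  }
}
```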
Index:
- An index is a collection of documents that have similar characteristics.
- An index is the highest level entity that you can query against in Elasticsearch.
- You can think of the index as being similar to a database in a relational database schema. Any documents in an index are typically logically related.
- In the context of an e-commerce website, for example, you can have an index for Customers, one for Products, one for Orders, and so on.
- An index is identified by a name that is used to refer to the index while performing indexing, search, update, and delete operations against the documents in it.
Inverted Index:
- An index, in Elasticsearch, is actually what’s called an inverted index, which is the mechanism by which all search engines work.
- It is a data structure that stores a mapping from content, such as words or numbers, to its locations in a document or a set of documents.
- It is a hashmap-like data structure that directs you from a word to a document.
- An inverted index doesn't store strings directly; instead, it splits each document up into individual search terms (i.e. each word) and then maps each search term to the documents in which it occurs. For example, if the term "best" occurs in document 2, it is mapped to that document. This serves as a quick look-up of where to find search terms in a given index. By using distributed inverted indices, Elasticsearch quickly finds the best matches for full-text searches over even very large data sets. A small sketch of this mapping follows.
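Consider two toy documents and the inverted index that would conceptually be built from them (terms are lowercased; real analyzers also handle stemming, stop words, and so on, so this is only a sketch):

```
Document 1: "fresh pizza in town"
Document 2: "the best pasta in town"

Inverted index (term -> documents containing it):
  fresh -> [1]
  pizza -> [1]
  town  -> [1, 2]
  best  -> [2]
  pasta -> [2]
  the/in -> often dropped as stop words, depending on the analyzer
```

Looking up a query term is then a direct map lookup rather than a scan of every document's text.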
Term Frequency/Inverse Document Frequency (TF/IDF):
- TF-IDF means Term Frequency * Inverse Document Frequency.
- Term Frequency is how often a term appears in a given document.
- So if the word “space” occurs very frequently in a given document, it would have a high term frequency.
- The same applies to the word "the": if it appears frequently in the document, it will also have a high term frequency.
- Document frequency is how often a term appears across all of the documents in your entire index.
- Dividing term frequency by document frequency gives a measure of the relevance of a term in a document.
The word "space" probably doesn't occur very often across the entire index, so it would have a low document frequency. However, the word "the" appears frequently in all documents, so it would have a very high document frequency. If we divide term frequency by document frequency (which is the same as multiplying by the inverse document frequency), we get a measure of relevance: not only how often the term occurs within the document, but how that compares to how often it occurs in documents across the entire index.
With that example, the word "space" in an article about space would rank very highly, while the word "the" wouldn't rank high at all, because it is a common term found in every other document as well. This is the basic idea of how search engines work: if you search for a given term, results come back in order of their relevancy, and relevancy is loosely based on the concept of TF-IDF. It's not really complicated; a small worked example follows.
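As a rough worked example, using one common formulation of TF-IDF (Elasticsearch's actual scoring, BM25 by default, is more involved, so treat these numbers as illustrative only):

```
TF-IDF(term, doc) = TF(term, doc) * log( N / DF(term) )

Index of N = 1,000 documents:

"space": appears 10 times in the article, and in 2 documents overall
         score ≈ 10 * log(1000 / 2) ≈ 10 * 6.2 ≈ 62     -> highly relevant

"the":   appears 50 times in the article, and in all 1,000 documents
         score = 50 * log(1000 / 1000) = 50 * 0 = 0      -> not relevant
```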
Backend Components:
Cluster:
An Elasticsearch cluster is a group of one or more node instances that are connected together. The power of an Elasticsearch cluster lies in the distribution of tasks (searching and indexing) across all the nodes in the cluster.
Node:
A node is a single server that is a part of a cluster. A node stores data and participates in the cluster’s indexing and search capabilities. An Elasticsearch node can be configured in different ways:
- Master Node — Controls the Elasticsearch cluster and is responsible for all cluster-wide operations like creating/deleting an index as well as adding/removing nodes.
- Data Node — Stores data and executes data-related operations such as search and aggregation.
- Client Node — Forwards cluster requests to the master node and data-related requests to data nodes.
Shards:
Elasticsearch provides the ability to subdivide the index into multiple pieces called shards. Each shard is in itself a fully-functional and independent “index” that can be hosted on any node within a cluster. By distributing the documents in an index across multiple shards, and distributing those shards across multiple nodes, Elasticsearch can ensure redundancy, which both protects against hardware failures and increases query capacity as nodes are added to a cluster.
Replicas:
Elasticsearch allows you to make one or more copies of your index’s shards which are called “replica shards” or just “replicas”. Basically, a replica shard is a copy of a primary shard. Each document in an index belongs to one primary shard. Replicas provide redundant copies of your data to protect against hardware failure and increase capacity to serve read requests like searching or retrieving a document.
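Putting shards and replicas together, the request below creates an index with explicit settings. The index name and numbers are illustrative, and a local, security-disabled cluster is assumed:

```
curl -X PUT "http://localhost:9200/orders" \
  -H "Content-Type: application/json" \
  -d '{
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1
    }
  }'
```

With these settings, each of the three primary shards gets one replica, so queries can be spread across more copies and the data can survive the loss of a single node once the shards are allocated to different nodes.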
What is Elasticsearch Used for?
Now that we have a general understanding of what Elasticsearch is, the logical concepts behind it, and its architecture, we have a better sense of why and how it can be used for a variety of use cases. Below, we’ll examine some of Elasticsearch’s primary use cases and provide examples of how companies are using it today.
Primary Use Cases:
- Application search — For applications that rely heavily on a search platform for the access, retrieval, and reporting of data.
- Website search — Websites that store a lot of content find Elasticsearch a very useful tool for effective and accurate searches. It's no surprise that Elasticsearch is steadily gaining ground in the site-search domain.
- Enterprise search — Elasticsearch enables enterprise-wide search, including document search, e-commerce product search, blog search, people search, and any other form of search you can think of. In fact, it has steadily penetrated and replaced the search solutions of most of the popular websites we use on a daily basis. From a more enterprise-specific perspective, Elasticsearch is used to great success in company intranets.
- Logging and log analytics — As we've discussed, Elasticsearch is commonly used for ingesting and analyzing log data in near real-time and in a scalable manner. It also provides important operational insights on log metrics to drive actions.
- Infrastructure metrics and container monitoring — Many companies use the ELK Stack to analyze various metrics. This may involve gathering data across several performance parameters that vary by use case.
- Security analytics — Another major analytics application of Elasticsearch is security analysis. Access logs and similar logs concerning system security can be analyzed with the ELK Stack, providing a more complete picture of what's going on across your systems in real time.
- Business analytics — Many of the built-in features of the ELK Stack make it a good option as a business analytics tool. However, there is a steep learning curve for implementing it in most organizations, especially where companies have multiple data sources besides Elasticsearch, since Kibana only works with Elasticsearch data. A good alternative is Knowi, an analytics platform that natively integrates with Elasticsearch and allows even non-technical business users to create visualizations and perform analytics on Elasticsearch data without prior knowledge or expertise of the ELK Stack.
Company Use Cases:
Netflix:
Netflix relies on the ELK Stack across various use cases to monitor and analyze customer service operations and security logs.
For example, Elasticsearch is the underlying engine behind their messaging system. In addition, the company chose Elasticsearch for its automatic sharding and replication, flexible schema, nice extension model, and ecosystem with many plugins. Netflix has steadily increased their use of Elasticsearch from a few isolated deployments to over a dozen clusters consisting of several hundred nodes.
Ebay:
With countless business-critical text search and analytics use cases that utilize Elasticsearch as the backbone, eBay has created a custom ‘Elasticsearch-as-a-Service’ platform to allow easy Elasticsearch cluster provisioning on their internal OpenStack-based cloud platform.
Walmart:
Walmart utilizes the Elastic Stack to reveal the hidden potential of its data, gaining insights into customer purchasing patterns, tracking store performance metrics, and running holiday analytics, all in near real time. It also leverages the Elastic Stack's security features with SSO, alerting for anomaly detection, and monitoring for DevOps.
Benefits of using Elasticsearch:
- Fast Search: Elasticsearch provides very fast search results, making it ideal for applications where speed is important. It is capable of searching millions of documents in real time and returning results in just a few milliseconds.
- Scalability: Elasticsearch is designed to be highly scalable and can easily handle large volumes of data. It can be used to index and search data across multiple servers, making it ideal for distributed applications.
- Flexibility: Elasticsearch is very flexible and can be used for a wide range of applications, including full-text search, analytics, logging, and more. It also supports a wide range of data types and provides various search options, including fuzzy search, partial matching, and more.
- High Availability: Elasticsearch is designed to be highly available, with built-in features such as data replication and automatic failover. This ensures that your data is always available and protected against hardware failures.
- Open Source: Elasticsearch is open source, which means that it is free to use and can be customized to suit your specific needs. It also has a large and active community of developers who contribute to its development and provide support.
- Integration: Elasticsearch can be easily integrated with other technologies, including Logstash, Kibana, and others. This makes it easy to build powerful applications that combine search, analytics, and visualization.
Kibana:
Kibana is a data visualization and management tool for Elasticsearch that provides real-time histograms, line graphs, pie charts, and maps. It lets you visualize your Elasticsearch data and navigate the Elastic Stack.
You can select the way you give shape to your data by starting with one question and seeing where the interactive visualization leads you. For example, since Kibana is often used for log analysis, it allows you to answer questions about where your web hits are coming from, how your URLs are distributed, and so on.
If you’re not building your own application on top of Elasticsearch, Kibana is a great way to search and visualize your index with a powerful and flexible UI. However, a major drawback is that every visualization can only work against a single index/index pattern. So if you have indices with strictly different data, you’ll have to create separate visualizations for each.
For more advanced use cases, Knowi is a good option. It allows you to join your Elasticsearch data across multiple indexes and blend it with other SQL/NoSQL/REST-API data sources, then create visualizations from it in a business-user friendly UI.
Logstash:
Logstash is used to aggregate and process data and send it to Elasticsearch. It is an open-source, server-side data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends it on to a destination such as Elasticsearch. It also prepares data regardless of format by identifying named fields to build structure and transforming them to converge on a common format.
For example, since data is often scattered across different systems in various formats, Logstash allows you to tie different systems together like web servers, databases, Amazon services, etc. and publish data to wherever it needs to go in a continuous streaming fashion.
Beats:
Beats is a collection of lightweight, single-purpose data shipping agents used to send data from hundreds or thousands of machines and systems to Logstash or Elasticsearch. Beats are great for gathering data: they can sit on your servers, run alongside your containers, or be deployed as functions, and then centralize data in Elasticsearch.
For example, Filebeat can sit on your server, monitor log files as they come in, parse them, and import into Elasticsearch in near-real-time.
What are Examples of Beats?
There are currently six official Beats from Elastic: Filebeat, Metricbeat, Packetbeat, Heartbeat, Winlogbeat, and Auditbeat. All of these beats are open source and Apache-licensed. Elastic maintains a list of regularly updated community beats that users can download, install, and even modify as needed. While each beat has its own distinct use, they all solve the common problem of gathering data at its source and making it easy and efficient to ship that data to Elasticsearch.
Filebeat:
Filebeat is designed to read files from your system. It is particularly useful for system and application log files, but can be used for any text files that you would like to index to Elasticsearch in some way. In the logging case, it helps centralize logs and files in an efficient manner by reading from your various servers and VMs, then shipping to a central Logstash or Elasticsearch instance.
Metricbeat:
As the name implies, Metricbeat is used to collect metrics from servers and systems. It is a lightweight platform dedicated to sending system and service statistics. Like Filebeat, Metricbeat includes modules to grab metrics from operating systems such as Linux, Windows, and macOS, and from applications such as Apache, MongoDB, MySQL, and NGINX.
Metricbeat is extremely lightweight and can be installed on your systems without impacting system or application performance. As with all of the Beats, Metricbeat makes it easy to create your own custom modules.
Packetbeat:
Packetbeat, a lightweight network packet analyzer, monitors network protocols to enable users to keep tabs on network latency, errors, response times, SLA performance, user access patterns and more. With Packetbeat, data is processed in real time so users can understand and monitor how traffic is flowing through their network. Furthermore, Packetbeat supports multiple application layer protocols, including MySQL and HTTP.
Winlogbeat:
Winlogbeat is a tool specifically designed for providing live streams of Windows event logs. It can read events from any Windows event log channel, monitoring log-ons, log-on failures, USB storage device usage and the installation of new software programs. The raw data collected by Winlogbeat is automatically sent to Elasticsearch and then indexed for convenient future reference. Winlogbeat acts as a security enhancement tool and makes it possible for a company to keep tabs on literally everything that is happening on its Windows-powered hosts.
Auditbeat:
Auditbeat performs a similar function on Linux platforms, monitoring user and process activity across your fleet. Auditd event data is analyzed and sent, in real time, to Elasticsearch for monitoring the security of your environment.
Heartbeat:
Heartbeat is a lightweight shipper for uptime monitoring. It monitors services by pinging them and then ships the data to Elasticsearch for analysis and visualization. Heartbeat can ping using ICMP, TCP, and HTTP. It has support for TLS, authentication, and proxies. Its efficient DNS resolution enables it to monitor every single host behind a load-balanced server.
When to Use Filebeat and When to Use Logstash?
Filebeat is considered one of the best log shippers as it is lightweight, supports SSL & TLS encryption, and is extremely reliable. However, it cannot transform the logs into easy-to-analyze structured data. That’s the part performed by Logstash. If you require advanced log enhancement like filtering out unwanted bits of data or transforming data to another format, you have to go for Logstash. If you are only interested in the timestamp and message content, you can choose Filebeat to act as your log aggregator, especially in a distributed environment.
Filebeat can either ship data directly to Elasticsearch or ship it first to Logstash, which then ingests the data into Elasticsearch. If you want the benefits of both Filebeat and Logstash, you can very well go with the second approach, as sketched below.
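As a minimal sketch of the two options, a filebeat.yml can point its output either directly at Elasticsearch or at Logstash; only one output may be enabled at a time. The hosts and ports below are the usual defaults and purely illustrative:

```
# Option 1: ship directly to Elasticsearch
output.elasticsearch:
  hosts: ["localhost:9200"]

# Option 2: ship to Logstash first, and let Logstash forward to Elasticsearch
# output.logstash:
#   hosts: ["localhost:5044"]
```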
ELK Installation and Other Configurations:
Elasticsearch configuration:
- Download the Elasticsearch zip file from the official elastic website and extract the zip file contents.
- Start the Elasticsearch cluster by running bin/elasticsearch on Linux and macOS or bin\elasticsearch.bat on Windows.
- Make sure the Elasticsearch cluster is up and working fine by opening a browser and navigating to https://localhost:9200.
When we run Elasticsearch for the very first time, it generates a default password for the default user "elastic". We can use that password to log in to Elasticsearch.
It also generates an enrollment token. When we run Kibana for the first time, it asks for this enrollment token and then for the Elasticsearch username and password.
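As a quick way to check the cluster from the command line (the certificate on a fresh install is self-signed, hence the -k flag; substitute the password that was generated for you):

```
curl -k -u elastic:<generated-password> https://localhost:9200
```

A JSON response containing the cluster name and version number confirms that the cluster is up.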
Kibana Configuration:
- Setting up Kibana is similar to Elasticsearch. Download the latest version of the application from the official website.
- To start Kibana, run bin/kibana on Linux and macOS or bin\kibana.bat on Windows.
- By default, Kibana listens on port 5601. If you go to http://localhost:5601, you should be redirected to the Kibana home page.
Logstash Configuration:
- Download and extract the latest version of Logstash from official Logstash downloads.
- Here, inside the config folder, you need to create a configuration (.conf) file. For instance, in this case, you will be creating logstash-sample.conf.
- A Logstash configuration consists of 3 components:
1. Input – The input section defines the name and the absolute path of the file from which data has to be fetched. In your case, it will be the log file generated by the Spring Boot application.
2. Filter – The filter section defines what processing you want to apply to the data.
3. Output – The output section defines the target where you want to send the data (Elasticsearch, for example).
To run the logstash configuration file, use the command: logstash -f <file>.conf
- In the input, you have specified the type and the complete path of your file. Note that the value of the path must be absolute and cannot be relative.
- In the filter, wherever the logs contain a Tab character (\t) followed by "at", you tag that entry as the stack trace of an error. Grok is a filter plugin in Logstash used to parse and query unstructured data.
- In the output, you have defined a response that prints to the STDOUT of the console running Logstash using output codecs.
According to the official docs:
Output codecs are a convenient method for encoding your data before it leaves the output without needing a separate filter in your Logstash pipeline. Here, rubydebug outputs event data using the ruby "awesome_print" library. This is the default codec for stdout. And finally, you have defined the output target where these logs have to be sent from Logstash, that is, Elasticsearch running locally on port 9200. A sketch of the complete file is shown below.
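Putting the pieces above together, a minimal logstash-sample.conf might look like the following. The file path, the multiline pattern, and the stack-trace tagging are assumptions for illustration rather than a definitive configuration; grok could equally be used in the filter to parse named fields out of the message:

```
input {
  file {
    type => "java"
    # Absolute path to the Spring Boot application's log file (illustrative)
    path => "/var/log/myapp/application.log"
    # Merge lines that begin with whitespace + "at" into the previous event (stack traces)
    codec => multiline {
      pattern => "^(\t|\s)+at"
      what => "previous"
    }
  }
}

filter {
  # Tag any event whose message contains a tab followed by "at" as a stack trace
  if [message] =~ /\tat/ {
    mutate { add_tag => ["stacktrace"] }
  }
}

output {
  # Print each event to the console for debugging (rubydebug is the default stdout codec)
  stdout {
    codec => rubydebug
  }
  # Ship events to the local Elasticsearch instance
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
```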
Filebeat configuration:
You can configure Filebeat in much the same way as the other components of the ELK Stack.
- Extract the zip file after downloading it from the official Filebeat Downloads.
- Just as you modified the logstash-sample.conf file for Logstash, here you do the same thing in filebeat.yml. A minimal sketch of the file appears at the end of this section.
- In the input, you have to specify the complete path of the log file from where Filebeat will read the data.
- In the output, you have to define the host as Elasticsearch and the port on which it is configured to listen. The protocol takes either http or https as its value and specifies how Elasticsearch is reachable; in this case, it is http.
- Now run the file with the command filebeat.exe -c filebeat.yml.
- Similar to what you did for Logstash, you need to create a Filebeat index pattern inside Kibana by getting the index name from the Elasticsearch indices list.
- Here, you will see a new index name starting with “filebeat-“. This is the index that has been created by Filebeat.
- Next, navigate back to the index pattern management console in Kibana.
- Click on the Create index pattern and type the index name as filebeat-*.
- In the next step, pick a field for filtering the data. You can again pick @timestamp and then click on Create index pattern.
- After this, navigate to http://localhost:5601/app/discover.
- Select the filebeat index from the filters that you just created, and you’ll be able to see and analyze the logs.
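Putting the input and output described above together, a minimal filebeat.yml might look like this; the log path is illustrative, and a local, security-disabled Elasticsearch is assumed:

```
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      # Absolute path to the application log file (illustrative)
      - /var/log/myapp/application.log

output.elasticsearch:
  hosts: ["localhost:9200"]
  protocol: "http"
```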
Creating Indexes and Viewing Logs in Kibana:
To access data from Elasticsearch, Kibana requires index patterns. As an analogy, an index is like a table in a SQL database where the data is stored. Therefore, you must first create an index pattern to explore your data.
How do you know what index to create?
For this, navigate to http://localhost:9200/_cat/indices; it will display all the indices that exist inside Elasticsearch. Here you will see an index name starting with "logstash-". This is the index that was created by Logstash.
- Next, go to http://localhost:5601/app/management/kibana/indexPatterns and click on create index pattern on the top right.
- Type the index name as logstash-* and then click on next.
- Optionally, in the next step, you can pick a field for filtering the data. You can choose @timestamp and then click on Create index pattern.
- After this, head to http://localhost:5601/app/discover.
- Select the index from the filters that you just created, and you’ll be able to see and analyze the logs.
Architecture of centralized logging with microservices:
In this architecture, each microservice writes its logs locally, Filebeat ships the log files to Logstash, Logstash parses and transforms the events and forwards them to Elasticsearch, and Kibana sits on top of Elasticsearch as the single dashboard for searching and visualizing the logs.
Conclusion:
Centralized logging provides single-point access to log events from across microservices, which significantly reduces debugging and troubleshooting effort. Moreover, it is very useful for monitoring all microservices from one dashboard, especially if you expect your microservice application to grow and include new services.