Welcome!
Beyond Collection: Building Observability with Logs
Your logs are key to enhancing APM traces, providing metrics, detecting potential security threats, and maintaining regulatory compliance. Getting the full value from your logs requires more than just collecting and analyzing log lines.
This activity is split into two main parts:
- Part one: We will focus on Apache logs and use them as examples to explore log parsing, Logging without Limits, log monitors, and log analytics.
- Part two: We will use a fully instrumented real application to demonstrate how logs can bring context to your metrics and traces.
When you are ready to continue, click Start Scenario.

Steps
Beyond Collection: Building Observability with Logs
Logging best practices presentation
Don't go further than this step for now, and listen to the Logging best practices presentation.
Log Management 101
Collection
Datadog is a SaaS product: to see your logs in Datadog, you need to collect them from your hosts, containers, applications, browsers, and serverless functions and send them to Datadog.
Details on how to collect logs from your environment can be found:
- In the Datadog Log collection documentation.
- In the Getting started section of your Datadog application.
For this workshop, we are going to cover log collection from a Docker environment thanks to the Agent.
Running the Agent
To run the Agent:
- Connect to your Datadog application.
- Enable the Log Management product by entering the Logs menu and clicking Getting Started.
- Find the Docker Agent installation command and copy-paste it into the terminal on the right of this panel. (A rough sketch of what that command looks like is shown below.)
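For reference, the installation command has roughly the following shape; this is a sketch with a placeholder API key, and the command generated in your Datadog application is the authoritative one:

# Run the containerized Agent with log collection enabled (replace the placeholder API key)
docker run -d --name datadog-agent \
  -e DD_API_KEY=<YOUR_DATADOG_API_KEY> \
  -e DD_LOGS_ENABLED=true \
  -e DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL=true \
  -v /var/run/docker.sock:/var/run/docker.sock:ro \
  -v /proc/:/host/proc/:ro \
  -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
  datadog/agent:7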
This installs the Agent with log collection enabled. Check that the Agent is running with docker ps.
Log generation
Once the Agent is up and running, let's generate some fake Apache logs with:
docker run --name flog -d -it --rm mingrammer/flog -f apache_combined -l -n 100000 -d 0.2
This uses the mingrammer/flog image to generate fake logs in the apache_combined format.
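If you want to peek at the raw lines locally before they show up in Datadog, you can tail the container output with the standard Docker CLI:

# Follow the fake Apache access logs emitted by the flog container
docker logs -f flog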
Log Management 101
Checking logs
Now go to the Log Explorer page of your Datadog account to see the logs flowing in:
As you can see, logs are received but not parsed. To go beyond simple log collection, we need to parse those logs to extract information and enrich them with categories.
Pattern view
While you are in the Log Explorer, switch to the Patterns view in the upper left corner of the page to see the global list of patterns from your logs:
As you may notice, the date and User-Agent information interfere with the log clustering detection feature; parsing will also allow us to fix this.
Beyond collection: Processing
To get the most out of your logs, you need to process them in order to enforce:
- A structure
- A standard
- A norm
In order to achieve this, you are going to use the Log configuration page of Datadog. The process is always the same:
- You create a pipeline to only process a defined subset of your logs in a given way.
- You add processors to your pipeline to actually define your processing strategy in a sequential way.
- You standardize your newly extracted attributes thanks to the Standard Attributes section.
[EXERCISE] Parsing
Time for your first exercise. The goal here is to parse the collected Apache logs to extract all useful information and assign it to standard attributes.
Pro tip: To help you with this exercise, check the examples from the Datadog Log Parsing documentation.
Exercise:
- Go to the Log configuration page.
- Create a first pipeline with the filter service:flog.
- Create a Grok processor in this pipeline.
- Add a log from flog in the log samples section.
- Define your first parsing rule in order to extract the following information:
rule %{ip:network.client.ip} %{notSpace:http.ident:nullIf("-")} %{notSpace:http.auth:nullIf("-")} \[%{date("dd/MMM/yyyy:HH:mm:ss Z"):date}\] ".*
Note the .* at the end of the rule: . means any character and * means any number of times. Adding this at the end of your rule allows it to match only the beginning of your log.
The goal now is to extract all other information from your log with new %{MATCHER:ATTRIBUTE} groups:
Matcher | Attribute | Description |
---|---|---|
word | http.method | The HTTP method associated with the request. |
notSpace | http.url | The URL associated with the request. |
number | http.version | The HTTP version used. |
integer | http.status_code | The status code returned. |
integer | network.bytes_written | The number of bytes in the HTTP response. |
notSpace | http.referer | The referer of the request. |
data | http.useragent | The User-Agent associated with the request. |
Note: These attributes follow the Datadog standard attributes naming convention.
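For reference, a flog apache_combined line looks roughly like this (assembled from the sample values used in the answer; the values in your own logs will differ):

172.20.0.1 - cassin5575 [26/Jun/2019:09:22:54 +0000] "GET /simulate_sensors HTTP/1.1" 200 2345 "http://www.dynamicmodels.net/cross-platform/open-source/revolutionary" "Mozilla/5.0 (X11; Linux i686; rv:5.0) Gecko/2002-04-12 Firefox/37.0"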
[ANSWER] Parsing
If you reached this page without doing the exercise from the previous step, you are cheating ( ͡° ͜ʖ ͡°)
Your grok parser should look like this:
The rule used is the following; copy-paste it into your Grok parser to make sure you are correctly set up for the rest of the workshop:
rule %{ip:network.client.ip} %{notSpace:http.ident:nullIf("-")} %{notSpace:http.auth:nullIf("-")} \[%{date("dd/MMM/yyyy:HH:mm:ss Z"):date}\] "%{word:http.method} %{notSpace:http.url} HTTP\/%{number:http.version}" %{number:http.status_code} %{integer:network.bytes_written} "%{notSpace:http.referer}" "%{data:http.useragent}"
With the following explanation:
Text | Pattern |
---|---|
172.20.0.1 | %{ip:network.client.ip} |
cassin5575 | %{notSpace:http.auth:nullIf("-")} |
[26/Jun/2019:09:22:54 +0000] | \[%{date("dd/MMM/yyyy:HH:mm:ss Z"):date}\] |
GET | %{word:http.method} |
/simulate_sensors | %{notSpace:http.url} |
1.1 | %{number:http.version} |
200 | %{number:http.status_code} |
2345 | %{integer:network.bytes_written} |
http://www.dynamicmodels.net/cross-platform/open-source/revolutionary | %{notSpace:http.referer} |
Mozilla/5.0 (X11; Linux i686; rv:5.0) Gecko/2002-04-12 Firefox/37.0 | %{data:http.useragent} |
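Applied to that sample line, the extracted attributes look roughly like this (a sketch, not an exact Datadog screenshot; http.ident is dropped by nullIf("-") and timestamp handling is omitted):

{
  "network": {
    "client": { "ip": "172.20.0.1" },
    "bytes_written": 2345
  },
  "http": {
    "auth": "cassin5575",
    "method": "GET",
    "url": "/simulate_sensors",
    "version": 1.1,
    "status_code": 200,
    "referer": "http://www.dynamicmodels.net/cross-platform/open-source/revolutionary",
    "useragent": "Mozilla/5.0 (X11; Linux i686; rv:5.0) Gecko/2002-04-12 Firefox/37.0"
  }
}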
[EXERCISE] Enriching your logs - Severity
Now that the logs are parsed and each value can be manipulated, let's push our processing further by enriching the logs.
The goal of this exercise is to assign a severity based on the status code of the log. An access log by definition doesn't have any severity attached, but you can assign one depending on the value of the http.status_code attribute.
Exercise:
Create a category processor in your pipeline, and add four categories to it:
All matching logs | Category name |
---|---|
@http.status_code:[200 TO 299] | ok |
@http.status_code:[300 TO 399] | notice |
@http.status_code:[400 TO 499] | warning |
@http.status_code:[500 TO 599] | error |
The value should be assigned to a new attribute: http.status_code_category
Remap the newly created attribute as the status of the log with the Status Remapper.
[ANSWER] Enriching your logs - Severity
If you reached this page without doing the exercise from the previous step, you are cheating ( ͡° ͜ʖ ͡°)
Category processor
The configuration of the category processor should look like this:
Status remapper
The configuration of the status remapper should look like this:
[EXERCISE] Enriching your logs - URL and UserAgent
After assigning a severity to our logs, let's enrich them further by parsing the URL and the User-Agent to extract more information from those two attributes.
Exercise:
- Create a URL parser to extract all query parameters from the requested URL in http.url and pass them into http.url_details.
- Create a User-Agent parser to extract all User-Agent information from the http.useragent attribute and pass it into http.useragent_details (both outputs are sketched below).
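As a rough idea of what these two parsers add, here is a sketch using Datadog standard attribute names (the exact set of sub-attributes depends on the log, and the values below are illustrative):

{
  "http": {
    "url_details": {
      "path": "/simulate_sensors",
      "queryString": {}
    },
    "useragent_details": {
      "os": { "family": "Linux" },
      "browser": { "family": "Firefox" },
      "device": { "family": "Other" }
    }
  }
}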
[ANSWER] Enriching your logs - URL and UserAgent
If you reached this page without doing the exercise from the previous step, you are cheating ( ͡° ͜ʖ ͡°)
URL parser
The configuration of the URL parser should look like this:
User-Agent parser
The configuration of the User-Agent parser should look like this:
[EXERCISE] Enriching your logs - Message
After enriching the logs, we can now move to an advanced processing capability that lets you better leverage the pattern feature of the Log Explorer.
Since the pattern feature is based on a log status, service, and message, the goal here is to refactor this message in order to keep only useful information for clustering.
Exercise:
- Create a temporary message in the msg_tmp attribute thanks to the String Builder processor, using the following template (an example of the resulting message is shown below):
Request %{http.method} %{http.url_details.path} with response %{http.status_code}
- Remap the temporary msg_tmp attribute as the official log message with a Message Remapper.
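With the sample log used earlier, the rebuilt message would read something like:

Request GET /simulate_sensors with response 200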
[ANSWER] Enriching your logs - Message
If you reached this page without doing the exercise from the previous step, you are cheating ( ͡° ͜ʖ ͡°)
String builder
The configuration of the String builder processor should look like this:
Message remapper
The configuration of the message remapper processor should look like this:
Beyond collection: Processing
Pipeline overview
If you followed all previous exercises, your pipeline should look like this:
Be aware that processors within a pipeline are applied sequentially, so order matters.
Final result
Go back to your Log Explorer view; your logs should now be properly parsed and have the right status assigned to them:
Pattern view
Because the log message is now properly formatted and the status correctly assigned, the pattern feature is much more accurate:
Beyond collection: Manipulation
Now that logs are properly processed and implement logging best practices, you can leverage all of the other features of Datadog:
- Extended Log search over the log list thanks to Facets and Measures.
- Log analytics.
- Log monitors.
In order to be able to manipulate a log attribute, you need to create a Facet or a Measure out of it.
Use facets when you need:
- To filter your logs against specific value(s). For instance, create a facet on an environment tag to scope troubleshooting down to development, staging, or production environments.
- To get relative insights for values. For instance, create a facet on http.network.client.geoip.country.iso_code to see the top countries most impacted per number of 5XX errors on your NGINX web access logs, enriched with the Datadog GeoIP Processor.
- To count unique values. For instance, create a facet on user.email from your Kong logs to know how many users connect to your website every day.
Use measures when you need:
- To aggregate values from multiple logs. For instance, create a measure on the size of tiles served by the Varnish cache of a map server and keep track of the average daily throughput, or top-most referrers per sum of tile size requested.
- To range filter your logs. For instance, create a measure on the execution time of Ansible tasks, and see the list of servers having the most runs taking more than 10s.
- To sort logs against that value. For instance, create a measure on the amount of payments performed with your Python microservice. You can then search all the logs, starting with the one with the highest amount.
[EXERCISE] Facets creation and manipulation
Create a facet on each of the following attributes:
- http.url_details.path
- http.method
- http.status_code
Then:
- Use those facets to filter your logs (see the example query after this list).
- Create an analysis in the Log Analytics page that displays the relative share of request count per status code.
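Once the facets exist, you can combine them in Log Explorer search queries; for example, a query of this shape (illustrative values, adjust to what you see in your own logs):

service:flog @http.method:GET @http.status_code:[400 TO 499]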
[ANSWER] Facets creation and manipulation
If you reached this page without doing the exercise from the previous step, you are cheating ( ͡° ͜ʖ ͡°)
To create a facet, click on the attribute in any log and select Create facet:
Once a facet is defined, you can manipulate it in Log Analytics. Below is the relative share of request count per status code:
Beyond collection: Monitoring
In the grand scheme of things, visualization is great... but monitoring is better!
You can create a log monitor to alert you when a specified type of log exceeds a user-defined threshold over a given period of time.
[EXERCISE] Monitor creation
Let's create a log monitor that sends an alert when there are too many 5xx logs for any URL path:
- Create a query in the Log Analytics page that displays the top 100 URL paths with a 5xx status code.
- Export your log analytics query as a monitor and define 2 as the alert threshold.
- Create a monitor notification template that inlines http.url_details.path in the notification thanks to the following message:
ALERT more than 2 5xx status code have been detected for {{http.url_details.path}} over the last 5mins
[ANSWER] Monitor creation
If you reached this page without doing the exercise from the previous step, you are cheating ( ͡° ͜ʖ ͡°)
This is what your query should look like:
When creating a log monitor, the group that triggers the monitor is available in the monitor notification message:
Beyond collection: Management as in Log Management
We saw how to manipulate your logs and how to monitor them, but Log Management also involves management.
The whole idea of Datadog Logging Without Limits is that just because a log is sent to Datadog, ingested, and processed does not mean you necessarily want to index it for search, analytics, or monitoring use cases.
Ingesting and indexing are two different things.
When ingesting a log in Datadog you can:
- Index it to allow search, analytics, and monitors
AND/OR:
- Archive it thanks to Log Archives, and retrieve it later thanks to the Rehydrate from Archives feature.
AND/OR:
- Generate a metric out of it in order to only track a KPI.
[EXERCISE] Logging without Limits
Let's remove the most frequent status codes from the index while keeping track of the overall volume received as a metric:
- Remove all logs with a 2xx status code from your index thanks to an index filter.
- Generate a metric nb.logs.per_status.per_status_code from your logs counting the number of logs received for the flog service, tagged by http.status_code and status.
Bonus
- Enable the two telemetry metrics in the Generate metric section.
- Import the Log Estimated usage dashboard in your account.
[ANSWER] Logging without Limits
If you reached this page without doing the exercise from the previous step, you are cheating ( ͡° ͜ʖ ͡°)
Removing logs from an index
To remove all logs with a 2xx status code, add the following index filter:
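Assuming the same service:flog scoping used by your pipeline, the exclusion filter query would look something like:

service:flog @http.status_code:[200 TO 299]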
Generating a metric
In order to keep track of your KPIs the following metric should be generated:
Bonus
After enabling the metrics you should get a dashboard like this:
Bonus: The Source tag
We saw how to create pipelines to parse and enrich your logs so that you can manipulate and monitor them.
To reduce your time to value so that you can focus on what you do best (building amazing applications), Datadog provides out-of-the-box (OOTB) pipelines that are automatically imported when the right source tag is set.
The full list of OOTB pipelines and their corresponding source can be seen in the upper right corner of the Log configuration page under the Browse Pipeline Library button:
Exercise:
- Kill the previous container emitting logs:
docker kill flog
- Launch it again with the right source tag:
docker run -d --label com.datadoghq.ad.logs='[{"source": "apache", "service": "apache"}]' -it --rm mingrammer/flog -f apache_combined -l -n 100000 -d 0.2
Now go to your Log configuration page to see the Apache pipeline activated automatically:
In your Log Explorer you now also have Apache saved views that let you focus on specific log use cases:
Break
After the break we are going to discover how you could go beyond log collection with a real app example.
To do:
Kill all running containers to have a fresh start:
docker container kill $(docker ps -q)
Start the Storedog application with the following command:
app_start
After a couple of minutes, click on the storedog tab of the terminal to see the application running, or access this link:
https://[[HOST_SUBDOMAIN]]-3000-[[KATACODA_HOST]].environments.katacoda.com/
You should see this page:
Refresh the application page and click around to begin generating metrics, APM traces, and logs for your application.
Alternatively, you can click on generating_traffic to generate traffic constantly.
Real app: log collection
Since this application runs on Docker containers, the Datadog Agent runs as a container alongside your application containers to collect all monitoring data. The Datadog Agent container is configured via environment variables and volumes mounted from the underlying host:
  agent:
    image: "datadog/agent:7.21.0"
    environment:
      - DD_API_KEY
      - DD_APM_ENABLED=true
      - DD_LOGS_ENABLED=true
      - DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL=true
      - DD_PROCESS_AGENT_ENABLED=true
      - DD_DOCKER_LABELS_AS_TAGS={"my.custom.label.team":"team"}
      - DD_TAGS='env:ruby-shop'
    ports:
      - "8126:8126"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - /proc/:/host/proc/:ro
      - /sys/fs/cgroup/:/host/sys/fs/cgroup:ro
    labels:
      com.datadoghq.ad.logs: '[{"source": "datadog-agent", "service": "agent"}]'
By default, the Agent needs only your Datadog API key to start collecting metrics from your other containers, but with the extra configuration above, the Agent also collects your application's traces, logs, and process data:
Trace collection
To allow trace collection from other containers, port 8126 is open on the Agent container.
Next, the DD_APM_ENABLED environment variable is set to true. Although APM is enabled by default, setting this variable explicitly lets other people know you are using it.
Learn more in the Datadog APM documentation.
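For context, application containers reach the Agent's trace endpoint over the Docker network. A minimal sketch of how a service could be pointed at it, assuming its Datadog tracing library reads the standard DD_AGENT_HOST and DD_TRACE_AGENT_PORT environment variables (the discounts service is used here purely as an illustration):

  discounts:
    environment:
      # Send traces to the "agent" container on port 8126
      - DD_AGENT_HOST=agent
      - DD_TRACE_AGENT_PORT=8126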
Log collection
The following configuration lines in the docker-compose.yml file at the root of the workshop directory enable log collection for all running containers:
  agent:
    environment:
      # (...)
      - DD_LOGS_ENABLED=true
      - DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL=true
    volumes:
      # (...)
      - /opt/datadog-agent/run:/opt/datadog-agent/run:rw
Configuration | Type | Explanation |
---|---|---|
DD_LOGS_ENABLED=true | env variable | Enables log collection |
DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL=true | env variable | Enables log collection for all containers |
/opt/datadog-agent/run:/opt/datadog-agent/run:rw | volume | Used to store pointers on the current container logs |
Refer to the Datadog Agent log collection documentation to learn more.
Real app: Tagging
Collecting every type of data available is key to going beyond simply collecting your logs. Each type of data represents a new facet or a new approach to solving a given issue. To bind the data together, Datadog uses three tags:
- The hostname tag (or the containerId if working with containers)
- The source tag
- The service tag
Hostname tag
The hostname (or containerId) tags are collected automatically by the Datadog Agent. To overwrite them, edit the main datadog.yaml configuration file or use an environment variable.
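For example, to override the reported hostname, either of the following works (a sketch; use one approach, not both):

# In datadog.yaml
hostname: my-custom-hostname

# Or as an environment variable when starting the Agent container
docker run -d --name datadog-agent -e DD_HOSTNAME=my-custom-hostname ... datadog/agent:7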
Source tag
The source tag enables log integration pipelines and enforces the Datadog naming conventions.
Datadog has a wide range of log integrations. To enable the Log integration pipelines in Datadog, pass the source name as a value for the source attribute with a docker label. See the full list of supported log sources.
Learn more about Datadog attribute naming convention.
Service tag
The service tag binds metrics, traces, and logs.
It works automatically with known images, but if you have a custom application, the service name may not match your services in APM, so you need to set it with a custom label:
labels:
  com.datadoghq.ad.logs: '[{"source": "python", "service": "discounts-service"}]'
In the Datadog container ecosystem, labels are key to getting the most out of Datadog Autodiscovery. If you look at the docker-compose.yml file at the root of the workshop directory, you can see that each container has associated labels:
version: '3'
services:
  agent:
    # (...)
    labels:
      com.datadoghq.ad.logs: '[{"source": "datadog-agent", "service": "agent"}]'
  discounts:
    # (...)
    labels:
      com.datadoghq.ad.logs: '[{"source": "python", "service": "discounts-service"}]'
      my.custom.label.team: "discount"
  frontend:
    # (...)
    labels:
      com.datadoghq.ad.logs: '[{"source": "ruby", "service": "store-frontend"}]'
      my.custom.label.team: "frontend"
  advertisements:
    # (...)
    labels:
      com.datadoghq.ad.logs: '[{"source": "python", "service": "ads-service"}]'
      my.custom.label.team: "advertisements"
  db:
    # (...)
    labels:
      com.datadoghq.ad.logs: '[{"source": "postgres", "service": "postgres"}]'
Real app: Integrations setup
Now that the data is collected, enable the corresponding integration in Datadog to pre-configure your platform:
- Go to Integrations -> Integrations.
- Enable the Datadog - Docker integration.
- Enable the Datadog - SSH integration.
- Refresh the integration page.
Out of the box, the Docker integration installs a dashboard:
Note that on any given dashboard, you can click on a displayed metric to switch to the corresponding logs:
Real app: Containers overview
Our application runs on Docker containers, and Datadog provides a container-centric page:
There you can see all running containers with their associated:
- Processes
- Metrics
- Logs
When zooming in on a given container, you can see the processes running inside it along with its key metrics:
and the corresponding logs being emitted in real-time by this container:
Real app: Trace and log correlation
When collecting traces and logs, once the log-trace connection is enabled, you can see your logs attached to your traces:
And your logs have a trace attached:
Real app: Problems
Let's finish by investigating a real problem that could happen in your environment. Execute the following command:
creating_issue
Then try to understand what happened using the APM service overview, the Log Explorer, and the Docker dashboard.