MDC Logging

MDC

Mapped Diagnostic Context is a mechanism by which we can insert contextual data into the logging system at runtime and use it to provide additional information within the filename of the logfile, or within the log statements themselves.

Use cases for this center around multi-threaded applications where either:

  • Each thread is responsible for the full lifecycle of a message and we wish to tie all activity relating to that message into a single file whose filename is derived from the message keys.
  • Multiple threads are responsible for different aspects of a message’s lifecycle and we wish to feed every log statement associated with a particular message into some sort of log aggregator (like Logstash) and then search this activity later using Elasticsearch.

For example, suppose we wish to investigate (via log inspection) the processing of the following JSON message:

{
  "messageId": "UUID-1242-682830180-2818230-5383389",
  "userId": 12345,
  "processingDate": 20151223,
  "firstName": "Cyril",
  "lastName": "Figgis",
  "job": "accountant"
}

We might wish to generate a single log file whose filename contains the userId, processingDate, and messageId, so that we can easily search for it via the unix ls command. For example: messageLog-12345-20151223-1242-682830180-2818230-5383389.log

Assuming the message keys are fixed for the duration of the message processing, this log file could be found using ls *12345* to find all logs relating to Cyril, or ls *12345*20151223* to find all logs for Cyril which occurred on the 23rd of December.

The problem with this one-file-per-message approach is that on certain filesystems, particularly small ones (Linode’s disks, for example), processing millions of messages leaves you with millions of log files which are difficult to manage, and you might even run out of inodes.

A much more sustainable approach is to have one logfile per thread and instead stamp the message key information on the log statements within the file, for example:

Thread-01.log

20151223:23:00:00:500 INFO 1242-682830180-2818230-5383389 12345 Processing message
20151223:23:00:03:500 INFO 1242-682830180-2818230-5383389 12345 Calling rest service on http://service:8080/query?x=1
20151223:23:00:07:500 INFO 1242-682830180-2818230-5383389 12345 Transaction complete

A log file like this may interleave the activity of many different messages; however, our Logstash and Elasticsearch layer can still aggregate the data for any single message because every line has been stamped with the message key data.
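
How the keys end up on each line is a job for the logging implementation’s output pattern rather than for application code. As a minimal logback sketch (assuming an MDC key named descriptor, matching the examples in this article, and an illustrative file path), the %X conversion word prints the current thread’s MDC value on every line:

<appender name="FILE" class="ch.qos.logback.core.FileAppender">
  <file>/var/log/myapp/messages.log</file>
  <encoder>
    <!-- %X{descriptor} expands to whatever the current thread last stored under the "descriptor" MDC key -->
    <pattern>%d{yyyyMMdd:HH:mm:ss:SSS} %-5level %X{descriptor} %msg%n</pattern>
  </encoder>
</appender>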

This is how we actually stamp the message key data:

MDC.put("descriptor", contextualString);

For example:

public void processMessage(Message m) {
  MDC.put("descriptor", m.toString());
  try {
    LOGGER.info("Begin message processing");
    //processing logic goes here and any log statements like this next one will be stamped with contextual data:
    LOGGER.info("End message processing");
  } finally {
    //clear the context so the next message handled by this thread does not inherit it
    MDC.remove("descriptor");
  }
}

Adding this MDC setup and cleanup at the beginning and end of every method is tedious, especially if the context can simply be provided by the message’s toString() method. To avoid this we can create a custom annotation, processed via AspectJ or similar, whereby we annotate the method and, if ambiguous, the payload parameter whose toString() method should provide the contextual data.

If using toString() is not appropriate, ensure your data objects provide a getContext() method designed specifically for this purpose.

This allows us to focus on the business logic rather than peppering our code with boilerplate, and such a mechanism would be highly re-usable too.

Here’s a better example:

//example usage
@Loggable
public void processMessage(@Payload Message m) {
  //core business logic here
}
//payload annotation
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PARAMETER)
public @interface Payload {
}
//loggable annotation
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
public @interface Loggable {
}
@Aspect
public class LoggableAspect {
	private static final Logger LOGGER = LoggerFactory.getLogger(LoggableAspect.class);
	@Around("@annotation(uk.co.solong.loggable.Loggable)")
	public Object around(ProceedingJoinPoint point) throws Throwable {
		MethodSignature signature = (MethodSignature) point.getSignature();
		Annotation[][] parameterAnnotations = signature.getMethod().getParameterAnnotations();
		Object[] args = point.getArgs();
		//find the arg annotated with @Payload and use its toString() as the contextual data
		for (int i = 0; i < args.length; i++) {
			for (Annotation annotation : parameterAnnotations[i]) {
				if (annotation instanceof Payload) {
					MDC.put("descriptor", args[i].toString());
				}
			}
		}
		try {
			LOGGER.info("======Begin message processing======");
			Object result = point.proceed();
			LOGGER.info("======End message processing======");
			return result;
		} finally {
			MDC.remove("descriptor");
		}
	}
}

Packaging Applications via JDeb

Suppose you have an application written in Java which you want to deploy to an Ubuntu-based server via puppet. Puppet uses native packaging in order to install packages in an idempotent manner; for Ubuntu this means your application must be available as a Debian package.

This article explains how to package a service as part of your Maven build in a cross-platform manner using JDeb, with a minimal set of files. The result is a .deb file containing your service which can be deployed to your server, installed, configured, and run as a service.

Let's start off by adding the jdeb plugin to the POM.


<plugin>
  <artifactId>jdeb</artifactId>
  <groupId>org.vafer</groupId>
  <version>1.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>jdeb</goal>
      </goals>
      <configuration>
        <deb>${project.build.directory}/${project.artifactId}.deb</deb>
        <verbose>true</verbose>
        <snapshotExpand>true</snapshotExpand>
        <!-- expand "SNAPSHOT" to what is in the "USER" env variable -->
        <snapshotEnv>USER</snapshotEnv>
        <controlDir>${basedir}/src/deb/control</controlDir>
        <dataSet>
          <data>
            <src>${project.build.directory}/appassembler/bin</src>
            <type>directory</type>
            <mapper>
              <type>perm</type>
              <prefix>/opt/${company.name}/${project.artifactId}/${project.version}/bin</prefix>
              <user>root</user>
              <group>root</group>
              <filemode>755</filemode>
            </mapper>
          </data>
          <data>
            <src>${project.build.directory}/appassembler</src>
            <excludes>bin/**</excludes>
            <type>directory</type>
            <mapper>
              <type>perm</type>
              <prefix>/opt/${company.name}/${project.artifactId}/${project.version}</prefix>
              <user>root</user>
              <group>root</group>
              <filemode>644</filemode>
            </mapper>
          </data>
          <data>
            <type>template</type>
            <paths>
              <path>/opt/${company.name}/${project.artifactId}/${project.version}/conf</path>
              <path>etc/${project.artifactId}</path>
              <path>var/lib/${project.artifactId}</path>
              <path>var/log/${project.artifactId}</path>
              <path>var/run/${project.artifactId}</path>
            </paths>
            <mapper>
              <type>perm</type>
              <user>loader</user>
              <group>loader</group>
            </mapper>
          </data>
          <data>
            <type>link</type>
            <linkName>/opt/${company.name}/${project.artifactId}/current</linkName>
            <linkTarget>/opt/${company.name}/${project.artifactId}/${project.version}</linkTarget>
            <symlink>true</symlink>
          </data>
          <data>
            <src>src/deb/upstart/${project.artifactId}.conf</src>
            <type>file</type>
            <mapper>
              <type>perm</type>
              <prefix>/etc/init</prefix>
              <filemode>644</filemode>
              <user>root</user>
              <group>root</group>
            </mapper>
          </data>
        </dataSet>
      </configuration>
    </execution>
  </executions>
</plugin>
    

The above config can be broken down into the following components:

  • Upstart Script config
  • ‘current version’ symlink creation
  • Config directory creation
  • Start-up scripts directory
  • Control directory declaration
  • Jar capture
  • Install directory creation

Control directory

We must create a control directory in our project to hold the set of files required to build a debian package.

/src/deb/control

Then create a file called control within the above control directory with the following contents, substituting only the maintainer email address (leave the name, version, and description as placeholders):

Package: [[name]]
Version: [[version]]
Section: misc
Priority: optional
Architecture: all
Depends: 
Description: [[description]]
Maintainer: name@yourcompany.com

Create the following files with no content in the control directory:

  • src/deb/control/postinst
  • src/deb/control/postrm
  • src/deb/control/preinst
  • src/deb/control/prerm

The above files can be used as part of the Debian package install lifecycle (post-install, pre-install, etc.). We don’t need to do any configuration, so we can leave them without content for now.
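
They are ordinary shell scripts, though, so if you ever do need them, a hypothetical postinst (purely for illustration, not required for this setup) could for example create the loader user that the template mapper above assigns ownership to:

#!/bin/sh
set -e
# illustrative only: create the service user referenced by the dataSet mappers
if ! getent passwd loader > /dev/null; then
    adduser --system --group --no-create-home loader
fi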

Other files

Create another blank file:

/src/deb/init.d/myservice

Upstart directory and script

Create a script named after your project (e.g. myproject.conf) in the /src/deb/upstart directory, replacing myproject with the name of your project and company with the name of your company.

/src/deb/upstart/myproject.conf

# vim: set ft=upstart ts=4 et:
description "myproject"

start on runlevel [2345]
stop on runlevel [!2345]

limit nofile 64000 64000

kill timeout 300 # wait 300s between SIGTERM and SIGKILL.

pre-start script
    mkdir -p /var/lib/myproject/
    mkdir -p /var/log/myproject/
end script

script
    ENABLE_MYPROJECT="yes"

    if [ "x$ENABLE_MYPROJECT" = "xyes" ]; then
        exec start-stop-daemon --start --quiet --chuid root  \
            --exec /opt/company/myproject/current/bin/start -- 
    fi
end script

The above script will allow your service to run as a daemon on startup. It also makes sure the data and log directories are created.

When you build the project using mvn install, it will compile the jar and create a Debian artifact which you can deploy to your server.
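
For example (assuming the artifactId is myproject, so that the configuration above produces target/myproject.deb):

mvn clean install
sudo dpkg -i target/myproject.deb
sudo service myproject start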

The package will be installed at /opt/company/project/version

In addition to the core directory there will be a symlink to the ‘current version’ /opt/company/project/current

Within this installation directory there will be a configuration directory and a binaries directory. You can then use puppet or some other provisioner to create the configuration property files required in the config directory.
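
As a sketch of that last step (the file name and module are hypothetical, and the loader user comes from the jdeb mappers above):

# provision a properties file into the package's conf directory
file { '/opt/company/myproject/current/conf/application.properties':
  ensure  => file,
  owner   => 'loader',
  group   => 'loader',
  mode    => '0640',
  content => template('myproject/application.properties.erb'),
}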

Lastly you’ll want to create a set of run scripts which are compatible with the upstart script fork model.

Let's examine the following build plugin:

<plugin>
    <groupId>org.codehaus.mojo</groupId>
    <artifactId>appassembler-maven-plugin</artifactId>
    <version>1.8.1</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>assemble</goal>
            </goals>
        </execution>
    </executions>
    <configuration>
        <includeConfigurationDirectoryInClasspath>true</includeConfigurationDirectoryInClasspath>
        <configurationDirectory>conf</configurationDirectory>
        <programs>
            <program>
                <mainClass>uk.co.solong.application.main.spring.java.AutoAnnotationWebApplication</mainClass>
                <id>start</id>
                <jvmSettings>
                    <initialMemorySize>20m</initialMemorySize>
                    <maxMemorySize>256m</maxMemorySize>
                    <maxStackSize>128m</maxStackSize>
                    <systemProperties>
                        <systemProperty>logback.configurationFile=logback.xml</systemProperty>
                        <systemProperty>APP_ENV=prod</systemProperty>
                    </systemProperties>
                </jvmSettings>
            </program>
        </programs>
    </configuration>
</plugin>

In the above configuration we provide the AutoAnnotationWebApplication class as the main class, along with any necessary system properties.

Be sure to include the configuration directory in the class path otherwise properties files will not be visible to the service.

For a concrete example, take a look at the pom used for list.tf


Managing Custom Domains for Github Pages via Linode

Github Pages is a service provided by Github which can be used to host static blogs. In this article I walk through the basic setup of Github Pages and how to correctly configure a custom domain via the Linode DNS manager.

Goals

  • Enable comments (via disqus)
  • Receive mail (managed by google)
  • Access the github site via http://www.domain.com
  • Access the github site via http://domain.com (will redirect to www.domain.com)
  • Configure the site optimally so that Github does not throw DNS-related warnings.

Basic Setup

Let's start by creating a blog in Github. The fastest way to do this is to clone the project at «Insert template area here». I created this project which contains everything we need to get started, but it uses placeholders instead of dummy content (so you won’t have to remove a bunch of fake articles).

CNAME File

In the root directory of the project you can find a blank file called CNAME. Assuming your site is called www.domain.com, add the following as the first and only line in the file:

www.domain.com

Adding this file ensures that Github knows to serve this repository for that domain.

A Records

We then need to set up the A records in Linode to ensure that domain.com resolves to Github's primary and secondary servers. To manage A records, go to the DNS Manager, edit the domain, and ensure that the A/AAAA records table looks exactly like it does below (2 records only, both as shown).

A/AAAA Records
Hostname    IP Address         TTL
            192.30.252.154     Default
            192.30.252.153     Default

CNAME

To allow www.domain.com to work, we must add the following CNAME record, which should point to the repository name (yourreponamehere.github.io).

CNAME Records
Hostname    Aliases to                      TTL
www         yourreponamehere.github.io      Default
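
To check that both records have propagated (assuming your domain is domain.com), query them directly; the first command should return the two Github IP addresses above and the second should return yourreponamehere.github.io:

dig +short domain.com A
dig +short www.domain.com CNAME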

Java Logging Quickstart

On a typical application’s dependency graph we could find as many as 4 libraries relating to logging. Why do we need so many? What does each of these libraries actually do? slf4j? logback? core/classic? log4j2? This article attempts to answer some of these questions.

Let's start with a real example POM from an application I own:

<dependency>
	<groupId>org.slf4j</groupId>
	<artifactId>slf4j-api</artifactId>
	<version>${slf4j.version}</version>
</dependency>
<dependency>
	<groupId>ch.qos.logback</groupId>
	<artifactId>logback-classic</artifactId>
	<version>${logback.version}</version>
</dependency>
<dependency>
	<groupId>ch.qos.logback</groupId>
	<artifactId>logback-core</artifactId>
	<version>${logback.version}</version>
</dependency>
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>jcl-over-slf4j</artifactId>
    <version>1.7.7</version>
</dependency>

We can categorise each of these logging dependencies into 3 types:

  • Facades
  • Implementations
  • Adapters

The nature of your project (Application or Library) and how poorly libraries outside your control have been written will largely govern which of the above categories you’ll need to include in your dependency chain. As a rule:

  • If you’re writing a library that has no dependencies, you will only ever need to include Facades.
  • If you’re writing an application, you will need to include both Implementations and Facades.
  • If you’re writing an application that depends on libraries where the authors have failed to give proper thought to their logging strategy, you will most likely need to include adapters to deal with the fallout arising from this.

For the purposes of this article, an application is a stand-alone block of code that either has or uses a main method, and a library is just a bunch of classes that provide some functionality in the context of an application.

Let's talk about each of the dependencies from the POM in turn.

slf4j-api

SLF4J (Simple Logging Facade for Java) is, as the name suggests, a logging facade - i.e. a layer of abstraction which allows us to express the concept of “logging” without deciding whether we’re logging to disk or STDOUT, which classes we want to filter, or what level we want to log at. You can think of this library as being similar to an abstract class that somebody else later tunes to their needs. By including this library on the classpath, we allow our libraries and applications to perform logging at the ‘conceptual’ level. Note that this is just a facade: without an implementation backing it, nothing will happen and no messages will be logged. However, a facade does at least allow us to start writing code, so here’s what a typical usage pattern looks like:

package uk.co.solong.example.logging;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Example {

    private static final Logger logger = LoggerFactory.getLogger(Example.class);

    public void doNothing() {
        logger.info("Doing nothing");
    }

}

Note the two imported classes, Logger and LoggerFactory: we used these to create a logger (passing in the name of the class) and said that the message “Doing nothing” should be logged at the info level. But we never said whether this should be written to disk; that is for the implementation to deal with. Indeed there are many implementations to choose from, each with different performance characteristics, and by binding our library to a facade rather than an implementation we give ourselves the freedom not only to choose a different implementation, but to switch without having to recompile our code.

logback-classic and logback-core

Logback classic and logback core are two artifacts belonging to the logging implementation known as logback. These libraries are responsible for actually writing stuff to disk; they handle the configuration of logging from a threading perspective, provide filtering, and cover the other “concrete implementation” concerns. If logging performance is of concern to you, then your implementation is something you’ll want to think carefully about. log4j and log4j 2 are two alternatives to the logback implementation, and you can pretty much swap them out on a 1:1 basis (because we were smart enough to use a facade). If you are writing an application you will want to pick one; if you are writing a library, you should not depend on an implementation at all and should leave that choice to the application that consumes your library.
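
As a rough sketch of the implementation side (an illustrative logback.xml, not the configuration of the application above), dropping a file like this on the classpath is what turns the facade calls into actual output:

<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <!-- implementation-level concerns live here: levels, filtering, appenders -->
  <logger name="org.apache.http" level="WARN"/>
  <root level="INFO">
    <appender-ref ref="STDOUT"/>
  </root>
</configuration>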

jcl-over-slf4j

jcl-over-slf4j is an adapter designed to route calls made to Jakarta Commons Logging (JCL) through the slf4j API. Why would you need to include this? Some of the libraries you depend on might be really old, or their developers might not have thought about logging properly and just slapped any old logging jar on. If you don’t include this adapter and there are calls to JCL that go unrouted, you will often see symptoms such as log entries appearing on STDOUT, or log entries ignoring your log-level suppression rules.
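
The usual pattern is to exclude commons-logging from the offending dependency so that jcl-over-slf4j (which provides the same classes) takes its place; the coordinates below are a hypothetical example of such a library:

<dependency>
    <groupId>org.example</groupId>
    <artifactId>legacy-library</artifactId>
    <version>1.0</version>
    <exclusions>
        <!-- keep the real JCL off the classpath; jcl-over-slf4j supplies the org.apache.commons.logging classes -->
        <exclusion>
            <groupId>commons-logging</groupId>
            <artifactId>commons-logging</artifactId>
        </exclusion>
    </exclusions>
</dependency>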


Github as a PPA alternative

When using puppet, software packages should be native to the OS you are provisioning. For example, you should use Debian (.deb) packages on Ubuntu rather than zip files.

You can publish open source packages to public Debian repositories. Closed projects present a challenge: either pay for a binary repo or maintain your own PPA repository. Both Bintray and JFrog are expensive, ranging between $45 and $300 per month. The effort involved in maintaining a private PPA is also quite high.

Github allows you to attach binary artifacts to your releases via their API. Since I am already paying $5/mo for a private Github repository to manage my source code, I thought I would get my money's worth. To that end, I have written a custom puppet module capable of pulling artifacts from tagged releases.

The module requires the following info:

  • github account name
  • project name
  • project version
  • github api key

The source code for the module is available here: gitartifact.

Usage:

Just import the module like any other puppet module.

puppet module install danielburrell-gitartifact

Place this in your puppet manifest:

gitartifact { "mygitartifact":
  account     => "danielburrell",
  repository  => "repo",
  releasetag  => "0.0.2",
  pattern     => "partialmatch.deb",
  oauth       => "YOUR_OAUTH_TOKEN_HERE",
  destination => '/home/dan/gitartifact_0.0.1-SNAPSHOT.deb',
}

You can chain package dependencies together with the gitartifact element.
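
For example (a sketch, assuming the dpkg provider and the resource names from the snippet above):

# install the downloaded .deb once gitartifact has fetched it
package { 'myproject':
  ensure   => installed,
  provider => dpkg,
  source   => '/home/dan/gitartifact_0.0.1-SNAPSHOT.deb',
  require  => Gitartifact['mygitartifact'],
}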
