Introduction
This post describes how I set up central logging for the distributed cloud application that I am developing, and why I did so. It details the software that I used, how I configured it, and the reasoning behind each choice.
I wanted to add central logging to my application and associated systems because:
- Accessing individual logs on distinct systems and in distinct files on those systems had too much overhead
- I wanted my logs, as far as possible, to survive a catastrophic failure of the application and systems (of course there is always a possibility that the logging system itself fails)
- I wanted to avoid touching my production systems too often
- As I add more staff for development and operations, I want to avoid complicating my firewall rules by having to open up access to the individual systems from each admin’s system
- I wanted to log “everything” to see if I could debug problems by simply reviewing the logs rather than having to repeat problematic transactions and trace through my code. Repeating problematic transactions and tracing through code, to debug problems, is close to impossible in my production systems anyway, as this is the back-end for a consumer mobile app with a potentially huge user base. Further, I cannot rely on my users to report problems, or when they do, to report the correct set of steps to reproduce the problem.
- So far the log everything strategy has worked very well. I’ve solved numerous problems by simply reviewing the logs and I have not had to trace through my code even once.
- With the availability of cheap high performance systems today and with the distributed nature of the workloads, I have not encountered significant slowdowns in end user response times so far.
- I wanted my central logging system to automatically send me an email when an error occurred rather than for me to discover the problem some other way and then go hunting for the logs
- Down the road I want to process the logs more intelligently, and codify my common triage workflows, so that the system can automatically determine root causes and then alert me to the issue rather than its symptoms
- E.g., in a simple example, if there is a problem with my database server, then my MySql logs will have errors in them and my Java application may encounter many SQL Exceptions that are logged as well. In this case, instead of being inundated by alerts for every Java SQL exception, I'd rather just get an alert with the error log entry from MySql, detailing the Java SQL exceptions as likely symptoms.
- Logs are also a cheap way for me to track other metrics such as which features are used the most. I can get this information by processing the logs instead of writing more code to specifically track feature usage and increment counters and metrics etc. Processing the logs also gives me more freedom to retroactively track metrics that I may need later, instead of having to update my application and getting those metrics only from that point on. Data like which features are used the most is great for focusing Marketing, Product Management, and Engineering resources.
- A central logging system also allows me to correlate logs from multiple systems by, say, timestamp and username (assuming I codify usernames into the log entries of complex transactions that span multiple systems). This way I can see a full end-to-end trace of what happened on behalf of a user, tell where in the chain of function calls the problem occurred, and see how transaction arguments morphed as the data moved across systems (see the query sketch after this list).
- I also suspect that comparing the logs output by my test systems and my production systems can quickly tell me whether certain code paths are being tested enough. Of course there is COTS software to report code coverage, but I could potentially avoid running yet another application if I can just use my logs.
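As an example of the kind of correlation query I have in mind, the sketch below pulls every system's entries for one (hypothetical) username inside a small time window, ordered so that the end-to-end trace reads top to bottom. It assumes the logs table and columns that syslog-ng creates later in this post, and that usernames are embedded in the log messages:
SELECT log_date_time, sender, severity, class_name, message
FROM logs
WHERE message LIKE '%username=jdoe%'
  AND log_date_time BETWEEN '2015-03-05 10:00:00' AND '2015-03-05 10:05:00'
ORDER BY log_date_time, sender;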
Synchronizing time for Log Analysis
- In order to debug problems in multi-access / multi-threaded systems like web and mobile applications, it's very important that your systems have a consistent view of time. When you are debugging problems across multiple systems, e.g., even simply between a web and a database server, you need to be able to string log entries together to identify the code execution path from transaction start to failure.
- Synchronizing time is very easy with NTP, so there is no reason to avoid this step
Setup Steps
Machines and OSes used
My systems are running in AWS EC2 and are currently t1.micro or t2.micro instances running Ubuntu 14.04 LTS.
Synchronizing time across my systems using NTP
- I’m currently of the opinion that for my purposes of synchronizing time I can just set the timezone of all my instances to be US Pacific Time irrespective of where the instances really are. This way I don’t have to do any timezone processing of my log entry timestamps when I am investigating an issue and all externally reported times just have to be converted to a single timezone for debugging purposes.
- Setting the common timezone of all my instances
$ sudo dpkg-reconfigure tzdata
- Setting up NTP
- Installation
$ sudo apt-get install ntp
- Configuration (optional)
- The default NTP configuration file has NTP pool servers already configured and it's not necessary to change them. Just in case, you can edit the file /etc/ntp.conf and add the lines:
server 0.us.pool.ntp.org
server 1.us.pool.ntp.org
server 2.us.pool.ntp.org
server 3.us.pool.ntp.org
- Reference: NTP configuration and the optimal pool servers to use are documented here
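If you want to confirm that an instance is actually syncing against the pool servers, listing the NTP peers is a quick check:
$ ntpq -p
The peer marked with an asterisk in the first column is the one the instance is currently synchronized to.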
Creating a common logs format across my application and systems
I wanted to have a common logs format across all my application code and systems to make debugging easier. This way,
- I can count on all log entries having all the data fields / information that I deem necessary for debugging and tracking
- The code that I write to process my logs is simpler
- Understanding / consuming the log entries is easier as they all follow a standard format
Using Log4J2 for application logging
- Early on I standardized on using Java for all my application code because I'm good at Java and it has such widespread support that I can code every piece of my application back-end in it, be it web services and REST APIs, or components that are straight-up Java server applications. There is also this amazing application logging package named Log4J2! I've been using Log4J2 for many years and absolutely love it.
- Adding Log4J2 to your Java application or web project and configuring it is very easy and described in detail here
- I'll just say that I used a RollingFileAppender which writes logs to a well-named file in a standard sub-directory under /var/log/. Having just one type of OS across my systems helps standardization of course. A sketch of such a configuration follows this list.
- Also I don't need to keep old log files around on the systems since all log entries are being sent to my central logging system. This way I can just delete the files once they get too big and start new ones.
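For illustration, a minimal log4j2.xml along these lines might look like the sketch below. The file names, rollover policy, and exact pattern are placeholders of my choosing; the point is a RollingFileAppender writing space-delimited fields (day, month, year, time, severity, class, source location, thread, and a double-quoted message) that the syslog-ng CSV parser described later can split apart:
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="warn">
  <Appenders>
    <!-- Rolls the file on size; old files are disposable since syslog-ng has already shipped the entries -->
    <RollingFile name="AppLog"
                 fileName="/var/log/myapp/myapp.log"
                 filePattern="/var/log/myapp/myapp-%i.log">
      <PatternLayout pattern='%d{dd MMM yyyy HH:mm:ss,SSS} %p %c %l [%t] "%m"%n'/>
      <Policies>
        <SizeBasedTriggeringPolicy size="100 MB"/>
      </Policies>
      <DefaultRolloverStrategy max="5"/>
    </RollingFile>
  </Appenders>
  <Loggers>
    <Root level="debug">
      <AppenderRef ref="AppLog"/>
    </Root>
  </Loggers>
</Configuration>
With a pattern like that, a single log entry looks something like: 05 Mar 2015 10:15:30,123 ERROR com.example.MyServlet com.example.MyServlet.doGet(MyServlet.java:42) [http-bio-8080-exec-1] "Order lookup failed orderId=1234" — one line per entry unless an exception stack trace follows, which is where the multi-line handling on the syslog-ng client comes in.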
Updating the default java.util.logging format used by Tomcat
- I primarily have five kinds of components – Servlets that run in Tomcat instances, MySql, Redis, syslog-ng, and straight-up Java server applications that are run periodically by cron jobs.
- I'm trying to minimize the number of diverse types of components so that standardization and maintenance are easier. The net effort required to develop, deploy, and maintain a component has to be minimal. So something that is harder to develop in-house but easier to maintain and standardize can win out over COTS software.
- So far I've standardized the logging format of my Java Servlets, Java server applications, and Tomcat itself.
  - My Java Servlets and Java server applications use Log4J2 and that's covered above.
  - Tomcat itself uses the java.util.logging package. Updating the format of the Tomcat logs was easy.
  - Tomcat has three types of log files by default: catalina.*, localhost.*, and localhost_access_log.*
  - The logs that I am interested in are in the localhost.* log files
    - These capture uncaught exceptions generated within my deployed applications
  - I had to do two things
    - Update the format
      - The format is set in Tomcat's logging.properties file, which on my systems is in /etc/tomcat7/
      - I added the format line:
java.util.logging.SimpleFormatter.format = %1$td %1$tb %1$tY %1$tH:%1$tM:%1$tS,%1$tL \"%1$tY-%1$tm-%1$td %1$tH:%1$tM:%1$tS,%1$tL\" %4$s %2$s [] \"%5$s%6$s\"%n
      - This matches the format that I use in my Log4J2 configuration
      - I then saved my changes and restarted Tomcat
      - More information on this is available here
    - Create and update a well-named hard link to the localhost.* log file
      - The format of the localhost.* log filenames is localhost.<date>.log and Tomcat creates a new file daily
      - I didn't want to mess with this, but syslog-ng OSE requires a constant filename to track and gather logs from
      - Hence, I created a well-named hard link to the latest version of this file and update this hard link to the latest version every hour via a cron job
        - I had to create a hard link because the modified date on a soft link would not change when the underlying file is changed, hence syslog-ng would not know that a change has occurred / new logs have been written to the log file
      - The cron job that runs every hour executes the command (shown as a crontab entry below):
sudo ln -f $(ls -t localhost.* | head -1) tomcat_localhost.log
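For completeness, the crontab entry might look something like this (a sketch; it assumes the log files live in /var/log/tomcat7 and that the job runs from root's crontab, so the sudo in the command above is not needed):
0 * * * * cd /var/log/tomcat7 && ln -f "$(ls -t localhost.* | head -1)" tomcat_localhost.log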
Deciding the log entries of interest that I want to be proactively notified about
- Early on in my application design and development, I decided that I would try to log everything, as explained in the introduction to this post. Hence, I decided that I would have only three severities of logs – DEBUG, WARN, and ERROR. Actually I started out with two – DEBUG and ERROR, with ERRORs obviously being the ones that I want to be notified about. However, I soon realized that there are certain business-related events that I also need notifications about, e.g., a user creating a new account or canceling their account. These are not errors, but my marketing and sales departments would be interested in this information, hence I added the WARN severity to cover such things.
- My log entries are very descriptive in that they include every dynamic part of a particular action. Typically these are the arguments for the action, like a username, datetime, entity ID, etc. Of course you have to be careful not to accidentally log things like passwords and authentication tokens in plain text, or at all actually.
- With this as my standard format for a log entry, the only thing I need to change is the severity in order to decide whether it's something to be notified about or not. And for notifications I have – DEBUG (don't notify), WARN (notify the business contact), ERROR (notify the engineering contact). A small sketch of how this looks in application code follows this list.
- These rules have worked for me so far, but maybe in the future they will need more complexity. The main thing is to be rigorous so that I don't compromise consistency and ease of maintenance.
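Purely for illustration (the class, method, and field names are made up), here is how the three severities map to notifications in application code:
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class AccountService {
    private static final Logger LOG = LogManager.getLogger(AccountService.class);

    public void createAccount(String username, String plan) {
        // DEBUG: recorded for later triage, nobody is notified
        LOG.debug("createAccount called username={} plan={}", username, plan);
        // ... create the account ...
        // WARN: a business event, the business contact gets an email
        LOG.warn("account_created username={} plan={}", username, plan);
    }

    public void chargeCard(String username, String orderId) {
        try {
            // ... call the payment gateway ...
        } catch (RuntimeException e) {
            // ERROR: a failure, the engineering contact gets an email
            LOG.error("payment_failed username=" + username + " orderId=" + orderId, e);
            throw e;
        }
    }
}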
Setting up a storage system for my logs
- Given that I want to log everything, I was obviously going to have a tremendous number of log entries. Hence, I needed a robust storage system.
- I decided to go with MySql because it's robust, scalable, and easy for me to manage. I looked at using NoSql systems but I just found that their management and maintenance require more time than I had. Maybe down the road I will use them.
- Another thing is that I don't expect to store all my logs in MySql forever. I figure that I only need a few days' worth of logs in MySql, in case I need to debug a problem. The log processing that I have done so far works off recent logs and these are easy to store in and get from MySql. To save older logs, I'm planning to implement a system whereby my logs are saved both in MySql and in a longer-term data storage system like the cloud storage sold by AWS, Microsoft Azure, or GCE, simultaneously. This way, I will not have to move logs from MySql to secondary storage. I'll just delete older logs from MySql while still having copies in cloud storage (a sketch of such a cleanup query follows this list).
- Setting up MySql on Ubuntu is very simple and I'm not going to cover it here
- Also, syslog-ng OSE just requires my database name, username, and password, and then it creates / alters / drops all the tables based on my syslog-ng configuration. This cuts my overhead tremendously.
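Once the cloud-storage copy is in place, pruning would be a single statement against the logs table that syslog-ng creates (a sketch, assuming the table and column names from the syslog-ng destination configuration shown later in this post and a seven-day retention window):
DELETE FROM logs WHERE log_date_time < NOW() - INTERVAL 7 DAY;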
Setting up a system to alert (email) me when log entries of interest are generated
- The first and simplest form of log processing that I wanted to do, as described above, was to get emails when an ERROR or WARN severity log entry is created.
- Since my logs are in MySql, it was very easy to write a small Java program that retrieves new ERROR or WARN log entries (since the last run of the program) and then emails me these log entries. A sketch of such a program follows this list.
- I run this Java program via a cron job every minute, and of course this Java program is also using Log4J2, so its own logs end up in syslog-ng OSE too. Now if there is a problem with this program, I won't get any emails, but at least the logs will be in MySql for me to review.
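To give a flavor of what that program looks like, here is a minimal sketch (the class name, last-run handling, recipients, and SMTP details are all illustrative, and it assumes the MySQL JDBC driver and JavaMail are on the classpath; my real program keeps track of its own last run and routes WARN and ERROR to different recipients). It reads new ERROR / WARN rows from the logs table over JDBC and mails them:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;
import java.util.Properties;
import javax.mail.Message;
import javax.mail.Session;
import javax.mail.Transport;
import javax.mail.internet.InternetAddress;
import javax.mail.internet.MimeMessage;

public class LogAlerter {
    public static void main(String[] args) throws Exception {
        // Illustrative: take the last run time as an argument ("yyyy-MM-dd HH:mm:ss")
        Timestamp lastRun = Timestamp.valueOf(args.length > 0 ? args[0] : "1970-01-01 00:00:00");

        // Collect new WARN/ERROR entries from the logs table that syslog-ng populates
        StringBuilder body = new StringBuilder();
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://<IP address or DNS name of the MySql server>/logs",
                     "<database username>", "<database password>");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT log_date_time, sender, severity, class_name, message FROM logs "
                             + "WHERE severity IN ('ERROR','WARN') AND log_date_time > ? "
                             + "ORDER BY log_date_time")) {
            ps.setTimestamp(1, lastRun);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    body.append(rs.getTimestamp("log_date_time")).append(' ')
                        .append(rs.getString("sender")).append(' ')
                        .append(rs.getString("severity")).append(' ')
                        .append(rs.getString("class_name")).append(' ')
                        .append(rs.getString("message")).append('\n');
                }
            }
        }
        if (body.length() == 0) {
            return; // nothing new to report
        }

        // Email the collected entries through a local SMTP relay
        Properties props = new Properties();
        props.put("mail.smtp.host", "localhost");
        Session session = Session.getInstance(props);
        MimeMessage msg = new MimeMessage(session);
        msg.setFrom(new InternetAddress("alerts@example.com"));
        msg.setRecipients(Message.RecipientType.TO, InternetAddress.parse("oncall@example.com"));
        msg.setSubject("New WARN/ERROR log entries");
        msg.setText(body.toString());
        Transport.send(msg);
    }
}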
Setting up the core logging system – syslog-ng OSE
- I looked at many different options for a central logging system (both paid and free) – there are a ridiculously large number of products for logging out there. I was of course partial to free products.
- I ended up testing two products
  - The so-called ELK stack consisting of ElasticSearch, LogStash, and Kibana, and
  - On the recommendation of my friend and fellow CTO, Sebastien Soudan, the syslog-ng Open Source Edition (OSE)
- The ELK stack proved to be too time-consuming for me at the time. Maybe now the documentation and error checking have improved. It has a lot of great features and both ElasticSearch and LogStash are very fast and scalable. Ultimately though, the configuration and debugging proved to be quite challenging. Having three distinct moving parts didn't help either. Also I ran into a bug that was fixed in the next GA release, and there was no timeline for this next release.
- So I then tested syslog-ng OSE. My experience was fantastic to say the least!
- syslog-ng is clearly a tried and tested, robust, and scalable logging system with many features to process logs from varied sources, write them to varied destinations (like my MySql database), and route and process log entries in complex ways, plus a simple but very important feature for me – handling multi-line log entries simply.
- There are two components to set up for syslog-ng OSE – server and client
  - The server aggregates logs over the network, processes these logs, and sends them on for storage to my MySql database.
  - I also run the alerting program (described above) on this system, to email me when ERROR and WARN log entries are generated
  - The clients run on every other system; they collect log entries from one or more log files on the client systems and send these log entries over the network to the syslog-ng server using the standard and robust syslog protocol
- Since I am using Log4J2, I could have used a syslog appender to have Log4J2 send the logs directly to my syslog-ng server. However, the Log4J2 logging function call is a blocking call. This means that if the network or my server is down or slow, then my application will also block. Hence, I prefer for my application to log to a file and for syslog-ng to pick it up from the file.
- The neat thing (and a thing that cuts my maintenance headache) is that the client and server are both the same program, just configured as either client or server
- I found that the overhead of syslog-ng is negligible in terms of CPU, memory, and disk space. The latency of logs being written to the original log files, from there being transported over the network to the syslog-ng server, and from there over the network once again to the database is negligible – less than a second end-to-end.
- Another really neat feature is that I can configure the client and server to communicate securely. syslog-ng OSE supports TLS, hence my clients and server authenticate each other, and all my log entries are encrypted as they are transported over the network.
, hence my clients and server authenticate each other, and all my log entries are encrypted as they are transported over the network. - So here is my setup
Documentation
The Ubuntu packages for syslog-ng are available from the madhouse-project repository linked in the References below.
Installing syslog-ng on a system
- Add the GPG key to APT
curl http://packages.madhouse-project.org/debian/archive-key.txt | sudo apt-key add -
- Add the repositories to /etc/apt/sources.list
deb http://packages.madhouse-project.org/ubuntu trusty syslog-ng-3.5 syslog-ng-incubator-3.5
deb-src http://packages.madhouse-project.org/ubuntu trusty syslog-ng-3.5 syslog-ng-incubator-3.5
- Update the APT cache
$ sudo apt-get update
- Install the syslog-ng package
$ sudo apt-get install syslog-ng-core
$ sudo apt-get install syslog-ng
Install libdbd-mysql on the syslog-ng server machine
syslog-ng requires this driver to communicate with the MySql database
$ sudo apt-get install libdbd-mysql
MySql Configuration
- Log in to MySql
mysql -u <username> -p
- Enter your database password at the prompt
- Create a database named logs
CREATE DATABASE logs;
- Assign users to the logs database
GRANT ALL ON logs.* TO <username>@'<IP Address of syslog-ng Server>' IDENTIFIED BY '<database password>';
- Open MySql port 3306 from <IP Address of syslog-ng Server> in your firewall guarding the MySql server
syslog-ng Server configuration
- The syslog-ng configuration is saved in the file /etc/syslog-ng/syslog-ng.conf
- Keep the default configuration and add the following
- Source
source s_network_ietf_syslog {
syslog(ip(0.0.0.0) port(5000) transport("tcp"));
};
- This sets up syslog-ng to listen for syslog messages on TCP port 5000. Your syslog-ng clients will communicate with the server on this port.
- Parser
parser log4j_logs_parser{
csv-parser(
columns(LOG4J.DAY_OF_MONTH, LOG4J.MONTH_NAME, LOG4J.YEAR, LOG4J.TIME, LOG4J.SEVERITY, LOG4J.CLASS_NAME, LOG4J.LOCATION_IN_SOURCE_CODE, LOG4J.THREAD, LOG4J.LOG_MESSAGE)
delimiters(" ")
flags(escape-backslash, greedy)
quote-pairs('""')
);
};
- We use syslog-ng's CSV parser to parse the log entries into fields for storage in the database
- This makes it easier to process the log entries down the line
- Earlier, in the Log4J2 configuration, I used spaces judiciously between fields to make sure this parser works correctly.
  - The line columns(…) represents the fields in my log messages. The names like LOG4J.DAY_OF_MONTH and LOG4J.CLASS_NAME are names made up by me so that I can refer to each of these parsed fields later in my configuration. The syslog-ng CSV parser will parse each log entry based on the delimiter specified and then match the parts of the parsed log entry to the column names that I have specified. If I have fewer column names than the parser actually finds, then I will lose the excess data from the log entry.
- I also surrounded certain fields, like the message, with double-quotes ("") to ensure that the CSV parser treats data within double-quotes as a single field even if it contains spaces (my delimiter). This is configured by the quote-pairs('""') setting.
- The escape-backslash flag is used to escape quotes within log entry fields
- And finally, the greedy flag is a catch-all. If by some misconfiguration or chance a log entry has more fields than my columns, then everything after the last field of the parsed log entry is added to the last column of the parser. This way I won't lose any log entry data.
- Destination
destination d_logs_mysql_server {
sql(
type(mysql)
host("<IP address or DNS name of the MySql server>")
username("<database username>")
password("<database password>")
database("logs")
table("logs")
columns("log_date_time datetime", "sender", "senders_send_date_time datetime", "servers_receive_date_time datetime", "severity", "class_name", "location_in_source_code", "thread", "message")
values("${LOG4J.ISO8601_DATETIME}", "${HOST}", "${S_UNIXTIME}", "${R_UNIXTIME}", "${LOG4J.SEVERITY}", "${LOG4J.CLASS_NAME}", "${LOG4J.LOCATION_IN_SOURCE_CODE}", "${LOG4J.THREAD}", "${LOG4J.LOG_MESSAGE}")
);
};
- The destination for the log entries is my MySql database
- You'll see that the database option refers to the database I created above under the MySql configuration
- The table option instructs syslog-ng to create a table, also named logs, if it does not exist in the database logs
- The columns option instructs syslog-ng to create columns having the specified names. syslog-ng does a good job of inferring data types for the columns.
- The values option refers to fields from my parser above or to default fields that every syslog message has. Remember that in my configuration syslog-ng is transmitting log entries from the clients to the server as syslog messages. Thus ${HOST} refers to the sender's hostname contained in every syslog message. Similarly, ${S_UNIXTIME} refers to the datetime at which the sender sent the syslog message.
- The neat thing is that if I change this configuration, then syslog-ng will automatically alter the database table as necessary.
- Log Statement
log {
source(s_network_ietf_syslog);
parser(log4j_logs_parser);
destination(d_logs_mysql_server);
};
- The log statement finally ties everything together
  - It instructs syslog-ng to receive log messages from the clients, parse the log messages, and then send them on to the MySql destination
syslog-ng Client configuration
- Once again, the syslog-ng configuration is saved in the file /etc/syslog-ng/syslog-ng.conf
- Keep the default configuration and add the following
- Source
source s_tmhapp_log_file{
file (
"/var/log/tomcat7/<my_log_file_name>"
flags(no-parse)
multi-line-mode(regexp)
multi-line-prefix("[0-9]{2} [a-zA-Z]{3} [0-9]{4}")
);
};
- All my clients are writing logs to files and these files are the source of log entries on my clients
- The no-parse flag instructs syslog-ng that the log entries inside the files are not in standard syslog format and thus should not be parsed as such. This is because the log entries follow my custom format that I previously set in Log4J2.
- syslog-ng will thus treat the entire log entry as the payload (or message) of a syslog packet. It sends one syslog packet from the client to the server for each log entry.
- The multi-line-prefix instructs syslog-ng to handle multi-line log entries. syslog-ng applies the specified regular expression to incoming text data from the log files to identify the start (this prefix) of a new log entry, and then treats all text data (including newlines) until the next instance of the prefix is found as part of the single log entry. Without this, syslog-ng treats a newline character as the end of a log entry.
- The multi-line-mode(regexp) setting instructs syslog-ng that the specified prefix is a regular expression and should be processed as such.
and should be processed as such. - Destination
destination d_network_ietf_syslog{
syslog(
"<IP address or DNS name of the server>"
transport("tcp")
port(5000)
localport(5000)
);
};
- The destination specifies where the server is and on which port
- I also chose to fix the outgoing port for communications to lock down my firewall rules further
- Log Statement
log {
source(s_tmhapp_log_file);
destination(d_network_ietf_syslog);
};
- Finally, as in the server, the log statement ties everything together
- Note that the log entries are parsed on the server
Debugging syslog-ng
- Start syslog-ng with the following command
$ sudo strace -o /tmp/trace.txt -s256 -ff -ttT syslog-ng -f /etc/syslog-ng/syslog-ng.conf -Fdv
- syslog-ng sends all its debug output to the console and I have found this more than enough to troubleshoot my setup
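Another handy check (if your syslog-ng build supports the flag) is a pure syntax validation of the configuration before restarting the service:
$ sudo syslog-ng --syntax-only -f /etc/syslog-ng/syslog-ng.conf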
Starting and Stopping syslog-ng for Production
- The standard commands apply
$ sudo service syslog-ng <start> | <stop> | <restart>
Optional – Mutual TLS Encryption between syslog-ng clients and the server
- We will use the same private key and ssl certificate for both the clients and the server
  - This is because, for syslog-ng, one must store both the client and server certificates on the machine without passwords
  - Therefore, if a malicious user breaks into these machines, they'll get the certificates anyway
  - So the main protections are to restrict access to the machines and to have the data transmission encrypted
- Create the private key without a passphrase
$ openssl genrsa -out <private key name>.pem 2048
- Create the ssl certificate
$ openssl req -new -x509 -key <private key name>.pem -out <ssl certificate name>.pem -days 7300
- Fill out the required certificate data as prompted
- Copy the ssl certificate to the clients, into each client's directory /etc/syslog-ng/ca.d/
- Copy the ssl certificate to the server's directory /etc/syslog-ng/ca.d/
- In the /etc/syslog-ng/ca.d/ directory of each client and the server
  - Issue the following command on the ssl certificate
$ openssl x509 -noout -hash -in <ssl certificate name>.pem
  - The result is a hash (e.g., 6d2962a8), a series of alphanumeric characters based on the Distinguished Name of the ssl certificate.
  - Issue the following command to create a symbolic link to the ssl certificate that uses the hash returned by the previous command and the .0 suffix.
$ sudo ln -s <ssl certificate name>.pem 6d2962a8.0
- Copy the ssl certificate to the clients, into each client's directory /etc/syslog-ng/cert.d/
- Copy the ssl certificate to the server's directory /etc/syslog-ng/cert.d/
- Copy the private key to the clients, into each client's directory /etc/syslog-ng/key.d/
- Copy the private key to the server's directory /etc/syslog-ng/key.d/
- On the syslog-ng clients, configure TLS in the syslog destination to the server
  - Change transport("tcp") to transport("tls") in the syslog destination
  - And, add
tls(
ca-dir("/etc/syslog-ng/ca.d")
key-file("/etc/syslog-ng/key.d/<private key name>.pem")
cert-file("/etc/syslog-ng/cert.d/<ssl certificate name>.pem")
)
  - to the syslog destination (the complete destination block with TLS is shown after this list)
- On the syslog-ng server, configure TLS in the syslog source of the server
  - Change transport("tcp") to transport("tls") in the syslog source
  - Add
tls(
ca-dir("/etc/syslog-ng/ca.d")
key-file("/etc/syslog-ng/key.d/<private key name>.pem")
cert-file("/etc/syslog-ng/cert.d/<ssl certificate name>.pem")
)
  - to the syslog source
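Putting the client-side pieces together, the destination block on a client ends up looking roughly like this (just the earlier destination with the tls() options folded in; substitute your own server address, key, and certificate names):
destination d_network_ietf_syslog{
    syslog(
        "<IP address or DNS name of the server>"
        transport("tls")
        port(5000)
        localport(5000)
        tls(
            ca-dir("/etc/syslog-ng/ca.d")
            key-file("/etc/syslog-ng/key.d/<private key name>.pem")
            cert-file("/etc/syslog-ng/cert.d/<ssl certificate name>.pem")
        )
    );
};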
References
- http://www.balabit.com/network-security/syslog-ng/opensource-logging-system
- https://help.ubuntu.com/community/UbuntuTime
- http://logging.apache.org/log4j/2.x/manual/configuration.html
- http://docs.oracle.com/javase/7/docs/api/java/util/logging/SimpleFormatter.html
- http://docs.oracle.com/javase/7/docs/api/java/util/Formatter.html#syntax
- http://stackoverflow.com/questions/5416769/how-to-configure-the-jdk14-loggings-pattern
- http://tomcat.apache.org/tomcat-7.0-doc/logging.html
- http://www.balabit.com/sites/default/files/documents/syslog-ng-ose-3.6-guides/en/syslog-ng-ose-v3.6-guide-admin/html/index.html
- http://asylum.madhouse-project.org/projects/debian/