From time to time I wear a support hat and usually one of the first things I say is “Send me the log files”. Log file messages are not usually quite as unhelpful but you can imagine my reaction when the only information in the file is a message like that.
I’m going to run through a few ideas that can make logging more effective, but one of the main points I want to make is that logging is not just for developers, its for support and production people too.
Tools like Log4Net and Log4J make it easy to add logging to code but generating the right log messages with useful information requires a bit of thought. When you are writing code, try to put yourself in the shoes of the person reading the log file when something has gone wrong.
So… “Error - Failed to connect” – Duh - Failed to connect to what? It’s obvious you need to insert some strings into the message including at least the host name and port but then there might be several things in the process trying to connect to something so it would make sense to identify what is doing the connecting. So a much better message might be something like “HeartbeatListener failed to connect to daemon on MyServer:7500” if that’s what it’s trying to connect to.
Then again, is it actually an error, does it stop the system from functioning correctly? Should it just be logged as a Warning? Maybe you can just be log it as Info. Suppose you have something trying to initiate a connection to several servers and you don’t actually care which succeeds, so long as one of them does. It’s important to get this right because log files are increasingly used as monitoring tools with anything from enterprise tools like splunk to a simple ‘tail –f output.log | grep Error | ….’
Monitoring systems will often email a lot of people if they detect something logged as ‘Error’ so your incorrect logging level can be very visible. Following on from that it’s a good idea to think about how many times you report an error, and consider the possibility of your system getting into a loop repeatedly logging an error and triggering emails all weekend. I know from bitter experience how hard it is to delete 45,000 messages from a Blackberry.
While they would clearly be better served using the Tick42 Metrics, developers sometimes choose to use logging to monitor system activity and performance and this is often left switched on in UAT and sometime production environments. That’s fine until things start going wrong and you have to start extracting the errors and warnings from hundreds of megabytes of log files to see the sequence of events that is of interest. The Log4X tools make it easy to work with multiple loggers, so, if you need to log each request your service receives and how long it took, put that in its own log file.
Log4X provides a lot of options for managing log files; you can specify maximum sizes or a daily rollover. If you are planning on a lot of log output, make sure the files are big enough. If you restart a service after a problem has occurred, its possible that all the clients reconnecting and requesting everything will cause a huge quantity of messages. The last thing you want is for all these messages to push the information that you need to investigate the problem over the end of your size limited log files.
Logging is one of those things that you don’t really think too much about until you suddenly find you really need it. This article just scratches the surface.