Many years ago when I started my first job as a developer, one of the things that I noticed was that it was always difficult to solve an issue that our end users were having. Often, we’d try to replicate the problem, be unable to, and end up having to call the user and ask them to share their screen with us, go over to watch them do whatever they were doing, or call us back whenever they had it come up again.
Another common trend was the sheer amount of time it took to track down the root cause of a bug that could be replicated. Particularly with complex, data-driven systems, tracking down bugs would be time consuming difficult, and result in a lot of late nights and sometimes weekends to make things usable for clients in a timely manner.
Does any of this sound familiar? At first, being inexperienced, I chalked it up to just the way it was. Complex programs are more likely to be difficult to maintain after all. My younger self thought it was just the cost of doing business.
It didn’t take all that long for me to realize there had to be a better way to do things. After constantly repeating the above patterns, it finally dawned on my inexperienced self that the problem was our software wasn’t transparent enough about what it was doing. A lack of information was the culprit to all the time sinks, so we leaned on the customer to tell us about what they were doing or looking through the code repeatedly to try and find a problem. I realized, none of our software was effectively logging what was going on behind the scenes – it had been taken for granted for so long, it just wasn’t part of the way we wrote software.
That realization changed the way I write code. One of the first questions I have now when I start a new project is – what logging mechanisms are we going to rely on? I’m not going to go in depth here about a logging framework here, as there are many excellent ones. I’m more going to focus on several types of logging you can be doing and why, particularly thinking about solving the types of problems I used to experience above.
So… what types of logging are must haves?
The idea behind error logging is to capture details of any unhandled exception that occurs within the software.
Ideally, we should capture all the details we can about the exception to review including the exception message, stack trace, and anything else that might be relevant. The great thing about capturing detailed exception data is we can usually pin down an exact line of code that’s failing. The other thing that’s nice about error logging is it’s usually pretty easy to set up – many programming frameworks have a built-in method for capturing unhandled exceptions so that you can record information about them in some way. There are third party tools you can install such as Sentry and Application Insights as well that help to give you detailed exception data in a palatable way.
In my opinion, error logging is a must, a bare minimum that every program has to have. It’s a huge red flag to me if a piece of software doesn’t have some mechanism of capturing errors for developers to review when there’s a problem. At the bare minimum, setting up a database table to store unhandled exception data in is a fairly simple task, and can save hours trying to track down a problem.
To sum up: Error logging helps us find WHERE code is failing.
Trace logging is the process of saving informative statements about what a program is doing at any given time.
Trace statements may be simple very simple statements of fact such as “User xxx accessed cart”, or they may include entire data structures/objects that are being worked with at a given time in the code. Trace statements are a way for us as developers to see the path that code is following when we aren’t the ones running it. This often eliminates that call to the user to have them walk us through the issue, because ideally we can look up the trace logs at the time, they had the problem and see what was happening.
Once again, many frameworks support trace logging, and at a minimum it’s simple to set up trace logs to save to a database or text file. We’ll often set different status levels on trace logs, such as “information”, “warning”, etc. to be able to identify and filter these logs a little more granularly.
When first starting a project, I’ll often put in TONS of trace logging statements, probably far more than necessary. I’ll log what’s going on, the state of different objects in my code, and so on. I’ll often pare the trace logging down later once I’ve had a chance to test it and see what logging is valuable and what isn’t so much. By making a habit of writing trace logs while I write my code though, I end up with better trace logs that aren’t an afterthought – they reflect the intent of the code.
There are one or two issues that can arise with trace logging. If we overdo it, we may fill too much space in a database or on a hard drive. Make sure you clean these logs up so you don’t suddenly run out of space. There’s also the danger that we might log something sensitive and that someone looking at that log could see something they shouldn’t. If you’re cognizant of this, it’s usually not an issue, but it is something to keep in mind.
To sum up: Trace logging helps us find the WHY code is failing
The final type of logging I’m going to talk about is audit logging. Audit logging the process of logging the actions that user’s take within the system in a formal way.
I like to think of audit logging as an insurance policy. In my opinion, any time a user takes an action that impacts data that the software uses, this should be logged. Not only can this be helpful additional information for trace logging, but it also holds users accountable for their actions, and makes it easier to figure out if something in your system isn’t working due to a bug, or maybe due to a user mistake.
As with the other two, audit logging is usually easy to set up. There are frameworks and tools that can help with it if needed, but I’ll often just set up a simple table specifically designated as the audit log that tracks what users took what actions that affected system data when. I make setting up this logging mechanism an initial development step with a project once again, so that I can be adding audit log data as I go whenever it makes sense.
To sum up: Audit logging helps us find WHO has touched code that is failing
So there you have it. By making error logging, trace logging, and audit logging a priority in my development process, I’ve eliminated most of those late nights, weekends, and frustrating user phone calls.
It seems like such a simple thing, but it’s something that I used to take for granted, and that I’ve seen many other developers ignore in the name of “saving time” or “shipping faster.” The truth is, the minuscule amount of extra time and effort made to have a good logging strategy as a part of your development process will save you lots more time down the road when it comes to supporting your application. That’s why logging is a developer’s secret weapon.