Skip to content

Value-specific logfiles with Logback and Clojure

Recently one of our customer projects at work discovered a need for value-specific logfiles.

For our purposes, let’s say we have a retail system. Every evening, we want to run a settlement report against this system’s financial behavior for the day. This settlement report will…

1. Execute, via a single process, application code to produce a settlement report
2. Emit low-level operational facts about that execution to a system log to record details of process behavior
3. Persist its high-level findings to a relational database table with a unique primary key

Here, we’re interested in altering the behavior in step 2 to produce separate per-execution logfiles rather than one continuous logfile for all reports. That is, we’d like the following…

ls -al logs

-rw------- 1 log-user staff 2.0M Jul 4 17:02 settlement-report-84.log
-rw------- 1 log-user staff 2.0M Jul 5 17:02 settlement-report-85.log

That DB primary key gives us a useful handle to differentiate.


Logback possesses reasonably flexible configuration, and allows access to custom Logger and LoggerFactory implementations. One can use the Logger protocol’s write! method to execute arbitrary code — a useful example being‘s own tests — with the tradeoff that any formatting choices made in this method will need to be managed apart from the application’s Logback config file.

It turns out Logback can actually get us most of the way there with its out-of-the-box facilities.

We can cooperate a little more closely with the existing Logback config by establishing a task-specific logger in concert with a SiftingAppender, whose purpose is to provide functionality around this notion of value-specific logfiles.

Here are the specific bits of the config/logback.xml

The discriminator key above is the most significant bit. It will use the appender‘s file directive to produce logfiles that are distinguished by report-id value.

That value is conveyed via Logback’s “Mapped Diagnostic Context“. We made use of io-clj/logging, a tiny Clojure library, to abstract away interaction with MDC, along with a dynamic var for thread-local state.


Client Code

Considerations Addressed Hereby

We make use here of‘s log macro, a bit of plumbing that gives access to multiple arities. We’re using the 4-arity implementation that allows one to specify the logger name. The default Logback behavior dispatches on caller namespace, but we can use an arbitrary string here to point to our value-specific logger in the Logback config.xml.

Providing vars named after the conventional set of logging levels (info, debug, etc.) allows one to migrate client namespaces to the value-specific logger by simply changing the client’s require spec to reference the new logging namespace.

Making these vars available as macros is a common pattern with logging wrappers to preserve the original call-site, so Logback can use the caller’s namespace to dispatch to the right logger; things like line numbers are also preserved.

Providing this as a separate namespace with a dynamic var for storage is a subtlety that may not be readily apparent. Mechanically, it allows us to use the logging facility from any other namespace without causing a cycle in the call-graph. Semantically, that’s necessary because neither Logback’s MDC facility nor the io-clj/logging library are prepared to handle Clojure’s thread pools out of the box.

That is, simple usages passing a data literal to the library’s with-logging-context macro will behave as expected but any logging calls on the other side of a thread boundary will provoke bad behavior. The value of the data literal in code that uses a thread pool will intermittently occupy the wrong thread; often the process driving the logging will have no such bindings, triggering the appender’s defaultValue, and therefore logging to the wrong file. In our project, we happen to use a number of core.async routines to drive the settlement data pipe-line; each of these contexts needs to re-establish the *logging-context* binding in its own logging calls. Providing this stand-alone logging facility means those constituent call-sites can just call the appropriate log-level in app.logging and let the macro manage access to the dynamic binding as appropriate.

Considerations Unaddressed

This is imperfect in at least one way. We are treating the monotonically-increasing primary key as universally unique when it is not; the PK may be “re-used” by other instances of the system — particularly instances with a new DB like system tests that exercise this codepath. If Logback is configured to clobber existing files when attempting to create a logfile with a name that already exists, data loss may result. An easy fix here would be to add a timestamp to the filename.

Thanks to Sam Umbach for valuable input and review!

Posted in Uncategorized.

Setting up a reverse SSH tunnel

Scenario: you want to allow Fred SSH access to your local machine and, for whatever reason, cannot expose your local SSH port to the public Internet.

   |                |
   |    Yourbox     |
   |                |

You can establish a reverse SSH tunnel to a remote machine (a “jumphost”) that *can* expose its local SSH port to the Internet.

A. SSH to jumphost with a reverse tunnel pointing jumphost’s port 1234 back to your port 22.


Posted in Random.

Block-select in iTerm

Hold down Alt+Cmd to turn your mouse cursor into a crosshair, letting you select text blockwise.

Posted in Uncategorized.

Printing only the matching column with awk and grep

I’ve spent way too many minutes looking this up over the years.

awk 'match($0,/regexp/) {print substr($0,RSTART,RLENGTH)}' inputfile

and about 1 time in 10 I’m startled to remember grep makes this very easy…

grep -o regexp inputfile

From here.

Posted in Uncategorized.

HATEOAS with visuals

This is the most satisfyingly visual explanation of HATEOAS I’ve seen.

Here’s what I came to understand. Every communication protocol has a state machine. For some protocols they are very simple, for others they are more complex. When you implement a protocol via RPC, you build methods that modify the state of the communication. That state is maintained as a black box at the endpoint. Because the protocol state is hidden, it is easy to get things wrong. For instance, you might call Process before calling Init. People have been looking for ways to avoid these problems by annotating interface type information for a long time, but I’m not aware of any mainstream solutions. The fact that the state of the protocol is encapsulated behind method invocations that modify that state in non-obvious ways also makes versioning interesting.

The essence of REST is to make the states of the protocol explicit and addressible by URIs. The current state of the protocol state machine is represented by the URI you just operated on and the state representation you retrieved. You change state by operating on the URI of the state you’re moving to, making that your new state. A state’s representation includes the links (arcs in the graph) to the other states that you can move to from the current state. This is exactly how browser based apps work, and there is no reason that your app’s protocol can’t work that way too.

Also, Martin Fowler’s article Richardson Maturity Model: steps toward the glory of REST is excellent. Richardson’s book, Restful Web Services, doesn’t mention his own maturity model or HATEOAS itself — it gives the most passing of references to Link tags or headers, for example. Good book, though.

Posted in Uncategorized.

Camphone thread engine

A couple years ago, I built a small Ruby script to let users send pictures from their phones to a webforum. This was an idea I’d been kicking around for a while since the early mobile photo-sharing sites sprang up, and when I stumbled across the MMS2R gem, I figured it was time. It quickly became one of the more popular features of the forum; it’s neat to be able to chime in from where-ever you are with a quick snapshot of what you’re seeing or doing.

The other night, I needed to dig back into that script and found my brain itching from the way the script had organically grown a bit sideways over time, so I decided to refactor it and toss it up on Github in case it’s helpful as a starting point for anyone who wants to do something similar.

Posted in Development.

magit Cheatsheet

magit is an emacs mode for git, and is crazy awesome. I’ve distilled the Magit User Manual into a command-centric cheatsheet for my own purposes and offer it here in case you find it helpful…

Magit Cheatsheet

Also including the upstream org-mode file in case you feel like wrestling with org-mode’s org-export-as-html and its weird set of quasi-controls.

In order to get org-mode to export that nicely, I had to add two user variables to my .emacs, along with a macro I used to quick-replace the TD elements with the classes they need. Here’s the snippet.

Posted in Development.

Some performance monitoring stuff that’s been on my radar lately

Monitoring server load on Linux

I’m all about top and mytop, but there’s at least one surprise in here for me — namely, /usr/bin/tload.

Measuring & Optimizing I/O Performance using iostat…

If IO performance is suspect, iostat is your best friend. Having said that, the man pages are cryptic so don’t be surprised if you find yourself reading the source. To get started, identify the device in question and start a monitoring process…

which contains a note on how Facebook stores 360 photo uploads every second.

Posted in Server. compromised

Anybody who’s had an interest in security over the last 10 years has probably heard of — kind of a clearinghouse for all sorts of security-related info, vulnerability announcements, etc.

Here’s a blow-by-blow transcript of their systems getting completely wiped out by someone who didn’t agree with their practices. The transcript is hosted on but was originally available as the index.html of the site itself.

It looks like they used custom code to exploit a hole in astalavista’s Litespeed webserver; googling the exploit used just brings up copies of the transcript itself.

This is probably the most brutal exploit transcript I’ve ever seen — reference line 1839 where the exploiter, having found where the backups are being stored by trawling the admin’s .bash_history, FTPs there and deletes their offsite backups. (If your production system stores the FTP password or password-less SSH keys to your offsite backup server, exposed through your .bash_history no less, is it really offsite?)

astalavista is still down at the moment; I imagine they’ve scrambled to change all their passwords, shore up their exposure, and see if they can dig up a backup from anywhere.

It makes a pretty remarkable contrast with this recent How I Hacked Hacker News (with arc security advisory) accounting by someone who compromised Paul Graham’s via some pretty painstaking research of system behavior — and then worked with Robert T. Morris (the Robert T. Morris) to get the issue fixed.

Posted in Server.

Tagged with .

HOWTO: Rack Middleware for API Throttling

I will show you a technique to impose a rate limit (aka API Throttling) on a Ruby Web Service. I will be using Rack middleware so you can use this no matter what Ruby Web Framework you are using, as long as it is Rack-compliant.

Awesome. I can think of so many uses for this basic pattern to replace stuff I usually have to do at the app level (slow) or at the Apache level (difficult to port, prone to decay).

Comments note that Memcache might be a better pick than Redis, which works even better with the setup I have in mind, though Redis sounded fun too.

Posted in Server.

Tagged with , .