Table of Contents

Nerdlog: fast, multi-host TUI log viewer with timeline histogram

Loosely inspired by Graylog/Kibana, but without the bloat. Pretty much no setup needed, either.

Hosted on GitHub: https://github.com/dimonomid/nerdlog.

First of all, a little demo. Here, we're dealing with (fake) logs from 4 remote nodes, and simulating a scenario of drilling down into logs to find the issue, filtering out irrelevant messages and finding relevant ones.

Project history

It might be useful to know the history to understand the project motivation and overall direction.

My team and I were working on a service which was printing a fairly sizeable amount of logs from a distributed cluster of 20+ nodes: about 2-3M log messages per hour in total. There was no containerization: the nodes were actual AWS instances running Ubuntu, and our web services were running there directly as plain systemd services, naturally printing logs to /var/log/syslog. To read the logs though, we were using Graylog, and querying an hour of logs took no more than 1-3 seconds, so it was pretty quick.

Infra people hated Graylog though, since it required some annoying maintenance from them, so at some point the decision was made to switch to Splunk instead. And when Splunk was finally rolled out, we found out that it was incredibly, ridiculously slow. Honestly, looking at it, I don't quite understand how they even sell it. If you've used Splunk, you might know that it has two modes: “Smart” and “Fast”. In “Smart” mode, the same query for an hour of logs was taking a few minutes. And in the so-called “Fast” mode, it was taking 30-60s (and that “Fast” mode has some other limitations which make it a lot less useful). It might have been a misconfiguration of some sort (I'm not an infra guy, so I don't know), but no one knew how or wanted to fix it, and so it was clear that once Graylog was finally shut down, we'd lose our ability to query logs quickly. It was a massive bummer for us.

And I thought that this was just ridiculous. 2-3M log messages per hour doesn't sound like that much, and it seemed like some old-school shell hacks on plain log files, without any centralized logging server, should be about as fast as Graylog was, and that would be enough for most of our needs. As you remember, our stuff was running as systemd services printing logs to /var/log/syslog, so these plain log files were readily available to us. And that's how the project started: I couldn't stop thinking about it, so I took a week off and went on a personal hackathon to implement this proof-of-concept log fetcher and viewer, which sshes directly into the nodes and analyzes plain log files using bash + tail + head + awk hacks.

It proved very capable of replacing the essential features we had in Graylog: being fast, querying logs from multiple remote nodes simultaneously, drawing the histogram for the whole requested time period, and supporting context (key-value pairs) for every message. Apart from that, it was actually refreshing to use a snappy keyboard-navigated terminal app instead of a clunky web UI, so in a sense I liked it even more than Graylog. As for Splunk, I ended up almost never using it to fetch logs from our nodes.

With that backstory, you can already get a feel for the goals and design of Nerdlog: it is laser-focused on being super efficient while querying logs from multiple remote machines simultaneously, filtering them by time range and patterns, and, apart from showing the actual logs, drawing the histogram.

Design highlights

In short, distilling the backstory above:

- fast: querying an hour of logs from a 20+ node cluster should take seconds, not minutes;
- pretty much no setup: no server-side components and no centralized log storage, just ssh access to hosts with plain log files;
- a timeline histogram drawn for the whole requested time range;
- a snappy keyboard-navigated terminal UI instead of a clunky web UI.

Project state

The initial implementation (that personal hackathon I mentioned above) took place in 2022, and after reaching the good-enough point, pretty much no development was done for a few years.

It was good enough for our internal needs, but definitely not general enough to make public; so, after a few years, in 2025 I finally made an effort to address the most obvious issues and share it, in the hope that it'll be useful to some.

It's still kind of at the proof-of-concept stage though: it was implemented as fast as possible, so spaghetti code abounds; it could be covered with more tests; a lot more features could be implemented; etc. It has only been tested for real on Linux, and with Linux hosts to get the logs from.

But it works. It's pretty usable and surprisingly fast.

Core concepts

Logstreams

As the name suggests, a logstream (or lstream for short) is a consecutive stream of log messages; in the Nerdlog implementation, a logstream can be provided by one or more log files (actually, as of now, the limit is at most 2 files per logstream). For example, /var/log/syslog.1 and /var/log/syslog together constitute a single logstream.

In order to collect data from a logstream, Nerdlog needs to know a few things: the ssh connection details (hostname, user and port), the filename of the latest log file, and the filename of the previous log file (in the future, there might be support for more older files). In its most explicit form, here's what a single logstream specification looks like:

myuser@myhost.com:22:/var/log/syslog:/var/log/syslog.1

This is valid syntax, and it can be used on the query edit form as-is. Multiple logstreams can be provided too, comma-separated.
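
For example, two fully spelled-out logstreams (the hostnames here are just placeholders):

myuser@host1.com:22:/var/log/syslog:/var/log/syslog.1,myuser@host2.com:22:/var/log/syslog:/var/log/syslog.1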

However, with many hosts to connect to, it would be tedious to spell them all out like that; so here's how it can be simplified:

Default values

Everything except the hostname is optional here: just like you'd expect, the user defaults to the current OS user, and the port defaults to 22. Then, the latest logfile defaults to either /var/log/messages or /var/log/syslog (whichever is present on the host), and the previous log file defaults to the latest one with .1 appended, e.g. /var/log/syslog.1, just like log rotation tools normally name them.

Putting it all together, if the defaults work for us, all we have to do is specify myhost.com. Or, again, multiple hosts: foo.com,bar.com.
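
In other words, assuming the current OS user is myuser and the host has /var/log/syslog, the bare myhost.com expands to:

myuser@myhost.com:22:/var/log/syslog:/var/log/syslog.1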

SSH config

Nerdlog reads the ssh config file (~/.ssh/config) as well. So, for example, if our ssh config contains this:

Host myhost-01
  User myuser
  HostName actualhost1.com
  Port 1234

Host myhost-02
  User myuser
  HostName actualhost2.com
  Port 7890

Then we can specify just myhost-01, and it'll be equivalent to:

myuser@actualhost1.com:1234

Globs are supported too, so if we want to get logs from both hosts in this ssh config, we can specify myhost-*, and it'll be equivalent to:

myuser@actualhost1.com:1234,myuser@actualhost2.com:7890

Nerdlog logstreams config

One obvious problem though is that the SSH config doesn't let us specify which log files to read. If we need to configure non-default log files, we can use the ~/.config/nerdlog/logstreams.yaml file, which looks like this:

log_streams:
  myhost-01:
    hostname: actualhost1.com
    port: 1234
    user: myuser
    log_files:
      - /some/custom/logfile
  myhost-02:
    hostname: actualhost2.com
    port: 7890
    user: myuser
    log_files:
      - /some/custom/logfile

With that, we can specify myhost-01, and it'll be equivalent to:

myuser@actualhost1.com:1234:/some/custom/logfile:/some/custom/logfile.1

Combining multiple configs

In fact, Nerdlog checks all of these configs in the following order, where every next step can fill in missing details, using the hostname as a key:

1. The logstream spec itself, as entered on the query edit form;
2. The Nerdlog logstreams config (~/.config/nerdlog/logstreams.yaml), described above;
3. The SSH config (~/.ssh/config);
4. The default values.

Therefore, having the SSH config as shown above, we can simplify the aforementioned logstreams.yaml as follows:

log_streams:
  myhost-01:
    log_files:
      - /some/custom/logfile
  myhost-02:
    log_files:
      - /some/custom/logfile

And get the same result, because hostname, user and port will come from the SSH config.

Query

A Nerdlog query consists of 3 primary components and 1 extra:

- the logstreams to query (as described in the previous sections);
- the time range (“from” and “to”);
- the awk pattern to filter the logs with.

On the query edit form, you'll see one more field, “Select field expression”, which looks like this:

time STICKY, lstream, message, *

But it only affects the presentation of the logs in the UI. It somewhat resembles the SQL SELECT syntax, although a lot more limited.

The STICKY here just means that when the table is scrolled to the right, these sticky columns remain visible on the left side.

Another supported keyword here is AS, so e.g. message AS msg is a valid syntax.
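
So, for example, a valid expression combining both keywords could look like this:

time STICKY, lstream, message AS msg, *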

How it works

It might be useful to understand the internal mechanics, because certain behaviors and usage limitations then become more obvious.

Once you specify one or more logstreams on the query edit form and submit it, Nerdlog will initiate a separate ssh connection for every logstream (even if the logstream's host is localhost; no shortcut is implemented for that yet). If there are multiple logstreams on the same host, as of today it'll still make a separate ssh connection for each of them.

Then, for every logstream:

- Nerdlog delivers its agent script (a bash script) to the host over that ssh connection;
- every query is then executed by running the agent script on the host, with the results streamed back to the Nerdlog UI.

Overview of query implementation

Here's how a query is executed, on a high level. Conceptually, here are the steps that we need to take:

First, on the agent side:

- using the index file (see below), find the byte range of the log files covering the requested time range, and cut everything outside of it;
- in a single awk pass over that range, filter the lines with the user-supplied pattern, count matches per minute for the timeline histogram, and keep the last N matching lines, remembering their line numbers in the original file (a simplified sketch of this pass follows below);
- print the histogram data and those last N lines.

Additionally, the agent prints some progress info to stderr, so that Nerdlog can show it in the UI and we know how far along the query is. This is very convenient for large log files, especially when the index file is being generated (see details below).
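
To make that single pass more concrete, here's a minimal awk sketch of the idea. This is NOT the actual agent script: it assumes classic syslog timestamps ("Mar  9 06:02:33 ...") and a hardcoded /error/ pattern, while the real agent also takes the time range and the user's pattern as inputs, and reports progress:

awk -v N=100 '
/error/ {
    minute = substr($0, 1, 12)   # "Mar  9 06:02" -> 1-minute histogram bucket
    hist[minute]++
    n++
    buf[n % N] = FNR ":" $0      # ring buffer: the last N matches, with line numbers
}
END {
    for (m in hist)
        print "hist", m, hist[m]          # histogram data
    first = (n < N ? 1 : n - N + 1)
    for (i = first; i <= n; i++)
        print "log", buf[i % N]           # the last N matching lines
}
' /var/log/syslog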

And on the Nerdlog side:

- collect and parse the agent output from every logstream;
- merge the log lines from all logstreams into a single timeline, sorted by timestamp;
- sum up the per-minute counts and render the histogram and the log table.

An important point here: perhaps unintuitively, the awk pattern is checked against raw log lines. So, for example, if in the UI we see a column program with the value foo, and we want to filter logs only with that value of program, then when writing an awk pattern we have to think about how it looks in the raw log file. Perhaps just /foo/ is good enough, but keep in mind that it'll potentially match more logs (those that contain foo somewhere else, not necessarily in the program field).
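
For example, if the raw lines are in the classic syslog format, like this made-up line:

Mar  9 06:02:33 myhost foo[1234]: something happened

then a slightly more precise pattern for the program foo could anchor on the surrounding syntax, e.g. / foo\[[0-9]+\]: / (assuming the program always logs with a pid).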

Index file

As mentioned above, the first step when executing a query is cutting off the logs outside of the requested time range. This could be done by scanning every line in a logfile to find the right place, but when the log files are large and the time range being queried is relatively small (which is often the case), this is the slowest part of the query, and it would be repeated in multiple subsequent queries.

So to optimize that, the agent script maintains an index file: basically a file stored as /tmp/nerdlog_agent_index_…, with a mapping from a timestamp like 2025-03-09-06:02 to the line number and byte offset in the corresponding log file. As you can see, the resolution here is 1 minute; it means we can't query time ranges more granular than a minute.
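
To illustrate the idea, the index maps minutes to line numbers and byte offsets, conceptually like this (a hypothetical illustration; the actual on-disk format may differ):

2025-03-09-06:01   152382   18874368
2025-03-09-06:02   153127   18971253
2025-03-09-06:03   153922   19070144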

So when a query comes in, with the starting timestamp being e.g. 2025-04-20-09:05, the agent first checks if the index file already has this timestamp. If so, then we know which part of the file to cut. If not, and the requested timestamp is later than the last one in the index, we need to “index up”: add more lines to the index file, starting from the last one there. And obviously there's logic to invalidate index files and regenerate them from scratch; this happens when log files are being rotated.
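
Once the byte offsets are known, the actual cutting is cheap; conceptually it's something like this (a sketch, not the agent's exact commands):

# Suppose the index lookup gave us FROM_OFFSET and TO_OFFSET, in bytes;
# tail -c +K starts output at the K-th byte (1-based), hence the +1.
tail -c +"$((FROM_OFFSET + 1))" /var/log/syslog | head -c "$((TO_OFFSET - FROM_OFFSET))"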

So indexing does take some time (on a 2GB log file it takes about 10s in my experiments), but it only has to be done once after the log files are rotated, so at most once a day in most setups. Thanks to that, the time-range part of the query is very efficient: we know almost right away which parts of the log files to cut.

Requirements

SSH access

In order to read logs from a host, one has to have ssh access to that host (so far, there is no shortcut even for localhost), and obviously read access to the log files. Notably, to read /var/log/syslog or /var/log/messages, one typically has to be added to the adm group.

See the resulting limitations, and possible workarounds, below.

SSH agent

So far, ssh-agent is the only way for Nerdlog to authenticate to remote nodes; so the agent should be running, and the necessary keys should be added to it (you can check which keys the agent has by running ssh-add -L). You should be able to connect to the servers without entering a password.
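
If the agent isn't set up yet, a typical session looks like this (the key path is just an example):

eval "$(ssh-agent)"          # start an agent in the current shell
ssh-add ~/.ssh/id_ed25519    # add a key to it
ssh-add -L                   # verify which keys are loaded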

Host requirements

The Nerdlog agent relies on a bunch of standard tools being present on the hosts, such as bash, awk, tail, head, gzip etc.; many systems will already have everything installed, but a few special requirements are worth mentioning:

Limitations

Consequences of requiring SSH access

As mentioned above, ssh access is a requirement, and it presents a significant limitation as well.

For very small startup-ish teams, or personal projects, this shouldn't be an issue even for production (and that's the primary use case that Nerdlog targets anyway). However, if giving ssh access to production hosts to all devs is not cool, and we still want to use Nerdlog, then there are at least 2 possible ways to address this issue:

- sync the logs to one or more dedicated logging hosts, and give devs ssh access to those hosts instead of the production ones (see the next section as well);
- restrict the ssh access on the production hosts, e.g. with a dedicated user which can only read the log files.

Uses CPU & IO of the actual hosts

Unlike centralized systems like Graylog or Kibana, Nerdlog fetches the logs directly from the hosts which generate the logs, and it consumes some CPU and IO on these hosts to perform the filtering and analysis. So, if the host is already very overloaded in case of an emergency, then getting logs from it might make things worse. Likewise, if the host becomes unresponsive for whatever reason, we can't get logs from it either.

Just like the previous point, this too can be addressed by syncing logs to a separate logging server, if we consider this problem severe enough.

Depends on the log file rotation policy

Typically, only 2 plain log files are available (e.g. /var/log/syslog and /var/log/syslog.1); there are usually some older gzipped files as well (which Nerdlog doesn't support currently), and typically these files are rotated every day (or every week in some cases). So unless you make an extra effort to extend the lifetime of your log files, you'll only be able to read very recent logs.
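
For example, on hosts using logrotate, one way to extend that window is rotating syslog weekly instead of daily; a sketch of the relevant stanza (the exact config file and surrounding options vary per distro):

/var/log/syslog
{
    weekly
    rotate 4
    compress
    delaycompress
}

Here, delaycompress conveniently keeps the most recent rotation (/var/log/syslog.1) uncompressed, so Nerdlog can still read it.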

For our use cases this was totally fine, but mentioning it just in case.

Installation and Usage

For installation instructions, details about the UI, and the available commands and options, check the readme on GitHub.

FAQ

Why are the patterns in awk syntax?

Because, at least in the current implementation, it's the simplest and most efficient way to implement filtering. As you remember from the “How it works” section above, after cutting off the logs outside of the requested time range, we do the filtering, generate timeline histogram data, and print the last N log lines, keeping track of where they were in the original file (so that in the UI we can point the user at that line if they want). All this is done using an awk script in a single pass, and it's obviously easier to have the filtering as part of the same awk script.
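
And since the filtering is just a condition in that awk script, a pattern can be a full awk expression; for example, something like this (a made-up example) should match error lines while filtering out a noisy message:

/error/ && !/connection timed out/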

How is it better than lnav?

It's not better, and not worse. It's just very different.

Lnav's primary focus is working with local log files, and it's great at that. You can just throw a whole directory of logs at lnav, and it'll find its way. It can read remote logs as well, but that was never lnav's primary focus, and so it remains an extra feature on top. For example, it's not practical to use lnav to check logs from 20+ nodes with 500MB log files each.

Nerdlog's primary focus is working with remote logs, and being efficient at it even when log files are large. You absolutely can read logs from 20+ nodes with 500MB log files each, or more.