A tick in the monitoring box

I’ve been looking into ways to store and display the data coming from my SmartThings sensors and other sources. A traditional SQL database is certainly an option, and one that I started with. Trouble is, without a lot of extra work, SQL isn’t ideal for time series data, and I’d still have to find a way to visualise the data.

That’s a lot of wheels to reinvent. There are plenty of solutions for storing and displaying time series data that already exist, and today I’m going to take a closer look at one of them.

InfluxDB is the database element of a suite of products from InfluxData – the TICK stack, where:

  • Telegraf is an agent for capturing and sending time series data.
  • InfluxDB is the database for storing the data.
  • Chronograf is a tool for visualising the data in InfluxDB.
  • Kapacitor is the tool for alerting on thresholds and anomalies.

InfluxData offer support for the products, which are all open source except for Chronograf, which is free for a single user and has paid options.

I’ve tried the TIC of the TICK so far, and they’re all really easy to set up. Install package, run service, done.

I set Telegraf up with some default settings to monitor my home server for all the usual metrics: CPU, disk capacity and so on. The only configuration needed was to set the URL of the InfluxDB database. I also increased the default collection interval from 10 seconds to 60, as I really don’t need it that granular.

The InfluxDB end took care of itself, as there’s no need to create tables in advance. Data consists of timestamped points, each belonging to a measurement and carrying a set of tags plus one or more field/value pairs. Each unique combination of measurement and tags is known as a series. The best way to see this is by looking at an example.
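To make that concrete, here’s a minimal Python sketch of the data model. It doesn’t use any InfluxDB library; `Point`, `series_key`, `distinct_series` and the host name `homeserver` are hypothetical names of my own, and the `cpu-total` tag value mirrors what Telegraf sends for the combined CPU figure. Four CPUs plus the total, sampled twice, gives ten points but only five series:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Point:
    """One timestamped data point, modelled loosely on InfluxDB's terms."""
    measurement: str
    tags: dict[str, str]
    fields: dict[str, float]
    timestamp_ns: int

    def series_key(self) -> str:
        # A series is identified by the measurement plus its full tag set.
        # Tags are sorted by key so the same tag set always gives the same key.
        tag_part = ",".join(f"{k}={v}" for k, v in sorted(self.tags.items()))
        return f"{self.measurement},{tag_part}" if tag_part else self.measurement


def distinct_series(points: list[Point]) -> set[str]:
    """Return the set of distinct series keys across a batch of points."""
    return {p.series_key() for p in points}


# Hypothetical host; four CPUs plus the total, at two timestamps.
cpus = ["cpu0", "cpu1", "cpu2", "cpu3", "cpu-total"]
points = [
    Point("cpu", {"host": "homeserver", "cpu": c}, {"usage_system": 0.2}, ts)
    for c in cpus
    for ts in (1471737600000000000, 1471737660000000000)
]
print(len(points))                   # 10
print(len(distinct_series(points)))  # 5
```

Fields don’t appear in the series key at all, which is why adding another metric to an existing measurement doesn’t create new series.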

Let’s look at the CPU data that Telegraf is sending, using the influx command line:

> show tag keys from cpu
name: cpu

CPU is the measurement, and it has two tags – host and cpu. Now let’s look at the fields:

> show field keys from cpu
name: cpu

So each measurement will have all of those fields. We can list the distinct series in our database:

> show series

My server has four CPUs and I’m also collecting the total CPU use.

Putting the two together, each entry in a series carries a value for each of those field keys. InfluxDB uses an SQL-like syntax for querying the data. I say “SQL-like”: it’s different enough to trip you up if you already know SQL, but the basics of selection are very similar:

> select last(usage_system) from cpu where "cpu" = 'cpu1' limit 1
name: cpu
time last
1471737660000000000 0.18363939899828194
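The long number in the time column is a Unix epoch timestamp in nanoseconds. If you want to read one outside the influx shell, here’s a minimal Python sketch (the function name `ns_to_datetime` is my own) that converts it to a UTC datetime:

```python
from datetime import datetime, timedelta, timezone


def ns_to_datetime(ts_ns: int) -> datetime:
    """Convert a nanosecond epoch timestamp to a timezone-aware UTC datetime.

    Python datetimes only hold microseconds, so anything finer is truncated.
    """
    seconds, remainder_ns = divmod(ts_ns, 1_000_000_000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(
        microseconds=remainder_ns // 1_000
    )


print(ns_to_datetime(1471737660000000000).isoformat())
# 2016-08-21T00:01:00+00:00
```

That lands exactly on a minute boundary, which fits the 60 second collection interval.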

It’s worth noting the quotes in the query. Double quotes around tag keys such as "cpu" are optional, and only needed if the name contains spaces or special characters (or clashes with an InfluxQL keyword), but tag values always need single quotes.
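If you’re building queries in code, it’s simplest to quote everything. This is a minimal Python sketch, assuming InfluxQL’s quoting rules as I understand them (identifiers in double quotes with embedded double quotes backslash-escaped, string literals in single quotes with backslash escapes). The helper names are hypothetical, and it doesn’t attempt to cover every edge case:

```python
def quote_identifier(name: str) -> str:
    """Double-quote an InfluxQL identifier (measurement, tag key, field key)."""
    return '"' + name.replace('"', '\\"') + '"'


def quote_string(value: str) -> str:
    """Single-quote an InfluxQL string literal such as a tag value."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def last_value_query(field_key: str, measurement: str,
                     tag_key: str, tag_value: str) -> str:
    # Quote every identifier so we never have to decide whether a name
    # contains spaces, special characters or a reserved word.
    return (
        f"select last({quote_identifier(field_key)}) "
        f"from {quote_identifier(measurement)} "
        f"where {quote_identifier(tag_key)} = {quote_string(tag_value)} limit 1"
    )


print(last_value_query("usage_system", "cpu", "cpu", "cpu1"))
# select last("usage_system") from "cpu" where "cpu" = 'cpu1' limit 1
```

Quoting identifiers that don’t need it is harmless, so always doing it avoids the whole question.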

I had a quick play with Chronograf, but it lacked a little flexibility. Luckily there are other options. Grafana is another open source display tool that can use InfluxDB as a data source, and has a wealth of display options as well as an entire ecosystem of plugins.

It didn’t take more than a few clicks to set up a dashboard with the key metrics for my server.

Of course, Telegraf isn’t the only way to get data into InfluxDB. Any application can send metrics to InfluxDB using the HTTP API or a client library. The API is a little unusual. Rather than, say, a JSON API, you just post a string in what InfluxDB calls line protocol:

curl -X POST "http://localhost:8086/write?db=<databasename>" --data-binary 'measurement,tagname=tagvalue,tagname=tagvalue fieldname=value,fieldname=value timestamp'

Note the odd mix of the query string for the database name and POST for the data.
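Line protocol has a few escaping rules that are easy to trip over: spaces and commas in measurement names; commas, equals signs and spaces in tag keys, tag values and field keys; double quotes around string field values; and an `i` suffix on integers. Here’s a minimal Python sketch that formats a point and builds (but doesn’t send) the POST request using only the standard library. The function names, the host name and the `telegraf` database name are assumptions for illustration:

```python
import urllib.parse
import urllib.request


def _escape(text: str, chars: str) -> str:
    # Backslash-escape each special character listed in `chars`.
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _field_value(value) -> str:
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    # Strings are double-quoted, with backslashes and quotes escaped.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_line(measurement: str, tags: dict, fields: dict, timestamp_ns=None) -> str:
    """Format one point as a line protocol string."""
    parts = [_escape(measurement, ", ")]
    # Tags sorted by key, as InfluxData recommends for write performance.
    for key, value in sorted(tags.items()):
        parts.append(f"{_escape(key, ',= ')}={_escape(value, ',= ')}")
    body = ",".join(f"{_escape(k, ',= ')}={_field_value(v)}" for k, v in fields.items())
    line = f"{','.join(parts)} {body}"
    return line if timestamp_ns is None else f"{line} {timestamp_ns}"


def write_request(base_url: str, database: str, lines: list) -> urllib.request.Request:
    """Build (but do not send) a POST to the /write endpoint."""
    query = urllib.parse.urlencode({"db": database})
    data = "\n".join(lines).encode("utf-8")
    return urllib.request.Request(f"{base_url}/write?{query}", data=data, method="POST")


line = to_line("cpu", {"host": "homeserver", "cpu": "cpu1"},
               {"usage_system": 0.18}, 1471737660000000000)
print(line)
# cpu,cpu=cpu1,host=homeserver usage_system=0.18 1471737660000000000
req = write_request("http://localhost:8086", "telegraf", [line])
print(req.full_url)
# http://localhost:8086/write?db=telegraf
```

Passing the request to `urllib.request.urlopen` would actually send it; InfluxDB 1.x answers a successful write with HTTP 204 No Content. Several lines can go in one POST, separated by newlines, which is how batched writes work.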

There was a JSON API in previous versions of InfluxDB, but it was removed over performance concerns. I can see the reasoning, but it would be a useful option for integration into other tools.

The other key features worth noting are retention policies and continuous queries. A retention policy determines how long data is kept, and continuous queries are used to consolidate data into summary data points, for example by taking one-minute samples and writing hourly averages into another retention policy.
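What a continuous query like that computes is easy to picture in plain Python. This sketch only illustrates the arithmetic (InfluxDB does the work server side, and this isn’t how it’s implemented); `hourly_means` is a hypothetical helper taking `(timestamp_ns, value)` pairs:

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean

NS_PER_HOUR = 3_600 * 1_000_000_000


def hourly_means(points: list[tuple[int, float]]) -> list[tuple[int, float]]:
    """Average (timestamp_ns, value) samples into hourly buckets.

    Each bucket is labelled with the start of its hour, the same way
    GROUP BY time(1h) labels its output.
    """
    buckets: dict[int, list[float]] = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % NS_PER_HOUR].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Two hours of one-minute samples starting at 2016-08-21T00:00:00Z.
start = 1471737600 * 1_000_000_000
minutes = [(start + i * 60 * 1_000_000_000, 1.0 if i < 60 else 3.0) for i in range(120)]
print(hourly_means(minutes))
# [(1471737600000000000, 1.0), (1471741200000000000, 3.0)]
```

A hundred and twenty raw points collapse to two, which is where the storage saving for long retention periods comes from.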

Data is written into a specific retention policy, so decisions about retention policies need to be made up front. It’s a bit fiddly: a continuous query has to be defined for every metric, and selection is based on retention policy, so dashboards have to specify which policy to use. This means it’s impossible to draw a graph that gets progressively less granular as you zoom out to longer timespans, at least with the standard tools.
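Defining a continuous query per metric by hand gets tedious, but the statements can be generated. This is a hypothetical Python helper, assuming two retention policies named `one_week` and `one_year` already exist in a `telegraf` database; it only produces InfluxQL text for pasting into the influx shell, and doesn’t talk to the server:

```python
def continuous_queries(database: str, measurement: str, field_keys: list,
                       source_rp: str, target_rp: str, interval: str = "1h") -> list:
    """Generate one CREATE CONTINUOUS QUERY statement per field key."""
    statements = []
    for field_key in field_keys:
        name = f"cq_{measurement}_{field_key}_{interval}"
        statements.append(
            f'CREATE CONTINUOUS QUERY "{name}" ON "{database}" BEGIN '
            f'SELECT mean("{field_key}") AS "{field_key}" '
            f'INTO "{database}"."{target_rp}"."{measurement}" '
            f'FROM "{database}"."{source_rp}"."{measurement}" '
            # "*" keeps every tag, so each series is downsampled separately.
            f'GROUP BY time({interval}), * END'
        )
    return statements


stmts = continuous_queries("telegraf", "cpu", ["usage_system", "usage_user"],
                           "one_week", "one_year")
for stmt in stmts:
    print(stmt)
```

Keeping the same measurement and field names in the target policy means a dashboard query only has to change the retention policy it selects from.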

For my purposes it’s not a big problem, although I can see that I’ll need to restart my monitoring from scratch when I’ve really got to grips with the policies I need. For test purposes I’m just keeping a week of the raw stats.

The next step will be to pull my SmartThings temperature data into InfluxDB and draw some more graphs…
