Line charts in JavaScript
Recently I wanted to take a look at some personal data that I have been collecting for several years (Quantified Self and Lifelogging ftw :D). Until now it had been sitting there write-only, with me occasionally peeking at it manually. Because it was in a pretty much raw format (actually, multiple formats, from different sources), I didn't interact with it much. But I was a bit bored and wanted to code something just for fun (as much as parsing XML files can be called fun), so I cleaned up the data, and now I want to get some "useful" information out of it.
The data consists of multiple time series of integer values, one per label. The timestamps have second granularity, but the measurements are irregular: consecutive measurements are sometimes days or even months apart.
I want to be able to plot multiple timeseries on the same graph, so I can compare them to each other. Because it spans years, I want to be able to zoom and pan on the data. The chart should also look reasonably decent. The library I use should be free and open-source. And the kicker requirement: when zooming out, the data should get aggregated. This means that when looking at the data on the yearly scale, I don't want to see individual dots for each second where I have a measurement, but I want to see all the measurements for a month added up, and have only data points for each month show up. I want to do this on several levels, so that I have monthly, daily and hourly aggregations.
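To make the aggregation requirement concrete, here is a minimal sketch in plain JavaScript, with no charting library involved. It assumes each measurement looks like {timestamp: Date, data: number}; sumByBucket and startOfMonth are names made up for this illustration, not part of any library.

```javascript
// Sum the values of all points that fall into the same bucket.
// `bucketStart` maps a Date to the start of its bucket (hypothetical helper).
function sumByBucket(points, bucketStart) {
  var sums = new Map(); // bucket start (ms since epoch) -> running total
  points.forEach(function (p) {
    var key = bucketStart(p.timestamp).getTime();
    sums.set(key, (sums.get(key) || 0) + p.data);
  });
  // Turn the map into a chronologically sorted list of {key, value} pairs
  return Array.from(sums, function (entry) {
    return { key: new Date(entry[0]), value: entry[1] };
  }).sort(function (a, b) { return a.key - b.key; });
}

// Start of the calendar month, in local time.
function startOfMonth(date) {
  return new Date(date.getFullYear(), date.getMonth(), 1);
}

var sample = [
  { timestamp: new Date(2014, 2, 3, 10, 15, 0), data: 5 },
  { timestamp: new Date(2014, 2, 20, 8, 0, 0), data: 7 },
  { timestamp: new Date(2014, 5, 1, 0, 0, 1), data: 2 }
];
var monthly = sumByBucket(sample, startOfMonth);
// monthly holds one point for March 2014 (5 + 7) and one for June 2014
```

Swapping in a different bucket function (start of day, start of hour) gives the other aggregation levels.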
After two weeks of playing with various charting libraries in my free time, I reached the conclusion that the state of the art in free and open-source JavaScript charting libraries is quite sad. Almost all of them are awesome at only one thing, and customizing them is quite hard. A quick review of the main ones I looked at:
- d3.js - super awesome, but too low level. I had played with it before, but I didn't want to reinvent the wheel completely this time.
- Bokeh - this would have been awesome. It is in Python, so it would have been super easy to integrate with the rest of my code. It generates HTML and JavaScript, so you can view it in the browser. But because it uses generated code, it's quite hard to customize anything you don't have explicit hooks for, and there are no hooks for zooming and aggregation. I may use it in the future for other projects, because otherwise it's super awesome.
- Vega - a visualization grammar, a declarative format for creating, saving and sharing visualization designs. Try defining custom zoom behaviour with aggregation in JSON. Nope.
- MetricsGraphics.js - from Mozilla. Quite nice, but it has never heard of zooming, and it's quite opinionated, so customizing it is unlikely to work out.
- Cubism - this one is specifically for time series. It has really cool horizon charts (they are really good at showing data from a large domain in a small amount of vertical space). But it's geared towards real-time usage (so new data is coming in all the time) and it has no zooming either.
- dygraphs - You can only zoom in and reset the zoom, no zoom out AND you can't pan while zoomed in.
- Rickshaw - Based on d3.js - zooming only with separate preview and you have to write CSS to position stuff around.
- nvd3.js - Based on d3.js - looks nice out of the box, but you have to hack your own zoom and panning mechanism.
- c3.js - Based on d3.js. Based on a cursory look, it does everything I want it to do.
- dc.js - Based on d3.js (notice a pattern here?) and crossfilter. It also does what I need it to do.
There are some commercial libraries too (Highcharts or amCharts), which do everything including the kitchen sink, but meh, vive la open source.
I ended up going with dc.js for two reasons. First, the API is much nicer: it is in the style of d3.js, while c3.js has a more declarative syntax, including some string-to-function magic. Second, it is much easier to combine multiple kinds of charts and filter across them (thanks to the crossfilter integration). Also, out of the box, the charts in dc.js are nicer than the ones in c3.js, but I'm sure that can be changed without (too) much hassle.
So, let's get to the fun part: coding the line chart.
I'll assume that we have an HTML file that contains a div with the id "chart" and the necessary CSS and JavaScript imports.
First, some code to generate random data, in the form of lists of {"timestamp": Date, "data": number}:
function randomDate(start, end) {
    return new Date(start.getTime() + Math.random() * (end.getTime() - start.getTime()));
}
function getRandomInt(min, max) {
    return Math.floor(Math.random() * (max - min + 1)) + min;
}
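function generateData() {
    var data = [];
    for (var i = 0; i < 1000; i++) {
        // Date months are zero-based (1 = February, 11 = December); this
        // range matches the x domain configured on the chart below.
        data.push({"timestamp": randomDate(new Date(2014, 1, 1), new Date(2015, 11, 31)),
                   "data": getRandomInt(10, 1000)});
    }
    // Sort chronologically
    return data.sort(function(a, b) { return a.timestamp - b.timestamp; });
}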
Now let's declare some d3.js formatters, generate the data, initialize the chart, and do some preprocessing on the data:
var dateFormat = d3.time.format.iso
var dayFormat = d3.time.format('%x')
var numberFormat = d3.format('d');
var chart = dc.compositeChart('#chart');
var data = {"label1": generateData(), "label2": generateData(),
"label3": generateData(), "label4": generateData(),
"label5": generateData()}
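// Parse the timestamp and precompute the slots where each datapoint will fit
// when aggregating
for (var label in data) {
    if (data.hasOwnProperty(label)) {
        data[label].forEach(function(d) {
            // The generated timestamps are already Date objects; ISO 8601
            // strings would go through the parser instead
            d.dd = d.timestamp instanceof Date ? d.timestamp : dateFormat.parse(d.timestamp);
            d.hour = d3.time.hour(d.dd);   // start of the containing hour
            d.day = d3.time.day(d.dd);     // start of the containing day
            d.month = d3.time.month(d.dd); // start of the containing month
        });
    }
}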
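For those not familiar with d3.js (version 3 here): calling an interval such as d3.time.hour as a function floors a date to the start of that interval, in local time. A rough plain-JavaScript equivalent could look like the sketch below. The helper names are hypothetical, and edge cases such as daylight saving transitions are ignored.

```javascript
// Plain-Date stand-ins for d3.time.hour, d3.time.day and d3.time.month.
// All of them work in local time and return a new Date.
function floorToHour(d) {
  return new Date(d.getFullYear(), d.getMonth(), d.getDate(), d.getHours());
}
function floorToDay(d) {
  return new Date(d.getFullYear(), d.getMonth(), d.getDate());
}
function floorToMonth(d) {
  return new Date(d.getFullYear(), d.getMonth(), 1);
}

// 15 July 2014, 13:45:30 local time
var t = new Date(2014, 6, 15, 13, 45, 30);
var slots = {
  hour: floorToHour(t),   // 15 July 2014, 13:00:00
  day: floorToDay(t),     // 15 July 2014, 00:00:00
  month: floorToMonth(t)  // 1 July 2014, 00:00:00
};
```

Because every measurement in the same hour (or day, or month) ends up with the same slot value, the slot can be used directly as a grouping key later on.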
We are using a composite chart from dc.js, so we have to generate each line as an individual line chart. For each line, we create the crossfilter group (which does the aggregation and filtering) and then the actual chart, setting the correct data and tooltip title.
function generateCharts(data, aggregation) {
    var charts = [];
    for (var label in data) {
        if (data.hasOwnProperty(label)) {
            var ndx = crossfilter(data[label]);
            // Group by the precomputed slot ("hour", "day" or "month") ...
            var dim = ndx.dimension(function (d) { return d[aggregation]; });
            // ... and sum the values that fall into each slot
            var lengths = dim.group().reduceSum(function(d) { return d.data; });
            charts.push(dc.lineChart(chart)
                .group(lengths, label)
                .dimension(dim)
                // The wrapper function pins the current label for the tooltip
                .title((function(name) { return function (d) {
                    return name + '\n' + dayFormat(d.key) + '\n' + numberFormat(d.value);
                }; })(label))
            );
        }
    }
    return charts;
}
var currentGroup = "month";
var charts = generateCharts(data, currentGroup);
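A note on the slightly odd-looking function-returning-a-function passed to .title() above: with var, the loop variable is scoped to the whole function, so a callback created in the loop would see whatever label the loop finished on. The standalone sketch below (names are made up for the illustration) shows the difference.

```javascript
var labels = ['label1', 'label2', 'label3'];

// Without a wrapper: every callback shares the single function-scoped variable.
var shared = [];
for (var i = 0; i < labels.length; i++) {
  var current = labels[i];
  shared.push(function () { return current; });
}

// With an immediately invoked wrapper: each callback gets its own copy.
var captured = [];
for (var j = 0; j < labels.length; j++) {
  captured.push((function (name) {
    return function () { return name; };
  })(labels[j]));
}

var sharedNames = shared.map(function (f) { return f(); });
var capturedNames = captured.map(function (f) { return f(); });
// sharedNames is ['label3', 'label3', 'label3']
// capturedNames is ['label1', 'label2', 'label3']
```

In environments that support it, declaring the loop variable with let would be another way to get a fresh binding per iteration.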
Now we have to set all the options for our chart, nothing special.
chart
    .width(1000)
    .height(400)
    .zoomScale([1,800])
    .zoomOutRestrict(false)
    .transitionDuration(0)
    .margins({top: 30, right: 50, bottom: 25, left: 50})
    .mouseZoomable(true)
    .x(d3.time.scale().domain([new Date(2014, 1, 1), new Date(2015, 11, 31)]))
    .round(d3.time.month.round)
    .xUnits(d3.time.months)
    .elasticY(true)
    .renderHorizontalGridLines(true)
    .keyAccessor(function(d) {
        return d.key;
    })
    .legend(dc.legend().x(900).y(10).itemHeight(13).gap(5))
    .valueAccessor(function (d) {
        return d.value;
    })
    .compose(charts)
    .shareColors(true)
    .shareTitle(false)
    .brushOn(false)
This is where the magic happens. When we get a zoom event, we check the range currently shown on screen and, based on how wide it is, regenerate the charts if we have moved to a different aggregation level.
    .on('zoomed', function(chart, filter) {
        var range = chart.x().domain();
        var diff = range[1] - range[0]; // visible span in milliseconds
        var aggregate;
        if (diff < 1000*3600*24*15) {          // under 15 days
            aggregate = "hour";
        } else if (diff < 1000*3600*24*30*6) { // under roughly 6 months
            aggregate = "day";
        } else {
            aggregate = "month";
        }
        if (aggregate !== currentGroup) {
            charts = generateCharts(data, aggregate);
            chart.compose(charts).render();
            currentGroup = aggregate;
        }
    });
dc.renderAll();
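The level selection inside the zoom handler can also be pulled out into a small standalone function, which makes the thresholds easier to tweak and to test without a browser. This is just a sketch: pickAggregation and DAY_MS are names invented here, and the thresholds are the same ones used in the handler above.

```javascript
var DAY_MS = 1000 * 3600 * 24;

// Map the width of the visible x domain to an aggregation level.
// `domain` is a two-element array [start, end] of Dates (or millisecond values).
function pickAggregation(domain) {
  var span = domain[1] - domain[0]; // milliseconds
  if (span < 15 * DAY_MS) return 'hour';
  if (span < 6 * 30 * DAY_MS) return 'day';
  return 'month';
}

var oneWeek = pickAggregation([new Date(2014, 1, 1), new Date(2014, 1, 8)]);
var twoMonths = pickAggregation([new Date(2014, 1, 1), new Date(2014, 3, 1)]);
var fullRange = pickAggregation([new Date(2014, 1, 1), new Date(2015, 11, 31)]);
// oneWeek is 'hour', twoMonths is 'day', fullRange is 'month'
```

Inside the handler, the whole if/else chain would then shrink to a single call with chart.x().domain() as the argument.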
The results can be seen here, with the source being in this GitHub repo. And now that we have pretty graphs, it's time to interpret them :))
And after the post was written, plot.ly open-sourced their own library X(. I'll have to investigate that one too.