Recently I wanted to take a look at some personal data that I had been collecting for several years (Quantified Self and Lifelogging ftw :D). Until now it was sitting there write-only, with me occasionally peeking at it manually, but because it was in a pretty much raw format (actually, multiple formats, from different sources), I didn't interact with it too much. Recently, though, I was a bit bored and wanted to code something just for fun (as much as parsing XML files can be called fun), so I cleaned up the data, and now I want to get some "useful" information out of it.

The data consists of multiple timeseries of integer values, one per label. The timestamps have second granularity, but the measurements are not taken every second; sometimes there are several days or months between consecutive measurements.

I want to be able to plot multiple timeseries on the same graph, so I can compare them to each other. Because the data spans years, I want to be able to zoom and pan. The chart should also look reasonably decent. The library I use should be free and open-source. And the kicker requirement: when zooming out, the data should get aggregated. This means that when looking at the data on the yearly scale, I don't want to see individual dots for each second where I have a measurement, but I want to see all the measurements for a month added up, and have only data points for each month show up. I want to do this on several levels, so that I have monthly, daily and hourly aggregations.

And after two weeks of playing with various kinds of charting libraries in my free time, I reached the conclusion that the state of the art in free and open-source JavaScript libraries is quite sad. Almost all of the libraries are awesome at only one thing, and customizing them is quite hard. A quick review of the main ones I looked at:

  • d3.js - super awesome, but too low level. I had played with it before, but I didn't want to reinvent the wheel completely this time.
  • Bokeh - this would have been awesome. It is in Python, so it would have been super easy to integrate with the rest of my code. It generates HTML and JavaScript, so you can view it in the browser. But because it uses generated code, it's quite hard to customize anything you don't have explicit hooks for, and for zooming and aggregation you don't have hooks. I may use it in the future for other projects, because otherwise it's super awesome.
  • Vega - a visualization grammar, a declarative format for creating, saving and sharing visualization designs. Try defining custom zoom behaviour with aggregation as JSON. Nope.
  • MetricsGraphics.js - from Mozilla. Quite nice, but it has never heard of zooming and it's quite opinionated, so customizing it is unlikely to work.
  • Cubism - this one is specifically for time series. It has really cool horizon charts (they are really nice at showing data from a large domain in a small amount of vertical space). But it's geared towards real-time usage (so new data is coming in all the time) and it has no zooming either.
  • dygraphs - You can only zoom in and reset the zoom, no zoom out AND you can't pan while zoomed in.
  • Rickshaw - Based on d3.js. Zooming only with a separate preview, and you have to write CSS to position stuff around.
  • nvd3.js - Based on d3.js. Looks nice out of the box, but you have to hack your own zoom and panning mechanism.
  • c3.js - Based on d3.js. From a cursory look, it does everything I want it to do.
  • dc.js - Based on d3.js (notice a pattern here?) and crossfilter. It also does what I need it to do.

There are some commercial libraries too (Highcharts or amCharts), which do everything and the kitchen sink, but meh, vive la open source.

I ended up going with dc.js for two reasons. First, the API is much nicer, being in the style of d3.js, while c3.js has a more declarative syntax, including some string-to-function magic. Second, it is much easier to combine multiple kinds of charts in it and do filtering across them (thanks to the crossfilter integration). Also, out of the box, the charts in dc.js are nicer than the ones in c3.js, but I'm sure all this can be changed without (too) much hassle.

So, let's get to the fun part: coding the line chart.

I'll assume that we have an HTML file that contains a div with the id "chart" and the necessary CSS and JavaScript imports.

Some code to generate some random data, in the form of lists of {"timestamp": Date, "data": number}:

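// Return a random Date between start and end.
function randomDate(start, end) {
    return new Date(start.getTime() + Math.random() * (end.getTime() - start.getTime()));
}
// Return a random integer between min and max, inclusive.
function getRandomInt(min, max) {
    return Math.floor(Math.random() * (max - min + 1)) + min;
}

// Generate 1000 random measurements, sorted by timestamp.
function generateData() {
    var data = []
    for (var i = 0; i < 1000; i++) {
        // Months are zero-based: 1 Feb 2014 to 31 Dec 2015, matching the chart's x domain.
        data.push({"timestamp": randomDate(new Date(2014, 1, 1), new Date(2015, 11, 31)),
                   "data": getRandomInt(10, 1000)})
    }
    return data.sort(function(a, b) { return a.timestamp - b.timestamp })
}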

Now let's declare some d3.js formatters, generate the data, initialize the chart, and do some preprocessing on the data:

var dateFormat = d3.time.format.iso
var dayFormat = d3.time.format('%x')
var numberFormat = d3.format('d');
var chart = dc.compositeChart('#chart');
var data = {"label1": generateData(), "label2": generateData(),
            "label3": generateData(), "label4": generateData(),
            "label5": generateData()}
// Parse the timestamp and precompute the slots where each datapoint will fit
// when aggregating.
for (var label in data) {
    if (data.hasOwnProperty(label)) {
        data[label].forEach(function(d) {
            // In this demo d.timestamp is already a Date; with real data it
            // would typically be an ISO 8601 string.
            d.dd = dateFormat.parse(d.timestamp);
            d.hour = d3.time.hour(d.dd)
            d.day = d3.time.day(d.dd)
            d.month = d3.time.month(d.dd)
        })
    }
}
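
To make the "slots" concrete, here is a minimal sketch in plain JavaScript of what flooring a timestamp to its hour, day and month looks like. The floorTo* helpers are hypothetical names, not part of d3.js; they only approximate what d3.time.hour, d3.time.day and d3.time.month return, and they work in local time.

```javascript
// Hypothetical stand-ins for d3.time.hour, d3.time.day and d3.time.month.
// Each returns a new Date at the start of the enclosing slot (local time).
function floorToHour(date) {
    return new Date(date.getFullYear(), date.getMonth(), date.getDate(), date.getHours());
}
function floorToDay(date) {
    return new Date(date.getFullYear(), date.getMonth(), date.getDate());
}
function floorToMonth(date) {
    return new Date(date.getFullYear(), date.getMonth(), 1);
}

var sample = new Date(2014, 2, 15, 13, 45, 30); // 15 March 2014, 13:45:30
var slots = {
    hour: floorToHour(sample),   // 15 March 2014, 13:00:00
    day: floorToDay(sample),     // 15 March 2014, 00:00:00
    month: floorToMonth(sample)  // 1 March 2014, 00:00:00
};
```

Every measurement that falls in the same slot gets the same Date value, which is what lets them be grouped and summed later.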

We are using a composite chart from dc.js, so we have to generate each line as an individual line chart. For each line, we create the crossfilter dimension and group (which do the filtering and aggregation), then create the actual chart, setting the correct data and tooltip title.

// Build one line chart per label, aggregated by "hour", "day" or "month".
function generateCharts(data, aggregation) {
    var charts = []
    for (var label in data) {
        if (data.hasOwnProperty(label)) {
            var ndx = crossfilter(data[label]);
            var dim = ndx.dimension(function (d) { return d[aggregation] });

            // Sum the measurements that fall into the same slot
            var lengths = dim.group().reduceSum(function(d) { return d.data })
            charts.push(dc.lineChart(chart)
                            .group(lengths, label)
                            .dimension(dim)
                            // The closure captures the label for the tooltip
                            .title((function(name) { return function (d) {
                                return name + '\n' + dayFormat(d.key) + '\n' + numberFormat(d.value);
                            }})(label))
            )
        }
    }
    return charts
}
var currentGroup = "month"
var charts = generateCharts(data, currentGroup)
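
For intuition, here is a rough plain-JavaScript sketch of what the grouped sum amounts to for a single label: a sorted list of {key, value} pairs, one per slot. The sumBySlot helper and the sample records are made up for illustration; crossfilter does this incrementally and much more efficiently.

```javascript
// Hypothetical helper: sums `data` per precomputed slot (a Date stored
// under the property named by `aggregation`), sorted by slot.
function sumBySlot(records, aggregation) {
    var totals = {};
    records.forEach(function (d) {
        var key = d[aggregation].getTime();
        totals[key] = (totals[key] || 0) + d.data;
    });
    return Object.keys(totals)
        .map(Number)
        .sort(function (a, b) { return a - b; })
        .map(function (key) {
            return { key: new Date(key), value: totals[key] };
        });
}

// Made-up records with the month slot already precomputed
var records = [
    { month: new Date(2014, 0, 1), data: 10 },
    { month: new Date(2014, 1, 1), data: 7 },
    { month: new Date(2014, 0, 1), data: 5 }
];
var monthly = sumBySlot(records, 'month');
// monthly has two entries: January 2014 with value 15, February 2014 with value 7
```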

Now we have to set all the options for our chart, nothing special.

chart 
    .width(1000)
    .height(400)
    .zoomScale([1,800])
    .zoomOutRestrict(false)
    .transitionDuration(0)
    .margins({top: 30, right: 50, bottom: 25, left: 50})
    .mouseZoomable(true)
    .x(d3.time.scale().domain([new Date(2014, 1, 1), new Date(2015, 11, 31)]))
    .round(d3.time.month.round)
    .xUnits(d3.time.months)
    .elasticY(true)
    .renderHorizontalGridLines(true)
    .keyAccessor(function(d) {
        return d.key;
    })
    .legend(dc.legend().x(900).y(10).itemHeight(13).gap(5))
    .valueAccessor(function (d) {
        return d.value;
    })
    .compose(charts)
    .shareColors(true)
    .shareTitle(false)
    .brushOn(false)

This is where the magic happens. When we get a zoom event, we check how wide the range shown on screen is, and if that width puts us at a different aggregation level, we regenerate the charts. The handler is one more call chained onto the chart configuration:

    .on('zoomed', function(chart, filter) {
        // Width of the visible x domain, in milliseconds
        var range = chart.x().domain()
        var diff = range[1] - range[0]
        var aggregate
        if (diff < 1000*3600*24*15) {            // under ~15 days
            aggregate = "hour"
        } else if (diff < 1000*3600*24*30*6) {   // under ~6 months
            aggregate = "day"
        } else {
            aggregate = "month"
        }
        // Rebuild the line charts only when the level actually changes
        if (aggregate !== currentGroup) {
            charts = generateCharts(data, aggregate)
            chart.compose(charts).render()
            currentGroup = aggregate
        }
    })

dc.renderAll();
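
The level selection in the zoom handler can also be pulled out into a small pure function, which makes the thresholds easier to read and to try out without a chart. This is only a sketch: pickAggregation and MS_PER_DAY are names I made up here, and the thresholds are the same ones used in the handler.

```javascript
var MS_PER_DAY = 1000 * 3600 * 24;

// Hypothetical helper: maps the visible x domain ([start, end] as Dates)
// to an aggregation level.
function pickAggregation(domain) {
    var span = domain[1] - domain[0]; // subtracting Dates yields milliseconds
    if (span < 15 * MS_PER_DAY) {
        return 'hour';
    } else if (span < 6 * 30 * MS_PER_DAY) {
        return 'day';
    }
    return 'month';
}

var week = pickAggregation([new Date(2014, 0, 1), new Date(2014, 0, 8)]);    // 'hour'
var quarter = pickAggregation([new Date(2014, 0, 1), new Date(2014, 3, 1)]); // 'day'
var year = pickAggregation([new Date(2014, 0, 1), new Date(2015, 0, 1)]);    // 'month'
```

Inside the handler this would be called as pickAggregation(chart.x().domain()).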

The results can be seen here, with the source being in this GitHub repo. And now that we have pretty graphs, it's time to interpret them :))

And after the post was written, plot.ly open-sourced their own library X(. I'll have to investigate that one too.