Vladimir Prus


vladimirprus.com

Monday, April 21, 2014

IRC visualization using D3

To learn D3, I created a basic visualization of IRC messages, with pictures like the one below. The sources and a demo are on GitHub.
I started with an internal IRC channel, and thought there might be 2 or 3 groups of people mostly talking to each other. The resulting graph was a perfectly round cloud, with some surprises in the center, but that's it. The histograms showed that some people are very chatty, and the pairwise histogram of messages was even more telling - 1300 pairs that only exchanged a few messages, and 60 very active pairs. Neither showing just the top 20 users nor excluding the top 20 users revealed any grouping. Then I made each node in the graph be pulled toward only one other node - its top recipient of messages - and got the above picture. There are some groups visible, and the grouping is indeed around functional areas.
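
The "one link per node" step can be sketched in plain JavaScript. This is only an illustration, not the code from the GitHub repository: it assumes the log has already been parsed into records with hypothetical `from` and `to` fields, and it reads "top recipient" as the person a user addresses most often. The function names `pairCounts` and `topRecipientLinks` are made up for this example.

```javascript
// Hypothetical input: one record per message, already parsed from the log.
const messages = [
  { from: "alice", to: "bob" },
  { from: "alice", to: "bob" },
  { from: "alice", to: "carol" },
  { from: "bob", to: "alice" },
  { from: "carol", to: "bob" },
];

// Count messages per unordered pair of users.
// These counts are the raw data behind a pairwise histogram.
function pairCounts(messages) {
  const counts = new Map();
  for (const { from, to } of messages) {
    const key = [from, to].sort().join("|");
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// For every sender, keep a single link to the recipient they address most.
// Ties go to whichever recipient was seen first (an arbitrary choice).
function topRecipientLinks(messages) {
  const perSender = new Map(); // sender -> Map(recipient -> count)
  for (const { from, to } of messages) {
    if (!perSender.has(from)) perSender.set(from, new Map());
    const recipients = perSender.get(from);
    recipients.set(to, (recipients.get(to) || 0) + 1);
  }

  const links = [];
  for (const [source, recipients] of perSender) {
    let target = null;
    let value = 0;
    for (const [recipient, count] of recipients) {
      if (count > value) {
        target = recipient;
        value = count;
      }
    }
    links.push({ source, target, value });
  }
  return links;
}

console.log(pairCounts(messages));
console.log(topRecipientLinks(messages));
```

The resulting `{ source, target, value }` objects follow the shape commonly used for link lists in graph layouts, so each node ends up attached to exactly one other node.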

The data in the demo is artificial, but I've tried to recreate the same distribution, and the results are fairly close. Trying it on logs from some other channels, mostly open-source ones, was a failure. The histograms are similar - a few people with a large number of messages - but the graphs had no grouping at all.

D3 itself is a fairly nice framework, as many of its examples already show. It has a nice data binding mechanism, animation, and a lot of utilities. Still, it's a construction kit. There are no charts one can use out of the box, and even things like margins are copy-pasted between all examples. I also had to fine-tune everything to the data, from chart dimensions to margins to parameters of the graph layout algorithm. If one tries another data set, the axes of the histograms break down, and the nodes of the graph disappear to the sides of the screen.
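
As a rough idea of what could replace the copy-pasted margin code and hard-coded axis ranges, here is a small sketch in plain JavaScript. Both helpers, `innerArea` and `dataDomain`, are hypothetical names invented for this example and are not part of D3; the values they return would still have to be passed to D3's scales and to the chart's group element by hand.

```javascript
// Hypothetical helper: turn an outer size plus margins into the inner
// drawing area, instead of repeating the arithmetic in every chart.
function innerArea(outerWidth, outerHeight, margin) {
  return {
    width: outerWidth - margin.left - margin.right,
    height: outerHeight - margin.top - margin.bottom,
    // Transform string for the group element that holds the chart itself.
    translate: `translate(${margin.left},${margin.top})`,
  };
}

// Hypothetical helper: derive an axis domain from the data rather than
// hard-coding it, so a different data set does not break the axis.
// The domain always includes zero, which suits histogram counts.
function dataDomain(values) {
  if (values.length === 0) return [0, 1]; // arbitrary fallback for empty data
  const lo = Math.min(0, ...values);
  const hi = Math.max(...values);
  // Avoid a zero-width domain when every value is the same as lo.
  return hi > lo ? [lo, hi] : [lo, lo + 1];
}

const area = innerArea(960, 500, { top: 20, right: 30, bottom: 40, left: 50 });
console.log(area);
console.log(dataDomain([3, 10, 7]));
```

This does nothing about tuning the graph layout parameters, which depend on the data in less obvious ways.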

There are libraries that try to create reusable charts on top of D3. C3 appears to have nice standard charts, but completely hides all of D3's power. NVD3 is quite promising, but has no documentation and is apparently in the middle of an extensive rewrite. DC is meant to be a frontend to crossfilter, but is not so useful with arbitrary data - and in another context, I just could not get it to create the charts I wanted.

I will likely use D3 in the future, though I will need a personal set of chart helpers first.
