First Cut of Cleavage

Posted by Venerable High Pope Swanage I, Cogent Animal of Our Lady of Discord 04 March 2011 at 11:24AM

I've been spending the last few days of my 20% time at Relevance playing with Michael Feathers' ideas about Code Turbulence, specifically with a mind to examining it through time.

There are two basic approaches one can take to doing this, in my mind: taking a sequence of scatter plots and animating them (projecting the time dimension through time), or taking the sequence of scatter plots and layering them (projecting the time dimension through space). I chose to go with the latter.

My implementation was done in Clojure, mostly because I was interested in the idea, because it involved crunching through large batches of data (mining every commit of a git repository!), and because I'd yet to take a project from inception to implementation in Clojure.

As of right now it works (for a very generous definition of work), insofar as you can point it at an arbitrary git repo full of Java source code and get out a neat three-dimensional graph. It looks like this:

[Image: Cleavage early render graph]

Right now it's stacks of pie plates. Each stack represents one file (more accurately, each pie plate represents one file at a point in time). This was simple to write, but it drastically bloats the dataset the renderer has to deal with, since relatively static files still have to be processed on each revision.
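To make the bloat concrete, here is a minimal sketch, written in Python purely for illustration and not taken from the actual Cleavage code. It assumes a hypothetical history format: one dictionary per revision, oldest first, mapping a file path to a complexity score. The names `points_per_revision` and `count_redundant` are made up for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical input shape: one snapshot per revision, oldest first,
# mapping file path -> complexity score.
Snapshot = Dict[str, int]
# One "pie plate": (file path, revision index, complexity score).
Point = Tuple[str, int, int]


def points_per_revision(history: List[Snapshot]) -> List[Point]:
    """Emit one point for every file present in every revision."""
    return [
        (path, rev, score)
        for rev, snapshot in enumerate(history)
        for path, score in sorted(snapshot.items())
    ]


def count_redundant(points: List[Point]) -> int:
    """Count points whose score equals the score last seen for the same file."""
    last_score: Dict[str, int] = {}
    redundant = 0
    # Walk the points in revision order so "last seen" means "earlier revision".
    for path, _rev, score in sorted(points, key=lambda p: (p[1], p[0])):
        if last_score.get(path) == score:
            redundant += 1
        last_score[path] = score
    return redundant


history = [
    {"A.java": 3, "B.java": 1},
    {"A.java": 3, "B.java": 2},
    {"A.java": 3, "B.java": 2},
]
points = points_per_revision(history)
print(len(points), count_redundant(points))
```

In this toy history half of the points only restate that a file did not change, which is the kind of redundancy a mostly static file adds on every revision.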

Some places I'd like to go with this:

  • Refactor the use of the repository dir everywhere to instead use the Clojure protocol/type system. This will enable much easier support of other kinds of repositories (like svn!).
  • Make the unit of rendering a file instead of a revision, and render a path instead of many discrete points. There's a huge number of points in the dataset just to represent that one file is static over the history of the git repo.
  • Provide a good way to identify each of the files that are tracked. This might be color coding with a key, or playing with rendering the file name to a buffer and then texturing the tendril with its filename.
  • Support more than Java!
  • Improve the code complexity analyzer. Right now it's a VERY coarse, inaccurate regex to find branching logic in Java files. Having multiple strategies for defining what complexity is, and implementing them for multiple languages, would really increase the value of this tool in my opinion.
  • Make it nicer to look at. This might tie into identifying files and color coding.
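For a sense of what a coarse regex-based analyzer of this kind looks like, here is a minimal sketch in Python. It is not the regex Cleavage uses; the pattern, the choice of tokens, and the name `branch_count` are all assumptions made for illustration.

```python
import re

# Assumed set of branching tokens; the real analyzer's regex may differ.
BRANCH_PATTERN = re.compile(r"\b(?:if|for|while|case|catch)\b|&&|\|\|")


def branch_count(java_source: str) -> int:
    """Return the number of branching tokens found in the raw source text."""
    return len(BRANCH_PATTERN.findall(java_source))


snippet = "if (a && b) { for (;;) {} } else if (c) { while (d) {} }"
print(branch_count(snippet))
```

Because it scans raw text, a sketch like this also counts keywords that appear inside comments and string literals, which is one reason such a measure is inaccurate and why pluggable strategies per language would help.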

I'd like to thank Srinivas Akella for giving me solid fundamentals in OpenGL, Wes Huang and Dan Manthey for teaching me how to think in LISP, Michael Feathers for introducing me to the idea of code turbulence, and all my Relevance co-conspirators for helping me get Clojure chops sufficient to get this far.