Naming considered MOSTLY harmless

Posted by Venerable High Pope Swanage I, Cogent Animal of Our Lady of Discord 20 June 2014 at 08:10PM

This is a continuation of a thought I briefly mulled over on Twitter, which drew some reaction. There's no point in summarizing it, since it's only a few characters long: "In Clojure, I find it best to try and avoid naming anything except for the tools (fns and constants) used to construct a running system."

The reactions were great and more or less what I expected: mostly attempts to get at why I would say such a thing. "What are examples of things I would avoid naming?" "By naming, do I mean assigning to a Var, or something else?" and "My examples are primarily stateful things; is this to cure live programming woes?"

I'll do my best to expound on my reasoning in more detail, define more crisply what I mean, and perhaps persuade you to strive for something in the vicinity of this same goal.

Let's start by stating this guideline more rigorously: "In Clojure, I find it best to minimize the number of things that are resolvable, through a global namespace, to a single unique value. I think the minimal set of such things probably includes functions, protocols, types, records, and compile-time constant values. What all of the things I am comfortable naming have in common is that they are definitions abstracted from any individual, concrete application of some Clojure program."
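To make that list concrete, here is a small hypothetical sketch (max-players, Store, MapStore and fetch-all are invented names, not from any real library) of the kinds of definitions the guideline is comfortable naming. None of them refers to a particular running instance of the program; a concrete store only exists once some caller builds one and passes it in.

```clojure
;; Things that are fine to name: definitions that hold for every run of
;; the program, rather than for one particular run.

(def ^:const max-players 4)   ; compile-time constant

(defprotocol Store
  (fetch [store k] "Looks up k in this store."))

(defrecord MapStore [m]
  Store
  (fetch [_ k] (get m k)))

(defn fetch-all
  "Fetches every key in ks from whatever store the caller supplies."
  [store ks]
  (mapv #(fetch store %) ks))

;; The concrete store is a value built at run time and passed in;
;; it is never bound to a global name.
(fetch-all (->MapStore {:a 1 :b 2}) [:a :b])   ;=> [1 2]
```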

By that definition, a thing assigned to a Var qualifies, since it occupies a point in Clojure's namespace resolution. Something bound to a JNDI name is resolvable in a global namespace, so it also qualifies. Binding a value to a symbol, as with let, does not fall under this rule, because the scope of that name binding is the lexical scope of the let form. Associating a value into a map would not qualify, nor would updating an atom holding a map with a new value. However, this principle does encourage you to avoid putting that atom in a Var that is globally addressable.

There are definite gray areas, and times to bend this guideline as well. A dynamic Var is a globally resolvable thing, but a value bound to it with the binding macro is local to the current thread, which gives a safe and sane way to override it. So I think it's safe too, but my tendency is to avoid using it except when there are concrete reasons to do so.
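The dynamic Var case can be seen in a short sketch; *greeting*, greet and greet-on-new-thread are hypothetical names. The Var itself is global, but a value installed with binding is seen only by the current thread, and only until the binding form exits. A plain java.lang.Thread started inside the binding still reads the root value.

```clojure
;; A dynamic Var: the name is global, and its root value is shared by all threads.
(def ^:dynamic *greeting* "hello")

(defn greet [who]
  (str *greeting* ", " who))

(defn greet-on-new-thread
  "Runs greet on a fresh java.lang.Thread and returns its result.
  A plain Thread does not inherit the caller's dynamic bindings."
  [who]
  (let [result (promise)
        t (Thread. ^Runnable (fn [] (deliver result (greet who))))]
    (.start t)
    (.join t)
    @result))

;; Outside any binding, the root value is used.
(greet "world")   ;=> "hello, world"

;; Inside binding, only this thread sees the override.
(binding [*greeting* "good evening"]
  [(greet "world") (greet-on-new-thread "world")])
;=> ["good evening, world" "hello, world"]
```

Note that future and send do convey the caller's bindings to the threads they use, and bound-fn captures them explicitly; the raw Thread is the case that doesn't.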

Some of the things I avoid naming are non-Var Clojure identities like Agents, Atoms and Refs. I also try to avoid naming core.async Channels, though sometimes avoiding it is inconvenient enough that I name them anyway. I'd rather work with a database connection in a dynamic Var, or as a parameter, than with one stored in a Var that gets set via, e.g., alter-var-root.

I've left the question of "Why?" for last, because it takes the longest to answer. Thanks for bearing with me. My motivation is that the things I've described avoiding naming are resources your program makes use of. A specific atom holding the current state of the game world is a resource you use as the game engine runs forward in time, providing continuity between discrete game states. The things you name with def and defn, by contrast, often define the behavior of your Clojure program. If you can structure your program so that the resources it uses are not bound to names as part of loading a file, you have decoupled the process of altering your program's behavior from the process of initializing its dynamic state.
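Here is a minimal sketch of that split, using hypothetical names (initial-world, advance, new-game, tick!). The game's behavior is named with defn, while the atom holding a particular game's state is created only when a game starts and is handed to whoever needs it.

```clojure
;; Behavior: named, and independent of any particular running game.
(defn initial-world []
  {:tick 0 :score 0})

(defn advance
  "Pure step function: computes the next world state from the current one."
  [world points]
  (-> world
      (update :tick inc)
      (update :score + points)))

;; Resource: each call creates a fresh atom; nothing is def'd at load time.
(defn new-game []
  (atom (initial-world)))

(defn tick!
  "Advances the given game's world by one step."
  [game points]
  (swap! game advance points))

;; Two independent games can run side by side.
(let [g1 (new-game)
      g2 (new-game)]
  (tick! g1 10)
  (tick! g1 5)
  (tick! g2 1)
  [@g1 @g2])
;=> [{:tick 2, :score 15} {:tick 1, :score 1}]
```

Because no game's atom is created when the file loads, re-evaluating advance at the REPL changes how future ticks behave without resetting any game already in progress.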

So one thing you may get out of this is easier live programming, certainly!

Another benefit of this guideline, though, is that coupling processes to resources becomes a conscious act. Making that act conscious makes parallelizing your program easier and weakens the coupling between any particular process and any particular resource. As a deliberately simple example for contemplation, consider:

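(ns foo)

;; Stand-ins for whatever your database library provides.
(declare initialize-db-connection execute)

;; The connection is a global resource, opened lazily on first deref.
(def conn (delay (initialize-db-connection)))

(defn get-foos
  []
  (execute @conn "select * from foo"))

(defn -main
  [& args]
  (println (get-foos)))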

Compare that with:

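(ns foo)

;; Stand-ins for whatever your database library provides.
(declare initialize-db-connection execute)

;; The caller decides which connection to use.
(defn get-foos
  [conn]
  (execute conn "select * from foo"))

(defn -main
  [& args]
  (println (get-foos (initialize-db-connection))))
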
In the former, get-foos only operates on the namespace-level conn, and its correct functioning depends on that conn being successfully initialized before the query executes. As an aside, how often have you seen conn defined without being wrapped in a delay and dereferenced, so that the connection is opened as soon as the namespace is loaded (including during AOT compilation)? Have you done it yourself?

The latter is functionally equivalent to the former for -main's use case, but get-foos can now accept an arbitrary initialized connection. If you wanted to get the foos out of five different databases, you could do so pretty easily by mapping get-foos over connections to those databases. The behavior of get-foos is less coupled to its compilation environment and more coupled to its runtime environment.
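For instance, once get-foos takes its connection as an argument, querying several databases is just a map. This sketch assumes hypothetical stand-ins for a real database library: open-connection builds a plain map instead of a real connection, and execute ignores the SQL and returns canned rows stored in that map.

```clojure
;; Hypothetical stand-ins for a real database library: a "connection" is a
;; plain map, and execute ignores the SQL and returns the rows stored in it.
(defn open-connection [db-name rows]
  {:db db-name :rows rows})

(defn execute [conn _sql]
  (:rows conn))

;; get-foos as in the second example: the connection is a parameter.
(defn get-foos
  [conn]
  (execute conn "select * from foo"))

;; The connections exist only in this lexical scope; none is def'd.
(let [conns [(open-connection "db-a" [{:id 1}])
             (open-connection "db-b" [{:id 2} {:id 3}])
             (open-connection "db-c" [])]]
  (map get-foos conns))
;=> ([{:id 1}] [{:id 2} {:id 3}] [])
```

Because nothing ties get-foos to one particular connection, replacing map with pmap would run the queries in parallel.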

This has broad, subtle consequences when the systems you build are more complex than the simple example cases I've described above.

This is a principle I've found valuable from personal experience, most often reinforced by painfully deviating from it. It is not an easy thing to do, but after you spend some time thinking about how to achieve it, it gets easier. Maybe you'll find it useful too if you give it a try.

I'd like to thank @Baranosky, @ambrosebs, @alandipert, and @BrandonBloom for getting me wound up on this subject.