Environment variables considered harmful

Andy Buckley

High energy physics software (and, for all I know, academic software in other disciplines) is plagued by an obsession with using the system environment to define aspects of its configuration. This obsession has been applied to package dependencies, package behaviour and a host of other features, and it is exceedingly misguided: its only basis is that code to read a setting from the shell environment is simpler to write than code to read a configuration file.

The shell environment is not intended to be populated with hundreds of settings for a single project: a well-written program should use only a few environment variables, and even those should be optional, designed to force a temporary change in behaviour. Any software that relies on users adding settings to their login scripts and .rc files is going about things the wrong way. This sort of environment dependency leaves users' systems in varying degrees of misconfiguration, with the result that much time is wasted and scientific results may be trusted less.
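As a minimal sketch of what "optional, temporary override" might look like in practice, the following Python reads a single environment variable and falls back to a built-in default when it is unset. The variable name MYTOOL_LOG_LEVEL and the default value are hypothetical, chosen only for illustration.

```python
import os

# Hypothetical built-in default: the program works without any environment setup.
DEFAULT_LOG_LEVEL = "INFO"


def get_log_level(environ=None):
    """Return the log level, letting one optional variable override the default.

    MYTOOL_LOG_LEVEL is an illustrative name, not a real convention.
    """
    env = os.environ if environ is None else environ
    return env.get("MYTOOL_LOG_LEVEL", DEFAULT_LOG_LEVEL).upper()
```

Nothing needs to go in a login script here: a user who wants a one-off change can set the variable for a single invocation and leave the rest of their environment alone.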

Another aspect of this problem is that projects whose users work in many different command shells must maintain a separate, highly portable sourced setup script for each shell. This naturally leads to a broken system after several maintainer cycles, since it is hard to know several shell languages intimately enough to write very portable scripts in each. Several maintainers are then required to keep the system configuration working, and the end result is that scientific results may depend on whether the user was working in tcsh or in bash/ksh! Worse still, the environment is very susceptible to users' personal settings. I have seen several cases where .bashrc files used to choose favourite editors, set shell aliases and so on have been responsible for HEP experimental software failing to work. In none of these cases were we ever able to work out why the clash occurred, and it became "standard" practice to purge the shell environment before setting up a minimal environment to be populated by the experiment's environment-mangling setup scripts.
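The "purge the environment" workaround described above can be sketched in Python as follows. This is only an illustration of the idea, not any experiment's actual setup procedure: the allow-list of variables is an assumption, and the helper names are hypothetical.

```python
import os
import subprocess

# Hypothetical allow-list: the only variables passed on to the child process.
ALLOWED_VARIABLES = ("PATH", "HOME", "LANG")


def minimal_environment(parent, allowed=ALLOWED_VARIABLES):
    """Return a copy of `parent` containing only the allow-listed variables."""
    return {name: parent[name] for name in allowed if name in parent}


def run_clean(command, parent=None):
    """Run `command` with the purged environment instead of the caller's."""
    parent = os.environ if parent is None else parent
    return subprocess.run(command, env=minimal_environment(parent), check=True)
```

Anything a user's .bashrc exported (an EDITOR choice, say) never reaches the child process, which is roughly the effect of the shell-level purge. That such a step is needed at all is, of course, a symptom of the problem.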

A much better system would be for projects to define their setup in configuration files built on a robust and standard grammar (e.g. XML) rather than in the environment. Any project which relies on the shell "source" command to build several hundred environment settings is asking for trouble: the environment is not designed to be used in this way.
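To give a rough idea of how little code the configuration-file approach needs, here is a sketch using Python's standard xml.etree.ElementTree module. The XML layout (a config element holding named setting elements) and the setting names are invented for this example; a real project would define its own schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration document; element and attribute names are assumptions.
EXAMPLE_CONFIG = """\
<config>
  <setting name="data_dir">/opt/experiment/data</setting>
  <setting name="max_events">1000</setting>
</config>
"""


def load_settings(xml_text):
    """Parse the XML text and return a dict mapping setting names to string values."""
    root = ET.fromstring(xml_text)
    settings = {}
    for element in root.findall("setting"):
        settings[element.get("name")] = (element.text or "").strip()
    return settings


settings = load_settings(EXAMPLE_CONFIG)
```

The same file is read identically whatever shell the user happens to run, and a malformed file fails at parse time rather than silently producing a half-configured session.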

The logical continuation of this argument is an exercise for the reader :-)
