Skip to content

Latest commit

 

History

History
1024 lines (794 loc) · 41.2 KB

README.md

File metadata and controls

1024 lines (794 loc) · 41.2 KB

isconf(8)

ISconf 4.2.8.250
10/14/2014



NAME

isconf - infrastructure build and configuration manager

SYNOPSIS

isconf [-Dhrq] [-c config] [-m message] verb [verb_args] ...

QUICK START

First, follow the short installation instructions in the INSTALL file that came with this package. It's best to do this on whatever you're using as a golden master image, then deploy that image to all of your machines. If you're only setting up a few machines and have no image server, then you might be able to get away with installing each manually from the vendor CD, if you carefully install them each the same way.

Later, to install the latest version of package 'foo' on ten thousand hosts, including any hosts that are currently down or not yet built, you can log into any host and say this:

      cd /tmp
      wget http://example.com/foo-1.2.tar.gz  
      isconf start
      isconf lock just a comment about installing foo
      isconf snap foo-1.2.tar.gz  
      isconf exec tar -xzvf foo-1.2.tar.gz 
      isconf exec make -C foo-1.2 install
      isconf exec rm -rf foo-1.2.tar.gz foo-1.2
      isconf ci

...then, on the other 9,999 hosts, run this during boot or from cron:

      isconf start
      isconf up

If you're only managing a few machines, you can probably get away with not starting the isconf daemon at boot -- as above, just start it manually when you need it by saying 'isconf start'. If you're turning these machines over to someone else for long-term support, don't want to teach them isconf, and expect them to manually make a mess anyway, then it would make sense to leave the daemon off when you're done anyway. See BUGS/RESTRICTIONS for some security reasons why it also makes sense to leave the daemon off when you're done.

DESCRIPTION

See the GLOSSARY below for terms and concepts.

ISconf can be thought of as a cross between sudo(8) and a distributed version control tool like Git or Bitkeeper. Changes you make via ISconf are journaled and added to a distributed repository, queuing them for execution on other target machines. Those other target machines do not need to be running, or even be built, at the time you check in changes. As you turn on, build, reboot, and/or run 'isconf up' on other machines, ISconf consults the journal and executes the same changes, in the same order, on each machine.

The ISconf architecture is completely peer-to-peer; there are no central servers or other single points of failure, and it is designed for use in partially-partitioned networks such as DMZ environments. The command-line client talks to a daemon which runs on each machine. The daemon, usually started at boot, handles distributed file storage, locking, and network communications.

ISconf is not intended for use in environments where you want to make manual, ad-hoc, or other out-of-band changes to machines. If you don't have the will to rebuild all of your machines from scratch so you know what's on their disks, don't care about disaster recovery, don't need to keep any of your machines in lock-step with each other, don't need to test O/S changes before deploying them to production, aren't as interested in O/S patch management, or still want to log in as root on target machines and make arbitrary untracked changes, then you don't want this package.

BACKGROUND

One hundred years ago, automobiles were built by hand. Each vehicle was unique, composed of parts which were often crafted on the spot. Repairs were expensive and frequent, owners needed to be mechanics, fleetwide engineering changes were non-existent.

Then came mass production. Today, a single automotive assembly line produces vehicles of varying colors and options, all built from the same basic design and tooling. Replacement parts are interchangeable; technicians bolt engineering changes onto existing vehicles with a reasonable expectation that the parts will fit. Economies of scale have led to highly optimized designs, performance, and usability. Drivers turn the key and go.

Most IT departments are nowhere near that sort of capability; they still install and maintain operating systems and applications by hand. Each machine is unique, reliability is elusive, users become technicians, fixes often require re-engineering, most outages are caused by other fixes, and infrastructure-wide changes are fraught with peril if they are possible at all. Even basic security patches are, as a rule, applied sporadically.

ISconf provides some of the standardized tooling needed for deterministic, reproducible management of UNIX machines -- the kind of reproducibility you can count on for consistency, disaster recovery, reliability, security, and auditability. ISconf manages hosts over their entire lifecycle following initial install, allowing you to continue to test and deploy both major and minor changes well after the target hosts have been placed into production service. With this tool you can safely replace kernels and bootloaders, install new patches, packages, and tarballs, run arbitrary commands, and even re-install the entire operating system under program control, and do it all in a way that can be consistently reproduced on other current or future machines.

Over the last decade, users of earlier versions of ISconf have found that this consistency gives systems administrators enough breathing room to "get ahead of the ticket curve", reclaim more of their nights and weekends, and to, finally, begin to do more engineering and less firefighting.

PREREQUISITES

To do deterministic and repeatable host management, there are some things you need to do in addition to just installing and using ISconf. Above all, you need to maintain a reasonable level of control over the root-owned bits which you place on your disks, both during initial install as well as throughout their lifetime.

Automated systems administration is all about making self-modifying code behave consistently. If you don't start from a known state and keep it that way, then you can make no assertions about how your machines will behave in comparison with each other -- a change which works on one host may not work on others. Once you've destroyed this consistency, you can no longer count on QA, disaster recovery, load balancers, distributed applications, HA clusters, new deployments, or even single machine rebuilds to work correctly.

  1. If two or more hosts are supposed to act the same, then you need to install them from the same disk image. This applies to rebuilds of a single host as well as multiple installs of identical hosts. See base image in the glossary.

  2. Your host install tool needs to be able to capture an image of an existing machine, save it on an install server, then dump that image onto subsequent machines verbatim, altering only those things which are supposed to be unique, such as IP address and hostname. Among Linux installers, for example, systemimager meets this requirement; kickstart does not. Under Solaris, you'll need to use Flash Archives, not Jumpstart. See checkpoint image in the glossary.

  3. After initial install, you need to manage hosts exclusively with ISconf -- no manual or other out-of-band changes. There is one semi-exception to this rule: You might want to use another tool to manage environmentally-influenced configuration files. You'll want to manage the binaries of that tool using ISconf, and take care to ensure that the external tool manages only those files which it must. See environmental data in the glossary for more discussion.

FLAGS

Flags appear only after the isconf command name, not after subcommands (as opposed to e.g. CVS).

-c config

Top-level configuration file. Defaults to /etc/is/main.cf.

-D

Show debug info on stderr.

-m

Message -- human-readable comment describing the change. Required only when locking. This flag is deprecated and is likely to be removed; see the 'lock' verb below for how to provide the message in a forward-compatible way.

-r

Allow reboot if needed. Used only with the 'up' verb below. Also see the 'reboot' verb. Ordinarily you would execute 'isconf -r up' from an rc script, which is a relatively safe time to allow reboots.

This flag has no effect unless there is a 'reboot' operation pending in the journal. If there is a 'reboot' pending, then this flag allows the reboot to take place. You only want to provide this flag at times when it's safe to reboot the local machine.

Without this flag, if 'isconf up' encounters a 'reboot' operation during journal replay, the replay stops, an error message is issued, and subsequent changes are not applied. You'll need to run 'isconf -r up' to continue past this point -- we cannot assume that the later changes will work without the reboot.

-q

Quiet -- don't show verbose output.

-V

Version -- show ISconf version.

SUBCOMMANDS

Subcommands are often called 'verbs' in ISconf documentation and usage. They can be grouped into the following categories:

Changing disk state

lock, unlock, snap, exec, reboot, ci, up

Branch management

fork, migrate

Daemon management

start, stop, restart

The following is a detailed description of all subcommands, in alphabetical order. In these descriptions, the origin host is the host where a user executes lock, snap, exec, reboot, or ci, and the target host is where a user executes up(date).

ci

Check in local changes, such as snap or exec, and release branch lock.

Run on origin.

exec command args ...

Execute an arbitrary command. Causes the command to be executed immediately on the local machine, and queued for execution on target machines after ci.

Example:

      isconf lock "permanently shut down apache"
      isconf exec /etc/rc2.d/S85apache stop
      isconf exec rm /etc/rc2.d/S85apache 
      isconf ci

If you want to embed shell redirects or pipes in the exec arguments, then you'll need to wrap the arguments in a shell invocation. For example, this *won't* do what you want -- it will only change /etc/motd on the origin machine:

      isconf exec echo web server down > /etc/motd

Here's what you really want instead:

      isconf exec sh -c "echo web server down > /etc/motd"

fork newbranch

Create a new branch from the current branch, and migrate the local host onto the new branch. The original branch is the "parent" branch, and the new branch is the "child" branch.

If host A executes a fork, then it is the only host moved to the branch; hosts B and C do not change. If you want B or C to move to the new branch as well, see migrate.

Low-level implementation: Since a journal describes the details of a branch, then a fork essentially just copies the entire journal contents from the parent branch into a new journal named after the child branch, then runs the migrate code path.

lock message ...

Lock the branch. Required before snap, exec, reboot, or ci, and recommended before fork and migrate. The message will be recorded in the journal for each subsequent transaction until the next ci.

migrate branchname

Migrates the local host onto a new branch. In human language this means the host is going to change roles.

Switching a host to a new branch is only possible if the new branch is a child of the host's old branch, and if there have been no transactions executed on the host since the new branch was forked off -- in other words, the new branch's journal content needs to be a contiguous superset of the old branch's journal content. If these conditions aren't met, migrate will exit with a non-zero return code.

reboot

Reboots the machine. Before reboot, adds a journal entry which will cause all target machines on this branch to reboot at the same point in their build. For example, this is what you might do to install and boot a new kernel:

      isconf lock "upgrade to 2.6.20"
      isconf snap kernel-2.6.20-1.i686.rpm
      isconf exec rpm -ivh kernel-2.6.20-1.i686.rpm
      isconf reboot
      isconf ci
  
      # on other machines
      isconf -r up

Apply thought when using this verb; 'isconf up' (without the -r) won't finish if there is a 'reboot' pending as the next action in the journal. You need 'isconf -r up' -- and you don't want to put that in crontab, unless you really don't mind your machines rebooting at that time. See the -r flag for details.

Never say 'isconf exec reboot' -- that will only reboot the local machine, and will never create any sort of journal entry; the reboot kills isconf itself before the journal entry can be made. Always say 'isconf reboot' instead.

By default, ISconf runs 'shutdown -r now' to cause the reboot. If you want or need to use a different command, see the IS_REBOOT_CMD environment variable below.

restart

Restart the daemon. Equivalent to a stop followed by a start.

snap filename

Snapshot a file for install on target machines. Preserves the current contents, permissions, and mode bits of the file. After ci, any target host on the same branch can run 'isconf up', which will cause ISconf to install the file on the target host.

start

stop

Start or stop the daemon.

unlock

Break the lock on the local branch. Use with great care. This reverses the effect of a lock, invalidates the work stored in journal.wip on the locking machine, and will likely require the person who set the lock to discard their work and/or rebuild the machine where the lock was made.

Generally speaking, it's better to pick up the telephone and call the person who set the lock, asking them politely to finish whatever they were doing and check it in, rather than use this subcommand.

up

Update. Causes the isconf daemon to attempt execution of any new transactions in the journal. Errors and messages are copied to stderr and stdout of isconf as well as to syslog. Exits with a non-zero return code in case of error.

If used with -r, and if a pending reboot entry is encountered in the journal, then the host will reboot.

ENVIRONMENT

ISconf behavior is controlled predominantly by environment variables. These can be set and exported before starting or restarting the isconf daemon, or can be set in configuration files, usually main.cf. Any variables set in the environment will be overridden by those set in the configuration file.

IS_DOMAIN

ISconf domain name -- more or less equivalent to an AFS cell name or a Kerberos realm name; all of the machines sharing this name will share in the distributed cache that makes up the ISconf repository. Normally you'd want all of the machines in a given legal entity -- the same corporation, for instance, to use the same domain name. This is an arbitrary string, but by convention it is usually based on the DNS domain name.

Rather than set this in an environment variable, you're better off populating the /var/is/conf/domain file, below.

See the domain glossary entry.

IS_HOME

The base directory which ISconf uses for data storage. Defaults to /var/is.

IS_HMAC_KEYS

The name of a file which contains a list of HMAC keys. See the hmac_keys file below.

IS_HTTP_PORT

The port number which each ISconf HTTP server listens on. Used only for file fetches between machines, and is likely to be deprecated in a near-future release. Defaults to port 65028.

IS_NETS

The name of a file which contains a list of broadcast and/or host addresses which ISconf should advertize file updates to. See nets file below. Likely to change in a future release.

IS_NOBROADCAST

Boolean. If set, do not send UDP broadcast packets; only send UDP point-to-point packets to the addresses listed in **nets* file. Likely to change in a future release.

IS_PORT

The port number which ISconf daemons use to communicate between each other. Right now this is UDP only, but TCP will be added in 4.2.7, and UDP is likely to be deprecated. Defaults to port 65027.

IS_REBOOT_CMD

The command which ISconf uses to reboot the machine in response to an 'isconf reboot' request. Defaults to "shutdown -r now".

FILES

/etc/is/main.cf

Top-level configuration file for ISconf. See CONFIGURATION for details. As of this writing, ISconf does not distribute this file for you. In earlier versions, we used to simply rsync it from a central server at the beginning of each execution. In a near-future version, look for it to be managed by the distributed cache.

/var/is

See IS_HOME above.

/var/is/conf/domain

Single-line file, newline optional, containing only the string which is to be used for the ISconf domain name. See IS_DOMAIN above.

hmac_keys

HMAC key list, one key per line. See IS_HMAC_KEYS. If this file exists and contains properly-formatted keys, then RFC 2104 HMAC authentication is enabled; wire messages which are not properly authenticated will be ignored.

The first key in the list is used for generating authentication codes on all outgoing messages, and is the first key tried when authenticating inbound messages. If the first key fails to authenticate an inbound message, and if more than one key is listed in the file, then the second and subsequent keys are tried, in order. This mechanism enables you to update the primary key while preserving backward compatibility with older keys, allowing for a transition period.

When updating keys, it's a good idea to first add the new key as a secondary key to the hmac_keys file, and deploy that to all machines. Once you're sure that all of your machines (and install images) have the new key, then move the new key up to the primary position in the file, leaving any old key(s) in the file as secondaries, then deploy that. Finally, once you're again sure that all of your machines (and install images) are using the new primary key, then (and only then) should you think about retiring any old key(s).

Take care when deploying this file for the first time on hosts which are already running ISconf; those ISconf daemons which get it first will refuse to listen to any which don't yet have the file; this will prevent further deployment if you're using ISconf to deploy the file. To prevent this from happening, you can include the special key +ANY+ at the end of the file. If encountered in the file, this special key disables HMAC authentication of received messages, but does not prevent generation of authentication codes on transmitted messages. What you want to do is deploy the file with one or more real keys listed in it, followed by the +ANY+ key. The file might look like this when first deployed:

      someauthenticationkey
      +ANY+

As you deploy the above file, hosts will begin sending authenticated messages to each other using the someauthenticationkey key, but will ignore the authentication codes they receive. Once you are sure that all of your hosts have that copy of the file, then deploy the file again, this time with the +ANY+ key removed. This will cause hosts to begin checking received authentication codes against someauthenticationkey, while discarding any messages not properly authenticated.

For best security, each key should be about 20 bytes long; see RFC 2104. Keys can can include any ASCII character except space, newline, or the pound (hash) (#) sign. Lines beginning with pound signs are comments. Blank lines are ignored. If no keys are found in the file, then the entire file is ignored, and HMAC authentication is disabled.

ISconf checks for new versions of this file every 10 seconds when it is processing inbound packets -- there is no need to restart the ISconf daemon.

The hash function used internally is SHA-1, with Python's hmac module doing the real work.

You should ensure that this file is only readable by root.

This entire mechanism is likely to change and/or be replaced by PGP key signatures in a future release.

nets

Network broadcast list -- see IS_NETS above. See t/nets for an example. Likely to change.

CONFIGURATION

ISconf uses environment variables for its configuration, and these variables are in turn passed on to any executables ISconf calls -- see ENVIRONMENT. These environment variables can be set in /etc/is/main.cf. The format of this file is similar to a makefile, but whitespace is whitespace -- tabs aren't required. Each stanza looks like this:

      target: optional includes
          var1 = value
          var2 = value

The 'target' string above is matched against the hostname; case is significant. If it contains dots, it's matched against the FQDN. If it starts with a caret (^) it is a regex matched against the FQDN. The first matching target is the only one used, however the special target named 'DEFAULT' is always matched. Variables set in DEFAULT, earlier includes, or earlier in the same stanza are overridden by identically-named variables which appear later in matched stanzas. Comments are any text following a hash (#) on any line.

You can see the resulting environment by using the -D flag.

Here's an example /etc/is/main.cf:

      DEFAULT:
          NTPSERVERS = ntp1 ntp2 bigben.ucsd.edu mcs.anl.gov
          IS_NETS=/etc/is/nets
  
      NET1:
          GATEWAY = 10.10.1.1
  
      NET2:
          GATEWAY = 10.10.2.1
  
      # The host 'scotty' will end up with these environment variables
      # set during the ISconf run:
      #
      # NTPSERVERS="ntp1 ntp2 bigben.ucsd.edu mcs.anl.gov"
      # GATEWAY=10.10.1.1
      # building=23
      # floor=2
      # IS_NETS=/etc/is/nets.scotty
      #
      scotty: NET1
          building = this value is ignored
          building = 23
          floor = 2
          IS_NETS=/etc/is/nets.scotty
  
      # kirk will get:
      #
      # NTPSERVERS="ntp1 ntp2 bigben.ucsd.edu mcs.anl.gov"
      # IS_NETS=/etc/is/nets
      # GATEWAY = 10.10.2.1
      # building=52
      # floor=12
      # 
      kirk: NET2
          building = 52
          floor = 12
  
      LOST:
          building = unknown
          floor = unknown
  
      # any other host in example.com:
      #
      # NTPSERVERS="ntp1 ntp2 bigben.ucsd.edu mcs.anl.gov"
      # IS_NETS=/etc/is/nets
      # building=unknown
      # floor=unknown
      # GATEWAY=10.2.3.1
      # 
      ^.*\.example\.com: LOST
          GATEWAY = 10.2.3.1
  
      # any other host not in example.com:
      #
      # NTPSERVERS="ntp1 ntp2 bigben.ucsd.edu mcs.anl.gov"
      # IS_NETS=/etc/is/nets
      # building=unknown
      # floor=unknown
      # GATEWAY=10.0.0.1
      # 
      ^.*: LOST
          GATEWAY = 10.0.0.1

GLOSSARY

base image

An image which was created directly from vendor CD or another external source, and which contains an empty journal. Normally as simple as possible, with only a management tool (such as ISconf) and its prerequisites added. See image glossary entry.

You will usually create only one base image per platform -- see one-base. You will create at least one checkpoint image per branch.

branch

Host model or type. Similar usage as in software version control. A different branch is normally used for each set of hosts that need their own disk image and that do wildly different or conflicting things. For example, a DNS server and a database server would tend to be on different branches.

A branch is described by the sequence of transactions in a journal. A new branch is created by forking an existing branch, then creating a checkpoint image.

Branch names must match this regular expression:

          \w+[-\w\.]+

See also class.

For more discussion of what branches are, and how they contrast with domains, see http://trac.t7a.org/isconf/wiki/DomainsVsBranches.

categories of data

There appear to be three categories of data or executables on the disk of a typical UNIX machine:

  1. evolvable data -- this includes binaries and executables scripts, as well as most configuration files (see glossary entry)
  2. environmental data -- that set of configuration data which must match external conditions (see glossary entry)
  3. user or business data

checkpoint image

An offline copy of the disk image of a given branch at a given revision, used to differentiate branches and for speedier installs. A checkpoint image is made by installing a host from an ancestor checkpoint or base image, allowing its branch's journal entries to execute, then capturing the resulting disk content. See image glossary entry.

class

This is an anti-definition: the word "class" should not be used to describe anything related to deterministic host management. It brings with it misconceptions, such as "hosts can be subclassed", "changes in the parent class can be automatically and safely propagated to subclasses", and so on; most of these misconceptions imply that editing history is a safe thing to do.

congruent

Remaining in compliance with a fully-descriptive specification. If a configuration management tool is congruent, the machines it manages will remain in lock-step with the desired state. This makes it easier to maintain a representative test environment, and allows for more predictable disaster recovery. ISconf is congruent. Also see the convergent glossary entry, and:

http://www.infrastructures.org/papers/turing/turing.html#methods/congruence

convergent

Tending to converge towards a desired state. If a configuration management tool is convergent, the machines it manages will trend towards each other in disk state, but for practical reasons they will rarely reach congruence. It will be difficult to maintain a representative test environment, and changes will tend to be made first, and tested first, in production. Predictable disaster recovery will remain elusive. Also see the congruent glossary entry. For more in-depth information about convergence, see:

http://www.infrastructures.org/papers/turing/turing.html#methods/convergence

domain

An ISconf domain name is more or less equivalent to a NIS domain name, an AFS cell name, or a Kerberos realm name. This name is an arbitrary string, but by convention it is usually based on the DNS domain name.

ISconf domains are a security mechanism, primarily in regards to information hiding. All of the machines sharing the same ISconf domain name will share the same distributed cache, so root users on all of these machines will be able to read the contents of the cache. Likewise, machines that are in different domains will not share the same cache, so root users of these machines will not have access to the cache contents of the other domain. This becomes important if there is any proprietary or sensitive information stored in the ISconf cache, for example via a 'snap' or 'exec' command.

Normally you'd want all of the machines in a given legal entity -- the same corporation, for instance, to use the same domain name. For example, a small company using ISconf might use an ISconf domain name of 'example.com' on all of their machines. A larger company might have multiple divisions or subsidiaries and legal or security reasons for segregating machines. The large campany might put most of their machines in 'example.com', but for regulatory or security reasons might isolate a subsidiary into 'foo.example.com', and might put their bastion and firewall machines into 'security.example.com'. Note again that there doens't need to be a 'security.example.com' DNS domain for this to work.

The idea of ISconf domains is to completely isolate legal entities from each other when sharing the same net. Machines in different domains refuse to cache each other's data, answer each other's queries, and so on. Domains really come into play in the TCP crypto and user auth code (ISconf 4.3 and later), where each domain has its own PGP keyring; its own database of hosts and users, and all of the wire traffic is encrypted accordingly.

Establishing two machines in different domains means "I don't want these machines to ever cooperate at all. I will never merge their branches, I don't want them to be able to share or see each other's packages, cache space, or wire traffic."

For more discussion of what domains are, and how they contrast with branches, see http://trac.t7a.org/isconf/wiki/DomainsVsBranches.

Domain names must match this regular expression:

          \w+[-\w\.]+

editing history

"Editing history" is what happens when you build a machine based on a set of instructions, then alter the instructions that you used to build the machine. Once you've done this, there is no mathematically provable way to ensure that your new instructions will still build the same machine, short of building the new machine and then comparing the entire disk content to the old one.

In ISconf, editing history would mean editing the journal file itself -- while there's nothing (currently) which would stop you from doing that, and while the resulting file would be dutifully distributed and applied to the target machines, it's highly discouraged and may be a lot more difficult to do in the future, as we add things like digital signatures and checksums to the mix.

Editing history can create major outages when:

  • you're trying to deploy changes which worked in QA (using the old instructions) to production (using the new instructions)
  • you're trying to execute a disaster recovery, or even a single host rebuild, and you no longer have the old disk content available
  • you're trying to add a new server to an existing farm and don't have time to resort to backups or run rsync across both disks

environmental data

Configuration data (usually files) whose content is predominantly influenced by external business, political, procedural, or economic factors, and whose function is critical to the integrity of business data or to the operation of ISconf. Examples include files containing IP addresses, domain names, and other information which, if out of date, will break the ability of ISconf to continue journal replay. See also categories of data.

This version of ISconf does not attempt to manage environmental data natively. In earlier versions of ISconf, we would simply rsync environmental configuration files (such as /etc/hosts and resolv.conf) from a per-environment server at the beginning of each execution. We weren't real happy with the limited flexibility that gave us, but this method might work for you. If you want to do this, either modify or wrap the main isconf script to call rsync, and then set up an rsync server somewhere. See http://www.infrastructures.org/bootstrap/gold.shtml for more details. (If demand is there, we can add an executable hook that makes this easier.)

If a file meets the description of evolvable data, then it is not environmental data, and it should be managed via a simple isconf snap, rather than the means described below. For instance, /etc/passwd and /etc/resolv.conf are usually environmental, while /etc/services and /etc/inittab are much more influenced by local applications, and in most cases should be managed via isconf snap.

A better way to manage environmental data is to store the raw data (or pathnames pointing to the raw data) in /etc/is/main.cf and then generate the configuration files during boot and/or cron. (Look for an isconf verb in a near-future release which lets you export the content of /etc/is/main.cf as a shell script. In the meantime you can do this the other way around -- call ISconf from a wrapper script which sets up the environment you want.)

Your goal should be to keep the set of environmental data as small as possible, via architectural decisions in both infrastructure and applications.

You need to be able to examine each bit of environmental data to try to predict its behavior during deployment. Your ability to do this will always be flawed -- you cannot possibly imagine all of the permutations that might be encountered during future operations. Keeping the environmental data set small reduces your workload and the risk caused by a flawed analysis.

You need to be able to test each bit of environmental data after deployment. Any change in environmental data, by definition, cannot be tested anywhere except in its native environment. If this environment is production, then we can only test these changes after deploying them to production -- this is bad, but unless you have completely duplicate networks, down to the details of IP addresses and hostnames, there's not much you can do about it. Keeping the environmental dataset small reduces the variations between environments; ideally, IP addresses and/or hostnames might be the only differences you need to analyze and test for.

The classic case of what not to do involves hardcoding IP addresses in executables -- we all know this is bad, but here's why: Embedding an IP address in a larger executable taints the entire executable, requiring that we manage the whole file as environmental data. It's better to move that IP address to a separate configuration file, to shrink the size of the environmental data set.

Executables aren't the only thing that can be tainted. Embedding an IP address into a larger configuration file of non-environmental data also taints the rest of the configuration file. If you have ever generated configuration files by merging IP addresses into templates of other data, then you have experienced this case. By using templates, you prevent taint spread.

Taken to an extreme, tainting of files and packages can cause an explosion in the size of the environmental dataset, and an explosion of risk, to the point where all data on disk must be considered to be environmental, and all changes must be considered untested prior to production rollout. If you find yourself in this situation, your best bet might be to go with a convergent tool such as cfengine; you'll lose congruence, though, until you're able to fix the original problems and rebuild your machines. See convergent and congruent.

evolvable data

Data which can be managed via journal replay. This includes successive versions of executables, packages, kernels, patches, and configuration data which is not dependent on external environment. See also environmental data.

Examples of evolvable data include /bin/ls, /etc/mailcap, and libc.

It's usually safe to assume that all data is evolvable until proven otherwise. It's relatively easy to later begin managing a particular data item as environmental data if it proves necessary.

image

The bits placed on disk during installation; this will be either the base image or a checkpoint image taken from a child branch.

This version of ISconf does not do image management (it's in the release plan). Images need to be managed and installed using a certain category of host install tool. See PREREQUISITES.

one-base

One-base is an axiom of ISconf (and probably deterministic host management in general) -- it says that a host of any branch can be created by installing the base image for that platform and then replaying that branch's journal. This means you may only need one base image for any given platform -- starting from there you can use journal replay to morph the image into any other image which is described by a branch's journal.

"One base to start them all, one base to gild them, one base to boot them all and in the darkness build them."

Sorry.

journal

The transaction log of all changes made to a branch, starting from the base image. Used for replay on other hosts of the same branch.

INTERNALS

The basic algorithm that ISconf uses is roughly:

  • Journal the changes that are going to be made.
  • Preserve all entries in the journal over the lifetime of the infrastructure.
  • Only append entries to the journal -- never delete, never alter or re-order.
  • Apply changes to one or more test machines by reading the journal.
  • Maintain a history of changes that have been applied to each host. The master copy of this history should reside on the local disk of that host, and must be destroyed if the disk becomes corrupt or the host is rebuilt.
  • Later, apply the same changes in the same order on other machines, by reading the same journal, using the same code path, consulting their local histories to see what is yet to be done.
  • (This bullet point not yet implemented in 4.2.X.) Keep track of those files which a human explicitly says do not need to be versioned, and in those cases (only), refer only to the last journal entry for those files. An example is resolv.conf; in this case, you only want the most recent version to be applied, in order to ensure the host will function at all. (But consider new, edited, and deleted configuration files; these three operations actually could make use of distinct handling.)

BUGS/RESTRICTIONS

See http://trac.t7a.org/isconf/report for bugs, and see notes for a given release at http://trac.t7a.org/isconf/roadmap?show=all.

This version of ISconf was assembled with the features most requested by early adopters, and does not pretend to be secure or scalable. It is intended for use in small deployments, trusted internal networks, and evaluation. If you do install this version in a production environment, you should plan to upgrade as newer versions become available.

Having said that, we do use this version of ISconf ourselves.

Because we'll need to change wire protocols to add in the security bits, the next upgrade is likely to be a tricky procedure; you may need to keep an old machine around for a while as a cache server until you're sure you've upgraded all of your existing machines and updated your checkpoint images. Keep your rollouts small for now.

Known flaws in this release include:

  • Files are transported via cleartext HTTP. Any file checked into ISconf is visible by anyone with a web browser. HTTP in general is a poor protocol for ISconf, is being used at the suggestion of an early adopter, and we plan to deprecate it as soon as we can get the consensus that it's the wrong direction.
  • Control messages are transported via UDP and/or UDP broadcast, for expediency. This protocol is going to be deprecated in favor of a TCP mesh which will do both control messages and file transport.
  • No authentication or encryption is performed for any operation on the wire. A properly-formatted packet can be forged to insert unsafe content into the journal for an entire branch. We plan to add HMAC soonest, and later PGP signatures and either PGP or SSL transport encryption as part of the TCP mesh layer.
  • Each machine stores a complete copy of all files in the cache. If you snap hundreds of megabytes of files, you will use hundreds of megabytes of disk space on each node. Once the TCP mesh is up, we'll have a protocol capable of quorum counting. This will let us starve the cache on ordinary nodes, while allowing designated "master" nodes to store a copy of everything -- the cache on these can then be backed up for safe-keeping as well.
  • We don't pretend to handle a certain subset of configuration files right now -- see the environmental data glossary entry.
  • Logging is rudimentary right now; everything gets dumped into various files in /tmp. This all needs to be migrated to syslog and/or files in var log.

SEE ALSO


Background on where all this came from http://www.infrastructures.org ISconf main site http://www.isconf.org ISconf development site http://trac.t7a.org/isconf cfengine(8) http://www.cfengine.org python(1) http://www.python.org


Most ISconf developers and users can be found on the infrastructures mailing list at http://mailman.terraluna.org/mailman/listinfo/infrastructures

AUTHOR

Steve Traugott -- http://www.stevegt.com