The Tao of Option Parsing
=========================

Optik is an implementation of what I have always considered the most
obvious, straightforward, and user-friendly way to design a user
interface for command-line programs.  In short, I have fairly firm ideas
of the Right Way (and the many Wrong Ways) to do argument parsing, and
Optik reflects many of those ideas.  This document is meant to explain
this philosophy, which in turn is heavily influenced by the Unix and GNU
toolkits.


Terminology
-----------

First, we need to establish some terminology.

argument
  a chunk of text that a user enters on the command line, and that the
  shell passes to execl() or execv().  In Python, arguments are
  elements of sys.argv[1:].  (sys.argv[0] is the name of the program
  being executed; in the context of parsing arguments, it's not very
  important.)  Unix shells also use the term "word".

  It's occasionally desirable to substitute an argument list other
  than sys.argv[1:], so you should read "argument" as "an element of
  sys.argv[1:], or of some other list provided as a substitute for
  sys.argv[1:]".

option
  an argument used to supply extra information to guide or customize
  the execution of a program.  There are many different syntaxes for
  options; the traditional Unix syntax is "-" followed by a single
  letter, e.g. "-x" or "-F".  Also, traditional Unix syntax allows
  multiple options to be merged into a single argument, e.g.  "-x -F"
  is equivalent to "-xF".  The GNU project introduced "--" followed by
  a series of hyphen-separated words, e.g. "--file" or "--dry-run".
  These are the only two option syntaxes provided by Optik.

  Some other option syntaxes that the world has seen include:

  * a hyphen followed by a few letters, e.g. "-pf" (this is
    *not* the same as multiple options merged into a single argument)
  * a hyphen followed by a whole word, e.g. "-file" (this is
    technically equivalent to the previous syntax, but they aren't
    usually seen in the same program)
  * a plus sign followed by a single letter, or a few letters,
    or a word, e.g. "+f", "+rgb"
  * a slash followed by a letter, or a few letters, or a word, e.g.
    "/f", "/file"

  These option syntaxes are not supported by Optik, and they never will
  be.  (If you really want to use one of those option syntaxes, you'll
  have to subclass OptionParser and override all the difficult bits.
  But please don't!  Optik does things the traditional Unix/GNU way
  deliberately; the first three are non-standard anywhere, and the last
  one makes sense only if you're exclusively targeting MS-DOS/Windows
  and/or VMS.)

option argument
  an argument that follows an option, is closely associated with that
  option, and is consumed from the argument list when the option is.
  Often, option arguments may also be included in the same argument as
  the option, e.g. ::

    ["-f", "foo"]

  may be equivalent to ::

    ["-ffoo"]

  (Optik supports this syntax.)

  Some options never take an argument.  Some options always take an
  argument.  Lots of people want an "optional option arguments"
  feature, meaning that some options will take an argument if they see
  it, and won't if they don't.  This is somewhat controversial,
  because it makes parsing ambiguous: if "-a" takes an optional
  argument and "-b" is another option entirely, how do we interpret
  "-ab"?  Optik does not currently support this.

positional argument
  something left over in the argument list after options have been
  parsed, i.e. after options and their arguments have been parsed and
  removed from the argument list.

required option
  an option that must be supplied on the command line; the phrase
  "required option" is an oxymoron and I personally consider it poor UI
  design.  Optik doesn't prevent you from implementing required options,
  but doesn't give you much help with it either.  See
  examples/required_1.py and examples/required_2.py for two ways to
  implement required options with Optik.

For example, consider this hypothetical command line::

  prog -v --report /tmp/report.txt foo bar

"-v" and "--report" are both options.  Assuming the --report option
takes one argument, "/tmp/report.txt" is an option argument.  "foo"
and "bar" are positional arguments.
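
As a rough sketch of how these terms map onto code, the following parses
that command line with Python's standard optparse module, which is
descended from Optik; using optparse here is an assumption made so the
sketch runs without Optik installed.  The second parser, with its "-x",
"-F" and "-f" options, is purely hypothetical and only illustrates merged
options and attached option arguments::

    from optparse import OptionParser

    # Parser for the hypothetical "prog" command line shown above.
    parser = OptionParser(prog="prog")
    parser.add_option("-v", action="store_true", dest="verbose", default=False)
    parser.add_option("--report", dest="report")   # takes one option argument

    # Stand-in for sys.argv[1:].
    arguments = ["-v", "--report", "/tmp/report.txt", "foo", "bar"]
    options, positional = parser.parse_args(arguments)

    print(options.verbose)   # True
    print(options.report)    # /tmp/report.txt
    print(positional)        # ['foo', 'bar']

    # Hypothetical options showing the two equivalences from the definitions.
    demo = OptionParser()
    demo.add_option("-x", action="store_true", dest="x", default=False)
    demo.add_option("-F", action="store_true", dest="F", default=False)
    demo.add_option("-f", dest="filename")

    separate, _ = demo.parse_args(["-x", "-F", "-f", "foo"])
    merged, _ = demo.parse_args(["-xF", "-ffoo"])
    print(vars(separate) == vars(merged))   # True

Whatever the parser does not consume as an option or an option argument
is returned as the list of positional arguments.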


What are options for?
---------------------

Options are used to provide extra information to tune or customize the
execution of a program.  In case it wasn't clear, options are usually
*optional*.  A program should be able to run just fine with no options
whatsoever.  (Pick a random program from the Unix or GNU toolsets.  Can
it run without any options at all and still make sense?  The only
exceptions I can think of are find, tar, and dd -- all of which are
mutant oddballs that have been rightly criticized for their non-standard
syntax and confusing interfaces.)

Lots of people want their programs to have "required options".  Think
about it.  If it's required, then it's *not optional*!  If there is a
piece of information that your program absolutely requires in order to
run successfully, that's what positional arguments are for.  (However,
if you insist on adding "required options" to your programs, look in the
examples/ directory of the source distribution for two ways of
implementing them with Optik.)

Consider the humble "cp" utility, for copying files.  It doesn't make
much sense to try to copy files without supplying a destination and at
least one source.  Hence, "cp" fails if you run it with no arguments.
However, it has a flexible, useful syntax that does not rely on options
at all::

    cp SOURCE DEST
    cp SOURCE ... DEST-DIR

You can get pretty far with just that.  Most "cp" implementations
provide a bunch of options to tweak exactly how the files are copied:
you can preserve mode and modification time, avoid following symlinks,
ask before clobbering existing files, etc.  But none of this distracts
from the core mission of "cp", which is to copy one file to another, or
N files to another directory.
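
A cp-like interface of this shape might be sketched as follows.  This is
not how any real "cp" is implemented: the parse_cp_args function, its two
options and the error message are all hypothetical, and Python's standard
optparse module (a descendant of Optik) is assumed in place of Optik
itself.  The information the program cannot do without comes from the
positional arguments, while the options only tweak behaviour::

    from optparse import OptionParser

    def parse_cp_args(argv):
        """Split argv into (options, sources, dest) for a cp-like tool."""
        usage = ("%prog [options] SOURCE DEST\n"
                 "       %prog [options] SOURCE ... DEST-DIR")
        parser = OptionParser(usage=usage, prog="cp")
        # Options are optional: each one has a default.
        parser.add_option("-p", "--preserve", action="store_true",
                          default=False,
                          help="preserve mode and modification time")
        parser.add_option("-i", "--interactive", action="store_true",
                          default=False,
                          help="ask before clobbering existing files")
        options, args = parser.parse_args(argv)

        # Required information is positional: at least one source,
        # followed by a destination.
        if len(args) < 2:
            parser.error("need at least one SOURCE and a DEST")
        return options, args[:-1], args[-1]

    options, sources, dest = parse_cp_args(["a.txt", "b.txt", "backup"])
    print(sources, dest, options.preserve)

Calling parse_cp_args with fewer than two positional arguments goes
through parser.error, which prints the usage message and exits with
status 2.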


What are positional arguments for?
----------------------------------

In case it wasn't clear from the above example: positional arguments are
for those pieces of information that your program absolutely, positively
requires to run.

A good user interface should have as few absolute requirements as
possible.  If your program requires 17 distinct pieces of information in
order to run successfully, it doesn't much matter *how* you get that
information from the user -- most people will give up and walk away
before they successfully run the program.  This applies whether the user
interface is a command line, a configuration file, a GUI, or whatever:
if you make that many demands on your users, most of them will just give
up.

In short, try to minimize the amount of information that users are
absolutely required to supply -- use sensible defaults whenever
possible.  Of course, you also want to make your programs reasonably
flexible.  That's what options are for.  Again, it doesn't matter if
they are entries in a config file, checkboxes in the "Preferences"
dialog of a GUI, or command-line options -- the more options you
implement, the more flexible your program is, and the more complicated
its implementation becomes.  It's quite easy to overwhelm users (and
yourself!) with too much flexibility, so be careful there.
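
To make the point about sensible defaults concrete, here is a minimal
sketch of a hypothetical program in which every option has a default, so
it runs with no options at all.  The option names and default values are
invented for illustration, and Python's standard optparse module (a
descendant of Optik) is assumed in place of Optik itself::

    from optparse import OptionParser

    def build_parser():
        parser = OptionParser()
        # Hypothetical options; every one has a default.
        parser.add_option("-o", "--output", dest="output",
                          default="report.txt",
                          help="where to write the report")
        parser.add_option("-q", "--quiet", action="store_false",
                          dest="verbose", default=True,
                          help="suppress progress messages")
        return parser

    # No options at all: the defaults apply.
    defaults, _ = build_parser().parse_args([])
    print(defaults.output, defaults.verbose)   # report.txt True

    # Options only adjust the defaults.
    tuned, _ = build_parser().parse_args(["-q", "--output=/tmp/out.txt"])
    print(tuned.output, tuned.verbose)         # /tmp/out.txt False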
