whizard is hosted by Hepforge, IPPP Durham

Chapter ‍4 Steering WHIZARD: SINDARIN Overview

4.1 The command language for WHIZARD

A conventional physics application program gets its data from a set of input files. Alternatively, it is called as a library, so the user has to write their own code to interface it, or it combines these two approaches. WHIZARD 1 was built in this way: there were some input files which were written by the user, and it could be called either stand-alone or as an external library.

WHIZARD ‍2 is also a stand-alone program. It comes with its own full-fledged script language, called SINDARIN. All interaction between the user and the program is done in SINDARIN expressions, commands, and scripts. Two main reasons led us to this choice:

  • In any nontrivial physics study, cuts and (parton- or hadron-level) analysis are of central importance. The task of specifying appropriate kinematics and particle selection for a given process is well defined, but it is impossible to cover all possibilities in a simple format like the cut files of WHIZARD 1.

    The usual way of dealing with this problem is to write analysis driver code (often in C++, using external libraries for Lorentz algebra, etc.). However, the overhead of writing correct C++ or Fortran greatly inflates problems that could be formulated in a few lines of text.

  • While many problems lead to a repetitive workflow (process definition, integration, simulation), there are more complex tasks that involve parameter scans, comparisons of different processes, conditional execution, or writing output in widely different formats. This is easily done by a steering script, which should be formulated in a complete language.

The SINDARIN language is built specifically around event analysis, suitably extended to support steering, including data types, loops, conditionals, and I/O.

It would have been possible to use an established general-purpose language for these tasks. For instance, OCaml, a functional language, would be a suitable candidate; the matrix-element generator O’Mega is written in it. Another candidate would be a popular scripting language such as PYTHON.

We started to support interfaces for commonly used languages: prime examples for C, C++, and PYTHON are found in the share/interfaces subdirectory. However, introducing a special-purpose language has three distinct advantages. First, it is compiled and executed by the very Fortran code that handles data and thus accesses it without interfaces. Second, it can be designed with a syntax especially suited to the task of event handling and Monte-Carlo steering. Third, the user is not forced to learn all those features of a generic language that are of no relevance to the application they are interested in.

4.2 SINDARIN scripts

A SINDARIN script tells the WHIZARD program what it has to do. Typically, the script is contained in a file which you (the user) create. The file name is arbitrary; by convention, it has the extension ‘.sin’. WHIZARD takes the file name as its argument on the command line and executes the contained script:

/home/user$ whizard script.sin

Alternatively, you can call WHIZARD interactively and execute statements line by line; we describe this below in Sec. 14.2.

A SINDARIN script is a sequence of statements, similar to the statements in any imperative language such as Fortran or C. Examples of statements are commands like integrate, variable declarations like logical ?flag, or assignments like mH = 130 GeV.

The script is free-form, i.e., indentation, extra whitespace and newlines are syntactically insignificant. In contrast to most languages, there is no statement separator. Statements simply follow each other, just separated by whitespace.

statement1 statement2
statement3
               statement4

Nevertheless, for clarity we recommend writing one statement per line where possible, and using proper indentation for longer statements and for nested and bracketed expressions.

A command may consist of a keyword, a list of arguments in parentheses (), and an option script which itself is a sequence of statements.

command
command_with_args (arg1, arg2)
command_with_option { option }
command_with_options (arg) {
  option_statement1
  option_statement2
}

As a rule, parentheses () enclose arguments and expressions, as you would expect. Arguments enclosed in square brackets [] also exist. They have a special meaning: they denote subevents (collections of momenta) in event analysis. Braces {} enclose blocks of SINDARIN code. In particular, the option script associated with a command is a block of code that may contain local parameter settings, for instance. Braces always indicate a scoping unit, so parameters will be restored to their previous values when the execution of that command is completed.
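The scoping rule for braces can be pictured with a small Python model. This is only an illustration of the save-and-restore behaviour described above, not WHIZARD's implementation; the names `variables` and `option_scope` and the sample values are hypothetical.

```python
from contextlib import contextmanager

# Hypothetical stand-in for the global SINDARIN variable table.
variables = {"sqrts": 500.0, "n_events": 1000}

@contextmanager
def option_scope(**local_settings):
    """Apply local settings for one command, then restore the old values."""
    saved = dict(variables)          # remember the state outside the braces
    variables.update(local_settings)
    try:
        yield variables
    finally:
        variables.clear()
        variables.update(saved)      # leaving the block undoes local changes

# Analogue of:  command { sqrts = 1000  n_events = 10 }
with option_scope(sqrts=1000.0, n_events=10) as local:
    inside = (local["sqrts"], local["n_events"])

outside = (variables["sqrts"], variables["n_events"])
```

Inside the block the local settings are visible; after the block the outer values are back in place, which is the behaviour the option script of a command relies on.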

The script can contain comments. Comments are initiated by either a # or a ! character and extend to the end of the current line.

statement
# This is a comment
statement  ! This is also a comment

4.3 Errors

Before turning to proper SINDARIN syntax, let us consider error messages. SINDARIN distinguishes syntax errors and runtime errors.

Syntax errors are recognized when the script is read and compiled, before any part is executed. Look at this example:

process foo = u, ubar => d, dbar
md = 10
integrade (foo)

WHIZARD will fail with the error message

sqrts = 1 TeV
integrade (foo)
          ^^
| Expected syntax: SEQUENCE    <cmd_num> = <var_name> '=' <expr>
| Found token: KEYWORD:    '('
******************************************************************************
******************************************************************************
*** FATAL ERROR:  Syntax error (at or before the location indicated above)
******************************************************************************
******************************************************************************
WHIZARD run aborted.

which tells you that you have misspelled the command integrate, so the compiler tried to interpret it as a variable.

Runtime errors are categorized by their severity. A warning is simply printed:

Warning: No cuts have been defined.

This indicates a condition that is suspicious, but may actually be intended by the user.

When an error is encountered, it is printed with more emphasis

******************************************************************************
*** ERROR: Variable 'md' set without declaration
******************************************************************************

and the program tries to continue. However, this usually indicates that there is something wrong. (The d quark is defined massless, so md is not a model parameter.) WHIZARD counts errors and warnings and tells you at the end

| There were  1 error(s) and no warnings.

just in case you missed the message.

Other errors are considered fatal, and execution stops at this point.

******************************************************************************
******************************************************************************
*** FATAL ERROR:  Colliding beams: sqrts is zero (please set sqrts)
******************************************************************************
******************************************************************************

Here, WHIZARD was unable to do anything sensible. But at least (in this case) it told the user what to do to resolve the problem.

4.4 Statements

SINDARIN statements are executed one by one. For an overview, we list the most common statements in the order in which they typically appear in a SINDARIN script, and quote the basic syntax and simple examples. This should give an impression of WHIZARD’s capabilities and of the user interface. The list is not complete. Note that there are no mandatory commands (although an empty SINDARIN script is not really useful). The details and options are explained in later sections.

4.4.1 Process Configuration

model

model = model-name

This assignment sets or resets the current physics model. The Standard Model is already preloaded, so the model assignment applies to non-default models. Obviously, the model must be known to WHIZARD. Example:

model = MSSM

See Sec. ‍5.3.

alias

alias alias-name = alias-definition

Particles are specified by their names. For most particles, there are various equivalent names. Names containing special characters such as a + sign have to be quoted. The alias assignment defines an alias for a list of particles. This is useful for setting up processes with sums over flavors, cut expressions, and more. The alias name is then used like a simple particle name. Example:

alias jet = u:d:s:U:D:S:g

See Sec. ‍5.2.1.

process

process tag = incoming => outgoing

Define a process. You give the process a name ⟨tag⟩ by which it is identified later, and specify the incoming and outgoing particles, and possibly options. You can define an arbitrary number of processes as long as they are distinguished by their names. Example:

process w_plus_jets = g, g => "W+", jet, jet

See Sec. ‍5.4.

sqrts

sqrts = energy-value

Define the center-of-mass energy for collision processes. The default setup will assume head-on central collisions of two beams. Example:

sqrts = 500 GeV

See Sec. ‍5.5.1.

beams

beams = beam-particles
beams = beam-particles => structure-function-setup

Declare beam particles and properties. The current value of sqrts is used, unless specified otherwise. Example:

beams = u:d:s, U:D:S => lhapdf

With options, the assignment allows for defining beam structure in some detail. This includes beamstrahlung and ISR for lepton colliders, precise structure function definition for hadron colliders, asymmetric beams, beam polarization, and more. See Sec. ‍5.5.

4.4.2 Parameters

Parameter settings

parameter = value
type user-parameter
type user-parameter = value

Specify a value for a parameter. There are predefined parameters that affect the behavior of a command, model-specific parameters (masses, couplings), and user-defined parameters. The latter have to be declared with a type, which may be int (integer), real, complex, logical, string, or alias. Logical parameter names begin with a question mark, string parameter names with a dollar sign. Examples:

mb = 4.2 GeV
?rebuild_grids = true
real mass_sum = mZ + mW
string $message = "This is a string"

The value need not be a literal, it can be an arbitrary expression of the correct type. See Sec. ‍4.7.

read_slha

read_slha (filename)

This is useful only for supersymmetric models: read a parameter file in the SUSY Les Houches Accord format. The file defines parameter values and, optionally, decay widths, so this command removes the need for writing assignments for each of them.

read_slha ("sps1a.slha")

See Sec. ‍10.2.

show

show (data-objects)

Print the current value of some data object. This includes not just variables, but also models, libraries, cuts, etc. This is rather a debugging aid, so don’t expect the output to be concise in the latter cases. Example:

show (mH, wH)

See Sec. ‍5.10.

printf

printf format-string (data-objects)

Pretty-print the data objects according to the given format string. If there are no data objects, just print the format string. This command is borrowed from the C programming language; it is actually an interface to the system’s printf(3) function. The conversion specifiers are restricted to d,i,e,f,g,s, corresponding to the output of integer, real, and string variables. Example:

printf "The Higgs mass is %f GeV" (mH)

See Sec. ‍5.10.

4.4.3 Integration

cuts

cuts = logical-cut-expression

The cut expression is a logical macro expression that is evaluated for each phase space point during integration and event generation. You may construct expressions out of various observables that are computed for the (partonic) particle content of the current event. If the expression evaluates to true, the matrix element is calculated and the event is used. If it evaluates to false, the matrix element is set to zero and the event is discarded. Note that for collisions the expression is evaluated in the lab frame, while for decays it is evaluated in the rest frame of the decaying particle. In case you want to impose cuts on a factorized process, i.e. a combination of a production process and one or more decay processes, you have to use the selection keyword instead.

Example for the keyword cuts:

cuts = all Pt > 20 GeV [jet]
  and  all mZ - 10 GeV < M < mZ + 10 GeV [lepton, lepton]
  and  no  abs (Eta) < 2 [jet]

See Sec. ‍5.2.5.
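To make the role of the quantifiers concrete, the following Python sketch mimics the first and third lines of the cut expression above for a list of jet momenta. It is a simplified model under stated assumptions: the `Particle` class, the helper names, and the sample momenta are hypothetical, and only the single-argument form `[jet]` is covered.

```python
import math
from dataclasses import dataclass

@dataclass
class Particle:
    """Hypothetical momentum record; components in GeV."""
    px: float
    py: float
    pz: float

def pt(p):
    """Transverse momentum."""
    return math.hypot(p.px, p.py)

def eta(p):
    """Pseudorapidity, eta = asinh(pz / pT); assumes pT > 0."""
    return math.asinh(p.pz / pt(p))

def passes_cuts(jets):
    # all Pt > 20 GeV [jet]: every jet must satisfy the condition
    all_hard = all(pt(j) > 20.0 for j in jets)
    # no abs (Eta) < 2 [jet]: no jet may satisfy the condition
    none_central = not any(abs(eta(j)) < 2.0 for j in jets)
    return all_hard and none_central

# Made-up jet configurations for illustration.
forward_jets = [Particle(30.0, 0.0, 200.0), Particle(0.0, 25.0, -150.0)]
central_jets = [Particle(30.0, 0.0, 10.0), Particle(0.0, 25.0, -150.0)]

forward_ok = passes_cuts(forward_jets)   # both jets hard and forward
central_ok = passes_cuts(central_jets)   # first jet is central, so the cut fails
```

In this picture, a phase space point for which `passes_cuts` returns false corresponds to a matrix element set to zero.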

integrate

integrate (process-tags)

Compute the total cross section for a process. The command takes into account the definition of the process, the beam setup, cuts, and parameters as defined in the script. Parameters may also be specified as options to the command.

Integration is necessary for each process for which you want to know total or differential cross sections, or event samples. Apart from computing a value, it sets up and adapts phase space and integration grids that are used in event generation. If you just need an event sample, you can omit an explicit integrate command; the simulate command will call it automatically. Example:

integrate (w_plus_jets, z_plus_jets)

See Sec. ‍5.7.1.

?phs_only/n_calls_test

integrate (process-tag) { ?phs_only = true n_calls_test = 1000 }

These are optional settings for the integrate command described above. The option ?phs_only = true (recall that variables starting with a question mark are logicals) tells WHIZARD to prepare a process for integration but, instead of performing the integration, to generate only a phase space parameterization. Setting n_calls_test = <num> evaluates the sampling function for random integration channels and random momenta; the number of sampling points is given by <num>. VAMP integration grids are neither generated nor used, so the channel selection corresponds to the first integration pass, before any grids or channel weights are adapted. The output contains information about the timing, the number of sampling points that passed the kinematics selection, and the number of matrix-element values that were actually evaluated. These settings are useful mainly for debugging and diagnostics. Example:

integrate (some_large_process) { ?phs_only = true  n_calls_test = 1000 }

(Note that until version 2.1.1 of WHIZARD there was a separate command matrix_element_test, which has been removed in order to simplify the SINDARIN syntax.)

4.4.4 Events

histogram

histogram tag (lower-bound, upper-bound)
histogram tag (lower-bound, upper-bound, step)

Declare a histogram for event analysis. The histogram is filled by an analysis expression, which is evaluated once for each event during a subsequent simulation step. Example:

histogram pt_distribution (0, 150 GeV, 10 GeV)

See Sec. ‍5.9.3.

plot

plot tag

Declare a plot for displaying data points. The plot may be filled by an analysis expression that is evaluated for each event; this would result in a scatter plot. More likely, you will use this feature for displaying data such as the energy dependence of a cross section. Example:

plot total_cross_section

See Sec. ‍5.9.4.

selection

selection = selection-expression

The selection expression is a logical macro expression that is evaluated once for each event. It is applied to the event record, after all decays have been executed (if any). It is therefore intended, for example, for modelling detector acceptance cuts. For unfactorized processes the usage of cuts or selection leads to the same results. Events for which the selection expression evaluates to false are dropped; they are neither analyzed nor written to any user-defined output file. However, the dropped events are written to WHIZARD’s native event file. For unfactorized processes it is therefore preferable to implement all cuts using the cuts keyword for the integration, see cuts above. Example:

selection = all Pt > 50 GeV [lepton]

The syntax is generically the same as for the cuts expression, see Sec. ‍5.2.5. For more information see also Sec. ‍5.9.

analysis

analysis = analysis-expression

The analysis expression is a logical macro expression that is evaluated once for each event that passes the integration and selection cuts in a subsequent simulation step. The expression has type logical in analogy with the cut expression; however, its main use will be in side effects caused by embedded record expressions. The record expression books a value, calculated from observables evaluated for the current event, in one of the predefined histograms or plots. Example:

analysis = record pt_distribution (eval Pt [photon])
      and  record mval (eval M [lepton, lepton])

See Sec. ‍5.9.
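The interplay of a histogram declaration and a record expression can be sketched in Python as follows. This is a minimal model, not WHIZARD's data structure: the `Histogram` class, the treatment of out-of-range values as underflow and overflow counts, the boolean return value of `record`, and the sample pT values are all assumptions made for illustration.

```python
class Histogram:
    """Hypothetical fixed-width histogram: (lower bound, upper bound, step)."""

    def __init__(self, lower, upper, step):
        self.lower, self.upper, self.step = lower, upper, step
        self.n_bins = round((upper - lower) / step)
        self.bins = [0] * self.n_bins
        self.underflow = 0
        self.overflow = 0

    def record(self, value):
        """Book one value; returns True so it can be combined with 'and'."""
        if value < self.lower:
            self.underflow += 1
        elif value >= self.upper:
            self.overflow += 1
        else:
            index = int((value - self.lower) / self.step)
            # Guard against floating-point rounding at the last bin edge.
            self.bins[min(index, self.n_bins - 1)] += 1
        return True

# Analogue of: histogram pt_distribution (0, 150 GeV, 10 GeV)
pt_distribution = Histogram(0.0, 150.0, 10.0)

# Analogue of evaluating  record pt_distribution (...)  once per event;
# the pT values (in GeV) are made up.
for photon_pt in [5.0, 12.5, 17.0, 149.9, 200.0]:
    pt_distribution.record(photon_pt)
```

With bounds 0 and 150 and a step of 10, the sketch books values into 15 bins; the side effect of `record`, not its logical value, is what matters.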

unstable

unstable particle (decay-channels)

Specify that a particle can decay, if it occurs in the final state of a subsequent simulation step. (In the integration step, all final-state particles are considered stable.) The decay channels are processes which should have been declared before by a process command (alternatively, there are options that let WHIZARD take care of this automatically; cf. Sec. 5.8.2). They may be integrated explicitly, otherwise the unstable command will take care of the integration before particle decays are generated. Example:

unstable Z (z_ee, z_jj)

Note that the decay is an on-shell approximation. Alternatively, WHIZARD is capable of generating the final state(s) directly, automatically including the particle as an internal resonance together with irreducible background. Depending on the physical problem and on the complexity of the matrix-element calculation, either option may be more appropriate.

See Sec. ‍5.8.2.

n_events

n_events = integer

Specify the number of events that a subsequent simulation step should produce. By default, simulated events are unweighted. (Unweighting is done by a rejection operation on weighted events, so the usual caveats on event unweighting by a numerical Monte-Carlo generator do apply.) Example:

n_events = 20000

See Sec. ‍5.8.1.
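The rejection step mentioned above can be illustrated by the textbook hit-or-miss procedure. The Python sketch below is a generic model and not WHIZARD's code: the function name `unweight`, the uniformly distributed sample weights, and the fixed seed are assumptions.

```python
import random

def unweight(weighted_events, rng):
    """Rejection unweighting: keep an event with probability weight / max weight."""
    w_max = max(w for _, w in weighted_events)
    accepted = []
    for event, w in weighted_events:
        if rng.random() * w_max < w:
            accepted.append(event)
    return accepted

rng = random.Random(12345)               # fixed seed for reproducibility
# Hypothetical weighted sample: (event label, non-negative weight)
sample = [(i, rng.random()) for i in range(10000)]
unweighted = unweight(sample, rng)

# Expected efficiency is mean weight / max weight, about 0.5 for uniform weights.
efficiency = len(unweighted) / len(sample)
```

One of the usual caveats is visible here: the procedure needs the maximum weight, and its efficiency drops when a few events carry weights much larger than the average.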

simulate

simulate (process-tags)

Generate an event sample. The command allows for analyzing the generated events by the analysis expression. Furthermore, events can be written to file in various formats. Optionally, the partonic events can be showered and hadronized, partly using included external (PYTHIA) or truly external programs called by WHIZARD. Example:

simulate (w_plus_jets) { sample_format = lhef }

See Sec. ‍5.8.1 and Chapter ‍11.

graph

graph tag = histograms-and-plots

Combine existing histograms and plots into a common graph. Also useful for pretty-printing single histograms or plots. Example:

graph comparison {
  $title = "$p_T$ distribution for two different values of $m_h$"
} = hist1 & hist2

See Sec. ‍12.4.

write_analysis

write_analysis (analysis-objects)

Writes out data tables for the specified analysis objects (plots, graphs, histograms). If the argument is empty or absent, write all analysis objects currently available. The tables are available for feeding external programs. Example:

write_analysis

See Sec. ‍5.9.

compile_analysis

compile_analysis (analysis-objects)

Analogous to write_analysis, but the generated data tables are processed by LaTeX and gamelan, which produces PostScript and PDF versions of the displayed data. Example:

compile_analysis

See Sec. ‍5.9.

4.5 Control Structures

Like any complete programming language, SINDARIN provides means for branching and looping the program flow.

4.5.1 Conditionals

if

if logical_expression then statements
elsif logical_expression then statements
else statements
endif

Execute statements conditionally, depending on the value of a logical expression. There may be none or multiple elsif branches, and the else branch is also optional. Example:

if (sqrts > 2 * mtop) then
  integrate (top_pair_production)
else
  printf "Top pair production is not possible"
endif

The current SINDARIN implementation puts some restrictions on the statements that can appear in a conditional. For instance, process definitions must be done unconditionally.

4.5.2 Loops

scan

scan variable = (value-list) { statements }

Execute the statements repeatedly, once for each value of the scan variable. The statements are executed in a local context, analogous to the option statement list for commands. The value list is a comma-separated list of expressions, where each item evaluates to the value that is assigned to ⟨variable⟩ for this iteration.

The type of the variable is not restricted to numeric; scans can be done for various object types. For instance, here is a scan over strings:

scan string $str = ("%.3g", "%.4g", "%.5g") { printf $str (mW) }

The output:

[user variable] $str = "%.3g"
80.4
[user variable] $str = "%.4g"
80.42
[user variable] $str = "%.5g"
80.419
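Since the format strings follow the C printf conventions, the numbers in this output can be cross-checked with any printf-style formatter. The sketch below uses Python's % operator; the value 80.419 for mW is read off the output above and is an assumption about the model default.

```python
# Value of mW in GeV, taken from the scan output above (assumed model default).
mW = 80.419

# Same format strings as in the scan; %g keeps the given number of
# significant digits and drops trailing zeros.
formats = ["%.3g", "%.4g", "%.5g"]
lines = [fmt % mW for fmt in formats]

for fmt, line in zip(formats, lines):
    print(fmt, "->", line)
```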

For a numeric scan variable in particular, there are iterators that implement the usual functionality of for loops. If the scan variable is of type integer, an iterator may take one of the forms

start-value => end-value
start-value => end-value /+ add-step
start-value => end-value /- subtract-step
start-value => end-value /* multiplicator
start-value => end-value // divisor

The iterator can be put in place of an expression in the ⟨value-list⟩. Here is an example:

scan int i = (1, (3 => 5), (10 => 20 /+ 4))

which results in the output

[user variable] i =            1
[user variable] i =            3
[user variable] i =            4
[user variable] i =            5
[user variable] i =           10
[user variable] i =           14
[user variable] i =           18

[Note that the ⟨statements⟩ part of the scan construct may be empty or absent.]
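The expansion of such a value list can be modelled in a few lines of Python. The sketch is an illustration only: the tuple encoding of iterators and the function names are hypothetical, and the stopping rule (stop once the next value passes the end value) is an assumption that is confirmed by the output above only for the default and /+ forms.

```python
import operator

def int_iterator(start, end, op="/+", step=1):
    """Expand 'start => end <op> step' for integers (illustrative model only)."""
    apply = {"/+": operator.add, "/-": operator.sub,
             "/*": operator.mul, "//": operator.floordiv}[op]
    ascending = op in ("/+", "/*")
    values, current = [], start
    while (current <= end) if ascending else (current >= end):
        values.append(current)
        nxt = apply(current, step)
        if nxt == current:        # e.g. step 0 or multiplier 1: avoid looping forever
            break
        current = nxt
    return values

def expand(value_list):
    """Flatten a scan value list of plain integers and iterator tuples."""
    result = []
    for item in value_list:
        if isinstance(item, tuple):
            result.extend(int_iterator(*item))
        else:
            result.append(item)
    return result

# Analogue of: scan int i = (1, (3 => 5), (10 => 20 /+ 4))
values = expand([1, (3, 5), (10, 20, "/+", 4)])
```

For this value list the model yields 1, 3, 4, 5, 10, 14, 18, in agreement with the output shown above.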

For real scan variables, there are even more possibilities for iterators:

start-value => end-value
start-value => end-value /+ add-step
start-value => end-value /- subtract-step
start-value => end-value /* multiplicator
start-value => end-value // divisor
start-value => end-value /+/ n-points-linear
start-value => end-value /*/ n-points-logarithmic

The first variant is equivalent to /+ 1. The /+ and /- operators are intended to add or subtract the given step once for each iteration. Since in floating-point arithmetic this would be plagued by rounding ambiguities, the actual implementation first determines the (integer) number of iterations from the provided step value, then recomputes the step so that the iterations are evenly spaced with the first and last value included.

The /* and // operators are analogous. Here, the initial value is intended to be multiplied by the step value once for each iteration. After determining the integer number of iterations, the actual scan values will be evenly spaced on a logarithmic scale.

Finally, the /+/ and /*/ operators allow you to specify the number of iterations (not counting the initial value) directly. The ⟨start-value⟩ and ⟨end-value⟩ are always included, and the intermediate values will be evenly spaced on a linear (/+/) or logarithmic (/*/) scale.

Example:

scan real mh = (130 GeV,
           (140 GeV => 160 GeV /+ 5 GeV),
           180 GeV,
           (200 GeV => 1 TeV /*/ 10))
  {  integrate (higgs_decay) }
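The spacing rules for real iterators can be sketched in Python. This is a model of the description above rather than the actual implementation: the function names are hypothetical, and rounding to the nearest integer when determining the number of iterations is an assumption.

```python
import math

def linear_points(start, end, n):
    """n iterations after the start value, evenly spaced, end points included."""
    return [start + (end - start) * i / n for i in range(n + 1)]

def log_points(start, end, n):
    """Same, but evenly spaced on a logarithmic scale (start, end > 0)."""
    ratio = end / start
    return [start * ratio ** (i / n) for i in range(n + 1)]

def additive_scan(start, end, step):
    """Model of 'start => end /+ step': fix the count first, then respace."""
    n = max(1, round(abs(end - start) / abs(step)))   # rounding rule is assumed
    return linear_points(start, end, n)

def multiplicative_scan(start, end, factor):
    """Model of 'start => end /* factor' with log-spaced values (factor != 1)."""
    n = max(1, round(abs(math.log(end / start)) / abs(math.log(factor))))
    return log_points(start, end, n)

# Analogue of (140 GeV => 160 GeV /+ 5 GeV)
mh_linear = additive_scan(140.0, 160.0, 5.0)

# Analogue of (200 GeV => 1 TeV /*/ 10), with 1 TeV = 1000 GeV
mh_log = log_points(200.0, 1000.0, 10)
```

In this model the /+ iterator of the example gives the five values 140, 145, 150, 155, 160, and the /*/ iterator gives eleven values from 200 to 1000 with a constant ratio between neighbours.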

4.5.3 Including Files

include

include (file-name)

Include a SINDARIN script from the specified file. The contents must be complete commands; they are compiled and executed as if they were part of the current script. Example:

include ("default_cuts.sin")

4.6 Expressions

SINDARIN expressions are classified by their types. The type of an expression is verified when the script is compiled, before it is executed. This provides some safety against simple coding errors.

Within expressions, grouping is done using ordinary parentheses (). For subevent expressions, use square brackets [].

4.6.1 Numeric

The language supports the classical numeric types

  • int for integer: machine-default, usually 32 bit;
  • real, usually double precision or 64 bit;
  • complex, consisting of real and imaginary part equivalent to a real each.

SINDARIN supports arithmetic expressions similar to conventional languages. In arithmetic expressions, the three numeric types can be mixed as appropriate. The computation essentially follows the rules for mixed arithmetic in Fortran. The arithmetic operators are +, -, *, /, ^. Standard functions such as sin, sqrt, etc. are available. See Sec. ‍5.1.1 to Sec. ‍5.1.3.

Numeric values can be associated with units. Units evaluate to numerical factors, and their use is optional, but they can be useful in the physics context for which WHIZARD is designed. Note that the default energy/mass unit is GeV, and the default unit for cross sections is fbarn.
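The statement that units evaluate to numerical factors can be pictured as follows. The Python sketch assumes only the standard SI prefixes relative to the default unit GeV; which unit names SINDARIN actually accepts is not stated in this section, so the three names below serve as illustration.

```python
# Units modelled as plain multiplicative factors relative to the default
# energy/mass unit (GeV). The set of names is an assumption of this sketch.
GeV = 1.0
MeV = 1.0e-3 * GeV
TeV = 1.0e3 * GeV

sqrts = 1 * TeV          # analogue of: sqrts = 1 TeV
mb = 4.2 * GeV           # analogue of: mb = 4.2 GeV
bare = 4.2               # a number without unit is taken in the default unit

same_value = (mb == bare)
```

In this picture, writing a unit is optional whenever the value is already expressed in the default unit.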

4.6.2 Logical and String

The language also has the following standard types:

  • logical (a.k.a. boolean). Logical variable names have a ? (question mark) as prefix.
  • string (arbitrary length). String variable names have a $ (dollar) sign as prefix.

There are comparisons, logical operations, string concatenation, and a mechanism for formatting objects as strings for output.

4.6.3 Special

Furthermore, SINDARIN provides several data types tailored specifically for Monte Carlo applications:

  • alias objects denote a set of particle species.
  • subevt objects denote a collection of particle momenta within an event. They have their uses in cut and analysis expressions.
  • process objects are generated by a process statement. There are no expressions involving processes, but they are referred to by integrate and simulate commands.
  • model: There is always a current object of type and name model. Several models can be used concurrently by appropriately defining processes, but this happens behind the scenes.
  • beams: Similarly, the current implementation allows only for a single object of this type at a given time, which is assigned by a beams = statement and used by integrate.

In the current implementation, SINDARIN has no container data types derived from basic types, such as lists, arrays, or hashes, and there are no user-defined data types. (The subevt type is a container for particles in the context of events, but there is no type for an individual particle: this is represented as a one-particle subevt.) There are also containers for inclusive processes, which are, however, simply handled as an expansion into several components of a master process tag.

4.7 Variables

SINDARIN supports global variables, variables local to a scoping unit (the option body of a command, the body of a scan loop), and variables local to an expression.

Some variables are predefined by the system (intrinsic variables). They are further separated into independent variables that can be reset by the user, and derived or locked variables that are automatically computed by the program, but not directly user-modifiable. On top of that, the user is free to introduce their own variables (user variables).

The names of numerical variables consist of alphanumeric characters and underscores. The first character must not be a digit. Logical variable names are furthermore prefixed by a ? (question mark) sign, while string variable names begin with a $ (dollar) sign.

Character case does matter. In this manual we follow the convention that variable names consist of lower-case letters, digits, and underscores only, but you may also use upper-case letters if you wish.

Physics models contain their own, specific set of numeric variables (masses, couplings). They are attached to the model where they are defined, so they appear and disappear with the model that is currently loaded. In particular, if two different models contain a variable with the same name, these two variables are nevertheless distinct: setting one doesn’t affect the other. This feature might be called, in computer-science jargon, a mixin.

User variables – global or local – are declared by their type when they are introduced, and acquire an initial value upon declaration. Examples:

  int i = 3
  real my_cut_value = 10 GeV
  complex c = 3 - 4 * I
  logical ?top_decay_allowed = mH > 2 * mtop
  string $hello = "Hello world!"
  alias q = d:u:s:c

An existing user variable can be assigned a new value without a declaration:

  i = i + 1

and it may also be redeclared if the new declaration specifies the same type; this is equivalent to assigning a new value.

Variables local to an expression are introduced by the let ... in construct. Example:

  real a = let int n = 2 in
           x^n + y^n

The explicit int declaration is necessary only if the variable n has not been declared before. An intrinsic variable must not be declared: let mtop = 175.3 GeV in …

let constructs can be concatenated if several local variables need to be assigned: let a = 3 in let b = 4 in expression.

Variables of type subevt can only be defined in let constructs.

Exclusively in the context of particle selections (event analysis), there are observables as special numeric objects. They are used like numeric variables, but they are never declared or assigned. They get their value assigned dynamically, computed from the particle momentum configuration. Hence, they may be understood as (intrinsic and predefined) macros. By convention, observable names begin with a capital letter.

Further macros are

  • cuts and analysis. They are of type logical, and can be assigned an expression by the user. They are evaluated once for each event.
  • scale, factorization_scale and renormalization_scale are real numeric macros which define the energy scale(s) of an event. The latter two override the former. If no scale is defined, the partonic energy is used as the process scale.
  • weight is a real numeric macro. If it is assigned an expression, the expression is evaluated for each valid phase-space point, and the result multiplies the matrix element.
