@section preface_sec Preface: How to use this document

The documentation for %ForceBalance exists in two forms: a web page and
a PDF manual.  They contain equivalent content.  The newest versions
of the software and documentation, along with relevant literature, can
be found on the <a href=https://simtk.org/home/forcebalance/>SimTK website</a>.

<b>Users</b> of the program should read the <em>Introduction</em>,
<em>Installation</em>, <em>Usage</em>, and <em>Tutorial</em> sections on
the main page.

<b>Developers and contributors</b> should read the
Introduction chapter, including the <em>Program Layout</em> and
<em>Creating Documentation</em> sections.  The <em><a href=http://leeping.github.io/forcebalance/doc/html/api/roadmap.html>
API documentation</a></em>, which describes all of the modules, classes and
functions in the program, is intended as a reference
for contributors who are writing code.

%ForceBalance is a work in progress; using the program is nontrivial
and many features are still being actively developed.  Thus, users and
developers are highly encouraged to contact me through
the <a href=https://simtk.org/home/forcebalance/>SimTK website</a>, either by sending me email or posting to the
public forum, in order to get things up and running.

Thanks!

Lee-Ping Wang

@section intro_sec Introduction

Welcome to %ForceBalance! :)

This is a <em> theoretical and computational chemistry </em> program
primarily developed by Lee-Ping Wang and the Wang research group
at UC Davis.  The full list of people who made this project possible
is given in the \ref credits section.

The function of %ForceBalance is <em>automatic force field
optimization</em>.  Here I will provide some background, which for the
sake of brevity and readability will lack precision and details.  In
the future, this documentation will include literature citations which
will guide further reading.

@subsection background Background: Empirical Potentials

In theoretical and computational chemistry, there are many methods for
computing the potential energy of a collection of atoms and molecules
given their positions in space.  For a system of \a N particles, the
potential energy surface (or <em>potential</em> for short) is a
function of the \a 3N variables that specify the atomic coordinates.
The potential is the foundation for many types of atomistic
simulations, including molecular dynamics and Monte Carlo, which are
used to simulate all sorts of chemical and biochemical processes
ranging from protein folding and enzyme catalysis to reactions between
small molecules in interstellar clouds.

The true potential is given by the energy eigenvalue of the
time-independent Schrodinger equation, but since the exact solution
is intractable for virtually all systems of interest, approximate
methods are used.  Some are <em>ab initio</em> methods ('from first
principles') since they are derived directly from approximating
the Schrodinger equation; examples include the independent electron
approximation (Hartree-Fock) and perturbation theory (MP2).  However,
most methods contain some tunable constants or <em>empirical
parameters</em> which are carefully chosen to make the method as
accurate as possible.  Three examples: the widely used B3LYP
approximation in density functional theory (DFT) contains three
parameters, the semiempirical PM3 method has 10-20 parameters per
chemical element, and classical force fields have hundreds to
thousands of parameters.  All such formulations require an accurate
parameterization to properly describe reality.

\image html ladder_sm.png "An arrangement of simulation methods by accuracy vs. computational cost."
\image latex ladder.png "An arrangement of simulation methods by accuracy vs. computational cost." width=10cm

The main audience of %ForceBalance is the scientific community that
uses and develops classical force fields.  These force fields do not
use the Schrodinger equation as a starting point; instead, the
potential is entirely specified using elementary mathematical
functions.  Thus, the rigorous physical foundation is sacrificed but
the computational cost is reduced by a factor of millions, enabling
atomic-resolution simulations of large biomolecules on long timescales
and allowing the study of problems like protein folding.

In classical force fields, relatively few parameters may be determined
directly from experiment - for instance, a chemical bond may be
described using a harmonic spring with the experimental bond length
and vibrational frequency.  More often there is no experimentally
measurable counterpart to a parameter - for example, electrostatic
interactions are often described as Coulomb interactions between pairs
of atomic point "partial charges", but the fractional charge assigned
to each atom has no rigorous experimental or theoretical definition.
To complicate matters further, most molecular motions arise from a
combination of interactions and are sensitive to many parameters at
once - for example, the dihedral interaction term is intended to
govern torsional motion about a bond, but these motions are modulated
by the flexibility of the nearby bond and angle interactions as well
as the nonbonded interactions on either side.
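
As a rough illustration of the two functional forms mentioned above, the
following Python sketch evaluates a harmonic bond energy and a pairwise
Coulomb energy between point partial charges.  The function names, the
unit system (kJ/mol, nm and elementary charges) and the example parameter
values are assumptions made for this illustration; they are not taken from
%ForceBalance or from any published force field.

```python
import math

# Coulomb constant in kJ mol^-1 nm e^-2 (assumed unit system for this sketch).
COULOMB_CONSTANT = 138.935458


def harmonic_bond_energy(r, r0, k):
    """Energy of a harmonic spring, E = 1/2 * k * (r - r0)^2."""
    return 0.5 * k * (r - r0) ** 2


def coulomb_energy(coords, charges):
    """Sum of Coulomb energies over all distinct pairs of point charges."""
    energy = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            rij = math.dist(coords[i], coords[j])
            energy += COULOMB_CONSTANT * charges[i] * charges[j] / rij
    return energy


# Hypothetical parameters: equilibrium length r0 (nm) and force constant k
# (kJ mol^-1 nm^-2) for the bond, and two opposite partial charges 0.3 nm apart.
bond = harmonic_bond_energy(r=0.11, r0=0.10, k=1000.0)
pair = coulomb_energy([(0.0, 0.0, 0.0), (0.3, 0.0, 0.0)], [0.4, -0.4])
print(bond, pair)
```

Here \a r0 and \a k could in principle be taken from experiment, whereas the
partial charges have no experimental counterpart and must be fitted.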

\image html interactions_sm.png "An illustration of some interactions typically found in classical force fields."
\image latex interactions.png "An illustration of some interactions typically found in classical force fields." width=10cm

For all of these reasons, force field parameterization is difficult.
In current practice, parameters are often determined by fitting to
results from other calculations (for example, restrained electrostatic
potential fitting (RESP) for determining the partial charges) or
chosen so that the simulation results match experimental measurements
(for example, adjusting the partial charges on a solvent molecule to
reproduce the bulk dielectric constant).  Published force fields have
been modified by hand over decades to maximize their agreement with
experimental observations (for example, adjusting some parameters in
order to reproduce a particular protein NMR structure) at the expense of
reproducibility.

@subsection mission_statement Purpose and brief description of this program 

Given this background, I can make the following statement.  <b>The
purpose of %ForceBalance is to create force fields by applying a highly
general and systematic process with explicitly specified input data
and optimization methods, paving the way to higher accuracy and
improved reproducibility. </b>

At a high level, %ForceBalance takes an empirical potential and a set
of reference data as inputs, and tunes the parameters such that the
simulations are able to reproduce the data as accurately as possible.
Examples of reference data include energies and forces from high-level
QM calculations, experimentally known molecular properties
(e.g. polarizabilities and multipole moments), and experimentally
measured bulk properties (e.g. density and dielectric constant).

%ForceBalance presents the problem of potential optimization in a
unified and easily extensible framework.  Since there are many
empirical potentials in theoretical chemistry and similarly many types
of reference data, significant effort is taken to provide an
infrastructure which allows a researcher to fit any type of
potential to any type of reference data.

Conceptually, a set of reference data (usually a physical quantity of
some kind), in combination with a method for computing the
corresponding quantity with the force field, is called a
<b>target</b>.  For example:

- A force field can predict the density of a liquid by running NPT
molecular dynamics, and this computed value can be compared against
the experimental density.

- A force field can be used to evaluate the energies and forces at
several molecular geometries, and these can be compared against
energies and forces from higher-level quantum chemistry calculations
using these same geometries.  This is known as <b>force and energy
matching</b>.

- A force field can predict the multipole moments and polarizabilities
of a molecule isolated in vacuum, and these can be compared against
experimental measurements.

Within a target, the accuracy of the force
field can be optimized by tuning the parameters to minimize the
difference between the computed and reference quantities.  One or more
targets can be combined to produce an aggregate
<b>objective function</b> whose domain is the <b>parameter space</b>.
This objective function, which typically depends on the parameters in
a complex way, is minimized using nonlinear optimization algorithms.
The result is a force field which minimizes the errors for all of the targets.
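
The idea of combining targets into one objective function can be sketched
in a few lines of Python.  This is a toy example under stated assumptions:
the force field is a single harmonic bond with parameters \a r0 and \a k,
the two targets and their weights are made up, and a coarse grid search
stands in for the nonlinear optimization algorithms mentioned above.  None
of the function names below belong to the %ForceBalance code.

```python
import itertools


def bond_energy(r, r0, k):
    """Toy force field: a single harmonic bond."""
    return 0.5 * k * (r - r0) ** 2


def make_energy_target(distances, reference_energies):
    """Target 1: squared error between computed and reference energies."""
    def target(params):
        r0, k = params
        return sum((bond_energy(r, r0, k) - e_ref) ** 2
                   for r, e_ref in zip(distances, reference_energies))
    return target


def make_bond_length_target(reference_length):
    """Target 2: squared error in the equilibrium bond length."""
    def target(params):
        r0, _k = params
        return (r0 - reference_length) ** 2
    return target


def objective(params, targets, weights):
    """Aggregate objective: weighted sum of the individual target errors."""
    return sum(w * t(params) for t, w in zip(targets, weights))


def grid_search(targets, weights, r0_candidates, k_candidates):
    """Crude stand-in for a nonlinear optimizer: try every candidate pair."""
    return min(itertools.product(r0_candidates, k_candidates),
               key=lambda p: objective(p, targets, weights))


# Synthetic reference data generated from known parameters (r0=0.10, k=1000).
distances = [0.08, 0.09, 0.10, 0.11, 0.12]
reference_energies = [bond_energy(r, 0.10, 1000.0) for r in distances]
targets = [make_energy_target(distances, reference_energies),
           make_bond_length_target(0.10)]
weights = [1.0, 1.0]

best = grid_search(targets, weights,
                   r0_candidates=[0.09, 0.10, 0.11],
                   k_candidates=[500.0, 1000.0, 1500.0])
print(best, objective(best, targets, weights))
```

Because the reference data were generated from one of the candidate
parameter sets, the search recovers it; with real reference data the
minimum of the objective function is generally nonzero.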

\image html cycle_sm.png "The division of the potential optimization problem into three parts: the force field, targets and optimization algorithm."
\image latex cycle.png "The division of the potential optimization problem into three parts: the force field, targets and optimization algorithm." height=10cm

The problem is now split into three main components: the force field,
the targets, and the optimization algorithm.  %ForceBalance
uses this conceptual division to define three classes with minimal
interdependence.  Thus, if a researcher wishes to explore a new
functional form, incorporate a new type of reference data or try a new
optimization algorithm, they would only need to contribute to one
branch of the program without having to restructure the entire code
base.
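
One way to picture this division is the Python sketch below.  It is only
an illustration of the design idea: all class names, method names and the
simple parameter scan are hypothetical and do not describe the actual
classes in %ForceBalance.

```python
from abc import ABC, abstractmethod


class ForceField:
    """Holds the functional form and its adjustable parameters [r0, k]."""

    def __init__(self, r0, k):
        self.params = [r0, k]

    def bond_energy(self, r):
        r0, k = self.params
        return 0.5 * k * (r - r0) ** 2


class Target(ABC):
    """Reference data plus a way to compute the matching quantity."""

    @abstractmethod
    def error(self, force_field):
        """Return a non-negative error for the given force field."""


class EnergyTarget(Target):
    def __init__(self, distances, reference_energies):
        self.distances = distances
        self.reference_energies = reference_energies

    def error(self, force_field):
        return sum((force_field.bond_energy(r) - e_ref) ** 2
                   for r, e_ref in zip(self.distances, self.reference_energies))


class Optimizer(ABC):
    @abstractmethod
    def run(self, force_field, targets):
        """Adjust force_field.params in place and return the force field."""


class CoordinateScan(Optimizer):
    """Scan candidate values for one parameter at a time, keeping the best."""

    def __init__(self, candidates):
        self.candidates = candidates  # one list of values per parameter

    def run(self, force_field, targets):
        def total():
            return sum(t.error(force_field) for t in targets)

        for index, values in enumerate(self.candidates):
            best_value, best_error = force_field.params[index], total()
            for value in values:
                force_field.params[index] = value
                current = total()
                if current < best_error:
                    best_value, best_error = value, current
            force_field.params[index] = best_value
        return force_field


# Synthetic reference energies from a "true" force field (r0=0.10, k=1000).
distances = [0.08, 0.09, 0.10, 0.11, 0.12]
truth = ForceField(0.10, 1000.0)
target = EnergyTarget(distances, [truth.bond_energy(r) for r in distances])

fitted = CoordinateScan([[0.09, 0.10, 0.11], [500.0, 1000.0, 1500.0]]).run(
    ForceField(0.09, 500.0), [target])
print(fitted.params)
```

In this layout, a new kind of reference data only requires a new subclass
of the target base class, and a new algorithm only requires a new subclass
of the optimizer base class; the other two components stay untouched.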

The scientific problems and concepts that this program is based upon
are further described in my PowerPoint presentations and publications,
which can be found on the <a href=https://simtk.org/home/forcebalance/>SimTK website</a>.

@section credits Credits

- Lee-Ping Wang is the principal developer and author. (2006-2019)

- Yudong Qiu contributed the surface tension target, pure finite-difference
evaluation of thermodynamic property gradients, the optimized geometry target,
the OpenMM implementation of the vibrational frequency target,
the interface to PropertyEstimator, and other
improvements when %ForceBalance is used with the SMIRNOFF force field. (2016-2019)

- Jeffrey Wagner contributed modernized testing infrastructure and continuous integration,
and assisted with developing the interface to the Open Force Field toolkit. (2019)

- Hyesu Jang contributed modernized testing infrastructure with Jeff Wagner. (2019)

- Simon Boothroyd contributed the interface to PropertyEstimator with Yudong Qiu. (2019)

- Arthur Vigil contributed the unit testing framework and many unit tests,
significant improvements to the automatic documentation generation, 
logging of output, graphical user interface, and various code improvements.
(2012-2013)

- Keri McKiernan contributed the lipid bilayer target, the ability to start
from multiple initial conditions at each thermodynamic phase point, and several
other features. (2012-2015)

- Erik Brandt contributed the general thermodynamic property "Thermo" target. (2014)

- Johnny Israeli contributed the XmlScript functionality that allows
adjustable parameters to be read from embedded scripts in OpenMM XML files. (2012)

- John Stoppelman contributed the Reopt.py script to re-compute QM single point energies
at MM optimized minima and also added Psi4 input parsing / writing capabilities in molecule.py. (2020)

- Matt Welborn contributed the parallelization-over-snapshots
functionality in the general force matching module (2011, only
in the older ForTune code).

- Jiahao Chen contributed the call graph generator, the QTPIE
fluctuating-charge force field (which Lee-Ping implemented into
GROMACS), the interface to the MOPAC semiempirical code, and many
helpful discussions in 2009-2011. (Many of these are only in the
older ForTune code.)

- Troy Van Voorhis provided scientific guidance and many of
the central ideas as well as financial support during the time Lee-Ping
was a graduate student (2006-2011).

- Vijay Pande provided scientific guidance and financial support,
and through the SimBios program gave this software a home on the Web
at the <a href=https://simtk.org/home/forcebalance/>SimTK website</a>.

- Todd Martinez provided scientific guidance and financial support.

@section funding Funding

The development of this code has been supported in part by the following grants and awards:

- NIH Grant R01 AI130684-02
- ACS-PRF 58158-DNI6
- NIH Grant U54 GM072970
- Open Force Field Consortium

