===============
 Bout analysis
===============

Here is a brief demonstration of bout analysis with `skdiveMove.bouts`, for
data assumed to be generated by a mixture of random Poisson processes.

Set up the environment.  Consider importing the `logging` module and setting
up a logger to monitor progress in this section.
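
For instance, a minimal setup using only the standard library could look like
the sketch below.  It assumes that INFO-level messages are the ones of
interest; how much `skdiveMove` itself reports at that level is not covered
here, and the logger name `boutsdemo` is hypothetical.

.. code-block:: python

   import logging

   # Route INFO-level and higher messages from every logger (library loggers
   # included) to stderr, prefixed with a timestamp and the logger name.
   # `force=True` replaces handlers already attached to the root logger
   # (requires Python >= 3.8).
   logging.basicConfig(
       level=logging.INFO,
       format="%(asctime)s %(name)s %(levelname)s: %(message)s",
       force=True,
   )

   logger = logging.getLogger("boutsdemo")  # hypothetical logger name
   logger.info("Starting bout analysis")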

.. jupyter-execute::

   # Set up
   import os
   import os.path as osp
   import numpy as np
   import pandas as pd
   import matplotlib.pyplot as plt
   from skdiveMove.tests import diveMove2skd
   import skdiveMove.bouts as skbouts

   # For figure sizes
   _FIG1X1 = (7, 6)
   _FIG1X2 = (12, 5)
   _FIG3X1 = (11, 11)

   pd.set_option("display.precision", 3)
   np.set_printoptions(precision=3, sign="+")
   %matplotlib inline


Calculate postdive duration
===========================

Create a :class:`TDR` object from the example data in `skdiveMove.tests`, and
use it to calculate the necessary dive statistics:

.. jupyter-execute::
   :linenos:

   tdrX = diveMove2skd()
   pars = {"offset_zoc": 3,
           "dry_thr": 70,
           "wet_thr": 3610,
           "dive_thr": 3,
           "dive_model": "unimodal",
           "smooth_par": 0.1,
           "knot_factor": 20,
           "descent_crit_q": 0.01,
           "ascent_crit_q": 0}

   tdrX.calibrate(zoc_method="offset", offset=pars["offset_zoc"],
                  dry_thr=pars["dry_thr"], wet_thr=pars["wet_thr"],
                  dive_thr=pars["dive_thr"],
                  dive_model=pars["dive_model"],
                  smooth_par=pars["smooth_par"],
                  knot_factor=pars["knot_factor"],
                  descent_crit_q=pars["descent_crit_q"],
                  ascent_crit_q=pars["ascent_crit_q"])
   stats = tdrX.dive_stats()
   stamps = tdrX.stamp_dives(ignore_z=True)
   stats_tab = pd.concat((stamps, stats), axis=1)
   stats_tab.info()

Extract the postdive durations for further analysis, and take the absolute
differences between consecutive values.
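
The next cell takes absolute differences between consecutive postdive
durations (in seconds) with `diff` and `abs`, then drops differences of
2000 s or more.  The same logic in plain Python, on hypothetical values
rather than the demo data, is sketched below:

.. code-block:: python

   # Hypothetical postdive durations (s), in dive order
   postdive_secs = [12.0, 30.0, 25.0, 2600.0, 40.0, 38.0]

   # Absolute differences between consecutive values; the first value has
   # no predecessor, mirroring the `[1:]` that drops the leading NaN that
   # pandas `diff()` produces
   diffs = [abs(b - a) for a, b in zip(postdive_secs, postdive_secs[1:])]

   # Keep only differences below the 2000 s cutoff used in the demo
   diffs_kept = [d for d in diffs if d < 2000]
   print(diffs_kept)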

.. jupyter-execute::
   :linenos:

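   # Postdive durations (Timedelta) of dives in phase 4
   postdives = stats_tab["postdive_dur"][stats_tab["phase_id"] == 4]
   # Absolute differences (s) between consecutive postdive durations;
   # `[1:]` drops the leading NaN produced by `diff()`
   postdives_diff = postdives.dt.total_seconds().diff()[1:].abs()
   # Remove isolated dives
   postdives_diff = postdives_diff[postdives_diff < 2000]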


Non-linear least squares via "broken-stick" model
=================================================

`skdiveMove` provides the :class:`BoutsNLS` class for fitting non-linear
least squares (NLS) models to a modified histogram of a given variable.

The first step is to build a modified histogram of the postdive duration
differences, which requires choosing a bin width (0.1 s below).
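
A common form of such a histogram, assumed in this sketch, plots the log of
the count per unit bin width against bin midpoints, so that each exponential
process appears as a roughly straight segment.  The sketch anchors bins at 0
and simply skips empty bins; `BoutsNLS` may place bin edges and treat empty
bins differently, and `log_freq_histogram` is a hypothetical helper:

.. code-block:: python

   import math
   from collections import Counter

   def log_freq_histogram(x, bw):
       """Return (midpoints, log frequencies) for non-empty bins of width bw.

       Bins are [0, bw), [bw, 2*bw), ...  Counts are divided by the bin
       width before taking logs; empty bins (log of 0) are skipped.
       """
       counts = Counter(math.floor(xi / bw) for xi in x)
       mids, lnfreq = [], []
       for k in sorted(counts):
           mids.append((k + 0.5) * bw)
           lnfreq.append(math.log(counts[k] / bw))
       return mids, lnfreq

   mids, lnfreq = log_freq_histogram([1, 3, 12, 14, 15, 40], bw=10)
   print(mids, lnfreq)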

.. jupyter-execute::
   :linenos:

   postdives_nlsbouts = skbouts.BoutsNLS(postdives_diff, 0.1)
   print(postdives_nlsbouts)


Two-process model
~~~~~~~~~~~~~~~~~

Assuming a two-process model, calculate starting values, supplying 50 s as a
guess for the break point between the two processes.
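
Under the broken-stick approach, a straight line is fitted to the log
frequencies on each side of the break point.  Within the range dominated by
process i, the log frequency is approximately log(a_i lambda_i) - lambda_i t,
so the negated slope gives a starting value for lambda_i and
exp(intercept) / lambda_i gives one for a_i.  A plain-Python sketch of that
idea (helper names are hypothetical, and `init_pars` may differ in details
such as how bins at the break are assigned):

.. code-block:: python

   import math

   def fit_line(x, y):
       """Ordinary least-squares slope and intercept of y on x."""
       mx, my = sum(x) / len(x), sum(y) / len(y)
       sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
       sxx = sum((xi - mx) ** 2 for xi in x)
       slope = sxy / sxx
       return slope, my - slope * mx

   def broken_stick_init(mids, lnfreq, x_break):
       """Starting (a_i, lambda_i) for each segment delimited by x_break."""
       edges = [-math.inf] + list(x_break) + [math.inf]
       pars = []
       for lo, hi in zip(edges, edges[1:]):
           seg = [(m, f) for m, f in zip(mids, lnfreq) if lo < m <= hi]
           if len(seg) < 2:
               raise ValueError("each segment needs at least two bins")
           slope, intercept = fit_line([m for m, _ in seg],
                                       [f for _, f in seg])
           lam = -slope                    # decay rate of this process
           pars.append((math.exp(intercept) / lam, lam))
       return pars

   # Synthetic log frequencies lying exactly on two lines
   demo_mids = [5, 15, 25, 35, 45, 55, 65, 75, 85]
   demo_lnfreq = ([math.log(1000 * 0.1) - 0.1 * t for t in demo_mids[:5]]
                  + [math.log(500 * 0.01) - 0.01 * t for t in demo_mids[5:]])
   print(broken_stick_init(demo_mids, demo_lnfreq, [50]))

With this synthetic input, the sketch recovers a of approximately 1000 and
500, and lambda of approximately 0.1 and 0.01.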

.. jupyter-execute::
   :linenos:

   fig, ax = plt.subplots(figsize=_FIG1X1)
   init_pars2 = postdives_nlsbouts.init_pars([50], plot=True, ax=ax)

Fit the two-process model.
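
The NLS fit minimizes the sum of squared differences between the log
frequencies of the modified histogram and a mixture of exponential
processes.  A sketch assuming the common parameterization
log(sum_i a_i lambda_i exp(-lambda_i t)); the layout of the fitted
coefficients is not assumed here, and the parameter values are hypothetical:

.. code-block:: python

   import math

   def nls_lnfreq(t, a, lambdas):
       """Model log frequency at t: log(sum_i a_i*lambda_i*exp(-lambda_i*t))."""
       return math.log(sum(ai * li * math.exp(-li * t)
                           for ai, li in zip(a, lambdas)))

   def nls_sse(mids, lnfreq, a, lambdas):
       """Residual sum of squares that the NLS fit minimizes."""
       return sum((f - nls_lnfreq(m, a, lambdas)) ** 2
                  for m, f in zip(mids, lnfreq))

   # Hypothetical two-process parameters: a fast and a slow process
   a, lambdas = [1000, 500], [0.1, 0.01]
   print(nls_lnfreq(0.0, a, lambdas))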

.. jupyter-execute::
   :linenos:

   coefs2, pcov2 = postdives_nlsbouts.fit(init_pars2)
   # Coefficients
   print(coefs2)

.. jupyter-execute::
   :linenos:

   # Covariance between parameters
   print(pcov2)

Calculate the bout-ending criterion.
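
For two processes, the criterion is commonly defined as the time at which
both fitted components contribute equal frequencies, which gives
log(a1 lambda1 / (a2 lambda2)) / (lambda1 - lambda2).  The sketch below
applies that formula to each pair of adjacent processes, returning one value
per pair; treating multi-process criteria this way is an assumption, not
necessarily what `bec` does, and `bec_nls` is a hypothetical helper:

.. code-block:: python

   import math

   def bec_nls(a, lambdas):
       """Criteria between adjacent processes, ordered fast to slow.

       For processes i and j = i + 1, solve
       a_i*l_i*exp(-l_i*t) = a_j*l_j*exp(-l_j*t) for t.
       """
       return [math.log((a[i] * lambdas[i]) / (a[i + 1] * lambdas[i + 1]))
               / (lambdas[i] - lambdas[i + 1])
               for i in range(len(a) - 1)]

   # Two processes give a single criterion
   print(bec_nls([1000, 500], [0.1, 0.01]))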

.. jupyter-execute::
   :linenos:

   # `bec` returns an ndarray; a two-process model yields a single value
   print("bec = {[0]:.2f}".format(postdives_nlsbouts.bec(coefs2)))

Plot the fit.

.. jupyter-execute::
   :linenos:

   fig, ax = plt.subplots(figsize=_FIG1X1)
   postdives_nlsbouts.plot_fit(coefs2, ax=ax);


Three-process model
~~~~~~~~~~~~~~~~~~~

Attempt to discern three processes in the data.

.. jupyter-execute::
   :linenos:

   fig, ax = plt.subplots(figsize=_FIG1X1)
   init_pars3 = postdives_nlsbouts.init_pars([50, 550], plot=True, ax=ax)

Fit the three-process model.

.. jupyter-execute::
   :linenos:

   coefs3, pcov3 = postdives_nlsbouts.fit(init_pars3)
   # Coefficients
   print(coefs3)

.. jupyter-execute::
   :linenos:

   # Covariance between parameters
   print(pcov3)

Plot the fit.

.. jupyter-execute::
   :linenos:

   fig, ax = plt.subplots(figsize=_FIG1X1)
   postdives_nlsbouts.plot_fit(coefs3, ax=ax);

Compare the cumulative frequency distributions of two- vs three-process
models.
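
An empirical cumulative distribution function (ECDF) can be compared with the
CDF implied by the fitted mixture.  The sketch below assumes the NLS
coefficients a_i are converted to mixing proportions p_i = a_i / sum(a),
which holds if histogram frequencies are proportional to the mixture density;
how `plot_ecdf` does this internally is not shown here, and the helper names
are hypothetical:

.. code-block:: python

   import math

   def ecdf(x):
       """Empirical CDF as (sorted values, cumulative proportions)."""
       xs = sorted(x)
       n = len(xs)
       return xs, [(i + 1) / n for i in range(n)]

   def mixture_cdf(t, a, lambdas):
       """CDF of an exponential mixture with weights p_i = a_i / sum(a)."""
       total = sum(a)
       return 1.0 - sum((ai / total) * math.exp(-li * t)
                        for ai, li in zip(a, lambdas))

   xs, ps = ecdf([5, 1, 3, 2])
   model = [mixture_cdf(t, [1000, 500], [0.1, 0.01]) for t in xs]
   print(list(zip(xs, ps, model)))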

.. jupyter-execute::
   :linenos:

   fig, axs = plt.subplots(1, 2, figsize=_FIG1X2)
   postdives_nlsbouts.plot_ecdf(coefs2, ax=axs[0])
   postdives_nlsbouts.plot_ecdf(coefs3, ax=axs[1]);

Judging from these plots, the three-process model does not seem appropriate
for these data.


Maximum likelihood estimation
=============================

Another way to model Poisson mixtures, which does not rely on a subjectively
constructed histogram and involves fewer parameters, is to fit a maximum
likelihood model (MLM), i.e. to estimate the parameters by maximum
likelihood.  This approach is available in :class:`BoutsMLE`.
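
The likelihood is evaluated on the raw observations, without binning.  For
two processes, the density is p lambda1 exp(-lambda1 t) +
(1 - p) lambda2 exp(-lambda2 t), with three parameters (p, lambda1, lambda2)
instead of the four (a1, lambda1, a2, lambda2) of a two-process NLS model of
the usual form.  A minimal sketch of the negative log-likelihood that an
optimizer would minimize (the function name is hypothetical):

.. code-block:: python

   import math

   def negloglik2(p, lambda1, lambda2, x):
       """Negative log-likelihood of a two-process exponential mixture."""
       return -sum(math.log(p * lambda1 * math.exp(-lambda1 * t)
                            + (1 - p) * lambda2 * math.exp(-lambda2 * t))
                   for t in x)

   print(negloglik2(0.7, 0.1, 0.01, [3.0, 8.0, 120.0, 400.0]))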

Set up an instance.

.. jupyter-execute::
   :linenos:

   postdives_mlebouts = skbouts.BoutsMLE(postdives_diff, 0.1)
   print(postdives_mlebouts)

As before, assuming a two-process model, calculate starting values.

.. jupyter-execute::
   :linenos:

   fig, ax = plt.subplots(figsize=_FIG1X1)
   init_pars = postdives_mlebouts.init_pars([50], plot=True, ax=ax)

Fit the two-process model.  Supplying reasonable bounds is optional, but
important: they help the optimization algorithm, which may otherwise fail to
converge.  The fitting is done in two steps, first with and then without a
reparameterized log-likelihood function, so two sets of bounds are required,
one for each step.
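
A common reason for such a two-step scheme, assumed in this sketch, is that
the first fit works on a transformed scale where every parameter is
unconstrained (logit of p, log of each rate), and the second refines the
estimates on the original scale.  The transformation actually used by
`BoutsMLE.fit` is not documented here, so treat the helpers below as
hypothetical:

.. code-block:: python

   import math

   def to_unconstrained(p, lambda1, lambda2):
       """Map 0 < p < 1 and lambda > 0 onto the real line (logit and log)."""
       return math.log(p / (1 - p)), math.log(lambda1), math.log(lambda2)

   def from_unconstrained(theta, eta1, eta2):
       """Inverse of to_unconstrained: logistic for p, exp for the rates."""
       return 1 / (1 + math.exp(-theta)), math.exp(eta1), math.exp(eta2)

   def negloglik2_reparam(theta, eta1, eta2, x):
       """Two-process negative log-likelihood on the unconstrained scale."""
       p, lambda1, lambda2 = from_unconstrained(theta, eta1, eta2)
       return -sum(math.log(p * lambda1 * math.exp(-lambda1 * t)
                            + (1 - p) * lambda2 * math.exp(-lambda2 * t))
                   for t in x)

   print(from_unconstrained(*to_unconstrained(0.3, 0.05, 0.002)))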

.. jupyter-execute::
   :linenos:

   # Bounds for the first fit (reparameterized log-likelihood)
   p_bnd = (-2, None)                 # bounds for `p`
   lda1_bnd = (-5, None)              # bounds for `lambda1`
   lda2_bnd = (-10, None)             # bounds for `lambda2`
   bnd1 = (p_bnd, lda1_bnd, lda2_bnd)
   # Bounds for the second fit (all parameters strictly positive)
   p_bnd = (1e-8, None)
   lda1_bnd = (1e-8, None)
   lda2_bnd = (1e-8, None)
   bnd2 = (p_bnd, lda1_bnd, lda2_bnd)
   fit1, fit2 = postdives_mlebouts.fit(
       init_pars,
       fit1_opts=dict(method="L-BFGS-B", bounds=bnd1),
       fit2_opts=dict(method="L-BFGS-B", bounds=bnd2))

.. jupyter-execute::
   :linenos:

   # First fit
   print(fit1)

.. jupyter-execute::
   :linenos:

   # Second fit
   print(fit2)

Calculate the bout-ending criterion (BEC).
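
For the maximum likelihood fit, the analogous criterion is the time at which
the two weighted components of the mixture density are equal,
log(p lambda1 / ((1 - p) lambda2)) / (lambda1 - lambda2).  A sketch assuming
`bec` uses this standard definition, with hypothetical parameter values and
a hypothetical helper name:

.. code-block:: python

   import math

   def bec_mle(p, lambda1, lambda2):
       """Solve p*l1*exp(-l1*t) = (1 - p)*l2*exp(-l2*t) for t."""
       return (math.log((p * lambda1) / ((1 - p) * lambda2))
               / (lambda1 - lambda2))

   t_bec = bec_mle(0.7, 0.1, 0.01)
   print("bec = {:.2f}".format(t_bec))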

.. jupyter-execute::
   :linenos:

   print("bec = {:.2f}".format(postdives_mlebouts.bec(fit2)))

Plot the fit.

.. jupyter-execute::
   :linenos:

   fig, ax = plt.subplots(figsize=_FIG1X1)
   postdives_mlebouts.plot_fit(fit2, ax=ax);

Compare the cumulative frequency distributions of the NLS and MLM estimates.

.. jupyter-execute::
   :linenos:

   fig, axs = plt.subplots(1, 2, figsize=_FIG1X2)
   postdives_nlsbouts.plot_ecdf(coefs2, ax=axs[0])
   axs[0].set_title("NLS")
   postdives_mlebouts.plot_ecdf(fit2, ax=axs[1])
   axs[1].set_title("MLM");

Label bouts using the BEC from the MLM fit above.  Note that the `Timedelta`
values must be converted to total seconds to allow comparison with the BEC.
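
The idea behind labelling can be sketched as follows: convert each duration
to seconds, compare each value with the previous one, and start a new bout
whenever the change exceeds the BEC.  This simplified rule, including the use
of the absolute change, is an illustration only and not necessarily the rule
`label_bouts` applies internally; `label_by_bec` is a hypothetical helper:

.. code-block:: python

   from datetime import timedelta

   def label_by_bec(durations, bec):
       """Bout number for each duration, starting a new bout on large changes."""
       secs = [d.total_seconds() for d in durations]  # timedelta -> seconds
       labels, bout = [], 1
       for i, s in enumerate(secs):
           # New bout when the absolute change from the previous value
           # exceeds the bout-ending criterion
           if i > 0 and abs(s - secs[i - 1]) > bec:
               bout += 1
           labels.append(bout)
       return labels

   example = [timedelta(seconds=s) for s in (10, 20, 15, 300, 310, 20)]
   print(label_by_bec(example, bec=50))

A pandas `Timedelta` is a subclass of `datetime.timedelta`, so it also
provides `total_seconds()`.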

.. jupyter-execute::
   :linenos:

   bec = postdives_mlebouts.bec(fit2)
   skbouts.label_bouts(postdives.dt.total_seconds(), bec, as_diff=True)

Feel free to download a copy of this demo
(:jupyter-download:script:`boutsdemo`).
