# -*- coding: utf-8 -*-
# The MIT License (MIT)
#
# Copyright (c) 2016 Peter Foster <pyitlib@gmx.us>
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
"""
This module implements various information-theoretic quantities for discrete
random variables.

For ease of reference, function names adhere to the following convention:

* Function names beginning with "entropy" : Entropy measures
* Function names beginning with "information" : Mutual information measures
* Function names beginning with "divergence" : Divergence measures
* Function names ending with "pmf" : Functions operating on arrays of
  probability mass assignments (as opposed to realisations of random
  variables)

================================================== ===================== ============== ======== ======== =======================
Function                                           Generalises           Non-negativity Symmetry Identity Metric properties
================================================== ===================== ============== ======== ======== =======================
:meth:`divergence_jensenshannon`                                         Yes            Yes      Yes      Square root is a metric
:meth:`divergence_jensenshannon_pmf`                                     Yes            Yes      Yes      Square root is a metric
:meth:`divergence_kullbackleibler`                                       Yes            No       Yes
:meth:`divergence_kullbackleibler_pmf`                                   Yes            No       Yes
:meth:`divergence_kullbackleibler_symmetrised`                           Yes            Yes      Yes
:meth:`divergence_kullbackleibler_symmetrised_pmf`                       Yes            Yes      Yes
:meth:`entropy`                                                          Yes
:meth:`entropy_conditional`                                              Yes            No       No
:meth:`entropy_cross`                                                    Yes            No       No
:meth:`entropy_cross_pmf`                                                Yes            No       No
:meth:`entropy_joint`                                                    Yes            Yes      No
:meth:`entropy_pmf`                                                      Yes
:meth:`entropy_residual`                           information_variation Yes            Yes      No
:meth:`information_binding`                        information_mutual    Yes            Yes      No
:meth:`information_co`                             information_mutual    No             No       No
:meth:`information_enigmatic`                                            No             Yes      No
:meth:`information_exogenous_local`                                      Yes            Yes      No
:meth:`information_interaction`                    information_mutual    No             No       No
:meth:`information_lautum`                                               Yes            No       No
:meth:`information_multi`                          information_mutual    Yes            Yes      No
:meth:`information_mutual`                                               Yes            Yes      No
:meth:`information_mutual_conditional`                                   Yes            No       No
:meth:`information_mutual_normalised`                                    Yes            See docs No       See docs
:meth:`information_variation`                                            Yes            Yes      No       Is a metric
================================================== ===================== ============== ======== ======== =======================
.. rubric:: References
.. [AbPl12] Abdallah, S.A.; Plumbley, M.D.: A measure of statistical \
complexity based on predictive information with application to finite spin \
systems. In: Physics Letters A, Vol. 376, No. 4, 2012, P. 275-281.
.. [Bell03] Bell, A.J.: The co-information lattice. In: Proceedings of the \
International Workshop on Independent Component Analysis and Blind Signal \
Separation. 2003.
.. [CoTh06] Cover, T.M.; Thomas, J.A.: Elements of information theory \
(2nd ed.). John Wiley & Sons, 2006.
.. [Croo15] Crooks, G.E.: On measures of entropy and information. \
http://threeplusone.com/info, retrieved 2017-03-16.
.. [GaSa95] Gale, W.A.; Sampson, G.: Good-Turing frequency estimation \
without tears. In: Journal of Quantitative Linguistics, \
Vol. 2, No. 3, 1995, P. 217-237.
.. [Han78] Han, T.S.: Nonnegative entropy measures of multivariate symmetric \
correlations. In: Information and Control, Vol. 36, 1978, P. 133-156.
.. [HaSt09] Hausser, J.; Strimmer, K.: Entropy inference and the James-Stein \
estimator, with application to nonlinear gene association networks. \
In: Journal of Machine Learning Research, Vol. 10, 2009, P. 1469-1484.
.. [JaBr03] Jakulin, A.; Bratko, I.: Quantifying and visualizing attribute \
interactions. arXiv preprint cs/0308002, 2003.
.. [JaEC11] James, R.G.; Ellison, C.J.; Crutchfield, J.P.: Anatomy of a bit: \
Information in a time series observation. In: Chaos: An Interdisciplinary \
Journal of Nonlinear Science, Vol. 21, No. 3, 2011.
.. [Lin91] Lin, J.: Divergence measures based on the Shannon entropy. \
In: IEEE Transactions on Information Theory, Vol. 37, No. 1, 1991, P. 145-151.
.. [Meil03] Meilă, M.: Comparing clusterings by the variation of \
information. In: Learning theory and kernel machines. \
Springer, 2003, P. 173-187.
.. [Murp12] Murphy, K. P.: Machine learning: a probabilistic perspective. \
MIT press, 2012.
.. [PaVe08] Palomar, D. P.; Verdú, S.: Lautum information. In: IEEE \
Transactions on Information Theory, Vol. 54, No. 3, 2008, P. 964-975.
.. [StVe98] Studený, M.; Vejnarová, J.: The multiinformation function \
as a tool for measuring stochastic dependence. In: Learning in graphical \
models. Springer Netherlands, 1998, P. 261-297.
.. [VeWe06] Verdú, S.; Weissman, T.: Erasure entropy. In: Proc. IEEE \
International Symposium on Information Theory, 2006, P. 98-102.
.. [Wata60] Watanabe, S.: Information theoretical analysis of multivariate \
correlation. In: IBM Journal of Research and Development, \
Vol. 4, No. 1, 1960, P. 66-82.
"""
from __future__ import division
from builtins import zip
from builtins import str
from builtins import range
import warnings
import numpy as np
import sklearn.preprocessing
import pandas as pd
NONE_REPLACEMENT = -32768
# Aims of project: Comprehensive, Simple-to-use (avoid lots of function calls,
# prefer flags, convenient defaults for possible interactive use). Focus on
# data analysis. Documentation.
# TODO Add guidance on which estimator to use (within module doc)
# TODO Add notes on interpretation of each measure (within each function doc)
# TODO Add basic equalities and properties, followed by interpretation (within
# each function doc)
# TODO Note about which measures are metrics (within each function doc).
# TODO Add information diagrams (within each function doc).
# TODO Implement joint observation mapping function
# encode/map_joint_observations. This works by sorting and returning unique
# observations, similar to entropy_joint()
# TODO Implement generalised Jensen-Shannon divergence
# TODO Information bottleneck and deterministic information bottleneck (see
# Strouse and Schwab 2016). NB Can be used in either a supervised or an
# unsupervised manner. Implement in a separate module. Implement the earlier
# hierarchical agglomerative clustering approach as well? The approaches
# should also incorporate a 'de-noising' capability: with probability p,
# corrupt observations with a random symbol. The learnt model should also
# have a way of discarding unused input features (feature selection).
# TODO Re-arrange functions based on dependencies
# TODO Add information in documentation on when quantities are maximised or
# minimised
def entropy_residual(X, base=2, fill_value=-1, estimator='ML',
Alphabet_X=None, keep_dims=False):
"""
Returns the estimated residual entropy [JaEC11] (also known as erasure
entropy [VeWe06]) for an array X containing realisations of discrete random
variables.
**Mathematical definition**:
Given discrete random variables :math:`X_1, \\ldots, X_n`, the residual
entropy :math:`R(X_1, \\ldots, X_n)` is defined as:
.. math::
R(X_1, \\ldots, X_n) = H(X_1, \\ldots, X_n) - B(X_1, \\ldots, X_n)
where :math:`H(\\cdot, \\ldots, \\cdot)` denotes the joint entropy and
where :math:`B(\\cdot, \\ldots, \\cdot)` denotes the binding information.
**Estimation**:
Residual entropy is estimated based on frequency tables, using the
following functions:
entropy_joint()
entropy()
See below for a list of available estimators. Note that although residual
entropy is a non-negative quantity, depending on the chosen estimator
the obtained estimate may be negative.
**Parameters**:
X : numpy array (or array-like object such as a list of immutables, as \
accepted by np.array())
An array containing discrete random variable realisations. Successive
realisations of a random variable are indexed by the last axis in the
array; multiple random variables may be specified using preceding axes.
When X.ndim==1, returns a scalar and is equivalent to entropy(). When
X.ndim>1, returns a scalar based on jointly considering all random
variables indexed in the array. X may not contain (floating point) NaN
values. Missing data may be specified using numpy masked arrays, as
well as using standard numpy array/array-like objects; see below
for details.
base : float
The desired logarithmic base (default 2).
fill_value : object
It is possible to specify missing data using numpy masked arrays,
pandas Series/DataFrames, as well as using standard numpy
array/array-like objects with assigned placeholder values. When using
numpy masked arrays, this function invokes np.ma.filled() internally,
so that missing data are represented with the array's object-internal
placeholder value fill_value (this function's fill_value parameter is
ignored in such cases). When using pandas Series/DataFrames, an initial
conversion to a numpy masked array is performed. When using standard
numpy array/array-like objects, this function's fill_value parameter is
used to specify the placeholder value for missing data (defaults to
-1).
Data equal to the placeholder value are subsequently ignored.
estimator : str or float
The desired estimator (see above for details on estimators). Possible
values are:
*'ML' (the default value)* : Maximum likelihood estimator.
*any floating point value* : Maximum a posteriori estimator using
Dirichlet prior (equivalent to maximum likelihood with pseudo-count
for each outcome as specified).
*'PERKS'* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to 1/L, where L is the number of possible outcomes).
*'MINIMAX'* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to sqrt(N)/L, where N is the total number of
realisations and where L is the number of possible outcomes).
*'JAMES-STEIN'* : James-Stein estimator [HaSt09].
*'GOOD-TURING'* : Good-Turing estimator [GaSa95].
Alphabet_X : numpy array (or array-like object such as a list of \
immutables, as accepted by np.array())
An array specifying the alphabet/alphabets of possible outcomes that
random variable realisations in array X may assume. Defaults to None,
in which case the alphabet/alphabets of possible outcomes is/are
implicitly based on the observed outcomes in array X, with no additional,
unobserved outcomes. In combination with any estimator other than
maximum likelihood, it may be useful to specify alphabets including
unobserved outcomes. For such cases, successive possible outcomes of a
random variable are indexed by the last axis in Alphabet_X; multiple
alphabets may be specified using preceding axes, with the requirement
X.shape[:-1]==Alphabet_X.shape[:-1]. Alphabets of different sizes may
be specified either using numpy masked arrays, or by padding with the
chosen placeholder fill_value.
NB: When specifying multiple alphabets, an alphabet of possible joint
outcomes is always implicit from the alphabets of possible (marginal)
outcomes in Alphabet_X. For example, specifying
Alphabet_X=np.array(((1,2),(1,2))) implies an alphabet of possible
joint outcomes np.array(((1,1,2,2),(1,2,1,2))).
keep_dims : boolean
When set to True, an additional dimension of length one is appended to
the returned array, facilitating any broadcast operations required by
the user (defaults to False).
**Implementation notes**:
Before estimation, outcomes are mapped to the set of non-negative integers
internally, with the value -1 representing missing data. To avoid this
internal conversion step, supply integer data and use the default fill
value -1.
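
**Example**:
A minimal usage sketch (hypothetical data; assumes the package is
importable as pyitlib and relies on the default ML estimator):

>>> import numpy as np
>>> from pyitlib import discrete_random_variable as drv
>>> X = np.array(((0, 0, 1, 1), (0, 1, 0, 1)))  # two independent fair bits
>>> R = drv.entropy_residual(X)

Since the two variables share no information, the binding information is
0 bits, so the expected estimate R equals the joint entropy of 2 bits.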
"""
H_joint = entropy_joint(X, base, fill_value, estimator, Alphabet_X)
R = H_joint - information_binding(X, base, fill_value, estimator,
Alphabet_X)
if keep_dims:
R = R[..., np.newaxis]
return R
def information_exogenous_local(X, base=2, fill_value=-1, estimator='ML',
Alphabet_X=None, keep_dims=False):
"""
Returns the estimated exogenous local information [JaEC11] for an array X
containing realisations of discrete random variables.
**Mathematical definition**:
Given discrete random variables :math:`X_1, \\ldots, X_n`, the exogenous
local information :math:`W(X_1, \\ldots, X_n)` is defined as:
.. math::
W(X_1, \\ldots, X_n) = T(X_1, \\ldots, X_n) + B(X_1, \\ldots, X_n)
where :math:`T(\\cdot, \\ldots, \\cdot)` denotes the multi-information and
where :math:`B(\\cdot, \\ldots, \\cdot)` denotes the binding information.
**Estimation**:
Exogenous local information is estimated based on frequency tables, using
the following functions:
entropy_joint()
entropy()
See below for a list of available estimators. Note that although exogenous
local information is a non-negative quantity, depending on the chosen
estimator the obtained estimate may be negative.
**Parameters**:
X : numpy array (or array-like object such as a list of immutables, as \
accepted by np.array())
An array containing discrete random variable realisations. Successive
realisations of a random variable are indexed by the last axis in the
array; multiple random variables may be specified using preceding axes.
When X.ndim==1, returns the scalar 0. When X.ndim>1, returns a scalar
based on jointly considering all random variables indexed in the array.
X may not contain (floating point) NaN values. Missing data may be
specified using numpy masked arrays, as well as using standard
numpy array/array-like objects; see below for details.
base : float
The desired logarithmic base (default 2).
fill_value : object
It is possible to specify missing data using numpy masked arrays,
pandas Series/DataFrames, as well as using standard numpy
array/array-like objects with assigned placeholder values. When using
numpy masked arrays, this function invokes np.ma.filled() internally,
so that missing data are represented with the array's object-internal
placeholder value fill_value (this function's fill_value parameter is
ignored in such cases). When using pandas Series/DataFrames, an initial
conversion to a numpy masked array is performed. When using standard
numpy array/array-like objects, this function's fill_value parameter is
used to specify the placeholder value for missing data (defaults to
-1).
Data equal to the placeholder value are subsequently ignored.
estimator : str or float
The desired estimator (see above for details on estimators). Possible
values are:
*'ML' (the default value)* : Maximum likelihood estimator.
*any floating point value* : Maximum a posteriori estimator using
Dirichlet prior (equivalent to maximum likelihood with pseudo-count
for each outcome as specified).
*'PERKS'* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to 1/L, where L is the number of possible outcomes).
*'MINIMAX'* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to sqrt(N)/L, where N is the total number of
realisations and where L is the number of possible outcomes).
*'JAMES-STEIN'* : James-Stein estimator [HaSt09].
*'GOOD-TURING'* : Good-Turing estimator [GaSa95].
Alphabet_X : numpy array (or array-like object such as a list of \
immutables, as accepted by np.array())
An array specifying the alphabet/alphabets of possible outcomes that
random variable realisations in array X may assume. Defaults to None,
in which case the alphabet/alphabets of possible outcomes is/are
implicitly based on the observed outcomes in array X, with no additional,
unobserved outcomes. In combination with any estimator other than
maximum likelihood, it may be useful to specify alphabets including
unobserved outcomes. For such cases, successive possible outcomes of a
random variable are indexed by the last axis in Alphabet_X; multiple
alphabets may be specified using preceding axes, with the requirement
X.shape[:-1]==Alphabet_X.shape[:-1]. Alphabets of different sizes may
be specified either using numpy masked arrays, or by padding with the
chosen placeholder fill_value.
NB: When specifying multiple alphabets, an alphabet of possible joint
outcomes is always implicit from the alphabets of possible (marginal)
outcomes in Alphabet_X. For example, specifying
Alphabet_X=np.array(((1,2),(1,2))) implies an alphabet of possible
joint outcomes np.array(((1,1,2,2),(1,2,1,2))).
keep_dims : boolean
When set to True, an additional dimension of length one is appended to
the returned array, facilitating any broadcast operations required by
the user (defaults to False).
**Implementation notes**:
Before estimation, outcomes are mapped to the set of non-negative integers
internally, with the value -1 representing missing data. To avoid this
internal conversion step, supply integer data and use the default fill
value -1.
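
**Example**:
A brief usage sketch with hypothetical data, assuming the package is
importable as pyitlib and the default ML estimator:

>>> import numpy as np
>>> from pyitlib import discrete_random_variable as drv
>>> X = np.array(((0, 0, 1, 1), (0, 0, 1, 1)))  # two identical fair bits
>>> W = drv.information_exogenous_local(X)

Here the multi-information T is 1 bit and the binding information B is
1 bit, so the expected estimate is W = T + B = 2 bits.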
"""
W = information_binding(X, base, fill_value, estimator, Alphabet_X) + \
information_multi(X, base, fill_value, estimator, Alphabet_X)
if keep_dims:
W = W[..., np.newaxis]
return W
def information_enigmatic(X, base=2, fill_value=-1, estimator='ML',
Alphabet_X=None, keep_dims=False):
# Note: can be negative
# Note: equals multivariate mutual information when N=3, can test for this
"""
Returns the estimated enigmatic information [JaEC11] for an array X
containing realisations of discrete random variables.
**Mathematical definition**:
Given discrete random variables :math:`X_1, \\ldots, X_n`, the enigmatic
information :math:`Q(X_1, \\ldots, X_n)` is defined as:
.. math::
Q(X_1, \\ldots, X_n) = T(X_1, \\ldots, X_n) - B(X_1, \\ldots, X_n)
where :math:`T(\\cdot, \\ldots, \\cdot)` denotes the multi-information and
where :math:`B(\\cdot, \\ldots, \\cdot)` denotes the binding information.
**Estimation**:
Enigmatic information is estimated based on frequency tables, using the
following functions:
entropy_joint()
entropy()
See below for a list of available estimators. Note that, unlike most
measures in this module, enigmatic information is not non-negative in
general and may therefore be negative irrespective of the chosen
estimator.
**Parameters**:
X : numpy array (or array-like object such as a list of immutables, as \
accepted by np.array())
An array containing discrete random variable realisations. Successive
realisations of a random variable are indexed by the last axis in the
array; multiple random variables may be specified using preceding axes.
When X.ndim==1, returns the scalar 0. When X.ndim>1, returns a scalar
based on jointly considering all random variables indexed in the array.
X may not contain (floating point) NaN values. Missing data may be
specified using numpy masked arrays, as well as using standard
numpy array/array-like objects; see below for details.
base : float
The desired logarithmic base (default 2).
fill_value : object
It is possible to specify missing data using numpy masked arrays,
pandas Series/DataFrames, as well as using standard numpy
array/array-like objects with assigned placeholder values. When using
numpy masked arrays, this function invokes np.ma.filled() internally,
so that missing data are represented with the array's object-internal
placeholder value fill_value (this function's fill_value parameter is
ignored in such cases). When using pandas Series/DataFrames, an initial
conversion to a numpy masked array is performed. When using standard
numpy array/array-like objects, this function's fill_value parameter is
used to specify the placeholder value for missing data (defaults to
-1).
Data equal to the placeholder value are subsequently ignored.
estimator : str or float
The desired estimator (see above for details on estimators). Possible
values are:
*'ML' (the default value)* : Maximum likelihood estimator.
*any floating point value* : Maximum a posteriori estimator using
Dirichlet prior (equivalent to maximum likelihood with pseudo-count
for each outcome as specified).
*'PERKS'* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to 1/L, where L is the number of possible outcomes).
*'MINIMAX'* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to sqrt(N)/L, where N is the total number of
realisations and where L is the number of possible outcomes).
*'JAMES-STEIN'* : James-Stein estimator [HaSt09].
*'GOOD-TURING'* : Good-Turing estimator [GaSa95].
Alphabet_X : numpy array (or array-like object such as a list of \
immutables, as accepted by np.array())
An array specifying the alphabet/alphabets of possible outcomes that
random variable realisations in array X may assume. Defaults to None,
in which case the alphabet/alphabets of possible outcomes is/are
implicitly based on the observed outcomes in array X, with no additional,
unobserved outcomes. In combination with any estimator other than
maximum likelihood, it may be useful to specify alphabets including
unobserved outcomes. For such cases, successive possible outcomes of a
random variable are indexed by the last axis in Alphabet_X; multiple
alphabets may be specified using preceding axes, with the requirement
X.shape[:-1]==Alphabet_X.shape[:-1]. Alphabets of different sizes may
be specified either using numpy masked arrays, or by padding with the
chosen placeholder fill_value.
NB: When specifying multiple alphabets, an alphabet of possible joint
outcomes is always implicit from the alphabets of possible (marginal)
outcomes in Alphabet_X. For example, specifying
Alphabet_X=np.array(((1,2),(1,2))) implies an alphabet of possible
joint outcomes np.array(((1,1,2,2),(1,2,1,2))).
keep_dims : boolean
When set to True, an additional dimension of length one is appended to
the returned array, facilitating any broadcast operations required by
the user (defaults to False).
**Implementation notes**:
Before estimation, outcomes are mapped to the set of non-negative integers
internally, with the value -1 representing missing data. To avoid this
internal conversion step, supply integer data and use the default fill
value -1.
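
**Example**:
A minimal sketch with hypothetical data (assumes the package is
importable as pyitlib and the default ML estimator):

>>> import numpy as np
>>> from pyitlib import discrete_random_variable as drv
>>> X = np.array(((0, 0, 1, 1), (0, 1, 0, 1), (0, 1, 1, 0)))  # X3 = XOR
>>> Q = drv.information_enigmatic(X)

For this XOR triple T = 1 bit and B = 2 bits, so the expected estimate
is Q = T - B = -1 bit, illustrating that enigmatic information may be
negative.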
"""
Q = information_multi(X, base, fill_value, estimator, Alphabet_X) - \
information_binding(X, base, fill_value, estimator, Alphabet_X)
if keep_dims:
Q = Q[..., np.newaxis]
return Q
def information_interaction(X, base=2, fill_value=-1, estimator='ML',
Alphabet_X=None, keep_dims=False):
"""
Returns the estimated interaction information [JaBr03] for an array X
containing realisations of discrete random variables.
**Mathematical definition**:
Given discrete random variables :math:`X_1, \\ldots, X_n`, the interaction
information :math:`\\mathrm{Int}(X_1, \\ldots, X_n)` is defined as:
.. math::
\\mathrm{Int}(X_1, \\ldots, X_n) = - \\sum_{T \\subseteq
\\{1,\\ldots, n\\}} (-1)^{n-|T|} H(X_i : i \\in T)
where :math:`H(X_i : i \\in T)` denotes the joint entropy of the subset of
random variables specified by :math:`T`. Thus, interaction information is
an alternating sum of joint entropies, with the sets of random variables
used to compute the joint entropy in each term selected from the power set
of available random variables.
Note that interaction information is equal in magnitude to the
co-information :math:`I(X_1, \\ldots, X_n)`, with equality for the case
where :math:`n` is even,
.. math::
\\mathrm{Int}(X_1, \\ldots, X_n) = (-1)^n I(X_1, \\ldots, X_n).
**Estimation**:
Interaction information is estimated based on frequency tables, using the
following functions:
entropy_joint()
See below for a list of available estimators. Note that interaction
information is not non-negative in general and may therefore be negative
irrespective of the chosen estimator.
**Parameters**:
X : numpy array (or array-like object such as a list of immutables, as \
accepted by np.array())
An array containing discrete random variable realisations. Successive
realisations of a random variable are indexed by the last axis in the
array; multiple random variables may be specified using preceding axes.
When X.ndim==1, returns a scalar and is equivalent to -1*entropy().
When X.ndim>1, returns a scalar based on jointly considering all random
variables indexed in the array. X may not contain (floating point) NaN
values. Missing data may be specified using numpy masked arrays, as
well as using standard numpy array/array-like objects; see below
for details.
base : float
The desired logarithmic base (default 2).
fill_value : object
It is possible to specify missing data using numpy masked arrays,
pandas Series/DataFrames, as well as using standard numpy
array/array-like objects with assigned placeholder values. When using
numpy masked arrays, this function invokes np.ma.filled() internally,
so that missing data are represented with the array's object-internal
placeholder value fill_value (this function's fill_value parameter is
ignored in such cases). When using pandas Series/DataFrames, an initial
conversion to a numpy masked array is performed. When using standard
numpy array/array-like objects, this function's fill_value parameter is
used to specify the placeholder value for missing data (defaults to
-1).
Data equal to the placeholder value are subsequently ignored.
estimator : str or float
The desired estimator (see above for details on estimators). Possible
values are:
*'ML' (the default value)* : Maximum likelihood estimator.
*any floating point value* : Maximum a posteriori estimator using
Dirichlet prior (equivalent to maximum likelihood with pseudo-count
for each outcome as specified).
*'PERKS'* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to 1/L, where L is the number of possible outcomes).
*'MINIMAX'* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to sqrt(N)/L, where N is the total number of
realisations and where L is the number of possible outcomes).
*'JAMES-STEIN'* : James-Stein estimator [HaSt09].
*'GOOD-TURING'* : Good-Turing estimator [GaSa95].
Alphabet_X : numpy array (or array-like object such as a list of \
immutables, as accepted by np.array())
An array specifying the alphabet/alphabets of possible outcomes that
random variable realisations in array X may assume. Defaults to None,
in which case the alphabet/alphabets of possible outcomes is/are
implicitly based on the observed outcomes in array X, with no additional,
unobserved outcomes. In combination with any estimator other than
maximum likelihood, it may be useful to specify alphabets including
unobserved outcomes. For such cases, successive possible outcomes of a
random variable are indexed by the last axis in Alphabet_X; multiple
alphabets may be specified using preceding axes, with the requirement
X.shape[:-1]==Alphabet_X.shape[:-1]. Alphabets of different sizes may
be specified either using numpy masked arrays, or by padding with the
chosen placeholder fill_value.
NB: When specifying multiple alphabets, an alphabet of possible joint
outcomes is always implicit from the alphabets of possible (marginal)
outcomes in Alphabet_X. For example, specifying
Alphabet_X=np.array(((1,2),(1,2))) implies an alphabet of possible
joint outcomes np.array(((1,1,2,2),(1,2,1,2))).
keep_dims : boolean
When set to True, an additional dimension of length one is appended to
the returned array, facilitating any broadcast operations required by
the user (defaults to False).
**Implementation notes**:
Before estimation, outcomes are mapped to the set of non-negative integers
internally, with the value -1 representing missing data. To avoid this
internal conversion step, supply integer data and use the default fill
value -1.
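
**Example**:
A minimal usage sketch (hypothetical data; assumes the package is
importable as pyitlib and relies on the default ML estimator):

>>> import numpy as np
>>> from pyitlib import discrete_random_variable as drv
>>> X = np.array(((0, 0, 1, 1), (0, 1, 0, 1), (0, 1, 1, 0)))  # X3 = XOR
>>> Int = drv.information_interaction(X)

For this XOR triple the alternating sum evaluates to +1 bit, even though
any two of the variables taken alone are independent.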
"""
X, fill_value_X = _sanitise_array_input(X, fill_value)
if Alphabet_X is not None:
Alphabet_X, fill_value_Alphabet_X = _sanitise_array_input(Alphabet_X,
fill_value)
Alphabet_X, _ = _autocreate_alphabet(Alphabet_X, fill_value_Alphabet_X)
else:
Alphabet_X, fill_value_Alphabet_X = _autocreate_alphabet(X,
fill_value_X)
if X.size == 0:
raise ValueError("arg X contains no elements")
if np.any(_isnan(X)):
raise ValueError("arg X contains NaN values")
if Alphabet_X.size == 0:
raise ValueError("arg Alphabet_X contains no elements")
if np.any(_isnan(Alphabet_X)):
raise ValueError("arg Alphabet_X contains NaN values")
if _isnan(fill_value_X):
raise ValueError("fill value for arg X is NaN")
if X.shape[:-1] != Alphabet_X.shape[:-1]:
raise ValueError("leading dimensions of args X and Alphabet_X do not "
"match")
if not (np.isscalar(base) and np.isreal(base) and base > 0):
raise ValueError("arg base not a positive real-valued scalar")
S, fill_value = _map_observations_to_integers((X, Alphabet_X),
(fill_value_X,
fill_value_Alphabet_X))
X, Alphabet_X = S
# Re-shape X, so that we may handle multi-dimensional arrays equivalently
# and iterate across 0th axis
X = np.reshape(X, (-1, X.shape[-1]))
Alphabet_X = np.reshape(Alphabet_X, (-1, Alphabet_X.shape[-1]))
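# Enumerate all non-empty subsets of the random variables: the boolean
# mask M is advanced as a binary counter, and each subset contributes
# one signed joint-entropy term of the alternating sum above.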
I = 0
M = np.zeros(X.shape[0]).astype('bool')
M = _increment_binary_vector(M)
while np.any(M):
I -= (-1)**(X.shape[0]-np.sum(M)) * \
entropy_joint(X[M], base, fill_value, estimator, Alphabet_X[M])
M = _increment_binary_vector(M)
if keep_dims:
I = I[..., np.newaxis]
return I
def information_co(X, base=2, fill_value=-1, estimator='ML', Alphabet_X=None,
keep_dims=False):
"""
Returns the estimated co-information [Bell03] for an array X containing
realisations of discrete random variables.
**Mathematical definition**:
Given discrete random variables :math:`X_1, \\ldots, X_n`, the
co-information :math:`I(X_1, \\ldots, X_n)` is defined as:
.. math::
I(X_1, \\ldots, X_n) = - \\sum_{T \\subseteq \\{1,\\ldots, n\\}}
(-1)^{|T|} H(X_i : i \\in T)
where :math:`H(X_i : i \\in T)` denotes the joint entropy of the subset of
random variables specified by :math:`T`. Thus, co-information is an
alternating sum of joint entropies, with the sets of random variables used
to compute the joint entropy in each term selected from the power set of
available random variables.
Note that co-information is equal in magnitude to the interaction
information :math:`\\mathrm{Int}(X_1, \\ldots, X_n)`, with equality for the
case where :math:`n` is even,
.. math::
I(X_1, \\ldots, X_n) = (-1)^n \\mathrm{Int}(X_1, \\ldots, X_n).
**Estimation**:
Co-information is estimated based on frequency tables, using the following
functions:
entropy_joint()
See below for a list of available estimators. Note that co-information
is not non-negative in general and may therefore be negative
irrespective of the chosen estimator.
**Parameters**:
X : numpy array (or array-like object such as a list of immutables, as \
accepted by np.array())
An array containing discrete random variable realisations. Successive
realisations of a random variable are indexed by the last axis in the
array; multiple random variables may be specified using preceding axes.
When X.ndim==1, returns a scalar and is equivalent to entropy(). When
X.ndim>1, returns a scalar based on jointly considering all random
variables indexed in the array. X may not contain (floating point) NaN
values. Missing data may be specified using numpy masked arrays, as
well as using standard numpy array/array-like objects; see below
for details.
base : float
The desired logarithmic base (default 2).
fill_value : object
It is possible to specify missing data using numpy masked arrays,
pandas Series/DataFrames, as well as using standard numpy
array/array-like objects with assigned placeholder values. When using
numpy masked arrays, this function invokes np.ma.filled() internally,
so that missing data are represented with the array's object-internal
placeholder value fill_value (this function's fill_value parameter is
ignored in such cases). When using pandas Series/DataFrames, an initial
conversion to a numpy masked array is performed. When using standard
numpy array/array-like objects, this function's fill_value parameter is
used to specify the placeholder value for missing data (defaults to
-1).
Data equal to the placeholder value are subsequently ignored.
estimator : str or float
The desired estimator (see above for details on estimators). Possible
values are:
*'ML' (the default value)* : Maximum likelihood estimator.
*any floating point value* : Maximum a posteriori estimator using
Dirichlet prior (equivalent to maximum likelihood with pseudo-count
for each outcome as specified).
*'PERKS'* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to 1/L, where L is the number of possible outcomes).
*'MINIMAX'* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to sqrt(N)/L, where N is the total number of
realisations and where L is the number of possible outcomes).
*'JAMES-STEIN'* : James-Stein estimator [HaSt09].
*'GOOD-TURING'* : Good-Turing estimator [GaSa95].
Alphabet_X : numpy array (or array-like object such as a list of \
immutables, as accepted by np.array())
An array specifying the alphabet/alphabets of possible outcomes that
random variable realisations in array X may assume. Defaults to None,
in which case the alphabet/alphabets of possible outcomes is/are
implicitly based on the observed outcomes in array X, with no additional,
unobserved outcomes. In combination with any estimator other than
maximum likelihood, it may be useful to specify alphabets including
unobserved outcomes. For such cases, successive possible outcomes of a
random variable are indexed by the last axis in Alphabet_X; multiple
alphabets may be specified using preceding axes, with the requirement
X.shape[:-1]==Alphabet_X.shape[:-1]. Alphabets of different sizes may
be specified either using numpy masked arrays, or by padding with the
chosen placeholder fill_value.
NB: When specifying multiple alphabets, an alphabet of possible joint
outcomes is always implicit from the alphabets of possible (marginal)
outcomes in Alphabet_X. For example, specifying
Alphabet_X=np.array(((1,2),(1,2))) implies an alphabet of possible
joint outcomes np.array(((1,1,2,2),(1,2,1,2))).
keep_dims : boolean
When set to True, an additional dimension of length one is appended to
the returned array, facilitating any broadcast operations required by
the user (defaults to False).
**Implementation notes**:
Before estimation, outcomes are mapped to the set of non-negative integers
internally, with the value -1 representing missing data. To avoid this
internal conversion step, supply integer data and use the default fill
value -1.
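
**Example**:
A brief usage sketch with hypothetical data, assuming the package is
importable as pyitlib and the default ML estimator:

>>> import numpy as np
>>> from pyitlib import discrete_random_variable as drv
>>> X = np.array(((0, 0, 1, 1), (0, 1, 0, 1), (0, 1, 1, 0)))  # X3 = XOR
>>> I = drv.information_co(X)

For this XOR triple the expected estimate is -1 bit, the sign-flipped
counterpart of the interaction information (+1 bit) since n is odd.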
"""
X, fill_value_X = _sanitise_array_input(X, fill_value)
if Alphabet_X is not None:
Alphabet_X, fill_value_Alphabet_X = \
_sanitise_array_input(Alphabet_X, fill_value)
Alphabet_X, _ = _autocreate_alphabet(Alphabet_X, fill_value_Alphabet_X)
else:
Alphabet_X, fill_value_Alphabet_X = \
_autocreate_alphabet(X, fill_value_X)
if X.size == 0:
raise ValueError("arg X contains no elements")
if np.any(_isnan(X)):
raise ValueError("arg X contains NaN values")
if Alphabet_X.size == 0:
raise ValueError("arg Alphabet_X contains no elements")
if np.any(_isnan(Alphabet_X)):
raise ValueError("arg Alphabet_X contains NaN values")
if _isnan(fill_value_X):
raise ValueError("fill value for arg X is NaN")
if X.shape[:-1] != Alphabet_X.shape[:-1]:
raise ValueError("leading dimensions of args X and Alphabet_X do not "
"match")
if not (np.isscalar(base) and np.isreal(base) and base > 0):
raise ValueError("arg base not a positive real-valued scalar")
S, fill_value = _map_observations_to_integers((X, Alphabet_X),
(fill_value_X,
fill_value_Alphabet_X))
X, Alphabet_X = S
# Re-shape X, so that we may handle multi-dimensional arrays equivalently
# and iterate across 0th axis
X = np.reshape(X, (-1, X.shape[-1]))
Alphabet_X = np.reshape(Alphabet_X, (-1, Alphabet_X.shape[-1]))
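# Enumerate all non-empty subsets of the random variables via the binary
# counter M; a subset T contributes -(-1)**|T| times its joint entropy,
# matching the definition of co-information above.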
I = 0
M = np.zeros(X.shape[0]).astype('bool')
M = _increment_binary_vector(M)
while np.any(M):
I -= (-1)**(np.sum(M)) * \
entropy_joint(X[M], base, fill_value, estimator, Alphabet_X[M])
M = _increment_binary_vector(M)
if keep_dims:
I = I[..., np.newaxis]
return I
def information_binding(X, base=2, fill_value=-1, estimator='ML',
Alphabet_X=None, keep_dims=False):
"""
Returns the estimated binding information [AbPl12] (also known as dual
total correlation [Han78]) for an array X containing realisations of
discrete random variables.
**Mathematical definition**:
Given discrete random variables :math:`X_1, \\ldots, X_n`, the binding
information :math:`B(X_1, \\ldots, X_n)` is defined as:
.. math::
B(X_1, \\ldots, X_n) = H(X_1, \\ldots, X_n) -
\\sum_{i=1}^{n} H(X_i | X_1, \\ldots X_{i-1}, X_{i+1}, \\ldots, X_n)
where :math:`H(\\cdot)` denotes the entropy and where
:math:`H(\\cdot | \\cdot)` denotes the conditional entropy.
**Estimation**:
Binding information is estimated based on frequency tables, using the
following functions:
entropy_joint()
entropy()
See below for a list of available estimators. Note that although binding
information is a non-negative quantity, depending on the chosen estimator
the obtained estimate may be negative.
**Parameters**:
X : numpy array (or array-like object such as a list of immutables, as \
accepted by np.array())
An array containing discrete random variable realisations. Successive
realisations of a random variable are indexed by the last axis in the
array; multiple random variables may be specified using preceding axes.
When X.ndim==1, returns the scalar 0. When X.ndim>1, returns a scalar
based on jointly considering all random variables indexed in the array.
X may not contain (floating point) NaN values. Missing data may be
specified using numpy masked arrays, as well as using standard
numpy array/array-like objects; see below for details.
base : float
The desired logarithmic base (default 2).
fill_value : object
It is possible to specify missing data using numpy masked arrays,
pandas Series/DataFrames, as well as using standard numpy
array/array-like objects with assigned placeholder values. When using
numpy masked arrays, this function invokes np.ma.filled() internally,
so that missing data are represented with the array's object-internal
placeholder value fill_value (this function's fill_value parameter is
ignored in such cases). When using pandas Series/DataFrames, an initial
conversion to a numpy masked array is performed. When using standard
numpy array/array-like objects, this function's fill_value parameter is
used to specify the placeholder value for missing data (defaults to
-1).
Data equal to the placeholder value are subsequently ignored.
estimator : str or float
The desired estimator (see above for details on estimators). Possible
values are:
*'ML' (the default value)* : Maximum likelihood estimator.
*any floating point value* : Maximum a posteriori estimator using
Dirichlet prior (equivalent to maximum likelihood with pseudo-count
for each outcome as specified).
*'PERKS'* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to 1/L, where L is the number of possible outcomes).
*'MINIMAX'* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to sqrt(N)/L, where N is the total number of
realisations and where L is the number of possible outcomes).
*'JAMES-STEIN'* : James-Stein estimator [HaSt09].
*'GOOD-TURING'* : Good-Turing estimator [GaSa95].
Alphabet_X : numpy array (or array-like object such as a list of \
immutables, as accepted by np.array())
An array specifying the alphabet/alphabets of possible outcomes that
random variable realisations in array X may assume. Defaults to None,
in which case the alphabet/alphabets of possible outcomes is/are
implicitly based on the observed outcomes in array X, with no additional,
unobserved outcomes. In combination with any estimator other than
maximum likelihood, it may be useful to specify alphabets including
unobserved outcomes. For such cases, successive possible outcomes of a
random variable are indexed by the last axis in Alphabet_X; multiple
alphabets may be specified using preceding axes, with the requirement
X.shape[:-1]==Alphabet_X.shape[:-1]. Alphabets of different sizes may
be specified either using numpy masked arrays, or by padding with the
chosen placeholder fill_value.
NB: When specifying multiple alphabets, an alphabet of possible joint
outcomes is always implicit from the alphabets of possible (marginal)
outcomes in Alphabet_X. For example, specifying
Alphabet_X=np.array(((1,2),(1,2))) implies an alphabet of possible
joint outcomes np.array(((1,1,2,2),(1,2,1,2))).
keep_dims : boolean
When set to True, an additional dimension of length one is appended to
the returned array, facilitating any broadcast operations required by
the user (defaults to False).
**Implementation notes**:
Before estimation, outcomes are mapped to the set of non-negative integers
internally, with the value -1 representing missing data. To avoid this
internal conversion step, supply integer data and use the default fill
value -1.
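
**Example**:
A minimal sketch with hypothetical data (assumes the package is
importable as pyitlib and the default ML estimator):

>>> import numpy as np
>>> from pyitlib import discrete_random_variable as drv
>>> X = np.array(((0, 0, 1, 1), (0, 0, 1, 1)))  # two identical fair bits
>>> B = drv.information_binding(X)

For two variables the binding information reduces to the mutual
information, so the expected estimate here is 1 bit.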
"""
X, fill_value_X = _sanitise_array_input(X, fill_value)
if Alphabet_X is not None:
Alphabet_X, fill_value_Alphabet_X = _sanitise_array_input(Alphabet_X,
fill_value)
Alphabet_X, _ = _autocreate_alphabet(Alphabet_X, fill_value_Alphabet_X)
else:
Alphabet_X, fill_value_Alphabet_X = _autocreate_alphabet(X,
fill_value_X)
# Exceptions needed to create Alphabet_X correctly if None
# Not all of these are needed; however, we include them for consistency.
if X.size == 0:
raise ValueError("arg X contains no elements")
if np.any(_isnan(X)):
raise ValueError("arg X contains NaN values")
if Alphabet_X.size == 0:
raise ValueError("arg Alphabet_X contains no elements")
if np.any(_isnan(Alphabet_X)):
raise ValueError("arg Alphabet_X contains NaN values")
if _isnan(fill_value_X):
raise ValueError("fill value for arg X is NaN")
if X.shape[:-1] != Alphabet_X.shape[:-1]:
raise ValueError("leading dimensions of args X and Alphabet_X do not "
"match")
if not (np.isscalar(base) and np.isreal(base) and base > 0):
raise ValueError("arg base not a positive real-valued scalar")
S, fill_value = _map_observations_to_integers((X, Alphabet_X),
(fill_value_X,
fill_value_Alphabet_X))
X, Alphabet_X = S
# Exceptions thrown by entropy_joint
H_joint = entropy_joint(X, base, fill_value, estimator, Alphabet_X)
B = H_joint
# Re-shape X, so that we may handle multi-dimensional arrays equivalently
# and iterate across 0th axis
X = np.reshape(X, (-1, X.shape[-1]))
Alphabet_X = np.reshape(Alphabet_X, (-1, Alphabet_X.shape[-1]))
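# For each variable X_i, subtract the conditional entropy
# H(X_i | rest) = H_joint - H(rest), where H(rest) is obtained by
# omitting row i from the joint entropy computation.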
M = np.arange(X.shape[0])
for i in range(X.shape[0]):
B -= H_joint
if X.shape[0] > 1:
B += entropy_joint(X[M != i], base, fill_value, estimator,
Alphabet_X[M != i])
if keep_dims:
B = B[..., np.newaxis]
return B
def information_multi(X, base=2, fill_value=-1, estimator='ML',
Alphabet_X=None, keep_dims=False):
"""
Returns the estimated multi-information [StVe98] (also known as total
correlation [Wata60]) for an array X containing realisations of discrete
random variables.
**Mathematical definition**:
Given discrete random variables :math:`X_1, \\ldots, X_n`, the
multi-information :math:`T(X_1, \\ldots, X_n)` is defined as:
.. math::
T(X_1, \\ldots, X_n) = \\left( \\sum_{i=1}^{n} H(X_i) \\right) -
H(X_1, \\ldots, X_n)
where :math:`H(\\cdot)` denotes the entropy and where
:math:`H(\\cdot, \\ldots, \\cdot)` denotes the joint entropy.
**Estimation**:
Multi-information is estimated based on frequency tables, using the
following functions:
entropy_joint()
entropy()
See below for a list of available estimators. Note that although
multi-information is a non-negative quantity, depending on the chosen
estimator the obtained estimate may be negative.
**Parameters**:
X : numpy array (or array-like object such as a list of immutables, as \
accepted by np.array())
An array containing discrete random variable realisations. Successive
realisations of a random variable are indexed by the last axis in the
array; multiple random variables may be specified using preceding axes.
When X.ndim==1, returns the scalar 0. When X.ndim>1, returns a scalar
based on jointly considering all random variables indexed in the array.
X may not contain (floating point) NaN values. Missing data may be
specified using numpy masked arrays, as well as using standard
numpy array/array-like objects; see below for details.
base : float
The desired logarithmic base (default 2).
fill_value : object
It is possible to specify missing data using numpy masked arrays,
pandas Series/DataFrames, as well as using standard numpy
array/array-like objects with assigned placeholder values. When using
numpy masked arrays, this function invokes np.ma.filled() internally,
so that missing data are represented with the array's object-internal
placeholder value fill_value (this function's fill_value parameter is
ignored in such cases). When using pandas Series/DataFrames, an initial
conversion to a numpy masked array is performed. When using standard
numpy array/array-like objects, this function's fill_value parameter is
used to specify the placeholder value for missing data (defaults to
-1).
Data equal to the placeholder value are subsequently ignored.
estimator : str or float
The desired estimator (see above for details on estimators). Possible
values are:
*'ML' (the default value)* : Maximum likelihood estimator.
*any floating point value* : Maximum a posteriori estimator using
Dirichlet prior (equivalent to maximum likelihood with pseudo-count
for each outcome as specified).
*'PERKS'* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to 1/L, where L is the number of possible outcomes).
*'MINIMAX'* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to sqrt(N)/L, where N is the total number of
realisations and where L is the number of possible outcomes).
*'JAMES-STEIN'* : James-Stein estimator [HaSt09].
*'GOOD-TURING'* : Good-Turing estimator [GaSa95].
Alphabet_X : numpy array (or array-like object such as a list of \
immutables, as accepted by np.array())
An array specifying the alphabet/alphabets of possible outcomes that
random variable realisations in array X may assume. Defaults to None,
in which case the alphabet/alphabets of possible outcomes is/are
implicitly based on the observed outcomes in array X, with no additional,
unobserved outcomes. In combination with any estimator other than
maximum likelihood, it may be useful to specify alphabets including
unobserved outcomes. For such cases, successive possible outcomes of a
random variable are indexed by the last axis in Alphabet_X; multiple
alphabets may be specified using preceding axes, with the requirement
X.shape[:-1]==Alphabet_X.shape[:-1]. Alphabets of different sizes may
be specified either using numpy masked arrays, or by padding with the
chosen placeholder fill_value.
NB: When specifying multiple alphabets, an alphabet of possible joint
outcomes is always implicit from the alphabets of possible (marginal)
outcomes in Alphabet_X. For example, specifying
Alphabet_X=np.array(((1,2),(1,2))) implies an alphabet of possible
joint outcomes np.array(((1,1,2,2),(1,2,1,2))).
keep_dims : boolean
When set to True, an additional dimension of length one is appended to
the returned array, facilitating any broadcast operations required by
the user (defaults to False).
**Implementation notes**:
Before estimation, outcomes are mapped to the set of non-negative integers
internally, with the value -1 representing missing data. To avoid this
internal conversion step, supply integer data and use the default fill
value -1.
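
**Example**:
A brief usage sketch with hypothetical data, assuming the package is
importable as pyitlib and the default ML estimator:

>>> import numpy as np
>>> from pyitlib import discrete_random_variable as drv
>>> X = np.array(((0, 0, 1, 1), (0, 0, 1, 1)))  # two identical fair bits
>>> T = drv.information_multi(X)

The marginal entropies sum to 2 bits and the joint entropy is 1 bit, so
the expected estimate is T = 1 bit.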
"""
H = entropy(X, base, fill_value, estimator, Alphabet_X)
H_joint = entropy_joint(X, base, fill_value, estimator, Alphabet_X)
T = np.sum(H) - H_joint
if keep_dims:
T = T[..., np.newaxis]
return T
def information_mutual_conditional(X, Y, Z, cartesian_product=False, base=2,
fill_value=-1, estimator='ML',
Alphabet_X=None, Alphabet_Y=None,
Alphabet_Z=None, keep_dims=False):
"""
Returns the conditional mutual information (see e.g. [CoTh06]) between
arrays X and Y given array Z, each containing discrete random variable
realisations.
**Mathematical definition**:
Given discrete random variables :math:`X`, :math:`Y`, :math:`Z`, the
conditional mutual information :math:`I(X;Y|Z)` is defined as:
.. math::
I(X;Y|Z) = H(X|Z) - H(X|Y,Z)
where :math:`H(\\cdot|\\cdot)` denotes the conditional entropy.
**Estimation**:
Conditional mutual information is estimated based on frequency tables,
using the following functions:
entropy_joint()
entropy()
See below for a list of available estimators. Note that although
conditional mutual information is a non-negative quantity, depending on the
chosen estimator the obtained estimate may be negative.
**Parameters**:
X,Y,Z : numpy array (or array-like object such as a list of immutables, \
as accepted by np.array())
*cartesian_product==False*: X,Y,Z are arrays containing discrete random
variable realisations, with X.shape==Y.shape==Z.shape. Successive
realisations of a random variable are indexed by the last axis in the
respective arrays; multiple random variables in X,Y,Z may be specified
using preceding axes of the respective arrays (random variables are
paired **one-to-one** between X,Y,Z). When X.ndim==Y.ndim==Z.ndim==1,
returns a scalar. When X.ndim>1 and Y.ndim>1 and Z.ndim>1, returns an
array of estimated conditional mutual information values with
dimensions X.shape[:-1]. Neither X nor Y nor Z may contain (floating
point) NaN values. Missing data may be specified using numpy masked
arrays, as well as using standard numpy array/array-like objects;
see below for details.
*cartesian_product==True*: X,Y,Z are arrays containing discrete random
variable realisations, with X.shape[-1]==Y.shape[-1]==Z.shape[-1].
Successive realisations of a random variable are indexed by the last
axis in the respective arrays; multiple random variables in X,Y,Z may
be specified using preceding axes of the respective arrays (random
variables are paired **many-to-many** between X,Y,Z). When
X.ndim==Y.ndim==Z.ndim==1, returns a scalar. When X.ndim>1 or Y.ndim>1
or Z.ndim>1, returns an array of estimated conditional mutual
information values with dimensions
np.concatenate((X.shape[:-1], Y.shape[:-1], Z.shape[:-1])). Neither X nor Y nor
Z may contain (floating point) NaN values. Missing data may be
specified using numpy masked arrays, as well as using standard
numpy array/array-like objects; see below for details.
cartesian_product : boolean
Indicates whether random variables are paired **one-to-one** between
X,Y,Z (cartesian_product==False, the default value) or **many-to-many**
between X,Y,Z (cartesian_product==True).
base : float
The desired logarithmic base (default 2).
fill_value : object
It is possible to specify missing data using numpy masked arrays,
pandas Series/DataFrames, as well as using standard numpy
array/array-like objects with assigned placeholder values. When using
numpy masked arrays, this function invokes np.ma.filled() internally,
so that missing data are represented with the array's object-internal
placeholder value fill_value (this function's fill_value parameter is
ignored in such cases). When using pandas Series/DataFrames, an initial
conversion to a numpy masked array is performed. When using standard
numpy array/array-like objects, this function's fill_value parameter is
used to specify the placeholder value for missing data (defaults to
-1).
Data equal to the placeholder value are subsequently ignored.
estimator : str or float
The desired estimator (see above for details on estimators). Possible
values are:
*'ML' (the default value)* : Maximum likelihood estimator.
*any floating point value* : Maximum a posteriori estimator using
Dirichlet prior (equivalent to maximum likelihood with pseudo-count
for each outcome as specified).
*'PERKS'* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to 1/L, where L is the number of possible outcomes).
*'MINIMAX'* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to sqrt(N)/L, where N is the total number of
realisations and where L is the number of possible outcomes).
*'JAMES-STEIN'* : James-Stein estimator [HaSt09].
*'GOOD-TURING'* : Good-Turing estimator [GaSa95].
Alphabet_X, Alphabet_Y, Alphabet_Z : numpy array (or array-like object \
such as a list of immutables, as accepted by np.array())
Respectively an array specifying the alphabet/alphabets of possible
outcomes that random variable realisations in array X, Y, Z may assume.
Defaults to None, in which case the alphabet/alphabets of possible
outcomes is/are implicitly based on the observed outcomes in array X, Y, Z
respectively, with no additional, unobserved outcomes. In combination
with any estimator other than maximum likelihood, it may be useful to
specify alphabets including unobserved outcomes. For such cases,
successive possible outcomes of a random variable are indexed by the
last axis in Alphabet_X, Alphabet_Y, Alphabet_Z respectively; multiple
alphabets may be specified using preceding axes, with the requirement
X.shape[:-1]==Alphabet_X.shape[:-1] (analogously for Y and Z).
Alphabets of different sizes may be specified either using numpy masked
arrays, or by padding with the chosen placeholder fill_value.
NB: When specifying alphabets, an alphabet of possible joint outcomes
is always implicit from the alphabets of possible (marginal) outcomes
in Alphabet_X, Alphabet_Y, Alphabet_Z. For example, specifying
Alphabet_X=Alphabet_Y=Alphabet_Z=np.array((1,2)) implies an alphabet
of possible joint outcomes
np.array(((1,1,1,1,2,2,2,2),(1,1,2,2,1,1,2,2),(1,2,1,2,1,2,1,2))).
keep_dims : boolean
When set to True and cartesian_product==False an additional dimension
of length one is appended to the returned array, facilitating any
broadcast operations required by the user (defaults to False). Has no
effect when cartesian_product==True.
**Implementation notes**:
Before estimation, outcomes are mapped to the set of non-negative integers
internally, with the value -1 representing missing data. To avoid this
internal conversion step, supply integer data and use the default fill
value -1.
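
**Example**:
A minimal usage sketch (hypothetical data; assumes the package is
importable as pyitlib and relies on the default ML estimator):

>>> import numpy as np
>>> from pyitlib import discrete_random_variable as drv
>>> X = np.array((0, 0, 1, 1))
>>> Y = np.array((0, 1, 0, 1))
>>> Z = np.array((0, 1, 1, 0))  # Z = XOR(X, Y)
>>> I = drv.information_mutual_conditional(X, Y, Z)

X and Y are pairwise independent, so their unconditional mutual
information is 0 bits; conditioning on Z reveals the expected
I(X;Y|Z) = 1 bit.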
"""
X, fill_value_X = _sanitise_array_input(X, fill_value)
Y, fill_value_Y = _sanitise_array_input(Y, fill_value)
Z, fill_value_Z = _sanitise_array_input(Z, fill_value)
if Alphabet_X is not None:
Alphabet_X, fill_value_Alphabet_X = _sanitise_array_input(Alphabet_X,
fill_value)
Alphabet_X, _ = _autocreate_alphabet(Alphabet_X, fill_value_Alphabet_X)
else:
Alphabet_X, fill_value_Alphabet_X = _autocreate_alphabet(X,
fill_value_X)
if Alphabet_Y is not None:
Alphabet_Y, fill_value_Alphabet_Y = _sanitise_array_input(Alphabet_Y,
fill_value)
Alphabet_Y, _ = _autocreate_alphabet(Alphabet_Y, fill_value_Alphabet_Y)
else:
Alphabet_Y, fill_value_Alphabet_Y = _autocreate_alphabet(Y,
fill_value_Y)
if Alphabet_Z is not None:
Alphabet_Z, fill_value_Alphabet_Z = _sanitise_array_input(Alphabet_Z,
fill_value)
Alphabet_Z, _ = _autocreate_alphabet(Alphabet_Z, fill_value_Alphabet_Z)
else:
Alphabet_Z, fill_value_Alphabet_Z = _autocreate_alphabet(Z,
fill_value_Z)
if X.size == 0:
raise ValueError("arg X contains no elements")
if Y.size == 0:
raise ValueError("arg Y contains no elements")
if Z.size == 0:
raise ValueError("arg Z contains no elements")
if np.any(_isnan(X)):
raise ValueError("arg X contains NaN values")
if np.any(_isnan(Y)):
raise ValueError("arg Y contains NaN values")
if np.any(_isnan(Z)):
raise ValueError("arg Z contains NaN values")
if Alphabet_X.size == 0:
raise ValueError("arg Alphabet_X contains no elements")
if np.any(_isnan(Alphabet_X)):
raise ValueError("arg Alphabet_X contains NaN values")
if Alphabet_Y.size == 0:
raise ValueError("arg Alphabet_Y contains no elements")
if np.any(_isnan(Alphabet_Y)):
raise ValueError("arg Alphabet_Y contains NaN values")
if Alphabet_Z.size == 0:
raise ValueError("arg Alphabet_Z contains no elements")
if np.any(_isnan(Alphabet_Z)):
raise ValueError("arg Alphabet_Z contains NaN values")
if _isnan(fill_value_X):
raise ValueError("fill value for arg X is NaN")
if _isnan(fill_value_Y):
raise ValueError("fill value for arg Y is NaN")
if _isnan(fill_value_Z):
raise ValueError("fill value for arg Z is NaN")
if X.shape[:-1] != Alphabet_X.shape[:-1]:
raise ValueError("leading dimensions of args X and Alphabet_X do not "
"match")
if Y.shape[:-1] != Alphabet_Y.shape[:-1]:
raise ValueError("leading dimensions of args Y and Alphabet_Y do not "
"match")
if Z.shape[:-1] != Alphabet_Z.shape[:-1]:
raise ValueError("leading dimensions of args Z and Alphabet_Z do not "
"match")
if not cartesian_product and (X.shape != Y.shape or X.shape != Z.shape):
raise ValueError("dimensions of args X, Y, Z do not match")
if cartesian_product and (X.shape[-1] != Y.shape[-1] or
X.shape[-1] != Z.shape[-1]):
raise ValueError("trailing dimensions of args X, Y, Z do not match")
if not (np.isscalar(base) and np.isreal(base) and base > 0):
raise ValueError("arg base not a positive real-valued scalar")
S, fill_value = _map_observations_to_integers((X, Alphabet_X,
Y, Alphabet_Y,
Z, Alphabet_Z),
(fill_value_X,
fill_value_Alphabet_X,
fill_value_Y,
fill_value_Alphabet_Y,
fill_value_Z,
fill_value_Alphabet_Z))
X, Alphabet_X, Y, Alphabet_Y, Z, Alphabet_Z = S
if not cartesian_product:
I = np.empty(X.shape[:-1])
if I.ndim > 0:
I[:] = np.NaN
else:
I = np.float64(np.NaN)
else:
shapeI_Z = Z.shape[:-1]
Z = np.reshape(Z, (-1, Z.shape[-1]))
Alphabet_Z = np.reshape(Alphabet_Z, (-1, Alphabet_Z.shape[-1]))
I = []
for i in range(Z.shape[0]):
def f(X, Y, Alphabet_X, Alphabet_Y):
return information_mutual_conditional(X, Y, Z[i], False, base,
fill_value, estimator,
Alphabet_X, Alphabet_Y,
Alphabet_Z[i])
I.append(_cartesian_product_apply(X, Y, f, Alphabet_X, Alphabet_Y))
shapeI_XY = I[0].shape
if len(shapeI_Z) == 0:
I = np.array(I)[0].reshape(shapeI_XY)
else:
I = np.array(I)
I = np.rollaxis(I, 0, len(I.shape))
I = I.reshape(np.append(shapeI_XY, shapeI_Z).astype('int'))
return I
# Re-shape I, X, Y and Z, so that we may handle multi-dimensional arrays
# equivalently and iterate across 0th axis
X = np.reshape(X, (-1, X.shape[-1]))
Y = np.reshape(Y, (-1, Y.shape[-1]))
Z = np.reshape(Z, (-1, Z.shape[-1]))
Alphabet_X = np.reshape(Alphabet_X, (-1, Alphabet_X.shape[-1]))
Alphabet_Y = np.reshape(Alphabet_Y, (-1, Alphabet_Y.shape[-1]))
Alphabet_Z = np.reshape(Alphabet_Z, (-1, Alphabet_Z.shape[-1]))
orig_shape_I = I.shape
I = np.reshape(I, (-1, 1))
for i in range(X.shape[0]):
I_ = (entropy_joint(np.vstack((X[i], Z[i])), base, fill_value,
estimator,
_vstack_pad((Alphabet_X[i],
Alphabet_Z[i]),
fill_value)) +
entropy_joint(np.vstack((Y[i], Z[i])), base, fill_value,
estimator,
_vstack_pad((Alphabet_Y[i],
Alphabet_Z[i]),
fill_value)) -
entropy_joint(np.vstack((X[i], Y[i], Z[i])), base, fill_value,
estimator,
_vstack_pad((Alphabet_X[i],
Alphabet_Y[i],
Alphabet_Z[i]), fill_value)) -
entropy_joint(Z[i], base, fill_value, estimator, Alphabet_Z[i]))
I[i] = I_
# Reverse re-shaping
I = np.reshape(I, orig_shape_I)
if keep_dims and not cartesian_product:
I = I[..., np.newaxis]
return I
def information_lautum(X, Y=None, cartesian_product=False, base=2,
fill_value=-1, estimator='ML', Alphabet_X=None,
Alphabet_Y=None, keep_dims=False):
"""
Returns the lautum information [PaVe08] between arrays X and Y, each
containing discrete random variable realisations.
**Mathematical definition**:
Denoting with :math:`P_X(x)`, :math:`P_Y(x)` respectively the probability
of observing an outcome :math:`x` with discrete random variables :math:`X`,
:math:`Y`, and denoting with :math:`P_{XY}(x,y)` the probability of jointly
observing outcomes :math:`x`, :math:`y` respectively with :math:`X`,
:math:`Y`, the lautum information :math:`L(X;Y)` is defined as:
.. math::
\\begin{eqnarray}
L(X;Y) &=& -\\sum_x \\sum_y
{P_X(x) P_Y(y) \\log {\\frac{P_X(x) P_Y(y)}{P_{XY}(x,y)}}} \\\\
&=& D_{\\mathrm{KL}}(P_X P_Y \\parallel P_{XY})
\\end{eqnarray}
where :math:`D_{\\mathrm{KL}}(\\cdot \\parallel \\cdot)` denotes the
Kullback-Leibler divergence. Note that *lautum* is *mutual* spelt
backwards; denoting with :math:`I(\\cdot;\\cdot)` the mutual information it
may be shown (see e.g. [CoTh06]) that
.. math::
\\begin{eqnarray}
I(X;Y) &=& D_{\\mathrm{KL}}(P_{XY} \\parallel P_X P_Y).
\\end{eqnarray}
**Estimation**:
Lautum information is estimated based on frequency tables. See below for a
list of available estimators.
**Parameters**:
X,Y : numpy array (or array-like object such as a list of immutables, as \
accepted by np.array())
*cartesian_product==False and Y is not None*: X and Y are arrays
containing discrete random variable realisations, with
X.shape==Y.shape. Successive realisations of a random
variable are indexed by the last axis in the respective arrays;
multiple random variables in X and Y may be specified using preceding
axes of the respective arrays (random variables are paired
**one-to-one** between X and Y). When X.ndim==Y.ndim==1, returns a
scalar. When X.ndim>1 and Y.ndim>1, returns an array of estimated
information values with dimensions X.shape[:-1]. Neither X nor Y may
contain (floating point) NaN values. Missing data may be specified
using numpy masked arrays, as well as using standard numpy
array/array-like objects; see below for details.
*cartesian_product==True and Y is not None*: X and Y are arrays
containing discrete random variable realisations. Successive
realisations of a random variable are indexed by the last axis in the
respective arrays; multiple random variables in X and Y may be
specified using preceding axes of the respective arrays (random
variables are paired **many-to-many** between X and Y). When
X.ndim==Y.ndim==1, returns a scalar. When X.ndim>1 or Y.ndim>1, returns
an array of estimated information values with dimensions
np.append(X.shape[:-1],Y.shape[:-1]). Neither X nor Y may contain
(floating point) NaN values. Missing data may be specified using numpy
masked arrays, as well as using standard numpy array/array-like
objects; see below for details.
*Y is None*: Equivalent to information_lautum(X, X, ... ). Thus, a
shorthand syntax for computing lautum information (in bits) between all
pairs of random variables in X is information_lautum(X).
cartesian_product : boolean
Indicates whether random variables are paired **one-to-one** between X
and Y (cartesian_product==False, the default value) or **many-to-many**
between X and Y (cartesian_product==True).
base : float
The desired logarithmic base (default 2).
fill_value : object
It is possible to specify missing data using numpy masked arrays,
pandas Series/DataFrames, as well as using standard numpy
array/array-like objects with assigned placeholder values. When using
numpy masked arrays, this function invokes np.ma.filled() internally,
so that missing data are represented with the array's object-internal
placeholder value fill_value (this function's fill_value parameter is
ignored in such cases). When using pandas Series/DataFrames, an initial
conversion to a numpy masked array is performed. When using standard
numpy array/array-like objects, this function's fill_value parameter is
used to specify the placeholder value for missing data (defaults to
-1).
Data equal to the placeholder value are subsequently ignored.
estimator : str or float
The desired estimator (see above for details on estimators). Possible
values are:
*'ML' (the default value)* : Maximum likelihood estimator.
*any floating point value* : Maximum a posteriori estimator using
Dirichlet prior (equivalent to maximum likelihood with pseudo-count
for each outcome as specified).
*PERKS* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to 1/L, where L is the number of possible outcomes).
*MINIMAX* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to sqrt(N)/L, where N is the total number of
realisations and where L is the number of possible outcomes).
*JAMES-STEIN* : James-Stein estimator [HaSt09].
*GOOD-TURING* : Good-Turing estimator [GaSa95].
Alphabet_X, Alphabet_Y : numpy array (or array-like object such as a list \
of immutables, as accepted by np.array())
Respectively an array specifying the alphabet/alphabets of possible
outcomes that random variable realisations in array X, Y may assume.
Defaults to None, in which case the alphabet/alphabets of possible
outcomes is/are implicitly based on the observed outcomes in array X, Y
respectively, with no additional, unobserved outcomes. In combination
with any estimator other than maximum likelihood, it may be useful to
specify alphabets including unobserved outcomes. For such cases,
successive possible outcomes of a random variable are indexed by the
last axis in Alphabet_X, Alphabet_Y respectively; multiple alphabets
may be specified using preceding axes, with the requirement
X.shape[:-1]==Alphabet_X.shape[:-1] (analogously for Y). Alphabets of
different sizes may be specified either using numpy masked arrays, or
by padding with the chosen placeholder fill_value.
NB: When specifying alphabets, an alphabet of possible joint outcomes
is always implicit from the alphabets of possible (marginal) outcomes
in Alphabet_X, Alphabet_Y. For example, specifying
Alphabet_X=Alphabet_Y=np.array((1,2)) implies an alphabet of possible
joint outcomes np.array(((1,1,2,2),(1,2,1,2))).
keep_dims : boolean
When set to True and cartesian_product==False an additional dimension
of length one is appended to the returned array, facilitating any
broadcast operations required by the user (defaults to False). Has no
effect when cartesian_product==True.
**Implementation notes**:
Before estimation, outcomes are mapped to the set of non-negative integers
internally, with the value -1 representing missing data. To avoid this
internal conversion step, supply integer data and use the default fill
value -1.
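**Example**:
A minimal sketch (it assumes the package is installed and the module is
importable as drv):
>>> from pyitlib import discrete_random_variable as drv
>>> X = [0, 0, 1, 1]
>>> Y = [0, 1, 0, 1]
>>> # Empirically independent variables: the product of the marginal
>>> # distributions equals the joint distribution, so the estimated
>>> # L(X;Y) is 0 bits
>>> L = drv.information_lautum(X, Y)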
"""
if Y is None:
Y = X
cartesian_product = True
Alphabet_Y = Alphabet_X
X, fill_value_X = _sanitise_array_input(X, fill_value)
Y, fill_value_Y = _sanitise_array_input(Y, fill_value)
if Alphabet_X is not None:
Alphabet_X, fill_value_Alphabet_X = _sanitise_array_input(Alphabet_X,
fill_value)
Alphabet_X, _ = _autocreate_alphabet(Alphabet_X, fill_value_Alphabet_X)
else:
Alphabet_X, fill_value_Alphabet_X = _autocreate_alphabet(X,
fill_value_X)
if Alphabet_Y is not None:
Alphabet_Y, fill_value_Alphabet_Y = _sanitise_array_input(Alphabet_Y,
fill_value)
Alphabet_Y, _ = _autocreate_alphabet(Alphabet_Y, fill_value_Alphabet_Y)
else:
Alphabet_Y, fill_value_Alphabet_Y = _autocreate_alphabet(Y,
fill_value_Y)
if X.size == 0:
raise ValueError("arg X contains no elements")
if np.any(_isnan(X)):
raise ValueError("arg X contains NaN values")
if Y.size == 0:
raise ValueError("arg Y contains no elements")
if np.any(_isnan(Y)):
raise ValueError("arg Y contains NaN values")
if Alphabet_X.size == 0:
raise ValueError("arg Alphabet_X contains no elements")
if np.any(_isnan(Alphabet_X)):
raise ValueError("arg Alphabet_X contains NaN values")
if Alphabet_Y.size == 0:
raise ValueError("arg Alphabet_Y contains no elements")
if np.any(_isnan(Alphabet_Y)):
raise ValueError("arg Alphabet_Y contains NaN values")
if _isnan(fill_value_X):
raise ValueError("fill value for arg X is NaN")
if _isnan(fill_value_Y):
raise ValueError("fill value for arg Y is NaN")
if X.shape[:-1] != Alphabet_X.shape[:-1]:
raise ValueError("leading dimensions of args X and Alphabet_X do not "
"match")
if Y.shape[:-1] != Alphabet_Y.shape[:-1]:
raise ValueError("leading dimensions of args Y and Alphabet_Y do not "
"match")
if not cartesian_product and X.shape != Y.shape:
raise ValueError("dimensions of args X and Y do not match")
if cartesian_product and X.shape[-1] != Y.shape[-1]:
raise ValueError("trailing dimensions of args X and Y do not match")
if not (np.isscalar(base) and np.isreal(base) and base > 0):
raise ValueError("arg base not a positive real-valued scalar")
S, fill_value = _map_observations_to_integers((X, Alphabet_X,
Y, Alphabet_Y),
(fill_value_X,
fill_value_Alphabet_X,
fill_value_Y,
fill_value_Alphabet_Y))
X, Alphabet_X, Y, Alphabet_Y = S
if not cartesian_product:
H = np.empty(X.shape[:-1])
if H.ndim > 0:
H[:] = np.NaN
else:
H = np.float64(np.NaN)
else:
def f(X, Y, Alphabet_X, Alphabet_Y):
return information_lautum(X, Y, False, base, fill_value, estimator,
Alphabet_X, Alphabet_Y)
return _cartesian_product_apply(X, Y, f, Alphabet_X, Alphabet_Y)
# Re-shape H, X and Y, so that we may handle multi-dimensional arrays
# equivalently and iterate across 0th axis
X = np.reshape(X, (-1, X.shape[-1]))
Y = np.reshape(Y, (-1, Y.shape[-1]))
Alphabet_X = np.reshape(Alphabet_X, (-1, Alphabet_X.shape[-1]))
Alphabet_Y = np.reshape(Alphabet_Y, (-1, Alphabet_Y.shape[-1]))
orig_shape_H = H.shape
H = np.reshape(H, (-1, 1))
_verify_alphabet_sufficiently_large(X, Alphabet_X, fill_value)
_verify_alphabet_sufficiently_large(Y, Alphabet_Y, fill_value)
for i in range(X.shape[0]):
# Sort X and Y jointly, so that we can determine joint symbol
# probabilities
JointXY = np.vstack((X[i], Y[i]))
JointXY = JointXY[:, JointXY[0].argsort(kind='mergesort')]
JointXY = JointXY[:, JointXY[1].argsort(kind='mergesort')]
# Compute symbol run-lengths
# Compute symbol change indicators
B = np.any(JointXY[:, 1:] != JointXY[:, :-1], axis=0)
# Obtain symbol change positions
I = np.append(np.where(B), JointXY.shape[1]-1)
# Compute run lengths
L = np.diff(np.append(-1, I))
alphabet_XY = JointXY[:, I]
if estimator != 'ML':
L, alphabet_XY = \
_append_empty_bins_using_alphabet(L, alphabet_XY,
_vstack_pad((Alphabet_X[i],
Alphabet_Y[i]),
fill_value),
fill_value)
L, alphabet_XY = _remove_counts_at_fill_value(L, alphabet_XY,
fill_value)
if not np.any(L):
continue
P_XY, _ = _estimate_probabilities(L, estimator)
# Assign probabilities in P_XY to P_XY_reshaped, a matrix which
# exhaustively records probabilities for all elements in the cartesian
# product of alphabets. In this way, we can subsequently integrate
# across variables X, Y.
alphabet_X = np.unique(alphabet_XY[0])
alphabet_Y = np.unique(alphabet_XY[1])
P_XY_reshaped = np.zeros((alphabet_Y.size, alphabet_X.size))
j = k = c = 0
for c in range(P_XY.size):
if c > 0 and alphabet_XY[1, c] != alphabet_XY[1, c-1]:
k = 0
while alphabet_X[k] != alphabet_XY[0, c]:
k = k + 1
while alphabet_Y[j] != alphabet_XY[1, c]:
j = j + 1
P_XY_reshaped[j, k] = P_XY[c]
# Integrate across X, Y
P_X = np.reshape(np.sum(P_XY_reshaped, axis=0), (1, -1))
P_Y = np.reshape(np.sum(P_XY_reshaped, axis=1), (-1, 1))
H[i] = divergence_kullbackleibler_pmf(np.reshape(P_X*P_Y,
(1, -1)),
np.reshape(P_XY_reshaped,
(1, -1)),
False, base)
# Reverse re-shaping
H = np.reshape(H, orig_shape_H)
if keep_dims and not cartesian_product:
H = H[..., np.newaxis]
return H
def information_mutual_normalised(X, Y=None, norm_factor='Y',
cartesian_product=False, fill_value=-1,
estimator='ML', Alphabet_X=None,
Alphabet_Y=None, keep_dims=False):
# TODO Documentation should include properties for each of the
# normalisation factors
"""
Returns the normalised mutual information between arrays X and Y, each
containing discrete random variable realisations.
**Mathematical definition**:
Given discrete random variables :math:`X`, :math:`Y`, the normalised mutual
information :math:`NI(X;Y)` is defined as:
.. math::
NI(X;Y) = \\frac{I(X;Y)}{C_n}
where :math:`I` denotes the mutual information and where :math:`C_n`
denotes a normalisation factor. Normalised mutual information is a
dimensionless quantity, with :math:`C_n` alternatively defined as:
.. math::
\\begin{eqnarray}
C_{\\text{X}} &=& H(X) \\\\
C_{\\text{Y}} &=& H(Y) \\\\
C_{\\text{X+Y}} &=& H(X) + H(Y) \\\\
C_{\\text{MIN}} &=& \\min \\{ H(X), H(Y) \\} \\\\
C_{\\text{MAX}} &=& \\max \\{ H(X), H(Y) \\} \\\\
C_{\\text{XY}} &=& H(X,Y) \\\\
C_{\\text{SQRT}} &=& \\sqrt{H(X) H(Y)}
\\end{eqnarray}
where :math:`H(\\cdot)` and :math:`H(\\cdot,\\cdot)` respectively denote
the entropy and joint entropy.
**Estimation**:
Normalised mutual information is estimated based on frequency tables, using
the following functions:
entropy_joint()
entropy()
See below for a list of available estimators. Note that although normalised
mutual information is a non-negative quantity, depending on the chosen
estimator the obtained estimate may be negative.
**Parameters**:
X,Y : numpy array (or array-like object such as a list of immutables, as \
accepted by np.array())
*cartesian_product==False and Y is not None*: X and Y are arrays
containing discrete random variable realisations, with
X.shape==Y.shape. Successive realisations of a random variable are
indexed by the last axis in the respective arrays; multiple random
variables in X and Y may be specified using preceding axes of the
respective arrays (random variables are paired **one-to-one** between X
and Y). When X.ndim==Y.ndim==1, returns a scalar. When X.ndim>1 and
Y.ndim>1, returns an array of estimated normalised information values
with dimensions X.shape[:-1]. Neither X nor Y may contain (floating
point) NaN values. Missing data may be specified using numpy masked
arrays, as well as using standard numpy array/array-like objects;
see below for details.
*cartesian_product==True and Y is not None*: X and Y are arrays
containing discrete random variable realisations, with
X.shape[-1]==Y.shape[-1]. Successive realisations of a random variable
are indexed by the last axis in the respective arrays; multiple random
variables in X and Y may be specified using preceding axes of the
respective arrays (random variables are paired **many-to-many** between
X and Y). When X.ndim==Y.ndim==1, returns a scalar. When X.ndim>1 or
Y.ndim>1, returns an array of estimated normalised information values
with dimensions np.append(X.shape[:-1],Y.shape[:-1]). Neither X nor Y
may contain (floating point) NaN values. Missing data may be specified
using numpy masked arrays, as well as using standard numpy
array/array-like objects; see below for details.
*Y is None*: Equivalent to information_mutual_normalised(X, X,
norm_factor, True). Thus, a shorthand syntax for computing normalised
mutual information (based on C_n = C_Y as defined above) between all
pairs of random variables in X is information_mutual_normalised(X).
norm_factor : string
The desired normalisation factor, specified as a string. Internally,
the supplied string is converted to upper case and spaces are
discarded. Subsequently, the function tests for one of the following
string values, each corresponding to an alternative normalisation
factor as defined above:
*'X'*
*'Y' (the default value)*
*'X+Y' (equivalently 'Y+X')*
*'MIN'*
*'MAX'*
*'XY' (equivalently 'YX')*
*'SQRT'*
cartesian_product : boolean
Indicates whether random variables are paired **one-to-one** between X
and Y (cartesian_product==False, the default value) or **many-to-many**
between X and Y (cartesian_product==True).
fill_value : object
It is possible to specify missing data using numpy masked arrays,
pandas Series/DataFrames, as well as using standard numpy
array/array-like objects with assigned placeholder values. When using
numpy masked arrays, this function invokes np.ma.filled() internally,
so that missing data are represented with the array's object-internal
placeholder value fill_value (this function's fill_value parameter is
ignored in such cases). When using pandas Series/DataFrames, an initial
conversion to a numpy masked array is performed. When using standard
numpy array/array-like objects, this function's fill_value parameter is
used to specify the placeholder value for missing data (defaults to
-1).
Data equal to the placeholder value are subsequently ignored.
estimator : str or float
The desired estimator (see above for details on estimators). Possible
values are:
*'ML' (the default value)* : Maximum likelihood estimator.
*any floating point value* : Maximum a posteriori estimator using
Dirichlet prior (equivalent to maximum likelihood with pseudo-count
for each outcome as specified).
*PERKS* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to 1/L, where L is the number of possible outcomes).
*MINIMAX* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to sqrt(N)/L, where N is the total number of
realisations and where L is the number of possible outcomes).
*JAMES-STEIN* : James-Stein estimator [HaSt09].
*GOOD-TURING* : Good-Turing estimator [GaSa95].
Alphabet_X, Alphabet_Y : numpy array (or array-like object such as a list \
of immutables, as accepted by np.array())
Respectively an array specifying the alphabet/alphabets of possible
outcomes that random variable realisations in array X, Y may assume.
Defaults to None, in which case the alphabet/alphabets of possible
outcomes is/are implicitly based on the observed outcomes in array X, Y
respectively, with no additional, unobserved outcomes. In combination
with any estimator other than maximum likelihood, it may be useful to
specify alphabets including unobserved outcomes. For such cases,
successive possible outcomes of a random variable are indexed by the
last axis in Alphabet_X, Alphabet_Y respectively; multiple alphabets
may be specified using preceding axes, with the requirement
X.shape[:-1]==Alphabet_X.shape[:-1] (analogously for Y). Alphabets of
different sizes may be specified either using numpy masked arrays, or
by padding with the chosen placeholder fill_value.
NB: When specifying alphabets, an alphabet of possible joint outcomes
is always implicit from the alphabets of possible (marginal) outcomes
in Alphabet_X, Alphabet_Y. For example, specifying
Alphabet_X=Alphabet_Y=np.array((1,2)) implies an alphabet of possible
joint outcomes np.array(((1,1,2,2),(1,2,1,2))).
keep_dims : boolean
When set to True and cartesian_product==False an additional dimension
of length one is appended to the returned array, facilitating any
broadcast operations required by the user (defaults to False). Has no
effect when cartesian_product==True.
**Implementation notes**:
Before estimation, outcomes are mapped to the set of non-negative integers
internally, with the value -1 representing missing data. To avoid this
internal conversion step, supply integer data and use the default fill
value -1.
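**Example**:
A minimal sketch (it assumes the package is installed and the module is
importable as drv; the default maximum likelihood estimate is exact here):
>>> from pyitlib import discrete_random_variable as drv
>>> X = [0, 0, 1, 1]
>>> # A variable shares all of its entropy with itself, so NI is 1 when
>>> # normalising by H(Y) (the default) ...
>>> float(drv.information_mutual_normalised(X, X))
1.0
>>> # ... and likewise when normalising by the joint entropy H(X,Y)
>>> float(drv.information_mutual_normalised(X, X, norm_factor='XY'))
1.0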
"""
if Y is None:
Y = X
cartesian_product = True
Alphabet_Y = Alphabet_X
X, fill_value_X = _sanitise_array_input(X, fill_value)
Y, fill_value_Y = _sanitise_array_input(Y, fill_value)
if Alphabet_X is not None:
Alphabet_X, fill_value_Alphabet_X = _sanitise_array_input(Alphabet_X,
fill_value)
Alphabet_X, _ = _autocreate_alphabet(Alphabet_X, fill_value_Alphabet_X)
else:
Alphabet_X, fill_value_Alphabet_X = _autocreate_alphabet(X,
fill_value_X)
if Alphabet_Y is not None:
Alphabet_Y, fill_value_Alphabet_Y = _sanitise_array_input(Alphabet_Y,
fill_value)
Alphabet_Y, _ = _autocreate_alphabet(Alphabet_Y, fill_value_Alphabet_Y)
else:
Alphabet_Y, fill_value_Alphabet_Y = _autocreate_alphabet(Y,
fill_value_Y)
if X.size == 0:
raise ValueError("arg X contains no elements")
if Y.size == 0:
raise ValueError("arg Y contains no elements")
if np.any(_isnan(X)):
raise ValueError("arg X contains NaN values")
if np.any(_isnan(Y)):
raise ValueError("arg Y contains NaN values")
if Alphabet_X.size == 0:
raise ValueError("arg Alphabet_X contains no elements")
if np.any(_isnan(Alphabet_X)):
raise ValueError("arg Alphabet_X contains NaN values")
if Alphabet_Y.size == 0:
raise ValueError("arg Alphabet_Y contains no elements")
if np.any(_isnan(Alphabet_Y)):
raise ValueError("arg Alphabet_Y contains NaN values")
if _isnan(fill_value_X):
raise ValueError("fill value for arg X is NaN")
if _isnan(fill_value_Y):
raise ValueError("fill value for arg Y is NaN")
if X.shape[:-1] != Alphabet_X.shape[:-1]:
raise ValueError("leading dimensions of args X and Alphabet_X do not "
"match")
if Y.shape[:-1] != Alphabet_Y.shape[:-1]:
raise ValueError("leading dimensions of args Y and Alphabet_Y do not "
"match")
if not isinstance(norm_factor, str):
raise ValueError("arg norm_factor not a string")
if not cartesian_product and X.shape != Y.shape:
raise ValueError("dimensions of args X and Y do not match")
if cartesian_product and X.shape[-1] != Y.shape[-1]:
raise ValueError("trailing dimensions of args X and Y do not match")
# NB: No base parameter needed here, therefore no test!
S, fill_value = _map_observations_to_integers((X, Alphabet_X,
Y, Alphabet_Y),
(fill_value_X,
fill_value_Alphabet_X,
fill_value_Y,
fill_value_Alphabet_Y))
X, Alphabet_X, Y, Alphabet_Y = S
I = information_mutual(X, Y, cartesian_product, fill_value=fill_value,
estimator=estimator, Alphabet_X=Alphabet_X,
Alphabet_Y=Alphabet_Y)
norm_factor = norm_factor.upper().replace(' ', '')
if norm_factor == 'Y':
H2 = entropy(Y, fill_value=fill_value, estimator=estimator,
Alphabet_X=Alphabet_Y)
H2 = np.reshape(H2, np.append(np.ones(I.ndim-H2.ndim),
H2.shape).astype('int'))
C = H2
elif norm_factor == 'X':
H1 = entropy(X, fill_value=fill_value, estimator=estimator,
Alphabet_X=Alphabet_X)
H1 = np.reshape(H1, np.append(H1.shape,
np.ones(I.ndim-H1.ndim)).astype('int'))
C = H1
elif norm_factor == 'Y+X' or norm_factor == 'X+Y':
H1 = entropy(X, fill_value=fill_value, estimator=estimator,
Alphabet_X=Alphabet_X)
H1 = np.reshape(H1, np.append(H1.shape,
np.ones(I.ndim-H1.ndim)).astype('int'))
H2 = entropy(Y, fill_value=fill_value, estimator=estimator,
Alphabet_X=Alphabet_Y)
H2 = np.reshape(H2, np.append(np.ones(I.ndim-H2.ndim),
H2.shape).astype('int'))
C = H1 + H2
elif norm_factor == 'MIN':
H1 = entropy(X, fill_value=fill_value, estimator=estimator,
Alphabet_X=Alphabet_X)
H1 = np.reshape(H1, np.append(H1.shape,
np.ones(I.ndim-H1.ndim)).astype('int'))
H2 = entropy(Y, fill_value=fill_value, estimator=estimator,
Alphabet_X=Alphabet_Y)
H2 = np.reshape(H2, np.append(np.ones(I.ndim-H2.ndim),
H2.shape).astype('int'))
C = np.minimum(H1, H2)
elif norm_factor == 'MAX':
H1 = entropy(X, fill_value=fill_value, estimator=estimator,
Alphabet_X=Alphabet_X)
H1 = np.reshape(H1, np.append(H1.shape,
np.ones(I.ndim-H1.ndim)).astype('int'))
H2 = entropy(Y, fill_value=fill_value, estimator=estimator,
Alphabet_X=Alphabet_Y)
H2 = np.reshape(H2, np.append(np.ones(I.ndim-H2.ndim),
H2.shape).astype('int'))
C = np.maximum(H1, H2)
elif norm_factor == 'XY' or norm_factor == 'YX':
if not cartesian_product:
H = np.empty(X.shape[:-1])
if H.ndim > 0:
H[:] = np.NaN
else:
H = np.float64(np.NaN)
# Re-shape H, X and Y, so that we may handle multi-dimensional
# arrays equivalently and iterate across 0th axis
X = np.reshape(X, (-1, X.shape[-1]))
Y = np.reshape(Y, (-1, Y.shape[-1]))
Alphabet_X = np.reshape(Alphabet_X, (-1, Alphabet_X.shape[-1]))
Alphabet_Y = np.reshape(Alphabet_Y, (-1, Alphabet_Y.shape[-1]))
orig_shape_H = H.shape
H = np.reshape(H, (-1, 1))
for i in range(X.shape[0]):
H[i] = entropy_joint(np.vstack((X[i], Y[i])),
fill_value=fill_value,
estimator=estimator,
Alphabet_X=_vstack_pad((Alphabet_X[i],
Alphabet_Y[i]),
fill_value))
H = np.reshape(H, orig_shape_H)
C = H
else:
def f(X, Y, Alphabet_X, Alphabet_Y):
return entropy_joint(np.vstack((X, Y)), fill_value=fill_value,
estimator=estimator,
Alphabet_X=_vstack_pad((Alphabet_X,
Alphabet_Y),
fill_value))
H = _cartesian_product_apply(X, Y, f, Alphabet_X, Alphabet_Y)
C = H
elif norm_factor == 'SQRT':
H1 = entropy(X, fill_value=fill_value, estimator=estimator,
Alphabet_X=Alphabet_X)
H1 = np.reshape(H1, np.append(H1.shape,
np.ones(I.ndim-H1.ndim)).astype('int'))
H2 = entropy(Y, fill_value=fill_value, estimator=estimator,
Alphabet_X=Alphabet_Y)
H2 = np.reshape(H2, np.append(np.ones(I.ndim-H2.ndim),
H2.shape).astype('int'))
C = np.sqrt(H1 * H2)
else:
raise ValueError("arg norm_factor has invalid value")
I = I / C
if keep_dims and not cartesian_product:
I = I[..., np.newaxis]
return I
def information_variation(X, Y=None, cartesian_product=False, base=2,
fill_value=-1, estimator='ML', Alphabet_X=None,
Alphabet_Y=None, keep_dims=False):
"""
Returns the variation of information [Meil03] between arrays X and Y, each
containing discrete random variable realisations.
**Mathematical definition**:
Given discrete random variables :math:`X`, :math:`Y`, the variation of
information :math:`VI(X;Y)` is defined as:
.. math::
VI(X;Y) = H(X|Y) + H(Y|X)
where :math:`H(\\cdot|\\cdot)` denotes the conditional entropy.
**Estimation**:
Variation of information is estimated based on frequency tables, using the
following functions:
entropy_joint()
entropy()
See below for a list of available estimators. Note that although variation
of information is a non-negative quantity, depending on the chosen
estimator the obtained estimate may be negative.
**Parameters**:
X,Y : numpy array (or array-like object such as a list of immutables, as \
accepted by np.array())
*cartesian_product==False and Y is not None*: X and Y are arrays
containing discrete random variable realisations, with
X.shape==Y.shape. Successive realisations of a random variable are
indexed by the last axis in the respective arrays; multiple random
variables in X and Y may be specified using preceding axes of the
respective arrays (random variables are paired **one-to-one** between X
and Y). When X.ndim==Y.ndim==1, returns a scalar. When X.ndim>1 and
Y.ndim>1, returns an array of estimated information values with
dimensions X.shape[:-1]. Neither X nor Y may contain (floating point)
NaN values. Missing data may be specified using numpy masked arrays, as
well as using standard numpy array/array-like objects; see below
for details.
*cartesian_product==True and Y is not None*: X and Y are arrays
containing discrete random variable realisations, with
X.shape[-1]==Y.shape[-1]. Successive realisations of a random variable
are indexed by the last axis in the respective arrays; multiple random
variables in X and Y may be specified using preceding axes of the
respective arrays (random variables are paired **many-to-many** between
X and Y). When X.ndim==Y.ndim==1, returns a scalar. When X.ndim>1 or
Y.ndim>1, returns an array of estimated information values with
dimensions np.append(X.shape[:-1],Y.shape[:-1]). Neither X nor Y may
contain (floating point) NaN values. Missing data may be specified
using numpy masked arrays, as well as using standard numpy
array/array-like objects; see below for details.
*Y is None*: Equivalent to information_variation(X, X, ... ). Thus, a
shorthand syntax for computing variation of information (in bits)
between all pairs of random variables in X is information_variation(X).
cartesian_product : boolean
Indicates whether random variables are paired **one-to-one** between X
and Y (cartesian_product==False, the default value) or **many-to-many**
between X and Y (cartesian_product==True).
base : float
The desired logarithmic base (default 2).
fill_value : object
It is possible to specify missing data using numpy masked arrays,
pandas Series/DataFrames, as well as using standard numpy
array/array-like objects with assigned placeholder values. When using
numpy masked arrays, this function invokes np.ma.filled() internally,
so that missing data are represented with the array's object-internal
placeholder value fill_value (this function's fill_value parameter is
ignored in such cases). When using pandas Series/DataFrames, an initial
conversion to a numpy masked array is performed. When using standard
numpy array/array-like objects, this function's fill_value parameter is
used to specify the placeholder value for missing data (defaults to
-1).
Data equal to the placeholder value are subsequently ignored.
estimator : str or float
The desired estimator (see above for details on estimators). Possible
values are:
*'ML' (the default value)* : Maximum likelihood estimator.
*any floating point value* : Maximum a posteriori estimator using
Dirichlet prior (equivalent to maximum likelihood with pseudo-count
for each outcome as specified).
*PERKS* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to 1/L, where L is the number of possible outcomes).
*MINIMAX* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to sqrt(N)/L, where N is the total number of
realisations and where L is the number of possible outcomes).
*JAMES-STEIN* : James-Stein estimator [HaSt09].
*GOOD-TURING* : Good-Turing estimator [GaSa95].
Alphabet_X, Alphabet_Y : numpy array (or array-like object such as a list \
of immutables, as accepted by np.array())
Respectively an array specifying the alphabet/alphabets of possible
outcomes that random variable realisations in array X, Y may assume.
Defaults to None, in which case the alphabet/alphabets of possible
outcomes is/are implicitly based on the observed outcomes in array X, Y
respectively, with no additional, unobserved outcomes. In combination
with any estimator other than maximum likelihood, it may be useful to
specify alphabets including unobserved outcomes. For such cases,
successive possible outcomes of a random variable are indexed by the
last axis in Alphabet_X, Alphabet_Y respectively; multiple alphabets
may be specified using preceding axes, with the requirement
X.shape[:-1]==Alphabet_X.shape[:-1] (analogously for Y). Alphabets of
different sizes may be specified either using numpy masked arrays, or
by padding with the chosen placeholder fill_value.
NB: When specifying alphabets, an alphabet of possible joint outcomes
is always implicit from the alphabets of possible (marginal) outcomes
in Alphabet_X, Alphabet_Y. For example, specifying
Alphabet_X=Alphabet_Y=np.array((1,2)) implies an alphabet of possible
joint outcomes np.array(((1,1,2,2),(1,2,1,2))).
keep_dims : boolean
When set to True and cartesian_product==False an additional dimension
of length one is appended to the returned array, facilitating any
broadcast operations required by the user (defaults to False). Has no
effect when cartesian_product==True.
**Implementation notes**:
Before estimation, outcomes are mapped to the set of non-negative integers
internally, with the value -1 representing missing data. To avoid this
internal conversion step, supply integer data and use the default fill
value -1.
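**Example**:
A minimal sketch (it assumes the package is installed and the module is
importable as drv; the default maximum likelihood estimate is exact here):
>>> from pyitlib import discrete_random_variable as drv
>>> X = [0, 0, 1, 1]
>>> Y = [0, 1, 0, 1]
>>> # Two empirically independent fair bits: VI(X;Y) = H(X|Y) + H(Y|X)
>>> # = 1 + 1 = 2 bits
>>> float(drv.information_variation(X, Y))
2.0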
"""
if Y is None:
Y = X
cartesian_product = True
Alphabet_Y = Alphabet_X
H1 = entropy_conditional(X, Y, cartesian_product, base, fill_value,
estimator, Alphabet_X, Alphabet_Y)
H2 = entropy_conditional(Y, X, cartesian_product, base, fill_value,
estimator, Alphabet_Y, Alphabet_X)
if cartesian_product:
H2 = H2.T
H = H1 + H2
if keep_dims and not cartesian_product:
H = H[..., np.newaxis]
return H
def information_mutual(X, Y=None, cartesian_product=False, base=2,
fill_value=-1, estimator='ML', Alphabet_X=None,
Alphabet_Y=None, keep_dims=False):
"""
Returns the mutual information (see e.g. [CoTh06]) between arrays X and Y,
each containing discrete random variable realisations.
**Mathematical definition**:
Given discrete random variables :math:`X`, :math:`Y`, the mutual
information :math:`I(X;Y)` is defined as:
.. math::
I(X;Y) = H(X) - H(X|Y)
where :math:`H(\\cdot)` denotes the entropy and where
:math:`H(\\cdot|\\cdot)` denotes the conditional entropy.
**Estimation**:
Mutual information is estimated based on frequency tables, using the
following functions:
entropy_joint()
entropy()
See below for a list of available estimators. Note that although mutual
information is a non-negative quantity, depending on the chosen estimator
the obtained estimate may be negative.
**Parameters**:
X,Y : numpy array (or array-like object such as a list of immutables, as \
accepted by np.array())
*cartesian_product==False and Y is not None*: X and Y are arrays
containing discrete random variable realisations, with
X.shape==Y.shape. Successive realisations of a random variable are
indexed by the last axis in the respective arrays; multiple random
variables in X and Y may be specified using preceding axes of the
respective arrays (random variables are paired **one-to-one** between X
and Y). When X.ndim==Y.ndim==1, returns a scalar. When X.ndim>1 and
Y.ndim>1, returns an array of estimated mutual information values with
dimensions X.shape[:-1]. Neither X nor Y may contain (floating point)
NaN values. Missing data may be specified using numpy masked arrays, as
well as using standard numpy array/array-like objects; see below
for details.
*cartesian_product==True and Y is not None*: X and Y are arrays
containing discrete random variable realisations, with
X.shape[-1]==Y.shape[-1]. Successive realisations of a random variable
are indexed by the last axis in the respective arrays; multiple random
variables in X and Y may be specified using preceding axes of the
respective arrays (random variables are paired **many-to-many** between
X and Y). When X.ndim==Y.ndim==1, returns a scalar. When X.ndim>1 or
Y.ndim>1, returns an array of estimated mutual information values with
dimensions np.append(X.shape[:-1],Y.shape[:-1]). Neither X nor Y may
contain (floating point) NaN values. Missing data may be specified
using numpy masked arrays, as well as using standard numpy
array/array-like objects; see below for details.
*Y is None*: Equivalent to information_mutual(X, X, ... ). Thus, a
shorthand syntax for computing mutual information (in bits) between all
pairs of random variables in X is information_mutual(X).
cartesian_product : boolean
Indicates whether random variables are paired **one-to-one** between X
and Y (cartesian_product==False, the default value) or **many-to-many**
between X and Y (cartesian_product==True).
base : float
The desired logarithmic base (default 2).
fill_value : object
It is possible to specify missing data using numpy masked arrays,
pandas Series/DataFrames, as well as using standard numpy
array/array-like objects with assigned placeholder values. When using
numpy masked arrays, this function invokes np.ma.filled() internally,
so that missing data are represented with the array's object-internal
placeholder value fill_value (this function's fill_value parameter is
ignored in such cases). When using pandas Series/DataFrames, an initial
conversion to a numpy masked array is performed. When using standard
numpy array/array-like objects, this function's fill_value parameter is
used to specify the placeholder value for missing data (defaults to
-1).
Data equal to the placeholder value are subsequently ignored.
estimator : str or float
The desired estimator (see above for details on estimators). Possible
values are:
*'ML' (the default value)* : Maximum likelihood estimator.
*any floating point value* : Maximum a posteriori estimator using
Dirichlet prior (equivalent to maximum likelihood with pseudo-count
for each outcome as specified).
*PERKS* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to 1/L, where L is the number of possible outcomes).
*MINIMAX* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to sqrt(N)/L, where N is the total number of
realisations and where L is the number of possible outcomes).
*JAMES-STEIN* : James-Stein estimator [HaSt09].
*GOOD-TURING* : Good-Turing estimator [GaSa95].
Alphabet_X, Alphabet_Y : numpy array (or array-like object such as a list \
of immutables, as accepted by np.array())
Respectively an array specifying the alphabet/alphabets of possible
outcomes that random variable realisations in array X, Y may assume.
Defaults to None, in which case the alphabet/alphabets of possible
outcomes is/are implicitly based on the observed outcomes in array X, Y
respectively, with no additional, unobserved outcomes. In combination
with any estimator other than maximum likelihood, it may be useful to
specify alphabets including unobserved outcomes. For such cases,
successive possible outcomes of a random variable are indexed by the
last axis in Alphabet_X, Alphabet_Y respectively; multiple alphabets
may be specified using preceding axes, with the requirement
X.shape[:-1]==Alphabet_X.shape[:-1] (analogously for Y). Alphabets of
different sizes may be specified either using numpy masked arrays, or
by padding with the chosen placeholder fill_value.
NB: When specifying alphabets, an alphabet of possible joint outcomes
is always implicit from the alphabets of possible (marginal) outcomes
in Alphabet_X, Alphabet_Y. For example, specifying
Alphabet_X=Alphabet_Y=np.array((1,2)) implies an alphabet of possible
joint outcomes np.array(((1,1,2,2),(1,2,1,2))).
keep_dims : boolean
When set to True and cartesian_product==False an additional dimension
of length one is appended to the returned array, facilitating any
broadcast operations required by the user (defaults to False). Has no
effect when cartesian_product==True.
**Implementation notes**:
Before estimation, outcomes are mapped to the set of non-negative integers
internally, with the value -1 representing missing data. To avoid this
internal conversion step, supply integer data and use the default fill
value -1.
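**Example**:
A minimal sketch (it assumes the package is installed and the module is
importable as drv; the default maximum likelihood estimate is exact here):
>>> from pyitlib import discrete_random_variable as drv
>>> X = [0, 0, 1, 1]
>>> # A fair bit shares all of its 1 bit of entropy with itself ...
>>> float(drv.information_mutual(X, X))
1.0
>>> # ... and no information with an empirically independent variable
>>> float(drv.information_mutual(X, [0, 1, 0, 1]))
0.0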
"""
H_conditional = entropy_conditional(X, Y, cartesian_product, base,
fill_value, estimator, Alphabet_X,
Alphabet_Y)
H = entropy(X, base, fill_value, estimator, Alphabet_X)
H = np.reshape(H, np.append(H.shape,
np.ones(H_conditional.ndim -
H.ndim)).astype('int'))
H = H - H_conditional
if keep_dims and not cartesian_product:
H = H[..., np.newaxis]
return H
def entropy_cross(X, Y=None, cartesian_product=False, base=2, fill_value=-1,
estimator='ML', Alphabet_X=None, Alphabet_Y=None,
keep_dims=False):
"""
Returns the cross entropy (see e.g. [Murp12]) between arrays X and Y, each
containing discrete random variable realisations.
**Mathematical definition**:
Denoting with :math:`P_X(x)`, :math:`P_Y(x)` respectively the probability
of observing an outcome :math:`x` with discrete random variables :math:`X`,
:math:`Y`, the cross entropy :math:`H^\\times(X,Y)` is defined as:
.. math::
H^\\times(X,Y) = -\\sum_x {P_X(x) \\log {P_Y(x)}}.
**Estimation**:
Cross entropy is estimated based on frequency tables. See below for a list
of available estimators.
**Parameters**:
X,Y : numpy array (or array-like object such as a list of immutables, as \
accepted by np.array())
*cartesian_product==False and Y is not None*: X and Y are arrays
containing discrete random variable realisations, with
X.shape[:-1]==Y.shape[:-1]. Successive realisations of a random
variable are indexed by the last axis in the respective arrays;
multiple random variables in X and Y may be specified using preceding
axes of the respective arrays (random variables are paired
**one-to-one** between X and Y). When X.ndim==Y.ndim==1, returns a
scalar. When X.ndim>1 and Y.ndim>1, returns an array of estimated cross
entropies with dimensions X.shape[:-1]. Neither X nor Y may contain
(floating point) NaN values. Missing data may be specified using numpy
masked arrays, as well as using standard numpy array/array-like
objects; see below for details.
*cartesian_product==True and Y is not None*: X and Y are arrays
containing discrete random variable realisations. Successive
realisations of a random variable are indexed by the last axis in the
respective arrays; multiple random variables in X and Y may be
specified using preceding axes of the respective arrays (random
variables are paired **many-to-many** between X and Y). When
X.ndim==Y.ndim==1, returns a scalar. When X.ndim>1 or Y.ndim>1, returns
an array of estimated cross entropies with dimensions
np.append(X.shape[:-1],Y.shape[:-1]). Neither X nor Y may contain
(floating point) NaN values. Missing data may be specified using numpy
masked arrays, as well as using standard numpy array/array-like
objects; see below for details.
*Y is None*: Equivalent to entropy_cross(X, X, ... ). Thus, a shorthand
syntax for computing cross entropies (in bits) between all pairs of
random variables in X is entropy_cross(X).
cartesian_product : boolean
Indicates whether random variables are paired **one-to-one** between X
and Y (cartesian_product==False, the default value) or **many-to-many**
between X and Y (cartesian_product==True).
base : float
The desired logarithmic base (default 2).
fill_value : object
It is possible to specify missing data using numpy masked arrays,
pandas Series/DataFrames, as well as using standard numpy
array/array-like objects with assigned placeholder values. When using
numpy masked arrays, this function invokes np.ma.filled() internally,
so that missing data are represented with the array's object-internal
placeholder value fill_value (this function's fill_value parameter is
ignored in such cases). When using pandas Series/DataFrames, an initial
conversion to a numpy masked array is performed. When using standard
numpy array/array-like objects, this function's fill_value parameter is
used to specify the placeholder value for missing data (defaults to
-1).
Data equal to the placeholder value are subsequently ignored.
estimator : str or float
The desired estimator (see above for details on estimators). Possible
values are:
*'ML' (the default value)* : Maximum likelihood estimator.
*any floating point value* : Maximum a posteriori estimator using
Dirichlet prior (equivalent to maximum likelihood with pseudo-count
for each outcome as specified).
*PERKS* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to 1/L, where L is the number of possible outcomes).
*MINIMAX* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to sqrt(N)/L, where N is the total number of
realisations and where L is the number of possible outcomes).
*JAMES-STEIN* : James-Stein estimator [HaSt09].
*GOOD-TURING* : Good-Turing estimator [GaSa95].
Alphabet_X, Alphabet_Y : numpy array (or array-like object such as a list \
of immutables, as accepted by np.array())
Respectively an array specifying the alphabet/alphabets of possible
outcomes that random variable realisations in array X, Y may assume.
Defaults to None, in which case the alphabet/alphabets of possible
outcomes is/are implicitly based on the observed outcomes in array X, Y
respectively, with no additional, unobserved outcomes. In combination
with any estimator other than maximum likelihood, it may be useful to
specify alphabets including unobserved outcomes. For such cases,
successive possible outcomes of a random variable are indexed by the
last axis in Alphabet_X, Alphabet_Y respectively; multiple alphabets
may be specified using preceding axes, with the requirement
X.shape[:-1]==Alphabet_X.shape[:-1] (analogously for Y). Alphabets of
different sizes may be specified either using numpy masked arrays, or
by padding with the chosen placeholder fill_value.
keep_dims : boolean
When set to True and cartesian_product==False an additional dimension
of length one is appended to the returned array, facilitating any
broadcast operations required by the user (defaults to False). Has no
effect when cartesian_product==True.
**Implementation notes**:
Before estimation, outcomes are mapped to the set of non-negative integers
internally, with the value -1 representing missing data. To avoid this
internal conversion step, supply integer data and use the default fill
value -1.
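**Example**:
A minimal sketch (it assumes the package is installed and the module is
importable as drv; the default maximum likelihood estimate is exact here):
>>> from pyitlib import discrete_random_variable as drv
>>> X = [0, 0, 1, 1]
>>> # When both arguments follow the same empirical distribution, the
>>> # cross entropy coincides with the entropy H(X), here 1 bit
>>> float(drv.entropy_cross(X, X))
1.0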
"""
if Y is None:
Y = X
cartesian_product = True
Alphabet_Y = Alphabet_X
X, fill_value_X = _sanitise_array_input(X, fill_value)
Y, fill_value_Y = _sanitise_array_input(Y, fill_value)
if Alphabet_X is not None:
Alphabet_X, fill_value_Alphabet_X = _sanitise_array_input(Alphabet_X,
fill_value)
Alphabet_X, _ = _autocreate_alphabet(Alphabet_X, fill_value_Alphabet_X)
else:
Alphabet_X, fill_value_Alphabet_X = _autocreate_alphabet(X,
fill_value_X)
if Alphabet_Y is not None:
Alphabet_Y, fill_value_Alphabet_Y = _sanitise_array_input(Alphabet_Y,
fill_value)
Alphabet_Y, _ = _autocreate_alphabet(Alphabet_Y, fill_value_Alphabet_Y)
else:
Alphabet_Y, fill_value_Alphabet_Y = _autocreate_alphabet(Y,
fill_value_Y)
if X.size == 0:
raise ValueError("arg X contains no elements")
if Y.size == 0:
raise ValueError("arg Y contains no elements")
if np.any(_isnan(X)):
raise ValueError("arg X contains NaN values")
if np.any(_isnan(Y)):
raise ValueError("arg Y contains NaN values")
if Alphabet_X.size == 0:
raise ValueError("arg Alphabet_X contains no elements")
if np.any(_isnan(Alphabet_X)):
raise ValueError("arg Alphabet_X contains NaN values")
if Alphabet_Y.size == 0:
raise ValueError("arg Alphabet_Y contains no elements")
if np.any(_isnan(Alphabet_Y)):
raise ValueError("arg Alphabet_Y contains NaN values")
if _isnan(fill_value_X):
raise ValueError("fill value for arg X is NaN")
if _isnan(fill_value_Y):
raise ValueError("fill value for arg Y is NaN")
if X.shape[:-1] != Alphabet_X.shape[:-1]:
raise ValueError("leading dimensions of args X and Alphabet_X do not "
"match")
if Y.shape[:-1] != Alphabet_Y.shape[:-1]:
raise ValueError("leading dimensions of args Y and Alphabet_Y do not "
"match")
if not cartesian_product and X.shape[:-1] != Y.shape[:-1]:
raise ValueError("dimensions of args X and Y do not match")
if not (np.isscalar(base) and np.isreal(base) and base > 0):
raise ValueError("arg base not a positive real-valued scalar")
S, fill_value = _map_observations_to_integers((X, Alphabet_X,
Y, Alphabet_Y),
(fill_value_X,
fill_value_Alphabet_X,
fill_value_Y,
fill_value_Alphabet_Y))
X, Alphabet_X, Y, Alphabet_Y = S
if not cartesian_product:
H = np.empty(X.shape[:-1])
if H.ndim > 0:
H[:] = np.NaN
else:
H = np.float64(np.NaN)
else:
def f(X, Y, Alphabet_X, Alphabet_Y):
return entropy_cross(X, Y, False, base, fill_value, estimator,
Alphabet_X, Alphabet_Y)
return _cartesian_product_apply(X, Y, f, Alphabet_X, Alphabet_Y)
# Re-shape H, X and Y, so that we may handle multi-dimensional arrays
# equivalently and iterate across 0th axis
X = np.reshape(X, (-1, X.shape[-1]))
Y = np.reshape(Y, (-1, Y.shape[-1]))
Alphabet_X = np.reshape(Alphabet_X, (-1, Alphabet_X.shape[-1]))
Alphabet_Y = np.reshape(Alphabet_Y, (-1, Alphabet_Y.shape[-1]))
orig_shape_H = H.shape
H = np.reshape(H, (-1, 1))
_verify_alphabet_sufficiently_large(X, Alphabet_X, fill_value)
_verify_alphabet_sufficiently_large(Y, Alphabet_Y, fill_value)
# NB: Observations are not considered jointly, thus elements in each row
# are sorted independently
X = np.sort(X, axis=1)
Y = np.sort(Y, axis=1)
# Compute symbol run-lengths
# Compute symbol change indicators
B = X[:, 1:] != X[:, :-1]
C = Y[:, 1:] != Y[:, :-1]
for i in range(X.shape[0]):
# Obtain symbol change positions
I = np.append(np.where(B[i]), X.shape[1]-1)
# Compute run lengths
L = np.diff(np.append(-1, I))
alphabet_X = X[i, I]
if estimator != 'ML':
L, alphabet_X = _append_empty_bins_using_alphabet(L, alphabet_X,
Alphabet_X[i],
fill_value)
L, alphabet_X = _remove_counts_at_fill_value(L, alphabet_X, fill_value)
if not np.any(L):
continue
P1, _ = _estimate_probabilities(L, estimator)
# Obtain symbol change positions
J = np.append(np.where(C[i]), Y.shape[1]-1)
# Compute run lengths
L = np.diff(np.append(-1, J))
alphabet_Y = Y[i, J]
if estimator != 'ML':
L, alphabet_Y = _append_empty_bins_using_alphabet(L, alphabet_Y,
Alphabet_Y[i],
fill_value)
L, alphabet_Y = _remove_counts_at_fill_value(L, alphabet_Y, fill_value)
if not np.any(L):
continue
P2, _ = _estimate_probabilities(L, estimator)
# Merge probability distributions, so that common symbols have common
# array location
Alphabet = np.union1d(alphabet_X, alphabet_Y)
P = np.zeros_like(Alphabet, dtype=P1.dtype)
Q = np.zeros_like(Alphabet, dtype=P2.dtype)
P[np.in1d(Alphabet, alphabet_X, assume_unique=True)] = P1
Q[np.in1d(Alphabet, alphabet_Y, assume_unique=True)] = P2
H[i] = entropy_cross_pmf(P, Q, False, base)
# Reverse re-shaping
H = np.reshape(H, orig_shape_H)
if keep_dims and not cartesian_product:
H = H[..., np.newaxis]
return H
def divergence_kullbackleibler(X, Y=None, cartesian_product=False, base=2,
fill_value=-1, estimator='ML', Alphabet_X=None,
Alphabet_Y=None, keep_dims=False):
"""
Returns the Kullback-Leibler divergence (see e.g. [CoTh06]) between arrays
X and Y, each containing discrete random variable realisations.
**Mathematical definition**:
Denoting with :math:`P_X(x)`, :math:`P_Y(x)` respectively the probability
of observing an outcome :math:`x` with discrete random variables :math:`X`,
:math:`Y`, the Kullback-Leibler divergence
:math:`D_{\\mathrm{KL}}(P_X\\parallel P_Y)` is defined as:
.. math::
D_{\\mathrm{KL}}(P_X \\parallel P_Y) =
-\\sum_x {P_X(x) \\log {\\frac{P_Y(x)}{P_X(x)}}}.
**Estimation**:
Kullback-Leibler divergence is estimated based on frequency tables, using
the following functions:
entropy_cross()
entropy()
See below for a list of available estimators. Note that although
Kullback-Leibler divergence is a non-negative quantity, depending on the
chosen estimator the obtained estimate may be negative.
**Parameters**:
X,Y : numpy array (or array-like object such as a list of immutables, as \
accepted by np.array())
*cartesian_product==False and Y is not None*: X and Y are arrays
containing discrete random variable realisations, with
X.shape[:-1]==Y.shape[:-1]. Successive realisations of a random
variable are indexed by the last axis in the respective arrays;
multiple random variables in X and Y may be specified using preceding
axes of the respective arrays (random variables are paired
**one-to-one** between X and Y). When X.ndim==Y.ndim==1, returns a
scalar. When X.ndim>1 and Y.ndim>1, returns an array of estimated
divergence values with dimensions X.shape[:-1]. Neither X nor Y may
contain (floating point) NaN values. Missing data may be specified
using numpy masked arrays, as well as using standard numpy
array/array-like objects; see below for details.
*cartesian_product==True and Y is not None*: X and Y are arrays
containing discrete random variable realisations. Successive
realisations of a random variable are indexed by the last axis in the
respective arrays; multiple random variables in X and Y may be
specified using preceding axes of the respective arrays (random
variables are paired **many-to-many** between X and Y). When
X.ndim==Y.ndim==1, returns a scalar. When X.ndim>1 or Y.ndim>1, returns
an array of estimated divergence values with dimensions
np.append(X.shape[:-1],Y.shape[:-1]). Neither X nor Y may contain
(floating point) NaN values. Missing data may be specified using numpy
masked arrays, as well as using standard numpy array/array-like
objects; see below for details.
*Y is None*: Equivalent to divergence_kullbackleibler(X, X, ... ).
Thus, a shorthand syntax for computing Kullback-Leibler divergence (in
bits) between all pairs of random variables in X is
divergence_kullbackleibler(X).
cartesian_product : boolean
Indicates whether random variables are paired **one-to-one** between X
and Y (cartesian_product==False, the default value) or **many-to-many**
between X and Y (cartesian_product==True).
base : float
The desired logarithmic base (default 2).
fill_value : object
It is possible to specify missing data using numpy masked arrays,
pandas Series/DataFrames, as well as using standard numpy
array/array-like objects with assigned placeholder values. When using
numpy masked arrays, this function invokes np.ma.filled() internally,
so that missing data are represented with the array's object-internal
placeholder value fill_value (this function's fill_value parameter is
ignored in such cases). When using pandas Series/DataFrames, an initial
conversion to a numpy masked array is performed. When using standard
numpy array/array-like objects, this function's fill_value parameter is
used to specify the placeholder value for missing data (defaults to
-1).
Data equal to the placeholder value are subsequently ignored.
estimator : str or float
The desired estimator (see above for details on estimators). Possible
values are:
*'ML' (the default value)* : Maximum likelihood estimator.
*any floating point value* : Maximum a posteriori estimator using
Dirichlet prior (equivalent to maximum likelihood with pseudo-count
for each outcome as specified).
*PERKS* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to 1/L, where L is the number of possible outcomes).
*MINIMAX* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to sqrt(N)/L, where N is the total number of
realisations and where L is the number of possible outcomes).
*JAMES-STEIN* : James-Stein estimator [HaSt09].
*GOOD-TURING* : Good-Turing estimator [GaSa95].
Alphabet_X, Alphabet_Y : numpy array (or array-like object such as a list \
of immutables, as accepted by np.array())
Respectively an array specifying the alphabet/alphabets of possible
outcomes that random variable realisations in array X, Y may assume.
Defaults to None, in which case the alphabet/alphabets of possible
outcomes is/are implicitly based on the observed outcomes in array X, Y
respectively, with no additional, unobserved outcomes. In combination
with any estimator other than maximum likelihood, it may be useful to
specify alphabets including unobserved outcomes. For such cases,
successive possible outcomes of a random variable are indexed by the
last axis in Alphabet_X, Alphabet_Y respectively; multiple alphabets
may be specified using preceding axes, with the requirement
X.shape[:-1]==Alphabet_X.shape[:-1] (analogously for Y). Alphabets of
different sizes may be specified either using numpy masked arrays, or
by padding with the chosen placeholder fill_value.
keep_dims : boolean
When set to True and cartesian_product==False an additional dimension
of length one is appended to the returned array, facilitating any
broadcast operations required by the user (defaults to False). Has no
effect when cartesian_product==True.
**Implementation notes**:
Before estimation, outcomes are mapped to the set of non-negative integers
internally, with the value -1 representing missing data. To avoid this
internal conversion step, supply integer data and use the default fill
value -1.
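**Example**:
A minimal usage sketch (the data and the value quoted in the comment are
illustrative assumptions, using the default maximum likelihood estimator
and base 2):
>>> import numpy as np
>>> X = np.array([0, 0, 1, 1])  # estimated distribution (0.5, 0.5)
>>> Y = np.array([0, 1, 1, 1])  # estimated distribution (0.25, 0.75)
>>> divergence_kullbackleibler(X, Y)  # approx. 0.2075 bits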
"""
H_cross = entropy_cross(X, Y, cartesian_product, base, fill_value,
estimator, Alphabet_X, Alphabet_Y)
H = entropy(X, base, fill_value, estimator, Alphabet_X)
H = np.reshape(H, np.append(H.shape,
np.ones(H_cross.ndim-H.ndim)).astype('int'))
H = H_cross - H
if keep_dims and not cartesian_product:
H = H[..., np.newaxis]
return H
def divergence_jensenshannon(X, Y=None, cartesian_product=False, base=2,
fill_value=-1, estimator='ML', Alphabet_X=None,
Alphabet_Y=None, keep_dims=False):
"""
Returns the Jensen-Shannon divergence [Lin91] between arrays X and Y, each
containing discrete random variable realisations.
**Mathematical definition**:
Denoting with :math:`P_X`, :math:`P_Y` respectively probability
distributions with common domain, associated with discrete random variables
:math:`X`, :math:`Y`, the Jensen-Shannon divergence
:math:`D_{\\mathrm{JS}}(P_X \\parallel P_Y)` is defined as:
.. math::
D_{\\mathrm{JS}}(P_X \\parallel P_Y) =
\\frac{1}{2} D_{\\mathrm{KL}}(P_X \\parallel M) +
\\frac{1}{2} D_{\\mathrm{KL}}(P_Y \\parallel M)
where :math:`M = \\frac{1}{2}(P_X + P_Y)` and where
:math:`D_{\\mathrm{KL}}(\\cdot \\parallel \\cdot)` denotes the
Kullback-Leibler divergence.
**Estimation**:
Jensen-Shannon divergence is estimated based on frequency tables. See below
for a list of available estimators.
**Parameters**:
X,Y : numpy array (or array-like object such as a list of immutables, as \
accepted by np.array())
*cartesian_product==False and Y is not None*: X and Y are arrays
containing discrete random variable realisations, with
X.shape[:-1]==Y.shape[:-1]. Successive realisations of a random
variable are indexed by the last axis in the respective arrays;
multiple random variables in X and Y may be specified using preceding
axes of the respective arrays (random variables are paired
**one-to-one** between X and Y). When X.ndim==Y.ndim==1, returns a
scalar. When X.ndim>1 and Y.ndim>1, returns an array of estimated
divergence values with dimensions X.shape[:-1]. Neither X nor Y may
contain (floating point) NaN values. Missing data may be specified
using numpy masked arrays, as well as using standard numpy
array/array-like objects; see below for details.
*cartesian_product==True and Y is not None*: X and Y are arrays
containing discrete random variable realisations. Successive
realisations of a random variable are indexed by the last axis in the
respective arrays; multiple random variables in X and Y may be
specified using preceding axes of the respective arrays (random
variables are paired **many-to-many** between X and Y). When
X.ndim==Y.ndim==1, returns a scalar. When X.ndim>1 or Y.ndim>1, returns
an array of estimated divergence values with dimensions
np.append(X.shape[:-1],Y.shape[:-1]). Neither X nor Y may contain
(floating point) NaN values. Missing data may be specified using numpy
masked arrays, as well as using standard numpy array/array-like
objects; see below for details.
*Y is None*: Equivalent to divergence_jensenshannon(X, X, ... ). Thus,
a shorthand syntax for computing Jensen-Shannon divergence (in bits)
between all pairs of random variables in X is
divergence_jensenshannon(X).
cartesian_product : boolean
Indicates whether random variables are paired **one-to-one** between X
and Y (cartesian_product==False, the default value) or **many-to-many**
between X and Y (cartesian_product==True).
base : float
The desired logarithmic base (default 2).
fill_value : object
It is possible to specify missing data using numpy masked arrays,
pandas Series/DataFrames, as well as using standard numpy
array/array-like objects with assigned placeholder values. When using
numpy masked arrays, this function invokes np.ma.filled() internally,
so that missing data are represented with the array's object-internal
placeholder value fill_value (this function's fill_value parameter is
ignored in such cases). When using pandas Series/DataFrames, an initial
conversion to a numpy masked array is performed. When using standard
numpy array/array-like objects, this function's fill_value parameter is
used to specify the placeholder value for missing data (defaults to
-1).
Data equal to the placeholder value are subsequently ignored.
estimator : str or float
The desired estimator (see above for details on estimators). Possible
values are:
*'ML' (the default value)* : Maximum likelihood estimator.
*any floating point value* : Maximum a posteriori estimator using
Dirichlet prior (equivalent to maximum likelihood with pseudo-count
for each outcome as specified).
*PERKS* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to 1/L, where L is the number of possible outcomes).
*MINIMAX* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to sqrt(N)/L, where N is the total number of
realisations and where L is the number of possible outcomes).
*JAMES-STEIN* : James-Stein estimator [HaSt09].
*GOOD-TURING* : Good-Turing estimator [GaSa95].
Alphabet_X, Alphabet_Y : numpy array (or array-like object such as a list \
of immutables, as accepted by np.array())
Respectively an array specifying the alphabet/alphabets of possible
outcomes that random variable realisations in array X, Y may assume.
Defaults to None, in which case the alphabet/alphabets of possible
outcomes is/are implicitly based on the observed outcomes in array X, Y
respectively, with no additional, unobserved outcomes. In combination
with any estimator other than maximum likelihood, it may be useful to
specify alphabets including unobserved outcomes. For such cases,
successive possible outcomes of a random variable are indexed by the
last axis in Alphabet_X, Alphabet_Y respectively; multiple alphabets
may be specified using preceding axes, with the requirement
X.shape[:-1]==Alphabet_X.shape[:-1] (analogously for Y). Alphabets of
different sizes may be specified either using numpy masked arrays, or
by padding with the chosen placeholder fill_value.
keep_dims : boolean
When set to True and cartesian_product==False an additional dimension
of length one is appended to the returned array, facilitating any
broadcast operations required by the user (defaults to False). Has no
effect when cartesian_product==True.
**Implementation notes**:
Before estimation, outcomes are mapped to the set of non-negative integers
internally, with the value -1 representing missing data. To avoid this
internal conversion step, supply integer data and use the default fill
value -1.
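**Example**:
A minimal usage sketch (illustrative data; the value quoted in the
comment assumes the default maximum likelihood estimator and base 2):
>>> import numpy as np
>>> X = np.array([0, 0, 1, 1])
>>> Y = np.array([0, 1, 1, 1])
>>> divergence_jensenshannon(X, Y)  # approx. 0.0488 bits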
"""
if Y is None:
Y = X
cartesian_product = True
Alphabet_Y = Alphabet_X
X, fill_value_X = _sanitise_array_input(X, fill_value)
Y, fill_value_Y = _sanitise_array_input(Y, fill_value)
if Alphabet_X is not None:
Alphabet_X, fill_value_Alphabet_X = _sanitise_array_input(Alphabet_X,
fill_value)
Alphabet_X, _ = _autocreate_alphabet(Alphabet_X, fill_value_Alphabet_X)
else:
Alphabet_X, fill_value_Alphabet_X = _autocreate_alphabet(X,
fill_value_X)
if Alphabet_Y is not None:
Alphabet_Y, fill_value_Alphabet_Y = _sanitise_array_input(Alphabet_Y,
fill_value)
Alphabet_Y, _ = _autocreate_alphabet(Alphabet_Y, fill_value_Alphabet_Y)
else:
Alphabet_Y, fill_value_Alphabet_Y = _autocreate_alphabet(Y,
fill_value_Y)
if X.size == 0:
raise ValueError("arg X contains no elements")
if Y.size == 0:
raise ValueError("arg Y contains no elements")
if np.any(_isnan(X)):
raise ValueError("arg X contains NaN values")
if np.any(_isnan(Y)):
raise ValueError("arg Y contains NaN values")
if Alphabet_X.size == 0:
raise ValueError("arg Alphabet_X contains no elements")
if np.any(_isnan(Alphabet_X)):
raise ValueError("arg Alphabet_X contains NaN values")
if Alphabet_Y.size == 0:
raise ValueError("arg Alphabet_Y contains no elements")
if np.any(_isnan(Alphabet_Y)):
raise ValueError("arg Alphabet_Y contains NaN values")
if _isnan(fill_value_X):
raise ValueError("fill value for arg X is NaN")
if _isnan(fill_value_Y):
raise ValueError("fill value for arg Y is NaN")
if X.shape[:-1] != Alphabet_X.shape[:-1]:
raise ValueError("leading dimensions of args X and Alphabet_X do not "
"match")
if Y.shape[:-1] != Alphabet_Y.shape[:-1]:
raise ValueError("leading dimensions of args Y and Alphabet_Y do not "
"match")
if not cartesian_product and X.shape[:-1] != Y.shape[:-1]:
raise ValueError("dimensions of args X and Y do not match")
if not (np.isscalar(base) and np.isreal(base) and base > 0):
raise ValueError("arg base not a positive real-valued scalar")
S, fill_value = _map_observations_to_integers((X, Alphabet_X,
Y, Alphabet_Y),
(fill_value_X,
fill_value_Alphabet_X,
fill_value_Y,
fill_value_Alphabet_Y))
X, Alphabet_X, Y, Alphabet_Y = S
if not cartesian_product:
H = np.empty(X.shape[:-1])
if H.ndim > 0:
H[:] = np.NaN
else:
H = np.float64(np.NaN)
else:
def f(X, Y, Alphabet_X, Alphabet_Y):
return divergence_jensenshannon(X, Y, False, base, fill_value,
estimator, Alphabet_X, Alphabet_Y)
return _cartesian_product_apply(X, Y, f, Alphabet_X, Alphabet_Y)
# Re-shape H, X and Y, so that we may handle multi-dimensional arrays
# equivalently and iterate across 0th axis
X = np.reshape(X, (-1, X.shape[-1]))
Y = np.reshape(Y, (-1, Y.shape[-1]))
Alphabet_X = np.reshape(Alphabet_X, (-1, Alphabet_X.shape[-1]))
Alphabet_Y = np.reshape(Alphabet_Y, (-1, Alphabet_Y.shape[-1]))
orig_shape_H = H.shape
H = np.reshape(H, (-1, 1))
_verify_alphabet_sufficiently_large(X, Alphabet_X, fill_value)
_verify_alphabet_sufficiently_large(Y, Alphabet_Y, fill_value)
# NB: Observations are not considered jointly, thus elements in each row
# are sorted independently
X = np.sort(X, axis=1)
Y = np.sort(Y, axis=1)
# Compute symbol run-lengths
# Compute symbol change indicators
B = X[:, 1:] != X[:, :-1]
C = Y[:, 1:] != Y[:, :-1]
for i in range(X.shape[0]):
# Obtain symbol change positions
I = np.append(np.where(B[i]), X.shape[1]-1)
# Compute run lengths
L = np.diff(np.append(-1, I))
alphabet_X = X[i, I]
if estimator != 'ML':
L, alphabet_X = _append_empty_bins_using_alphabet(L, alphabet_X,
Alphabet_X[i],
fill_value)
L, alphabet_X = _remove_counts_at_fill_value(L, alphabet_X, fill_value)
if not np.any(L):
continue
P1, _ = _estimate_probabilities(L, estimator)
# Obtain symbol change positions
J = np.append(np.where(C[i]), Y.shape[1]-1)
# Compute run lengths
L = np.diff(np.append(-1, J))
alphabet_Y = Y[i, J]
if estimator != 'ML':
L, alphabet_Y = _append_empty_bins_using_alphabet(L, alphabet_Y,
Alphabet_Y[i],
fill_value)
L, alphabet_Y = _remove_counts_at_fill_value(L, alphabet_Y, fill_value)
if not np.any(L):
continue
P2, _ = _estimate_probabilities(L, estimator)
# Merge probability distributions, so that common symbols have common
# array location
Alphabet = np.union1d(alphabet_X, alphabet_Y)
P = np.zeros_like(Alphabet, dtype=P1.dtype)
Q = np.zeros_like(Alphabet, dtype=P2.dtype)
P[np.in1d(Alphabet, alphabet_X, assume_unique=True)] = P1
Q[np.in1d(Alphabet, alphabet_Y, assume_unique=True)] = P2
H[i] = entropy_pmf(0.5*P + 0.5*Q, base) - \
0.5*entropy_pmf(P, base) - 0.5*entropy_pmf(Q, base)
# Reverse re-shaping
H = np.reshape(H, orig_shape_H)
if keep_dims and not cartesian_product:
H = H[..., np.newaxis]
return H
def divergence_kullbackleibler_symmetrised(X, Y=None, cartesian_product=False,
base=2, fill_value=-1,
estimator='ML', Alphabet_X=None,
Alphabet_Y=None, keep_dims=False):
"""
Returns the symmetrised Kullback-Leibler divergence [Lin91] between arrays
X and Y, each containing discrete random variable realisations.
**Mathematical definition**:
Denoting with :math:`P_X`, :math:`P_Y` respectively probability
distributions with common domain, associated with discrete random variables
:math:`X`, :math:`Y`, the symmetrised Kullback-Leibler divergence
:math:`D_{\\mathrm{SKL}}(P_X \\parallel P_Y)` is defined as:
.. math::
D_{\\mathrm{SKL}}(P_X \\parallel P_Y) =
D_{\\mathrm{KL}}(P_X \\parallel P_Y) +
D_{\\mathrm{KL}}(P_Y \\parallel P_X)
where :math:`D_{\\mathrm{KL}}(\\cdot \\parallel \\cdot)` denotes the
Kullback-Leibler divergence.
**Estimation**:
Symmetrised Kullback-Leibler divergence is estimated based on frequency
tables, using the following functions:
entropy_cross()
entropy()
See below for a list of available estimators. Note that although
symmetrised Kullback-Leibler divergence is a non-negative quantity,
depending on the chosen estimator the obtained estimate may be negative.
**Parameters**:
X,Y : numpy array (or array-like object such as a list of immutables, as \
accepted by np.array())
*cartesian_product==False and Y is not None*: X and Y are arrays
containing discrete random variable realisations, with
X.shape[:-1]==Y.shape[:-1]. Successive realisations of a random
variable are indexed by the last axis in the respective arrays;
multiple random variables in X and Y may be specified using preceding
axes of the respective arrays (random variables are paired
**one-to-one** between X and Y). When X.ndim==Y.ndim==1, returns a
scalar. When X.ndim>1 and Y.ndim>1, returns an array of estimated
divergence values with dimensions X.shape[:-1]. Neither X nor Y may
contain (floating point) NaN values. Missing data may be specified
using numpy masked arrays, as well as using standard numpy
array/array-like objects; see below for details.
*cartesian_product==True and Y is not None*: X and Y are arrays
containing discrete random variable realisations. Successive
realisations of a random variable are indexed by the last axis in the
respective arrays; multiple random variables in X and Y may be
specified using preceding axes of the respective arrays (random
variables are paired **many-to-many** between X and Y). When
X.ndim==Y.ndim==1, returns a scalar. When X.ndim>1 or Y.ndim>1, returns
an array of estimated divergence values with dimensions
np.append(X.shape[:-1],Y.shape[:-1]). Neither X nor Y may contain
(floating point) NaN values. Missing data may be specified using numpy
masked arrays, as well as using standard numpy array/array-like
objects; see below for details.
*Y is None*: Equivalent to divergence_kullbackleibler_symmetrised(X, X,
... ). Thus, a shorthand syntax for computing symmetrised
Kullback-Leibler divergence (in bits) between all pairs of random
variables in X is divergence_kullbackleibler_symmetrised(X).
cartesian_product : boolean
Indicates whether random variables are paired **one-to-one** between X
and Y (cartesian_product==False, the default value) or **many-to-many**
between X and Y (cartesian_product==True).
base : float
The desired logarithmic base (default 2).
fill_value : object
It is possible to specify missing data using numpy masked arrays,
pandas Series/DataFrames, as well as using standard numpy
array/array-like objects with assigned placeholder values. When using
numpy masked arrays, this function invokes np.ma.filled() internally,
so that missing data are represented with the array's object-internal
placeholder value fill_value (this function's fill_value parameter is
ignored in such cases). When using pandas Series/DataFrames, an initial
conversion to a numpy masked array is performed. When using standard
numpy array/array-like objects, this function's fill_value parameter is
used to specify the placeholder value for missing data (defaults to
-1).
Data equal to the placeholder value are subsequently ignored.
estimator : str or float
The desired estimator (see above for details on estimators). Possible
values are:
*'ML' (the default value)* : Maximum likelihood estimator.
*any floating point value* : Maximum a posteriori estimator using
Dirichlet prior (equivalent to maximum likelihood with pseudo-count
for each outcome as specified).
*PERKS* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to 1/L, where L is the number of possible outcomes).
*MINIMAX* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to sqrt(N)/L, where N is the total number of
realisations and where L is the number of possible outcomes).
*JAMES-STEIN* : James-Stein estimator [HaSt09].
*GOOD-TURING* : Good-Turing estimator [GaSa95].
Alphabet_X, Alphabet_Y : numpy array (or array-like object such as a list \
of immutables, as accepted by np.array())
Respectively an array specifying the alphabet/alphabets of possible
outcomes that random variable realisations in array X, Y may assume.
Defaults to None, in which case the alphabet/alphabets of possible
outcomes is/are implicitly based on the observed outcomes in array X, Y
respectively, with no additional, unobserved outcomes. In combination
with any estimator other than maximum likelihood, it may be useful to
specify alphabets including unobserved outcomes. For such cases,
successive possible outcomes of a random variable are indexed by the
last axis in Alphabet_X, Alphabet_Y respectively; multiple alphabets
may be specified using preceding axes, with the requirement
X.shape[:-1]==Alphabet_X.shape[:-1] (analogously for Y). Alphabets of
different sizes may be specified either using numpy masked arrays, or
by padding with the chosen placeholder fill_value.
keep_dims : boolean
When set to True and cartesian_product==False an additional dimension
of length one is appended to the returned array, facilitating any
broadcast operations required by the user (defaults to False). Has no
effect when cartesian_product==True.
**Implementation notes**:
Before estimation, outcomes are mapped to the set of non-negative integers
internally, with the value -1 representing missing data. To avoid this
internal conversion step, supply integer data and use the default fill
value -1.
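**Example**:
A minimal usage sketch (illustrative data; values quoted in the comment
assume the default maximum likelihood estimator and base 2). The result
is the sum of the two directed Kullback-Leibler divergences:
>>> import numpy as np
>>> X = np.array([0, 0, 1, 1])
>>> Y = np.array([0, 1, 1, 1])
>>> divergence_kullbackleibler_symmetrised(X, Y)  # approx. 0.2075 + 0.1887 = 0.3962 bits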
"""
if Y is None:
Y = X
cartesian_product = True
Alphabet_Y = Alphabet_X
H1 = divergence_kullbackleibler(X, Y, cartesian_product, base, fill_value,
estimator, Alphabet_X, Alphabet_Y)
H2 = divergence_kullbackleibler(Y, X, cartesian_product, base, fill_value,
estimator, Alphabet_Y, Alphabet_X)
if cartesian_product:
H2 = H2.T
H = H1 + H2
if keep_dims and not cartesian_product:
H = H[..., np.newaxis]
return H
def entropy_conditional(X, Y=None, cartesian_product=False, base=2,
fill_value=-1, estimator='ML', Alphabet_X=None,
Alphabet_Y=None, keep_dims=False):
"""
Returns the conditional entropy (see e.g. [CoTh06]) between arrays X and Y,
each containing discrete random variable realisations.
**Mathematical definition**:
Given discrete random variables :math:`X`, :math:`Y`, the conditional
entropy :math:`H(X|Y)` is defined as:
.. math::
H(X|Y) = H(X,Y) - H(Y)
where :math:`H(\\cdot,\\cdot)` denotes the joint entropy and where
:math:`H(\\cdot)` denotes the entropy.
**Estimation**:
Conditional entropy is estimated based on frequency tables, using the
following functions:
entropy_joint()
entropy()
See below for a list of available estimators. Note that although
conditional entropy is a non-negative quantity, depending on the chosen
estimator the obtained estimate may be negative.
**Parameters**:
X,Y : numpy array (or array-like object such as a list of immutables, as \
accepted by np.array())
*cartesian_product==False and Y is not None*: X and Y are arrays
containing discrete random variable realisations, with
X.shape==Y.shape. Successive realisations of a random variable are
indexed by the last axis in the respective arrays; multiple random
variables in X and Y may be specified using preceding axes of the
respective arrays (random variables are paired **one-to-one** between X
and Y). When X.ndim==Y.ndim==1, returns a scalar. When X.ndim>1 and
Y.ndim>1, returns an array of estimated conditional entropies with
dimensions X.shape[:-1]. Neither X nor Y may contain (floating point)
NaN values. Missing data may be specified using numpy masked arrays, as
well as using standard numpy array/array-like objects; see below
for details.
*cartesian_product==True and Y is not None*: X and Y are arrays
containing discrete random variable realisations, with
X.shape[-1]==Y.shape[-1]. Successive realisations of a random variable
are indexed by the last axis in the respective arrays; multiple random
variables in X and Y may be specified using preceding axes of the
respective arrays (random variables are paired **many-to-many** between
X and Y). When X.ndim==Y.ndim==1, returns a scalar. When X.ndim>1 or
Y.ndim>1, returns an array of estimated conditional entropies with
dimensions np.append(X.shape[:-1],Y.shape[:-1]). Neither X nor Y may
contain (floating point) NaN values. Missing data may be specified
using numpy masked arrays, as well as using standard numpy
array/array-like objects; see below for details.
*Y is None*: Equivalent to entropy_conditional(X, X, ... ). Thus, a
shorthand syntax for computing conditional entropies (in bits) between
all pairs of random variables in X is entropy_conditional(X).
cartesian_product : boolean
Indicates whether random variables are paired **one-to-one** between X
and Y (cartesian_product==False, the default value) or **many-to-many**
between X and Y (cartesian_product==True).
base : float
The desired logarithmic base (default 2).
fill_value : object
It is possible to specify missing data using numpy masked arrays,
pandas Series/DataFrames, as well as using standard numpy
array/array-like objects with assigned placeholder values. When using
numpy masked arrays, this function invokes np.ma.filled() internally,
so that missing data are represented with the array's object-internal
placeholder value fill_value (this function's fill_value parameter is
ignored in such cases). When using pandas Series/DataFrames, an initial
conversion to a numpy masked array is performed. When using standard
numpy array/array-like objects, this function's fill_value parameter is
used to specify the placeholder value for missing data (defaults to
-1).
Data equal to the placeholder value are subsequently ignored.
estimator : str or float
The desired estimator (see above for details on estimators). Possible
values are:
*'ML' (the default value)* : Maximum likelihood estimator.
*any floating point value* : Maximum a posteriori estimator using
Dirichlet prior (equivalent to maximum likelihood with pseudo-count
for each outcome as specified).
*PERKS* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to 1/L, where L is the number of possible outcomes).
*MINIMAX* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to sqrt(N)/L, where N is the total number of
realisations and where L is the number of possible outcomes).
*JAMES-STEIN* : James-Stein estimator [HaSt09].
*GOOD-TURING* : Good-Turing estimator [GaSa95].
Alphabet_X, Alphabet_Y : numpy array (or array-like object such as a list \
of immutables, as accepted by np.array())
Respectively an array specifying the alphabet/alphabets of possible
outcomes that random variable realisations in array X, Y may assume.
Defaults to None, in which case the alphabet/alphabets of possible
outcomes is/are implicitly based on the observed outcomes in array X, Y
respectively, with no additional, unobserved outcomes. In combination
with any estimator other than maximum likelihood, it may be useful to
specify alphabets including unobserved outcomes. For such cases,
successive possible outcomes of a random variable are indexed by the
last axis in Alphabet_X, Alphabet_Y respectively; multiple alphabets
may be specified using preceding axes, with the requirement
X.shape[:-1]==Alphabet_X.shape[:-1] (analogously for Y). Alphabets of
different sizes may be specified either using numpy masked arrays, or
by padding with the chosen placeholder fill_value.
NB: When specifying alphabets, an alphabet of possible joint outcomes
is always implicit from the alphabets of possible (marginal) outcomes
in Alphabet_X, Alphabet_Y. For example, specifying
Alphabet_X=Alphabet_Y=np.array((1,2)) implies an alphabet of possible
joint outcomes np.array(((1,1,2,2),(1,2,1,2))).
keep_dims : boolean
When set to True and cartesian_product==False an additional dimension
of length one is appended to the returned array, facilitating any
broadcast operations required by the user (defaults to False). Has no
effect when cartesian_product==True.
**Implementation notes**:
Before estimation, outcomes are mapped to the set of non-negative integers
internally, with the value -1 representing missing data. To avoid this
internal conversion step, supply integer data and use the default fill
value -1.
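**Example**:
A minimal usage sketch (illustrative data; the value quoted in the
comment assumes the default maximum likelihood estimator and base 2):
>>> import numpy as np
>>> X = np.array([0, 0, 1, 1])
>>> Y = np.array([0, 1, 0, 1])
>>> entropy_conditional(X, Y)  # 1.0 bit, since H(X,Y) = 2 bits and H(Y) = 1 bit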
"""
# TODO Add note in documentation (for other functions where appropriate)
# about creating joint observations using appropriate function
if Y is None:
Y = X
cartesian_product = True
Alphabet_Y = Alphabet_X
X, fill_value_X = _sanitise_array_input(X, fill_value)
Y, fill_value_Y = _sanitise_array_input(Y, fill_value)
if Alphabet_X is not None:
Alphabet_X, fill_value_Alphabet_X = _sanitise_array_input(Alphabet_X,
fill_value)
Alphabet_X, _ = _autocreate_alphabet(Alphabet_X, fill_value_Alphabet_X)
else:
Alphabet_X, fill_value_Alphabet_X = _autocreate_alphabet(X,
fill_value_X)
if Alphabet_Y is not None:
Alphabet_Y, fill_value_Alphabet_Y = _sanitise_array_input(Alphabet_Y,
fill_value)
Alphabet_Y, _ = _autocreate_alphabet(Alphabet_Y, fill_value_Alphabet_Y)
else:
Alphabet_Y, fill_value_Alphabet_Y = _autocreate_alphabet(Y,
fill_value_Y)
if X.size == 0:
raise ValueError("arg X contains no elements")
if Y.size == 0:
raise ValueError("arg Y contains no elements")
if np.any(_isnan(X)):
raise ValueError("arg X contains NaN values")
if np.any(_isnan(Y)):
raise ValueError("arg Y contains NaN values")
if Alphabet_X.size == 0:
raise ValueError("arg Alphabet_X contains no elements")
if np.any(_isnan(Alphabet_X)):
raise ValueError("arg Alphabet_X contains NaN values")
if Alphabet_Y.size == 0:
raise ValueError("arg Alphabet_Y contains no elements")
if np.any(_isnan(Alphabet_Y)):
raise ValueError("arg Alphabet_Y contains NaN values")
if _isnan(fill_value_X):
raise ValueError("fill value for arg X is NaN")
if _isnan(fill_value_Y):
raise ValueError("fill value for arg Y is NaN")
if X.shape[:-1] != Alphabet_X.shape[:-1]:
raise ValueError("leading dimensions of args X and Alphabet_X do not "
"match")
if Y.shape[:-1] != Alphabet_Y.shape[:-1]:
raise ValueError("leading dimensions of args Y and Alphabet_Y do not "
"match")
if not cartesian_product and X.shape != Y.shape:
raise ValueError("dimensions of args X and Y do not match")
if cartesian_product and X.shape[-1] != Y.shape[-1]:
raise ValueError("trailing dimensions of args X and Y do not match")
if not (np.isscalar(base) and np.isreal(base) and base > 0):
raise ValueError("arg base not a positive real-valued scalar")
S, fill_value = _map_observations_to_integers((X, Alphabet_X,
Y, Alphabet_Y),
(fill_value_X,
fill_value_Alphabet_X,
fill_value_Y,
fill_value_Alphabet_Y))
X, Alphabet_X, Y, Alphabet_Y = S
if not cartesian_product:
H = np.empty(X.shape[:-1])
if H.ndim > 0:
H[:] = np.NaN
else:
H = np.float64(np.NaN)
else:
def f(X, Y, Alphabet_X, Alphabet_Y):
return entropy_conditional(X, Y, False, base, fill_value,
estimator, Alphabet_X, Alphabet_Y)
return _cartesian_product_apply(X, Y, f, Alphabet_X, Alphabet_Y)
# Re-shape H, X and Y, so that we may handle multi-dimensional arrays
# equivalently and iterate across 0th axis
X = np.reshape(X, (-1, X.shape[-1]))
Y = np.reshape(Y, (-1, Y.shape[-1]))
Alphabet_X = np.reshape(Alphabet_X, (-1, Alphabet_X.shape[-1]))
Alphabet_Y = np.reshape(Alphabet_Y, (-1, Alphabet_Y.shape[-1]))
orig_shape_H = H.shape
H = np.reshape(H, (-1, 1))
for i in range(X.shape[0]):
H[i] = entropy_joint(np.vstack((X[i], Y[i])), base, fill_value,
estimator, _vstack_pad((Alphabet_X[i],
Alphabet_Y[i]),
fill_value)) - \
entropy(Y[i], base, fill_value, estimator, Alphabet_Y[i])
# Reverse re-shaping
H = np.reshape(H, orig_shape_H)
if keep_dims and not cartesian_product:
H = H[..., np.newaxis]
return H
def entropy_joint(X, base=2, fill_value=-1, estimator='ML', Alphabet_X=None,
keep_dims=False):
"""
Returns the estimated joint entropy (see e.g. [CoTh06]) for an array X
containing realisations of discrete random variables.
**Mathematical definition**:
Denoting with :math:`P(x_1, \\ldots, x_n)` the probability of jointly
observing outcomes :math:`(x_1, \\ldots, x_n)` of :math:`n` discrete random
variables :math:`(X_1, \\ldots, X_n)`, the joint entropy
:math:`H(X_1, \\ldots, X_n)` is defined as:
.. math::
H(X_1, \\ldots, X_n) = -\\sum_{x_1} \\ldots \\sum_{x_n}
{P(x_1, \\ldots, x_n ) \\log {P(x_1, \\ldots, x_n)}}.
**Estimation**:
Joint entropy is estimated based on frequency tables. See below for a list
of available estimators.
**Parameters**:
X : numpy array (or array-like object such as a list of immutables, as \
accepted by np.array())
An array containing discrete random variable realisations. Successive
realisations of a random variable are indexed by the last axis in the
array; multiple random variables may be specified using preceding axes.
When X.ndim==1, returns a scalar and is equivalent to entropy(). When
X.ndim>1, returns a scalar based on jointly considering all random
variables indexed in the array. X may not contain (floating point) NaN
values. Missing data may be specified using numpy masked arrays, as
well as using standard numpy array/array-like objects; see below
for details.
base : float
The desired logarithmic base (default 2).
fill_value : object
It is possible to specify missing data using numpy masked arrays,
pandas Series/DataFrames, as well as using standard numpy
array/array-like objects with assigned placeholder values. When using
numpy masked arrays, this function invokes np.ma.filled() internally,
so that missing data are represented with the array's object-internal
placeholder value fill_value (this function's fill_value parameter is
ignored in such cases). When using pandas Series/DataFrames, an initial
conversion to a numpy masked array is performed. When using standard
numpy array/array-like objects, this function's fill_value parameter is
used to specify the placeholder value for missing data (defaults to
-1).
Data equal to the placeholder value are subsequently ignored.
estimator : str or float
The desired estimator (see above for details on estimators). Possible
values are:
*'ML' (the default value)* : Maximum likelihood estimator.
*any floating point value* : Maximum a posteriori estimator using
Dirichlet prior (equivalent to maximum likelihood with pseudo-count
for each outcome as specified).
*PERKS* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to 1/L, where L is the number of possible outcomes).
*MINIMAX* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to sqrt(N)/L, where N is the total number of
realisations and where L is the number of possible outcomes).
*JAMES-STEIN* : James-Stein estimator [HaSt09].
*GOOD-TURING* : Good-Turing estimator [GaSa95].
Alphabet_X : numpy array (or array-like object such as a list of \
immutables, as accepted by np.array())
An array specifying the alphabet/alphabets of possible outcomes that
random variable realisations in array X may assume. Defaults to None,
in which case the alphabet/alphabets of possible outcomes is/are
implicitly based on the observed outcomes in array X, with no additional,
unobserved outcomes. In combination with any estimator other than
maximum likelihood, it may be useful to specify alphabets including
unobserved outcomes. For such cases, successive possible outcomes of a
random variable are indexed by the last axis in Alphabet_X; multiple
alphabets may be specified using preceding axes, with the requirement
X.shape[:-1]==Alphabet_X.shape[:-1]. Alphabets of different sizes may
be specified either using numpy masked arrays, or by padding with the
chosen placeholder fill_value.
NB: When specifying multiple alphabets, an alphabet of possible joint
outcomes is always implicit from the alphabets of possible (marginal)
outcomes in Alphabet_X. For example, specifying
Alphabet_X=np.array(((1,2),(1,2))) implies an alphabet of possible
joint outcomes np.array(((1,1,2,2),(1,2,1,2))).
keep_dims : boolean
When set to True, an additional dimension of length one is appended to
the returned array, facilitating any broadcast operations required by
the user (defaults to False).
**Implementation notes**:
Before estimation, outcomes are mapped to the set of non-negative integers
internally, with the value -1 representing missing data. To avoid this
internal conversion step, supply integer data and use the default fill
value -1.
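**Example**:
A minimal usage sketch (illustrative data; the value quoted in the
comment assumes the default maximum likelihood estimator and base 2).
Rows index the two random variables; columns index four joint
realisations:
>>> import numpy as np
>>> X = np.array([[0, 0, 1, 1],
...               [0, 1, 0, 1]])
>>> entropy_joint(X)  # 2.0 bits: the four observed joint outcomes are equiprobable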
"""
# TODO If we add joint observation function, we can reduce code duplication
# in this function.
# TODO NB: The joint observation function must honour missing data fill
# values.
X, fill_value_X = _sanitise_array_input(X, fill_value)
if Alphabet_X is not None:
Alphabet_X, fill_value_Alphabet_X = _sanitise_array_input(Alphabet_X,
fill_value)
Alphabet_X, _ = _autocreate_alphabet(Alphabet_X, fill_value_Alphabet_X)
else:
Alphabet_X, fill_value_Alphabet_X = _autocreate_alphabet(X,
fill_value_X)
if X.size == 0:
raise ValueError("arg X contains no elements")
if np.any(_isnan(X)):
raise ValueError("arg X contains NaN values")
if Alphabet_X.size == 0:
raise ValueError("arg Alphabet_X contains no elements")
if np.any(_isnan(Alphabet_X)):
raise ValueError("arg Alphabet_X contains NaN values")
if _isnan(fill_value_X):
raise ValueError("fill value for arg X is NaN")
if X.shape[:-1] != Alphabet_X.shape[:-1]:
raise ValueError("leading dimensions of args X and Alphabet_X do not "
"match")
if not (np.isscalar(base) and np.isreal(base) and base > 0):
raise ValueError("arg base not a positive real-valued scalar")
S, fill_value = _map_observations_to_integers((X, Alphabet_X),
(fill_value_X,
fill_value_Alphabet_X))
X, Alphabet_X = S
# Re-shape X, so that we may handle multi-dimensional arrays equivalently
# and iterate across 0th axis
X = np.reshape(X, (-1, X.shape[-1]))
Alphabet_X = np.reshape(Alphabet_X, (-1, Alphabet_X.shape[-1]))
_verify_alphabet_sufficiently_large(X, Alphabet_X, fill_value)
# Sort columns
for i in range(X.shape[0]):
X = X[:, X[i].argsort(kind='mergesort')]
# Compute symbol run-lengths
# Compute symbol change indicators
B = np.any(X[:, 1:] != X[:, :-1], axis=0)
# Obtain symbol change positions
I = np.append(np.where(B), X.shape[1]-1)
# Compute run lengths
L = np.diff(np.append(-1, I))
alphabet_X = X[:, I]
if estimator != 'ML':
n_additional_empty_bins = \
_determine_number_additional_empty_bins(L, alphabet_X, Alphabet_X,
fill_value)
else:
n_additional_empty_bins = 0
L, _ = _remove_counts_at_fill_value(L, alphabet_X, fill_value)
if not np.any(L):
return np.float64(np.NaN)
# P_0 is the probability mass assigned to each additional empty bin
P, P_0 = _estimate_probabilities(L, estimator, n_additional_empty_bins)
H_0 = n_additional_empty_bins * P_0 * \
-np.log2(P_0 + np.spacing(0)) / np.log2(base)
H = entropy_pmf(P, base, require_valid_pmf=False) + H_0
if keep_dims:
H = H[..., np.newaxis]
return H
def entropy(X, base=2, fill_value=-1, estimator='ML', Alphabet_X=None,
keep_dims=False):
"""
Returns the estimated entropy (see e.g. [CoTh06]) for an array X containing
realisations of a discrete random variable.
**Mathematical definition**:
Denoting with :math:`P(x)` the probability of observing outcome :math:`x`
of a discrete random variable :math:`X`, the entropy :math:`H(X)` is
defined as:
.. math::
H(X) = -\\sum_x {P(x) \\log {P(x)}}.
**Estimation**:
Entropy is estimated based on frequency tables. See below for a list of
available estimators.
**Parameters**:
X : numpy array (or array-like object such as a list of immutables, as \
accepted by np.array())
An array containing discrete random variable realisations. Successive
realisations of a random variable are indexed by the last axis in the
array; multiple random variables may be specified using preceding axes.
When X.ndim==1, returns a scalar. When X.ndim>1, returns an array of
estimated entropies with dimensions X.shape[:-1]. X may not contain
(floating point) NaN values. Missing data may be specified using numpy
masked arrays, as well as using standard numpy array/array-like
objects; see below for details.
base : float
The desired logarithmic base (default 2).
fill_value : object
It is possible to specify missing data using numpy masked arrays,
pandas Series/DataFrames, as well as using standard numpy
array/array-like objects with assigned placeholder values. When using
numpy masked arrays, this function invokes np.ma.filled() internally,
so that missing data are represented with the array's object-internal
placeholder value fill_value (this function's fill_value parameter is
ignored in such cases). When using pandas Series/DataFrames, an initial
conversion to a numpy masked array is performed. When using standard
numpy array/array-like objects, this function's fill_value parameter is
used to specify the placeholder value for missing data (defaults to
-1).
Data equal to the placeholder value are subsequently ignored.
estimator : str or float
The desired estimator (see above for details on estimators). Possible
values are:
*'ML' (the default value)* : Maximum likelihood estimator.
*any floating point value* : Maximum a posteriori estimator using
Dirichlet prior (equivalent to maximum likelihood with pseudo-count
for each outcome as specified).
*PERKS* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to 1/L, where L is the number of possible outcomes).
*MINIMAX* : Maximum a posteriori estimator using Dirichlet prior
(equivalent to maximum likelihood with pseudo-count for each
outcome set to sqrt(N)/L, where N is the total number of
realisations and where L is the number of possible outcomes).
*JAMES-STEIN* : James-Stein estimator [HaSt09].
*GOOD-TURING* : Good-Turing estimator [GaSa95].
Alphabet_X : numpy array (or array-like object such as a list of \
immutables, as accepted by np.array())
An array specifying the alphabet/alphabets of possible outcomes that
random variable realisations in array X may assume. Defaults to None,
in which case the alphabet/alphabets of possible outcomes is/are
implicitly based on the observed outcomes in array X, with no additional,
unobserved outcomes. In combination with any estimator other than
maximum likelihood, it may be useful to specify alphabets including
unobserved outcomes. For such cases, successive possible outcomes of a
random variable are indexed by the last axis in Alphabet_X; multiple
alphabets may be specified using preceding axes, with the requirement
X.shape[:-1]==Alphabet_X.shape[:-1]. Alphabets of different sizes may
be specified either using numpy masked arrays, or by padding with the
chosen placeholder fill_value.
keep_dims : boolean
When set to True, an additional dimension of length one is appended to
the returned array, facilitating any broadcast operations required by
the user (defaults to False).
**Implementation notes**:
Before estimation, outcomes are mapped to the set of non-negative integers
internally, with the value -1 representing missing data. To avoid this
internal conversion step, supply integer data and use the default fill
value -1.
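**Example**:
A minimal usage sketch (illustrative data; values quoted in the comments
assume the default maximum likelihood estimator and base 2):
>>> import numpy as np
>>> entropy(np.array([0, 0, 1, 1]))  # 1.0 bit: two equiprobable outcomes
>>> entropy(np.array([[0, 0, 1, 1], [0, 0, 0, 0]]))  # array([1.0, 0.0]), one entropy per row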
"""
# NB: We would be able to reduce code duplication by invoking
# entropy_cross(X,X). However, performance would likely be lower!
X, fill_value_X = _sanitise_array_input(X, fill_value)
if Alphabet_X is not None:
Alphabet_X, fill_value_Alphabet_X = _sanitise_array_input(Alphabet_X,
fill_value)
Alphabet_X, _ = _autocreate_alphabet(Alphabet_X, fill_value_Alphabet_X)
else:
Alphabet_X, fill_value_Alphabet_X = _autocreate_alphabet(X,
fill_value_X)
if X.size == 0:
raise ValueError("arg X contains no elements")
if np.any(_isnan(X)):
raise ValueError("arg X contains NaN values")
if Alphabet_X.size == 0:
raise ValueError("arg Alphabet_X contains no elements")
if np.any(_isnan(Alphabet_X)):
raise ValueError("arg Alphabet_X contains NaN values")
if _isnan(fill_value_X):
raise ValueError("fill value for arg X is NaN")
if X.shape[:-1] != Alphabet_X.shape[:-1]:
raise ValueError("leading dimensions of args X and Alphabet_X do not "
"match")
if not (np.isscalar(base) and np.isreal(base) and base > 0):
raise ValueError("arg base not a positive real-valued scalar")
S, fill_value = _map_observations_to_integers((X, Alphabet_X),
(fill_value_X,
fill_value_Alphabet_X))
X, Alphabet_X = S
H = np.empty(X.shape[:-1])
if H.ndim > 0:
H[:] = np.NaN
else:
H = np.float64(np.NaN)
# Re-shape H and X, so that we may handle multi-dimensional arrays
# equivalently and iterate across 0th axis
X = np.reshape(X, (-1, X.shape[-1]))
Alphabet_X = np.reshape(Alphabet_X, (-1, Alphabet_X.shape[-1]))
orig_shape_H = H.shape
H = np.reshape(H, (-1, 1))
_verify_alphabet_sufficiently_large(X, Alphabet_X, fill_value)
# NB: This is not joint entropy. Elements in each row are sorted
# independently
X = np.sort(X, axis=1)
# Compute symbol run-lengths
# Compute symbol change indicators
B = X[:, 1:] != X[:, :-1]
for i in range(X.shape[0]):
# Obtain symbol change positions
I = np.append(np.where(B[i]), X.shape[1]-1)
# Compute run lengths
L = np.diff(np.append(-1, I))
alphabet_X = X[i, I]
if estimator != 'ML':
n_additional_empty_bins = \
_determine_number_additional_empty_bins(L, alphabet_X,
Alphabet_X[i],
fill_value)
else:
n_additional_empty_bins = 0
L, _ = _remove_counts_at_fill_value(L, alphabet_X, fill_value)
if not np.any(L):
continue
# P_0 is the probability mass assigned to each additional empty bin
P, P_0 = _estimate_probabilities(L, estimator, n_additional_empty_bins)
H_0 = n_additional_empty_bins * P_0 * \
-np.log2(P_0 + np.spacing(0)) / np.log2(base)
H[i] = entropy_pmf(P, base, require_valid_pmf=False) + H_0
# Reverse re-shaping
H = np.reshape(H, orig_shape_H)
if keep_dims:
H = H[..., np.newaxis]
return H
def entropy_pmf(P, base=2, require_valid_pmf=True, keep_dims=False):
"""
Returns the entropy (see e.g. [CoTh06]) of an array P representing a
discrete probability distribution.
**Mathematical definition**:
Denoting with :math:`P(x)` the probability mass associated with observing
an outcome :math:`x` under distribution :math:`P`, the entropy :math:`H(P)`
is defined as:
.. math::
H(P) = -\\sum_x {P(x) \\log {P(x)}}.
**Parameters**:
P : numpy array (or array-like object such as a list of immutables, as \
accepted by np.array())
An array containing probability mass assignments. Probabilities in a
distribution are indexed by the last axis in the array; multiple
probability distributions may be specified using preceding axes. When
P.ndim==1, returns a scalar. When P.ndim>1, returns an array of
entropies with dimensions P.shape[:-1]. P may not contain (floating
point) NaN values.
base : float
The desired logarithmic base (default 2).
require_valid_pmf : boolean
When set to True (the default value), verifies that probability mass
assignments in each distribution sum to 1. When set to False, no such
test is performed, thus allowing incomplete probability distributions
to be processed.
keep_dims : boolean
When set to True, an additional dimension of length one is appended to
the returned array, facilitating any broadcast operations required by
the user (defaults to False).
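**Example**:
A minimal usage sketch (illustrative probability assignments; values
quoted in the comments assume base 2):
>>> import numpy as np
>>> entropy_pmf(np.array([0.5, 0.5]))  # 1.0 bit
>>> entropy_pmf(np.array([[0.5, 0.5], [0.25, 0.75]]))  # array([1.0, approx. 0.8113])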
"""
P, _ = _sanitise_array_input(P)
if P.size == 0:
raise ValueError("arg P contains no elements")
if np.any(_isnan(P)):
raise ValueError("arg P contains NaN values")
if np.any(np.logical_or(P < 0, P > 1)):
raise ValueError("arg P contains values outside unit interval")
if require_valid_pmf and not np.allclose(np.sum(P, axis=-1), 1):
raise ValueError("arg P does not sum to unity across last axis")
if not (np.isscalar(base) and np.isreal(base) and base > 0):
raise ValueError("arg base not a positive real-valued scalar")
H = -np.sum(P * np.log2(P + np.spacing(0)), axis=-1)
H = H / np.log2(base)
if keep_dims:
H = H[..., np.newaxis]
return H
def entropy_cross_pmf(P, Q=None, cartesian_product=False, base=2,
require_valid_pmf=True, keep_dims=False):
"""
Returns the cross entropy (see e.g. [Murp12]) between arrays P and Q, each
representing a discrete probability distribution.
**Mathematical definition**:
Denoting with :math:`P(x)`, :math:`Q(x)` respectively the probability mass
associated with observing an outcome :math:`x` under distributions
:math:`P`, :math:`Q`, the cross entropy :math:`H^\\times(P,Q)` is defined
as:
.. math::
H^\\times(P,Q) = -\\sum_x {P(x) \\log {Q(x)}}.
**Parameters**:
P, Q : numpy array (or array-like object such as a list of immutables, as \
accepted by np.array())
*cartesian_product==False and Q is not None*: P and Q are arrays
containing probability mass assignments, with P.shape==Q.shape.
Probabilities in a distribution are indexed by the last axis in the
respective arrays; multiple probability distributions in P and Q may be
specified using preceding axes of the respective arrays (distributions
are paired **one-to-one** between P and Q). When P.ndim==Q.ndim==1,
returns a scalar. When P.ndim>1 and Q.ndim>1, returns an array of cross
entropies with dimensions P.shape[:-1]. Neither P nor Q may contain
(floating point) NaN values.
*cartesian_product==True and Q is not None*: P and Q are arrays
containing probability mass assignments, with P.shape[-1]==Q.shape[-1].
Probabilities in a distribution are indexed by the last axis in the
respective arrays; multiple probability distributions in P and Q may be
specified using preceding axes of the respective arrays (distributions
are paired **many-to-many** between P and Q). When P.ndim==Q.ndim==1,
returns a scalar. When P.ndim>1 and Q.ndim>1, returns an array of cross
entropies with dimensions np.append(P.shape[:-1],Q.shape[:-1]). Neither
P nor Q may contain (floating point) NaN values.
*Q is None*: Equivalent to entropy_cross_pmf(P, P, ... ). Thus, a
shorthand syntax for computing cross entropies (in bits) between all
pairs of probability distributions in P is entropy_cross_pmf(P).
cartesian_product : boolean
Indicates whether probability distributions are paired **one-to-one**
between P and Q (cartesian_product==False, the default value) or
**many-to-many** between P and Q (cartesian_product==True).
base : float
The desired logarithmic base (default 2).
require_valid_pmf : boolean
When set to True (the default value), verifies that probability mass
assignments in each distribution sum to 1. When set to False, no such
test is performed, thus allowing incomplete probability distributions
to be processed.
keep_dims : boolean
When set to True and cartesian_product==False an additional dimension
of length one is appended to the returned array, facilitating any
broadcast operations required by the user (defaults to False). Has no
effect when cartesian_product==True.
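**Example**:
A minimal usage sketch (illustrative probability assignments; values
quoted in the comments assume base 2):
>>> import numpy as np
>>> P = np.array([0.5, 0.5])
>>> Q = np.array([0.25, 0.75])
>>> entropy_cross_pmf(P, Q)  # approx. 1.2075 bits
>>> entropy_cross_pmf(P, P)  # 1.0 bit: cross entropy of P with itself is H(P)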
"""
if Q is None:
Q = P
cartesian_product = True
P, _ = _sanitise_array_input(P)
Q, _ = _sanitise_array_input(Q)
if P.size == 0:
raise ValueError("arg P contains no elements")
if Q.size == 0:
raise ValueError("arg Q contains no elements")
if np.any(_isnan(P)):
raise ValueError("arg P contains NaN values")
if np.any(_isnan(Q)):
raise ValueError("arg Q contains NaN values")
if not cartesian_product and P.shape != Q.shape:
raise ValueError("dimensions of args P and Q do not match")
if cartesian_product and P.shape[-1] != Q.shape[-1]:
raise ValueError("trailing dimensions of args P and Q do not match")
if np.any(np.logical_or(P < 0, P > 1)):
raise ValueError("arg P contains values outside unit interval")
if np.any(np.logical_or(Q < 0, Q > 1)):
raise ValueError("arg Q contains values outside unit interval")
if require_valid_pmf and not np.allclose(np.sum(P, axis=-1), 1):
raise ValueError("arg P does not sum to unity across last axis")
if require_valid_pmf and not np.allclose(np.sum(Q, axis=-1), 1):
raise ValueError("arg Q does not sum to unity across last axis")
if not (np.isscalar(base) and np.isreal(base) and base > 0):
raise ValueError("arg base not a positive real-valued scalar")
if cartesian_product:
def f(P, Q):
return entropy_cross_pmf(P, Q, False, base, require_valid_pmf)
return _cartesian_product_apply(P, Q, f)
with np.errstate(invalid='ignore', divide='ignore'):
H = P * np.log2(Q)
H[P == 0] = 0
H = -np.sum(H, axis=-1)
H = H / np.log2(base)
if keep_dims and not cartesian_product:
H = H[..., np.newaxis]
return H
def divergence_kullbackleibler_pmf(P, Q=None, cartesian_product=False, base=2,
require_valid_pmf=True, keep_dims=False):
"""
Returns the Kullback-Leibler divergence (see e.g. [CoTh06]) between arrays
P and Q, each representing a discrete probability distribution.
**Mathematical definition**:
Denoting with :math:`P(x)`, :math:`Q(x)` respectively the probability mass
associated with observing an outcome :math:`x` under distributions
:math:`P`, :math:`Q`, the Kullback-Leibler divergence
:math:`D_{\\mathrm{KL}}(P \\parallel Q)` is defined as:
.. math::
D_{\\mathrm{KL}}(P \\parallel Q) =
-\\sum_x {P(x) \\log {\\frac{Q(x)}{P(x)}}}.
**Parameters**:
P, Q : numpy array (or array-like object such as a list of immutables, as \
accepted by np.array())
*cartesian_product==False and Q is not None*: P and Q are arrays
containing probability mass assignments, with P.shape==Q.shape.
Probabilities in a distribution are indexed by the last axis in the
respective arrays; multiple probability distributions in P and Q may be
specified using preceding axes of the respective arrays (distributions
are paired **one-to-one** between P and Q). When P.ndim==Q.ndim==1,
returns a scalar. When P.ndim>1 and Q.ndim>1, returns an array of
divergence values with dimensions P.shape[:-1]. Neither P nor Q may
contain (floating point) NaN values.
*cartesian_product==True and Q is not None*: P and Q are arrays
containing probability mass assignments, with P.shape[-1]==Q.shape[-1].
Probabilities in a distribution are indexed by the last axis in the
respective arrays; multiple probability distributions in P and Q may be
specified using preceding axes of the respective arrays (distributions
are paired **many-to-many** between P and Q). When P.ndim==Q.ndim==1,
returns a scalar. When P.ndim>1 and Q.ndim>1, returns an array of
divergence values with dimensions np.append(P.shape[:-1],Q.shape[:-1]).
Neither P nor Q may contain (floating point) NaN values.
*Q is None*: Equivalent to divergence_kullbackleibler_pmf(P, P, ... ).
Thus, a shorthand syntax for computing Kullback-Leibler divergence (in
bits) between all pairs of probability distributions in P is
divergence_kullbackleibler_pmf(P).
cartesian_product : boolean
Indicates whether probability distributions are paired **one-to-one**
between P and Q (cartesian_product==False, the default value) or
**many-to-many** between P and Q (cartesian_product==True).
base : float
The desired logarithmic base (default 2).
require_valid_pmf : boolean
When set to True (the default value), verifies that probability mass
assignments in each distribution sum to 1. When set to False, no such
test is performed, thus allowing incomplete probability distributions
to be processed.
keep_dims : boolean
When set to True and cartesian_product==False an additional dimension
of length one is appended to the returned array, facilitating any
broadcast operations required by the user (defaults to False). Has no
effect when cartesian_product==True.
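**Example**:
A minimal usage sketch (illustrative probability assignments; the value
quoted in the comment assumes base 2):
>>> import numpy as np
>>> P = np.array([0.5, 0.5])
>>> Q = np.array([0.25, 0.75])
>>> divergence_kullbackleibler_pmf(P, Q)  # approx. 0.2075 bits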
"""
H_cross = entropy_cross_pmf(P, Q, cartesian_product, base,
require_valid_pmf)
H = entropy_pmf(P, base, require_valid_pmf)
H = np.reshape(H, np.append(H.shape,
np.ones(H_cross.ndim-H.ndim)).astype('int'))
H = H_cross - H
if keep_dims and not cartesian_product:
H = H[..., np.newaxis]
return H
def divergence_jensenshannon_pmf(P, Q=None, cartesian_product=False, base=2,
require_valid_pmf=True, keep_dims=False):
"""
Returns the Jensen-Shannon divergence [Lin91] between arrays P and Q, each
representing a discrete probability distribution.
**Mathematical definition**:
Denoting with :math:`P`, :math:`Q` probability distributions with common
domain, the Jensen-Shannon divergence
:math:`D_{\\mathrm{JS}}(P \\parallel Q)` is defined as:
.. math::
D_{\\mathrm{JS}}(P \\parallel Q) =
\\frac{1}{2} D_{\\mathrm{KL}}(P \\parallel M) +
\\frac{1}{2} D_{\\mathrm{KL}}(Q \\parallel M)
where :math:`M = \\frac{1}{2}(P + Q)` and where
:math:`D_{\\mathrm{KL}}(\\cdot \\parallel \\cdot)` denotes the
Kullback-Leibler divergence.
**Parameters**:
P, Q : numpy array (or array-like object such as a list of immutables, as \
accepted by np.array())
*cartesian_product==False and Q is not None*: P and Q are arrays
containing probability mass assignments, with P.shape==Q.shape.
Probabilities in a distribution are indexed by the last axis in the
respective arrays; multiple probability distributions in P and Q may be
specified using preceding axes of the respective arrays (distributions
are paired **one-to-one** between P and Q). When P.ndim==Q.ndim==1,
returns a scalar. When P.ndim>1 and Q.ndim>1, returns an array of
divergence values with dimensions P.shape[:-1]. Neither P nor Q may
contain (floating point) NaN values.
*cartesian_product==True and Q is not None*: P and Q are arrays
containing probability mass assignments, with P.shape[-1]==Q.shape[-1].
Probabilities in a distribution are indexed by the last axis in the
respective arrays; multiple probability distributions in P and Q may be
specified using preceding axes of the respective arrays (distributions
are paired **many-to-many** between P and Q). When P.ndim==Q.ndim==1,
returns a scalar. When P.ndim>1 and Q.ndim>1, returns an array of
divergence values with dimensions np.append(P.shape[:-1],Q.shape[:-1]).
Neither P nor Q may contain (floating point) NaN values.
*Q is None*: Equivalent to divergence_jensenshannon_pmf(P, P, ... ).
Thus, a shorthand syntax for computing Jensen-Shannon divergence (in
bits) between all pairs of probability distributions in P is
divergence_jensenshannon_pmf(P).
cartesian_product : boolean
Indicates whether probability distributions are paired **one-to-one**
between P and Q (cartesian_product==False, the default value) or
**many-to-many** between P and Q (cartesian_product==True).
base : float
The desired logarithmic base (default 2).
require_valid_pmf : boolean
When set to True (the default value), verifies that probability mass
assignments in each distribution sum to 1. When set to False, no such
test is performed, thus allowing incomplete probability distributions
to be processed.
keep_dims : boolean
When set to True and cartesian_product==False an additional dimension
of length one is appended to the returned array, facilitating any
broadcast operations required by the user (defaults to False). Has no
effect when cartesian_product==True.
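**Example**:
A minimal usage sketch (illustrative probability assignments; the value
quoted in the comment assumes base 2):
>>> import numpy as np
>>> P = np.array([0.5, 0.5])
>>> Q = np.array([0.25, 0.75])
>>> divergence_jensenshannon_pmf(P, Q)  # approx. 0.0488 bits; symmetric in P and Q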
"""
if Q is None:
Q = P
cartesian_product = True
P, _ = _sanitise_array_input(P)
Q, _ = _sanitise_array_input(Q)
if P.size == 0:
raise ValueError("arg P contains no elements")
if Q.size == 0:
raise ValueError("arg Q contains no elements")
if np.any(_isnan(P)):
raise ValueError("arg P contains NaN values")
if np.any(_isnan(Q)):
raise ValueError("arg Q contains NaN values")
if not cartesian_product and P.shape != Q.shape:
raise ValueError("dimensions of args P and Q do not match")
if cartesian_product and P.shape[-1] != Q.shape[-1]:
raise ValueError("trailing dimensions of args P and Q do not match")
if np.any(np.logical_or(P < 0, P > 1)):
raise ValueError("arg P contains values outside unit interval")
if np.any(np.logical_or(Q < 0, Q > 1)):
raise ValueError("arg Q contains values outside unit interval")
if require_valid_pmf and not np.allclose(np.sum(P, axis=-1), 1):
raise ValueError("arg P does not sum to unity across last axis")
if require_valid_pmf and not np.allclose(np.sum(Q, axis=-1), 1):
raise ValueError("arg Q does not sum to unity across last axis")
if not (np.isscalar(base) and np.isreal(base) and base > 0):
raise ValueError("arg base not a positive real-valued scalar")
if cartesian_product:
def f(P, Q):
return divergence_jensenshannon_pmf(P, Q, False, base,
require_valid_pmf)
return _cartesian_product_apply(P, Q, f)
H1 = entropy_pmf(0.5*(P + Q), base, require_valid_pmf)
H = H1 - 0.5*entropy_pmf(P, base, require_valid_pmf) - \
0.5*entropy_pmf(Q, base, require_valid_pmf)
if keep_dims and not cartesian_product:
H = H[..., np.newaxis]
return H
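# Usage sketch (hypothetical example function, not part of the module's
# API): the implementation above exploits the identity
# D_JS(P||Q) = H(M) - (H(P) + H(Q))/2 with M = (P + Q)/2, which agrees
# with the defining mixture of Kullback-Leibler divergences.
def _example_divergence_jensenshannon_pmf():
    P = np.array([0.5, 0.5])
    Q = np.array([0.9, 0.1])
    d_js = divergence_jensenshannon_pmf(P, Q)
    # Cross-check against the KL-based definition
    M = 0.5*(P + Q)
    d_kl = 0.5*divergence_kullbackleibler_pmf(P, M) + \
        0.5*divergence_kullbackleibler_pmf(Q, M)
    assert np.isclose(d_js, d_kl)
    return d_js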
def divergence_kullbackleibler_symmetrised_pmf(P, Q=None,
cartesian_product=False, base=2,
require_valid_pmf=True,
keep_dims=False):
"""
Returns the symmetrised Kullback-Leibler divergence [Lin91] between arrays
P and Q, each representing a discrete probability distribution.
**Mathematical definition**:
Denoting with :math:`P`, :math:`Q` probability distributions with common
domain, the symmetrised Kullback-Leibler divergence
:math:`D_{\\mathrm{SKL}}(P \\parallel Q)` is defined as:
    .. math::

        D_{\\mathrm{SKL}}(P \\parallel Q) =
        D_{\\mathrm{KL}}(P \\parallel Q) +
        D_{\\mathrm{KL}}(Q \\parallel P)

    where :math:`D_{\\mathrm{KL}}(\\cdot \\parallel \\cdot)` denotes the
Kullback-Leibler divergence.
**Parameters**:
P, Q : numpy array (or array-like object such as a list of immutables, as \
accepted by np.array())
*cartesian_product==False and Q is not None*: P and Q are arrays
containing probability mass assignments, with P.shape==Q.shape.
Probabilities in a distribution are indexed by the last axis in the
respective arrays; multiple probability distributions in P and Q may be
specified using preceding axes of the respective arrays (distributions
are paired **one-to-one** between P and Q). When P.ndim==Q.ndim==1,
returns a scalar. When P.ndim>1 and Q.ndim>1, returns an array of
divergence values with dimensions P.shape[:-1]. Neither P nor Q may
contain (floating point) NaN values.
*cartesian_product==True and Q is not None*: P and Q are arrays
containing probability mass assignments, with P.shape[-1]==Q.shape[-1].
Probabilities in a distribution are indexed by the last axis in the
respective arrays; multiple probability distributions in P and Q may be
specified using preceding axes of the respective arrays (distributions
are paired **many-to-many** between P and Q). When P.ndim==Q.ndim==1,
returns a scalar. When P.ndim>1 and Q.ndim>1, returns an array of
divergence values with dimensions np.append(P.shape[:-1],Q.shape[:-1]).
Neither P nor Q may contain (floating point) NaN values.
*Q is None*: Equivalent to
divergence_kullbackleibler_symmetrised_pmf(P, P, ... ). Thus, a
shorthand syntax for computing symmetrised Kullback-Leibler divergence
(in bits) between all pairs of probability distributions in P is
divergence_kullbackleibler_symmetrised_pmf(P).
cartesian_product : boolean
Indicates whether probability distributions are paired **one-to-one**
between P and Q (cartesian_product==False, the default value) or
**many-to-many** between P and Q (cartesian_product==True).
base : float
The desired logarithmic base (default 2).
require_valid_pmf : boolean
When set to True (the default value), verifies that probability mass
assignments in each distribution sum to 1. When set to False, no such
test is performed, thus allowing incomplete probability distributions
to be processed.
keep_dims : boolean
When set to True and cartesian_product==False an additional dimension
of length one is appended to the returned array, facilitating any
broadcast operations required by the user (defaults to False). Has no
effect when cartesian_product==True.
"""
if Q is None:
Q = P
cartesian_product = True
H1 = divergence_kullbackleibler_pmf(P, Q, cartesian_product, base,
require_valid_pmf)
H2 = divergence_kullbackleibler_pmf(Q, P, cartesian_product, base,
require_valid_pmf)
if cartesian_product:
H2 = H2.T
H = H1 + H2
if keep_dims and not cartesian_product:
H = H[..., np.newaxis]
return H
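# Usage sketch (hypothetical example function, not part of the module's
# API): the symmetrised divergence sums the two directed Kullback-Leibler
# divergences and is therefore invariant under exchanging P and Q.
def _example_divergence_kullbackleibler_symmetrised_pmf():
    P = np.array([0.25, 0.75])
    Q = np.array([0.75, 0.25])
    d_pq = divergence_kullbackleibler_symmetrised_pmf(P, Q)
    d_qp = divergence_kullbackleibler_symmetrised_pmf(Q, P)
    assert np.isclose(d_pq, d_qp)  # symmetry
    return d_pq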
def _append_empty_bins_using_alphabet(Counts, Alphabet, Full_Alphabet,
fill_value):
if Alphabet.ndim == 1:
assert(Full_Alphabet.ndim == 1)
A = np.setdiff1d(Full_Alphabet[Full_Alphabet != fill_value], Alphabet,
assume_unique=True)
Alphabet = np.append(Alphabet, A)
Counts = np.append(Counts, np.tile(0, Alphabet.size-Counts.size))
I = Alphabet.argsort(kind='mergesort')
Alphabet = Alphabet[I]
Counts = Counts[I]
assert(np.all(Alphabet[1:] != Alphabet[:-1]))
return Counts, Alphabet
else:
assert(Full_Alphabet.shape[0] == 2 and Alphabet.shape[0] == 2)
Alph1 = np.unique(Full_Alphabet[0, Full_Alphabet[0] != fill_value])
Alph2 = np.unique(Full_Alphabet[1, Full_Alphabet[1] != fill_value])
A = np.zeros((2, Alph1.size*Alph2.size), dtype=Full_Alphabet.dtype)
c = 0
for j in range(Alph2.size):
for i in range(Alph1.size):
A[0, c] = Alph1[i]
A[1, c] = Alph2[j]
c = c + 1
Unseen = np.ones(A.shape[-1], dtype='bool')
i = j = 0
while i < Alphabet.shape[-1] and j < A.shape[-1]:
if np.all(Alphabet[:, i] == A[:, j]):
Unseen[j] = False
j = j + 1
i = i + 1
elif np.any(Alphabet[:, i] > A[:, j]):
j = j + 1
else:
i = i + 1
Alphabet = np.hstack((Alphabet, A[:, Unseen]))
        Counts = np.append(Counts,
                           np.tile(0, Alphabet.shape[-1]-Counts.size))
# Sort columns
for i in range(Alphabet.shape[0]):
I = Alphabet[i].argsort(kind='mergesort')
Alphabet = Alphabet[:, I]
Counts = Counts[I]
assert(np.all(np.any(Alphabet[:, 1:] != Alphabet[:, :-1], axis=0)))
return Counts, Alphabet
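# Usage sketch (hypothetical example function, not part of the module's
# API): in the one-dimensional case, symbols present in the full alphabet
# but absent from the observed alphabet are appended as zero-count bins,
# and both arrays are returned in sorted symbol order.
def _example_append_empty_bins_using_alphabet():
    Counts = np.array([3, 2])
    Alphabet = np.array([0, 2])
    Full_Alphabet = np.array([0, 1, 2, 3])
    Counts, Alphabet = _append_empty_bins_using_alphabet(Counts, Alphabet,
                                                         Full_Alphabet, -1)
    assert np.array_equal(Alphabet, [0, 1, 2, 3])
    assert np.array_equal(Counts, [3, 0, 2, 0])
    return Counts, Alphabet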
def _autocreate_alphabet(X, fill_value):
Lengths = np.apply_along_axis(lambda x: np.unique(x).size, axis=-1, arr=X)
max_length = np.max(Lengths)
def pad_with_fillvalue(x):
return np.append(x, np.tile(fill_value, int(max_length-x.size)))
Alphabet = np.apply_along_axis(lambda x: pad_with_fillvalue(np.unique(x)),
axis=-1, arr=X)
return (Alphabet, fill_value)
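# Usage sketch (hypothetical example function, not part of the module's
# API): an alphabet is derived per variable from the observed
# realisations; shorter alphabets are right-padded with the fill value so
# that the result is rectangular.
def _example_autocreate_alphabet():
    X = np.array([[1, 1, 2],
                  [3, 4, 5]])
    Alphabet, fill_value = _autocreate_alphabet(X, -1)
    # The first variable has two distinct symbols, the second has three;
    # the first alphabet is padded with the fill value -1 to length 3
    assert np.array_equal(Alphabet, [[1, 2, -1], [3, 4, 5]])
    return Alphabet, fill_value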
def _cartesian_product_apply(X, Y, function, Alphabet_X=None, Alphabet_Y=None):
"""
Applies a function to arrays X and Y, each containing discrete random
variable realisations. (Internal function.)
**Parameters**:
X,Y : numpy array (or array-like object such as a list of immutables, as \
accepted by np.array())
        X and Y are arrays containing discrete random variable realisations,
        with X.shape[-1]==Y.shape[-1]. Successive realisations of a random
        variable are indexed by the last axis in the respective arrays;
        multiple random variables in X and Y may be specified using preceding
        axes of the respective arrays. Random variables are always paired
        **many-to-many** between X and Y. When X.ndim==Y.ndim==1, returns a
        scalar based on calling function(X,Y). Otherwise, returns an array
        with dimensions np.append(X.shape[:-1],Y.shape[:-1]), based on
        calling function once for each variable pairing in the many-to-many
        relation.
    function : function
        A function with two vector-valued arguments, plus two further
        alphabet arguments when Alphabet_X and Alphabet_Y are supplied.
"""
assert(X.ndim > 0 and Y.ndim > 0)
assert(X.size > 0 and Y.size > 0)
if Alphabet_X is not None or Alphabet_Y is not None:
assert(Alphabet_X.ndim > 0 and Alphabet_Y.ndim > 0)
assert(Alphabet_X.size > 0 and Alphabet_Y.size > 0)
H = np.empty(np.append(X.shape[:-1], Y.shape[:-1]).astype('int'))
if H.ndim > 0:
        H[:] = np.nan
    else:
        H = np.float64(np.nan)
X = np.reshape(X, (-1, X.shape[-1]))
Y = np.reshape(Y, (-1, Y.shape[-1]))
if Alphabet_X is not None or Alphabet_Y is not None:
Alphabet_X = np.reshape(Alphabet_X, (-1, Alphabet_X.shape[-1]))
Alphabet_Y = np.reshape(Alphabet_Y, (-1, Alphabet_Y.shape[-1]))
orig_shape_H = H.shape
H = np.reshape(H, (-1, 1))
n = 0
for i in range(X.shape[0]):
for j in range(Y.shape[0]):
if Alphabet_X is not None or Alphabet_Y is not None:
H[n] = function(X[i], Y[j], Alphabet_X[i], Alphabet_Y[j])
else:
H[n] = function(X[i], Y[j])
n = n + 1
# Reverse re-shaping
H = np.reshape(H, orig_shape_H)
return H
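# Usage sketch (hypothetical example function, not part of the module's
# API): with two variables in X and three in Y, the many-to-many pairing
# yields a 2-by-3 array of results, one per variable pairing.
def _example_cartesian_product_apply():
    X = np.arange(8).reshape(2, 4)
    Y = np.arange(12).reshape(3, 4)
    # A toy scalar-valued statistic; any function of two vectors will do
    H = _cartesian_product_apply(X, Y, lambda x, y: np.sum(x * y))
    assert H.shape == (2, 3)
    return H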
def _determine_number_additional_empty_bins(Counts, Alphabet, Full_Alphabet,
fill_value):
alphabet_sizes = np.sum(np.atleast_2d(Full_Alphabet) != fill_value,
axis=-1)
if np.any(alphabet_sizes != fill_value):
joint_alphabet_size = np.prod(alphabet_sizes[alphabet_sizes > 0])
if joint_alphabet_size <= 0:
raise ValueError("Numerical overflow detected. Joint alphabet "
"size too large.")
else:
joint_alphabet_size = 0
return joint_alphabet_size - \
np.sum(np.all(np.atleast_2d(Alphabet) != fill_value, axis=0))
def _estimate_probabilities(Counts, estimator, n_additional_empty_bins=0):
# TODO Documentation should present the following guidelines:
# 1) Good-Turing may be used if slope requirement satisfied and if
# unobserved symbols have been defined (TODO Clarify what the requirement
# is)
# 2) James-Stein approach may be used as an alternative
# 3) Dirichlet prior may be used in all other cases
assert(np.sum(Counts) > 0)
assert(np.all(Counts.astype('int') == Counts))
assert(n_additional_empty_bins >= 0)
Counts = Counts.astype('int')
if isinstance(estimator, str):
estimator = estimator.upper().replace(' ', '')
if np.isreal(estimator) or estimator in ('ML', 'PERKS', 'MINIMAX'):
if np.isreal(estimator):
alpha = estimator
elif estimator == 'PERKS':
alpha = 1.0 / (Counts.size+n_additional_empty_bins)
elif estimator == 'MINIMAX':
alpha = np.sqrt(np.sum(Counts)) / \
(Counts.size+n_additional_empty_bins)
else:
alpha = 0
Theta = (Counts+alpha) / \
(1.0*np.sum(Counts) + alpha*(Counts.size+n_additional_empty_bins))
# Theta_0 is the probability mass assigned to each additional empty bin
if n_additional_empty_bins > 0:
Theta_0 = alpha / (1.0*np.sum(Counts) +
alpha*(Counts.size+n_additional_empty_bins))
else:
Theta_0 = 0
elif estimator == 'GOOD-TURING':
# TODO We could also add a Chen-Chao vocabulary size estimator (See
# Bhat Suma's thesis)
# The following notation is based on Gale and Sampson (1995)
# Determine histogram of counts N_r (index r denotes count)
X = np.sort(Counts)
B = X[1:] != X[:-1] # Compute symbol change indicators
I = np.append(np.where(B), X.size-1) # Obtain symbol change positions
N_r = np.zeros(X[I[-1]]+1)
N_r[X[I]] = np.diff(np.append(-1, I)) # Compute run lengths
N_r[0] = 0 # Ensures that unobserved symbols do not interfere
# Compute Z_r, a locally averaged version of N_r
R = np.where(N_r)[0]
Q = np.append(0, R[:-1])
T = np.append(R[1:], 2*R[-1]-Q[-1])
Z_r = np.zeros_like(N_r)
Z_r[R] = N_r[R] / (0.5*(T-Q))
# Fit least squares regression line to plot of log(Z_r) versus log(r)
x = np.log10(np.arange(1, Z_r.size))
with np.errstate(invalid='ignore', divide='ignore'):
y = np.log10(Z_r[1:])
x = x[np.isfinite(y)]
y = y[np.isfinite(y)]
m, c = np.linalg.lstsq(np.vstack([x, np.ones(x.size)]).T, y,
rcond=None)[0]
if m >= -1:
warnings.warn("Regression slope < -1 requirement in linear "
"Good-Turing estimate not satisfied")
# Compute smoothed value of N_r based on interpolation
# We need to refer to SmoothedN_{r+1} for all observed values of r
SmoothedN_r = np.zeros(N_r.size+1)
SmoothedN_r[1:] = 10**(np.log10(np.arange(1, SmoothedN_r.size)) *
m + c)
        # Determine threshold value of r at which to use smoothed values of
        # N_r (SmoothedN_r), as opposed to straightforward N_r.
# Variance of Turing estimate
with np.errstate(invalid='ignore', divide='ignore'):
VARr_T = (np.arange(N_r.size)+1)**2 * \
(1.0*np.append(N_r[1:], 0)/(N_r**2)) * \
(1 + np.append(N_r[1:], 0)/N_r)
x = (np.arange(N_r.size)+1) * 1.0*np.append(N_r[1:], 0) / N_r
y = (np.arange(N_r.size)+1) * \
1.0*SmoothedN_r[1:] / (SmoothedN_r[:-1])
assert(np.isinf(VARr_T[0]) or np.isnan(VARr_T[0]))
turing_is_sig_diff = np.abs(x-y) > 1.96 * np.sqrt(VARr_T)
assert(turing_is_sig_diff[0] == np.array(False))
# NB: 0th element can be safely ignored, since always 0
T = np.where(turing_is_sig_diff == np.array(False))[0]
if T.size > 1:
thresh_r = T[1]
# Use smoothed estimates from the first non-significant
# np.abs(SmoothedN_r-N_r) position onwards
SmoothedN_r[:thresh_r] = N_r[:thresh_r]
else:
# Use only non-smoothed estimates (except for SmoothedN_r[-1])
SmoothedN_r[:-1] = N_r
# Estimate probability of encountering one particular symbol among the
# objects observed r times, r>0
p_r = np.zeros(N_r.size)
N = np.sum(Counts)
p_r[1:] = (np.arange(1, N_r.size)+1) * \
1.0*SmoothedN_r[2:] / (SmoothedN_r[1:-1] * N)
# Estimate probability of observing any unseen symbol
p_r[0] = 1.0 * N_r[1] / N
# Assign probabilities to observed symbols
Theta = np.array([p_r[r] for r in Counts])
Theta[Counts == 0] = 0
# Normalise probabilities for observed symbols, so that they sum to one
if np.any(Counts == 0) or n_additional_empty_bins > 0:
Theta = (1-p_r[0]) * Theta / np.sum(Theta)
else:
warnings.warn("No unobserved outcomes specified. Disregarding the "
"probability mass allocated to any unobserved "
"outcomes.")
Theta = Theta / np.sum(Theta)
# Divide p_0 among unobserved symbols
with np.errstate(invalid='ignore', divide='ignore'):
p_emptybin = p_r[0] / (np.sum(Counts == 0) +
n_additional_empty_bins)
Theta[Counts == 0] = p_emptybin
# Theta_0 is the probability mass assigned to each additional empty bin
if n_additional_empty_bins > 0:
Theta_0 = p_emptybin
else:
Theta_0 = 0
elif estimator == 'JAMES-STEIN':
Theta, _ = _estimate_probabilities(Counts, 'ML')
p_uniform = 1.0 / (Counts.size + n_additional_empty_bins)
with np.errstate(invalid='ignore', divide='ignore'):
Lambda = (1-np.sum(Theta**2)) / \
((np.sum(Counts)-1) *
(np.sum((p_uniform-Theta)**2) +
n_additional_empty_bins*p_uniform**2))
if Lambda > 1:
Lambda = 1
elif Lambda < 0:
Lambda = 0
elif np.isnan(Lambda):
Lambda = 1
Theta = Lambda*p_uniform + (1-Lambda)*Theta
# Theta_0 is the probability mass assigned to each additional empty bin
if n_additional_empty_bins > 0:
Theta_0 = Lambda*p_uniform
else:
Theta_0 = 0
else:
raise ValueError("invalid value specified for estimator")
return Theta, Theta_0
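# Usage sketch (hypothetical example function, not part of the module's
# API): a real-valued estimator is interpreted as the Dirichlet prior
# parameter alpha, so alpha=1 (Laplace smoothing) assigns non-zero mass
# to the unseen fourth outcome, whereas maximum likelihood ('ML') does
# not.
def _example_estimate_probabilities():
    Counts = np.array([4, 3, 1, 0])
    Theta_ml, _ = _estimate_probabilities(Counts, 'ML')
    Theta_laplace, _ = _estimate_probabilities(Counts, 1.0)
    assert Theta_ml[-1] == 0  # unseen outcome gets zero mass under ML
    assert Theta_laplace[-1] > 0  # here (0+1)/(8+4)
    return Theta_ml, Theta_laplace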
def _increment_binary_vector(X):
carry_1 = False
x = X[0] ^ True
if not x:
carry_1 = True
X[0] = x
for i in range(1, X.size):
x = X[i] ^ carry_1
if not X[i] and x:
carry_1 = False
if X[i] and not x:
carry_1 = True
X[i] = x
if not carry_1:
break
return X
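# Usage sketch (hypothetical example function, not part of the module's
# API): the boolean vector is treated as a little-endian binary counter,
# so incrementing [True, False, False] (one) yields [False, True, False]
# (two).
def _example_increment_binary_vector():
    X = np.array([True, False, False])
    X = _increment_binary_vector(X)
    assert np.array_equal(X, [False, True, False])
    return X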
def _isnan(X):
X = np.array(X, copy=False)
if X.dtype in ('int', 'float'):
return np.isnan(X)
else:
f = np.vectorize(_isnan_element)
return f(X)
def _isnan_element(x):
if isinstance(x, type(np.nan)):
return np.isnan(x)
else:
return False
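# Usage sketch (hypothetical example function, not part of the module's
# API): unlike np.isnan, the helper above also handles object arrays, for
# which NaN detection is performed element-wise.
def _example_isnan():
    X = np.array([1.0, np.nan, 'a'], dtype=object)
    assert np.array_equal(_isnan(X), [False, True, False])
    return _isnan(X)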
def _map_observations_to_integers(Symbol_matrices, Fill_values):
assert(len(Symbol_matrices) == len(Fill_values))
FILL_VALUE = -1
if np.any([A.dtype != 'int' for A in Symbol_matrices]) or \
np.any(np.array(Fill_values) != FILL_VALUE):
L = sklearn.preprocessing.LabelEncoder()
F = [np.atleast_1d(v) for v in Fill_values]
L.fit(np.concatenate([A.ravel() for A in Symbol_matrices] + F))
# TODO make sure to test with various (unusual) data types
Symbol_matrices = [L.transform(A.ravel()).reshape(A.shape) for A in
Symbol_matrices]
Fill_values = [L.transform(np.atleast_1d(f)) for f in Fill_values]
for A, f in zip(Symbol_matrices, Fill_values):
assert(not np.any(A == FILL_VALUE))
A[A == f] = FILL_VALUE
assert(np.all([A.dtype == 'int' for A in Symbol_matrices]))
return Symbol_matrices, FILL_VALUE
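# Usage sketch (hypothetical example function, not part of the module's
# API): arbitrary symbols (here strings) are encoded as consecutive
# integers, and fill-value entries are remapped to the internal
# placeholder -1.
def _example_map_observations_to_integers():
    X = np.array([['a', 'b', 'a']])
    Matrices, fill_value = _map_observations_to_integers([X], ['pad'])
    assert fill_value == -1
    assert np.array_equal(Matrices[0], [[0, 1, 0]])
    return Matrices[0], fill_value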
def _remove_counts_at_fill_value(Counts, Alphabet, fill_value):
I = np.any(np.atleast_2d(Alphabet) == fill_value, axis=0)
if np.any(I):
Counts = Counts[~I]
Alphabet = Alphabet.T[~I].T
return (Counts, Alphabet)
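# Usage sketch (hypothetical example function, not part of the module's
# API): bins whose symbol equals the fill value (here -1, marking
# padding) are dropped from the counts and from the alphabet.
def _example_remove_counts_at_fill_value():
    Counts = np.array([5, 3, 2])
    Alphabet = np.array([0, 1, -1])
    Counts, Alphabet = _remove_counts_at_fill_value(Counts, Alphabet, -1)
    assert np.array_equal(Counts, [5, 3])
    assert np.array_equal(Alphabet, [0, 1])
    return Counts, Alphabet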
def _sanitise_array_input(X, fill_value=-1):
# Avoid Python 3 issues with numpy arrays containing None elements
if np.any(np.equal(X, None)) or fill_value is None:
X = np.array(X, copy=False)
assert(np.all(X != NONE_REPLACEMENT))
M = np.equal(X, None)
X = np.where(M, NONE_REPLACEMENT, X)
if fill_value is None:
X = np.array(X, copy=False)
fill_value = NONE_REPLACEMENT
if isinstance(X, (pd.core.frame.DataFrame, pd.core.series.Series)):
# Create masked array, honouring Dataframe/Series missing entries
# NB: We transpose for convenience, so that quantities are computed for
# each column
X = np.ma.MaskedArray(X, X.isnull())
if isinstance(X, np.ma.MaskedArray):
fill_value = X.fill_value
if np.any(X == fill_value):
warnings.warn("Masked array contains data equal to fill value")
if X.dtype.kind in ('S', 'U'):
kind = X.dtype.kind
current_dtype_len = int(X.dtype.str.split(kind)[1])
if current_dtype_len < len(fill_value):
# Fix numpy's broken array string type behaviour which causes
# X.filled() placeholder entries to be no longer than
# non-placeholder entries
warnings.warn("Changing numpy array dtype internally to "
"accommodate fill_value string length")
M = X.mask
X = np.array(X.filled(), dtype=kind+str(len(fill_value)))
X[M] = fill_value
else:
X = X.filled()
else:
X = X.filled()
else:
X = np.array(X, copy=False)
if X.dtype.kind not in 'biufcmMOSUV':
raise TypeError("Unsupported array dtype")
if X.size == 1 and X.ndim == 0:
X = np.array((X, ))
return X, np.array(fill_value)
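# Usage sketch (hypothetical example function, not part of the module's
# API): plain sequences are promoted to numpy arrays and the (default)
# fill value is returned alongside.
def _example_sanitise_array_input():
    X, fill_value = _sanitise_array_input([1, 2, 2, 3])
    assert isinstance(X, np.ndarray)
    assert fill_value == -1
    return X, fill_value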
def _verify_alphabet_sufficiently_large(X, Alphabet, fill_value):
assert(not np.any(X == np.array(None)))
assert(not np.any(Alphabet == np.array(None)))
for i in range(X.shape[0]):
I = X[i] != fill_value
J = Alphabet[i] != fill_value
# NB: This causes issues when both arguments contain None. But it is
# always called after observations have all been mapped to integers.
if np.setdiff1d(X[i, I], Alphabet[i, J]).size > 0:
raise ValueError("provided alphabet does not contain all observed "
"outcomes")
def _vstack_pad(Arrays, fill_value):
max_length = max([A.shape[-1] for A in Arrays])
Arrays = [np.append(A, np.tile(fill_value,
np.append(A.shape[:-1],
max_length -
A.shape[-1]).astype(int)))
for A in Arrays]
    return np.vstack(Arrays)
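# Usage sketch (hypothetical example function, not part of the module's
# API): rows of unequal length are right-padded with the fill value
# before stacking.
def _example_vstack_pad():
    A = np.array([1, 2, 3])
    B = np.array([4, 5])
    S = _vstack_pad((A, B), -1)
    assert np.array_equal(S, [[1, 2, 3], [4, 5, -1]])
    return S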
# TODO Avoid hack surrounding fill_value None and numpy arrays with Python 3.
# Remove support for fill_value None?
# TODO Tests for keep_dims
# TODO Should this really be NaN? Is it consistently NaN for all measures?
# drv.entropy([-1,], estimator='PERKS', Alphabet_X = np.arange(100))
# NB: The following tests should also determine what happens when data contain
# None, but fill value is not None
# TODO Test _determine_number_additional_empty_bins using None fill_value etc.
# / add assertions
# TODO Test _append_empty_bins_using_alphabet using None fill_value etc. / add
# assertions
# TODO Test _autocreate_alphabet using None fill_value / add assertions
# TODO Test _remove_counts_at_fill_value / add assertions
# TODO Test _vstack_pad / add assertions
# TODO Run some integration tests using a mixed-type DataFrame
# TODO Run tests using unusual pandas arrangements, such as panels or
# multi-level DataFrames