pyitlib

pyitlib is an MIT-licensed library of information-theoretic methods for data analysis and machine learning, implemented in Python and NumPy.

API documentation is available online at https://pafoster.github.io/pyitlib/.

pyitlib implements the following 19 measures on discrete random variables:

  • Entropy
  • Joint entropy
  • Conditional entropy
  • Cross entropy
  • Kullback-Leibler divergence
  • Symmetrised Kullback-Leibler divergence
  • Jensen-Shannon divergence
  • Mutual information
  • Normalised mutual information (7 variants)
  • Variation of information
  • Lautum information
  • Conditional mutual information
  • Co-information
  • Interaction information
  • Multi-information
  • Binding information
  • Residual entropy
  • Exogenous local information
  • Enigmatic information

The following estimators are available for each of the measures:

  • Maximum likelihood
  • Maximum a posteriori
  • James-Stein
  • Good-Turing

Missing data are supported, either using placeholder values or NumPy masked arrays.

Installation and codebase

pyitlib is listed on the Python Package Index at https://pypi.python.org/pypi/pyitlib/ and may be installed using pip as follows:

pip install pyitlib

The codebase for pyitlib is available at https://github.com/pafoster/pyitlib.

Notes for getting started

Import the module discrete_random_variable, as well as NumPy:

import numpy as np
from pyitlib import discrete_random_variable as drv

The methods implemented in discrete_random_variable accept NumPy arrays as input. Let’s compute the entropy of an array containing discrete random variable realisations, based on maximum likelihood estimation and quantifying entropy in bits:

>>> X = np.array((1,2,1,2))
>>> drv.entropy(X)
array(1.0)

NumPy arrays are created automatically for any input which isn’t of the required type, by passing the input to np.array(). Let’s compute entropy, again based on maximum likelihood estimation, but this time using list input and quantifying entropy in nats:

>>> drv.entropy(['a', 'b', 'a', 'b'], base=np.exp(1))
array(0.6931471805599453)

Those methods with the suffix _pmf operate on arrays specifying probability mass assignments. For example, the analogous method call for computing the entropy of the preceding random variable realisations (with estimated equi-probable outcomes) is:

>>> drv.entropy_pmf([0.5, 0.5], base=np.exp(1))
0.69314718055994529

It’s possible to specify missing data using placeholder values (the default placeholder value is -1). Elements equal to the placeholder value are subsequently ignored:

>>> drv.entropy([1, 2, 1, 2, -1])
array(1.0)

For measures expressible in terms of joint entropy (such as conditional entropy, mutual information etc.), the respective random variables must have equally many realisations, with realisations coupled using a common index. Any missing data for random variable X result in the corresponding realisations for random variable Y being ignored, and vice versa. Thus, the following method calls yield equivalent results (note the use of the alternative placeholder value None):

>>> drv.entropy_conditional([1,2,2,2], [1,1,2,2])
array(0.5)
>>> drv.entropy_conditional([1,2,2,2,1], [1,1,2,2,None], fill_value=None)
array(0.5)

It’s alternatively possible to specify missing data using NumPy masked arrays:

>>> Z = np.ma.array((1,2,1), mask=(0,0,1))
>>> drv.entropy(Z)
array(1.0)

In combination with any estimator other than maximum likelihood, it may be useful to specify alphabets containing unobserved outcomes. For example, we might seek to estimate the entropy in bits for the sequence of realisations [1,1,1,1]. Using maximum a posteriori estimation combined with the Perks prior (i.e. pseudo-counts of 1/L for each of L possible outcomes) and based on an alphabet specifying L=100 possible outcomes, we may use:

>>> drv.entropy([1,1,1,1], estimator='PERKS', Alphabet_X = np.arange(100))
array(2.030522626645241)

Multi-dimensional array input is supported based on the convention that leading dimensions index random variables, with the trailing dimension indexing random variable realisations. Thus, the following array specifies realisations for 3 random variables:

>>> X = np.array(((1,1,1,1), (1,1,2,2), (1,1,2,2)))
>>> X.shape
(3, 4)

When using multi-dimensional arrays, any alphabets must be specified separately for each random variable represented in the multi-dimensional array, using placeholder values (or NumPy masked arrays) to pad out any unequally sized alphabets:

>>> drv.entropy(X, estimator='PERKS', Alphabet_X = np.tile(np.arange(100),(3,1))) # 3 alphabets required
array([ 2.03052263,  2.81433872,  2.81433872])

>>> A = np.array(((1,2,-1), (1,2,-1), (1,2,3))) # padding required
>>> drv.entropy(X, estimator='PERKS', Alphabet_X = A)
array([ 0.46899559,  1.        ,  1.28669267])

For ease of use, those methods operating on two random variable array arguments (such as entropy_conditional, information_mutual etc.) may be invoked with a single multi-dimensional array. In this way, we may compute mutual information for all pairs of random variables represented in the array as follows:

>>> drv.information_mutual(X)
array([[ 0.,  0.,  0.],
       [ 0.,  1.,  1.],
       [ 0.,  1.,  1.]])

The above is equivalent to setting the cartesian_product parameter to True and specifying two random variable array arguments explicitly:

>>> drv.information_mutual(X, X, cartesian_product=True)
array([[ 0.,  0.,  0.],
       [ 0.,  1.,  1.],
       [ 0.,  1.,  1.]])

By default, those methods operating on several random variable array arguments don’t determine all combinations of random variables exhaustively. Instead, a one-to-one mapping is performed:

>>> drv.information_mutual(X, X) # Mutual information between 3 pairs of random variables
array([ 0.,  1.,  1.])

>>> drv.entropy(X) # Equivalent to the mutual information values above, since I(X;X) = H(X)
array([ 0.,  1.,  1.])

pyitlib provides basic support for pandas DataFrames/Series. Both these types are converted to NumPy masked arrays internally, while masking those data recorded as missing (based on .isnull()). Note that due to indexing random variable realisations using the trailing dimension of multi-dimensional arrays, we typically need to transpose DataFrames when estimating information-theoretic quantities:

>>> import pandas
>>> df = pandas.read_csv('https://raw.githubusercontent.com/veekun/pokedex/master/pokedex/data/csv/pokemon.csv')
>>> df = df[['height', 'weight', 'base_experience']].apply(lambda s: pandas.qcut(s, 10, labels=False)) # Bin the data
>>> drv.information_mutual_normalised(df.T) # Transposition required for comparing columns
array([[ 1.        ,  0.32472696,  0.17745753],
       [ 0.32729034,  1.        ,  0.13343504],
       [ 0.17848175,  0.13315407,  1.        ]])

discrete_random_variable

This module implements various information-theoretic quantities for discrete random variables.

For ease of reference, function names adhere to the following convention:

Function names beginning with “entropy” : Entropy measures

Function names beginning with “information” : Mutual information measures

Function names beginning with “divergence” : Divergence measures

Function names ending with “pmf” : Functions operating on arrays of probability mass assignments (as opposed to realisations of random variables)

Function                                      Generalises            Non-negativity  Symmetry  Identity  Metric properties
divergence_jensenshannon()                    -                      Yes             Yes       Yes       Square root is a metric
divergence_jensenshannon_pmf()                -                      Yes             Yes       Yes       Square root is a metric
divergence_kullbackleibler()                  -                      Yes             No        Yes       -
divergence_kullbackleibler_pmf()              -                      Yes             No        Yes       -
divergence_kullbackleibler_symmetrised()      -                      Yes             Yes       Yes       -
divergence_kullbackleibler_symmetrised_pmf()  -                      Yes             Yes       Yes       -
entropy()                                     -                      Yes             -         -         -
entropy_conditional()                         -                      Yes             No        No        -
entropy_cross()                               -                      Yes             No        No        -
entropy_cross_pmf()                           -                      Yes             No        No        -
entropy_joint()                               -                      Yes             Yes       No        -
entropy_pmf()                                 -                      Yes             -         -         -
entropy_residual()                            information_variation  Yes             Yes       No        -
information_binding()                         information_mutual     Yes             Yes       No        -
information_co()                              information_mutual     No              No        No        -
information_enigmatic()                       -                      No              Yes       No        -
information_exogenous_local()                 -                      Yes             Yes       No        -
information_interaction()                     information_mutual     No              No        No        -
information_lautum()                          -                      Yes             No        No        -
information_multi()                           information_mutual     Yes             Yes       No        -
information_mutual()                          -                      Yes             Yes       No        -
information_mutual_conditional()              -                      Yes             No        No        -
information_mutual_normalised()               -                      Yes             See docs  No        See docs
information_variation()                       -                      Yes             Yes       No        Is a metric

References

[AbPl12] Abdallah, S.A.; Plumbley, M.D.: A measure of statistical complexity based on predictive information with application to finite spin systems. In: Physics Letters A, Vol. 376, No. 4, 2012, P. 275-281.
[Bell03] Bell, A.J.: The co-information lattice. In: Proceedings of the International Workshop on Independent Component Analysis and Blind Signal Separation, 2003.
[CoTh06] Cover, T.M.; Thomas, J.A.: Elements of Information Theory (2nd ed.). John Wiley & Sons, 2006.
[Croo15] Crooks, G.E.: On measures of entropy and information. http://threeplusone.com/info, retrieved 2017-03-16.
[GaSa95] Gale, W.A.; Sampson, G.: Good-Turing frequency estimation without tears. In: Journal of Quantitative Linguistics, Vol. 2, No. 3, 1995, P. 217-237.
[Han78] Han, T.S.: Nonnegative entropy measures of multivariate symmetric correlations. In: Information and Control, Vol. 36, 1978, P. 133-156.
[HaSt09] Hausser, J.; Strimmer, K.: Entropy inference and the James-Stein estimator, with application to nonlinear gene association networks. In: Journal of Machine Learning Research, Vol. 10, 2009, P. 1469-1484.
[JaBr03] Jakulin, A.; Bratko, I.: Quantifying and visualizing attribute interactions. arXiv preprint cs/0308002, 2003.
[JaEC11] James, R.G.; Ellison, C.J.; Crutchfield, J.P.: Anatomy of a bit: Information in a time series observation. In: Chaos: An Interdisciplinary Journal of Nonlinear Science, Vol. 21, No. 3, 2011.
[Lin91] Lin, J.: Divergence measures based on the Shannon entropy. In: IEEE Transactions on Information Theory, Vol. 37, No. 1, 1991, P. 145-151.
[Meil03] Meilă, M.: Comparing clusterings by the variation of information. In: Learning Theory and Kernel Machines. Springer, 2003, P. 173-187.
[Murp12] Murphy, K.P.: Machine Learning: A Probabilistic Perspective. MIT Press, 2012.
[PaVe08] Palomar, D.P.; Verdú, S.: Lautum information. In: IEEE Transactions on Information Theory, Vol. 54, No. 3, 2008, P. 964-975.
[StVe98] Studený, M.; Vejnarová, J.: The multiinformation function as a tool for measuring stochastic dependence. In: Learning in Graphical Models. Springer Netherlands, 1998, P. 261-297.
[VeWe06] Verdú, S.; Weissman, T.: Erasure entropy. In: Proc. IEEE International Symposium on Information Theory, 2006, P. 98-102.
[Wata60] Watanabe, S.: Information theoretical analysis of multivariate correlation. In: IBM Journal of Research and Development, Vol. 4, No. 1, 1960, P. 66-82.

discrete_random_variable.divergence_jensenshannon(X, Y=None, cartesian_product=False, base=2, fill_value=-1, estimator='ML', Alphabet_X=None, Alphabet_Y=None, keep_dims=False)[source]

Returns the Jensen-Shannon divergence [Lin91] between arrays X and Y, each containing discrete random variable realisations.

Mathematical definition:

Denoting with \(P_X\), \(P_Y\) respectively probability distributions with common domain, associated with discrete random variables \(X\), \(Y\), the Jensen-Shannon divergence \(D_{\mathrm{JS}}(P_X \parallel P_Y)\) is defined as:

\[D_{\mathrm{JS}}(P_X \parallel P_Y) = \frac{1}{2} D_{\mathrm{KL}}(P_X \parallel M) + \frac{1}{2} D_{\mathrm{KL}}(P_Y \parallel M)\]

where \(M = \frac{1}{2}(P_X + P_Y)\) and where \(D_{\mathrm{KL}}(\cdot \parallel \cdot)\) denotes the Kullback-Leibler divergence.

Estimation:

Jensen-Shannon divergence is estimated based on frequency tables. See below for a list of available estimators.

Parameters:

X,Y : numpy array (or array-like object such as a list of immutables, as accepted by np.array())

cartesian_product==False and Y is not None: X and Y are arrays containing discrete random variable realisations, with X.shape[:-1]==Y.shape[:-1]. Successive realisations of a random variable are indexed by the last axis in the respective arrays; multiple random variables in X and Y may be specified using preceding axes of the respective arrays (random variables are paired one-to-one between X and Y). When X.ndim==Y.ndim==1, returns a scalar. When X.ndim>1 and Y.ndim>1, returns an array of estimated divergence values with dimensions X.shape[:-1]. Neither X nor Y may contain (floating point) NaN values. Missing data may be specified using numpy masked arrays, as well as using standard numpy array/array-like objects; see below for details.

cartesian_product==True and Y is not None: X and Y are arrays containing discrete random variable realisations. Successive realisations of a random variable are indexed by the last axis in the respective arrays; multiple random variables in X and Y may be specified using preceding axes of the respective arrays (random variables are paired many-to-many between X and Y). When X.ndim==Y.ndim==1, returns a scalar. When X.ndim>1 or Y.ndim>1, returns an array of estimated divergence values with dimensions np.append(X.shape[:-1],Y.shape[:-1]). Neither X nor Y may contain (floating point) NaN values. Missing data may be specified using numpy masked arrays, as well as using standard numpy array/array-like objects; see below for details.

Y is None: Equivalent to divergence_jensenshannon(X, X, … ). Thus, a shorthand syntax for computing Jensen-Shannon divergence (in bits) between all pairs of random variables in X is divergence_jensenshannon(X).

cartesian_product : boolean
Indicates whether random variables are paired one-to-one between X and Y (cartesian_product==False, the default value) or many-to-many between X and Y (cartesian_product==True).
base : float
The desired logarithmic base (default 2).
fill_value : object

It is possible to specify missing data using numpy masked arrays, pandas Series/DataFrames, as well as using standard numpy array/array-like objects with assigned placeholder values. When using numpy masked arrays, this function invokes np.ma.filled() internally, so that missing data are represented with the array’s object-internal placeholder value fill_value (this function’s fill_value parameter is ignored in such cases). When using pandas Series/DataFrames, an initial conversion to a numpy masked array is performed. When using standard numpy array/array-like objects, this function’s fill_value parameter is used to specify the placeholder value for missing data (defaults to -1).

Data equal to the placeholder value are subsequently ignored.

estimator : str or float

The desired estimator (see above for details on estimators). Possible values are:

‘ML’ (the default value) : Maximum likelihood estimator.

any floating point value : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with the pseudo-count for each outcome as specified).

‘PERKS’ : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with a pseudo-count of 1/L for each outcome, where L is the number of possible outcomes).

‘MINIMAX’ : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with a pseudo-count of sqrt(N)/L for each outcome, where N is the total number of realisations and L is the number of possible outcomes).

‘JAMES-STEIN’ : James-Stein estimator [HaSt09].

‘GOOD-TURING’ : Good-Turing estimator [GaSa95].

Alphabet_X, Alphabet_Y : numpy array (or array-like object such as a list of immutables, as accepted by np.array())
Respectively an array specifying the alphabet/alphabets of possible outcomes that random variable realisations in array X, Y may assume. Defaults to None, in which case the alphabet/alphabets of possible outcomes is/are implicitly based on the observed outcomes in array X, Y respectively, with no additional, unobserved outcomes. In combination with any estimator other than maximum likelihood, it may be useful to specify alphabets including unobserved outcomes. For such cases, successive possible outcomes of a random variable are indexed by the last axis in Alphabet_X, Alphabet_Y respectively; multiple alphabets may be specified using preceding axes, with the requirement X.shape[:-1]==Alphabet_X.shape[:-1] (analogously for Y). Alphabets of different sizes may be specified either using numpy masked arrays, or by padding with the chosen placeholder fill_value.
keep_dims : boolean
When set to True and cartesian_product==False an additional dimension of length one is appended to the returned array, facilitating any broadcast operations required by the user (defaults to False). Has no effect when cartesian_product==True.

Implementation notes:

Before estimation, outcomes are mapped to the set of non-negative integers internally, with the value -1 representing missing data. To avoid this internal conversion step, supply integer data and use the default fill value -1.
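
Example (an illustrative sketch added for orientation, not part of the original docstring; it assumes the default maximum likelihood estimator and base-2 logarithms, with the approximate result given as a comment):

>>> from pyitlib import discrete_random_variable as drv
>>> drv.divergence_jensenshannon([1, 1, 2, 2], [1, 2, 2, 2])  # ≈ 0.0488 bits

Here the estimated distributions over the common alphabet {1, 2} are \(P_X = (0.5, 0.5)\) and \(P_Y = (0.25, 0.75)\), so \(M = (0.375, 0.625)\) and the definition above evaluates to approximately 0.0488 bits.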

discrete_random_variable.divergence_jensenshannon_pmf(P, Q=None, cartesian_product=False, base=2, require_valid_pmf=True, keep_dims=False)[source]

Returns the Jensen-Shannon divergence [Lin91] between arrays P and Q, each representing a discrete probability distribution.

Mathematical definition:

Denoting with \(P\), \(Q\) probability distributions with common domain, the Jensen-Shannon divergence \(D_{\mathrm{JS}}(P \parallel Q)\) is defined as:

\[D_{\mathrm{JS}}(P \parallel Q) = \frac{1}{2} D_{\mathrm{KL}}(P \parallel M) + \frac{1}{2} D_{\mathrm{KL}}(Q \parallel M)\]

where \(M = \frac{1}{2}(P + Q)\) and where \(D_{\mathrm{KL}}(\cdot \parallel \cdot)\) denotes the Kullback-Leibler divergence.

Parameters:

P, Q : numpy array (or array-like object such as a list of immutables, as accepted by np.array())

cartesian_product==False and Q is not None: P and Q are arrays containing probability mass assignments, with P.shape==Q.shape. Probabilities in a distribution are indexed by the last axis in the respective arrays; multiple probability distributions in P and Q may be specified using preceding axes of the respective arrays (distributions are paired one-to-one between P and Q). When P.ndim==Q.ndim==1, returns a scalar. When P.ndim>1 and Q.ndim>1, returns an array of divergence values with dimensions P.shape[:-1]. Neither P nor Q may contain (floating point) NaN values.

cartesian_product==True and Q is not None: P and Q are arrays containing probability mass assignments, with P.shape[-1]==Q.shape[-1]. Probabilities in a distribution are indexed by the last axis in the respective arrays; multiple probability distributions in P and Q may be specified using preceding axes of the respective arrays (distributions are paired many-to-many between P and Q). When P.ndim==Q.ndim==1, returns a scalar. When P.ndim>1 and Q.ndim>1, returns an array of divergence values with dimensions np.append(P.shape[:-1],Q.shape[:-1]). Neither P nor Q may contain (floating point) NaN values.

Q is None: Equivalent to divergence_jensenshannon_pmf(P, P, … ). Thus, a shorthand syntax for computing Jensen-Shannon divergence (in bits) between all pairs of probability distributions in P is divergence_jensenshannon_pmf(P).

cartesian_product : boolean
Indicates whether probability distributions are paired one-to-one between P and Q (cartesian_product==False, the default value) or many-to-many between P and Q (cartesian_product==True).
base : float
The desired logarithmic base (default 2).
require_valid_pmf : boolean
When set to True (the default value), verifies that probability mass assignments in each distribution sum to 1. When set to False, no such test is performed, thus allowing incomplete probability distributions to be processed.
keep_dims : boolean
When set to True and cartesian_product==False an additional dimension of length one is appended to the returned array, facilitating any broadcast operations required by the user (defaults to False). Has no effect when cartesian_product==True.
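
Example (an illustrative sketch added for orientation; the approximate result is given as a comment). The same value as in the realisation-based example above, obtained directly from probability mass assignments:

>>> from pyitlib import discrete_random_variable as drv
>>> drv.divergence_jensenshannon_pmf([0.5, 0.5], [0.25, 0.75])  # ≈ 0.0488 bits
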
discrete_random_variable.divergence_kullbackleibler(X, Y=None, cartesian_product=False, base=2, fill_value=-1, estimator='ML', Alphabet_X=None, Alphabet_Y=None, keep_dims=False)[source]

Returns the Kullback-Leibler divergence (see e.g. [CoTh06]) between arrays X and Y, each containing discrete random variable realisations.

Mathematical definition:

Denoting with \(P_X(x)\), \(P_Y(x)\) respectively the probability of observing an outcome \(x\) with discrete random variables \(X\), \(Y\), the Kullback-Leibler divergence \(D_{\mathrm{KL}}(P_X\parallel P_Y)\) is defined as:

\[D_{\mathrm{KL}}(P_X \parallel P_Y) = -\sum_x {P_X(x) \log {\frac{P_Y(x)}{P_X(x)}}}.\]

Estimation:

Kullback-Leibler divergence is estimated based on frequency tables, using the following functions:

entropy_cross()

entropy()

See below for a list of available estimators. Note that although Kullback-Leibler divergence is a non-negative quantity, depending on the chosen estimator the obtained estimate may be negative.

Parameters:

X,Y : numpy array (or array-like object such as a list of immutables, as accepted by np.array())

cartesian_product==False and Y is not None: X and Y are arrays containing discrete random variable realisations, with X.shape[:-1]==Y.shape[:-1]. Successive realisations of a random variable are indexed by the last axis in the respective arrays; multiple random variables in X and Y may be specified using preceding axes of the respective arrays (random variables are paired one-to-one between X and Y). When X.ndim==Y.ndim==1, returns a scalar. When X.ndim>1 and Y.ndim>1, returns an array of estimated divergence values with dimensions X.shape[:-1]. Neither X nor Y may contain (floating point) NaN values. Missing data may be specified using numpy masked arrays, as well as using standard numpy array/array-like objects; see below for details.

cartesian_product==True and Y is not None: X and Y are arrays containing discrete random variable realisations. Successive realisations of a random variable are indexed by the last axis in the respective arrays; multiple random variables in X and Y may be specified using preceding axes of the respective arrays (random variables are paired many-to-many between X and Y). When X.ndim==Y.ndim==1, returns a scalar. When X.ndim>1 or Y.ndim>1, returns an array of estimated divergence values with dimensions np.append(X.shape[:-1],Y.shape[:-1]). Neither X nor Y may contain (floating point) NaN values. Missing data may be specified using numpy masked arrays, as well as using standard numpy array/array-like objects; see below for details.

Y is None: Equivalent to divergence_kullbackleibler(X, X, … ). Thus, a shorthand syntax for computing Kullback-Leibler divergence (in bits) between all pairs of random variables in X is divergence_kullbackleibler(X).

cartesian_product : boolean
Indicates whether random variables are paired one-to-one between X and Y (cartesian_product==False, the default value) or many-to-many between X and Y (cartesian_product==True).
base : float
The desired logarithmic base (default 2).
fill_value : object

It is possible to specify missing data using numpy masked arrays, pandas Series/DataFrames, as well as using standard numpy array/array-like objects with assigned placeholder values. When using numpy masked arrays, this function invokes np.ma.filled() internally, so that missing data are represented with the array’s object-internal placeholder value fill_value (this function’s fill_value parameter is ignored in such cases). When using pandas Series/DataFrames, an initial conversion to a numpy masked array is performed. When using standard numpy array/array-like objects, this function’s fill_value parameter is used to specify the placeholder value for missing data (defaults to -1).

Data equal to the placeholder value are subsequently ignored.

estimator : str or float

The desired estimator (see above for details on estimators). Possible values are:

‘ML’ (the default value) : Maximum likelihood estimator.

any floating point value : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with the pseudo-count for each outcome as specified).

‘PERKS’ : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with a pseudo-count of 1/L for each outcome, where L is the number of possible outcomes).

‘MINIMAX’ : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with a pseudo-count of sqrt(N)/L for each outcome, where N is the total number of realisations and L is the number of possible outcomes).

‘JAMES-STEIN’ : James-Stein estimator [HaSt09].

‘GOOD-TURING’ : Good-Turing estimator [GaSa95].

Alphabet_X, Alphabet_Y : numpy array (or array-like object such as a list of immutables, as accepted by np.array())
Respectively an array specifying the alphabet/alphabets of possible outcomes that random variable realisations in array X, Y may assume. Defaults to None, in which case the alphabet/alphabets of possible outcomes is/are implicitly based on the observed outcomes in array X, Y respectively, with no additional, unobserved outcomes. In combination with any estimator other than maximum likelihood, it may be useful to specify alphabets including unobserved outcomes. For such cases, successive possible outcomes of a random variable are indexed by the last axis in Alphabet_X, Alphabet_Y respectively; multiple alphabets may be specified using preceding axes, with the requirement X.shape[:-1]==Alphabet_X.shape[:-1] (analogously for Y). Alphabets of different sizes may be specified either using numpy masked arrays, or by padding with the chosen placeholder fill_value.
keep_dims : boolean
When set to True and cartesian_product==False an additional dimension of length one is appended to the returned array, facilitating any broadcast operations required by the user (defaults to False). Has no effect when cartesian_product==True.

Implementation notes:

Before estimation, outcomes are mapped to the set of non-negative integers internally, with the value -1 representing missing data. To avoid this internal conversion step, supply integer data and use the default fill value -1.
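
Example (an illustrative sketch added for orientation, not part of the original docstring; it assumes the default maximum likelihood estimator and base-2 logarithms, with approximate results given as comments). With estimated distributions \(P_X = (0.5, 0.5)\) and \(P_Y = (0.25, 0.75)\), the definition gives \(0.5 \log_2(0.5/0.25) + 0.5 \log_2(0.5/0.75) \approx 0.2075\) bits; note the asymmetry under exchange of arguments:

>>> from pyitlib import discrete_random_variable as drv
>>> drv.divergence_kullbackleibler([1, 1, 2, 2], [1, 2, 2, 2])  # ≈ 0.2075 bits
>>> drv.divergence_kullbackleibler([1, 2, 2, 2], [1, 1, 2, 2])  # ≈ 0.1887 bits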

discrete_random_variable.divergence_kullbackleibler_pmf(P, Q=None, cartesian_product=False, base=2, require_valid_pmf=True, keep_dims=False)[source]

Returns the Kullback-Leibler divergence (see e.g. [CoTh06]) between arrays P and Q, each representing a discrete probability distribution.

Mathematical definition:

Denoting with \(P(x)\), \(Q(x)\) respectively the probability mass associated with observing an outcome \(x\) under distributions \(P\), \(Q\), the Kullback-Leibler divergence \(D_{\mathrm{KL}}(P \parallel Q)\) is defined as:

\[D_{\mathrm{KL}}(P \parallel Q) = -\sum_x {P(x) \log {\frac{Q(x)}{P(x)}}}.\]

Parameters:

P, Q : numpy array (or array-like object such as a list of immutables, as accepted by np.array())

cartesian_product==False and Q is not None: P and Q are arrays containing probability mass assignments, with P.shape==Q.shape. Probabilities in a distribution are indexed by the last axis in the respective arrays; multiple probability distributions in P and Q may be specified using preceding axes of the respective arrays (distributions are paired one-to-one between P and Q). When P.ndim==Q.ndim==1, returns a scalar. When P.ndim>1 and Q.ndim>1, returns an array of divergence values with dimensions P.shape[:-1]. Neither P nor Q may contain (floating point) NaN values.

cartesian_product==True and Q is not None: P and Q are arrays containing probability mass assignments, with P.shape[-1]==Q.shape[-1]. Probabilities in a distribution are indexed by the last axis in the respective arrays; multiple probability distributions in P and Q may be specified using preceding axes of the respective arrays (distributions are paired many-to-many between P and Q). When P.ndim==Q.ndim==1, returns a scalar. When P.ndim>1 and Q.ndim>1, returns an array of divergence values with dimensions np.append(P.shape[:-1],Q.shape[:-1]). Neither P nor Q may contain (floating point) NaN values.

Q is None: Equivalent to divergence_kullbackleibler_pmf(P, P, … ). Thus, a shorthand syntax for computing Kullback-Leibler divergence (in bits) between all pairs of probability distributions in P is divergence_kullbackleibler_pmf(P).

cartesian_product : boolean
Indicates whether probability distributions are paired one-to-one between P and Q (cartesian_product==False, the default value) or many-to-many between P and Q (cartesian_product==True).
base : float
The desired logarithmic base (default 2).
require_valid_pmf : boolean
When set to True (the default value), verifies that probability mass assignments in each distribution sum to 1. When set to False, no such test is performed, thus allowing incomplete probability distributions to be processed.
keep_dims : boolean
When set to True and cartesian_product==False an additional dimension of length one is appended to the returned array, facilitating any broadcast operations required by the user (defaults to False). Has no effect when cartesian_product==True.
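
Example (an illustrative sketch added for orientation; the approximate result is given as a comment):

>>> from pyitlib import discrete_random_variable as drv
>>> drv.divergence_kullbackleibler_pmf([0.5, 0.5], [0.25, 0.75])  # ≈ 0.2075 bits
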
discrete_random_variable.divergence_kullbackleibler_symmetrised(X, Y=None, cartesian_product=False, base=2, fill_value=-1, estimator='ML', Alphabet_X=None, Alphabet_Y=None, keep_dims=False)[source]

Returns the symmetrised Kullback-Leibler divergence [Lin91] between arrays X and Y, each containing discrete random variable realisations.

Mathematical definition:

Denoting with \(P_X\), \(P_Y\) respectively probability distributions with common domain, associated with discrete random variables \(X\), \(Y\), the symmetrised Kullback-Leibler divergence \(D_{\mathrm{SKL}}(P_X \parallel P_Y)\) is defined as:

\[D_{\mathrm{SKL}}(P_X \parallel P_Y) = D_{\mathrm{KL}}(P_X \parallel P_Y) + D_{\mathrm{KL}}(P_Y \parallel P_X)\]

where \(D_{\mathrm{KL}}(\cdot \parallel \cdot)\) denotes the Kullback-Leibler divergence.

Estimation:

Symmetrised Kullback-Leibler divergence is estimated based on frequency tables, using the following functions:

entropy_cross()

entropy()

See below for a list of available estimators. Note that although symmetrised Kullback-Leibler divergence is a non-negative quantity, depending on the chosen estimator the obtained estimate may be negative.

Parameters:

X,Y : numpy array (or array-like object such as a list of immutables, as accepted by np.array())

cartesian_product==False and Y is not None: X and Y are arrays containing discrete random variable realisations, with X.shape[:-1]==Y.shape[:-1]. Successive realisations of a random variable are indexed by the last axis in the respective arrays; multiple random variables in X and Y may be specified using preceding axes of the respective arrays (random variables are paired one-to-one between X and Y). When X.ndim==Y.ndim==1, returns a scalar. When X.ndim>1 and Y.ndim>1, returns an array of estimated divergence values with dimensions X.shape[:-1]. Neither X nor Y may contain (floating point) NaN values. Missing data may be specified using numpy masked arrays, as well as using standard numpy array/array-like objects; see below for details.

cartesian_product==True and Y is not None: X and Y are arrays containing discrete random variable realisations. Successive realisations of a random variable are indexed by the last axis in the respective arrays; multiple random variables in X and Y may be specified using preceding axes of the respective arrays (random variables are paired many-to-many between X and Y). When X.ndim==Y.ndim==1, returns a scalar. When X.ndim>1 or Y.ndim>1, returns an array of estimated divergence values with dimensions np.append(X.shape[:-1],Y.shape[:-1]). Neither X nor Y may contain (floating point) NaN values. Missing data may be specified using numpy masked arrays, as well as using standard numpy array/array-like objects; see below for details.

Y is None: Equivalent to divergence_kullbackleibler_symmetrised(X, X, … ). Thus, a shorthand syntax for computing symmetrised Kullback-Leibler divergence (in bits) between all pairs of random variables in X is divergence_kullbackleibler_symmetrised(X).

cartesian_product : boolean
Indicates whether random variables are paired one-to-one between X and Y (cartesian_product==False, the default value) or many-to-many between X and Y (cartesian_product==True).
base : float
The desired logarithmic base (default 2).
fill_value : object

It is possible to specify missing data using numpy masked arrays, pandas Series/DataFrames, as well as using standard numpy array/array-like objects with assigned placeholder values. When using numpy masked arrays, this function invokes np.ma.filled() internally, so that missing data are represented with the array’s object-internal placeholder value fill_value (this function’s fill_value parameter is ignored in such cases). When using pandas Series/DataFrames, an initial conversion to a numpy masked array is performed. When using standard numpy array/array-like objects, this function’s fill_value parameter is used to specify the placeholder value for missing data (defaults to -1).

Data equal to the placeholder value are subsequently ignored.

estimator : str or float

The desired estimator (see above for details on estimators). Possible values are:

‘ML’ (the default value) : Maximum likelihood estimator.

any floating point value : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with the pseudo-count for each outcome as specified).

‘PERKS’ : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with a pseudo-count of 1/L for each outcome, where L is the number of possible outcomes).

‘MINIMAX’ : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with a pseudo-count of sqrt(N)/L for each outcome, where N is the total number of realisations and L is the number of possible outcomes).

‘JAMES-STEIN’ : James-Stein estimator [HaSt09].

‘GOOD-TURING’ : Good-Turing estimator [GaSa95].

Alphabet_X, Alphabet_Y : numpy array (or array-like object such as a list of immutables, as accepted by np.array())
Respectively an array specifying the alphabet/alphabets of possible outcomes that random variable realisations in array X, Y may assume. Defaults to None, in which case the alphabet/alphabets of possible outcomes is/are implicitly based on the observed outcomes in array X, Y respectively, with no additional, unobserved outcomes. In combination with any estimator other than maximum likelihood, it may be useful to specify alphabets including unobserved outcomes. For such cases, successive possible outcomes of a random variable are indexed by the last axis in Alphabet_X, Alphabet_Y respectively; multiple alphabets may be specified using preceding axes, with the requirement X.shape[:-1]==Alphabet_X.shape[:-1] (analogously for Y). Alphabets of different sizes may be specified either using numpy masked arrays, or by padding with the chosen placeholder fill_value.
keep_dims : boolean
When set to True and cartesian_product==False an additional dimension of length one is appended to the returned array, facilitating any broadcast operations required by the user (defaults to False). Has no effect when cartesian_product==True.

Implementation notes:

Before estimation, outcomes are mapped to the set of non-negative integers internally, with the value -1 representing missing data. To avoid this internal conversion step, supply integer data and use the default fill value -1.
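
Example (an illustrative sketch added for orientation, assuming the default maximum likelihood estimator and base-2 logarithms). Per the definition, the result is the sum of the two Kullback-Leibler divergences, here approximately 0.2075 + 0.1887 bits:

>>> from pyitlib import discrete_random_variable as drv
>>> drv.divergence_kullbackleibler_symmetrised([1, 1, 2, 2], [1, 2, 2, 2])  # ≈ 0.3962 bits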

discrete_random_variable.divergence_kullbackleibler_symmetrised_pmf(P, Q=None, cartesian_product=False, base=2, require_valid_pmf=True, keep_dims=False)[source]

Returns the symmetrised Kullback-Leibler divergence [Lin91] between arrays P and Q, each representing a discrete probability distribution.

Mathematical definition:

Denoting with \(P\), \(Q\) probability distributions with common domain, the symmetrised Kullback-Leibler divergence \(D_{\mathrm{SKL}}(P \parallel Q)\) is defined as:

\[D_{\mathrm{SKL}}(P \parallel Q) = D_{\mathrm{KL}}(P \parallel Q) + D_{\mathrm{KL}}(Q \parallel P)\]

where \(D_{\mathrm{KL}}(\cdot \parallel \cdot)\) denotes the Kullback-Leibler divergence.

Parameters:

P, Q : numpy array (or array-like object such as a list of immutables, as accepted by np.array())

cartesian_product==False and Q is not None: P and Q are arrays containing probability mass assignments, with P.shape==Q.shape. Probabilities in a distribution are indexed by the last axis in the respective arrays; multiple probability distributions in P and Q may be specified using preceding axes of the respective arrays (distributions are paired one-to-one between P and Q). When P.ndim==Q.ndim==1, returns a scalar. When P.ndim>1 and Q.ndim>1, returns an array of divergence values with dimensions P.shape[:-1]. Neither P nor Q may contain (floating point) NaN values.

cartesian_product==True and Q is not None: P and Q are arrays containing probability mass assignments, with P.shape[-1]==Q.shape[-1]. Probabilities in a distribution are indexed by the last axis in the respective arrays; multiple probability distributions in P and Q may be specified using preceding axes of the respective arrays (distributions are paired many-to-many between P and Q). When P.ndim==Q.ndim==1, returns a scalar. When P.ndim>1 and Q.ndim>1, returns an array of divergence values with dimensions np.append(P.shape[:-1],Q.shape[:-1]). Neither P nor Q may contain (floating point) NaN values.

Q is None: Equivalent to divergence_kullbackleibler_symmetrised_pmf(P, P, … ). Thus, a shorthand syntax for computing symmetrised Kullback-Leibler divergence (in bits) between all pairs of probability distributions in P is divergence_kullbackleibler_symmetrised_pmf(P).

cartesian_product : boolean
Indicates whether probability distributions are paired one-to-one between P and Q (cartesian_product==False, the default value) or many-to-many between P and Q (cartesian_product==True).
base : float
The desired logarithmic base (default 2).
require_valid_pmf : boolean
When set to True (the default value), verifies that probability mass assignments in each distribution sum to 1. When set to False, no such test is performed, thus allowing incomplete probability distributions to be processed.
keep_dims : boolean
When set to True and cartesian_product==False an additional dimension of length one is appended to the returned array, facilitating any broadcast operations required by the user (defaults to False). Has no effect when cartesian_product==True.
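
Example (an illustrative sketch added for orientation; the approximate result is given as a comment):

>>> from pyitlib import discrete_random_variable as drv
>>> drv.divergence_kullbackleibler_symmetrised_pmf([0.5, 0.5], [0.25, 0.75])  # ≈ 0.3962 bits
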
discrete_random_variable.entropy(X, base=2, fill_value=-1, estimator='ML', Alphabet_X=None, keep_dims=False)[source]

Returns the estimated entropy (see e.g. [CoTh06]) for an array X containing realisations of a discrete random variable.

Mathematical definition:

Denoting with \(P(x)\) the probability of observing outcome \(x\) of a discrete random variable \(X\), the entropy \(H(X)\) is defined as:

\[H(X) = -\sum_x {P(x) \log {P(x)}}.\]

Estimation:

Entropy is estimated based on frequency tables. See below for a list of available estimators.

Parameters:

X : numpy array (or array-like object such as a list of immutables, as accepted by np.array())
An array containing discrete random variable realisations. Successive realisations of a random variable are indexed by the last axis in the array; multiple random variables may be specified using preceding axes. When X.ndim==1, returns a scalar. When X.ndim>1, returns an array of estimated entropies with dimensions X.shape[:-1]. X may not contain (floating point) NaN values. Missing data may be specified using numpy masked arrays, as well as using standard numpy array/array-like objects; see below for details.
base : float
The desired logarithmic base (default 2).
fill_value : object

It is possible to specify missing data using numpy masked arrays, pandas Series/DataFrames, as well as using standard numpy array/array-like objects with assigned placeholder values. When using numpy masked arrays, this function invokes np.ma.filled() internally, so that missing data are represented with the array’s object-internal placeholder value fill_value (this function’s fill_value parameter is ignored in such cases). When using pandas Series/DataFrames, an initial conversion to a numpy masked array is performed. When using standard numpy array/array-like objects, this function’s fill_value parameter is used to specify the placeholder value for missing data (defaults to -1).

Data equal to the placeholder value are subsequently ignored.

estimator : str or float

The desired estimator (see above for details on estimators). Possible values are:

‘ML’ (the default value) : Maximum likelihood estimator.

any floating point value : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with the pseudo-count for each outcome as specified).

‘PERKS’ : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with a pseudo-count of 1/L for each outcome, where L is the number of possible outcomes).

‘MINIMAX’ : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with a pseudo-count of sqrt(N)/L for each outcome, where N is the total number of realisations and L is the number of possible outcomes).

‘JAMES-STEIN’ : James-Stein estimator [HaSt09].

‘GOOD-TURING’ : Good-Turing estimator [GaSa95].

Alphabet_X : numpy array (or array-like object such as a list of immutables, as accepted by np.array())
An array specifying the alphabet/alphabets of possible outcomes that random variable realisations in array X may assume. Defaults to None, in which case the alphabet/alphabets of possible outcomes is/are implicitly based on the observed outcomes in array X, with no additional, unobserved outcomes. In combination with any estimator other than maximum likelihood, it may be useful to specify alphabets including unobserved outcomes. For such cases, successive possible outcomes of a random variable are indexed by the last axis in Alphabet_X; multiple alphabets may be specified using preceding axes, with the requirement X.shape[:-1]==Alphabet_X.shape[:-1]. Alphabets of different sizes may be specified either using numpy masked arrays, or by padding with the chosen placeholder fill_value.
keep_dims : boolean
When set to True, an additional dimension of length one is appended to the returned array, facilitating any broadcast operations required by the user (defaults to False).

Implementation notes:

Before estimation, outcomes are mapped to the set of non-negative integers internally, with the value -1 representing missing data. To avoid this internal conversion step, supply integer data and use the default fill value -1.
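
Example (an illustrative sketch added for orientation, assuming the default maximum likelihood estimator and base-2 logarithms; approximate values as comments). A 2-D input yields one entropy estimate per random variable, i.e. per row:

>>> import numpy as np
>>> from pyitlib import discrete_random_variable as drv
>>> drv.entropy(np.array(((1, 2, 1, 2), (1, 1, 1, 1))))  # ≈ [1.0, 0.0] bits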

discrete_random_variable.entropy_conditional(X, Y=None, cartesian_product=False, base=2, fill_value=-1, estimator='ML', Alphabet_X=None, Alphabet_Y=None, keep_dims=False)[source]

Returns the conditional entropy (see e.g. [CoTh06]) between arrays X and Y, each containing discrete random variable realisations.

Mathematical definition:

Given discrete random variables \(X\), \(Y\), the conditional entropy \(H(X|Y)\) is defined as:

\[H(X|Y) = H(X,Y) - H(Y)\]

where \(H(\cdot,\cdot)\) denotes the joint entropy and where \(H(\cdot)\) denotes the entropy.

Estimation:

Conditional entropy is estimated based on frequency tables, using the following functions:

entropy_joint()

entropy()

See below for a list of available estimators. Note that although conditional entropy is a non-negative quantity, depending on the chosen estimator the obtained estimate may be negative.

Parameters:

X,Y : numpy array (or array-like object such as a list of immutables, as accepted by np.array())

cartesian_product==False and Y is not None: X and Y are arrays containing discrete random variable realisations, with X.shape==Y.shape. Successive realisations of a random variable are indexed by the last axis in the respective arrays; multiple random variables in X and Y may be specified using preceding axes of the respective arrays (random variables are paired one-to-one between X and Y). When X.ndim==Y.ndim==1, returns a scalar. When X.ndim>1 and Y.ndim>1, returns an array of estimated conditional entropies with dimensions X.shape[:-1]. Neither X nor Y may contain (floating point) NaN values. Missing data may be specified using numpy masked arrays, as well as using standard numpy array/array-like objects; see below for details.

cartesian_product==True and Y is not None: X and Y are arrays containing discrete random variable realisations, with X.shape[-1]==Y.shape[-1]. Successive realisations of a random variable are indexed by the last axis in the respective arrays; multiple random variables in X and Y may be specified using preceding axes of the respective arrays (random variables are paired many-to-many between X and Y). When X.ndim==Y.ndim==1, returns a scalar. When X.ndim>1 or Y.ndim>1, returns an array of estimated conditional entropies with dimensions np.append(X.shape[:-1],Y.shape[:-1]). Neither X nor Y may contain (floating point) NaN values. Missing data may be specified using numpy masked arrays, as well as using standard numpy array/array-like objects; see below for details.

Y is None: Equivalent to entropy_conditional(X, X, … ). Thus, a shorthand syntax for computing conditional entropies (in bits) between all pairs of random variables in X is entropy_conditional(X).

cartesian_product : boolean
Indicates whether random variables are paired one-to-one between X and Y (cartesian_product==False, the default value) or many-to-many between X and Y (cartesian_product==True).
base : float
The desired logarithmic base (default 2).
fill_value : object

It is possible to specify missing data using numpy masked arrays, pandas Series/DataFrames, as well as using standard numpy array/array-like objects with assigned placeholder values. When using numpy masked arrays, this function invokes np.ma.filled() internally, so that missing data are represented with the array’s object-internal placeholder value fill_value (this function’s fill_value parameter is ignored in such cases). When using pandas Series/DataFrames, an initial conversion to a numpy masked array is performed. When using standard numpy array/array-like objects, this function’s fill_value parameter is used to specify the placeholder value for missing data (defaults to -1).

Data equal to the placeholder value are subsequently ignored.

estimator : str or float

The desired estimator (see above for details on estimators). Possible values are:

‘ML’ (the default value) : Maximum likelihood estimator.

any floating point value : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with the pseudo-count for each outcome as specified).

‘PERKS’ : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with a pseudo-count of 1/L for each outcome, where L is the number of possible outcomes).

‘MINIMAX’ : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with a pseudo-count of sqrt(N)/L for each outcome, where N is the total number of realisations and L is the number of possible outcomes).

‘JAMES-STEIN’ : James-Stein estimator [HaSt09].

‘GOOD-TURING’ : Good-Turing estimator [GaSa95].

Alphabet_X, Alphabet_Y : numpy array (or array-like object such as a list of immutables, as accepted by np.array())

Respectively an array specifying the alphabet/alphabets of possible outcomes that random variable realisations in array X, Y may assume. Defaults to None, in which case the alphabet/alphabets of possible outcomes is/are implicitly based on the observed outcomes in array X, Y respectively, with no additional, unobserved outcomes. In combination with any estimator other than maximum likelihood, it may be useful to specify alphabets including unobserved outcomes. For such cases, successive possible outcomes of a random variable are indexed by the last axis in Alphabet_X, Alphabet_Y respectively; multiple alphabets may be specified using preceding axes, with the requirement X.shape[:-1]==Alphabet_X.shape[:-1] (analogously for Y). Alphabets of different sizes may be specified either using numpy masked arrays, or by padding with the chosen placeholder fill_value.

NB: When specifying alphabets, an alphabet of possible joint outcomes is always implicit from the alphabets of possible (marginal) outcomes in Alphabet_X, Alphabet_Y. For example, specifying Alphabet_X=Alphabet_Y=np.array((1,2)) implies an alphabet of possible joint outcomes np.array(((1,1,2,2),(1,2,1,2))).

keep_dims : boolean
When set to True and cartesian_product==False an additional dimension of length one is appended to the returned array, facilitating any broadcast operations required by the user (defaults to False). Has no effect when cartesian_product==True.

Implementation notes:

Before estimation, outcomes are mapped to the set of non-negative integers internally, with the value -1 representing missing data. To avoid this internal conversion step, supply integer data and use the default fill value -1.
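
Example (an illustrative sketch added for orientation, assuming the default maximum likelihood estimator and base-2 logarithms; approximate values as comments). As noted above, conditional entropy is not symmetric:

>>> from pyitlib import discrete_random_variable as drv
>>> drv.entropy_conditional([1, 2, 2, 2], [1, 1, 2, 2])  # H(X|Y) = 0.5 bits
>>> drv.entropy_conditional([1, 1, 2, 2], [1, 2, 2, 2])  # ≈ 0.6887 bits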

discrete_random_variable.entropy_cross(X, Y=None, cartesian_product=False, base=2, fill_value=-1, estimator='ML', Alphabet_X=None, Alphabet_Y=None, keep_dims=False)[source]

Returns the cross entropy (see e.g. [Murp12]) between arrays X and Y, each containing discrete random variable realisations.

Mathematical definition:

Denoting with \(P_X(x)\), \(P_Y(x)\) respectively the probability of observing an outcome \(x\) with discrete random variables \(X\), \(Y\), the cross entropy \(H^\times(X,Y)\) is defined as:

\[H^\times(X,Y) = -\sum_x {P_X(x) \log {P_Y(x)}}.\]

Estimation:

Cross entropy is estimated based on frequency tables. See below for a list of available estimators.

Parameters:

X,Y : numpy array (or array-like object such as a list of immutables, as accepted by np.array())

cartesian_product==False and Y is not None: X and Y are arrays containing discrete random variable realisations, with X.shape[:-1]==Y.shape[:-1]. Successive realisations of a random variable are indexed by the last axis in the respective arrays; multiple random variables in X and Y may be specified using preceding axes of the respective arrays (random variables are paired one-to-one between X and Y). When X.ndim==Y.ndim==1, returns a scalar. When X.ndim>1 and Y.ndim>1, returns an array of estimated cross entropies with dimensions X.shape[:-1]. Neither X nor Y may contain (floating point) NaN values. Missing data may be specified using numpy masked arrays, as well as using standard numpy array/array-like objects; see below for details.

cartesian_product==True and Y is not None: X and Y are arrays containing discrete random variable realisations. Successive realisations of a random variable are indexed by the last axis in the respective arrays; multiple random variables in X and Y may be specified using preceding axes of the respective arrays (random variables are paired many-to-many between X and Y). When X.ndim==Y.ndim==1, returns a scalar. When X.ndim>1 or Y.ndim>1, returns an array of estimated cross entropies with dimensions np.append(X.shape[:-1],Y.shape[:-1]). Neither X nor Y may contain (floating point) NaN values. Missing data may be specified using numpy masked arrays, as well as using standard numpy array/array-like objects; see below for details.

Y is None: Equivalent to entropy_cross(X, X, … ). Thus, a shorthand syntax for computing cross entropies (in bits) between all pairs of random variables in X is entropy_cross(X).

cartesian_product : boolean
Indicates whether random variables are paired one-to-one between X and Y (cartesian_product==False, the default value) or many-to-many between X and Y (cartesian_product==True).
base : float
The desired logarithmic base (default 2).
fill_value : object

It is possible to specify missing data using numpy masked arrays, pandas Series/DataFrames, as well as using standard numpy array/array-like objects with assigned placeholder values. When using numpy masked arrays, this function invokes np.ma.filled() internally, so that missing data are represented with the array’s object-internal placeholder value fill_value (this function’s fill_value parameter is ignored in such cases). When using pandas Series/DataFrames, an initial conversion to a numpy masked array is performed. When using standard numpy array/array-like objects, this function’s fill_value parameter is used to specify the placeholder value for missing data (defaults to -1).

Data equal to the placeholder value are subsequently ignored.

estimator : str or float

The desired estimator (see above for details on estimators). Possible values are:

‘ML’ (the default value) : Maximum likelihood estimator.

any floating point value : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with the pseudo-count for each outcome as specified).

‘PERKS’ : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with a pseudo-count of 1/L for each outcome, where L is the number of possible outcomes).

‘MINIMAX’ : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with a pseudo-count of sqrt(N)/L for each outcome, where N is the total number of realisations and L is the number of possible outcomes).

‘JAMES-STEIN’ : James-Stein estimator [HaSt09].

‘GOOD-TURING’ : Good-Turing estimator [GaSa95].

Alphabet_X, Alphabet_Y : numpy array (or array-like object such as a list of immutables, as accepted by np.array())
Respectively an array specifying the alphabet/alphabets of possible outcomes that random variable realisations in array X, Y may assume. Defaults to None, in which case the alphabet/alphabets of possible outcomes is/are implicitly based on the observed outcomes in array X, Y respectively, with no additional, unobserved outcomes. In combination with any estimator other than maximum likelihood, it may be useful to specify alphabets including unobserved outcomes. For such cases, successive possible outcomes of a random variable are indexed by the last axis in Alphabet_X, Alphabet_Y respectively; multiple alphabets may be specified using preceding axes, with the requirement X.shape[:-1]==Alphabet_X.shape[:-1] (analogously for Y). Alphabets of different sizes may be specified either using numpy masked arrays, or by padding with the chosen placeholder fill_value.
keep_dims : boolean
When set to True and cartesian_product==False an additional dimension of length one is appended to the returned array, facilitating any broadcast operations required by the user (defaults to False). Has no effect when cartesian_product==True.

Implementation notes:

Before estimation, outcomes are mapped to the set of non-negative integers internally, with the value -1 representing missing data. To avoid this internal conversion step, supply integer data and use the default fill value -1.
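
Example (an illustrative sketch added for orientation, assuming the default maximum likelihood estimator and base-2 logarithms). With \(P_X = (0.5, 0.5)\) and \(P_Y = (0.25, 0.75)\), the definition gives \(-(0.5 \log_2 0.25 + 0.5 \log_2 0.75) \approx 1.2075\) bits; equivalently, the entropy of X (1 bit) plus the Kullback-Leibler divergence (≈ 0.2075 bits):

>>> from pyitlib import discrete_random_variable as drv
>>> drv.entropy_cross([1, 1, 2, 2], [1, 2, 2, 2])  # ≈ 1.2075 bits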

discrete_random_variable.entropy_cross_pmf(P, Q=None, cartesian_product=False, base=2, require_valid_pmf=True, keep_dims=False)[source]

Returns the cross entropy (see e.g. [Murp12]) between arrays P and Q, each representing a discrete probability distribution.

Mathematical definition:

Denoting with \(P(x)\), \(Q(x)\) respectively the probability mass associated with observing an outcome \(x\) under distributions \(P\), \(Q\), the cross entropy \(H^\times(P,Q)\) is defined as:

\[H^\times(P,Q) = -\sum_x {P(x) \log {Q(x)}}.\]

Parameters:

P, Q : numpy array (or array-like object such as a list of immutables, as accepted by np.array())

cartesian_product==False and Q is not None: P and Q are arrays containing probability mass assignments, with P.shape==Q.shape. Probabilities in a distribution are indexed by the last axis in the respective arrays; multiple probability distributions in P and Q may be specified using preceding axes of the respective arrays (distributions are paired one-to-one between P and Q). When P.ndim==Q.ndim==1, returns a scalar. When P.ndim>1 and Q.ndim>1, returns an array of cross entropies with dimensions P.shape[:-1]. Neither P nor Q may contain (floating point) NaN values.

cartesian_product==True and Q is not None: P and Q are arrays containing probability mass assignments, with P.shape[-1]==Q.shape[-1]. Probabilities in a distribution are indexed by the last axis in the respective arrays; multiple probability distributions in P and Q may be specified using preceding axes of the respective arrays (distributions are paired many-to-many between P and Q). When P.ndim==Q.ndim==1, returns a scalar. When P.ndim>1 and Q.ndim>1, returns an array of cross entropies with dimensions np.append(P.shape[:-1],Q.shape[:-1]). Neither P nor Q may contain (floating point) NaN values.

Q is None: Equivalent to entropy_cross_pmf(P, P, … ). Thus, a shorthand syntax for computing cross entropies (in bits) between all pairs of probability distributions in P is entropy_cross_pmf(P).

cartesian_product : boolean
Indicates whether probability distributions are paired one-to-one between P and Q (cartesian_product==False, the default value) or many-to-many between P and Q (cartesian_product==True).
base : float
The desired logarithmic base (default 2).
require_valid_pmf : boolean
When set to True (the default value), verifies that probability mass assignments in each distribution sum to 1. When set to False, no such test is performed, thus allowing incomplete probability distributions to be processed.
keep_dims : boolean
When set to True and cartesian_product==False, an additional dimension of length one is appended to the returned array, facilitating any broadcast operations required by the user (defaults to False). Has no effect when cartesian_product==True.
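
For illustration, a minimal doctest-style sketch (the values in the comments follow from the definition above and are approximate; they are not verified interpreter output):

>>> import numpy as np
>>> from pyitlib import discrete_random_variable as drv
>>> # -0.5*log2(0.25) - 0.5*log2(0.75): approx. 1.2075 bits
>>> drv.entropy_cross_pmf([0.5, 0.5], [0.25, 0.75])
>>> # With Q equal to P, cross entropy reduces to the entropy of P: approx. 1 bit
>>> drv.entropy_cross_pmf([0.5, 0.5], [0.5, 0.5])
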
discrete_random_variable.entropy_joint(X, base=2, fill_value=-1, estimator='ML', Alphabet_X=None, keep_dims=False)[source]

Returns the estimated joint entropy (see e.g. [CoTh06]) for an array X containing realisations of discrete random variables.

Mathematical definition:

Denoting with \(P(x_1, \ldots, x_n)\) the probability of jointly observing outcomes \((x_1, \ldots, x_n)\) of \(n\) discrete random variables \((X_1, \ldots, X_n)\), the joint entropy \(H(X_1, \ldots, X_n)\) is defined as:

\[H(X_1, \ldots, X_n) = -\sum_{x_1} \ldots \sum_{x_n} {P(x_1, \ldots, x_n ) \log {P(x_1, \ldots, x_n)}}.\]

Estimation:

Joint entropy is estimated based on frequency tables. See below for a list of available estimators.

Parameters:

X : numpy array (or array-like object such as a list of immutables, as accepted by np.array())
An array containing discrete random variable realisations. Successive realisations of a random variable are indexed by the last axis in the array; multiple random variables may be specified using preceding axes. When X.ndim==1, returns a scalar and is equivalent to entropy(). When X.ndim>1, returns a scalar based on jointly considering all random variables indexed in the array. X may not contain (floating point) NaN values. Missing data may be specified using numpy masked arrays, as well as using standard numpy array/array-like objects; see below for details.
base : float
The desired logarithmic base (default 2).
fill_value : object

It is possible to specify missing data using numpy masked arrays, pandas Series/DataFrames, as well as using standard numpy array/array-like objects with assigned placeholder values. When using numpy masked arrays, this function invokes np.ma.filled() internally, so that missing data are represented with the array’s object-internal placeholder value fill_value (this function’s fill_value parameter is ignored in such cases). When using pandas Series/DataFrames, an initial conversion to a numpy masked array is performed. When using standard numpy array/array-like objects, this function’s fill_value parameter is used to specify the placeholder value for missing data (defaults to -1).

Data equal to the placeholder value are subsequently ignored.

estimator : str or float

The desired estimator (see above for details on estimators). Possible values are:

‘ML’ (the default value) : Maximum likelihood estimator.

any floating point value : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome as specified).

PERKS : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome set to 1/L, where L is the number of possible outcomes).

MINIMAX : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome set to sqrt(N)/L, where N is the total number of realisations and where L is the number of possible outcomes).

JAMES-STEIN : James-Stein estimator [HaSt09].

GOOD-TURING : Good-Turing estimator [GaSa95].

Alphabet_X : numpy array (or array-like object such as a list of immutables, as accepted by np.array())

An array specifying the alphabet/alphabets of possible outcomes that random variable realisations in array X may assume. Defaults to None, in which case the alphabet/alphabets of possible outcomes is/are implicitly based on the observed outcomes in array X, with no additional, unobserved outcomes. In combination with any estimator other than maximum likelihood, it may be useful to specify alphabets including unobserved outcomes. For such cases, successive possible outcomes of a random variable are indexed by the last axis in Alphabet_X; multiple alphabets may be specified using preceding axes, with the requirement X.shape[:-1]==Alphabet_X.shape[:-1]. Alphabets of different sizes may be specified either using numpy masked arrays, or by padding with the chosen placeholder fill_value.

NB: When specifying multiple alphabets, an alphabet of possible joint outcomes is always implicit from the alphabets of possible (marginal) outcomes in Alphabet_X. For example, specifying Alphabet_X=np.array(((1,2),(1,2))) implies an alphabet of possible joint outcomes np.array(((1,1,2,2),(1,2,1,2))).

keep_dims : boolean
When set to True, an additional dimension of length one is appended to the returned array, facilitating any broadcast operations required by the user (defaults to False).

Implementation notes:

Before estimation, outcomes are mapped to the set of non-negative integers internally, with the value -1 representing missing data. To avoid this internal conversion step, supply integer data and use the default fill value -1.
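
For example, a minimal sketch under maximum likelihood estimation (the value in the comment follows from the definition above and is approximate):

>>> import numpy as np
>>> from pyitlib import discrete_random_variable as drv
>>> # Two binary variables with four distinct, equally frequent joint outcomes:
>>> # joint entropy approx. 2 bits
>>> drv.entropy_joint(np.array([[1, 1, 2, 2], [1, 2, 1, 2]]))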

discrete_random_variable.entropy_pmf(P, base=2, require_valid_pmf=True, keep_dims=False)[source]

Returns the entropy (see e.g. [CoTh06]) of an array P representing a discrete probability distribution.

Mathematical definition:

Denoting with \(P(x)\) the probability mass associated with observing an outcome \(x\) under distribution \(P\), the entropy \(H(P)\) is defined as:

\[H(P) = -\sum_x {P(x) \log {P(x)}}.\]

Parameters:

P : numpy array (or array-like object such as a list of immutables, as accepted by np.array())
An array containing probability mass assignments. Probabilities in a distribution are indexed by the last axis in the array; multiple probability distributions may be specified using preceding axes. When P.ndim==1, returns a scalar. When P.ndim>1, returns an array of entropies with dimensions P.shape[:-1]. P may not contain (floating point) NaN values.
base : float
The desired logarithmic base (default 2).
require_valid_pmf : boolean
When set to True (the default value), verifies that probability mass assignments in each distribution sum to 1. When set to False, no such test is performed, thus allowing incomplete probability distributions to be processed.
keep_dims : boolean
When set to True, an additional dimension of length one is appended to the returned array, facilitating any broadcast operations required by the user (defaults to False).
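
A sketch of the multi-distribution case (comment values are approximate, derived from the definition above):

>>> from pyitlib import discrete_random_variable as drv
>>> # Two distributions stacked along the first axis; returns both entropies,
>>> # approx. 1.0 and 0.0 bits respectively
>>> drv.entropy_pmf([[0.5, 0.5], [1.0, 0.0]])
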
discrete_random_variable.entropy_residual(X, base=2, fill_value=-1, estimator='ML', Alphabet_X=None, keep_dims=False)[source]

Returns the estimated residual entropy [JaEC11] (also known as erasure entropy [VeWe06]) for an array X containing realisations of discrete random variables.

Mathematical definition:

Given discrete random variables \(X_1, \ldots, X_n\), the residual entropy \(R(X_1, \ldots, X_n)\) is defined as:

\[R(X_1, \ldots, X_n) = H(X_1, \ldots, X_n) - B(X_1, \ldots, X_n)\]

where \(H(\cdot, \ldots, \cdot)\) denotes the joint entropy and where \(B(\cdot, \ldots, \cdot)\) denotes the binding information.

Estimation:

Residual entropy is estimated based on frequency tables, using the following functions:

entropy_joint()

entropy()

See below for a list of available estimators. Note that although residual entropy is a non-negative quantity, depending on the chosen estimator the obtained estimate may be negative.

Parameters:

X : numpy array (or array-like object such as a list of immutables, as accepted by np.array())
An array containing discrete random variable realisations. Successive realisations of a random variable are indexed by the last axis in the array; multiple random variables may be specified using preceding axes. When X.ndim==1, returns a scalar and is equivalent to entropy(). When X.ndim>1, returns a scalar based on jointly considering all random variables indexed in the array. X may not contain (floating point) NaN values. Missing data may be specified using numpy masked arrays, as well as using standard numpy array/array-like objects; see below for details.
base : float
The desired logarithmic base (default 2).
fill_value : object

It is possible to specify missing data using numpy masked arrays, pandas Series/DataFrames, as well as using standard numpy array/array-like objects with assigned placeholder values. When using numpy masked arrays, this function invokes np.ma.filled() internally, so that missing data are represented with the array’s object-internal placeholder value fill_value (this function’s fill_value parameter is ignored in such cases). When using pandas Series/DataFrames, an initial conversion to a numpy masked array is performed. When using standard numpy array/array-like objects, this function’s fill_value parameter is used to specify the placeholder value for missing data (defaults to -1).

Data equal to the placeholder value are subsequently ignored.

estimator : str or float

The desired estimator (see above for details on estimators). Possible values are:

‘ML’ (the default value) : Maximum likelihood estimator.

any floating point value : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome as specified).

PERKS : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome set to 1/L, where L is the number of possible outcomes).

MINIMAX : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome set to sqrt(N)/L, where N is the total number of realisations and where L is the number of possible outcomes).

JAMES-STEIN : James-Stein estimator [HaSt09].

GOOD-TURING : Good-Turing estimator [GaSa95].

Alphabet_X : numpy array (or array-like object such as a list of immutables, as accepted by np.array())

An array specifying the alphabet/alphabets of possible outcomes that random variable realisations in array X may assume. Defaults to None, in which case the alphabet/alphabets of possible outcomes is/are implicitly based on the observed outcomes in array X, with no additional, unobserved outcomes. In combination with any estimator other than maximum likelihood, it may be useful to specify alphabets including unobserved outcomes. For such cases, successive possible outcomes of a random variable are indexed by the last axis in Alphabet_X; multiple alphabets may be specified using preceding axes, with the requirement X.shape[:-1]==Alphabet_X.shape[:-1]. Alphabets of different sizes may be specified either using numpy masked arrays, or by padding with the chosen placeholder fill_value.

NB: When specifying multiple alphabets, an alphabet of possible joint outcomes is always implicit from the alphabets of possible (marginal) outcomes in Alphabet_X. For example, specifying Alphabet_X=np.array(((1,2),(1,2))) implies an alphabet of possible joint outcomes np.array(((1,1,2,2),(1,2,1,2))).

keep_dims : boolean
When set to True, an additional dimension of length one is appended to the returned array, facilitating any broadcast operations required by the user (defaults to False).

Implementation notes:

Before estimation, outcomes are mapped to the set of non-negative integers internally, with the value -1 representing missing data. To avoid this internal conversion step, supply integer data and use the default fill value -1.
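
For intuition, a minimal sketch under maximum likelihood estimation (comment values follow from the definition above and are approximate):

>>> from pyitlib import discrete_random_variable as drv
>>> # Independent variables: no binding information, so residual entropy equals
>>> # the joint entropy, approx. 2 bits
>>> drv.entropy_residual([[1, 1, 2, 2], [1, 2, 1, 2]])
>>> # Identical variables: each determines the other, approx. 0 bits
>>> drv.entropy_residual([[1, 2, 1, 2], [1, 2, 1, 2]])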

discrete_random_variable.information_binding(X, base=2, fill_value=-1, estimator='ML', Alphabet_X=None, keep_dims=False)[source]

Returns the estimated binding information [AbPl12] (also known as dual total correlation [Han78]) for an array X containing realisations of discrete random variables.

Mathematical definition:

Given discrete random variables \(X_1, \ldots, X_n\), the binding information \(B(X_1, \ldots, X_n)\) is defined as:

\[B(X_1, \ldots, X_n) = H(X_1, \ldots, X_n) - \sum_{i=1}^{n} H(X_i | X_1, \ldots X_{i-1}, X_{i+1}, \ldots, X_n)\]

where \(H(\cdot)\) denotes the entropy and where \(H(\cdot | \cdot)\) denotes the conditional entropy.

Estimation:

Binding information is estimated based on frequency tables, using the following functions:

entropy_joint()

entropy()

See below for a list of available estimators. Note that although binding information is a non-negative quantity, depending on the chosen estimator the obtained estimate may be negative.

Parameters:

X : numpy array (or array-like object such as a list of immutables, as accepted by np.array())
An array containing discrete random variable realisations. Successive realisations of a random variable are indexed by the last axis in the array; multiple random variables may be specified using preceding axes. When X.ndim==1, returns the scalar 0. When X.ndim>1, returns a scalar based on jointly considering all random variables indexed in the array. X may not contain (floating point) NaN values. Missing data may be specified using numpy masked arrays, as well as using standard numpy array/array-like objects; see below for details.
base : float
The desired logarithmic base (default 2).
fill_value : object

It is possible to specify missing data using numpy masked arrays, pandas Series/DataFrames, as well as using standard numpy array/array-like objects with assigned placeholder values. When using numpy masked arrays, this function invokes np.ma.filled() internally, so that missing data are represented with the array’s object-internal placeholder value fill_value (this function’s fill_value parameter is ignored in such cases). When using pandas Series/DataFrames, an initial conversion to a numpy masked array is performed. When using standard numpy array/array-like objects, this function’s fill_value parameter is used to specify the placeholder value for missing data (defaults to -1).

Data equal to the placeholder value are subsequently ignored.

estimator : str or float

The desired estimator (see above for details on estimators). Possible values are:

‘ML’ (the default value) : Maximum likelihood estimator.

any floating point value : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome as specified).

PERKS : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome set to 1/L, where L is the number of possible outcomes).

MINIMAX : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome set to sqrt(N)/L, where N is the total number of realisations and where L is the number of possible outcomes).

JAMES-STEIN : James-Stein estimator [HaSt09].

GOOD-TURING : Good-Turing estimator [GaSa95].

Alphabet_X : numpy array (or array-like object such as a list of immutables, as accepted by np.array())

An array specifying the alphabet/alphabets of possible outcomes that random variable realisations in array X may assume. Defaults to None, in which case the alphabet/alphabets of possible outcomes is/are implicitly based on the observed outcomes in array X, with no additional, unobserved outcomes. In combination with any estimator other than maximum likelihood, it may be useful to specify alphabets including unobserved outcomes. For such cases, successive possible outcomes of a random variable are indexed by the last axis in Alphabet_X; multiple alphabets may be specified using preceding axes, with the requirement X.shape[:-1]==Alphabet_X.shape[:-1]. Alphabets of different sizes may be specified either using numpy masked arrays, or by padding with the chosen placeholder fill_value.

NB: When specifying multiple alphabets, an alphabet of possible joint outcomes is always implicit from the alphabets of possible (marginal) outcomes in Alphabet_X. For example, specifying Alphabet_X=np.array(((1,2),(1,2))) implies an alphabet of possible joint outcomes np.array(((1,1,2,2),(1,2,1,2))).

keep_dims : boolean
When set to True, an additional dimension of length one is appended to the returned array, facilitating any broadcast operations required by the user (defaults to False).

Implementation notes:

Before estimation, outcomes are mapped to the set of non-negative integers internally, with the value -1 representing missing data. To avoid this internal conversion step, supply integer data and use the default fill value -1.
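
A minimal sketch under maximum likelihood estimation (comment values follow from the definition above and are approximate):

>>> from pyitlib import discrete_random_variable as drv
>>> # Identical variables: H(X1,X2) approx. 1 bit and both conditional entropies
>>> # vanish, so B approx. 1 bit
>>> drv.information_binding([[1, 2, 1, 2], [1, 2, 1, 2]])
>>> # Independent variables: B approx. 0 bits
>>> drv.information_binding([[1, 1, 2, 2], [1, 2, 1, 2]])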

discrete_random_variable.information_co(X, base=2, fill_value=-1, estimator='ML', Alphabet_X=None, keep_dims=False)[source]

Returns the estimated co-information [Bell03] for an array X containing realisations of discrete random variables.

Mathematical definition:

Given discrete random variables \(X_1, \ldots, X_n\), the co-information \(I(X_1, \ldots, X_n)\) is defined as:

\[I(X_1, \ldots, X_n) = - \sum_{T \subseteq \{1,\ldots, n\}} (-1)^{|T|} H(X_i : i \in T)\]

where \(H(X_i : i \in T)\) denotes the joint entropy of the subset of random variables specified by \(T\). Thus, co-information is an alternating sum of joint entropies, with the sets of random variables used to compute the joint entropy in each term selected from the power set of available random variables.

Note that co-information is equal in magnitude to the interaction information \(\mathrm{Int}(X_1, \ldots, X_n)\), with equality for the case where \(n\) is even,

\[I(X_1, \ldots, X_n) = (-1)^n \mathrm{Int}(X_1, \ldots, X_n).\]

Estimation:

Co-information is estimated based on frequency tables, using the following functions:

entropy_joint()

See below for a list of available estimators. Note that, unlike mutual information, co-information may be negative for three or more random variables (see the sketch below); the obtained estimate additionally depends on the chosen estimator.

Parameters:

X : numpy array (or array-like object such as a list of immutables, as accepted by np.array())
An array containing discrete random variable realisations. Successive realisations of a random variable are indexed by the last axis in the array; multiple random variables may be specified using preceding axes. When X.ndim==1, returns a scalar and is equivalent to entropy(). When X.ndim>1, returns a scalar based on jointly considering all random variables indexed in the array. X may not contain (floating point) NaN values. Missing data may be specified using numpy masked arrays, as well as using standard numpy array/array-like objects; see below for details.
base : float
The desired logarithmic base (default 2).
fill_value : object

It is possible to specify missing data using numpy masked arrays, pandas Series/DataFrames, as well as using standard numpy array/array-like objects with assigned placeholder values. When using numpy masked arrays, this function invokes np.ma.filled() internally, so that missing data are represented with the array’s object-internal placeholder value fill_value (this function’s fill_value parameter is ignored in such cases). When using pandas Series/DataFrames, an initial conversion to a numpy masked array is performed. When using standard numpy array/array-like objects, this function’s fill_value parameter is used to specify the placeholder value for missing data (defaults to -1).

Data equal to the placeholder value are subsequently ignored.

estimator : str or float

The desired estimator (see above for details on estimators). Possible values are:

‘ML’ (the default value) : Maximum likelihood estimator.

any floating point value : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome as specified).

PERKS : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome set to 1/L, where L is the number of possible outcomes).

MINIMAX : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome set to sqrt(N)/L, where N is the total number of realisations and where L is the number of possible outcomes).

JAMES-STEIN : James-Stein estimator [HaSt09].

GOOD-TURING : Good-Turing estimator [GaSa95].

Alphabet_X : numpy array (or array-like object such as a list of immutables, as accepted by np.array())

An array specifying the alphabet/alphabets of possible outcomes that random variable realisations in array X may assume. Defaults to None, in which case the alphabet/alphabets of possible outcomes is/are implicitly based on the observed outcomes in array X, with no additional, unobserved outcomes. In combination with any estimator other than maximum likelihood, it may be useful to specify alphabets including unobserved outcomes. For such cases, successive possible outcomes of a random variable are indexed by the last axis in Alphabet_X; multiple alphabets may be specified using preceding axes, with the requirement X.shape[:-1]==Alphabet_X.shape[:-1]. Alphabets of different sizes may be specified either using numpy masked arrays, or by padding with the chosen placeholder fill_value.

NB: When specifying multiple alphabets, an alphabet of possible joint outcomes is always implicit from the alphabets of possible (marginal) outcomes in Alphabet_X. For example, specifying Alphabet_X=np.array(((1,2),(1,2))) implies an alphabet of possible joint outcomes np.array(((1,1,2,2),(1,2,1,2))).

keep_dims : boolean
When set to True, an additional dimension of length one is appended to the returned array, facilitating any broadcast operations required by the user (defaults to False).

Implementation notes:

Before estimation, outcomes are mapped to the set of non-negative integers internally, with the value -1 representing missing data. To avoid this internal conversion step, supply integer data and use the default fill value -1.
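
A minimal sketch under maximum likelihood estimation (comment values follow from the definition above and are approximate):

>>> import numpy as np
>>> from pyitlib import discrete_random_variable as drv
>>> # Two variables: co-information reduces to mutual information, approx. 1 bit
>>> drv.information_co([[1, 2, 1, 2], [1, 2, 1, 2]])
>>> # XOR triple (third row is the XOR of the first two): approx. -1 bit
>>> drv.information_co(np.array([[0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0]]))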

discrete_random_variable.information_enigmatic(X, base=2, fill_value=-1, estimator='ML', Alphabet_X=None, keep_dims=False)[source]

Returns the estimated enigmatic information [JaEC11] for an array X containing realisations of discrete random variables.

Mathematical definition:

Given discrete random variables \(X_1, \ldots, X_n\), the enigmatic information \(Q(X_1, \ldots, X_n)\) is defined as:

\[Q(X_1, \ldots, X_n) = T(X_1, \ldots, X_n) - B(X_1, \ldots, X_n)\]

where \(T(\cdot, \ldots, \cdot)\) denotes the multi-information and where \(B(\cdot, \ldots, \cdot)\) denotes the binding information.

Estimation:

Enigmatic information is estimated based on frequency tables, using the following functions:

entropy_joint()

entropy()

See below for a list of available estimators. Note that enigmatic information, as a difference between multi-information and binding information, may be negative; the obtained estimate additionally depends on the chosen estimator.

Parameters:

X : numpy array (or array-like object such as a list of immutables, as accepted by np.array())
An array containing discrete random variable realisations. Successive realisations of a random variable are indexed by the last axis in the array; multiple random variables may be specified using preceding axes. When X.ndim==1, returns the scalar 0. When X.ndim>1, returns a scalar based on jointly considering all random variables indexed in the array. X may not contain (floating point) NaN values. Missing data may be specified using numpy masked arrays, as well as using standard numpy array/array-like objects; see below for details.
base : float
The desired logarithmic base (default 2).
fill_value : object

It is possible to specify missing data using numpy masked arrays, pandas Series/DataFrames, as well as using standard numpy array/array-like objects with assigned placeholder values. When using numpy masked arrays, this function invokes np.ma.filled() internally, so that missing data are represented with the array’s object-internal placeholder value fill_value (this function’s fill_value parameter is ignored in such cases). When using pandas Series/DataFrames, an initial conversion to a numpy masked array is performed. When using standard numpy array/array-like objects, this function’s fill_value parameter is used to specify the placeholder value for missing data (defaults to -1).

Data equal to the placeholder value are subsequently ignored.

estimator : str or float

The desired estimator (see above for details on estimators). Possible values are:

‘ML’ (the default value) : Maximum likelihood estimator.

any floating point value : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome as specified).

PERKS : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome set to 1/L, where L is the number of possible outcomes).

MINIMAX : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome set to sqrt(N)/L, where N is the total number of realisations and where L is the number of possible outcomes).

JAMES-STEIN : James-Stein estimator [HaSt09].

GOOD-TURING : Good-Turing estimator [GaSa95].

Alphabet_X : numpy array (or array-like object such as a list of immutables, as accepted by np.array())

An array specifying the alphabet/alphabets of possible outcomes that random variable realisations in array X may assume. Defaults to None, in which case the alphabet/alphabets of possible outcomes is/are implicitly based on the observed outcomes in array X, with no additional, unobserved outcomes. In combination with any estimator other than maximum likelihood, it may be useful to specify alphabets including unobserved outcomes. For such cases, successive possible outcomes of a random variable are indexed by the last axis in Alphabet_X; multiple alphabets may be specified using preceding axes, with the requirement X.shape[:-1]==Alphabet_X.shape[:-1]. Alphabets of different sizes may be specified either using numpy masked arrays, or by padding with the chosen placeholder fill_value.

NB: When specifying multiple alphabets, an alphabet of possible joint outcomes is always implicit from the alphabets of possible (marginal) outcomes in Alphabet_X. For example, specifying Alphabet_X=np.array(((1,2),(1,2))) implies an alphabet of possible joint outcomes np.array(((1,1,2,2),(1,2,1,2))).

keep_dims : boolean
When set to True, an additional dimension of length one is appended to the returned array, facilitating any broadcast operations required by the user (defaults to False).

Implementation notes:

Before estimation, outcomes are mapped to the set of non-negative integers internally, with the value -1 representing missing data. To avoid this internal conversion step, supply integer data and use the default fill value -1.
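
A minimal sketch (the value in the comment follows from the definitions of multi-information and binding information above and is approximate):

>>> from pyitlib import discrete_random_variable as drv
>>> # Identical pair: multi-information and binding information are both
>>> # approx. 1 bit, so Q approx. 0 bits
>>> drv.information_enigmatic([[1, 2, 1, 2], [1, 2, 1, 2]])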

discrete_random_variable.information_exogenous_local(X, base=2, fill_value=-1, estimator='ML', Alphabet_X=None, keep_dims=False)[source]

Returns the estimated exogenous local information [JaEC11] for an array X containing realisations of discrete random variables.

Mathematical definition:

Given discrete random variables \(X_1, \ldots, X_n\), the exogenous local information \(W(X_1, \ldots, X_n)\) is defined as:

\[W(X_1, \ldots, X_n) = T(X_1, \ldots, X_n) + B(X_1, \ldots, X_n)\]

where \(T(\cdot, \ldots, \cdot)\) denotes the multi-information and where \(B(\cdot, \ldots, \cdot)\) denotes the binding information.

Estimation:

Exogenous local information is estimated based on frequency tables, using the following functions:

entropy_joint()

entropy()

See below for a list of available estimators. Note that although exogenous local information is a non-negative quantity, depending on the chosen estimator the obtained estimate may be negative.

Parameters:

X : numpy array (or array-like object such as a list of immutables, as accepted by np.array())
An array containing discrete random variable realisations. Successive realisations of a random variable are indexed by the last axis in the array; multiple random variables may be specified using preceding axes. When X.ndim==1, returns the scalar 0. When X.ndim>1, returns a scalar based on jointly considering all random variables indexed in the array. X may not contain (floating point) NaN values. Missing data may be specified using numpy masked arrays, as well as using standard numpy array/array-like objects; see below for details.
base : float
The desired logarithmic base (default 2).
fill_value : object

It is possible to specify missing data using numpy masked arrays, pandas Series/DataFrames, as well as using standard numpy array/array-like objects with assigned placeholder values. When using numpy masked arrays, this function invokes np.ma.filled() internally, so that missing data are represented with the array’s object-internal placeholder value fill_value (this function’s fill_value parameter is ignored in such cases). When using pandas Series/DataFrames, an initial conversion to a numpy masked array is performed. When using standard numpy array/array-like objects, this function’s fill_value parameter is used to specify the placeholder value for missing data (defaults to -1).

Data equal to the placeholder value are subsequently ignored.

estimator : str or float

The desired estimator (see above for details on estimators). Possible values are:

‘ML’ (the default value) : Maximum likelihood estimator.

any floating point value : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome as specified).

PERKS : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome set to 1/L, where L is the number of possible outcomes).

MINIMAX : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome set to sqrt(N)/L, where N is the total number of realisations and where L is the number of possible outcomes).

JAMES-STEIN : James-Stein estimator [HaSt09].

GOOD-TURING : Good-Turing estimator [GaSa95].

Alphabet_X : numpy array (or array-like object such as a list of immutables, as accepted by np.array())

An array specifying the alphabet/alphabets of possible outcomes that random variable realisations in array X may assume. Defaults to None, in which case the alphabet/alphabets of possible outcomes is/are implicitly based on the observed outcomes in array X, with no additional, unobserved outcomes. In combination with any estimator other than maximum likelihood, it may be useful to specify alphabets including unobserved outcomes. For such cases, successive possible outcomes of a random variable are indexed by the last axis in Alphabet_X; multiple alphabets may be specified using preceding axes, with the requirement X.shape[:-1]==Alphabet_X.shape[:-1]. Alphabets of different sizes may be specified either using numpy masked arrays, or by padding with the chosen placeholder fill_value.

NB: When specifying multiple alphabets, an alphabet of possible joint outcomes is always implicit from the alphabets of possible (marginal) outcomes in Alphabet_X. For example, specifying Alphabet_X=np.array(((1,2),(1,2))) implies an alphabet of possible joint outcomes np.array(((1,1,2,2),(1,2,1,2))).

keep_dims : boolean
When set to True, an additional dimension of length one is appended to the returned array, facilitating any broadcast operations required by the user (defaults to False).

Implementation notes:

Before estimation, outcomes are mapped to the set of non-negative integers internally, with the value -1 representing missing data. To avoid this internal conversion step, supply integer data and use the default fill value -1.
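
A minimal sketch (the value in the comment follows from the definitions of multi-information and binding information above and is approximate):

>>> from pyitlib import discrete_random_variable as drv
>>> # Identical pair: T approx. 1 bit and B approx. 1 bit, so W approx. 2 bits
>>> drv.information_exogenous_local([[1, 2, 1, 2], [1, 2, 1, 2]])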

discrete_random_variable.information_interaction(X, base=2, fill_value=-1, estimator='ML', Alphabet_X=None, keep_dims=False)[source]

Returns the estimated interaction information [JaBr03] for an array X containing realisations of discrete random variables.

Mathematical definition:

Given discrete random variables \(X_1, \ldots, X_n\), the interaction information \(\mathrm{Int}(X_1, \ldots, X_n)\) is defined as:

\[\mathrm{Int}(X_1, \ldots, X_n) = - \sum_{T \subseteq \{1,\ldots, n\}} (-1)^{n-|T|} H(X_i : i \in T)\]

where \(H(X_i : i \in T)\) denotes the joint entropy of the subset of random variables specified by \(T\). Thus, interaction information is an alternating sum of joint entropies, with the sets of random variables used to compute the joint entropy in each term selected from the power set of available random variables.

Note that interaction information is equal in magnitude to the co-information \(I(X_1, \ldots, X_n)\), with equality for the case where \(n\) is even,

\[\mathrm{Int}(X_1, \ldots, X_n) = (-1)^n I(X_1, \ldots, X_n).\]

Estimation:

Interaction information is estimated based on frequency tables, using the following functions:

entropy_joint()

See below for a list of available estimators. Note that interaction information may be negative for three or more random variables (see the sketch below); the obtained estimate additionally depends on the chosen estimator.

Parameters:

X : numpy array (or array-like object such as a list of immutables, as accepted by np.array())
An array containing discrete random variable realisations. Successive realisations of a random variable are indexed by the last axis in the array; multiple random variables may be specified using preceding axes. When X.ndim==1, returns a scalar and is equivalent to -1*entropy(). When X.ndim>1, returns a scalar based on jointly considering all random variables indexed in the array. X may not contain (floating point) NaN values. Missing data may be specified using numpy masked arrays, as well as using standard numpy array/array-like objects; see below for details.
base : float
The desired logarithmic base (default 2).
fill_value : object

It is possible to specify missing data using numpy masked arrays, pandas Series/DataFrames, as well as using standard numpy array/array-like objects with assigned placeholder values. When using numpy masked arrays, this function invokes np.ma.filled() internally, so that missing data are represented with the array’s object-internal placeholder value fill_value (this function’s fill_value parameter is ignored in such cases). When using pandas Series/DataFrames, an initial conversion to a numpy masked array is performed. When using standard numpy array/array-like objects, this function’s fill_value parameter is used to specify the placeholder value for missing data (defaults to -1).

Data equal to the placeholder value are subsequently ignored.

estimator : str or float

The desired estimator (see above for details on estimators). Possible values are:

‘ML’ (the default value) : Maximum likelihood estimator.

any floating point value : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome as specified).

PERKS : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome set to 1/L, where L is the number of possible outcomes).

MINIMAX : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome set to sqrt(N)/L, where N is the total number of realisations and where L is the number of possible outcomes).

JAMES-STEIN : James-Stein estimator [HaSt09].

GOOD-TURING : Good-Turing estimator [GaSa95].

Alphabet_X : numpy array (or array-like object such as a list of immutables, as accepted by np.array())

An array specifying the alphabet/alphabets of possible outcomes that random variable realisations in array X may assume. Defaults to None, in which case the alphabet/alphabets of possible outcomes is/are implicitly based on the observed outcomes in array X, with no additional, unobserved outcomes. In combination with any estimator other than maximum likelihood, it may be useful to specify alphabets including unobserved outcomes. For such cases, successive possible outcomes of a random variable are indexed by the last axis in Alphabet_X; multiple alphabets may be specified using preceding axes, with the requirement X.shape[:-1]==Alphabet_X.shape[:-1]. Alphabets of different sizes may be specified either using numpy masked arrays, or by padding with the chosen placeholder fill_value.

NB: When specifying multiple alphabets, an alphabet of possible joint outcomes is always implicit from the alphabets of possible (marginal) outcomes in Alphabet_X. For example, specifying Alphabet_X=np.array(((1,2),(1,2))) implies an alphabet of possible joint outcomes np.array(((1,1,2,2),(1,2,1,2))).

keep_dims : boolean
When set to True, an additional dimension of length one is appended to the returned array, facilitating any broadcast operations required by the user (defaults to False).

Implementation notes:

Before estimation, outcomes are mapped to the set of non-negative integers internally, with the value -1 representing missing data. To avoid this internal conversion step, supply integer data and use the default fill value -1.
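
A minimal sketch under maximum likelihood estimation (comment values follow from the definition above and are approximate):

>>> import numpy as np
>>> from pyitlib import discrete_random_variable as drv
>>> # XOR triple: Int approx. +1 bit under the definition above
>>> drv.information_interaction(np.array([[0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0]]))
>>> # Identical triple (fully redundant): Int approx. -1 bit
>>> drv.information_interaction([[1, 2, 1, 2]] * 3)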

discrete_random_variable.information_lautum(X, Y=None, cartesian_product=False, base=2, fill_value=-1, estimator='ML', Alphabet_X=None, Alphabet_Y=None, keep_dims=False)[source]

Returns the lautum information [PaVe08] between arrays X and Y, each containing discrete random variable realisations.

Mathematical definition:

Denoting with \(P_X(x)\), \(P_Y(x)\) respectively the probability of observing an outcome \(x\) with discrete random variables \(X\), \(Y\), and denoting with \(P_{XY}(x,y)\) the probability of jointly observing outcomes \(x\), \(y\) respectively with \(X\), \(Y\), the lautum information \(L(X;Y)\) is defined as:

\[\begin{split}\begin{eqnarray} L(X;Y) &=& -\sum_x \sum_y {P_X(x) P_Y(y) \log {\frac{P_{XY}(x,y)}{P_X(x) P_Y(y)}}} \\ &=& D_{\mathrm{KL}}(P_X P_Y \parallel P_{XY}) \end{eqnarray}\end{split}\]

where \(D_{\mathrm{KL}}(\cdot \parallel \cdot)\) denotes the Kullback-Leibler divergence. Note that lautum is mutual spelt backwards; denoting with \(I(\cdot;\cdot)\) the mutual information, it may be shown (see e.g. [CoTh06]) that

\[\begin{eqnarray} I(X;Y) &=& D_{\mathrm{KL}}(P_{XY} \parallel P_X P_Y). \end{eqnarray}\]

Estimation:

Lautum information is estimated based on frequency tables. See below for a list of available estimators.

Parameters:

X,Y : numpy array (or array-like object such as a list of immutables, as accepted by np.array())

cartesian_product==False and Y is not None: X and Y are arrays containing discrete random variable realisations, with X.shape[:-1]==Y.shape[:-1]. Successive realisations of a random variable are indexed by the last axis in the respective arrays; multiple random variables in X and Y may be specified using preceding axes of the respective arrays (random variables are paired one-to-one between X and Y). When X.ndim==Y.ndim==1, returns a scalar. When X.ndim>1 and Y.ndim>1, returns an array of estimated information values with dimensions X.shape[:-1]. Neither X nor Y may contain (floating point) NaN values. Missing data may be specified using numpy masked arrays, as well as using standard numpy array/array-like objects; see below for details.

cartesian_product==True and Y is not None: X and Y are arrays containing discrete random variable realisations. Successive realisations of a random variable are indexed by the last axis in the respective arrays; multiple random variables in X and Y may be specified using preceding axes of the respective arrays (random variables are paired many-to-many between X and Y). When X.ndim==Y.ndim==1, returns a scalar. When X.ndim>1 or Y.ndim>1, returns an array of estimated information values with dimensions np.append(X.shape[:-1],Y.shape[:-1]). Neither X nor Y may contain (floating point) NaN values. Missing data may be specified using numpy masked arrays, as well as using standard numpy array/array-like objects; see below for details.

Y is None: Equivalent to information_lautum(X, X, … ). Thus, a shorthand syntax for computing lautum information (in bits) between all pairs of random variables in X is information_lautum(X).

cartesian_product : boolean
Indicates whether random variables are paired one-to-one between X and Y (cartesian_product==False, the default value) or many-to-many between X and Y (cartesian_product==True).
base : float
The desired logarithmic base (default 2).
fill_value : object

It is possible to specify missing data using numpy masked arrays, pandas Series/DataFrames, as well as using standard numpy array/array-like objects with assigned placeholder values. When using numpy masked arrays, this function invokes np.ma.filled() internally, so that missing data are represented with the array’s object-internal placeholder value fill_value (this function’s fill_value parameter is ignored in such cases). When using pandas Series/DataFrames, an initial conversion to a numpy masked array is performed. When using standard numpy array/array-like objects, this function’s fill_value parameter is used to specify the placeholder value for missing data (defaults to -1).

Data equal to the placeholder value are subsequently ignored.

estimator : str or float

The desired estimator (see above for details on estimators). Possible values are:

‘ML’ (the default value) : Maximum likelihood estimator.

any floating point value : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome as specified).

PERKS : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome set to 1/L, where L is the number of possible outcomes).

MINIMAX : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome set to sqrt(N)/L, where N is the total number of realisations and where L is the number of possible outcomes).

JAMES-STEIN : James-Stein estimator [HaSt09].

GOOD-TURING : Good-Turing estimator [GaSa95].

Alphabet_X, Alphabet_Y : numpy array (or array-like object such as a list of immutables, as accepted by np.array())

Respectively an array specifying the alphabet/alphabets of possible outcomes that random variable realisations in array X, Y may assume. Defaults to None, in which case the alphabet/alphabets of possible outcomes is/are implicitly based on the observed outcomes in array X, Y respectively, with no additional, unobserved outcomes. In combination with any estimator other than maximum likelihood, it may be useful to specify alphabets including unobserved outcomes. For such cases, successive possible outcomes of a random variable are indexed by the last axis in Alphabet_X, Alphabet_Y respectively; multiple alphabets may be specified using preceding axes, with the requirement X.shape[:-1]==Alphabet_X.shape[:-1] (analogously for Y). Alphabets of different sizes may be specified either using numpy masked arrays, or by padding with the chosen placeholder fill_value.

NB: When specifying alphabets, an alphabet of possible joint outcomes is always implicit from the alphabets of possible (marginal) outcomes in Alphabet_X, Alphabet_Y. For example, specifying Alphabet_X=Alphabet_Y=np.array((1,2)) implies an alphabet of possible joint outcomes np.array(((1,1,2,2),(1,2,1,2))).

keep_dims : boolean
When set to True and cartesian_product==False, an additional dimension of length one is appended to the returned array, facilitating any broadcast operations required by the user (defaults to False). Has no effect when cartesian_product==True.

Implementation notes:

Before estimation, outcomes are mapped to the set of non-negative integers internally, with the value -1 representing missing data. To avoid this internal conversion step, supply integer data and use the default fill value -1.
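
A minimal sketch (the value in the comment follows from the definition above and is approximate):

>>> from pyitlib import discrete_random_variable as drv
>>> # Empirically independent sequences: the product of marginals matches the
>>> # joint distribution, so L approx. 0 bits
>>> drv.information_lautum([1, 1, 2, 2], [1, 2, 1, 2])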

discrete_random_variable.information_multi(X, base=2, fill_value=-1, estimator='ML', Alphabet_X=None, keep_dims=False)[source]

Returns the estimated multi-information [StVe98] (also known as total correlation [Wata60]) for an array X containing realisations of discrete random variables.

Mathematical definition:

Given discrete random variables \(X_1, \ldots, X_n\), the multi-information \(T(X_1, \ldots, X_n)\) is defined as:

\[T(X_1, \ldots, X_n) = \left( \sum_{i=1}^{n} H(X_i) \right) - H(X_1, \ldots, X_n)\]

where \(H(\cdot)\) denotes the entropy and where \(H(\cdot, \ldots, \cdot)\) denotes the joint entropy.

Estimation:

Multi-information is estimated based on frequency tables, using the following functions:

entropy_joint()

entropy()

See below for a list of available estimators. Note that although multi-information is a non-negative quantity, depending on the chosen estimator the obtained estimate may be negative.

Parameters:

X : numpy array (or array-like object such as a list of immutables, as accepted by np.array())
An array containing discrete random variable realisations. Successive realisations of a random variable are indexed by the last axis in the array; multiple random variables may be specified using preceding axes. When X.ndim==1, returns the scalar 0. When X.ndim>1, returns a scalar based on jointly considering all random variables indexed in the array. X may not contain (floating point) NaN values. Missing data may be specified using numpy masked arrays, as well as using standard numpy array/array-like objects; see below for details.
base : float
The desired logarithmic base (default 2).
fill_value : object

It is possible to specify missing data using numpy masked arrays, pandas Series/DataFrames, as well as using standard numpy array/array-like objects with assigned placeholder values. When using numpy masked arrays, this function invokes np.ma.filled() internally, so that missing data are represented with the array’s object-internal placeholder value fill_value (this function’s fill_value parameter is ignored in such cases). When using pandas Series/DataFrames, an initial conversion to a numpy masked array is performed. When using standard numpy array/array-like objects, this function’s fill_value parameter is used to specify the placeholder value for missing data (defaults to -1).

Data equal to the placeholder value are subsequently ignored.

estimator : str or float

The desired estimator (see above for details on estimators). Possible values are:

‘ML’ (the default value) : Maximum likelihood estimator.

any floating point value : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome as specified).

PERKS : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome set to 1/L, where L is the number of possible outcomes).

MINIMAX : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome set to sqrt(N)/L, where N is the total number of realisations and where L is the number of possible outcomes).

JAMES-STEIN : James-Stein estimator [HaSt09].

GOOD-TURING : Good-Turing estimator [GaSa95].

Alphabet_X : numpy array (or array-like object such as a list of immutables, as accepted by np.array())

An array specifying the alphabet/alphabets of possible outcomes that random variable realisations in array X may assume. Defaults to None, in which case the alphabet/alphabets of possible outcomes is/are implicitly based on the observed outcomes in array X, with no additional, unobserved outcomes. In combination with any estimator other than maximum likelihood, it may be useful to specify alphabets including unobserved outcomes. For such cases, successive possible outcomes of a random variable are indexed by the last axis in Alphabet_X; multiple alphabets may be specified using preceding axes, with the requirement X.shape[:-1]==Alphabet_X.shape[:-1]. Alphabets of different sizes may be specified either using numpy masked arrays, or by padding with the chosen placeholder fill_value.

NB: When specifying multiple alphabets, an alphabet of possible joint outcomes is always implicit from the alphabets of possible (marginal) outcomes in Alphabet_X. For example, specifying Alphabet_X=np.array(((1,2),(1,2))) implies an alphabet of possible joint outcomes np.array(((1,1,2,2),(1,2,1,2))).

keep_dims : boolean
When set to True, an additional dimension of length one is appended to the returned array, facilitating any broadcast operations required by the user (defaults to False).

Implementation notes:

Before estimation, outcomes are mapped to the set of non-negative integers internally, with the value -1 representing missing data. To avoid this internal conversion step, supply integer data and use the default fill value -1.
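
A minimal sketch under maximum likelihood estimation (comment values follow from the definition above and are approximate):

>>> from pyitlib import discrete_random_variable as drv
>>> # Identical pair: T = H(X1) + H(X2) - H(X1,X2), approx. 1 bit
>>> drv.information_multi([[1, 2, 1, 2], [1, 2, 1, 2]])
>>> # Empirically independent pair: approx. 0 bits
>>> drv.information_multi([[1, 1, 2, 2], [1, 2, 1, 2]])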

discrete_random_variable.information_mutual(X, Y=None, cartesian_product=False, base=2, fill_value=-1, estimator='ML', Alphabet_X=None, Alphabet_Y=None, keep_dims=False)[source]

Returns the mutual information (see e.g. [CoTh06]) between arrays X and Y, each containing discrete random variable realisations.

Mathematical definition:

Given discrete random variables \(X\), \(Y\), the mutual information \(I(X;Y)\) is defined as:

\[I(X;Y) = H(X) - H(X|Y)\]

where \(H(\cdot)\) denotes the entropy and where \(H(\cdot|\cdot)\) denotes the conditional entropy.

Estimation:

Mutual information is estimated based on frequency tables, using the following functions:

entropy_joint()

entropy()

See below for a list of available estimators. Note that although mutual information is a non-negative quantity, depending on the chosen estimator the obtained estimate may be negative.

Parameters:

X,Y : numpy array (or array-like object such as a list of immutables, as accepted by np.array())

cartesian_product==False and Y is not None: X and Y are arrays containing discrete random variable realisations, with X.shape==Y.shape. Successive realisations of a random variable are indexed by the last axis in the respective arrays; multiple random variables in X and Y may be specified using preceding axes of the respective arrays (random variables are paired one-to-one between X and Y). When X.ndim==Y.ndim==1, returns a scalar. When X.ndim>1 and Y.ndim>1, returns an array of estimated mutual information values with dimensions X.shape[:-1]. Neither X nor Y may contain (floating point) NaN values. Missing data may be specified using numpy masked arrays, as well as using standard numpy array/array-like objects; see below for details.

cartesian_product==True and Y is not None: X and Y are arrays containing discrete random variable realisations, with X.shape[-1]==Y.shape[-1]. Successive realisations of a random variable are indexed by the last axis in the respective arrays; multiple random variables in X and Y may be specified using preceding axes of the respective arrays (random variables are paired many-to-many between X and Y). When X.ndim==Y.ndim==1, returns a scalar. When X.ndim>1 or Y.ndim>1, returns an array of estimated mutual information values with dimensions np.append(X.shape[:-1],Y.shape[:-1]). Neither X nor Y may contain (floating point) NaN values. Missing data may be specified using numpy masked arrays, as well as using standard numpy array/array-like objects; see below for details.

Y is None: Equivalent to information_mutual(X, X, … ). Thus, a shorthand syntax for computing mutual information (in bits) between all pairs of random variables in X is information_mutual(X).

cartesian_product : boolean
Indicates whether random variables are paired one-to-one between X and Y (cartesian_product==False, the default value) or many-to-many between X and Y (cartesian_product==True).
base : float
The desired logarithmic base (default 2).
fill_value : object

It is possible to specify missing data using numpy masked arrays, pandas Series/DataFrames, as well as using standard numpy array/array-like objects with assigned placeholder values. When using numpy masked arrays, this function invokes np.ma.filled() internally, so that missing data are represented with the array’s object-internal placeholder value fill_value (this function’s fill_value parameter is ignored in such cases). When using pandas Series/DataFrames, an initial conversion to a numpy masked array is performed. When using standard numpy array/array-like objects, this function’s fill_value parameter is used to specify the placeholder value for missing data (defaults to -1).

Data equal to the placeholder value are subsequently ignored.

estimator : str or float

The desired estimator (see above for details on estimators). Possible values are:

‘ML’ (the default value) : Maximum likelihood estimator.

any floating point value : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome as specified).

PERKS : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome set to 1/L, where L is the number of possible outcomes).

MINIMAX : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome set to sqrt(N)/L, where N is the total number of realisations and where L is the number of possible outcomes).

JAMES-STEIN : James-Stein estimator [HaSt09].

GOOD-TURING : Good-Turing estimator [GaSa95].

Alphabet_X, Alphabet_Y : numpy array (or array-like object such as a list of immutables, as accepted by np.array())

Respectively an array specifying the alphabet/alphabets of possible outcomes that random variable realisations in array X, Y may assume. Defaults to None, in which case the alphabet/alphabets of possible outcomes is/are implicitly based on the observed outcomes in array X, Y respectively, with no additional, unobserved outcomes. In combination with any estimator other than maximum likelihood, it may be useful to specify alphabets including unobserved outcomes. For such cases, successive possible outcomes of a random variable are indexed by the last axis in Alphabet_X, Alphabet_Y respectively; multiple alphabets may be specified using preceding axes, with the requirement X.shape[:-1]==Alphabet_X.shape[:-1] (analogously for Y). Alphabets of different sizes may be specified either using numpy masked arrays, or by padding with the chosen placeholder fill_value.

NB: When specifying alphabets, an alphabet of possible joint outcomes is always implicit from the alphabets of possible (marginal) outcomes in Alphabet_X, Alphabet_Y. For example, specifying Alphabet_X=Alphabet_Y=np.array((1,2)) implies an alphabet of possible joint outcomes np.array(((1,1,2,2),(1,2,1,2))).

keep_dims : boolean
When set to True and cartesian_product==False, an additional dimension of length one is appended to the returned array, facilitating any broadcast operations required by the user (defaults to False). Has no effect when cartesian_product==True.

Implementation notes:

Before estimation, outcomes are mapped to the set of non-negative integers internally, with the value -1 representing missing data. To avoid this internal conversion step, supply integer data and use the default fill value -1.
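
A minimal sketch under maximum likelihood estimation (comment values follow from the definition above and are approximate):

>>> from pyitlib import discrete_random_variable as drv
>>> # Identical sequences share all of their information: approx. 1 bit
>>> drv.information_mutual([1, 2, 1, 2], [1, 2, 1, 2])
>>> # Empirically independent sequences: approx. 0 bits
>>> drv.information_mutual([1, 2, 1, 2], [1, 1, 2, 2])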

discrete_random_variable.information_mutual_conditional(X, Y, Z, cartesian_product=False, base=2, fill_value=-1, estimator='ML', Alphabet_X=None, Alphabet_Y=None, Alphabet_Z=None, keep_dims=False)[source]

Returns the conditional mutual information (see e.g. [CoTh06]) between arrays X and Y given array Z, each containing discrete random variable realisations.

Mathematical definition:

Given discrete random variables \(X\), \(Y\), \(Z\), the conditional mutual information \(I(X;Y|Z)\) is defined as:

\[I(X;Y|Z) = H(X|Z) - H(X|Y,Z)\]

where \(H(\cdot|\cdot)\) denotes the conditional entropy.

Estimation:

Conditional mutual information is estimated based on frequency tables, using the following functions:

entropy_joint()

entropy()

See below for a list of available estimators. Note that although conditional mutual information is a non-negative quantity, depending on the chosen estimator the obtained estimate may be negative.

Parameters:

X,Y,Z : numpy array (or array-like object such as a list of immutables, as accepted by np.array())

cartesian_product==False: X,Y,Z are arrays containing discrete random variable realisations, with X.shape==Y.shape==Z.shape. Successive realisations of a random variable are indexed by the last axis in the respective arrays; multiple random variables in X,Y,Z may be specified using preceding axes of the respective arrays (random variables are paired one-to-one between X,Y,Z). When X.ndim==Y.ndim==Z.ndim==1, returns a scalar. When X.ndim>1 and Y.ndim>1 and Z.ndim>1, returns an array of estimated conditional mutual information values with dimensions X.shape[:-1]. Neither X nor Y nor Z may contain (floating point) NaN values. Missing data may be specified using numpy masked arrays, as well as using standard numpy array/array-like objects; see below for details.

cartesian_product==True: X,Y,Z are arrays containing discrete random variable realisations, with X.shape[-1]==Y.shape[-1]==Z.shape[-1]. Successive realisations of a random variable are indexed by the last axis in the respective arrays; multiple random variables in X,Y,Z may be specified using preceding axes of the respective arrays (random variables are paired many-to-many between X,Y,Z). When X.ndim==Y.ndim==Z.ndim==1, returns a scalar. When X.ndim>1 or Y.ndim>1 or Z.ndim>1, returns an array of estimated conditional mutual information values with dimensions np.concatenate((X.shape[:-1],Y.shape[:-1],Z.shape[:-1])). Neither X nor Y nor Z may contain (floating point) NaN values. Missing data may be specified using numpy masked arrays, as well as using standard numpy array/array-like objects; see below for details.

cartesian_product : boolean
Indicates whether random variables are paired one-to-one between X,Y,Z (cartesian_product==False, the default value) or many-to-many between X,Y,Z (cartesian_product==True).
base : float
The desired logarithmic base (default 2).
fill_value : object

It is possible to specify missing data using numpy masked arrays, pandas Series/DataFrames, as well as using standard numpy array/array-like objects with assigned placeholder values. When using numpy masked arrays, this function invokes np.ma.filled() internally, so that missing data are represented with the array’s object-internal placeholder value fill_value (this function’s fill_value parameter is ignored in such cases). When using pandas Series/DataFrames, an initial conversion to a numpy masked array is performed. When using standard numpy array/array-like objects, this function’s fill_value parameter is used to specify the placeholder value for missing data (defaults to -1).

Data equal to the placeholder value are subsequently ignored.

estimator : str or float

The desired estimator (see above for details on estimators). Possible values are:

‘ML’ (the default value) : Maximum likelihood estimator.

any floating point value : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome as specified).

‘PERKS’ : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome set to 1/L, where L is the number of possible outcomes).

‘MINIMAX’ : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome set to sqrt(N)/L, where N is the total number of realisations and where L is the number of possible outcomes).

‘JAMES-STEIN’ : James-Stein estimator [HaSt09].

‘GOOD-TURING’ : Good-Turing estimator [GaSa95].

Alphabet_X, Alphabet_Y, Alphabet_Z : numpy array (or array-like object such as a list of immutables, as accepted by np.array())

Respectively an array specifying the alphabet/alphabets of possible outcomes that random variable realisations in array X, Y, Z may assume. Defaults to None, in which case the alphabet/alphabets of possible outcomes is/are implicitly based on the observed outcomes in array X, Y, Z respectively, with no additional, unobserved outcomes. In combination with any estimator other than maximum likelihood, it may be useful to specify alphabets including unobserved outcomes. For such cases, successive possible outcomes of a random variable are indexed by the last axis in Alphabet_X, Alphabet_Y, Alphabet_Z respectively; multiple alphabets may be specified using preceding axes, with the requirement X.shape[:-1]==Alphabet_X.shape[:-1] (analogously for Y and Z). Alphabets of different sizes may be specified either using numpy masked arrays, or by padding with the chosen placeholder fill_value.

NB: When specifying alphabets, an alphabet of possible joint outcomes is always implicit from the alphabets of possible (marginal) outcomes in Alphabet_X, Alphabet_Y, Alphabet_Z. For example, specifying Alphabet_X=Alphabet_Y=Alphabet_Z=np.array((1,2)) implies an alphabet of possible joint outcomes np.array(((1,1,1,1,2,2,2,2),(1,1,2,2,1,1,2,2),(1,2,1,2,1,2,1,2))).

keep_dims : boolean
When set to True and cartesian_product==False, an additional dimension of length one is appended to the returned array, facilitating any broadcast operations required by the user (defaults to False). Has no effect when cartesian_product==True.

Implementation notes:

Before estimation, outcomes are mapped to the set of non-negative integers internally, with the value -1 representing missing data. To avoid this internal conversion step, supply integer data and use the default fill value -1.
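
As a minimal usage sketch (expected values assume maximum likelihood estimation with entropies in bits; float() is used here only to obtain a stable scalar display): since X and Y below are identical and Z is constant, \(I(X;Y|Z) = H(X|Z) - H(X|Y,Z) = 1 - 0 = 1\) bit:

>>> from pyitlib import discrete_random_variable as drv
>>> X = [1, 2, 1, 2]
>>> Y = [1, 2, 1, 2]
>>> Z = [1, 1, 1, 1]
>>> float(drv.information_mutual_conditional(X, Y, Z))  # H(X|Z) - H(X|Y,Z)
1.0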

discrete_random_variable.information_mutual_normalised(X, Y=None, norm_factor='Y', cartesian_product=False, fill_value=-1, estimator='ML', Alphabet_X=None, Alphabet_Y=None, keep_dims=False)[source]

Returns the normalised mutual information between arrays X and Y, each containing discrete random variable realisations.

Mathematical definition:

Given discrete random variables \(X\), \(Y\), the normalised mutual information \(NI(X;Y)\) is defined as:

\[NI(X;Y) = \frac{I(X;Y)}{C_n}\]

where \(I\) denotes the mutual information and where \(C_n\) denotes a normalisation factor. Normalised mutual information is a dimensionless quantity, with \(C_n\) alternatively defined as:

\[\begin{aligned} C_{\text{X}} &= H(X) \\ C_{\text{Y}} &= H(Y) \\ C_{\text{X+Y}} &= H(X) + H(Y) \\ C_{\text{MIN}} &= \min \{ H(X), H(Y) \} \\ C_{\text{MAX}} &= \max \{ H(X), H(Y) \} \\ C_{\text{XY}} &= H(X,Y) \\ C_{\text{SQRT}} &= \sqrt{H(X) H(Y)} \end{aligned}\]

where \(H(\cdot)\) and \(H(\cdot,\cdot)\) respectively denote the entropy and joint entropy.

Estimation:

Normalised mutual information is estimated based on frequency tables, using the following functions:

entropy_joint()

entropy()

See below for a list of available estimators. Note that although normalised mutual information is a non-negative quantity, depending on the chosen estimator the obtained estimate may be negative.

Parameters:

X,Y : numpy array (or array-like object such as a list of immutables, as accepted by np.array())

cartesian_product==False and Y is not None: X and Y are arrays containing discrete random variable realisations, with X.shape==Y.shape. Successive realisations of a random variable are indexed by the last axis in the respective arrays; multiple random variables in X and Y may be specified using preceding axes of the respective arrays (random variables are paired one-to-one between X and Y). When X.ndim==Y.ndim==1, returns a scalar. When X.ndim>1 and Y.ndim>1, returns an array of estimated normalised information values with dimensions X.shape[:-1]. Neither X nor Y may contain (floating point) NaN values. Missing data may be specified using numpy masked arrays, as well as using standard numpy array/array-like objects; see below for details.

cartesian_product==True and Y is not None: X and Y are arrays containing discrete random variable realisations, with X.shape[-1]==Y.shape[-1]. Successive realisations of a random variable are indexed by the last axis in the respective arrays; multiple random variables in X and Y may be specified using preceding axes of the respective arrays (random variables are paired many-to-many between X and Y). When X.ndim==Y.ndim==1, returns a scalar. When X.ndim>1 or Y.ndim>1, returns an array of estimated normalised information values with dimensions np.append(X.shape[:-1],Y.shape[:-1]). Neither X nor Y may contain (floating point) NaN values. Missing data may be specified using numpy masked arrays, as well as using standard numpy array/array-like objects; see below for details.

Y is None: Equivalent to information_mutual_normalised(X, X, norm_factor, True). Thus, a shorthand syntax for computing normalised mutual information (based on C_n = C_Y as defined above) between all pairs of random variables in X is information_mutual_normalised(X).

norm_factor : string

The desired normalisation factor, specified as a string. Internally, the supplied string is converted to upper case and spaces are discarded. Subsequently, the function tests for one of the following string values, each corresponding to an alternative normalisation factor as defined above:

‘X’

‘Y’ (the default value)

‘X+Y’ (equivalently ‘Y+X’)

‘MIN’

‘MAX’

‘XY’ (equivalently ‘YX’)

‘SQRT’

cartesian_product : boolean
Indicates whether random variables are paired one-to-one between X and Y (cartesian_product==False, the default value) or many-to-many between X and Y (cartesian_product==True).
fill_value : object

It is possible to specify missing data using numpy masked arrays, pandas Series/DataFrames, as well as using standard numpy array/array-like objects with assigned placeholder values. When using numpy masked arrays, this function invokes np.ma.filled() internally, so that missing data are represented with the array’s object-internal placeholder value fill_value (this function’s fill_value parameter is ignored in such cases). When using pandas Series/DataFrames, an initial conversion to a numpy masked array is performed. When using standard numpy array/array-like objects, this function’s fill_value parameter is used to specify the placeholder value for missing data (defaults to -1).

Data equal to the placeholder value are subsequently ignored.

estimator : str or float

The desired estimator (see above for details on estimators). Possible values are:

‘ML’ (the default value) : Maximum likelihood estimator.

any floating point value : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome as specified).

‘PERKS’ : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome set to 1/L, where L is the number of possible outcomes).

‘MINIMAX’ : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome set to sqrt(N)/L, where N is the total number of realisations and where L is the number of possible outcomes).

‘JAMES-STEIN’ : James-Stein estimator [HaSt09].

‘GOOD-TURING’ : Good-Turing estimator [GaSa95].

Alphabet_X, Alphabet_Y : numpy array (or array-like object such as a list of immutables, as accepted by np.array())

Respectively an array specifying the alphabet/alphabets of possible outcomes that random variable realisations in array X, Y may assume. Defaults to None, in which case the alphabet/alphabets of possible outcomes is/are implicitly based on the observed outcomes in array X, Y respectively, with no additional, unobserved outcomes. In combination with any estimator other than maximum likelihood, it may be useful to specify alphabets including unobserved outcomes. For such cases, successive possible outcomes of a random variable are indexed by the last axis in Alphabet_X, Alphabet_Y respectively; multiple alphabets may be specified using preceding axes, with the requirement X.shape[:-1]==Alphabet_X.shape[:-1] (analogously for Y). Alphabets of different sizes may be specified either using numpy masked arrays, or by padding with the chosen placeholder fill_value.

NB: When specifying alphabets, an alphabet of possible joint outcomes is always implicit from the alphabets of possible (marginal) outcomes in Alphabet_X, Alphabet_Y. For example, specifying Alphabet_X=Alphabet_Y=np.array((1,2)) implies an alphabet of possible joint outcomes np.array(((1,1,2,2),(1,2,1,2))).

keep_dims : boolean
When set to True and cartesian_product==False, an additional dimension of length one is appended to the returned array, facilitating any broadcast operations required by the user (defaults to False). Has no effect when cartesian_product==True.

Implementation notes:

Before estimation, outcomes are mapped to the set of non-negative integers internally, with the value -1 representing missing data. To avoid this internal conversion step, supply integer data and use the default fill value -1.
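
As a minimal usage sketch (expected values assume maximum likelihood estimation; float() is used here only to obtain a stable scalar display): since the two arguments below are identical, \(I(X;X) = H(X) = 1\) bit, so the default normalisation factor \(C_{\text{Y}} = H(Y)\) yields 1, while \(C_{\text{X+Y}} = H(X) + H(Y)\) halves the result:

>>> from pyitlib import discrete_random_variable as drv
>>> X = [1, 2, 1, 2]
>>> float(drv.information_mutual_normalised(X, X))  # C_n = H(Y) = 1 bit
1.0
>>> float(drv.information_mutual_normalised(X, X, norm_factor='X+Y'))  # C_n = 2 bits
0.5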

discrete_random_variable.information_variation(X, Y=None, cartesian_product=False, base=2, fill_value=-1, estimator='ML', Alphabet_X=None, Alphabet_Y=None, keep_dims=False)[source]

Returns the variation of information [Meil03] between arrays X and Y, each containing discrete random variable realisations.

Mathematical definition:

Given discrete random variables \(X\), \(Y\), the variation of information \(VI(X;Y)\) is defined as:

\[VI(X;Y) = H(X|Y) + H(Y|X)\]

where \(H(\cdot|\cdot)\) denotes the conditional entropy.

Estimation:

Variation of information is estimated based on frequency tables, using the following functions:

entropy_joint()

entropy()

See below for a list of available estimators. Note that although variation of information is a non-negative quantity, depending on the chosen estimator the obtained estimate may be negative.

Parameters:

X,Y : numpy array (or array-like object such as a list of immutables, as accepted by np.array())

cartesian_product==False and Y is not None: X and Y are arrays containing discrete random variable realisations, with X.shape==Y.shape. Successive realisations of a random variable are indexed by the last axis in the respective arrays; multiple random variables in X and Y may be specified using preceding axes of the respective arrays (random variables are paired one-to-one between X and Y). When X.ndim==Y.ndim==1, returns a scalar. When X.ndim>1 and Y.ndim>1, returns an array of estimated information values with dimensions X.shape[:-1]. Neither X nor Y may contain (floating point) NaN values. Missing data may be specified using numpy masked arrays, as well as using standard numpy array/array-like objects; see below for details.

cartesian_product==True and Y is not None: X and Y are arrays containing discrete random variable realisations, with X.shape[-1]==Y.shape[-1]. Successive realisations of a random variable are indexed by the last axis in the respective arrays; multiple random variables in X and Y may be specified using preceding axes of the respective arrays (random variables are paired many-to-many between X and Y). When X.ndim==Y.ndim==1, returns a scalar. When X.ndim>1 or Y.ndim>1, returns an array of estimated information values with dimensions np.append(X.shape[:-1],Y.shape[:-1]). Neither X nor Y may contain (floating point) NaN values. Missing data may be specified using numpy masked arrays, as well as using standard numpy array/array-like objects; see below for details.

Y is None: Equivalent to information_variation(X, X, … ). Thus, a shorthand syntax for computing variation of information (in bits) between all pairs of random variables in X is information_variation(X).

cartesian_product : boolean
Indicates whether random variables are paired one-to-one between X and Y (cartesian_product==False, the default value) or many-to-many between X and Y (cartesian_product==True).
base : float
The desired logarithmic base (default 2).
fill_value : object

It is possible to specify missing data using numpy masked arrays, pandas Series/DataFrames, as well as using standard numpy array/array-like objects with assigned placeholder values. When using numpy masked arrays, this function invokes np.ma.filled() internally, so that missing data are represented with the array’s object-internal placeholder value fill_value (this function’s fill_value parameter is ignored in such cases). When using pandas Series/DataFrames, an initial conversion to a numpy masked array is performed. When using standard numpy array/array-like objects, this function’s fill_value parameter is used to specify the placeholder value for missing data (defaults to -1).

Data equal to the placeholder value are subsequently ignored.

estimator : str or float

The desired estimator (see above for details on estimators). Possible values are:

‘ML’ (the default value) : Maximum likelihood estimator.

any floating point value : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome as specified).

‘PERKS’ : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome set to 1/L, where L is the number of possible outcomes).

‘MINIMAX’ : Maximum a posteriori estimator using Dirichlet prior (equivalent to maximum likelihood with pseudo-count for each outcome set to sqrt(N)/L, where N is the total number of realisations and where L is the number of possible outcomes).

‘JAMES-STEIN’ : James-Stein estimator [HaSt09].

‘GOOD-TURING’ : Good-Turing estimator [GaSa95].

Alphabet_X, Alphabet_Y : numpy array (or array-like object such as a list of immutables, as accepted by np.array())

Respectively an array specifying the alphabet/alphabets of possible outcomes that random variable realisations in array X, Y may assume. Defaults to None, in which case the alphabet/alphabets of possible outcomes is/are implicitly based on the observed outcomes in array X, Y respectively, with no additional, unobserved outcomes. In combination with any estimator other than maximum likelihood, it may be useful to specify alphabets including unobserved outcomes. For such cases, successive possible outcomes of a random variable are indexed by the last axis in Alphabet_X, Alphabet_Y respectively; multiple alphabets may be specified using preceding axes, with the requirement X.shape[:-1]==Alphabet_X.shape[:-1] (analogously for Y). Alphabets of different sizes may be specified either using numpy masked arrays, or by padding with the chosen placeholder fill_value.

NB: When specifying alphabets, an alphabet of possible joint outcomes is always implicit from the alphabets of possible (marginal) outcomes in Alphabet_X, Alphabet_Y. For example, specifying Alphabet_X=Alphabet_Y=np.array((1,2)) implies an alphabet of possible joint outcomes np.array(((1,1,2,2),(1,2,1,2))).

keep_dims : boolean
When set to True and cartesian_product==False, an additional dimension of length one is appended to the returned array, facilitating any broadcast operations required by the user (defaults to False). Has no effect when cartesian_product==True.

Implementation notes:

Before estimation, outcomes are mapped to the set of non-negative integers internally, with the value -1 representing missing data. To avoid this internal conversion step, supply integer data and use the default fill value -1.
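
As a minimal usage sketch (expected values assume maximum likelihood estimation in bits; float() is used here only to obtain a stable scalar display): X and Y below are independent uniform variables, so \(VI(X;Y) = H(X|Y) + H(Y|X) = 1 + 1 = 2\) bits, whereas identical variables yield zero:

>>> from pyitlib import discrete_random_variable as drv
>>> X = [1, 2, 1, 2]
>>> Y = [1, 1, 2, 2]
>>> float(drv.information_variation(X, Y))  # H(X|Y) + H(Y|X) = 1 + 1
2.0
>>> float(drv.information_variation(X, X))  # identical variables
0.0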
