multidim.models.CDER

class multidim.models.CDER(stop_level=None, **kwargs)[source]

The CDER (Cover-Tree Differencing for Entropy Reduction) algorithm for supervised machine learning on labelled collections of pointclouds. It uses the “Cover Tree with Friends” algorithm together with an entropy computation to build a regional classifier for labelled pointclouds. It relies on the multidim.covertree.CoverTree and multidim.covertree.CoverLevel data structures.

See the paper [CDER1] and the talk [CDER2].

Parameters:
parsimonious : bool

Whether to use the parsimonious version of CDER, in which a region is ignored once its entropy reaches a local minimum. For most datasets, parsimonious=False provides a better classifier, but it is more expensive to evaluate. In many cases, parsimonious=True is good enough, and is significantly faster. Default: True
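The parsimonious stopping rule can be sketched as follows. This is an illustrative helper, not part of the multidim API; `first_local_minimum` is a hypothetical name, and the real implementation tracks entropy per region inside the cover tree.

```python
def first_local_minimum(entropies):
    """Index of the first local minimum in a per-level entropy sequence.

    Illustrative sketch of the parsimonious stopping rule: once a
    region's entropy stops decreasing, deeper cover-tree levels of
    that region are ignored.  Hypothetical helper, not multidim API.
    """
    for i in range(1, len(entropies) - 1):
        if entropies[i] <= entropies[i - 1] and entropies[i] < entropies[i + 1]:
            return i
    # Entropy kept decreasing: refine all the way to the deepest level.
    return len(entropies) - 1

# Entropy drops, bottoms out at level 2, then rises again: the
# parsimonious variant would stop refining this region at level 2.
print(first_local_minimum([1.0, 0.6, 0.3, 0.5, 0.4]))  # 2
```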

References

[CDER1] Supervised Learning of Labeled Pointcloud Differences via Cover-Tree Entropy Reduction. https://arxiv.org/abs/1702.07959
[CDER2] CDER, Learning with Friends. https://www.ima.umn.edu/2016-2017/DSS9.6.16-5.30.17/26150

Examples

We’ll make a simple dataset with one “green” pointcloud sampled from a uniform distribution on the 1x1 square centered at (-1,0), and one “magenta” pointcloud sampled from a uniform distribution on the 1x1 square centered at (1, 0).

>>> import numpy as np
>>> import multidim
>>> from multidim.models import CDER
>>> train_dataL = np.random.rand(100,2) + np.array([-1.5, -0.5])  # for green
>>> # train_dataL.mean(axis=0)  # should be near (-1, 0)
>>> train_dataR = np.random.rand(200,2) + np.array([0.5, -0.5])  # for magenta
>>> # train_dataR.mean(axis=0)  # should be near (+1, 0)
>>> cder = CDER(parsimonious=True)  # prepare a classifier
>>> cder.fit([train_dataL, train_dataR], ["green", "magenta"])  # this runs CoverTree
>>> for g in cder.gaussians:
...     print(sorted(list(g.keys())))
...     break
['adult', 'count', 'entropy', 'index', 'label', 'level', 'mean', 'radius', 'rotation', 'std', 'weight']
>>> test_dataL = np.random.rand(50,2) + np.array([-1.5, -0.5])  # should be green
>>> test_dataR = np.random.rand(50,2) + np.array([0.5, -0.5])  # should be magenta
>>> cder.predict([test_dataL, test_dataR])  # Guess the labels
array(['green', 'magenta'], dtype=object)
>>> cder.score([test_dataL, test_dataR], ["green", "magenta"])  # Correct?
array([ True,  True], dtype=bool)

A more thorough example is at http://nbviewer.jupyter.org/github/geomdata/gda-public/blob/master/examples/example-cder.ipynb
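The entropy computation at the heart of CDER can be illustrated without multidim at all. The following is a minimal numpy sketch of the Shannon entropy of the label distribution inside one region; the library's internal computation may weight points differently (see [CDER1] for the exact definition), so treat this as conceptual, not as the multidim implementation.

```python
import numpy as np

def label_entropy(labels):
    """Shannon entropy (in bits) of the label distribution in one region.

    Conceptual sketch only: CDER refines regions where this quantity is
    high (labels are mixed) and builds Gaussians where it is low.
    """
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

# A region containing only "green" points carries zero entropy ...
print(label_entropy(["green"] * 10))                    # 0.0
# ... while an even green/magenta mix is maximally uncertain (1 bit).
print(label_entropy(["green"] * 5 + ["magenta"] * 5))   # 1.0
```

Regions whose entropy drops toward zero are exactly the regions that distinguish one label from the others, which is why CDER places its Gaussians there.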

CDER.build_gaussian(coverlevel, adult, …) Build a Gaussian from the children of this adult in a cover-tree (using only a particular label).
CDER.evaluate(x) Evaluate all gaussians against a pointcloud
CDER.fit(*training) Fit this estimator to training data.
CDER.gausscoords([parsimonious]) This is the Gaussian-building heart of the CDER algorithm.
CDER.get_params([deep]) Pass original kwargs internally.
CDER.plot(canvas[, style]) Plot a CDER model, using matplotlib or bokeh
CDER.predict(pointclouds) Predict labels of given pointclouds, based on previous training data fed to fit().
CDER.runit(pointclouds)
CDER.score(pointclouds, labels) Score predicted labels against known labels.
CDER.set_params(**params) Set the parameters of this estimator.