multidim.models.CDER
class multidim.models.CDER(stop_level=None, **kwargs)

The CDER (Cover-Tree Differencing for Entropy Reduction) algorithm for supervised machine learning of labelled cloud collections. This uses the “Cover Tree with Friends” algorithm along with an entropy computation to build a regional classifier for labelled pointclouds. It relies on the multidim.covertree.CoverTree and multidim.covertree.CoverLevel data structures. See the paper [CDER1] and the talk [CDER2].
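To make the entropy-reduction idea concrete, here is a minimal, self-contained sketch (not the library's implementation) of the kind of label-entropy computation CDER performs on each cover-tree region: a region dominated by a single label has low entropy and is a good candidate for a regional Gaussian.

import numpy as np

def label_entropy(labels):
    # Shannon entropy (in bits) of the label distribution within one region.
    # A pure region (all one label) has entropy 0; an even two-label mix has
    # entropy 1.  Illustration only -- not the library's implementation.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

print(label_entropy(["green"] * 9 + ["magenta"]))      # ~0.47, nearly pure
print(label_entropy(["green"] * 5 + ["magenta"] * 5))  # 1.0, maximally mixed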
Parameters:

parsimonious : bool
    Whether to use the parsimonious version of CDER, where a region is ignored once its entropy has reached a local minimum. For most datasets, parsimonious=False provides a better classifier, but it is more expensive to evaluate. In many cases, parsimonious=True is good enough, and is significantly faster. Default: True. A short usage sketch follows.
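A rough sketch of the trade-off, assuming the same toy data as the Examples section below (Gaussian counts and scores will vary from run to run):

import numpy as np
from multidim.models import CDER

# Two labelled training clouds, as in the Examples section below.
train_dataL = np.random.rand(100, 2) + np.array([-1.5, -0.5])  # "green"
train_dataR = np.random.rand(200, 2) + np.array([0.5, -0.5])   # "magenta"

# parsimonious=True (the default) stops refining a region once its entropy
# reaches a local minimum; parsimonious=False keeps building Gaussians.
fast_cder = CDER(parsimonious=True)
full_cder = CDER(parsimonious=False)
fast_cder.fit([train_dataL, train_dataR], ["green", "magenta"])
full_cder.fit([train_dataL, train_dataR], ["green", "magenta"])

# The exhaustive model typically keeps more Gaussians, so it is slower to evaluate.
print(len(list(fast_cder.gaussians)), len(list(full_cder.gaussians)))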
See also

multidim.covertree.CoverTree, multidim.covertree.CoverLevel
References
[CDER1] Supervised Learning of Labeled Pointcloud Differences via Cover-Tree Entropy Reduction. https://arxiv.org/abs/1702.07959
[CDER2] CDER, Learning with Friends. https://www.ima.umn.edu/2016-2017/DSS9.6.16-5.30.17/26150

Examples
We’ll make a simple dataset with one “green” pointcloud sampled from a uniform distribution on the 1x1 square centered at (-1,0), and one “magenta” pointcloud sampled from a uniform distribution on the 1x1 square centered at (1, 0).
>>> import numpy as np
>>> import multidim
>>> from multidim.models import CDER
>>> train_dataL = np.random.rand(100,2) + np.array([-1.5, -0.5])  # for green
>>> # dataL.mean(axis=0) # should be near (-1, 0)
>>> train_dataR = np.random.rand(200,2) + np.array([0.5, -0.5])  # for magenta
>>> # dataR.mean(axis=0) # should be near (+1, 0)
>>> cder = CDER(parsimonious=True)  # prepare a classifier
>>> cder.fit([train_dataL, train_dataR], ["green", "magenta"])  # this runs CoverTree
>>> for g in cder.gaussians:
...     print(sorted(list(g.keys())))
...     break
['adult', 'count', 'entropy', 'index', 'label', 'level', 'mean', 'radius', 'rotation', 'std', 'weight']
>>> test_dataL = np.random.rand(50,2) + np.array([-1.5, -0.5])  # should be green
>>> test_dataR = np.random.rand(50,2) + np.array([0.5, -0.5])  # should be magenta
>>> cder.predict([test_dataL, test_dataR])  # Guess the labels
array(['green', 'magenta'], dtype=object)
>>> cder.score([test_dataL, test_dataR], ["green", "magenta"])  # Correct?
array([ True,  True], dtype=bool)
A more thorough example is at http://nbviewer.jupyter.org/github/geomdata/gda-public/blob/master/examples/example-cder.ipynb
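Continuing the doctest above, each entry of cder.gaussians is a plain dict with the keys shown there, so the fitted regional Gaussians can be inspected directly (a small sketch; the field values depend on the random training data):

# Inspect the regional Gaussians built by fit(); each is a dict with the
# keys listed in the doctest ('mean', 'std', 'weight', 'label', 'entropy', ...).
for g in cder.gaussians:
    print(g["label"], g["mean"], g["weight"], g["entropy"])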
Methods

CDER.build_gaussian(coverlevel, adult, …)
    Build a Gaussian from the children of this adult in a cover-tree (using only a particular label).
CDER.evaluate(x)
    Evaluate all gaussians against a pointcloud.
CDER.fit(*training)
    Fit this estimator to training data.
CDER.gausscoords([parsimonious])
    This is the Gaussian-building heart of the CDER algorithm.
CDER.get_params([deep])
    Pass original kwargs internally.
CDER.plot(canvas[, style])
    Plot a CDER model, using matplotlib or bokeh.
CDER.predict(pointclouds)
    Predict labels of given pointclouds, based on previous training data fed to fit().
CDER.runit(pointclouds)
CDER.score(pointclouds, labels)
    Score predicted labels against known labels.
CDER.set_params(**params)
    Set the parameters of this estimator.
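Because CDER follows the scikit-learn estimator conventions listed above (get_params / set_params), hyperparameters can be inspected and changed after construction; a brief sketch, noting that the exact contents returned by get_params() may differ:

from multidim.models import CDER

cder = CDER(parsimonious=True)
print(cder.get_params())             # the original constructor kwargs
cder.set_params(parsimonious=False)  # switch to the exhaustive variant before calling fit()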