multidim.models.CDER
class multidim.models.CDER(stop_level=None, **kwargs)

The CDER (Cover-Tree Differencing for Entropy Reduction) algorithm for supervised machine learning of labelled cloud collections. This uses the “Cover Tree with Friends” algorithm along with an entropy computation to build a regional classifier for labelled pointclouds. It relies on the multidim.covertree.CoverTree and multidim.covertree.CoverLevel data structures. See the paper [CDER1] and the talk [CDER2].
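To make the entropy-reduction idea concrete, here is a minimal, self-contained sketch (not the library's implementation) of the kind of label-entropy computation CDER performs on each cover-tree region: a region dominated by a single label has low entropy and is a good candidate for a regional Gaussian.

import numpy as np

def label_entropy(labels):
    # Shannon entropy (in bits) of the label distribution within one region.
    # A pure region (all one label) has entropy 0; an even two-label mix has
    # entropy 1.  Illustration only -- not the library's implementation.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

print(label_entropy(["green"] * 9 + ["magenta"]))      # ~0.47, nearly pure
print(label_entropy(["green"] * 5 + ["magenta"] * 5))  # 1.0, maximally mixed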
Parameters:

parsimonious : bool
    Whether to use the parsimonious version of CDER, where a region is ignored once its entropy has reached a local minimum. For most datasets, parsimonious=False provides a better classifier, but it is more expensive to evaluate. In many cases, parsimonious=True is good enough, and is significantly faster. Default: True. A short usage sketch follows.
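A rough sketch of the trade-off, assuming the same toy data as the Examples section below (Gaussian counts and scores will vary from run to run):

import numpy as np
from multidim.models import CDER

# Two labelled training clouds, as in the Examples section below.
train_dataL = np.random.rand(100, 2) + np.array([-1.5, -0.5])  # "green"
train_dataR = np.random.rand(200, 2) + np.array([0.5, -0.5])   # "magenta"

# parsimonious=True (the default) stops refining a region once its entropy
# reaches a local minimum; parsimonious=False keeps building Gaussians.
fast_cder = CDER(parsimonious=True)
full_cder = CDER(parsimonious=False)
fast_cder.fit([train_dataL, train_dataR], ["green", "magenta"])
full_cder.fit([train_dataL, train_dataR], ["green", "magenta"])

# The exhaustive model typically keeps more Gaussians, so it is slower to evaluate.
print(len(list(fast_cder.gaussians)), len(list(full_cder.gaussians)))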
See also

multidim.covertree.CoverTree, multidim.covertree.CoverLevel
References
[CDER1] Supervised Learning of Labeled Pointcloud Differences via Cover-Tree Entropy Reduction. https://arxiv.org/abs/1702.07959
[CDER2] CDER, Learning with Friends. https://www.ima.umn.edu/2016-2017/DSS9.6.16-5.30.17/26150

Examples
We’ll make a simple dataset with one “green” pointcloud sampled from a uniform distribution on the 1x1 square centered at (-1,0), and one “magenta” pointcloud sampled from a uniform distribution on the 1x1 square centered at (1, 0).
>>> import numpy as np
>>> import multidim
>>> from multidim.models import CDER
>>> train_dataL = np.random.rand(100,2) + np.array([-1.5, -0.5])  # for green
>>> # dataL.mean(axis=0) # should be near (-1, 0)
>>> train_dataR = np.random.rand(200,2) + np.array([0.5, -0.5])  # for magenta
>>> # dataR.mean(axis=0) # should be near (+1, 0)
>>> cder = CDER(parsimonious=True)  # prepare a classifier
>>> cder.fit([train_dataL, train_dataR], ["green", "magenta"])  # this runs CoverTree
>>> for g in cder.gaussians:
...     print(sorted(list(g.keys())))
...     break
['adult', 'count', 'entropy', 'index', 'label', 'level', 'mean', 'radius', 'rotation', 'std', 'weight']
>>> test_dataL = np.random.rand(50,2) + np.array([-1.5, -0.5])  # should be green
>>> test_dataR = np.random.rand(50,2) + np.array([0.5, -0.5])  # should be magenta
>>> cder.predict([test_dataL, test_dataR])  # Guess the labels
array(['green', 'magenta'], dtype=object)
>>> cder.score([test_dataL, test_dataR], ["green", "magenta"])  # Correct?
array([ True,  True], dtype=bool)
A more thorough example is at http://nbviewer.jupyter.org/github/geomdata/gda-public/blob/master/examples/example-cder.ipynb
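Continuing the doctest above, each entry of cder.gaussians is a plain dict with the keys shown there, so the fitted regional Gaussians can be inspected directly (a small sketch; the field values depend on the random training data):

# Inspect the regional Gaussians built by fit(); each is a dict with the
# keys listed in the doctest ('mean', 'std', 'weight', 'label', 'entropy', ...).
for g in cder.gaussians:
    print(g["label"], g["mean"], g["weight"], g["entropy"])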
Methods

CDER.build_gaussian(coverlevel, adult, …)
    Build a Gaussian from the children of this adult in a cover-tree (using only a particular label).
CDER.evaluate(x)
    Evaluate all gaussians against a pointcloud.
CDER.fit(*training)
    Fit this estimator to training data.
CDER.gausscoords([parsimonious])
    This is the Gaussian-building heart of the CDER algorithm.
CDER.get_params([deep])
    Pass original kwargs internally.
CDER.plot(canvas[, style])
    Plot a CDER model, using matplotlib or bokeh.
CDER.predict(pointclouds)
    Predict labels of given pointclouds, based on previous training data fed to fit().
CDER.runit(pointclouds)
CDER.score(pointclouds, labels)
    Score predicted labels against known labels.
CDER.set_params(**params)
    Set the parameters of this estimator.
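Because CDER follows the scikit-learn estimator conventions listed above (get_params / set_params), hyperparameters can be inspected and changed after construction; a brief sketch, noting that the exact contents returned by get_params() may differ:

from multidim.models import CDER

cder = CDER(parsimonious=True)
print(cder.get_params())             # the original constructor kwargs
cder.set_params(parsimonious=False)  # switch to the exhaustive variant before calling fit()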