multidim.covertree.CoverTree¶
-
class
multidim.covertree.
CoverTree
(pointcloud, ratio=0.41421356237309503, exchange_teens=True, sort_orphans_by_mean=True)[source]¶ An efficient and convenient implementation of the “Cover Tree with Friends” algorithm. This implementation follows the notation and terminology of the paper [CDER1] as carefully as possible; they were written in concert.
A CoverTree is an Python iterator object [iter1] [iter2]. The
__next__()
and__getitem__()
methods yield theCoverLevel
with that index. The entire “friends” algorithm happens inmultidim.covertree.CoverTree.__next__()
Parameters: - pointcloud :
multidim.PointCloud
The original data from which to construct a cover tree. Note that the labeling/weighting/indexing system requires the use of
multidim.PointCloud
input, not merely anumpy.ndarray
. However,CoverTree
ignores all of the higher strata (edges, faces, and so on) of themultidim.PointCloud
. Only the points in stratum[0] are used.- ratio : float
Ratio \(\theta\) to shrink radii by at each step. Must satisfy \(0<\theta<1\). Good values are
0.5
orratio_Ag
orratio_Au
. Default:ratio_Ag
- exchange_teens : bool
Should teens be exchanged at each step, using Type-2 friends? Default:
True
- sort_orphans_by_mean : bool
Should orphans be re-ordered by their proximity to weighted mean of the labels? This is particularly useful for improving the cross-validation score of the
multidim.models.CDER
classifier. Disable for speed ordering of adults is irrelevant for your needs. Default:True
Yields: - :class:`multidim.covertree.CoverLevel`
From level 0 (one ball) until all points are separated. Each
CoverLevel
is cached once computed.
Notes
This section is excerpted and condensed from [CDER1]
Definition
Let \(X\) be a finite subset of \(\mathbb{R}^d\). The purpose of a cover tree is to build a filtration \(\emptyset \subset CL_0 \subset CL_1 \subset \cdots \subset CL_{\text{max}} = X\) by covering it with balls of smaller and smaller radius centered at points in the set. The points in \(CL_\ell\) are called the adults at level \(\ell\).
Specifically, a cover tree is a filtration of \(X\) with the following additional properties:
- \(CL_0\) contains a single point, \(a_0\). (see
_adult0
) - There is a radius \(r_0\) (see
_r0
) such that \(X\) is contained in the ball \(B(a_0, r_0)\) of radius \(r_0\) around \(a_0\) - There is a real number \(0< \theta < 1\) (see
ratio
) such that, for every \(\ell\), the set \(X\) is a subset of \(\cup_{a_i \in CL_\ell} B(a_i, r_\ell)\) where \(r_\ell = r_0 \theta^\ell\) - For each \(\ell\), if \(a_i, a_j \in CL_\ell\), then | a_i - a_j| > r_ell. No two adults lie in the same ball.
- For each \(\ell\), each point \(x \in X\) is assigned to a guardian \(a_i \in CL_\ell\) such that \(x\) lies in the ball \(B(a_i, r_\ell)\). We say \(x\) is a child of \(a_i\) at level \(\ell\). Each \(a_i \in CL_\ell\) is its own guardian and its own child.
- There is a tree structure on the (level, adult) pairs of the filtration \((\ell, a_i)\), where the tree relation \((\ell, a_i) \to (\ell+1, a_k)\) holds if \(a_k\) was a child of \(a_i\) at level \(\ell\). We say \(a_k\) is a successor of \(a_i\), and \(a_i\) is a predecessor of \(a_k\). Note that \((\ell, a_i) \to (\ell+1, a_i)\) for all \(a_i \in CL_\ell\)
Extending the maturation/reproduction metaphor of adults, children, and guardians above, a child \(x\) with guardian \(a_i\) at level \(\ell\) is called a teen if \(\frac12 r_\ell < \|a_i - x\|\), and it is called a youngin if \(\|a_i - x\| \leq \frac12 r_\ell\). The point of this is that we may require the additional condition:
- (Optional) On the previous condition, we can additionally require that
each \(x\) is the child of the nearest adult, if it lies in the
intersection of two or more balls of \(B(a_i, r_\ell)\). If two
adults are equally distant, choose the one of the lowest index. This
option is enforced by the
exchange_teens
flag.
When changing from level math:ell to level \(\ell+1\), the radius of each ball shrinks to \(r_{\ell+1} = \theta r_\ell\). Children farther than \(r_{\ell+1}\) from their guardians become orphans. We must decide whether these orphans should be adopted by other adults at level \(\ell+1\), or if the orphans should be emancipated as new adults at level \(\ell+1\). That is, the newly emancipated adults at level \(\ell+1\) comprise the cohort (see
cohort
) at level $ell+1$.We say \(a_j \in CL_{\ell}\) is an elder of \(a_k \in CL_{\ell+1}\) if the distance \(\|a_j - a_k\|\) is sufficiently small that \(a_j\) could have been emancipated from \(a_k\) between levels \(\ell\) and \(\ell+1\). That is, if the tree structure were unknown, then elders of \(a_j$ are the possible predecessors. If :math:\) is its own predecessor (because it was already an adult in \(CL_{\ell}\), then the only elder of \(a_k\) is itself.
Example
Consider this point cloud in \(\mathbb{R}^2\)
\[X = \{(0,0.1),(1,2),(0,1),(0,0),(2,2),(2,2.2),(3,3),(1,1)\}\]We index these points from 0 to 7 in the given order. We have the following filtration
\[\begin{split}&CL_0 = \{7\}\\ &CL_1 = \{3, 4, 6, 7\}\\ &CL_2 = \{1, 2, 3, 4, 6, 7\}\\ &CL_3 = \{1, 2, 3, 4, 6, 7\}\\ &CL_4 = \{1, 2, 3, 4, 5, 6, 7\}\\ &CL_5 = \{0, 1, 2, 3, 4, 5, 6, 7\}\\\end{split}\]We have the following cover ball radii
\[\begin{split}&r_0 = 2\sqrt{2}\\ &r_1 = \sqrt{2}\\ &r_2 = \frac{\sqrt{2}}{2}\\ &r_3 = \frac{\sqrt{2}}{4}\\ &r_4= \frac{\sqrt{2}}{8}\\ &r_5 = \frac{\sqrt{2}}{16}\end{split}\]Here we have \(a_0 = (1,1)\), \(r_0 = 2\sqrt{2}\), and \(\theta = 1/2\).
The Friends Algorithm
Our algorithm is based upon the concept of friends. To each adult there will be associated three types of friends. Types 1, 2, and 3 are used to build the
CoverTree
in typically linear time.Let \(a_i \in CL_\ell\), that is, \(a_i\) is an adult at level \(\ell\). Define the following thresholds
\[\begin{split}T_1(\ell) &= (2 + \theta)r_l \\ T_2(\ell) &= (2 + 2\theta)r_l \\ T_3(\ell) &= \frac{2}{1 - \theta}r_l.\end{split}\]It is elementary to show that \(T_1(l) < T_2(l) < T_3(l)\). Moreover, we have the recursion relation \(T_3(l) < T_3(l-1)\).
Each level of the filtreation and all of this associated data is stored in a
CoverLevel
object.The algorithm works like this, using a “reproduction” metaphor:
Level 0 (see
covertree[0]
of typeCoverLevel
) has a single adult. All points are its children. Its only friends are itself.…
Level \(\ell\) (see
covertree[l]
of typeCoverLevel
) has known adults, friends1, friends3, friends3, and children. We now compute level \(\ell+1.\) in__next__()
1. Shrink the radius by a factor of \(\theta\). Some children become orphans.
2. Orphans are adopted or become newly emanicpated adults. This uses \(T_1(\ell)\).
3. If
exhange_teens is True
, then children who are teens are re-assigned to the closest possible adult. This uses \(T_2(\ell)\).- Compute new friends3.
- Use new friends3 to compute new friends1, friends2, friends3.
Level \(\ell+1\) (see
covertree[l+1]
of typeCoverLevel
) has known adults, friends1, friends3, friends3, and children. We now compute level \(l+2\)…
Stop when all points are adults.
Levels are evaluated lazily and cached. For example, if no levels have been computed, then
covertree[3]
will compute levels 0, 1, 2, and 3. Thencovertree[5]
will use those values for 0, 1, 2, 3 to compute 4 and 5.References
[CDER1] (1, 2, 3) Supervised Learning of Labeled Pointcloud Differences via Cover-Tree Entropy Reduction https://arxiv.org/abs/1702.07959 [CDER2] CDER, Learning with Friends https://www.ima.umn.edu/2016-2017/DSS9.6.16-5.30.17/26150 [iter1] (1, 2) https://docs.python.org/3/library/stdtypes.html?highlight=iterator#iterator-types [iter2] (1, 2) https://wiki.python.org/moin/Iterator Examples
>>> pc = PointCloud.from_multisample_multilabel( ... [np.array([[0,0.1],[1,2],[0,1],[0,0],[2,2],[2,2.2],[3,3],[1,1]])], [None]) >>> ct = CoverTree(pc, ratio=0.5, sort_orphans_by_mean=False) >>> cl=ct.next() >>> list(cl.adults) [7] >>> pc.coords.values[7,:] array([ 1., 1.]) >>> cl Level 0 using 1 adults at radius 2.8284271247... >>> ct.next() Level 1 using 2 adults at radius 1.4142135623... >>> for cl in ct: ... print(cl.exponent, list(cl.adults)) 0 [7] 1 [7, 5] 2 [7, 5, 0, 1, 2, 6] 3 [7, 5, 0, 1, 2, 6] 4 [7, 5, 0, 1, 2, 6, 4] 5 [7, 5, 0, 1, 2, 6, 4, 3] >>> ct.cohort array([2, 2, 2, 5, 4, 1, 2, 0])
Attributes: - pointcloud :
multidim.PointCloud
The original dataset.
- ratio :
numpy.float64
Ratio \(\theta\) by which to shrink the ball radius between levels.
- _r0 :
numpy.float64
The initial radius at level 0.
- _adult0 :
numpy.int64
The index of the original adult. Typically, this is the index of the point nearest the weighted mean of the
PointCloud
- _levels :
collections.OrderedDict
An ordered dictionary to cache the levels computed so far, keyed by the index. Typically, a user would never access this directly. Insead, use
covertree[i]
- cohort :
numpy.ndarray
An array of
numpy.int64
, which keeps track of the cohort (that is, the level in the filtration) of each point. If a point has not been born as an adult yet, the value is -1- level_pointer : int
Index of the currently referenced
CoverLevel
, for iteration purposes. Setting this is like usingfile.seek()
on file objects. Usually, you don’t want to mess with it, but it is used internally inmutlidim.models.CDER
for comparing entropy between levels.- N : int
The number of points in
self.pointcloud
- allpoints :
numpy.ndarray
The raw NumPY array underlying
self.pointcloud
.
CoverTree.make_edges
([min_distance, …])Iterate over the edges between the points of the underlying PointCloud
, where min_distance < length <= max_distance.CoverTree.next
()See __next__()
CoverTree.plot
(canvas, **kwargs)Interactive plot of a CoverTree, with dynamic computation of levels. CoverTree.plot_tree
(canvas[, show_balls, …])Plot the tree of a CoverTree. CoverTree.reset
()Go to level -1. CoverTree.sparse_complex
([level])Make a sparse complex from this CoverTree, using the type-4 friends algorithm. - pointcloud :