M-tree Home Page

The M-tree is an index structure that can be used for the efficient resolution of similarity queries on complex objects to be compared using an arbitrary metric, i.e. a distance function d that satisfies the positivity, symmetry, and triangle inequality postulates. For instance, with the M-tree you can index a set of strings and organize them according to their edit distances (the minimal number of character changes needed to transform one string into another).

More in general, M-tree can be profitably used for content-based retrieval on multimedia data repositories, provided relevant features (e.g., color histograms from still images) have been extracted from the objects. Note that, since metric spaces strictly include vector spaces, M-tree has a far more general applicability than multi-dimensional (also called spatial) access methods, such as the R-tree. Since its first presentation in VLDB 97, M-tree has been successfully used in a number of different research fields, including Multimedia Data Indexing, Data Mining, and Case-Based Reasoning.

The basic version of the M-tree can handle simple range and nearest neighbors queries, where the user provides a query object and a minimum distance threshold (resp. the number of nearest neighbors). Since in the general case assessing the similarity between two objects can be a very time-consuming task, in the design of M-tree we have paid particular care to minimize, besides I/O costs (page reads), also CPU costs (distance computations). Actual performance of M-tree can also be accurately predicted by using the cost models we have developed.

Extensions

M-tree functionalities are evolving over time. Starting from the basic version, we have extended it to deal with a wider range of similarity queries. In particular, M-tree now supports:

Complex similarity queries (queries consisting of more than one similarity predicate), where all the similarity predicates refer to a single feature.
Approximate nearest neighbor queries, where the user has the possibility to trade-off quality of the result for search costs.

Implementation

M-tree implementation is based on the GiST C++ package, and the version numbering follows that of GiST libraries (this is why numbering of releases started from 0.9).