Chapter 3: Data Storage and Access Methods
•
Title:
The R* Tree: An Efficient and Robust Access
Method for Points and Rectangles
•
Authors:
N. Beckmann, H. Kriegel, R. Schneider and B. Seeger
•
Pages:
207-216
The R* Tree: An Efficient and Robust Access
Method for Points and Rectangles
•
Problem
–
Problem Statement
–
Why is this problem important?
–
Why is this problem hard?
•
Approaches
–
Approach description, key concepts
–
Contributions (novelty, improved)
–
Assumptions
Problem Statement
–
R* Tree
•
Given
–
Data containing points and rectangles
–
Spatial queries (point, range query, insert, delete)
•
Find
–
An Access Method (Data Structure)
–
A hierarchical organization of rectangles
–
Example from Wikipedia
•
Objectives
–
Efficiency of spatial queries
•
Constraints
–
Balanced tree
–
Each node is a disk page and, unless it is the root, has at least m entries (m = minimum number of entries per node).
–
Root has at least two children unless it is a leaf
–
Efficiency metric = number of disk pages accessed
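The constraints above can be illustrated with a minimal sketch of a node layout and a checker for the stated invariants. The names `Entry`, `Node`, and `height_if_valid`, and the page capacity `M` (maximum entries per node), are assumptions made for this illustration and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)

@dataclass
class Entry:
    mbr: Rect                       # bounding rectangle of the child or data object
    child: Optional["Node"] = None  # None in a leaf entry (a real index would store an object id)

@dataclass
class Node:
    is_leaf: bool
    entries: List[Entry] = field(default_factory=list)

def height_if_valid(node: Node, m: int, M: int, is_root: bool = True) -> int:
    """Return the subtree height if the constraints hold, else raise ValueError."""
    if is_root:
        low = 0 if node.is_leaf else 2   # root has at least two children unless it is a leaf
    else:
        low = m                          # every other node holds at least m entries
    if not (low <= len(node.entries) <= M):
        raise ValueError("node has an invalid number of entries")
    if node.is_leaf:
        return 1
    heights = {height_if_valid(e.child, m, M, False) for e in node.entries}
    if len(heights) != 1:                # balanced tree: all leaves on the same level
        raise ValueError("leaves are not all on the same level")
    return heights.pop() + 1

# Usage: a root with two leaf children, m = 2 and M = 4 (hypothetical values).
leaf1 = Node(True, [Entry((0, 0, 1, 1)), Entry((2, 2, 3, 3))])
leaf2 = Node(True, [Entry((5, 5, 6, 6)), Entry((7, 7, 8, 8))])
root = Node(False, [Entry((0, 0, 3, 3), leaf1), Entry((5, 5, 8, 8), leaf2)])
print(height_if_valid(root, m=2, M=4))
```

Because each node corresponds to one disk page, the height returned here is also the number of pages a query reads along a single root-to-leaf path.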
Why is this problem important?
•
Multi-dimensional applications
–
Large geographic data, e.g., map objects like countries occupy
regions of non-zero size in two dimensions.
–
Common real-world usage: “Find all museums within 2 miles of
my current location”.
–
CAD
–
…
•
Many DBMS servers support spatial indices
–
Oracle, IBM DB2, …
Why is this problem hard?
•
B-tree split methods are ineffective in 2 dimensions
–
Ex.: sorting (rectangles have no total order that preserves spatial proximity)
•
Size variation across data rectangles
–
Large rectangles limit split options!
•
Non-uniform data distribution over space
•
Dynamic Access Method
–
Insertions and deletions
–
Overlapping directory rectangles => multiple search paths
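The last point can be made concrete with a small range-search sketch. It is a simplified illustration, not the paper's code: nodes are plain dictionaries, rectangles are `(xmin, ymin, xmax, ymax)` tuples, and the names `intersects` and `range_search` are assumptions. The counter tracks how many nodes (disk pages) a query touches.

```python
def intersects(a, b):
    """True if rectangles a and b share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def range_search(node, query, stats):
    """Collect ids of data rectangles intersecting query; count visited pages."""
    stats["pages"] += 1
    hits = []
    for mbr, child in node["entries"]:
        if intersects(mbr, query):
            if node["leaf"]:
                hits.append(child)  # child is the data object id in a leaf
            else:
                hits.extend(range_search(child, query, stats))
    return hits

# Two leaves whose directory rectangles overlap in the region (4,4)-(6,6).
leaf_a = {"leaf": True, "entries": [((0, 0, 2, 2), "a1"), ((3, 3, 6, 6), "a2")]}
leaf_b = {"leaf": True, "entries": [((4, 4, 5, 5), "b1"), ((8, 8, 9, 9), "b2")]}
root = {"leaf": False, "entries": [((0, 0, 6, 6), leaf_a), ((4, 4, 9, 9), leaf_b)]}

stats = {"pages": 0}
print(range_search(root, (4.5, 4.5, 4.5, 4.5), stats), stats)  # point inside the overlap
stats = {"pages": 0}
print(range_search(root, (1, 1, 1, 1), stats), stats)          # point outside the overlap
```

A point query inside the overlap has to descend into both subtrees, while a point outside it follows a single path. This is the cost that reducing overlap is meant to avoid.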
Novelty of Contribution
•
Related Work
–
Traditional one-dimensional indexing structures (e.g., hash, B-tree)
are not appropriate for multi-dimensional range search
–
B+ tree
•
Represents sorted data in a way that allows for efficient insertion and
removal of elements.
•
Dynamic, multilevel index with maximum and minimum bounds on the
number of keys in each node.
•
Leaf nodes are linked together as a linked list to make range queries easy.
–
R-tree
•
The R-tree is the foundation for many spatial access methods
•
A complex spatial object is represented by its minimum bounding rectangle
while preserving essential geometric properties
•
Overlapping regions
•
Heuristic: minimize the area of each enclosing rectangle in the inner nodes.
Principles of R-tree
Reference: A. Guttman, “R-trees: a dynamic index structure for spatial searching”, 1984
•
Height-balanced tree similar to a B-tree, with index records
in its leaf nodes containing pointers to data objects.
•
Heuristic Optimization: minimize the area of each
enclosing rectangle in the inner nodes.
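A minimal sketch of how the area heuristic can drive insertion: descend into the child whose rectangle needs the least area enlargement to include the new rectangle, breaking ties by smaller area. This follows the usual description of Guttman's subtree choice; the function names and the tuple layout `(xmin, ymin, xmax, ymax)` are assumptions for illustration.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def union(a, b):
    """Smallest rectangle enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def choose_subtree(child_mbrs, new_rect):
    """Index of the child needing the least area enlargement (ties: smaller area)."""
    def cost(i):
        mbr = child_mbrs[i]
        enlargement = area(union(mbr, new_rect)) - area(mbr)
        return (enlargement, area(mbr))
    return min(range(len(child_mbrs)), key=cost)

children = [(0, 0, 4, 4), (5, 5, 6, 6)]
print(choose_subtree(children, (1, 1, 2, 2)))  # already covered by the first child
print(choose_subtree(children, (3, 3, 5, 5)))  # enlargement 9 vs 8
```

Note that the second insertion picks the small rectangle even though the result overlaps the first child; the area criterion alone does not look at overlap.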
Performance Parameters beyond R-tree
•
(Q1) The area covered by a directory rectangle should be minimized.
•
(Q2) The overlap between directory rectangles should be minimized.
•
(Q3) The margin of a directory rectangle should be minimized.
•
(Q4) Storage utilization should be optimized.
•
Intuitions:
–
Reduce overlap between sibling nodes.
–
Reduce traversal of multiple branches for point query
–
Reinserting old entries moves them between neighboring nodes and thus
decreases overlap.
–
Due to more restructuring, fewer splits occur
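The three geometric quantities in Q1 to Q3 can be sketched for axis-aligned rectangles as follows. Margin is taken here as the perimeter (sum of edge lengths); the function names and the tuple layout `(xmin, ymin, xmax, ymax)` are assumptions for illustration.

```python
def area(r):
    """Q1: area covered by the rectangle."""
    return (r[2] - r[0]) * (r[3] - r[1])

def margin(r):
    """Q3: perimeter of the rectangle."""
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))

def overlap(a, b):
    """Q2: area of the intersection of a and b (0 if disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0

a, b = (0, 0, 4, 2), (3, 1, 6, 5)
print(area(a), margin(a), overlap(a, b))

# Same area, different margin: the square is the more compact shape.
print(margin((0, 0, 2, 2)), margin((0, 0, 4, 1)))
```

For a fixed area, the margin is smallest for a square, so minimizing margin pushes directory rectangles toward more quadratic shapes.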
Difference between R-tree and R*-tree
•
Minimization of area, margin, and overlap is crucial to the
performance of R-tree / R*-tree.
•
The R*-tree attempts to reduce all three, using a combination of a
revised node split algorithm and the concept of forced reinsertion at
node overflow. This is based on the observation that R-tree structures
are highly susceptible to the order in which their entries are inserted,
so an insertion-built (rather than bulk-loaded) structure is likely to be
sub-optimal. Deletion and reinsertion of entries allows them to "find" a
place in the tree that may be more appropriate than their original
location.
Improve retrieval performance
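Forced reinsertion can be sketched as a selection step on an overflowing node: rank the entries by the distance of their centers from the center of the node's bounding rectangle, remove the farthest fraction, and insert those entries again from the root. The sketch below covers only the selection. The fraction `p = 0.3` is the value usually quoted from the paper's experiments and should be treated as a tunable assumption, as should all function names.

```python
import math

def center(r):
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)

def bbox(rects):
    """Minimum bounding rectangle of a non-empty list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def pick_reinsert(rects, p=0.3):
    """Split an overflowing node's entries into (kept, to_reinsert)."""
    cx, cy = center(bbox(rects))

    def dist(r):
        x, y = center(r)
        return math.hypot(x - cx, y - cy)

    ordered = sorted(rects, key=dist, reverse=True)  # farthest from the center first
    k = max(1, int(p * len(rects)))                  # number of entries to remove
    return ordered[k:], ordered[:k]

entries = [(4, 4, 5, 5), (5, 5, 6, 6), (4, 5, 5, 6), (0, 0, 1, 1), (9, 8, 10, 10)]
kept, again = pick_reinsert(entries)
print(again, bbox(entries), bbox(kept))
```

Removing the outlier shrinks the node's bounding rectangle, and the removed entry may land in a better-fitting node when it is inserted again, which is how a split can sometimes be avoided.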
Example
[Figure: rectangles R1 to R5 shown in three panels, including the grouping preferred by the R-tree and the grouping preferred by the R*-tree]
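The revised split can be sketched in two steps, following the usual description of the R*-tree split: first choose the split axis with the smallest sum of margins over all candidate distributions, then along that axis choose the distribution with the least overlap, breaking ties by total area. This is a simplified illustration, not the paper's code; the function names, the parameter `m`, and the tuple layout `(xmin, ymin, xmax, ymax)` are assumptions.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def margin(r):
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))

def overlap(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0

def bbox(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def distributions(rects, axis, m):
    """Yield (group1, group2) for every allowed cut of each sort order on axis."""
    for idx in (axis, axis + 2):  # sort by lower bound, then by upper bound
        ordered = sorted(rects, key=lambda r: r[idx])
        for k in range(m, len(rects) - m + 1):  # both groups keep at least m entries
            yield ordered[:k], ordered[k:]

def r_star_split(rects, m):
    # Step 1: split axis = axis with the minimum sum of margins.
    def margin_sum(axis):
        return sum(margin(bbox(g1)) + margin(bbox(g2))
                   for g1, g2 in distributions(rects, axis, m))
    axis = min((0, 1), key=margin_sum)

    # Step 2: on that axis, minimize overlap, then total area.
    def cost(groups):
        b1, b2 = bbox(groups[0]), bbox(groups[1])
        return (overlap(b1, b2), area(b1) + area(b2))
    return min(distributions(rects, axis, m), key=cost)

rects = [(0, 0, 1, 1), (0, 2, 1, 3), (5, 0, 6, 1), (5, 2, 6, 3)]
g1, g2 = r_star_split(rects, m=2)
print(g1, g2)
```

In this example the x-axis yields the smaller margin sum, so the two left rectangles and the two right rectangles end up in separate, non-overlapping groups.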
Validation Methodology
•
Methodology
–
Experiments with simulated workloads
–
Evaluation of design decisions
•
Results
–
R*-tree outperforms variants of R-tree and 2-level grid file.
–
R*-tree is robust against non-uniform data distributions.
Summary
•
Paper’s focus
–
R*-tree
–
implementations and performance
•
Ideas
–
Heuristic Optimizations (p. 208)
•
Reduction of area, margin, and overlap of the directory rectangles
–
Better Storage Utilization (p. 211)
•
Forced Reinsertion (splits can be prevented)
•
Experimental comparison
–
Using many data distributions
Assumptions, Rewrite today
•
Assumptions
–
Indexing data in two-dimensional space
–
Bulk load and bulk reorganization not available
–
Concurrency control and recovery costs are negligible
•
Reinserts during split!
•
Rewrite today
–
Bulk load of rectangles
–
Compare with newer methods
•
R+ tree (disjoint siblings), Hilbert R-tree
–
Analytical results
•
Formally compare R*-tree with alternatives