Design problem
• Design strategies(top-down, bottom-up)
• Fragmentation (horizontal and vertical)
• Allocation and replication of fragments, optimality, heuristics


Top-down approach
– Designing systems from scratch
– Homogeneous systems

Bottom-up approach
– The databases already exist at a number of sites
– The databases should be connected to solve common tasks


Distribution design is the central part of the design in DDBMSs (the other tasks are
similar to traditional databases)
– Objective: Design the LCSs by distributing the entities (relations) over the sites
– Two main aspects have to be designed carefully
∗ Fragmentation
· Relation may be divided into a number of sub-relations, which are distributed
∗ Allocation and replication
· Each fragment is stored at site with ”optimal” distribution
· Copy of fragment may be maintained at several sites


Fragmentation aims to improve:
– Reliability
– Performance
– Balanced storage capacity and costs
– Communication costs
– Security


Correctness Rules of Fragmentation
• Completeness
– Decomposition of relation R into fragments R 1 , R 2 , . . . , R n is complete iff each
data item in R can also be found in some R i .
• Reconstruction
– If relation R is decomposed into fragments R 1 , R 2 , . . . , R n , then there should exist
some relational operator ∇ that reconstructs R from its fragments, i.e.,
R = R 1 ∇ . . . ∇R n
∗ Union to combine horizontal fragments
∗ Join to combine vertical fragments
• Disjointness
– If relation R is decomposed into fragments R 1 , R 2 , . . . , R n and data item d i
appears in fragment R j , then d i should not appear in any other fragment R k , k = j
(exception: primary key attribute for vertical fragmentation)
∗ For horizontal fragmentation, data item is a tuple
∗ For vertical fragmentation, data item is an attribute

Replication and Allocation
• Replication: Which fragements shall be stored as multiple copies?
– Complete Replication
∗ Complete copy of the database is maintained in each site
– Selective Replication
∗ Selected fragments are replicated in some sites

Rule of thumb: read/update query > or equal to 1, replication is better.
• Allocation: On which sites to store the various fragments?
– Centralized
∗ Consists of a single DB and DBMS stored at one site with users distributed across
the network
– Partitioned
∗ Database is partitioned into disjoint fragments, each fragment assigned to one site

Required information for allocation
– Database Information
∗ selectivity of fragments
∗ size of a fragment
– Application Information
∗ RR ij : number of read accesses of a query q i to a fragment F j
∗ U R ij : number of update accesses of query q i to a fragment F j
∗ u ij : a matrix indicating which queries updates which fragments,
∗ r ij : a similar matrix for retrievals
∗ originating site of each query
– Site Information
∗ U SC k : unit cost of storing data at a site S k
∗ LP C k : cost of processing one unit of data at a site S k
– Network Information
∗ communication cost/frame between two sites
∗ frame size