Topology Optimization for DHT-based Application Layer Multicast

  • 49 trang
  • file .pdf
Table of Contents
Abstract 1
Acknowledgements iii
Abstract 1
1 Introduction 2
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4 Thesis structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Background 6
2.1 Multicast . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.1.1 IP Multicast . . . . . . . . . . . . . . . . . . . . . . 7
2.1.1.2 Application Layer Multicast . . . . . . . . . . . . . 8
2.1.2 Application layer multicast protocols . . . . . . . . . . . . . . 9
2.1.2.1 Application Domain . . . . . . . . . . . . . . . . . . 10
2.1.2.2 Deployment Level . . . . . . . . . . . . . . . . . . . 10
2.1.2.3 Group Management . . . . . . . . . . . . . . . . . . 10
2.1.2.4 Routing Mechanism . . . . . . . . . . . . . . . . . . 11
2.2 DHT-based multicast . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2.1 Introduction of P2P Networks . . . . . . . . . . . . . . . . . . 12
2.2.1.1 Unstructured P2P Network model . . . . . . . . . . 13
2.2.1.2 Hybrid P2P Network model . . . . . . . . . . . . . . 14
2.2.1.3 Structured P2P Network model . . . . . . . . . . . . 15
2.2.2 DHT-based structure P2P networks . . . . . . . . . . . . . . . 16
iv
TABLE OF CONTENTS v
2.2.2.1 Chord Network . . . . . . . . . . . . . . . . . . . . . 18
2.2.2.2 Content Adressable Network . . . . . . . . . . . . . . 20
2.2.3 DHT-based multicast . . . . . . . . . . . . . . . . . . . . . . . 22
2.2.3.1 CAN-based multicast . . . . . . . . . . . . . . . . . . 22
2.2.3.2 Chord-based multicast . . . . . . . . . . . . . . . . . 24
2.2.3.3 Scribe . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2.4 Topology optimization issues for DHT-based multicast . . . . 26
2.3 Related works on topology optimization for DHT-based multicast . . 26
2.3.1 SplitStream . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3.2 Capacity Aware Multicast based on Overlay Network - CAM-
Chord . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.3.3 DHT-based lightweight broadcast algorithms in large-scale com-
puting infrastructures . . . . . . . . . . . . . . . . . . . . . . . 28
3 Bandwidth Adaptive Multicast over Chord : BAM Chord 31
3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.1.1 Node identifier . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.1.2 Finger table . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.2 Network Construction . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.3 Multicast method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4 Simulations and Evaluations 38
4.1 Simulation description . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.2 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.2.1 The depth of multicast tree . . . . . . . . . . . . . . . . . . . 38
4.2.2 Control Overhead . . . . . . . . . . . . . . . . . . . . . . . . . 40
5 Conclusions 42
List of Figures
2.1 Type of transmissions . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 IP Multicast . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3 A Comparison between IP multicast and application-layer multicast . 9
2.4 P2P Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.5 A unstructured P2P Network with a server . . . . . . . . . . . . . . . 13
2.6 Gnutella-like P2P . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.7 An example of Hybrid P2P Model . . . . . . . . . . . . . . . . . . . . 15
2.8 Scheme of a classic Hash Table . . . . . . . . . . . . . . . . . . . . . 16
2.9 Scheme of a Distributed Hash Table (DHT) . . . . . . . . . . . . . . 17
2.10 A Chord ring with 6 bit identifier. Notice how the finger table is
organized and how K54 is looked up following Chord’s algorithm . . . 18
2.11 Example of Chord Finger table . . . . . . . . . . . . . . . . . . . . . 20
2.12 Example of Content Addressable Network . . . . . . . . . . . . . . . 21
2.13 Smart multicast in CAN. The dot represents the starting node . . . . 23
2.14 An example of Chord-based multicast method. In this example, Node
1 sends messages to all nodes in the Chord ring. . . . . . . . . . . . . 24
2.15 A simple example illustrating the basic approach of SplitStream.
Original content is split into two stripes and correlatively, an inde-
pendent multicast tree is built for each stripes. A node is a leaf in
one tree and a interior in the other . . . . . . . . . . . . . . . . . . . 27
2.16 An example of CAM- Chord, neighbors of x. N = [0..31], cx = 3 . . . 28
2.17 Balanced Distributed broadcast tree construction using token in 16-
node Chord . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.18 Balanced Distributed broadcast tree construction using partition in
an 11-node Chord . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.1 An example of CAM- Chord, neighbors of x. N = [0..31], cx = 3 . . . 33
vi
LIST OF FIGURES vii
3.2 An example of CAM- Chord, neighbors of x. N = [0..31], cx = 3 . . . 35
3.3 An example of CAM- Chord, neighbors of x. N = [0..31], cx = 3 . . . 37
4.1 Average path length . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.2 Maximum path length . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.3 Average path length in Chord, CAM-Chord and BAM-Chord after
32 time steps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.4 Average link number per node in Chord . . . . . . . . . . . . . . . . 41
4.5 Average link in Chord, CAM-Chord and BAM-Chord after 32 time
steps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
List of Tables
2.1 Definition of variables for node n, using m-bit identifiers . . . . . . . 19
viii
Abstract
In recent years, Distributed Hash Table (DHT) becomes active and ongoing area of
research at a lot of universities and labs. DHT has many advantages: Decentraliza-
tion, scalability, fault tolerance, load balancing, data integrity, and performance,...
Those properties make DHTs are very suitable for deploying multicast services at
application layer and in fact, DHT-based network such as CAN, Chord, Pastry,
Tapestry, etc can be used to implement Internet-scale application layer multicast.
However, early DHT-based multicast systems are insufficient in addressing all
of these issues: Heterogeneous node capacity, large- scale multicast and dynamic
membership. Moreover, in those system, when one node joins into system through an
arbitrary way, some factors are not considered: nodes bandwidth, nodes positon on
DHT network (i.e node identifiers), thus, the multicast tree can be built inefficiently
and not balance in structure. The solution for assigning an appropriate number of
child nodes to each node is far from optimal in term of bandwidth: If the number
of child nodes is too high, low capacity node will be overloaded, therefore slows the
entire session multicast down. If the number of child nodes is too low, high capacity
nodes will be used inefficiently.
In this thesis, we study the method to optimize topology for DHT-based multi-
cast. We propose a DHT- based bandwidth adaptive multicast system that forcus
on host heterogeneity, scalibility, fault tolerate. In our system, nodes bandwidth is
firstly considered, result of this process is the basis for determining the level of the
node and correlatively caculating nodes identify. Level of a node is used to define
maximum number of its child nodes. As a result, in our model, each node is assigned
an optimal numbers of child nodes to forward multicast data. Thus, our method can
make tradeoff between depth of the multicast tree and bandwidth of every node and
take advandtages of DHTs in maintaining multicast tree in churn overlay. System
chosen for implementation and avaluation is Chord. This model is called Bandwidth
Adaptive Multicast over Chord: BAM-Chord.
1
Chapter 1
Introduction
1.1 Motivation
Today, Internet continues to grow rapidly, many communication requirements that
we must deal with. Cause of the explosion in data communications, demand on
transferring large volumes of data to many destination quickly increases. To solve
bandwidth problems, multicast is a effective solution. As the most cost-effective
way of delivering the same data to a multiple receivers at the same time, multicast
is suited to a large number of applications as mobile Internet, e-commerce, content
rich applications, and multi-media services: television broadcast, video conferencing.
The word ”multicast” is typically used to refer to network layer multicast - IP
multicast and was proposed about some decades ago. However, IP multicast still
has not been widely deployed because of various technical and administrational
reasons. The disadvantages of implementing multicast at the IP level has led to
an interesting application-level multicast. Application layer multicast refers to the
implementation of multicast capability at the application layer instead of network
layer. A number of application-level multicast systems have been proposed that are
built using structured peer-to-peer (p2p) overlays.
Originally, distribute hash table (DHTs) were developed with applications like
peer-to-peer file sharing. Recent years, there has been considerable interest in ap-
plying DHTs to overlay multicast applications since it has many advantages: Decen-
tralization, Scalability, Fault tolerance, Load balancing, and performance,...Those
properties make DHTs are very suitable for deploying multicast services at applica-
tion layer and in fact, DHT-based network such as CAN, Chord, Pastry, Tapestry,
2
1.1. Motivation 3
etc has been already used to implement Internet -scale applicastion layer muticast.
However, early DHT-based multicast systems are insufficient in addressing a number
of technical issues:
• Heterogeneous output bandwidth of Node: Since the number of child nodes
of a node in a multicast tree is decided based on the DHT algorithm without
consideration of nodes bandwidth capacity, a node with low bandwidth may
become a bottleneck if it is an internal node of a multicast tree
• Effective bandwidth utilization: If a node with high bandwidth is a leaf node,
the system can not utilize the bandwidth capacity of the node to deliver mul-
ticast messages.
• Dynamic membership: The optimization of multicast trees must be achieved
even when member nodes join or leave at anytime.
To limit the multicasting load on a node, M. Castro, el al. propose a solution
in SplitStream (M.Castro, ), where one node informs its out-degree. When it re-
ceives a futher join requests, it pushdown this request to its children. Nevertheless,
Bharambe, et al. explored and found heterogeneity issue is not address with Split-
Stream (Feb, February 24 25 2005). With CAM CAM-Chord and CAM-Koorde
designs in (6-1, 6 10 June 2005), heterogeneity is tackled by allowing node to al-
ter out-degree according to its capacity. However, they did not describe how to
maintain topology when the out-degree limits. Therefore, heterogeneity issue re-
mains open, and should be addressed before deploying DHTs for high-bandwidth
multicasting applications. S. Ratnasamy, M. Handley, R. Karp, and S. Shenker
proposed CAN-based multicast (S. Ratnasamy & Shenker, Nov 7 9 2001). That
was the first application-level multicast scheme scales to groups of several thou-
sands of nodes. Solution in (S.Q.Zhuang & J.D.Kubiatowicz., 2001) can scales to
large groups but has a single root node for each multicast group. It supports the
any-source model only by having the root node operate as a reflector for multi-
ple senders. There have been several ways to deal with dynamic membership: In
(S.Q.Zhuang & J.D.Kubiatowicz., 2001), they ensure that the root node of the mul-
ticasting tree does not handle all requests to join or leave the multicast group; In
(J. Li & Lim, 2005), use multiple interior-node- disjoint trees to avoid single points
of failure in tree structures; In (S. El-Ansary & Haridi, February 2003) The earliest
4 Chapter 1. Introduction
DHT-based broadcasting work by El-Ansary, et al. did not address the issue of
dynamic membership.
Moreover, in those system, one node join into system through an arbitrary way,
thus, the multicast tree can be built inefficiently and not balance in structure. The
solution for assigning an appropriate number of children to each node is far from
optimal in term of bandwidth: If the number of children is too high, low capacity
node will be overload, therefore slows the entire session multicast down. If the
number of children is too low, high capacity nodes will be used inefficiently.
In this thesis, we study the method to optimize topology for DHT-based multi-
cast. We propose a DHT- based bandwidth adaptive multicast system that forcus on
host heterogeneity, scalibility, fault tolerate. System chosen for implementation and
avaluation is Chord, a DHTs representative. We extend Chord to be more flexible
and it is called BAM-Chord: Bandwidth Adaptive Multicast Chord. Our method
can make tradeoff between depth of the multicast tree and out-degree of every node
and retain the advantages of a DHT network.
1.2 Objectives
This thesis aims our works at building a multicast system over DHT-based network.
Hence, we will study to attain the following objectives:
• Study P2P network, application layer multicast, DHT-based network and
DHT-based multicast system
• Study topology optimization problems for DHT-based multicast;
• Study models have been proposed on topology optimazation for DHT-based
multicast, advantages and disadvantages of those model
• Propose a improvement model for deploying multicast over DHT network:
BAM-Chord
• Evaluate the simulation results and compare with some existing model.
1.3. Contributions 5
1.3 Contributions
In the scope of this thesis,we propose a model for deploying multicast service over
DHT-based network. This thesis not only introduces the background of P2P net-
work, DHT-based network, overall view of DHT-based multicast and related tech-
nical issues, but also makes the following contributions:
• Proposes an improvement that can optimize network topology. This makes
multicast tree more balanced and node’s bandwidth adaptation. This
• Proposes an algorithm to find an optimal position when node joins into mul-
ticast session.
• Proposes a improvement model that can address technical issues in earlier
existing systems
1.4 Thesis structure
Thesis is organized in 5 chapters. Besides the Introduction in Chapter 1, the rest of
this thesis is organized as followed:
• Chapter 2 introduces multicast, IP multicast, application layer multicast,
DHT-based network, multicast over DHT-based network and related works
on topology optimization for DHT-based multicast
• Chapter 3 descirbes proposed model for deploying multicast over Chord, a
DHT representative: Banwidth adaptive multicast over Chord: BAM-Chord
• Chapter 4 shows the simulation and evaluate the simulation results in com-
parison with Chord, CAM-Chord
• Chapter 5 concludes our work and points out some possible improvements of
our system in the future.
Chapter 2
Background
2.1 Multicast
2.1.1 Introduction
Unicast is a basic type of transmission in which information is sent from only one
source to only one destination. In another words, Unicast transmission is between
one-to-one nodes. This type of transmission is used at almost network protocols:
IP, TCP, UDP, HTTP, FTP,.... However, when one sender sends the same data to
many receivers, it is not effective to use unicast.
Broadcast is a type of transmission in which information is sent from just one
source but is received by all destinations connected to the network. This would
mean that every time a node use broadcast transmission, all the other node will
receive that information.
An example of broadcast is ARP (Address Resolution Protocol) which will broad-
cast the address resolution request to all other computers on the network. Another
example is DHCP (Dynamic Host Configuration Protocol)
Figure 2.1: Type of transmissions
6
2.1. Multicast 7
Multicast is a very much different from Unicast and Broadcast in definition and
application as well. It is a type of transmission or communication in which the
information is sent by one source to a set of destination.
Many people think is an excellent transmission method for multimedia: Internet
radio broadcast, television broadcast, video conferencing, stock market tickers, slide
presentations, etc. However, multicast is also suited to a large number of other
applications: file transfers to multiple locations, dynamic web page updates, online
gaming, news feeds, chatrooms,...
Multicast is the most efficient method of delivering the same data to multiple
receivers at the same time. Servers send only one data stream to reach any number
of end-users. Using unicast, a server would need to send out as many streams as
there are receivers. This increases the CPU load, bandwidth required to reach that
audience. Using multicast, there will only be a single copy of the data streamed
across any links, therefore, can decrease bandwidth without upgrading links and
routers.
2.1.1.1 IP Multicast
The type of communication between one sender and one receiver (a connection be-
tween a Web browser and a server) can be addressed by basic TCP/TP network.
However, there is an emerging class of applications require multi-point communi-
cations between one or more senders and several receivers in order to work well.
In this case, data is sent from a sender to the receivers asynchronously as needed.
IP Multicast is designed to solve this problem. First introduced by S. Deering in
1988 [1], IP multicast is a method of sending Internet Protocol (IP) datagrams to a
group of interested receivers in a single transmission. In IP multicast, router has to
be equipped with multicasting capabilities. Router actively participate in multicast
and has a very important role. Routers make copies of packets as need and forward
towards multicast receivers.
In this model, all interested host form a multicast group for communicating with
each other through sending a message to multicast router using multicast group
membership discovery protocol as: IGMP (IPv4) or MLD (IPv6). IMGP stands
for Internet Group Management Protocol and MLD stands for Multicast Listener
Discovery. After joining to a multicast group, a host can receive all data that are
sent for that belonging multicast group without knowing the source address. Each
8 Chapter 2. Background
Figure 2.2: IP Multicast
of the multicast group is identified by a special multicast address called class-D
address (224.0.0.0 through 239.255.255.255).
IP muticast has all of advantages of multicast transmission. However, important
issues such as routing, group management, address allocation, authorization and
security, quality of Service (QoS) and scalability are not addressed effectively, IP
Multicast has not been widely deployed.
Due to the increasing number of application such as video-conferencing, multi-
party games, private chat rooms, web cache replication and database/directory repli-
cation, and a lack of widely deployment of IP multicast in all IP-based networks,
there has been interest in multicast protocols that can be supported without relying
on the IP multicast infrastructure, and do not depend on multicast-capable routers.
This is the so-called application-layer multicast (ALM).
2.1.1.2 Application Layer Multicast
The concept of application level multicast is implementation of multicasting func-
tionality at application layer instead of network layer. As a result, the multicast-
related functionalities are moved from routers to end-hosts. The basic idea of
application-layer multicast is shown in Figure 2.3 : S establishes unicast connec-
tions and send data to D1 and D2; In turn, D1, D2 delivers data to D3, D4 via
unicast connection respectively.
In IP multicast, data packets are replicated at routers inside the network. In
2.1. Multicast 9
Figure 2.3: A Comparison between IP multicast and application-layer multicast
ALM, data packets are replicated at end hosts. Logically, the end-hosts form an
overlay network, and the goal of ALM is to construct and maintain an overlay for
data transmission. Since IP Multicast is implemented by network routers and avoids
multiple copies of the same multicast packet on the same link. ALM is implemented
by application end-host and can not avoid multiple copies of the same packet on
the same link. Thus, ALM is less efficient than network layer multicast. However,
compared to IP multicast, application level multicast has some advantages:
• First, most proposed ALM models do not require any special support from
network routers, therefore, ALM can be widely deployed.
• Second, ALM is more easily deploy than IP multicast because ALM avoids
issues related to iner-domain multicast.
• Third, in recent years, numbers of improvements in ALM technology over-
come its weakness and ALM become a up-and coming solution, DHT-based
multicast is an example.
2.1.2 Application layer multicast protocols
There have been a lot of ALM protocols with a different approaches and characteris-
tics. In this thesis, we only review and give brief introduction about some important
categories and general approaches of different protocols as presented in (M. Hosseini,
2007).
10 Chapter 2. Background
2.1.2.1 Application Domain
The application domain determines the number of users that a protocol must sup-
port, the data types a protocol supply users, and the metrics that multicast tree
attempts to optimize. Thus, ALMs deploy as some categorization:
• Audio/video streaming: Protocols in this application domain usually involve
a single source distributing media to a large number of receivers. The metric
is bandwidth and latency
• Audio/video conferencing: Protocols in this application domain usually involve
a multi-party conferencing session, interacting groups in this session have small
to medium size. Important metrics are bandwidth and latency.
• Generic multicast service Protocols in this application domain usually try to
create a generic multicast service based on specific metrics that can affect a
variety of applications.
• Realiable data broadcast and file transfer: Reliable transfer and distribution
of files. Bandwidth is the only metric.
2.1.2.2 Deployment Level
Deployment level determines what level the protocol is expected to be deployed:
at the infrastructure level or end system level. Infrastructure-level, also known as
proxy-based ALM protocols, requires the deployment of dedicated servers/proxies
on the Internet where they self-organized into an overlay network and provides a
transparent multicast service to the end-user. End system level ALM protocols
assume only a unicast service from the infrastructure and expect end-system hosts
to participate in providing the multicasting functionality by taking on some of the
forwarding responsibility
2.1.2.3 Group Management
After deciding what is application domain and deployment level of ALM protocol,
designer must make some key decisions to determine how to manage a group of
nodes in a multicast session. Some key factors are considered: How users find out
the multicast session, how users join a session, how users leave, can the users still
contribute to existing multicast session even if they are not a part of it; Group
2.1. Multicast 11
management is centralized or distributed and how this affects the design and service
provided; Mesh-first approach or tree-first approach is taken? The advantages and
the disadvantages of each approach?
Mesh-first versus Tree-first In the mesh-first approach, the mest topology is
created at the begining, member nodes keep a connected in this topology. Then
multicast tree is built based on the mesh relative of nodes to the root. In the tree-
first approach, the multicast tree is built directly without any mesh. The member
nodes explicitly select their parent from the known members in the tree. Parents
is chosen based on running algorithm to detect and avoid loops. The advantages of
tree-first approach is that this approach gives direct control over the tree, mer nodes
can select a best parent neighbor that has enough resources. Another advantage of
the tree-first approach is that it has a lower communication overhead. However, the
disadvantage is that when a member changes a parent, it drags all of it descendents
with it. The advantage of the mesh-first approach is fault tolerance, it is possible
to manipulate the tree topology to a significant extent by selecting mesh neighbors
and changing the metrics. This approach is suitable for multi-source applications.
The disadvantages of this approach is that it is difficult to do load balancing and it
has higher control overhead.
Source Specific Tree versus Shared Tree Source-specific tree is built by minimizing
the length of the path to a specific individual destination. Shared tree is built by
minimizing the total number of hops or the cummulative end-to-end delay to forward
the packet to all the destination. The choice between a source-specific tree and
shared tree usually depends on whether the multiple sources use the same overlay
for data distribution or not. When there is a multiparty communication, shared trees
are preferred because it is better than source specific tree in term of the maintenance
cost. On the other hand, source specific tree allows for optimization of the tree for
a given source, but cannot support efficiently multiple sources on that tree
2.1.2.4 Routing Mechanism
Group 1: Shortest Path This group of ALMs use RTT measurement to determine
the shortest path tree from the source to the end hosts and minimize the time delay
for each applicastion while considering the degree constraint and QoS. A Shortest
Path Tree constructs a minimum cost path from source node to all its receivers. For
example: Yoid, SpreadIt, TAG...
12 Chapter 2. Background
Group 2: Minimum Spanning Tree This group tries to construct a low cost
tree. A Minimum Spanning Tree is a tree with minimum total cost spanning all the
members. For example: ALMI, HBM
Group 3: Clustering Structure This group constructs a cluster of nodes that can
be use to construct trees. For example: NICE, ZIGZAG
Group 4: Peer-to-Peer Structure In P2P structure, Reverse-path forwarding or
Forward-path forwarding is used for routing
2.2 DHT-based multicast
2.2.1 Introduction of P2P Networks
In simplest form, a peer-to-peer (P2P) network is created when two or more com-
puters are connected and share resources (such as processing power, disk storage
or network bandwidth) without going through a server. Contrast to client-server
networks, in P2P network, peers are both suppliers and consumers of resources. A
P2P network model allows many PCs to pool their resources together.
Figure 2.4: P2P Network
When Shawn Fanning created Napster in 1999, he did not imagine that today,
P2P networks have been receiving increasing demand from users. P2P networks are
decentralized, self-organized, and dynamic, they offer an alternative to the tradi-
tional client-server model of computing. In contrast to Client-server architecture,
P2P networks are almost unlimited in their scalability.
In recent years, due to the advantages of P2P networks, they have been widely
used in many ares: Instant Messaging, File Sharing, Collaborative Community, IP
2.2. DHT-based multicast 13
Telephony, High Performance Computing - Grid Computing the P2P networks have
become a focus of research at almost every major university.
2.2.1.1 Unstructured P2P Network model
Unstructured P2P network is form when each participating node connects to others
arbitrarily, and such networks can be easily constructed. When a new peer wants
to join the network, it can copy links of another existing node then forms its own
links.
Figure 2.5: A unstructured P2P Network with a server
In an unstructured P2P network, if a peer wants to find desired data in the
network, the query has to be flooded through the network to find as many peers as
possible. However, there is no way to guarantee that flooding will find a peer has
the desired data. Thus, the main disadvantage with unstructured P2P networks is
that the queries may not always be resolved. When a peer looks for data shared by
only a few other peers, its search will be likely unsuccessful and hence such networks
typically have very poor search efficiency. Also, the bandwidth cost of searching on
unstructured P2P network would grow exponentially to the number of connected
nodes.
The typical unstructured P2P networks is Gnutella. Peer nodes of the system
are organized randomly in unstructured overlay and described as servent. A new
node, who wants to join in the system, would perform an operation ”Join”. Firstly,
it would contact with a bootstrap node. Then bootstrap node would send back its
14 Chapter 2. Background
own contact list of existed node, which is chosen randomly. The new node would
store this list of the existed node as its neighbors. The new node may reach other
nodes in the communication network base on its neighbor. Following figure shows
an example of locating resources in a Gnutella-like P2P environment.
Figure 2.6: Gnutella-like P2P
2.2.1.2 Hybrid P2P Network model
In Unstructured P2P Network model, there is no correlation between a peer and its
data, queries might not always be resolved. In hybrid P2P network model, problems
of routing and lookup in unstructured P2P Network are solved . There peer nodes
are divided into two types, data nodes and hub nodes. Data node stores real data
and information of a hub node. Hub node is the same as a ”server”, it do not store
real data. It store only indexes to files in the network. To join in the system, each
node has to contact with a server. Since, the server would create indexes for these
files, and then it stores these indexes in its database.
As unstructured P2P Network model, Hybrid P2P network is easily constructed
because it does not need to implement distribution and routing algorithms. More-
over, Hybrid model does not use query flooding or random searching, since it is
more efficient than unstructured model. However, this model is essentially based
on ”servers” to guarantee successful routing and searching, system can not exist if
those servers get error.
In Hybrid P2P Model, eDonkey network is a typical example. There is no central
hub for the network. eDonkey client programs connect to the network to share files.
2.2. DHT-based multicast 15
Figure 2.7: An example of Hybrid P2P Model
eDonkey servers act as communication hubs for the clients, allowing users to locate
files within the network.
2.2.1.3 Structured P2P Network model
Unlike Hybrid P2P network where some hub nodes acting as ”servers”, the Struc-
tured P2P Network is completely decentralized and self-organizing. Structured P2P
model is constructed to achieve efficiently data searching, routing and distribution
among peers. Each node has capacities of storing data and routing to other nodes
based on a common routing algorithm. Nodes are arranged in a structured fashion.
When a new node joins into the network, a network identify (ID) is assigned to
it, this ID is created by using a global consistent hash function. Node ID represent
its position on the network and correlatively, a portion of data corresponding to.
When a node wishes to look for a resource, it must be redirected to the node which
is supposed to hold it by using a efficient look up function.
Structured P2P network is a scalable, efficient, completely decentralized and
self-organizing, and load balanced model. Each node stores information of O(logN)
neighbors. Routing algorithm allows it looking up any key with O(logN) hop time.
Because of the routing algorithm based on globally consistent hash function, a set of
keys are distributed equally to the key space. However, this P2P network model faces
to some challengers such as: avoiding bottlenecks in particular nodes, adaptation to