Peer-to-Peer Research
Introduction
The Internet's tremendous success is evident from its estimated 200
million hosts. Few, however, know that the Internet, or rather its
precursor, ARPANET, was designed for military survivability. The
underlying infrastructure, which routes our traffic across the earth,
is best-effort and self-organizing. The routers that form the backbone
of the Internet today constantly gossip information about routes among
themselves. As a consequence, traffic is re-routed automatically if a
link is broken or temporarily inaccessible. There is no central server
that controls or oversees the routes on the Internet.
Despite these design principles, the services, software, and systems
running on top of the Internet are highly centralized, static, and not
self-organizing. As a result, companies and other organizations have
to spend millions on maintaining and administering their systems,
whether to recover from a failure, to upgrade a component, or simply
to extend capacity. As an example, it is easy for a medium-size company
to set up a web server. If, however, it wants to extend this web server
to run on, say, four servers, where the information is partitioned
across two of the servers while the other two replicate it for
fault tolerance, there is no simple way to achieve this. It will
definitely not be as simple as plugging three extra servers into the
web infrastructure. In most cases, the infrastructure needs to be
completely reconsidered by experts, who will replace the old system
with a new one, possibly interrupting service. This is simply not
acceptable for many tasks, especially if the company is to run
distributed web services using the web servers as containers.
The Emergence of Peer-to-Peer Computing
To avoid problems such as those mentioned above, Peer-to-Peer (P2P)
computing emerged to make services and applications distributed,
decentralized, and, perhaps foremost, self-organizing. Many think of it
merely as a way of sharing music through popular applications such as
Napster and Gnutella. While those early applications did make use of
rudimentary peer-to-peer techniques, they are monolithic pieces of
software, and are far from fulfilling the true potential of P2P computing.
A distributed file system is a particular example of an application,
or service, that can be efficiently implemented using
peer-to-peer technology.
There are already numerous such distributed file systems available;
typically they enable local machines of an organization or company
to be used for file storage. As the underlying peer-to-peer
infrastructure provides fault-tolerance by maintaining several copies
of the same file on many machines, the resulting system is robust and
self-organizing, even if individual machines are not.
Nothing prevents the organization from adding, as needed,
dedicated high-end servers to the distributed P2P file system to
increase robustness or to scale up the amount of file storage
available. The amount of administration needed for such an operation
is minuscule, due to the self-organization properties of
peer-to-peer technology.
Other popular applications of peer-to-peer technology include highly
scalable and self-organizing directory services for resource discovery;
these can, for instance, be used in Grids to find appropriate resources
for a particular service.
Related Research at SICS
The Distributed Systems Laboratory at SICS has been active in the area
of peer-to-peer computing for several years. Its activities can be
classified into two categories: applications and middleware building.
On the applications side, SICS has primarily been active in developing
a distributed file system, which exploits the unused disk
space on local machines in an organization. SICS is currently looking
into extending the distributed file system into a distributed backup
system which uses the unused resources to automatically back up all
files in an organization. Another effort underway is the development of
a decentralized Web server. The advantage of using a peer-to-peer
architecture for Web servers is that web pages are automatically
load-balanced across the constituent servers to handle peaks and
flash crowds, such as those caused by big news events.
All of the above-mentioned applications run on top of the DKS
(Distributed K-ary System) middleware developed at SICS in cooperation
with the Royal Institute of Technology (KTH). DKS is a general-purpose
peer-to-peer middleware that can be used to build peer-to-peer
applications without any knowledge of the underlying
peer-to-peer infrastructure. The DKS system provides several basic
services to applications running on top of it, including multicasting,
name-based communication, and directory services. These are described
briefly below.
Applications running on top of DKS can use its distributed multicast
infrastructure for application-level multicast, even though the
underlying network might not support IP multicast. This
facilitates building simple publish/subscribe applications where
information groups can be created dynamically. If a node has
information that it wants to share with other nodes in the group, the
information can be published to that group, upon which all subscribed
nodes receive a notification about the event. No central servers are
needed to achieve this basic service.
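The publish/subscribe idea described above can be illustrated with a small in-process sketch. It is not the DKS API or its multicast algorithm: the names `Node`, `subscribe`, `publish`, and `multicast` are hypothetical, the forwarding tree with a fixed fan-out is an assumed dissemination scheme, and the shared `groups` dictionary merely stands in for membership state that a real peer-to-peer system would keep distributed across the nodes.

```python
from collections import defaultdict


class Node:
    """A hypothetical peer that can receive group notifications."""

    def __init__(self, name):
        self.name = name
        self.inbox = []  # (group, message) pairs delivered to this node

    def deliver(self, group, message):
        self.inbox.append((group, message))


# group name -> subscribed nodes; a group comes into existence on first use.
# Assumption: in a real system this state would be spread over the peers.
groups = defaultdict(list)


def subscribe(node, group):
    """Add a node to a group, creating the group dynamically if needed."""
    groups[group].append(node)


def multicast(members, group, message, fanout=2):
    """Deliver a message to every member through a forwarding tree.

    The member at index i forwards to the members at indices
    fanout*i + 1 .. fanout*i + fanout, so no single node has to
    contact everybody. Returns the number of member-to-member hops.
    """
    if not members:
        return 0
    hops = 0
    pending = [0]  # index 0 is the first member contacted by the publisher
    while pending:
        i = pending.pop()
        members[i].deliver(group, message)
        for child in range(fanout * i + 1, fanout * i + fanout + 1):
            if child < len(members):
                hops += 1
                pending.append(child)
    return hops


def publish(group, message, fanout=2):
    """Publish a message to all nodes currently subscribed to a group."""
    return multicast(groups[group], group, message, fanout)
```

In this sketch every subscriber receives the message exactly once, and each node forwards to at most `fanout` others, which is the property that makes application-level multicast usable without network support for IP multicast.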
DKS also assigns applications logical identities for addressing,
rather than having them use physical addresses such as IP addresses.
This alleviates several obstacles associated with physical addresses,
such as mobile nodes or nodes receiving dynamic addresses. Many hosts
are today behind firewalls or NATs, making communication with them
cumbersome; the DKS middleware uses logical identities for
communicating with them.
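As a rough illustration of name-based communication, the sketch below separates a stable logical identity from a physical address that may change. It makes several assumptions that do not come from DKS itself: identities are derived by hashing a name with SHA-1 into a 32-bit identifier space, and the hypothetical `AddressTable` class is a single local table, whereas a peer-to-peer middleware would keep such bindings distributed.

```python
import hashlib


def logical_id(name, bits=32):
    """Map a name to a stable identifier in a 2**bits identifier space.

    Assumption: SHA-1 hashing of the name; the actual scheme used by
    a given middleware may differ.
    """
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class AddressTable:
    """Hypothetical binding of logical identities to physical addresses."""

    def __init__(self):
        self._current = {}  # logical id -> (host, port)

    def bind(self, name, physical_address):
        """Record (or update) where the named application can be reached."""
        self._current[logical_id(name)] = physical_address

    def resolve(self, name):
        """Return the current physical address for a logical name."""
        return self._current[logical_id(name)]
```

A mobile node that obtains a new address only calls `bind` again; senders keep using the same name, and therefore the same logical identity, throughout.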
DKS further provides a distributed directory service, which allows
nodes to add information to the directory, such as mappings of
resource names to resource locations, or service information to
service providers. Any node in the system can look up information in
the directory to find out, for example, which host provides a
particular service. The DKS directory service differs from
traditional directory services, such as DNS and LDAP, in that it is
completely decentralized and self-organizing. Every host in the
network stores a part of the directory, which is
dynamically assigned to it by the other hosts, and the information is
replicated over several hosts. Thus, if a host leaves the network or
crashes, the information can be retrieved from another host that
stores the same information.
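The replicated directory described above can be sketched with a hash ring, a technique commonly used in distributed hash tables. This is an assumed design for illustration, not the DKS implementation: the `Directory` class, its methods, the SHA-1 based `ring_id`, and the default of three replicas are all hypothetical, and the sketch does not re-replicate entries after a crash as a real system would.

```python
import hashlib
from bisect import bisect_left


def ring_id(text, bits=32):
    """Place a host name or a key on a ring of 2**bits positions."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class Directory:
    """Hypothetical directory whose entries are spread over all hosts."""

    def __init__(self, hosts, replicas=3):
        self.replicas = replicas
        self.stores = {host: {} for host in hosts}  # host -> its local part

    def responsible(self, key):
        """Return the hosts that should hold a key: the first `replicas`
        hosts found clockwise on the ring, starting at the key's position."""
        ring = sorted(self.stores, key=ring_id)
        if not ring:
            return []
        ids = [ring_id(host) for host in ring]
        start = bisect_left(ids, ring_id(key)) % len(ring)
        count = min(self.replicas, len(ring))
        return [ring[(start + i) % len(ring)] for i in range(count)]

    def put(self, key, value):
        """Store the entry on every host responsible for the key."""
        for host in self.responsible(key):
            self.stores[host][key] = value

    def get(self, key):
        """Return the value from the first responsible host that has it."""
        for host in self.responsible(key):
            if key in self.stores[host]:
                return self.stores[host][key]
        raise KeyError(key)

    def crash(self, host):
        """Simulate a host leaving the network together with its data."""
        del self.stores[host]
```

Because the replicas of a key sit on consecutive hosts of the ring, the surviving copies remain among the responsible hosts after one of them disappears, so a lookup still succeeds as long as at least one replica is alive.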
Future Research
Several companies are already running applications on top of DKS with
success. This does not mean, however, that there are no research
challenges ahead. One central research issue that SICS is currently
targeting is security. So far, security has been provided in P2P
systems by building walls around the system, requiring all nodes in a
peer-to-peer network to provide certificates issued by trusted third
parties. This will be insufficient if these systems are to be
ubiquitously deployed, as even trusted nodes can be compromised by
malicious users or viruses. Rather, P2P systems have to be engineered
to cope with a small percentage of malicious nodes (e.g., using
replication of information), and procedures have to be established for
evicting malicious nodes from the system once detected.
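One simple way to use replication against a small fraction of malicious nodes is to read the same entry from several replicas and accept only a value reported by a strict majority. The sketch below shows that idea under stated assumptions: `majority_read` and `suspects` are hypothetical helpers, replies are assumed to be hashable values, and the text above does not say that this is the approach pursued at SICS.

```python
from collections import Counter


def majority_read(replies):
    """Return the value reported by a strict majority of replicas.

    Returns None when no value has more than half of the votes, in
    which case the caller cannot trust any of the replies.
    """
    if not replies:
        return None
    value, votes = Counter(replies).most_common(1)[0]
    return value if 2 * votes > len(replies) else None


def suspects(replies_by_node):
    """Return the nodes whose reply disagrees with the majority value.

    These nodes are only candidates for eviction; a disagreeing reply
    may also come from a replica that is merely out of date.
    """
    agreed = majority_read(list(replies_by_node.values()))
    if agreed is None:
        return set()
    return {node for node, reply in replies_by_node.items() if reply != agreed}
```

With three replicas this tolerates one wrong reply, and with five it tolerates two; once half or more of the replicas misbehave, the vote no longer helps.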
Furthermore, much more experience is needed in actually testing and
deploying peer-to-peer systems in large-scale organizations. Once
these challenges have been met, the full potential of this new
computing paradigm, including self-organization and zero maintenance,
can be realized and the techniques can be widely deployed in critical
business processes.