SICS Distributed Systems Laboratory
Swedish Institute of Computer Science

    SICS
    Box 1263
    SE-16429 Kista
    Sweden

    +46 8 633 1500
    +46 8 751 7230 (fax)

Peer-to-Peer Research

Introduction

The Internet's tremendous success can be witnessed by its estimated 200 million hosts. Few, however, know that the Internet, or its precursor, Arpanet, was designed for military survivability. The underlying infrastructure, which routes our traffic across the globe, is best-effort and self-organizing: the routers that form the backbone of today's Internet constantly gossip route information among themselves. As a consequence, traffic is re-routed whenever a link is broken or temporarily inaccessible. There is no central server that controls or oversees routing on the Internet.
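The route exchange described above can be illustrated with a toy distance-vector computation in the spirit of the Bellman-Ford relaxation used by protocols such as RIP. This is a minimal sketch under simplifying assumptions (unit link costs, a made-up four-router topology, and a full recomputation whenever the topology changes, which sidesteps the count-to-infinity problem real distance-vector protocols must handle); routing between networks on the Internet actually uses BGP, a path-vector protocol, which differs in many details.

```python
import math

def distance_vectors(nodes, links):
    """Each router repeatedly merges its neighbours' distance tables
    (Bellman-Ford relaxation, unit link cost) until no table changes."""
    neighbours = {n: set() for n in nodes}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Initially a router only knows how to reach itself.
    dist = {n: {m: 0 if m == n else math.inf for m in nodes} for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            for nb in neighbours[n]:
                for dest, d in dist[nb].items():
                    if d + 1 < dist[n][dest]:   # shorter route via neighbour nb
                        dist[n][dest] = d + 1
                        changed = True
    return dist

nodes = ["A", "B", "C", "D"]
links = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
print(distance_vectors(nodes, links)["A"]["B"])   # 1
links.remove(("A", "B"))                          # the A-B link breaks
print(distance_vectors(nodes, links)["A"]["B"])   # 3, re-routed via D and C
```

No router needs a global view: each one only ever reads the tables of its direct neighbours, yet all of them converge on working routes after the failure.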

Despite these founding principles of the Internet, the services, software and systems running on top of it are highly centralized, static, and not self-organizing. As a result, companies and other organizations spend millions on maintaining and administering their systems whenever a failure occurs, or when something needs to be upgraded or simply extended. For example, it is easy for a medium-sized company to set up a web server. If, however, it wants to extend this web server to run on, say, four servers, where the information is partitioned across two of the servers while the other two replicate it for fault tolerance, there is no simple way to achieve this. It will certainly not be as simple as plugging three extra servers into the web infrastructure. In most cases, experts must completely rethink the infrastructure and replace the old system with a new one, possibly interrupting service. This is simply not acceptable for many tasks, especially if the company runs distributed web services using the web servers as containers.

The emergence of Peer-to-Peer Computing

Peer-to-Peer (P2P) computing emerged to avoid problems such as those mentioned above, by making services and applications distributed, decentralized and, perhaps foremost, self-organizing. Many think of it merely as a way of sharing music through popular applications such as Napster and Gnutella. While those early applications did use rudimentary peer-to-peer techniques, they are monolithic pieces of software and are far from fulfilling the true potential of P2P computing.

A distributed file system is a particular example of an application, or service, that can be efficiently implemented using peer-to-peer technology. Numerous such distributed file systems are already available; typically they enable the local machines of an organization or company to be used for file storage. Because the underlying peer-to-peer infrastructure provides fault tolerance by maintaining several copies of the same file on many machines, the resulting system is robust and self-organizing, even if individual machines are not. Nothing prevents the organization from adding, when needed, dedicated high-end servers to the distributed P2P file system to increase robustness or to scale up the amount of available file storage. The amount of administration needed for such an operation is minuscule, thanks to the self-organization properties of peer-to-peer technology.
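One common way to let an organization add machines with almost no administration is consistent hashing, which many structured peer-to-peer systems use to decide which machine stores which file. The sketch below is an illustration under assumptions (SHA-1 identifiers reduced to a 16-bit ring, hypothetical machine and file names), not the design of any particular file system: each file belongs to the first machine clockwise from its identifier, so adding a dedicated server only moves the files that fall into the new server's arc of the ring.

```python
import bisect
import hashlib

RING_BITS = 16  # assumed identifier-space size (2**16 positions)

def ring_id(name: str) -> int:
    """Map a machine or file name onto the identifier ring via SHA-1."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** RING_BITS)

def owner(key: str, machines: list) -> str:
    """Return the first machine clockwise from the key (its successor)."""
    ring = sorted((ring_id(m), m) for m in machines)
    ids = [i for i, _ in ring]
    pos = bisect.bisect_left(ids, ring_id(key)) % len(ring)
    return ring[pos][1]

files = [f"file-{i}.txt" for i in range(1000)]
before = {f: owner(f, ["ws1", "ws2", "ws3"]) for f in files}
after = {f: owner(f, ["ws1", "ws2", "ws3", "server1"]) for f in files}
moved = [f for f in files if before[f] != after[f]]
# Every file that moved now lives on the newly added server; nothing else changed.
assert all(after[f] == "server1" for f in moved)
print(f"{len(moved)} of {len(files)} files moved")
```

The existing workstations keep every file they had except those the new server takes over, which is why scaling up needs no global reorganization.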

Other popular applications of peer-to-peer technology include highly scalable and self-organizing directory services for resource discovery; these can for instance be used in Grids to find appropriate resources for a particular service.

Related Research at SICS

The Distributed Systems Laboratory at SICS has been active in the area of peer-to-peer computing for several years. Its activities can be classified into two categories: applications and middleware.

On the applications side, SICS has primarily been developing a distributed file system, which exploits the unused disk space on local machines in an organization. SICS is currently looking into extending the distributed file system into a distributed backup system that uses these unused resources to automatically back up all files in an organization. Another effort underway is the development of a decentralized Web server. The advantage of a peer-to-peer architecture for Web servers is that the load of serving web pages is automatically balanced across the constituent servers, which helps handle peaks and flash crowds, such as those caused by big news events.

All of the above-mentioned applications run on top of the DKS (Distributed K-ary System) middleware, developed at SICS in cooperation with the Royal Institute of Technology (KTH). It is a general-purpose peer-to-peer middleware that can be used to build peer-to-peer applications without any knowledge of the underlying peer-to-peer infrastructure. The DKS system provides several basic services to applications running on top of it, including multicasting, name-based communication, and directory services. These are described briefly below.
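The "K-ary" in the name points to the idea of k-ary search: a lookup narrows the space of possible identifiers by a factor of k at every hop. The sketch below is an idealized illustration of that principle only, not the DKS implementation; it assumes a fully populated ring of N = k**L identifiers in which every node keeps pointers at distances j * k**i (for 1 <= j < k and 0 <= i < L), i.e. (k - 1) * L entries, and forwards greedily along the longest pointer that does not overshoot. Each hop then removes one base-k digit of the remaining distance, so a lookup takes at most L = log_k N hops. The function names are hypothetical.

```python
def fingers(k: int, levels: int) -> list:
    """Pointer distances j * k**i kept by every node: (k - 1) * levels entries."""
    return sorted(j * k**i for i in range(levels) for j in range(1, k))

def lookup_hops(source: int, target: int, k: int, levels: int) -> int:
    """Greedily forward from source towards target on a ring of size k**levels,
    always taking the longest pointer that does not overshoot the target."""
    size = k**levels
    table = fingers(k, levels)
    current, hops = source, 0
    while current != target:
        remaining = (target - current) % size
        step = max(d for d in table if d <= remaining)  # distance 1 is always present
        current = (current + step) % size
        hops += 1
    return hops

k, levels = 4, 3          # N = 4**3 = 64 identifiers
worst = max(lookup_hops(0, t, k, levels) for t in range(k**levels))
print(len(fingers(k, levels)), worst)   # 9 3
```

A larger arity k shortens lookups (fewer digits to resolve) at the cost of a larger routing table per node, which is the trade-off the parameter k controls in this model.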

Applications running on top of DKS can use its distributed multicast infrastructure for application-level multicast, even when the underlying network does not support IP multicast. This makes it easy to build simple publish/subscribe applications in which information groups can be created dynamically. If a node has information that it wants to share with the other nodes in a group, it publishes the information to that group, whereupon all subscribed nodes receive a notification about the event. No central servers are needed to achieve this basic service.
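One way to realize such application-level multicast without a central server is to let each node forward the message to only a few other members, each of which becomes responsible for a slice of the group and repeats the procedure. The sketch below illustrates this with a k-ary split over a list of member identifiers ordered clockwise from the publisher; the member list, the arity k and the function names are assumptions for illustration, and this is not claimed to be DKS's actual multicast protocol. Every subscriber is notified exactly once, the group of n members costs n - 1 messages, and the depth of the forwarding tree grows roughly like log_k n.

```python
from collections import Counter

def _forward(members, k, lo, hi, deliveries):
    """members[lo] holds the message and is responsible for members[lo:hi]."""
    deliveries[members[lo]] += 1              # notify this subscriber
    first = lo + 1
    if first >= hi:
        return 0
    chunk = -(-(hi - first) // k)             # ceiling division: at most k sub-slices
    sent = 0
    for sub in range(first, hi, chunk):
        # One message hands the sub-slice to its first member, which recurses.
        sent += 1 + _forward(members, k, sub, min(sub + chunk, hi), deliveries)
    return sent

def publish(members, k=2):
    """Multicast from members[0] (the publisher) to the whole group.
    Returns per-member delivery counts and the number of messages sent."""
    deliveries = Counter()
    if not members:
        return deliveries, 0
    return deliveries, _forward(members, k, 0, len(members), deliveries)

group = [3, 17, 42, 58, 71, 90, 104, 133]   # member identifiers, clockwise from the publisher
deliveries, messages = publish(group, k=3)
print(messages, all(deliveries[m] == 1 for m in group))   # 7 True
```

Because the slices never overlap, no member receives a duplicate, and no single node has to contact the whole group itself.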

Another service that DKS provides is that applications are assigned logical identities for addressing, rather than relying on physical addresses such as IP addresses. This removes several obstacles associated with physical identities, such as nodes that are mobile or that receive dynamic addresses. Many hosts today sit behind firewalls or NATs, which makes communicating with them cumbersome; the DKS middleware communicates with such hosts through their logical identities as well.
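A minimal sketch of location-independent addressing follows. The source does not describe how DKS assigns logical identities, so everything here is an assumption for illustration: the identifier is derived by hashing a stable node name with SHA-1 and truncating it to 32 bits, and a hypothetical `LocationTable` maps identifiers to whatever physical address the node currently has. Peers address each other by identifier, so a change of IP address only updates one table entry.

```python
import hashlib

ID_BITS = 32  # assumed identifier size

def logical_id(node_name: str) -> int:
    """Derive a stable logical identifier from a node's name, not its IP address."""
    digest = hashlib.sha1(node_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:ID_BITS // 8], "big")

class LocationTable:
    """Hypothetical map from logical identifiers to current physical addresses.
    In a real peer-to-peer system this mapping would itself be distributed."""

    def __init__(self) -> None:
        self._addresses = {}

    def register(self, node_name: str, ip: str, port: int) -> int:
        node_id = logical_id(node_name)
        self._addresses[node_id] = (ip, port)   # re-registering replaces the old address
        return node_id

    def resolve(self, node_id: int) -> tuple:
        return self._addresses[node_id]

table = LocationTable()
laptop = table.register("alice-laptop", "192.0.2.10", 4000)
# The laptop moves to another network and obtains a new dynamic address.
same = table.register("alice-laptop", "198.51.100.7", 4000)
assert laptop == same                          # the logical identity is unchanged
print(table.resolve(laptop))                   # ('198.51.100.7', 4000)
```

Applications that stored the identifier `laptop` keep working after the move, since only the resolution step sees the new address.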

DKS further provides a distributed directory service, which allows nodes to add information to the directory, such as mappings from resource names to resource locations, or from service information to service providers. Any node in the system can look up information in the directory to find out, for example, which host provides a particular service. What distinguishes the DKS directory service from traditional directory services, such as DNS and LDAP, is that it is completely decentralized and self-organizing. Every host in the network stores part of the directory, which is dynamically assigned to it by the other hosts, and the information is replicated over several hosts. Thus, if a host leaves the network or crashes, the information can be retrieved from another host storing the same information.
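The behaviour described above, where each host stores a share of the directory and every entry is replicated so that it survives crashes, can be sketched with successor-list replication: each entry is hashed onto an identifier ring and stored on the F hosts that follow its identifier clockwise. This is an in-process illustration under assumptions (a 2**16 ring, replication degree F = 3, hypothetical host and service names, no re-replication after a failure); the source does not say which placement or replication scheme DKS uses.

```python
import bisect
import hashlib

RING = 2 ** 16   # assumed identifier-space size
F = 3            # assumed replication degree

def ring_id(name: str) -> int:
    """Hash a host or entry name onto the identifier ring."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % RING

class Directory:
    """Toy decentralized directory: every host stores a share of the entries,
    and each entry is copied to F consecutive hosts on the ring."""

    def __init__(self, hosts):
        self.store = {h: {} for h in hosts}               # per-host local storage
        self.ring = sorted((ring_id(h), h) for h in hosts)

    def holders(self, name):
        """The F live hosts clockwise from the entry's identifier."""
        ids = [i for i, _ in self.ring]
        start = bisect.bisect_left(ids, ring_id(name))
        count = min(F, len(self.ring))
        return [self.ring[(start + j) % len(self.ring)][1] for j in range(count)]

    def add(self, name, value):
        for host in self.holders(name):
            self.store[host][name] = value

    def crash(self, host):
        """Remove a host together with everything it stored."""
        self.ring = [(i, h) for i, h in self.ring if h != host]
        del self.store[host]

    def lookup(self, name):
        # Any surviving replica can answer; no central server is involved.
        for host in self.holders(name):
            if name in self.store[host]:
                return self.store[host][name]
        return None

directory = Directory([f"host{i}" for i in range(8)])
directory.add("service:print", "host5:631")
for failed in directory.holders("service:print")[:-1]:
    directory.crash(failed)                          # F - 1 replicas fail
print(directory.lookup("service:print"))             # host5:631
```

Removing a host only shifts responsibility within its own neighbourhood of the ring, so the remaining replica is still among the hosts consulted after the crashes.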

Future Research

Several companies are already running applications on top of DKS successfully. This does not, however, mean that there are no research challenges ahead. One central research issue that SICS is currently targeting is security. So far, security in P2P systems has been provided by building walls around the system, requiring all nodes in a peer-to-peer network to present certificates issued by trusted third parties. This will be insufficient if these systems are to be deployed ubiquitously, as even trusted nodes can be compromised by malicious users or viruses. Rather, we have to engineer P2P systems to cope with a small percentage of malicious nodes (e.g., by replicating information), and to establish procedures for evicting malicious nodes from the system once they are detected.
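As a minimal illustration of how replication can mask a small fraction of malicious nodes, a reader can query all replicas of an item and accept a value only if a strict majority of them agree. The sketch assumes that fewer than half of the replicas are malicious and that honest replicas return identical values; the function name and the sample addresses are hypothetical.

```python
from collections import Counter
from typing import Optional

def majority_read(replies: list) -> Optional[str]:
    """Return the value reported by a strict majority of replicas, else None."""
    if not replies:
        return None
    value, votes = Counter(replies).most_common(1)[0]
    return value if votes > len(replies) // 2 else None

# Five replicas of a directory entry; one of them is malicious.
replies = ["192.0.2.5", "192.0.2.5", "203.0.113.9", "192.0.2.5", "192.0.2.5"]
print(majority_read(replies))                 # 192.0.2.5
# Without a strict majority the read is rejected rather than guessed.
print(majority_read(["a", "b", "b", "a"]))    # None
```

Replicas whose answers disagree with the majority are natural candidates for the detection and eviction procedures mentioned above, although deciding when a disagreement is malicious rather than stale is a separate problem.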

Furthermore, much more experience is needed in actually testing and deploying peer-to-peer systems in large organizations. Once these challenges have been met, the full potential of this new computing paradigm, such as self-organization and zero maintenance, can be realized, and the techniques can be widely deployed in critical business processes.