Simulation of Large Asynchronous Systems
Background
Examples of large asynchronous systems are the World Wide Web,
peer-to-peer systems (like Gnutella), and biological systems. When we
look at such systems, we may be able to measure and study the system
as a whole but they cannot be modeled on that level. Instead such
systems are seen as being constructed of many interacting autonomous
agents. Modeling is thus done on the micro level: the behavior of
individual agents, their internal logic, and their interaction with
other agents. On the macro scale the behavior of the system as a whole
is obtained by simulation.
There are two cases for simulation. In the first we study some
existing system and have some knowledge of the macro behavior of the
system, but do not really understand this behavior. Hypotheses are
formulated in the form of the defined behavior of the individual
agents. For instance, if we are modeling the web, we model end-user
agents and site agents. If the behavior of a studied system as a
whole is faithfully reflected in a simulation, this indicates, but
does not prove, that the agent modeling in some fashion captures some
truth about the system. The case may be further strengthened if
simulations with alternative agent models fail to match.
In the second case we are designing a large asynchronous system. Say
we are designing a peer-to-peer system with some particular
decentralized algorithm. There are three kinds of agents, end users,
application agents, and infrastructure agents. The end user agent is
modeled to reflect the expected service use patterns. The application
agent behaves as we have programmed them to subject to certain outside
events. An application agent might be running on a user machine and
then the user turns off his machine. Infrastructure agents represent
routers and other elements that are outside the control of our
application, but they are well understood and easily modeled. The
infrastructure agents will be affected by random factors that simulate
routers being down for some limited period of time. The simulation
should, of course, support the correctness of the algorithm but this
should, of course, be proved by other means. However, expected case
macro behavior, in the presence of router failures, machine crashes,
user mischief, etc., is generally not possible to obtain by analytical
means.
Modeling the World Wide Web
Here with our partners we attempt to model individual users and sites
of the WWW. We attempt to answer the question of why and how certain
sites become popular. At first sight this may seem a trivial
question. Some sites are considered better by the majority of users
and therefore become popular. However this is not the case (as
simulation shows) as all users do not visit all sites. Users tip their
friends about good sites, so the popularity of sites depends very much
on the social interaction between end users. Research Strand 3b: The
Simulation Environment The simulations that we are interested in
involve an enormous number of agents. Scalability is a definite
issue. Our simulation environment has been parallalised so as to be
able to run on a network of machines. Care was taken to minimize the
communication overhead.
The simulation environment is not only our tool to study large
systems, but to some extent is also the subject of ongoing
research. We strive to improve the simulation platform. Examples are
1) agent behavior language to simplify the construction of new
simulations 2) performance tuning and 3) network models for
peer-to-peer simulations.
Peer-to-peer simulations
Here simulation is used to guide design. The current focus is on
algorithms for distributed naming services. How do algorithms and
variations behave under different failure assumptions and frequencies,
and different degrees of individual machine stability? Note that in
peer-to-peer systems one must usually assume considerabl? higher rates
of failure than in traditional distributed systems. Peers are or may
be PCs of ordinary end-users whose system (e.g. operating system) may
be poorly maintained and whose machine may be subject to sources of
outside interference that do not occur in clusters (e.g. a 2-year-old
child pulls out the network connection plug).