EURO-PAR'95, Stockholm, Sweden, August 29-31, 1995
Swedish Institute of Computer Science (SICS) and Department of
Teleinformatics, KTH

Tutorial Programme

All tutorials will be held on Monday, 28th August. See also the advance programme.

The full-day tutorial costs 1450 SEK (600 SEK for students) and each half-day tutorial costs 800 SEK (400 SEK for students). All tutorial fees include coffee breaks and printed material; the full-day tutorial fee also includes lunch. Please register using our registration web page.

Tutorial 1 (full day):  Per Stenström
(9:00-17:30)            "Multiprocessors and Multicomputers - Programming 
                         and Design" 

Tutorial 2 (half day):  Chris Jesshope
(9:00-12:30)            "Scalable Parallel Computers"

Tutorial 3 (half day):  Richard Hofmann
(14:00-17:30)           "ZM4/SIMPLE: a Universal Hardware Monitor and Trace Evaluation
                         Package for Parallel and Distributed Systems"

Tutorial 4 (half day):  Erland Fristedt and Per Öster
(9:00-12:30)            "Parallel Applications"

Tutorial 5 (half day):  Kam-Fai Wong
(14:00-17:30)           "Parallel Database Systems Engineering"

Tutorial 1

"Multiprocessors and Multicomputers - Programming and Design"

Per Stenström

August 28th, full day (9 - 17:30)

This tutorial covers an emerging class of high-performance computer systems known as MIMD systems. These systems are built from commodity microprocessors that cooperate in solving compute-intensive applications, ranging from numerical algorithms to industry-oriented control algorithms in embedded systems. Two distinct classes of systems are considered: shared-memory multiprocessors, and multicomputers, in which messages are used to coordinate the parallel computation.

The tutorial is divided into two parts that address programming as well as design of multiprocessors and multicomputers. Regarding programming, we start from traditional imperative programming languages such as C and study how we can extend them with constructs to define and coordinate parallel actions. We then study how problem partitioning affects the algorithmic speedup of a computation.

Regarding the design part of the tutorial, we first focus on design principles and performance issues for interconnection networks (INs)--an important part of any parallel computer. We specifically study how IN topologies and routing algorithms affect the cost, latency, and bandwidth of a parallel computer. While multiprocessors and multicomputers both rely on efficient message transfers, shared-memory multiprocessors must support automatic data replication across processing nodes to be effective. A key mechanism to achieve this goal is to use caching and we focus in detail on design principles as well as performance issues of coherent caches--hardware-based schemes to support data replication in shared-memory multiprocessors. Finally, we look at advanced schemes for tolerating long latencies in large-scale machines such as prefetching, memory consistency model relaxation, and multithreading.

Target audience: practitioners in the field of computer engineering, but also researchers in computer science who are interested in the state of the art in parallel computer architecture. The tutorial assumes basic knowledge of programming and of computer organization and architecture.

Per Stenström is an Associate Professor of Computer Engineering at Lund University, where he has conducted research in parallel processing since 1984. He received an MS degree in electrical engineering in 1981 and a PhD degree in computer engineering in 1990, both from Lund University. His primary research interests are in parallel architectures, performance evaluation, and memory systems for high-speed computer systems and he has authored and co-authored more than 40 papers in these areas. He is also an author of two textbooks on computer organization and architecture.

Besides his activities in Lund, Dr. Stenström is a research advisor at the Swedish Institute of Computer Science. He has been a visiting scientist at Carnegie Mellon University (1987), Stanford University (1991), and the University of Southern California (1993), where he has studied various aspects of shared-memory multiprocessor architectures. He is on the editorial board of the Journal of Parallel and Distributed Computing and a member of the ACM, the IEEE, and the Computer Society.


Tutorial 2

"Scalable Parallel Computers"

Chris Jesshope

August 28th, half day (9 - 12:30)

This tutorial provides an up-to-date overview of architecture and languages for scalable parallel computers. It focusses on distributed memory multi-processors and data-parallel languages. It is now just about agreed that scalability in architectural terms cannot be considered in isolation; we must also have programming methodologies which are scalable and portable. It will be argued that these require a single global address space, so the seminar will develop along the axis of virtual or distributed shared memory systems. Such memory systems, however, are non-uniform in their access properties (NUMA architectures). A large proportion of the time will therefore be spent studying how, within such architectures, we can first reduce latency and then tolerate any latency that remains. These issues will lead us to study networks for parallel computers as well as scheduling mechanisms and other latency-hiding techniques. On the architectural side, the seminar will span cache-based processors, multi-threaded processors and dataflow processors; on the language side, it will consider EVAL, F-code, FORTRAN 90 and HPF.

Target audience: anyone who is interested in recent developments in scalable parallel computer architectures. This is not basic material; it presents a coherent view of the fundamental problems of achieving scalability, of some of the developments seen in recent commercial architectures, and of developments from recent research which show promise in achieving this goal. It does not, however, require a detailed knowledge of parallel computer systems, although some background knowledge of conventional computer architecture is required (e.g. first-degree level).

Chris Jesshope is Racal Professor of IT at the University of Surrey, where he leads the Computer Systems Research Group. He has had a research interest in parallel computers since 1976, when he was a research fellow at Reading University and a user of the ICL DAP and Illiac V, both SIMD computers. Since then his interests have moved more towards the architecture and implementation of parallel machines. At Southampton University, from 1981 to 1990, he led projects implementing reconfigurable parallel computers, both SIMD and MIMD. At the University of Surrey, since 1990, he has led projects in which ASICs have been designed and fabricated to give low-latency, high-performance communication networks for scalable parallel computers. In addition, he has established several projects whose aim is to provide portable software tools for programming parallel computers, based on the automatic compilation of un-annotated, data-parallel languages.

Chris Jesshope has published in excess of 100 papers and has edited several state-of-the-art reports in this area. He is a co-author of a very successful book on parallel computers, which has been published in 5 editions, including two in English, one in Russian, one in Japanese and one in Romanian. He has given numerous invited papers at international conferences in more than 15 countries. He is the Honorary Editor of the IEE Proceedings Part E, Computers and Digital Techniques, and Series Editor for the Chapman and Hall book series in Parallel and Distributed Computing, and he serves on a number of editorial boards for other journals in this area.

Chris Jesshope is Chairman of the Steering committee for EUROPAR, a member of the IEE, a Chartered Engineer and a Fellow of the British Computer Society.


Tutorial 3

"ZM4/SIMPLE: a Universal Hardware Monitor and Trace Evaluation Package for Parallel and Distributed Systems"

Richard Hofmann

University Erlangen, IMMD VII
Martensstr. 3, D-91058
Erlangen
phone: +49-9131-85-7026
email: rhofmann@informatik.uni-erlangen.de

August 28th, half day (14 - 17:30)

Target audience: persons involved in programming parallel and distributed systems, with general knowledge of computer science and practical experience in parallel and distributed processing.

Due to the complex interactions between activities in parallel processes, the dynamic behavior of the system cannot be quantified a priori. However, a profound knowledge of what is going on in the system is the basis for balancing the load in order to optimally utilize the potential power of such a parallel system. Monitoring is a valuable aid in getting the necessary insight into this dynamic behavior of interacting processes. In this tutorial, the universal distributed hardware monitor system ZM4 and the universal toolbox for event trace evaluation SIMPLE are discussed. Both toolsets are in practical use at a number of research institutes at universities and in industry, as well as by ourselves.

In the first part of the tutorial, the principles of measurement-based performance analysis in parallel and distributed systems are briefly sketched. It is shown that an external hardware monitor system can be used for any parallel and distributed system, provided that it has the necessary features. These features are 1) a global timebase with sufficient accuracy, 2) a scalable architecture, and 3) a universal interfacing scheme that makes it easily adaptable to arbitrary object systems. ZM4, a universal distributed monitor system providing these features, is then introduced. ZM4's relevance for parallel applications is shown by discussing monitoring projects in state-of-the-art parallel and distributed systems.

The second part of the tutorial deals with SIMPLE (Source related Integrated performance Measurement and modeLing Environment), a universal and powerful toolbox for all tasks related to the process of presenting the meaning of trace data to human beings. Like ZM4, SIMPLE is universal in the following sense: it can be used for arbitrary event traces, regardless of their source, structure, meaning, and content. In order to achieve this, SIMPLE follows a layered approach with a configurable access layer to the actual trace data. The SIMPLE toolbox contains statistics-oriented tools like trcstat (compute common trace statistics), fact (find activities), and varus (validating rules checking system) as well as interactive graphics-oriented tools like gantt (draw state time diagrams) and hasse (draw causality diagrams between process traces). All these tools will be introduced with examples from measurements on practical parallel and distributed systems.

Richard Hofmann studied at the University of Erlangen, where he received his diploma in electrical engineering. After joining the IMMD (Institute for Mathematical Machines and Data Processing) at the same university, he designed and implemented the universal distributed monitor system ZM4. In 1992 he received the Dr.-Ing. degree in computer science for his work on Secure Temporal Relationships for Performance Analysis in Parallel and Distributed Systems. For this work he received the best dissertation award from the section communication and distributed systems in the GI/ITG. In 1991, Dr. Hofmann lectured on hardware and hybrid monitoring in parallel and distributed systems at Fudan University in Shanghai, China.

His current research interests include hardware monitoring, hybrid monitoring, hardware design for monitor interfaces, trace evaluation and tools, clock synchronization, basic problems in parallel and distributed systems, and programmable logic devices.


Tutorial 4

"Parallel Applications"

Erland Fristedt                   Dr. Per Öster
Center for Parallel Computers     Center for Parallel Computers
Royal Institute of Technology     Royal Institute of Technology
100 44 Stockholm                  100 44 Stockholm
Sweden                            Sweden
Email: erlandf@pdc.kth.se         Email: per@pdc.kth.se
Tel: +46 8 790 6907               Tel: +46 8 790 6261
Fax: +46 8 24 77 84               Fax: +46 8 24 77 84

August 28th, half day (9 - 12:30)

A substantial share of the standard computational codes used in industry and science is now available in versions for scalable parallel computers. Although some of these implementations lack efficiency and are only partial parallelizations, the present interest from software vendors must be considered a commercial breakthrough for parallel technology.

This tutorial introduces and discusses some aspects of the development of parallel applications, such as communication models, data decomposition, load balancing and the use of message-passing libraries.

Target audience: Everyone interested in an introduction to parallel applications.

Erland Fristedt has a Master's degree in Computer Science from KTH, Sweden. He has several years of experience in the development of parallel applications for message-passing systems at the Swedish National Defence Research Establishment. At present he is working with parallel applications at the Center for Parallel Computers at KTH, Sweden.

Per Öster has a BSc in Physics from the University of Uppsala, Sweden, and a PhD in Theoretical Atomic Physics from Chalmers University of Technology, Sweden. He has four years of experience as a consultant in applied mathematics, with responsibility for engineering applications, at the Volvo Data Corporation. At present he is working with parallel applications at the Center for Parallel Computers at KTH, Sweden.


Tutorial 5

"Parallel Database Systems Engineering"

Dr. Kam-Fai Wong

Department of Systems Engineering & Engineering Management,
Chinese University
Shatin, N.T.
Hong Kong.
Email - kfwong@se.cuhk.hk
Tel: +852 6098332
Fax: +852 6035505

August 28th, half day (14 - 17:30)

Today, very large databases may easily involve terabytes of data, and this trend shows no sign of diminishing. Despite advances in processor technology, handling such large volumes of information is becoming increasingly difficult for conventional database management systems which run on sequential computers. To overcome this predicament, a number of research projects are investigating the use of parallel computers. The parallelism inherent in the underlying data model (e.g. relational) makes database systems suitable for parallel implementation. In this tutorial, the concept of parallel database systems (PDS), which is based on the extended dataflow computation model, will be presented. In addition, a few engineering issues regarding the implementation of the model will be reviewed.

Target audience: database developers who are interested in parallel implementations; parallel software developers who are planning to develop a database system; first-year postgraduate students in databases or parallel computing.

Kam-Fai Wong obtained his PhD from the University of Edinburgh, Scotland, in 1987, in the area of computer architectures. After his PhD, he carried out research at Heriot-Watt University (Edinburgh, Scotland), UniSys (Livingston, Scotland) and ECRC (Munich, Germany). At present he is a Project Coordinator at the Chinese University of Hong Kong, in charge of the IPOC (Intelligent Processing Of Chinese) project. His research interests are parallel database and information systems. He has published over 25 technical papers in these areas in various international journals, conferences and books.

During his eight years of postdoctoral research, he has given many seminars. In 1993/95, he is one of the ACM lecturers worldwide. He is a member of IEEE-CS, ACM and IEE (UK), and has served as the AI/DB track chair of the 1994 ACM Symposium on Applied Computing, as the Asian Coordinator of the 1994 Parallel and Distributed Information Systems, and as a PC member of TOOLS94, PARLE94, VLDB94, SPDP94, ICDCS95, DASFAA95, Europar95 and ICDE96.


Pages developed and maintained by psm@sics.se. Please direct comments or bug reports to europar95-www@sics.se