Multicore Day August 31
Multicore Day 2007 was very successful! The program included representatives of Swedish and international industry and academia and featured Dr. Marc Tremblay (CTO Microelectronics, Sun Microsystems), Dr. Tim Mattson (Parallel Computing Evangelist at Intel) and Prof. Vijay Saraswat (IBM TJ Watson Research Lab) as invited speakers.
Time: August 31 2007, 9.00 - 16.30
Place: Kista Science Tower, Kista
The Multicore Day is organised by SICS in cooperation with Uppsala University and Chalmers University of Technology and is sponsored by VINNOVA.
Program
=======
9.00 Registration and coffee
9.30 Multicore: a Promise or a Threat?,
Prof. Erik Hagersten (Uppsala University)
10.00 Multithreaded Multicores and Rock's Transactional Memory, Dr. Marc Tremblay (CTO Microelectronics Sun Microsystems)
11.00 Parallel Programming: can we PLEASE do it right this time?,
Dr. Tim Mattson (Parallel computing evangelist, Intel)
12.00 Lunch break (lunch is not included, but there are many places to eat nearby)
COBEGIN (three parallel tracks)
---- Validation and Testing Track ----
13.00 Debugging Multicore Software Issues using Virtual Hardware,
Jakob Engblom (Virtutech)
13.30 Finding the Multicore Bottlenecks,
Erik Hagersten (CEO Acumem AB)
14.00 QuickCheck, Prof. John Hughes (Chalmers)
---- Platforms and Applications Track ----
13.00 Introduction to OpenMP, Dr. Tim Mattson (Intel)
13.30 Experiences of Multicore Implementations, Dr. Sverker Holmgren (Uppsala University)
14.00 Exploiting Multicore at Ericsson, Ulf Wiger (Ericsson)
---- Future Technologies Track ----
13.00 Lock free algorithms, Dr. Philippas Tsigas (Chalmers)
13.30 Reducing Squash Penalties in Transactional Memory Protocols,
Prof. Per Stenström (Chalmers)
14.00 Dependency profiling with Embla, Dr. Karl-Filip Faxen (SICS)
COEND
14.30 Coffee
15.00 The X10 Programming Language, Prof. Vijay Saraswat (IBM)
15.45 Closing Panel
16.30 The end
Introduction
============
In a multicore system, each processor chip contains several processors, referred to as cores. From a hardware perspective, this is the most appealing strategy for offering higher computer performance based on today’s chip technology. However, what is the software perspective?
Some people refer to multicore as the biggest revolution in computing since object-oriented programming. It will require rethinking of the way we structure and program our systems. Today, there are more questions than answers from a software perspective:
Will my existing algorithms run well on multicore?
How do I use existing languages to utilize the power of multicore?
What are the pros and cons of new languages?
How do I test and validate a parallel program?
Any new implications on real-time properties?
Answering these questions will open the door to utilizing multicore technology. The reward is high: multicore systems can offer substantially higher performance at much lower power consumption than the old single-core strategy. Judging from trends in the U.S. computer industry, all major players seem to be in violent agreement. Like it or not: the future is multicore.
Attend Multicore Day and get first-hand information about hardware and software thinking directly from key players in the U.S. computer industry. They will cover topics such as multicore hardware, using existing languages for multicore and the proposal of new programming languages for multicore. Three parallel sessions of technology from Swedish companies and researchers will give depth to some of the key aspects of going multicore.
Short biographies of invited speakers
=====================================
Dr. Marc Tremblay
Fellow, Senior Vice President and Chief Technology Officer
Microelectronics, Sun Microsystems Inc.
Marc Tremblay is currently CTO for Sun's Microelectronics business unit where he sets the direction for Sun's processor roadmap and related technology. His mission is to move Sun's entire product line to the Throughput Computing paradigm, incorporating techniques he has helped develop, including chip multiprocessing, chip multithreading, speculative multithreading, assist threading and transactional memory.
----------------
Dr. Tim Mattson
Parallel Computing Evangelist
Intel Corporation
Dr. Mattson is interested in doing whatever it takes to “make sequential software rare”. His wide-ranging research agenda over the years has focused on parallel applications, parallel programming environments, and a careful analysis of the cognitive psychology of the programming process.
Currently, Dr. Mattson is conducting research on abstractions that bridge across parallel system design, parallel programming environments, and application software. This work builds on his recent book on Design Patterns in Parallel Programming (written with Professors Beverly Sanders and Berna Massingill and published by Addison Wesley). The patterns provide the "human angle" and help keep his research focused on technologies that help general programmers solve real problems.
----------------
Prof. Vijay Saraswat
Research Staff Member, IBM TJ Watson Research Lab
Adjunct Professor, Penn State University
Vijay Saraswat joined IBM Research in September 2003, after a year as a Professor at Penn State, a couple of years at startups and 13 years at Xerox PARC and AT&T Research. His main interests are in programming languages, constraints, logic and concurrency. At IBM, he leads the work on the design of X10, a modern object-oriented programming language intended for scalable concurrent computing. Over the last twenty years he has lectured at most major universities and research labs in the USA and Europe.
Abstracts of talks
==================
Multicore: a Promise or a Threat?
Erik Hagersten, professor, Uppsala University
This talk gives background on the multicore revolution and explains why it is happening right now. It identifies some of the challenges and possibilities of this new technology and raises questions to bear in mind when outlining the multicore strategy for a company.
-------------
Multithreaded Multicores and Rock's Transactional Memory,
Dr. Marc Tremblay (CTO Sun Microelectronics)
High-Performance Throughput Computing, achieved through
designed-from-scratch processors composed of multiple multi-threaded
cores, offers an unprecedented opportunity to create a new generation
of pipelines that deliver both high throughput performance and high
single-thread performance.
High throughput is achieved by generating tens of independent,
highly-confident, memory accesses and by providing the appropriate
high-bandwidth hierarchical caches, scalable interconnect fabric,
and a vast arrangement of memory DIMMs.
Strangely perhaps, high single-thread performance is obtained
through leveraging the same microarchitecture structures. Hardware
threads act as surrogates for the main thread, decomposing the hard
task into different phases (prefetch, speculation, execution, catch up,
and join).
The commercial availability of a plethora of multicores, while of great use
for servers, has not yet led to a new generation of software applications.
Transactional Memory has emerged as a leading technique that
enables applications to better take advantage of multi-threaded,
multi-core microprocessors. In this talk, we will unveil, for the first
time in Europe, what we believe is the first hardware implementation of
Transactional Memory, available on the Rock SPARC microprocessor.
-------------
Parallel Programming: can we PLEASE do it right this time?
Dr. Tim Mattson, Parallel Computing Evangelist, Intel Corporation
The computer industry has a problem. In the near future, our products will have multiple CPU cores per chip. When placed in SMP systems, clusters, and large scale grids, parallel systems will be ubiquitous. And if something isn't done soon to convert the key application software into a form that can exploit parallelism, these great parallel systems will only be marginally useful.
Where will this parallel software come from? With few exceptions, only graduate slaves and other strange "HPC people" are willing to write parallel software. Professional software engineers almost never write parallel software.
In this talk, I look back at the history of parallel computing and develop a set of rules we must follow if we want to attract programmers to parallel computing. People have been writing parallel programs for over 20 years. Just about every stupid mistake we could make, someone has already made. So rather than rediscover these mistakes on our own, let's learn from the past and "do it right this time".
-------------
Debugging Multicore Software Issues using Virtual Hardware
Dr. Jakob Engblom, Business Development Manager, Virtutech
Debugging software running on multicore and multiprocessor systems is a difficult problem. Errors are difficult to provoke, recreate, analyze and resolve because multiprocessing systems are inherently non-deterministic.
By using a virtual model of a multicore system instead of physical hardware, determinism can be reintroduced. The virtual model provides perfect control over and insight into the target system, and makes it possible to reliably reproduce problems. Virtual models also allow replay and reverse execution to enhance debug productivity.
-------------
QuickCheck
Prof. John Hughes, Chalmers
QuickCheck is a tool for testing programs automatically. The programmer
provides a specification of the program, in the form of properties which the
program should satisfy, and QuickCheck then tests that the properties hold
in a large number of randomly generated cases. When a discrepancy between
what actually happens and what is supposed to happen is found, QuickCheck
simplifies the test case to the minimum that still produces the discrepancy,
automating the first stage of debugging. While specifications are written in
QuickCheck's Erlang-based specification language, the system under test can
be implemented in any language (or combination of languages). QuickCheck
is being used successfully to test industrial telecoms systems.
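
The workflow described above (state a property, test it on random cases, then simplify any failing case) can be sketched in a few lines. This is only an illustration in Python and does not use QuickCheck or its Erlang-based specification language; the names buggy_sort, prop_sort_keeps_length, shrink and check are hypothetical, and the shrinking strategy here (deleting one element at a time) is an assumption that is much cruder than what a real tool would do.

```python
import random

def buggy_sort(xs):
    # Hypothetical system under test. Deliberately wrong:
    # set() silently drops duplicate elements.
    return sorted(set(xs))

def prop_reverse_twice(xs):
    # Property that holds: reversing twice gives back the input.
    return list(reversed(list(reversed(xs)))) == xs

def prop_sort_keeps_length(xs):
    # Property that buggy_sort violates whenever xs has duplicates.
    return len(buggy_sort(xs)) == len(xs)

def shrink(prop, xs):
    # Greedily delete single elements for as long as the
    # property still fails on the smaller list.
    shrunk = True
    while shrunk:
        shrunk = False
        for i in range(len(xs)):
            candidate = xs[:i] + xs[i + 1:]
            if not prop(candidate):
                xs, shrunk = candidate, True
                break
    return xs

def check(prop, runs=200, seed=0):
    # Test prop on randomly generated integer lists. Return a
    # simplified counterexample, or None if every case passed.
    rng = random.Random(seed)
    for _ in range(runs):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]
        if not prop(xs):
            return shrink(prop, xs)
    return None
```

Calling check(prop_reverse_twice) should find no counterexample, while check(prop_sort_keeps_length) should report a failing list cut down to two equal values, which points directly at the duplicate-handling bug.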
-------------
Finding the Multicore Bottlenecks
Erik Hagersten, CEO, Acumem AB
Most applications waste substantial performance in the memory system. This problem is expected to increase for multicore systems. Furthermore, the complicated thread interaction of multicore applications is expected to add to the unnecessary overhead. Unfortunately, such problems are hard to spot and require performance experts to analyze and fix.
The Acumem technology automatically identifies such wasteful memory behaviour. About 20 different types of performance issues are identified, and fixes are suggested at a level of detail that allows a novice programmer to carry out optimizations that today require highly skilled performance experts. The result is the Virtual Performance Expert.
-------------
Introduction to OpenMP
Dr. Tim Mattson, Parallel Computing Evangelist, Intel Corporation
OpenMP is a vendor-independent standard for expressing shared-memory parallel computation in conventional languages. It provides a set of constructs for specifying parallel code, including loops, and for specifying how data is shared among the parallel activities. Currently, there are bindings for Fortran, C and C++.
-------------
Experiences of Multicore Implementations
Dr. Sverker Holmgren, Uppsala University
Multicore processors change the rules for the development of parallel algorithms and implementations. As an example, we show how the performance of an important computation can be improved by a factor of three by optimizing for a multicore architecture instead of using the standard algorithm optimized for older parallel architectures. The results build on extensive experience of shared memory algorithms and implementations, and we also give some highlights of other current research in this area at Uppsala University.
-------------
Exploiting Multicore at Ericsson
Ulf Wiger, Ericsson
Like all others, Ericsson must adapt to the multicore trend. The potential is great, since telecom applications have a high degree of natural concurrency. Even so, each product family's approach to concurrency dictates the challenges faced. This talk describes some of these challenges, and gives examples of how they can be addressed. It also describes some notable successes.
-------------
Lock free algorithms
Dr. Philippas Tsigas, Chalmers
Lock-free algorithms are a research area that has been very active in recent years. The idea is to avoid locks in communication between threads, instead using the atomic instructions available in most current architectures. In this way, parallelism is improved, and waiting for locks, with its attendant risk of deadlock, is avoided.
-------------
Reducing Squash Penalties in Transactional Memory Protocols
Prof. Per Stenström, Chalmers
Transactional memory promises to reduce the programming efforts for multi-core computers by avoiding serialization introduced by lock-based synchronization methodologies. Unfortunately, current proposals suffer from serious execution time losses when transactions conflict, especially in TM protocols using lazy conflict resolution. We present extensions to such protocols that reduce the penalties when squashing transactions and also address the starvation problem.
-------------
Dependency profiling with Embla
Dr. Karl-Filip Faxen, SICS
Embla is a tool (under development at SICS) for helping programmers identify independent program parts that can be executed in parallel, thus easing the burden of porting legacy software to multicore processors. Embla records dependencies while the program is running and is independent of the source language(s) the program is written in.
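
To make the idea of recording dependencies concrete, here is a toy sketch in Python. It is not Embla and makes no claim about how Embla works internally: the trace is written by hand rather than captured from a running program, and the names record, dependencies and parallel_candidates, as well as the step and data names in the example, are all hypothetical. The sketch assumes each step has a unique name and lists the data it reads and writes; two steps are reported as candidates for parallel execution when no chain of read/write conflicts orders one before the other.

```python
from itertools import combinations

def record(trace, name, reads=(), writes=()):
    # Append one executed step and the data it touched.
    trace.append((name, frozenset(reads), frozenset(writes)))

def dependencies(trace):
    # Direct dependencies (earlier, later): the later step reads or
    # writes something the earlier one wrote, or overwrites
    # something the earlier one read.
    deps = set()
    for (a, ra, wa), (b, rb, wb) in combinations(trace, 2):
        if wa & rb or ra & wb or wa & wb:
            deps.add((a, b))
    return deps

def parallel_candidates(trace):
    # Pairs of steps with no direct or transitive dependency.
    deps = dependencies(trace)
    names = [name for name, _, _ in trace]
    reach = {n: set() for n in names}  # steps that n depends on
    for b in names:  # trace order is a valid topological order
        for a in names:
            if (a, b) in deps:
                reach[b] |= {a} | reach[a]
    return {(a, b) for a, b in combinations(names, 2)
            if a not in reach[b]}

# Hand-written example trace (all names are made up).
trace = []
record(trace, "load_config", writes={"cfg"})
record(trace, "parse_a", reads={"cfg", "a.txt"}, writes={"a"})
record(trace, "parse_b", reads={"cfg", "b.txt"}, writes={"b"})
record(trace, "merge", reads={"a", "b"}, writes={"out"})

candidates = parallel_candidates(trace)
```

In this example the only pair reported is parse_a and parse_b: both depend on load_config and both feed merge, but neither touches data the other writes.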
-------------
The X10 programming language
Prof. Vijay Saraswat, Penn State University & IBM
X10 is a programming language for multi(core) processors and clusters. X10 is built on Java and is being developed at IBM under the direction of Prof. Vijay Saraswat. The language has an explicit concept of place to control the data distribution in a distributed system.
