Multicore Days 2008, abstracts and bios
Talks and tutorials
TITLE: Parallelism for Multicore and Manycore
James Reinders, Intel
ABSTRACT: James Reinders of Intel will help set the stage for the
conference by discussing the current state of affairs with multicore
processors and alerting us to the pending move to manycore processors.
James will talk about where this is taking us in hardware and software,
and how software applications can best take advantage of it. James will
offer examples including some surprising and encouraging results, and
practical advice for development of effective parallel software.
TITLE: Intel Threading Building Blocks
Alexey Kukanov, Intel
ABSTRACT: Intel Threading Building Blocks (TBB) is a cross-platform,
open source solution for parallel programming in C++. TBB has become
very popular, enjoying many contributions, ports to new platforms, and
usage in major well-known applications. TBB provides algorithms
for parallelism as well as container support to make up for
shortcomings in STL. Alexey will explain TBB, discuss lessons learned
that have helped TBB evolve, and outline future directions.
TITLE: Nema Labs Approach to Accelerate Reliable Threading
Per Stenström, Nema Labs, Göteborg, Sweden
It will take years for the software community to thread legacy and new
code to utilize the performance of multicores because of the educational
gap and the many obstacles associated with threading. To accelerate
threading efforts, Nema Labs offers methods and products that help
software developers to safely thread their code with near-zero overhead
using a sequential abstraction guided by intelligent tools.
Our new line of software tools is integrated in popular IDEs such as
Eclipse. The tools help conventional programmers with no prior
experience of threading to uncover parallelism in C/C++ reliably, with
little effort, and in a platform-agnostic manner.
TITLE: Multicore Research Activities supported by the European Commission
Per Stenström, Chalmers University of Technology, Göteborg, Sweden
While the research community has been engaged in parallel processing
research for decades, the shift to multicores has sent a clear message:
it is time to extract what is of value and decide what should drive
research agendas for the next 5-10 years. The European research
community has responded to this in many ways. First, the HiPEAC Network
of Excellence (which stands for High-Performance and Embedded
Architectures and Compilers) has brought together top-class researchers
around several research themes that are of importance for the
multi/many-core paradigm shift. Second, the SARC integrated project
(which stands for Scalable computer ARChitecture) is addressing the
major challenges of programmability and scalability. This talk will
give an overview of these efforts.
TITLE: Programming in the era of parallelism
David Padua, University of Illinois at Urbana-Champaign
With the coming of age of multiprocessors, program performance and
efficiency have become more important and more difficult to achieve.
Furthermore, the applications of today must also be scalable so that
they can make effective use of the additional parallelism introduced by
newer generations of machines. To achieve strong and scalable
performance, programmers must do all the work traditionally required for
sequential tuning and in addition address the complex optimization
issues introduced by parallelism. This difficulty is likely to increase
even further if, as is expected, multicores become heterogeneous or
their overall organization changes significantly over time. However, even
assuming homogeneous and stable organizations, programmer productivity
is bound to suffer due to the initial cost of tuning for multiprocessors
and the need for adaptation as the number of processors increases.
In this talk, I will discuss future directions for programming language
design, compiler technology, and the emerging autotuning strategies in
the context of parallel programming. I will argue that advances in
languages, compilers, and autotuning techniques will be necessary to
recover the ground in productivity that has been lost with the advent of
multicores. I will also argue that these three components of a
programming environment must be designed jointly to facilitate program
tuning. The ultimate goal is for tuning to be accomplished without
requiring the programmer to be concerned with the details of the target
machine. It is expected that languages, compilers and autotuning
techniques will evolve into a methodology that will dramatically reduce
and perhaps eliminate in some cases the cost of porting programs across
machine generations and machine classes. The availability of such a
methodology should not only help programmer productivity but also give
machine designers more freedom to innovate.
TITLE: Simics Accelerator: Creating a Parallel Program out of a Serial Problem
Dr. Jakob Engblom, Virtutech
One of the big issues in the shift to multicore architectures is
the fact that certain workloads appear to be "stubbornly sequential",
and not very amenable to a parallel execution model. Virtutech Simics
is a computer system simulator and, like other tools of that type,
remained a single-threaded program from its inception in 1991 until
recently. The problem appears quite stubbornly sequential for a number of
reasons. However, by rethinking and refining some key semantic
properties of the simulation model and reconsidering the properties of
the problem domain and how they could be exploited, Virtutech has
managed to create a parallel version of Simics. This led to the Simics
Accelerator product that was launched this past March. With Accelerator,
it is possible to run simulations using multiple host cores and see very
good performance benefits that are close to linear in the number of host
cores used. In this talk, we will discuss how we threaded Simics, why
it was tough to do, how we made the impact on existing Simics code
minimal, and the net performance results.
TITLE: OSE Multicore Version: Bare Metal Performance with SMP Flexibility
Dr. Magnus Karlsson, Enea
Using multicore chips for real-time operating systems (RTOS) has
become a major focus of the industry. Many approaches have been
suggested: use an SMP OS; run individual OSes on each core in an AMP
solution; or run on bare metal without any OS. Each approach has its
pros and cons. With SMP you get great flexibility, but you have to
rewrite your applications and pay a performance penalty for sharing of
OS resources you might not use. With AMP, you get better performance
but there is a major effort in getting the OSes to cooperate. Finally,
with bare metal you get the best performance, but you have to write
all of the OS-like functionality you need yourself.
In this talk, we present the multicore version of our RTOS OSE, in
which we aim to get bare metal performance combined with much of the
flexibility of an SMP OS. The focus of the presentation will be on the
design trade-offs we have made in order to achieve this goal. As a
case study, we show how the OSE multicore version can be used in a
carrier-grade telecommunications platform, to provide good
performance, scalability and flexibility.
=========================================
Speakers
James Reinders
James Reinders is a senior engineer who joined Intel Corporation in 1989
and has contributed to projects including the world's first TeraFLOP
supercomputer (ASCI Red) as well as compilers and architecture work for
a number of Intel processors and parallel systems. James has been a
driver behind the development of Intel as a major provider of software
development products, and serves as their chief evangelist as well as
their director of sales and marketing. Reinders is the author of a
recent Nutshell book, "Intel Threading Building Blocks" from O'Reilly
Media, which has been translated this year into Japanese and Chinese.
James is a columnist for "The Gauntlet", found online at
http://go-parallel.com, and author of the book "VTune Performance
Analyzer Essentials" from Intel Press. He has published numerous
articles and is widely interviewed on parallelism. James received his
B.S.E. in Electrical and Computing Engineering and M.S.E. in Computer
Engineering from the University of Michigan.
Per Stenström
Per Stenström is a professor of computer engineering at Chalmers
University of Technology. His research interests are devoted to
hardware/software interaction in high-performance computer systems. He
has authored/coauthored more than a hundred publications in this area
and has chaired several top-class conferences including IEEE/ACM ISCA
and IEEE HPCA. He also acts as editor of the Journal of Parallel and
Distributed Computing and the IEEE TCCA Computer Architecture Letters,
and as editor-in-chief of the Transactions on High-Performance Embedded
Architectures and Compilers. He is a founding member of the EU-funded
Network of Excellence HiPEAC and a founder of the startup Nema Labs. He
was elevated to Fellow of the IEEE in 2007.
David Padua
David Padua is Donald Biggar Willett Professor of computer science at the
University of Illinois at Urbana-Champaign, where he has been a faculty
member since 1985. At Illinois, he has been Associate Director of the
Center for Supercomputing Research and Development, a member of the
Science Steering Committee of the Center for Simulation of Advanced
Rockets, and
chair of the College of Engineering Faculty Advisory Committee. He has
served as a program committee member, program chair, or general chair
for more than 40 conferences and workshops. He served on the editorial
board of the IEEE Transactions on Parallel and Distributed Systems, as
editor-in-chief of the International Journal of Parallel Programming
(IJPP) and as Steering Committee Chair of ACM SIGPLAN’s Principles and
Practice of Parallel Programming. He is a member of the editorial boards
of the Journal of Parallel and Distributed Computing, ACM Transactions
on Programming Languages and Systems (TOPLAS), and IJPP. His areas of
interest include compilers, machine organization, and parallel
computing. He has published more than 140 papers in those areas. He is a
Fellow of the IEEE and the ACM.
