Multicore Days 2008, abstracts and bios

Talks and tutorials

TITLE: Parallelism for Multicore and Manycore
James Reinders, Intel

ABSTRACT: James Reinders of Intel will help set the stage for the conference by discussing the current state of affairs with multicore processors and alerting us to the pending move to manycore processors. James will talk about where this is taking us in hardware and software, and how software applications can best take advantage of it. James will offer examples including some surprising and encouraging results, and practical advice for development of effective parallel software.


TITLE: Intel Threading Building Blocks
Alexey Kukanov, Intel

ABSTRACT: Intel Threading Building Blocks (TBB) is a cross-platform, open source solution for parallel programming in C++. TBB has become very popular, enjoying many contributions, ports to new platforms, and use in major, well-known applications. TBB provides algorithms for parallelism as well as container support to make up for shortcomings in STL. Alexey will explain TBB, discuss lessons learned that have helped guide its evolution, and talk about future directions.


TITLE: Nema Labs Approach to Accelerate Reliable Threading
Per Stenström, Nema Labs, Göteborg, Sweden

It will take years for the software community to thread legacy and new code to utilize the performance of multicores because of the educational gap and the many obstacles associated with threading. To accelerate threading efforts, Nema Labs offers methods and products that help software developers to safely thread their code with near-zero overhead using a sequential abstraction guided by intelligent tools.

Our new line of software tools is integrated in popular IDEs such as Eclipse. The tools help conventional programmers with no prior experience of threading to uncover parallelism in C/C++ reliably, with little effort, and in a platform-agnostic manner.


TITLE: Multicore Research Activities supported by the European Commission
Per Stenström, Chalmers University of Technology, Göteborg, Sweden

While the research community has been engaged in parallel processing research for decades, the shift to multicores has sent a clear message: identify what is of value in that work and what should drive research agendas for the next 5-10 years. The European research community has responded to this in many ways. First, the HiPEAC Network of Excellence (which stands for High-Performance and Embedded Architectures and Compilers) has brought together top-class researchers around several research themes that are important for the multi/many-core paradigm shift. Second, the SARC integrated project (which stands for Scalable computer ARChitecture) is addressing the major challenges of programmability and scalability. This talk will give an overview of these efforts.


TITLE: Programming in the era of parallelism
David Padua, University of Illinois at Urbana-Champaign

With the coming of age of multiprocessors, program performance and efficiency have become more important and more difficult to achieve. Furthermore, the applications of today must also be scalable so that they can make effective use of the additional parallelism introduced by newer generations of machines. To achieve strong and scalable performance, programmers must do all the work traditionally required for sequential tuning and in addition address the complex optimization issues introduced by parallelism. This difficulty is likely to increase even further if, as is expected, multicores become heterogeneous or their overall organization changes significantly over time. However, even assuming homogeneous and stable organizations, programmer productivity is bound to suffer due to the initial cost of tuning for multiprocessors and the need for adaptation as the number of processors increases.

In this talk, I will discuss future directions for programming language design, compiler technology, and the emerging autotuning strategies in the context of parallel programming. I will argue that advances in languages, compilers, and autotuning techniques will be necessary to recover the ground in productivity that has been lost with the advent of multicores. I will also argue that these three components of a programming environment must be designed jointly to facilitate program tuning. The ultimate goal is for tuning to be accomplished without requiring the programmer to be concerned with the details of the target machine. It is expected that languages, compilers, and autotuning techniques will evolve into a methodology that will dramatically reduce, and in some cases perhaps eliminate, the cost of porting programs across machine generations and machine classes. The availability of such a methodology should not only help programmer productivity but also give machine designers more freedom to innovate.


TITLE: Simics Accelerator: Creating a Parallel Program out of a Serial Problem
Dr. Jakob Engblom, Virtutech

One of the big issues in the shift to multicore architectures is the fact that certain workloads appear to be "stubbornly sequential" and not very amenable to a parallel execution model. Virtutech Simics is a computer system simulator and, like other tools of that type, remained a single-threaded program from its inception in 1991 until recently. The problem appears quite stubbornly sequential for a number of reasons. However, by rethinking and refining some key semantic properties of the simulation model, and by reconsidering the properties of the problem domain and how they could be exploited, Virtutech has managed to create a parallel version of Simics. This led to the Simics Accelerator product that was launched this past March. With Simics Accelerator, it is possible to run simulations using multiple host cores and see very good performance benefits that are close to linear in the number of host cores used. In this talk, we will discuss how we threaded Simics, why it was tough to do, how we made the impact on existing Simics code minimal, and the net performance results.


TITLE: OSE Multicore Version: Bare Metal Performance with SMP Flexibility
Dr. Magnus Karlsson, Enea

Using multicore chips for real-time operating systems (RTOS) has become a major focus of the industry. Many approaches have been suggested: use an SMP OS; run individual OSes on each core in an AMP solution; or run on bare metal without any OS. Each approach has its pros and cons. With SMP you get great flexibility, but you have to rewrite your applications and pay a performance penalty for sharing OS resources you might not use. With AMP, you get better performance, but it takes a major effort to get the OSes to cooperate. Finally, with bare metal you get the best performance, but you have to write all of the OS-like functionality you need yourself.

In this talk, we present the multicore version of our RTOS OSE, in which we aim to get bare metal performance combined with much of the flexibility of an SMP OS. The focus of the presentation will be on the design trade-offs we have made in order to achieve this goal. As a case study, we show how the OSE multicore version can be used in a carrier-grade telecommunications platform to provide good performance, scalability, and flexibility.



=========================================

Speakers


James Reinders

James Reinders is a senior engineer who joined Intel Corporation in 1989 and has contributed to projects including the world's first TeraFLOP supercomputer (ASCI Red), as well as compilers and architecture work for a number of Intel processors and parallel systems. James has been a driver behind the development of Intel as a major provider of software development products, and serves as their chief evangelist as well as their director of sales and marketing. Reinders is the author of a recent Nutshell book, "Intel Threading Building Blocks" from O'Reilly Media, which has been translated this year into Japanese and Chinese. James is a columnist for "The Gauntlet", found online at http://go-parallel.com, and the author of the book "VTune Performance Analyzer Essentials" from Intel Press. He has published numerous articles and is widely interviewed on parallelism. James received his B.S.E. in Electrical and Computing Engineering and M.S.E. in Computer Engineering from the University of Michigan.


Per Stenström

Per Stenström is a professor of computer engineering at Chalmers University of Technology. His research interests are devoted to hardware/software interaction in high-performance computer systems. He has authored/coauthored more than a hundred publications in this area and has chaired several top-class conferences including IEEE/ACM ISCA and IEEE HPCA. He also acts as editor of the Journal of Parallel and Distributed Computing and the IEEE TCCA Computer Architecture Letters, and as editor-in-chief of the Transactions on High-Performance Embedded Architectures and Compilers. He is a founding member of the EU-funded Network of Excellence HiPEAC and a founder of the startup Nema Labs. He was elevated to Fellow of the IEEE in 2007.


David Padua

David Padua is the Donald Biggar Willett Professor of computer science at the University of Illinois at Urbana-Champaign, where he has been a faculty member since 1985. At Illinois, he has been Associate Director of the Center for Supercomputing Research and Development, a member of the Science Steering Committee of the Center for Simulation of Advanced Rockets, and chair of the College of Engineering Faculty Advisory Committee. He has served as a program committee member, program chair, or general chair for more than 40 conferences and workshops. He served on the editorial board of the IEEE Transactions on Parallel and Distributed Systems, as editor-in-chief of the International Journal of Parallel Programming (IJPP), and as Steering Committee Chair of ACM SIGPLAN’s Principles and Practice of Parallel Programming. He is a member of the editorial boards of the Journal of Parallel and Distributed Computing, ACM Transactions on Programming Languages and Systems (TOPLAS), and IJPP. His areas of interest include compilers, machine organization, and parallel computing. He has published more than 140 papers in those areas. He is a Fellow of the IEEE and the ACM.