Multicore Day 2013 - speakers and abstracts

Multicore Day is part of SICS Software Week at Kistamässan 23-25 September.

See the program for Multicore Day 2013.

Abstracts and Bios


GPUs: The Hype, The Reality, and The Future

David Black-Schaffer, Uppsala University


Graphics processors (GPUs) have become increasingly appealing for general purpose computation due to their performance (12x CPUs), memory bandwidth (5x CPUs), and potential for more energy efficient computation (5x CPUs). However, behind this hype lie the realities of the difficulties of writing and optimizing code and the limited speedups users are seeing in the real world. This talk will investigate the hype, reality, and future of GPUs, particularly in response to Intel's attempts to take over the market with their own GPU-like accelerator.


David Black-Schaffer is an Assistant Professor in the Uppsala Architecture Research Group, Department of Information Technology, Uppsala University.


Parallel Programming in the Age of Ubiquitous Parallelism

Keshav Pingali, Department of Computer Science and Institute for Computational Engineering and Sciences, The University of Texas at Austin


Multicore and manycore processors are now ubiquitous, but parallel programming remains as difficult as it was 30-40 years ago. During this time, our community has explored many promising approaches including functional and dataflow languages, logic programming, and automatic parallelization using program analysis and restructuring, but these approaches have not succeeded except in a few niche application areas.

In this talk, I will argue that these problems arise largely from the computation-centric foundations and abstractions that we currently use to think about parallelism. In their place, I will propose a novel data-centric foundation for parallel programming called the operator formulation in which algorithms are described in terms of actions on data. The operator formulation shows that a generalized form of data-parallelism called amorphous data-parallelism is ubiquitous even in complex, irregular graph applications such as mesh generation/refinement/partitioning and SAT solvers.

Regular algorithms emerge as a special case of irregular ones, and many application-specific optimization techniques can be generalized to a broader context. The operator formulation also leads to a structural analysis of algorithms called TAO-analysis that provides implementation guidelines for exploiting parallelism efficiently. Finally, I will describe a system called Galois based on these ideas for exploiting amorphous data-parallelism on multicores and GPUs.


Keshav Pingali is a Professor in the Department of Computer Science at the University of Texas at Austin, and he holds the W.A. "Tex" Moncrief Chair of Computing in the Institute for Computational Engineering and Sciences (ICES) at UT Austin. He was on the faculty of the Department of Computer Science at Cornell University from 1986 to 2006, where he held the India Chair of Computer Science.

Pingali is a Fellow of the ACM, IEEE and the American Association for the Advancement of Science. He was the co-Editor-in-chief of the ACM Transactions on Programming Languages and Systems, and currently serves on the editorial boards of the International Journal of Parallel Programming and Distributed Computing. He has also served on the NSF CISE Advisory Committee (2009-2012).


The State of the Future

Krisztián Flautner, ARM


Silicon technology evolution over the last four decades has yielded an exponential increase in integration densities with steady improvements of performance and power consumption at each technology generation. This steady progress has created a sense of entitlement for the dreams (and products) that future process generations would enable. Today, however, classical process scaling seems to be dead and living up to technology expectations requires continuous innovation at many levels, which comes at steadily progressing implementation and design costs. As is usually the case with economics, predictions of doom and gloom simply increase the incentives for innovation and give room for creativity. This talk will review some of the underlying issues, driving trends and promising-looking solutions that will allow engineers and designers to turn today's sci-fi vision into 2020's reality.


Krisztián Flautner is the vice president of research and development at ARM. ARM designs the technology that lies at the heart of advanced digital products, with more than forty billion processors deployed by mid-2013. He leads a global team focused on the understanding and development of technologies relevant to the proliferation of the ARM architecture. The group’s activities cover a wide range of areas, from circuits through processor and system architectures to tools and software. Key activities are related to high-performance computing in energy-constrained environments. Flautner received a PhD in computer science and engineering from the University of Michigan, where he is currently appointed as a visiting scholar. He is a member of the ACM and the IEEE.


Improving the efficiency of multicores in data centers

Jason Mars, University of Michigan


The class of datacenters coined as “warehouse scale computers” (WSCs) house large-scale data intensive web services such as websearch, maps, social networking, docs, video sharing, etc. Companies like Google, Microsoft, Yahoo, and Amazon spend tens to hundreds of millions to construct and operate WSCs to provide these services. Maximizing the efficiency of this class of computing reduces cost and has energy implications for a greener planet. However, WSC design and architecture remain in their relative infancy.

WSCs are built using commodity processor architectures (Intel/AMD) and software components (Linux, GCC, JVM, etc.) that have been engineered and optimized for traditional computing environments and workloads, such as those you’d find in the desktop / laptop environment. However, there are many characteristics, assumptions, and requirements present in the WSC computing domain that impact design decisions within these components. In this presentation, we rethink how WSCs are designed and architected, identify sources of inefficiency, and develop solutions to improve WSCs, with a particular focus on the interaction between the application layer, system software stack, and the underlying multicore platform.


Jason Mars is currently an Assistant Professor at the University of Michigan. He received his Ph.D. in Computer Science from the University of Virginia in 2012. He has been an active researcher in the areas of computer architecture, system software, and cross-layer system design within the emerging domain of cloud computing platforms. Jason has published dozens of papers in these areas and received a number of awards and honors for excellence in his research work. You can find out more information about Jason Mars at


Track: Systems

Using Speculation to Enhance JavaScript Performance in Web Applications

Jan Kasper Martinsen, Blekinge Institute of Technology


JavaScript lets developers provide client-side interactivity in Web applications, but because it is sequential, it can’t take advantage of multicore processors. Thread-level speculation (TLS) addresses this issue by enabling the speculation of JavaScript function calls and thus exploits the parallel performance potential multicore processors provide. The authors implemented TLS in the Squirrelfish JavaScript engine, which is part of the WebKit browser environment. They evaluate their approach using 15 popular Web applications on an eight-core computer, and show significant speed-ups without any modifications to the JavaScript source code.


Jan Kasper Martinsen is a PhD student in computer systems engineering at Blekinge Institute of Technology. His current work focuses on evaluating benchmarks and thread-level speculation for JavaScript execution in web applications. Martinsen has an MSc in computer science from the University of Oslo.


Improving perfect parallelism

Lars Karlsson, Umeå University


Perfectly parallel applications can be decomposed into a large number of independent tasks, and one typically expects perfect/linear speedup from such applications. A key assumption is that the processors are independent in the sense that they do not share any resources. However, this assumption is certainly false if by processor we mean a single core in a multicore processor. Cores often share one or more levels of cache as well as a memory bus. We point out the simple but often overlooked observation that one can improve on perfect parallelism by forcibly parallelizing each task in an effort to transform shared resource competition into friendly cooperation.


Lars Karlsson is an Assistant Professor at Umeå University, where he is affiliated with the UMIT Research Lab. He received his Ph.D. in Computing Science in 2011 and specializes in high-performance scientific computing. Lars has made significant algorithmic contributions in the area of high-performance matrix computations and is currently improving the cache efficiency and scalability of matrix eigenvalue solvers.


System-level IPC on multicore platforms

Ola Dahl, Enea


Multicore System-on-Chip solutions, offering parallelization and partitioning, are increasingly used in real-time systems. As the number of cores increases, often in combination with increased heterogeneity in the form of hardware accelerated functionality, we see increased demands on effective communication, inside a multicore node but also at an inter-node system level. The presentation will outline some of the challenges, as seen from Enea, to be expected when building future communication mechanisms, with requirements on performance and scalability, as well as transparency for applications. We will give examples from ongoing work in the Linux area, from Enea and from other open source contributors.


Ola Dahl is a systems and software engineer with more than twenty years of experience from development and research in the embedded and real-time systems domain. He has worked with software development and design, and with requirements engineering, modeling and simulation, in telecom and other industries, and he has held management positions in industry and in academia. Ola Dahl holds a PhD in Automatic Control from Lund Institute of Technology. He joined Enea in 2013, as Principal Engineer at the Enea CTO Office.

Track: Models

Parallelism Management in the Multi/Many-core Processor Era

Professor Per Stenström, Chalmers University of Technology

As we have embarked on the multi/many-core roadmap, managing parallelism transparently to the programmers has had to be abandoned and is now left completely to the programmers. In this talk, I will argue that, by adequate support at the architecture and run-time system level, it seems possible to relieve programmers from many of the hurdles associated with parallelism management. I will especially focus on current developments in task-based parallel programming models and transactional memory and how these concepts can yield high-productivity parallel programming abstractions by making it possible to manage parallelism "under the hood" in a performance-robust way.


Per Stenström is a professor at Chalmers University of Technology. His research interests are in parallel and distributed computer architecture. He has authored or co-authored three textbooks and more than 130 publications in this area. He has been program chairman of the IEEE/ACM Symposium on Computer Architecture, the IEEE High-Performance Computer Architecture Symposium, and the IEEE Parallel and Distributed Processing Symposium and acts as Senior Associate Editor of ACM TACO and Associate Editor-in-Chief of JPDC. He is a Fellow of the ACM and the IEEE and a member of Academia Europaea and the Royal Swedish Academy of Engineering Sciences.


Threads are Evil, Tasks are Good: Towards a Unified Resource Management Framework

Professor Mats Brorsson, KTH


Multicore systems are becoming more parallel and heterogeneous. Unfortunately, the process and thread concurrency abstractions in current operating systems do not work well for these architectures in multi-programmed scenarios. Instead, we argue for a radically new concurrency abstraction akin to the tasks in TBB, OpenMP and Cilk to replace the old ones. We will show benefits such as (i) better resource utilization and (ii) improved compositional parallelism, and discuss how it can be implemented efficiently in the Barrelfish operating system.


Mats Brorsson is a professor of Computer Architecture at KTH Royal Institute of Technology and cross-affiliated with SICS. His current research interests are in programming models, run-time systems, operating systems and the architecture of parallel computer systems, in particular multi- and manycore systems. He has participated in several European and national projects and is the coordinator of the ARTEMIS call 2011 project PaPP: Portable and Predictable Performance on Embedded Heterogeneous Manycores, a project with 16 partners from 8 countries and with a budget of more than 10 M€.


Computing in the age of parallelism: challenges and opportunities

Jörn W. Janneck, Lund University


Several years ago clock scaling came to an end, taking with it the steady acceleration of sequential programs, and ushering in the "age of parallelism." Prompted by some simple physical realities of computing, the sea change in computer science has only just begun. When it is over, our discipline will have changed forever, along with our understanding of basic concepts such as "algorithm" and "machine" as well as the nature and economics of computing devices, and everything in between: programming languages, programming tools, software architecture, development processes and many more.

Based on our work at the intersection of stream computing and parallel computing machines, this talk will highlight some of the challenges and opportunities in making future parallel computing machines as usable as sequential machines are.


Jörn W. Janneck is a senior lecturer (lektor) in the computer science department at Lund University. He graduated from the University of Bremen in 1995 and received a PhD from ETH Zurich in 2000. He worked at the Fraunhofer Institute for Material Flow and Logistics (IML) in Dortmund, was a postdoctoral scholar at the University of California at Berkeley in the EECS department, and worked in industrial research from 2003 to 2010, first at Xilinx Research in San Jose, CA, and more recently at the United Technologies Research Center in Berkeley, CA. He is co-author of the CAL actor language and has been working on tools and methodology focused on making dataflow a practical programming model in a wide range of application areas, including image processing, video coding, networking/packet processing, DSP and wireless baseband processing. He has made major contributions to the standardization of RVC-CAL and dataflow by MPEG and ISO. His research is focused on aspects of programming parallel computing machines, including programming languages, machine models, tools, code generation, profiling, and architecture.


The day is free of charge. Please register at

Multicore Day is part of SICS Software Week and organized in partnership with EIT ICT Labs.


Last year's Multicore Day

Program Multicore Day 2012
Abstracts Multicore Day 2012
Videos Multicore Day 2012