Kostas Tzoumas


Kostas Tzoumas, Data Artisans, Berlin:
Apache Flink: A New Distributed Data Analysis Platform

Apache Flink (flink.incubator.apache.org) is a next-generation platform for big data analysis. Flink combines the flexibility and scalability of MapReduce-like systems with a high-performance runtime and automatic optimization technology inspired by MPP databases. Flink offers fluent APIs in Java and Scala that extend the MapReduce model with arbitrarily long programs and additional operators such as join, cogroup, cross, and iterate. Flink's runtime uses main memory efficiently and degrades gracefully to disk, retaining good performance under memory pressure. Flink's cost-based optimizer automatically picks the best execution strategy for a program, taking into account data and hardware characteristics. Finally, Flink features end-to-end, first-class support for iterative programs, achieving performance similar to Giraph's while remaining a general (not graph-specific) system. Flink is compatible with the Hadoop ecosystem, runs on top of YARN, and can use HDFS for data storage. Flink is developed by a growing community and is currently seeing its first commercial installations and use cases.
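To make two of the operators named above more concrete, the following is a minimal, library-free Java sketch of what a join and a bulk iteration compute. It is not Flink's API: the class OperatorSketch and the methods join and iterate are hypothetical names chosen for illustration, and everything runs in a single JVM on in-memory lists. The join is written as a simple hash join (build a table on one input, probe with the other), which is only one of the strategies an optimizer might choose.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical illustration only; none of these names come from Flink.
public class OperatorSketch {

    // Equi-join on the entry key: build a hash table from the left input,
    // then probe it with each record of the right input.
    static <K, A, B> List<String> join(List<Map.Entry<K, A>> left,
                                       List<Map.Entry<K, B>> right) {
        Map<K, List<A>> table = new HashMap<>();
        for (Map.Entry<K, A> l : left) {
            table.computeIfAbsent(l.getKey(), k -> new ArrayList<>()).add(l.getValue());
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<K, B> r : right) {
            // Keys missing from the left input produce no output (inner join).
            for (A a : table.getOrDefault(r.getKey(), List.of())) {
                out.add(r.getKey() + ":" + a + "," + r.getValue());
            }
        }
        return out;
    }

    // Bulk iteration: apply the same step function to the whole state
    // a fixed number of times and return the final state.
    static <T> T iterate(T initial, int iterations, UnaryOperator<T> step) {
        T state = initial;
        for (int i = 0; i < iterations; i++) {
            state = step.apply(state);
        }
        return state;
    }

    public static void main(String[] args) {
        List<Map.Entry<Integer, String>> users =
                List.of(Map.entry(1, "alice"), Map.entry(2, "bob"));
        List<Map.Entry<Integer, String>> orders =
                List.of(Map.entry(1, "book"), Map.entry(1, "pen"), Map.entry(3, "lamp"));

        System.out.println(join(users, orders));          // [1:alice,book, 1:alice,pen]
        System.out.println(iterate(1, 10, x -> x * 2));   // 1024
    }
}
```

In a system like the one described, the program would state only that a join or an iteration is wanted; choosing between, for example, a hash-based and a sort-based join, and deciding how to distribute the data, would be left to the optimizer and runtime.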