Introduction to Apache Spark Workshop

At Kistamässan

The invitation covers all employees at our FDF partners and our academic partner KTH, i.e.:

  • ABB
  • Bombardier Transportation
  • Ericsson
  • Green Cargo
  • Saab AB
  • TeliaSonera
  • KTH

The Introduction to Apache Spark workshop is for developers who want to learn the core Spark APIs in depth. This full-day course features hands-on technical exercises to get you up to speed using Spark for data exploration, analysis, and building Big Data applications.

Spark is one of the fastest-growing open source projects in the Apache Software Foundation, and recently broke the world record for the 100 TB sort:

Topics covered include:
• Overview of Spark and RDDs
• Installing Spark Locally
• Using Spark’s Core APIs in Scala, Java, and Python
• Building Spark Applications
• Deploying on a Big Data Cluster
• Combining SQL, Machine Learning, and Streaming for Unified Pipelines


Prerequisites: some familiarity with programming in Python, Scala, or Java.

Please bring a reasonably current laptop (at least 2 GB RAM) with Wi-Fi, and have both a JDK (version 6, 7, or 8) and Python 2.7 installed. NB: we advise against using Homebrew to install Spark on Mac OS X, or Cygwin on Windows.
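
To check a laptop against the software prerequisites above before the workshop, a small script along the following lines may help. It is only a sketch and not part of the official course material: the helper names find_on_path and check_environment are made up for illustration, and it assumes that finding a javac executable on the PATH is a good enough sign that a JDK is installed. It does not check the JDK version or the amount of RAM.

```python
from __future__ import print_function

import os
import sys


def find_on_path(program, path=None):
    """Return the full path of `program` if it is found on PATH, else None.

    Hypothetical helper, written by hand so that it runs on Python 2.7
    (which has no shutil.which) as well as on Python 3.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    extensions = [""]
    if os.name == "nt":
        # On Windows, executables carry suffixes such as .EXE
        extensions += os.environ.get("PATHEXT", ".EXE").split(os.pathsep)
    for directory in path.split(os.pathsep):
        if not directory:
            continue
        for ext in extensions:
            candidate = os.path.join(directory, program + ext)
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                return candidate
    return None


def check_environment():
    """Return a list of (label, ok) pairs for the stated prerequisites.

    Assumption: a `javac` on PATH indicates an installed JDK.
    """
    return [
        ("Python 2.7", sys.version_info[:2] == (2, 7)),
        ("javac (JDK) on PATH", find_on_path("javac") is not None),
    ]


if __name__ == "__main__":
    for label, ok in check_environment():
        print("%-22s %s" % (label, "ok" if ok else "MISSING"))
```

Run it with the Python interpreter you plan to use at the workshop; any line marked MISSING points to a prerequisite that still needs installing.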

Downloads for code+data, plus the course slides as PDF:

Instructor bio

Paco Nathan is the Director of Community Evangelism at Databricks in San Francisco, working on Apache Spark. With expertise in distributed systems, machine learning, functional programming, and cloud computing, he has led innovative Data teams building large-scale apps for several years. Paco is an O'Reilly author and an advisor for Amplify Partners. He studied Math and Computer Science at Stanford University, and has 30+ years of technology industry experience, ranging from Bell Labs to early-stage start-ups.



Friday, November 14, 2014
09:00 to 17:00
Arne Beurlings Torg 5
Kistamässan, Kista