Department of Computer Science | Institute of Theoretical Computer Science

CADMO - Center for Algorithms, Discrete Mathematics and Optimization

Seminar on Algorithms for Database Systems

with applications to Data Science

Organization: Michael Böhlen (UZH), Sven Helmer (UZH), Paolo Penna (ETH)
Teaching language: English
Level: PhD, MSc and advanced BSc students
Academic Year: Spring 2019 (FS19)
Dates: Tuesday 19.2.2019, 16.30-18.00h, UZH BIN 0.K.11/12/13 (kickoff meeting)
Saturday 13.4.2019, 9.00-15.00h, UZH BIN 2.A.01
Saturday 11.5.2019, 9.00-15.00h, ETH CAB H.52

Overview and objectives:

The area of this year's seminar is Algorithms and Systems for Data Science. Students learn how to critically read and study research papers, how to summarize the contents of a paper, and how to present it in a seminar.

Teaching format:

Each participant writes a self-contained report of about 10 pages and gives a 30-minute presentation (blackboard only, no computer). Each participant is assigned a buddy, who reads the report, suggests improvements, and helps with the presentation (e.g., by attending dry runs). The first version of the report is due two weeks before the presentation. This first version of the report and the presentation are discussed with the buddy and the advisor one week before the presentation. The final version of the report is due at the end of the semester.

Setup and Organization:

The setup of the seminar will be discussed on Tuesday, February 19, 2019, from 16:30 to 18:00 in room BIN 0.K.11/12/13 at UZH (kickoff meeting). At this kickoff meeting, the available presentation slots will be distributed and papers will be assigned.

Presentations:

Participation in all three meetings is compulsory. The assessment is based on the quality of the report and the presentation, on active participation during the seminar, and on the input given as a buddy.


Topics:

1. Architectures and Systems

2. Column Stores

3. Streams

4. Spark

5. Query Processing

6. Clustering

Saturday, April 13, 2019:

MISO: Souping up Big Data Query Processing with a Multistore System, SIGMOD 2014.

Student: Pascal Engeli; Buddy: Rabiya Abdullah; Advisor: Sven Helmer

RHEEM: Enabling Cross-Platform Data Processing, PVLDB 2018.

Student: Mesut Ceylan; Buddy: Alex Wolf; Advisor: Sven Helmer

Abstraction for Advanced In-Database Analytics, PVLDB 2018.

Student: Decova Sara; Buddy: Maximilian Wolfertz; Advisor: Michael Böhlen

Column Sketches: A Scan Accelerator for Rapid and Robust Predicate Evaluation, SIGMOD 2018.

Student: Catharina Dekker; Buddy: Clive Charles Javara; Advisor: Paolo Penna

Column-Stores vs. Row-Stores: How Different Are They Really?, SIGMOD 2008.

Student: Peter Giger; Buddy: Han-Mi Nguyen; Advisor: Sven Helmer

Access Path Selection in Main-Memory Optimized Data Systems: Should I Scan or Should I Probe?, SIGMOD 2017.

Student: Mike Suter; Buddy: Luca Wolf; Advisor: Michael Böhlen

Incremental Query Processing on Big Data Streams, TKDE 2016.

Student: Lorenzo Selvatici; Buddy: Timon Stampfli; Advisor: Paolo Penna

The Stratosphere Platform for Big Data Analytics, VLDB Journal 2014.

Student: Syed Shahvaiz Ahmed; Buddy: Donn Edward Anin; Advisor: Paolo Penna

Drizzle: Fast and Adaptable Stream Processing at Scale, SOSP 2017.

Student: Yichun Xie; Buddy: Emilien Pierre Carlo Pilloud; Advisor: Michael Böhlen

Saturday, May 11, 2019:

Spark SQL: Relational Data Processing in Spark, SIGMOD 2015.

Student: Luca Wolf; Buddy: Decova Sara; Advisor: Sven Helmer

SHC: Distributed Query Processing for Non-Relational Data Store, ICDE 2018.

Student: Donn Edward Anin; Buddy: Lorenzo Selvatici; Advisor: Sven Helmer

Flare: Optimizing Apache Spark with Native Compilation for Scale-Up Architectures and Medium-Size Data, OSDI 2018.

Student: Clive Charles Javara; Buddy: Syed Shahvaiz Ahmed; Advisor: Sven Helmer

A Minimal Variance Estimator for the Cardinality of Big Data Set Intersection, KDD 2017.

Student: Emilien Pierre Carlo Pilloud; Buddy: Mesut Ceylan; Advisor: Paolo Penna

Orca: A Modular Query Optimizer Architecture for Big Data, SIGMOD 2014.

Student: Maximilian Wolfertz; Buddy: Mike Suter; Advisor: Michael Böhlen

Optimizing Big Data Queries Using Program Synthesis, SOSP 2017.

Student: Alex Wolf; Buddy: Yichun Xie; Advisor: Michael Böhlen

Clustering with Same-Cluster Queries, NIPS 2016.

Student: Michael Studer; Buddy: Peter Giger; Advisor: Paolo Penna

A Hierarchical Algorithm for Extreme Clustering, KDD 2017.

Student: Han-Mi Nguyen; Buddy: Pascal Engeli; Advisor: Paolo Penna

Coconut: A Scalable Bottom-Up Approach for Building Data Series Indexes, VLDB 2018.

Student: Timon Stampfli; Buddy: Catharina Dekker; Advisor: Michael Böhlen