CADMO, Institute of Theoretical Computer Science, Department of Computer Science, ETH Zürich

Seminar on Algorithms for Database Systems

with applications to Data Science

Organization:	Michael Böhlen (UZH), Sven Helmer (UZH), Paolo Penna (ETH)
Teaching language:	English
Level:	PhD, MSc and advanced BSc students
Academic Year:	Spring 2019 (FS19)
Dates:	Tuesday 19.2.2019, 16.30 - 18.00h UZH, BIN 0.K.11/12/13 (kickoff meeting)
	Saturday 13.4.2019, 9.00 - 15.00h BIN 2.A.01
	Saturday 11.5.2019, 9.00 - 15.00h ETH CAB H.52

Overview and objectives:

The area of this year's seminar is Algorithms and Systems for Data Science. Students learn how to critically read and study research papers, how to summarize the contents of a paper, and how to present it in a seminar.

Teaching format:

Each participant writes a self-contained report of about 10 pages and gives a 30-minute presentation (blackboard, without a computer). Each participant has a buddy. Buddies read the report, make suggestions for improvements, and help with the presentation (e.g., dry runs). The first version of the report is due two weeks before the date of the presentation. This first version of the report and the presentation will be discussed with the buddy and the teacher one week before the presentation. The final version of the report is due at the end of the semester.

Setup and Organization:

The setup of the seminar will be discussed on Tuesday, February 19, 2019, from 16:30 until 18:00 in room BIN 0.K.11/12/13 at UZH (kickoff meeting). At this kickoff meeting, the available slots for the seminar will be distributed and papers will be assigned.

Presentations:

Saturday April 13, BIN 2.A.01
Saturday May 11, ETH CAB H.52

Participation in all three meetings is compulsory. The assessment depends on the quality of the report and the presentation, active participation during the seminar, and input as a buddy.

Useful links:

organizational slides
How to give talks and read papers: link
example of a good report (PDF)

Topics

1. Architectures and Systems

2. Column Stores

3. Streams

4. Spark

5. Query Processing

6. Clustering

Saturday, April 13, 2019:

Presentation	Student	Buddy	Advisor
MISO: Souping up Big Data Query Processing with a Multistore System, SIGMOD 2014.	Pascal Engeli	Rabiya Abdullah	Sven Helmer
RHEEM: Enabling Cross-Platform Data Processing, PVLDB 2018.	Mesut Ceylan	Alex Wolf	Sven Helmer
Abstraction for Advanced In-Database Analytics, PVLDB 2018.	Decova Sara	Maximilian Wolfertz	Michael Böhlen
Column Sketches: A Scan Accelerator for Rapid and Robust Predicate Evaluation, SIGMOD 2018.	Catharina Dekker	Clive Charles Javara	Paolo Penna
Column-Stores vs. Row-Stores: How Different Are They Really?, SIGMOD 2008.	Peter Giger	Han-Mi Nguyen	Sven Helmer
Access Path Selection in Main-Memory Optimized Data Systems: Should I Scan or Should I Probe?, SIGMOD 2017. (PDF, 683 KB)	Mike Suter	Luca Wolf	Michael Böhlen
Incremental Query Processing on Big Data Streams, TKDE 2016. (PDF, 433 KB)	Lorenzo Selvatici	Timon Stampfli	Paolo Penna
The Stratosphere Platform for Big Data Analytics, VLDB Journal 2014. (PDF, 2233 KB)	Syed Shahvaiz Ahmed	Donn Edward Anin	Paolo Penna
Drizzle: Fast and Adaptable Stream Processing at Scale, SOSP 2017. (PDF, 767 KB)	Yichun Xie	Emilien Pierre Carlo Pilloud	Michael Böhlen

Saturday, May 11, 2019:

Presentation	Student	Buddy	Advisor
Spark SQL: Relational Data Processing in Spark, SIGMOD 2015.	Luca Wolf	Decova Sara	Sven Helmer
SHC: Distributed Query Processing for Non-Relational Data Store, ICDE 2018.	Donn Edward Anin	Lorenzo Selvatici	Sven Helmer
Flare: Optimizing Apache Spark with Native Compilation for Scale-Up Architectures and Medium-Size Data, OSDI 2018.	Clive Charles Javara	Syed Shahvaiz Ahmed	Sven Helmer
A Minimal Variance Estimator for the Cardinality of Big Data Set Intersection, KDD 2017.	Emilien Pierre Carlo Pilloud	Mesut Ceylan	Paolo Penna
Orca: A Modular Query Optimizer Architecture for Big Data, SIGMOD 2014.	Maximilian Wolfertz	Mike Suter	Michael Böhlen
Optimizing Big Data Queries Using Program Synthesis, SOSP 2017.	Alex Wolf	Yichun Xie	Michael Böhlen
Clustering with Same-Cluster Queries, NIPS 2016.	Michael Studer	Peter Giger	Paolo Penna
A Hierarchical Algorithm for Extreme Clustering, KDD 2017.	Han-Mi Nguyen	Pascal Engeli	Paolo Penna
Coconut: A Scalable Bottom-Up Approach for Building Data Series Indexes, VLDB 2018.	Timon Stampfli	Catharina Dekker	Michael Böhlen

CADMO - Center for Algorithms, Discrete Mathematics and Optimization