Department of Computer Science | Institute of Theoretical Computer Science

CADMO - Center for Algorithms, Discrete Mathematics and Optimization

Seminar Database Systems FS2018

Topic: Locality Hashing, Similarity, Nearest Neighbours

Organization: Michael Böhlen, Peter Widmayer and Przemyslaw Uznanski
Teaching language:English
Level:PhD, MSc and advanced BSc students
Academic Year:Spring 2018
Dates: Friday 23.2.2018 14.15-15.00h ETHZ CAB H 52
Saturday 28.4.2018 9.30-15.00h ETHZ CAB H 52
Saturday 19.5.2018 TBD

Overview and objectives: The area of this year's seminar is Locality Hashing, Similarity, Nearest Neighbours. Students learn how to critically read and study research papers, how to summarize the contents of a paper, and how to present it in a seminar.

Teaching format: Each participant writes a self-contained report of about 10 pages and gives a 30 minutes presentation (blackboard, without a computer). Each participant has a buddy. Buddies read the report, make suggestions for improvements, and help with the presentation (e.g., dry runs). The first version of the report is due three weeks before the date of the presentation, and will be discussed with the buddy and the professor about one week before the presentation. The final versions of the report are due one week before the date of the seminar.

Setup and Organization: The setup of the seminar will be discussed Friday February 23, 2018 from 14:15 until 15:00 in room CAB H 52 at ETHZ. At the first meeting the available slots for the seminar will be distributed and papers will be assigned.


Participation at all three meetings is compulsory for students. The assessment depends on the quality of the report, presentation, active participation during the seminar, and input as a buddy.

Useful links:

Saturday 28.4.2018:

Gionis, Indyk, Motwani: Similarity Search in High Dimensions via Hashing VLDB'99 Timothy PescatoreMichael BöhlenOlga Klimashevskareport
Indyk: Dimensionality Reduction Techniques for Proximity Problems SODA'00 Dhivyabharathi RamasamyPrzemysław UznańskiStefan Tiegelreport
Indyk: Stable Distributions, Pseudorandom Generators, Embeddings, and Data Stream Computation FOCS'00 Luise ArnPrzemysław UznańskiNicolas Gordilloreport
Indyk, Motwani: Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality STOC'98 Lukas ArnoldPeter WidmayerWanja Chrestareport
Datar, Immorlica, Indyk, Mirrkoni: Locality-Sensitive Hashing Scheme Based on p-Stable Distributions SoCG'04 Alphonse MariyagnanaseelanPeter WidmayerSebastian Sanchezreport
Broder, Glassman, Manasse, Zweig: Syntactic clustering of the Web Computer Networks and ISDN Systems Amos Madalin NeculauPeter WidmayerMichael Studerreport

Saturday 19.5.2018:
Cormode, Datar, Indyk, Muthukrishnan: Comparing Data Streams Using Hamming Norms (How to Zero In) VLDB'02 Stefan TiegelMichael BöhlenLukas Arnold
Andoni, Krauthgamer, Razenshteyn: Sketching and Embedding are Equivalent for Norms STOC'15 Wanja ChrestaPeter WidmayerDhivyabharathi Ramasamy
Cormode, Muthukrishnan: An Improved Data Stream Summary: The Count-Min Sketch and its Applications J. Algorithms Liu BingyanPrzemysław UznańskiAlphonse Mariyagnanaseelan
Andoni, Indyk, Laarhoven, Razenshteyn, Schmidt: Practical and Optimal LSH for Angular Distance NIPS'15 Nicolas GordilloPrzemysław UznańskiTimothy Pescatore
Andoni, Indyk: Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions FOCS'06 Michael StuderPeter WidmayerLiu Bingyan
Tang, U, Cai, Mamoulis, Cheng: Earth Mover’s Distance based Similarity Search at Scale VLDB'13 Olga KlimashevskaMichael BöhlenAmos Madalin Neculau
Matusevych, Smola, Ahmed: Hokusai — Sketching Streams in Real Time UAI'12 Sebastian SanchezMichael BöhlenLuise Arn