zur Startseite

Masterarbeit

Partitioning Massive Hypergraphs
Betreuer Dr. rer. nat. Sukanya Bhowmik
Prüfer Prof. Dr. rer. nat. Dr. h. c. Kurt Rothermel
Ende16.10.2019
Beschreibung

In recent years, a strong demand to perform complex data analytics on graph-structured data sets has lead to the advent of distributed graph processing systems [2]. In parallel, the rise of group-based communication technologies such as WhatsApp initiated research in hypergraph processing systems [3]. Hypergraphs are generalized graphs, where edges connect an arbitrary number of vertices (instead of two). For example, the Whatsapp hypergraph consists of user vertices and user groups, where each group is modeled as a hyperedge connecting an arbitrary number of users. 

Hypergraph processing poses unique challenges to data scientists and is of great practical relevance for distributed data analytics. However, efficient processing of hypergraphs heavily relies on efficient partitioning algorithms that maximize graph locality and therefore minimize communication overhead at runtime. In literature, a few heuristic hypergraph partitioning algorithms have been proposed, but they have shortcomings. Streaming hypergraph partitioning [1] considers one vertex at a time from a stream of hypergraph vertices and greedily assigns them to partitions based on a scoring function. As a result, it does not consider relationships between all vertices resulting in lower partitioning quality. HYPE [5] performs hypergraph partitioning using neighborhood expansion while considering the entire hypergraph structure. However, HYPE uses a purely sequential partitioning algorithm which might impact the runtime when dealing with large hypergraphs.

Goals and Tasks: As a result, in this thesis, we focus on efficient partitioning methods of billion-scale hypergraphs as a preprocessing step for distributed hypergraph processing.
• Design and implement methods for hypergraph partitioning (e.g., neighborhood expansion with
parallel processing, combining stream partitioner and global partitioner, etc.).
• Evaluate and compare partitioning latency and quality against state-of-the-art methods [5, 1, 4].
• Document results in a written report and present them in the VS-colloquium.

References
[1] Dan Alistarh, Jennifer Iglesias, and Milan Vojnovic. Streaming min-max hypergraph partitioning. In Advances in Neural
Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December
7-12, 2015, Montreal, Quebec, Canada, pages 1900–1908, 2015.
[2] Joseph E Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. Powergraph: Distributed graphparallel
computation on natural graphs. In Proceedings of the USENIX Conference on Operating Systems Design and
Implementation (OSDI), volume 12, page 2, 2012.
[3] Jin Huang, Rui Zhang, and Jeffrey XuYu. Scalable hypergraph learning and processing. In Proceedings of the 2015 IEEE
International Conference on Data Mining (ICDM), ICDM ’15, pages 775–780, Washington, DC, USA, 2015. IEEE Computer
Society.
[4] George Karypis and Vipin Kumar. Multilevel k-way hypergraph partitioning. In Proceedings of the 36th Annual ACM/IEEE
Design Automation Conference, DAC ’99, pages 343–348, New York, NY, USA, 1999. ACM.
[5] Christian Mayer, Ruben Mayer, Sukanya Bhowmik, Lukas Epple, and Kurt Rothermel. HYPE: massive hypergraph partitioning
with neighborhood expansion. In IEEE International Conference on Big Data, Big Data 2018, Seattle, WA, USA,
December 10-13, 2018, pages 458–467, 2018.