Proteus is a database engine designed for today's heterogeneous environments. It adapts to variable data, hardware, and workloads through a combination of GPU acceleration, data virtualization, and adaptive scheduling.

GPU-accelerated data management under the test of time

CIDR 2020. A. Raza, P. Chrysogelos, P. Sioulas, V. Indjic, A. Anadiotis, A. Ailamaki

Abstract

GPUs are becoming increasingly popular in large-scale data center installations due to their strong, embarrassingly parallel processing capabilities. Data management systems are riding the wave by using GPUs to accelerate query execution, mainly for analytical workloads. However, this acceleration comes at the price of a slow interconnect, which imposes strong restrictions on bandwidth and latency when bringing data from main memory to the GPU for processing. The related research in data management systems mostly relies on late materialization and data sharing to mitigate the overheads introduced by slow interconnects, even in the standard CPU processing case. Finally, workload trends move beyond analytical to fresh data processing, typically referred to as Hybrid Transactional and Analytical Processing (HTAP).

Therefore, we experience an evolution along three different axes: interconnect technology, GPU architecture, and workload characteristics. In this paper, we break the evolution of the technological landscape into steps and study the applicability and performance of late materialization and data sharing in each of them. We demonstrate that the standard PCIe interconnect substantially limits the performance of state-of-the-art GPUs, and we propose a hybrid materialization approach that combines eager with lazy data transfers. Further, we show that the wide gap between GPU and PCIe throughput can be bridged through efficient data sharing techniques. Finally, we provide an H2TAP system design that removes software-level interference, and we show that the interference in the memory bus is minimal, allowing data transfer optimizations as in OLAP workloads.

@inproceedings{DBLP:conf/cidr/RazaCSIAA20,
  author    = {Aunn Raza and
               Periklis Chrysogelos and
               Panagiotis Sioulas and
               Vladimir Indjic and
               Angelos{-}Christos G. Anadiotis and
               Anastasia Ailamaki},
  title     = {GPU-accelerated data management under the test of time},
  booktitle = {10th Conference on Innovative Data Systems Research, {CIDR} 2020,
               Amsterdam, The Netherlands, January 12-15, 2020, Online Proceedings},
  publisher = {www.cidrdb.org},
  year      = {2020},
  url       = {http://cidrdb.org/cidr2020/papers/p18-raza-cidr20.pdf},
  timestamp = {Mon, 18 Jul 2022 17:13:00 +0200},
  biburl    = {https://dblp.org/rec/conf/cidr/RazaCSIAA20.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}