Proteus is a database engine designed for today's heterogeneous environments. Proteus adapts to variable data, hardware and workloads through a combination of GPU acceleration, data virtualization, and adaptive scheduling.

Just-In-Time Data Virtualization: Lightweight Data Management with ViDa

CIDR 2015. M. Karpathiotakis, I. Alagiannis, T. Heinis, M. Branco, A. Ailamaki


As the size of data and its heterogeneity increase, traditional database system architecture becomes an obstacle to data analysis. Integrating and ingesting (loading) data into databases is quickly becoming a bottleneck in the face of massive data as well as increasingly heterogeneous data formats. Still, state-of-the-art approaches typically rely on copying and transforming data into one (or a few) repositories. Queries, on the other hand, are often ad hoc and supported by pre-cooked operators which are not adaptive enough to optimize access to data. As data formats and queries increasingly vary, there is a need to depart from the current status quo of static query processing primitives and build dynamic, fully adaptive architectures.

We build ViDa, a system which reads data in its raw format and processes queries using adaptive, just-in-time operators. Our key insight is the use of virtualization, i.e., abstracting data and manipulating it regardless of its original format, and dynamic generation of operators. ViDa’s query engine is generated just-in-time; its caches and its query operators adapt to the current query and the workload, while also treating raw datasets as its native storage structures. Finally, ViDa features a language expressive enough to support heterogeneous data models, and to which existing languages can be translated. Users therefore have the power to choose the language best suited for an analysis.
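The idea of querying raw files through operators generated per query can be illustrated with a toy sketch. This is not ViDa's implementation or API: the names `scan_csv`, `scan_json_lines`, `SCANNERS`, and `make_filter_sum` are hypothetical, the data is made up for the example, and Python's built-in `exec` stands in for a real just-in-time code generator. The sketch only shows the shape of the technique: each raw format is exposed as a stream of records, and a filter-and-sum operator is generated as source code specialised to one query, then compiled at run time.

```python
import csv
import io
import json


def scan_csv(text):
    """Yield one dict per CSV row, read directly from the raw text."""
    yield from csv.DictReader(io.StringIO(text))


def scan_json_lines(text):
    """Yield one dict per non-empty line of newline-delimited JSON."""
    for line in text.splitlines():
        if line.strip():
            yield json.loads(line)


# Hypothetical registry mapping a raw format name to its scanner.
SCANNERS = {"csv": scan_csv, "jsonl": scan_json_lines}


def make_filter_sum(fmt, key_field, threshold, value_field):
    """Generate and compile an operator specialised to one query.

    The field names and the threshold are baked into the generated
    source as literals, so the compiled function does no per-record
    interpretation of a query plan.
    """
    source = (
        "def operator(text):\n"
        "    total = 0.0\n"
        "    for rec in scan(text):\n"
        f"        if float(rec[{key_field!r}]) > {threshold!r}:\n"
        f"            total += float(rec[{value_field!r}])\n"
        "    return total\n"
    )
    namespace = {"scan": SCANNERS[fmt]}
    exec(source, namespace)  # stand-in for real JIT code generation
    return namespace["operator"]


# Made-up raw inputs in two formats; neither is loaded into a database.
csv_data = "id,age,score\n1,30,10.5\n2,17,4.0\n3,45,2.5\n"
jsonl_data = (
    '{"id": 4, "age": 52, "score": 1.0}\n'
    '{"id": 5, "age": 12, "score": 9.0}\n'
)

csv_op = make_filter_sum("csv", "age", 18, "score")
json_op = make_filter_sum("jsonl", "age", 18, "score")
result = csv_op(csv_data) + json_op(jsonl_data)
```

Here the same query (sum `score` where `age` exceeds 18) runs over a CSV source and a JSON source without converting either into a common stored format. A real system would additionally cache parsed fields and adapt the generated code to the workload, which this sketch leaves out.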

  author    = {Manos Karpathiotakis and
               Ioannis Alagiannis and
               Thomas Heinis and
               Miguel Branco and
               Anastasia Ailamaki},
  title     = {Just-In-Time Data Virtualization: Lightweight Data Management with
               ViDa},
  booktitle = {Seventh Biennial Conference on Innovative Data Systems Research, {CIDR}
               2015, Asilomar, CA, USA, January 4-7, 2015, Online Proceedings},
  publisher = {},
  year      = {2015},
  url       = {\_Paper8.pdf},
  timestamp = {Mon, 18 Jul 2022 17:13:00 +0200},
  biburl    = {},
  bibsource = {dblp computer science bibliography,}