Series ISSN: 2367-2005
Number 1 (July 8, 2024)
Original publisher: OpenProceedings.org, ISBN: 978-3-89318-097-4, Electronic Edition
Research Track
GraLMatch: Matching Groups of Entities with Graphs and Language Models
Fernando De Meer Pardo, Claude Lehmann, Dennis Gehrig, Andrea Nagy, Stefano Nicoli, Branka Hadji Misheva, Martin Braschler, Kurt Stockinger
pp. 1–12
Fast Geosocial Reachability Queries
Panagiotis Bouros, Theodoros Chondrogiannis, Daniel Kowalski
pp. 25–38
Efficient Enumeration of Large Maximal k-Plexes
Qihao Cheng, Da Yan, Tianhao Wu, Lyuheng Yuan, Ji Cheng, Zhongyi Huang, Yang Zhou
pp. 53–65
Ensembling Object Detectors for Effective Video Query Processing
Daren Chao, Nick Koudas, Xiaohui Yu, Yueting Chen
pp. 66–79
OmniMatch: Overcoming the Cold-Start Problem in Cross-Domain Recommendations using Auxiliary Reviews
Yingjun Dai, Ahmed El-Roby, Elmira Adeeb, Vivek Thaker
pp. 80–91
Tabular Embeddings for Tables with Bi-Dimensional Hierarchical Metadata and Nesting
Gyanendra Shrestha, Chutian Jiang, Sai Akula, Vivek Yannam, Anna Pyayt, Michael Gubanov
pp. 92–105
Progressive Querying on Knowledge Graphs
Angela Bonifati, Stefania Dumbrava, Haridimos Kondylakis, Georgia Troullinou, Giannis Vassiliou
pp. 106–118
QueryER: A Framework for Fast Analysis-Aware Deduplication over Dirty Data
Giorgos Alexiou, George Papastefanatos, Vassilis Stamatopoulos, Georgia Koutrika, Nectarios Koziris
pp. 119–131
Private Approximate Query over Horizontal Data Federation
Ala Eddine Laouir, Abdessamad Imine
pp. 132–144
SPO-Join: Efficient Stream Inequality Join
Adeel Aslam, Kaustubh Beedkar, Giovanni Simonini
pp. 145–157
Experiments & Analyses Track
Evaluating the Data Model Robustness of Text-to-SQL Systems Based on Real User Queries
Jonathan Fürst, Catherine Kosten, Farhad Nooralahzadeh, Yi Zhang, Kurt Stockinger
pp. 158–170
An Experimental Comparison of Partitioning Strategies for Distributed Graph Neural Network Training
Nikolai Merkel, Daniel Stoll, Ruben Mayer, Hans-Arno Jacobsen
pp. 171–184
Evaluating the Feasibility of Sampling-Based Techniques for Training Multilayer Perceptrons
Sana Ebrahimi, Rishi Advani, Abolfazl Asudeh
pp. 185–198
Analysis of Text-to-SQL Benchmarks: Limitations, Challenges and Opportunities
Anna Mitsopoulou, Georgia Koutrika
pp. 199–212
Number 2 (November 11, 2024)
Original publisher: OpenProceedings.org, ISBN: 978-3-89318-098-1, Electronic Edition
Research Track
Differentially Private Publication of Smart Electricity Grid Data
Sina Shaham, Gabriel Ghinita, Bhaskar Krishnamachari, Cyrus Shahabi
pp. 213–225
DataSculpt: Cost-Efficient Label Function Design via Prompting Large Language Models
Naiqing Guan, Kaiwen Chen, Nick Koudas
pp. 226–232
RASP: Robust Mining of Frequent Temporal Sequential Patterns under Temporal Variations
Hyunjin Choo, Minho Eom, Gyuri Kim, Young-Gyu Yoon, Kijung Shin
pp. 233–245
Modifying an existing sort order with offset-value codes
Goetz Graefe, Marius Kuhrt, Bernhard Seeger
pp. 246–254
MEMPHIS: Holistic Lineage-based Reuse and Memory Management for Multi-backend ML Systems
Arnab Phani, Matthias Boehm
pp. 255–269
LINX: A Language Driven Generative System for Goal-Oriented Automated Data Exploration
Tavor Lipman, Tova Milo, Amit Somech, Tomer Wolfson, Oz Zafar
pp. 270–283
Synopses for Summarizing Spatial Data Streams
Jacco JE Kiezebrink, Wieger R. Punter, Odysseas Papapetrou, Kevin Verbeek
pp. 284–296
PRISMA: A Privacy-Preserving Schema Matcher using Functional Dependencies
Jan-Eric Hellenberg, Fabian Dustin Mahling, Lukas Laskowski, Felix Naumann, Matteo Paganelli, Fabian Panse
pp. 297–309
Taste: Towards Practical Deep Learning-based Approaches for Semantic Type Detection in the Cloud
Tao Li, Feng Liang, Jinqi Quan, Huang Chuang, Teng Wang, Runhuai Huang, Jie Wu, Xiping Hu
pp. 324–336
MaTElDa: Multi-Table Error Detection
Fatemeh Ahmadi, Marc Speckmann, Malte F. Kuhlmann, Ziawasch Abedjan
pp. 364–376
Metadata Unification in Open Data with Gnomon
Christina Christodoulakis, Moshe Gabel, Angela Demke Brown
pp. 377–383
Pythia: A Neural Model for Data Prefetching
Akshay A Bapat, Saravanan Thirumuruganathan, Nick Koudas
pp. 384–396
Fantastic Tables and Where to Find Them: Table Search in Semantic Data Lakes
Martin P Christensen, Aristotelis Leventidis, Matteo Lissandrini, Laura Di Rocco, Renée J. Miller, Katja Hose
pp. 397–410
Communication-Efficient Distributed Deep Learning via Federated Dynamic Averaging
Michail Theologitis, Georgios Frangias, Georgios Anestis, Vasilis Samoladas, Antonios Deligiannakis
pp. 411–424
No Time to Halt: In-Situ Analysis for Large-Scale Data Processing via Virtual Snapshotting
Reza Salkhordeh, Felix M Schuhknecht, Hossein Asadi, Steffen Eiden, André Brinkmann
pp. 438–450
QuIT your B+-tree for the Quick Insertion Tree
Aneesh Raman, Konstantinos Karatsenidis, Shaolin Xie, Matthaios Olma, Subhadeep Sarkar, Manos Athanassoulis
pp. 451–463
Parallel Spatial Join Processing with Adaptive Replication
Nikolaos Koutroumanis, Christos Doulkeridis, Akrivi Vlachou
pp. 464–476
Stable Tree Labelling for Accelerating Distance Queries on Dynamic Road Networks
Henning Koehler, Muhammad Farhan, Qing Wang
pp. 477–489
PEG: Local Differential Privacy for Edge-Labeled Graphs
André Mendonça, Felipe Brito, Javam C Machado
pp. 490–502
Template-based Explainable Inference over High-Stakes Financial Knowledge Graphs
Andrea Colombo, Teodoro Baldazzi, Luigi Bellomarini, Emanuel Sallinger, Stefano Ceri
pp. 503–515
Experiments & Analyses Track
Evaluation of Dataframe Libraries for Data Preparation on a Single Machine
Angelo Mozzillo, Luca Zecchini, Luca Gagliardelli, Adeel Aslam, Sonia Bergamaschi, Giovanni Simonini
pp. 337–349
Benchmarking, Analyzing, and Optimizing WA of Partial Compaction in RocksDB
Ran Wei, Zichen Zhu, Andrew J Kryczka, Jay Zhuang, Manos Athanassoulis
pp. 425–437
Benchmarking Analytical Query Processing in Intel SGXv2
Adrian Lutsch, Muhammad El-Hindi, Matthias Heinrich, Daniel Ritter, Zsolt István, Carsten Binnig
pp. 516–528
Entity Matching using Large Language Models
Ralph Peeters, Aaron Steiner, Christian Bizer
pp. 529–541
Number 3 (March 10, 2025)
Original publisher: OpenProceedings.org, ISBN: 978-3-89318-099-8, Electronic Edition
Research Track
Step-by-Step Data Cleaning Recommendations to Improve ML Prediction Accuracy
Sedir Mohammed, Felix Naumann, Hazar Harmouch
pp. 542–554
Fast, Highly Available, and Recoverable Transactions on Disaggregated Data Stores
Mahesh Dananjaya, Vasilis Gavrielatos, Antonios Katsarakis, Nikos Ntarmos, Vijay Nagarajan
pp. 555–568
Watermarking Decision Tree Ensembles
Stefano Calzavara, Lorenzo Cazzaro, Donald Gera, Salvatore Orlando
pp. 569–575
Query Rewriting-Based View Generation for Efficient Multi-Relation Multi-Query with Differential Privacy
Xinglin Du, Peng Tang, Rui Chen, Ning Wang, Chengyu Hu, Shanqing Guo
pp. 576–588
Dema: Efficient Decentralized Aggregation for Non-Decomposable Quantile Functions
Wang Yue, Martin Boissier, Manisha Luthra, Tilmann Rabl
pp. 589–595
Selective Evolving Centrality in Temporal Heterogeneous Graphs
Landy Andriamampianina, Franck Ravat, Jiefu Song, Nathalie Vallès-Parlangeau, Yanpei Wang
pp. 596–608
Toward Standardized Data Preparation: A Bottom-Up Approach
Eugenie Lai, Yuze Lou, Brit Youngmann, Michael Cafarella
pp. 609–622
Deep Skyline Community Search
Minglang Xie, Jianye Yang, Wenjie Zhang, Shiyu Yang, Xuemin Lin
pp. 636–648
Efficiently Indexing Large Data on GPUs with Fast Interconnects
Josef Schmeißer, Clemens Lutz, Volker Markl
pp. 661–667
Learned Indexes with Distribution Smoothing via Virtual Points
Kasun Amarasinghe, Farhana Choudhury, Jianzhong Qi, James Bailey
pp. 668–680
Efficient Multicore Discovery of Small, High-Quality k-Plex Teams in Multi-attributed Networks
Parisa Esmaeilian Ghahroudi, Sean Chester, Alex Thomo
pp. 681–693
High-dimensional density-based clustering using locality-sensitive hashing
Camilla Birch Okkels, Martin Aumüller, Viktor Bello Thomsen, Arthur Zimek
pp. 694–706
DBCopilot: Natural Language Querying over Massive Databases via Schema Routing
Tianshu Wang, Xiaoyang Chen, Hongyu Lin, Xianpei Han, Le Sun, Hao Wang, Zhenyu Zeng
pp. 707–721
Effective and Efficient Community Search over Large-Scale Hypergraphs
Yu Liu, Qi Luo, Yanwei Zheng, Wenjie Zhang, Xuemin Lin, Dongxiao Yu
pp. 722–734
Gem: Gaussian Mixture Model Embeddings for Numerical Feature Distributions
Hafiz Tayyab Rauf, Alex Bogatu, Norman W. Paton, André Freitas
pp. 735–747
Graph Consistency Rule Mining with LLMs: an Exploratory Study
Hoa Thi Le, Angela Bonifati, Andrea Mauri
pp. 748–754
Z-Shadow: An Efficient Method for Estimating Bicliques in Massive Graphs Using Füredi's Theorem
Bole Chang, Linxin Xie, Wei Li, Meng Qin, Jianfeng Hou
pp. 755–768
hybridNDP: Dynamic Operation Offloading and Cooperative Query Execution in Smart Storage Settings
Christian Knödler, Naeem Ramzan, Ilia Petrov
pp. 769–782
Path-based Algebraic Foundations of Graph Query Languages
Renzo Angles, Angela Bonifati, Roberto García, Domagoj Vrgoč
pp. 783–795
Icewafl: A Configurable Data Stream Polluter
Christoph Schinninger, Fabian Panse, Constantin Kühne, Lisa Ehrlinger
pp. 796–802
Generating Skyline Datasets for Data Science Models
Mengying Wang, Hanchao Ma, Yiyang Bian, Yangxin Fan, Yinghui Wu
pp. 803–815
An RFD-based approach for concept drift detection in Machine Learning Systems
Loredana Caruccio, Stefano Cirillo, Giuseppe Polese, Roberto Stanzione
pp. 816–828
ExaLogLog: Space-Efficient and Practical Approximate Distinct Counting up to the Exa-Scale
Otmar Ertl
pp. 829–841
Legally-Compliant Spatial Fairness Framework: Advancing Beyond Spatial Fairness
Nripsuta Ani Saxena, Ronit Mathur, Cyrus Shahabi
pp. 842–854
Taming the Beast of User-Programmed Transactions on Blockchains: A Declarative Transaction Approach
Nodirbek Korchiev, Akash Pateria, Vodelina Samatova, Sogolsadat Mansouri, Kemafor Anyanwu
pp. 855–866
FedForecaster: An Automated Federated Learning Approach for Time-series Forecasting
Mohamed Maher, Osama Fayez Oun, Mahmoud Saeed Mesmeh, Radwa El Shawi
pp. 867–873
Automated Data Quality Validation in an End-to-End GNN Framework
Sijie Dong, Soror Sahri, Themis Palpanas, Qitong Wang
pp. 874–880
Experiments & Analyses Track
GPU Architectures in Graph Analytics: A Comparative Experimental Study
Peichen Xie, Zhigao Zheng, Yongluan Zhou, Yang Xiu, Hao Liu, Zhixiang Yang, Yu Zhang, Bo Du
pp. 881–893
From Feature Selection to Resource Prediction: An Analysis of Commonly Applied Workflows and Techniques
Ling Zhang, Shaleen Deep, Joyce Cahoon, Jignesh Patel, Anja Gruenheid
pp. 894–908
Evaluating SQL Understanding in Large Language Models
Ananya Rahman, Anny Zheng, Mostafa Milani, Fei Chiang, Rachel Pottinger
pp. 909–921
A Deep Dive Into Cross-Dataset Entity Matching with Large and Small Language Models
Zeyu Zhang, Paul Groth, Iacer Calixto, Sebastian Schelter
pp. 922–934
An Empirical Evaluation of Serverless Cloud Infrastructure for Large-Scale Data Processing
Thomas Bodner, Theo Radig, David Justen, Daniel Ritter, Tilmann Rabl
pp. 935–948
Apache Ignite + Calcite Composable Database System: Experimental Evaluation and Analysis
Mark Dodds, Khuzaima Daudjee
pp. 949–961
Vision Track
Towards Reliable Conversational Data Analytics
Sihem Amer-Yahia, Jasmina Bogojeska, Roberta Facchinetti, Valeria Franceschi, Aristides Gionis, Katja Hose, Georgia Koutrika, Roger Kouyos, Matteo Lissandrini, Silviu Maniu, Katsiaryna Mirylenka, Davide Mottin, Themis Palpanas, Mattia Rigotti, Yannis Velegrakis
pp. 962–969
Towards Hybrid Graphs: Unifying Property Graphs and Time Series
Mouna Ammar, Christopher Rost, Riccardo Tommasini, Shubhangi Agarwal, Angela Bonifati, Petra Selmer, Evgeny Kharlamov, Erhard Rahm
pp. 970–977
Breaking Down the Data-metadata Barrier for Effective Property Graph Data Management
Sepehr Sadoughi, Nikolay Yakovets, George Fletcher
pp. 978–984
Industrial & Applications Track
PhoebeDB: A Disk-Based RDBMS Kernel for High-Performance and Cost-Effective OLTP
Boge Liu, Chunling Wang, Xiaoshuang Chen, Yu Hao, Zhengyi Yang, Yi Jin, Yixing Yang, Wenke Yang, Wanchuan Zhang, Wenjie Zhang
pp. 996–1004
Generating Activity Definitions with Large Language Models
Andreas Kouvaras, Periklis Mantenoglou, Alexander Artikis
pp. 1005–1013
A Computational Framework for Estimating Days of Maintenance Delay of Naval Ships
Gerald White, Deep Mistry, Kevin Chhoa, Senjuti Basu Roy, Lingyi Zhang, Adam Bienkowski, Krishna Pattipati
pp. 1014–1022
ComCrawler: General Crawling Solution for Article Comments
Zhijia Chen, Weiyi Meng, Eduard Dragut
pp. 1023–1031
FISQL: Enhancing Text-to-SQL Systems with Rich Interactive Feedback
Rakesh Menon, Kun Qian, Liqun Chen, Ishika Joshi, Daniel Pandyan, Shashank Srivastava, Yunyao Li
pp. 1032–1038
GRAIL: Graph Retrieval-Augmented In-Context Learning for Node Classification in Real-World Textual-Attributed Graphs
Chanuk Lim, Kyong-Ha Lee, Hyun Ji Jeong, Sungsu Lim
pp. 1039–1047
Data Completion In E-commerce
Liat Antwarg Friedman, Gal Lavee, Bracha Shapira, Dorin Shmaryahu
pp. 1048–1056
UniAsk: AI-powered search for banking knowledge bases
Ilaria Bordino, Francesco Di Iorio, Andrea Galliani, Alessio Rosatelli, Lorenzo Severini
pp. 1057–1065
Demonstration Track
Virtual: Compressing Data Lake Files
Mihail Stoian, Alexander van Renen, Jan Kobiolka, Ping-Lin Kuo, Andreas Zimmerer, Josif Grabocka, Andreas Kipf
pp. 1066–1069
Transforming Maritime Safety: Data-driven Applications for the Real-Time Detection and Mitigation of Maritime Incidents
Georgios Grigoropoulos, Alexandros Troupiotis-Kapeliaris, Ilias Chamatidis, Evangelia Filippou, Konstantina Bereta
pp. 1070–1073
GLOVES: Global Counterfactual-based Visual Explanations
Panagiotis Gidarakos, Nikolas Theologitis, Stavros Maroulis, Loukas Kavouras, Giorgos Giannopoulos, George Papastefanatos
pp. 1074–1077
ASSO: the Automated Schemaless Stream Overseer
Chiara Forresi, Matteo Francia, Enrico Gallinucci, Matteo Golfarelli
pp. 1078–1081
LADYBUG: an LLM Agent DeBUGger for data-driven applications
Joel Rorseth, Parke Godfrey, Lukasz Golab, Divesh Srivastava, Jarek Szlichta
pp. 1082–1085
LogicLM: Robust Application of Large Language Models with Logic Programming for Data Analytics
Evgeny Skvortsov, Shayan Mirjafari, Ojaswa Garg, Yilin Xia, Shawn Bowers, Bertram Ludäscher
pp. 1086–1089
DataLens: ML-Oriented Interactive Tabular Data Quality Dashboard
Mohamed Abdelaal, Samuel Lokadjaja, Arne Kreuz, Harald Schöning
pp. 1090–1093
CompoDB: A Demonstration of Modular Data Systems in Practice
Haralampos Gavriilidis, Lennart Behme, Christian Munz, Varun Pandey, Volker Markl
pp. 1094–1097
REACT: REcourse Analysis with Counterfactuals and Explanation Tables
Anastasiia Avksientieva, Parke Godfrey, Lukasz Golab, Divesh Srivastava, Jarek Szlichta
pp. 1098–1101
SemaSK: Answering Semantics-aware Spatial Keyword Queries with Large Language Models
Zesong Zhang, Jianzhong Qi, Xin Cao, Christian S. Jensen
pp. 1102–1105
Using A Probabilistic Database in an Image Retrieval Application
Fajrian Yunus, Pratik Karmakar, Pierre Senellart, Talel Abdessalem, Stéphane Bressan
pp. 1106–1109
TETYS: Configurable Topic Modeling Exploration for Big Corpora of Text Documents
Francesco Invernici, Anna Bernasconi, Francesca Curati, Jelena Jakimov, Amirhossein Samavi
pp. 1114–1117
Database is All You Need: Serving LLMs with Relational Queries
Wenbo Sun, Ziyu Li, Vaishnav Srinidhi, Rihan Hai
pp. 1118–1121
Do Research, not Data Visualization! How to Create More Consistent Plots for Experimental Research Papers in Less Time
Justus Henneberg, Felix Schuhknecht
pp. 1122–1125
Secure and Transparent Data Sharing with TrustShare: A GDPR-Compliant Platform
Sven Rasmusen, Konstantina Pityanou, Dimitra Papatsaroucha, Sofiane Lagraa, Moussa Ouedraogo, Evangelos Markakis
pp. 1126–1129
Enabling Complex Event Processing in NebulaStream
Ariane Ziehn, Lily Seidl, Samira Akili, Steffen Zeuch, Volker Markl
pp. 1130–1133
Hyppo: Efficient Discovery and Execution of Data Science Pipelines in Collaborative Environments
Antonios Kontaxakis, Dimitris Sacharidis, Alkis Simitsis, Alberto Abelló, Sergi Nadal
pp. 1134–1137
PROLIT: Supporting the Transparency of Data Preparation Pipelines through Narratives over Data Provenance
Pasquale Leonardo Lazzaro, Marialaura Lazzaro, Paolo Missier, Riccardo Torlone
pp. 1138–1141
AprèsCoT: Explaining LLM Answers with Knowledge Graphs and Chain of Thought
Moein Shirdel, Joel Rorseth, Parke Godfrey, Lukasz Golab, Divesh Srivastava, Jarek Szlichta
pp. 1142–1145
An Interactive Analysis of Serverless Cloud Infrastructure
Thomas Bodner, Tilmann Rabl
pp. 1146–1149
TransforMMer: A Universal Multi-Model Data Generator
Jáchym Bártík, Alžběta Šrůtková, Irena Holubová
pp. 1150–1153
FairnessEval: a Framework for Evaluating Fairness of Machine Learning Models
Andrea Baraldi, Matteo Brucato, Miroslav Dudík, Francesco Guerra, Matteo Interlandi
pp. 1154–1157
VCrypt: Leveraging Vectorized and Compressed Execution for Client-side Encryption
Charlotte Felius, Peter Boncz
pp. 1158–1161
Tutorial Track
Can Operations Research bring you to the next level? Basics and application
Vincent T’kindt, Patrick Marcel
pp. 1162–1165
Systems for Scalable Graph Analytics and Machine Learning: Trends and Methods
Da Yan, Lyuheng Yuan, Akhlaque Ahmad, Saugat Adhikari
pp. 1166–1169
Everything You Always Wanted to Know About JSON Schema (But Were Afraid to Ask)
Mohamed-Amine Baazizi, Dario Colazzo, Giorgio Ghelli, Carlo Sartiani, Stefanie Scherzinger
pp. 1170–1173
Unifying Large Language Models and Knowledge Graphs for Question Answering: Recent Advances and Opportunities
Chuangtao Ma, Yongrui Chen, Tianxing Wu, Arijit Khan, Haofen Wang
pp. 1174–1177