Optimizing Non-Indexed Join Processing in Flash Storage-Based Systems

Yu Li, Sai Tung On, Jianliang Xu, Senior Member, IEEE, Byron Choi, Member, IEEE, and Haibo Hu

Abstract: Flash memory-based disks (or simply flash disks) are widely used in today’s computer systems. With their continuously increasing capacity and dropping price, it is envisioned that some database systems will operate on flash disks in the near future. However, the I/O characteristics of flash disks differ from those of magnetic hard disks. Motivated by this, we study join processing, the core of query processing in row-based database systems, on flash storage media. More specifically, we propose a new framework, called DigestJoin, to optimize non-indexed join processing by reducing the intermediate result size and exploiting the fast random reads of flash disks. DigestJoin consists of two phases: (1) projecting the join attributes, followed by a join on the projected attributes; and (2) fetching the full tuples that satisfy the join to produce the final join results. Since the problem of tuple/page fetching with the minimum I/O cost (in the second phase) is intractable, we propose three heuristic page-fetching strategies for flash disks. We have implemented DigestJoin and conducted extensive experiments on a real flash disk. Our evaluation results based on TPC-H datasets show that DigestJoin clearly outperforms the traditional sort-merge join and hash join under a wide range of system configurations.

Index Terms: Query Processing, Joins, Flash Memory, Relational Databases
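As a rough illustration of the two phases summarized in the abstract, the following Python sketch joins two relations held as in-memory lists of pages. It is not the implementation evaluated in this paper: the function name `digest_join`, the `(page number, slot)` tuple identifiers, the choice of a hash join on the projected attributes, and the use of Python lists in place of flash pages are all assumptions made for illustration. In particular, phase 2 here fetches tuples naively, one match at a time, and does not correspond to any of the three heuristic page-fetching strategies.

```python
from collections import defaultdict


def digest_join(r_pages, s_pages, r_key, s_key):
    """Illustrative two-phase join (hypothetical helper, not the paper's code).

    r_pages, s_pages: lists of pages; each page is a list of dict tuples.
    r_key, s_key: names of the join attributes in R and S.
    Returns a list of (r_tuple, s_tuple) pairs that satisfy the equi-join.
    """

    # Phase 1a: project each relation down to (join value, tuple id).
    # The tuple id (page number, slot) is an assumed addressing scheme.
    def project(pages, key):
        return [
            (tup[key], (page_no, slot))
            for page_no, page in enumerate(pages)
            for slot, tup in enumerate(page)
        ]

    r_digest = project(r_pages, r_key)
    s_digest = project(s_pages, s_key)

    # Phase 1b: join the small projected tables (a hash join is assumed here).
    buckets = defaultdict(list)
    for value, tid in r_digest:
        buckets[value].append(tid)
    matches = [
        (r_tid, s_tid)
        for value, s_tid in s_digest
        for r_tid in buckets.get(value, [])
    ]

    # Phase 2: fetch the full tuples for each matching pair of tuple ids.
    # This naive loop may touch the same page repeatedly; reducing those
    # page reads is the job of a page-fetching strategy.
    results = []
    for (r_page, r_slot), (s_page, s_slot) in matches:
        results.append((r_pages[r_page][r_slot], s_pages[s_page][s_slot]))
    return results
```

Only the join attribute and a tuple id pass through the join in phase 1, which is what keeps the intermediate result small; the full tuples are read only for pairs that actually match.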
Flash disks have been widely used in embedded and portable devices such as PDAs and smartphones, as well as in laptop and desktop computers in the form of Solid State Drives (SSDs). Compared to their magnetic...