
IEEE TRANSACTIONS ON COMPUTERS

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.


Optimizing Non-Indexed Join Processing in Flash Storage-Based Systems

Yu Li, Sai Tung On, Jianliang Xu, Senior Member, IEEE, Byron Choi, Member, IEEE, and Haibo Hu

Abstract—Flash memory-based disks (or simply flash disks) have been widely used in today’s computer systems. With their continuously increasing capacity and dropping price, it is envisioned that some database systems will operate on flash disks in the near future. However, the I/O characteristics of flash disks are different from those of magnetic hard disks. Motivated by this, we study the core of query processing in row-based database systems, namely join processing, on flash storage media. More specifically, we propose a new framework, called DigestJoin, to optimize non-indexed join processing by reducing the intermediate result size and exploiting the fast random reads of flash disks. DigestJoin consists of two phases: (1) projecting the join attributes, followed by a join on the projected attributes; and (2) fetching the full tuples that satisfy the join to produce the final join results. While the problem of tuple/page fetching with the minimum I/O cost (in the second phase) is intractable, we propose three heuristic page-fetching strategies for flash disks. We have implemented DigestJoin and conducted extensive experiments on a real flash disk. Our evaluation results based on TPC-H datasets show that DigestJoin clearly outperforms the traditional sort-merge join and hash join under a wide range of system configurations.

Index Terms—Query Processing, Joins, Flash Memory, Relational Databases
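The two phases named in the abstract can be illustrated with a minimal in-memory sketch. This is not the authors' implementation: it assumes that each projected "digest" carries a (page id, slot) address alongside the join value, it uses a plain hash join on the digests, and it models flash pages as Python lists. The names `project_digests` and `digest_join` are hypothetical, and the three page-fetching heuristics mentioned in the abstract are not modeled.

```python
from collections import defaultdict


def project_digests(pages, key):
    """Scan a relation once, keeping only (join value, page id, slot) per tuple.

    The (page id, slot) address is an assumption made for this sketch.
    """
    return [(row[key], page_id, slot)
            for page_id, page in enumerate(pages)
            for slot, row in enumerate(page)]


def digest_join(r_pages, s_pages, r_key, s_key):
    """Hypothetical two-phase join over relations stored as lists of pages."""
    # Phase 1: join the small digests instead of the full tuples.
    # A hash join is used here purely for illustration.
    r_index = defaultdict(list)
    for value, page_id, slot in project_digests(r_pages, r_key):
        r_index[value].append((page_id, slot))
    matched = [(r_addr, (page_id, slot))
               for value, page_id, slot in project_digests(s_pages, s_key)
               for r_addr in r_index.get(value, [])]

    # Phase 2: fetch the full tuples by address. On a flash disk these
    # lookups would be random page reads; here they are list accesses.
    return [(r_pages[rp][rs], s_pages[sp][ss])
            for (rp, rs), (sp, ss) in matched]


# Toy relations: two pages each, tuples stored as dicts.
r_pages = [[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}],
           [{"id": 3, "name": "c"}]]
s_pages = [[{"rid": 2, "v": 10}],
           [{"rid": 3, "v": 20}, {"rid": 2, "v": 30}]]
result = digest_join(r_pages, s_pages, "id", "rid")
```

In this sketch the intermediate result of phase 1 holds only addresses, so its size does not depend on the width of the full tuples; the full tuples are touched only in phase 2, and only for rows that actually join.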



1 INTRODUCTION

Flash disks have been widely used in embedded and portable devices such as PDAs and smartphones, as well as in laptop and desktop computers in the form of Solid State Drives (SSDs). Compared to their magnetic counterparts, flash disks offer an array of advantages such as faster access performance, better shock resistance, lower power consumption, and smaller dimensions. Given these advantages, flash disks have become a competitive mass storage medium for today’s computer systems.

Adopting flash disks as the storage media, a stream of recent research (e.g., [9], [16], [17], [23], [26], [28], [30]) has investigated the possibility of developing flash-based database systems. It has been remarked that flash disks have unique I/O characteristics. In particular, flash disks do not involve any mechanical components, so the seek time in reading data from a flash disk is negligible and a random read is almost as fast as a sequential read. On the other hand, flash disks have an erase-before-write constraint: a flash page has to be erased before it can be overwritten [16], [24]. Although this can be addressed by the out-place update strategy, new issues such as wear leveling and garbage collection arise [5], [7], rendering the write performance worse than the read performance on flash disks. In addition, with the short I/O latency of flash disks (e.g., the random read of a flash disk is 150 times faster than that of a magnetic hard disk [16]), the disk I/O cost may no longer dominate the CPU computation cost in evaluating queries on flash disks.

State-of-the-art database systems are mainly designed to operate on magnetic hard disks and therefore assume the I/O characteristics of hard disks. Among the operations of database systems, join processing is particularly I/O intensive and computationally costly. In the literature, join processing has been extensively studied, including index-based joins [4], [22], [29] and non-indexed joins (e.g., nested-loop, sort-merge, and hash...

The authors are with the Department of Computer Science, Hong Kong Baptist University, Hong Kong SAR, China. E-mail: {yli,ston,xujl,bchoi,haibo}@comp.hkbu.edu.hk.
Digital Object Identifier 10.1109/TC.2012.86