Friday, October 9, 2015

Review of "Succinct: Enabling Queries on Compressed Data"

Everyone needs to store enormous amounts of data these days, and everyone wants to be able to access it quickly. Succinct addresses the tension between these two goals: storing large quantities of data means you want to be as space-efficient as possible, but accessing it quickly (generally) means building indices, which are not at all space-efficient. I am unsure how often this problem arises in practice, but I imagine there are many use cases where Succinct would be a real improvement over the current state of the art.

The main idea of Succinct is to build the indexing into the compressed representation itself, so that queries run directly on the compressed data. This eliminates the need for space-costly secondary indexes while still providing fast search.
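
As a rough illustration of how a single structure can serve as the search index, here is a minimal Python sketch built on a plain suffix array. This is a toy under stated assumptions, not Succinct's actual data structure or API: the function names are hypothetical, the array is uncompressed, and the original text is still kept alongside it, whereas Succinct's compressed representation is considerably more involved.

```python
def build_suffix_array(text):
    # Hypothetical helper: sort the start positions of all suffixes
    # by the suffix that begins there (simple but slow construction).
    return sorted(range(len(text)), key=lambda i: text[i:])


def search(text, sa, pattern):
    # Hypothetical helper: return every position where `pattern` occurs.
    # Suffixes are sorted, so all suffixes starting with `pattern` form
    # one contiguous run in `sa`; two binary searches find its bounds.
    m = len(pattern)

    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-character prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    hi = len(sa)
    while lo < hi:  # first suffix whose m-character prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


text = "banana"
sa = build_suffix_array(text)
print(search(text, sa, "ana"))
```

No separate secondary index is consulted here: the sorted suffix positions are the only lookup structure. The missing piece in this toy is the compression, which is where the real system does the hard work.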

I think this differs from previous work mainly because of scale. The quantities of data now being stored in, e.g., NoSQL systems are growing enormously. In the past, typical systems did not need to rely as heavily on compression and space awareness, but the increase in data volume has made space efficiency a necessary feature. At the same time, the data still needs to be accessed quickly, which is the gap Succinct aims to fill.

The trade-off here is, of course, speed vs. space. Succinct falls into a pretty happy middle ground between the two, though it still falls short in some areas, e.g. large sequential reads.

I can see this being influential in the future: it seems to be a very new way of thinking about compressed storage, and it should be very useful.
