I am not convinced that running data analytics inside an RDBMS, which generally isn't designed for that kind of workload, is the best way to go about the problem. Still, people are certainly doing it, so making it faster is a valuable problem to tackle.
The main idea of this paper is the observation that many of these analytics tasks can be solved with incremental gradient descent (IGD). By leveraging this common solution mechanism, the authors implement a framework that needs only small extensions to run a wide variety of algorithms, which makes it much easier both to develop new algorithms and to port them to new RDBMSes. They also make clever use of data layout and parallelism.
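To make the "common solution mechanism" idea concrete, here is a minimal sketch of what such a framework could look like. This is my own illustration, not the paper's code: the names `igd`, `least_squares_grad`, and `logistic_grad` are hypothetical, the data lives in a plain Python list rather than a database table, and the per-epoch shuffle and fixed step size are assumptions I made for simplicity. The point is only that the loop is shared and each new task supplies nothing but a gradient function.

```python
import math
import random


def dot(w, x):
    """Inner product of two equal-length lists of floats."""
    return sum(wi * xi for wi, xi in zip(w, x))


def igd(rows, grad, dim, step=0.1, epochs=50, seed=0):
    """Generic incremental gradient descent (hypothetical sketch).

    rows: iterable of (features, label) pairs
    grad: task-specific function (w, x, y) -> gradient of the loss on one row
    The model is updated once per row, not once per pass over the data.
    """
    rng = random.Random(seed)
    rows = list(rows)
    w = [0.0] * dim
    for _ in range(epochs):
        rng.shuffle(rows)  # assumption: visit rows in a random order each epoch
        for x, y in rows:
            g = grad(w, x, y)
            w = [wi - step * gi for wi, gi in zip(w, g)]
    return w


def least_squares_grad(w, x, y):
    """Gradient of 0.5 * (w.x - y)^2 with respect to w."""
    err = dot(w, x) - y
    return [err * xi for xi in x]


def logistic_grad(w, x, y):
    """Gradient of the logistic loss for a label y in {0, 1}."""
    p = 1.0 / (1.0 + math.exp(-dot(w, x)))
    return [(p - y) * xi for xi in x]


# Usage: the same loop fits two different models.
# The first feature is a constant 1.0 acting as a bias term.
line_rows = [([1.0, x], 2.0 * x + 1.0) for x in (0.0, 0.25, 0.5, 0.75, 1.0)]
w_line = igd(line_rows, least_squares_grad, dim=2, step=0.5, epochs=500)

class_rows = [([1.0, -2.0], 0), ([1.0, -1.0], 0), ([1.0, 1.0], 1), ([1.0, 2.0], 1)]
w_class = igd(class_rows, logistic_grad, dim=2, step=0.5, epochs=100)
```

Adding a new task here means writing one more small gradient function, which is roughly the kind of "small extension" the paper is selling, although their version runs inside the database rather than over an in-memory list.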
Intuitively, it would seem that there should be a trade-off between performance and generality, with more specialized implementations being faster. That doesn't turn out to be the case in their analysis: their more general solution outperforms the specialized implementations. This may be more a result of their other techniques than of the generality itself. Perhaps if they applied the techniques used in the general framework to fine-tune the individual algorithms, they could achieve even better performance, at the cost of generality and ease of development.