In the current state of machine learning, applying an algorithm to your data often comes down to deciding which features to use, which can be extremely unintuitive and requires a great deal of testing. With machine learning becoming more ubiquitous all the time, being able to iterate on features quickly is important.
The paper speeds up this process with three main ideas: intelligent subsampling (since the goal is only to find good features, not to make actual business decisions, exact results aren't necessary), materializing partial results so they can be reused on similar computations, and maintaining a cache of computed values (since much of the computation is identical across iterations of trying new features, with only a few things changed). A rough sketch of the reuse/caching idea follows below.
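To make the caching idea concrete, here is a minimal sketch (mine, not the paper's): a least-squares feature-selection loop that works on a subsample and memoizes the per-feature moment blocks it has already computed, so trying a similar feature set only pays for the new blocks. The data, the `fit_subset` helper, and plain uniform sampling (standing in for their intelligent subsampling) are all hypothetical illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100k rows, 20 candidate feature columns (hypothetical stand-in).
X = rng.normal(size=(100_000, 20))
y = 2.0 * X[:, 0] + X[:, 3] - X[:, 7] + rng.normal(scale=0.5, size=100_000)

# Subsample: exact results aren't needed, we only want to rank feature sets.
# (Uniform sampling here; the paper's sampling is smarter than this.)
sample = rng.choice(len(X), size=5_000, replace=False)
Xs, ys = X[sample], y[sample]

# Caches of partial results: Gram blocks X_j^T X_k and moments X_j^T y.
gram_cache = {}   # (j, k) -> float
xty_cache = {}    # j -> float

def fit_subset(features):
    """Least-squares fit on the subsample, reusing cached blocks."""
    k = len(features)
    G = np.empty((k, k))
    b = np.empty(k)
    for a, j in enumerate(features):
        if j not in xty_cache:
            xty_cache[j] = Xs[:, j] @ ys
        b[a] = xty_cache[j]
        for c, m in enumerate(features):
            key = (min(j, m), max(j, m))
            if key not in gram_cache:
                gram_cache[key] = Xs[:, j] @ Xs[:, m]
            G[a, c] = gram_cache[key]
    w = np.linalg.solve(G, b)          # normal equations on the subsample
    resid = ys - Xs[:, features] @ w
    return float(resid @ resid)        # sum of squared errors

print(fit_subset([0, 3]))
print(fit_subset([0, 3, 7]))  # only the blocks involving feature 7 are new
```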
Subsampling comes with a fundamental speed-versus-accuracy trade-off, but it is easy to tune, and because the subsampling is done intelligently they show that in some cases they can achieve an 89x speedup with only 1% error, which is pretty impressive.
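As a rough illustration of how tunable that trade-off is (not a reproduction of the paper's experiments or its sampling scheme), one could sweep the sample fraction and compare speed and coefficient error against a full-data fit; the data and uniform sampling below are assumptions for the sake of the example.

```python
import time
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200_000, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=200_000)

# Full-data least-squares coefficients as the "exact" reference.
w_full, *_ = np.linalg.lstsq(X, y, rcond=None)

# Smaller sample fractions are faster but less accurate.
for frac in (0.001, 0.01, 0.1):
    idx = rng.choice(len(X), size=int(frac * len(X)), replace=False)
    t0 = time.perf_counter()
    w_sub, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    dt = time.perf_counter() - t0
    rel_err = np.linalg.norm(w_sub - w_full) / np.linalg.norm(w_full)
    print(f"frac={frac:>5}: {dt * 1e3:6.1f} ms, coefficient error {rel_err:.3%}")
```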
I am pretty surprised that no one has already done work on this; aside from the transformation materialization, these optimizations seem fairly intuitive, and I am somewhat skeptical that they are the first to apply them. I think these techniques will be influential in 10 years, but whether this paper specifically will be, I am not sure.