I am not sure how often graph computations need to join the graph against other data, but it seems like it could be a common situation, and GraphX can certainly help with it while still performing very well.
The main nugget is the ability to run graph algorithms on top of Spark, which gives you fault tolerance for free and lets you seamlessly intermingle graph data with other data that you may want to join against it. One important thing done to make this successful was optimizing the common join-map-group pattern.
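To make the join-map-group pattern a bit more concrete, here is a minimal sketch in plain Python rather than GraphX itself. It joins each edge with its source vertex's property, maps the pair to a message, and groups the messages by destination vertex. The toy data and the names `join_map_group`, `send`, `combine`, and `names` are all made up for illustration and are not part of any GraphX API.

```python
from collections import defaultdict

# Hypothetical toy graph: vertex id -> rank, plus directed (src, dst) edges.
ranks = {1: 1.0, 2: 1.0, 3: 1.0}
edges = [(1, 2), (1, 3), (2, 3), (3, 1)]

# Hypothetical non-graph table that we want to join against the graph.
names = {1: "alice", 2: "bob", 3: "carol"}


def out_degrees(edges):
    """Count outgoing edges per source vertex."""
    deg = defaultdict(int)
    for src, _ in edges:
        deg[src] += 1
    return dict(deg)


def join_map_group(vertices, edges, send, combine):
    """Join edges with source properties, map to messages, group by destination."""
    inbox = {}
    for src, dst in edges:
        msg = send(src, vertices[src], dst)  # join + map
        # group: fold all messages bound for the same destination
        inbox[dst] = combine(inbox[dst], msg) if dst in inbox else msg
    return inbox


deg = out_degrees(edges)

# One PageRank-style contribution step expressed as join-map-group.
contribs = join_map_group(
    ranks,
    edges,
    send=lambda src, rank, dst: rank / deg[src],
    combine=lambda a, b: a + b,
)

# Joining the graph result against the other table is an ordinary key lookup.
labeled = {names[v]: c for v, c in contribs.items()}
```

In a real system each of these steps would be a distributed operation, which is presumably why optimizing the pattern as a whole matters so much.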
One trade-off is the programming model: the vertex-program model of GraphLab and Pregel may be more intuitive for graph processing than an extension of Spark's RDD API fitted to the problem. I certainly had a somewhat harder time reasoning about a program's behavior. Performance may also be an issue in some cases, but in general GraphX seems to perform very well.
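For comparison, this is roughly what I mean by the vertex-program style, again as a plain Python sketch and not the actual Pregel or GraphLab API. Each active vertex sends its value along its out-edges, each receiver keeps the largest value it has seen, and the loop stops once no vertex changes. The function name `pregel_max` and the `max_steps` cap are my own assumptions for the example.

```python
def pregel_max(values, edges, max_steps=10):
    """Propagate the maximum value along directed edges until nothing changes."""
    values = dict(values)          # work on a copy of the vertex state
    active = set(values)           # every vertex starts out active
    for _ in range(max_steps):
        # Message phase: active vertices send their value to out-neighbors,
        # and messages to the same destination are combined with max.
        inbox = {}
        for src, dst in edges:
            if src in active:
                inbox[dst] = max(inbox.get(dst, values[src]), values[src])
        # Vertex phase: a vertex updates and stays active only if it improved.
        active = set()
        for v, msg in inbox.items():
            if msg > values[v]:
                values[v] = msg
                active.add(v)
        if not active:             # no vertex changed, so we have converged
            break
    return values


# Hypothetical four-vertex directed ring.
result = pregel_max({1: 3, 2: 6, 3: 2, 4: 1}, [(1, 2), (2, 3), (3, 4), (4, 1)])
```

The per-vertex logic reads fairly naturally here. Note, though, that the message phase is itself a join, a map, and a group by destination, so the two styles are closer than they first appear.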