Monday, November 16, 2015

Review of "Jellyfish: Networking Data Centers Randomly"

The main idea of Jellyfish is to wire the switches of a data center together completely at random, with servers attached to those switches, rather than conforming to a regular structure such as the traditional tree or fat-tree. The primary motivation was to make it easy to add new servers incrementally, but the authors found that the random layout also increases bandwidth capacity on the same hardware, because paths between servers are shorter on average.
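To make the "random wiring" and "incremental expansion" ideas concrete for myself, here is a minimal sketch in Python. It is my own illustration under simplifying assumptions, not the authors' code: the function names are made up, every switch is assumed to have the same number `r` of switch-facing ports, servers are left out, and the paper's fix-up step for leftover free ports is skipped. The ring lattice is only a stand-in for "some regular structure" and is not a fat-tree.

```python
import random
from collections import deque


def random_regular_graph(n, r, rng):
    """Join random pairs of non-adjacent switches that still have free ports,
    until no such pair remains. A few ports may be left unused."""
    adj = {v: set() for v in range(n)}
    while True:
        open_nodes = [v for v in adj if len(adj[v]) < r]
        candidates = [(u, v) for i, u in enumerate(open_nodes)
                      for v in open_nodes[i + 1:] if v not in adj[u]]
        if not candidates:
            return adj
        u, v = rng.choice(candidates)
        adj[u].add(v)
        adj[v].add(u)


def add_switch(adj, r, rng):
    """Add one switch by repeatedly removing a random existing link (u, v)
    and wiring both u and v to the new switch instead."""
    new = len(adj)  # assumes switches are numbered 0..n-1
    adj[new] = set()
    while len(adj[new]) + 2 <= r:
        edges = [(u, v) for u in adj for v in adj[u]
                 if u < v and new not in (u, v)
                 and u not in adj[new] and v not in adj[new]]
        if not edges:
            break
        u, v = rng.choice(edges)
        adj[u].remove(v)
        adj[v].remove(u)
        for w in (u, v):
            adj[w].add(new)
            adj[new].add(w)
    return new


def ring_lattice(n, r):
    """Regular comparison graph: each node links to its r/2 nearest
    neighbours on each side of a ring (illustration only)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for k in range(1, r // 2 + 1):
            adj[v].add((v + k) % n)
            adj[(v + k) % n].add(v)
    return adj


def mean_path_length(adj):
    """Average hop count over all reachable ordered pairs, via BFS."""
    total = pairs = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


if __name__ == "__main__":
    rng = random.Random(0)
    jelly = random_regular_graph(64, 4, rng)
    add_switch(jelly, 4, rng)  # grow the network by one switch
    print("random:", mean_path_length(jelly))
    print("ring:  ", mean_path_length(ring_lattice(65, 4)))
```

Note that adding a switch only touches the links it displaces, so the degrees of the existing switches stay the same. That is the property that makes incremental growth cheap compared with a structure whose size is fixed by the port count.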

The VL2 paper also involved an element of randomness, though at the routing level rather than the physical connection level, which makes me wonder whether there is something fundamental about using randomness in networking to avoid congestion and increase connectivity.

There are two main tradeoffs I see here: the complexity of routing, and the length and complexity of cabling. Since devices are no longer connected to nearby neighbors in a tree fashion and may be connected to devices anywhere else in the data center, the average cable length may increase significantly. The cabling also follows a less regular structure, so it may turn into a "dread spaghetti monster" (authors' words). The authors discuss ways to address this, including clustering the switches (of which there will be many more relative to servers in large clusters) in the middle of the data center, which confines most of the cabling to that area. They also discuss a number of ways to deal with the routing complexity issue, such as k-shortest-path routing combined with multipath TCP.

This idea is very interesting, and I am curious to see whether anyone has tried it in practice in the three years since publication.
