Flowlet aware routing engine
Intro
Suppose you have a stream of traffic that you want to divide between two
paths, say 70% on one path and 30% on the other. What would you
do?
One approach is to put 70% of the packets on the first
path and the rest on the other path. This gives
very good accuracy: even when the splitting ratios change, we can
prove that the traffic on each path never deviates from
its desired share by more than the size of the largest
packet. Unfortunately, TCP expects packets to arrive in order. If
one of the paths has a higher delay than the other, packets may be
reordered, which keeps TCP congestion windows from opening up and hurts
application performance.
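A minimal sketch of packet-level splitting, assuming a greedy "largest deficit" rule: each packet goes to the path that is furthest below its desired share of the bytes seen so far. The function name `split_packets` and the deficit rule are illustrative assumptions, not the code shipped in this toolkit; the sketch only shows how a bound on the order of one packet can arise.

```python
def split_packets(packet_sizes, shares):
    """Assign each packet to the path with the largest byte deficit.

    packet_sizes: iterable of packet sizes in bytes
    shares: desired fraction of traffic per path, summing to 1
    Returns (assignment, sent): the path index chosen for each packet
    and the total bytes sent on each path.
    """
    sent = [0] * len(shares)
    total = 0
    assignment = []
    for size in packet_sizes:
        total += size
        # Deficit = bytes the path should have carried so far
        # (counting this packet) minus the bytes it actually carried.
        deficits = [share * total - s for share, s in zip(shares, sent)]
        path = max(range(len(shares)), key=deficits.__getitem__)
        sent[path] += size
        assignment.append(path)
    return assignment, sent


if __name__ == "__main__":
    sizes = [1500, 40, 1500, 576, 1500, 40, 1500, 1500]
    assignment, sent = split_packets(sizes, [0.7, 0.3])
    print(assignment, sent)
```

Note that consecutive packets of the same flow routinely land on different paths here, which is exactly what causes reordering when the path delays differ.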
An alternative is to put 70% of the flows on one path and 30% of the flows on
the other. Unfortunately, flow sizes (and rates) are heavy-tailed:
a few flows contribute most of the traffic, so it is
very easy to end up with an inaccurate traffic distribution. Worse, when
the splitting ratios change, you have one of two options:
- move flows already pinned to paths => introduces reordering, or
- allocate only the new flows according to the new ratios =>
a long wait before the desired traffic ratios are
achieved.
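Flow-level splitting can be sketched as hashing a flow identifier onto [0, 1) and mapping that point to a path by cumulative share. The helper name `pick_path_for_flow`, the string flow identifier, and the use of CRC32 are assumptions made for illustration only.

```python
import bisect
import itertools
import zlib


def pick_path_for_flow(flow_id, shares):
    """Pin a flow to a path by hashing its identifier.

    flow_id: any string naming the flow (e.g. its 5-tuple, hypothetical format)
    shares: desired fraction of flows per path
    """
    cumulative = list(itertools.accumulate(shares))
    # Map the 32-bit hash to a point in [0, sum(shares)).
    point = (zlib.crc32(flow_id.encode()) / 2**32) * cumulative[-1]
    # The first region whose upper edge lies beyond the point owns the flow;
    # the min() guards against floating-point edge cases.
    return min(bisect.bisect_right(cumulative, point), len(shares) - 1)


if __name__ == "__main__":
    for fid in ["10.0.0.1:1234->10.0.0.2:80", "10.0.0.3:5555->10.0.0.2:443"]:
        print(fid, "->", pick_path_for_flow(fid, [0.7, 0.3]))
```

Every packet of a flow hashes to the same path, so there is no reordering, but the split is over flows rather than bytes: one very large flow can skew the byte shares, and changing `shares` moves the region boundaries under flows that are already pinned.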
This toolkit implements an alternative: flowlets, which are bursts of
packets within a flow separated by a sufficiently large idle period, say
δ. It turns out that if we pick δ larger than the maximum delay
difference between the paths we divide traffic across, reordering can
be avoided completely. Further:
- Flowlets exist in most flows, and are due to TCP burstiness
- Flowlets are small in size and arrive much more often than flows =>
better accuracy in balancing and quicker re-balancing if ratios change
- Most of the benefits of flowlet switching can be had by using
a 1KB hash table to split traffic
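The idea can be sketched in a few lines, under stated assumptions: a small table indexed by a hash of the flow identifier remembers the last arrival time and the current path; a packet arriving more than δ after the previous one in its slot starts a new flowlet and may be moved to the path that is furthest below its desired share. The class name `FlowletSwitcher`, the CRC32 hash, the deficit rule, and the default table size are all illustrative choices, not the toolkit's actual implementation.

```python
import zlib


class FlowletSwitcher:
    def __init__(self, shares, delta, table_bits=10):
        self.shares = list(shares)       # desired traffic fraction per path
        self.delta = delta               # idle timeout, same unit as arrival times
        self.size = 1 << table_bits      # number of hash-table slots
        self.last_seen = [None] * self.size
        self.path = [0] * self.size
        self.sent = [0] * len(self.shares)
        self.total = 0

    def route(self, flow_id, arrival_time, size):
        """Return the path index for one packet of `size` bytes."""
        slot = zlib.crc32(flow_id.encode()) % self.size
        last = self.last_seen[slot]
        if last is None or arrival_time - last > self.delta:
            # Idle gap exceeded delta: a new flowlet starts, so it is
            # safe to re-pick the path (largest byte deficit wins).
            deficits = [s * self.total - b
                        for s, b in zip(self.shares, self.sent)]
            self.path[slot] = max(range(len(self.shares)),
                                  key=deficits.__getitem__)
        self.last_seen[slot] = arrival_time
        path = self.path[slot]
        self.sent[path] += size
        self.total += size
        return path


if __name__ == "__main__":
    sw = FlowletSwitcher([0.7, 0.3], delta=0.05)
    for t in [0.0, 0.01, 0.02, 1.0, 1.01]:
        print(t, sw.route("flowA", t, 1500))
```

Packets inside a burst stay on one path; only the first packet after an idle gap longer than `delta` can change paths, which is why choosing `delta` above the maximum delay difference between paths avoids reordering.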
Papers
More information about Flare is in these papers.
Flare: Responsive Load Balancing Without Packet Reordering
Srikanth Kandula, Dina Katabi, Shantanu Sinha, Arthur Berger.
ACM Computer Communications Review, 2007.
Harnessing TCP's Burstiness with Flowlet Switching
Shan Sinha, Srikanth Kandula, Dina Katabi.
ACM Hot Topics in Networks (HotNets), 2004.
Why might you care?
Load-sensitive routing mechanisms, like TeXCP, adapt
traffic ratios incrementally based on utilization estimates from the
network. Quickly achieving the desired traffic ratios will let
adaptation schemes like TeXCP iterate faster.
Download Flare
Source for the packet-trace simulator of flowlets: Flare
Install
- tar zxvf flare_1.0.tgz
- make
- export PATH=${PATH}:.
HowTo
./FlarePacketSim IPSUMDUMP yes 300 3 "(.3;0;0),(.3;.03;0),(.4;.06;0)" CONSTANT rttFile FLARE -o .2 10 < traceFile
- Type of packet trace, one of:
  - IPSUMDUMP: Bro's preferred format
  - PCAP: tcpdump's format
  - TSH: used mostly by NLANR's traces
  - FILE: for piping output to another Flare
- yes | no: yes if you want Flare to measure the number of duplicates and reorderings
- 300: the measurement interval in msecs, i.e., the granularity at which error is computed
- 3: the number of paths
- "(.3;0;0),(.3;.03;0),(.4;.06;0)": the path description, one triple per path:
(traffic share; propagation delay in secs; exponential parameter for variance in delay). Only the relative propagation delays matter.
- CONSTANT: how the traffic shares change over time; can be CONSTANT
| SINE3 | SINE2 etc.
- rttFile: the RTTs of the flows in the trace. It doesn't make sense to "simulate" flows
in the packet trace whose RTT is smaller than the propagation delays on the paths, so Flare ignores such packets.
- How to distribute traffic, one of:
  - FLARE: the Flare traffic shifter
  - COIN: randomly chooses paths with the given weights
  - BINS: an improved "best-guess" version of the splitter used in today's routers
  - HASHREGIONS: maps the hash space to paths according to the ratios
- -o .2 10: arguments to the selector; the first is the idle
timeout to use for Flare, the second is the size of the hash table to
use, in bits
- traceFile: the packet trace
You may find it easier to use the wrapper FWrapPacketSim.sh. FErrorsDelta.sh and FTable.sh show two other ways to script Flare.
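When scripting runs, it can help to sanity-check the path description against the idle timeout before launching the simulator. The sketch below parses the path-description string from the example command and computes the largest propagation-delay difference between paths, which the idle timeout should exceed to avoid reordering. The helpers `parse_paths` and `max_delay_difference` are hypothetical and not part of the toolkit; the sketch also ignores the delay-variance parameter and assumes the timeout is given in the same unit as the delays, which the description above does not state.

```python
def parse_paths(spec):
    """Parse a string like "(.3;0;0),(.3;.03;0),(.4;.06;0)".

    Returns one dict per path with the traffic share, the propagation
    delay in seconds, and the exponential delay-variance parameter.
    """
    paths = []
    for group in spec.split(","):
        fields = group.strip().strip("()").split(";")
        share, delay, variance = (float(x) for x in fields)
        paths.append({"share": share,
                      "delay_s": delay,
                      "delay_exp_param": variance})
    return paths


def max_delay_difference(paths):
    """Largest difference in propagation delay between any two paths."""
    delays = [p["delay_s"] for p in paths]
    return max(delays) - min(delays)


if __name__ == "__main__":
    paths = parse_paths("(.3;0;0),(.3;.03;0),(.4;.06;0)")
    idle_timeout = 0.2  # first value after -o in the example command
    gap = max_delay_difference(paths)
    print("shares sum to", sum(p["share"] for p in paths))
    print("timeout exceeds max delay difference:", idle_timeout > gap)
```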
Credits
Srikanth Kandula, Shan Sinha
Legal et al.
Original software. Quickly written research prototype, absolutely no warranty.