# On the Self-Similar Nature of Ethernet Traffic (Extended Version)

Will E. Leland, Murad S. Taqqu, Walter Willinger, and Daniel V. Wilson

One-line summary:

## Overview/Main Points

• Self-Similarity Definition:
Intuitively:
• The time series is bursty at all time scales, whereas classical models (e.g., exponential or Poisson arrival processes, traditional stochastic models of packet traffic, ...) smooth out at large time scales.
Mathematically: see paper for exact definitions.
Implications of the mathematical definition:
• the variance of the sample mean, var(X^(m)), decays as m^(-beta) with 0 < beta < 1, i.e., more slowly than the m^(-1) rate of short-range-dependent data.
• the autocorrelation function decays hyperbolically, r(k) ~ k^(-beta), so it is non-summable (the sum of r(k) over all lags diverges); this is long-range dependence.
• the spectral density is unbounded at the origin: f(lambda) ~ lambda^(-gamma) as lambda -> 0, with gamma = 1 - beta.
Hurst effect: E[R(n)/S(n)] ~ n^H, where H is the Hurst parameter and H = 1 - (beta/2). It is known (and can be shown mathematically) that H = 0.5 corresponds to non-self-similar (short-range-dependent) data, while 0.5 < H < 1.0 corresponds to self-similar data. Larger H implies greater burstiness.
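
For reference, the relations above can be collected in one place. X^(m) is the aggregated series (means of non-overlapping blocks of size m), r(k) the autocorrelation at lag k, f the spectral density, and a, c are positive constants introduced here only as notation:

```latex
\begin{aligned}
X^{(m)}_k &= \frac{1}{m}\left(X_{km-m+1} + \cdots + X_{km}\right), \qquad k = 1, 2, \ldots \\
\operatorname{var}\bigl(X^{(m)}\bigr) &\sim a\, m^{-\beta}, \qquad r(k) \sim k^{-\beta}, \qquad f(\lambda) \sim c\, \lambda^{-\gamma} \quad (\lambda \to 0) \\
H &= 1 - \frac{\beta}{2}, \qquad \gamma = 1 - \beta = 2H - 1
\end{aligned}
```

For example, H = 0.8 gives beta = 0.4 and gamma = 0.6: a tenfold increase in m reduces var(X^(m)) by only a factor of 10^0.4 ≈ 2.5, versus a factor of 10 for short-range-dependent counts (beta = 1, H = 0.5).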

• Detecting Self-Similarity:
• Variance-time plots: since var(X^(m)) ~ m^(-beta), plotting log(var(X^(m))) vs. log(m) yields a straight line of slope -beta. As H = 1 - (beta/2), H can be deduced from the slope.
• R/S analysis (Pox plots): plotting log(R(n)/S(n)) vs. log(n) will yield a straight line of slope H.
• Periodogram analysis: an estimate of H (Whittle's approximate maximum likelihood estimator, MLE) can be computed, together with a confidence interval, from properties of the power spectral density. The paper gives few details.
• Proof by picture, of course.
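
The first two estimators are simple enough to sketch. This is a minimal illustration, assuming the trace is already a list of per-interval counts; the function names are hypothetical, the R/S function averages R(n)/S(n) over disjoint blocks instead of plotting every point as a pox plot does, and the Whittle estimator is not attempted:

```python
import math
import random


def aggregate(x, m):
    """Aggregated series X^(m): means of non-overlapping blocks of size m."""
    n_blocks = len(x) // m  # a trailing partial block is dropped
    return [sum(x[i * m:(i + 1) * m]) / m for i in range(n_blocks)]


def variance(x):
    """Population variance of a list of numbers."""
    mu = sum(x) / len(x)
    return sum((v - mu) ** 2 for v in x) / len(x)


def ls_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den


def hurst_variance_time(x, block_sizes):
    """Fit log var(X^(m)) against log m; the slope is -beta and H = 1 - beta/2."""
    log_m = [math.log(m) for m in block_sizes]
    log_v = [math.log(variance(aggregate(x, m))) for m in block_sizes]
    beta = -ls_slope(log_m, log_v)
    return 1 - beta / 2


def rescaled_range(x):
    """R(n)/S(n) for one block of length n."""
    mu = sum(x) / len(x)
    w = lo = hi = 0.0  # W_0 = 0, so R(n) = max(0, W_k) - min(0, W_k)
    for v in x:
        w += v - mu  # W_k = X_1 + ... + X_k - k * mean
        lo, hi = min(lo, w), max(hi, w)
    return (hi - lo) / math.sqrt(variance(x))


def hurst_rs(x, block_lengths):
    """Fit log of the average R(n)/S(n) over disjoint blocks against log n; slope is H."""
    log_n, log_rs = [], []
    for n in block_lengths:
        ratios = [rescaled_range(x[i:i + n]) for i in range(0, len(x) - n + 1, n)]
        log_n.append(math.log(n))
        log_rs.append(math.log(sum(ratios) / len(ratios)))
    return ls_slope(log_n, log_rs)


if __name__ == "__main__":
    # Hypothetical input: i.i.d. Gaussian noise, for which H should be near 0.5.
    rng = random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(2 ** 14)]
    sizes = [2 ** k for k in range(9)]        # m = 1, 2, ..., 256
    lengths = [2 ** k for k in range(4, 11)]  # n = 16, 32, ..., 1024
    print(hurst_variance_time(noise, sizes), hurst_rs(noise, lengths))
```

On i.i.d. noise both estimates should land near 0.5 (R/S is known to read slightly high at small n); on long-range-dependent traffic the fitted H should be clearly above 0.5. In practice one fits only the straight middle portion of each log-log plot.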

• Ethernet Traffic:
• High-quality Ethernet packet logs from Bellcore. The logs have microsecond resolution, cover periods of 27 hours, and were taken on four different days across a 4-year period.
• All three tests gave similar values for H, so Ethernet traffic is self-similar.
• Both LAN and WAN traffic showed similar H values.
• The busier times of day yielded more bursty traffic (seen by higher H values).
• The type and load of traffic (increased by the web and the exponential growth of the Internet) did not seem to affect the self-similarity property, although slight perturbations in H were noticed and attributed to host-to-host vs. router-to-router traffic, diskless vs. diskful workstations, and similar factors.
• Low-load WAN traffic did not appear to be self-similar (though this is really inconclusive).

• Causative Factors: Mandelbrot showed that if there are a large number of ON/OFF source clients whose ON/OFF periods have a heavy-tailed distribution, then the aggregate of the clients' traffic is self-similar. (Heavy tail: P[U >= u] ~ u^(-alpha), 0 < alpha < 2.)

In a later paper, this hypothesis is verified.
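
The ON/OFF construction can be simulated in a few lines. This is a toy sketch under illustrative assumptions: time is divided into slots, each source sends one packet per slot while ON, ON and OFF lengths share one Pareto law with minimum 1, and the parameter values are arbitrary. For 1 < alpha < 2 the limiting aggregate is known to have H = (3 - alpha)/2, so alpha = 1.4 should correspond to H around 0.8; the sketch only compares how slowly var(X^(m)) decays against a shuffled copy of the same counts.

```python
import random


def pareto_length(alpha, rng):
    """Heavy-tailed length U with P[U >= u] = u^(-alpha) for u >= 1 (inverse transform)."""
    return (1.0 - rng.random()) ** (-1.0 / alpha)


def on_off_source(length, alpha, rng):
    """One source: 1 packet per slot while ON, 0 while OFF; ON and OFF lengths are Pareto."""
    out = []
    on = rng.random() < 0.5  # random initial state
    while len(out) < length:
        # Whole slots, at least 1, truncated at the end of the trace.
        period = min(max(1, int(pareto_length(alpha, rng))), length - len(out))
        out.extend([1 if on else 0] * period)
        on = not on
    return out


def aggregate_traffic(n_sources, length, alpha, seed=0):
    """Per-slot packet counts summed over n_sources independent ON/OFF sources."""
    rng = random.Random(seed)
    total = [0] * length
    for _ in range(n_sources):
        for t, v in enumerate(on_off_source(length, alpha, rng)):
            total[t] += v
    return total


def variance_ratio(x, m):
    """var(X^(m)) / var(X): about 1/m for short-range-dependent data, decays like m^(-beta) if LRD."""
    def var(v):
        mu = sum(v) / len(v)
        return sum((a - mu) ** 2 for a in v) / len(v)

    block_means = [sum(x[i:i + m]) / m for i in range(0, len(x) - m + 1, m)]
    return var(block_means) / var(x)


if __name__ == "__main__":
    traffic = aggregate_traffic(n_sources=50, length=20_000, alpha=1.4)
    shuffled = traffic[:]
    random.Random(1).shuffle(shuffled)  # destroys all temporal correlation
    print("ON/OFF aggregate:", variance_ratio(traffic, 100))
    print("shuffled copy:   ", variance_ratio(shuffled, 100))  # roughly 1/100
```

Shuffling keeps the marginal distribution of the counts but removes their ordering, so any gap between the two ratios comes from the correlation that the heavy-tailed periods introduce.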

• Modelling Self-Similarity:
• Fractional Gaussian noise and fractional autoregressive integrated moving-average (fractional ARIMA) processes: time series whose autocorrelation functions satisfy certain properties can be shown to be self-similar. Exact generation is expensive to compute, but reasonable approximations can now be made.
• Model individual ON/OFF sources and aggregate them. The aggregate cumulative traffic behaves like fractional Brownian motion (whose increment process is fractional Gaussian noise).
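
To make the cost remark concrete, here is an exact but O(n^3) way to generate fractional Gaussian noise: build its covariance matrix and multiply a Cholesky factor by i.i.d. normals. It is a standard-library-only sketch with hypothetical function names; faster exact methods (e.g., Davies-Harte circulant embedding) exist but are not shown. For large lags the covariance behaves like H(2H - 1) k^(2H - 2), which is the hyperbolic autocorrelation decay listed earlier with beta = 2 - 2H.

```python
import math
import random


def fgn_autocovariance(k, hurst):
    """Autocovariance of unit-variance fGn at lag k; behaves like H(2H-1)k^(2H-2) for large k."""
    k = abs(k)
    h2 = 2.0 * hurst
    return 0.5 * ((k + 1) ** h2 - 2 * k ** h2 + abs(k - 1) ** h2)


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive-definite a. O(n^3) time."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def fgn(n, hurst, seed=0):
    """Exact fGn sample of length n: x = L z, where z is i.i.d. standard normal."""
    rng = random.Random(seed)
    cov = [[fgn_autocovariance(i - j, hurst) for j in range(n)] for i in range(n)]
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]


def fbm(n, hurst, seed=0):
    """Fractional Brownian motion path: the running sum of fGn."""
    path, total = [], 0.0
    for step in fgn(n, hurst, seed):
        total += step
        path.append(total)
    return path


if __name__ == "__main__":
    sample = fgn(200, hurst=0.8)
    print(len(sample), fgn_autocovariance(1, 0.8))  # lag-1 correlation > 0 when H > 0.5
```

With H = 0.5 the lag-1 covariance is exactly 0 and the generator reduces to white noise; doubling n multiplies the Cholesky cost by about eight, which is why approximate generators were attractive.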

• Measuring Burstiness:
• H.
• Index of dispersion for counts (IDC): for a time interval of length L, IDC(L) = var(arrivals during L) / E[arrivals during L]. For a Poisson process IDC(L) = 1 for every L, whereas a self-similar process has a monotonically increasing IDC that grows like L^(2H-1) (linear on a log-log plot).
• peak-to-mean ratio: peak bandwidth / mean bandwidth. But over which time scale is the peak measured? The ratio changes with the averaging interval. (Visual picture of white noise.)
• coefficient of variation: ratio of the standard deviation of interarrival times to their mean. For heavy-tailed interarrival times (alpha < 2) the variance is infinite, so this measure breaks down!
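
The three non-H measures can be computed directly from a list of packet arrival times. This sketch rests on illustrative assumptions: windows are disjoint and aligned at time 0, arrival times are sorted, the Poisson generator is only a hypothetical baseline (not Bellcore data), and all function names are made up for the example.

```python
import random
import statistics


def window_counts(arrival_times, interval, horizon):
    """Number of arrivals in each disjoint window [kL, (k+1)L) that fits inside [0, horizon)."""
    counts = [0] * int(horizon // interval)
    for t in arrival_times:
        k = int(t // interval)
        if k < len(counts):
            counts[k] += 1
    return counts


def idc(arrival_times, interval, horizon):
    """Index of dispersion for counts: var(N(L)) / E[N(L)] for window length L."""
    counts = window_counts(arrival_times, interval, horizon)
    return statistics.pvariance(counts) / statistics.mean(counts)


def peak_to_mean(arrival_times, interval, horizon):
    """Peak-to-mean ratio of per-window counts; the answer depends on the window length."""
    counts = window_counts(arrival_times, interval, horizon)
    return max(counts) / statistics.mean(counts)


def coefficient_of_variation(arrival_times):
    """Standard deviation / mean of interarrival times (sorted arrival times assumed)."""
    gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    return statistics.pstdev(gaps) / statistics.mean(gaps)


def poisson_arrivals(rate, horizon, seed=0):
    """Baseline Poisson process: exponential interarrival times, for comparison only."""
    rng = random.Random(seed)
    times, t = [], rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


if __name__ == "__main__":
    arrivals = poisson_arrivals(rate=1.0, horizon=100_000.0)
    for length in (1.0, 10.0, 100.0):
        # Poisson: IDC stays near 1 while peak-to-mean falls as L grows.
        print(length, idc(arrivals, length, 100_000.0), peak_to_mean(arrivals, length, 100_000.0))
    print("CV of interarrivals:", coefficient_of_variation(arrivals))  # near 1 for Poisson
```

For Poisson traffic the IDC and the coefficient of variation both stay near 1, while the peak-to-mean ratio shrinks as L grows; on self-similar traffic the IDC would instead keep rising with L, and the peak-to-mean ratio only means something together with the window length used.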

## Relevance

This paper is revolutionizing assumptions about, and understanding of, networks and time series/processes in computer systems in general. Self-similar behavior has now been found in networks (WAN, LAN, ...), file system traffic, HTTP traffic, ...

## Flaws

• overly mathematical for a CS paper (but necessarily so)
• self-similar definitions have limits as t goes to infinity; we have finite processes. How much does this affect results?
• to date, lack of practical implications
