**One-line summary:**

**Self-Similarity Definition**:

Intuitively:

- the time series is bursty at all time-scales, in contrast to classical models, which smooth out at large time-scales (e.g. exponential or Poisson arrival processes, stochastic models of packet traffic, ...)
- adding more sources adds to the burstiness

Implications of the mathematical definition:

- the variance of the sample mean, var(X^(m)), decays as m^(-beta), with 0 < beta < 1 (slower than the m^(-1) decay of classical models).
- the autocorrelation function decays hyperbolically, with the same exponent beta, so it is non-summable (this is *long-range dependence*).
- the spectral density is unbounded at the origin: f(lambda) ~ lambda^(-gamma) as lambda -> 0, with gamma = 1 - beta.
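
The three implications can be written compactly as follows. This is only a restatement of the relations above in LaTeX, assuming X^(m) denotes the series averaged over non-overlapping blocks of size m and r(k) the autocorrelation at lag k. The constants c_1, c_2, c_3 are unspecified positive constants introduced here for illustration.

```latex
\begin{align*}
\operatorname{var}\bigl(X^{(m)}\bigr) &\sim c_1\, m^{-\beta}, & 0 < \beta < 1, \quad m \to \infty \\
r(k) &\sim c_2\, k^{-\beta}, & \sum_{k} r(k) = \infty, \quad k \to \infty \\
f(\lambda) &\sim c_3\, \lambda^{-\gamma}, & \gamma = 1 - \beta, \quad \lambda \to 0
\end{align*}
```

With the Hurst parameter H = 1 - (beta/2), the range 0 < beta < 1 corresponds to 1/2 < H < 1, and gamma = 2H - 1.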

**Detecting Self-Similarity**:

- **Variance-time plots**: since var(X^(m)) ~ m^(-beta), plotting log(var(X^(m))) vs. log(m) yields a straight line of slope -beta. As H = 1 - (beta/2), H can be deduced from the slope.
- **R/S analysis (Pox plots)**: plotting log(R(n)/S(n)) vs. log(n) yields a straight line of slope H.
- **Periodogram analysis**: an estimate of H (known as Whittle's approximate MLE) can be calculated along with a confidence interval, based on the properties of the power spectral density function. Few details are given in the paper.
- Proof by picture, of course.

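
The variance-time method can be sketched in a few lines of Python. This is a minimal illustration, not the paper's procedure: the function names `block_means` and `variance_time_hurst`, the choice of block sizes, and the plain least-squares fit are all assumptions made here. It averages the series over non-overlapping blocks of size m, fits a line to log(var(X^(m))) vs. log(m), and converts the slope -beta into H = 1 - (beta/2).

```python
import math
import random
import statistics


def block_means(series, m):
    """Aggregate the series: average over non-overlapping blocks of size m."""
    n_blocks = len(series) // m
    return [sum(series[i * m:(i + 1) * m]) / m for i in range(n_blocks)]


def variance_time_hurst(series, block_sizes):
    """Estimate H from the least-squares slope of log var(X^(m)) vs. log m."""
    xs, ys = [], []
    for m in block_sizes:
        means = block_means(series, m)
        if len(means) < 2:
            continue  # too few blocks to estimate a variance
        var = statistics.pvariance(means)
        if var > 0:
            xs.append(math.log(m))
            ys.append(math.log(var))
    if len(xs) < 2:
        raise ValueError("need at least two usable block sizes")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    beta = -slope  # the fitted line has slope -beta
    return 1 - beta / 2


if __name__ == "__main__":
    # Independent Gaussian noise is the short-range-dependent baseline,
    # so the estimate should come out close to H = 0.5.
    rng = random.Random(0)
    white_noise = [rng.gauss(0, 1) for _ in range(20000)]
    h = variance_time_hurst(white_noise, [1, 2, 4, 8, 16, 32, 64, 128])
    print(f"estimated H for white noise: {h:.2f}")
```

For a self-similar trace the same fit would be expected to give a slope shallower than -1, and hence an estimate of H above 0.5.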

**Ethernet Traffic**:

- High-quality Ethernet packet logs from Bellcore. The logs have microsecond resolution for periods of 27 hours, taken on four different days across a 4-year period.
- All three tests gave similar values for H - Ethernet traffic is therefore self-similar.
- Both LAN and WAN traffic showed similar H values.
- The busier times of day yielded more bursty traffic (seen by higher H values).
- The type and load of traffic (load increased by the web and the exponential growth of the Internet) seemed not to affect the self-similarity property, although slight perturbations in H were noticed and attributed to host-to-host vs. router-to-router traffic, diskless vs. diskful workstations, and similar factors.
- Low-load WAN traffic was not self-similar. (Inconclusive, really.)

**Causative Factors**: Mandelbrot showed that if there is a large number of ON/OFF source clients whose ON/OFF periods have a heavy-tailed distribution, then the aggregation of the clients' traffic is self-similar. (Heavy tail: P[U >= u] ~ u^(-alpha), 0 < alpha < 2.) In a later paper, this hypothesis is verified.

**Modelling Self-Similarity**:

- Fractional Gaussian noise, and fractional autoregressive integrated moving-average (ARIMA) processes. Essentially, time series whose autocorrelation functions satisfy certain properties can be shown to be self-similar. These are expensive to compute, but reasonable approximations can now be made.
- Model ON/OFF sources and aggregate them. The aggregate behaves like fractional Brownian motion (related to fractional Gaussian noise).
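
The ON/OFF construction can be sketched as a small simulation. This is an illustrative sketch only: the names `on_off_source` and `aggregate_traffic`, the use of time slots, the Pareto distribution as the heavy-tailed law, and the default alpha = 1.5 are assumptions made here, not details from the paper. Each source alternates between ON (one unit of traffic per slot) and OFF (none), with period lengths drawn from a Pareto distribution satisfying P[U >= u] = u^(-alpha), and the per-slot traffic of all sources is summed.

```python
import random


def on_off_source(n_slots, alpha, rng):
    """Return a 0/1 list of length n_slots: 1 while ON, 0 while OFF."""
    slots = []
    on = rng.random() < 0.5  # random initial state
    while len(slots) < n_slots:
        # paretovariate(alpha) returns U >= 1 with P[U >= u] = u ** (-alpha)
        duration = round(rng.paretovariate(alpha))
        # Truncate the last period so the trace has exactly n_slots entries.
        duration = min(duration, n_slots - len(slots))
        slots.extend([1 if on else 0] * duration)
        on = not on
    return slots


def aggregate_traffic(n_sources, n_slots, alpha=1.5, seed=0):
    """Sum the per-slot traffic of n_sources independent ON/OFF sources."""
    rng = random.Random(seed)
    total = [0] * n_slots
    for _ in range(n_sources):
        for t, active in enumerate(on_off_source(n_slots, alpha, rng)):
            total[t] += active
    return total


if __name__ == "__main__":
    traffic = aggregate_traffic(n_sources=50, n_slots=10000)
    print("mean load per slot:", sum(traffic) / len(traffic))
```

The simulation only generates the aggregate trace. Whether a given run looks self-similar would still have to be checked with one of the detection methods, and a finite number of sources and slots can only approximate the limiting result.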

**Measuring Burstiness:**

- H.
- Index of dispersion (for counts): for a time interval of length L, IDC(L) = var(arrivals during L) / (expected value of arrivals during L). A valid measure: self-similar processes have a monotonically increasing IDC (linear on a log-log plot).
- Peak-to-mean ratio: peak bandwidth / mean bandwidth. But over which time-scale? (Visual picture of white noise.)
- Coefficient of variation: ratio of the standard deviation of interarrival times to their mean. But the variance of interarrival times is infinite for heavy-tailed distributions!
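
The index of dispersion for counts is simple to compute from a per-slot arrival trace. The sketch below is a minimal illustration under assumptions made here: the function name `idc`, the use of non-overlapping windows of L consecutive slots, and the Bernoulli test traffic are not from the paper.

```python
import random
import statistics


def idc(counts, L):
    """Index of dispersion for counts over non-overlapping windows of L slots."""
    n_windows = len(counts) // L
    if n_windows < 2:
        raise ValueError("need at least two windows of length L")
    arrivals = [sum(counts[i * L:(i + 1) * L]) for i in range(n_windows)]
    mean = statistics.fmean(arrivals)
    return statistics.pvariance(arrivals) / mean


if __name__ == "__main__":
    # Independent Bernoulli arrivals: the IDC should stay roughly flat
    # as L grows, instead of increasing as it would for a bursty trace.
    rng = random.Random(1)
    counts = [1 if rng.random() < 0.1 else 0 for _ in range(50000)]
    for L in (1, 10, 100):
        print(L, round(idc(counts, L), 3))
```

Evaluating `idc` over a range of L and plotting the result on log-log axes gives the curve whose growth the notes describe: flat for independent arrivals, increasing for self-similar traffic.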

## Relevance

This work is revolutionizing assumptions about, and understanding of, networks and of time series/processes in computer systems in general. Processes found to be self-similar now include network traffic (WAN, LAN, ...), file system traffic, HTTP traffic, ...

## Flaws

- overly mathematical for a CS paper (but necessarily so)
- self-similar definitions have limits as t goes to infinity; we have finite processes. How much does this affect results?
- to date, lack of practical implications
