Stream Query Repository: Network Traffic Management (CQL Queries)

Queries in CQL

  1. Top-k Traffic Query: Monitor the source-destination pairs in the top 5 percentile in terms of total traffic in the past 20 minutes over a backbone link B.

    Load: Select   srcIP, destIP, Sum(len) as traffic 
          From     Packets [Range 20 Minute]
          Where    colID = 'B'
          Group By srcIP, destIP
    
    Q: Select   srcIP, destIP, traffic
       From     Load as L1   
       Where    (Select Count(*)
                 From   Load as L2 
                 Where  L2.traffic < L1.traffic) > 
                (Select 0.95 * Count(*) 
                 From Load)      
       Order By traffic  
    

  2. Customer Monitoring Query: Maintain the fraction of packets on a particular backbone link B generated by a particular customer network C in the past hour.

    (Select Count(*)
     From   Packets [Range 60 Minute] B
            Packets C
     Where  B.colID = 'B' and C.colID = 'C' and C.id = B.id) 
     / 
    (Select Count(*) 
     From   Packets [Range 60 Minute] B
     Where  B.colID = 'B')
    

    Comments: For simplicity we assume that the denominator is never zero.

  3. Protocol Analysis Query: For each source IP address and each 5 minute interval, count the number of bytes and number of packets resulting from HTTP requests.

    Select   Rstream(srcIP, Sum(len), Count(*))
    From     Packets [Range 5 Minute
                      Slide 5 Minute]
    Where    destPort = '80'
    Group By srcIP 
    

    Comments: The query assumes that all HTTP requests are for port 80.

  4. Source Monitoring Query: Monitor the 10-minute exponentially decaying average of the number of packets from each source host on a per-minute basis.

    Q1: Select   Rstream(srcIP, Count(*) as num)
        From     Packets [Range 1 Minute 
                          Slide 1 Minute]
        Group By srcIP 
    
    Q2: Select   Rstream(srcIP, ExpDecAvg(num))
        From     Q1  [Range 10 Minute
                      Slide 1 Minute]
        Group By srcIP
    

    Comments: ExpDecAvg is a user-defined windowed-aggregate that computes the exponentially decaying average over a window of at most 10 values.

  5. Misbehaving Host Query: Identify TCP SYN packets for which a SYNACK was sent, but no ACK was received within a specified maximum bound of two minutes on the TCP handshake completion latency.

    DelayedSynStream:
     Select Dstream (*)
     From   Packets [Range 2 Minute] as P
     Where  isSyn(P1.flags)
    
    MisbehavingSyn:
     Select *
     From DelayedSynStream [Now] S
     Where Exists (Select *
                   From   Packets [Range 2 Minute] P1
                   Where  isSyn(P1.flags) and isAck(P1.flags) and
                          P1.srcIP = S.destIP and P1.srcPort = S.destPort and
                          P1.destIP = S.srcIP and P1.destPort = S.srcPort) and
           Not Exists(Select *
                      From   Packets [Range 2 Minute] P2
    		  Where  isAck(P2.flags) and
    		         P2.srcIP = S.srcIP and P2.srcPort = S.srcPort and
    		         P2.destIP = S.destIP and P2.destPort = S.destPort)
    

    Comments: This query uses two user-defined functions, isSyn and isAck, to identify packets with SYN and ACK bits set, respectively. The derived stream DelayedSynStream produces a stream of packets delayed by 2 minutes that have their SYN bit set. The outermost subquery outputs a (delayed) SYN packet as misbehaving if a SYNACK packet was sent (in the "reverse" direction) but no ACK packet was received.

  6. Flow Information Query: Generate the flows in the packet stream. A flow is defined as a sequence of packets from the same source to the same destination arriving "close together" in time. For simplicity assume that "close together" means less than 2 minutes apart, i.e., a flow ends when no packets arrive on the flow over a 2 minute interval. For each flow, output the source and destination addresses, the number of packets constituting the flow, and the length of the flow.

    FlowStartIdentifyStream:
     (Select P1.*, 1 as flowStartBit
      From Packets [Now] as P1
      Where (P1.srcIP, P1.destIP) Not In
            (Select srcIP, destIP
             From   Packets [Range 2 Minute] P2
             Where  P2.timestamp < P1.timestamp))
    
     Union
    
     (Select P1.*, 0 as flowStartBit
      From Packets [Now] as P1
      Where (P1.srcIP, P1.destIP) In
            (Select srcIP, destIP
             From   Packets [Range 2 Minute] P2
             Where  P2.timestamp < P1.timestamp))
          
    FlowIdStream:
     (Select P1.*, flowId
      From   FlowStartIdentifyStream [Now] P1, 
             (Select   P2.srcIP, P2.destIP, Count(*) as flowId
              From     FlowStartIdentifyStream as P2
              Where    P2.flowStartBit = 1
              Group By P2.srcIP, P2.destIP) I
      Where  P1.srcIP = I.srcIP and P1.destIP = I.destIP
     )
    
    FlowInformation:
     Select   srcIP, destIP, flowId, Count(*), Sum(len)
     From     FlowIdStream F
     Group By srcIP, destIP, flowId
    

    Comments: This query is expressed in terms of 3 subqueries. The first subquery identifies for each packet whether it starts a new flow or is part of an ongoing flow. In the former case, the flowStartBit is set to 1, and in the latter to 0. The second subquery, FlowIdStream computes a flowId for each flow and appends it to all the packets belonging to the flow. For each source-destination pair, the unique flowId for a flow is the number of packets so far that have started new flows between the same source-destination pair, i.e., the current number of packets between the source-destination pair with flowStartBit set to 1. The third and outermost subquery computes the required aggregates. Note that the output of this query is a relation. The output can be modified to be a stream if necessary.


Last modified: Dec 2 2002. Please send comments and questions to shivnath@stanford.edu