Section: Massive Datasets and Data Streams
Title: RadixZip: Linear Time Compression of Token Streams
Authors: B D Vo and G S Manku
ConfAcronym: VLDB 2007
ConfName: 33rd International Conference on Very Large Data Bases
ConfURL: http://www.vldb2007.org
ConfPages: 1162-1172
ConfMonth: Sep
ConfYear: 2007
PDF: papers/07vldb-radixzip.pdf
PS: papers/07vldb-radixzip.ps
PPT: papers/07vldb-radixzip.ppt
ID: 07vldb
Abstract: RadixZip is a block compression technique for token streams.
It introduces RadixZip Transform, a linear time algorithm that
rearranges bytes using a technique inspired by radix sorting. For
appropriate data, RadixZip Transform is analogous to Burrows-Wheeler
Transform used in bzip2, but is both simpler in operation and more
effective in compression. In addition, RadixZip Transform can take
advantage of correlations between token streams with no computational
overhead. Experiments over practical data show that for common token
streams, RadixZip is superior to bzip2.
Title: Detecting Near-Duplicates for Web Crawling
Authors: G S Manku, A Jain and A D Sarma
ConfAcronym: WWW 2007
ConfName: 16th International World Wide Web Conference
ConfURL: http://www2007.org
ConfPages: 141-149
ConfMonth: May
ConfYear: 2007
PDF: papers/07www-duplicates.pdf
PS: papers/07www-duplicates.ps
PPT: papers/07www-duplicates.ppt
ID: 07www
Abstract: Near-duplicate web documents are abundant. Two such documents
differ from each other in a very small portion that displays
advertisements, for example. Such differences are irrelevant for web
search. So the quality of a web crawler increases if it can assess
whether a newly crawled web page is a near-duplicate of a previously
crawled web page or not.
In the course of developing a near-duplicate detection system for
a multi-billion page repository, we make two research contributions.
First, we demonstrate that Charikar's fingerprinting technique is
appropriate for this goal. Second, we present an
algorithmic technique for identifying existing f-bit fingerprints
that differ from a given fingerprint in at most k bit-positions, for
small k. Our technique is useful for both online queries (single
fingerprints) and batch queries (multiple fingerprints). Experimental
evaluation over real data confirms the practicality of our design.
Title: Approximate Counts and Quantiles over Sliding Windows
Authors: A Arasu and G S Manku
ConfAcronym: PODS 2004
ConfName: 23rd ACM Symposium on Principles of Database Systems
ConfURL: http://www.sciences.univ-nantes.fr/irin/SIGMODPODS04/
ConfPages: 286-296
ConfMonth: June
ConfYear: 2004
PDF: papers/04pods-sliding.pdf
PS: papers/04pods-sliding.ps
PPT: papers/04pods-sliding.ppt
ID: 04pods
Abstract:
We consider the problem of maintaining ε-approximate counts
and quantiles over a stream sliding window using limited
space. We consider two types of sliding windows depending on
whether the number of elements N in the window is fixed
(fixed-size sliding window) or variable
(variable-size sliding window). In a fixed-size sliding
window, both the ends of the window slide synchronously over the
stream. In a variable-size sliding window, an adversary slides the
window ends independently, and therefore has the ability to vary the
number of elements N in the window.
We present various
deterministic and randomized algorithms for approximate counts and
quantiles. All of our algorithms require O(1/ε
polylog(1/ε, N)) space. For quantiles, this space
requirement is an improvement over the previous best bound of
O(1/ε² polylog(1/ε, N)). We believe that no
previous work on space-efficient approximate counts over sliding
windows exists.
Title: Query Processing, Resource Management and Approximation in a Data Stream Management System
Authors: R Motwani, J Widom, A Arasu, B Babcock, S Babu, M Datar, G S Manku, C Olston, J Rosenstein and R Varma
ConfAcronym: CIDR 2003
ConfName: 1st Biennial Conference On Innovative Data Systems Research
ConfURL: http://www-db.cs.wisc.edu/cidr/
ConfPages: 245-254
ConfMonth: Jan
ConfYear: 2003
PDF: papers/03cidr-stream.pdf
PS: papers/03cidr-stream.ps
PPT:
ID: 03cidr
Abstract:
This paper describes ongoing work developing the Stanford Stream
Data Manager (STREAM), a system for
executing continuous queries over multiple continuous data streams.
The STREAM system supports a declarative query language and it copes
with high data rates and query workloads by providing approximate
answers when resources are limited. This paper describes specific
contributions made so far and enumerates our next steps in
developing a general Data Stream Management System.
Title: Approximate Frequency Counts over Data Streams
Authors: G S Manku and R Motwani
ConfAcronym: VLDB 2002
ConfName: 28th VLDB
ConfURL: http://www.cs.ust.hk/vldb2002/
ConfPages: 346-357
ConfMonth: August
ConfYear: 2002
PDF: papers/02vldb-freq.pdf
PS: papers/02vldb-freq.ps
PPT: papers/02vldb-freq.ppt
ID: 02vldb
Abstract:
Consider a stream of elements. This paper proposes two simple
algorithms for identifying all elements whose relative frequency (in
the portion of the stream seen so far) exceeds a certain threshold
(say, 0.01%). Both algorithms require provably small memory
footprints. Although the output is approximate, the error is
guaranteed not to exceed a user-specified parameter. The algorithms
can easily be deployed for streams of singleton items like those
found in IP network monitoring.
The latter half of the paper
considers streams of variable-sized sets of items, e.g., sequence of
market-basket transactions at a retail store. One of the above
algorithms is adapted to compute frequent itemsets for such streams
in a single pass. The implementation is currently the fastest
program for computing frequent itemsets and association rules.
Title: Random Sampling Techniques for Space Efficient Online Computation of Order Statistics of Large Datasets
Authors: G S Manku, S Rajagopalan and B G Lindsay
ConfAcronym: SIGMOD 1999
ConfName: 1999 ACM SIGMOD
ConfURL: http://www.research.att.com/conf/sigmod99/
ConfVolume: 28
ConfNumber: 2
ConfPages: 251-62
ConfMonth: June
ConfYear: 1999
PDF: papers/99sigmod-unknown.pdf
PS: papers/99sigmod-unknown.ps
PPT: papers/99sigmod-unknown.ppt
ID: 99sigmod
Abstract:
In the followup paper in SIGMOD-99, we address two issues left open
in our earlier paper. First, all known and space efficient
algorithms for approximate quantile finding require advance
knowledge of the length of the input sequence. Many important
database applications employing quantiles cannot provide this
information. In this paper, we present a novel non-uniform random
sampling scheme and an extension of our framework. Together, they
form the basis of a new algorithm which computes approximate
quantiles without knowing the input sequence length. Second, if the
desired quantile is an extreme value (e.g., within the top 1% of the
elements), the space requirements of currently known algorithms are
overly pessimistic. We provide a simple algorithm which estimates
extreme values using less space than required by the earlier more
general technique for computing all quantiles. Our principal
observation here is that random sampling is quantifiably better when
estimating extreme values than is the case with the median.
Title: Approximate Medians and other Quantiles in One Pass and with Limited Memory
Authors: G S Manku, S Rajagopalan and B G Lindsay
ConfAcronym: SIGMOD 1998
ConfName: 1998 ACM SIGMOD
ConfURL: http://www.boeing.com/special/sigmod98
ConfVolume: 27
ConfNumber: 2
ConfPages: 426-35
ConfMonth: June
ConfYear: 1998
PDF: papers/98sigmod-quantiles.pdf
PS: papers/98sigmod-quantiles.ps
PPT: papers/98sigmod-quantiles.ppt
ID: 98sigmod
Abstract:
We present new algorithms for computing approximate
quantiles of large datasets in a single pass.
The approximation guarantees are explicit and apply without regard
to the value distribution or the arrival distributions of the
dataset. The main memory requirements are smaller than those
reported earlier by an order of magnitude. We also discuss methods
that couple the approximation algorithms with random sampling to
further reduce memory requirements. With sampling, the
approximation guarantees are explicit but probabilistic, i.e., they
apply with respect to a (user controlled) confidence parameter. We
present the algorithms, their theoretical analysis and simulation
results.
Section: Peer to Peer Systems -- Distributed Hash Tables
Title: Brief Announcement: Papillon: Greedy Routing in Rings
Authors: I Abraham, D Malkhi and G S Manku
ConfAcronym: DISC 2005
ConfName: 19th International Symposium on Distributed Computing
ConfURL: http://www.mimuw.edu.pl/~disc2005
ConfPages: 514-515
ConfMonth: September
ConfYear: 2005
PDF: papers/05disc-papillon.pdf
PS: papers/05disc-papillon.ps
PPT:
ID: 05disc
Abstract:
We construct the first n-node degree-d ring-based network with
worst-case GREEDY routes of length Θ(log n / log d) hops.
Title: Decentralized Algorithms using Both Local and Random Probes for P2P Load Balancing
Authors: K Kenthapadi and G S Manku
ConfAcronym: SPAA 2005
ConfName: 17th ACM Symposium on Parallelism in Algorithms and Architectures
ConfURL: http://www.cs.jhu.edu/~spaa/2005/index.html
ConfPages: 135-144
ConfMonth: July
ConfYear: 2005
PDF: papers/05spaa-localrandom.pdf
PS: papers/05spaa-localrandom.ps
PPT: papers/05spaa-localrandom.ppt
ID: 05spaa
Abstract:
We study randomized algorithms for placing a sequence of n nodes on
a circle with unit perimeter. Nodes divide the circle into disjoint
arcs. We desire that a newly-arrived node (which is oblivious of
its index in the sequence) choose its position on the circle by
learning the positions of as few existing nodes as possible. At the
same time, we desire that that the variation in arc-lengths be
small. To this end, we propose a new algorithm that works as
follows: The k-th node chooses r random points on the circle,
inspects the sizes of v arcs in the vicinity of each random point,
and places itself at the mid-point of the largest arc encountered.
We show that for any combination of r and v satisfying rv ≥ c log
k, where c is a small constant, the ratio of the largest to the
smallest arc-length is at most eight w.h.p., for an arbitrarily long
sequence of n nodes. This strategy of node placement underlies a
novel decentralized load-balancing algorithm that we propose for
Distributed Hash Tables (DHTs) in peer-to-peer environments.
Underlying the analysis of our algorithm is Structured Coupon
Collection over n/b disjoint cliques with b nodes per clique,
for any n, b ≥ 1. Nodes are initially uncovered. At each step,
we choose d nodes independently and uniformly at random. If all the
nodes in the corresponding cliques are covered, we do nothing.
Otherwise, from among the chosen cliques with at least one uncovered
node, we select one at random and cover an uncovered node within
that clique. We show that as long as bd ≥ c log n, O(n) steps
are sufficient to cover all nodes w.h.p. and each of the first
Ω(n) steps succeeds in covering a node w.h.p. These results
are then utilized to analyze a stochastic process for growing binary
trees that are highly balanced -- the leaves of the tree belong to
at most four different levels with high probability.
Title: Balanced Binary Trees for ID Management and Load Balance in Distributed Hash Tables
Authors: G S Manku
ConfAcronym: PODC 2004
ConfName: 23rd ACM Symposium on Principles of Distributed Computing
ConfURL: http://www.podc.org/podc2004
ConfPages: 197-205
ConfMonth: July
ConfYear: 2004
PDF: papers/04podc-id.pdf
PS: papers/04podc-id.ps
PPT: papers/04podc-id.ppt
ID: 04podc
Abstract:
We present a low-cost, decentralized algorithm for ID management in
distributed hash tables (DHTs) managed by a dynamic set of hosts.
Each host is assigned an ID in the unit interval [0, 1). At any
time, the set of IDs splits the interval into disjoint partitions.
Hosts do not possess global knowledge of other IDs in the system.
The challenge then is to design an efficient decentralized algorithm
that maintains roughly equi-sized partitions, in the face of
arrivals, departures and changes in the average number of hosts.
Our ID management algorithm is the first to enjoy all of
the following properties: (a) both arrivals and departures of hosts
are handled, (b) departure of a host causes at most one existing
host to change its ID, (c) the ratio of the largest to the smallest
partition is at most 4, with high probability, and (d) the expected
cost per arrival/departure is Θ(R + log n) messages, where n
denotes the current number of participants, and R denotes the cost
of routing one message in the DHT. In fact, our algorithm is
independent of the topology of the overlay network used for routing.
Variations of our algorithm diminish the ratio between the
largest and the smallest partition to (1+ε), for any
ε > 0, albeit at the cost of re-assigning the IDs of
O(1/ε) existing hosts per arrival/departure. Ours is the
first algorithm that allows such fine-tuning.
Finally, our
ID management algorithm enables (a) estimation of the total number
of hosts in the system by making only local measurements, and (b)
emulation of a variety of deterministic and randomized families of
routing topologies, in a straightforward fashion. Among these
families are several networks that require O(log n / log k) routing
hops in an n-node network with k links per node.
Title: Know thy Neighbor's Neighbor: the Power of Lookahead in Randomized P2P Networks
Authors: G S Manku, M Naor and U Wieder
ConfAcronym: STOC 2004
ConfName: 36th ACM Symposium on Theory of Computing
ConfURL: http://www.cs.uchicago.edu/~stoc04
ConfPages: 54-63
ConfMonth: June
ConfYear: 2004
PDF: papers/04stoc-nn.pdf
PS: papers/04stoc-nn.ps
PPT: papers/04stoc-nn.ppt
ID: 04stoc
Abstract:
Several peer-to-peer networks are based upon randomized graph
topologies that permit efficient GREEDY routing, e.g., randomized
hypercubes, randomized Chord, skip-graphs and constructions based
upon small-world percolation networks. In each of these networks, a
node has out-degree Θ(log n), where n denotes the total number
of nodes, and GREEDY routing is known to take O(log n) hops on
average. We establish lower-bounds for GREEDY routing for these
networks, and analyze Neighbor-of-Neighbor (NoN)-GREEDY routing.
The idea behind NoN, as the name suggests, is to take a neighbor's
neighbors into account for making better routing decisions.
The following picture emerges: Deterministic routing
networks like hypercubes and Chord have diameter Θ(log n) and
GREEDY routing is optimal. Randomized routing networks like
randomized hypercubes, randomized Chord, and constructions based on
small-world percolation networks, have diameter Θ(log n / log
log n) with high probability. The expected diameter of Skip graphs
is also Θ(log n / log log n). In all of these networks,
GREEDY routing fails to find short routes, requiring Ω(log n)
hops with high probability. Surprisingly, the NoN-GREEDY routing
algorithm is able to diminish route-lengths to Θ(log n / log
log n) hops, which is asymptotically optimal.
Title: Optimal Routing in Chord
Authors: P Ganesan and G S Manku
ConfAcronym: SODA 2004
ConfName: 15th Annual ACM-SIAM Symposium on Discrete Algorithms
ConfURL: http://www.siam.org/meetings/da04
ConfPages: 169-178
ConfMonth: January
ConfYear: 2004
PDF: papers/04soda-chord.pdf
PS: papers/04soda-chord.ps
PPT: papers/04soda-chord.ppt
ID: 04soda
Abstract:
We propose optimal routing algorithms for Chord, a popular topology
for routing in peer-to-peer networks. Chord is an undirected graph
on 2^b nodes arranged in a circle, with edges between pairs of nodes
that are 2^k positions apart for any k ≥ 0. The standard Chord
routing algorithm uses edges in only one direction. Our algorithms
exploit the bidirectionality of edges for optimality. At the heart
of the new protocols lie algorithms for writing a positive integer d
as the difference of two non-negative integers d′ and d″
such that the total number of 1-bits in the binary representation of
d' and d″ is minimized. Given that Chord is a variant of the
hypercube, the optimal routes possess a surprising combinatorial
structure.
Title: Routing Networks for Distributed Hash Tables
Authors: G S Manku
ConfAcronym: PODC 2003
ConfName: 22nd ACM Symposium on Principles of Distributed Computing
ConfURL: http://www.podc.org/podc2003
ConfPages: 133-142
ConfMonth: June
ConfYear: 2003
PDF: papers/03podc-dht.pdf
PS: papers/03podc-dht.ps
PPT: papers/03podc-dht.ppt
ID: 03podc
Abstract:
Routing topologies for distributed hashing in peer-to-peer networks
are classified into two categories: deterministic and randomized. A
general technique for constructing deterministic routing topologies
is presented. Using this technique, classical parallel
interconnection networks can be adapted to handle the dynamic nature
of participants in peer-to-peer networks. A unified picture of
randomized routing topologies is also presented. Two new protocols
are described which improve average latency as a function of
out-degree. One of the protocols can be shown to be optimal with
high probability. Finally, routing networks for distributed hashing
are revisited from a systems perspective and several open design
problems are listed.
Title: Symphony: Distributed Hashing in a Small World
Authors: G S Manku, M Bawa and P Raghavan
ConfAcronym: USITS 2003
ConfName: 4th USENIX Symposium on Internet Technologies and Systems
ConfURL: http://www.usenix.org/events/usits03/
ConfPages: 127-140
ConfMonth: March
ConfYear: 2003
PDF: papers/03usits-symphony.pdf
PS: papers/03usits-symphony.ps
PPT: papers/03usits-symphony.ppt
ID: 03usits
Abstract:
Symphony is a protocol for maintaining distributed hash tables in a
wide area network. The key idea is to arrange all participants
along a ring and equip them with long distance contacts
drawn from a family of harmonic distributions. Symphony is inspired
by Kleinberg's Small World construction. We extend Kleinberg's
theoretical result by showing that with k links per node, greedy
routing can achieve an average latency of O((log² n) / k)
hops. The paper builds upon this basic idea to engineer a practical
protocol that is scalable, flexible, stable in the presence of
frequent updates and offers small average latency with only a
handful of long distance links per node. The cost of updates when
hosts join and leave is small.
Title: SETS: Search Enhanced by Topic Segmentation
Authors: M Bawa, G S Manku, and P Raghavan
ConfAcronym: SIGIR 2003
ConfName: 26th International ACM SIGIR 2003
ConfURL: http://www.sigir2003.org/
ConfPages: 306-313
ConfMonth: July
ConfYear: 2003
PDF: papers/03sigir-sets.pdf
PS: papers/03sigir-sets.ps
PPT: papers/03sigir-sets.ppt
ID: 03sigir
Abstract:
We present SETS, an architecture for efficient search in
peer-to-peer networks, building upon ideas drawn from machine
learning and social network theory. The key idea is to arrange
participating sites in a topic-segmented overlay topology
in which most connections are short-distance, connecting
pairs of sites with similar content. Topically focused sets of
sites are then joined together into a single network by
long-distance links. Queries are matched and routed to
only the topically closest regions. We discuss a variety of design
issues and tradeoffs that an implementor of SETS would face. We
show that SETS is efficient in network traffic and query processing
load.
Section: Miscellaneous Subjects
Title: A Loop-free Gray Code for Minimal Signed-Binary Representations
Authors: G S Manku and J Sawada
ConfAcronym: ESA 2005
ConfName: 13th Annual European Symposium on Algorithms
ConfURL: http://www.lsi.upc.edu/~algo05/?cmd=esa2005
ConfPages: 438-447
ConfMonth: Oct
ConfYear: 2005
PDF: papers/05esa-graycode.pdf
PS: papers/05esa-graycode.ps
PPT: papers/05esa-graycode.ppt
ID: 05esa
Abstract:
A string
over an alphabet {-1, 0, 1} is said to be a minimal signed-binary
representation of an integer n if for and the
number of non-zero digits is minimal. We present a loopless (and
hence a Gray code) algorithm for generating all minimal
signed-binary representations of a given integer n.
Title: Self-Similarity in File-System Traffic
Authors: S D Gribble, G S Manku, D Roselli, E A Brewer, T J Gibson and E L Miller
ConfAcronym: SIGMETRICS 1998
ConfName: ACM SIGMETRICS 1998
ConfURL: http://www-sor.inria.fr/mirrors/sigmetrics98/
ConfPages: 141-150
ConfMonth: June
ConfYear: 1998
PDF: papers/98sigmetrics-selfsim.pdf
PS: papers/98sigmetrics-selfsim.ps
PPT: papers/98sigmetrics-selfsim.ppt
ID: 98sigmetrics
Abstract:
We demonstrate that high-level file system events exhibit
self-similar behaviour, but only for short-term time scales
of under a day. We do so through the analysis of four sets of
traces that span time scales of milliseconds through months, and
that differ in the trace collection method, the filesystems being
traced, and the chronological times of the tracing. Two sets of
detailed, short-term file system trace data are analyzed; both are
shown to have self-similar like behaviour, with consistent Hurst
parameters (a measure of self-similarity) for all file system
traffic as well as individual classes of file system events.
Long-term file system trace data is then analyzed, and we discover
that the traces' high variability and self-similar behaviour does
not persist across time scales of days, weeks, and months. Using
the short-term trace data, we show that sources of file system
traffic exhibit ON/OFF source behaviour, which is characterized by
highly variably lengthed bursts of activity, followed by similarly
variably lengthed periods of inactivity. This ON/OFF behaviour is
used to motivate a simple technique for synthesizing a stream of
events that exhibit the same self-similar short-term behaviour as
was observed in the file-system traces.
Title: Structural Symmetry and Model Checking
Authors: G S Manku, R Hojati and R K Brayton
ConfAcronym: CAV 1998
ConfName: 10th International Conference on Computer-Aided Verification
ConfURL: http://www.cs.ubc.ca/spider/ajh/cav98.html
ConfPages: p 159-171
ConfMonth: July
ConfYear: 1998
PDF: papers/98cav-symmetry.pdf
PS: papers/98cav-symmetry.ps
PPT:
ID: 98cav
Abstract:
We present a fully automatic framework for identifying symmetries in
structural descriptions of digital circuits and CTL* formulas and
using them in a model checker. The set of sub-formulas of a formula
is partitioned into equivalence classes so that truth values for
only one sub-formula in any class need be evaluated for model
checking. Structural symmetries in net-list descriptions of digital
circuits and CTL* formulas are formally defined and their
relationship with Kripke structures is described. A technique for
automatic identification of structural symmetries is described that
requires computation of the automorphism group of a suitable
directed labeled graph. A novel fast algorithm for this problem is
presented. Finally, experimental results are reported for BLIF-MV
net-lists derived from Verilog.
Title: Object Tracking using Affine Structure for Point Correspondences
Authors: G S Manku, P Jain, A Aggarwal, L Kumar and S Banerjee
ConfAcronym: CVPR 1997
ConfName: IEEE Conf. for Computer Vision and Pattern Recognition
ConfURL: http://www.computer.org/proceedings/cvpr/7822/7822toc.htm
ConfPages: 704-709
ConfMonth: June
ConfYear: 1997
PDF: papers/97cvpr-track.pdf
PS: papers/97cvpr-track.ps
PPT: papers/97cvpr-track.ppt
ID: 97cvpr
Abstract:
A new object tracking algorithm based on affine
structure has been developed and it is shown that the
performance is better than that of a Kalman filter based
correlation tracker. The algorithm is fast, reliable,
viewpoint invariant, and insensitiver to occlusion and/or individual
corner disappearance or reappearance. Detailed experimental
analysis on a long real image sequence is also presented.
Title: A New Voting Based Hardware Data Prefetch Scheme
Authors: G S Manku, M R Prasad and D A Patterson
ConfAcronym: HiPC 1997
ConfName: Fourth International Conference on High Performance Computing
ConfURL: http://www.hipc.org/hipc97/
ConfPages: 100-105
ConfMonth: December
ConfYear: 1997
PDF: papers/97hipc-prefetch.pdf
PS: papers/97hipc-prefetch.pdf
PPT:
ID: 97hipc
Abstract:
The dramatic increase in the processor-memory gap in recent years
has led to the development of techniques like data prefetching that
hide the latency of cache misses. Two such hardware techniques are
the stream buffer and the stride predictor. These
two have dissimilar architectures, are effective for different kinds
of memory access patterns and require different amounts of extra
memory bandwidth. We compare the performance of these two techniques
and propose a scheme that unifies them. Simulation studies
on six benchmark programs confirms that the combined scheme is more
effective in reducing the average memory access time (AMAT) than
either of the two individually.
Title: A Linear Time Algorithm for the Bottleneck Biconnected Spanning Subgraph Problem
Authors: G S Manku
ConfAcronym: IPL 1996
ConfName: Information Processing Letters
ConfVolume: 59
ConfNumber: 1
ConfURL:
ConfPages: 1-7
ConfMonth: July
ConfYear: 1996
PDF: papers/96ipl-bbssp.pdf
PS: papers/96ipl-bbssp.ps
PPT:
ID: 96ipl
Abstract:
A linear time algorithm for the Bottleneck Biconnected
Spanning Subgraph problem is presented. This improves the
hitherto best-known solution, which has a running time of O(m + n
log n), where m and n are the number of edges and vertices of the
graph. Linear time is acheived by first mapping the problem onto a
special kind of graphs (that we call B_Graphs), which are shown to
have nice structural properties. Then, a binary search on the
weights of edges is done. The main idea is to use the properties of
the B_Graph to shrink it by removing redundant edges and vertices in
each iteration, thereby reducing the size of the graph by at least a
factor of two.
Title: Circuit Partitioning with Partial Order for Mixed Simulation Emulation Environment
Authors: G S Manku, A Kumar and S Kumar
ConfAcronym: RSP 1995
ConfName: Sixth Intl. Conf. on Rapid System Prototyping
ConfURL: http://ada.computer.org/conferen/proceed/rsp95/rsp95tc.htm
ConfPages: 201-207
ConfMonth: June
ConfYear: 1995
PDF: papers/95rsp-partition.pdf
PS: papers/95rsp-partition.ps
PPT:
ID: 95rsp
Abstract:
A low-cost hybrid simulator for VLSI circuits has been under
development at IIT Delhi. The simulator uses a Reconfigurable
System (RS) consisting of a limited number of FPGAs for hardware
emulation and blends the ideas of hardware emulation with
conventional software simulation. A crucial preparatory step is to
partition a given circuit into as few parts as possible. The parts
are then downloaded onto the RS one by one and emulated in stand
alone mode or in conjunction with software simulator. The hybrid
simulation environment poses some unique requirements on the
partitioner. This paper presents an efficient partitioning
algorithm for this purpose. A study of performance of the algorithm
on 32 benchmark circuits for various I/O and size constraints of
FPGAs has been carried out and good results have been
obtained.
Section: Patents
Title: System and Method for Searching Peer-to-Peer Computer Networks by Selecting a Computer Based on At Least a Number of Files Shared by the Computer
Authors: W J Labio, G T Nguyen, W W Liu, G S Manku (assignee: Napster Inc.)
ConfAcronym: US Patent #07089301
ConfName: Issued: Aug 8, 2006
ConfURL:http://www.google.com/patents?vid=USPAT7089301&id=JMV6AAAAEBAJ&dq=gurmeet+manku&jtp=1
ConfPages: 1-14
ConfMonth: August
ConfYear: 2006
PDF:
PS:
PPT:
ID: 06patent
Abstract:
A method and system for intelligently directing a search of a
peer-to-peer network, in which a user performing a search is
assisted in choosing a host which is likely to return fast,
favorable results to the user. A host monitor monitors the
peer-to-peer network and collects data on various characteristics of
the hosts which make up the network. Thereafter, a host selector
ranks the hosts using the data, and passes this information to the
user. The user then selects one or more of the highly-ranked hosts
as an entry point into the network. Additionally, a cache may
collect a list of hosts based on the content on the hosts. In this
way, a user may choose to connect to a host which is known to
contain information relevant to the user's search. The host selector
may be used to select from among the hosts listed in the cache.
Title: Single Pass Space Efficient System and Method for Generating an Approximate Quantile in a Data Set Having an Unknown Size
Authors: B G Lindsay, G S Manku, S Rajagopalan (assignee: IBM Corp.)
ConfAcronym: US Patent #06343288
ConfName: Issued: Jan 29, 2002
ConfURL: http://www.google.com/patents?vid=USPAT6343288&id=fzkLAAAAEBAJ&dq=gurmeet+manku
ConfPages: 1-20
ConfMonth: January
ConfYear: 2002
PDF: patents/02patent.pdf
PS:
PPT:
ID: 02patent
Abstract:
A space-efficient system and method for generating an approximate
φ-quantile data element of a data set in a single pass over the
data set, without a priori knowledge of the size of the data
set. The approximate φ-quantile is guaranteed to lie within a
user-specified approximation error ε of the true quantile
being sought with a probability of at least 1-δ, with δ
being a user-defined probability of failure. B buffers, each having
a capacity of k elements, initially are filled with elements from
the data set, with the values of b and k depending on approximation
error e and the probability δ. The buffers are then collapsed
into an output buffer, with the remaining buffers then being
refilled with elements, collapsed (along with the previous output
buffer), and so on until the entire data set has been processed and
a single output remains. The element of the output corresponding to
the approximate quantile is then output as the approximate quantile.
In later iterations (when the height of the tree is at least equal
to a predetermined height that depends on δ and ε),
the data is sampled non-uniformly to populate the buffers to render
the desired performance. Parallel processors can be used, with the
final output buffers of the processors being sent to a collecting
processor P0 as input buffers to the collecting processor
P0.
Title: Single Pass Space Efficient System and Method for Generating Approximate Quantiles Satisfying an Apriori User-Defined Approximation Error
Authors: B G Lindsay, G S Manku, S Rajagopalan (assignee: IBM Corp.)
ConfAcronym: US Patent #06108658
ConfName: Issued: Aug 22, 2000
ConfURL: http://www.google.com/patents?vid=USPAT6108658&id=E2kEAAAAEBAJ&dq=gurmeet+manku
ConfPages: 1-17
ConfMonth: August
ConfYear: 2000
PDF: patents/00patent.pdf
PS:
PPT:
ID: 00patent
Abstract:
A system and method for finding an ε-approximate
φ-quantile data element of a data set with N data elements in a
single pass over the data set. The ε-approximate
φ-quantile data element is guaranteed to lie within a
user-specified approximation error ε of a true
φ-quantile data element being sought. B buffers, each having a
capacity of k elements, initially are filled with sorted data
elements from the data set, with the values of b and k depending on
ε and N. The buffers are then collapsed into an output
buffer, with the remaining buffers then being refilled with data
elements, collapsed (along with the previous output buffer), and so
on until the entire data set has been processed and a single output
buffer remains. A data element of the output buffer corresponding to
the ε-approximate φ-quantile is then output as the
approximate φ-quantile data element. If desired, the system and
method can be practiced with sampling to even further reduce the
amount of space required to find a desired ε-approximate
φ-quantile data element.
Section: Theses
Title: Dipsea: A Modular Distributed Hash Table
Authors: G S Manku
ConfAcronym: Ph. D. Thesis
ConfName: Stanford University
ConfURL: phd/index.html
ConfPages: 1-198
ConfMonth: August
ConfYear: 2004
PDF: papers/phd-thesis.pdf
PS: papers/phd-thesis.ps
PPT:
HTML: phd/index.html
ID: 04thesis
Abstract:
Not available yet.
Title: Structural Symmetries and Model Checking
Authors: G S Manku
ConfAcronym: M. S. Thesis
ConfName: U C Berkeley, Tech Report UCB/ERL M97/92
ConfURL:
ConfPages: 1-76
ConfMonth: December
ConfYear: 1997
PDF: papers/ms-thesis.pdf
PS: papers/ms-thesis.ps
PPT:
ID: 97thesis
Abstract:
Not available yet.
Title: Object Tracking using Affine Multiple Views Geomtry
Authors: G S Manku and H Nautiyal
ConfAcronym: B. Tech. Thesis
ConfName: IIT Delhi (won the Best B.Tech. Project Award)
ConfURL:
ConfPages: 1-56
ConfMonth: May
ConfYear: 1995
PDF: papers/btech-thesis.pdf
PS: papers/btech-thesis.ps
PPT:
ID: 95thesis
Abstract:
Not available yet.