Index of /~cysuen/curis/main/clustering/binders/text/completed

Icon   Name                               Last modified      Size  Description
[DIR]  Parent Directory                                      -
[TXT]  completed_clusters_2012-10-31.txt  28-Nov-2012 01:56  35M
[TXT]  completed_clusters_2012-10-30.txt  28-Nov-2012 01:48  33M
[TXT]  completed_clusters_2012-10-29.txt  28-Nov-2012 01:40  40M
[TXT]  completed_clusters_2012-10-28.txt  28-Nov-2012 01:32  37M
[TXT]  completed_clusters_2012-10-27.txt  28-Nov-2012 01:27  29M
[TXT]  completed_clusters_2012-10-26.txt  28-Nov-2012 01:24  39M
[TXT]  completed_clusters_2012-10-25.txt  28-Nov-2012 01:17  43M
[TXT]  completed_clusters_2012-10-24.txt  28-Nov-2012 01:08  38M
[TXT]  completed_clusters_2012-10-23.txt  28-Nov-2012 01:00  40M
[TXT]  completed_clusters_2012-10-22.txt  28-Nov-2012 00:51  45M
[TXT]  completed_clusters_2012-10-21.txt  28-Nov-2012 00:42  36M
[TXT]  completed_clusters_2012-10-20.txt  28-Nov-2012 00:39  34M
[TXT]  completed_clusters_2012-10-19.txt  28-Nov-2012 00:34  41M
[TXT]  completed_clusters_2012-10-18.txt  28-Nov-2012 00:27  44M
[TXT]  completed_clusters_2012-10-17.txt  28-Nov-2012 00:18  36M
[TXT]  completed_clusters_2012-10-16.txt  28-Nov-2012 00:10  37M
[TXT]  completed_clusters_2012-10-15.txt  28-Nov-2012 00:03  46M
[TXT]  completed_clusters_2012-10-14.txt  27-Nov-2012 23:53  37M
[TXT]  completed_clusters_2012-10-13.txt  27-Nov-2012 23:48  33M
[TXT]  completed_clusters_2012-10-12.txt  27-Nov-2012 23:43  41M
[TXT]  completed_clusters_2012-10-11.txt  27-Nov-2012 23:35  43M
[TXT]  completed_clusters_2012-10-10.txt  27-Nov-2012 23:26  39M
[TXT]  completed_clusters_2012-10-09.txt  27-Nov-2012 23:15  39M
[TXT]  completed_clusters_2012-10-08.txt  27-Nov-2012 23:06  43M
[TXT]  completed_clusters_2012-10-07.txt  27-Nov-2012 22:58  39M
[TXT]  completed_clusters_2012-10-06.txt  27-Nov-2012 22:54  31M
[TXT]  completed_clusters_2012-10-05.txt  27-Nov-2012 22:49  40M
[TXT]  completed_clusters_2012-10-04.txt  27-Nov-2012 22:44  43M
[TXT]  completed_clusters_2012-10-03.txt  27-Nov-2012 22:36  37M
[TXT]  completed_clusters_2012-10-02.txt  27-Nov-2012 22:27  40M
[TXT]  completed_clusters_2012-10-01.txt  27-Nov-2012 22:16  43M
[TXT]  completed_clusters_2012-09-30.txt  27-Nov-2012 21:42  38M
[TXT]  completed_clusters_2012-09-29.txt  27-Nov-2012 21:38  34M

(same as the memetracker.org website)
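Data format: tab-separated text file with a nested structure. Each block describes one cluster: an A record for the cluster, then a B record for each phrase variant in it, each B record followed by a C record for each document that mentions that variant: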
    A:  <ClId>  <Start> <End>   <NmVar> <TotFq> <NmPks> <RtLen> <Root>  <RpURL> <First> <Last>  <PkTm>  <Archived>  <DisSt>
    B:          <QtId>  <QtFq>  <NmPks> <Len>   <Quote> <RpURL> <First> <Last>  <PkTm>
    C:                  <DocId> <Tm>   <Url>
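
    Column header names, as given in the source fragments that build the header strings. The first fragment (B records) is cut off at its start; the second is for A records:

    Id\tSize\tPeaks\tNumWords\tQuote String\tRepresentative URL\tFirst Mention\tLast Mention\tPeak Time";

  TStr Response = "# Clusters: Id\tStart\tEnd\t";
  Response += "Variants\tSize\tPeaks\tNumWords\tLongest Variant\tRepresentative URL\t";
  Response += "First Mention Time\tLast Mention Time\tPeak Time\tArchived\tDiscard State";
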
<ClId>: cluster id.
<Start>: date the cluster was born
<End>: date the cluster was retired.
<NmVar>: number of unique phrase variants in the cluster (number of B records).
<TotFq>: total frequency (number of mentions) of all the phrases/variants in the cluster.
<NmPks>: number of computed peaks in cluster
<RtLen>: length (in words) of root variant
<Root>: root variant, i.e. the longest phrase in the cluster
<RpURL>: representative url of the cluster
<First>: time of first recorded URL mention of any phrase variant in the cluster
<Last>: time of last recorded URL mention of any phrase variant in the cluster
<PkTm>: time of highest recorded peak in cluster.
<Archived>: whether cluster was in an archived state at time of completion. 0 false, 1 true.
<DisSt>: whether cluster had been filtered out during post-clustering step. 0 no, 1-2 yes.
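
The A-record fields above can be read into a keyed record as in the following minimal sketch. It assumes that the line carries exactly the 14 tab-separated values in the order listed, with no literal "A:" prefix in the file; the names `A_FIELDS`, `parse_cluster_record` and `is_kept` are hypothetical helpers, not part of any released tool. Dates and times are left as strings because their format is not specified here.

```python
from typing import Any, Dict

# Field tags for A records, in the order given in the format description.
A_FIELDS = ["ClId", "Start", "End", "NmVar", "TotFq", "NmPks", "RtLen",
            "Root", "RpURL", "First", "Last", "PkTm", "Archived", "DisSt"]

# Fields that the description defines as counts or flags.
A_INT_FIELDS = ("NmVar", "TotFq", "NmPks", "RtLen", "Archived", "DisSt")


def parse_cluster_record(line: str) -> Dict[str, Any]:
    """Split one tab-separated A record into a dict keyed by field tag."""
    values = line.rstrip("\r\n").split("\t")
    if len(values) != len(A_FIELDS):
        raise ValueError(f"expected {len(A_FIELDS)} fields, got {len(values)}")
    record: Dict[str, Any] = dict(zip(A_FIELDS, values))
    for name in A_INT_FIELDS:
        record[name] = int(record[name])
    return record


def is_kept(record: Dict[str, Any]) -> bool:
    """True if the cluster was not filtered out (DisSt is 0)."""
    return record["DisSt"] == 0
```

Keeping only clusters with `is_kept(record)` drops those discarded in the post-clustering step (DisSt 1 or 2).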

<QtId>: phrase/variant id.
<QtFq>: total frequency (number of mentions) of the phrase.
<NmPks>: number of peaks in the mentions of the phrase
<Len>: length (in words) of phrase
<Quote>: the quote phrase
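<RpURL>: representative url of the quote phrase
<First>: time of first recorded URL mention of the phrase
<Last>: time of last recorded URL mention of the phrase
<PkTm>: time of highest recorded peak of the phrase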

<DocId>: document/URL id
<Tm>: time when the article/post at <Url> was published.
<Url>: URL of the blog post/news article.
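
The nested A/B/C structure can be regrouped into per-cluster blocks as in the sketch below. It rests on assumptions that this description does not confirm: that nesting is encoded by leading tabs (none for A, one for B, two for C), that the "A:", "B:", "C:" labels are not literally present in the files, and that header lines start with "#". The function name `parse_blocks` is hypothetical. Fields are kept as raw string lists so that no value format is guessed.

```python
from typing import Any, Dict, Iterable, Iterator, Optional


def parse_blocks(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per cluster: A fields, plus B records with their C records."""
    cluster: Optional[Dict[str, Any]] = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        # Assumed encoding: the count of leading tabs gives the record level.
        depth = min(len(line) - len(line.lstrip("\t")), 2)
        fields = line[depth:].split("\t")
        if depth == 0:            # A record: starts a new cluster
            if cluster is not None:
                yield cluster
            cluster = {"fields": fields, "quotes": []}
        elif depth == 1:          # B record: phrase variant of current cluster
            if cluster is None:
                raise ValueError("B record before any A record")
            cluster["quotes"].append({"fields": fields, "docs": []})
        else:                     # C record: document mentioning the last variant
            if cluster is None or not cluster["quotes"]:
                raise ValueError("C record before any B record")
            cluster["quotes"][-1]["docs"].append(fields)
    if cluster is not None:
        yield cluster
```

Because the files are tens of megabytes each, the generator can be fed an open file handle directly, e.g. `for block in parse_blocks(open(path, encoding="utf-8")): ...`, so that only one cluster is held in memory at a time.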