Hi.

Does anyone have an idea of how I could write a query that returns the data
behind a trend graph, with date on the X axis and num(docs) on the Y axis?

This is quite a common use case in "buzz" analysis. Currently I'm running a
naive loop which iterates over the date range and queries Lucene once for
every date. It is not very fast and not very flexible.

More specifically, I want something like the query below, but I also need to
add a free-text query, and then I cannot use MySQL for performance reasons.
Any ideas?

--clip--
mysql> select count(id) as Y,publishDate as X from FeedItem where
publishDate between "2008-08-01" and "2008-08-31" group by DAY(publishDate)
order by publishDate asc;
+-------+---------------------+
| Y     | X                   |
+-------+---------------------+
| 26663 | 2008-08-01 00:00:00 |
| 22478 | 2008-08-02 00:00:00 |
| 25745 | 2008-08-03 00:00:00 |
| 30576 | 2008-08-04 00:00:00 |
| 31351 | 2008-08-05 00:00:00 |
| 31084 | 2008-08-06 00:00:00 |
| 31245 | 2008-08-07 00:00:00 |
| 29518 | 2008-08-08 00:00:00 |
| 26001 | 2008-08-09 00:00:00 |
| 28687 | 2008-08-10 00:00:00 |
| 32957 | 2008-08-11 00:00:00 |
| 33251 | 2008-08-12 00:00:00 |
| 33062 | 2008-08-13 00:00:00 |
| 33960 | 2008-08-14 00:00:00 |
| 31034 | 2008-08-15 00:00:00 |
| 26726 | 2008-08-16 00:00:00 |
| 27543 | 2008-08-17 00:00:00 |
| 36887 | 2008-08-18 00:00:00 |
| 35376 | 2008-08-19 00:00:00 |
| 34573 | 2008-08-20 00:00:00 |
| 33889 | 2008-08-21 00:00:00 |
| 30604 | 2008-08-22 00:00:00 |
| 26875 | 2008-08-23 00:00:00 |
| 27356 | 2008-08-24 00:00:00 |
| 33438 | 2008-08-25 00:00:00 |
| 33102 | 2008-08-26 00:00:00 |
| 31720 | 2008-08-27 00:00:00 |
| 26133 | 2008-08-28 00:00:00 |
| 22781 | 2008-08-29 00:00:00 |
| 20198 | 2008-08-30 00:00:00 |
|    20 | 2008-08-31 00:00:00 |
+-------+---------------------+


--
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.herou@tailsweep.com
http://www.tailsweep.com/
http://blogg.tailsweep.com/


  • Mark harwood at Oct 10, 2008 at 9:42 am
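    Assuming your dates are indexed as YYYYMMDD terms and you want daily totals:

    // Uses org.apache.lucene.index.Term and org.apache.lucene.index.TermEnum.
    // Position the enumeration at the first term >= "20080101" in the "date" field.
    Term startTerm = new Term("date", "20080101");
    TermEnum termEnum = indexReader.terms(startTerm);
    try {
        do {
            Term currentTerm = termEnum.term();
            // term() is null once the enumeration is exhausted; also stop
            // as soon as we run past the last term of the "date" field.
            if (currentTerm == null || !currentTerm.field().equals(startTerm.field())) {
                break;
            }
            // docFreq() is the number of documents containing this date term.
            System.out.println(currentTerm + " " + termEnum.docFreq());
        } while (termEnum.next());
    } finally {
        termEnum.close();
    }

    This should be plenty fast. Note that docFreq() still counts deleted docs; if you need to exclude them, look at using "TermDocs" inside this loop (or optimize your index in advance).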
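
To limit the totals to a date range such as August 2008, the same kind of loop can stop at the first term past an end date. The following is only a sketch of that idea in plain Java, with a TreeMap standing in for the sorted term dictionary of the "date" field; the class name, the method name and the sample counts are made up for illustration and are not Lucene API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class DailyTermCounts {

    // dateTerms is a stand-in (assumption) for the "date" field's terms:
    // term text in YYYYMMDD form -> number of documents containing it.
    static Map<String, Integer> countsInRange(SortedMap<String, Integer> dateTerms,
                                              String startInclusive,
                                              String endInclusive) {
        Map<String, Integer> result = new LinkedHashMap<String, Integer>();
        // tailMap starts at the first term >= startInclusive, much like
        // seeking a term enumeration to a start term.
        for (Map.Entry<String, Integer> e : dateTerms.tailMap(startInclusive).entrySet()) {
            // YYYYMMDD strings sort chronologically, so the first term
            // greater than the end date means the range is finished.
            if (e.getKey().compareTo(endInclusive) > 0) {
                break;
            }
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data, not taken from a real index.
        SortedMap<String, Integer> terms = new TreeMap<String, Integer>();
        terms.put("20080730", 5);
        terms.put("20080801", 3);
        terms.put("20080815", 7);
        terms.put("20080831", 2);
        terms.put("20080901", 9);
        System.out.println(countsInRange(terms, "20080801", "20080831"));
    }
}
```

Days with no documents simply have no term, so a graph built from this would need to fill those days with zero.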

    Cheers,
    Mark



    ----- Original Message ----
    From: Marcus Herou <marcus.herou@tailsweep.com>
    To: java-user@lucene.apache.org
    Sent: Friday, 10 October, 2008 10:12:35
    Subject: Buzz measurement - Aggregate functions

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Mark harwood at Oct 10, 2008 at 10:06 am
    Ah, sorry. Just saw the bit about the free text query too.

    I suspect a FieldCache is the answer here: it lets you quickly look up the date value of each document matched by an arbitrary query, so you can count the matches per date.
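
One possible reading of this suggestion: load the date of every document once into an array indexed by document number (in Lucene 2.x, FieldCache.DEFAULT.getStrings(reader, "date") returns such an array), then bump a per-date counter for each document the free-text query matches. The sketch below shows only the counting step in plain Java. The class name, method names and sample data are hypothetical; in a real setup the document numbers would come from a collector passed to the searcher rather than from a hard-coded list.

```java
import java.util.Map;
import java.util.TreeMap;

class DateHistogram {

    // dateByDoc[docId] is the YYYYMMDD value of that document, the way a
    // field cache would expose it (assumption: null means "no date").
    private final String[] dateByDoc;

    // TreeMap keeps the dates sorted, which is the order a graph needs.
    private final Map<String, Integer> counts = new TreeMap<String, Integer>();

    DateHistogram(String[] dateByDoc) {
        this.dateByDoc = dateByDoc;
    }

    // Call once for every document number matched by the query.
    void collect(int docId) {
        String date = dateByDoc[docId];
        if (date == null) {
            return; // document has no date value, nothing to count
        }
        Integer old = counts.get(date);
        counts.put(date, old == null ? 1 : old + 1);
    }

    Map<String, Integer> counts() {
        return counts;
    }

    public static void main(String[] args) {
        // Made-up per-document dates for documents 0..4.
        String[] dates = {"20080801", "20080802", "20080801", null, "20080802"};
        DateHistogram histogram = new DateHistogram(dates);
        // Pretend the free-text query matched documents 0, 2, 3 and 4.
        int[] matchingDocs = {0, 2, 3, 4};
        for (int doc : matchingDocs) {
            histogram.collect(doc);
        }
        System.out.println(histogram.counts());
    }
}
```

The array costs memory proportional to the number of documents in the index, but once it is loaded each matching document needs only an array lookup and a map update.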



    ----- Original Message ----
    From: mark harwood <markharw00d@yahoo.co.uk>
    To: java-user@lucene.apache.org
    Sent: Friday, 10 October, 2008 10:40:32
    Subject: Re: Buzz measurement - Aggregate functions


Discussion Overview
group: java-user @
categories: lucene
posted: Oct 10, '08 at 9:13a
active: Oct 10, '08 at 10:06a
posts: 3
users: 2
website: lucene.apache.org

2 users in discussion

Mark harwood: 2 posts
Marcus Herou: 1 post
