Hello - does anyone have any thoughts on how to implement this?

I am using Lucene.Net version 2.0.0.4 with ASP.NET (Microsoft .NET
Framework 2.0). I have the following question.

I want to build an application that gives the user two options: "Index"
and "Re-Index".

Index - This indexes all documents in a given directory.

Re-Index - This appends the documents newly added to that directory to the
index created above. Assume the directory is the same one as above.

During Re-Index, the application should not index documents that are
already indexed; it should index only those newly added to the folder.
Re-Index should append to the existing index that the application created
when the user clicked the Index button.

Thank you,
Todd McIndoo
Speedy Solutions
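The two operations as described can be sketched in Python. This is a language-neutral illustration, not Lucene.Net code. The index is stood in for by a plain dict of already-indexed file names; `index_all` and `reindex` are hypothetical names. In a real Lucene 2.x application, the choice between starting a new index and appending to an existing one is, as far as I know, the boolean `create` argument of the `IndexWriter` constructor. Note that this version only picks up files whose names are new. It does not notice edits to files that are already indexed, which the replies below address.

```python
import os
from typing import Dict, List


def list_files(directory: str) -> List[str]:
    """Return the regular files directly inside `directory`, sorted by name."""
    return sorted(
        name for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )


def index_all(directory: str) -> Dict[str, str]:
    """'Index': build a fresh (stand-in) index containing every file."""
    index: Dict[str, str] = {}
    for name in list_files(directory):
        # Stand-in for adding one Lucene document per file.
        index[name] = os.path.join(directory, name)
    return index


def reindex(directory: str, index: Dict[str, str]) -> List[str]:
    """'Re-Index': append only files not already in the index.
    Returns the names that were added."""
    added = []
    for name in list_files(directory):
        if name not in index:
            index[name] = os.path.join(directory, name)
            added.append(name)
    return added
```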


  • Michael Garski at Jan 26, 2010 at 7:08 pm
    Todd,

    You'll have to keep track of what has been indexed through some means to ensure items are not indexed twice. My first thought is to use file creation or modification times to know what to add to the index.

    Michael
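    Here is a minimal sketch of the timestamp idea, in Python for illustration. It assumes the time of the last Index/Re-Index run is saved somewhere. Here that time is simply passed in as `last_run`, a hypothetical parameter. The function selects files modified after that time. Modification time is used because creation time is platform-dependent: Python's `os.path.getctime` is creation time on Windows but metadata-change time on Unix. One caveat applies: a file copied in with an older timestamp preserved would be missed.

```python
import os
from typing import List


def changed_since(directory: str, last_run: float) -> List[str]:
    """Return names of files in `directory` whose modification time is
    later than `last_run` (seconds since the epoch, as from time.time())."""
    result = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.path.getmtime(path) > last_run:
            result.append(name)
    return result
```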

    -----Original Message-----
    From: Todd McIndoo
    Sent: Tuesday, January 26, 2010 10:22 AM
    To: lucene-net-user@lucene.apache.org
    Subject: Index and Reindex in Lucene.Net

  • Shashi Kant at Jan 26, 2010 at 7:22 pm
    Another approach is to store a file signature Field (such as a hash)
    to see which ones have been modified and hence need re-indexing.
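    A sketch of the signature check, in Python for illustration. The choice of SHA-1 is an assumption; any stable content hash would do. The function names are hypothetical. The stored signature is assumed to have been kept in a stored, untokenized field on each document when the file was indexed. A file needs (re)indexing if it has no stored signature yet, or if its current hash differs from the stored one.

```python
import hashlib
from typing import Optional


def file_signature(path: str, chunk_size: int = 65536) -> str:
    """Hex SHA-1 digest of the file's contents, read in chunks so large
    files are not loaded into memory all at once."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_reindex(path: str, stored_signature: Optional[str]) -> bool:
    """True if the file was never indexed (no stored signature) or its
    contents no longer match the signature stored in the index."""
    return stored_signature is None or file_signature(path) != stored_signature
```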

    On Tue, Jan 26, 2010 at 2:06 PM, Michael Garski wrote:
  • Moray McConnachie at Jan 27, 2010 at 8:43 am
    Since you will need a list of all files already indexed in the directory, it will be quickest to index the directory as a Lucene Field, along with the filename, modification time, file size, file attribute hash or whatever else you need.

    Then you can build a dictionary of (filename, filehash) pairs for all documents already indexed, using a Lucene query on the directory field and pulling out just the stored fields you need (e.g. filename, filehash).
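    The dictionary-building step can be sketched in Python. This is not Lucene.Net code. `stored_docs` stands in for the stored fields of the hits that a query on the directory field would return. The field names `directory`, `filename` and `filehash` are hypothetical, chosen to match the wording above. The directory filter mimics what the query itself would do.

```python
from typing import Dict, Iterable, Mapping


def build_indexed_map(stored_docs: Iterable[Mapping[str, str]],
                      directory: str) -> Dict[str, str]:
    """Map filename -> filehash for every stored document whose
    'directory' field equals `directory`."""
    return {
        doc["filename"]: doc["filehash"]
        for doc in stored_docs
        if doc.get("directory") == directory
    }
```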

    Then iterate over the files in the directory, comparing each one against the dictionary alone (no further index lookups), and build a list of what needs to be (re)indexed.
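    A sketch of the comparison pass, in Python for illustration. It assumes `indexed` is a filename -> filehash dictionary built from the index and that file hashes are SHA-1 (an assumption; the function names are hypothetical). Every file is checked against the dictionary only. New files are kept separate from changed ones because, in Lucene, the old document for a changed file has to be deleted before the new version is added. Otherwise the file would appear in the index twice.

```python
import hashlib
import os
from typing import Dict, List, Tuple


def file_signature(path: str) -> str:
    """Hex SHA-1 of the file contents (any stable hash would do)."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_updates(directory: str,
                 indexed: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Compare the files on disk with the filename -> filehash dictionary
    only. Returns (new_files, changed_files), each sorted by name."""
    new_files: List[str] = []
    changed_files: List[str] = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        stored = indexed.get(name)
        if stored is None:
            new_files.append(name)          # never indexed: just add it
        elif stored != file_signature(path):
            changed_files.append(name)      # delete old doc, then re-add
    return new_files, changed_files
```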

    You don't mention deletions, but you might need to build a list of what to remove too (files still in the index but no longer in the directory).
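    A sketch of the deletion list, in Python for illustration. It assumes the same hypothetical filename -> filehash dictionary built from the index. Any filename the index knows about that is no longer a file on disk is a candidate for removal from the index.

```python
import os
from typing import Dict, List


def plan_deletions(directory: str, indexed: Dict[str, str]) -> List[str]:
    """Sorted filenames that are present in the index but no longer exist
    as regular files in `directory`."""
    on_disk = {
        name for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    }
    return sorted(name for name in indexed if name not in on_disk)
```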
    Moray
    ------------------
    Moray McConnachie
    Director of IT,
    Oxford Analytica


    -----Original Message-----
    From: Shashi Kant <skant@sloan.mit.edu>
    Date: Tue, 26 Jan 2010 14:21:15
    To: <lucene-net-user@lucene.apache.org>
    Subject: Re: Index and Reindex in Lucene.Net


Discussion Overview
group: lucene-net-user @
categories: lucene
posted: Jan 26, '10 at 6:59p
active: Jan 27, '10 at 8:43a
posts: 4
users: 4
website: lucene.apache.org
