FAQ
Hello,

I have XML-documents containing image information. These images should
be indexed with the document by having one additional field with the
image stuff. Could anyone please give me some hints how I can manage this?

Regards,
Bernd

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Search Discussions

  • Erick Erickson at Jun 10, 2008 at 12:27 pm
    You add as many fields to the document while indexing as you need to
    correctly contain your "image stuff".

    If this answer seems cryptic, it's as clear as your problem statement
    <G>. To give you a meaningful answer, we need a much clearer
    problem statement. What is "image stuff", what format is it in, and
    what do you want to do with it?

    Lucene doesn't index binary data...

    Best
    Erick
    On Tue, Jun 10, 2008 at 8:13 AM, Bernd Mueller wrote:

    Hello,

    I have XML-documents containing image information. These images should be
    indexed with the document by having one additional field with the image
    stuff. Could anyone please give me some hints how I can manage this?

    Regards,
    Bernd

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Bernd Mueller at Jun 10, 2008 at 12:39 pm
    Hi Erick,


    Thanks for your fast reply.

    I will try to explain what I mean with image stuff. An image in
    xml-documents is usually an url to the location where the image is
    stored. Additionally, such an image tag contains information like
    offset, width and height. I am thinking about storing the images (binary
    data) and meta-information (offset, width, height) in one field instead
    of using these tagging information as values for a field in the index.

    I guess, your last statement "Lucene doesn't index binary data..."
    indicates that this isn't possible...


    Regards,

    Bernd


    Erick Erickson wrote:
    You add as many fields to the document while indexing as you need to
    correctly contain your "image stuff".

    If this answer seems cryptic, it's as clear as your problem statement
    <G>. To give you a meaningful answer, we need a much clearer
    problem statement. What is "image stuff", what format is it in, and
    what do you want to do with it?

    Lucene doesn't index binary data...

    Best
    Erick

    On Tue, Jun 10, 2008 at 8:13 AM, Bernd Mueller wrote:


    Hello,

    I have XML-documents containing image information. These images should be
    indexed with the document by having one additional field with the image
    stuff. Could anyone please give me some hints how I can manage this?

    Regards,
    Bernd

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org



    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Steven A Rowe at Jun 10, 2008 at 4:16 pm
    Hi Bernd,

    It's still not clear what you want to do. What will a search look like?
    On 06/10/2008 at 8:36 AM, Bernd Mueller wrote:
    I will try to explain what I mean with image stuff. An image in
    xml-documents is usually an url to the location where the image is
    stored. Additionally, such an image tag contains information like
    offset, width and height. I am thinking about storing the
    images (binary data) and meta-information (offset, width, height) in one
    field instead of using these tagging information as values for a field
    in the index.
    You seem to be saying that you want to store the binary image along with its metadata all in one field - is this correct? Why do you want to do this?
    I guess, your last statement "Lucene doesn't index binary data..."
    indicates that this isn't possible...
    While Lucene will not *index* binary data, it can *store* it.

    Steve
    Erick Erickson wrote:
    You add as many fields to the document while indexing as you need to
    correctly contain your "image stuff".

    If this answer seems cryptic, it's as clear as your problem statement
    <G>. To give you a meaningful answer, we need a much clearer
    problem statement. What is "image stuff", what format is it in, and
    what do you want to do with it?

    Lucene doesn't index binary data...

    Best
    Erick

    On Tue, Jun 10, 2008 at 8:13 AM, Bernd Mueller
    wrote:
    Hello,

    I have XML-documents containing image information. These images should
    be indexed with the document by having one additional field with the
    image stuff. Could anyone please give me some hints how I can manage
    this?

    Regards,
    Bernd
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Bernd Mueller at Jun 10, 2008 at 7:22 pm
    Hi Steve,


    Actually it is not about searching the binary data. I just wanna have the
    whole document (together with the "image stuff") to be stored in the
    index. So when I got a hit of a search on a document that I can extract
    the document with its images (and the images' meta information).

    Nice would be some data structure to be stored in the "image"-field of the
    document (maybe a HashSet?). So I could extract the found documents with
    all their image information from the index for easy visualization purpose
    of the search results.


    Regards,

    Bernd


    Steven A Rowe wrote:
    Hi Bernd,

    It's still not clear what you want to do. What will a search look like?
    On 06/10/2008 at 8:36 AM, Bernd Mueller wrote:
    I will try to explain what I mean with image stuff. An image in
    xml-documents is usually an url to the location where the image is
    stored. Additionally, such an image tag contains information like
    offset, width and height. I am thinking about storing the
    images (binary data) and meta-information (offset, width, height) in one
    field instead of using these tagging information as values for a field
    in the index.
    You seem to be saying that you want to store the binary image along with
    its metadata all in one field - is this correct? Why do you want to do
    this?
    I guess, your last statement "Lucene doesn't index binary data..."
    indicates that this isn't possible...
    While Lucene will not *index* binary data, it can *store* it.

    Steve
    Erick Erickson wrote:
    You add as many fields to the document while indexing as you need to
    correctly contain your "image stuff".

    If this answer seems cryptic, it's as clear as your problem statement
    <G>. To give you a meaningful answer, we need a much clearer
    problem statement. What is "image stuff", what format is it in, and
    what do you want to do with it?

    Lucene doesn't index binary data...

    Best
    Erick

    On Tue, Jun 10, 2008 at 8:13 AM, Bernd Mueller
    wrote:
    Hello,

    I have XML-documents containing image information. These images
    should
    be indexed with the document by having one additional field with the
    image stuff. Could anyone please give me some hints how I can manage
    this?

    Regards,
    Bernd
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

    --
    Diplom Informatiker (FH)
    Bernd Mueller
    Fraunhofer SCAI
    Schloß Birlinghoven
    53754 Sankt Augustin

    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org
  • Steven A Rowe at Jun 10, 2008 at 7:32 pm
    Hi Bernd,
    On 06/10/2008 at 2:40 PM, Bernd Mueller wrote:
    Actually it is not about searching the binary data. I just
    wanna have the whole document (together with the "image stuff")
    to be stored in the index. So when I got a hit of a search on a
    document that I can extract the document with its images (and
    the images' meta information).

    Nice would be some data structure to be stored in the
    "image"-field of the document (maybe a HashSet?). So I
    could extract the found documents with all their image information
    from the index for easy visualization purpose of the search results.
    Lucene field values are flat - the kind of complex object serialization and de-serialization you're talking about is not directly supported.

    That said, if you manage the packing and unpacking yourself, Lucene can help. For example, if you serialize the image data and its metadata as a YAML file, with the binary data encoded as Base64 or something similar, you should be able to store and retrieve it as a String. Or maybe you could store a .jar file in a binary field - that would probably be simplest.

    Steve
    Steven A Rowe wrote:
    Hi Bernd,

    It's still not clear what you want to do. What will a search look like?
    On 06/10/2008 at 8:36 AM, Bernd Mueller wrote:
    I will try to explain what I mean with image stuff. An image in
    xml-documents is usually an url to the location where the image is
    stored. Additionally, such an image tag contains information like
    offset, width and height. I am thinking about storing the images
    (binary data) and meta-information (offset, width, height) in one
    field instead of using these tagging information as values for a field
    in the index.
    You seem to be saying that you want to store the binary image along
    with its metadata all in one field - is this correct? Why do you want
    to do this?
    I guess, your last statement "Lucene doesn't index binary data..."
    indicates that this isn't possible...
    While Lucene will not *index* binary data, it can *store* it.

    Steve
    Erick Erickson wrote:
    You add as many fields to the document while indexing as you need to
    correctly contain your "image stuff".

    If this answer seems cryptic, it's as clear as your problem statement
    <G>. To give you a meaningful answer, we need a much clearer problem
    statement. What is "image stuff", what format is it in, and what do
    you want to do with it?

    Lucene doesn't index binary data...

    Best
    Erick

    On Tue, Jun 10, 2008 at 8:13 AM, Bernd Mueller
    wrote:
    Hello,

    I have XML-documents containing image information. These images
    should be indexed with the document by having one additional field
    with the image stuff. Could anyone please give me some hints how I
    can manage this?

    Regards,
    Bernd
    ---------------------------------------------------------------------
    To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
    For additional commands, e-mail: java-user-help@lucene.apache.org

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupjava-user @
categorieslucene
postedJun 10, '08 at 12:16p
activeJun 10, '08 at 7:32p
posts6
users4
websitelucene.apache.org

People

Translate

site design / logo © 2022 Grokbase