I have scoured the documentation and mailing list but can't find an answer
to what seems to be an obvious question.

I would simply like to index binary files (images, zips, dwg files) so that
they are included in the search results. They wouldn't need to be parsed
for content, only have the filenames searchable.

Is this possible with Xapian? If not, has anyone come up with an alternate
strategy?


  • Tom at Nov 30, 2010 at 12:29 pm
    Hi Brian,

    This is certainly possible, just by generating terms from the
    filename. But are you talking about writing an app from scratch, or
    adding this to an existing one? (I'm not sure but I think omega might
    already support this).
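
A minimal sketch of what "generating terms from the filename" could look like in C++. The helper name `filename_terms` and the splitting rule (lowercase runs of letters and digits) are assumptions for illustration, not something taken from Xapian or omega. In a real indexer each resulting term would then be added to a `Xapian::Document`, for example with `add_term()`.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: split a filename into lowercase alphanumeric terms.
// Any character that is not a letter or digit acts as a separator.
std::vector<std::string> filename_terms(const std::string& filename) {
    std::vector<std::string> terms;
    std::string current;
    for (unsigned char ch : filename) {
        if (std::isalnum(ch)) {
            current += static_cast<char>(std::tolower(ch));
        } else if (!current.empty()) {
            // Separator reached: finish the term collected so far.
            terms.push_back(current);
            current.clear();
        }
    }
    // Keep the last term if the name does not end with a separator.
    if (!current.empty()) {
        terms.push_back(current);
    }
    return terms;
}
```

With this rule, `filename_terms("Site_Plan-v2.DWG")` yields the terms `site`, `plan`, `v2` and `dwg`, so a search for "plan" or "dwg" could match the file without its contents ever being parsed.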

    Tom
    _______________________________________________
    Xapian-discuss mailing list
    Xapian-discuss at lists.xapian.org
    http://lists.xapian.org/mailman/listinfo/xapian-discuss
  • Brian Burton at Nov 30, 2010 at 11:05 pm
    I have finally cobbled together a solution so I'll post it here for anyone
    else who has this question.

    1) Open the xapian-omega source directory and edit the omindex.cc file.

    2) Starting at line 539 (in version 1.0.21) change these lines:
    } else {
        // Don't know how to index this type.
        cout << "unknown MIME type - skipping" << endl;
        return;
    }

    to this:
    } else {
        dump = file;
        title = file;
        keywords = file;
        sample = file;
    }

    This creates a sort of "catch all" so that omindex indexes files by
    name even if it doesn't know what they are.
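
The idea behind the catch-all can be sketched in isolation as follows. This is not the omindex code: the struct `IndexedFields`, the function `fields_for` and the single "known" MIME type are hypothetical stand-ins, chosen only to show a fallback branch that uses the filename for every text field instead of skipping the file.

```cpp
#include <string>

// Hypothetical stand-in for the text fields an indexer fills in per file.
struct IndexedFields {
    std::string dump;      // text that gets indexed
    std::string title;
    std::string keywords;
    std::string sample;    // snippet shown in results
};

// Hypothetical helper: choose field values based on the MIME type.
IndexedFields fields_for(const std::string& mimetype, const std::string& file) {
    IndexedFields f;
    if (mimetype == "text/plain") {
        // A real indexer would read and parse the file contents here.
        f.dump = "(contents of " + file + ")";
        f.title = file;
    } else {
        // Catch-all: no parser for this type, so index the filename itself.
        f.dump = file;
        f.title = file;
        f.keywords = file;
        f.sample = file;
    }
    return f;
}
```

For example, `fields_for("application/acad", "plan.dwg")` returns a record whose four fields all hold `plan.dwg`, which is what makes the name searchable.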

    3) Around line 845, where the mime_map map is set up, add your extensions
    and their MIME types like so:

    mime_map["jpg"] = "image/jpeg";
    mime_map["jpeg"] = "image/jpeg";
    mime_map["gif"] = "image/gif";
    mime_map["png"] = "image/png";
    mime_map["bmp"] = "image/bmp";
    mime_map["psd"] = "image/photoshop";
    mime_map["dwg"] = "application/acad";
    mime_map["mp3"] = "audio/mpeg";
    mime_map["avi"] = "video/avi";
    mime_map["mpg"] = "video/mpeg";
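
As a rough illustration of how such a table can be consulted, here is a small standalone sketch. It assumes the table is a `std::map<std::string, std::string>` keyed by lowercase extension; the function name `mime_type_for` is hypothetical and the lookup is not copied from omindex.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper: look up a file's MIME type by its extension.
// Returns an empty string when there is no extension or no mapping.
std::string mime_type_for(const std::map<std::string, std::string>& mime_map,
                          const std::string& filename) {
    std::string::size_type dot = filename.rfind('.');
    if (dot == std::string::npos || dot + 1 == filename.size()) {
        return "";
    }
    // Lowercase the extension so "Plan.DWG" and "plan.dwg" match the same key.
    std::string ext = filename.substr(dot + 1);
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    std::map<std::string, std::string>::const_iterator it = mime_map.find(ext);
    return it == mime_map.end() ? "" : it->second;
}
```

Given a map containing `"dwg" -> "application/acad"`, a call such as `mime_type_for(mime_map, "Plan.DWG")` returns `application/acad`, while a file with an unmapped extension returns an empty string.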

    4) Then compile xapian-omega as you normally would.

    Hope this helps someone else.

    Brian
  • Olly Betts at Dec 2, 2010 at 12:28 am

    On Wed, Dec 01, 2010 at 12:05:22AM +0100, Brian Burton wrote:
    > I have finally cobbled together a solution so I'll post it here for anyone
    > else who has this question.

    Thanks. I've opened a ticket in trac for this issue:

    http://trac.xapian.org/ticket/519

    Not sure if you found it, but this has some tips on adding support for
    additional file formats to omindex:

    http://trac.xapian.org/wiki/FAQ/OmegaNewFileFormat

    I've been doing some work recently to make it possible to add new filters
    without patching code and recompiling. Currently that's on trunk, but
    hasn't gone into a release yet.

    Cheers,
    Olly
