FAQ
Hello,
is there in the roadmap of postgre integration of fulltext searching in
documents saved in blobs (bytea)?

For example linux antiword can export fine text output that can be inserted into
varchar field.

Would be very very nice (postgre users can be proud to be first) to save
documents into bytea and search that field via to_tsvector, to_tsquery ...

--

www.publicstream.cz - vytvořili jsme produkt pro živé přenosy (online streaming)
a videoarchív ve formátu FLASH.

Bc. Radek Novotný
jednatel Mediawork group s.r.o.

tel.: +420 724 020 361
email: radek.novotny@mediawork.cz
http://www.mediawork.cz

Search Discussions

  • Sam Mason at Jul 29, 2009 at 3:24 pm

    On Wed, Jul 29, 2009 at 04:46:43PM +0200, Radek Novotnnn wrote:
    is there in the roadmap of postgre integration of fulltext searching in
    documents saved in blobs (bytea)?
    Do you mean bytea or large-objects?
    Would be very very nice (postgre users can be proud to be first) to save
    documents into bytea and search that field via to_tsvector, to_tsquery ...
    This seems easy; for large objects, just use lo_export() to dump the
    blob out to the filesystem, and then use something like pl/perl to run
    antiword on it, saving the results to another file and then returning
    the file line-by-line as a SETOF TEXT (I think this is the best way of
    handling things in case the resulting text file is enormous anyway). If
    this code was called "runfilter" we can use it like:

    UPDATE myfiles f SET tsidx = (
    SELECT ts_accum(to_tsvector(t))
    FROM runfilter(f.loid) t);

    Where we've defined ts_accum to be:

    CREATE AGGREGATE ts_accum (tsvector) (
    SFUNC = tsvector_concat,
    STYPE = tsvector,
    INITCOND = ''
    );

    bytea is different because you know when the values has changed (i.e.
    write a trigger) but you need to write more code to get the bytea value
    out into the filesystem.
  • Pavel Stehule at Jul 29, 2009 at 3:38 pm
    Hello

    Dne 29. červenec 2009 16:46 Radek Novotný <radek.novotny@mediawork.cz>
    napsal(a):
    Hello,
    is there in the roadmap of postgre integration of fulltext searching in
    documents saved in blobs (bytea)?
    What I know, no. PostgreSQL doesn't know about others binary formats,
    so it cannot do it.
    For example linux antiword can export fine text output that can be inserted
    into varchar field.
    I understand it well. Three years ago, we used antiword. We stored two
    values - bytea - original word doc, and text - antiword output. The
    overhead is minimal, and this solution worked very well.
    Would be very very nice (postgre users can be proud to be first) to save
    documents into bytea and search that field via to_tsvector, to_tsquery ...
    It's should be very slow. You have to do repeated transformation.

    Pavel Stehule
    --

    www.publicstream.cz - vytvořili jsme produkt pro živé přenosy (online
    streaming) a videoarchív ve formátu FLASH.

    Bc. Radek Novotný
    jednatel Mediawork group s.r.o.

    tel.: +420 724 020 361
    email: radek.novotny@mediawork.cz
    http://www.mediawork.cz
  • Radek Novotný at Jul 30, 2009 at 1:09 pm
    Is there possible to create pg trigger that runs shell script?
    --

    www.publicstream.cz - vytvořili jsme produkt pro živé přenosy (online streaming)
    a videoarchív ve formátu FLASH.

    Bc. Radek Novotný
    jednatel Mediawork group s.r.o.

    tel.: +420 724 020 361
    email: radek.novotny@mediawork.cz
    http://www.mediawork.cz

    Dne St 29. července 2009 17:38:31 Pavel Stehule napsal(a):
    Hello

    Dne 29. červenec 2009 16:46 Radek Novotný <radek.novotny@mediawork.cz>

    napsal(a):
    Hello,
    is there in the roadmap of postgre integration of fulltext searching in
    documents saved in blobs (bytea)?
    What I know, no. PostgreSQL doesn't know about others binary formats,
    so it cannot do it.
    For example linux antiword can export fine text output that can be
    inserted into varchar field.
    I understand it well. Three years ago, we used antiword. We stored two
    values - bytea - original word doc, and text - antiword output. The
    overhead is minimal, and this solution worked very well.
    Would be very very nice (postgre users can be proud to be first) to save
    documents into bytea and search that field via to_tsvector, to_tsquery
    ...
    It's should be very slow. You have to do repeated transformation.

    Pavel Stehule
    --

    www.publicstream.cz - vytvořili jsme produkt pro živé přenosy (online
    streaming) a videoarchív ve formátu FLASH.

    Bc. Radek Novotný
    jednatel Mediawork group s.r.o.

    tel.: +420 724 020 361
    email: radek.novotny@mediawork.cz
    http://www.mediawork.cz
  • Sam Mason at Jul 30, 2009 at 1:18 pm

    On Thu, Jul 30, 2009 at 03:09:12PM +0200, Radek Novotnnn wrote:
    Is there possible to create pg trigger that runs shell script?
    Yes, pl/perl can do this.
  • Michael Glaesemann at Jul 30, 2009 at 1:22 pm

    On Jul 30, 2009, at 9:09 , Radek Novotný wrote:

    Is there possible to create pg trigger that runs shell script?

    [Please don't top post.]

    Yes. You can use an untrusted language such as pl/perlu to run system
    commands.

    Michael Glaesemann
    grzm seespotcode net
  • Tom Lane at Jul 30, 2009 at 2:08 pm

    Michael Glaesemann writes:
    On Jul 30, 2009, at 9:09 , Radek Novotný wrote:
    Is there possible to create pg trigger that runs shell script?
    Yes. You can use an untrusted language such as pl/perlu to run system
    commands.
    The fact that you can do it doesn't make it a good idea ...

    If you do this, you'll need to consider what happens if the database
    transaction rolls back after calling your trigger. The effects in the
    filesystem are still there, but the effects in the database aren't.

    regards, tom lane
  • Andreas Kretschmer at Jul 30, 2009 at 1:23 pm

    In response to Radek Novotný :
    Is there possible to create pg trigger that runs shell script?
    Sure, use an untrusted language for the trigger-function.


    Andreas
    --
    Andreas Kretschmer
    Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header)
    GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-general @
categoriespostgresql
postedJul 29, '09 at 2:48p
activeJul 30, '09 at 2:08p
posts8
users6
websitepostgresql.org
irc#postgresql

People

Translate

site design / logo © 2022 Grokbase