FAQ
Hello all, my first post to the pgsql mailing list!

There may be a completely better way to do this; if so please help point
me in the right direction!

What I'm trying to do is run a query to partially match 300+ text fields
to a PG table of about 200,000 rows. The idea is to pull out a portion
of text from the original text fields and do a partial text match on
anything in my PG table containing that substring.

I liked the idea of using a where IN(group) to do my comparisons, as in

select col1 from table1 where col1 in ('text1','text2')

however, this requires an exact string match. Is there any way to do a
substring match inside of my IN group? Or can anyone think of a better
way to do something like this?

Heres an example of something of how I'd like this to work:

Portion of 300 Original Text fields:
"brown cat"
"green dog"

2 rows of 200k+ Database table:
"brown kitty"
"green doggy"

We can assume a certain portion of the text is included in the DB table,
so I want to be able to do a substring match on "brown" and "green" and
in this case return both "brown kitty", and "green doggy". However the
problem is, if I run the query on each of my 300 rows to scan 200,000
rows in my DB is entirely too slow. So I was hoping to use the IN clause
to create an IN group of about 300 items to scan the DB once.

I hope this makes sense, but if anything sounds confusing please let me
know, and I will be sure to clarify! Thanks for any help or direction
anyone can provide!!

- Jason Farmer

Search Discussions

  • Richard Broersma Jr at Jun 29, 2006 at 8:52 pm

    We can assume a certain portion of the text is included in the DB table,
    so I want to be able to do a substring match on "brown" and "green" and
    in this case return both "brown kitty", and "green doggy". However the
    problem is, if I run the query on each of my 300 rows to scan 200,000
    rows in my DB is entirely too slow. So I was hoping to use the IN clause
    to create an IN group of about 300 items to scan the DB once.
    You can probably do it. However, you will have to pick a substring from your text field to
    compare against. In this case you seem to be choosing the first word, i.e. "brown" and "green".

    so maybe:

    select t1.col1
    from
    table1 as t1,
    (
    select distinct split_part( tblFileDump.Col1, ' ', 1 ) as samplecol
    from tblFileDump
    ) as fd1

    where t1.col1 like '%' || fd1.samplecol || '%'
    ;

    This is just an idea. I've never used split_part or developed a sudo join this way. But it may
    work provided you and jump your text files into a temp table.

    Notice:
    http://www.postgresql.org/docs/8.1/interactive/functions-string.html
    for the syntax for split_part().

    Regards,

    Richard Broersma Jr.
  • Richard Broersma Jr at Jun 30, 2006 at 12:17 am

    Well, there is also: <a
    href="http://www.postgresql.org/docs/8.1/interactive/functions-comparisons.html#AEN13377">

    /expression/ /operator/ ANY (/array expression/)</a>. So, if you have a way to preprocess you
    input text fields that you want matched
    you could build a regex for each and feed them in an array to an '~ ANY' expression like so (or,
    use ~* for case
    insensitive matching):

    SELECT col1
    FROM table
    WHERE col1 ~ ANY (ARRAY['regex1', 'regex2', ...]);
    Good point, But don't forget to include the list in your response. :-)

    Regards,

    Richard Broersma Jr.
  • Jason Farmer at Jun 30, 2006 at 12:31 pm
    Ah, I do think that sounds about like what I want! Let me play with this
    one some, thanks so much!!

    Richard Broersma Jr wrote:
    Well, there is also: <a
    href="http://www.postgresql.org/docs/8.1/interactive/functions-comparisons.html#AEN13377">

    /expression/ /operator/ ANY (/array expression/)</a>. So, if you have a way to preprocess you
    input text fields that you want matched
    you could build a regex for each and feed them in an array to an '~ ANY' expression like so (or,
    use ~* for case
    insensitive matching):

    SELECT col1
    FROM table
    WHERE col1 ~ ANY (ARRAY['regex1', 'regex2', ...]);
    Good point, But don't forget to include the list in your response. :-)

    Regards,

    Richard Broersma Jr.

    ---------------------------(end of broadcast)---------------------------
    TIP 3: Have you checked our extensive FAQ?

    http://www.postgresql.org/docs/faq

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
grouppgsql-sql @
categoriespostgresql
postedJun 29, '06 at 8:22p
activeJun 30, '06 at 12:31p
posts4
users2
websitepostgresql.org
irc#postgresql

People

Translate

site design / logo © 2021 Grokbase