FAQ

[PostgreSQL-Hackers] Why is there a tsquery data type?

Bruce Momjian
Aug 29, 2007 at 8:22 pm
Why does text search need a tsquery data type? I realize it needs
tsvector so it can create indexes and updated trigger columns, but it
seems tsquery could instead just be a simple text string.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
reply

Search Discussions

14 responses

  • Tom Lane at Aug 29, 2007 at 10:36 pm

    Bruce Momjian writes:
    Why does text search need a tsquery data type? I realize it needs
    tsvector so it can create indexes and updated trigger columns, but it
    seems tsquery could instead just be a simple text string.
    By that logic, we don't need any data types other than text.

    regards, tom lane
  • Bruce Momjian at Aug 29, 2007 at 11:13 pm

    Tom Lane wrote:
    Bruce Momjian <bruce@momjian.us> writes:
    Why does text search need a tsquery data type? I realize it needs
    tsvector so it can create indexes and updated trigger columns, but it
    seems tsquery could instead just be a simple text string.
    By that logic, we don't need any data types other than text.
    What is tsquery giving us that text would not?

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://www.enterprisedb.com

    + If your life is a hard drive, Christ can be your backup. +
  • Tom Lane at Aug 29, 2007 at 11:26 pm

    Bruce Momjian writes:
    Tom Lane wrote:
    By that logic, we don't need any data types other than text.
    What is tsquery giving us that text would not?
    A preprocessed representation that can be compared to tsvector
    efficiently.

    Now, if you'd asked whether we need *both* tsvector and tsquery,
    that'd be a fair question. I'm not 100% clear on what the differences
    are, but they seem pretty subtle. Do you think that having only
    one datatype would be clearer?

    regards, tom lane
  • Bruce Momjian at Aug 30, 2007 at 1:03 am

    Tom Lane wrote:
    Bruce Momjian <bruce@momjian.us> writes:
    Tom Lane wrote:
    By that logic, we don't need any data types other than text.
    What is tsquery giving us that text would not?
    A preprocessed representation that can be compared to tsvector
    efficiently.

    Now, if you'd asked whether we need *both* tsvector and tsquery,
    that'd be a fair question. I'm not 100% clear on what the differences
    are, but they seem pretty subtle. Do you think that having only
    one datatype would be clearer?
    There is no question things would be clearer with only one text search
    data type. The only value I can see to having a tsquery data type is
    that you can store a tsquery value in a column, but why would that be
    much better than just storing it in a TEXT field?

    Internally I assume you would have to generate a tsquery structure from
    a TEXT string, so the idea of a query representation wouldn't go away;
    it would just be internal.

    The one thing we would lose is the ability to process the query string
    with a named configuration. If we always cast to TEXT, I assume we
    would always be using "default_text_search_config", and I am a little
    worried about queries in triggers that have to wire-down the
    configuration name. As I understand it the tsquery goes through the
    configuration just like the tsvector.

    Right now you can already do:

    'query' @@ 'boy girl'::tsvector

    and the system casts your text string to tsquery automatically. Perhaps
    we just need to minimize tsquery in the documentation and mention its
    special purpose.

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://www.enterprisedb.com

    + If your life is a hard drive, Christ can be your backup. +
  • Gregory Stark at Aug 30, 2007 at 6:33 am

    "Bruce Momjian" <bruce@momjian.us> writes:

    There is no question things would be clearer with only one text search
    data type. The only value I can see to having a tsquery data type is
    that you can store a tsquery value in a column, but why would that be
    much better than just storing it in a TEXT field?
    When you try storing a tsquery in a column does it alert you if you have an
    invalid syntax at that point? Storing it as text would mean not finding out
    until you try to use the query.

    Is converting a text query into the internal format faster or less memory
    intensive than converting text into the internal representation? When you run
    a query like "WHERE '...' @@ col" if there wasn't a tsquery data type then
    '...' would have to be parsed over and over again for each row.


    --
    Gregory Stark
    EnterpriseDB http://www.enterprisedb.com
  • Bruce Momjian at Aug 30, 2007 at 3:40 pm

    Gregory Stark wrote:
    "Bruce Momjian" <bruce@momjian.us> writes:
    There is no question things would be clearer with only one text search
    data type. The only value I can see to having a tsquery data type is
    that you can store a tsquery value in a column, but why would that be
    much better than just storing it in a TEXT field?
    When you try storing a tsquery in a column does it alert you if you have an
    invalid syntax at that point? Storing it as text would mean not finding out
    until you try to use the query.
    Yes it does check syntax:

    test=> select 'lkjadsf kjfdsa'::tsquery;
    ERROR: syntax error in tsearch query: "lkjadsf kjfdsa"

    A larger question is how many people store queries in the database to
    make it worth the complexity.
    Is converting a text query into the internal format faster or less memory
    intensive than converting text into the internal representation? When you run
    a query like "WHERE '...' @@ col" if there wasn't a tsquery data type then
    '...' would have to be parsed over and over again for each row.
    No, internally the TEXT string would be converted to something the
    system could deal with for that query, which is probably what 99% of all
    queries are going to do anyway by calling to_tsquery().

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://www.enterprisedb.com

    + If your life is a hard drive, Christ can be your backup. +
  • Gregory Stark at Aug 30, 2007 at 4:02 pm

    "Bruce Momjian" <bruce@momjian.us> writes:

    Gregory Stark wrote:
    "Bruce Momjian" <bruce@momjian.us> writes:
    There is no question things would be clearer with only one text search
    data type. The only value I can see to having a tsquery data type is
    that you can store a tsquery value in a column, but why would that be
    much better than just storing it in a TEXT field?
    When you try storing a tsquery in a column does it alert you if you have an
    invalid syntax at that point? Storing it as text would mean not finding out
    until you try to use the query.
    Yes it does check syntax:

    test=> select 'lkjadsf kjfdsa'::tsquery;
    ERROR: syntax error in tsearch query: "lkjadsf kjfdsa"

    A larger question is how many people store queries in the database to
    make it worth the complexity.
    So would this still happen if you didn't have a tsquery type? Or would it
    throw the error only when the actual matching operator executed and tried to
    parse the text?
    Is converting a text query into the internal format faster or less memory
    intensive than converting text into the internal representation? When you run
    a query like "WHERE '...' @@ col" if there wasn't a tsquery data type then
    '...' would have to be parsed over and over again for each row.
    No, internally the TEXT string would be converted to something the
    system could deal with for that query, which is probably what 99% of all
    queries are going to do anyway by calling to_tsquery().
    How would that happen if there wasn't a tsquery type?

    --
    Gregory Stark
    EnterpriseDB http://www.enterprisedb.com
  • Bruce Momjian at Aug 30, 2007 at 4:17 pm

    Gregory Stark wrote:
    "Bruce Momjian" <bruce@momjian.us> writes:
    Gregory Stark wrote:
    "Bruce Momjian" <bruce@momjian.us> writes:
    There is no question things would be clearer with only one text search
    data type. The only value I can see to having a tsquery data type is
    that you can store a tsquery value in a column, but why would that be
    much better than just storing it in a TEXT field?
    When you try storing a tsquery in a column does it alert you if you have an
    invalid syntax at that point? Storing it as text would mean not finding out
    until you try to use the query.
    Yes it does check syntax:

    test=> select 'lkjadsf kjfdsa'::tsquery;
    ERROR: syntax error in tsearch query: "lkjadsf kjfdsa"

    A larger question is how many people store queries in the database to
    make it worth the complexity.
    So would this still happen if you didn't have a tsquery type? Or would it
    throw the error only when the actual matching operator executed and tried to
    parse the text?
    Well, if you didn't have a tsquery data type then you would get the
    error when the TEXT was cast to tsquery for the search itself.
    Is converting a text query into the internal format faster or less memory
    intensive than converting text into the internal representation? When you run
    a query like "WHERE '...' @@ col" if there wasn't a tsquery data type then
    '...' would have to be parsed over and over again for each row.
    No, internally the TEXT string would be converted to something the
    system could deal with for that query, which is probably what 99% of all
    queries are going to do anyway by calling to_tsquery().
    How would that happen if there wasn't a tsquery type?
    There is an internal C structure which holds the tsquery information.
    My guess is that we would internally have something like tsquery but it
    wouldn't be user-visible perhaps. Right now I am a little confused
    about how to do this and keep the data-type-independent nature of the
    backend. You are right that we might have to end up re-parsing the TEXT
    field every time it hits the @@ operator, which is a pain.

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://www.enterprisedb.com

    + If your life is a hard drive, Christ can be your backup. +
  • Tom Lane at Aug 30, 2007 at 4:18 pm

    Gregory Stark writes:
    How would that happen if there wasn't a tsquery type?
    I don't think Bruce is suggesting that the search operand can really
    be plain text (or if he is, he's nuts). The question here is whether
    there's really a need for a distinction between tsvector and tsquery
    datatypes; could we have tsvector serve both purposes instead?

    I can see that there are differences: tsquery can tell the difference
    between "x AND y" and "x OR y", whereas tsvector just knows "x, y".
    A superset datatype that can do both is certainly possible, but whether
    it's practical, or would be easier to use than the current design,
    I dunno.

    Perhaps a suitable analogy is regexp pattern matching. Traditionally
    regexps are conceived of as strings, but if they'd originated in more
    strongly typed languages than they did, they'd certainly be thought
    of as a distinct data type. Had we implemented ~ as taking a right
    operand of type 'regexp', we could win on a number of levels: entry-time
    syntax checking for regexps, and a precompiled internal representation,
    for instance. For regexps it seems clear to me that the target text
    string and the pattern really are different datatypes, and fuzzing that
    distinction is not an improvement.

    regards, tom lane
  • Bruce Momjian at Aug 30, 2007 at 4:21 pm

    Tom Lane wrote:
    Gregory Stark <stark@enterprisedb.com> writes:
    How would that happen if there wasn't a tsquery type?
    I don't think Bruce is suggesting that the search operand can really
    be plain text (or if he is, he's nuts). The question here is whether
    Yes, that is what I was suggesting because as I mentioned TEXT already
    functions fine as tsquery.
    there's really a need for a distinction between tsvector and tsquery
    datatypes; could we have tsvector serve both purposes instead?

    I can see that there are differences: tsquery can tell the difference
    between "x AND y" and "x OR y", whereas tsvector just knows "x, y".
    A superset datatype that can do both is certainly possible, but whether
    it's practical, or would be easier to use than the current design,
    I dunno.
    Because of the special behavior of & and |, I assume tsquery and
    tsvector cannot be the same data type.
    Perhaps a suitable analogy is regexp pattern matching. Traditionally
    regexps are conceived of as strings, but if they'd originated in more
    strongly typed languages than they did, they'd certainly be thought
    of as a distinct data type. Had we implemented ~ as taking a right
    operand of type 'regexp', we could win on a number of levels: entry-time
    syntax checking for regexps, and a precompiled internal representation,
    for instance. For regexps it seems clear to me that the target text
    string and the pattern really are different datatypes, and fuzzing that
    distinction is not an improvement.
    Yes, this is a good analogy.

    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://www.enterprisedb.com

    + If your life is a hard drive, Christ can be your backup. +
  • Oleg Bartunov at Aug 30, 2007 at 5:03 pm

    On Thu, 30 Aug 2007, Bruce Momjian wrote:

    Gregory Stark wrote:
    "Bruce Momjian" <bruce@momjian.us> writes:
    There is no question things would be clearer with only one text search
    data type. The only value I can see to having a tsquery data type is
    that you can store a tsquery value in a column, but why would that be
    much better than just storing it in a TEXT field?
    When you try storing a tsquery in a column does it alert you if you have an
    invalid syntax at that point? Storing it as text would mean not finding out
    until you try to use the query.
    Yes it does check syntax:

    test=> select 'lkjadsf kjfdsa'::tsquery;
    ERROR: syntax error in tsearch query: "lkjadsf kjfdsa"

    A larger question is how many people store queries in the database to
    make it worth the complexity.
    you forget about very powerfull query rewriting, which is table driven
    Is converting a text query into the internal format faster or less memory
    intensive than converting text into the internal representation? When you run
    a query like "WHERE '...' @@ col" if there wasn't a tsquery data type then
    '...' would have to be parsed over and over again for each row.
    No, internally the TEXT string would be converted to something the
    system could deal with for that query, which is probably what 99% of all
    queries are going to do anyway by calling to_tsquery().
    Regards,
    Oleg
    _____________________________________________________________
    Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
    Sternberg Astronomical Institute, Moscow University, Russia
    Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
    phone: +007(495)939-16-83, +007(495)939-23-83
  • Ron Mayer at Aug 30, 2007 at 5:27 pm

    Tom Lane wrote:
    Bruce Momjian <bruce@momjian.us> writes:
    Why does text search need a tsquery data type? I realize it needs
    tsvector so it can create indexes and updated trigger columns, but it
    seems tsquery could instead just be a simple text string.
    By that logic, we don't need any data types other than text.
    Could similar logic argue that we'd want special types for regular
    expressions too? That seems quite parallel to the tsquery type to me.
  • Tom Lane at Aug 30, 2007 at 6:10 pm

    Ron Mayer writes:
    Could similar logic argue that we'd want special types for regular
    expressions too? That seems quite parallel to the tsquery type to me.
    Yeah, it certainly seems like something we might want to consider in
    future --- we could get rid of that klugy cache for compiled regexps,
    for one thing. An implicit cast from text to the regexp type would
    cover backwards compatibility issues, I think.

    regards, tom lane
  • Bruce Momjian at Mar 12, 2008 at 7:41 pm
    Added to TODO:

    * Consider a special data type for regular expressions

    http://archives.postgresql.org/pgsql-hackers/2007-08/msg01067.php


    ---------------------------------------------------------------------------

    Tom Lane wrote:
    Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
    Could similar logic argue that we'd want special types for regular
    expressions too? That seems quite parallel to the tsquery type to me.
    Yeah, it certainly seems like something we might want to consider in
    future --- we could get rid of that klugy cache for compiled regexps,
    for one thing. An implicit cast from text to the regexp type would
    cover backwards compatibility issues, I think.

    regards, tom lane

    ---------------------------(end of broadcast)---------------------------
    TIP 4: Have you searched our list archives?

    http://archives.postgresql.org
    --
    Bruce Momjian <bruce@momjian.us> http://momjian.us
    EnterpriseDB http://postgres.enterprisedb.com

    + If your life is a hard drive, Christ can be your backup. +

Related Discussions