Hi,

I am indexing a fairly large number of positions in ES (around 150M) in
monthly indexes (201312, 201311, etc.).

Each document has a timestamp and a location.

My queries look like this:
Give me all positions inside this bounding box, etc.

I have two kinds of indexes with exactly the same mapping except for the
location field.
Ex:
loc: {
   type: geo_point
}



loc: {
   tree: quadtree
   type: geo_shape
}


It seems to me that there is a big difference in query speed between the
two kinds of indexes.

The index with a location of type geo_shape is MUCH faster than the index
with geo_point.
With cold caches, the geo_point query runs for about 26 seconds, whereas
the geo_shape query runs for about 2 seconds.
The geo_point query also loads a huge amount of data into the field data
cache (8GB for just one month of data). With geo_shape, field data usage is
much lower.

The geo_shape mapping uses the default precision and the quadtree tree type.
Both queries have the same logic.

I would like to understand why geo_shape is so much faster than
geo_point.
Can someone shed some light on this?

Of course, the geo_shape index is about 30% bigger on disk.

Example query for index type geo_shape
{
   "query": {
     "bool": {
       "must": [
         {
           "range": {
             "ts": {
               "from": "2013-11-01",
               "to": "2013-12-30"
             }
           }
         },
         {
           "geo_shape": {
             "loc": {
               "shape": {
                 "type": "envelope",
                 "coordinates": [
                   [ 1.6754645,53.786 ],
                   [14.345234, 51.3453 ]
                 ]
               }
             }
           }
         }
       ]
     }
   },
   "aggregations": {
     "agg1": {
       "terms": {
         "field": "e_id"
       }
     }
   },
   "size": 0
}


Example query for index type geo_point
{
   "query": {
     "bool": {
       "must": [
         {
           "range": {
             "ts": {
               "from": "2013-11-01",
               "to": "2013-12-30"
             }
           }
         },
         {
           "geo_bounding_box" : {
               "loc" : {
                   "top_left" : {
                       "lat" : 40.73,
                       "lon" : -74.1
                   },
                   "bottom_right" : {
                       "lat" : 40.01,
                       "lon" : -71.12
                   }
               }
             }
         }
       ]
     }
   },
   "aggregations": {
     "agg1": {
       "terms": {
         "field": "e_id"
       }
     }
   },
   "size": 0
}

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/4e721191-8164-40cf-aa3f-d882dec10cad%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


  • Alexander Reelsen at Mar 31, 2014 at 7:09 am
    Hey,

    This is all about storing and computing. First, let's take a look at
    geo_point:

    * Index: Stored as two floats (lat/lon) in the index
    * Query: All geo points are loaded into memory (hence your large fielddata),
    and the calculations are then executed in memory

    Now the geo_shape:

    * Index: The shape is converted into terms and then stored in the index
    (thus your big index size)
    * Query: A full-text search is basically used to check if a shape is inside
    of another (do they include the same terms?)
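
To make the "shape is converted into terms" idea concrete, here is a minimal sketch of quadtree encoding for a single point. It is not the actual Lucene/Elasticsearch implementation: the function names `quadtree_terms` and `cell_matches` and the A/B/C/D quadrant letters are assumptions chosen for illustration. Each level splits the current cell into four quadrants and appends one letter, so a point ends up with one term per level.

```python
def quadtree_terms(lat, lon, levels):
    """Return one prefix term per quadtree level for the cell containing (lat, lon)."""
    min_lat, max_lat = -90.0, 90.0
    min_lon, max_lon = -180.0, 180.0
    # Quadrant letters (illustrative): A = north-west, B = north-east,
    # C = south-west, D = south-east.
    letters = {(True, False): "A", (True, True): "B",
               (False, False): "C", (False, True): "D"}
    path = ""
    terms = []
    for _ in range(levels):
        mid_lat = (min_lat + max_lat) / 2
        mid_lon = (min_lon + max_lon) / 2
        north = lat >= mid_lat
        east = lon >= mid_lon
        # Shrink the bounding cell to the quadrant that contains the point.
        if north:
            min_lat = mid_lat
        else:
            max_lat = mid_lat
        if east:
            min_lon = mid_lon
        else:
            max_lon = mid_lon
        path += letters[(north, east)]
        terms.append(path)
    return terms


def cell_matches(point_terms, query_cell):
    """A point lies inside a query cell if the cell's term is among the point's terms."""
    return query_cell in point_terms


terms = quadtree_terms(52.0, 5.0, 3)
print(terms)
print(cell_matches(terms, "BA"))
```

With terms like these in the inverted index, a bounding-box query can be answered by looking up the cells that cover the box, which is a plain term lookup rather than a per-document coordinate comparison.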


    Possible speed improvements:

    * geo_point: Use warmer APIs
    * geo_point: Maybe caching helps, your query location is always the same.
    * geo_point: Maybe the geohash_cell filter helps you in terms of speed
    (needs a special mapping)
    * geo_shape: Less precision, less index size, you can change that in the
    mapping
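
As a rough illustration of why lower precision shrinks a geo_shape index, the sketch below estimates how many quadtree levels are needed before a cell is no wider than a given precision at the equator. This is a back-of-the-envelope approximation, not the formula Elasticsearch uses internally; `levels_for_precision` and the circumference constant are assumptions made for this example. Fewer levels means fewer terms stored per point.

```python
import math

# Approximate equatorial circumference of the Earth in meters (assumed value).
EARTH_CIRCUMFERENCE_M = 40_075_017


def levels_for_precision(precision_m):
    """Smallest quadtree depth whose cell width at the equator is <= precision_m.

    Each level halves the cell width, so the width at depth n is
    EARTH_CIRCUMFERENCE_M / 2**n.
    """
    return max(1, math.ceil(math.log2(EARTH_CIRCUMFERENCE_M / precision_m)))


for precision in (50, 1000):
    print(precision, "m ->", levels_for_precision(precision), "levels")
```

Under these assumptions, relaxing the precision from 50 m to 1 km drops the depth from 20 to 16 levels, which is four fewer terms per indexed point.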

    At the end of the day you are facing a classic tradeoff here: are you
    willing to use more disk, or are you willing to compute more at query time?

    Hope it makes sense as a quick intro...


    --Alex



    On Wed, Mar 19, 2014 at 9:42 PM, Georgi Ivanov wrote:

    [original post quoted in full; trimmed]
  • Georgi Ivanov at Mar 31, 2014 at 7:21 am
    Thanks Alex,
    That makes perfect sense.
    For now I am sticking with the geo_shape type.
    Apart from the index size, everything is much smoother with it.

    I would recommend geo_shape to anyone who runs geo queries all the time
    (like me).

    George



Discussion Overview
group: elasticsearch
categories: elasticsearch
posted: Mar 19, '14 at 8:42p
active: Mar 31, '14 at 7:21a
posts: 3
users: 2
website: elasticsearch.org
irc: #elasticsearch
