FAQ
Do you think it would be right to use a separate generator (noeqd) for each
table that needs a GUID, or is that overkill?

I ask because each generator would produce sequential numbers, which
should allow faster lookups in a table.

Thanks in advance!

--


  • Kyle Lemons at Dec 2, 2012 at 9:38 pm
    You haven't provided us with nearly enough information to answer your
    question. What kind of table? Is your system distributed? What kind of
    ordering requirements do you (not) need? Why do you need GUIDs (i.e. can
    you use your database's automatic row numbering mechanism)? I'm not sure I
    agree with your premise that sequential IDs will result in faster lookups
    in a table.

    --
  • Archos at Dec 2, 2012 at 9:40 pm
    It is a relational database for a non-distributed system (although I'll use
    replication), handling user data (users, emails, profiles).

    I don't want IDs generated by the RDBMS because that would make it hard to
    distribute the system later, if that became necessary. Instead, I use GUIDs
    because I want a global identifier, so that the users tables from different
    services can be merged when necessary.

    --
  • André Moraes at Dec 3, 2012 at 5:20 pm

    Look for k-sorted sequence generators.

    The idea is that you have unique IDs with some degree of sorting (the k).

    That way, if you have 64-bit keys and use 32 bits for the current time
    and the other 32 bits from some other source (pid, machine ID, internal
    process counter, ...), you can sort the keys by the time part.

    I saw something similar on the golang-nuts mailing list, but I can't find
    the link now. You can take some inspiration from Twitter's Snowflake, though.

    https://github.com/twitter/snowflake

    They do exactly that.
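
A minimal sketch of the 32/32 split described above, assuming the time part is Unix seconds and the other half is some per-process discriminator. The names `makeKey` and `timePart` are made up for illustration and do not come from Snowflake or noeqd.

```go
package main

import (
	"fmt"
	"sort"
)

// makeKey packs a 32-bit timestamp (seconds) into the high half of a
// 64-bit key and a 32-bit discriminator (hypothetical: machine id, pid,
// or an internal counter) into the low half.
func makeKey(seconds, discriminator uint32) uint64 {
	return uint64(seconds)<<32 | uint64(discriminator)
}

// timePart recovers the timestamp from a key.
func timePart(key uint64) uint32 {
	return uint32(key >> 32)
}

func main() {
	keys := []uint64{
		makeKey(1354550400, 7),
		makeKey(1354550398, 42),
		makeKey(1354550399, 1),
	}
	// Plain integer order is also time order, because the time sits in
	// the most significant bits.
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })
	for _, k := range keys {
		fmt.Println(timePart(k), uint32(k))
	}
}
```

Keys generated within the same second are ordered only by the discriminator, which is where the "k" in k-sorted comes from.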


    --
    André Moraes
    http://amoraes.info

    --
  • Stephen Day at Dec 3, 2012 at 5:14 pm
    Also, there is the issue that noeqd's generation scheme is designed to create temporally packed ids, at a high, constant rate. If the identifiers in an application aren't being allocated constantly, you are wasting a large amount of the integer space, due to the number of bits used to identify the time slice. I wouldn't recommend using this scheme (snowflake, noeqd, etc.), unless you are generating 10's of ids per quantum, at least. I don't know if there's any formal analysis showing this, but the above seems intuitive.

    Just let the RDBMS do its job. Great hardware is cheap these days, so vertical scaling is quite realistic for most applications.

    --
  • Archos at Dec 3, 2012 at 6:31 pm
    This is something to keep seriously in mind, especially if it is going to
    be used by services that are not widely known or used.

    When you say "10's of ids per quantum", I suppose you mean generating
    IDs per "quarter of an hour", right?

    --
  • Bryanturley at Dec 3, 2012 at 6:48 pm

    Quantum and quantity are practically the same word; neither means a
    quarter of an hour. Though a quarter of an hour is a quantity...
    It might be better described as a count or a specific amount of something.

    And tens of IDs per 15 minutes is probably a bad goal for something
    database related ;)


    --
  • Archos at Dec 3, 2012 at 6:50 pm
    Then the solution would be to generate at least 10 IDs at a time and store
    them in an array. When you need an ID, you take it from there, and when
    the array is empty, you generate another 10 IDs.

    --
  • André Moraes at Dec 3, 2012 at 9:35 pm

    Even in this case, you will have lots of unused IDs (if you use time
    as I described).

    For example:

    1- at 0800 AM you generate 10 IDs (1 second apart)
    2- you then use all 10 of those IDs over the next 10 minutes
    3- at 0810 AM you generate 10 more IDs (1 second apart)

    The last ID from (1) will be 080010XXXXX and the first ID from (3)
    will be 081000. As you can see, you wasted 10 minutes of valid IDs.

    Also, storing time as a 32-bit signed value gives you a short period
    of time (http://en.wikipedia.org/wiki/Year_2038_problem). That is, of
    course, if the world doesn't end this year. :)


    --
    André Moraes
    http://amoraes.info

    --
  • Archos at Dec 11, 2012 at 9:52 pm
    By the way, neither snowflake nor noeq uses 32 bits for the time. Both
    use 41 bits.

    --
  • Stephen Day at Dec 4, 2012 at 4:38 pm
    I'm using the term "quantum" to refer to the smallest representable unit of time. In the case of noeq/snowflake, this is one millisecond. There is a per-quantum counter with 4096 values. If no values are allocated in a given quantum, those values cannot be used and are thus "wasted". Conversely, if more than 4096 values are required within a quantum, you'll need to wait for the next quantum to allocate more. Of course, this is all per machine, so with up to 1024 machines, you can generate billions per second (theoretically, of course: 1024 machines * 4096 sequence values * 1000 ms/s = 4,194,304,000).

    This is great if you require IDs that need to be sorted by time and are generating thousands per second. It's not so great if you're allocating 4 billion IDs today and zero tomorrow.
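
A rough sketch of the scheme described above, assuming the bit widths mentioned in this thread (41-bit millisecond timestamp, 10-bit machine ID, 12-bit sequence). `Generator`, `NewGenerator`, and `pack` are hypothetical names, not the noeqd or Snowflake API, and for brevity the sketch uses the Unix epoch directly, whereas a real implementation would typically subtract its own epoch to make full use of the 41 bits.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const (
	machineBits  = 10 // up to 1024 machines
	sequenceBits = 12 // 4096 IDs per millisecond per machine
	maxSequence  = 1<<sequenceBits - 1
)

// pack lays out an ID as: timestamp (ms) | machine | sequence.
func pack(ms, machine, seq uint64) uint64 {
	return ms<<(machineBits+sequenceBits) | machine<<sequenceBits | seq
}

// Generator hands out IDs for a single machine.
type Generator struct {
	mu      sync.Mutex
	machine uint64
	lastMs  uint64
	seq     uint64
	now     func() uint64 // current time in ms; replaceable in tests
}

func NewGenerator(machine uint64) *Generator {
	return &Generator{
		machine: machine & (1<<machineBits - 1),
		now:     func() uint64 { return uint64(time.Now().UnixNano() / 1e6) },
	}
}

// Next returns the next ID. Once the 4096 sequence values of the current
// millisecond are used up, it spins until the clock reaches the next one.
func (g *Generator) Next() uint64 {
	g.mu.Lock()
	defer g.mu.Unlock()
	ms := g.now()
	if ms < g.lastMs { // clock stepped backwards: stay on the last quantum
		ms = g.lastMs
	}
	if ms == g.lastMs {
		g.seq++
		if g.seq > maxSequence {
			for ms <= g.lastMs {
				ms = g.now()
			}
			g.seq = 0
		}
	} else {
		g.seq = 0 // new quantum: unused values of the old one are lost
	}
	g.lastMs = ms
	return pack(ms, g.machine, g.seq)
}

func main() {
	g := NewGenerator(1)
	fmt.Println(g.Next(), g.Next())
	// Theoretical ceiling across all machines, as computed in the post.
	const perSecond = (1 << machineBits) * (1 << sequenceBits) * 1000
	fmt.Println(uint64(perSecond))
}
```

The `else` branch is where the "wasted" values come from: every millisecond in which fewer than 4096 IDs are requested leaves the rest of its sequence range unused.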

    --
  • Archos at Dec 11, 2012 at 9:53 am
    You are right, although you could change the number of bits assigned to
    each component, which gives a different number of IDs per day.

    Anyway, 2**64 is an astronomical number of IDs. With the default settings
    you can generate about 4M IDs a second per machine for something like 69
    years.
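
The figures above can be checked with a little arithmetic. This sketch assumes the default widths mentioned in the thread (41 timestamp bits at millisecond resolution, 12 sequence bits). The function names are made up for illustration.

```go
package main

import "fmt"

const (
	timestampBits = 41
	sequenceBits  = 12
	msPerYear     = 365.25 * 24 * 60 * 60 * 1000
)

// yearsOfRange returns how many years 2^bits milliseconds cover.
func yearsOfRange(bits uint) float64 {
	return float64(uint64(1)<<bits) / msPerYear
}

// idsPerSecond returns the per-machine ceiling for a given sequence width,
// with one sequence range available per millisecond.
func idsPerSecond(bits uint) uint64 {
	return (uint64(1) << bits) * 1000
}

func main() {
	fmt.Printf("%.1f years\n", yearsOfRange(timestampBits))
	fmt.Println(idsPerSecond(sequenceBits), "IDs per second per machine")
}
```

Moving one bit from the sequence to the timestamp would double the lifetime and halve the per-millisecond capacity, which is the trade-off behind changing the parameters.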

    --
  • Archos at Dec 3, 2012 at 6:25 pm
    noeqd is based on Snowflake but written in Go:

    https://github.com/bmizerany/noeqd
    https://github.com/noeq

    --
  • Patrick Mylund Nielsen at Dec 2, 2012 at 9:48 pm
    Sequential allows you to do binary search, and at least gives databases a
    way to make guesses about where to seek. It also makes insertion faster
    when the number is unique/the primary key.

    I guess the easy answer is if you need to coordinate the generation, but
    don't want to, use UUIDv4s. When storing a large number of rows on
    different systems, I would only do that if the occasional collision is no
    big deal.
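
For reference, a version 4 UUID needs no coordination at all because it is just 122 random bits plus fixed version and variant bits. A minimal sketch using only the standard library follows. The function name `newUUIDv4` is made up for illustration.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newUUIDv4 returns a random (version 4) UUID in canonical text form.
func newUUIDv4() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = b[6]&0x0f | 0x40 // set the version nibble to 4
	b[8] = b[8]&0x3f | 0x80 // set the RFC 4122 variant bits
	return fmt.Sprintf("%x-%x-%x-%x-%x",
		b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

func main() {
	id, err := newUUIDv4()
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```

Unlike the time-based schemes discussed elsewhere in this thread, these values carry no ordering, so they give up the sequential-insert behaviour in exchange for independence between generators.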

    --
  • Kyle Lemons at Dec 3, 2012 at 3:58 am
    *I want not IDs generated from the RDBMS because it does a hard a system
    distributed, if were necessary. Instead, I use GUIDs because I want a
    global identifier for that can be merged the users' table from different
    services, when it is necessary.*

    That's not, strictly, a reason to avoid the RDBMS' auto increment values.
    Depending on which db you're using, you can create an index that's (shard,
    id) where the ID is automatically incremented and is unique to the shard.
    You still have an index that's unique, but you don't have to agree on what
    ID is inserted next between the different shards. In fact, that's largely
    what noeqd is doing for you.
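
A sketch of the (shard, id) idea, assuming an in-memory counter stands in for each database's own auto-increment column. `Key` and `Allocator` are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// Key is unique across shards even though ID is only unique within one.
type Key struct {
	Shard uint16
	ID    uint64
}

// Allocator keeps one independent auto-increment counter per shard, so
// shards never have to agree on which ID comes next.
type Allocator struct {
	mu   sync.Mutex
	next map[uint16]uint64
}

func NewAllocator() *Allocator {
	return &Allocator{next: make(map[uint16]uint64)}
}

// Next returns the next key for the given shard.
func (a *Allocator) Next(shard uint16) Key {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.next[shard]++
	return Key{Shard: shard, ID: a.next[shard]}
}

func main() {
	a := NewAllocator()
	seen := map[Key]bool{}
	for _, s := range []uint16{1, 2, 1, 2} {
		k := a.Next(s)
		fmt.Println(k.Shard, k.ID, seen[k]) // the last column is always false
		seen[k] = true
	}
}
```

Both shards hand out ID 1 and ID 2, yet no composite key repeats, which is what makes merging tables from different services safe.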
    On Sun, Dec 2, 2012 at 4:48 PM, Patrick Mylund Nielsen wrote:

    Sequential allows you to do binary search, and at least gives databases a
    way to make guesses about where to seek. It also makes insertion faster
    when the number is unique/the primary key.

    Oh, I have no argument with the idea that an *indexed* column is a good
    thing when you will be using it as a lookup key. Having the items that go
    into that column be sequential by insertion order is orthogonal to that.
    Databases often use b-trees for indexes (though you can often customize
    that if you know something interesting about the indexed data beforehand),
    so it's often somewhat better than binary search. When you have indices,
    insertion will take longer regardless of what's being inserted in the row,
    and the uniqueness checks are subject to much the same performance dynamic
    as lookup in terms of indexing.

    --
  • Patrick Mylund Nielsen at Dec 3, 2012 at 9:18 am
    I am familiar with B-trees. My point is that a sequential PK isn't
    necessarily the same thing as an (indexed) string PK. In the former, the
    rows are almost guaranteed to be ordered on disk, in the latter only the
    index itself is possibly ordered. Not a huge difference in most cases, but
    it's significant if you want to e.g. select the first 1,000 rows in a
    table. You also don't need to check if the PK is a member of the set during
    insertion unless the database allows you to manually override the
    incremented value.


Discussion Overview
group: golang-nuts
categories: go
posted: Dec 2, '12 at 1:03p
active: Dec 11, '12 at 9:52p
posts: 16
users: 6
website: golang.org
