Grokbase Groups Kafka users May 2013

  • Scott Clasen at May 20, 2013 at 4:56 pm
    My guess: EBS is likely your bottleneck. Try running on instance-local
    disks, and compare your results. Is this 0.8? What replication factor are
    you using?

    On Mon, May 20, 2013 at 8:11 AM, Jason Weiss wrote:

    I'm trying to maximize my throughput and seem to have hit a ceiling.
    Everything described below is running in AWS.

    I have configured a Kafka cluster with 5 machines, M1.Large, with 600
    provisioned IOPS storage for each EC2 instance. I have a ZooKeeper server
    (we aren't in production yet, so I didn't take the time to set up a ZK
    cluster). Publishing to a single topic from 7 different clients, I seem to
    max out at around 20,000 eps with a fixed 2K message size. Each broker
    defines 10 file segments, with a 25000 message / 5 second flush
    configuration in server.properties. I have stuck with 8 threads. My
    producers (Java) are configured with batch.num.messages at 50, and
    queue.buffering.max.messages at 100.

    When I went from 4 servers in the cluster to 5 servers, I only saw an
    increase of about 500 events per second in throughput. In sharp contrast,
    when I run a complete environment on my MacBook Pro, tuned as described
    above but with a single ZK and a single Kafka broker, I am seeing 61,000
    events per second. I don't think I'm network constrained in the AWS
    environment (producer side) because when I add one more client, my MacBook
    Pro, I see a proportionate decrease in EC2 client throughput, and the net
    result is an identical 20,000 eps. Stated differently, my EC2 instances give
    up throughput when my local MacBook Pro joins the array of producers, so
    that the total throughput stays exactly the same.

    Does anyone have any additional suggestions on what else I could tune to
    try to hit our goal of 50,000 eps with a 5-machine cluster? Based on the
    whitepapers published, LinkedIn describes a peak of 170,000 events per
    second across their cluster. My 20,000 seems so far away from their
    production figures.

    What is the relationship, in terms of performance, between ZK and Kafka?
    Do I need a more performant ZK cluster, the same, or does it really
    not matter in terms of maximizing throughput?

    Thanks for any suggestions – I've been pulling knobs and turning levers on
    this for several days now.


    Jason
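
    For reference, here is a minimal, hypothetical sketch of an async producer
    wired up with the settings named above. The ZooKeeper address and topic are
    placeholder values, and the batching property names simply follow the post;
    exact names differ between 0.7.x and 0.8, so verify them against the version
    in use.

        import java.util.Properties;
        import kafka.javaapi.producer.Producer;
        import kafka.javaapi.producer.ProducerData;
        import kafka.producer.ProducerConfig;

        public class TunedProducer {
            public static void main(String[] args) {
                Properties props = new Properties();
                props.put("zk.connect", "zk-host:2181");          // placeholder ZK address
                props.put("serializer.class", "kafka.serializer.StringEncoder");
                props.put("producer.type", "async");              // batch sends in the background
                props.put("batch.num.messages", "50");            // batch size named in the post
                props.put("queue.buffering.max.messages", "100"); // buffer depth named in the post

                Producer<String, String> producer =
                    new Producer<String, String>(new ProducerConfig(props));
                producer.send(new ProducerData<String, String>("events", "2K payload goes here"));
                producer.close();
            }
        }

    Both batching values are on the small side, so raising them (and enabling
    compression, if acceptable) is a common first lever when chasing producer
    throughput.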

  • Jason Weiss at May 20, 2013 at 5:02 pm
    Hi Scott.

    I'm using Kafka 0.7.2. I am using the default replication factor, since I
    don't recall changing that configuration at all.

    I'm using provisioned IOPS, which was presented at the AWS event in NYC a
    few weeks ago as the "fastest storage option" for EC2. A
    number of partners presented success stories in terms of throughput with
    provisioned IOPS. I've tried to follow that model.


    Jason
  • Scott Clasen at May 20, 2013 at 5:17 pm
    Ahh, yeah, PIOPS is definitely faster than standard EBS, but still much
    slower than local disk.

    You could try benchmarking local disk to see what the instances you are
    using are capable of, then try tweaking IOPS etc. to see where you get.

    M1.Larges aren't super fast, so your MacBook beating them isn't surprising
    to me.
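
    To quantify that comparison, a rough sequential-write check along these
    lines (a sketch, not a real benchmark tool; the default path is an
    assumption) can be run once against a path on the ephemeral disk and once
    against the EBS mount:

        import java.io.FileOutputStream;
        import java.io.IOException;

        // Writes 1 GB in 1 MB chunks to the given path and reports MB/s.
        public class DiskWriteBench {
            public static void main(String[] args) throws IOException {
                String path = args.length > 0 ? args[0] : "/mnt/bench.dat";
                byte[] chunk = new byte[1024 * 1024];
                int chunks = 1024;
                long start = System.nanoTime();
                FileOutputStream out = new FileOutputStream(path);
                try {
                    for (int i = 0; i < chunks; i++) {
                        out.write(chunk);
                    }
                    out.getFD().sync(); // force data to disk so the timing isn't just page cache
                } finally {
                    out.close();
                }
                double seconds = (System.nanoTime() - start) / 1e9;
                System.out.printf("Wrote %d MB in %.1f s (%.1f MB/s)%n",
                        chunks, seconds, chunks / seconds);
            }
        }

    Kafka's log writes are sequential appends, so the gap between the two runs
    gives a sense of what moving off EBS might buy.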

  • Ken Krugler at May 20, 2013 at 5:44 pm
    Hi Jason,

    In my experience, directly hitting an ephemeral drive on m1.large is faster than using EBS.

    I've seen some articles where RAIDing multiple EBS volumes can exceed the performance of ephemeral drives, but with high variability.

    If you want to maximize performance, set up a (smaller) cluster of SSD-backed instances with 10Gb Ethernet in the same cluster placement group.

    E.g., test with three cr1.8xlarge instances.

    -- Ken

    --------------------------
    Ken Krugler
    +1 530-210-6378
    http://www.scaleunlimited.com
    custom big data solutions & training
    Hadoop, Cascading, Cassandra & Solr
  • Philip O'Toole at May 21, 2013 at 3:45 pm
    As a test, why not just use a disk with 4,000 provisioned IOPS and see if it improves?

    Also, you have not supplied any metrics regarding the VM's performance. Is the CPU busy? Is IO maxed out? Network? Disk? Use a tool like atop, and tell us what you find.

    Philip
  • Jason Weiss at May 21, 2013 at 3:52 pm
    Philip,

    Thanks for the response. I used top yesterday and determined that part of
    my problem was that the Kafka shell script is pre-configured to use only
    512M of RAM, and thus it wasn't using memory efficiently. Raising that has
    helped out tremendously. An echo at the start of the script warning that it
    was defaulting to such a low value probably would have saved me some time.
    In the same vein, I should have inspected the launch command more closely.

    The virtualization of AWS makes it difficult to truly know what your
    performance is, IMHO. There are lots of people arguing on the web about
    the value of bare metal versus virtualization. I am still baffled how
    companies like Urban Airship are purportedly seeing bursts of 750,000
    messages per second on a 3-machine cluster, but by playing with the knobs
    in a controlled manner, I'm starting to better understand the relationships
    and their effect on the overall system.

    Jason
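
    A note for anyone hitting the same wall: the 512M figure comes from the -Xmx
    default baked into the Kafka launch scripts, so it is worth confirming what
    heap the broker JVM actually received before load testing. A trivial sketch
    (run with the same JVM flags the broker gets) that prints the effective limit:

        // Prints the maximum heap the JVM will use, i.e. roughly the -Xmx value in MB.
        public class HeapCheck {
            public static void main(String[] args) {
                long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
                System.out.println("Max heap: " + maxMb + " MB");
            }
        }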

  • Philip O'Toole at May 21, 2013 at 3:58 pm
    Cool.

    By the way, I do mean you should use 'atop'. That was not a typo on my part.

    http://www.atoptool.nl/downloadatop.php

    apt-get install atop

    on Ubuntu systems.

    Philip
