FAQ
I have been testing this setup with one of our clusters and I have found an
issue where failover breaks one or two of our queues.

I have a setup where I'm running latest snapshot with Zookeeper and setup
for replicated leveldb. When I run a activemq stop (using wrapper) my active
master shutdown fine and my slave recognizes this and starts process of
going active. When I can get back into my web console I see all queues like
expected BUT there are usually one queue with 50 messages that are stuck, it
has consumers but they are not consumed. When trying to view contents of
queue with jmx or web console it just hangs and times-out. If I delete this
queue its recreated and everything starts working again.

Only log info I see are these:

2013-05-26 07:09:38,850 | WARN | Async error occurred:
java.lang.IllegalArgumentException: The subscription does not exist:
ID:tn11-27191-1368304518500-1:6:5:1 |
org.apache.activemq.broker.TransportConnection.Service | ActiveMQ NIO Worker
38
java.lang.IllegalArgumentException: The subscription does not exist: ID


Here is my env:

INFO | jvm 1 | 2013/05/26 07:09:34 | INFO | Client
environment:zookeeper.version=3.4.3-1240972, built on 02/06/2012 10:48 GMT
INFO | jvm 1 | 2013/05/26 07:09:34 | INFO | Client
environment:java.version=1.7.0_21
INFO | jvm 1 | 2013/05/26 07:09:34 | INFO | Client
environment:java.vendor=Oracle Corporation
INFO | jvm 1 | 2013/05/26 07:09:34 | INFO | Client
environment:java.home=/opt/jdk/jre
INFO | jvm 1 | 2013/05/26 07:09:34 | INFO | Client
environment:java.class.path=/opt/activemq/bin/wrapper.jar:/opt/activemq/bin/activemq.jar
INFO | jvm 1 | 2013/05/26 07:09:34 | INFO | Client
environment:java.library.path=/opt/activemq/bin/linux-x86-64/
INFO | jvm 1 | 2013/05/26 07:09:34 | INFO | Client
environment:java.io.tmpdir=/tmp
INFO | jvm 1 | 2013/05/26 07:09:34 | INFO | Client
environment:java.compiler=<NA>
INFO | jvm 1 | 2013/05/26 07:09:34 | INFO | Client
environment:os.name=Linux
INFO | jvm 1 | 2013/05/26 07:09:34 | INFO | Client
environment:os.arch=amd64
INFO | jvm 1 | 2013/05/26 07:09:34 | INFO | Client
environment:os.version=3.2.0-37-generic
INFO | jvm 1 | 2013/05/26 07:09:34 | INFO | Client
environment:user.name=activemq
INFO | jvm 1 | 2013/05/26 07:09:34 | INFO | Client
environment:user.home=/home/activemq
INFO | jvm 1 | 2013/05/26 07:09:34 | INFO | Client
environment:user.dir=/opt/activemq/bin/linux-x86-64

I tried this we the three latest Snapshots they all do the same.

(And in an unrelated error When starting latest 5.9-snapshot I get a message
that amqp is not recognized.)




--
View this message in context: http://activemq.2283324.n4.nabble.com/Activemq-5-9-leveldb-replication-issue-tp4667495.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Search Discussions

  • Christian Posta at May 27, 2013 at 4:27 am
    Give me the exact steps to reproduce... How many queues, how many messages
    sent to each before failover, etc etc... Exact steps.
    On Sunday, May 26, 2013, heimdull wrote:

    I have been testing this setup with one of our clusters and I have found an
    issue where failover breaks one or two of our queues.

    I have a setup where I'm running latest snapshot with Zookeeper and setup
    for replicated leveldb. When I run a activemq stop (using wrapper) my
    active
    master shutdown fine and my slave recognizes this and starts process of
    going active. When I can get back into my web console I see all queues like
    expected BUT there are usually one queue with 50 messages that are stuck,
    it
    has consumers but they are not consumed. When trying to view contents of
    queue with jmx or web console it just hangs and times-out. If I delete this
    queue its recreated and everything starts working again.

    Only log info I see are these:

    2013-05-26 07:09:38,850 | WARN | Async error occurred:
    java.lang.IllegalArgumentException: The subscription does not exist:
    ID:tn11-27191-1368304518500-1:6:5:1 |
    org.apache.activemq.broker.TransportConnection.Service | ActiveMQ NIO
    Worker
    38
    java.lang.IllegalArgumentException: The subscription does not exist: ID


    Here is my env:

    INFO | jvm 1 | 2013/05/26 07:09:34 | INFO | Client
    environment:zookeeper.version=3.4.3-1240972, built on 02/06/2012 10:48 GMT
    INFO | jvm 1 | 2013/05/26 07:09:34 | INFO | Client
    environment:java.version=1.7.0_21
    INFO | jvm 1 | 2013/05/26 07:09:34 | INFO | Client
    environment:java.vendor=Oracle Corporation
    INFO | jvm 1 | 2013/05/26 07:09:34 | INFO | Client
    environment:java.home=/opt/jdk/jre
    INFO | jvm 1 | 2013/05/26 07:09:34 | INFO | Client

    environment:java.class.path=/opt/activemq/bin/wrapper.jar:/opt/activemq/bin/activemq.jar
    INFO | jvm 1 | 2013/05/26 07:09:34 | INFO | Client
    environment:java.library.path=/opt/activemq/bin/linux-x86-64/
    INFO | jvm 1 | 2013/05/26 07:09:34 | INFO | Client
    environment:java.io.tmpdir=/tmp
    INFO | jvm 1 | 2013/05/26 07:09:34 | INFO | Client
    environment:java.compiler=<NA>
    INFO | jvm 1 | 2013/05/26 07:09:34 | INFO | Client
    environment:os.name=Linux
    INFO | jvm 1 | 2013/05/26 07:09:34 | INFO | Client
    environment:os.arch=amd64
    INFO | jvm 1 | 2013/05/26 07:09:34 | INFO | Client
    environment:os.version=3.2.0-37-generic
    INFO | jvm 1 | 2013/05/26 07:09:34 | INFO | Client
    environment:user.name=activemq
    INFO | jvm 1 | 2013/05/26 07:09:34 | INFO | Client
    environment:user.home=/home/activemq
    INFO | jvm 1 | 2013/05/26 07:09:34 | INFO | Client
    environment:user.dir=/opt/activemq/bin/linux-x86-64

    I tried this we the three latest Snapshots they all do the same.

    (And in an unrelated error When starting latest 5.9-snapshot I get a
    message
    that amqp is not recognized.)




    --
    View this message in context:
    http://activemq.2283324.n4.nabble.com/Activemq-5-9-leveldb-replication-issue-tp4667495.html
    Sent from the ActiveMQ - User mailing list archive at Nabble.com.

    --
    *Christian Posta*
    http://www.christianposta.com/blog
    twitter: @christianposta
  • Heimdull at May 28, 2013 at 3:48 am
    This server has two queues and there was a few thousand messages sent when
    this happened. I can reproduce this everytime when doing a failover between
    these two servers. One thing I noticed today was when doing a failover
    (service activemq stop on master) was that on queue was stuck where
    "Messages Dequeued" is 0 but other queue is humming along so my fix is to
    delete that queue and when I do that I get a message that leveldb recovery
    is done in my logs and everything works fine again. (looks and sounds like
    there is a crash in leveldb with corruption)

    I will see tomorrow if I can replicate this in a smaller scaled server so
    that I can pull better data.



    --
    View this message in context: http://activemq.2283324.n4.nabble.com/Activemq-5-9-leveldb-replication-issue-tp4667495p4667522.html
    Sent from the ActiveMQ - User mailing list archive at Nabble.com.
  • Heimdull at May 28, 2013 at 5:49 pm
    I started testing this on a setup where I manually generate messages and I'm
    not able replicate the issue. I think it only happens if there is moderate
    load when failing over.

    Here is the exact message that shows in the setup where I can replicate the
    issue:

    2013-05-28 10:42:56,977 | INFO | DB recovered from failure. |
    org.apache.activemq.leveldb.LevelDBClient | ActiveMQ NIO Worker 93

    This message shows up in my log after a failover when I delete the broken
    queue. After that message I have these:

    2013-05-28 10:42:56,990 | WARN | Async error occurred:
    java.lang.IllegalArgumentException: The subscription does not exist:
    ID:xx03-35870-1369757354540-1:3:5:1 |
    org.apache.activemq.broker.TransportConnection.Service | ActiveMQ NIO Worker
    29
    java.lang.IllegalArgumentException: The subscription does not exist:
    ID:xx03-35870-1369757354540-1:3:5:1
      at
    org.apache.activemq.broker.region.AbstractRegion.messagePull(AbstractRegion.java:432)
      at
    org.apache.activemq.broker.region.RegionBroker.messagePull(RegionBroker.java:468)
      at
    org.apache.activemq.broker.BrokerFilter.messagePull(BrokerFilter.java:87)



    --
    View this message in context: http://activemq.2283324.n4.nabble.com/Activemq-5-9-leveldb-replication-issue-tp4667495p4667560.html
    Sent from the ActiveMQ - User mailing list archive at Nabble.com.
  • Hiram Chirino at May 30, 2013 at 9:01 pm
    If you got a 'DB recovered from failure.' message, then it should have
    been preceded with an 'DB operation failed. (entering recovery mode)'
    message. Could you post that message?
    On Tue, May 28, 2013 at 1:47 PM, heimdull wrote:
    I started testing this on a setup where I manually generate messages and I'm
    not able replicate the issue. I think it only happens if there is moderate
    load when failing over.

    Here is the exact message that shows in the setup where I can replicate the
    issue:

    2013-05-28 10:42:56,977 | INFO | DB recovered from failure. |
    org.apache.activemq.leveldb.LevelDBClient | ActiveMQ NIO Worker 93

    This message shows up in my log after a failover when I delete the broken
    queue. After that message I have these:

    2013-05-28 10:42:56,990 | WARN | Async error occurred:
    java.lang.IllegalArgumentException: The subscription does not exist:
    ID:xx03-35870-1369757354540-1:3:5:1 |
    org.apache.activemq.broker.TransportConnection.Service | ActiveMQ NIO Worker
    29
    java.lang.IllegalArgumentException: The subscription does not exist:
    ID:xx03-35870-1369757354540-1:3:5:1
    at
    org.apache.activemq.broker.region.AbstractRegion.messagePull(AbstractRegion.java:432)
    at
    org.apache.activemq.broker.region.RegionBroker.messagePull(RegionBroker.java:468)
    at
    org.apache.activemq.broker.BrokerFilter.messagePull(BrokerFilter.java:87)



    --
    View this message in context: http://activemq.2283324.n4.nabble.com/Activemq-5-9-leveldb-replication-issue-tp4667495p4667560.html
    Sent from the ActiveMQ - User mailing list archive at Nabble.com.


    --
    Hiram Chirino

    Engineering | Red Hat, Inc.

    hchirino@redhat.com | fusesource.com | redhat.com

    skype: hiramchirino | twitter: @hiramchirino

    blog: Hiram Chirino's Bit Mojo
  • Hiram Chirino at Jun 3, 2013 at 3:54 pm
    Hi Freddy,

    That 'short record at position' is odd. Basically it means it tired
    to read a message from the log file and the read call read fewer bytes
    than what we requested. I guess that's valid for an OS to do and we
    just need to issue an additional read to load the remaining bytes.
    I've just committed a change that should help with that:
    http://is.gd/0nMmLH

    So try out a new snapshot and keep an eye for that exception and let
    me know you see it happen on your system again.

    Thanks!

    On Fri, May 31, 2013 at 1:28 PM, heimdull wrote:
    I tried this morning to replicate the issue on these two servers and was not able to do so. The only issue that I got was a few messages that got stuck in the queue with a consumer holding them hostage. either restarting that consumer or moving them to a different queue and back resolved that.

    I did look at older errors that I had and found there recovery mode errors:

    INFO | jvm 1 | 2013/05/19 12:15:22 | INFO | DB recovered from failure.
    INFO | jvm 1 | 2013/05/19 12:15:22 | WARN | DB operation failed. (entering recovery mode)
    INFO | jvm 1 | 2013/05/19 12:15:22 | java.io.IOException: short record at position: 2467052 in file: /opt/activemq/data/0000000000000000.log, offset: 2467052
    INFO | jvm 1 | 2013/05/19 12:15:22 | at org.apache.activemq.leveldb.RecordLog$LogReader.read(RecordLog.scala:287)

    --
    Hiram Chirino

    Engineering | Red Hat, Inc.

    hchirino@redhat.com | fusesource.com | redhat.com

    skype: hiramchirino | twitter: @hiramchirino

    blog: Hiram Chirino's Bit Mojo
  • Heimdull at Jun 4, 2013 at 7:02 am
    great! I have tested latest snapshot and have not seen this error when I
    failover. I do however see that I get a handful of messages stuck in a queue
    when I do and I need to move them out and back in to the queue before they
    get consumed.

    Could this have something todo with that:

    2013-06-03 23:50:53,523 | WARN | Async error occurred:
    java.lang.IllegalArgumentException: The subscription does not exist:
    ID:tn11-6552-1369955706360-1:2:1:1 |
    org.apache.activemq.broker.TransportConnection.Service | ActiveMQ NIO Worker
    14
    java.lang.IllegalArgumentException: The subscription does not exist:
    ID:tn11-6552-1369955706360-1:2:1:1
      at
    org.apache.activemq.broker.region.AbstractRegion.messagePull(AbstractRegion.java:432)
      at
    org.apache.activemq.broker.region.RegionBroker.messagePull(RegionBroker.java:468)
      at
    org.apache.activemq.broker.BrokerFilter.messagePull(BrokerFilter.java:87)
      at
    org.apache.activemq.broker.BrokerFilter.messagePull(BrokerFilter.java:87)
      at
    org.apache.activemq.broker.BrokerFilter.messagePull(BrokerFilter.java:87)
      at
    org.apache.activemq.broker.MutableBrokerFilter.messagePull(MutableBrokerFilter.java:289)
      at
    org.apache.activemq.broker.TransportConnection.processMessagePull(TransportConnection.java:515)
      at org.apache.activemq.command.MessagePull.visit(MessagePull.java:43)
      at
    org.apache.activemq.broker.TransportConnection.service(TransportConnection.java:329)
      at
    org.apache.activemq.broker.TransportConnection$1.onCommand(TransportConnection.java:184)
      at
    org.apache.activemq.transport.MutexTransport.onCommand(MutexTransport.java:50)
      at
    org.apache.activemq.transport.WireFormatNegotiator.onCommand(WireFormatNegotiator.java:113)
      at
    org.apache.activemq.transport.AbstractInactivityMonitor.onCommand(AbstractInactivityMonitor.java:288)
      at
    org.apache.activemq.transport.TransportSupport.doConsume(TransportSupport.java:83)
      at
    org.apache.activemq.transport.nio.NIOTransport.serviceRead(NIOTransport.java:138)
      at
    org.apache.activemq.transport.nio.NIOTransport$1.onSelect(NIOTransport.java:69)
      at
    org.apache.activemq.transport.nio.SelectorSelection.onSelect(SelectorSelection.java:94)
      at
    org.apache.activemq.transport.nio.SelectorWorker$1.run(SelectorWorker.java:119)
      at
    java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at
    java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:722)

    If I look at my web console (Active Consumers list) I see that 7 consumers
    have 1=Enqueues, 0=Dequeues, 1=Dispatched, 1=Dispatched Queue which happen
    to be the same number of stuck messages other consumers have even numbers.

    We are also using activemq-5.7 jar for our consumers could that be a
    problem?




    --
    View this message in context: http://activemq.2283324.n4.nabble.com/Activemq-5-9-leveldb-replication-issue-tp4667495p4667791.html
    Sent from the ActiveMQ - User mailing list archive at Nabble.com.
  • Linuraj at Mar 13, 2016 at 2:55 am
    Hi, We have the same issue with activemq version 5.13.1. After a failover, we
    see some messages stuck in the queue. We see a valid number as queue-size;
    however when we browse the queue via hawtio we will not see those messages.
    Also, it is the same number of messages that we miss after a failover. Do we
    know the cause? Do we know the fix? Really appreciate your help.ThanksLinu
    Raj.



    --
    View this message in context: http://activemq.2283324.n4.nabble.com/Activemq-5-9-leveldb-replication-issue-tp4667495p4709236.html
    Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Related Discussions

Discussion Navigation
viewthread | post
Discussion Overview
groupusers @
categoriesactivemq
postedMay 26, '13 at 2:22p
activeMar 13, '16 at 2:55a
posts8
users4
websiteactivemq.apache.org

People

Translate

site design / logo © 2022 Grokbase