FAQ
Hi,

I am trying to push data into a Storm cluster by writing the data to a
socket. I want my Storm spout to read the data from that same socket.

Any pointers or help on how to achieve this would be appreciated.

Nikhil

  • Ryan Moquin at Jan 4, 2013 at 5:06 pm
    If I understand what you are asking, you could try using storm-osgi:

    https://github.com/rmoquin/storm-osgi

    Define a basicSpout with a spout source that reads from the socket and puts
    the tuples onto the supplied queue data structure; they'll be emitted for
    you. If you'd like to try it out, you'll have to wait a week or so until I
    have some enhanced cluster support available for the topologies.

    Ryan
  • Nikhil Shekhar at Jan 4, 2013 at 5:59 pm
    Thanks for the inputs Ryan

    But what I am trying to do is this: a socket server is pushing data onto a
    socket, say at "localhost:5000", and I want to read that data from the
    socket through a spout. When I try this, I get a runtime exception saying
    that the Socket is not serializable.

    Nikhil
  • Ryan Moquin at Jan 4, 2013 at 6:11 pm
    Sure, if you want to stick with what you have, then just mark your socket
    transient. It won't be serialized then and you shouldn't have an issue. I
    will be eliminating that problem completely when I implement clustering in
    storm-osgi.

    Ryan
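
The transient approach can be sketched in plain Java. This is a minimal illustration, not Storm code: SocketLineSource, open(), nextLine() and roundTrip() are hypothetical names, and open() only plays the role that a spout's open() method would play on the worker. It assumes the object is shipped with standard Java serialization.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a spout: serializable configuration plus a
// transient connection that is opened only after deserialization.
class SocketLineSource implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String host;
    private final int port;

    // transient fields are skipped by Java serialization, so the object can
    // be shipped even though Socket itself is not Serializable.
    private transient Socket socket;
    private transient BufferedReader reader;

    SocketLineSource(String host, int port) {
        this.host = host;
        this.port = port;
    }

    // Plays the role of the spout's open(): connect on the receiving side.
    void open() throws IOException {
        socket = new Socket(host, port);
        reader = new BufferedReader(
                new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
    }

    // Returns the next line, or null once the server closes the stream.
    // open() must have been called first.
    String nextLine() throws IOException {
        return reader.readLine();
    }

    void close() throws IOException {
        if (socket != null) {
            socket.close();
        }
    }

    boolean isConnected() {
        return socket != null && socket.isConnected();
    }

    String host() { return host; }

    int port() { return port; }

    // Serializes the object and reads it back, to show what survives the trip.
    static SocketLineSource roundTrip(SocketLineSource source) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(source);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SocketLineSource) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After a round trip the host and port are preserved while the socket and reader fields are null, so the connection has to be created in open() rather than in the constructor or a field initializer.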
  • Brian O'Neill at Jan 4, 2013 at 6:23 pm
    Nikhil,

    We had a similar issue with database connections to Cassandra. We switched
    to a static pool of connections.

    You'll need to be careful if you use statics, however, since multiple spouts
    might be started in the same JVM (on different ports). In your case,
    create a SocketPool (static member variable), then change every reference
    to _socket to POOL.getSocket(5000).

    -brian
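
One way to read the static-pool suggestion is sketched below. SocketPool, POOL and getSocket(5000) come from the message above; everything else (the injected connector, forHost(), PooledReader, and the choice of one shared socket per port) is an assumption made for illustration, not the code used at Health Market Science.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.IntFunction;

// Hypothetical SocketPool: one lazily opened Socket per port, shared JVM-wide.
class SocketPool {
    private final ConcurrentMap<Integer, Socket> sockets = new ConcurrentHashMap<>();
    private final IntFunction<Socket> connector;

    // The connector is injected so the pool can be exercised without a network.
    SocketPool(IntFunction<Socket> connector) {
        this.connector = connector;
    }

    // Convenience factory that connects to the given host on demand.
    static SocketPool forHost(String host) {
        return new SocketPool(port -> {
            try {
                return new Socket(host, port);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    // Returns the socket for this port, opening it on first use.
    // computeIfAbsent runs the connector at most once per port.
    Socket getSocket(int port) {
        return sockets.computeIfAbsent(port, p -> connector.apply(p));
    }
}

// Hypothetical spout-like class: the pool is static, so it is never serialized.
class PooledReader implements java.io.Serializable {
    private static final long serialVersionUID = 1L;
    static final SocketPool POOL = SocketPool.forHost("localhost");

    Socket socket() {
        return POOL.getSocket(5000);
    }
}
```

With this shape, every caller in the same JVM that asks for port 5000 receives the same Socket instance, so reads from several threads would need their own coordination.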

    ---
    Brian O'Neill
    Lead Architect, Software Development
    Health Market Science
    The Science of Better Results
    2700 Horizon Drive • King of Prussia, PA • 19406
    M: 215.588.6024 • @boneill42 <http://www.twitter.com/boneill42> •
    healthmarketscience.com


    This information transmitted in this email message is for the intended
    recipient only and may contain confidential and/or privileged material. If
    you received this email in error and are not the intended recipient, or the
    person responsible to deliver it to the intended recipient, please contact
    the sender at the email above and delete this email and any attachments and
    destroy any copies thereof. Any review, retransmission, dissemination,
    copying or other use of, or taking any action in reliance upon, this
    information by persons or entities other than the intended recipient is
    strictly prohibited.



  • Ryan Moquin at Jan 4, 2013 at 7:24 pm
    There's no reason to serialize a socket, though. It will exist on the other
    side, and you neither need nor want a serialized socket deserialized
    there: in theory it might think it's already connected or carry the wrong
    environment settings. Just mark it transient.

    Ryan
  • Brian O'Neill at Jan 4, 2013 at 9:38 pm
    Statics don't serialize.

    -brian
    --
    Brian ONeill
    Lead Architect, Health Market Science (http://healthmarketscience.com)
    mobile:215.588.6024
    blog: http://weblogs.java.net/blog/boneill42/
    blog: http://brianoneill.blogspot.com/
  • Ryan Moquin at Jan 5, 2013 at 1:37 am
    Let me make sure I interpreted your suggestion correctly. Yes, the default
    serialization of a Java object will not include the values of static
    variables, because they belong to the class, not the object instance.
    Otherwise you would have a mess at deserialization.

    However, you said to set _socket to POOL.getSocket(5000), so you still end
    up with a socket instance assigned to an object's non-transient member
    variable (_socket). If you meant that _socket should be changed to static
    to hold the socket reference, then every spout would use the exact same
    Socket instance. That would be a problem if you have multiple spouts, since
    multiple threads would be using the same socket.

    The only way the deserialized class can initialize itself with a socket on
    a different port is through its default constructor. How would you know
    which socket belonged to which spout if you had more than one, or which
    port the new ones should be assigned? Without holding onto a reference to a
    socket, how does the static method know which one to return?

    It would be much more work to deal with that coordination than to just mark
    the field transient and initialize the class in its default constructor,
    wouldn't it? Or am I missing something? If you configured Storm so you only
    had one spout per JVM, then you wouldn't need a pool. I can't think of any
    easy way to handle this situation otherwise. :)

    Ryan
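
The point about static versus instance fields can be checked with a small plain-Java experiment. FieldKinds and its members are hypothetical names used only for this sketch; it assumes default Java serialization with no custom writeObject/readObject.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical class with one field of each kind discussed in the thread.
class FieldKinds implements Serializable {
    private static final long serialVersionUID = 1L;

    // Belongs to the class, not the instance: never written to the stream.
    static String shared = "initial";

    // Ordinary instance field: written and restored.
    String instanceValue;

    // Instance field explicitly excluded from serialization.
    transient String transientValue;

    FieldKinds(String instanceValue, String transientValue) {
        this.instanceValue = instanceValue;
        this.transientValue = transientValue;
    }

    // Serializes the object to a byte array.
    static byte[] toBytes(FieldKinds value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Rebuilds an object from bytes produced by toBytes.
    static FieldKinds fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (FieldKinds) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

If the static field is changed between toBytes and fromBytes, the restored object still sees the current class-level value, which shows the static was never part of the stream. The instance field comes back intact and the transient field comes back as null.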

  • Vinh at Jan 5, 2013 at 1:29 am
    The socket pool can be declared as non-static. Just be sure to initialize it only in spout.open().

  • Brian O'Neill at Jan 5, 2013 at 2:32 am
    @vinh
    That makes sense (if you are okay with some failing because the port is already taken, or you carefully control how many VMs are on each machine).

    We liked the pooling approach so that spouts could share a single connection pool per VM, instead of each initializing its own pool.

    -brian

  • Vinh at Jan 5, 2013 at 1:57 am
    Below is a good solution: a single connection pool in the VM, all spouts sharing the same pool reference, a non-static pool reference in each spout, and error handling if the pool can't be initialized.

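    public class ConnPoolManager {
        // Dedicated lock object. Locking on a boxed Integer is unsafe because
        // small Integer values are cached and may be shared with other code.
        private static final Object lock = new Object();
        // volatile is needed for double-checked locking to be safe.
        private static volatile ConnPoolManager instance;
        private ConnPool pool;

        private ConnPoolManager() {
            try {
                pool = new ConnPool();
            } catch (Exception e) {...}
        }

        public static ConnPoolManager getInstance() {
            if (instance == null) {
                synchronized (lock) {
                    if (instance == null) {
                        instance = new ConnPoolManager();
                    }
                }
            }
            return instance;
        }

        public ConnPool getConnPool() {
            return pool;
        }
    }

    // ISpout is an interface, so the spout implements it rather than extends it.
    public class Spout implements ISpout {
        // Assigned only in open(), so it is still null when the spout is serialized.
        private ConnPool pool;

        public void open(…) {
            pool = ConnPoolManager.getInstance().getConnPool();
        }
    }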
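
As an alternative to double-checked locking, the same one-pool-per-VM behaviour can be sketched with the initialization-on-demand holder idiom, which relies on JVM class initialization for thread safety. This is a different technique from the code above, shown only for comparison. ConnPool here is an empty placeholder and ConnPoolHolder is a hypothetical name.

```java
// Placeholder for the real connection pool class, which the thread does not show.
class ConnPool {
}

// Hypothetical holder: the pool is created once, on first call to get().
class ConnPoolHolder {
    private ConnPoolHolder() {
    }

    // The nested class is initialized lazily and exactly once by the JVM,
    // so no explicit lock or volatile field is needed.
    private static class Lazy {
        static final ConnPool POOL = new ConnPool();
    }

    static ConnPool get() {
        return Lazy.POOL;
    }
}
```

A spout would call ConnPoolHolder.get() from open(). One trade-off is that if the pool's constructor throws, class initialization fails with an ExceptionInInitializerError and later calls fail as well, so the try/catch style above may be preferable when initialization errors need custom handling.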

    On Jan 4, 2013, at 5:27 PM, Brian O'Neill wrote:


    @vinh
    That makes sense. (if you are okay if some fail because the port is already taken, or carefully control how many vms are on each machine)

    We liked the pooling approach so spouts could share a single connection pool per vm. (instead of each initializing its own pool)

    -brian

    On Jan 4, 2013, at 8:04 PM, vinh wrote:

    The socket pool can be declared as non-static. Just be sure to initialize it only in spout.open().

    On Jan 4, 2013, at 4:25 PM, Ryan Moquin wrote:

    Let me make sure I interpreted your suggestion correctly. Yes, the default serialization of a java object will not include the values of static variables because they belong to the class, not the object instance. Otherwise you would have a mess at deserialization. However, you said to set _socket to Pool.getSocket (5000). So you still end up with a socket instance assigned to an object's non transient member variable (_socket). If you meant that _socket should be changed to static to hold the socket reference, then every spout would use the exact same Socket instance. That would be a problem if you have multiple spouts since multiple threads would be using the same socket. The only way the deserialized class can initalize itself with a socket on a different port, is through it's default constructor. How would you know which socket belonged to which spout if you had more than one or which port the new ones should be assigned? Without holding onto a reference to a socket, how does the static method know which to return? It would be much more work to deal with that coordination, than it would be than to just mark the field transient and initalize the class in it's default constructor wouldn't it? Or am I missing something? If you configured storm so you only had one spout per jvm, then you wouldn't need a pool..... I can't think of any easy way to handle this situation otherwise. :)

    Ryan

    Ryan

    Static's don't serialize.

    -brian
    On Jan 4, 2013, at 1:55 PM, Ryan Moquin wrote:

    There's no reason to serialize a socket, though. The socket will exist on the other side, and you don't need (nor would you want) a serialized socket deserialized there; in theory it might think it's already connected or have the wrong environment settings stored. Just mark it transient.

    Ryan
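
    A minimal sketch of the "mark it transient" advice, using plain Java serialization rather than the Storm API. SocketSource, its open() method, and TransientSocketDemo are hypothetical names made up for this illustration. Note that Java deserialization does not re-run a Serializable class's own constructor, so the sketch re-creates the socket in an open()-style method instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.net.Socket;

// Hypothetical stand-in for a spout; it does not use the Storm API.
class SocketSource implements Serializable {
    private static final long serialVersionUID = 1L;

    final String host;
    final int port;
    // transient: skipped by Java serialization, so it is null after deserialization.
    transient Socket socket;

    SocketSource(String host, int port) {
        this.host = host;
        this.port = port;
    }

    // Plays the role of a spout's open(): called after deserialization.
    void open() {
        // Left unconnected so the sketch runs without a server;
        // a real spout would connect to host:port here.
        socket = new Socket();
    }
}

class TransientSocketDemo {
    // Serializes an object to bytes and reads it back in memory.
    static SocketSource roundTrip(SocketSource source) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(source);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SocketSource) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        SocketSource original = new SocketSource("localhost", 5000);
        original.open();                          // socket is set before serialization
        SocketSource copy = roundTrip(original);  // succeeds because the field is transient
        System.out.println(copy.socket == null);  // the socket was not carried over
        copy.open();                              // re-create it on the receiving side
        System.out.println(copy.socket != null);
    }
}
```

    If the transient keyword is removed from the socket field, writeObject fails with a NotSerializableException, which matches the error reported earlier in the thread.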

    On Jan 4, 2013 1:15 PM, "Brian O'Neill" wrote:
    Nikhil,

    We had a similar issue with database connections to Cassandra. We switched to creating a static pool of connections.

    You'll need to be careful, however, if you use statics, since multiple spouts might be started in the same JVM (on different ports). In your case, create a SocketPool (static member variable). Then change every reference to _socket to be POOL.getSocket(5000).

    -brian
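
    One way the static pool idea could look, as a rough sketch only. SocketPool and getSocket(port) are the names used above, but the implementation here is an assumption: the connector function, the per-port map, and the PooledSource class are all hypothetical.

```java
import java.io.Serializable;
import java.net.Socket;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntFunction;

// Hypothetical SocketPool: one socket per port, created on first request.
final class SocketPool {
    private final Map<Integer, Socket> socketsByPort = new ConcurrentHashMap<>();
    private final IntFunction<Socket> connector;

    // The connector decides how a socket for a port is opened (an assumption of this sketch).
    SocketPool(IntFunction<Socket> connector) {
        this.connector = connector;
    }

    // Returns the socket for the port, creating it at most once even under concurrent calls.
    Socket getSocket(int port) {
        return socketsByPort.computeIfAbsent(port, p -> connector.apply(p));
    }
}

// Hypothetical spout-like class: the static pool is class state, so it is never serialized.
class PooledSource implements Serializable {
    private static final long serialVersionUID = 1L;

    // Unconnected sockets keep the sketch runnable without a server.
    static final SocketPool POOL = new SocketPool(port -> new Socket());

    Socket socket() {
        return POOL.getSocket(5000);
    }
}
```

    In this sketch every caller in the JVM that asks for port 5000 receives the same Socket instance, so reads from it would still need to be coordinated between threads.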

    --
    Brian ONeill
    Lead Architect, Health Market Science (http://healthmarketscience.com)
    mobile:215.588.6024
    blog: http://weblogs.java.net/blog/boneill42/
    blog: http://brianoneill.blogspot.com/
  • Vinh at Jan 5, 2013 at 2:02 am
    Basically, lazy load the pool only when it's needed.
    On Jan 4, 2013, at 5:56 PM, vinh wrote:

    Below is a good solution: a single connection pool in the VM, shared by all spouts through the same pool reference, held as a non-static field in each spout, with error handling if the pool can't be initialized.

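    public class ConnPoolManager {
        // Dedicated lock object; a boxed Integer may be shared with unrelated code.
        private static final Object LOCK = new Object();
        // volatile makes the double-checked locking below safe.
        private static volatile ConnPoolManager instance;
        private ConnPool pool;

        private ConnPoolManager() {
            try {
                pool = new ConnPool();
            } catch (Exception e) {...}
        }

        // Lazily creates the single manager for this JVM.
        public static ConnPoolManager getInstance() {
            if (instance == null) {
                synchronized (LOCK) {
                    if (instance == null) {
                        instance = new ConnPoolManager();
                    }
                }
            }
            return instance;
        }

        public ConnPool getConnPool() {
            return pool;
        }
    }

    public class Spout implements ISpout {
        // transient: the pool is looked up in open(), not serialized with the spout.
        private transient ConnPool pool;

        public void open(…) {
            pool = ConnPoolManager.getInstance().getConnPool();
        }
    }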
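
    For comparison, the same lazy loading can be sketched with the initialization-on-demand holder idiom, which is a different technique from the double-checked locking in the post. DemoConnPool, LazyConnPoolManager, and LazyPoolDemo are hypothetical names; DemoConnPool only counts how many pools get built so the sharing is visible.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical placeholder for the real connection pool.
class DemoConnPool {
    static final AtomicInteger CREATED = new AtomicInteger();

    DemoConnPool() {
        CREATED.incrementAndGet(); // counts how many pools were ever built
    }
}

final class LazyConnPoolManager {
    private final DemoConnPool pool = new DemoConnPool();

    private LazyConnPoolManager() { }

    // The JVM initializes Holder only on first access, and class
    // initialization is thread-safe, so no explicit lock is needed.
    private static final class Holder {
        static final LazyConnPoolManager INSTANCE = new LazyConnPoolManager();
    }

    static LazyConnPoolManager getInstance() {
        return Holder.INSTANCE;
    }

    DemoConnPool getConnPool() {
        return pool;
    }
}

class LazyPoolDemo {
    public static void main(String[] args) throws InterruptedException {
        // Two threads stand in for two spouts calling open() in the same JVM.
        Runnable task = () -> LazyConnPoolManager.getInstance().getConnPool();
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(DemoConnPool.CREATED.get());
    }
}
```

    One trade-off: if the pool constructor throws, the holder idiom reports it as an ExceptionInInitializerError, so any error handling would need to live inside the constructor, as in the original version.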

  • Brian O'Neill at Jan 5, 2013 at 2:16 am
    +1, I like it.

    -brian
  • Nikhil Shekhar at Jan 8, 2013 at 11:02 am
    Thanks Ryan, I marked the socket transient and it worked. The insights
    from Brian and Vinh were also very informative :)
  • Nikhil Shekhar at Jan 8, 2013 at 11:15 am
    The issue I see now is that my spout throws a NullPointerException when
    it tries to emit in the nextTuple() method:

    collector.emit(new Values(response));

    The connection to the socket is established properly, but the spout is not
    able to emit. The detailed stack trace is below:
    java.lang.NullPointerException
        at com.cognizant.yahoostock.ApiStreamingSpout.nextTuple(ApiStreamingSpout.java:42)
        at backtype.storm.daemon.executor$fn__3968$fn__4009$fn__4010.invoke(executor.clj:433)
        at backtype.storm.util$async_loop$fn__465.invoke(util.clj:377)
        at clojure.lang.AFn.run(AFn.java:24)
        at java.lang.Thread.run(Thread.java:679)

    The other methods in the spout are as below:
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("stock"));
    }

    I am submitting the topology as below:

    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("stock-collector", new ApiStreamingSpout(), 1);
    builder.setBolt("sumarizer", new Sumarize()).shuffleGrouping("stock-collector");


    Everything seems fine, but the spout throws the exception when it reaches
    the emit() call in nextTuple().

    Thanks for the replies in advance :)
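
    The trace only shows that something on line 42 of ApiStreamingSpout was null, and the spout's open() method isn't shown, so the following is a guess rather than a diagnosis. One cause that would fit is a collector field that is never assigned in open(), or that is assigned only in the constructor and then lost when the spout is serialized. A minimal sketch with stand-in types (DemoCollector, DemoSpout, and DemoSpoutMain are hypothetical and do not use the Storm API):

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the collector a spout receives in open().
interface DemoCollector {
    void emit(List<Object> tuple);
}

// Hypothetical spout-like class; it does not use the Storm API.
class DemoSpout implements Serializable {
    private static final long serialVersionUID = 1L;

    // Assigned in open(), never in the constructor.
    private transient DemoCollector collector;

    void open(DemoCollector collector) {
        this.collector = collector; // leaving this line out keeps the field null
    }

    void nextTuple(String response) {
        List<Object> tuple = new ArrayList<>();
        tuple.add(response);
        collector.emit(tuple); // throws NullPointerException if open() never stored the collector
    }
}

class DemoSpoutMain {
    public static void main(String[] args) {
        List<List<Object>> emitted = new ArrayList<>();
        DemoSpout spout = new DemoSpout();
        spout.open(emitted::add);       // the list plays the role of the downstream bolt
        spout.nextTuple("stock-quote");
        System.out.println(emitted);
    }
}
```

    If the collector is already stored in open(), the next thing to check would be which other reference on that line (for example the response value's source) could be null.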
  • Ryan Moquin at Jan 5, 2013 at 2:43 am
    I like the concept of pooling the connections; creating new objects from
    serialized Java introduces some additional work. Brian is right, though: if
    you don't mind working around those things, it becomes a matter of
    preference. :)

    Ryan

Discussion Overview
group: storm-user
posted: Jan 3, '13 at 1:55p
active: Jan 8, '13 at 11:15a
posts: 16
users: 4
website: storm-project.net
irc: #storm-user
