That's the bit that's 'coming'.
In theory, the plan will look a bit like this:
* install PCP (RPM and Debian packages should be available here: ftp://oss.sgi.com/projects/pcp/download/) on all nodes in the cluster
* download the PCP Glider (http://oss.sgi.com/projects/pcp/pcp-gui.html) UI tools; you can install them on your own desktop, or wherever you wish to run them from
* drop the jar I'll provide (hopefully RSN) into the HBase lib area (presumably hbase/lib) on all nodes in the cluster
* modify hadoop-metrics.properties to specify the new PCPMetricContext
* fire up HBase
At this point you can point the PCP client tools (pmchart, pmdumptext, etc.) at any and all nodes to pull hardware, OS, Java, and HBase/Hadoop metrics out.
* run a script that we'll provide, pointed at hbase/conf/regionservers, which will convert the topology into canned configurations for the visualizers. This step is just to get a basic, known-good working visualization of the cluster; in theory you could point any of the tools at any or all of the nodes in the cluster and cherry-pick the metrics you want to look at.
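To make the regionservers step a little more concrete, here is a rough sketch only. The real script doesn't exist yet and may look nothing like this. It assumes the regionservers file holds one hostname per line, and it builds a pmdumptext command line using the usual PCP host:metric form. The function names are made up for illustration, and the two metrics are just standard Linux kernel metrics from PCP, not anything HBase-specific.

```python
# Example metrics only: standard PCP Linux kernel metric names,
# chosen for illustration (no HBase/Hadoop metric names are known yet).
DEFAULT_METRICS = ["kernel.all.load", "mem.util.free"]


def read_regionservers(text):
    """Return hostnames from regionservers-style text.

    Assumes one hostname per line; blank lines and '#' comments are skipped.
    """
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line)
    return hosts


def metric_specs(hosts, metrics=DEFAULT_METRICS):
    """Build 'host:metric' specifications, one per host/metric pair."""
    return [f"{host}:{metric}" for host in hosts for metric in metrics]


def pmdumptext_command(hosts, metrics=DEFAULT_METRICS, interval="5sec"):
    """Build an argument list for pmdumptext sampling every `interval`."""
    return ["pmdumptext", "-t", interval] + metric_specs(hosts, metrics)
```

You would feed it the contents of hbase/conf/regionservers and join the resulting list with spaces to get something to paste into a shell.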
For retrospective logging/archive purposes there are a few additional steps to configure which metrics you want to log and how frequently, but that's pretty simple.
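As an illustration of how small that logging configuration is, here is a hedged sketch that writes out a pmlogger-style config block ("log mandatory on every N seconds { ... }") for a chosen set of metrics. The helper name and the 30 second default are assumptions for the example, and the metric names you would actually pass in for HBase/Hadoop aren't defined yet.

```python
def pmlogger_config(metrics, interval_seconds=30):
    """Return pmlogger config text that logs `metrics` at a fixed interval.

    Hypothetical helper: it only formats text and does not talk to PCP.
    """
    lines = [f"log mandatory on every {interval_seconds} seconds {{"]
    lines.extend(f"    {metric}" for metric in metrics)
    lines.append("}")
    return "\n".join(lines) + "\n"
```

For example, pmlogger_config(["kernel.all.load", "mem.util.free"], 10) gives a block that asks for those two OS metrics every 10 seconds.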
I'm really hoping to be able to provide the jar soon, and some good steps for someone to try out, but honestly I would recommend just grabbing the base PCP packages on the cluster, because I think you'll find that monitoring the base hardware and OS of the cluster is very interesting and useful.
I don't like talking vapourware, and I'm really sorry I haven't completed this in a form I'm comfortable sharing in more detail, but please bear with me a bit longer.
If anyone has any more questions about what it might/could do, fire away. I'd like to discuss what, in an ideal world, you'd like to have in cluster monitoring/retrospective analysis, so I can use these concrete cases to show where/how this setup would be of high value.
On 08/04/2010, at 2:20 PM, Stack wrote:
On Tue, Apr 6, 2010 at 6:10 PM, Paul Smith wrote:
anyway, some more ideas to kick around and discuss.
What do we have to do to get it running on one of our clusters Paul?