Ari Rabkin updated HADOOP-3719:
Chukwa is designed to collect monitoring data (especially log files) and to get that data into HDFS as quickly as possible. Data is initially collected by a Local Agent running on each monitored machine. The Local Agent has a pluggable architecture that allows many different adaptors to be used, each of which produces a particular stream of data. Local Agents send their data via HTTP to Collectors, which write it out to "sink files" in HDFS.
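To make the pluggable-adaptor model concrete, here is a minimal sketch in Java. The names (Adaptor, Chunk, InMemoryLogAdaptor) are illustrative assumptions for this note, not Chukwa's actual API: each adaptor owns a named stream and hands the agent batches of data chunks when polled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical adaptor contract: each adaptor produces one named stream of chunks.
interface Adaptor {
    String getStreamName();
    List<Chunk> poll();          // drain whatever data has accumulated
}

// A chunk of collected data, tagged with the stream it came from.
class Chunk {
    final String stream;
    final byte[] data;
    Chunk(String stream, byte[] data) { this.stream = stream; this.data = data; }
}

// Toy adaptor that "tails" an in-memory log instead of a real file.
class InMemoryLogAdaptor implements Adaptor {
    private final List<String> pending = new ArrayList<>();
    public String getStreamName() { return "syslog"; }
    public void append(String line) { pending.add(line); }
    public List<Chunk> poll() {
        List<Chunk> out = new ArrayList<>();
        for (String line : pending) out.add(new Chunk(getStreamName(), line.getBytes()));
        pending.clear();         // data is handed off exactly once
        return out;
    }
}

public class AgentSketch {
    public static void main(String[] args) {
        InMemoryLogAdaptor a = new InMemoryLogAdaptor();
        a.append("disk full on /var");
        List<Chunk> chunks = a.poll();
        System.out.println(chunks.size() + " chunk(s) from " + chunks.get(0).stream);
    }
}
```

In the real system the agent would forward these chunks over HTTP to a Collector rather than print them; the point of the sketch is only the adaptor/stream separation.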
Map-reduce jobs run periodically to analyze these sink files and drain their contents into structured storage.
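The drain step is essentially a demultiplex: group raw sink records by stream and route each group to its own storage. A minimal in-memory sketch, assuming a tab-separated "stream\tpayload" record layout (an assumption for illustration; Chukwa's actual job runs as MapReduce over HDFS sink files):

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DemuxSketch {
    // Count records per stream; a real demux would also write each group
    // out to its own structured store rather than just count it.
    static Map<String, Integer> demux(List<String> sinkRecords) {
        Map<String, Integer> counts = new TreeMap<>();   // sorted for stable output
        for (String rec : sinkRecords) {
            String stream = rec.split("\t", 2)[0];       // key on the stream name
            counts.merge(stream, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sink = List.of("hdfs\tblock report",
                                    "syslog\tdisk full",
                                    "hdfs\theartbeat");
        System.out.println(demux(sink));  // prints {hdfs=2, syslog=1}
    }
}
```

The grouping-by-key structure is exactly what map-reduce gives for free: the mapper emits (stream, record) pairs and the reducer sees each stream's records together.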
Chukwa provides a natural solution to the log collection problem, posed in HADOOP-2206. Once we have Chukwa working at scale, we intend to produce some patches to Hadoop to trigger log collection appropriately.
We expect this work to ultimately be complementary to HADOOP-3585, the failure analysis system. We want to collect similar data, and our framework is flexible enough to accommodate the proposed structure there, with only modest code changes on each side.
The attached document introduces Chukwa, and describes the data collection architecture. We do not present our analytics and visualization in detail in this document. We intend to describe them in a second document in the near future.
Project: Hadoop Core
Issue Type: Improvement
Reporter: Ari Rabkin
We'd like to contribute Chukwa, a data collection and analysis framework being developed at Yahoo!. Chukwa is a natural complement to Hadoop, since it is built on top of HDFS and Map-Reduce, and since Hadoop clusters are a key use case.