Because the potential storage capability of a Hadoop cluster is so very large, you need some means to track both the data contained on the cluster and the data feeds moving data into and out of it. In addition, you need to consider the locations where data might reside on the cluster—that is, in HDFS, Hive, HBase, or Impala. Knowing you should track your data only spawns more questions, however: What type of reporting might be required and in what format? Is a dashboard needed to post the status of data at any given moment? Are graphs or tables helpful to show the state of a data source for a given time period, such as the days in a week?