13.4. Tools

13.4.1. Builtin Tools Master Web Interface

The Master starts a web-interface on port 60010 by default.

The Master web UI lists created tables and their definition (e.g., ColumnFamilies, blocksize, etc.). Additionally, the available RegionServers in the cluster are listed along with selected high-level metrics (requests, number of regions, usedHeap, maxHeap). The Master web UI allows navigation to each RegionServer's web UI. RegionServer Web Interface

RegionServers starts a web-interface on port 60030 by default.

The RegionServer web UI lists online regions and their start/end keys, as well as point-in-time RegionServer metrics (requests, regions, storeFileIndexSize, compactionQueueSize, etc.).

See Section 15.4, “HBase Metrics” for more information in metric definitions. zkcli

zkcli is a very useful tool for investigating ZooKeeper-related issues. To invoke:

./hbase zkcli -server host:port <cmd> <args>

The commands (and arguments) are:

	connect host:port
	get path [watch]
	ls path [watch]
	set path data [version]
	delquota [-n|-b] path
	printwatches on|off
	create [-s] [-e] path data acl
	stat path [watch]
	ls2 path [watch]
	listquota path
	setAcl path acl
	getAcl path
	sync path
	redo cmdno
	addauth scheme auth
	delete path [version]
	setquota -n|-b val path

13.4.2. External Tools tail

tail is the command line tool that lets you look at the end of a file. Add the “-f” option and it will refresh when new data is available. It’s useful when you are wondering what’s happening, for example, when a cluster is taking a long time to shutdown or startup as you can just fire a new terminal and tail the master log (and maybe a few RegionServers). top

top is probably one of the most important tool when first trying to see what’s running on a machine and how the resources are consumed. Here’s an example from production system:

top - 14:46:59 up 39 days, 11:55,  1 user,  load average: 3.75, 3.57, 3.84
Tasks: 309 total,   1 running, 308 sleeping,   0 stopped,   0 zombie
Cpu(s):  4.5%us,  1.6%sy,  0.0%ni, 91.7%id,  1.4%wa,  0.1%hi,  0.6%si,  0.0%st
Mem:  24414432k total, 24296956k used,   117476k free,     7196k buffers
Swap: 16008732k total,	14348k used, 15994384k free, 11106908k cached

15558 hadoop	18  -2 3292m 2.4g 3556 S   79 10.4   6523:52 java
13268 hadoop	18  -2 8967m 8.2g 4104 S   21 35.1   5170:30 java
 8895 hadoop	18  -2 1581m 497m 3420 S   11  2.1   4002:32 java

Here we can see that the system load average during the last five minutes is 3.75, which very roughly means that on average 3.75 threads were waiting for CPU time during these 5 minutes. In general, the “perfect” utilization equals to the number of cores, under that number the machine is under utilized and over that the machine is over utilized. This is an important concept, see this article to understand it more: http://www.linuxjournal.com/article/9001.

Apart from load, we can see that the system is using almost all its available RAM but most of it is used for the OS cache (which is good). The swap only has a few KBs in it and this is wanted, high numbers would indicate swapping activity which is the nemesis of performance of Java systems. Another way to detect swapping is when the load average goes through the roof (although this could also be caused by things like a dying disk, among others).

The list of processes isn’t super useful by default, all we know is that 3 java processes are using about 111% of the CPUs. To know which is which, simply type “c” and each line will be expanded. Typing “1” will give you the detail of how each CPU is used instead of the average for all of them like shown here. jps

jps is shipped with every JDK and gives the java process ids for the current user (if root, then it gives the ids for all users). Example:

hadoop@sv4borg12:~$ jps
1322 TaskTracker
17789 HRegionServer
27862 Child
1158 DataNode
25115 HQuorumPeer
2950 Jps
19750 ThriftServer
18776 jmx

In order, we see a:

  • Hadoop TaskTracker, manages the local Childs
  • HBase RegionServer, serves regions
  • Child, its MapReduce task, cannot tell which type exactly
  • Hadoop TaskTracker, manages the local Childs
  • Hadoop DataNode, serves blocks
  • HQuorumPeer, a ZooKeeper ensemble member
  • Jps, well… it’s the current process
  • ThriftServer, it’s a special one will be running only if thrift was started
  • jmx, this is a local process that’s part of our monitoring platform ( poorly named maybe). You probably don’t have that.

You can then do stuff like checking out the full command line that started the process:

hadoop@sv4borg12:~$ ps aux | grep HRegionServer
hadoop   17789  155 35.2 9067824 8604364 ?     S<l  Mar04 9855:48 /usr/java/jdk1.6.0_14/bin/java -Xmx8000m -XX:+DoEscapeAnalysis -XX:+AggressiveOpts -XX:+UseConcMarkSweepGC -XX:NewSize=64m -XX:MaxNewSize=64m -XX:CMSInitiatingOccupancyFraction=88 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/export1/hadoop/logs/gc-hbase.log -Dcom.sun.management.jmxremote.port=10102 -Dcom.sun.management.jmxremote.authenticate=true -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.password.file=/home/hadoop/hbase/conf/jmxremote.password -Dcom.sun.management.jmxremote -Dhbase.log.dir=/export1/hadoop/logs -Dhbase.log.file=hbase-hadoop-regionserver-sv4borg12.log -Dhbase.home.dir=/home/hadoop/hbase -Dhbase.id.str=hadoop -Dhbase.root.logger=INFO,DRFA -Djava.library.path=/home/hadoop/hbase/lib/native/Linux-amd64-64 -classpath /home/hadoop/hbase/bin/../conf:[many jars]:/home/hadoop/hadoop/conf org.apache.hadoop.hbase.regionserver.HRegionServer start

jstack is one of the most important tools when trying to figure out what a java process is doing apart from looking at the logs. It has to be used in conjunction with jps in order to give it a process id. It shows a list of threads, each one has a name, and they appear in the order that they were created (so the top ones are the most recent threads). Here’s a few example:

The main thread of a RegionServer that’s waiting for something to do from the master:

      "regionserver60020" prio=10 tid=0x0000000040ab4000 nid=0x45cf waiting on condition [0x00007f16b6a96000..0x00007f16b6a96a70]
   java.lang.Thread.State: TIMED_WAITING (parking)
        	at sun.misc.Unsafe.park(Native Method)
        	- parking to wait for  <0x00007f16cd5c2f30> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:198)
        	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1963)
        	at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:395)
        	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:647)
        	at java.lang.Thread.run(Thread.java:619)

        	The MemStore flusher thread that is currently flushing to a file:
"regionserver60020.cacheFlusher" daemon prio=10 tid=0x0000000040f4e000 nid=0x45eb in Object.wait() [0x00007f16b5b86000..0x00007f16b5b87af0]
   java.lang.Thread.State: WAITING (on object monitor)
        	at java.lang.Object.wait(Native Method)
        	at java.lang.Object.wait(Object.java:485)
        	at org.apache.hadoop.ipc.Client.call(Client.java:803)
        	- locked <0x00007f16cb14b3a8> (a org.apache.hadoop.ipc.Client$Call)
        	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
        	at $Proxy1.complete(Unknown Source)
        	at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
        	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        	at java.lang.reflect.Method.invoke(Method.java:597)
        	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        	at $Proxy1.complete(Unknown Source)
        	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3390)
        	- locked <0x00007f16cb14b470> (a org.apache.hadoop.hdfs.DFSClient$DFSOutputStream)
        	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3304)
        	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
        	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
        	at org.apache.hadoop.hbase.io.hfile.HFile$Writer.close(HFile.java:650)
        	at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:853)
        	at org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:467)
        	- locked <0x00007f16d00e6f08> (a java.lang.Object)
        	at org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:427)
        	at org.apache.hadoop.hbase.regionserver.Store.access$100(Store.java:80)
        	at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:1359)
        	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:907)
        	at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:834)
        	at org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:786)
        	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:250)
        	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:224)
        	at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:146)

A handler thread that’s waiting for stuff to do (like put, delete, scan, etc):

"IPC Server handler 16 on 60020" daemon prio=10 tid=0x00007f16b011d800 nid=0x4a5e waiting on condition [0x00007f16afefd000..0x00007f16afefd9f0]
   java.lang.Thread.State: WAITING (parking)
        	at sun.misc.Unsafe.park(Native Method)
        	- parking to wait for  <0x00007f16cd3f8dd8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
        	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
        	at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1013)

And one that’s busy doing an increment of a counter (it’s in the phase where it’s trying to create a scanner in order to read the last value):

"IPC Server handler 66 on 60020" daemon prio=10 tid=0x00007f16b006e800 nid=0x4a90 runnable [0x00007f16acb77000..0x00007f16acb77cf0]
   java.lang.Thread.State: RUNNABLE
        	at org.apache.hadoop.hbase.regionserver.KeyValueHeap.<init>(KeyValueHeap.java:56)
        	at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:79)
        	at org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:1202)
        	at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.<init>(HRegion.java:2209)
        	at org.apache.hadoop.hbase.regionserver.HRegion.instantiateInternalScanner(HRegion.java:1063)
        	at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1055)
        	at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1039)
        	at org.apache.hadoop.hbase.regionserver.HRegion.getLastIncrement(HRegion.java:2875)
        	at org.apache.hadoop.hbase.regionserver.HRegion.incrementColumnValue(HRegion.java:2978)
        	at org.apache.hadoop.hbase.regionserver.HRegionServer.incrementColumnValue(HRegionServer.java:2433)
        	at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
        	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        	at java.lang.reflect.Method.invoke(Method.java:597)
        	at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:560)
        	at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1027)

A thread that receives data from HDFS:

"IPC Client (47) connection to sv4borg9/ from hadoop" daemon prio=10 tid=0x00007f16a02d0000 nid=0x4fa3 runnable [0x00007f16b517d000..0x00007f16b517dbf0]
   java.lang.Thread.State: RUNNABLE
        	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
        	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
        	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
        	- locked <0x00007f17d5b68c00> (a sun.nio.ch.Util$1)
        	- locked <0x00007f17d5b68be8> (a java.util.Collections$UnmodifiableSet)
        	- locked <0x00007f1877959b50> (a sun.nio.ch.EPollSelectorImpl)
        	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
        	at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
        	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
        	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
        	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
        	at java.io.FilterInputStream.read(FilterInputStream.java:116)
        	at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:304)
        	at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        	at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
        	- locked <0x00007f1808539178> (a java.io.BufferedInputStream)
        	at java.io.DataInputStream.readInt(DataInputStream.java:370)
        	at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:569)
        	at org.apache.hadoop.ipc.Client$Connection.run(Client.java:477)

And here is a master trying to recover a lease after a RegionServer died:

"LeaseChecker" daemon prio=10 tid=0x00000000407ef800 nid=0x76cd waiting on condition [0x00007f6d0eae2000..0x00007f6d0eae2a70]
   java.lang.Thread.State: WAITING (on object monitor)
        	at java.lang.Object.wait(Native Method)
        	at java.lang.Object.wait(Object.java:485)
        	at org.apache.hadoop.ipc.Client.call(Client.java:726)
        	- locked <0x00007f6d1cd28f80> (a org.apache.hadoop.ipc.Client$Call)
        	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
        	at $Proxy1.recoverBlock(Unknown Source)
        	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2636)
        	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2832)
        	at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:529)
        	at org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:186)
        	at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:530)
        	at org.apache.hadoop.hbase.util.FSUtils.recoverFileLease(FSUtils.java:619)
        	at org.apache.hadoop.hbase.regionserver.wal.HLog.splitLog(HLog.java:1322)
        	at org.apache.hadoop.hbase.regionserver.wal.HLog.splitLog(HLog.java:1210)
        	at org.apache.hadoop.hbase.master.HMaster.splitLogAfterStartup(HMaster.java:648)
        	at org.apache.hadoop.hbase.master.HMaster.joinCluster(HMaster.java:572)
        	at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:503)

OpenTSDB is an excellent alternative to Ganglia as it uses Apache HBase to store all the time series and doesn’t have to downsample. Monitoring your own HBase cluster that hosts OpenTSDB is a good exercise.

Here’s an example of a cluster that’s suffering from hundreds of compactions launched almost all around the same time, which severely affects the IO performance: (TODO: insert graph plotting compactionQueueSize)

It’s a good practice to build dashboards with all the important graphs per machine and per cluster so that debugging issues can be done with a single quick look. For example, at StumbleUpon there’s one dashboard per cluster with the most important metrics from both the OS and Apache HBase. You can then go down at the machine level and get even more detailed metrics. clusterssh+top

clusterssh+top, it’s like a poor man’s monitoring system and it can be quite useful when you have only a few machines as it’s very easy to setup. Starting clusterssh will give you one terminal per machine and another terminal in which whatever you type will be retyped in every window. This means that you can type “top” once and it will start it for all of your machines at the same time giving you full view of the current state of your cluster. You can also tail all the logs at the same time, edit files, etc.

comments powered by Disqus