HDFS Commands Guide

Overview

All HDFS commands are invoked by the bin/hdfs script. Running the hdfs script without any arguments prints the description for all commands.

Usage: hdfs [--config confdir] [COMMAND] [GENERIC_OPTIONS] [COMMAND_OPTIONS]

Hadoop has an option-parsing framework that handles generic options as well as running classes.

COMMAND_OPTION Description
--config confdir Overrides the default configuration directory. Default is $HADOOP_HOME/conf.
GENERIC_OPTIONS The common set of options supported by multiple commands. Full list is here.
COMMAND_OPTIONS Various commands with their options are described in the following sections. The commands have been grouped into User Commands and Administration Commands.
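
As a minimal sketch of the usage line above, the wrapper below runs an hdfs command against an alternate configuration directory. Only `--config confdir` comes from this guide; the `HDFS` variable, the function name `hdfs_with_conf`, and the path `/etc/hadoop/conf.test` are illustrative assumptions.

```shell
# Assumption: the hdfs script is on PATH. HDFS is a hypothetical override
# (for example HDFS=echo) that previews the command line instead of running it.
HDFS="${HDFS:-hdfs}"

# hdfs_with_conf CONFDIR COMMAND [ARGS...]
# Runs an hdfs command with CONFDIR in place of the default configuration directory.
hdfs_with_conf() {
  confdir="$1"
  shift
  "$HDFS" --config "$confdir" "$@"
}

# Example (hypothetical path): print the version using a test configuration.
# hdfs_with_conf /etc/hadoop/conf.test version
```

Setting `HDFS=echo` before calling the function prints the arguments that would be passed to hdfs, which can be handy for checking a command before running it on a live cluster.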

User Commands

Commands useful for users of a hadoop cluster.

dfs

Usage: hdfs dfs [GENERIC_OPTIONS] [COMMAND_OPTIONS]

Runs a filesystem command on the file systems supported in Hadoop. The various COMMAND_OPTIONS are described in the File System Shell Guide.
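
For illustration, the sketch below chains three common File System Shell operations (`-mkdir -p`, `-put`, `-ls`) through `hdfs dfs`. The function name `upload_and_list`, the `HDFS` override variable, and the example paths are assumptions made for this example.

```shell
# Assumption: the hdfs script is on PATH. HDFS is a hypothetical override
# (for example HDFS=echo) that previews the commands instead of running them.
HDFS="${HDFS:-hdfs}"

# upload_and_list LOCAL_FILE HDFS_DIR
# Creates HDFS_DIR if needed, copies LOCAL_FILE into it, then lists the directory.
# Stops at the first command that fails.
upload_and_list() {
  src="$1"
  dest="$2"
  "$HDFS" dfs -mkdir -p "$dest" &&
  "$HDFS" dfs -put "$src" "$dest" &&
  "$HDFS" dfs -ls "$dest"
}

# Example (hypothetical paths):
# upload_and_list data.csv /user/alice/in
```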

fetchdt

Gets a delegation token from a NameNode. See fetchdt for more info.

Usage: hdfs fetchdt [GENERIC_OPTIONS] [--webservice <namenode_http_addr>] <path>

COMMAND_OPTION Description
path File name to store the token into.
--webservice namenode_http_addr Use the HTTP protocol instead of RPC.
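
A minimal sketch of both forms of the usage line above: fetching the token over RPC, or over HTTP when a NameNode web address is given. The function name `fetch_token`, the `HDFS` override variable, and the example address and file name are hypothetical.

```shell
# Assumption: the hdfs script is on PATH. HDFS is a hypothetical override
# (for example HDFS=echo) that previews the command instead of running it.
HDFS="${HDFS:-hdfs}"

# fetch_token TOKEN_FILE [NAMENODE_HTTP_ADDR]
# Stores a delegation token in TOKEN_FILE. When NAMENODE_HTTP_ADDR is given,
# the token is requested through --webservice instead of RPC.
fetch_token() {
  token_file="$1"
  if [ -n "${2:-}" ]; then
    "$HDFS" fetchdt --webservice "$2" "$token_file"
  else
    "$HDFS" fetchdt "$token_file"
  fi
}

# Examples (hypothetical address and file name):
# fetch_token /tmp/my.token
# fetch_token /tmp/my.token http://nn.example.com:50070
```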

fsck

Runs the HDFS filesystem checking utility. See fsck for more info.

Usage: hdfs fsck [GENERIC_OPTIONS] <path> [-list-corruptfileblocks | [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]] [-includeSnapshots]

COMMAND_OPTION Description
path Start checking from this path.
-move Move corrupted files to /lost+found.
-delete Delete corrupted files.
-files Print out files being checked.
-openforwrite Print out files opened for write.
-includeSnapshots Include snapshot data if the given path indicates a snapshottable directory or there are snapshottable directories under it.
-list-corruptfileblocks Print out list of missing blocks and files they belong to.
-blocks Print out block report.
-locations Print out locations for every block.
-racks Print out network topology for data-node locations.
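
The options above nest as shown in the usage line: `-locations` requires `-blocks`, which in turn requires `-files`. A small sketch of two typical invocations follows; the function names, the `HDFS` override variable, and the example path are assumptions for illustration.

```shell
# Assumption: the hdfs script is on PATH. HDFS is a hypothetical override
# (for example HDFS=echo) that previews the command instead of running it.
HDFS="${HDFS:-hdfs}"

# fsck_detail PATH
# Checks PATH and prints each file, its blocks, and the location of every block.
fsck_detail() {
  "$HDFS" fsck "$1" -files -blocks -locations
}

# fsck_corrupt PATH
# Lists missing blocks under PATH and the files they belong to.
fsck_corrupt() {
  "$HDFS" fsck "$1" -list-corruptfileblocks
}

# Examples (hypothetical path):
# fsck_detail /user/alice
# fsck_corrupt /
```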

version

Prints the version.

Usage: hdfs version

Administration Commands

Commands useful for administrators of a hadoop cluster.

balancer

Runs a cluster balancing utility. An administrator can simply press Ctrl-C to stop the rebalancing process. See Balancer for more details.

   Usage: hdfs balancer
           [-threshold <threshold>]
           [-policy <policy>]
           [-exclude [-f <hosts-file> | <comma-separated list of hosts>]]
           [-include [-f <hosts-file> | <comma-separated list of hosts>]]
           [-idleiterations <idleiterations>]
COMMAND_OPTION Description
-threshold threshold Percentage of disk capacity. This overrides the default threshold.
-policy policy datanode (default): Cluster is balanced if each datanode is balanced.  
blockpool: Cluster is balanced if each block pool in each datanode is balanced.
-exclude -f <hosts-file> | <comma-separated list of hosts> Excludes the specified datanodes from being balanced by the balancer.
-include -f <hosts-file> | <comma-separated list of hosts> Includes only the specified datanodes to be balanced by the balancer.
-idleiterations iterations Maximum number of idle iterations before exit. This overrides the default idleiterations (5).

Note that the blockpool policy is stricter than the datanode policy.
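
As a rough sketch of combining the options above, the function below runs the balancer with an explicit threshold and, optionally, a hosts file of datanodes to exclude. The function name `run_balancer`, the `HDFS` override variable, and the example file path are hypothetical.

```shell
# Assumption: the hdfs script is on PATH. HDFS is a hypothetical override
# (for example HDFS=echo) that previews the command instead of running it.
HDFS="${HDFS:-hdfs}"

# run_balancer THRESHOLD [EXCLUDE_HOSTS_FILE]
# THRESHOLD is a percentage of disk capacity. When EXCLUDE_HOSTS_FILE is given,
# the datanodes listed in it are excluded from balancing.
run_balancer() {
  threshold="$1"
  exclude_file="${2:-}"
  if [ -n "$exclude_file" ]; then
    "$HDFS" balancer -threshold "$threshold" -exclude -f "$exclude_file"
  else
    "$HDFS" balancer -threshold "$threshold"
  fi
}

# Examples (hypothetical file path):
# run_balancer 5
# run_balancer 5 /etc/hadoop/balancer.exclude
```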

Besides the above command options, a pinning feature was introduced in 2.7.0 to prevent certain replicas from being moved by the balancer or mover. This pinning feature is disabled by default and can be enabled with the configuration property "dfs.datanode.block-pinning.enabled". When enabled, it only affects blocks that are written to favored nodes specified in the create() call. This is useful for maintaining data locality for applications such as the HBase regionserver.

datanode

Runs an HDFS datanode.

Usage: hdfs datanode [-regular | -rollback | -rollingupgrade rollback]

COMMAND_OPTION Description
-regular Normal datanode startup (default).
-rollback Roll back the datanode to the previous version. This should be used after stopping the datanode and distributing the old Hadoop version.
-rollingupgrade rollback Roll back a rolling upgrade operation.

dfsadmin

Runs an HDFS dfsadmin client.

   Usage: hdfs dfsadmin [GENERIC_OPTIONS]
          [-report [-live] [-dead] [-decommissioning]]
          [-safemode enter | leave | get | wait]
          [-saveNamespace]
          [-rollEdits]
          [-restoreFailedStorage true|false|check]
          [-refreshNodes]
          [-setQuota <quota> <dirname>...<dirname>]
          [-clrQuota <dirname>...<dirname>]
          [-setSpaceQuota <quota> <dirname>...<dirname>]
          [-clrSpaceQuota <dirname>...<dirname>]
          [-setStoragePolicy <path> <policyName>]
          [-getStoragePolicy <path>]
          [-finalizeUpgrade]
          [-rollingUpgrade [<query>|<prepare>|<finalize>]]
          [-metasave filename]
          [-refreshServiceAcl]
          [-refreshUserToGroupsMappings]
          [-refreshSuperUserGroupsConfiguration]
          [-refreshCallQueue]
          [-refresh <host:ipc_port> <key> [arg1..argn]]
          [-reconfig <datanode|...> <host:ipc_port> <start|status>]
          [-printTopology]
          [-refreshNamenodes datanodehost:port]
          [-deleteBlockPool datanode-host:port blockpoolId [force]]
          [-setBalancerBandwidth <bandwidth in bytes per second>]
          [-allowSnapshot <snapshotDir>]
          [-disallowSnapshot <snapshotDir>]
          [-fetchImage <local directory>]
          [-shutdownDatanode <datanode_host:ipc_port> [upgrade]]
          [-getDatanodeInfo <datanode_host:ipc_port>]
          [-triggerBlockReport [-incremental] <datanode_host:ipc_port>]
          [-listOpenFiles]
          [-help [cmd]]
COMMAND_OPTION Description
-report [-live] [-dead] [-decommissioning] Reports basic filesystem information and statistics. The dfs usage can differ from "du" usage because it measures the raw space used by replication, checksums, snapshots, etc. on all the DNs. Optional flags may be used to filter the list of displayed DataNodes.
-safemode enter|leave|get|wait Safe mode maintenance command. Safe mode is a Namenode state in which it  
1. does not accept changes to the name space (read-only)  
2. does not replicate or delete blocks.  
The Namenode enters safe mode automatically at startup and leaves it automatically once the configured minimum percentage of blocks satisfies the minimum replication condition. Safe mode can also be entered manually, but then it can only be turned off manually as well.
-saveNamespace Save current namespace into storage directories and reset edits log. Requires safe mode.
-rollEdits Rolls the edit log on the active NameNode.
-restoreFailedStorage true|false|check Turns on or off automatic attempts to restore failed storage replicas. If a failed storage becomes available again, the system will attempt to restore edits and/or fsimage during checkpoint. The 'check' option returns the current setting.
-refreshNodes Re-read the hosts and exclude files to update the set of Datanodes that are allowed to connect to the Namenode and those that should be decommissioned or recommissioned.
-setQuota <quota> <dirname>...<dirname> See the HDFS Quotas Guide for details.
-clrQuota <dirname>...<dirname> See the HDFS Quotas Guide for details.
-setSpaceQuota <quota> <dirname>...<dirname> See the HDFS Quotas Guide for details.
-clrSpaceQuota <dirname>...<dirname> See the HDFS Quotas Guide for details.
-setStoragePolicy <path> <policyName> Set a storage policy to a file or a directory.
-getStoragePolicy <path> Get the storage policy of a file or a directory.
-finalizeUpgrade Finalize upgrade of HDFS. Datanodes delete their previous version working directories, followed by Namenode doing the same. This completes the upgrade process.
-rollingUpgrade [<query>|<prepare>|<finalize>] See the Rolling Upgrade document for details.
-metasave filename Save Namenode's primary data structures to <filename> in the directory specified by hadoop.log.dir property. <filename> is overwritten if it exists. <filename> will contain one line for each of the following 
1. Datanodes heart beating with Namenode 
2. Blocks waiting to be replicated 
3. Blocks currently being replicated 
4. Blocks waiting to be deleted
-refreshServiceAcl Reload the service-level authorization policy file.
-refreshUserToGroupsMappings Refresh user-to-groups mappings.
-refreshSuperUserGroupsConfiguration Refresh superuser proxy groups mappings.
-refreshCallQueue Reload the call queue from config.
-refresh <host:ipc_port> <key> [arg1..argn] Triggers a runtime-refresh of the resource specified by <key> on <host:ipc_port>. All other args after are sent to the host.
-reconfig <datanode|...> <host:ipc_port> <start|status> Start reconfiguration or get the status of an ongoing reconfiguration. The second parameter specifies the node type. Currently, only reloading DataNode's configuration is supported.
-printTopology Print a tree of the racks and their nodes as reported by the Namenode.
-refreshNamenodes datanodehost:port For the given datanode, reloads the configuration files, stops serving the removed block-pools and starts serving new block-pools.
-deleteBlockPool datanode-host:port blockpoolId [force] If force is passed, block pool directory for the given blockpool id on the given datanode is deleted along with its contents, otherwise the directory is deleted only if it is empty. The command will fail if datanode is still serving the block pool. Refer to refreshNamenodes to shutdown a block pool service on a datanode.
-setBalancerBandwidth <bandwidth in bytes per second> Changes the network bandwidth used by each datanode during HDFS block balancing. <bandwidth> is the maximum number of bytes per second that will be used by each datanode. This value overrides the dfs.balance.bandwidthPerSec parameter. 
NOTE: The new value is not persistent on the DataNode.
-allowSnapshot <snapshotDir> Allows snapshots of a directory to be created. If the operation completes successfully, the directory becomes snapshottable.
-disallowSnapshot <snapshotDir> Disallows snapshots of a directory from being created. All snapshots of the directory must be deleted before disallowing snapshots.
-fetchImage <local directory> Downloads the most recent fsimage from the NameNode and saves it in the specified local directory.
-shutdownDatanode <datanode_host:ipc_port> [upgrade] Submit a shutdown request for the given datanode. See the Rolling Upgrade document for details.
-getDatanodeInfo <datanode_host:ipc_port> Get information about the given datanode. See the Rolling Upgrade document for details.
-triggerBlockReport [-incremental] <datanode_host:ipc_port> Trigger a block report for the given datanode. If 'incremental' is specified, it will be an incremental block report; otherwise, it will be a full block report.
-listOpenFiles List all open files currently managed by the NameNode along with client name and client machine accessing them.
-help [cmd] Displays help for the given command or all commands if none is specified.
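
Because -saveNamespace requires safe mode, and safe mode entered manually must also be left manually, a forced namespace save is usually a three-step sequence. The sketch below shows one way to script it; the function name `save_namespace` and the `HDFS` override variable are assumptions for this example, not part of Hadoop.

```shell
# Assumption: the hdfs script is on PATH and the caller has superuser privileges.
# HDFS is a hypothetical override (for example HDFS=echo) that previews the
# commands instead of running them.
HDFS="${HDFS:-hdfs}"

# save_namespace
# Enters safe mode, saves the namespace, then leaves safe mode again.
# Safe mode is left even if the save fails; the status of the save is returned.
save_namespace() {
  "$HDFS" dfsadmin -safemode enter || return 1
  "$HDFS" dfsadmin -saveNamespace
  status=$?
  "$HDFS" dfsadmin -safemode leave
  return "$status"
}

# Example:
# save_namespace
```

Keep in mind that the namespace is read-only while the Namenode is in safe mode, so client writes will fail for the duration of the save.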

mover

Runs the data migration utility. See Mover for more details.

Usage: hdfs mover [-p <files/dirs> | -f <local file name>]

COMMAND_OPTION Description
-p <files/dirs> Specify a space separated list of HDFS files/dirs to migrate.
-f <local file> Specify a local file containing a list of HDFS files/dirs to migrate.

Note that, when both -p and -f options are omitted, the default path is the root directory.
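
A minimal sketch of the two cases described above: migrating an explicit list of paths with -p, or falling back to the default (the root directory) when no paths are given. The function name `migrate_paths`, the `HDFS` override variable, and the example paths are hypothetical.

```shell
# Assumption: the hdfs script is on PATH. HDFS is a hypothetical override
# (for example HDFS=echo) that previews the command instead of running it.
HDFS="${HDFS:-hdfs}"

# migrate_paths [HDFS_PATH...]
# Runs the mover on the given HDFS files/dirs. With no arguments, the mover
# is started without -p or -f, so it uses its default path.
migrate_paths() {
  if [ "$#" -gt 0 ]; then
    "$HDFS" mover -p "$@"
  else
    "$HDFS" mover
  fi
}

# Examples (hypothetical paths):
# migrate_paths /archive /cold
# migrate_paths
```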

In addition, a pinning feature was introduced in 2.7.0 to prevent certain replicas from being moved by the balancer or mover. This pinning feature is disabled by default and can be enabled with the configuration property "dfs.datanode.block-pinning.enabled". When enabled, it only affects blocks that are written to favored nodes specified in the create() call. This is useful for maintaining data locality for applications such as the HBase regionserver.

namenode

Runs the namenode. More info about the upgrade, rollback and finalize is at Upgrade Rollback.

   Usage: hdfs namenode [-backup] |
          [-checkpoint] |
          [-format [-clusterid cid ] [-force] [-nonInteractive] ] |
          [-upgrade [-clusterid cid] [-renameReserved<k-v pairs>] ] |
          [-upgradeOnly [-clusterid cid] [-renameReserved<k-v pairs>] ] |
          [-rollback] |
          [-rollingUpgrade <downgrade|rollback|started> ] |
          [-finalize] |
          [-importCheckpoint] |
          [-initializeSharedEdits] |
          [-bootstrapStandby] |
          [-recover [-force] ] |
          [-metadataVersion ]
COMMAND_OPTION Description
-backup Start backup node.
-checkpoint Start checkpoint node.
-format [-clusterid cid] [-force] [-nonInteractive] Formats the specified NameNode. It starts the NameNode, formats it, and then shuts it down. The -force option formats even if the name directory exists. The -nonInteractive option aborts if the name directory exists, unless the -force option is specified.
-upgrade [-clusterid cid] [-renameReserved<k-v pairs>] The Namenode should be started with the upgrade option after the new Hadoop version has been distributed.
-upgradeOnly [-clusterid cid] [-renameReserved<k-v pairs>] Upgrade the specified NameNode and then shut it down.
-rollback Roll back the NameNode to the previous version. This should be used after stopping the cluster and distributing the old Hadoop version.
-rollingUpgrade <downgrade|rollback|started> See the Rolling Upgrade document for details.
-finalize Removes the previous state of the file system. The most recent upgrade becomes permanent, and the rollback option is no longer available. After finalization, it shuts the NameNode down.
-importCheckpoint Loads an image from a checkpoint directory and saves it into the current one. The checkpoint directory is read from the property fs.checkpoint.dir.
-initializeSharedEdits Format a new shared edits dir and copy in enough edit log segments so that the standby NameNode can start up.
-bootstrapStandby Allows the standby NameNode's storage directories to be bootstrapped by copying the latest namespace snapshot from the active NameNode. This is used when first configuring an HA cluster.
-recover [-force] Recover lost metadata on a corrupt filesystem. See the HDFS User Guide for details.
-metadataVersion Verify that configured directories exist, then print the metadata versions of the software and the image.
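
To illustrate how the -format sub-options combine, the sketch below formats a NameNode with an explicit cluster ID and relies on -nonInteractive to abort, rather than prompt, when the name directory already exists. The function name `format_namenode`, the `HDFS` override variable, and the cluster ID shown are assumptions for this example.

```shell
# Assumption: the hdfs script is on PATH and this runs on the NameNode host.
# HDFS is a hypothetical override (for example HDFS=echo) that previews the
# command instead of running it.
HDFS="${HDFS:-hdfs}"

# format_namenode CLUSTER_ID
# Formats the NameNode for CLUSTER_ID. -force is deliberately not passed, so
# with -nonInteractive the command aborts if the name directory exists.
format_namenode() {
  cluster_id="$1"
  "$HDFS" namenode -format -clusterid "$cluster_id" -nonInteractive
}

# Example (hypothetical cluster ID):
# format_namenode CID-test
```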

secondarynamenode

Runs the HDFS secondary namenode. See Secondary Namenode for more info.

Usage: hdfs secondarynamenode [-checkpoint [force]] | [-format] | [-geteditsize]

COMMAND_OPTION Description
-checkpoint [force] Checkpoints the SecondaryNameNode if EditLog size >= fs.checkpoint.size. If force is used, checkpoint irrespective of EditLog size.
-format Format the local storage during startup.
-geteditsize Prints the number of uncheckpointed transactions on the NameNode.
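
As a small sketch of the options above, the function below first prints the number of uncheckpointed transactions and then forces a checkpoint regardless of the EditLog size. The function name `force_checkpoint` and the `HDFS` override variable are hypothetical.

```shell
# Assumption: the hdfs script is on PATH. HDFS is a hypothetical override
# (for example HDFS=echo) that previews the commands instead of running them.
HDFS="${HDFS:-hdfs}"

# force_checkpoint
# Reports the uncheckpointed transaction count, then checkpoints irrespective
# of the EditLog size. The checkpoint is skipped if the first command fails.
force_checkpoint() {
  "$HDFS" secondarynamenode -geteditsize &&
  "$HDFS" secondarynamenode -checkpoint force
}

# Example:
# force_checkpoint
```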