WebHDFS REST API

Document Conventions

Monospaced      Used for commands, HTTP requests and responses, and code blocks.
<Monospaced>    User-entered values.
[Monospaced]    Optional values. When the value is not specified, the default value is used.
Italics         Important phrases and words.

Introduction

The HTTP REST API supports the complete FileSystem/FileContext interface for HDFS. The operations and the corresponding FileSystem/FileContext methods are shown in the next section. The HTTP Query Parameter Dictionary section specifies parameter details such as defaults and valid values.

Operations

FileSystem URIs vs HTTP URLs

The FileSystem scheme of WebHDFS is "webhdfs://". A WebHDFS FileSystem URI has the following format.

  webhdfs://<HOST>:<HTTP_PORT>/<PATH>

This WebHDFS URI corresponds to the following HDFS URI.

  hdfs://<HOST>:<RPC_PORT>/<PATH>

In the REST API, the prefix "/webhdfs/v1" is inserted in the path and a query is appended at the end. Therefore, the corresponding HTTP URL has the following format.

  http://<HOST>:<HTTP_PORT>/webhdfs/v1/<PATH>?op=...
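
As an illustration of this mapping, the following Python sketch rewrites a WebHDFS FileSystem URI into the corresponding HTTP URL. The function name to_http_url, the host name, and the port used below are placeholders chosen for this example and are not part of WebHDFS. The path is passed through unchanged, so it is assumed to be URL-safe already.

```python
from urllib.parse import urlencode, urlsplit


def to_http_url(webhdfs_uri, op, **params):
    """Map webhdfs://<HOST>:<HTTP_PORT>/<PATH> to its REST URL."""
    parts = urlsplit(webhdfs_uri)
    if parts.scheme != "webhdfs":
        raise ValueError("expected a webhdfs:// URI, got %r" % webhdfs_uri)
    # Insert the /webhdfs/v1 prefix in front of the file system path.
    path = "/webhdfs/v1" + (parts.path or "/")
    # The operation goes first in the query, followed by any other
    # parameters in alphabetical order (the order is not significant).
    query = urlencode([("op", op)] + sorted(params.items()))
    return "http://%s%s?%s" % (parts.netloc, path, query)
```

For example, to_http_url("webhdfs://nn.example.com:9870/tmp/a.txt", "GETFILESTATUS") yields http://nn.example.com:9870/webhdfs/v1/tmp/a.txt?op=GETFILESTATUS.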

HDFS Configuration Options

Below are the HDFS configuration options for WebHDFS.

Property Name                               Description
dfs.webhdfs.enabled                         Enable/disable WebHDFS in Namenodes and Datanodes.
dfs.web.authentication.kerberos.principal   The HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint. The HTTP Kerberos principal MUST start with 'HTTP/' per the Kerberos HTTP SPNEGO specification. A value of "*" will use all HTTP principals found in the keytab.
dfs.web.authentication.kerberos.keytab      The Kerberos keytab file with the credentials for the HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint.

Authentication

When security is off, the authenticated user is the username specified in the user.name query parameter. If the user.name parameter is not set, the server may either set the authenticated user to a default web user, if there is any, or return an error response.

When security is on, authentication is performed by either Hadoop delegation token or Kerberos SPNEGO. If a token is set in the delegation query parameter, the authenticated user is the user encoded in the token. If the delegation parameter is not set, the user is authenticated by Kerberos SPNEGO.

Below are examples using the curl command-line tool.

  1. Authentication when security is off:
    curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?[user.name=<USER>&]op=..."
  2. Authentication using Kerberos SPNEGO when security is on:
    curl -i --negotiate -u : "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=..."
  3. Authentication using Hadoop delegation token when security is on:
    curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?delegation=<TOKEN>&op=..."

See also: Authentication for Hadoop HTTP web-consoles
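
The rules above can be sketched as a small Python helper that decides which authentication parameter, if any, goes into the query string. The function name auth_query and its arguments are hypothetical and only serve this illustration; the Kerberos SPNEGO handshake itself happens at the HTTP layer and is not shown.

```python
from urllib.parse import urlencode


def auth_query(op, security_on, user=None, token=None):
    """Return the query string for one request under the rules above."""
    params = []
    if security_on:
        if token is not None:
            # The authenticated user is the one encoded in the token.
            params.append(("delegation", token))
        # Without a token, the client has to authenticate through Kerberos
        # SPNEGO at the HTTP layer, so nothing is added to the query.
    elif user is not None:
        # Security off: the server takes the user from user.name.
        params.append(("user.name", user))
    params.append(("op", op))
    return urlencode(params)
```

For example, auth_query("GETFILESTATUS", security_on=False, user="alice") produces user.name=alice&op=GETFILESTATUS, while auth_query("OPEN", security_on=True) produces only op=OPEN and leaves authentication to SPNEGO.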

Proxy Users

When the proxy user feature is enabled, a proxy user P may submit a request on behalf of another user U. The username of U must be specified in the doas query parameter unless a delegation token is presented in authentication. In that case, the information of both users P and U must be encoded in the delegation token.

  1. A proxy request when security is off:
    curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?[user.name=<USER>&]doas=<USER>&op=..."
  2. A proxy request using Kerberos SPNEGO when security is on:
    curl -i --negotiate -u : "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?doas=<USER>&op=..."
  3. A proxy request using Hadoop delegation token when security is on:
    curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?delegation=<TOKEN>&op=..."

File and Directory Operations

Create and Write to a File

  • Step 1: Submit a HTTP PUT request without automatically following redirects and without sending the file data.
    curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=CREATE
                        [&overwrite=<true|false>][&blocksize=<LONG>][&replication=<SHORT>]
                        [&permission=<OCTAL>][&buffersize=<INT>]"

    The request is redirected to a datanode where the file data is to be written:

    HTTP/1.1 307 TEMPORARY_REDIRECT
    Location: http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=CREATE...
    Content-Length: 0
  • Step 2: Submit another HTTP PUT request using the URL in the Location header with the file data to be written.
    curl -i -X PUT -T <LOCAL_FILE> "http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=CREATE..."

    The client receives a 201 Created response with zero content length and the WebHDFS URI of the file in the Location header:

    HTTP/1.1 201 Created
    Location: webhdfs://<HOST>:<PORT>/<PATH>
    Content-Length: 0

Note that the two-step create/append exists to prevent clients from sending data before the redirect. This issue is addressed by the "Expect: 100-continue" header in HTTP/1.1; see RFC 2616, Section 8.2.3. Unfortunately, some software libraries (e.g. the Jetty 6 HTTP server and the Java 6 HTTP client) do not implement "Expect: 100-continue" correctly. The two-step create/append is a temporary workaround for these library bugs.

See also: overwrite, blocksize, replication, permission, buffersize, FileSystem.create
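
The two steps above can be sketched in Python roughly as follows. This is a minimal illustration, not a complete client: http_send and create_file are names invented for this example, authentication is omitted, and the send parameter exists only so that the flow can be exercised without a running cluster.

```python
import http.client
from urllib.parse import urlsplit


def http_send(method, url, body=None):
    """Send one request without following redirects."""
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.netloc)
    try:
        target = parts.path + ("?" + parts.query if parts.query else "")
        conn.request(method, target, body=body)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location"), resp.read()
    finally:
        conn.close()


def create_file(namenode_url, data, send=http_send):
    """Two-step CREATE: ask the namenode, then PUT the data to the datanode."""
    # Step 1: no file data is sent; the namenode answers with a redirect.
    status, location, _ = send("PUT", namenode_url)
    if status != 307 or not location:
        raise RuntimeError("expected 307 with a Location header, got %s" % status)
    # Step 2: send the data to the URL taken from the Location header.
    status, location, _ = send("PUT", location, data)
    if status != 201:
        raise RuntimeError("expected 201 Created, got %s" % status)
    return location  # the WebHDFS URI of the new file
```

http.client is used here because it never follows redirects on its own, which matches the requirement of step 1.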

Append to a File

  • Step 1: Submit a HTTP POST request without automatically following redirects and without sending the file data.
    curl -i -X POST "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=APPEND[&buffersize=<INT>]"

    The request is redirected to a datanode where the file data is to be appended:

    HTTP/1.1 307 TEMPORARY_REDIRECT
    Location: http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=APPEND...
    Content-Length: 0
  • Step 2: Submit another HTTP POST request using the URL in the Location header with the file data to be appended.
    curl -i -X POST -T <LOCAL_FILE> "http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=APPEND..."

    The client receives a response with zero content length:

    HTTP/1.1 200 OK
    Content-Length: 0

See the note in the previous section for why this operation requires two steps.

See also: buffersize, FileSystem.append

Concat File(s)

  • Submit a HTTP POST request.
    curl -i -X POST "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=CONCAT&sources=<PATHS>"

    The client receives a response with zero content length:

    HTTP/1.1 200 OK
    Content-Length: 0

See also: sources, FileSystem.concat

Open and Read a File

  • Submit a HTTP GET request that automatically follows redirects.
    curl -i -L "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=OPEN
                        [&offset=<LONG>][&length=<LONG>][&buffersize=<INT>]"

    The request is redirected to a datanode where the file data can be read:

    HTTP/1.1 307 TEMPORARY_REDIRECT
    Location: http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=OPEN...
    Content-Length: 0

    The client follows the redirect to the datanode and receives the file data:

    HTTP/1.1 200 OK
    Content-Type: application/octet-stream
    Content-Length: 22
    
    Hello, webhdfs user!

See also: offset, length, buffersize, FileSystem.open

Make a Directory

  • Submit a HTTP PUT request.
    curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=MKDIRS[&permission=<OCTAL>]"

    The client receives a response with a boolean JSON object:

    HTTP/1.1 200 OK
    Content-Type: application/json
    Transfer-Encoding: chunked
    
    {"boolean": true}

See also: permission, FileSystem.mkdirs

Create a Symbolic Link

  • Submit a HTTP PUT request.
    curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=CREATESYMLINK
                                  &destination=<PATH>[&createParent=<true|false>]"

    The client receives a response with zero content length:

    HTTP/1.1 200 OK
    Content-Length: 0

See also: destination, createParent, FileSystem.createSymlink

Rename a File/Directory

  • Submit a HTTP PUT request.
    curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=RENAME&destination=<PATH>"

    The client receives a response with a boolean JSON object:

    HTTP/1.1 200 OK
    Content-Type: application/json
    Transfer-Encoding: chunked
    
    {"boolean": true}

See also: destination, FileSystem.rename

Delete a File/Directory

  • Submit a HTTP DELETE request.
    curl -i -X DELETE "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=DELETE
                                  [&recursive=<true|false>]"

    The client receives a response with a boolean JSON object:

    HTTP/1.1 200 OK
    Content-Type: application/json
    Transfer-Encoding: chunked
    
    {"boolean": true}

See also: recursive, FileSystem.delete

Status of a File/Directory

  • Submit a HTTP GET request.
    curl -i  "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=GETFILESTATUS"

    The client receives a response with a FileStatus JSON object:

    HTTP/1.1 200 OK
    Content-Type: application/json
    Transfer-Encoding: chunked
    
    {
      "FileStatus":
      {
        "accessTime"      : 0,
        "blockSize"       : 0,
        "childrenNum"     : 1,
        "fileId"          : 16386,
        "group"           : "supergroup",
        "length"          : 0,             //in bytes, zero for directories
        "modificationTime": 1320173277227,
        "owner"           : "webuser",
        "pathSuffix"      : "",
        "permission"      : "777",
        "replication"     : 0,
        "type"            : "DIRECTORY"    //enum {FILE, DIRECTORY, SYMLINK}
      }
    }

See also: FileSystem.getFileStatus
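
As a rough illustration of consuming this response, the Python sketch below extracts a few fields from a FileStatus body. The function name summarize_status is made up for this example, and treating the timestamps as milliseconds since the Unix epoch is an assumption stated here rather than in this section. Note that the // annotations in the sample above are explanatory only; a real response body is plain JSON, and a JSON parser would reject such comments.

```python
import json
from datetime import datetime, timezone


def summarize_status(body):
    """Pull a few fields out of a GETFILESTATUS response body."""
    status = json.loads(body)["FileStatus"]
    return {
        "is_dir": status["type"] == "DIRECTORY",
        # The permission arrives as an octal string such as "777".
        "mode": int(status["permission"], 8),
        # Assumption: the timestamp is milliseconds since the Unix epoch.
        "modified": datetime.fromtimestamp(
            status["modificationTime"] / 1000, tz=timezone.utc
        ),
        "length": status["length"],
    }
```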

List a Directory

  • Submit a HTTP GET request.
    curl -i  "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=LISTSTATUS"

    The client receives a response with a FileStatuses JSON object:

    HTTP/1.1 200 OK
    Content-Type: application/json
    Content-Length: 427
    
    {
      "FileStatuses":
      {
        "FileStatus":
        [
          {
            "accessTime"      : 1320171722771,
            "blockSize"       : 33554432,
            "childrenNum"     : 0,
            "fileId"          : 16387,
            "group"           : "supergroup",
            "length"          : 24930,
            "modificationTime": 1320171722771,
            "owner"           : "webuser",
            "pathSuffix"      : "a.patch",
            "permission"      : "644",
            "replication"     : 1,
            "type"            : "FILE"
          },
          {
            "accessTime"      : 0,
            "blockSize"       : 0,
            "childrenNum"     : 2,
            "fileId"          : 16388,
            "group"           : "supergroup",
            "length"          : 0,
            "modificationTime": 1320895981256,
            "owner"           : "szetszwo",
            "pathSuffix"      : "bar",
            "permission"      : "711",
            "replication"     : 0,
            "type"            : "DIRECTORY"
          },
          ...
        ]
      }
    }

See also: FileSystem.listStatus

Iteratively List a Directory

  • Submit a HTTP GET request.
    curl -i  "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=LISTSTATUS_BATCH&startAfter=<CHILD>"

    The client receives a response with a DirectoryListing JSON object, which contains a FileStatuses JSON object, as well as iteration information:

    HTTP/1.1 200 OK
    Cache-Control: no-cache
    Expires: Tue, 30 Aug 2016 16:42:16 GMT
    Date: Tue, 30 Aug 2016 16:42:16 GMT
    Pragma: no-cache
    Expires: Tue, 30 Aug 2016 16:42:16 GMT
    Date: Tue, 30 Aug 2016 16:42:16 GMT
    Pragma: no-cache
    Content-Type: application/json
    X-FRAME-OPTIONS: SAMEORIGIN
    Transfer-Encoding: chunked
    Server: Jetty(6.1.26)
    
    {
      "DirectoryListing": {
        "partialListing": {
          "FileStatuses": {
            "FileStatus": [
              {
                "accessTime": 0,
                "blockSize": 0,
                "childrenNum": 0,
                "fileId": 16387,
                "group": "supergroup",
                "length": 0,
                "modificationTime": 1473305882563,
                "owner": "andrew",
                "pathSuffix": "bardir",
                "permission": "755",
                "replication": 0,
                "storagePolicy": 0,
                "type": "DIRECTORY"
              },
              {
                "accessTime": 1473305896945,
                "blockSize": 1024,
                "childrenNum": 0,
                "fileId": 16388,
                "group": "supergroup",
                "length": 0,
                "modificationTime": 1473305896965,
                "owner": "andrew",
                "pathSuffix": "bazfile",
                "permission": "644",
                "replication": 3,
                "storagePolicy": 0,
                "type": "FILE"
              }
            ]
          }
        },
        "remainingEntries": 2
      }
    }

    If remainingEntries is non-zero, there are additional entries in the directory. To query the next batch, set the startAfter parameter to the pathSuffix of the last item returned in the current batch. For example:

    curl -i  "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=LISTSTATUS_BATCH&startAfter=bazfile"

    This returns the next batch of directory entries:

    HTTP/1.1 200 OK
    Cache-Control: no-cache
    Expires: Tue, 30 Aug 2016 16:46:23 GMT
    Date: Tue, 30 Aug 2016 16:46:23 GMT
    Pragma: no-cache
    Expires: Tue, 30 Aug 2016 16:46:23 GMT
    Date: Tue, 30 Aug 2016 16:46:23 GMT
    Pragma: no-cache
    Content-Type: application/json
    X-FRAME-OPTIONS: SAMEORIGIN
    Transfer-Encoding: chunked
    Server: Jetty(6.1.26)
    
    {
      "DirectoryListing": {
        "partialListing": {
          "FileStatuses": {
            "FileStatus": [
              {
                "accessTime": 0,
                "blockSize": 0,
                "childrenNum": 0,
                "fileId": 16386,
                "group": "supergroup",
                "length": 0,
                "modificationTime": 1473305878951,
                "owner": "andrew",
                "pathSuffix": "foodir",
                "permission": "755",
                "replication": 0,
                "storagePolicy": 0,
                "type": "DIRECTORY"
              },
              {
                "accessTime": 1473305902864,
                "blockSize": 1024,
                "childrenNum": 0,
                "fileId": 16389,
                "group": "supergroup",
                "length": 0,
                "modificationTime": 1473305902878,
                "owner": "andrew",
                "pathSuffix": "quxfile",
                "permission": "644",
                "replication": 3,
                "storagePolicy": 0,
                "type": "FILE"
              }
            ]
          }
        },
        "remainingEntries": 0
      }
    }

    Batch size is controlled by the dfs.ls.limit option on the NameNode.

See also: FileSystem.listStatusIterator
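
The iteration described above can be sketched as a Python generator. The fetch_batch callable is hypothetical: it stands for whatever code issues the LISTSTATUS_BATCH request (with startAfter omitted on the first call) and returns the response body as text.

```python
import json


def iter_directory(fetch_batch):
    """Yield every FileStatus entry, requesting batches until none remain.

    fetch_batch(start_after) is a hypothetical callable that performs the
    LISTSTATUS_BATCH request (start_after is None for the first batch) and
    returns the response body as text.
    """
    start_after = None
    while True:
        listing = json.loads(fetch_batch(start_after))["DirectoryListing"]
        entries = listing["partialListing"]["FileStatuses"]["FileStatus"]
        yield from entries
        # Stop when the server reports nothing left; the empty-batch check
        # is a defensive guard against looping forever.
        if listing["remainingEntries"] == 0 or not entries:
            return
        # Resume after the last child returned in this batch.
        start_after = entries[-1]["pathSuffix"]
```

With the two sample responses above, such a generator would request the first batch, then call fetch_batch("bazfile"), and stop once remainingEntries is 0.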

Other File System Operations

Get Content Summary of a Directory

  • Submit a HTTP GET request.
    curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=GETCONTENTSUMMARY"

    The client receives a response with a ContentSummary JSON object:

    HTTP/1.1 200 OK
    Content-Type: application/json
    Transfer-Encoding: chunked
    
    {
      "ContentSummary":
      {
        "directoryCount": 2,
        "fileCount"     : 1,
        "length"        : 24930,
        "quota"         : -1,
        "spaceConsumed" : 24930,
        "spaceQuota"    : -1
      }
    }

See also: FileSystem.getContentSummary

Get File Checksum

  • Submit a HTTP GET request.
    curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=GETFILECHECKSUM"

    The request is redirected to a datanode:

    HTTP/1.1 307 TEMPORARY_REDIRECT
    Location: http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=GETFILECHECKSUM...
    Content-Length: 0

    The client follows the redirect to the datanode and receives a FileChecksum JSON object:

    HTTP/1.1 200 OK
    Content-Type: application/json
    Transfer-Encoding: chunked
    
    {
      "FileChecksum":
      {
        "algorithm": "MD5-of-1MD5-of-512CRC32",
        "bytes"    : "eadb10de24aa315748930df6e185c0d ...",
        "length"   : 28
      }
    }

See also: FileSystem.getFileChecksum

Get Home Directory

  • Submit a HTTP GET request.
    curl -i "http://<HOST>:<PORT>/webhdfs/v1/?op=GETHOMEDIRECTORY"

    The client receives a response with a Path JSON object:

    HTTP/1.1 200 OK
    Content-Type: application/json
    Transfer-Encoding: chunked
    
    {"Path": "/user/szetszwo"}

See also: FileSystem.getHomeDirectory

Get Trash Root

  • Submit a HTTP GET request.
    curl -i "http://<HOST>:<PORT>/webhdfs/v1/?op=GETTRASHROOT"

    The client receives a response with a Path JSON object:

    HTTP/1.1 200 OK
    Content-Type: application/json
    Transfer-Encoding: chunked
    
    {"Path": "/user/username/.Trash"}

    If the path is in an encrypted zone and the user has permission to access it, the client receives a response like this:

    HTTP/1.1 200 OK
    Content-Type: application/json
    Transfer-Encoding: chunked
    
    {"Path": "/PATH/.Trash/username"}

See also: FileSystem.getTrashRoot

For more details about the trash root in an encrypted zone, see the Transparent Encryption Guide.

Set Permission

  • Submit a HTTP PUT request.
    curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=SETPERMISSION
                                  [&permission=<OCTAL>]"

    The client receives a response with zero content length:

    HTTP/1.1 200 OK
    Content-Length: 0

See also: permission, FileSystem.setPermission

Set Owner

  • Submit a HTTP PUT request.
    curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=SETOWNER
                                  [&owner=<USER>][&group=<GROUP>]"

    The client receives a response with zero content length:

    HTTP/1.1 200 OK
    Content-Length: 0

See also: owner, group, FileSystem.setOwner

Set Replication Factor

  • Submit a HTTP PUT request.
    curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=SETREPLICATION
                                  [&replication=<SHORT>]"

    The client receives a response with a boolean JSON object:

    HTTP/1.1 200 OK
    Content-Type: application/json
    Transfer-Encoding: chunked
    
    {"boolean": true}

See also: replication, FileSystem.setReplication

Set Access or Modification Time

  • Submit a HTTP PUT request.
    curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=SETTIMES
                                  [&modificationtime=<TIME>][&accesstime=<TIME>]"

    The client receives a response with zero content length:

    HTTP/1.1 200 OK
    Content-Length: 0

See also: modificationtime, accesstime, FileSystem.setTimes

Modify ACL Entries

  • Submit a HTTP PUT request.
    curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=MODIFYACLENTRIES
                                  &aclspec=<ACLSPEC>"

    The client receives a response with zero content length:

    HTTP/1.1 200 OK
    Content-Length: 0

See also: FileSystem.modifyAclEntries

Remove ACL Entries

  • Submit a HTTP PUT request.
    curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=REMOVEACLENTRIES
                                  &aclspec=<ACLSPEC>"

    The client receives a response with zero content length:

    HTTP/1.1 200 OK
    Content-Length: 0

See also: FileSystem.removeAclEntries

Remove Default ACL

  • Submit a HTTP PUT request.
    curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=REMOVEDEFAULTACL"

    The client receives a response with zero content length:

    HTTP/1.1 200 OK
    Content-Length: 0

See also: FileSystem.removeDefaultAcl

Remove ACL

  • Submit a HTTP PUT request.
    curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=REMOVEACL"

    The client receives a response with zero content length:

    HTTP/1.1 200 OK
    Content-Length: 0

See also: FileSystem.removeAcl

Set ACL

  • Submit a HTTP PUT request.
    curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=SETACL
                                  &aclspec=<ACLSPEC>"

    The client receives a response with zero content length:

    HTTP/1.1 200 OK
    Content-Length: 0

See also: FileSystem.setAcl

Get ACL Status

  • Submit a HTTP GET request.
    curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=GETACLSTATUS"

    The client receives a response with an AclStatus JSON object:

    HTTP/1.1 200 OK
    Content-Type: application/json
    Transfer-Encoding: chunked
    
    {
        "AclStatus": {
            "entries": [
                "user:carla:rw-", 
                "group::r-x"
            ], 
            "group": "supergroup", 
            "owner": "hadoop", 
            "permission":"775",
            "stickyBit": false
        }
    }

See also: FileSystem.getAclStatus
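
The entries in the response are plain strings such as "user:carla:rw-". A minimal Python sketch for splitting them is shown below; the function name parse_acl_entry is invented for this example, and the handling of a leading "default" field is an assumption about default ACL entries that this section does not spell out.

```python
def parse_acl_entry(entry):
    """Split an entry such as "user:carla:rw-" into its parts."""
    fields = entry.split(":")
    # Assumption: default ACL entries carry a leading "default" field.
    is_default = fields[0] == "default"
    if is_default:
        fields = fields[1:]
    if len(fields) != 3:
        raise ValueError("unexpected ACL entry: %r" % entry)
    kind, name, perms = fields
    return {
        "default": is_default,
        "type": kind,
        # An empty name, as in "group::r-x", means no specific name is given.
        "name": name or None,
        "read": "r" in perms,
        "write": "w" in perms,
        "execute": "x" in perms,
    }
```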

Check access

  • Submit a HTTP GET request.
    curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=CHECKACCESS
                                  &fsaction=<FSACTION>"

    The client receives a response with zero content length:

    HTTP/1.1 200 OK
    Content-Length: 0

See also: FileSystem.access

Extended Attributes (XAttrs) Operations

Set XAttr

  • Submit a HTTP PUT request.
    curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=SETXATTR
                                  &xattr.name=<XATTRNAME>&xattr.value=<XATTRVALUE>
                                  &flag=<FLAG>"

    The client receives a response with zero content length:

    HTTP/1.1 200 OK
    Content-Length: 0

See also: FileSystem.setXAttr

Remove XAttr

  • Submit a HTTP PUT request.
    curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=REMOVEXATTR
                                  &xattr.name=<XATTRNAME>"

    The client receives a response with zero content length:

    HTTP/1.1 200 OK
    Content-Length: 0

See also: FileSystem.removeXAttr

Get an XAttr

  • Submit a HTTP GET request.
    curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=GETXATTRS
                                  &xattr.name=<XATTRNAME>&encoding=<ENCODING>"

    The client receives a response with an XAttrs JSON object:

    HTTP/1.1 200 OK
    Content-Type: application/json
    Transfer-Encoding: chunked
    
    {
        "XAttrs": [
            {
                "name":"XATTRNAME",
                "value":"XATTRVALUE"
            }
        ]
    }

See also: FileSystem.getXAttr

Get multiple XAttrs

  • Submit a HTTP GET request.
    curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=GETXATTRS
                                  &xattr.name=<XATTRNAME1>&xattr.name=<XATTRNAME2>
                                  &encoding=<ENCODING>"

    The client receives a response with an XAttrs JSON object:

    HTTP/1.1 200 OK
    Content-Type: application/json
    Transfer-Encoding: chunked
    
    {
        "XAttrs": [
            {
                "name":"XATTRNAME1",
                "value":"XATTRVALUE1"
            },
            {
                "name":"XATTRNAME2",
                "value":"XATTRVALUE2"
            }
        ]
    }

See also: FileSystem.getXAttrs

Get all XAttrs

  • Submit a HTTP GET request.
    curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=GETXATTRS
                                  &encoding=<ENCODING>"

    The client receives a response with an XAttrs JSON object:

    HTTP/1.1 200 OK
    Content-Type: application/json
    Transfer-Encoding: chunked
    
    {
        "XAttrs": [
            {
                "name":"XATTRNAME1",
                "value":"XATTRVALUE1"
            },
            {
                "name":"XATTRNAME2",
                "value":"XATTRVALUE2"
            },
            {
                "name":"XATTRNAME3",
                "value":"XATTRVALUE3"
            }
        ]
    }

See also: FileSystem.getXAttrs

List all XAttrs

  • Submit a HTTP GET request.
    curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=LISTXATTRS"

    The client receives a response with an XAttrNames JSON object:

    HTTP/1.1 200 OK
    Content-Type: application/json
    Transfer-Encoding: chunked
    
    {
        "XAttrNames":"[\"XATTRNAME1\",\"XATTRNAME2\",\"XATTRNAME3\"]"
    }

See also: FileSystem.listXAttrs
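
In the sample response above, the value of XAttrNames is a string that itself contains a JSON array, so a client has to decode twice. A minimal Python sketch, with the function name xattr_names chosen only for this illustration:

```python
import json


def xattr_names(body):
    """Return the attribute names from a LISTXATTRS response body."""
    # The outer document is JSON; XAttrNames is itself a JSON-encoded array
    # carried as a string, so it needs a second decoding pass.
    return json.loads(json.loads(body)["XAttrNames"])
```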

Snapshot Operations

Create Snapshot

  • Submit a HTTP PUT request.
    curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=CREATESNAPSHOT[&snapshotname=<SNAPSHOTNAME>]"

    The client receives a response with a Path JSON object:

    HTTP/1.1 200 OK
    Content-Type: application/json
    Transfer-Encoding: chunked
    
    {"Path": "/user/szetszwo/.snapshot/s1"}

See also: FileSystem.createSnapshot

Delete Snapshot

  • Submit a HTTP DELETE request.
    curl -i -X DELETE "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=DELETESNAPSHOT&snapshotname=<SNAPSHOTNAME>"

    The client receives a response with zero content length:

    HTTP/1.1 200 OK
    Content-Length: 0

See also: FileSystem.deleteSnapshot

Rename Snapshot

  • Submit a HTTP PUT request.
    curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=RENAMESNAPSHOT
                       &oldsnapshotname=<SNAPSHOTNAME>&snapshotname=<SNAPSHOTNAME>"

    The client receives a response with zero content length:

    HTTP/1.1 200 OK
    Content-Length: 0

See also: FileSystem.renameSnapshot

Delegation Token Operations

Get Delegation Token

  • Submit a HTTP GET request.
    curl -i "http://<HOST>:<PORT>/webhdfs/v1/?op=GETDELEGATIONTOKEN&renewer=<USER>&service=<SERVICE>&kind=<KIND>"

    The client receives a response with a Token JSON object:

    HTTP/1.1 200 OK
    Content-Type: application/json
    Transfer-Encoding: chunked
    
    {
      "Token":
      {
        "urlString": "JQAIaG9y..."
      }
    }

See also: renewer, FileSystem.getDelegationToken, kind, service

Get Delegation Tokens

  • Submit a HTTP GET request.
    curl -i "http://<HOST>:<PORT>/webhdfs/v1/?op=GETDELEGATIONTOKENS&renewer=<USER>"

    The client receives a response with a Tokens JSON object:

    HTTP/1.1 200 OK
    Content-Type: application/json
    Transfer-Encoding: chunked
    
    {
      "Tokens":
      {
        "Token":
        [
          {
            "urlString":"KAAKSm9i ..."
          }
        ]
      }
    }

See also: renewer, FileSystem.getDelegationTokens

Renew Delegation Token

  • Submit a HTTP PUT request.
    curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/?op=RENEWDELEGATIONTOKEN&token=<TOKEN>"

    The client receives a response with a long JSON object:

    HTTP/1.1 200 OK
    Content-Type: application/json
    Transfer-Encoding: chunked
    
    {"long": 1320962673997}           //the new expiration time

See also: token, FileSystem.renewDelegationToken

Cancel Delegation Token

  • Submit a HTTP PUT request.
    curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/?op=CANCELDELEGATIONTOKEN&token=<TOKEN>"

    The client receives a response with zero content length:

    HTTP/1.1 200 OK
    Content-Length: 0

See also: token, FileSystem.cancelDelegationToken

Error Responses

When an operation fails, the server may throw an exception. The JSON schema of error responses is defined in RemoteException JSON Schema. The table below shows the mapping from exceptions to HTTP response codes.

HTTP Response Codes

Exceptions                      HTTP Response Codes
IllegalArgumentException        400 Bad Request
UnsupportedOperationException   400 Bad Request
SecurityException               401 Unauthorized
IOException                     403 Forbidden
FileNotFoundException           404 Not Found
RuntimeException                500 Internal Server Error

Below are examples of exception responses.

Illegal Argument Exception

HTTP/1.1 400 Bad Request
Content-Type: application/json
Transfer-Encoding: chunked

{
  "RemoteException":
  {
    "exception"    : "IllegalArgumentException",
    "javaClassName": "java.lang.IllegalArgumentException",
    "message"      : "Invalid value for webhdfs parameter \"permission\": ..."
  }
}

Security Exception

HTTP/1.1 401 Unauthorized
Content-Type: application/json
Transfer-Encoding: chunked

{
  "RemoteException":
  {
    "exception"    : "SecurityException",
    "javaClassName": "java.lang.SecurityException",
    "message"      : "Failed to obtain user group information: ..."
  }
}

Access Control Exception

HTTP/1.1 403 Forbidden
Content-Type: application/json
Transfer-Encoding: chunked

{
  "RemoteException":
  {
    "exception"    : "AccessControlException",
    "javaClassName": "org.apache.hadoop.security.AccessControlException",
    "message"      : "Permission denied: ..."
  }
}

File Not Found Exception

HTTP/1.1 404 Not Found
Content-Type: application/json
Transfer-Encoding: chunked

{
  "RemoteException":
  {
    "exception"    : "FileNotFoundException",
    "javaClassName": "java.io.FileNotFoundException",
    "message"      : "File does not exist: /foo/a.patch"
  }
}
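
A client might turn such responses into exceptions of its own. The Python sketch below shows one way to do this; the WebHdfsError class and the raise_for_error function are hypothetical names for this example, and the fallback for bodies that do not contain a RemoteException object is a design choice of the sketch rather than documented server behaviour.

```python
import json


class WebHdfsError(Exception):
    """Hypothetical client-side wrapper for a RemoteException response."""

    def __init__(self, status, exception, java_class, message):
        super().__init__("%s (HTTP %d): %s" % (exception, status, message))
        self.status = status
        self.exception = exception
        self.java_class = java_class


def raise_for_error(status, body):
    """Raise WebHdfsError when the response carries an error status."""
    if status < 400:
        return
    try:
        remote = json.loads(body)["RemoteException"]
        details = (remote["exception"], remote["javaClassName"], remote["message"])
    except (ValueError, KeyError, TypeError):
        # Not the documented error shape; report the raw status only.
        details = ("UnknownError", None, "no RemoteException in response body")
    raise WebHdfsError(status, *details)
```

Callers can then branch on err.status or err.exception, for example to treat a 404 FileNotFoundException differently from a 403.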

JSON Schemas

All operations, except for OPEN, return either a zero-length response or a JSON response. For OPEN, the response is an octet-stream. The JSON schemas are shown below. See draft-zyp-json-schema-03 for the syntax definitions of the JSON schemas.

Note that the default value of additionalProperties is an empty schema, which allows any value for additional properties. Therefore, all WebHDFS JSON responses allow any additional property. However, if additional properties are included in the responses, they are considered optional properties in order to maintain compatibility.

ACL Status JSON Schema

{
  "name"      : "AclStatus",
  "properties":
  {
    "AclStatus":
    {
      "type"      : "object",
      "properties":
      {
        "entries":
        {
          "type": "array",
          "items":
          {
            "description": "ACL entry.",
            "type": "string"
          }
        },
        "group":
        {
          "description": "The group owner.",
          "type"       : "string",
          "required"   : true
        },
        "owner":
        {
          "description": "The user who is the owner.",
          "type"       : "string",
          "required"   : true
        },
        "stickyBit":
        {
          "description": "True if the sticky bit is on.",
          "type"       : "boolean",
          "required"   : true
        }
      }
    }
  }
}

XAttrs JSON Schema

{
  "name"      : "XAttrs",
  "properties":
  {
    "XAttrs":
    {
      "type"      : "array",
      "items":
      {
        "type"      : "object",
        "properties":
        {
          "name":
          {
            "description": "XAttr name.",
            "type"       : "string",
            "required"   : true
          },
          "value":
          {
            "description": "XAttr value.",
            "type"       : "string"
          }
        }
      }
    }
  }
}

XAttrNames JSON Schema

{
  "name"      : "XAttrNames",
  "properties":
  {
    "XAttrNames":
    {
      "description": "XAttr names.",
      "type"       : "string",
      "required"   : true
    }
  }
}

Boolean JSON Schema

{
  "name"      : "boolean",
  "properties":
  {
    "boolean":
    {
      "description": "A boolean value",
      "type"       : "boolean",
      "required"   : true
    }
  }
}

See also: MKDIRS, RENAME, DELETE, SETREPLICATION

ContentSummary JSON Schema

{
  "name"      : "ContentSummary",
  "properties":
  {
    "ContentSummary":
    {
      "type"      : "object",
      "properties":
      {
        "directoryCount":
        {
          "description": "The number of directories.",
          "type"       : "integer",
          "required"   : true
        },
        "fileCount":
        {
          "description": "The number of files.",
          "type"       : "integer",
          "required"   : true
        },
        "length":
        {
          "description": "The number of bytes used by the content.",
          "type"       : "integer",
          "required"   : true
        },
        "quota":
        {
          "description": "The namespace quota of this directory.",
          "type"       : "integer",
          "required"   : true
        },
        "spaceConsumed":
        {
          "description": "The disk space consumed by the content.",
          "type"       : "integer",
          "required"   : true
        },
        "spaceQuota":
        {
          "description": "The disk space quota.",
          "type"       : "integer",
          "required"   : true
        }
      }
    }
  }
}

See also: GETCONTENTSUMMARY

FileChecksum JSON Schema

{
  "name"      : "FileChecksum",
  "properties":
  {
    "FileChecksum":
    {
      "type"      : "object",
      "properties":
      {
        "algorithm":
        {
          "description": "The name of the checksum algorithm.",
          "type"       : "string",
          "required"   : true
        },
        "bytes":
        {
          "description": "The byte sequence of the checksum in hexadecimal.",
          "type"       : "string",
          "required"   : true
        },
        "length":
        {
          "description": "The length of the bytes (not the length of the string).",
          "type"       : "integer",
          "required"   : true
        }
      }
    }
  }
}

See also: GETFILECHECKSUM
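
The relationship between the "bytes" and "length" properties can be illustrated with a minimal Python sketch. It assumes, following the schema description above, that "length" counts the decoded bytes rather than the hexadecimal characters. The function name and the sample values are hypothetical and are not taken from a real server response.

```python
import json


def checksum_bytes(response_text: str) -> bytes:
    """Decode the hexadecimal "bytes" field and check it against "length"."""
    checksum = json.loads(response_text)["FileChecksum"]
    raw = bytes.fromhex(checksum["bytes"])
    # Assumption: "length" is the number of decoded bytes, so a valid
    # hexadecimal string has exactly 2 * length characters.
    if len(raw) != checksum["length"]:
        raise ValueError("decoded checksum does not match the declared length")
    return raw


# Hypothetical sample response; the algorithm name is a placeholder.
sample = '{"FileChecksum": {"algorithm": "EXAMPLE-ALG", "bytes": "00ff10", "length": 3}}'
raw_checksum = checksum_bytes(sample)
```

A client could compare the decoded bytes from two files to decide whether their contents are likely identical, provided both checksums report the same algorithm.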

FileStatus JSON Schema

{
  "name"      : "FileStatus",
  "properties":
  {
    "FileStatus": fileStatusProperties      //See FileStatus Properties
  }
}

See also: FileStatus Properties, GETFILESTATUS, FileStatus

FileStatus Properties

JavaScript syntax is used to define fileStatusProperties so that it can be referred to in both FileStatus and FileStatuses JSON schemas.

var fileStatusProperties =
{
  "type"      : "object",
  "properties":
  {
    "accessTime":
    {
      "description": "The access time.",
      "type"       : "integer",
      "required"   : true
    },
    "blockSize":
    {
      "description": "The block size of a file.",
      "type"       : "integer",
      "required"   : true
    },
    "childrenNum":
    {
      "description": "The number of children.",
      "type"       : "integer",
      "required"   : true
    },
    "fileId":
    {
      "description": "The inode ID.",
      "type"       : "integer",
      "required"   : true
    },
    "group":
    {
      "description": "The group owner.",
      "type"       : "string",
      "required"   : true
    },
    "length":
    {
      "description": "The number of bytes in a file.",
      "type"       : "integer",
      "required"   : true
    },
    "modificationTime":
    {
      "description": "The modification time.",
      "type"       : "integer",
      "required"   : true
    },
    "owner":
    {
      "description": "The user who is the owner.",
      "type"       : "string",
      "required"   : true
    },
    "pathSuffix":
    {
      "description": "The path suffix.",
      "type"       : "string",
      "required"   : true
    },
    "permission":
    {
      "description": "The permission represented as an octal string.",
      "type"       : "string",
      "required"   : true
    },
    "replication":
    {
      "description": "The number of replicas of a file.",
      "type"       : "integer",
      "required"   : true
    },
    "symlink":                                         //an optional property
    {
      "description": "The link target of a symlink.",
      "type"       : "string"
    },
    "type":
    {
      "description": "The type of the path object.",
      "enum"       : ["FILE", "DIRECTORY", "SYMLINK"],
      "required"   : true
    }
  }
};
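
As an illustration of how a client might consume an object matching fileStatusProperties, the following Python sketch maps a subset of the properties onto a dataclass. The class and function names are hypothetical, only some of the required properties are carried over, and the sample values are invented. The "permission" string is parsed as base 8, and the optional "symlink" property defaults to None.

```python
from dataclasses import dataclass
from typing import Optional

VALID_TYPES = ("FILE", "DIRECTORY", "SYMLINK")


@dataclass
class FileStatus:
    path_suffix: str
    type: str
    length: int
    owner: str
    group: str
    permission: int  # numeric mode parsed from the octal string
    symlink: Optional[str] = None  # optional in the schema


def parse_file_status(obj: dict) -> FileStatus:
    """Build a FileStatus from a decoded JSON object (subset of properties)."""
    if obj["type"] not in VALID_TYPES:
        raise ValueError("unexpected type: " + repr(obj["type"]))
    return FileStatus(
        path_suffix=obj["pathSuffix"],
        type=obj["type"],
        length=obj["length"],
        owner=obj["owner"],
        group=obj["group"],
        permission=int(obj["permission"], 8),
        symlink=obj.get("symlink"),
    )


# Hypothetical sample object.
status = parse_file_status({
    "pathSuffix": "a.txt", "type": "FILE", "length": 24930,
    "owner": "alice", "group": "staff", "permission": "644",
})
```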

FileStatuses JSON Schema

A FileStatuses JSON object represents an array of FileStatus JSON objects.

{
  "name"      : "FileStatuses",
  "properties":
  {
    "FileStatuses":
    {
      "type"      : "object",
      "properties":
      {
        "FileStatus":
        {
          "description": "An array of FileStatus",
          "type"       : "array",
          "items"      : fileStatusProperties      //See FileStatus Properties
        }
      }
    }
  }
}

See also: FileStatus Properties, LISTSTATUS, FileStatus

DirectoryListing JSON Schema

A DirectoryListing JSON object represents a batch of directory entries while iteratively listing a directory. It contains a FileStatuses JSON object as well as iteration information.

{
  "name"      : "DirectoryListing",
  "properties":
  {
    "DirectoryListing":
    {
      "type"      : "object",
      "properties":
      {
        "partialListing":
        {
          "description": "A partial directory listing",
          "type"       : "object", // A FileStatuses object
          "required"   : true
        },
        "remainingEntries":
        {
          "description": "Number of remaining entries",
          "type"       : "integer",
          "required"   : true
        }
      }
    }
  }
}

See also: FileStatuses JSON object, LISTSTATUS_BATCH, FileStatus
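
The nesting described above can be unpacked as in the following Python sketch. It assumes that "partialListing" holds a complete FileStatuses object, as the schema comment indicates. The function name and sample data are hypothetical.

```python
import json


def split_directory_listing(response_text: str):
    """Return (entries, remaining) from a DirectoryListing response body."""
    listing = json.loads(response_text)["DirectoryListing"]
    # partialListing wraps a FileStatuses object, which in turn wraps
    # the FileStatus array.
    entries = listing["partialListing"]["FileStatuses"]["FileStatus"]
    return entries, listing["remainingEntries"]


# Hypothetical sample response with two entries and two more remaining.
sample = json.dumps({
    "DirectoryListing": {
        "partialListing": {
            "FileStatuses": {
                "FileStatus": [
                    {"pathSuffix": "a.txt", "type": "FILE"},
                    {"pathSuffix": "b.txt", "type": "FILE"},
                ]
            }
        },
        "remainingEntries": 2,
    }
})
entries, remaining = split_directory_listing(sample)
```

A client iterating over a large directory would keep requesting batches until the remaining count reaches zero.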

Long JSON Schema

{
  "name"      : "long",
  "properties":
  {
    "long":
    {
      "description": "A long integer value",
      "type"       : "integer",
      "required"   : true
    }
  }
}

See also: RENEWDELEGATIONTOKEN

Path JSON Schema

{
  "name"      : "Path",
  "properties":
  {
    "Path":
    {
      "description": "The string representation of a Path.",
      "type"       : "string",
      "required"   : true
    }
  }
}

See also: GETHOMEDIRECTORY, Path

RemoteException JSON Schema

{
  "name"      : "RemoteException",
  "properties":
  {
    "RemoteException":
    {
      "type"      : "object",
      "properties":
      {
        "exception":
        {
          "description": "Name of the exception",
          "type"       : "string",
          "required"   : true
        },
        "message":
        {
          "description": "Exception message",
          "type"       : "string",
          "required"   : true
        },
        "javaClassName":                                     //an optional property
        {
          "description": "Java class name of the exception",
          "type"       : "string"
        }
      }
    }
  }
}

See also: Error Responses
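
One way a client might surface a RemoteException body is to convert it into a native exception. The Python sketch below does so. The class and function names are hypothetical, and the sample values are illustrative only. The optional "javaClassName" property is read with a default of None.

```python
from typing import Optional


class WebHdfsRemoteError(Exception):
    """Client-side wrapper for a RemoteException JSON object."""

    def __init__(self, exception: str, message: str,
                 java_class_name: Optional[str] = None):
        super().__init__(exception + ": " + message)
        self.exception = exception
        self.message = message
        self.java_class_name = java_class_name


def parse_remote_exception(body: dict) -> WebHdfsRemoteError:
    """Build an error object from a decoded RemoteException response."""
    err = body["RemoteException"]
    return WebHdfsRemoteError(
        err["exception"],
        err["message"],
        err.get("javaClassName"),  # optional in the schema
    )


# Illustrative sample body.
error = parse_remote_exception({
    "RemoteException": {
        "exception": "FileNotFoundException",
        "message": "File does not exist: /foo/a.patch",
        "javaClassName": "java.io.FileNotFoundException",
    }
})
```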

Token JSON Schema

{
  "name"      : "Token",
  "properties":
  {
    "Token": tokenProperties      //See Token Properties
  }
}

See also: Token Properties, GETDELEGATIONTOKEN, the note in Delegation.

Token Properties

JavaScript syntax is used to define tokenProperties so that it can be referred to in both Token and Tokens JSON schemas.

var tokenProperties =
{
  "type"      : "object",
  "properties":
  {
    "urlString":
    {
      "description": "A delegation token encoded as a URL safe string.",
      "type"       : "string",
      "required"   : true
    }
  }
};

Tokens JSON Schema

A Tokens JSON object represents an array of Token JSON objects.

{
  "name"      : "Tokens",
  "properties":
  {
    "Tokens":
    {
      "type"      : "object",
      "properties":
      {
        "Token":
        {
          "description": "An array of Token",
          "type"       : "array",
          "items"      : tokenProperties      //See Token Properties
        }
      }
    }
  }
}

See also: Token Properties, GETDELEGATIONTOKENS, the note in Delegation.

HTTP Query Parameter Dictionary

ACL Spec

Name aclspec
Description The ACL spec included in ACL modification operations.
Type String
Default Value <empty>
Valid Values See Permissions and HDFS.
Syntax See Permissions and HDFS.

XAttr Name

Name xattr.name
Description The XAttr name of a file/directory.
Type String
Default Value <empty>
Valid Values Any string prefixed with "user.", "trusted.", "system.", or "security.".
Syntax Any string prefixed with "user.", "trusted.", "system.", or "security.".

XAttr Value

Name xattr.value
Description The XAttr value of a file/directory.
Type String
Default Value <empty>
Valid Values An encoded value.
Syntax Enclosed in double quotes or prefixed with 0x or 0s.

See also: Extended Attributes
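
The three syntaxes listed for xattr.value can be decoded as in the Python sketch below. It assumes that a double-quoted value is plain text, a 0x prefix marks hexadecimal, and a 0s prefix marks base64, in line with the text, hex, and base64 encodings named under XAttr value encoding. Treating text as UTF-8 and accepting upper-case prefixes are also assumptions. The function name is hypothetical.

```python
import base64


def decode_xattr_value(encoded: str) -> bytes:
    """Decode an XAttr value written in one of the three accepted syntaxes."""
    # Text: the value is the string between the double quotes.
    if len(encoded) >= 2 and encoded[0] == '"' and encoded[-1] == '"':
        return encoded[1:-1].encode("utf-8")  # assumption: UTF-8 text
    prefix = encoded[:2].lower()
    if prefix == "0x":  # hexadecimal
        return bytes.fromhex(encoded[2:])
    if prefix == "0s":  # base64
        return base64.b64decode(encoded[2:])
    raise ValueError("value must be double-quoted or prefixed with 0x or 0s")


# The same three bytes written in each syntax.
as_text = decode_xattr_value('"abc"')
as_hex = decode_xattr_value("0x616263")
as_base64 = decode_xattr_value("0sYWJj")
```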

XAttr set flag

Name flag
Description The XAttr set flag.
Type String
Default Value <empty>
Valid Values CREATE,REPLACE.
Syntax CREATE,REPLACE.

See also: Extended Attributes

XAttr value encoding

Name encoding
Description The XAttr value encoding.
Type String
Default Value <empty>
Valid Values text | hex | base64
Syntax text | hex | base64

See also: Extended Attributes

Access Time

Name accesstime
Description The access time of a file/directory.
Type long
Default Value -1 (means keeping it unchanged)
Valid Values -1 or a timestamp
Syntax Any integer.

See also: SETTIMES

Block Size

Name blocksize
Description The block size of a file.
Type long
Default Value Specified in the configuration.
Valid Values > 0
Syntax Any integer.

See also: CREATE

Buffer Size

Name buffersize
Description The size of the buffer used in transferring data.
Type int
Default Value Specified in the configuration.
Valid Values > 0
Syntax Any integer.

See also: CREATE, APPEND, OPEN

Create Parent

Name createparent
Description If the parent directories do not exist, should they be created?
Type boolean
Default Value false
Valid Values true
Syntax true

See also: CREATESYMLINK

Delegation

Name delegation
Description The delegation token used for authentication.
Type String
Default Value <empty>
Valid Values An encoded token.
Syntax See the note below.

Note that delegation tokens are encoded as a URL safe string; see encodeToUrlString() and decodeFromUrlString(String) in org.apache.hadoop.security.token.Token for the details of the encoding.

See also: Authentication

Destination

Name destination
Description The destination path.
Type Path
Default Value <empty> (an invalid path)
Valid Values An absolute FileSystem path without scheme and authority.
Syntax Any path.

See also: CREATESYMLINK, RENAME

Do As

Name doas
Description Allowing a proxy user to do as another user.
Type String
Default Value null
Valid Values Any valid username.
Syntax Any string.

See also: Proxy Users

Fs Action

Name fsaction
Description File system operation read/write/execute
Type String
Default Value null (an invalid value)
Valid Values Strings matching regex pattern "[r-][w-][x-]"
Syntax "[r-][w-][x-]"

See also: CHECKACCESS
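
A client can validate an fsaction value before sending it. The Python sketch below applies the regex pattern given above and requires the whole string to match. The function name is hypothetical.

```python
import re

# Each position is either its letter or "-", in the fixed order r, w, x.
FSACTION_PATTERN = re.compile(r"[r-][w-][x-]")


def is_valid_fsaction(value: str) -> bool:
    """Return True if value is a three-character read/write/execute string."""
    return FSACTION_PATTERN.fullmatch(value) is not None


read_and_execute_ok = is_valid_fsaction("r-x")
wrong_order_ok = is_valid_fsaction("xwr")
```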

Group

Name group
Description The name of a group.
Type String
Default Value <empty> (means keeping it unchanged)
Valid Values Any valid group name.
Syntax Any string.

See also: SETOWNER

Length

Name length
Description The number of bytes to be processed.
Type long
Default Value null (means the entire file)
Valid Values >= 0 or null
Syntax Any integer.

See also: OPEN

Modification Time

Name modificationtime
Description The modification time of a file/directory.
Type long
Default Value -1 (means keeping it unchanged)
Valid Values -1 or a timestamp
Syntax Any integer.

See also: SETTIMES

Offset

Name offset
Description The starting byte position.
Type long
Default Value 0
Valid Values >= 0
Syntax Any integer.

See also: OPEN

Old Snapshot Name

Name oldsnapshotname
Description The old name of the snapshot to be renamed.
Type String
Default Value null
Valid Values An existing snapshot name.
Syntax Any string.

See also: RENAMESNAPSHOT

Op

Name op
Description The name of the operation to be executed.
Type enum
Default Value null (an invalid value)
Valid Values Any valid operation name.
Syntax Any string.

See also: Operations
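
The query parameters in this dictionary are appended to the HTTP URL format described in FileSystem URIs vs HTTP URLs. The Python sketch below assembles such a URL. The function name, host, port, and path are hypothetical placeholders, and the sketch only builds the string without sending a request.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host: str, port: int, path: str, op: str, **params) -> str:
    """Build http://<HOST>:<HTTP_PORT>/webhdfs/v1/<PATH>?op=... with parameters."""
    query = urlencode({"op": op, **params})
    # Percent-encode the path but keep "/" separators intact.
    encoded_path = quote(path.lstrip("/"))
    return "http://{}:{}/webhdfs/v1/{}?{}".format(host, port, encoded_path, query)


# Placeholder host and port; offset and length are the OPEN parameters
# described in this dictionary.
url = webhdfs_url("namenode.example.com", 9870, "/user/alice/data.txt",
                  "OPEN", offset=0, length=1024)
```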

Overwrite

Name overwrite
Description If a file already exists, should it be overwritten?
Type boolean
Default Value false
Valid Values true
Syntax true

See also: CREATE

Owner

Name owner
Description The username of the owner of a file/directory.
Type String
Default Value <empty> (means keeping it unchanged)
Valid Values Any valid username.
Syntax Any string.

See also: SETOWNER

Permission

Name permission
Description The permission of a file/directory.
Type Octal
Default Value 755
Valid Values 0 - 1777
Syntax Any radix-8 integer (leading zeros may be omitted.)

See also: CREATE, MKDIRS, SETPERMISSION
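
The valid range 0 - 1777 is expressed in octal. A minimal Python sketch of client-side validation follows. The function name is hypothetical, and limiting the input to at most four octal digits is an assumption of this sketch.

```python
import re


def parse_permission(value: str) -> int:
    """Parse a radix-8 permission string and check the range 0 - 1777 (octal)."""
    if re.fullmatch(r"[0-7]{1,4}", value) is None:
        raise ValueError("permission must be 1 to 4 octal digits")
    mode = int(value, 8)
    if mode > 0o1777:
        raise ValueError("permission must not exceed 1777 (octal)")
    return mode


default_mode = parse_permission("755")
```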

Recursive

Name recursive
Description Should the operation act on the content in the subdirectories?
Type boolean
Default Value false
Valid Values true
Syntax true

See also: RENAME

Renewer

Name renewer
Description The username of the renewer of a delegation token.
Type String
Default Value <empty> (means the current user)
Valid Values Any valid username.
Syntax Any string.

See also: GETDELEGATIONTOKEN, GETDELEGATIONTOKENS

Replication

Name replication
Description The number of replicas of a file.
Type short
Default Value Specified in the configuration.
Valid Values > 0
Syntax Any integer.

See also: CREATE, SETREPLICATION

Snapshot Name

Name snapshotname
Description The name of the snapshot to be created/deleted, or the new name for a snapshot rename.
Type String
Default Value null
Valid Values Any valid snapshot name.
Syntax Any string.

See also: CREATESNAPSHOT, DELETESNAPSHOT, RENAMESNAPSHOT

Sources

Name sources
Description A list of source paths.
Type String
Default Value <empty>
Valid Values A list of comma-separated absolute FileSystem paths without scheme and authority.
Syntax Any string.

See also: CONCAT
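
The sources value can be assembled from a list of paths as in the Python sketch below. The function name and sample paths are hypothetical. Rejecting paths that contain a comma is a precaution added in this sketch, since a comma would be indistinguishable from the separator.

```python
from typing import Iterable


def sources_param(paths: Iterable[str]) -> str:
    """Join absolute paths into the comma-separated sources value."""
    checked = []
    for path in paths:
        # An absolute path without scheme and authority starts with "/".
        if not path.startswith("/"):
            raise ValueError("not an absolute path: " + repr(path))
        if "," in path:
            raise ValueError("path contains a comma: " + repr(path))
        checked.append(path)
    return ",".join(checked)


sources = sources_param(["/data/part-0", "/data/part-1"])
```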

Token

Name token
Description The delegation token used for the operation.
Type String
Default Value <empty>
Valid Values An encoded token.
Syntax See the note in Delegation.

See also: RENEWDELEGATIONTOKEN, CANCELDELEGATIONTOKEN

Token Kind

Name kind
Description The kind of the delegation token requested.
Type String
Default Value <empty> (Server sets the default kind for the service)
Valid Values A string that represents the token kind, e.g. "HDFS_DELEGATION_TOKEN" or "WEBHDFS delegation"
Syntax Any string.

See also: GETDELEGATIONTOKEN

Token Service

Name service
Description The name of the service where the token is supposed to be used, e.g. ip:port of the namenode
Type String
Default Value <empty>
Valid Values ip:port in string format or logical name of the service
Syntax Any string.

See also: GETDELEGATIONTOKEN

Username

Name user.name
Description The authenticated user; see Authentication.
Type String
Default Value null
Valid Values Any valid username.
Syntax Any string.

See also: Authentication