Replication arguments for HDFS.
name | data type | description |
---|---|---|
sourceService | ApiServiceRef | The service to replicate from. |
sourcePath | string | The path to replicate. |
destinationPath | string | The destination to replicate to. |
mapreduceServiceName | string | The MapReduce service to use for the replication job. |
schedulerPoolName | string | Name of the scheduler pool to use when submitting the MapReduce job. Currently supports the capacity and fair schedulers. The option is ignored if a different scheduler is configured. |
userName | string | The user who will execute the MapReduce job. Required if running with Kerberos enabled. |
sourceUser | string | The user who will perform operations on the source cluster. Required if running with Kerberos enabled. |
numMaps | number | The number of mappers to use for the MapReduce replication job. |
dryRun | boolean | Whether to perform a dry run. Defaults to false. |
bandwidthPerMap | number | The maximum bandwidth (in MB) per mapper in the MapReduce replication job. |
abortOnError | boolean | Whether to abort on a replication failure. Defaults to false. |
removeMissingFiles | boolean | Whether to delete destination files that are missing in source. Defaults to false. |
preserveReplicationCount | boolean | Whether to preserve the HDFS replication count. Defaults to false. |
preserveBlockSize | boolean | Whether to preserve the HDFS block size. Defaults to false. |
preservePermissions | boolean | Whether to preserve the HDFS owner, group, and permissions. Defaults to false. Starting from V10, it also preserves ACLs, and the default is null (no preservation). ACLs are preserved only if both clusters enable ACL support, and replication ignores any ACL-related failures. |
logPath | string | The HDFS path where the replication log files should be written to. |
skipChecksumChecks | boolean | Whether to skip checksum based file validation during replication. Defaults to false. |
skipListingChecksumChecks | boolean | Whether to skip checksum based file comparison during replication. Defaults to false. |
skipTrash | boolean | Whether to permanently delete destination files that are missing in source. Defaults to null. |
replicationStrategy | ReplicationStrategy | The strategy for distributing the file replication tasks among the mappers of the MR job associated with a replication. Default is ReplicationStrategy#STATIC. |
preserveXAttrs | boolean | Whether to preserve XAttrs. Defaults to false. Introduced in V10. To preserve XAttrs, both CDH versions must be >= 5.2. Replication fails if either cluster does not support XAttrs. |
exclusionFilters | array of string | Regular expressions matched against the full paths of files and directories under the source path. Matching entries are excluded from the replication. Optional. Available since V11. |
raiseSnapshotDiffFailures | boolean |
Example

    {
      "sourceService" : {
        "peerName" : "...",
        "clusterName" : "...",
        "serviceName" : "..."
      },
      "sourcePath" : "...",
      "destinationPath" : "...",
      "mapreduceServiceName" : "...",
      "schedulerPoolName" : "...",
      "userName" : "...",
      "sourceUser" : "...",
      "numMaps" : 12345,
      "dryRun" : true,
      "bandwidthPerMap" : 12345,
      "abortOnError" : true,
      "removeMissingFiles" : true,
      "preserveReplicationCount" : true,
      "preserveBlockSize" : true,
      "preservePermissions" : true,
      "logPath" : "...",
      "skipChecksumChecks" : true,
      "skipListingChecksumChecks" : true,
      "skipTrash" : true,
      "replicationStrategy" : "STATIC",
      "preserveXAttrs" : true,
      "exclusionFilters" : [ "...", "..." ],
      "raiseSnapshotDiffFailures" : true
    }
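
The following is a minimal sketch of how a client might assemble this argument object before sending it, using only the Python standard library. The helper name `build_hdfs_replication_args`, the `kerberos_enabled` flag, and all sample values (`peer1`, `cluster1`, `hdfs1`, the paths, the filter pattern) are hypothetical and are not part of the API. The sketch only builds and serializes the JSON body; it does not call any endpoint. It applies the one rule stated in the table: `userName` and `sourceUser` are required when Kerberos is enabled.

```python
import json
from typing import Any, Dict, List, Optional


def build_hdfs_replication_args(
    source_service: Dict[str, str],
    source_path: str,
    destination_path: str,
    mapreduce_service_name: str,
    *,
    user_name: Optional[str] = None,
    source_user: Optional[str] = None,
    kerberos_enabled: bool = False,  # local flag for this sketch, not an API field
    exclusion_filters: Optional[List[str]] = None,
    **extra: Any,  # any other documented field, e.g. dryRun=True, numMaps=4
) -> Dict[str, Any]:
    """Assemble an HDFS replication argument object as a plain dict."""
    # The table marks both users as required when Kerberos is enabled.
    if kerberos_enabled and not (user_name and source_user):
        raise ValueError("userName and sourceUser are required with Kerberos")

    args: Dict[str, Any] = {
        "sourceService": source_service,
        "sourcePath": source_path,
        "destinationPath": destination_path,
        "mapreduceServiceName": mapreduce_service_name,
    }

    # Omit optional fields that were not supplied so server defaults apply.
    optional = {
        "userName": user_name,
        "sourceUser": source_user,
        "exclusionFilters": exclusion_filters,
    }
    args.update({key: value for key, value in optional.items() if value is not None})
    args.update(extra)
    return args


if __name__ == "__main__":
    payload = build_hdfs_replication_args(
        {"peerName": "peer1", "clusterName": "cluster1", "serviceName": "hdfs1"},
        "/data/in",
        "/data/out",
        "mapreduce1",
        exclusion_filters=[r".*/_tmp/.*"],  # hypothetical pattern
        dryRun=True,
    )
    print(json.dumps(payload, indent=2))
```

Leaving unset optional fields out of the body, rather than sending explicit nulls, is a design choice of this sketch so that the documented defaults take effect.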