Cloudera Manager API: ApiHdfsCloudReplicationArguments

Replication arguments for HDFS.

Properties
name	data type	description
sourceAccount	string
destinationAccount	string
Properties inherited from ApiHdfsReplicationArguments
sourceService	ApiServiceRef	The service to replicate from.
sourcePath	string	The path to replicate.
destinationPath	string	The destination to replicate to.
mapreduceServiceName	string	The MapReduce service to use for the replication job.
schedulerPoolName	string	Name of the scheduler pool to use when submitting the MapReduce job. Currently supports the capacity and fair schedulers. The option is ignored if a different scheduler is configured.
userName	string	The user that will execute the MapReduce job. Required if running with Kerberos enabled.
sourceUser	string	The user that will perform operations on the source cluster. Required if running with Kerberos enabled.
numMaps	number	The number of mappers to use for the MapReduce replication job.
dryRun	boolean	Whether to perform a dry run. Defaults to false.
bandwidthPerMap	number	The maximum bandwidth (in MB) per mapper in the MapReduce replication job.
abortOnError	boolean	Whether to abort on a replication failure. Defaults to false.
removeMissingFiles	boolean	Whether to delete destination files that are missing in source. Defaults to false.
preserveReplicationCount	boolean	Whether to preserve the HDFS replication count. Defaults to false.
preserveBlockSize	boolean	Whether to preserve the HDFS block size. Defaults to false.
preservePermissions	boolean	Whether to preserve the HDFS owner, group, and permissions. Defaults to false. Starting from V10, it also preserves ACLs. Defaults to null (do not preserve). ACLs are preserved if both clusters enable ACL support, and replication ignores any ACL-related failures.
logPath	string	The HDFS path where the replication log files should be written to.
skipChecksumChecks	boolean	Whether to skip checksum based file validation during replication. Defaults to false.
skipListingChecksumChecks	boolean	Whether to skip checksum based file comparison during replication. Defaults to false.
skipTrash	boolean	Whether to permanently delete destination files that are missing in source. Defaults to null.
replicationStrategy	ReplicationStrategy	The strategy for distributing the file replication tasks among the mappers of the MR job associated with a replication. Default is ReplicationStrategy#STATIC.
preserveXAttrs	boolean	Whether to preserve XAttrs. Defaults to false. Introduced in V10. To preserve XAttrs, both CDH versions must be >= 5.2. Replication fails if either cluster does not support XAttrs.
exclusionFilters	array of string	Regular expression strings that are matched against the full paths of files and directories matching the source paths; matches are excluded from the replication. Optional. Available since V11.
raiseSnapshotDiffFailures	boolean	Flag indicating whether failures during snapshotDiff should be raised or ignored. When set to false, replication falls back to a full copy listing if any error occurs in snapshot diff handling, and it ignores snapshot delete/rename failures at the end of a replication. The flag is false by default in the distcp tool, which means it ignores snapshot diff failures and marks the replication as successful despite snapshot delete/rename failures. In the UI, the flag is set to true by default when the source CM version is greater than 5.14.
deleteLatestSourceSnapshotOnJobFailure	boolean	A flag configuring distcp behavior when the distcp MapReduce job fails. Such a failure is sometimes caused by issues with the snapshot. By default, if the MapReduce job fails, the latest source-side snapshot is deleted. Set this flag to false to keep this replication from deleting the last successfully replicated old snapshot of the source dataset when the job fails. Defaults to true. Leaving the flag unset is equivalent to true.
numFetchThreads	number	The number of threads to use for fetching file statuses from HDFS during source file listing. A value of 0 disables parallel fetching. A null value uses the distcp defaults.
destinationCloudAccount	string	The cloud account name used in direct Hive cloud replication, if specified.
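
As an illustration of how the properties above fit together, the following Python sketch assembles an arguments object as a plain dictionary. The helper name, the paths, and the account name are hypothetical placeholders, not values from this reference. Dropping unset options so that the documented defaults apply is an assumption about how omitted keys are handled.

```python
import json


def build_hdfs_cloud_replication_args(source_path, destination_path, **optional):
    """Assemble a JSON-serializable dict keyed by the documented property names.

    Hypothetical helper for illustration only; it is not part of any
    Cloudera client library.
    """
    args = {
        "sourcePath": source_path,
        "destinationPath": destination_path,
    }
    # Assumption: an omitted key behaves like null, so the defaults apply.
    args.update({name: value for name, value in optional.items() if value is not None})
    return args


payload = build_hdfs_cloud_replication_args(
    "/data/in",                 # hypothetical source path
    "/data/out",                # hypothetical destination path
    sourceAccount="my-account", # hypothetical account name
    dryRun=True,                # validate the setup without copying
    numMaps=20,
    bandwidthPerMap=100,        # MB per mapper
    removeMissingFiles=False,
    numFetchThreads=None,       # unset: left out, so distcp defaults are used
)

print(json.dumps(payload, indent=2))
```

Starting with dryRun set to true is a cautious way to try out a new combination of options, in particular removeMissingFiles and skipTrash, which delete files on the destination.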

Example

{
  "sourceAccount" : "...",
  "destinationAccount" : "...",
  "sourceService" : {
    "peerName" : "...",
    "clusterName" : "...",
    "clusterDisplayName" : "...",
    "serviceName" : "...",
    "serviceDisplayName" : "...",
    "serviceType" : "..."
  },
  "sourcePath" : "...",
  "destinationPath" : "...",
  "mapreduceServiceName" : "...",
  "schedulerPoolName" : "...",
  "userName" : "...",
  "sourceUser" : "...",
  "numMaps" : 12345,
  "dryRun" : true,
  "bandwidthPerMap" : 12345,
  "abortOnError" : true,
  "removeMissingFiles" : true,
  "preserveReplicationCount" : true,
  "preserveBlockSize" : true,
  "preservePermissions" : true,
  "logPath" : "...",
  "skipChecksumChecks" : true,
  "skipListingChecksumChecks" : true,
  "skipTrash" : true,
  "replicationStrategy" : "STATIC",
  "preserveXAttrs" : true,
  "exclusionFilters" : [ "...", "..." ],
  "raiseSnapshotDiffFailures" : true,
  "deleteLatestSourceSnapshotOnJobFailure" : true,
  "numFetchThreads" : 12345,
  "destinationCloudAccount" : "..."
}
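
A body shaped like the example above can be checked against the data types in the property table before it is submitted. The sketch below is a minimal client-side check, assuming that "number" accepts any JSON number and that null or absent properties need no check. It covers only a subset of the properties, and the function name is hypothetical.

```python
import json

NUMBER = (int, float)

# Expected JSON types for a subset of the documented properties.
EXPECTED_TYPES = {
    "sourceAccount": str,
    "destinationAccount": str,
    "sourcePath": str,
    "destinationPath": str,
    "numMaps": NUMBER,
    "bandwidthPerMap": NUMBER,
    "numFetchThreads": NUMBER,
    "dryRun": bool,
    "skipTrash": bool,
    "preserveXAttrs": bool,
    "exclusionFilters": list,
}


def find_type_errors(args):
    """Return the names of properties whose value has an unexpected type."""
    errors = []
    for name, expected in EXPECTED_TYPES.items():
        value = args.get(name)
        if value is None:  # absent or null: nothing to check
            continue
        # bool is a subclass of int in Python, so reject it for number fields.
        if expected is NUMBER and isinstance(value, bool):
            errors.append(name)
        elif not isinstance(value, expected):
            errors.append(name)
        # exclusionFilters is documented as an array of string.
        elif expected is list and not all(isinstance(item, str) for item in value):
            errors.append(name)
    return errors


sample = json.loads(
    '{"sourcePath": "...", "numMaps": 12345, "dryRun": true,'
    ' "exclusionFilters": ["...", "..."]}'
)
print(find_type_errors(sample))

# The same body with two values of the wrong type.
bad = dict(sample, numMaps="12", dryRun="yes")
print(find_type_errors(bad))
```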
