ApiHiveCloudReplicationArguments Data Model

Replication arguments for Hive services.

Properties
name data type description
exportFilesPrefix string Previously, every Hive external cloud replication schedule wrote its export files under the same names in the given cloud root path. As a result, multiple Hive external replications with the same cloud root path could not be executed concurrently.
This field lifts that restriction: it contains a prefix that is prepended to the name of every export file. By default, it is set to a UUID for the replication schedule.
Note that this feature is available only if the feature flag HIVE_ALLOW_CONCURRENT_REPLICATION_WITH_SAME_CLOUD_ROOT_PATH is enabled on both the source and the destination. It is NOT supported for cloud-backup-only and cloud-restore-only replications.
For instance, if cloudRootPath == s3a://bucket/warehouse/external/my_db and exportFilesPrefix == ab58c108, then after a successful replication the following export files will be in the cloud:
  • s3a://bucket/warehouse/external/my_db/ab58c108export.json
  • s3a://bucket/warehouse/external/my_db/ab58c108.export.json.meta
  • s3a://bucket/warehouse/external/my_db/ab58c108sentry-export.json
On the next replication run, a given schedule uses the same UUID as before, so the same export files are overwritten in the cloud. This way the cloud root path is not polluted.
sourceAccount string
destinationAccount string
cloudRootPath string
replicationOption ReplicationOption
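
The exportFilesPrefix description implies that two schedules target the same export files when they share both a cloud root path and a prefix. The following is a minimal client-side sketch of that check, not part of the Cloudera Manager API: the function name, the schedule names, the trailing-slash normalization, and treating a missing prefix as an empty string are all assumptions made for illustration.

```python
from collections import defaultdict


def find_export_collisions(schedules):
    """Group schedule names by (cloudRootPath, exportFilesPrefix).

    `schedules` maps a hypothetical schedule name to a dict shaped like
    ApiHiveCloudReplicationArguments. Only groups with more than one
    schedule are returned, i.e. those that would share export files.
    """
    groups = defaultdict(list)
    for name, args in schedules.items():
        # Assumption: a trailing slash does not change the root path.
        root = args["cloudRootPath"].rstrip("/")
        # Assumption: a prefix missing from the local dict is treated as "".
        prefix = args.get("exportFilesPrefix", "")
        groups[(root, prefix)].append(name)
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


schedules = {
    "nightly": {"cloudRootPath": "s3a://bucket/warehouse/external/my_db",
                "exportFilesPrefix": "ab58c108"},
    "hourly": {"cloudRootPath": "s3a://bucket/warehouse/external/my_db/",
               "exportFilesPrefix": "ab58c108"},
    "weekly": {"cloudRootPath": "s3a://bucket/warehouse/external/my_db",
               "exportFilesPrefix": "c0ffee42"},
}
collisions = find_export_collisions(schedules)
print(collisions)
```

Here "nightly" and "hourly" are reported together because they share a root path and a prefix, while "weekly" uses a different prefix and is left out.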
Properties inherited from ApiHiveReplicationArguments
sourceService ApiServiceRef The service to replicate from.
tableFilters array of ApiHiveTable Filters for tables to include in the replication. Optional. If not provided, include all tables in all databases.
exportDir string Directory where the export file will be saved, in the HDFS service that stores the target Hive service's data. Optional. If not provided, Cloudera Manager will pick a directory for storing the data.
force boolean Whether to force overwriting of mismatched tables. Defaults to false.
replicateData boolean Whether to replicate table data stored in HDFS. Defaults to false.

If set to true, the "hdfsArguments" property must be set to configure the HDFS replication job.

hdfsArguments ApiHdfsReplicationArguments Arguments for the HDFS replication job.

This must be provided when choosing to replicate table data stored in HDFS. The "sourceService", "sourcePath" and "dryRun" properties of the HDFS arguments are ignored; their values are derived from the Hive replication's information.

The "destinationPath" property is used slightly differently from the usual HDFS replication jobs. It is used to map the root path of the source service into the target service. It may be omitted, in which case the source and target paths will match.

Example: if the destination path is set to "/new_root", a "/foo/bar" path in the source will be stored in "/new_root/foo/bar" in the target.

replicateImpalaMetadata boolean Whether to replicate the Impala metadata (i.e., the metadata for Impala UDFs and their corresponding binaries in HDFS).
runInvalidateMetadata boolean Whether to run the INVALIDATE METADATA query.
dryRun boolean Whether to perform a dry run. Defaults to false.
numThreads number Number of threads to use in the multi-threaded export/import phase.
sentryMigration boolean
skipUrlPermissions boolean Whether skipUrlPermissions is enabled.
atlasReplicationNeeded boolean
sentryExportProperties map of string Additional properties to add or override in authorization-migration-site.xml for Sentry export, on the source.
rangerImportProperties map of string Additional properties to add or override in authorization-migration-site.xml for Ranger import, on the destination.
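
The "destinationPath" mapping described under hdfsArguments can be illustrated with a short sketch. It only mirrors the documented example ("/new_root" plus "/foo/bar"); the function name is hypothetical, and the actual path handling inside Cloudera Manager may differ.

```python
import posixpath


def map_to_target(source_path, destination_path=None):
    """Map an absolute source path under the destination root.

    If no destination path is given, source and target paths match.
    """
    if not destination_path:
        return source_path
    # Strip the leading "/" so join() treats the source path as relative.
    return posixpath.join(destination_path, source_path.lstrip("/"))


print(map_to_target("/foo/bar", "/new_root"))
print(map_to_target("/foo/bar"))
```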

Example

{
  "exportFilesPrefix" : "...",
  "sourceAccount" : "...",
  "destinationAccount" : "...",
  "cloudRootPath" : "...",
  "replicationOption" : "METADATA_AND_DATA",
  "sourceService" : {
    "peerName" : "...",
    "clusterName" : "...",
    "clusterDisplayName" : "...",
    "serviceName" : "...",
    "serviceDisplayName" : "...",
    "serviceType" : "..."
  },
  "tableFilters" : [ {
    "database" : "...",
    "tableName" : "..."
  }, {
    "database" : "...",
    "tableName" : "..."
  } ],
  "exportDir" : "...",
  "force" : true,
  "replicateData" : true,
  "hdfsArguments" : {
    "sourceService" : {
      "peerName" : "...",
      "clusterName" : "...",
      "clusterDisplayName" : "...",
      "serviceName" : "...",
      "serviceDisplayName" : "...",
      "serviceType" : "..."
    },
    "sourcePath" : "...",
    "destinationPath" : "...",
    "mapreduceServiceName" : "...",
    "schedulerPoolName" : "...",
    "userName" : "...",
    "sourceUser" : "...",
    "numMaps" : 12345,
    "dryRun" : true,
    "bandwidthPerMap" : 12345,
    "abortOnError" : true,
    "removeMissingFiles" : true,
    "preserveReplicationCount" : true,
    "preserveBlockSize" : true,
    "preservePermissions" : true,
    "logPath" : "...",
    "skipChecksumChecks" : true,
    "skipListingChecksumChecks" : true,
    "skipTrash" : true,
    "replicationStrategy" : "STATIC",
    "preserveXAttrs" : true,
    "exclusionFilters" : [ "...", "..." ],
    "raiseSnapshotDiffFailures" : true,
    "deleteLatestSourceSnapshotOnJobFailure" : true,
    "numFetchThreads" : 12345,
    "destinationCloudAccount" : "..."
  },
  "replicateImpalaMetadata" : true,
  "runInvalidateMetadata" : true,
  "dryRun" : true,
  "numThreads" : 12345,
  "sentryMigration" : true,
  "skipUrlPermissions" : true,
  "atlasReplicationNeeded" : true,
  "sentryExportProperties" : {
    "property1" : "...",
    "property2" : "..."
  },
  "rangerImportProperties" : {
    "property1" : "...",
    "property2" : "..."
  }
}
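
The rule that "hdfsArguments" must accompany "replicateData", and that some HDFS properties are ignored, can be checked before a payload is submitted. The sketch below is an illustrative client-side check under stated assumptions: the function name and the message texts are hypothetical, and the server may apply additional validation that is not shown here.

```python
import json

# Properties of hdfsArguments that the descriptions above say are ignored.
IGNORED_HDFS_KEYS = ("sourceService", "sourcePath", "dryRun")


def check_hive_arguments(args):
    """Return (errors, warnings) for a dict shaped like the example above."""
    errors, warnings = [], []
    hdfs = args.get("hdfsArguments")
    # replicateData defaults to false, so a missing key means no data copy.
    if args.get("replicateData", False) and not hdfs:
        errors.append("replicateData is true but hdfsArguments is missing")
    for key in IGNORED_HDFS_KEYS:
        if hdfs and key in hdfs:
            warnings.append(f"hdfsArguments.{key} is ignored for Hive replication")
    return errors, warnings


args = {
    "cloudRootPath": "s3a://bucket/warehouse/external/my_db",
    "replicateData": True,
    "dryRun": True,
}
errors, warnings = check_hive_arguments(args)
print(errors)

# Adding hdfsArguments clears the error; the ignored key yields a warning.
args["hdfsArguments"] = {"destinationPath": "/new_root", "dryRun": False}
errors, warnings = check_hive_arguments(args)
print(errors, warnings)
print(json.dumps(args, indent=2))
```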