<html><body>
<style>

body, h1, h2, h3, div, span, p, pre, a {
  margin: 0;
  padding: 0;
  border: 0;
  font-weight: inherit;
  font-style: inherit;
  font-size: 100%;
  font-family: inherit;
  vertical-align: baseline;
}

body {
  font-size: 13px;
  padding: 1em;
}

h1 {
  font-size: 26px;
  margin-bottom: 1em;
}

h2 {
  font-size: 24px;
  margin-bottom: 1em;
}

h3 {
  font-size: 20px;
  margin-bottom: 1em;
  margin-top: 1em;
}

pre, code {
  line-height: 1.5;
  font-family: Monaco, 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Lucida Console', monospace;
}

pre {
  margin-top: 0.5em;
}

h1, h2, h3, p {
  font-family: Arial, sans-serif;
}

h1, h2, h3 {
  border-bottom: solid #CCC 1px;
}

.toc_element {
  margin-top: 0.5em;
}

.firstline {
  margin-left: 2em;
}

.method {
  margin-top: 1em;
  border: solid 1px #CCC;
  padding: 1em;
  background: #EEE;
}

.details {
  font-weight: bold;
  font-size: 14px;
}

</style>

<h1><a href="dataflow_v1b3.html">Dataflow API</a> . <a href="dataflow_v1b3.projects.html">projects</a> . <a href="dataflow_v1b3.projects.templates.html">templates</a></h1>
<h2>Instance Methods</h2>
<p class="toc_element">
  <code><a href="#create">create(projectId, body, x__xgafv=None)</a></code></p>
<p class="firstline">Creates a Cloud Dataflow job from a template.</p>
<p class="toc_element">
  <code><a href="#get">get(projectId, gcsPath=None, location=None, x__xgafv=None, view=None)</a></code></p>
<p class="firstline">Get the template associated with a template.</p>
<p class="toc_element">
  <code><a href="#launch">launch(projectId, body, dynamicTemplate_gcsPath=None, x__xgafv=None, dynamicTemplate_stagingLocation=None, location=None, gcsPath=None, validateOnly=None)</a></code></p>
<p class="firstline">Launch a template.</p>
<h3>Method Details</h3>
<div class="method">
    <code class="details" id="create">create(projectId, body, x__xgafv=None)</code>
<pre>Creates a Cloud Dataflow job from a template.

Args:
  projectId: string, Required. The ID of the Cloud Platform project that the job belongs to. (required)
  body: object, The request body. (required)
    The object takes the form of:

{ # A request to create a Cloud Dataflow job from a template.
  "environment": { # The environment values to set at runtime. # The runtime environment for the job.
    "machineType": "A String", # The machine type to use for the job. Defaults to the value from the
        # template if not specified.
    "network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
        # the service will use the network "default".
    "zone": "A String", # The Compute Engine [availability
        # zone](https://cloud.google.com/compute/docs/regions-zones/regions-zones)
        # for launching worker instances to run your pipeline.
    "additionalUserLabels": { # Additional user labels to be specified for the job.
        # Keys and values should follow the restrictions specified in the [labeling
        # restrictions](https://cloud.google.com/compute/docs/labeling-resources#restrictions)
        # page.
      "a_key": "A String",
    },
    "additionalExperiments": [ # Additional experiment flags for the job.
      "A String",
    ],
    "bypassTempDirValidation": True or False, # Whether to bypass the safety checks for the job's temporary directory.
        # Use with caution.
    "tempLocation": "A String", # The Cloud Storage path to use for temporary files.
        # Must be a valid Cloud Storage URL, beginning with `gs://`.
    "serviceAccountEmail": "A String", # The email address of the service account to run the job as.
    "numWorkers": 42, # The initial number of Google Compute Engine instances for the job.
    "maxWorkers": 42, # The maximum number of Google Compute Engine instances to be made
        # available to your pipeline during execution, from 1 to 1000.
    "subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
        # the form "regions/REGION/subnetworks/SUBNETWORK".
  },
  "gcsPath": "A String", # Required. A Cloud Storage path to the template from which to
      # create the job.
      # Must be a valid Cloud Storage URL, beginning with `gs://`.
  "location": "A String", # The [regional endpoint]
      # (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) to
      # which to direct the request.
  "parameters": { # The runtime parameters to pass to the job.
    "a_key": "A String",
  },
  "jobName": "A String", # Required. The job name to use for the created job.
}
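
Example (a minimal sketch, not part of the generated reference). In Python the
request body above is passed as a plain dict. The helper name build_create_body
and its argument names are hypothetical; it checks only the constraints stated
in the field comments above (the gs:// prefix, maxWorkers from 1 to 1000, and
the documented subnetwork form) and assumes nothing about how the service
itself validates requests.

```python
import re

# Hypothetical helper for illustration; not part of the Dataflow client library.
# Pattern for the documented form "regions/REGION/subnetworks/SUBNETWORK".
_SUBNETWORK_RE = re.compile(r"regions/[^/]+/subnetworks/[^/]+")


def build_create_body(job_name, gcs_path, temp_location=None,
                      max_workers=None, subnetwork=None, parameters=None):
    """Assemble a request body dict, checking the documented constraints."""
    if not job_name:
        raise ValueError("jobName is required")
    if not gcs_path.startswith("gs://"):
        raise ValueError("gcsPath must begin with gs://")

    environment = {}
    if temp_location is not None:
        if not temp_location.startswith("gs://"):
            raise ValueError("tempLocation must begin with gs://")
        environment["tempLocation"] = temp_location
    if max_workers is not None:
        # Documented range is 1 to 1000 inclusive.
        if max_workers not in range(1, 1001):
            raise ValueError("maxWorkers must be between 1 and 1000")
        environment["maxWorkers"] = max_workers
    if subnetwork is not None:
        if _SUBNETWORK_RE.fullmatch(subnetwork) is None:
            raise ValueError("subnetwork must look like regions/REGION/subnetworks/SUBNETWORK")
        environment["subnetwork"] = subnetwork

    body = {"jobName": job_name, "gcsPath": gcs_path}
    if parameters:
        body["parameters"] = dict(parameters)
    if environment:
        body["environment"] = environment
    return body
```

The returned dict would then be supplied as the body argument of
create(projectId, body, x__xgafv=None). Building the service object and
supplying credentials are outside the scope of this sketch.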

  x__xgafv: string, V1 error format.
    Allowed values
      1 - v1 error format
      2 - v2 error format

Returns:
  An object of the form:

{ # Defines a job to be run by the Cloud Dataflow service.
|
|
"labels": { # User-defined labels for this job.
|
|
#
|
|
# The labels map can contain no more than 64 entries. Entries of the labels
|
|
# map are UTF8 strings that comply with the following restrictions:
|
|
#
|
|
# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
|
|
# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
|
|
# * Both keys and values are additionally constrained to be <= 128 bytes in
|
|
# size.
|
|
"a_key": "A String",
|
|
},
|
"jobMetadata": { # Metadata available primarily for filtering jobs. Will be included in the
# ListJob response and Job SUMMARY view.
# This field is populated by the Dataflow service to support filtering jobs
# by the metadata values provided here. Populated for ListJobs and all GetJob
# views SUMMARY and higher.
|
|
"sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
|
|
"versionDisplayName": "A String", # A readable string describing the version of the SDK.
|
|
"version": "A String", # The version of the SDK used to run the job.
|
|
"sdkSupportStatus": "A String", # The support status for this SDK version.
|
|
},
|
|
"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
|
|
{ # Metadata for a PubSub connector used by the job.
|
|
"topic": "A String", # Topic accessed in the connection.
|
|
"subscription": "A String", # Subscription used in the connection.
|
|
},
|
|
],
|
|
"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
|
|
{ # Metadata for a Datastore connector used by the job.
|
|
"projectId": "A String", # ProjectId accessed in the connection.
|
|
"namespace": "A String", # Namespace used in the connection.
|
|
},
|
|
],
|
|
"fileDetails": [ # Identification of a File source used in the Dataflow job.
|
|
{ # Metadata for a File connector used by the job.
|
|
"filePattern": "A String", # File Pattern used to access files by the connector.
|
|
},
|
|
],
|
|
"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
|
|
{ # Metadata for a Spanner connector used by the job.
|
|
"instanceId": "A String", # InstanceId accessed in the connection.
|
|
"projectId": "A String", # ProjectId accessed in the connection.
|
|
"databaseId": "A String", # DatabaseId accessed in the connection.
|
|
},
|
|
],
|
|
"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
|
|
{ # Metadata for a BigTable connector used by the job.
|
|
"instanceId": "A String", # InstanceId accessed in the connection.
|
|
"projectId": "A String", # ProjectId accessed in the connection.
|
|
"tableId": "A String", # TableId accessed in the connection.
|
|
},
|
|
],
|
|
"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
|
|
{ # Metadata for a BigQuery connector used by the job.
|
|
"projectId": "A String", # Project accessed in the connection.
|
|
"dataset": "A String", # Dataset accessed in the connection.
|
|
"table": "A String", # Table accessed in the connection.
|
|
"query": "A String", # Query used to access data in the connection.
|
|
},
|
|
],
|
|
},
|
"pipelineDescription": { # A descriptive representation of submitted pipeline as well as the executed
# form. This data is provided by the Dataflow service for ease of visualizing
# the pipeline and interpreting Dataflow provided metrics.
# Preliminary field: The format of this data may change at any time.
# A description of the user pipeline and stages through which it is executed.
# Created by Cloud Dataflow service. Only retrieved with
# JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
|
|
"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
|
|
{ # Description of the type, names/ids, and input/outputs for a transform.
|
|
"kind": "A String", # Type of transform.
|
|
"name": "A String", # User provided name for this transform instance.
|
|
"inputCollectionName": [ # User names for all collection inputs to this transform.
|
|
"A String",
|
|
],
|
|
"displayData": [ # Transform-specific display data.
|
|
{ # Data provided with a pipeline or transform to provide descriptive info.
|
|
"shortStrValue": "A String", # A possible additional shorter value to display.
|
|
# For example a java_class_name_value of com.mypackage.MyDoFn
|
|
# will be stored with MyDoFn as the short_str_value and
|
|
# com.mypackage.MyDoFn as the java_class_name value.
|
|
# short_str_value can be displayed and java_class_name_value
|
|
# will be displayed as a tooltip.
|
|
"durationValue": "A String", # Contains value if the data is of duration type.
|
|
"url": "A String", # An optional full URL.
|
|
"floatValue": 3.14, # Contains value if the data is of float type.
|
|
"namespace": "A String", # The namespace for the key. This is usually a class name or programming
|
|
# language namespace (e.g. a Python module) which defines the display data.
|
|
# This allows a dax monitoring system to specially handle the data
|
|
# and perform custom rendering.
|
|
"javaClassValue": "A String", # Contains value if the data is of java class type.
|
|
"label": "A String", # An optional label to display in a dax UI for the element.
|
|
"boolValue": True or False, # Contains value if the data is of a boolean type.
|
|
"strValue": "A String", # Contains value if the data is of string type.
|
|
"key": "A String", # The key identifying the display data.
|
|
# This is intended to be used as a label for the display data
|
|
# when viewed in a dax monitoring system.
|
|
"int64Value": "A String", # Contains value if the data is of int64 type.
|
|
"timestampValue": "A String", # Contains value if the data is of timestamp type.
|
|
},
|
|
],
|
|
"outputCollectionName": [ # User names for all collection outputs to this transform.
|
|
"A String",
|
|
],
|
|
"id": "A String", # SDK generated id of this transform instance.
|
|
},
|
|
],
|
|
"executionPipelineStage": [ # Description of each stage of execution of the pipeline.
|
|
{ # Description of the composing transforms, names/ids, and input/outputs of a
|
|
# stage of execution. Some composing transforms and sources may have been
|
|
# generated by the Dataflow service during execution planning.
|
|
"componentSource": [ # Collections produced and consumed by component transforms of this stage.
|
|
{ # Description of an interstitial value between transforms in an execution
|
|
# stage.
|
|
"userName": "A String", # Human-readable name for this transform; may be user or system generated.
|
|
"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
|
|
# source is most closely associated.
|
|
"name": "A String", # Dataflow service generated name for this source.
|
|
},
|
|
],
|
"kind": "A String", # Type of transform this stage is executing.
|
|
"name": "A String", # Dataflow service generated name for this stage.
|
|
"outputSource": [ # Output sources for this stage.
|
|
{ # Description of an input or output of an execution stage.
|
|
"userName": "A String", # Human-readable name for this source; may be user or system generated.
|
|
"sizeBytes": "A String", # Size of the source, if measurable.
|
|
"name": "A String", # Dataflow service generated name for this source.
|
|
"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
|
|
# source is most closely associated.
|
|
},
|
|
],
|
|
"inputSource": [ # Input sources for this stage.
|
|
{ # Description of an input or output of an execution stage.
|
|
"userName": "A String", # Human-readable name for this source; may be user or system generated.
|
|
"sizeBytes": "A String", # Size of the source, if measurable.
|
|
"name": "A String", # Dataflow service generated name for this source.
|
|
"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
|
|
# source is most closely associated.
|
|
},
|
|
],
|
|
"componentTransform": [ # Transforms that comprise this execution stage.
|
|
{ # Description of a transform executed as part of an execution stage.
|
|
"userName": "A String", # Human-readable name for this transform; may be user or system generated.
|
|
"originalTransform": "A String", # User name for the original user transform with which this transform is
|
|
# most closely associated.
|
|
"name": "A String", # Dataflow service generated name for this source.
|
|
},
|
|
],
|
|
"id": "A String", # Dataflow service generated id for this stage.
|
|
},
|
|
],
|
|
"displayData": [ # Pipeline level display data.
|
|
{ # Data provided with a pipeline or transform to provide descriptive info.
|
|
"shortStrValue": "A String", # A possible additional shorter value to display.
|
|
# For example a java_class_name_value of com.mypackage.MyDoFn
|
|
# will be stored with MyDoFn as the short_str_value and
|
|
# com.mypackage.MyDoFn as the java_class_name value.
|
|
# short_str_value can be displayed and java_class_name_value
|
|
# will be displayed as a tooltip.
|
|
"durationValue": "A String", # Contains value if the data is of duration type.
|
|
"url": "A String", # An optional full URL.
|
|
"floatValue": 3.14, # Contains value if the data is of float type.
|
|
"namespace": "A String", # The namespace for the key. This is usually a class name or programming
|
|
# language namespace (e.g. a Python module) which defines the display data.
|
|
# This allows a dax monitoring system to specially handle the data
|
|
# and perform custom rendering.
|
|
"javaClassValue": "A String", # Contains value if the data is of java class type.
|
|
"label": "A String", # An optional label to display in a dax UI for the element.
|
|
"boolValue": True or False, # Contains value if the data is of a boolean type.
|
|
"strValue": "A String", # Contains value if the data is of string type.
|
|
"key": "A String", # The key identifying the display data.
|
|
# This is intended to be used as a label for the display data
|
|
# when viewed in a dax monitoring system.
|
|
"int64Value": "A String", # Contains value if the data is of int64 type.
|
|
"timestampValue": "A String", # Contains value if the data is of timestamp type.
|
|
},
|
|
],
|
|
},
|
|
"stageStates": [ # This field may be mutated by the Cloud Dataflow service;
|
|
# callers cannot mutate it.
|
|
{ # A message describing the state of a particular execution stage.
|
|
"executionStageName": "A String", # The name of the execution stage.
|
"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
|
|
"currentStateTime": "A String", # The time at which the stage transitioned to this state.
|
|
},
|
|
],
|
|
"id": "A String", # The unique ID of this job.
|
|
#
|
|
# This field is set by the Cloud Dataflow service when the Job is
|
|
# created, and is immutable for the life of the job.
|
|
"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
|
|
# `JOB_STATE_UPDATED`), this field contains the ID of that job.
|
|
"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
|
|
"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
|
|
# corresponding name prefixes of the new job.
|
|
"a_key": "A String",
|
|
},
|
|
"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
|
|
"version": { # A structure describing which components and their versions of the service
|
|
# are required in order to run the job.
|
|
"a_key": "", # Properties of the object.
|
|
},
|
|
"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
|
|
"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
|
|
# at rest, AKA a Customer Managed Encryption Key (CMEK).
|
|
#
|
|
# Format:
|
|
# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
|
|
"internalExperiments": { # Experimental settings.
|
|
"a_key": "", # Properties of the object. Contains field @type with type URL.
|
|
},
|
|
"dataset": "A String", # The dataset for the current project where various workflow
|
|
# related tables are stored.
|
|
#
|
|
# The supported resource type is:
|
|
#
|
|
# Google BigQuery:
|
|
# bigquery.googleapis.com/{dataset}
|
|
"experiments": [ # The list of experiments to enable.
|
|
"A String",
|
|
],
|
|
"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
|
|
"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
|
|
# options are passed through the service and are used to recreate the
|
|
# SDK pipeline options on the worker in a language agnostic and platform
|
|
# independent way.
|
|
"a_key": "", # Properties of the object.
|
|
},
|
|
"userAgent": { # A description of the process that generated the request.
|
|
"a_key": "", # Properties of the object.
|
|
},
|
|
"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
|
|
# unspecified, the service will attempt to choose a reasonable
|
|
# default. This should be in the form of the API service name,
|
|
# e.g. "compute.googleapis.com".
|
|
"workerPools": [ # The worker pools. At least one "harness" worker pool must be
|
|
# specified in order for the job to have workers.
|
|
{ # Describes one particular pool of Cloud Dataflow workers to be
|
|
# instantiated by the Cloud Dataflow service in order to perform the
|
|
# computations required by a job. Note that a workflow job may use
|
|
# multiple pools, in order to match the various computational
|
|
# requirements of the various stages of the job.
|
|
"diskSourceImage": "A String", # Fully qualified source image for disks.
|
|
"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
|
|
# using the standard Dataflow task runner. Users should ignore
|
|
# this field.
|
|
"workflowFileName": "A String", # The file to store the workflow in.
|
|
"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
|
|
# will not be uploaded.
|
|
#
|
|
# The supported resource type is:
|
|
#
|
|
# Google Cloud Storage:
|
|
# storage.googleapis.com/{bucket}/{object}
|
|
# bucket.storage.googleapis.com/{object}
|
|
"commandlinesFileName": "A String", # The file to store preprocessing commands in.
|
|
"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
|
|
"reportingEnabled": True or False, # Whether to send work progress updates to the service.
|
|
"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
|
|
# "shuffle/v1beta1".
|
|
"workerId": "A String", # The ID of the worker running this pipeline.
|
|
"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
|
|
#
|
|
# When workers access Google Cloud APIs, they logically do so via
|
|
# relative URLs. If this field is specified, it supplies the base
|
|
# URL to use for resolving these relative URLs. The normative
|
|
# algorithm used is defined by RFC 1808, "Relative Uniform Resource
|
|
# Locators".
|
|
#
|
|
# If not specified, the default value is "http://www.googleapis.com/"
|
|
"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
|
|
# "dataflow/v1b3/projects".
|
|
"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
|
|
# storage.
|
|
#
|
|
# The supported resource type is:
|
|
#
|
|
# Google Cloud Storage:
|
|
#
|
|
# storage.googleapis.com/{bucket}/{object}
|
|
# bucket.storage.googleapis.com/{object}
|
|
},
|
|
"vmId": "A String", # The ID string of the VM.
|
|
"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
|
|
"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
|
|
"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
|
|
# access the Cloud Dataflow API.
|
|
"A String",
|
|
],
|
|
"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
|
|
# taskrunner; e.g. "root".
|
|
"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
|
|
#
|
|
# When workers access Google Cloud APIs, they logically do so via
|
|
# relative URLs. If this field is specified, it supplies the base
|
|
# URL to use for resolving these relative URLs. The normative
|
|
# algorithm used is defined by RFC 1808, "Relative Uniform Resource
|
|
# Locators".
|
|
#
|
|
# If not specified, the default value is "http://www.googleapis.com/"
|
|
"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
|
|
# taskrunner; e.g. "wheel".
|
|
"languageHint": "A String", # The suggested backend language.
|
|
"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
|
|
# console.
|
|
"streamingWorkerMainClass": "A String", # The streaming worker main class name.
|
|
"logDir": "A String", # The directory on the VM to store logs.
|
|
"dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
|
|
"harnessCommand": "A String", # The command to launch the worker harness.
|
|
"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
|
|
# temporary storage.
|
|
#
|
|
# The supported resource type is:
|
|
#
|
|
# Google Cloud Storage:
|
|
# storage.googleapis.com/{bucket}/{object}
|
|
# bucket.storage.googleapis.com/{object}
|
|
"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
|
|
},
|
|
"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
|
|
# are supported.
|
|
"packages": [ # Packages to be installed on workers.
|
|
{ # The packages that must be installed in order for a worker to run the
|
|
# steps of the Cloud Dataflow job that will be assigned to its worker
|
|
# pool.
|
|
#
|
|
# This is the mechanism by which the Cloud Dataflow SDK causes code to
|
|
# be loaded onto the workers. For example, the Cloud Dataflow Java SDK
|
|
# might use this to install jars containing the user's code and all of the
|
|
# various dependencies (libraries, data files, etc.) required in order
|
|
# for that code to run.
|
|
"location": "A String", # The resource to read the package from. The supported resource type is:
|
|
#
|
|
# Google Cloud Storage:
|
|
#
|
|
# storage.googleapis.com/{bucket}
|
|
# bucket.storage.googleapis.com/
|
|
"name": "A String", # The name of the package.
|
|
},
|
|
],
|
|
"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
|
|
# service will attempt to choose a reasonable default.
|
|
"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
|
|
# the service will use the network "default".
|
|
"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
|
|
# will attempt to choose a reasonable default.
|
|
"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
|
|
# attempt to choose a reasonable default.
|
"teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.
|
|
# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
|
|
# `TEARDOWN_NEVER`.
|
|
# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
|
|
# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
|
|
# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
|
|
# down.
|
|
#
|
|
# If the workers are not torn down by the service, they will
|
|
# continue to run and use Google Compute Engine VM resources in the
|
|
# user's project until they are explicitly terminated by the user.
|
|
# Because of this, Google recommends using the `TEARDOWN_ALWAYS`
|
|
# policy except for small, manually supervised test jobs.
|
|
#
|
|
# If unknown or unspecified, the service will attempt to choose a reasonable
|
|
# default.
|
|
"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
|
|
# Compute Engine API.
|
|
"ipConfiguration": "A String", # Configuration for VM IPs.
|
|
"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
|
|
# service will choose a number of threads (according to the number of cores
|
|
# on the selected machine type for batch, or 1 by convention for streaming).
|
|
"poolArgs": { # Extra arguments for this worker pool.
|
|
"a_key": "", # Properties of the object. Contains field @type with type URL.
|
|
},
|
|
"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
|
|
# execute the job. If zero or unspecified, the service will
|
|
# attempt to choose a reasonable default.
|
|
"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
|
|
# harness, residing in Google Container Registry.
|
|
"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
|
|
# the form "regions/REGION/subnetworks/SUBNETWORK".
|
|
"dataDisks": [ # Data disks that are used by a VM in this workflow.
|
|
{ # Describes the data disk used by a workflow job.
|
|
"mountPoint": "A String", # Directory in a VM where disk is mounted.
|
|
"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
|
|
# attempt to choose a reasonable default.
|
|
"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
|
|
# must be a disk type appropriate to the project and zone in which
|
|
# the workers will run. If unknown or unspecified, the service
|
|
# will attempt to choose a reasonable default.
|
|
#
|
|
# For example, the standard persistent disk type is a resource name
|
|
# typically ending in "pd-standard". If SSD persistent disks are
|
|
# available, the resource name typically ends with "pd-ssd". The
|
|
# actual valid values are defined by the Google Compute Engine API,
|
|
# not by the Cloud Dataflow API; consult the Google Compute Engine
|
|
# documentation for more information about determining the set of
|
|
# available disk types for a particular project and zone.
|
|
#
|
|
# Google Compute Engine Disk types are local to a particular
|
|
# project in a particular zone, and so the resource name will
|
|
# typically look something like this:
|
|
#
|
|
# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
|
|
},
|
|
],
|
|
"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
|
|
"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
|
|
"algorithm": "A String", # The algorithm to use for autoscaling.
|
|
},
|
|
"defaultPackageSet": "A String", # The default package set to install. This allows the service to
|
|
# select a default set of packages which are useful to worker
|
|
# harnesses written in a particular language.
|
|
"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
|
|
# attempt to choose a reasonable default.
|
|
"metadata": { # Metadata to set on the Google Compute Engine VMs.
|
|
"a_key": "A String",
|
|
},
|
|
},
|
|
],
|
|
"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
|
|
# storage. The system will append the suffix "/temp-{JOBNAME}" to
|
|
# this resource prefix, where {JOBNAME} is the value of the
|
|
# job_name field. The resulting bucket and object prefix is used
|
|
# as the prefix of the resources used to store temporary data
|
|
# needed during the job execution. NOTE: This will override the
|
|
# value in taskrunner_settings.
|
|
# The supported resource type is:
|
|
#
|
|
# Google Cloud Storage:
|
|
#
|
|
# storage.googleapis.com/{bucket}/{object}
|
|
# bucket.storage.googleapis.com/{object}
|
|
},
|
|
"location": "A String", # The [regional endpoint]
|
|
# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
|
|
# contains this job.
|
|
"tempFiles": [ # A set of files the system should be aware of that are used
|
|
# for temporary storage. These temporary files will be
|
|
# removed on job completion.
|
|
# No duplicates are allowed.
|
|
# No file patterns are supported.
|
|
#
|
|
# The supported files are:
|
|
#
|
|
# Google Cloud Storage:
|
|
#
|
|
# storage.googleapis.com/{bucket}/{object}
|
|
# bucket.storage.googleapis.com/{object}
|
|
"A String",
|
|
],
|
|
"type": "A String", # The type of Cloud Dataflow job.
|
|
"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
|
|
# If this field is set, the service will ensure its uniqueness.
|
|
# The request to create a job will fail if the service has knowledge of a
|
|
# previously submitted job with the same client's ID and job name.
|
|
# The caller may use this field to ensure idempotence of job
|
|
# creation across retried attempts to create a job.
|
|
# By default, the field is empty and, in that case, the service ignores it.
|
|
"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
|
|
# snapshot.
|
|
"stepsLocation": "A String", # The GCS location where the steps are stored.
|
|
"currentStateTime": "A String", # The timestamp associated with the current state.
|
|
"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
|
|
# Flexible resource scheduling jobs are started with some delay after job
|
|
# creation, so start_time is unset before start and is updated when the
|
|
# job is started by the Cloud Dataflow service. For other jobs, start_time
|
|
# always equals create_time and is immutable and set by the Cloud Dataflow
|
|
# service.
|
|
"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
|
|
# Cloud Dataflow service.
|
|
"requestedState": "A String", # The job's requested state.
|
|
#
|
|
# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
|
|
# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
|
|
# also be used to directly set a job's requested state to
|
|
# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
|
|
# job if it has not already reached a terminal state.
|
|
"name": "A String", # The user-specified Cloud Dataflow job name.
|
|
#
|
|
# Only one Job with a given name may exist in a project at any
|
|
# given time. If a caller attempts to create a Job with the same
|
|
# name as an already-existing Job, the attempt returns the
|
|
# existing Job.
|
|
#
|
|
# The name must match the regular expression
|
|
# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
|
|
"steps": [ # Exactly one of steps or steps_location should be specified.
|
|
#
|
|
# The top-level steps that constitute the entire job.
|
|
{ # Defines a particular step within a Cloud Dataflow job.
|
|
#
|
|
# A job consists of multiple steps, each of which performs some
|
|
# specific operation as part of the overall job. Data is typically
|
|
# passed from one step to another as part of the job.
|
|
#
|
|
# Here's an example of a sequence of steps which together implement a
|
|
# Map-Reduce job:
|
|
#
|
|
# * Read a collection of data from some source, parsing the
|
|
# collection's elements.
|
|
#
|
|
# * Validate the elements.
|
|
#
|
|
# * Apply a user-defined function to map each element to some value
|
|
# and extract an element-specific key value.
|
|
#
|
|
# * Group elements with the same key into a single element with
|
|
# that key, transforming a multiply-keyed collection into a
|
|
# uniquely-keyed collection.
|
|
#
|
|
# * Write the elements out to some data sink.
|
|
#
|
|
# Note that the Cloud Dataflow service may be used to run many different
|
|
# types of jobs, not just Map-Reduce.
|
|
"kind": "A String", # The kind of step in the Cloud Dataflow job.
|
|
"properties": { # Named properties associated with the step. Each kind of
|
|
# predefined step has its own required set of properties.
|
|
# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
|
|
"a_key": "", # Properties of the object.
|
|
},
|
|
"name": "A String", # The name that identifies the step. This must be unique for each
|
|
# step with respect to all other steps in the Cloud Dataflow job.
|
|
},
|
|
],
|
|
"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
|
|
# of the job it replaced.
|
|
#
|
|
# When sending a `CreateJobRequest`, you can update a job by specifying it
|
|
# here. The job named here is stopped, and its intermediate state is
|
|
# transferred to this job.
|
|
"currentState": "A String", # The current state of the job.
|
|
#
|
|
# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
|
|
# specified.
|
|
#
|
|
# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
|
|
# terminal state. After a job has reached a terminal state, no
|
|
# further state updates may be made.
|
|
#
|
|
# This field may be mutated by the Cloud Dataflow service;
|
|
# callers cannot mutate it.
|
|
"executionInfo": { # Deprecated. Additional information about how a Cloud Dataflow job will be
|
|
# executed that isn't contained in the submitted job.
|
|
"stages": { # A mapping from each stage to the information about that stage.
|
|
"a_key": { # Contains information about how a particular
|
|
# google.dataflow.v1beta3.Step will be executed.
|
|
"stepName": [ # The steps associated with the execution stage.
|
|
# Note that stages may have several steps, and that a given step
|
|
# might be run by more than one stage.
|
|
"A String",
|
|
],
|
|
},
|
|
},
|
|
},
|
|
}</pre>
|
|
</div>
|
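<p>The job <code>name</code> field above must match the regular expression <code>[a-z]([-a-z0-9]{0,38}[a-z0-9])?</code>. A minimal sketch of a client-side check follows. The helper name <code>is_valid_job_name</code> is hypothetical and not part of the API, and the sketch assumes the pattern must match the whole name.</p>

```python
import re

# Pattern quoted from the Job "name" field description above.
JOB_NAME_PATTERN = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def is_valid_job_name(name: str) -> bool:
    """Return True if the whole string matches the documented pattern.

    Assumption: the service anchors the pattern at both ends, so
    fullmatch() is used instead of search().
    """
    return JOB_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("wordcount-2024", "Wordcount", "job-"):
        print(candidate, is_valid_job_name(candidate))
```

<p>Checking the name before calling <code>create</code> or <code>launch</code> may save a round trip, but the service remains the authority on what it accepts.</p>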
|
|
|
<div class="method">
|
|
<code class="details" id="get">get(projectId, gcsPath=None, location=None, x__xgafv=None, view=None)</code>
|
|
<pre>Get the metadata associated with a template.
|
|
|
|
Args:
|
|
projectId: string, Required. The ID of the Cloud Platform project that the job belongs to. (required)
|
|
gcsPath: string, Required. A Cloud Storage path to the template from which to
|
|
create the job.
|
|
Must be a valid Cloud Storage URL, beginning with 'gs://'.
|
|
location: string, The [regional endpoint]
|
|
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) to
|
|
which to direct the request.
|
|
x__xgafv: string, V1 error format.
|
|
Allowed values
|
|
1 - v1 error format
|
|
2 - v2 error format
|
|
view: string, The view to retrieve. Defaults to METADATA_ONLY.
|
|
|
|
Returns:
|
|
An object of the form:
|
|
|
|
{ # The response to a GetTemplate request.
|
|
"status": { # The status of the get template request. Any problems with the
|
|
# request will be indicated in the error_details.
|
|
# The `Status` type defines a logical error model that is suitable for
|
|
# different programming environments, including REST APIs and RPC APIs. It is
|
|
# used by [gRPC](https://github.com/grpc). The error model is designed to be:
|
|
#
|
|
# - Simple to use and understand for most users
|
|
# - Flexible enough to meet unexpected needs
|
|
#
|
|
# # Overview
|
|
#
|
|
# The `Status` message contains three pieces of data: error code, error
|
|
# message, and error details. The error code should be an enum value of
|
|
# google.rpc.Code, but it may accept additional error codes if needed. The
|
|
# error message should be a developer-facing English message that helps
|
|
# developers *understand* and *resolve* the error. If a localized user-facing
|
|
# error message is needed, put the localized message in the error details or
|
|
# localize it in the client. The optional error details may contain arbitrary
|
|
# information about the error. There is a predefined set of error detail types
|
|
# in the package `google.rpc` that can be used for common error conditions.
|
|
#
|
|
# # Language mapping
|
|
#
|
|
# The `Status` message is the logical representation of the error model, but it
|
|
# is not necessarily the actual wire format. When the `Status` message is
|
|
# exposed in different client libraries and different wire protocols, it can be
|
|
# mapped differently. For example, it will likely be mapped to some exceptions
|
|
# in Java, but more likely mapped to some error codes in C.
|
|
#
|
|
# # Other uses
|
|
#
|
|
# The error model and the `Status` message can be used in a variety of
|
|
# environments, either with or without APIs, to provide a
|
|
# consistent developer experience across different environments.
|
|
#
|
|
# Example uses of this error model include:
|
|
#
|
|
# - Partial errors. If a service needs to return partial errors to the client,
|
|
# it may embed the `Status` in the normal response to indicate the partial
|
|
# errors.
|
|
#
|
|
# - Workflow errors. A typical workflow has multiple steps. Each step may
|
|
# have a `Status` message for error reporting.
|
|
#
|
|
# - Batch operations. If a client uses batch request and batch response, the
|
|
# `Status` message should be used directly inside batch response, one for
|
|
# each error sub-response.
|
|
#
|
|
# - Asynchronous operations. If an API call embeds asynchronous operation
|
|
# results in its response, the status of those operations should be
|
|
# represented directly using the `Status` message.
|
|
#
|
|
# - Logging. If some API errors are stored in logs, the message `Status` could
|
|
# be used directly after any stripping needed for security/privacy reasons.
|
|
"message": "A String", # A developer-facing error message, which should be in English. Any
|
|
# user-facing error message should be localized and sent in the
|
|
# google.rpc.Status.details field, or localized by the client.
|
|
"code": 42, # The status code, which should be an enum value of google.rpc.Code.
|
|
"details": [ # A list of messages that carry the error details. There is a common set of
|
|
# message types for APIs to use.
|
|
{
|
|
"a_key": "", # Properties of the object. Contains field @type with type URL.
|
|
},
|
|
],
|
|
},
|
|
"metadata": { # Metadata describing a template. # The template metadata describing the template name, available
|
|
# parameters, etc.
|
|
"name": "A String", # Required. The name of the template.
|
|
"parameters": [ # The parameters for the template.
|
|
{ # Metadata for a specific parameter.
|
|
"regexes": [ # Optional. Regexes that the parameter must match.
|
|
"A String",
|
|
],
|
|
"helpText": "A String", # Required. The help text to display for the parameter.
|
|
"name": "A String", # Required. The name of the parameter.
|
|
"isOptional": True or False, # Optional. Whether the parameter is optional. Defaults to false.
|
|
"label": "A String", # Required. The label to display for the parameter.
|
|
},
|
|
],
|
|
"description": "A String", # Optional. A description of the template.
|
|
},
|
|
}</pre>
|
|
</div>
|
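<p>The <code>metadata</code> object returned by <code>get</code> lists each template parameter with its <code>name</code>, <code>isOptional</code> flag and optional <code>regexes</code>. The sketch below shows one way a caller might pre-check a set of parameter values against that metadata before launching. The function <code>missing_or_invalid</code> and the sample metadata are hypothetical, and it is an assumption that each regex must match the whole value.</p>

```python
import re


def missing_or_invalid(metadata: dict, values: dict) -> list:
    """Return a list of problems found when checking `values`
    against the template's parameter metadata."""
    problems = []
    for param in metadata.get("parameters", []):
        name = param["name"]
        if name not in values:
            # isOptional defaults to false, so an absent flag means required.
            if not param.get("isOptional", False):
                problems.append(f"missing required parameter: {name}")
            continue
        for pattern in param.get("regexes", []):
            # Assumption: the whole value must match each regex.
            if re.fullmatch(pattern, values[name]) is None:
                problems.append(f"{name} does not match {pattern}")
    return problems


# Hypothetical fragment shaped like the "metadata" object documented above.
example_metadata = {
    "name": "Example template",
    "parameters": [
        {"name": "inputFile", "label": "Input", "helpText": "File to read.",
         "regexes": ["^gs://.+$"]},
        {"name": "shards", "label": "Shards", "helpText": "Shard count.",
         "isOptional": True, "regexes": ["^[0-9]+$"]},
    ],
}

if __name__ == "__main__":
    print(missing_or_invalid(example_metadata, {"inputFile": "/tmp/x"}))
```

<p>An empty list means no problems were found locally. The service may still reject a request for reasons this check does not cover.</p>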
|
|
|
<div class="method">
|
|
<code class="details" id="launch">launch(projectId, body, dynamicTemplate_gcsPath=None, x__xgafv=None, dynamicTemplate_stagingLocation=None, location=None, gcsPath=None, validateOnly=None)</code>
|
|
<pre>Launch a template.
|
|
|
|
Args:
|
|
projectId: string, Required. The ID of the Cloud Platform project that the job belongs to. (required)
|
|
body: object, The request body. (required)
|
|
The object takes the form of:
|
|
|
|
{ # Parameters to provide to the template being launched.
|
|
"environment": { # The environment values to set at runtime. # The runtime environment for the job.
|
|
"machineType": "A String", # The machine type to use for the job. Defaults to the value from the
|
|
# template if not specified.
|
|
"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
|
|
# the service will use the network "default".
|
|
"zone": "A String", # The Compute Engine [availability
|
|
# zone](https://cloud.google.com/compute/docs/regions-zones/regions-zones)
|
|
# for launching worker instances to run your pipeline.
|
|
"additionalUserLabels": { # Additional user labels to be specified for the job.
|
|
# Keys and values should follow the restrictions specified in the [labeling
|
|
# restrictions](https://cloud.google.com/compute/docs/labeling-resources#restrictions)
|
|
# page.
|
|
"a_key": "A String",
|
|
},
|
|
"additionalExperiments": [ # Additional experiment flags for the job.
|
|
"A String",
|
|
],
|
|
"bypassTempDirValidation": True or False, # Whether to bypass the safety checks for the job's temporary directory.
|
|
# Use with caution.
|
|
"tempLocation": "A String", # The Cloud Storage path to use for temporary files.
|
|
# Must be a valid Cloud Storage URL, beginning with `gs://`.
|
|
"serviceAccountEmail": "A String", # The email address of the service account to run the job as.
|
|
"numWorkers": 42, # The initial number of Google Compute Engine instances for the job.
|
|
"maxWorkers": 42, # The maximum number of Google Compute Engine instances to be made
|
|
# available to your pipeline during execution, from 1 to 1000.
|
|
"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
|
|
# the form "regions/REGION/subnetworks/SUBNETWORK".
|
|
},
|
|
"parameters": { # The runtime parameters to pass to the job.
|
|
"a_key": "A String",
|
|
},
|
|
"jobName": "A String", # Required. The job name to use for the created job.
|
|
}
|
|
|
|
dynamicTemplate_gcsPath: string, Path to dynamic template spec file on GCS.
|
|
The file must be a JSON-serialized DynamicTemplateFieSpec object.
|
|
x__xgafv: string, V1 error format.
|
|
Allowed values
|
|
1 - v1 error format
|
|
2 - v2 error format
|
|
dynamicTemplate_stagingLocation: string, Cloud Storage path for staging dependencies.
|
|
Must be a valid Cloud Storage URL, beginning with `gs://`.
|
|
location: string, The [regional endpoint]
|
|
(https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) to
|
|
which to direct the request.
|
|
gcsPath: string, A Cloud Storage path to the template from which to create
|
|
the job.
|
|
Must be a valid Cloud Storage URL, beginning with 'gs://'.
|
|
validateOnly: boolean, If true, the request is validated but not actually executed.
|
|
Defaults to false.
|
|
|
|
Returns:
|
|
An object of the form:
|
|
|
|
{ # Response to the request to launch a template.
|
|
"job": { # Defines a job to be run by the Cloud Dataflow service. # The job that was launched, if the request was not a dry run and
|
|
# the job was successfully launched.
|
|
"labels": { # User-defined labels for this job.
|
|
#
|
|
# The labels map can contain no more than 64 entries. Entries of the labels
|
|
# map are UTF8 strings that comply with the following restrictions:
|
|
#
|
|
# * Keys must conform to regexp: \p{Ll}\p{Lo}{0,62}
|
|
# * Values must conform to regexp: [\p{Ll}\p{Lo}\p{N}_-]{0,63}
|
|
# * Both keys and values are additionally constrained to be <= 128 bytes in
|
|
# size.
|
|
"a_key": "A String",
|
|
},
|
|
"jobMetadata": { # This field is populated by the Dataflow service to support filtering jobs
|
|
# by the metadata values provided here. Populated for ListJobs and all GetJob
|
|
# views SUMMARY and higher.
|
|
# Metadata available primarily for filtering jobs. Will be included in the
|
|
# ListJob response and Job SUMMARY view.
|
|
"sdkVersion": { # The version of the SDK used to run the job. # The SDK version used to run the job.
|
|
"versionDisplayName": "A String", # A readable string describing the version of the SDK.
|
|
"version": "A String", # The version of the SDK used to run the job.
|
|
"sdkSupportStatus": "A String", # The support status for this SDK version.
|
|
},
|
|
"pubsubDetails": [ # Identification of a PubSub source used in the Dataflow job.
|
|
{ # Metadata for a PubSub connector used by the job.
|
|
"topic": "A String", # Topic accessed in the connection.
|
|
"subscription": "A String", # Subscription used in the connection.
|
|
},
|
|
],
|
|
"datastoreDetails": [ # Identification of a Datastore source used in the Dataflow job.
|
|
{ # Metadata for a Datastore connector used by the job.
|
|
"projectId": "A String", # ProjectId accessed in the connection.
|
|
"namespace": "A String", # Namespace used in the connection.
|
|
},
|
|
],
|
|
"fileDetails": [ # Identification of a File source used in the Dataflow job.
|
|
{ # Metadata for a File connector used by the job.
|
|
"filePattern": "A String", # File Pattern used to access files by the connector.
|
|
},
|
|
],
|
|
"spannerDetails": [ # Identification of a Spanner source used in the Dataflow job.
|
|
{ # Metadata for a Spanner connector used by the job.
|
|
"instanceId": "A String", # InstanceId accessed in the connection.
|
|
"projectId": "A String", # ProjectId accessed in the connection.
|
|
"databaseId": "A String", # DatabaseId accessed in the connection.
|
|
},
|
|
],
|
|
"bigTableDetails": [ # Identification of a BigTable source used in the Dataflow job.
|
|
{ # Metadata for a BigTable connector used by the job.
|
|
"instanceId": "A String", # InstanceId accessed in the connection.
|
|
"projectId": "A String", # ProjectId accessed in the connection.
|
|
"tableId": "A String", # TableId accessed in the connection.
|
|
},
|
|
],
|
|
"bigqueryDetails": [ # Identification of a BigQuery source used in the Dataflow job.
|
|
{ # Metadata for a BigQuery connector used by the job.
|
|
"projectId": "A String", # Project accessed in the connection.
|
|
"dataset": "A String", # Dataset accessed in the connection.
|
|
"table": "A String", # Table accessed in the connection.
|
|
"query": "A String", # Query used to access data in the connection.
|
|
},
|
|
],
|
|
},
|
|
"pipelineDescription": { # Preliminary field: The format of this data may change at any time.
|
|
# A description of the user pipeline and stages through which it is executed.
|
|
# Created by the Cloud Dataflow service. Only retrieved with
|
|
# JOB_VIEW_DESCRIPTION or JOB_VIEW_ALL.
|
|
# A descriptive representation of the submitted pipeline as well as the executed
|
|
# form. This data is provided by the Dataflow service for ease of visualizing
|
|
# the pipeline and interpreting Dataflow-provided metrics.
|
|
"originalPipelineTransform": [ # Description of each transform in the pipeline and collections between them.
|
|
{ # Description of the type, names/ids, and input/outputs for a transform.
|
|
"kind": "A String", # Type of transform.
|
|
"name": "A String", # User provided name for this transform instance.
|
|
"inputCollectionName": [ # User names for all collection inputs to this transform.
|
|
"A String",
|
|
],
|
|
"displayData": [ # Transform-specific display data.
|
|
{ # Data provided with a pipeline or transform to provide descriptive info.
|
|
"shortStrValue": "A String", # A possible additional shorter value to display.
|
|
# For example a java_class_name_value of com.mypackage.MyDoFn
|
|
# will be stored with MyDoFn as the short_str_value and
|
|
# com.mypackage.MyDoFn as the java_class_name_value.
|
|
# short_str_value can be displayed and java_class_name_value
|
|
# will be displayed as a tooltip.
|
|
"durationValue": "A String", # Contains value if the data is of duration type.
|
|
"url": "A String", # An optional full URL.
|
|
"floatValue": 3.14, # Contains value if the data is of float type.
|
|
"namespace": "A String", # The namespace for the key. This is usually a class name or programming
|
|
# language namespace (e.g. a Python module) which defines the display data.
|
|
# This allows a dax monitoring system to specially handle the data
|
|
# and perform custom rendering.
|
|
"javaClassValue": "A String", # Contains value if the data is of java class type.
|
|
"label": "A String", # An optional label to display in a dax UI for the element.
|
|
"boolValue": True or False, # Contains value if the data is of a boolean type.
|
|
"strValue": "A String", # Contains value if the data is of string type.
|
|
"key": "A String", # The key identifying the display data.
|
|
# This is intended to be used as a label for the display data
|
|
# when viewed in a dax monitoring system.
|
|
"int64Value": "A String", # Contains value if the data is of int64 type.
|
|
"timestampValue": "A String", # Contains value if the data is of timestamp type.
|
|
},
|
|
],
|
|
"outputCollectionName": [ # User names for all collection outputs to this transform.
|
|
"A String",
|
|
],
|
|
"id": "A String", # SDK generated id of this transform instance.
|
|
},
|
|
],
|
|
"executionPipelineStage": [ # Description of each stage of execution of the pipeline.
|
|
{ # Description of the composing transforms, names/ids, and input/outputs of a
|
|
# stage of execution. Some composing transforms and sources may have been
|
|
# generated by the Dataflow service during execution planning.
|
|
"componentSource": [ # Collections produced and consumed by component transforms of this stage.
|
|
{ # Description of an interstitial value between transforms in an execution
|
|
# stage.
|
|
"userName": "A String", # Human-readable name for this source; may be user or system generated.
|
|
"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
|
|
# source is most closely associated.
|
|
"name": "A String", # Dataflow service generated name for this source.
|
|
},
|
|
],
|
|
"kind": "A String", # Type of transform this stage is executing.
|
|
"name": "A String", # Dataflow service generated name for this stage.
|
|
"outputSource": [ # Output sources for this stage.
|
|
{ # Description of an input or output of an execution stage.
|
|
"userName": "A String", # Human-readable name for this source; may be user or system generated.
|
|
"sizeBytes": "A String", # Size of the source, if measurable.
|
|
"name": "A String", # Dataflow service generated name for this source.
|
|
"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
|
|
# source is most closely associated.
|
|
},
|
|
],
|
|
"inputSource": [ # Input sources for this stage.
|
|
{ # Description of an input or output of an execution stage.
|
|
"userName": "A String", # Human-readable name for this source; may be user or system generated.
|
|
"sizeBytes": "A String", # Size of the source, if measurable.
|
|
"name": "A String", # Dataflow service generated name for this source.
|
|
"originalTransformOrCollection": "A String", # User name for the original user transform or collection with which this
|
|
# source is most closely associated.
|
|
},
|
|
],
|
|
"componentTransform": [ # Transforms that comprise this execution stage.
|
|
{ # Description of a transform executed as part of an execution stage.
|
|
"userName": "A String", # Human-readable name for this transform; may be user or system generated.
|
|
"originalTransform": "A String", # User name for the original user transform with which this transform is
|
|
# most closely associated.
|
|
"name": "A String", # Dataflow service generated name for this transform.
|
|
},
|
|
],
|
|
"id": "A String", # Dataflow service generated id for this stage.
|
|
},
|
|
],
|
|
"displayData": [ # Pipeline level display data.
|
|
{ # Data provided with a pipeline or transform to provide descriptive info.
|
|
"shortStrValue": "A String", # A possible additional shorter value to display.
|
|
# For example a java_class_name_value of com.mypackage.MyDoFn
|
|
# will be stored with MyDoFn as the short_str_value and
|
|
# com.mypackage.MyDoFn as the java_class_name_value.
|
|
# short_str_value can be displayed and java_class_name_value
|
|
# will be displayed as a tooltip.
|
|
"durationValue": "A String", # Contains value if the data is of duration type.
|
|
"url": "A String", # An optional full URL.
|
|
"floatValue": 3.14, # Contains value if the data is of float type.
|
|
"namespace": "A String", # The namespace for the key. This is usually a class name or programming
|
|
# language namespace (e.g. a Python module) which defines the display data.
|
|
# This allows a dax monitoring system to specially handle the data
|
|
# and perform custom rendering.
|
|
"javaClassValue": "A String", # Contains value if the data is of java class type.
|
|
"label": "A String", # An optional label to display in a dax UI for the element.
|
|
"boolValue": True or False, # Contains value if the data is of a boolean type.
|
|
"strValue": "A String", # Contains value if the data is of string type.
|
|
"key": "A String", # The key identifying the display data.
|
|
# This is intended to be used as a label for the display data
|
|
# when viewed in a dax monitoring system.
|
|
"int64Value": "A String", # Contains value if the data is of int64 type.
|
|
"timestampValue": "A String", # Contains value if the data is of timestamp type.
|
|
},
|
|
],
|
|
},
|
|
"stageStates": [ # This field may be mutated by the Cloud Dataflow service;
|
|
# callers cannot mutate it.
|
|
{ # A message describing the state of a particular execution stage.
|
|
"executionStageName": "A String", # The name of the execution stage.
|
|
"executionStageState": "A String", # Execution stage states allow the same set of values as JobState.
|
|
"currentStateTime": "A String", # The time at which the stage transitioned to this state.
|
|
},
|
|
],
|
|
"id": "A String", # The unique ID of this job.
|
|
#
|
|
# This field is set by the Cloud Dataflow service when the Job is
|
|
# created, and is immutable for the life of the job.
|
|
"replacedByJobId": "A String", # If another job is an update of this job (and thus, this job is in
|
|
# `JOB_STATE_UPDATED`), this field contains the ID of that job.
|
|
"projectId": "A String", # The ID of the Cloud Platform project that the job belongs to.
|
|
"transformNameMapping": { # The map of transform name prefixes of the job to be replaced to the
|
|
# corresponding name prefixes of the new job.
|
|
"a_key": "A String",
|
|
},
|
|
"environment": { # Describes the environment in which a Dataflow Job runs. # The environment for the job.
|
|
"version": { # A structure describing which components and their versions of the service
|
|
# are required in order to run the job.
|
|
"a_key": "", # Properties of the object.
|
|
},
|
|
"flexResourceSchedulingGoal": "A String", # Which Flexible Resource Scheduling mode to run in.
|
|
"serviceKmsKeyName": "A String", # If set, contains the Cloud KMS key identifier used to encrypt data
|
|
# at rest, AKA a Customer Managed Encryption Key (CMEK).
|
|
#
|
|
# Format:
|
|
# projects/PROJECT_ID/locations/LOCATION/keyRings/KEY_RING/cryptoKeys/KEY
|
|
"internalExperiments": { # Experimental settings.
|
|
"a_key": "", # Properties of the object. Contains field @type with type URL.
|
|
},
|
|
"dataset": "A String", # The dataset for the current project where various workflow
|
|
# related tables are stored.
|
|
#
|
|
# The supported resource type is:
|
|
#
|
|
# Google BigQuery:
|
|
# bigquery.googleapis.com/{dataset}
|
|
"experiments": [ # The list of experiments to enable.
|
|
"A String",
|
|
],
|
|
"serviceAccountEmail": "A String", # Identity to run virtual machines as. Defaults to the default account.
|
|
"sdkPipelineOptions": { # The Cloud Dataflow SDK pipeline options specified by the user. These
|
|
# options are passed through the service and are used to recreate the
|
|
# SDK pipeline options on the worker in a language agnostic and platform
|
|
# independent way.
|
|
"a_key": "", # Properties of the object.
|
|
},
|
|
"userAgent": { # A description of the process that generated the request.
|
|
"a_key": "", # Properties of the object.
|
|
},
|
|
"clusterManagerApiService": "A String", # The type of cluster manager API to use. If unknown or
|
|
# unspecified, the service will attempt to choose a reasonable
|
|
# default. This should be in the form of the API service name,
|
|
# e.g. "compute.googleapis.com".
|
|
"workerPools": [ # The worker pools. At least one "harness" worker pool must be
|
|
# specified in order for the job to have workers.
|
|
{ # Describes one particular pool of Cloud Dataflow workers to be
|
|
# instantiated by the Cloud Dataflow service in order to perform the
|
|
# computations required by a job. Note that a workflow job may use
|
|
# multiple pools, in order to match the various computational
|
|
# requirements of the various stages of the job.
|
|
"diskSourceImage": "A String", # Fully qualified source image for disks.
|
|
"taskrunnerSettings": { # Taskrunner configuration settings. # Settings passed through to Google Compute Engine workers when
|
|
# using the standard Dataflow task runner. Users should ignore
|
|
# this field.
|
|
"workflowFileName": "A String", # The file to store the workflow in.
|
|
"logUploadLocation": "A String", # Indicates where to put logs. If this is not specified, the logs
|
|
# will not be uploaded.
|
|
#
|
|
# The supported resource type is:
|
|
#
|
|
# Google Cloud Storage:
|
|
# storage.googleapis.com/{bucket}/{object}
|
|
# bucket.storage.googleapis.com/{object}
|
|
"commandlinesFileName": "A String", # The file to store preprocessing commands in.
|
|
"parallelWorkerSettings": { # Provides data to pass through to the worker harness. # The settings to pass to the parallel worker harness.
|
|
"reportingEnabled": True or False, # Whether to send work progress updates to the service.
|
|
"shuffleServicePath": "A String", # The Shuffle service path relative to the root URL, for example,
|
|
# "shuffle/v1beta1".
|
|
"workerId": "A String", # The ID of the worker running this pipeline.
|
|
"baseUrl": "A String", # The base URL for accessing Google Cloud APIs.
|
|
#
|
|
# When workers access Google Cloud APIs, they logically do so via
|
|
# relative URLs. If this field is specified, it supplies the base
|
|
# URL to use for resolving these relative URLs. The normative
|
|
# algorithm used is defined by RFC 1808, "Relative Uniform Resource
|
|
# Locators".
|
|
#
|
|
# If not specified, the default value is "http://www.googleapis.com/"
|
|
"servicePath": "A String", # The Cloud Dataflow service path relative to the root URL, for example,
|
|
# "dataflow/v1b3/projects".
|
|
"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
|
|
# storage.
|
|
#
|
|
# The supported resource type is:
|
|
#
|
|
# Google Cloud Storage:
|
|
#
|
|
# storage.googleapis.com/{bucket}/{object}
|
|
# bucket.storage.googleapis.com/{object}
|
|
},
|
|
"vmId": "A String", # The ID string of the VM.
|
|
"baseTaskDir": "A String", # The location on the worker for task-specific subdirectories.
|
|
"continueOnException": True or False, # Whether to continue taskrunner if an exception is hit.
|
|
"oauthScopes": [ # The OAuth2 scopes to be requested by the taskrunner in order to
|
|
# access the Cloud Dataflow API.
|
|
"A String",
|
|
],
|
|
"taskUser": "A String", # The UNIX user ID on the worker VM to use for tasks launched by
|
|
# taskrunner; e.g. "root".
|
|
"baseUrl": "A String", # The base URL for the taskrunner to use when accessing Google Cloud APIs.
|
|
#
|
|
# When workers access Google Cloud APIs, they logically do so via
|
|
# relative URLs. If this field is specified, it supplies the base
|
|
# URL to use for resolving these relative URLs. The normative
|
|
# algorithm used is defined by RFC 1808, "Relative Uniform Resource
|
|
# Locators".
|
|
#
|
|
# If not specified, the default value is "http://www.googleapis.com/"
|
|
"taskGroup": "A String", # The UNIX group ID on the worker VM to use for tasks launched by
|
|
# taskrunner; e.g. "wheel".
|
|
"languageHint": "A String", # The suggested backend language.
|
|
"logToSerialconsole": True or False, # Whether to send taskrunner log info to Google Compute Engine VM serial
|
|
# console.
|
|
"streamingWorkerMainClass": "A String", # The streaming worker main class name.
|
|
"logDir": "A String", # The directory on the VM to store logs.
|
|
"dataflowApiVersion": "A String", # The API version of endpoint, e.g. "v1b3"
|
|
"harnessCommand": "A String", # The command to launch the worker harness.
|
|
"tempStoragePrefix": "A String", # The prefix of the resources the taskrunner should use for
|
|
# temporary storage.
|
|
#
|
|
# The supported resource type is:
|
|
#
|
|
# Google Cloud Storage:
|
|
# storage.googleapis.com/{bucket}/{object}
|
|
# bucket.storage.googleapis.com/{object}
|
|
"alsologtostderr": True or False, # Whether to also send taskrunner log info to stderr.
|
|
},
|
|
"kind": "A String", # The kind of the worker pool; currently only `harness` and `shuffle`
|
|
# are supported.
|
|
"packages": [ # Packages to be installed on workers.
|
|
{ # The packages that must be installed in order for a worker to run the
|
|
# steps of the Cloud Dataflow job that will be assigned to its worker
|
|
# pool.
|
|
#
|
|
# This is the mechanism by which the Cloud Dataflow SDK causes code to
|
|
# be loaded onto the workers. For example, the Cloud Dataflow Java SDK
|
|
# might use this to install jars containing the user's code and all of the
|
|
# various dependencies (libraries, data files, etc.) required in order
|
|
# for that code to run.
|
|
"location": "A String", # The resource to read the package from. The supported resource type is:
|
|
#
|
|
# Google Cloud Storage:
|
|
#
|
|
# storage.googleapis.com/{bucket}
|
|
# bucket.storage.googleapis.com/
|
|
"name": "A String", # The name of the package.
|
|
},
|
|
],
|
|
"machineType": "A String", # Machine type (e.g. "n1-standard-1"). If empty or unspecified, the
|
|
# service will attempt to choose a reasonable default.
|
|
"network": "A String", # Network to which VMs will be assigned. If empty or unspecified,
|
|
# the service will use the network "default".
|
|
"zone": "A String", # Zone to run the worker pools in. If empty or unspecified, the service
|
|
# will attempt to choose a reasonable default.
|
|
"diskSizeGb": 42, # Size of root disk for VMs, in GB. If zero or unspecified, the service will
|
|
# attempt to choose a reasonable default.
|
|
"teardownPolicy": "A String", # Sets the policy for determining when to tear down the worker pool.
|
|
# Allowed values are: `TEARDOWN_ALWAYS`, `TEARDOWN_ON_SUCCESS`, and
|
|
# `TEARDOWN_NEVER`.
|
|
# `TEARDOWN_ALWAYS` means workers are always torn down regardless of whether
|
|
# the job succeeds. `TEARDOWN_ON_SUCCESS` means workers are torn down
|
|
# if the job succeeds. `TEARDOWN_NEVER` means the workers are never torn
|
|
# down.
|
|
#
|
|
# If the workers are not torn down by the service, they will
|
|
# continue to run and use Google Compute Engine VM resources in the
|
|
# user's project until they are explicitly terminated by the user.
|
|
# Because of this, Google recommends using the `TEARDOWN_ALWAYS`
|
|
# policy except for small, manually supervised test jobs.
|
|
#
|
|
# If unknown or unspecified, the service will attempt to choose a reasonable
|
|
# default.
|
|
"onHostMaintenance": "A String", # The action to take on host maintenance, as defined by the Google
|
|
# Compute Engine API.
|
|
"ipConfiguration": "A String", # Configuration for VM IPs.
|
|
"numThreadsPerWorker": 42, # The number of threads per worker harness. If empty or unspecified, the
|
|
# service will choose a number of threads (according to the number of cores
|
|
# on the selected machine type for batch, or 1 by convention for streaming).
|
|
"poolArgs": { # Extra arguments for this worker pool.
|
|
"a_key": "", # Properties of the object. Contains field @type with type URL.
|
|
},
|
|
"numWorkers": 42, # Number of Google Compute Engine workers in this pool needed to
|
|
# execute the job. If zero or unspecified, the service will
|
|
# attempt to choose a reasonable default.
|
|
"workerHarnessContainerImage": "A String", # Required. Docker container image that executes the Cloud Dataflow worker
|
|
# harness, residing in Google Container Registry.
|
|
"subnetwork": "A String", # Subnetwork to which VMs will be assigned, if desired. Expected to be of
|
|
# the form "regions/REGION/subnetworks/SUBNETWORK".
|
|
"dataDisks": [ # Data disks that are used by a VM in this workflow.
|
|
{ # Describes the data disk used by a workflow job.
|
|
"mountPoint": "A String", # Directory in a VM where disk is mounted.
|
|
"sizeGb": 42, # Size of disk in GB. If zero or unspecified, the service will
|
|
# attempt to choose a reasonable default.
|
|
"diskType": "A String", # Disk storage type, as defined by Google Compute Engine. This
|
|
# must be a disk type appropriate to the project and zone in which
|
|
# the workers will run. If unknown or unspecified, the service
|
|
# will attempt to choose a reasonable default.
|
|
#
|
|
# For example, the standard persistent disk type is a resource name
|
|
# typically ending in "pd-standard". If SSD persistent disks are
|
|
# available, the resource name typically ends with "pd-ssd". The
|
|
# actual valid values are defined by the Google Compute Engine API,
|
|
# not by the Cloud Dataflow API; consult the Google Compute Engine
|
|
# documentation for more information about determining the set of
|
|
# available disk types for a particular project and zone.
|
|
#
|
|
# Google Compute Engine Disk types are local to a particular
|
|
# project in a particular zone, and so the resource name will
|
|
# typically look something like this:
|
|
#
|
|
# compute.googleapis.com/projects/project-id/zones/zone/diskTypes/pd-standard
|
|
},
|
|
],
|
|
"autoscalingSettings": { # Settings for WorkerPool autoscaling. # Settings for autoscaling of this WorkerPool.
|
|
"maxNumWorkers": 42, # The maximum number of workers to cap scaling at.
|
|
"algorithm": "A String", # The algorithm to use for autoscaling.
|
|
},
|
|
"defaultPackageSet": "A String", # The default package set to install. This allows the service to
|
|
# select a default set of packages which are useful to worker
|
|
# harnesses written in a particular language.
|
|
"diskType": "A String", # Type of root disk for VMs. If empty or unspecified, the service will
|
|
# attempt to choose a reasonable default.
|
|
"metadata": { # Metadata to set on the Google Compute Engine VMs.
|
|
"a_key": "A String",
|
|
},
|
|
},
|
|
],
|
|
"tempStoragePrefix": "A String", # The prefix of the resources the system should use for temporary
|
|
# storage. The system will append the suffix "/temp-{JOBNAME}" to
|
|
# this resource prefix, where {JOBNAME} is the value of the
|
|
# job_name field. The resulting bucket and object prefix is used
|
|
# as the prefix of the resources used to store temporary data
|
|
# needed during the job execution. NOTE: This will override the
|
|
# value in taskrunner_settings.
|
|
# The supported resource type is:
|
|
#
|
|
# Google Cloud Storage:
|
|
#
|
|
# storage.googleapis.com/{bucket}/{object}
|
|
# bucket.storage.googleapis.com/{object}
|
|
},
|
|
"location": "A String", # The [regional endpoint]
|
|
# (https://cloud.google.com/dataflow/docs/concepts/regional-endpoints) that
|
|
# contains this job.
|
|
"tempFiles": [ # A set of files the system should be aware of that are used
|
|
# for temporary storage. These temporary files will be
|
|
# removed on job completion.
|
|
# No duplicates are allowed.
|
|
# No file patterns are supported.
|
|
#
|
|
# The supported files are:
|
|
#
|
|
# Google Cloud Storage:
|
|
#
|
|
# storage.googleapis.com/{bucket}/{object}
|
|
# bucket.storage.googleapis.com/{object}
|
|
"A String",
|
|
],
|
|
"type": "A String", # The type of Cloud Dataflow job.
|
|
"clientRequestId": "A String", # The client's unique identifier of the job, re-used across retried attempts.
|
|
# If this field is set, the service will ensure its uniqueness.
|
|
# The request to create a job will fail if the service has knowledge of a
|
|
# previously submitted job with the same client's ID and job name.
|
|
# The caller may use this field to ensure idempotence of job
|
|
# creation across retried attempts to create a job.
|
|
# By default, the field is empty and, in that case, the service ignores it.
|
|
"createdFromSnapshotId": "A String", # If this is specified, the job's initial state is populated from the given
|
|
# snapshot.
|
|
"stepsLocation": "A String", # The GCS location where the steps are stored.
|
|
"currentStateTime": "A String", # The timestamp associated with the current state.
|
|
"startTime": "A String", # The timestamp when the job was started (transitioned to JOB_STATE_PENDING).
|
|
# Flexible resource scheduling jobs are started with some delay after job
|
|
# creation, so start_time is unset before start and is updated when the
|
|
# job is started by the Cloud Dataflow service. For other jobs, start_time
|
|
# always equals create_time and is immutable and set by the Cloud Dataflow
|
|
# service.
|
|
"createTime": "A String", # The timestamp when the job was initially created. Immutable and set by the
|
|
# Cloud Dataflow service.
|
|
"requestedState": "A String", # The job's requested state.
|
|
#
|
|
# `UpdateJob` may be used to switch between the `JOB_STATE_STOPPED` and
|
|
# `JOB_STATE_RUNNING` states, by setting requested_state. `UpdateJob` may
|
|
# also be used to directly set a job's requested state to
|
|
# `JOB_STATE_CANCELLED` or `JOB_STATE_DONE`, irrevocably terminating the
|
|
# job if it has not already reached a terminal state.
|
|
"name": "A String", # The user-specified Cloud Dataflow job name.
|
|
#
|
|
# Only one Job with a given name may exist in a project at any
|
|
# given time. If a caller attempts to create a Job with the same
|
|
# name as an already-existing Job, the attempt returns the
|
|
# existing Job.
|
|
#
|
|
# The name must match the regular expression
|
|
# `[a-z]([-a-z0-9]{0,38}[a-z0-9])?`
|
|
"steps": [ # Exactly one of steps or steps_location should be specified.
|
|
#
|
|
# The top-level steps that constitute the entire job.
|
|
{ # Defines a particular step within a Cloud Dataflow job.
|
|
#
|
|
# A job consists of multiple steps, each of which performs some
|
|
# specific operation as part of the overall job. Data is typically
|
|
# passed from one step to another as part of the job.
|
|
#
|
|
# Here's an example of a sequence of steps which together implement a
|
|
# Map-Reduce job:
|
|
#
|
|
# * Read a collection of data from some source, parsing the
|
|
# collection's elements.
|
|
#
|
|
# * Validate the elements.
|
|
#
|
|
# * Apply a user-defined function to map each element to some value
|
|
# and extract an element-specific key value.
|
|
#
|
|
# * Group elements with the same key into a single element with
|
|
# that key, transforming a multiply-keyed collection into a
|
|
# uniquely-keyed collection.
|
|
#
|
|
# * Write the elements out to some data sink.
|
|
#
|
|
# Note that the Cloud Dataflow service may be used to run many different
|
|
# types of jobs, not just Map-Reduce.
|
|
"kind": "A String", # The kind of step in the Cloud Dataflow job.
|
|
"properties": { # Named properties associated with the step. Each kind of
|
|
# predefined step has its own required set of properties.
|
|
# Must be provided on Create. Only retrieved with JOB_VIEW_ALL.
|
|
"a_key": "", # Properties of the object.
|
|
},
|
|
"name": "A String", # The name that identifies the step. This must be unique for each
|
|
# step with respect to all other steps in the Cloud Dataflow job.
|
|
},
|
|
],
|
|
"replaceJobId": "A String", # If this job is an update of an existing job, this field is the job ID
|
|
# of the job it replaced.
|
|
#
|
|
# When sending a `CreateJobRequest`, you can update a job by specifying it
|
|
# here. The job named here is stopped, and its intermediate state is
|
|
# transferred to this job.
|
|
"currentState": "A String", # The current state of the job.
|
|
#
|
|
# Jobs are created in the `JOB_STATE_STOPPED` state unless otherwise
|
|
# specified.
|
|
#
|
|
# A job in the `JOB_STATE_RUNNING` state may asynchronously enter a
|
|
# terminal state. After a job has reached a terminal state, no
|
|
# further state updates may be made.
|
|
#
|
|
# This field may be mutated by the Cloud Dataflow service;
|
|
# callers cannot mutate it.
|
|
"executionInfo": { # Additional information about how a Cloud Dataflow job will be executed that # Deprecated.
|
|
# isn't contained in the submitted job.
|
|
"stages": { # A mapping from each stage to the information about that stage.
|
|
"a_key": { # Contains information about how a particular
|
|
# google.dataflow.v1beta3.Step will be executed.
|
|
"stepName": [ # The steps associated with the execution stage.
|
|
# Note that stages may have several steps, and that a given step
|
|
# might be run by more than one stage.
|
|
"A String",
|
|
],
|
|
},
|
|
},
|
|
},
|
|
},
|
|
}</pre>
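<p>Two of the field descriptions above state rules that can be checked on the client before a request is sent: the regular expression for <code>name</code> and the "/temp-{JOBNAME}" suffix appended to <code>tempStoragePrefix</code>. The following is a minimal sketch, not part of the Dataflow client library; the helper names <code>is_valid_job_name</code> and <code>temp_storage_location</code> and the bucket path in the usage note are hypothetical, and the service remains the authority on what it accepts.</p>

```python
import re

# Pattern copied from the "name" field description above.
JOB_NAME_PATTERN = re.compile(r"[a-z]([-a-z0-9]{0,38}[a-z0-9])?")


def is_valid_job_name(name):
    """Return True if the entire string matches the documented pattern.

    Hypothetical helper: fullmatch is used so that partial matches
    (for example a valid prefix followed by invalid characters) are rejected.
    """
    return JOB_NAME_PATTERN.fullmatch(name) is not None


def temp_storage_location(temp_storage_prefix, job_name):
    """Append the documented "/temp-{JOBNAME}" suffix to a prefix.

    Hypothetical helper: it only mirrors the string rule described for
    tempStoragePrefix and does not contact Cloud Storage.
    """
    return temp_storage_prefix + "/temp-" + job_name
```

<p>For example, <code>is_valid_job_name("wordcount-1")</code> is true while <code>is_valid_job_name("job-")</code> is false, because the pattern requires the last character to be a lowercase letter or digit. With a hypothetical prefix, <code>temp_storage_location("storage.googleapis.com/my-bucket/tmp", "wordcount-1")</code> yields <code>storage.googleapis.com/my-bucket/tmp/temp-wordcount-1</code>.</p>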
|
|
</div>
|
|
|
|
</body></html> |