<html><body>
<style>

body, h1, h2, h3, div, span, p, pre, a {
  margin: 0;
  padding: 0;
  border: 0;
  font-weight: inherit;
  font-style: inherit;
  font-size: 100%;
  font-family: inherit;
  vertical-align: baseline;
}

body {
  font-size: 13px;
  padding: 1em;
}

h1 {
  font-size: 26px;
  margin-bottom: 1em;
}

h2 {
  font-size: 24px;
  margin-bottom: 1em;
}

h3 {
  font-size: 20px;
  margin-bottom: 1em;
  margin-top: 1em;
}

pre, code {
  line-height: 1.5;
  font-family: Monaco, 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', 'Lucida Console', monospace;
}

pre {
  margin-top: 0.5em;
}

h1, h2, h3, p {
  font-family: Arial, sans-serif;
}

h1, h2, h3 {
  border-bottom: solid #CCC 1px;
}

.toc_element {
  margin-top: 0.5em;
}

.firstline {
  margin-left: 2em;
}

.method {
  margin-top: 1em;
  border: solid 1px #CCC;
  padding: 1em;
  background: #EEE;
}

.details {
  font-weight: bold;
  font-size: 14px;
}

</style>

<h1><a href="dlp_v2.html">Cloud Data Loss Prevention (DLP) API</a> . <a href="dlp_v2.projects.html">projects</a> . <a href="dlp_v2.projects.dlpJobs.html">dlpJobs</a></h1>
<h2>Instance Methods</h2>
<p class="toc_element">
  <code><a href="#cancel">cancel(name, body=None, x__xgafv=None)</a></code></p>
<p class="firstline">Starts asynchronous cancellation on a long-running DlpJob.</p>
<p class="toc_element">
  <code><a href="#create">create(parent, body, x__xgafv=None)</a></code></p>
<p class="firstline">Creates a new job to inspect storage or calculate risk metrics.</p>
<p class="toc_element">
  <code><a href="#delete">delete(name, x__xgafv=None)</a></code></p>
<p class="firstline">Deletes a long-running DlpJob.</p>
<p class="toc_element">
  <code><a href="#get">get(name, x__xgafv=None)</a></code></p>
<p class="firstline">Gets the latest state of a long-running DlpJob.</p>
<p class="toc_element">
  <code><a href="#list">list(parent, orderBy=None, type=None, pageSize=None, pageToken=None, x__xgafv=None, filter=None)</a></code></p>
<p class="firstline">Lists DlpJobs that match the specified filter in the request.</p>
<p class="toc_element">
  <code><a href="#list_next">list_next(previous_request, previous_response)</a></code></p>
<p class="firstline">Retrieves the next page of results.</p>
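<p>The <code>list</code> and <code>list_next</code> methods are usually combined into a paging loop. The following is a minimal sketch under stated assumptions, not the library's own helper: <code>iter_dlp_jobs</code>, <code>FakeRequest</code> and <code>FakeDlpJobs</code> are hypothetical names introduced for illustration, and the <code>jobs</code> and <code>nextPageToken</code> response keys are assumptions about the list response. The fake classes only stand in for the real collection object so the loop can run without network access.</p>

```python
def iter_dlp_jobs(dlp_jobs, parent, **list_kwargs):
    """Yield every DlpJob under `parent`, following pages with list_next()."""
    request = dlp_jobs.list(parent=parent, **list_kwargs)
    while request is not None:
        response = request.execute()
        # Assumption: each page carries its DlpJobs under the "jobs" key.
        for job in response.get("jobs", []):
            yield job
        # list_next() is expected to return None once no pages remain.
        request = dlp_jobs.list_next(
            previous_request=request, previous_response=response
        )


class FakeRequest:
    """Hypothetical stand-in for a request object; execute() returns one page."""

    def __init__(self, page):
        self.page = page

    def execute(self):
        return self.page


class FakeDlpJobs:
    """Hypothetical in-memory stand-in for the dlpJobs collection."""

    def __init__(self, pages):
        self._pages = pages

    def list(self, parent, **kwargs):
        return FakeRequest(self._pages[0])

    def list_next(self, previous_request, previous_response):
        token = previous_response.get("nextPageToken")
        if token is None:
            return None
        # In this fake, the token is simply the index of the next page.
        return FakeRequest(self._pages[int(token)])


pages = [
    {"jobs": [{"name": "projects/p/dlpJobs/i-1"}], "nextPageToken": "1"},
    {"jobs": [{"name": "projects/p/dlpJobs/i-2"}]},
]
names = [job["name"] for job in iter_dlp_jobs(FakeDlpJobs(pages), "projects/p")]
print(names)
```

<p>With a real client, the same function would be handed the object that exposes the methods documented on this page in place of <code>FakeDlpJobs</code>.</p>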
<h3>Method Details</h3>
<div class="method">
<code class="details" id="cancel">cancel(name, body=None, x__xgafv=None)</code>
<pre>Starts asynchronous cancellation on a long-running DlpJob. The server
makes a best effort to cancel the DlpJob, but success is not
guaranteed.
See https://cloud.google.com/dlp/docs/inspecting-storage and
https://cloud.google.com/dlp/docs/compute-risk-analysis to learn more.

Args:
  name: string, The name of the DlpJob resource to be cancelled. (required)
  body: object, The request body.
    The object takes the form of:

{ # The request message for canceling a DLP job.
}

  x__xgafv: string, V1 error format.
    Allowed values
      1 - v1 error format
      2 - v2 error format

Returns:
  An object of the form:

{ # A generic empty message that you can re-use to avoid defining duplicated
    # empty messages in your APIs. A typical example is to use it as the request
    # or the response type of an API method. For instance:
    #
    # service Foo {
    # rpc Bar(google.protobuf.Empty) returns (google.protobuf.Empty);
    # }
    #
    # The JSON representation for `Empty` is empty JSON object `{}`.
}</pre>
</div>
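<p>Because cancellation is asynchronous and best effort, a caller may want to confirm the outcome by calling <code>get</code> afterwards. The sketch below is one possible way to do that, not a documented recipe: <code>cancel_and_wait</code>, <code>FakeRequest</code> and <code>FakeDlpJobs</code> are hypothetical names, and the <code>state</code> field and its values (<code>DONE</code>, <code>CANCELED</code>, <code>FAILED</code>) are assumptions about the DlpJob resource that this section does not list.</p>

```python
import time

# Assumption: these are the terminal values of the DlpJob "state" field.
TERMINAL_STATES = {"DONE", "CANCELED", "FAILED"}


def cancel_and_wait(dlp_jobs, name, attempts=5, delay_seconds=0.0, sleep=time.sleep):
    """Request cancellation, then poll get() until the job reaches a terminal state.

    Returns the terminal state, or None if it was not reached within `attempts`.
    """
    dlp_jobs.cancel(name=name, body={}).execute()
    for _ in range(attempts):
        job = dlp_jobs.get(name=name).execute()
        state = job.get("state")
        if state in TERMINAL_STATES:
            return state
        sleep(delay_seconds)
    return None


class FakeRequest:
    """Hypothetical stand-in for a request object; execute() returns a fixed result."""

    def __init__(self, result):
        self.result = result

    def execute(self):
        return self.result


class FakeDlpJobs:
    """Hypothetical stand-in that replays a fixed sequence of job states."""

    def __init__(self, states):
        self._states = iter(states)

    def cancel(self, name, body=None):
        return FakeRequest({})

    def get(self, name):
        return FakeRequest({"name": name, "state": next(self._states)})


final_state = cancel_and_wait(
    FakeDlpJobs(["RUNNING", "RUNNING", "CANCELED"]), "projects/p/dlpJobs/i-1"
)
print(final_state)
```

<p>A job can still finish normally before the cancellation takes effect, which is why the sketch treats any terminal state as an answer rather than expecting a cancelled job.</p>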
<div class="method">
<code class="details" id="create">create(parent, body, x__xgafv=None)</code>
<pre>Creates a new job to inspect storage or calculate risk metrics.
See https://cloud.google.com/dlp/docs/inspecting-storage and
https://cloud.google.com/dlp/docs/compute-risk-analysis to learn more.

When no InfoTypes or CustomInfoTypes are specified in inspect jobs, the
system will automatically choose what detectors to run. By default this may
be all types, but may change over time as detectors are updated.

Args:
  parent: string, The parent resource name, for example projects/my-project-id. (required)
  body: object, The request body. (required)
    The object takes the form of:

{ # Request message for CreateDlpJobRequest. Used to initiate long running
    # jobs such as calculating risk metrics or inspecting Google Cloud
    # Storage.
  "riskJob": { # Configuration for a risk analysis job. See
      # https://cloud.google.com/dlp/docs/concepts-risk-analysis to learn more.
    "privacyMetric": { # Privacy metric to compute for reidentification risk analysis. # Privacy metric to compute.
      "numericalStatsConfig": { # Compute numerical stats over an individual column, including
          # min, max, and quantiles.
        "field": { # General identifier of a data field in a storage service. # Field to compute numerical stats on. Supported types are
            # integer, float, date, datetime, timestamp, time.
          "name": "A String", # Name describing the field.
        },
      },
      "kMapEstimationConfig": { # Reidentifiability metric. This corresponds to a risk model similar to what
          # is called "journalist risk" in the literature, except the attack dataset is
          # statistically modeled instead of being perfectly known. This can be done
          # using publicly available data (like the US Census), or using a custom
          # statistical model (indicated as one or several BigQuery tables), or by
          # extrapolating from the distribution of values in the input dataset.
        "regionCode": "A String", # ISO 3166-1 alpha-2 region code to use in the statistical modeling.
            # Required if no column is tagged with a region-specific InfoType (like
            # US_ZIP_5) or a region code.
        "quasiIds": [ # Fields considered to be quasi-identifiers. No two columns can have the
            # same tag. [required]
          { # A column with a semantic tag attached.
            "field": { # General identifier of a data field in a storage service. # Identifies the column. [required]
              "name": "A String", # Name describing the field.
            },
            "customTag": "A String", # A column can be tagged with a custom tag. In this case, the user must
                # indicate an auxiliary table that contains statistical information on
                # the possible values of this column (below).
            "infoType": { # Type of information detected by the API. # A column can be tagged with an InfoType to use the relevant public
                # dataset as a statistical model of population, if available. We
                # currently support US ZIP codes, region codes, ages and genders.
                # To programmatically obtain the list of supported InfoTypes, use
                # ListInfoTypes with the supported_by=RISK_ANALYSIS filter.
              "name": "A String", # Name of the information type. Either a name of your choosing when
                  # creating a CustomInfoType, or one of the names listed
                  # at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
                  # a built-in type. InfoType names should conform to the pattern
                  # [a-zA-Z0-9_]{1,64}.
            },
            "inferred": { # If no semantic tag is indicated, we infer the statistical model from
                # the distribution of values in the input data.
                #
                # A generic empty message that you can re-use to avoid defining duplicated
                # empty messages in your APIs. A typical example is to use it as the request
                # or the response type of an API method. For instance:
                #
                # service Foo {
                # rpc Bar(google.protobuf.Empty) returns (google.protobuf.Empty);
                # }
                #
                # The JSON representation for `Empty` is empty JSON object `{}`.
            },
          },
        ],
        "auxiliaryTables": [ # Several auxiliary tables can be used in the analysis. Each custom_tag
            # used to tag a quasi-identifier column must appear in exactly one column
            # of one auxiliary table.
          { # An auxiliary table contains statistical information on the relative
              # frequency of different quasi-identifier values. It has one or several
              # quasi-identifier columns, and one column that indicates the relative
              # frequency of each quasi-identifier tuple.
              # If a tuple is present in the data but not in the auxiliary table, the
              # corresponding relative frequency is assumed to be zero (and thus, the
              # tuple is highly reidentifiable).
            "relativeFrequency": { # General identifier of a data field in a storage service. # The relative frequency column must contain a floating-point number
                # between 0 and 1 (inclusive). Null values are assumed to be zero.
                # [required]
              "name": "A String", # Name describing the field.
            },
            "quasiIds": [ # Quasi-identifier columns. [required]
              { # A quasi-identifier column has a custom_tag, used to know which column
                  # in the data corresponds to which column in the statistical model.
                "field": { # General identifier of a data field in a storage service.
                  "name": "A String", # Name describing the field.
                },
                "customTag": "A String",
              },
            ],
            "table": { # Auxiliary table location. [required]
                #
                # Message defining the location of a BigQuery table. A table is uniquely
                # identified by its project_id, dataset_id, and table_name. Within a query
                # a table is often referenced with a string in the format of:
                # `&lt;project_id&gt;:&lt;dataset_id&gt;.&lt;table_id&gt;` or
                # `&lt;project_id&gt;.&lt;dataset_id&gt;.&lt;table_id&gt;`.
              "projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
                  # If omitted, project ID is inferred from the API call.
              "tableId": "A String", # Name of the table.
              "datasetId": "A String", # Dataset ID of the table.
            },
          },
        ],
      },
      "lDiversityConfig": { # l-diversity metric, used for analysis of reidentification risk.
        "sensitiveAttribute": { # General identifier of a data field in a storage service. # Sensitive field for computing the l-value.
          "name": "A String", # Name describing the field.
        },
        "quasiIds": [ # Set of quasi-identifiers indicating how equivalence classes are
            # defined for the l-diversity computation. When multiple fields are
            # specified, they are considered a single composite key.
          { # General identifier of a data field in a storage service.
            "name": "A String", # Name describing the field.
          },
        ],
      },
      "deltaPresenceEstimationConfig": { # δ-presence metric, used to estimate how likely it is for an attacker to
          # figure out that one given individual appears in a de-identified dataset.
          # Similarly to the k-map metric, we cannot compute δ-presence exactly without
          # knowing the attack dataset, so we use a statistical model instead.
        "regionCode": "A String", # ISO 3166-1 alpha-2 region code to use in the statistical modeling.
            # Required if no column is tagged with a region-specific InfoType (like
            # US_ZIP_5) or a region code.
        "quasiIds": [ # Fields considered to be quasi-identifiers. No two fields can have the
            # same tag. [required]
          { # A column with a semantic tag attached.
            "field": { # General identifier of a data field in a storage service. # Identifies the column. [required]
              "name": "A String", # Name describing the field.
            },
            "customTag": "A String", # A column can be tagged with a custom tag. In this case, the user must
                # indicate an auxiliary table that contains statistical information on
                # the possible values of this column (below).
            "infoType": { # Type of information detected by the API. # A column can be tagged with an InfoType to use the relevant public
                # dataset as a statistical model of population, if available. We
                # currently support US ZIP codes, region codes, ages and genders.
                # To programmatically obtain the list of supported InfoTypes, use
                # ListInfoTypes with the supported_by=RISK_ANALYSIS filter.
              "name": "A String", # Name of the information type. Either a name of your choosing when
                  # creating a CustomInfoType, or one of the names listed
                  # at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
                  # a built-in type. InfoType names should conform to the pattern
                  # [a-zA-Z0-9_]{1,64}.
            },
            "inferred": { # If no semantic tag is indicated, we infer the statistical model from
                # the distribution of values in the input data.
                #
                # A generic empty message that you can re-use to avoid defining duplicated
                # empty messages in your APIs. A typical example is to use it as the request
                # or the response type of an API method. For instance:
                #
                # service Foo {
                # rpc Bar(google.protobuf.Empty) returns (google.protobuf.Empty);
                # }
                #
                # The JSON representation for `Empty` is empty JSON object `{}`.
            },
          },
        ],
        "auxiliaryTables": [ # Several auxiliary tables can be used in the analysis. Each custom_tag
            # used to tag a quasi-identifier field must appear in exactly one
            # field of one auxiliary table.
          { # An auxiliary table containing statistical information on the relative
              # frequency of different quasi-identifier values. It has one or several
              # quasi-identifier columns, and one column that indicates the relative
              # frequency of each quasi-identifier tuple.
              # If a tuple is present in the data but not in the auxiliary table, the
              # corresponding relative frequency is assumed to be zero (and thus, the
              # tuple is highly reidentifiable).
            "relativeFrequency": { # General identifier of a data field in a storage service. # The relative frequency column must contain a floating-point number
                # between 0 and 1 (inclusive). Null values are assumed to be zero.
                # [required]
              "name": "A String", # Name describing the field.
            },
            "quasiIds": [ # Quasi-identifier columns. [required]
              { # A quasi-identifier column has a custom_tag, used to know which column
                  # in the data corresponds to which column in the statistical model.
                "field": { # General identifier of a data field in a storage service.
                  "name": "A String", # Name describing the field.
                },
                "customTag": "A String",
              },
            ],
            "table": { # Auxiliary table location. [required]
                #
                # Message defining the location of a BigQuery table. A table is uniquely
                # identified by its project_id, dataset_id, and table_name. Within a query
                # a table is often referenced with a string in the format of:
                # `&lt;project_id&gt;:&lt;dataset_id&gt;.&lt;table_id&gt;` or
                # `&lt;project_id&gt;.&lt;dataset_id&gt;.&lt;table_id&gt;`.
              "projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
                  # If omitted, project ID is inferred from the API call.
              "tableId": "A String", # Name of the table.
              "datasetId": "A String", # Dataset ID of the table.
            },
          },
        ],
      },
      "categoricalStatsConfig": { # Compute categorical stats over an individual column, including
          # number of distinct values and value count distribution.
        "field": { # General identifier of a data field in a storage service. # Field to compute categorical stats on. All column types are
            # supported except for arrays and structs. However, it may be more
            # informative to use NumericalStats when the field type is supported,
            # depending on the data.
          "name": "A String", # Name describing the field.
        },
      },
      "kAnonymityConfig": { # k-anonymity metric, used for analysis of reidentification risk.
        "entityId": { # Optional message indicating that multiple rows might be associated to a
            # single individual. If the same entity_id is associated to multiple
            # quasi-identifier tuples over distinct rows, we consider the entire
            # collection of tuples as the composite quasi-identifier. This collection
            # is a multiset: the order in which the different tuples appear in the
            # dataset is ignored, but their frequency is taken into account.
            #
            # Important note: a maximum of 1000 rows can be associated to a single
            # entity ID. If more rows are associated with the same entity ID, some
            # might be ignored.
            #
            # An entity in a dataset is a field or set of fields that correspond to a
            # single person. For example, in medical records the `EntityId` might be a
            # patient identifier, or for financial records it might be an account
            # identifier. This message is used when generalizations or analysis must take
            # into account that multiple rows correspond to the same entity.
          "field": { # General identifier of a data field in a storage service. # Composite key indicating which field contains the entity identifier.
            "name": "A String", # Name describing the field.
          },
        },
        "quasiIds": [ # Set of fields to compute k-anonymity over. When multiple fields are
            # specified, they are considered a single composite key. Structs and
            # repeated data types are not supported; however, nested fields are
            # supported so long as they are not structs themselves or nested within
            # a repeated field.
          { # General identifier of a data field in a storage service.
            "name": "A String", # Name describing the field.
          },
        ],
      },
    },
    "sourceTable": { # Input dataset to compute metrics over.
        #
        # Message defining the location of a BigQuery table. A table is uniquely
        # identified by its project_id, dataset_id, and table_name. Within a query
        # a table is often referenced with a string in the format of:
        # `&lt;project_id&gt;:&lt;dataset_id&gt;.&lt;table_id&gt;` or
        # `&lt;project_id&gt;.&lt;dataset_id&gt;.&lt;table_id&gt;`.
      "projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
          # If omitted, project ID is inferred from the API call.
      "tableId": "A String", # Name of the table.
      "datasetId": "A String", # Dataset ID of the table.
    },
    "actions": [ # Actions to execute at the completion of the job. They are executed in the order
        # provided.
      { # A task to execute on the completion of a job.
          # See https://cloud.google.com/dlp/docs/concepts-actions to learn more.
        "saveFindings": { # Save resulting findings in a provided location.
            #
            # If set, the detailed findings will be persisted to the specified
            # OutputStorageConfig. Only a single instance of this action can be
            # specified.
            # Compatible with: Inspect, Risk
          "outputConfig": { # Cloud repository for storing output.
            "table": { # Store findings in an existing table or a new table in an existing
                # dataset. If table_id is not set a new one will be generated
                # for you with the following format:
                # dlp_googleapis_yyyy_mm_dd_[dlp_job_id]. Pacific timezone will be used for
                # generating the date details.
                #
                # For Inspect, each column in an existing output table must have the same
                # name, type, and mode as a field in the `Finding` object.
                #
                # For Risk, an existing output table should be the output of a previous
                # Risk analysis job run on the same source table, with the same privacy
                # metric and quasi-identifiers. Risk jobs that analyze the same table but
                # compute a different privacy metric, or use different sets of
                # quasi-identifiers, cannot store their results in the same table.
                #
                # Message defining the location of a BigQuery table. A table is uniquely
                # identified by its project_id, dataset_id, and table_name. Within a query
                # a table is often referenced with a string in the format of:
                # `&lt;project_id&gt;:&lt;dataset_id&gt;.&lt;table_id&gt;` or
                # `&lt;project_id&gt;.&lt;dataset_id&gt;.&lt;table_id&gt;`.
              "projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
                  # If omitted, project ID is inferred from the API call.
              "tableId": "A String", # Name of the table.
              "datasetId": "A String", # Dataset ID of the table.
            },
            "outputSchema": "A String", # Schema used for writing the findings for Inspect jobs. This field is only
                # used for Inspect and must be unspecified for Risk jobs. Columns are derived
                # from the `Finding` object. If appending to an existing table, any columns
                # from the predefined schema that are missing will be added. No columns in
                # the existing table will be deleted.
                #
                # If unspecified, then all available columns will be used for a new table or
                # an (existing) table with no schema, and no changes will be made to an
                # existing table that has a schema.
          },
        },
        "jobNotificationEmails": { # Enable email notification to project owners and editors on a job's
            # completion or failure.
        },
        "publishSummaryToCscc": { # Publish summary to Cloud Security Command Center (Alpha).
            #
            # Publish the result summary of a DlpJob to the Cloud Security
            # Command Center (CSCC Alpha).
            # This action is only available for projects which are part of
            # an organization and whitelisted for the alpha Cloud Security Command
            # Center.
            # The action will publish the count of finding instances and their info types.
            # The summary of findings will be persisted in CSCC and is governed by CSCC
            # service-specific policy, see https://cloud.google.com/terms/service-terms
            # Only a single instance of this action can be specified.
            # Compatible with: Inspect
        },
        "pubSub": { # Publish a notification to a pubsub topic.
            #
            # Publish a message into given Pub/Sub topic when DlpJob has completed. The
            # message contains a single field, `DlpJobName`, which is equal to the
            # finished job's
            # [`DlpJob.name`](/dlp/docs/reference/rest/v2/projects.dlpJobs#DlpJob).
            # Compatible with: Inspect, Risk
          "topic": "A String", # Cloud Pub/Sub topic to send notifications to. The topic must have granted
              # publishing access rights to the DLP API service account executing
              # the long running DlpJob sending the notifications.
              # Format is projects/{project}/topics/{topic}.
        },
      },
    ],
  },
  "jobId": "A String", # The job id can contain uppercase and lowercase letters,
      # numbers, hyphens, and underscores; that is, it must match the regular
      # expression: `[a-zA-Z\\d-_]+`. The maximum length is 100
      # characters. Can be empty to allow the system to generate one.
  "inspectJob": {
    "storageConfig": { # Shared message indicating Cloud storage type. # The data to scan.
      "datastoreOptions": { # Options defining a data set within Google Cloud Datastore. # Google Cloud Datastore options specification.
        "partitionId": { # Datastore partition ID. # A partition ID identifies a grouping of entities. The grouping is always
            # by project and namespace, however the namespace ID may be empty.
            #
            # A partition ID contains several dimensions:
            # project ID and namespace ID.
          "projectId": "A String", # The ID of the project to which the entities belong.
          "namespaceId": "A String", # If not empty, the ID of the namespace to which the entities belong.
        },
        "kind": { # A representation of a Datastore kind. # The kind to process.
          "name": "A String", # The name of the kind.
        },
      },
      "bigQueryOptions": { # Options defining BigQuery table and row identifiers. # BigQuery options specification.
        "excludedFields": [ # References to fields excluded from scanning. This allows you to skip
            # inspection of entire columns which you know have no findings.
          { # General identifier of a data field in a storage service.
            "name": "A String", # Name describing the field.
          },
        ],
        "rowsLimit": "A String", # Max number of rows to scan. If the table has more rows than this value, the
            # rest of the rows are omitted. If not set, or if set to 0, all rows will be
            # scanned. Only one of rows_limit and rows_limit_percent can be specified.
            # Cannot be used in conjunction with TimespanConfig.
        "sampleMethod": "A String",
        "identifyingFields": [ # References to fields uniquely identifying rows within the table.
            # Nested fields, in a format like `person.birthdate.year`, are allowed.
          { # General identifier of a data field in a storage service.
            "name": "A String", # Name describing the field.
          },
        ],
        "rowsLimitPercent": 42, # Max percentage of rows to scan. The rest are omitted. The number of rows
            # scanned is rounded down. Must be between 0 and 100, inclusive. Both 0 and
            # 100 mean no limit. Defaults to 0. Only one of rows_limit and
            # rows_limit_percent can be specified. Cannot be used in conjunction with
            # TimespanConfig.
        "tableReference": { # Complete BigQuery table reference.
            #
            # Message defining the location of a BigQuery table. A table is uniquely
            # identified by its project_id, dataset_id, and table_name. Within a query
            # a table is often referenced with a string in the format of:
            # `&lt;project_id&gt;:&lt;dataset_id&gt;.&lt;table_id&gt;` or
            # `&lt;project_id&gt;.&lt;dataset_id&gt;.&lt;table_id&gt;`.
          "projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
              # If omitted, project ID is inferred from the API call.
          "tableId": "A String", # Name of the table.
          "datasetId": "A String", # Dataset ID of the table.
        },
      },
      "timespanConfig": { # Configuration of the timespan of the items to include in scanning.
          # Currently only supported when inspecting Google Cloud Storage and BigQuery.
        "timestampField": { # General identifier of a data field in a storage service. # Specification of the field containing the timestamp of scanned items.
            # Used for data sources like Datastore or BigQuery.
            # If not specified for BigQuery, table last modification timestamp
            # is checked against given time span.
            # The valid data types of the timestamp field are:
            # for BigQuery - timestamp, date, datetime;
            # for Datastore - timestamp.
            # Datastore entity will be scanned if the timestamp property does not exist
            # or its value is empty or invalid.
          "name": "A String", # Name describing the field.
        },
        "endTime": "A String", # Exclude files or rows newer than this value.
            # If set to zero, no upper time limit is applied.
        "startTime": "A String", # Exclude files or rows older than this value.
        "enableAutoPopulationOfTimespanConfig": True or False, # When the job is started by a JobTrigger we will automatically figure out
            # a valid start_time to avoid scanning files that have not been modified
            # since the last time the JobTrigger executed. This will be based on the
            # time of the execution of the last run of the JobTrigger.
      },
      "cloudStorageOptions": { # Google Cloud Storage options specification.
          #
          # Options defining a file or a set of files within a Google Cloud Storage
          # bucket.
        "bytesLimitPerFile": "A String", # Max number of bytes to scan from a file. If a scanned file's size is bigger
            # than this value then the rest of the bytes are omitted. Only one
            # of bytes_limit_per_file and bytes_limit_per_file_percent can be specified.
        "sampleMethod": "A String",
        "fileSet": { # Set of files to scan. # The set of one or more files to scan.
          "url": "A String", # The Cloud Storage url of the file(s) to scan, in the format
              # `gs://&lt;bucket&gt;/&lt;path&gt;`. Trailing wildcard in the path is allowed.
              #
              # If the url ends in a trailing slash, the bucket or directory represented
              # by the url will be scanned non-recursively (content in sub-directories
              # will not be scanned). This means that `gs://mybucket/` is equivalent to
              # `gs://mybucket/*`, and `gs://mybucket/directory/` is equivalent to
              # `gs://mybucket/directory/*`.
              #
              # Exactly one of `url` or `regex_file_set` must be set.
          "regexFileSet": { # The regex-filtered set of files to scan. Exactly one of `url` or
              # `regex_file_set` must be set.
              #
              # Message representing a set of files in a Cloud Storage bucket. Regular
              # expressions are used to allow fine-grained control over which files in the
              # bucket to include.
              #
              # Included files are those that match at least one item in `include_regex` and
              # do not match any items in `exclude_regex`. Note that a file that matches
              # items from both lists will _not_ be included. For a match to occur, the
              # entire file path (i.e., everything in the url after the bucket name) must
              # match the regular expression.
              #
              # For example, given the input `{bucket_name: "mybucket", include_regex:
              # ["directory1/.*"], exclude_regex:
              # ["directory1/excluded.*"]}`:
              #
              # * `gs://mybucket/directory1/myfile` will be included
              # * `gs://mybucket/directory1/directory2/myfile` will be included (`.*` matches
              # across `/`)
              # * `gs://mybucket/directory0/directory1/myfile` will _not_ be included (the
              # full path doesn't match any items in `include_regex`)
              # * `gs://mybucket/directory1/excludedfile` will _not_ be included (the path
              # matches an item in `exclude_regex`)
              #
              # If `include_regex` is left empty, it will match all files by default
              # (this is equivalent to setting `include_regex: [".*"]`).
              #
              # Some other common use cases:
              #
              # * `{bucket_name: "mybucket", exclude_regex: [".*\.pdf"]}` will include all
              # files in `mybucket` except for .pdf files
              # * `{bucket_name: "mybucket", include_regex: ["directory/[^/]+"]}` will
              # include all files directly under `gs://mybucket/directory/`, without matching
              # across `/`
            "excludeRegex": [ # A list of regular expressions matching file paths to exclude. All files in
                # the bucket that match at least one of these regular expressions will be
                # excluded from the scan.
                #
                # Regular expressions use RE2
                # [syntax](https://github.com/google/re2/wiki/Syntax); a guide can be found
                # under the google/re2 repository on GitHub.
              "A String",
            ],
            "bucketName": "A String", # The name of a Cloud Storage bucket. Required.
            "includeRegex": [ # A list of regular expressions matching file paths to include. All files in
                # the bucket that match at least one of these regular expressions will be
                # included in the set of files, except for those that also match an item in
                # `exclude_regex`. Leaving this field empty will match all files by default
                # (this is equivalent to including `.*` in the list).
                #
                # Regular expressions use RE2
                # [syntax](https://github.com/google/re2/wiki/Syntax); a guide can be found
                # under the google/re2 repository on GitHub.
              "A String",
            ],
          },
        },
        "bytesLimitPerFilePercent": 42, # Max percentage of bytes to scan from a file. The rest are omitted. The
            # number of bytes scanned is rounded down. Must be between 0 and 100,
            # inclusive. Both 0 and 100 mean no limit. Defaults to 0. Only one
            # of bytes_limit_per_file and bytes_limit_per_file_percent can be specified.
        "filesLimitPercent": 42, # Limits the number of files to scan to this percentage of the input FileSet.
            # Number of files scanned is rounded down. Must be between 0 and 100,
            # inclusive. Both 0 and 100 mean no limit. Defaults to 0.
        "fileTypes": [ # List of file type groups to include in the scan.
            # If empty, all files are scanned and available data format processors
            # are applied. In addition, the binary content of the selected files
            # is always scanned as well.
          "A String",
        ],
      },
    },
|
|
"inspectConfig": { # Configuration description of the scanning process. # How and what to scan for.
|
|
# When used with redactContent only info_types and min_likelihood are currently
|
|
# used.
|
|
"excludeInfoTypes": True or False, # When true, excludes type information of the findings.
|
|
"limits": {
|
|
"maxFindingsPerRequest": 42, # Max number of findings that will be returned per request/job.
|
|
# When set within `InspectContentRequest`, the maximum returned is 2000
|
|
# regardless of whether this is set higher.
|
|
"maxFindingsPerInfoType": [ # Configuration of findings limit given for specified infoTypes.
|
|
{ # Max findings configuration per infoType, per content item or long
|
|
# running DlpJob.
|
|
"infoType": { # Type of information detected by the API. # Type of information the findings limit applies to. Only one limit per
|
|
# info_type should be provided. If InfoTypeLimit does not have an
|
|
# info_type, the DLP API applies the limit against all info_types that
|
|
# are found but not specified in another InfoTypeLimit.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
"maxFindings": 42, # Max findings limit for the given infoType.
|
|
},
|
|
],
|
|
"maxFindingsPerItem": 42, # Max number of findings that will be returned for each item scanned.
|
|
# When set within `InspectDataSourceRequest`,
|
|
# the maximum returned is 2000 regardless of whether this is set higher.
|
|
# When set within `InspectContentRequest`, this field is ignored.
|
|
},
|
|
"minLikelihood": "A String", # Only returns findings equal to or above this threshold. The default is
|
|
# POSSIBLE.
|
|
# See https://cloud.google.com/dlp/docs/likelihood to learn more.
|
|
"customInfoTypes": [ # CustomInfoTypes provided by the user. See
|
|
# https://cloud.google.com/dlp/docs/creating-custom-infotypes to learn more.
|
|
{ # Custom information type provided by the user. Used to find domain-specific
|
|
# sensitive information configurable to the data in question.
|
|
"regex": { # Message defining a custom regular expression. # Regular expression based CustomInfoType.
|
|
"pattern": "A String", # Pattern defining the regular expression. Its syntax
|
|
# (https://github.com/google/re2/wiki/Syntax) can be found under the
|
|
# google/re2 repository on GitHub.
|
|
"groupIndexes": [ # The index of the submatch to extract as findings. When not
|
|
# specified, the entire match is returned. No more than 3 may be included.
|
|
42,
|
|
],
|
|
},
|
|
"surrogateType": { # Message for detecting output from deidentification transformations that support reversing,
# such as
# [`CryptoReplaceFfxFpeConfig`](/dlp/docs/reference/rest/v2/organizations.deidentifyTemplates#cryptoreplaceffxfpeconfig).
# These types of transformations are
# those that perform pseudonymization, thereby producing a "surrogate" as
# output. This should be used in conjunction with a field on the
# transformation such as `surrogate_info_type`. This CustomInfoType does
# not support the use of `detection_rules`.
|
|
},
|
|
"infoType": { # Type of information detected by the API. # CustomInfoType can either be a new infoType, or an extension of a built-in
# infoType, when the name matches one of the existing infoTypes and that infoType
# is specified in the `InspectContent.info_types` field. Specifying the latter
# adds findings to the ones detected by the system. If the built-in info type is
# not specified in the `InspectContent.info_types` list, then the name is treated
# as a custom info type.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
"dictionary": { # Custom information type based on a dictionary of words or phrases. This can # A list of phrases to detect as a CustomInfoType.
|
|
# be used to match sensitive information specific to the data, such as a list
|
|
# of employee IDs or job titles.
|
|
#
|
|
# Dictionary words are case-insensitive and all characters other than letters
|
|
# and digits in the Unicode [Basic Multilingual
|
|
# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)
|
|
# will be replaced with whitespace when scanning for matches, so the
|
|
# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",
|
|
# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters
|
|
# surrounding any match must be of a different type than the adjacent
|
|
# characters within the word, so letters must be next to non-letters and
|
|
# digits next to non-digits. For example, the dictionary word "jen" will
|
|
# match the first three letters of the text "jen123" but will return no
|
|
# matches for "jennifer".
|
|
#
|
|
# Dictionary words containing a large number of characters that are not
|
|
# letters or digits may result in unexpected findings because such characters
|
|
# are treated as whitespace. The
|
|
# [limits](https://cloud.google.com/dlp/limits) page contains details about
|
|
# the size limits of dictionaries. For dictionaries that do not fit within
|
|
# these constraints, consider using `LargeCustomDictionaryConfig` in the
|
|
# `StoredInfoType` API.
|
|
"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.
|
|
"words": [ # Words or phrases defining the dictionary. The dictionary must contain
|
|
# at least one phrase and every phrase must contain at least 2 characters
|
|
# that are letters or digits. [required]
|
|
"A String",
|
|
],
|
|
},
|
|
"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file
|
|
# is accepted.
|
|
"path": "A String", # A URL representing a file or path (no wildcards) in Cloud Storage.
|
|
# Example: gs://[BUCKET_NAME]/dictionary.txt
|
|
},
|
|
},
|
|
"storedType": { # A reference to a StoredInfoType to use with scanning. # Load an existing `StoredInfoType` resource for use in
|
|
# `InspectDataSource`. Not currently supported in `InspectContent`.
|
|
"name": "A String", # Resource name of the requested `StoredInfoType`, for example
|
|
# `organizations/433245324/storedInfoTypes/432452342` or
|
|
# `projects/project-id/storedInfoTypes/432452342`.
|
|
"createTime": "A String", # Timestamp indicating when the version of the `StoredInfoType` used for
|
|
# inspection was created. Output-only field, populated by the system.
|
|
},
|
|
"detectionRules": [ # Set of detection rules to apply to all findings of this CustomInfoType.
|
|
# Rules are applied in the order that they are specified. Not supported for the
|
|
# `surrogate_type` CustomInfoType.
|
|
{ # Deprecated; use `InspectionRuleSet` instead. Rule for modifying a
|
|
# `CustomInfoType` to alter behavior under certain circumstances, depending
|
|
# on the specific details of the rule. Not supported for the `surrogate_type`
|
|
# custom infoType.
|
|
"hotwordRule": { # The rule that adjusts the likelihood of findings within a certain # Hotword-based detection rule.
|
|
# proximity of hotwords.
|
|
"proximity": { # Message for specifying a window around a finding to apply a detection rule. # Proximity of the finding within which the entire hotword must reside.
# The total length of the window cannot exceed 1000 characters. Note that
# the finding itself will be included in the window, so that hotwords may
# be used to match substrings of the finding itself. For example, the
# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be
# adjusted upwards if the area code is known to be the local area code of
# a company office using the hotword regex "\(xxx\)", where "xxx"
# is the area code in question.
|
|
"windowAfter": 42, # Number of characters after the finding to consider.
|
|
"windowBefore": 42, # Number of characters before the finding to consider.
|
|
},
|
|
"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.
|
|
"pattern": "A String", # Pattern defining the regular expression. Its syntax
|
|
# (https://github.com/google/re2/wiki/Syntax) can be found under the
|
|
# google/re2 repository on GitHub.
|
|
"groupIndexes": [ # The index of the submatch to extract as findings. When not
|
|
# specified, the entire match is returned. No more than 3 may be included.
|
|
42,
|
|
],
|
|
},
|
|
"likelihoodAdjustment": { # Message for specifying an adjustment to the likelihood of a finding as # Likelihood adjustment to apply to all matching findings.
|
|
# part of a detection rule.
|
|
"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of
|
|
# levels. For example, if a finding would be `POSSIBLE` without the
|
|
# detection rule and `relative_likelihood` is 1, then it is upgraded to
|
|
# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.
|
|
# Likelihood may never drop below `VERY_UNLIKELY` or exceed
|
|
# `VERY_LIKELY`, so applying an adjustment of 1 followed by an
|
|
# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in
|
|
# a final likelihood of `LIKELY`.
|
|
"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.
|
|
},
|
|
},
|
|
},
|
|
],
|
|
"exclusionType": "A String", # If set to EXCLUSION_TYPE_EXCLUDE this infoType will not cause a finding
|
|
# to be returned. It still can be used for rules matching.
|
|
"likelihood": "A String", # Likelihood to return for this CustomInfoType. This base value can be
|
|
# altered by a detection rule if the finding meets the criteria specified by
|
|
# the rule. Defaults to `VERY_LIKELY` if not specified.
|
|
},
|
|
],
|
|
"includeQuote": True or False, # When true, a contextual quote from the data that triggered a finding is
|
|
# included in the response; see Finding.quote.
|
|
"ruleSet": [ # Set of rules to apply to the findings for this InspectConfig.
|
|
# Exclusion rules contained in the set are executed last; other
# rules are executed in the order they are specified for each info type.
|
|
{ # Rule set for modifying a set of infoTypes to alter behavior under certain
|
|
# circumstances, depending on the specific details of the rules within the set.
|
|
"rules": [ # Set of rules to be applied to infoTypes. The rules are applied in order.
|
|
{ # A single inspection rule to be applied to infoTypes, specified in
|
|
# `InspectionRuleSet`.
|
|
"hotwordRule": { # The rule that adjusts the likelihood of findings within a certain # Hotword-based detection rule.
|
|
# proximity of hotwords.
|
|
"proximity": { # Message for specifying a window around a finding to apply a detection rule. # Proximity of the finding within which the entire hotword must reside.
# The total length of the window cannot exceed 1000 characters. Note that
# the finding itself will be included in the window, so that hotwords may
# be used to match substrings of the finding itself. For example, the
# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be
# adjusted upwards if the area code is known to be the local area code of
# a company office using the hotword regex "\(xxx\)", where "xxx"
# is the area code in question.
|
|
"windowAfter": 42, # Number of characters after the finding to consider.
|
|
"windowBefore": 42, # Number of characters before the finding to consider.
|
|
},
|
|
"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.
|
|
"pattern": "A String", # Pattern defining the regular expression. Its syntax
|
|
# (https://github.com/google/re2/wiki/Syntax) can be found under the
|
|
# google/re2 repository on GitHub.
|
|
"groupIndexes": [ # The index of the submatch to extract as findings. When not
|
|
# specified, the entire match is returned. No more than 3 may be included.
|
|
42,
|
|
],
|
|
},
|
|
"likelihoodAdjustment": { # Message for specifying an adjustment to the likelihood of a finding as # Likelihood adjustment to apply to all matching findings.
|
|
# part of a detection rule.
|
|
"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of
|
|
# levels. For example, if a finding would be `POSSIBLE` without the
|
|
# detection rule and `relative_likelihood` is 1, then it is upgraded to
|
|
# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.
|
|
# Likelihood may never drop below `VERY_UNLIKELY` or exceed
|
|
# `VERY_LIKELY`, so applying an adjustment of 1 followed by an
|
|
# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in
|
|
# a final likelihood of `LIKELY`.
|
|
"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.
|
|
},
|
|
},
|
|
"exclusionRule": { # The rule that specifies conditions when findings of infoTypes specified in # Exclusion rule.
|
|
# `InspectionRuleSet` are removed from results.
|
|
"regex": { # Message defining a custom regular expression. # Regular expression which defines the rule.
|
|
"pattern": "A String", # Pattern defining the regular expression. Its syntax
|
|
# (https://github.com/google/re2/wiki/Syntax) can be found under the
|
|
# google/re2 repository on GitHub.
|
|
"groupIndexes": [ # The index of the submatch to extract as findings. When not
|
|
# specified, the entire match is returned. No more than 3 may be included.
|
|
42,
|
|
],
|
|
},
|
|
"excludeInfoTypes": { # List of exclude infoTypes. # Set of infoTypes for which findings would affect this rule.
|
|
"infoTypes": [ # An ExclusionRule drops a finding when it overlaps with, or is contained
# within, a finding of an infoType from this list. For
# example, for `InspectionRuleSet.info_types` containing "PHONE_NUMBER" and
# `exclusion_rule` containing `exclude_info_types.info_types` with
# "EMAIL_ADDRESS", the phone number findings are dropped if they overlap
# with an EMAIL_ADDRESS finding.
# As a result, "555-222-2222@example.org" generates only a single
# finding, namely the email address.
|
|
{ # Type of information detected by the API.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
],
|
|
},
|
|
"dictionary": { # Custom information type based on a dictionary of words or phrases. This can # Dictionary which defines the rule.
|
|
# be used to match sensitive information specific to the data, such as a list
|
|
# of employee IDs or job titles.
|
|
#
|
|
# Dictionary words are case-insensitive and all characters other than letters
|
|
# and digits in the Unicode [Basic Multilingual
|
|
# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)
|
|
# will be replaced with whitespace when scanning for matches, so the
|
|
# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",
|
|
# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters
|
|
# surrounding any match must be of a different type than the adjacent
|
|
# characters within the word, so letters must be next to non-letters and
|
|
# digits next to non-digits. For example, the dictionary word "jen" will
|
|
# match the first three letters of the text "jen123" but will return no
|
|
# matches for "jennifer".
|
|
#
|
|
# Dictionary words containing a large number of characters that are not
|
|
# letters or digits may result in unexpected findings because such characters
|
|
# are treated as whitespace. The
|
|
# [limits](https://cloud.google.com/dlp/limits) page contains details about
|
|
# the size limits of dictionaries. For dictionaries that do not fit within
|
|
# these constraints, consider using `LargeCustomDictionaryConfig` in the
|
|
# `StoredInfoType` API.
|
|
"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.
|
|
"words": [ # Words or phrases defining the dictionary. The dictionary must contain
|
|
# at least one phrase and every phrase must contain at least 2 characters
|
|
# that are letters or digits. [required]
|
|
"A String",
|
|
],
|
|
},
|
|
"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file
|
|
# is accepted.
|
|
"path": "A String", # A URL representing a file or path (no wildcards) in Cloud Storage.
|
|
# Example: gs://[BUCKET_NAME]/dictionary.txt
|
|
},
|
|
},
|
|
"matchingType": "A String", # How the rule is applied, see MatchingType documentation for details.
|
|
},
|
|
},
|
|
],
|
|
"infoTypes": [ # List of infoTypes this rule set is applied to.
|
|
{ # Type of information detected by the API.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
],
|
|
},
|
|
],
|
|
"contentOptions": [ # List of options defining data content to scan.
|
|
# If empty, text, images, and other content will be included.
|
|
"A String",
|
|
],
|
|
"infoTypes": [ # Restricts what info_types to look for. The values must correspond to
|
|
# InfoType values returned by ListInfoTypes or listed at
|
|
# https://cloud.google.com/dlp/docs/infotypes-reference.
|
|
#
|
|
# When no InfoTypes or CustomInfoTypes are specified in a request, the
|
|
# system may automatically choose what detectors to run. By default this may
|
|
# be all types, but may change over time as detectors are updated.
|
|
#
|
|
# The special InfoType name "ALL_BASIC" can be used to trigger all detectors,
|
|
# but may change over time as new InfoTypes are added. If you need precise
|
|
# control and predictability as to what detectors are run you should specify
|
|
# specific InfoTypes listed in the reference.
|
|
{ # Type of information detected by the API.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
],
|
|
},
|
|
"inspectTemplateName": "A String", # If provided, will be used as the default for all values in InspectConfig.
|
|
# `inspect_config` will be merged into the values persisted as part of the
|
|
# template.
|
|
"actions": [ # Actions to execute at the completion of the job.
|
|
{ # A task to execute on the completion of a job.
|
|
# See https://cloud.google.com/dlp/docs/concepts-actions to learn more.
|
|
"saveFindings": { # If set, the detailed findings will be persisted to the specified # Save resulting findings in a provided location.
|
|
# OutputStorageConfig. Only a single instance of this action can be
|
|
# specified.
|
|
# Compatible with: Inspect, Risk
|
|
"outputConfig": { # Cloud repository for storing output.
|
|
"table": { # Message defining the location of a BigQuery table. # Store findings in an existing table or a new table in an existing
# dataset. If table_id is not set, a new one will be generated
# for you with the following format:
# dlp_googleapis_yyyy_mm_dd_[dlp_job_id]. Pacific timezone will be used for
# generating the date details.
#
# For Inspect, each column in an existing output table must have the same
# name, type, and mode of a field in the `Finding` object.
#
# For Risk, an existing output table should be the output of a previous
# Risk analysis job run on the same source table, with the same privacy
# metric and quasi-identifiers. Risk jobs that analyze the same table but
# compute a different privacy metric, or use different sets of
# quasi-identifiers, cannot store their results in the same table.
#
# A table is uniquely
# identified by its project_id, dataset_id, and table_name. Within a query
# a table is often referenced with a string in the format of:
# `&lt;project_id&gt;:&lt;dataset_id&gt;.&lt;table_id&gt;` or
# `&lt;project_id&gt;.&lt;dataset_id&gt;.&lt;table_id&gt;`.
|
|
"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
|
|
# If omitted, project ID is inferred from the API call.
|
|
"tableId": "A String", # Name of the table.
|
|
"datasetId": "A String", # Dataset ID of the table.
|
|
},
|
|
"outputSchema": "A String", # Schema used for writing the findings for Inspect jobs. This field is only
|
|
# used for Inspect and must be unspecified for Risk jobs. Columns are derived
|
|
# from the `Finding` object. If appending to an existing table, any columns
|
|
# from the predefined schema that are missing will be added. No columns in
|
|
# the existing table will be deleted.
|
|
#
|
|
# If unspecified, then all available columns will be used for a new table or
|
|
# an (existing) table with no schema, and no changes will be made to an
|
|
# existing table that has a schema.
|
|
},
|
|
},
|
|
"jobNotificationEmails": { # Enable email notification to project owners and editors on job's
# completion/failure.
|
|
},
|
|
"publishSummaryToCscc": { # Publish the result summary of a DlpJob to the Cloud Security # Publish summary to Cloud Security Command Center (Alpha).
|
|
# Command Center (CSCC Alpha).
|
|
# This action is only available for projects that are part of
|
|
# an organization and whitelisted for the alpha Cloud Security Command
|
|
# Center.
|
|
# The action will publish count of finding instances and their info types.
|
|
# The summary of findings will be persisted in CSCC and is governed by CSCC
|
|
# service-specific policy, see https://cloud.google.com/terms/service-terms
|
|
# Only a single instance of this action can be specified.
|
|
# Compatible with: Inspect
|
|
},
|
|
"pubSub": { # Publish a notification to a Pub/Sub topic.
# Publishes a message to the given Pub/Sub topic when the DlpJob has completed. The
# message contains a single field, `DlpJobName`, which is equal to the
# finished job's
# [`DlpJob.name`](/dlp/docs/reference/rest/v2/projects.dlpJobs#DlpJob).
# Compatible with: Inspect, Risk
|
|
"topic": "A String", # Cloud Pub/Sub topic to send notifications to. The topic must have given
|
|
# publishing access rights to the DLP API service account executing
|
|
# the long running DlpJob sending the notifications.
|
|
# Format is projects/{project}/topics/{topic}.
|
|
},
|
|
},
|
|
],
|
|
},
|
|
}
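A minimal sketch of how a request body of the shape documented above might be assembled. Everything here is illustrative rather than authoritative: the helper name `build_inspect_job_body`, the bucket name, and the Pub/Sub topic are hypothetical, the enclosing keys (`inspectJob`, `storageConfig`, `cloudStorageOptions`, `fileSet`, `regexFileSet`) are assumed from the DLP v2 reference because they open before this part of the schema, and the `matchingType` value is assumed from the MatchingType enum.

```python
def build_inspect_job_body(bucket_name, topic, max_findings=100):
    """Build a request body dict for dlpJobs.create describing an inspect job.

    All argument values are caller-supplied placeholders (hypothetical).
    """
    inspect_config = {
        # Restrict detection to two built-in infoTypes.
        "infoTypes": [{"name": "PHONE_NUMBER"}, {"name": "EMAIL_ADDRESS"}],
        # Only return findings at or above this threshold.
        "minLikelihood": "LIKELY",
        "includeQuote": True,
        "limits": {"maxFindingsPerRequest": max_findings},
        "ruleSet": [
            {
                # Drop PHONE_NUMBER findings that overlap an EMAIL_ADDRESS
                # finding, as in the "555-222-2222@example.org" example.
                "infoTypes": [{"name": "PHONE_NUMBER"}],
                "rules": [
                    {
                        "exclusionRule": {
                            "excludeInfoTypes": {
                                "infoTypes": [{"name": "EMAIL_ADDRESS"}]
                            },
                            # Assumed enum value; see the MatchingType docs.
                            "matchingType": "MATCHING_TYPE_PARTIAL_MATCH",
                        }
                    }
                ],
            }
        ],
    }
    storage_config = {
        "cloudStorageOptions": {
            "fileSet": {
                "regexFileSet": {
                    "bucketName": bucket_name,
                    # RE2 patterns: scan CSV files, skip anything under tmp/.
                    "includeRegex": [r".*\.csv"],
                    "excludeRegex": [r"^tmp/.*"],
                }
            },
            # Scan at most half of each file's bytes.
            "bytesLimitPerFilePercent": 50,
        }
    }
    return {
        "inspectJob": {
            "storageConfig": storage_config,
            "inspectConfig": inspect_config,
            # Notify a Pub/Sub topic when the job completes.
            "actions": [{"pubSub": {"topic": topic}}],
        }
    }


body = build_inspect_job_body("example-bucket", "projects/example/topics/dlp-done")
```

The resulting dict would be passed as the `body` argument of `create(parent, body, x__xgafv=None)`; the sketch stops short of making the call so that it runs without credentials.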
|
|
|
|
x__xgafv: string, V1 error format.
|
|
Allowed values
|
|
1 - v1 error format
|
|
2 - v2 error format
|
|
|
|
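The `relativeLikelihood` behaviour described under `likelihoodAdjustment` in the request body can be illustrated with a small local model. This is only a sketch of the documented clamping rule, not the service's implementation; the `LEVELS` list and the `adjust_likelihood` helper are hypothetical names introduced for illustration.

```python
# Likelihood levels in ascending order, as named in the field descriptions.
LEVELS = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]


def adjust_likelihood(likelihood, relative_likelihood):
    """Shift a likelihood by a number of levels, clamped to the valid range."""
    index = LEVELS.index(likelihood) + relative_likelihood
    # Never drop below VERY_UNLIKELY or exceed VERY_LIKELY.
    index = max(0, min(index, len(LEVELS) - 1))
    return LEVELS[index]


# An adjustment of 1 followed by -1 starting from VERY_LIKELY: the first step
# is clamped, so the second step lands on LIKELY rather than VERY_LIKELY.
after_up = adjust_likelihood("VERY_LIKELY", 1)
final = adjust_likelihood(after_up, -1)
```

Because each adjustment is clamped as it is applied, the order of rules matters: the two adjustments above do not cancel out.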
Returns:
|
|
An object of the form:
|
|
|
|
{ # Combines all of the information about a DLP job.
|
|
"errors": [ # A stream of errors encountered running the job.
|
|
{ # Details about an error encountered during job execution or
|
|
# the results of an unsuccessful activation of the JobTrigger.
|
|
# Output only field.
|
|
"timestamps": [ # The times the error occurred.
|
|
"A String",
|
|
],
|
|
"details": { # The `Status` type defines a logical error model that is suitable for
|
|
# different programming environments, including REST APIs and RPC APIs. It is
|
|
# used by [gRPC](https://github.com/grpc). Each `Status` message contains
|
|
# three pieces of data: error code, error message, and error details.
|
|
#
|
|
# You can find out more about this error model and how to work with it in the
|
|
# [API Design Guide](https://cloud.google.com/apis/design/errors).
|
|
"message": "A String", # A developer-facing error message, which should be in English. Any
|
|
# user-facing error message should be localized and sent in the
|
|
# google.rpc.Status.details field, or localized by the client.
|
|
"code": 42, # The status code, which should be an enum value of google.rpc.Code.
|
|
"details": [ # A list of messages that carry the error details. There is a common set of
|
|
# message types for APIs to use.
|
|
{
|
|
"a_key": "", # Properties of the object. Contains field @type with type URL.
|
|
},
|
|
],
|
|
},
|
|
},
|
|
],
|
|
"name": "A String", # The server-assigned name.
|
|
"inspectDetails": { # The results of an inspect DataSource job. # Results from inspecting a data source.
|
|
"requestedOptions": { # The configuration used for this job.
|
|
"snapshotInspectTemplate": { # If run with an InspectTemplate, a snapshot of its state at the time of
# this run.
# The inspectTemplate contains a configuration (set of types of sensitive data
# to be detected) to be used anywhere you otherwise would normally specify
# InspectConfig. See https://cloud.google.com/dlp/docs/concepts-templates
# to learn more.
|
|
"updateTime": "A String", # The last update timestamp of an inspectTemplate; output-only field.
|
|
"displayName": "A String", # Display name (max 256 chars).
|
|
"description": "A String", # Short description (max 256 chars).
|
|
"inspectConfig": { # Configuration description of the scanning process. # The core content of the template. Configuration of the scanning process.
|
|
# When used with redactContent only info_types and min_likelihood are currently
|
|
# used.
|
|
"excludeInfoTypes": True or False, # When true, excludes type information of the findings.
|
|
"limits": {
|
|
"maxFindingsPerRequest": 42, # Max number of findings that will be returned per request/job.
|
|
# When set within `InspectContentRequest`, the maximum returned is 2000
|
|
# regardless of whether this is set higher.
|
|
"maxFindingsPerInfoType": [ # Configuration of findings limit given for specified infoTypes.
|
|
{ # Max findings configuration per infoType, per content item or long
|
|
# running DlpJob.
|
|
"infoType": { # Type of information detected by the API. # Type of information the findings limit applies to. Only one limit per
|
|
# info_type should be provided. If InfoTypeLimit does not have an
|
|
# info_type, the DLP API applies the limit against all info_types that
|
|
# are found but not specified in another InfoTypeLimit.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
"maxFindings": 42, # Max findings limit for the given infoType.
|
|
},
|
|
],
|
|
"maxFindingsPerItem": 42, # Max number of findings that will be returned for each item scanned.
|
|
# When set within `InspectDataSourceRequest`,
|
|
# the maximum returned is 2000 regardless of whether this is set higher.
|
|
# When set within `InspectContentRequest`, this field is ignored.
|
|
},
|
|
"minLikelihood": "A String", # Only returns findings equal to or above this threshold. The default is
|
|
# POSSIBLE.
|
|
# See https://cloud.google.com/dlp/docs/likelihood to learn more.
|
|
"customInfoTypes": [ # CustomInfoTypes provided by the user. See
|
|
# https://cloud.google.com/dlp/docs/creating-custom-infotypes to learn more.
|
|
{ # Custom information type provided by the user. Used to find domain-specific
|
|
# sensitive information configurable to the data in question.
|
|
"regex": { # Message defining a custom regular expression. # Regular expression based CustomInfoType.
|
|
"pattern": "A String", # Pattern defining the regular expression. Its syntax
|
|
# (https://github.com/google/re2/wiki/Syntax) can be found under the
|
|
# google/re2 repository on GitHub.
|
|
"groupIndexes": [ # The index of the submatch to extract as findings. When not
|
|
# specified, the entire match is returned. No more than 3 may be included.
|
|
42,
|
|
],
|
|
},
|
|
"surrogateType": { # Message for detecting output from deidentification transformations that support reversing,
# such as
# [`CryptoReplaceFfxFpeConfig`](/dlp/docs/reference/rest/v2/organizations.deidentifyTemplates#cryptoreplaceffxfpeconfig).
# These types of transformations are
# those that perform pseudonymization, thereby producing a "surrogate" as
# output. This should be used in conjunction with a field on the
# transformation such as `surrogate_info_type`. This CustomInfoType does
# not support the use of `detection_rules`.
|
|
},
|
|
"infoType": { # Type of information detected by the API. # CustomInfoType can either be a new infoType, or an extension of a built-in
# infoType, when the name matches one of the existing infoTypes and that infoType
# is specified in the `InspectContent.info_types` field. Specifying the latter
# adds findings to the ones detected by the system. If the built-in info type is
# not specified in the `InspectContent.info_types` list, then the name is treated
# as a custom info type.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
"dictionary": { # Custom information type based on a dictionary of words or phrases. This can # A list of phrases to detect as a CustomInfoType.
|
|
# be used to match sensitive information specific to the data, such as a list
|
|
# of employee IDs or job titles.
|
|
#
|
|
# Dictionary words are case-insensitive and all characters other than letters
|
|
# and digits in the Unicode [Basic Multilingual
|
|
# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)
|
|
# will be replaced with whitespace when scanning for matches, so the
|
|
# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",
|
|
# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters
|
|
# surrounding any match must be of a different type than the adjacent
|
|
# characters within the word, so letters must be next to non-letters and
|
|
# digits next to non-digits. For example, the dictionary word "jen" will
|
|
# match the first three letters of the text "jen123" but will return no
|
|
# matches for "jennifer".
|
|
#
|
|
# Dictionary words containing a large number of characters that are not
|
|
# letters or digits may result in unexpected findings because such characters
|
|
# are treated as whitespace. The
|
|
# [limits](https://cloud.google.com/dlp/limits) page contains details about
|
|
# the size limits of dictionaries. For dictionaries that do not fit within
|
|
# these constraints, consider using `LargeCustomDictionaryConfig` in the
|
|
# `StoredInfoType` API.
|
|
"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.
|
|
"words": [ # Words or phrases defining the dictionary. The dictionary must contain
|
|
# at least one phrase and every phrase must contain at least 2 characters
|
|
# that are letters or digits. [required]
|
|
"A String",
|
|
],
|
|
},
|
|
"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file
|
|
# is accepted.
|
|
"path": "A String", # A url representing a file or path (no wildcards) in Cloud Storage.
|
|
# Example: gs://[BUCKET_NAME]/dictionary.txt
|
|
},
|
|
},
|
|
"storedType": { # A reference to a StoredInfoType to use with scanning. # Load an existing `StoredInfoType` resource for use in
|
|
# `InspectDataSource`. Not currently supported in `InspectContent`.
|
|
"name": "A String", # Resource name of the requested `StoredInfoType`, for example
|
|
# `organizations/433245324/storedInfoTypes/432452342` or
|
|
# `projects/project-id/storedInfoTypes/432452342`.
|
|
"createTime": "A String", # Timestamp indicating when the version of the `StoredInfoType` used for
|
|
# inspection was created. Output-only field, populated by the system.
|
|
},
|
|
"detectionRules": [ # Set of detection rules to apply to all findings of this CustomInfoType.
|
|
# Rules are applied in the order in which they are specified. Not supported for the
|
|
# `surrogate_type` CustomInfoType.
|
|
{ # Deprecated; use `InspectionRuleSet` instead. Rule for modifying a
|
|
# `CustomInfoType` to alter behavior under certain circumstances, depending
|
|
# on the specific details of the rule. Not supported for the `surrogate_type`
|
|
# custom infoType.
|
|
"hotwordRule": { # Hotword-based detection rule. # The rule that adjusts the likelihood of findings within a certain
|
|
# proximity of hotwords.
|
|
"proximity": { # Message for specifying a window around a finding to apply a detection rule. # Proximity of the finding within which the entire hotword must reside.
|
|
# The total length of the window cannot exceed 1000 characters. Note that
|
|
# the finding itself will be included in the window, so that hotwords may
|
|
# be used to match substrings of the finding itself. For example, the
|
|
# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be
|
|
# adjusted upwards if the area code is known to be the local area code of
|
|
# a company office using the hotword regex "\(xxx\)", where "xxx"
|
|
# is the area code in question.
|
|
"windowAfter": 42, # Number of characters after the finding to consider.
|
|
"windowBefore": 42, # Number of characters before the finding to consider.
|
|
},
|
|
"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.
|
|
"pattern": "A String", # Pattern defining the regular expression. Its syntax
|
|
# (https://github.com/google/re2/wiki/Syntax) can be found under the
|
|
# google/re2 repository on GitHub.
|
|
"groupIndexes": [ # The index of the submatch to extract as findings. When not
|
|
# specified, the entire match is returned. No more than 3 may be included.
|
|
42,
|
|
],
|
|
},
|
|
"likelihoodAdjustment": { # Likelihood adjustment to apply to all matching findings. # Message for specifying an adjustment to the likelihood of a finding as
|
|
# part of a detection rule.
|
|
"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of
|
|
# levels. For example, if a finding would be `POSSIBLE` without the
|
|
# detection rule and `relative_likelihood` is 1, then it is upgraded to
|
|
# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.
|
|
# Likelihood may never drop below `VERY_UNLIKELY` or exceed
|
|
# `VERY_LIKELY`, so applying an adjustment of 1 followed by an
|
|
# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in
|
|
# a final likelihood of `LIKELY`.
|
|
"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.
|
|
},
|
|
},
|
|
},
|
|
],
|
|
"exclusionType": "A String", # If set to EXCLUSION_TYPE_EXCLUDE, this infoType will not cause a finding
|
|
# to be returned. It can still be used for rule matching.
|
|
"likelihood": "A String", # Likelihood to return for this CustomInfoType. This base value can be
|
|
# altered by a detection rule if the finding meets the criteria specified by
|
|
# the rule. Defaults to `VERY_LIKELY` if not specified.
|
|
},
|
|
],
|
|
"includeQuote": True or False, # When true, a contextual quote from the data that triggered a finding is
|
|
# included in the response; see Finding.quote.
|
|
"ruleSet": [ # Set of rules to apply to the findings for this InspectConfig.
|
|
# Exclusion rules contained in the set are executed last; other
|
|
# rules are executed in the order they are specified for each info type.
|
|
{ # Rule set for modifying a set of infoTypes to alter behavior under certain
|
|
# circumstances, depending on the specific details of the rules within the set.
|
|
"rules": [ # Set of rules to be applied to infoTypes. The rules are applied in order.
|
|
{ # A single inspection rule to be applied to infoTypes, specified in
|
|
# `InspectionRuleSet`.
|
|
"hotwordRule": { # Hotword-based detection rule. # The rule that adjusts the likelihood of findings within a certain
|
|
# proximity of hotwords.
|
|
"proximity": { # Message for specifying a window around a finding to apply a detection rule. # Proximity of the finding within which the entire hotword must reside.
|
|
# The total length of the window cannot exceed 1000 characters. Note that
|
|
# the finding itself will be included in the window, so that hotwords may
|
|
# be used to match substrings of the finding itself. For example, the
|
|
# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be
|
|
# adjusted upwards if the area code is known to be the local area code of
|
|
# a company office using the hotword regex "\(xxx\)", where "xxx"
|
|
# is the area code in question.
|
|
"windowAfter": 42, # Number of characters after the finding to consider.
|
|
"windowBefore": 42, # Number of characters before the finding to consider.
|
|
},
|
|
"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.
|
|
"pattern": "A String", # Pattern defining the regular expression. Its syntax
|
|
# (https://github.com/google/re2/wiki/Syntax) can be found under the
|
|
# google/re2 repository on GitHub.
|
|
"groupIndexes": [ # The index of the submatch to extract as findings. When not
|
|
# specified, the entire match is returned. No more than 3 may be included.
|
|
42,
|
|
],
|
|
},
|
|
"likelihoodAdjustment": { # Likelihood adjustment to apply to all matching findings. # Message for specifying an adjustment to the likelihood of a finding as
|
|
# part of a detection rule.
|
|
"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of
|
|
# levels. For example, if a finding would be `POSSIBLE` without the
|
|
# detection rule and `relative_likelihood` is 1, then it is upgraded to
|
|
# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.
|
|
# Likelihood may never drop below `VERY_UNLIKELY` or exceed
|
|
# `VERY_LIKELY`, so applying an adjustment of 1 followed by an
|
|
# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in
|
|
# a final likelihood of `LIKELY`.
|
|
"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.
|
|
},
|
|
},
|
|
"exclusionRule": { # Exclusion rule. # The rule that specifies conditions when findings of infoTypes specified in
|
|
# `InspectionRuleSet` are removed from results.
|
|
"regex": { # Message defining a custom regular expression. # Regular expression which defines the rule.
|
|
"pattern": "A String", # Pattern defining the regular expression. Its syntax
|
|
# (https://github.com/google/re2/wiki/Syntax) can be found under the
|
|
# google/re2 repository on GitHub.
|
|
"groupIndexes": [ # The index of the submatch to extract as findings. When not
|
|
# specified, the entire match is returned. No more than 3 may be included.
|
|
42,
|
|
],
|
|
},
|
|
"excludeInfoTypes": { # List of exclude infoTypes. # Set of infoTypes for which findings would affect this rule.
|
|
"infoTypes": [ # An ExclusionRule drops a finding when it overlaps or is
|
|
# contained within a finding of an infoType from this list. For
|
|
# example, if `InspectionRuleSet.info_types` contains "PHONE_NUMBER" and
|
|
# `exclusion_rule` contains `exclude_info_types.info_types` with
|
|
# "EMAIL_ADDRESS", the phone number findings are dropped if they overlap
|
|
# with an EMAIL_ADDRESS finding.
|
|
# As a result, "555-222-2222@example.org" generates only a single
|
|
# finding, namely the email address.
|
|
{ # Type of information detected by the API.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
],
|
|
},
|
|
"dictionary": { # Custom information type based on a dictionary of words or phrases. This can # Dictionary which defines the rule.
|
|
# be used to match sensitive information specific to the data, such as a list
|
|
# of employee IDs or job titles.
|
|
#
|
|
# Dictionary words are case-insensitive and all characters other than letters
|
|
# and digits in the unicode [Basic Multilingual
|
|
# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)
|
|
# will be replaced with whitespace when scanning for matches, so the
|
|
# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",
|
|
# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters
|
|
# surrounding any match must be of a different type than the adjacent
|
|
# characters within the word, so letters must be next to non-letters and
|
|
# digits next to non-digits. For example, the dictionary word "jen" will
|
|
# match the first three letters of the text "jen123" but will return no
|
|
# matches for "jennifer".
|
|
#
|
|
# Dictionary words containing a large number of characters that are not
|
|
# letters or digits may result in unexpected findings because such characters
|
|
# are treated as whitespace. The
|
|
# [limits](https://cloud.google.com/dlp/limits) page contains details about
|
|
# the size limits of dictionaries. For dictionaries that do not fit within
|
|
# these constraints, consider using `LargeCustomDictionaryConfig` in the
|
|
# `StoredInfoType` API.
|
|
"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.
|
|
"words": [ # Words or phrases defining the dictionary. The dictionary must contain
|
|
# at least one phrase and every phrase must contain at least 2 characters
|
|
# that are letters or digits. [required]
|
|
"A String",
|
|
],
|
|
},
|
|
"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file
|
|
# is accepted.
|
|
"path": "A String", # A url representing a file or path (no wildcards) in Cloud Storage.
|
|
# Example: gs://[BUCKET_NAME]/dictionary.txt
|
|
},
|
|
},
|
|
"matchingType": "A String", # How the rule is applied, see MatchingType documentation for details.
|
|
},
|
|
},
|
|
],
|
|
"infoTypes": [ # List of infoTypes this rule set is applied to.
|
|
{ # Type of information detected by the API.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
],
|
|
},
|
|
],
|
|
"contentOptions": [ # List of options defining data content to scan.
|
|
# If empty, text, images, and other content will be included.
|
|
"A String",
|
|
],
|
|
"infoTypes": [ # Restricts what info_types to look for. The values must correspond to
|
|
# InfoType values returned by ListInfoTypes or listed at
|
|
# https://cloud.google.com/dlp/docs/infotypes-reference.
|
|
#
|
|
# When no InfoTypes or CustomInfoTypes are specified in a request, the
|
|
# system may automatically choose what detectors to run. By default this may
|
|
# be all types, but may change over time as detectors are updated.
|
|
#
|
|
# The special InfoType name "ALL_BASIC" can be used to trigger all detectors,
|
|
# but may change over time as new InfoTypes are added. If you need precise
|
|
# control and predictability over which detectors are run, you should specify
|
|
# specific InfoTypes listed in the reference.
|
|
{ # Type of information detected by the API.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
],
|
|
},
|
|
"createTime": "A String", # The creation timestamp of an inspectTemplate. Output only.
|
|
"name": "A String", # The template name. Output only.
|
|
#
|
|
# The template will have one of the following formats:
|
|
# `projects/PROJECT_ID/inspectTemplates/TEMPLATE_ID` OR
|
|
# `organizations/ORGANIZATION_ID/inspectTemplates/TEMPLATE_ID`
|
|
},
|
|
"jobConfig": {
|
|
"storageConfig": { # Shared message indicating Cloud storage type. # The data to scan.
|
|
"datastoreOptions": { # Options defining a data set within Google Cloud Datastore. # Google Cloud Datastore options specification.
|
|
"partitionId": { # Datastore partition ID. # A partition ID identifies a grouping of entities. The grouping is always
|
|
# by project and namespace, however the namespace ID may be empty.
|
|
#
|
|
# A partition ID contains several dimensions:
|
|
# project ID and namespace ID.
|
|
"projectId": "A String", # The ID of the project to which the entities belong.
|
|
"namespaceId": "A String", # If not empty, the ID of the namespace to which the entities belong.
|
|
},
|
|
"kind": { # A representation of a Datastore kind. # The kind to process.
|
|
"name": "A String", # The name of the kind.
|
|
},
|
|
},
|
|
"bigQueryOptions": { # Options defining BigQuery table and row identifiers. # BigQuery options specification.
|
|
"excludedFields": [ # References to fields excluded from scanning. This allows you to skip
|
|
# inspection of entire columns which you know have no findings.
|
|
{ # General identifier of a data field in a storage service.
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
],
|
|
"rowsLimit": "A String", # Max number of rows to scan. If the table has more rows than this value, the
|
|
# rest of the rows are omitted. If not set, or if set to 0, all rows will be
|
|
# scanned. Only one of rows_limit and rows_limit_percent can be specified.
|
|
# Cannot be used in conjunction with TimespanConfig.
|
|
"sampleMethod": "A String",
|
|
"identifyingFields": [ # References to fields uniquely identifying rows within the table.
|
|
# Nested fields in a format such as `person.birthdate.year` are allowed.
|
|
{ # General identifier of a data field in a storage service.
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
],
|
|
"rowsLimitPercent": 42, # Max percentage of rows to scan. The rest are omitted. The number of rows
|
|
# scanned is rounded down. Must be between 0 and 100, inclusive. Both 0 and
|
|
# 100 mean no limit. Defaults to 0. Only one of rows_limit and
|
|
# rows_limit_percent can be specified. Cannot be used in conjunction with
|
|
# TimespanConfig.
|
|
"tableReference": { # Complete BigQuery table reference. # Message defining the location of a BigQuery table. A table is uniquely
|
|
# identified by its project_id, dataset_id, and table_name. Within a query
|
|
# a table is often referenced with a string in the format of:
|
|
# `<project_id>:<dataset_id>.<table_id>` or
|
|
# `<project_id>.<dataset_id>.<table_id>`.
|
|
"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
|
|
# If omitted, project ID is inferred from the API call.
|
|
"tableId": "A String", # Name of the table.
|
|
"datasetId": "A String", # Dataset ID of the table.
|
|
},
|
|
},
|
|
"timespanConfig": { # Configuration of the timespan of the items to include in scanning.
|
|
# Currently only supported when inspecting Google Cloud Storage and BigQuery.
|
|
"timestampField": { # General identifier of a data field in a storage service. # Specification of the field containing the timestamp of scanned items.
|
|
# Used for data sources like Datastore or BigQuery.
|
|
# If not specified for BigQuery, table last modification timestamp
|
|
# is checked against given time span.
|
|
# The valid data types of the timestamp field are:
|
|
# for BigQuery - timestamp, date, datetime;
|
|
# for Datastore - timestamp.
|
|
# Datastore entity will be scanned if the timestamp property does not exist
|
|
# or its value is empty or invalid.
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
"endTime": "A String", # Exclude files or rows newer than this value.
|
|
# If set to zero, no upper time limit is applied.
|
|
"startTime": "A String", # Exclude files or rows older than this value.
|
|
"enableAutoPopulationOfTimespanConfig": True or False, # When the job is started by a JobTrigger we will automatically figure out
|
|
# a valid start_time to avoid scanning files that have not been modified
|
|
# since the last time the JobTrigger executed. This will be based on the
|
|
# time of the execution of the last run of the JobTrigger.
|
|
},
|
|
"cloudStorageOptions": { # Google Cloud Storage options specification. # Options defining a file or a set of files within a Google Cloud Storage
|
|
# bucket.
|
|
"bytesLimitPerFile": "A String", # Max number of bytes to scan from a file. If a scanned file's size is bigger
|
|
# than this value then the rest of the bytes are omitted. Only one
|
|
# of bytes_limit_per_file and bytes_limit_per_file_percent can be specified.
|
|
"sampleMethod": "A String",
|
|
"fileSet": { # Set of files to scan. # The set of one or more files to scan.
|
|
"url": "A String", # The Cloud Storage url of the file(s) to scan, in the format
|
|
# `gs://<bucket>/<path>`. Trailing wildcard in the path is allowed.
|
|
#
|
|
# If the url ends in a trailing slash, the bucket or directory represented
|
|
# by the url will be scanned non-recursively (content in sub-directories
|
|
# will not be scanned). This means that `gs://mybucket/` is equivalent to
|
|
# `gs://mybucket/*`, and `gs://mybucket/directory/` is equivalent to
|
|
# `gs://mybucket/directory/*`.
|
|
#
|
|
# Exactly one of `url` or `regex_file_set` must be set.
|
|
"regexFileSet": { # The regex-filtered set of files to scan. Exactly one of `url` or
|
|
# `regex_file_set` must be set.
|
|
# Message representing a set of files in a Cloud Storage bucket. Regular
|
|
# expressions are used to allow fine-grained control over which files in the
|
|
# bucket to include.
|
|
#
|
|
# Included files are those that match at least one item in `include_regex` and
|
|
# do not match any items in `exclude_regex`. Note that a file that matches
|
|
# items from both lists will _not_ be included. For a match to occur, the
|
|
# entire file path (i.e., everything in the url after the bucket name) must
|
|
# match the regular expression.
|
|
#
|
|
# For example, given the input `{bucket_name: "mybucket", include_regex:
|
|
# ["directory1/.*"], exclude_regex:
|
|
# ["directory1/excluded.*"]}`:
|
|
#
|
|
# * `gs://mybucket/directory1/myfile` will be included
|
|
# * `gs://mybucket/directory1/directory2/myfile` will be included (`.*` matches
|
|
# across `/`)
|
|
# * `gs://mybucket/directory0/directory1/myfile` will _not_ be included (the
|
|
# full path doesn't match any items in `include_regex`)
|
|
# * `gs://mybucket/directory1/excludedfile` will _not_ be included (the path
|
|
# matches an item in `exclude_regex`)
|
|
#
|
|
# If `include_regex` is left empty, it will match all files by default
|
|
# (this is equivalent to setting `include_regex: [".*"]`).
|
|
#
|
|
# Some other common use cases:
|
|
#
|
|
# * `{bucket_name: "mybucket", exclude_regex: [".*\.pdf"]}` will include all
|
|
# files in `mybucket` except for .pdf files
|
|
# * `{bucket_name: "mybucket", include_regex: ["directory/[^/]+"]}` will
|
|
# include all files directly under `gs://mybucket/directory/`, without matching
|
|
# across `/`
|
|
"excludeRegex": [ # A list of regular expressions matching file paths to exclude. All files in
|
|
# the bucket that match at least one of these regular expressions will be
|
|
# excluded from the scan.
|
|
#
|
|
# Regular expressions use RE2
|
|
# [syntax](https://github.com/google/re2/wiki/Syntax); a guide can be found
|
|
# under the google/re2 repository on GitHub.
|
|
"A String",
|
|
],
|
|
"bucketName": "A String", # The name of a Cloud Storage bucket. Required.
|
|
"includeRegex": [ # A list of regular expressions matching file paths to include. All files in
|
|
# the bucket that match at least one of these regular expressions will be
|
|
# included in the set of files, except for those that also match an item in
|
|
# `exclude_regex`. Leaving this field empty will match all files by default
|
|
# (this is equivalent to including `.*` in the list).
|
|
#
|
|
# Regular expressions use RE2
|
|
# [syntax](https://github.com/google/re2/wiki/Syntax); a guide can be found
|
|
# under the google/re2 repository on GitHub.
|
|
"A String",
|
|
],
|
|
},
|
|
},
|
|
"bytesLimitPerFilePercent": 42, # Max percentage of bytes to scan from a file. The rest are omitted. The
|
|
# number of bytes scanned is rounded down. Must be between 0 and 100,
|
|
# inclusive. Both 0 and 100 mean no limit. Defaults to 0. Only one
|
|
# of bytes_limit_per_file and bytes_limit_per_file_percent can be specified.
|
|
"filesLimitPercent": 42, # Limits the number of files to scan to this percentage of the input FileSet.
|
|
# Number of files scanned is rounded down. Must be between 0 and 100,
|
|
# inclusive. Both 0 and 100 mean no limit. Defaults to 0.
|
|
"fileTypes": [ # List of file type groups to include in the scan.
|
|
# If empty, all files are scanned and available data format processors
|
|
# are applied. In addition, the binary content of the selected files
|
|
# is always scanned as well.
|
|
"A String",
|
|
],
|
|
},
|
|
},
|
|
"inspectConfig": { # How and what to scan for. # Configuration description of the scanning process.
|
|
# When used with redactContent only info_types and min_likelihood are currently
|
|
# used.
|
|
"excludeInfoTypes": True or False, # When true, excludes type information of the findings.
|
|
"limits": {
|
|
"maxFindingsPerRequest": 42, # Max number of findings that will be returned per request/job.
|
|
# When set within `InspectContentRequest`, the maximum returned is 2000
|
|
# regardless of whether this is set higher.
|
|
"maxFindingsPerInfoType": [ # Configuration of findings limit given for specified infoTypes.
|
|
{ # Max findings configuration per infoType, per content item or long
|
|
# running DlpJob.
|
|
"infoType": { # Type of information detected by the API. # Type of information the findings limit applies to. Only one limit per
|
|
# info_type should be provided. If InfoTypeLimit does not have an
|
|
# info_type, the DLP API applies the limit against all info_types that
|
|
# are found but not specified in another InfoTypeLimit.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
"maxFindings": 42, # Max findings limit for the given infoType.
|
|
},
|
|
],
|
|
"maxFindingsPerItem": 42, # Max number of findings that will be returned for each item scanned.
|
|
# When set within `InspectDataSourceRequest`,
|
|
# the maximum returned is 2000 regardless of whether this is set higher.
|
|
# When set within `InspectContentRequest`, this field is ignored.
|
|
},
|
|
"minLikelihood": "A String", # Only returns findings equal to or above this threshold. The default is
|
|
# POSSIBLE.
|
|
# See https://cloud.google.com/dlp/docs/likelihood to learn more.
|
|
"customInfoTypes": [ # CustomInfoTypes provided by the user. See
|
|
# https://cloud.google.com/dlp/docs/creating-custom-infotypes to learn more.
|
|
{ # Custom information type provided by the user. Used to find domain-specific
|
|
# sensitive information configurable to the data in question.
|
|
"regex": { # Message defining a custom regular expression. # Regular expression based CustomInfoType.
|
|
"pattern": "A String", # Pattern defining the regular expression. Its syntax
|
|
# (https://github.com/google/re2/wiki/Syntax) can be found under the
|
|
# google/re2 repository on GitHub.
|
|
"groupIndexes": [ # The index of the submatch to extract as findings. When not
|
|
# specified, the entire match is returned. No more than 3 may be included.
|
|
42,
|
|
],
|
|
},
|
|
"surrogateType": { # Message for detecting output from deidentification transformations that
|
|
# support reversing, such as
|
|
# [`CryptoReplaceFfxFpeConfig`](/dlp/docs/reference/rest/v2/organizations.deidentifyTemplates#cryptoreplaceffxfpeconfig).
|
|
# These types of transformations are
|
|
# those that perform pseudonymization, thereby producing a "surrogate" as
|
|
# output. This should be used in conjunction with a field on the
|
|
# transformation such as `surrogate_info_type`. This CustomInfoType does
|
|
# not support the use of `detection_rules`.
|
|
},
|
|
"infoType": { # Type of information detected by the API. # CustomInfoType can either be a new infoType or an extension of a built-in
|
|
# infoType, when the name matches one of the existing infoTypes and that infoType
|
|
# is specified in the `InspectContent.info_types` field. Specifying the latter
|
|
# adds findings to those detected by the system. If the built-in infoType is
|
|
# not specified in the `InspectContent.info_types` list, then the name is treated
|
|
# as a custom infoType.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
"dictionary": { # Custom information type based on a dictionary of words or phrases. This can # A list of phrases to detect as a CustomInfoType.
|
|
# be used to match sensitive information specific to the data, such as a list
|
|
# of employee IDs or job titles.
|
|
#
|
|
# Dictionary words are case-insensitive and all characters other than letters
|
|
# and digits in the unicode [Basic Multilingual
|
|
# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)
|
|
# will be replaced with whitespace when scanning for matches, so the
|
|
# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",
|
|
# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters
|
|
# surrounding any match must be of a different type than the adjacent
|
|
# characters within the word, so letters must be next to non-letters and
|
|
# digits next to non-digits. For example, the dictionary word "jen" will
|
|
# match the first three letters of the text "jen123" but will return no
|
|
# matches for "jennifer".
|
|
#
|
|
# Dictionary words containing a large number of characters that are not
|
|
# letters or digits may result in unexpected findings because such characters
|
|
# are treated as whitespace. The
|
|
# [limits](https://cloud.google.com/dlp/limits) page contains details about
|
|
# the size limits of dictionaries. For dictionaries that do not fit within
|
|
# these constraints, consider using `LargeCustomDictionaryConfig` in the
|
|
# `StoredInfoType` API.
|
|
"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.
|
|
"words": [ # Words or phrases defining the dictionary. The dictionary must contain
|
|
# at least one phrase and every phrase must contain at least 2 characters
|
|
# that are letters or digits. [required]
|
|
"A String",
|
|
],
|
|
},
|
|
"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file
|
|
# is accepted.
|
|
"path": "A String", # A url representing a file or path (no wildcards) in Cloud Storage.
|
|
# Example: gs://[BUCKET_NAME]/dictionary.txt
|
|
},
|
|
},
|
|
"storedType": { # A reference to a StoredInfoType to use with scanning. # Load an existing `StoredInfoType` resource for use in
|
|
# `InspectDataSource`. Not currently supported in `InspectContent`.
|
|
"name": "A String", # Resource name of the requested `StoredInfoType`, for example
|
|
# `organizations/433245324/storedInfoTypes/432452342` or
|
|
# `projects/project-id/storedInfoTypes/432452342`.
|
|
"createTime": "A String", # Timestamp indicating when the version of the `StoredInfoType` used for
|
|
# inspection was created. Output-only field, populated by the system.
|
|
},
|
|
"detectionRules": [ # Set of detection rules to apply to all findings of this CustomInfoType.
|
|
# Rules are applied in the order in which they are specified. Not supported for the
|
|
# `surrogate_type` CustomInfoType.
|
|
{ # Deprecated; use `InspectionRuleSet` instead. Rule for modifying a
|
|
# `CustomInfoType` to alter behavior under certain circumstances, depending
|
|
# on the specific details of the rule. Not supported for the `surrogate_type`
|
|
# custom infoType.
|
|
"hotwordRule": { # Hotword-based detection rule. # The rule that adjusts the likelihood of findings within a certain
|
|
# proximity of hotwords.
|
|
"proximity": { # Message for specifying a window around a finding to apply a detection rule. # Proximity of the finding within which the entire hotword must reside.
|
|
# The total length of the window cannot exceed 1000 characters. Note that
|
|
# the finding itself will be included in the window, so that hotwords may
|
|
# be used to match substrings of the finding itself. For example, the
|
|
# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be
|
|
# adjusted upwards if the area code is known to be the local area code of
|
|
# a company office using the hotword regex "\(xxx\)", where "xxx"
|
|
# is the area code in question.
|
|
"windowAfter": 42, # Number of characters after the finding to consider.
|
|
"windowBefore": 42, # Number of characters before the finding to consider.
|
|
},
|
|
"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.
|
|
"pattern": "A String", # Pattern defining the regular expression. Its syntax
|
|
# (https://github.com/google/re2/wiki/Syntax) can be found under the
|
|
# google/re2 repository on GitHub.
|
|
"groupIndexes": [ # The index of the submatch to extract as findings. When not
|
|
# specified, the entire match is returned. No more than 3 may be included.
|
|
42,
|
|
],
|
|
},
|
|
"likelihoodAdjustment": { # Likelihood adjustment to apply to all matching findings. # Message for specifying an adjustment to the likelihood of a finding as
|
|
# part of a detection rule.
|
|
"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of
|
|
# levels. For example, if a finding would be `POSSIBLE` without the
|
|
# detection rule and `relative_likelihood` is 1, then it is upgraded to
|
|
# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.
|
|
# Likelihood may never drop below `VERY_UNLIKELY` or exceed
|
|
# `VERY_LIKELY`, so applying an adjustment of 1 followed by an
|
|
# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in
|
|
# a final likelihood of `LIKELY`.
|
|
"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.
|
|
},
|
|
},
|
|
},
|
|
],
|
|
"exclusionType": "A String", # If set to EXCLUSION_TYPE_EXCLUDE this infoType will not cause a finding
|
|
# to be returned. It can still be used for rule matching.
|
|
"likelihood": "A String", # Likelihood to return for this CustomInfoType. This base value can be
|
|
# altered by a detection rule if the finding meets the criteria specified by
|
|
# the rule. Defaults to `VERY_LIKELY` if not specified.
|
|
},
|
|
],
|
|
"includeQuote": True or False, # When true, a contextual quote from the data that triggered a finding is
|
|
# included in the response; see Finding.quote.
|
|
"ruleSet": [ # Set of rules to apply to the findings for this InspectConfig.
|
|
# Exclusion rules contained in the set are executed at the end; other
|
|
# rules are executed in the order they are specified for each info type.
|
|
{ # Rule set for modifying a set of infoTypes to alter behavior under certain
|
|
# circumstances, depending on the specific details of the rules within the set.
|
|
"rules": [ # Set of rules to be applied to infoTypes. The rules are applied in order.
|
|
{ # A single inspection rule to be applied to infoTypes, specified in
|
|
# `InspectionRuleSet`.
|
|
"hotwordRule": { # The rule that adjusts the likelihood of findings within a certain # Hotword-based detection rule.
|
|
# proximity of hotwords.
|
|
"proximity": { # Message for specifying a window around a finding to apply a detection # Proximity of the finding within which the entire hotword must reside.
|
|
# The total length of the window cannot exceed 1000 characters. Note that
|
|
# the finding itself will be included in the window, so that hotwords may
|
|
# be used to match substrings of the finding itself. For example, the
|
|
# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be
|
|
# adjusted upwards if the area code is known to be the local area code of
|
|
# a company office using the hotword regex "\(xxx\)", where "xxx"
|
|
# is the area code in question.
|
|
# rule.
|
|
"windowAfter": 42, # Number of characters after the finding to consider.
|
|
"windowBefore": 42, # Number of characters before the finding to consider.
|
|
},
|
|
"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.
|
|
"pattern": "A String", # Pattern defining the regular expression. Its syntax
|
|
# (https://github.com/google/re2/wiki/Syntax) can be found under the
|
|
# google/re2 repository on GitHub.
|
|
"groupIndexes": [ # The index of the submatch to extract as findings. When not
|
|
# specified, the entire match is returned. No more than 3 may be included.
|
|
42,
|
|
],
|
|
},
|
|
"likelihoodAdjustment": { # Message for specifying an adjustment to the likelihood of a finding as # Likelihood adjustment to apply to all matching findings.
|
|
# part of a detection rule.
|
|
"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of
|
|
# levels. For example, if a finding would be `POSSIBLE` without the
|
|
# detection rule and `relative_likelihood` is 1, then it is upgraded to
|
|
# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.
|
|
# Likelihood may never drop below `VERY_UNLIKELY` or exceed
|
|
# `VERY_LIKELY`, so applying an adjustment of 1 followed by an
|
|
# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in
|
|
# a final likelihood of `LIKELY`.
|
|
"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.
|
|
},
|
|
},
|
|
"exclusionRule": { # The rule that specifies conditions when findings of infoTypes specified in # Exclusion rule.
|
|
# `InspectionRuleSet` are removed from results.
|
|
"regex": { # Message defining a custom regular expression. # Regular expression which defines the rule.
|
|
"pattern": "A String", # Pattern defining the regular expression. Its syntax
|
|
# (https://github.com/google/re2/wiki/Syntax) can be found under the
|
|
# google/re2 repository on GitHub.
|
|
"groupIndexes": [ # The index of the submatch to extract as findings. When not
|
|
# specified, the entire match is returned. No more than 3 may be included.
|
|
42,
|
|
],
|
|
},
|
|
"excludeInfoTypes": { # List of exclude infoTypes. # Set of infoTypes for which findings would affect this rule.
|
|
"infoTypes": [ # InfoType list in ExclusionRule rule drops a finding when it overlaps or
|
|
# contained within with a finding of an infoType from this list. For
|
|
# example, for `InspectionRuleSet.info_types` containing "PHONE_NUMBER"` and
|
|
# `exclusion_rule` containing `exclude_info_types.info_types` with
|
|
# "EMAIL_ADDRESS" the phone number findings are dropped if they overlap
|
|
# with EMAIL_ADDRESS finding.
|
|
# That leads to "555-222-2222@example.org" to generate only a single
|
|
# finding, namely email address.
|
|
{ # Type of information detected by the API.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
],
|
|
},
|
|
"dictionary": { # Custom information type based on a dictionary of words or phrases. This can # Dictionary which defines the rule.
|
|
# be used to match sensitive information specific to the data, such as a list
|
|
# of employee IDs or job titles.
|
|
#
|
|
# Dictionary words are case-insensitive and all characters other than letters
|
|
# and digits in the Unicode [Basic Multilingual
|
|
# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)
|
|
# will be replaced with whitespace when scanning for matches, so the
|
|
# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",
|
|
# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters
|
|
# surrounding any match must be of a different type than the adjacent
|
|
# characters within the word, so letters must be next to non-letters and
|
|
# digits next to non-digits. For example, the dictionary word "jen" will
|
|
# match the first three letters of the text "jen123" but will return no
|
|
# matches for "jennifer".
|
|
#
|
|
# Dictionary words containing a large number of characters that are not
|
|
# letters or digits may result in unexpected findings because such characters
|
|
# are treated as whitespace. The
|
|
# [limits](https://cloud.google.com/dlp/limits) page contains details about
|
|
# the size limits of dictionaries. For dictionaries that do not fit within
|
|
# these constraints, consider using `LargeCustomDictionaryConfig` in the
|
|
# `StoredInfoType` API.
|
|
"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.
|
|
"words": [ # Words or phrases defining the dictionary. The dictionary must contain
|
|
# at least one phrase and every phrase must contain at least 2 characters
|
|
# that are letters or digits. [required]
|
|
"A String",
|
|
],
|
|
},
|
|
"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file
|
|
# is accepted.
|
|
"path": "A String", # A url representing a file or path (no wildcards) in Cloud Storage.
|
|
# Example: gs://[BUCKET_NAME]/dictionary.txt
|
|
},
|
|
},
|
|
"matchingType": "A String", # How the rule is applied, see MatchingType documentation for details.
|
|
},
|
|
},
|
|
],
|
|
"infoTypes": [ # List of infoTypes this rule set is applied to.
|
|
{ # Type of information detected by the API.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
],
|
|
},
|
|
],
|
|
"contentOptions": [ # List of options defining data content to scan.
|
|
# If empty, text, images, and other content will be included.
|
|
"A String",
|
|
],
|
|
"infoTypes": [ # Restricts what info_types to look for. The values must correspond to
|
|
# InfoType values returned by ListInfoTypes or listed at
|
|
# https://cloud.google.com/dlp/docs/infotypes-reference.
|
|
#
|
|
# When no InfoTypes or CustomInfoTypes are specified in a request, the
|
|
# system may automatically choose what detectors to run. By default this may
|
|
# be all types, but may change over time as detectors are updated.
|
|
#
|
|
# The special InfoType name "ALL_BASIC" can be used to trigger all detectors,
|
|
# but may change over time as new InfoTypes are added. If you need precise
|
|
# control and predictability as to what detectors are run you should specify
|
|
# specific InfoTypes listed in the reference.
|
|
{ # Type of information detected by the API.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
],
|
|
},
|
|
"inspectTemplateName": "A String", # If provided, will be used as the default for all values in InspectConfig.
|
|
# `inspect_config` will be merged into the values persisted as part of the
|
|
# template.
|
|
"actions": [ # Actions to execute at the completion of the job.
|
|
{ # A task to execute on the completion of a job.
|
|
# See https://cloud.google.com/dlp/docs/concepts-actions to learn more.
|
|
"saveFindings": { # If set, the detailed findings will be persisted to the specified # Save resulting findings in a provided location.
|
|
# OutputStorageConfig. Only a single instance of this action can be
|
|
# specified.
|
|
# Compatible with: Inspect, Risk
|
|
"outputConfig": { # Cloud repository for storing output.
|
|
"table": { # Message defining the location of a BigQuery table. A table is uniquely # Store findings in an existing table or a new table in an existing
|
|
# dataset. If table_id is not set a new one will be generated
|
|
# for you with the following format:
|
|
# dlp_googleapis_yyyy_mm_dd_[dlp_job_id]. Pacific timezone will be used for
|
|
# generating the date details.
|
|
#
|
|
# For Inspect, each column in an existing output table must have the same
|
|
# name, type, and mode of a field in the `Finding` object.
|
|
#
|
|
# For Risk, an existing output table should be the output of a previous
|
|
# Risk analysis job run on the same source table, with the same privacy
|
|
# metric and quasi-identifiers. Risk jobs that analyze the same table but
|
|
# compute a different privacy metric, or use different sets of
|
|
# quasi-identifiers, cannot store their results in the same table.
|
|
# identified by its project_id, dataset_id, and table_name. Within a query
|
|
# a table is often referenced with a string in the format of:
|
|
# `<project_id>:<dataset_id>.<table_id>` or
|
|
# `<project_id>.<dataset_id>.<table_id>`.
|
|
"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
|
|
# If omitted, project ID is inferred from the API call.
|
|
"tableId": "A String", # Name of the table.
|
|
"datasetId": "A String", # Dataset ID of the table.
|
|
},
|
|
"outputSchema": "A String", # Schema used for writing the findings for Inspect jobs. This field is only
|
|
# used for Inspect and must be unspecified for Risk jobs. Columns are derived
|
|
# from the `Finding` object. If appending to an existing table, any columns
|
|
# from the predefined schema that are missing will be added. No columns in
|
|
# the existing table will be deleted.
|
|
#
|
|
# If unspecified, then all available columns will be used for a new table or
|
|
# an (existing) table with no schema, and no changes will be made to an
|
|
# existing table that has a schema.
|
|
},
|
|
},
|
|
"jobNotificationEmails": { # Enable email notification to project owners and editors on jobs's # Enable email notification to project owners and editors on job's
|
|
# completion/failure.
|
|
# completion/failure.
|
|
},
|
|
"publishSummaryToCscc": { # Publish the result summary of a DlpJob to the Cloud Security # Publish summary to Cloud Security Command Center (Alpha).
|
|
# Command Center (CSCC Alpha).
|
|
# This action is only available for projects that are part of
|
|
# an organization and whitelisted for the alpha Cloud Security Command
|
|
# Center.
|
|
# The action publishes the count of finding instances and their info types.
|
|
# The summary of findings is persisted in CSCC and is governed by the CSCC
|
|
# service-specific policy; see https://cloud.google.com/terms/service-terms.
|
|
# Only a single instance of this action can be specified.
|
|
# Compatible with: Inspect
|
|
},
|
|
"pubSub": { # Publish a message into given Pub/Sub topic when DlpJob has completed. The # Publish a notification to a pubsub topic.
|
|
# message contains a single field, `DlpJobName`, which is equal to the
|
|
# finished job's
|
|
# [`DlpJob.name`](/dlp/docs/reference/rest/v2/projects.dlpJobs#DlpJob).
|
|
# Compatible with: Inspect, Risk
|
|
"topic": "A String", # Cloud Pub/Sub topic to send notifications to. The topic must have given
|
|
# publishing access rights to the DLP API service account executing
|
|
# the long running DlpJob sending the notifications.
|
|
# Format is projects/{project}/topics/{topic}.
|
|
},
|
|
},
|
|
],
|
|
},
|
|
},
|
|
"result": { # All result fields mentioned below are updated while the job is processing. # A summary of the outcome of this inspect job.
|
|
"infoTypeStats": [ # Statistics of how many instances of each info type were found during
|
|
# the inspect job.
|
|
{ # Statistics regarding a specific InfoType.
|
|
"count": "A String", # Number of findings for this infoType.
|
|
"infoType": { # Type of information detected by the API. # The type of finding this stat is for.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
},
|
|
],
|
|
"totalEstimatedBytes": "A String", # Estimate of the number of bytes to process.
|
|
"processedBytes": "A String", # Total size in bytes that were processed.
|
|
},
|
|
},
|
|
"riskDetails": { # Result of a risk analysis operation request. # Results from analyzing risk of a data source.
|
|
"numericalStatsResult": { # Result of the numerical stats computation.
|
|
"quantileValues": [ # List of 99 values that partition the set of field values into 100 equal
|
|
# sized buckets.
|
|
{ # Set of primitive values supported by the system.
|
|
# Note that for the purposes of inspection or transformation, the number
|
|
# of bytes considered to comprise a 'Value' is based on its representation
|
|
# as a UTF-8 encoded string. For example, if 'integer_value' is set to
|
|
# 123456789, the number of bytes would be counted as 9, even though an
|
|
# int64 only holds up to 8 bytes of data.
|
|
"floatValue": 3.14,
|
|
"timestampValue": "A String",
|
|
"dayOfWeekValue": "A String",
|
|
"timeValue": { # Represents a time of day. The date and time zone are either not significant
|
|
# or are specified elsewhere. An API may choose to allow leap seconds. Related
|
|
# types are google.type.Date and `google.protobuf.Timestamp`.
|
|
"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose
|
|
# to allow the value "24:00:00" for scenarios like business closing time.
|
|
"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.
|
|
"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may
|
|
# allow the value 60 if it allows leap-seconds.
|
|
"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.
|
|
},
|
|
"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day
|
|
# and time zone are either specified elsewhere or are not significant. The date
|
|
# is relative to the Proleptic Gregorian Calendar. This can represent:
|
|
#
|
|
# * A full date, with non-zero year, month and day values
|
|
# * A month and day value, with a zero year, e.g. an anniversary
|
|
# * A year on its own, with zero month and day values
|
|
# * A year and month value, with a zero day, e.g. a credit card expiration date
|
|
#
|
|
# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.
|
|
"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without
|
|
# a year.
|
|
"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0
|
|
# if specifying a year by itself or a year and month where the day is not
|
|
# significant.
|
|
"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a
|
|
# month and day.
|
|
},
|
|
"stringValue": "A String",
|
|
"booleanValue": True or False,
|
|
"integerValue": "A String",
|
|
},
|
|
],
|
|
"maxValue": { # Set of primitive values supported by the system. # Maximum value appearing in the column.
|
|
# Note that for the purposes of inspection or transformation, the number
|
|
# of bytes considered to comprise a 'Value' is based on its representation
|
|
# as a UTF-8 encoded string. For example, if 'integer_value' is set to
|
|
# 123456789, the number of bytes would be counted as 9, even though an
|
|
# int64 only holds up to 8 bytes of data.
|
|
"floatValue": 3.14,
|
|
"timestampValue": "A String",
|
|
"dayOfWeekValue": "A String",
|
|
"timeValue": { # Represents a time of day. The date and time zone are either not significant
|
|
# or are specified elsewhere. An API may choose to allow leap seconds. Related
|
|
# types are google.type.Date and `google.protobuf.Timestamp`.
|
|
"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose
|
|
# to allow the value "24:00:00" for scenarios like business closing time.
|
|
"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.
|
|
"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may
|
|
# allow the value 60 if it allows leap-seconds.
|
|
"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.
|
|
},
|
|
"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day
|
|
# and time zone are either specified elsewhere or are not significant. The date
|
|
# is relative to the Proleptic Gregorian Calendar. This can represent:
|
|
#
|
|
# * A full date, with non-zero year, month and day values
|
|
# * A month and day value, with a zero year, e.g. an anniversary
|
|
# * A year on its own, with zero month and day values
|
|
# * A year and month value, with a zero day, e.g. a credit card expiration date
|
|
#
|
|
# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.
|
|
"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without
|
|
# a year.
|
|
"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0
|
|
# if specifying a year by itself or a year and month where the day is not
|
|
# significant.
|
|
"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a
|
|
# month and day.
|
|
},
|
|
"stringValue": "A String",
|
|
"booleanValue": True or False,
|
|
"integerValue": "A String",
|
|
},
|
|
"minValue": { # Set of primitive values supported by the system. # Minimum value appearing in the column.
|
|
# Note that for the purposes of inspection or transformation, the number
|
|
# of bytes considered to comprise a 'Value' is based on its representation
|
|
# as a UTF-8 encoded string. For example, if 'integer_value' is set to
|
|
# 123456789, the number of bytes would be counted as 9, even though an
|
|
# int64 only holds up to 8 bytes of data.
|
|
"floatValue": 3.14,
|
|
"timestampValue": "A String",
|
|
"dayOfWeekValue": "A String",
|
|
"timeValue": { # Represents a time of day. The date and time zone are either not significant
|
|
# or are specified elsewhere. An API may choose to allow leap seconds. Related
|
|
# types are google.type.Date and `google.protobuf.Timestamp`.
|
|
"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose
|
|
# to allow the value "24:00:00" for scenarios like business closing time.
|
|
"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.
|
|
"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may
|
|
# allow the value 60 if it allows leap-seconds.
|
|
"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.
|
|
},
|
|
"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day
|
|
# and time zone are either specified elsewhere or are not significant. The date
|
|
# is relative to the Proleptic Gregorian Calendar. This can represent:
|
|
#
|
|
# * A full date, with non-zero year, month and day values
|
|
# * A month and day value, with a zero year, e.g. an anniversary
|
|
# * A year on its own, with zero month and day values
|
|
# * A year and month value, with a zero day, e.g. a credit card expiration date
|
|
#
|
|
# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.
|
|
"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without
|
|
# a year.
|
|
"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0
|
|
# if specifying a year by itself or a year and month where the day is not
|
|
# significant.
|
|
"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a
|
|
# month and day.
|
|
},
|
|
"stringValue": "A String",
|
|
"booleanValue": True or False,
|
|
"integerValue": "A String",
|
|
},
|
|
},
|
|
"kMapEstimationResult": { # Result of the reidentifiability analysis. Note that these results are an
|
|
# estimation, not exact values.
|
|
"kMapEstimationHistogram": [ # The intervals [min_anonymity, max_anonymity] do not overlap. If a value
|
|
# doesn't correspond to any such interval, the associated frequency is
|
|
# zero. For example, the following records:
|
|
# {min_anonymity: 1, max_anonymity: 1, frequency: 17}
|
|
# {min_anonymity: 2, max_anonymity: 3, frequency: 42}
|
|
# {min_anonymity: 5, max_anonymity: 10, frequency: 99}
|
|
# mean that there are no records with an estimated anonymity of 4 or
|
|
# larger than 10.
|
|
{ # A KMapEstimationHistogramBucket message with the following values:
|
|
# min_anonymity: 3
|
|
# max_anonymity: 5
|
|
# frequency: 42
|
|
# means that there are 42 records whose quasi-identifier values correspond
|
|
# to 3, 4 or 5 people in the overlying population. An important particular
|
|
# case is when min_anonymity = max_anonymity = 1: the frequency field then
|
|
# corresponds to the number of uniquely identifiable records.
|
|
"bucketValues": [ # Sample of quasi-identifier tuple values in this bucket. The total
|
|
# number of classes returned per bucket is capped at 20.
|
|
{ # A tuple of values for the quasi-identifier columns.
|
|
"estimatedAnonymity": "A String", # The estimated anonymity for these quasi-identifier values.
|
|
"quasiIdsValues": [ # The quasi-identifier values.
|
|
{ # Set of primitive values supported by the system.
|
|
# Note that for the purposes of inspection or transformation, the number
|
|
# of bytes considered to comprise a 'Value' is based on its representation
|
|
# as a UTF-8 encoded string. For example, if 'integer_value' is set to
|
|
# 123456789, the number of bytes would be counted as 9, even though an
|
|
# int64 only holds up to 8 bytes of data.
|
|
"floatValue": 3.14,
|
|
"timestampValue": "A String",
|
|
"dayOfWeekValue": "A String",
|
|
"timeValue": { # Represents a time of day. The date and time zone are either not significant
|
|
# or are specified elsewhere. An API may choose to allow leap seconds. Related
|
|
# types are google.type.Date and `google.protobuf.Timestamp`.
|
|
"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose
|
|
# to allow the value "24:00:00" for scenarios like business closing time.
|
|
"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.
|
|
"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may
|
|
# allow the value 60 if it allows leap-seconds.
|
|
"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.
|
|
},
|
|
"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day
|
|
# and time zone are either specified elsewhere or are not significant. The date
|
|
# is relative to the Proleptic Gregorian Calendar. This can represent:
|
|
#
|
|
# * A full date, with non-zero year, month and day values
|
|
# * A month and day value, with a zero year, e.g. an anniversary
|
|
# * A year on its own, with zero month and day values
|
|
# * A year and month value, with a zero day, e.g. a credit card expiration date
|
|
#
|
|
# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.
|
|
"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without
|
|
# a year.
|
|
"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0
|
|
# if specifying a year by itself or a year and month where the day is not
|
|
# significant.
|
|
"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a
|
|
# month and day.
|
|
},
|
|
"stringValue": "A String",
|
|
"booleanValue": True or False,
|
|
"integerValue": "A String",
|
|
},
|
|
],
|
|
},
|
|
],
|
|
"minAnonymity": "A String", # Always positive.
|
|
"bucketValueCount": "A String", # Total number of distinct quasi-identifier tuple values in this bucket.
|
|
"maxAnonymity": "A String", # Always greater than or equal to min_anonymity.
|
|
"bucketSize": "A String", # Number of records within these anonymity bounds.
|
|
},
|
|
],
|
|
},
|
|
"kAnonymityResult": { # Result of the k-anonymity computation.
|
|
"equivalenceClassHistogramBuckets": [ # Histogram of k-anonymity equivalence classes.
|
|
{
|
|
"bucketValues": [ # Sample of equivalence classes in this bucket. The total number of
|
|
# classes returned per bucket is capped at 20.
|
|
{ # The set of columns' values that share the same l-diversity value.
|
|
"quasiIdsValues": [ # Set of values defining the equivalence class. One value per
|
|
# quasi-identifier column in the original KAnonymity metric message.
|
|
# The order is always the same as the original request.
|
|
{ # Set of primitive values supported by the system.
|
|
# Note that for the purposes of inspection or transformation, the number
|
|
# of bytes considered to comprise a 'Value' is based on its representation
|
|
# as a UTF-8 encoded string. For example, if 'integer_value' is set to
|
|
# 123456789, the number of bytes would be counted as 9, even though an
|
|
# int64 only holds up to 8 bytes of data.
|
|
"floatValue": 3.14,
|
|
"timestampValue": "A String",
|
|
"dayOfWeekValue": "A String",
|
|
"timeValue": { # Represents a time of day. The date and time zone are either not significant
|
|
# or are specified elsewhere. An API may choose to allow leap seconds. Related
|
|
# types are google.type.Date and `google.protobuf.Timestamp`.
|
|
"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose
|
|
# to allow the value "24:00:00" for scenarios like business closing time.
|
|
"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.
|
|
"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may
|
|
# allow the value 60 if it allows leap-seconds.
|
|
"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.
|
|
},
|
|
"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day
|
|
# and time zone are either specified elsewhere or are not significant. The date
|
|
# is relative to the Proleptic Gregorian Calendar. This can represent:
|
|
#
|
|
# * A full date, with non-zero year, month and day values
|
|
# * A month and day value, with a zero year, e.g. an anniversary
|
|
# * A year on its own, with zero month and day values
|
|
# * A year and month value, with a zero day, e.g. a credit card expiration date
|
|
#
|
|
# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.
|
|
"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without
|
|
# a year.
|
|
"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0
|
|
# if specifying a year by itself or a year and month where the day is not
|
|
# significant.
|
|
"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a
|
|
# month and day.
|
|
},
|
|
"stringValue": "A String",
|
|
"booleanValue": True or False,
|
|
"integerValue": "A String",
|
|
},
|
|
],
|
|
"equivalenceClassSize": "A String", # Size of the equivalence class, for example number of rows with the
|
|
# above set of values.
|
|
},
|
|
],
|
|
"bucketValueCount": "A String", # Total number of distinct equivalence classes in this bucket.
|
|
"equivalenceClassSizeLowerBound": "A String", # Lower bound on the size of the equivalence classes in this bucket.
|
|
"equivalenceClassSizeUpperBound": "A String", # Upper bound on the size of the equivalence classes in this bucket.
|
|
"bucketSize": "A String", # Total number of equivalence classes in this bucket.
|
|
},
|
|
],
|
|
},
|
|
"lDiversityResult": { # Result of the l-diversity computation.
|
|
"sensitiveValueFrequencyHistogramBuckets": [ # Histogram of l-diversity equivalence class sensitive value frequencies.
|
|
{
|
|
"bucketValues": [ # Sample of equivalence classes in this bucket. The total number of
|
|
# classes returned per bucket is capped at 20.
|
|
{ # The set of columns' values that share the same l-diversity value.
|
|
"numDistinctSensitiveValues": "A String", # Number of distinct sensitive values in this equivalence class.
|
|
"quasiIdsValues": [ # Quasi-identifier values defining the k-anonymity equivalence
|
|
# class. The order is always the same as the original request.
|
|
{ # Set of primitive values supported by the system.
|
|
# Note that for the purposes of inspection or transformation, the number
|
|
# of bytes considered to comprise a 'Value' is based on its representation
|
|
# as a UTF-8 encoded string. For example, if 'integer_value' is set to
|
|
# 123456789, the number of bytes would be counted as 9, even though an
|
|
# int64 only holds up to 8 bytes of data.
|
|
"floatValue": 3.14,
|
|
"timestampValue": "A String",
|
|
"dayOfWeekValue": "A String",
|
|
"timeValue": { # Represents a time of day. The date and time zone are either not significant
|
|
# or are specified elsewhere. An API may choose to allow leap seconds. Related
|
|
# types are google.type.Date and `google.protobuf.Timestamp`.
|
|
"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose
|
|
# to allow the value "24:00:00" for scenarios like business closing time.
|
|
"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.
|
|
"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may
|
|
# allow the value 60 if it allows leap-seconds.
|
|
"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.
|
|
},
|
|
"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day
|
|
# and time zone are either specified elsewhere or are not significant. The date
|
|
# is relative to the Proleptic Gregorian Calendar. This can represent:
|
|
#
|
|
# * A full date, with non-zero year, month and day values
|
|
# * A month and day value, with a zero year, e.g. an anniversary
|
|
# * A year on its own, with zero month and day values
|
|
# * A year and month value, with a zero day, e.g. a credit card expiration date
|
|
#
|
|
# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.
|
|
"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without
|
|
# a year.
|
|
"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0
|
|
# if specifying a year by itself or a year and month where the day is not
|
|
# significant.
|
|
"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a
|
|
# month and day.
|
|
},
|
|
"stringValue": "A String",
|
|
"booleanValue": True or False,
|
|
"integerValue": "A String",
|
|
},
|
|
],
|
|
"topSensitiveValues": [ # Estimated frequencies of top sensitive values.
|
|
{ # A value of a field, including its frequency.
|
|
"count": "A String", # How many times the value is contained in the field.
|
|
"value": { # Set of primitive values supported by the system. # A value contained in the field in question.
|
|
# Note that for the purposes of inspection or transformation, the number
|
|
# of bytes considered to comprise a 'Value' is based on its representation
|
|
# as a UTF-8 encoded string. For example, if 'integer_value' is set to
|
|
# 123456789, the number of bytes would be counted as 9, even though an
|
|
# int64 only holds up to 8 bytes of data.
|
|
"floatValue": 3.14,
|
|
"timestampValue": "A String",
|
|
"dayOfWeekValue": "A String",
|
|
"timeValue": { # Represents a time of day. The date and time zone are either not significant
|
|
# or are specified elsewhere. An API may choose to allow leap seconds. Related
|
|
# types are google.type.Date and `google.protobuf.Timestamp`.
|
|
"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose
|
|
# to allow the value "24:00:00" for scenarios like business closing time.
|
|
"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.
|
|
"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may
|
|
# allow the value 60 if it allows leap-seconds.
|
|
"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.
|
|
},
|
|
"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day
|
|
# and time zone are either specified elsewhere or are not significant. The date
|
|
# is relative to the Proleptic Gregorian Calendar. This can represent:
|
|
#
|
|
# * A full date, with non-zero year, month and day values
|
|
# * A month and day value, with a zero year, e.g. an anniversary
|
|
# * A year on its own, with zero month and day values
|
|
# * A year and month value, with a zero day, e.g. a credit card expiration date
|
|
#
|
|
# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.
|
|
"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without
|
|
# a year.
|
|
"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0
|
|
# if specifying a year by itself or a year and month where the day is not
|
|
# significant.
|
|
"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a
|
|
# month and day.
|
|
},
|
|
"stringValue": "A String",
|
|
"booleanValue": True or False,
|
|
"integerValue": "A String",
|
|
},
|
|
},
|
|
],
|
|
"equivalenceClassSize": "A String", # Size of the k-anonymity equivalence class.
|
|
},
|
|
],
|
|
"bucketValueCount": "A String", # Total number of distinct equivalence classes in this bucket.
|
|
"bucketSize": "A String", # Total number of equivalence classes in this bucket.
|
|
"sensitiveValueFrequencyUpperBound": "A String", # Upper bound on the sensitive value frequencies of the equivalence
|
|
# classes in this bucket.
|
|
"sensitiveValueFrequencyLowerBound": "A String", # Lower bound on the sensitive value frequencies of the equivalence
|
|
# classes in this bucket.
|
|
},
|
|
],
|
|
},
|
|
"requestedPrivacyMetric": { # Privacy metric to compute for reidentification risk analysis. # Privacy metric to compute.
|
|
"numericalStatsConfig": { # Compute numerical stats over an individual column, including
|
|
# min, max, and quantiles.
|
|
"field": { # General identifier of a data field in a storage service. # Field to compute numerical stats on. Supported types are
|
|
# integer, float, date, datetime, timestamp, time.
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
},
|
|
"kMapEstimationConfig": { # Reidentifiability metric. This corresponds to a risk model similar to what
|
|
# is called "journalist risk" in the literature, except the attack dataset is
|
|
# statistically modeled instead of being perfectly known. This can be done
|
|
# using publicly available data (like the US Census), or using a custom
|
|
# statistical model (indicated as one or several BigQuery tables), or by
|
|
# extrapolating from the distribution of values in the input dataset.
|
|
"regionCode": "A String", # ISO 3166-1 alpha-2 region code to use in the statistical modeling.
|
|
# Required if no column is tagged with a region-specific InfoType (like
|
|
# US_ZIP_5) or a region code.
|
|
"quasiIds": [ # Fields considered to be quasi-identifiers. No two columns can have the
|
|
# same tag. [required]
|
|
{ # A column with a semantic tag attached.
|
|
"field": { # General identifier of a data field in a storage service. # Identifies the column. [required]
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
"customTag": "A String", # A column can be tagged with a custom tag. In this case, the user must
|
|
# indicate an auxiliary table that contains statistical information on
|
|
# the possible values of this column (below).
|
|
"infoType": { # Type of information detected by the API. # A column can be tagged with an InfoType to use the relevant public
|
|
# dataset as a statistical model of population, if available. We
|
|
# currently support US ZIP codes, region codes, ages and genders.
|
|
# To programmatically obtain the list of supported InfoTypes, use
|
|
# ListInfoTypes with the supported_by=RISK_ANALYSIS filter.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
"inferred": { # A generic empty message that you can re-use to avoid defining duplicated
|
|
# empty messages in your APIs. A typical example is to use it as the request
|
|
# or the response type of an API method. For instance:
|
|
#
|
|
# service Foo {
|
|
# rpc Bar(google.protobuf.Empty) returns (google.protobuf.Empty);
|
|
# }
|
|
#
|
|
# The JSON representation for `Empty` is empty JSON object `{}`.
|
|
#
|
|
# If no semantic tag is indicated, we infer the statistical model from
|
|
# the distribution of values in the input data
|
|
},
|
|
},
|
|
],
|
|
"auxiliaryTables": [ # Several auxiliary tables can be used in the analysis. Each custom_tag
|
|
# used to tag a quasi-identifiers column must appear in exactly one column
|
|
# of one auxiliary table.
|
|
{ # An auxiliary table contains statistical information on the relative
|
|
# frequency of different quasi-identifiers values. It has one or several
|
|
# quasi-identifiers columns, and one column that indicates the relative
|
|
# frequency of each quasi-identifier tuple.
|
|
# If a tuple is present in the data but not in the auxiliary table, the
|
|
# corresponding relative frequency is assumed to be zero (and thus, the
|
|
# tuple is highly reidentifiable).
|
|
"relativeFrequency": { # General identifier of a data field in a storage service. # The relative frequency column must contain a floating-point number
|
|
# between 0 and 1 (inclusive). Null values are assumed to be zero.
|
|
# [required]
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
"quasiIds": [ # Quasi-identifier columns. [required]
|
|
{ # A quasi-identifier column has a custom_tag, used to know which column
|
|
# in the data corresponds to which column in the statistical model.
|
|
"field": { # General identifier of a data field in a storage service.
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
"customTag": "A String",
|
|
},
|
|
],
|
|
"table": { # Message defining the location of a BigQuery table. A table is uniquely # Auxiliary table location. [required]
|
|
# identified by its project_id, dataset_id, and table_name. Within a query
|
|
# a table is often referenced with a string in the format of:
|
|
# `<project_id>:<dataset_id>.<table_id>` or
|
|
# `<project_id>.<dataset_id>.<table_id>`.
|
|
"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
|
|
# If omitted, project ID is inferred from the API call.
|
|
"tableId": "A String", # Name of the table.
|
|
"datasetId": "A String", # Dataset ID of the table.
|
|
},
|
|
},
|
|
],
|
|
},
|
|
"lDiversityConfig": { # l-diversity metric, used for analysis of reidentification risk.
|
|
"sensitiveAttribute": { # General identifier of a data field in a storage service. # Sensitive field for computing the l-value.
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
"quasiIds": [ # Set of quasi-identifiers indicating how equivalence classes are
|
|
# defined for the l-diversity computation. When multiple fields are
|
|
# specified, they are considered a single composite key.
|
|
{ # General identifier of a data field in a storage service.
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
],
|
|
},
|
|
"deltaPresenceEstimationConfig": { # δ-presence metric, used to estimate how likely it is for an attacker to
|
|
# figure out that one given individual appears in a de-identified dataset.
|
|
# Similarly to the k-map metric, we cannot compute δ-presence exactly without
|
|
# knowing the attack dataset, so we use a statistical model instead.
|
|
"regionCode": "A String", # ISO 3166-1 alpha-2 region code to use in the statistical modeling.
|
|
# Required if no column is tagged with a region-specific InfoType (like
|
|
# US_ZIP_5) or a region code.
|
|
"quasiIds": [ # Fields considered to be quasi-identifiers. No two fields can have the
|
|
# same tag. [required]
|
|
{ # A column with a semantic tag attached.
|
|
"field": { # General identifier of a data field in a storage service. # Identifies the column. [required]
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
"customTag": "A String", # A column can be tagged with a custom tag. In this case, the user must
|
|
# indicate an auxiliary table that contains statistical information on
|
|
# the possible values of this column (below).
|
|
"infoType": { # Type of information detected by the API. # A column can be tagged with an InfoType to use the relevant public
|
|
# dataset as a statistical model of population, if available. We
|
|
# currently support US ZIP codes, region codes, ages and genders.
|
|
# To programmatically obtain the list of supported InfoTypes, use
|
|
# ListInfoTypes with the supported_by=RISK_ANALYSIS filter.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
"inferred": { # A generic empty message that you can re-use to avoid defining duplicated
|
|
# empty messages in your APIs. A typical example is to use it as the request
|
|
# or the response type of an API method. For instance:
|
|
#
|
|
# service Foo {
|
|
# rpc Bar(google.protobuf.Empty) returns (google.protobuf.Empty);
|
|
# }
|
|
#
|
|
# The JSON representation for `Empty` is empty JSON object `{}`.
|
|
#
|
|
# If no semantic tag is indicated, we infer the statistical model from
|
|
# the distribution of values in the input data
|
|
},
|
|
},
|
|
],
|
|
"auxiliaryTables": [ # Several auxiliary tables can be used in the analysis. Each custom_tag
|
|
# used to tag a quasi-identifiers field must appear in exactly one
|
|
# field of one auxiliary table.
|
|
{ # An auxiliary table containing statistical information on the relative
|
|
# frequency of different quasi-identifiers values. It has one or several
|
|
# quasi-identifiers columns, and one column that indicates the relative
|
|
# frequency of each quasi-identifier tuple.
|
|
# If a tuple is present in the data but not in the auxiliary table, the
|
|
# corresponding relative frequency is assumed to be zero (and thus, the
|
|
# tuple is highly reidentifiable).
|
|
"relativeFrequency": { # General identifier of a data field in a storage service. # The relative frequency column must contain a floating-point number
|
|
# between 0 and 1 (inclusive). Null values are assumed to be zero.
|
|
# [required]
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
"quasiIds": [ # Quasi-identifier columns. [required]
|
|
{ # A quasi-identifier column has a custom_tag, used to know which column
|
|
# in the data corresponds to which column in the statistical model.
|
|
"field": { # General identifier of a data field in a storage service.
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
"customTag": "A String",
|
|
},
|
|
],
|
|
"table": { # Message defining the location of a BigQuery table. A table is uniquely # Auxiliary table location. [required]
|
|
# identified by its project_id, dataset_id, and table_name. Within a query
|
|
# a table is often referenced with a string in the format of:
|
|
# `<project_id>:<dataset_id>.<table_id>` or
|
|
# `<project_id>.<dataset_id>.<table_id>`.
|
|
"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
|
|
# If omitted, project ID is inferred from the API call.
|
|
"tableId": "A String", # Name of the table.
|
|
"datasetId": "A String", # Dataset ID of the table.
|
|
},
|
|
},
|
|
],
|
|
},
|
|
"categoricalStatsConfig": { # Compute categorical stats over an individual column, including
|
|
# number of distinct values and value count distribution.
|
|
"field": { # General identifier of a data field in a storage service. # Field to compute categorical stats on. All column types are
|
|
# supported except for arrays and structs. However, it may be more
|
|
# informative to use NumericalStats when the field type is supported,
|
|
# depending on the data.
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
},
|
|
"kAnonymityConfig": { # k-anonymity metric, used for analysis of reidentification risk.
|
|
"entityId": { # An entity in a dataset is a field or set of fields that correspond to a
|
|
# single person. For example, in medical records the `EntityId` might be a
|
|
# patient identifier, or for financial records it might be an account
|
|
# identifier. This message is used when generalizations or analysis must take
|
|
# into account that multiple rows correspond to the same entity.
|
|
#
|
|
# Optional message indicating that multiple rows might be associated to a
|
|
# single individual. If the same entity_id is associated to multiple
|
|
# quasi-identifier tuples over distinct rows, we consider the entire
|
|
# collection of tuples as the composite quasi-identifier. This collection
|
|
# is a multiset: the order in which the different tuples appear in the
|
|
# dataset is ignored, but their frequency is taken into account.
|
|
#
|
|
# Important note: a maximum of 1000 rows can be associated to a single
|
|
# entity ID. If more rows are associated with the same entity ID, some
|
|
# might be ignored.
|
|
"field": { # General identifier of a data field in a storage service. # Composite key indicating which field contains the entity identifier.
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
},
|
|
"quasiIds": [ # Set of fields to compute k-anonymity over. When multiple fields are
|
|
# specified, they are considered a single composite key. Structs and
|
|
# repeated data types are not supported; however, nested fields are
|
|
# supported so long as they are not structs themselves or nested within
|
|
# a repeated field.
|
|
{ # General identifier of a data field in a storage service.
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
],
|
|
},
|
|
},
|
|
"categoricalStatsResult": { # Result of the categorical stats computation.
|
|
"valueFrequencyHistogramBuckets": [ # Histogram of value frequencies in the column.
|
|
{
|
|
"bucketValues": [ # Sample of value frequencies in this bucket. The total number of
|
|
# values returned per bucket is capped at 20.
|
|
{ # A value of a field, including its frequency.
|
|
"count": "A String", # How many times the value is contained in the field.
|
|
"value": { # Set of primitive values supported by the system. # A value contained in the field in question.
|
|
# Note that for the purposes of inspection or transformation, the number
|
|
# of bytes considered to comprise a 'Value' is based on its representation
|
|
# as a UTF-8 encoded string. For example, if 'integer_value' is set to
|
|
# 123456789, the number of bytes would be counted as 9, even though an
|
|
# int64 only holds up to 8 bytes of data.
|
|
"floatValue": 3.14,
|
|
"timestampValue": "A String",
|
|
"dayOfWeekValue": "A String",
|
|
"timeValue": { # Represents a time of day. The date and time zone are either not significant
|
|
# or are specified elsewhere. An API may choose to allow leap seconds. Related
|
|
# types are google.type.Date and `google.protobuf.Timestamp`.
|
|
"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose
|
|
# to allow the value "24:00:00" for scenarios like business closing time.
|
|
"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.
|
|
"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may
|
|
# allow the value 60 if it allows leap-seconds.
|
|
"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.
|
|
},
|
|
"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day
|
|
# and time zone are either specified elsewhere or are not significant. The date
|
|
# is relative to the Proleptic Gregorian Calendar. This can represent:
|
|
#
|
|
# * A full date, with non-zero year, month and day values
|
|
# * A month and day value, with a zero year, e.g. an anniversary
|
|
# * A year on its own, with zero month and day values
|
|
# * A year and month value, with a zero day, e.g. a credit card expiration date
|
|
#
|
|
# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.
|
|
"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without
|
|
# a year.
|
|
"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0
|
|
# if specifying a year by itself or a year and month where the day is not
|
|
# significant.
|
|
"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a
|
|
# month and day.
|
|
},
|
|
"stringValue": "A String",
|
|
"booleanValue": True or False,
|
|
"integerValue": "A String",
|
|
},
|
|
},
|
|
],
|
|
"bucketValueCount": "A String", # Total number of distinct values in this bucket.
|
|
"valueFrequencyUpperBound": "A String", # Upper bound on the value frequency of the values in this bucket.
|
|
"valueFrequencyLowerBound": "A String", # Lower bound on the value frequency of the values in this bucket.
|
|
"bucketSize": "A String", # Total number of values in this bucket.
|
|
},
|
|
],
|
|
},
|
|
"deltaPresenceEstimationResult": { # Result of the δ-presence computation. Note that these results are an
|
|
# estimation, not exact values.
|
|
"deltaPresenceEstimationHistogram": [ # The intervals [min_probability, max_probability) do not overlap. If a
|
|
# value doesn't correspond to any such interval, the associated frequency
|
|
# is zero. For example, the following records:
|
|
# {min_probability: 0, max_probability: 0.1, frequency: 17}
|
|
# {min_probability: 0.2, max_probability: 0.3, frequency: 42}
|
|
# {min_probability: 0.3, max_probability: 0.4, frequency: 99}
|
|
# mean that there are no records with an estimated probability in [0.1, 0.2)
|
|
# or greater than or equal to 0.4.
|
|
{ # A DeltaPresenceEstimationHistogramBucket message with the following
|
|
# values:
|
|
# min_probability: 0.1
|
|
# max_probability: 0.2
|
|
# frequency: 42
|
|
# means that there are 42 records for which δ is in [0.1, 0.2). An
|
|
# important particular case is when min_probability = max_probability = 1:
|
|
# then, every individual who shares this quasi-identifier combination is in
|
|
# the dataset.
|
|
"bucketValues": [ # Sample of quasi-identifier tuple values in this bucket. The total
|
|
# number of classes returned per bucket is capped at 20.
|
|
{ # A tuple of values for the quasi-identifier columns.
|
|
"quasiIdsValues": [ # The quasi-identifier values.
|
|
{ # Set of primitive values supported by the system.
|
|
# Note that for the purposes of inspection or transformation, the number
|
|
# of bytes considered to comprise a 'Value' is based on its representation
|
|
# as a UTF-8 encoded string. For example, if 'integer_value' is set to
|
|
# 123456789, the number of bytes would be counted as 9, even though an
|
|
# int64 only holds up to 8 bytes of data.
|
|
"floatValue": 3.14,
|
|
"timestampValue": "A String",
|
|
"dayOfWeekValue": "A String",
|
|
"timeValue": { # Represents a time of day. The date and time zone are either not significant
|
|
# or are specified elsewhere. An API may choose to allow leap seconds. Related
|
|
# types are google.type.Date and `google.protobuf.Timestamp`.
|
|
"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose
|
|
# to allow the value "24:00:00" for scenarios like business closing time.
|
|
"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.
|
|
"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may
|
|
# allow the value 60 if it allows leap-seconds.
|
|
"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.
|
|
},
|
|
"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day
|
|
# and time zone are either specified elsewhere or are not significant. The date
|
|
# is relative to the Proleptic Gregorian Calendar. This can represent:
|
|
#
|
|
# * A full date, with non-zero year, month and day values
|
|
# * A month and day value, with a zero year, e.g. an anniversary
|
|
# * A year on its own, with zero month and day values
|
|
# * A year and month value, with a zero day, e.g. a credit card expiration date
|
|
#
|
|
# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.
|
|
"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without
|
|
# a year.
|
|
"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0
|
|
# if specifying a year by itself or a year and month where the day is not
|
|
# significant.
|
|
"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a
|
|
# month and day.
|
|
},
|
|
"stringValue": "A String",
|
|
"booleanValue": True or False,
|
|
"integerValue": "A String",
|
|
},
|
|
],
|
|
"estimatedProbability": 3.14, # The estimated probability that a given individual sharing these
|
|
# quasi-identifier values is in the dataset. This value, typically called
|
|
# δ, is the ratio between the number of records in the dataset with these
|
|
# quasi-identifier values, and the total number of individuals (inside
|
|
# *and* outside the dataset) with these quasi-identifier values.
|
|
# For example, if there are 15 individuals in the dataset who share the
|
|
# same quasi-identifier values, and an estimated 100 people in the entire
|
|
# population with these values, then δ is 0.15.
|
|
},
|
|
],
|
|
"bucketValueCount": "A String", # Total number of distinct quasi-identifier tuple values in this bucket.
|
|
"bucketSize": "A String", # Number of records within these probability bounds.
|
|
"maxProbability": 3.14, # Always greater than or equal to min_probability.
|
|
"minProbability": 3.14, # Between 0 and 1.
|
|
},
|
|
],
|
|
},
|
|
"requestedSourceTable": { # Message defining the location of a BigQuery table. A table is uniquely # Input dataset to compute metrics over.
|
|
# identified by its project_id, dataset_id, and table_name. Within a query
|
|
# a table is often referenced with a string in the format of:
|
|
# `<project_id>:<dataset_id>.<table_id>` or
|
|
# `<project_id>.<dataset_id>.<table_id>`.
|
|
"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
|
|
# If omitted, project ID is inferred from the API call.
|
|
"tableId": "A String", # Name of the table.
|
|
"datasetId": "A String", # Dataset ID of the table.
|
|
},
|
|
},
|
|
"state": "A String", # State of a job.
|
|
"jobTriggerName": "A String", # If created by a job trigger, the resource name of the trigger that
|
|
# instantiated the job.
|
|
"startTime": "A String", # Time when the job started.
|
|
"endTime": "A String", # Time when the job finished.
|
|
"type": "A String", # The type of job.
|
|
"createTime": "A String", # Time when the job was created.
|
|
}</pre>
|
|
</div>
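<p>The δ-presence fields described in the response schema above can be read with a few lines of Python. The following is a minimal sketch, not part of the client library: the helper names <code>estimated_delta</code>, <code>summarize_histogram</code> and <code>uncovered_intervals</code> are hypothetical, and the code assumes the response has already been obtained as a plain dict whose <code>deltaPresenceEstimationResult</code> uses the field names listed above (<code>minProbability</code>, <code>maxProbability</code>, <code>bucketSize</code>). It also assumes that fields holding default values may be absent from the JSON.</p>

```python
from typing import Any, Dict, List, Tuple

Row = Tuple[float, float, int]


def estimated_delta(records_in_dataset: int, population_estimate: int) -> float:
    """Ratio of matching records in the dataset to the estimated matching population."""
    if population_estimate <= 0:
        raise ValueError("population_estimate must be positive")
    return records_in_dataset / population_estimate


def summarize_histogram(result: Dict[str, Any]) -> List[Row]:
    """Return (minProbability, maxProbability, bucketSize) per bucket, sorted by lower bound."""
    buckets = result.get("deltaPresenceEstimationHistogram", [])
    rows = [
        (
            float(bucket.get("minProbability", 0.0)),
            float(bucket.get("maxProbability", 0.0)),
            int(bucket.get("bucketSize", "0")),  # int64 fields are documented as strings
        )
        for bucket in buckets
    ]
    return sorted(rows)


def uncovered_intervals(rows: List[Row]) -> List[Tuple[float, float]]:
    """Probability ranges within [0, 1) that no bucket covers; their frequency is zero."""
    gaps = []
    cursor = 0.0
    for low, high, _size in rows:
        if low > cursor:
            gaps.append((cursor, low))
        cursor = max(cursor, high)
    if cursor < 1.0:
        gaps.append((cursor, 1.0))
    return gaps
```

<p>With the three buckets from the schema comment ([0, 0.1), [0.2, 0.3) and [0.3, 0.4)), <code>uncovered_intervals</code> reports the ranges [0.1, 0.2) and [0.4, 1.0), matching the statement that no record has an estimated probability there. <code>estimated_delta(15, 100)</code> reproduces the worked value of 0.15.</p>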
|
|
|
|
<div class="method">
|
|
<code class="details" id="delete">delete(name, x__xgafv=None)</code>
|
|
<pre>Deletes a long-running DlpJob. This method indicates that the client is
|
|
no longer interested in the DlpJob result. The job will be cancelled if
|
|
possible.
|
|
See https://cloud.google.com/dlp/docs/inspecting-storage and
|
|
https://cloud.google.com/dlp/docs/compute-risk-analysis to learn more.
|
|
|
|
Args:
|
|
name: string, The name of the DlpJob resource to be deleted. (required)
|
|
x__xgafv: string, V1 error format.
|
|
Allowed values
|
|
1 - v1 error format
|
|
2 - v2 error format
|
|
|
|
Returns:
|
|
An object of the form:
|
|
|
|
{ # A generic empty message that you can re-use to avoid defining duplicated
|
|
# empty messages in your APIs. A typical example is to use it as the request
|
|
# or the response type of an API method. For instance:
|
|
#
|
|
# service Foo {
|
|
# rpc Bar(google.protobuf.Empty) returns (google.protobuf.Empty);
|
|
# }
|
|
#
|
|
# The JSON representation for `Empty` is empty JSON object `{}`.
|
|
}</pre>
|
|
</div>
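<p>As an illustration of the <code>delete</code> call documented above, here is a minimal sketch. The wrapper name <code>delete_dlp_job</code> is hypothetical. It assumes <code>service</code> is a client object created elsewhere (for example with <code>googleapiclient.discovery.build("dlp", "v2")</code> and valid credentials) and that the job name has the form <code>projects/PROJECT_ID/dlpJobs/JOB_ID</code>, which this page does not state.</p>

```python
from typing import Any


def delete_dlp_job(service: Any, job_name: str) -> bool:
    """Call projects.dlpJobs.delete and report whether the Empty message came back."""
    # Build the request for the documented delete(name, x__xgafv=None) method.
    request = service.projects().dlpJobs().delete(name=job_name)
    response = request.execute()
    # A successful delete returns the Empty message, i.e. the JSON object {}.
    return response == {}
```

<p>Error handling is left out on purpose: in practice a failed request is expected to raise an exception from <code>execute()</code>, which callers may want to catch and log.</p>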
|
|
|
|
<div class="method">
|
|
<code class="details" id="get">get(name, x__xgafv=None)</code>
|
|
<pre>Gets the latest state of a long-running DlpJob.
|
|
See https://cloud.google.com/dlp/docs/inspecting-storage and
|
|
https://cloud.google.com/dlp/docs/compute-risk-analysis to learn more.
|
|
|
|
Args:
|
|
name: string, The name of the DlpJob resource. (required)
|
|
x__xgafv: string, V1 error format.
|
|
Allowed values
|
|
1 - v1 error format
|
|
2 - v2 error format
|
|
|
|
Returns:
|
|
An object of the form:
|
|
|
|
{ # Combines all of the information about a DLP job.
|
|
"errors": [ # A stream of errors encountered running the job.
|
|
{ # Detailed information about an error encountered during job execution or
|
|
# the results of an unsuccessful activation of the JobTrigger.
|
|
# Output only field.
|
|
"timestamps": [ # The times the error occurred.
|
|
"A String",
|
|
],
|
|
"details": { # The `Status` type defines a logical error model that is suitable for
|
|
# different programming environments, including REST APIs and RPC APIs. It is
|
|
# used by [gRPC](https://github.com/grpc). Each `Status` message contains
|
|
# three pieces of data: error code, error message, and error details.
|
|
#
|
|
# You can find out more about this error model and how to work with it in the
|
|
# [API Design Guide](https://cloud.google.com/apis/design/errors).
|
|
"message": "A String", # A developer-facing error message, which should be in English. Any
|
|
# user-facing error message should be localized and sent in the
|
|
# google.rpc.Status.details field, or localized by the client.
|
|
"code": 42, # The status code, which should be an enum value of google.rpc.Code.
|
|
"details": [ # A list of messages that carry the error details. There is a common set of
|
|
# message types for APIs to use.
|
|
{
|
|
"a_key": "", # Properties of the object. Contains field @type with type URL.
|
|
},
|
|
],
|
|
},
|
|
},
|
|
],
|
|
"name": "A String", # The server-assigned name.
|
|
"inspectDetails": { # The results of an inspect DataSource job. # Results from inspecting a data source.
|
|
"requestedOptions": { # The configuration used for this job.
|
|
"snapshotInspectTemplate": { # The inspectTemplate contains a configuration (set of types of sensitive data
|
|
# to be detected) to be used anywhere you otherwise would normally specify
|
|
# InspectConfig. See https://cloud.google.com/dlp/docs/concepts-templates
|
|
# to learn more.
|
|
#
|
|
# If run with an InspectTemplate, a snapshot of its state at the time of
|
|
# this run.
|
|
"updateTime": "A String", # The last update timestamp of an inspectTemplate, output only field.
|
|
"displayName": "A String", # Display name (max 256 chars).
|
|
"description": "A String", # Short description (max 256 chars).
|
|
"inspectConfig": { # Configuration description of the scanning process. # The core content of the template. Configuration of the scanning process.
|
|
# When used with redactContent only info_types and min_likelihood are currently
|
|
# used.
|
|
"excludeInfoTypes": True or False, # When true, excludes type information of the findings.
|
|
"limits": {
|
|
"maxFindingsPerRequest": 42, # Max number of findings that will be returned per request/job.
|
|
# When set within `InspectContentRequest`, the maximum returned is 2000
|
|
# regardless of whether this is set higher.
|
|
"maxFindingsPerInfoType": [ # Configuration of findings limit given for specified infoTypes.
|
|
{ # Max findings configuration per infoType, per content item or
|
|
# long-running DlpJob.
|
|
"infoType": { # Type of information detected by the API. # Type of information the findings limit applies to. Only one limit per
|
|
# info_type should be provided. If InfoTypeLimit does not have an
|
|
# info_type, the DLP API applies the limit against all info_types that
|
|
# are found but not specified in another InfoTypeLimit.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
"maxFindings": 42, # Max findings limit for the given infoType.
|
|
},
|
|
],
|
|
"maxFindingsPerItem": 42, # Max number of findings that will be returned for each item scanned.
|
|
# When set within `InspectDataSourceRequest`,
|
|
# the maximum returned is 2000 regardless of whether this is set higher.
|
|
# When set within `InspectContentRequest`, this field is ignored.
|
|
},
|
|
"minLikelihood": "A String", # Only returns findings equal to or above this threshold. The default is
|
|
# POSSIBLE.
|
|
# See https://cloud.google.com/dlp/docs/likelihood to learn more.
|
|
"customInfoTypes": [ # CustomInfoTypes provided by the user. See
|
|
# https://cloud.google.com/dlp/docs/creating-custom-infotypes to learn more.
|
|
{ # Custom information type provided by the user. Used to find domain-specific
|
|
# sensitive information configurable to the data in question.
|
|
"regex": { # Message defining a custom regular expression. # Regular expression based CustomInfoType.
|
|
"pattern": "A String", # Pattern defining the regular expression. Its syntax
|
|
# (https://github.com/google/re2/wiki/Syntax) can be found under the
|
|
# google/re2 repository on GitHub.
|
|
"groupIndexes": [ # The index of the submatch to extract as findings. When not
|
|
# specified, the entire match is returned. No more than 3 may be included.
|
|
42,
|
|
],
|
|
},
|
|
"surrogateType": { # Message for detecting output from deidentification transformations that support reversing.
|
|
# Message for detecting output from deidentification transformations such as
|
|
# [`CryptoReplaceFfxFpeConfig`](/dlp/docs/reference/rest/v2/organizations.deidentifyTemplates#cryptoreplaceffxfpeconfig).
|
|
# These types of transformations are
|
|
# those that perform pseudonymization, thereby producing a "surrogate" as
|
|
# output. This should be used in conjunction with a field on the
|
|
# transformation such as `surrogate_info_type`. This CustomInfoType does
|
|
# not support the use of `detection_rules`.
|
|
},
|
|
"infoType": { # Type of information detected by the API. # CustomInfoType can either be a new infoType, or an extension of a built-in
|
|
# infoType, when the name matches one of the existing infoTypes and that infoType
|
|
# is specified in the `InspectContent.info_types` field. Specifying the latter
|
|
# adds findings to those detected by the system. If the built-in info type is
|
|
# not specified in the `InspectContent.info_types` list, then the name is treated
|
|
# as a custom info type.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
"dictionary": { # A list of phrases to detect as a CustomInfoType. # Custom information type based on a dictionary of words or phrases. This can
|
|
# be used to match sensitive information specific to the data, such as a list
|
|
# of employee IDs or job titles.
|
|
#
|
|
# Dictionary words are case-insensitive and all characters other than letters
|
|
# and digits in the Unicode [Basic Multilingual
|
|
# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)
|
|
# will be replaced with whitespace when scanning for matches, so the
|
|
# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",
|
|
# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters
|
|
# surrounding any match must be of a different type than the adjacent
|
|
# characters within the word, so letters must be next to non-letters and
|
|
# digits next to non-digits. For example, the dictionary word "jen" will
|
|
# match the first three letters of the text "jen123" but will return no
|
|
# matches for "jennifer".
|
|
#
|
|
# Dictionary words containing a large number of characters that are not
|
|
# letters or digits may result in unexpected findings because such characters
|
|
# are treated as whitespace. The
|
|
# [limits](https://cloud.google.com/dlp/limits) page contains details about
|
|
# the size limits of dictionaries. For dictionaries that do not fit within
|
|
# these constraints, consider using `LargeCustomDictionaryConfig` in the
|
|
# `StoredInfoType` API.
|
|
"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.
|
|
"words": [ # Words or phrases defining the dictionary. The dictionary must contain
|
|
# at least one phrase and every phrase must contain at least 2 characters
|
|
# that are letters or digits. [required]
|
|
"A String",
|
|
],
|
|
},
|
|
"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file
|
|
# is accepted.
|
|
"path": "A String", # A URL representing a file or path (no wildcards) in Cloud Storage.
|
|
# Example: gs://[BUCKET_NAME]/dictionary.txt
|
|
},
|
|
},
|
|
"storedType": { # A reference to a StoredInfoType to use with scanning. # Load an existing `StoredInfoType` resource for use in
|
|
# `InspectDataSource`. Not currently supported in `InspectContent`.
|
|
"name": "A String", # Resource name of the requested `StoredInfoType`, for example
|
|
# `organizations/433245324/storedInfoTypes/432452342` or
|
|
# `projects/project-id/storedInfoTypes/432452342`.
|
|
"createTime": "A String", # Timestamp indicating when the version of the `StoredInfoType` used for
|
|
# inspection was created. Output-only field, populated by the system.
|
|
},
|
|
"detectionRules": [ # Set of detection rules to apply to all findings of this CustomInfoType.
|
|
# Rules are applied in the order that they are specified. Not supported for the
|
|
# `surrogate_type` CustomInfoType.
|
|
{ # Deprecated; use `InspectionRuleSet` instead. Rule for modifying a
|
|
# `CustomInfoType` to alter behavior under certain circumstances, depending
|
|
# on the specific details of the rule. Not supported for the `surrogate_type`
|
|
# custom infoType.
|
|
"hotwordRule": { # Hotword-based detection rule. # The rule that adjusts the likelihood of findings within a certain
|
|
# proximity of hotwords.
|
|
"proximity": { # Message for specifying a window around a finding to apply a detection rule. # Proximity of the finding within which the entire hotword must reside.
|
|
# The total length of the window cannot exceed 1000 characters. Note that
|
|
# the finding itself will be included in the window, so that hotwords may
|
|
# be used to match substrings of the finding itself. For example, the
|
|
# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be
|
|
# adjusted upwards if the area code is known to be the local area code of
|
|
# a company office using the hotword regex "\(xxx\)", where "xxx"
|
|
# is the area code in question.
|
|
"windowAfter": 42, # Number of characters after the finding to consider.
|
|
"windowBefore": 42, # Number of characters before the finding to consider.
|
|
},
|
|
"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.
|
|
"pattern": "A String", # Pattern defining the regular expression. Its syntax
|
|
# (https://github.com/google/re2/wiki/Syntax) can be found under the
|
|
# google/re2 repository on GitHub.
|
|
"groupIndexes": [ # The index of the submatch to extract as findings. When not
|
|
# specified, the entire match is returned. No more than 3 may be included.
|
|
42,
|
|
],
|
|
},
|
|
"likelihoodAdjustment": { # Likelihood adjustment to apply to all matching findings. # Message for specifying an adjustment to the likelihood of a finding as
|
|
# part of a detection rule.
|
|
"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of
|
|
# levels. For example, if a finding would be `POSSIBLE` without the
|
|
# detection rule and `relative_likelihood` is 1, then it is upgraded to
|
|
# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.
|
|
# Likelihood may never drop below `VERY_UNLIKELY` or exceed
|
|
# `VERY_LIKELY`, so applying an adjustment of 1 followed by an
|
|
# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in
|
|
# a final likelihood of `LIKELY`.
|
|
"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.
|
|
},
|
|
},
|
|
},
|
|
],
|
|
"exclusionType": "A String", # If set to EXCLUSION_TYPE_EXCLUDE this infoType will not cause a finding
|
|
# to be returned. It still can be used for rules matching.
|
|
"likelihood": "A String", # Likelihood to return for this CustomInfoType. This base value can be
|
|
# altered by a detection rule if the finding meets the criteria specified by
|
|
# the rule. Defaults to `VERY_LIKELY` if not specified.
|
|
},
|
|
],
|
|
"includeQuote": True or False, # When true, a contextual quote from the data that triggered a finding is
|
|
# included in the response; see Finding.quote.
|
|
"ruleSet": [ # Set of rules to apply to the findings for this InspectConfig.
|
|
# Exclusion rules contained in the set are executed at the end; other
|
|
# rules are executed in the order they are specified for each info type.
|
|
{ # Rule set for modifying a set of infoTypes to alter behavior under certain
|
|
# circumstances, depending on the specific details of the rules within the set.
|
|
"rules": [ # Set of rules to be applied to infoTypes. The rules are applied in order.
|
|
{ # A single inspection rule to be applied to infoTypes, specified in
|
|
# `InspectionRuleSet`.
|
|
"hotwordRule": { # Hotword-based detection rule. # The rule that adjusts the likelihood of findings within a certain
|
|
# proximity of hotwords.
|
|
"proximity": { # Message for specifying a window around a finding to apply a detection rule. # Proximity of the finding within which the entire hotword must reside.
|
|
# The total length of the window cannot exceed 1000 characters. Note that
|
|
# the finding itself will be included in the window, so that hotwords may
|
|
# be used to match substrings of the finding itself. For example, the
|
|
# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be
|
|
# adjusted upwards if the area code is known to be the local area code of
|
|
# a company office using the hotword regex "\(xxx\)", where "xxx"
|
|
# is the area code in question.
|
|
"windowAfter": 42, # Number of characters after the finding to consider.
|
|
"windowBefore": 42, # Number of characters before the finding to consider.
|
|
},
|
|
"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.
|
|
"pattern": "A String", # Pattern defining the regular expression. Its syntax
|
|
# (https://github.com/google/re2/wiki/Syntax) can be found under the
|
|
# google/re2 repository on GitHub.
|
|
"groupIndexes": [ # The index of the submatch to extract as findings. When not
|
|
# specified, the entire match is returned. No more than 3 may be included.
|
|
42,
|
|
],
|
|
},
|
|
"likelihoodAdjustment": { # Likelihood adjustment to apply to all matching findings. # Message for specifying an adjustment to the likelihood of a finding as
|
|
# part of a detection rule.
|
|
"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of
|
|
# levels. For example, if a finding would be `POSSIBLE` without the
|
|
# detection rule and `relative_likelihood` is 1, then it is upgraded to
|
|
# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.
|
|
# Likelihood may never drop below `VERY_UNLIKELY` or exceed
|
|
# `VERY_LIKELY`, so applying an adjustment of 1 followed by an
|
|
# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in
|
|
# a final likelihood of `LIKELY`.
|
|
"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.
|
|
},
|
|
},
|
|
"exclusionRule": { # Exclusion rule. # The rule that specifies conditions when findings of infoTypes specified in
|
|
# `InspectionRuleSet` are removed from results.
|
|
"regex": { # Message defining a custom regular expression. # Regular expression which defines the rule.
|
|
"pattern": "A String", # Pattern defining the regular expression. Its syntax
|
|
# (https://github.com/google/re2/wiki/Syntax) can be found under the
|
|
# google/re2 repository on GitHub.
|
|
"groupIndexes": [ # The index of the submatch to extract as findings. When not
|
|
# specified, the entire match is returned. No more than 3 may be included.
|
|
42,
|
|
],
|
|
},
|
|
"excludeInfoTypes": { # List of exclude infoTypes. # Set of infoTypes for which findings would affect this rule.
|
|
"infoTypes": [ # An ExclusionRule drops a finding when it overlaps with, or is
|
|
# contained within, a finding of an infoType from this list. For
|
|
# example, for `InspectionRuleSet.info_types` containing "PHONE_NUMBER" and
|
|
# `exclusion_rule` containing `exclude_info_types.info_types` with
|
|
# "EMAIL_ADDRESS", the phone number findings are dropped if they overlap
|
|
# with an EMAIL_ADDRESS finding.
|
|
# As a result, "555-222-2222@example.org" generates only a single
|
|
# finding, namely the email address.
|
|
{ # Type of information detected by the API.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
],
|
|
},
|
|
"dictionary": { # Dictionary which defines the rule. # Custom information type based on a dictionary of words or phrases. This can
|
|
# be used to match sensitive information specific to the data, such as a list
|
|
# of employee IDs or job titles.
|
|
#
|
|
# Dictionary words are case-insensitive and all characters other than letters
|
|
# and digits in the Unicode [Basic Multilingual
|
|
# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)
|
|
# will be replaced with whitespace when scanning for matches, so the
|
|
# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",
|
|
# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters
|
|
# surrounding any match must be of a different type than the adjacent
|
|
# characters within the word, so letters must be next to non-letters and
|
|
# digits next to non-digits. For example, the dictionary word "jen" will
|
|
# match the first three letters of the text "jen123" but will return no
|
|
# matches for "jennifer".
|
|
#
|
|
# Dictionary words containing a large number of characters that are not
|
|
# letters or digits may result in unexpected findings because such characters
|
|
# are treated as whitespace. The
|
|
# [limits](https://cloud.google.com/dlp/limits) page contains details about
|
|
# the size limits of dictionaries. For dictionaries that do not fit within
|
|
# these constraints, consider using `LargeCustomDictionaryConfig` in the
|
|
# `StoredInfoType` API.
|
|
"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.
|
|
"words": [ # Words or phrases defining the dictionary. The dictionary must contain
|
|
# at least one phrase and every phrase must contain at least 2 characters
|
|
# that are letters or digits. [required]
|
|
"A String",
|
|
],
|
|
},
|
|
"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file
|
|
# is accepted.
|
|
"path": "A String", # A URL representing a file or path (no wildcards) in Cloud Storage.
|
|
# Example: gs://[BUCKET_NAME]/dictionary.txt
|
|
},
|
|
},
|
|
"matchingType": "A String", # How the rule is applied; see the MatchingType documentation for details.
|
|
},
|
|
},
|
|
],
|
|
"infoTypes": [ # List of infoTypes this rule set is applied to.
|
|
{ # Type of information detected by the API.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
],
|
|
},
|
|
],
|
|
"contentOptions": [ # List of options defining data content to scan.
|
|
# If empty, text, images, and other content will be included.
|
|
"A String",
|
|
],
|
|
"infoTypes": [ # Restricts what info_types to look for. The values must correspond to
|
|
# InfoType values returned by ListInfoTypes or listed at
|
|
# https://cloud.google.com/dlp/docs/infotypes-reference.
|
|
#
|
|
# When no InfoTypes or CustomInfoTypes are specified in a request, the
|
|
# system may automatically choose what detectors to run. By default this may
|
|
# be all types, but may change over time as detectors are updated.
|
|
#
|
|
# The special InfoType name "ALL_BASIC" can be used to trigger all detectors,
|
|
# but may change over time as new InfoTypes are added. If you need precise
|
|
# control and predictability as to what detectors are run you should specify
|
|
# specific InfoTypes listed in the reference.
|
|
{ # Type of information detected by the API.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
],
|
|
},
|
|
"createTime": "A String", # The creation timestamp of an inspectTemplate. Output only.
|
|
"name": "A String", # The template name. Output only.
|
|
#
|
|
# The template will have one of the following formats:
|
|
# `projects/PROJECT_ID/inspectTemplates/TEMPLATE_ID` OR
|
|
# `organizations/ORGANIZATION_ID/inspectTemplates/TEMPLATE_ID`
|
|
},
|
|
"jobConfig": {
|
|
"storageConfig": { # Shared message indicating Cloud storage type. # The data to scan.
|
|
"datastoreOptions": { # Options defining a data set within Google Cloud Datastore. # Google Cloud Datastore options specification.
|
|
"partitionId": { # Datastore partition ID. # A partition ID identifies a grouping of entities. The grouping is always
|
|
# by project and namespace, however the namespace ID may be empty.
|
|
#
|
|
# A partition ID contains several dimensions:
|
|
# project ID and namespace ID.
|
|
"projectId": "A String", # The ID of the project to which the entities belong.
|
|
"namespaceId": "A String", # If not empty, the ID of the namespace to which the entities belong.
|
|
},
|
|
"kind": { # A representation of a Datastore kind. # The kind to process.
|
|
"name": "A String", # The name of the kind.
|
|
},
|
|
},
|
|
"bigQueryOptions": { # Options defining BigQuery table and row identifiers. # BigQuery options specification.
|
|
"excludedFields": [ # References to fields excluded from scanning. This allows you to skip
|
|
# inspection of entire columns which you know have no findings.
|
|
{ # General identifier of a data field in a storage service.
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
],
|
|
"rowsLimit": "A String", # Max number of rows to scan. If the table has more rows than this value, the
|
|
# rest of the rows are omitted. If not set, or if set to 0, all rows will be
|
|
# scanned. Only one of rows_limit and rows_limit_percent can be specified.
|
|
# Cannot be used in conjunction with TimespanConfig.
|
|
"sampleMethod": "A String",
|
|
"identifyingFields": [ # References to fields uniquely identifying rows within the table.
|
|
# Nested fields in a format like `person.birthdate.year` are allowed.
|
|
{ # General identifier of a data field in a storage service.
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
],
|
|
"rowsLimitPercent": 42, # Max percentage of rows to scan. The rest are omitted. The number of rows
|
|
# scanned is rounded down. Must be between 0 and 100, inclusive. Both 0 and
|
|
# 100 mean no limit. Defaults to 0. Only one of rows_limit and
|
|
# rows_limit_percent can be specified. Cannot be used in conjunction with
|
|
# TimespanConfig.
|
|
"tableReference": { # Complete BigQuery table reference. # Message defining the location of a BigQuery table. A table is uniquely
|
|
# identified by its project_id, dataset_id, and table_name. Within a query
|
|
# a table is often referenced with a string in the format of:
|
|
# `<project_id>:<dataset_id>.<table_id>` or
|
|
# `<project_id>.<dataset_id>.<table_id>`.
|
|
"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
|
|
# If omitted, project ID is inferred from the API call.
|
|
"tableId": "A String", # Name of the table.
|
|
"datasetId": "A String", # Dataset ID of the table.
|
|
},
|
|
},
|
|
"timespanConfig": { # Configuration of the timespan of the items to include in scanning.
|
|
# Currently only supported when inspecting Google Cloud Storage and BigQuery.
|
|
"timestampField": { # General identifier of a data field in a storage service. # Specification of the field containing the timestamp of scanned items.
|
|
# Used for data sources like Datastore or BigQuery.
|
|
# If not specified for BigQuery, the table's last modification timestamp
|
|
# is checked against the given time span.
|
|
# The valid data types of the timestamp field are:
|
|
# for BigQuery - timestamp, date, datetime;
|
|
# for Datastore - timestamp.
|
|
# Datastore entity will be scanned if the timestamp property does not exist
|
|
# or its value is empty or invalid.
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
"endTime": "A String", # Exclude files or rows newer than this value.
|
|
# If set to zero, no upper time limit is applied.
|
|
"startTime": "A String", # Exclude files or rows older than this value.
|
|
"enableAutoPopulationOfTimespanConfig": True or False, # When the job is started by a JobTrigger we will automatically figure out
|
|
# a valid start_time to avoid scanning files that have not been modified
|
|
# since the last time the JobTrigger executed. This will be based on the
|
|
# time of the execution of the last run of the JobTrigger.
|
|
},
|
|
"cloudStorageOptions": { # Google Cloud Storage options specification. # Options defining a file or a set of files within a Google Cloud Storage
|
|
# bucket.
|
|
"bytesLimitPerFile": "A String", # Max number of bytes to scan from a file. If a scanned file's size is bigger
|
|
# than this value then the rest of the bytes are omitted. Only one
|
|
# of bytes_limit_per_file and bytes_limit_per_file_percent can be specified.
|
|
"sampleMethod": "A String",
|
|
"fileSet": { # Set of files to scan. # The set of one or more files to scan.
|
|
"url": "A String", # The Cloud Storage URL of the file(s) to scan, in the format
|
|
# `gs://<bucket>/<path>`. Trailing wildcard in the path is allowed.
|
|
#
|
|
# If the url ends in a trailing slash, the bucket or directory represented
|
|
# by the url will be scanned non-recursively (content in sub-directories
|
|
# will not be scanned). This means that `gs://mybucket/` is equivalent to
|
|
# `gs://mybucket/*`, and `gs://mybucket/directory/` is equivalent to
|
|
# `gs://mybucket/directory/*`.
|
|
#
|
|
# Exactly one of `url` or `regex_file_set` must be set.
|
|
"regexFileSet": { # The regex-filtered set of files to scan. Exactly one of `url` or `regex_file_set` must be set.
|
|
# Message representing a set of files in a Cloud Storage bucket. Regular
|
|
# expressions are used to allow fine-grained control over which files in the
|
|
# bucket to include.
|
|
#
|
|
# Included files are those that match at least one item in `include_regex` and
|
|
# do not match any items in `exclude_regex`. Note that a file that matches
|
|
# items from both lists will _not_ be included. For a match to occur, the
|
|
# entire file path (i.e., everything in the url after the bucket name) must
|
|
# match the regular expression.
|
|
#
|
|
# For example, given the input `{bucket_name: "mybucket", include_regex:
|
|
# ["directory1/.*"], exclude_regex:
|
|
# ["directory1/excluded.*"]}`:
|
|
#
|
|
# * `gs://mybucket/directory1/myfile` will be included
|
|
# * `gs://mybucket/directory1/directory2/myfile` will be included (`.*` matches
|
|
# across `/`)
|
|
# * `gs://mybucket/directory0/directory1/myfile` will _not_ be included (the
|
|
# full path doesn't match any items in `include_regex`)
|
|
# * `gs://mybucket/directory1/excludedfile` will _not_ be included (the path
|
|
# matches an item in `exclude_regex`)
|
|
#
|
|
# If `include_regex` is left empty, it will match all files by default
|
|
# (this is equivalent to setting `include_regex: [".*"]`).
|
|
#
|
|
# Some other common use cases:
|
|
#
|
|
# * `{bucket_name: "mybucket", exclude_regex: [".*\.pdf"]}` will include all
|
|
# files in `mybucket` except for .pdf files
|
|
# * `{bucket_name: "mybucket", include_regex: ["directory/[^/]+"]}` will
|
|
# include all files directly under `gs://mybucket/directory/`, without matching
|
|
# across `/`
|
|
"excludeRegex": [ # A list of regular expressions matching file paths to exclude. All files in
|
|
# the bucket that match at least one of these regular expressions will be
|
|
# excluded from the scan.
|
|
#
|
|
# Regular expressions use RE2
|
|
# [syntax](https://github.com/google/re2/wiki/Syntax); a guide can be found
|
|
# under the google/re2 repository on GitHub.
|
|
"A String",
|
|
],
|
|
"bucketName": "A String", # The name of a Cloud Storage bucket. Required.
|
|
"includeRegex": [ # A list of regular expressions matching file paths to include. All files in
|
|
# the bucket that match at least one of these regular expressions will be
|
|
# included in the set of files, except for those that also match an item in
|
|
# `exclude_regex`. Leaving this field empty will match all files by default
|
|
# (this is equivalent to including `.*` in the list).
|
|
#
|
|
# Regular expressions use RE2
|
|
# [syntax](https://github.com/google/re2/wiki/Syntax); a guide can be found
|
|
# under the google/re2 repository on GitHub.
|
|
"A String",
|
|
],
|
|
},
|
|
},
|
|
"bytesLimitPerFilePercent": 42, # Max percentage of bytes to scan from a file. The rest are omitted. The
|
|
# number of bytes scanned is rounded down. Must be between 0 and 100,
|
|
# inclusive. Both 0 and 100 mean no limit. Defaults to 0. Only one
|
|
# of bytes_limit_per_file and bytes_limit_per_file_percent can be specified.
|
|
"filesLimitPercent": 42, # Limits the number of files to scan to this percentage of the input FileSet.
|
|
# Number of files scanned is rounded down. Must be between 0 and 100,
|
|
# inclusive. Both 0 and 100 mean no limit. Defaults to 0.
|
|
"fileTypes": [ # List of file type groups to include in the scan.
|
|
# If empty, all files are scanned and available data format processors
|
|
# are applied. In addition, the binary content of the selected files
|
|
# is always scanned as well.
|
|
"A String",
|
|
],
|
|
},
|
|
},
|
|
"inspectConfig": { # Configuration description of the scanning process. # How and what to scan for.
|
|
# When used with redactContent, only info_types and min_likelihood are currently
|
|
# used.
|
|
"excludeInfoTypes": True or False, # When true, excludes type information of the findings.
|
|
"limits": {
|
|
"maxFindingsPerRequest": 42, # Max number of findings that will be returned per request/job.
|
|
# When set within `InspectContentRequest`, the maximum returned is 2000
|
|
# regardless of whether this is set higher.
|
|
"maxFindingsPerInfoType": [ # Configuration of findings limit given for specified infoTypes.
|
|
{ # Max findings configuration per infoType, per content item or long
|
|
# running DlpJob.
|
|
"infoType": { # Type of information detected by the API. # Type of information the findings limit applies to. Only one limit per
|
|
# info_type should be provided. If InfoTypeLimit does not have an
|
|
# info_type, the DLP API applies the limit against all info_types that
|
|
# are found but not specified in another InfoTypeLimit.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
"maxFindings": 42, # Max findings limit for the given infoType.
|
|
},
|
|
],
|
|
"maxFindingsPerItem": 42, # Max number of findings that will be returned for each item scanned.
|
|
# When set within `InspectDataSourceRequest`,
|
|
# the maximum returned is 2000 regardless of whether this is set higher.
|
|
# When set within `InspectContentRequest`, this field is ignored.
|
|
},
|
|
"minLikelihood": "A String", # Only returns findings equal to or above this threshold. The default is
|
|
# POSSIBLE.
|
|
# See https://cloud.google.com/dlp/docs/likelihood to learn more.
|
|
"customInfoTypes": [ # CustomInfoTypes provided by the user. See
|
|
# https://cloud.google.com/dlp/docs/creating-custom-infotypes to learn more.
|
|
{ # Custom information type provided by the user. Used to find domain-specific
|
|
# sensitive information configurable to the data in question.
|
|
"regex": { # Message defining a custom regular expression. # Regular expression based CustomInfoType.
|
|
"pattern": "A String", # Pattern defining the regular expression. Its syntax
|
|
# (https://github.com/google/re2/wiki/Syntax) can be found under the
|
|
# google/re2 repository on GitHub.
|
|
"groupIndexes": [ # The index of the submatch to extract as findings. When not
|
|
# specified, the entire match is returned. No more than 3 may be included.
|
|
42,
|
|
],
|
|
},
|
|
"surrogateType": { # Message for detecting output from deidentification transformations that support reversing.
|
|
# Message for detecting output from deidentification transformations such as
|
|
# [`CryptoReplaceFfxFpeConfig`](/dlp/docs/reference/rest/v2/organizations.deidentifyTemplates#cryptoreplaceffxfpeconfig).
|
|
# These types of transformations are
|
|
# those that perform pseudonymization, thereby producing a "surrogate" as
|
|
# output. This should be used in conjunction with a field on the
|
|
# transformation such as `surrogate_info_type`. This CustomInfoType does
|
|
# not support the use of `detection_rules`.
|
|
},
|
|
"infoType": { # Type of information detected by the API. # CustomInfoType can either be a new infoType, or an extension of a built-in
|
|
# infoType, when the name matches one of the existing infoTypes and that infoType
|
|
# is specified in the `InspectContent.info_types` field. Specifying the latter
|
|
# adds findings to those detected by the system. If the built-in info type is
|
|
# not specified in the `InspectContent.info_types` list, then the name is treated
|
|
# as a custom info type.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
"dictionary": { # A list of phrases to detect as a CustomInfoType. # Custom information type based on a dictionary of words or phrases. This can
|
|
# be used to match sensitive information specific to the data, such as a list
|
|
# of employee IDs or job titles.
|
|
#
|
|
# Dictionary words are case-insensitive and all characters other than letters
|
|
# and digits in the Unicode [Basic Multilingual
|
|
# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)
|
|
# will be replaced with whitespace when scanning for matches, so the
|
|
# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",
|
|
# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters
|
|
# surrounding any match must be of a different type than the adjacent
|
|
# characters within the word, so letters must be next to non-letters and
|
|
# digits next to non-digits. For example, the dictionary word "jen" will
|
|
# match the first three letters of the text "jen123" but will return no
|
|
# matches for "jennifer".
|
|
#
|
|
# Dictionary words containing a large number of characters that are not
|
|
# letters or digits may result in unexpected findings because such characters
|
|
# are treated as whitespace. The
|
|
# [limits](https://cloud.google.com/dlp/limits) page contains details about
|
|
# the size limits of dictionaries. For dictionaries that do not fit within
|
|
# these constraints, consider using `LargeCustomDictionaryConfig` in the
|
|
# `StoredInfoType` API.
|
|
"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.
|
|
"words": [ # Words or phrases defining the dictionary. The dictionary must contain
|
|
# at least one phrase and every phrase must contain at least 2 characters
|
|
# that are letters or digits. [required]
|
|
"A String",
|
|
],
|
|
},
|
|
"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file
|
|
# is accepted.
|
|
"path": "A String", # A url representing a file or path (no wildcards) in Cloud Storage.
|
|
# Example: gs://[BUCKET_NAME]/dictionary.txt
|
|
},
|
|
},
|
|
"storedType": { # A reference to a StoredInfoType to use with scanning. # Load an existing `StoredInfoType` resource for use in
|
|
# `InspectDataSource`. Not currently supported in `InspectContent`.
|
|
"name": "A String", # Resource name of the requested `StoredInfoType`, for example
|
|
# `organizations/433245324/storedInfoTypes/432452342` or
|
|
# `projects/project-id/storedInfoTypes/432452342`.
|
|
"createTime": "A String", # Timestamp indicating when the version of the `StoredInfoType` used for
|
|
# inspection was created. Output-only field, populated by the system.
|
|
},
|
|
"detectionRules": [ # Set of detection rules to apply to all findings of this CustomInfoType.
|
|
# Rules are applied in the order in which they are specified. Not supported for the
|
|
# `surrogate_type` CustomInfoType.
|
|
{ # Deprecated; use `InspectionRuleSet` instead. Rule for modifying a
|
|
# `CustomInfoType` to alter behavior under certain circumstances, depending
|
|
# on the specific details of the rule. Not supported for the `surrogate_type`
|
|
# custom infoType.
|
|
"hotwordRule": { # Hotword-based detection rule. # The rule that adjusts the likelihood of findings within a certain
|
|
# proximity of hotwords.
|
|
"proximity": { # Message for specifying a window around a finding to apply a detection rule. # Proximity of the finding within which the entire hotword must reside.
|
|
# The total length of the window cannot exceed 1000 characters. Note that
|
|
# the finding itself will be included in the window, so that hotwords may
|
|
# be used to match substrings of the finding itself. For example, the
|
|
# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be
|
|
# adjusted upwards if the area code is known to be the local area code of
|
|
# a company office using the hotword regex "\(xxx\)", where "xxx"
|
|
# is the area code in question.
|
|
"windowAfter": 42, # Number of characters after the finding to consider.
|
|
"windowBefore": 42, # Number of characters before the finding to consider.
|
|
},
|
|
"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.
|
|
"pattern": "A String", # Pattern defining the regular expression. Its syntax
|
|
# (https://github.com/google/re2/wiki/Syntax) can be found under the
|
|
# google/re2 repository on GitHub.
|
|
"groupIndexes": [ # The index of the submatch to extract as findings. When not
|
|
# specified, the entire match is returned. No more than 3 may be included.
|
|
42,
|
|
],
|
|
},
|
|
"likelihoodAdjustment": { # Likelihood adjustment to apply to all matching findings. # Message for specifying an adjustment to the likelihood of a finding as
|
|
# part of a detection rule.
|
|
"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of
|
|
# levels. For example, if a finding would be `POSSIBLE` without the
|
|
# detection rule and `relative_likelihood` is 1, then it is upgraded to
|
|
# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.
|
|
# Likelihood may never drop below `VERY_UNLIKELY` or exceed
|
|
# `VERY_LIKELY`, so applying an adjustment of 1 followed by an
|
|
# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in
|
|
# a final likelihood of `LIKELY`.
|
|
"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.
|
|
},
|
|
},
|
|
},
|
|
],
|
|
"exclusionType": "A String", # If set to EXCLUSION_TYPE_EXCLUDE this infoType will not cause a finding
|
|
# to be returned. It can still be used for rule matching.
|
|
"likelihood": "A String", # Likelihood to return for this CustomInfoType. This base value can be
|
|
# altered by a detection rule if the finding meets the criteria specified by
|
|
# the rule. Defaults to `VERY_LIKELY` if not specified.
|
|
},
|
|
],
|
|
"includeQuote": True or False, # When true, a contextual quote from the data that triggered a finding is
|
|
# included in the response; see Finding.quote.
|
|
"ruleSet": [ # Set of rules to apply to the findings for this InspectConfig.
|
|
# Exclusion rules contained in the set are executed last; other
|
|
# rules are executed in the order they are specified for each info type.
|
|
{ # Rule set for modifying a set of infoTypes to alter behavior under certain
|
|
# circumstances, depending on the specific details of the rules within the set.
|
|
"rules": [ # Set of rules to be applied to infoTypes. The rules are applied in order.
|
|
{ # A single inspection rule to be applied to infoTypes, specified in
|
|
# `InspectionRuleSet`.
|
|
"hotwordRule": { # Hotword-based detection rule. # The rule that adjusts the likelihood of findings within a certain
|
|
# proximity of hotwords.
|
|
"proximity": { # Message for specifying a window around a finding to apply a detection rule. # Proximity of the finding within which the entire hotword must reside.
|
|
# The total length of the window cannot exceed 1000 characters. Note that
|
|
# the finding itself will be included in the window, so that hotwords may
|
|
# be used to match substrings of the finding itself. For example, the
|
|
# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be
|
|
# adjusted upwards if the area code is known to be the local area code of
|
|
# a company office using the hotword regex "\(xxx\)", where "xxx"
|
|
# is the area code in question.
|
|
"windowAfter": 42, # Number of characters after the finding to consider.
|
|
"windowBefore": 42, # Number of characters before the finding to consider.
|
|
},
|
|
"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.
|
|
"pattern": "A String", # Pattern defining the regular expression. Its syntax
|
|
# (https://github.com/google/re2/wiki/Syntax) can be found under the
|
|
# google/re2 repository on GitHub.
|
|
"groupIndexes": [ # The index of the submatch to extract as findings. When not
|
|
# specified, the entire match is returned. No more than 3 may be included.
|
|
42,
|
|
],
|
|
},
|
|
"likelihoodAdjustment": { # Likelihood adjustment to apply to all matching findings. # Message for specifying an adjustment to the likelihood of a finding as
|
|
# part of a detection rule.
|
|
"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of
|
|
# levels. For example, if a finding would be `POSSIBLE` without the
|
|
# detection rule and `relative_likelihood` is 1, then it is upgraded to
|
|
# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.
|
|
# Likelihood may never drop below `VERY_UNLIKELY` or exceed
|
|
# `VERY_LIKELY`, so applying an adjustment of 1 followed by an
|
|
# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in
|
|
# a final likelihood of `LIKELY`.
|
|
"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.
|
|
},
|
|
},
|
|
"exclusionRule": { # Exclusion rule. # The rule that specifies conditions when findings of infoTypes specified in
|
|
# `InspectionRuleSet` are removed from results.
|
|
"regex": { # Message defining a custom regular expression. # Regular expression which defines the rule.
|
|
"pattern": "A String", # Pattern defining the regular expression. Its syntax
|
|
# (https://github.com/google/re2/wiki/Syntax) can be found under the
|
|
# google/re2 repository on GitHub.
|
|
"groupIndexes": [ # The index of the submatch to extract as findings. When not
|
|
# specified, the entire match is returned. No more than 3 may be included.
|
|
42,
|
|
],
|
|
},
|
|
"excludeInfoTypes": { # List of excluded infoTypes. # Set of infoTypes for which findings would affect this rule.
|
|
"infoTypes": [ # An ExclusionRule drops a finding when it overlaps with or is
|
|
# contained within a finding of an infoType from this list. For
|
|
# example, for `InspectionRuleSet.info_types` containing "PHONE_NUMBER" and
|
|
# `exclusion_rule` containing `exclude_info_types.info_types` with
|
|
# "EMAIL_ADDRESS", the phone number findings are dropped if they overlap
|
|
# with an EMAIL_ADDRESS finding.
|
|
# As a result, "555-222-2222@example.org" generates only a single
|
|
# finding, namely the email address.
|
|
{ # Type of information detected by the API.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
],
|
|
},
|
|
"dictionary": { # Dictionary which defines the rule. # Custom information type based on a dictionary of words or phrases. This can
|
|
# be used to match sensitive information specific to the data, such as a list
|
|
# of employee IDs or job titles.
|
|
#
|
|
# Dictionary words are case-insensitive and all characters other than letters
|
|
# and digits in the unicode [Basic Multilingual
|
|
# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)
|
|
# will be replaced with whitespace when scanning for matches, so the
|
|
# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",
|
|
# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters
|
|
# surrounding any match must be of a different type than the adjacent
|
|
# characters within the word, so letters must be next to non-letters and
|
|
# digits next to non-digits. For example, the dictionary word "jen" will
|
|
# match the first three letters of the text "jen123" but will return no
|
|
# matches for "jennifer".
|
|
#
|
|
# Dictionary words containing a large number of characters that are not
|
|
# letters or digits may result in unexpected findings because such characters
|
|
# are treated as whitespace. The
|
|
# [limits](https://cloud.google.com/dlp/limits) page contains details about
|
|
# the size limits of dictionaries. For dictionaries that do not fit within
|
|
# these constraints, consider using `LargeCustomDictionaryConfig` in the
|
|
# `StoredInfoType` API.
|
|
"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.
|
|
"words": [ # Words or phrases defining the dictionary. The dictionary must contain
|
|
# at least one phrase and every phrase must contain at least 2 characters
|
|
# that are letters or digits. [required]
|
|
"A String",
|
|
],
|
|
},
|
|
"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file
|
|
# is accepted.
|
|
"path": "A String", # A url representing a file or path (no wildcards) in Cloud Storage.
|
|
# Example: gs://[BUCKET_NAME]/dictionary.txt
|
|
},
|
|
},
|
|
"matchingType": "A String", # How the rule is applied, see MatchingType documentation for details.
|
|
},
|
|
},
|
|
],
|
|
"infoTypes": [ # List of infoTypes this rule set is applied to.
|
|
{ # Type of information detected by the API.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
],
|
|
},
|
|
],
|
|
"contentOptions": [ # List of options defining data content to scan.
|
|
# If empty, text, images, and other content will be included.
|
|
"A String",
|
|
],
|
|
"infoTypes": [ # Restricts what info_types to look for. The values must correspond to
|
|
# InfoType values returned by ListInfoTypes or listed at
|
|
# https://cloud.google.com/dlp/docs/infotypes-reference.
|
|
#
|
|
# When no InfoTypes or CustomInfoTypes are specified in a request, the
|
|
# system may automatically choose what detectors to run. By default this may
|
|
# be all types, but may change over time as detectors are updated.
|
|
#
|
|
# The special InfoType name "ALL_BASIC" can be used to trigger all detectors,
|
|
# but may change over time as new InfoTypes are added. If you need precise
|
|
# control and predictability as to what detectors are run you should specify
|
|
# specific InfoTypes listed in the reference.
|
|
{ # Type of information detected by the API.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
],
|
|
},
|
|
"inspectTemplateName": "A String", # If provided, will be used as the default for all values in InspectConfig.
|
|
# `inspect_config` will be merged into the values persisted as part of the
|
|
# template.
|
|
"actions": [ # Actions to execute at the completion of the job.
|
|
{ # A task to execute on the completion of a job.
|
|
# See https://cloud.google.com/dlp/docs/concepts-actions to learn more.
|
|
"saveFindings": { # Save resulting findings in a provided location. # If set, the detailed findings will be persisted to the specified
|
|
# OutputStorageConfig. Only a single instance of this action can be
|
|
# specified.
|
|
# Compatible with: Inspect, Risk
|
|
"outputConfig": { # Cloud repository for storing output.
|
|
"table": { # Store findings in an existing table or a new table in an existing
|
|
# dataset. If table_id is not set a new one will be generated
|
|
# for you with the following format:
|
|
# dlp_googleapis_yyyy_mm_dd_[dlp_job_id]. Pacific timezone will be used for
|
|
# generating the date details.
|
|
#
|
|
# For Inspect, each column in an existing output table must have the same
|
|
# name, type, and mode as a field in the `Finding` object.
|
|
#
|
|
# For Risk, an existing output table should be the output of a previous
|
|
# Risk analysis job run on the same source table, with the same privacy
|
|
# metric and quasi-identifiers. Risk jobs that analyze the same table but
|
|
# compute a different privacy metric, or use different sets of
|
|
# quasi-identifiers, cannot store their results in the same table.
|
|
#
|
|
# Message defining the location of a BigQuery table. A table is uniquely
|
|
# identified by its project_id, dataset_id, and table_name. Within a query
|
|
# a table is often referenced with a string in the format of:
|
|
# `<project_id>:<dataset_id>.<table_id>` or
|
|
# `<project_id>.<dataset_id>.<table_id>`.
|
|
"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
|
|
# If omitted, project ID is inferred from the API call.
|
|
"tableId": "A String", # Name of the table.
|
|
"datasetId": "A String", # Dataset ID of the table.
|
|
},
|
|
"outputSchema": "A String", # Schema used for writing the findings for Inspect jobs. This field is only
|
|
# used for Inspect and must be unspecified for Risk jobs. Columns are derived
|
|
# from the `Finding` object. If appending to an existing table, any columns
|
|
# from the predefined schema that are missing will be added. No columns in
|
|
# the existing table will be deleted.
|
|
#
|
|
# If unspecified, then all available columns will be used for a new table or
|
|
# an (existing) table with no schema, and no changes will be made to an
|
|
# existing table that has a schema.
|
|
},
|
|
},
|
|
"jobNotificationEmails": { # Enable email notification to project owners and editors on a job's completion or failure.
|
|
},
|
|
"publishSummaryToCscc": { # Publish summary to Cloud Security Command Center (Alpha). # Publish the result summary of a DlpJob to the Cloud Security
|
|
# Command Center (CSCC Alpha).
|
|
# This action is only available for projects that are part of
|
|
# an organization and whitelisted for the alpha Cloud Security Command
|
|
# Center.
|
|
# The action will publish the count of finding instances and their info types.
|
|
# The summary of findings will be persisted in CSCC and is governed by CSCC
|
|
# service-specific policy, see https://cloud.google.com/terms/service-terms
|
|
# Only a single instance of this action can be specified.
|
|
# Compatible with: Inspect
|
|
},
|
|
"pubSub": { # Publish a notification to a Pub/Sub topic. # Publish a message into the given Pub/Sub topic when the DlpJob has completed. The
|
|
# message contains a single field, `DlpJobName`, which is equal to the
|
|
# finished job's
|
|
# [`DlpJob.name`](/dlp/docs/reference/rest/v2/projects.dlpJobs#DlpJob).
|
|
# Compatible with: Inspect, Risk
|
|
"topic": "A String", # Cloud Pub/Sub topic to send notifications to. The topic must have given
|
|
# publishing access rights to the DLP API service account executing
|
|
# the long running DlpJob sending the notifications.
|
|
# Format is projects/{project}/topics/{topic}.
|
|
},
|
|
},
|
|
],
|
|
},
|
|
},
|
|
"result": { # All result fields mentioned below are updated while the job is processing. # A summary of the outcome of this inspect job.
|
|
"infoTypeStats": [ # Statistics of how many instances of each info type were found during
|
|
# inspect job.
|
|
{ # Statistics regarding a specific InfoType.
|
|
"count": "A String", # Number of findings for this infoType.
|
|
"infoType": { # Type of information detected by the API. # The type of finding this stat is for.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
},
|
|
],
|
|
"totalEstimatedBytes": "A String", # Estimate of the number of bytes to process.
|
|
"processedBytes": "A String", # Total size in bytes that were processed.
|
|
},
|
|
},
|
|
"riskDetails": { # Result of a risk analysis operation request. # Results from analyzing risk of a data source.
|
|
"numericalStatsResult": { # Result of the numerical stats computation.
|
|
"quantileValues": [ # List of 99 values that partition the set of field values into 100 equal
|
|
# sized buckets.
|
|
{ # Set of primitive values supported by the system.
|
|
# Note that for the purposes of inspection or transformation, the number
|
|
# of bytes considered to comprise a 'Value' is based on its representation
|
|
# as a UTF-8 encoded string. For example, if 'integer_value' is set to
|
|
# 123456789, the number of bytes would be counted as 9, even though an
|
|
# int64 only holds up to 8 bytes of data.
|
|
"floatValue": 3.14,
|
|
"timestampValue": "A String",
|
|
"dayOfWeekValue": "A String",
|
|
"timeValue": { # Represents a time of day. The date and time zone are either not significant
|
|
# or are specified elsewhere. An API may choose to allow leap seconds. Related
|
|
# types are google.type.Date and `google.protobuf.Timestamp`.
|
|
"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose
|
|
# to allow the value "24:00:00" for scenarios like business closing time.
|
|
"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.
|
|
"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may
|
|
# allow the value 60 if it allows leap-seconds.
|
|
"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.
|
|
},
|
|
"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day
|
|
# and time zone are either specified elsewhere or are not significant. The date
|
|
# is relative to the Proleptic Gregorian Calendar. This can represent:
|
|
#
|
|
# * A full date, with non-zero year, month and day values
|
|
# * A month and day value, with a zero year, e.g. an anniversary
|
|
# * A year on its own, with zero month and day values
|
|
# * A year and month value, with a zero day, e.g. a credit card expiration date
|
|
#
|
|
# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.
|
|
"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without
|
|
# a year.
|
|
"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0
|
|
# if specifying a year by itself or a year and month where the day is not
|
|
# significant.
|
|
"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a
|
|
# month and day.
|
|
},
|
|
"stringValue": "A String",
|
|
"booleanValue": True or False,
|
|
"integerValue": "A String",
|
|
},
|
|
],
|
|
"maxValue": { # Set of primitive values supported by the system. # Maximum value appearing in the column.
|
|
# Note that for the purposes of inspection or transformation, the number
|
|
# of bytes considered to comprise a 'Value' is based on its representation
|
|
# as a UTF-8 encoded string. For example, if 'integer_value' is set to
|
|
# 123456789, the number of bytes would be counted as 9, even though an
|
|
# int64 only holds up to 8 bytes of data.
|
|
"floatValue": 3.14,
|
|
"timestampValue": "A String",
|
|
"dayOfWeekValue": "A String",
|
|
"timeValue": { # Represents a time of day. The date and time zone are either not significant
|
|
# or are specified elsewhere. An API may choose to allow leap seconds. Related
|
|
# types are google.type.Date and `google.protobuf.Timestamp`.
|
|
"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose
|
|
# to allow the value "24:00:00" for scenarios like business closing time.
|
|
"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.
|
|
"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may
|
|
# allow the value 60 if it allows leap-seconds.
|
|
"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.
|
|
},
|
|
"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day
|
|
# and time zone are either specified elsewhere or are not significant. The date
|
|
# is relative to the Proleptic Gregorian Calendar. This can represent:
|
|
#
|
|
# * A full date, with non-zero year, month and day values
|
|
# * A month and day value, with a zero year, e.g. an anniversary
|
|
# * A year on its own, with zero month and day values
|
|
# * A year and month value, with a zero day, e.g. a credit card expiration date
|
|
#
|
|
# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.
|
|
"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without
|
|
# a year.
|
|
"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0
|
|
# if specifying a year by itself or a year and month where the day is not
|
|
# significant.
|
|
"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a
|
|
# month and day.
|
|
},
|
|
"stringValue": "A String",
|
|
"booleanValue": True or False,
|
|
"integerValue": "A String",
|
|
},
|
|
"minValue": { # Set of primitive values supported by the system. # Minimum value appearing in the column.
|
|
# Note that for the purposes of inspection or transformation, the number
|
|
# of bytes considered to comprise a 'Value' is based on its representation
|
|
# as a UTF-8 encoded string. For example, if 'integer_value' is set to
|
|
# 123456789, the number of bytes would be counted as 9, even though an
|
|
# int64 only holds up to 8 bytes of data.
|
|
"floatValue": 3.14,
|
|
"timestampValue": "A String",
|
|
"dayOfWeekValue": "A String",
|
|
"timeValue": { # Represents a time of day. The date and time zone are either not significant
|
|
# or are specified elsewhere. An API may choose to allow leap seconds. Related
|
|
# types are google.type.Date and `google.protobuf.Timestamp`.
|
|
"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose
|
|
# to allow the value "24:00:00" for scenarios like business closing time.
|
|
"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.
|
|
"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may
|
|
# allow the value 60 if it allows leap-seconds.
|
|
"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.
|
|
},
|
|
"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day
|
|
# and time zone are either specified elsewhere or are not significant. The date
|
|
# is relative to the Proleptic Gregorian Calendar. This can represent:
|
|
#
|
|
# * A full date, with non-zero year, month and day values
|
|
# * A month and day value, with a zero year, e.g. an anniversary
|
|
# * A year on its own, with zero month and day values
|
|
# * A year and month value, with a zero day, e.g. a credit card expiration date
|
|
#
|
|
# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.
|
|
"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without
|
|
# a year.
|
|
"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0
|
|
# if specifying a year by itself or a year and month where the day is not
|
|
# significant.
|
|
"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a
|
|
# month and day.
|
|
},
|
|
"stringValue": "A String",
|
|
"booleanValue": True or False,
|
|
"integerValue": "A String",
|
|
},
|
|
},
|
|
"kMapEstimationResult": { # Result of the reidentifiability analysis. Note that these results are an
|
|
# estimation, not exact values.
|
|
"kMapEstimationHistogram": [ # The intervals [min_anonymity, max_anonymity] do not overlap. If a value
|
|
# doesn't correspond to any such interval, the associated frequency is
|
|
# zero. For example, the following records:
|
|
# {min_anonymity: 1, max_anonymity: 1, frequency: 17}
|
|
# {min_anonymity: 2, max_anonymity: 3, frequency: 42}
|
|
# {min_anonymity: 5, max_anonymity: 10, frequency: 99}
|
|
# mean that there are no records with an estimated anonymity of 4 or
|
|
# larger than 10.
|
|
{ # A KMapEstimationHistogramBucket message with the following values:
|
|
# min_anonymity: 3
|
|
# max_anonymity: 5
|
|
# frequency: 42
|
|
# means that there are 42 records whose quasi-identifier values correspond
|
|
# to 3, 4 or 5 people in the overlying population. An important particular
|
|
# case is when min_anonymity = max_anonymity = 1: the frequency field then
|
|
# corresponds to the number of uniquely identifiable records.
|
|
"bucketValues": [ # Sample of quasi-identifier tuple values in this bucket. The total
|
|
# number of classes returned per bucket is capped at 20.
|
|
{ # A tuple of values for the quasi-identifier columns.
|
|
"estimatedAnonymity": "A String", # The estimated anonymity for these quasi-identifier values.
|
|
"quasiIdsValues": [ # The quasi-identifier values.
|
|
{ # Set of primitive values supported by the system.
|
|
# Note that for the purposes of inspection or transformation, the number
|
|
# of bytes considered to comprise a 'Value' is based on its representation
|
|
# as a UTF-8 encoded string. For example, if 'integer_value' is set to
|
|
# 123456789, the number of bytes would be counted as 9, even though an
|
|
# int64 only holds up to 8 bytes of data.
|
|
"floatValue": 3.14,
|
|
"timestampValue": "A String",
|
|
"dayOfWeekValue": "A String",
|
|
"timeValue": { # Represents a time of day. The date and time zone are either not significant
|
|
# or are specified elsewhere. An API may choose to allow leap seconds. Related
|
|
# types are google.type.Date and `google.protobuf.Timestamp`.
|
|
"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose
|
|
# to allow the value "24:00:00" for scenarios like business closing time.
|
|
"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.
|
|
"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may
|
|
# allow the value 60 if it allows leap-seconds.
|
|
"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.
|
|
},
|
|
"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day
|
|
# and time zone are either specified elsewhere or are not significant. The date
|
|
# is relative to the Proleptic Gregorian Calendar. This can represent:
|
|
#
|
|
# * A full date, with non-zero year, month and day values
|
|
# * A month and day value, with a zero year, e.g. an anniversary
|
|
# * A year on its own, with zero month and day values
|
|
# * A year and month value, with a zero day, e.g. a credit card expiration date
|
|
#
|
|
# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.
|
|
"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without
|
|
# a year.
|
|
"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0
|
|
# if specifying a year by itself or a year and month where the day is not
|
|
# significant.
|
|
"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a
|
|
# month and day.
|
|
},
|
|
"stringValue": "A String",
|
|
"booleanValue": True or False,
|
|
"integerValue": "A String",
|
|
},
|
|
],
|
|
},
|
|
],
|
|
"minAnonymity": "A String", # Always positive.
|
|
"bucketValueCount": "A String", # Total number of distinct quasi-identifier tuple values in this bucket.
|
|
"maxAnonymity": "A String", # Always greater than or equal to min_anonymity.
|
|
"bucketSize": "A String", # Number of records within these anonymity bounds.
|
|
},
|
|
],
|
|
},
|
|
"kAnonymityResult": { # Result of the k-anonymity computation.
|
|
"equivalenceClassHistogramBuckets": [ # Histogram of k-anonymity equivalence classes.
|
|
{
|
|
"bucketValues": [ # Sample of equivalence classes in this bucket. The total number of
|
|
# classes returned per bucket is capped at 20.
|
|
{ # The set of columns' values that share the same ldiversity value
|
|
"quasiIdsValues": [ # Set of values defining the equivalence class. One value per
|
|
# quasi-identifier column in the original KAnonymity metric message.
|
|
# The order is always the same as the original request.
|
|
{ # Set of primitive values supported by the system.
|
|
# Note that for the purposes of inspection or transformation, the number
|
|
# of bytes considered to comprise a 'Value' is based on its representation
|
|
# as a UTF-8 encoded string. For example, if 'integer_value' is set to
|
|
# 123456789, the number of bytes would be counted as 9, even though an
|
|
# int64 only holds up to 8 bytes of data.
|
|
"floatValue": 3.14,
|
|
"timestampValue": "A String",
|
|
"dayOfWeekValue": "A String",
|
|
"timeValue": { # Represents a time of day. The date and time zone are either not significant
|
|
# or are specified elsewhere. An API may choose to allow leap seconds. Related
|
|
# types are google.type.Date and `google.protobuf.Timestamp`.
|
|
"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose
|
|
# to allow the value "24:00:00" for scenarios like business closing time.
|
|
"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.
|
|
"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may
|
|
# allow the value 60 if it allows leap-seconds.
|
|
"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.
|
|
},
|
|
"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day
|
|
# and time zone are either specified elsewhere or are not significant. The date
|
|
# is relative to the Proleptic Gregorian Calendar. This can represent:
|
|
#
|
|
# * A full date, with non-zero year, month and day values
|
|
# * A month and day value, with a zero year, e.g. an anniversary
|
|
# * A year on its own, with zero month and day values
|
|
# * A year and month value, with a zero day, e.g. a credit card expiration date
|
|
#
|
|
# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.
|
|
"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without
|
|
# a year.
|
|
"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0
|
|
# if specifying a year by itself or a year and month where the day is not
|
|
# significant.
|
|
"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a
|
|
# month and day.
|
|
},
|
|
"stringValue": "A String",
|
|
"booleanValue": True or False,
|
|
"integerValue": "A String",
|
|
},
|
|
],
|
|
"equivalenceClassSize": "A String", # Size of the equivalence class, for example number of rows with the
|
|
# above set of values.
|
|
},
|
|
],
|
|
"bucketValueCount": "A String", # Total number of distinct equivalence classes in this bucket.
|
|
"equivalenceClassSizeLowerBound": "A String", # Lower bound on the size of the equivalence classes in this bucket.
|
|
"equivalenceClassSizeUpperBound": "A String", # Upper bound on the size of the equivalence classes in this bucket.
|
|
"bucketSize": "A String", # Total number of equivalence classes in this bucket.
|
|
},
|
|
],
|
|
},
|
|
"lDiversityResult": { # Result of the l-diversity computation.
|
|
"sensitiveValueFrequencyHistogramBuckets": [ # Histogram of l-diversity equivalence class sensitive value frequencies.
|
|
{
|
|
"bucketValues": [ # Sample of equivalence classes in this bucket. The total number of
|
|
# classes returned per bucket is capped at 20.
|
|
{ # The set of columns' values that share the same l-diversity value.
|
|
"numDistinctSensitiveValues": "A String", # Number of distinct sensitive values in this equivalence class.
|
|
"quasiIdsValues": [ # Quasi-identifier values defining the k-anonymity equivalence
|
|
# class. The order is always the same as the original request.
|
|
{ # Set of primitive values supported by the system.
|
|
# Note that for the purposes of inspection or transformation, the number
|
|
# of bytes considered to comprise a 'Value' is based on its representation
|
|
# as a UTF-8 encoded string. For example, if 'integer_value' is set to
|
|
# 123456789, the number of bytes would be counted as 9, even though an
|
|
# int64 only holds up to 8 bytes of data.
|
|
"floatValue": 3.14,
|
|
"timestampValue": "A String",
|
|
"dayOfWeekValue": "A String",
|
|
"timeValue": { # Represents a time of day. The date and time zone are either not significant
|
|
# or are specified elsewhere. An API may choose to allow leap seconds. Related
|
|
# types are google.type.Date and `google.protobuf.Timestamp`.
|
|
"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose
|
|
# to allow the value "24:00:00" for scenarios like business closing time.
|
|
"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.
|
|
"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may
|
|
# allow the value 60 if it allows leap-seconds.
|
|
"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.
|
|
},
|
|
"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day
|
|
# and time zone are either specified elsewhere or are not significant. The date
|
|
# is relative to the Proleptic Gregorian Calendar. This can represent:
|
|
#
|
|
# * A full date, with non-zero year, month and day values
|
|
# * A month and day value, with a zero year, e.g. an anniversary
|
|
# * A year on its own, with zero month and day values
|
|
# * A year and month value, with a zero day, e.g. a credit card expiration date
|
|
#
|
|
# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.
|
|
"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without
|
|
# a year.
|
|
"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0
|
|
# if specifying a year by itself or a year and month where the day is not
|
|
# significant.
|
|
"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a
|
|
# month and day.
|
|
},
|
|
"stringValue": "A String",
|
|
"booleanValue": True or False,
|
|
"integerValue": "A String",
|
|
},
|
|
],
|
|
"topSensitiveValues": [ # Estimated frequencies of top sensitive values.
|
|
{ # A value of a field, including its frequency.
|
|
"count": "A String", # How many times the value is contained in the field.
|
|
"value": { # Set of primitive values supported by the system. # A value contained in the field in question.
|
|
# Note that for the purposes of inspection or transformation, the number
|
|
# of bytes considered to comprise a 'Value' is based on its representation
|
|
# as a UTF-8 encoded string. For example, if 'integer_value' is set to
|
|
# 123456789, the number of bytes would be counted as 9, even though an
|
|
# int64 only holds up to 8 bytes of data.
|
|
"floatValue": 3.14,
|
|
"timestampValue": "A String",
|
|
"dayOfWeekValue": "A String",
|
|
"timeValue": { # Represents a time of day. The date and time zone are either not significant
|
|
# or are specified elsewhere. An API may choose to allow leap seconds. Related
|
|
# types are google.type.Date and `google.protobuf.Timestamp`.
|
|
"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose
|
|
# to allow the value "24:00:00" for scenarios like business closing time.
|
|
"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.
|
|
"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may
|
|
# allow the value 60 if it allows leap-seconds.
|
|
"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.
|
|
},
|
|
"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day
|
|
# and time zone are either specified elsewhere or are not significant. The date
|
|
# is relative to the Proleptic Gregorian Calendar. This can represent:
|
|
#
|
|
# * A full date, with non-zero year, month and day values
|
|
# * A month and day value, with a zero year, e.g. an anniversary
|
|
# * A year on its own, with zero month and day values
|
|
# * A year and month value, with a zero day, e.g. a credit card expiration date
|
|
#
|
|
# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.
|
|
"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without
|
|
# a year.
|
|
"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0
|
|
# if specifying a year by itself or a year and month where the day is not
|
|
# significant.
|
|
"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a
|
|
# month and day.
|
|
},
|
|
"stringValue": "A String",
|
|
"booleanValue": True or False,
|
|
"integerValue": "A String",
|
|
},
|
|
},
|
|
],
|
|
"equivalenceClassSize": "A String", # Size of the k-anonymity equivalence class.
|
|
},
|
|
],
|
|
"bucketValueCount": "A String", # Total number of distinct equivalence classes in this bucket.
|
|
"bucketSize": "A String", # Total number of equivalence classes in this bucket.
|
|
"sensitiveValueFrequencyUpperBound": "A String", # Upper bound on the sensitive value frequencies of the equivalence
|
|
# classes in this bucket.
|
|
"sensitiveValueFrequencyLowerBound": "A String", # Lower bound on the sensitive value frequencies of the equivalence
|
|
# classes in this bucket.
|
|
},
|
|
],
|
|
},
|
|
"requestedPrivacyMetric": { # Privacy metric to compute for reidentification risk analysis. # Privacy metric to compute.
|
|
"numericalStatsConfig": { # Compute numerical stats over an individual column, including
|
|
# min, max, and quantiles.
|
|
"field": { # General identifier of a data field in a storage service. # Field to compute numerical stats on. Supported types are
|
|
# integer, float, date, datetime, timestamp, time.
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
},
|
|
"kMapEstimationConfig": { # Reidentifiability metric. This corresponds to a risk model similar to what
|
|
# is called "journalist risk" in the literature, except the attack dataset is
|
|
# statistically modeled instead of being perfectly known. This can be done
|
|
# using publicly available data (like the US Census), or using a custom
|
|
# statistical model (indicated as one or several BigQuery tables), or by
|
|
# extrapolating from the distribution of values in the input dataset.
|
|
"regionCode": "A String", # ISO 3166-1 alpha-2 region code to use in the statistical modeling.
|
|
# Required if no column is tagged with a region-specific InfoType (like
|
|
# US_ZIP_5) or a region code.
|
|
"quasiIds": [ # Fields considered to be quasi-identifiers. No two columns can have the
|
|
# same tag. [required]
|
|
{ # A column with a semantic tag attached.
|
|
"field": { # General identifier of a data field in a storage service. # Identifies the column. [required]
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
"customTag": "A String", # A column can be tagged with a custom tag. In this case, the user must
|
|
# indicate an auxiliary table that contains statistical information on
|
|
# the possible values of this column (below).
|
|
"infoType": { # Type of information detected by the API. # A column can be tagged with an InfoType to use the relevant public
|
|
# dataset as a statistical model of population, if available. We
|
|
# currently support US ZIP codes, region codes, ages and genders.
|
|
# To programmatically obtain the list of supported InfoTypes, use
|
|
# ListInfoTypes with the supported_by=RISK_ANALYSIS filter.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
"inferred": { # If no semantic tag is indicated, we infer the statistical model from
|
|
# the distribution of values in the input data.
|
|
#
|
|
# A generic empty message that you can re-use to avoid defining duplicated
|
|
# empty messages in your APIs. A typical example is to use it as the request
|
|
# or the response type of an API method. For instance:
|
|
#
|
|
# service Foo {
|
|
# rpc Bar(google.protobuf.Empty) returns (google.protobuf.Empty);
|
|
# }
|
|
#
|
|
# The JSON representation for `Empty` is empty JSON object `{}`.
|
|
},
|
|
},
|
|
],
|
|
"auxiliaryTables": [ # Several auxiliary tables can be used in the analysis. Each custom_tag
|
|
# used to tag a quasi-identifiers column must appear in exactly one column
|
|
# of one auxiliary table.
|
|
{ # An auxiliary table contains statistical information on the relative
|
|
# frequency of different quasi-identifiers values. It has one or several
|
|
# quasi-identifiers columns, and one column that indicates the relative
|
|
# frequency of each quasi-identifier tuple.
|
|
# If a tuple is present in the data but not in the auxiliary table, the
|
|
# corresponding relative frequency is assumed to be zero (and thus, the
|
|
# tuple is highly reidentifiable).
|
|
"relativeFrequency": { # General identifier of a data field in a storage service. # The relative frequency column must contain a floating-point number
|
|
# between 0 and 1 (inclusive). Null values are assumed to be zero.
|
|
# [required]
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
"quasiIds": [ # Quasi-identifier columns. [required]
|
|
{ # A quasi-identifier column has a custom_tag, used to know which column
|
|
# in the data corresponds to which column in the statistical model.
|
|
"field": { # General identifier of a data field in a storage service.
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
"customTag": "A String",
|
|
},
|
|
],
|
|
"table": { # Message defining the location of a BigQuery table. A table is uniquely # Auxiliary table location. [required]
|
|
# identified by its project_id, dataset_id, and table_name. Within a query
|
|
# a table is often referenced with a string in the format of:
|
|
# `<project_id>:<dataset_id>.<table_id>` or
|
|
# `<project_id>.<dataset_id>.<table_id>`.
|
|
"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
|
|
# If omitted, project ID is inferred from the API call.
|
|
"tableId": "A String", # Name of the table.
|
|
"datasetId": "A String", # Dataset ID of the table.
|
|
},
|
|
},
|
|
],
|
|
},
|
|
"lDiversityConfig": { # l-diversity metric, used for analysis of reidentification risk.
|
|
"sensitiveAttribute": { # General identifier of a data field in a storage service. # Sensitive field for computing the l-value.
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
"quasiIds": [ # Set of quasi-identifiers indicating how equivalence classes are
|
|
# defined for the l-diversity computation. When multiple fields are
|
|
# specified, they are considered a single composite key.
|
|
{ # General identifier of a data field in a storage service.
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
],
|
|
},
|
|
"deltaPresenceEstimationConfig": { # δ-presence metric, used to estimate how likely it is for an attacker to
|
|
# figure out that one given individual appears in a de-identified dataset.
|
|
# Similarly to the k-map metric, we cannot compute δ-presence exactly without
|
|
# knowing the attack dataset, so we use a statistical model instead.
|
|
"regionCode": "A String", # ISO 3166-1 alpha-2 region code to use in the statistical modeling.
|
|
# Required if no column is tagged with a region-specific InfoType (like
|
|
# US_ZIP_5) or a region code.
|
|
"quasiIds": [ # Fields considered to be quasi-identifiers. No two fields can have the
|
|
# same tag. [required]
|
|
{ # A column with a semantic tag attached.
|
|
"field": { # General identifier of a data field in a storage service. # Identifies the column. [required]
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
"customTag": "A String", # A column can be tagged with a custom tag. In this case, the user must
|
|
# indicate an auxiliary table that contains statistical information on
|
|
# the possible values of this column (below).
|
|
"infoType": { # Type of information detected by the API. # A column can be tagged with an InfoType to use the relevant public
|
|
# dataset as a statistical model of population, if available. We
|
|
# currently support US ZIP codes, region codes, ages and genders.
|
|
# To programmatically obtain the list of supported InfoTypes, use
|
|
# ListInfoTypes with the supported_by=RISK_ANALYSIS filter.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
"inferred": { # If no semantic tag is indicated, we infer the statistical model from
|
|
# the distribution of values in the input data.
|
|
#
|
|
# A generic empty message that you can re-use to avoid defining duplicated
|
|
# empty messages in your APIs. A typical example is to use it as the request
|
|
# or the response type of an API method. For instance:
|
|
#
|
|
# service Foo {
|
|
# rpc Bar(google.protobuf.Empty) returns (google.protobuf.Empty);
|
|
# }
|
|
#
|
|
# The JSON representation for `Empty` is empty JSON object `{}`.
|
|
},
|
|
},
|
|
],
|
|
"auxiliaryTables": [ # Several auxiliary tables can be used in the analysis. Each custom_tag
|
|
# used to tag a quasi-identifiers field must appear in exactly one
|
|
# field of one auxiliary table.
|
|
{ # An auxiliary table containing statistical information on the relative
|
|
# frequency of different quasi-identifiers values. It has one or several
|
|
# quasi-identifiers columns, and one column that indicates the relative
|
|
# frequency of each quasi-identifier tuple.
|
|
# If a tuple is present in the data but not in the auxiliary table, the
|
|
# corresponding relative frequency is assumed to be zero (and thus, the
|
|
# tuple is highly reidentifiable).
|
|
"relativeFrequency": { # General identifier of a data field in a storage service. # The relative frequency column must contain a floating-point number
|
|
# between 0 and 1 (inclusive). Null values are assumed to be zero.
|
|
# [required]
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
"quasiIds": [ # Quasi-identifier columns. [required]
|
|
{ # A quasi-identifier column has a custom_tag, used to know which column
|
|
# in the data corresponds to which column in the statistical model.
|
|
"field": { # General identifier of a data field in a storage service.
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
"customTag": "A String",
|
|
},
|
|
],
|
|
"table": { # Message defining the location of a BigQuery table. A table is uniquely # Auxiliary table location. [required]
|
|
# identified by its project_id, dataset_id, and table_name. Within a query
|
|
# a table is often referenced with a string in the format of:
|
|
# `<project_id>:<dataset_id>.<table_id>` or
|
|
# `<project_id>.<dataset_id>.<table_id>`.
|
|
"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
|
|
# If omitted, project ID is inferred from the API call.
|
|
"tableId": "A String", # Name of the table.
|
|
"datasetId": "A String", # Dataset ID of the table.
|
|
},
|
|
},
|
|
],
|
|
},
|
|
"categoricalStatsConfig": { # Compute categorical stats over an individual column, including
|
|
# number of distinct values and value count distribution.
|
|
"field": { # General identifier of a data field in a storage service. # Field to compute categorical stats on. All column types are
|
|
# supported except for arrays and structs. However, it may be more
|
|
# informative to use NumericalStats when the field type is supported,
|
|
# depending on the data.
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
},
|
|
"kAnonymityConfig": { # k-anonymity metric, used for analysis of reidentification risk.
|
|
"entityId": { # Optional message indicating that multiple rows might be associated to a
|
|
# single individual. If the same entity_id is associated to multiple
|
|
# quasi-identifier tuples over distinct rows, we consider the entire
|
|
# collection of tuples as the composite quasi-identifier. This collection
|
|
# is a multiset: the order in which the different tuples appear in the
|
|
# dataset is ignored, but their frequency is taken into account.
|
|
#
|
|
# Important note: a maximum of 1000 rows can be associated to a single
|
|
# entity ID. If more rows are associated with the same entity ID, some
|
|
# might be ignored.
|
|
#
|
|
# An entity in a dataset is a field or set of fields that correspond to a
|
|
# single person. For example, in medical records the `EntityId` might be a
|
|
# patient identifier, or for financial records it might be an account
|
|
# identifier. This message is used when generalizations or analysis must take
|
|
# into account that multiple rows correspond to the same entity.
|
|
"field": { # General identifier of a data field in a storage service. # Composite key indicating which field contains the entity identifier.
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
},
|
|
"quasiIds": [ # Set of fields to compute k-anonymity over. When multiple fields are
|
|
# specified, they are considered a single composite key. Structs and
|
|
# repeated data types are not supported; however, nested fields are
|
|
# supported so long as they are not structs themselves or nested within
|
|
# a repeated field.
|
|
{ # General identifier of a data field in a storage service.
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
],
|
|
},
|
|
},
|
|
"categoricalStatsResult": { # Result of the categorical stats computation.
|
|
"valueFrequencyHistogramBuckets": [ # Histogram of value frequencies in the column.
|
|
{
|
|
"bucketValues": [ # Sample of value frequencies in this bucket. The total number of
|
|
# values returned per bucket is capped at 20.
|
|
{ # A value of a field, including its frequency.
|
|
"count": "A String", # How many times the value is contained in the field.
|
|
"value": { # Set of primitive values supported by the system. # A value contained in the field in question.
|
|
# Note that for the purposes of inspection or transformation, the number
|
|
# of bytes considered to comprise a 'Value' is based on its representation
|
|
# as a UTF-8 encoded string. For example, if 'integer_value' is set to
|
|
# 123456789, the number of bytes would be counted as 9, even though an
|
|
# int64 only holds up to 8 bytes of data.
|
|
"floatValue": 3.14,
|
|
"timestampValue": "A String",
|
|
"dayOfWeekValue": "A String",
|
|
"timeValue": { # Represents a time of day. The date and time zone are either not significant
|
|
# or are specified elsewhere. An API may choose to allow leap seconds. Related
|
|
# types are google.type.Date and `google.protobuf.Timestamp`.
|
|
"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose
|
|
# to allow the value "24:00:00" for scenarios like business closing time.
|
|
"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.
|
|
"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may
|
|
# allow the value 60 if it allows leap-seconds.
|
|
"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.
|
|
},
|
|
"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day
|
|
# and time zone are either specified elsewhere or are not significant. The date
|
|
# is relative to the Proleptic Gregorian Calendar. This can represent:
|
|
#
|
|
# * A full date, with non-zero year, month and day values
|
|
# * A month and day value, with a zero year, e.g. an anniversary
|
|
# * A year on its own, with zero month and day values
|
|
# * A year and month value, with a zero day, e.g. a credit card expiration date
|
|
#
|
|
# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.
|
|
"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without
|
|
# a year.
|
|
"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0
|
|
# if specifying a year by itself or a year and month where the day is not
|
|
# significant.
|
|
"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a
|
|
# month and day.
|
|
},
|
|
"stringValue": "A String",
|
|
"booleanValue": True or False,
|
|
"integerValue": "A String",
|
|
},
|
|
},
|
|
],
|
|
"bucketValueCount": "A String", # Total number of distinct values in this bucket.
|
|
"valueFrequencyUpperBound": "A String", # Upper bound on the value frequency of the values in this bucket.
|
|
"valueFrequencyLowerBound": "A String", # Lower bound on the value frequency of the values in this bucket.
|
|
"bucketSize": "A String", # Total number of values in this bucket.
|
|
},
|
|
],
|
|
},
|
|
"deltaPresenceEstimationResult": { # Result of the δ-presence computation. Note that these results are an
|
|
# estimation, not exact values.
|
|
"deltaPresenceEstimationHistogram": [ # The intervals [min_probability, max_probability) do not overlap. If a
|
|
# value doesn't correspond to any such interval, the associated frequency
|
|
# is zero. For example, the following records:
|
|
# {min_probability: 0, max_probability: 0.1, frequency: 17}
|
|
# {min_probability: 0.2, max_probability: 0.3, frequency: 42}
|
|
# {min_probability: 0.3, max_probability: 0.4, frequency: 99}
|
|
# mean that there are no records with an estimated probability in [0.1, 0.2)
|
|
# or greater than or equal to 0.4.
|
|
{ # A DeltaPresenceEstimationHistogramBucket message with the following
|
|
# values:
|
|
# min_probability: 0.1
|
|
# max_probability: 0.2
|
|
# frequency: 42
|
|
# means that there are 42 records for which δ is in [0.1, 0.2). An
|
|
# important particular case is when min_probability = max_probability = 1:
|
|
# then, every individual who shares this quasi-identifier combination is in
|
|
# the dataset.
|
|
"bucketValues": [ # Sample of quasi-identifier tuple values in this bucket. The total
|
|
# number of classes returned per bucket is capped at 20.
|
|
{ # A tuple of values for the quasi-identifier columns.
|
|
"quasiIdsValues": [ # The quasi-identifier values.
|
|
{ # Set of primitive values supported by the system.
|
|
# Note that for the purposes of inspection or transformation, the number
|
|
# of bytes considered to comprise a 'Value' is based on its representation
|
|
# as a UTF-8 encoded string. For example, if 'integer_value' is set to
|
|
# 123456789, the number of bytes would be counted as 9, even though an
|
|
# int64 only holds up to 8 bytes of data.
|
|
"floatValue": 3.14,
|
|
"timestampValue": "A String",
|
|
"dayOfWeekValue": "A String",
|
|
"timeValue": { # Represents a time of day. The date and time zone are either not significant
|
|
# or are specified elsewhere. An API may choose to allow leap seconds. Related
|
|
# types are google.type.Date and `google.protobuf.Timestamp`.
|
|
"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose
|
|
# to allow the value "24:00:00" for scenarios like business closing time.
|
|
"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.
|
|
"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may
|
|
# allow the value 60 if it allows leap-seconds.
|
|
"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.
|
|
},
|
|
"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day
|
|
# and time zone are either specified elsewhere or are not significant. The date
|
|
# is relative to the Proleptic Gregorian Calendar. This can represent:
|
|
#
|
|
# * A full date, with non-zero year, month and day values
|
|
# * A month and day value, with a zero year, e.g. an anniversary
|
|
# * A year on its own, with zero month and day values
|
|
# * A year and month value, with a zero day, e.g. a credit card expiration date
|
|
#
|
|
# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.
|
|
"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without
|
|
# a year.
|
|
"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0
|
|
# if specifying a year by itself or a year and month where the day is not
|
|
# significant.
|
|
"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a
|
|
# month and day.
|
|
},
|
|
"stringValue": "A String",
|
|
"booleanValue": True or False,
|
|
"integerValue": "A String",
|
|
},
|
|
],
|
|
"estimatedProbability": 3.14, # The estimated probability that a given individual sharing these
|
|
# quasi-identifier values is in the dataset. This value, typically called
|
|
# δ, is the ratio between the number of records in the dataset with these
|
|
# quasi-identifier values, and the total number of individuals (inside
|
|
# *and* outside the dataset) with these quasi-identifier values.
|
|
# For example, if there are 15 individuals in the dataset who share the
|
|
# same quasi-identifier values, and an estimated 100 people in the entire
|
|
# population with these values, then δ is 0.15.
|
|
},
|
|
],
|
|
"bucketValueCount": "A String", # Total number of distinct quasi-identifier tuple values in this bucket.
|
|
"bucketSize": "A String", # Number of records within these probability bounds.
|
|
"maxProbability": 3.14, # Always greater than or equal to min_probability.
|
|
"minProbability": 3.14, # Between 0 and 1.
|
|
},
|
|
],
|
|
},
|
|
"requestedSourceTable": { # Message defining the location of a BigQuery table. A table is uniquely # Input dataset to compute metrics over.
|
|
# identified by its project_id, dataset_id, and table_name. Within a query
|
|
# a table is often referenced with a string in the format of:
|
|
# `<project_id>:<dataset_id>.<table_id>` or
|
|
# `<project_id>.<dataset_id>.<table_id>`.
|
|
"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
|
|
# If omitted, project ID is inferred from the API call.
|
|
"tableId": "A String", # Name of the table.
|
|
"datasetId": "A String", # Dataset ID of the table.
|
|
},
|
|
},
|
|
"state": "A String", # State of a job.
|
|
"jobTriggerName": "A String", # If created by a job trigger, the resource name of the trigger that
|
|
# instantiated the job.
|
|
"startTime": "A String", # Time when the job started.
|
|
"endTime": "A String", # Time when the job finished.
|
|
"type": "A String", # The type of job.
|
|
"createTime": "A String", # Time when the job was created.
|
|
}</pre>
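<p>As a hedged illustration of the δ-presence fields in the response above, the sketch below recomputes the worked δ value from the <code>estimatedProbability</code> comment and looks up a probability in the sample histogram. The helper names <code>estimated_delta</code> and <code>records_at_probability</code> are hypothetical, not part of the client library, and the code assumes that <code>bucketSize</code> is returned as a string-encoded integer, as the schema indicates.</p>

```python
def estimated_delta(records_in_dataset, estimated_population):
    """Ratio of matching records in the dataset to matching individuals overall."""
    if estimated_population <= 0:
        raise ValueError("estimated_population must be positive")
    return records_in_dataset / estimated_population


def records_at_probability(buckets, p):
    """Return the bucketSize of the histogram bucket containing p, or 0."""
    for bucket in buckets:
        lo = float(bucket.get("minProbability", 0.0))
        hi = float(bucket.get("maxProbability", 0.0))
        # Buckets are half-open intervals [lo, hi). Assumption: a bucket with
        # lo == hi is treated as the single point lo (the "probability 1" case).
        if lo <= p < hi or (lo == hi == p):
            return int(bucket.get("bucketSize", 0))
    # No bucket covers p, so the associated frequency is zero.
    return 0


# Sample data mirroring the histogram comment (frequency is carried in bucketSize).
example_buckets = [
    {"minProbability": 0, "maxProbability": 0.1, "bucketSize": "17"},
    {"minProbability": 0.2, "maxProbability": 0.3, "bucketSize": "42"},
    {"minProbability": 0.3, "maxProbability": 0.4, "bucketSize": "99"},
]

delta = estimated_delta(15, 100)
print(delta)
print(records_at_probability(example_buckets, 0.25))
print(records_at_probability(example_buckets, 0.15))
```

<p>Treating <code>minProbability == maxProbability</code> as a single point is an assumption made here so that the special case of probability 1 is not lost to the half-open interval.</p>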
|
|
</div>
|
|
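<p>The <code>filter</code> argument of the <code>list</code> method documented below is a plain string of field, operator, value restrictions joined by <code>AND</code> or <code>OR</code>, limited to 500 characters. The following is a minimal sketch of assembling such a string; <code>build_filter</code> and <code>MAX_FILTER_LENGTH</code> are hypothetical names introduced for illustration, and the commented-out call assumes a <code>service</code> object built with the Python client.</p>

```python
MAX_FILTER_LENGTH = 500  # limit stated in the `filter` argument description


def build_filter(restrictions, joiner="AND"):
    """Join (field, operator, value) triples into one filter expression."""
    if joiner not in ("AND", "OR"):
        raise ValueError("joiner must be 'AND' or 'OR'")
    parts = [f"{field} {op} {value}" for field, op, value in restrictions]
    expression = f" {joiner} ".join(parts)
    if len(expression) > MAX_FILTER_LENGTH:
        raise ValueError("filter exceeds 500 characters")
    return expression


# Reproduces the first documented example filter.
job_filter = build_filter([
    ("inspected_storage", "=", "cloud_storage"),
    ("state", "=", "done"),
])
print(job_filter)

# Hypothetical call, assuming `service` was built with googleapiclient:
# service.projects().dlpJobs().list(
#     parent="projects/my-project-id", filter=job_filter).execute()
```

<p>The helper does not validate field names or operators; it only joins the restrictions and enforces the length limit, so any parenthesized grouping would still need to be written by hand.</p>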
|
|
<div class="method">
|
|
<code class="details" id="list">list(parent, orderBy=None, type=None, pageSize=None, pageToken=None, x__xgafv=None, filter=None)</code>
|
|
<pre>Lists DlpJobs that match the specified filter in the request.
|
|
See https://cloud.google.com/dlp/docs/inspecting-storage and
|
|
https://cloud.google.com/dlp/docs/compute-risk-analysis to learn more.
|
|
|
|
Args:
|
|
parent: string, The parent resource name, for example projects/my-project-id. (required)
|
|
orderBy: string, Optional comma-separated list of fields to order by,
|
|
each followed by an `asc` or `desc` postfix. The list is case-insensitive,
|
|
the default sorting order is ascending, and redundant space characters are
|
|
insignificant.
|
|
|
|
Example: `name asc, end_time asc, create_time desc`
|
|
|
|
Supported fields are:
|
|
|
|
- `create_time`: corresponds to the time the job was created.
|
|
- `end_time`: corresponds to the time the job ended.
|
|
- `name`: corresponds to the job's name.
|
|
- `state`: corresponds to `state`
|
|
type: string, The type of job. Defaults to `DlpJobType.INSPECT`
|
|
pageSize: integer, The standard list page size.
|
|
pageToken: string, The standard list page token.
|
|
x__xgafv: string, V1 error format.
|
|
Allowed values
|
|
1 - v1 error format
|
|
2 - v2 error format
|
|
filter: string, Optional. Allows filtering.
|
|
|
|
Supported syntax:
|
|
|
|
* Filter expressions are made up of one or more restrictions.
|
|
* Restrictions can be combined by `AND` or `OR` logical operators. A
|
|
sequence of restrictions implicitly uses `AND`.
|
|
* A restriction has the form of `<field> <operator> <value>`.
|
|
* Supported fields/values for inspect jobs:
|
|
- `state` - PENDING|RUNNING|CANCELED|FINISHED|FAILED
|
|
- `inspected_storage` - DATASTORE|CLOUD_STORAGE|BIGQUERY
|
|
- `trigger_name` - The resource name of the trigger that created the job.
|
|
- `end_time` - Corresponds to time the job finished.
|
|
- `start_time` - Corresponds to time the job started.
|
|
* Supported fields for risk analysis jobs:
|
|
- `state` - RUNNING|CANCELED|FINISHED|FAILED
|
|
- `end_time` - Corresponds to time the job finished.
|
|
- `start_time` - Corresponds to time the job started.
|
|
* The operator must be `=` or `!=`.
|
|
|
|
Examples:
|
|
|
|
* inspected_storage = cloud_storage AND state = done
|
|
* inspected_storage = cloud_storage OR inspected_storage = bigquery
|
|
* inspected_storage = cloud_storage AND (state = done OR state = canceled)
|
|
* end_time > "2017-12-12T00:00:00+00:00"
|
|
|
|
The length of this field should be no more than 500 characters.
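
The `filter` and `orderBy` syntax described above can be sketched as two small string builders. Both helper names are hypothetical. The operator check accepts `=` and `!=` as documented, plus `>` because it appears in the `end_time` example; treating `>` as valid is an assumption. Timestamp values should be passed already quoted.

```python
MAX_FILTER_LENGTH = 500  # limit stated in the `filter` description


def build_filter(restrictions, combine_with="AND"):
    """Join (field, operator, value) restrictions into one filter string."""
    if combine_with not in ("AND", "OR"):
        raise ValueError("combine_with must be 'AND' or 'OR'")
    parts = []
    for field, operator, value in restrictions:
        # `=` and `!=` are documented; `>` is accepted as an assumption
        # because the end_time example uses it.
        if operator not in ("=", "!=", ">"):
            raise ValueError(f"unsupported operator: {operator!r}")
        parts.append(f"{field} {operator} {value}")
    expression = f" {combine_with} ".join(parts)
    if len(expression) > MAX_FILTER_LENGTH:
        raise ValueError("filter is longer than 500 characters")
    return expression


def build_order_by(*fields):
    """Join (field, direction) pairs into an orderBy string."""
    parts = []
    for field, direction in fields:
        if direction not in ("asc", "desc"):
            raise ValueError(f"direction must be 'asc' or 'desc': {direction!r}")
        parts.append(f"{field} {direction}")
    return ", ".join(parts)


job_filter = build_filter(
    [("inspected_storage", "=", "cloud_storage"), ("state", "=", "done")]
)
order_by = build_order_by(("name", "asc"), ("end_time", "asc"), ("create_time", "desc"))
```

The resulting strings would be passed as the `filter` and `orderBy` arguments of `list`. Mixed `AND`/`OR` expressions with parentheses are left to the caller.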
|
|
|
|
Returns:
|
|
An object of the form:
|
|
|
|
{ # The response message for listing DLP jobs.
|
|
"nextPageToken": "A String", # The standard List next-page token.
|
|
"jobs": [ # A list of DlpJobs that matches the specified filter in the request.
|
|
{ # Combines all of the information about a DLP job.
|
|
"errors": [ # A stream of errors encountered running the job.
|
|
{ # Detailed information about an error encountered during job execution or
|
|
# the results of an unsuccessful activation of the JobTrigger.
|
|
# Output only field.
|
|
"timestamps": [ # The times the error occurred.
|
|
"A String",
|
|
],
|
|
"details": { # The `Status` type defines a logical error model that is suitable for
|
|
# different programming environments, including REST APIs and RPC APIs. It is
|
|
# used by [gRPC](https://github.com/grpc). Each `Status` message contains
|
|
# three pieces of data: error code, error message, and error details.
|
|
#
|
|
# You can find out more about this error model and how to work with it in the
|
|
# [API Design Guide](https://cloud.google.com/apis/design/errors).
|
|
"message": "A String", # A developer-facing error message, which should be in English. Any
|
|
# user-facing error message should be localized and sent in the
|
|
# google.rpc.Status.details field, or localized by the client.
|
|
"code": 42, # The status code, which should be an enum value of google.rpc.Code.
|
|
"details": [ # A list of messages that carry the error details. There is a common set of
|
|
# message types for APIs to use.
|
|
{
|
|
"a_key": "", # Properties of the object. Contains field @type with type URL.
|
|
},
|
|
],
|
|
},
|
|
},
|
|
],
|
|
"name": "A String", # The server-assigned name.
|
|
"inspectDetails": { # The results of an inspect DataSource job. # Results from inspecting a data source.
|
|
"requestedOptions": { # The configuration used for this job.
|
|
"snapshotInspectTemplate": { # If run with an InspectTemplate, a snapshot of its state at the time of
|
|
# this run.
|
|
# The inspectTemplate contains a configuration (set of types of sensitive data
|
|
# to be detected) to be used anywhere you otherwise would normally specify
|
|
# InspectConfig. See https://cloud.google.com/dlp/docs/concepts-templates
|
|
# to learn more.
|
|
"updateTime": "A String", # The last update timestamp of an inspectTemplate, output only field.
|
|
"displayName": "A String", # Display name (max 256 chars).
|
|
"description": "A String", # Short description (max 256 chars).
|
|
"inspectConfig": { # Configuration description of the scanning process. # The core content of the template. Configuration of the scanning process.
|
|
# When used with redactContent only info_types and min_likelihood are currently
|
|
# used.
|
|
"excludeInfoTypes": True or False, # When true, excludes type information of the findings.
|
|
"limits": {
|
|
"maxFindingsPerRequest": 42, # Max number of findings that will be returned per request/job.
|
|
# When set within `InspectContentRequest`, the maximum returned is 2000
|
|
# regardless of whether this is set higher.
|
|
"maxFindingsPerInfoType": [ # Configuration of findings limit given for specified infoTypes.
|
|
{ # Max findings configuration per infoType, per content item or long
|
|
# running DlpJob.
|
|
"infoType": { # Type of information detected by the API. # Type of information the findings limit applies to. Only one limit per
|
|
# info_type should be provided. If InfoTypeLimit does not have an
|
|
# info_type, the DLP API applies the limit against all info_types that
|
|
# are found but not specified in another InfoTypeLimit.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
"maxFindings": 42, # Max findings limit for the given infoType.
|
|
},
|
|
],
|
|
"maxFindingsPerItem": 42, # Max number of findings that will be returned for each item scanned.
|
|
# When set within `InspectDataSourceRequest`,
|
|
# the maximum returned is 2000 regardless of whether this is set higher.
|
|
# When set within `InspectContentRequest`, this field is ignored.
|
|
},
|
|
"minLikelihood": "A String", # Only returns findings equal to or above this threshold. The default is
|
|
# POSSIBLE.
|
|
# See https://cloud.google.com/dlp/docs/likelihood to learn more.
|
|
"customInfoTypes": [ # CustomInfoTypes provided by the user. See
|
|
# https://cloud.google.com/dlp/docs/creating-custom-infotypes to learn more.
|
|
{ # Custom information type provided by the user. Used to find domain-specific
|
|
# sensitive information configurable to the data in question.
|
|
"regex": { # Message defining a custom regular expression. # Regular expression based CustomInfoType.
|
|
"pattern": "A String", # Pattern defining the regular expression. Its syntax
|
|
# (https://github.com/google/re2/wiki/Syntax) can be found under the
|
|
# google/re2 repository on GitHub.
|
|
"groupIndexes": [ # The index of the submatch to extract as findings. When not
|
|
# specified, the entire match is returned. No more than 3 may be included.
|
|
42,
|
|
],
|
|
},
|
|
"surrogateType": { # Message for detecting output from deidentification transformations that
|
|
# support reversing.
|
|
# Message for detecting output from deidentification transformations
|
|
# such as
|
|
# [`CryptoReplaceFfxFpeConfig`](/dlp/docs/reference/rest/v2/organizations.deidentifyTemplates#cryptoreplaceffxfpeconfig).
|
|
# These types of transformations are
|
|
# those that perform pseudonymization, thereby producing a "surrogate" as
|
|
# output. This should be used in conjunction with a field on the
|
|
# transformation such as `surrogate_info_type`. This CustomInfoType does
|
|
# not support the use of `detection_rules`.
|
|
},
|
|
"infoType": { # Type of information detected by the API. # CustomInfoType can either be a new infoType, or an extension of built-in
|
|
# infoType, when the name matches one of existing infoTypes and that infoType
|
|
# is specified in `InspectContent.info_types` field. Specifying the latter
|
|
# adds findings to the ones detected by the system. If a built-in info type is
|
|
# not specified in `InspectContent.info_types` list then the name is treated
|
|
# as a custom info type.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
"dictionary": { # Custom information type based on a dictionary of words or phrases. This can # A list of phrases to detect as a CustomInfoType.
|
|
# be used to match sensitive information specific to the data, such as a list
|
|
# of employee IDs or job titles.
|
|
#
|
|
# Dictionary words are case-insensitive and all characters other than letters
|
|
# and digits in the Unicode [Basic Multilingual
|
|
# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)
|
|
# will be replaced with whitespace when scanning for matches, so the
|
|
# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",
|
|
# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters
|
|
# surrounding any match must be of a different type than the adjacent
|
|
# characters within the word, so letters must be next to non-letters and
|
|
# digits next to non-digits. For example, the dictionary word "jen" will
|
|
# match the first three letters of the text "jen123" but will return no
|
|
# matches for "jennifer".
|
|
#
|
|
# Dictionary words containing a large number of characters that are not
|
|
# letters or digits may result in unexpected findings because such characters
|
|
# are treated as whitespace. The
|
|
# [limits](https://cloud.google.com/dlp/limits) page contains details about
|
|
# the size limits of dictionaries. For dictionaries that do not fit within
|
|
# these constraints, consider using `LargeCustomDictionaryConfig` in the
|
|
# `StoredInfoType` API.
|
|
"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.
|
|
"words": [ # Words or phrases defining the dictionary. The dictionary must contain
|
|
# at least one phrase and every phrase must contain at least 2 characters
|
|
# that are letters or digits. [required]
|
|
"A String",
|
|
],
|
|
},
|
|
"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file
|
|
# is accepted.
|
|
"path": "A String", # A url representing a file or path (no wildcards) in Cloud Storage.
|
|
# Example: gs://[BUCKET_NAME]/dictionary.txt
|
|
},
|
|
},
|
|
"storedType": { # A reference to a StoredInfoType to use with scanning. # Load an existing `StoredInfoType` resource for use in
|
|
# `InspectDataSource`. Not currently supported in `InspectContent`.
|
|
"name": "A String", # Resource name of the requested `StoredInfoType`, for example
|
|
# `organizations/433245324/storedInfoTypes/432452342` or
|
|
# `projects/project-id/storedInfoTypes/432452342`.
|
|
"createTime": "A String", # Timestamp indicating when the version of the `StoredInfoType` used for
|
|
# inspection was created. Output-only field, populated by the system.
|
|
},
|
|
"detectionRules": [ # Set of detection rules to apply to all findings of this CustomInfoType.
|
|
# Rules are applied in the order in which they are specified. Not supported for the
|
|
# `surrogate_type` CustomInfoType.
|
|
{ # Deprecated; use `InspectionRuleSet` instead. Rule for modifying a
|
|
# `CustomInfoType` to alter behavior under certain circumstances, depending
|
|
# on the specific details of the rule. Not supported for the `surrogate_type`
|
|
# custom infoType.
|
|
"hotwordRule": { # The rule that adjusts the likelihood of findings within a certain # Hotword-based detection rule.
|
|
# proximity of hotwords.
|
|
"proximity": { # Message for specifying a window around a finding to apply a detection rule. # Proximity of the finding within which the entire hotword must reside.
|
|
# The total length of the window cannot exceed 1000 characters. Note that
|
|
# the finding itself will be included in the window, so that hotwords may
|
|
# be used to match substrings of the finding itself. For example, the
|
|
# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be
|
|
# adjusted upwards if the area code is known to be the local area code of
|
|
# a company office using the hotword regex "\(xxx\)", where "xxx"
|
|
# is the area code in question.
|
|
"windowAfter": 42, # Number of characters after the finding to consider.
|
|
"windowBefore": 42, # Number of characters before the finding to consider.
|
|
},
|
|
"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.
|
|
"pattern": "A String", # Pattern defining the regular expression. Its syntax
|
|
# (https://github.com/google/re2/wiki/Syntax) can be found under the
|
|
# google/re2 repository on GitHub.
|
|
"groupIndexes": [ # The index of the submatch to extract as findings. When not
|
|
# specified, the entire match is returned. No more than 3 may be included.
|
|
42,
|
|
],
|
|
},
|
|
"likelihoodAdjustment": { # Message for specifying an adjustment to the likelihood of a finding as # Likelihood adjustment to apply to all matching findings.
|
|
# part of a detection rule.
|
|
"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of
|
|
# levels. For example, if a finding would be `POSSIBLE` without the
|
|
# detection rule and `relative_likelihood` is 1, then it is upgraded to
|
|
# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.
|
|
# Likelihood may never drop below `VERY_UNLIKELY` or exceed
|
|
# `VERY_LIKELY`, so applying an adjustment of 1 followed by an
|
|
# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in
|
|
# a final likelihood of `LIKELY`.
|
|
"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.
|
|
},
|
|
},
|
|
},
|
|
],
|
|
"exclusionType": "A String", # If set to EXCLUSION_TYPE_EXCLUDE this infoType will not cause a finding
|
|
# to be returned. It still can be used for rules matching.
|
|
"likelihood": "A String", # Likelihood to return for this CustomInfoType. This base value can be
|
|
# altered by a detection rule if the finding meets the criteria specified by
|
|
# the rule. Defaults to `VERY_LIKELY` if not specified.
|
|
},
|
|
],
|
|
"includeQuote": True or False, # When true, a contextual quote from the data that triggered a finding is
|
|
# included in the response; see Finding.quote.
|
|
"ruleSet": [ # Set of rules to apply to the findings for this InspectConfig.
|
|
# Exclusion rules contained in the set are executed at the end; other
|
|
# rules are executed in the order they are specified for each info type.
|
|
{ # Rule set for modifying a set of infoTypes to alter behavior under certain
|
|
# circumstances, depending on the specific details of the rules within the set.
|
|
"rules": [ # Set of rules to be applied to infoTypes. The rules are applied in order.
|
|
{ # A single inspection rule to be applied to infoTypes, specified in
|
|
# `InspectionRuleSet`.
|
|
"hotwordRule": { # The rule that adjusts the likelihood of findings within a certain # Hotword-based detection rule.
|
|
# proximity of hotwords.
|
|
"proximity": { # Message for specifying a window around a finding to apply a detection rule. # Proximity of the finding within which the entire hotword must reside.
|
|
# The total length of the window cannot exceed 1000 characters. Note that
|
|
# the finding itself will be included in the window, so that hotwords may
|
|
# be used to match substrings of the finding itself. For example, the
|
|
# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be
|
|
# adjusted upwards if the area code is known to be the local area code of
|
|
# a company office using the hotword regex "\(xxx\)", where "xxx"
|
|
# is the area code in question.
|
|
"windowAfter": 42, # Number of characters after the finding to consider.
|
|
"windowBefore": 42, # Number of characters before the finding to consider.
|
|
},
|
|
"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.
|
|
"pattern": "A String", # Pattern defining the regular expression. Its syntax
|
|
# (https://github.com/google/re2/wiki/Syntax) can be found under the
|
|
# google/re2 repository on GitHub.
|
|
"groupIndexes": [ # The index of the submatch to extract as findings. When not
|
|
# specified, the entire match is returned. No more than 3 may be included.
|
|
42,
|
|
],
|
|
},
|
|
"likelihoodAdjustment": { # Message for specifying an adjustment to the likelihood of a finding as # Likelihood adjustment to apply to all matching findings.
|
|
# part of a detection rule.
|
|
"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of
|
|
# levels. For example, if a finding would be `POSSIBLE` without the
|
|
# detection rule and `relative_likelihood` is 1, then it is upgraded to
|
|
# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.
|
|
# Likelihood may never drop below `VERY_UNLIKELY` or exceed
|
|
# `VERY_LIKELY`, so applying an adjustment of 1 followed by an
|
|
# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in
|
|
# a final likelihood of `LIKELY`.
|
|
"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.
|
|
},
|
|
},
|
|
"exclusionRule": { # The rule that specifies conditions when findings of infoTypes specified in # Exclusion rule.
|
|
# `InspectionRuleSet` are removed from results.
|
|
"regex": { # Message defining a custom regular expression. # Regular expression which defines the rule.
|
|
"pattern": "A String", # Pattern defining the regular expression. Its syntax
|
|
# (https://github.com/google/re2/wiki/Syntax) can be found under the
|
|
# google/re2 repository on GitHub.
|
|
"groupIndexes": [ # The index of the submatch to extract as findings. When not
|
|
# specified, the entire match is returned. No more than 3 may be included.
|
|
42,
|
|
],
|
|
},
|
|
"excludeInfoTypes": { # List of exclude infoTypes. # Set of infoTypes for which findings would affect this rule.
|
|
"infoTypes": [ # The infoType list in an ExclusionRule drops a finding when it overlaps or
|
|
# is contained within a finding of an infoType from this list. For
|
|
# example, for `InspectionRuleSet.info_types` containing "PHONE_NUMBER" and
|
|
# `exclusion_rule` containing `exclude_info_types.info_types` with
|
|
# "EMAIL_ADDRESS" the phone number findings are dropped if they overlap
|
|
# with an EMAIL_ADDRESS finding.
|
|
# As a result, "555-222-2222@example.org" generates only a single
|
|
# finding, namely email address.
|
|
{ # Type of information detected by the API.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
],
|
|
},
|
|
"dictionary": { # Custom information type based on a dictionary of words or phrases. This can # Dictionary which defines the rule.
|
|
# be used to match sensitive information specific to the data, such as a list
|
|
# of employee IDs or job titles.
|
|
#
|
|
# Dictionary words are case-insensitive and all characters other than letters
|
|
# and digits in the Unicode [Basic Multilingual
|
|
# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)
|
|
# will be replaced with whitespace when scanning for matches, so the
|
|
# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",
|
|
# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters
|
|
# surrounding any match must be of a different type than the adjacent
|
|
# characters within the word, so letters must be next to non-letters and
|
|
# digits next to non-digits. For example, the dictionary word "jen" will
|
|
# match the first three letters of the text "jen123" but will return no
|
|
# matches for "jennifer".
|
|
#
|
|
# Dictionary words containing a large number of characters that are not
|
|
# letters or digits may result in unexpected findings because such characters
|
|
# are treated as whitespace. The
|
|
# [limits](https://cloud.google.com/dlp/limits) page contains details about
|
|
# the size limits of dictionaries. For dictionaries that do not fit within
|
|
# these constraints, consider using `LargeCustomDictionaryConfig` in the
|
|
# `StoredInfoType` API.
|
|
"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.
|
|
"words": [ # Words or phrases defining the dictionary. The dictionary must contain
|
|
# at least one phrase and every phrase must contain at least 2 characters
|
|
# that are letters or digits. [required]
|
|
"A String",
|
|
],
|
|
},
|
|
"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file
|
|
# is accepted.
|
|
"path": "A String", # A url representing a file or path (no wildcards) in Cloud Storage.
|
|
# Example: gs://[BUCKET_NAME]/dictionary.txt
|
|
},
|
|
},
|
|
"matchingType": "A String", # How the rule is applied, see MatchingType documentation for details.
|
|
},
|
|
},
|
|
],
|
|
"infoTypes": [ # List of infoTypes this rule set is applied to.
|
|
{ # Type of information detected by the API.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
],
|
|
},
|
|
],
|
|
"contentOptions": [ # List of options defining data content to scan.
|
|
# If empty, text, images, and other content will be included.
|
|
"A String",
|
|
],
|
|
"infoTypes": [ # Restricts what info_types to look for. The values must correspond to
|
|
# InfoType values returned by ListInfoTypes or listed at
|
|
# https://cloud.google.com/dlp/docs/infotypes-reference.
|
|
#
|
|
# When no InfoTypes or CustomInfoTypes are specified in a request, the
|
|
# system may automatically choose what detectors to run. By default this may
|
|
# be all types, but may change over time as detectors are updated.
|
|
#
|
|
# The special InfoType name "ALL_BASIC" can be used to trigger all detectors,
|
|
# but may change over time as new InfoTypes are added. If you need precise
|
|
# control and predictability as to what detectors are run you should specify
|
|
# specific InfoTypes listed in the reference.
|
|
{ # Type of information detected by the API.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
],
|
|
},
|
|
"createTime": "A String", # The creation timestamp of an inspectTemplate, output only field.
|
|
"name": "A String", # The template name. Output only.
|
|
#
|
|
# The template will have one of the following formats:
|
|
# `projects/PROJECT_ID/inspectTemplates/TEMPLATE_ID` OR
|
|
# `organizations/ORGANIZATION_ID/inspectTemplates/TEMPLATE_ID`
|
|
},
|
|
"jobConfig": {
|
|
"storageConfig": { # Shared message indicating Cloud storage type. # The data to scan.
|
|
"datastoreOptions": { # Options defining a data set within Google Cloud Datastore. # Google Cloud Datastore options specification.
|
|
"partitionId": { # Datastore partition ID. # A partition ID identifies a grouping of entities. The grouping is always
|
|
# by project and namespace, however the namespace ID may be empty.
|
|
#
|
|
# A partition ID contains several dimensions:
|
|
# project ID and namespace ID.
|
|
"projectId": "A String", # The ID of the project to which the entities belong.
|
|
"namespaceId": "A String", # If not empty, the ID of the namespace to which the entities belong.
|
|
},
|
|
"kind": { # A representation of a Datastore kind. # The kind to process.
|
|
"name": "A String", # The name of the kind.
|
|
},
|
|
},
|
|
"bigQueryOptions": { # Options defining BigQuery table and row identifiers. # BigQuery options specification.
|
|
"excludedFields": [ # References to fields excluded from scanning. This allows you to skip
|
|
# inspection of entire columns which you know have no findings.
|
|
{ # General identifier of a data field in a storage service.
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
],
|
|
"rowsLimit": "A String", # Max number of rows to scan. If the table has more rows than this value, the
|
|
# rest of the rows are omitted. If not set, or if set to 0, all rows will be
|
|
# scanned. Only one of rows_limit and rows_limit_percent can be specified.
|
|
# Cannot be used in conjunction with TimespanConfig.
|
|
"sampleMethod": "A String",
|
|
"identifyingFields": [ # References to fields uniquely identifying rows within the table.
|
|
# Nested fields in a format like `person.birthdate.year` are allowed.
|
|
{ # General identifier of a data field in a storage service.
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
],
|
|
"rowsLimitPercent": 42, # Max percentage of rows to scan. The rest are omitted. The number of rows
|
|
# scanned is rounded down. Must be between 0 and 100, inclusive. Both 0 and
|
|
# 100 mean no limit. Defaults to 0. Only one of rows_limit and
|
|
# rows_limit_percent can be specified. Cannot be used in conjunction with
|
|
# TimespanConfig.
|
|
"tableReference": { # Message defining the location of a BigQuery table. A table is uniquely # Complete BigQuery table reference.
|
|
# identified by its project_id, dataset_id, and table_name. Within a query
|
|
# a table is often referenced with a string in the format of:
|
|
# `<project_id>:<dataset_id>.<table_id>` or
|
|
# `<project_id>.<dataset_id>.<table_id>`.
|
|
"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
|
|
# If omitted, project ID is inferred from the API call.
|
|
"tableId": "A String", # Name of the table.
|
|
"datasetId": "A String", # Dataset ID of the table.
|
|
},
|
|
},
|
|
"timespanConfig": { # Configuration of the timespan of the items to include in scanning.
|
|
# Currently only supported when inspecting Google Cloud Storage and BigQuery.
|
|
"timestampField": { # General identifier of a data field in a storage service. # Specification of the field containing the timestamp of scanned items.
|
|
# Used for data sources like Datastore or BigQuery.
|
|
# If not specified for BigQuery, table last modification timestamp
|
|
# is checked against given time span.
|
|
# The valid data types of the timestamp field are:
|
|
# for BigQuery - timestamp, date, datetime;
|
|
# for Datastore - timestamp.
|
|
# Datastore entity will be scanned if the timestamp property does not exist
|
|
# or its value is empty or invalid.
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
"endTime": "A String", # Exclude files or rows newer than this value.
|
|
# If set to zero, no upper time limit is applied.
|
|
"startTime": "A String", # Exclude files or rows older than this value.
|
|
"enableAutoPopulationOfTimespanConfig": True or False, # When the job is started by a JobTrigger we will automatically figure out
|
|
# a valid start_time to avoid scanning files that have not been modified
|
|
# since the last time the JobTrigger executed. This will be based on the
|
|
# time of the execution of the last run of the JobTrigger.
|
|
},
|
|
"cloudStorageOptions": { # Options defining a file or a set of files within a Google Cloud Storage # Google Cloud Storage options specification.
|
|
# bucket.
|
|
"bytesLimitPerFile": "A String", # Max number of bytes to scan from a file. If a scanned file's size is bigger
|
|
# than this value then the rest of the bytes are omitted. Only one
|
|
# of bytes_limit_per_file and bytes_limit_per_file_percent can be specified.
|
|
"sampleMethod": "A String",
|
|
"fileSet": { # Set of files to scan. # The set of one or more files to scan.
|
|
"url": "A String", # The Cloud Storage url of the file(s) to scan, in the format
|
|
# `gs://<bucket>/<path>`. Trailing wildcard in the path is allowed.
|
|
#
|
|
# If the url ends in a trailing slash, the bucket or directory represented
|
|
# by the url will be scanned non-recursively (content in sub-directories
|
|
# will not be scanned). This means that `gs://mybucket/` is equivalent to
|
|
# `gs://mybucket/*`, and `gs://mybucket/directory/` is equivalent to
|
|
# `gs://mybucket/directory/*`.
|
|
#
|
|
# Exactly one of `url` or `regex_file_set` must be set.
|
|
"regexFileSet": { # The regex-filtered set of files to scan. Exactly one of `url` or
|
|
# `regex_file_set` must be set.
|
|
# Message representing a set of files in a Cloud Storage bucket. Regular
|
|
# expressions are used to allow fine-grained control over which files in the
|
|
# bucket to include.
|
|
#
|
|
# Included files are those that match at least one item in `include_regex` and
|
|
# do not match any items in `exclude_regex`. Note that a file that matches
|
|
# items from both lists will _not_ be included. For a match to occur, the
|
|
# entire file path (i.e., everything in the url after the bucket name) must
|
|
# match the regular expression.
|
|
#
|
|
# For example, given the input `{bucket_name: "mybucket", include_regex:
|
|
# ["directory1/.*"], exclude_regex:
|
|
# ["directory1/excluded.*"]}`:
|
|
#
|
|
# * `gs://mybucket/directory1/myfile` will be included
|
|
# * `gs://mybucket/directory1/directory2/myfile` will be included (`.*` matches
|
|
# across `/`)
|
|
# * `gs://mybucket/directory0/directory1/myfile` will _not_ be included (the
|
|
# full path doesn't match any items in `include_regex`)
|
|
# * `gs://mybucket/directory1/excludedfile` will _not_ be included (the path
|
|
# matches an item in `exclude_regex`)
|
|
#
|
|
# If `include_regex` is left empty, it will match all files by default
|
|
# (this is equivalent to setting `include_regex: [".*"]`).
|
|
#
|
|
# Some other common use cases:
|
|
#
|
|
# * `{bucket_name: "mybucket", exclude_regex: [".*\.pdf"]}` will include all
|
|
# files in `mybucket` except for .pdf files
|
|
# * `{bucket_name: "mybucket", include_regex: ["directory/[^/]+"]}` will
|
|
# include all files directly under `gs://mybucket/directory/`, without matching
|
|
# across `/`
|
|
"excludeRegex": [ # A list of regular expressions matching file paths to exclude. All files in
|
|
# the bucket that match at least one of these regular expressions will be
|
|
# excluded from the scan.
|
|
#
|
|
# Regular expressions use RE2
|
|
# [syntax](https://github.com/google/re2/wiki/Syntax); a guide can be found
|
|
# under the google/re2 repository on GitHub.
|
|
"A String",
|
|
],
|
|
"bucketName": "A String", # The name of a Cloud Storage bucket. Required.
|
|
"includeRegex": [ # A list of regular expressions matching file paths to include. All files in
|
|
# the bucket that match at least one of these regular expressions will be
|
|
# included in the set of files, except for those that also match an item in
|
|
# `exclude_regex`. Leaving this field empty will match all files by default
|
|
# (this is equivalent to including `.*` in the list).
|
|
#
|
|
# Regular expressions use RE2
|
|
# [syntax](https://github.com/google/re2/wiki/Syntax); a guide can be found
|
|
# under the google/re2 repository on GitHub.
|
|
"A String",
|
|
],
|
|
},
|
|
},
|
|
"bytesLimitPerFilePercent": 42, # Max percentage of bytes to scan from a file. The rest are omitted. The
|
|
# number of bytes scanned is rounded down. Must be between 0 and 100,
|
|
# inclusive. Both 0 and 100 mean no limit. Defaults to 0. Only one
|
|
# of bytes_limit_per_file and bytes_limit_per_file_percent can be specified.
|
|
"filesLimitPercent": 42, # Limits the number of files to scan to this percentage of the input FileSet.
|
|
# Number of files scanned is rounded down. Must be between 0 and 100,
|
|
# inclusive. Both 0 and 100 mean no limit. Defaults to 0.
|
|
"fileTypes": [ # List of file type groups to include in the scan.
|
|
# If empty, all files are scanned and available data format processors
|
|
# are applied. In addition, the binary content of the selected files
|
|
# is always scanned as well.
|
|
"A String",
|
|
],
|
|
},
|
|
},
|
|
"inspectConfig": { # Configuration description of the scanning process. # How and what to scan for.
|
|
# When used with redactContent, only info_types and min_likelihood are currently
|
|
# used.
|
|
"excludeInfoTypes": True or False, # When true, excludes type information of the findings.
|
|
"limits": {
|
|
"maxFindingsPerRequest": 42, # Max number of findings that will be returned per request/job.
|
|
# When set within `InspectContentRequest`, the maximum returned is 2000
|
|
# regardless of whether this is set higher.
|
|
"maxFindingsPerInfoType": [ # Configuration of findings limit given for specified infoTypes.
|
|
{ # Max findings configuration per infoType, per content item or long
|
|
# running DlpJob.
|
|
"infoType": { # Type of information detected by the API. # Type of information the findings limit applies to. Only one limit per
|
|
# info_type should be provided. If InfoTypeLimit does not have an
|
|
# info_type, the DLP API applies the limit against all info_types that
|
|
# are found but not specified in another InfoTypeLimit.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
"maxFindings": 42, # Max findings limit for the given infoType.
|
|
},
|
|
],
|
|
"maxFindingsPerItem": 42, # Max number of findings that will be returned for each item scanned.
|
|
# When set within `InspectDataSourceRequest`,
|
|
# the maximum returned is 2000 regardless of whether this is set higher.
|
|
# When set within `InspectContentRequest`, this field is ignored.
|
|
},
|
|
"minLikelihood": "A String", # Only returns findings equal to or above this threshold. The default is
|
|
# POSSIBLE.
|
|
# See https://cloud.google.com/dlp/docs/likelihood to learn more.
|
|
"customInfoTypes": [ # CustomInfoTypes provided by the user. See
|
|
# https://cloud.google.com/dlp/docs/creating-custom-infotypes to learn more.
|
|
{ # Custom information type provided by the user. Used to find domain-specific
|
|
# sensitive information configurable to the data in question.
|
|
"regex": { # Message defining a custom regular expression. # Regular expression based CustomInfoType.
|
|
"pattern": "A String", # Pattern defining the regular expression. Its syntax
|
|
# (https://github.com/google/re2/wiki/Syntax) can be found under the
|
|
# google/re2 repository on GitHub.
|
|
"groupIndexes": [ # The index of the submatch to extract as findings. When not
|
|
# specified, the entire match is returned. No more than 3 may be included.
|
|
42,
|
|
],
|
|
},
|
|
"surrogateType": { # Message for detecting output from deidentification transformations that
|
|
# support reversing, such as
|
|
# [`CryptoReplaceFfxFpeConfig`](/dlp/docs/reference/rest/v2/organizations.deidentifyTemplates#cryptoreplaceffxfpeconfig).
|
|
# These types of transformations are
|
|
# those that perform pseudonymization, thereby producing a "surrogate" as
|
|
# output. This should be used in conjunction with a field on the
|
|
# transformation such as `surrogate_info_type`. This CustomInfoType does
|
|
# not support the use of `detection_rules`.
|
|
},
|
|
"infoType": { # Type of information detected by the API. # CustomInfoType can either be a new infoType, or an extension of built-in
|
|
# infoType, when the name matches one of the existing infoTypes and that infoType
|
|
# is specified in `InspectContent.info_types` field. Specifying the latter
|
|
# adds findings to the ones detected by the system. If the built-in infoType is
|
|
# not specified in `InspectContent.info_types` list then the name is treated
|
|
# as a custom info type.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
"dictionary": { # Custom information type based on a dictionary of words or phrases. This can # A list of phrases to detect as a CustomInfoType.
|
|
# be used to match sensitive information specific to the data, such as a list
|
|
# of employee IDs or job titles.
|
|
#
|
|
# Dictionary words are case-insensitive and all characters other than letters
|
|
# and digits in the Unicode [Basic Multilingual
|
|
# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)
|
|
# will be replaced with whitespace when scanning for matches, so the
|
|
# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",
|
|
# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters
|
|
# surrounding any match must be of a different type than the adjacent
|
|
# characters within the word, so letters must be next to non-letters and
|
|
# digits next to non-digits. For example, the dictionary word "jen" will
|
|
# match the first three letters of the text "jen123" but will return no
|
|
# matches for "jennifer".
|
|
#
|
|
# Dictionary words containing a large number of characters that are not
|
|
# letters or digits may result in unexpected findings because such characters
|
|
# are treated as whitespace. The
|
|
# [limits](https://cloud.google.com/dlp/limits) page contains details about
|
|
# the size limits of dictionaries. For dictionaries that do not fit within
|
|
# these constraints, consider using `LargeCustomDictionaryConfig` in the
|
|
# `StoredInfoType` API.
|
|
"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.
|
|
"words": [ # Words or phrases defining the dictionary. The dictionary must contain
|
|
# at least one phrase and every phrase must contain at least 2 characters
|
|
# that are letters or digits. [required]
|
|
"A String",
|
|
],
|
|
},
|
|
"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file
|
|
# is accepted.
|
|
"path": "A String", # A url representing a file or path (no wildcards) in Cloud Storage.
|
|
# Example: gs://[BUCKET_NAME]/dictionary.txt
|
|
},
|
|
},
|
|
"storedType": { # A reference to a StoredInfoType to use with scanning. # Load an existing `StoredInfoType` resource for use in
|
|
# `InspectDataSource`. Not currently supported in `InspectContent`.
|
|
"name": "A String", # Resource name of the requested `StoredInfoType`, for example
|
|
# `organizations/433245324/storedInfoTypes/432452342` or
|
|
# `projects/project-id/storedInfoTypes/432452342`.
|
|
"createTime": "A String", # Timestamp indicating when the version of the `StoredInfoType` used for
|
|
# inspection was created. Output-only field, populated by the system.
|
|
},
|
|
"detectionRules": [ # Set of detection rules to apply to all findings of this CustomInfoType.
|
|
# Rules are applied in the order that they are specified. Not supported for the
|
|
# `surrogate_type` CustomInfoType.
|
|
{ # Deprecated; use `InspectionRuleSet` instead. Rule for modifying a
|
|
# `CustomInfoType` to alter behavior under certain circumstances, depending
|
|
# on the specific details of the rule. Not supported for the `surrogate_type`
|
|
# custom infoType.
|
|
"hotwordRule": { # The rule that adjusts the likelihood of findings within a certain # Hotword-based detection rule.
|
|
# proximity of hotwords.
|
|
"proximity": { # Message for specifying a window around a finding to apply a detection rule. # Proximity of the finding within which the entire hotword must reside.
|
|
# The total length of the window cannot exceed 1000 characters. Note that
|
|
# the finding itself will be included in the window, so that hotwords may
|
|
# be used to match substrings of the finding itself. For example, the
|
|
# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be
|
|
# adjusted upwards if the area code is known to be the local area code of
|
|
# a company office using the hotword regex "\(xxx\)", where "xxx"
|
|
# is the area code in question.
|
|
"windowAfter": 42, # Number of characters after the finding to consider.
|
|
"windowBefore": 42, # Number of characters before the finding to consider.
|
|
},
|
|
"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.
|
|
"pattern": "A String", # Pattern defining the regular expression. Its syntax
|
|
# (https://github.com/google/re2/wiki/Syntax) can be found under the
|
|
# google/re2 repository on GitHub.
|
|
"groupIndexes": [ # The index of the submatch to extract as findings. When not
|
|
# specified, the entire match is returned. No more than 3 may be included.
|
|
42,
|
|
],
|
|
},
|
|
"likelihoodAdjustment": { # Message for specifying an adjustment to the likelihood of a finding as # Likelihood adjustment to apply to all matching findings.
|
|
# part of a detection rule.
|
|
"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of
|
|
# levels. For example, if a finding would be `POSSIBLE` without the
|
|
# detection rule and `relative_likelihood` is 1, then it is upgraded to
|
|
# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.
|
|
# Likelihood may never drop below `VERY_UNLIKELY` or exceed
|
|
# `VERY_LIKELY`, so applying an adjustment of 1 followed by an
|
|
# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in
|
|
# a final likelihood of `LIKELY`.
|
|
"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.
|
|
},
|
|
},
|
|
},
|
|
],
|
|
"exclusionType": "A String", # If set to EXCLUSION_TYPE_EXCLUDE this infoType will not cause a finding
|
|
# to be returned. It can still be used for rule matching.
|
|
"likelihood": "A String", # Likelihood to return for this CustomInfoType. This base value can be
|
|
# altered by a detection rule if the finding meets the criteria specified by
|
|
# the rule. Defaults to `VERY_LIKELY` if not specified.
|
|
},
|
|
],
|
|
"includeQuote": True or False, # When true, a contextual quote from the data that triggered a finding is
|
|
# included in the response; see Finding.quote.
|
|
"ruleSet": [ # Set of rules to apply to the findings for this InspectConfig.
|
|
# Exclusion rules contained in the set are executed at the end; other
|
|
# rules are executed in the order they are specified for each info type.
|
|
{ # Rule set for modifying a set of infoTypes to alter behavior under certain
|
|
# circumstances, depending on the specific details of the rules within the set.
|
|
"rules": [ # Set of rules to be applied to infoTypes. The rules are applied in order.
|
|
{ # A single inspection rule to be applied to infoTypes, specified in
|
|
# `InspectionRuleSet`.
|
|
"hotwordRule": { # The rule that adjusts the likelihood of findings within a certain # Hotword-based detection rule.
|
|
# proximity of hotwords.
|
|
"proximity": { # Message for specifying a window around a finding to apply a detection rule. # Proximity of the finding within which the entire hotword must reside.
|
|
# The total length of the window cannot exceed 1000 characters. Note that
|
|
# the finding itself will be included in the window, so that hotwords may
|
|
# be used to match substrings of the finding itself. For example, the
|
|
# certainty of a phone number regex "\(\d{3}\) \d{3}-\d{4}" could be
|
|
# adjusted upwards if the area code is known to be the local area code of
|
|
# a company office using the hotword regex "\(xxx\)", where "xxx"
|
|
# is the area code in question.
|
|
"windowAfter": 42, # Number of characters after the finding to consider.
|
|
"windowBefore": 42, # Number of characters before the finding to consider.
|
|
},
|
|
"hotwordRegex": { # Message defining a custom regular expression. # Regular expression pattern defining what qualifies as a hotword.
|
|
"pattern": "A String", # Pattern defining the regular expression. Its syntax
|
|
# (https://github.com/google/re2/wiki/Syntax) can be found under the
|
|
# google/re2 repository on GitHub.
|
|
"groupIndexes": [ # The index of the submatch to extract as findings. When not
|
|
# specified, the entire match is returned. No more than 3 may be included.
|
|
42,
|
|
],
|
|
},
|
|
"likelihoodAdjustment": { # Message for specifying an adjustment to the likelihood of a finding as # Likelihood adjustment to apply to all matching findings.
|
|
# part of a detection rule.
|
|
"relativeLikelihood": 42, # Increase or decrease the likelihood by the specified number of
|
|
# levels. For example, if a finding would be `POSSIBLE` without the
|
|
# detection rule and `relative_likelihood` is 1, then it is upgraded to
|
|
# `LIKELY`, while a value of -1 would downgrade it to `UNLIKELY`.
|
|
# Likelihood may never drop below `VERY_UNLIKELY` or exceed
|
|
# `VERY_LIKELY`, so applying an adjustment of 1 followed by an
|
|
# adjustment of -1 when base likelihood is `VERY_LIKELY` will result in
|
|
# a final likelihood of `LIKELY`.
|
|
"fixedLikelihood": "A String", # Set the likelihood of a finding to a fixed value.
|
|
},
|
|
},
|
|
"exclusionRule": { # The rule that specifies conditions when findings of infoTypes specified in # Exclusion rule.
|
|
# `InspectionRuleSet` are removed from results.
|
|
"regex": { # Message defining a custom regular expression. # Regular expression which defines the rule.
|
|
"pattern": "A String", # Pattern defining the regular expression. Its syntax
|
|
# (https://github.com/google/re2/wiki/Syntax) can be found under the
|
|
# google/re2 repository on GitHub.
|
|
"groupIndexes": [ # The index of the submatch to extract as findings. When not
|
|
# specified, the entire match is returned. No more than 3 may be included.
|
|
42,
|
|
],
|
|
},
|
|
"excludeInfoTypes": { # List of exclude infoTypes. # Set of infoTypes for which findings would affect this rule.
|
|
"infoTypes": [ # InfoType list in ExclusionRule rule drops a finding when it overlaps or
|
|
# is contained within a finding of an infoType from this list. For
|
|
# example, for `InspectionRuleSet.info_types` containing "PHONE_NUMBER" and
|
|
# `exclusion_rule` containing `exclude_info_types.info_types` with
|
|
# "EMAIL_ADDRESS" the phone number findings are dropped if they overlap
|
|
# with an EMAIL_ADDRESS finding.
|
|
# As a result, "555-222-2222@example.org" generates only a single
|
|
# finding, namely the email address.
|
|
{ # Type of information detected by the API.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
],
|
|
},
|
|
"dictionary": { # Custom information type based on a dictionary of words or phrases. This can # Dictionary which defines the rule.
|
|
# be used to match sensitive information specific to the data, such as a list
|
|
# of employee IDs or job titles.
|
|
#
|
|
# Dictionary words are case-insensitive and all characters other than letters
|
|
# and digits in the Unicode [Basic Multilingual
|
|
# Plane](https://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane)
|
|
# will be replaced with whitespace when scanning for matches, so the
|
|
# dictionary phrase "Sam Johnson" will match all three phrases "sam johnson",
|
|
# "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters
|
|
# surrounding any match must be of a different type than the adjacent
|
|
# characters within the word, so letters must be next to non-letters and
|
|
# digits next to non-digits. For example, the dictionary word "jen" will
|
|
# match the first three letters of the text "jen123" but will return no
|
|
# matches for "jennifer".
|
|
#
|
|
# Dictionary words containing a large number of characters that are not
|
|
# letters or digits may result in unexpected findings because such characters
|
|
# are treated as whitespace. The
|
|
# [limits](https://cloud.google.com/dlp/limits) page contains details about
|
|
# the size limits of dictionaries. For dictionaries that do not fit within
|
|
# these constraints, consider using `LargeCustomDictionaryConfig` in the
|
|
# `StoredInfoType` API.
|
|
"wordList": { # Message defining a list of words or phrases to search for in the data. # List of words or phrases to search for.
|
|
"words": [ # Words or phrases defining the dictionary. The dictionary must contain
|
|
# at least one phrase and every phrase must contain at least 2 characters
|
|
# that are letters or digits. [required]
|
|
"A String",
|
|
],
|
|
},
|
|
"cloudStoragePath": { # Message representing a single file or path in Cloud Storage. # Newline-delimited file of words in Cloud Storage. Only a single file
|
|
# is accepted.
|
|
"path": "A String", # A url representing a file or path (no wildcards) in Cloud Storage.
|
|
# Example: gs://[BUCKET_NAME]/dictionary.txt
|
|
},
|
|
},
|
|
"matchingType": "A String", # How the rule is applied, see MatchingType documentation for details.
|
|
},
|
|
},
|
|
],
|
|
"infoTypes": [ # List of infoTypes this rule set is applied to.
|
|
{ # Type of information detected by the API.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
],
|
|
},
|
|
],
|
|
"contentOptions": [ # List of options defining data content to scan.
|
|
# If empty, text, images, and other content will be included.
|
|
"A String",
|
|
],
|
|
"infoTypes": [ # Restricts what info_types to look for. The values must correspond to
|
|
# InfoType values returned by ListInfoTypes or listed at
|
|
# https://cloud.google.com/dlp/docs/infotypes-reference.
|
|
#
|
|
# When no InfoTypes or CustomInfoTypes are specified in a request, the
|
|
# system may automatically choose what detectors to run. By default this may
|
|
# be all types, but may change over time as detectors are updated.
|
|
#
|
|
# The special InfoType name "ALL_BASIC" can be used to trigger all detectors,
|
|
# but may change over time as new InfoTypes are added. If you need precise
|
|
# control and predictability as to what detectors are run you should specify
|
|
# specific InfoTypes listed in the reference.
|
|
{ # Type of information detected by the API.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
],
|
|
},
|
|
"inspectTemplateName": "A String", # If provided, will be used as the default for all values in InspectConfig.
|
|
# `inspect_config` will be merged into the values persisted as part of the
|
|
# template.
|
|
"actions": [ # Actions to execute at the completion of the job.
|
|
{ # A task to execute on the completion of a job.
|
|
# See https://cloud.google.com/dlp/docs/concepts-actions to learn more.
|
|
"saveFindings": { # If set, the detailed findings will be persisted to the specified # Save resulting findings in a provided location.
|
|
# OutputStorageConfig. Only a single instance of this action can be
|
|
# specified.
|
|
# Compatible with: Inspect, Risk
|
|
"outputConfig": { # Cloud repository for storing output.
|
|
"table": { # Message defining the location of a BigQuery table. A table is uniquely # Store findings in an existing table or a new table in an existing
|
|
# dataset. If table_id is not set a new one will be generated
|
|
# for you with the following format:
|
|
# dlp_googleapis_yyyy_mm_dd_[dlp_job_id]. Pacific timezone will be used for
|
|
# generating the date details.
|
|
#
|
|
# For Inspect, each column in an existing output table must have the same
|
|
# name, type, and mode of a field in the `Finding` object.
|
|
#
|
|
# For Risk, an existing output table should be the output of a previous
|
|
# Risk analysis job run on the same source table, with the same privacy
|
|
# metric and quasi-identifiers. Risk jobs that analyze the same table but
|
|
# compute a different privacy metric, or use different sets of
|
|
# quasi-identifiers, cannot store their results in the same table.
|
|
# identified by its project_id, dataset_id, and table_name. Within a query
|
|
# a table is often referenced with a string in the format of:
|
|
# `<project_id>:<dataset_id>.<table_id>` or
|
|
# `<project_id>.<dataset_id>.<table_id>`.
|
|
"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
|
|
# If omitted, project ID is inferred from the API call.
|
|
"tableId": "A String", # Name of the table.
|
|
"datasetId": "A String", # Dataset ID of the table.
|
|
},
|
|
"outputSchema": "A String", # Schema used for writing the findings for Inspect jobs. This field is only
|
|
# used for Inspect and must be unspecified for Risk jobs. Columns are derived
|
|
# from the `Finding` object. If appending to an existing table, any columns
|
|
# from the predefined schema that are missing will be added. No columns in
|
|
# the existing table will be deleted.
|
|
#
|
|
# If unspecified, then all available columns will be used for a new table or
|
|
# an (existing) table with no schema, and no changes will be made to an
|
|
# existing table that has a schema.
|
|
},
|
|
},
|
|
"jobNotificationEmails": { # Enable email notification to project owners and editors on a job's
|
|
# completion/failure.
|
|
},
|
|
"publishSummaryToCscc": { # Publish the result summary of a DlpJob to the Cloud Security # Publish summary to Cloud Security Command Center (Alpha).
|
|
# Command Center (CSCC Alpha).
|
|
# This action is only available for projects that are part of
|
|
# an organization and whitelisted for the alpha Cloud Security Command
|
|
# Center.
|
|
# The action will publish the count of finding instances and their info types.
|
|
# The summary of findings will be persisted in CSCC and is governed by CSCC
|
|
# service-specific policy, see https://cloud.google.com/terms/service-terms
|
|
# Only a single instance of this action can be specified.
|
|
# Compatible with: Inspect
|
|
},
|
|
"pubSub": { # Publish a message to a given Pub/Sub topic when a DlpJob has completed. The # Publish a notification to a pubsub topic.
|
|
# message contains a single field, `DlpJobName`, which is equal to the
|
|
# finished job's
|
|
# [`DlpJob.name`](/dlp/docs/reference/rest/v2/projects.dlpJobs#DlpJob).
|
|
# Compatible with: Inspect, Risk
|
|
"topic": "A String", # Cloud Pub/Sub topic to send notifications to. The topic must have granted
|
|
# publishing access rights to the DLP API service account executing
|
|
# the long running DlpJob sending the notifications.
|
|
# Format is projects/{project}/topics/{topic}.
|
|
},
|
|
},
|
|
],
|
|
},
|
|
},
|
|
"result": { # All result fields mentioned below are updated while the job is processing. # A summary of the outcome of this inspect job.
|
|
"infoTypeStats": [ # Statistics of how many instances of each info type were found during
|
|
# inspect job.
|
|
{ # Statistics regarding a specific InfoType.
|
|
"count": "A String", # Number of findings for this infoType.
|
|
"infoType": { # Type of information detected by the API. # The type of finding this stat is for.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
},
|
|
],
|
|
"totalEstimatedBytes": "A String", # Estimate of the number of bytes to process.
|
|
"processedBytes": "A String", # Total size in bytes that were processed.
|
|
},
|
|
},
|
|
"riskDetails": { # Result of a risk analysis operation request. # Results from analyzing risk of a data source.
|
|
"numericalStatsResult": { # Result of the numerical stats computation.
|
|
"quantileValues": [ # List of 99 values that partition the set of field values into 100 equal
|
|
# sized buckets.
|
|
{ # Set of primitive values supported by the system.
|
|
# Note that for the purposes of inspection or transformation, the number
|
|
# of bytes considered to comprise a 'Value' is based on its representation
|
|
# as a UTF-8 encoded string. For example, if 'integer_value' is set to
|
|
# 123456789, the number of bytes would be counted as 9, even though an
|
|
# int64 only holds up to 8 bytes of data.
|
|
"floatValue": 3.14,
|
|
"timestampValue": "A String",
|
|
"dayOfWeekValue": "A String",
|
|
"timeValue": { # Represents a time of day. The date and time zone are either not significant
|
|
# or are specified elsewhere. An API may choose to allow leap seconds. Related
|
|
# types are google.type.Date and `google.protobuf.Timestamp`.
|
|
"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose
|
|
# to allow the value "24:00:00" for scenarios like business closing time.
|
|
"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.
|
|
"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may
|
|
# allow the value 60 if it allows leap-seconds.
|
|
"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.
|
|
},
|
|
"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day
|
|
# and time zone are either specified elsewhere or are not significant. The date
|
|
# is relative to the Proleptic Gregorian Calendar. This can represent:
|
|
#
|
|
# * A full date, with non-zero year, month and day values
|
|
# * A month and day value, with a zero year, e.g. an anniversary
|
|
# * A year on its own, with zero month and day values
|
|
# * A year and month value, with a zero day, e.g. a credit card expiration date
|
|
#
|
|
# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.
|
|
"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without
|
|
# a year.
|
|
"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0
|
|
# if specifying a year by itself or a year and month where the day is not
|
|
# significant.
|
|
"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a
|
|
# month and day.
|
|
},
|
|
"stringValue": "A String",
|
|
"booleanValue": True or False,
|
|
"integerValue": "A String",
|
|
},
|
|
],
|
|
"maxValue": { # Set of primitive values supported by the system. # Maximum value appearing in the column.
|
|
# Note that for the purposes of inspection or transformation, the number
|
|
# of bytes considered to comprise a 'Value' is based on its representation
|
|
# as a UTF-8 encoded string. For example, if 'integer_value' is set to
|
|
# 123456789, the number of bytes would be counted as 9, even though an
|
|
# int64 only holds up to 8 bytes of data.
|
|
"floatValue": 3.14,
|
|
"timestampValue": "A String",
|
|
"dayOfWeekValue": "A String",
|
|
"timeValue": { # Represents a time of day. The date and time zone are either not significant
|
|
# or are specified elsewhere. An API may choose to allow leap seconds. Related
|
|
# types are google.type.Date and `google.protobuf.Timestamp`.
|
|
"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose
|
|
# to allow the value "24:00:00" for scenarios like business closing time.
|
|
"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.
|
|
"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may
|
|
# allow the value 60 if it allows leap-seconds.
|
|
"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.
|
|
},
|
|
"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day
|
|
# and time zone are either specified elsewhere or are not significant. The date
|
|
# is relative to the Proleptic Gregorian Calendar. This can represent:
|
|
#
|
|
# * A full date, with non-zero year, month and day values
|
|
# * A month and day value, with a zero year, e.g. an anniversary
|
|
# * A year on its own, with zero month and day values
|
|
# * A year and month value, with a zero day, e.g. a credit card expiration date
|
|
#
|
|
# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.
|
|
"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without
|
|
# a year.
|
|
"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0
|
|
# if specifying a year by itself or a year and month where the day is not
|
|
# significant.
|
|
"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a
|
|
# month and day.
|
|
},
|
|
"stringValue": "A String",
|
|
"booleanValue": True or False,
|
|
"integerValue": "A String",
|
|
},
|
|
"minValue": { # Set of primitive values supported by the system. # Minimum value appearing in the column.
|
|
# Note that for the purposes of inspection or transformation, the number
|
|
# of bytes considered to comprise a 'Value' is based on its representation
|
|
# as a UTF-8 encoded string. For example, if 'integer_value' is set to
|
|
# 123456789, the number of bytes would be counted as 9, even though an
|
|
# int64 only holds up to 8 bytes of data.
|
|
"floatValue": 3.14,
|
|
"timestampValue": "A String",
|
|
"dayOfWeekValue": "A String",
|
|
"timeValue": { # Represents a time of day. The date and time zone are either not significant
|
|
# or are specified elsewhere. An API may choose to allow leap seconds. Related
|
|
# types are google.type.Date and `google.protobuf.Timestamp`.
|
|
"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose
|
|
# to allow the value "24:00:00" for scenarios like business closing time.
|
|
"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.
|
|
"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may
|
|
# allow the value 60 if it allows leap-seconds.
|
|
"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.
|
|
},
|
|
"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day
|
|
# and time zone are either specified elsewhere or are not significant. The date
|
|
# is relative to the Proleptic Gregorian Calendar. This can represent:
|
|
#
|
|
# * A full date, with non-zero year, month and day values
|
|
# * A month and day value, with a zero year, e.g. an anniversary
|
|
# * A year on its own, with zero month and day values
|
|
# * A year and month value, with a zero day, e.g. a credit card expiration date
|
|
#
|
|
# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.
|
|
"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without
|
|
# a year.
|
|
"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0
|
|
# if specifying a year by itself or a year and month where the day is not
|
|
# significant.
|
|
"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a
|
|
# month and day.
|
|
},
|
|
"stringValue": "A String",
|
|
"booleanValue": True or False,
|
|
"integerValue": "A String",
|
|
},
|
|
},
|
|
"kMapEstimationResult": { # Result of the reidentifiability analysis. Note that these results are an
|
|
# estimation, not exact values.
|
|
"kMapEstimationHistogram": [ # The intervals [min_anonymity, max_anonymity] do not overlap. If a value
|
|
# doesn't correspond to any such interval, the associated frequency is
|
|
# zero. For example, the following records:
|
|
# {min_anonymity: 1, max_anonymity: 1, frequency: 17}
|
|
# {min_anonymity: 2, max_anonymity: 3, frequency: 42}
|
|
# {min_anonymity: 5, max_anonymity: 10, frequency: 99}
|
|
# mean that there are no records with an estimated anonymity of 4 or
|
|
# larger than 10.
|
|
{ # A KMapEstimationHistogramBucket message with the following values:
|
|
# min_anonymity: 3
|
|
# max_anonymity: 5
|
|
# frequency: 42
|
|
# means that there are 42 records whose quasi-identifier values correspond
|
|
# to 3, 4 or 5 people in the overlying population. An important particular
|
|
# case is when min_anonymity = max_anonymity = 1: the frequency field then
|
|
# corresponds to the number of uniquely identifiable records.
|
|
"bucketValues": [ # Sample of quasi-identifier tuple values in this bucket. The total
|
|
# number of classes returned per bucket is capped at 20.
|
|
{ # A tuple of values for the quasi-identifier columns.
|
|
"estimatedAnonymity": "A String", # The estimated anonymity for these quasi-identifier values.
|
|
"quasiIdsValues": [ # The quasi-identifier values.
|
|
{ # Set of primitive values supported by the system.
|
|
# Note that for the purposes of inspection or transformation, the number
|
|
# of bytes considered to comprise a 'Value' is based on its representation
|
|
# as a UTF-8 encoded string. For example, if 'integer_value' is set to
|
|
# 123456789, the number of bytes would be counted as 9, even though an
|
|
# int64 only holds up to 8 bytes of data.
|
|
"floatValue": 3.14,
|
|
"timestampValue": "A String",
|
|
"dayOfWeekValue": "A String",
|
|
"timeValue": { # Represents a time of day. The date and time zone are either not significant
|
|
# or are specified elsewhere. An API may choose to allow leap seconds. Related
|
|
# types are google.type.Date and `google.protobuf.Timestamp`.
|
|
"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose
|
|
# to allow the value "24:00:00" for scenarios like business closing time.
|
|
"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.
|
|
"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may
|
|
# allow the value 60 if it allows leap-seconds.
|
|
"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.
|
|
},
|
|
"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day
|
|
# and time zone are either specified elsewhere or are not significant. The date
|
|
# is relative to the Proleptic Gregorian Calendar. This can represent:
|
|
#
|
|
# * A full date, with non-zero year, month and day values
|
|
# * A month and day value, with a zero year, e.g. an anniversary
|
|
# * A year on its own, with zero month and day values
|
|
# * A year and month value, with a zero day, e.g. a credit card expiration date
|
|
#
|
|
# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.
|
|
"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without
|
|
# a year.
|
|
"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0
|
|
# if specifying a year by itself or a year and month where the day is not
|
|
# significant.
|
|
"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a
|
|
# month and day.
|
|
},
|
|
"stringValue": "A String",
|
|
"booleanValue": True or False,
|
|
"integerValue": "A String",
|
|
},
|
|
],
|
|
},
|
|
],
|
|
"minAnonymity": "A String", # Always positive.
|
|
"bucketValueCount": "A String", # Total number of distinct quasi-identifier tuple values in this bucket.
|
|
"maxAnonymity": "A String", # Always greater than or equal to min_anonymity.
|
|
"bucketSize": "A String", # Number of records within these anonymity bounds.
|
|
},
|
|
],
|
|
},
|
|
"kAnonymityResult": { # Result of the k-anonymity computation.
|
|
"equivalenceClassHistogramBuckets": [ # Histogram of k-anonymity equivalence classes.
|
|
{
|
|
"bucketValues": [ # Sample of equivalence classes in this bucket. The total number of
|
|
# classes returned per bucket is capped at 20.
|
|
{ # The set of columns' values that share the same k-anonymity value.
|
|
"quasiIdsValues": [ # Set of values defining the equivalence class. One value per
|
|
# quasi-identifier column in the original KAnonymity metric message.
|
|
# The order is always the same as the original request.
|
|
{ # Set of primitive values supported by the system.
|
|
# Note that for the purposes of inspection or transformation, the number
|
|
# of bytes considered to comprise a 'Value' is based on its representation
|
|
# as a UTF-8 encoded string. For example, if 'integer_value' is set to
|
|
# 123456789, the number of bytes would be counted as 9, even though an
|
|
# int64 only holds up to 8 bytes of data.
|
|
"floatValue": 3.14,
|
|
"timestampValue": "A String",
|
|
"dayOfWeekValue": "A String",
|
|
"timeValue": { # Represents a time of day. The date and time zone are either not significant
|
|
# or are specified elsewhere. An API may choose to allow leap seconds. Related
|
|
# types are google.type.Date and `google.protobuf.Timestamp`.
|
|
"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose
|
|
# to allow the value "24:00:00" for scenarios like business closing time.
|
|
"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.
|
|
"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may
|
|
# allow the value 60 if it allows leap-seconds.
|
|
"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.
|
|
},
|
|
"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day
|
|
# and time zone are either specified elsewhere or are not significant. The date
|
|
# is relative to the Proleptic Gregorian Calendar. This can represent:
|
|
#
|
|
# * A full date, with non-zero year, month and day values
|
|
# * A month and day value, with a zero year, e.g. an anniversary
|
|
# * A year on its own, with zero month and day values
|
|
# * A year and month value, with a zero day, e.g. a credit card expiration date
|
|
#
|
|
# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.
|
|
"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without
|
|
# a year.
|
|
"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0
|
|
# if specifying a year by itself or a year and month where the day is not
|
|
# significant.
|
|
"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a
|
|
# month and day.
|
|
},
|
|
"stringValue": "A String",
|
|
"booleanValue": True or False,
|
|
"integerValue": "A String",
|
|
},
|
|
],
|
|
"equivalenceClassSize": "A String", # Size of the equivalence class, for example number of rows with the
|
|
# above set of values.
|
|
},
|
|
],
|
|
"bucketValueCount": "A String", # Total number of distinct equivalence classes in this bucket.
|
|
"equivalenceClassSizeLowerBound": "A String", # Lower bound on the size of the equivalence classes in this bucket.
|
|
"equivalenceClassSizeUpperBound": "A String", # Upper bound on the size of the equivalence classes in this bucket.
|
|
"bucketSize": "A String", # Total number of equivalence classes in this bucket.
|
|
},
|
|
],
|
|
},
|
|
"lDiversityResult": { # Result of the l-diversity computation.
|
|
"sensitiveValueFrequencyHistogramBuckets": [ # Histogram of l-diversity equivalence class sensitive value frequencies.
|
|
{
|
|
"bucketValues": [ # Sample of equivalence classes in this bucket. The total number of
|
|
# classes returned per bucket is capped at 20.
|
|
{ # The set of columns' values that share the same l-diversity value.
|
|
"numDistinctSensitiveValues": "A String", # Number of distinct sensitive values in this equivalence class.
|
|
"quasiIdsValues": [ # Quasi-identifier values defining the k-anonymity equivalence
|
|
# class. The order is always the same as the original request.
|
|
{ # Set of primitive values supported by the system.
|
|
# Note that for the purposes of inspection or transformation, the number
|
|
# of bytes considered to comprise a 'Value' is based on its representation
|
|
# as a UTF-8 encoded string. For example, if 'integer_value' is set to
|
|
# 123456789, the number of bytes would be counted as 9, even though an
|
|
# int64 only holds up to 8 bytes of data.
|
|
"floatValue": 3.14,
|
|
"timestampValue": "A String",
|
|
"dayOfWeekValue": "A String",
|
|
"timeValue": { # Represents a time of day. The date and time zone are either not significant
|
|
# or are specified elsewhere. An API may choose to allow leap seconds. Related
|
|
# types are google.type.Date and `google.protobuf.Timestamp`.
|
|
"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose
|
|
# to allow the value "24:00:00" for scenarios like business closing time.
|
|
"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.
|
|
"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may
|
|
# allow the value 60 if it allows leap-seconds.
|
|
"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.
|
|
},
|
|
"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day
|
|
# and time zone are either specified elsewhere or are not significant. The date
|
|
# is relative to the Proleptic Gregorian Calendar. This can represent:
|
|
#
|
|
# * A full date, with non-zero year, month and day values
|
|
# * A month and day value, with a zero year, e.g. an anniversary
|
|
# * A year on its own, with zero month and day values
|
|
# * A year and month value, with a zero day, e.g. a credit card expiration date
|
|
#
|
|
# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.
|
|
"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without
|
|
# a year.
|
|
"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0
|
|
# if specifying a year by itself or a year and month where the day is not
|
|
# significant.
|
|
"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a
|
|
# month and day.
|
|
},
|
|
"stringValue": "A String",
|
|
"booleanValue": True or False,
|
|
"integerValue": "A String",
|
|
},
|
|
],
|
|
"topSensitiveValues": [ # Estimated frequencies of top sensitive values.
|
|
{ # A value of a field, including its frequency.
|
|
"count": "A String", # How many times the value is contained in the field.
|
|
"value": { # Set of primitive values supported by the system. # A value contained in the field in question.
|
|
# Note that for the purposes of inspection or transformation, the number
|
|
# of bytes considered to comprise a 'Value' is based on its representation
|
|
# as a UTF-8 encoded string. For example, if 'integer_value' is set to
|
|
# 123456789, the number of bytes would be counted as 9, even though an
|
|
# int64 only holds up to 8 bytes of data.
|
|
"floatValue": 3.14,
|
|
"timestampValue": "A String",
|
|
"dayOfWeekValue": "A String",
|
|
"timeValue": { # Represents a time of day. The date and time zone are either not significant
|
|
# or are specified elsewhere. An API may choose to allow leap seconds. Related
|
|
# types are google.type.Date and `google.protobuf.Timestamp`.
|
|
"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose
|
|
# to allow the value "24:00:00" for scenarios like business closing time.
|
|
"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.
|
|
"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may
|
|
# allow the value 60 if it allows leap-seconds.
|
|
"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.
|
|
},
|
|
"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day
|
|
# and time zone are either specified elsewhere or are not significant. The date
|
|
# is relative to the Proleptic Gregorian Calendar. This can represent:
|
|
#
|
|
# * A full date, with non-zero year, month and day values
|
|
# * A month and day value, with a zero year, e.g. an anniversary
|
|
# * A year on its own, with zero month and day values
|
|
# * A year and month value, with a zero day, e.g. a credit card expiration date
|
|
#
|
|
# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.
|
|
"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without
|
|
# a year.
|
|
"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0
|
|
# if specifying a year by itself or a year and month where the day is not
|
|
# significant.
|
|
"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a
|
|
# month and day.
|
|
},
|
|
"stringValue": "A String",
|
|
"booleanValue": True or False,
|
|
"integerValue": "A String",
|
|
},
|
|
},
|
|
],
|
|
"equivalenceClassSize": "A String", # Size of the k-anonymity equivalence class.
|
|
},
|
|
],
|
|
"bucketValueCount": "A String", # Total number of distinct equivalence classes in this bucket.
|
|
"bucketSize": "A String", # Total number of equivalence classes in this bucket.
|
|
"sensitiveValueFrequencyUpperBound": "A String", # Upper bound on the sensitive value frequencies of the equivalence
|
|
# classes in this bucket.
|
|
"sensitiveValueFrequencyLowerBound": "A String", # Lower bound on the sensitive value frequencies of the equivalence
|
|
# classes in this bucket.
|
|
},
|
|
],
|
|
},
|
|
"requestedPrivacyMetric": { # Privacy metric to compute for reidentification risk analysis. # Privacy metric to compute.
|
|
"numericalStatsConfig": { # Compute numerical stats over an individual column, including
|
|
# min, max, and quantiles.
|
|
"field": { # General identifier of a data field in a storage service. # Field to compute numerical stats on. Supported types are
|
|
# integer, float, date, datetime, timestamp, time.
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
},
|
|
"kMapEstimationConfig": { # Reidentifiability metric. This corresponds to a risk model similar to what
|
|
# is called "journalist risk" in the literature, except the attack dataset is
|
|
# statistically modeled instead of being perfectly known. This can be done
|
|
# using publicly available data (like the US Census), or using a custom
|
|
# statistical model (indicated as one or several BigQuery tables), or by
|
|
# extrapolating from the distribution of values in the input dataset.
|
|
"regionCode": "A String", # ISO 3166-1 alpha-2 region code to use in the statistical modeling.
|
|
# Required if no column is tagged with a region-specific InfoType (like
|
|
# US_ZIP_5) or a region code.
|
|
"quasiIds": [ # Fields considered to be quasi-identifiers. No two columns can have the
|
|
# same tag. [required]
|
|
{ # A column with a semantic tag attached.
|
|
"field": { # General identifier of a data field in a storage service. # Identifies the column. [required]
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
"customTag": "A String", # A column can be tagged with a custom tag. In this case, the user must
|
|
# indicate an auxiliary table that contains statistical information on
|
|
# the possible values of this column (below).
|
|
"infoType": { # Type of information detected by the API. # A column can be tagged with an InfoType to use the relevant public
|
|
# dataset as a statistical model of population, if available. We
|
|
# currently support US ZIP codes, region codes, ages and genders.
|
|
# To programmatically obtain the list of supported InfoTypes, use
|
|
# ListInfoTypes with the supported_by=RISK_ANALYSIS filter.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
"inferred": { # A generic empty message that you can re-use to avoid defining duplicated # If no semantic tag is indicated, we infer the statistical model from
|
|
# the distribution of values in the input data.
|
|
# empty messages in your APIs. A typical example is to use it as the request
|
|
# or the response type of an API method. For instance:
|
|
#
|
|
# service Foo {
|
|
# rpc Bar(google.protobuf.Empty) returns (google.protobuf.Empty);
|
|
# }
|
|
#
|
|
# The JSON representation for `Empty` is an empty JSON object `{}`.
|
|
},
|
|
},
|
|
],
|
|
"auxiliaryTables": [ # Several auxiliary tables can be used in the analysis. Each custom_tag
|
|
# used to tag a quasi-identifier column must appear in exactly one column
|
|
# of one auxiliary table.
|
|
{ # An auxiliary table contains statistical information on the relative
|
|
# frequency of different quasi-identifier values. It has one or several
|
|
# quasi-identifier columns, and one column that indicates the relative
|
|
# frequency of each quasi-identifier tuple.
|
|
# If a tuple is present in the data but not in the auxiliary table, the
|
|
# corresponding relative frequency is assumed to be zero (and thus, the
|
|
# tuple is highly reidentifiable).
|
|
"relativeFrequency": { # General identifier of a data field in a storage service. # The relative frequency column must contain a floating-point number
|
|
# between 0 and 1 (inclusive). Null values are assumed to be zero.
|
|
# [required]
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
"quasiIds": [ # Quasi-identifier columns. [required]
|
|
{ # A quasi-identifier column has a custom_tag, used to know which column
|
|
# in the data corresponds to which column in the statistical model.
|
|
"field": { # General identifier of a data field in a storage service.
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
"customTag": "A String",
|
|
},
|
|
],
|
|
"table": { # Message defining the location of a BigQuery table. A table is uniquely # Auxiliary table location. [required]
|
|
# identified by its project_id, dataset_id, and table_id. Within a query
|
|
# a table is often referenced with a string in the format of:
|
|
# `<project_id>:<dataset_id>.<table_id>` or
|
|
# `<project_id>.<dataset_id>.<table_id>`.
|
|
"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
|
|
# If omitted, project ID is inferred from the API call.
|
|
"tableId": "A String", # Name of the table.
|
|
"datasetId": "A String", # Dataset ID of the table.
|
|
},
|
|
},
|
|
],
|
|
},
|
|
"lDiversityConfig": { # l-diversity metric, used for analysis of reidentification risk.
|
|
"sensitiveAttribute": { # General identifier of a data field in a storage service. # Sensitive field for computing the l-value.
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
"quasiIds": [ # Set of quasi-identifiers indicating how equivalence classes are
|
|
# defined for the l-diversity computation. When multiple fields are
|
|
# specified, they are considered a single composite key.
|
|
{ # General identifier of a data field in a storage service.
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
],
|
|
},
|
|
"deltaPresenceEstimationConfig": { # δ-presence metric, used to estimate how likely it is for an attacker to
|
|
# figure out that one given individual appears in a de-identified dataset.
|
|
# Similarly to the k-map metric, we cannot compute δ-presence exactly without
|
|
# knowing the attack dataset, so we use a statistical model instead.
|
|
"regionCode": "A String", # ISO 3166-1 alpha-2 region code to use in the statistical modeling.
|
|
# Required if no column is tagged with a region-specific InfoType (like
|
|
# US_ZIP_5) or a region code.
|
|
"quasiIds": [ # Fields considered to be quasi-identifiers. No two fields can have the
|
|
# same tag. [required]
|
|
{ # A column with a semantic tag attached.
|
|
"field": { # General identifier of a data field in a storage service. # Identifies the column. [required]
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
"customTag": "A String", # A column can be tagged with a custom tag. In this case, the user must
|
|
# indicate an auxiliary table that contains statistical information on
|
|
# the possible values of this column (below).
|
|
"infoType": { # Type of information detected by the API. # A column can be tagged with an InfoType to use the relevant public
|
|
# dataset as a statistical model of population, if available. We
|
|
# currently support US ZIP codes, region codes, ages and genders.
|
|
# To programmatically obtain the list of supported InfoTypes, use
|
|
# ListInfoTypes with the supported_by=RISK_ANALYSIS filter.
|
|
"name": "A String", # Name of the information type. Either a name of your choosing when
|
|
# creating a CustomInfoType, or one of the names listed
|
|
# at https://cloud.google.com/dlp/docs/infotypes-reference when specifying
|
|
# a built-in type. InfoType names should conform to the pattern
|
|
# [a-zA-Z0-9_]{1,64}.
|
|
},
|
|
"inferred": { # A generic empty message that you can re-use to avoid defining duplicated # If no semantic tag is indicated, we infer the statistical model from
|
|
# the distribution of values in the input data.
|
|
# empty messages in your APIs. A typical example is to use it as the request
|
|
# or the response type of an API method. For instance:
|
|
#
|
|
# service Foo {
|
|
# rpc Bar(google.protobuf.Empty) returns (google.protobuf.Empty);
|
|
# }
|
|
#
|
|
# The JSON representation for `Empty` is an empty JSON object `{}`.
|
|
},
|
|
},
|
|
],
|
|
"auxiliaryTables": [ # Several auxiliary tables can be used in the analysis. Each custom_tag
|
|
# used to tag a quasi-identifier field must appear in exactly one
|
|
# field of one auxiliary table.
|
|
{ # An auxiliary table containing statistical information on the relative
|
|
# frequency of different quasi-identifier values. It has one or several
|
|
# quasi-identifier columns, and one column that indicates the relative
|
|
# frequency of each quasi-identifier tuple.
|
|
# If a tuple is present in the data but not in the auxiliary table, the
|
|
# corresponding relative frequency is assumed to be zero (and thus, the
|
|
# tuple is highly reidentifiable).
|
|
"relativeFrequency": { # General identifier of a data field in a storage service. # The relative frequency column must contain a floating-point number
|
|
# between 0 and 1 (inclusive). Null values are assumed to be zero.
|
|
# [required]
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
"quasiIds": [ # Quasi-identifier columns. [required]
|
|
{ # A quasi-identifier column has a custom_tag, used to know which column
|
|
# in the data corresponds to which column in the statistical model.
|
|
"field": { # General identifier of a data field in a storage service.
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
"customTag": "A String",
|
|
},
|
|
],
|
|
"table": { # Message defining the location of a BigQuery table. A table is uniquely # Auxiliary table location. [required]
|
|
# identified by its project_id, dataset_id, and table_id. Within a query
|
|
# a table is often referenced with a string in the format of:
|
|
# `<project_id>:<dataset_id>.<table_id>` or
|
|
# `<project_id>.<dataset_id>.<table_id>`.
|
|
"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
|
|
# If omitted, project ID is inferred from the API call.
|
|
"tableId": "A String", # Name of the table.
|
|
"datasetId": "A String", # Dataset ID of the table.
|
|
},
|
|
},
|
|
],
|
|
},
|
|
"categoricalStatsConfig": { # Compute categorical stats over an individual column, including
|
|
# number of distinct values and value count distribution.
|
|
"field": { # General identifier of a data field in a storage service. # Field to compute categorical stats on. All column types are
|
|
# supported except for arrays and structs. However, it may be more
|
|
# informative to use NumericalStats when the field type is supported,
|
|
# depending on the data.
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
},
|
|
"kAnonymityConfig": { # k-anonymity metric, used for analysis of reidentification risk.
|
|
"entityId": { # An entity in a dataset is a field or set of fields that correspond to a # Optional message indicating that multiple rows might be associated with a
|
|
# single individual. If the same entity_id is associated with multiple
|
|
# quasi-identifier tuples over distinct rows, we consider the entire
|
|
# collection of tuples as the composite quasi-identifier. This collection
|
|
# is a multiset: the order in which the different tuples appear in the
|
|
# dataset is ignored, but their frequency is taken into account.
|
|
#
|
|
# Important note: a maximum of 1000 rows can be associated with a single
|
|
# entity ID. If more rows are associated with the same entity ID, some
|
|
# might be ignored.
|
|
# single person. For example, in medical records the `EntityId` might be a
|
|
# patient identifier, or for financial records it might be an account
|
|
# identifier. This message is used when generalizations or analysis must take
|
|
# into account that multiple rows correspond to the same entity.
|
|
"field": { # General identifier of a data field in a storage service. # Composite key indicating which field contains the entity identifier.
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
},
|
|
"quasiIds": [ # Set of fields to compute k-anonymity over. When multiple fields are
|
|
# specified, they are considered a single composite key. Structs and
|
|
# repeated data types are not supported; however, nested fields are
|
|
# supported so long as they are not structs themselves or nested within
|
|
# a repeated field.
|
|
{ # General identifier of a data field in a storage service.
|
|
"name": "A String", # Name describing the field.
|
|
},
|
|
],
|
|
},
|
|
},
|
|
"categoricalStatsResult": { # Result of the categorical stats computation.
|
|
"valueFrequencyHistogramBuckets": [ # Histogram of value frequencies in the column.
|
|
{
|
|
"bucketValues": [ # Sample of value frequencies in this bucket. The total number of
|
|
# values returned per bucket is capped at 20.
|
|
{ # A value of a field, including its frequency.
|
|
"count": "A String", # How many times the value is contained in the field.
|
|
"value": { # Set of primitive values supported by the system. # A value contained in the field in question.
|
|
# Note that for the purposes of inspection or transformation, the number
|
|
# of bytes considered to comprise a 'Value' is based on its representation
|
|
# as a UTF-8 encoded string. For example, if 'integer_value' is set to
|
|
# 123456789, the number of bytes would be counted as 9, even though an
|
|
# int64 only holds up to 8 bytes of data.
|
|
"floatValue": 3.14,
|
|
"timestampValue": "A String",
|
|
"dayOfWeekValue": "A String",
|
|
"timeValue": { # Represents a time of day. The date and time zone are either not significant
|
|
# or are specified elsewhere. An API may choose to allow leap seconds. Related
|
|
# types are google.type.Date and `google.protobuf.Timestamp`.
|
|
"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose
|
|
# to allow the value "24:00:00" for scenarios like business closing time.
|
|
"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.
|
|
"seconds": 42, # Seconds of minutes of the time. Must normally be from 0 to 59. An API may
|
|
# allow the value 60 if it allows leap-seconds.
|
|
"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.
|
|
},
|
|
"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day
|
|
# and time zone are either specified elsewhere or are not significant. The date
|
|
# is relative to the Proleptic Gregorian Calendar. This can represent:
|
|
#
|
|
# * A full date, with non-zero year, month and day values
|
|
# * A month and day value, with a zero year, e.g. an anniversary
|
|
# * A year on its own, with zero month and day values
|
|
# * A year and month value, with a zero day, e.g. a credit card expiration date
|
|
#
|
|
# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.
|
|
"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without
|
|
# a year.
|
|
"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0
|
|
# if specifying a year by itself or a year and month where the day is not
|
|
# significant.
|
|
"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a
|
|
# month and day.
|
|
},
|
|
"stringValue": "A String",
|
|
"booleanValue": True or False,
|
|
"integerValue": "A String",
|
|
},
|
|
},
|
|
],
|
|
"bucketValueCount": "A String", # Total number of distinct values in this bucket.
|
|
"valueFrequencyUpperBound": "A String", # Upper bound on the value frequency of the values in this bucket.
|
|
"valueFrequencyLowerBound": "A String", # Lower bound on the value frequency of the values in this bucket.
|
|
"bucketSize": "A String", # Total number of values in this bucket.
|
|
},
|
|
],
|
|
},
|
|
"deltaPresenceEstimationResult": { # Result of the δ-presence computation. Note that these results are an
|
|
# estimation, not exact values.
|
|
"deltaPresenceEstimationHistogram": [ # The intervals [min_probability, max_probability) do not overlap. If a
|
|
# value doesn't correspond to any such interval, the associated frequency
|
|
# is zero. For example, the following records:
|
|
# {min_probability: 0, max_probability: 0.1, frequency: 17}
|
|
# {min_probability: 0.2, max_probability: 0.3, frequency: 42}
|
|
# {min_probability: 0.3, max_probability: 0.4, frequency: 99}
|
|
# mean that there are no records with an estimated probability in [0.1, 0.2)
|
|
# nor greater than or equal to 0.4.
|
|
{ # A DeltaPresenceEstimationHistogramBucket message with the following
|
|
# values:
|
|
# min_probability: 0.1
|
|
# max_probability: 0.2
|
|
# frequency: 42
|
|
# means that there are 42 records for which δ is in [0.1, 0.2). An
|
|
# important particular case is when min_probability = max_probability = 1:
|
|
# then, every individual who shares this quasi-identifier combination is in
|
|
# the dataset.
|
|
"bucketValues": [ # Sample of quasi-identifier tuple values in this bucket. The total
|
|
# number of classes returned per bucket is capped at 20.
|
|
{ # A tuple of values for the quasi-identifier columns.
|
|
"quasiIdsValues": [ # The quasi-identifier values.
|
|
{ # Set of primitive values supported by the system.
|
|
# Note that for the purposes of inspection or transformation, the number
|
|
# of bytes considered to comprise a 'Value' is based on its representation
|
|
# as a UTF-8 encoded string. For example, if 'integer_value' is set to
|
|
# 123456789, the number of bytes would be counted as 9, even though an
|
|
# int64 only holds up to 8 bytes of data.
|
|
"floatValue": 3.14,
|
|
"timestampValue": "A String",
|
|
"dayOfWeekValue": "A String",
|
|
"timeValue": { # Represents a time of day. The date and time zone are either not significant
|
|
# or are specified elsewhere. An API may choose to allow leap seconds. Related
|
|
# types are google.type.Date and `google.protobuf.Timestamp`.
|
|
"hours": 42, # Hours of day in 24 hour format. Should be from 0 to 23. An API may choose
|
|
# to allow the value "24:00:00" for scenarios like business closing time.
|
|
"nanos": 42, # Fractions of seconds in nanoseconds. Must be from 0 to 999,999,999.
|
|
"seconds": 42, # Seconds of the minute. Must normally be from 0 to 59. An API may
|
|
# allow the value 60 if it allows leap-seconds.
|
|
"minutes": 42, # Minutes of hour of day. Must be from 0 to 59.
|
|
},
|
|
"dateValue": { # Represents a whole or partial calendar date, e.g. a birthday. The time of day
|
|
# and time zone are either specified elsewhere or are not significant. The date
|
|
# is relative to the Proleptic Gregorian Calendar. This can represent:
|
|
#
|
|
# * A full date, with non-zero year, month and day values
|
|
# * A month and day value, with a zero year, e.g. an anniversary
|
|
# * A year on its own, with zero month and day values
|
|
# * A year and month value, with a zero day, e.g. a credit card expiration date
|
|
#
|
|
# Related types are google.type.TimeOfDay and `google.protobuf.Timestamp`.
|
|
"year": 42, # Year of date. Must be from 1 to 9999, or 0 if specifying a date without
|
|
# a year.
|
|
"day": 42, # Day of month. Must be from 1 to 31 and valid for the year and month, or 0
|
|
# if specifying a year by itself or a year and month where the day is not
|
|
# significant.
|
|
"month": 42, # Month of year. Must be from 1 to 12, or 0 if specifying a year without a
|
|
# month and day.
|
|
},
|
|
"stringValue": "A String",
|
|
"booleanValue": True or False,
|
|
"integerValue": "A String",
|
|
},
|
|
],
|
|
"estimatedProbability": 3.14, # The estimated probability that a given individual sharing these
|
|
# quasi-identifier values is in the dataset. This value, typically called
|
|
# δ, is the ratio between the number of records in the dataset with these
|
|
# quasi-identifier values, and the total number of individuals (inside
|
|
# *and* outside the dataset) with these quasi-identifier values.
|
|
# For example, if there are 15 individuals in the dataset who share the
|
|
# same quasi-identifier values, and an estimated 100 people in the entire
|
|
# population with these values, then δ is 0.15.
|
|
},
|
|
],
|
|
"bucketValueCount": "A String", # Total number of distinct quasi-identifier tuple values in this bucket.
|
|
"bucketSize": "A String", # Number of records within these probability bounds.
|
|
"maxProbability": 3.14, # Always greater than or equal to min_probability.
|
|
"minProbability": 3.14, # Between 0 and 1.
|
|
},
|
|
],
|
|
},
|
|
"requestedSourceTable": { # Input dataset to compute metrics over. Message defining the location of a BigQuery table. A table is uniquely
|
|
# identified by its project_id, dataset_id, and table_id. Within a query
|
|
# a table is often referenced with a string in the format of:
|
|
# `<project_id>:<dataset_id>.<table_id>` or
|
|
# `<project_id>.<dataset_id>.<table_id>`.
|
|
"projectId": "A String", # The Google Cloud Platform project ID of the project containing the table.
|
|
# If omitted, project ID is inferred from the API call.
|
|
"tableId": "A String", # Name of the table.
|
|
"datasetId": "A String", # Dataset ID of the table.
|
|
},
|
|
},
|
|
"state": "A String", # State of a job.
|
|
"jobTriggerName": "A String", # If created by a job trigger, the resource name of the trigger that
|
|
# instantiated the job.
|
|
"startTime": "A String", # Time when the job started.
|
|
"endTime": "A String", # Time when the job finished.
|
|
"type": "A String", # The type of job.
|
|
"createTime": "A String", # Time when the job was created.
|
|
},
|
|
],
|
|
}</pre>
|
|
</div>
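<p>As a minimal sketch of how the δ-presence histogram documented above could be read on the client side, the Python below finds the bucket whose [minProbability, maxProbability) interval contains a given probability and returns zero when no interval matches. The helper names <code>records_with_probability</code> and <code>estimated_delta</code> and the hand-written <code>sample_histogram</code> are illustrative assumptions, not part of the API. Treating minProbability = maxProbability as a match on equality is also an assumption, made so that the special case of probability 1 is found.</p>

```python
def records_with_probability(histogram, probability):
    """Return how many records fall in the bucket containing `probability`.

    `histogram` is a list of dicts shaped like the entries of
    deltaPresenceEstimationHistogram described above.
    """
    for bucket in histogram:
        low = bucket["minProbability"]
        high = bucket["maxProbability"]
        # Buckets are half-open intervals [low, high).
        in_half_open_interval = probability >= low and high > probability
        # Assumption: a bucket with low == high (for example both equal to 1)
        # matches only that exact probability.
        is_degenerate_match = low == high == probability
        if in_half_open_interval or is_degenerate_match:
            # bucketSize is documented as a string, so convert it.
            return int(bucket["bucketSize"])
    # No interval matches, so the associated frequency is zero.
    return 0


def estimated_delta(records_in_dataset, estimated_population):
    """Ratio of records in the dataset to the estimated total population."""
    return records_in_dataset / estimated_population


# Hand-written sample mirroring the three example records in the description.
sample_histogram = [
    {"minProbability": 0.0, "maxProbability": 0.1, "bucketSize": "17"},
    {"minProbability": 0.2, "maxProbability": 0.3, "bucketSize": "42"},
    {"minProbability": 0.3, "maxProbability": 0.4, "bucketSize": "99"},
]

print(records_with_probability(sample_histogram, 0.25))
print(records_with_probability(sample_histogram, 0.15))
print(estimated_delta(15, 100))
```

<p>With the sample data, a probability in the gap [0.1, 0.2) or at 0.4 and above yields zero, matching the description of the histogram. The last call reproduces the worked example of 15 records out of an estimated population of 100.</p>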
|
|
|
|
<div class="method">
|
|
<code class="details" id="list_next">list_next(previous_request, previous_response)</code>
|
|
<pre>Retrieves the next page of results.
|
|
|
|
Args:
|
|
previous_request: The request for the previous page. (required)
|
|
previous_response: The response from the request for the previous page. (required)
|
|
|
|
Returns:
|
|
A request object that you can call 'execute()' on to request the next
|
|
page. Returns None if there are no more items in the collection.
|
|
</pre>
|
|
</div>
|
|
|
|
</body></html> |