I tried to use the GCP sample code for DLP.
The code comes from the official GCP documentation (no changes other than supplying my own credentials):
def deidentify_with_mask(
    project, input_str, info_types, replacement_str="REPLACEMENT_STR",
):
    """Uses the Data Loss Prevention API to deidentify sensitive data in a
    string by replacing matched input values with a value you specify.
    Args:
        project: The Google Cloud project id to use as a parent resource.
        input_str: The string to deidentify (will be treated as text).
        info_types: A list of strings representing info types to look for.
        replacement_str: The string to replace all values that match given
            info types.
    Returns:
        None; the response from the API is printed to the terminal.
    """
    import google.cloud.dlp

    # Instantiate a client
    dlp = google.cloud.dlp_v2.DlpServiceClient(credentials=credentials)

    # Convert the project id into a full resource id.
    parent = f"projects/{project}"

    # Construct inspect configuration dictionary
    inspect_config = {"info_types": [{"name": info_type} for info_type in info_types]}

    # Construct deidentify configuration dictionary
    deidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {
                    "primitive_transformation": {
                        "replace_config": {
                            "new_value": {"string_value": replacement_str}
                        }
                    }
                }
            ]
        }
    }

    # Construct item
    item = {"value": input_str}

    # Call the API
    response = dlp.deidentify_content(
        request={
            "parent": parent,
            "deidentify_config": deidentify_config,
            "inspect_config": inspect_config,
            "item": item,
        }
    )

    # Print out the results.
    print(response.item.value)
I received an error stating:
[2020-09-10 00:18:25,312] {{base_task_runner.py:101}} INFO - Job 3: Subtask task File "pandas/_libs/lib.pyx", line 2228, in pandas._libs.lib.map_infer
[2020-09-10 00:18:25,312] {{base_task_runner.py:101}} INFO - Job 3: Subtask task File "/usr/local/airflow/src/task/src_task.py", line 133, in <lambda>
[2020-09-10 00:18:25,312] {{base_task_runner.py:101}} INFO - Job 3: Subtask task info_types=info_types))
[2020-09-10 00:18:25,312] {{base_task_runner.py:101}} INFO - Job 3: Subtask task File "/usr/local/airflow/src/task/src_task.py", line 89, in deidentify_with_mask
[2020-09-10 00:18:25,312] {{base_task_runner.py:101}} INFO - Job 3: Subtask task parent = dlp.project_path(project)
[2020-09-10 00:18:25,312] {{base_task_runner.py:101}} INFO - Job 3: Subtask task AttributeError: 'DlpServiceClient' object has no attribute 'project_path'
[2020-09-10 00:18:26,263] {{logging_mixin.py:95}} INFO - [2020-09-10 00:18:26,261] {{jobs.py:2627}} INFO - Task exited with return code 1
I don't understand why I get this error: the same code works when I run it locally, but not in Airflow.
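This has been fixed!
The problem was a version incompatibility: the traceback shows the Airflow task still calling parent = dlp.project_path(project), and the DLP client library installed in Airflow has no project_path method.
The code above, which builds the parent string directly with f"projects/{project}", solved my problem.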
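
For anyone hitting the same mismatch, here is a minimal sketch of how one might compare the installed library version between environments and build the parent string in a way that works whether or not the client has project_path. The helper names installed_version and project_parent are my own, not part of the DLP library, and the sketch assumes Python 3.8+ for importlib.metadata.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_version(package="google-cloud-dlp"):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def project_parent(client, project):
    """Build the parent resource name for a project.

    Hypothetical helper: uses client.project_path() when the installed
    library provides it, and otherwise formats the string directly.
    """
    helper = getattr(client, "project_path", None)
    if callable(helper):
        return helper(project)
    return f"projects/{project}"


if __name__ == "__main__":
    # Run this both locally and inside the Airflow worker and compare.
    print("google-cloud-dlp:", installed_version())
```

Inside deidentify_with_mask, the call would then be parent = project_parent(dlp, project). Pinning the same google-cloud-dlp version in both environments is probably the cleaner fix; the fallback only helps while the two are out of sync.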