Difference between detect_pii_entities and contains_pii_entities in boto3 Comprehend

I am trying to understand the difference between boto3 Comprehend's detect_pii_entities and contains_pii_entities functions. I tried the following snippet:

import boto3

str_text = """
Hello Zhang Wei, I am John. Your AnyCompany Financial Services, LLC credit card account 1111-0000-1111-0008 has a minimum payment of $24.53 that is due by July 31st. Based on your autopay settings, we will withdraw your payment on the due date from your bank account number XXXXXX1111 with the routing number XXXXX0000. 

Your latest statement was mailed to 100 Main Street, Any City, WA 98121. 
After your payment is received, you will receive a confirmation text message at 206-555-0100. 
If you have questions about your bill, AnyCompany Customer Service is available by phone at 206-555-0199 or email at [email protected].
"""

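# Assumes AWS credentials and a default region are already configured.
client = boto3.client('comprehend')

# detect_pii_entities returns each PII entity with its type and character offsets.
detect_pii = client.detect_pii_entities(
    Text=str_text,
    LanguageCode='en'
)
print("detect pii: ", detect_pii)

# contains_pii_entities returns only document-level labels, without offsets.
contains_pii = client.contains_pii_entities(
    Text=str_text,
    LanguageCode='en'
)
print("contains pii: ", contains_pii)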

The output that I get is:

detect_pii:  {'Entities': [{'Score': 0.9996908903121948, 'Type': 'NAME', 'BeginOffset': 52, 'EndOffset': 61}, {'Score': 0.9999550580978394, 'Type': 'NAME', 'BeginOffset': 68, 'EndOffset': 72}, {'Score': 0.9627901911735535, 'Type': 'CREDIT_DEBIT_NUMBER', 'BeginOffset': 134, 'EndOffset': 153}, {'Score': 0.9714980125427246, 'Type': 'DATE_TIME', 'BeginOffset': 201, 'EndOffset': 210}, {'Score': 0.9999960660934448, 'Type': 'BANK_ACCOUNT_NUMBER', 'BeginOffset': 320, 'EndOffset': 330}, {'Score': 0.999988317489624, 'Type': 'BANK_ROUTING', 'BeginOffset': 355, 'EndOffset': 364}, {'Score': 0.9999522566795349, 'Type': 'ADDRESS', 'BeginOffset': 406, 'EndOffset': 441}, {'Score': 0.9999591112136841, 'Type': 'PHONE', 'BeginOffset': 525, 'EndOffset': 537}, {'Score': 0.999980092048645, 'Type': 'PHONE', 'BeginOffset': 633, 'EndOffset': 645}, {'Score': 0.9995272159576416, 'Type': 'EMAIL', 'BeginOffset': 658, 'EndOffset': 680}], 'ResponseMetadata': {'RequestId': '80d513d3-83b3-4ebc-915a-1e2c731d1eb4', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '80d513d3-83b3-4ebc-915a-1e2c731d1eb4', 'content-type': 'application/x-amz-json-1.1', 'content-length': '827', 'date': 'Fri, 04 Mar 2022 16:03:42 GMT'}, 'RetryAttempts': 0}}

contains_pii: {'Labels': [{'Name': 'DATE_TIME', 'Score': 0.9986850023269653}, {'Name': 'EMAIL', 'Score': 0.9985549449920654}, {'Name': 'BANK_ACCOUNT_NUMBER', 'Score': 0.8221991658210754}, {'Name': 'BANK_ROUTING', 'Score': 0.6654205918312073}, {'Name': 'CREDIT_DEBIT_NUMBER', 'Score': 1.0}, {'Name': 'PHONE', 'Score': 1.0}], 'ResponseMetadata': {'RequestId': 'f0361d1a-afad-4b4f-9877-fdbb5c297936', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'f0361d1a-afad-4b4f-9877-fdbb5c297936', 'content-type': 'application/x-amz-json-1.1', 'content-length': '285', 'date': 'Fri, 04 Mar 2022 16:03:42 GMT'}, 'RetryAttempts': 0}}

I see that in the second case NAME and ADDRESS are missing, and maybe some other PII labels as well. How do I get them using contains_pii_entities? The documentation suggests that NAME and ADDRESS should be available, and the Comprehend API in the console returns all of the PII labels.

Output on AWS console:

{
    "Labels": [
        {
            "Name": "EMAIL",
            "Score": 1
        },
        {
            "Name": "DATE_TIME",
            "Score": 1
        },
        {
            "Name": "NAME",
            "Score": 0.8311530351638794
        },
        {
            "Name": "BANK_ROUTING",
            "Score": 0.7879412174224854
        },
        {
            "Name": "ADDRESS",
            "Score": 0.6723417043685913
        },
        {
            "Name": "BANK_ACCOUNT_NUMBER",
            "Score": 0.6297846436500549
        },
        {
            "Name": "CREDIT_DEBIT_NUMBER",
            "Score": 1
        },
        {
            "Name": "PHONE",
            "Score": 1
        }
    ]
}

Not sure what I am missing while using the boto3 package. boto3 version used: 1.18.12
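
To make the gap concrete, here is a minimal sketch that diffs the two response shapes. The helper name missing_labels is hypothetical and not part of boto3. It only assumes the 'Entities'/'Type' and 'Labels'/'Name' keys visible in the outputs above, and the sample dicts are copies of those outputs trimmed to the fields used here.

```python
def missing_labels(detect_response, contains_response):
    """Return PII types found by detect_pii_entities that are absent
    from the labels returned by contains_pii_entities.

    Hypothetical helper for illustration; it only reads the response
    dicts and makes no AWS calls.
    """
    detected = {e["Type"] for e in detect_response.get("Entities", [])}
    labelled = {l["Name"] for l in contains_response.get("Labels", [])}
    return sorted(detected - labelled)


# Trimmed copies of the two boto3 responses shown in the question.
detect_sample = {
    "Entities": [
        {"Type": "NAME"}, {"Type": "NAME"},
        {"Type": "CREDIT_DEBIT_NUMBER"}, {"Type": "DATE_TIME"},
        {"Type": "BANK_ACCOUNT_NUMBER"}, {"Type": "BANK_ROUTING"},
        {"Type": "ADDRESS"}, {"Type": "PHONE"}, {"Type": "PHONE"},
        {"Type": "EMAIL"},
    ]
}
contains_sample = {
    "Labels": [
        {"Name": "DATE_TIME"}, {"Name": "EMAIL"},
        {"Name": "BANK_ACCOUNT_NUMBER"}, {"Name": "BANK_ROUTING"},
        {"Name": "CREDIT_DEBIT_NUMBER"}, {"Name": "PHONE"},
    ]
}

print(missing_labels(detect_sample, contains_sample))  # ['ADDRESS', 'NAME']
```

Passing the real detect_pii and contains_pii dicts from the snippet above should work the same way, since the helper ignores 'ResponseMetadata' and the score fields.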
