Unable to feed JSONL data to AutoML NLP - Entity Extraction

I'm trying to set up Entity Extraction in AutoML and I'm a complete beginner. My CSV file uploads successfully, but my JSONL file isn't being parsed correctly. The first line of the file contains all the training data. I haven't annotated it yet, as I intend to do so in the UI. What am I doing wrong?

PS: I used pandas to convert the data to JSONL.

There is 1 best solution below

You should use the textContent key instead of text_snippet. Compare the AI Platform docs with the Cloud Natural Language docs: the two define different import formats.

I recently ran into this issue because I was trying to use AI Platform's Natural Language with the format defined for Cloud Natural Language AutoML. I did not initially realize that these are separate products with different schemas for importing data.

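Try this JSONL schema to see if it works. It is expanded over several lines here for readability; in the actual file, each record must sit on a single line.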

{
    "textSegmentAnnotations": [
      {
        "startOffset":number,
        "endOffset":number,
        "displayName": "label"
      },
      ...
    ],
    "textContent": "inline_text"|"textGcsUri": "gcs_uri_to_file",
    "dataItemResourceLabels": {
      "aiplatform.googleapis.com/ml_use": "training|test|validation"
    }
}
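
As a rough illustration of the schema above, here is a minimal Python sketch that builds one JSONL line per document using the textContent key. The helper name to_jsonl_lines and the sample strings are hypothetical, and it assumes that items you plan to annotate later in the UI can simply omit textSegmentAnnotations; check the import docs for your product before relying on that.

```python
import json


def to_jsonl_lines(texts, ml_use=None):
    """Build one JSON record per document (hypothetical helper, not a Google API)."""
    lines = []
    for text in texts:
        # Key name taken from the schema above; annotations are left out
        # on the assumption that they will be added later in the UI.
        record = {"textContent": text}
        if ml_use is not None:
            record["dataItemResourceLabels"] = {
                "aiplatform.googleapis.com/ml_use": ml_use
            }
        # json.dumps escapes embedded newlines, so each record stays on one line.
        lines.append(json.dumps(record, ensure_ascii=False))
    return lines


if __name__ == "__main__":
    docs = ["First training document.", "Second document,\nwith a line break."]
    for line in to_jsonl_lines(docs, ml_use="training"):
        print(line)
```

Writing the returned lines to a file, separated by newline characters, gives one document per line, which avoids ending up with all of the training data on the first line of the file.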