How to prevent Amazon SageMaker from splitting my .txt file into lines?

I want to create a labeling job for workers to label my text data. Each text file should be labeled as a single entity. SageMaker seems to split my text into lines so that each line is labeled separately, which does not make sense for my project. I used the Ground Truth option ‘Create a labeling job’ and could not find any configuration option to prevent the splitting.

There is 1 answer below

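First, replace every newline character ("\n") in your text with a <br/> tag, so that each file becomes a single line. Then create a custom labeling job; you can start from one of the predefined templates for the initial code. Inside the template tag that displays the text, add the skip_autoescape filter, for example {{ task.input.source | skip_autoescape }} (the variable name depends on your template). Without that filter the <br/> is escaped and shown literally; with it, the <br/> renders as a line break and the whole file appears as a single entity to label.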
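The newline replacement can be sketched in Python as below. This is only an illustration under a few assumptions: the helper names to_manifest_line and build_manifest are made up for this example, the "source" key assumes you pass inline text through a JSON Lines input manifest, and the text is HTML-escaped first because skip_autoescape turns off the template's own escaping.

```python
import html
import json


def to_manifest_line(text: str) -> str:
    """Return one JSON Lines entry holding a whole document as a single item."""
    # Normalize Windows and old Mac line endings to "\n" first.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    # Escape markup in the raw text, since the template will skip auto-escaping.
    escaped = html.escape(normalized, quote=False)
    # Replace every newline with a <br/> tag so the document stays on one line.
    return json.dumps({"source": escaped.replace("\n", "<br/>")})


def build_manifest(documents) -> str:
    """Join one entry per document; each document becomes exactly one line."""
    return "\n".join(to_manifest_line(doc) for doc in documents) + "\n"


if __name__ == "__main__":
    docs = ["first line\nsecond line", "another\r\ndocument"]
    print(build_manifest(docs), end="")
```

Each input document ends up on exactly one line of the output, so a line-based splitter has nothing left to split.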

See the documentation below for more details:

https://docs.aws.amazon.com/sagemaker/latest/dg/sms-custom-templates-step2.html