AWS SageMaker Ground Truth text labeling - hiding columns in the data and checking the quality of answers

I am new to SageMaker. I have a large csv dataset which I would like labelled:

sentence_id  sentence          pre_agreed_label
148392       A sentence        0
383294       Another sentence  1

For each sentence, I would like a) a yes/no binary classification in response to a question, and b) on a scale of 1-3, how obvious the classification was. I need the sentence id to map to other parts of the dataset, and will use the pre-agreed labels to assess accuracy.
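For reference, Ground Truth reads its input from a JSON Lines manifest rather than directly from a csv, with the text to label under a `source` key. A minimal sketch of the conversion, assuming a comma-separated file with the header shown above. Keeping `sentence_id` and `pre_agreed_label` as extra keys on each line is an assumption about how the ids could be carried along for later joining, and `csv_to_manifest_lines` is a made-up helper name:

```python
import csv
import io
import json


def csv_to_manifest_lines(csv_text):
    """Turn csv text into a list of JSON strings, one per manifest line."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        record = {
            # "source" holds the text that is sent for labeling.
            "source": row["sentence"],
            # Extra keys (assumption): kept only so results can be joined back.
            "sentence_id": row["sentence_id"],
            "pre_agreed_label": row["pre_agreed_label"],
        }
        lines.append(json.dumps(record))
    return lines


if __name__ == "__main__":
    sample = (
        "sentence_id,sentence,pre_agreed_label\n"
        "148392,A sentence,0\n"
        "383294,Another sentence,1\n"
    )
    print("\n".join(csv_to_manifest_lines(sample)))
```

The resulting lines would be written to a `.manifest` file in S3 and referenced when creating the labeling job.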

I have identified SageMaker GroundTruth labelling jobs as a possible way to do this. Is this the best way? In trying to set it up I have run into a few problems.

The first problem is that I can't find a way to display only the sentence column to the labellers while hiding the sentence_id and pre_agreed_label columns.
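One place where columns could be filtered is the pre-annotation Lambda of a custom labeling workflow: it receives each manifest line as `event["dataObject"]` and returns a `taskInput` object, which is what the worker template gets to see. A minimal sketch, assuming the sentence is stored under `source` and that the extra columns are stored as additional keys on the same manifest line (both assumptions about how the data was prepared); the key name `sentence` is chosen here for illustration:

```python
def lambda_handler(event, context):
    """Pre-annotation handler: pass only the sentence text to the worker UI."""
    data_object = event["dataObject"]
    # Only the keys placed in taskInput are available to the template, so
    # sentence_id and pre_agreed_label are deliberately left out.
    return {"taskInput": {"sentence": data_object["source"]}}
```

With this in place the template would refer to the text as `task.input.sentence`, and the hidden fields stay out of what the worker sees.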

The second is that there is either single labelling or multi labelling, but I would like a way to have two sets of single-selection labels:

Select one for binary classification:

  1. Yes
  2. No

Select one for difficulty of classification:

  1. Easy
  2. Medium
  3. Hard

It seems as though this can be done using custom HTML, but I don't know how to do this. The template it gives you doesn't even render.
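A rough sketch of what such a custom template might look like, generated from Python so the two questions are defined in one place. It uses the Crowd HTML Elements `crowd-form`, `crowd-radio-group` and `crowd-radio-button`, and assumes the sentence reaches the template as `task.input.sentence`. The button names (`answer_yes`, `difficulty_easy`, and so on) are invented for illustration, and this has not been checked in the Ground Truth preview:

```python
from html import escape


def radio_group(title, options):
    """Return one single-selection question as a crowd-radio-group."""
    buttons = "\n".join(
        f'    <crowd-radio-button name="{escape(name)}" value="{escape(name)}">'
        f"{escape(text)}</crowd-radio-button>"
        for name, text in options
    )
    return f"  <p>{escape(title)}</p>\n  <crowd-radio-group>\n{buttons}\n  </crowd-radio-group>"


def build_template():
    """Assemble a worker template with two independent single-choice questions."""
    groups = [
        radio_group(
            "Select one for binary classification:",
            [("answer_yes", "Yes"), ("answer_no", "No")],
        ),
        radio_group(
            "Select one for difficulty of classification:",
            [
                ("difficulty_easy", "Easy"),
                ("difficulty_medium", "Medium"),
                ("difficulty_hard", "Hard"),
            ],
        ),
    ]
    body = "\n".join(groups)
    return (
        '<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>\n'
        "<crowd-form>\n"
        # Liquid placeholder, filled in per task by Ground Truth (assumed key name).
        "  <p>{{ task.input.sentence }}</p>\n"
        f"{body}\n"
        "</crowd-form>\n"
    )


if __name__ == "__main__":
    print(build_template())
```

Opening the generated file directly in a browser would show the literal `{{ task.input.sentence }}` placeholder, since the Liquid variable is only substituted when the template is run as part of a labeling job or its preview.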

Finally, having not used Mechanical Turk before: are there ways of ensuring people take the work seriously and don't just select random answers? I can see there's an option to have several people answer the same question, but is there also a way to insert, every nth question, an obvious question for which we already have a 'pre_agreed_label', and to kick people off the task if they get it wrong? There also appears to be a maximum of $1.20 per task, which seems odd.
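To make the "known answer" idea concrete, here is a minimal sketch of a check that could be run after the job finishes, rather than during it. It assumes the responses have already been exported into `(worker_id, sentence_id, answer)` tuples, which is a hypothetical layout and not the Ground Truth output format, and the 0.8 threshold is arbitrary:

```python
from collections import defaultdict


def worker_accuracy(responses, gold):
    """Score each worker on the sentences that have a pre-agreed label.

    responses: iterable of (worker_id, sentence_id, answer) tuples.
    gold: dict mapping sentence_id to the pre-agreed label.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for worker_id, sentence_id, answer in responses:
        if sentence_id in gold:  # ignore sentences without a known answer
            total[worker_id] += 1
            correct[worker_id] += int(answer == gold[sentence_id])
    return {worker: correct[worker] / total[worker] for worker in total}


def flag_workers(responses, gold, min_accuracy=0.8):
    """Return workers whose accuracy on known-answer items is below the threshold."""
    scores = worker_accuracy(responses, gold)
    return sorted(worker for worker, acc in scores.items() if acc < min_accuracy)


if __name__ == "__main__":
    gold = {"148392": 0, "383294": 1}
    responses = [
        ("w1", "148392", 0),
        ("w1", "383294", 1),
        ("w2", "148392", 1),
        ("w2", "383294", 1),
    ]
    print(flag_workers(responses, gold))
```

Answers from flagged workers could then be discarded or down-weighted before aggregating the labels.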
