More than one target column in MLDataTable - Create ML framework in Swift

I want to create an MLDataTable with one feature column and three target columns using the Create ML framework. For example, assume I'm building a calendar app with a quick-event feature like the one in the native Mac Calendar app. I have a feature column, text, which contains strings like "Club game at Nehru Stadium, Chennai on Saturday Morning". I want the three target columns, title, location and time, to get the values "Club game", "Nehru Stadium, Chennai" and "24 Nov 2018, 08:00".

Also, please let me know if there are other ways to implement this using the Create ML framework.

There is 1 best solution below

You can train an MLWordTagger for this task. Create a JSON training data file in this format:

[
  {
    "tokens": [
      "Club game",
      "at",
      "Nehru Stadium Chennai",
      "on",
      "Saturday Morning"
    ],
    "labels": [
      "TITLE",
      "NONE",
      "LOCATION",
      "NONE",
      "TIME"
    ]
  },
  ... other sample records...
]
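
If the samples are easier to assemble in code than by hand, a file in this layout can be produced with Codable. This is only a sketch: the TaggedSample type and the temporary output location are hypothetical names chosen for illustration, not part of Create ML.

```swift
import Foundation

// Hypothetical record type mirroring the JSON layout above.
struct TaggedSample: Codable {
    let tokens: [String]
    let labels: [String]
}

let samples = [
    TaggedSample(
        tokens: ["Club game", "at", "Nehru Stadium Chennai", "on", "Saturday Morning"],
        labels: ["TITLE", "NONE", "LOCATION", "NONE", "TIME"]
    )
]

// Each token needs exactly one label, so check the counts before writing.
for sample in samples {
    precondition(sample.tokens.count == sample.labels.count,
                 "tokens and labels must have the same length")
}

let encoder = JSONEncoder()
encoder.outputFormatting = [.prettyPrinted]
let jsonData = try encoder.encode(samples)

// Assumed output location; replace with wherever your training file should live.
let outputURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("train.json")
try jsonData.write(to: outputURL)
```

The length check is there because a record with mismatched token and label counts is easy to produce by accident and hard to spot in a large file.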

You can train the model with the code below in a playground.

import CreateML

// Load the JSON training file, then train a tagger on its token and label columns.
let trainingData = try MLDataTable(contentsOf: URL(fileURLWithPath: "/pathto..train.json"))
let model = try MLWordTagger(trainingData: trainingData, tokenColumn: "tokens", labelColumn: "labels")

Then use this prediction method to predict a label for each token in the sentence.

func prediction(from tokens: [MLWordTagger.Token]) throws -> [String]

This method returns an array of tags for the tokens.
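
To get back to the three values asked for in the question, the predicted tags can be paired with their tokens and grouped by label. The sketch below assumes the labels array is what prediction(from:) returned for the tokens; it is hard-coded here so the snippet runs without a trained model, and the fields(from:labels:) helper is a hypothetical name, not a Create ML API.

```swift
// Hypothetical helper: join the tokens that share a label, skipping "NONE".
func fields(from tokens: [String], labels: [String]) -> [String: String] {
    var grouped: [String: [String]] = [:]
    for (token, label) in zip(tokens, labels) where label != "NONE" {
        grouped[label, default: []].append(token)
    }
    return grouped.mapValues { $0.joined(separator: " ") }
}

let tokens = ["Club game", "at", "Nehru Stadium Chennai", "on", "Saturday Morning"]

// In a playground with the trained model, this would instead be:
//   let labels = try model.prediction(from: tokens)
let labels = ["TITLE", "NONE", "LOCATION", "NONE", "TIME"]

let event = fields(from: tokens, labels: labels)
print(event["TITLE"] ?? "", event["LOCATION"] ?? "", event["TIME"] ?? "", separator: " | ")
```

Note that the TIME value is still the raw phrase from the sentence; turning it into a concrete date such as 24 Nov 2018, 08:00 is a separate parsing step.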

An alternative is to use NLTagger, which can already detect place names and organization names, but not times.

import NaturalLanguage

let text = "Club game at Nehru Stadium, Chennai on Saturday Morning."
let tagger = NLTagger(tagSchemes: [.nameType])
tagger.string = text
let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]
let tags: [NLTag] = [.personalName, .placeName, .organizationName]
tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .nameType, options: options) { tag, tokenRange in
    if let tag = tag, tags.contains(tag) {
        print("\(text[tokenRange]): \(tag.rawValue)")
    }
    return true
} 

This prints the output below, so you only need to train a model to detect the parts NLTagger does not tag, namely the time expression (and the title, which does not appear in the output either).

Nehru Stadium: PlaceName
Chennai: OrganizationName