Documents likelihood with Machine Learning


What am I trying to achieve?

  • I have classified data (in JSON format). I want to build a model that reports how likely new incoming data is to belong to each of the existing classes.
  • For example, I have classified the existing data into two classes, tier1 and tier2. When I receive new data, I want to know what percentage of it matches my existing tier1 data and what percentage matches my tier2 data. If it matches neither, I just want to get 0%.

Sample data I have

[
    {
        "type": "threat",
        "severity": "2",
        "category": "tier1"
    },
    {
        "type": "threat",
        "severity": "3",
        "category": "tier1"
    },
    {
        "type": "malware",
        "severity": "7",
        "category": "tier2"
    },
    {
        "type": "threat",
        "severity": "7",
        "category": "tier2"
    },
    {
        "type": "malware",
        "severity": "5",
        "category": "tier1"
    },
    {
        "type": "threat",
        "severity": "14",
        "category": "tier2"
    },
    {
        "type": "malware",
        "severity": "13",
        "category": "tier2"
    },
    {
        "type": "threat",
        "severity": "14",
        "category": "tier2"
    },
    {
        "type": "threat",
        "severity": "1",
        "category": "tier1"
    }
]

Incoming Data and my expectations

  • Scenario 1: Incoming data:
{
    "type": "foo",
    "severity": "cdsb"
}

Expectation: tier1: 0%, tier2: 0%

  • Scenario 2: Incoming data:
{
    "type": "threat",
    "severity": "60"
}

Expectation: tier1: X%, tier2: Y%

Some more questions:

  1. What is the best approach to solving this?
  2. The data shown here has just 2 features, but the real input data has many more fields of different types. What is the best way to extract the features?
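To make the two scenarios concrete, here is a minimal sketch of one possible starting point, not a recommended final design. It assumes a naive-Bayes-style scorer: "type" is treated as a categorical feature (smoothed frequency per class) and "severity" as a numeric feature (a Gaussian per class). The names `fit` and `score` are hypothetical. The rule that returns 0% for every class when the type was never seen or the severity is not a number is also an assumption, added only to reproduce the Scenario 1 expectation. The percentages are relative between the classes, so they always sum to 100 for an accepted record.

```python
import math
from collections import Counter, defaultdict

# Training records from the sample data, as (type, severity, category).
TRAINING = [
    ("threat", "2", "tier1"), ("threat", "3", "tier1"),
    ("malware", "7", "tier2"), ("threat", "7", "tier2"),
    ("malware", "5", "tier1"), ("threat", "14", "tier2"),
    ("malware", "13", "tier2"), ("threat", "14", "tier2"),
    ("threat", "1", "tier1"),
]


def fit(records):
    """Collect per-class prior, type counts, and severity mean/variance."""
    by_class = defaultdict(list)
    for rec_type, severity, category in records:
        by_class[category].append((rec_type, float(severity)))
    model = {}
    for category, rows in by_class.items():
        severities = [s for _, s in rows]
        mean = sum(severities) / len(severities)
        var = sum((s - mean) ** 2 for s in severities) / len(severities)
        model[category] = {
            "prior": len(rows) / len(records),
            "types": Counter(t for t, _ in rows),
            "count": len(rows),
            "mean": mean,
            "var": max(var, 1e-6),  # guard against zero variance
        }
    return model


def score(model, record):
    """Return {category: percent}. All zeros if the record is rejected."""
    zeros = {category: 0.0 for category in model}
    known_types = set().union(*(m["types"] for m in model.values()))
    # Assumed rejection rule: non-numeric severity or unseen type -> 0% everywhere.
    try:
        severity = float(record.get("severity"))
    except (TypeError, ValueError):
        return zeros
    if record.get("type") not in known_types:
        return zeros
    log_scores = {}
    for category, m in model.items():
        # Laplace-smoothed probability of this type within the class.
        p_type = (m["types"][record["type"]] + 1) / (m["count"] + len(known_types))
        # Log of the Gaussian density of the severity within the class.
        log_gauss = (-0.5 * math.log(2 * math.pi * m["var"])
                     - (severity - m["mean"]) ** 2 / (2 * m["var"]))
        log_scores[category] = math.log(m["prior"]) + math.log(p_type) + log_gauss
    # Normalise in log space to avoid underflow for far-away values.
    top = max(log_scores.values())
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(weights.values())
    return {c: 100.0 * w / total for c, w in weights.items()}


model = fit(TRAINING)
print(score(model, {"type": "foo", "severity": "cdsb"}))   # Scenario 1
print(score(model, {"type": "threat", "severity": "60"}))  # Scenario 2
```

With real data that has many more fields, the same idea would need one likelihood term per field, chosen by field type. An established library would likely be a better fit than hand-written code at that point.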
