What am I trying to achieve?
- I have classified (labelled) data in JSON format. I want to build a model that gives me the likelihood that new incoming data matches the existing classified data, i.e. a likelihood for each of the existing classes.
- For example, I've classified the existing data into 2 classes, `tier1` and `tier2`. When I receive new data, I want to know what percentage of it matches my existing `tier1` data and what percentage matches my `tier2` data. If it doesn't match either, I just want to get 0%.
Sample data I have:
```json
[
  {
    "type": "threat",
    "severity": "2",
    "category": "tier1"
  },
  {
    "type": "threat",
    "severity": "3",
    "category": "tier1"
  },
  {
    "type": "malware",
    "severity": "7",
    "category": "tier2"
  },
  {
    "type": "threat",
    "severity": "7",
    "category": "tier2"
  },
  {
    "type": "malware",
    "severity": "5",
    "category": "tier1"
  },
  {
    "type": "threat",
    "severity": "14",
    "category": "tier2"
  },
  {
    "type": "malware",
    "severity": "13",
    "category": "tier2"
  },
  {
    "type": "threat",
    "severity": "14",
    "category": "tier2"
  },
  {
    "type": "threat",
    "severity": "1",
    "category": "tier1"
  }
]
```
Incoming Data and my expectations
- Scenario 1: incoming data:

  ```json
  {
    "type": "foo",
    "severity": "cdsb"
  }
  ```

  Expectation: tier1: 0%, tier2: 0%
- Scenario 2: incoming data:

  ```json
  {
    "type": "threat",
    "severity": "60"
  }
  ```

  Expectation: tier1: X%, tier2: Y%
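One possible reading of "how many % matches tier1/tier2" is a posterior class probability from a probabilistic classifier, combined with an explicit rule that returns 0% for every class when the record doesn't resemble the training data at all. The sketch below assumes a hand-rolled naive Bayes model (my choice for illustration; the question doesn't name a method): `type` is treated as categorical, `severity` as a per-class Gaussian, and "not matching" is assumed to mean a non-numeric severity or a `type` that never occurs in a class. `fit` and `match_percentages` are hypothetical names.

```python
import json
import math
from collections import Counter, defaultdict
from typing import Any, Dict, List

# Training data copied from the question; severity is a string there, so it is parsed below.
SAMPLE: List[Dict[str, str]] = json.loads("""
[
  {"type": "threat",  "severity": "2",  "category": "tier1"},
  {"type": "threat",  "severity": "3",  "category": "tier1"},
  {"type": "malware", "severity": "7",  "category": "tier2"},
  {"type": "threat",  "severity": "7",  "category": "tier2"},
  {"type": "malware", "severity": "5",  "category": "tier1"},
  {"type": "threat",  "severity": "14", "category": "tier2"},
  {"type": "malware", "severity": "13", "category": "tier2"},
  {"type": "threat",  "severity": "14", "category": "tier2"},
  {"type": "threat",  "severity": "1",  "category": "tier1"}
]
""")


def fit(records: List[Dict[str, str]]) -> Dict[str, Dict[str, Any]]:
    """Per-class naive Bayes statistics: log prior, type counts, severity mean/variance."""
    by_class: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for record in records:
        by_class[record["category"]].append(record)
    model: Dict[str, Dict[str, Any]] = {}
    for cls, rows in by_class.items():
        severities = [float(r["severity"]) for r in rows]
        mean = sum(severities) / len(severities)
        # Population variance, floored so a class with identical severities can't divide by zero.
        var = max(sum((s - mean) ** 2 for s in severities) / len(severities), 1e-6)
        model[cls] = {
            "log_prior": math.log(len(rows) / len(records)),
            "type_counts": Counter(r["type"] for r in rows),
            "n": len(rows),
            "mean": mean,
            "var": var,
        }
    return model


def match_percentages(model: Dict[str, Dict[str, Any]], record: Dict[str, Any]) -> Dict[str, float]:
    """Percentage per class; every class gets 0.0 when the record matches nothing."""
    zeros = {cls: 0.0 for cls in model}
    try:
        severity = float(record["severity"])
    except (KeyError, TypeError, ValueError):
        return zeros  # missing or non-numeric severity, e.g. "cdsb"
    if not math.isfinite(severity):
        return zeros  # "inf" / "nan" parse as floats but are not usable severities
    log_scores: Dict[str, float] = {}
    for cls, stats in model.items():
        count = stats["type_counts"][record.get("type")]  # Counter returns 0 for unseen keys
        if count == 0:
            continue  # this type never occurs in this class, so its likelihood is 0
        log_p_type = math.log(count / stats["n"])
        log_p_sev = (-0.5 * math.log(2 * math.pi * stats["var"])
                     - (severity - stats["mean"]) ** 2 / (2 * stats["var"]))
        log_scores[cls] = stats["log_prior"] + log_p_type + log_p_sev
    if not log_scores:
        return zeros  # type unknown to every class
    # Normalise in log space (log-sum-exp style) so tiny likelihoods don't underflow to 0/0.
    best = max(log_scores.values())
    weights = {cls: math.exp(score - best) for cls, score in log_scores.items()}
    total = sum(weights.values())
    return {cls: 100.0 * weights.get(cls, 0.0) / total for cls in model}


if __name__ == "__main__":
    model = fit(SAMPLE)
    print(match_percentages(model, {"type": "foo", "severity": "cdsb"}))  # {'tier1': 0.0, 'tier2': 0.0}
    print(match_percentages(model, {"type": "threat", "severity": "60"}))
```

With only nine training rows, a severity of 60 lies far above both class means, so almost all of the probability lands on `tier2`. Posterior probabilities always sum to 100% across the classes that match, which is why the all-zero case is handled as a separate rule rather than coming out of the classifier itself. Library models such as scikit-learn's naive Bayes classifiers expose the same kind of per-class output through `predict_proba`.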
Some more questions:
1. What is the best approach to solving this?
2. The data I've shown here has just 2 features, but the real input data has many more fields of different types. What is the best way to extract features from it?
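For the feature-extraction question, a common starting point (an assumption here, not something the question settles) is to flatten each JSON record into a dictionary of named features: numeric values stay numeric, categorical strings become one-hot `key=value` indicators, and nested objects get dotted names. The stdlib-only sketch below does that; `extract_features`, `features_for`, `build_vocabulary` and `vectorize` are hypothetical helper names, and the `category` field is assumed to be the label.

```python
import math
from typing import Any, Dict, List, Optional


def _as_number(text: str) -> Optional[float]:
    """Parse numeric strings such as "14"; return None for text like "cdsb", "inf" or "nan"."""
    try:
        number = float(text)
    except ValueError:
        return None
    return number if math.isfinite(number) else None


def extract_features(record: Dict[str, Any], prefix: str = "") -> Dict[str, float]:
    """Flatten one JSON object into {feature_name: value}.

    numbers / numeric strings -> numeric feature "key"
    other strings             -> one-hot feature "key=value"
    booleans                  -> 0.0 / 1.0
    nested objects            -> recurse with dotted names "parent.child"
    lists of scalars          -> one "key=item" feature per element
    null                      -> skipped
    """
    features: Dict[str, float] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if value is None:
            continue
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            features[name] = 1.0 if value else 0.0
        elif isinstance(value, (int, float)):
            features[name] = float(value)
        elif isinstance(value, str):
            number = _as_number(value)
            if number is None:
                features[f"{name}={value}"] = 1.0  # categorical string -> one-hot
            else:
                features[name] = number  # numeric string such as "14"
        elif isinstance(value, dict):
            features.update(extract_features(value, prefix=f"{name}."))
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, (str, int, float, bool)):
                    features[f"{name}={item}"] = 1.0  # multi-hot over list elements
    return features


def features_for(record: Dict[str, Any], label_key: str = "category") -> Dict[str, float]:
    """Features of a training or incoming record, with the class label left out."""
    return extract_features({k: v for k, v in record.items() if k != label_key})


def build_vocabulary(rows: List[Dict[str, float]]) -> Dict[str, int]:
    """Give every feature name seen in training a fixed column index."""
    names = sorted({name for row in rows for name in row})
    return {name: index for index, name in enumerate(names)}


def vectorize(row: Dict[str, float], vocab: Dict[str, int]) -> List[float]:
    """Dense vector in vocabulary order; names not seen in training are ignored."""
    vector = [0.0] * len(vocab)
    for name, value in row.items():
        index = vocab.get(name)
        if index is not None:
            vector[index] = value
    return vector


TRAIN = [  # three records taken from the question's sample data
    {"type": "threat", "severity": "2", "category": "tier1"},
    {"type": "malware", "severity": "7", "category": "tier2"},
    {"type": "threat", "severity": "14", "category": "tier2"},
]
vocab = build_vocabulary([features_for(r) for r in TRAIN])
print(vocab)  # {'severity': 0, 'type=malware': 1, 'type=threat': 2}
print(vectorize(features_for({"type": "threat", "severity": "60"}), vocab))  # [60.0, 0.0, 1.0]
print(vectorize(features_for({"type": "foo", "severity": "cdsb"}), vocab))   # [0.0, 0.0, 0.0]
```

The vocabulary built from the training data fixes the column order, and anything not seen during training (such as `type=foo` or the non-numeric `severity=cdsb` in Scenario 1) maps to nothing, which gives one simple signal for the 0% case. If scikit-learn is an option, its `DictVectorizer` accepts a list of dicts like these and one-hot encodes string values with the same `name=value` naming.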