Note: for now I'm going to use direct string manipulation to build my data instead of JSON, but I STILL WANT TO KNOW how to do this.
I have very large datasets I'm currently trying to categorize, but the first step is getting my data into a format that I can process.
I have an array of streams, each containing a variable-length array of XYZ points (each point is itself a 3-element array). I understand that I will need fixed-length data (with sparse support for shorter streams), so I will trim longer streams and pad shorter ones with '?' accordingly. This is ONE DATASET, NOT 21 datasets.
In its rawest form, my data is currently written out as a JSON file in the format below.
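A minimal sketch of the trim-and-pad step in C, under some assumptions: the helper name `format_stream_fixed` is hypothetical, the points are stored as a flat array of `3 * n` doubles, rows are joined with a literal backslash-n escape (the form used in the quoted `@data` rows of a relational ARFF file), and WEKA is assumed to accept `?` inside those nested rows the same way it does at the top level.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of WEKA or JSON-C).
 * Formats one stream as exactly fixed_len rows of "x,y,z":
 *   - rows past fixed_len are dropped (trim),
 *   - missing rows are written as "?,?,?" (pad).
 * xyz holds 3 * n doubles. Rows are joined with the two characters '\' 'n'.
 * Returns the number of characters written, or -1 if buf is too small. */
int format_stream_fixed(const double *xyz, size_t n, size_t fixed_len,
                        char *buf, size_t bufsz)
{
    size_t used = 0;

    assert(buf != NULL && bufsz > 0);
    buf[0] = '\0';

    for (size_t i = 0; i < fixed_len; i++) {
        /* No separator after the last row. */
        const char *sep = (i + 1 < fixed_len) ? "\\n" : "";
        int w;

        if (i < n)
            w = snprintf(buf + used, bufsz - used, "%.6f,%.6f,%.6f%s",
                         xyz[3 * i], xyz[3 * i + 1], xyz[3 * i + 2], sep);
        else
            w = snprintf(buf + used, bufsz - used, "?,?,?%s", sep);

        if (w < 0 || (size_t)w >= bufsz - used)
            return -1;              /* output would have been truncated */
        used += (size_t)w;
    }

    assert(strlen(buf) == used);
    return (int)used;
}
```

Calling it with two points and `fixed_len` of 3 yields two real rows followed by one `?,?,?` row; the caller would still need to wrap the result in double quotes before writing it into a data line.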
{
"0":
[[0.268869,-0.061725,1.466800],
[0.265376,-0.061317,1.453814],
[0.261664,-0.061190,1.439445],...1-n elements
"20": [[0.268869,-0.061725,1.466800],...21 containers/training file
[0.265376,-0.061317,1.453814],
[0.261664,-0.061190,1.439445]
This is the general structure I would use for training data.
@relation streams
@attribute 0 relational
@attribute f1 numeric
@attribute f2 numeric
@attribute f3 numeric
@end 0
@attribute 1 relational
@attribute f1 numeric
@attribute f2 numeric
@attribute f3 numeric
@end 1
@attribute 2 relational
@attribute f1 numeric
@attribute f2 numeric
@attribute f3 numeric
@end 2... (we have 21 of these)
@data
"0.268869,-0.061725,1.466800\n0.265...","0.268869,-0.0617...(sample 1)
"0.268869,-0.061725,1.466800\n0.265...","0.268869,-0.0617...(sample 2)
"0.268869,-0.061725,1.466800\n0.265...","0.268869,-0.0617...(sample 3)
... (tons of data here)
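The header above is repetitive enough to generate. The following is only a sketch in plain C (no JSON-C), assuming every stream has the same three numeric sub-attributes `f1`, `f2`, `f3` and that the relational attributes are simply named by index; `build_relational_header` is a hypothetical helper, not a WEKA or JSON-C function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: builds an ARFF header with n_streams relational
 * attributes named "0" .. "n_streams - 1", each holding three numeric
 * sub-attributes (f1, f2, f3), followed by the "@data" line.
 * Returns the header length, or -1 if buf is too small. */
int build_relational_header(char *buf, size_t bufsz,
                            const char *relation, int n_streams)
{
    size_t used;
    int w;

    assert(buf != NULL && bufsz > 0);
    assert(relation != NULL && strlen(relation) > 0);

    w = snprintf(buf, bufsz, "@relation %s\n", relation);
    if (w < 0 || (size_t)w >= bufsz)
        return -1;
    used = (size_t)w;

    for (int i = 0; i < n_streams; i++) {
        /* One relational attribute per stream, closed by a matching @end. */
        w = snprintf(buf + used, bufsz - used,
                     "@attribute %d relational\n"
                     "  @attribute f1 numeric\n"
                     "  @attribute f2 numeric\n"
                     "  @attribute f3 numeric\n"
                     "@end %d\n", i, i);
        if (w < 0 || (size_t)w >= bufsz - used)
            return -1;
        used += (size_t)w;
    }

    w = snprintf(buf + used, bufsz - used, "@data\n");
    if (w < 0 || (size_t)w >= bufsz - used)
        return -1;
    used += (size_t)w;

    return (int)used;
}
```

With `n_streams` set to 21 and a buffer of a few kilobytes, the result can be written to the output file with `fputs` before the data rows are appended.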
The problem I'm having is that I can't export a relational ARFF file to JSON to see the structure it would need. I'm building the data in C with JSON-C, so using that library to build the header as well would be cleaner (IMO) than raw string manipulation, and more useful for most of my applications because the output would be in a widely supported format.
Regarding the question "How do I use a JSON file with weka": none of the WEKA ARFF files contain multidimensional datasets. I've perused the ARFF files and converted a few to JSON to see if I could pattern-match, but I'm not having much luck.
What I did learn from those conversions is that I need a header to load the data into the WEKA GUI. Here is where I am now, with a very stripped-down example that (understandably) fails to load.
{
"header" : {
"relation" : "delta"
"attributes" : [
{
"name": "delta",
"type": "numeric",(relational? how do I subclass here?)
"class": false, (there's no documentation on the JSON type)
"weight" : 1.0,
}
]
},
"data" : [
{
"sparse" : false,
"weight" : 1.0,
"values" : [
[
[0.268869,-0.061725,1.466800],
[0.265376,-0.061317,1.453814],
[0.261664,-0.061190,1.439445],
[0.258106,-0.061153,1.423623],
[0.255281,-0.060748,1.406505],
[0.253105,-0.059812,1.388318],
[0.250796,-0.058583,1.369752],
[0.248108,-0.057399,1.351671],
[0.245563,-0.056328,1.334261],
[0.243474,-0.055272,1.316677]
],[
[0.301861,-0.056221,1.282535],
[0.302261,-0.055824,1.270375],
[0.302599,-0.055763,1.256942],
[0.303153,-0.055863,1.242172],
[0.304334,-0.055614,1.226184],
[0.305898,-0.054782,1.209144],
[0.306914,-0.053585,1.191657],
[0.307043,-0.052422,1.174524],
[0.306837,-0.051428,1.157930],
[0.306804,-0.050517,1.141103]
],[
[0.311746,-0.050597,0.997220],
[0.316743,-0.050354,0.985238],
[0.321871,-0.050400,0.972060],
[0.327290,-0.050710,0.957635],
[0.333158,-0.050946,0.942053],
[0.339085,-0.050813,0.925458],
[0.344214,-0.050331,0.908463],
[0.348190,-0.049759,0.891851],
[0.351326,-0.049144,0.875753],
[0.354042,-0.048306,0.859365]
]
]
}
]
}