Converting a multidimensional JSON-C file to WEKA ARFF


Update: I'm going to use direct string manipulation to produce my data instead of JSON, but I STILL WANT TO KNOW how to do this.

I have very large datasets I'm currently trying to categorize, but the first step is getting my data into a format that I can process.

I have an array of streams, each containing a variable-length array of XYZ points (each point is itself a three-element array). I understand I will need a fixed-length dataset (with sparse support for shorter sequences), so I will trim long sequences and pad short ones with '?' (missing values) accordingly. This is ONE DATASET, NOT 21 datasets.
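One way the trimming and '?' padding could look, as a minimal sketch: it assumes a missing point becomes a row of three missing values (?,?,?) and that rows are joined with the literal \n escape used in the @data lines of the ARFF structure. `Point` and `format_stream` are hypothetical names, not part of JSON-C or WEKA.

```c
#include <stdio.h>
#include <string.h>

/* One XYZ point, matching one [x, y, z] triple in the JSON. */
typedef struct {
    double x, y, z;
} Point;

/*
 * Hypothetical helper: render one stream as the body of an ARFF relational
 * value with exactly `fixed_len` rows. Streams longer than `fixed_len` are
 * trimmed; shorter ones are padded with "?,?,?" rows (all three coordinates
 * missing). Rows are joined with the two-character escape \n.
 * Returns 0 on success, -1 if `out` (of size `cap`) is too small.
 */
int format_stream(const Point *pts, size_t n_pts, size_t fixed_len,
                  char *out, size_t cap)
{
    size_t len = 0;

    if (cap == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < fixed_len; i++) {
        const char *sep = (i == 0) ? "" : "\\n";  /* literal backslash-n */
        int w;

        if (i < n_pts)  /* real point: six decimals, like the JSON data */
            w = snprintf(out + len, cap - len, "%s%.6f,%.6f,%.6f",
                         sep, pts[i].x, pts[i].y, pts[i].z);
        else            /* padding: a row of missing values */
            w = snprintf(out + len, cap - len, "%s?,?,?", sep);

        if (w < 0 || (size_t)w >= cap - len)
            return -1;  /* output would have been truncated */
        len += (size_t)w;
    }
    return 0;
}
```

With `fixed_len` set to 3, a stream holding two points ends in `\n?,?,?`; with `fixed_len` set to 1, only its first point is kept.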

In its rawest form, my data is currently written to a JSON file in the format below.

{
"0":
    [[0.268869,-0.061725,1.466800],
     [0.265376,-0.061317,1.453814],
     [0.261664,-0.061190,1.439445], ... (1-n elements)
    ],
...
"20":
    [[0.268869,-0.061725,1.466800],
     [0.265376,-0.061317,1.453814],
     [0.261664,-0.061190,1.439445], ...
    ]
}
(21 containers per training file)
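A minimal sketch of how this structure might be held in C once it has been parsed (by JSON-C or by hand), assuming each set of 21 streams is one training sample, i.e. one @data row. `Point`, `Stream`, `Sample`, `N_STREAMS` and `max_stream_len` are hypothetical names. The longest stream is one candidate for the fixed length: choosing it means padding only, never trimming.

```c
#include <stddef.h>

#define N_STREAMS 21  /* keys "0" .. "20" in the JSON file */

/* One [x, y, z] triple. */
typedef struct {
    double x, y, z;
} Point;

/* One stream: a variable-length list of points, e.g. the value of "0". */
typedef struct {
    const Point *pts;
    size_t n_pts;
} Stream;

/* One training sample: all 21 streams together form ONE instance. */
typedef struct {
    Stream streams[N_STREAMS];
} Sample;

/* Length of the longest stream over all samples (0 if there are none). */
size_t max_stream_len(const Sample *samples, size_t n_samples)
{
    size_t best = 0;
    for (size_t s = 0; s < n_samples; s++)
        for (size_t k = 0; k < N_STREAMS; k++)
            if (samples[s].streams[k].n_pts > best)
                best = samples[s].streams[k].n_pts;
    return best;
}
```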

This is the general ARFF structure I would use for the training data.

@relation streams

@attribute 0 relational
    @attribute f1 numeric
    @attribute f2 numeric
    @attribute f3 numeric
@end 0
@attribute 1 relational
    @attribute f1 numeric
    @attribute f2 numeric
    @attribute f3 numeric
@end 1
@attribute 2 relational
    @attribute f1 numeric
    @attribute f2 numeric
    @attribute f3 numeric
@end 2
... (21 of these in total)

@data
"0.268869,-0.061725,1.466800\n0.265...","0.268869,-0.0617...(sample 1)
"0.268869,-0.061725,1.466800\n0.265...","0.268869,-0.0617...(sample 2)
"0.268869,-0.061725,1.466800\n0.265...","0.268869,-0.0617...(sample 3)
... (tons of data here)
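A minimal sketch of writing this header and one @data row with plain string building in C, i.e. the direct string manipulation route mentioned at the top. `appendf`, `write_arff_header` and `write_arff_row` are hypothetical helpers; the attribute names 0..n-1 and f1..f3 simply mirror the structure above, and numbers are printed with six decimals to match the JSON. This version writes each stream's points as they are, without trimming or padding.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

typedef struct { double x, y, z; } Point;
typedef struct { const Point *pts; size_t n_pts; } Stream;

/* Append formatted text at buf + *len; returns -1 if it would not fit. */
static int appendf(char *buf, size_t cap, size_t *len, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int w = vsnprintf(buf + *len, cap - *len, fmt, ap);
    va_end(ap);
    if (w < 0 || (size_t)w >= cap - *len)
        return -1;
    *len += (size_t)w;
    return 0;
}

/* Header: one relational attribute (f1, f2, f3 numeric) per stream. */
int write_arff_header(size_t n_streams, char *buf, size_t cap)
{
    size_t len = 0;
    if (cap == 0)
        return -1;
    if (appendf(buf, cap, &len, "@relation streams\n\n") < 0)
        return -1;
    for (size_t i = 0; i < n_streams; i++) {
        if (appendf(buf, cap, &len,
                    "@attribute %zu relational\n"
                    "    @attribute f1 numeric\n"
                    "    @attribute f2 numeric\n"
                    "    @attribute f3 numeric\n"
                    "@end %zu\n", i, i) < 0)
            return -1;
    }
    return appendf(buf, cap, &len, "\n@data\n");
}

/* One data row: each stream becomes a quoted value, points joined by \n. */
int write_arff_row(const Stream *streams, size_t n_streams,
                   char *buf, size_t cap)
{
    size_t len = 0;
    if (cap == 0)
        return -1;
    buf[0] = '\0';
    for (size_t s = 0; s < n_streams; s++) {
        /* comma between streams, then the opening quote */
        if (appendf(buf, cap, &len, "%s\"", s ? "," : "") < 0)
            return -1;
        for (size_t i = 0; i < streams[s].n_pts; i++) {
            const Point *p = &streams[s].pts[i];
            if (appendf(buf, cap, &len, "%s%.6f,%.6f,%.6f",
                        i ? "\\n" : "", p->x, p->y, p->z) < 0)
                return -1;
        }
        if (appendf(buf, cap, &len, "\"") < 0)  /* closing quote */
            return -1;
    }
    return appendf(buf, cap, &len, "\n");
}
```

For the real data, `write_arff_header(21, ...)` would be called once and `write_arff_row` once per sample, each buffer then written to the .arff file.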

The problem I'm having is that I can't export relational ARFF files to JSON, so I can't see what structure the JSON would need. I'm building the data in C with JSON-C, so using that library to build the header as well would be cleaner (IMO) than raw string manipulation, and more useful for most of my applications, since JSON is a widely supported format.

Regarding the question "How do I use a JSON file with weka": none of the WEKA ARFF files I have contain multidimensional datasets. I've looked through those ARFF files and converted a few to JSON to see if I could pattern-match the structure, but I'm not having much luck.

What I did learn from those conversions is that to load the data into the WEKA GUI I need a header. Here is where I am now, with a very stripped-down example that (understandably) fails to load. My open questions are about the attribute: should "type" be "relational" here, and if so, how do I declare its sub-attributes? I also can't find any documentation on the "class" field or on the JSON format in general.

{
    "header" : {
        "relation" : "delta",
        "attributes" : [
            {
                "name": "delta",
                "type": "numeric",
                "class": false,
                "weight" : 1.0
            }
        ]
    },
    "data" : [
        {
            "sparse" : false,
            "weight" : 1.0,
            "values" : [
                [
                    [0.268869,-0.061725,1.466800],
                    [0.265376,-0.061317,1.453814],
                    [0.261664,-0.061190,1.439445],
                    [0.258106,-0.061153,1.423623],
                    [0.255281,-0.060748,1.406505],
                    [0.253105,-0.059812,1.388318],
                    [0.250796,-0.058583,1.369752],
                    [0.248108,-0.057399,1.351671],
                    [0.245563,-0.056328,1.334261],
                    [0.243474,-0.055272,1.316677]
                ],[
                    [0.301861,-0.056221,1.282535],
                    [0.302261,-0.055824,1.270375],
                    [0.302599,-0.055763,1.256942],
                    [0.303153,-0.055863,1.242172],
                    [0.304334,-0.055614,1.226184],
                    [0.305898,-0.054782,1.209144],
                    [0.306914,-0.053585,1.191657],
                    [0.307043,-0.052422,1.174524],
                    [0.306837,-0.051428,1.157930],
                    [0.306804,-0.050517,1.141103]
                ],[
                    [0.311746,-0.050597,0.997220],
                    [0.316743,-0.050354,0.985238],
                    [0.321871,-0.050400,0.972060],
                    [0.327290,-0.050710,0.957635],
                    [0.333158,-0.050946,0.942053],
                    [0.339085,-0.050813,0.925458],
                    [0.344214,-0.050331,0.908463],
                    [0.348190,-0.049759,0.891851],
                    [0.351326,-0.049144,0.875753],
                    [0.354042,-0.048306,0.859365]
                ]
            ]
        }
    ]
}