I am unable to load this nested JSON into a data frame.
require(RJSONIO)
sample_json2 <- '[{ "_id" : { "$oid" : "50b59cd75bed76f46522c34e" }, "student_id" : 0, "class_id" : 2, "scores" : [ { "type" : "exam", "score" : 57.92947112575566 }, { "type" : "quiz", "score" : 21.24542588206755 }, { "type" : "homework", "score" : 68.19567810587429 }, { "type" : "homework", "score" : 67.95019716560351 }, { "type" : "homework", "score" : 18.81037253352722 } ] }
,{ "_id" : { "$oid" : "50b59cd75bed76f46522c34f" }, "student_id" : 0, "class_id" : 28, "scores" : [ { "type" : "exam", "score" : 39.17749400402234 }, { "type" : "quiz", "score" : 78.44172815491468 }, { "type" : "homework", "score" : 20.81782269075502 }, { "type" : "homework", "score" : 70.44520452408949 }, { "type" : "homework", "score" : 50.66616327819226 }, { "type" : "homework", "score" : 53.84983118363991 } ] }
,{ "_id" : { "$oid" : "50b59cd75bed76f46522c350" }, "student_id" : 0, "class_id" : 5, "scores" : [ { "type" : "exam", "score" : 88.22950674232497 }, { "type" : "quiz", "score" : 79.28962650427184 }, { "type" : "homework", "score" : 18.66254946562674 }, { "type" : "homework", "score" : 40.28154176513361 }, { "type" : "homework", "score" : 1.23735944117882 }, { "type" : "homework", "score" : 88.96101200683958 } ] }
,{ "_id" : { "$oid" : "50b59cd75bed76f46522c351" }, "student_id" : 0, "class_id" : 16, "scores" : [ { "type" : "exam", "score" : 59.1805667559299 }, { "type" : "quiz", "score" : 47.58960202938239 }, { "type" : "homework", "score" : 6.48470951607214 }, { "type" : "homework", "score" : 68.33519637418685 }, { "type" : "homework", "score" : 78.53068038180965 } ] }
,{ "_id" : { "$oid" : "50b59cd75bed76f46522c352" }, "student_id" : 0, "class_id" : 24, "scores" : [ { "type" : "exam", "score" : 4.444435759027499 }, { "type" : "quiz", "score" : 28.63057857803885 }, { "type" : "homework", "score" : 86.79352850434199 }, { "type" : "homework", "score" : 83.9164548767836 } ] }
,{ "_id" : { "$oid" : "50b59cd75bed76f46522c353" }, "student_id" : 0, "class_id" : 30, "scores" : [ { "type" : "exam", "score" : 14.34345947841966 }, { "type" : "quiz", "score" : 47.65945482174327 }, { "type" : "homework", "score" : 83.42772189120254 }, { "type" : "homework", "score" : 49.86812935368258 }, { "type" : "homework", "score" : 39.85525554437086 } ] }
,{ "_id" : { "$oid" : "50b59cd75bed76f46522c354" }, "student_id" : 0, "class_id" : 7, "scores" : [ { "type" : "exam", "score" : 18.20492211025179 }, { "type" : "quiz", "score" : 60.4769945611789 }, { "type" : "homework", "score" : 75.62999921143397 }, { "type" : "homework", "score" : 72.41228797373115 }, { "type" : "homework", "score" : 74.06744381708968 } ] }]'
I tried using
json_file2 <- fromJSON(sample_json2, flatten = TRUE)
for jsonlite package but it still has scores as a concatenated list.
I later tried lapply but the output in data frame is not in desired format. Its a list which will not make sense to any reader. Note that I am new to R so just following online blogs and trying to derive the output. Below is my code using lapply.
json_file2 <- fromJSON(sample_json2)
df <- lapply(json_file2, function(play){
data.frame(matrix(unlist(play, recursive = TRUE), byrow = FALSE))
})
My desired output format would be to have 5 rows with each type of score in one row. I would also prefer if I can retain all column tags in the column names. Eg: _id.$oid, student_id, class_id, type, Scores.type, scores.score. But the format below will also work just fine.
$oid, student_id class_id type score
50b59cd75bed76f46522c35 0 7 exam 75.62
50b59cd75bed76f46522c354 0 7 quiz 59.62
50b59cd75bed76f46522c354 0 7 homework 59.62
50b59cd75bed76f46522c354 0 7 Homework 59.62
Can someone please help me resolve this problem? Many thanks.
Hello and welcome to SO!
I agree with @SymbolixAU that it's easier to format your mongo query, however, in case that isn't possible, I ended up making an attempt at it.
Please note that this code is just a very ugly and dirty solution, and I'm just learning a learner of R yet too, but it should just get you started. I used the
tidyverse
in here, as you did not specify if it had to use base R. Go here in case you're not familiar with it.You'll end up with this:
Note: I had a very similar problem, that's why I'm sharing the code, but please note that this isn't encouraged by SO standards...