I am attempting to create a text network graph. I am working with pivoted survey data, and attempting to associate words from open-ended comments with associated numeric responses. I've constructed word correlations and graphed them, but am having a devil of a time associating numeric values back into the network graph. I have experience with R, but I've not had formal training/classes and I feel confident I'm missing something pretty basic right now.
I was able to successfully create a plot using the following code, assuming graph is my data frame, containing variables x (raw numeric score from the survey data), row_number (to tie individual word used back to its initial open ended comment), word, n (# of times "word" appears in the dataset), and y (average of x per word).
graph %>%
  group_by(word) %>%
  filter(n() >= 1000) %>%
  pairwise_cor(word, row_number, upper = FALSE) %>%
  filter(correlation > .09) %>%
  graph_from_data_frame() %>%
  ggraph(layout = "fr") +
  geom_edge_link(aes(edge_alpha = correlation), show.legend = FALSE) +
  geom_node_point(color = "lightblue", size = 5) +
  scale_color_gradient(low = "red", high = "green") +
  geom_node_text(aes(label = name), repel = TRUE) +
  theme_void()
The pairwise_cor function essentially reshapes the dataframe into item1, item2, and correlation, dropping all other variables, meaning my relevant color-assigning variables were dropped, so I created a correlated words dataset, and then a final_df that joins individual word average scores (y) with the correlated_words dataset:
final <- cor_df %>%
  left_join(filt_df, by = join_by(item1 == word)) %>%
  left_join(filt_df, by = join_by(item2 == word))
"final" now contains item1 (word 1), item2 (word 2), correlation, n.1, y.1, n.2, and y.2 (where n is the count of each word and y is a weird stat: the average of x, the original survey numeric score associated with that word).
With the "final" data frame, I've now attempted a multitude of ways to map either y.1 or y.2 to the color of the nodes, generally something like:
as_tbl_graph(final)
ggraph(final, layout = "fr") +
geom_node_point(aes(color = y.1), size = 5) +
geom_node_text(aes(label = name), repel = TRUE) +
scale_color_gradient(low = "red", high = "green") +
theme_void()
This is the error I receive:
Error in geom_node_point():
! Problem while computing aesthetics.
ℹ Error occurred in the 1st layer.
Caused by error in FUN():
! object 'y.1' not found
Not sure exactly where I'm going wrong, although I have been poring through the documentation for ggraph and tidygraph. I don't have a full conceptual understanding of the various layout possibilities, which I feel is likely where my issues lie (or possibly my confusion starts in the construction of the dataframe itself via as_tbl_graph?), and would really welcome any additional resources or documentation towards understanding those algorithms/customizing layouts. (I've read https://cran.r-project.org/web/packages/ggraph/vignettes/Layouts.html and all of the ggraph vignettes!)
My question, boiled down, is: how can I use a numeric variable to add a color dimension to nodes in a network graph using ggraph (or more specifically, what the heck am I doing wrong)? Thanks in advance for any help!
The first issue with your code is that you are passing the data.frame final to ggraph() instead of the tbl_graph object as_tbl_graph(final). The second issue is that, when converting to a tbl_graph, the y.1 and y.2 columns you added via the left_join become columns (features) of the edges data, not the nodes data, and are thus not available to be mapped to aesthetics in geom_node_xxx. To fix this second issue you have to convert cor_df to a tbl_graph first, then join your filt_df. This way the columns are added to the nodes data.
Note: I do only one left_join, as a second does not make sense for the nodes data. Also, I renamed the column from y to value, as I encountered a warning when using y.
Using some fake data based on the highschool dataset from ggraph:
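As a minimal sketch of the steps described above, and not necessarily the exact code of the original answer: the stand-in cor_df and filt_df built here, the random correlation and value numbers, and the seed are all assumptions for illustration only, not real survey data.

```r
library(dplyr)
library(tidygraph)
library(ggraph)

set.seed(123)

# Hypothetical stand-in for cor_df: an edge list with item1, item2, correlation.
# highschool ships with ggraph; the correlations here are random placeholders.
cor_df <- highschool %>%
  distinct(from, to) %>%
  transmute(
    item1 = as.character(from),
    item2 = as.character(to),
    correlation = runif(n())
  )

# Hypothetical stand-in for filt_df: one row per word with its average score,
# using `value` instead of `y` as the column name.
filt_df <- tibble(
  word = unique(c(cor_df$item1, cor_df$item2)),
  value = runif(length(word), min = 1, max = 5)
)

# Convert the edge list to a tbl_graph first, then join the per-word scores
# onto the nodes table (node identifiers are stored in the `name` column).
graph <- cor_df %>%
  as_tbl_graph() %>%
  activate(nodes) %>%
  left_join(filt_df, by = c("name" = "word"))

# `value` is now a node attribute, so it can be mapped in geom_node_point().
ggraph(graph, layout = "fr") +
  geom_edge_link(aes(edge_alpha = correlation), show.legend = FALSE) +
  geom_node_point(aes(color = value), size = 5) +
  geom_node_text(aes(label = name), repel = TRUE) +
  scale_color_gradient(low = "red", high = "green") +
  theme_void()
```

Printing graph shows two tables, node data and edge data; value should appear in the node table and correlation in the edge table, which is why only value can be used inside geom_node_point().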