R igraph adjacency matrix weighted graph – plot is not weighted


I am trying to plot a weighted graph of terms used in tweets. Basically, I made a term-document matrix, removed sparse terms, built an adjacency matrix of the remaining words, and would now like to plot them. I can't figure out where the problem is. I tried to do it exactly as shown at: http://www.rdatamining.com/examples/text-mining

Here's my code:

library(tm)
library(igraph)

tweet_corpus = Corpus(VectorSource(df$CONTENT))
tdm = TermDocumentMatrix(
  tweet_corpus,
  control = list(
    removePunctuation = TRUE,
    stopwords = c("hehe", "haha", stopwords_phil, stopwords("english"), stopwords("spanish")),
    removeNumbers = TRUE,
    tolower = TRUE
  )
)

m = as.matrix(tdm)
termDocMatrix <- m
termDocMatrix[5:10,1:20]
          Docs
Terms      1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
  aabutin  0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
  aad      0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
  aaf      0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
  aali     0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
  aannacm  0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
  aantukin 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0

myTdm2 <- removeSparseTerms(tdm, sparse =0.98)
m2 <- as.matrix(myTdm2)
m2[5:10,1:20]
          Docs
Terms      1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
  filipino 0 0 1 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0
  give     0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  1  1  0  0
  god      0 1 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  1  0  0
  good     0 0 0 0 0 0 0 0 0  0  0  0  0  0  1  0  0  0  0  0
  guy      0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  1  0  0  1  0
  haiyan   0 0 0 0 0 0 0 0 0  0  0  0  1  0  0  0  0  0  0  0

myTdm2
<<TermDocumentMatrix (terms: 34, documents: 27395)>>
Non-/sparse entries: 39769/891661
Sparsity           : 96%
Maximal term length: 9
Weighting          : term frequency (tf)

termDocMatrix2 <- m2
termDocMatrix2[termDocMatrix2>=1] <- 1
termMatrix2 <- termDocMatrix2 %*% t(termDocMatrix2)
termMatrix2[5:10,5:10]
          Terms
Terms      disaster give  god good guy   test
  disaster      623    6   53   11   4     19
  give            6  592   98   16   8      6
  god            53   98 2679  135  38     29
  good           11   16  135  816  21      5
  guy             4    8   38   21 637      5
  test           19    6   29    5   5    610

g2 <- graph.adjacency(termMatrix2, weighted=TRUE, mode="undirected")
g2 <- simplify(g2)
V(g2)$label <- V(g2)$name
V(g2)$degree <- degree(g2)
set.seed(3952)
layout1 <- layout.fruchterman.reingold(g2)
plot(g2, layout=layout1)
plot(g2, layout=layout.kamada.kawai)
V(g2)$label.cex <- 2.2 * V(g2)$degree / max(V(g2)$degree)+ .2
V(g2)$label.color <- rgb(0, 0, .2, .8)
V(g2)$frame.color <- NA
egam <- (log(E(g2)$weight)+.4) / max(log(E(g2)$weight)+.4)
E(g2)$color <- rgb(.5, .5, 0, egam)
E(g2)$width <- egam
plot(g2, layout=layout1)

This then looks like: [image]

But I would like to have something like this: [image]

Apparently the weighting doesn't work, but why?

Thank you guys in advance!

There is 1 best solution below

Even though your graph is weighted, the layout algorithm does not use the weights unless you explicitly tell it to do so. Try this:

layout1 <- layout.fruchterman.reingold(g2, weights=E(g2)$weight)

However, if your weights vary by orders of magnitude, it is usually better to pass the logarithm of the weights (plus some constant so that all of them are strictly positive) to the layout algorithm.
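
A minimal sketch of the log-weight idea, using a 4x4 slice of the co-occurrence matrix printed in the question rather than the full data. It uses `graph_from_adjacency_matrix` and `layout_with_fr`, the current igraph names for `graph.adjacency` and `layout.fruchterman.reingold`. The constant `+ 1` and the variable names `w` and `layout_log` are my own choices for illustration, so adjust them to your data:

```r
library(igraph)

# Small slice of the term-term matrix shown in the question
terms <- c("disaster", "give", "god", "good")
termMatrix <- matrix(
  c(623,   6,   53,  11,
      6, 592,   98,  16,
     53,  98, 2679, 135,
     11,  16,  135, 816),
  nrow = 4, byrow = TRUE, dimnames = list(terms, terms)
)

# diag = FALSE drops the self-loops on the diagonal
g <- graph_from_adjacency_matrix(termMatrix, mode = "undirected",
                                 weighted = TRUE, diag = FALSE)

# Compress the range of the weights; the added constant (an assumption)
# keeps every value strictly positive even for a weight of 1
w <- log(E(g)$weight) + 1

set.seed(3952)
layout_log <- layout_with_fr(g, weights = w)
plot(g, layout = layout_log, edge.width = w)
```

Here the raw weights range from 6 to 135, while the transformed ones stay within a narrow band, so a single heavy edge no longer dominates the layout.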