TM TF-IDF Summary Max Value is Above 1


My apologies in advance: I'm new to R and am using my school's code as a reference. I don't understand why the maximum TF-IDF value can be above 1, given that I followed the example closely and normalised the values with normalize=TRUE. I'd appreciate any help, and please let me know if more information is needed. Thank you.

# Load the tm package (provides DocumentTermMatrix, weightTfIdf and inspect)
library(tm)

# Create Document-Term Matrix
dtm_bumble <- DocumentTermMatrix(bumble)

# Find the row indices of documents that contain at least one term
ui <- unique(dtm_bumble$i)

# If dtm_bumble$i does not contain a particular row index p, then row p is empty,
# so keeping only the rows in ui drops the empty documents
new_dtm_bumble <- dtm_bumble[ui, ]

# Create Document-Term Matrix with TF-IDF values
dtm_tfidf_bumble <- weightTfIdf(new_dtm_bumble, normalize=TRUE)

# Info on DTM
inspect(new_dtm_bumble)

<<DocumentTermMatrix (documents: 84146, terms: 23016)>>
Non-/sparse entries: 645486/1936058850
Sparsity           : 100%
Maximal term length: 277
Weighting          : term frequency (tf)
Sample             :
       Terms
Docs    date good match messag money pay peopl profil swipe time
  33615    0    1     2      0     3   0     0      0     3    0
  36782    0    0     0      1     1   0     0      0     0    1
  37333    0    0     0      0     2   0     1      0     0    0
  40474    1    2     1      0     1   2     0      0     0    1
  49551    1    0     1      0     2   1     0      2     0    2
  58630    3    0     3      0     2   2     0      0     3    0
  63130    1    0    12      0     1   1     0      3     4    8
  66277    2    2     0      0     1   0     1      0     0    1
  73764    0    1     3      1     0   0     2      2     1    2
  83079    0    0     1      0     0   0     0      0     0    0

# Retrieve statistical summary of TF-IDF
summary(dtm_tfidf_bumble$v)

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
 0.01849  0.30264  0.50189  0.86867  0.91498 16.36061 
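
One likely explanation, sketched in base R rather than with tm itself: as far as I understand weightTfIdf, normalize=TRUE only divides each term count by the total number of terms in its document, so the term-frequency part is at most 1. That value is then multiplied by the inverse document frequency, log2(N / df), which has no upper bound of 1. The toy matrix and its term names below are made up for illustration, and the formula is my assumption about what tm computes.

```r
# Hypothetical toy counts: 3 documents (rows) x 3 terms (columns)
counts <- matrix(c(1, 0, 0,
                   0, 2, 1,
                   0, 1, 1),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("doc", 1:3),
                                 c("rare", "common", "other")))

# Normalised term frequency: each count divided by its document's total,
# so every value lies between 0 and 1
tf <- counts / rowSums(counts)

# Inverse document frequency: log2(number of documents / documents containing the term)
df  <- colSums(counts > 0)
idf <- log2(nrow(counts) / df)

# TF-IDF: scale each term column by its idf
tfidf <- sweep(tf, 2, idf, `*`)

max(tf)      # 1
max(tfidf)   # log2(3), about 1.585, already above 1

# Same upper bound for the matrix in the question (84146 documents):
# a document whose only term appears in no other document gets tf = 1, df = 1
log2(84146)  # about 16.36061
```

The last value matches the Max. in the summary above, which suggests that at least one document consists of a single term that occurs nowhere else in the corpus. Under this reading the result is expected behaviour and not a normalisation error: only the tf factor is bounded by 1, not the product.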