How to assign column names?

I am writing code for topic modeling and received the error shown after the code below.

install.packages("tm")
install.packages("topicmodels")

library(tm)
library(topicmodels)

docs <- Corpus(VectorSource(c(
        "This is the first document about topic modeling.",
        "Topic modeling is a popular technique in text analysis.",
        "LDA is a common algorithm for topic modeling.",
        "Text mining is an interesting field in data science."
)))

docs <- tm_map(docs, content_transformer(tolower))     # Convert text to lowercase
docs <- tm_map(docs, removePunctuation)                # Remove punctuation
docs <- tm_map(docs, removeNumbers)                    # Remove numbers
docs <- tm_map(docs, removeWords, stopwords("english")) # Remove common English stopwords
docs <- tm_map(docs, stripWhitespace)

dtm <- DocumentTermMatrix(docs)

num_topics <- 3  
lda_model <- LDA(dtm, k = num_topics)

topic_labels <- c("Topic 1: Introduction to Topic Modeling",
                 "Topic 2: Techniques in Text Analysis",
                  "Topic 3: LDA and Data Science")

terms(lda_model, 10)  

topics <- topics(lda_model)

document_topics <- as.data.frame(topics)
colnames(document_topics) <- topic_labels
print(document_topics)


Error in names(x) <- value : 
  'names' attribute [3] must be the same length as the vector [1]

> traceback()
1: `colnames<-`(`*tmp*`, value = c("Topic 1: Introduction to Topic Modeling", 
   "Topic 2: Techniques in Text Analysis", "Topic 3: LDA and Data Science"
   ))

I am trying to print the topic names rather than just the topic numbers.

Answer from Nir Graham:

My best guess is that you are trying to do this:


(document_topics <- data.frame(
  topics = factor(topics,
                  levels = seq_along(topic_labels),
                  labels = topic_labels)
))
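
Here is a self-contained sketch of the same idea, in case it helps to try it without fitting a model. The topics vector is a made-up stand-in for topics(lda_model) (real assignments can change between LDA runs), and seq_along() is used in place of seq(), which should behave the same for a non-empty label vector:

```r
# Hypothetical stand-in for topics(lda_model): one topic number per document.
topics <- c(2L, 3L, 1L, 2L)

topic_labels <- c("Topic 1: Introduction to Topic Modeling",
                  "Topic 2: Techniques in Text Analysis",
                  "Topic 3: LDA and Data Science")

# Map each topic number to its label; the i-th label belongs to topic i.
document_topics <- data.frame(
  topics = factor(topics,
                  levels = seq_along(topic_labels),
                  labels = topic_labels)
)

print(document_topics)
```

Because the column is a factor, table(document_topics$topics) then counts documents per label, including any label that no document was assigned to.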

Answer from Mark:

The problem is that as.data.frame(topics) is:

  topics
1      2
2      3
3      1
4      2

That is, it is a one-column data frame, while topic_labels has three labels in it. Three into one doesn't go, so you get the error "Error in names(x) <- value : 'names' attribute [3] must be the same length as the vector [1]" (which, as R errors go, is actually quite descriptive).
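
A quick way to see the mismatch is to compare the two lengths directly before assigning names. This is only a sketch: the topics vector is a hypothetical stand-in for topics(lda_model), and tryCatch() is used just to capture the error message instead of stopping the script:

```r
# Hypothetical stand-in for topics(lda_model).
topics <- c(2L, 3L, 1L, 2L)

topic_labels <- c("Topic 1: Introduction to Topic Modeling",
                  "Topic 2: Techniques in Text Analysis",
                  "Topic 3: LDA and Data Science")

document_topics <- as.data.frame(topics)

n_cols   <- ncol(document_topics)   # number of columns that need a name
n_labels <- length(topic_labels)    # number of names being supplied

# Attempt the assignment and keep the error message if it fails.
msg <- tryCatch({
  colnames(document_topics) <- topic_labels
  "no error"
}, error = function(e) conditionMessage(e))

print(c(columns = n_cols, labels = n_labels))
print(msg)
```

colnames<- needs exactly one name per column, so the assignment can only succeed when the two counts are equal.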

Your options are:

  1. Pick one name and assign it to the single column in the data frame:
> data.frame(blah = topics)
  blah
1    2
2    3
3    1
4    2
  2. If you want each value to be its own column, you could transpose the data and name the columns that way:
> document_topics <- data.frame(t(topics))
> colnames(document_topics) <- c("a", "b", "c", "d")
> document_topics
  a b c d
1 2 3 1 2
  3. You could drop one of the values and use the original topic_labels vector, which has a length of 3:
> document_topics <- data.frame(t(topics))[,1:3]
> colnames(document_topics) <- topic_labels
> document_topics
  Topic 1: Introduction to Topic Modeling Topic 2: Techniques in Text Analysis
1                                       2                                    3
  Topic 3: LDA and Data Science
1                             1

Bonus option #4. If you wanted to use topics to index topic_labels, you could use:

> topic_labels[topics]
[1] "Topic 2: Techniques in Text Analysis"   
[2] "Topic 3: LDA and Data Science"          
[3] "Topic 1: Introduction to Topic Modeling"
[4] "Topic 2: Techniques in Text Analysis"   

Or as a dataframe:

> document_topics <- data.frame(topics = topic_labels[topics])
> document_topics
                                   topics
1    Topic 2: Techniques in Text Analysis
2           Topic 3: LDA and Data Science
3 Topic 1: Introduction to Topic Modeling
4    Topic 2: Techniques in Text Analysis