Consider the following code:
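import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
# Read the CSV file
df = pd.read_csv("Positif.csv") # Replace "Positif.csv" with the name of your CSV file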
# Split the 'comments;sentimentscore' column into two separate columns
df[['comments', 'sentiment_score']] = df['comments;sentimentscore'].str.split(';', expand=True)
# Display the first few rows of the DataFrame after the split
print(df.head())
# Separating features and labels
X = df['comments']
y = df['sentiment_score']
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create feature vectors using TF-IDF
tfidf_vectorizer = TfidfVectorizer(max_features=1000) # Adjust max_features as needed
X_train_tfidf = tfidf_vectorizer.fit_transform(X_train)
X_test_tfidf = tfidf_vectorizer.transform(X_test)
# Train SVM model
svm_model = SVC(kernel='linear')
svm_model.fit(X_train_tfidf, y_train)
# Predict sentiment on test data
y_pred_svm = svm_model.predict(X_test_tfidf)
# Calculate accuracy
svm_accuracy = accuracy_score(y_test, y_pred_svm)
print("Accuracy using SVM:", svm_accuracy)
When I run the program, I get the following error:
KeyError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3801 try:
-> 3802 return self._engine.get_loc(casted_key)
3803 except KeyError as err:
/usr/local/lib/python3.10/dist-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
/usr/local/lib/python3.10/dist-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'comments;sentimentscore'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-29-a1ec9eefed91> in <cell line: 5>()
3
4 # Separating column 'comments;sentimentscore' into two separated columns
----> 5 df[['comments', 'sentiment_score']] = df['comments;sentimentscore'].str.split(';', expand=True)
6
7 # Returns the first few rows of the DataFrame after the display
/usr/local/lib/python3.10/dist-packages/pandas/core/frame.py in __getitem__(self, key)
3805 if self.columns.nlevels > 1:
3806 return self._getitem_multilevel(key)
-> 3807 indexer = self.columns.get_loc(key)
3808 if is_integer(indexer):
3809 indexer = [indexer]
/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3802 return self._engine.get_loc(casted_key)
3803 except KeyError as err:
-> 3804 raise KeyError(key) from err
3805 except TypeError:
3806 # If we have a listlike key, _check_indexing_error will raise
KeyError: 'comments;sentimentscore'
How can I fix this KeyError?
I expected the code to run without errors, but it raises a KeyError instead. The CSV file has two columns, comments and sentiment_score.
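For reference, here is a minimal diagnostic sketch of how the column names could be checked before splitting. It assumes the file is either already parsed into separate columns (in which case no column named 'comments;sentimentscore' exists, which would explain the KeyError) or is semicolon-delimited and therefore read as a single joined column. The helper name load_comments and the in-memory sample data are hypothetical and only stand in for Positif.csv.

```python
import io

import pandas as pd


def load_comments(source):
    """Read a CSV and return it with the text and score in separate columns.

    Hypothetical helper: it only splits when pandas actually produced a
    single column whose header contains a semicolon.
    """
    df = pd.read_csv(source)
    # Header names sometimes carry stray whitespace; strip it before comparing.
    df.columns = df.columns.str.strip()
    print("Columns found:", df.columns.tolist())

    if len(df.columns) == 1 and ";" in df.columns[0]:
        joined = df.columns[0]
        # Take the new column names from the joined header itself.
        names = [name.strip() for name in joined.split(";", 1)]
        df[names] = df[joined].str.split(";", n=1, expand=True)
        df = df.drop(columns=[joined])
    return df


# Case 1 (assumed): the file is comma-separated, so both columns already exist.
comma_csv = io.StringIO("comments,sentiment_score\ngreat product,1\n")
print(load_comments(comma_csv).columns.tolist())

# Case 2 (assumed): the file is semicolon-separated and read as one column.
semicolon_csv = io.StringIO("comments;sentimentscore\ngreat product;1\n")
print(load_comments(semicolon_csv).columns.tolist())
```

If the first case applies, the split line can simply be dropped and df['comments'] and df['sentiment_score'] used directly. If the second applies, passing sep=';' to pd.read_csv is another option that avoids the manual split.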