KeyError because of columns in CSV

Here is my code:

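import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Reading CSV file
df = pd.read_csv("Positif.csv")  # Replace "Positif.csv" with your CSV file name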

# Split column 'comments;sentimentscore' into two separate columns
df[['comments', 'sentiment_score']] = df['comments;sentimentscore'].str.split(';', expand=True)

# Display the first few rows of the DataFrame after splitting
print(df.head())

# Separating features and labels
X = df['comments']
y = df['sentiment_score']

# Divide data into training data and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create feature vectors using TF-IDF
tfidf_vectorizer = TfidfVectorizer(max_features=1000)  # Adjust max_features as needed
X_train_tfidf = tfidf_vectorizer.fit_transform(X_train)
X_test_tfidf = tfidf_vectorizer.transform(X_test)

# Train SVM model
svm_model = SVC(kernel='linear')
svm_model.fit(X_train_tfidf, y_train)

# Predict sentiment on test data
y_pred_svm = svm_model.predict(X_test_tfidf)

# Calculate accuracy
svm_accuracy = accuracy_score(y_test, y_pred_svm)
print("Accuracy using SVM:", svm_accuracy)

When I run the program, I get the following error:

KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3801             try:
-> 3802                 return self._engine.get_loc(casted_key)
   3803             except KeyError as err:

/usr/local/lib/python3.10/dist-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

/usr/local/lib/python3.10/dist-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'comments;sentimentscore'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-29-a1ec9eefed91> in <cell line: 5>()
      3 
      4 # Separating column 'comments;sentimentscore' into two separated columns
----> 5 df[['comments', 'sentiment_score']] = df['comments;sentimentscore'].str.split(';', expand=True)
      6 
      7 # Returns the first few rows of the DataFrame after the display

/usr/local/lib/python3.10/dist-packages/pandas/core/frame.py in __getitem__(self, key)
   3805             if self.columns.nlevels > 1:
   3806                 return self._getitem_multilevel(key)
-> 3807             indexer = self.columns.get_loc(key)
   3808             if is_integer(indexer):
   3809                 indexer = [indexer]

/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   3802                 return self._engine.get_loc(casted_key)
   3803             except KeyError as err:
-> 3804                 raise KeyError(key) from err
   3805             except TypeError:
   3806                 # If we have a listlike key, _check_indexing_error will raise

KeyError: 'comments;sentimentscore'

How can I fix this KeyError?

I expected this code to run without errors, but it raises a KeyError. The CSV file has two columns: comments and sentiment_score.
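A minimal sketch of how the column names could be checked and the file read, assuming (hypothetically) that Positif.csv is semicolon-separated with the header comments;sentiment_score. An inline StringIO sample with made-up rows stands in for the real file. pandas raises this KeyError whenever the requested label is not in df.columns; note also that the key used in the code, 'comments;sentimentscore', lacks the underscore that the column name sentiment_score has. Printing df.columns shows the actual labels, and passing sep=';' to pd.read_csv splits the columns during parsing:

```python
import io

import pandas as pd

# Hypothetical stand-in for Positif.csv: a semicolon-separated file
# whose header is "comments;sentiment_score" (sample rows are made up).
SAMPLE_CSV = "comments;sentiment_score\ngreat product;1\nnot satisfied;0\n"

# With the default sep=',' the whole header becomes ONE column.
df_default = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(list(df_default.columns))  # ['comments;sentiment_score']

# Passing sep=';' splits the columns while parsing,
# so no manual str.split step is needed afterwards.
df = pd.read_csv(io.StringIO(SAMPLE_CSV), sep=";")
# Strip stray whitespace from labels, a common source of KeyError.
df.columns = df.columns.str.strip()
print(list(df.columns))  # ['comments', 'sentiment_score']

X = df["comments"]
y = df["sentiment_score"]
```

If the file is instead comma-separated with columns comments and sentiment_score, as described above, the str.split line is unnecessary and df['comments'] and df['sentiment_score'] can be used directly.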