Combine two columns, one from math feature extraction and the other from base BERT


I am trying to predict the score of each student answer. The answers are free text, so they contain both math operations and ordinary text. I used math feature extraction to create a new column for the mathematical operations, and I used base BERT to turn the text into tensor embeddings. Both steps produce output. Now I am trying to build a CNN, but I get an error when I try to combine the two columns (math features + line embeddings).
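
If each entry of `line_embeddings` holds token-level vectors (for example BERT's last hidden state, shape `(seq_len, 768)`), answers of different lengths cannot be stacked into one matrix. A common approach is to reduce each answer to one fixed-size vector, for example by mean pooling over the non-padding tokens. A minimal numpy sketch, assuming the token embeddings and the attention mask are available as arrays; the function name `mean_pool` and the random inputs are illustrative only:

```python
import numpy as np


def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average the token vectors of one answer, ignoring padding positions.

    token_embeddings: (seq_len, hidden) array, e.g. one answer's BERT hidden states
    attention_mask:   (seq_len,) array with 1 for real tokens and 0 for padding
    """
    mask = attention_mask.astype(np.float32)[:, None]  # (seq_len, 1)
    summed = (token_embeddings * mask).sum(axis=0)     # (hidden,)
    count = max(mask.sum(), 1.0)                       # avoid division by zero
    return summed / count


# Two answers with different token counts still give vectors of the same size.
rng = np.random.default_rng(0)
answer_a = mean_pool(rng.normal(size=(5, 768)), np.array([1, 1, 1, 0, 0]))
answer_b = mean_pool(rng.normal(size=(9, 768)), np.ones(9))
X_text = np.stack([answer_a, answer_b])  # one row per answer
print(X_text.shape)  # (2, 768)
```

If the embeddings were already pooled (for example the `[CLS]` vector per answer), this step is not needed.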

This is my code:

```python
import re

import pandas as pd
import torch
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# `data` is the DataFrame with the 'math_expressions', 'line_embeddings'
# and 'overallScoreNew' columns.

# Prepare data for CNN
# Convert 'math_expressions' and 'line_embeddings' columns to PyTorch tensors
X_math = torch.tensor([float(item)
                       for item in data['math_expressions'].apply(lambda x: [] if pd.isna(x) else x).explode().tolist()
                       if item.isdigit() or re.match(r'\d+\.\d+', item)], dtype=torch.float32)

X_text = torch.stack(data['line_embeddings']).numpy()
X_text = torch.tensor(X_text, dtype=torch.float32)

# Concatenate the two sets of embeddings
X_combined = torch.cat((X_math.view(-1, 1), X_text), dim=1)

# Standardize the input features
scaler = StandardScaler()
X_combined = scaler.fit_transform(X_combined)

y = torch.tensor(data['overallScoreNew'].values, dtype=torch.float32)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X_combined, y, test_size=0.2, random_state=42)

# Convert data to PyTorch tensors
X_train_tensor = torch.FloatTensor(X_train)
y_train_tensor = torch.FloatTensor(y_train)
```
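
In this code `StandardScaler` is fit on all rows before `train_test_split`, so the test rows leak into the mean and standard deviation used for scaling. Splitting first and fitting the scaler on the training rows only avoids that. A minimal sketch with random stand-in data; the shapes (3 math features + 4 text dimensions) are illustrative, not the real feature sizes:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in data: 10 answers, 7 combined features (illustrative only).
rng = np.random.default_rng(42)
X_combined = rng.normal(size=(10, 7)).astype(np.float32)
y = rng.uniform(0, 10, size=10).astype(np.float32)

# Split first, so the test rows play no part in computing the scaling statistics.
X_train, X_test, y_train, y_test = train_test_split(
    X_combined, y, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # learn mean/std on the training rows only
X_test = scaler.transform(X_test)        # reuse the training statistics

print(X_train.shape, X_test.shape)  # (8, 7) (2, 7)
```

The scaled arrays can then be converted for the CNN with `torch.tensor(X_train, dtype=torch.float32)`.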

Here is the error I got:

```text
X_math = torch.tensor([item for sublist in data['math_expressions'].apply(lambda x: [] if pd.isna(x) else x).explode().tolist()], dtype=torch.float32)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```
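
The error comes from `pd.isna(x)` inside the lambda: when a cell holds a list of expressions, `pd.isna` returns an element-wise boolean array, and `if` on an array with more than one element is ambiguous. Checking for a list first avoids that. A second problem follows right after it: `explode()` flattens the numbers of *all* answers into one sequence, so `X_math` gets one entry per number rather than one row per answer, and `torch.cat(..., dim=1)` then fails because the row counts differ. The sketch below builds one fixed-length math vector per answer instead. It assumes each `math_expressions` cell is a list of strings or NaN; the helper names `as_list` and `math_features` and the three summary features (count, sum and max of the numbers) are hypothetical choices, not the only option:

```python
import re

import numpy as np
import pandas as pd

NUMBER = re.compile(r"^-?\d+(?:\.\d+)?$")  # whole token is an integer or decimal, e.g. "3" or "2.5"


def as_list(cell):
    """Return the cell as a list; NaN/None becomes []. Never calls pd.isna on a list."""
    if isinstance(cell, (list, tuple, np.ndarray)):
        return list(cell)
    if cell is None or (isinstance(cell, float) and np.isnan(cell)):
        return []
    return [cell]


def math_features(cell):
    """Hypothetical fixed-length summary of one answer: [count, sum, max] of its numbers."""
    numbers = [float(tok) for tok in as_list(cell) if NUMBER.match(str(tok).strip())]
    if not numbers:
        return [0.0, 0.0, 0.0]
    return [float(len(numbers)), sum(numbers), max(numbers)]


# Reproduces the original error: pd.isna on a list returns an array, not a bool.
try:
    [] if pd.isna(["3", "4.5"]) else ["3", "4.5"]
except ValueError as err:
    print("ValueError:", err)

# Toy frame: one row per student answer (illustrative data).
data = pd.DataFrame({
    "math_expressions": [["3", "4.5", "x+1"], np.nan, ["10"]],
    "line_embeddings": [np.ones(768), np.zeros(768), np.full(768, 0.5)],
})

X_math = np.array([math_features(c) for c in data["math_expressions"]], dtype=np.float32)  # (n, 3)
X_text = np.stack(data["line_embeddings"].tolist()).astype(np.float32)                      # (n, 768)
X_combined = np.hstack([X_math, X_text])  # rows stay aligned: (n, 3 + 768)
print(X_combined.shape)  # (3, 771)
```

If the cells of `line_embeddings` are PyTorch tensors rather than numpy arrays, `torch.stack` needs a plain list, e.g. `torch.stack(list(data['line_embeddings']))`, and every tensor must have the same shape (squeeze a leading batch dimension of 1 first if there is one). The combined array can then be converted with `torch.tensor(X_combined, dtype=torch.float32)`.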