How to set random seed in sklearn_crfsuite

I am currently trying to implement an NER model using the sklearn_crfsuite library.

The training code is as follows:

for repeat in range(10):
    crf = sklearn_crfsuite.CRF(
                            algorithm='lbfgs',
                            c1=0.1,
                            c2=0.1,
                            max_iterations=100,
                            all_possible_transitions=True,
                            verbose=True
                        )
    crf.fit(X_train, y_train)
    pred_list = crf.predict(X_test)

The code runs training ten times; my goal is to observe 10 different scores and average them into a final score. However, each repeat gives the same score, even though I reinitialize the model in each loop.

The question is: how do I properly set the random seed so that each repeat gives different results?

NOTE: After shuffling the training data in each loop, I still got the same results. Finally, I changed the training algorithm from 'lbfgs' (gradient descent using the L-BFGS method) to 'l2sgd' (stochastic gradient descent with L2 regularization), and then I started to obtain different results.
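To make the note concrete: 'lbfgs' is a deterministic batch optimizer, so identical data and hyperparameters converge to the identical model on every repeat, regardless of seeding; 'l2sgd' depends on the order in which examples are visited, so a different shuffle per repeat actually changes the outcome. Below is a minimal sketch of per-repeat seeded shuffling (the `shuffled_pairs` helper and the toy data are my own illustration, not part of sklearn_crfsuite):

```python
import random

def shuffled_pairs(X, y, seed):
    """Return X and y shuffled in the same joint order, reproducibly per seed."""
    rng = random.Random(seed)
    paired = list(zip(X, y))
    rng.shuffle(paired)
    X_shuf, y_shuf = zip(*paired)
    return list(X_shuf), list(y_shuf)

# toy stand-ins for feature-dict sequences and label sequences
X_train = [[{'word': f'tok{i}'}] for i in range(12)]
y_train = [[f'LABEL-{i}'] for i in range(12)]

orders = []
for repeat in range(3):
    X_shuf, y_shuf = shuffled_pairs(X_train, y_train, seed=repeat)
    orders.append(tuple(seq[0] for seq in y_shuf))
    # crf = sklearn_crfsuite.CRF(algorithm='l2sgd', c2=0.1, max_iterations=100)
    # crf.fit(X_shuf, y_shuf)  # with 'lbfgs' this would still score identically

# the same seed reproduces the same order exactly
assert shuffled_pairs(X_train, y_train, seed=0) == shuffled_pairs(X_train, y_train, seed=0)
```

With 'l2sgd', feeding each repeat a differently shuffled copy of the training data is what produces run-to-run variation in the scores.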

There is 1 answer below.


You aren't really looking for a random seed; what you probably want is cross-validation:

You can find the full documentation here.

If you want to run 10 different iterations, you can use:

import scipy.stats
import sklearn_crfsuite
from sklearn.metrics import make_scorer
from sklearn.model_selection import RandomizedSearchCV
from sklearn_crfsuite import metrics

crf = sklearn_crfsuite.CRF(
    algorithm='lbfgs',
    max_iterations=100,
    all_possible_transitions=True,
    verbose=True
)

params_space = {
    'c1': scipy.stats.expon(scale=0.5),
    'c2': scipy.stats.expon(scale=0.05),
}

# use the same metric for evaluation
# (labels: the list of entity labels to score, defined elsewhere)
f1_scorer = make_scorer(metrics.flat_f1_score,
                        average='weighted', labels=labels)

# search
rs = RandomizedSearchCV(crf, params_space,
                        cv=10,
                        verbose=1,
                        n_jobs=-1,
                        n_iter=50,
                        scoring=f1_scorer)
rs.fit(X_train, y_train)

and you will get the best parameters in rs.best_params_.
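For completeness: after `rs.fit`, the chosen hyperparameters are in `rs.best_params_`, the mean cross-validation score in `rs.best_score_`, and a model refit on the full training set in `rs.best_estimator_`. Since the CRF training data isn't available here, the sketch below shows the same RandomizedSearchCV attributes (and its `random_state` parameter, which seeds the hyperparameter sampling) on a plain scikit-learn classifier:

```python
import scipy.stats
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=200, random_state=0)

search = RandomizedSearchCV(
    LogisticRegression(max_iter=500),
    {'C': scipy.stats.expon(scale=1.0)},  # analogous to the c1/c2 space above
    n_iter=5,
    cv=3,
    random_state=0,  # seeds the parameter sampling, making the search reproducible
)
search.fit(X, y)

print(search.best_params_)            # the best sampled hyperparameter values
best_model = search.best_estimator_   # refit on all of X, y by default
```

Setting `random_state` on the search object is the one place a seed genuinely matters here: it makes the randomized hyperparameter sampling itself reproducible across runs.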