Forward and forward-backward algorithms in CRF


In papers on CRFs (like this or this), the authors all mention the forward-backward algorithm, yet implementations on GitHub (and the basic implementation in the PyTorch tutorial) seem to use only the forward algorithm to compute the negative log-likelihood that is then optimized with SGD.

If I want to train an NER model on BiLSTM features, and the only kind of query I will make is "given a sentence, find the named entities", do I need the forward-backward algorithm? More generally, what is the difference between the two algorithms, and which one is used when?

2 Answers

Accepted answer

I think the reason only the forward algorithm is used in the PyTorch tutorial is that computing the partition function requires only a forward (or, equivalently, only a backward) pass. The forward-backward algorithm is needed when you want marginal probabilities, e.g. the probability of a particular tag at a particular position.
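For concreteness, here is a minimal log-space sketch of both recursions for a linear-chain CRF (the names `emissions` and `transitions` and the shapes are my assumptions, not the tutorial's actual code). The first function computes log Z with the forward recursion alone; the second adds the backward recursion to obtain per-position marginals.

```python
import torch

def forward_log_partition(emissions, transitions):
    """Forward algorithm only: log Z for one sequence.

    emissions: (seq_len, num_tags) scores, e.g. from a BiLSTM
    transitions: (num_tags, num_tags), transitions[i, j] = score of tag i -> tag j
    """
    seq_len, num_tags = emissions.shape
    # alpha[j] = log-sum of scores of all tag prefixes ending in tag j
    alpha = emissions[0].clone()
    for t in range(1, seq_len):
        # scores[i, j] = alpha[i] + transitions[i, j] + emissions[t, j]
        scores = alpha.unsqueeze(1) + transitions + emissions[t].unsqueeze(0)
        alpha = torch.logsumexp(scores, dim=0)  # sum out the previous tag i
    return torch.logsumexp(alpha, dim=0)  # log Z

def forward_backward_marginals(emissions, transitions):
    """Forward-backward: marginal p(y_t = j | x) for every position t."""
    seq_len, num_tags = emissions.shape
    alpha = torch.empty(seq_len, num_tags)
    beta = torch.empty(seq_len, num_tags)
    alpha[0] = emissions[0]
    for t in range(1, seq_len):
        alpha[t] = emissions[t] + torch.logsumexp(
            alpha[t - 1].unsqueeze(1) + transitions, dim=0)
    beta[-1] = 0.0
    for t in range(seq_len - 2, -1, -1):
        # sum out the next tag j
        beta[t] = torch.logsumexp(
            transitions + (emissions[t + 1] + beta[t + 1]).unsqueeze(0), dim=1)
    log_z = torch.logsumexp(alpha[-1], dim=0)
    return torch.exp(alpha + beta - log_z)  # each row sums to 1
```

For training, log Z from the forward pass (minus the score of the gold tag sequence) is all the negative log-likelihood needs, and for the "find the named entities" query at test time you would run Viterbi decoding, so neither use case requires the backward pass.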

Other answer

A BiLSTM gives you more context and potentially better results. Consider the following example:

  1. Teddy bear is a soft toy in the form of a bear.
  2. Teddy Roosevelt was the 26th president of United States.

In the first sentence "Teddy" is not a named entity, but in the second it is. A BiLSTM is better at picking this up because it looks at not only past but also future states (i.e. "bear" and "Roosevelt").

From the Wikipedia article on BiLSTMs:

"General procedures for training are as follows: For forward pass, forward states and backward states are passed first, then output neurons are passed. For backward pass, output neurons are passed first, then forward states and backward states are passed next. After forward and backward passes are done, the weights are updated"