Number of Common sub sequences of two strings

2.8k Views Asked by At

Revisiting Longest Common Subsequence, I was wondering what is the number of common subsequences of 2 strings. I tried to formulate a recurrent relation : DP[i][j] represents the number of subsequences ending at max i in first string and j in second. Recurrent relation being :

DP[i][j]= DP[i-1][j-1] + DP[i][j-1] + DP[i-1][j] when A[i]==B[j] (A is my first string while B is the second)

and

DP[i][j]=DP[i-1][j]+DP[i][j-1] other wise

It is not giving correct results!

Explanation if A[i]==B[j], then , We can add A[i] to every common subsequence till i-1 and j-1 to form DP[i-1][j-1] new subsequences. Also We have to add the subsequences formed by dropping the last character from each string.

other wise we can only add the ways by dropping last characters!

If someone can correct this recurrence? Morover , I will love to see a formal proof.(Just getting into the habit of seeing a formal proof(You can provide a hint to prove it myself too.))

EDIT: I forgot to mention the base cases I was considering

DP[i][0]=1 for all i in length of A

and DP[0][j]=1 for all j in length of B

and also

DP[0][0]=1

(denoting empty subsequence)

2

There are 2 best solutions below

4
On BEST ANSWER

DP[i-1][j] and DP[i][j-1] shall have DP[i-1][j-1] common sub-sequences which will be double counted.

Change your recurrence to:

DP[i][j]= DP[i][j-1] + DP[i-1][j] when A[i]==B[j]

and

DP[i][j]=DP[i-1][j]+DP[i][j-1]-DP[i-1][j-1] other wise

Explanation:
In your original relations, I have just subtracted a term DP[i-1][j-1]. This is because DP[i][j-1] and DP[i-1][j] both include DP[i-1][j-1]. Since we are adding these two terms, the term DP[i-1][j-1] is getting double counted, so we need to subtract it once.

0
On
#define MAX_LEN 101
string s1;
string s2;
int N1;
int N2;
int dp[MAX_LEN][MAX_LEN];
int GetCommomSubsequencesCount()
{
    for (int i = 0; i <= N1; i++)//N1 is size of 1st string
    {
        for (int j = 0; j <= N2; j++)//N2 is size of 2nd string
        {
            dp[i][j] = 0;
        }
    }
    for (int i = 1; i <= N1; i++)
    {
        for (int j = 1; j <= N2; j++)
        {
            if (s1[i - 1] == s2[j - 1])
            {
                dp[i][j] = 1 + dp[i][j - 1] + dp[i - 1][j];                    
            }
            else
            {
                dp[i][j] =  dp[i][j - 1] + dp[i - 1][j] - dp[i-1][j-1];
            }
        }
    }
    return dp[N1][N2];
}

Explanation :

dp[i][j] is the number of common sub-sequences for two strings with size i and j respectively

Case 1 : s1[i-1] == s2[j-1]

All previous common sub-sequences are doubled as they get appended by one more character. Both dp[i][j-1] and dp[i-1][j] contain dp[i-1][j-1] and hence it gets added two times in our recurrence which takes care of doubling of count of all previous common sub-sequences. Addition of 1 in recurrence is for the latest character match : common sub-sequence made up of s1[i-1] and s2[j-1]

Case 2: s1[i-1] != s2[j-1]

Here we subtract dp[i-1][j-1] once because it is present in both dp[i][j - 1] and dp[i - 1][j] and gets added twice.

I reckon these recurrences are easier to absorb as they do not rely on empty sub-sequences.