Run-length encoding of a given sorted string

5.1k Views Asked by At

Write code for run -length encoding of a given string
Sample Input: aaaaaaaaaabcccccc
Output: a10bc6

My code:

static void Main(string[] args)
{
    string str = "aaaaaaaaaabcccccc";
    var qry = (from c in str
               group c by c into grp
               select new
               {
                   output = grp.Key.ToString() + grp.Count().ToString()
               });
    StringBuilder sb = new StringBuilder();
    foreach (var item in qry)
    {
        sb.Append(item.output);
    }
    Console.WriteLine(sb.ToString());
    Console.ReadLine();
}

However it returns:

a10b1c6

I want to remove the count for non-repeating char, here is "1" for letter 'b'.

Assume that it is a sorted string.

6

There are 6 best solutions below

2
On BEST ANSWER

Here's a simplified version:

public static void Main()
{
   string str = "aaaaaaaaaabcccccc";
    var qry = (from c in str
               group c by c into grp
               let c = grp.Count()
               select grp.Key.ToString() + (c > 1 ? c.ToString() : ""));

    Console.WriteLine(string.Join("",qry));
    Console.ReadLine();
}

You need to be careful with the bracket placement around the ternary expression, and then I used string.Join to avoid the mess with a for each loop and string builder.

9
On

add a ternary expression:

output = grp.Key + (grp.Count() > 1 ? grp.Count().ToString() : "")
0
On

You can use the conditional operator for the core issue. Another approach is to use a Lookup which is similar to a dictionary and String.Concat:

var charLook = input.ToLookup(c => c);
string result = string.Concat(charLook
    .Select(g => string.Format("{0}{1}", g.Key, g.Count()==1 ? "" : g.Count().ToString())));
2
On

Although OP did mention as an afterthought that in his/her case his source string was sorted, in general, the input to Run Length encoding won't be sorted as will lose information and can't be decompressed. Here's a take on the more general case of unsorted:

  string str = "aaaaaaaabccccccaadddddaaa"; // a8bc6a2d5a3

  // Zip the string with itself, offset by 1 character. 
  // Duplicate the last char to make strings equal length
  var pairs = str
    .Zip((str + str.Last()).Skip(1),
         (prev, current) => new { prev, current });

  // Retain a horrid mutable sequence which tracks consecutive characters
  var sequence = 0;
  var grps = pairs.GroupBy(p => 
    new { Ch = p.prev, 
          Sequence = p.current == p.prev
          ? sequence 
          : sequence++});

  // Join this together, using the other solutions to drop the count from single chars
  var rle = String.Join("", 
    grps.Select(g => g.Count() > 1
        ? g.Key.Ch.ToString() + g.Count().ToString() 
        : g.Key.Ch.ToString()));
  Console.WriteLine(rle);

Edit
I guess the number comments indicate some violation of the POLA which require explanation:

  • The string is Zipped with itself offset by one (Skip), in order to detect the boundaries of consecutive characters
  • Since Zip stops on the shortest enumeration, the last character is repeated on the shortest string to handle the final character in the string.
  • Unlike the 'sorted' RLE input string in the other answers, the Grouping key is done by the combination of character and a 'are the characters adjacent?' sequencer.
  • This sequence is rather horridly incremented within a conditional in the projection lambda of the GroupBy
  • @Jonesy's / @Tim's conditional join is used within a String.Join to reassemble the final encoded string.
1
On

I just discovered a short RegEx solution. Posting here to share, along with a great RegEx cheat sheet.

Regex.Replace(input, @"(\D)\1+", c => c.Length.ToString() + c.Value[0]);
0
On

Please check the code below, it might help:

StringBuilder sb = new StringBuilder();
string x = "aaaaaaaaaabcccccc";
char[] c = x.ToCharArray();
char[] t = c.Distinct().ToArray();
for (int i = 0; i < t.Length; i++)
{
   int count = 0;

   for (int j = 1; j < c.Length; j++)
   {  
       if (t[i] == c[j - 1])
       {
          count++;
       }
   }

   if (count > 1)
   {
       sb.Append(t[i] + count.ToString());
   }
   else
   {
       sb.Append(t[i]);
   }

}
Console.Write(sb);
Console.ReadKey();