Parsing Strings with LINQ/C#

Given a string, return its encoding defined as follows:

  • First, the string is divided into the least possible number of disjoint substrings consisting of identical characters
    • for example, "aabbbc" is divided into ["aa", "bbb", "c"]
  • Next, each substring with length greater than one is replaced with a concatenation of its length and the repeating character
    • for example, substring "bbb" is replaced by "3b"
  • Finally, all the new strings are concatenated together in the same order and a new string is returned.

Example

For s = "aabbbc", the output should be
lineEncoding(s) = "2a3bc".
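The three steps in the definition can be sketched literally, one method per step, before looking at the LINQ version. This is only an illustration under stated assumptions: the names LineEncodingSketch, SplitIntoRuns, EncodeRun and Encode are made up for this example and are not part of the challenge.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, named only for this illustration.
public static class LineEncodingSketch
{
    // Step 1: split the input into maximal runs of identical characters.
    public static List<string> SplitIntoRuns(string s)
    {
        var runs = new List<string>();
        int start = 0;
        for (int i = 1; i <= s.Length; i++)
        {
            // A run ends at the end of the string or where the character changes.
            if (i == s.Length || s[i] != s[start])
            {
                runs.Add(s.Substring(start, i - start));
                start = i;
            }
        }
        return runs;
    }

    // Step 2: a run longer than one character becomes "<length><char>".
    public static string EncodeRun(string run)
    {
        return run.Length > 1 ? run.Length.ToString() + run[0] : run;
    }

    // Step 3: concatenate the encoded runs in their original order.
    public static string Encode(string s)
    {
        return string.Concat(SplitIntoRuns(s).Select(run => EncodeRun(run)));
    }
}
```

With this sketch, SplitIntoRuns("aabbbc") gives ["aa", "bbb", "c"] and Encode("aabbbc") gives "2a3bc", matching the example above.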

 

using System.Linq;
using System.Text;

string lineEncoding(string s)
{
    // Use a StringBuilder: strings are immutable, so appending to a
    // string in a loop would allocate a new string every time.
    StringBuilder sb = new StringBuilder();

    // Loop while the string still has chars left.
    while (s.Length > 0)
    {
        // Get the first char from the string.
        char c = s.Take(1).First();

        // Create a new version of the string by removing the leading run of c.
        string newS = new String(s.SkipWhile(x => x == c).ToArray());

        // Calculate the number of chars we removed.
        int v = s.Length - newS.Length;

        // Overwrite the original string with the remainder.
        s = newS;

        // Append the count to the output if it's greater than 1.
        if (v > 1)
        {
            sb.Append(v);
        }

        // Append the char itself.
        sb.Append(c);
    }

    // Return the final value.
    return sb.ToString();
}

 

The first LINQ statement you’ll see here is: s.Take(1).First();

This just takes the first char from the string. Take(1) returns an IEnumerable<char>, because a string is a sequence of chars. If you gave it an array of ints, it would return an IEnumerable<int>. Calling First() on that enumerable returns its first element. The Take(1) is not strictly needed here: s.First() or s[0] would return the same char.
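To see how the element type follows the source, here is a small sketch of the same Take(1).First() pair applied to a string and to an int array. The class and method names (TakeFirstSketch, FirstCharViaTake, FirstIntViaTake) are invented for this example.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, named only for this illustration.
public static class TakeFirstSketch
{
    public static char FirstCharViaTake(string s)
    {
        // Take(1) yields an IEnumerable<char> holding at most one element.
        IEnumerable<char> head = s.Take(1);

        // First() returns that element; on an empty sequence it throws
        // InvalidOperationException.
        return head.First();
    }

    public static int FirstIntViaTake(int[] numbers)
    {
        // Same calls on an int[]: the result is now an IEnumerable<int>.
        IEnumerable<int> head = numbers.Take(1);
        return head.First();
    }
}
```

FirstCharViaTake("aabbbc") returns 'a', and FirstIntViaTake(new[] { 7, 8, 9 }) returns 7. In lineEncoding the call is safe because the while loop only runs when s.Length > 0.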

The next LINQ statement you’ll see is: new String(s.SkipWhile(x => x == c).ToArray())

s.SkipWhile(x => x == c) returns an IEnumerable<char>, because s is a string. The (x => x == c) part is the lambda expression that it evaluates to decide when to stop skipping. In this case, it starts at the first char and keeps skipping for as long as each char equals c. From the first char that differs, everything is returned, including any later occurrences of c. Then ToArray() turns this enumerable into a char[] array, which the new String() constructor uses to build the string that remains after the leading run has been removed.
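The point that SkipWhile only removes the leading run, not every matching char, is easy to check in isolation. A minimal sketch, with SkipWhileSketch and DropLeading as made-up names for this example:

```csharp
using System.Linq;

// Hypothetical helper class, named only for this illustration.
public static class SkipWhileSketch
{
    // Drops the leading run of c from s and returns what is left.
    public static string DropLeading(string s, char c)
    {
        // SkipWhile stops testing the predicate at the first non-matching
        // char, so later occurrences of c are kept.
        return new string(s.SkipWhile(x => x == c).ToArray());
    }
}
```

DropLeading("aabaa", 'a') returns "baa", not "b". DropLeading("aabbbc", 'a') returns "bbbc", which is two chars shorter than the input, and that difference is exactly the run length that lineEncoding stores in v.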

 

Tom
