We have a web site and a WinForms application, both written in .NET 4.0, that allow users to enter any Unicode char (pretty standard).
The problem is that a small amount of our data gets submitted to an old mainframe application. During testing, a user entered a name containing a character that ended up crashing the mainframe program. The name was BOËNS; the Ë is not supported.
What is the best way to detect whether a Unicode char is supported by EBCDIC?
I tried using the following regular expression but that restricted some standard special chars (/, _, :) which are fine for the mainframe.
I would prefer a single method that validates each char, or a method that takes a string and returns true or false depending on whether the string contains chars not supported by EBCDIC.
You can escape characters in Regex using the \. So if you want to match a dot, you can do @"\.". To match /._,:[]- for example: @"[/._,:\-\[\]]". Now, EBCDIC is 8 bits, but many characters are control characters. Do you have a list of "valid" characters? I have made this pattern:
It should find "illegal" characters. If IsMatch returns true, then there is a problem. I have used this table as a reference: http://nemesis.lonestar.org/reference/telecom/codes/ebcdic.html
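A minimal sketch of a pattern of this kind, assuming the mainframe program accepts only unaccented ASCII letters, digits, space and common punctuation. The class name EbcdicCheck, the method name ContainsUnsupportedChars and the exact whitelist are assumptions for illustration; replace the whitelist with the characters your mainframe really accepts.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EbcdicCheck
{
    // Hypothetical whitelist, written as a negated character class:
    // anything NOT listed here counts as "illegal".
    // \- \[ \] and \\ are regex escapes; the double quote is appended
    // from a regular string because \" does not work inside @"...".
    private const string Pattern =
        @"[^a-zA-Z0-9 .,:;/_<>()+&!$*|%?`#@'=\-\[\]\\" + "\"]";

    private static readonly Regex Illegal = new Regex(Pattern);

    // Returns true when the string contains at least one character
    // outside the whitelist.
    public static bool ContainsUnsupportedChars(string value)
    {
        return Illegal.IsMatch(value);
    }
}
```

With this whitelist, EbcdicCheck.ContainsUnsupportedChars("BOËNS") reports a problem because Ë is not listed, while the characters /, _ and : pass.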
Note the special handling of the ". I'm using the @ at the beginning of the string to disable \ escape expansion, so I can't escape the quote with \" (inside a verbatim string it would have to be written as ""), and so I append it to the pattern at the end. To test it:
m1 is false (it's the list of all the "good" characters), m2 is true (to the other list I've added the € symbol).
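The question also asks for a check that works on a single char. The same whitelist idea can be sketched without Regex, as below. The names EbcdicChars, IsSupportedChar and AllSupported are hypothetical, and the Allowed list is again an assumption to be adjusted to what the mainframe accepts.

```csharp
using System;
using System.Linq;

public static class EbcdicChars
{
    // Hypothetical list of characters the mainframe is assumed to accept.
    private const string Allowed =
        "abcdefghijklmnopqrstuvwxyz" +
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
        "0123456789" +
        " .,:;/_<>()+&!$*|%?`#@'=\"-[]\\";

    // True when the single character is on the whitelist (ordinal comparison).
    public static bool IsSupportedChar(char c)
    {
        return Allowed.IndexOf(c) >= 0;
    }

    // True when every character of the string is on the whitelist.
    public static bool AllSupported(string value)
    {
        return value.All(c => IsSupportedChar(c));
    }
}
```

Note that AllSupported returns true for a valid string, which is the opposite sense of the IsMatch test above.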