We have a web site and a WinForms application written in .NET 4.0 that allow users to enter any Unicode character (pretty standard).
The problem is that a small amount of our data gets submitted to an old mainframe application. While we were testing, a user entered a name with a character that ended up crashing the mainframe program. The name was BOËNS. The Ë is not supported.
What is the best way to detect whether a Unicode char is supported by EBCDIC?
I tried using the following regular expression but that restricted some standard special chars (/, _, :) which are fine for the mainframe.
I would prefer to use one method to validate each char, or a method that takes a string and returns true or false depending on whether it contains chars not supported by EBCDIC.
You can escape characters in a Regex using the `\`. So if you want to match a dot, you can do `@"\."`. To match `/._,:[]-`, for example: `@"[/._,:\-\[\]]"`.
Now, EBCDIC is 8 bits, but many characters are control characters. Do you have a list of "valid" characters? I have made this pattern:
It should find "illegal" characters. If `IsMatch` returns true, then there is a problem. I have used this as a reference: http://nemesis.lonestar.org/reference/telecom/codes/ebcdic.html
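For illustration, a check along these lines could be sketched as follows. The class name `EbcdicCheck`, the method name `ContainsUnsupportedChars` and the exact set of allowed characters are assumptions made for this sketch, not a definitive list. Verify the set against the code page your mainframe actually uses.

```csharp
using System;
using System.Text.RegularExpressions;

static class EbcdicCheck
{
    // Negated character class: matches any char that is NOT in the whitelist.
    // The whitelist (ASCII letters, digits, space, common punctuation) is a
    // hypothetical set; adjust it to the target EBCDIC code page.
    // The double quote is appended separately because the first part is a
    // verbatim (@) string, where \" is not an escape sequence.
    private static readonly Regex Illegal = new Regex(
        @"[^a-zA-Z0-9 .<(+&!$*);/|,%_>?`:#@'=~{}\-\\\[\]" + "\"" + "]");

    // Returns true if the text contains at least one char outside the whitelist.
    public static bool ContainsUnsupportedChars(string text)
    {
        return Illegal.IsMatch(text);
    }
}
```

With this whitelist, `EbcdicCheck.ContainsUnsupportedChars("BOËNS")` is true, while a string such as `"BOENS /_:"` passes.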
Note the special handling of the `"`. I'm using the `@` at the beginning of the string to disable `\` escape expansion, so I can't escape the closing quote, and so I add it to the pattern at the end. To test it:
`m1` is `false` (it's the list of all the "good" characters), `m2` is `true` (to the other list I've added the `€` symbol).
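If you would rather validate each char individually, as the question asks, the same idea can be written without Regex. This is only a sketch: `EbcdicCharCheck`, `IsSupported`, `FindUnsupported` and the whitelist itself are hypothetical names and values chosen for illustration, and the allowed set should be checked against the real code page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class EbcdicCharCheck
{
    // Hypothetical whitelist of punctuation; ASCII letters and digits are
    // handled by range checks below. Adjust to the mainframe's code page.
    private const string AllowedPunctuation = " .<(+&!$*);/|,%_>?`:#@'=~{}-\\[]\"";

    // Validates a single char against the whitelist.
    public static bool IsSupported(char c)
    {
        bool asciiLetterOrDigit =
            (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
        return asciiLetterOrDigit || AllowedPunctuation.IndexOf(c) >= 0;
    }

    // Returns the distinct unsupported chars, in order of first appearance,
    // so they can be reported back to the user.
    public static List<char> FindUnsupported(string text)
    {
        return text.Where(c => !IsSupported(c)).Distinct().ToList();
    }
}
```

For the name from the question, `EbcdicCharCheck.FindUnsupported("BOËNS")` returns a list containing only `Ë` (assuming the precomposed character U+00CB), and an empty list means the whole string is acceptable.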