Having to maintain old programs written in VB6, I find myself having this issue.
I need to find an efficient way to search a string for all characters OUTSIDE the Windows-1252 set and replace them with "_". I can do this in C#
So far I have done this by creating a string with all 1252 characters, is there a faster way?
I may have to do this for a few million records in a text file
string 1252chars = ""!\""#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶•¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖרÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿŸžœ›š™˜—–•""’’ŽŽ‹Š‰vˆ‡†…„ƒ‚€ ""
//Replace all characters not in the string above...
Have you tried to normalize the string?
string.Normalize()method is used to remove all characters that are not part of the Windows-1252 character set. https://learn.microsoft.com/de-de/dotnet/api/system.string.normalize?view=net-7.0Alternatively, you can use a loop to check each character of the string and remove the characters that are not in the Windows-1252 set using the StringBuilder class.