How to convert numeric character references to Unicode in Classic ASP?


I have a website in Classic ASP. I need to export some data from a database to CSV files.

Some of the data in the database is stored as numeric character references (NCRs). Each one starts with "&#", followed by the decimal Unicode code point of the character and a closing ";". In my case they encode Chinese characters. For example: &#39321;&#36771;&#29482;

How do I decode these NCRs into the actual Chinese characters in the exported CSV file, so that the characters display properly when I open the file in Excel or Google Sheets?

For example, &#39321;&#36771;&#29482; should be displayed as 香辣猪

In Excel, I can actually use the following to do the conversion:

=UNICHAR(39321)&UNICHAR(36771)&UNICHAR(29482)

However, I would like to convert the NCRs to Unicode before exporting to CSV. Is there a way to do this? What is the equivalent of UNICHAR in Classic ASP?


There is 1 best solution below


In Google Sheets, if you want to import a CSV file that contains HTML numeric character references, try:

function importCsvFromIdCodeHtml() {
  var id = '13tlu9eYb5Ty3L45_RKibsfHjOXyUxeX3'; // adapt the id to your own file id
  var csv = DriveApp.getFileById(id).getBlob().getDataAsString();
  var csvData = Utilities.parseCsv(csv);
  // Replace every &#NNNN; reference in every cell with the character it encodes
  csvData.forEach((rng, row) => {
    rng.forEach((r, col) => {
      // Declared locally; the original assigned to an implicit global
      var codes = ExtractAllRegex(r, '&#([0-9]+);', 1);
      codes.forEach(function (c) { r = r.replace(`&#${c};`, String.fromCharCode(Number(c))); });
      csvData[row][col] = r;
    });
  });
  var f = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet();
  f.getRange(1, 1, csvData.length, csvData[0].length).setValues(csvData);
}
// Returns the requested capture group from every match of pattern in input
function ExtractAllRegex(input, pattern, groupId) {
  return Array.from(input.matchAll(new RegExp(pattern, 'g')), x => x[groupId]);
}

// Minimal check on a single reference
var txt = '&#39321;';
var char = String.fromCharCode(txt.match(/&#(\d+);/)[1]);
console.log(char); // 香
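
The replacement loop above can also be written as one reusable function that does not depend on any spreadsheet service. The following is only a sketch under a few assumptions: the name `decodeNcr` is hypothetical, support for hexadecimal references such as `&#x9999;` is an addition that the question does not ask for, and `String.fromCodePoint` is used instead of `String.fromCharCode` so that code points above U+FFFF are not truncated. It is JavaScript, not VBScript; on the Classic ASP side, VBScript's `ChrW` function should be the closest counterpart to Excel's UNICHAR.

```javascript
// Hypothetical helper: decodes decimal (&#39321;) and hexadecimal (&#x9999;)
// numeric character references found anywhere in a string.
function decodeNcr(text) {
  return text.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, function (match, hex, dec) {
    var codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave the reference untouched if it lies outside the Unicode range.
    if (codePoint > 0x10FFFF) {
      return match;
    }
    return String.fromCodePoint(codePoint);
  });
}

console.log(decodeNcr('&#39321;&#36771;&#29482;')); // 香辣猪
```

Because the whole string is handled in a single `replace` call with a global pattern, repeated references and text that mixes plain characters with NCRs are converted in one pass.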