Using JavaScript to truncate text to a certain size (8 KB)

Question

Using JavaScript to truncate text to a certain size (8 KB)

6.6k Views Asked by Bungle At 31 July 2025 at 02:59

I'm using the Zemanta API, which accepts up to 8 KB of text per call. I'm extracting the text to send to Zemanta from Web pages using JavaScript, so I'm looking for a function that will truncate my text at exactly 8 KB.

Zemanta should do this truncation on its own (i.e., if you send it a larger string), but I need to shuttle this text around a bit before making the API call, so I want to keep the payload as small as possible.

Is it safe to assume that 8 KB of text is 8,192 characters, and to truncate accordingly? (1 byte per character; 1,024 characters per KB; 8 KB = 8,192 bytes/characters) Or, is that inaccurate or only true given certain circumstances?

Is there a more elegant way to truncate a string based on its actual file size?

Original Q&A

There are 4 best solutions below

Dominic Rodger On 04 October 2009 at 08:11

No it's not safe to assume that 8KB of text is 8192 characters, since in some character encodings, each character takes up multiple bytes.

If you're reading the data from files, can't you just grab the filesize? Or read it in in chunks of 8KB?

annakata On 04 October 2009 at 08:23

As Dominic says, character encoding is the problem - however if you can either really ensure that you'll only deal with 8-bit chars (unlikely but possible) or assume 16-bit chars and limit yourself to half the available space, i.e. 4096 chars then you could attempt this.

It's a bad idea to rely on JS for this though because it can be trivially modified or ignored and you have complications of escape chars and encoding to deal with for example. Better to use JS as a first-chance filter and use whatever server-side language you have available (which will also open up compression).

Daniel Aviv On 24 August 2018 at 01:11

You can do something like this since unescape is partially deprecated

function byteCount( string ) {
    // UTF8
    return encodeURI(string).split(/%..|./).length - 1;
}

function truncateByBytes(string, byteSize) {
    // UTF8
    if (byteCount(string) > byteSize) {
        const charsArray = string.split('');
        let truncatedStringArray = [];
        let bytesCounter = 0;
        for (let i = 0; i < charsArray.length; i++) {
            bytesCounter += byteCount(charsArray[i]);
            if (bytesCounter <= byteSize) {
                truncatedStringArray.push(charsArray[i]);
            } else {
                break;
            }
        }
        return truncatedStringArray.join('');
    }
    return string;
}

**bobince** · Accepted Answer

If you are using a single-byte encoding, yes, 8192 characters=8192 bytes. If you are using UTF-16, 8192 characters(*)=4096 bytes.

(Actually 8192 code-points, which is a slightly different thing in the face of surrogates, but let's not worry about that because JavaScript doesn't.)

If you are using UTF-8, there's a quick trick you can use to implement a UTF-8 encoder/decoder in JS with minimal code:

function toBytesUTF8(chars) {
    return unescape(encodeURIComponent(chars));
}
function fromBytesUTF8(bytes) {
    return decodeURIComponent(escape(bytes));
}

Now you can truncate with:

function truncateByBytesUTF8(chars, n) {
    var bytes= toBytesUTF8(chars).substring(0, n);
    while (true) {
        try {
            return fromBytesUTF8(bytes);
        } catch(e) {};
        bytes= bytes.substring(0, bytes.length-1);
    }
}

(The reason for the try-catch there is that if you truncate the bytes in the middle of a multibyte character sequence you'll get an invalid UTF-8 stream and decodeURIComponent will complain.)

If it's another multibyte encoding such as Shift-JIS or Big5, you're on your own.

Using JavaScript to truncate text to a certain size (8 KB)

There are 4 best solutions below

Related Questions in JAVASCRIPT

Related Questions in TEXT

Related Questions in BYTE

Related Questions in TRUNCATE

Related Questions in ZEMANTA

Trending Questions

Popular # Hahtags

Popular Questions