Does GCC's standard library, Boost, or any other library implement iostream-compliant versions of ifstream or ofstream that support conversion between UTF-8-encoded (file) streams and a std::vector<wchar_t> or std::wstring?
UTF-8-compliant IOstreams
2.7k Views Asked by Nordlöw At
Cubbi
The C++11 solution is to wrap the UTF-8 stream's buffer in an appropriate std::wbuffer_convert:
    #include <codecvt>
    #include <fstream>
    #include <locale>   // std::wbuffer_convert
    #include <string>

    int main()
    {
        std::ifstream utf8file("test.txt"); // if the file holds UTF-8 data
        // Decode UTF-8 bytes from the file's buffer into wide characters.
        std::wbuffer_convert<std::codecvt_utf8<wchar_t>> conv(utf8file.rdbuf());
        std::wistream ucsbuf(&conv);
        std::wstring line;
        std::getline(ucsbuf, line); // then line holds UCS2 or UCS4, depending on the OS
    }
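At the time of writing, this works with Visual Studio 2010 and with clang++/libc++ but, unfortunately, not with GCC; libstdc++ only gained <codecvt> and std::wbuffer_convert in GCC 5. Note also that C++17 deprecates both <codecvt> and std::wbuffer_convert.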
Until this becomes widespread, third-party libraries are indeed the best solution.
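The question also asks about the output direction. As a minimal sketch, assuming a standard library that ships <codecvt>, the same std::wbuffer_convert wrapper can sit in front of an std::ofstream. The helper name write_utf8 is hypothetical and not part of the answer above.

```cpp
#include <codecvt>
#include <fstream>
#include <locale>   // std::wbuffer_convert
#include <ostream>
#include <string>

// Hypothetical helper: writes `text` to the file at `path` as UTF-8.
// Returns false if the file cannot be opened or a write fails.
bool write_utf8(const char* path, const std::wstring& text)
{
    std::ofstream out(path, std::ios::binary);
    if (!out)
        return false;

    // Encode wide characters as UTF-8 bytes on their way into the file's buffer.
    std::wbuffer_convert<std::codecvt_utf8<wchar_t>> conv(out.rdbuf());
    std::wostream wide(&conv);

    wide << text;
    wide.flush(); // push converted bytes into `out` before `conv` is destroyed
    return wide.good() && out.good();
}
```

Calling write_utf8("out.txt", L"z\u00DF") should leave the three bytes 7A C3 9F in out.txt. The explicit flush matters because the converting buffer may hold pending output that is not guaranteed to be written when it goes out of scope.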
Your question doesn't quite work. UTF-8 is a specific encoding, while wchar_t is a data type. Moreover, wchar_t is intended by the standard to represent the system's character set, but this is left entirely to the platform, and the standard makes no requirements.
Therefore, the correct thing to ask for is, first of all, conversion between the system's narrow, multibyte encoding and the system's fixed-width encoding held in a wide string. This functionality is provided by std::mbstowcs and std::wcstombs. There may also be a locale facet somewhere that wraps this, but that's a bit of a niche area of the library.
If you want to convert between the opaque "system's encoding" prescribed by the standard and a definite encoding prescribed by your serialized data source/sink, you need an extra library. I'd recommend Posix's iconv(), which is widely available. (The Windows API has a different approach and offers special functions for conversion.)
C++11 alleviates the issue slightly by adding an explicit family of UTF-encoded string types and literals, and presumably also transcoding facilities among those (though I've never seen them implemented by anyone).
Here's my standard response from past posts on the subject: Q1, Q2, Q3. C++11 will be a joy once it's fully available :-)
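The std::mbstowcs route mentioned in this answer can be sketched roughly as follows. The helper name widen is hypothetical. The conversion uses whatever narrow encoding the current C locale selects, so a program that wants the system's encoding would typically call std::setlocale(LC_ALL, "") first; in the default "C" locale only plain ASCII is safe to assume.

```cpp
#include <cstddef>
#include <cstdlib>   // std::mbstowcs
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: converts a string in the current C locale's narrow
// multibyte encoding to a wide string. Throws on an invalid byte sequence.
std::wstring widen(const std::string& narrow)
{
    // A multibyte string never decodes to more wide characters than it has
    // bytes, so size() + 1 (for the terminator) is always large enough.
    std::vector<wchar_t> buf(narrow.size() + 1);

    std::size_t n = std::mbstowcs(buf.data(), narrow.c_str(), buf.size());
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence for current locale");

    return std::wstring(buf.data(), n);
}
```

Note that this only goes between the locale's narrow encoding and wchar_t. It handles UTF-8 input only when the active locale happens to use UTF-8, which is why the answer recommends a separate library for a fixed, known encoding.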