I'm working on a project in C that reads a text file and converts it to an array of booleans.
First I read the file into a string of size n (an unsigned char array), then I use a function to convert that string to a boolean array of size n * 8. The function works perfectly, no questions on that.
I get the string from the file using this code:
unsigned char *Data_in; // define pointer to string
int i;
FILE* sp = fopen("file.txt", "r"); //open file
fseek(sp, 0, SEEK_END); // points sp to the end of file
int data_dim = ftell(sp); // Returns the position of the pointer (amount of bytes from beginning to end)
rewind(sp); // points sp to the beginning of file
Data_in = (unsigned char *) malloc ( data_dim * sizeof(unsigned char) ); //allocate memory for string
unsigned char carac; //define auxiliary variable
for(i=0; feof(sp) == 0; i++) // while end of file is not reached (0)
{
carac = fgetc(sp); //read character from file to char
Data_in[i] = carac; // put char in its corresponding position
}
//
fclose(sp); //close file
The thing is that I have a text file made with Notepad on Windows XP.
Inside it I have this 4-character string ":\n\nC" (colon, enter key, enter key, capital C).
This is what it looks like in HxD (hex editor): 3A 0D 0A 0D 0A 43.
This table makes it clearer:
character            hex     decimal   binary
:                    3A      58        0011 1010
\n (enter+newline)   0D 0A   13 10     0000 1101 0000 1010
\n (enter+newline)   0D 0A   13 10     0000 1101 0000 1010
C                    43      67        0100 0011
Now, I execute the program, which prints that part in binary, so I get:
character            hex     decimal   binary
:                    3A      58        0011 1010
(newline)            0A      10        0000 1010
(newline)            0A      10        0000 1010
C                    43      67        0100 0011
Well, now that this is shown, I ask the questions:
- Is the reading correct?
- If so, why does it take the 0Ds out?
- How does that work?
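Make the fopen binary: fopen("file.txt", "rb"). Otherwise your standard library will just eat away the \r (0x0D): in text mode on Windows, each \r\n pair read from the file is translated to a single \n.
As a side note, opening the file in binary mode also mitigates another problem where a certain byte in the middle of the file (0x1A, Ctrl-Z) looks like EOF on DOS.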
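To make the suggestion concrete, here is a minimal sketch of reading the whole file in binary mode. The helper name read_file_binary and its out_len parameter are made up for illustration and are not from the question. It also swaps the fgetc/feof loop for a single fread call, which is an assumption about what fits your program rather than a required part of the fix.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the whole file at `path` in binary mode.
   Returns a malloc'd buffer (the caller frees it) and stores the number
   of bytes read in *out_len. Returns NULL on failure. */
unsigned char *read_file_binary(const char *path, size_t *out_len)
{
    FILE *sp = fopen(path, "rb");   /* "rb": bytes are delivered untranslated */
    if (sp == NULL)
        return NULL;

    /* Find the file size the same way the question does. */
    if (fseek(sp, 0, SEEK_END) != 0) {
        fclose(sp);
        return NULL;
    }
    long size = ftell(sp);
    if (size < 0) {
        fclose(sp);
        return NULL;
    }
    rewind(sp);

    /* Allocate at least one byte so an empty file still yields a valid pointer. */
    unsigned char *data = malloc(size > 0 ? (size_t)size : 1);
    if (data == NULL) {
        fclose(sp);
        return NULL;
    }

    /* Read everything in one call; fread reports how many bytes it got. */
    size_t got = fread(data, 1, (size_t)size, sp);
    fclose(sp);

    *out_len = got;
    return data;
}
```

With the 6-byte file from the question, this should hand back all six bytes (3A 0D 0A 0D 0A 43), so the boolean array would have 6 * 8 entries instead of 4 * 8.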