Fellow Programmers,
I searched the forum but couldn't find an answer to my problem.
I am trying to parse a 2 GB XML file in C using expat. Here is a snippet from my code (I have removed most of the parts that do not relate to my problem):
void main(int argc, char **argv) {
    XML_Parser p = XML_ParserCreate(NULL);
    FILE *fp;
    fp = fopen("/dev/shm/GNBIExport_XML_RT_06_11_2015_07_48_53_953_10_100_5_153.xml", "r");
    XML_UseParserAsHandlerArg(p);
    XML_SetElementHandler(p, start_hndl, end_hndl);
    XML_SetCharacterDataHandler(p, char_hndl);
    char buffer[10000000];
    memset(buffer, 0, 10000000);
    size_t file_size = 0;
    file_size = fread(buffer, sizeof(char), 10000000, fp);
    while (file_size != 0) {
        if (XML_Parse(p, buffer, strlen(buffer), XML_FALSE) == XML_STATUS_ERROR) {
            printf("Encountered error\n");
            exit(-1);
        }
        file_size = fread(buffer, sizeof(char), 10000000, fp);
    }
}
As you can see, I am reading from the file into a buffer of 10000000 bytes.
My problem is that I sometimes get a malformed XML error or a mismatched tag error. My understanding is that, because the XML file is huge, a read may put an opening tag into the buffer without its closing tag, which would explain the mismatched tag error. The malformed XML error would come from a tag being cut off in the middle of a read. For example, suppose the XML is
<Transmission> <BTSTEMPLATERSC> <attributes><TEMPLATENAME>defaultOfBTS30</TEMPLATENAME></attributes> </BTSTEMPLATERSC>
and the buffer receives only
<Transmission> <BTSTEMPLATERSC> <attributes><TEMPLATENAME>defaultOfBTS30</TEMPLATENAME></attributes> </BTSTEMPL
Then the closing BTSTEMPLATERSC tag is incomplete, which I believe causes the malformed XML error.
So, can someone please help me understand how to read a chunk of XML data correctly so that these two errors can be avoided?
Thanks,
Sarwesh
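A note on the read loop in the snippet: fread returns the number of bytes it actually read and does not NUL-terminate the buffer, so strlen(buffer) may not match the length of the current chunk. expat is a streaming parser, and XML_Parse is designed to be called repeatedly with arbitrary pieces of the document, so a tag split across two reads is not by itself an error, provided each call receives the exact byte count and the last call sets isFinal. Below is a minimal sketch of such a loop using only the C standard library. The names feed_chunks, chunk_sink, chunk_stats and count_sink are hypothetical and introduced here only for illustration; count_sink is a stand-in for the parser so that the loop can be tried without expat.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical callback type: receives one chunk, its exact byte count,
 * and a flag that is nonzero only on the last call. Returns 0 on success. */
typedef int (*chunk_sink)(const char *data, size_t len, int is_final, void *user);

/* Hypothetical helper: read fp in pieces of at most cap bytes and forward
 * each piece to sink with the count returned by fread, never strlen. */
static int feed_chunks(FILE *fp, char *buf, size_t cap, chunk_sink sink, void *user)
{
    assert(fp != NULL && buf != NULL && cap > 0 && sink != NULL);
    for (;;) {
        size_t n = fread(buf, 1, cap, fp);
        if (ferror(fp))
            return -1;                 /* read error */
        int is_final = (n < cap);      /* short read without error: end of file */
        if (sink(buf, n, is_final, user) != 0)
            return -1;                 /* sink reported a failure */
        if (is_final)
            return 0;
    }
}

/* Stand-in sink that only records what it was given. */
struct chunk_stats {
    size_t bytes;   /* total bytes received */
    size_t calls;   /* number of chunks received */
    int finals;     /* number of calls flagged as final */
};

static int count_sink(const char *data, size_t len, int is_final, void *user)
{
    struct chunk_stats *st = user;
    (void)data;                        /* contents are not inspected here */
    st->bytes += len;
    st->calls += 1;
    st->finals += is_final ? 1 : 0;
    return 0;
}
```

With expat, the sink would presumably wrap the parser call, returning nonzero when XML_Parse(p, data, (int)len, is_final) yields XML_STATUS_ERROR. If the file size is an exact multiple of the buffer size, the final call carries zero bytes with the final flag set, which XML_Parse accepts. Allocating the 10000000-byte buffer with malloc rather than on the stack is also worth considering, since a stack array of that size can exceed the default stack limit on some systems.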