I'm looking for a way to convert an XML file to a CSV file with the tag names as the headers. However, the XML file contains some duplicate tags, which makes parsing the file and finding elements difficult. My problem is with the <File> tags.
Here's a snippet of the file I'm trying to convert. Bear in mind that the number of <File> tags is dynamic: a document can have 0 or even 10 of them.
<Program>
  <ProgramName>Enrolled Nursing Assistant</ProgramName>
  <Category>Nursing</Category>
  <Credential>Enrolled Nurse</Credential>
  <ProgramLevel>-</ProgramLevel>
  <StartDate>27-09-2004</StartDate>
  <CompletionDate>05-05-2006</CompletionDate>
  <Institution>
    <InstitutionType>College</InstitutionType>
    <SchoolName>EXCELSIOR COMMUNITY COLLEGE(MAIN)</SchoolName>
    <PrimaryLanguage>English</PrimaryLanguage>
    <LanguageOfInstruction>
      <Theory>English</Theory>
      <Clinical>English</Clinical>
    </LanguageOfInstruction>
    <Address>
      <StreetAddress1>-</StreetAddress1>
      <StreetAddress2>-</StreetAddress2>
      <POBox>-</POBox>
      <City>-</City>
      <State>-</State>
      <Country iso-code='876'>Jamaica</Country>
      <PostalCode>-</PostalCode>
    </Address>
  </Institution>
  <Documents>
    <Document>
      <DocumentType>TRANSCRIPT</DocumentType>
      <DocumentNumber>001</DocumentNumber>
      <IssuedFrom iso-code='876'>Country</IssuedFrom>
      <DateIssued>-</DateIssued>
      <ReceivedDate>28-05-2014</ReceivedDate>
      <Files>
        <File>
          <Name>001.tiff</Name>
          <Path>images\education</Path>
          <Extension>tiff</Extension>
          <Size>36000</Size>
          <LastModifiedDate>28-05-2014</LastModifiedDate>
        </File>
        <File>
          <Name>7002.tiff</Name>
          <Path>images\education</Path>
          <Extension>tiff</Extension>
          <Size>38000</Size>
          <LastModifiedDate>28-05-2014</LastModifiedDate>
        </File>
        <File>
          <Name>003.tiff</Name>
          <Path>images\education</Path>
          <Extension>tiff</Extension>
          <Size>50000</Size>
          <LastModifiedDate>28-05-2014</LastModifiedDate>
        </File>
      </Files>
    </Document>
  </Documents>
</Program>
I have a solution that converts this to CSV, but it doesn't handle the duplicated tags: it keeps using the details of the first <File> tag. So the CSV file has columns for all three <File> tags, but every one of them contains the details of the first.
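// Requires: using System; using System.Collections.Generic;
//           using System.Linq; using System.Xml.Linq;
// csvFormat is my own quoting/escaping helper (not shown here).
var xml = XDocument.Load(@"C:/path/7123451_53957.xml");
string program = "Program";

// Collects the name of every element below each <Program>, duplicates included.
Func<XDocument, IEnumerable<string>> getFields =
    xd =>
        xd
            .Descendants(program)
            .SelectMany(d => d.Descendants())
            .Select(e => e.Name.ToString());

// Header row: one column per collected element name.
var headers =
    String.Join(",",
        getFields(xml)
            .Select(f => csvFormat(f)));

// Data rows: one line per <Program>, with each field looked up again by name.
var programQuery =
    (from programs in xml.Descendants(program)
     select string.Join(",",
         getFields(xml)
             .Select(f => programs.Descendants(f).Any()
                 ? programs.Descendants(f).First().Value
                 : "")
             .Select(x => csvFormat(x))))
    .ToList();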
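I believe the problem is in the programs.Descendants(f).First().Value part: First() always returns the first descendant with that name, no matter which <File> the header came from.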
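To illustrate that suspicion, here is a minimal sketch rather than a definitive fix. It assumes a trimmed-down, hypothetical two-file input, a stand-in CsvFormat helper (the real csvFormat is not shown), that only leaf elements should become columns, and that numbering every header by occurrence (Name_1, Name_2, ...) is acceptable. It contrasts the by-name lookup with pairing each header to its own element's value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical, trimmed-down input: one <Program> with two <File> elements.
var xml = XDocument.Parse(@"
<Program>
  <ProgramName>Enrolled Nursing Assistant</ProgramName>
  <Files>
    <File><Name>001.tiff</Name><Size>36000</Size></File>
    <File><Name>7002.tiff</Name><Size>38000</Size></File>
  </Files>
</Program>");

// Stand-in for the csvFormat helper (assumption): quote the value and double any quotes.
string CsvFormat(string s) => "\"" + s.Replace("\"", "\"\"") + "\"";

var program = xml.Descendants("Program").First();

// Lookup by name, as in the query above: each "Name" column
// resolves to the first <Name> element, so the value repeats.
var byName = program.Descendants("Name")
    .Select(_ => program.Descendants("Name").First().Value)
    .ToList();

// Pairing by position: walk the leaf elements once, keep each element's
// own value, and suffix the header with its occurrence count.
var seen = new Dictionary<string, int>();
var columns = new List<(string Header, string Value)>();
foreach (var leaf in program.Descendants().Where(e => !e.HasElements))
{
    var name = leaf.Name.LocalName;
    seen[name] = seen.TryGetValue(name, out var n) ? n + 1 : 1;
    columns.Add(($"{name}_{seen[name]}", leaf.Value));
}

var headerLine = string.Join(",", columns.Select(c => CsvFormat(c.Header)));
var valueLine = string.Join(",", columns.Select(c => CsvFormat(c.Value)));

Console.WriteLine(headerLine);
Console.WriteLine(valueLine);
```

Because the number of <File> tags varies, different <Program> elements would produce different header sets with this approach, so a real converter would still need to decide how to align rows (for example, by padding to the widest program).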