I'm migrating from LaTeX to PrinceXML. One of the things I need to do is convert the bibliography. I've converted my .bib file to HTML. However, since LaTeX took care of sorting the entries for me, I never put them into the correct order myself, and in the HTML the order of declaration does matter.
So my problem is: using Linux command line tools (e.g. Perl is acceptable, but JavaScript is not), how can I sort a source file like this:
<div id="references">
<h2>References</h2>
<ul>
<li id="reference-to-book-1">
<span class="ref-author">Sample, Peter</span>
<cite><a href="http://example.org/">Online Book 1</a></cite>
<span class="ref-year">2011</span>
</li>
<li id="reference-to-book-2">
<cite>Physical Book 2</cite>
<span class="ref-year">2012</span>
<span class="ref-author">Example, Sandy</span>
</li>
</ul>
</div><!-- references -->
to look like this:
<div id="references">
<h2>References</h2>
<ul>
<li id="reference-to-book-2">
<span class="ref-author">Example, Sandy</span>
<cite>Physical Book 2</cite>
<span class="ref-year">2012</span>
</li>
<li id="reference-to-book-1">
<span class="ref-author">Sample, Peter</span>
<cite><a href="http://example.org/">Online Book 1</a></cite>
<span class="ref-year">2011</span>
</li>
</ul>
</div><!-- references -->
The criteria being:
1. The <li> elements containing the entries are sorted alphabetically according to author (i.e. everything from one <li id=" to its corresponding </li> is to be moved as a single block).
2. Within each entry, the elements are in the following order:
   - line matches class="ref-author"
   - line matches <cite>
   - line matches class="ref-year"

   There are more elements (e.g. class="publisher") that I omitted from the example for clarity; also, I run across this sorting problem very often. So it would be helpful if the expressions to match could be specified freely (e.g. as an array declaration in the script).
3. The remainder of the file (outside /id="references"/,/-- references --/) is unchanged.
4. The result file should have each line unchanged except for its position in the file (this point added because the XML parsers I tried broke my indentation).
I got 1, 3 and 4 solved using sed and sort, but can't get 2 to work that way.
I'd use Mojo for this. You might need to tidy up the XML afterwards.
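If keeping every line byte-for-byte identical (criterion 4) matters more than real HTML parsing, a purely line-based pass may be enough. Below is a minimal sketch in Python rather than Perl/Mojo. It assumes that every element sits on its own line, that each entry starts with a line containing <li id= and ends with a line containing </li>, and that nothing else sits between entries. The names FIELD_ORDER, field_rank, sort_entry, author_key and sort_references are made up for this illustration.

```python
import re
import sys

# Patterns for the lines inside each entry, in the desired output order.
# Hypothetical list: extend it freely, e.g. with r'class="publisher"'.
FIELD_ORDER = [r'class="ref-author"', r'<cite>', r'class="ref-year"']
AUTHOR = re.compile(r'class="ref-author">([^<]*)<')


def field_rank(line):
    """Index of the first pattern matching the line; unmatched lines go last."""
    for i, pattern in enumerate(FIELD_ORDER):
        if re.search(pattern, line):
            return i
    return len(FIELD_ORDER)


def sort_entry(entry):
    """Reorder the lines between the <li ...> and </li> lines (stable sort)."""
    return [entry[0]] + sorted(entry[1:-1], key=field_rank) + [entry[-1]]


def author_key(entry):
    """Case-folded author text, or '' if the entry has no author line."""
    for line in entry:
        match = AUTHOR.search(line)
        if match:
            return match.group(1).casefold()
    return ""


def sort_references(lines):
    """Return the lines with the references block sorted; lines stay unmodified."""
    out, entries, entry = [], [], None
    in_refs = False
    for line in lines:
        if entry is not None:                # inside an <li> block
            entry.append(line)
            if "</li>" in line:
                entries.append(sort_entry(entry))
                entry = None
            continue
        if in_refs and "<li id=" in line:    # a new entry starts
            entry = [line]
            continue
        # Any other line: emit the collected entries, sorted by author, first.
        for e in sorted(entries, key=author_key):
            out.extend(e)
        entries = []
        out.append(line)
        if 'id="references"' in line:
            in_refs = True
        elif "-- references --" in line:
            in_refs = False
    # Input ended early: keep whatever was collected rather than drop it.
    for e in sorted(entries, key=author_key):
        out.extend(e)
    if entry:
        out.extend(entry)
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", newline="") as fh:
        sys.stdout.writelines(sort_references(fh.readlines()))
```

Saved under a name of your choice (say sort_refs.py), it would be run as python3 sort_refs.py input.html > output.html. Because the script only moves whole lines around, indentation is left alone. The trade-off is that it will not cope with entries whose elements span several lines or share a line; that is where a real parser such as Mojo is the safer choice.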