I am trying to do some manipulation of an XMLTV format file that contains TV schedule information. Within the file are sections that look like this:
<programme start="20141215220000 -0500" stop="20141216060000 -0500" channel="someid.someaddress.com">
  <title lang="en">Local Programming</title>
  <length units="hours">1</length>
  <episode-num system="common">S00E00</episode-num>
  <episode-num system="dd_progid">SH00019112.0000</episode-num>
  <previously-shown />
</programme>
As you can see, the second line contains this:
<title lang="en">Local Programming</title>
What I would like to find is some kind of command-line utility that runs on Linux and can look for that specific line and, if it exists, remove everything between and including the programme tags. I am not very familiar with XML files, so I don't know if there is a specific name for a block of data such as this, but I just want to remove that entire section whenever the title is "Local Programming".
It would actually suit my purposes better if I could remove the block only when the title is "Local Programming" AND the channel value in the first line is a specific value, since I only need to remove these for one channel. Removing every "Local Programming" block on any channel would not hurt anything, though, and matching on two values would probably make this a much harder problem. It has to be a command-line utility because it will be called from a short shell script.
Basically, I'm just trying to identify the best tool for the job. I'm not a programmer (unless you count writing a bash script of a few lines that runs several things sequentially as programming), so I'd like to stick with an existing command-line tool if possible, but I'm not averse to pulling in something new with apt-get either. Any suggestions?
EDIT: What worked was the xmlstarlet tool suggested by Charles Duffy, but only if I did not attempt to use the --var option and instead specified the values directly. For example, this removed all blocks with the title "Local Programming" from a file xmltv.xml:
xmlstarlet ed --delete "//programme[title='Local Programming']" <xmltv.xml >newfile.xml
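To try that command on something self-contained, here is a small shell sketch. The two-programme sample file and the "Morning News" title are made up for illustration, and it assumes the xmlstarlet binary is on the PATH.

```shell
#!/bin/sh
# Hypothetical sample: one "Local Programming" block and one block to keep.
cat > xmltv.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<tv>
  <programme start="20141215220000 -0500" stop="20141216060000 -0500" channel="someid.someaddress.com">
    <title lang="en">Local Programming</title>
    <previously-shown />
  </programme>
  <programme start="20141216060000 -0500" stop="20141216070000 -0500" channel="someid.someaddress.com">
    <title lang="en">Morning News</title>
  </programme>
</tv>
EOF

# Delete every <programme> element that has a <title> child whose text is
# exactly "Local Programming"; everything else is copied through.
xmlstarlet ed --delete "//programme[title='Local Programming']" <xmltv.xml >newfile.xml

# Only the "Morning News" programme should remain.
cat newfile.xml
```

Writing to a new file rather than over xmltv.xml leaves the original untouched if the expression turns out to be wrong.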
And if I want to remove the block only when the title is "Local Programming" AND the channel value in the first line is a specific value, this appears to work:
xmlstarlet ed --delete "//programme[title='Local Programming'][@channel='someid.someaddress.com']" <xmltv.xml >newfile.xml
This is exactly what I was looking for, so I consider the problem solved. Thank you to all who replied.
To delete any program having both the English-language title "Local Programming" and the channel someid.someaddress.com:
If you're targeting an older XMLStarlet release, you may need to do the substitutions yourself (using "Local Programming" in place of $name and "someid.someaddress.com" in place of $chan), but the above is known to work against the 1.5.0 release.

This requires the tool XMLStarlet, which should be available for installation in your distribution vendor's repository.
Note that you didn't show your document's namespace declarations; if xmlns='...' has been specified in a parent, some adjustment may be called for.