Is there a Linux command line utility to remove sections (not sure if that's the right term) from an XML file?

I am trying to do some manipulation with an XMLTV format file that contains TV schedule information. Inside the file are sections that look like this:

  <programme start="20141215220000 -0500" stop="20141216060000 -0500" channel="someid.someaddress.com">
    <title lang="en">Local Programming</title>
    <length units="hours">1</length>
    <episode-num system="common">S00E00</episode-num>
    <episode-num system="dd_progid">SH00019112.0000</episode-num>
    <previously-shown />
  </programme>

      

As you can see, the second line contains the following:

    <title lang="en">Local Programming</title>

      

What I would like to find is a command line utility for Linux that can look for that particular line and, if it exists, remove everything between and including the enclosing programme tags. I'm not very familiar with XML files, so I don't know if there is a specific name for such a block of data, but I just want to delete the whole section when the title is "Local Programming".

It would be better for my purposes if I could delete the block only when the title is "Local Programming" and the channel value in the first line is a specific value, since I only need to delete these blocks for one channel. That said, it wouldn't hurt to delete all "Local Programming" blocks on any channel, and matching on two values is likely to be much more difficult. It needs to be a command line utility because it will be called from a short shell script.

Basically I'm just trying to figure out the best tool for the job. I'm not a programmer (unless you consider writing a multi-line bash script that just runs several things in sequence to be programming), so I'd like to stick with an existing command line tool if possible, but I don't mind pulling in something new with apt-get. Any suggestions?

EDIT: What worked was the xmlstarlet tool suggested by Charles Duffy, but only if I did not try to use the --var option and instead specified the values directly. For example, this removed all blocks with the title "Local Programming" from the xmltv.xml file:

xmlstarlet ed --delete "//programme[title='Local Programming']" <xmltv.xml >newfile.xml

      

And if I want to delete a block only when the title is "Local Programming" and the channel value in the first line is a certain specific value, then it turns out that this works:

xmlstarlet ed --delete "//programme[title='Local Programming'][@channel='someid.someaddress.com']" <xmltv.xml >newfile.xml

      

This is exactly what I was looking for, so I believe the problem has been resolved. Thanks to all who responded.

+3




2 answers


To delete any programme that has both the English title "Local Programming" and the channel "someid.someaddress.com":

xmlstarlet ed \
  --var chan "'someid.someaddress.com'" \
  --var name "'Local Programming'" \
  --delete '//programme[title[@lang="en"]=$name][@channel=$chan]' \
  <in.xml >out.xml && mv out.xml in.xml

      

If you are targeting an older version of XMLStarlet, you may need to do the substitutions yourself, writing 'Local Programming' in place of $name and 'someid.someaddress.com' in place of $chan, but the command above is known to work with version 1.5.0.



This requires the XMLStarlet tool, which should be available for installation from your distribution's package repository.

Note that you did not show the document's namespace declarations; if an xmlns='...' attribute is specified on a parent element, some adjustment may be required.
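As a rough sketch of what that adjustment might look like: if the root element declared a default namespace, the element names in the XPath would need a prefix bound with XMLStarlet's -N option. The namespace URI (http://example.com/xmltv), the prefix (tv), and the function name (strip_local_ns) below are all placeholders for illustration, not values taken from the question's file.

```shell
# Hypothetical helper: reads XML on stdin, writes the edited XML to stdout.
# Assumption: the document declares xmlns="http://example.com/xmltv" on a
# parent element; replace the URI with whatever the real file declares.
strip_local_ns() {
  xmlstarlet ed \
    -N tv='http://example.com/xmltv' \
    --delete "//tv:programme[tv:title='Local Programming'][@channel='someid.someaddress.com']"
}
```

It would be called as strip_local_ns <in.xml >out.xml. Attributes such as channel do not pick up the default namespace, so only the element names carry the prefix.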

+5




Besides processing the XML properly, as shown in the other answer, you can always go the old-fashioned way and treat the XML as plain text. In Perl:

cat fancy.xml |
perl -ne 'BEGIN{$/=undef;} print grep { /^<programme/ ? !m{<title\s+lang="en">Local\s+Programming</title>} : 1 } split qr{(<programme.*?</programme>)}s'

      



This reads all of the input XML at once (by undefining the input record separator), slices it into a flat list of programme blocks and everything in between them (split()), and then filters out the programme blocks that contain the search string (grep()).

+2

