C#: parse and modify strings in YAML

I'm looking for a way to parse a YAML file, change each line, and then save the file without changing the structure of the original file. I don't think I should be using Regex for this, but rather some kind of YAML parser. An example of the YAML input is below:

receipt:     Oz-Ware Purchase Invoice
date:        2007-08-06
customer:
    given:   Dorothy

items:
    - part_no:   A4786
      descrip:   Water Bucket (Filled)

    - part_no:   E1628
      descrip:   High Heeled "Ruby" Slippers
      size:      8

bill-to:  &id001
    street: |
            123 Tornado Alley
            Suite 16
    city:   East Centerville
    state:  KS

ship-to:  *id001

specialDelivery:  >
    Follow the Yellow Brick
    Road to the Emerald City.
...

Desired output:

receipt:     ###Oz-Ware Purchase Invoice###
date:        ###2007-08-06###
customer:
    given:   ###Dorothy###

items:
    - part_no:   ###A4786###
      descrip:   ###Water Bucket (Filled)###

    - part_no:   ###E1628###
      descrip:   ###High Heeled "Ruby" Slippers###
      size:      ###8###

bill-to:  ###&id001###
    street: |
            ###123 Tornado Alley
            Suite 16###
    city:   ###East Centerville###
    state:  ###KS###

ship-to:  ###*id001###

specialDelivery:  >
    ###Follow the Yellow Brick
    Road to the Emerald City.###
...

Is there a good YAML parser that can handle complex YAML files, modify strings, and save the data without affecting the document structure? Perhaps you have another idea for solving this problem. Basically, I would like to iterate over each line from the top of the document and make some changes to that line. Any hints are appreciated.


Most YAML parsers are built to read YAML, whether it was written by other programs or edited by humans, and to write YAML that will be read by other programs. What is well known to be missing is the ability of parsers to write YAML that stays human-readable:

  • the order of keys on output is undefined
  • comments are discarded
  • literal block style of scalars, if present, is discarded
  • spacing around scalars is discarded
  • scalar folding information, if any, is discarded

Loading the dump of a hand-written YAML file that was loaded earlier creates the same internal data structures as the original load, but the intermediate dump usually does not look like the original hand-written YAML.

For example, take this Python program:

import ruamel.yaml as yaml

yaml_str = """\
receipt:     Oz-Ware Purchase Invoice
date:        2007-08-06
customer:
    given:   Dorothy

items:
    - part_no:   A4786
      descrip:   Water Bucket (Filled)

    - part_no:   E1628
      descrip:   High Heeled "Ruby" Slippers
      size:      8

bill-to:  &id001
    street: |
            123 Tornado Alley
            Suite 16
    city:   East Centerville
    state:  KS

ship-to:  *id001

specialDelivery:  >
    Follow the Yellow Brick
    Road to the Emerald City.
"""

data1 = yaml.load(yaml_str, Loader=yaml.Loader)
dump_str = yaml.dump(data1, Dumper=yaml.Dumper)
data2 = yaml.load(dump_str, Loader=yaml.Loader)

Then the following statements hold:

assert data1 == data2
assert dump_str != yaml_str

The intermediate dump_str looks like this:

bill-to: &id001 {city: East Centerville, state: KS, street: '123 Tornado Alley

    Suite 16

    '}
customer: {given: Dorothy}
date: 2007-08-06
items:
- {descrip: Water Bucket (Filled), part_no: A4786}
- {descrip: High Heeled "Ruby" Slippers, part_no: E1628, size: 8}
receipt: Oz-Ware Purchase Invoice
ship-to: *id001
specialDelivery: 'Follow the Yellow Brick Road to the Emerald City.

  '

The above is the default behavior of ruamel.yaml, PyYAML, and many other YAML parsers in other languages, as well as of online YAML conversion services. For some parsers, it is the only behavior.

The reason for developing ruamel.yaml as an enhancement of PyYAML was to make the trip from hand-written YAML to internal data and back to YAML (what I call a round trip) produce output that is more human-readable and that preserves additional information, especially comments.



Using the round-trip loader and dumper instead:

data = yaml.load(yaml_str, Loader=yaml.RoundTripLoader)
print(yaml.dump(data, Dumper=yaml.RoundTripDumper))

gives you:

receipt: Oz-Ware Purchase Invoice
date: 2007-08-06
customer:
  given: Dorothy
items:
- part_no: A4786
  descrip: Water Bucket (Filled)
- part_no: E1628
  descrip: High Heeled "Ruby" Slippers
  size: 8
bill-to: &id001
  street: |
    123 Tornado Alley
    Suite 16
  city: East Centerville
  state: KS
ship-to: *id001
specialDelivery: 'Follow the Yellow Brick Road to the Emerald City.

  '

My focus has been on comments, key order and literal block style. Spacing around scalars and folded scalars are not (yet) handled specially.


Starting from there (you can also do this in PyYAML, but then you won't have ruamel.yaml's built-in preservation of key order), you can either provide a custom emitter or hook into the system at a lower level by overriding some of the methods in emitter.py, making sure you can still call the originals for the cases you don't need to handle:

# Patch the round-trip dumper's emitter methods, keeping the originals
# so that cases not handled here fall through to the default behavior.
dumper = yaml.RoundTripDumper

def rewrite_write_plain(self, text, split=True):
    # Plain scalar in value position: wrap it in ### and pad to column 20.
    if self.state == self.expect_block_mapping_simple_value:
        text = '###' + text + '###'
        while self.column < 20:
            text = ' ' + text
            self.column += 1
    self._org_write_plain(text, split)

def rewrite_write_literal(self, text):
    # Literal block scalar (|): wrap the content, keeping the final newline outside.
    if self.state == self.expect_block_mapping_simple_value:
        last_nl = False
        if text and text[-1] == '\n':
            last_nl = True
            text = text[:-1]
        text = '###' + text + '###'
        if False:  # disabled: optionally push the block out to column 15
            extra_indent = ''
            while self.column < 15:
                text = ' ' + text
                extra_indent += ' '
                self.column += 1
            text = text.replace('\n', '\n' + extra_indent)
        if last_nl:
            text += '\n'
    self._org_write_literal(text)

def rewrite_write_single_quoted(self, text, split=True):
    # Wrap single-quoted values in ###, then always write them in
    # folded (>) style instead of single-quoted style.
    if self.state == self.expect_block_mapping_simple_value:
        last_nl = False
        if text and text[-1] == u'\n':
            last_nl = True
            text = text[:-1]
        text = u'###' + text + u'###'
        if last_nl:
            text += u'\n'
    self.write_folded(text)

def rewrite_write_indicator(self, indicator, need_whitespace,
                            whitespace=False, indention=False):
    # Anchors (&...) and aliases (*...) are written as indicators: wrap and pad them.
    if indicator and indicator[0] in u"*&":
        indicator = u'###' + indicator + u'###'
        while self.column < 20:
            indicator = ' ' + indicator
            self.column += 1
    self._org_write_indicator(indicator, need_whitespace, whitespace,
                              indention)

# Install the replacements on the dumper class.
dumper._org_write_plain = dumper.write_plain
dumper.write_plain = rewrite_write_plain
dumper._org_write_literal = dumper.write_literal
dumper.write_literal = rewrite_write_literal
dumper._org_write_single_quoted = dumper.write_single_quoted
dumper.write_single_quoted = rewrite_write_single_quoted
dumper._org_write_indicator = dumper.write_indicator
dumper.write_indicator = rewrite_write_indicator

print(yaml.dump(data, Dumper=dumper, indent=4))

gives you:

receipt:             ###Oz-Ware Purchase Invoice###
date:                ###2007-08-06###
customer:
    given:           ###Dorothy###
items:
-   part_no:         ###A4786###
    descrip:         ###Water Bucket (Filled)###
-   part_no:         ###E1628###
    descrip:         ###High Heeled "Ruby" Slippers###
    size:            ###8###
bill-to:             ###&id001###
    street: |
        ###123 Tornado Alley
        Suite 16###
    city:            ###East Centerville###
    state:           ###KS###
ship-to:             ###*id001###
specialDelivery: >
    ###Follow the Yellow Brick Road to the Emerald City.###

which is hopefully acceptable for further processing in C#.



The YAML spec has this to say:

In the representation model, mapping keys do not have an order. To serialize a mapping, it is necessary to impose an ordering on its keys. This order is a serialization detail and should not be used when composing the representation graph (and hence for the preservation of application data). In every case where node order is significant, a sequence must be used. For example, an ordered mapping can be represented as a sequence of mappings, where each mapping is a single key: value pair. YAML provides convenient compact notation for this case.

Thus, you really shouldn't expect YAML to maintain any order when loading and saving documents.
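Following the spec's suggestion, order that matters can be kept in the data itself: a sequence of single-pair mappings (the shape used by YAML 1.1's !!omap type) keeps its order through a load and dump, because the order lives in the sequence. A minimal, library-free sketch, assuming the document has already been loaded into Python lists and dicts; pairs_to_ordered and ordered_to_pairs are hypothetical helpers.

```python
from collections import OrderedDict

# What a YAML loader typically returns for this document:
#   - receipt: Oz-Ware Purchase Invoice
#   - customer: Dorothy
#   - city: East Centerville
pairs = [
    {"receipt": "Oz-Ware Purchase Invoice"},
    {"customer": "Dorothy"},
    {"city": "East Centerville"},
]


def pairs_to_ordered(seq):
    """Build an OrderedDict from a sequence of single-pair mappings."""
    result = OrderedDict()
    for item in seq:
        if len(item) != 1:
            raise ValueError(f"expected one key: value pair, got {item!r}")
        key, value = next(iter(item.items()))
        if key in result:
            raise ValueError(f"duplicate key {key!r}")
        result[key] = value
    return result


def ordered_to_pairs(mapping):
    """Inverse direction: one single-pair mapping per entry, in order."""
    return [{key: value} for key, value in mapping.items()]


ordered = pairs_to_ordered(pairs)
print(list(ordered))  # ['receipt', 'customer', 'city']
assert ordered_to_pairs(ordered) == pairs
```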

Having said that, I completely understand where you are coming from. Since YAML documents are meant to be read by humans, maintaining a certain order is definitely useful. Unfortunately, because of the specification, most implementations use unordered data structures to represent key/value mappings. In C# and Python this would be a dictionary, and dictionaries were not designed to keep their entries in order (although Python's dict has preserved insertion order since Python 3.7).



But both C# and Python have ordered dictionary types, OrderedDictionary and OrderedDict respectively, and, at least for Python, there have been some efforts in the past to preserve key order by using ordered dictionaries:

That is the Python side; I'm sure there are similar efforts for C# too.
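On the Python side the standard library makes the difference easy to see. Since Python 3.7 a plain dict also iterates in insertion order, but OrderedDict still differs in the two ways shown below: equality between two OrderedDicts takes order into account, and entries can be reordered with move_to_end. The key names are simply the question's top-level keys.

```python
from collections import OrderedDict

keys = ["receipt", "date", "customer", "items", "bill-to", "ship-to"]

plain = {}
ordered = OrderedDict()
for k in keys:
    plain[k] = None
    ordered[k] = None

# Both iterate in insertion order (dict guarantees this from Python 3.7 on).
assert list(plain) == keys
assert list(ordered) == keys

# Only OrderedDict treats order as part of equality.
reversed_plain = dict.fromkeys(reversed(keys))
reversed_ordered = OrderedDict.fromkeys(reversed(keys))
print(plain == reversed_plain)      # True
print(ordered == reversed_ordered)  # False

# OrderedDict can also reorder entries in place.
ordered.move_to_end("receipt")
print(list(ordered)[-1])            # receipt
```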
