Get-Content and show control characters such as `r - render control characters in strings

What flag can be passed to Get-Content to display control characters such as \r\n or \n?

What I am trying to do is detect whether the line endings in a file are Unix or DOS style. I've tried just running Get-Content, but that doesn't show line endings. I have also tried Vim with set list, which only shows $ regardless of what the line ends with.

I would like to do this using PowerShell because that would be helpful.



3 answers


One way is to use the Get-Content -Encoding parameter (in PowerShell 6 and later, use -AsByteStream instead of -Encoding byte), for example:

Get-Content foo.txt -Encoding byte | % {"0x{0:X2}" -f $_}

      

If you have PowerShell Community Extensions, you can use the Format-Hex command:

Format-Hex foo.txt

Address:  0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F ASCII
-------- ----------------------------------------------- ----------------
00000000 61 73 66 09 61 73 64 66 61 73 64 66 09 61 73 64 asf.asdfasdf.asd
00000010 66 61 73 0D 0A 61 73 64 66 0D 0A 61 73 09 61 73 fas..asdf..as.as

      



If you really want to see "\r\n" in the output, then do what BaconBits suggests, but you must use the -Raw parameter, for example:

(Get-Content foo.txt -Raw) -replace '\r','\r' -replace '\n','\n' -replace '\t','\t'

      

Outputs:

asf\tasdfasdf\tasdfas\r\nasdf\r\nas\tasd\r\nasdfasd\tasf\tasdf\t\r\nasdf
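Since the underlying goal is to tell Unix (LF) from DOS (CRLF) line endings, the same -Raw read can feed a small classifier. The following is only a sketch: the function name Get-LineEndingStyle and its return values are hypothetical (they are not part of this answer), and it assumes the file is small enough to read as a single string.

```powershell
# Hypothetical helper (not from the original answer): classifies the
# line endings found in a string that was read with Get-Content -Raw.
function Get-LineEndingStyle {
  param(
    [Parameter(Mandatory)]
    [AllowEmptyString()]
    [string] $Text
  )
  # CRLF pairs (DOS/Windows style).
  $crlf = [regex]::Matches($Text, '\r\n').Count
  # LFs that are NOT preceded by a CR (Unix style).
  $lf = [regex]::Matches($Text, '(?<!\r)\n').Count

  if ($crlf -gt 0 -and $lf -gt 0) {
    'Mixed'
  } elseif ($crlf -gt 0) {
    'CRLF'
  } elseif ($lf -gt 0) {
    'LF'
  } else {
    'None'
  }
}

# Usage, assuming foo.txt exists in the current directory:
# Get-LineEndingStyle (Get-Content -Raw foo.txt)
```

Counting both kinds separately, rather than only testing for a CR, also lets the sketch flag files with mixed line endings.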

      



Below is a custom function, Debug-String, that renders control characters in strings:

  • where possible, using PowerShell's own `-prefixed escape-sequence notation (e.g., `r for CR), if a native PowerShell escape sequence is available,
  • falling back to caret notation otherwise (e.g., the ASCII-range control character with code point 0x4 - END OF TRANSMISSION - is represented as ^D).
    • Alternatively, you can use the -CaretNotation switch to represent all ASCII-range control characters in caret notation, which gives you output similar to cat -A on Linux and cat -et on macOS / BSD.
  • all other control characters, namely those outside the ASCII range (the ASCII range spans code points 0x0 - 0x7F), are represented in the form `u{<hex>}, where <hex> is the hexadecimal representation of the code point, up to 6 digits long; e.g., `u{85} is Unicode character U+0085, the NEXT LINE control character. This notation is now also supported in expandable strings ("..."), but only in PowerShell Core.

Applied to your use case, you would use the following (PSv3+ is required, because Get-Content -Raw is used to ensure that the file is read as a whole; without it, the information about the line endings would be lost):

Get-Content -Raw $file | Debug-String
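To illustrate why -Raw matters here, the following sketch writes a temporary file with CRLF line endings and reads it both ways. The file contents and variable names are made up for the demonstration.

```powershell
# Create a throwaway file with two CRLF-terminated lines (sample data).
$tmp = [System.IO.Path]::GetTempFileName()
[System.IO.File]::WriteAllText($tmp, "one`r`ntwo`r`n")

# Without -Raw: an array of lines, with the line endings already stripped.
$lines = @(Get-Content $tmp)

# With -Raw: one string that still contains every CR and LF.
$raw = Get-Content -Raw $tmp

$lines.Count                          # 2
($lines -join '').Contains("`r")      # False
$raw.Contains("`r`n")                 # True

Remove-Item $tmp
```

Only the -Raw result still carries the line endings, which is why it is the one piped to Debug-String.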

      

Two simple examples:


Using PowerShell's escape-sequence notation. Note that this only looks like a no-op: the `-prefixed sequences inside the "..." string create the actual control characters.

PS> "a`ab`t c`0d`r`n" | Debug-String

      

a`ab`t c`0d`r`n

      



Using -CaretNotation, with output similar to cat -A on Linux:

PS> "a`ab`t c`0d`r`n" | Debug-String -CaretNotation

      

a^Gb^I c^@d^M$
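The caret output above follows from a simple rule: the code point of an ASCII-range control character has its bit with value 64 flipped, and the resulting printable character is prefixed with ^. A minimal sketch of that rule for a single character follows; the helper name ConvertTo-CaretNotation is hypothetical, and unlike Debug-String -CaretNotation it does not special-case LF as $.

```powershell
# Hypothetical helper illustrating caret notation for a single character.
function ConvertTo-CaretNotation {
  param([char] $Char)
  $codePoint = [int] $Char
  if ($codePoint -le 31 -or $codePoint -eq 127) {
    # Flip bit 6 (value 64): 0x0D (CR) -> 0x4D ('M'), 0x7F (DEL) -> 0x3F ('?').
    '^' + [char] ($codePoint -bxor 64)
  } else {
    # Not an ASCII-range control character: return it unchanged.
    [string] $Char
  }
}

ConvertTo-CaretNotation ([char] 13)   # ^M
ConvertTo-CaretNotation ([char] 7)    # ^G
ConvertTo-CaretNotation ([char] 9)    # ^I
```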

      


Source code of Debug-String:

Function Debug-String {
  param(
    [Parameter(ValueFromPipeline, Mandatory)]
    [string] $String
    ,
    [switch] $CaretNotation
  )

  begin {
    # \p{C} matches any character in the Unicode "Other" category, which
    # includes control characters both inside and outside the ASCII range;
    # note that tabs (`t) are control characters too, but spaces are not.
    $re = [regex] '\p{C}'
  }

  process {

    $re.Replace($String, {
      param($match)
      $handled = $False
      if (-not $CaretNotation) {
        # Translate control chars. that have native PS escape sequences into them.
        $handled = $True
        switch ([int] [char] $match.Value) {
          0  { '`0'; break }
          7  { '`a'; break }
          8  { '`b'; break }
          12 { '`f'; break }
          10 { '`n'; break }
          13 { '`r'; break }
          9  { '`t'; break }
          11 { '`v'; break }
          default { $handled = $false }
        } # switch
      }
      if (-not $handled) {
          switch ([int] [char] $match.Value) {
            10 { '$'; break } # cat -A / cat -e visualizes LFs as '$'
            # If it is a control character in the ASCII range,
            # use caret notation too (C0 range, plus DEL).
            # See https://en.wikipedia.org/wiki/Caret_notation
            { $_ -ge 0 -and $_ -le 31 -or $_ -eq 127 } {
              # Caret notation flips the bit with value 64 in the code point:
              # for 0-31 this adds 64 ('@' is 64), and DEL (127) becomes '?' (63).
              '^' + [char] ($_ -bxor 64)
              break
            }
            # NON-ASCII control characters; use the - PS Core-only - Unicode
            # escape-sequence notation:
            default { '`u{{{0}}}' -f ([int] $_).ToString('x') }
          }
      } # if (-not $handled)
    })  # .Replace
  } # process

}

      

For brevity, I haven't included the comment-based help above; here it is:

<#
.SYNOPSIS
Outputs a string in diagnostic form.

.DESCRIPTION
Prints a string with normally hidden control characters visualized.

Common control characters are visualized using PowerShell's own escaping 
notation by default, such as
"`t" for a tab, "`n" for a LF, and "`r" for a CR.

Any other control characters in the ASCII range (C0 control characters)
are represented in caret notation (see https://en.wikipedia.org/wiki/Caret_notation).

If you want all ASCII range control characters visualized using caret notation,
except LF visualized as "$", similar to 'cat -A' on Linux, for instance, 
use -CaretNotation.

Non-ASCII control characters are visualized by their Unicode code point
in the form `u{<hex>}, where <hex> is the hex. representation of the
code point with up to 6 digits; e.g., `u{85} is U+0085, the NEXT LINE
control char.

.PARAMETER CaretNotation
Causes LF to be visualized as "$" and all other ASCII-range control characters
in caret notation, similar to 'cat -A' on Linux.

.EXAMPLE
PS> "a`ab`t c`0d`r`n" | Debug-String
a`ab`t c`0d`r`n

.EXAMPLE
PS> "a`ab`t c`0d`r`n" | Debug-String -CaretNotation
a^Gb^I c^@d^M$
#>

      



Here's one way using regex replacement:

function Printable([string] $s) {
    $Matcher = 
    {  
      param($m) 

      $x = $m.Groups[0].Value
      $c = [int]($x.ToCharArray())[0]
      switch ($c)
      {
          9 { '\t' }
          13 { '\r' }
          10 { '\n' }
          92 { '\\' }
          Default { "\$c" }
      }
    }
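    # Match anything outside printable ASCII (space through ~), plus the
    # backslash itself, so that literal backslashes are escaped as '\\'.
    return ([regex]'[^ -~]|\\').Replace($s, $Matcher)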
}

PS C:\> $a = [char[]](65,66,67, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)

PS C:\> $b = $a -join ""

PS C:\> Printable $b
ABC\1\2\3\4\5\6\7\8\t\n\11\12\r

      
