Why doesn't "chomp" remove newlines on Windows XP with Eclipse and Cygwin Perl?

I am running Windows XP, Eclipse 3.2 with EPIC and Cygwin for my Perl interpreter, and I get unexpected results.

FYI: when I run the same script on my Ubuntu installation (VMware, same computer), I get the expected results. Why?

############ CODE: #############

use warnings;
use strict;

my $test = "test";
my $input = <STDIN>;

print length $test, " ", length $input, "\n";

chomp $input;

print "|$test| |$input| \n";    #The bars indicate white space, new line, etc...

print length $test, " ", length $input, "\n";

if ($test eq $input) {
    print "TIME TO QUIT";
}

      

Results on Windows XP:

test           <-- My input
4 6            <-- Lengths printed before chomp
|test| |test   <-- Print the variables after chomp
|              <-- There is still a new line there
4 5            <-- Lengths after the initial chomp

      

+2




3 answers


Based on the lengths, I would say that you get the input string as:

test<cr><lf>

      

where <cr> and <lf> are ASCII codes 13 (0x0D) and 10 (0x0A), respectively.

When you chomp it, the <lf> is removed but the <cr> is left behind.
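The effect can be reproduced without Eclipse or Cygwin. The following is a minimal sketch, assuming the input really arrives as test<cr><lf>; the string is hard-coded instead of read from STDIN, and hex_dump is a hypothetical helper added only for illustration:

```perl
use strict;
use warnings;

# Simulated input line, as it would arrive with a Windows line ending (assumption).
my $input = "test\r\n";

# Hypothetical helper: show each character as a two-digit hex code
# so that CR (0d) and LF (0a) become visible.
sub hex_dump {
    my ($s) = @_;
    return join ' ', map { sprintf '%02x', ord($_) } split //, $s;
}

my $before = hex_dump($input);
chomp $input;    # $/ is "\n" by default, so only the LF is removed
my $after = hex_dump($input);

print "before: $before\n";
print "after:  $after\n";
print "length after chomp: ", length($input), "\n";
```

The length drops from 6 to 5 rather than to 4, which matches the lengths reported in the question.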

This is almost certainly an interoperability issue between Eclipse, Cygwin, and Windows, which disagree on what the end-of-line character sequence should be. I couldn't replicate your problem with Perl/Cygwin or Perl/Windows alone, but this command gives similar results (in Cygwin):

echo 'test^M' | perl qq.pl | sed 's/^M/\n/g'

      

(qq.pl is your script and "^M" is the actual CTRL-M character). Here's the output in text form:

4 6
|test| |test
|
4 5

      



and an octal dump:

0000000 2034 0a36 747c 7365 7c74 7c20 6574 7473
          4       6  \n   |   t   e   s   t   |       |   t   e   s   t
        064 040 066 012 174 164 145 163 164 174 040 174 164 145 163 164
0000020 7c0a 340a 3520 000a
         \n   |  \n   4       5  \n  \0
        012 174 012 064 040 065 012 000
0000027

      

So I would say that your input contains both <cr> and <lf>, while the printing stage translates <cr> into <lf> (or simply treats both of them the same way).

If you need a workaround for your environment, you can replace the chomp line with:

$input =~ s/\r?\n$//;

      

as in:

use warnings;
use strict;
my $test = "test";
my $input = <STDIN>;
print length $test ," ",length $input,"\n";
$input =~ s/\r?\n$//;
print "|$test| |$input|\n";
print length $test," ",length $input,"\n";
if ($test eq $input) {
    print "TIME TO QUIT";
}

      

which works on Cygwin for the test data I used (check it against your own situation, of course). You may find, though, that the problem is better solved by using tools that all agree on the line-ending sequence (e.g. Perl for Windows rather than Cygwin Perl might do the trick for you).

+4




Given that only Windows XP shows the problem, the difference must be related to CRLF (carriage return, line feed) handling. It appears that chomp removes the LF but not the CR, and print then translates the CR to CR LF.

The Perl documentation for chomp says that if you set the input record separator to the Windows line ending ($/ = "\r\n";), chomp should do the right thing:

$/ = "\r\n";
$test = "test\r\n";
print "<<$test>>\n";
chomp $test;
print "<<$test>>\n";

      



A hex dump of the output file gives:

0x0000: 3C 3C 74 65 73 74 0D 0A 3E 3E 0A 3C 3C 74 65 73   <<test..>>.<<tes
0x0010: 74 3E 3E 0A                                       t>>.
0x0014:

      

I am not sure why $/ is not set automatically; perhaps Cygwin is confusing things (pretending too well that it is running on Unix).
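One caveat is worth illustrating: chomp removes only a trailing string equal to $/, so with $/ set to "\r\n" a line that ends in a bare "\n" is left untouched. The sketch below uses hard-coded strings in place of real input and uses local so that the change does not leak into the rest of the program; both choices are assumptions made for the illustration, not part of the original answer:

```perl
use strict;
use warnings;

my $crlf_line = "test\r\n";
my $lf_line   = "test\n";

{
    # Limit the change to this block; $/ also controls where readline splits input.
    local $/ = "\r\n";
    chomp $crlf_line;    # removes the whole "\r\n"
    chomp $lf_line;      # does not end in "\r\n", so nothing is removed
}

print length($crlf_line), " ", length($lf_line), "\n";
```

So setting $/ this way is appropriate only when every input line is known to end in CRLF.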

+6




Here's how to remove the trailing \r\n or \n (whichever is at the end):

$input =~ s@\r?\n\Z(?!\n)@@;
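As a quick check, the substitution can be tried on the three kinds of line ending that might show up. This is only a sketch; strip_eol is a hypothetical wrapper name, and the sample strings are made up:

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the substitution shown above.
sub strip_eol {
    my ($line) = @_;
    $line =~ s@\r?\n\Z(?!\n)@@;    # drop one trailing "\r\n" or "\n", if present
    return $line;
}

# A CRLF-terminated line, an LF-terminated line, and a line with no terminator.
my @results = map { strip_eol($_) } ("test\r\n", "test\n", "test");

print join(',', map { length($_) } @results), "\n";
```

All three inputs end up as the four-character string "test", so the same comparison works regardless of where the input came from.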

      

Another option is to do

binmode(STDIN, ':crlf')

      

before reading anything from STDIN. This converts a trailing \r\n to just \n, which chomp can then remove. It also works if your input contains only \n. See the PerlIO documentation for details.
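A minimal sketch of the binmode approach follows. To keep it self-contained it reads from an in-memory filehandle standing in for STDIN, which is an assumption of this example; in the original script the call would be made on STDIN itself:

```perl
use strict;
use warnings;

# Stand-in for STDIN: an in-memory filehandle with one CRLF line and one LF line.
my $data = "test\r\nquit\n";
open my $fh, '<', \$data or die "cannot open in-memory handle: $!";

# Push the :crlf layer before the first read, as suggested above for STDIN.
binmode($fh, ':crlf') or die "binmode failed: $!";

my @lines = <$fh>;
close $fh;

chomp @lines;    # each line now ends in a plain "\n", which chomp removes

print join('|', @lines), "\n";
```

The advantage over the regex workarounds is that the rest of the script, including the plain chomp, stays unchanged.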

+4

