How to remove BOM from UTF-8 file?

Question

I have a UTF-8 encoded file with a BOM and would like to remove the BOM. Are there Linux command-line tools to remove a BOM from a file?

$ file test.xml
test.xml:  XML 1.0 document, UTF-8 Unicode (with BOM) text, with very long lines

command-line linux file utf-8 byte-order-mark

m13r Jul 21 '17 at 14:36

4 answers

Using VIM

Open the file in VIM:

vi test.xml

Remove BOM encoding:

:set nobomb

Save and exit:

:wq

Joshua Pinter Dec 24 '17 at 18:05

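You can remove the BOM from the file with the tail command:

tail --bytes=+4 withBOM.txt > withoutBOM.txt

Note that this drops the first three bytes unconditionally, so only use it on files that are known to start with a BOM.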
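
A guarded variant that only strips the first three bytes when they really are the UTF-8 BOM could look like the following. This is a minimal POSIX sh sketch, not part of the original answer: strip_bom_with_tail is a hypothetical helper name, tail -c +4 is the POSIX spelling of GNU's --bytes=+4, and head -c is assumed to be available (GNU and BSD head both support it).

```shell
#!/bin/sh
# strip_bom_with_tail IN OUT (hypothetical helper name)
# Writes IN to OUT without its leading UTF-8 BOM; if IN does not start
# with the BOM bytes EF BB BF, OUT is an unchanged copy of IN.
strip_bom_with_tail() {
    in=$1
    out=$2
    # The three BOM bytes, written as octal escapes for portability.
    bom=$(printf '\357\273\277')
    if [ "$(head -c 3 "$in")" = "$bom" ]; then
        # Output everything starting at byte 4, i.e. skip the BOM.
        tail -c +4 "$in" > "$out"
    else
        cat "$in" > "$out"
    fi
}
```

For example: strip_bom_with_tail withBOM.txt withoutBOM.txt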

m13r Jul 21 '17 at 14:36

OK, I just figured this out today, and my preferred way is dos2unix:

dos2unix will remove the BOM and also take care of other quirks of other OSes (such as CRLF line endings):

$ sudo apt install dos2unix
$ dos2unix test.xml

It is also possible to remove only the BOM (-r, --remove-bom):

$ dos2unix -r test.xml

Note: tested with dos2unix 7.3.4

Reginaldo Santos Feb 5 '19 at 14:53

rici · Accepted Answer · 2017-07-21T15:06:35+0000

The BOM is the Unicode code point U+FEFF; its UTF-8 encoding consists of the three bytes 0xEF, 0xBB, 0xBF.
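
To confirm that a file really starts with those three bytes before removing anything, you can dump its first bytes with od. This is a minimal POSIX sh sketch: has_utf8_bom is a hypothetical helper name, and head -c, while not strictly POSIX, is assumed to be available (GNU and BSD head both support it).

```shell
#!/bin/sh
# has_utf8_bom FILE (hypothetical helper name)
# Succeeds (exit status 0) when FILE's first three bytes are EF BB BF,
# the UTF-8 encoding of U+FEFF.
has_utf8_bom() {
    # od -An -tx1 prints the bytes as hex without offsets (e.g. " ef bb bf");
    # tr deletes the spaces and the trailing newline before comparing.
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}
```

For example: has_utf8_bom test.xml && echo "test.xml starts with a BOM"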

With bash, you can create a UTF-8 BOM with the special $'' quoting form, which implements Unicode escapes: $'\uFEFF'. So with bash, a reliable way of removing a UTF-8 BOM from the beginning of a text file is:

sed -i $'1s/^\uFEFF//' file.txt

This leaves the file unchanged if it does not start with a UTF-8 BOM, and otherwise removes the BOM.

If you are using some other shell, you might find that "$(printf '\ufeff')" produces the BOM character (this works with zsh, as well as with any shell without a printf builtin, provided that /usr/bin/printf is the GNU version), but if you want a POSIX-compatible version, you can use:

sed "$(printf '1s/^\357\273\277//')" file.txt

(The in-place editing flag -i is also a GNU extension; this version writes the possibly modified file to standard output.)
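
If you need the POSIX version to modify files in place without GNU sed's -i, one option is to let sed write to a temporary file and then copy the result back over the original. This is a minimal sketch, not part of the original answer: strip_bom_portable is a hypothetical helper name, and mktemp is assumed to be available (it is widespread but not required by POSIX).

```shell
#!/bin/sh
# strip_bom_portable FILE... (hypothetical helper name)
# Removes a leading UTF-8 BOM from each FILE without relying on sed -i.
strip_bom_portable() {
    # Same sed script as above: on line 1, delete the bytes EF BB BF at the start.
    script=$(printf '1s/^\357\273\277//')
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        if sed "$script" "$f" > "$tmp"; then
            # Copy back with cat > so the original file keeps its
            # permissions and owner (a plain mv would replace them).
            cat "$tmp" > "$f"
            rm -f "$tmp"
        else
            rm -f "$tmp"
            return 1
        fi
    done
}
```

For example: strip_bom_portable *.xml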
