How to remove/delete CTRL-M (^M) characters from text files in Linux and UNIX systems
Newline
Newline (frequently called line ending, end of line (EOL), line feed, or line break) is a control character or sequence of control characters in a character encoding specification (e.g. ASCII or EBCDIC) that is used to signify the end of a line of text and the start of a new one.
The concepts of line feed (LF) and carriage return (CR) are closely associated and can be considered either separately or together. In the physical media of typewriters and printers, two axes of motion, "down" and "across", are needed to create a new line on the page. Although the design of a machine (typewriter or printer) must consider them separately, the abstract logic of software can combine them together as one event. This is why a newline in character encoding can be defined as LF and CR combined into one (commonly called CR+LF or CRLF).
OS designers had to choose how to represent the start of a new line in text in computer files. For various historical reasons, in the Unix/Linux world a single LF character was chosen as the newline marker; MS-DOS chose CR+LF, and Windows inherited this. Thus different platforms use different conventions.
Operating system | Character Encoding | Abbreviation | hex value | dec value | Escape Sequence |
---|---|---|---|---|---|
Linux, Unix, FreeBSD, Unix-like OS | ASCII | LF | 0A | 10 | \n |
Microsoft Windows, MS-DOS, Symbian OS, Palm OS | ASCII | CR LF | 0D 0A | 13 10 | \r\n |
When you copy a text file from Windows to Linux, each line ending is copied as \r\n and is saved unchanged on the Linux machine. When you print such a file with, e.g., cat -v file.txt, Linux interprets \n as the newline, but the carriage return (escape sequence \r) remains at the end of every line and is displayed as ^M:
~] cat -v file.txt
1. line number 1^M
2. line number 2^M
3. line number 3^M
4. line number 4^M
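The situation above can be reproduced without a Windows machine. The following is a minimal sketch, assuming a POSIX shell with `printf`, `mktemp`, `grep`, and a `cat` that supports `-v`; the variable names and the sample text are illustrative only.

```shell
# Build a small sample file with Windows (CRLF) line endings.
sample=$(mktemp)
printf '1. line number 1\r\n2. line number 2\r\n' > "$sample"

# cat -v renders each carriage return as ^M.
cat -v "$sample"

# Hold a literal CR in a variable so grep can search for it.
cr=$(printf '\r')

# Count the lines that contain a carriage return.
crlf_lines=$(grep -c "$cr" "$sample" || true)
echo "lines with CR: $crlf_lines"
```

A count greater than zero suggests the file still carries Windows line endings.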
I found a lot of instructions on how to remove all ^M (carriage return ) from a file, but only a few of them were really functional.
Introduction to ASCII
The ASCII table contains letters, numbers, control characters, and other symbols. Each character is assigned a unique 7-bit code. ASCII is an acronym for American Standard Code for Information Interchange.
Here we need only the ASCII representation of the carriage return character.
Decimal | Octal | Hex | Binary | Value | Description | Caret notation | Escape sequence in C |
---|---|---|---|---|---|---|---|
013 | 015 | 0D | 0000 1101 | CR | carriage return | ^M | \r |
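The values in the table can be checked from the shell. This is a small sketch, assuming a POSIX `od` and `printf`; the variable names are made up for illustration.

```shell
# Emit a single carriage return and dump it in three bases.
# tr strips the padding whitespace that od adds around the value.
cr_hex=$(printf '\r' | od -An -tx1 | tr -d '[:space:]')
cr_dec=$(printf '\r' | od -An -tu1 | tr -d '[:space:]')
cr_oct=$(printf '\r' | od -An -to1 | tr -d '[:space:]')

echo "hex=$cr_hex dec=$cr_dec oct=$cr_oct"

# The octal escape \015 produces the same byte as \r.
same_hex=$(printf '\015' | od -An -tx1 | tr -d '[:space:]')
echo "octal escape gives hex=$same_hex"
```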
How to remove CTRL+M (^M) from a file
sed solution
An escape sequence is a sequence of characters that does not represent itself when used inside a character or string literal, but is translated into another character or a sequence of characters that may be difficult or impossible to represent directly. The escapes below are the ones documented for GNU sed; other sed implementations may not support all of them.
Escape Sequence | Description |
---|---|
\r | Produces or matches a carriage return (ASCII 13). |
\cx | Produces or matches CONTROL-x, where x is any character. |
\dxxx | Produces or matches a character whose decimal ASCII value is xxx. |
\oxxx | Produces or matches a character whose octal ASCII value is xxx. |
\xHH | Produces or matches a character whose hexadecimal ASCII value is HH. |
The following commands all have the same result: they delete every ^M (carriage return) from the file.
~] sed -i 's/\r//g' file.txt
~] sed -i 's/\cM//g' file.txt
~] sed -i 's/\d013//g' file.txt
~] sed -i 's/\o015//g' file.txt
~] sed -i 's/\x0D//g' file.txt
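Note that a pattern with the g flag and no anchor removes every carriage return, including any that appear in the middle of a line. A possible variant, sketched here under the assumption of a POSIX shell and sed, removes a CR only at the end of a line. It passes a literal CR obtained from `printf`, so it does not depend on sed understanding `\r`, and it writes to a second file instead of using `-i`. The file names and contents are illustrative.

```shell
# Throwaway sample with CRLF line endings (contents are illustrative).
src=$(mktemp)
dst=$(mktemp)
printf 'alpha\r\nbeta\r\n' > "$src"

# A literal CR obtained from printf, so no sed-specific escape is needed.
cr=$(printf '\r')

# Remove a CR only when it is the last character on the line.
sed "s/${cr}\$//" "$src" > "$dst"

# Verify: count the lines in the result that still contain a CR.
remaining=$(grep -c "$cr" "$dst" || true)
echo "lines still containing CR: $remaining"
```

Writing to a separate file keeps the original intact until the result has been checked.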
How to add a carriage return (CTRL+M or ^M) to a file
The opposite problem is how to add a carriage return to the end of every line:
sed solution:
~] sed -i 's/$/\x0D/g' file.txt
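One caveat: if the command is run on a file that already has CRLF endings, each line ends up with two carriage returns. A hedged sketch of a repeatable variant follows, assuming a POSIX shell and sed. It first strips any trailing CR and then appends exactly one; the variable names and sample data are hypothetical.

```shell
# Sample with mixed endings on purpose: first line LF, second line CRLF.
unix_file=$(mktemp)
dos_file=$(mktemp)
printf 'one\ntwo\r\n' > "$unix_file"

# Literal CR from printf, usable in both the pattern and the replacement.
cr=$(printf '\r')

# Step 1: drop a trailing CR if present. Step 2: append exactly one CR.
sed -e "s/${cr}\$//" -e "s/\$/${cr}/" "$unix_file" > "$dos_file"

# Every line should now end in a single ^M.
cat -v "$dos_file"
```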