cut Command in Linux with Examples
The cut command is a command-line utility for cutting sections from each line of a file. It writes the result to the standard output. It’s worth noting that it does not modify the file, but only works on a copy of the content. Although typically the input to a cut command is a file, we can pipe the output of other commands and use it as input.
It can be used to cut parts of a line by byte position, character position, or field. Basically, the cut command slices each line and extracts the selected text. You must specify one of these options with the command; otherwise cut exits with an error. If more than one file name is provided, the data from each file is not preceded by its file name.
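A minimal sketch of the last two points, assuming a POSIX shell. The file names a.txt and b.txt are made up for the illustration and are created in a temporary directory:

```shell
# Work in a throwaway directory so no real files are touched.
tmpdir=$(mktemp -d)
printf 'alpha\n' > "$tmpdir/a.txt"
printf 'bravo\n' > "$tmpdir/b.txt"

# Two input files: the selected parts are printed one after another,
# with no file-name header in between.
two_files=$(cut -c 1-3 "$tmpdir/a.txt" "$tmpdir/b.txt")
echo "$two_files"

# No -b, -c, or -f given: cut reports an error and exits with a non-zero status.
if cut "$tmpdir/a.txt" 2>/dev/null; then
    no_option_status=0
else
    no_option_status=$?
fi
echo "exit status without an option: $no_option_status"

rm -r "$tmpdir"
```

The first echo should show alp and bra on two lines. The exact error message and status code may differ between cut implementations.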
How to Use the cut Command
The syntax for the cut command is as follows:
cut OPTION... [FILE]...
The options that tell cut whether to use a delimiter, byte position, or character when cutting out selected portions of the lines are as follows:
- -f (--fields=LIST) - Select by specifying a field, a set of fields, or a range of fields. This is the most commonly used option.
- -b (--bytes=LIST) - Select by specifying a byte, a set of bytes, or a range of bytes.
- -c (--characters=LIST) - Select by specifying a character, a set of characters, or a range of characters.
Other options are:
- -d (--delimiter) - Specify a delimiter that will be used instead of the default “TAB” delimiter.
- --complement - Complement the selection. When using this option cut displays all bytes, characters, or fields except the selected.
- -s (--only-delimited) - By default cut prints the lines that contain no delimiter character. When this option is used, cut doesn’t print lines not containing delimiters.
- --output-delimiter - The default behavior of cut is to use the input delimiter as the output delimiter. This option allows you to specify a different output delimiter string.
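As a rough illustration of --complement and --output-delimiter, here is a short sketch. It assumes GNU cut (both long options are GNU extensions) and uses a made-up sample record:

```shell
# Hypothetical sample record with colon-separated fields.
line='France:Paris:euro'

# --complement: print everything except field 2.
without_capital=$(echo "$line" | cut -d : -f 2 --complement)
echo "$without_capital"        # France:euro

# --output-delimiter: join the selected fields with a different string.
with_arrow=$(echo "$line" | cut -d : -f 1,3 --output-delimiter=' -> ')
echo "$with_arrow"             # France -> euro
```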
The cut command can accept zero or more input FILE names. If no FILE is specified, or when FILE is -, cut will read from the standard input.
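Both ways of reading from standard input can be sketched as follows. The input text is arbitrary, and a space is used as the delimiter only for the sake of the example:

```shell
# No FILE argument: cut reads the text piped in on standard input.
from_pipe=$(printf 'one two three\n' | cut -d ' ' -f 2)
echo "$from_pipe"      # two

# FILE given as "-": this also means standard input.
from_dash=$(printf 'one two three\n' | cut -d ' ' -f 2 -)
echo "$from_dash"      # two
```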
The LIST argument passed to the -f, -b, and -c options can be an integer, multiple integers separated by commas, a range of integers or multiple integer ranges separated by commas. Each range can be one of the following:
- N - the Nth field, byte or character, starting from 1.
- N- - from the Nth field, byte or character, to the end of the line.
- N-M - from the Nth to the Mth field, byte, or character.
- -M - from the first to the Mth field, byte, or character.
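The four range forms can be tried on a short ASCII string. This is only a sketch; the variable names are chosen for the illustration:

```shell
word='abcdefgh'   # ASCII only, so bytes and characters coincide

n_only=$(echo "$word" | cut -c 3)          # N     -> c
n_to_end=$(echo "$word" | cut -c 6-)       # N-    -> fgh
n_to_m=$(echo "$word" | cut -c 2-4)        # N-M   -> bcd
up_to_m=$(echo "$word" | cut -c -3)        # -M    -> abc
combined=$(echo "$word" | cut -c 1,4-5,8)  # mixed -> adeh

printf '%s\n' "$n_only" "$n_to_end" "$n_to_m" "$up_to_m" "$combined"
```

The same LIST syntax applies unchanged to -b and -f.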
cut - Slicing by Bytes
Before going any further, let’s make a distinction between bytes and characters.
One byte is 8 bits and can represent 256 different values. When the ASCII standard was established, it took into account all of the letters, numbers, and symbols necessary to work with English. The ASCII character table has 128 characters, and each character is represented by one byte. When computers started to become globally accessible, tech companies began to introduce new character encodings for different languages. For languages that have more than 256 characters, a simple 1 to 1 mapping was not possible. This led to problems when sharing documents or browsing websites, so a new standard, Unicode, that can handle most of the world’s writing systems was needed. UTF-8 was created to solve these problems. In UTF-8, not all characters are represented with 1 byte: a character takes from 1 to 4 bytes.
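The difference can be checked with wc -c, which counts bytes. In this sketch the character ü is written as its two UTF-8 bytes in octal escapes so that the result does not depend on how the script itself is encoded:

```shell
# 'Z' is plain ASCII; \303\274 are the two UTF-8 bytes of 'ü' (U+00FC).
ascii_bytes=$(printf 'Z' | wc -c | tr -d ' ')
umlaut_bytes=$(printf '\303\274' | wc -c | tr -d ' ')

echo "Z: $ascii_bytes byte(s), u-umlaut: $umlaut_bytes byte(s)"
```

The tr call only strips the padding that some wc implementations add around the number.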
The -b (--bytes) option tells the command to cut sections from each line specified by given byte positions.
In the following examples, we are using the ü character, which takes 2 bytes.
# select first byte
~] echo 'Zürich' | cut -b 1
Z
# for 'ü' char you must select bytes from 2 to 3
~] echo 'Zürich' | cut -b 2-3
ü
~] echo 'Zürich' | cut -b 2
?
~] echo 'Zürich' | cut -b 4
r
For files that contain only ASCII characters, cut -b works very well, because every character is exactly one byte.
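A small sketch of the ASCII case, using the spelling Zurich without the umlaut as sample input. Here byte positions and character positions give the same result:

```shell
# 'Zurich' (no umlaut) is pure ASCII: one byte per character.
by_bytes=$(echo 'Zurich' | cut -b 2-3)
by_chars=$(echo 'Zurich' | cut -c 2-3)

echo "$by_bytes"   # ur
echo "$by_chars"   # ur
```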
cut - Slicing by Characters
-c (--characters=LIST): To cut by character, use the -c option. It selects the characters at the positions given in LIST, which can be a list of numbers separated by commas or a range of numbers separated by a hyphen (-). Tabs and backspaces are treated as characters. A list of character positions is required; without it, cut exits with an error.
It’s similar to slicing by byte, except that it uses the character position rather than the byte position. So, if a character uses multiple bytes, the output will include the whole character instead of a byte from the character.
Syntax:
~] cut -c [(k)-(n)/(k),(n)/(n)] filename
Here, when k and n are separated by "-", k is the starting position and n is the ending position of the characters to select in each line. Otherwise, each number is the position of a single character in each line of the input file.
Example 1
~] echo spéciale | cut -c 3
é
~] echo spéciale | cut -b 3
?
~] echo spéciale | cut -b 3,4
é
Example 2
~] cat example.txt
France:Paris:euro
England:London:pound sterling
Japan:Tokio:yen
USA:Washington:dollar
China:Peking:renminbi
Τηεοδ29
~] cut -c 2,5,7 example.txt
rc:
nad
anT
SWs
haP
ηδ9
~] cut -c 1-7 example.txt
France:
England
Japan:T
USA:Was
China:P
Τηεοδ29
~] cut -c 3- example.txt
ance:Paris:euro
gland:London:pound sterling
pan:Tokio:yen
A:Washington:dollar
ina:Peking:renminbi
εοδ29
~] cut -c -5 example.txt
Franc
Engla
Japan
USA:W
China
Τηεοδ
~] cut -c 1,2,4-5,8 example.txt
FrncP
Enla:
Jaano
US:Wh
Chnae
Τηοδ
Note that the cut shipped with coreutils doesn’t have a working option to cut by characters on every system. On such systems, when using the -c option, cut behaves the same as when using the -b option. This does not apply to RedHat 8. The examples in this article were made on RedHat 8.
To simulate some uses of cut -c on such systems, you have to use the sed program or the perl programming language.
# we want chars from position 3 to 5
# don't capture the first two chars and then capture the next 3 chars
~] echo Τηεοδ29 | sed -e 's/^.\{2\}\(.\{3\}\).*/\1/'
εοδ
# we want chars from position 3 to 5
# don't capture the first two chars and then capture the next 3 chars
~] echo Τηεοδ29 | perl -CS -pe 's/^.{2}(.{3}).*/$1/'
εοδ
- -CS turns on UTF-8 for standard input, output and error in perl
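To find out whether such a workaround is needed on a particular machine, the behavior of -c can be probed. The following is only a sketch: cut_c_mode is a hypothetical helper name, and the result also depends on the current locale (with a non-UTF-8 locale, -c will likely be reported as working on bytes).

```shell
# Hypothetical helper: report how the local cut interprets -c.
cut_c_mode() {
    # Input is 'Z' + the two UTF-8 bytes of 'ü' + 'rich'.
    # Position 2 is either the whole 'ü' (2 bytes) or only its first byte.
    selected_bytes=$(printf 'Z\303\274rich\n' | cut -c 2 | wc -c | tr -d ' ')
    # wc -c also counts the trailing newline that cut prints.
    if [ "$selected_bytes" -eq 3 ]; then
        echo characters
    else
        echo bytes
    fi
}

cut_c_mode
```

If the function prints bytes, the sed or perl commands shown above are the safer choice for multibyte text.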
cut - Slicing by Fields
Now, let’s see how we can slice file data by field.
The -f option of the cut command selects fields by number, for example 1 for the first field. By default, cut assumes that the fields in the file are separated by the tab delimiter. But we can override this behavior by using the -d or --delimiter option to specify a different delimiter:
~] cat example.txt
France:Paris:euro
England:London:pound sterling
Japan:Tokio:yen
USA:Washington:dollar
China:Peking:renminbi
Τηεοδ29
~] cut -d : -f 2 example.txt
Paris
London
Tokio
Washington
Peking
Τηεοδ29
~] cut -d : -f 1,3 example.txt
France:euro
England:pound sterling
Japan:yen
USA:dollar
China:renminbi
Τηεοδ29
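The last line of example.txt contains no ":" and is therefore printed unchanged in both outputs. A sketch of how the -s option drops such lines, using a shortened copy of example.txt written to a temporary file:

```shell
# Shortened copy of example.txt in a temporary file.
sample=$(mktemp)
printf '%s\n' 'France:Paris:euro' 'Japan:Tokio:yen' 'Τηεοδ29' > "$sample"

# Without -s, the line that has no delimiter is passed through unchanged.
default_out=$(cut -d : -f 2 "$sample")
# With -s, that line is skipped.
only_delimited=$(cut -d : -f 2 -s "$sample")

echo "$default_out"
echo "---"
echo "$only_delimited"

rm "$sample"
```

The part after the --- separator should contain only Paris and Tokio.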