Processing text strings with cut and tr

Published 2023-11-23 17:33:06, author: 往事已成昨天

Linux commands: cut and tr

1. Preface

This article gives an overview of the Linux command-line utilities "cut" and "tr".

WeChat official account: 滑翔的纸飞机

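2. Linux command: cut

The "cut" command is a command-line tool that cuts selected parts out of each line of the given files or of piped data and prints the result to standard output.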

root@dev:~# man cut
-------------------------------------------------------
NAME
cut - remove sections from each line of files
SYNOPSIS
cut OPTION... [FILE]...
... ...
-b, --bytes=LIST
select only these bytes
-c, --characters=LIST
select only these characters
-d, --delimiter=DELIM
use DELIM instead of TAB for field delimiter
... ...

Below is a text file. Let's see how to manipulate it to print just the output we need.

test.txt:

Nov 15 00:13:08 dev com.apple.xpc.launchd[1] (com.apple.mdworker.shared.10000000-0000-0000-0000-000000000000[1938]): Service exited due to SIGKILL | sent by mds[98]
Nov 15 00:13:15 dev com.apple.xpc.launchd[1] (com.apple.mdworker.shared.01000000-0000-0000-0000-000000000000[1936]): Service exited due to SIGKILL | sent by mds[98]
Nov 15 00:13:15 dev com.apple.xpc.launchd[1] (com.apple.mdworker.shared.06000000-0200-0000-0000-000000000000[1935]): Service exited due to SIGKILL | sent by mds[98]
Nov 15 00:13:15 dev com.apple.xpc.launchd[1] (com.apple.mdworker.shared.04000000-0200-0000-0000-000000000000[1939]): Service exited due to SIGKILL | sent by mds[98]
Nov 15 00:13:41 dev com.apple.xpc.launchd[1] (com.apple.mdworker.shared.05000000-0600-0000-0000-000000000000[1940]): Service exited due to SIGKILL | sent by mds[98]
Nov 15 00:13:41 dev com.apple.xpc.launchd[1] (com.apple.mdworker.shared.08000000-0100-0000-0000-000000000000[1941]): Service exited due to SIGKILL | sent by mds[98]
Nov 15 00:13:41 dev com.apple.xpc.launchd[1] (com.apple.mdworker.shared.0D000000-0200-0000-0000-000000000000[1917]): Service exited due to SIGKILL | sent by mds[98]
Nov 15 00:13:41 dev com.apple.xpc.launchd[1] (com.apple.mdworker.shared.0E000000-0400-0000-0000-000000000000[1937]): Service exited due to SIGKILL | sent by mds[98]

2.1 Printing by character range

Print the characters that fall within a given range of each line:

Range: 1 - 5

root@dev:~# cut -c 1-5 test.txt 
-------------------------------------------------------
Nov 1
Nov 1
Nov 1
Nov 1
Nov 1
Nov 1
Nov 1
Nov 1

Range: 21 - 40

root@dev:~# cut -c 21-40 test.txt 
-------------------------------------------------------
com.apple.xpc.launch
com.apple.xpc.launch
com.apple.xpc.launch
com.apple.xpc.launch
com.apple.xpc.launch
com.apple.xpc.launch
com.apple.xpc.launch
com.apple.xpc.launch

Range: 76 - end of line

root@dev:~# cut -c 76-  test.txt 
-------------------------------------------------------
00000-0000-0000-0000-000000000000[1938]): Service exited due to SIGKILL | sent by mds[98]
00000-0000-0000-0000-000000000000[1936]): Service exited due to SIGKILL | sent by mds[98]
00000-0200-0000-0000-000000000000[1935]): Service exited due to SIGKILL | sent by mds[98]
00000-0200-0000-0000-000000000000[1939]): Service exited due to SIGKILL | sent by mds[98]
00000-0600-0000-0000-000000000000[1940]): Service exited due to SIGKILL | sent by mds[98]
00000-0100-0000-0000-000000000000[1941]): Service exited due to SIGKILL | sent by mds[98]
00000-0200-0000-0000-000000000000[1917]): Service exited due to SIGKILL | sent by mds[98]
00000-0400-0000-0000-000000000000[1937]): Service exited due to SIGKILL | sent by mds[98]
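
As a further sketch of the LIST syntax, a range passed to -c can be left open at the start, and several ranges can be combined with commas. The sample line below is a shortened copy of a line from test.txt, and the variable names are only for illustration.

```shell
# A shortened sample line in the same layout as test.txt (assumed for illustration).
line='Nov 15 00:13:08 dev com.apple.xpc.launchd[1]'

# N-M selects an inclusive range; here, the whole timestamp.
timestamp=$(printf '%s\n' "$line" | cut -c 1-15)
echo "$timestamp"    # Nov 15 00:13:08

# -M means "from the first character through character M".
month=$(printf '%s\n' "$line" | cut -c -3)
echo "$month"        # Nov

# Comma-separated ranges are printed back to back, with nothing in between.
joined=$(printf '%s\n' "$line" | cut -c 1-3,8-15)
echo "$joined"       # Nov00:13:08
```

Note that cut always prints the selected characters in their original order, so a list such as 8-15,1-3 would give the same result as 1-3,8-15.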

2.2 Printing by field number

Suppose we want to extract data from the following file field by field.

test.txt:

NAME EMAIL PHONE ADDRESS
devid devid@text.com 0897663232 beijin,china
harry harry@text.com 0232323232 hangzhou,china
jane jane@text.com 0323213122 zhejiang,china

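We use the "-d" (delimiter) option to tell cut how the fields are separated; the delimiter must be a single character and defaults to TAB. We then use "-f" to give the numbers of the fields to print.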

-d, --delimiter=DELIM   
-f, --fields=LIST

>> cut -d ' ' -f1

In the demonstrations below we use a space (' ') as the delimiter.

# Print column 1, splitting on spaces
root@dev:~# cut -d ' ' -f1 test.txt
-------------------------------------------------------
NAME
devid
harry
jane


# Print column 2, splitting on spaces
root@dev:~# cut -d ' ' -f2 test.txt
-------------------------------------------------------
EMAIL
devid@text.com
harry@text.com
jane@text.com

Printing several fields: print columns 1 and 3

root@dev:~# cut -d ' ' -f1,3 test.txt
-------------------------------------------------------
NAME PHONE
devid 0897663232
harry 0232323232
jane 0323213122

Using a comma (,) as the delimiter:

root@dev:~# echo "jane,jane@dev,12345678,china" | cut -d ','  -f1
--------------------------------------------------------------------
jane

root@dev:~# echo "jane,jane@dev,12345678,china" | cut -d ',' -f2
--------------------------------------------------------------------
jane@dev

root@dev:~# echo "jane,jane@dev,12345678,china" | cut -d ',' -f3
--------------------------------------------------------------------
12345678
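
Two more details of -f are worth a small sketch: open-ended field ranges, and what happens to lines that do not contain the delimiter at all. The input strings and variable names here are made up for illustration; -s is the short form of --only-delimited.

```shell
record='jane,jane@dev,12345678,china'

# N- selects field N through the last field; the delimiter is kept between them.
rest=$(printf '%s\n' "$record" | cut -d ',' -f 2-)
echo "$rest"        # jane@dev,12345678,china

# A line without any delimiter is passed through unchanged by default...
plain=$(printf '%s\n' 'no-commas-here' | cut -d ',' -f 2)
echo "$plain"       # no-commas-here

# ...while -s suppresses lines that contain no delimiter.
skipped=$(printf '%s\n' 'no-commas-here' | cut -d ',' -s -f 2)
echo "[$skipped]"   # []
```

Adding -s is usually the safer choice when the input may contain header or comment lines that are not delimited like the data rows.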

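3. Linux command: tr

The Linux tr command translates, squeezes, or deletes characters.
tr reads from standard input, translates the characters, and writes the result to standard output. It takes no file arguments, so file contents are supplied through a pipe or input redirection.

Syntax

tr [-cdst] [--help] [--version] [SET1] [SET2]
tr [OPTION]... SET1 [SET2]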

Options in detail:

>> man tr
--------------------------------------------------------------------
tr [OPTION]... SET1 [SET2]
# Options
-c, -C, --complement
use the complement of SET1
-d, --delete
delete characters in SET1, do not translate
-s, --squeeze-repeats
replace each sequence of a repeated character that is listed in the last specified SET, with a single occurrence of that character
-t, --truncate-set1
first truncate SET1 to length of SET2
...
...

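Option descriptions:

⁃ -c, --complement: use the complement of SET1. Characters listed in SET1 are left untouched, and only the characters not in SET1 are processed
⁃ -d, --delete: delete the characters in SET1 instead of translating them
⁃ -s, --squeeze-repeats: squeeze each run of a repeated character into a single occurrence of that character
⁃ -t, --truncate-set1: truncate SET1 so that it is the same length as SET2
⁃ --help: show usage information
⁃ --version: show the version of the program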
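
The effect of -t is easiest to see when SET1 is longer than SET2. The following is a minimal sketch that assumes GNU coreutils tr, which the man excerpt above appears to come from; other tr implementations may not accept -t or may handle a short SET2 differently.

```shell
# Without -t, GNU tr pads SET2 by repeating its last character,
# so 'o' (the third character of SET1) is also mapped to 'y'.
padded=$(printf 'Hello World\n' | tr 'Hlo' 'xy')
echo "$padded"       # xeyyy Wyryd

# With -t, SET1 is first truncated to 'Hl', so 'o' is left alone.
truncated=$(printf 'Hello World\n' | tr -t 'Hlo' 'xy')
echo "$truncated"    # xeyyo Woryd
```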

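Character set notation:

⁃ \NNN the character with octal value NNN (1 to 3 octal digits)
⁃ \\ backslash
⁃ \a Ctrl-G, audible bell
⁃ \b Ctrl-H, backspace
⁃ \f Ctrl-L, form feed
⁃ \n Ctrl-J, newline
⁃ \r Ctrl-M, carriage return
⁃ \t Ctrl-I, horizontal tab
⁃ \v Ctrl-K, vertical tab
⁃ CHAR1-CHAR2 : all characters from CHAR1 to CHAR2. The range follows ASCII order and must be ascending, never descending.
⁃ [CHAR*] : for SET2 only; repeats CHAR until SET2 is as long as SET1
⁃ [CHAR*REPEAT] : also a SET2 notation; REPEAT copies of CHAR (REPEAT is read as octal if it starts with 0)
⁃ [:alnum:] : all letters and digits
⁃ [:alpha:] : all letters
⁃ [:blank:] : all horizontal whitespace
⁃ [:cntrl:] : all control characters
⁃ [:digit:] : all digits
⁃ [:graph:] : all printable characters, not including space
⁃ [:lower:] : all lowercase letters
⁃ [:print:] : all printable characters, including space
⁃ [:punct:] : all punctuation characters
⁃ [:space:] : all horizontal and vertical whitespace
⁃ [:upper:] : all uppercase letters
⁃ [:xdigit:] : all hexadecimal digits
⁃ [=CHAR=] : all characters equivalent to CHAR (CHAR stands for a character of your choice)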
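
A few of these notations in use, as a rough sketch. The input strings and variable names are made up for illustration; the sets are quoted so that the shell does not try to expand the brackets as filename patterns.

```shell
# Character classes: lowercase to uppercase.
upper=$(printf 'Hello World\n' | tr '[:lower:]' '[:upper:]')
echo "$upper"      # HELLO WORLD

# Ranges: map a-f onto A-F, position by position.
hex=$(printf 'deadbeef cafe\n' | tr 'a-f' 'A-F')
echo "$hex"        # DEADBEEF CAFE

# [CHAR*] in SET2: repeat 'x' until SET2 is as long as SET1.
masked=$(printf 'Hello World\n' | tr 'lo' '[x*]')
echo "$masked"     # Hexxx Wxrxd

# Octal escape: \011 is the tab character, turned into a space here.
spaced=$(printf 'a\tb\n' | tr '\011' ' ')
echo "$spaced"     # a b
```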

3.1 Replacing characters

Replace 'H' with 'h':

root@dev:~# echo "Hello World" | tr 'H' 'h'
--------------------------------------------------------------------
hello World

Replace 'Ho' with 'xx', that is, every 'H' or 'o' becomes 'x':

root@dev:~# echo "Hello World" | tr 'Ho' 'xx'
--------------------------------------------------------------------
xellx Wxrld
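
Because tr maps the characters of SET1 onto SET2 one position at a time, rather than replacing whole strings, a classic illustration is ROT13. The sketch below wraps it in a helper function whose name, rot13, is our own choice and not part of tr.

```shell
# Shift every letter 13 places: A-M maps to N-Z, N-Z maps to A-M (same for lowercase).
rot13() {
    tr 'A-Za-z' 'N-ZA-Mn-za-m'
}

encoded=$(printf 'Hello World\n' | rot13)
echo "$encoded"    # Uryyb Jbeyq

# Applying the same mapping again restores the original text.
decoded=$(printf '%s\n' "$encoded" | rot13)
echo "$decoded"    # Hello World
```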

3.2 Deleting characters

# Delete every 'H' and 'o'
root@dev:~# echo "Hello World" | tr -d 'Ho'
--------------------------------------------------------------------
ell Wrld

# Complement: delete everything except 'H', 'd', and newline
root@dev:~# echo "Hello World" | tr -cd 'Hd\n'
--------------------------------------------------------------------
Hd

# Complement: delete everything except digits (the set is quoted so the shell cannot expand it as a glob)
root@dev:~# echo "Hello World 12345 " | tr -cd '[:digit:]'
--------------------------------------------------------------------
12345

# Complement: delete everything except letters
root@dev:~# echo "Hello World 12345 " | tr -cd '[:alpha:]'
--------------------------------------------------------------------
HelloWorld
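
Two everyday uses of -d, sketched with made-up input: stripping carriage returns from DOS-style line endings, and keeping the newline when filtering with -cd. In the two examples just above the trailing newline is deleted along with everything else, so listing '\n' in the set is one way to keep the output line-terminated.

```shell
# Strip carriage returns from CRLF line endings.
unix_text=$(printf 'line one\r\nline two\r\n' | tr -d '\r')
printf '%s\n' "$unix_text"

# Keep only digits, but list '\n' as well so that the line ending survives.
digits=$(printf 'Hello World 12345 \n' | tr -cd '[:digit:]\n')
echo "$digits"     # 12345
```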

3.3 Squeezing characters

# Squeeze repeats of the listed characters
root@dev:~# echo "HHHHHHHHellooooo Woooorrrrrrrrrldddddddddddddddddd" | tr -s 'Hord'
------------------------------------------------------------------------------------
Hello World

# Translate lowercase to uppercase, then squeeze the repeated characters
root@dev:~# echo "Hello World" | tr -s '[:lower:]' '[:upper:]'
------------------------------------------------------------------------------------
HELO WORLD
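
The two commands in this article work well together. cut treats every single delimiter character as a field boundary, so columns padded with several spaces produce empty fields; squeezing the spaces with tr -s first is a common workaround. The sample row below is made up in the style of the earlier test.txt.

```shell
# Columns padded with several spaces (made-up sample data).
row='jane    jane@text.com   0323213122'

# Each space is its own delimiter, so field 2 is empty here.
raw=$(printf '%s\n' "$row" | cut -d ' ' -f 2)
echo "[$raw]"      # []

# Squeezing the spaces first makes the field numbers line up again.
email=$(printf '%s\n' "$row" | tr -s ' ' | cut -d ' ' -f 2)
echo "$email"      # jane@text.com
```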
