LESSCHARSET=utf-8 less doesn't seem to work

I'm trying to view a UTF-8 text file/stream with less, but even when I invoke it like this:

 cat file | LESSCHARSET=utf-8 less 

the non-ASCII UTF-8 characters are not displayed correctly. Instead, their hex values are shown highlighted in angle brackets, e.g. <F4>.

Reading the same text in vim with UTF-8 encoding works without problems, so I suspect something is wrong with the way I'm invoking less.

My locale output is as follows:

 LANG="en_US.UTF-8"
 LC_COLLATE="en_US.UTF-8"
 LC_CTYPE="en_US.UTF-8"
 LC_MESSAGES="en_US.UTF-8"
 LC_MONETARY="en_US.UTF-8"
 LC_NUMERIC="en_US.UTF-8"
 LC_TIME="en_US.UTF-8"
 LC_ALL=

My version of less is the one installed by Xcode on OS X Leopard:

 $ less --version | sed 's/^/ /'
 less 394
 Copyright (C) 1984-2005 Mark Nudelman
 less comes with NO WARRANTY, to the extent permitted by law.
 For information about the terms of redistribution,
 see the file named README in the less distribution.
 Homepage: http://www.greenwoodsoftware.com/less

locale -a | grep US | sed 's/^/ /' outputs the following:

 en_AU.US-ASCII
 en_CA.US-ASCII
 en_GB.US-ASCII
 en_NZ.US-ASCII
 en_US
 en_US.ISO8859-1
 en_US.ISO8859-15
 en_US.US-ASCII
 en_US.UTF-8

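  1. What does the locale command output? Is it a UTF-8 locale?

  2. Are you sure your terminal is set up to display UTF-8? Does echo -e '\xe2\x82\xac' produce a euro sign?

  3. Is the locale you set actually installed on the system? Does it appear in the output of locale -a?

  4. Which version of less are you using? (Run less --version to find out.) Really, really old versions don't even support LESSCHARSET. That is unlikely to be the case here, though: I have a Debian "sarge" system with less 382, and it doesn't even need LESSCHARSET if the locale is set correctly.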
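The four checks above can be run in one go. What follows is a minimal sketch rather than a definitive diagnostic: the function names is_utf8_charmap and report_environment are hypothetical, and it assumes a POSIX shell with the locale utility available.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given charmap name denotes UTF-8.
is_utf8_charmap() {
    case "$1" in
        UTF-8|utf-8|UTF8|utf8) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: print the facts the checklist asks about.
report_environment() {
    charmap=$(locale charmap 2>/dev/null)
    echo "charmap: ${charmap:-unknown}"
    if is_utf8_charmap "$charmap"; then
        echo "locale is UTF-8"
    else
        echo "locale is NOT UTF-8"
    fi
    # Bytes E2 82 AC written in octal; a UTF-8 terminal shows a euro sign.
    printf 'euro test: \342\202\254\n'
    # Exact match against locale -a. Some systems spell the name
    # differently there (e.g. en_US.utf8), so a miss is only a hint.
    if locale -a 2>/dev/null | grep -qxF "${LANG:-}"; then
        echo "LANG=${LANG:-} is installed"
    else
        echo "LANG=${LANG:-unset} not found in locale -a"
    fi
    less --version 2>/dev/null | head -n 1
}
```

Calling report_environment in the same terminal where less misbehaves gives all four answers at once.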

My guess is that your file is not UTF-8 but ISO 8859. (Should the <F4> character be "ô"?)

Start an xterm with LANG=en_US.ISO-8859-1 xterm. Then verify the locale (the output of locale should look something like en_US.ISO-8859-1). Then use less to view the file. Is it displayed correctly?

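Note that just using LESSCHARSET=iso8859 without starting a new terminal is not enough. less then assumes that the terminal can interpret iso8859, but the terminal is probably displaying UTF-8, since the euro sign is displayed correctly. And because \xf4 on its own is not a valid UTF-8 sequence, the terminal will probably display something like a replacement character ("�").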
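One way to test the guess that the file is not UTF-8 is to let iconv validate it: converting from UTF-8 to UTF-8 fails on any invalid byte sequence. This is a minimal sketch; the function name is_valid_utf8 is hypothetical, and it assumes that iconv and mktemp are installed.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file decodes cleanly as UTF-8.
# iconv exits with a non-zero status when it meets an invalid sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Example: a lone 0xF4 byte (octal 364), like the <F4> shown by less.
sample=$(mktemp) || exit 1
printf 'c\364te\n' > "$sample"
if is_valid_utf8 "$sample"; then
    echo "valid UTF-8"
else
    echo "not valid UTF-8"
fi
rm -f "$sample"
```

In UTF-8, 0xF4 can only start a four-byte sequence, so when it is followed by plain ASCII the check is expected to take the second branch.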

Try the command file file.txt. If the output is, for example, "ISO-8859 English text", convert the file's encoding from ISO-8859 to UTF-8 with the command iconv -f ISO-8859-1 -t UTF-8 -o testfile.txt file.txt. If less testfile.txt displays it correctly, finish with mv testfile.txt file.txt.
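The convert-then-replace steps above can be sketched as a small function. This is an illustration under assumptions: the name latin1_to_utf8 is hypothetical, the source encoding is taken to be ISO-8859-1, and the output goes through a shell redirect instead of -o, a choice made here only to avoid depending on that option.

```shell
#!/bin/sh
# Hypothetical helper: convert a file from ISO-8859-1 to UTF-8 in place.
# The original is overwritten only if the conversion succeeds.
latin1_to_utf8() {
    src=$1
    tmp=$(mktemp) || return 1
    if iconv -f ISO-8859-1 -t UTF-8 "$src" > "$tmp"; then
        # Copy the bytes back instead of using mv, so the original
        # file keeps its permissions.
        cat "$tmp" > "$src"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}
```

Usage would be latin1_to_utf8 file.txt followed by less file.txt. Every byte sequence is valid ISO-8859-1, so running this on a file that is already UTF-8 would double-encode it; checking with file first is the safer order.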

On Mac OS, the charset name is case-sensitive; cp1251, for example, must be written in uppercase:

 bash-4.4$ less --version
 less 458 (POSIX regular expressions)
 Copyright (C) 1984-2012 Mark Nudelman
 bash-4.4$ LESSCHARSET=cp1251 less
 invalid charset name
 bash-4.4$ LESSCHARSET=CP1251 less
 Missing filename ("less --help" for help)

Here is the list of charsets I found:

 { "ascii", NULL, "8bcccbcc18b95.b" },
 { "utf-8", &utf_mode, "8bcccbcc18b95.b126.bb" },
 { "iso8859", NULL, "8bcccbcc18b95.33b." },
 { "latin3", NULL, "8bcccbcc18b95.33b5.b8.b15.b4.b12.b18.b12.b." },
 { "arabic", NULL, "8bcccbcc18b95.33b.3b.7b2.13b.3b.b26.5b19.b" },
 { "greek", NULL, "8bcccbcc18b95.33b4.2b4.b3.b35.b44.b" },
 { "greek2005", NULL, "8bcccbcc18b95.33b14.b35.b44.b" },
 { "hebrew", NULL, "8bcccbcc18b95.33b.b29.32b28.2b2.b" },
 { "koi8-r", NULL, "8bcccbcc18b95.b." },
 { "KOI8-T", NULL, "8bcccbcc18b95.b8.b6.b8.bb5b7.3b4.b4.b3.bb3b." },
 { "georgianps", NULL, "8bcccbcc18b95.3b11.4b12.2b." },
 { "tcvn", NULL, "b..b...bcccbccbbb7.8b95.b48.5b." },
 { "TIS-620", NULL, "8bcccbcc18b95.b.4b.11b7.8b." },
 { "next", NULL, "8bcccbcc18b95.bb125.bb" },
 { "dos", NULL, "8bcccbcc12bc5b95.b." },
 { "windows-1251", NULL, "8bcccbcc12bc5b95.b24.b." },
 { "windows-1252", NULL, "8bcccbcc12bc5b95.b.b11.b.2b12.b." },
 { "windows-1255", NULL, "8bcccbcc12bc5b95.b.b8.b.5b9.b.4b." },
 { "ebcdic", NULL, "5bc6bcc7bcc41b.9b7.9b5.b..8b6.10b6.b9.7b9.8b8.17b3.3b9.7b9.8b8.6b10.bbb" },
 { "IBM-1047", NULL, "4cbcbc3b9cbccbccbb4c6bcc5b3cbbc4bc4bccbc191.b" },
 { NULL, NULL, NULL }

And their aliases:

 { "UTF-8", "utf-8" },
 { "ANSI_X3.4-1968", "ascii" },
 { "US-ASCII", "ascii" },
 { "latin1", "iso8859" },
 { "ISO-8859-1", "iso8859" },
 { "latin9", "iso8859" },
 { "ISO-8859-15", "iso8859" },
 { "latin2", "iso8859" },
 { "ISO-8859-2", "iso8859" },
 { "ISO-8859-3", "latin3" },
 { "latin4", "iso8859" },
 { "ISO-8859-4", "iso8859" },
 { "cyrillic", "iso8859" },
 { "ISO-8859-5", "iso8859" },
 { "ISO-8859-6", "arabic" },
 { "ISO-8859-7", "greek" },
 { "IBM9005", "greek2005" },
 { "ISO-8859-8", "hebrew" },
 { "latin5", "iso8859" },
 { "ISO-8859-9", "iso8859" },
 { "latin6", "iso8859" },
 { "ISO-8859-10", "iso8859" },
 { "latin7", "iso8859" },
 { "ISO-8859-13", "iso8859" },
 { "latin8", "iso8859" },
 { "ISO-8859-14", "iso8859" },
 { "latin10", "iso8859" },
 { "ISO-8859-16", "iso8859" },
 { "IBM437", "dos" },
 { "EBCDIC-US", "ebcdic" },
 { "IBM1047", "IBM-1047" },
 { "KOI8-R", "koi8-r" },
 { "KOI8-U", "koi8-r" },
 { "GEORGIAN-PS", "georgianps" },
 { "TCVN5712-1", "tcvn" },
 { "NEXTSTEP", "next" },
 { "windows", "windows-1252" }, /* backward compatibility */
 { "CP1251", "windows-1251" },
 { "CP1252", "windows-1252" },
 { "CP1255", "windows-1255" },
 { NULL, NULL }
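
To illustrate why cp1251 is rejected while CP1251 is accepted, here is a minimal sketch of an exact-match lookup over a small subset of the two tables. It is not less's actual code: the function name resolve_charset is hypothetical, and exact, case-sensitive matching is an assumption based on the transcript above.

```shell
#!/bin/sh
# Hypothetical helper: map a LESSCHARSET value to a canonical charset
# name using a subset of the alias and charset tables quoted above.
# "case" patterns match exactly, so the lookup is case-sensitive.
resolve_charset() {
    case "$1" in
        # Aliases (subset)
        UTF-8)              echo "utf-8" ;;
        latin1|ISO-8859-1)  echo "iso8859" ;;
        KOI8-R|KOI8-U)      echo "koi8-r" ;;
        CP1251)             echo "windows-1251" ;;
        CP1252|windows)     echo "windows-1252" ;;
        # Canonical names (subset)
        ascii|utf-8|iso8859|koi8-r|windows-1251|windows-1252)
                            echo "$1" ;;
        *)                  echo "invalid charset name" >&2
                            return 1 ;;
    esac
}
```

With this sketch, resolve_charset CP1251 yields windows-1251, while resolve_charset cp1251 falls through to the error branch because no entry is spelled that way.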