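I'm developing a Python application that prints text to the console in multiple languages on multiple platforms. The program works on all UNIX platforms, but on Windows I get errors when printing Unicode strings on the command line.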
There is already a related thread (Windows cmd encoding change causes Python crash), but I couldn't find an answer to my specific problem there.
For example, with the following Asian text, I can run this on Linux:
```
>>> print u"\u5f15\u8d77\u7684\u6216".encode("utf-8")
引起的或
```
But on Windows I get:
```
>>> print u"\u5f15\u8d77\u7684\u6216".encode("utf-8")
σ╝ץΦ╡╖τתהµטצ
```
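The garbled line is what you get when the console treats each UTF-8 byte as one character of its OEM code page. The Hebrew letters in it are consistent with a cp862 (Hebrew DOS) console. The following Python 3 sketch (the question itself uses Python 2) reproduces the effect on any platform; the variable names are illustrative only:

```python
text = "\u5f15\u8d77\u7684\u6216"  # the four CJK characters from the question

# Each of these CJK characters takes 3 bytes in UTF-8: 4 characters -> 12 bytes.
utf8_bytes = text.encode("utf-8")

# A console using an OEM code page draws one glyph per byte, so decoding the
# same bytes as cp862 (Hebrew) yields the 12-character garbage shown above.
as_cp862 = utf8_bytes.decode("cp862")

# The US default, cp437, garbles the same bytes into a slightly different string.
as_cp437 = utf8_bytes.decode("cp437")

# No information is lost: re-encoding with the same code page gives the bytes back.
assert as_cp862.encode("cp862") == utf8_bytes
```

In other words, Python did send correct UTF-8; the console simply interpreted it with the wrong code page.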
I did manage to display the correct text in a message box by doing this:
```
>>> file("bla.vbs", "w").write(u'MsgBox "\u5f15\u8d77\u7684\u6216", 4, "MyTitle"'.encode("utf-16"))
>>> os.system("cscript //U //NoLogo bla.vbs")
```
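The same message-box workaround can be written in Python 3 without encoding the bytes by hand. This is a sketch: `show_message_box` and its parameters are hypothetical, and it adds doubling of embedded quotes, which the original one-liner does not need because its text contains none:

```python
import io
import subprocess
import sys


def show_message_box(text, title="MyTitle", script_path="bla.vbs"):
    """Write a one-line VBScript calling MsgBox, then run it with cscript.

    The script is saved as UTF-16; the "utf-16" codec always writes a BOM,
    mirroring the .encode("utf-16") in the snippet above.
    On non-Windows platforms the file is written but nothing is run.
    """
    def quote(s):
        # VBScript escapes a double quote inside a string literal by doubling it.
        return '"' + s.replace('"', '""') + '"'

    # 4 is the MsgBox buttons argument used in the original snippet.
    script = "MsgBox %s, 4, %s" % (quote(text), quote(title))
    with io.open(script_path, "w", encoding="utf-16") as f:
        f.write(script)
    if sys.platform != "win32":
        return None
    # Same cscript switches as the original os.system call; returns its exit code.
    return subprocess.call(["cscript", "//U", "//NoLogo", script_path])
```

Passing an argument list to `subprocess.call` instead of a string to `os.system` avoids shell quoting problems if `script_path` contains spaces.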
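However, I would like to be able to print to the Windows console, preferably without needing much configuration outside my Python code (because my application will be distributed to many hosts).
Is this possible?
Edit: If this isn't possible, I'd gladly accept other suggestions for writing a console application that displays Unicode on Windows, for example a Python implementation of an alternative Windows console.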
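There is a WriteConsoleW solution that provides Unicode argv and stdout (print), but not stdin: Windows cmd encoding change causes Python crash
The only thing I changed is keeping sys.argv as Unicode; the original version encoded it as UTF-8 for some reason.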
```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""
https://stackoverflow.com/questions/878972/windows-cmd-encoding-change-causes-python-crash#answer-3259271
"""

import sys

if sys.platform == "win32":
    import codecs
    from ctypes import WINFUNCTYPE, windll, POINTER, byref, c_int
    from ctypes.wintypes import BOOL, HANDLE, DWORD, LPWSTR, LPCWSTR, LPVOID

    original_stderr = sys.stderr

    # If any exception occurs in this code, we'll probably try to print it on stderr,
    # which makes for frustrating debugging if stderr is directed to our wrapper.
    # So be paranoid about catching errors and reporting them to original_stderr,
    # so that we can at least see them.
    def _complain(message):
        print >>original_stderr, message if isinstance(message, str) else repr(message)

    # Work around <http://bugs.python.org/issue6058>.
    codecs.register(lambda name: codecs.lookup('utf-8') if name == 'cp65001' else None)

    # Make Unicode console output work independently of the current code page.
    # This also fixes <http://bugs.python.org/issue1602>.
    # Credit to Michael Kaplan <http://www.siao2.com/2010/04/07/9989346.aspx>
    # and TZOmegaTZIOY
    # <https://stackoverflow.com/questions/878972/windows-cmd-encoding-change-causes-python-crash/1432462#1432462>.
    try:
        # <http://msdn.microsoft.com/en-us/library/ms683231(VS.85).aspx>
        # HANDLE WINAPI GetStdHandle(DWORD nStdHandle);
        # returns INVALID_HANDLE_VALUE, NULL, or a valid handle
        #
        # <http://msdn.microsoft.com/en-us/library/aa364960(VS.85).aspx>
        # DWORD WINAPI GetFileType(HANDLE hFile);
        #
        # <http://msdn.microsoft.com/en-us/library/ms683167(VS.85).aspx>
        # BOOL WINAPI GetConsoleMode(HANDLE hConsole, LPDWORD lpMode);

        GetStdHandle = WINFUNCTYPE(HANDLE, DWORD)(("GetStdHandle", windll.kernel32))
        STD_OUTPUT_HANDLE = DWORD(-11)
        STD_ERROR_HANDLE = DWORD(-12)
        # hFile is a HANDLE; declaring it as DWORD could truncate 64-bit handle values.
        GetFileType = WINFUNCTYPE(DWORD, HANDLE)(("GetFileType", windll.kernel32))
        FILE_TYPE_CHAR = 0x0002
        FILE_TYPE_REMOTE = 0x8000
        GetConsoleMode = WINFUNCTYPE(BOOL, HANDLE, POINTER(DWORD))(("GetConsoleMode", windll.kernel32))
        INVALID_HANDLE_VALUE = DWORD(-1).value

        def not_a_console(handle):
            if handle == INVALID_HANDLE_VALUE or handle is None:
                return True
            return ((GetFileType(handle) & ~FILE_TYPE_REMOTE) != FILE_TYPE_CHAR
                    or GetConsoleMode(handle, byref(DWORD())) == 0)

        old_stdout_fileno = None
        old_stderr_fileno = None
        if hasattr(sys.stdout, 'fileno'):
            old_stdout_fileno = sys.stdout.fileno()
        if hasattr(sys.stderr, 'fileno'):
            old_stderr_fileno = sys.stderr.fileno()

        STDOUT_FILENO = 1
        STDERR_FILENO = 2
        real_stdout = (old_stdout_fileno == STDOUT_FILENO)
        real_stderr = (old_stderr_fileno == STDERR_FILENO)

        if real_stdout:
            hStdout = GetStdHandle(STD_OUTPUT_HANDLE)
            if not_a_console(hStdout):
                real_stdout = False

        if real_stderr:
            hStderr = GetStdHandle(STD_ERROR_HANDLE)
            if not_a_console(hStderr):
                real_stderr = False

        if real_stdout or real_stderr:
            # BOOL WINAPI WriteConsoleW(HANDLE hOutput, LPWSTR lpBuffer, DWORD nChars,
            #                           LPDWORD lpCharsWritten, LPVOID lpReserved);

            WriteConsoleW = WINFUNCTYPE(BOOL, HANDLE, LPWSTR, DWORD, POINTER(DWORD), LPVOID)(("WriteConsoleW", windll.kernel32))

            class UnicodeOutput:
                def __init__(self, hConsole, stream, fileno, name):
                    self._hConsole = hConsole
                    self._stream = stream
                    self._fileno = fileno
                    self.closed = False
                    self.softspace = False
                    self.mode = 'w'
                    self.encoding = 'utf-8'
                    self.name = name
                    self.flush()

                def isatty(self):
                    return False

                def close(self):
                    # don't really close the handle, that would only cause problems
                    self.closed = True

                def fileno(self):
                    return self._fileno

                def flush(self):
                    if self._hConsole is None:
                        try:
                            self._stream.flush()
                        except Exception as e:
                            _complain("%s.flush: %r from %r" % (self.name, e, self._stream))
                            raise

                def write(self, text):
                    try:
                        if self._hConsole is None:
                            # Redirected output: write UTF-8 bytes to the original stream.
                            if isinstance(text, unicode):
                                text = text.encode('utf-8')
                            self._stream.write(text)
                        else:
                            # Real console: write UTF-16 text with WriteConsoleW.
                            if not isinstance(text, unicode):
                                text = str(text).decode('utf-8')
                            remaining = len(text)
                            while remaining:
                                n = DWORD(0)
                                # There is a shorter-than-documented limitation on the
                                # length of the string passed to WriteConsoleW (see
                                # <http://tahoe-lafs.org/trac/tahoe-lafs/ticket/1232>).
                                retval = WriteConsoleW(self._hConsole, text, min(remaining, 10000), byref(n), None)
                                if retval == 0 or n.value == 0:
                                    raise IOError("WriteConsoleW returned %r, n.value = %r" % (retval, n.value))
                                remaining -= n.value
                                if not remaining:
                                    break
                                text = text[n.value:]
                    except Exception as e:
                        _complain("%s.write: %r" % (self.name, e))
                        raise

                def writelines(self, lines):
                    try:
                        for line in lines:
                            self.write(line)
                    except Exception as e:
                        _complain("%s.writelines: %r" % (self.name, e))
                        raise

            if real_stdout:
                sys.stdout = UnicodeOutput(hStdout, None, STDOUT_FILENO, '<Unicode console stdout>')
            else:
                sys.stdout = UnicodeOutput(None, sys.stdout, old_stdout_fileno, '<Unicode redirected stdout>')

            if real_stderr:
                sys.stderr = UnicodeOutput(hStderr, None, STDERR_FILENO, '<Unicode console stderr>')
            else:
                sys.stderr = UnicodeOutput(None, sys.stderr, old_stderr_fileno, '<Unicode redirected stderr>')
    except Exception as e:
        _complain("exception %r while fixing up sys.stdout and sys.stderr" % (e,))

    # While we're at it, let's unmangle the command-line arguments:

    # This works around <http://bugs.python.org/issue2128>.
    GetCommandLineW = WINFUNCTYPE(LPWSTR)(("GetCommandLineW", windll.kernel32))
    CommandLineToArgvW = WINFUNCTYPE(POINTER(LPWSTR), LPCWSTR, POINTER(c_int))(("CommandLineToArgvW", windll.shell32))

    argc = c_int(0)
    argv_unicode = CommandLineToArgvW(GetCommandLineW(), byref(argc))

    argv = [argv_unicode[i] for i in xrange(0, argc.value)]
    # argv = [argv_unicode[i].encode('utf-8') for i in xrange(0, argc.value)]

    if not hasattr(sys, 'frozen'):
        # If this is an executable produced by py2exe or bbfreeze, then it will
        # have been invoked directly. Otherwise, unicode_argv[0] is the Python
        # interpreter, so skip that.
        argv = argv[1:]

        # Also skip option arguments to the Python interpreter.
        while len(argv) > 0:
            arg = argv[0]
            if not arg.startswith(u"-") or arg == u"-":
                break
            argv = argv[1:]
            if arg == u'-m':
                # sys.argv[0] should really be the absolute path of the module source,
                # but never mind
                break
            if arg == u'-c':
                argv[0] = u'-c'
                break

    # if you like:
    sys.argv = argv
```
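The core of `UnicodeOutput.write` in the script above is a loop that passes WriteConsoleW at most 10000 characters per call and then advances by the count the API reports in its `lpCharsWritten` out-parameter. That loop can be exercised without Windows by swapping in a fake writer. A Python 3 sketch; `write_in_chunks` and `fake_write` are hypothetical names, and unlike WriteConsoleW, which counts UTF-16 code units, this version counts Python 3 `str` code points (the two only differ for characters outside the Basic Multilingual Plane):

```python
def write_in_chunks(text, write_chunk, max_chars=10000):
    """Hand text to write_chunk in slices of at most max_chars characters.

    write_chunk(s) must return how many characters of s it actually
    consumed, playing the role of WriteConsoleW's lpCharsWritten.
    """
    remaining = text
    while remaining:
        n = write_chunk(remaining[:max_chars])
        if n <= 0:
            # Same guard as the original: a writer making no progress is an error.
            raise IOError("writer made no progress")
        # Partial writes are fine: resume right after what was consumed.
        remaining = remaining[n:]


# Hypothetical stand-in for WriteConsoleW that accepts at most 3 characters per call.
written = []

def fake_write(chunk):
    accepted = chunk[:3]
    written.append(accepted)
    return len(accepted)

write_in_chunks("\u5f15\u8d77\u7684\u6216 hello", fake_write, max_chars=5)
```

Because the loop always trusts the returned count rather than the slice length, it copes with both the 10000-character cap and a writer that accepts less than it was offered.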
Use a different console program. The following works in mintty, the default terminal emulator in Cygwin:
```
>>> print u"\u5f15\u8d77\u7684\u6216"
引起的或
```
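There are other console replacements available for Windows, but I haven't evaluated their Unicode support.
It's simply that the cmd and PowerShell consoles don't support variable-width fonts, and the fixed-width fonts don't contain Chinese script. The same is true of Cygwin.
PuTTY is more advanced and supports variable-width fonts, covering Cyrillic, Vietnamese, and Arabic, but not Chinese so far.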
HTH
You could try getting the iconv program on Windows and piping your Python output through it. It would look something like this:
```
python foo.py | iconv -f utf-8 -t utf-16
```
You may need to do some work to get iconv on Windows: it's part of Cygwin, but you might be able to build it separately if needed.
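If iconv is not available, the same UTF-8 to UTF-16 re-encoding step can be done by Python itself. A Python 3 sketch with hypothetical function names; whether the console then renders the UTF-16 output correctly is just as uncertain as with iconv:

```python
import sys


def utf8_to_utf16(data):
    """Decode UTF-8 bytes and re-encode them as UTF-16.

    Python's "utf-16" codec prepends a byte-order mark in the machine's
    native byte order.
    """
    return data.decode("utf-8").encode("utf-16")


def main():
    # Read raw bytes from the pipe and write raw bytes back, bypassing the
    # text layer so that no console code page is involved.
    sys.stdout.buffer.write(utf8_to_utf16(sys.stdin.buffer.read()))
```

Saved in a script that ends with a call to `main()`, it would be used like `python foo.py | python utf8_to_utf16.py` (the file name is hypothetical).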
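This question is answered in the PrintFails article.
By default, the console in Microsoft Windows displays only 256 characters: cp437, i.e. code page 437, the extended ASCII character set of the original 1981 IBM PC.
For Russia this means CP866, and other countries use their own code pages as well. So to read Python output correctly in the Windows console, Windows must be configured with a native code page that can display the characters being printed.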
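Whether a given code page can represent some text at all can be checked by simply trying to encode it. A Python 3 sketch; `can_encode` is a hypothetical helper, and cp936 is the Simplified Chinese (GBK) code page that a Chinese "language for non-Unicode programs" setting would typically give the console:

```python
def can_encode(text, codepage):
    """Return True if every character in text exists in the given code page."""
    try:
        text.encode(codepage)
    except UnicodeEncodeError:
        return False
    return True


russian = "\u041f\u0440\u0438\u0432\u0435\u0442"  # "Privet" in Cyrillic
chinese = "\u5f15\u8d77\u7684\u6216"              # the text from the question

for codepage in ("cp437", "cp866", "cp936"):
    print(codepage, can_encode(russian, codepage), can_encode(chinese, codepage))
```

Only a code page for which this returns True can show the text; any other code page produces either an exception or the garbled output described in the question.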
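I suggest always printing Unicode text without encoding it yourself, to ensure maximum compatibility across platforms.
If you try to print characters the console cannot display, you will get a UnicodeEncodeError or see garbled text.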
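If you follow that advice but cannot rule out characters missing from the console's code page, one way to avoid the UnicodeEncodeError is to substitute them before writing. A Python 3 sketch using the standard `errors="replace"` codec handler; `safe_print` is a hypothetical helper, and it trades the exception for `?` placeholders rather than making the text readable:

```python
import io
import sys


def safe_print(text, stream=None):
    """Print text, replacing characters the stream's encoding cannot represent."""
    if stream is None:
        stream = sys.stdout
    encoding = getattr(stream, "encoding", None) or "ascii"
    # Round-trip through the target encoding with "replace" so that the
    # subsequent write can no longer raise UnicodeEncodeError.
    printable = text.encode(encoding, errors="replace").decode(encoding)
    print(printable, file=stream)


# Usage: write to an in-memory cp437 stream to see the substitution.
buffer = io.BytesIO()
cp437_stream = io.TextIOWrapper(buffer, encoding="cp437", newline="\n")
safe_print("\u5f15\u8d77 ok", cp437_stream)
cp437_stream.flush()
```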
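In some cases, when Python cannot determine the output encoding correctly, you can try setting the PYTHONIOENCODING environment variable. Note, however, that this probably won't help with your example, because in its current configuration the console cannot display Asian text.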
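PYTHONIOENCODING has the form `encoding[:errors]` and is read when the interpreter starts, so its effect is easiest to see on a child process. A Python 3 sketch; `run_with_io_encoding` is a hypothetical helper. It only changes which bytes Python writes, not which glyphs the console can draw:

```python
import os
import subprocess
import sys


def run_with_io_encoding(code, io_encoding):
    """Run code in a child Python with PYTHONIOENCODING set; return its stdout bytes."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


# The child source stays ASCII; the \u escapes are resolved inside the child.
child_code = 'print("\\u5f15\\u8d77")'
replaced = run_with_io_encoding(child_code, "cp437:replace")  # unencodable -> "?"
as_utf8 = run_with_io_encoding(child_code, "utf-8")           # real UTF-8 bytes
print(replaced, as_utf8)
```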
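To reconfigure the console, go to Control Panel -> Regional and Language Options -> Advanced (tab) -> Language for non-Unicode programs (section). Note that I translated the menu names from Russian.
See also the answer to a very similar question.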