ld.so.nohwcap) for valgrind & gdb record

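Modern x86_64 Linux with glibc detects that the CPU supports the AVX extension and switches many string functions from the generic implementation to AVX-optimized versions (with the help of ifunc dispatchers: 1, 2).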

This feature is good for performance, but it prevents valgrind (older libVEX versions, before valgrind-3.8) and gdb's "target record" (Reverse Execution) from working correctly (Ubuntu "Z" 17.04 beta, gdb 7.12.50.20170207-0ubuntu2, gcc 6.3.0-8ubuntu1 20170221, Ubuntu GLIBC 2.24-7ubuntu2):

    $ cat a.c
    #include <string.h>
    #define N 1000
    int main(){
            char src[N], dst[N];
            memcpy(dst, src, N);
            return 0;
    }
    $ gcc a.c -o a -fno-builtin
    $ gdb -q ./a
    Reading symbols from ./a...(no debugging symbols found)...done.
    (gdb) start
    Temporary breakpoint 1 at 0x724
    Starting program: /home/user/src/a

    Temporary breakpoint 1, 0x0000555555554724 in main ()
    (gdb) record
    (gdb) c
    Continuing.
    Process record does not support instruction 0xc5 at address 0x7ffff7b60d31.
    Process record: failed to record execution log.

    Program stopped.
    __memmove_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:416
    416     VMOVU   (%rsi), %VEC(4)
    (gdb) x/i $pc
    => 0x7ffff7b60d31 <__memmove_avx_unaligned_erms+529>:   vmovdqu (%rsi),%ymm4

gdb's implementation of "target record" prints the error message "Process record does not support instruction 0xc5" because the record/replay engine does not support AVX instructions (sometimes the problem is detected in the _dl_runtime_resolve_avx function): https://sourceware.org/ml/gdb/2016-08/msg00028.html "some AVX instructions are not supported by process record", https://bugs.launchpad.net/ubuntu/+source/gdb/+bug/1573786, https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=836802, https://bugzilla.redhat.com/show_bug.cgi?id=1136403

The solution proposed in https://sourceware.org/ml/gdb/2016-08/msg00028.html is: "You can recompile libc (thus ld.so), or hack __init_cpu_features and thus __cpu_features at runtime (see e.g. strcmp)", or set LD_BIND_NOW=1. However, a recompiled glibc still has AVX, and ld bind-now does not help.

I heard that there are /etc/ld.so.nohwcap and LD_HWCAP_MASK configurations in glibc. Can they be used to disable ifunc dispatching to the AVX-optimized string functions in glibc?

How does glibc (rtld?) detect AVX: using cpuid, /proc/cpuinfo (probably not), or the HWCAP aux vector (the command LD_SHOW_AUXV=1 /bin/echo |grep HWCAP gives AT_HWCAP: bfebfbff)?


Not the best or a complete solution, just a minimal bit-editing kludge that lets valgrind and gdb record work for my task.

Lekensteyn asked:

how to mask out AVX/SSE without recompiling glibc

I did a full rebuild of unmodified glibc, which is rather easy in debian and ubuntu: just sudo apt-get source glibc, sudo apt-get build-dep glibc and cd glibc-*/; dpkg-buildpackage -us -uc (done manually to get an ld.so without stripped debug info).

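Then I binary (bit) patched the resulting ld.so file, in the function used by __get_cpu_features. The target function was compiled from get_common_indeces in the source file sysdeps/x86/cpu-features.c under the name get_common_indeces.constprop.1 (it sits just after __get_cpu_features in the binary code). It contains several cpuid instructions; the first one is cpuid eax=1 "Processor Info and Feature Bits". Later there is a check "jle 0x6" that jumps around the code for "cpuid eax=7 ecx=0 Extended Features", which obtains the AVX2 status. This is the source that was compiled into that logic: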

    get_common_indeces (struct cpu_features *cpu_features,
                        unsigned int *family, unsigned int *model,
                        unsigned int *extended_model, unsigned int *stepping)
    {
    ...
      if (cpu_features->max_cpuid >= 7)
        __cpuid_count (7, 0,
                       cpu_features->cpuid[COMMON_CPUID_INDEX_7].eax,
                       cpu_features->cpuid[COMMON_CPUID_INDEX_7].ebx,
                       cpu_features->cpuid[COMMON_CPUID_INDEX_7].ecx,
                       cpu_features->cpuid[COMMON_CPUID_INDEX_7].edx);

cpu_features->max_cpuid is filled in by the line __cpuid (0, cpu_features->max_cpuid, ebx, ecx, edx); in the same file. It was easier to disable the if statement by replacing the jle after cmp 0x6 with jg (byte 0x7e changed to 0x7f). (Actually, this binary patch was manually reapplied to the __get_cpu_features function of the real system ld-linux.so.2: the first jle before mov 7 eax; xor ecx,ecx; cpuid was changed into jg.)
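The single-byte edit described above can be sketched in Python. This is only an illustration under assumptions: the helper name flip_jle_to_jg, the offset argument, and the five sample bytes are made up for the example, and the real offset has to be located by hand in a disassembly of your own ld.so.

```python
JLE_REL8 = 0x7E  # opcode byte of "jle rel8"
JG_REL8 = 0x7F   # opcode byte of "jg rel8"


def flip_jle_to_jg(image: bytes, offset: int) -> bytes:
    """Return a copy of image with the jle at offset turned into jg.

    Refuses to patch when the byte at offset is not 0x7e, so a wrong
    offset cannot silently corrupt the file.
    """
    if image[offset] != JLE_REL8:
        raise ValueError("byte at offset %#x is %#04x, not jle (0x7e)"
                         % (offset, image[offset]))
    patched = bytearray(image)
    patched[offset] = JG_REL8
    return bytes(patched)


# Synthetic sample, not taken from a real ld.so:
# 83 f8 06 = cmp eax,0x6 ; 7e 10 = jle +0x10
sample = bytes.fromhex("83f8067e10")
patched_sample = flip_jle_to_jg(sample, 3)
print(patched_sample.hex())
```

Checking the old byte before writing is the design choice that matters here, because offsets differ between glibc builds.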

The recompiled package and the modified ld.so were not installed into the system; I used the command-line syntax ld.so ./my_program (or mv ld.so /some/short/path.so and patchelf --set-interpreter ./my_program).

Other possible solutions:

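Try a more recent valgrind & gdb record
Try an older glibc
Implement the missing instruction emulation in gdb record, if that has not been done
Patch the source around if (cpu_features->max_cpuid >= 7) in glibc and recompile
Patch the source around the avx2-enabled string functions in glibc and recompile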

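There does not seem to be a straightforward runtime method to patch the feature detection. This detection happens rather early in the dynamic linker (ld.so).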

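Binary patching the linker seems to be the easiest method at the moment. @osgx described one method where a jump is overwritten. Another approach is to fake the cpuid result. Normally, cpuid(eax=0) returns the highest supported function in eax, while the manufacturer ID is returned in registers ebx, ecx and edx. We have this snippet in glibc 2.25 sysdeps/x86/cpu-features.c: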

    __cpuid (0, cpu_features->max_cpuid, ebx, ecx, edx);

    /* This spells out "GenuineIntel".  */
    if (ebx == 0x756e6547 && ecx == 0x6c65746e && edx == 0x49656e69)
      {
          /* feature detection for various Intel CPUs */
      }
    /* another case for AMD */
    else
      {
        kind = arch_kind_other;
        get_common_indeces (cpu_features, NULL, NULL, NULL, NULL);
      }
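To see why those three constants "spell out" the vendor name, here is a small sketch that packs them back into bytes. The function name vendor_string is made up for this example; the register order EBX, EDX, ECX is how CPUID leaf 0 lays out the 12-byte vendor ID.

```python
import struct


def vendor_string(ebx: int, ecx: int, edx: int) -> str:
    """Rebuild the CPUID leaf 0 vendor ID from its three registers.

    The 12 ASCII bytes are stored little-endian in EBX, EDX, ECX order.
    """
    return struct.pack("<III", ebx, edx, ecx).decode("ascii")


# Constants taken from the glibc comparison quoted above.
print(vendor_string(0x756e6547, 0x6c65746e, 0x49656e69))  # GenuineIntel
```

If cpuid never runs, the registers hold whatever was there before, so this comparison is not expected to match.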

The __cpuid line translates to these instructions in /lib/ld-2.25.so:

    172a8:       31 c0                   xor    eax,eax
    172aa:       c7 44 24 38 00 00 00    mov    DWORD PTR [rsp+0x38],0x0
    172b1:       00
    172b2:       c7 44 24 3c 00 00 00    mov    DWORD PTR [rsp+0x3c],0x0
    172b9:       00
    172ba:       0f a2                   cpuid

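So rather than patching branches, we could just as well change the cpuid into a nop instruction, which results in the last else branch being taken (as the registers will not contain "GenuineIntel"). Since eax=0 initially, cpu_features->max_cpuid will also be 0, and the if (cpu_features->max_cpuid >= 7) will be bypassed as well.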
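The reasoning above can be checked with a toy model of the control flow. This is not glibc code: model_init, its return values, and the sample leaf count 0x16 are hypothetical, the AMD case is left out, and the model assumes the vendor registers do not happen to hold the Intel constants when cpuid is skipped.

```python
# Constants from the glibc snippet, in (ebx, ecx, edx) order.
GENUINE_INTEL = (0x756e6547, 0x6c65746e, 0x49656e69)


def model_init(eax: int, ebx: int, ecx: int, edx: int):
    """Toy model of the vendor check and the later max_cpuid >= 7 test.

    The arguments are the register values seen right after the point
    where cpuid(eax=0) would have executed.
    """
    max_cpuid = eax
    kind = "intel" if (ebx, ecx, edx) == GENUINE_INTEL else "other"
    queries_leaf7 = max_cpuid >= 7
    return kind, queries_leaf7


# cpuid replaced by nop: eax is still 0 from "xor eax,eax".
print(model_init(0, 0, 0, 0))
# Unpatched run on an Intel CPU reporting a (hypothetical) max leaf 0x16.
print(model_init(0x16, *GENUINE_INTEL))
```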

Binary patching cpuid(eax=0) into a nop can be done with this utility (works for both x86 and x86-64):

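    #!/usr/bin/env python
    import re
    import sys

    infile, outfile = sys.argv[1:]
    with open(infile, 'rb') as f:
        d = f.read()
    # Match CPUID(eax=0): "xor eax,eax" followed closely by "cpuid", and
    # replace the two-byte cpuid (0f a2) with a two-byte nop (66 90).
    # re.DOTALL lets "." match every byte value, including 0x0a; the lazy
    # quantifier picks the nearest cpuid after the xor.
    o = re.sub(b'(\x31\xc0.{0,32}?)\x0f\xa2', b'\\1\x66\x90', d, flags=re.DOTALL)
    assert d != o, 'no CPUID(eax=0) sequence found'
    with open(outfile, 'wb') as f:
        f.write(o)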
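As a quick sanity check, the same substitution idea can be run on just the bytes from the ld-2.25.so disassembly quoted earlier. The function name nop_out_cpuid0 is made up for this sketch, and the pattern uses re.DOTALL with a lazy quantifier so that "." matches any byte and the nearest cpuid is chosen.

```python
import re

# Bytes copied from the disassembly of ld-2.25.so quoted above.
code = bytes.fromhex(
    "31c0"              # xor eax,eax
    "c744243800000000"  # mov DWORD PTR [rsp+0x38],0x0
    "c744243c00000000"  # mov DWORD PTR [rsp+0x3c],0x0
    "0fa2"              # cpuid
)


def nop_out_cpuid0(data: bytes) -> bytes:
    """Replace the cpuid that closely follows "xor eax,eax" with 66 90."""
    return re.sub(rb'(\x31\xc0.{0,32}?)\x0f\xa2', b'\\1\x66\x90',
                  data, flags=re.DOTALL)


patched = nop_out_cpuid0(code)
print(patched.hex())
```

Only the last two bytes change (0f a2 becomes 66 90), and the length stays the same, so no other offsets in the file shift.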

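That was the easy part. Now, I did not want to replace the system-wide dynamic linker, but to execute only one particular program with this linker. Sure, that can be done with ./ld-linux-x86-64-patched.so.2 ./a, but the naive gdb invocations failed to set breakpoints: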

    $ gdb -q -ex "set exec-wrapper ./ld-linux-x86-64-patched.so.2" -ex start ./a
    Reading symbols from ./a...done.
    Temporary breakpoint 1 at 0x400502: file a.c, line 5.
    Starting program: /tmp/a
    During startup program exited normally.
    (gdb) quit
    $ gdb -q -ex start --args ./ld-linux-x86-64-patched.so.2 ./a
    Reading symbols from ./ld-linux-x86-64-patched.so.2...(no debugging symbols found)...done.
    Function "main" not defined.
    Temporary breakpoint 1 (main) pending.
    Starting program: /tmp/ld-linux-x86-64-patched.so.2 ./a
    [Inferior 1 (process 27418) exited normally]
    (gdb) quit

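A manual workaround is described in "How to debug program with custom elf interpreter?" It works, but unfortunately it is a manual action using add-symbol-file. It should be possible to automate it a bit using GDB Catchpoints, though.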

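An alternative approach that does not binary patch the linker is to LD_PRELOAD a library that defines custom routines for memcpy, memmove, etc. These then take precedence over the glibc routines. The full list of functions is available in sysdeps/x86_64/multiarch/ifunc-impl-list.c. Current HEAD has more symbols (grep -Po 'IFUNC_IMPL \(i, name, \K[^,]+' sysdeps/x86_64/multiarch/ifunc-impl-list.c):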

memcmp, memcmp, memcmp, __memmove_chk, memmove, memrchr, __memset_chk, memset, rawmemchr, strlen, strnlen, stpncpy, stpcpy, strcasecmp, strcasecmp_l, strcat, strchr, strchrnul, strrchr, strcmp, strcpy, strccase, strncasecmp_l, strncat, strcmp, wrschr, wcsrchr, wcscpy, wcslen, wmemchr, wmemcmp, wmemset, __memcpy_chk, memcpy, __mempcpy_chk, mempcpy, strncmp, __wmemset_chk
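For those without GNU grep's -P option, the same extraction can be sketched in Python. The helper name ifunc_names and the sample text are made up for illustration; the sample only imitates the layout of ifunc-impl-list.c and is not copied from glibc.

```python
import re

# Same idea as: grep -Po 'IFUNC_IMPL \(i, name, \K[^,]+'
IFUNC_NAME = re.compile(r'IFUNC_IMPL \(i, name, ([^,]+)')


def ifunc_names(source_text: str) -> list:
    """Return the function names declared via IFUNC_IMPL, in file order."""
    return IFUNC_NAME.findall(source_text)


# Made-up excerpt in the style of ifunc-impl-list.c.
sample = """
  IFUNC_IMPL (i, name, memcpy,
              IFUNC_IMPL_ADD (array, i, memcpy, 1, __memcpy_sse2_unaligned))
  IFUNC_IMPL (i, name, strlen,
              IFUNC_IMPL_ADD (array, i, strlen, 1, __strlen_sse2))
"""
print(ifunc_names(sample))
```

To run it on a real tree, pass in the text of sysdeps/x86_64/multiarch/ifunc-impl-list.c; the IFUNC_IMPL_ADD lines are skipped because the pattern requires a space right after IFUNC_IMPL.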

I heard that there are /etc/ld.so.nohwcap and LD_HWCAP_MASK configurations in glibc. Can they be used to disable ifunc dispatching to the AVX-optimized string functions in glibc?

Yes: setting LD_HWCAP_MASK=0 will make GLIBC pretend that none of the CPU capabilities are available.

Setting the mask to 0 may trigger an error; you may need to figure out the precise bit that controls AVX and mask just that bit.
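Masking just one bit is plain bit arithmetic, sketched below. The helper clear_bit and the bit index 26 are placeholders for illustration only: this is not a claim about which bit (if any) in AT_HWCAP corresponds to AVX, and whether your glibc accepts a hexadecimal LD_HWCAP_MASK value is worth checking before relying on it.

```python
def clear_bit(hwcap: int, bit: int) -> int:
    """Return hwcap with the given bit position forced to 0."""
    return hwcap & ~(1 << bit)


AT_HWCAP = 0xbfebfbff  # value reported by LD_SHOW_AUXV=1 in the question
EXAMPLE_BIT = 26       # placeholder bit index, NOT a verified AVX bit

mask = clear_bit(AT_HWCAP, EXAMPLE_BIT)
print("LD_HWCAP_MASK=0x%08x" % mask)  # LD_HWCAP_MASK=0xbbebfbff
```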