Math functions take more cycles after running any Intel® AVX function

I've noticed that math functions (such as ceil, round, ...) take more CPU cycles after running any Intel AVX function.

See the following example:

    #include <stdio.h>
    #include <math.h>
    #include <immintrin.h>

    /* Read the CPU time-stamp counter (x86 only, GCC inline asm). */
    static unsigned long int get_rdtsc(void)
    {
        unsigned int a, d;
        asm volatile("rdtsc" : "=a" (a), "=d" (d));
        return (((unsigned long int)a) | (((unsigned long int)d) << 32));
    }

    #define NUM_ITERATIONS 10000000

    /* Time NUM_ITERATIONS calls to round() and print the average cost. */
    void run_round(void)
    {
        unsigned long int t1, t2, res, i;
        double d = 3.2;

        t1 = get_rdtsc();
        for (i = 0 ; i < NUM_ITERATIONS ; ++i) {
            res = round(d*i);
        }
        t2 = get_rdtsc();
        printf("round res %lu total cycles %lu CPI %lu\n",
               res, t2 - t1, (t2 - t1) / NUM_ITERATIONS);
    }

    int main(void)
    {
        __m256d a;

        run_round();              /* baseline, before any AVX instruction */
        a = _mm256_set1_pd(1);    /* a single AVX operation */
        run_round();              /* same loop again, after the AVX operation */
        return 0;
    }

Compile with: gcc -Wall -lm -mavx foo.c

The output is:

    round res 31999997 total cycles 224725952 CPI 22
    round res 31999997 total cycles 1900864520 CPI 190
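To make the size of the gap explicit, here is a minimal sketch of the arithmetic behind those two lines. The helper names `cycles_per_iteration` and `slowdown` are hypothetical and not part of the program above. Note that the "CPI" value printed by the program is cycles per loop iteration, computed with truncating integer division.

```c
#include <assert.h>

/* Hypothetical helper: average cycles per loop iteration, truncated the
   same way as (t2 - t1) / NUM_ITERATIONS in the program above. */
static unsigned long long cycles_per_iteration(unsigned long long total_cycles,
                                               unsigned long long iterations)
{
    assert(iterations != 0);  /* guard against division by zero */
    return total_cycles / iterations;
}

/* Hypothetical helper: how many times larger the second cycle total is. */
static double slowdown(unsigned long long before, unsigned long long after)
{
    assert(before != 0);
    return (double)after / (double)before;
}
```

With the totals reported above, `cycles_per_iteration(224725952ULL, 10000000ULL)` gives 22 and `cycles_per_iteration(1900864520ULL, 10000000ULL)` gives 190, and `slowdown(224725952ULL, 1900864520ULL)` comes out at roughly 8.5, so the second run is about 8.5 times slower than the first.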

Please advise.
