使用2个线程sorting2个数组需要的时间比逐个排列2个数组要多

我有2个未sorting的数组和2个这些数组的副本。 我使用两个不同的线程来sorting两个数组,然后我sorting其他两个未sorting的数组一个接一个。 我认为线程过程会更快,但不是,线程如何花费更多的时间?

#include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <time.h> #include <pthread.h> struct thread_data { int count; unsigned int *arr; }; struct thread_data thread_data_array[2]; void insertionSort(unsigned int arr[], int n) { int i, key, j; for (i = 1; i < n; i++) { key = arr[i]; j = i-1; while (j >= 0 && arr[j] > key) { arr[j+1] = arr[j]; j = j-1; } arr[j+1] = key; } } void *sortAndMergeArrays(void *threadarg) { int count; unsigned int *arr; struct thread_data *my_data; my_data = (struct thread_data *) threadarg; count = my_data->count; arr = my_data->arr; insertionSort(arr, count); pthread_exit(NULL); } int main(int argc, char *argv[]) { int count, i, rc; clock_t start, end, total_t; pthread_t threads[2]; //get the loop count. If loop count is not provided take 10000 as default loop count. if(argc == 2){ count = atoi(argv[1]); } else{ count = 10000; } unsigned int arr1[count], arr2[count], copyArr1[count], copyArr2[count]; srand(time(0)); for(i = 0; i<count; i++){ arr1[i] = rand(); arr2[i] = rand(); copyArr1[i] = arr1[i]; copyArr2[i] = arr2[i]; } start = clock(); for(int t=0; t<2; t++) { thread_data_array[t].count = count; if(t==0) thread_data_array[t].arr = arr1; else thread_data_array[t].arr = arr2; rc = pthread_create(&threads[t], NULL, sortAndMergeArrays, (void *) &thread_data_array[t]); if (rc) { printf("ERROR; return code from pthread_create() is %d\n", rc); exit(-1); } } pthread_join(threads[0], NULL); pthread_join(threads[1], NULL); end = clock(); total_t = (double)(end - start); printf("Total time taken by CPU to sort using threads: %d\n", total_t); start = clock(); insertionSort(copyArr1, count); insertionSort(copyArr2, count); end = clock(); total_t = (double)(end - start); printf("Total time taken by CPU to sort sequentially: %d\n", total_t); pthread_exit(NULL); } 

我正在使用Linux服务器来执行代码。 首先,我随机填充数组并将它们复制到两个单独的数组中。 对于前两个数组,我使用pthread创build了两个线程,并将两个数组传递给它们,并使用插入sorting对它们进行sorting。 而对于另外两个数组,我只是一个一个的sorting。

我期望通过使用线程我会减less执行时间,但实际上需要更多的时间。

诊断

你几乎同时得到的原因 – 从线程代码中得到的时间比顺序代码稍微多一点 – 是clock()度量CPU时间,两种排序方式几乎需要相同的CPU时间,因为它们是做同样的工作(而线程数可能稍微大些,因为有时间设置和拆卸线程)。

clock()函数应该返回实现最接近处理器所用时间的时间,因为从实现定义的时代开始只涉及进程调用。

BSD(macOS)手册页:

clock()函数确定调用进程后使用的处理器时间量,以CLOCKS_PER_SECs秒为单位度量。

排序两个数组所花费的CPU时间基本相同; 不同的是线程的开销(或多或少)。

修改后的代码

我有一套函数可以使用clock_gettime()来代替(在timer.ctimer.h在GitHub代码)。 这些测量挂钟时间 – 流逝的时间,而不是CPU时间。

下面是你的代码的一个微调的版本 – 实质性的改变是将排序函数中的key类型从int改为unsigned int以匹配数组中的数据,并将%d的转换规范修改为%ld以匹配类型由GCC确定为clock_t 。 我温和地调整了参数处理和计时消息,以使它们长度一致,并添加了经过时间测量代码:

 #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <time.h> #include <pthread.h> #include "timer.h" struct thread_data { int count; unsigned int *arr; }; struct thread_data thread_data_array[2]; static void insertionSort(unsigned int arr[], int n) { for (int i = 1; i < n; i++) { unsigned int key = arr[i]; int j = i - 1; while (j >= 0 && arr[j] > key) { arr[j + 1] = arr[j]; j = j - 1; } arr[j + 1] = key; } } static void *sortAndMergeArrays(void *threadarg) { int count; unsigned int *arr; struct thread_data *my_data; my_data = (struct thread_data *)threadarg; count = my_data->count; arr = my_data->arr; insertionSort(arr, count); pthread_exit(NULL); } int main(int argc, char *argv[]) { int count = 10000; int i, rc; clock_t start, end, total_t; pthread_t threads[2]; // get the loop count. If loop count is not provided take 10000 as default loop count. if (argc == 2) count = atoi(argv[1]); unsigned int arr1[count], arr2[count], copyArr1[count], copyArr2[count]; srand(time(0)); for (i = 0; i < count; i++) { arr1[i] = rand(); arr2[i] = rand(); copyArr1[i] = arr1[i]; copyArr2[i] = arr2[i]; } Clock clk; clk_init(&clk); start = clock(); clk_start(&clk); for (int t = 0; t < 2; t++) { thread_data_array[t].count = count; if (t == 0) thread_data_array[t].arr = arr1; else thread_data_array[t].arr = arr2; rc = pthread_create(&threads[t], NULL, sortAndMergeArrays, (void *)&thread_data_array[t]); if (rc) { printf("ERROR; return code from pthread_create() is %d\n", rc); exit(-1); } } pthread_join(threads[0], NULL); pthread_join(threads[1], NULL); clk_stop(&clk); end = clock(); char buffer[32]; printf("Elapsed using threads: %ss\n", clk_elapsed_us(&clk, buffer, sizeof(buffer))); total_t = (double)(end - start); printf("CPU time using threads: %ld\n", total_t); start = clock(); clk_start(&clk); insertionSort(copyArr1, count); insertionSort(copyArr2, count); clk_stop(&clk); end = clock(); printf("Elapsed sequentially: %ss\n", clk_elapsed_us(&clk, buffer, sizeof(buffer))); total_t = (double)(end - start); printf("CPU time sequentially: %ld\n", total_t); return 0; } 

结果

运行示例程序(程序inssortthread23 ) – 运行MacBook Pro(15“2016),带有16 GiB RAM和2.7 GHz Intel Core i7 CPU,运行macOS High Sierra 10.13,使用GCC 7.2.0进行编译。例如,浏览器没有被主动使用,没有音乐或视频播放,没有下载正在进行等等(这些事情对于基准测试来说很重要)。

 $ inssortthread23 100000 Elapsed using threads: 1.060299 s CPU time using threads: 2099441 Elapsed sequentially: 2.146059 s CPU time sequentially: 2138465 $ inssortthread23 200000 Elapsed using threads: 4.332935 s CPU time using threads: 8616953 Elapsed sequentially: 8.496348 s CPU time sequentially: 8469327 $ inssortthread23 300000 Elapsed using threads: 9.984021 s CPU time using threads: 19880539 Elapsed sequentially: 20.000900 s CPU time sequentially: 19959341 $ 

结论

在这里,你可以清楚地看到:

  1. 非线程代码所花费的时间大约是线程代码的两倍。
  2. 线程和非线程代码的CPU时间几乎相同。
  3. 总体时间是排序行数的二次方。

所有这些都非常符合预期 – 一旦你意识到clock()是测量CPU时间,而不是经过时间。

小难题

你也可以看到,我的线程CPU时间稍微小于顺序排序的CPU时间,有些时候。 我对此没有解释 – 我认为它“在噪音中消失”,尽管效果是持久的:

 $ inssortthread23 100000 Elapsed using threads: 1.051229 s CPU time using threads: 2081847 Elapsed sequentially: 2.138538 s CPU time sequentially: 2132083 $ inssortthread23 100000 Elapsed using threads: 1.053656 s CPU time using threads: 2089886 Elapsed sequentially: 2.128908 s CPU time sequentially: 2122983 $ inssortthread23 100000 Elapsed using threads: 1.058283 s CPU time using threads: 2093644 Elapsed sequentially: 2.126402 s CPU time sequentially: 2120625 $ $ inssortthread23 200000 Elapsed using threads: 4.259660 s CPU time using threads: 8479978 Elapsed sequentially: 8.872929 s CPU time sequentially: 8843207 $ inssortthread23 200000 Elapsed using threads: 4.463954 s CPU time using threads: 8883267 Elapsed sequentially: 8.603401 s CPU time sequentially: 8580240 $ inssortthread23 200000 Elapsed using threads: 4.227154 s CPU time using threads: 8411582 Elapsed sequentially: 8.816412 s CPU time sequentially: 8797965 $