What is the extra execution time of a routine in a pthread program?

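I wrote four different programs that count the total number of words in two files. The four versions look mostly the same. The first three use two threads to count and differ only in the order of three statements. The last version uses a single thread. I will list the differing part of each version and the common part first, then the output of each version and my questions.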

The differing parts:

    // version 1
    count_words(&file1);
    pthread_create(&new_thread, NULL, count_words, &file2);
    pthread_join(new_thread, NULL);

    // version 2
    pthread_create(&new_thread, NULL, count_words, &file2);
    count_words(&file1);
    pthread_join(new_thread, NULL);

    // version 3
    pthread_create(&new_thread, NULL, count_words, &file2);
    pthread_join(new_thread, NULL);
    count_words(&file1);

    // version 4
    count_words(&file1);
    count_words(&file2);

The common part (insert one of the differing parts into it to make a complete version):

    #include <stdio.h>
    #include <pthread.h>
    #include <ctype.h>
    #include <stdlib.h>
    #include <time.h>

    #define N 2000

    typedef struct file_t {
        char *name;
        int words;
    } file_t;

    double time_diff(struct timespec *, struct timespec *);
    void *count_words(void *);

    // Usage: progname file1 file2
    int main(int argc, char *argv[])
    {
        pthread_t new_thread;
        file_t file1, file2;

        file1.name = argv[1];
        file1.words = 0;
        file2.name = argv[2];
        file2.words = 0;

        // Insert different part here

        printf("Total words: %d\n", file1.words + file2.words);
        return 0;
    }

    void *count_words(void *arg)
    {
        FILE *fp;
        file_t *file = (file_t *)arg;
        int i, c, prevc = '\0';
        struct timespec process_beg, process_end;
        struct timespec thread_beg, thread_end;
        double process_diff, thread_diff;

        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &process_beg);
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &thread_beg);

        fp = fopen(file->name, "r");
        if (fp == NULL) {
            perror(file->name);
            exit(EXIT_FAILURE);
        }
        // Count the same file N times to make the run long enough to measure
        for (i = 0; i < N; i++) {
            while ((c = getc(fp)) != EOF) {
                if (!isalnum(c) && isalnum(prevc))
                    file->words++;
                prevc = c;
            }
            fseek(fp, 0, SEEK_SET);
        }
        fclose(fp);

        clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &process_end);
        clock_gettime(CLOCK_THREAD_CPUTIME_ID, &thread_end);

        process_diff = time_diff(&process_beg, &process_end);
        thread_diff = time_diff(&thread_beg, &thread_end);
        printf("count_words() in %s takes %.3fs process time and "
               "%.3fs thread time\n", file->name, process_diff, thread_diff);
        return NULL;
    }

    double time_diff(struct timespec *beg, struct timespec *end)
    {
        return ((double)end->tv_sec + (double)end->tv_nsec * 1.0e-9)
             - ((double)beg->tv_sec + (double)beg->tv_nsec * 1.0e-9);
    }

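Notes

file1 is a file containing the word "word" 10000 times. file2 is a copy of file1 created with the cp command.
To make the execution time long enough, the program counts the words repeatedly. N is the number of loops, so the result is not the actual total word count but that count multiplied by N.
Please do not focus on the counting algorithm. I only care about the execution time in this example.
Important: the machine has an Intel® Celeron® CPU 420 @ 1.60GHz with a single core. The OS is Linux 3.2.0. Perhaps the single core is the cause of this strange behavior, as others have suggested, but I still want to understand it.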

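The program counts the words and uses clock_gettime() to measure the process CPU time and the thread CPU time of the routine count_words(), then prints the times and the word count. Below are the outputs, with my comments and questions. I would really appreciate it if someone could explain where the extra time comes from.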

    // version 1
    count_words() in file1 takes 2.563s process time and 2.563s thread time
    count_words() in file2 takes 8.374s process time and 8.374s thread time
    Total words: 40000000

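Comment: The original thread finishes count_words() and then waits for the new thread to die. While count_words() runs in the new thread, no context switch happens (because process time == thread time). Why does it take so much time? What happens to count_words() in the new thread?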

    // version 2
    count_words() in file1 takes 16.755s process time and 8.377s thread time
    count_words() in file2 takes 16.753s process time and 8.380s thread time
    Total words: 40000000

Comment: The two threads run in parallel here. Context switches occur, so process time > thread time.
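A side note on the two clocks: CLOCK_PROCESS_CPUTIME_ID accumulates the CPU time of every thread in the process, while CLOCK_THREAD_CPUTIME_ID covers only the calling thread. With two busy threads, the process figure would therefore be roughly double the thread figure even if context switches cost nothing. The following is a minimal sketch of that effect, separate from the program above. The names cpu_seconds, burn_thread_cpu, worker, cpu_times and measure_two_threads are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: current reading of a CPU-time clock, in seconds. */
static double cpu_seconds(clockid_t clock)
{
    struct timespec ts;
    int rc = clock_gettime(clock, &ts);
    assert(rc == 0);
    (void)rc;
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1.0e-9;
}

/* Spin until the calling thread's own CPU clock has advanced by `seconds`. */
static void burn_thread_cpu(double seconds)
{
    double start = cpu_seconds(CLOCK_THREAD_CPUTIME_ID);
    volatile unsigned long sink = 0;

    while (cpu_seconds(CLOCK_THREAD_CPUTIME_ID) - start < seconds)
        sink++;
}

/* Thread entry point: burn the amount of CPU time passed in through arg. */
static void *worker(void *arg)
{
    burn_thread_cpu(*(double *)arg);
    return NULL;
}

struct cpu_times {
    double process;  /* advance of CLOCK_PROCESS_CPUTIME_ID */
    double thread;   /* advance of CLOCK_THREAD_CPUTIME_ID for the caller */
};

/*
 * Same ordering as version 2: create the thread, do the same work in the
 * caller, then join. Both clocks are sampled in the calling thread.
 */
static struct cpu_times measure_two_threads(double seconds)
{
    struct cpu_times t;
    pthread_t tid;
    double p0 = cpu_seconds(CLOCK_PROCESS_CPUTIME_ID);
    double t0 = cpu_seconds(CLOCK_THREAD_CPUTIME_ID);
    int rc = pthread_create(&tid, NULL, worker, &seconds);

    assert(rc == 0);
    (void)rc;
    burn_thread_cpu(seconds);
    pthread_join(tid, NULL);

    t.process = cpu_seconds(CLOCK_PROCESS_CPUTIME_ID) - p0;
    t.thread = cpu_seconds(CLOCK_THREAD_CPUTIME_ID) - t0;
    return t;
}
```

Calling measure_two_threads(0.5) and printing both fields should show a process time of about twice the thread time, which is the same ratio as in the version 2 output. Depending on the toolchain, this may need to be built with -pthread.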

    // version 3
    count_words() in file2 takes 8.374s process time and 8.374s thread time
    count_words() in file1 takes 8.365s process time and 8.365s thread time
    Total words: 40000000

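Comment: The new thread counts first while the original thread waits for it. After the new thread is joined, the original thread starts counting. Neither of them is context-switched, so why does it take so much time, especially the count that runs after the new thread has been joined?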

    // version 4
    count_words() in file1 takes 2.555s process time and 2.555s thread time
    count_words() in file2 takes 2.556s process time and 2.556s thread time
    Total words: 40000000

Comment: The fastest version. No new thread is created. Both count_words() calls run in a single thread.

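This is probably because creating any thread forces libc to use synchronization in getc, which makes that function significantly slower. The following example is as slow for me as version 3: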

    void *skip(void *p) { return NULL; }

    pthread_create(&new_thread, NULL, skip, NULL);
    count_words(&file1);
    count_words(&file2);

To fix this problem, you can use a buffer:

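    for (i = 0; i < N; i++) {
        char buffer[BUFSIZ];
        size_t nread, j;

        do {
            // One fread call fetches up to BUFSIZ characters at once
            nread = fread(buffer, 1, BUFSIZ, fp);
            for (j = 0; j < nread; j++) {
                // Cast to unsigned char: isalnum is undefined for negative values
                c = (unsigned char)buffer[j];
                if (!isalnum(c) && isalnum(prevc))
                    file->words++;
                prevc = c;
            }
        } while (nread == BUFSIZ);
        fseek(fp, 0, SEEK_SET);
    }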

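In this solution, the I/O functions are called rarely enough that the synchronization overhead becomes insignificant. This not only solves the problem of the weird timings but also makes the program several times faster. For me, the time drops from 0.54s (without threads) or 0.85s (with threads) to 0.15s (in both cases).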
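If the per-call locking in getc really is the cause, another option on POSIX systems is to take the stream lock once with flockfile() and read with getc_unlocked(). This is a different technique from the buffer above, and I have not timed it. It is a minimal sketch, and count_words_unlocked is a made-up helper name, not part of the original program.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

/*
 * Hypothetical helper: count words in an already opened stream, locking
 * the stream once for the whole pass instead of once per character.
 * Uses the same counting rule as count_words(): a word ends when a
 * non-alphanumeric character follows an alphanumeric one.
 */
static long count_words_unlocked(FILE *fp)
{
    long words = 0;
    int c, prevc = '\0';

    assert(fp != NULL);
    flockfile(fp);                      /* acquire the stream lock once */
    while ((c = getc_unlocked(fp)) != EOF) {
        if (!isalnum(c) && isalnum(prevc))
            words++;
        prevc = c;
    }
    funlockfile(fp);                    /* release it after the pass */
    return words;
}
```

Inside count_words(), the inner while loop could be replaced by `file->words += count_words_unlocked(fp);`, keeping the fseek() that follows it. The stream is still locked correctly if another thread ever touches it, but the lock is taken once per pass rather than once per character.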