std::promise::set_value on a FIFO thread pinned to a core doesn't wake std::future

I am trying to create a system with deterministic real-time response.

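I have created a number of cpusets, moved all non-critical tasks and unpinned kernel threads into one set, and then pinned each of my realtime threads to its own cpuset, each consisting of a single cpu.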

    # non-critical tasks and unpinned kernel threads
    cset proc --move --fromset=root --toset=system
    cset proc --kthread --fromset=root --toset=system

    # realtime threads
    cset proc --move --toset=shield/RealtimeTest1/thread1 --pid=17651
    cset proc --move --toset=shield/RealtimeTest1/thread2 --pid=17654

My scenario is as follows:

  • Thread 1: SCHED_OTHER, pinned to set1, waits on a std::future<void>
  • Thread 2: SCHED_FIFO, pinned to set2, calls std::promise<void>::set_value()

Thread 1 blocks forever. However, if I change thread 2 to SCHED_OTHER, thread 1 is able to continue.

I ran strace -f to gain more insight; it appears that thread 1 is waiting on a futex (I assume the internals of std::future) but is never woken.

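I'm completely stuck. Is there any way to have a thread pin itself to a core, set its scheduler to FIFO, and then use std::promise to wake another thread that is waiting for it to finish this so-called realtime setup?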

The code in which thread1 creates thread2 looks like this:

    // Thread1:
    std::promise<void> p;
    std::future <void> f = p.get_future();

    _thread = std::move(std::thread(std::bind(&Dispatcher::Run, this, std::ref(p))));

    LOG_INFO << "waiting for thread2 to start" << std::endl;

    if (f.valid())
        f.wait();

And thread2's Run function is as follows:

    // Thread2:
    LOG_INFO << "started: threadId=" << Thread::GetId() << std::endl;

    Realtime::Service* rs = Service::Registry::Lookup<Realtime::Service>();
    if (rs)
        rs->ConfigureThread(this->Name()); // this does the pinning and FIFO etc

    LOG_INFO << "thread2 has started" << std::endl;
    p.set_value(); // indicate fact that the thread has started
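For anyone wanting to reproduce this without the surrounding framework, here is a minimal sketch of the same handshake. It assumes that ConfigureThread boils down to pthread_setaffinity_np plus pthread_setschedparam; the real Realtime::Service is not shown here, so configure_realtime and start_and_wait are hypothetical names. The waiter uses wait_for instead of wait, so that a missed wake-up can at least be logged rather than hanging silently.

```cpp
#include <pthread.h>
#include <sched.h>

#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical stand-in for ConfigureThread: pins the calling thread to
// `cpu` and requests SCHED_FIFO at `priority`.
// Returns 0 on success, otherwise the error number of the failing call
// (for example EPERM when the process may not use realtime policies).
int configure_realtime(int cpu, int priority)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    int rc = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    if (rc != 0)
        return rc;

    sched_param sp{};
    sp.sched_priority = priority;
    return pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
}

// Starts a worker that configures itself and then fulfils the promise.
// Returns the worker's configuration result, or -1 if the worker did not
// signal within `timeout`.
int start_and_wait(int cpu, int priority, std::chrono::milliseconds timeout)
{
    std::promise<int> started;
    std::future<int> f = started.get_future();
    assert(f.valid()); // a future fresh from get_future() has shared state

    std::thread worker([&started, cpu, priority] {
        started.set_value(configure_realtime(cpu, priority));
    });

    int result = -1;
    if (f.wait_for(timeout) == std::future_status::ready)
        result = f.get();

    // The promise is captured by reference, so the worker must be joined
    // before this function returns.
    worker.join();
    return result;
}
```

Calling start_and_wait(5, 1, std::chrono::seconds(5)) would mirror thread2's setup on cpu 5. Build with -pthread; requesting SCHED_FIFO normally needs root or CAP_SYS_NICE.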

The strace output is as follows:

  • Thread 1 is [pid 17651]
  • Thread 2 is [pid 17654]

I have removed some of the output for brevity.

    //////// Thread 1 creates thread 2 and waits on a future ////////

    [pid 17654] gettid() = 17654
    [pid 17651] write(2, "09:29:52 INFO waiting for thread"..., 4309:29:52 INFO waiting for thread2 to start
     <unfinished ...>
    [pid 17654] gettid( <unfinished ...>
    [pid 17651] <... write resumed> ) = 43
    [pid 17654] <... gettid resumed> ) = 17654
    [pid 17651] futex(0xd52294, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
    [pid 17654] gettid() = 17654
    [pid 17654] write(2, "09:29:52 INFO thread2 started: t"..., 6109:29:52 INFO thread2 started: threadId=17654
    ) = 61

    //////// <snip> thread2 performs pinning, FIFO, etc </snip> ////////

    [pid 17654] write(2, "09:29:52 INFO thread2 has starte"..., 3409:29:52 INFO thread2 has started
    ) = 34
    [pid 17654] futex(0xd52294, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, 0xd52268, 2) = 1
    [pid 17651] <... futex resumed> ) = 0
    [pid 17654] futex(0xd522c4, FUTEX_WAKE_PRIVATE, 2147483647 <unfinished ...>
    [pid 17651] futex(0xd52268, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
    [pid 17654] <... futex resumed> ) = 0
    [pid 17651] <... futex resumed> ) = 0

    //////// blocks here forever ////////

As you can see, pid 17651 (thread1) reports futex resumed, but is it perhaps running on the wrong cpu and stuck behind thread2, which is running as FIFO?

Update: This appears to be a problem with the threads not running on the cpus they are pinned to.

Running top -p 17649 -H and pressing f,j to bring up the last used cpu shows that thread 1 is indeed running on the same cpu as thread 2:

    top - 10:00:59 up 18:17,  3 users,  load average: 7.16, 7.61, 4.18
    Tasks:   3 total,   2 running,   1 sleeping,   0 stopped,   0 zombie
    Cpu(s):  7.1%us,  0.1%sy,  0.0%ni, 89.5%id,  0.0%wa,  0.0%hi,  3.3%si,  0.0%st
    Mem:   8180892k total,   722800k used,  7458092k free,    43364k buffers
    Swap:  8393952k total,        0k used,  8393952k free,   193324k cached

      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
    17654 root      -2   0 54080  35m 7064 R  100  0.4   5:00.77 3 RealtimeTest
    17649 root      20   0 54080  35m 7064 S    0  0.4   0:00.05 2 RealtimeTest
    17651 root      20   0 54080  35m 7064 R    0  0.4   0:00.00 3 RealtimeTest

However, if I look at the cpuset filesystem, I can see that my tasks are supposedly pinned to the cpus I requested:

    /cpusets/shield/RealtimeTest1 $ for i in `find -name tasks`; do echo $i; cat $i; echo "------------"; done

    ./thread1/tasks
    17651
    ------------
    ./main/tasks
    17649
    ------------
    ./thread2/tasks
    17654
    ------------

Displaying the cpuset configuration:

    $ cset set --list -r
    cset:
             Name       CPUs-X    MEMs-X Tasks Subs Path
     ------------ ---------- - ------- - ----- ---- ----------
             root         0-23 y     0-1 y   279    2 /
           system 0,2,4,6,8,10 n       0 n   202    0 /system
           shield 1,3,5,7,9,11 n       1 n     0    2 /shield
    RealtimeTest1      1,3,5,7 n       1 n     0    4 /shield/RealtimeTest1
          thread1            3 n       1 n     1    0 /shield/RealtimeTest1/thread1
          thread2            5 n       1 n     1    0 /shield/RealtimeTest1/thread2
             main            1 n       1 n     1    0 /shield/RealtimeTest1/main

From this I would say that thread2 should be on cpu 5, but top says it is running on cpu 3.

Interestingly, sched_getaffinity reports what the cpuset says: that thread1 is on cpu 3 and thread2 is on cpu 5.
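The same comparison can be made from inside a thread. The following is only a sketch: allowed_cpus and running_on_allowed_cpu are hypothetical helper names, and it assumes a machine with no more than CPU_SETSIZE cpus. It reads the calling thread's affinity mask with sched_getaffinity and checks whether the cpu reported by sched_getcpu is actually in that mask.

```cpp
#include <sched.h>

#include <algorithm>
#include <cassert>
#include <vector>

// CPUs the calling thread is allowed to run on, per sched_getaffinity(2).
// Returns an empty vector if the call fails.
std::vector<int> allowed_cpus()
{
    cpu_set_t set;
    CPU_ZERO(&set);
    std::vector<int> cpus;
    if (sched_getaffinity(0, sizeof(set), &set) != 0) // 0 = calling thread
        return cpus;

    assert(CPU_COUNT(&set) > 0); // a successful call never yields an empty mask
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
        if (CPU_ISSET(cpu, &set))
            cpus.push_back(cpu);
    return cpus;
}

// True if the cpu the calling thread is executing on at this moment is
// part of its affinity mask.
bool running_on_allowed_cpu()
{
    const int current = sched_getcpu();
    const std::vector<int> allowed = allowed_cpus();
    return current >= 0 &&
           std::find(allowed.begin(), allowed.end(), current) != allowed.end();
}
```

Logging running_on_allowed_cpu() from each thread right after it has been configured would show a mismatch like the one described here directly in the application log.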

However, looking in /proc/17649/task to find the last_cpu each of its tasks ran on:

    /proc/17649/task $ for i in `ls -1`; do cat $i/stat | awk '{print $1 " is on " $(NF - 5)}'; done
    17649 is on 2
    17651 is on 3
    17654 is on 3
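The awk one-liner counts from the end of the line, which only lines up with the processor field when the stat line has 44 fields; newer kernels append more. proc(5) documents the last-used cpu as field 39, so a sketch that counts from the front may be more robust. The helper names last_cpu_from_stat and last_cpu_of are hypothetical.

```cpp
#include <sys/types.h>

#include <cassert>
#include <exception>
#include <fstream>
#include <sstream>
#include <string>

// Extracts field 39 ("processor", the cpu last executed on) from one line
// of /proc/<pid>/task/<tid>/stat. Returns -1 if the line cannot be parsed.
int last_cpu_from_stat(const std::string& line)
{
    // Field 2 (comm) is parenthesised and may itself contain spaces or
    // parentheses, so resume parsing after the last ')'.
    const std::string::size_type close = line.rfind(')');
    if (close == std::string::npos)
        return -1;

    std::istringstream rest(line.substr(close + 1));
    std::string token;
    // The first token after ')' is field 3 (state); read up to field 39.
    for (int field = 3; field <= 39; ++field)
        if (!(rest >> token))
            return -1;

    try {
        return std::stoi(token);
    } catch (const std::exception&) {
        return -1;
    }
}

// Reads the stat file of thread `tid` in process `pid`.
// Returns -1 if the file cannot be read or parsed.
int last_cpu_of(pid_t pid, pid_t tid)
{
    assert(pid > 0 && tid > 0);
    std::ifstream in("/proc/" + std::to_string(pid) + "/task/" +
                     std::to_string(tid) + "/stat");
    std::string line;
    if (!std::getline(in, line))
        return -1;
    return last_cpu_from_stat(line);
}
```

For the process above, last_cpu_of(17649, 17654) would be the equivalent of the third line of the awk output.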

sched_getaffinity reports one thing, but reality is another.

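It is also interesting that the main thread [pid 17649] should be on cpu 1 (according to the cset output), but it is in fact running on cpu 2 (which is on the other socket).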

So I would say that cpuset is not working?

My machine configuration is:

    $ cat /etc/SuSE-release
    SUSE Linux Enterprise Server 11 (x86_64)
    VERSION = 11
    PATCHLEVEL = 1

    $ uname -a
    Linux foobar 2.6.32.12-0.7-default #1 SMP 2010-05-20 11:14:20 +0200 x86_64 x86_64 x86_64 GNU/Linux

I have re-run the test on a SLES 11 / SP 2 box, and the pinning works.

I am therefore going to mark this as the answer, i.e. this is an issue related to SP 1.