TCP keep-alive parameters not being honoured

I am experimenting with TCP keep-alive on my Linux box and have written the following small server:

  #include <iostream>
  #include <cstring>
  #include <sys/socket.h>
  #include <netinet/in.h>
  #include <arpa/inet.h>   // inet_pton
  #include <netinet/tcp.h>
  #include <netdb.h>       // addrinfo stuff

  using namespace std;

  typedef int SOCKET;

  int main(int argc, char *argv[])
  {
      struct sockaddr_in sockaddr_IPv4;
      memset(&sockaddr_IPv4, 0, sizeof(struct sockaddr_in));
      sockaddr_IPv4.sin_family = AF_INET;
      sockaddr_IPv4.sin_port = htons(58080);

      if (inet_pton(AF_INET, "10.6.186.24", &sockaddr_IPv4.sin_addr) != 1)
          return -1;

      SOCKET serverSock = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
      if (bind(serverSock, (sockaddr*)&sockaddr_IPv4, sizeof(sockaddr_IPv4)) != 0 ||
          listen(serverSock, SOMAXCONN) != 0)
      {
          cout << "Failed to setup listening socket!\n";
          return -1;   // do not go on to accept() on a broken socket
      }

      SOCKET clientSock = accept(serverSock, 0, 0);
      if (clientSock == -1)
          return -1;

      // Enable keep-alive on the client socket
      const int nVal = 1;
      if (setsockopt(clientSock, SOL_SOCKET, SO_KEEPALIVE, &nVal, sizeof(nVal)) < 0)
      {
          cout << "Failed to set keep-alive!\n";
          return -1;
      }

      // Get the keep-alive options that will be used on the client socket
      int nProbes, nTime, nInterval;
      socklen_t nOptLen = sizeof(int);
      bool bError = false;
      if (getsockopt(clientSock, IPPROTO_TCP, TCP_KEEPIDLE, &nTime, &nOptLen) < 0) { bError = true; }
      nOptLen = sizeof(int);
      if (getsockopt(clientSock, IPPROTO_TCP, TCP_KEEPCNT, &nProbes, &nOptLen) < 0) { bError = true; }
      nOptLen = sizeof(int);
      if (getsockopt(clientSock, IPPROTO_TCP, TCP_KEEPINTVL, &nInterval, &nOptLen) < 0) { bError = true; }

      if (bError)
      {
          // Failed to retrieve values, so do not print them
          cout << "Failed to get keep-alive options!\n";
          return -1;
      }
      cout << "Keep alive settings are: time: " << nTime
           << ", interval: " << nInterval
           << ", number of probes: " << nProbes << "\n";

      // Read until the peer closes the connection (0) or an error occurs (-1)
      int nRead = 0;
      char buf[128];
      do {
          nRead = recv(clientSock, buf, 128, 0);
      } while (nRead > 0);

      return 0;
  }

I then adjusted the system-wide TCP keep-alive settings as follows:

  # cat /proc/sys/net/ipv4/tcp_keepalive_time
  20
  # cat /proc/sys/net/ipv4/tcp_keepalive_intvl
  30

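I then connected to my server from Windows and ran a Wireshark trace to look at the keep-alive packets. The image below shows the result.

[Image: Packets1]

This confuses me, because I now understand that the keep-alive interval only comes into play if no ACK is received in response to the original keep-alive packet (see my other question). So I would expect every subsequent packet, not just the first, to be sent at 20-second intervals (not 30, which is what we see).

I then adjusted the system-wide settings as follows: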

  # cat /proc/sys/net/ipv4/tcp_keepalive_time
  30
  # cat /proc/sys/net/ipv4/tcp_keepalive_intvl
  20

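This time, when I connect, I see the following in my Wireshark trace:

[Image: Packets2]

Now the first keep-alive packet is sent after 30 seconds, but every packet after that is also sent at 30-second intervals, not the 20 seconds that the previous run would suggest!

Can someone please explain this inconsistent behaviour?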

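Roughly speaking, the way it is supposed to work is that a keepalive message is sent every tcp_keepalive_time seconds. If no ACK is received, it then probes every tcp_keepalive_intvl seconds. If no ACK has been received after tcp_keepalive_probes probes, the connection is aborted. A connection will therefore be aborted after at most

  tcp_keepalive_time + tcp_keepalive_probes * tcp_keepalive_intvl

seconds without a response. See the kernel documentation.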
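As a quick worked example of the bound above, here is a minimal sketch that just evaluates the formula. The function name `worst_case_abort_seconds` is made up for illustration; the only inputs are the three sysctl values.

```cpp
// Upper bound, in seconds, on how long an unresponsive peer can go
// undetected, using the formula quoted above. Hypothetical helper name.
constexpr int worst_case_abort_seconds(int keepalive_time,
                                       int keepalive_probes,
                                       int keepalive_intvl)
{
    return keepalive_time + keepalive_probes * keepalive_intvl;
}

// The settings used in the netcat experiment in this answer (5 s, 4 probes, 1 s).
static_assert(worst_case_abort_seconds(5, 4, 1) == 9, "5 + 4 * 1");

// The stock Linux defaults (7200 s, 9 probes, 75 s).
static_assert(worst_case_abort_seconds(7200, 9, 75) == 7875, "7200 + 9 * 75");
```

With the stock defaults, a dead peer can therefore go unnoticed for a little over two hours, which is why tuning these values matters for short-lived tests like the ones here.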

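We can easily watch this work using netcat-keepalive, a version of netcat that lets us set the TCP keepalive parameters. (The sysctl keepalive parameters are only the defaults; they can be overridden on a per-socket basis in the tcp_sock structure.)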
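For reference, the per-socket override is done from user space with setsockopt, using the same TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options that the server in the question reads back with getsockopt. The following is a minimal sketch for Linux; the helper name `set_keepalive` is an assumption for illustration and is not taken from netcat-keepalive.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Enable keepalive on fd and override the three sysctl defaults for this
// socket only. Returns true if every setsockopt call succeeded.
// (Hypothetical helper, Linux-specific option names.)
bool set_keepalive(int fd, int idle_secs, int intvl_secs, int probes)
{
    const int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == 0
        && setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_secs, sizeof(idle_secs)) == 0
        && setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_secs, sizeof(intvl_secs)) == 0
        && setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes)) == 0;
}
```

Calling `set_keepalive(clientSock, 5, 1, 4)` right after accept() in the question's server should give that one connection the same settings as the experiment below, without touching the system-wide values.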

First, start a server listening on port 8888 with keepalive_time set to 5 seconds, keepalive_intvl set to 1 second, and keepalive_probes set to 4.

  $ ./nckl-linux -K -O 5 -I 1 -P 4 -l 8888 >/dev/null &

Next, let's use iptables to introduce loss for the ACK packets sent to the server:

  $ sudo iptables -A OUTPUT -p tcp --dport 8888 \
  >   --tcp-flags SYN,ACK,RST,FIN ACK \
  >   -m statistic --mode random --probability 0.5 \
  >   -j DROP

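This causes packets sent to TCP port 8888 that have only the ACK flag set to be dropped with probability 0.5.

Now let's connect with vanilla netcat (which will use the sysctl keepalive values) and watch: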

  $ nc localhost 8888

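Here is the capture:

[Image: TCP keepalive capture]

As you can see, it waits 5 seconds after receiving an ACK before sending another keepalive message. If no ACK is received within 1 second, it sends another probe, and if no ACK has been received after 4 probes, it aborts the connection. This is exactly how keepalive is supposed to work.

So let's try to reproduce what you were seeing. Delete the iptables rule (no loss), start a new server with tcp_keepalive_time set to 1 second and tcp_keepalive_intvl set to 5 seconds, and then connect a client. Here is the result: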

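[Image: capture with keepalive_time < keepalive_intvl, no loss]

Interestingly, we see the same behaviour you observed: after the first ACK it waits 1 second before sending a keepalive message, and after that it sends one every 5 seconds.

Let's add the iptables rule back to introduce loss and see how long it actually waits before sending another probe when it does not get an ACK (using -K -O 1 -I 5 -P 4 on the server):

[Image: capture with keepalive_time < keepalive_intvl, with loss]

Again, it waits 1 second from the first ACK before sending a keepalive message, but thereafter it waits 5 seconds whether or not it sees an ACK, as if keepalive_time and keepalive_intvl were both set to 5.

To understand this behaviour, we need to look at the Linux kernel's TCP implementation. Let's first look at tcp_finish_connect: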

  if (sock_flag(sk, SOCK_KEEPOPEN))
      inet_csk_reset_keepalive_timer(sk, keepalive_time_when(tp));

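When the TCP connection is established, the keepalive timer is effectively set to tcp_keepalive_time, which is 1 second in our case.

Next, let's look at how the timer is handled in tcp_keepalive_timer: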

  elapsed = keepalive_time_elapsed(tp);

  if (elapsed >= keepalive_time_when(tp)) {
      /* If the TCP_USER_TIMEOUT option is enabled, use that
       * to determine when to timeout instead.
       */
      if ((icsk->icsk_user_timeout != 0 &&
           elapsed >= icsk->icsk_user_timeout &&
           icsk->icsk_probes_out > 0) ||
          (icsk->icsk_user_timeout == 0 &&
           icsk->icsk_probes_out >= keepalive_probes(tp))) {
          tcp_send_active_reset(sk, GFP_ATOMIC);
          tcp_write_err(sk);
          goto out;
      }
      if (tcp_write_wakeup(sk, LINUX_MIB_TCPKEEPALIVE) <= 0) {
          icsk->icsk_probes_out++;
          elapsed = keepalive_intvl_when(tp);
      } else {
          /* If keepalive was lost due to local congestion,
           * try harder.
           */
          elapsed = TCP_RESOURCE_PROBE_INTERVAL;
      }
  } else {
      /* It is tp->rcv_tstamp + keepalive_time_when(tp) */
      elapsed = keepalive_time_when(tp) - elapsed;
  }

  sk_mem_reclaim(sk);

  resched:
  inet_csk_reset_keepalive_timer (sk, elapsed);
  goto out;

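When keepalive_time_when is greater than keepalive_intvl_when, this code works as expected. When it is not, however, you get the behaviour you observed.

When the initial timer (set when the TCP connection is established) expires after 1 second, we keep extending the timer until elapsed is at least keepalive_time_when. At that point we send a probe and set the timer to keepalive_intvl_when, which is 5 seconds. When this timer expires, if nothing has been received in the last 1 second (keepalive_time_when), we send a probe, set the timer to keepalive_intvl_when again, wake up after another 5 seconds, and so on.

However, if we did receive something within keepalive_time_when of the timer expiring, the timer is rescheduled using keepalive_time_when, so that it fires 1 second after the last time we received anything.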
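The rescheduling described above can be sketched as a small simulation. This is a simplified model, not kernel code: it assumes whole seconds, zero round-trip time, a peer that ACKs every probe the instant it is sent, and no other traffic on the connection. The function name `probe_times` is made up for illustration.

```cpp
#include <vector>

// Returns the times (seconds since connection setup) at which the first
// `count` keepalive probes are sent, following the two branches of
// tcp_keepalive_timer shown above. Assumes both parameters are > 0 and
// that every probe is ACKed immediately (modelling assumptions).
std::vector<int> probe_times(int keepalive_time, int keepalive_intvl, int count)
{
    std::vector<int> sent;
    int last_rx = 0;            // last time anything was received
    int now = keepalive_time;   // initial timer, armed at connection setup
    while (static_cast<int>(sent.size()) < count) {
        const int elapsed = now - last_rx;
        if (elapsed >= keepalive_time) {
            sent.push_back(now);      // idle long enough: send a probe
            last_rx = now;            // modelled instant ACK
            now += keepalive_intvl;   // timer re-armed with the interval
        } else {
            now += keepalive_time - elapsed;  // wait out the remaining idle time
        }
    }
    return sent;
}
```

Under these assumptions, `probe_times(20, 30, 3)` yields 20, 50, 80 and `probe_times(30, 20, 3)` yields 30, 60, 90, which lines up with the two Wireshark traces in the question. In the second case the timer does fire after 20 seconds, but the ACK for the previous probe arrived less than keepalive_time ago, so the else branch just waits the remaining 10 seconds.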

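So, to answer your question: the Linux implementation of TCP keepalive assumes that keepalive_intvl is less than keepalive_time, but it still behaves "sensibly" when that is not the case.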