(Linux 4.4)
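I am trying to have a kernel module send a message to a user process over Generic Netlink. The message does not seem to be received by the user process: nlmsg_unicast returns -111.
Here is what I know:
I am using libmnl in the user process (as you may have guessed from my mention of mnl_socket_recvfrom).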
    uname -a
    Linux yaron-VirtualBox 4.4.0-57-generic #78-Ubuntu SMP Fri Dec 9 23:50:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
This is essentially my sending code in the kernel:
    struct sk_buff *msg;
    struct sock *socket;
    struct netlink_kernel_cfg nlCfg = {
        .groups   = 1,
        .flags    = 0,
        .input    = NULL,
        .cb_mutex = NULL,
        .bind     = NULL,
        .unbind   = NULL,
        .compare  = NULL,
    };
    void *msg_head;
    int retval;
    struct net init_net;

    /* Open a socket */
    socket = netlink_kernel_create(&init_net, NETLINK_GENERIC, &nlCfg);
    if (socket == NULL)
        goto CmdFail;

    /* Allocate space */
    msg = genlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
    if (msg == NULL)
        goto CmdFail;

    /* Generate message header
     * arguments of genlmsg_put:
     *   struct sk_buff *,
     *   int portID,            <-- this is sender portID
     *   int netlinkSeqNum,
     *   struct genl_family *,
     *   int flags,
     *   u8 command_idx
     */
    msg_head = genlmsg_put(msg, 0, ++netlinkSeqNum, &genlFamily, 0, MYFAMILY_CMD_MYMSG);
    if (msg_head == NULL)
        goto CmdFail;

    /* Add a MYFAMILY_ATTR_MYCMD attribute (command to be sent) */
    retval = nla_put_string(msg, MYFAMILY_ATTR_MYMSG, "Temporary message");
    if (retval != 0)
        goto CmdFail;

    /* Finalize the message */
    genlmsg_end(msg, msg_head);    /* void inline function - no return value */

    /* Send the message */
    retval = nlmsg_unicast(socket, msg, userNetlinkPortID);
    printk("nlmsg_unicast returned %d\n", retval);
    if (retval != 0)
        goto CmdFail;

    netlink_kernel_release(socket);
    return;

    CmdFail:
    printk(KERN_ALERT "*** Failed to send command !\n");
    netlink_kernel_release(socket);
    return;
This is essentially my receiving code in the user process:
    char bufferHdr[getpagesize()];
    struct nlmsghdr *nlHeader;
    struct genlmsghdr *nlHeaderExtraHdr;
    int numBytes, seq, ret_val;

    // Set up the header.
    // Function mnl_nlmsg_put_header will zero out a length of bufferHdr sufficient to hold a Netlink header,
    // and initialize the nlmsg_len field in that space to the size of a header.
    // It returns a pointer to bufferHdr.
    if ( (nlHeader = mnl_nlmsg_put_header(bufferHdr)) != (struct nlmsghdr *) bufferHdr ) {
        perror("mnl_nlmsg_put_header failed");
        exit(EXIT_FAILURE);
    }
    nlHeader->nlmsg_type = genetlinkFamilyID;

    // Function mnl_nlmsg_put_extra_header extends the header, to allow for these extra fields.
    if ( (nlHeaderExtraHdr = (struct genlmsghdr *) mnl_nlmsg_put_extra_header(nlHeader, sizeof(struct genlmsghdr)))
            != (struct genlmsghdr *) (bufferHdr + sizeof(struct nlmsghdr)) ) {
        perror("mnl_nlmsg_put_extra_header failed");
        exit(EXIT_FAILURE);
    }

    // No command to set
    // No attributes to set

    // Wait for a message, and process it
    while (1) {
        numBytes = mnl_socket_recvfrom(nlSocket, bufferHdr, sizeof(bufferHdr));
        if (numBytes == -1) {
            perror("mnl_socket_recvfrom returned error");
            break;
        }

        // Callback run queue handler - use it to call getMsgCallback
        std::cout << "received a msg, handling it" << std::endl;
        ret_val = mnl_cb_run(bufferHdr, numBytes, seq, portid, getMsgCallback, NULL);
        if (ret_val == -1) {
            //perror("mnl_cb_run failed");
            break;
        } else if (ret_val == 0)
            break;
    }
    return ret_val;
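Addendum: having browsed some more of the kernel source (on elixir.free-electrons.com), I suspect that my message never reaches the user process at all. Debugging suggestions would be appreciated.
Here is what I see: nlmsg_unicast calls netlink_unicast, which in turn calls netlink_getsockbyportid, which looks like this: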
    static struct sock *netlink_getsockbyportid(struct sock *ssk, u32 portid)
    {
        struct sock *sock;
        struct netlink_sock *nlk;

        sock = netlink_lookup(sock_net(ssk), ssk->sk_protocol, portid);
        if (!sock)
            return ERR_PTR(-ECONNREFUSED);

        /* Don't bother queuing skb if kernel socket has no input function */
        nlk = nlk_sk(sock);
        if (sock->sk_state == NETLINK_CONNECTED &&
            nlk->dst_portid != nlk_sk(ssk)->portid) {
            sock_put(sock);
            return ERR_PTR(-ECONNREFUSED);
        }

        return sock;
    }
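I am guessing that one of the two conditions here is being triggered and returning -ECONNREFUSED, which would be the -111 I am seeing.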
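As a sanity check on that guess, the return value can be decoded in user space. This is only a minimal sketch: the helper name describe_retval is hypothetical, and it assumes the usual Linux errno numbering (as on x86_64), where ECONNREFUSED is 111.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a kernel-style negative return value as
 * "<value> (<errno text>)" into buf and return buf. */
const char *describe_retval(int retval, char *buf, size_t len)
{
    /* Kernel functions return -errno, so negate before the lookup. */
    snprintf(buf, len, "%d (%s)", retval, strerror(-retval));
    return buf;
}
```

Calling describe_retval(-111, buf, sizeof(buf)) and printing the result should show the "Connection refused" text on such a system, which matches the two ERR_PTR(-ECONNREFUSED) returns above.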
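Any suggestions on how to debug whether these conditions hold? It does not look like I can call netlink_lookup or nlk_sk directly from my module code. I am guessing those symbols are not exported, and neither are the functions they call: a whole bunch of symbols are buried in af_netlink.h and af_netlink.c, and I assume they are not available when building an external module, at least not in the normal way. (It does not look like af_netlink.h is part of the distribution.)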
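One check that needs none of those unexported symbols can be made from the user side: confirm which port ID the receiving socket is actually bound to, and compare it with the userNetlinkPortID value the module passes to nlmsg_unicast. The first condition in netlink_getsockbyportid fails when no socket with that protocol and port ID is found. The sketch below uses the plain socket API rather than libmnl (with libmnl, mnl_socket_get_portid on the bound socket should report the same kind of value). The helper name open_genl_socket is hypothetical and not part of my code.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>

/* Hypothetical helper: open and bind a NETLINK_GENERIC socket and report
 * the port ID the kernel assigned to it.
 * Returns the socket fd on success, or -1 on error. */
int open_genl_socket(uint32_t *portid_out)
{
    struct sockaddr_nl addr;
    socklen_t addr_len = sizeof(addr);
    int fd;

    fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;
    addr.nl_pid = 0;    /* 0 asks the kernel to choose the port ID */

    /* Bind, then read back the address to learn the assigned port ID. */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)&addr, &addr_len) < 0) {
        close(fd);
        return -1;
    }

    *portid_out = addr.nl_pid;
    return fd;
}
```

Printing the reported port ID next to the value logged by the module would at least show whether the kernel is looking up a port that some user socket is really bound to.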