现在是否值得实施一个slab分配器？

我正在从一个服务器上读取数千个同时连接的套接字客户端。客户端请求由大小约为32字节的消息组成。

我正在阅读有关slab allocator ，我想为我的应用程序使用这种特殊的技术，当我调用read从套接字获取数据（从内核缓冲区read副本数据到我select的缓冲区，我想使用一些dynamic分配的内存）。

在阅读的时候，似乎Linux内核已经在使用这种技术了。如果这是用来实现malloc or new是否仍然值得我这样做，因为分配已经高性能？

我想在没有SLABalgorithm的情况下使用堆栈分配可能会更好，但是我不确定哪个是最好的方法。

如果你是一个C程序员，当然你应该弄脏内存管理！

然而，除非你真的接近你的机器的极限，否则你可能不会遇到任何问题，除非你真的接近你的机器的极限，这看起来不太可能。但是我相信，知道自己的选择会比用别人的话更好。这里有一些想法要考虑。

静态数组

最简单的方法是使用单个全局数组请求槽，并跟踪哪些正在使用中。这意味着有多少请求的静态限制，但另一方面，没有开销和碎片没有真正的问题。只要设置极限真的很高。

这是一个示例实现。如果您对按位运算不熟悉，可能看起来有些困惑，但要点是我们有一个附加数组，每个请求位包含一个位（打开或关闭），指定是否正在使用该插槽。你可以在结构本身中添加一个“is_used”变量，但是最终会把结构填充很多，这样就会降低开销。

头文件是相当小的（顺便说一下，这是C的真正的美丽）：

 typedef struct request_s { /* your 32 bytes of information */ unsigned char data[32]; } request_t; request_t *alloc_request(void); void free_request(request_t *req);

源文件：

 #include the header file /* note: this example is not written with multithreading in mind */ /* allow a million requests (total 32 MB + 128 KB of memory) */ #define MAX_REQUESTS (1*1024*1024) static request_t g_requests[MAX_REQUESTS]; /* use one bit per request to store whether it's in use */ /* unsigned int is 32 bits. shifting right by 5 divides by 32 */ static unsigned int g_requests_used[MAX_REQUESTS >> 5]; request_t *alloc_request(void) { /* note: this is a very naive method. you really don't want to search * from the beginning every time, but i'll leave improving that as an * exercise for you. */ unsigned int word_bits; unsigned int word, bit; /* look through the bit array one word (ie, 32 bits) at a time */ for (word = 0; word < (MAX_REQUESTS >> 5); word++) { word_bits = g_requests_used[word]; /* we can tell right away whether the entire chunk of 32 requests is * in use, and avoid the inner loop */ if (word_bits == 0xFFFFFFFFU) continue; /* now we know there is a gap somewhere in this chunk, so we loop * through the 32 bits to find it */ for (bit = 0; bit < 32; bit++) { if (word_bits & (1U << bit)) continue; /* bit is set, slot is in use */ /* found a free slot */ g_requests_used[word] |= 1U << bit; return &g_requests[(word << 5) + bit]; } } /* we're all out of requests! */ return NULL; } void free_request(request_t *req) { /* make sure the request is actually within the g_requests block of * memory */ if (req >= g_requests && req < g_requests + MAX_REQUESTS) { /* find the overall index of this request. pointer arithmetic like this * is somewhat peculiar to c/c++, you may want to read up on it. */ ptrdiff_t index = req - g_requests; /* reducing a ptrdiff_t to an unsigned int isn't something you should * do without thinking about it first. but in our case, we're fine as * long as we don't allow more than 2 billion requests, not that our * computer could handle that many anyway */ unsigned int u_index = (unsigned int)index; /* do some arithmetic to figure out which bit of which word we need to * turn off */ unsigned int word = u_index >> 5; /* index / 32 */ unsigned int bit = u_index & 31; /* index % 32 */ g_requests_used[word] &= ~(1U << bit); } }

（是的，你可以编写index / 32而不是index >> 5 ，等等，编译器会为你优化它，但是我觉得不太合适）

您可以进入相当深入的位置，并在优化分配器对空闲插槽的搜索时获得相当的创造性。一个简单的想法是在最后一个分配的位置开始搜索，并在到达结尾时进行环绕。

这种方法有很多好处，但有一个很大的缺点：限制。你可能想要设置的限制高于你曾经期望的最大数量的请求。但是，如果你能做到这一点，那么你可能不会碰到你的系统的限制，那么你为什么在这里呢？

内存池（静态数组的链表）

如果您不喜欢静态限制，则可以批量分配。保持内存“池”的链接列表，每个内存拥有一定数量的请求槽。每个单独的池基本上是一个静态数组，如上所述。如果所有现有池已满，我们malloc一个新的池并将其添加到链接列表。一个游泳池基本上是这样的：

 #define REQUESTS_PER_POOL 1024 typedef struct request_pool_s request_pool_t; struct request_pool_s { request_t requests[REQUESTS_PER_POOL]; unsigned int requests_used[REQUESTS_PER_POOL >> 5]; request_pool_t *prev; request_pool_t *next; };

当交通停顿时，您将希望能够释放游泳池，否则这与静态限制几乎没有什么不同。不幸的是，在这种情况下，你将会处理碎片问题。你可能发现自己有很多稀疏使用的池，特别是如果请求有时可能会持续很长时间。直到最后一个槽都是空的，整个池才能被释放。您仍然可以节省开销（考虑到个别请求的小规模），但是处理碎片可能会将这个问题从一个小巧，优雅的解决方案转化为更多的工作而不是价值。

您可以减少每个池的请求数量，以减少碎片化的影响，但是现在我们将失去这种方法的优势。

哪一个？

首先，你应该考虑个别malloc的替代方法的主要原因是：struct的小尺寸（32字节），大量的，以及它们被创建和销毁的频率。

静态数组可以大大减少开销，但在今天这个时代很难证明是合理的。除非你的服务器运行在Arduino上。
内存池是这样一个问题的明显方向，但他们可能需要一点点工作才能顺利进行。如果这是你的胡同，那么我说去吧。
板分配器就像复杂的内存池，不限于某个单一的结构大小。由于你只有32字节的请求，所以它们对你来说是过大的，尽管也许你可以找到适合你的第三方库。

采取简单的方法，简单地将每个请求分配给每个请求是一个滑坡，最终可能会导致完全抛弃C。 ;）