Commit 02fd58e5 authored by Marcus Gelderie, committed by Kamal Mostafa

ipc: modify message queue accounting to not take kernel data structures into account

commit de54b9ac upstream.

A while back, the message queue implementation in the kernel was
improved to use rbtrees to speed up retrieval of messages, in commit
d6629859 ("ipc/mqueue: improve performance of send/recv").

That patch introducing the improved kernel handling of message queues
(using rbtrees) has, as a by-product, changed the meaning of the QSIZE
field in the pseudo-file created for the queue.  Before, this field
reflected the size of the user-data in the queue.  Since then, it also
takes kernel data structures into account.  For example, if 13 bytes of
user data are in the queue, on my machine the file reports a size of 61
bytes.

There was some discussion on this topic before (for example
https://lkml.org/lkml/2014/10/1/115).  Commenting on a thread on lkml,
Michael Kerrisk gave the following background
(https://lkml.org/lkml/2015/6/16/74):

    The pseudofiles in the mqueue filesystem (usually mounted at
    /dev/mqueue) expose fields with metadata describing a message
    queue. One of these fields, QSIZE, as originally implemented,
    showed the total number of bytes of user data in all messages in
    the message queue, and this feature was documented from the
    beginning in the mq_overview(7) page. In 3.5, some other (useful)
    work happened to break the user-space API in a couple of places,
    including the value exposed via QSIZE, which now includes a measure
    of kernel overhead bytes for the queue, a figure that renders QSIZE
    useless for its original purpose, since there's no way to deduce
    the number of overhead bytes consumed by the implementation.
    (The other user-space breakage was subsequently fixed.)

This patch removes the accounting of kernel data structures in the
queue.  Reporting the size of these data-structures in the QSIZE field
was a breaking change (see Michael's comment above).  Without the QSIZE
field reporting the total size of user-data in the queue, there is no
way to deduce this number.
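
For illustration only, a user-space reader of the pseudo-file might pull
out the QSIZE value roughly as follows.  This is a minimal sketch and not
part of the patch: parse_qsize() is a hypothetical helper, and the line
layout ("QSIZE:... NOTIFY:... SIGNO:... NOTIFY_PID:...") is assumed to
match what mq_overview(7) describes.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of the patch): extract the QSIZE value
 * from one line of a /dev/mqueue pseudo-file, e.g.
 * "QSIZE:13         NOTIFY:0     SIGNO:0     NOTIFY_PID:0".
 * Returns 0 on success, -1 if the line does not start with a QSIZE field.
 */
int parse_qsize(const char *line, long *qsize)
{
	assert(line != NULL && qsize != NULL);
	return sscanf(line, "QSIZE:%ld", qsize) == 1 ? 0 : -1;
}
```

With this patch applied, the value returned for a queue holding 13 bytes
of user data would be 13 again, rather than a figure that includes kernel
overhead.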

It should be noted that the resource limit RLIMIT_MSGQUEUE is counted
against the worst-case size of the queue (in both the old and the new
implementation).  Therefore, the kernel overhead accounting in QSIZE is
not necessary to help the user understand the limitations RLIMIT imposes
on the processes.
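
As a rough sketch of that worst-case figure, the function below follows
the formula documented in getrlimit(2) for Linux 3.5 and later.  The
function name and the two structure sizes passed in are assumptions for
illustration; the real sizes of struct msg_msg and struct
posix_msg_tree_node depend on the kernel build.

```c
#include <assert.h>

/* Value of MQ_PRIO_MAX on Linux; renamed here to avoid clashing with headers. */
#define SKETCH_MQ_PRIO_MAX 32768ULL

/*
 * Worst-case bytes charged against RLIMIT_MSGQUEUE for one queue:
 * per-message headers, plus one tree node per possible priority level
 * in use, plus the full user-data capacity of the queue.
 * msg_hdr_size and tree_node_size stand in for sizeof(struct msg_msg)
 * and sizeof(struct posix_msg_tree_node) and are caller-supplied
 * assumptions.
 */
unsigned long long mq_worst_case_bytes(unsigned long long maxmsg,
				       unsigned long long msgsize,
				       unsigned long long msg_hdr_size,
				       unsigned long long tree_node_size)
{
	unsigned long long nodes;

	assert(maxmsg > 0 && msgsize > 0);
	nodes = maxmsg < SKETCH_MQ_PRIO_MAX ? maxmsg : SKETCH_MQ_PRIO_MAX;
	return maxmsg * msg_hdr_size + nodes * tree_node_size +
	       maxmsg * msgsize;
}
```

Because this charge is fixed when the queue is created, it does not
depend on how many messages are currently queued, which is why QSIZE does
not need to carry the overhead.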

Signed-off-by: Marcus Gelderie <redmnic@gmail.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: John Duffy <jb_duffy@btinternet.com>
Cc: Arto Bendiken <arto@bendiken.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
parent 9952b2a2
@@ -143,7 +143,6 @@ static int msg_insert(struct msg_msg *msg, struct mqueue_inode_info *info)
 		if (!leaf)
 			return -ENOMEM;
 		INIT_LIST_HEAD(&leaf->msg_list);
-		info->qsize += sizeof(*leaf);
 	}
 	leaf->priority = msg->m_type;
 	rb_link_node(&leaf->rb_node, parent, p);
@@ -188,7 +187,6 @@ static inline struct msg_msg *msg_get(struct mqueue_inode_info *info)
 			     "lazy leaf delete!\n");
 		rb_erase(&leaf->rb_node, &info->msg_tree);
 		if (info->node_cache) {
-			info->qsize -= sizeof(*leaf);
 			kfree(leaf);
 		} else {
 			info->node_cache = leaf;
@@ -201,7 +199,6 @@ static inline struct msg_msg *msg_get(struct mqueue_inode_info *info)
 		if (list_empty(&leaf->msg_list)) {
 			rb_erase(&leaf->rb_node, &info->msg_tree);
 			if (info->node_cache) {
-				info->qsize -= sizeof(*leaf);
 				kfree(leaf);
 			} else {
 				info->node_cache = leaf;
@@ -1026,7 +1023,6 @@ SYSCALL_DEFINE5(mq_timedsend, mqd_t, mqdes, const char __user *, u_msg_ptr,
 		/* Save our speculative allocation into the cache */
 		INIT_LIST_HEAD(&new_leaf->msg_list);
 		info->node_cache = new_leaf;
-		info->qsize += sizeof(*new_leaf);
 		new_leaf = NULL;
 	} else {
 		kfree(new_leaf);
@@ -1133,7 +1129,6 @@ SYSCALL_DEFINE5(mq_timedreceive, mqd_t, mqdes, char __user *, u_msg_ptr,
 		/* Save our speculative allocation into the cache */
 		INIT_LIST_HEAD(&new_leaf->msg_list);
 		info->node_cache = new_leaf;
-		info->qsize += sizeof(*new_leaf);
 	} else {
 		kfree(new_leaf);
 	}
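
The effect of dropping these five lines can be modelled outside the
kernel.  The sketch below is a toy model, not kernel code: struct
qsize_model and both functions are hypothetical names, and the 48-byte
node size used in the usage note is only inferred from the commit
message's example (61 reported bytes minus 13 bytes of user data).

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of what a queue holds; not the kernel's mqueue_inode_info. */
struct qsize_model {
	size_t user_bytes;	/* sum of message payload lengths */
	size_t leaf_nodes;	/* tree nodes that were being accounted */
};

/* Before the patch: payload bytes plus sizeof() of each accounted node. */
size_t qsize_before_patch(const struct qsize_model *q, size_t leaf_size)
{
	assert(q != NULL);
	return q->user_bytes + q->leaf_nodes * leaf_size;
}

/* After the patch: payload bytes only, as mq_overview(7) documents. */
size_t qsize_after_patch(const struct qsize_model *q)
{
	assert(q != NULL);
	return q->user_bytes;
}
```

For a queue with one 13-byte message and one accounted node of an assumed
48 bytes, qsize_before_patch() gives 61 and qsize_after_patch() gives 13,
matching the figures quoted in the commit message.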