• Neil Horman's avatar
    nfs4: Ensure that ACL pages sent over NFS were not allocated from the slab (v3) · e9e3d724
    Neil Horman authored
    The "bad_page()" page allocator sanity check was reported recently (call
    chain as follows):
    
      bad_page+0x69/0x91
      free_hot_cold_page+0x81/0x144
      skb_release_data+0x5f/0x98
      __kfree_skb+0x11/0x1a
      tcp_ack+0x6a3/0x1868
      tcp_rcv_established+0x7a6/0x8b9
      tcp_v4_do_rcv+0x2a/0x2fa
      tcp_v4_rcv+0x9a2/0x9f6
      do_timer+0x2df/0x52c
      ip_local_deliver+0x19d/0x263
      ip_rcv+0x539/0x57c
      netif_receive_skb+0x470/0x49f
      :virtio_net:virtnet_poll+0x46b/0x5c5
      net_rx_action+0xac/0x1b3
      __do_softirq+0x89/0x133
      call_softirq+0x1c/0x28
      do_softirq+0x2c/0x7d
      do_IRQ+0xec/0xf5
      default_idle+0x0/0x50
      ret_from_intr+0x0/0xa
      default_idle+0x29/0x50
      cpu_idle+0x95/0xb8
      start_kernel+0x220/0x225
      _sinittext+0x22f/0x236
    
    It occurs because an skb with a fraglist was freed from the tcp
    retransmit queue when it was acked, but a page on that fraglist had
    PG_Slab set (indicating it was allocated from the Slab allocator (which
    means the free path above can't safely free it via put_page.
    
    We tracked this back to an nfsv4 setacl operation, in which the nfs code
    attempted to fill convert the passed in buffer to an array of pages in
    __nfs4_proc_set_acl, which gets used by the skb->frags list in
    xs_sendpages.  __nfs4_proc_set_acl just converts each page in the buffer
    to a page struct via virt_to_page, but the vfs allocates the buffer via
    kmalloc, meaning the PG_slab bit is set.  We can't create a buffer with
    kmalloc and free it later in the tcp ack path with put_page, so we need
    to either:
    
    1) ensure that when we create the list of pages, no page struct has
       PG_Slab set
    
     or
    
    2) not use a page list to send this data
    
    Given that these buffers can be multiple pages and arbitrarily sized, I
    think (1) is the right way to go.  I've written the below patch to
    allocate a page from the buddy allocator directly and copy the data over
    to it.  This ensures that we have a put_page free-able page for every
    entry that winds up on an skb frag list, so it can be safely freed when
    the frame is acked.  We do a put page on each entry after the
    rpc_call_sync call so as to drop our own reference count to the page,
    leaving only the ref count taken by tcp_sendpages.  This way the data
    will be properly freed when the ack comes in
    
    Successfully tested by myself to solve the above oops.
    
    Note, as this is the result of a setacl operation that exceeded a page
    of data, I think this amounts to a local DOS triggerable by an
    uprivlidged user, so I'm CCing security on this as well.
    Signed-off-by: default avatarNeil Horman <nhorman@tuxdriver.com>
    CC: Trond Myklebust <Trond.Myklebust@netapp.com>
    CC: security@kernel.org
    CC: Jeff Layton <jlayton@redhat.com>
    Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
    e9e3d724
nfs4proc.c 150 KB