1. 04 Oct, 2012 7 commits
    • team: set qdisc_tx_busylock to avoid LOCKDEP splat · b3c581d5
      Eric Dumazet authored
      If a qdisc is installed on a team device, it's possible to get
      a lockdep splat under stress, because nested dev_queue_xmit() can
      lock busylock a second time (on a different device, so it's a false
      positive).
      
      Avoid this problem using a distinct lock_class_key for team
      devices.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Jiri Pirko <jpirko@redhat.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bonding: set qdisc_tx_busylock to avoid LOCKDEP splat · 49ee4920
      Eric Dumazet authored
      If a qdisc is installed on a bonding device, it's possible to get
      the following lockdep splat under stress:
      
       =============================================
       [ INFO: possible recursive locking detected ]
       3.6.0+ #211 Not tainted
       ---------------------------------------------
       ping/4876 is trying to acquire lock:
        (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at: [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830
      
       but task is already holding lock:
        (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at: [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830
      
       other info that might help us debug this:
        Possible unsafe locking scenario:
      
              CPU0
              ----
         lock(dev->qdisc_tx_busylock ?: &qdisc_tx_busylock);
         lock(dev->qdisc_tx_busylock ?: &qdisc_tx_busylock);
      
        *** DEADLOCK ***
      
        May be due to missing lock nesting notation
      
       6 locks held by ping/4876:
        #0:  (sk_lock-AF_INET){+.+.+.}, at: [<ffffffff815e5030>] raw_sendmsg+0x600/0xc30
        #1:  (rcu_read_lock_bh){.+....}, at: [<ffffffff815ba4bd>] ip_finish_output+0x12d/0x870
        #2:  (rcu_read_lock_bh){.+....}, at: [<ffffffff8157a0b0>] dev_queue_xmit+0x0/0x830
        #3:  (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at: [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830
        #4:  (&bond->lock){++.?..}, at: [<ffffffffa02128c1>] bond_start_xmit+0x31/0x4b0 [bonding]
        #5:  (rcu_read_lock_bh){.+....}, at: [<ffffffff8157a0b0>] dev_queue_xmit+0x0/0x830
      
       stack backtrace:
       Pid: 4876, comm: ping Not tainted 3.6.0+ #211
       Call Trace:
        [<ffffffff810a0145>] __lock_acquire+0x715/0x1b80
        [<ffffffff810a256b>] ? mark_held_locks+0x9b/0x100
        [<ffffffff810a1bf2>] lock_acquire+0x92/0x1d0
        [<ffffffff8157a191>] ? dev_queue_xmit+0xe1/0x830
        [<ffffffff81726b7c>] _raw_spin_lock+0x3c/0x50
        [<ffffffff8157a191>] ? dev_queue_xmit+0xe1/0x830
        [<ffffffff8106264d>] ? rcu_read_lock_bh_held+0x5d/0x90
        [<ffffffff8157a191>] dev_queue_xmit+0xe1/0x830
        [<ffffffff8157a0b0>] ? netdev_pick_tx+0x570/0x570
        [<ffffffffa0212a6a>] bond_start_xmit+0x1da/0x4b0 [bonding]
        [<ffffffff815796d0>] dev_hard_start_xmit+0x240/0x6b0
        [<ffffffff81597c6e>] sch_direct_xmit+0xfe/0x2a0
        [<ffffffff8157a249>] dev_queue_xmit+0x199/0x830
        [<ffffffff8157a0b0>] ? netdev_pick_tx+0x570/0x570
        [<ffffffff815ba96f>] ip_finish_output+0x5df/0x870
        [<ffffffff815ba4bd>] ? ip_finish_output+0x12d/0x870
        [<ffffffff815bb964>] ip_output+0x54/0xf0
        [<ffffffff815bad48>] ip_local_out+0x28/0x90
        [<ffffffff815bc444>] ip_send_skb+0x14/0x50
        [<ffffffff815bc4b2>] ip_push_pending_frames+0x32/0x40
        [<ffffffff815e536a>] raw_sendmsg+0x93a/0xc30
        [<ffffffff8128d570>] ? selinux_file_send_sigiotask+0x1f0/0x1f0
        [<ffffffff8109ddb4>] ? __lock_is_held+0x54/0x80
        [<ffffffff815f6730>] ? inet_recvmsg+0x220/0x220
        [<ffffffff8109ddb4>] ? __lock_is_held+0x54/0x80
        [<ffffffff815f6855>] inet_sendmsg+0x125/0x240
        [<ffffffff815f6730>] ? inet_recvmsg+0x220/0x220
        [<ffffffff8155cddb>] sock_sendmsg+0xab/0xe0
        [<ffffffff810a1650>] ? lock_release_non_nested+0xa0/0x2e0
        [<ffffffff810a1650>] ? lock_release_non_nested+0xa0/0x2e0
        [<ffffffff8155d18c>] __sys_sendmsg+0x37c/0x390
        [<ffffffff81195b2a>] ? fsnotify+0x2ca/0x7e0
        [<ffffffff811958e8>] ? fsnotify+0x88/0x7e0
        [<ffffffff81361f36>] ? put_ldisc+0x56/0xd0
        [<ffffffff8116f98a>] ? fget_light+0x3da/0x510
        [<ffffffff8155f6c4>] sys_sendmsg+0x44/0x80
        [<ffffffff8172fc22>] system_call_fastpath+0x16/0x1b
      
      Avoid this problem using a distinct lock_class_key for bonding
      devices.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Jay Vosburgh <fubar@us.ibm.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
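      The effect of a distinct lock_class_key can be illustrated outside the
      kernel. What follows is a minimal userspace sketch, assuming a toy
      validator that, like lockdep, tracks lock classes rather than lock
      instances. All toy_* names are hypothetical and are not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a lock class key: only its address matters. */
struct toy_class_key { char dummy; };

struct toy_lock {
    const struct toy_class_key *key; /* class this lock instance belongs to */
};

#define TOY_MAX_HELD 8

struct toy_task {
    const struct toy_lock *held[TOY_MAX_HELD];
    size_t depth;
};

/*
 * Record an acquisition. Returns false (a "splat") when a lock of the same
 * class is already held, even if it is a different lock instance.
 */
static bool toy_acquire(struct toy_task *t, const struct toy_lock *l)
{
    bool ok = true;

    assert(t->depth < TOY_MAX_HELD);
    for (size_t i = 0; i < t->depth; i++)
        if (t->held[i]->key == l->key)
            ok = false;
    t->held[t->depth++] = l;
    return ok;
}

static void toy_release(struct toy_task *t)
{
    assert(t->depth > 0);
    t->depth--;
}

/* Nested transmit: take the upper device's lock, then the lower device's. */
static bool toy_nested_xmit(const struct toy_lock *upper,
                            const struct toy_lock *lower)
{
    struct toy_task t = { .depth = 0 };
    bool ok = toy_acquire(&t, upper);

    ok = toy_acquire(&t, lower) && ok;
    toy_release(&t);
    toy_release(&t);
    return ok;
}
```

      In this toy model, two different locks that share one key are reported
      as recursive locking, while giving the upper (stacked) device its own
      key makes the same nesting pass.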
    • sctp: check src addr when processing SACK to update transport state · edfee033
      Nicolas Dichtel authored
      Suppose we have an SCTP connection with two paths. After the connection is
      established, path1 becomes unavailable and is marked as inactive. Traffic
      then goes through path2, but for some reason packets are delayed (beyond
      rto.max). Because of the delay, the retransmit mechanism switches back to
      path1. At this point, we receive a delayed SACK from path2. When we
      update the state of the path in sctp_check_transmitted(), we do not take into
      account the source address of the SACK, hence we update the wrong path.
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
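      The idea of keying the path-state update on the SACK's source address
      can be sketched as a toy model. This is not the kernel code: the
      toy_transport structure and toy_process_sack() are hypothetical, and
      addresses are reduced to plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_transport {
    uint32_t addr;   /* peer address of this path (toy: IPv4 as an integer) */
    bool active;
};

/*
 * Toy SACK handling: a SACK is evidence that a path works only for the
 * path it actually arrived on, so match on the SACK's source address
 * instead of trusting whichever transport is currently in use.
 * Returns the transport that was marked active, or NULL if none matched.
 */
static struct toy_transport *toy_process_sack(struct toy_transport *paths,
                                              size_t n, uint32_t sack_src)
{
    assert(paths != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (paths[i].addr == sack_src) {
            paths[i].active = true;
            return &paths[i];
        }
    }
    return NULL;
}
```

      With two paths, a delayed SACK arriving from the second path's address
      marks only that path active and leaves the first one untouched.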
    • sctp: fix a typo in prototype of __sctp_rcv_lookup() · 57565993
      Nicolas Dichtel authored
      Just to avoid confusion when people only read this prototype.
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net · e7b565e7
      David S. Miller authored
      Jeff Kirsher says:
      
      ====================
      This series contains fixes/updates to ixgbe only.  There are three
      PTP fixes, polling loop fix and the addition of a device id (X540-AT1).
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: add a fib_type to fib_info · f4ef85bb
      Eric Dumazet authored
      commit d2d68ba9 (ipv4: Cache input routes in fib_info nexthops.)
      introduced a regression for forwarding.
      
      This was hard to reproduce but the symptom was that packets were
      delivered to local host instead of being forwarded.
      
      David suggested adding fib_type to fib_info so that we don't
      inadvertently share the same fib_info for different purposes.
      
      With help from Julian Anastasov who provided very helpful
      hints, reproduced here :
      
      <quote>
              Can it be a problem related to fib_info reuse
      from different routes. For example, when local IP address
      is created for subnet we have:
      
      broadcast 192.168.0.255 dev DEV  proto kernel  scope link  src
      192.168.0.1
      192.168.0.0/24 dev DEV  proto kernel  scope link  src 192.168.0.1
      local 192.168.0.1 dev DEV  proto kernel  scope host  src 192.168.0.1
      
              The "dev DEV  proto kernel  scope link  src 192.168.0.1" is
      a reused fib_info structure where we put cached routes.
      The result can be same fib_info for 192.168.0.255 and
      192.168.0.0/24. RTN_BROADCAST is cached only for input
      routes. Incoming broadcast to 192.168.0.255 can be cached
      and can cause problems for traffic forwarded to 192.168.0.0/24.
      So, this patch should solve the problem because it
      separates the broadcast from unicast traffic.
      
              And the ip_route_input_slow caching will work for
      local and broadcast input routes (above routes 1 and 3) just
      because they differ in scope and use different fib_info.
      
      </quote>
      
      Many thanks to Chris Clayton for his patience and help.
      Reported-by: Chris Clayton <chris2553@googlemail.com>
      Bisected-by: Chris Clayton <chris2553@googlemail.com>
      Reported-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Tested-by: Chris Clayton <chris2553@googlemail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
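      The sharing problem described in the quote can be sketched with a toy
      comparison function. This is a simplified model under stated
      assumptions: toy_fib_info holds only a few illustrative fields, the
      enum values are not the kernel's, and toy_fib_info_match() is a
      hypothetical helper, not the kernel's lookup code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy route types; the values are illustrative, not the kernel's. */
enum toy_route_type { TOY_UNICAST, TOY_LOCAL, TOY_BROADCAST };

/* Toy subset of the fields a shared route-info object might be keyed on. */
struct toy_fib_info {
    int ifindex;
    uint8_t protocol;
    uint8_t scope;
    uint32_t prefsrc;
    enum toy_route_type type;
};

/*
 * Decide whether two routes may share one info object. With compare_type
 * false, a broadcast route and a subnet route that agree on device,
 * protocol, scope and source are treated as the same object.
 */
static bool toy_fib_info_match(const struct toy_fib_info *a,
                               const struct toy_fib_info *b,
                               bool compare_type)
{
    assert(a && b);
    if (a->ifindex != b->ifindex || a->protocol != b->protocol ||
        a->scope != b->scope || a->prefsrc != b->prefsrc)
        return false;
    return !compare_type || a->type == b->type;
}
```

      Including the type in the comparison keeps the broadcast and subnet
      entries apart, so anything cached on one cannot leak into lookups for
      the other.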
  2. 03 Oct, 2012 16 commits
  3. 02 Oct, 2012 17 commits
    • Merge tag 'pinctrl-for-v3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · 06fe918e
      Linus Torvalds authored
      Pull pinctrl changes from Linus Walleij:
       "Some of this stuff is hitting arch/arm/* and has been ACKed by the
        ARM SoC folks, or it's device tree bindings pertaining to the specific
        driver.
      
        These are the bulk pinctrl changes for kernel v3.7:
         - Add subdrivers for the DB8540 and NHK8815 Nomadik-type ASICs,
           provide platform config for the Nomadik.
         - Add a driver for the i.MX35.
         - Add a driver for the BCM2835, an advanced GPIO expander.
         - Various fixes and clean-ups and minor improvements for the core,
           Nomadik, pinctrl-single, sirf drivers.
         - Some platform config for the ux500."
      
      * tag 'pinctrl-for-v3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (27 commits)
        pinctrl: add bcm2835 driver
        pinctrl: clarify idle vs sleep states
        pinctrl/nomadik: use irq_find_mapping()
        pinctrl: sirf: add lost chained_irq_enter and exit in sirfsoc_gpio_handle_irq
        pinctrl: sirf: initialize the irq_chip pointer of pinctrl_gpio_range
        pinctrl: sirf: fix spinlock deadlock in sirfsoc_gpio_set_input
        pinctrl: sirf: add missing pins to pinctrl list
        pinctrl: sirf: fix a typo in sirfsoc_gpio_probe
        pinctrl: pinctrl-single: add debugfs pin h/w state info
        ARM: ux500: 8500: update I2C sleep states pinctrl
        pinctrl: Fix potential memory leak in pinctrl_register_one_pin()
        ARM: ux500: tidy up pin sleep modes
        ARM: ux500: fix spi2 pin group
        pinctrl: imx: remove duplicated const
        pinctrl: document semantics vs GPIO
        ARM: ux500: 8500: use hsit_a_2 group for HSI
        pinctrl: use kasprintf() in pinmux_request_gpio()
        pinctrl: pinctrl-single: Add pinctrl-single,bits type of mux
        pinctrl/nomadik : add MC1_a_2 pin MC1 function group list
        pinctrl: pinctrl-single: Make sure we do not change bits outside of mask
        ...
    • Merge tag 'gpio-for-v3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio · dff8360a
      Linus Torvalds authored
      Pull GPIO changes from Linus Walleij:
       "So this is the LW GPIO patch stack for v3.7:
         - refactoring from Thierry Reding at Arnd Bergmann's request to use
           the seq_file iterator interface in gpiolib.
         - A new driver for Avionic Design's N-bit GPIO expander.
         - Two instances of mutexes replaced by spinlocks from Axel Lin to
           code that is supposed to be fastpath compliant.
         - IRQ demuxer and gpio_to_irq() support for pcf857x by Kuninori
           Morimoto.
         - Dynamic GPIO numbers, device tree support, daisy chaining and some
           other fixes for the 74x164 driver by Maxime Ripard.
         - IRQ domain and device tree support for the tc3589x driver by Lee
           Jones.
         - Some conversion to use managed resources devm_* code.
         - Some instances of clk_prepare() or clk_prepare_enable() added to
           support the new, stricter common clock framework.
         - Some for_each_set_bit() simplifications.
         - Then a lot of fixes as we fixed up all of the above tripping over
           our own shoelaces and that kind of thing."
      
      * tag 'gpio-for-v3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (34 commits)
        gpio: pcf857x: select IRQ_DOMAIN
        gpio: Document device_node's det_debounce
        gpio-lpc32xx: Add GPI_28
        gpio: adnp: dt: Reference generic interrupt binding
        gpio: Add Avionic Design N-bit GPIO expander support
        gpio: pxa: using for_each_set_bit to simplify the code
        gpio_msm: using for_each_set_bit to simplify the code
        gpio: Enable the tc3298x GPIO expander driver for Device Tree
        gpio: Provide the tc3589x GPIO expander driver with an IRQ domain
        ARM: shmobile: kzm9g: use gpio-keys instead of gpio-keys-polled
        gpio: pcf857x: fixup smatch WARNING
        gpio: 74x164: Add support for the daisy-chaining
        gpio: 74x164: dts: Add documentation for the dt binding
        dt: Fix incorrect reference in gpio-led documentation
        gpio: 74x164: Add device tree support
        gpio: 74x164: Use dynamic gpio number assignment if no pdata is present
        gpio: 74x164: Use devm_kzalloc
        gpio: 74x164: Use module_spi_driver boiler plate function
        gpio: sx150x: Use irq_data_get_irq_chip_data() at appropriate places
        gpio: em: Use irq_data_get_irq_chip_data() at appropriate places
        ...
    • workqueue: avoid using deprecated functions · 916082b0
      Linus Torvalds authored
      The network merge brought in a few users of functions that got
      deprecated by the workqueue cleanups: the 'system_nrt_wq' is now the
      same as the regular system_wq, since all workqueues are now non-
      reentrant.
      
      Similarly, remove one use of flush_work_sync() - the regular
      flush_work() has become synchronous, and the "_sync()" version is thus
      deprecated as being superfluous.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next · aecdc33e
      Linus Torvalds authored
      Pull networking changes from David Miller:
      
       1) GRE now works over ipv6, from Dmitry Kozlov.
      
       2) Make SCTP more network namespace aware, from Eric Biederman.
      
       3) TEAM driver now works with non-ethernet devices, from Jiri Pirko.
      
       4) Make openvswitch network namespace aware, from Pravin B Shelar.
      
       5) IPV6 NAT implementation, from Patrick McHardy.
      
       6) Server side support for TCP Fast Open, from Jerry Chu and others.
      
       7) Packet BPF filter supports MOD and XOR, from Eric Dumazet and Daniel
          Borkmann.
      
       8) Increase the loopback default MTU to 64K, from Eric Dumazet.
      
       9) Use a per-task rather than per-socket page fragment allocator for
          outgoing networking traffic.  This benefits processes that have very
          many mostly idle sockets, which is quite common.
      
          From Eric Dumazet.
      
      10) Use up to 32K for page fragment allocations, with fallbacks to
          smaller sizes when higher order page allocations fail.  Benefits are
          a) fewer segments for driver to process b) fewer calls to page
          allocator c) less waste of space.
      
          From Eric Dumazet.
      
      11) Allow GRO to be used on GRE tunnels, from Eric Dumazet.
      
      12) VXLAN device driver, one way to handle VLAN issues such as the
          limitation of 4096 VLAN IDs yet still have some level of isolation.
          From Stephen Hemminger.
      
      13) As usual there is a large boatload of driver changes, with the scale
          perhaps tilted towards the wireless side this time around.
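      The size fallback described in item 10 can be sketched in userspace.
      This is only an illustration under stated assumptions: a 4K minimum
      size, sizes halved on each failure, and a caller-supplied allocator.
      The names toy_alloc_frag() and toy_limited_alloc() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocator callback: returns NULL on failure. */
typedef void *(*toy_alloc_fn)(size_t size, void *ctx);

/*
 * Try the largest size first (32K) and halve on failure down to 4K.
 * On success, *got holds the size that was actually obtained.
 */
static void *toy_alloc_frag(toy_alloc_fn alloc, void *ctx, size_t *got)
{
    assert(alloc != NULL && got != NULL);
    for (size_t size = 32768; size >= 4096; size /= 2) {
        void *p = alloc(size, ctx);

        if (p) {
            *got = size;
            return p;
        }
    }
    *got = 0;
    return NULL;
}

/* Toy allocator that fails for any request above the limit in ctx. */
static void *toy_limited_alloc(size_t size, void *ctx)
{
    const size_t *limit = ctx;

    return size <= *limit ? malloc(size) : NULL;
}
```

      When large requests fail, the caller still gets a usable buffer and
      learns its size through the out-parameter, so it can adapt how much
      data it places in each fragment.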
      
      Fix up various fairly trivial conflicts, mostly caused by the user
      namespace changes.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1012 commits)
        hyperv: Add buffer for extended info after the RNDIS response message.
        hyperv: Report actual status in receive completion packet
        hyperv: Remove extra allocated space for recv_pkt_list elements
        hyperv: Fix page buffer handling in rndis_filter_send_request()
        hyperv: Fix the missing return value in rndis_filter_set_packet_filter()
        hyperv: Fix the max_xfer_size in RNDIS initialization
        vxlan: put UDP socket in correct namespace
        vxlan: Depend on CONFIG_INET
        sfc: Fix the reported priorities of different filter types
        sfc: Remove EFX_FILTER_FLAG_RX_OVERRIDE_IP
        sfc: Fix loopback self-test with separate_tx_channels=1
        sfc: Fix MCDI structure field lookup
        sfc: Add parentheses around use of bitfield macro arguments
        sfc: Fix null function pointer in efx_sriov_channel_type
        vxlan: virtual extensible lan
        igmp: export symbol ip_mc_leave_group
        netlink: add attributes to fdb interface
        tg3: unconditionally select HWMON support when tg3 is enabled.
        Revert "net: ti cpsw ethernet: allow reading phy interface mode from DT"
        gre: fix sparse warning
        ...
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next · a20acf99
      Linus Torvalds authored
      Pull sparc updates from David Miller:
       "Largely this is simply adding support for the Niagara 4 cpu.
      
        Major areas are perf events (chip now supports 4 counters and can
        monitor any event on each counter), crypto (opcodes are available for
        sha1, sha256, sha512, md5, crc32c, AES, DES, CAMELLIA, and Kasumi
        although the last is unsupported since we lack a generic crypto layer
        Kasumi implementation), and an optimized memcpy.
      
        Finally some cleanups by Peter Senna Tschudin."
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-next: (47 commits)
        sparc64: Fix trailing whitespace in NG4 memcpy.
        sparc64: Fix comment type in NG4 copy from user.
        sparc64: Add SPARC-T4 optimized memcpy.
        drivers/sbus/char: removes unnecessary semicolon
        arch/sparc/kernel/pci_sun4v.c: removes unnecessary semicolon
        sparc64: Fix function argument comment in camellia_sparc64_key_expand asm.
        sparc64: Fix IV handling bug in des_sparc64_cbc_decrypt
        sparc64: Add auto-loading mechanism to crypto-opcode drivers.
        sparc64: Add missing pr_fmt define to crypto opcode drivers.
        sparc64: Adjust crypto priorities.
        sparc64: Use cpu_pgsz_mask for linear kernel mapping config.
        sparc64: Probe cpu page size support more portably.
        sparc64: Support 2GB and 16GB page sizes for kernel linear mappings.
        sparc64: Fix bugs in unrolled 256-bit loops.
        sparc64: Avoid code duplication in crypto assembler.
        sparc64: Unroll CTR crypt loops in AES driver.
        sparc64: Unroll ECB decryption loops in AES driver.
        sparc64: Unroll ECB encryption loops in AES driver.
        sparc64: Add ctr mode support to AES driver.
        sparc64: Move AES driver over to a methods based implementation.
        ...
    • hyperv: Add buffer for extended info after the RNDIS response message. · a3a6cab5
      Haiyang Zhang authored
      In some response messages, there may be some extended info after the
      message.
      Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
      Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hyperv: Report actual status in receive completion packet · 63f6921d
      Haiyang Zhang authored
      The existing code always reports NVSP_STAT_SUCCESS. This patch adds the
      mechanism to report failure when it happens.
      Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
      Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hyperv: Remove extra allocated space for recv_pkt_list elements · 6562640b
      Haiyang Zhang authored
      The receive code path doesn't use the page buffer, so remove the
      extra allocated space here.
      Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
      Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hyperv: Fix page buffer handling in rndis_filter_send_request() · 99e3fcfa
      Haiyang Zhang authored
      To prevent possible data corruption in RNDIS requests, add another
      page buffer if the request message crosses a page boundary.
      Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
      Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hyperv: Fix the missing return value in rndis_filter_set_packet_filter() · ea496374
      Haiyang Zhang authored
      Return ETIMEDOUT when the reply message is not received in time.
      Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
      Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • hyperv: Fix the max_xfer_size in RNDIS initialization · fb1d074e
      Haiyang Zhang authored
      According to RNDIS specs, Windows sets this size to
      0x4000. I use the same value here.
      Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
      Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: put UDP socket in correct namespace · bfe1b9b1
      stephen hemminger authored
      Move vxlan UDP socket to correct network namespace
      Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: Depend on CONFIG_INET · aaba1f58
      David S. Miller authored
      Reported-by: Randy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace · 437589a7
      Linus Torvalds authored
      Pull user namespace changes from Eric Biederman:
       "This is a mostly modest set of changes to enable basic user namespace
        support.  This allows the code to compile with user namespaces
        enabled and removes the assumption there is only the initial user
        namespace.  Everything is converted except for the most complex of the
        filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
        nfs, ocfs2 and xfs as those patches need a bit more review.
      
        The strategy is to push kuid_t and kgid_t values as far down into
        subsystems and filesystems as reasonable.  The make_kuid and
        from_kuid operations are left to happen at the edge of userspace, as
        the values come off the disk, and as the values come in from the
        network.  I let compile-time type errors (present when user
        namespaces are enabled) guide me to find the issues.
      
        The most tricky areas have been the places where we had an implicit
        union of uid and gid values and were storing them in an unsigned int.
        Those places were converted into explicit unions.  I made certain to
        handle those places with simple trivial patches.
      
        Out of that work I discovered we have generic interfaces for storing
        quota by projid.  I had never heard of the project identifiers before.
        Adding full user namespace support for project identifiers accounts
        for most of the code size growth in my git tree.
      
        Ultimately there will be work to relax privilege checks from
        "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing
        root in a user namespace to do those things that today we only forbid to
        non-root users because it will confuse suid root applications.
      
        While I was pushing kuid_t and kgid_t changes deep into the audit code
        I made a few other cleanups.  I capitalized on the fact we process
        netlink messages in the context of the message sender.  I removed
        usage of NETLINK_CRED, and started directly using current->tty.
      
        Some of these patches have also made it into maintainer trees, with no
        problems from identical code from different trees showing up in
        linux-next.
      
        After reading through all of this code I feel like I might be able to
        win a game of kernel trivial pursuit."
      
      Fix up some fairly trivial conflicts in netfilter uid/gid logging code.
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
        userns: Convert the ufs filesystem to use kuid/kgid where appropriate
        userns: Convert the udf filesystem to use kuid/kgid where appropriate
        userns: Convert ubifs to use kuid/kgid
        userns: Convert squashfs to use kuid/kgid where appropriate
        userns: Convert reiserfs to use kuid and kgid where appropriate
        userns: Convert jfs to use kuid/kgid where appropriate
        userns: Convert jffs2 to use kuid and kgid where appropriate
        userns: Convert hpfs to use kuid and kgid where appropriate
        userns: Convert btrfs to use kuid/kgid where appropriate
        userns: Convert bfs to use kuid/kgid where appropriate
        userns: Convert affs to use kuid/kgid wherwe appropriate
        userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
        userns: On ia64 deal with current_uid and current_gid being kuid and kgid
        userns: On ppc convert current_uid from a kuid before printing.
        userns: Convert s390 getting uid and gid system calls to use kuid and kgid
        userns: Convert s390 hypfs to use kuid and kgid where appropriate
        userns: Convert binder ipc to use kuids
        userns: Teach security_path_chown to take kuids and kgids
        userns: Add user namespace support to IMA
        userns: Convert EVM to deal with kuids and kgids in it's hmac computation
        ...
    • Merge branch 'for-3.7-hierarchy' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · 68d47a13
      Linus Torvalds authored
      Pull cgroup hierarchy update from Tejun Heo:
       "Currently, different cgroup subsystems handle nested cgroups
        completely differently.  There's no consistency among subsystems and
        the behaviors often are outright broken.
      
        People at least seem to agree that the broken hierarchy behaviors need
        to be weeded out if any progress is gonna be made on this front and
        that the fallouts from deprecating the broken behaviors should be
        acceptable especially given that the current behaviors don't make much
        sense when nested.
      
        This patch makes cgroup emit warning messages if cgroups for
        subsystems with broken hierarchy behavior are nested to prepare for
        fixing them in the future.  This was put in a separate branch because
        more related changes were expected (didn't make it this round) and the
        memory cgroup wanted to pull in this and make changes on top."
      
      * 'for-3.7-hierarchy' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
        cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them
    • Merge branch 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · c0e8a139
      Linus Torvalds authored
      Pull cgroup updates from Tejun Heo:
      
       - xattr support added.  The implementation is shared with tmpfs.  The
         usage is restricted and intended to be used to manage per-cgroup
         metadata by system software.  tmpfs changes are routed through this
         branch with Hugh's permission.
      
       - cgroup subsystem ID handling simplified.
      
      * 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
        cgroup: Define CGROUP_SUBSYS_COUNT according the configuration
        cgroup: Assign subsystem IDs during compile time
        cgroup: Do not depend on a given order when populating the subsys array
        cgroup: Wrap subsystem selection macro
        cgroup: Remove CGROUP_BUILTIN_SUBSYS_COUNT
        cgroup: net_prio: Do not define task_netpioidx() when not selected
        cgroup: net_cls: Do not define task_cls_classid() when not selected
        cgroup: net_cls: Move sock_update_classid() declaration to cls_cgroup.h
        cgroup: trivial fixes for Documentation/cgroups/cgroups.txt
        xattr: mark variable as uninitialized to make both gcc and smatch happy
        fs: add missing documentation to simple_xattr functions
        cgroup: add documentation on extended attributes usage
        cgroup: rename subsys_bits to subsys_mask
        cgroup: add xattr support
        cgroup: revise how we re-populate root directory
        xattr: extract simple_xattr code from tmpfs
    • Merge branch 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq · 033d9959
      Linus Torvalds authored
      Pull workqueue changes from Tejun Heo:
       "This is workqueue updates for v3.7-rc1.  A lot of activities this
        round including considerable API and behavior cleanups.
      
         * delayed_work combines a timer and a work item.  The handling of the
           timer part has always been a bit clunky leading to confusing
           cancelation API with weird corner-case behaviors.  delayed_work is
           updated to use new IRQ safe timer and cancelation now works as
           expected.
      
         * Another deficiency of delayed_work was lack of the counterpart of
           mod_timer() which led to cancel+queue combinations or open-coded
           timer+work usages.  mod_delayed_work[_on]() are added.
      
           These two delayed_work changes make delayed_work provide interface
           and behave like timer which is executed with process context.
      
         * A work item could be executed concurrently on multiple CPUs, which
           is rather unintuitive and made flush_work() behavior confusing and
           half-broken under certain circumstances.  This problem doesn't
           exist for non-reentrant workqueues.  While non-reentrancy check
           isn't free, the overhead is incurred only when a work item bounces
           across different CPUs and even in simulated pathological scenario
           the overhead isn't too high.
      
           All workqueues are made non-reentrant.  This removes the
            distinction between flush_[delayed_]work() and
            flush_[delayed_]work_sync().  The former is now as strong as the
           latter and the specified work item is guaranteed to have finished
           execution of any previous queueing on return.
      
         * In addition to the various bug fixes, Lai redid and simplified CPU
           hotplug handling significantly.
      
         * Joonsoo introduced system_highpri_wq and used it during CPU
           hotplug.
      
        There are two merge commits - one to pull in IRQ safe timer from
        tip/timers/core and the other to pull in CPU hotplug fixes from
        wq/for-3.6-fixes as Lai's hotplug restructuring depended on them."
      
      Fixed a number of trivial conflicts, but the more interesting conflicts
      were silent ones where the deprecated interfaces had been used by new
      code in the merge window, and thus didn't cause any real data conflicts.
      
      Tejun pointed out a few of them, I fixed a couple more.
      
      * 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (46 commits)
        workqueue: remove spurious WARN_ON_ONCE(in_irq()) from try_to_grab_pending()
        workqueue: use cwq_set_max_active() helper for workqueue_set_max_active()
        workqueue: introduce cwq_set_max_active() helper for thaw_workqueues()
        workqueue: remove @delayed from cwq_dec_nr_in_flight()
        workqueue: fix possible stall on try_to_grab_pending() of a delayed work item
        workqueue: use hotcpu_notifier() for workqueue_cpu_down_callback()
        workqueue: use __cpuinit instead of __devinit for cpu callbacks
        workqueue: rename manager_mutex to assoc_mutex
        workqueue: WORKER_REBIND is no longer necessary for idle rebinding
        workqueue: WORKER_REBIND is no longer necessary for busy rebinding
        workqueue: reimplement idle worker rebinding
        workqueue: deprecate __cancel_delayed_work()
        workqueue: reimplement cancel_delayed_work() using try_to_grab_pending()
        workqueue: use mod_delayed_work() instead of __cancel + queue
        workqueue: use irqsafe timer for delayed_work
        workqueue: clean up delayed_work initializers and add missing one
        workqueue: make deferrable delayed_work initializer names consistent
        workqueue: cosmetic whitespace updates for macro definitions
        workqueue: deprecate system_nrt[_freezable]_wq
        workqueue: deprecate flush[_delayed]_work_sync()
        ...