1. 27 Aug, 2014 3 commits
  2. 25 Aug, 2014 5 commits
    • Linux 3.17-rc2 · 52addcf9
      Linus Torvalds authored
    • Merge tag 'nfs-for-3.17-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs · f01bfc97
      Linus Torvalds authored
      Pull NFS client fixes from Trond Myklebust:
       "Highlights:
      
         - more fixes for read/write codepath regressions
           * sleeping while holding the inode lock
           * stricter enforcement of page contiguity when coalescing requests
           * fix up error handling in the page coalescing code
      
         - don't busy wait on SIGKILL in the file locking code"
      
      * tag 'nfs-for-3.17-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
        nfs: Don't busy-wait on SIGKILL in __nfs_iocounter_wait
        nfs: can_coalesce_requests must enforce contiguity
        nfs: disallow duplicate pages in pgio page vectors
        nfs: don't sleep with inode lock in lock_and_join_requests
        nfs: fix error handling in lock_and_join_requests
        nfs: use blocking page_group_lock in add_request
        nfs: fix nonblocking calls to nfs_page_group_lock
        nfs: change nfs_page_group_lock argument
    • Merge tag 'renesas-sh-drivers-for-v3.17' of... · dd5957b7
      Linus Torvalds authored
      Merge tag 'renesas-sh-drivers-for-v3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas
      
      Pull SH driver fix from Simon Horman:
       "Confine SH_INTC to platforms that need it"
      
      * tag 'renesas-sh-drivers-for-v3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
        sh: intc: Confine SH_INTC to platforms that need it
    • Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus · 497c01dd
      Linus Torvalds authored
      Pull MIPS fixes from Ralf Baechle:
       "Pretty much all across the field so with this we should be in
        reasonable shape for the upcoming -rc2"
      
      * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
        MIPS: OCTEON: make get_system_type() thread-safe
        MIPS: CPS: Initialize EVA before bringing up VPEs from secondary cores
        MIPS: Malta: EVA: Rename 'eva_entry' to 'platform_eva_init'
        MIPS: EVA: Add new EVA header
        MIPS: scall64-o32: Fix indirect syscall detection
        MIPS: syscall: Fix AUDIT value for O32 processes on MIPS64
        MIPS: Loongson: Fix COP2 usage for preemptible kernel
        MIPS: NL: Fix nlm_xlp_defconfig build error
        MIPS: Remove race window in page fault handling
        MIPS: Malta: Improve system memory detection for '{e, }memsize' >= 2G
        MIPS: Alchemy: Fix db1200 PSC clock enablement
        MIPS: BCM47XX: Fix reboot problem on BCM4705/BCM4785
        MIPS: Remove duplicated include from numa.c
        MIPS: Add common plat_irq_dispatch declaration
        MIPS: MSP71xx: remove unused plat_irq_dispatch() argument
        MIPS: GIC: Remove useless parens from GICBIS().
        MIPS: perf: Mark pmu interupt IRQF_NO_THREAD
    • Merge tag 'trace-fixes-v3.17-rc1' of... · 01e9982a
      Linus Torvalds authored
      Merge tag 'trace-fixes-v3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
      
      Pull fix for ftrace function tracer/profiler conflict from Steven Rostedt:
       "The rewrite of the ftrace code that makes it possible to allow for
        separate trampolines had a design flaw with the interaction between
        the function and function_graph tracers.
      
        The main flaw was the simplification of the use of multiple tracers
        having the same filter (like function and function_graph, that use the
        set_ftrace_filter file to filter their code).  The design assumed that
        the two tracers could never run simultaneously as only one tracer can
        be used at a time.  The problem with this assumption was that the
        function profiler could be implemented on top of the function graph
        tracer, and the function profiler could run at the same time as the
        function tracer.  This caused the assumption to be broken and when
        ftrace detected this failed assumption it would spit out a nasty
        warning and shut itself down.
      
        Instead of using a single ftrace_ops that switches between the
        function and function_graph callbacks, the two tracers can again use
        their own ftrace_ops.  But instead of having a complex hierarchy of
        ftrace_ops, the filter fields are placed in its own structure and the
        ftrace_ops can carefully use the same filter.  This change took a bit
        to be able to allow for this and currently only the global_ops can
        share the same filter, but this new design can easily be modified to
        allow for any ftrace_ops to share its filter with another ftrace_ops.
      
        The first four patches deal with the change of allowing the ftrace_ops
        to share the filter (and this needs to go to 3.16 as well).
      
        The fifth patch fixes a bug that was also caused by the new changes
        but only for archs other than x86, and only if those archs implement a
        direct call to the function_graph tracer which they do not do yet but
        will in the future.  It does not need to go to stable, but needs to be
        fixed before the other archs update their code to allow direct calls
        to the function_graph trampoline"
      
      * tag 'trace-fixes-v3.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
        ftrace: Use current addr when converting to nop in __ftrace_replace_code()
        ftrace: Fix function_profiler and function tracer together
        ftrace: Fix up trampoline accounting with looping on hash ops
        ftrace: Update all ftrace_ops for a ftrace_hash_ops update
        ftrace: Allow ftrace_ops to use the hashes from other ops
  3. 24 Aug, 2014 12 commits
  4. 23 Aug, 2014 5 commits
    • MAINTAINERS: add new Rockchip SoC list · 00250b52
      Heiko Stuebner authored
      Add the new list that Rockchip-specific patches should also be directed to.
      Signed-off-by: Heiko Stuebner <heiko@sntech.de>
    • ARM: dts: rockchip: readd missing mmc0 pinctrl settings · 1302d32c
      Heiko Stuebner authored
      During the restructuring of the Rockchip Cortex-A9 dtsi files it seems
      like the pinctrl settings vanished at some point from the mmc0 support.
      
      This of course renders them unusable, so readd the necessary pinctrl
      properties.
      Signed-off-by: Heiko Stuebner <heiko@sntech.de>
    • Merge tag 'sunxi-dt-for-3.17-2' of... · 2136edf3
      Olof Johansson authored
      Merge tag 'sunxi-dt-for-3.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux into fixes
      
      Merge "Allwinner DT changes, take 2" from Maxime Ripard:
      
      Only a single patch in here that fixes a DTC warning.
      
      * tag 'sunxi-dt-for-3.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux:
        ARM: dt: sun6i: Add #address-cells and #size-cells to i2c controller nodes
      Signed-off-by: Olof Johansson <olof@lixom.net>
    • ftrace: Use current addr when converting to nop in __ftrace_replace_code() · 39b5552c
      Steven Rostedt (Red Hat) authored
      In __ftrace_replace_code(), when converting the call to a nop in a function
      it needs to compare against the "curr" (current) value of the ftrace ops, and
      not the "new" one. It currently does not affect x86 which is the only arch
      to do the trampolines with function graph tracer, but when other archs that do
      depend on this code implement the function graph trampoline, it can crash.
      
      Here's an example when ARM uses the trampolines (in the future):
      
       ------------[ cut here ]------------
       WARNING: CPU: 0 PID: 9 at kernel/trace/ftrace.c:1716 ftrace_bug+0x17c/0x1f4()
       Modules linked in: omap_rng rng_core ipv6
       CPU: 0 PID: 9 Comm: migration/0 Not tainted 3.16.0-test-10959-gf0094b28-dirty #52
       [<c02188f4>] (unwind_backtrace) from [<c021343c>] (show_stack+0x20/0x24)
       [<c021343c>] (show_stack) from [<c095a674>] (dump_stack+0x78/0x94)
       [<c095a674>] (dump_stack) from [<c02532a0>] (warn_slowpath_common+0x7c/0x9c)
       [<c02532a0>] (warn_slowpath_common) from [<c02532ec>] (warn_slowpath_null+0x2c/0x34)
       [<c02532ec>] (warn_slowpath_null) from [<c02cbac4>] (ftrace_bug+0x17c/0x1f4)
       [<c02cbac4>] (ftrace_bug) from [<c02cc44c>] (ftrace_replace_code+0x80/0x9c)
       [<c02cc44c>] (ftrace_replace_code) from [<c02cc658>] (ftrace_modify_all_code+0xb8/0x164)
       [<c02cc658>] (ftrace_modify_all_code) from [<c02cc718>] (__ftrace_modify_code+0x14/0x1c)
       [<c02cc718>] (__ftrace_modify_code) from [<c02c7244>] (multi_cpu_stop+0xf4/0x134)
       [<c02c7244>] (multi_cpu_stop) from [<c02c6e90>] (cpu_stopper_thread+0x54/0x130)
       [<c02c6e90>] (cpu_stopper_thread) from [<c0271cd4>] (smpboot_thread_fn+0x1ac/0x1bc)
       [<c0271cd4>] (smpboot_thread_fn) from [<c026ddf0>] (kthread+0xe0/0xfc)
       [<c026ddf0>] (kthread) from [<c020f318>] (ret_from_fork+0x14/0x20)
       ---[ end trace dc9ce72c5b617d8f ]---
      [   65.047264] ftrace failed to modify [<c0208580>] asm_do_IRQ+0x10/0x1c
      [   65.054070]  actual: 85:1b:00:eb
      
      Fixes: 7413af1f "ftrace: Make get_ftrace_addr() and get_ftrace_addr_old() global"
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ftrace: Fix function_profiler and function tracer together · 5f151b24
      Steven Rostedt (Red Hat) authored
      The latest rewrite of ftrace removed the separate ftrace_ops of
      the function tracer and the function graph tracer and had them
      share the same ftrace_ops. This simplified the accounting by removing
      the multiple layers of functions called, where the global_ops func
      would call a special list that would iterate over the other ops that
      were registered within it (like function and function graph), which
      itself was registered to the ftrace ops list of all functions
      currently active. If that sounds confusing, the code that implemented
      it was also confusing and its removal is a good thing.
      
      The problem with this change was that it assumed that the function
      and function graph tracer can never be used at the same time.
      This is mostly true, but there is an exception. That is when the
      function profiler uses the function graph tracer to profile.
      The function profiler can be activated at the same time as the function
      tracer, and this breaks the assumption and the result is that ftrace
      will crash (it detects the error and shuts itself down, it does not
      cause a kernel oops).
      
      To solve this issue, a previous change allowed the hash tables
      for the functions traced by a ftrace_ops to be a pointer and let
      multiple ftrace_ops share the same hash. This allows the function
      and function_graph tracer to have separate ftrace_ops, but still
      share the hash, which is what is done.
      
      Now the function and function graph tracers have separate ftrace_ops
      again, and the function tracer can be run while the function_profile
      is active.
      
      Cc: stable@vger.kernel.org # 3.16 (apply after 3.17-rc4 is out)
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
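
The idea of keeping separate ops that point at one shared filter can be illustrated with a small userspace sketch. Everything here (`struct filter`, `struct tracer_ops`, `ops_share_filter`, the fixed-size name list) is hypothetical and chosen for illustration; it is not the kernel's ftrace_ops layout or API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FILTER_MAX 4            /* arbitrary capacity for this sketch */

/* Hypothetical filter: a short list of function names to trace. */
struct filter {
    const char *names[FILTER_MAX];
    size_t count;
};

/*
 * Each ops owns storage for a filter but consults it through a pointer,
 * so a second ops can be pointed at the first one's filter.
 */
struct tracer_ops {
    struct filter local;        /* private storage */
    struct filter *filter;      /* filter actually consulted */
};

/* Start with an empty private filter. */
void ops_init(struct tracer_ops *ops)
{
    ops->local.count = 0;
    ops->filter = &ops->local;
}

/* Make ops consult whatever filter owner currently uses. */
void ops_share_filter(struct tracer_ops *ops, struct tracer_ops *owner)
{
    ops->filter = owner->filter;
}

/* Append a name; returns false when the filter is full. */
bool filter_add(struct filter *f, const char *name)
{
    if (f->count == FILTER_MAX)
        return false;
    f->names[f->count++] = name;
    return true;
}

/* True when name is present in the filter this ops consults. */
bool ops_traces(const struct tracer_ops *ops, const char *name)
{
    for (size_t i = 0; i < ops->filter->count; i++)
        if (strcmp(ops->filter->names[i], name) == 0)
            return true;
    return false;
}
```

With two ops sharing one filter, an entry added through either is visible to both, while each ops keeps its own callback state.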
  5. 22 Aug, 2014 15 commits
    • nfs: Don't busy-wait on SIGKILL in __nfs_iocounter_wait · 92a56555
      David Jeffery authored
      If a SIGKILL is sent to a task waiting in __nfs_iocounter_wait,
      it will busy-wait or soft lockup in its while loop.
      nfs_wait_bit_killable won't sleep, and the loop won't exit on
      the error return.
      
      Stop the busy-wait by breaking out of the loop when
      nfs_wait_bit_killable returns an error.
      Signed-off-by: David Jeffery <djeffery@redhat.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
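
The shape of this fix can be shown with a userspace sketch of a wait loop that stops when the wait primitive reports an error. The names (`struct io_counter`, `iocounter_wait`, the callback type and the two demo callbacks) are hypothetical, and `-EINTR` stands in for whatever error the kernel's killable wait returns.

```c
#include <errno.h>

/* Hypothetical counter of outstanding I/O. */
struct io_counter {
    int io_count;
};

/* Hypothetical wait primitive: 0 after a normal wait, negative errno if interrupted. */
typedef int (*wait_fn)(void *ctx);

/*
 * Wait until io_count reaches zero. The loop exits as soon as the wait
 * primitive returns an error; without that break, a wait that refuses to
 * sleep would make this loop spin forever.
 */
int iocounter_wait(struct io_counter *c, wait_fn wait, void *ctx)
{
    int ret = 0;

    while (c->io_count != 0) {
        ret = wait(ctx);
        if (ret)
            break;
    }
    return ret;
}

/* Demo callback: counts its calls in *ctx and always reports an interrupted wait. */
int wait_interrupted(void *ctx)
{
    ++*(int *)ctx;
    return -EINTR;
}

/* Demo callback: pretends one I/O completed per wait; ctx is the counter itself. */
int wait_one_io(void *ctx)
{
    struct io_counter *c = ctx;

    c->io_count--;
    return 0;
}
```

With `wait_interrupted`, the loop calls the wait primitive once and returns its error instead of spinning.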
    • nfs: can_coalesce_requests must enforce contiguity · 78270e8f
      Weston Andros Adamson authored
      Commit 6094f838
      "nfs: allow coalescing of subpage requests" got rid of the requirement
      that requests cover whole pages, but it made some incorrect assumptions.
      
      It turns out that callers of this interface can map adjacent requests
      (by file position as seen by req_offset + req->wb_bytes) to different pages,
      even when they could share a page. An example is the direct I/O interface -
      iov_iter_get_pages_alloc may return one segment with a partial page filled
      and the next segment (which is adjacent in the file position) starts with a
      new page.
      Reported-by: Toralf Förster <toralf.foerster@gmx.de>
      Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
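
One way to express such a contiguity check is sketched below: adjacency in the file is necessary but not sufficient, and the page layout has to line up as well. The struct, its field names and `SKETCH_PAGE_SIZE` are assumptions made for this illustration, not the kernel's `struct nfs_page` or the code of this commit.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for this sketch */

/* Hypothetical request descriptor. */
struct req {
    const void *page;               /* page backing this request */
    size_t pgbase;                  /* offset of the data within the page */
    size_t bytes;                   /* length of the request */
    unsigned long long file_off;    /* file position of the first byte */
};

/*
 * Two requests may be coalesced only if they are adjacent in the file AND
 * contiguous in memory: either req continues prev on the same page, or
 * prev ends exactly at a page boundary and req starts at offset 0 of
 * another page.
 */
bool can_coalesce(const struct req *prev, const struct req *req)
{
    if (req->file_off != prev->file_off + prev->bytes)
        return false;
    if (req->page == prev->page)
        return req->pgbase == prev->pgbase + prev->bytes;
    return req->pgbase == 0 &&
           prev->pgbase + prev->bytes == SKETCH_PAGE_SIZE;
}
```

The direct I/O case described above corresponds to a request that is adjacent in the file but starts a new page while the previous page is only partially filled; this check rejects it.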
    • nfs: disallow duplicate pages in pgio page vectors · bba5c188
      Weston Andros Adamson authored
      Adjacent requests that share the same page are allowed, but should only
      use one entry in the page vector. This avoids overrunning the page
      vector - it is sized based on how many bytes there are, not by
      request count.
      
      This fixes issues that manifest as "Redzone overwritten" bugs (the
      vector overrun) and hangs waiting on page read / write, as it waits on
      the same page more than once.
      
      This also adds bounds checking to the page vector with a graceful failure
      (WARN_ON_ONCE and pgio error returned to application).
      Reported-by: Toralf Förster <toralf.foerster@gmx.de>
      Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
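
A minimal sketch of filling a page vector without duplicate entries, with a bounds check in place of an overrun, might look like the following. `struct req`, `fill_page_vector` and the `-EIO` return are illustrative assumptions; the kernel code uses its own structures and reports the failure through WARN_ON_ONCE and a pgio error.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical request: only the page it touches matters here. */
struct req {
    const void *page;
};

/*
 * Copy the pages of reqs into vec in order, using a single entry when a
 * request shares its page with the previous one. Returns the number of
 * entries used, or -EIO if vec (capacity cap) would overflow.
 */
int fill_page_vector(const struct req *reqs, size_t nreqs,
                     const void **vec, size_t cap)
{
    size_t used = 0;

    for (size_t i = 0; i < nreqs; i++) {
        /* Same page as the last entry: no new slot needed. */
        if (used > 0 && vec[used - 1] == reqs[i].page)
            continue;
        /* Fail gracefully rather than write past the end of vec. */
        if (used == cap)
            return -EIO;
        vec[used++] = reqs[i].page;
    }
    return (int)used;
}
```

Three requests over two pages then need only two slots, which matches a vector sized by byte count rather than by request count.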
    • nfs: don't sleep with inode lock in lock_and_join_requests · 7c3af975
      Weston Andros Adamson authored
      This handles the 'nonblock=false' case in nfs_lock_and_join_requests.
      If the group is already locked and blocking is allowed, drop the inode lock
      and wait for the group lock to be cleared before trying it all again.
      This should fix warnings found in peterz's tree (sched/wait branch), where
      might_sleep() checks are added to wait.[ch].
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
      Reviewed-by: Peng Tao <tao.peng@primarydata.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    • nfs: fix error handling in lock_and_join_requests · 94970014
      Weston Andros Adamson authored
      This fixes handling of errors from nfs_page_group_lock in
      nfs_lock_and_join_requests.  It now releases the inode lock and the
      reference to the head request.
      Reported-by: Peng Tao <tao.peng@primarydata.com>
      Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
      Reviewed-by: Peng Tao <tao.peng@primarydata.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    • nfs: use blocking page_group_lock in add_request · bfd484a5
      Weston Andros Adamson authored
      __nfs_pageio_add_request was calling nfs_page_group_lock nonblocking, but
      this can return -EAGAIN which would end up passing -EIO to the application.
      
      There is no reason not to block in this path, so change the two calls to
      do so. Also, there is no need to check the return value of
      nfs_page_group_lock when nonblock=false, so remove the error handling code.
      Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
      Reviewed-by: Peng Tao <tao.peng@primarydata.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    • nfs: fix nonblocking calls to nfs_page_group_lock · bc8a309e
      Weston Andros Adamson authored
      nfs_page_group_lock was calling wait_on_bit_lock even when told not to
      block. Fix by first trying test_and_set_bit, followed by wait_on_bit_lock
      if and only if blocking is allowed.  Return -EAGAIN if nonblocking and the
      test_and_set of the bit was already locked.
      Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
      Reviewed-by: Peng Tao <tao.peng@primarydata.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
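
The try-first, wait-only-if-allowed pattern described in this commit can be sketched in userspace with C11 atomics. `struct group_lock` and both functions are hypothetical stand-ins; in particular the blocking path here spins, whereas the kernel sleeps in wait_on_bit_lock.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the page group lock bit. */
struct group_lock {
    atomic_flag locked;
};

/*
 * Try the lock once and wait only when the caller allows blocking.
 * Returns 0 with the lock held, or -EAGAIN when nonblock is set and the
 * lock was already taken.
 */
int group_lock_acquire(struct group_lock *gl, bool nonblock)
{
    /* Like test_and_set_bit: sets the flag and returns its previous value. */
    if (!atomic_flag_test_and_set(&gl->locked))
        return 0;           /* flag was clear, the lock is now ours */

    if (nonblock)
        return -EAGAIN;     /* contended and waiting is not allowed */

    /* Blocking path: retry until the holder releases the lock. */
    while (atomic_flag_test_and_set(&gl->locked))
        ;
    return 0;
}

/* Release the lock so another caller can take it. */
void group_lock_release(struct group_lock *gl)
{
    atomic_flag_clear(&gl->locked);
}
```

A nonblocking caller that hits a held lock gets `-EAGAIN` immediately and never enters the waiting path.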
    • nfs: change nfs_page_group_lock argument · fd2f3a06
      Weston Andros Adamson authored
      Flip the meaning of the second argument from 'wait' to 'nonblock' to
      match related functions. Update all five calls to reflect this change.
      Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
      Reviewed-by: Peng Tao <tao.peng@primarydata.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    • Merge tag 'pwm/for-3.17-rc2' of... · 451fd722
      Linus Torvalds authored
      Merge tag 'pwm/for-3.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm
      
      Pull pwm fix from Thierry Reding:
       "Just one bugfix for the PWM lookup table code that would cause a PWM
        channel to be set to the wrong period and polarity for non-perfect
        matches"
      
      * tag 'pwm/for-3.17-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
        pwm: Fix period and polarity in pwm_get() for non-perfect matches
    • mac80211: fix channel switch for chanctx-based drivers · 47e4df94
      Michal Kazior authored
      The new_ctx pointer is set only for non-chanctx drivers.  This yielded a
      crash for chanctx-based drivers during channel switch finalization:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
        IP: ieee80211_vif_use_reserved_switch+0x71c/0xb00 [mac80211]
      
      Use an adequate chanctx pointer to fix this.
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Michal Kazior <michal.kazior@tieto.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 433ab34d
      Linus Torvalds authored
      Pull networking fixes from David Miller:
       "Here are some bug fixes that have piled up during ksummit/linuxcon.
      
         1) Fix endian problems in ibmveth, from Anton Blanchard.
      
         2) IPV6 routing code does GFP_KERNEL allocation in atomic, fix from
            Benjamin Block.
      
         3) SCTP association fixes from Daniel Borkmann.
      
         4) When multiple VLAN headers are present we have to make sure the
            second and subsequent ones are pullable in the SKB otherwise we
            blindly dereference garbage.  From Jiri Benc.
      
         5) The argument adjustment of the signature of hlist_add_after*()
            introduced a regression in the batman-adv code, fix from Sven
            Eckelmann.
      
         6) Fix TX hang handling to avoid a panic in i40e, from Anjali Singhai
            Jain.
      
         7) PTP flag test is inverted in i40e driver, from Jesse Brandeburg.
      
         8) ATM LEC driver needs to hold RTNL mutex over MTU changes, from
            Chas Williams.
      
         9) Truncate packets larger than the TPACKET_V3 format configured
            buffers, otherwise we overwrite past the end of said buffers.
            From Eric Dumazet.
      
        10) Fix endianness bugs in qlcnic firmware handling, from Rajesh
            Borundia and Shahed Shaikh.
      
        11) CXGB4 sometimes doesn't get all of the TX completion events it
            should resulting in SKBs getting stuck in the TX queue, from
            Hariprasad Shenai.
      
        12) When the FEC chip's PTP clock is disabled, you can't access the
            register.  Add necessary checks to avoid the resulting hang, from
            Fugang Duan"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (37 commits)
        drivers: isdn: eicon: xdi_msg.h: Fix typo in #ifndef
        net: sctp: fix suboptimal edge-case on non-active active/retrans path selection
        net: sctp: spare unnecessary comparison in sctp_trans_elect_best
        net: ethernet: broadcom: bnx2x: Remove redundant #ifdef
        ibmveth: Fix endian issues with rx_no_buffer statistic
        net: xgene: fix possible NULL dereference in xgene_enet_free_desc_rings()
        openvswitch: fix panic with multiple vlan headers
        net: ipv6: fib: don't sleep inside atomic lock
        net: fec: ptp: avoid register access when ipg clock is disabled
        cxgb4: Free completed tx skbs promptly
        cxgb4: Fix race condition in cleanup
        sctp: not send SCTP_PEER_ADDR_CHANGE notifications with failed probe
        bnx2x: Revert UNDI flushing mechanism
        qlcnic: Fix endianess issue in firmware load from file operation
        qlcnic: Fix endianess issue in FW dump template header
        qlcnic: Fix flash access interface to application
        MAINTAINERS: Add section for MRF24J40 IEEE 802.15.4 radio driver
        macvlan: Allow setting multicast filter on all macvlan types
        packet: handle too big packets for PACKET_V3
        MAINTAINERS: add entry for ec_bhf driver
        ...
    • ftrace: Fix up trampoline accounting with looping on hash ops · bce0b6c5
      Steven Rostedt (Red Hat) authored
      Now that a ftrace_hash can be shared by multiple ftrace_ops, they can
      decrement the rec->flags more than once (once per ops that shares the
      ftrace_hash). This means that the tramp_hash may not have a hash item
      when it was added.
      
      For example, if two ftrace_ops share a hash for a ftrace record, and the
      first ops has a trampoline, when it adds itself it will set the rec->flags
      TRAMP flag and increments its nr_trampolines counter. When the second ops
      is added, it must clear that tramp flag but also decrement the other ops
      that shares its hash. As the update to the function callbacks has not yet
      been performed, the other ops will not have the tramp hash set yet and it
      can not be used to know to decrement its nr_trampolines.
      
      Luckily, the tramp_hash does not need to be used. As the ftrace_mutex is
      held, an ops with a trampoline to a record during an update of another ops
      that shares the record will have its func_hash pointing to it. Since a
      trampoline can only be set for a record if only one ops is attached to it,
      we can just check if the record has a trampoline (the FTRACE_FL_TRAMP flag
      is set) and then find the ops that has this record in its hashes.
      
      Also added some output to help debug when things go wrong.
      
      Cc: stable@vger.kernel.org # 3.16+ (apply after 3.17-rc4 is out)
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • drivers: isdn: eicon: xdi_msg.h: Fix typo in #ifndef · faaa5524
      Rasmus Villemoes authored
      Test for definedness of the macro which is actually defined (the
      change is hard to see: it is s/SSS/SSA/).
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sctp: fix suboptimal edge-case on non-active active/retrans path selection · aa4a83ee
      Daniel Borkmann authored
      In SCTP, selection of active (T.ACT) and retransmission (T.RET)
      transports is being done whenever transport control operations
      (UP, DOWN, PF, ...) are engaged through sctp_assoc_control_transport().
      
      Commits 4c47af4d ("net: sctp: rework multihoming retransmission
      path selection to rfc4960") and a7288c4d ("net: sctp: improve
      sctp_select_active_and_retran_path selection") have both improved
      it towards a more fine-grained and optimal path selection.
      
      Currently, the selection algorithm for T.ACT and T.RET is as follows:
      
      1) Elect the two most recently used ACTIVE transports T1, T2 for
         T.ACT, T.RET, where T.ACT<-T1 and T1 is most recently used
      2) In case primary path T.PRI not in {T1, T2} but ACTIVE, set
         T.ACT<-T.PRI and T.RET<-T1
      3) If only T1 is ACTIVE from the set, set T.ACT<-T1 and T.RET<-T1
      4) If none is ACTIVE, set T.ACT<-best(T.PRI, T.RET, T3) where
         T3 is the most recently used (if avail) in PF, set T.RET<-T.PRI
      
      Prior to above commits, 4) was simply a camp on T.ACT<-T.PRI and
      T.RET<-T.PRI, ignoring possible paths in PF. Camping on T.PRI is
      still slightly suboptimal as it can lead to the following scenario:
      
      Setup:
              <A>                                <B>
          T1: p1p1 (10.0.10.10) <==>  .'`)  <==> p1p1 (10.0.10.12)  <= T.PRI
          T2: p1p2 (10.0.10.20) <==> (_ . ) <==> p1p2 (10.0.10.22)
      
          net.sctp.rto_min = 1000
          net.sctp.path_max_retrans = 2
          net.sctp.pf_retrans = 0
          net.sctp.hb_interval = 1000
      
      T.PRI is permanently down, T2 is put briefly into PF state (e.g. due to
      link flapping). Here, the first time transmission is sent over PF path
      T2 as it's the only non-INACTIVE path, but the retransmitted data-chunks
      are sent over the INACTIVE path T1 (T.PRI), which is not good.
      
      After the patch, it's choosing better transports in both cases by
      modifying step 4):
      
      4) If none is ACTIVE, set T.ACT_new<-best(T.ACT_old, T3) where T3 is
         the most recently used (if avail) in PF, set T.RET<-T.ACT_new
      
      This will still select a best possible path in PF if available (which
      can also include T.PRI/T.RET), and set both T.ACT/T.RET to it.
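
The revised step 4 can be sketched as follows. The types and both functions are hypothetical, and `best_of` uses a deliberately simple rule (better state wins, the current transport is kept on a tie), which is an assumption of this sketch and not necessarily how sctp_trans_elect_best ranks transports.

```c
#include <stddef.h>

/* Transport states, ordered from worst to best for this sketch. */
enum tstate { T_INACTIVE, T_PF, T_ACTIVE };

struct transport {
    enum tstate state;
};

struct assoc {
    struct transport *act;  /* T.ACT */
    struct transport *ret;  /* T.RET */
};

/* Hypothetical "best": prefer the better state, keep curr on a tie. */
struct transport *best_of(struct transport *curr, struct transport *cand)
{
    /* No candidate, or the same transport: nothing to compare. */
    if (cand == NULL || cand == curr)
        return curr;
    return cand->state > curr->state ? cand : curr;
}

/*
 * Revised step 4, used only when no transport is ACTIVE: choose between
 * the old T.ACT and t3 (most recently used PF transport, NULL if there is
 * none), then point T.RET at the same choice.
 */
void select_without_active(struct assoc *a, struct transport *t3)
{
    a->act = best_of(a->act, t3);
    a->ret = a->act;
}
```

When no PF transport exists, `t3` is NULL and the association stays on the old T.ACT for both roles, which is the reselection behaviour the following paragraphs describe.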
      
      In case sctp_assoc_control_transport() *just* put T.ACT_old into INACTIVE
      as it transitioned from ACTIVE->PF->INACTIVE and stays in INACTIVE just
      for a very short while before going back ACTIVE, it will guarantee that
      this path will be reselected for T.ACT/T.RET since T3 (PF) is not
      available.
      
      Previously, this was not possible, as we would only select between T.PRI
      and T.RET, and a possible T3 would be NULL due to the fact that we have
      just transitioned T3 in sctp_assoc_control_transport() from PF->INACTIVE
      and would select a suboptimal path when T.PRI/T.RET have worse properties.
      
      In the case that T.ACT_old permanently went to INACTIVE during this
      transition and there's no PF path available, plus T.PRI and T.RET are
      INACTIVE as well, we would now camp on T.ACT_old, but if everything is
      being INACTIVE there's really not much we can do except hoping for a
      successful HB to bring one of the transports back up again and thus
      cause a new selection through sctp_assoc_control_transport().
      
      Now both tests work fine:
      
      Case 1:
      
       1. T1 S(ACTIVE) T.ACT
          T2 S(ACTIVE) T.RET
      
       2. T1 S(ACTIVE) T.ACT, T.RET
          T2 S(PF)
      
       3. T1 S(ACTIVE) T.ACT, T.RET
          T2 S(INACTIVE)
      
       5. T1 S(PF) T.ACT, T.RET
          T2 S(INACTIVE)
      
      [ 5.1 T1 S(INACTIVE) T.ACT, T.RET
            T2 S(INACTIVE) ]
      
       6. T1 S(ACTIVE) T.ACT, T.RET
          T2 S(INACTIVE)
      
       7. T1 S(ACTIVE) T.ACT
          T2 S(ACTIVE) T.RET
      
      Case 2:
      
       1. T1 S(ACTIVE) T.ACT
          T2 S(ACTIVE) T.RET
      
       2. T1 S(PF)
          T2 S(ACTIVE) T.ACT, T.RET
      
       3. T1 S(INACTIVE)
          T2 S(ACTIVE) T.ACT, T.RET
      
       5. T1 S(INACTIVE)
          T2 S(PF) T.ACT, T.RET
      
      [ 5.1 T1 S(INACTIVE)
            T2 S(INACTIVE) T.ACT, T.RET ]
      
       6. T1 S(INACTIVE)
          T2 S(ACTIVE) T.ACT, T.RET
      
       7. T1 S(ACTIVE) T.ACT
          T2 S(ACTIVE) T.RET
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sctp: spare unnecessary comparison in sctp_trans_elect_best · ea4f19c1
      Daniel Borkmann authored
      When both transports are the same, we don't have to go down that
      road only to realize that we will return the very same transport.
      We are guaranteed that curr is always non-NULL. Therefore, just
      short-circuit this special case.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>