1. 22 Jul, 2014 2 commits
    • David S. Miller's avatar
      Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge · 850717ef
      David S. Miller authored
      Antonio Quartulli says:
      
      ====================
      pull request [net]: batman-adv 20140721
      
      here you have two fixes that we have been testing for quite some time
      (this is why they arrived a bit late in the rc cycle).
      
      Patch 1) ensures that BLA packets get dropped and not forwarded to the
      mesh even if they reach batman-adv within QinQ frames. Forwarding them
      into the mesh means messing up with the TT database of other nodes which
      can generate all kind of unexpected behaviours during route computation.
      
      Patch 2) avoids a couple of race conditions triggered upon fast VLAN
      deletion-addition. Such race conditions are pretty dangerous because
      they not only create inconsistencies in the TT database of the nodes
      in the network, but such scenario is also unrecoverable (unless
      nodes are rebooted).
      ====================
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      850717ef
    • Eric Dumazet's avatar
      ipv4: fix buffer overflow in ip_options_compile() · 10ec9472
      Eric Dumazet authored
      There is a benign buffer overflow in ip_options_compile spotted by
      AddressSanitizer[1] :
      
      Its benign because we always can access one extra byte in skb->head
      (because header is followed by struct skb_shared_info), and in this case
      this byte is not even used.
      
      [28504.910798] ==================================================================
      [28504.912046] AddressSanitizer: heap-buffer-overflow in ip_options_compile
      [28504.913170] Read of size 1 by thread T15843:
      [28504.914026]  [<ffffffff81802f91>] ip_options_compile+0x121/0x9c0
      [28504.915394]  [<ffffffff81804a0d>] ip_options_get_from_user+0xad/0x120
      [28504.916843]  [<ffffffff8180dedf>] do_ip_setsockopt.isra.15+0x8df/0x1630
      [28504.918175]  [<ffffffff8180ec60>] ip_setsockopt+0x30/0xa0
      [28504.919490]  [<ffffffff8181e59b>] tcp_setsockopt+0x5b/0x90
      [28504.920835]  [<ffffffff8177462f>] sock_common_setsockopt+0x5f/0x70
      [28504.922208]  [<ffffffff817729c2>] SyS_setsockopt+0xa2/0x140
      [28504.923459]  [<ffffffff818cfb69>] system_call_fastpath+0x16/0x1b
      [28504.924722]
      [28504.925106] Allocated by thread T15843:
      [28504.925815]  [<ffffffff81804995>] ip_options_get_from_user+0x35/0x120
      [28504.926884]  [<ffffffff8180dedf>] do_ip_setsockopt.isra.15+0x8df/0x1630
      [28504.927975]  [<ffffffff8180ec60>] ip_setsockopt+0x30/0xa0
      [28504.929175]  [<ffffffff8181e59b>] tcp_setsockopt+0x5b/0x90
      [28504.930400]  [<ffffffff8177462f>] sock_common_setsockopt+0x5f/0x70
      [28504.931677]  [<ffffffff817729c2>] SyS_setsockopt+0xa2/0x140
      [28504.932851]  [<ffffffff818cfb69>] system_call_fastpath+0x16/0x1b
      [28504.934018]
      [28504.934377] The buggy address ffff880026382828 is located 0 bytes to the right
      [28504.934377]  of 40-byte region [ffff880026382800, ffff880026382828)
      [28504.937144]
      [28504.937474] Memory state around the buggy address:
      [28504.938430]  ffff880026382300: ........ rrrrrrrr rrrrrrrr rrrrrrrr
      [28504.939884]  ffff880026382400: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28504.941294]  ffff880026382500: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr
      [28504.942504]  ffff880026382600: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28504.943483]  ffff880026382700: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28504.944511] >ffff880026382800: .....rrr rrrrrrrr rrrrrrrr rrrrrrrr
      [28504.945573]                         ^
      [28504.946277]  ffff880026382900: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28505.094949]  ffff880026382a00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28505.096114]  ffff880026382b00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28505.097116]  ffff880026382c00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28505.098472]  ffff880026382d00: ffffffff rrrrrrrr rrrrrrrr rrrrrrrr
      [28505.099804] Legend:
      [28505.100269]  f - 8 freed bytes
      [28505.100884]  r - 8 redzone bytes
      [28505.101649]  . - 8 allocated bytes
      [28505.102406]  x=1..7 - x allocated bytes + (8-x) redzone bytes
      [28505.103637] ==================================================================
      
      [1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernelSigned-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      10ec9472
  2. 21 Jul, 2014 9 commits
  3. 17 Jul, 2014 7 commits
    • Bjørn Mork's avatar
      net: huawei_cdc_ncm: add "subclass 3" devices · c2a6c781
      Bjørn Mork authored
      Huawei's usage of the subclass and protocol fields is not 100%
      clear to us, but there appears to be a very strict system.
      
      A device with the "shared" device ID 12d1:1506 and this NCM
      function was recently reported (showing only default altsetting):
      
          Interface Descriptor:
            bLength                 9
            bDescriptorType         4
            bInterfaceNumber        1
            bAlternateSetting       0
            bNumEndpoints           1
            bInterfaceClass       255 Vendor Specific Class
            bInterfaceSubClass      3
            bInterfaceProtocol     22
            iInterface              8 CDC Network Control Model (NCM)
            ** UNRECOGNIZED:  05 24 00 10 01
            ** UNRECOGNIZED:  06 24 1a 00 01 1f
            ** UNRECOGNIZED:  0c 24 1b 00 01 00 04 10 14 dc 05 20
            ** UNRECOGNIZED:  0d 24 0f 0a 0f 00 00 00 ea 05 03 00 01
            ** UNRECOGNIZED:  05 24 06 01 01
            Endpoint Descriptor:
              bLength                 7
              bDescriptorType         5
              bEndpointAddress     0x85  EP 5 IN
              bmAttributes            3
                Transfer Type            Interrupt
                Synch Type               None
                Usage Type               Data
              wMaxPacketSize     0x0010  1x 16 bytes
              bInterval               9
      
      Cc: Enrico Mioso <mrkiko.rs@gmail.com>
      Signed-off-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c2a6c781
    • Bjørn Mork's avatar
      net: qmi_wwan: add two Sierra Wireless/Netgear devices · 53433300
      Bjørn Mork authored
      Add two device IDs found in an out-of-tree driver downloadable
      from Netgear.
      Signed-off-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      53433300
    • Dan Carpenter's avatar
      wan/x25_asy: integer overflow in x25_asy_change_mtu() · a28d0e87
      Dan Carpenter authored
      If "newmtu * 2 + 4" is too large then it can cause an integer overflow
      leading to memory corruption.  Eric Dumazet suggests that 65534 is a
      reasonable upper limit.
      
      Btw, "newmtu" is not allowed to be a negative number because of the
      check in dev_set_mtu(), so that's ok.
      Signed-off-by: default avatarDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a28d0e87
    • Christoph Schulz's avatar
      net: ppp: fix creating PPP pass and active filters · cc25eaae
      Christoph Schulz authored
      Commit 568f194e ("net: ppp: use
      sk_unattached_filter api") inadvertently changed the logic when setting
      PPP pass and active filters. This applies to both the generic PPP subsystem
      implemented by drivers/net/ppp/ppp_generic.c and the ISDN PPP subsystem
      implemented by drivers/isdn/i4l/isdn_ppp.c. The original code in ppp_ioctl()
      (or isdn_ppp_ioctl(), resp.) handling PPPIOCSPASS and PPPIOCSACTIVE allowed to
      remove a pass/active filter previously set by using a filter of length zero.
      However, with the new code this is not possible anymore as this case is not
      explicitly checked for, which leads to passing NULL as a filter to
      sk_unattached_filter_create(). This results in returning EINVAL to the caller.
      
      Additionally, the variables ppp->pass_filter and ppp->active_filter (or
      is->pass_filter and is->active_filter, resp.) are not reset to NULL, although
      the filters they point to may have been destroyed by
      sk_unattached_filter_destroy(), so in this EINVAL case dangling pointers are
      left behind (provided the pointers were previously non-NULL).
      
      This patch corrects both problems by checking whether the filter passed is
      empty or non-empty, and prevents sk_unattached_filter_create() from being
      called in the first case. Moreover, the pointers are always reset to NULL
      as soon as sk_unattached_filter_destroy() returns.
      Signed-off-by: default avatarChristoph Schulz <develop@kristov.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      cc25eaae
    • Amir Vadai's avatar
      net/mlx4_en: cq->irq_desc wasn't set in legacy EQ's · 858e6c32
      Amir Vadai authored
      Fix a regression introduced by commit 35f6f453 ("net/mlx4_en: Don't use
      irq_affinity_notifier to track changes in IRQ affinity map").
      When core is started in legacy EQ's (number of IRQ's < rx rings), cq->irq_desc
      was NULL.  This caused a kernel crash under heavy traffic - when having more
      than rx NAPI budget completions.
      Fixed to have it set for both EQ modes.
      Signed-off-by: default avatarAmir Vadai <amirv@mellanox.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      858e6c32
    • Sowmini Varadhan's avatar
      sunvnet: clean up objects created in vnet_new() on vnet_exit() · a4b70a07
      Sowmini Varadhan authored
      Nothing cleans up the objects created by
      vnet_new(), they are completely leaked.
      
      vnet_exit(), after doing the vio_unregister_driver() to clean
      up ports, should call a helper function that iterates over vnet_list
      and cleans up those objects. This includes unregister_netdevice()
      as well as free_netdev().
      Signed-off-by: default avatarSowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: default avatarDave Kleikamp <dave.kleikamp@oracle.com>
      Reviewed-by: default avatarKarl Volz <karl.volz@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a4b70a07
    • Michel Dänzer's avatar
      r8169: Enable RX_MULTI_EN for RTL_GIGA_MAC_VER_40 · 7a9810e7
      Michel Dänzer authored
      The ethernet port on my ASUS A88X Pro mainboard stopped working
      several times a day, with messages like these in dmesg:
      
      AMD-Vi: Event logged [IO_PAGE_FAULT device=05:00.0 domain=0x001e address=0x0000000000003000 flags=0x0050]
      
      Searching the web for these messages led me to similar reports about
      different hardware supported by r8169, and eventually to commits
      3ced8c95 ('r8169: enforce RX_MULTI_EN
      for the 8168f.') and eb2dc35d ('r8169:
      RxConfig hack for the 8168evl'). So I tried this change, and it fixes
      the problem for me.
      Signed-off-by: default avatarMichel Dänzer <michel@daenzer.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7a9810e7
  4. 16 Jul, 2014 8 commits
  5. 15 Jul, 2014 9 commits
    • Niu Yawei's avatar
      quota: missing lock in dqcache_shrink_scan() · d68aab6b
      Niu Yawei authored
      Commit 1ab6c499 (fs: convert fs shrinkers to new scan/count API)
      accidentally removed locking from quota shrinker. Fix it -
      dqcache_shrink_scan() should use dq_list_lock to protect the
      scan on free_dquots list.
      
      CC: stable@vger.kernel.org
      Fixes: 1ab6c499Signed-off-by: default avatarNiu Yawei <yawei.niu@intel.com>
      Signed-off-by: default avatarJan Kara <jack@suse.cz>
      d68aab6b
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse · 0b632204
      Linus Torvalds authored
      Pull fuse fixes from Miklos Szeredi:
       "This contains miscellaneous fixes"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
        fuse: replace count*size kzalloc by kcalloc
        fuse: release temporary page if fuse_writepage_locked() failed
        fuse: restructure ->rename2()
        fuse: avoid scheduling while atomic
        fuse: handle large user and group ID
        fuse: inode: drop cast
        fuse: ignore entry-timeout on LOOKUP_REVAL
        fuse: timeout comparison fix
      0b632204
    • Linus Torvalds's avatar
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 5615f9f8
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Bluetooth pairing fixes from Johan Hedberg.
      
       2) ieee80211_send_auth() doesn't allocate enough tail room for the SKB,
          from Max Stepanov.
      
       3) New iwlwifi chip IDs, from Oren Givon.
      
       4) bnx2x driver reads wrong PCI config space MSI register, from Yijing
          Wang.
      
       5) IPV6 MLD Query validation isn't strong enough, from Hangbin Liu.
      
       6) Fix double SKB free in openvswitch, from Andy Zhou.
      
       7) Fix sk_dst_set() being racey with UDP sockets, leading to strange
          crashes, from Eric Dumazet.
      
       8) Interpret the NAPI budget correctly in the new systemport driver,
          from Florian Fainelli.
      
       9) VLAN code frees percpu stats in the wrong place, leading to crashes
          in the get stats handler.  From Eric Dumazet.
      
      10) TCP sockets doing a repair can crash with a divide by zero, because
          we invoke tcp_push() with an MSS value of zero.  Just skip that part
          of the sendmsg paths in repair mode.  From Christoph Paasch.
      
      11) IRQ affinity bug fixes in mlx4 driver from Amir Vadai.
      
      12) Don't ignore path MTU icmp messages with a zero mtu, machines out
          there still spit them out, and all of our per-protocol handlers for
          PMTU can cope with it just fine.  From Edward Allcutt.
      
      13) Some NETDEV_CHANGE notifier invocations were not passing in the
          correct kind of cookie as the argument, from Loic Prylli.
      
      14) Fix crashes in long multicast/broadcast reassembly, from Jon Paul
          Maloy.
      
      15) ip_tunnel_lookup() doesn't interpret wildcard keys correctly, fix
          from Dmitry Popov.
      
      16) Fix skb->sk assigned without taking a reference to 'sk' in
          appletalk, from Andrey Utkin.
      
      17) Fix some info leaks in ULP event signalling to userspace in SCTP,
          from Daniel Borkmann.
      
      18) Fix deadlocks in HSO driver, from Olivier Sobrie.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (93 commits)
        hso: fix deadlock when receiving bursts of data
        hso: remove unused workqueue
        net: ppp: don't call sk_chk_filter twice
        mlx4: mark napi id for gro_skb
        bonding: fix ad_select module param check
        net: pppoe: use correct channel MTU when using Multilink PPP
        neigh: sysctl - simplify address calculation of gc_* variables
        net: sctp: fix information leaks in ulpevent layer
        MAINTAINERS: update r8169 maintainer
        net: bcmgenet: fix RGMII_MODE_EN bit
        tipc: clear 'next'-pointer of message fragments before reassembly
        r8152: fix r8152_csum_workaround function
        be2net: set EQ DB clear-intr bit in be_open()
        GRE: enable offloads for GRE
        farsync: fix invalid memory accesses in fst_add_one() and fst_init_card()
        igb: do a reset on SR-IOV re-init if device is down
        igb: Workaround for i210 Errata 25: Slow System Clock
        usbnet: smsc95xx: add reset_resume function with reset operation
        dp83640: Always decode received status frames
        r8169: disable L23
        ...
      5615f9f8
    • Takashi Iwai's avatar
      ALSA: hda - Fix broken PM due to incomplete i915 initialization · 4da63c6f
      Takashi Iwai authored
      When the initialization of Intel HDMI controller fails due to missing
      i915 kernel symbols (e.g. HD-audio is built in while i915 is module),
      the driver discontinues the probe.  However, since the probe was done
      asynchronously, the driver object still remains, thus the relevant PM
      ops are still called at suspend/resume. This results in the bad access
      to the incomplete audio card object, eventually leads to Oops or stall
      at PM.
      
      This patch adds the missing checks of chip->init_failed flag at each
      PM callback in order to fix the problem above.
      
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=79561
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      4da63c6f
    • Olivier Sobrie's avatar
      hso: fix deadlock when receiving bursts of data · 8f9818af
      Olivier Sobrie authored
      When the module sends bursts of data, sometimes a deadlock happens in
      the hso driver when the tty buffer doesn't get the chance to be flushed
      quickly enough.
      
      Remove the endless while loop in function put_rxbuf_data() which is
      called by the urb completion handler.
      If there isn't enough room in the tty buffer, discards all the data
      received in the URB.
      
      Cc: David Miller <davem@davemloft.net>
      Cc: David Laight <David.Laight@ACULAB.COM>
      Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
      Cc: Dan Williams <dcbw@redhat.com>
      Cc: Jan Dumon <j.dumon@option.com>
      Signed-off-by: default avatarOlivier Sobrie <olivier@sobrie.be>
      Acked-by: default avatarAlan Cox <alan@linux.intel.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      8f9818af
    • Olivier Sobrie's avatar
      hso: remove unused workqueue · 5c763edf
      Olivier Sobrie authored
      The workqueue "retry_unthrottle_workqueue" is not scheduled anywhere
      in the code. So, remove it.
      Signed-off-by: default avatarOlivier Sobrie <olivier@sobrie.be>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5c763edf
    • Linus Torvalds's avatar
      Merge tag 'firewire-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394 · 1b81e881
      Linus Torvalds authored
      Pull firewire fix from Stefan Richter:
       "The 1394 drivers cannot and are not supposed to be built on platforms
        which don't provide the DMA mapping API (regression since v3.16-rc1
        with CONFIG_COMPILE_TEST=y on some architectures)"
      
      * tag 'firewire-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
        firewire: IEEE 1394 (FireWire) support should depend on HAS_DMA
      1b81e881
    • Linus Torvalds's avatar
      Merge git://git.kvack.org/~bcrl/aio-fixes · 8ec8ba8e
      Linus Torvalds authored
      Pull another aio fix from Ben LaHaise:
       "put_reqs_available() can now be called from within irq context, which
        means that it (and its sibling function get_reqs_available()) now need
        to be irq-safe, not just preempt-safe"
      
      * git://git.kvack.org/~bcrl/aio-fixes:
        aio: protect reqs_available updates from changes in interrupt handlers
      8ec8ba8e
    • Sasha Levin's avatar
      net/l2tp: don't fall back on UDP [get|set]sockopt · 3cf521f7
      Sasha Levin authored
      The l2tp [get|set]sockopt() code has fallen back to the UDP functions
      for socket option levels != SOL_PPPOL2TP since day one, but that has
      never actually worked, since the l2tp socket isn't an inet socket.
      
      As David Miller points out:
      
        "If we wanted this to work, it'd have to look up the tunnel and then
         use tunnel->sk, but I wonder how useful that would be"
      
      Since this can never have worked so nobody could possibly have depended
      on that functionality, just remove the broken code and return -EINVAL.
      Reported-by: default avatarSasha Levin <sasha.levin@oracle.com>
      Acked-by: default avatarJames Chapman <jchapman@katalix.com>
      Acked-by: default avatarDavid Miller <davem@davemloft.net>
      Cc: Phil Turnbull <phil.turnbull@oracle.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      3cf521f7
  6. 14 Jul, 2014 5 commits
    • Christoph Schulz's avatar
      net: ppp: don't call sk_chk_filter twice · 3916a319
      Christoph Schulz authored
      Commit 568f194e ("net: ppp: use
      sk_unattached_filter api") causes sk_chk_filter() to be called twice when
      setting a PPP pass or active filter. This applies to both the generic PPP
      subsystem implemented by drivers/net/ppp/ppp_generic.c and the ISDN PPP
      subsystem implemented by drivers/isdn/i4l/isdn_ppp.c. The first call is from
      within get_filter(). The second one is through the call chain
      
        ppp_ioctl() or isdn_ppp_ioctl()
        --> sk_unattached_filter_create()
            --> __sk_prepare_filter()
                --> sk_chk_filter()
      
      The first call from within get_filter() should be deleted as get_filter() is
      called just before calling sk_unattached_filter_create() later on, which
      eventually calls sk_chk_filter() anyway.
      
      For 3.15.x, this proposed change is a bugfix rather than a pure optimization as
      in that branch, sk_chk_filter() may replace filter codes by other codes which
      are not recognized when executing sk_chk_filter() a second time. So with
      3.15.x, if sk_chk_filter() is called twice, the second invocation may yield
      EINVAL (this depends on the filter codes found in the filter to be set, but
      because the replacement is done for frequently used codes, this is almost
      always the case). The net effect is that setting pass and/or active PPP filters
      does not work anymore, since sk_unattached_filter_create() always returns
      EINVAL due to the second call to sk_chk_filter(), regardless whether the filter
      was originally sane or not.
      Signed-off-by: default avatarChristoph Schulz <develop@kristov.de>
      Acked-by: default avatarDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3916a319
    • Jason Wang's avatar
      mlx4: mark napi id for gro_skb · 32b333fe
      Jason Wang authored
      Napi id was not marked for gro_skb, this will lead rx busy loop won't
      work correctly since they stack never try to call low latency receive
      method because of a zero socket napi id. Fix this by marking napi id
      for gro_skb.
      
      The transaction rate of 1 byte netperf tcp_rr gets about 50% increased
      (from 20531.68 to 30610.88).
      
      Cc: Amir Vadai <amirv@mellanox.com>
      Signed-off-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      32b333fe
    • Nikolay Aleksandrov's avatar
      bonding: fix ad_select module param check · 548d28bd
      Nikolay Aleksandrov authored
      Obvious copy/paste error when I converted the ad_select to the new
      option API. "lacp_rate" there should be "ad_select" so we can get the
      proper value.
      
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@gmail.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: David S. Miller <davem@davemloft.net>
      
      Fixes: 9e5f5eeb ("bonding: convert ad_select to use the new option
      API")
      Reported-by: default avatarKarim Scheik <karim.scheik@prisma-solutions.at>
      Signed-off-by: default avatarNikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      548d28bd
    • Christoph Schulz's avatar
      net: pppoe: use correct channel MTU when using Multilink PPP · a8a3e41c
      Christoph Schulz authored
      The PPP channel MTU is used with Multilink PPP when ppp_mp_explode() (see
      ppp_generic module) tries to determine how big a fragment might be. According
      to RFC 1661, the MTU excludes the 2-byte PPP protocol field, see the
      corresponding comment and code in ppp_mp_explode():
      
      		/*
      		 * hdrlen includes the 2-byte PPP protocol field, but the
      		 * MTU counts only the payload excluding the protocol field.
      		 * (RFC1661 Section 2)
      		 */
      		mtu = pch->chan->mtu - (hdrlen - 2);
      
      However, the pppoe module *does* include the PPP protocol field in the channel
      MTU, which is wrong as it causes the PPP payload to be 1-2 bytes too big under
      certain circumstances (one byte if PPP protocol compression is used, two
      otherwise), causing the generated Ethernet packets to be dropped. So the pppoe
      module has to subtract two bytes from the channel MTU. This error only
      manifests itself when using Multilink PPP, as otherwise the channel MTU is not
      used anywhere.
      
      In the following, I will describe how to reproduce this bug. We configure two
      pppd instances for multilink PPP over two PPPoE links, say eth2 and eth3, with
      a MTU of 1492 bytes for each link and a MRRU of 2976 bytes. (This MRRU is
      computed by adding the two link MTUs and subtracting the MP header twice, which
      is 4 bytes long.) The necessary pppd statements on both sides are "multilink
      mtu 1492 mru 1492 mrru 2976". On the client side, we additionally need "plugin
      rp-pppoe.so eth2" and "plugin rp-pppoe.so eth3", respectively; on the server
      side, we additionally need to start two pppoe-server instances to be able to
      establish two PPPoE sessions, one over eth2 and one over eth3. We set the MTU
      of the PPP network interface to the MRRU (2976) on both sides of the connection
      in order to make use of the higher bandwidth. (If we didn't do that, IP
      fragmentation would kick in, which we want to avoid.)
      
      Now we send a ICMPv4 echo request with a payload of 2948 bytes from client to
      server over the PPP link. This results in the following network packet:
      
         2948 (echo payload)
       +    8 (ICMPv4 header)
       +   20 (IPv4 header)
      ---------------------
         2976 (PPP payload)
      
      These 2976 bytes do not exceed the MTU of the PPP network interface, so the
      IP packet is not fragmented. Now the multilink PPP code in ppp_mp_explode()
      prepends one protocol byte (0x21 for IPv4), making the packet one byte bigger
      than the negotiated MRRU. So this packet would have to be divided in three
      fragments. But this does not happen as each link MTU is assumed to be two bytes
      larger. So this packet is diveded into two fragments only, one of size 1489 and
      one of size 1488. Now we have for that bigger fragment:
      
         1489 (PPP payload)
       +    4 (MP header)
       +    2 (PPP protocol field for the MP payload (0x3d))
       +    6 (PPPoE header)
      --------------------------
         1501 (Ethernet payload)
      
      This packet exceeds the link MTU and is discarded.
      
      If one configures the link MTU on the client side to 1501, one can see the
      discarded Ethernet frames with tcpdump running on the client. A
      
      ping -s 2948 -c 1 192.168.15.254
      
      leads to the smaller fragment that is correctly received on the server side:
      
      (tcpdump -vvvne -i eth3 pppoes and ppp proto 0x3d)
      52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864),
        length 1514: PPPoE  [ses 0x3] MLPPP (0x003d), length 1494: seq 0x000,
        Flags [end], length 1492
      
      and to the bigger fragment that is not received on the server side:
      
      (tcpdump -vvvne -i eth2 pppoes and ppp proto 0x3d)
      52:54:00:70:9e:89 > 52:54:00:5d:6f:b0, ethertype PPPoE S (0x8864),
        length 1515: PPPoE  [ses 0x5] MLPPP (0x003d), length 1495: seq 0x000,
        Flags [begin], length 1493
      
      With the patch below, we correctly obtain three fragments:
      
      52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864),
        length 1514: PPPoE  [ses 0x1] MLPPP (0x003d), length 1494: seq 0x000,
        Flags [begin], length 1492
      52:54:00:70:9e:89 > 52:54:00:5d:6f:b0, ethertype PPPoE S (0x8864),
        length 1514: PPPoE  [ses 0x1] MLPPP (0x003d), length 1494: seq 0x000,
        Flags [none], length 1492
      52:54:00:ad:87:fd > 52:54:00:79:5c:d0, ethertype PPPoE S (0x8864),
        length 27: PPPoE  [ses 0x1] MLPPP (0x003d), length 7: seq 0x000,
        Flags [end], length 5
      
      And the ICMPv4 echo request is successfully received at the server side:
      
      IP (tos 0x0, ttl 64, id 21925, offset 0, flags [DF], proto ICMP (1),
        length 2976)
          192.168.222.2 > 192.168.15.254: ICMP echo request, id 30530, seq 0,
            length 2956
      
      The bug was introduced in commit c9aa6895
      ("[PPPOE]: Advertise PPPoE MTU") from the very beginning. This patch applies
      to 3.10 upwards but the fix can be applied (with minor modifications) to
      kernels as old as 2.6.32.
      Signed-off-by: default avatarChristoph Schulz <develop@kristov.de>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      a8a3e41c
    • Mathias Krause's avatar
      neigh: sysctl - simplify address calculation of gc_* variables · 9ecf07a1
      Mathias Krause authored
      The code in neigh_sysctl_register() relies on a specific layout of
      struct neigh_table, namely that the 'gc_*' variables are directly
      following the 'parms' member in a specific order. The code, though,
      expresses this in the most ugly way.
      
      Get rid of the ugly casts and use the 'tbl' pointer to get a handle to
      the table. This way we can refer to the 'gc_*' variables directly.
      
      Similarly seen in the grsecurity patch, written by Brad Spengler.
      Signed-off-by: default avatarMathias Krause <minipli@googlemail.com>
      Cc: Brad Spengler <spender@grsecurity.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9ecf07a1