1. 21 Jan, 2015 17 commits
    • netfilter: conntrack: disable generic tracking for known protocols · 25bde5dd
      Florian Westphal authored
      commit db29a950 upstream.
      
      Given following iptables ruleset:
      
      -P FORWARD DROP
      -A FORWARD -m sctp --dport 9 -j ACCEPT
      -A FORWARD -p tcp --dport 80 -j ACCEPT
      -A FORWARD -p tcp -m conntrack -m state ESTABLISHED,RELATED -j ACCEPT
      
      One would assume that this allows SCTP on port 9 and TCP on port 80.
      Unfortunately, if the SCTP conntrack module is not loaded, this allows
      *all* SCTP communication to pass through, i.e. -p sctp -j ACCEPT,
      which we consider a security issue.
      
      This is because on the first SCTP packet on port 9, we create a dummy
      "generic l4" conntrack entry without any port information (since
      conntrack doesn't know how to extract this information).
      
      All subsequent packets that are unknown will then be in established
      state since they will fallback to proto_generic and will match the
      'generic' entry.
      
      Our originally proposed version [1] completely disabled generic protocol
      tracking, but Jozsef suggested not tracking protocols for which a more
      suitable helper is available. We therefore mitigate the issue only for
      the known in-tree ct protocol helpers, so that at least NAT and direction
      information is still preserved for the others (see the sketch below).
      
       [1] http://www.spinics.net/lists/netfilter-devel/msg33430.html
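      
      A minimal sketch of that mitigation idea (illustrative helper name, not
      the exact upstream hunk): the generic L4 tracker simply declines any
      protocol that has a dedicated in-tree conntrack helper, so no catch-all
      "generic" entry is created for e.g. SCTP when its module is not loaded.
      
      static bool generic_should_track(u8 l4proto)
      {
      	switch (l4proto) {
      	case IPPROTO_TCP:
      	case IPPROTO_UDP:
      	case IPPROTO_UDPLITE:
      	case IPPROTO_DCCP:
      	case IPPROTO_SCTP:
      		return false;	/* a dedicated in-tree helper exists */
      	default:
      		return true;	/* keep NAT/direction info for the rest */
      	}
      }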
      
      Joint work with Daniel Borkmann.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      25bde5dd
    • macvlan: unregister net device when netdev_upper_dev_link() fails · 1bfbf30d
      Cong Wang authored
      commit da37705c upstream.
      
      rtnl_newlink() doesn't unregister it for us on failure.
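      
      A minimal sketch of the error path this implies (illustrative, not the
      exact macvlan hunk): if linking the new device above the lower device
      fails after register_netdevice() has already succeeded, the newlink
      handler must unregister the device itself.
      
      	err = netdev_upper_dev_link(lowerdev, dev);
      	if (err) {
      		unregister_netdevice(dev);	/* rtnl_newlink() will not do this */
      		return err;
      	}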
      
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Cong Wang <cwang@twopensource.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: Zefan Li <lizefan@huawei.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      1bfbf30d
    • net/core: Handle csum for CHECKSUM_COMPLETE VXLAN forwarding · 4fdd841d
      Jay Vosburgh authored
      [ Upstream commit 2c26d34b ]
      
      When using VXLAN tunnels and a sky2 device, I have experienced
      checksum failures of the following type:
      
      [ 4297.761899] eth0: hw csum failure
      [...]
      [ 4297.765223] Call Trace:
      [ 4297.765224]  <IRQ>  [<ffffffff8172f026>] dump_stack+0x46/0x58
      [ 4297.765235]  [<ffffffff8162ba52>] netdev_rx_csum_fault+0x42/0x50
      [ 4297.765238]  [<ffffffff8161c1a0>] ? skb_push+0x40/0x40
      [ 4297.765240]  [<ffffffff8162325c>] __skb_checksum_complete+0xbc/0xd0
      [ 4297.765243]  [<ffffffff8168c602>] tcp_v4_rcv+0x2e2/0x950
      [ 4297.765246]  [<ffffffff81666ca0>] ? ip_rcv_finish+0x360/0x360
      
      	These are reliably reproduced in a network topology of:
      
      container:eth0 == host(OVS VXLAN on VLAN) == bond0 == eth0 (sky2) -> switch
      
      	When VXLAN encapsulated traffic is received from a similarly
      configured peer, the above warning is generated in the receive
      processing of the encapsulated packet.  Note that the warning is
      associated with the container eth0.
      
              The skbs from sky2 have ip_summed set to CHECKSUM_COMPLETE, and
      because the packet is an encapsulated Ethernet frame, the checksum
      generated by the hardware includes the inner protocol and Ethernet
      headers.
      
      	The receive code is careful to update the skb->csum, except in
      __dev_forward_skb, as called by dev_forward_skb.  __dev_forward_skb
      calls eth_type_trans, which in turn calls skb_pull_inline(skb, ETH_HLEN)
      to skip over the Ethernet header, but does not update skb->csum when
      doing so.
      
      	This patch resolves the problem by adding a call to
      skb_postpull_rcsum to update the skb->csum after the call to
      eth_type_trans.
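      
      A minimal sketch of that pattern (illustrative, not the exact
      __dev_forward_skb() hunk): fold the just-pulled Ethernet header back out
      of a CHECKSUM_COMPLETE skb->csum right after eth_type_trans().
      
      	skb->protocol = eth_type_trans(skb, dev);	/* pulls ETH_HLEN bytes */
      	skb_postpull_rcsum(skb, eth_hdr(skb), ETH_HLEN);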
      Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      4fdd841d
    • enic: fix rx skb checksum · aad9a4c0
      Govindarajulu Varadarajan authored
      [ Upstream commit 17e96834 ]
      
      The hardware always provides the complement of the IP pseudo checksum,
      whereas the stack expects the checksum of the whole packet, without the
      pseudo checksum, when CHECKSUM_COMPLETE is set.
      
      This causes checksum errors in netfilter and Open vSwitch.
      
      kernel: qg-19546f09-f2: hw csum failure
      kernel: CPU: 9 PID: 0 Comm: swapper/9 Tainted: GF          O--------------   3.10.0-123.8.1.el7.x86_64 #1
      kernel: Hardware name: Cisco Systems Inc UCSB-B200-M3/UCSB-B200-M3, BIOS B200M3.2.2.3.0.080820141339 08/08/2014
      kernel: ffff881218f40000 df68243feb35e3a8 ffff881237a43ab8 ffffffff815e237b
      kernel: ffff881237a43ad0 ffffffff814cd4ca ffff8829ec71eb00 ffff881237a43af0
      kernel: ffffffff814c6232 0000000000000286 ffff8829ec71eb00 ffff881237a43b00
      kernel: Call Trace:
      kernel: <IRQ>  [<ffffffff815e237b>] dump_stack+0x19/0x1b
      kernel: [<ffffffff814cd4ca>] netdev_rx_csum_fault+0x3a/0x40
      kernel: [<ffffffff814c6232>] __skb_checksum_complete_head+0x62/0x70
      kernel: [<ffffffff814c6251>] __skb_checksum_complete+0x11/0x20
      kernel: [<ffffffff8155a20c>] nf_ip_checksum+0xcc/0x100
      kernel: [<ffffffffa049edc7>] icmp_error+0x1f7/0x35c [nf_conntrack_ipv4]
      kernel: [<ffffffff814cf419>] ? netif_rx+0xb9/0x1d0
      kernel: [<ffffffffa040eb7b>] ? internal_dev_recv+0xdb/0x130 [openvswitch]
      kernel: [<ffffffffa04c8330>] nf_conntrack_in+0xf0/0xa80 [nf_conntrack]
      kernel: [<ffffffff81509380>] ? inet_del_offload+0x40/0x40
      kernel: [<ffffffffa049e302>] ipv4_conntrack_in+0x22/0x30 [nf_conntrack_ipv4]
      kernel: [<ffffffff815005ca>] nf_iterate+0xaa/0xc0
      kernel: [<ffffffff81509380>] ? inet_del_offload+0x40/0x40
      kernel: [<ffffffff81500664>] nf_hook_slow+0x84/0x140
      kernel: [<ffffffff81509380>] ? inet_del_offload+0x40/0x40
      kernel: [<ffffffff81509dd4>] ip_rcv+0x344/0x380
      
      The hardware verifies the IP and TCP/UDP header checksums but does not
      provide a payload checksum, so use CHECKSUM_UNNECESSARY instead, and set
      it only if the packet is a valid IP TCP/UDP packet (see the sketch below).
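      
      A minimal sketch of that approach (hypothetical descriptor flag names,
      for illustration only): report CHECKSUM_UNNECESSARY, and only when the
      hardware indicates that both the IP and TCP/UDP header checks passed.
      
      	if (ipv4_csum_ok && tcp_udp_csum_ok)
      		skb->ip_summed = CHECKSUM_UNNECESSARY;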
      
      Cc: Jiri Benc <jbenc@redhat.com>
      Cc: Stefan Assmann <sassmann@redhat.com>
      Reported-by: Sunil Choudhary <schoudha@redhat.com>
      Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
      Reviewed-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      aad9a4c0
    • team: avoid possible underflow of count_pending value for notify_peers and mcast_rejoin · 57bdf691
      Jiri Pirko authored
      [ Upstream commit b0d11b42 ]
      
      This patch fixes a race condition that may set count_pending to -1,
      which results in an unwanted big bulk of ARP messages (in the
      "notify peers" case).
      
      Consider the following scenario:
      
      count_pending == 2
         CPU0                                           CPU1
      					team_notify_peers_work
      					  atomic_dec_and_test (dec count_pending to 1)
      					  schedule_delayed_work
       team_notify_peers
         atomic_add (adding 1 to count_pending)
      					team_notify_peers_work
      					  atomic_dec_and_test (dec count_pending to 1)
      					  schedule_delayed_work
      					team_notify_peers_work
      					  atomic_dec_and_test (dec count_pending to 0)
         schedule_delayed_work
      					team_notify_peers_work
      					  atomic_dec_and_test (dec count_pending to -1)
      
      Fix this race by using atomic_dec_if_positive(), which prevents
      count_pending from ever going below zero (see the sketch below).
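      
      A minimal sketch of the fixed work function (illustrative names, not the
      exact team driver code):
      
      	int remaining = atomic_dec_if_positive(&count_pending);
      
      	if (remaining < 0)
      		return;				/* nothing pending; never go below 0 */
      	send_notification();			/* e.g. the NETDEV_NOTIFY_PEERS event */
      	if (remaining > 0)
      		schedule_delayed_work(&notify_dw, interval);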
      
      Fixes: fc423ff0 ("team: add peer notification")
      Fixes: 492b200e  ("team: add support for sending multicast rejoins")
      Signed-off-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      57bdf691
    • alx: fix alx_poll() · 644bfce0
      Eric Dumazet authored
      [ Upstream commit 7a05dc64 ]
      
      Commit d75b1ade ("net: less interrupt masking in NAPI") uncovered
      wrong alx_poll() behavior.
      
      A NAPI poll() handler is supposed to return exactly the budget when/if
      napi_complete() has not been called.
      
      It is also supposed to return the number of frames that were received, so
      that netdev_budget can have a meaning.
      
      Also, in case of TX pressure, we still have to dequeue received
      packets: alx_clean_rx_irq() has to be called even if
      alx_clean_tx_irq(alx) returns false, otherwise the device is effectively
      half duplex (see the sketch below).
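      
      A minimal sketch of a poll handler that honours this contract
      (illustrative names, not the alx driver code):
      
      	static int example_poll(struct napi_struct *napi, int budget)
      	{
      		struct example_priv *np = container_of(napi, struct example_priv, napi);
      		bool tx_done = example_clean_tx(np);
      		int rx_done = example_clean_rx(np, budget);	/* always drain RX */
      
      		if (!tx_done || rx_done == budget)
      			return budget;		/* napi_complete() was not called */
      
      		napi_complete(napi);
      		example_enable_irq(np);
      		return rx_done;			/* number of frames received */
      	}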
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Fixes: d75b1ade ("net: less interrupt masking in NAPI")
      Reported-by: Oded Gabbay <oded.gabbay@amd.com>
      Bisected-by: Oded Gabbay <oded.gabbay@amd.com>
      Tested-by: Oded Gabbay <oded.gabbay@amd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      644bfce0
    • tcp: Do not apply TSO segment limit to non-TSO packets · a32cd17b
      Herbert Xu authored
      [ Upstream commit 843925f3 ]
      
      Thomas Jarosch reported IPsec TCP stalls when a PMTU event occurs.
      
      In fact the problem was completely unrelated to IPsec.  The bug is
      also reproducible if you just disable TSO/GSO.
      
      The problem is that when the MSS goes down, existing queued packets
      on the TX queue that have not been transmitted yet all look like
      TSO packets and get treated as such.
      
      This then triggers a bug where tcp_mss_split_point tells us to
      generate a zero-sized packet on the TX queue.  Once that happens
      we're screwed because the zero-sized packet can never be removed
      by ACKs.
      
      Fixes: 1485348d ("tcp: Apply device TSO segment limit earlier")
      Reported-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      a32cd17b
    • net: Reset secmark when scrubbing packet · 375e61d0
      Thomas Graf authored
      [ Upstream commit b8fb4e06 ]
      
      skb_scrub_packet() is called when a packet switches context, such as
      between underlay and overlay, between namespaces, or between L3 subnets.
      
      While we already scrub the packet mark, connection tracking entry,
      and cached destination, the security mark/context is left intact.
      
      It seems wrong to inherit the security context of a packet when going
      from overlay to underlay or across forwarding paths.
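      
      A minimal sketch of the addition (illustrative, not the full
      skb_scrub_packet() body): alongside the mark, conntrack and dst state,
      also drop the LSM security mark.
      
      	skb_init_secmark(skb);	/* skb->secmark = 0 under CONFIG_NETWORK_SECMARK */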
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Acked-by: Flavio Leitner <fbl@sysclose.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      375e61d0
    • net: Fix stacked vlan offload features computation · 11a78d0f
      Toshiaki Makita authored
      [ Upstream commit 796f2da8 ]
      
      When vlan tags are stacked, it is very likely that the outer tag is stored
      in skb->vlan_tci and skb->protocol shows the inner tag's vlan_proto.
      Currently netif_skb_features() first looks at skb->protocol even if the
      outer tag is in vlan_tci, and thus it incorrectly retrieves the protocol
      encapsulated by the inner vlan instead of the inner vlan protocol itself.
      This allows GSO packets to be passed to the hardware, and they end up
      being corrupted (see the sketch below).
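      
      A minimal sketch of the distinction (illustrative helper, not the exact
      netif_skb_features() hunk): a frame must be treated as VLAN tagged when a
      tag sits in vlan_tci, not only when skb->protocol says so.
      
      	static bool skb_is_vlan_tagged(const struct sk_buff *skb)
      	{
      		return vlan_tx_tag_present(skb) ||		/* outer tag in vlan_tci */
      		       skb->protocol == htons(ETH_P_8021Q) ||
      		       skb->protocol == htons(ETH_P_8021AD);
      	}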
      
      Fixes: 58e998c6 ("offloading: Force software GSO for multiple vlan tags.")
      Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      11a78d0f
    • batman-adv: avoid NULL dereferences and fix if check · 030b8e65
      Antonio Quartulli authored
      [ Upstream commit 0d164491 ]
      
      Gateways having bandwidth_down equal to zero are not accepted
      at all and so are never added to the Gateway list.
      For this reason, checking the bandwidth_down member in
      batadv_gw_out_of_range() is useless.
      
      This is probably a copy/paste error and this check was supposed
      to be "!gw_node" only. Moreover, the way the check is written
      now may also lead to a NULL dereference.
      
      Fix this by rewriting the if-condition properly.
      
      Introduced by 414254e3
      ("batman-adv: tvlv - gateway download/upload bandwidth container")
      Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
      Reported-by: David Binderman <dcb314@hotmail.com>
      Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      030b8e65
    • batman-adv: Unify fragment size calculation · 01fadf3a
      Sven Eckelmann authored
      [ Upstream commit 0402e444 ]
      
      The fragmentation code was replaced in 610bfc6b
      ("batman-adv: Receive fragmented packets and merge") by an implementation which
      can handle up to 16 fragments of a packet. The packet is prepared for the split
      in fragments by the function batadv_frag_send_packet and the actual split is
      done by batadv_frag_create.
      
      Both functions calculate the size of a fragment themselves, but their
      calculations differ because batadv_frag_send_packet also subtracts
      ETH_HLEN. Therefore, the check in batadv_frag_send_packet ("can a full
      fragment be created?") may return true even when batadv_frag_create
      cannot create a full fragment.
      
      The function batadv_frag_create doesn't check the size of the skb before
      splitting it and therefore might try to create a larger fragment than the
      remaining buffer. This creates an integer underflow and an invalid len is given
      to skb_split.
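      
      A minimal sketch of the unified calculation (illustrative, not the exact
      batman-adv code): compute the per-fragment payload size in one place so
      that the "does a full fragment fit?" check and the actual split performed
      by batadv_frag_create agree.
      
      	unsigned int max_frag_payload = mtu - sizeof(struct batadv_frag_packet) - ETH_HLEN;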
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      01fadf3a
    • tg3: tg3_disable_ints using uninitialized mailbox value to disable interrupts · 49f195ad
      Prashant Sreedharan authored
      [ Upstream commit 05b0aa57 ]
      
      During driver load in tg3_init_one, if the driver detects DMA activity
      before initializing the chip, tg3_halt is called. As part of tg3_halt,
      interrupts are disabled using the routine tg3_disable_ints. This routine
      was using a mailbox value which had not been initialized (default value
      0). As a result, the driver was writing 0x00000001 to PCI config space
      register 0, which is the vendor ID / device ID.
      
      This driver bug was exposed by commit a7877b17a667 (PCI: Check only
      the Vendor ID to identify Configuration Request Retry). The issue is only
      seen on older generation chipsets like the 5722, because a config space
      write to offset 0 from the driver is possible there; newer generation
      chips ignore writes to offset 0. Also, without commit a7877b17a667, the
      bootcode would reprogram the vendor ID/device ID when a GRC reset is
      issued on these older chips, which is why this bug was masked earlier.
      
      Fixed by initializing the interrupt mailbox registers before calling tg3_halt.
      
      Please queue for -stable.
      Reported-by: Nils Holland <nholland@tisys.org>
      Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: Prashant Sreedharan <prashant@broadcom.com>
      Signed-off-by: Michael Chan <mchan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      49f195ad
    • in6: fix conflict with glibc · 8f6ad7c4
      stephen hemminger authored
      [ Upstream commit 6d08acd2 ]
      
      Resolve conflicts between the glibc definitions of IPv6 socket options
      and those defined in the Linux headers. It looks like earlier efforts to
      solve this did not cover all the definitions.
      
      This resolves warnings during the iproute2 build.
      Please consider for stable as well.
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      8f6ad7c4
    • netlink: Don't reorder loads/stores before marking mmap netlink frame as available · 3eeeea50
      Thomas Graf authored
      [ Upstream commit a18e6a18 ]
      
      Each mmap Netlink frame contains a status field which indicates
      whether the frame is unused, reserved, contains data or needs to
      be skipped. Loads and stores may not be reordered and must
      complete before the status field is changed, at which point another CPU
      might pick up the frame for use. Use an smp_mb() to cover the needs of
      both types of callers to netlink_set_status(): callers which have been
      reading data from the frame, and callers which have been filling or
      releasing, and thus writing to, the frame.
      
      - Example code path requiring a smp_rmb():
        memcpy(skb->data, (void *)hdr + NL_MMAP_HDRLEN, hdr->nm_len);
        netlink_set_status(hdr, NL_MMAP_STATUS_UNUSED);
      
      - Example code path requiring a smp_wmb():
        hdr->nm_uid	= from_kuid(sk_user_ns(sk), NETLINK_CB(skb).creds.uid);
        hdr->nm_gid	= from_kgid(sk_user_ns(sk), NETLINK_CB(skb).creds.gid);
        netlink_frame_flush_dcache(hdr);
        netlink_set_status(hdr, NL_MMAP_STATUS_VALID);
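      
      A minimal sketch of the barrier placement this implies (illustrative, not
      necessarily the exact netlink_set_status() body):
      
      	static void set_frame_status(struct nl_mmap_hdr *hdr, unsigned int status)
      	{
      		smp_mb();			/* finish all frame reads/writes first */
      		hdr->nm_status = status;	/* only then publish the new state */
      	}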
      
      Fixes: f9c228 ("netlink: implement memory mapped recvmsg()")
      Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      3eeeea50
    • netlink: Always copy on mmap TX. · 357d462e
      David Miller authored
      [ Upstream commit 4682a035 ]
      
      Checking the file f_count and the nlk->mapped count is not completely
      sufficient to prevent the mmap'd area contents from changing from
      under us during netlink mmap sendmsg() operations.
      
      Be careful to sample the header's length field only once, because this
      could change from under us as well.
      
      Fixes: 5fd96123 ("netlink: implement memory mapped sendmsg()")
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Daniel Borkmann <dborkman@redhat.com>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      357d462e
    • gre: fix the inner mac header in nbma tunnel xmit path · 6c7142a1
      Timo Teräs authored
      [ Upstream commit 8a0033a9 ]
      
      NBMA GRE tunnels temporarily push a GRE header that contains the
      per-packet NBMA destination onto the skb via header ops early in the xmit
      path. It is later pulled before the real GRE header is constructed.
      
      The inner mac header was thus set differently in the NBMA case: the GRE
      header has been pushed by the neighbor layer, and the mac header points
      to the beginning of the temporary GRE header (set by dev_queue_xmit).
      
      Now that the offloads expect the mac header to point to the GRE payload,
      fix the xmit path to:
       - first pull the temporary GRE header away
       - then reset the mac header to point to the GRE payload
      
      This fixes TSO so that it works again with NBMA tunnels (see the sketch
      below).
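      
      A minimal sketch of those two steps (illustrative, not the exact ip_gre.c
      hunk; tunnel is the struct ip_tunnel for the device):
      
      	skb_pull(skb, tunnel->hlen + sizeof(struct iphdr));	/* drop the temporary GRE header */
      	skb_reset_mac_header(skb);				/* mac header now points at the GRE payload */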
      
      Fixes: 14051f04 ("gre: Use inner mac length when computing tunnel length")
      Signed-off-by: Timo Teräs <timo.teras@iki.fi>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Alexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      6c7142a1
    • Linux 3.13.11-ckt14 · 38e21c0c
      Kamal Mostafa authored
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      38e21c0c
  2. 14 Jan, 2015 5 commits
  3. 13 Jan, 2015 13 commits
    • x86_64, vdso: Fix the vdso address randomization algorithm · ff845a15
      Andy Lutomirski authored
      commit 394f56fe upstream.
      
      The theory behind vdso randomization is that it's mapped at a random
      offset above the top of the stack.  To avoid wasting a page of
      memory for an extra page table, the vdso isn't supposed to extend
      past the lowest PMD into which it can fit.  Other than that, the
      address should be a uniformly distributed address that meets all of
      the alignment requirements.
      
      The current algorithm is buggy: the vdso has about a 50% probability
      of being at the very end of a PMD.  The current algorithm also has a
      decent chance of failing outright due to incorrect handling of the
      case where the top of the stack is near the top of its PMD.
      
      This fixes the implementation.  The paxtest estimate of vdso
      "randomisation" improves from 11 bits to 18 bits.  (Disclaimer: I
      don't know what the paxtest code is actually calculating.)
      
      It's worth noting that this algorithm is inherently biased: the vdso
      is more likely to end up near the end of its PMD than near the
      beginning.  Ideally we would either nix the PMD sharing requirement
      or jointly randomize the vdso and the stack to reduce the bias.
      
      In the meantime, this is a considerable improvement with basically
      no risk of compatibility issues, since the allowed outputs of the
      algorithm are unchanged.
      
      As an easy test, doing this:
      
      for i in `seq 10000`
        do grep -P vdso /proc/self/maps |cut -d- -f1
      done |sort |uniq -d
      
      used to produce lots of output (1445 lines on my most recent run).
      A tiny subset looks like this:
      
      7fffdfffe000
      7fffe01fe000
      7fffe05fe000
      7fffe07fe000
      7fffe09fe000
      7fffe0bfe000
      7fffe0dfe000
      
      Note the suspicious fe000 endings.  With the fix, I get a much more
      palatable 76 repeated addresses.
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      ff845a15
    • isofs: Fix unchecked printing of ER records · 24c7fcc3
      Jan Kara authored
      commit 4e202462 upstream.
      
      We didn't check the length of Rock Ridge ER records before printing them.
      Thus a corrupted isofs image can cause us to access and print some memory
      behind the buffer, with obvious consequences.
      Reported-and-tested-by: Carl Henrik Lunde <chlunde@ping.uio.no>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      24c7fcc3
    • KEYS: close race between key lookup and freeing · 55036ae4
      Sasha Levin authored
      commit a3a87844 upstream.
      
      When a key is being garbage collected, its key->user would get put before
      the ->destroy() callback is called, where the key is removed from its
      respective tracking structures.
      
      This leaves the key hanging in a semi-invalid state, which leaves a window
      open for a different task to try to access key->user. An example is
      find_keyring_by_name(), which would dereference key->user for a key that
      is in the process of being garbage collected (where key->user was freed
      but ->destroy() wasn't called yet, so the key is still present in the
      linked list).
      
      This would cause either a panic or memory corruption.
      
      Fixes CVE-2014-9529.
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      55036ae4
    • batman-adv: Calculate extra tail size based on queued fragments · e6e75eaa
      Sven Eckelmann authored
      commit 5b6698b0 upstream.
      
      The fragmentation code was replaced in 610bfc6b
      ("batman-adv: Receive fragmented packets and merge"). The new code provided a
      mostly unused parameter skb for the merging function. It is used inside
      the function to calculate the additional skb tailroom needed. But instead
      of increasing its own tailroom, it only increases the tailroom of the
      first queued skb. This is not correct in some situations, because the
      first queued entry can be different from the parameter.
      
      An observed problem was:
      
      1. packet with size 104, total_size 1464, fragno 1 was received
         - packet is queued
      2. packet with size 1400, total_size 1464, fragno 0 was received
         - packet is queued at the end of the list
      3. enough data was received and can be given to the merge function
         (1464 == (1400 - 20) + (104 - 20))
         - merge functions gets 1400 byte large packet as skb argument
      4. merge function gets first entry in queue (104 byte)
         - stored as skb_out
      5. merge function calculates the required extra tail as total_size - skb->len
         - pskb_expand_head tail of skb_out with 64 bytes
      6. merge function tries to squeeze the extra 1380 bytes from the second queued
         skb (1400 byte aka skb parameter) in the 64 extra tail bytes of skb_out
      
      Instead calculate the extra required tail bytes for skb_out also using skb_out
      instead of using the parameter skb. The skb parameter is only used to get the
      total_size from the last received packet. This is also the total_size used to
      decide that all fragments were received.
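      
      A minimal sketch of the corrected call (illustrative variable names):
      
      	/* grow the merge target by the room *it* still needs */
      	if (pskb_expand_head(skb_out, 0, total_size - skb_out->len, GFP_ATOMIC) < 0)
      		goto err;	/* merge failed; free the queued fragments */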
      Reported-by: Philipp Psurek <philipp.psurek@gmail.com>
      Signed-off-by: Sven Eckelmann <sven@narfation.org>
      Acked-by: Martin Hundebøll <martin@hundeboll.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      e6e75eaa
    • isofs: Fix infinite looping over CE entries · f5034d91
      Jan Kara authored
      commit f54e18f1 upstream.
      
      Rock Ridge extensions define so-called Continuation Entries (CE), which
      specify where further space with Rock Ridge data is located. A corrupted
      isofs image can contain an arbitrarily long chain of these, including one
      containing a loop, causing the kernel to end up in an infinite loop when
      traversing these entries.
      
      Limit the traversal to 32 entries, which should be more than enough space
      to store all the Rock Ridge data (see the sketch below).
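      
      A minimal sketch of such a bound (hypothetical helper names, for
      illustration only):
      
      	#define RR_CE_LIMIT 32
      	int ce_count = 0;
      
      	while (rr_have_continuation_entry(rs)) {
      		if (++ce_count > RR_CE_LIMIT)
      			return -EIO;	/* corrupted image: CE chain too long or looping */
      		rr_follow_continuation_entry(rs);
      	}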
      Reported-by: P J P <ppandit@redhat.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      f5034d91
    • x86_64, switch_to(): Load TLS descriptors before switching DS and ES · 39f1a2d2
      Andy Lutomirski authored
      commit f647d7c1 upstream.
      
      Otherwise, if buggy user code points DS or ES into the TLS
      array, they would be corrupted after a context switch.
      
      This also significantly improves the comments and documents some
      gotchas in the code.
      
      Before this patch, both tests below failed.  With this
      patch, the es test passes, although the gsbase test still fails.
      
       ----- begin es test -----
      
      /*
       * Copyright (c) 2014 Andy Lutomirski
       * GPL v2
       */
      
      /* headers needed to build the test stand-alone */
      #include <stdio.h>
      #include <err.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      #include <asm/ldt.h>
      
      static unsigned short GDT3(int idx)
      {
      	return (idx << 3) | 3;
      }
      
      static int create_tls(int idx, unsigned int base)
      {
      	struct user_desc desc = {
      		.entry_number    = idx,
      		.base_addr       = base,
      		.limit           = 0xfffff,
      		.seg_32bit       = 1,
      		.contents        = 0, /* Data, grow-up */
      		.read_exec_only  = 0,
      		.limit_in_pages  = 1,
      		.seg_not_present = 0,
      		.useable         = 0,
      	};
      
      	if (syscall(SYS_set_thread_area, &desc) != 0)
      		err(1, "set_thread_area");
      
      	return desc.entry_number;
      }
      
      int main()
      {
      	int idx = create_tls(-1, 0);
      	printf("Allocated GDT index %d\n", idx);
      
      	unsigned short orig_es;
      	asm volatile ("mov %%es,%0" : "=rm" (orig_es));
      
      	int errors = 0;
      	int total = 1000;
      	for (int i = 0; i < total; i++) {
      		asm volatile ("mov %0,%%es" : : "rm" (GDT3(idx)));
      		usleep(100);
      
      		unsigned short es;
      		asm volatile ("mov %%es,%0" : "=rm" (es));
      		asm volatile ("mov %0,%%es" : : "rm" (orig_es));
      		if (es != GDT3(idx)) {
      			if (errors == 0)
      				printf("[FAIL]\tES changed from 0x%hx to 0x%hx\n",
      				       GDT3(idx), es);
      			errors++;
      		}
      	}
      
      	if (errors) {
      		printf("[FAIL]\tES was corrupted %d/%d times\n", errors, total);
      		return 1;
      	} else {
      		printf("[OK]\tES was preserved\n");
      		return 0;
      	}
      }
      
       ----- end es test -----
      
       ----- begin gsbase test -----
      
      /*
       * gsbase.c, a gsbase test
       * Copyright (c) 2014 Andy Lutomirski
       * GPL v2
       */
      
      /* headers needed to build the test stand-alone */
      #include <stdio.h>
      #include <err.h>
      #include <unistd.h>
      #include <sys/mman.h>
      #include <sys/syscall.h>
      #include <asm/prctl.h>
      
      static unsigned char *testptr, *testptr2;
      
      static unsigned char read_gs_testvals(void)
      {
      	unsigned char ret;
      	asm volatile ("movb %%gs:%1, %0" : "=r" (ret) : "m" (*testptr));
      	return ret;
      }
      
      int main()
      {
      	int errors = 0;
      
      	testptr = mmap((void *)0x200000000UL, 1, PROT_READ | PROT_WRITE,
      		       MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, -1, 0);
      	if (testptr == MAP_FAILED)
      		err(1, "mmap");
      
      	testptr2 = mmap((void *)0x300000000UL, 1, PROT_READ | PROT_WRITE,
      		       MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, -1, 0);
      	if (testptr2 == MAP_FAILED)
      		err(1, "mmap");
      
      	*testptr = 0;
      	*testptr2 = 1;
      
      	if (syscall(SYS_arch_prctl, ARCH_SET_GS,
      		    (unsigned long)testptr2 - (unsigned long)testptr) != 0)
      		err(1, "ARCH_SET_GS");
      
      	usleep(100);
      
      	if (read_gs_testvals() == 1) {
      		printf("[OK]\tARCH_SET_GS worked\n");
      	} else {
      		printf("[FAIL]\tARCH_SET_GS failed\n");
      		errors++;
      	}
      
      	asm volatile ("mov %0,%%gs" : : "r" (0));
      
      	if (read_gs_testvals() == 0) {
      		printf("[OK]\tWriting 0 to gs worked\n");
      	} else {
      		printf("[FAIL]\tWriting 0 to gs failed\n");
      		errors++;
      	}
      
      	usleep(100);
      
      	if (read_gs_testvals() == 0) {
      		printf("[OK]\tgsbase is still zero\n");
      	} else {
      		printf("[FAIL]\tgsbase was corrupted\n");
      		errors++;
      	}
      
      	return errors == 0 ? 0 : 1;
      }
      
       ----- end gsbase test -----
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/509d27c9fec78217691c3dad91cec87e1006b34a.1418075657.git.luto@amacapital.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      39f1a2d2
    • userns: Only allow the creator of the userns unprivileged mappings · be2aec30
      Eric W. Biederman authored
      commit f95d7918 upstream.
      
      If you did not create the user namespace and are allowed
      to write to uid_map or gid_map, you should already have the necessary
      privilege in the parent user namespace to establish any mapping
      you want, so this will not affect userspace in practice.
      
      Limiting unprivileged uid mapping establishment to the creator of the
      user namespace makes it easier to verify all credentials obtained with
      the uid mapping can be obtained without the uid mapping without
      privilege.
      
      Limiting unprivileged gid mapping establishment (which is temporarily
      absent) to the creator of the user namespace also ensures that the
      combination of uid and gid can already be obtained without privilege.
      
      This is part of the fix for CVE-2014-8989.
      Reviewed-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      be2aec30
    • userns: Document what the invariant required for safe unprivileged mappings. · be9edd8b
      Eric W. Biederman authored
      commit 0542f17b upstream.
      
      The rule is simple.  Don't allow anything that wouldn't be allowed
      without unprivileged mappings.
      
      It was previously overlooked that establishing gid mappings would
      allow dropping groups and potentially gaining permission to files and
      directories that had lesser permissions for a specific group than for
      all other users.
      
      This is the rule needed to fix CVE-2014-8989 and prevent any other
      security issues with new_idmap_permitted.
      
      The reason for this rule is that the unix permission model is old and
      there are programs out there somewhere that take advantage of every
      little corner of it.  So allowing a uid or gid mapping to be
      established without privilege that would allow anything that would not
      be allowed without that mapping will result in expectations from some
      code somewhere being violated.  Violated expectations about the
      behavior of the OS is a long way of saying "a security issue".
      Signed-off-by: default avatar"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: default avatarKamal Mostafa <kamal@canonical.com>
      be9edd8b
    • userns: Check euid no fsuid when establishing an unprivileged uid mapping · 65007036
      Eric W. Biederman authored
      commit 80dd00a2 upstream.
      
      setresuid allows the euid to be set to any of uid, euid, suid, and
      fsuid.  Therefore it is safe to allow an unprivileged user to map
      their euid and use CAP_SETUID, privileged, with exactly that uid,
      as no new credentials can be obtained.
      
      I cannot find a combination of existing system calls that allows setting
      uid, euid, suid, and fsuid from the fsuid, making the previous use
      of fsuid for allowing unprivileged mappings a bug.
      
      This is part of a fix for CVE-2014-8989.
      Reviewed-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      65007036
    • x86/tls: Validate TLS entries to protect espfix · eb8b9652
      Andy Lutomirski authored
      commit 41bdc785 upstream.
      
      Installing a 16-bit RW data segment into the GDT defeats espfix.
      AFAICT this will not affect glibc, Wine, or dosemu at all.
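      
      A minimal sketch of the kind of check this implies (illustrative, not the
      exact set_thread_area() validation):
      
      	static bool tls_entry_okay(const struct user_desc *info)
      	{
      		/* an empty slot is fine; anything else must be a 32-bit segment */
      		return LDT_empty(info) || info->seg_32bit;
      	}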
      Signed-off-by: Andy Lutomirski <luto@amacapital.net>
      Acked-by: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: security@kernel.org <security@kernel.org>
      Cc: Willy Tarreau <w@1wt.eu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      eb8b9652
    • [3.13-stable only] KVM: x86: Fix far-jump to non-canonical check · 111fefa3
      Nadav Amit authored
      commit 7e46dddd upstream.
      
      [3.13-stable's first backport (f9bffe04) of this commit accidentally omitted
      part of the upstream patch (the WARN_ON fixes), supplied here.]
      
      Commit d1442d85 ("KVM: x86: Handle errors when RIP is set during far
      jumps") introduced a bug that caused the fix to be incomplete.  Due to
      incorrect evaluation, far jump to segment with L bit cleared (i.e., 32-bit
      segment) and RIP with any of the high bits set (i.e, RIP[63:32] != 0) set may
      not trigger #GP.  As we know, this imposes a security problem.
      
      In addition, the condition for two warnings was incorrect.
      
      Fixes: d1442d85
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
      [Add #ifdef CONFIG_X86_64 to avoid complaints of undefined behavior. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Vinson Lee <vlee@twopensource.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      111fefa3
    • usb: gadget: at91_udc: move prepare clk into process context · 0847cfb9
      Ronald Wahl authored
      commit b2ba27a5 upstream.
      
      Commit 76280832 (usb: gadget: at91_udc:
      prepare clk before calling enable) added clock preparation in interrupt
      context. This is not allowed as it might sleep. Also, setting the clock
      rate is unsafe to call from there for the same reason. Move clock
      preparation and setting the clock rate into process context
      (at91udc_probe); see the sketch below.
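      
      A minimal sketch of the split (illustrative; variable names assumed):
      
      	/* process context (at91udc_probe): these calls may sleep */
      	ret = clk_set_rate(udc->fclk, rate);
      	if (!ret)
      		ret = clk_prepare(udc->fclk);
      
      	/* interrupt context: only the non-sleeping enable remains */
      	clk_enable(udc->fclk);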
      Signed-off-by: Ronald Wahl <ronald.wahl@raritan.com>
      Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>
      Acked-by: Boris Brezillon <boris.brezillon@free-electrons.com>
      Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Signed-off-by: Felipe Balbi <balbi@ti.com>
      [ kamal: backport to 3.13-stable: at91_udc.c moved ]
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      0847cfb9
    • e1000e: Fix no connectivity when driver loaded with cable out · b81e3d0a
      David Ertman authored
      commit b20a7744 upstream.
      
      In commit da1e2046, the flow for enabling/disabling an Si errata
      workaround (e1000_lv_jumbo_workaround_ich8lan) was changed to fix a problem
      with iAMT connections dropping on interface down with jumbo frames set.
      Part of this change was to move the function call disabling the workaround
      to e1000e_down() from the e1000_setup_rctl() function.  The mechanic for
      disabling of this workaround involves writing several MAC and PHY registers
      back to hardware defaults.
      
      After this commit, when the driver is loaded with the cable out, the PHY
      registers are not programmed with the correct default values.  This leaves
      the device capable of transmitting packets, but unable to receive
      them until this workaround is called.
      
      The flow of e1000e's open code relies upon calling the above workaround to
      explicitly program these registers either with jumbo-frame-appropriate
      settings or with hardware defaults on 82579 and newer hardware.
      
      Fix this issue by adding logic to e1000_setup_rctl() that not only calls
      e1000_lv_jumbo_workaround_ich8lan() when jumbo frames are set, to enable the
      workaround, but also calls this function to explicitly disable the workaround
      in the case that jumbo frames are not set.
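      
      A minimal sketch of that logic (illustrative; the exact condition and call
      site differ in the driver):
      
      	bool jumbo = adapter->netdev->mtu > ETH_DATA_LEN;
      
      	/* enable the workaround for jumbo frames, otherwise explicitly
      	 * restore the hardware defaults
      	 */
      	e1000_lv_jumbo_workaround_ich8lan(hw, jumbo);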
      Signed-off-by: Dave Ertman <davidx.m.ertman@intel.com>
      Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: Joseph Salisbury <joseph.salisbury@canonical.com>
      BugLink: http://bugs.launchpad.net/bugs/1400365
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
      b81e3d0a
  4. 09 Jan, 2015 1 commit
  5. 18 Dec, 2014 1 commit
  6. 15 Dec, 2014 3 commits