1. 07 Jun, 2017 22 commits
    • Eric Dumazet's avatar
      ipv4: add reference counting to metrics · 338f665a
      Eric Dumazet authored
      
      [ Upstream commit 3fb07daf ]
      
      Andrey Konovalov reported crashes in ipv4_mtu()
      
      I could reproduce the issue with KASAN kernels, between
      10.246.7.151 and 10.246.7.152 :
      
      1) 20 concurrent netperf -t TCP_RR -H 10.246.7.152 -l 1000 &
      
      2) At the same time run following loop :
      while :
      do
       ip ro add 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
       ip ro del 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
      done
      
      Cong Wang attempted to add back rt->fi in commit
      82486aa6 ("ipv4: restore rt->fi for reference counting")
      but this proved to add some issues that were complex to solve.
      
      Instead, I suggested to add a refcount to the metrics themselves,
      being a standalone object (in particular, no reference to other objects)
      
      I tried to make this patch as small as possible to ease its backport,
      instead of being super clean. Note that we believe that only ipv4 dst
      need to take care of the metric refcount. But if this is wrong,
      this patch adds the basic infrastructure to extend this to other
      families.
      
      Many thanks to Julian Anastasov for reviewing this patch, and Cong Wang
      for his efforts on this problem.
      
      Fixes: 2860583f ("ipv4: Kill rt->fi")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Reviewed-by: default avatarJulian Anastasov <ja@ssi.bg>
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      338f665a
    • Davide Caratti's avatar
      sctp: fix ICMP processing if skb is non-linear · 97f54575
      Davide Caratti authored
      
      [ Upstream commit 804ec7eb ]
      
      sometimes ICMP replies to INIT chunks are ignored by the client, even if
      the encapsulated SCTP headers match an open socket. This happens when the
      ICMP packet is carried by a paged skb: use skb_header_pointer() to read
      packet contents beyond the SCTP header, so that chunk header and initiate
      tag are validated correctly.
      
      v2:
      - don't use skb_header_pointer() to read the transport header, since
        icmp_socket_deliver() already puts these 8 bytes in the linear area.
      - change commit message to make specific reference to INIT chunks.
      Signed-off-by: default avatarDavide Caratti <dcaratti@redhat.com>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: default avatarVlad Yasevich <vyasevich@gmail.com>
      Reviewed-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      97f54575
    • Wei Wang's avatar
      tcp: avoid fastopen API to be used on AF_UNSPEC · fe22b600
      Wei Wang authored
      
      [ Upstream commit ba615f67 ]
      
      Fastopen API should be used to perform fastopen operations on the TCP
      socket. It does not make sense to use fastopen API to perform disconnect
      by calling it with AF_UNSPEC. The fastopen data path is also prone to
      race conditions and bugs when using with AF_UNSPEC.
      
      One issue reported and analyzed by Vegard Nossum is as follows:
      +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
      Thread A:                            Thread B:
      ------------------------------------------------------------------------
      sendto()
       - tcp_sendmsg()
           - sk_stream_memory_free() = 0
               - goto wait_for_sndbuf
      	     - sk_stream_wait_memory()
      	        - sk_wait_event() // sleep
                |                          sendto(flags=MSG_FASTOPEN, dest_addr=AF_UNSPEC)
      	  |                           - tcp_sendmsg()
      	  |                              - tcp_sendmsg_fastopen()
      	  |                                 - __inet_stream_connect()
      	  |                                    - tcp_disconnect() //because of AF_UNSPEC
      	  |                                       - tcp_transmit_skb()// send RST
      	  |                                    - return 0; // no reconnect!
      	  |                           - sk_stream_wait_connect()
      	  |                                 - sock_error()
      	  |                                    - xchg(&sk->sk_err, 0)
      	  |                                    - return -ECONNRESET
      	- ... // wake up, see sk->sk_err == 0
          - skb_entail() on TCP_CLOSE socket
      
      If the connection is reopened then we will send a brand new SYN packet
      after thread A has already queued a buffer. At this point I think the
      socket internal state (sequence numbers etc.) becomes messed up.
      
      When the new connection is closed, the FIN-ACK is rejected because the
      sequence number is outside the window. The other side tries to
      retransmit,
      but __tcp_retransmit_skb() calls tcp_trim_head() on an empty skb which
      corrupts the skb data length and hits a BUG() in copy_and_csum_bits().
      +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
      
      Hence, this patch adds a check for AF_UNSPEC in the fastopen data path
      and return EOPNOTSUPP to user if such case happens.
      
      Fixes: cf60af03 ("tcp: Fast Open client - sendmsg(MSG_FASTOPEN)")
      Reported-by: default avatarVegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: default avatarWei Wang <weiwan@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fe22b600
    • Vlad Yasevich's avatar
      virtio-net: enable TSO/checksum offloads for Q-in-Q vlans · d7ed7fce
      Vlad Yasevich authored
      
      [ Upstream commit 2836b4f2 ]
      
      Since virtio does not provide it's own ndo_features_check handler,
      TSO, and now checksum offload, are disabled for stacked vlans.
      Re-enable the support and let the host take care of it.  This
      restores/improves Guest-to-Guest performance over Q-in-Q vlans.
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarVladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d7ed7fce
    • Vlad Yasevich's avatar
      be2net: Fix offload features for Q-in-Q packets · 8380f16d
      Vlad Yasevich authored
      
      [ Upstream commit cc6e9de6 ]
      
      At least some of the be2net cards do not seem to be capabled
      of performing checksum offload computions on Q-in-Q packets.
      In these case, the recevied checksum on the remote is invalid
      and TCP syn packets are dropped.
      
      This patch adds a call to check disbled acceleration features
      on Q-in-Q tagged traffic.
      
      CC: Sathya Perla <sathya.perla@broadcom.com>
      CC: Ajit Khaparde <ajit.khaparde@broadcom.com>
      CC: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
      CC: Somnath Kotur <somnath.kotur@broadcom.com>
      Signed-off-by: default avatarVladislav Yasevich <vyasevic@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8380f16d
    • Eric Dumazet's avatar
      ipv6: fix out of bound writes in __ip6_append_data() · 38f02f2c
      Eric Dumazet authored
      
      [ Upstream commit 232cd35d ]
      
      Andrey Konovalov and idaifish@gmail.com reported crashes caused by
      one skb shared_info being overwritten from __ip6_append_data()
      
      Andrey program lead to following state :
      
      copy -4200 datalen 2000 fraglen 2040
      maxfraglen 2040 alloclen 2048 transhdrlen 0 offset 0 fraggap 6200
      
      The skb_copy_and_csum_bits(skb_prev, maxfraglen, data + transhdrlen,
      fraggap, 0); is overwriting skb->head and skb_shared_info
      
      Since we apparently detect this rare condition too late, move the
      code earlier to even avoid allocating skb and risking crashes.
      
      Once again, many thanks to Andrey and syzkaller team.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Tested-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Reported-by: <idaifish@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      38f02f2c
    • Xin Long's avatar
      bridge: start hello_timer when enabling KERNEL_STP in br_stp_start · 3a854210
      Xin Long authored
      
      [ Upstream commit 6d18c732 ]
      
      Since commit 76b91c32 ("bridge: stp: when using userspace stp stop
      kernel hello and hold timers"), bridge would not start hello_timer if
      stp_enabled is not KERNEL_STP when br_dev_open.
      
      The problem is even if users set stp_enabled with KERNEL_STP later,
      the timer will still not be started. It causes that KERNEL_STP can
      not really work. Users have to re-ifup the bridge to avoid this.
      
      This patch is to fix it by starting br->hello_timer when enabling
      KERNEL_STP in br_stp_start.
      
      As an improvement, it's also to start hello_timer again only when
      br->stp_enabled is KERNEL_STP in br_hello_timer_expired, there is
      no reason to start the timer again when it's NO_STP.
      
      Fixes: 76b91c32 ("bridge: stp: when using userspace stp stop kernel hello and hold timers")
      Reported-by: default avatarHaidong Li <haili@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: default avatarIvan Vecera <cera@cera.cz>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3a854210
    • Bjørn Mork's avatar
      qmi_wwan: add another Lenovo EM74xx device ID · b543ccc4
      Bjørn Mork authored
      
      [ Upstream commit 486181bc ]
      
      In their infinite wisdom, and never ending quest for end user frustration,
      Lenovo has decided to use a new USB device ID for the wwan modules in
      their 2017 laptops.  The actual hardware is still the Sierra Wireless
      EM7455 or EM7430, depending on region.
      Signed-off-by: default avatarBjørn Mork <bjorn@mork.no>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b543ccc4
    • Tobias Jungel's avatar
      bridge: netlink: check vlan_default_pvid range · 94c0bf3c
      Tobias Jungel authored
      
      [ Upstream commit a2858602 ]
      
      Currently it is allowed to set the default pvid of a bridge to a value
      above VLAN_VID_MASK (0xfff). This patch adds a check to br_validate and
      returns -EINVAL in case the pvid is out of bounds.
      
      Reproduce by calling:
      
      [root@test ~]# ip l a type bridge
      [root@test ~]# ip l a type dummy
      [root@test ~]# ip l s bridge0 type bridge vlan_filtering 1
      [root@test ~]# ip l s bridge0 type bridge vlan_default_pvid 9999
      [root@test ~]# ip l s dummy0 master bridge0
      [root@test ~]# bridge vlan
      port	vlan ids
      bridge0	 9999 PVID Egress Untagged
      
      dummy0	 9999 PVID Egress Untagged
      
      Fixes: 0f963b75 ("bridge: netlink: add support for default_pvid")
      Acked-by: default avatarNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: default avatarTobias Jungel <tobias.jungel@bisdn.de>
      Acked-by: default avatarSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      94c0bf3c
    • David S. Miller's avatar
      ipv6: Check ip6_find_1stfragopt() return value properly. · f76d54a8
      David S. Miller authored
      
      [ Upstream commit 7dd7eb95 ]
      
      Do not use unsigned variables to see if it returns a negative
      error or not.
      
      Fixes: 2423496a ("ipv6: Prevent overrun when parsing v6 header options")
      Reported-by: default avatarJulia Lawall <julia.lawall@lip6.fr>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f76d54a8
    • Craig Gallek's avatar
      ipv6: Prevent overrun when parsing v6 header options · 017fabea
      Craig Gallek authored
      
      [ Upstream commit 2423496a ]
      
      The KASAN warning repoted below was discovered with a syzkaller
      program.  The reproducer is basically:
        int s = socket(AF_INET6, SOCK_RAW, NEXTHDR_HOP);
        send(s, &one_byte_of_data, 1, MSG_MORE);
        send(s, &more_than_mtu_bytes_data, 2000, 0);
      
      The socket() call sets the nexthdr field of the v6 header to
      NEXTHDR_HOP, the first send call primes the payload with a non zero
      byte of data, and the second send call triggers the fragmentation path.
      
      The fragmentation code tries to parse the header options in order
      to figure out where to insert the fragment option.  Since nexthdr points
      to an invalid option, the calculation of the size of the network header
      can made to be much larger than the linear section of the skb and data
      is read outside of it.
      
      This fix makes ip6_find_1stfrag return an error if it detects
      running out-of-bounds.
      
      [   42.361487] ==================================================================
      [   42.364412] BUG: KASAN: slab-out-of-bounds in ip6_fragment+0x11c8/0x3730
      [   42.365471] Read of size 840 at addr ffff88000969e798 by task ip6_fragment-oo/3789
      [   42.366469]
      [   42.366696] CPU: 1 PID: 3789 Comm: ip6_fragment-oo Not tainted 4.11.0+ #41
      [   42.367628] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014
      [   42.368824] Call Trace:
      [   42.369183]  dump_stack+0xb3/0x10b
      [   42.369664]  print_address_description+0x73/0x290
      [   42.370325]  kasan_report+0x252/0x370
      [   42.370839]  ? ip6_fragment+0x11c8/0x3730
      [   42.371396]  check_memory_region+0x13c/0x1a0
      [   42.371978]  memcpy+0x23/0x50
      [   42.372395]  ip6_fragment+0x11c8/0x3730
      [   42.372920]  ? nf_ct_expect_unregister_notifier+0x110/0x110
      [   42.373681]  ? ip6_copy_metadata+0x7f0/0x7f0
      [   42.374263]  ? ip6_forward+0x2e30/0x2e30
      [   42.374803]  ip6_finish_output+0x584/0x990
      [   42.375350]  ip6_output+0x1b7/0x690
      [   42.375836]  ? ip6_finish_output+0x990/0x990
      [   42.376411]  ? ip6_fragment+0x3730/0x3730
      [   42.376968]  ip6_local_out+0x95/0x160
      [   42.377471]  ip6_send_skb+0xa1/0x330
      [   42.377969]  ip6_push_pending_frames+0xb3/0xe0
      [   42.378589]  rawv6_sendmsg+0x2051/0x2db0
      [   42.379129]  ? rawv6_bind+0x8b0/0x8b0
      [   42.379633]  ? _copy_from_user+0x84/0xe0
      [   42.380193]  ? debug_check_no_locks_freed+0x290/0x290
      [   42.380878]  ? ___sys_sendmsg+0x162/0x930
      [   42.381427]  ? rcu_read_lock_sched_held+0xa3/0x120
      [   42.382074]  ? sock_has_perm+0x1f6/0x290
      [   42.382614]  ? ___sys_sendmsg+0x167/0x930
      [   42.383173]  ? lock_downgrade+0x660/0x660
      [   42.383727]  inet_sendmsg+0x123/0x500
      [   42.384226]  ? inet_sendmsg+0x123/0x500
      [   42.384748]  ? inet_recvmsg+0x540/0x540
      [   42.385263]  sock_sendmsg+0xca/0x110
      [   42.385758]  SYSC_sendto+0x217/0x380
      [   42.386249]  ? SYSC_connect+0x310/0x310
      [   42.386783]  ? __might_fault+0x110/0x1d0
      [   42.387324]  ? lock_downgrade+0x660/0x660
      [   42.387880]  ? __fget_light+0xa1/0x1f0
      [   42.388403]  ? __fdget+0x18/0x20
      [   42.388851]  ? sock_common_setsockopt+0x95/0xd0
      [   42.389472]  ? SyS_setsockopt+0x17f/0x260
      [   42.390021]  ? entry_SYSCALL_64_fastpath+0x5/0xbe
      [   42.390650]  SyS_sendto+0x40/0x50
      [   42.391103]  entry_SYSCALL_64_fastpath+0x1f/0xbe
      [   42.391731] RIP: 0033:0x7fbbb711e383
      [   42.392217] RSP: 002b:00007ffff4d34f28 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      [   42.393235] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fbbb711e383
      [   42.394195] RDX: 0000000000001000 RSI: 00007ffff4d34f60 RDI: 0000000000000003
      [   42.395145] RBP: 0000000000000046 R08: 00007ffff4d34f40 R09: 0000000000000018
      [   42.396056] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000400aad
      [   42.396598] R13: 0000000000000066 R14: 00007ffff4d34ee0 R15: 00007fbbb717af00
      [   42.397257]
      [   42.397411] Allocated by task 3789:
      [   42.397702]  save_stack_trace+0x16/0x20
      [   42.398005]  save_stack+0x46/0xd0
      [   42.398267]  kasan_kmalloc+0xad/0xe0
      [   42.398548]  kasan_slab_alloc+0x12/0x20
      [   42.398848]  __kmalloc_node_track_caller+0xcb/0x380
      [   42.399224]  __kmalloc_reserve.isra.32+0x41/0xe0
      [   42.399654]  __alloc_skb+0xf8/0x580
      [   42.400003]  sock_wmalloc+0xab/0xf0
      [   42.400346]  __ip6_append_data.isra.41+0x2472/0x33d0
      [   42.400813]  ip6_append_data+0x1a8/0x2f0
      [   42.401122]  rawv6_sendmsg+0x11ee/0x2db0
      [   42.401505]  inet_sendmsg+0x123/0x500
      [   42.401860]  sock_sendmsg+0xca/0x110
      [   42.402209]  ___sys_sendmsg+0x7cb/0x930
      [   42.402582]  __sys_sendmsg+0xd9/0x190
      [   42.402941]  SyS_sendmsg+0x2d/0x50
      [   42.403273]  entry_SYSCALL_64_fastpath+0x1f/0xbe
      [   42.403718]
      [   42.403871] Freed by task 1794:
      [   42.404146]  save_stack_trace+0x16/0x20
      [   42.404515]  save_stack+0x46/0xd0
      [   42.404827]  kasan_slab_free+0x72/0xc0
      [   42.405167]  kfree+0xe8/0x2b0
      [   42.405462]  skb_free_head+0x74/0xb0
      [   42.405806]  skb_release_data+0x30e/0x3a0
      [   42.406198]  skb_release_all+0x4a/0x60
      [   42.406563]  consume_skb+0x113/0x2e0
      [   42.406910]  skb_free_datagram+0x1a/0xe0
      [   42.407288]  netlink_recvmsg+0x60d/0xe40
      [   42.407667]  sock_recvmsg+0xd7/0x110
      [   42.408022]  ___sys_recvmsg+0x25c/0x580
      [   42.408395]  __sys_recvmsg+0xd6/0x190
      [   42.408753]  SyS_recvmsg+0x2d/0x50
      [   42.409086]  entry_SYSCALL_64_fastpath+0x1f/0xbe
      [   42.409513]
      [   42.409665] The buggy address belongs to the object at ffff88000969e780
      [   42.409665]  which belongs to the cache kmalloc-512 of size 512
      [   42.410846] The buggy address is located 24 bytes inside of
      [   42.410846]  512-byte region [ffff88000969e780, ffff88000969e980)
      [   42.411941] The buggy address belongs to the page:
      [   42.412405] page:ffffea000025a780 count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
      [   42.413298] flags: 0x100000000008100(slab|head)
      [   42.413729] raw: 0100000000008100 0000000000000000 0000000000000000 00000001800c000c
      [   42.414387] raw: ffffea00002a9500 0000000900000007 ffff88000c401280 0000000000000000
      [   42.415074] page dumped because: kasan: bad access detected
      [   42.415604]
      [   42.415757] Memory state around the buggy address:
      [   42.416222]  ffff88000969e880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   42.416904]  ffff88000969e900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [   42.417591] >ffff88000969e980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [   42.418273]                    ^
      [   42.418588]  ffff88000969ea00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [   42.419273]  ffff88000969ea80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [   42.419882] ==================================================================
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarCraig Gallek <kraig@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      017fabea
    • David Ahern's avatar
      net: Improve handling of failures on link and route dumps · 640bfcf2
      David Ahern authored
      
      [ Upstream commit f6c5775f ]
      
      In general, rtnetlink dumps do not anticipate failure to dump a single
      object (e.g., link or route) on a single pass. As both route and link
      objects have grown via more attributes, that is no longer a given.
      
      netlink dumps can handle a failure if the dump function returns an
      error; specifically, netlink_dump adds the return code to the response
      if it is <= 0 so userspace is notified of the failure. The missing
      piece is the rtnetlink dump functions returning the error.
      
      Fix route and link dump functions to return the errors if no object is
      added to an skb (detected by skb->len != 0). IPv6 route dumps
      (rt6_dump_route) already return the error; this patch updates IPv4 and
      link dumps. Other dump functions may need to be ajusted as well.
      Reported-by: default avatarJan Moskyto Matejka <mq@ucw.cz>
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      640bfcf2
    • Soheil Hassas Yeganeh's avatar
      tcp: eliminate negative reordering in tcp_clean_rtx_queue · 7ede5c90
      Soheil Hassas Yeganeh authored
      
      [ Upstream commit bafbb9c7 ]
      
      tcp_ack() can call tcp_fragment() which may dededuct the
      value tp->fackets_out when MSS changes. When prior_fackets
      is larger than tp->fackets_out, tcp_clean_rtx_queue() can
      invoke tcp_update_reordering() with negative values. This
      results in absurd tp->reodering values higher than
      sysctl_tcp_max_reordering.
      
      Note that tcp_update_reordering indeeds sets tp->reordering
      to min(sysctl_tcp_max_reordering, metric), but because
      the comparison is signed, a negative metric always wins.
      
      Fixes: c7caf8d3 ("[TCP]: Fix reord detection due to snd_una covered holes")
      Reported-by: default avatarRebecca Isaacs <risaacs@google.com>
      Signed-off-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7ede5c90
    • Eric Dumazet's avatar
      sctp: do not inherit ipv6_{mc|ac|fl}_list from parent · ffa551de
      Eric Dumazet authored
      
      [ Upstream commit fdcee2cb ]
      
      SCTP needs fixes similar to 83eaddab ("ipv6/dccp: do not inherit
      ipv6_mc_list from parent"), otherwise bad things can happen.
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Tested-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ffa551de
    • Xin Long's avatar
      sctp: fix src address selection if using secondary addresses for ipv6 · 704e6c6b
      Xin Long authored
      
      [ Upstream commit dbc2b5e9 ]
      
      Commit 0ca50d12 ("sctp: fix src address selection if using secondary
      addresses") has fixed a src address selection issue when using secondary
      addresses for ipv4.
      
      Now sctp ipv6 also has the similar issue. When using a secondary address,
      sctp_v6_get_dst tries to choose the saddr which has the most same bits
      with the daddr by sctp_v6_addr_match_len. It may make some cases not work
      as expected.
      
      hostA:
        [1] fd21:356b:459a:cf10::11 (eth1)
        [2] fd21:356b:459a:cf20::11 (eth2)
      
      hostB:
        [a] fd21:356b:459a:cf30::2  (eth1)
        [b] fd21:356b:459a:cf40::2  (eth2)
      
      route from hostA to hostB:
        fd21:356b:459a:cf30::/64 dev eth1  metric 1024  mtu 1500
      
      The expected path should be:
        fd21:356b:459a:cf10::11 <-> fd21:356b:459a:cf30::2
      But addr[2] matches addr[a] more bits than addr[1] does, according to
      sctp_v6_addr_match_len. It causes the path to be:
        fd21:356b:459a:cf20::11 <-> fd21:356b:459a:cf30::2
      
      This patch is to fix it with the same way as Marcelo's fix for sctp ipv4.
      As no ip_dev_find for ipv6, this patch is to use ipv6_chk_addr to check
      if the saddr is in a dev instead.
      
      Note that for backwards compatibility, it will still do the addr_match_len
      check here when no optimal is found.
      Reported-by: default avatarPatrick Talbert <ptalbert@redhat.com>
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      704e6c6b
    • Yuchung Cheng's avatar
      tcp: avoid fragmenting peculiar skbs in SACK · 90e3f8a5
      Yuchung Cheng authored
      
      [ Upstream commit b451e5d2 ]
      
      This patch fixes a bug in splitting an SKB during SACK
      processing. Specifically if an skb contains multiple
      packets and is only partially sacked in the higher sequences,
      tcp_match_sack_to_skb() splits the skb and marks the second fragment
      as SACKed.
      
      The current code further attempts rounding up the first fragment
      to MSS boundaries. But it misses a boundary condition when the
      rounded-up fragment size (pkt_len) is exactly skb size.  Spliting
      such an skb is pointless and causses a kernel warning and aborts
      the SACK processing. This patch universally checks such over-split
      before calling tcp_fragment to prevent these unnecessary warnings.
      
      Fixes: adb92db8 ("tcp: Make SACK code to split only at mss boundaries")
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      90e3f8a5
    • Julian Wiedmann's avatar
      s390/qeth: avoid null pointer dereference on OSN · 182abc4e
      Julian Wiedmann authored
      
      [ Upstream commit 25e2c341 ]
      
      Access card->dev only after checking whether's its valid.
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Reviewed-by: default avatarUrsula Braun <ubraun@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      182abc4e
    • Julian Wiedmann's avatar
      s390/qeth: unbreak OSM and OSN support · 21b87158
      Julian Wiedmann authored
      
      [ Upstream commit 2d2ebb3e ]
      
      commit b4d72c08 ("qeth: bridgeport support - basic control")
      broke the support for OSM and OSN devices as follows:
      
      As OSM and OSN are L2 only, qeth_core_probe_device() does an early
      setup by loading the l2 discipline and calling qeth_l2_probe_device().
      In this context, adding the l2-specific bridgeport sysfs attributes
      via qeth_l2_create_device_attributes() hits a BUG_ON in fs/sysfs/group.c,
      since the basic sysfs infrastructure for the device hasn't been
      established yet.
      
      Note that OSN actually has its own unique sysfs attributes
      (qeth_osn_devtype), so the additional attributes shouldn't be created
      at all.
      For OSM, add a new qeth_l2_devtype that contains all the common
      and l2-specific sysfs attributes.
      When qeth_core_probe_device() does early setup for OSM or OSN, assign
      the corresponding devtype so that the ccwgroup probe code creates the
      full set of sysfs attributes.
      This allows us to skip qeth_l2_create_device_attributes() in case
      of an early setup.
      
      Any device that can't do early setup will initially have only the
      generic sysfs attributes, and when it's probed later
      qeth_l2_probe_device() adds the l2-specific attributes.
      
      If an early-setup device is removed (by calling ccwgroup_ungroup()),
      device_unregister() will - using the devtype - delete the
      l2-specific attributes before qeth_l2_remove_device() is called.
      So make sure to not remove them twice.
      
      What complicates the issue is that qeth_l2_probe_device() and
      qeth_l2_remove_device() is also called on a device when its
      layer2 attribute changes (ie. its layer mode is switched).
      For early-setup devices this wouldn't work properly - we wouldn't
      remove the l2-specific attributes when switching to L3.
      But switching the layer mode doesn't actually make any sense;
      we already decided that the device can only operate in L2!
      So just refuse to switch the layer mode on such devices. Note that
      OSN doesn't have a layer2 attribute, so we only need to special-case
      OSM.
      
      Based on an initial patch by Ursula Braun.
      
      Fixes: b4d72c08 ("qeth: bridgeport support - basic control")
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      21b87158
    • Ursula Braun's avatar
      s390/qeth: handle sysfs error during initialization · 2ac37098
      Ursula Braun authored
      
      [ Upstream commit 9111e788 ]
      
      When setting up the device from within the layer discipline's
      probe routine, creating the layer-specific sysfs attributes can fail.
      Report this error back to the caller, and handle it by
      releasing the layer discipline.
      Signed-off-by: default avatarUrsula Braun <ubraun@linux.vnet.ibm.com>
      [jwi: updated commit msg, moved an OSN change to a subsequent patch]
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2ac37098
    • WANG Cong's avatar
      ipv6/dccp: do not inherit ipv6_mc_list from parent · d1428ee5
      WANG Cong authored
      
      [ Upstream commit 83eaddab ]
      
      Like commit 657831ff ("dccp/tcp: do not inherit mc_list from parent")
      we should clear ipv6_mc_list etc. for IPv6 sockets too.
      
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d1428ee5
    • Eric Dumazet's avatar
      dccp/tcp: do not inherit mc_list from parent · 5f67a166
      Eric Dumazet authored
      
      [ Upstream commit 657831ff ]
      
      syzkaller found a way to trigger double frees from ip_mc_drop_socket()
      
      It turns out that leave a copy of parent mc_list at accept() time,
      which is very bad.
      
      Very similar to commit 8b485ce6 ("tcp: do not inherit
      fastopen_req from parent")
      
      Initial report from Pray3r, completed by Andrey one.
      Thanks a lot to them !
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarPray3r <pray3r.z@gmail.com>
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Tested-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5f67a166
    • Orlando Arias's avatar
      sparc: Fix -Wstringop-overflow warning · b9978c27
      Orlando Arias authored
      
      [ Upstream commit deba804c ]
      
      Greetings,
      
      GCC 7 introduced the -Wstringop-overflow flag to detect buffer overflows
      in calls to string handling functions [1][2]. Due to the way
      ``empty_zero_page'' is declared in arch/sparc/include/setup.h, this
      causes a warning to trigger at compile time in the function mem_init(),
      which is subsequently converted to an error. The ensuing patch fixes
      this issue and aligns the declaration of empty_zero_page to that of
      other architectures. Thank you.
      
      Cheers,
      Orlando.
      
      [1] https://gcc.gnu.org/ml/gcc-patches/2016-10/msg02308.html
      [2] https://gcc.gnu.org/gcc-7/changes.htmlSigned-off-by: default avatarOrlando Arias <oarias@knights.ucf.edu>
      
      --------------------------------------------------------------------------------
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b9978c27
  2. 25 May, 2017 18 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.4.70 · b409ba3b
      Greg Kroah-Hartman authored
      b409ba3b
    • Julius Werner's avatar
      drivers: char: mem: Check for address space wraparound with mmap() · 837bfdb4
      Julius Werner authored
      commit b299cde2 upstream.
      
      /dev/mem currently allows mmap() mappings that wrap around the end of
      the physical address space, which should probably be illegal. It
      circumvents the existing STRICT_DEVMEM permission check because the loop
      immediately terminates (as the start address is already higher than the
      end address). On the x86_64 architecture it will then cause a panic
      (from the BUG(start >= end) in arch/x86/mm/pat.c:reserve_memtype()).
      
      This patch adds an explicit check to make sure offset + size will not
      wrap around in the physical address type.
      Signed-off-by: default avatarJulius Werner <jwerner@chromium.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      837bfdb4
    • J. Bruce Fields's avatar
      nfsd: encoders mustn't use unitialized values in error cases · 52cf2476
      J. Bruce Fields authored
      commit f961e3f2 upstream.
      
      In error cases, lgp->lg_layout_type may be out of bounds; so we
      shouldn't be using it until after the check of nfserr.
      
      This was seen to crash nfsd threads when the server receives a LAYOUTGET
      request with a large layout type.
      
      GETDEVICEINFO has the same problem.
      Reported-by: default avatarAri Kauppi <Ari.Kauppi@synopsys.com>
      Reviewed-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      52cf2476
    • Mario Kleiner's avatar
      drm/edid: Add 10 bpc quirk for LGD 764 panel in HP zBook 17 G2 · da922dc4
      Mario Kleiner authored
      commit e345da82 upstream.
      
      The builtin eDP panel in the HP zBook 17 G2 supports 10 bpc,
      as advertised by the Laptops product specs and verified via
      injecting a fixed edid + photometer measurements, but edid
      reports unknown depth, so drivers fall back to 6 bpc.
      
      Add a quirk to get the full 10 bpc.
      Signed-off-by: default avatarMario Kleiner <mario.kleiner.de@gmail.com>
      Acked-by: default avatarHarry Wentland <harry.wentland@amd.com>
      Signed-off-by: default avatarDaniel Vetter <daniel.vetter@ffwll.ch>
      Link: http://patchwork.freedesktop.org/patch/msgid/1492787108-23959-1-git-send-email-mario.kleiner.de@gmail.comSigned-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      da922dc4
    • Lukas Wunner's avatar
      PCI: Freeze PME scan before suspending devices · bc428e94
      Lukas Wunner authored
      commit ea00353f upstream.
      
      Laurent Pinchart reported that the Renesas R-Car H2 Lager board (r8a7790)
      crashes during suspend tests.  Geert Uytterhoeven managed to reproduce the
      issue on an M2-W Koelsch board (r8a7791):
      
        It occurs when the PME scan runs, once per second.  During PME scan, the
        PCI host bridge (rcar-pci) registers are accessed while its module clock
        has already been disabled, leading to the crash.
      
      One reproducer is to configure s2ram to use "s2idle" instead of "deep"
      suspend:
      
        # echo 0 > /sys/module/printk/parameters/console_suspend
        # echo s2idle > /sys/power/mem_sleep
        # echo mem > /sys/power/state
      
      Another reproducer is to write either "platform" or "processors" to
      /sys/power/pm_test.  It does not (or is less likely) to happen during full
      system suspend ("core" or "none") because system suspend also disables
      timers, and thus the workqueue handling PME scans no longer runs.  Geert
      believes the issue may still happen in the small window between disabling
      module clocks and disabling timers:
      
        # echo 0 > /sys/module/printk/parameters/console_suspend
        # echo platform > /sys/power/pm_test    # Or "processors"
        # echo mem > /sys/power/state
      
      (Make sure CONFIG_PCI_RCAR_GEN2 and CONFIG_USB_OHCI_HCD_PCI are enabled.)
      
      Rafael Wysocki agrees that PME scans should be suspended before the host
      bridge registers become inaccessible.  To that end, queue the task on a
      workqueue that gets frozen before devices suspend.
      
      Rafael notes however that as a result, some wakeup events may be missed if
      they are delivered via PME from a device without working IRQ (which hence
      must be polled) and occur after the workqueue has been frozen.  If that
      turns out to be an issue in practice, it may be possible to solve it by
      calling pci_pme_list_scan() once directly from one of the host bridge's
      pm_ops callbacks.
      
      Stacktrace for posterity:
      
        PM: Syncing filesystems ... [   38.566237] done.
        PM: Preparing system for sleep (mem)
        Freezing user space processes ... [   38.579813] (elapsed 0.001 seconds) done.
        Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
        PM: Suspending system (mem)
        PM: suspend of devices complete after 152.456 msecs
        PM: late suspend of devices complete after 2.809 msecs
        PM: noirq suspend of devices complete after 29.863 msecs
        suspend debug: Waiting for 5 second(s).
        Unhandled fault: asynchronous external abort (0x1211) at 0x00000000
        pgd = c0003000
        [00000000] *pgd=80000040004003, *pmd=00000000
        Internal error: : 1211 [#1] SMP ARM
        Modules linked in:
        CPU: 1 PID: 20 Comm: kworker/1:1 Not tainted
        4.9.0-rc1-koelsch-00011-g68db9bc8 #3383
        Hardware name: Generic R8A7791 (Flattened Device Tree)
        Workqueue: events pci_pme_list_scan
        task: eb56e140 task.stack: eb58e000
        PC is at pci_generic_config_read+0x64/0x6c
        LR is at rcar_pci_cfg_base+0x64/0x84
        pc : [<c041d7b4>]    lr : [<c04309a0>]    psr: 600d0093
        sp : eb58fe98  ip : c041d750  fp : 00000008
        r10: c0e2283c  r9 : 00000000  r8 : 600d0013
        r7 : 00000008  r6 : eb58fed6  r5 : 00000002  r4 : eb58feb4
        r3 : 00000000  r2 : 00000044  r1 : 00000008  r0 : 00000000
        Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
        Control: 30c5387d  Table: 6a9f6c80  DAC: 55555555
        Process kworker/1:1 (pid: 20, stack limit = 0xeb58e210)
        Stack: (0xeb58fe98 to 0xeb590000)
        fe80:                                                       00000002 00000044
        fea0: eb6f5800 c041d9b0 eb58feb4 00000008 00000044 00000000 eb78a000 eb78a000
        fec0: 00000044 00000000 eb9aff00 c0424bf0 eb78a000 00000000 eb78a000 c0e22830
        fee0: ea8a6fc0 c0424c5c eaae79c0 c0424ce0 eb55f380 c0e22838 eb9a9800 c0235fbc
        ff00: eb55f380 c0e22838 eb55f380 eb9a9800 eb9a9800 eb58e000 eb9a9824 c0e02100
        ff20: eb55f398 c02366c4 eb56e140 eb5631c0 00000000 eb55f380 c023641c 00000000
        ff40: 00000000 00000000 00000000 c023a928 cd105598 00000000 40506a34 eb55f380
        ff60: 00000000 00000000 dead4ead ffffffff ffffffff eb58ff74 eb58ff74 00000000
        ff80: 00000000 dead4ead ffffffff ffffffff eb58ff90 eb58ff90 eb58ffac eb5631c0
        ffa0: c023a844 00000000 00000000 c0206d68 00000000 00000000 00000000 00000000
        ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
        ffe0: 00000000 00000000 00000000 00000000 00000013 00000000 3a81336c 10ccd1dd
        [<c041d7b4>] (pci_generic_config_read) from [<c041d9b0>]
        (pci_bus_read_config_word+0x58/0x80)
        [<c041d9b0>] (pci_bus_read_config_word) from [<c0424bf0>]
        (pci_check_pme_status+0x34/0x78)
        [<c0424bf0>] (pci_check_pme_status) from [<c0424c5c>] (pci_pme_wakeup+0x28/0x54)
        [<c0424c5c>] (pci_pme_wakeup) from [<c0424ce0>] (pci_pme_list_scan+0x58/0xb4)
        [<c0424ce0>] (pci_pme_list_scan) from [<c0235fbc>]
        (process_one_work+0x1bc/0x308)
        [<c0235fbc>] (process_one_work) from [<c02366c4>] (worker_thread+0x2a8/0x3e0)
        [<c02366c4>] (worker_thread) from [<c023a928>] (kthread+0xe4/0xfc)
        [<c023a928>] (kthread) from [<c0206d68>] (ret_from_fork+0x14/0x2c)
        Code: ea000000 e5903000 f57ff04f e3a00000 (e5843000)
        ---[ end trace 667d43ba3aa9e589 ]---
      
      Fixes: df17e62e ("PCI: Add support for polling PME state on suspended legacy PCI devices")
      Reported-and-tested-by: default avatarLaurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
      Reported-and-tested-by: default avatarGeert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: default avatarLukas Wunner <lukas@wunner.de>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: default avatarLaurent Pinchart <laurent.pinchart@ideasonboard.com>
      Acked-by: default avatarRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
      Cc: Simon Horman <horms+renesas@verge.net.au>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bc428e94
    • David Woodhouse's avatar
      PCI: Fix pci_mmap_fits() for HAVE_PCI_RESOURCE_TO_USER platforms · 5f36c8b4
      David Woodhouse authored
      commit 6bccc7f4 upstream.
      
      In the PCI_MMAP_PROCFS case when the address being passed by the user is a
      'user visible' resource address based on the bus window, and not the actual
      contents of the resource, that's what we need to be checking it against.
      Signed-off-by: default avatarDavid Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5f36c8b4
    • Thomas Gleixner's avatar
      tracing/kprobes: Enforce kprobes teardown after testing · 6384f782
      Thomas Gleixner authored
      commit 30e7d894 upstream.
      
      Enabling the tracer selftest triggers occasionally the warning in
      text_poke(), which warns when the to be modified page is not marked
      reserved.
      
      The reason is that the tracer selftest installs kprobes on functions marked
      __init for testing. These probes are removed after the tests, but that
      removal schedules the delayed kprobes_optimizer work, which will do the
      actual text poke. If the work is executed after the init text is freed,
      then the warning triggers. The bug can be reproduced reliably when the work
      delay is increased.
      
      Flush the optimizer work and wait for the optimizing/unoptimizing lists to
      become empty before returning from the kprobes tracer selftest. That
      ensures that all operations which were queued due to the probes removal
      have completed.
      
      Link: http://lkml.kernel.org/r/20170516094802.76a468bb@gandalf.local.homeSigned-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Acked-by: default avatarMasami Hiramatsu <mhiramat@kernel.org>
      Fixes: 6274de49 ("kprobes: Support delayed unoptimizing")
      Signed-off-by: default avatarSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6384f782
    • Al Viro's avatar
      osf_wait4(): fix infoleak · d5fb96b9
      Al Viro authored
      commit a8c39544 upstream.
      
      failing sys_wait4() won't fill struct rusage...
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d5fb96b9
    • Thomas Gleixner's avatar
      genirq: Fix chained interrupt data ordering · e07db0d7
      Thomas Gleixner authored
      commit 2c4569ca upstream.
      
      irq_set_chained_handler_and_data() sets up the chained interrupt and then
      stores the handler data.
      
      That's racy against an immediate interrupt which gets handled before the
      store of the handler data happened. The handler will dereference a NULL
      pointer and crash.
      
      Cure it by storing handler data before installing the chained handler.
      Reported-by: default avatarBorislav Petkov <bp@alien8.de>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e07db0d7
    • Johan Hovold's avatar
      uwb: fix device quirk on big-endian hosts · 1736f2b3
      Johan Hovold authored
      commit 41318a2b upstream.
      
      Add missing endianness conversion when using the USB device-descriptor
      idProduct field to apply a hardware quirk.
      
      Fixes: 1ba47da5 ("uwb: add the i1480 DFU driver")
      Signed-off-by: default avatarJohan Hovold <johan@kernel.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1736f2b3
    • James Hogan's avatar
      metag/uaccess: Check access_ok in strncpy_from_user · ca19dd15
      James Hogan authored
      commit 3a158a62 upstream.
      
      The metag implementation of strncpy_from_user() doesn't validate the src
      pointer, which could allow reading of arbitrary kernel memory. Add a
      short access_ok() check to prevent that.
      
      Its still possible for it to read across the user/kernel boundary, but
      it will invariably reach a NUL character after only 9 bytes, leaking
      only a static kernel address being loaded into D0Re0 at the beginning of
      __start, which is acceptable for the immediate fix.
      Reported-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: linux-metag@vger.kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ca19dd15
    • James Hogan's avatar
      metag/uaccess: Fix access_ok() · 2d9b2e78
      James Hogan authored
      commit 8a8b5663 upstream.
      
      The __user_bad() macro used by access_ok() has a few corner cases
      noticed by Al Viro where it doesn't behave correctly:
      
       - The kernel range check has off by 1 errors which permit access to the
         first and last byte of the kernel mapped range.
      
       - The kernel range check ends at LINCORE_BASE rather than
         META_MEMORY_LIMIT, which is ineffective when the kernel is in global
         space (an extremely uncommon configuration).
      
      There are a couple of other shortcomings here too:
      
       - Access to the whole of the other address space is permitted (i.e. the
         global half of the address space when the kernel is in local space).
         This isn't ideal as it could theoretically still contain privileged
         mappings set up by the bootloader.
      
       - The size argument is unused, permitting user copies which start on
         valid pages at the end of the user address range and cross the
         boundary into the kernel address space (e.g. addr = 0x3ffffff0, size
         > 0x10).
      
      It isn't very convenient to add size checks when disallowing certain
      regions, and it seems far safer to be sure and explicit about what
      userland is able to access, so invert the logic to allow certain regions
      instead, and fix the off by 1 errors and missing size checks. This also
      allows the get_fs() == KERNEL_DS check to be more easily optimised into
      the user address range case.
      
      We now have 3 such allowed regions:
      
       - The user address range (incorporating the get_fs() == KERNEL_DS
         check).
      
       - NULL (some kernel code expects this to work, and we'll always catch
         the fault anyway).
      
       - The core code memory region.
      
      Fixes: 373cd784 ("metag: Memory handling")
      Reported-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarJames Hogan <james.hogan@imgtec.com>
      Cc: linux-metag@vger.kernel.org
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2d9b2e78
    • KarimAllah Ahmed's avatar
      iommu/vt-d: Flush the IOTLB to get rid of the initial kdump mappings · 98d5e843
      KarimAllah Ahmed authored
      commit f73a7eee upstream.
      
      Ever since commit 091d42e4 ("iommu/vt-d: Copy translation tables from
      old kernel") the kdump kernel copies the IOMMU context tables from the
      previous kernel. Each device mappings will be destroyed once the driver
      for the respective device takes over.
      
      This unfortunately breaks the workflow of mapping and unmapping a new
      context to the IOMMU. The mapping function assumes that either:
      
      1) Unmapping did the proper IOMMU flushing and it only ever flush if the
         IOMMU unit supports caching invalid entries.
      2) The system just booted and the initialization code took care of
         flushing all IOMMU caches.
      
      This assumption is not true for the kdump kernel since the context
      tables have been copied from the previous kernel and translations could
      have been cached ever since. So make sure to flush the IOTLB as well
      when we destroy these old copied mappings.
      
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Anthony Liguori <aliguori@amazon.com>
      Signed-off-by: default avatarKarimAllah Ahmed <karahmed@amazon.de>
      Acked-by: default avatarDavid Woodhouse <dwmw@amazon.co.uk>
      Fixes: 091d42e4 ("iommu/vt-d: Copy translation tables from old kernel")
      Signed-off-by: default avatarJoerg Roedel <jroedel@suse.de>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      98d5e843
    • Malcolm Priestley's avatar
      staging: rtl8192e: rtl92e_get_eeprom_size Fix read size of EPROM_CMD. · cb89b1f9
      Malcolm Priestley authored
      commit 90be652c upstream.
      
      EPROM_CMD is 2 byte aligned on PCI map so calling with rtl92e_readl
      will return invalid data so use rtl92e_readw.
      
      The device is unable to select the right eeprom type.
      Signed-off-by: default avatarMalcolm Priestley <tvboxspy@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cb89b1f9
    • Malcolm Priestley's avatar
      staging: rtl8192e: fix 2 byte alignment of register BSSIDR. · 427907e5
      Malcolm Priestley authored
      commit 867510bd upstream.
      
      BSSIDR has two byte alignment on PCI ioremap correct the write
      by swapping to 16 bits first.
      
      This fixes a problem that the device associates fail because
      the filter is not set correctly.
      Signed-off-by: default avatarMalcolm Priestley <tvboxspy@gmail.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      427907e5
    • Keno Fischer's avatar
      mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp · 8b26f53b
      Keno Fischer authored
      commit 8310d48b upstream.
      
      In commit 19be0eaf ("mm: remove gup_flags FOLL_WRITE games from
      __get_user_pages()"), the mm code was changed from unsetting FOLL_WRITE
      after a COW was resolved to setting the (newly introduced) FOLL_COW
      instead.  Simultaneously, the check in gup.c was updated to still allow
      writes with FOLL_FORCE set if FOLL_COW had also been set.
      
      However, a similar check in huge_memory.c was forgotten.  As a result,
      remote memory writes to ro regions of memory backed by transparent huge
      pages cause an infinite loop in the kernel (handle_mm_fault sets
      FOLL_COW and returns 0 causing a retry, but follow_trans_huge_pmd bails
      out immidiately because `(flags & FOLL_WRITE) && !pmd_write(*pmd)` is
      true.
      
      While in this state the process is stil SIGKILLable, but little else
      works (e.g.  no ptrace attach, no other signals).  This is easily
      reproduced with the following code (assuming thp are set to always):
      
          #include <assert.h>
          #include <fcntl.h>
          #include <stdint.h>
          #include <stdio.h>
          #include <string.h>
          #include <sys/mman.h>
          #include <sys/stat.h>
          #include <sys/types.h>
          #include <sys/wait.h>
          #include <unistd.h>
      
          #define TEST_SIZE 5 * 1024 * 1024
      
          int main(void) {
            int status;
            pid_t child;
            int fd = open("/proc/self/mem", O_RDWR);
            void *addr = mmap(NULL, TEST_SIZE, PROT_READ,
                              MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
            assert(addr != MAP_FAILED);
            pid_t parent_pid = getpid();
            if ((child = fork()) == 0) {
              void *addr2 = mmap(NULL, TEST_SIZE, PROT_READ | PROT_WRITE,
                                 MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
              assert(addr2 != MAP_FAILED);
              memset(addr2, 'a', TEST_SIZE);
              pwrite(fd, addr2, TEST_SIZE, (uintptr_t)addr);
              return 0;
            }
            assert(child == waitpid(child, &status, 0));
            assert(WIFEXITED(status) && WEXITSTATUS(status) == 0);
            return 0;
          }
      
      Fix this by updating follow_trans_huge_pmd in huge_memory.c analogously
      to the update in gup.c in the original commit.  The same pattern exists
      in follow_devmap_pmd.  However, we should not be able to reach that
      check with FOLL_COW set, so add WARN_ONCE to make sure we notice if we
      ever do.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Link: http://lkml.kernel.org/r/20170106015025.GA38411@juliacomputing.comSigned-off-by: default avatarKeno Fischer <keno@juliacomputing.com>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      [AmitP: Minor refactoring of upstream changes for linux-3.18.y,
              where follow_devmap_pmd() doesn't exist.]
      Signed-off-by: default avatarAmit Pundir <amit.pundir@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8b26f53b
    • Takashi Iwai's avatar
      xc2028: Fix use-after-free bug properly · f03484fd
      Takashi Iwai authored
      commit 22a1e778 upstream.
      
      The commit 8dfbcc43 ("[media] xc2028: avoid use after free") tried
      to address the reported use-after-free by clearing the reference.
      
      However, it's clearing the wrong pointer; it sets NULL to
      priv->ctrl.fname, but it's anyway overwritten by the next line
      memcpy(&priv->ctrl, p, sizeof(priv->ctrl)).
      
      OTOH, the actual code accessing the freed string is the strcmp() call
      with priv->fname:
      	if (!firmware_name[0] && p->fname &&
      	    priv->fname && strcmp(p->fname, priv->fname))
      		free_firmware(priv);
      
      where priv->fname points to the previous file name, and this was
      already freed by kfree().
      
      For fixing the bug properly, this patch does the following:
      
      - Keep the copy of firmware file name in only priv->fname,
        priv->ctrl.fname isn't changed;
      - The allocation is done only when the firmware gets loaded;
      - The kfree() is called in free_firmware() commonly
      
      Fixes: commit 8dfbcc43 ('[media] xc2028: avoid use after free')
      Signed-off-by: default avatarTakashi Iwai <tiwai@suse.de>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@s-opensource.com>
      Signed-off-by: default avatarAmit Pundir <amit.pundir@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f03484fd
    • Kristina Martsenko's avatar
      arm64: documentation: document tagged pointer stack constraints · e0188a55
      Kristina Martsenko authored
      commit f0e421b1 upstream.
      
      Some kernel features don't currently work if a task puts a non-zero
      address tag in its stack pointer, frame pointer, or frame record entries
      (FP, LR).
      
      For example, with a tagged stack pointer, the kernel can't deliver
      signals to the process, and the task is killed instead. As another
      example, with a tagged frame pointer or frame records, perf fails to
      generate call graphs or resolve symbols.
      
      For now, just document these limitations, instead of finding and fixing
      everything that doesn't work, as it's not known if anyone needs to use
      tags in these places anyway.
      
      In addition, as requested by Dave Martin, generalize the limitations
      into a general kernel address tag policy, and refactor
      tagged-pointers.txt to include it.
      
      Fixes: d50240a5 ("arm64: mm: permit use of tagged pointers at EL0")
      Reviewed-by: default avatarDave Martin <Dave.Martin@arm.com>
      Acked-by: default avatarWill Deacon <will.deacon@arm.com>
      Signed-off-by: default avatarKristina Martsenko <kristina.martsenko@arm.com>
      Signed-off-by: default avatarCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e0188a55