1. 20 Dec, 2017 3 commits
    • Eric Biggers's avatar
      crypto: rsa - fix buffer overread when stripping leading zeroes · 80dbdc5a
      Eric Biggers authored
      commit d2890c37 upstream.
      
      In rsa_get_n(), if the buffer contained all 0's and "FIPS mode" is
      enabled, we would read one byte past the end of the buffer while
      scanning the leading zeroes.  Fix it by checking 'n_sz' before '!*ptr'.
      
      This bug was reachable by adding a specially crafted key of type
      "asymmetric" (requires CONFIG_RSA and CONFIG_X509_CERTIFICATE_PARSER).
      
      KASAN report:
      
          BUG: KASAN: slab-out-of-bounds in rsa_get_n+0x19e/0x1d0 crypto/rsa_helper.c:33
          Read of size 1 at addr ffff88003501a708 by task keyctl/196
      
          CPU: 1 PID: 196 Comm: keyctl Not tainted 4.14.0-09238-g1d3b78bb #26
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
          Call Trace:
           rsa_get_n+0x19e/0x1d0 crypto/rsa_helper.c:33
           asn1_ber_decoder+0x82a/0x1fd0 lib/asn1_decoder.c:328
           rsa_set_pub_key+0xd3/0x320 crypto/rsa.c:278
           crypto_akcipher_set_pub_key ./include/crypto/akcipher.h:364 [inline]
           pkcs1pad_set_pub_key+0xae/0x200 crypto/rsa-pkcs1pad.c:117
           crypto_akcipher_set_pub_key ./include/crypto/akcipher.h:364 [inline]
           public_key_verify_signature+0x270/0x9d0 crypto/asymmetric_keys/public_key.c:106
           x509_check_for_self_signed+0x2ea/0x480 crypto/asymmetric_keys/x509_public_key.c:141
           x509_cert_parse+0x46a/0x620 crypto/asymmetric_keys/x509_cert_parser.c:129
           x509_key_preparse+0x61/0x750 crypto/asymmetric_keys/x509_public_key.c:174
           asymmetric_key_preparse+0xa4/0x150 crypto/asymmetric_keys/asymmetric_type.c:388
           key_create_or_update+0x4d4/0x10a0 security/keys/key.c:850
           SYSC_add_key security/keys/keyctl.c:122 [inline]
           SyS_add_key+0xe8/0x290 security/keys/keyctl.c:62
           entry_SYSCALL_64_fastpath+0x1f/0x96
      
          Allocated by task 196:
           __do_kmalloc mm/slab.c:3711 [inline]
           __kmalloc_track_caller+0x118/0x2e0 mm/slab.c:3726
           kmemdup+0x17/0x40 mm/util.c:118
           kmemdup ./include/linux/string.h:414 [inline]
           x509_cert_parse+0x2cb/0x620 crypto/asymmetric_keys/x509_cert_parser.c:106
           x509_key_preparse+0x61/0x750 crypto/asymmetric_keys/x509_public_key.c:174
           asymmetric_key_preparse+0xa4/0x150 crypto/asymmetric_keys/asymmetric_type.c:388
           key_create_or_update+0x4d4/0x10a0 security/keys/key.c:850
           SYSC_add_key security/keys/keyctl.c:122 [inline]
           SyS_add_key+0xe8/0x290 security/keys/keyctl.c:62
           entry_SYSCALL_64_fastpath+0x1f/0x96
      
      Fixes: 5a7de973 ("crypto: rsa - return raw integers for the ASN.1 parser")
      Cc: Tudor Ambarus <tudor-dan.ambarus@nxp.com>
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Reviewed-by: default avatarJames Morris <james.l.morris@oracle.com>
      Reviewed-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      80dbdc5a
    • Eric Biggers's avatar
      crypto: algif_aead - fix reference counting of null skcipher · 96c2dfae
      Eric Biggers authored
      commit b32a7dc8 upstream.
      
      In the AEAD interface for AF_ALG, the reference to the "null skcipher"
      held by each tfm was being dropped in the wrong place -- when each
      af_alg_ctx was freed instead of when the aead_tfm was freed.  As
      discovered by syzkaller, a specially crafted program could use this to
      cause the null skcipher to be freed while it is still in use.
      
      Fix it by dropping the reference in the right place.
      
      Fixes: 72548b09 ("crypto: algif_aead - copy AAD from src to dst")
      Reported-by: default avatarsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: default avatarEric Biggers <ebiggers@google.com>
      Reviewed-by: default avatarStephan Mueller <smueller@chronox.de>
      Signed-off-by: default avatarHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      96c2dfae
    • Martin Kaiser's avatar
      mfd: fsl-imx25: Clean up irq settings during removal · 3d27b022
      Martin Kaiser authored
      commit 18f77393 upstream.
      
      When fsl-imx25-tsadc is compiled as a module, loading, unloading and
      reloading the module will lead to a crash.
      
      Unable to handle kernel paging request at virtual address bf005430
      [<c004df6c>] (irq_find_matching_fwspec)
         from [<c028d5ec>] (of_irq_get+0x58/0x74)
      [<c028d594>] (of_irq_get)
         from [<c01ff970>] (platform_get_irq+0x48/0xc8)
      [<c01ff928>] (platform_get_irq)
         from [<bf00e33c>] (mx25_tsadc_probe+0x220/0x2f4 [fsl_imx25_tsadc])
      
      irq_find_matching_fwspec() loops over all registered irq domains. The
      irq domain is still registered from last time the module was loaded but
      the pointer to its operations is invalid after the module was unloaded.
      
      Add a removal function which clears the irq handler and removes the irq
      domain. With this cleanup in place, it's possible to unload and reload
      the module.
      Signed-off-by: default avatarMartin Kaiser <martin@kaiser.cx>
      Reviewed-by: default avatarLucas Stach <l.stach@pengutronix.de>
      Signed-off-by: default avatarLee Jones <lee.jones@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      3d27b022
  2. 17 Dec, 2017 37 commits
    • Greg Kroah-Hartman's avatar
      Linux 4.14.7 · 3afae843
      Greg Kroah-Hartman authored
      3afae843
    • Mauro Carvalho Chehab's avatar
      dvb_frontend: don't use-after-free the frontend struct · 7bc8eb30
      Mauro Carvalho Chehab authored
      commit b1cb7372 upstream.
      
      dvb_frontend_invoke_release() may free the frontend struct.
      So, the free logic can't update it anymore after calling it.
      
      That's OK, as __dvb_frontend_free() is called only when the
      krefs are zeroed, so nobody is using it anymore.
      
      That should fix the following KASAN error:
      
      The KASAN report looks like this (running on kernel 3e0cc09a (4.14-rc5+)):
      ==================================================================
      BUG: KASAN: use-after-free in __dvb_frontend_free+0x113/0x120
      Write of size 8 at addr ffff880067d45a00 by task kworker/0:1/24
      
      CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 4.14.0-rc5-43687-g06ab8a23e0e6 #545
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Workqueue: usb_hub_wq hub_event
      Call Trace:
       __dump_stack lib/dump_stack.c:16
       dump_stack+0x292/0x395 lib/dump_stack.c:52
       print_address_description+0x78/0x280 mm/kasan/report.c:252
       kasan_report_error mm/kasan/report.c:351
       kasan_report+0x23d/0x350 mm/kasan/report.c:409
       __asan_report_store8_noabort+0x1c/0x20 mm/kasan/report.c:435
       __dvb_frontend_free+0x113/0x120 drivers/media/dvb-core/dvb_frontend.c:156
       dvb_frontend_put+0x59/0x70 drivers/media/dvb-core/dvb_frontend.c:176
       dvb_frontend_detach+0x120/0x150 drivers/media/dvb-core/dvb_frontend.c:2803
       dvb_usb_adapter_frontend_exit+0xd6/0x160 drivers/media/usb/dvb-usb/dvb-usb-dvb.c:340
       dvb_usb_adapter_exit drivers/media/usb/dvb-usb/dvb-usb-init.c:116
       dvb_usb_exit+0x9b/0x200 drivers/media/usb/dvb-usb/dvb-usb-init.c:132
       dvb_usb_device_exit+0xa5/0xf0 drivers/media/usb/dvb-usb/dvb-usb-init.c:295
       usb_unbind_interface+0x21c/0xa90 drivers/usb/core/driver.c:423
       __device_release_driver drivers/base/dd.c:861
       device_release_driver_internal+0x4f1/0x5c0 drivers/base/dd.c:893
       device_release_driver+0x1e/0x30 drivers/base/dd.c:918
       bus_remove_device+0x2f4/0x4b0 drivers/base/bus.c:565
       device_del+0x5c4/0xab0 drivers/base/core.c:1985
       usb_disable_device+0x1e9/0x680 drivers/usb/core/message.c:1170
       usb_disconnect+0x260/0x7a0 drivers/usb/core/hub.c:2124
       hub_port_connect drivers/usb/core/hub.c:4754
       hub_port_connect_change drivers/usb/core/hub.c:5009
       port_event drivers/usb/core/hub.c:5115
       hub_event+0x1318/0x3740 drivers/usb/core/hub.c:5195
       process_one_work+0xc73/0x1d90 kernel/workqueue.c:2119
       worker_thread+0x221/0x1850 kernel/workqueue.c:2253
       kthread+0x363/0x440 kernel/kthread.c:231
       ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
      
      Allocated by task 24:
       save_stack_trace+0x1b/0x20 arch/x86/kernel/stacktrace.c:59
       save_stack+0x43/0xd0 mm/kasan/kasan.c:447
       set_track mm/kasan/kasan.c:459
       kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
       kmem_cache_alloc_trace+0x11e/0x2d0 mm/slub.c:2772
       kmalloc ./include/linux/slab.h:493
       kzalloc ./include/linux/slab.h:666
       dtt200u_fe_attach+0x4c/0x110 drivers/media/usb/dvb-usb/dtt200u-fe.c:212
       dtt200u_frontend_attach+0x35/0x80 drivers/media/usb/dvb-usb/dtt200u.c:136
       dvb_usb_adapter_frontend_init+0x32b/0x660 drivers/media/usb/dvb-usb/dvb-usb-dvb.c:286
       dvb_usb_adapter_init drivers/media/usb/dvb-usb/dvb-usb-init.c:86
       dvb_usb_init drivers/media/usb/dvb-usb/dvb-usb-init.c:162
       dvb_usb_device_init+0xf73/0x17f0 drivers/media/usb/dvb-usb/dvb-usb-init.c:277
       dtt200u_usb_probe+0xa1/0xe0 drivers/media/usb/dvb-usb/dtt200u.c:155
       usb_probe_interface+0x35d/0x8e0 drivers/usb/core/driver.c:361
       really_probe drivers/base/dd.c:413
       driver_probe_device+0x610/0xa00 drivers/base/dd.c:557
       __device_attach_driver+0x230/0x290 drivers/base/dd.c:653
       bus_for_each_drv+0x161/0x210 drivers/base/bus.c:463
       __device_attach+0x26b/0x3c0 drivers/base/dd.c:710
       device_initial_probe+0x1f/0x30 drivers/base/dd.c:757
       bus_probe_device+0x1eb/0x290 drivers/base/bus.c:523
       device_add+0xd0b/0x1660 drivers/base/core.c:1835
       usb_set_configuration+0x104e/0x1870 drivers/usb/core/message.c:1932
       generic_probe+0x73/0xe0 drivers/usb/core/generic.c:174
       usb_probe_device+0xaf/0xe0 drivers/usb/core/driver.c:266
       really_probe drivers/base/dd.c:413
       driver_probe_device+0x610/0xa00 drivers/base/dd.c:557
       __device_attach_driver+0x230/0x290 drivers/base/dd.c:653
       bus_for_each_drv+0x161/0x210 drivers/base/bus.c:463
       __device_attach+0x26b/0x3c0 drivers/base/dd.c:710
       device_initial_probe+0x1f/0x30 drivers/base/dd.c:757
       bus_probe_device+0x1eb/0x290 drivers/base/bus.c:523
       device_add+0xd0b/0x1660 drivers/base/core.c:1835
       usb_new_device+0x7b8/0x1020 drivers/usb/core/hub.c:2457
       hub_port_connect drivers/usb/core/hub.c:4903
       hub_port_connect_change drivers/usb/core/hub.c:5009
       port_event drivers/usb/core/hub.c:5115
       hub_event+0x194d/0x3740 drivers/usb/core/hub.c:5195
       process_one_work+0xc73/0x1d90 kernel/workqueue.c:2119
       worker_thread+0x221/0x1850 kernel/workqueue.c:2253
       kthread+0x363/0x440 kernel/kthread.c:231
       ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
      
      Freed by task 24:
       save_stack_trace+0x1b/0x20 arch/x86/kernel/stacktrace.c:59
       save_stack+0x43/0xd0 mm/kasan/kasan.c:447
       set_track mm/kasan/kasan.c:459
       kasan_slab_free+0x72/0xc0 mm/kasan/kasan.c:524
       slab_free_hook mm/slub.c:1390
       slab_free_freelist_hook mm/slub.c:1412
       slab_free mm/slub.c:2988
       kfree+0xf6/0x2f0 mm/slub.c:3919
       dtt200u_fe_release+0x3c/0x50 drivers/media/usb/dvb-usb/dtt200u-fe.c:202
       dvb_frontend_invoke_release.part.13+0x1c/0x30 drivers/media/dvb-core/dvb_frontend.c:2790
       dvb_frontend_invoke_release drivers/media/dvb-core/dvb_frontend.c:2789
       __dvb_frontend_free+0xad/0x120 drivers/media/dvb-core/dvb_frontend.c:153
       dvb_frontend_put+0x59/0x70 drivers/media/dvb-core/dvb_frontend.c:176
       dvb_frontend_detach+0x120/0x150 drivers/media/dvb-core/dvb_frontend.c:2803
       dvb_usb_adapter_frontend_exit+0xd6/0x160 drivers/media/usb/dvb-usb/dvb-usb-dvb.c:340
       dvb_usb_adapter_exit drivers/media/usb/dvb-usb/dvb-usb-init.c:116
       dvb_usb_exit+0x9b/0x200 drivers/media/usb/dvb-usb/dvb-usb-init.c:132
       dvb_usb_device_exit+0xa5/0xf0 drivers/media/usb/dvb-usb/dvb-usb-init.c:295
       usb_unbind_interface+0x21c/0xa90 drivers/usb/core/driver.c:423
       __device_release_driver drivers/base/dd.c:861
       device_release_driver_internal+0x4f1/0x5c0 drivers/base/dd.c:893
       device_release_driver+0x1e/0x30 drivers/base/dd.c:918
       bus_remove_device+0x2f4/0x4b0 drivers/base/bus.c:565
       device_del+0x5c4/0xab0 drivers/base/core.c:1985
       usb_disable_device+0x1e9/0x680 drivers/usb/core/message.c:1170
       usb_disconnect+0x260/0x7a0 drivers/usb/core/hub.c:2124
       hub_port_connect drivers/usb/core/hub.c:4754
       hub_port_connect_change drivers/usb/core/hub.c:5009
       port_event drivers/usb/core/hub.c:5115
       hub_event+0x1318/0x3740 drivers/usb/core/hub.c:5195
       process_one_work+0xc73/0x1d90 kernel/workqueue.c:2119
       worker_thread+0x221/0x1850 kernel/workqueue.c:2253
       kthread+0x363/0x440 kernel/kthread.c:231
       ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431
      
      The buggy address belongs to the object at ffff880067d45500
       which belongs to the cache kmalloc-2048 of size 2048
      The buggy address is located 1280 bytes inside of
       2048-byte region [ffff880067d45500, ffff880067d45d00)
      The buggy address belongs to the page:
      page:ffffea00019f5000 count:1 mapcount:0 mapping:          (null)
      index:0x0 compound_mapcount: 0
      flags: 0x100000000008100(slab|head)
      raw: 0100000000008100 0000000000000000 0000000000000000 00000001000f000f
      raw: dead000000000100 dead000000000200 ffff88006c002d80 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff880067d45900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff880067d45980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff880067d45a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                         ^
       ffff880067d45a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff880067d45b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      ==================================================================
      
      Fixes: ead66600 ("media: dvb_frontend: only use kref after initialized")
      Reported-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Suggested-by: default avatarMatthias Schwarzott <zzam@gentoo.org>
      Tested-by: default avatarAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@s-opensource.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7bc8eb30
    • Daniel Scheller's avatar
      media: dvb-core: always call invoke_release() in fe_free() · ce344631
      Daniel Scheller authored
      commit 62229de1 upstream.
      
      Follow-up to: ead66600 ("media: dvb_frontend: only use kref after initialized")
      
      The aforementioned commit fixed refcount OOPSes when demod driver attaching
      succeeded but tuner driver didn't. However, the use count of the attached
      demod drivers don't go back to zero and thus couldn't be cleanly unloaded.
      Improve on this by calling dvb_frontend_invoke_release() in
      __dvb_frontend_free() regardless of fepriv being NULL, instead of returning
      when fepriv is NULL. This is safe to do since _invoke_release() will check
      for passed pointers being valid before calling the .release() function.
      
      [mchehab@s-opensource.com: changed the logic a little bit to reduce
       conflicts with another bug fix patch under review]
      Fixes: ead66600 ("media: dvb_frontend: only use kref after initialized")
      Signed-off-by: default avatarDaniel Scheller <d.scheller@gmx.net>
      Signed-off-by: default avatarMauro Carvalho Chehab <mchehab@s-opensource.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ce344631
    • Reinette Chatre's avatar
      x86/intel_rdt: Fix potential deadlock during resctrl unmount · 93e2d845
      Reinette Chatre authored
      
      [ Upstream commit 36b6f9fc ]
      
      Lockdep warns about a potential deadlock:
      
      [   66.782842] ======================================================
      [   66.782888] WARNING: possible circular locking dependency detected
      [   66.782937] 4.14.0-rc2-test-test+ #48 Not tainted
      [   66.782983] ------------------------------------------------------
      [   66.783052] umount/336 is trying to acquire lock:
      [   66.783117]  (cpu_hotplug_lock.rw_sem){++++}, at: [<ffffffff81032395>] rdt_kill_sb+0x215/0x390
      [   66.783193]
                     but task is already holding lock:
      [   66.783244]  (rdtgroup_mutex){+.+.}, at: [<ffffffff810321b6>] rdt_kill_sb+0x36/0x390
      [   66.783305]
                     which lock already depends on the new lock.
      
      [   66.783364]
                     the existing dependency chain (in reverse order) is:
      [   66.783419]
                     -> #3 (rdtgroup_mutex){+.+.}:
      [   66.783467]        __lock_acquire+0x1293/0x13f0
      [   66.783509]        lock_acquire+0xaf/0x220
      [   66.783543]        __mutex_lock+0x71/0x9b0
      [   66.783575]        mutex_lock_nested+0x1b/0x20
      [   66.783610]        intel_rdt_online_cpu+0x3b/0x430
      [   66.783649]        cpuhp_invoke_callback+0xab/0x8e0
      [   66.783687]        cpuhp_thread_fun+0x7a/0x150
      [   66.783722]        smpboot_thread_fn+0x1cc/0x270
      [   66.783764]        kthread+0x16e/0x190
      [   66.783794]        ret_from_fork+0x27/0x40
      [   66.783825]
                     -> #2 (cpuhp_state){+.+.}:
      [   66.783870]        __lock_acquire+0x1293/0x13f0
      [   66.783906]        lock_acquire+0xaf/0x220
      [   66.783938]        cpuhp_issue_call+0x102/0x170
      [   66.783974]        __cpuhp_setup_state_cpuslocked+0x154/0x2a0
      [   66.784023]        __cpuhp_setup_state+0xc7/0x170
      [   66.784061]        page_writeback_init+0x43/0x67
      [   66.784097]        pagecache_init+0x43/0x4a
      [   66.784131]        start_kernel+0x3ad/0x3f7
      [   66.784165]        x86_64_start_reservations+0x2a/0x2c
      [   66.784204]        x86_64_start_kernel+0x72/0x75
      [   66.784241]        verify_cpu+0x0/0xfb
      [   66.784270]
                     -> #1 (cpuhp_state_mutex){+.+.}:
      [   66.784319]        __lock_acquire+0x1293/0x13f0
      [   66.784355]        lock_acquire+0xaf/0x220
      [   66.784387]        __mutex_lock+0x71/0x9b0
      [   66.784419]        mutex_lock_nested+0x1b/0x20
      [   66.784454]        __cpuhp_setup_state_cpuslocked+0x52/0x2a0
      [   66.784497]        __cpuhp_setup_state+0xc7/0x170
      [   66.784535]        page_alloc_init+0x28/0x30
      [   66.784569]        start_kernel+0x148/0x3f7
      [   66.784602]        x86_64_start_reservations+0x2a/0x2c
      [   66.784642]        x86_64_start_kernel+0x72/0x75
      [   66.784678]        verify_cpu+0x0/0xfb
      [   66.784707]
                     -> #0 (cpu_hotplug_lock.rw_sem){++++}:
      [   66.784759]        check_prev_add+0x32f/0x6e0
      [   66.784794]        __lock_acquire+0x1293/0x13f0
      [   66.784830]        lock_acquire+0xaf/0x220
      [   66.784863]        cpus_read_lock+0x3d/0xb0
      [   66.784896]        rdt_kill_sb+0x215/0x390
      [   66.784930]        deactivate_locked_super+0x3e/0x70
      [   66.784968]        deactivate_super+0x40/0x60
      [   66.785003]        cleanup_mnt+0x3f/0x80
      [   66.785034]        __cleanup_mnt+0x12/0x20
      [   66.785070]        task_work_run+0x8b/0xc0
      [   66.785103]        exit_to_usermode_loop+0x94/0xa0
      [   66.786804]        syscall_return_slowpath+0xe8/0x150
      [   66.788502]        entry_SYSCALL_64_fastpath+0xab/0xad
      [   66.790194]
                     other info that might help us debug this:
      
      [   66.795139] Chain exists of:
                       cpu_hotplug_lock.rw_sem --> cpuhp_state --> rdtgroup_mutex
      
      [   66.800035]  Possible unsafe locking scenario:
      
      [   66.803267]        CPU0                    CPU1
      [   66.804867]        ----                    ----
      [   66.806443]   lock(rdtgroup_mutex);
      [   66.808002]                                lock(cpuhp_state);
      [   66.809565]                                lock(rdtgroup_mutex);
      [   66.811110]   lock(cpu_hotplug_lock.rw_sem);
      [   66.812608]
                      *** DEADLOCK ***
      
      [   66.816983] 2 locks held by umount/336:
      [   66.818418]  #0:  (&type->s_umount_key#35){+.+.}, at: [<ffffffff81229738>] deactivate_super+0x38/0x60
      [   66.819922]  #1:  (rdtgroup_mutex){+.+.}, at: [<ffffffff810321b6>] rdt_kill_sb+0x36/0x390
      
      When the resctrl filesystem is unmounted the locks should be obtain in the
      locks in the same order as was done when the cpus came online:
      
            cpu_hotplug_lock before rdtgroup_mutex.
      
      This also requires to switch the static_branch_disable() calls to the
      _cpulocked variant because now cpu hotplug lock is held already.
      
      [ tglx: Switched to cpus_read_[un]lock ]
      Signed-off-by: default avatarReinette Chatre <reinette.chatre@intel.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Tested-by: default avatarSai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Acked-by: default avatarVikas Shivappa <vikas.shivappa@linux.intel.com>
      Acked-by: default avatarFenghua Yu <fenghua.yu@intel.com>
      Acked-by: default avatarTony Luck <tony.luck@intel.com>
      Link: https://lkml.kernel.org/r/cc292e76be073f7260604651711c47b09fd0dc81.1508490116.git.reinette.chatre@intel.comSigned-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      93e2d845
    • Leon Romanovsky's avatar
      RDMA/cxgb4: Annotate r2 and stag as __be32 · f0a7d7b4
      Leon Romanovsky authored
      
      [ Upstream commit 7d7d065a ]
      
      Chelsio cxgb4 HW is big-endian, hence there is need to properly
      annotate r2 and stag fields as __be32 and not __u32 to fix the
      following sparse warnings.
      
        drivers/infiniband/hw/cxgb4/qp.c:614:16:
          warning: incorrect type in assignment (different base types)
            expected unsigned int [unsigned] [usertype] r2
            got restricted __be32 [usertype] <noident>
        drivers/infiniband/hw/cxgb4/qp.c:615:18:
          warning: incorrect type in assignment (different base types)
            expected unsigned int [unsigned] [usertype] stag
            got restricted __be32 [usertype] <noident>
      
      Cc: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: default avatarLeon Romanovsky <leon@kernel.org>
      Reviewed-by: default avatarSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: default avatarDoug Ledford <dledford@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f0a7d7b4
    • Zdenek Kabelac's avatar
      md: free unused memory after bitmap resize · 89a459e0
      Zdenek Kabelac authored
      
      [ Upstream commit 0868b99c ]
      
      When bitmap is resized, the old kalloced chunks just are not released
      once the resized bitmap starts to use new space.
      
      This fixes in particular kmemleak reports like this one:
      
      unreferenced object 0xffff8f4311e9c000 (size 4096):
        comm "lvm", pid 19333, jiffies 4295263268 (age 528.265s)
        hex dump (first 32 bytes):
          02 80 02 80 02 80 02 80 02 80 02 80 02 80 02 80  ................
          02 80 02 80 02 80 02 80 02 80 02 80 02 80 02 80  ................
        backtrace:
          [<ffffffffa69471ca>] kmemleak_alloc+0x4a/0xa0
          [<ffffffffa628c10e>] kmem_cache_alloc_trace+0x14e/0x2e0
          [<ffffffffa676cfec>] bitmap_checkpage+0x7c/0x110
          [<ffffffffa676d0c5>] bitmap_get_counter+0x45/0xd0
          [<ffffffffa676d6b3>] bitmap_set_memory_bits+0x43/0xe0
          [<ffffffffa676e41c>] bitmap_init_from_disk+0x23c/0x530
          [<ffffffffa676f1ae>] bitmap_load+0xbe/0x160
          [<ffffffffc04c47d3>] raid_preresume+0x203/0x2f0 [dm_raid]
          [<ffffffffa677762f>] dm_table_resume_targets+0x4f/0xe0
          [<ffffffffa6774b52>] dm_resume+0x122/0x140
          [<ffffffffa6779b9f>] dev_suspend+0x18f/0x290
          [<ffffffffa677a3a7>] ctl_ioctl+0x287/0x560
          [<ffffffffa677a693>] dm_ctl_ioctl+0x13/0x20
          [<ffffffffa62d6b46>] do_vfs_ioctl+0xa6/0x750
          [<ffffffffa62d7269>] SyS_ioctl+0x79/0x90
          [<ffffffffa6956d41>] entry_SYSCALL_64_fastpath+0x1f/0xc2
      Signed-off-by: default avatarZdenek Kabelac <zkabelac@redhat.com>
      Signed-off-by: default avatarShaohua Li <shli@fb.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      89a459e0
    • Heinz Mauelshagen's avatar
      dm raid: fix panic when attempting to force a raid to sync · 2c727856
      Heinz Mauelshagen authored
      
      [ Upstream commit 23397844 ]
      
      Requesting a sync on an active raid device via a table reload
      (see 'sync' parameter in Documentation/device-mapper/dm-raid.txt)
      skips the super_load() call that defines the superblock size
      (rdev->sb_size) -- resulting in an oops if/when super_sync()->memset()
      is called.
      
      Fix by moving the initialization of the superblock start and size
      out of super_load() to the caller (analyse_superblocks).
      Signed-off-by: default avatarHeinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2c727856
    • Paul Moore's avatar
      audit: ensure that 'audit=1' actually enables audit for PID 1 · 0ad0bb60
      Paul Moore authored
      
      [ Upstream commit 173743dd ]
      
      Prior to this patch we enabled audit in audit_init(), which is too
      late for PID 1 as the standard initcalls are run after the PID 1 task
      is forked.  This means that we never allocate an audit_context (see
      audit_alloc()) for PID 1 and therefore miss a lot of audit events
      generated by PID 1.
      
      This patch enables audit as early as possible to help ensure that when
      PID 1 is forked it can allocate an audit_context if required.
      Reviewed-by: default avatarRichard Guy Briggs <rgb@redhat.com>
      Signed-off-by: default avatarPaul Moore <paul@paul-moore.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0ad0bb60
    • Steve Grubb's avatar
      audit: Allow auditd to set pid to 0 to end auditing · 4086f7cf
      Steve Grubb authored
      
      [ Upstream commit 33e8a907 ]
      
      The API to end auditing has historically been for auditd to set the
      pid to 0. This patch restores that functionality.
      
      See: https://github.com/linux-audit/audit-kernel/issues/69Reviewed-by: default avatarRichard Guy Briggs <rgb@redhat.com>
      Signed-off-by: default avatarSteve Grubb <sgrubb@redhat.com>
      Signed-off-by: default avatarPaul Moore <paul@paul-moore.com>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4086f7cf
    • Israel Rukshin's avatar
      nvmet-rdma: update queue list during ib_device removal · 7536280f
      Israel Rukshin authored
      
      [ Upstream commit 43b92fd2 ]
      
      A NULL deref happens when nvmet_rdma_remove_one() is called more than once
      (e.g. while connected via 2 ports).
      The first call frees the queues related to the first ib_device but
      doesn't remove them from the queue list.
      While calling nvmet_rdma_remove_one() for the second ib_device it goes over
      the full queue list again and we get the NULL deref.
      
      Fixes: f1d4ef7d ("nvmet-rdma: register ib_client to not deadlock in device removal")
      Signed-off-by: default avatarIsrael Rukshin <israelr@mellanox.com>
      Reviewed-by: default avatarMax Gurtovoy <maxg@mellanox.com>
      Reviewed-by: default avatarSagi Grimberg <sagi@grmberg.me>
      Signed-off-by: default avatarChristoph Hellwig <hch@lst.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7536280f
    • Bart Van Assche's avatar
      blk-mq: Avoid that request queue removal can trigger list corruption · a4000d95
      Bart Van Assche authored
      
      [ Upstream commit aba7afc5 ]
      
      Avoid that removal of a request queue sporadically triggers the
      following warning:
      
      list_del corruption. next->prev should be ffff8807d649b970, but was 6b6b6b6b6b6b6b6b
      WARNING: CPU: 3 PID: 342 at lib/list_debug.c:56 __list_del_entry_valid+0x92/0xa0
      Call Trace:
       process_one_work+0x11b/0x660
       worker_thread+0x3d/0x3b0
       kthread+0x129/0x140
       ret_from_fork+0x27/0x40
      Signed-off-by: default avatarBart Van Assche <bart.vanassche@wdc.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a4000d95
    • Hongxu Jia's avatar
      ide: ide-atapi: fix compile error with defining macro DEBUG · bd099ef9
      Hongxu Jia authored
      
      [ Upstream commit 8dc7a31f ]
      
      Compile ide-atapi failed with defining macro "DEBUG"
      ...
      |drivers/ide/ide-atapi.c:285:52: error: 'struct request' has
      no member named 'cmd'; did you mean 'csd'?
      |  debug_log("%s: rq->cmd[0]: 0x%x\n", __func__, rq->cmd[0]);
      ...
      
      Since we split the scsi_request out of struct request, it missed
      do the same thing on debug_log
      
      Fixes: 82ed4db4 ("block: split scsi_request out of struct request")
      Signed-off-by: default avatarHongxu Jia <hongxu.jia@windriver.com>
      Signed-off-by: default avatarJens Axboe <axboe@kernel.dk>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bd099ef9
    • Keefe Liu's avatar
      ipvlan: fix ipv6 outbound device · 7015ca81
      Keefe Liu authored
      
      [ Upstream commit ca29fd7c ]
      
      When process the outbound packet of ipv6, we should assign the master
      device to output device other than input device.
      Signed-off-by: default avatarKeefe Liu <liuqifa@huawei.com>
      Acked-by: default avatarMahesh Bandewar <maheshb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7015ca81
    • Vaidyanathan Srinivasan's avatar
      powerpc/powernv/idle: Round up latency and residency values · 854b27ed
      Vaidyanathan Srinivasan authored
      
      [ Upstream commit 8d4e10e9 ]
      
      On PowerNV platforms, firmware provides exit latency and
      target residency for each of the idle states in nano
      seconds.  Cpuidle framework expects the values in micro
      seconds.  Round up to nearest micro seconds to avoid errors
      in cases where the values are defined as fractional micro
      seconds.
      
      Default idle state of 'snooze' has exit latency of zero.  If
      other states have fractional micro second exit latency, they
      would get rounded down to zero micro second and make cpuidle
      framework choose deeper idle state when snooze loop is the
      right choice.
      Reported-by: default avatarAnton Blanchard <anton@samba.org>
      Signed-off-by: default avatarVaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
      Reviewed-by: default avatarGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: default avatarMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      854b27ed
    • Masahiro Yamada's avatar
      kbuild: do not call cc-option before KBUILD_CFLAGS initialization · b0c08c89
      Masahiro Yamada authored
      
      [ Upstream commit 433dc2eb ]
      
      Some $(call cc-option,...) are invoked very early, even before
      KBUILD_CFLAGS, etc. are initialized.
      
      The returned string from $(call cc-option,...) depends on
      KBUILD_CPPFLAGS, KBUILD_CFLAGS, and GCC_PLUGINS_CFLAGS.
      
      Since they are exported, they are not empty when the top Makefile
      is recursively invoked.
      
      The recursion occurs in several places.  For example, the top
      Makefile invokes itself for silentoldconfig.  "make tinyconfig",
      "make rpm-pkg" are the cases, too.
      
      In those cases, the second call of cc-option from the same line
      runs a different shell command due to non-pristine KBUILD_CFLAGS.
      
      To get the same result all the time, KBUILD_* and GCC_PLUGINS_CFLAGS
      must be initialized before any call of cc-option.  This avoids
      garbage data in the .cache.mk file.
      
      Move all calls of cc-option below the config targets because target
      compiler flags are unnecessary for Kconfig.
      Signed-off-by: default avatarMasahiro Yamada <yamada.masahiro@socionext.com>
      Reviewed-by: default avatarDouglas Anderson <dianders@chromium.org>
      Signed-off-by: default avatarSasha Levin <alexander.levin@verizon.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b0c08c89
    • Marc Zyngier's avatar
      KVM: arm/arm64: vgic-its: Preserve the revious read from the pending table · d7a71904
      Marc Zyngier authored
      commit 64afe6e9 upstream.
      
      The current pending table parsing code assumes that we keep the
      previous read of the pending bits, but keep that variable in
      the current block, making sure it is discarded on each loop.
      
      We end-up using whatever is on the stack. Who knows, it might
      just be the right thing...
      
      Fixes: 33d3bc95 ("KVM: arm64: vgic-its: Read initial LPI pending table")
      Reported-by: default avatarAKASHI Takahiro <takahiro.akashi@linaro.org>
      Reviewed-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: default avatarMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: default avatarChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d7a71904
    • Al Viro's avatar
      fix kcm_clone() · e587b76e
      Al Viro authored
      commit a5739435 upstream.
      
      1) it's fput() or sock_release(), not both
      2) don't do fd_install() until the last failure exit.
      3) not a bug per se, but... don't attach socket to struct file
         until it's set up.
      
      Take reserving descriptor into the caller, move fd_install() to the
      caller, sanitize failure exits and calling conventions.
      Acked-by: default avatarTom Herbert <tom@herbertland.com>
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e587b76e
    • Jeff Layton's avatar
      fcntl: don't cap l_start and l_end values for F_GETLK64 in compat syscall · fc2f8021
      Jeff Layton authored
      commit 4d2dc2cc upstream.
      
      Currently, we're capping the values too low in the F_GETLK64 case. The
      fields in that structure are 64-bit values, so we shouldn't need to do
      any sort of fixup there.
      
      Make sure we check that assumption at build time in the future however
      by ensuring that the sizes we're copying will fit.
      
      With this, we no longer need COMPAT_LOFF_T_MAX either, so remove it.
      
      Fixes: 94073ad7 (fs/locks: don't mess with the address limit in compat_fcntl64)
      Reported-by: default avatarVitaly Lipatov <lav@etersoft.ru>
      Signed-off-by: default avatarJeff Layton <jlayton@redhat.com>
      Reviewed-by: default avatarDavid Howells <dhowells@redhat.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fc2f8021
    • Vincent Pelletier's avatar
      usb: gadget: ffs: Forbid usb_ep_alloc_request from sleeping · 4b575421
      Vincent Pelletier authored
      commit 30bf90cc upstream.
      
      Found using DEBUG_ATOMIC_SLEEP while submitting an AIO read operation:
      
      [  100.853642] BUG: sleeping function called from invalid context at mm/slab.h:421
      [  100.861148] in_atomic(): 1, irqs_disabled(): 1, pid: 1880, name: python
      [  100.867954] 2 locks held by python/1880:
      [  100.867961]  #0:  (&epfile->mutex){....}, at: [<f8188627>] ffs_mutex_lock+0x27/0x30 [usb_f_fs]
      [  100.868020]  #1:  (&(&ffs->eps_lock)->rlock){....}, at: [<f818ad4b>] ffs_epfile_io.isra.17+0x24b/0x590 [usb_f_fs]
      [  100.868076] CPU: 1 PID: 1880 Comm: python Not tainted 4.14.0-edison+ #118
      [  100.868085] Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48
      [  100.868093] Call Trace:
      [  100.868122]  dump_stack+0x47/0x62
      [  100.868156]  ___might_sleep+0xfd/0x110
      [  100.868182]  __might_sleep+0x68/0x70
      [  100.868217]  kmem_cache_alloc_trace+0x4b/0x200
      [  100.868248]  ? dwc3_gadget_ep_alloc_request+0x24/0xe0 [dwc3]
      [  100.868302]  dwc3_gadget_ep_alloc_request+0x24/0xe0 [dwc3]
      [  100.868343]  usb_ep_alloc_request+0x16/0xc0 [udc_core]
      [  100.868386]  ffs_epfile_io.isra.17+0x444/0x590 [usb_f_fs]
      [  100.868424]  ? _raw_spin_unlock_irqrestore+0x27/0x40
      [  100.868457]  ? kiocb_set_cancel_fn+0x57/0x60
      [  100.868477]  ? ffs_ep0_poll+0xc0/0xc0 [usb_f_fs]
      [  100.868512]  ffs_epfile_read_iter+0xfe/0x157 [usb_f_fs]
      [  100.868551]  ? security_file_permission+0x9c/0xd0
      [  100.868587]  ? rw_verify_area+0xac/0x120
      [  100.868633]  aio_read+0x9d/0x100
      [  100.868692]  ? __fget+0xa2/0xd0
      [  100.868727]  ? __might_sleep+0x68/0x70
      [  100.868763]  SyS_io_submit+0x471/0x680
      [  100.868878]  do_int80_syscall_32+0x4e/0xd0
      [  100.868921]  entry_INT80_32+0x2a/0x2a
      [  100.868932] EIP: 0xb7fbb676
      [  100.868941] EFLAGS: 00000292 CPU: 1
      [  100.868951] EAX: ffffffda EBX: b7aa2000 ECX: 00000002 EDX: b7af8368
      [  100.868961] ESI: b7fbb660 EDI: b7aab000 EBP: bfb6c658 ESP: bfb6c638
      [  100.868973]  DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b
      Signed-off-by: default avatarVincent Pelletier <plr.vincent@gmail.com>
      Signed-off-by: default avatarFelipe Balbi <felipe.balbi@linux.intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4b575421
    • Masamitsu Yamazaki's avatar
      ipmi: Stop timers before cleaning up the module · 7a33cd0e
      Masamitsu Yamazaki authored
      commit 4f7f5551 upstream.
      
      System may crash after unloading ipmi_si.ko module
      because a timer may remain and fire after the module cleaned up resources.
      
      cleanup_one_si() contains the following processing.
      
              /*
               * Make sure that interrupts, the timer and the thread are
               * stopped and will not run again.
               */
              if (to_clean->irq_cleanup)
                      to_clean->irq_cleanup(to_clean);
              wait_for_timer_and_thread(to_clean);
      
              /*
               * Timeouts are stopped, now make sure the interrupts are off
               * in the BMC.  Note that timers and CPU interrupts are off,
               * so no need for locks.
               */
              while (to_clean->curr_msg || (to_clean->si_state != SI_NORMAL)) {
                      poll(to_clean);
                      schedule_timeout_uninterruptible(1);
              }
      
      si_state changes as following in the while loop calling poll(to_clean).
      
        SI_GETTING_MESSAGES
          => SI_CHECKING_ENABLES
           => SI_SETTING_ENABLES
            => SI_GETTING_EVENTS
             => SI_NORMAL
      
      As written in the code comments above,
      timers are expected to stop before the polling loop and not to run again.
      But the timer is set again in the following process
      when si_state becomes SI_SETTING_ENABLES.
      
        => poll
           => smi_event_handler
             => handle_transaction_done
                // smi_info->si_state == SI_SETTING_ENABLES
               => start_getting_events
                 => start_new_msg
                  => smi_mod_timer
                    => mod_timer
      
      As a result, before the timer set in start_new_msg() expires,
      the polling loop may see si_state becoming SI_NORMAL
      and the module clean-up finishes.
      
      For example, hard LOCKUP and panic occurred as following.
      smi_timeout was called after smi_event_handler,
      kcs_event and hangs at port_inb()
      trying to access I/O port after release.
      
          [exception RIP: port_inb+19]
          RIP: ffffffffc0473053  RSP: ffff88069fdc3d80  RFLAGS: 00000006
          RAX: ffff8806800f8e00  RBX: ffff880682bd9400  RCX: 0000000000000000
          RDX: 0000000000000ca3  RSI: 0000000000000ca3  RDI: ffff8806800f8e40
          RBP: ffff88069fdc3d80   R8: ffffffff81d86dfc   R9: ffffffff81e36426
          R10: 00000000000509f0  R11: 0000000000100000  R12: 0000000000]:000000
          R13: 0000000000000000  R14: 0000000000000246  R15: ffff8806800f8e00
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
       --- <NMI exception stack> ---
      
      To fix the problem I defined a flag, timer_can_start,
      as member of struct smi_info.
      The flag is enabled immediately after initializing the timer
      and disabled immediately before waiting for timer deletion.
      
      Fixes: 0cfec916 ("ipmi: Start the timer and thread on internal msgs")
      Signed-off-by: default avatarYamazaki Masamitsu <m-yamazaki@ah.jp.nec.com>
      [Some fairly major changes went into the IPMI driver in 4.15, so this
       required a backport as the code had changed and moved to a different
       file.]
      Signed-off-by: default avatarCorey Minyard <cminyard@mvista.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7a33cd0e
    • Xin Long's avatar
      sctp: use right member as the param of list_for_each_entry · 0c882334
      Xin Long authored
      
      [ Upstream commit a8dd3979 ]
      
      Commit d04adf1b ("sctp: reset owner sk for data chunks on out queues
      when migrating a sock") made a mistake that using 'list' as the param of
      list_for_each_entry to traverse the retransmit, sacked and abandoned
      queues, while chunks are using 'transmitted_list' to link into these
      queues.
      
      It could cause NULL dereference panic if there are chunks in any of these
      queues when peeling off one asoc.
      
      So use the chunk member 'transmitted_list' instead in this patch.
      
      Fixes: d04adf1b ("sctp: reset owner sk for data chunks on out queues when migrating a sock")
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Acked-by: default avatarMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: default avatarNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0c882334
    • Jakub Kicinski's avatar
      cls_bpf: don't decrement net's refcount when offload fails · 627a5956
      Jakub Kicinski authored
      
      [ Upstream commit 25415cec ]
      
      When cls_bpf offload was added it seemed like a good idea to
      call cls_bpf_delete_prog() instead of extending the error
      handling path, since the software state is fully initialized
      at that point.  This handling of errors without jumping to
      the end of the function is error prone, as proven by later
      commit missing that extra call to __cls_bpf_delete_prog().
      
      __cls_bpf_delete_prog() is now expected to be invoked with
      a reference on exts->net or the field zeroed out.  The call
      on the offload's error patch does not fullfil this requirement,
      leading to each error stealing a reference on net namespace.
      
      Create a function undoing what cls_bpf_set_parms() did and
      use it from __cls_bpf_delete_prog() and the error path.
      
      Fixes: aae2c35e ("cls_bpf: use tcf_exts_get_net() before call_rcu()")
      Signed-off-by: default avatarJakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: default avatarSimon Horman <simon.horman@netronome.com>
      Acked-by: default avatarDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: default avatarCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      627a5956
    • Gustavo A. R. Silva's avatar
      net: openvswitch: datapath: fix data type in queue_gso_packets · 87ff3fb3
      Gustavo A. R. Silva authored
      
      [ Upstream commit 2734166e ]
      
      gso_type is being used in binary AND operations together with SKB_GSO_UDP.
      The issue is that variable gso_type is of type unsigned short and
      SKB_GSO_UDP expands to more than 16 bits:
      
      SKB_GSO_UDP = 1 << 16
      
      this makes any binary AND operation between gso_type and SKB_GSO_UDP to
      be always zero, hence making some code unreachable and likely causing
      undesired behavior.
      
      Fix this by changing the data type of variable gso_type to unsigned int.
      
      Addresses-Coverity-ID: 1462223
      Fixes: 0c19f846 ("net: accept UFO datagrams from tuntap and packet")
      Signed-off-by: default avatarGustavo A. R. Silva <garsilva@embeddedor.com>
      Acked-by: default avatarWillem de Bruijn <willemb@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      87ff3fb3
    • Willem de Bruijn's avatar
      net: accept UFO datagrams from tuntap and packet · 60335608
      Willem de Bruijn authored
      
      [ Upstream commit 0c19f846 ]
      
      Tuntap and similar devices can inject GSO packets. Accept type
      VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively.
      
      Processes are expected to use feature negotiation such as TUNSETOFFLOAD
      to detect supported offload types and refrain from injecting other
      packets. This process breaks down with live migration: guest kernels
      do not renegotiate flags, so destination hosts need to expose all
      features that the source host does.
      
      Partially revert the UFO removal from 182e0b6b~1..d9d30adf.
      This patch introduces nearly(*) no new code to simplify verification.
      It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP
      insertion and software UFO segmentation.
      
      It does not reinstate protocol stack support, hardware offload
      (NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception
      of VIRTIO_NET_HDR_GSO_UDP packets in tuntap.
      
      To support SKB_GSO_UDP reappearing in the stack, also reinstate
      logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD
      by squashing in commit 93991221 ("net: skb_needs_check() removes
      CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee6
      ("net: avoid skb_warn_bad_offload false positives on UFO").
      
      (*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id,
      ipv6_proxy_select_ident is changed to return a __be32 and this is
      assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted
      at the end of the enum to minimize code churn.
      
      Tested
        Booted a v4.13 guest kernel with QEMU. On a host kernel before this
        patch `ethtool -k eth0` shows UFO disabled. After the patch, it is
        enabled, same as on a v4.13 host kernel.
      
        A UFO packet sent from the guest appears on the tap device:
          host:
            nc -l -p -u 8000 &
            tcpdump -n -i tap0
      
          guest:
            dd if=/dev/zero of=payload.txt bs=1 count=2000
            nc -u 192.16.1.1 8000 < payload.txt
      
        Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds,
        packets arriving fragmented:
      
          ./with_tap_pair.sh ./tap_send_ufo tap0 tap1
          (from https://github.com/wdebruij/kerneltools/tree/master/tests)
      
      Changes
        v1 -> v2
          - simplified set_offload change (review comment)
          - documented test procedure
      
      Link: http://lkml.kernel.org/r/<CAF=yD-LuUeDuL9YWPJD9ykOZ0QCjNeznPDr6whqZ9NGMNF12Mw@mail.gmail.com>
      Fixes: fb652fdf ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.")
      Reported-by: default avatarMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: default avatarWillem de Bruijn <willemb@google.com>
      Acked-by: default avatarJason Wang <jasowang@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      60335608
    • Xin Long's avatar
      tun: fix rcu_read_lock imbalance in tun_build_skb · 29d631e5
      Xin Long authored
      
      [ Upstream commit 654d5738 ]
      
      rcu_read_lock in tun_build_skb is used to rcu_dereference tun->xdp_prog
      safely, rcu_read_unlock should be done in every return path.
      
      Now I could see one place missing it, where it returns NULL in switch-case
      XDP_REDIRECT,  another palce using rcu_read_lock wrongly, where it returns
      NULL in if (xdp_xmit) chunk.
      
      So fix both in this patch.
      
      Fixes: 761876c8 ("tap: XDP support")
      Signed-off-by: default avatarXin Long <lucien.xin@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      29d631e5
    • David Ahern's avatar
      net: ipv6: Fixup device for anycast routes during copy · 7afe2e66
      David Ahern authored
      
      [ Upstream commit 98d11291 ]
      
      Florian reported a breakage with anycast routes due to commit
      4832c30d ("net: ipv6: put host and anycast routes on device with
      address"). Prior to this commit anycast routes were added against the
      loopback device causing repetitive route entries with no insight into
      why they existed. e.g.:
        $ ip -6 ro ls  table local type anycast
        anycast 2001:db8:1:: dev lo proto kernel metric 0 pref medium
        anycast 2001:db8:2:: dev lo proto kernel metric 0 pref medium
        anycast fe80:: dev lo proto kernel metric 0 pref medium
        anycast fe80:: dev lo proto kernel metric 0 pref medium
      
      The point of commit 4832c30d is to add the routes using the device
      with the address which is causing the route to be added. e.g.,:
        $ ip -6 ro ls  table local type anycast
        anycast 2001:db8:1:: dev eth1 proto kernel metric 0 pref medium
        anycast 2001:db8:2:: dev eth2 proto kernel metric 0 pref medium
        anycast fe80:: dev eth2 proto kernel metric 0 pref medium
        anycast fe80:: dev eth1 proto kernel metric 0 pref medium
      
      For traffic to work as it did before, the dst device needs to be switched
      to the loopback when the copy is created similar to local routes.
      
      Fixes: 4832c30d ("net: ipv6: put host and anycast routes on device with address")
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7afe2e66
    • Wei Xu's avatar
      tun: free skb in early errors · 0536add6
      Wei Xu authored
      
      [ Upstream commit c33ee15b ]
      
      tun_recvmsg() supports accepting skb by msg_control after
      commit ac77cfd4 ("tun: support receiving skb through msg_control"),
      the skb if presented should be freed no matter how far it can go
      along, otherwise it would be leaked.
      
      This patch fixes several missed cases.
      Signed-off-by: default avatarWei Xu <wexu@redhat.com>
      Reported-by: default avatarMatthew Rosato <mjrosato@linux.vnet.ibm.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0536add6
    • Neal Cardwell's avatar
      tcp: when scheduling TLP, time of RTO should account for current ACK · 241eb29c
      Neal Cardwell authored
      
      [ Upstream commit ed66dfaf ]
      
      Fix the TLP scheduling logic so that when scheduling a TLP probe, we
      ensure that the estimated time at which an RTO would fire accounts for
      the fact that ACKs indicating forward progress should push back RTO
      times.
      
      After the following fix:
      
      df92c839 ("tcp: fix xmit timer to only be reset if data ACKed/SACKed")
      
      we had an unintentional behavior change in the following kind of
      scenario: suppose the RTT variance has been very low recently. Then
      suppose we send out a flight of N packets and our RTT is 100ms:
      
      t=0: send a flight of N packets
      t=100ms: receive an ACK for N-1 packets
      
      The response before df92c839 that was:
        -> schedule a TLP for now + RTO_interval
      
      The response after df92c839 is:
        -> schedule a TLP for t=0 + RTO_interval
      
      Since RTO_interval = srtt + RTT_variance, this means that we have
      scheduled a TLP timer at a point in the future that only accounts for
      RTT_variance. If the RTT_variance term is small, this means that the
      timer fires soon.
      
      Before df92c839 this would not happen, because in that code, when
      we receive an ACK for a prefix of flight, we did:
      
          1) Near the top of tcp_ack(), switch from TLP timer to RTO
             at write_queue_head->paket_tx_time + RTO_interval:
                  if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
                         tcp_rearm_rto(sk);
      
          2) In tcp_clean_rtx_queue(), update the RTO to now + RTO_interval:
                  if (flag & FLAG_ACKED) {
                         tcp_rearm_rto(sk);
      
          3) In tcp_ack() after tcp_fastretrans_alert() switch from RTO
             to TLP at now + RTO_interval:
                  if (icsk->icsk_pending == ICSK_TIME_RETRANS)
                         tcp_schedule_loss_probe(sk);
      
      In df92c839 we removed that 3-phase dance, and instead directly
      set the TLP timer once: we set the TLP timer in cases like this to
      write_queue_head->packet_tx_time + RTO_interval. So if the RTT
      variance is small, then this means that this is setting the TLP timer
      to fire quite soon. This means if the ACK for the tail of the flight
      takes longer than an RTT to arrive (often due to delayed ACKs), then
      the TLP timer fires too quickly.
      
      Fixes: df92c839 ("tcp: fix xmit timer to only be reset if data ACKed/SACKed")
      Signed-off-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarYuchung Cheng <ycheng@google.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Acked-by: default avatarSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      241eb29c
    • Wei Xu's avatar
      tap: free skb if flags error · 616bada6
      Wei Xu authored
      
      [ Upstream commit 61d78537 ]
      
      tap_recvmsg() supports accepting skb by msg_control after
      commit 3b4ba04a ("tap: support receiving skb from msg_control"),
      the skb if presented should be freed within the function, otherwise
      it would be leaked.
      Signed-off-by: default avatarWei Xu <wexu@redhat.com>
      Reported-by: default avatarMatthew Rosato <mjrosato@linux.vnet.ibm.com>
      Acked-by: default avatarMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      616bada6
    • Jiri Pirko's avatar
      net: sched: cbq: create block for q->link.block · 8067098c
      Jiri Pirko authored
      
      [ Upstream commit d51aae68 ]
      
      q->link.block is not initialized, that leads to EINVAL when one tries to
      add filter there. So initialize it properly.
      
      This can be reproduced by:
      $ tc qdisc add dev eth0 root handle 1: cbq avpkt 1000 rate 1000Mbit bandwidth 1000Mbit
      $ tc filter add dev eth0 parent 1: protocol ip prio 100 u32 match ip protocol 0 0x00 flowid 1:1
      Reported-by: default avatarJaroslav Aster <jaster@redhat.com>
      Reported-by: default avatarIvan Vecera <ivecera@redhat.com>
      Fixes: 6529eaba ("net: sched: introduce tcf block infractructure")
      Signed-off-by: default avatarJiri Pirko <jiri@mellanox.com>
      Acked-by: default avatarEelco Chaudron <echaudro@redhat.com>
      Reviewed-by: default avatarIvan Vecera <ivecera@redhat.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8067098c
    • Eric Dumazet's avatar
      tcp: use current time in tcp_rcv_space_adjust() · 91c5e655
      Eric Dumazet authored
      
      [ Upstream commit 86323850 ]
      
      When I switched rcv_rtt_est to high resolution timestamps, I forgot
      that tp->tcp_mstamp needed to be refreshed in tcp_rcv_space_adjust()
      
      Using an old timestamp leads to autotuning lags.
      
      Fixes: 645f4c6f ("tcp: switch rcv_rtt_est and rcvq_space to high resolution timestamps")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Cc: Wei Wang <weiwan@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: default avatarNeal Cardwell <ncardwell@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      91c5e655
    • Tommi Rantala's avatar
      tipc: call tipc_rcv() only if bearer is up in tipc_udp_recv() · de514e06
      Tommi Rantala authored
      
      [ Upstream commit c7799c06 ]
      
      Remove the second tipc_rcv() call in tipc_udp_recv(). We have just
      checked that the bearer is not up, and calling tipc_rcv() with a bearer
      that is not up leads to a TIPC div-by-zero crash in
      tipc_node_calculate_timer(). The crash is rare in practice, but can
      happen like this:
      
        We're enabling a bearer, but it's not yet up and fully initialized.
        At the same time we receive a discovery packet, and in tipc_udp_recv()
        we end up calling tipc_rcv() with the not-yet-initialized bearer,
        causing later the div-by-zero crash in tipc_node_calculate_timer().
      
      Jon Maloy explains the impact of removing the second tipc_rcv() call:
        "link setup in the worst case will be delayed until the next arriving
         discovery messages, 1 sec later, and this is an acceptable delay."
      
      As the tipc_rcv() call is removed, just leave the function via the
      rcu_out label, so that we will kfree_skb().
      
      [   12.590450] Own node address <1.1.1>, network identity 1
      [   12.668088] divide error: 0000 [#1] SMP
      [   12.676952] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.14.2-dirty #1
      [   12.679225] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
      [   12.682095] task: ffff8c2a761edb80 task.stack: ffffa41cc0cac000
      [   12.684087] RIP: 0010:tipc_node_calculate_timer.isra.12+0x45/0x60 [tipc]
      [   12.686486] RSP: 0018:ffff8c2a7fc838a0 EFLAGS: 00010246
      [   12.688451] RAX: 0000000000000000 RBX: ffff8c2a5b382600 RCX: 0000000000000000
      [   12.691197] RDX: 0000000000000000 RSI: ffff8c2a5b382600 RDI: ffff8c2a5b382600
      [   12.693945] RBP: ffff8c2a7fc838b0 R08: 0000000000000001 R09: 0000000000000001
      [   12.696632] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8c2a5d8949d8
      [   12.699491] R13: ffffffff95ede400 R14: 0000000000000000 R15: ffff8c2a5d894800
      [   12.702338] FS:  0000000000000000(0000) GS:ffff8c2a7fc80000(0000) knlGS:0000000000000000
      [   12.705099] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   12.706776] CR2: 0000000001bb9440 CR3: 00000000bd009001 CR4: 00000000003606e0
      [   12.708847] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   12.711016] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   12.712627] Call Trace:
      [   12.713390]  <IRQ>
      [   12.714011]  tipc_node_check_dest+0x2e8/0x350 [tipc]
      [   12.715286]  tipc_disc_rcv+0x14d/0x1d0 [tipc]
      [   12.716370]  tipc_rcv+0x8b0/0xd40 [tipc]
      [   12.717396]  ? minmax_running_min+0x2f/0x60
      [   12.718248]  ? dst_alloc+0x4c/0xa0
      [   12.718964]  ? tcp_ack+0xaf1/0x10b0
      [   12.719658]  ? tipc_udp_is_known_peer+0xa0/0xa0 [tipc]
      [   12.720634]  tipc_udp_recv+0x71/0x1d0 [tipc]
      [   12.721459]  ? dst_alloc+0x4c/0xa0
      [   12.722130]  udp_queue_rcv_skb+0x264/0x490
      [   12.722924]  __udp4_lib_rcv+0x21e/0x990
      [   12.723670]  ? ip_route_input_rcu+0x2dd/0xbf0
      [   12.724442]  ? tcp_v4_rcv+0x958/0xa40
      [   12.725039]  udp_rcv+0x1a/0x20
      [   12.725587]  ip_local_deliver_finish+0x97/0x1d0
      [   12.726323]  ip_local_deliver+0xaf/0xc0
      [   12.726959]  ? ip_route_input_noref+0x19/0x20
      [   12.727689]  ip_rcv_finish+0xdd/0x3b0
      [   12.728307]  ip_rcv+0x2ac/0x360
      [   12.728839]  __netif_receive_skb_core+0x6fb/0xa90
      [   12.729580]  ? udp4_gro_receive+0x1a7/0x2c0
      [   12.730274]  __netif_receive_skb+0x1d/0x60
      [   12.730953]  ? __netif_receive_skb+0x1d/0x60
      [   12.731637]  netif_receive_skb_internal+0x37/0xd0
      [   12.732371]  napi_gro_receive+0xc7/0xf0
      [   12.732920]  receive_buf+0x3c3/0xd40
      [   12.733441]  virtnet_poll+0xb1/0x250
      [   12.733944]  net_rx_action+0x23e/0x370
      [   12.734476]  __do_softirq+0xc5/0x2f8
      [   12.734922]  irq_exit+0xfa/0x100
      [   12.735315]  do_IRQ+0x4f/0xd0
      [   12.735680]  common_interrupt+0xa2/0xa2
      [   12.736126]  </IRQ>
      [   12.736416] RIP: 0010:native_safe_halt+0x6/0x10
      [   12.736925] RSP: 0018:ffffa41cc0cafe90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff4d
      [   12.737756] RAX: 0000000000000000 RBX: ffff8c2a761edb80 RCX: 0000000000000000
      [   12.738504] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      [   12.739258] RBP: ffffa41cc0cafe90 R08: 0000014b5b9795e5 R09: ffffa41cc12c7e88
      [   12.740118] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
      [   12.740964] R13: ffff8c2a761edb80 R14: 0000000000000000 R15: 0000000000000000
      [   12.741831]  default_idle+0x2a/0x100
      [   12.742323]  arch_cpu_idle+0xf/0x20
      [   12.742796]  default_idle_call+0x28/0x40
      [   12.743312]  do_idle+0x179/0x1f0
      [   12.743761]  cpu_startup_entry+0x1d/0x20
      [   12.744291]  start_secondary+0x112/0x120
      [   12.744816]  secondary_startup_64+0xa5/0xa5
      [   12.745367] Code: b9 f4 01 00 00 48 89 c2 48 c1 ea 02 48 3d d3 07 00
      00 48 0f 47 d1 49 8b 0c 24 48 39 d1 76 07 49 89 14 24 48 89 d1 31 d2 48
      89 df <48> f7 f1 89 c6 e8 81 6e ff ff 5b 41 5c 5d c3 66 90 66 2e 0f 1f
      [   12.747527] RIP: tipc_node_calculate_timer.isra.12+0x45/0x60 [tipc] RSP: ffff8c2a7fc838a0
      [   12.748555] ---[ end trace 1399ab83390650fd ]---
      [   12.749296] Kernel panic - not syncing: Fatal exception in interrupt
      [   12.750123] Kernel Offset: 0x13200000 from 0xffffffff82000000
      (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      [   12.751215] Rebooting in 60 seconds..
      
      Fixes: c9b64d49 ("tipc: add replicast peer discovery")
      Signed-off-by: default avatarTommi Rantala <tommi.t.rantala@nokia.com>
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      de514e06
    • David Ahern's avatar
      tcp: use IPCB instead of TCP_SKB_CB in inet_exact_dif_match() · efb7dc5a
      David Ahern authored
      
      [ Usptream commit b4d1605a ]
      
      After this fix : ("tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()"),
      socket lookups happen while skb->cb[] has not been mangled yet by TCP.
      
      Fixes: a04a480d ("net: Require exact match for TCP socket lookups if dif is l3mdev")
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      efb7dc5a
    • Julian Wiedmann's avatar
      s390/qeth: fix GSO throughput regression · effea096
      Julian Wiedmann authored
      
      [ Upstream commit 6d69b1f1 ]
      
      Using GSO with small MTUs currently results in a substantial throughput
      regression - which is caused by how qeth needs to map non-linear skbs
      into its IO buffer elements:
      compared to a linear skb, each GSO-segmented skb effectively consumes
      twice as many buffer elements (ie two instead of one) due to the
      additional header-only part. This causes the Output Queue to be
      congested with low-utilized IO buffers.
      
      Fix this as follows:
      If the MSS is low enough so that a non-SG GSO segmentation produces
      order-0 skbs (currently ~3500 byte), opt out from NETIF_F_SG. This is
      where we anticipate the biggest savings, since an SG-enabled
      GSO segmentation produces skbs that always consume at least two
      buffer elements.
      
      Larger MSS values continue to get a SG-enabled GSO segmentation, since
      1) the relative overhead of the additional header-only buffer element
      becomes less noticeable, and
      2) the linearization overhead increases.
      
      With the throughput regression fixed, re-enable NETIF_F_SG by default to
      reap the significant CPU savings of GSO.
      
      Fixes: 5722963a ("qeth: do not turn on SG per default")
      Reported-by: default avatarNils Hoppmann <niho@de.ibm.com>
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      effea096
    • Julian Wiedmann's avatar
      s390/qeth: fix thinko in IPv4 multicast address tracking · 79651202
      Julian Wiedmann authored
      
      [ Upsteam commit bc3ab705 ]
      
      Commit 5f78e29c ("qeth: optimize IP handling in rx_mode callback")
      reworked how secondary addresses are managed for qeth devices.
      Instead of dropping & subsequently re-adding all addresses on every
      ndo_set_rx_mode() call, qeth now keeps track of the addresses that are
      currently registered with the HW.
      On a ndo_set_rx_mode(), we thus only need to do (de-)registration
      requests for the addresses that have actually changed.
      
      On L3 devices, the lookup for IPv4 Multicast addresses checks the wrong
      hashtable - and thus never finds a match. As a result, we first delete
      *all* such addresses, and then re-add them again. So each set_rx_mode()
      causes a short period where the IPv4 Multicast addresses are not
      registered, and the card stops forwarding inbound traffic for them.
      
      Fix this by setting the ->is_multicast flag on the lookup object, thus
      enabling qeth_l3_ip_from_hash() to search the correct hashtable and
      find a match there.
      
      Fixes: 5f78e29c ("qeth: optimize IP handling in rx_mode callback")
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      79651202
    • Julian Wiedmann's avatar
      s390/qeth: build max size GSO skbs on L2 devices · cfa19a2e
      Julian Wiedmann authored
      
      [ Upstream commit 0cbff6d4 ]
      
      The current GSO skb size limit was copy&pasted over from the L3 path,
      where it is needed due to a TSO limitation.
      As L2 devices don't offer TSO support (and thus all GSO skbs are
      segmented before they reach the driver), there's no reason to restrict
      the stack in how large it may build the GSO skbs.
      
      Fixes: d52aec97 ("qeth: enable scatter/gather in layer 2 mode")
      Signed-off-by: default avatarJulian Wiedmann <jwi@linux.vnet.ibm.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cfa19a2e
    • Eric Dumazet's avatar
      tcp/dccp: block bh before arming time_wait timer · 5fa411e8
      Eric Dumazet authored
      
      [ Upstream commit cfac7f83 ]
      
      Maciej Żenczykowski reported some panics in tcp_twsk_destructor()
      that might be caused by the following bug.
      
      timewait timer is pinned to the cpu, because we want to transition
      timwewait refcount from 0 to 4 in one go, once everything has been
      initialized.
      
      At the time commit ed2e9239 ("tcp/dccp: fix timewait races in timer
      handling") was merged, TCP was always running from BH habdler.
      
      After commit 5413d1ba ("net: do not block BH while processing
      socket backlog") we definitely can run tcp_time_wait() from process
      context.
      
      We need to block BH in the critical section so that the pinned timer
      has still its purpose.
      
      This bug is more likely to happen under stress and when very small RTO
      are used in datacenter flows.
      
      Fixes: 5413d1ba ("net: do not block BH while processing socket backlog")
      Signed-off-by: default avatarEric Dumazet <edumazet@google.com>
      Reported-by: default avatarMaciej Żenczykowski <maze@google.com>
      Acked-by: default avatarMaciej Żenczykowski <maze@google.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5fa411e8