1. 20 Jun, 2018 2 commits
  2. 19 Jun, 2018 2 commits
    • Revert "block: Add warning for bi_next not NULL in bio_endio()" · 9c24c10a
      Bart Van Assche authored
      Commit 0ba99ca4 ("block: Add warning for bi_next not NULL in
      bio_endio()") breaks the dm driver. end_clone_bio() detects whether
      or not a bio is the last bio associated with a request by checking
      the .bi_next field. Commit 0ba99ca4 clears that field before
      end_clone_bio() has had a chance to inspect that field. Hence revert
      commit 0ba99ca4.
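      A minimal userspace sketch of the failure mode described above. It
      assumes a completion callback that, like end_clone_bio() per this
      message, treats a NULL .bi_next as "this is the last bio". struct
      fake_bio, end_clone() and fake_bio_endio() are hypothetical stand-ins,
      not kernel APIs; clear_first mimics the reverted commit clearing the
      field before the callback runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct bio: only the chain pointer and a
 * place to record the callback's verdict are modelled. */
struct fake_bio {
	struct fake_bio *bi_next;	/* next bio of the same request, or NULL */
	bool *saw_last;			/* written by the completion callback */
};

/* Models the dm behaviour described above: a bio whose bi_next is
 * NULL is taken to be the last bio of its request. */
static void end_clone(struct fake_bio *bio)
{
	*bio->saw_last = (bio->bi_next == NULL);
}

/* Models bio_endio(). clear_first mimics the reverted commit, which
 * cleared bi_next before the completion callback could inspect it. */
static void fake_bio_endio(struct fake_bio *bio, bool clear_first)
{
	assert(bio != NULL && bio->saw_last != NULL);
	if (clear_first)
		bio->bi_next = NULL;
	end_clone(bio);
}
```

      With clear_first set, the first bio of a two-bio chain is reported as
      the last one, which is consistent with dm finishing the request too
      early.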
      
      This patch prevents KASAN from reporting the following complaint when
      running the srp-test software (srp-test/run_tests -c -d -r 10 -t 02-mq):
      
      ==================================================================
      BUG: KASAN: use-after-free in bio_advance+0x11b/0x1d0
      Read of size 4 at addr ffff8801300e06d0 by task ksoftirqd/0/9
      
      CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 4.18.0-rc1-dbg+ #1
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
      Call Trace:
       dump_stack+0xa4/0xf5
       print_address_description+0x6f/0x270
       kasan_report+0x241/0x360
       __asan_load4+0x78/0x80
       bio_advance+0x11b/0x1d0
       blk_update_request+0xa7/0x5b0
       scsi_end_request+0x56/0x320 [scsi_mod]
       scsi_io_completion+0x7d6/0xb20 [scsi_mod]
       scsi_finish_command+0x1c0/0x280 [scsi_mod]
       scsi_softirq_done+0x19a/0x230 [scsi_mod]
       blk_mq_complete_request+0x160/0x240
       scsi_mq_done+0x50/0x1a0 [scsi_mod]
       srp_recv_done+0x515/0x1330 [ib_srp]
       __ib_process_cq+0xa0/0xf0 [ib_core]
       ib_poll_handler+0x38/0xa0 [ib_core]
       irq_poll_softirq+0xe8/0x1f0
       __do_softirq+0x128/0x60d
       run_ksoftirqd+0x3f/0x60
       smpboot_thread_fn+0x352/0x460
       kthread+0x1c1/0x1e0
       ret_from_fork+0x24/0x30
      
      Allocated by task 1918:
       save_stack+0x43/0xd0
       kasan_kmalloc+0xad/0xe0
       kasan_slab_alloc+0x11/0x20
       kmem_cache_alloc+0xfe/0x350
       mempool_alloc_slab+0x15/0x20
       mempool_alloc+0xfb/0x270
       bio_alloc_bioset+0x244/0x350
       submit_bh_wbc+0x9c/0x2f0
       __block_write_full_page+0x299/0x5a0
       block_write_full_page+0x16b/0x180
       blkdev_writepage+0x18/0x20
       __writepage+0x42/0x80
       write_cache_pages+0x376/0x8a0
       generic_writepages+0xbe/0x110
       blkdev_writepages+0xe/0x10
       do_writepages+0x9b/0x180
       __filemap_fdatawrite_range+0x178/0x1c0
       file_write_and_wait_range+0x59/0xc0
       blkdev_fsync+0x46/0x80
       vfs_fsync_range+0x66/0x100
       do_fsync+0x3d/0x70
       __x64_sys_fsync+0x21/0x30
       do_syscall_64+0x77/0x230
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 9:
       save_stack+0x43/0xd0
       __kasan_slab_free+0x137/0x190
       kasan_slab_free+0xe/0x10
       kmem_cache_free+0xd3/0x380
       mempool_free_slab+0x17/0x20
       mempool_free+0x63/0x160
       bio_free+0x81/0xa0
       bio_put+0x59/0x60
       end_bio_bh_io_sync+0x5d/0x70
       bio_endio+0x1a7/0x360
       blk_update_request+0xd0/0x5b0
       end_clone_bio+0xa3/0xd0 [dm_mod]
       bio_endio+0x1a7/0x360
       blk_update_request+0xd0/0x5b0
       scsi_end_request+0x56/0x320 [scsi_mod]
       scsi_io_completion+0x7d6/0xb20 [scsi_mod]
       scsi_finish_command+0x1c0/0x280 [scsi_mod]
       scsi_softirq_done+0x19a/0x230 [scsi_mod]
       blk_mq_complete_request+0x160/0x240
       scsi_mq_done+0x50/0x1a0 [scsi_mod]
       srp_recv_done+0x515/0x1330 [ib_srp]
       __ib_process_cq+0xa0/0xf0 [ib_core]
       ib_poll_handler+0x38/0xa0 [ib_core]
       irq_poll_softirq+0xe8/0x1f0
       __do_softirq+0x128/0x60d
      
      The buggy address belongs to the object at ffff8801300e0640
       which belongs to the cache bio-0 of size 200
      The buggy address is located 144 bytes inside of
       200-byte region [ffff8801300e0640, ffff8801300e0708)
      The buggy address belongs to the page:
      page:ffffea0004c03800 count:1 mapcount:0 mapping:ffff88015a563a00 index:0x0 compound_mapcount: 0
      flags: 0x8000000000008100(slab|head)
      raw: 8000000000008100 dead000000000100 dead000000000200 ffff88015a563a00
      raw: 0000000000000000 0000000000330033 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8801300e0580: fb fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc
       ffff8801300e0600: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
      >ffff8801300e0680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                       ^
       ffff8801300e0700: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff8801300e0780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      ==================================================================
      
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Fixes: 0ba99ca4 ("block: Add warning for bi_next not NULL in bio_endio()")
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: fix timeout changes for legacy request drivers · 0cc61e64
      Christoph Hellwig authored
      blk_mq_complete_request can only be called for blk-mq drivers, but when
      removing the BLK_EH_HANDLED return value, two legacy request timeout
      methods incorrectly got switched to call blk_mq_complete_request.
      Call __blk_complete_request instead to reinstate the previous behavior.
      For that, __blk_complete_request needs to be exported.
      
      Fixes: 1fc2b62e ("scsi_transport_fc: complete requests from ->timeout")
      Fixes: 0df0bb08 ("null_blk: complete requests from ->timeout")
      Reported-by: Jianchao Wang <jianchao.w.wang@oracle.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. 15 Jun, 2018 6 commits
  4. 14 Jun, 2018 6 commits
    • blk-mq: remove blk_mq_tagset_iter · e6c3456a
      Christoph Hellwig authored
      Unused now that nvme stopped using it.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Jens Axboe <axboe@kernel.dk>
    • nvme: remove nvme_reinit_tagset · 14dfa400
      Christoph Hellwig authored
      Unused now that all transports stopped using it.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Jens Axboe <axboe@kernel.dk>
    • nvme-fc: fix nulling of queue data on reconnect · 3e493c00
      James Smart authored
      The reconnect path is calling the init routines to clear a queue
      structure. But the queue structure has state that perhaps needs
      to persist as long as the controller is live.
      
      Remove the nvme_fc_init_queue() calls on reconnect.
      The nvme_fc_free_queue() calls will clear state bits and reset
      any relevant queue state for a new connection.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-fc: remove reinit_request routine · 587331f7
      James Smart authored
      The reinit_request routine is not necessary. Remove support for the
      op callback.
      
      As all that nvme_reinit_tagset() does is iterate and call the
      reinit routine, it too has no purpose. Remove the call.
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • blk-mq: don't time out requests again that are in the timeout handler · da661267
      Christoph Hellwig authored
      We can currently call the timeout handler again on a request that has
      already been handed over to the timeout handler.  Prevent that with a new
      flag.
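      A sketch of the "hand each request to the timeout handler only once"
      idea, using a C11 atomic flag bit. RQF_FAKE_TIMED_OUT, struct
      fake_request and maybe_time_out() are hypothetical; the real blk-mq
      patch may name, store and synchronise its flag differently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bit; not the real blk-mq flag definition. */
#define RQF_FAKE_TIMED_OUT 0x1u

struct fake_request {
	atomic_uint flags;
	int timeout_calls;	/* how many times the handler was invoked */
};

/* Called by a (hypothetical) timeout scan for every expired request.
 * atomic_fetch_or() returns the previous value, so only the first
 * caller sees the bit clear and hands the request to the handler. */
static void maybe_time_out(struct fake_request *rq)
{
	unsigned int old = atomic_fetch_or(&rq->flags, RQF_FAKE_TIMED_OUT);

	if (old & RQF_FAKE_TIMED_OUT)
		return;		/* already in the timeout handler: skip */
	rq->timeout_calls++;	/* stand-in for calling ->timeout() */
}
```

      A second scan over the same request becomes a no-op instead of
      re-entering the handler.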
      
      Fixes: 12f5b931 ("blk-mq: Remove generation seqeunce")
      Reported-by: Andrew Randrianasulu <randrianasulu@gmail.com>
      Tested-by: Andrew Randrianasulu <randrianasulu@gmail.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • nvme-fc: change controllers first connect to use reconnect path · 4c984154
      James Smart authored
      The current code follows the framework that has been in the transports
      from the beginning, where the initial link-side controller connect
      occurs as part of "creating the controller". Thus that first connect
      fully talks to the controller and obtains values that can then be used
      for blk-mq setup, etc. It also means that everything about the
      controller is fully known before the "create controller" call returns.
      
      This has several weaknesses:
      - The initial create_ctrl call made by the cli will block for a long
        time as wire transactions are performed synchronously. This delay
        becomes longer if errors occur or connectivity is lost and retries
        need to be performed.
      - Code wise, it means there is a separate connect path for initial
        controller connect vs the (same) steps used in the reconnect path.
      - And as there are separate paths, there is separate error
        handling and retry logic. It also plays havoc with the NEW state
        (should transition out of it after successful initial connect) vs
        the RESETTING and CONNECTING (reconnect) states that want to be
        transitioned to on error.
      - As there are separate paths, recovering from errors and disruptions
        requires separate recovery/retry paths as well, which can severely
        convolute the controller state.
      
      This patch reworks the fc transport to use the same connect paths
      for the initial connection as it uses for reconnect. This makes a
      single path for error recovery and handling.
      
      This patch:
      - Removes the driving of the initial connect and replaces it with
        a state transition to CONNECTING and initiating the reconnect
        thread. A dummy state transition of RESETTING had to be traversed
        as a direct transition of NEW->CONNECTING is not allowed. Given
        that the controller is "new", the RESETTING transition is a simple
        no-op. Once in the reconnecting thread, the normal behaviors of
        ctrl_loss_tmo (max_retries * connect_delay) and dev_loss_tmo will
        apply before the controller is torn down.
      - Only if the state transitions can't be traversed and the
        reconnect thread isn't scheduled will the controller be torn down
        while still in create_ctrl.
      - The prior code used the controller state of NEW to indicate
        whether request queues had been initialized or not. For the admin
        queue, the request queue is always created, so there's no need to
        check a state. For IO queues, change to tracking whether a successful
        io request queue create has occurred (i.e. the first successful connect).
      - The initial controller id is initialized to the dynamic controller
        id used in the initial connect message. It will be overwritten by
        the real controller id once the controller is connected on the wire.
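      The NEW -> RESETTING -> CONNECTING detour can be sketched as a tiny
      state machine. The transition rules below are a reduced, assumed
      subset encoding only what this message states (NEW may not go
      directly to CONNECTING); the real nvme_change_ctrl_state() has more
      states and rules. All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced subset of controller states, named after those in the text. */
enum ctrl_state { ST_NEW, ST_RESETTING, ST_CONNECTING, ST_LIVE };

/* Assumed, simplified transition table: apply the change only if the
 * current state permits it, and report whether it was applied. */
static bool change_state(enum ctrl_state *cur, enum ctrl_state next)
{
	bool ok = false;

	switch (next) {
	case ST_RESETTING:
		ok = (*cur == ST_NEW || *cur == ST_LIVE);
		break;
	case ST_CONNECTING:
		ok = (*cur == ST_RESETTING);	/* not reachable from NEW */
		break;
	case ST_LIVE:
		ok = (*cur == ST_CONNECTING);
		break;
	default:
		break;
	}
	if (ok)
		*cur = next;
	return ok;
}

/* Sketch of the reworked create_ctrl flow: pass through RESETTING to
 * reach CONNECTING, then leave the real connect to the reconnect work.
 * Returns false when the controller would be torn down in create_ctrl. */
static bool start_first_connect(enum ctrl_state *st)
{
	if (!change_state(st, ST_RESETTING) ||
	    !change_state(st, ST_CONNECTING))
		return false;
	/* the real driver would schedule its reconnect work here */
	return true;
}
```

      Because the first connect now ends in CONNECTING, initial setup and
      later reconnects share one path, and with it one set of retry and
      teardown rules.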
      Signed-off-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  5. 13 Jun, 2018 1 commit
  6. 11 Jun, 2018 6 commits
    • nvmet: free smart-log buffer after use · c42d7a30
      Chaitanya Kulkarni authored
      Free the smart-log buffer allocated in this function after use.
      Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme-rdma: fix error flow during mapping request data · 94423a8f
      Max Gurtovoy authored
      After DMA-mapping the sgl, we map the sgl to an nvme sgl descriptor. If
      this last mapping fails, we never DMA-unmap the sgl.
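      The fix follows the usual kernel goto-unwind pattern: whatever was
      mapped before a failing step is undone on the error path. Everything
      named fake_* below is a hypothetical stand-in for the DMA and
      descriptor mapping calls, not the nvme-rdma code itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request state standing in for the real mapping state. */
struct fake_req {
	bool sgl_mapped;	/* true while the "DMA mapping" is held */
	bool fail_desc;		/* inject a failure in the descriptor step */
};

/* Hypothetical stand-ins for the DMA map/unmap and descriptor steps. */
static int fake_dma_map_sg(struct fake_req *r)
{
	r->sgl_mapped = true;
	return 0;
}

static void fake_dma_unmap_sg(struct fake_req *r)
{
	r->sgl_mapped = false;
}

static int fake_map_sgl_desc(struct fake_req *r)
{
	return r->fail_desc ? -1 : 0;
}

/* Map the data, then build the descriptor. If the second step fails,
 * release the first one before returning the error. */
static int map_data(struct fake_req *r)
{
	int ret;

	ret = fake_dma_map_sg(r);
	if (ret)
		return ret;

	ret = fake_map_sgl_desc(r);
	if (ret)
		goto out_unmap;	/* the unmap the message says was missing */
	return 0;

out_unmap:
	fake_dma_unmap_sg(r);
	return ret;
}
```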
      Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme: add bio remapping tracepoint · 2796b569
      Hannes Reinecke authored
      Add a tracepoint to trace bio remapping for native nvme multipath.
      Signed-off-by: Hannes Reinecke <hare@suse.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • nvme: fix NULL pointer dereference in nvme_init_subsystem · 16001c10
      Israel Rukshin authored
      When using the nvme-pci driver, nvmf_ctrl_options is NULL.
      There is no need to check the discovery_nqn flag for a non-fabrics controller.
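      The guard amounts to testing the options pointer before reading the
      flag. The structures below are hypothetical, reduced stand-ins that
      keep only the fields relevant to the bug; the actual fix in
      nvme_init_subsystem may be written differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduced versions of the fabrics options and controller. */
struct fake_opts {
	bool discovery_nqn;
};

struct fake_ctrl {
	struct fake_opts *opts;	/* NULL for a non-fabrics (PCIe) controller */
};

/* Fabrics options only exist for fabrics controllers, so check the
 * pointer first; && short-circuits before the dereference. */
static bool is_discovery_ctrl(const struct fake_ctrl *ctrl)
{
	return ctrl->opts && ctrl->opts->discovery_nqn;
}
```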
      
      Fixes: 181303d0 ("nvme-fabrics: allow duplicate connections to the discovery controller")
      Signed-off-by: Israel Rukshin <israelr@mellanox.com>
      Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • blk-mq: reinit q->tag_set_list entry only after grace period · a347c7ad
      Roman Pen authored
      The q->tag_set_list list entry must not be reinitialized before the RCU
      grace period has completed; otherwise the following soft lockup in
      blk_mq_sched_restart() happens:
      
      [ 1064.252652] watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [fio:9270]
      [ 1064.254445] task: ffff99b912e8b900 task.stack: ffffa6d54c758000
      [ 1064.254613] RIP: 0010:blk_mq_sched_restart+0x96/0x150
      [ 1064.256510] Call Trace:
      [ 1064.256664]  <IRQ>
      [ 1064.256824]  blk_mq_free_request+0xea/0x100
      [ 1064.256987]  msg_io_conf+0x59/0xd0 [ibnbd_client]
      [ 1064.257175]  complete_rdma_req+0xf2/0x230 [ibtrs_client]
      [ 1064.257340]  ? ibtrs_post_recv_empty+0x4d/0x70 [ibtrs_core]
      [ 1064.257502]  ibtrs_clt_rdma_done+0xd1/0x1e0 [ibtrs_client]
      [ 1064.257669]  ib_create_qp+0x321/0x380 [ib_core]
      [ 1064.257841]  ib_process_cq_direct+0xbd/0x120 [ib_core]
      [ 1064.258007]  irq_poll_softirq+0xb7/0xe0
      [ 1064.258165]  __do_softirq+0x106/0x2a2
      [ 1064.258328]  irq_exit+0x92/0xa0
      [ 1064.258509]  do_IRQ+0x4a/0xd0
      [ 1064.258660]  common_interrupt+0x7a/0x7a
      [ 1064.258818]  </IRQ>
      
      Meanwhile, another context frees another queue that shares the same set
      of tags:
      
      [ 1288.201183] INFO: task bash:5910 blocked for more than 180 seconds.
      [ 1288.201833] bash            D    0  5910   5820 0x00000000
      [ 1288.202016] Call Trace:
      [ 1288.202315]  schedule+0x32/0x80
      [ 1288.202462]  schedule_timeout+0x1e5/0x380
      [ 1288.203838]  wait_for_completion+0xb0/0x120
      [ 1288.204137]  __wait_rcu_gp+0x125/0x160
      [ 1288.204287]  synchronize_sched+0x6e/0x80
      [ 1288.204770]  blk_mq_free_queue+0x74/0xe0
      [ 1288.204922]  blk_cleanup_queue+0xc7/0x110
      [ 1288.205073]  ibnbd_clt_unmap_device+0x1bc/0x280 [ibnbd_client]
      [ 1288.205389]  ibnbd_clt_unmap_dev_store+0x169/0x1f0 [ibnbd_client]
      [ 1288.205548]  kernfs_fop_write+0x109/0x180
      [ 1288.206328]  vfs_write+0xb3/0x1a0
      [ 1288.206476]  SyS_write+0x52/0xc0
      [ 1288.206624]  do_syscall_64+0x68/0x1d0
      [ 1288.206774]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      What happened is the following:
      
      1. There are several MQ queues with shared tags.
      2. One queue is about to be freed and a task is now in
         blk_mq_del_queue_tag_set().
      3. Another CPU is in blk_mq_sched_restart() and loops over all queues in
         the tag list in order to find an hctx to restart.
      
      Because the linked list entry was modified in blk_mq_del_queue_tag_set()
      without waiting for a grace period, blk_mq_sched_restart() never ends,
      spinning in list_for_each_entry_rcu_rr(); hence the soft lockup.
      
      The fix is simple: reinit the list entry after an RCU grace period has
      elapsed.
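      Why an early reinit makes a reader spin can be shown with a minimal
      circular list modelled on struct list_head. del_keep_next() behaves
      like list_del_rcu() in that the removed entry keeps its forward
      pointer; del_and_reinit() additionally points the entry at itself, as
      INIT_LIST_HEAD() would. walk_to_head() is a hypothetical, bounded
      stand-in for a reader such as list_for_each_entry_rcu_rr(); neither
      the real iterator nor RCU itself is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list node, list_head style. */
struct node {
	struct node *next, *prev;
};

static void list_init(struct node *n)
{
	n->next = n->prev = n;
}

static void list_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink n from its neighbours but leave n->next intact, so a reader
 * currently standing on n can still move forward. */
static void del_keep_next(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Unlink and immediately reinitialise: n now points to itself. */
static void del_and_reinit(struct node *n)
{
	del_keep_next(n);
	list_init(n);
}

/* A reader standing on pos walks forward until it reaches head. Returns
 * the number of steps, or -1 if max_steps pass without reaching head
 * (the soft-lockup case). */
static int walk_to_head(const struct node *pos, const struct node *head,
			int max_steps)
{
	for (int i = 1; i <= max_steps; i++) {
		pos = pos->next;
		if (pos == head)
			return i;
	}
	return -1;
}
```

      A reader standing on an entry removed with del_keep_next() still gets
      back to the head, while one standing on a reinitialized entry loops on
      itself. Deferring the reinit until readers are done, i.e. after a
      grace period, avoids the second case.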
      
      Fixes: 705cda97 ("blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list")
      Cc: stable@vger.kernel.org
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: linux-block@vger.kernel.org
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
      Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · f0dc7f9c
      Linus Torvalds authored
      Pull networking fixes from David Miller:
      
       1) Fix several bpfilter/UMH bugs, in particular make the UMH build not
          depend upon X86 specific Kconfig symbols. From Alexei Starovoitov.
      
       2) Fix handling of modified context pointer in bpf verifier, from
          Daniel Borkmann.
      
       3) Kill regression in ifdown/ifup sequences for hv_netvsc driver, from
          Dexuan Cui.
      
       4) When the bonding primary member name changes, we have to re-evaluate
          the bond->force_primary setting, from Xiangning Yu.
      
       5) Eliminate possible padding beyond the end of the SKB in the cdc_ncm
          driver, from Bjørn Mork.
      
       6) The RX queue length reported for UDP sockets in procfs and socket
          diag was inaccurate, from Paolo Abeni.
      
       7) Fix br_fdb_find_port() locking, from Petr Machata.
      
       8) Limit sk_rcvlowat values properly in TCP, from Soheil Hassas
          Yeganeh.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (23 commits)
        tcp: limit sk_rcvlowat by the maximum receive buffer
        net: phy: dp83822: use BMCR_ANENABLE instead of BMSR_ANEGCAPABLE for DP83620
        socket: close race condition between sock_close() and sockfs_setattr()
        net: bridge: Fix locking in br_fdb_find_port()
        udp: fix rx queue len reported by diag and proc interface
        cdc_ncm: avoid padding beyond end of skb
        net/sched: act_simple: fix parsing of TCA_DEF_DATA
        net: fddi: fix a possible null-ptr-deref
        net: aquantia: fix unsigned numvecs comparison with less than zero
        net: stmmac: fix build failure due to missing COMMON_CLK dependency
        bpfilter: fix race in pipe access
        bpf, xdp: fix crash in xdp_umem_unaccount_pages
        xsk: Fix umem fill/completion queue mmap on 32-bit
        tools/bpf: fix selftest get_cgroup_id_user
        bpfilter: fix OUTPUT_FORMAT
        umh: fix race condition
        net: mscc: ocelot: Fix uninitialized error in ocelot_netdevice_event()
        bonding: re-evaluate force_primary when the primary slave name changes
        ip_tunnel: Fix name string concatenate in __ip_tunnel_create()
        hv_netvsc: Fix a network regression after ifdown/ifup
        ...
  7. 10 Jun, 2018 16 commits
    • Merge tag 'rtc-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux · 1aaccb5f
      Linus Torvalds authored
      Pull RTC updates from Alexandre Belloni:
       "Setting the supported range from drivers for RTCs failing soon has
        started. A few fixes were developed along the way. Some drivers have
        been switched to SPDX by their maintainers.
      
        Subsystem:
      
         - rework of the rtc-test driver which allows to test the core more
           thoroughly
      
         - rtc_set_alarm() now fails early when alarms are not supported
      
        Drivers:
      
         - mktime() is now replaced by mktime64()
      
         - RTC range added for 88pm80x, ab-b5ze-s3, at91rm9200,
           brcmstb-waketimer, ds1685, ftrtc010, ls1x, mxc_v2, rx8581, sprd,
           st-lpc, tps6586x, tps65910 and vr41xx
      
         - fixed a possible race condition in probe functions
      
         - pxa: fix the probe function that is broken since v4.3
      
         - stm32: now supports stm32mp1"
      
      * tag 'rtc-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (78 commits)
        rtc: pxa: fix probe function
        rtc: cros-ec: Switch to SPDX identifier.
        rtc: cros-ec: Make license text and module license match.
        rtc: ensure rtc_set_alarm fails when alarms are not supported
        rtc: test: remove alarm support from the first device
        rtc: test: convert to devm_rtc_allocate_device
        rtc: ftrtc010: let the core handle range
        rtc: ftrtc010: handle dates after 2106
        rtc: ftrtc010: switch to devm_rtc_allocate_device
        rtc: mrst: switch to devm functions
        rtc: sunxi: fix possible race condition
        rtc: test: remove irq sysfs file
        rtc: test: emulate alarms using timers
        rtc: test: store time as an offset to system time
        rtc: test: allow registering many devices
        rtc: test: remove useless proc info
        rtc: ds1685: Add range
        rtc: ds1685: fix possible race condition
        rtc: sprd: Add new RTC power down check method
        rtc: sun6i: Fix bit_idx value for clk_register_gate
        ...
    • Merge tag 'upstream-4.18-rc1' of git://git.infradead.org/linux-ubifs · ab0b2e59
      Linus Torvalds authored
      Pull UBI and UBIFS updates from Richard Weinberger:
      
       - the UBI on-disk format header file is now dual licensed
      
       - new way to detect Fastmap problems during runtime
      
       - bugfix for Fastmap
      
       - minor updates for UBIFS (spelling, comments, vm_fault_t, ...)
      
      * tag 'upstream-4.18-rc1' of git://git.infradead.org/linux-ubifs:
        mtd: ubi: Update ubi-media.h to dual license
        ubi: fastmap: Detect EBA mismatches on-the-fly
        ubi: fastmap: Check each mapping only once
        ubi: fastmap: Correctly handle interrupted erasures in EBA
        ubi: fastmap: Cancel work upon detach
        ubifs: lpt: Fix wrong pnode number range in comment
        ubifs: gc: Fix typo
        ubifs: log: Some spelling fixes
        ubifs: Spelling fix someting -> something
        ubifs: journal: Remove wrong comment
        ubifs: remove set but never used variable
        ubifs, xattr: remove misguided quota flags
        fs: ubifs: Adding new return type vm_fault_t
    • tcp: limit sk_rcvlowat by the maximum receive buffer · 867f816b
      Soheil Hassas Yeganeh authored
      The user-provided value to setsockopt(SO_RCVLOWAT) can be
      larger than the maximum possible receive buffer. Such values
      mute POLLIN signals on the socket, which can stall progress
      on the socket.
      
      Limit the user-provided value to half of the maximum receive
      buffer, i.e., half of sk_rcvbuf when the receive buffer size
      is set by the user, or otherwise half of sysctl_tcp_rmem[2].
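      The clamping rule can be written as a small helper. struct fake_sock,
      rcvbuf_locked and set_rcvlowat() are illustrative names:
      rcvbuf_locked models "the receive buffer size was set by the user" and
      tcp_rmem_max models sysctl_tcp_rmem[2]. This sketches the rule only,
      not the kernel's TCP code.

```c
#include <assert.h>

/* Illustrative socket fields; names are hypothetical. */
struct fake_sock {
	int sk_rcvbuf;		/* current receive buffer size */
	int rcvbuf_locked;	/* nonzero if the user set the buffer size */
	int sk_rcvlowat;	/* resulting low-water mark */
};

/* Clamp a requested SO_RCVLOWAT to half of the maximum receive buffer:
 * sk_rcvbuf if the user fixed it, else the sysctl maximum. */
static void set_rcvlowat(struct fake_sock *sk, int val, int tcp_rmem_max)
{
	int space = sk->rcvbuf_locked ? sk->sk_rcvbuf : tcp_rmem_max;

	assert(val >= 0 && space >= 0);
	if (val > space / 2)
		val = space / 2;
	sk->sk_rcvlowat = val;
}
```

      For example, with a user-set 64 KiB receive buffer, a request for
      1 MiB is clamped to 32 KiB, so POLLIN can still fire once the buffer
      is half full.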
      
      Fixes: d1361840 ("tcp: fix SO_RCVLOWAT and RCVBUF autotuning")
      Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Neal Cardwell <ncardwell@google.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 5f85942c
      Linus Torvalds authored
      Pull SCSI updates from James Bottomley:
       "This is mostly updates to the usual drivers: ufs, qedf, mpt3sas, lpfc,
        zfcp, hisi_sas, cxlflash, qla2xxx.
      
        In the absence of Nic, we're also taking target updates which are
        mostly minor except for the tcmu refactor.
      
        The only real core change to worry about is the removal of high page
        bouncing (in sas, storvsc and iscsi). This has been well tested and no
        problems have shown up so far"
      
      * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (268 commits)
        scsi: lpfc: update driver version to 12.0.0.4
        scsi: lpfc: Fix port initialization failure.
        scsi: lpfc: Fix 16gb hbas failing cq create.
        scsi: lpfc: Fix crash in blk_mq layer when executing modprobe -r lpfc
        scsi: lpfc: correct oversubscription of nvme io requests for an adapter
        scsi: lpfc: Fix MDS diagnostics failure (Rx < Tx)
        scsi: hisi_sas: Mark PHY as in reset for nexus reset
        scsi: hisi_sas: Fix return value when get_free_slot() failed
        scsi: hisi_sas: Terminate STP reject quickly for v2 hw
        scsi: hisi_sas: Add v2 hw force PHY function for internal ATA command
        scsi: hisi_sas: Include TMF elements in struct hisi_sas_slot
        scsi: hisi_sas: Try wait commands before before controller reset
        scsi: hisi_sas: Init disks after controller reset
        scsi: hisi_sas: Create a scsi_host_template per HW module
        scsi: hisi_sas: Reset disks when discovered
        scsi: hisi_sas: Add LED feature for v3 hw
        scsi: hisi_sas: Change common allocation mode of device id
        scsi: hisi_sas: change slot index allocation mode
        scsi: hisi_sas: Introduce hisi_sas_phy_set_linkrate()
        scsi: hisi_sas: fix a typo in hisi_sas_task_prep()
        ...
    • net: phy: dp83822: use BMCR_ANENABLE instead of BMSR_ANEGCAPABLE for DP83620 · b718e8c8
      Alvaro Gamez Machado authored
      The DP83620 register set is compatible with the DP83848, but it also
      supports 100base-FX. When the hardware is configured such that fiber
      mode is enabled, autonegotiation is not possible.
      
      The chip, however, doesn't expose this information via BMSR_ANEGCAPABLE.
      Instead, this bit is always set high, even if the particular hardware
      configuration makes autonegotiation impossible [1]. Under these
      circumstances, the phy subsystem keeps trying to autonegotiate, without
      success.
      
      Hence, inspect the BMCR_ANENABLE bit after genphy_config_init(): on
      reset it is 0 when autonegotiation is disabled, so use this value
      instead of BMSR_ANEGCAPABLE.
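      The change boils down to which register bit is trusted. The bit values
      are those in <linux/mii.h>; naive_can_autoneg() and
      dp83620_can_autoneg() are hypothetical helpers contrasting the old and
      new checks, not functions from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as defined in <linux/mii.h>. */
#define BMCR_ANENABLE    0x1000	/* BMCR: autonegotiation enabled */
#define BMSR_ANEGCAPABLE 0x0008	/* BMSR: able to autonegotiate */

/* Old check: trusts BMSR, which the DP83620 always reports as set. */
static int naive_can_autoneg(uint16_t bmsr)
{
	return (bmsr & BMSR_ANEGCAPABLE) != 0;
}

/* New check: BMCR_ANENABLE reads 0 after reset when the hardware
 * configuration (e.g. fiber mode) disables autonegotiation. */
static int dp83620_can_autoneg(uint16_t bmcr)
{
	return (bmcr & BMCR_ANENABLE) != 0;
}
```

      In fiber mode (BMCR_ANENABLE clear, BMSR_ANEGCAPABLE still set) the old
      check says "try autoneg" while the new one correctly says no.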
      
      [1] https://e2e.ti.com/support/interface/ethernet/f/903/p/697165/2571170
      Signed-off-by: Alvaro Gamez Machado <alvaro.gamez@hazent.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • socket: close race condition between sock_close() and sockfs_setattr() · 6d8c50dc
      Cong Wang authored
      fchownat() doesn't even hold a refcnt on the fd until it figures out
      the fd is really needed (otherwise it is ignored), and it releases it
      after it resolves the path. This means sock_close() could race with
      sockfs_setattr(), which leads to a NULL pointer dereference
      since typically we set sock->sk to NULL in ->release().
      
      As pointed out by Al, this is unique to sockfs. So we can fix this
      in socket layer by acquiring inode_lock in sock_close() and
      checking against NULL in sockfs_setattr().
      
      sock_release() is called in many places; only the sock_close()
      path matters here. And fortunately, this should not affect normal
      sock_close() as it is only called when the last fd refcnt is gone.
      It only affects sock_close() with a parallel sockfs_setattr() in
      progress, which is not common.
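      A userspace sketch of the locking scheme, with a pthread mutex
      standing in for inode_lock() and hypothetical fake_sock_close() and
      fake_sockfs_setattr() functions: the close path clears sk under the
      lock, and the setattr path re-checks sk under the same lock before
      dereferencing it. This illustrates the idea only; it is not the
      socket-layer code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical reduced socket: 'lock' plays the role of the inode lock,
 * 'sk' the protocol socket that ->release() clears. */
struct fake_sk {
	int uid;
};

struct fake_socket {
	pthread_mutex_t lock;
	struct fake_sk *sk;
};

/* Close path: clear sk under the lock, so a concurrent setattr either
 * runs entirely before the release or sees sk == NULL. */
static void fake_sock_close(struct fake_socket *sock)
{
	pthread_mutex_lock(&sock->lock);
	sock->sk = NULL;		/* stand-in for ->release() */
	pthread_mutex_unlock(&sock->lock);
}

/* setattr path: only touch sk if it still exists. Returns 0 if the UID
 * was updated, -1 if the socket had already been released. */
static int fake_sockfs_setattr(struct fake_socket *sock, int uid)
{
	int ret = -1;

	pthread_mutex_lock(&sock->lock);
	if (sock->sk) {
		sock->sk->uid = uid;
		ret = 0;
	}
	pthread_mutex_unlock(&sock->lock);
	return ret;
}
```

      The normal close path only pays for one uncontended lock/unlock pair,
      matching the message's point that only a racing setattr is affected.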
      
      Fixes: 86741ec2 ("net: core: Add a UID field to struct sock.")
      Reported-by: shankarapailoor <shankarapailoor@gmail.com>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Cc: Lorenzo Colitti <lorenzo@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag '4.18-fixes-smb3' of git://git.samba.org/sfrench/cifs-2.6 · 0c14e43a
      Linus Torvalds authored
      Pull cifs fixes from Steve French:
      
       - one smb3 (ACL related) fix for stable
      
       - one SMB3 security enhancement (when mounting -t smb3 forbid less
         secure dialects)
      
       - some RDMA and compounding fixes
      
      * tag '4.18-fixes-smb3' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: fix a buffer leak in smb2_query_symlink
        smb3: do not allow insecure cifs mounts when using smb3
        CIFS: Fix NULL ptr deref
        CIFS: fix encryption in SMB3.1.1
        CIFS: Pass page offset for encrypting
        CIFS: Pass page offset for calculating signature
        CIFS: SMBD: Support page offset in memory registration
        CIFS: SMBD: Support page offset in RDMA recv
        CIFS: SMBD: Support page offset in RDMA send
        CIFS: When sending data on socket, pass the correct page offset
        CIFS: Introduce helper function to get page offset and length in smb_rqst
        CIFS: Calculate the correct request length based on page offset and tail size
        cifs: For SMB2 security informaion query, check for minimum sized security descriptor instead of sizeof FileAllInformation class
        CIFS: Fix signing for SMB2/3
    • Merge tag 'for-linus-20180610' of git://git.kernel.dk/linux-block · bbaa1013
      Linus Torvalds authored
      Pull block flush handling fix from Jens Axboe:
       "Single fix that we should merge now, fixing a regression in queuing
        flush requests, where request flags were accessed after calling the
        end_request handler"
      
      * tag 'for-linus-20180610' of git://git.kernel.dk/linux-block:
        block: fix use-after-free in block flush handling
    • Merge branch 'core-rseq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d82991a8
      Linus Torvalds authored
      Pull restartable sequence support from Thomas Gleixner:
       "The restartable sequences syscall (finally):
      
        After a lot of back and forth discussion and massive delays caused by
        the speculative distraction of maintainers, the core set of
        restartable sequences has finally reached a consensus.
      
        It comes with the basic non disputed core implementation along with
        support for arm, powerpc and x86 and a full set of selftests
      
        It was exposed to linux-next earlier this week, so it does not fully
        comply with the merge window requirements, but there is really no
        point to drag it out for yet another cycle"
      
      * 'core-rseq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        rseq/selftests: Provide Makefile, scripts, gitignore
        rseq/selftests: Provide parametrized tests
        rseq/selftests: Provide basic percpu ops test
        rseq/selftests: Provide basic test
        rseq/selftests: Provide rseq library
        selftests/lib.mk: Introduce OVERRIDE_TARGETS
        powerpc: Wire up restartable sequences system call
        powerpc: Add syscall detection for restartable sequences
        powerpc: Add support for restartable sequences
        x86: Wire up restartable sequence system call
        x86: Add support for restartable sequences
        arm: Wire up restartable sequences system call
        arm: Add syscall detection for restartable sequences
        arm: Add restartable sequences support
        rseq: Introduce restartable sequences system call
        uapi/headers: Provide types_32_64.h
    • Linus Torvalds's avatar
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f4e5b30d
      Linus Torvalds authored
      Pull x86 updates and fixes from Thomas Gleixner:
      
       - Fix the (late) fallout from the vector management rework causing
         hlist corruption and irq descriptor reference leaks caused by a
         missing sanity check.
      
         The straightforward fix caused another long-standing issue to
         surface. The pre-rework code hid the issue due to being way slower,
         but now the chance that user space sees an EBUSY error return when
         updating irq affinities is way higher, and quite a few userspace
         tools do not handle it properly, even though EBUSY has been a
         possible return value for at least 10 years.
      
         It turned out that the EBUSY return can be avoided completely by
         utilizing the existing delayed affinity update mechanism for irq
         remapped scenarios as well. That's a bit more error handling in the
         kernel, but avoids fruitless finger-pointing discussions with tool
         developers.
      
       - Decouple PHYSICAL_MASK from AMD SME as it's going to be required for
         the upcoming Intel memory encryption support as well.
      
       - Handle legacy device ACPI detection properly for newer platforms
      
       - Fix the wrong argument ordering in the vector allocation tracepoint
      
       - Simplify the IDT setup code for the APIC=n case
      
       - Use the proper string helpers in the MTRR code
      
       - Remove a stale unused VDSO source file
      
       - Convert the microcode update lock to a raw spinlock as it's used in
         atomic context.
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/intel_rdt: Enable CMT and MBM on new Skylake stepping
        x86/apic/vector: Print APIC control bits in debugfs
        genirq/affinity: Defer affinity setting if irq chip is busy
        x86/platform/uv: Use apic_ack_irq()
        x86/ioapic: Use apic_ack_irq()
        irq_remapping: Use apic_ack_irq()
        x86/apic: Provide apic_ack_irq()
        genirq/migration: Avoid out of line call if pending is not set
        genirq/generic_pending: Do not lose pending affinity update
        x86/apic/vector: Prevent hlist corruption and leaks
        x86/vector: Fix the args of vector_alloc tracepoint
        x86/idt: Simplify the idt_setup_apic_and_irq_gates()
        x86/platform/uv: Remove extra parentheses
        x86/mm: Decouple dynamic __PHYSICAL_MASK from AMD SME
        x86: Mark native_set_p4d() as __always_inline
        x86/microcode: Make the late update update_lock a raw lock for RT
        x86/mtrr: Convert to use strncpy_from_user() helper
        x86/mtrr: Convert to use match_string() helper
        x86/vdso: Remove unused file
        x86/i8237: Register device based on FADT legacy boot flag
    • Linus Torvalds's avatar
      Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a2211de0
      Linus Torvalds authored
      Pull x86 pti updates from Thomas Gleixner:
       "Three small commits updating the SSB mitigation to take the updated
        AMD mitigation variants into account"
      
      * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/bugs: Switch the selection of mitigation from CPU vendor to CPU features
        x86/bugs: Add AMD's SPEC_CTRL MSR usage
        x86/bugs: Add AMD's variant of SSB_NO
    • Linus Torvalds's avatar
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 2322d6c5
      Linus Torvalds authored
      Pull more perf tooling updates from Thomas Gleixner:
       "Perf tool updates and fixes:
      
        perf stat:
      
         - Display user and system time for workload targets (Jiri Olsa)
      
        perf record:
      
         - Enable arbitrary event names through the name= modifier (Alexey Budankov)
      
        PowerPC:
      
         - Add a python script for hypervisor call statistics (Ravi Bangoria)
      
        Intel PT: (Adrian Hunter)
      
         - Fix sync_switch INTEL_PT_SS_NOT_TRACING
      
         - Fix decoding to accept CBR between FUP and corresponding TIP
      
         - Fix MTC timing after overflow
      
         - Fix "Unexpected indirect branch" error
      
        perf test:
      
         - record+probe_libc_inet_pton:
            - To get the symbol table for dynamic shared objects on Ubuntu we
              need to pass the -D/--dynamic command line option, unlike with
              the Fedora distros (Arnaldo Carvalho de Melo)
      
         - code-reading:
            - Fix perf_env setup for PTI entry trampolines (Adrian Hunter)
      
         - kmod-path:
            - Add tests for vdso32 and vdsox32 (Adrian Hunter)
      
         - Use header file util/debug.h (Thomas Richter)
      
        perf annotate:
      
         - Make the various UI backends (stdio, TUI, gtk) more consistently
           use structs holding the annotation options specified by the user
           (Arnaldo Carvalho de Melo)
      
         - Move annotation specific knobs from the symbol_conf global kitchen
           sink to the annotation option structs (Arnaldo Carvalho de Melo)
      
        perf script:
      
         - Add more PMU fields to python scripts event handler dict (Jin Yao)
      
        Core:
      
         - Fix misleading error for some unparsable events mentioning PMUs
           when those are not involved in the problem (Jiri Olsa)
      
         - Consider BSS symbols when processing /proc/kallsyms ('B' and 'b')
           (Arnaldo Carvalho de Melo)
      
         - Be more robust when trying to use per-symbol histograms, checking
           for unlikely but possible cases where the space for the histograms
           wasn't allocated, print a debug message for such cases (Arnaldo
           Carvalho de Melo)
      
         - Fix symbol and object code resolution for vdso32 and vdsox32
           (Adrian Hunter)
      
         - No need to check for NULL when passing pointers to foo__get() style
           refcount grabbing helpers: just like in the kernel and with free(),
           it's safe to pass a NULL pointer, which avoids having to check it
           before each and every foo__get() call (Arnaldo Carvalho de Melo)
      
         - Remove some dead code (quote.[ch]) (Arnaldo Carvalho de Melo)
      
         - Remove some needless globals, making them local (Arnaldo Carvalho
           de Melo)
      
         - Reduce usage of symbol_conf.use_callchain, using other means of
           finding out if callchains are in use or available for specific
           events, as we evolved this codebase to allow requesting callchains
           for just a subset of the monitored events. In time it will help
           polish recording and showing mixed sets across the various tools:
      
              perf record -e 'cycles/call-graph=fp/,cache-misses/call-graph=dwarf/,instructions'
      
           (Arnaldo Carvalho de Melo)
      
         - Consider PTI entry trampolines in map__rip_2objdump() (Adrian
           Hunter)"
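The NULL-tolerant foo__get() convention described in the pull message can be sketched as follows. This is a minimal illustration, assuming a hypothetical refcounted `struct foo` with `foo__new()`, `foo__get()`, `foo__put()` and `foo__share()` helpers invented for this example; it is not perf's actual code, whose helpers differ in detail (for instance in the refcount primitive they use).

```c
#include <stdlib.h>

/* Hypothetical refcounted object; not a real perf structure. */
struct foo {
	int refcnt;
};

/* Allocate a foo holding one reference; returns NULL on failure. */
static struct foo *foo__new(void)
{
	struct foo *f = malloc(sizeof(*f));

	if (f)
		f->refcnt = 1;
	return f;
}

/* Grab a reference. Like free(), a NULL argument is accepted and simply
 * passed through, so callers need no NULL check of their own. */
static struct foo *foo__get(struct foo *f)
{
	if (f)
		f->refcnt++;
	return f;
}

/* Drop a reference, freeing the object when the last one goes away.
 * NULL is accepted here as well. */
static void foo__put(struct foo *f)
{
	if (f && --f->refcnt == 0)
		free(f);
}

/* Hypothetical caller: takes a reference to a possibly-NULL pointer
 * without an "if (maybe_null)" guard around the foo__get() call. */
static struct foo *foo__share(struct foo *maybe_null)
{
	return foo__get(maybe_null);
}
```

Returning the argument from foo__get() lets a caller take the reference and store the pointer in a single expression, whether or not the pointer is NULL.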
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
        perf script python: Add dict fields introduction to Documentation
        perf script python: Add more PMU fields to event handler dict
        perf script python: Move dsoname code to a new function
        perf symbols: Add BSS symbols when reading from /proc/kallsyms
        perf annotate: Make __symbol__inc_addr_samples handle src->histograms == NULL
        perf intel-pt: Fix "Unexpected indirect branch" error
        perf intel-pt: Fix MTC timing after overflow
        perf intel-pt: Fix decoding to accept CBR between FUP and corresponding TIP
        perf intel-pt: Fix sync_switch INTEL_PT_SS_NOT_TRACING
        perf script powerpc: Python script for hypervisor call statistics
        perf test record+probe_libc_inet_pton: Ask 'nm' for dynamic symbols
        perf map: Consider PTI entry trampolines in rip_2objdump()
        perf test code-reading: Fix perf_env setup for PTI entry trampolines
        perf tools: Fix pmu events parsing rule
        perf stat: Display user and system time
        perf record: Enable arbitrary event names thru name= modifier
        perf tools: Fix symbol and object code resolution for vdso32 and vdsox32
        perf tests kmod-path: Add tests for vdso32 and vdsox32
        perf hists: Check if a hist_entry has callchains before using them
        perf hists: Introduce hist_entry__has_callchain() method
        ...
    • Linus Torvalds's avatar
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9f3fbe85
      Linus Torvalds authored
      Pull irq fixes from Thomas Gleixner:
       "Two small fixlets:
      
         - Add the missing iommu mapping call in the Freescale/NXP/Qualcomm
           (whoever owns it now) SCFG MSI irqchip driver. Otherwise IRQs won't
           work at all.
      
         - Fix a SMP=n build warning in the STM32 irq chip driver"
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irqchip/ls-scfg-msi: Map MSIs in the iommu
        irqchip/stm32: Fix non-SMP build warning
    • Linus Torvalds's avatar
      Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a8a4021b
      Linus Torvalds authored
      Pull core fixes from Thomas Gleixner:
       "A small set of core updates:
      
         - Make objtool cope with GCC8 oddities some more
      
         - Remove a stale local_irq_save/restore sequence in the signal code
           along with the stale comment in the RCU code. The underlying issue
           which led to this was solved a long time ago, but nobody cared
           to clean up the hackarounds"
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        signal: Remove no longer required irqsave/restore
        rcu: Update documentation of rcu_read_unlock()
        objtool: Fix GCC 8 cold subfunction detection for aliased functions
    • Anna-Maria Gleixner's avatar
      signal: Remove no longer required irqsave/restore · 59dc6f3c
      Anna-Maria Gleixner authored
      Commit a841796f ("signal: align __lock_task_sighand() irq disabling and
      RCU") introduced an RCU read-side critical section with interrupts
      disabled. The changelog suggested that a better long-term fix would be "to
      make rt_mutex_unlock() disable irqs when acquiring the rt_mutex structure's
      ->wait_lock".
      
      This long-term fix has been made in commit b4abf910 ("rtmutex: Make
      wait_lock irq safe") for a different reason.
      
      Therefore revert commit a841796f ("signal: align
      __lock_task_sighand() irq disabling and RCU") as the interrupt disable
      dance is no longer required.
      
      The change was tested on top of b4abf910 ("rtmutex: Make wait_lock
      irq safe") with a four-hour run of rcutorture scenario TREE03 with lockdep
      enabled as suggested by Paul McKenney.
      Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: bigeasy@linutronix.de
      Link: https://lkml.kernel.org/r/20180525090507.22248-3-anna-maria@linutronix.de
    • Anna-Maria Gleixner's avatar
      rcu: Update documentation of rcu_read_unlock() · ec84b27f
      Anna-Maria Gleixner authored
      Since commit b4abf910 ("rtmutex: Make wait_lock irq safe") the
      explanation in rcu_read_unlock() documentation about irq unsafe rtmutex
      wait_lock is no longer valid.
      
      Remove it so that kernel developers reading the documentation do not
      rely on it.
      Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: bigeasy@linutronix.de
      Link: https://lkml.kernel.org/r/20180525090507.22248-2-anna-maria@linutronix.de
  8. 09 Jun, 2018 1 commit
    • Linus Torvalds's avatar
      Merge branch 'proc-cmdline' · 3ca24ce9
      Linus Torvalds authored
      Merge proc_cmdline simplifications.
      
      This rewrites the get_mm_cmdline() logic to be rather simpler than it
      used to be, and makes the semantics for "cmdline goes past the end of
      the original area" more natural.
      
      You _can_ use prctl(PR_SET_MM) to just point your command line somewhere
      else entirely, but the traditional model is to just edit things in place
      and that still needs to continue to work.  At least this way the code
      makes some sense.
      
      * proc-cmdline:
        fs/proc: simplify and clarify get_mm_cmdline() function
        fs/proc: re-factor proc_pid_cmdline_read() a bit