1. 27 Jul, 2018 20 commits
    • f2fs: fix to correct return value of f2fs_trim_fs · 01f9cf6d
      Chao Yu authored
      We should account for the number of trimmed blocks from
      __wait_all_discard_cmd in __issue_discard_cmd_range; otherwise the
      trimmed block count returned by f2fs_trim_fs will be wrong. This patch
      fixes it.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to do sanity check with {sit,nat}_ver_bitmap_bytesize · c77ec61c
      Chao Yu authored
      This patch adds a sanity check on {sit,nat}_ver_bitmap_bytesize
      during mount, in order to avoid accessing memory across the cache
      boundary when the bitmap size is abnormal.
      
      - Overview
      buffer overrun in build_sit_info() when mounting a crafted f2fs image
      
      - Reproduce
      
      - Kernel message
      [  548.580867] F2FS-fs (loop0): Invalid log blocks per segment (8201)
      
      [  548.580877] F2FS-fs (loop0): Can't find valid F2FS filesystem in 1th superblock
      [  548.584979] ==================================================================
      [  548.586568] BUG: KASAN: use-after-free in kmemdup+0x36/0x50
      [  548.587715] Read of size 64 at addr ffff8801e9c265ff by task mount/1295
      
      [  548.589428] CPU: 1 PID: 1295 Comm: mount Not tainted 4.18.0-rc1+ #4
      [  548.589432] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      [  548.589438] Call Trace:
      [  548.589474]  dump_stack+0x7b/0xb5
      [  548.589487]  print_address_description+0x70/0x290
      [  548.589492]  kasan_report+0x291/0x390
      [  548.589496]  ? kmemdup+0x36/0x50
      [  548.589509]  check_memory_region+0x139/0x190
      [  548.589514]  memcpy+0x23/0x50
      [  548.589518]  kmemdup+0x36/0x50
      [  548.589545]  f2fs_build_segment_manager+0x8fa/0x3410
      [  548.589551]  ? __asan_loadN+0xf/0x20
      [  548.589560]  ? f2fs_sanity_check_ckpt+0x1be/0x240
      [  548.589566]  ? f2fs_flush_sit_entries+0x10c0/0x10c0
      [  548.589587]  ? __put_user_ns+0x40/0x40
      [  548.589604]  ? find_next_bit+0x57/0x90
      [  548.589610]  f2fs_fill_super+0x194b/0x2b40
      [  548.589617]  ? f2fs_commit_super+0x1b0/0x1b0
      [  548.589637]  ? set_blocksize+0x90/0x140
      [  548.589651]  mount_bdev+0x1c5/0x210
      [  548.589655]  ? f2fs_commit_super+0x1b0/0x1b0
      [  548.589667]  f2fs_mount+0x15/0x20
      [  548.589672]  mount_fs+0x60/0x1a0
      [  548.589683]  ? alloc_vfsmnt+0x309/0x360
      [  548.589688]  vfs_kern_mount+0x6b/0x1a0
      [  548.589699]  do_mount+0x34a/0x18c0
      [  548.589710]  ? lockref_put_or_lock+0xcf/0x160
      [  548.589716]  ? copy_mount_string+0x20/0x20
      [  548.589728]  ? memcg_kmem_put_cache+0x1b/0xa0
      [  548.589734]  ? kasan_check_write+0x14/0x20
      [  548.589740]  ? _copy_from_user+0x6a/0x90
      [  548.589744]  ? memdup_user+0x42/0x60
      [  548.589750]  ksys_mount+0x83/0xd0
      [  548.589755]  __x64_sys_mount+0x67/0x80
      [  548.589781]  do_syscall_64+0x78/0x170
      [  548.589797]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  548.589820] RIP: 0033:0x7f76fc331b9a
      [  548.589821] Code: 48 8b 0d 01 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ce c2 2b 00 f7 d8 64 89 01 48
      [  548.589880] RSP: 002b:00007ffd4f0a0e48 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
      [  548.589890] RAX: ffffffffffffffda RBX: 000000000146c030 RCX: 00007f76fc331b9a
      [  548.589892] RDX: 000000000146c210 RSI: 000000000146df30 RDI: 0000000001474ec0
      [  548.589895] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013
      [  548.589897] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 0000000001474ec0
      [  548.589900] R13: 000000000146c210 R14: 0000000000000000 R15: 0000000000000003
      
      [  548.590242] The buggy address belongs to the page:
      [  548.591243] page:ffffea0007a70980 count:0 mapcount:0 mapping:0000000000000000 index:0x0
      [  548.592886] flags: 0x2ffff0000000000()
      [  548.593665] raw: 02ffff0000000000 dead000000000100 dead000000000200 0000000000000000
      [  548.595258] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
      [  548.603713] page dumped because: kasan: bad access detected
      
      [  548.605203] Memory state around the buggy address:
      [  548.606198]  ffff8801e9c26480: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      [  548.607676]  ffff8801e9c26500: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      [  548.609157] >ffff8801e9c26580: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      [  548.610629]                                                                 ^
      [  548.612088]  ffff8801e9c26600: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      [  548.613674]  ffff8801e9c26680: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      [  548.615141] ==================================================================
      [  548.616613] Disabling lock debugging due to kernel taint
      [  548.622871] WARNING: CPU: 1 PID: 1295 at mm/page_alloc.c:4065 __alloc_pages_slowpath+0xe4a/0x1420
      [  548.622878] Modules linked in: snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd mac_hid i2c_piix4 soundcore ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx raid1 raid0 multipath linear 8139too crct10dif_pclmul crc32_pclmul qxl drm_kms_helper syscopyarea aesni_intel sysfillrect sysimgblt fb_sys_fops ttm drm aes_x86_64 crypto_simd cryptd 8139cp glue_helper mii pata_acpi floppy
      [  548.623217] CPU: 1 PID: 1295 Comm: mount Tainted: G    B             4.18.0-rc1+ #4
      [  548.623219] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      [  548.623226] RIP: 0010:__alloc_pages_slowpath+0xe4a/0x1420
      [  548.623227] Code: ff ff 01 89 85 c8 fe ff ff e9 91 fc ff ff 41 89 c5 e9 5c fc ff ff 0f 0b 89 f8 25 ff ff f7 ff 89 85 8c fe ff ff e9 d5 f2 ff ff <0f> 0b e9 65 f2 ff ff 65 8b 05 38 81 d2 47 f6 c4 01 74 1c 65 48 8b
      [  548.623281] RSP: 0018:ffff8801f28c7678 EFLAGS: 00010246
      [  548.623284] RAX: 0000000000000000 RBX: 00000000006040c0 RCX: ffffffffb82f73b7
      [  548.623287] RDX: 1ffff1003e518eeb RSI: 000000000000000c RDI: 0000000000000000
      [  548.623290] RBP: ffff8801f28c7880 R08: 0000000000000000 R09: ffffed0047fff2c5
      [  548.623292] R10: 0000000000000001 R11: ffffed0047fff2c4 R12: ffff8801e88de040
      [  548.623295] R13: 00000000006040c0 R14: 000000000000000c R15: ffff8801f28c7938
      [  548.623299] FS:  00007f76fca51840(0000) GS:ffff8801f6f00000(0000) knlGS:0000000000000000
      [  548.623302] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  548.623304] CR2: 00007f19b9171760 CR3: 00000001ed952000 CR4: 00000000000006e0
      [  548.623317] Call Trace:
      [  548.623325]  ? kasan_check_read+0x11/0x20
      [  548.623330]  ? __zone_watermark_ok+0x92/0x240
      [  548.623336]  ? get_page_from_freelist+0x1c3/0x1d90
      [  548.623347]  ? _raw_spin_lock_irqsave+0x2a/0x60
      [  548.623353]  ? warn_alloc+0x250/0x250
      [  548.623358]  ? save_stack+0x46/0xd0
      [  548.623361]  ? kasan_kmalloc+0xad/0xe0
      [  548.623366]  ? __isolate_free_page+0x2a0/0x2a0
      [  548.623370]  ? mount_fs+0x60/0x1a0
      [  548.623374]  ? vfs_kern_mount+0x6b/0x1a0
      [  548.623378]  ? do_mount+0x34a/0x18c0
      [  548.623383]  ? ksys_mount+0x83/0xd0
      [  548.623387]  ? __x64_sys_mount+0x67/0x80
      [  548.623391]  ? do_syscall_64+0x78/0x170
      [  548.623396]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  548.623401]  __alloc_pages_nodemask+0x3c5/0x400
      [  548.623407]  ? __alloc_pages_slowpath+0x1420/0x1420
      [  548.623412]  ? __mutex_lock_slowpath+0x20/0x20
      [  548.623417]  ? kvmalloc_node+0x31/0x80
      [  548.623424]  alloc_pages_current+0x75/0x110
      [  548.623436]  kmalloc_order+0x24/0x60
      [  548.623442]  kmalloc_order_trace+0x24/0xb0
      [  548.623448]  __kmalloc_track_caller+0x207/0x220
      [  548.623455]  ? f2fs_build_node_manager+0x399/0xbb0
      [  548.623460]  kmemdup+0x20/0x50
      [  548.623465]  f2fs_build_node_manager+0x399/0xbb0
      [  548.623470]  f2fs_fill_super+0x195e/0x2b40
      [  548.623477]  ? f2fs_commit_super+0x1b0/0x1b0
      [  548.623481]  ? set_blocksize+0x90/0x140
      [  548.623486]  mount_bdev+0x1c5/0x210
      [  548.623489]  ? f2fs_commit_super+0x1b0/0x1b0
      [  548.623495]  f2fs_mount+0x15/0x20
      [  548.623498]  mount_fs+0x60/0x1a0
      [  548.623503]  ? alloc_vfsmnt+0x309/0x360
      [  548.623508]  vfs_kern_mount+0x6b/0x1a0
      [  548.623513]  do_mount+0x34a/0x18c0
      [  548.623518]  ? lockref_put_or_lock+0xcf/0x160
      [  548.623523]  ? copy_mount_string+0x20/0x20
      [  548.623528]  ? memcg_kmem_put_cache+0x1b/0xa0
      [  548.623533]  ? kasan_check_write+0x14/0x20
      [  548.623537]  ? _copy_from_user+0x6a/0x90
      [  548.623542]  ? memdup_user+0x42/0x60
      [  548.623547]  ksys_mount+0x83/0xd0
      [  548.623552]  __x64_sys_mount+0x67/0x80
      [  548.623557]  do_syscall_64+0x78/0x170
      [  548.623562]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  548.623566] RIP: 0033:0x7f76fc331b9a
      [  548.623567] Code: 48 8b 0d 01 c3 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ce c2 2b 00 f7 d8 64 89 01 48
      [  548.623632] RSP: 002b:00007ffd4f0a0e48 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
      [  548.623636] RAX: ffffffffffffffda RBX: 000000000146c030 RCX: 00007f76fc331b9a
      [  548.623639] RDX: 000000000146c210 RSI: 000000000146df30 RDI: 0000000001474ec0
      [  548.623641] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013
      [  548.623643] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 0000000001474ec0
      [  548.623646] R13: 000000000146c210 R14: 0000000000000000 R15: 0000000000000003
      [  548.623650] ---[ end trace 4ce02f25ff7d3df5 ]---
      [  548.623656] F2FS-fs (loop0): Failed to initialize F2FS node manager
      [  548.627936] F2FS-fs (loop0): Invalid log blocks per segment (8201)
      
      [  548.627940] F2FS-fs (loop0): Can't find valid F2FS filesystem in 1th superblock
      [  548.635835] F2FS-fs (loop0): Failed to initialize F2FS node manager
      
      - Location
      https://elixir.bootlin.com/linux/v4.18-rc1/source/fs/f2fs/segment.c#L3578
      
      	sit_i->sit_bitmap = kmemdup(src_bitmap, bitmap_size, GFP_KERNEL);
      
      A buffer overrun happens when doing memcpy. I suspect there are missing (inconsistent) checks on bitmap_size.
      
      Reported by Wen Xu (wen.xu@gatech.edu) from SSLab, Gatech.
      Reported-by: Wen Xu <wen.xu@gatech.edu>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
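The idea behind the bitmap-size check above can be sketched in user-space C. This is a minimal illustration, not the kernel patch: `dup_bitmap` and its parameters (`avail`, `claimed`, `expected`) are hypothetical names, and how the expected size is derived from the rest of the metadata is left to the caller.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Duplicate `claimed` bytes of a bitmap from a source buffer that holds
 * `avail` bytes. `expected` is the size implied by the rest of the
 * metadata. All names here are hypothetical, for illustration only.
 */
static int dup_bitmap(const unsigned char *src, size_t avail,
		      size_t claimed, size_t expected,
		      unsigned char **out)
{
	/* Refuse a corrupted size before any copy can read out of bounds. */
	if (claimed == 0 || claimed != expected || claimed > avail)
		return -EINVAL;

	*out = malloc(claimed);
	if (!*out)
		return -ENOMEM;

	memcpy(*out, src, claimed);
	return 0;
}
```

Rejecting the size at validation time means the later copy never needs its own bounds logic, which is the same reason the patch places the check in the mount path.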
    • f2fs: fix to do sanity check with secs_per_zone · 42bf546c
      Chao Yu authored
      As Wen Xu reported at the link below:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=200183
      
      - Overview
      Divide by zero in reset_curseg() when mounting a crafted f2fs image
      
      - Reproduce
      
      - Kernel message
      [  588.281510] divide error: 0000 [#1] SMP KASAN PTI
      [  588.282701] CPU: 0 PID: 1293 Comm: mount Not tainted 4.18.0-rc1+ #4
      [  588.284000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      [  588.286178] RIP: 0010:reset_curseg+0x94/0x1a0
      [  588.298166] RSP: 0018:ffff8801e88d7940 EFLAGS: 00010246
      [  588.299360] RAX: 0000000000000014 RBX: ffff8801e1d46d00 RCX: ffffffffb88bf60b
      [  588.300809] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffff8801e1d46d64
      [  588.305272] R13: 0000000000000000 R14: 0000000000000014 R15: 0000000000000000
      [  588.306822] FS:  00007fad85008840(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
      [  588.308456] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  588.309623] CR2: 0000000001705078 CR3: 00000001f30f8000 CR4: 00000000000006f0
      [  588.311085] Call Trace:
      [  588.311637]  f2fs_build_segment_manager+0x103f/0x3410
      [  588.316136]  ? f2fs_commit_super+0x1b0/0x1b0
      [  588.317031]  ? set_blocksize+0x90/0x140
      [  588.319473]  f2fs_mount+0x15/0x20
      [  588.320166]  mount_fs+0x60/0x1a0
      [  588.320847]  ? alloc_vfsmnt+0x309/0x360
      [  588.321647]  vfs_kern_mount+0x6b/0x1a0
      [  588.322432]  do_mount+0x34a/0x18c0
      [  588.323175]  ? strndup_user+0x46/0x70
      [  588.323937]  ? copy_mount_string+0x20/0x20
      [  588.324793]  ? memcg_kmem_put_cache+0x1b/0xa0
      [  588.325702]  ? kasan_check_write+0x14/0x20
      [  588.326562]  ? _copy_from_user+0x6a/0x90
      [  588.327375]  ? memdup_user+0x42/0x60
      [  588.328118]  ksys_mount+0x83/0xd0
      [  588.328808]  __x64_sys_mount+0x67/0x80
      [  588.329607]  do_syscall_64+0x78/0x170
      [  588.330400]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  588.331461] RIP: 0033:0x7fad848e8b9a
      [  588.336022] RSP: 002b:00007ffd7c5b6be8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
      [  588.337547] RAX: ffffffffffffffda RBX: 00000000016f8030 RCX: 00007fad848e8b9a
      [  588.338999] RDX: 00000000016f8210 RSI: 00000000016f9f30 RDI: 0000000001700ec0
      [  588.340442] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000013
      [  588.341887] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 0000000001700ec0
      [  588.343341] R13: 00000000016f8210 R14: 0000000000000000 R15: 0000000000000003
      [  588.354891] ---[ end trace 4ce02f25ff7d3df5 ]---
      [  588.355862] RIP: 0010:reset_curseg+0x94/0x1a0
      [  588.360742] RSP: 0018:ffff8801e88d7940 EFLAGS: 00010246
      [  588.361812] RAX: 0000000000000014 RBX: ffff8801e1d46d00 RCX: ffffffffb88bf60b
      [  588.363485] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: ffff8801e1d46d64
      [  588.365213] RBP: ffff8801e88d7968 R08: ffffed003c32266f R09: ffffed003c32266f
      [  588.366661] R10: 0000000000000001 R11: ffffed003c32266e R12: ffff8801f0337700
      [  588.368110] R13: 0000000000000000 R14: 0000000000000014 R15: 0000000000000000
      [  588.370057] FS:  00007fad85008840(0000) GS:ffff8801f6e00000(0000) knlGS:0000000000000000
      [  588.372099] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  588.373291] CR2: 0000000001705078 CR3: 00000001f30f8000 CR4: 00000000000006f0
      
      - Location
      https://elixir.bootlin.com/linux/latest/source/fs/f2fs/segment.c#L2147
              curseg->zone = GET_ZONE_FROM_SEG(sbi, curseg->segno);
      
      If secs_per_zone is corrupted by a fuzzing test, it causes a
      divide-by-zero when the GET_ZONE_FROM_SEG macro is used, so we should do
      more sanity checking of secs_per_zone during mount to avoid this issue.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
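A minimal user-space sketch of the divide-by-zero guard described above. The struct and function names are hypothetical stand-ins, not f2fs definitions; the only point is that the divisors are validated once, before any code path divides by them.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the on-disk geometry fields. */
struct geometry {
	uint32_t segs_per_sec;
	uint32_t secs_per_zone;
};

/* Reject geometry that would later be used as a zero divisor. */
static int check_geometry(const struct geometry *g)
{
	if (g->segs_per_sec == 0 || g->secs_per_zone == 0)
		return -EINVAL;
	return 0;
}

/* Map a segment number to its zone; call only after check_geometry() == 0. */
static uint32_t zone_from_seg(const struct geometry *g, uint32_t segno)
{
	return (segno / g->segs_per_sec) / g->secs_per_zone;
}
```

A caller would run `check_geometry()` at mount time and fail the mount on `-EINVAL`, so `zone_from_seg()` never sees a zero divisor.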
    • f2fs: disable f2fs_check_rb_tree_consistence · 67fce70b
      Chao Yu authored
      If there are millions of discard entries cached in the rb tree, each
      sanity check of it can cause very long latency, because cmd_lock is
      held throughout and blocks other lock grabbers.
      
      On the other hand, the check has been enabled for a very long time,
      and so far we have seen no inconsistent condition caused by bugs.
      
      Still, we choose not to remove it outright; instead, add a flag to
      disable the check for now. If there is a related code change, we can
      reuse it to detect bugs.
      Signed-off-by: Yunlei He <heyunlei@huawei.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: introduce and spread verify_blkaddr · e1da7872
      Chao Yu authored
      This patch introduces verify_blkaddr to check meta/data block addresses
      against the valid range, in order to detect bugs earlier.
      
      In addition, once we encounter an invalid blkaddr, notify the user to run
      fsck to fix it, and let the kernel panic.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: use timespec64 for inode timestamps · 24b81dfc
      Arnd Bergmann authored
      The on-disk representation and the vfs both use 64-bit tv_sec values,
      so let's change the last missing piece in the middle.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to wait on page writeback before updating page · 6aead161
      Chao Yu authored
      In the error path of f2fs_move_rehashed_dirents, the inode page could be
      under writeback, so we should wait on inode page writeback before
      updating it.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: assign REQ_RAHEAD to bio for ->readpages · e2e59414
      Jaegeuk Kim authored
      As Jens reported, we'd better assign REQ_RAHEAD to bio by the fact that
      ->readpages is called only from read-ahead.
      
      In Documentation/filesystems/vfs.txt,
      
      readpages: called by the VM to read pages associated with the address_space
        	object. This is essentially just a vector version of
        	readpage.  Instead of just one page, several pages are
        	requested.
      	readpages is only used for read-ahead, so read errors are
        	ignored.  If anything goes wrong, feel free to give up.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix a hungtask problem caused by congestion_wait · 2a63531a
      Yunlei He authored
      This patch fixes a hungtask problem which can be reproduced as follows:
      
      Thread 0~3:
      while true
      do
              touch /xxx/test/file_xxx
      done
      
      Thread 4 writes a new checkpoint every three seconds.
      
      In the meantime, fio starts 16 threads for randwrite.
      
      According to my debug info, the number of loop cycles in
      f2fs_sync_dirty_inodes exceeds 1000, and most cycles fall into
      congestion_wait() and sleep for more than 20ms. With this patch, the
      number of cycles is reduced to 3.
      Signed-off-by: Yunlei He <heyunlei@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: Fix uninitialized return in f2fs_ioc_shutdown() · 2a96d8ad
      Dan Carpenter authored
      "ret" can be uninitialized on the success path when "in ==
      F2FS_GOING_DOWN_FULLSYNC".
      
      Fixes: 60b2b4ee ("f2fs: Fix deadlock in shutdown ioctl")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
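The class of bug fixed above can be sketched as follows. This is an illustrative reduction, not the f2fs ioctl: `do_shutdown`, the enum values and the `freeze_fails` parameter are hypothetical. It shows why `ret` needs an initial value when one success path never assigns it.

```c
#include <errno.h>

/* Hypothetical shutdown modes, loosely mirroring the ioctl argument. */
enum { GOING_DOWN_FULLSYNC, GOING_DOWN_METASYNC };

static int do_shutdown(int mode, int freeze_fails)
{
	int ret = 0;	/* initialized, so success paths return 0 */

	switch (mode) {
	case GOING_DOWN_FULLSYNC:
		if (freeze_fails) {
			ret = -EIO;
			goto out;
		}
		/* Success: nothing on this path assigns ret. */
		break;
	case GOING_DOWN_METASYNC:
		break;
	default:
		ret = -EINVAL;
		goto out;
	}
out:
	return ret;
}
```

Without the initializer, the full-sync success path would return whatever happened to be on the stack.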
    • f2fs: don't issue discard commands in online discard is on · 5a615492
      Jaegeuk Kim authored
      We don't need to issue discard commands if online discard is on, as
      mentioned in the comment.
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix to propagate return value of scan_nat_page() · e2374015
      Chao Yu authored
      As Anatoly Trosinenko reported in bugzilla:
      
      How to reproduce:
      1. Compile the 73fcb1a3 version of the kernel using the config attached
      2. Unpack and mount the attached filesystem image as F2FS
      3. The kernel will BUG() on mount (BUGs are explicitly enabled in config)
      
      [    2.233612] F2FS-fs (sda): Found nat_bits in checkpoint
      [    2.248422] ------------[ cut here ]------------
      [    2.248857] kernel BUG at fs/f2fs/node.c:1967!
      [    2.249760] invalid opcode: 0000 [#1] SMP NOPTI
      [    2.250219] Modules linked in:
      [    2.251848] CPU: 0 PID: 944 Comm: mount Not tainted 4.17.0-rc5+ #1
      [    2.252331] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      [    2.253305] RIP: 0010:build_free_nids+0x337/0x3f0
      [    2.253672] RSP: 0018:ffffae7fc0857c50 EFLAGS: 00000246
      [    2.254080] RAX: 00000000ffffffff RBX: 0000000000000123 RCX: 0000000000000001
      [    2.254638] RDX: ffff9aa7063d5c00 RSI: 0000000000000122 RDI: ffff9aa705852e00
      [    2.255190] RBP: ffff9aa705852e00 R08: 0000000000000001 R09: ffff9aa7059090c0
      [    2.255719] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9aa705852e00
      [    2.256242] R13: ffff9aa7063ad000 R14: ffff9aa705919000 R15: 0000000000000123
      [    2.256809] FS:  00000000023078c0(0000) GS:ffff9aa707800000(0000) knlGS:0000000000000000
      [    2.258654] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    2.259153] CR2: 00000000005511ae CR3: 0000000005872000 CR4: 00000000000006f0
      [    2.259801] Call Trace:
      [    2.260583]  build_node_manager+0x5cd/0x600
      [    2.260963]  f2fs_fill_super+0x66a/0x17c0
      [    2.261300]  ? f2fs_commit_super+0xe0/0xe0
      [    2.261622]  mount_bdev+0x16e/0x1a0
      [    2.261899]  mount_fs+0x30/0x150
      [    2.262398]  vfs_kern_mount.part.28+0x4f/0xf0
      [    2.262743]  do_mount+0x5d0/0xc60
      [    2.263010]  ? _copy_from_user+0x37/0x60
      [    2.263313]  ? memdup_user+0x39/0x60
      [    2.263692]  ksys_mount+0x7b/0xd0
      [    2.263960]  __x64_sys_mount+0x1c/0x20
      [    2.264268]  do_syscall_64+0x43/0xf0
      [    2.264560]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [    2.265095] RIP: 0033:0x48d31a
      [    2.265502] RSP: 002b:00007ffc6fe60a08 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
      [    2.266089] RAX: ffffffffffffffda RBX: 0000000000008000 RCX: 000000000048d31a
      [    2.266607] RDX: 00007ffc6fe62fa5 RSI: 00007ffc6fe62f9d RDI: 00007ffc6fe62f94
      [    2.267130] RBP: 00000000023078a0 R08: 0000000000000000 R09: 0000000000000000
      [    2.267670] R10: 0000000000008000 R11: 0000000000000246 R12: 0000000000000000
      [    2.268192] R13: 0000000000000000 R14: 00007ffc6fe60c78 R15: 0000000000000000
      [    2.268767] Code: e8 5f c3 ff ff 83 c3 01 41 83 c7 01 81 fb c7 01 00 00 74 48 44 39 7d 04 76 42 48 63 c3 48 8d 04 c0 41 8b 44 06 05 83 f8 ff 75 c1 <0f> 0b 49 8b 45 50 48 8d b8 b0 00 00 00 e8 37 59 69 00 b9 01 00
      [    2.270434] RIP: build_free_nids+0x337/0x3f0 RSP: ffffae7fc0857c50
      [    2.271426] ---[ end trace ab20c06cd3c8fde4 ]---
      
      While loading NAT entries, we do a sanity check; once the entry info
      is found to be corrupted, it triggers BUG_ON directly to protect user
      data from being overwritten.
      
      In this case, it is better to just return failure from mount() instead
      of panicking, so that the user can get a hint from kmsg and try fsck for
      recovery immediately rather than after an abnormal reboot.
      
      https://bugzilla.kernel.org/show_bug.cgi?id=199769
      Reported-by: Anatoly Trosinenko <anatoly.trosinenko@gmail.com>
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
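The change in error handling described above (report corruption to the caller instead of crashing) can be sketched like this. The function names and the sentinel values `NULL_ADDR` and `NEW_ADDR` are assumptions made for this illustration and are not copied from the f2fs sources.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Sentinel values assumed for this sketch. */
#define NULL_ADDR 0u		/* entry is free */
#define NEW_ADDR  0xFFFFFFFFu	/* must never appear in a stored entry */

/* Scan one page worth of entries; count the free ones. */
static int scan_entries(const uint32_t *blk_addrs, size_t n,
			size_t *free_count)
{
	for (size_t i = 0; i < n; i++) {
		if (blk_addrs[i] == NEW_ADDR)
			return -EINVAL;	/* corrupted: report, do not crash */
		if (blk_addrs[i] == NULL_ADDR)
			(*free_count)++;
	}
	return 0;
}

/* The caller propagates the error so that mount can fail cleanly. */
static int build_free_ids(const uint32_t *blk_addrs, size_t n,
			  size_t *free_count)
{
	int err;

	*free_count = 0;
	err = scan_entries(blk_addrs, n, free_count);
	if (err)
		return err;
	return 0;
}
```

Every level between the scan and the mount entry point has to pass the error up unchanged; dropping it anywhere along the way would hide the corruption again.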
    • f2fs: support in-memory inode checksum when checking consistency · 54c55c4e
      Weichao Guo authored
      Enable in-memory inode checksum to protect metadata blocks from
      in-memory scribbles when checking consistency, which has no
      performance requirements.
      Signed-off-by: Weichao Guo <guoweichao@huawei.com>
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: fix error path of fill_super · 4e423832
      Chao Yu authored
      In fill_super, if root inode's attribute is incorrect, we need to
      call f2fs_destroy_stats to release stats memory.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: relocate readdir_ra configure initialization · 4cac90d5
      Chao Yu authored
      readdir_ra is a sysfs configuration rather than a mount option, so it
      should not be initialized in default_options(); otherwise a remount can
      reset it to enabled, which may not be what the user wants. Let's move
      it to f2fs_tuning_parameters().
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: move s_res{u,g}id initialization to default_options() · 0aa7e0f8
      Chao Yu authored
      Let default_options() initialize s_res{u,g}id with default value like
      other options.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: don't acquire orphan ino during recovery · 76a45e3c
      Chao Yu authored
      During orphan inode recovery, checkpoint should never succeed because of
      the SBI_POR_DOING flag, so we don't need to acquire an orphan ino, which
      is only used by checkpoint.
      Signed-off-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: avoid potential deadlock in f2fs_sbi_store · a1933c09
      Jaegeuk Kim authored
      [  155.018460] ======================================================
      [  155.021431] WARNING: possible circular locking dependency detected
      [  155.024339] 4.18.0-rc3+ #5 Tainted: G           OE
      [  155.026879] ------------------------------------------------------
      [  155.029783] umount/2901 is trying to acquire lock:
      [  155.032187] 00000000c4282f1f (kn->count#130){++++}, at: kernfs_remove+0x1f/0x30
      [  155.035439]
      [  155.035439] but task is already holding lock:
      [  155.038892] 0000000056e4307b (&type->s_umount_key#41){++++}, at: deactivate_super+0x33/0x50
      [  155.042602]
      [  155.042602] which lock already depends on the new lock.
      [  155.042602]
      [  155.047465]
      [  155.047465] the existing dependency chain (in reverse order) is:
      [  155.051354]
      [  155.051354] -> #1 (&type->s_umount_key#41){++++}:
      [  155.054768]        f2fs_sbi_store+0x61/0x460 [f2fs]
      [  155.057083]        kernfs_fop_write+0x113/0x1a0
      [  155.059277]        __vfs_write+0x36/0x180
      [  155.061250]        vfs_write+0xbe/0x1b0
      [  155.063179]        ksys_write+0x55/0xc0
      [  155.065068]        do_syscall_64+0x60/0x1b0
      [  155.067071]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  155.069529]
      [  155.069529] -> #0 (kn->count#130){++++}:
      [  155.072421]        __kernfs_remove+0x26f/0x2e0
      [  155.074452]        kernfs_remove+0x1f/0x30
      [  155.076342]        kobject_del.part.5+0xe/0x40
      [  155.078354]        f2fs_put_super+0x12d/0x290 [f2fs]
      [  155.080500]        generic_shutdown_super+0x6c/0x110
      [  155.082655]        kill_block_super+0x21/0x50
      [  155.084634]        kill_f2fs_super+0x9c/0xc0 [f2fs]
      [  155.086726]        deactivate_locked_super+0x3f/0x70
      [  155.088826]        cleanup_mnt+0x3b/0x70
      [  155.090584]        task_work_run+0x93/0xc0
      [  155.092367]        exit_to_usermode_loop+0xf0/0x100
      [  155.094466]        do_syscall_64+0x162/0x1b0
      [  155.096312]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [  155.098603]
      [  155.098603] other info that might help us debug this:
      [  155.098603]
      [  155.102418]  Possible unsafe locking scenario:
      [  155.102418]
      [  155.105134]        CPU0                    CPU1
      [  155.107037]        ----                    ----
      [  155.108910]   lock(&type->s_umount_key#41);
      [  155.110674]                                lock(kn->count#130);
      [  155.113010]                                lock(&type->s_umount_key#41);
      [  155.115608]   lock(kn->count#130);
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: indicate shutdown f2fs to allow unmount successfully · 83a3bfdb
      Jaegeuk Kim authored
      Once we shut down f2fs, we have to flush stale pages in order to unmount
      the system. To keep things stable, we need to stop fault injection as well.
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: keep meta pages in cp_error state · af697c0f
      Jaegeuk Kim authored
      It turns out losing meta pages in shutdown period makes f2fs very unstable
      so that I could see many unexpected error conditions.
      
      Let's keep meta pages for fault injection and sudden power-off tests.
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  2. 15 Jul, 2018 2 commits
    • f2fs: do checkpoint in kill_sb · 1cb50f87
      Jaegeuk Kim authored
      When unmounting f2fs in force mode, it can get stuck in io_schedule()
      because of pending IOs in meta_inode.
      
      io_schedule+0xd/0x30
      wait_on_page_bit_common+0xc6/0x130
      __filemap_fdatawait_range+0xbd/0x100
      filemap_fdatawait_keep_errors+0x15/0x40
      sync_inodes_sb+0x1cf/0x240
      sync_filesystem+0x52/0x90
      generic_shutdown_super+0x1d/0x110
      kill_f2fs_super+0x28/0x80 [f2fs]
      deactivate_locked_super+0x35/0x60
      cleanup_mnt+0x36/0x70
      task_work_run+0x79/0xa0
      exit_to_usermode_loop+0x62/0x70
      do_syscall_64+0xdb/0xf0
      entry_SYSCALL_64_after_hwframe+0x44/0xa9
      0xffffffffffffffff
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: allow wrong configured dio to buffered write · 8a56dd96
      Jaegeuk Kim authored
      This fix handles dio with unaligned buffers by treating it as buffered writes.
      
      xfs_io -f -d -c "pwrite 0 512" $testfile
       -> okay
      
      xfs_io -f -d -c "pwrite 1 512" $testfile
       -> EINVAL
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
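One way to picture the fallback described above is a small decision function. This is a hedged sketch, not the f2fs logic: `choose_io_path` and `enum io_path` are hypothetical, only offset and length alignment are considered, and the block size is assumed to be a power of two.

```c
#include <stdint.h>

enum io_path { IO_DIRECT, IO_BUFFERED };

/*
 * Pick the I/O path for a write request.
 * blksize is assumed to be a nonzero power of two.
 */
static enum io_path choose_io_path(uint64_t offset, uint64_t len,
				   uint32_t blksize)
{
	uint64_t mask = (uint64_t)blksize - 1;

	/* Unaligned requests fall back to buffered I/O instead of failing. */
	if ((offset & mask) || (len & mask))
		return IO_BUFFERED;
	return IO_DIRECT;
}
```

With a 512-byte block size, the two xfs_io invocations in the commit message correspond to `choose_io_path(0, 512, 512)` (direct) and `choose_io_path(1, 512, 512)` (buffered fallback).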
  3. 12 Jul, 2018 1 commit
  4. 06 Jul, 2018 7 commits
  5. 05 Jul, 2018 10 commits
    • Fix up non-directory creation in SGID directories · 0fa3ecd8
      Linus Torvalds authored
      sgid directories have special semantics, making newly created files in
      the directory belong to the group of the directory, and newly created
      subdirectories will also become sgid.  This is historically used for
      group-shared directories.
      
      But group directories writable by non-group members should not imply
      that such non-group members can magically join the group, so make sure
      to clear the sgid bit on non-directories for non-members (but remember
      that sgid without group execute means "mandatory locking", just to
      confuse things even more).
      Reported-by: Jann Horn <jannh@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0fa3ecd8
    • Linus Torvalds's avatar
      autofs: rename 'autofs' module back to 'autofs4' · d02d21ea
      Linus Torvalds authored
      It turns out that systemd has a bug: it wants to load the autofs module
      early because of some initialization ordering with udev, and it doesn't
      do that correctly.  Everywhere else it does the proper "look up module
      name" that does the proper alias resolution, but in that early code, it
      just uses a hardcoded "autofs4" for the module name.
      
      The result of that is that as of commit a2225d93 ("autofs: remove
      left-over autofs4 stubs"), you get
      
          systemd[1]: Failed to insert module 'autofs4': No such file or directory
      
      in the system logs, and a lack of module loading.  All this despite the
      fact that we had very clearly marked 'autofs4' as an alias for this
      module.
      
      What's so ridiculous about this is that literally everything else does
      the module alias handling correctly, including really old versions of
      systemd (that just used 'modprobe' to do this), and even all the other
      systemd module loading code.
      
      Only that special systemd early module load code is broken, hardcoding
      the module names for not just 'autofs4', but also "ipv6", "unix",
      "ip_tables" and "virtio_rng".  Very annoying.
      
      Instead of creating an _additional_ separate compatibility 'autofs4'
      module, just rely on the fact that everybody else gets this right, and
      just call the module 'autofs4' for compatibility reasons, with 'autofs'
      as the alias name.
      
      That will allow the systemd people to fix their bugs, adding the proper
      alias handling, and maybe even fix the name of the module to be just
      "autofs" (so that they can _test_ the alias handling).  And eventually,
      we can revert this silly compatibility hack.
      
      See also
      
          https://github.com/systemd/systemd/issues/9501
          https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=902946
      
      for the systemd bug reports upstream and in the Debian bug tracker
      respectively.
      
      Fixes: a2225d93 ("autofs: remove left-over autofs4 stubs")
      Reported-by: Ben Hutchings <ben@decadent.org.uk>
      Reported-by: Michael Biebl <biebl@debian.org>
      Cc: Ian Kent <raven@themaw.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d02d21ea
    • Linus Torvalds's avatar
      Merge tag 'acpi-4.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 06c85639
      Linus Torvalds authored
      Pull ACPI fixes from Rafael Wysocki:
       "These fix a recent ACPICA regression, fix a battery driver regression
        introduced during the 4.17 cycle and fix up the recently added support
        for the PPTT ACPI table.
      
        Specifics:
      
         - Revert part of a recent ACPICA regression fix that added leading
           newlines to ACPICA error messages and made the kernel log look
           broken (Rafael Wysocki).
      
         - Fix an ACPI battery driver regression introduced during the 4.17
           cycle due to incorrect error handling that made Thinkpad 13 laptops
           crash on boot (Jouke Witteveen).
      
         - Fix up the recently added PPTT ACPI table support by covering the
           case when a PPTT structure represents a processors group correctly
           (Sudeep Holla)"
      
      * tag 'acpi-4.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI / battery: Safe unregistering of hooks
        ACPI / PPTT: use ACPI ID whenever ACPI_PPTT_ACPI_PROCESSOR_ID_VALID is set
        ACPICA: Drop leading newlines from error messages
      06c85639
    • Linus Torvalds's avatar
      Merge tag 'pm-4.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 90dc8b65
      Linus Torvalds authored
      Pull power management fixes from Rafael Wysocki:
       "These fix a PCI power management regression introduced during the 4.17
        cycle and fix up the recently added support for devices in multiple
        power domains.
      
        Specifics:
      
         - Resume parallel PCI (non-PCIe) bridges on suspend-to-RAM (ACPI S3)
           to avoid confusing the platform firmware which started to happen
           after a core power management regression fix that went in during
           the 4.17 cycle (Rafael Wysocki).
      
         - Fix up the recently added support for devices in multiple power
           domains by not powering up the entire domain unnecessarily
           when attaching a device to it (Ulf Hansson)"
      
      * tag 'pm-4.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        PM / Domains: Don't power on at attach for the multi PM domain case
        PCI / ACPI / PM: Resume bridges w/o drivers on suspend-to-RAM
      90dc8b65
    • Linus Torvalds's avatar
      Merge tag 'riscv-for-linus-4.18-rc4' of... · b19b9282
      Linus Torvalds authored
      Merge tag 'riscv-for-linus-4.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux
      
      Pull RISC-V fixes from Palmer Dabbelt:
       "This contains a handful of fixes for the RISC-V port:
      
         - A fix to R_RISCV_ADD32/R_RISCV_SUB32 relocations that allows
           modules that use these to load correctly.
      
         - The removal of of_platform_populate(), which is obsolete.
      
         - The removal of irq-riscv-intc.h, which is obsolete.
      
         - A fix to PTRACE_SETREGSET.
      
         - Fixes that allow the RV32I kernel to build (at least for Zong, I've
           got another patch on the mailing list that's necessary on my setup :)).
      
        I've just given these a defconfig build test"
      
      * tag 'riscv-for-linus-4.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
        RISC-V: Fix PTRACE_SETREGSET bug.
        RISC-V: Don't include irq-riscv-intc.h
        riscv: remove unnecessary of_platform_populate call
        RISC-V: fix R_RISCV_ADD32/R_RISCV_SUB32 relocations
        RISC-V: Change variable type for 32-bit compatible
        RISC-V: Add definiion of extract symbol's index and type for 32-bit
        RISC-V: Select GENERIC_UCMPDI2 on RV32I
        RISC-V: Add conditional macro for zone of DMA32
      b19b9282
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu · 760885f2
      Linus Torvalds authored
      Pull m68knommu fix from Greg Ungerer:
       "A single fix for breakage introduced in this merge window"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
        m68k: fix "bad page state" oops on ColdFire boot
      760885f2
    • Mikita Lipski's avatar
      drm/amd/display: add a check for display depth validity · 413ff0b9
      Mikita Lipski authored
      [why]
      HDMI 2.0 fails to validate 4K@60 timing with 10 bpc
      [how]
      Add a helper function that verifies whether the assigned display
      depth passes bandwidth validation.
      Drop the display depth by one level until the calculated pixel clock
      is lower than the maximum TMDS clock.
      
      Bugzilla: https://bugs.freedesktop.org/106959
      Tested-by: Mike Lothian <mike@fireburn.co.uk>
      Reviewed-by: Harry Wentland <harry.wentland@amd.com>
      Signed-off-by: Mikita Lipski <mikita.lipski@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      413ff0b9
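
The depth fallback described in the commit above can be sketched as a small loop. This is an illustration under stated assumptions, not the amdgpu helper: `fit_bpc_to_tmds`, the 16/12/10/8 depth ladder, and the simple scaling of the pixel clock by bpc/8 are all assumed here, and the comparison treats a clock equal to the limit as acceptable.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: starting from requested_bpc, step down through an
 * assumed ladder of depths until the depth-scaled clock fits within
 * max_tmds_clk_khz. Never goes below 8 bpc. */
unsigned fit_bpc_to_tmds(uint32_t pix_clk_khz, uint32_t max_tmds_clk_khz,
                         unsigned requested_bpc)
{
    static const unsigned levels[] = { 16, 12, 10, 8 };  /* assumed ladder */

    for (size_t i = 0; i < sizeof(levels) / sizeof(levels[0]); i++) {
        unsigned bpc = levels[i];

        if (bpc > requested_bpc)
            continue;  /* never raise the depth above what was asked for */

        /* Assumed scaling: the required clock grows linearly with depth,
         * relative to the 8 bpc baseline. */
        uint64_t needed_khz = (uint64_t)pix_clk_khz * bpc / 8;
        if (needed_khz <= max_tmds_clk_khz)
            return bpc;
    }
    return 8;  /* floor: report 8 bpc even if it does not fit either */
}
```

For example, taking a 594000 kHz pixel clock for 4K@60 and a 600000 kHz TMDS limit as illustrative inputs, a request for 10 bpc needs 742500 kHz, which exceeds the limit, so the sketch falls back to 8 bpc.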
    • Mikita Lipski's avatar
      drm/amd/display: adding ycbcr420 pixel encoding for hdmi · a6311be8
      Mikita Lipski authored
      [why]
      The VSDB in an HDMI EDID contains special timings specifically for
      the YCbCr 4:2:0 colour space. In those cases we need to verify
      whether the provided mode is one of the special ones that has to use
      YCbCr 4:2:0 pixel encoding for display info.
      [how]
      Verify if the mode is using specific ycbcr420 colour space with
      the help of DRM helper function and assign the mode to use
      ycbcr420 pixel encoding.
      Tested-by: Mike Lothian <mike@fireburn.co.uk>
      Reviewed-by: Harry Wentland <harry.wentland@amd.com>
      Signed-off-by: Mikita Lipski <mikita.lipski@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      a6311be8
    • Rafael J. Wysocki's avatar
      Merge branches 'acpi-tables' and 'acpica' · df958569
      Rafael J. Wysocki authored
      Merge ACPICA regression fix and a fix for the recently added PPTT
      support.
      
      * acpi-tables:
        ACPI / PPTT: use ACPI ID whenever ACPI_PPTT_ACPI_PROCESSOR_ID_VALID is set
      
      * acpica:
        ACPICA: Drop leading newlines from error messages
      df958569
    • Rafael J. Wysocki's avatar
      Merge branch 'pm-pci' · 88b96088
      Rafael J. Wysocki authored
      Merge a PCI power management regression fix.
      
      * pm-pci:
        PCI / ACPI / PM: Resume bridges w/o drivers on suspend-to-RAM
      88b96088