1. 30 Mar, 2014 1 commit
  2. 29 Mar, 2014 10 commits
  3. 28 Mar, 2014 7 commits
    • Jeff Layton's avatar
      svcrdma: fix offset calculation for non-page aligned sge entries · 3cbe01a9
      Jeff Layton authored
      The xdr_off value in dma_map_xdr gets passed to ib_dma_map_page as the
      offset into the page to be mapped. This calculation does not correctly
      take into account the case where the data starts at some offset into
      the page. Increment the xdr_off by the page_base to ensure that it is
      respected.
      
      Cc: Tom Tucker <tom@opengridcomputing.com>
      Signed-off-by: default avatarJeff Layton <jlayton@redhat.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      3cbe01a9
    • Jeff Layton's avatar
      xprtrdma: add separate Kconfig options for NFSoRDMA client and server support · 2e8c12e1
      Jeff Layton authored
      There are two entirely separate modules under xprtrdma/ and there's no
      reason that enabling one should automatically enable the other. Add
      config options for each one so they can be enabled/disabled separately.
      Signed-off-by: default avatarJeff Layton <jlayton@redhat.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      2e8c12e1
    • Tom Tucker's avatar
      Fix regression in NFSRDMA server · 7e4359e2
      Tom Tucker authored
      The server regression was caused by the addition of rq_next_page
      (afc59400). There were a few places that
      were missed with the update of the rq_respages array.
      Signed-off-by: default avatarTom Tucker <tom@ogc.us>
      Tested-by: default avatarSteve Wise <swise@ogc.us>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      7e4359e2
    • J. Bruce Fields's avatar
      nfsd: typo in nfsd_rename comment · fbb74a34
      J. Bruce Fields authored
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      fbb74a34
    • Kinglong Mee's avatar
    • Jeff Layton's avatar
      lockd: ensure we tear down any live sockets when socket creation fails during lockd_up · 679b033d
      Jeff Layton authored
      We had a Fedora ABRT report with a stack trace like this:
      
      kernel BUG at net/sunrpc/svc.c:550!
      invalid opcode: 0000 [#1] SMP
      [...]
      CPU: 2 PID: 913 Comm: rpc.nfsd Not tainted 3.13.6-200.fc20.x86_64 #1
      Hardware name: Hewlett-Packard HP ProBook 4740s/1846, BIOS 68IRR Ver. F.40 01/29/2013
      task: ffff880146b00000 ti: ffff88003f9b8000 task.ti: ffff88003f9b8000
      RIP: 0010:[<ffffffffa0305fa8>]  [<ffffffffa0305fa8>] svc_destroy+0x128/0x130 [sunrpc]
      RSP: 0018:ffff88003f9b9de0  EFLAGS: 00010206
      RAX: ffff88003f829628 RBX: ffff88003f829600 RCX: 00000000000041ee
      RDX: 0000000000000000 RSI: 0000000000000286 RDI: 0000000000000286
      RBP: ffff88003f9b9de8 R08: 0000000000017360 R09: ffff88014fa97360
      R10: ffffffff8114ce57 R11: ffffea00051c9c00 R12: ffff88003f829600
      R13: 00000000ffffff9e R14: ffffffff81cc7cc0 R15: 0000000000000000
      FS:  00007f4fde284840(0000) GS:ffff88014fa80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f4fdf5192f8 CR3: 00000000a569a000 CR4: 00000000001407e0
      Stack:
       ffff88003f792300 ffff88003f9b9e18 ffffffffa02de02a 0000000000000000
       ffffffff81cc7cc0 ffff88003f9cb000 0000000000000008 ffff88003f9b9e60
       ffffffffa033bb35 ffffffff8131c86c ffff88003f9cb000 ffff8800a5715008
      Call Trace:
       [<ffffffffa02de02a>] lockd_up+0xaa/0x330 [lockd]
       [<ffffffffa033bb35>] nfsd_svc+0x1b5/0x2f0 [nfsd]
       [<ffffffff8131c86c>] ? simple_strtoull+0x2c/0x50
       [<ffffffffa033c630>] ? write_pool_threads+0x280/0x280 [nfsd]
       [<ffffffffa033c6bb>] write_threads+0x8b/0xf0 [nfsd]
       [<ffffffff8114efa4>] ? __get_free_pages+0x14/0x50
       [<ffffffff8114eff6>] ? get_zeroed_page+0x16/0x20
       [<ffffffff811dec51>] ? simple_transaction_get+0xb1/0xd0
       [<ffffffffa033c098>] nfsctl_transaction_write+0x48/0x80 [nfsd]
       [<ffffffff811b8b34>] vfs_write+0xb4/0x1f0
       [<ffffffff811c3f99>] ? putname+0x29/0x40
       [<ffffffff811b9569>] SyS_write+0x49/0xa0
       [<ffffffff810fc2a6>] ? __audit_syscall_exit+0x1f6/0x2a0
       [<ffffffff816962e9>] system_call_fastpath+0x16/0x1b
      Code: 31 c0 e8 82 db 37 e1 e9 2a ff ff ff 48 8b 07 8b 57 14 48 c7 c7 d5 c6 31 a0 48 8b 70 20 31 c0 e8 65 db 37 e1 e9 f4 fe ff ff 0f 0b <0f> 0b 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 56 41 55
      RIP  [<ffffffffa0305fa8>] svc_destroy+0x128/0x130 [sunrpc]
       RSP <ffff88003f9b9de0>
      
      Evidently, we created some lockd sockets and then failed to create
      others. make_socks then returned an error and we tried to tear down the
      svc, but svc->sv_permsocks was not empty so we ended up tripping over
      the BUG() in svc_destroy().
      
      Fix this by ensuring that we tear down any live sockets we created when
      socket creation is going to return an error.
      
      Fixes: 786185b5 (SUNRPC: move per-net operations from...)
      Reported-by: default avatarRaphos <raphoszap@laposte.net>
      Signed-off-by: default avatarJeff Layton <jlayton@redhat.com>
      Reviewed-by: default avatarStanislav Kinsbursky <skinsbursky@parallels.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      679b033d
    • Kinglong Mee's avatar
      NFSD: Traverse unconfirmed client through hash-table · 2b905635
      Kinglong Mee authored
      When stopping nfsd, I got BUG messages, and soft lockup messages,
      The problem is cuased by double rb_erase() in nfs4_state_destroy_net()
      and destroy_client().
      
      This patch just let nfsd traversing unconfirmed client through
      hash-table instead of rbtree.
      
      [ 2325.021995] BUG: unable to handle kernel NULL pointer dereference at
                (null)
      [ 2325.022809] IP: [<ffffffff8133c18c>] rb_erase+0x14c/0x390
      [ 2325.022982] PGD 7a91b067 PUD 7a33d067 PMD 0
      [ 2325.022982] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
      [ 2325.022982] Modules linked in: nfsd(OF) cfg80211 rfkill bridge stp
      llc snd_intel8x0 snd_ac97_codec ac97_bus auth_rpcgss nfs_acl serio_raw
      e1000 i2c_piix4 ppdev snd_pcm snd_timer lockd pcspkr joydev parport_pc
      snd parport i2c_core soundcore microcode sunrpc ata_generic pata_acpi
      [last unloaded: nfsd]
      [ 2325.022982] CPU: 1 PID: 2123 Comm: nfsd Tainted: GF          O
      3.14.0-rc8+ #2
      [ 2325.022982] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
      VirtualBox 12/01/2006
      [ 2325.022982] task: ffff88007b384800 ti: ffff8800797f6000 task.ti:
      ffff8800797f6000
      [ 2325.022982] RIP: 0010:[<ffffffff8133c18c>]  [<ffffffff8133c18c>]
      rb_erase+0x14c/0x390
      [ 2325.022982] RSP: 0018:ffff8800797f7d98  EFLAGS: 00010246
      [ 2325.022982] RAX: ffff880079c1f010 RBX: ffff880079f4c828 RCX:
      0000000000000000
      [ 2325.022982] RDX: 0000000000000000 RSI: ffff880079bcb070 RDI:
      ffff880079f4c810
      [ 2325.022982] RBP: ffff8800797f7d98 R08: 0000000000000000 R09:
      ffff88007964fc70
      [ 2325.022982] R10: 0000000000000000 R11: 0000000000000400 R12:
      ffff880079f4c800
      [ 2325.022982] R13: ffff880079bcb000 R14: ffff8800797f7da8 R15:
      ffff880079f4c860
      [ 2325.022982] FS:  0000000000000000(0000) GS:ffff88007f900000(0000)
      knlGS:0000000000000000
      [ 2325.022982] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [ 2325.022982] CR2: 0000000000000000 CR3: 000000007a3ef000 CR4:
      00000000000006e0
      [ 2325.022982] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
      0000000000000000
      [ 2325.022982] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
      0000000000000400
      [ 2325.022982] Stack:
      [ 2325.022982]  ffff8800797f7de0 ffffffffa0191c6e ffff8800797f7da8
      ffff8800797f7da8
      [ 2325.022982]  ffff880079f4c810 ffff880079bcb000 ffffffff81cc26c0
      ffff880079c1f010
      [ 2325.022982]  ffff880079bcb070 ffff8800797f7e28 ffffffffa01977f2
      ffff8800797f7df0
      [ 2325.022982] Call Trace:
      [ 2325.022982]  [<ffffffffa0191c6e>] destroy_client+0x32e/0x3b0 [nfsd]
      [ 2325.022982]  [<ffffffffa01977f2>] nfs4_state_shutdown_net+0x1a2/0x220
      [nfsd]
      [ 2325.022982]  [<ffffffffa01700b8>] nfsd_shutdown_net+0x38/0x70 [nfsd]
      [ 2325.022982]  [<ffffffffa017013e>] nfsd_last_thread+0x4e/0x80 [nfsd]
      [ 2325.022982]  [<ffffffffa001f1eb>] svc_shutdown_net+0x2b/0x30 [sunrpc]
      [ 2325.022982]  [<ffffffffa017064b>] nfsd_destroy+0x5b/0x80 [nfsd]
      [ 2325.022982]  [<ffffffffa0170773>] nfsd+0x103/0x130 [nfsd]
      [ 2325.022982]  [<ffffffffa0170670>] ? nfsd_destroy+0x80/0x80 [nfsd]
      [ 2325.022982]  [<ffffffff810a8232>] kthread+0xd2/0xf0
      [ 2325.022982]  [<ffffffff810a8160>] ? insert_kthread_work+0x40/0x40
      [ 2325.022982]  [<ffffffff816c493c>] ret_from_fork+0x7c/0xb0
      [ 2325.022982]  [<ffffffff810a8160>] ? insert_kthread_work+0x40/0x40
      [ 2325.022982] Code: 48 83 e1 fc 48 89 10 0f 84 02 01 00 00 48 3b 41 10
      0f 84 08 01 00 00 48 89 51 08 48 89 fa e9 74 ff ff ff 0f 1f 40 00 48 8b
      50 10 <f6> 02 01 0f 84 93 00 00 00 48 8b 7a 10 48 85 ff 74 05 f6 07 01
      [ 2325.022982] RIP  [<ffffffff8133c18c>] rb_erase+0x14c/0x390
      [ 2325.022982]  RSP <ffff8800797f7d98>
      [ 2325.022982] CR2: 0000000000000000
      [ 2325.022982] ---[ end trace 28c27ed011655e57 ]---
      
      [  228.064071] BUG: soft lockup - CPU#0 stuck for 22s! [nfsd:558]
      [  228.064428] Modules linked in: ip6t_rpfilter ip6t_REJECT cfg80211
      xt_conntrack rfkill ebtable_nat ebtable_broute bridge stp llc
      ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6
      nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw
      ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
      nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security
      iptable_raw nfsd(OF) auth_rpcgss nfs_acl lockd snd_intel8x0
      snd_ac97_codec ac97_bus joydev snd_pcm snd_timer e1000 sunrpc snd ppdev
      parport_pc serio_raw pcspkr i2c_piix4 microcode parport soundcore
      i2c_core ata_generic pata_acpi
      [  228.064539] CPU: 0 PID: 558 Comm: nfsd Tainted: GF          O
      3.14.0-rc8+ #2
      [  228.064539] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
      VirtualBox 12/01/2006
      [  228.064539] task: ffff880076adec00 ti: ffff880074616000 task.ti:
      ffff880074616000
      [  228.064539] RIP: 0010:[<ffffffff8133ba17>]  [<ffffffff8133ba17>]
      rb_next+0x27/0x50
      [  228.064539] RSP: 0018:ffff880074617de0  EFLAGS: 00000282
      [  228.064539] RAX: ffff880074478010 RBX: ffff88007446f860 RCX:
      0000000000000014
      [  228.064539] RDX: ffff880074478010 RSI: 0000000000000000 RDI:
      ffff880074478010
      [  228.064539] RBP: ffff880074617de0 R08: 0000000000000000 R09:
      0000000000000012
      [  228.064539] R10: 0000000000000001 R11: ffffffffffffffec R12:
      ffffea0001d11a00
      [  228.064539] R13: ffff88007f401400 R14: ffff88007446f800 R15:
      ffff880074617d50
      [  228.064539] FS:  0000000000000000(0000) GS:ffff88007f800000(0000)
      knlGS:0000000000000000
      [  228.064539] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [  228.064539] CR2: 00007fe9ac6ec000 CR3: 000000007a5d6000 CR4:
      00000000000006f0
      [  228.064539] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
      0000000000000000
      [  228.064539] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
      0000000000000400
      [  228.064539] Stack:
      [  228.064539]  ffff880074617e28 ffffffffa01ab7db ffff880074617df0
      ffff880074617df0
      [  228.064539]  ffff880079273000 ffffffff81cc26c0 ffffffff81cc26c0
      0000000000000000
      [  228.064539]  0000000000000000 ffff880074617e48 ffffffffa01840b8
      ffffffff81cc26c0
      [  228.064539] Call Trace:
      [  228.064539]  [<ffffffffa01ab7db>] nfs4_state_shutdown_net+0x18b/0x220
      [nfsd]
      [  228.064539]  [<ffffffffa01840b8>] nfsd_shutdown_net+0x38/0x70 [nfsd]
      [  228.064539]  [<ffffffffa018413e>] nfsd_last_thread+0x4e/0x80 [nfsd]
      [  228.064539]  [<ffffffffa00aa1eb>] svc_shutdown_net+0x2b/0x30 [sunrpc]
      [  228.064539]  [<ffffffffa018464b>] nfsd_destroy+0x5b/0x80 [nfsd]
      [  228.064539]  [<ffffffffa0184773>] nfsd+0x103/0x130 [nfsd]
      [  228.064539]  [<ffffffffa0184670>] ? nfsd_destroy+0x80/0x80 [nfsd]
      [  228.064539]  [<ffffffff810a8232>] kthread+0xd2/0xf0
      [  228.064539]  [<ffffffff810a8160>] ? insert_kthread_work+0x40/0x40
      [  228.064539]  [<ffffffff816c493c>] ret_from_fork+0x7c/0xb0
      [  228.064539]  [<ffffffff810a8160>] ? insert_kthread_work+0x40/0x40
      [  228.064539] Code: 1f 44 00 00 55 48 8b 17 48 89 e5 48 39 d7 74 3b 48
      8b 47 08 48 85 c0 75 0e eb 25 66 0f 1f 84 00 00 00 00 00 48 89 d0 48 8b
      50 10 <48> 85 d2 75 f4 5d c3 66 90 48 3b 78 08 75 f6 48 8b 10 48 89 c7
      
      Fixes: ac55fdc4 (nfsd: move the confirmed and unconfirmed hlists...)
      Signed-off-by: default avatarKinglong Mee <kinglongmee@gmail.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: default avatarJeff Layton <jlayton@redhat.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      2b905635
  4. 27 Mar, 2014 9 commits
  5. 18 Feb, 2014 1 commit
  6. 13 Feb, 2014 1 commit
    • NeilBrown's avatar
      lockd: send correct lock when granting a delayed lock. · 2ec197db
      NeilBrown authored
      If an NFS client attempts to get a lock (using NLM) and the lock is
      not available, the server will remember the request and when the lock
      becomes available it will send a GRANT request to the client to
      provide the lock.
      
      If the client already held an adjacent lock, the GRANT callback will
      report the union of the existing and new locks, which can confuse the
      client.
      
      This happens because __posix_lock_file (called by vfs_lock_file)
      updates the passed-in file_lock structure when adjacent or
      over-lapping locks are found.
      
      To avoid this problem we take a copy of the two fields that can
      be changed (fl_start and fl_end) before the call and restore them
      afterwards.
      An alternate would be to allocate a 'struct file_lock', initialise it,
      use locks_copy_lock() to take a copy, then locks_release_private()
      after the vfs_lock_file() call.  But that is a lot more work.
      Reported-by: default avatarOlaf Kirch <okir@suse.com>
      Signed-off-by: default avatarNeilBrown <neilb@suse.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      
      --
      v1 had a couple of issues (large on-stack struct and didn't really work properly).
      This version is much better tested.
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      2ec197db
  7. 11 Feb, 2014 1 commit
    • J. Bruce Fields's avatar
      nfsd4: fix acl buffer overrun · 09bdc2d7
      J. Bruce Fields authored
      4ac7249e "nfsd: use get_acl and
      ->set_acl" forgets to set the size in the case get_acl() succeeds, so
      _posix_to_nfsv4_one() can then write past the end of its allocation.
      Symptoms were slab corruption warnings.
      
      Also, some minor cleanup while we're here.  (Among other things, note
      that the first few lines guarantee that pacl is non-NULL.)
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      09bdc2d7
  8. 10 Feb, 2014 4 commits
  9. 09 Feb, 2014 6 commits
    • Al Viro's avatar
      fix a kmap leak in virtio_console · c9efe511
      Al Viro authored
      While we are at it, don't do kmap() under kmap_atomic(), *especially*
      for a page we'd allocated with GFP_KERNEL.  It's spelled "page_address",
      and had that been more than that, we'd have a real trouble - kmap_high()
      can block, and doing that while holding kmap_atomic() is a Bad Idea(tm).
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      c9efe511
    • Al Viro's avatar
      fix O_SYNC|O_APPEND syncing the wrong range on write() · d311d79d
      Al Viro authored
      It actually goes back to 2004 ([PATCH] Concurrent O_SYNC write support)
      when sync_page_range() had been introduced; generic_file_write{,v}() correctly
      synced
      	pos_after_write - written .. pos_after_write - 1
      but generic_file_aio_write() synced
      	pos_before_write .. pos_before_write + written - 1
      instead.  Which is not the same thing with O_APPEND, obviously.
      A couple of years later correct variant had been killed off when
      everything switched to use of generic_file_aio_write().
      
      All users of generic_file_aio_write() are affected, and the same bug
      has been copied into other instances of ->aio_write().
      
      The fix is trivial; the only subtle point is that generic_write_sync()
      ought to be inlined to avoid calculations useless for the majority of
      calls.
      Signed-off-by: default avatarAl Viro <viro@zeniv.linux.org.uk>
      d311d79d
    • Linus Torvalds's avatar
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · 9c1db779
      Linus Torvalds authored
      Pull btrfs fixes from Chris Mason:
       "This is a small collection of fixes"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
        Btrfs: fix data corruption when reading/updating compressed extents
        Btrfs: don't loop forever if we can't run because of the tree mod log
        btrfs: reserve no transaction units in btrfs_ioctl_set_features
        btrfs: commit transaction after setting label and features
        Btrfs: fix assert screwup for the pending move stuff
      9c1db779
    • Linus Torvalds's avatar
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 6f2a1c1e
      Linus Torvalds authored
      Pull perf fixes from Ingo Molnar:
       "Tooling fixes, mostly related to the KASLR fallout, but also other
        fixes"
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf buildid-cache: Check relocation when checking for existing kcore
        perf tools: Adjust kallsyms for relocated kernel
        perf tests: No need to set up ref_reloc_sym
        perf symbols: Prevent the use of kcore if the kernel has moved
        perf record: Get ref_reloc_sym from kernel map
        perf machine: Set up ref_reloc_sym in machine__create_kernel_maps()
        perf machine: Add machine__get_kallsyms_filename()
        perf tools: Add kallsyms__get_function_start()
        perf symbols: Fix symbol annotation for relocated kernel
        perf tools: Fix include for non x86 architectures
        perf tools: Fix AAAAARGH64 memory barriers
        perf tools: Demangle kernel and kernel module symbols too
        perf/doc: Remove mention of non-existent set_perf_event_pending() from design.txt
      6f2a1c1e
    • Filipe David Borba Manana's avatar
      Btrfs: fix data corruption when reading/updating compressed extents · a2aa75e1
      Filipe David Borba Manana authored
      When using a mix of compressed file extents and prealloc extents, it
      is possible to fill a page of a file with random, garbage data from
      some unrelated previous use of the page, instead of a sequence of zeroes.
      
      A simple sequence of steps to get into such case, taken from the test
      case I made for xfstests, is:
      
         _scratch_mkfs
         _scratch_mount "-o compress-force=lzo"
         $XFS_IO_PROG -f -c "pwrite -S 0x06 -b 18670 266978 18670" $SCRATCH_MNT/foobar
         $XFS_IO_PROG -c "falloc 26450 665194" $SCRATCH_MNT/foobar
         $XFS_IO_PROG -c "truncate 542872" $SCRATCH_MNT/foobar
         $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar
      
      This results in the following file items in the fs tree:
      
         item 4 key (257 INODE_ITEM 0) itemoff 15879 itemsize 160
             inode generation 6 transid 6 size 542872 block group 0 mode 100600
         item 5 key (257 INODE_REF 256) itemoff 15863 itemsize 16
             inode ref index 2 namelen 6 name: foobar
         item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53
             extent data disk byte 0 nr 0 gen 6
             extent data offset 0 nr 24576 ram 266240
             extent compression 0
         item 7 key (257 EXTENT_DATA 24576) itemoff 15757 itemsize 53
             prealloc data disk byte 12849152 nr 241664 gen 6
             prealloc data offset 0 nr 241664
         item 8 key (257 EXTENT_DATA 266240) itemoff 15704 itemsize 53
             extent data disk byte 12845056 nr 4096 gen 6
             extent data offset 0 nr 20480 ram 20480
             extent compression 2
         item 9 key (257 EXTENT_DATA 286720) itemoff 15651 itemsize 53
             prealloc data disk byte 13090816 nr 405504 gen 6
             prealloc data offset 0 nr 258048
      
      The on disk extent at offset 266240 (which corresponds to 1 single disk block),
      contains 5 compressed chunks of file data. Each of the first 4 compress 4096
      bytes of file data, while the last one only compresses 3024 bytes of file data.
      Therefore a read into the file region [285648 ; 286720[ (length = 4096 - 3024 =
      1072 bytes) should always return zeroes (our next extent is a prealloc one).
      
      The solution here is the compression code path to zero the remaining (untouched)
      bytes of the last page it uncompressed data into, as the information about how
      much space the file data consumes in the last page is not known in the upper layer
      fs/btrfs/extent_io.c:__do_readpage(). In __do_readpage we were correctly zeroing
      the remainder of the page but only if it corresponds to the last page of the inode
      and if the inode's size is not a multiple of the page size.
      
      This would cause not only returning random data on reads, but also permanently
      storing random data when updating parts of the region that should be zeroed.
      For the example above, it means updating a single byte in the region [285648 ; 286720[
      would store that byte correctly but also store random data on disk.
      
      A test case for xfstests follows soon.
      Signed-off-by: default avatarFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      a2aa75e1
    • Josef Bacik's avatar
      Btrfs: don't loop forever if we can't run because of the tree mod log · 27a377db
      Josef Bacik authored
      A user reported a 100% cpu hang with my new delayed ref code.  Turns out I
      forgot to increase the count check when we can't run a delayed ref because of
      the tree mod log.  If we can't run any delayed refs during this there is no
      point in continuing to look, and we need to break out.  Thanks,
      Signed-off-by: default avatarJosef Bacik <jbacik@fb.com>
      Signed-off-by: default avatarChris Mason <clm@fb.com>
      27a377db