aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2014-06-11nfsd: check passed socket's net matches NFSd superblock's oneStanislav Kinsbursky
commit 3064639423c48d6e0eb9ecc27c512a58e38c6c57 upstream. There could be a case, when NFSd file system is mounted in network, different to socket's one, like below: "ip netns exec" creates new network and mount namespace, which duplicates NFSd mount point, created in init_net context. And thus NFS server stop in nested network context leads to RPCBIND client destruction in init_net. Then, on NFSd start in nested network context, rpc.nfsd process creates socket in nested net and passes it into "write_ports", which leads to RPCBIND sockets creation in init_net context because of the same reason (NFSd monut point was created in init_net context). An attempt to register passed socket in nested net leads to panic, because no RPCBIND client present in nexted network namespace. This patch add check that passed socket's net matches NFSd superblock's one. And returns -EINVAL error to user psace otherwise. v2: Put socket on exit. Reported-by: Weng Meiling <wengmeiling.weng@huawei.com> Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com> [wengmeiling: backport to 3.4: adjust context] Signed-off-by: Weng Meiling <wengmeiling.weng@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-11nfsd: containerize NFSd filesystemStanislav Kinsbursky
note: this backport is just for the null pointer problem when start nfsd in none init netns. The nfsd is still not containerized. commit 11f779421a39b86da8a523d97e5fd3477878d44f upstream. This patch makes NFSD file system superblock to be created per net. This makes possible to get proper network namespace from superblock instead of using hard-coded "init_net". Note: NFSd fs super-block holds network namespace. This garantees, that network namespace won't disappear from underneath of it. This, obviously, means, that in case of kill of a container's "init" (which is not a mount namespace, but network namespace creator) netowrk namespace won't be destroyed. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> [wengmeiling: backport to 3.4: - export cache not per netns - NFSD service structure not per netns] Signed-off-by: Weng Meiling <wengmeiling.weng@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-11nfsd: pass proper net to nfsd_destroy() from NFSd kthreadsStanislav Kinsbursky
commit 88c47666171989ed4c5b1a5687df09511e8c5e35 upstream. Since NFSd service is per-net now, we have to pass proper network context in nfsd_shutdown() from NFSd kthreads. The simplest way I found is to get proper net from one of transports with permanent sockets. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> [wengmeiling: backport to 3.4: adjust context] Signed-off-by: Weng Meiling <wengmeiling.weng@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-11nfsd: pass net to __write_ports() and downStanislav Kinsbursky
commit 081603520b25f7b35ef63a363376a17c36ef74ed upstream. Precursor patch. Hard-coded "init_net" will be replaced by proper one in future. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> [wengmeiling: backport to 3.4: - adjust context - add net_ns parameter to __write_ports_delxprt()] Signed-off-by: Weng Meiling <wengmeiling.weng@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-11nfsd: pass net to nfsd_set_nrthreads()Stanislav Kinsbursky
commit 3938a0d5eb5effcc89c6909741403f4e6a37252d upstream. Precursor patch. Hard-coded "init_net" will be replaced by proper one in future. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> [wengmeiling: backport to 3.4: adjust context] Signed-off-by: Weng Meiling <wengmeiling.weng@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-11nfsd: pass net to nfsd_svc()Stanislav Kinsbursky
commit d41a9417cd89a69f58a26935034b4264a2d882d6 upstream. Precursor patch. Hard-coded "init_net" will be replaced by proper one in future. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> [wengmeiling: backport to 3.4: - adjust context - one more parameter(int port) for nfsd_svc()] Signed-off-by: Weng Meiling <wengmeiling.weng@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-11nfsd: pass net to nfsd_create_serv()Stanislav Kinsbursky
commit 6777436b0f072fb20a025a73e9b67a35ad8a5451 upstream. Precursor patch. Hard-coded "init_net" will be replaced by proper one in future. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> [wengmeiling: backport to 3.4: adjust context] Signed-off-by: Weng Meiling <wengmeiling.weng@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-11nfsd: pass net to nfsd_startup() and nfsd_shutdown()Stanislav Kinsbursky
commit db42d1a76a8dfcaba7a2dc9c591fa4e231db22b3 upstream. Precursor patch. Hard-coded "init_net" will be replaced by proper one in future. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> [wengmeiling: backport to 3.4: - adjust context - one more parameter(int port) for nfsd_startup() - no net ns initialization in nfsd_shutdown() - pass @net to lockd_up() in nfsd_startup() - pass @net to lockd_down() in nfsd_shutdown()] Signed-off-by: Weng Meiling <wengmeiling.weng@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-11nfsd: pass net to nfsd_init_socks()Stanislav Kinsbursky
commit db6e182c17cb1a7069f7f8924721ce58ac05d9a3 upstream. Precursor patch. Hard-coded "init_net" will be replaced by proper one in future. Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> [wengmeiling: backport to 3.4: - adjust context - one more parameter(int port) for nfsd_init_socks() - net initialization in nfsd_startup()] Signed-off-by: Weng Meiling <wengmeiling.weng@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07nfsd: nfsd_open: when dentry_open returns an error do not propagate as ↵Harshula Jayasuriya
struct file commit e4daf1ffbe6cc3b12aab4d604e627829e93e9914 upstream. The following call chain: ------------------------------------------------------------ nfs4_get_vfs_file - nfsd_open - dentry_open - do_dentry_open - __get_file_write_access - get_write_access - return atomic_inc_unless_negative(&inode->i_writecount) ? 0 : -ETXTBSY; ------------------------------------------------------------ can result in the following state: ------------------------------------------------------------ struct nfs4_file { ... fi_fds = {0xffff880c1fa65c80, 0xffffffffffffffe6, 0x0}, fi_access = {{ counter = 0x1 }, { counter = 0x0 }}, ... ------------------------------------------------------------ 1) First time around, in nfs4_get_vfs_file() fp->fi_fds[O_WRONLY] is NULL, hence nfsd_open() is called where we get status set to an error and fp->fi_fds[O_WRONLY] to -ETXTBSY. Thus we do not reach nfs4_file_get_access() and fi_access[O_WRONLY] is not incremented. 2) Second time around, in nfs4_get_vfs_file() fp->fi_fds[O_WRONLY] is NOT NULL (-ETXTBSY), so nfsd_open() is NOT called, but nfs4_file_get_access() IS called and fi_access[O_WRONLY] is incremented. Thus we leave a landmine in the form of the nfs4_file data structure in an incorrect state. 3) Eventually, when __nfs4_file_put_access() is called it finds fi_access[O_WRONLY] being non-zero, it decrements it and calls nfs4_file_put_fd() which tries to fput -ETXTBSY. ------------------------------------------------------------ ... [exception RIP: fput+0x9] RIP: ffffffff81177fa9 RSP: ffff88062e365c90 RFLAGS: 00010282 RAX: ffff880c2b3d99cc RBX: ffff880c2b3d9978 RCX: 0000000000000002 RDX: dead000000100101 RSI: 0000000000000001 RDI: ffffffffffffffe6 RBP: ffff88062e365c90 R8: ffff88041fe797d8 R9: ffff88062e365d58 R10: 0000000000000008 R11: 0000000000000000 R12: 0000000000000001 R13: 0000000000000007 R14: 0000000000000000 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #9 [ffff88062e365c98] __nfs4_file_put_access at ffffffffa0562334 [nfsd] #10 [ffff88062e365cc8] nfs4_file_put_access at ffffffffa05623ab [nfsd] #11 [ffff88062e365ce8] free_generic_stateid at ffffffffa056634d [nfsd] #12 [ffff88062e365d18] release_open_stateid at ffffffffa0566e4b [nfsd] #13 [ffff88062e365d38] nfsd4_close at ffffffffa0567401 [nfsd] #14 [ffff88062e365d88] nfsd4_proc_compound at ffffffffa0557f28 [nfsd] #15 [ffff88062e365dd8] nfsd_dispatch at ffffffffa054543e [nfsd] #16 [ffff88062e365e18] svc_process_common at ffffffffa04ba5a4 [sunrpc] #17 [ffff88062e365e98] svc_process at ffffffffa04babe0 [sunrpc] #18 [ffff88062e365eb8] nfsd at ffffffffa0545b62 [nfsd] #19 [ffff88062e365ee8] kthread at ffffffff81090886 #20 [ffff88062e365f48] kernel_thread at ffffffff8100c14a ------------------------------------------------------------ Signed-off-by: Harshula Jayasuriya <harshula@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> [xr: Backported to 3.4: adjust context] Signed-off-by: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07NFSv4.1: Handle NFS4ERR_DELAY when resetting the NFSv4.1 sessionTrond Myklebust
commit c489ee290bdbbace6bb63ebe6ebd4dd605819495 upstream. NFS4ERR_DELAY is a legal reply when we call DESTROY_SESSION. It usually means that the server is busy handling an unfinished RPC request. Just sleep for a second and then retry. We also need to be able to handle the NFS4ERR_BACK_CHAN_BUSY return value. If the NFS server has outstanding callbacks, we just want to similarly sleep & retry. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07NFSv4.1: Don't decode skipped layoutgetsWeston Andros Adamson
commit 085b7a45c63d3da5be155faab9249a5cab224561 upstream. layoutget's prepare hook can call rpc_exit with status = NFS4_OK (0). Because of this, nfs4_proc_layoutget can't depend on a 0 status to mean that the RPC was successfully sent, received and parsed. To fix this, use the result's len member to see if parsing took place. This fixes the following OOPS -- calling xdr_init_decode() with a buffer length 0 doesn't set the stream's 'p' member and ends up using uninitialized memory in filelayout_decode_layout. BUG: unable to handle kernel paging request at 0000000000008050 IP: [<ffffffff81282e78>] memcpy+0x18/0x120 PGD 0 Oops: 0000 [#1] SMP last sysfs file: /sys/devices/pci0000:00/0000:00:11.0/0000:02:01.0/irq CPU 1 Modules linked in: nfs_layout_nfsv41_files nfs lockd fscache auth_rpcgss nfs_acl autofs4 sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_mirror dm_region_hash dm_log dm_mod ppdev parport_pc parport snd_ens1371 snd_rawmidi snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc e1000 microcode vmware_balloon i2c_piix4 i2c_core sg shpchp ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix mptspi mptscsih mptbase scsi_transport_spi [last unloaded: speedstep_lib] Pid: 1665, comm: flush-0:22 Not tainted 2.6.32-356-test-2 #2 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform RIP: 0010:[<ffffffff81282e78>] [<ffffffff81282e78>] memcpy+0x18/0x120 RSP: 0018:ffff88003dfab588 EFLAGS: 00010206 RAX: ffff88003dc42000 RBX: ffff88003dfab610 RCX: 0000000000000009 RDX: 000000003f807ff0 RSI: 0000000000008050 RDI: ffff88003dc42000 RBP: ffff88003dfab5b0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000080 R12: 0000000000000024 R13: ffff88003dc42000 R14: ffff88003f808030 R15: ffff88003dfab6a0 FS: 0000000000000000(0000) GS:ffff880003420000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000000000008050 CR3: 000000003bc92000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process flush-0:22 (pid: 1665, threadinfo ffff88003dfaa000, task ffff880037f77540) Stack: ffffffffa0398ac1 ffff8800397c5940 ffff88003dfab610 ffff88003dfab6a0 <d> ffff88003dfab5d0 ffff88003dfab680 ffffffffa01c150b ffffea0000d82e70 <d> 000000508116713b 0000000000000000 0000000000000000 0000000000000000 Call Trace: [<ffffffffa0398ac1>] ? xdr_inline_decode+0xb1/0x120 [sunrpc] [<ffffffffa01c150b>] filelayout_decode_layout+0xeb/0x350 [nfs_layout_nfsv41_files] [<ffffffffa01c17fc>] filelayout_alloc_lseg+0x8c/0x3c0 [nfs_layout_nfsv41_files] [<ffffffff8150e6ce>] ? __wait_on_bit+0x7e/0x90 Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07NFSv4.1: Fix a race in pNFS layoutcommitTrond Myklebust
commit a073dbff359f4741013ae4b8395f5364c5e00b48 upstream. We need to clear the NFS_LSEG_LAYOUTCOMMIT bits atomically with the NFS_INO_LAYOUTCOMMIT bit, otherwise we may end up with situations where the two are out of sync. The first half of the problem is to ensure that pnfs_layoutcommit_inode clears the NFS_LSEG_LAYOUTCOMMIT bit through pnfs_list_write_lseg. We still need to keep the reference to those segments until the RPC call is finished, so in order to make it clear _where_ those references come from, we add a helper pnfs_list_write_lseg_done() that cleans up after pnfs_list_write_lseg. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Benny Halevy <bhalevy@tonian.com> [bwh: Backported to 3.2: s/pnfs_put_lseg/put_lseg/] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07NFS: nfs_getaclargs.acl_len is a size_tChuck Lever
commit 56d08fef2369d5ca9ad2e1fc697f5379fd8af751 upstream. Squelch compiler warnings: fs/nfs/nfs4proc.c: In function ‘__nfs4_get_acl_uncached’: fs/nfs/nfs4proc.c:3811:14: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] fs/nfs/nfs4proc.c:3818:15: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] Introduced by commit bf118a34 "NFSv4: include bitmap in nfsv4 get acl data", Dec 7, 2011. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07nfsd: don't run get_file if nfs4_preprocess_stateid_op return errorfanchaoting
commit b022032e195ffca83d7002d6b84297d796ed443b upstream. we should return error status directly when nfs4_preprocess_stateid_op return error. Signed-off-by: fanchaoting <fanchaoting@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07NFSv4.1: integer overflow in decode_cb_sequence_args()Dan Carpenter
commit 0439f31c35d1da0b28988b308ea455e38e6a350d upstream. This seems like it could overflow on 32 bits. Use kmalloc_array() which has overflow protection built in. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07nfsd4: fix xdr decoding of large non-write compoundsJ. Bruce Fields
commit 365da4adebb1c012febf81019ad3dc5bb52e2a13 upstream. This fixes a regression from 247500820ebd02ad87525db5d9b199e5b66f6636 "nfsd4: fix decoding of compounds across page boundaries". The previous code was correct: argp->pagelist is initialized in nfs4svc_deocde_compoundargs to rqstp->rq_arg.pages, and is therefore a pointer to the page *after* the page we are currently decoding. The reason that patch nevertheless fixed a problem with decoding compounds containing write was a bug in the write decoding introduced by 5a80a54d21c96590d013378d8c5f65f879451ab4 "nfsd4: reorganize write decoding", after which write decoding no longer adhered to the rule that argp->pagelist point to the next page. Signed-off-by: J. Bruce Fields <bfields@redhat.com> [bwh: Backported to 3.2: adjust context; there is only one instance to fix] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07NFSv4 wait on recovery for async session errorsAndy Adamson
commit 4a82fd7c4e78a1b7a224f9ae8bb7e1fd95f670e0 upstream. When the state manager is processing the NFS4CLNT_DELEGRETURN flag, session draining is off, but DELEGRETURN can still get a session error. The async handler calls nfs4_schedule_session_recovery returns -EAGAIN, and the DELEGRETURN done then restarts the RPC task in the prepare state. With the state manager still processing the NFS4CLNT_DELEGRETURN flag with session draining off, these DELEGRETURNs will cycle with errors filling up the session slots. This prevents OPEN reclaims (from nfs_delegation_claim_opens) required by the NFS4CLNT_DELEGRETURN state manager processing from completing, hanging the state manager in the __rpc_wait_for_completion_task in nfs4_run_open_task as seen in this kernel thread dump: kernel: 4.12.32.53-ma D 0000000000000000 0 3393 2 0x00000000 kernel: ffff88013995fb60 0000000000000046 ffff880138cc5400 ffff88013a9df140 kernel: ffff8800000265c0 ffffffff8116eef0 ffff88013fc10080 0000000300000001 kernel: ffff88013a4ad058 ffff88013995ffd8 000000000000fbc8 ffff88013a4ad058 kernel: Call Trace: kernel: [<ffffffff8116eef0>] ? cache_alloc_refill+0x1c0/0x240 kernel: [<ffffffffa0358110>] ? rpc_wait_bit_killable+0x0/0xa0 [sunrpc] kernel: [<ffffffffa0358152>] rpc_wait_bit_killable+0x42/0xa0 [sunrpc] kernel: [<ffffffff8152914f>] __wait_on_bit+0x5f/0x90 kernel: [<ffffffffa0358110>] ? rpc_wait_bit_killable+0x0/0xa0 [sunrpc] kernel: [<ffffffff815291f8>] out_of_line_wait_on_bit+0x78/0x90 kernel: [<ffffffff8109b520>] ? wake_bit_function+0x0/0x50 kernel: [<ffffffffa035810d>] __rpc_wait_for_completion_task+0x2d/0x30 [sunrpc] kernel: [<ffffffffa040d44c>] nfs4_run_open_task+0x11c/0x160 [nfs] kernel: [<ffffffffa04114e7>] nfs4_open_recover_helper+0x87/0x120 [nfs] kernel: [<ffffffffa0411646>] nfs4_open_recover+0xc6/0x150 [nfs] kernel: [<ffffffffa040cc6f>] ? nfs4_open_recoverdata_alloc+0x2f/0x60 [nfs] kernel: [<ffffffffa0414e1a>] nfs4_open_delegation_recall+0x6a/0xa0 [nfs] kernel: [<ffffffffa0424020>] nfs_end_delegation_return+0x120/0x2e0 [nfs] kernel: [<ffffffff8109580f>] ? queue_work+0x1f/0x30 kernel: [<ffffffffa0424347>] nfs_client_return_marked_delegations+0xd7/0x110 [nfs] kernel: [<ffffffffa04225d8>] nfs4_run_state_manager+0x548/0x620 [nfs] kernel: [<ffffffffa0422090>] ? nfs4_run_state_manager+0x0/0x620 [nfs] kernel: [<ffffffff8109b0f6>] kthread+0x96/0xa0 kernel: [<ffffffff8100c20a>] child_rip+0xa/0x20 kernel: [<ffffffff8109b060>] ? kthread+0x0/0xa0 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20 The state manager can not therefore process the DELEGRETURN session errors. Change the async handler to wait for recovery on session errors. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> [bwh: Backported to 3.2: - Adjust context - There's no restart_call label] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07cifs: delay super block destruction until all cifsFileInfo objects are goneMateusz Guzik
commit 24261fc23db950951760d00c188ba63cc756b932 upstream. cifsFileInfo objects hold references to dentries and it is possible that these will still be around in workqueues when VFS decides to kill super block during unmount. This results in panics like this one: BUG: Dentry ffff88001f5e76c0{i=66b4a,n=1M-2} still in use (1) [unmount of cifs cifs] ------------[ cut here ]------------ kernel BUG at fs/dcache.c:943! [..] Process umount (pid: 1781, threadinfo ffff88003d6e8000, task ffff880035eeaec0) [..] Call Trace: [<ffffffff811b44f3>] shrink_dcache_for_umount+0x33/0x60 [<ffffffff8119f7fc>] generic_shutdown_super+0x2c/0xe0 [<ffffffff8119f946>] kill_anon_super+0x16/0x30 [<ffffffffa036623a>] cifs_kill_sb+0x1a/0x30 [cifs] [<ffffffff8119fcc7>] deactivate_locked_super+0x57/0x80 [<ffffffff811a085e>] deactivate_super+0x4e/0x70 [<ffffffff811bb417>] mntput_no_expire+0xd7/0x130 [<ffffffff811bc30c>] sys_umount+0x9c/0x3c0 [<ffffffff81657c19>] system_call_fastpath+0x16/0x1b Fix this by making each cifsFileInfo object hold a reference to cifs super block, which implicitly keeps VFS super block around as well. Signed-off-by: Mateusz Guzik <mguzik@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Reported-and-Tested-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Steve French <sfrench@us.ibm.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> [xr: Backported to 3.4: adjust context] Signed-off-by: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07VFS: make vfs_fstat() use f[get|put]_light()Linus Torvalds
commit e994defb7b6813ba6fa7a2a36e86d2455ad1dc35 upstream. Use the *_light() versions that properly avoid doing the file user count updates when they are unnecessary. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [xr: Backported to 3.4: adjust function name] Signed-off-by: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07fs: cachefiles: add support for large files in filesystem cachingJustin Lecher
commit 98c350cda2c14a343d34ea01a3d9c24fea5ec66d upstream. Support the caching of large files. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=31182 Signed-off-by: Justin Lecher <jlec@gentoo.org> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.com> Tested-by: Suresh Jayaraman <sjayaraman@suse.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.2: - Adjust context - dentry_open() takes dentry and vfsmount pointers, not a path pointer] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07ecryptfs: Fix memory leakage in keystore.cGeyslan G. Bem
commit 3edc8376c06133e3386265a824869cad03a4efd4 upstream. In 'decrypt_pki_encrypted_session_key' function: Initializes 'payload' pointer and releases it on exit. Signed-off-by: Geyslan G. Bem <geyslan@gmail.com> Signed-off-by: Tyler Hicks <tyhicks@canonical.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07CIFS: Fix error handling in cifs_push_mandatory_locksPavel Shilovsky
commit e2f2886a824ff0a56da1eaa13019fde86aa89fa6 upstream. Signed-off-by: Pavel Shilovsky <pshilovsky@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07setfacl removes part of ACL when setting POSIX ACLs to SambaSteve French
commit b1d93356427be6f050dc55c86eb019d173700af6 upstream. setfacl over cifs mounts can remove the default ACL when setting the (non-default part of) the ACL and vice versa (we were leaving at 0 rather than setting to -1 the count field for the unaffected half of the ACL. For example notice the setfacl removed the default ACL in this sequence: steven@steven-GA-970A-DS3:~/cifs-2.6$ getfacl /mnt/test-dir ; setfacl -m default:user:test:rwx,user:test:rwx /mnt/test-dir getfacl: Removing leading '/' from absolute path names user::rwx group::r-x other::r-x default:user::rwx default:user:test:rwx default:group::r-x default:mask::rwx default:other::r-x steven@steven-GA-970A-DS3:~/cifs-2.6$ getfacl /mnt/test-dir getfacl: Removing leading '/' from absolute path names user::rwx user:test:rwx group::r-x mask::rwx other::r-x Signed-off-by: Steve French <smfrench@gmail.com> Acked-by: Jeremy Allison <jra@samba.org> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Rui Xiang <rui.xiang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07nfsd4: remove lockowner when removing lock stateidJ. Bruce Fields
commit a1b8ff4c97b4375d21b6d6c45d75877303f61b3b upstream. The nfsv4 state code has always assumed a one-to-one correspondance between lock stateid's and lockowners even if it appears not to in some places. We may actually change that, but for now when FREE_STATEID releases a lock stateid it also needs to release the parent lockowner. Symptoms were a subsequent LOCK crashing in find_lockowner_str when it calls same_lockowner_ino on a lockowner that unexpectedly has an empty so_stateids list. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07nfsd4: warn on finding lockowner without stateid'sJ. Bruce Fields
commit 27b11428b7de097c42f205beabb1764f4365443b upstream. The current code assumes a one-to-one lockowner<->lock stateid correspondance. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07NFSD: Call ->set_acl with a NULL ACL structure if no entriesKinglong Mee
commit aa07c713ecfc0522916f3cd57ac628ea6127c0ec upstream. After setting ACL for directory, I got two problems that caused by the cached zero-length default posix acl. This patch make sure nfsd4_set_nfs4_acl calls ->set_acl with a NULL ACL structure if there are no entries. Thanks for Christoph Hellwig's advice. First problem: ............ hang ........... Second problem: [ 1610.167668] ------------[ cut here ]------------ [ 1610.168320] kernel BUG at /root/nfs/linux/fs/nfsd/nfs4acl.c:239! [ 1610.168320] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC [ 1610.168320] Modules linked in: nfsv4(OE) nfs(OE) nfsd(OE) rpcsec_gss_krb5 fscache ip6t_rpfilter ip6t_REJECT cfg80211 xt_conntrack rfkill ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw auth_rpcgss nfs_acl snd_intel8x0 ppdev lockd snd_ac97_codec ac97_bus snd_pcm snd_timer e1000 pcspkr parport_pc snd parport serio_raw joydev i2c_piix4 sunrpc(OE) microcode soundcore i2c_core ata_generic pata_acpi [last unloaded: nfsd] [ 1610.168320] CPU: 0 PID: 27397 Comm: nfsd Tainted: G OE 3.15.0-rc1+ #15 [ 1610.168320] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 1610.168320] task: ffff88005ab653d0 ti: ffff88005a944000 task.ti: ffff88005a944000 [ 1610.168320] RIP: 0010:[<ffffffffa034d5ed>] [<ffffffffa034d5ed>] _posix_to_nfsv4_one+0x3cd/0x3d0 [nfsd] [ 1610.168320] RSP: 0018:ffff88005a945b00 EFLAGS: 00010293 [ 1610.168320] RAX: 0000000000000001 RBX: ffff88006700bac0 RCX: 0000000000000000 [ 1610.168320] RDX: 0000000000000000 RSI: ffff880067c83f00 RDI: ffff880068233300 [ 1610.168320] RBP: ffff88005a945b48 R08: ffffffff81c64830 R09: 0000000000000000 [ 1610.168320] R10: ffff88004ea85be0 R11: 000000000000f475 R12: ffff880068233300 [ 1610.168320] R13: 0000000000000003 R14: 0000000000000002 R15: ffff880068233300 [ 1610.168320] FS: 0000000000000000(0000) GS:ffff880077800000(0000) knlGS:0000000000000000 [ 1610.168320] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 1610.168320] CR2: 00007f5bcbd3b0b9 CR3: 0000000001c0f000 CR4: 00000000000006f0 [ 1610.168320] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1610.168320] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 1610.168320] Stack: [ 1610.168320] ffffffff00000000 0000000b67c83500 000000076700bac0 0000000000000000 [ 1610.168320] ffff88006700bac0 ffff880068233300 ffff88005a945c08 0000000000000002 [ 1610.168320] 0000000000000000 ffff88005a945b88 ffffffffa034e2d5 000000065a945b68 [ 1610.168320] Call Trace: [ 1610.168320] [<ffffffffa034e2d5>] nfsd4_get_nfs4_acl+0x95/0x150 [nfsd] [ 1610.168320] [<ffffffffa03400d6>] nfsd4_encode_fattr+0x646/0x1e70 [nfsd] [ 1610.168320] [<ffffffff816a6e6e>] ? kmemleak_alloc+0x4e/0xb0 [ 1610.168320] [<ffffffffa0327962>] ? nfsd_setuser_and_check_port+0x52/0x80 [nfsd] [ 1610.168320] [<ffffffff812cd4bb>] ? selinux_cred_prepare+0x1b/0x30 [ 1610.168320] [<ffffffffa0341caa>] nfsd4_encode_getattr+0x5a/0x60 [nfsd] [ 1610.168320] [<ffffffffa0341e07>] nfsd4_encode_operation+0x67/0x110 [nfsd] [ 1610.168320] [<ffffffffa033844d>] nfsd4_proc_compound+0x21d/0x810 [nfsd] [ 1610.168320] [<ffffffffa0324d9b>] nfsd_dispatch+0xbb/0x200 [nfsd] [ 1610.168320] [<ffffffffa00850cd>] svc_process_common+0x46d/0x6d0 [sunrpc] [ 1610.168320] [<ffffffffa0085433>] svc_process+0x103/0x170 [sunrpc] [ 1610.168320] [<ffffffffa032472f>] nfsd+0xbf/0x130 [nfsd] [ 1610.168320] [<ffffffffa0324670>] ? nfsd_destroy+0x80/0x80 [nfsd] [ 1610.168320] [<ffffffff810a5202>] kthread+0xd2/0xf0 [ 1610.168320] [<ffffffff810a5130>] ? insert_kthread_work+0x40/0x40 [ 1610.168320] [<ffffffff816c1ebc>] ret_from_fork+0x7c/0xb0 [ 1610.168320] [<ffffffff810a5130>] ? insert_kthread_work+0x40/0x40 [ 1610.168320] Code: 78 02 e9 e7 fc ff ff 31 c0 31 d2 31 c9 66 89 45 ce 41 8b 04 24 66 89 55 d0 66 89 4d d2 48 8d 04 80 49 8d 5c 84 04 e9 37 fd ff ff <0f> 0b 90 0f 1f 44 00 00 55 8b 56 08 c7 07 00 00 00 00 8b 46 0c [ 1610.168320] RIP [<ffffffffa034d5ed>] _posix_to_nfsv4_one+0x3cd/0x3d0 [nfsd] [ 1610.168320] RSP <ffff88005a945b00> [ 1610.257313] ---[ end trace 838254e3e352285b ]--- Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-07posix_acl: handle NULL ACL in posix_acl_equiv_modeChristoph Hellwig
commit 50c6e282bdf5e8dabf8d7cf7b162545a55645fd9 upstream. Various filesystems don't bother checking for a NULL ACL in posix_acl_equiv_mode, and thus can dereference a NULL pointer when it gets passed one. This usually happens from the NFS server, as the ACL tools never pass a NULL ACL, but instead of one representing the mode bits. Instead of adding boilerplat to all filesystems put this check into one place, which will allow us to remove the check from other filesystems as well later on. Signed-off-by: Christoph Hellwig <hch@lst.de> Reported-by: Ben Greear <greearb@candelatech.com> Reported-by: Marco Munderloh <munderl@tnt.uni-hannover.de>, Cc: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-13locks: allow __break_lease to sleep even when break_time is 0Jeff Layton
commit 4991a628a789dc5954e98e79476d9808812292ec upstream. A fl->fl_break_time of 0 has a special meaning to the lease break code that basically means "never break the lease". knfsd uses this to ensure that leases don't disappear out from under it. Unfortunately, the code in __break_lease can end up passing this value to wait_event_interruptible as a timeout, which prevents it from going to sleep at all. This causes __break_lease to spin in a tight loop and causes soft lockups. Fix this by ensuring that we pass a minimum value of 1 as a timeout instead. Cc: J. Bruce Fields <bfields@fieldses.org> Reported-by: Terry Barnaby <terry1@beam.ltd.uk> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-06ext4: use i_size_read in ext4_unaligned_aio()Theodore Ts'o
commit 6e6358fc3c3c862bfe9a5bc029d3f8ce43dc9765 upstream. We haven't taken i_mutex yet, so we need to use i_size_read(). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-06ocfs2: do not put bh when buffer_uptodate failedalex chen
commit f7cf4f5bfe073ad792ab49c04f247626b3e38db6 upstream. Do not put bh when buffer_uptodate failed in ocfs2_write_block and ocfs2_write_super_or_backup, because it will put bh in b_end_io. Otherwise it will hit a warning "VFS: brelse: Trying to free free buffer". Signed-off-by: Alex Chen <alex.chen@huawei.com> Reviewed-by: Joseph Qi <joseph.qi@huawei.com> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> Cc: Mark Fasheh <mfasheh@suse.com> Acked-by: Joel Becker <jlbec@evilplan.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-06ocfs2: dlm: fix recovery hungJunxiao Bi
commit ded2cf71419b9353060e633b59e446c42a6a2a09 upstream. There is a race window in dlm_do_recovery() between dlm_remaster_locks() and dlm_reset_recovery() when the recovery master nearly finish the recovery process for a dead node. After the master sends FINALIZE_RECO message in dlm_remaster_locks(), another node may become the recovery master for another dead node, and then send the BEGIN_RECO message to all the nodes included the old master, in the handler of this message dlm_begin_reco_handler() of old master, dlm->reco.dead_node and dlm->reco.new_master will be set to the second dead node and the new master, then in dlm_reset_recovery(), these two variables will be reset to default value. This will cause new recovery master can not finish the recovery process and hung, at last the whole cluster will hung for recovery. old recovery master: new recovery master: dlm_remaster_locks() become recovery master for another dead node. dlm_send_begin_reco_message() dlm_begin_reco_handler() { if (dlm->reco.state & DLM_RECO_STATE_FINALIZE) { return -EAGAIN; } dlm_set_reco_master(dlm, br->node_idx); dlm_set_reco_dead_node(dlm, br->dead_node); } dlm_reset_recovery() { dlm_set_reco_dead_node(dlm, O2NM_INVALID_NODE_NUM); dlm_set_reco_master(dlm, O2NM_INVALID_NODE_NUM); } will hang in dlm_remaster_locks() for request dlm locks info Before send FINALIZE_RECO message, recovery master should set DLM_RECO_STATE_FINALIZE for itself and clear it after the recovery done, this can break the race windows as the BEGIN_RECO messages will not be handled before DLM_RECO_STATE_FINALIZE flag is cleared. A similar race may happen between new recovery master and normal node which is in dlm_finalize_reco_handler(), also fix it. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-06ocfs2: dlm: fix lock migration crashJunxiao Bi
commit 34aa8dac482f1358d59110d5e3a12f4351f6acaa upstream. This issue was introduced by commit 800deef3f6f8 ("ocfs2: use list_for_each_entry where benefical") in 2007 where it replaced list_for_each with list_for_each_entry. The variable "lock" will point to invalid data if "tmpq" list is empty and a panic will be triggered due to this. Sunil advised reverting it back, but the old version was also not right. At the end of the outer for loop, that list_for_each_entry will also set "lock" to an invalid data, then in the next loop, if the "tmpq" list is empty, "lock" will be an stale invalid data and cause the panic. So reverting the list_for_each back and reset "lock" to NULL to fix this issue. Another concern is that this seemes can not happen because the "tmpq" list should not be empty. Let me describe how. old lock resource owner(node 1): migratation target(node 2): image there's lockres with a EX lock from node 2 in granted list, a NR lock from node x with convert_type EX in converting list. dlm_empty_lockres() { dlm_pick_migration_target() { pick node 2 as target as its lock is the first one in granted list. } dlm_migrate_lockres() { dlm_mark_lockres_migrating() { res->state |= DLM_LOCK_RES_BLOCK_DIRTY; wait_event(dlm->ast_wq, !dlm_lockres_is_dirty(dlm, res)); //after the above code, we can not dirty lockres any more, // so dlm_thread shuffle list will not run downconvert lock from EX to NR upconvert lock from NR to EX <<< migration may schedule out here, then <<< node 2 send down convert request to convert type from EX to <<< NR, then send up convert request to convert type from NR to <<< EX, at this time, lockres granted list is empty, and two locks <<< in the converting list, node x up convert lock followed by <<< node 2 up convert lock. // will set lockres RES_MIGRATING flag, the following // lock/unlock can not run dlm_lockres_release_ast(dlm, res); } dlm_send_one_lockres() dlm_process_recovery_data() for (i=0; i<mres->num_locks; i++) if (ml->node == dlm->node_num) for (j = DLM_GRANTED_LIST; j <= DLM_BLOCKED_LIST; j++) { list_for_each_entry(lock, tmpq, list) if (lock) break; <<< lock is invalid as grant list is empty. } if (lock->ml.node != ml->node) BUG() >>> crash here } I see the above locks status from a vmcore of our internal bug. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Wengang Wang <wen.gang.wang@oracle.com> Cc: Sunil Mushran <sunil.mushran@gmail.com> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Mark Fasheh <mfasheh@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-06reiserfs: fix race in readdirJeff Mahoney
commit 01d8885785a60ae8f4c37b0ed75bdc96d0fc6a44 upstream. jdm-20004 reiserfs_delete_xattrs: Couldn't delete all xattrs (-2) The -ENOENT is due to readdir calling dir_emit on the same entry twice. If the dir_emit callback sleeps and the tree is changed underneath us, we won't be able to trust deh_offset(deh) anymore. We need to save next_pos before we might sleep so we can find the next entry. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-06nfsd4: fix setclientid encode sizeJ. Bruce Fields
commit 480efaee085235bb848f1063f959bf144103c342 upstream. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-06nfsd: notify_change needs elevated write countJ. Bruce Fields
commit 9f67f189939eccaa54f3d2c9cf10788abaf2d584 upstream. Looks like this bug has been here since these write counts were introduced, not sure why it was just noticed now. Thanks also to Jan Kara for pointing out the problem. Reported-by: Matthew Rahtz <mrahtz@rapitasystems.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-06nfsd4: fix test_stateid error reply encodingJ. Bruce Fields
commit a11fcce1544df08c723d950ff0edef3adac40405 upstream. If the entire operation fails then there's nothing to encode. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-06nfsd4: buffer-length check for SUPPATTR_EXCLCREATJ. Bruce Fields
commit de3997a7eeb9ea286b15879fdf8a95aae065b4f7 upstream. This was an omission from 8c18f2052e756e7d5dea712fc6e7ed70c00e8a39 "nfsd41: SUPPATTR_EXCLCREAT attribute". Cc: Benny Halevy <bhalevy@primarydata.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-05-06nfsd4: session needs room for following op to error outJ. Bruce Fields
commit 4c69d5855a16f7378648c5733632628fa10431db upstream. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-26jffs2: remove from wait queue after schedule()Li Zefan
commit 3ead9578443b66ddb3d50ed4f53af8a0c0298ec5 upstream. @wait is a local variable, so if we don't remove it from the wait queue list, later wake_up() may end up accessing invalid memory. This was spotted by eyes. Signed-off-by: Li Zefan <lizefan@huawei.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Brian Norris <computersforpeace@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-26jffs2: avoid soft-lockup in jffs2_reserve_space_gc()Li Zefan
commit 13b546d96207c131eeae15dc7b26c6e7d0f1cad7 upstream. We triggered soft-lockup under stress test on 2.6.34 kernel. BUG: soft lockup - CPU#1 stuck for 60009ms! [lockf2.test:14488] ... [<bf09a4d4>] (jffs2_do_reserve_space+0x420/0x440 [jffs2]) [<bf09a528>] (jffs2_reserve_space_gc+0x34/0x78 [jffs2]) [<bf0a1350>] (jffs2_garbage_collect_dnode.isra.3+0x264/0x478 [jffs2]) [<bf0a2078>] (jffs2_garbage_collect_pass+0x9c0/0xe4c [jffs2]) [<bf09a670>] (jffs2_reserve_space+0x104/0x2a8 [jffs2]) [<bf09dc48>] (jffs2_write_inode_range+0x5c/0x4d4 [jffs2]) [<bf097d8c>] (jffs2_write_end+0x198/0x2c0 [jffs2]) [<c00e00a4>] (generic_file_buffered_write+0x158/0x200) [<c00e14f4>] (__generic_file_aio_write+0x3a4/0x414) [<c00e15c0>] (generic_file_aio_write+0x5c/0xbc) [<c012334c>] (do_sync_write+0x98/0xd4) [<c0123a84>] (vfs_write+0xa8/0x150) [<c0123d74>] (sys_write+0x3c/0xc0)] Fix this by adding a cond_resched() in the while loop. [akpm@linux-foundation.org: don't initialize `ret'] Signed-off-by: Li Zefan <lizefan@huawei.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Brian Norris <computersforpeace@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-26jffs2: Fix crash due to truncation of csizeAjesh Kunhipurayil Vijayan
commit 41bf1a24c1001f4d0d41a78e1ac575d2f14789d7 upstream. mounting JFFS2 partition sometimes crashes with this call trace: [ 1322.240000] Kernel bug detected[#1]: [ 1322.244000] Cpu 2 [ 1322.244000] $ 0 : 0000000000000000 0000000000000018 000000003ff00070 0000000000000001 [ 1322.252000] $ 4 : 0000000000000000 c0000000f3980150 0000000000000000 0000000000010000 [ 1322.260000] $ 8 : ffffffffc09cd5f8 0000000000000001 0000000000000088 c0000000ed300de8 [ 1322.268000] $12 : e5e19d9c5f613a45 ffffffffc046d464 0000000000000000 66227ba5ea67b74e [ 1322.276000] $16 : c0000000f1769c00 c0000000ed1e0200 c0000000f3980150 0000000000000000 [ 1322.284000] $20 : c0000000f3a80000 00000000fffffffc c0000000ed2cfbd8 c0000000f39818f0 [ 1322.292000] $24 : 0000000000000004 0000000000000000 [ 1322.300000] $28 : c0000000ed2c0000 c0000000ed2cfab8 0000000000010000 ffffffffc039c0b0 [ 1322.308000] Hi : 000000000000023c [ 1322.312000] Lo : 000000000003f802 [ 1322.316000] epc : ffffffffc039a9f8 check_tn_node+0x88/0x3b0 [ 1322.320000] Not tainted [ 1322.324000] ra : ffffffffc039c0b0 jffs2_do_read_inode_internal+0x1250/0x1e48 [ 1322.332000] Status: 5400f8e3 KX SX UX KERNEL EXL IE [ 1322.336000] Cause : 00800034 [ 1322.340000] PrId : 000c1004 (Netlogic XLP) [ 1322.344000] Modules linked in: [ 1322.348000] Process jffs2_gcd_mtd7 (pid: 264, threadinfo=c0000000ed2c0000, task=c0000000f0e68dd8, tls=0000000000000000) [ 1322.356000] Stack : c0000000f1769e30 c0000000ed010780 c0000000ed010780 c0000000ed300000 c0000000f1769c00 c0000000f3980150 c0000000f3a80000 00000000fffffffc c0000000ed2cfbd8 ffffffffc039c0b0 ffffffffc09c6340 0000000000001000 0000000000000dec ffffffffc016c9d8 c0000000f39805a0 c0000000f3980180 0000008600000000 0000000000000000 0000000000000000 0000000000000000 0001000000000dec c0000000f1769d98 c0000000ed2cfb18 0000000000010000 0000000000010000 0000000000000044 c0000000f3a80000 c0000000f1769c00 c0000000f3d207a8 c0000000f1769d98 c0000000f1769de0 ffffffffc076f9c0 0000000000000009 0000000000000000 0000000000000000 ffffffffc039cf90 0000000000000017 ffffffffc013fbdc 0000000000000001 000000010003e61c ... [ 1322.424000] Call Trace: [ 1322.428000] [<ffffffffc039a9f8>] check_tn_node+0x88/0x3b0 [ 1322.432000] [<ffffffffc039c0b0>] jffs2_do_read_inode_internal+0x1250/0x1e48 [ 1322.440000] [<ffffffffc039cf90>] jffs2_do_crccheck_inode+0x70/0xd0 [ 1322.448000] [<ffffffffc03a1b80>] jffs2_garbage_collect_pass+0x160/0x870 [ 1322.452000] [<ffffffffc03a392c>] jffs2_garbage_collect_thread+0xdc/0x1f0 [ 1322.460000] [<ffffffffc01541c8>] kthread+0xb8/0xc0 [ 1322.464000] [<ffffffffc0106d18>] kernel_thread_helper+0x10/0x18 [ 1322.472000] [ 1322.472000] Code: 67bd0050 94a4002c 2c830001 <00038036> de050218 2403fffc 0080a82d 00431824 24630044 [ 1322.480000] ---[ end trace b052bb90e97dfbf5 ]--- The variable csize in structure jffs2_tmp_dnode_info is of type uint16_t, but it is used to hold the compressed data length(csize) which is declared as uint32_t. So, when the value of csize exceeds 16bits, it gets truncated when assigned to tn->csize. This is causing a kernel BUG. Changing the definition of csize in jffs2_tmp_dnode_info to uint32_t fixes the issue. Signed-off-by: Ajesh Kunhipurayil Vijayan <ajesh@broadcom.com> Signed-off-by: Kamlakant Patel <kamlakant.patel@broadcom.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-26jffs2: Fix segmentation fault found in stress testKamlakant Patel
commit 3367da5610c50e6b83f86d366d72b41b350b06a2 upstream. Creating a large file on a JFFS2 partition sometimes crashes with this call trace: [ 306.476000] CPU 13 Unable to handle kernel paging request at virtual address c0000000dfff8002, epc == ffffffffc03a80a8, ra == ffffffffc03a8044 [ 306.488000] Oops[#1]: [ 306.488000] Cpu 13 [ 306.492000] $ 0 : 0000000000000000 0000000000000000 0000000000008008 0000000000008007 [ 306.500000] $ 4 : c0000000dfff8002 000000000000009f c0000000e0007cde c0000000ee95fa58 [ 306.508000] $ 8 : 0000000000000001 0000000000008008 0000000000010000 ffffffffffff8002 [ 306.516000] $12 : 0000000000007fa9 000000000000ff0e 000000000000ff0f 80e55930aebb92bb [ 306.524000] $16 : c0000000e0000000 c0000000ee95fa5c c0000000efc80000 ffffffffc09edd70 [ 306.532000] $20 : ffffffffc2b60000 c0000000ee95fa58 0000000000000000 c0000000efc80000 [ 306.540000] $24 : 0000000000000000 0000000000000004 [ 306.548000] $28 : c0000000ee950000 c0000000ee95f738 0000000000000000 ffffffffc03a8044 [ 306.556000] Hi : 00000000000574a5 [ 306.560000] Lo : 6193b7a7e903d8c9 [ 306.564000] epc : ffffffffc03a80a8 jffs2_rtime_compress+0x98/0x198 [ 306.568000] Tainted: G W [ 306.572000] ra : ffffffffc03a8044 jffs2_rtime_compress+0x34/0x198 [ 306.580000] Status: 5000f8e3 KX SX UX KERNEL EXL IE [ 306.584000] Cause : 00800008 [ 306.588000] BadVA : c0000000dfff8002 [ 306.592000] PrId : 000c1100 (Netlogic XLP) [ 306.596000] Modules linked in: [ 306.596000] Process dd (pid: 170, threadinfo=c0000000ee950000, task=c0000000ee6e0858, tls=0000000000c47490) [ 306.608000] Stack : 7c547f377ddc7ee4 7ffc7f967f5d7fae 7f617f507fc37ff4 7e7d7f817f487f5f 7d8e7fec7ee87eb3 7e977ff27eec7f9e 7d677ec67f917f67 7f3d7e457f017ed7 7fd37f517f867eb2 7fed7fd17ca57e1d 7e5f7fe87f257f77 7fd77f0d7ede7fdb 7fba7fef7e197f99 7fde7fe07ee37eb5 7f5c7f8c7fc67f65 7f457fb87f847e93 7f737f3e7d137cd9 7f8e7e9c7fc47d25 7dbb7fac7fb67e52 7ff17f627da97f64 7f6b7df77ffa7ec5 80057ef17f357fb3 7f767fa27dfc7fd5 7fe37e8e7fd07e53 7e227fcf7efb7fa1 7f547e787fa87fcc 7fcb7fc57f5a7ffb 7fc07f6c7ea97e80 7e2d7ed17e587ee0 7fb17f9d7feb7f31 7f607e797e887faa 7f757fdd7c607ff3 7e877e657ef37fbd 7ec17fd67fe67ff7 7ff67f797ff87dc4 7eef7f3a7c337fa6 7fe57fc97ed87f4b 7ebe7f097f0b8003 7fe97e2a7d997cba 7f587f987f3c7fa9 ... [ 306.676000] Call Trace: [ 306.680000] [<ffffffffc03a80a8>] jffs2_rtime_compress+0x98/0x198 [ 306.684000] [<ffffffffc0394f10>] jffs2_selected_compress+0x110/0x230 [ 306.692000] [<ffffffffc039508c>] jffs2_compress+0x5c/0x388 [ 306.696000] [<ffffffffc039dc58>] jffs2_write_inode_range+0xd8/0x388 [ 306.704000] [<ffffffffc03971bc>] jffs2_write_end+0x16c/0x2d0 [ 306.708000] [<ffffffffc01d3d90>] generic_file_buffered_write+0xf8/0x2b8 [ 306.716000] [<ffffffffc01d4e7c>] __generic_file_aio_write+0x1ac/0x350 [ 306.720000] [<ffffffffc01d50a0>] generic_file_aio_write+0x80/0x168 [ 306.728000] [<ffffffffc021f7dc>] do_sync_write+0x94/0xf8 [ 306.732000] [<ffffffffc021ff6c>] vfs_write+0xa4/0x1a0 [ 306.736000] [<ffffffffc02202e8>] SyS_write+0x50/0x90 [ 306.744000] [<ffffffffc0116cc0>] handle_sys+0x180/0x1a0 [ 306.748000] [ 306.748000] Code: 020b202d 0205282d 90a50000 <90840000> 14a40038 00000000 0060602d 0000282d 016c5823 [ 306.760000] ---[ end trace 79dd088435be02d0 ]--- Segmentation fault This crash is caused because the 'positions' is declared as an array of signed short. The value of position is in the range 0..65535, and will be converted to a negative number when the position is greater than 32767 and causes a corruption and crash. Changing the definition to 'unsigned short' fixes this issue Signed-off-by: Jayachandran C <jchandra@broadcom.com> Signed-off-by: Kamlakant Patel <kamlakant.patel@broadcom.com> Signed-off-by: Brian Norris <computersforpeace@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-14proc: pid/status: show all supplementary groupsArtem Bityutskiy
commit 8d238027b87e654be552eabdf492042a34c5c300 upstream. We display a list of supplementary group for each process in /proc/<pid>/status. However, we show only the first 32 groups, not all of them. Although this is rare, but sometimes processes do have more than 32 supplementary groups, and this kernel limitation breaks user-space apps that rely on the group list in /proc/<pid>/status. Number 32 comes from the internal NGROUPS_SMALL macro which defines the length for the internal kernel "small" groups buffer. There is no apparent reason to limit to this value. This patch removes the 32 groups printing limit. The Linux kernel limits the amount of supplementary groups by NGROUPS_MAX, which is currently set to 65536. And this is the maximum count of groups we may possibly print. Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Cc: Qiang Huang <h.huangqiang@huawei.com> Cc: Li Zefan <lizefan@huawei.com> Cc: Jianguo Wu <wujianguo@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-04-03ext4: atomically set inode->i_flags in ext4_set_inode_flags()Theodore Ts'o
commit 00a1a053ebe5febcfc2ec498bd894f035ad2aa06 upstream. Use cmpxchg() to atomically set i_flags instead of clearing out the S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the EXT4_IMMUTABLE_FL, EXT4_APPEND_FL flags, since this opens up a race where an immutable file has the immutable flag cleared for a brief window of time. Reported-by: John Sullivan <jsrhbz@kanargh.force9.co.uk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-23Btrfs: fix data corruption when reading/updating compressed extentsFilipe David Borba Manana
commit a2aa75e18a21b21952dc6daa9bac7c9f4426f81f upstream. When using a mix of compressed file extents and prealloc extents, it is possible to fill a page of a file with random, garbage data from some unrelated previous use of the page, instead of a sequence of zeroes. A simple sequence of steps to get into such case, taken from the test case I made for xfstests, is: _scratch_mkfs _scratch_mount "-o compress-force=lzo" $XFS_IO_PROG -f -c "pwrite -S 0x06 -b 18670 266978 18670" $SCRATCH_MNT/foobar $XFS_IO_PROG -c "falloc 26450 665194" $SCRATCH_MNT/foobar $XFS_IO_PROG -c "truncate 542872" $SCRATCH_MNT/foobar $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar This results in the following file items in the fs tree: item 4 key (257 INODE_ITEM 0) itemoff 15879 itemsize 160 inode generation 6 transid 6 size 542872 block group 0 mode 100600 item 5 key (257 INODE_REF 256) itemoff 15863 itemsize 16 inode ref index 2 namelen 6 name: foobar item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53 extent data disk byte 0 nr 0 gen 6 extent data offset 0 nr 24576 ram 266240 extent compression 0 item 7 key (257 EXTENT_DATA 24576) itemoff 15757 itemsize 53 prealloc data disk byte 12849152 nr 241664 gen 6 prealloc data offset 0 nr 241664 item 8 key (257 EXTENT_DATA 266240) itemoff 15704 itemsize 53 extent data disk byte 12845056 nr 4096 gen 6 extent data offset 0 nr 20480 ram 20480 extent compression 2 item 9 key (257 EXTENT_DATA 286720) itemoff 15651 itemsize 53 prealloc data disk byte 13090816 nr 405504 gen 6 prealloc data offset 0 nr 258048 The on disk extent at offset 266240 (which corresponds to 1 single disk block), contains 5 compressed chunks of file data. Each of the first 4 compress 4096 bytes of file data, while the last one only compresses 3024 bytes of file data. Therefore a read into the file region [285648 ; 286720[ (length = 4096 - 3024 = 1072 bytes) should always return zeroes (our next extent is a prealloc one). The solution here is the compression code path to zero the remaining (untouched) bytes of the last page it uncompressed data into, as the information about how much space the file data consumes in the last page is not known in the upper layer fs/btrfs/extent_io.c:__do_readpage(). In __do_readpage we were correctly zeroing the remainder of the page but only if it corresponds to the last page of the inode and if the inode's size is not a multiple of the page size. This would cause not only returning random data on reads, but also permanently storing random data when updating parts of the region that should be zeroed. For the example above, it means updating a single byte in the region [285648 ; 286720[ would store that byte correctly but also store random data on disk. A test case for xfstests follows soon. Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-23fs/proc/base.c: fix GPF in /proc/$PID/map_filesArtem Fetishev
commit 70335abb2689c8cd5df91bf2d95a65649addf50b upstream. The expected logic of proc_map_files_get_link() is either to return 0 and initialize 'path' or return an error and leave 'path' uninitialized. By the time dname_to_vma_addr() returns 0 the corresponding vma may have already be gone. In this case the path is not initialized but the return value is still 0. This results in 'general protection fault' inside d_path(). Steps to reproduce: CONFIG_CHECKPOINT_RESTORE=y fd = open(...); while (1) { mmap(fd, ...); munmap(fd, ...); } ls -la /proc/$PID/map_files Addresses https://bugzilla.kernel.org/show_bug.cgi?id=68991 Signed-off-by: Artem Fetishev <artem_fetishev@epam.com> Signed-off-by: Aleksandr Terekhov <aleksandr_terekhov@epam.com> Reported-by: <wiebittewas@gmail.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-23NFS: Fix a delegation callback raceTrond Myklebust
commit 755a48a7a4eb05b9c8424e3017d947b2961a60e0 upstream. The clean-up in commit 36281caa839f ended up removing a NULL pointer check that is needed in order to prevent an Oops in nfs_async_inode_return_delegation(). Reported-by: "Yan, Zheng" <zheng.z.yan@intel.com> Link: http://lkml.kernel.org/r/5313E9F6.2020405@intel.com Fixes: 36281caa839f (NFSv4: Further clean-ups of delegation stateid validation) Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-23ocfs2 syncs the wrong range...Al Viro
commit 1b56e98990bcdbb20b9fab163654b9315bf158e8 upstream. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-03-23ocfs2: fix quota file corruptionJan Kara
commit 15c34a760630ca2c803848fba90ca0646a9907dd upstream. Global quota files are accessed from different nodes. Thus we cannot cache offset of quota structure in the quota file after we drop our node reference count to it because after that moment quota structure may be freed and reallocated elsewhere by a different node resulting in corruption of quota file. Fix the problem by clearing dq_off when we are releasing dquot structure. We also remove the DB_READ_B handling because it is useless - DQ_ACTIVE_B is set iff DQ_READ_B is set. Signed-off-by: Jan Kara <jack@suse.cz> Cc: Goldwyn Rodrigues <rgoldwyn@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Reviewed-by: Mark Fasheh <mfasheh@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>