aboutsummaryrefslogtreecommitdiff
path: root/fs/f2fs/inode.c
AgeCommit message (Collapse)Author
2013-08-26f2fs: add flags for inline xattrsJaegeuk Kim
This patch adds basic inode flags for inline xattrs, F2FS_INLINE_XATTR, and add a mount option, inline_xattr, which is enabled when xattr is set. If the mount option is enabled, all the files are marked with the inline_xattrs flag. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-08-19f2fs: avoid writing inode redundantly when creating a fileJin Xu
In f2fs_write_inode, updating inode after f2fs_balance_fs is not a optimized way in the case that f2fs_gc is performed ahead. The inode page will be unnecessarily written out twice, one of which is in f2fs_gc->...->sync_node_pages and the other is in update_inode_page. Let's update the inode page in prior to f2fs_balance_fs to avoid this. To reproduce it, $ touch file (before this step, should make the device need f2fs_gc) $ sync (or wait the bdi to write dirty inode) Signed-off-by: Jin Xu <jinuxstyle@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-08-06f2fs: fix a deadlock in fsyncJin Xu
This patch fixes a deadlock bug that occurs quite often when there are concurrent write and fsync on a same file. Following is the simplified call trace when tasks get hung. fsync thread: - f2fs_sync_file ... - f2fs_write_data_pages ... - update_extent_cache ... - update_inode - wait_on_page_writeback bdi writeback thread - __writeback_single_inode - f2fs_write_data_pages - mutex_lock(sbi->writepages) The deadlock happens when the fsync thread waits on a inode page that has been added to the f2fs' cached bio sbi->bio[NODE], and unfortunately, no one else could be able to submit the cached bio to block layer for writeback. This is because the fsync thread already hold a sbi->fs_lock and the sbi->writepages lock, causing the bdi thread being blocked when attempt to write data pages for the same inode. At the same time, f2fs_gc thread does not notice the situation and could not help. Even the sync syscall gets blocked. To fix it, we could submit the cached bio first before waiting on a inode page that is being written back. Signed-off-by: Jin Xu <jinuxstyle@gmail.com> [Jaegeuk Kim: add more cases to use f2fs_wait_on_page_writeback] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-07-30f2fs: introduce help function F2FS_NODE()Gu Zheng
Introduce help function F2FS_NODE() to simplify the conversion of node_page to f2fs_node. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-06-14f2fs: avoid freqeunt write_inode callsJaegeuk Kim
If update_inode is called, we don't need to do write_inode. So, let's use a *dirty* flag for each inode. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28f2fs: fix wrong condition checkJaegeuk Kim
While an orphan inode has zero link_count, f2fs_gc is able to select the inode for foreground gc. - f2fs_gc - do_garbage_collect - gc_data_segment : f2fs_iget is failed : get_valid_blocks() != 0, so that retry --> here we got the infinite loop. This patch resolved this issue. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-05-28f2fs: avoid RECLAIM_FS-ON-W: deadlockJaegeuk Kim
This patch tries to avoid the following deadlock condition of which the reclaim path can trigger f2fs_balance_fs again. ================================= [ INFO: inconsistent lock state ] --------------------------------- inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage. kswapd0/41 [HC0[0]:SC0[0]:HE1:SE1] takes: (&sbi->gc_mutex){+.+.?.}, at: f2fs_balance_fs+0xe6/0x100 [f2fs] {RECLAIM_FS-ON-W} state was registered at: [<ffffffff810aa5a9>] mark_held_locks+0xb9/0x140 [<ffffffff810aae85>] lockdep_trace_alloc+0x85/0xf0 [<ffffffff8113ab2c>] __alloc_pages_nodemask+0x7c/0x9b0 [<ffffffff81175aa8>] alloc_pages_current+0xb8/0x180 [<ffffffff811319cf>] __page_cache_alloc+0xaf/0xd0 [<ffffffff8113225c>] find_or_create_page+0x4c/0xb0 [<ffffffffa021359e>] find_data_page+0x14e/0x210 [f2fs] [<ffffffffa021161b>] f2fs_gc+0x9eb/0xd90 [f2fs] [<ffffffffa0218fae>] f2fs_balance_fs+0xee/0x100 [f2fs] [<ffffffffa020848c>] f2fs_setattr+0x6c/0x200 [f2fs] [<ffffffff811ae51b>] notify_change+0x1db/0x3a0 [<ffffffff8118fbd0>] do_truncate+0x60/0xa0 [<ffffffff8118fd95>] vfs_truncate+0x185/0x1b0 [<ffffffff8118fe1c>] do_sys_truncate+0x5c/0xa0 [<ffffffff8118ffee>] SyS_truncate+0xe/0x10 [<ffffffff816e2b42>] system_call_fastpath+0x16/0x1b Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-04-23f2fs: add tracepoints for sync & inode operationsNamjae Jeon
Add tracepoints in f2fs for tracing the syncing operations like filesystem sync, file sync enter/exit. It will helf to trace the code under debugging scenarios. Also add tracepoints for tracing the various inode operations like building inode, eviction of inode, link/unlike of inodes. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Pankaj Kumar <pankaj.km@samsung.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> [Jaegeuk: combine and modify the tracepoint structures] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-04-09f2fs: introduce a new global lock schemeJaegeuk Kim
In the previous version, f2fs uses global locks according to the usage types, such as directory operations, block allocation, block write, and so on. Reference the following lock types in f2fs.h. enum lock_type { RENAME, /* for renaming operations */ DENTRY_OPS, /* for directory operations */ DATA_WRITE, /* for data write */ DATA_NEW, /* for data allocation */ DATA_TRUNC, /* for data truncate */ NODE_NEW, /* for node allocation */ NODE_TRUNC, /* for node truncate */ NODE_WRITE, /* for node write */ NR_LOCK_TYPE, }; In that case, we lose the performance under the multi-threading environment, since every types of operations must be conducted one at a time. In order to address the problem, let's share the locks globally with a mutex array regardless of any types. So, let users grab a mutex and perform their jobs in parallel as much as possbile. For this, I propose a new global lock scheme as follows. 0. Data structure - f2fs_sb_info -> mutex_lock[NR_GLOBAL_LOCKS] - f2fs_sb_info -> node_write 1. mutex_lock_op(sbi) - try to get an avaiable lock from the array. - returns the index of the gottern lock variable. 2. mutex_unlock_op(sbi, index of the lock) - unlock the given index of the lock. 3. mutex_lock_all(sbi) - grab all the locks in the array before the checkpoint. 4. mutex_unlock_all(sbi) - release all the locks in the array after checkpoint. 5. block_operations() - call mutex_lock_all() - sync_dirty_dir_inodes() - grab node_write - sync_node_pages() Note that, the pairs of mutex_lock_op()/mutex_unlock_op() and mutex_lock_all()/mutex_unlock_all() should be used together. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-03-27f2fs: do not skip writing file meta during fsyncJaegeuk Kim
This patch removes data_version check flow during the fsync call. The original purpose for the use of data_version was to avoid writng inode pages redundantly by the fsync calls repeatedly. However, when user can modify file meta and then call fsync, we should not skip fsync procedure. So, let's remove this condition check and hope that user triggers in right manner. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-03-20f2fs: avoid BUG_ON from check_nid_range and update return path in do_read_inodeNamjae Jeon
In function check_nid_range, there is no need to trigger BUG_ON and make kernel stop. Instead it could just check and indicate the inode number to be EINVAL. Update the return path in do_read_inode to use the return from check_nid_range. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com> [Jaegeuk: replace BUG_ON with WARN_ON] Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-02-12f2fs: avoid balanc_fs during evict_inodeJaegeuk Kim
1. Background Previously, if f2fs tries to move data blocks of an *evicting* inode during the cleaning process, it stops the process incompletely and then restarts the whole process, since it needs a locked inode to grab victim data pages in its address space. In order to get a locked inode, iget_locked() by f2fs_iget() is normally used, but, it waits if the inode is on freeing. So, here is a deadlock scenario. 1. f2fs_evict_inode() <- inode "A" 2. f2fs_balance_fs() 3. f2fs_gc() 4. gc_data_segment() 5. f2fs_iget() <- inode "A" too! If step #1 and #5 treat a same inode "A", step #5 would fall into deadlock since the inode "A" is on freeing. In order to resolve this, f2fs_iget_nowait() which skips __wait_on_freeing_inode() was introduced in step #5, and stops f2fs_gc() to complete f2fs_evict_inode(). 1. f2fs_evict_inode() <- inode "A" 2. f2fs_balance_fs() 3. f2fs_gc() 4. gc_data_segment() 5. f2fs_iget_nowait() <- inode "A", then stop f2fs_gc() w/ -ENOENT 2. Problem and Solution In the above scenario, however, f2fs cannot finish f2fs_evict_inode() only if: o there are not enough free sections, and o f2fs_gc() tries to move data blocks of the *evicting* inode repeatedly. So, the final solution is to use f2fs_iget() and remove f2fs_balance_fs() in f2fs_evict_inode(). The f2fs_evict_inode() actually truncates all the data and node blocks, which means that it doesn't produce any dirty node pages accordingly. So, we don't need to do f2fs_balance_fs() in practical. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-02-12f2fs: add un/freeze_fs into super_operationsChangman Lee
This patch supports ioctl FIFREEZE and FITHAW to snapshot filesystem. Before calling f2fs_freeze, all writers would be suspended and sync_fs would be completed. So no f2fs has to do something. Just background gc operation should be skipped due to generate dirty nodes and data until unfreeze. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-02-12f2fs: save device node number into f2fs_inodeChangman Lee
This patch stores inode->i_rdev into on-disk inode structure. Alun reported that: aspire tmp # mount -t f2fs /dev/sdb mnt aspire tmp # mknod mnt/sda1 b 8 1 aspire tmp # mknod mnt/null c 1 3 aspire tmp # mknod mnt/console c 5 1 aspire tmp # ls -l mnt total 2 crw-r--r-- 1 root root 5, 1 Jan 22 18:44 console crw-r--r-- 1 root root 1, 3 Jan 22 18:44 null brw-r--r-- 1 root root 8, 1 Jan 22 18:44 sda1 aspire tmp # umount mnt aspire tmp # mount -t f2fs /dev/sdb mnt aspire tmp # ls -l mnt total 2 crw-r--r-- 1 root root 0, 0 Jan 22 18:44 console crw-r--r-- 1 root root 0, 0 Jan 22 18:44 null brw-r--r-- 1 root root 0, 0 Jan 22 18:44 sda1 In this report, f2fs lost the major/minor numbers of device files after umount. The reason was revealed that f2fs does not store the inode->i_rdev to the on-disk inode data structure. So, as the other file systems do, f2fs also stores i_rdev into the i_addr fields in on-disk inode structure without any on-disk layout changes. Note that, this bug is limited to device files made by mknod(). Reported-and-Tested-by: Alun Jones <alun.linux@ty-penguin.org.uk> Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2013-01-11f2fs: add f2fs_balance_fs in several interfacesJaegeuk Kim
The f2fs_balance_fs() is to check the number of free sections and decide whether it needs to conduct cleaning or not. If there are not enough free sections, the cleaning job should be started. In order to control an amount of free sections even under high utilization, f2fs should call f2fs_balance_fs at all the VFS interfaces that are able to produce dirty pages. This patch adds the function calls in the missing interfaces as follows. 1. f2fs_setxattr() The f2fs_setxattr() produces dirty node pages so that we should call f2fs_balance_fs() either likewise doing in other VFS interfaces such as f2fs_lookup(), f2fs_mkdir(), and so on. 2. f2fs_sync_file() We should guarantee serving free sections for syncing metadata during fsync. Previously, there is no space check before triggering checkpoint and sync_node_pages. Therefore, if a bunch of fsync calls are triggered under 100% of FS utilization, f2fs is able to be faced with no free sections, resulting in BUG_ON(). 3. f2fs_sync_fs() Before calling write_checkpoint(), we should guarantee that there are minimum free sections. 4. f2fs_write_inode() f2fs_write_inode() is also able to produce dirty node pages. Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-12-26f2fs: fix handling errors got by f2fs_write_inodeJaegeuk Kim
Ruslan reported that f2fs hangs with an infinite loop in f2fs_sync_file(): while (sync_node_pages(sbi, inode->i_ino, &wbc) == 0) f2fs_write_inode(inode, NULL); The reason was revealed that the cold flag is not set even thought this inode is a normal file. Therefore, sync_node_pages() skips to write node blocks since it only writes cold node blocks. The cold flag is stored to the node_footer in node block, and whenever a new node page is allocated, it is set according to its file type, file or directory. But, after sudden-power-off, when recovering the inode page, f2fs doesn't recover its cold flag. So, let's assign the cold flag in more right places. One more thing: If f2fs_write_inode() returns an error due to whatever situations, there would be no dirty node pages so that sync_node_pages() returns zero. (i.e., zero means nothing was written.) Reported-by: Ruslan N. Marchenko <me@ruff.mobi> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-12-11f2fs: fix tracking parent inode numberJaegeuk Kim
Previously, f2fs didn't track the parent inode number correctly which is stored in each f2fs_inode. In the case of the following scenario, a bug can be occured. Let's suppose there are one directory, "/b", and two files, "/a" and "/b/a". - pino of "/a" is ROOT_INO. - pino of "/b/a" is DIR_B_INO. Then, # sync : The inode pages of "/a" and "/b/a" contain the parent inode numbers as ROOT_INO and DIR_B_INO respectively. # mv /a /b/a : The parent inode number of "/a" should be changed to DIR_B_INO, but f2fs didn't do that. Ref. f2fs_set_link(). In order to fix this clearly, I added i_pino in f2fs_inode_info, and whenever it needs to be changed like in f2fs_add_link() and f2fs_set_link(), it is updated temporarily in f2fs_inode_info. And later, f2fs_write_inode() stores the latest information to the inode pages. For power-off-recovery, f2fs_sync_file() triggers simply f2fs_write_inode(). Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-12-11f2fs: adjust kernel coding styleJaegeuk Kim
As pointed out by Randy Dunlap, this patch removes all usage of "/**" for comment blocks. Instead, just use "/*". Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
2012-12-11f2fs: add core inode operationsJaegeuk Kim
This adds core functions to get, read, write, and evict an inode. Signed-off-by: Changman Lee <cm224.lee@samsung.com> Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>