aboutsummaryrefslogtreecommitdiff
path: root/Documentation/filesystems/proc.txt
diff options
context:
space:
mode:
Diffstat (limited to 'Documentation/filesystems/proc.txt')
-rw-r--r--Documentation/filesystems/proc.txt207
1 files changed, 182 insertions, 25 deletions
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index a1793d670cd..ddc531a74d0 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -33,7 +33,7 @@ Table of Contents
2 Modifying System Parameters
3 Per-Process Parameters
- 3.1 /proc/<pid>/oom_score_adj - Adjust the oom-killer
+ 3.1 /proc/<pid>/oom_adj & /proc/<pid>/oom_score_adj - Adjust the oom-killer
score
3.2 /proc/<pid>/oom_score - Display current oom-killer score
3.3 /proc/<pid>/io - Display the IO accounting fields
@@ -41,6 +41,7 @@ Table of Contents
3.5 /proc/<pid>/mountinfo - Information about mounts
3.6 /proc/<pid>/comm & /proc/<pid>/task/<tid>/comm
3.7 /proc/<pid>/task/<tid>/children - Information about task children
+ 3.8 /proc/<pid>/fdinfo/<fd> - Information about opened file
4 Configuring procfs
4.1 Mount options
@@ -142,7 +143,7 @@ Table 1-1: Process specific entries in /proc
pagemap Page table
stack Report full stack trace, enable via CONFIG_STACKTRACE
smaps a extension based on maps, showing the memory consumption of
- each mapping
+ each mapping and flags associated with it
..............................................................................
For example, to get the status information of a process, all you have to do is
@@ -181,6 +182,7 @@ read the file /proc/PID/status:
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: ffffffffffffffff
+ Seccomp: 0
voluntary_ctxt_switches: 0
nonvoluntary_ctxt_switches: 1
@@ -232,11 +234,12 @@ Table 1-2: Contents of the status files (as of 2.6.30-rc7)
ShdPnd bitmap of shared pending signals for the process
SigBlk bitmap of blocked signals
SigIgn bitmap of ignored signals
- SigCgt bitmap of catched signals
+ SigCgt bitmap of caught signals
CapInh bitmap of inheritable capabilities
CapPrm bitmap of permitted capabilities
CapEff bitmap of effective capabilities
CapBnd bitmap of capabilities bounding set
+ Seccomp seccomp mode, like prctl(PR_GET_SECCOMP, ...)
Cpus_allowed mask of CPUs on which this process may run
Cpus_allowed_list Same as previous, but in "list format"
Mems_allowed mask of memory nodes allowed to this process
@@ -297,7 +300,7 @@ Table 1-4: Contents of the stat files (as of 2.6.30-rc7)
pending bitmap of pending signals
blocked bitmap of blocked signals
sigign bitmap of ignored signals
- sigcatch bitmap of catched signals
+ sigcatch bitmap of caught signals
wchan address where process went to sleep
0 (place holder)
0 (place holder)
@@ -415,8 +418,9 @@ Swap: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 374 kB
+VmFlags: rd ex mr mw me de
-The first of these lines shows the same information as is displayed for the
+the first of these lines shows the same information as is displayed for the
mapping in /proc/PID/maps. The remaining lines show the size of the mapping
(size), the amount of the mapping that is currently resident in RAM (RSS), the
process' proportional share of this mapping (PSS), the number of clean and
@@ -430,11 +434,48 @@ and a page is modified, the file page is replaced by a private anonymous copy.
"Swap" shows how much would-be-anonymous memory is also used, but out on
swap.
+"VmFlags" field deserves a separate description. This member represents the kernel
+flags associated with the particular virtual memory area in two letter encoded
+manner. The codes are the following:
+ rd - readable
+ wr - writeable
+ ex - executable
+ sh - shared
+ mr - may read
+ mw - may write
+ me - may execute
+ ms - may share
+ gd - stack segment growns down
+ pf - pure PFN range
+ dw - disabled write to the mapped file
+ lo - pages are locked in memory
+ io - memory mapped I/O area
+ sr - sequential read advise provided
+ rr - random read advise provided
+ dc - do not copy area on fork
+ de - do not expand area on remapping
+ ac - area is accountable
+ nr - swap space is not reserved for the area
+ ht - area uses huge tlb pages
+ nl - non-linear mapping
+ ar - architecture specific flag
+ dd - do not include area into core dump
+ sd - soft-dirty flag
+ mm - mixed map area
+ hg - huge page advise flag
+ nh - no-huge page advise flag
+ mg - mergable advise flag
+
+Note that there is no guarantee that every flag and associated mnemonic will
+be present in all further kernel releases. Things get changed, the flags may
+be vanished or the reverse -- new added.
+
This file is only present if the CONFIG_MMU kernel configuration option is
enabled.
The /proc/PID/clear_refs is used to reset the PG_Referenced and ACCESSED/YOUNG
-bits on both physical and virtual pages associated with a process.
+bits on both physical and virtual pages associated with a process, and the
+soft-dirty bit on pte (see Documentation/vm/soft-dirty.txt for details).
To clear the bits for all the pages associated with the process
> echo 1 > /proc/PID/clear_refs
@@ -443,6 +484,10 @@ To clear the bits for the anonymous pages associated with the process
To clear the bits for the file mapped pages associated with the process
> echo 3 > /proc/PID/clear_refs
+
+To clear the soft-dirty bit
+ > echo 4 > /proc/PID/clear_refs
+
Any other value written to /proc/PID/clear_refs will have no effect.
The /proc/pid/pagemap gives the PFN, which can be used to find the pageflags
@@ -502,7 +547,7 @@ Table 1-5: Kernel info in /proc
sys See chapter 2
sysvipc Info of SysVIPC Resources (msg, sem, shm) (2.4)
tty Info of tty drivers
- uptime System uptime
+ uptime Wall clock since boot, combined idle time of all cpus
version Kernel version
video bttv info of video resources (2.4)
vmallocinfo Show vmalloced areas
@@ -722,6 +767,7 @@ The "Locked" indicates whether the mapping is locked in memory or not.
MemTotal: 16344972 kB
MemFree: 13634064 kB
+MemAvailable: 14836172 kB
Buffers: 3656 kB
Cached: 1195708 kB
SwapCached: 0 kB
@@ -754,6 +800,14 @@ AnonHugePages: 49152 kB
MemTotal: Total usable ram (i.e. physical ram minus a few reserved
bits and the kernel binary code)
MemFree: The sum of LowFree+HighFree
+MemAvailable: An estimate of how much memory is available for starting new
+ applications, without swapping. Calculated from MemFree,
+ SReclaimable, the size of the file LRU lists, and the low
+ watermarks in each zone.
+ The estimate takes into account that the system needs some
+ page cache to function well, and that not all reclaimable
+ slab will be reclaimable, due to items being in use. The
+ impact of those factors will vary from system to system.
Buffers: Relatively temporary storage for raw disk blocks
shouldn't get tremendously large (20MB or so)
Cached: in-memory cache for files read from the disk (the
@@ -800,7 +854,8 @@ WritebackTmp: Memory used by FUSE for temporary writeback buffers
if strict overcommit accounting is enabled (mode 2 in
'vm.overcommit_memory').
The CommitLimit is calculated with the following formula:
- CommitLimit = ('vm.overcommit_ratio' * Physical RAM) + Swap
+ CommitLimit = ([total RAM pages] - [total huge TLB pages]) *
+ overcommit_ratio / 100 + [total swap pages]
For example, on a system with 1G of physical RAM and 7G
of swap with a `vm.overcommit_ratio` of 30 it would
yield a CommitLimit of 7.3G.
@@ -810,16 +865,15 @@ Committed_AS: The amount of memory presently allocated on the system.
The committed memory is a sum of all of the memory which
has been allocated by processes, even if it has not been
"used" by them as of yet. A process which malloc()'s 1G
- of memory, but only touches 300M of it will only show up
- as using 300M of memory even if it has the address space
- allocated for the entire 1G. This 1G is memory which has
- been "committed" to by the VM and can be used at any time
- by the allocating application. With strict overcommit
- enabled on the system (mode 2 in 'vm.overcommit_memory'),
- allocations which would exceed the CommitLimit (detailed
- above) will not be permitted. This is useful if one needs
- to guarantee that processes will not fail due to lack of
- memory once that memory has been successfully allocated.
+ of memory, but only touches 300M of it will show up as
+ using 1G. This 1G is memory which has been "committed" to
+ by the VM and can be used at any time by the allocating
+ application. With strict overcommit enabled on the system
+ (mode 2 in 'vm.overcommit_memory'),allocations which would
+ exceed the CommitLimit (detailed above) will not be permitted.
+ This is useful if one needs to guarantee that processes will
+ not fail due to lack of memory once that memory has been
+ successfully allocated.
VmallocTotal: total size of vmalloc memory area
VmallocUsed: amount of vmalloc area which is used
VmallocChunk: largest contiguous block of vmalloc area which is free
@@ -1192,8 +1246,9 @@ second). The meanings of the columns are as follows, from left to right:
The "intr" line gives counts of interrupts serviced since boot time, for each
of the possible system interrupts. The first column is the total of all
-interrupts serviced; each subsequent column is the total for that particular
-interrupt.
+interrupts serviced including unnumbered architecture specific interrupts;
+each subsequent column is the total for that particular numbered interrupt.
+Unnumbered interrupts are not shown, only summed into the total.
The "ctxt" line gives the total number of context switches across all CPUs.
@@ -1320,10 +1375,10 @@ of the kernel.
CHAPTER 3: PER-PROCESS PARAMETERS
------------------------------------------------------------------------------
-3.1 /proc/<pid>/oom_score_adj- Adjust the oom-killer score
+3.1 /proc/<pid>/oom_adj & /proc/<pid>/oom_score_adj- Adjust the oom-killer score
--------------------------------------------------------------------------------
-This file can be used to adjust the badness heuristic used to select which
+These file can be used to adjust the badness heuristic used to select which
process gets killed in out of memory conditions.
The badness heuristic assigns a value to each candidate task ranging from 0
@@ -1333,8 +1388,8 @@ may allocate from based on an estimation of its current memory and swap use.
For example, if a task is using all allowed memory, its badness score will be
1000. If it is using half of its allowed memory, its score will be 500.
-There is an additional factor included in the badness score: root
-processes are given 3% extra memory over other tasks.
+There is an additional factor included in the badness score: the current memory
+and swap usage is discounted by 3% for root processes.
The amount of "allowed" memory depends on the context in which the oom killer
was called. If it is due to the memory assigned to the allocating task's cpuset
@@ -1361,6 +1416,12 @@ same system, cpuset, mempolicy, or memory controller resources to use at least
equivalent to discounting 50% of the task's allowed memory from being considered
as scoring against the task.
+For backwards compatibility with previous kernels, /proc/<pid>/oom_adj may also
+be used to tune the badness score. Its acceptable values range from -16
+(OOM_ADJUST_MIN) to +15 (OOM_ADJUST_MAX) and a special value of -17
+(OOM_DISABLE) to disable oom killing entirely for that task. Its value is
+scaled linearly with /proc/<pid>/oom_score_adj.
+
The value of /proc/<pid>/oom_score_adj may be reduced no lower than the last
value set by a CAP_SYS_RESOURCE process. To reduce the value any lower
requires CAP_SYS_RESOURCE.
@@ -1375,7 +1436,9 @@ minimal amount of work.
-------------------------------------------------------------
This file can be used to check the current score used by the oom-killer is for
-any given <pid>.
+any given <pid>. Use it together with /proc/<pid>/oom_score_adj to tune which
+process should be killed in an out-of-memory situation.
+
3.3 /proc/<pid>/io - Display the IO accounting fields
-------------------------------------------------------
@@ -1587,6 +1650,100 @@ pids, so one need to either stop or freeze processes being inspected
if precise results are needed.
+3.8 /proc/<pid>/fdinfo/<fd> - Information about opened file
+---------------------------------------------------------------
+This file provides information associated with an opened file. The regular
+files have at least three fields -- 'pos', 'flags' and mnt_id. The 'pos'
+represents the current offset of the opened file in decimal form [see lseek(2)
+for details], 'flags' denotes the octal O_xxx mask the file has been
+created with [see open(2) for details] and 'mnt_id' represents mount ID of
+the file system containing the opened file [see 3.5 /proc/<pid>/mountinfo
+for details].
+
+A typical output is
+
+ pos: 0
+ flags: 0100002
+ mnt_id: 19
+
+The files such as eventfd, fsnotify, signalfd, epoll among the regular pos/flags
+pair provide additional information particular to the objects they represent.
+
+ Eventfd files
+ ~~~~~~~~~~~~~
+ pos: 0
+ flags: 04002
+ mnt_id: 9
+ eventfd-count: 5a
+
+ where 'eventfd-count' is hex value of a counter.
+
+ Signalfd files
+ ~~~~~~~~~~~~~~
+ pos: 0
+ flags: 04002
+ mnt_id: 9
+ sigmask: 0000000000000200
+
+ where 'sigmask' is hex value of the signal mask associated
+ with a file.
+
+ Epoll files
+ ~~~~~~~~~~~
+ pos: 0
+ flags: 02
+ mnt_id: 9
+ tfd: 5 events: 1d data: ffffffffffffffff
+
+ where 'tfd' is a target file descriptor number in decimal form,
+ 'events' is events mask being watched and the 'data' is data
+ associated with a target [see epoll(7) for more details].
+
+ Fsnotify files
+ ~~~~~~~~~~~~~~
+ For inotify files the format is the following
+
+ pos: 0
+ flags: 02000000
+ inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:7e9e0000640d1b6d
+
+ where 'wd' is a watch descriptor in decimal form, ie a target file
+ descriptor number, 'ino' and 'sdev' are inode and device where the
+ target file resides and the 'mask' is the mask of events, all in hex
+ form [see inotify(7) for more details].
+
+ If the kernel was built with exportfs support, the path to the target
+ file is encoded as a file handle. The file handle is provided by three
+ fields 'fhandle-bytes', 'fhandle-type' and 'f_handle', all in hex
+ format.
+
+ If the kernel is built without exportfs support the file handle won't be
+ printed out.
+
+ If there is no inotify mark attached yet the 'inotify' line will be omitted.
+
+ For fanotify files the format is
+
+ pos: 0
+ flags: 02
+ mnt_id: 9
+ fanotify flags:10 event-flags:0
+ fanotify mnt_id:12 mflags:40 mask:38 ignored_mask:40000003
+ fanotify ino:4f969 sdev:800013 mflags:0 mask:3b ignored_mask:40000000 fhandle-bytes:8 fhandle-type:1 f_handle:69f90400c275b5b4
+
+ where fanotify 'flags' and 'event-flags' are values used in fanotify_init
+ call, 'mnt_id' is the mount point identifier, 'mflags' is the value of
+ flags associated with mark which are tracked separately from events
+ mask. 'ino', 'sdev' are target inode and device, 'mask' is the events
+ mask and 'ignored_mask' is the mask of events which are to be ignored.
+ All in hex format. Incorporation of 'mflags', 'mask' and 'ignored_mask'
+ does provide information about flags and mask used in fanotify_mark
+ call [see fsnotify manpage for details].
+
+ While the first three lines are mandatory and always printed, the rest is
+ optional and may be omitted if no marks created yet.
+
+
------------------------------------------------------------------------------
Configuring procfs
------------------------------------------------------------------------------