Merge commit '85082fd7cbe3173198aac0eb5e85ab1edcc6352c' into test-build

Manual fixup of: arch/powerpc/Kconfig
author: Benjamin Herrenschmidt <benh@kernel.crashing.org> 2008-07-15 15:44:51 +1000
committer: Benjamin Herrenschmidt <benh@kernel.crashing.org> 2008-07-15 15:44:51 +1000
commit: 43d2548bb2ef7e6d753f91468a746784041e522d (patch)
tree: 77d13fcd48fd998393abb825ec36e2b732684a73 /Documentation
parent: 585583d95c5660973bc0cf64add517b040acd8a4 (diff)
parent: 85082fd7cbe3173198aac0eb5e85ab1edcc6352c (diff)
26 files changed, 825 insertions, 121 deletions
diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
index 4bd9ea53912..44f52a4f590 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -26,3 +26,37 @@ Description:
 		I/O statistics of partition <part>. The format is the
 		same as the above-written /sys/block/<disk>/stat
 		format.
+
+
+What:		/sys/block/<disk>/integrity/format
+Date:		June 2008
+Contact:	Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+		Metadata format for integrity capable block device.
+		E.g. T10-DIF-TYPE1-CRC.
+
+
+What:		/sys/block/<disk>/integrity/read_verify
+Date:		June 2008
+Contact:	Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+		Indicates whether the block layer should verify the
+		integrity of read requests serviced by devices that
+		support sending integrity metadata.
+
+
+What:		/sys/block/<disk>/integrity/tag_size
+Date:		June 2008
+Contact:	Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+		Number of bytes of integrity tag space available per
+		512 bytes of data.
+
+
+What:		/sys/block/<disk>/integrity/write_generate
+Date:		June 2008
+Contact:	Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+		Indicates whether the block layer should automatically
+		generate checksums for write requests bound for
+		devices that support receiving integrity metadata.
diff --git a/Documentation/ABI/testing/sysfs-bus-css b/Documentation/ABI/testing/sysfs-bus-css
new file mode 100644
index 00000000000..b585ec258a0
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-css
@@ -0,0 +1,35 @@
+What:		/sys/bus/css/devices/.../type
+Date:		March 2008
+Contact:	Cornelia Huck <cornelia.huck@de.ibm.com>
+		linux-s390@vger.kernel.org
+Description:	Contains the subchannel type, as reported by the hardware.
+		This attribute is present for all subchannel types.
+
+What:		/sys/bus/css/devices/.../modalias
+Date:		March 2008
+Contact:	Cornelia Huck <cornelia.huck@de.ibm.com>
+		linux-s390@vger.kernel.org
+Description:	Contains the module alias as reported with uevents.
+		It is of the format css:t<type> and present for all
+		subchannel types.
+
+What:		/sys/bus/css/drivers/io_subchannel/.../chpids
+Date:		December 2002
+Contact:	Cornelia Huck <cornelia.huck@de.ibm.com>
+		linux-s390@vger.kernel.org
+Description:	Contains the ids of the channel paths used by this
+		subchannel, as reported by the channel subsystem
+		during subchannel recognition.
+		Note: This is an I/O-subchannel specific attribute.
+Users:		s390-tools, HAL
+
+What:		/sys/bus/css/drivers/io_subchannel/.../pimpampom
+Date:		December 2002
+Contact:	Cornelia Huck <cornelia.huck@de.ibm.com>
+		linux-s390@vger.kernel.org
+Description:	Contains the PIM/PAM/POM values, as reported by the
+		channel subsystem when last queried by the common I/O
+		layer (this implies that this attribute is not neccessarily
+		in sync with the values current in the channel subsystem).
+		Note: This is an I/O-subchannel specific attribute.
+Users:		s390-tools, HAL
diff --git a/Documentation/ABI/testing/sysfs-firmware-memmap b/Documentation/ABI/testing/sysfs-firmware-memmap
new file mode 100644
index 00000000000..0d99ee6ae02
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-firmware-memmap
@@ -0,0 +1,71 @@
+What:		/sys/firmware/memmap/
+Date:		June 2008
+Contact:	Bernhard Walle <bwalle@suse.de>
+Description:
+		On all platforms, the firmware provides a memory map which the
+		kernel reads. The resources from that memory map are registered
+		in the kernel resource tree and exposed to userspace via
+		/proc/iomem (together with other resources).
+
+		However, on most architectures that firmware-provided memory
+		map is modified afterwards by the kernel itself, either because
+		the kernel merges that memory map with other information or
+		just because the user overwrites that memory map via command
+		line.
+
+		kexec needs the raw firmware-provided memory map to setup the
+		parameter segment of the kernel that should be booted with
+		kexec. Also, the raw memory map is useful for debugging. For
+		that reason, /sys/firmware/memmap is an interface that provides
+		the raw memory map to userspace.
+
+		The structure is as follows: Under /sys/firmware/memmap there
+		are subdirectories with the number of the entry as their name:
+
+			/sys/firmware/memmap/0
+			/sys/firmware/memmap/1
+			/sys/firmware/memmap/2
+			/sys/firmware/memmap/3
+			...
+
+		The maximum depends on the number of memory map entries provided
+		by the firmware. The order is just the order that the firmware
+		provides.
+
+		Each directory contains three files:
+
+		start	: The start address (as hexadecimal number with the
+			  '0x' prefix).
+		end	: The end address, inclusive (regardless whether the
+			  firmware provides inclusive or exclusive ranges).
+		type	: Type of the entry as string. See below for a list of
+			  valid types.
+
+		So, for example:
+
+			/sys/firmware/memmap/0/start
+			/sys/firmware/memmap/0/end
+			/sys/firmware/memmap/0/type
+			/sys/firmware/memmap/1/start
+			...
+
+		Currently following types exist:
+
+		  - System RAM
+		  - ACPI Tables
+		  - ACPI Non-volatile Storage
+		  - reserved
+
+		Following shell snippet can be used to display that memory
+		map in a human-readable format:
+
+		-------------------- 8< ----------------------------------------
+		  #!/bin/bash
+		  cd /sys/firmware/memmap
+		  for dir in * ; do
+		      start=$(cat $dir/start)
+		      end=$(cat $dir/end)
+		      type=$(cat $dir/type)
+		      printf "%016x-%016x (%s)\n" $start $[ $end +1] "$type"
+		  done
+		-------------------- >8 ----------------------------------------
diff --git a/Documentation/block/data-integrity.txt b/Documentation/block/data-integrity.txt
new file mode 100644
index 00000000000..e9dc8d86adc
--- /dev/null
+++ b/Documentation/block/data-integrity.txt
@@ -0,0 +1,327 @@
+----------------------------------------------------------------------
+1. INTRODUCTION
+
+Modern filesystems feature checksumming of data and metadata to
+protect against data corruption.  However, the detection of the
+corruption is done at read time which could potentially be months
+after the data was written.  At that point the original data that the
+application tried to write is most likely lost.
+
+The solution is to ensure that the disk is actually storing what the
+application meant it to.  Recent additions to both the SCSI family
+protocols (SBC Data Integrity Field, SCC protection proposal) as well
+as SATA/T13 (External Path Protection) try to remedy this by adding
+support for appending integrity metadata to an I/O.  The integrity
+metadata (or protection information in SCSI terminology) includes a
+checksum for each sector as well as an incrementing counter that
+ensures the individual sectors are written in the right order.  And
+for some protection schemes also that the I/O is written to the right
+place on disk.
+
+Current storage controllers and devices implement various protective
+measures, for instance checksumming and scrubbing.  But these
+technologies are working in their own isolated domains or at best
+between adjacent nodes in the I/O path.  The interesting thing about
+DIF and the other integrity extensions is that the protection format
+is well defined and every node in the I/O path can verify the
+integrity of the I/O and reject it if corruption is detected.  This
+allows not only corruption prevention but also isolation of the point
+of failure.
+
+----------------------------------------------------------------------
+2. THE DATA INTEGRITY EXTENSIONS
+
+As written, the protocol extensions only protect the path between
+controller and storage device.  However, many controllers actually
+allow the operating system to interact with the integrity metadata
+(IMD).  We have been working with several FC/SAS HBA vendors to enable
+the protection information to be transferred to and from their
+controllers.
+
+The SCSI Data Integrity Field works by appending 8 bytes of protection
+information to each sector.  The data + integrity metadata is stored
+in 520 byte sectors on disk.  Data + IMD are interleaved when
+transferred between the controller and target.  The T13 proposal is
+similar.
+
+Because it is highly inconvenient for operating systems to deal with
+520 (and 4104) byte sectors, we approached several HBA vendors and
+encouraged them to allow separation of the data and integrity metadata
+scatter-gather lists.
+
+The controller will interleave the buffers on write and split them on
+read.  This means that the Linux can DMA the data buffers to and from
+host memory without changes to the page cache.
+
+Also, the 16-bit CRC checksum mandated by both the SCSI and SATA specs
+is somewhat heavy to compute in software.  Benchmarks found that
+calculating this checksum had a significant impact on system
+performance for a number of workloads.  Some controllers allow a
+lighter-weight checksum to be used when interfacing with the operating
+system.  Emulex, for instance, supports the TCP/IP checksum instead.
+The IP checksum received from the OS is converted to the 16-bit CRC
+when writing and vice versa.  This allows the integrity metadata to be
+generated by Linux or the application at very low cost (comparable to
+software RAID5).
+
+The IP checksum is weaker than the CRC in terms of detecting bit
+errors.  However, the strength is really in the separation of the data
+buffers and the integrity metadata.  These two distinct buffers much
+match up for an I/O to complete.
+
+The separation of the data and integrity metadata buffers as well as
+the choice in checksums is referred to as the Data Integrity
+Extensions.  As these extensions are outside the scope of the protocol
+bodies (T10, T13), Oracle and its partners are trying to standardize
+them within the Storage Networking Industry Association.
+
+----------------------------------------------------------------------
+3. KERNEL CHANGES
+
+The data integrity framework in Linux enables protection information
+to be pinned to I/Os and sent to/received from controllers that
+support it.
+
+The advantage to the integrity extensions in SCSI and SATA is that
+they enable us to protect the entire path from application to storage
+device.  However, at the same time this is also the biggest
+disadvantage. It means that the protection information must be in a
+format that can be understood by the disk.
+
+Generally Linux/POSIX applications are agnostic to the intricacies of
+the storage devices they are accessing.  The virtual filesystem switch
+and the block layer make things like hardware sector size and
+transport protocols completely transparent to the application.
+
+However, this level of detail is required when preparing the
+protection information to send to a disk.  Consequently, the very
+concept of an end-to-end protection scheme is a layering violation.
+It is completely unreasonable for an application to be aware whether
+it is accessing a SCSI or SATA disk.
+
+The data integrity support implemented in Linux attempts to hide this
+from the application.  As far as the application (and to some extent
+the kernel) is concerned, the integrity metadata is opaque information
+that's attached to the I/O.
+
+The current implementation allows the block layer to automatically
+generate the protection information for any I/O.  Eventually the
+intent is to move the integrity metadata calculation to userspace for
+user data.  Metadata and other I/O that originates within the kernel
+will still use the automatic generation interface.
+
+Some storage devices allow each hardware sector to be tagged with a
+16-bit value.  The owner of this tag space is the owner of the block
+device.  I.e. the filesystem in most cases.  The filesystem can use
+this extra space to tag sectors as they see fit.  Because the tag
+space is limited, the block interface allows tagging bigger chunks by
+way of interleaving.  This way, 8*16 bits of information can be
+attached to a typical 4KB filesystem block.
+
+This also means that applications such as fsck and mkfs will need
+access to manipulate the tags from user space.  A passthrough
+interface for this is being worked on.
+
+
+----------------------------------------------------------------------
+4. BLOCK LAYER IMPLEMENTATION DETAILS
+
+4.1 BIO
+
+The data integrity patches add a new field to struct bio when
+CONFIG_BLK_DEV_INTEGRITY is enabled.  bio->bi_integrity is a pointer
+to a struct bip which contains the bio integrity payload.  Essentially
+a bip is a trimmed down struct bio which holds a bio_vec containing
+the integrity metadata and the required housekeeping information (bvec
+pool, vector count, etc.)
+
+A kernel subsystem can enable data integrity protection on a bio by
+calling bio_integrity_alloc(bio).  This will allocate and attach the
+bip to the bio.
+
+Individual pages containing integrity metadata can subsequently be
+attached using bio_integrity_add_page().
+
+bio_free() will automatically free the bip.
+
+
+4.2 BLOCK DEVICE
+
+Because the format of the protection data is tied to the physical
+disk, each block device has been extended with a block integrity
+profile (struct blk_integrity).  This optional profile is registered
+with the block layer using blk_integrity_register().
+
+The profile contains callback functions for generating and verifying
+the protection data, as well as getting and setting application tags.
+The profile also contains a few constants to aid in completing,
+merging and splitting the integrity metadata.
+
+Layered block devices will need to pick a profile that's appropriate
+for all subdevices.  blk_integrity_compare() can help with that.  DM
+and MD linear, RAID0 and RAID1 are currently supported.  RAID4/5/6
+will require extra work due to the application tag.
+
+
+----------------------------------------------------------------------
+5.0 BLOCK LAYER INTEGRITY API
+
+5.1 NORMAL FILESYSTEM
+
+    The normal filesystem is unaware that the underlying block device
+    is capable of sending/receiving integrity metadata.  The IMD will
+    be automatically generated by the block layer at submit_bio() time
+    in case of a WRITE.  A READ request will cause the I/O integrity
+    to be verified upon completion.
+
+    IMD generation and verification can be toggled using the
+
+      /sys/block/<bdev>/integrity/write_generate
+
+    and
+
+      /sys/block/<bdev>/integrity/read_verify
+
+    flags.
+
+
+5.2 INTEGRITY-AWARE FILESYSTEM
+
+    A filesystem that is integrity-aware can prepare I/Os with IMD
+    attached.  It can also use the application tag space if this is
+    supported by the block device.
+
+
+    int bdev_integrity_enabled(block_device, int rw);
+
+      bdev_integrity_enabled() will return 1 if the block device
+      supports integrity metadata transfer for the data direction
+      specified in 'rw'.
+
+      bdev_integrity_enabled() honors the write_generate and
+      read_verify flags in sysfs and will respond accordingly.
+
+
+    int bio_integrity_prep(bio);
+
+      To generate IMD for WRITE and to set up buffers for READ, the
+      filesystem must call bio_integrity_prep(bio).
+
+      Prior to calling this function, the bio data direction and start
+      sector must be set, and the bio should have all data pages
+      added.  It is up to the caller to ensure that the bio does not
+      change while I/O is in progress.
+
+      bio_integrity_prep() should only be called if
+      bio_integrity_enabled() returned 1.
+
+
+    int bio_integrity_tag_size(bio);
+
+      If the filesystem wants to use the application tag space it will
+      first have to find out how much storage space is available.
+      Because tag space is generally limited (usually 2 bytes per
+      sector regardless of sector size), the integrity framework
+      supports interleaving the information between the sectors in an
+      I/O.
+
+      Filesystems can call bio_integrity_tag_size(bio) to find out how
+      many bytes of storage are available for that particular bio.
+
+      Another option is bdev_get_tag_size(block_device) which will
+      return the number of available bytes per hardware sector.
+
+
+    int bio_integrity_set_tag(bio, void *tag_buf, len);
+
+      After a successful return from bio_integrity_prep(),
+      bio_integrity_set_tag() can be used to attach an opaque tag
+      buffer to a bio.  Obviously this only makes sense if the I/O is
+      a WRITE.
+
+
+    int bio_integrity_get_tag(bio, void *tag_buf, len);
+
+      Similarly, at READ I/O completion time the filesystem can
+      retrieve the tag buffer using bio_integrity_get_tag().
+
+
+6.3 PASSING EXISTING INTEGRITY METADATA
+
+    Filesystems that either generate their own integrity metadata or
+    are capable of transferring IMD from user space can use the
+    following calls:
+
+
+    struct bip * bio_integrity_alloc(bio, gfp_mask, nr_pages);
+
+      Allocates the bio integrity payload and hangs it off of the bio.
+      nr_pages indicate how many pages of protection data need to be
+      stored in the integrity bio_vec list (similar to bio_alloc()).
+
+      The integrity payload will be freed at bio_free() time.
+
+
+    int bio_integrity_add_page(bio, page, len, offset);
+
+      Attaches a page containing integrity metadata to an existing
+      bio.  The bio must have an existing bip,
+      i.e. bio_integrity_alloc() must have been called.  For a WRITE,
+      the integrity metadata in the pages must be in a format
+      understood by the target device with the notable exception that
+      the sector numbers will be remapped as the request traverses the
+      I/O stack.  This implies that the pages added using this call
+      will be modified during I/O!  The first reference tag in the
+      integrity metadata must have a value of bip->bip_sector.
+
+      Pages can be added using bio_integrity_add_page() as long as
+      there is room in the bip bio_vec array (nr_pages).
+
+      Upon completion of a READ operation, the attached pages will
+      contain the integrity metadata received from the storage device.
+      It is up to the receiver to process them and verify data
+      integrity upon completion.
+
+
+6.4 REGISTERING A BLOCK DEVICE AS CAPABLE OF EXCHANGING INTEGRITY
+    METADATA
+
+    To enable integrity exchange on a block device the gendisk must be
+    registered as capable:
+
+    int blk_integrity_register(gendisk, blk_integrity);
+
+      The blk_integrity struct is a template and should contain the
+      following:
+
+        static struct blk_integrity my_profile = {
+            .name                   = "STANDARDSBODY-TYPE-VARIANT-CSUM",
+            .generate_fn            = my_generate_fn,
+       	    .verify_fn              = my_verify_fn,
+       	    .get_tag_fn             = my_get_tag_fn,
+       	    .set_tag_fn             = my_set_tag_fn,
+	    .tuple_size             = sizeof(struct my_tuple_size),
+	    .tag_size               = <tag bytes per hw sector>,
+        };
+
+      'name' is a text string which will be visible in sysfs.  This is
+      part of the userland API so chose it carefully and never change
+      it.  The format is standards body-type-variant.
+      E.g. T10-DIF-TYPE1-IP or T13-EPP-0-CRC.
+
+      'generate_fn' generates appropriate integrity metadata (for WRITE).
+
+      'verify_fn' verifies that the data buffer matches the integrity
+      metadata.
+
+      'tuple_size' must be set to match the size of the integrity
+      metadata per sector.  I.e. 8 for DIF and EPP.
+
+      'tag_size' must be set to identify how many bytes of tag space
+      are available per hardware sector.  For DIF this is either 2 or
+      0 depending on the value of the Control Mode Page ATO bit.
+
+      See 6.2 for a description of get_tag_fn and set_tag_fn.
+
+----------------------------------------------------------------------
+2007-12-24 Martin K. Petersen <martin.petersen@oracle.com>
diff --git a/Documentation/ftrace.txt b/Documentation/ftrace.txt
index 13e4bf054c3..77d3faa1a61 100644
--- a/Documentation/ftrace.txt
+++ b/Documentation/ftrace.txt
@@ -2,8 +2,11 @@
 		========================
 
 Copyright 2008 Red Hat Inc.
-Author: Steven Rostedt <srostedt@redhat.com>
+   Author:   Steven Rostedt <srostedt@redhat.com>
+  License:   The GNU Free Documentation License, Version 1.2
+Reviewers:   Elias Oltmanns and Randy Dunlap
 
+Writen for: 2.6.26-rc8 linux-2.6-tip.git tip/tracing/ftrace branch
 
 Introduction
 ------------
@@ -46,7 +49,7 @@ of ftrace. Here is a list of some of the key files:
 		that is configured.
 
   available_tracers : This holds the different types of tracers that
-		has been compiled into the kernel. The tracers
+		have been compiled into the kernel. The tracers
 		listed here can be configured by echoing in their
 		name into current_tracer.
 
@@ -90,11 +93,13 @@ of ftrace. Here is a list of some of the key files:
   trace_entries : This sets or displays the number of trace
 		entries each CPU buffer can hold. The tracer buffers
 		are the same size for each CPU, so care must be
-		taken when modifying the trace_entries. The number
-		of actually entries will be the number given
-		times the number of possible CPUS. The buffers
-		are saved as individual pages, and the actual entries
-		will always be rounded up to entries per page.
+		taken when modifying the trace_entries. The trace
+		buffers are allocated in pages (blocks of memory that
+		the kernel uses for allocation, usually 4 KB in size).
+		Since each entry is smaller than a page, if the last
+		allocated page has room for more entries than were
+		requested, the rest of the page is used to allocate
+		entries.
 
 		This can only be updated when the current_tracer
 		is set to "none".
@@ -114,13 +119,13 @@ of ftrace. Here is a list of some of the key files:
 		in performance.  This also has a side effect of
 		enabling or disabling specific functions to be
 		traced.  Echoing in names of functions into this
-		file will limit the trace to only those files.
+		file will limit the trace to only these functions.
 
   set_ftrace_notrace: This has the opposite effect that
 		set_ftrace_filter has. Any function that is added
 		here will not be traced. If a function exists
-		in both set_ftrace_filter and set_ftrace_notrace
-		the function will _not_ bet traced.
+		in both set_ftrace_filter and set_ftrace_notrace,
+		the function will _not_ be traced.
 
   available_filter_functions : When a function is encountered the first
 		time by the dynamic tracer, it is recorded and
@@ -138,7 +143,7 @@ Here are the list of current tracers that can be configured.
 
   ftrace - function tracer that uses mcount to trace all functions.
 		It is possible to filter out which functions that are
-		traced when dynamic ftrace is configured in.
+		to be traced when dynamic ftrace is configured in.
 
   sched_switch - traces the context switches between tasks.
 
@@ -297,13 +302,13 @@ explains which is which.
 
 The above is mostly meaningful for kernel developers.
 
-  time: This differs from the trace output where as the trace output
-	contained a absolute timestamp. This timestamp is relative
-	to the start of the first entry in the the trace.
+  time: This differs from the trace file output. The trace file output
+	included an absolute timestamp. The timestamp used by the
+	latency_trace file is relative to the start of the trace.
 
   delay: This is just to help catch your eye a bit better. And
 	needs to be fixed to be only relative to the same CPU.
-	The marks is determined by the difference between this
+	The marks are determined by the difference between this
 	current trace and the next trace.
 	 '!' - greater than preempt_mark_thresh (default 100)
 	 '+' - greater than 1 microsecond
@@ -322,13 +327,13 @@ output. To see what is available, simply cat the file:
   print-parent nosym-offset nosym-addr noverbose noraw nohex nobin \
  noblock nostacktrace nosched-tree
 
-To disable one of the options, echo in the option appended with "no".
+To disable one of the options, echo in the option prepended with "no".
 
   echo noprint-parent > /debug/tracing/iter_ctrl
 
 To enable an option, leave off the "no".
 
-  echo sym-offest > /debug/tracing/iter_ctrl
+  echo sym-offset > /debug/tracing/iter_ctrl
 
 Here are the available options:
 
@@ -344,7 +349,7 @@ Here are the available options:
 
   sym-offset - Display not only the function name, but also the offset
 		in the function. For example, instead of seeing just
-		"ktime_get" you will see "ktime_get+0xb/0x20"
+		"ktime_get", you will see "ktime_get+0xb/0x20".
 
   sym-offset:
    bash-4000  [01]  1477.606694: simple_strtoul+0x6/0xa0
@@ -364,7 +369,7 @@ Here are the available options:
 	user applications that can translate the raw numbers better than
 	having it done in the kernel.
 
-  hex - similar to raw, but the numbers will be in a hexadecimal format.
+  hex - Similar to raw, but the numbers will be in a hexadecimal format.
 
   bin - This will print out the formats in raw binary.
 
@@ -381,7 +386,7 @@ sched_switch
 ------------
 
 This tracer simply records schedule switches. Here's an example
-on how to implement it.
+of how to use it.
 
  # echo sched_switch > /debug/tracing/current_tracer
  # echo 1 > /debug/tracing/tracing_enabled
@@ -470,7 +475,7 @@ interrupt from triggering or the mouse interrupt from letting the
 kernel know of a new mouse event. The result is a latency with the
 reaction time.
 
-The irqsoff tracer tracks the time interrupts are disabled and when
+The irqsoff tracer tracks the time interrupts are disabled to the time
 they are re-enabled. When a new maximum latency is hit, it saves off
 the trace so that it may be retrieved at a later time. Every time a
 new maximum in reached, the old saved trace is discarded and the new
@@ -519,7 +524,7 @@ The difference between the 6 and the displayed timestamp 7us is
 because the clock must have incremented between the time of recording
 the max latency and recording the function that had that latency.
 
-Note the above had ftrace_enabled not set. If we set the ftrace_enabled
+Note the above had ftrace_enabled not set. If we set the ftrace_enabled,
 we get a much larger output:
 
 # tracer: irqsoff
@@ -570,21 +575,21 @@ vim:ft=help
 
 
 Here we traced a 50 microsecond latency. But we also see all the
-functions that were called during that time. Note that enabling
-function tracing we endure an added overhead. This overhead may
-extend the latency times. But never the less, this trace has provided
-some very helpful debugging.
+functions that were called during that time. Note that by enabling
+function tracing, we endure an added overhead. This overhead may
+extend the latency times. But nevertheless, this trace has provided
+some very helpful debugging information.
 
 
 preemptoff
 ----------
 
-When preemption is disabled we may be able to receive interrupts but
-the task can not be preempted and a higher priority task must wait
+When preemption is disabled, we may be able to receive interrupts but
+the task cannot be preempted and a higher priority task must wait
 for preemption to be enabled again before it can preempt a lower
 priority task.
 
-The preemptoff tracer traces the places that disables preemption.
+The preemptoff tracer traces the places that disable preemption.
 Like the irqsoff, it records the maximum latency that preemption
 was disabled. The control of preemptoff is much like the irqsoff.
 
@@ -696,7 +701,7 @@ Notice that the __do_softirq when called doesn't have a preempt_count.
 It may seem that we missed a preempt enabled. What really happened
 is that the preempt count is held on the threads stack and we
 switched to the softirq stack (4K stacks in effect). The code
-does not copy the preempt count, but because interrupts are disabled
+does not copy the preempt count, but because interrupts are disabled,
 we don't need to worry about it. Having a tracer like this is good
 to let people know what really happens inside the kernel.
 
@@ -732,7 +737,7 @@ To record this time, use the preemptirqsoff tracer.
 
 Again, using this trace is much like the irqsoff and preemptoff tracers.
 
- # echo preemptoff > /debug/tracing/current_tracer
+ # echo preemptirqsoff > /debug/tracing/current_tracer
  # echo 0 > /debug/tracing/tracing_max_latency
  # echo 1 > /debug/tracing/tracing_enabled
  # ls -ltr
@@ -862,9 +867,9 @@ This is a very interesting trace. It started with the preemption of
 the ls task. We see that the task had the "need_resched" bit set
 with the 'N' in the trace.  Interrupts are disabled in the spin_lock
 and the trace started. We see that a schedule took place to run
-sshd.  When the interrupts were enabled we took an interrupt.
-On return of the interrupt the softirq ran. We took another interrupt
-while running the softirq as we see with the capital 'H'.
+sshd.  When the interrupts were enabled, we took an interrupt.
+On return from the interrupt handler, the softirq ran. We took another
+interrupt while running the softirq as we see with the capital 'H'.
 
 
 wakeup
@@ -876,9 +881,9 @@ time it executes. This is also known as "schedule latency".
 I stress the point that this is about RT tasks. It is also important
 to know the scheduling latency of non-RT tasks, but the average
 schedule latency is better for non-RT tasks. Tools like
-LatencyTop is more appropriate for such measurements.
+LatencyTop are more appropriate for such measurements.
 
-Real-Time environments is interested in the worst case latency.
+Real-Time environments are interested in the worst case latency.
 That is the longest latency it takes for something to happen, and
 not the average. We can have a very fast scheduler that may only
 have a large latency once in a while, but that would not work well
@@ -889,8 +894,8 @@ tasks that are unpredictable will overwrite the worst case latency
 of RT tasks.
 
 Since this tracer only deals with RT tasks, we will run this slightly
-different than we did with the previous tracers. Instead of performing
-an 'ls' we will run 'sleep 1' under 'chrt' which changes the
author	Benjamin Herrenschmidt <benh@kernel.crashing.org>	2008-07-15 15:44:51 +1000
committer	Benjamin Herrenschmidt <benh@kernel.crashing.org>	2008-07-15 15:44:51 +1000
commit	43d2548bb2ef7e6d753f91468a746784041e522d (patch)
tree	77d13fcd48fd998393abb825ec36e2b732684a73 /Documentation
parent	585583d95c5660973bc0cf64add517b040acd8a4 (diff)
parent	85082fd7cbe3173198aac0eb5e85ab1edcc6352c (diff)