path: root/include
Age    Commit message    Author
2012-05-20mtd: map.h: fix arm cross-build failureArtem Bityutskiy
commit 4a42243886b87cd28a39b192161767c2af851a55 upstream. This patch fixes the following build failure: In file included from include/linux/mtd/qinfo.h:4:0, from include/linux/mtd/pfow.h:7, from drivers/mtd/lpddr/lpddr_cmds.c:27: include/linux/mtd/map.h: In function 'inline_map_read': include/linux/mtd/map.h:409:3: error: implicit declaration of function 'BUILD_BUG_ON' [-Werror=implicit-function-declaration] Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
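The build failure above is simply a missing declaration: map.h uses BUILD_BUG_ON() in its inline accessors without pulling in the header that defines it. A minimal sketch of the kind of one-line fix involved, assuming BUILD_BUG_ON() lives in <linux/bug.h> as it does in kernels of this era:

    /* include/linux/mtd/map.h -- sketch; the only change is the missing include */
    #include <linux/bug.h>	/* BUILD_BUG_ON(), used by inline_map_read() and friends */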
2012-05-20usbnet: fix skb traversing races during unlink(v2)Ming Lei
commit 5b6e9bcdeb65634b4ad604eb4536404bbfc62cfa upstream. Commit 4231d47e6fe69f061f96c98c30eaf9fb4c14b96d (net/usbnet: avoid recursive locking in usbnet_stop()) fixes the recursive locking problem by releasing the skb queue lock before unlink, but may cause skb traversing races: - after the URB is unlinked and the queue lock is released, the referred skb and skb->next may be moved to the done queue, or even released - in skb_queue_walk_safe, the next skb is still obtained via the next pointer of the last skb - so it may trigger an oops or other problems This patch extends the usage of entry->state to describe a 'start_unlink' state, and always holds the queue (rx/tx) lock to change the state if the referred skb is in the rx or tx queue, because we need to know whether the referred urb has started unlinking in unlink_urbs. The other part of this patch is based on Huajun's patch: always traverse from the head of the tx/rx queue to get the skb which is to be unlinked but has not yet started unlinking. Signed-off-by: Huajun Li <huajun.li.lee@gmail.com> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Cc: Oliver Neukum <oneukum@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-05-11asm-generic: Use __BITS_PER_LONG in statfs.hH. Peter Anvin
commit f5c2347ee20a8d6964d6a6b1ad04f200f8d4dfa7 upstream. <asm-generic/statfs.h> is exported to userspace, so using BITS_PER_LONG is invalid. We need to use __BITS_PER_LONG instead. This is kernel bugzilla 43165. Reported-by: H.J. Lu <hjl.tools@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1335465916-16965-1-git-send-email-hpa@linux.intel.com Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
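To illustrate the distinction, here is a hedged sketch of the kind of conditional an exported header has to use; the real statfs.h word-size typedef may differ in detail:

    #include <asm/bitsperlong.h>	/* __BITS_PER_LONG is visible to user space */

    /*
     * BITS_PER_LONG is kernel-internal and undefined in user space builds,
     * so a UAPI header must test __BITS_PER_LONG instead.
     */
    #ifndef __statfs_word
    #if __BITS_PER_LONG == 64
    #define __statfs_word long
    #else
    #define __statfs_word __u32
    #endif
    #endif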
2012-05-11Fix __read_seqcount_begin() to use ACCESS_ONCE for sequence value readLinus Torvalds
commit 2f624278626677bfaf73fef97f86b37981621f5c upstream. We really need to use an ACCESS_ONCE() on the sequence value read in __read_seqcount_begin(), because otherwise the compiler might end up reloading the value in between the test and the return of it. As a result, it might end up returning an odd value (which means that a write is in progress). If the reader is then fast enough that that odd value is still the current one when the read_seqcount_retry() is done, we might end up with a "successful" read sequence, even despite the concurrent write being active. In practice this probably never really happens - there just isn't anything else going on around the read of the sequence count, and the common case is that we end up having a read barrier immediately afterwards. So the code sequence in which gcc might decide to reload from memory is small, and there's no reason to believe it would ever actually do the reload. But if the compiler ever were to decide to do so, it would be incredibly annoying to debug. Let's just make sure. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
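For reference, a sketch of __read_seqcount_begin() with the ACCESS_ONCE() applied to the sequence load; this is close to the 3.2-era code, with details hedged:

    static inline unsigned __read_seqcount_begin(const seqcount_t *s)
    {
        unsigned ret;

    repeat:
        ret = ACCESS_ONCE(s->sequence);	/* one load; no re-read between test and return */
        if (unlikely(ret & 1)) {	/* odd sequence: a writer is in progress */
            cpu_relax();
            goto repeat;
        }
        return ret;
    }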
2012-05-11efi: Add new variable attributesMatthew Garrett
commit 41b3254c93acc56adc3c4477fef7c9512d47659e upstream. More recent versions of the UEFI spec have added new attributes for variables. Add them. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
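For context, the variable attribute bits involved look roughly like this; the first three already existed and the rest are the additions from newer revisions of the UEFI spec (values per the spec, macro names illustrative):

    #define EFI_VARIABLE_NON_VOLATILE                           0x0000000000000001
    #define EFI_VARIABLE_BOOTSERVICE_ACCESS                     0x0000000000000002
    #define EFI_VARIABLE_RUNTIME_ACCESS                         0x0000000000000004
    /* Added by later UEFI revisions: */
    #define EFI_VARIABLE_HARDWARE_ERROR_RECORD                  0x0000000000000008
    #define EFI_VARIABLE_AUTHENTICATED_WRITE_ACCESS             0x0000000000000010
    #define EFI_VARIABLE_TIME_BASED_AUTHENTICATED_WRITE_ACCESS  0x0000000000000020
    #define EFI_VARIABLE_APPEND_WRITE                           0x0000000000000040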
2012-05-11pipes: add a "packetized pipe" mode for writingLinus Torvalds
commit 9883035ae7edef3ec62ad215611cb8e17d6a1a5d upstream. The actual internal pipe implementation is already really about individual packets (called "pipe buffers"), and this simply exposes that as a special packetized mode. When we are in the packetized mode (marked by O_DIRECT as suggested by Alan Cox), a write() on a pipe will not merge the new data with previous writes, so each write will get a pipe buffer of its own. The pipe buffer is then marked with the PIPE_BUF_FLAG_PACKET flag, which in turn will tell the reader side to break the read at that boundary (and throw away any partial packet contents that do not fit in the read buffer). End result: as long as you do writes less than PIPE_BUF in size (so that the pipe doesn't have to split them up), you can now treat the pipe as a packet interface, where each read() system call will read one packet at a time. You can just use a sufficiently big read buffer (PIPE_BUF is sufficient, since bigger than that doesn't guarantee atomicity anyway), and the return value of the read() will naturally give you the size of the packet. NOTE! We do not support zero-sized packets, and zero-sized reads and writes to a pipe continue to be no-ops. Also note that big packets will currently be split at write time, but that the size at which that happens is not really specified (except that it's bigger than PIPE_BUF). Currently that limit is the system page size, but we might want to explicitly support bigger packets some day. The main user for this is going to be the autofs packet interface, allowing us to stop having to care so deeply about exact packet sizes (which have had bugs with 32/64-bit compatibility modes). But user space can create packetized pipes with "pipe2(fd, O_DIRECT)", which will fail with an EINVAL on kernels that do not support this interface. Tested-by: Michael Tokarev <mjt@tls.msk.ru> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Miller <davem@davemloft.net> Cc: Ian Kent <raven@themaw.net> Cc: Thomas Meyer <thomas@m3y3r.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
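From user space the behaviour described above can be exercised with a short test like the following sketch (error handling trimmed; the only interface assumption is the documented pipe2(..., O_DIRECT) trigger):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd[2];
        char buf[4096];	/* a PIPE_BUF-sized buffer holds any single packet */

        if (pipe2(fd, O_DIRECT) < 0) {	/* EINVAL on kernels without packet mode */
            perror("pipe2");
            return 1;
        }
        write(fd[1], "abc", 3);	/* two writes -> two packets */
        write(fd[1], "de", 2);
        printf("first read:  %zd bytes\n", read(fd[0], buf, sizeof(buf)));	/* 3, not 5 */
        printf("second read: %zd bytes\n", read(fd[0], buf, sizeof(buf)));	/* 2 */
        return 0;
    }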
2012-05-11USB: EHCI: fix crash during suspend on ASUS computersAlan Stern
commit 151b61284776be2d6f02d48c23c3625678960b97 upstream. This patch (as1545) fixes a problem affecting several ASUS computers: The machine crashes or corrupts memory when going into suspend if the ehci-hcd driver is bound to any controllers. Users have been forced to unbind or unload ehci-hcd before putting their systems to sleep. After extensive testing, it was determined that the machines don't like going into suspend when any EHCI controllers are in the PCI D3 power state. Presumably this is a firmware bug, but there's nothing we can do about it except to avoid putting the controllers in D3 during system sleep. The patch adds a new flag to indicate whether the problem is present, and avoids changing the controller's power state if the flag is set. Runtime suspend is unaffected; this matters only for system suspend. However as a side effect, the controller will not respond to remote wakeup requests while the system is asleep. Hence USB wakeup is not functional -- but of course, this is already true in the current state of affairs. This fixes Bugzilla #42728. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Tested-by: Steven Rostedt <rostedt@goodmis.org> Tested-by: Andrey Rahmatullin <wrar@wrar.name> Tested-by: Oleksij Rempel (fishor) <bug-track@fisher-privat.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> [bwh: Backported to 3.2: adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-05-11tcp: avoid order-1 allocations on wifi and tx pathEric Dumazet
[ This combines upstream commit a21d45726acacc963d8baddf74607d9b74e2b723 and the follow-on bug fix commit 22b4a4f22da4b39c6f7f679fd35f3d35c91bf851 ] Marc Merlin reported many order-1 allocations failures in TX path on its wireless setup, that dont make any sense with MTU=1500 network, and non SG capable hardware. After investigation, it turns out TCP uses sk_stream_alloc_skb() and used as a convention skb_tailroom(skb) to know how many bytes of data payload could be put in this skb (for non SG capable devices) Note : these skb used kmalloc-4096 (MTU=1500 + MAX_HEADER + sizeof(struct skb_shared_info) being above 2048) Later, mac80211 layer need to add some bytes at the tail of skb (IEEE80211_ENCRYPT_TAILROOM = 18 bytes) and since no more tailroom is available has to call pskb_expand_head() and request order-1 allocations. This patch changes sk_stream_alloc_skb() so that only sk->sk_prot->max_header bytes of headroom are reserved, and use a new skb field, avail_size to hold the data payload limit. This way, order-0 allocations done by TCP stack can leave more than 2 KB of tailroom and no more allocation is performed in mac80211 layer (or any layer needing some tailroom) avail_size is unioned with mark/dropcount, since mark will be set later in IP stack for output packets. Therefore, skb size is unchanged. Reported-by: Marc MERLIN <marc@merlins.org> Tested-by: Marc MERLIN <marc@merlins.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> [bwh: Correct commit hash for follow-on bug fix] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-05-11tcp: allow splice() to build full TSO packetsEric Dumazet
[ This combines upstream commit 2f53384424251c06038ae612e56231b96ab610ee and the follow-on bug fix commit 35f9c09fe9c72eb8ca2b8e89a593e1c151f28fc2 ] vmsplice()/splice(pipe, socket) call do_tcp_sendpages() one page at a time, adding at most 4096 bytes to an skb. (assuming PAGE_SIZE=4096) The call to tcp_push() at the end of do_tcp_sendpages() forces an immediate xmit when pipe is not already filled, and tso_fragment() try to split these skb to MSS multiples. 4096 bytes are usually split in a skb with 2 MSS, and a remaining sub-mss skb (assuming MTU=1500) This makes slow start suboptimal because many small frames are sent to qdisc/driver layers instead of big ones (constrained by cwnd and packets in flight of course) In fact, applications using sendmsg() (adding an additional memory copy) instead of vmsplice()/splice()/sendfile() are a bit faster because of this anomaly, especially if serving small files in environments with large initial [c]wnd. Call tcp_push() only if MSG_MORE is not set in the flags parameter. This bit is automatically provided by splice() internals but for the last page, or on all pages if user specified SPLICE_F_MORE splice() flag. In some workloads, this can reduce number of sent logical packets by an order of magnitude, making zero-copy TCP actually faster than one-copy :) Reported-by: Tom Herbert <therbert@google.com> Cc: Nandita Dukkipati <nanditad@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: H.K. Jerry Chu <hkchu@google.com> Cc: Maciej Żenczykowski <maze@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-05-11net: fix /proc/net/dev regressionEric Dumazet
[ Upstream commit 2def16ae6b0c77571200f18ba4be049b03d75579 ] Commit f04565ddf52 (dev: use name hash for dev_seq_ops) added a second regression, as some devices are missing from /proc/net/dev if many devices are defined. When the seq_file buffer is filled, the last ->next/show() method is canceled (the pos value is reverted to its value prior to the ->next() call). The problem is that after the above commit, we don't restart the lookup at the right position in the ->start() method. Fix this by removing the internal 'pos' pointer added in that commit, since we need to use the 'loff_t *pos' provided by the seq_file layer. This also reverts commit 5cac98dd0 (net: Fix corruption in /proc/*/net/dev_mcast), since it's no longer needed. Reported-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Mihai Maruseac <mmaruseac@ixiacom.com> Tested-by: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-05-11KVM: unmap pages from the iommu when slots are removedAlex Williamson
commit 32f6daad4651a748a58a3ab6da0611862175722f upstream. We've been adding new mappings, but not destroying old mappings. This can lead to a page leak as pages are pinned using get_user_pages, but only unpinned with put_page if they still exist in the memslots list on vm shutdown. A memslot that is destroyed while an iommu domain is enabled for the guest will therefore result in an elevated page reference count that is never cleared. Additionally, without this fix, the iommu is only programmed with the first translation for a gpa. This can result in peer-to-peer errors if a mapping is destroyed and replaced by a new mapping at the same gpa as the iommu will still be pointing to the original, pinned memory address. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
2012-04-22Bluetooth: hci_core: fix NULL-pointer dereference at unregisterJohan Hovold
commit 94324962066231a938564bebad0f941cd2d06bb2 upstream. Make sure hci_dev_open returns immediately if hci_dev_unregister has been called. This fixes a race between hci_dev_open and hci_dev_unregister which can lead to a NULL-pointer dereference. Bug is 100% reproducible using hciattach and a disconnected serial port: 0. # hciattach -n /dev/ttyO1 any noflow 1. hci_dev_open called from hci_power_on grabs req lock 2. hci_init_req executes but device fails to initialise (times out eventually) 3. hci_dev_open is called from hci_sock_ioctl and sleeps on req lock 4. hci_uart_tty_close calls hci_dev_unregister and sleeps on req lock in hci_dev_do_close 5. hci_dev_open (1) releases req lock 6. hci_dev_do_close grabs req lock and returns as device is not up 7. hci_dev_unregister sleeps in destroy_workqueue 8. hci_dev_open (3) grabs req lock, calls hci_init_req and eventually sleeps 9. hci_dev_unregister finishes, while hci_dev_open is still running... [ 79.627136] INFO: trying to register non-static key. [ 79.632354] the code is fine but needs lockdep annotation. [ 79.638122] turning off the locking correctness validator. [ 79.643920] [<c00188bc>] (unwind_backtrace+0x0/0xf8) from [<c00729c4>] (__lock_acquire+0x1590/0x1ab0) [ 79.653594] [<c00729c4>] (__lock_acquire+0x1590/0x1ab0) from [<c00733f8>] (lock_acquire+0x9c/0x128) [ 79.663085] [<c00733f8>] (lock_acquire+0x9c/0x128) from [<c0040a88>] (run_timer_softirq+0x150/0x3ac) [ 79.672668] [<c0040a88>] (run_timer_softirq+0x150/0x3ac) from [<c003a3b8>] (__do_softirq+0xd4/0x22c) [ 79.682281] [<c003a3b8>] (__do_softirq+0xd4/0x22c) from [<c003a924>] (irq_exit+0x8c/0x94) [ 79.690856] [<c003a924>] (irq_exit+0x8c/0x94) from [<c0013a50>] (handle_IRQ+0x34/0x84) [ 79.699157] [<c0013a50>] (handle_IRQ+0x34/0x84) from [<c0008530>] (omap3_intc_handle_irq+0x48/0x4c) [ 79.708648] [<c0008530>] (omap3_intc_handle_irq+0x48/0x4c) from [<c037499c>] (__irq_usr+0x3c/0x60) [ 79.718048] Exception stack(0xcf281fb0 to 0xcf281ff8) [ 79.723358] 1fa0: 0001e6a0 be8dab00 0001e698 00036698 [ 79.731933] 1fc0: 0002df98 0002df38 0000001f 00000000 b6f234d0 00000000 00000004 00000000 [ 79.740509] 1fe0: 0001e6f8 be8d6aa0 be8dac50 0000aab8 80000010 ffffffff [ 79.747497] Unable to handle kernel NULL pointer dereference at virtual address 00000000 [ 79.756011] pgd = cf3b4000 [ 79.758850] [00000000] *pgd=8f0c7831, *pte=00000000, *ppte=00000000 [ 79.765502] Internal error: Oops: 80000007 [#1] [ 79.770294] Modules linked in: [ 79.773529] CPU: 0 Tainted: G W (3.3.0-rc6-00002-gb5d5c87 #421) [ 79.781066] PC is at 0x0 [ 79.783721] LR is at run_timer_softirq+0x16c/0x3ac [ 79.788787] pc : [<00000000>] lr : [<c0040aa4>] psr: 60000113 [ 79.788787] sp : cf281ee0 ip : 00000000 fp : cf280000 [ 79.800903] r10: 00000004 r9 : 00000100 r8 : b6f234d0 [ 79.806427] r7 : c0519c28 r6 : cf093488 r5 : c0561a00 r4 : 00000000 [ 79.813323] r3 : 00000000 r2 : c054eee0 r1 : 00000001 r0 : 00000000 [ 79.820190] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user [ 79.827728] Control: 10c5387d Table: 8f3b4019 DAC: 00000015 [ 79.833801] Process gpsd (pid: 1265, stack limit = 0xcf2802e8) [ 79.839965] Stack: (0xcf281ee0 to 0xcf282000) [ 79.844573] 1ee0: 00000002 00000000 c0040a24 00000000 00000002 cf281f08 00200200 00000000 [ 79.853210] 1f00: 00000000 cf281f18 cf281f08 00000000 00000000 00000000 cf281f18 cf281f18 [ 79.861816] 1f20: 00000000 00000001 c056184c 00000000 00000001 b6f234d0 c0561848 00000004 [ 79.870452] 1f40: cf280000 c003a3b8 c051e79c 00000001 00000000 00000100 3fa9e7b8 0000000a [ 79.879089] 1f60: 00000025 
cf280000 00000025 00000000 00000000 b6f234d0 00000000 00000004 [ 79.887756] 1f80: 00000000 c003a924 c053ad38 c0013a50 fa200000 cf281fb0 ffffffff c0008530 [ 79.896362] 1fa0: 0001e6a0 0000aab8 80000010 c037499c 0001e6a0 be8dab00 0001e698 00036698 [ 79.904998] 1fc0: 0002df98 0002df38 0000001f 00000000 b6f234d0 00000000 00000004 00000000 [ 79.913665] 1fe0: 0001e6f8 be8d6aa0 be8dac50 0000aab8 80000010 ffffffff 00fbf700 04ffff00 [ 79.922302] [<c0040aa4>] (run_timer_softirq+0x16c/0x3ac) from [<c003a3b8>] (__do_softirq+0xd4/0x22c) [ 79.931945] [<c003a3b8>] (__do_softirq+0xd4/0x22c) from [<c003a924>] (irq_exit+0x8c/0x94) [ 79.940582] [<c003a924>] (irq_exit+0x8c/0x94) from [<c0013a50>] (handle_IRQ+0x34/0x84) [ 79.948913] [<c0013a50>] (handle_IRQ+0x34/0x84) from [<c0008530>] (omap3_intc_handle_irq+0x48/0x4c) [ 79.958404] [<c0008530>] (omap3_intc_handle_irq+0x48/0x4c) from [<c037499c>] (__irq_usr+0x3c/0x60) [ 79.967773] Exception stack(0xcf281fb0 to 0xcf281ff8) [ 79.973083] 1fa0: 0001e6a0 be8dab00 0001e698 00036698 [ 79.981658] 1fc0: 0002df98 0002df38 0000001f 00000000 b6f234d0 00000000 00000004 00000000 [ 79.990234] 1fe0: 0001e6f8 be8d6aa0 be8dac50 0000aab8 80000010 ffffffff [ 79.997161] Code: bad PC value [ 80.000396] ---[ end trace 6f6739840475f9ee ]--- [ 80.005279] Kernel panic - not syncing: Fatal exception in interrupt Signed-off-by: Johan Hovold <jhovold@gmail.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-13sched/x86: Fix overflow in cyc2ns_offsetSalman Qazi
commit 9993bc635d01a6ee7f6b833b4ee65ce7c06350b1 upstream. When a machine boots up, the TSC generally gets reset. However, when kexec is used to boot into a kernel, the TSC value is carried over from the previous kernel. The computation of cyc2ns_offset in set_cyc2ns_scale is prone to overflow if the machine has been up more than 208 days prior to the kexec. The overflow happens when we multiply by *scale, even though there is enough room to store the final answer. We fix this issue by decomposing tsc_now into the quotient and remainder of division by CYC2NS_SCALE_FACTOR and then performing the multiplication separately on the two components. Refactor the code to share the calculation with the previous fix in __cycles_2_ns(). Signed-off-by: Salman Qazi <sqazi@google.com> Acked-by: John Stultz <john.stultz@linaro.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Turner <pjt@google.com> Cc: john stultz <johnstul@us.ibm.com> Link: http://lkml.kernel.org/r/20120310004027.19291.88460.stgit@dungbeetle.mtv.corp.google.com Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Mike Galbraith <efault@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
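The decomposition mentioned above avoids the 64-bit intermediate overflow; a sketch of the arithmetic (names hedged, CYC2NS_SCALE_FACTOR being a small power-of-two shift):

    /*
     * ns = cyc * scale >> CYC2NS_SCALE_FACTOR, computed so that neither
     * partial product overflows even for very large cyc (e.g. after kexec).
     */
    static unsigned long long sketch_cycles_2_ns(unsigned long long cyc,
                                                 unsigned long long scale)
    {
        unsigned long long quot = cyc >> CYC2NS_SCALE_FACTOR;
        unsigned long long rem  = cyc & ((1ULL << CYC2NS_SCALE_FACTOR) - 1);

        return quot * scale + ((rem * scale) >> CYC2NS_SCALE_FACTOR);
    }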
2012-04-13CIFS: Fix VFS lock usage for oplocked filesPavel Shilovsky
commit 66189be74ff5f9f3fd6444315b85be210d07cef2 upstream. We can deadlock if we have a write oplock and two processes use the same file handle. In this case the first process can't unlock its lock if the second process is blocked on the lock at the same time. Fix it by using posix_lock_file rather than posix_lock_file_wait under cinode->lock_mutex. If we request a blocking lock and posix_lock_file indicates that there is another lock that prevents us, wait until that lock is released and restart our call. Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <sfrench@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-13x86,kgdb: Fix DEBUG_RODATA limitation using text_poke()Jason Wessel
commit 3751d3e85cf693e10e2c47c03c8caa65e171099b upstream. There has long been a limitation using software breakpoints with a kernel compiled with CONFIG_DEBUG_RODATA, going back to 2.6.26. This particular patch applies cleanly and has been tested all the way back to 2.6.36. The kprobes code uses the text_poke() function, which accommodates writing a breakpoint into a read-only page. The x86 kgdb code can solve the problem similarly by overriding the default breakpoint set/remove routines and using text_poke() directly. The x86 kgdb code will first attempt to use the traditional probe_kernel_write(), and then fall back to the text_poke() function. The breakpoint install method is tracked so that the correct breakpoint removal routine will get called later on. Cc: x86@kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Inspired-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
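A rough sketch of the strategy (probe_kernel_write() first, text_poke() as the fallback for read-only text); field and constant names here are illustrative, and the real code also saves the original instruction so removal can restore it:

    int kgdb_arch_set_breakpoint(struct kgdb_bpt *bpt)
    {
        int err;

        /* Fast path: works whenever the text page is writable. */
        err = probe_kernel_write((char *)bpt->bpt_addr,
                                 arch_kgdb_ops.gdb_bpt_instr, BREAK_INSTR_SIZE);
        if (!err) {
            bpt->type = BP_BREAKPOINT;
            return 0;
        }

        /* CONFIG_DEBUG_RODATA: patch the read-only page via text_poke(). */
        text_poke((void *)bpt->bpt_addr, arch_kgdb_ops.gdb_bpt_instr,
                  BREAK_INSTR_SIZE);
        bpt->type = BP_POKE_BREAKPOINT;	/* remembered so removal also uses text_poke() */
        return 0;
    }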
2012-04-13kgdb,debug_core: pass the breakpoint struct instead of address and memoryJason Wessel
commit 98b54aa1a2241b59372468bd1e9c2d207bdba54b upstream. There is extra state information that needs to be exposed in the kgdb_bpt structure for tracking how a breakpoint was installed. The debug_core only uses probe_kernel_write() to install breakpoints, but this is not enough for all archs. Some archs, such as x86, need to use text_poke() in order to install a breakpoint into a read-only page. Passing the kgdb_bpt structure to kgdb_arch_set_breakpoint() and kgdb_arch_remove_breakpoint() allows other archs to set the type variable which indicates how the breakpoint was installed. Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-02rtc: Provide flag for rtc devices that don't support UIEJohn Stultz
commit 4a649903f91232d02284d53724b0a45728111767 upstream. Richard Weinberger noticed that on some RTC hardware that doesn't support UIE mode, due to coarse-grained alarms (like 1-minute resolution), the current virtualized RTC support doesn't properly error out when UIE is enabled. Instead the current code queues an alarm for the next second, but it won't fire until up to a minute later. This patch provides a generic way to flag this sort of hardware and fixes the issue on the mpc5121, where Richard noticed the problem. Reported-by: Richard Weinberger <richard@nod.at> Tested-by: Richard Weinberger <richard@nod.at> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
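The generic flag amounts to something like the following sketch; uie_unsupported is the field name used by the rtc class, but the surrounding details are abbreviated:

    /* Driver probe(): declare that the hardware alarm is too coarse for UIE. */
    rtc->uie_unsupported = 1;

    /* RTC core, when userspace asks for update interrupts (sketch only): */
    static int sketch_rtc_update_irq_enable(struct rtc_device *rtc, unsigned int enabled)
    {
        if (rtc->uie_unsupported)
            return -EINVAL;	/* error out instead of emulating UIE a minute late */
        /* ... normal UIE emulation via the alarm ... */
        return 0;
    }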
2012-04-02compat: use sys_sendfile64() implementation for sendfile syscallChris Metcalf
commit 1631fcea8399da5e80a80084b3b8c5bfd99d21e7 upstream. <asm-generic/unistd.h> was set up to use sys_sendfile() for the 32-bit compat API instead of sys_sendfile64(), but in fact the right thing to do is to use sys_sendfile64() in all cases. The 32-bit sendfile64() API in glibc uses the sendfile64 syscall, so it has to be capable of doing full 64-bit operations. But the sys_sendfile() kernel implementation has a MAX_NON_LFS test in it which explicitly limits the offset to 2^32. So, we need to use the sys_sendfile64() implementation in the kernel for this case. Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Chris Metcalf <cmetcalf@tilera.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-02udlfb: remove sysfs framebuffer device with USB .disconnect()Kay Sievers
commit ce880cb860f36694d2cdebfac9e6ae18176fe4c4 upstream. The USB graphics card driver delays the unregistering of the framebuffer device to a workqueue, which breaks the userspace visible remove uevent sequence. Recent userspace tools started to support USB graphics card hotplug out-of-the-box and rely on proper events sent by the kernel. The framebuffer device is a direct child of the USB interface which is removed immediately after the USB .disconnect() callback. But the fb device in /sys stays around until its final cleanup, at a time where all the parent devices have been removed already. To work around that, we remove the sysfs fb device directly in the USB .disconnect() callback and leave only the cleanup of the internal fb data to the delayed work. Before: add /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.2 (usb) add /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.2/2-1.2:1.0 (usb) add /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.2/2-1.2:1.0/graphics/fb0 (graphics) remove /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.2/2-1.2:1.0 (usb) remove /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.2 (usb) remove /2-1.2:1.0/graphics/fb0 (graphics) After: add /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.2 (usb) add /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.2/2-1.2:1.0 (usb) add /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.2/2-1.2:1.0/graphics/fb1 (graphics) remove /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.2/2-1.2:1.0/graphics/fb1 (graphics) remove /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.2/2-1.2:1.0 (usb) remove /devices/pci0000:00/0000:00:1d.0/usb2/2-1/2-1.2 (usb) Tested-by: Bernie Thompson <bernie@plugable.com> Acked-by: Bernie Thompson <bernie@plugable.com> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Florian Tobias Schandinat <FlorianSchandinat@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-04-02mm: thp: fix pmd_bad() triggering in code paths holding mmap_sem read modeAndrea Arcangeli
commit 1a5a9906d4e8d1976b701f889d8f35d54b928f25 upstream. In some cases it may happen that pmd_none_or_clear_bad() is called with the mmap_sem hold in read mode. In those cases the huge page faults can allocate hugepmds under pmd_none_or_clear_bad() and that can trigger a false positive from pmd_bad() that will not like to see a pmd materializing as trans huge. It's not khugepaged causing the problem, khugepaged holds the mmap_sem in write mode (and all those sites must hold the mmap_sem in read mode to prevent pagetables to go away from under them, during code review it seems vm86 mode on 32bit kernels requires that too unless it's restricted to 1 thread per process or UP builds). The race is only with the huge pagefaults that can convert a pmd_none() into a pmd_trans_huge(). Effectively all these pmd_none_or_clear_bad() sites running with mmap_sem in read mode are somewhat speculative with the page faults, and the result is always undefined when they run simultaneously. This is probably why it wasn't common to run into this. For example if the madvise(MADV_DONTNEED) runs zap_page_range() shortly before the page fault, the hugepage will not be zapped, if the page fault runs first it will be zapped. Altering pmd_bad() not to error out if it finds hugepmds won't be enough to fix this, because zap_pmd_range would then proceed to call zap_pte_range (which would be incorrect if the pmd become a pmd_trans_huge()). The simplest way to fix this is to read the pmd in the local stack (regardless of what we read, no need of actual CPU barriers, only compiler barrier needed), and be sure it is not changing under the code that computes its value. Even if the real pmd is changing under the value we hold on the stack, we don't care. If we actually end up in zap_pte_range it means the pmd was not none already and it was not huge, and it can't become huge from under us (khugepaged locking explained above). All we need is to enforce that there is no way anymore that in a code path like below, pmd_trans_huge can be false, but pmd_none_or_clear_bad can run into a hugepmd. The overhead of a barrier() is just a compiler tweak and should not be measurable (I only added it for THP builds). I don't exclude different compiler versions may have prevented the race too by caching the value of *pmd on the stack (that hasn't been verified, but it wouldn't be impossible considering pmd_none_or_clear_bad, pmd_bad, pmd_trans_huge, pmd_none are all inlines and there's no external function called in between pmd_trans_huge and pmd_none_or_clear_bad). if (pmd_trans_huge(*pmd)) { if (next-addr != HPAGE_PMD_SIZE) { VM_BUG_ON(!rwsem_is_locked(&tlb->mm->mmap_sem)); split_huge_page_pmd(vma->vm_mm, pmd); } else if (zap_huge_pmd(tlb, vma, pmd, addr)) continue; /* fall through */ } if (pmd_none_or_clear_bad(pmd)) Because this race condition could be exercised without special privileges this was reported in CVE-2012-1179. The race was identified and fully explained by Ulrich who debugged it. I'm quoting his accurate explanation below, for reference. ====== start quote ======= mapcount 0 page_mapcount 1 kernel BUG at mm/huge_memory.c:1384! At some point prior to the panic, a "bad pmd ..." message similar to the following is logged on the console: mm/memory.c:145: bad pmd ffff8800376e1f98(80000000314000e7). The "bad pmd ..." message is logged by pmd_clear_bad() before it clears the page's PMD table entry. 
143 void pmd_clear_bad(pmd_t *pmd) 144 { -> 145 pmd_ERROR(*pmd); 146 pmd_clear(pmd); 147 } After the PMD table entry has been cleared, there is an inconsistency between the actual number of PMD table entries that are mapping the page and the page's map count (_mapcount field in struct page). When the page is subsequently reclaimed, __split_huge_page() detects this inconsistency. 1381 if (mapcount != page_mapcount(page)) 1382 printk(KERN_ERR "mapcount %d page_mapcount %d\n", 1383 mapcount, page_mapcount(page)); -> 1384 BUG_ON(mapcount != page_mapcount(page)); The root cause of the problem is a race of two threads in a multithreaded process. Thread B incurs a page fault on a virtual address that has never been accessed (PMD entry is zero) while Thread A is executing an madvise() system call on a virtual address within the same 2 MB (huge page) range. virtual address space .---------------------. | | | | .-|---------------------| | | | | | |<-- B(fault) | | | 2 MB | |/////////////////////|-. huge < |/////////////////////| > A(range) page | |/////////////////////|-' | | | | | | '-|---------------------| | | | | '---------------------' - Thread A is executing an madvise(..., MADV_DONTNEED) system call on the virtual address range "A(range)" shown in the picture. sys_madvise // Acquire the semaphore in shared mode. down_read(&current->mm->mmap_sem) ... madvise_vma switch (behavior) case MADV_DONTNEED: madvise_dontneed zap_page_range unmap_vmas unmap_page_range zap_pud_range zap_pmd_range // // Assume that this huge page has never been accessed. // I.e. content of the PMD entry is zero (not mapped). // if (pmd_trans_huge(*pmd)) { // We don't get here due to the above assumption. } // // Assume that Thread B incurred a page fault and .---------> // sneaks in here as shown below. | // | if (pmd_none_or_clear_bad(pmd)) | { | if (unlikely(pmd_bad(*pmd))) | pmd_clear_bad | { | pmd_ERROR | // Log "bad pmd ..." message here. | pmd_clear | // Clear the page's PMD entry. | // Thread B incremented the map count | // in page_add_new_anon_rmap(), but | // now the page is no longer mapped | // by a PMD entry (-> inconsistency). | } | } | v - Thread B is handling a page fault on virtual address "B(fault)" shown in the picture. ... do_page_fault __do_page_fault // Acquire the semaphore in shared mode. down_read_trylock(&mm->mmap_sem) ... handle_mm_fault if (pmd_none(*pmd) && transparent_hugepage_enabled(vma)) // We get here due to the above assumption (PMD entry is zero). do_huge_pmd_anonymous_page alloc_hugepage_vma // Allocate a new transparent huge page here. ... __do_huge_pmd_anonymous_page ... spin_lock(&mm->page_table_lock) ... page_add_new_anon_rmap // Here we increment the page's map count (starts at -1). atomic_set(&page->_mapcount, 0) set_pmd_at // Here we set the page's PMD entry which will be cleared // when Thread A calls pmd_clear_bad(). ... spin_unlock(&mm->page_table_lock) The mmap_sem does not prevent the race because both threads are acquiring it in shared mode (down_read). Thread B holds the page_table_lock while the page's map count and PMD table entry are updated. However, Thread A does not synchronize on that lock. 
====== end quote ======= [akpm@linux-foundation.org: checkpatch fixes] Reported-by: Ulrich Obergfell <uobergfe@redhat.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Jones <davej@redhat.com> Acked-by: Larry Woodman <lwoodman@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Mark Salter <msalter@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
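The essence of the fix is a helper that snapshots the pmd into a local variable, with a compiler barrier on THP builds, before testing it. A condensed sketch of the resulting helper:

    /* Condensed sketch of pmd_none_or_trans_huge_or_clear_bad() */
    static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd)
    {
        pmd_t pmdval = *pmd;	/* read once into a local copy on the stack */
    #ifdef CONFIG_TRANSPARENT_HUGEPAGE
        barrier();		/* keep the compiler from re-reading *pmd below */
    #endif
        if (pmd_none(pmdval) || pmd_trans_huge(pmdval))
            return 1;	/* skip; a hugepmd appearing here is the benign race */
        if (unlikely(pmd_bad(pmdval))) {
            pmd_clear_bad(pmd);
            return 1;
        }
        return 0;
    }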
2012-04-02math: Introduce div64_longSasha Levin
commit f910381a55cdaa097030291f272f6e6e4380c39a upstream. Add a div64_long macro which is used to divide a 64-bit number by a long (which can be 4 bytes on 32-bit systems and 8 bytes on 64-bit systems). Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Cc: johnstul@us.ibm.com Link: http://lkml.kernel.org/r/1331829374-31543-1-git-send-email-levinsasha928@gmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
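The macro itself is a one-line wrapper around the existing math64.h helpers, selected by word size; a sketch:

    #if BITS_PER_LONG == 64
    #define div64_long(x, y) div64_s64((x), (y))	/* long is 64-bit: full 64/64 divide */
    #else
    #define div64_long(x, y) div_s64((x), (y))	/* long is 32-bit: 64/32 divide */
    #endif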
2012-03-19Block: use a freezable workqueue for disk-event pollingAlan Stern
commit 62d3c5439c534b0e6c653fc63e6d8c67be3a57b1 upstream. This patch (as1519) fixes a bug in the block layer's disk-events polling. The polling is done by a work routine queued on the system_nrt_wq workqueue. Since that workqueue isn't freezable, the polling continues even in the middle of a system sleep transition. Obviously, polling a suspended drive for media changes and such isn't a good thing to do; in the case of USB mass-storage devices it can lead to real problems requiring device resets and even re-enumeration. The patch fixes things by creating a new system-wide, non-reentrant, freezable workqueue and using it for disk-events polling. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-19block: Fix NULL pointer dereference in sd_revalidate_diskJun'ichi Nomura
commit fe316bf2d5847bc5dd975668671a7b1067603bc7 upstream. Since 2.6.39 (1196f8b), when a driver returns -ENOMEDIUM for open(), __blkdev_get() calls rescan_partitions() to remove in-kernel partition structures and raise KOBJ_CHANGE uevent. However it ends up calling driver's revalidate_disk without open and could cause oops. In the case of SCSI: process A process B ---------------------------------------------- sys_open __blkdev_get sd_open returns -ENOMEDIUM scsi_remove_device <scsi_device torn down> rescan_partitions sd_revalidate_disk <oops> Oopses are reported here: http://marc.info/?l=linux-scsi&m=132388619710052 This patch separates the partition invalidation from rescan_partitions() and use it for -ENOMEDIUM case. Reported-by: Huajun Li <huajun.li.lee@gmail.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-19ipsec: be careful of non existing mac headersEric Dumazet
[ Upstream commit 03606895cd98c0a628b17324fd7b5ff15db7e3cd ] Niccolo Belli reported ipsec crashes in case we handle a frame without mac header (atm in his case) Before copying mac header, better make sure it is present. Bugzilla reference: https://bugzilla.kernel.org/show_bug.cgi?id=42809 Reported-by: Niccolò Belli <darkbasic@linuxsystems.it> Tested-by: Niccolò Belli <darkbasic@linuxsystems.it> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12OMAPDSS: HDMI: PHY burnout fixTomi Valkeinen
commit c49d005b6cc8491fad5b24f82805be2d6bcbd3dd upstream. A hardware bug in the OMAP4 HDMI PHY causes physical damage to the board if the HDMI PHY is kept powered on when the cable is not connected. This patch solves the problem by adding hot-plug-detection into the HDMI IP driver. This is not a real HPD support in the sense that nobody else than the IP driver gets to know about the HPD events, but is only meant to fix the HW bug. The strategy is simple: If the display device is turned off by the user, the PHY power is set to OFF. When the display device is turned on by the user, the PHY power is set either to LDOON or TXON, depending on whether the HDMI cable is connected. The reason to avoid PHY OFF when the display device is on, but the cable is disconnected, is that when the PHY is turned OFF, the HDMI IP is not "ticking" and thus the DISPC does not receive pixel clock from the HDMI IP. This would, for example, prevent any VSYNCs from happening, and would thus affect the users of omapdss. By using LDOON when the cable is disconnected we'll avoid the HW bug, but keep the HDMI working as usual from the user's point of view. Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12regset: Return -EFAULT, not -EIO, on host-side memory faultH. Peter Anvin
commit 5189fa19a4b2b4c3bec37c3a019d446148827717 upstream. There is only one error code to return for a bad user-space buffer pointer passed to a system call in the same address space as the system call is executed, and that is EFAULT. Furthermore, the low-level access routines, which catch most of the faults, return EFAULT already. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@hack.frob.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12regset: Prevent null pointer reference on readonly regsetsH. Peter Anvin
commit c8e252586f8d5de906385d8cf6385fee289a825e upstream. The regset common infrastructure assumed that regsets would always have .get and .set methods, but not necessarily .active methods. Unfortunately people have since written regsets without .set methods. Rather than putting in stub functions everywhere, handle regsets with null .get or .set methods explicitly. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Roland McGrath <roland@hack.frob.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-03-12Fix autofs compile without CONFIG_COMPATLinus Torvalds
commit 3c761ea05a8900a907f32b628611873f6bef24b2 upstream. The autofs compat handling fix caused a compile failure when CONFIG_COMPAT isn't defined. Instead of adding random #ifdef'fery in autofs, let's just make the compat helpers easier to use: without CONFIG_COMPAT, is_compat_task() just hardcodes to zero. We could probably do something similar for a number of other cases where we have #ifdef's in code, but this is the low-hanging fruit. Reported-and-tested-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
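The resulting shape of the helper is sketched below; exact header placement varies, but the !CONFIG_COMPAT stub is the point of the change:

    #ifdef CONFIG_COMPAT
    #include <asm/compat.h>	/* the arch supplies the real is_compat_task() */
    #else
    static inline int is_compat_task(void)
    {
        return 0;	/* no CONFIG_COMPAT: there is never a 32-bit compat task */
    }
    #endif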
2012-02-29epoll: limit pathsJason Baron
commit 28d82dc1c4edbc352129f97f4ca22624d1fe61de upstream. The current epoll code can be tickled to run basically indefinitely in both loop detection path check (on ep_insert()), and in the wakeup paths. The programs that tickle this behavior set up deeply linked networks of epoll file descriptors that cause the epoll algorithms to traverse them indefinitely. A couple of these sample programs have been previously posted in this thread: https://lkml.org/lkml/2011/2/25/297. To fix the loop detection path check algorithms, I simply keep track of the epoll nodes that have been already visited. Thus, the loop detection becomes proportional to the number of epoll file descriptor and links. This dramatically decreases the run-time of the loop check algorithm. In one diabolical case I tried it reduced the run-time from 15 mintues (all in kernel time) to .3 seconds. Fixing the wakeup paths could be done at wakeup time in a similar manner by keeping track of nodes that have already been visited, but the complexity is harder, since there can be multiple wakeups on different cpus...Thus, I've opted to limit the number of possible wakeup paths when the paths are created. This is accomplished, by noting that the end file descriptor points that are found during the loop detection pass (from the newly added link), are actually the sources for wakeup events. I keep a list of these file descriptors and limit the number and length of these paths that emanate from these 'source file descriptors'. In the current implemetation I allow 1000 paths of length 1, 500 of length 2, 100 of length 3, 50 of length 4 and 10 of length 5. Note that it is sufficient to check the 'source file descriptors' reachable from the newly added link, since no other 'source file descriptors' will have newly added links. This allows us to check only the wakeup paths that may have gotten too long, and not re-check all possible wakeup paths on the system. In terms of the path limit selection, I think its first worth noting that the most common case for epoll, is probably the model where you have 1 epoll file descriptor that is monitoring n number of 'source file descriptors'. In this case, each 'source file descriptor' has a 1 path of length 1. Thus, I believe that the limits I'm proposing are quite reasonable and in fact may be too generous. Thus, I'm hoping that the proposed limits will not prevent any workloads that currently work to fail. In terms of locking, I have extended the use of the 'epmutex' to all epoll_ctl add and remove operations. Currently its only used in a subset of the add paths. I need to hold the epmutex, so that we can correctly traverse a coherent graph, to check the number of paths. I believe that this additional locking is probably ok, since its in the setup/teardown paths, and doesn't affect the running paths, but it certainly is going to add some extra overhead. Also, worth noting is that the epmuex was recently added to the ep_ctl add operations in the initial path loop detection code using the argument that it was not on a critical path. Another thing to note here, is the length of epoll chains that is allowed. Currently, eventpoll.c defines: /* Maximum number of nesting allowed inside epoll sets */ #define EP_MAX_NESTS 4 This basically means that I am limited to a graph depth of 5 (EP_MAX_NESTS + 1). However, this limit is currently only enforced during the loop check detection code, and only when the epoll file descriptors are added in a certain order. Thus, this limit is currently easily bypassed. 
The newly added check for wakeup paths strictly limits the wakeup paths to a length of 5, regardless of the order in which ep's are linked together. Thus, a side effect of the new code is a more consistent enforcement of the graph depth. Thus far, I've tested this using the sample programs previously mentioned, which now either return quickly or return -EINVAL. I've also tested using the piptest.c epoll tester, which showed no difference in performance. I've also created a number of different epoll networks and tested that they behave as expected. I believe this solves the original diabolical test cases, while still preserving the sane epoll nesting. Signed-off-by: Jason Baron <jbaron@redhat.com> Cc: Nelson Elhage <nelhage@ksplice.com> Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29epoll: introduce POLLFREE to flush ->signalfd_wqh before kfree()Oleg Nesterov
commit d80e731ecab420ddcb79ee9d0ac427acbc187b4b upstream. This patch is intentionally incomplete to simplify the review. It ignores ep_unregister_pollwait() which plays with the same wqh. See the next change. epoll assumes that the EPOLL_CTL_ADD'ed file controls everything f_op->poll() needs. In particular it assumes that the wait queue can't go away until eventpoll_release(). This is not true in case of signalfd, the task which does EPOLL_CTL_ADD uses its ->sighand which is not connected to the file. This patch adds the special event, POLLFREE, currently only for epoll. It expects that init_poll_funcptr()'ed hook should do the necessary cleanup. Perhaps it should be defined as EPOLLFREE in eventpoll. __cleanup_sighand() is changed to do wake_up_poll(POLLFREE) if ->signalfd_wqh is not empty, we add the new signalfd_cleanup() helper. ep_poll_callback(POLLFREE) simply does list_del_init(task_list). This make this poll entry inconsistent, but we don't care. If you share epoll fd which contains our sigfd with another process you should blame yourself. signalfd is "really special". I simply do not know how we can define the "right" semantics if it used with epoll. The main problem is, epoll calls signalfd_poll() once to establish the connection with the wait queue, after that signalfd_poll(NULL) returns the different/inconsistent results depending on who does EPOLL_CTL_MOD/signalfd_read/etc. IOW: apart from sigmask, signalfd has nothing to do with the file, it works with the current thread. In short: this patch is the hack which tries to fix the symptoms. It also assumes that nobody can take tasklist_lock under epoll locks, this seems to be true. Note: - we do not have wake_up_all_poll() but wake_up_poll() is fine, poll/epoll doesn't use WQ_FLAG_EXCLUSIVE. - signalfd_cleanup() uses POLLHUP along with POLLFREE, we need a couple of simple changes in eventpoll.c to make sure it can't be "lost". Reported-by: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
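A condensed sketch of the two halves described above, the POLLFREE bit and the cleanup helper; treat it as an outline of the change rather than the exact hunks:

    /* New "this wait queue is about to be freed" event, currently epoll-only. */
    #define POLLFREE	0x4000

    /* Called from __cleanup_sighand() before the sighand (and its wqh) is freed. */
    void signalfd_cleanup(struct sighand_struct *sighand)
    {
        wait_queue_head_t *wqh = &sighand->signalfd_wqh;

        if (likely(!waitqueue_active(wqh)))
            return;

        /* POLLHUP for ordinary pollers, POLLFREE so epoll drops its entry. */
        wake_up_poll(wqh, POLLHUP | POLLFREE);
    }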
2012-02-29target: Allow control CDBs with data > 1 pageAndy Grover
commit 4949314c7283ea4f9ade182ca599583b89f7edd6 upstream. We need to handle >1 page control cdbs, so extend the code to do a vmap if bigger than 1 page. It seems like kmap() is still preferable if just a page, fewer TLB shootdowns(?), so keep using that when possible. Rename function pair for their new scope. Signed-off-by: Andy Grover <agrover@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29USB: Remove duplicate USB 3.0 hub feature #defines.Sarah Sharp
commit d9f5343e35d9138432657202afa8e3ddb2ade360 upstream. Somehow we ended up with duplicate hub feature #defines in ch11.h. Tatyana Brokhman first created the USB 3.0 hub feature macros in 2.6.38 with commit 0eadcc09203349b11ca477ec367079b23d32ab91 "usb: USB3.0 ch11 definitions". In 2.6.39, I modified a patch from John Youn that added similar macros in a different place in the same file, and committed dbe79bbe9dcb22cb3651c46f18943477141ca452 "USB 3.0 Hub Changes". Some of the #defines used different names for the same values. Others used exactly the same names with the same values, like these gems: #define USB_PORT_FEAT_BH_PORT_RESET 28 ... #define USB_PORT_FEAT_BH_PORT_RESET 28 According to my very geeky husband (who looked it up in the C99 spec), it is allowed to have object-like macros with duplicate names as long as the replacement list is exactly the same. However, he recalled that some compilers will give warnings when they find duplicate macros. It's probably best to remove the duplicates in the stable tree, so that the code compiles for everyone. The macros are now fixed to move the feature requests that are specific to USB 3.0 hubs into a new section (out of the USB 2.0 hub feature section), and use the most common macro name. This patch should be backported to 2.6.39. Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com> Cc: Tatyana Brokhman <tlinder@codeaurora.org> Cc: John Youn <johnyoun@synopsys.com> Cc: Jamey Sharp <jamey@minilop.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-29net: Make qdisc_skb_cb upper size bound explicit.David S. Miller
[ Upstream commit 16bda13d90c8d5da243e2cfa1677e62ecce26860 ] Just like skb->cb[], so that qdisc_skb_cb can be encapsulated inside of other data structures. This is intended to be used by IPoIB so that it can remember addressing information stored at hard_header_ops->create() time that it can fetch when the packet gets to the transmit routine. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
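The explicit bound makes the structure and its validation helper look roughly like this (a sketch of the upstream shape):

    struct qdisc_skb_cb {
        unsigned int	pkt_len;
        unsigned char	data[24];	/* explicit upper bound for embedding users */
    };

    /* Lets users such as IPoIB assert that their private addressing data
     * fits inside skb->cb[] alongside pkt_len. */
    static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz)
    {
        struct qdisc_skb_cb *qcb;

        BUILD_BUG_ON(sizeof(skb->cb) < sizeof(unsigned int) + sz);
        BUILD_BUG_ON(sizeof(qcb->data) < sz);
    }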
2012-02-29ipv4: reset flowi parameters on route connectJulian Anastasov
[ Upstream commit e6b45241c57a83197e5de9166b3b0d32ac562609 ] Eric Dumazet found that commit 813b3b5db83 (ipv4: Use caller's on-stack flowi as-is in output route lookups.), which went into 3.0, added a regression. The problem appears to be that the resulting flowi4_oif is used incorrectly as an input parameter to some routing lookups. The result is that when connecting to a local port without a listener, if the IP address that is used is not on a loopback interface, we incorrectly assign RTN_UNICAST to the output route because no route is matched by oif=lo. The RST packet cannot be sent immediately by tcp_v4_send_reset because it expects RTN_LOCAL. So, change ip_route_connect and ip_route_newports to update the flowi4 fields that are input parameters, because we do not want unnecessary binding to oif. To make it clear which input parameters can be modified during lookup, and to show which fields of flowi4 are reused, add a new function to update the flowi4 structure: flowi4_update_output. Thanks to Yurij M. Plotnikov for providing a bug report including a program to reproduce the problem. Thanks to Eric Dumazet for tracking the problem down to tcp_v4_send_reset and providing the initial fix. Reported-by: Yurij M. Plotnikov <Yurij.Plotnikov@oktetlabs.ru> Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
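The new helper is a plain field update; a sketch that matches the description (argument order hedged):

    static inline void flowi4_update_output(struct flowi4 *fl4, int oif, __u8 tos,
                                            __be32 daddr, __be32 saddr)
    {
        fl4->flowi4_oif = oif;
        fl4->flowi4_tos = tos;
        fl4->daddr = daddr;
        fl4->saddr = saddr;
    }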
2012-02-29NFSv4: Fix an Oops in the NFSv4 getacl codeTrond Myklebust
commit 331818f1c468a24e581aedcbe52af799366a9dfe upstream. Commit bf118a342f10dafe44b14451a1392c3254629a1f (NFSv4: include bitmap in nfsv4 get acl data) introduces the 'acl_scratch' page for the case where we may need to decode multi-page data. However it fails to take into account the fact that the variable may be NULL (for the case where we're not doing multi-page decode), and it also attaches it to the encoding xdr_stream rather than the decoding one. The immediate result is an Oops in nfs4_xdr_enc_getacl due to the call to page_address() with a NULL page pointer. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Andy Adamson <andros@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-20crypto: sha512 - use standard ror64()Alexey Dobriyan
commit f2ea0f5f04c97b48c88edccba52b0682fbe45087 upstream. Use standard ror64() instead of hand-written. There is no standard ror64, so create it. The difference is shift value being "unsigned int" instead of uint64_t (for which there is no reason). gcc starts to emit native ROR instructions which it doesn't do for some reason currently. This should make the code faster. Patch survives in-tree crypto test and ping flood with hmac(sha512) on. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
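The new helper mirrors the existing ror32() in <linux/bitops.h>; a sketch:

    /*
     * ror64 - rotate a 64-bit value right
     * @word: value to rotate
     * @shift: bits to roll (note: unsigned int, not a 64-bit type)
     */
    static inline __u64 ror64(__u64 word, unsigned int shift)
    {
        return (word >> shift) | (word << (64 - shift));
    }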
2012-02-20mmc: dw_mmc: Fix PIO mode with support of highmemSeungwon Jeon
commit f9c2a0dc42a6938ff2a80e55ca2bbd1d5581c72e upstream. Current PIO mode makes a kernel crash with CONFIG_HIGHMEM. Highmem pages have a NULL from sg_virt(sg). This patch fixes the following problem. Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = c0004000 [00000000] *pgd=00000000 Internal error: Oops: 817 [#1] PREEMPT SMP Modules linked in: CPU: 0 Not tainted (3.0.15-01423-gdbf465f #589) PC is at dw_mci_pull_data32+0x4c/0x9c LR is at dw_mci_read_data_pio+0x54/0x1f0 pc : [<c0358824>] lr : [<c035988c>] psr: 20000193 sp : c0619d48 ip : c0619d70 fp : c0619d6c r10: 00000000 r9 : 00000002 r8 : 00001000 r7 : 00000200 r6 : 00000000 r5 : e1dd3100 r4 : 00000000 r3 : 65622023 r2 : 0000007f r1 : eeb96000 r0 : e1dd3100 Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment xkernel Control: 10c5387d Table: 61e2004a DAC: 00000015 Process swapper (pid: 0, stack limit = 0xc06182f0) Stack: (0xc0619d48 to 0xc061a000) 9d40: e1dd3100 e1a4f000 00000000 e1dd3100 e1a4f000 00000200 9d60: c0619da4 c0619d70 c035988c c03587e4 c0619d9c e18158f4 e1dd3100 e1dd3100 9d80: 00000020 00000000 00000000 00000020 c06e8a84 00000000 c0619e04 c0619da8 9da0: c0359b24 c0359844 e18158f4 e1dd3164 e1dd3168 e1dd3150 3d02fc79 e1dd3154 9dc0: e1dd3178 00000000 00000020 00000000 e1dd3150 00000000 c10dd7e8 e1a84900 9de0: c061e7cc 00000000 00000000 0000008d c06e8a84 c061e780 c0619e4c c0619e08 9e00: c00c4738 c0359a34 3d02fc79 00000000 c0619e4c c05a1698 c05a1670 c05a165c 9e20: c04de8b0 c061e780 c061e7cc e1a84900 ffffed68 0000008d c0618000 00000000 9e40: c0619e6c c0619e50 c00c48b4 c00c46c8 c061e780 c00423ac c061e7cc ffffed68 9e60: c0619e8c c0619e70 c00c7358 c00c487c 0000008d ffffee38 c0618000 ffffed68 9e80: c0619ea4 c0619e90 c00c4258 c00c72b0 c00423ac ffffee38 c0619ecc c0619ea8 9ea0: c004241c c00c4234 ffffffff f8810000 0000006d 00000002 00000001 7fffffff 9ec0: c0619f44 c0619ed0 c0048bc0 c00423c4 220ae7a9 00000000 386f0d30 0005d3a4 9ee0: c00423ac c10dd0b8 c06f2cd8 c0618000 c0594778 c003a674 7fffffff c0619f44 9f00: 386f0d30 c0619f18 c00a6f94 c005be3c 80000013 ffffffff 386f0d30 0005d3a4 9f20: 386f0d30 0005d2d1 c10dd0a8 c10dd0b8 c06f2cd8 c0618000 c0619f74 c0619f48 9f40: c0345858 c005be00 c00a2440 c0618000 c0618000 c00410d8 c06c1944 c00410fc 9f60: c0594778 c003a674 c0619f9c c0619f78 c004a7e8 c03457b4 c0618000 c06c18f8 9f80: 00000000 c0039c70 c06c18d4 c003a674 c0619fb4 c0619fa0 c04ceafc c004a714 9fa0: c06287b4 c06c18f8 c0619ff4 c0619fb8 c0008b68 c04cea68 c0008578 00000000 9fc0: 00000000 c003a674 00000000 10c5387d c0628658 c003aa78 c062f1c4 4000406a 9fe0: 413fc090 00000000 00000000 c0619ff8 40008044 c0008858 00000000 00000000 Backtrace: [<c03587d8>] (dw_mci_pull_data32+0x0/0x9c) from [<c035988c>] (dw_mci_read_data_pio+0x54/0x1f0) r6:00000200 r5:e1a4f000 r4:e1dd3100 [<c0359838>] (dw_mci_read_data_pio+0x0/0x1f0) from [<c0359b24>] (dw_mci_interrupt+0xfc/0x4a4) [<c0359a28>] (dw_mci_interrupt+0x0/0x4a4) from [<c00c4738>] (handle_irq_event_percpu+0x7c/0x1b4) [<c00c46bc>] (handle_irq_event_percpu+0x0/0x1b4) from [<c00c48b4>] (handle_irq_event+0x44/0x64) [<c00c4870>] (handle_irq_event+0x0/0x64) from [<c00c7358>] (handle_fasteoi_irq+0xb4/0x124) r7:ffffed68 r6:c061e7cc r5:c00423ac r4:c061e780 [<c00c72a4>] (handle_fasteoi_irq+0x0/0x124) from [<c00c4258>] (generic_handle_irq+0x30/0x38) r7:ffffed68 r6:c0618000 r5:ffffee38 r4:0000008d [<c00c4228>] (generic_handle_irq+0x0/0x38) from [<c004241c>] (asm_do_IRQ+0x64/0xe0) r5:ffffee38 r4:c00423ac [<c00423b8>] (asm_do_IRQ+0x0/0xe0) from [<c0048bc0>] (__irq_svc+0x80/0x14c) Exception 
stack(0xc0619ed0 to 0xc0619f18) Signed-off-by: Seungwon Jeon <tgih.jun@samsung.com> Acked-by: Will Newton <will.newton@imgtec.com> Signed-off-by: Chris Ball <cjb@laptop.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-20writeback: fix dereferencing NULL bdi->dev on trace_writeback_queueWu Fengguang
commit 977b7e3a52a7421ad33a393a38ece59f3d41c2fa upstream. When an SD card is hot-removed without umount, del_gendisk() will call bdi_unregister() without destroying/freeing it. This leaves the bdi with bdi->dev = NULL, bdi->wb.task = NULL, and removed from bdi->bdi_list. When sync(2) gets the bdi before bdi_unregister() and calls bdi_queue_work() after the unregister, trace_writeback_queue will dereference the NULL bdi->dev. Fix it with a simple test for NULL. LKML-reference: http://lkml.org/lkml/2012/1/18/346 Reported-by: Rabin Vincent <rabin@rab.in> Tested-by: Namjae Jeon <linkinjeon@gmail.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
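The "simple test for NULL" is not shown in the changelog; a minimal sketch of the idea, with a hypothetical helper name, guards every dev_name(bdi->dev) use behind a NULL check:

  #include <linux/backing-dev.h>
  #include <linux/device.h>
  #include <linux/string.h>

  /* Hedged sketch: an unregistered bdi has bdi->dev == NULL, so fall back to a placeholder. */
  static void copy_bdi_name_sketch(struct backing_dev_info *bdi, char *buf, size_t len)
  {
          strlcpy(buf, bdi->dev ? dev_name(bdi->dev) : "(unknown)", len);
  }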
2012-02-20writeback: fix NULL bdi->dev in trace writeback_single_inodeWu Fengguang
commit 15eb77a07c714ac80201abd0a9568888bcee6276 upstream. bdi_prune_sb() resets sb->s_bdi to default_backing_dev_info when tearing down the original bdi. Fix trace_writeback_single_inode to use sb->s_bdi = default_backing_dev_info rather than bdi->dev = NULL for a torn-down bdi. Reported-by: Rabin Vincent <rabin@rab.in> Tested-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-20lib: proportion: lower PROP_MAX_SHIFT to 32 on 64-bit kernelWu Fengguang
commit 3310225dfc71a35a2cc9340c15c0e08b14b3c754 upstream. PROP_MAX_SHIFT should be set to <= 32 on a 64-bit box. This fixes two bugs in the following lines of bdi_dirty_limit(): bdi_dirty *= numerator; do_div(bdi_dirty, denominator); 1) divide error: do_div() only uses the lower 32 bits of the denominator, which may be trimmed to 0 when PROP_MAX_SHIFT > 32. 2) overflow: (bdi_dirty * numerator) can easily overflow if the numerator uses up to 48 bits, leaving only 16 bits for bdi_dirty. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Reported-by: Ilya Tumaykin <librarian_rus@yahoo.com> Tested-by: Ilya Tumaykin <librarian_rus@yahoo.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
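Both failure modes are easy to reproduce outside the kernel; the stand-alone C program below (illustration only, with values chosen to mimic PROP_MAX_SHIFT > 32) shows the truncated divisor and the 64-bit overflow:

  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
          uint64_t numerator   = 1ULL << 48;   /* plausible once the shift exceeds 32 */
          uint64_t denominator = 1ULL << 48;
          uint64_t bdi_dirty   = 1ULL << 20;   /* needs more than the 16 bits left over */

          /* 1) do_div() sees only the low 32 bits of the divisor: here that is 0. */
          printf("low 32 bits of denominator: %u\n", (uint32_t)denominator);

          /* 2) bdi_dirty * numerator needs 68 bits and wraps around in 64-bit arithmetic. */
          printf("bdi_dirty * numerator wraps to: %llu\n",
                 (unsigned long long)(bdi_dirty * numerator));
          return 0;
  }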
2012-02-13usb: ch9.h: usb_endpoint_maxp() uses __le16_to_cpu()Kuninori Morimoto
commit 9c0a835a9d9aed41bcf9c287f5069133a6e2a87b upstream. usb/ch9.h is installed to /usr/include/linux and used from user space, but le16_to_cpu() is only defined for kernel code. Without this patch, the user-space build breaks. Special thanks to Stefan Becker. Reported-by: Stefan Becker <chemobejk@gmail.com> Signed-off-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com> Signed-off-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
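The resulting accessor presumably looks like the sketch below (hedged; the double-underscore helper is the byteorder variant exported to user space through the installed headers):

  #include <linux/usb/ch9.h>

  /* Sketch of the header-safe form; __le16_to_cpu() is usable from user space. */
  static inline int usb_endpoint_maxp_sketch(const struct usb_endpoint_descriptor *epd)
  {
          return __le16_to_cpu(epd->wMaxPacketSize);
  }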
2012-02-13PM / QoS: CPU C-state breakage with PM Qos changeVenkatesh Pallipadi
commit d020283dc694c9ec31b410f522252f7a8397e67d upstream. It looks like the change "PM QoS: Move and rename the implementation files", merged during the 3.2 development cycle, made PM QoS depend on CONFIG_PM, which depends on (PM_SLEEP || PM_RUNTIME). That breaks CPU C-states on kernels without these CONFIGs, causing CPUs to spend time in the polling idle loop instead of going into deep C-states and consuming far more power. This happens with either acpi idle or intel idle enabled. Either CONFIG_PM should be enabled for any pm_qos users, or the !CONFIG_PM pm_qos_request() should return sane defaults so as not to break the existing users. Here is the patch for the latter option. [rjw: Modified the changelog slightly.] Signed-off-by: Venkatesh Pallipadi <venki@google.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
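A sketch of the "sane defaults" option for !CONFIG_PM builds might look as follows; the constant names are assumptions based on the pm_qos API of that era, not a quote of the upstream patch:

  #ifndef CONFIG_PM
  /* Hedged sketch: report the per-class default instead of a meaningless value. */
  static inline int pm_qos_request(int pm_qos_class)
  {
          switch (pm_qos_class) {
          case PM_QOS_CPU_DMA_LATENCY:
                  return PM_QOS_CPU_DMA_LAT_DEFAULT_VALUE;
          case PM_QOS_NETWORK_LATENCY:
                  return PM_QOS_NETWORK_LAT_DEFAULT_VALUE;
          case PM_QOS_NETWORK_THROUGHPUT:
                  return PM_QOS_NETWORK_THROUGHPUT_DEFAULT_VALUE;
          default:
                  return PM_QOS_DEFAULT_VALUE;
          }
  }
  #endif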
2012-02-13PM / Hibernate: Fix s2disk regression related to freezing workqueuesRafael J. Wysocki
commit 181e9bdef37bfcaa41f3ab6c948a2a0d60a268b5 upstream. Commit 2aede851ddf08666f68ffc17be446420e9d2a056 ("PM / Hibernate: Freeze kernel threads after preallocating memory") introduced a mechanism by which kernel threads were frozen after the preallocation of hibernate image memory, to avoid problems with frozen kernel threads not responding to memory freeing requests. However, it overlooked the s2disk code path, in which the SNAPSHOT_CREATE_IMAGE ioctl is run directly after SNAPSHOT_FREE, which caused freeze_workqueues_begin() to BUG(), because it saw that workqueues had already been frozen. Although in principle this issue might be addressed by removing the relevant BUG_ON() from freeze_workqueues_begin(), that would reintroduce the very problem that commit 2aede851ddf08666f68ffc17be4 attempted to avoid into that particular code path. For this reason, to fix the issue at hand, introduce thaw_kernel_threads() and make the SNAPSHOT_FREE ioctl execute it. Special thanks to Srivatsa S. Bhat for a detailed analysis of the problem. Reported-and-tested-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
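A simplified sketch of the SNAPSHOT_FREE handling with the new call (illustrative only; the real handler in kernel/power/user.c carries more bookkeeping and its declarations are kernel-internal):

  /* Hedged sketch of the fixed SNAPSHOT_FREE path. */
  static void snapshot_free_sketch(void)
  {
          swsusp_free();           /* drop the preallocated hibernation image */
          thaw_kernel_threads();   /* new: undo the freeze done for preallocation, so a
                                    * later SNAPSHOT_CREATE_IMAGE can freeze workqueues
                                    * again without hitting the BUG_ON() */
  }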
2012-02-06PCI: Rework ASPM disable codeMatthew Garrett
commit 3c076351c4027a56d5005a39a0b518a4ba393ce2 upstream. Right now we forcibly clear ASPM state on all devices if the BIOS indicates that the feature isn't supported. Based on the Microsoft presentation "PCI Express In Depth for Windows Vista and Beyond", I'm starting to think that this may be an error. The implication is that unless the platform grants full control via _OSC, Windows will not touch any PCIe features - including ASPM. In that case clearing ASPM state would be an error unless the platform has granted us that control. This patch reworks the ASPM disabling code such that the actual clearing of state is triggered by a successful handoff of PCIe control to the OS. The general ASPM code undergoes some changes in order to ensure that the ability to clear the bits isn't overridden by ASPM having already been disabled. Further, this theoretically now allows for situations where only a subset of PCIe roots hand over control, leaving the others in the BIOS state. It's difficult to know for sure that this is the right thing to do - there's zero public documentation on the interaction between all of these components. But enough vendors enable ASPM on platforms and then set this bit that it seems likely that they're expecting the OS to leave them alone. Measured to save around 5W on an idle Thinkpad X220. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-02-03netns: Fail conspicuously if someone uses net_generic at an inappropriate time.Eric W. Biederman
[ Upstream commit 5ee4433efe99b9f39f6eff5052a177bbcfe72cea ] By definition, net_generic() should never be called when it can return NULL. Fail conspicuously with a BUG_ON to make it clear when people mess up that a NULL return should never happen. Recently there was a bug in the CAIF subsystem where it was registered with register_pernet_device instead of register_pernet_subsys. It was erroneously concluded that net_generic could validly return NULL and that net_assign_generic was buggy (when it was just inefficient). Hopefully this BUG_ON will prevent people from coming to similar erroneous conclusions in the future. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Tested-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
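For reference, a sketch of what the guarded accessor looks like (close to include/net/netns/generic.h of that period, but not guaranteed verbatim):

  #include <net/net_namespace.h>
  #include <net/netns/generic.h>

  /* Hedged sketch: crash loudly rather than hand a NULL subsystem pointer to the caller. */
  static inline void *net_generic_sketch(const struct net *net, int id)
  {
          struct net_generic *ng;
          void *ptr;

          rcu_read_lock();
          ng = rcu_dereference(net->gen);
          ptr = ng->ptr[id - 1];
          rcu_read_unlock();

          BUG_ON(!ptr);   /* a NULL here means the pernet ops were registered the wrong way */
          return ptr;
  }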
2012-02-03drm: Fix authentication kernel crashThomas Hellstrom
commit 598781d71119827b454fd75d46f84755bca6f0c6 upstream. If the master tries to authenticate a client using drm_authmagic and that client has already closed its drm file descriptor, either wilfully or because it was terminated, the call to drm_authmagic will dereference a stale pointer into kmalloc'ed memory and corrupt it. Typically this results in a hard system hang. This patch fixes that problem by removing any authentication tokens (struct drm_magic_entry) open for a file descriptor when that file descriptor is closed. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-01-25SHM_UNLOCK: fix Unevictable pages stranded after swapHugh Dickins
commit 245132643e1cfcd145bbc86a716c1818371fcb93 upstream. Commit cc39c6a9bbde ("mm: account skipped entries to avoid looping in find_get_pages") correctly fixed an infinite loop; but left a problem that find_get_pages() on shmem would return 0 (appearing to callers to mean end of tree) when it meets a run of nr_pages swap entries. The only uses of find_get_pages() on shmem are via pagevec_lookup(), called from invalidate_mapping_pages(), and from shmctl SHM_UNLOCK's scan_mapping_unevictable_pages(). The first is already commented, and not worth worrying about; but the second can leave pages on the Unevictable list after an unusual sequence of swapping and locking. Fix that by using shmem_find_get_pages_and_swap() (then ignoring the swap) instead of pagevec_lookup(). But I don't want to contaminate vmscan.c with shmem internals, nor shmem.c with LRU locking. So move scan_mapping_unevictable_pages() into shmem.c, renaming it shmem_unlock_mapping(); and rename check_move_unevictable_page() to check_move_unevictable_pages(), looping down an array of pages, oftentimes under the same lock. Leave out the "rotate unevictable list" block: that's a leftover from when this was used for /proc/sys/vm/scan_unevictable_pages, whose flawed handling involved looking at pages at tail of LRU. Was there significance to the sequence first ClearPageUnevictable, then test page_evictable, then SetPageUnevictable here? I think not, we're under LRU lock, and have no barriers between those. Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Rik van Riel <riel@redhat.com> Cc: Shaohua Li <shaohua.li@intel.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
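A heavily simplified sketch of the batched helper described above (illustrative only; the real mm/vmscan.c code also takes the zone LRU lock and moves pages between the LRU lists):

  #include <linux/mm.h>
  #include <linux/swap.h>

  /* Hedged sketch: one call handles a whole array of pages instead of one page per call. */
  static void check_move_unevictable_pages_sketch(struct page **pages, int nr_pages)
  {
          int i;

          for (i = 0; i < nr_pages; i++) {
                  struct page *page = pages[i];

                  if (!PageLRU(page) || !PageUnevictable(page))
                          continue;
                  if (page_evictable(page, NULL)) {
                          ClearPageUnevictable(page);
                          /* move the page back to an evictable LRU list here */
                  }
          }
  }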
2012-01-25tuner: Fix numberspace conflict between xc4000 and pti 5nf05 tunersMiroslav Slugen
commit cd4ca7afc61d3b18fcd635002459fb6b1d701099 upstream. Update the xc4000 tuner definition: number 81 is already in use by TUNER_PARTSNIC_PTI_5NF05. Signed-off-by: Miroslav Slugen <thunder.mmm@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
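The conflict boils down to two tuner symbols sharing one number; the sketch below shows the shape of the fix, with the replacement value (82) being an assumption rather than a quote of the patch:

  #define TUNER_PARTSNIC_PTI_5NF05   81
  #define TUNER_XC4000               82   /* was 81, clashing with the Partsnic entry above */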
2012-01-25target: Set additional sense length field in sense dataRoland Dreier
commit 895f3022523361e9b383cf48f51feb1f7d5e7e53 upstream. The target code was not setting the additional sense length field in the sense data it returned, which meant that at least the Linux stack ignored the ASC/ASCQ fields. For example, without this patch, on a tcm_loop device:

  # sg_raw -v /dev/sda 2 0 0 0 0 0

gives

  cdb to send: 02 00 00 00 00 00
  SCSI Status: Check Condition
  Sense Information:
   Fixed format, current;  Sense key: Illegal Request
  Raw sense data (in hex):
          70 00 05 00 00 00 00 00

while after the patch we correctly get the following (which matches what a regular disk returns):

  cdb to send: 02 00 00 00 00 00
  SCSI Status: Check Condition
  Sense Information:
   Fixed format, current;  Sense key: Illegal Request
  Additional sense: Invalid command operation code
  Raw sense data (in hex):
          70 00 05 00 00 00 00 0a 00 00 00 00 20 00 00 00 00 00

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
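The difference between the two hex dumps is byte 7 of the fixed-format sense buffer; a hedged sketch of filling it in (offsets follow SPC fixed-format sense data, not necessarily the exact target-core code):

  /* Sketch: without buf[7], initiators stop parsing before the ASC/ASCQ bytes. */
  static void build_fixed_sense_sketch(unsigned char *buf, unsigned char key,
                                       unsigned char asc, unsigned char ascq)
  {
          buf[0]  = 0x70;   /* fixed format, current error           */
          buf[2]  = key;    /* 0x05 = ILLEGAL REQUEST in the dump    */
          buf[7]  = 0x0a;   /* additional sense length: 10 more bytes */
          buf[12] = asc;    /* 0x20 = INVALID COMMAND OPERATION CODE */
          buf[13] = ascq;
  }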
2012-01-25ACPI: Store SRAT table revisionKurt Garloff
commit 8df0eb7c9d96f9e82f233ee8b74e0f0c8471f868 upstream. In SRAT v1, we had 8-bit proximity domain (PXM) fields; SRAT v2 provides 32 bits for these. The new fields were reserved before. According to the ACPI spec, the OS must disregard reserved fields. In order to know whether or not to use them, we must know what revision the SRAT table has. This patch stores the SRAT table revision for later consumption by arch-specific __init functions. Signed-off-by: Kurt Garloff <kurt@garloff.de> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
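A sketch of what "storing the revision" amounts to (the variable name and parse hook are assumptions for illustration):

  #include <linux/acpi.h>
  #include <linux/init.h>

  /* Hedged sketch: remember the SRAT revision so arch __init code can pick the PXM width. */
  int acpi_srat_revision __initdata;

  static int __init acpi_parse_srat_sketch(struct acpi_table_header *table)
  {
          acpi_srat_revision = table->revision;   /* rev 1: 8-bit PXM, rev 2+: 32-bit PXM */
          return 0;
  }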