<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux/arch/arm/lib, branch v3.13.5</title>
<subtitle>Linux kernel source tree</subtitle>
<id>https://git.amat.us/linux/atom/arch/arm/lib?h=v3.13.5</id>
<link rel='self' href='https://git.amat.us/linux/atom/arch/arm/lib?h=v3.13.5'/>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/'/>
<updated>2013-11-30T22:21:03Z</updated>
<entry>
<title>ARM: 7907/1: lib: delay-loop: Add align directive to fix BogoMIPS calculation</title>
<updated>2013-11-30T22:21:03Z</updated>
<author>
<name>Fabio Estevam</name>
<email>festevam@gmail.com</email>
</author>
<published>2013-11-30T14:24:42Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=11d4bb1bd067f9d01d18f620ccfad516dc579593'/>
<id>urn:sha1:11d4bb1bd067f9d01d18f620ccfad516dc579593</id>
<content type='text'>
Currently mx53 (CortexA8) running at 1GHz reports:
Calibrating delay loop... 663.55 BogoMIPS (lpj=3317760)

Tom Evans verified that alignments of 0x0 and 0x8 run the two instructions of __loop_delay in one clock cycle (1 clock/loop), while alignments of 0x4 and 0xc take 3 clocks to run the loop twice. (1.5 clock/loop)

The original object code looks like this:

00000010 &lt;__loop_const_udelay&gt;:
  10:	e3e01000 	mvn	r1, #0
  14:	e51f201c 	ldr	r2, [pc, #-28]	; 0 &lt;__loop_udelay-0x8&gt;
  18:	e5922000 	ldr	r2, [r2]
  1c:	e0800921 	add	r0, r0, r1, lsr #18
  20:	e1a00720 	lsr	r0, r0, #14
  24:	e0822b21 	add	r2, r2, r1, lsr #22
  28:	e1a02522 	lsr	r2, r2, #10
  2c:	e0000092 	mul	r0, r2, r0
  30:	e0800d21 	add	r0, r0, r1, lsr #26
  34:	e1b00320 	lsrs	r0, r0, #6
  38:	01a0f00e 	moveq	pc, lr

0000003c &lt;__loop_delay&gt;:
  3c:	e2500001 	subs	r0, r0, #1
  40:	8afffffe 	bhi	3c &lt;__loop_delay&gt;
  44:	e1a0f00e 	mov	pc, lr

After adding the 'align 3' directive to __loop_delay (align to 8 bytes):

00000010 &lt;__loop_const_udelay&gt;:
  10:	e3e01000 	mvn	r1, #0
  14:	e51f201c 	ldr	r2, [pc, #-28]	; 0 &lt;__loop_udelay-0x8&gt;
  18:	e5922000 	ldr	r2, [r2]
  1c:	e0800921 	add	r0, r0, r1, lsr #18
  20:	e1a00720 	lsr	r0, r0, #14
  24:	e0822b21 	add	r2, r2, r1, lsr #22
  28:	e1a02522 	lsr	r2, r2, #10
  2c:	e0000092 	mul	r0, r2, r0
  30:	e0800d21 	add	r0, r0, r1, lsr #26
  34:	e1b00320 	lsrs	r0, r0, #6
  38:	01a0f00e 	moveq	pc, lr
  3c:	e320f000 	nop	{0}

00000040 &lt;__loop_delay&gt;:
  40:	e2500001 	subs	r0, r0, #1
  44:	8afffffe 	bhi	40 &lt;__loop_delay&gt;
  48:	e1a0f00e 	mov	pc, lr
  4c:	e320f000 	nop	{0}

, which now reports:
Calibrating delay loop... 996.14 BogoMIPS (lpj=4980736)

Some more test results:

On mx31 (ARM1136) running at 532 MHz, before the patch:
Calibrating delay loop... 351.43 BogoMIPS (lpj=1757184)

On mx31 (ARM1136) running at 532 MHz after the patch:
Calibrating delay loop... 528.79 BogoMIPS (lpj=2643968)

Also tested on mx6 (CortexA9) and on mx27 (ARM926), which shows the same
BogoMIPS value before and after this patch.

Reported-by: Tom Evans &lt;tom_usenet@optusnet.com.au&gt;
Suggested-by: Tom Evans &lt;tom_usenet@optusnet.com.au&gt;
Signed-off-by: Fabio Estevam &lt;fabio.estevam@freescale.com&gt;
Signed-off-by: Russell King &lt;rmk+kernel@arm.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>ARM: 7893/1: bitops: only emit .arch_extension mp if CONFIG_SMP</title>
<updated>2013-11-20T23:05:53Z</updated>
<author>
<name>Will Deacon</name>
<email>will.deacon@arm.com</email>
</author>
<published>2013-11-19T14:46:11Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=b7ec699405f55667caeb46d96229d75bf33a83ad'/>
<id>urn:sha1:b7ec699405f55667caeb46d96229d75bf33a83ad</id>
<content type='text'>
Uwe reported a build failure when targetting a NOMMU platform with my
recent prefetch changes:

  arch/arm/lib/changebit.S: Assembler messages:
  arch/arm/lib/changebit.S:15: Error: architectural extension `mp' is
			not allowed for the current base architecture

This is due to use of the .arch_extension mp directive immediately prior
to an ALT_SMP(...) instruction. Whilst the ALT_SMP macro will expand to
nothing if !CONFIG_SMP, gas will still choke on the directive.

This patch fixes the issue by only emitting the sequence (including the
directive) if CONFIG_SMP=y.

Tested-by: Uwe Kleine-König &lt;u.kleine-koenig@pengutronix.de&gt;
Signed-off-by: Will Deacon &lt;will.deacon@arm.com&gt;
Signed-off-by: Russell King &lt;rmk+kernel@arm.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>Merge branch 'for-linus' of git://git.linaro.org/people/rmk/linux-arm</title>
<updated>2013-11-13T23:51:29Z</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-11-13T23:51:29Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=f47671e2d861a2093179cd64dda22016664b2015'/>
<id>urn:sha1:f47671e2d861a2093179cd64dda22016664b2015</id>
<content type='text'>
Pull ARM updates from Russell King:
 "Included in this series are:

   1. BE8 (modern big endian) changes for ARM from Ben Dooks
   2. big.Little support from Nicolas Pitre and Dave Martin
   3. support for LPAE systems with all system memory above 4GB
   4. Perf updates from Will Deacon
   5. Additional prefetching and other performance improvements from Will.
   6. Neon-optimised AES implementation fro Ard.
   7. A number of smaller fixes scattered around the place.

  There is a rather horrid merge conflict in tools/perf - I was never
  notified of the conflict because it originally occurred between Will's
  tree and other stuff.  Consequently I have a resolution which Will
  forwarded me, which I'll forward on immediately after sending this
  mail.

  The other notable thing is I'm expecting some build breakage in the
  crypto stuff on ARM only with Ard's AES patches.  These were merged
  into a stable git branch which others had already pulled, so there's
  little I can do about this.  The problem is caused because these
  patches have a dependency on some code in the crypto git tree - I
  tried requesting a branch I can pull to resolve these, and all I got
  each time from the crypto people was "we'll revert our patches then"
  which would only make things worse since I still don't have the
  dependent patches.  I've no idea what's going on there or how to
  resolve that, and since I can't split these patches from the rest of
  this pull request, I'm rather stuck with pushing this as-is or
  reverting Ard's patches.

  Since it should "come out in the wash" I've left them in - the only
  build problems they seem to cause at the moment are with randconfigs,
  and since it's a new feature anyway.  However, if by -rc1 the
  dependencies aren't in, I think it'd be best to revert Ard's patches"

I resolved the perf conflict roughly as per the patch sent by Russell,
but there may be some differences.  Any errors are likely mine.  Let's
see how the crypto issues work out..

* 'for-linus' of git://git.linaro.org/people/rmk/linux-arm: (110 commits)
  ARM: 7868/1: arm/arm64: remove atomic_clear_mask() in "include/asm/atomic.h"
  ARM: 7867/1: include: asm: use 'int' instead of 'unsigned long' for 'oldval' in atomic_cmpxchg().
  ARM: 7866/1: include: asm: use 'long long' instead of 'u64' within atomic.h
  ARM: 7871/1: amba: Extend number of IRQS
  ARM: 7887/1: Don't smp_cross_call() on UP devices in arch_irq_work_raise()
  ARM: 7872/1: Support arch_irq_work_raise() via self IPIs
  ARM: 7880/1: Clear the IT state independent of the Thumb-2 mode
  ARM: 7878/1: nommu: Implement dummy early_paging_init()
  ARM: 7876/1: clear Thumb-2 IT state on exception handling
  ARM: 7874/2: bL_switcher: Remove cpu_hotplug_driver_{lock,unlock}()
  ARM: footbridge: fix build warnings for netwinder
  ARM: 7873/1: vfp: clear vfp_current_hw_state for dying cpu
  ARM: fix misplaced arch_virt_to_idmap()
  ARM: 7848/1: mcpm: Implement cpu_kill() to synchronise on powerdown
  ARM: 7847/1: mcpm: Factor out logical-to-physical CPU translation
  ARM: 7869/1: remove unused XSCALE_PMU Kconfig param
  ARM: 7864/1: Handle 64-bit memory in case of 32-bit phys_addr_t
  ARM: 7863/1: Let arm_add_memory() always use 64-bit arguments
  ARM: 7862/1: pcpu: replace __get_cpu_var_uses
  ARM: 7861/1: cacheflush: consolidate single-CPU ARMv7 cache disabling code
  ...
</content>
</entry>
<entry>
<title>Merge branch 'devel-stable' into for-next</title>
<updated>2013-11-12T10:58:59Z</updated>
<author>
<name>Russell King</name>
<email>rmk+kernel@arm.linux.org.uk</email>
</author>
<published>2013-11-12T10:58:59Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=df762eccbadf87850fbee444d729e0f1b1e946f1'/>
<id>urn:sha1:df762eccbadf87850fbee444d729e0f1b1e946f1</id>
<content type='text'>
Conflicts:
	arch/arm/include/asm/atomic.h
	arch/arm/include/asm/hardirq.h
	arch/arm/kernel/smp.c
</content>
</entry>
<entry>
<title>ARM: 7858/1: mm: make UACCESS_WITH_MEMCPY huge page aware</title>
<updated>2013-10-29T11:06:15Z</updated>
<author>
<name>Steven Capper</name>
<email>steve.capper@linaro.org</email>
</author>
<published>2013-10-14T08:49:10Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=a3a9ea656d19251326cdeaaa0b5adbfac41ddacf'/>
<id>urn:sha1:a3a9ea656d19251326cdeaaa0b5adbfac41ddacf</id>
<content type='text'>
The memory pinning code in uaccess_with_memcpy.c does not check
for HugeTLB or THP pmds, and will enter an infinite loop should
a __copy_to_user or __clear_user occur against a huge page.

This patch adds detection code for huge pages to pin_page_for_write.
As this code can be executed in a fast path it refers to the actual
pmds rather than the vma. If a HugeTLB or THP is found (they have
the same pmd representation on ARM), the page table spinlock is
taken to prevent modification whilst the page is pinned.

On ARM, huge pages are only represented as pmds, thus no huge pud
checks are performed. (For huge puds one would lock the page table
in a similar manner as in the pmd case).

Two helper functions are introduced; pmd_thp_or_huge will check
whether or not a page is huge or transparent huge (which have the
same pmd layout on ARM), and pmd_hugewillfault will detect whether
or not a page fault will occur on write to the page.

Running the following test (with the chunking from read_zero
removed):
 $ dd if=/dev/zero of=/dev/null bs=10M count=1024
Gave:  2.3 GB/s backed by normal pages,
       2.9 GB/s backed by huge pages,
       5.1 GB/s backed by huge pages, with page mask=HPAGE_MASK.

After some discussion, it was decided not to adopt the HPAGE_MASK,
as this would have a significant detrimental effect on the overall
system latency due to page_table_lock being held for too long.
This could be revisited if split huge page locks are adopted.

Signed-off-by: Steve Capper &lt;steve.capper@linaro.org&gt;
Reviewed-by: Nicolas Pitre &lt;nico@linaro.org&gt;
Signed-off-by: Russell King &lt;rmk+kernel@arm.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>ARM: bitops: prefetch the destination word for write prior to strex</title>
<updated>2013-09-30T15:42:56Z</updated>
<author>
<name>Will Deacon</name>
<email>will.deacon@arm.com</email>
</author>
<published>2013-06-27T11:01:51Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=d779c07dd72098a7416d907494f958213b7726f3'/>
<id>urn:sha1:d779c07dd72098a7416d907494f958213b7726f3</id>
<content type='text'>
The cost of changing a cacheline from shared to exclusive state can be
significant, especially when this is triggered by an exclusive store,
since it may result in having to retry the transaction.

This patch prefixes our atomic bitops implementation with prefetchw,
to try and grab the line in exclusive state from the start. The testop
macro is left alone, since the barrier semantics limit the usefulness
of prefetching data.

Acked-by: Nicolas Pitre &lt;nico@linaro.org&gt;
Signed-off-by: Will Deacon &lt;will.deacon@arm.com&gt;
</content>
</entry>
<entry>
<title>ARM: delete mach-shark</title>
<updated>2013-09-17T10:34:36Z</updated>
<author>
<name>Linus Walleij</name>
<email>linus.walleij@linaro.org</email>
</author>
<published>2013-09-03T09:32:24Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=136dfa5edae3207422a8b93347eb79e92e07cdfa'/>
<id>urn:sha1:136dfa5edae3207422a8b93347eb79e92e07cdfa</id>
<content type='text'>
The Shark machine sub-architecture (also known as DNARD, the
DIGITAL Network Appliance Reference Design) lacks a maintainer
able to apply and test patches to modernize the architecture.

It is suspected that the current kernel, while it compiles,
does not even boot on this machine. The listed maintainer has
expressed that he will not be able to spend any time on the
maintenance for the coming year.

So let's delete it from the kernel for now. It can always be
resurrected with git revert if maintenance is resumed.

As the VIA82c505 PCI adapter was only used by this
architecture, that gets deleted too.

Cc: arm@kernel.org
Cc: Alexander Schulz &lt;alex@shark-linux.de&gt;
Signed-off-by: Linus Walleij &lt;linus.walleij@linaro.org&gt;
</content>
</entry>
<entry>
<title>ARM: 7835/2: fix modular build of xor_blocks() with NEON enabled</title>
<updated>2013-09-09T14:24:47Z</updated>
<author>
<name>Ard Biesheuvel</name>
<email>ard.biesheuvel@linaro.org</email>
</author>
<published>2013-09-09T14:08:38Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=9319206d712a5462b93870c38144a7c9a452d392'/>
<id>urn:sha1:9319206d712a5462b93870c38144a7c9a452d392</id>
<content type='text'>
Commit 0195659 introduced a NEON accelerated version of the xor_blocks()
function, but it needs the changes in this patch to allow it to be built
as a module rather than statically into the kernel.

This patch creates a separate module xor-neon.ko which exports the NEON
inner xor_blocks() functions depended upon by the regular xor.ko if it
is built with CONFIG_KERNEL_MODE_NEON=y

Reported-by: Josh Boyer &lt;jwboyer@fedoraproject.org&gt;
Signed-off-by: Ard Biesheuvel &lt;ard.biesheuvel@linaro.org&gt;
Signed-off-by: Russell King &lt;rmk+kernel@arm.linux.org.uk&gt;
</content>
</entry>
<entry>
<title>Pull branch 'for-rmk' of git://git.linaro.org/people/ardbiesheuvel/linux-arm into devel-stable</title>
<updated>2013-07-22T16:46:40Z</updated>
<author>
<name>Russell King</name>
<email>rmk+kernel@arm.linux.org.uk</email>
</author>
<published>2013-07-22T16:26:27Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=b4f656eea63376da79b0b5a17660c4ce14b71b74'/>
<id>urn:sha1:b4f656eea63376da79b0b5a17660c4ce14b71b74</id>
<content type='text'>
Comments from Ard Biesheuvel:

I have included two use cases that I have been using, XOR and RAID-6
checksumming. The former gets a 60% performance boost on the NEON, the
latter over 400%.

ARM: add support for kernel mode NEON

Adds kernel_neon_begin/end (renamed from kernel_vfp_begin/end in the
previous version to de-emphasize the VFP part as VFP code that needs
software assistance is not supported currently.)

Introduces &lt;asm/neon.h&gt; and the Kconfig symbol KERNEL_MODE_NEON. This
has been aligned with Catalin for arm64, so any NEON code that does
not use assembly but intrinsics or the GCC vectorizer (such as my
examples) can potentially be shared between arm and arm64 archs.

ARM: move VFP init to an earlier boot stage

This is needed so the NEON is enabled when the XOR and RAID-6 algo
boot time benchmarks are run.

ARM: be strict about FP exceptions in kernel mode

This adds a check to vfp_support_entry() to flag unsupported uses of
the NEON/VFP in kernel mode. FP exceptions (bounces) are flagged as
a bug, this is because of their potentially intermittent nature.
Exceptions caused by the fact that kernel_neon_begin has not been
called are just routed through the undef handler.

ARM: crypto: add NEON accelerated XOR implementation

This is the xor_blocks() implementation built with -ftree-vectorize,
60% faster than optimized ARM code. It calls in_interrupt() to check
whether the NEON flavor can be used: this should really not be
necessary, but due to xor_blocks'squite generic nature, there is no
telling how exactly people may be using it in the real world.

lib/raid6: add ARM-NEON accelerated syndrome calculation

This is a port of the RAID-6 checksumming code in altivec.uc ported
to use NEON intrinsics. It is about 4x faster than the sequential
code.
</content>
</entry>
<entry>
<title>arm: delete __cpuinit/__CPUINIT usage from all ARM users</title>
<updated>2013-07-14T23:36:52Z</updated>
<author>
<name>Paul Gortmaker</name>
<email>paul.gortmaker@windriver.com</email>
</author>
<published>2013-06-17T19:43:14Z</published>
<link rel='alternate' type='text/html' href='https://git.amat.us/linux/commit/?id=8bd26e3a7e49af2697449bbcb7187a39dc85d672'/>
<id>urn:sha1:8bd26e3a7e49af2697449bbcb7187a39dc85d672</id>
<content type='text'>
The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications.  For example, the fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.

After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out.  Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.

Note that some harmless section mismatch warnings may result, since
notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
and are flagged as __cpuinit  -- so if we remove the __cpuinit from
the arch specific callers, we will also get section mismatch warnings.
As an intermediate step, we intend to turn the linux/init.h cpuinit
related content into no-ops as early as possible, since that will get
rid of these warnings.  In any case, they are temporary and harmless.

This removes all the ARM uses of the __cpuinit macros from C code,
and all __CPUINIT from assembly code.  It also had two ".previous"
section statements that were paired off against __CPUINIT
(aka .section ".cpuinit.text") that also get removed here.

[1] https://lkml.org/lkml/2013/5/20/589

Cc: Russell King &lt;linux@arm.linux.org.uk&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: linux-arm-kernel@lists.infradead.org
Signed-off-by: Paul Gortmaker &lt;paul.gortmaker@windriver.com&gt;
</content>
</entry>
</feed>
