|
This fixes a bug on my system where mfc_get was failing on
work_restart_pointer because it was not 128-byte aligned.
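A minimal sketch of how a DMA source can be forced onto a 128-byte boundary with a GCC attribute. The names `restart_block`, `shared_restart`, `is_dma_aligned` and `dma_source` are hypothetical and are not taken from the miner's source; only the 128-byte figure comes from the message above.

```c
#include <assert.h>
#include <stdint.h>

#define DMA_ALIGN 128  /* alignment named in the commit message */

/* Hypothetical container for the restart flag; the GCC attribute pads
 * and aligns the whole struct to DMA_ALIGN bytes. */
struct restart_block {
    volatile uint32_t restart;  /* flag polled by the worker */
} __attribute__((aligned(DMA_ALIGN)));

static struct restart_block shared_restart;

/* Returns nonzero when p sits on a DMA_ALIGN-byte boundary. */
static int is_dma_aligned(const void *p)
{
    return ((uintptr_t)p % DMA_ALIGN) == 0;
}

/* Returns the address that would be handed to the DMA transfer,
 * after checking the alignment assumption. */
static const void *dma_source(void)
{
    assert(is_dma_aligned(&shared_restart));
    return &shared_restart;
}
```

Checking the address on the PPU side before it is passed to the SPU makes a misaligned pointer fail early instead of inside the transfer.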
|
|
Only the first loop is converted. Performance increases
to ~5.97 khash/sec per SPE core.
TODO: try to update it to use the better data layout and more
unrolling from shakti's variant.
|
|
Prevents the miner from eventually abnormally terminating
due to curl misbehaviour.
|
|
Currently the cast between uint32_t and uint64_t pointers
breaks strict aliasing rules and needs the -fno-strict-aliasing
option as a workaround; otherwise the code gets miscompiled.
But -fno-strict-aliasing can seriously inhibit optimization
possibilities. For example, performance of 1 thread
on Cell PPU (using Altivec instructions):
CFLAGS="-O3 -mcpu=cell -fno-strict-aliasing" - 1.79 khash/sec
CFLAGS="-O3 -mcpu=cell -fstrict-aliasing" - 2.60 khash/sec
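One common way to avoid such a cast, sketched here under assumptions, is to move the data through `memcpy`, which compilers typically reduce to plain loads and stores. The function `xor_words` is a hypothetical illustration and not the code changed by this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* XORs `words` 32-bit words of src into dst, two words at a time.
 * The 64-bit temporaries are filled with memcpy instead of casting
 * uint32_t pointers to uint64_t pointers, so no aliasing rule is broken. */
static void xor_words(uint32_t *dst, const uint32_t *src, size_t words)
{
    size_t i = 0;

    assert(dst != NULL && src != NULL);
    for (; i + 2 <= words; i += 2) {
        uint64_t a, b;
        memcpy(&a, dst + i, sizeof a);
        memcpy(&b, src + i, sizeof b);
        a ^= b;
        memcpy(dst + i, &a, sizeof a);
    }
    for (; i < words; i++)  /* odd trailing word, if any */
        dst[i] ^= src[i];
}
```

Because each value is copied in and out with the same byte layout, the result does not depend on the host's endianness.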
|
|
Because the data endianness has changed (native instead of
little endian) and the SHA256 function arguments are now
different, this required lots of changes all over the place.
Improves Altivec performance on Cell PPU from ~3.4 khash/s
to ~3.6 khash/s (two threads). Seems to have no effect on
SPU performance though.
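To illustrate the difference between little-endian and native data, here is a small sketch of the two conversions that are usually involved. The helpers `load_le32` and `swap32` are hypothetical names and are not the functions touched by this commit.

```c
#include <assert.h>
#include <stdint.h>

/* Reads a 32-bit little-endian value from a byte buffer and returns it
 * in native order; works the same on big- and little-endian hosts. */
static uint32_t load_le32(const unsigned char *p)
{
    assert(p != 0);
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}

/* Reverses the byte order of a 32-bit word. */
static uint32_t swap32(uint32_t x)
{
    return (x >> 24) |
           ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) |
           (x << 24);
}
```

Keeping the working data in native order means conversions like these are needed only at the edges, where data enters or leaves the hashing code.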
|
|
This variable is set by SPU code, but valgrind can't see it and
complains.
|
|
Linux on PS3 gets a huge boost in litecoin mining performance.
Cell/BE support should be detected and enabled automatically
by autotools.
The miner threads are first allocated to the available SPU
cores (typically 6). The remaining threads are allocated on
PPU. There will be 8 threads total on PS3: 6 SPU threads
and 2 PPU threads.
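The allocation order described above can be sketched as a simple mapping from thread index to core type. The enum and the function `core_for_thread` are hypothetical; the real miner may make this decision differently.

```c
#include <assert.h>

enum core_kind { CORE_SPU, CORE_PPU };

/* Assigns the first num_spus thread indices to SPU cores and every
 * later index to the PPU. */
static enum core_kind core_for_thread(int thr_id, int num_spus)
{
    assert(thr_id >= 0 && num_spus >= 0);
    return thr_id < num_spus ? CORE_SPU : CORE_PPU;
}
```

With 6 available SPUs and 8 threads, indices 0 to 5 land on SPUs and indices 6 and 7 on the PPU.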
Each SPU core provides ~5.4 khash/s if compiled with spu-elf-gcc 4.6.
Performance may vary between gcc versions; older ones are
typically slower.
|
|
The code can be compiled for different architectures from the same
source starting with gcc 4.7. But SSE2/Altivec/SPU targets have
compatibility wrappers, which also allow the use of older versions
of gcc.
Two hashes are processed at the same time, so a scratch buffer
twice as large is needed (~256K vs. ~128K).
Speedup on Cell PPU (32-bit), single thread, 3.2GHz:
~0.58 khash/s -> ~1.79 khash/sec
|
|
Now it seems to work correctly and provide performance ~0.58 khash/s
per thread on Cell PPU.
|
|
The mangled scrypt.c from Art Forz is too badly broken on big endian
systems. Revert it to something that is more maintainable.
|
|
3.73kH/s/core on a 3.6GHz PhenomII compiled with gcc 4.6.1 and CFLAGS="-march=amdfam10 -O3"
|
|
3.62kH/s/core on a 3.6GHz PhenomII compiled with gcc 4.6.1 and CFLAGS="-march=amdfam10 -O3"
|
|
amd64 linux speedup from 2.02 to 2.67 kH/s with default options, from 2.59 to 3.24kH/s with -O3
|
|
policies defined.
|
|
Fix include path of libcurl headers
|
|
Here's my x86_64 and linux optimisations. Hopefully shouldn't break other OSs now.
|
|
Add likely() macro.
Optimise a few obvious code paths with likely/unlikely.
Change algo to sse2_amd64 by default.
Move priority change to worker threads only.
Detect number of CPUs and set default number of threads == CPUs.
Add scheduling policy change to worker threads: try SCHED_IDLE first and fall back to SCHED_BATCH on Linux.
Don't error when failing to set priority.
Add CPU affinity and bind worker threads to CPUs when number of threads is a multiple of number of CPUs.
Update NEWS with changes.
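A rough sketch of three items from the list above: the likely()/unlikely() macros, the default thread count, and the affinity rule. The macros use GCC's `__builtin_expect`. The helpers `default_thread_count` and `cpu_for_thread`, and the modulo mapping of threads to CPUs, are assumptions for illustration and are not taken from the commit.

```c
#include <assert.h>
#include <unistd.h>

/* Branch prediction hints built on GCC's __builtin_expect. */
#define likely(expr)   (__builtin_expect(!!(expr), 1))
#define unlikely(expr) (__builtin_expect(!!(expr), 0))

/* Returns the number of online CPUs, or 1 if it cannot be determined.
 * _SC_NPROCESSORS_ONLN is a common extension (Linux, OS X), not core POSIX. */
static int default_thread_count(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return unlikely(n < 1) ? 1 : (int)n;
}

/* Returns the CPU a worker would be bound to, or -1 when the thread
 * count is not a multiple of the CPU count and no binding is done.
 * The modulo mapping is an assumption. */
static int cpu_for_thread(int thr_id, int num_threads, int num_cpus)
{
    assert(thr_id >= 0 && num_cpus > 0);
    if (num_threads % num_cpus != 0)
        return -1;
    return thr_id % num_cpus;
}
```

Returning -1 instead of an error matches the spirit of not failing when priority or placement cannot be set.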
|
|
Fix the include path for libcurl, if it was installed in a location
where gcc does not look by default. The variable is declared in
the LIBCURL_CHECK_CONFIG m4 macro.
|
|
OSX CPU Support
|
|
Derived from xorg source
http://cgit.freedesktop.org/xorg/xserver/tree/GL/glx/glxbyteorder.h?id=cdf6b15f039c4905d8d54152153b0a3ecd7aba55;id2=415e49b940bba2d08870db410ebb47d2add5d836
|
|
Use target instead of host.
Fix compilation for non win32 and non x86_64 platforms.
|
|
Also, some newline fixes (applog callers do not need newlines in strings)
|
|
Spotted by lfm
|
|
Also, remove a few superfluous printouts.
|
|
In miner.h, this fixes an alloca-definition-related warning.
For the other files, this is simply future-proofing/precaution.
|
|
Also, improve portability of alloca.
|