SHA256 logic circuit / bitcoin mining
Source: http://www.derkeiler.com/Newsgroups/sci.crypt/201201/msg00199.html
From: Alex Mizrahi <alex.mizrahi@xxxxxxxxx>·
Date: Wed, 18 Jan 2012 22:51:41 +0200·
hi
Some time ago I've posted here about my SHA256 speedup attempt via boolean functions (logic circuits).
Here's a more rigorous viability analysis (crosspost from comp.theory):

SHA256 has
7*64 + 48*3 + 8 = 600 add operations
6*64 + 4*48 = 576 bitwise rotations
2*48 = 96 bitwise shifts
12*64 + 4*48 = 960 bitwise operations
(number might be slightly different for different implementation)
Together that would be 2232 operations, only 600 of which are additions. But additions make the meat of
cryptographic diffusion and require many logic gates to implement.
On a CPU which requires one cycle per each operation it takes 2232*32 = 71424 cycles to compute 32
SHA256 outputs.
Logic gate implementation doesn't need to implement rotations and shifts as they just reorder inputs.
Full 32bit adder requires 160 bit ops,
So what's left is 600 * 160 + 960 * 32 = 126720.
(It can be further reduced a bit through constant propagation of zeroes from shifts.)
So we have 71424 against 126720. That's worse, but not five times worse, not even twice worse.
Furthermore, some CPU architectures do not implement rightrotate natively and have to implement it though
three operations (two shifts and OR). They would require 3384*32 = 108288 operations, which is pretty close.

(This would be NVIDIA GPU. One with native ROTR is AMD GPU (which also implements MUX
instruction which can shave cycle count even further).
So number of operations is close, but logic circuit implementation allows additional optimizations. Even 20%
speedup would make it competitive on NVIDIA GPU and ~50% speedup  on AMD GPU.
What optimizations are possible?
If we're doing preimage attack or bitcoin mining, we can fix values of some variables and vary only few of
SHA256 logic circuit / bitcoin mining
SHA256 logic circuit / bitcoin mining 1
them, say, 32 (size of nonce to pick in bitcoin mining). Fixing 736 variables out of 768 enables one to do
'constant propagation' kind of optimization.
This applies both to logic circuit implementation and to a traditional one, but logic circuit one makes more use
of it, as it can optimize parts of XOR or ADD with a constant, while traditional implementation would just do
full operation.
Then, it's possible to use some symmetries to reduce number of computations required, say, via binary
decision diagrams. But that works only for the first few rounds, after that it's fully diffused and BDDs will be
basically equivalent to full truth tables. You can't compress a random signal, you know. So savings aren't
huge. (Although I don't yet know the point at which symmetry is totally lost.)
I played with these and other ideas for a while, and it looks like there's simply no way to compute SHA256
significantly faster, i.e. optimizations are minuscule.
There is however one thing which bugged me... Let's say we're looking for input such that output is all ones (it
is easy to convert any preimage attack to this form), so we need to verify that
O1 and O2 and O3 and .. and O32 = TRUE
Now imagine that outputs O1, O2 .. O32 are represented in form of BDD or a disjunctive normal form (DNF).
As SHA256 is supposed to be indistinguishable from random, DNF has about 2^31 member, and BDD is
huge. Or if you look at truth table, it is 50% ones and 50% zeros randomly mixed.
Now if you consider a conjucnction (O1 and O2) it has only 25% of ones, and thus smaller DNF (and BDD).
Likewise, (O1 and O2 and O3) has even less ones, and so on, until we are left just with handful of ones  or
none at all  which are actually a solution to our problem.
So there is a great simplification at the last step, can we drive it down so whole computation gets simpler?
If we are going to compute it precisely  I don't think so.
But we do not need to be exact here. Just like in number theory there are probabilistic primality tests, we only
need to find such inputs where output is POTENTIALLY true with probability higher than 50% (higher =
better), and then do full check on them.
Alternatively, we can rule out inputs which are definitely false and do a full check on the rest. If first part can
be done cheaply, we'll have a speedup compared to baseline.
One way to do this is to find some approximate truth table where zero is actually zero, and one is "maybe one,
maybe zero", or "unknown", in other words.
When logic circuit is converted into a form where it has only AND and OR gates it is monotonic, i.e. flipping
one input to 1 might only flip ouput zeros to ones, but never ones to zeros.
This means we can inject extra ones into truth function at any step (say, to make it more symmetric) and that
would only inject some fake ones input output, but we can still trust zeros.
My first attempt was to inject extra one directly into input variable, i.e. assume both X1 and NOT(X1) are true
at once. But it doesn't work very well  output is all ones. This happens because this lossy representation
injects 1/8 more ones for each AND operation, and so it quickly turns to all ones.
SHA256 logic circuit / bitcoin mining
SHA256 logic circuit / bitcoin mining 2
Next try was to replace one of logic gates with 1 without actually computing it. Then output of a whole
function is correct on input combinations where that gate's output was indeed one, i.e. in 1/2 or 1/4 of all
combinations. Otherwise it might be true or false depending on luck.
So that's already some speedup, as there is no need to compute output of that gate (although a very small one),
and if that fudge doesn't affect output of all gates we still get lots of zeros.
When we look at it from truthtable point of view, essentially we replace truth table of one gate with all ones
leaving all other intact.
This is probably not the best way to do it. What if we inject few ones at different nodes as needed, i.e. to make
BDD more symmetric? I.e. if we have a fork node in BDD
0101 0100
We can replace it with eq node 0101 = 0101 OR 0100. And potentially we need to do less operations when we
have more eq nodes.
But this shit is fairly complex and I doubt it will work, so I'm considering other options, say, working with
DNFs.
In DNF, conjunction with fewer variables produces more ones, so maybe it makes sense to start with those to
get at least some ones cheaply?
.
SHA256 logic circuit / bitcoin mining
SHA256 logic circuit / bitcoin mining 3
Enter the password to open this PDF file:
File name:

File size:

Title:

Author:

Subject:

Keywords:

Creation Date:

Modification Date:

Creator:

PDF Producer:

PDF Version:

Page Count:

Preparing document for printing…
0%
Comments 0
Log in to post a comment