Does Rybka properly display nodes evaluated and nodes-per-second on its console display?
The following article will be of interest to anyone who truly wants know whether or not Rybka is counting
nodes correctly, and is properly displaying nodes-per-second on its console display.
Evidence is offered here that Rybka is artificially inflating the node count by a factor of 8. However,
you have to be willing to plow through some reverse-compiled source code in order to see this. It will be difficult to follow
this explanation so I have tried to simplify it as much as possible. You will most likely need to be an intermediate or advanced
computer programmer to follow this analysis.
Here we examine a small section of reverse-compiled 32-bit Rybka 1.2f code just before a print statement
where information is displayed on the Rybka console.
The software used to generate this code was W32DASM version 8.93. The code is presented (with annotations)
at the end of this article but first let's explain in English what is happening.
High Level Explanation
Vasik is retrieving some internal variables, doing some calculations, and displaying some information on
the console. For users not familiar with assembly language, values are usually passed to a function by 'pushing' them or sending
them to a temporary storage area called a 'stack'. They are then 'popped' off the stack by the called function. In the source
code we are examining, variables are retrieved from memory and are modified in registers present in the CPU. Finally, a function
is called to print some text and the values to the screen. Most of these parameters are composed of 2 32-bit values, which
I have called '1' and '2' in my annotation because they are two pieces that together make up a 64-bit value.
Variables:
There are no variable names in reverse-compiled code so to make things easier I have named a few of them:
rawnodes - the true number of positions Rybka has evaluated.
nodes - rawnodes multipled by 8, for an
unexplained reason.
tickcount - time in milliseconds since our machine was started.
deltatime - time in milliseconds
that Rybka has been running.
In English here is what is happening:
1. In order to calculate nodes per second we need to know the present node count as well as the elapsed
time that Rybka has been evaluating the position. We begin by calling a function which returns the elapsed time in milliseconds
*since the machine was booted* using the system call GetTickCount().
2. From this number we subtract the time that Rybka first began looking at the position in question to get
*the elapsed time that Rybka has been evaluating positions in milliseconds*. This is the "seconds" required for the "nodes
per second" calculation.
3. We need to calculate the raw node count which is composed of 2 values: 'kNode' and 'remainder'.
Note: For some reason Vasik uses 2 variables to count the nodes that Rybka has evaluated. It appears that
one variable represents kNodes evalauted (Nodes divided by 1024), while the other represents a fractional 'remainder'
of kNodes.
Vasik begins counting nodes by setting the kNode Value to 1 and the 'remainder' value to 1024. He counts
this 'remainder' down to 0 in units of 1 for each new position evaluated, then increments the kN count when the remainder
gets to 0.
Example:
Position 0: kNode=1, remainder=1024
Position 1: kNode=1, remainder=1023
Position 2: kNode=1, remainder=1022
...
Position
1022: kNode=1, remainder=1
Position 1023: kNode=1, remainder=0
Position 1024: kNode=2, remainder=1024
So to get true nodes, we multiply kNode by 1024 and subtract the 'remainder'.
For a node count of 50, kNode=1 and remainder=974. Raw node count=1*1024-974=50.
4. Vasik now multiplies the raw node count by 8, in my opinion for no reason. This new value we will
call "nodes" because he will display this on the screen as "nodes" as well as use it to calculate "nodes per second".
5. To calculate nodes per second Vasik takes the "nodes" and multiplies it by 1024 so that he can divide
it by the elapsed time in milliseconds that Rybka has been evaluating position to get an approximate "nodes" per second. In
reality, this is accomplished in a somewhat clumsy way by shifting the data in two 32-bit words to the left 10 places.
6. The ply count is retrieved from an internal value. This internal value is divided by 2 and has 3 subtracted
before it is printed. So ply "2" is represented internally as (2+3)*2 or 10. Ply "3" is represented as (3+3)*2 or 12. This
Hints that each ply is a two-step process and that there is a 3-ply "something" before we even get to ply-0.
This may all be confusing, but consider the source code below and the work required to convert what is going
on to English. You may prefer to stick to what you just read to browsing the actual code below.
Annotated Source Code
* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:004B1D32(C)
|
:004B1DAF
cmp dword ptr [esp+00000504], 0000000A // Here is where Vasik skips the print statement if the ply count is less than 10/2-3
or 2. You can modify Rybka to print information at all levels by changing the 'A' here to a 0.
:004B1DB7 jl 004B1E4D
:004B1DBD
cmp byte ptr [00877098], 00
:004B1DC4 jne 004B1E4D
* Reference To: KERNEL32.GetTickCount, Ord:01DFh
|
:004B1DCA Call dword ptr [004C4018] // GetTickCount (current system time in milliseconds), result in EAX
:004B1DD0
mov ecx, dword ptr [00877104] // kNode 1
:004B1DD6 mov edx, dword ptr [00877100] // kNode 2
:004B1DDC mov esi, eax //
tick count moved from eax to esi register
:004B1DDE xor edi, edi // zeros edi register
:004B1DE0 push edi // 0 sent
to stack
:004B1DE1 push 00000400 // 1024 sent to stack
:004B1DE6 mov eax, 00000001 // 1 sent to eax
:004B1DEB sub
eax, dword ptr [008770F8] // 1-oldtickcount sent to eax
:004B1DF1 push ecx // kNode 1 pushed to stack
:004B1DF2 push
edx // kNode 2 pushed to stack
:004B1DF3 add esi, eax // esi=tickcount+1-oldtickcount = deltatime in milliseconds
:004B1DF5
call 004C0230 // multiply values on stack -> kNode*1024, result to edx - eax register pair
:004B1DFA mov ecx, eax //
here we move the edx-eax data so we can use the eax register
:004B1DFC mov eax, dword ptr [008770F4] // node remainder
sent to eax
:004B1E01 mov ebx, edx
:004B1E03 cdq // sign extend EAX to create 64-bit value
in EDX-EAX
:004B1E04 push edi // 0 pushed on to stack
:004B1E05 sub ecx, eax // first half of kNode*1024-remainder =
rawnodes to ecx
:004B1E07 push 00000008 // 8 pushed on to stack
:004B1E09 sbb ebx, edx // second half of 64-bit subtraction,
rawnodes in ebx - ecx
:004B1E0B push ebx // rawnodes 1 sent to stack
:004B1E0C push ecx // rawnodes 2 sent to stack
:004B1E0D
call 004C0230 // multiply values on stack -> rawnodes * 8 = nodes
:004B1E12 mov ebp, edx // nodes 1
:004B1E14 mov
ebx, eax // nodes 2
:004B1E16 mov edx, ebx // swap node 1 and node 2 (high, low)
:004B1E18 mov eax, ebp
:004B1E1A
shld eax, edx, 0A // highest 10 bits of edx (low) copied and shifted into eax (high),
which itself is shifted. This is just a long-winded way to slide a 64-bit value to the left 10 bits.
:004B1E1E push edi
// time 1=0 sent to stack
:004B1E1F push esi // time 2=deltatime in milliseconds sent to stack
:004B1E20 shl edx, 0A
// shift left 10 bits (multiply by 1024), overflow handled above
:004B1E23 push eax // high bits sent to stack
:004B1E24
push edx // low bits - we have simply multiplied "nodes" by 1024 and put on stack
:004B1E25 call 004C0B20 // divide rawnodes*8*1024
by time*1000 to get nodes per second or nps
:004B1E2A push edx // nps
1 pushed to stack for print routine
:004B1E2B push eax // nps 2 pushed to stack for print routine
:004B1E2C mov eax,
dword ptr [esp+0000050C] // move into eax internal ply count
:004B1E33 push ebp // nodes 1 pushed to stack for print routine
:004B1E34
cdq // sign extend eax to create 64-bit value in edx - eax.
:004B1E35 push ebx // nodes 2 pushed to stack for print routine
:004B1E36
sub eax, edx // eax - edx to eax. edx is 0 so this instruction just clears flags
:004B1E38 push edi // time 1 in milliseconds
pushed to stack for print routine
:004B1E39 sar eax, 1 // divide internal ply count by 2 by shifting right 1 bit position
:004B1E3B
push esi // time 2 in milliseconds pushed to stack for print routine
:004B1E3C sub eax, 00000003 // after dividing by 2
we now subtract 3 to get displayed ply value
:004B1E3F push eax // ply
number pushed to stack for print routine
:004B1E40 push 0082F19C // address of text string pushed to stack, for print routine
:004B1E45
call 004B4B00 // print information string on screen using text, values on stack
:004B1E4A add esp, 00000020
* Referenced by a (U)nconditional or (C)onditional Jump at Addresses:
|:004B1DB7(C),
:004B1DC4(C)
|
:004B1E4D mov eax, dword ptr [esp+10]
:004B1E51 pop edi
:004B1E52 pop esi
:004B1E53 pop ebp
:004B1E54
pop ebx
:004B1E55 add esp, 000004F0
:004B1E5B ret
Conclusion:
Vasik appears to artificially inflate the Rybka 1.2f node count by a factor of 8. What do you think?