Does Rybka Properly Count Positions Evaluated?

Here we examine Rybka's assembly language instructions in detail

Does Rybka properly display nodes evaluated and nodes-per-second on its console display?

The following article will be of interest to anyone who truly wants know whether or not Rybka is counting nodes correctly, and is properly displaying nodes-per-second on its console display.

Evidence is offered here that Rybka is artificially inflating the node count by a factor of 8. However, you have to be willing to plow through some reverse-compiled source code in order to see this. It will be difficult to follow this explanation so I have tried to simplify it as much as possible. You will most likely need to be an intermediate or advanced computer programmer to follow this analysis.

Here we examine a small section of reverse-compiled 32-bit Rybka 1.2f code just before a print statement where information is displayed on the Rybka console.

The software used to generate this code was W32DASM version 8.93. The code is presented (with annotations) at the end of this article but first let's explain in English what is happening.

High Level Explanation

Vasik is retrieving some internal variables, doing some calculations, and displaying some information on the console. For users not familiar with assembly language, values are usually passed to a function by 'pushing' them or sending them to a temporary storage area called a 'stack'. They are then 'popped' off the stack by the called function. In the source code we are examining, variables are retrieved from memory and are modified in registers present in the CPU. Finally, a function is called to print some text and the values to the screen. Most of these parameters are composed of 2 32-bit values, which I have called '1' and '2' in my annotation because they are two pieces that together make up a 64-bit value.

Variables:

There are no variable names in reverse-compiled code so to make things easier I have named a few of them:

rawnodes - the true number of positions Rybka has evaluated.
nodes - rawnodes multipled by 8, for an unexplained reason.
tickcount - time in milliseconds since our machine was started.
deltatime - time in milliseconds that Rybka has been running.

In English here is what is happening:

1. In order to calculate nodes per second we need to know the present node count as well as the elapsed time that Rybka has been evaluating the position. We begin by calling a function which returns the elapsed time in milliseconds *since the machine was booted* using the system call GetTickCount().

2. From this number we subtract the time that Rybka first began looking at the position in question to get *the elapsed time that Rybka has been evaluating positions in milliseconds*. This is the "seconds" required for the "nodes per second" calculation.

3. We need to calculate the raw node count which is composed of 2 values: 'kNode' and 'remainder'.

Note: For some reason Vasik uses 2 variables to count the nodes that Rybka has evaluated. It appears that one variable represents kNodes evalauted (Nodes divided by 1024), while the other represents a fractional 'remainder' of kNodes.

Vasik begins counting nodes by setting the kNode Value to 1 and the 'remainder' value to 1024. He counts this 'remainder' down to 0 in units of 1 for each new position evaluated, then increments the kN count when the remainder gets to 0.

Example:

Position 0: kNode=1, remainder=1024
Position 1: kNode=1, remainder=1023
Position 2: kNode=1, remainder=1022
...
Position 1022: kNode=1, remainder=1
Position 1023: kNode=1, remainder=0
Position 1024: kNode=2, remainder=1024

So to get true nodes, we multiply kNode by 1024 and subtract the 'remainder'.

For a node count of 50, kNode=1 and remainder=974. Raw node count=1*1024-974=50.

4. Vasik now multiplies the raw node count by 8, in my opinion for no reason. This new value we will call "nodes" because he will display this on the screen as "nodes" as well as use it to calculate "nodes per second".

5. To calculate nodes per second Vasik takes the "nodes" and multiplies it by 1024 so that he can divide it by the elapsed time in milliseconds that Rybka has been evaluating position to get an approximate "nodes" per second. In reality, this is accomplished in a somewhat clumsy way by shifting the data in two 32-bit words to the left 10 places.

6. The ply count is retrieved from an internal value. This internal value is divided by 2 and has 3 subtracted before it is printed. So ply "2" is represented internally as (2+3)*2 or 10. Ply "3" is represented as (3+3)*2 or 12. This Hints that each ply is a two-step process and that there is a 3-ply "something" before we even get to ply-0.

This may all be confusing, but consider the source code below and the work required to convert what is going on to English. You may prefer to stick to what you just read to browsing the actual code below.

Annotated Source Code

* Referenced by a (U)nconditional or (C)onditional Jump at Address:
|:004B1D32(C)
|
:004B1DAF cmp dword ptr [esp+00000504], 0000000A // Here is where Vasik skips the print statement if the ply count is less than 10/2-3 or 2. You can modify Rybka to print information at all levels by changing the 'A' here to a 0.
:004B1DB7 jl 004B1E4D
:004B1DBD cmp byte ptr [00877098], 00
:004B1DC4 jne 004B1E4D

* Reference To: KERNEL32.GetTickCount, Ord:01DFh
|
:004B1DCA Call dword ptr [004C4018] // GetTickCount (current system time in milliseconds), result in EAX
:004B1DD0 mov ecx, dword ptr [00877104] // kNode 1
:004B1DD6 mov edx, dword ptr [00877100] // kNode 2
:004B1DDC mov esi, eax // tick count moved from eax to esi register
:004B1DDE xor edi, edi // zeros edi register
:004B1DE0 push edi // 0 sent to stack
:004B1DE1 push 00000400 // 1024 sent to stack
:004B1DE6 mov eax, 00000001 // 1 sent to eax
:004B1DEB sub eax, dword ptr [008770F8] // 1-oldtickcount sent to eax
:004B1DF1 push ecx // kNode 1 pushed to stack
:004B1DF2 push edx // kNode 2 pushed to stack
:004B1DF3 add esi, eax // esi=tickcount+1-oldtickcount = deltatime in milliseconds
:004B1DF5 call 004C0230 // multiply values on stack -> kNode*1024, result to edx - eax register pair
:004B1DFA mov ecx, eax // here we move the edx-eax data so we can use the eax register
:004B1DFC mov eax, dword ptr [008770F4] // node remainder sent to eax
:004B1E01 mov ebx, edx
:004B1E03 cdq // sign extend EAX to create 64-bit value in EDX-EAX
:004B1E04 push edi // 0 pushed on to stack
:004B1E05 sub ecx, eax // first half of kNode*1024-remainder = rawnodes to ecx
:004B1E07 push 00000008 // 8 pushed on to stack
:004B1E09 sbb ebx, edx // second half of 64-bit subtraction, rawnodes in ebx - ecx
:004B1E0B push ebx // rawnodes 1 sent to stack
:004B1E0C push ecx // rawnodes 2 sent to stack
:004B1E0D call 004C0230 // multiply values on stack -> rawnodes * 8 = nodes
:004B1E12 mov ebp, edx // nodes 1
:004B1E14 mov ebx, eax // nodes 2
:004B1E16 mov edx, ebx // swap node 1 and node 2 (high, low)
:004B1E18 mov eax, ebp
:004B1E1A shld eax, edx, 0A // highest 10 bits of edx (low) copied and shifted into eax (high), which itself is shifted. This is just a long-winded way to slide a 64-bit value to the left 10 bits.
:004B1E1E push edi // time 1=0 sent to stack
:004B1E1F push esi // time 2=deltatime in milliseconds sent to stack
:004B1E20 shl edx, 0A // shift left 10 bits (multiply by 1024), overflow handled above
:004B1E23 push eax // high bits sent to stack
:004B1E24 push edx // low bits - we have simply multiplied "nodes" by 1024 and put on stack
:004B1E25 call 004C0B20 // divide rawnodes*8*1024 by time*1000 to get nodes per second or nps
:004B1E2A push edx // nps 1 pushed to stack for print routine
:004B1E2B push eax // nps 2 pushed to stack for print routine
:004B1E2C mov eax, dword ptr [esp+0000050C] // move into eax internal ply count
:004B1E33 push ebp // nodes 1 pushed to stack for print routine
:004B1E34 cdq // sign extend eax to create 64-bit value in edx - eax.
:004B1E35 push ebx // nodes 2 pushed to stack for print routine
:004B1E36 sub eax, edx // eax - edx to eax. edx is 0 so this instruction just clears flags
:004B1E38 push edi // time 1 in milliseconds pushed to stack for print routine
:004B1E39 sar eax, 1 // divide internal ply count by 2 by shifting right 1 bit position
:004B1E3B push esi // time 2 in milliseconds pushed to stack for print routine
:004B1E3C sub eax, 00000003 // after dividing by 2 we now subtract 3 to get displayed ply value
:004B1E3F push eax // ply number pushed to stack for print routine
:004B1E40 push 0082F19C // address of text string pushed to stack, for print routine
:004B1E45 call 004B4B00 // print information string on screen using text, values on stack
:004B1E4A add esp, 00000020

* Referenced by a (U)nconditional or (C)onditional Jump at Addresses:
|:004B1DB7(C), :004B1DC4(C)
|
:004B1E4D mov eax, dword ptr [esp+10]
:004B1E51 pop edi
:004B1E52 pop esi
:004B1E53 pop ebp
:004B1E54 pop ebx
:004B1E55 add esp, 000004F0
:004B1E5B ret

Conclusion:

Vasik appears to artificially inflate the Rybka 1.2f node count by a factor of 8. What do you think?

Here is what happens if we modify the Rybka executable to display information on all search depth levels.

Note that Rybka spends 14 seconds doing "something" (see the line below with the text 'time 14454 nodes 0') before it even starts evaluating its first position. What could it be doing? Constructing a database of future mobility of the pieces 3 moves into the future? If he was doing "search extensions" we would expect the "node count" to be at least 1, wouldn't we?

Note that the node count at each ply depth is always a multiple of 8:

Test position

Here we do the same thing for the Rybka1.0beta executable. Note that the ply level has the value 2 subtracted before printing to the screen. We have modified the executable to display information at all ply levels.

Pseudo code from Rybka1.0beta:

if(plycount >= 5) { // only print when ply count <= 5-2, removed to allow printing at all ply levels
        ecx = GetTickCount() + 1 - M00667a70;
        eax = M00667a6c; // node count low
        edx = (M00667a74 << 0xa) - eax; // node count high*1024-node count low

// this is the way Vas stores his node count - maybe to make it harder

// to do what I am doing now...
        esi = (eax & 0xf) + edx * 8; // curious way to obscure true node count
        edx = 0;
        eax = esi;
        if(ecx < 10000) { // when our elapsed time is < 10 seconds we do this
            eax = eax << 0xa;
            ecx = ecx / ecx;
            edx = ecx % ecx;
        } else { // when our elapsed time is > 10 seconds we do this
            edx = (ecx >> 0xa) / (ecx >> 0xa) % (ecx >> 0xa) / (ecx >> 0xa);
        }
        L0040EE9A("info depth %d time %u nodes %u nps %u\n", plycount - 2, ecx, esi, eax); // print to screen, notice ply level minus 2
    }

Vasik Responds on his forum:

Posted: Thu Nov 09, 2006 7:41 am Post subject:

Quote:

Vas is the ideal person to reply to this.

Vas ... Your comments pls.

I'm probably a little off-topic in this thread Smile

Anyway, the explanation is too complex for me but it seems that Rybka 2.2 outputs node counts in multiples of 2. This shouldn't happen, probably there is some bug in my node counting or display.

Vas

Node Count Examined for Rybka 2.2, multiprocessor version

Reverse Engineering has been performed on Rybka 2.2, particularly with regard to node count.

Vasik takes the raw "node count" and divides this by 14, then multiplies by the number of active processors before printing to the screen. This is why the "node count" for Rybka 2.2 (on a 2-core PC) is always divisible by 2. The assembly language code is very similar to the code for Rybka 1.2f displayed above, so there is no need to re-display it.

The single processor version of Rybka 2.2 does not multiply the raw "node count" by the number of processors, however it still contains the mysterious division by 14.

Since there is no multiplication involved, the Rybka 2.2 single processor version "node count" may be an even or an odd number (you can verify this).

The Rybka 2.2 "node count" is composed of 2 internal numbers, a "low" and a "high" value. Rybka begins counting "nodes" with the value 32768, which it decrements to 0 in intervals of 1. When this "low" value reaches 0, it is reset to 32768 and the "high" node count is incremented by 1. This can be seen in the following code:

Here is how Vasik increases the "node count" by 1:

:004DB289 BD01000000              mov ebp, 00000001
:004DB28E 292DD4439200            sub dword ptr [009243D4], ebp
:004DB294 56                      push esi
:004DB295 57                      push edi
:004DB296 7566                    jne 004DB2FE
:004DB298 012DE0439200            add dword ptr [009243E0], ebp
:004DB29E C705D443920000800000    mov dword ptr [009243D4], 00008000
:004DB2A8 8315E443920000          adc dword ptr [009243E4], 00000000
:004DB2AF E82C9A0000              call 004E4CE0

The thought has occurred to me that Vasik is not counting actual nodes evaluated, but instead "internal updates" to his piece mobility database, and that he figures that on average there are 14 database updates per position evaluated. That would explain the division by the number 14 before the "node count" is printed to the screen.

Enter supporting content here