Houdini 1.03 is available

Robert Houdart
Posts: 180
Joined: Thu Jun 10, 2010 4:55 pm

Re: Houdini 1.03 is available

Post by Robert Houdart » Sat Jul 17, 2010 2:52 pm

sirabc wrote:
Robert Houdart wrote:If this processor is a double quad core with shared memory, the memory contention of the spin locks could absolutely kill the performance.
For this kind of architecture the "lockless hashing" promoted by Bob Hyatt should be used. I coded the lockless hashing in Houdini long ago, but haven't activated it because on the most common hardware the spinlocks perform better.
If your friend is interested, I could just compile him/her an 8_CPU version with "Hyatt hashing". Just let me know.
A shiny new compile as a present? Who wouldn't want it! I'll keep an eye on this thread for this compile. Hopefully this is the problem and we'll see Houdini crunch some nodes.

Thanks!
Sirabc, here's the download link for the 8_CPU version compiled with lockless hash table access ("Hyatt hashing"): http://www.cruxis.com/chess/download/Ho ... S_8CPU.zip

It may or may not solve the performance problem on the x5355 hardware; please keep me updated about success or failure.

Robert
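
For readers unfamiliar with the technique, here is a minimal sketch of the XOR scheme usually meant by "lockless hashing". It is an illustration in Python, not Houdini's actual code: the table layout and the names `store`, `probe` and `TABLE_SIZE` are assumptions made for the example. Each slot holds two 64-bit words, `key ^ data` and `data`. A reader accepts an entry only if XOR-ing the two words gives back the key, so an entry torn by two threads writing at once is simply rejected and no lock is needed.

```python
TABLE_SIZE = 1 << 16  # hypothetical number of slots (a power of two)

# Each slot is a pair of 64-bit words: [key ^ data, data].
table = [[0, 0] for _ in range(TABLE_SIZE)]

def store(key, data):
    """Write an entry without taking a lock."""
    slot = table[key & (TABLE_SIZE - 1)]
    slot[0] = key ^ data
    slot[1] = data

def probe(key):
    """Return the stored data, or None if the slot does not hold an
    intact entry for this key."""
    slot = table[key & (TABLE_SIZE - 1)]
    check, data = slot[0], slot[1]
    if check ^ data == key:
        return data
    return None

# Normal round trip.
key_a, data_a = 0x9D39247E33776D41, 0x0000001400320C1C
store(key_a, data_a)
assert probe(key_a) == data_a

# Simulate a torn write: a second entry that maps to the same slot
# has replaced only the data word when the reader looks at the slot.
key_b, data_b = key_a ^ (1 << 40), 0x00000015FFCE0A12
table[key_a & (TABLE_SIZE - 1)][1] = data_b
assert probe(key_a) is None  # mixed entry is rejected
assert probe(key_b) is None  # and it is not mistaken for the new one
```

The torn write is simulated by hand here; in a real engine the two words would be plain 64-bit stores from different threads, and the cost of the scheme is one extra XOR per probe and per store.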

Odeus37
Posts: 43
Joined: Mon Jun 14, 2010 5:38 pm

Re: Houdini 1.03 is available

Post by Odeus37 » Sat Jul 17, 2010 4:26 pm

Out of curiosity, I tested the "lockless hash table access" version on my Q6600 (with 4 threads only), and I was kinda surprised to find it about 6% faster than the "normal" 4-CPU x64 version.

Anyway, a big thanks for your work on Houdini: it works well with the Shredder GUI (unlike Fire and such...), and with multi-PV too now. With Gaviota tablebase support it would be perfect! :lol:

Robert Houdart
Posts: 180
Joined: Thu Jun 10, 2010 4:55 pm

Re: Houdini 1.03 is available

Post by Robert Houdart » Sat Jul 17, 2010 5:19 pm

Odeus37 wrote:Out of curiosity, I tested the "lockless hash table access" version on my Q6600 (with 4 threads only), and I was kinda surprised to find it about 6% faster than the "normal" 4-CPU x64 version.
That's cool!
Mind you, it's not very easy to measure multi-core speed in a reliable and accurate way. For this kind of comparison I use a fixed-time search from the starting position with the default 128 MB hash. Simply double-click the executable in Windows Explorer and type "go movetime 30000" to let the engine run for 30 seconds.
I correct the test results for the actual CPU usage during the test (to compensate for other processes consuming CPU). In the Task Manager one can find the total CPU time of the process to correct the results.
I repeat the test at least 3 times to estimate the random variations.

Here are the raw results on a Core i5-750 @2.66 GHz (4 cores):

Houdini 1.03a x64 8_CPU
1) info multipv 1 depth 21 seldepth 48 score cp 13  time 29995 nodes 172110408 nps 5737000 [1:51 CPU]
2) info multipv 1 depth 21 seldepth 47 score cp 10  time 29994 nodes 174050881 nps 5802000 [1:52 CPU]
3) info multipv 1 depth 21 seldepth 49 score cp 18  time 29994 nodes 179477232 nps 5983000 [1:54 CPU]

Houdini 1.03a x64 LOCKLESS 8_CPU
1) info multipv 1 depth 22 seldepth 49 score cp 14  time 29993 nodes 180690547 nps 6024000 [1:55 CPU]
2) info multipv 1 depth 21 seldepth 47 score cp 14  time 30005 nodes 173650984 nps 5787000 [1:53 CPU]
3) info multipv 1 depth 21 seldepth 48 score cp 12  time 29994 nodes 181592182 nps 6054000 [1:55 CPU]

Houdini 1.03a x64 POPCNT 8_CPU
1) info multipv 1 depth 21 seldepth 49 score cp 12  time 29991 nodes 178575457 nps 5954000 [1:52 CPU]
2) info multipv 1 depth 21 seldepth 47 score cp 14  time 30000 nodes 178327630 nps 5944000 [1:52 CPU]
3) info multipv 1 depth 22 seldepth 51 score cp 14  time 29992 nodes 174626037 nps 5822000 [1:50 CPU]
If we scale the nps values to a full 2:00 minutes of CPU time (4 cores x 30 seconds), we obtain the following:

Houdini 1.03a x64 8_CPU
1) 5737000 * 120 / 111 = 6202 kN/s
2) 5802000 * 120 / 112 = 6216 kN/s
3) 5983000 * 120 / 114 = 6298 kN/s
=> Average speed = 6239 kN/s

Houdini 1.03a x64 LOCKLESS 8_CPU
1) 6024000 * 120 / 115 = 6286 kN/s
2) 5787000 * 120 / 113 = 6145 kN/s
3) 6054000 * 120 / 115 = 6317 kN/s
=> Average speed = 6249 kN/s

Houdini 1.03a x64 POPCNT 8_CPU
1) 5954000 * 120 / 112 = 6379 kN/s
2) 5944000 * 120 / 112 = 6369 kN/s
3) 5822000 * 120 / 110 = 6351 kN/s
=> Average speed = 6366 kN/s
The final conclusion from this test run is that on a Core i5-750 there is no measurable speed difference between the normal and the LOCKLESS version, whereas the POPCNT version is about 2% faster.
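
The CPU-time correction used in the tables above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the helper names `cpu_seconds` and `corrected_knps` are made up for the example, and the 4 cores and 30 seconds are the values of this particular test.

```python
def cpu_seconds(text):
    """Convert a Task Manager style 'm:ss' CPU time to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)

def corrected_knps(nps, cpu_time, cores=4, wall_seconds=30):
    """Scale the reported nps to what full use of all cores would give."""
    full = cores * wall_seconds  # 120 s of CPU for 4 cores x 30 s
    return nps * full / cpu_seconds(cpu_time) / 1000

# Raw results of the normal 8_CPU build on the Core i5-750.
runs = [(5737000, "1:51"), (5802000, "1:52"), (5983000, "1:54")]
speeds = [corrected_knps(nps, cpu) for nps, cpu in runs]
for speed in speeds:
    print(f"{speed:.0f} kN/s")            # 6202, 6216, 6298
print(f"average = {sum(speeds) / len(speeds):.0f} kN/s")  # 6239
```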

What are the results on your Q6600?

Robert

Vael Jean-Paul
Posts: 78
Joined: Thu Jun 10, 2010 7:59 am

Re: Houdini 1.03 is available

Post by Vael Jean-Paul » Sat Jul 17, 2010 7:07 pm

I ran the same test as I did with Houdini 1.03, and the situation is different!
I get a new record with Lockless: it finds the mate in 17 in 03m13s at depth 33, and as you can see, at only 03m26s I'm at 17000 kN/s!
What is different now is that I get better results with Split Depth (SD) 10.
With SD 13 I see a mate much later, and with a much higher mate value, roughly double (+M40). It comes down fast, but I don't see the +M17 solution in a short time.
After this test I also ran a bench test (see below).

Done on a Core i7 920 @ 3.80 GHz (19x200), HT off, Win7 x64, Fritz GUI, Large Pages on.

Houdini 1.03a x64 Lockless, 4 cores, Split Depth = 10:
  Depth 21 in 00m55s  +M20
  Depth 25 in 01m03s  +M18
  Depth 33 in 03m13s  +M17!! (03m26s -> 17000 kN/s!)
Houdini 1.03a x64 Lockless, 4 cores, Split Depth = 13 (slower!):
  Depth 24 in 02m23s  +M40
  Depth 25 in 02m35s  +M33
  Depth 26 in 02m52s  +M19
  Depth 30 in 04m17s  +M18 (06m00s -> 16689 kN/s)
Houdini 1.03a x64 popcnt, 4 cores, Split Depth = 10:
  Depth 22 in 01m29s  +M19
  Depth 30 in 02m39s  +M18
  Depth 32 in 03m37s  +M18 (03m56s -> 16000 kN/s)
  Depth 33 in 05m04s  +M17! (06m35s -> 17000 kN/s)
Houdini 1.03a x64 popcnt, 4 cores, Split Depth = 13 (slower!):
  Depth 24 in 03m29s  +M19
  Depth 30 in 04m58s  +M18
  Depth 31 in 05m29s  +M18 (05m31s -> 15200 kN/s)
Houdini 1.03a x64, 4 cores, Split Depth = 10:
  Depth 23 in 03m05s  +M18
  Depth 30 in 04m27s  +M18
  Depth 31 in 04m56s  +M18 (05m50s -> 16000 kN/s)
Houdini 1.03a x64, 4 cores, Split Depth = 13:
  Depth 23 in 01m51s  +M28
  Depth 28 in 02m21s  +M26
  Depth 29 in 02m38s  +M18
  Depth 33 in 04m39s  +M18 (05m36s -> 17000 kN/s)

For the bench test I always use "go depth 20", and here we see that popcnt has the highest nodes/sec.
Surprisingly, Lockless is the lowest, and yet it gives me the fastest result and the highest kN/s during its calculations!?
I will also test this with STS, and then I can finally start with games.
To know which Houdini 1.03a is the strongest, I should test all 6 of the above :)

Houdini 1.03a x64 4cores:
info multipv 1 depth 20 seldepth 40 score cp 12 time 9851 nodes 67022690 nps 6803000 hashfull 1000 pv d2d4 g8f6 b1c3 d7d5 c1f4 e7e6 e2e3 f8d6 g1f3 b8c6 f1b5 e8g8 e1g1 c8d7 b5c6 d7c6 f3e5 h7h6 d1d3 d8e7 h2h3
info multipv 1 depth 20 seldepth 40 score cp 12 time 9851 nodes 67022690 nps 6803000 hashfull 1000 pv d2d4 g8f6 b1c3 d7d5 c1f4 e7e6 e2e3 f8d6 g1f3 b8c6 f1b5 e8g8 e1g1 c8d7 b5c6 d7c6 f3e5 h7h6 d1d3 d8e7 h2h3
bestmove d2d4 ponder g8f6
Houdini 1.03a x64 Lockless 4cores:
info multipv 1 depth 20 seldepth 46 score cp 12 time 7332 nodes 48795813 nps 6655000 hashfull 710 pv d2d4 d7d5 b1c3 g8f6 c1f4 e7e6 g1f3 f8d6 e2e3 d6f4 e3f4 e8g8 f1e2 b8c6 e1g1 c8d7 a2a3 d8e7 d1d3 h7h6 f3e5 c6e5 f4e5
info multipv 1 depth 20 seldepth 46 score cp 12 time 7332 nodes 48795813 nps 6655000 hashfull 710 pv d2d4 d7d5 b1c3 g8f6 c1f4 e7e6 g1f3 f8d6 e2e3 d6f4 e3f4 e8g8 f1e2 b8c6 e1g1 c8d7 a2a3 d8e7 d1d3 h7h6 f3e5 c6e5 f4e5
bestmove d2d4 ponder d7d5
Houdini 1.03a x64 popcnt 4cores:
info multipv 1 depth 20 seldepth 47 score cp 13 time 9072 nodes 65460395 nps 7215000 hashfull 1000 pv d2d4 g8f6 b1c3 d7d5 c1f4 c8f5 e2e3 e7e6 f1d3 f5d3 d1d3 b8c6 a2a3 a7a6 g1f3 f8d6 e1g1 e8g8 f3e5 d8e8 c3e2 c6e5 f4e5 d6e5 d4e5
info multipv 1 depth 20 seldepth 47 score cp 13 time 9072 nodes 65460395 nps 7215000 hashfull 1000 pv d2d4 g8f6 b1c3 d7d5 c1f4 c8f5 e2e3 e7e6 f1d3 f5d3 d1d3 b8c6 a2a3 a7a6 g1f3 f8d6 e1g1 e8g8 f3e5 d8e8 c3e2 c6e5 f4e5 d6e5 d4e5
bestmove d2d4 ponder g8f6

JP.

Robert Houdart
Posts: 180
Joined: Thu Jun 10, 2010 4:55 pm

Re: Houdini 1.03 is available

Post by Robert Houdart » Sat Jul 17, 2010 7:32 pm

Jean-Paul,

A position with a mate is not very useful for evaluating speed. Once the mate is found the branching factor is completely different from a normal position: if a Mate in 25 plies has been found the engine will never again go deeper than 25 plies in subsequent searches. In other words: all branches will be cut off at depth 25.
It's also better to use fixed time than fixed depth to limit the influence of the random variations due to multi-threading.

In summary, I suggest that the procedure that I used above will produce much more reliable results:
- search from a normal opening or midgame position (the starting position is OK)
- use fixed search time ("go movetime xxxxx")
- correct the results for actual CPU usage
- average over several runs
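
For anyone who wants to automate the bookkeeping of this procedure, the last "info" line of each run can be picked apart with a few lines of Python. This is a rough sketch only: `parse_info` is a hypothetical helper, it handles just the numeric fields that appear in the output posted in this thread, and the sample line is the first run of the normal build from the earlier post.

```python
def parse_info(line):
    """Extract selected numeric fields from a UCI 'info' line."""
    wanted = {"depth", "seldepth", "time", "nodes", "nps"}
    tokens = line.split()
    fields = {}
    # Each field is a name token followed directly by its value.
    for name, value in zip(tokens, tokens[1:]):
        if name in wanted:
            fields[name] = int(value)
    return fields

line = ("info multipv 1 depth 21 seldepth 48 score cp 13  time 29995 "
        "nodes 172110408 nps 5737000")
info = parse_info(line)

# UCI reports time in milliseconds, so nodes * 1000 / time should be
# close to the nps value printed by the engine.
recomputed = info["nodes"] * 1000 // info["time"]
print(info["depth"], info["nps"], recomputed)
```

The recomputed value is a quick sanity check that the right line was captured; the CPU time still has to be read from the Task Manager separately.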

Robert

Odeus37
Posts: 43
Joined: Mon Jun 14, 2010 5:38 pm

Re: Houdini 1.03 is available

Post by Odeus37 » Sat Jul 17, 2010 8:09 pm

Here are the results with my outdated Q6600:

Houdini 1.03a x64 4_CPU
1) info multipv 1 depth 20 seldepth 45 score cp 12  time 29998 nodes 94618762 nps 3154000 [1:53 CPU]
2) info multipv 1 depth 20 seldepth 45 score cp 13  time 29991 nodes 96568435 nps 3219000 [1:54 CPU]
3) info multipv 1 depth 21 seldepth 45 score cp 14  time 30002 nodes 96031695 nps 3200000 [1:55 CPU]

Houdini 1.03a x64 LOCKLESS 8_CPU
1) info multipv 1 depth 21 seldepth 49 score cp 21  time 29997 nodes 101998702 nps 3400000 [1:54 CPU]
2) info multipv 1 depth 20 seldepth 51 score cp 13  time 29996 nodes 103461505 nps 3449000 [1:56 CPU]
3) info multipv 1 depth 21 seldepth 44 score cp 14  time 29996 nodes 102532185 nps 3418000 [1:55 CPU]

Houdini 1.03a x64 4_CPU
1) 3154000 * 120 / 113 = 3349 kN/s
2) 3219000 * 120 / 114 = 3388 kN/s
3) 3200000 * 120 / 115 = 3339 kN/s
=> Average speed = 3359 kN/s

Houdini 1.03a x64 LOCKLESS 8_CPU
1) 3400000 * 120 / 114 = 3579 kN/s
2) 3449000 * 120 / 116 = 3568 kN/s
3) 3418000 * 120 / 115 = 3567 kN/s
=> Average speed = 3571 kN/s
I had been looking at the nps reported by the engine, not corrected for the actual CPU time. But the difference is about the same: 3571 is about 6% more than 3359.
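
One rough way to judge whether such a difference is real is to compare it with the run-to-run spread. The sketch below does this for the corrected kN/s figures posted in this thread (normal versus LOCKLESS on the Q6600 and on the Core i5-750). The helper names are made up for the example, and "do the ranges overlap" is only a rule of thumb for three runs per build, not a proper statistical test.

```python
def summarize(speeds):
    """Return (lowest, mean, highest) of a list of kN/s values."""
    return min(speeds), sum(speeds) / len(speeds), max(speeds)

def compare(base, other):
    """Return the gain of 'other' over 'base' in percent, and whether
    the two sets of runs overlap."""
    b_lo, b_mean, b_hi = summarize(base)
    o_lo, o_mean, o_hi = summarize(other)
    gain = (o_mean / b_mean - 1) * 100
    overlap = o_lo <= b_hi and b_lo <= o_hi
    return gain, overlap

q6600 = compare([3349, 3388, 3339], [3579, 3568, 3567])
i5_750 = compare([6202, 6216, 6298], [6286, 6145, 6317])
print(f"Q6600:  {q6600[0]:+.1f}%  ranges overlap: {q6600[1]}")
print(f"i5-750: {i5_750[0]:+.1f}%  ranges overlap: {i5_750[1]}")
# Q6600:  +6.3%  ranges overlap: False
# i5-750: +0.2%  ranges overlap: True
```

On the Q6600 every lockless run beats every normal run, while on the i5-750 the two sets of runs are interleaved, which matches the "no measurable speed difference" conclusion for that CPU.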

Robert Houdart
Posts: 180
Joined: Thu Jun 10, 2010 4:55 pm

Re: Houdini 1.03 is available

Post by Robert Houdart » Sat Jul 17, 2010 9:36 pm

Odeus37, these are very good results!
Thanks for sharing,

Robert

Odeus37
Posts: 43
Joined: Mon Jun 14, 2010 5:38 pm

Re: Houdini 1.03 is available

Post by Odeus37 » Sat Jul 17, 2010 10:03 pm

About the "lockless hash table access" version, it would be nice to have more results with other processors.

If it's about the same speed on recent CPUs and faster on older ones (like the 6% on my Q6600), you could maybe make this version the new default. :)

Theo
Posts: 2
Joined: Sat Jul 17, 2010 9:57 pm
Real Name: Theodor

Re: Houdini 1.03 is available

Post by Theo » Sat Jul 17, 2010 10:11 pm

Robert Houdart wrote:
sirabc wrote:
Robert Houdart wrote:If this processor is a double quad core with shared memory, the memory contention of the spin locks could absolutely kill the performance.
For this kind of architecture the "lockless hashing" promoted by Bob Hyatt should be used. I coded the lockless hashing in Houdini long ago, but haven't activated it because on the most common hardware the spinlocks perform better.
If your friend is interested, I could just compile him/her an 8_CPU version with "Hyatt hashing". Just let me know.
A shiny new compile as a present? Who wouldn't want it! I'll keep an eye on this thread for this compile. Hopefully this is the problem and we'll see Houdini crunch some nodes.

Thanks!
Sirabc, here's the download link for the 8_CPU version compiled with lockless hash table access ("Hyatt hashing"): http://www.cruxis.com/chess/download/Ho ... S_8CPU.zip

It may or may not solve the performance problem on the x5355 hardware; please keep me updated about success or failure.

Robert
Hi Robert, you're absolutely wrong. You have made Houdini ONLY for Intel "i" and AMD processors. That is why the speed is much, much slower on Intel QX Skulltrails and 8-thread Intel Xeons.
The user has been testing most compiles of the Ippolit family for months and has never had such problems.

Odeus37
Posts: 43
Joined: Mon Jun 14, 2010 5:38 pm

Re: Houdini 1.03 is available

Post by Odeus37 » Sat Jul 17, 2010 10:22 pm

I noticed one more difference with the "lockless hash table access" version: it fills its hash more slowly than the normal version.

With 512 MB hash and 4 cores on my Q6600, after 30 s:

- lockless version is about 65% hashtable full
- normal version is about 97% full
