Using Spin-Loops on Intel® Pentium® 4 Processor and Intel® Xeon™ Processor:
Dive into the world of C + Asm:
Spin-wait loops:
"The PAUSE instruction introduces a slight delay in the loop and effectively causes the memory
requests to be issued at approximately the maximum speed of the memory system bus,
approximately equal to the highest speed at which the sync_var value can be changed by
another processor. There is no point in trying to issue requests any faster than this. The net
effect of this usage is to improve spin-wait performance significantly. Inserting the PAUSE instruction has the added benefit of significantly reducing the power consumed during the spin-wait because fewer system resources are used."
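A minimal sketch of such a loop in C, not taken from the Intel note. It assumes an x86 or x86-64 compiler that provides the `_mm_pause()` intrinsic in `<immintrin.h>` (GCC, Clang and MSVC do) and C11 atomics. The function names `spin_acquire` and `spin_release` are made up for illustration; `sync_var` is borrowed from the quote above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <immintrin.h>   /* _mm_pause(); x86 / x86-64 only */

/* The shared flag the quoted text calls sync_var: 0 = free, 1 = held. */
static atomic_int sync_var = 0;

/* Hypothetical helper: spin until the lock is acquired. */
static void spin_acquire(void)
{
    for (;;) {
        /* Try to take the lock with one atomic exchange. */
        if (atomic_exchange_explicit(&sync_var, 1, memory_order_acquire) == 0)
            return;

        /* Lock is held: wait using plain reads, with PAUSE between them,
           instead of hammering the bus with atomic writes. */
        while (atomic_load_explicit(&sync_var, memory_order_relaxed) != 0)
            _mm_pause();
    }
}

/* Hypothetical helper: release the lock. */
static void spin_release(void)
{
    /* Debug check: the lock must be held when it is released. */
    assert(atomic_load_explicit(&sync_var, memory_order_relaxed) == 1);
    atomic_store_explicit(&sync_var, 0, memory_order_release);
}
```

A thread would call `spin_acquire()`, run its critical section, then call `spin_release()`. PAUSE is encoded as `REP NOP`, so processors that predate it execute it as an ordinary NOP.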
False sharing:
"Now consider the repercussions of having data from the critical section on the same cache line as the synchronization variable. While in the critical section, processor A obtains exclusive access
in order to modify the data. Since processors B, C, and D are constantly reading the synchronization variable, and since the synchronization variable is on the same line as the data,
processor A will lose exclusive access immediately after modifying the data. In other words,
because A needs exclusive access to the data, it must obtain exclusive access to the synchronization variable. Since processors B, C, and D need shared access to the synchronization variable, the cache line containing both items must change state over and over. This situation is called “false sharing”, and it results in many wasted transactions on the system bus that can seriously impact overall system performance. This situation is avoided if the synchronization variable is on one cache line, and the critical section data on a different cache line. Then, the first cache line remains shared between the processors while processor A maintains exclusive access to the second cache line. No changes in ownership of either cache line are required until processor A releases the critical section."
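One way to get that layout in C11 is to align each member to a cache-line boundary. The sketch below is only an illustration: the struct name `guarded_counter`, the field `data` and the constant `CACHE_LINE` are hypothetical, and the 64-byte value is an assumption that should be changed to match the target processor.

```c
#include <assert.h>     /* static_assert */
#include <stdalign.h>   /* alignas */
#include <stdatomic.h>
#include <stddef.h>     /* offsetof */

/* Assumption: 64-byte cache lines; adjust for the target processor. */
#define CACHE_LINE 64

struct guarded_counter {
    /* Synchronization variable, alone on the first cache line. */
    alignas(CACHE_LINE) atomic_int sync_var;

    /* Critical-section data, pushed onto the next cache line. */
    alignas(CACHE_LINE) long data;
};

/* Compile-time check that the two members cannot share a line. */
static_assert(offsetof(struct guarded_counter, data) -
              offsetof(struct guarded_counter, sync_var) >= CACHE_LINE,
              "sync_var and data must be on different cache lines");
```

The cost is memory: with these assumptions the struct takes two full cache lines, even though the payload is only a few bytes.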
"4 Summary:
Synchronization between threads frequently involves the use of spin-wait loops. These loops
should make use of the PAUSE instruction to maximize performance and minimize power
consumption. The PAUSE instruction can be added to application code now, as it is ignored on
all known existing Intel architectures. Further, care should be taken to ensure that
synchronization variables are the only data on a cache line to minimize system bus traffic. For
the Pentium 4 processor, the preferred cache [line] size to honor is 128 bytes. If that is not possible,
then honoring a 64-byte line size is the next best thing."
Sunday, October 4, 2009