Monday, October 12, 2009

GPU: Top 10 Innovations and Top 3 Next Challenges in Fermi

Here's an 8-page article, "The Top 10 Innovations in the New NVIDIA Fermi Architecture, and the Top 3 Next Challenges", that covers 10 things Fermi already has and 3 that still need work:

Top 10 Innovations in Fermi
- Real Floating Point in Quality and Performance
- Error Correcting Codes on Main Memory and Caches
- 64 bit Virtual Address Space
- Caches
- Fast Context Switching
- Unified Address Space
- Debugging Support
- Faster Atomic Instructions to Support Task Based Parallel Programming
- A Brand New Instruction Set
- Also, Fermi is Faster than G80

Top 3 Next Challenges
- The Relatively Small Size of GPU Memory
- Inability to do I/O directly to GPU Memory
- No Glueless Multisocket Hardware and Software

Quotes:
* The table previews my take on the Top 10 most important innovations in the new Fermi architecture. This list is from a computer architect’s perspective, as a user would surely rank performance higher. At the end of the paper, I offer 3 challenges on how to bring future GPUs even closer to mainstream computing, which the table also lists.

* To my knowledge, Fermi is the first implementation of the newly revised IEEE 754 2008 standard, with Fused Multiply Add, a 16‐bit floating‐point memory format, all the rounding modes, and even support for denormalized numbers in hardware.
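To see why Fused Multiply Add counts as a floating-point quality feature, here is a minimal Python sketch of the difference between rounding once and rounding twice. It is only an emulation on the CPU and says nothing about how Fermi implements it: the function names and the input values are my own assumptions, and the "fused" version gets its single rounding by computing the exact result with `fractions.Fraction` first.

```python
from fractions import Fraction


def fused_multiply_add(a, b, c):
    """Emulate a*b + c with a single rounding at the end (FMA semantics).

    Hypothetical helper for illustration: the exact rational result is
    computed first, then converted to the nearest double once.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


def separate_multiply_add(a, b, c):
    """Ordinary a*b + c: the product is rounded, then the sum is rounded."""
    return a * b + c


# Inputs chosen (by me, for illustration) so that the exact product
# 1 - 2**-54 is not representable as a double and rounds to 1.0.
a = 1.0 + 2.0 ** -27
b = 1.0 - 2.0 ** -27
c = -1.0

print("two roundings:", separate_multiply_add(a, b, c))
print("one rounding: ", fused_multiply_add(a, b, c))
```

With separate operations the tiny term is lost when the product is rounded, so the final result collapses to zero. The fused version keeps it because nothing is rounded until the very end.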

* As part of its large step towards mainstream computing, Fermi now has configurable 64 KB private first‐level caches with every streaming multiprocessor and a 768 KB shared second‐level cache.

* As long as they were changing the address size, providing a unified address space, and improving atomic instruction, the Fermi architects changed instructions sets completely to a more RISC‐like load/store architecture instead of an x86‐like architecture that had memory‐based operands. Although this dramatic change took dozens of person years to pull off, once again PTX made it much less onerous than when mainstream computing switches instruction sets.

* Use faster GDDR5 DRAM versus GDDR3 DRAM in G80;

* Back in the mainframe days, the famous computer architect Gene Amdahl proposed a rule of thumb for a well balanced design: “1 byte of memory and 1 byte per second of I/O are required for each instruction per second supported by a computer.”

* Stated alternatively, I am sure there would be a market for GPUs with terabytes of DRAM, even if the DRAM bandwidth dropped a bit to allow for the larger capacity.
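Amdahl's rule of thumb quoted above is simple arithmetic, so here is a small Python sketch of it. The function name and the instruction rate are my own assumptions, picked only for illustration. They are not figures from the article or from any real GPU.

```python
def amdahl_balance(instructions_per_second):
    """Apply Amdahl's rule of thumb for a balanced design.

    Returns (memory_bytes, io_bytes_per_second): 1 byte of memory and
    1 byte per second of I/O for each instruction per second.
    """
    memory_bytes = instructions_per_second * 1
    io_bytes_per_second = instructions_per_second * 1
    return memory_bytes, io_bytes_per_second


# Hypothetical processor rate, used only to show the scale involved.
rate = 10 ** 12  # instructions per second

memory, io = amdahl_balance(rate)
print(f"memory: {memory / 10**12} TB")
print(f"I/O:    {io / 10**12} TB/s")
```

By this rule a processor in the range of a trillion instructions per second would want memory on the order of a terabyte, which is the direction the "GPUs with terabytes of DRAM" remark points in.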

* “I have been asked to describe the microprocessor of 2020. Such predictions in my opinion tend to overstate the worth of radical, new computing technologies. Hence, I boldly predict that changes will be evolutionary in nature, and not revolutionary. … I do not think the microprocessor of 2020 will be startling to people from our time… Pipelining, superscalar organization and caches will continue to play major roles in the advancement of microprocessor technology, and if hopes are realized, parallel processing will join them. … If parallel processing succeeds, this sea of transistors could also be used by multiple processors on a single chip, giving us a micromultiprocessor. … Looking ahead, microprocessor performance will easily keep doubling every 18 months through the turn of the century. After that, it is hard to bet against a curve that has outstripped all expectations.”
