Return of the Coprocessor

I ran across a very interesting writeup over at the Inquirer. While it is not known as the most reputable source of technology news in some circles, the Inquirer consistently gives us a peek at what's behind locked doors. It is almost always the first place you hear about anything, from GPU roadmaps to CPU die changes and beyond. Today I saw a piece talking about AMD's Accelerator technology. Searching the web, I've put together a few related tidbits of info that I'd like to share with you, along with my vision of the future.
First off, one of the biggest issues for Intel to overcome has been the lack of a coherent transport bus between processors and other components such as the chipset. AMD introduced the Opteron with a revolutionary transport bus called Hypertransport. Every AMD processor today uses this technology, from the lowly Sempron to the almighty 8xx series of Opteron processors. Here's the low-down: Hypertransport is to the processor as Serial-ATA is to the hard drive. Like Serial-ATA, Hypertransport trades a wide, shared bus for narrow point-to-point links that run very fast. It's designed to connect chips on a motherboard up to about 2 feet apart, at blistering bi-directional speeds (currently up to around 22 GB/sec). It uses simple interconnects that are very easy to implement, and most everyone has picked up on it. nVidia, ATI, and ULi use Hypertransport links to connect their chipsets, Cray has built a supercomputer using Hypertransport links to connect dozens of Opteron processors together, and even Transmeta is listed as a founding partner in the Consortium.
AMD's CTO, Phil Hester, recently did an interview with ZDNet talking about the future of AMD processors. The important part is in the first couple of questions. In that interview, Hester talks a bit about Hypertransport and licensing the technology out to other companies that would like to develop with it. He even goes as far as to say that replacing an Opteron in an 8-way system with a chip that is specifically designed to run Java and XML would be very beneficial. This is something that everyone has been hot on for the past couple of years, with the megahertz race ending and performance per watt becoming the most important feature of a system. The idea of using GPUs for this first surfaced about a year ago, and ATI has actually embraced it with the X1000 line of GPUs. This is what is now being referred to as GPGPU, or general-purpose computing on GPUs. The attraction of the GPU is that it has incredible memory bandwidth compared to CPUs, and it is also very scalable. With around 48 pixel pipelines in some of the newest GPUs, applications that thread easily are greatly sped up.
The other big eye-opener about this is AMD's hiring of John McCalpin. You can check out his bio here. He was hired to work on AMD's Accelerator technology, which to this point has been shrouded in secrecy, and still is for the most part. Remember, everything that follows is just speculation. I doubt that the Accelerator guys even know exactly what they're doing, to be honest. The Inquirer brought up this project and decided to dissect it. Read their ideas if you'd like, but mine follow pretty close behind.
Back in the old days of my family's Tandy 1000HX, processors were very simple pieces of silicon. If you've never seen an 8088 like the one that lived in that machine, well here it is. After a few years of using that machine, we decided to upgrade it, mainly thanks to my brother's pleas. The math coprocessor was the addition of choice back then, and it greatly sped up floating-point computations. Gradually the industry's views changed, and coprocessors were integrated into the main processor. The Pentium's integrated math coprocessor was one of its best features, and today just about everything requires one.
So back to the future: the technology industry has gradually moved back towards old designs in the past couple of years, now that we can do them right. Serial interfaces, once thought of as the slowest form of communication, are now the fastest, wiping the parallel buses out of sight. With fast interconnects, the idea of a high-performance "system on a chip" has all but disappeared. This is why the idea of a new breed of coprocessor is so viable.
The GPU is technically a coprocessor, handling the video needs for the system. Audio chips like the X-Fi have grown to extravagant proportions, and Ageia has spurred on the need for a physics processor. Each of these chips is technically very simple, but also very specialized. Video cards have grown to ridiculous proportions, with the highest-performing cards having memory that runs two to three times faster than system memory, on a bus twice as wide. The GPUs themselves started with one or two pixel pipelines, and now cram 16, 24, even 32 pipelines into a single chip. ATI claims that in certain operations, a GPU can handle 7 times the workload of a standard Opteron. The applications are vast, as you can see over at gpgpu.org.
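To put rough numbers on that memory claim, here's a back-of-the-envelope sketch in Python. The transfer rates and bus widths are illustrative assumptions of mine (a 128-bit system memory bus at 400 MT/s against a 256-bit video memory bus at two to three times that rate), not measurements of any particular card. It only shows how "faster" and "wider" multiply together.

```python
def peak_bandwidth_gb_s(mega_transfers_per_sec, bus_width_bits):
    """Peak theoretical bandwidth in GB/s: transfers per second times bytes per transfer."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_sec * 1e6 * bytes_per_transfer / 1e9

# Assumed, illustrative figures only.
system_memory = peak_bandwidth_gb_s(400, 128)      # system RAM on a 128-bit bus
video_memory_low = peak_bandwidth_gb_s(800, 256)   # 2x the rate, bus twice as wide
video_memory_high = peak_bandwidth_gb_s(1200, 256) # 3x the rate, bus twice as wide

print(round(system_memory, 1), "GB/s system memory")
print(round(video_memory_low, 1), "to", round(video_memory_high, 1), "GB/s video memory")
print(round(video_memory_low / system_memory, 1), "to",
      round(video_memory_high / system_memory, 1), "times the bandwidth")
```

Under those assumptions, memory that is two to three times faster on a bus twice as wide works out to roughly four to six times the peak bandwidth, which is why the GPU looks so attractive for bandwidth-hungry work.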
So back to the future and away from the present. I believe the industry is moving back to processors and coprocessors living together on a single board. With Hypertransport, especially its latest incarnation, which should show up in AMD's future sockets that are coming in mere months, the interconnect is finally available to make it worthwhile. Although CPUs continue to get faster every couple of months, the CPU is becoming an aging beast. With dual-core CPUs being pushed as the future, Symmetric Multi-Processing is being thrown into everything. Breaking off chunks of code to run on specialized cores or even specialized processors is right around the corner, whether the programmers like it or not. The Cell processor is an early example of this idea, with one main core and 7 or 8 smaller specialized cores. If each of those cores could be designed to do one thing, and programs could be developed to take advantage of that, tenfold speed increases are likely.
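Here's a toy sketch of what "breaking off chunks of code" could look like from the software side. Everything in it is hypothetical: the `physics_unit` and `xml_unit` functions are plain Python stand-ins for imaginary accelerators, and `dispatch` is a name I made up for illustration. It is not how any real accelerator API works, just the routing idea.

```python
from concurrent.futures import ThreadPoolExecutor

def cpu_fallback(payload):
    # General-purpose path: anything without a specialized unit lands here.
    return sum(payload)

def physics_unit(payload):
    # Stand-in for a hypothetical physics accelerator.
    return [x * 0.5 for x in payload]

def xml_unit(payload):
    # Stand-in for a hypothetical XML accelerator: counts opening angle brackets.
    return payload.count("<")

# Hypothetical registry mapping a kind of work to the unit specialized for it.
ACCELERATORS = {"physics": physics_unit, "xml": xml_unit}

def dispatch(tasks):
    """Route each (kind, payload) task to its specialized unit, or to the CPU."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [
            pool.submit(ACCELERATORS.get(kind, cpu_fallback), payload)
            for kind, payload in tasks
        ]
        # Results come back in the same order the tasks were submitted.
        return [future.result() for future in futures]

results = dispatch([
    ("physics", [2, 4]),
    ("xml", "<a><b/></a>"),
    ("misc", [1, 2, 3]),
])
print(results)
```

The point of the sketch is that the program has to say what kind of work each chunk is, which is exactly the burden that falls on programmers once specialized chips show up.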
So I know, most everyone's brain is now scrambled. Sorry guys. Here's what is going to happen though, in more detail than I've ever seen.
- AMD will develop a Java/.Net accelerator and place it in a Socket 1207 package, lacking a memory controller and using far fewer than the 1207 pins that will be on future Opterons.
- nVidia will most likely produce a superior accelerator and get AMD off the hook for production.
- Having licensed the technology, nVidia can begin to look at desktop applications, starting with workstations of course, maybe in the form of a TCP/IP accelerator for more secure computing, or possibly a rendering chip that makes the GPU less of a restriction.
- Then the good stuff: accelerators make it to the desktop. A physics-related chip will probably be first, although the chips for the desktop will cover more than one application. GPUs will not be replaced, since textures need a lot of local memory and the interconnect is just not strong enough. Physics, however, is something Ageia has us way too excited about not to have a desktop accelerator for.
- Slowly but surely, cards will be replaced with sockets. We're talking probably 10 years on this, and a lot can change in 10 years, but I believe that having a socketed GPU with a separate memory bank is very possible. That would take the strain of developing power circuits and PCBs off the GPU guys and let them devote their whole energy to the chips.
Seeing as everything I've been discussing is based on AMD hardware, you must be saying "Where's Intel in this?" This is probably the biggest concern of many people, since Intel's Hypertransport killer has been pushed back another year or so. The so-called CSI that Intel has been working on is just not coming together like they had hoped, and the lack of an integrated memory controller really negates the need for it now anyway. The development of accelerators will require Intel to make up some lost time, and I think that we will have a situation where Intel will not play ball and will require vendors to make two versions of every chip: an Intel version and an HT version. While this will be devastating to the market, as moves by the biggest player usually are, it will allow the motherboard manufacturers to have a lot of fun. We'll see things from companies that bridge the two technologies, probably fairly poorly and with high latencies introduced, but nonetheless, we'll see it.
Honestly, this is the most exciting topic that I've heard about in a very long time. Breaking out of the mold of faster and faster CPUs and moving towards more specialized chips will breathe new life into the industry. Maybe when Bill Gates finally retires and Microsoft can go on its own, innovation will truly shine like it has in the open source community. Things like OpenFirmware, for instance, are very exciting to me, as they will allow people to have more control over their machines. Hopefully the public will learn to hate the inner workings of DRM and rise up against those that wish to lock our machines into a pattern. That's very hopeful, but somewhat possible, kinda. Anyway, the next 10 years are going to be awesome, and I can't wait.
Be sure to post a comment, and let's get this to be a big topic.
