We’re here at NVIDIA’s GPU Technology Conference (GTC) 2012, where NVIDIA is holding their semi-annual professional developers conference. There’s been a great deal announced that will take a few days to completely go over, but for now we wanted to start on the product side with NVIDIA’s major product announcements. With the launch of GK104 back in March, NVIDIA is now ready to start rolling out some of their professional products, and while the next generation of Quadro is not yet ready, Tesla is another matter. This brings us to the first part of our GTC coverage: the next generation of Tesla cards, Tesla K10 and Tesla K20.

                       Tesla K20*    Tesla K10       Tesla M2090
Stream Processors      <=2880        2 x 1536        512
Texture Units          <=240         2 x 128         64
ROPs                   <=48          2 x 32          48
Core Clock             ?             745MHz          650MHz
Shader Clock           N/A           N/A             1300MHz
Memory Clock           ?             5GHz GDDR5      3.7GHz GDDR5
Memory Bus Width       384-bit       2 x 256-bit     384-bit
L2 Cache               <=1.5MB       2 x 512KB       768KB
VRAM                   ?             2 x 4GB         6GB
ECC                    Full          Partial (DRAM)  Full
FP64                   1/3 FP32      1/24 FP32       1/2 FP32
TDP                    ?             225W            225W
Transistor Count       7.1B          2 x 3.5B        3B
Manufacturing Process  TSMC 28nm     TSMC 28nm       TSMC 40nm

* K20 figures are for the complete GK110 chip as presented; shipping K20 cards may have some units disabled, so "<=" values are upper bounds and "?" marks specs not yet announced.

The first of the new Teslas, and the only model slated to be available in the near future, is the Tesla K10. In an interesting turn of events, Tesla K10 will be based on GK104. Specifically, it’s a dual-GPU card based on NVIDIA’s recently launched GTX 690, modified to fit the needs of the GPU compute market. Previous-generation Tesla cards have always been based on NVIDIA’s top-tier GPUs (GT200, and then GF100/GF110), so this is the first time NVIDIA has split the Tesla market by using a lower-tier GPU.

The fact of the matter is that with GK104 first launching in GeForce products, NVIDIA downplayed GK104’s compute capabilities. And our own benchmarking has established that GTX 680’s compute performance ranges anywhere from slightly ahead of the Fermi-based GTX 580 to well behind it. Being a descendant of GF114, GK104 had a fair bit of its compute capabilities stripped out relative to GF110, including double-precision floating point performance, ECC cache protection, and about half the number of registers per CUDA core relative to Fermi.

Given GK104’s questionable compute performance, NVIDIA’s decision to launch a Tesla part based on it is quite unexpected. Still, this is not to say that GK104 can’t perform well in the right situations, and those situations are exactly what NVIDIA is designing K10 around. The fact that we’ve found GK104 cards to be slow at compute workloads at times is not lost on NVIDIA; they know better than anyone else what GK104 really can and can’t do and have planned accordingly. For that reason NVIDIA is breaking from what little tradition Tesla has as a broad-market product and is pitching K10 at a very specific market.

NVIDIA’s market strategy here is actually summed up rather well in their K10 press release: “NVIDIA Tesla K10 GPU Accelerates Search for Oil and Gas Reserves, Signal and Image Processing for Defense Industry.” GK104 lacks the full ECC protection and compute flexibility of the Fermi Tesla cards, but what it doesn’t lack is single-precision compute performance and memory bandwidth, and a dual-GPU card in particular has both of those in spades. Accordingly, NVIDIA’s goal for K10 is to go after the specific market segments that don’t need ECC and don’t need flexibility, but do need all the raw compute performance they can get. This, as it turns out, is something gamers are already familiar with: image processing. Image processing doesn’t need the incredible levels of precision that pure computational work does, and for that matter it’s rather tolerant of the occasional error, so NVIDIA believes there’s a suitably large market there that can be served by GK104 rather than GK110.
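To put rough numbers on that trade-off, here is a back-of-the-envelope Python sketch that computes theoretical peak rates for K10 and M2090 from the spec table above. It is not NVIDIA's own methodology; it assumes each CUDA core retires one fused multiply-add (2 FLOPs) per clock, that Fermi's cores run at the 1300MHz shader clock, and that the listed memory clocks are effective GDDR5 data rates. The `Card` class and its method names are hypothetical, chosen only for illustration.

```python
# Hypothetical peak-rate model for Tesla K10 vs. Tesla M2090 (spec-table inputs).
# Assumptions: 1 FMA (2 FLOPs) per CUDA core per clock; Fermi cores run at the
# shader clock; "Memory Clock" is the effective GDDR5 data rate.
from dataclasses import dataclass


@dataclass
class Card:
    name: str
    gpus: int
    cores_per_gpu: int
    core_clock_ghz: float   # clock the CUDA cores actually run at
    fp64_ratio: float       # FP64 rate as a fraction of the FP32 rate
    bus_bits_per_gpu: int
    mem_rate_ghz: float     # effective GDDR5 data rate

    def fp32_gflops(self) -> float:
        # cores x 2 FLOPs per FMA x clock (GHz) -> GFLOPS
        return self.gpus * self.cores_per_gpu * 2 * self.core_clock_ghz

    def fp64_gflops(self) -> float:
        return self.fp32_gflops() * self.fp64_ratio

    def bandwidth_gb_s(self) -> float:
        # bus width in bytes x effective data rate (GHz) -> GB/s
        return self.gpus * (self.bus_bits_per_gpu / 8) * self.mem_rate_ghz


k10 = Card("Tesla K10", 2, 1536, 0.745, 1 / 24, 256, 5.0)
m2090 = Card("Tesla M2090", 1, 512, 1.3, 1 / 2, 384, 3.7)

for card in (k10, m2090):
    print(f"{card.name}: {card.fp32_gflops():.0f} GFLOPS FP32, "
          f"{card.fp64_gflops():.0f} GFLOPS FP64, "
          f"{card.bandwidth_gb_s():.1f} GB/s")
```

With these inputs K10 works out to roughly 3.4x M2090's single-precision rate and 1.8x its bandwidth, but under a third of its double-precision rate.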

With that said, I must admit that if GK110 had come first I don’t know if we’d be having this conversation. Even if a dual-GK104 card is faster, splitting their market like this is not an easy move to make. But with GK110 not due in retail for another 5-6 months, it’s obviously NVIDIA’s only choice if they want to get new Tesla cards on the market before the end of the year.

In any case we’ll know more about the full performance of K10 soon enough. Based on GTX 680, I think we already have a good idea of GK104’s basic strengths and weaknesses, but I also have to consider the possibility that NVIDIA has been sandbagging the GTX 600 series’ compute performance. NVIDIA has handicapped GeForce performance in a few different ways for quite a number of years in order to create distinct market segments, first for Quadro and more recently for Tesla. With GTX 580 this was done by handicapping both double-precision and geometry performance, but because GK104 is inherently weak at double-precision, NVIDIA would need to handicap the GTX 600 series in some other manner if they wanted to maintain this kind of market segmentation. So perhaps GK104 is actually faster at compute than what we’ve seen so far?

Wrapping things up, while NVIDIA hasn’t posted every last spec for K10, they have posted enough for us to work with. Like GTX 690, K10 uses fully enabled GK104 GPUs, so based on NVIDIA’s theoretical performance figures of 4.58 TFLOPS and 320GB/sec of bandwidth, it’s almost certainly clocked at around 745MHz core and 5GHz memory. Meanwhile the card has 8GB of GDDR5, which breaks down to sixteen 2Gb GDDR5 modules per GPU for a total of 32 on the card. TDP is said to be identical to M2090’s, which would make it a 225W part. Finally, as far as availability and pricing are concerned, K10 is officially available “now”, though in practice partners won’t be shipping cards and systems until closer to the end of the month. Pricing is expected to be close to that of the M2090 it replaces, which would mean we’re looking at $2500 and higher.
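The clock and memory inferences in the previous paragraph are simple arithmetic that can be run in reverse as a sanity check. This Python sketch assumes 2 FLOPs (one FMA) per CUDA core per clock, decimal units (1 TFLOPS = 10^12 FLOPS, 1 GB/s = 10^9 bytes/s), and two fully enabled GK104s with 1536 CUDA cores and a 256-bit bus each. From those assumptions it derives the core clock, the effective memory data rate, and the GDDR5 chip count from NVIDIA's published totals.

```python
# Sanity check: infer Tesla K10 clocks and chip count from published totals.
# Assumptions: 2 FLOPs (one FMA) per CUDA core per clock, decimal units,
# and two fully enabled GK104s (1536 cores, 256-bit bus each).

GPUS = 2
CORES_PER_GPU = 1536
BUS_BITS_PER_GPU = 256

published_tflops = 4.58
published_gb_s = 320

# Core clock = peak FLOPS / (total cores * 2 FLOPs per clock), in MHz
core_clock_mhz = published_tflops * 1e12 / (GPUS * CORES_PER_GPU * 2) / 1e6

# Effective memory data rate = bandwidth / total bus width in bytes, in GHz
mem_rate_ghz = published_gb_s / (GPUS * BUS_BITS_PER_GPU / 8)

# 4GB per GPU = 32Gb, built from 2Gb (gigabit) GDDR5 devices
vram_gbit_per_gpu = 4 * 8
chips_per_gpu = vram_gbit_per_gpu // 2

print(f"core clock ~{core_clock_mhz:.0f} MHz")
print(f"memory data rate ~{mem_rate_ghz:.1f} GHz")
print(f"{chips_per_gpu} chips per GPU, {chips_per_gpu * GPUS} on the card")
```

Because NVIDIA's 4.58 TFLOPS figure is itself rounded, the derived core clock comes out at about 745.4MHz, consistent with a 745MHz part.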

Tesla K20 - The First GK110 Product



  • abhishek6893 - Thursday, May 17, 2012

    @Ryan Smith
    Hello. Where is this NVIDIA GTC taking place? Can't NVIDIA conduct GTC in India?
    Please keep me updated about NVIDIA's GTC. I am interested in these lectures.
    Thanks in advance.
    Abhishek Patil,
  • Ryan Smith - Thursday, May 17, 2012

    GTC (prime) takes place in San Jose, California, which is NVIDIA's home city. They do hold smaller international GTC events that are focused on training, but AFAIK those have only been held in Japan and China so far.
  • PEJUman - Thursday, May 17, 2012

    It's a bit unusual to see 15 SMXes instead of the usual 2^x units (16?).

    Seems to me they are harvesting GK110 chips as in the GTX 480 days, i.e. the chip is too big (relative to what the 28nm process can deliver) to get enough yield with all 16 SMXes working?
    It's funny to see a company as smart as them fall into the same pitfall twice (GTX 480 and now).

    Makes me wonder if Nvidia actually made a deliberate design decision here, i.e. they were planning to disable 1 or 2 SMXes right from the start. In that case, why didn't they come up with a 17-SMX design?
    I assumed there is a preference for 2^x SMX counts since they fit nicely with bus/buffer widths (32, 64, 128, etc.).
  • Kevin G - Thursday, May 17, 2012

    The die shot that is floating around appears to show only 15 SMX clusters instead of 16.
  • chizow - Thursday, May 17, 2012

    The early leaks of the GeForce GK110 part specified 2304 SPs, which indicates only 12/15 SMX active. It could be that Nvidia is already factoring in the harvesting and TDP targets in their realized yields for GK110 wafers.

    We may never see a full 2880 SP Kepler, and it may have been Nvidia's intention when designing an odd numbered 15 SMX chip.

    Truth of the matter is, with so many SPs, losing a few SMX would easily be the lesser of the evils if it came down to cutting functional units (see GTX 670 and GTX 680).
  • RussianSensation - Thursday, May 17, 2012

    Even with 2304 SPs and a 384-bit memory bus, GTX780 will be very fast.

    Right now GTX680 is faster than HD7970:

    1536 SP vs. 2048 SP
    192 GB/sec vs. 264 GB/sec bandwidth
    32 ROP vs. 32 ROP
    128 TMUs vs. 128 TMUs

    Based on that, Kepler needs 25% fewer SPs and 27% less memory bandwidth to compete. Even if HD8000 has 2560 SPs, a 2304 SP GTX780 will still be plenty fast.

    Of course if it has full blown 2,880 SPs, it will be insanely fast.
  • clbench - Friday, May 18, 2012

    HD 7970 seems faster than GTX 680 in most compute benchmarks:
  • CeriseCogburn - Saturday, May 19, 2012

    Too bad AMD has little to no compute software to put their card to use, while NVIDIA, which will still win again in sales, has a huge and well-supported base of it.

    You gotta love that paper amd phantom compute.
  • Ryan Smith - Thursday, May 17, 2012

    K20 may ship with disabled functional units - this is part of what NVIDIA needs to figure out as they finish bringing up the chip - but GK110 as presented is complete. There are no hidden units (okay, the Polymorphs aren't on the diagram because of the audience), every functional unit is accounted for. So it's a design decision, specifically one that has an equal number of SMXs (5) for each pair of memory controllers.
  • thebluephoenix - Thursday, May 17, 2012

    Can those FP64 shaders do FP32? Or will they just sit idle in games when a GK110 GeForce comes out?

    Also, isn't a better way to build a compute-efficient GPU to make all shaders FP64-capable at 1/1 or 1/2 the FP32 rate, like Fermi or GCN?
