Amazon Web Services this week introduced Trainium2, its new accelerator for artificial intelligence (AI) workloads. The chip tangibly increases performance over its predecessor, enabling AWS to train foundation models (FMs) and large language models (LLMs) with up to trillions of parameters. In addition, AWS has set itself an ambitious goal: giving its clients access to a massive 65 'AI' ExaFLOPS of performance for their workloads.

The AWS Trainium2 is Amazon's second-generation accelerator designed specifically for training FMs and LLMs. Compared to its predecessor, the original Trainium, it offers four times the training performance, twice the performance per watt, and three times the memory, for a total of 96 GB of HBM (up from 32 GB). The chip, designed by Amazon's Annapurna Labs, is a multi-tile system-in-package featuring two compute tiles, four HBM memory stacks, and two chiplets whose purpose is undisclosed for now.

Amazon notably does not disclose specific performance numbers for Trainium2, but it says that Trn2 instances scale out to up to 100,000 Trainium2 chips, delivering up to 65 ExaFLOPS of low-precision compute performance for AI workloads. Working backwards, that puts a single Trainium2 accelerator at roughly 650 TFLOPS. 65 EFLOPS is a level expected to be achievable only by the highest-performing upcoming AI supercomputers, such as Jupiter. Such scaling should dramatically cut the training time of a 300-billion-parameter large language model from months to weeks, according to AWS.
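
As a quick sanity check on that arithmetic, here is a back-of-envelope sketch; the per-chip figure is our estimate derived from AWS's cluster-level claim, not an official spec:

```python
# Back-of-envelope estimate of per-chip Trainium2 throughput,
# derived from AWS's cluster-level figures (not an official spec).
cluster_eflops = 65    # claimed low-precision ExaFLOPS for a full cluster
chips = 100_000        # maximum Trainium2 chips per Trn2 deployment

per_chip_tflops = cluster_eflops * 1e6 / chips  # 1 ExaFLOPS = 1e6 TFLOPS
print(f"~{per_chip_tflops:.0f} TFLOPS per Trainium2")  # ~650 TFLOPS

# For context: the original Trainium is rated at up to 190 TFLOPS of
# FP16/BF16 compute, so ~650 TFLOPS would be roughly 3.4x, in the same
# ballpark as the claimed 4x training speedup (which likely also reflects
# lower-precision formats and interconnect improvements).
trainium1_tflops = 190
print(f"~{per_chip_tflops / trainium1_tflops:.1f}x the original Trainium")
```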

Amazon has yet to disclose the full specifications of Trainium2, but we would be surprised if it did not add features on top of what the original Trainium already supports. As a reminder, that co-processor supports FP32, TF32, BF16, FP16, UINT8, and configurable FP8 data formats, and delivers up to 190 TFLOPS of FP16/BF16 compute performance.
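
For readers keeping track of what those floating-point formats mean in practice, here is a small illustrative sketch of their standard bit layouts. Note that AWS only describes Trainium's FP8 as "configurable", so the two common FP8 variants below are shown as an assumption:

```python
# Sign / exponent / mantissa bit widths for the floating-point formats
# the original Trainium supports. These are the standard layouts; the
# FP8 variants (E4M3/E5M2) are an assumption, since AWS only says the
# format is "configurable".
FORMATS = {
    "FP32":     (1, 8, 23),  # IEEE 754 single precision
    "TF32":     (1, 8, 10),  # FP32 range, reduced mantissa (19 bits used)
    "BF16":     (1, 8, 7),   # FP32 range, reduced precision
    "FP16":     (1, 5, 10),  # IEEE 754 half precision
    "FP8 E4M3": (1, 4, 3),   # more precision, less range
    "FP8 E5M2": (1, 5, 2),   # more range, less precision
}

for name, (sign, exp, man) in FORMATS.items():
    print(f"{name:9s}: {sign + exp + man:2d} bits "
          f"(sign={sign}, exponent={exp}, mantissa={man})")
```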

What is perhaps more important than the pure performance numbers of a single AWS Trainium2 accelerator is that Amazon has partners, such as Anthropic, that are ready to deploy it.

"We are working closely with AWS to develop our future foundation models using Trainium chips," said Tom Brown, co-founder of Anthropic. "Trainium2 will help us build and train models at a very large scale, and we expect it to be at least 4x faster than first generation Trainium chips for some of our key workloads. Our collaboration with AWS will help organizations of all sizes unlock new possibilities, as they use Anthropic's state-of-the-art AI systems together with AWS’s secure, reliable cloud technology."

Source: AWS

19 Comments

  • mode_13h - Monday, December 4, 2023 - link

    A vengeful god made humans in his image, and look at the result!

    Sadly, humans didn't learn this lesson. We designed AI to think much the same way we do, and are training AI models on our own works and culture. You can't *really* expect such a process to yield terribly rational AI, can you?

    Garbage in, garbage out.
  • GeoffreyA - Wednesday, December 6, 2023 - link

    Exactly my thought. Rather than rational, it's going to be a monstrous version of humanity. The sad, ironic part is how companies are chasing after AI, scared to lose out on the billions of dollars, yet this technology is likely going to unravel the world. At the very least, make us a lot more stupid. Why think, when AI can think for us?
  • mode_13h - Thursday, December 7, 2023 - link

    Good points.

    Also, "hi!" It's been a while since we crossed posts!
    : )
  • GeoffreyA - Thursday, December 7, 2023 - link

    Yes, "hi," my friend! We used to have some great conversations, you, me, and Oxford Guy, and it's good to see everyone back. Well, I've just been caught up in personal life, problems with love and my lady, and it's been tough. I felt out of touch with a lot, but it's coming right, slowly.
  • mode_13h - Thursday, December 7, 2023 - link

    Regarding your personal life, I hope things work out for the best.

    OG and I are pretty much like oil & water. I don't miss some of those arguments.
  • GeoffreyA - Friday, December 8, 2023 - link

    Thank you.

    That's true. Seems like only yesterday. How time flies.

    Well, concerning computing, the big change that strikes me these days is AI, and while I see the advantages, I've got a bad feeling about all of it.
  • skaurus - Saturday, December 2, 2023 - link

I'd like to note for future reference that I am of the same opinion.
  • quorm - Friday, December 1, 2023 - link

    Trainium is still a terrible name.
  • eastcoast_pete - Sunday, December 3, 2023 - link

    Maybe to differentiate it from the "impossibletotrainium" setups? And yes, that's a pretty bad pun. Maybe ChatGPT can do better 😁?
