Feb 11, 2024

Unleashing An Open Source Torrent On CPUs And AI Engines

When you combine the forces of open source and the wide and deep semiconductor experience of legendary chip architect Jim Keller, something interesting is bound to happen. And that is precisely the plan with AI startup and now CPU maker Tenstorrent.

Tenstorrent was founded in 2016 by Ljubisa Bajic, Milos Trajkovic, and Ivan Hamer and is headquartered in Toronto. Keller was an angel investor and an advisor to the company from the get-go, and was brought in as chief technology officer in January 2021 after a stint at Intel’s server business, where he cleaned up some architectural and process messes as he did in a previous job at AMD. In January of this year, Keller was tapped to replace Bajic as chief executive officer, and the company is today announcing that it will bring in somewhere between $120 million and $150 million in its Series D funding, with Hyundai Motor Group and Samsung Catalyst Fund leading the round and with prior investors Fidelity Ventures, Eclipse Ventures, Epiq Capital, Maverick Capital, and others kicking in dough. To date, that will bring the investment kitty to somewhere north of $384.5 million and will probably boost its valuation above $1.4 billion.

All that money is interesting, and necessary to pay for the substantial amount of engineering work that the Tenstorrent team needs to do to create a line of commercial-grade RISC-V server processors and AI accelerators to match them and, more importantly, to take on the hegemony of the Nvidia GPU in AI training. It is going to take money – and maybe a lot more money, and maybe not – to help companies cut the costs of AI training. What we do know is that Keller thinks he has just the team to do it, and we had a chat with him about the Tenstorrent mission, one that we have been looking forward to.

We will do a deep dive on the Tenstorrent CPU and AI engine architectures in a follow-up.

Timothy Prickett Morgan: Let’s cut right to the chase scene. I have been dying to ask you this question because your answer matters. Why the hell do we need another AI accelerator?

Jim Keller: Well, the world abhors monopoly.

TPM: Yeah, but we have got so many different companies already in the game. None of it has worked to my satisfaction. It’s not like the Groq guys took the TPU idea, commercialized it, we’re done. It’s not like MapReduce and Yahoo Hadoop. Nirvana Systems and Habana Labs both had what I think were good architectures, and Intel has not had huge success with either. Graphcore and SambaNova are reasonable, Cerebras has waferscale and that is interesting. Esperanto is in there, too, with RISC-V. And everybody, as far as I can see, has a billion dollar problem to get to the next level. I know RISC-V is important, that it is the Linux of hardware and we’ve been waiting a long time for that moment. Using RISC-V to build an accelerator is the easy part of making an architectural choice.

What is it that Tenstorrent can do that is different, better? I don’t expect you to spill all the architectural beans today, but what is driving you, and why?

Jim Keller: There are a bunch of things. First, whenever there’s a big hype cycle, more people get investments than are properly supportable by the industry. Ljubisa Bajic, one of the co-founders of Tenstorrent, and I had long chats when SambaNova and Cerebras had sky high valuations. So they raised a lot of money, and they started spending a lot of money, and we did the opposite. We had a $1 billion valuation post funding round last time and we were offered more money at higher valuations. And then we thought: Then what? Down rounds like everybody else? That’s really hard on your company. It kind of puts both your employees and your investors in a bad spot. So we raised less money at a lower valuation because we are in it for the long term.

Now, we have analyzed what Cerebras, Graphcore, SambaNova, Groq, and the others are doing, and they all have something interesting or they wouldn’t get funded.

You can say, well, we’re not going to make those mistakes and we have something to bring to the table.

I don’t think GPUs are the be all and end all of how to run AI programs. Everybody who describes an AI program, they describe a graph, and the graph needs to be lowered with interesting software transformations and mapped to the hardware. That turns out to be a lot harder than is obvious for a bunch of reasons. But we feel like we’re actually making real progress on that. So we can make an AI computer that’s performant, and that works well and is scalable. We’re getting there.
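As a rough illustration of what "lowering a graph and mapping it to hardware" can mean at its very simplest, the toy sketch below orders the operations of a small compute graph so that producers come before consumers, then assigns them to cores round-robin. The graph, the op names, the core count, and the `lower_to_cores` helper are all hypothetical; this is not Tenstorrent's compiler, and real graph compilers also fuse, tile, and schedule operations, which is where the difficulty Keller mentions lies.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical toy graph: each op maps to the set of ops it depends on.
graph = {
    "input": set(),
    "matmul_1": {"input"},
    "gelu": {"matmul_1"},
    "matmul_2": {"gelu"},
    "add": {"matmul_2", "input"},
}

def lower_to_cores(deps, num_cores):
    """Order ops so producers precede consumers, then place them round-robin.

    This is a deliberately naive placement, used only to show the shape of
    the problem (graph in, per-core assignment out).
    """
    order = list(TopologicalSorter(deps).static_order())
    placement = {op: i % num_cores for i, op in enumerate(order)}
    return order, placement

order, placement = lower_to_cores(graph, num_cores=4)
for op in order:
    print(f"core {placement[op]}: {op}")
```

Even in this toy, the placement ignores how much data moves between cores, which is exactly the kind of cost a production compiler has to model.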

The other thing is that we started building a RISC-V processor – and we at Tenstorrent had long chats about this – and we think the future is going to be mostly AI. There is going to be interaction between general purpose CPUs and AI processors, and that program and software stack, and they are going to be on the same chip. And then there’s going to be lots of innovation in that space. And I called my good friends at Arm and said that we want to license it and it was too expensive and they didn’t want to modify it. So we decided to build our own RISC-V processor. And we raised money partly on the last round on that thesis that RISC-V is interesting.

When we told customers about this, we were somewhat surprised – positively surprised – that people wanted to license the RISC-V processor standalone. And then we also found that some people who were interested in RISC-V are also interested in our AI intellectual property. When you look at the business model of Nvidia, AMD, Habana, and so on, they’re not licensing their IP to anybody. So people have come to us and they tell us that if we can prove our CPU or AI accelerator works – and the proof is silicon that runs – then they are interested in licensing the IP, both CPU and AI accelerator, to go build their own products.

The cool thing about building your own product is that you can own and control it and not pay 60 percent or 80 percent gross margin to someone else. So when people tell us Nvidia has already won and ask why Tenstorrent would compete, it is because whenever there’s a monopoly with really high margins that creates business opportunities.

TPM: This is a similar argument going on right now between InfiniBand, which is controlled by Nvidia, and the Ultra Ethernet Consortium. People keep telling me that Ethernet has been trying to kill InfiniBand since it was born. And I remind them that they are not competing with InfiniBand because it is dying. For the first time in two and a half decades, it is thriving. Same thing with Intel CPUs in the datacenter. There was no way 50 percent operating income for Data Center Group was going to hold over the long term. That kind of profit doesn’t just attract competition, it fuels it.

Jim Keller: In the real world, the actual gross margin is always somewhere in between. If you go much under 10 percent, you are going to really struggle to make any money and if you go over 50 percent you are going to invite competition.
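To make the margin figures in this exchange concrete, here is a minimal sketch of the arithmetic, assuming the standard definition of gross margin as (price - cost) / price. The function names are illustrative only; the percentages are the ones quoted in the conversation.

```python
def price_for_margin(unit_cost: float, gross_margin: float) -> float:
    """Selling price that yields the given gross margin on a unit cost.

    Gross margin is taken to be (price - cost) / price, so
    price = cost / (1 - margin).
    """
    if not 0.0 <= gross_margin < 1.0:
        raise ValueError("gross_margin must be in [0, 1)")
    return unit_cost / (1.0 - gross_margin)

def markup_multiple(gross_margin: float) -> float:
    """How many times its cost a product sells for at a given gross margin."""
    return price_for_margin(1.0, gross_margin)

# Margins mentioned in the interview: 10, 50, 60, and 80 percent.
for margin in (0.10, 0.50, 0.60, 0.80):
    print(f"{margin:.0%} gross margin: price is {markup_multiple(margin):.2f}x cost")
```

Under that definition, a buyer paying an 80 percent gross margin is paying about five times what the product costs to make, which is the gap a licensee building its own chip hopes to keep.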

Then there is the open source angle to all of this. The cool thing about open source is people can contribute. And then they can also have an opportunity to own it, or take a copy of it and do interesting stuff. Hardware is expensive to generate, taping out stuff is hard. But there are quite a few people building their own chips and they want to go do stuff.

Here is my thesis: We are going to start to generate more and more code with AI, and then the AI programs are an interaction between general purpose computing and AI computing, this is going to create, like a whole new wave of innovation. And AI has been fairly unique in that it has been amazingly open with models and frameworks – and then it’s running on very proprietary hardware.

TPM: A lot of the frameworks and models are not open source, and even those that are sometimes have commercial restrictions, like LLaMA, or have been closed off, like OpenAI in the transition from GPT-3 and GPT-3.5 to GPT-4.

Jim Keller: Yeah, there has been some very uneven terrain, I agree.

TPM: But I agree, there has been an element of openness to all of this. I would say something similar to relational databases decades ago.

So here is the question about open hardware: When you create a RISC-V processor, do you have to give it all back? What’s the licensing model?

Jim Keller: Here is the line that we are walking. RISC-V is an open source architecture, we have people contributing to that architecture definition. The reference model is open source, the guy who wrote the Whisper instruction set simulator works for us. We created a vector unit and contributed that. We built an RTL version of a vector unit and then open sourced that. We talked to a bunch of students and they said the infrastructure is good, but we need more test infrastructure. So we’re working on open sourcing our RTL verification infrastructure.

RISC-V now owns the university research for computer architecture. It’s the de facto, default thing. Our AI processor has a RISC-V engine inside of it, and we’ve been trying to figure out how we open source a RISC-V AI processor. Students want to be able to do experiments; they want to be able to download something, simulate it, make modifications, try and change it. And so we have a software stack on our engine, which we’re cleaning up so we can open source it, which we’re going to do this year. And then our hardware implementation has too many, let’s say, dirty bits in the hardware – you know, proprietary things. And we’re trying to figure out how to build an abstract version, which is a pretty clean RISC-V AI processor. And I would like to open source that because the cool thing about open source is once people start doing it and contribute to it, it grows. Open source is a one way street in this way: When people went to Linux, nobody went back to Unix.

I think we’re like 1 percent to maybe 5 percent of the way into the AI journey. I think there’s going to be so many experiments going on and open source is an opportunity for people to contribute. Just imagine, going back five years, if there was an open source AI engine. Instead of doing fifty random different things that didn’t work, imagine if they were doing their own random versions of an open source thing, but contributing back.

TPM: And that open source thing worked. Like GPT-3, for instance.

Jim Keller: Well, or that the net of all those people generated a really credible alternative to Nvidia that worked.

I’ve talked to lots of AI companies and when I was at Tesla, I saw lots of engines. And twenty companies would have 50 people working for two years building exactly the same thing the other nineteen companies all did. If that had been open source development, that would have moved a lot faster.

Some open source stuff, like PyTorch, has been open for a while, but the way the project ran wasn’t great, and PyTorch 2.0 fixed that. TVM is open source – we use that and it’s actually quite good. We will see what happens with Chris Lattner’s company, Modular AI, and the Mojo programming language. He says he’s going to open source Mojo, which does additional software compiler transformations. But we don’t have a clean target underneath that that drives some of the stuff. And so I was just talking to my guys today about how do we get our reference model cleaned up and make this a good open source AI engine reference model that people can add value to?

And once again, I think we’re in the short, you know, the early innings on how AI hardware is going to be built.

TPM: What’s your revenue model? You are going to build and sell things and you are going to license things, I assume?

Jim Keller: We build hardware. The initial idea was we’re going to build this great hardware. Last year, we got our first ten models working. We thought we had a path to maybe 30 models to 50 models, and we kind of stalled out. So we decided to refactor the code – we did two major rewrites of our software stack. And we are now onboarding some customers on the hardware we built. We did an announcement with LG, we have several more AI companies coming along the pipe. Then we built this RISC-V CPU, which is very high end. SiFive is a fine company, but their projects are kind of in the middle, Ventana’s a little higher than that. And people kept telling us: We would like a very high-end CPU. So we’re building a very high-end CPU, and we are in discussions with ten organizations to license that.

We are a design company. We design a CPU, we design an AI engine, we design an AI software stack.

So whether it’s soft IP, a hard IP chiplet, or a complete chip, those are implementations. We are flexible on that front. For instance, on the CPU, we are going to license it multiple times before we tape out our own chiplet. We are talking to like a half a dozen companies who want to do like custom memory chiplets or NPU accelerators. I think for our next generation, both CPU and AI, we are going to build CPU and AI chiplets. But then other people will do other chiplets. And then we’ll put them together into systems.

TPM: They’re going to do the assembly and the systems and all that, and you’re not interested in literally making a package that you sell to Hewlett Packard, Dell, or whoever?

Jim Keller: We’ll see what happens. The weird thing is, you really have to build it to show it. People say, I would really like to build a billion of those, so show me 1,000. So we built a small cloud, we have 1,000 of our AI chips in the cloud. When we first started, we were just going to put the chips in servers and give people access. It’s really easy. There’s Linux running, or you can have bare metal.

TPM: That was my next question. If you look at companies like Cerebras and SambaNova, they are really becoming cloud vendors or suppliers to specific cloud vendors looking for a niche and also a way to get AI done cheaper and easier than with GPUs from Nvidia. By my math, it looks like you need around $1 billion to train a next-gen AI model, and that money has to come from somewhere, or a way has to be found to do it cheaper.

Jim Keller: I’d say about half the AI software startups don’t even know you can buy computers. We talk to them, we get them interested, and then they ask if they can try it out on the cloud. On the flip side, as companies scale up, they start realizing that they are paying 3X or more to run AI on the clouds than in their own datacenters – it depends on what you are buying and what your amortization time is. It’s really expensive.

If we design a CPU and an AI accelerator that’s compelling, there are channels to the market: IP, chiplets, chips, systems, and cloud. It looks like to prove what you’re doing, you have to make chips, systems, and clouds to give people access to it. And then the key point is, can you build a business, build an engineering team, raise money, and generate revenue. Our investors mostly say we don’t need you to make a billion dollars, we need to sell tens of millions of dollars worth of stuff to show signal that the customers will pay for it – that it works and that they want it. And that’s the mission we’re on right now.

We’re on the journey. I told somebody recently, when things don’t work, you have a science project; when things work, you have a spreadsheet problem. A spreadsheet is like this. Our current chips are in Globalfoundries 12 nanometer. And somebody says, how fast would it be if you ported it to 3 nanometers? There’s no rocket science to it. You know the performance of GF12 and TSMC 5N and 3N, and you just spreadsheet it out and then ask, “Is that a compelling product?”
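The "spreadsheet problem" can be sketched in a few lines: take a measured baseline, apply assumed scaling factors for a target process node, and look at performance per unit cost. Everything below is an assumption for illustration: the `NodeEstimate` type, the `perf_per_cost` helper, and above all the scaling numbers, which are placeholders and not measured or vendor data for any real process.

```python
from dataclasses import dataclass

@dataclass
class NodeEstimate:
    """Assumed scaling for a target process node, relative to the baseline node."""
    name: str
    perf_scale: float  # performance relative to the baseline (placeholder value)
    cost_scale: float  # chip cost relative to the baseline (placeholder value)

def perf_per_cost(baseline_perf: float, baseline_cost: float, est: NodeEstimate) -> float:
    """Projected performance per unit cost after porting to the target node."""
    return (baseline_perf * est.perf_scale) / (baseline_cost * est.cost_scale)

# Normalized baseline: the chip on its current node is 1.0 perf at 1.0 cost.
baseline_perf, baseline_cost = 1.0, 1.0

# Placeholder rows; replace with real estimates before drawing any conclusion.
candidates = [
    NodeEstimate("current node", perf_scale=1.0, cost_scale=1.0),
    NodeEstimate("hypothetical target node", perf_scale=2.0, cost_scale=1.5),
]

for est in candidates:
    ratio = perf_per_cost(baseline_perf, baseline_cost, est)
    print(f"{est.name}: {ratio:.2f}x baseline perf per cost")
```

The point of the exercise is the final question in the quote: once the inputs are known, whether the product is compelling is a matter of reading off the ratio.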

Did I think we were going to have to do all these things when I started? No, not really. But then again, is it surprising that as a company selling full function computers that you have to do everything? So I used to joke that when you build a product, there’s the 80/20 rule, which is 20 percent of the effort gets you 80 percent of the results. And then there’s the 100 percent rule, which is you have to do 100 percent of the things that customers need to be successful.

TPM: In the modern era, companies don’t have to buy one of everything interesting to see what really works and what doesn’t. So that’s an improvement. But no matter the deployment model, the costs for AI training are very high.

Jim Keller: This is always true during a boom cycle. I have talked to multiple VCs that say they are raising $50 million for an AI software startup and $40 million of that will end up going to Nvidia. When you’re in a rush, that’s a good answer. And then you think, well, I could get the same performance from Tenstorrent for $10 million, but you have to do a lot more work. And then they talk about the time value of money, and then they spend the money now. But when the hype cycle starts to wear off, people start asking why they are spending this much money on stuff. What are the credible alternatives? How do we lower the cost?

TPM: You will be standing there. How much lower can you make AI training costs with Tenstorrent chips?

Jim Keller: Our target is 5X to 10X cheaper.

TPM: To be precise, 5X to 10X cheaper than GPU systems of similar performance.

Jim Keller: Yeah. There’s some technical reasons for that. We use significantly less memory bandwidth because we have a graph compiler and our architecture is more of a dataflow machine than are GPUs, so we can send data from one processing element to another. As soon as you use an HBM silicon interposer, it gets very expensive. One of the things that’s crazy right now is if you look at Nvidia’s markup on an H100 SXM5, most of the silicon content is from Samsung or SK Hynix. There is more value in the HBM DRAMs than in the Nvidia GPU silicon. And furthermore, if you want to build your own product, is Nvidia going to sell you an IP block or customize it for you? No.
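To put the 5X to 10X target in concrete terms, the sketch below applies it to the figures quoted earlier in the conversation: a $50 million raise of which $40 million goes to GPU hardware. The variable and function names are illustrative; the dollar amounts and the cost factors come from the interview, and the calculation assumes similar delivered performance, as stated in the exchange above.

```python
def reduced_spend(current_spend: float, cheaper_factor: float) -> float:
    """Spend for the same performance if hardware is `cheaper_factor` times cheaper."""
    if cheaper_factor <= 0:
        raise ValueError("cheaper_factor must be positive")
    return current_spend / cheaper_factor

raise_total = 50_000_000  # funding round quoted in the interview
gpu_spend = 40_000_000    # portion of that round quoted as going to Nvidia

print(f"Share of round spent on GPUs: {gpu_spend / raise_total:.0%}")

for factor in (5, 10):
    new_spend = reduced_spend(gpu_spend, factor)
    freed = gpu_spend - new_spend
    print(f"{factor}X cheaper: ${new_spend / 1e6:.0f}M on hardware, "
          f"${freed / 1e6:.0f}M freed up")
```

By the same arithmetic, the $10 million figure Keller uses in his example corresponds to a 4X reduction, a little short of the stated target.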

TPM: Do you have any desire to do networking, or are you just focused on compute? I am hoping you give the right answer here.

Jim Keller: We have network ports on our chips, so we can hook them together in large arrays without going through somebody else’s switch. This is one of those reasons why, technically, our approach is cheaper than Nvidia’s approach. Nvidia likes selling high margin InfiniBand switches. We build a box where we don’t need that.

In their current situation, Nvidia is a big margin generator. In our situation, we ask why would you put an InfiniBand switch between a couple hundred chips? Why not just have the chips talk to each other directly? I’ve talked to a couple of really cool storage startups with really interesting products, and then they tell me their mission is to have really high margins. I tell them our mission is to really drive the cost of this down. You have to pick your mission.

So if somebody comes to me and they want to license rights to our technology so they can modify it and build their own products, I think that’s a great idea because I think innovation is going to accelerate when more people are able to take something solid, and then work on it. And that’s partly because I have confidence that we’ll learn from whoever we partner with. We have some really good designers and we’re thinking hard about our next generation.

TPM: So how do you shoot the gap between being Arm before SoftBank acquired it and Arm after SoftBank did and Nvidia was chasing it? You want to be Arm, not twisted Arm.

Jim Keller: At the moment, we are a venture funded company, and our investors want our technology to work and want positive signal on our ability to build and sell product, which is what we’re focused on.

We just raised a round with Samsung and Hyundai for two different reasons.

Samsung knows me pretty well because I’ve done products with them at Digital Equipment, Apple, Tesla, and Intel – and they were all successful. They are interested in server silicon, in autonomous driving silicon, and AI silicon. So RISC-V will be a generator of revenue, and they want to invest in that.

Hyundai came out of the talks we are having with every automotive company on the planet, and they all feel the industry needs to do something about the hold Mobileye and Nvidia have on them. They would like to have options, and many of the car makers would like to own their own solutions. Hyundai got very interested in us and said they wanted to invest, and they have become the number three automaker and they just bought Boston Dynamics, and they partner with Aptiv through Motional. They are making money building cars and other products, and they are very forward leaning.

In an environment where there’s going to be rapid change, you build a team around great people, and then you raise money. We’re raising over $100 million on an up round, in a tough market, and to be honest, it took a lot longer to close than it did last time, that’s for sure. I like working with Samsung, I had a lot of success with them. They’re a good, solid fab. They have a big IP portfolio, and we’re going to help them build a premium product and bring it to market. The Hyundai guys are great, and I have talked to a bunch of people. They’re super smart. They want to build chips, they want to go fast. There’s lots of opportunities.
