
Last month, U.S. financial markets tumbled after a Chinese start-up called DeepSeek said it had built one of the world’s most powerful artificial intelligence systems using far fewer computer chips than many experts thought possible.

A.I. companies typically train their chatbots using supercomputers packed with 16,000 specialized chips or more. But DeepSeek said it needed only about 2,000.

As DeepSeek engineers detailed in a research paper published just after Christmas, the start-up used several technological tricks to significantly reduce the cost of building its system. Its engineers needed only about $6 million in raw computing power, roughly one-tenth of what Meta spent in building its latest A.I. technology.

What exactly did DeepSeek do? Here is a guide.

The leading A.I. technologies are based on what scientists call neural networks, mathematical systems that learn their skills by analyzing enormous amounts of data.

The most powerful systems spend months analyzing just about all the English text on the internet as well as many images, sounds and other multimedia. That requires enormous amounts of computing power.
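To make "learning from data" concrete, here is a minimal sketch, written for this guide rather than taken from any company's code: the system makes predictions, measures its error against the data, and nudges its internal numbers to shrink that error, over and over. The toy data and the learning rate below are invented for illustration.

```python
# A minimal sketch of what it means for a neural network to "learn":
# guess, measure the error on the data, nudge the numbers, repeat.
import numpy as np

rng = np.random.default_rng(0)

# Toy "data": inputs x and the outputs y we want the system to predict.
x = rng.normal(size=(100, 1))
y = 3.0 * x + 1.0 + rng.normal(scale=0.1, size=(100, 1))

# The "network" here is just two numbers it must learn: a weight and a bias.
w, b = 0.0, 0.0
learning_rate = 0.1

for step in range(200):
    prediction = w * x + b
    error = prediction - y
    # Gradients: how much the error changes if we nudge w or b.
    grad_w = 2 * np.mean(error * x)
    grad_b = 2 * np.mean(error)
    # Nudge the parameters to reduce the error.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"learned w={w:.2f}, b={b:.2f}")  # should land near 3.0 and 1.0
```

Real chatbots repeat this same basic loop across billions of internal numbers and vast amounts of text, which is why they need so much computing power.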

About 15 years ago, A.I. researchers realized that specialized computer chips called graphics processing units, or GPUs, were an effective way of doing this kind of data analysis. Companies like the Silicon Valley chipmaker Nvidia originally designed these chips to render graphics for computer video games. But GPUs also had a knack for running the math that powered neural networks.

As companies packed more GPUs into their computer data centers, their A.I. systems could analyze more data.

But the best GPUs cost around $40,000, and they need huge amounts of electricity. Sending the data between chips can use more electrical power than running the chips themselves.
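The arithmetic at the heart of that work is mostly matrix multiplication, which a GPU can spread across thousands of small processing cores at once. The sketch below, which assumes the PyTorch library and a machine that may or may not have an Nvidia GPU attached, shows the kind of operation involved.

```python
# A rough illustration (not production code) of the workload GPUs accelerate.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Two large grids of random numbers stand in for model weights and data.
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

# A single matrix multiplication: tens of billions of individual
# multiply-and-add steps, the arithmetic GPUs turned out to be good at.
c = a @ b
print(c.shape, device)
```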

DeepSeek did many things differently. Most notably, it embraced a method called “mixture of experts.”

Companies usually created a single neural network that learned all the patterns in all the data on the internet. This was expensive, because it required enormous amounts of data to travel between GPU chips.

If one chip was learning how to write a poem and another was learning how to write a computer program, they still needed to talk to each other, just in case there was some overlap between poetry and programming.

With the mixture of experts method, researchers tried to solve this problem by splitting the system into many neural networks: one for poetry, one for computer programming, one for biology, one for physics and so on. There might be 100 of these smaller “expert” systems. Each expert could concentrate on its particular field.

Many companies have struggled with this method, but DeepSeek was able to do it well. Its trick was to pair those smaller “expert” systems with a “generalist” system.

The experts still needed to trade some information with one another, and the generalist — which had a decent but not detailed understanding of each subject — could help coordinate interactions between the experts.

It is a bit like an editor’s overseeing a newsroom filled with specialist reporters.
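Here is a highly simplified sketch of the idea, written for illustration rather than taken from DeepSeek's paper; the sizes, the routing rule and the names below are all invented for clarity. A small "router" picks a couple of specialists for each piece of text, and a generalist network is consulted every time.

```python
# A toy "mixture of experts" layer with a generalist alongside the specialists.
# This illustrates the general technique, not DeepSeek's actual architecture.
import numpy as np

rng = np.random.default_rng(0)

DIM = 16          # size of each token's numeric representation (made up)
NUM_EXPERTS = 8   # number of specialist networks (made up)
TOP_K = 2         # how many specialists handle each token (made up)

# Each "expert" is reduced to a single weight matrix for brevity.
experts = [rng.normal(size=(DIM, DIM)) for _ in range(NUM_EXPERTS)]
generalist = rng.normal(size=(DIM, DIM))      # consulted for every token
router = rng.normal(size=(DIM, NUM_EXPERTS))  # scores the experts for each token

def moe_layer(token):
    # The router scores every expert for this token...
    scores = token @ router
    # ...but only the top-scoring few are actually run, so most of the
    # network sits idle and far less data has to move around.
    chosen = np.argsort(scores)[-TOP_K:]
    weights = np.exp(scores[chosen])
    weights = weights / weights.sum()

    output = token @ generalist  # the generalist always contributes
    for w, idx in zip(weights, chosen):
        output = output + w * (token @ experts[idx])
    return output

token = rng.normal(size=DIM)
print(moe_layer(token).shape)  # (16,)
```

Because only a few experts are active for any given piece of text, far less data has to shuttle between chips.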

That approach saved DeepSeek a great deal of computing power. But it was not the only thing DeepSeek did. The start-up also mastered a simple trick involving decimals that anyone who remembers his or her elementary school math class can understand.

Remember your math teacher explaining the concept of pi. Pi, also denoted as π, is a number that never ends: 3.14159265358979 …

You can use π to do useful calculations, like determining the circumference of a circle. When you do those calculations, you shorten π to just a few decimals: 3.14. If you use this simpler number, you get a pretty good estimation of a circle’s circumference.
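Here is that rounding idea as a quick calculation: the two-decimal shortcut gives a circumference that is only a few hundredths off.

```python
# A circle's circumference with full-precision pi versus a two-decimal shortcut.
import math

radius = 10.0
exact = 2 * math.pi * radius    # 62.8318...
rough = 2 * 3.14 * radius       # 62.8

print(exact, rough, exact - rough)  # difference of about 0.03
```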

DeepSeek did something similar — but on a much larger scale — in training its A.I. technology.

The math that allows a neural network to identify patterns in text is really just multiplication — lots and lots and lots of multiplication. We’re talking months of multiplication across thousands of computer chips.

Typically, chips multiply numbers that fit into 16 bits of memory. But DeepSeek squeezed each number into only 8 bits of memory — half the space. In essence, it lopped several decimals from each number.

This meant that each calculation was less accurate. But that didn’t matter. The calculations were accurate enough to produce a really powerful neural network.
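A rough way to see the trade-off is to squeeze a handful of numbers into 8 bits and back. DeepSeek used an 8-bit floating-point format supported by its GPUs; the sketch below uses simple 8-bit integers instead, which is not the same format but shows the same effect: the numbers get slightly coarser, not wildly wrong.

```python
# Simulating what storing numbers in 8 bits does to precision.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=5).astype(np.float32)

# Squeeze each value into one of 256 levels (the most 8 bits can hold).
scale = np.abs(weights).max() / 127.0
quantized = np.round(weights / scale).astype(np.int8)
restored = quantized.astype(np.float32) * scale

print(weights)
print(restored)
print("max error:", np.abs(weights - restored).max())
```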

How did DeepSeek keep the results accurate enough? Its engineers added another trick.

After squeezing each number into 8 bits of memory, DeepSeek took a different route when multiplying those numbers together. When determining the answer to each multiplication problem — making a key calculation that would help decide how the neural network would operate — it stretched the answer across 32 bits of memory. In other words, it kept many more decimals. It made the answer more precise.
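The sketch below shows why that matters, using 16-bit numbers as a stand-in for DeepSeek's 8-bit format, since NumPy has no 8-bit floating-point type. Storing the individual values compactly while keeping the running total in 32 bits leaves the final answer much closer to the exact one.

```python
# Compact storage, wider accumulation: an illustration of the general idea.
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
a = rng.normal(size=N).astype(np.float16)  # compactly stored values
b = rng.normal(size=N).astype(np.float16)

# Reference answer, computed in 64-bit precision.
exact = np.dot(a.astype(np.float64), b.astype(np.float64))

# Keeping the running total in the compact format lets rounding errors pile up...
narrow = np.float16(0.0)
for x, y in zip(a, b):
    narrow = np.float16(narrow + x * y)

# ...while a 32-bit running total stays much closer to the true answer,
# even though each individual product is still computed in the compact format.
wide = np.float32(0.0)
for x, y in zip(a, b):
    wide = wide + np.float32(x * y)

print("error with a 16-bit running total:", abs(float(narrow) - exact))
print("error with a 32-bit running total:", abs(float(wide) - exact))
```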

Those number tricks were not the whole story, either. The DeepSeek engineers showed in their paper that they were also very good at writing the very complicated computer code that tells GPUs what to do. They knew how to squeeze even more efficiency out of these chips.

Few people have that kind of skill. But serious A.I. labs have the talented engineers needed to match what DeepSeek has done.

Some A.I. labs may be using at least some of the same tricks already. Companies like OpenAI do not always reveal what they are doing behind closed doors.

But others were clearly surprised by DeepSeek’s work. Doing what the start-up did is not easy. The experimentation needed to find a breakthrough like this involves millions of dollars — if not billions — in electrical power.

In other words, it requires enormous amounts of risk.

“You have to put a lot of money on the line to try new things — and often, they fail,” said Tim Dettmers, a researcher at the Allen Institute for Artificial Intelligence in Seattle who specializes in building efficient A.I. systems and previously worked as an A.I. researcher at Meta.

“That is why we don’t see much innovation: People are afraid to lose many millions just to try something that doesn’t work,” he added.

Many pundits pointed out that DeepSeek’s $6 million covered only what the start-up spent when training the final version of the system. In their paper, the DeepSeek engineers said they had spent additional funds on research and experimentation before the final training run. But the same is true of any cutting-edge A.I. project.

DeepSeek experimented, and it paid off. Now, because the Chinese start-up has shared its methods with other A.I. researchers, its technological tricks are poised to significantly reduce the cost of building A.I.
