DeepSeek has released a new paper, with co-founder Liang Wenfeng credited as a contributor, detailing how its latest large language model, DeepSeek-V3, achieves efficient training and inference using only 2,048 H800 GPUs, significantly fewer than the tens of thousands typically required. The team attributes this efficiency to four key innovations: memory optimization through multi-head latent attention (MLA), computational savings via a Mixture-of-Experts (MoE) design with FP8 precision, communication improvements using a multi-plane network topology, and faster inference through multi-token prediction (MTP). With MLA, KV cache memory usage is cut to just 70KB per token, as little as one-seventh that of competing models. The MoE architecture activates only 37 billion of the model’s 671 billion parameters per forward pass, reducing training costs by 90% compared to dense models. FP8 training further halves compute and memory usage, with minimal accuracy tradeoff. Beyond the model, the paper also outlines five future directions for AI hardware design, advocating for tighter integration between software and hardware to address memory, compute, and networking bottlenecks. [36Kr, in Chinese]
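To put the quoted figures in perspective, here is a minimal back-of-the-envelope sketch in Python. It uses only the numbers reported above (37 billion active parameters out of 671 billion, and 70KB of KV cache per token). The 32,768-token context length and the reading of 1KB as 1,000 bytes are illustrative assumptions, not figures from the paper.

```python
# Figures quoted in the brief; everything else is an illustrative assumption.
TOTAL_PARAMS = 671_000_000_000   # total parameters in DeepSeek-V3
ACTIVE_PARAMS = 37_000_000_000   # parameters activated per forward pass
KV_CACHE_BYTES_PER_TOKEN = 70 * 1000  # "70KB per token", assuming 1 KB = 1000 bytes


def active_fraction(active=ACTIVE_PARAMS, total=TOTAL_PARAMS):
    """Share of the model's parameters used in a single forward pass."""
    return active / total


def kv_cache_bytes(num_tokens, bytes_per_token=KV_CACHE_BYTES_PER_TOKEN):
    """Total KV cache size for a sequence of num_tokens tokens."""
    return num_tokens * bytes_per_token


if __name__ == "__main__":
    print(f"Active parameters per pass: {active_fraction():.1%}")

    context_tokens = 32_768  # hypothetical context length, not from the paper
    gigabytes = kv_cache_bytes(context_tokens) / 1e9
    print(f"KV cache for {context_tokens} tokens: {gigabytes:.2f} GB")
```

Under these assumptions, roughly 5.5% of the parameters are active per forward pass, and the KV cache for a 32,768-token sequence comes to about 2.3 GB.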