To cool datacenter servers, Microsoft turns to boiling liquid
Translation: 宋媛媛
Proofreading: 劉海峰
Emails and other communications sent between Microsoft employees are literally making liquid boil inside a steel holding tank packed with computer servers at this datacenter on the eastern bank of the Columbia River.
Unlike water, the fluid inside the couch-shaped tank is harmless to electronic equipment and engineered to boil at 122 degrees Fahrenheit, 90 degrees lower than the boiling point of water.
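
For readers who think in Celsius, the two figures above can be checked with a few lines of arithmetic. This is only an illustrative sketch: the function and constant names are made up for this example, and the 212 °F boiling point of water assumes sea-level pressure.

```python
def fahrenheit_to_celsius(deg_f: float) -> float:
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


WATER_BOILING_F = 212.0  # boiling point of water at sea-level pressure
FLUID_BOILING_F = 122.0  # boiling point of the engineered fluid, as quoted in the article

# Gap between the two boiling points, in Fahrenheit degrees.
gap_f = WATER_BOILING_F - FLUID_BOILING_F

# The same fluid boiling point expressed in Celsius.
fluid_boiling_c = fahrenheit_to_celsius(FLUID_BOILING_F)

print(f"Gap to water's boiling point: {gap_f:.0f} degrees Fahrenheit")
print(f"Fluid boiling point: {fluid_boiling_c:.0f} degrees Celsius")
```

The quoted 90-degree gap is measured in Fahrenheit degrees; in Celsius the fluid boils at half the boiling temperature of water.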
The boiling effect, which is generated by the work the servers are doing, carries heat away from laboring computer processors. The low-temperature boil enables the servers to operate continuously at full power without risk of failure due to overheating.
Inside the tank, the vapor rising from the boiling fluid contacts a cooled condenser in the tank lid, which causes the vapor to change to liquid and rain back onto the immersed servers, creating a closed-loop cooling system.
“We are the first cloud provider that is running two-phase immersion cooling in a production environment,” said Husam Alissa, a principal hardware engineer on Microsoft’s team for datacenter advanced development in Redmond, Washington.
Moore’s Law for the datacenter
The production environment deployment of two-phase immersion cooling is the next step in Microsoft’s long-term plan to keep up with demand for faster, more powerful datacenter computers at a time when reliable advances in air-cooled computer chip technology have slowed.
For decades, chip advances stemmed from the ability to pack more transistors onto the same size chip, roughly doubling the speed of computer processors every two years without increasing their electric power demand.
This doubling phenomenon is called Moore’s Law after Intel co-founder Gordon Moore, who observed the trend in 1965 and predicted it would continue for at least a decade. It held through the 2010s and has now begun to slow.
That’s because transistor widths have shrunk to the atomic scale and are reaching a physical limit. Meanwhile, the demand for faster computer processors for high-performance applications such as artificial intelligence has accelerated, Alissa noted.
To meet the need for performance, the computing industry has turned to chip architectures that can handle more electric power. Central processing units, or CPUs, have increased from 150 watts to more than 300 watts per chip, for example. Graphics processing units, or GPUs, have increased to more than 700 watts per chip.
The more electric power pumped through these processors, the hotter the chips get. The increased heat has ramped up cooling requirements to prevent the chips from malfunctioning.
“Air cooling is not enough,” said Christian Belady, distinguished engineer and vice president of Microsoft’s datacenter advanced development group in Redmond. “That’s what’s driving us to immersion cooling, where we can directly boil off the surfaces of the chip.”
Heat transfer in liquids, he noted, is orders of magnitude more efficient than in air.
What’s more, he added, the switch to liquid cooling brings a Moore’s Law-like mindset to the whole of the datacenter.
“Liquid cooling enables us to go denser, and thus continue the Moore’s Law trend at the datacenter level,” he said.
Lesson learned from cryptocurrency miners
Liquid cooling is a proven technology, Belady noted. Most cars on the road today rely on it to prevent engines from overheating. Several technology companies, including Microsoft, are experimenting with cold plate technology, in which liquid is piped through metal plates, to chill servers.
Participants in the cryptocurrency industry pioneered liquid immersion cooling for computing equipment, using it to cool the chips that log digital currency transactions.
Microsoft investigated liquid immersion as a cooling solution for high-performance computing applications such as AI. Among other things, the investigation revealed that two-phase immersion cooling reduced power consumption for any given server by 5% to 15%.
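
To make the 5% to 15% range concrete, the sketch below applies it to a single server. The 500-watt baseline is a hypothetical figure chosen only for illustration; the article does not give per-server power draw, and the function name is likewise invented for this example.

```python
def power_with_savings(baseline_watts: float, savings_fraction: float) -> float:
    """Return the power draw after applying a fractional saving (0.0 to 1.0)."""
    if not 0.0 <= savings_fraction <= 1.0:
        raise ValueError("savings_fraction must be between 0.0 and 1.0")
    return baseline_watts * (1.0 - savings_fraction)


BASELINE_WATTS = 500.0  # hypothetical air-cooled server draw, not from the article

# The article's reported range of savings: 5% (worst case) to 15% (best case).
worst_case = power_with_savings(BASELINE_WATTS, 0.05)
best_case = power_with_savings(BASELINE_WATTS, 0.15)

print(f"Estimated draw: {best_case:.0f} W to {worst_case:.0f} W")
```

Under that assumed baseline, each server would draw roughly 425 to 475 watts, a saving of 25 to 75 watts per server.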
The findings motivated the Microsoft team to work with Wiwynn, a datacenter IT system manufacturer and designer, to develop a two-phase immersion cooling solution. The first solution is now running at Microsoft’s datacenter in Quincy.
That couch-shaped tank is filled with an engineered fluid from 3M. 3M’s liquid cooling fluids have dielectric properties that make them effective insulators, allowing the servers to operate normally while fully immersed in the fluid.
This shift to two-phase liquid immersion cooling enables increased flexibility for the efficient management of cloud resources, according to Marcus Fontoura, a technical fellow and corporate vice president at Microsoft who is the chief architect of Azure compute.
For example, software that manages cloud resources can allocate sudden spikes in datacenter compute demand to the servers in the liquid-cooled tanks. That’s because these servers can run at elevated power, a process called overclocking, without risk of overheating.
“For instance, we know that with Teams when you get to 1 o’clock or 2 o’clock, there is a huge spike because people are joining meetings at the same time,” Fontoura said. “Immersion cooling gives us more flexibility to deal with these burst-y workloads.”
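
The idea of steering demand spikes toward overclockable servers can be sketched as a simple greedy placement. This is not Microsoft's scheduler, which the article does not describe; the `Server` class, the `place_burst` function, the capacity units and the 25% overclock boost are all hypothetical, chosen only to show the principle.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Server:
    name: str
    immersion_cooled: bool
    base_capacity: float          # work units at the normal clock speed
    overclock_boost: float = 0.0  # hypothetical extra fraction when overclocked
    load: float = 0.0             # work units currently assigned

    def headroom(self) -> float:
        """Spare capacity; only immersion-cooled servers may use the boost."""
        boost = self.overclock_boost if self.immersion_cooled else 0.0
        return self.base_capacity * (1.0 + boost) - self.load


def place_burst(servers: List[Server], demand: float) -> float:
    """Assign a demand spike greedily, immersion-cooled servers first.

    Returns the amount of demand that could not be placed.
    """
    # False sorts before True, so immersion-cooled servers come first.
    ordered = sorted(servers, key=lambda s: not s.immersion_cooled)
    for server in ordered:
        if demand <= 0:
            break
        taken = min(server.headroom(), demand)
        if taken > 0:
            server.load += taken
            demand -= taken
    return demand


fleet = [
    Server("air-1", immersion_cooled=False, base_capacity=100.0, load=80.0),
    Server("tank-1", immersion_cooled=True, base_capacity=100.0,
           overclock_boost=0.25, load=80.0),
]
unplaced = place_burst(fleet, demand=50.0)
for server in fleet:
    print(server.name, server.load)
print("unplaced:", unplaced)
```

In this toy fleet the tank server absorbs most of the spike because overclocking raises its ceiling above its nominal capacity, and only the remainder spills over to the air-cooled server.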
Sustainable datacenters
Adding the two-phase immersion cooled servers to the mix of available compute resources will also allow machine learning software to manage these resources more efficiently across the datacenter, from power and cooling to maintenance technicians, Fontoura added.
“We will have not only a huge impact on efficiency, but also a huge impact on sustainability because you make sure that there is not wastage, that every piece of IT equipment that we deploy will be well utilized,” he said.
Liquid cooling is also a waterless technology, which will help Microsoft meet its commitment to replenish more water than it consumes by the end of this decade.
The cooling coils that run through the tank and enable the vapor to condense are connected to a separate closed-loop system that uses fluid to transfer heat from the tank to a dry cooler outside the tank’s container. Because the fluid in these coils is always warmer than the ambient air, there’s no need to spray water to condition the air for evaporative cooling, Alissa explained.
Microsoft, together with infrastructure industry partners, is also investigating how to run the tanks in ways that mitigate fluid loss and will have little to no impact on the environment.
“If done right, two-phase immersion cooling will attain all our cost, reliability and performance requirements simultaneously with essentially a fraction of the energy spend compared to air cooling,” said Ioannis Manousakis, a principal software engineer with Azure.
‘We brought the sea to the servers’
Microsoft’s investigation into two-phase immersion cooling is part of the company’s multi-pronged strategy to make datacenters more sustainable and efficient to build, operate and maintain.
For example, the datacenter advanced development team is also exploring the potential to use hydrogen fuel cells instead of diesel generators for backup power generation at datacenters.
The liquid cooling project is similar to Microsoft’s Project Natick, which is exploring the potential of underwater datacenters that are quick to deploy and can operate for years on the seabed sealed inside submarine-like tubes without any onsite maintenance by people.
Instead of an engineered fluid, the underwater datacenter is filled with dry nitrogen air. The servers are cooled with fans and a heat exchange plumbing system that pumps piped seawater through the sealed tube.
A key finding from Project Natick is that the servers on the seafloor experienced one-eighth the failure rate of replica servers in a land datacenter. Preliminary analysis indicates that the absence of humidity and of oxygen’s corrosive effects was primarily responsible for the superior performance of the servers underwater.
Alissa anticipates the servers inside the liquid immersion tank will experience similar superior performance. “We brought the sea to the servers rather than put the datacenter under the sea,” he said.
The future
If the servers in the immersion tank experience reduced failure rates as anticipated, Microsoft could move to a model where components are not immediately replaced when they fail. This would limit vapor loss as well as allow tank deployment in remote, hard-to-service locations.
What’s more, the ability to densely pack servers in the tank enables a re-envisioned server architecture that’s optimized for low-latency, high-performance applications as well as low-maintenance operation, Belady noted.
Such a tank, for example, could be deployed under a 5G cellular communications tower in the middle of a city for applications such as self-driving cars.
For now, Microsoft has one tank running workloads in a hyperscale datacenter. For the next several months, the Microsoft team will perform a series of tests to prove the viability of the tank and the technology.
“This first step is about making people feel comfortable with the concept and showing we can run production workloads,” Belady said.
Ioannis Manousakis, a principal software engineer with Azure, removes a server blade from a two-phase immersion cooling tank at a Microsoft datacenter. Photo by Gene Twedt for Microsoft.
Source: CDCC