CRAY Chapel Aggregation Library (CAL) November 12, 2018



Louis Jenkins, Marcin Zalewski (Pacific Northwest National Lab.), Michael Ferguson (Cray Inc.)

The Problem

[Figure: a task on Node #0 reaches local RAM and the L1/L2 caches with Load/Store, and data on Node #1 with GET/PUT.]
[Figure: a task on Node #0 migrates to Node #1, where the data resides.]

• Accessing remote data is slow
  ▪ Multiple orders of magnitude slower to access than local memory
• “Moving the computation to the data” not always the best solution
  ▪ Using an on
    statement requires migrating tasks to another locale
    ✓ Can become bottleneck if fine-grained
    ✓ Task creation is relatively expensive
      • Tasks are too large to spawn in a fire-and-forget manner (issue #9984)
      • Migrating tasks require individual active messages (issue #9727)

[Figure: the heap of Node #1 filled with tasks, each with its own task stack.]

A Solution

• Coarsen the granularity of the data
  ▪ Buffer units of data to be sent to a locale in destination buffers
  ▪ When buffer is full, it can be flushed to be handled by the user
  ▪ User can perform coalescing to combine aggregated data

[Figure: a destination buffer from Locale #0 to Locale #1 fills up; once flushed, a task on Locale #0 sends the data, optionally coalesced, to a task on Locale #1.]

Chapel’s Multiresolution Design Philosophy

[Figure: layered diagram with the Communications Layer, Tasking Layer, Memory Layer, and Atomics Implementation; begin-stmt, on-stmt, forall-stmt, and coforall-stmt; and the User Program.]

• Higher Level composed of Lower Level abstractions, features, and language constructs
  ▪ Changes to lower level propagate up to higher level
  ▪ User free to use either
    ✓ High-Level for convenience
    ✓ Low-Level for performance

Global-View Programming

• Abstracts locality for the user
  ▪ No need to think: “What portion of the array does this task own?”
  ▪ Array
    can be accessed from any locale, even if it is not distributed over that locale…
    ✓ Remote references are resolved into remote PUT/GET implicitly

MPI

    float globalSum = 0;
    float localSum = 0;
    for (int i = localStart; i < localEnd; i++) {
      localSum += arr[i];
    }
    MPI_REDUCE(localSum, globalSum, ...);

Chapel

    var sum : real;
    forall a in arr with (+ reduce sum) {
      sum += a;
    }

• Multiresolution: More Abstraction

    var sum = + reduce arr;

• Multiresolution: Less Abstraction

    var sum : real;
    coforall loc in Locales with (+ reduce sum) do on loc {
      coforall tid in 0..#here.maxTaskPar with
          (+ reduce sum) {
        for i in computeRange(arr.domain.localSubdomain(), tid) {
          sum += arr[i];
        }
      }
    }

Chapel Aggregation Library (CAL)

• Written in Chapel, for Chapel
  ▪ Minimal and User-Friendly
    ✓ Unassuming of how data is handled
    ✓ Designed specifically for Chapel
  ▪ Distributed, Scalable, and Parallel-Safe
    ✓ Supports Global-View Programming
    ✓ Usable with Chapel’s parallel and locality constructs
  ▪ Modular, Reusable, and Generic
    ✓ Generic on user-defined type
    ✓ Easy to use and ’plug in’

    const msg = "From Locale#0 to Locale#1";
    const loc = Locales[1];
    var aggregator = new Aggregator(string);
    var buffer = aggregator.aggregate(msg, loc);
    if buffer != nil then handleBuffer(buffer);
    [(buf, loc) in aggregator.flush()] on loc do handleBuffer(buf);

Minimalism

• CAL is an aggregation library
  ▪ Processing of the aggregated data is deferred to the user
  ▪ Buffer is returned to the last task that filled it

Distributed Object Pattern

• Use privatization to enable global-view programming
  ▪ GlobalClass forwards access to per-locale LocalClass privatized instances
  ▪ Each privatized instance can communicate and coordinate with others

[Figure: code listing of the pattern, shown for Locale#0 through Locale#N.]

Aggregator

• Aggregator forwards all accesses to per-locale privatized instances
• Distributed and parallel access is abstracted
  ▪ Supports global-view programming

[Figure: code listing of the Aggregator and its privatized instances.]

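
The buffering behaviour described under "A Solution" and "Minimalism" can be sketched in a few lines of plain Chapel. This is a serial, single-locale illustration under stated assumptions, not CAL's implementation: the names bufferSize, numDests, handleBuffer, aggregate, and flushAll are hypothetical, destinations are plain integers standing in for locales, and the handler is called directly instead of the buffer being returned to the task that filled it.

```chapel
// Hypothetical sketch of destination buffering; not CAL's implementation.
config const bufferSize = 4;   // assumed capacity of each destination buffer
config const numDests = 2;     // stand-in for the number of destination locales

// One fixed-size buffer and one fill count per destination.
var buffers: [0..#numDests, 0..#bufferSize] int;
var counts: [0..#numDests] int;
var delivered: [0..#numDests] int;   // values handled per destination
var sends = 0;                       // number of coarse-grained "sends"

// Stand-in for the user-defined handler that processes a buffer of n values.
proc handleBuffer(dest: int, n: int) {
  delivered[dest] += n;
  sends += 1;
}

// Buffer one value for dest; hand the buffer over once it is full.
proc aggregate(value: int, dest: int) {
  buffers[dest, counts[dest]] = value;
  counts[dest] += 1;
  if counts[dest] == bufferSize {
    handleBuffer(dest, counts[dest]);
    counts[dest] = 0;
  }
}

// Hand over whatever remains in partially filled buffers.
proc flushAll() {
  for dest in 0..#numDests {
    if counts[dest] > 0 {
      handleBuffer(dest, counts[dest]);
      counts[dest] = 0;
    }
  }
}

// Ten fine-grained values are turned into a handful of coarse-grained sends.
for i in 1..10 do aggregate(i, i % numDests);
flushAll();

const total = + reduce delivered;
writeln("values delivered: ", total, ", sends: ", sends);
```

With the default settings, every value reaches its destination while the number of sends stays well below the number of values. A parallel version would additionally need synchronization around each buffer, which the sketch leaves out.
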

Aggregator - Performance

• 10x – 100x speedup at 32 nodes
  ▪ Histogram
  ▪ Hypergraph Generation

Distributed - Example

• Aggregator is allocated on Locale#0, but accessible from Locale#1
  ▪ Accesses are forwarded to Locale#1’s privatized instance
  ▪ Global-View Programming
• Implicit parallelism (line 9) vs Explicit parallelism (line 11)

     1  var aggregator = new Aggregator(int);
     2  // Migrate to Locale #1 from Locale #0
     3  on Locales[1] {
     4    // Aggregate single value to Locale #0
     5    var buffer = aggregator.aggregate(0, Locales[0]);
     6    // If non-nil, then handle buffer.
     7    if buffer != nil then handleBuffer(buffer);
     8    // Aggregate multiple units of data via Chapel's implicit parallelism
     9    var buffers = aggregator.aggregate(1..1024, Locales[0]);
    10    // Handle each returned buffer that is not nil
    11    [buf in buffers] if buf != nil then handleBuffer(buf);
    12  }

Modularity

• Composition of Distributed Objects
  ▪ Aggregator can be used within other global-view data structures
  ▪ Future of Distributed Object Oriented Programming (?)

[Figure: code listing of a distributed object, shown for Locale#0 through Locale#N.]

Future Work

• Software release of CAL
  ▪ Currently only available as a module under Chapel HyperGraph Library (CHGL)
    ✓ github.com/pnnl/chgl
  ▪ Independent release coming soon (?)
• Integration into Chapel
  ▪ Mason package or Standard Module (?)
  ▪ Run-time integration
• Aggregation handlers as first-class functions
  ▪ Once Chapel has better first-class function support

Potential Application: Light Weight Tasks (LWT)

• Chapel tasks are infeasible to use in a fire-and-forget manner
  ▪ Stack size of tasks in Chapel is static and large (8MB default)
  ▪ Task migration can be made asynchronous but is not aggregated
• Solution – Make a library for LWT
  ▪ Use Distributed Object pattern for Global-View programming
  ▪ Use Aggregator for aggregation
  ▪ Use First-Class Functions (once improved) to represent a lightweight task

    var lwt = new LWT(visit);
    proc visit(v : Vertex) {
      for vv in neighbors(v) {
        if hasProperty(vv) {
          lwt.spawn(vv, vv.locale);
        }
      }
    }
    forall v in vertices {
      if hasProperty(v) {
        lwt.spawn(v);
      }
    }

Vertex Degree Distribution

    // Find largest degree of all vertices in distributed graph
    var N = max reduce [v in graph.getVertices()] graph.degree(v);
    // Histogram is cyclically distributed over all locales
    var histogramDomain = {1..N} dmapped Cyclic(startIdx=1);
    var histogram : [histogramDomain] atomic int;

    // Aggregate increments to histogram
    var aggregator = new Aggregator(int);
    forall v in graph.getVertices() {
      const deg = graph.degree(v);
      const loc = histogram[deg].locale;
      var buffer = aggregator.aggregate(deg, loc);
      if buffer != nil {
        on loc do [deg in buffer] histogram[deg].add(1);
        buffer.done();
      }
    }

    // Flush
    forall (buf, loc) in aggregator.flush() {
      on loc do [deg in buf] histogram[deg].add(1);
      buf.done();
    }
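
For comparison, the result the aggregated histogram has to reproduce can be computed without any aggregation. The sketch below is a single-locale baseline with made-up input: the degrees array is hypothetical stand-in data rather than output of a CHGL graph, and maxDegree is an illustrative name. Each vertex triggers one fine-grained atomic increment, which is the per-element operation the Aggregator batches by destination locale.

```chapel
// Hypothetical stand-in data: the degrees of eight vertices.
const degrees = [1, 3, 2, 3, 3, 1, 2, 3];

// Largest degree determines the histogram's index range.
const maxDegree = max reduce degrees;
var histogram: [1..maxDegree] atomic int;

// Unaggregated baseline: one atomic increment per vertex.
forall deg in degrees with (ref histogram) do
  histogram[deg].add(1);

for d in 1..maxDegree do
  writeln("degree ", d, ": ", histogram[d].read());
```

On a distributed histogram each of these increments may target a remote locale, so the cost grows with the number of vertices rather than with the number of buffers sent.
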
