WP3: Implementation of the prototype at CNAF (status)
WP3 objectives
- Cluster: SoC, low-power, no low-latency network
- Technology tracking
- Design, installation and management of the cluster
- Testing, tuning and benchmarking
- Installation, configuration and maintenance of the software tools required by the other WPs for performance studies and for using the cluster (compilers, libraries, development frameworks, etc.)
- Verification, together with the other WPs, of alternatives to software and compilers/libraries that are missing (or reworked) on the new SoC architectures
- Study and implementation of software for monitoring the main metrics of interest to the project
WP3 depends on WP2 until PM9 for the decision on the SoC platform on which the CNAF cluster will be based.
CNAF cluster status (pre-COSA)
CNAF cluster status (COSA “certified”)

#   Model         SoC                  ISA     Cores                    TDP (W)
1   Supermicro    Intel C2750          x86-64  8x Avoton                20
6   Jetson-k1-01  Nvidia K1            ARMv7   4x A15                   15
1   Odroid-XU3    Samsung Exynos 5422  ARMv7   4x A15, 4x A7            5
1   Cubieboard    Allwinner A80        ARMv7   4x A15, 4x A7            5
1   Odroid-XU     Samsung Exynos 5410  ARMv7   4x A15 (4x A7)           5
1   Arndale       Samsung Exynos 5420  ARMv7   4x A15 (4x A7)           5
1   SABREboard    Freescale i.MX6      ARMv7   4x A9                    5
12  ARMv7 Cards                        ARMv7   20x A15, 4x A9, 8x A7    40
Cluster network
[Diagram: MASTER (fanless x86-64) sits between the ARM nodes (/24 subnet, eth1) and the CNAF network (/23 subnet, eth0, reachable via ssh)]
Cluster services (provided by the master)
- FIREWALL: allow rules. SSH/Apache for everyone; DHCP/BOOTP/DNS/TFTP/LDAP for cluster nodes
- NAT: a unique external IP
- DHCP: for cluster nodes (a dedicated port)
- TFTP: for BOOTP/PXE installation
- NFS: for cluster nodes
- LDAP: for cluster users
- (SLURM): for cluster nodes
The software is currently installed on bare metal (x86 hardware).
Next step: software packed in a VM or a Docker container.
COSA power network
[Diagram: 230V AC input with power probe; 12V and 5V DC lines to the cards, each with a DC probe; probes read out with LabVIEW]
PSU and cables
- PSU: HX1000i
  - 12 lines at 12V (Jetson)
  - 6 lines at 5V
- Cables: GRIDSEED, from 1 Molex to 6 barrel connectors
Measurement and lab power equipment
- Power supply
- Power analyzer
- Digital multimeter
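The power readings can be turned into an energy figure for a benchmark run. The following is a minimal sketch, assuming the probes deliver timestamped voltage and current samples; the `(t, V, I)` tuple format and the function names are hypothetical, not the actual LabVIEW output.

```python
def energy_joules(samples):
    """Integrate P = V * I over time with the trapezoidal rule.

    samples: list of (t_seconds, volts, amps) tuples sorted by time
    (assumed format, not the real probe output).
    """
    energy = 0.0
    for (t0, v0, i0), (t1, v1, i1) in zip(samples, samples[1:]):
        # Average power over the interval times the interval length.
        energy += 0.5 * (v0 * i0 + v1 * i1) * (t1 - t0)
    return energy


def mean_power_watts(samples):
    """Mean power over the whole trace, in watts."""
    duration = samples[-1][0] - samples[0][0]
    if duration <= 0:
        raise ValueError("need at least two samples with increasing time")
    return energy_joules(samples) / duration
```

For example, a 12V line drawing a constant 1A for two seconds integrates to 24 J, which corresponds to a mean power of 12 W.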
Tuning: a truly heterogeneous environment

CPU
  AllWinner A80: 4x A15 + 4x A7
  Nvidia Tegra K1: 4x A15
  Samsung Exynos 5422: 4x A15 + 4x A7
L1 cache: 32KB/32KB
L2 cache: 2MB + 512KB
GPU
  AllWinner A80: PowerVR G6230 (64 cores)
  Nvidia Tegra K1: Kepler GK20a (192 cores)
  Samsung Exynos 5422: ARM Mali-T628 MP6
GPU API
  AllWinner A80: OpenGL ES 3.0, OpenCL 1.x, DirectX 9.3
  Nvidia Tegra K1: OpenGL ES 3.1, OpenGL 4.4, OpenCL 1.2, CUDA 6.0, DirectX 12
  Samsung Exynos 5422: OpenGL ES 3.0, OpenCL 1.1, DirectX 11
Decoder (cells as extracted, per-SoC assignment not recoverable): 1080p30: H.265/VP9; 1440p30: H.264/VP8; H.264/VP8; H.264 and VP8; 1440p: H.264/VP8; H.264/VP8
Memory interfaces
  AllWinner A80: DDR3/DDR3L/LPDDR3 (8GB), raw NAND with 72-bit ECC, eMMC v4.5
  Nvidia Tegra K1: DDR3L, LPDDR3 (8GB), eMMC 4.5
  Samsung Exynos 5422: LPDDR3/DDR3, eMMC 5.0
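USB
  AllWinner A80: 2x USB host, 1x USB 3.0/2.0 host/device, HSIC
  Nvidia Tegra K1: 2x USB 3.0, 3x USB 2.0, HSIC
  Samsung Exynos 5422: 2x USB 3.0, 1x USB 2.0, 1x HSIC
Ethernet (cells as extracted): 1x Ethernet MAC; N/A
TS interface (cells as extracted): No data; 1x TS
SATA
  AllWinner A80: N/A
  Nvidia Tegra K1: SATA 3.1
  Samsung Exynos 5422: N/A
PCIe
  AllWinner A80: N/A
  Nvidia Tegra K1: 5-lane PCIe with Gen 1 (2.5 GT/s) and Gen 2 (5.0 GT/s) speeds
  Samsung Exynos 5422: N/A
Audio I/F
  AllWinner A80: PCM/I2S
  Nvidia Tegra K1: PCM/I2S, S/PDIF
  Samsung Exynos 5422: 1x PCM, 2x I2S, 1x S/PDIF
Other I/Os
  AllWinner A80: 4x SPI, 7x TWI, 7x UART
  Nvidia Tegra K1: 3x I2C, 2x SPI, UART, up to 64 MPIO (Multi Purpose IO)
  Samsung Exynos 5422: 4x I2C, 7x HS-I2C, 3x SPI, 5x UART, GPIOs, 24-channel DMA controller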
TUNING (1/4): Core frequencies

Board        Cores    Max freq
JETSON       4x A15   2.3 GHz
ODROID-XU3   4x A15   2.0 GHz
             4x A7    1.4 GHz
CUBIEBOARD   4x A15   1.6 GHz
             4x A7    1.2 GHz

- cpufreq utils for ARM/Intel cores
- nvi utils for Nvidia
- MP cores: online/offline cores
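The frequency and online/offline controls mentioned above are also exposed through the Linux sysfs tree, which is what the cpufreq utilities read and write. The following is a minimal sketch, assuming a Linux kernel with cpufreq enabled; the helper names are hypothetical and the `root` parameter exists only so the functions can be pointed at a test directory.

```python
from pathlib import Path

SYSFS_CPU = Path("/sys/devices/system/cpu")  # standard Linux sysfs location


def read_khz(cpu, name, root=SYSFS_CPU):
    """Return a cpufreq attribute in kHz (e.g. 'cpuinfo_max_freq'), or None if absent."""
    path = Path(root) / f"cpu{cpu}" / "cpufreq" / name
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def max_freq_ghz(cpu, root=SYSFS_CPU):
    """Hardware maximum frequency of one core in GHz, or None if not reported."""
    khz = read_khz(cpu, "cpuinfo_max_freq", root)
    return None if khz is None else khz / 1e6


def set_online(cpu, online, root=SYSFS_CPU):
    """Bring a core online (True) or take it offline (False).

    On real hardware this needs root privileges, and cpu0 often cannot be
    taken offline at all.
    """
    (Path(root) / f"cpu{cpu}" / "online").write_text("1" if online else "0")
```

On a big.LITTLE board, looping `max_freq_ghz` over all core indices is a quick way to see which indices map to the A15 cluster and which to the A7 cluster.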
TUNING (2/4): Memory bandwidth
- Single- and dual-channel LPDDRx, DDRx
- GPU and CPU share the same memory (no data transfer between separate memories, unlike a traditional CPU/GPU architecture)
- Benchmarks: STREAM and others
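To illustrate what a STREAM-style measurement reports, the sketch below times a plain buffer copy and converts it to a bandwidth figure. It is not the STREAM benchmark itself, only a rough stand-in for its Copy kernel using the Python standard library; the function name and default sizes are arbitrary choices.

```python
import time


def copy_bandwidth_gbs(n_bytes=64 * 1024 * 1024, repeats=5):
    """Best-of-N bandwidth in GB/s for copying one buffer into another."""
    src = bytearray(n_bytes)
    dst = bytearray(n_bytes)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        dst[:] = src  # one read stream plus one write stream
        best = min(best, time.perf_counter() - t0)
    best = max(best, 1e-9)  # guard against a zero reading on coarse timers
    # Like STREAM Copy, count both the bytes read and the bytes written.
    return 2 * n_bytes / best / 1e9


if __name__ == "__main__":
    print(f"copy bandwidth: {copy_bandwidth_gbs():.2f} GB/s")
```

The buffer should be much larger than the last-level cache (2MB on these SoCs), otherwise the figure reflects cache bandwidth rather than DRAM bandwidth.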
TUNING (3/4): Storage speed
- SD
- eMMC 5.0 (supposed to be the fastest)
- SSD (if a SATA interface is present)
- NFS
- Benchmarks: IOzone, Bonnie++ and others
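A quick sequential write/read probe can give a first comparison between SD, eMMC, SSD and NFS mount points before running IOzone or Bonnie++. The following is a minimal sketch using only the Python standard library; the function name, file name and default sizes are hypothetical.

```python
import os
import time


def sequential_throughput(directory, size_mb=64, block_kb=1024):
    """Write then read back one file; return (write_MBps, read_MBps, bytes_read)."""
    block = os.urandom(block_kb * 1024)
    n_blocks = size_mb * 1024 // block_kb
    path = os.path.join(directory, "io_probe.tmp")

    t0 = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n_blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # make sure the data reached the device
    write_s = time.perf_counter() - t0

    t0 = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        # The read may be served from the page cache, so it can overstate
        # the device speed unless caches are dropped first.
        while chunk := f.read(block_kb * 1024):
            total += len(chunk)
    read_s = time.perf_counter() - t0

    os.remove(path)
    mb = total / (1024 * 1024)
    return mb / max(write_s, 1e-9), mb / max(read_s, 1e-9), total
```

Calling it once per mount point (for example the SD card, the eMMC module and the NFS export) gives comparable numbers as long as the file size exceeds the available RAM or the caches are flushed between runs.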
TUNING (4/4): Network bandwidth and latency

Board        PHY Ethernet link
JETSON       PCI-E
ODROID-XU3   USB-ETH0 bridge
CUBIEBOARD   Directly to the MAC link of the SoC

- Benchmarks: iperf and others
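Latency differences between the three Ethernet attachments can be probed with a simple TCP ping-pong. The sketch below runs both ends on the loopback interface so that it is self-contained; for a real measurement the echo side would run on a second node. All names are hypothetical and this is not a replacement for iperf.

```python
import socket
import threading
import time


def _echo_server(listener):
    """Accept one connection and echo everything back until the peer closes."""
    conn, _ = listener.accept()
    with conn:
        while data := conn.recv(65536):
            conn.sendall(data)


def _recv_exact(sock, n):
    """Read exactly n bytes from a stream socket."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return bytes(buf)


def mean_round_trip(host="127.0.0.1", rounds=200, payload=64):
    """Mean TCP round-trip time in seconds for a small message."""
    listener = socket.create_server((host, 0))  # port 0: let the OS pick one
    port = listener.getsockname()[1]
    server = threading.Thread(target=_echo_server, args=(listener,), daemon=True)
    server.start()
    msg = b"x" * payload
    with socket.create_connection((host, port)) as s:
        # Disable Nagle so small messages are sent immediately.
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        t0 = time.perf_counter()
        for _ in range(rounds):
            s.sendall(msg)
            _recv_exact(s, payload)
        elapsed = time.perf_counter() - t0
    server.join(timeout=2)
    listener.close()
    return elapsed / rounds
```

Raising `payload` to several megabytes turns the same loop into a rough bandwidth probe, since the transfer time then dominates over the per-message latency.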
The SoC market
- After a two-year period (2013/2014) rich in new products, the market for boards based on ARM SoCs has slowed down markedly (probably because of the ARMv7 -> ARMv8 transition)
  - Tegra X1: only announced
  - Xeon-D: only announced
- Good news: ARMv8 SoCs are already a reality in the mobile world
  - Exynos 7420, Snapdragon 810, Mediatek Helio X10/X20, Kirin 930
  - A57/A72 and A53
  - Up to 10 cores (X20)
Next steps
- Buy new hardware
- Input from the other WPs
- Consolidate the cluster
- Single testing/benchmark framework
- Single GUI
- User-friendly access
- Automatic installation (Puppet/Foreman, etc.)
- Continuous build integration
- GitHub repository
- Improve knowledge of the bootloader (U-Boot)
- Install GPU tools other than CUDA (OpenMP 4, OpenCL, C++ AMP, Cilk Plus, etc.)
- Test new CPUs/GPUs in an Android environment if Linux is not available (find the differences between glibc and Bionic)