FESR, Consorzio COMETA - PI2S2 Project
Software Components of the Infrastructure
Alberto Falzone, NICE s.r.l.
23 July 2007, Catania

Outline
– Available compilers
– Available libraries for parallel computing
– MPI integration with Platform LSF 6.2
– Local submission of parallel jobs
– Submission of parallel jobs via gLite/LCG
– Submission of FLASH (Sedov problem) via gLite/LCG
– Online demo

Available compilers: GNU GCC Suite (Red Hat )
– gcc, g++, g77

    $ gcc -v
    Reading specs from /usr/lib/gcc/x86_64-redhat-linux/3.4.6/specs
    Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --disable-checking --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-java-awt=gtk --host=x86_64-redhat-linux
    Thread model: posix
    gcc version (Red Hat )

g++ -v and g77 -v print exactly the same specs path, configure line, thread model, and version string.

Available compilers: PGI Compiler V bit target on x86-64 Linux
– pgcc, pgCC, pgf77, pgf90, pgf95, pghpf, Java (JRE)
– Installation directory: /opt/share/pgi

    $ pgcc -V
    pgcc bit target on x86-64 Linux
    $ pgCC -V
    pgCC bit target on x86-64 Linux
    $ pgf77 -V
    pgf bit target on x86-64 Linux
    $ pgf90 -V
    pgf bit target on x86-64 Linux
    $ pgf95 -V
    pgf bit target on x86-64 Linux
    $ pghpf -V
    pghpf bit target on x86-64 Linux
    $ which pgcc
    /opt/share/pgi/linux86-64/7.0/bin/pgcc

Available compilers: Intel Compiler Version 9.1
– icc (C/C++), ifort (F77, F90)

    $ icc -V
    Intel(R) C Compiler for Intel(R) EM64T-based applications, Version 9.1 Build Package ID: l_cc_c_
    Copyright (C) Intel Corporation. All rights reserved.
    FOR NON-COMMERCIAL USE ONLY
    $ ifort -V
    Intel(R) Fortran Compiler for Intel(R) EM64T-based applications, Version 9.1 Build Package ID: l_fc_c_
    Copyright (C) Intel Corporation. All rights reserved.
    FOR NON-COMMERCIAL USE ONLY

Currently available under my personal license, to allow testing and non-commercial applications.

Available compilers: Intel Compiler Version 9.1
– icc (C/C++), ifort (F77, F90)
– The environment is not set up by default.
– To enable it, source the profile scripts:

    . /opt/share/intel/cce/ /bin/iccvars.(c)sh
    . /opt/share/intel/fce/ /bin/ifortvars.(c)sh

Infiniband network
– OpenFabrics (OFED) layer supplied by IBM
– Installation directory: /usr/local/ofed
– Configured interface: ib0
– TCP over IB on X

    $ ifconfig ib0
    ib0  Link encap:UNSPEC HWaddr
         inet addr: Bcast: Mask:
         UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1
         RX packets:276 errors:0 dropped:0 overruns:0 frame:0
         TX packets:50 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:128
         RX bytes:16212 (15.8 KiB) TX bytes:3756 (3.6 KiB)

    $ tvflash -i
    HCA #0: MT25208 Tavor Compat, BC2 HSDC, revision A0
      Primary image is v build , with label 'HCA.HSDC.A0'
      Secondary image is v build , with label 'HCA.HSDC.A0'
      Vital Product Data
        Product Name: BC2 HSDC
        P/N:
        E/C: Rev: A01
        S/N: CAM105000GV
        Freq/Power: PW=15W;PCIe 8x
        Date Code: 0650
        Checksum: Ok

    $ ibstatus mthca0:1
    Infiniband device 'mthca0' port 1 status:
        default gid: fe80:0000:0000:0000:0005:ad00:0009:556f
        base lid:    0xc
        sm lid:      0x2
        state:       4: ACTIVE
        phys state:  5: LinkUp
        rate:        10 Gb/sec (4X)

Available MPI libraries
– Libraries built with PGI

MPICH

    $ /opt/share/mpich/bin/mpichversion
    MPICH Version: 1.2.7p1
    MPICH Release date: $Date: 2005/11/04 11:54:51$
    MPICH Patches applied: none
    MPICH configure: --prefix=/opt/share/mpich --with-device=ch_p4
    MPICH Device: ch_p4

MPICH2

    $ /opt/share/mpich2/bin/mpich2version
    Version:
    Device: ch3:sock
    Configure Options: '--prefix=/opt/share/mpich2' '--with-pm=mpd' 'CC=pgcc' 'CXX=pgCC' 'F77=pgf77' 'F90=pgf90'
    CC: pgcc
    CXX: pgCC
    F77: pgf77
    F90: pgf90

Available MPI libraries
– Libraries built with PGI

MVAPICH (linked against 64-bit OFED)

    $ /opt/share/mvapich/bin/mpichversion
    MPICH Version:
    MPICH Release date: $Date: 2005/06/22 16:33:49$
    MPICH Patches applied: none
    MPICH configure: --with-device=ch_gen2 --with-arch=LINUX -prefix=/opt/share/mvapich --with-romio --without-mpe -lib=-L/usr/local/ofed/lib64 -Wl,-rpath=/usr/local/ofed/lib64 -libverbs -lpthread
    MPICH Device: ch_gen2

MVAPICH2 (TCP over IB)

    $ /opt/share/mvapich2/bin/mpich2version
    Version: MVAPICH
    Device: ch3:sock
    Configure Options: --prefix=/opt/share/mvapich2 --with-pm=mpd --without-mpe

Integrating the MPI libraries with LSF 6.2
– Integration is done through dedicated wrappers.
– Tags available to bsub:
  – -a mpichp4: Gigabit network
  – -a mpich2: Infiniband network (IPoIB)
  – -a mvapich: Infiniband network (OFED link)
  – -a mvapich2: Infiniband network (IPoIB)

Examples:

    bsub -a mpichp4 -n 8 -q short mpirun.lsf -np 8
    bsub -a mpich2 -n 8 -q short mpirun.lsf -np 8
    bsub -a mvapich -n 8 -q short mpirun.lsf -np 8
    bsub -a mvapich2 -n 8 -q short mpirun.lsf -np 8
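
The tag list above lends itself to a small lookup table. The following is a minimal sketch, not part of LSF: the helper name build_bsub_argv and the dictionary LSF_MPI_TAGS are hypothetical, and the only options used are the ones shown on the slides (-a, -n, -q, and mpirun.lsf -np). It builds the argument list without running anything.

```python
# Hypothetical helper: nothing here is part of LSF itself.
# The tag-to-network table is copied from the slide.
LSF_MPI_TAGS = {
    "mpichp4": "Gigabit",
    "mpich2": "Infiniband (IPoIB)",
    "mvapich": "Infiniband (OFED link)",
    "mvapich2": "Infiniband (IPoIB)",
}


def build_bsub_argv(tag, nprocs, queue, program, *program_args):
    """Return the argv list for a parallel bsub submission."""
    if tag not in LSF_MPI_TAGS:
        raise ValueError(f"unknown MPI tag: {tag!r}")
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    return [
        "bsub", "-a", tag, "-n", str(nprocs), "-q", queue,
        "mpirun.lsf", "-np", str(nprocs), program, *map(str, program_args),
    ]


if __name__ == "__main__":
    argv = build_bsub_argv("mvapich", 8, "short",
                           "/opt/share/mvapich/examples/cpi", 8)
    print(" ".join(argv))
```

Keeping -n and -np tied to the same value avoids asking LSF for a different number of slots than mpirun.lsf will start.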

Integrating the MPI libraries with LSF 6.2
– Tags available to bsub, currently only at INAF CT because they were explicitly requested:
  – -a mpich2_intel: Infiniband network (IPoIB)
  – -a mvapich2_intel: Infiniband network (IPoIB)
– Built with Intel Compiler V bit

Examples:

    bsub -a mpich2_intel -n 8 -q short mpirun.lsf -np 8
    bsub -a mvapich2_intel -n 8 -q short mpirun.lsf -np 8

Submitting parallel jobs locally
– Works with both shared and non-shared $HOME.
– Submission command: bsub
– Most important options:
  – -q: queue
  – -n: number of processors
  – -a: type of MPI job
  – -c: CPU time limit (less than or equal to the one set on the queue)
  – -W: run time limit (less than or equal to the one set on the queue)
  – -o: output file for stdout (%J = LSF job ID)
  – -e: error file for stderr
  – -f "localfile operator remotefile": file transfer, for use when the home directory is not shared

Example:

    bsub -q short -n 8 -a mpich2 -o out.%J mpirun.lsf -np 8 /opt/share/mpich2/examples/cpi 8
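
For the non-shared home case, the -f option has to be repeated for each file. The sketch below assumes the two basic bsub operators: ">" copies the local file to the execution host before the job starts, and "<" copies the remote file back after the job completes. The helper name stage_args and the file names are hypothetical, and remote names are simply taken to be the same as the local ones.

```python
# Hypothetical helper for hosts that do not share $HOME.
# bsub -f takes a single "localfile operator remotefile" argument;
# ">" sends the local file before the job starts,
# "<" fetches the remote file once the job has completed.
def stage_args(stage_in=(), stage_out=()):
    """Return bsub -f arguments for files to send and to fetch back."""
    argv = []
    for name in stage_in:
        argv += ["-f", f"{name} > {name}"]
    for name in stage_out:
        argv += ["-f", f"{name} < {name}"]
    return argv


if __name__ == "__main__":
    # Example: send the executable, bring back a result file.
    print(stage_args(stage_in=["cpi"], stage_out=["result.dat"]))
```

The returned list can be spliced into the bsub argument list before the mpirun.lsf part of the command.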

Submitting parallel jobs locally: the bjobs command
– Most important options:
  – -q: queue
  – -u: user or user group
  – -a: lists all jobs, including finished ones, of the selected user or user group

Example:

    $ bjobs -a
    JOBID USER    STAT QUEUE FROM_HOST  EXEC_HOST    JOB_NAME   SUBMIT_TIME
          hpcuser DONE short infn-wn-01 4*infn-wn-0  *les/cpi 8 Jul 21 15:46
                                        4*infn-wn
          hpcuser DONE short infn-wn-01 4*infn-wn-0  *les/cpi 8 Jul 21 15:47
                                        4*infn-wn-05
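
When bjobs is called from a script, only the first four columns (JOBID, USER, STAT, QUEUE) are easy to read reliably, because the later columns may be truncated or wrapped onto continuation lines. A minimal sketch of such a parser follows; the function name parse_bjobs is hypothetical, and the sample row reuses the values of job 8624 from the MPICH example in this deck.

```python
def parse_bjobs(text):
    """Return one dict per job row found in bjobs output."""
    jobs = []
    for line in text.splitlines():
        fields = line.split()
        # Job rows start with a numeric JOBID; the header and the
        # continuation lines listing extra execution hosts do not.
        if len(fields) >= 4 and fields[0].isdigit():
            jobs.append({
                "jobid": int(fields[0]),
                "user": fields[1],
                "stat": fields[2],
                "queue": fields[3],
            })
    return jobs


SAMPLE = """JOBID USER STAT QUEUE FROM_HOST EXEC_HOST JOB_NAME SUBMIT_TIME
8624 hpcuser RUN short infn-wn-01 4*infn-wn-0 *les/cpi 8 Jul 21 15:46
 4*infn-wn-02
"""

if __name__ == "__main__":
    print(parse_bjobs(SAMPLE))
```

A message such as "No unfinished job found" contains no numeric first field, so it yields an empty list.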

Submitting parallel jobs locally: MPICH (-a mpichp4)

    $ bsub -a mpichp4 -n 8 -q short mpirun.lsf -np 8 /opt/share/mpich/examples/cpi 8
    Job is submitted to queue.
    $ bjobs
    JOBID USER    STAT QUEUE FROM_HOST  EXEC_HOST    JOB_NAME   SUBMIT_TIME
    8624  hpcuser PEND short infn-wn-01              *les/cpi 8 Jul 21 15:46
    $ bjobs
    JOBID USER    STAT QUEUE FROM_HOST  EXEC_HOST    JOB_NAME   SUBMIT_TIME
    8624  hpcuser RUN  short infn-wn-01 4*infn-wn-0  *les/cpi 8 Jul 21 15:46
                                        4*infn-wn-02
    $ bjobs
    No unfinished job found
    $ bjobs 8624
    JOBID USER    STAT QUEUE FROM_HOST  EXEC_HOST    JOB_NAME   SUBMIT_TIME
    8624  hpcuser DONE short infn-wn-01 4*infn-wn-0  *les/cpi 8 Jul 21 15:46
                                        4*infn-wn-02
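
The job ID needed by bjobs can be read from the bsub confirmation message. The angle-bracketed fields are missing from the transcript above; the sketch below assumes the usual LSF wording, "Job <8624> is submitted to queue <short>.", and the function name parse_bsub_output is hypothetical.

```python
import re

# Assumed format of the bsub confirmation line (usual LSF wording):
#   Job <8624> is submitted to queue <short>.
_SUBMIT_RE = re.compile(
    r"Job <(\d+)> is submitted to (?:default )?queue <([^>]+)>"
)


def parse_bsub_output(text):
    """Return (job_id, queue) from bsub's confirmation message."""
    match = _SUBMIT_RE.search(text)
    if match is None:
        raise ValueError("no LSF submission message found")
    return int(match.group(1)), match.group(2)


if __name__ == "__main__":
    print(parse_bsub_output("Job <8624> is submitted to queue <short>."))
```

The returned ID is also the value that replaces %J in the -o and -e file names.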

Submitting parallel jobs locally: MPICH (-a mpichp4)

    $ cat out.8625
    Sender: LSF System
    Subject: Job 8625: Done

    Job was submitted from host by user.
    Job was executed on host(s), in queue, as user.
    was used as the home directory.
    was used as the working directory.
    Started at Sat Jul 21 15:47:
    Results reported at Sat Jul 21 15:48:

    Your job looked like:

    # LSBATCH: User input
    mpirun.lsf -np 8 /opt/share/mpich/examples/cpi

    Successfully completed

    (continued)

Submitting parallel jobs locally: MPICH (-a mpichp4)

    The output (if any) follows:

    Process 1 on infn-wn-06.ct.pi2s2.it
    Process 7 on infn-wn-05.ct.pi2s2.it
    Process 0 on infn-wn-06.ct.pi2s2.it
    Process 6 on infn-wn-05.ct.pi2s2.it
    Process 5 on infn-wn-05.ct.pi2s2.it
    Process 2 on infn-wn-06.ct.pi2s2.it
    Process 3 on infn-wn-06.ct.pi2s2.it
    Process 4 on infn-wn-05.ct.pi2s2.it
    pi is approximately , Error is
    wall clock time =
    P4 procgroup file is /opt/share/home/hpcuser/.lsf_8625_genmpi_pifile.
    Job /opt/lsf/6.2/linux2.6-glibc2.3-x86_64/bin/mpichp4_wrapper -np 8 /opt/share/mpich/examples/cpi 8

    TID   HOST_NAME  COMMAND_LINE     STATUS                  TERMINATION_TIME
    ===== ========== ================ ======================= ===================
          infn-wn-05 /opt/share/mpich Done                    07/21/ :48:
          infn-wn-05 /opt/share/mpich Done                    07/21/ :48:
          infn-wn-05 /opt/share/mpich Done                    07/21/ :48:
          infn-wn-05 /opt/share/mpich Done                    07/21/ :48:
          infn-wn-06 /opt/share/mpich Done                    07/21/ :48:
          infn-wn-06 /opt/share/mpich Done                    07/21/ :48:
          infn-wn-06 /opt/share/mpich Done                    07/21/ :48:
          infn-wn-06 /opt/share/mpich Done                    07/21/ :48:05

Submitting parallel jobs locally: MVAPICH (-a mvapich)

    $ bsub -a mvapich -n 8 -q short -o out.%J mpirun.lsf -np 8 /opt/share/mvapich/examples/cpi 8
    Job is submitted to queue.
    $ bjobs
    JOBID USER    STAT QUEUE FROM_HOST  EXEC_HOST    JOB_NAME   SUBMIT_TIME
    8626  hpcuser PEND short infn-wn-01              *les/cpi 8 Jul 21 16:08
    $ bjobs
    JOBID USER    STAT QUEUE FROM_HOST  EXEC_HOST    JOB_NAME   SUBMIT_TIME
    8626  hpcuser RUN  short infn-wn-01 4*infn-wn-0  *les/cpi 8 Jul 21 16:08
                                        4*infn-wn-07
    $ bjobs 8626
    JOBID USER    STAT QUEUE FROM_HOST  EXEC_HOST    JOB_NAME   SUBMIT_TIME
    8626  hpcuser DONE short infn-wn-01 4*infn-wn-0  *les/cpi 8 Jul 21 16:08
                                        4*infn-wn-07
    $ bjobs -w 8626
    JOBID USER    STAT QUEUE FROM_HOST  EXEC_HOST                 JOB_NAME                                           SUBMIT_TIME
    8626  hpcuser DONE short infn-wn-01 4*infn-wn-01:4*infn-wn-07 mpirun.lsf -np 8 /opt/share/mvapich/examples/cpi 8 Jul 21 16:08

Submitting parallel jobs locally: MVAPICH (-a mvapich)

    $ cat out.8626
    Sender: LSF System
    Subject: Job 8626: Done

    Job was submitted from host by user.
    Job was executed on host(s), in queue, as user.
    was used as the home directory.
    was used as the working directory.
    Started at Sat Jul 21 16:09:
    Results reported at Sat Jul 21 16:09:

    Your job looked like:

    # LSBATCH: User input
    mpirun.lsf -np 8 /opt/share/mvapich/examples/cpi

    Successfully completed.

    Resource usage summary:
        CPU time      : 1.43 sec.
        Max Memory    : 2 MB
        Max Swap      : 13 MB
        Max Processes : 1
        Max Threads   :

    (continued)

Submitting parallel jobs locally: MVAPICH (-a mvapich)

    The output (if any) follows:

    infn-wn-01-ib0
    infn-wn-07-ib0
    Process 5 on infn-wn-07.ct.pi2s2.it
    Process 1 on infn-wn-01.ct.pi2s2.it
    Process 7 on infn-wn-07.ct.pi2s2.it
    Process 6 on infn-wn-07.ct.pi2s2.it
    Process 0 on infn-wn-01.ct.pi2s2.it
    Process 2 on infn-wn-01.ct.pi2s2.it
    Process 3 on infn-wn-01.ct.pi2s2.it
    Process 4 on infn-wn-07.ct.pi2s2.it
    pi is approximately , Error is
    wall clock time =
    Job /opt/lsf/6.2/linux2.6-glibc2.3-x86_64/bin/mvapich_wrapper -np 8 /opt/share/mvapich/examples/cpi 8

    TID   HOST_NAME  COMMAND_LINE     STATUS                  TERMINATION_TIME
    ===== ========== ================ ======================= ===================
          infn-wn-07 /opt/share/mvapi Done                    07/21/ :09:
          infn-wn-07 /opt/share/mvapi Done                    07/21/ :09:
          infn-wn-07 /opt/share/mvapi Done                    07/21/ :09:
          infn-wn-07 /opt/share/mvapi Done                    07/21/ :09:
          infn-wn-01 /opt/share/mvapi Done                    07/21/ :09:
          infn-wn-01 /opt/share/mvapi Done                    07/21/ :09:
          infn-wn-01 /opt/share/mvapi Done                    07/21/ :09:
          infn-wn-01 /opt/share/mvapi Done                    07/21/ :09:06

Submitting parallel jobs locally: MPICH2 (-a mpich2)

    $ bsub -a mpich2 -n 8 -q short -o out.%J mpirun.lsf -np 8 /opt/share/mpich2/examples/cpi 8
    Job is submitted to queue.
    $ bjobs
    JOBID USER    STAT QUEUE FROM_HOST  EXEC_HOST    JOB_NAME   SUBMIT_TIME
    8628  hpcuser PEND short infn-wn-01              *les/cpi 8 Jul 21 16:20
    $ bjobs
    JOBID USER    STAT QUEUE FROM_HOST  EXEC_HOST    JOB_NAME   SUBMIT_TIME
    8628  hpcuser RUN  short infn-wn-01 4*infn-wn-0  *les/cpi 8 Jul 21 16:20
                                        4*infn-wn-09
    $ bjobs 8628
    JOBID USER    STAT QUEUE FROM_HOST  EXEC_HOST    JOB_NAME   SUBMIT_TIME
    8628  hpcuser DONE short infn-wn-01 4*infn-wn-0  *les/cpi 8 Jul 21 16:20
                                        4*infn-wn-09

Submitting parallel jobs locally: MPICH2 (-a mpich2)

    $ cat out.8628
    Sender: LSF System
    Subject: Job 8628: Done

    Job was submitted from host by user.
    Job was executed on host(s), in queue, as user.
    was used as the home directory.
    was used as the working directory.
    Started at Sat Jul 21 16:20:
    Results reported at Sat Jul 21 16:20:

    Your job looked like:

    # LSBATCH: User input
    mpirun.lsf -np 8 /opt/share/mpich2/examples/cpi

    Successfully completed.

    Resource usage summary:
        CPU time      : 1.80 sec.
        Max Memory    : 2 MB
        Max Swap      : 13 MB
        Max Processes : 1
        Max Threads   :

    (continued)

Submitting parallel jobs locally: MPICH2 (-a mpich2)

    The output (if any) follows:

    Process 0 of 8 is on infn-wn-04.ct.pi2s2.it
    Process 1 of 8 is on infn-wn-04.ct.pi2s2.it
    Process 3 of 8 is on infn-wn-04.ct.pi2s2.it
    Process 2 of 8 is on infn-wn-04.ct.pi2s2.it
    Process 4 of 8 is on infn-wn-09.ct.pi2s2.it
    Process 7 of 8 is on infn-wn-09.ct.pi2s2.it
    Process 5 of 8 is on infn-wn-09.ct.pi2s2.it
    Process 6 of 8 is on infn-wn-09.ct.pi2s2.it
    pi is approximately , Error is
    wall clock time =
    infn-wn-04
    infn-wn-09
    Job /opt/lsf/6.2/linux2.6-glibc2.3-x86_64/bin/mpich2_wrapper -np 8 /opt/share/mpich2/examples/cpi 8

    TID   HOST_NAME  COMMAND_LINE     STATUS                  TERMINATION_TIME
    ===== ========== ================ ======================= ===================
          infn-wn-09 /opt/share/mpich Done                    07/21/ :20:
          infn-wn-09 /opt/share/mpich Done                    07/21/ :20:
          infn-wn-09 /opt/share/mpich Done                    07/21/ :20:
          infn-wn-09 /opt/share/mpich Done                    07/21/ :20:
          infn-wn-04 /opt/share/mpich Done                    07/21/ :20:
          infn-wn-04 /opt/share/mpich Done                    07/21/ :20:
          infn-wn-04 /opt/share/mpich Done                    07/21/ :20:
          infn-wn-04 /opt/share/mpich Done                    07/21/ :20:37
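
A quick way to check that LSF really spread the ranks as requested (4 per host in these runs) is to count the "Process ... on host" lines that cpi prints. The sketch below is an illustration only; the function name ranks_per_host is hypothetical. It accepts both line formats that appear in this deck: "Process 1 on host" (MPICH, MVAPICH) and "Process 0 of 8 is on host" (MPICH2, MVAPICH2).

```python
import re
from collections import Counter

# Matches "Process 1 on host" as well as "Process 0 of 8 is on host".
_PROC_RE = re.compile(r"Process (\d+)(?: of \d+ is)? on (\S+)")


def ranks_per_host(output):
    """Count how many MPI ranks reported each host name."""
    return Counter(m.group(2) for m in _PROC_RE.finditer(output))


if __name__ == "__main__":
    sample = (
        "Process 0 of 8 is on infn-wn-04.ct.pi2s2.it\n"
        "Process 4 of 8 is on infn-wn-09.ct.pi2s2.it\n"
        "Process 1 on infn-wn-04.ct.pi2s2.it\n"
    )
    for host, count in sorted(ranks_per_host(sample).items()):
        print(host, count)
```

Because the pattern does not depend on line breaks, it also works on job output that has been flattened onto a single line.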

Submitting parallel jobs locally: MVAPICH2 (-a mvapich2)

    $ bsub -a mvapich2 -n 8 -q short -o out.%J mpirun.lsf -np 8 /opt/share/mvapich2/examples/cpi 8
    Job is submitted to queue.
    $ bjobs
    JOBID USER    STAT QUEUE FROM_HOST  EXEC_HOST    JOB_NAME   SUBMIT_TIME
    8629  hpcuser PEND short infn-wn-01              *les/cpi 8 Jul 21 16:29
    $ bjobs
    JOBID USER    STAT QUEUE FROM_HOST  EXEC_HOST    JOB_NAME   SUBMIT_TIME
    8629  hpcuser RUN  short infn-wn-01 4*infn-wn-0  *les/cpi 8 Jul 21 16:29
                                        4*infn-wn-07
    $ bjobs 8629
    JOBID USER    STAT QUEUE FROM_HOST  EXEC_HOST    JOB_NAME   SUBMIT_TIME
    8629  hpcuser DONE short infn-wn-01 4*infn-wn-0  *les/cpi 8 Jul 21 16:29
                                        4*infn-wn-07

Submitting parallel jobs locally: MVAPICH2 (-a mvapich2)

    $ cat out.8629
    Sender: LSF System
    Subject: Job 8629: Done

    Job was submitted from host by user.
    Job was executed on host(s), in queue, as user.
    was used as the home directory.
    was used as the working directory.
    Started at Sat Jul 21 16:29:
    Results reported at Sat Jul 21 16:30:

    Your job looked like:

    # LSBATCH: User input
    mpirun.lsf -np 8 /opt/share/mvapich2/examples/cpi

    Successfully completed.

    Resource usage summary:
        CPU time      : 1.89 sec.
        Max Memory    : 2 MB
        Max Swap      : 13 MB
        Max Processes : 1
        Max Threads   :

    (continued)

Submitting parallel jobs locally: MVAPICH2 (-a mvapich2)

    The output (if any) follows:

    infn-wn-09.ct.pi2s2.it_41344 ( )
    infn-wn-07.ct.pi2s2.it_35666 ( )
    Process 0 of 8 is on infn-wn-09.ct.pi2s2.it
    Process 1 of 8 is on infn-wn-09.ct.pi2s2.it
    Process 2 of 8 is on infn-wn-09.ct.pi2s2.it
    Process 3 of 8 is on infn-wn-09.ct.pi2s2.it
    Process 4 of 8 is on infn-wn-07.ct.pi2s2.it
    Process 5 of 8 is on infn-wn-07.ct.pi2s2.it
    Process 6 of 8 is on infn-wn-07.ct.pi2s2.it
    Process 7 of 8 is on infn-wn-07.ct.pi2s2.it
    pi is approximately , Error is
    wall clock time =
    infn-wn-09
    infn-wn-07
    Job /opt/lsf/6.2/linux2.6-glibc2.3-x86_64/bin/mvapich2_wrapper -np 8 /opt/share/mvapich2/examples/cpi 8

    TID   HOST_NAME  COMMAND_LINE     STATUS                  TERMINATION_TIME
    ===== ========== ================ ======================= ===================
          infn-wn-07 /opt/share/mvapi Done                    07/21/ :29:
          infn-wn-07 /opt/share/mvapi Done                    07/21/ :29:
          infn-wn-07 /opt/share/mvapi Done                    07/21/ :29:
          infn-wn-07 /opt/share/mvapi Done                    07/21/ :29:
          infn-wn-09 /opt/share/mvapi Done                    07/21/ :29:
          infn-wn-09 /opt/share/mvapi Done                    07/21/ :29:
          infn-wn-09 /opt/share/mvapi Done                    07/21/ :29:
          infn-wn-09 /opt/share/mvapi Done                    07/21/ :29:57

Submitting parallel jobs via the grid

    $ voms-proxy-init --voms cometa
    Cannot find file or dir: /home/falzone/.glite/vomses
    Your identity: /C=IT/O=INFN/OU=Personal Certificate/L=NICE/CN=Alberto Falzone
    Enter GRID pass phrase:
    Creating temporary proxy Done
    Contacting voms.ct.infn.it:15003 [/C=IT/O=INFN/OU=Host/L=Catania/CN=voms.ct.infn.it] "cometa" Done
    Creating proxy Done
    Your proxy is valid until Sun Jul 22 04:58:

    $ voms-proxy-info -all
    subject   : /C=IT/O=INFN/OU=Personal Certificate/L=NICE/CN=Alberto Falzone/CN=proxy/CN=proxy/CN=proxy
    issuer    : /C=IT/O=INFN/OU=Personal Certificate/L=NICE/CN=Alberto Falzone/CN=proxy/CN=proxy
    identity  : /C=IT/O=INFN/OU=Personal Certificate/L=NICE/CN=Alberto Falzone/CN=proxy/CN=proxy
    type      : unknown
    strength  : 512 bits
    path      : /tmp/x509up_u514
    timeleft  : 11:59:47
    === VO cometa extension information ===
    VO        : cometa
    subject   : /C=IT/O=INFN/OU=Personal Certificate/L=NICE/CN=Alberto Falzone
    issuer    : /C=IT/O=INFN/OU=Host/L=Catania/CN=voms.ct.infn.it
    attribute : /cometa/Role=NULL/Capability=NULL
    attribute : /cometa/hpc/Role=NULL/Capability=NULL
    timeleft  : 11:56:33
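
Before submitting a long job it can be useful to check how much proxy lifetime is left. The following sketch reads the first "timeleft" field from voms-proxy-info text like the one above and converts it to seconds; the function name proxy_timeleft_seconds is hypothetical, and the H:MM:SS layout is assumed from the transcript.

```python
import re


def proxy_timeleft_seconds(info_text):
    """Return the first 'timeleft' value of voms-proxy-info output, in seconds."""
    match = re.search(r"^\s*timeleft\s*:\s*(\d+):(\d{2}):(\d{2})\s*$",
                      info_text, re.MULTILINE)
    if match is None:
        raise ValueError("no timeleft field found")
    hours, minutes, seconds = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


if __name__ == "__main__":
    sample = "path      : /tmp/x509up_u514\ntimeleft  : 11:59:47\n"
    print(proxy_timeleft_seconds(sample))
```

The first match is the lifetime of the proxy itself; the second "timeleft" in the output belongs to the VO extension and may be shorter.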

Submitting parallel jobs via the grid: MPICH

    $ ll
    total 524
    -rwxr-xr-x 1 falzone falzone     Jul 15 10:52 cpi-mpi1
    -rw-rw-r-- 1 falzone falzone   0 Jul 20 16:49 edglog.log
    -rw-r--r-- 1 falzone falzone 497 Jul  4 11:45 edg_wl_ui_cmd_var.conf
    -rw-r--r-- 1 falzone falzone 480 Jul  4 11:45 edg_wl_ui.conf
    -rw-r--r-- 1 falzone falzone 596 Jul 21 17:04 mpi-cpi-mpi1.jdl
    -rwxr-xr-x 1 falzone falzone 161 Jul 21 16:55 mpi.post.sh
    -rwxr-xr-x 1 falzone falzone 159 Jul 21 16:55 mpi.pre.sh
    -rwxr-xr-x 1 falzone falzone  85 Jul 15 13:06 submit

    $ cat mpi-cpi-mpi1.jdl
    Type = "Job";
    JobType = "MPICH";
    NodeNumber = 8;
    Executable = "cpi-mpi1";
    Arguments = "8";
    StdOutput = "mpi.out";
    StdError = "mpi.err";
    InputSandbox = {"cpi-mpi1","mpi.pre.sh","mpi.post.sh"};
    OutputSandbox = {"mpi.err","mpi.out"};
    #Requirements = other.GlueCEUniqueId == "infn-ce-01.ct.pi2s2.it:2119/jobmanager-lcglsf-long" || other.GlueCEUniqueId == "infn-ce-01.ct.pi2s2.it:2119/jobmanager-lcglsf-short" || other.GlueCEUniqueId == "infn-ce-01.ct.pi2s2.it:2119/jobmanager-lcglsf-infinite";
    RetryCount = 3;
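
Since the JDL files for the different MPI flavours differ only in JobType and in the executable name, they can be generated from a template. This is a minimal sketch restricted to the attributes shown in the JDL above; the function names jdl_list and make_mpi_jdl are hypothetical, and the commented-out Requirements line is deliberately left out.

```python
def jdl_list(items):
    """Format a sequence as a JDL list of quoted strings."""
    return "{" + ",".join(f'"{item}"' for item in items) + "}"


def make_mpi_jdl(job_type, node_number, executable, arguments,
                 extra_inputs=("mpi.pre.sh", "mpi.post.sh"), retry_count=3):
    """Return JDL text mirroring the attributes used on the slide."""
    inputs = [executable, *extra_inputs]
    lines = [
        'Type = "Job";',
        f'JobType = "{job_type}";',
        f"NodeNumber = {node_number};",
        f'Executable = "{executable}";',
        f'Arguments = "{arguments}";',
        'StdOutput = "mpi.out";',
        'StdError = "mpi.err";',
        f"InputSandbox = {jdl_list(inputs)};",
        f'OutputSandbox = {jdl_list(["mpi.err", "mpi.out"])};',
        f"RetryCount = {retry_count};",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_mpi_jdl("MPICH", 8, "cpi-mpi1", "8"), end="")
```

Calling make_mpi_jdl("MVAPICH", 8, "cpi-mpi1-ib0", "8") would produce the corresponding file for the Infiniband build used later in the deck.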

Submitting parallel jobs via the grid: MPICH

    $ cat edg_wl_ui_cmd_var.conf
    [
    rank = - other.GlueCEStateEstimatedResponseTime;
    requirements = other.GlueCEStateStatus == "Production";
    RetryCount = 3;
    ErrorStorage = "/tmp";
    OutputStorage = "/tmp/jobOutput";
    ListenerPort = 44000;
    ListenerStorage = "/tmp";
    LoggingTimeout = 30;
    LoggingSyncTimeout = 30;
    LoggingDestination = "infn-rb-01.ct.pi2s2.it:9002";
    # Default NS logger level is set to 0 (null)
    # max value is 6 (very ugly)
    NSLoggerLevel = 0;
    DefaultLogInfoLevel = 0;
    DefaultStatusLevel = 0;
    DefaultVo = "unspecified";
    ]

    $ cat edg_wl_ui.conf
    [
    VirtualOrganisation = "cometa";
    NSAddresses = "infn-rb-01.ct.pi2s2.it:7772";
    LBAddresses = "infn-rb-01.ct.pi2s2.it:9000";
    ## HLR location is optional. Uncomment and fill correctly for
    ## enabling accounting
    #HLRLocation = "fake HLR Location"
    ## MyProxyServer is optional. Uncomment and fill correctly for
    ## enabling proxy renewal. This field should be set equal to
    ## MYPROXY_SERVER environment variable
    MyProxyServer = "grid001.ct.infn.it"
    ]

Settings highlighted on the slide:
– LoggingDestination = "infn-rb-01.ct.pi2s2.it:9002";
– VirtualOrganisation = "cometa";
– NSAddresses = "infn-rb-01.ct.pi2s2.it:7772";
– LBAddresses = "infn-rb-01.ct.pi2s2.it:9000";

Submitting parallel jobs via the grid: MPICH

    $ edg-job-submit --config edg_wl_ui_cmd_var.conf --config-vo edg_wl_ui.conf -o jobs mpi-cpi-mpi1.jdl
    **** Warning: UI_CAN_NOT_EXECUTE ****
    Unable to execute "Python Tkinter Graphical": Unable to load library.

    Selected Virtual Organisation name (from --config-vo option): cometa
    Connecting to host infn-rb-01.ct.pi2s2.it, port 7772
    Logging to host infn-rb-01.ct.pi2s2.it, port 9002

    ================================ edg-job-submit Success =====================================
    The job has been successfully submitted to the Network Server.
    Use edg-job-status command to check job current status. Your job identifier (edg_jobId) is:

    -

    The edg_jobId has been saved in the following file:
    /home/falzone/MPI_JOB_SCHOOL/MPICH/jobs
    =============================================================================================

    $ edg-job-status
    *************************************************************
    BOOKKEEPING INFORMATION:

    Status info for the Job :
    Current Status:     Scheduled
    Status Reason:      Job successfully submitted to Globus
    Destination:        unict-diit-ce-01.ct.pi2s2.it:2119/jobmanager-lcglsf-infinite
    reached on:         Sat Jul 21 15:25:
    *************************************************************

Submitting parallel jobs via the grid: MPICH

While the job waits in the queue, edg-job-status keeps reporting the Scheduled status. Once the job starts:

    $ edg-job-status
    *************************************************************
    BOOKKEEPING INFORMATION:

    Status info for the Job :
    Current Status:     Running
    Status Reason:      Job successfully submitted to Globus
    Destination:        unict-diit-ce-01.ct.pi2s2.it:2119/jobmanager-lcglsf-infinite
    reached on:         Sat Jul 21 15:28:
    *************************************************************

Submitting parallel jobs via the grid: MPICH

    $ edg-job-status
    *************************************************************
    BOOKKEEPING INFORMATION:

    Status info for the Job :
    Current Status:     Done (Success)
    Exit code:          0
    Status Reason:      Job terminated successfully
    Destination:        unict-diit-ce-01.ct.pi2s2.it:2119/jobmanager-lcglsf-infinite
    reached on:         Sat Jul 21 15:31:
    *************************************************************

    $ edg-job-get-output --dir .
    Retrieving files from host: infn-rb-01.ct.pi2s2.it ( for 01.ct.pi2s2.it:9000/-NDv7HRgK7DlnnXiVYhfkQ )

    *********************************************************************************
    JOB GET OUTPUT OUTCOME

    Output sandbox files for the job:
    -
    have been successfully retrieved and stored in the directory:
    /home/falzone/MPI_JOB_SCHOOL/MPICH/falzone_-NDv7HRgK7DlnnXiVYhfkQ
    *********************************************************************************
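
The status transcripts above go through Scheduled, Running, and Done (Success), so a script that waits for a grid job only needs the "Current Status:" field. The sketch below extracts it from edg-job-status text; the function names current_status and is_finished are hypothetical, and only the states that appear in this deck are considered (other terminal states are not handled).

```python
import re


def current_status(status_text):
    """Return the value of the 'Current Status:' field, or None."""
    match = re.search(r"Current Status:\s*(.+?)\s*$", status_text, re.MULTILINE)
    return match.group(1) if match else None


def is_finished(status_text):
    """True once the job reports a 'Done' status, as in 'Done (Success)'."""
    status = current_status(status_text)
    return status is not None and status.startswith("Done")


if __name__ == "__main__":
    sample = "Current Status:     Done (Success)\nExit code:          0\n"
    print(current_status(sample), is_finished(sample))
```

Only after is_finished returns True does it make sense to call edg-job-get-output to retrieve the output sandbox.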

Submitting parallel jobs via the grid: MPICH

    $ cd falzone_-NDv7HRgK7DlnnXiVYhfkQ/
    $ ll
    total 8
    -rw-rw-r-- 1 falzone falzone  336 Jul 21 17:35 mpi.err
    -rw-rw-r-- 1 falzone falzone 1308 Jul 21 17:35 mpi.out
    $ cat mpi.out
    ###################
    # mpipre.sh
    Begin
    Here is on mpi.pre.sh
    Sat Jul 21 17:27:11 CEST 2007
    unict-diit-wn-07.ct.pi2s2.it
    End
    pi is approximately , Error is
    wall clock time =
    P4 procgroup file is /home/cometa002/.lsf_2292_genmpi_pifile.
    Job /opt/lsf/6.2/linux2.6-glibc2.3-x86_64/bin/mpichp4_wrapper -np 8 ./cpi-mpi

    (continued)

Submitting parallel jobs via the grid: MPICH

    TID   HOST_NAME  COMMAND_LINE     STATUS                  TERMINATION_TIME
    ===== ========== ================ ======================= ===================
          unict-diit /home/cometa002/ Done                    07/21/ :27:
          unict-diit /home/cometa002/ Done                    07/21/ :27:
          unict-diit /home/cometa002/ Done                    07/21/ :27:
          unict-diit /home/cometa002/ Done                    07/21/ :27:
          unict-diit ./cpi-mpi1 8 -p4 Done                    07/21/ :27:
          unict-diit /home/cometa002/ Done                    07/21/ :27:
          unict-diit /home/cometa002/ Done                    07/21/ :27:
          unict-diit /home/cometa002/ Done                    07/21/ :27:16
    ###################
    # mpipost.sh
    Begin
    Here is on mpi.post.sh
    Sat Jul 21 17:27:22 CEST 2007
    unict-diit-wn-07.ct.pi2s2.it
    MY_VARIABLE=pippopluto
    End

Submitting parallel jobs via the grid: MPICH

    $ cat mpi.err
    Process 1 on unict-diit-wn-07.ct.pi2s2.it
    Process 4 on unict-diit-wn-10.ct.pi2s2.it
    Process 3 on unict-diit-wn-07.ct.pi2s2.it
    Process 7 on unict-diit-wn-10.ct.pi2s2.it
    Process 2 on unict-diit-wn-07.ct.pi2s2.it
    Process 5 on unict-diit-wn-10.ct.pi2s2.it
    Process 0 on unict-diit-wn-07.ct.pi2s2.it
    Process 6 on unict-diit-wn-10.ct.pi2s2.it

Submitting parallel jobs via the grid: MVAPICH

    $ ll
    total 436
    -rwxr-xr-x 1 falzone falzone     Jul 15 13:07 cpi-mpi1-ib0
    -rw-r--r-- 1 falzone falzone   0 Jul  4 11:56 edglog.log
    -rw-r--r-- 1 falzone falzone 497 Jul  4 11:45 edg_wl_ui_cmd_var.conf
    -rw-r--r-- 1 falzone falzone 480 Jul  4 11:45 edg_wl_ui.conf
    -rw-r--r-- 1 falzone falzone 607 Jul 21 18:03 mpi-cpi-mpi1-ib.jdl
    -rwxr-xr-x 1 falzone falzone 159 Jul 21 17:49 mpi.post.sh
    -rwxr-xr-x 1 falzone falzone 158 Jul 21 17:49 mpi.pre.sh
    -rwxr-xr-x 1 falzone falzone  85 Jul 15 13:13 submit

    $ cat mpi-cpi-mpi1-ib.jdl
    Type = "Job";
    JobType = "MVAPICH";
    NodeNumber = 8;
    Executable = "cpi-mpi1-ib0";
    Arguments = "8";
    StdOutput = "mpi.out";
    StdError = "mpi.err";
    InputSandbox = {"cpi-mpi1-ib0","mpi.pre.sh","mpi.post.sh"};
    OutputSandbox = {"mpi.err","mpi.out"};
    #Requirements = other.GlueCEUniqueId == "infn-ce-01.ct.pi2s2.it:2119/jobmanager-lcglsf-long" || other.GlueCEUniqueId == "infn-ce-01.ct.pi2s2.it:2119/jobmanager-lcglsf-short" || other.GlueCEUniqueId == "infn-ce-01.ct.pi2s2.it:2119/jobmanager-lcglsf-infinite";
    RetryCount = 3;

Submitting parallel jobs via the grid: MVAPICH

    $ edg-job-submit --config edg_wl_ui_cmd_var.conf --config-vo edg_wl_ui.conf -o jobs mpi-cpi-mpi1-ib.jdl
    **** Warning: UI_CAN_NOT_EXECUTE ****
    Unable to execute "Python Tkinter Graphical": Unable to load library.

    Selected Virtual Organisation name (from --config-vo option): cometa
    Connecting to host infn-rb-01.ct.pi2s2.it, port 7772
    Logging to host infn-rb-01.ct.pi2s2.it, port 9002

    ================================ edg-job-submit Success =====================================
    The job has been successfully submitted to the Network Server.
    Use edg-job-status command to check job current status. Your job identifier (edg_jobId) is:

    -

    The edg_jobId has been saved in the following file:
    /home/falzone/MPI_JOB_SCHOOL/MVAPICH/jobs
    =============================================================================================

    $ edg-job-status
    *************************************************************
    BOOKKEEPING INFORMATION:

    Status info for the Job :
    Current Status:     Ready
    Status Reason:      unavailable
    Destination:        inaf-ce-01.ct.pi2s2.it:2119/jobmanager-lcglsf-infinite
    reached on:         Sat Jul 21 16:09:
    *************************************************************

Submitting parallel jobs via the grid: MVAPICH

    $ edg-job-status
    *************************************************************
    BOOKKEEPING INFORMATION:

    Status info for the Job :
    Current Status:     Scheduled
    Status Reason:      Job successfully submitted to Globus
    Destination:        inaf-ce-01.ct.pi2s2.it:2119/jobmanager-lcglsf-infinite
    reached on:         Sat Jul 21 16:09:
    *************************************************************

    $ edg-job-status
    *************************************************************
    BOOKKEEPING INFORMATION:

    Status info for the Job :
    Current Status:     Running
    Status Reason:      Job successfully submitted to Globus
    Destination:        inaf-ce-01.ct.pi2s2.it:2119/jobmanager-lcglsf-infinite
    reached on:         Sat Jul 21 16:12:
    *************************************************************

Submitting parallel jobs via the grid: MVAPICH

    $ edg-job-status
    *************************************************************
    BOOKKEEPING INFORMATION:

    Status info for the Job :
    Current Status:     Done (Success)
    Exit code:          0
    Status Reason:      Job terminated successfully
    Destination:        inaf-ce-01.ct.pi2s2.it:2119/jobmanager-lcglsf-infinite
    reached on:         Sat Jul 21 16:17:
    *************************************************************

    $ edg-job-get-output --dir .
    Retrieving files from host: infn-rb-01.ct.pi2s2.it ( for 01.ct.pi2s2.it:9000/JpOgNuFwUS3M7ureRVNmug )

    *********************************************************************************
    JOB GET OUTPUT OUTCOME

    Output sandbox files for the job:
    -
    have been successfully retrieved and stored in the directory:
    /home/falzone/MPI_JOB_SCHOOL/MVAPICH/falzone_JpOgNuFwUS3M7ureRVNmug
    *********************************************************************************

Submitting parallel jobs via the grid: MVAPICH

    $ cd falzone_JpOgNuFwUS3M7ureRVNmug/
    $ ll
    total 8
    -rw-rw-r-- 1 falzone falzone  288 Jul 21 18:20 mpi.err
    -rw-rw-r-- 1 falzone falzone 1356 Jul 21 18:20 mpi.out
    $ cat mpi.out
    ###################
    # mpipre.sh
    Begin
    Here is on mpipre.sh
    Sat Jul 21 18:12:55 CEST 2007
    inaf-wn-01.ct.pi2s2.it
    End
    inaf-wn-01-ib0
    inaf-wn-05-ib0
    pi is approximately , Error is
    wall clock time =
    Job /opt/lsf/6.2/linux2.6-glibc2.3-x86_64/bin/mvapich_wrapper -np 8 ./cpi-mpi1-ib

    (continued)

Submitting parallel jobs via the grid: MVAPICH

    TID   HOST_NAME  COMMAND_LINE     STATUS                  TERMINATION_TIME
    ===== ========== ================ ======================= ===================
          inaf-wn-05 ./cpi-mpi1-ib0 8 Done                    07/21/ :13:
          inaf-wn-05 ./cpi-mpi1-ib0 8 Done                    07/21/ :13:
          inaf-wn-05 ./cpi-mpi1-ib0 8 Done                    07/21/ :13:
          inaf-wn-05 ./cpi-mpi1-ib0 8 Done                    07/21/ :13:
          inaf-wn-01 ./cpi-mpi1-ib0 8 Done                    07/21/ :13:
          inaf-wn-01 ./cpi-mpi1-ib0 8 Done                    07/21/ :13:
          inaf-wn-01 ./cpi-mpi1-ib0 8 Done                    07/21/ :13:
          inaf-wn-01 ./cpi-mpi1-ib0 8 Done                    07/21/ :13:01
    ###################
    # mpipost.sh
    Begin
    Here is on mpipost.sh
    Sat Jul 21 18:13:04 CEST 2007
    inaf-wn-01.ct.pi2s2.it
    MY_VARIABLE=pippopluto
    End

    $ cat mpi.err
    Process 0 on inaf-wn-01.ct.pi2s2.it
    Process 3 on inaf-wn-01.ct.pi2s2.it
    Process 6 on inaf-wn-05.ct.pi2s2.it
    Process 1 on inaf-wn-01.ct.pi2s2.it
    Process 5 on inaf-wn-05.ct.pi2s2.it
    Process 2 on inaf-wn-01.ct.pi2s2.it
    Process 7 on inaf-wn-05.ct.pi2s2.it
    Process 4 on inaf-wn-05.ct.pi2s2.it
Parallel job submission via grid: MPICH2

$ edg-job-submit --config edg_wl_ui_cmd_var.conf --config-vo edg_wl_ui.conf -o jobs mpi-cpi-mpi2.jdl
**** Warning: UI_CAN_NOT_EXECUTE ****
Unable to execute "Python Tkinter Graphical": Unable to load library.
Selected Virtual Organisation name (from --config-vo option): cometa
Connecting to host infn-rb-01.ct.pi2s2.it, port 7772
Logging to host infn-rb-01.ct.pi2s2.it, port 9002
================================ edg-job-submit Success =====================================
The job has been successfully submitted to the Network Server.
Use edg-job-status command to check job current status.
Your job identifier (edg_jobId) is:
-
The edg_jobId has been saved in the following file:
/home/falzone/MPI_JOB_SCHOOL/MPICH2/jobs
=============================================================================================
MPICH2]$
Parallel job submission via grid: MPICH2

$ edg-job-status
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job :
Current Status: Scheduled
Status Reason: Job successfully submitted to Globus
Destination: inaf-ce-01.ct.pi2s2.it:2119/jobmanager-lcglsf-short
reached on: Sat Jul 21 16:38:
*************************************************************
$ edg-job-status
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job :
Current Status: Running
Status Reason: Job successfully submitted to Globus
Destination: inaf-ce-01.ct.pi2s2.it:2119/jobmanager-lcglsf-short
reached on: Sat Jul 21 16:41:
*************************************************************
MPICH2]$
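The status reports above are checked by eye, one invocation at a time. As a rough sketch only, the same fields can be extracted from captured edg-job-status output in Python. The helper names parse_status and is_done are hypothetical, the sample text is abridged from the report above, and the only completion state assumed is the "Done" prefix that appears in these slides.

```python
import re

# Abridged from the edg-job-status report for the MPICH2 job.
SAMPLE = """\
BOOKKEEPING INFORMATION:
Status info for the Job :
Current Status: Running
Status Reason: Job successfully submitted to Globus
Destination: inaf-ce-01.ct.pi2s2.it:2119/jobmanager-lcglsf-short
"""

# Only the field labels visible in the slides are recognised.
FIELD_RE = re.compile(
    r"^(Current Status|Exit code|Status Reason|Destination):\s*(.*)$"
)


def parse_status(text):
    """Return a dict of the recognised 'label: value' fields in the report."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


def is_done(fields):
    """True when the current status starts with 'Done', e.g. 'Done (Success)'."""
    return fields.get("Current Status", "").startswith("Done")


if __name__ == "__main__":
    report = parse_status(SAMPLE)
    print(report["Current Status"], "finished:", is_done(report))
```

A polling loop could call edg-job-status periodically and stop once is_done returns True; how the command is invoked and which other states should end the loop are left out because the slides do not show them.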
Parallel job submission via grid: MPICH2

$ edg-job-status
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job :
Current Status: Done (Success)
Exit code: 0
Status Reason: Job terminated successfully
Destination: inaf-ce-01.ct.pi2s2.it:2119/jobmanager-lcglsf-short
reached on: Sat Jul 21 16:46:
*************************************************************
$ edg-job-get-output --dir .
Retrieving files from host: infn-rb-01.ct.pi2s2.it ( for 01.ct.pi2s2.it:9000/U1LGfdC0t2O-2Vx_ADFpKw )
*********************************************************************************
JOB GET OUTPUT OUTCOME
Output sandbox files for the job:
-
have been successfully retrieved and stored in the directory:
/home/falzone/MPI_JOB_SCHOOL/MPICH2/falzone_U1LGfdC0t2O-2Vx_ADFpKw
*********************************************************************************
Parallel job submission via grid: MPICH2

MPICH2]$ cd falzone_U1LGfdC0t2O-2Vx_ADFpKw/
falzone_U1LGfdC0t2O-2Vx_ADFpKw]$ ll
total 4
-rw-rw-r-- 1 falzone falzone 0 Jul 21 18:47 mpi.err
-rw-rw-r-- 1 falzone falzone 1756 Jul 21 18:47 mpi.out
falzone_U1LGfdC0t2O-2Vx_ADFpKw]$ cat mpi.out
###################
# mpipre.sh
Begin
Here is on mpi.pre.sh
Sat Jul 21 18:42:05 CEST 2007
inaf-wn-03.ct.pi2s2.it
End
Process 1 of 8 is on inaf-wn-03.ct.pi2s2.it
Process 0 of 8 is on inaf-wn-03.ct.pi2s2.it
Process 3 of 8 is on inaf-wn-03.ct.pi2s2.it
Process 2 of 8 is on inaf-wn-03.ct.pi2s2.it
Process 4 of 8 is on inaf-wn-05.ct.pi2s2.it
Process 6 of 8 is on inaf-wn-05.ct.pi2s2.it
Process 5 of 8 is on inaf-wn-05.ct.pi2s2.it
Process 7 of 8 is on inaf-wn-05.ct.pi2s2.it
pi is approximately , Error is
wall clock time =
inaf-wn-03
inaf-wn
(continued on the next slide)
Parallel job submission via grid: MPICH2

Job
/opt/lsf/6.2/linux2.6-glibc2.3-x86_64/bin/mpich2_wrapper -np 8 -machinefile /home/cometasgm004/globus-tmp.inaf-wn /.mpi2/https_3a_2f_2finfn-rb-01.ct.pi2s2.it_3a9000_2fU1LGfdC0t2O-2Vx_5fADFpKw/host11644 ./cpi-mpi2 8
TID HOST_NAME COMMAND_LINE STATUS TERMINATION_TIME
===== ========== ================ ======================= ===================
      inaf-wn-05 ./cpi-mpi2 8 Done 07/21/ :42:
      inaf-wn-05 ./cpi-mpi2 8 Done 07/21/ :42:
      inaf-wn-05 ./cpi-mpi2 8 Done 07/21/ :42:
      inaf-wn-05 ./cpi-mpi2 8 Done 07/21/ :42:
      inaf-wn-03 ./cpi-mpi2 8 Done 07/21/ :42:
      inaf-wn-03 ./cpi-mpi2 8 Done 07/21/ :42:
      inaf-wn-03 ./cpi-mpi2 8 Done 07/21/ :42:
      inaf-wn-03 ./cpi-mpi2 8 Done 07/21/ :42:23
###################
# mpipost.sh
Begin
Here is on mpi.post.sh
Sat Jul 21 18:42:31 CEST 2007
inaf-wn-03.ct.pi2s2.it
MY_VARIABLE=pippopluto
End
falzone_U1LGfdC0t2O-2Vx_ADFpKw]$
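The "pi is approximately ..., Error is ..." line in these outputs matches the format printed by the cpi example distributed with MPICH, so the cpi-mpi* binaries are assumed here to be builds of that example. Under that assumption, the sketch below reproduces the work split in plain Python, without MPI: each of the 8 ranks integrates 4/(1+x^2) over [0,1] with the midpoint rule on an interleaved subset of intervals, and the partial sums are added as a reduction would do. The function names and the interval count are illustrative choices, not values taken from the slides.

```python
import math


def partial_pi(rank, size, n):
    """Midpoint-rule contribution of one rank.

    Rank r handles intervals r+1, r+1+size, r+1+2*size, ... out of n,
    which is the interleaved split used by the MPICH cpi example.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(rank + 1, n + 1, size):
        x = h * (i - 0.5)
        total += 4.0 / (1.0 + x * x)
    return h * total


def estimate_pi(size=8, n=10000):
    """Sum the per-rank contributions, standing in for the MPI reduction."""
    return sum(partial_pi(rank, size, n) for rank in range(size))


if __name__ == "__main__":
    value = estimate_pi(size=8, n=10000)
    print("pi is approximately %.16f, Error is %.16f"
          % (value, abs(value - math.pi)))
```

Because the intervals are interleaved, every rank does nearly the same amount of work regardless of which worker node it lands on.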
Parallel job submission via grid: MVAPICH2

MVAPICH2]$ ll
total
-rwxr-xr-x 1 falzone falzone Jul 18 10:09 cpi-mpi2-ib
-rw-r--r-- 1 falzone falzone 497 Jul 4 12:02 edg_wl_ui_cmd_var.conf
-rw-r--r-- 1 falzone falzone 480 Jul 4 12:02 edg_wl_ui.conf
-rw-r--r-- 1 falzone falzone 951 Jul 18 11:10 mpi-cpi-mpi2.jdl
-rwxr-xr-x 1 falzone falzone 162 Jul 4 12:02 mpi.post.sh
-rwxr-xr-x 1 falzone falzone 160 Jul 4 12:02 mpi.pre.sh
-rwxr-xr-x 1 falzone falzone 85 Jul 15 13:14 submit
MVAPICH2]$ cat mpi-cpi-mpi2.jdl
Type = "Job";
JobType = "MVAPICH2";
NodeNumber = 8;
Executable = "cpi-mpi2-ib";
Arguments = "8";
StdOutput = "mpi.out";
StdError = "mpi.err";
InputSandbox = {"cpi-mpi2-ib","mpi.pre.sh","mpi.post.sh"};
OutputSandbox = {"mpi.err","mpi.out"};
RetryCount = 3;
falzone_-NDv7HRgK7DlnnXiVYhfkQ]$
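The JDL files for the MVAPICH, MPICH2 and MVAPICH2 runs differ only in a few attributes (JobType, Executable, InputSandbox). As an illustration, and not something shown in the original slides, a small Python helper could render such a file from a dictionary. The names jdl_value and render_jdl are hypothetical, and the formatting rules cover only the value types that appear in the JDL above: strings, integers and lists of strings.

```python
def jdl_value(value):
    """Format one attribute value in the style of the JDL shown above."""
    if isinstance(value, int):
        return str(value)
    if isinstance(value, (list, tuple)):
        return "{" + ",".join('"%s"' % item for item in value) + "}"
    return '"%s"' % value


def render_jdl(attributes):
    """Render a dict of attributes as 'Name = value;' lines, one per entry."""
    lines = ["%s = %s;" % (name, jdl_value(value))
             for name, value in attributes.items()]
    return "\n".join(lines) + "\n"


# Attribute values copied from mpi-cpi-mpi2.jdl in the MVAPICH2 directory.
MVAPICH2_JOB = {
    "Type": "Job",
    "JobType": "MVAPICH2",
    "NodeNumber": 8,
    "Executable": "cpi-mpi2-ib",
    "Arguments": "8",
    "StdOutput": "mpi.out",
    "StdError": "mpi.err",
    "InputSandbox": ["cpi-mpi2-ib", "mpi.pre.sh", "mpi.post.sh"],
    "OutputSandbox": ["mpi.err", "mpi.out"],
    "RetryCount": 3,
}

if __name__ == "__main__":
    print(render_jdl(MVAPICH2_JOB), end="")
```

Switching the job to another MPI flavour then amounts to changing the JobType, Executable and InputSandbox entries before rendering.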
Parallel job submission via grid: MVAPICH2

$ edg-job-submit --config edg_wl_ui_cmd_var.conf --config-vo edg_wl_ui.conf -o jobs mpi-cpi-mpi2.jdl
**** Warning: UI_CAN_NOT_EXECUTE ****
Unable to execute "Python Tkinter Graphical": Unable to load library.
Selected Virtual Organisation name (from --config-vo option): cometa
Connecting to host infn-rb-01.ct.pi2s2.it, port 7772
Logging to host infn-rb-01.ct.pi2s2.it, port 9002
================================ edg-job-submit Success =====================================
The job has been successfully submitted to the Network Server.
Use edg-job-status command to check job current status.
Your job identifier (edg_jobId) is:
-
The edg_jobId has been saved in the following file:
/home/falzone/MPI_JOB_SCHOOL/MVAPICH2/jobs
=============================================================================================
$ edg-job-status
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job :
Current Status: Ready
Status Reason: unavailable
Destination: infnlns-ce-01.ct.pi2s2.it:2119/jobmanager-lcglsf-infinite
reached on: Sat Jul 21 17:03:
*************************************************************
Parallel job submission via grid: MVAPICH2

$ edg-job-status
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job :
Current Status: Scheduled
Status Reason: Job successfully submitted to Globus
Destination: infnlns-ce-01.ct.pi2s2.it:2119/jobmanager-lcglsf-infinite
reached on: Sat Jul 21 17:03:
*************************************************************
$ edg-job-status
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job :
Current Status: Running
Status Reason: Job successfully submitted to Globus
Destination: infnlns-ce-01.ct.pi2s2.it:2119/jobmanager-lcglsf-infinite
reached on: Sat Jul 21 17:06:
*************************************************************
Parallel job submission via grid: MVAPICH2

$ edg-job-status
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job :
Current Status: Done (Success)
Exit code: 0
Status Reason: Job terminated successfully
Destination: infnlns-ce-01.ct.pi2s2.it:2119/jobmanager-lcglsf-infinite
reached on: Sat Jul 21 17:09:
*************************************************************
MVAPICH2]$ edg-job-get-output --dir ct.pi2s2.it:9000/H4HyP0QFUkj9EiIFvQB6sw
Retrieving files from host: infn-rb-01.ct.pi2s2.it ( for 01.ct.pi2s2.it:9000/H4HyP0QFUkj9EiIFvQB6sw )
*********************************************************************************
JOB GET OUTPUT OUTCOME
Output sandbox files for the job:
-
have been successfully retrieved and stored in the directory:
/home/falzone/MPI_JOB_SCHOOL/MVAPICH2/falzone_H4HyP0QFUkj9EiIFvQB6sw
*********************************************************************************
MVAPICH2]$
Parallel job submission via grid: MVAPICH2

MVAPICH2]$ cd falzone_H4HyP0QFUkj9EiIFvQB6sw/
falzone_H4HyP0QFUkj9EiIFvQB6sw]$ ll
total 4
-rw-rw-r-- 1 falzone falzone 0 Jul 21 19:10 mpi.err
-rw-rw-r-- 1 falzone falzone 1886 Jul 21 19:10 mpi.out
falzone_H4HyP0QFUkj9EiIFvQB6sw]$ cat mpi.out
###################
# mpipre.sh
Begin
Here is on mpi.pre.sh
Sat Jul 21 19:04:57 CEST 2007
infnlns-wn-01.ct.pi2s2.it
End
infnlns-wn-01.ct.pi2s2.it_53640 ( )
infnlns-wn-04.ct.pi2s2.it_41124 ( )
Process 0 of 8 is on infnlns-wn-01.ct.pi2s2.it
Process 1 of 8 is on infnlns-wn-01.ct.pi2s2.it
Process 3 of 8 is on infnlns-wn-01.ct.pi2s2.it
Process 2 of 8 is on infnlns-wn-01.ct.pi2s2.it
Process 4 of 8 is on infnlns-wn-04.ct.pi2s2.it
Process 7 of 8 is on infnlns-wn-04.ct.pi2s2.it
Process 5 of 8 is on infnlns-wn-04.ct.pi2s2.it
Process 6 of 8 is on infnlns-wn-04.ct.pi2s2.it
pi is approximately , Error is
wall clock time =
(continued on the next slide)
Parallel job submission via grid: MVAPICH2

wall clock time =
infnlns-wn-01
infnlns-wn-04
Job
/opt/lsf/6.2/linux2.6-glibc2.3-x86_64/bin/mvapich2_wrapper -np 8 -machinefile /home/cometa001/globus-tmp.infnlns-wn /.mpi2/https_3a_2f_2finfn-rb-01.ct.pi2s2.it_3a9000_2fH4HyP0QFUkj9EiIFvQB6sw/host29454 ./cpi-mpi2-ib 8
TID HOST_NAME COMMAND_LINE STATUS TERMINATION_TIME
===== ========== ================ ======================= ===================
      infnlns-wn ./cpi-mpi2-ib 8 Done 07/21/ :05:
      infnlns-wn ./cpi-mpi2-ib 8 Done 07/21/ :05:
      infnlns-wn ./cpi-mpi2-ib 8 Done 07/21/ :05:
      infnlns-wn ./cpi-mpi2-ib 8 Done 07/21/ :05:
      infnlns-wn ./cpi-mpi2-ib 8 Done 07/21/ :05:
      infnlns-wn ./cpi-mpi2-ib 8 Done 07/21/ :05:
      infnlns-wn ./cpi-mpi2-ib 8 Done 07/21/ :05:
      infnlns-wn ./cpi-mpi2-ib 8 Done 07/21/ :05:05
###################
# mpipost.sh
Begin
Here is on mpi.post.sh
Sat Jul 21 19:05:13 CEST 2007
infnlns-wn-01.ct.pi2s2.it
MY_VARIABLES=pippopluto
End
FLASH submission via grid

FLASH_TEST]$ cat flash.jdl
Type = "Job";
JobType = "MVAPICH2";
NodeNumber = 8;
Executable = "flash2_sedov";
Arguments = "8";
StdOutput = "mpi.out";
StdError = "mpi.err";
InputSandbox = {"mpi.pre.sh","mpi.post.sh","flash.par","flash2_sedov"};
OutputSandbox = {"mpi.err","mpi.out","sedov.log","amr_log","sedov_hdf5_chk_0006"};
#Requirements = other.GlueCEUniqueId == "infn-ce-01.ct.pi2s2.it:2119/jobmanager-lcglsf-long" || other.GlueCEUniqueId == "infn-ce-01.ct.pi2s2.it:2119/jobmanager-lcglsf-short" || other.GlueCEUniqueId == "infn-ce-01.ct.pi2s2.it:2119/jobmanager-lcglsf-infinite";
RetryCount = 3;
FLASH_TEST]$ cat mpi.pre.sh
echo " mpi.post.sh "
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/exp_soft/hdf5/ /lib/
FLASH_TEST]$ cat mpi.post.sh
echo " mpi.post.sh "
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
FLASH submission via grid

edg-job-submit --config edg_wl_ui_cmd_var.conf --config-vo edg_wl_ui.conf flash.jdl
**** Warning: UI_CAN_NOT_EXECUTE ****
Unable to execute "Python Tkinter Graphical": Unable to load library.
Selected Virtual Organisation name (from --config-vo option): cometa
Connecting to host infn-rb-01.ct.pi2s2.it, port 7772
Logging to host infn-rb-01.ct.pi2s2.it, port 9002
================================ edg-job-submit Success =====================================
The job has been successfully submitted to the Network Server.
Use edg-job-status command to check job current status.
Your job identifier (edg_jobId) is:
-
The edg_jobId has been saved in the following file:
/home/falzone/FLASH_TEST/jobs
=============================================================================================
FLASH_TEST]$
FLASH submission via grid

edg-job-status
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job :
Current Status: Scheduled
Status Reason: Job successfully submitted to Globus
Destination: infn-ce-01.ct.pi2s2.it:2119/jobmanager-lcglsf-long
reached on: Tue Jul 17 14:59:
*************************************************************
FLASH_TEST]$ edg-job-status
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job :
Current Status: Running
Status Reason: Job successfully submitted to Globus
Destination: infn-ce-01.ct.pi2s2.it:2119/jobmanager-lcglsf-long
reached on: Tue Jul 17 15:01:
*************************************************************
FLASH_TEST]$
FLASH submission via grid

edg-job-status
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job :
Current Status: Done (Success)
Exit code: 0
Status Reason: There were some warnings: some file(s) listed in the output sandbox were not available and were ignored
Destination: infn-ce-01.ct.pi2s2.it:2119/jobmanager-lcglsf-long
reached on: Tue Jul 17 15:03:
*************************************************************
edg-job-get-output --dir .
Retrieving files from host: infn-rb-01.ct.pi2s2.it ( for )
*********************************************************************************
JOB GET OUTPUT OUTCOME
Output sandbox files for the job:
-
have been successfully retrieved and stored in the directory:
/home/falzone/FLASH_TEST/falzone_yZl9bH9-4vc5JmtkCLLANw
*********************************************************************************
FLASH_TEST]$
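The status reason above warns that some files listed in the output sandbox were not available. One way to see which ones, sketched here in Python as an assumption-laden illustration rather than a tool from the slides, is to compare the OutputSandbox attribute of the JDL with the names actually present in the retrieved directory. The helpers sandbox_names and missing_files are hypothetical, and the parsing handles only the simple brace-and-quotes list form used in flash.jdl.

```python
import os
import re

# The OutputSandbox line as it appears in flash.jdl.
FLASH_JDL = (
    'OutputSandbox = {"mpi.err","mpi.out","sedov.log",'
    '"amr_log","sedov_hdf5_chk_0006"};\n'
)


def sandbox_names(jdl_text, attribute="OutputSandbox"):
    """Return the quoted file names listed in a JDL sandbox attribute."""
    match = re.search(re.escape(attribute) + r"\s*=\s*\{([^}]*)\}", jdl_text)
    if not match:
        return []
    return re.findall(r'"([^"]*)"', match.group(1))


def missing_files(jdl_text, present_names):
    """Return the expected output files absent from present_names."""
    present = set(present_names)
    return [name for name in sandbox_names(jdl_text) if name not in present]


if __name__ == "__main__":
    # Compare against the current directory, e.g. the retrieved job directory.
    for name in missing_files(FLASH_JDL, os.listdir(".")):
        print("missing:", name)
```

Run from inside the directory created by edg-job-get-output, this prints one line per expected file that was not retrieved and nothing if all are present.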
FLASH submission via grid

cd /home/falzone/FLASH_TEST/falzone_yZl9bH9-4vc5JmtkCLLANw
falzone_yZl9bH9-4vc5JmtkCLLANw]$ ll
total
-rw-rw-r-- 1 falzone falzone 1600 Jul 17 17:09 amr_log
-rw-rw-r-- 1 falzone falzone 0 Jul 17 17:09 mpi.err
-rw-rw-r-- 1 falzone falzone Jul 17 17:09 mpi.out
-rw-rw-r-- 1 falzone falzone Jul 17 17:09 sedov_hdf5_chk_0006
-rw-rw-r-- 1 falzone falzone Jul 17 17:09 sedov.log
falzone_yZl9bH9-4vc5JmtkCLLANw]$ cat sedov.log
FLASH log file: :00.24 Run number: 1
==============================================================================
Number of processors: 8
Dimensionality: 2
Max Number of Blocks/Proc: 1000
Number x zones: 8
Number y zones: 8
Number z zones: 1
Setup stamp: Tue Jul 17 11:33:
Build stamp: Tue Jul 17 13:06:52 CEST 2007
System info: Linux infn-ui-01.ct.pi2s2.it EL.1.cernsmp x86_64
Version: FLASH
Build directory: /home/falzone/software/FLASH2.5/object
Setup syntax: ./setup.py sedov -auto -verbose -ostype=Linux
f compiler flags: mpif90 -c -fast -r8 -i4 -DN_DIM=2 -DMAXBLOCKS=1000 -DNXB=8 -DNYB=8 -DNZB=1
c compiler flags: mpicc -I /opt/exp_soft/hdf5/ //include -O2 -c -DN_DIM=2 -DMAXBLOCKS=1000 -DNXB=8 -DNYB=8 -DNZB=1
loader flags: -o
==============================================================================
Comment: Sedov explosion
==============================================================================
DEMO ON LINE
Questions?