Introduction to Installing and Configuring the IBM Platform LSF Family, V1.0


Ma Xuejie (马雪洁), 2013-5-7

Table of Contents

1 Cluster architecture
  1.1 LSF only (command-line submission)
  1.2 LSF + PAC (web submission)
  1.3 LSF + PM (Process Manager submission)
2 LSF installation and basic configuration examples
  2.1 Pre-installation preparation
  2.2 Installation steps
  2.3 Configuration files
  2.4 Common commands
  2.5 Resource-based scheduling
  2.6 Fairshare scheduling
  2.7 Preemptive scheduling
  2.8 Global limits
  2.9 Submission-control scripts (esub)
  2.10 External resource collection (elim) examples
3 Command-line application integration examples
  3.1 CFD++ (spooling file)
  3.2 Gaussian (spooling file)
  3.3 Abaqus (bsub command)
  3.4 Amber (blaunch, with accounting)
  3.5 Platform MPI
  3.6 Open MPI
  3.7 Intel MPI
4 Installing PAC
5 Application integration with PAC
6 Installing License Scheduler
7 Frequently asked questions
8 Using the man pages
9 Technical support

1 Cluster architecture

Larger clusters are usually designed with dedicated login nodes: users may only ssh to a login node, never directly to the master or compute nodes. Passwordless ssh between compute nodes is configured so that parallel jobs can run. The login nodes also run LSF, configured either as static LSF clients or with MXJ set to 0, i.e. as clients that never execute jobs. The cluster's web node sits on the same subnet as the office LAN. If floating clients are required, the master node's network interface must …

1.1 LSF only (command-line submission)

1.2 LSF + PAC (web submission)

Users submit jobs through the portal.

1.3 LSF + PM (Process Manager submission)

2 LSF installation and basic configuration examples

2.1 Pre-installation preparation

NIS ready; NFS/GPFS ready.

2.2 Installation steps

Install as root, with NIS and NFS/GPFS already in place.

2.2.1 Obtain the LSF and PAC packages

  lsf8.3_linux2.6-glibc2.3-x86_64.tar.Z
  lsf8.3_lsfinstall_linux_x86_64.tar.Z
  pac8.3_standard_linux-x64.tar.Z
  entitlement file: platform_hpc_std_entitlement.dat

2.2.2 Unpack the lsfinstall script package

Put the packages under /root/lsf:

  [root@S2 lsf]# gunzip lsf8.3_lsfinstall_linux_x86_64.tar.Z
  tar -xvf lsf8.3_lsfinstall_linux_x86_64.tar

2.2.3 Edit install.config

First create the cluster administrator account lsfadmin, then:

  cd lsf8.3_lsfinstall
  vi install.config
  [root@S2 lsf8.3_lsfinstall]# cat install.config
  LSF_TOP="/opt/lsf"            # installation directory
  LSF_ADMINS="lsfadmin"         # create the lsfadmin user first
  LSF_CLUSTER_NAME="platform"   # cluster name, chosen freely
  LSF_MASTER_LIST="s2 s3"       # LSF master nodes
  LSF_ENTITLEMENT_FILE="/root/lsf/platform_hpc_std_entitlement.dat"   # entitlement file location
  LSF_TARDIR="/root/lsf/"       # location of the distribution tarballs

2.2.4 Run the installation

  ./lsfinstall -f install.config

2.2.5 Configure start at boot

Run hostsetup (and rhostsetup for remote hosts) under /opt/lsf/9.1/install.

2.2.6 Test the installation

In the conf directory under the installation directory:

  [root@S2 conf]# source profile.lsf

Add "source profile.lsf" to /etc/profile. If rsh is not available, set ssh in lsf.conf:

  [root@S2 conf]# tail lsf.conf
  LSF_RSH="ssh"

2.2.7 Start/stop the LSF daemons (three ways)

  [root@S2 conf]# lsfstartup / lsfshutdown

or

  lsadmin limstartup / limshutdown
  lsadmin resstartup / resshutdown
  badmin hstartup / hshutdown

or

  lsf_daemons start / stop

  [root@S2 conf]# lsid
  IBM Platform LSF Express 8.3 for IBM Platform HPC, May 10 2012
  Copyright Platform Computing Inc., an IBM Company, 1992-2012.
  US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
  My cluster name is platform
  My master name is s2

  [root@S2 conf]# lsload
  HOST_NAME  status  r15s  r1m  r15m  ut  pg   ls  it  tmp   swp  mem
  s2         ok      0.0   0.0  0.0   1%  0.0  1   0   151G  20G  61G
  s4         ok      0.0   0.0  0.0   2%  0.0  1   2   183G  20G  62G
  s6         ok      0.0   0.0  0.0   3%  0.0  1   23  734M  2G   30G
  s5         ok      0.0   0.0  0.0   5%  0.0  1   23  468M  2G   30G

2.2.8 Submit a test job

  bsub sleep 100000

2.2.9 Allow root to submit jobs

To enable root job submission, set in lsf.conf:

  LSF_ROOT_REX=local

then restart the LSF daemons.

2.2.10 Reconfigure after editing configuration files

After changing lsf.* files: lsadmin reconfig
After changing lsb.* files: badmin reconfig
Some parameters require restarting the master scheduler or other daemons:

  badmin mbdrestart; lsadmin limrestart; lsadmin resrestart; badmin hrestart

2.2.11 Logs and debugging

Find the logs under the log directory. LSF runs mainly three daemons on every node, plus two more on the master.

  Master:  lim, res, sbatchd, mbatchd, mbschd
  Compute: lim, res, sbatchd

To debug from the command line, run "lim -2" directly on a node to see why lim does not start.

2.3 Configuration files

Directory /etc/init.d:
  /etc/init.d/lsf          lsf service autostart script

Directory /apps/platform/8.3/lsf/conf:
  lsf.conf                 main LSF configuration file
  lsf.cluster.cluster83    cluster configuration file
  lsf.shared               shared resource definitions

Directory ./lsbatch/cluster83/configdir (lsb.* scheduler configuration):
  lsb.users                users and user groups
  lsb.queues               queues
  lsb.params               scheduling parameters
  lsb.applications         application profiles
  lsb.hosts                hosts and host groups
  lsb.resources            resources
  lsb.modules              scheduler modules

2.4 Common commands

  bsub:           submit a job
  bjobs:          show job information
  bhist:          show job history
  lshosts:        show static host resources
  bhosts, lsload: show host status and resource information
  bqueues:        show queue configuration
  blimits:        show limits
  lsid:           show cluster version and master host
  bmod:           modify bsub options

and so on.
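Administrators usually script around these commands rather than read their output interactively. A minimal sketch of filtering lsload output for usable hosts; the sample output is inlined (host names follow the transcript above) so the snippet runs without a live cluster, where a real script would pipe lsload itself into the filter:

```shell
#!/bin/sh
# Print the names of hosts that lsload reports as "ok", one per line.
# lsload_sample stands in for a live `lsload` call so this runs anywhere.
lsload_sample() {
cat <<'EOF'
HOST_NAME  status  r15s  r1m  r15m  ut  pg   ls  it  tmp   swp  mem
s2         ok      0.0   0.0  0.0   1%  0.0  1   0   151G  20G  61G
s4         ok      0.0   0.0  0.0   2%  0.0  1   2   183G  20G  62G
s6         unavail -     -    -     -   -    -   -   -     -    -
EOF
}
# NR>1 skips the header line; $2 is the status column.
lsload_sample | awk 'NR>1 && $2 == "ok" {print $1}'
```

Against a live cluster the last line would read `lsload | awk ...`; the same pattern works for bhosts, whose status is also the second column.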
2.5 Resource-based scheduling

  bsub -R "((type==LINUX2.4 && r1m<2.0) || (type==AIX && r1m<1.0))"

or define it in lsb.queues or lsb.applications:

  RES_REQ=select[((type==LINUX2.4 && r1m<2.0) || (type==AIX && r1m<1.0))]

  bsub -R "select[type==any && swap>=300 && mem>500] order[swap:mem] rusage[swap=300,mem=500]" job1
  bsub -R "rusage[mem=500:app_lic_v2=1 || mem=400:app_lic_v1.5=1]" job1
  bsub -R "select[type==any && swp>=300 && mem>500] order[mem]" job1

2.6 Fairshare scheduling

2.6.1 Add a round-robin queue

Add the following to lsb.queues:

  Begin Queue
  QUEUE_NAME = roundRobin
  PRIORITY   = 40
  FAIRSHARE  = USER_SHARES[[default,1]]
  #USERS     = userGroupA        # define your own user group
  End Queue

Run badmin reconfig to enable the change, and bqueues -l to check the queue's configuration.

2.6.2 Add hierarchical fairshare

Add the following queue to define a hierarchical share policy:

  Begin Queue
  QUEUE_NAME = hierarchicalShare
  PRIORITY   = 40
  USERS      = userGroupB userGroupC
  FAIRSHARE  = USER_SHARES[[userGroupB,7][userGroupC,3]]
  End Queue

2.6.3 Cross-queue fairshare

Add the following queues to lsb.queues; note the host group and user group definitions.

  Begin Queue
  QUEUE_NAME       = verilog
  DESCRIPTION      = master queue definition cross-queue
  PRIORITY         = 50
  FAIRSHARE        = USER_SHARES[[user1,100][default,1]]
  FAIRSHARE_QUEUES = normal short
  HOSTS            = hostGroupC    # resource contention
  #RES_REQ         = rusage[verilog=1]
  End Queue

  Begin Queue
  QUEUE_NAME  = short
  DESCRIPTION = short jobs
  PRIORITY    = 70                 # highest
  HOSTS       = hostGroupC
  RUNLIMIT    = 5 10
  End Queue

  Begin Queue
  QUEUE_NAME  = normal
  DESCRIPTION = default queue
  PRIORITY    = 40                 # lowest
  HOSTS       = hostGroupC
  End Queue

2.6.4 Enable the configuration

  badmin reconfig

Submit jobs and watch the users' dynamic priorities change in the queue:

  bqueues -rl normal

2.7 Preemptive scheduling

Configure basic slot preemption:

  Begin Queue
  QUEUE_NAME = short
  PRIORITY   = 70
  HOSTS      = hostGroupC          # potential conflict
  PREEMPTION = PREEMPTIVE[normal]
  End Queue

  Begin Queue
  QUEUE_NAME = normal
  PRIORITY   = 40
  HOSTS      = hostGroupC          # potential conflict
  PREEMPTION = PREEMPTABLE[short]
  End Queue

Submit jobs to both queues and check the pending reason of the preempted job.

2.8 Global limits

2.8.1 Limit the number of jobs a user can run

Add to lsb.users:

  Begin User
  USER_NAME  MAX_JOBS  JL/P
  user1      4         -
  user2      2         1
  user3      -         2
  groupA     8         -
  groupB@    1         1
  default    2         -
  End User

2.8.2 Limit the number of jobs a host can run

In lsb.hosts:

  Begin Host
  HOST_NAME  MXJ  JL/U
  host1      4    2
  host2      2    1
  host3      !    -
  End Host

2.8.3 Queue-level run limits

Add to lsb.queues:

  Begin Queue
  QUEUE_NAME = myQueue
  HJOB_LIMIT = 2
  PJOB_LIMIT = 1
  UJOB_LIMIT = 4
  HOSTS      = hostGroupA
  USERS      = userGroupA
  End Queue

2.8.4 General limits

Examples of global general limits in lsb.resources:

  Begin Limit
  USERS  QUEUES  HOSTS  SLOTS  MEM  SWP
  user1  -       hostB  -      -    20%
  user2  normal  hostA  -      20   -
  End Limit

  Begin Limit
  NAME     = limit1
  USERS    = user1
  PER_HOST = hostA hostC
  TMP      = 30%
  SWP      = 50%
  MEM      = 10%
  End Limit

  Begin Limit
  PER_USER  QUEUES  HOSTS    SLOTS  MEM  SWP  TMP  JOBS
  groupA    -       hgroup1  -      -    -    -    2
  user2     normal  -        -      200  -    -    -
  -         short   -        -      -    -    -    200
  End Limit

2.8.5 Enable the configuration

  badmin reconfig

2.9 Submission-control scripts (esub)

The global esub script is called whenever a job is submitted; it can be invoked automatically or explicitly to control how users submit jobs. Create esub.project under $LSF_SERVERDIR (chmod it executable):

  #!/bin/sh
  if [ "_$LSB_SUB_PARM_FILE" != "_" ]; then
      . $LSB_SUB_PARM_FILE
      if [ "_$LSB_SUB_PROJECT_NAME" = "_" ]; then
          echo "You must specify a project!" >&2
          exit $LSB_SUB_ABORT_VALUE
      fi
  fi
  exit 0

In lsf.conf define LSB_ESUB_METHOD="project".

2.10 External resource collection (elim) examples

2.10.1 Report free space in the home directory

Create elim.home under $LSF_SERVERDIR and chmod it executable:

  #!/bin/sh
  while true; do
      home=`df -k /home | tail -1 | awk '{printf "%4.1f", $4/(1024*1024)}'`
      echo 1 home $home
      sleep 30
  done

2.10.2 Report the number of root processes

Create elim.root under $LSF_SERVERDIR and chmod it executable:

  #!/bin/sh
  while true; do
      root=`ps -ef | grep -v grep | grep -c ^root`
      echo 1 rootprocs $root
      sleep 30
  done

2.10.3 Report application license counts

  #!/bin/sh
  lic_X=0; num=0
  while true; do
      # only the master gathers lic_X
      if [ "$LSF_MASTER" = "Y" ]; then
          lic_X=`lmstat -a -c lic_X.dat | grep ...` >&2
      fi
      # only training8 and training1 gather simpton licenses
      if [ "`hostname`" = "training8" -o "`hostname`" = "training1" ]; then
          num=`lmstat -a -c simpton_lic.dat | grep ...` >&2
      fi
      # all hosts, including the master, gather the following
      root=`ps -efw | grep -v grep | grep -c root` 2>&1
      tmp=`df -k /var/tmp | grep var | awk '{print $4/1024}'` >&2
      if [ "$LSF_MASTER" = "Y" ]; then
          echo 4 lic_X $lic_X simpton $num rtprc $root tmp $tmp
      else
          echo 3 simpton $num rtprc $root tmp $tmp
      fi
      # the same INTERVAL value defined in lsf.shared
      sleep 60
  done

2.10.4 Test an elim script

Run ./elim.root directly and check that its output is correct.

2.10.5 Add the resource definition and resource map

Add the rootprocs definition in lsf.shared, and map the resource to hosts in the ResourceMap section of the lsf.cluster file. Enable the configuration:

  lsadmin reconfig; badmin reconfig

2.10.6 Check the resource values

  lsload -l

3 Command-line application integration examples

This section walks through several applications integrated in different ways; a spooling file and a bsub command line are freely interchangeable.

3.1 CFD++ integration (spooling file)

3.1.1 CFD++ installation and license

Installation host: ln-3620-4. License file: /gpfs/software/cfdpp/mbin/Metacomp.lic. License server: ln-3620-4.

Start the license server:

  [hpcadmin@mn-3650 jessi]$ ssh ln-3620-4
  Last login: Tue Mar 26 19:19:24 2013 from mn-3650.private.dns.zone
  [hpcadmin@ln-3620-4 ~]$ /gpfs/software/cfdpp/mbin/lmgrd -c /gpfs/software/cfdpp/mbin/Metacomp.lic

Confirm the license server is running:

  /gpfs/software/cfdpp/mbin/lmutil lmstat -a -c /gpfs/software/cfdpp/mbin/Metacomp.lic

3.1.2 License-aware elim integration

Add the elim script. Only one elim instance is needed for the whole cluster, so placing the script on the master node is enough. On the master, create elim.lic under $LSF_SERVERDIR (full listing at the end of this document):

  [root@mn-3650 jessi]# cd $LSF_SERVERDIR
  [root@mn-3650 etc]# pwd
  /opt/lsf/8.3/linux2.6-glibc2.3-x86_64/etc
  [root@mn-3650 etc]# chmod a+x elim.lic

Edit the configuration files:

  [root@mn-3650 etc]# vi $LSF_ENVDIR/lsf.shared

adding the line:

  cfd_lic  Numeric  30  Y  (CFD++ License)

  [root@mn-3650 etc]# vi $LSF_ENVDIR/lsf.cluster

adding to the ResourceMap section:

  Begin ResourceMap
  RESOURCENAME  LOCATION
  cfd_lic       [all]
  hostid        [default]
  …

  [root@mn-3650 etc]# lsadmin reconfig; badmin reconfig

3.1.3 Add the CFD++ job starter

Not needed when submitting with a spooling file (it is used by the Portal integration). Add the job starter executable cfd_starter (full listing at the end of this document).

3.1.4 Add the CFD application profile

  [root@mn-3650 etc]# vi $LSF_ENVDIR/lsbatch/platform/configdir/lsb.applications

Add the following profile:

  Begin Application
  NAME        = cfd
  JOB_STARTER = /opt/lsf/jobstarter/cfd_starter
  RES_REQ     = "rusage[cfd_lic=1]"
  End Application

Run badmin reconfig to make the file take effect, and verify with bapp -l cfd:

  [root@mn-3650 bin]# bapp -l cfd
  APPLICATION NAME: cfd
   -- No description provided.
  STATISTICS:
  NJOBS  PEND  RUN  SSUSP  USUSP  RSV
  12     12    0    0      0      0
  PARAMETERS:
  JOB_STARTER: /opt/lsf/jobstarter/cfd_starter
  RES_REQ: "rusage[cfd_lic=1]"

3.1.5 CFD++ command-line submission script

Write cfd.sh (full listing at the end of this document), then submit it with bsub < cfd.sh.

3.2 Gaussian integration (spooling file)

3.2.1 Gaussian installation and license

Path: /gpfs/software/Gaussian/. License: unlicensed build; a single job can only run on one host.

3.2.2 Gaussian command-line submission script

With the g03.sh script (full listing at the end of this document), submit the job with: bsub < g03.sh

3.3 Abaqus script integration (bsub command)

1) Edit the abaqus_run.sh script:

  #!/bin/sh
  ## version: 1.3.0
  export ABAQUS_CMD="/gpfs/software/Abaqus/Commands/abaqus"
  export LM_LICENSE_FILE="/gpfs/software/Abaqus/License/abq612.lic"
  # number of CPUs; must match the -n value on the bsub command line
  export NCPU=16
  # input file
  export INPUT_FILE=beam.inp
  # job name
  export JOB_NAME=abaqus_job3
  ${ABAQUS_CMD} job=$JOB_NAME cpus=$NCPU input=\"$INPUT_FILE\"

2) Submit through LSF: cd into the directory holding the input data and run

  bsub -q qeng -n 16 ./abaqus_run.sh
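The elim.lic script referenced in 3.1.2 (full listing at the end of this document) derives a free-license count by subtracting lmstat's in-use count from its issued count. A minimal runnable sketch of just that step, with one line of lmstat-style output inlined as sample data; the field positions match the cut -f7 / -f13 calls in the appendix script, and a live version would invoke lmutil lmstat instead:

```shell
#!/bin/sh
# Compute free CFD++ licenses from an lmstat "Users of FEATURE" line
# and emit it in elim format: "<nresources> <name> <value>".
sample_lmstat() {
cat <<'EOF'
Users of CFD++_SOLV_Ser:  (Total of 12 licenses issued;  Total of 5 licenses in use)
EOF
}
# cut -d' ' counts every single space, so the double spaces after ":" and ";"
# shift the issued count to field 7 and the in-use count to field 13.
total=$(sample_lmstat | grep "Users of CFD++_SOLV_Ser" | cut -d' ' -f7)
used=$(sample_lmstat | grep "Users of CFD++_SOLV_Ser" | cut -d' ' -f13)
free=$((total - used))
echo "1 cfd_lic $free"
```

The real script wraps this in a `while true; do ...; sleep 30; done` loop, since lim expects the report line to be refreshed continuously.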
3.4 Amber jobs (blaunch integration, with accounting)

For Intel MPI, write an mpdboot.lsf script (see 3.7.2.1), make it executable, and place it under $LSF_SERVERDIR. Then write the submission script:

  [ymei@mnis test]$ cat new.sh
  #!/bin/sh
  #BSUB -q small
  #BSUB -n 128
  #BSUB -o %J.out
  #BSUB -e %J.err
  #BSUB -J IMPI
  #BSUB -x
  #export PATH=/gpfs01/software/intel/impi/4.1.0.024/intel64/bin:$PATH
  #/gpfs01/home/ymei/jessi/mpdboot.lsf
  mpdboot.lsf
  export I_MPI_DEVICE=ssm
  #export I_MPI_FABRICS=shm:ofa
  #export I_MPI_FAST_STARTUP=1
  #export I_MPI_DEVICE=rdssm
  #mpiexec -np $LSB_DJOB_NUMPROC /gpfs01/software/intel/impi/4.1.0.024/test/helloword
  mpiexec -np $LSB_DJOB_NUMPROC $AMBERHOME/bin/sander.MPI -ng 32 -groupfile remd10.groupfile
  mpdallexit

Submit the job:

  bsub < new.sh

3.5 Platform MPI jobs

3.5.1 Install Platform MPI

Confirm that passwordless ssh works for the user. Install Platform MPI into a shared directory:

  sh platform_mpi-08.3.0.0-0320r.x64.sh -installdir=/opt/pmpi -norpm

If the C compiler is missing, run: yum install gcc

3.5.2 Verify the installation outside LSF

Set the environment variables:

  export MPI_REMSH="ssh -x"
  export MPI_ROOT=/opt/pmpi/opt/ibm/platform_mpi/

Compile the hello_world sample program:

  /opt/pmpi/opt/ibm/platform_mpi/bin/mpicc -o helloworld /opt/pmpi/opt/ibm/platform_mpi/help/hello_world.c

  [root@server3 help]# /opt/pmpi/opt/ibm/platform_mpi/bin/mpirun -f ../help/hosts
  warning: MPI_ROOT /opt/pmpi/opt/ibm/platform_mpi/ != mpirun path /opt/pmpi/opt/ibm/platform_mpi
  Hello world! I'm 1 of 4 on server3
  Hello world! I'm 0 of 4 on server3
  Hello world! I'm 3 of 4 on computer007
  Hello world! I'm 2 of 4 on computer007
  [root@server3 help]# cat ../help/hosts
  -h server3 -np 2 /opt/pmpi/opt/ibm/platform_mpi/help/helloworld
  -h computer007 -np 2 /opt/pmpi/opt/ibm/platform_mpi/help/helloworld

3.5.3 Submit through LSF

  export MPI_REMSH=blaunch
  $ mpirun -np 4 -IBV ~/helloworld
  $ mpirun -np 32 -IBV ~/helloworld
  $ mpirun -np 4 -TCP ~/helloworld

or

  [root@server3 conf]# bsub -o %J.out -e %J.err -n 4 /opt/pmpi/opt/ibm/platform_mpi/bin/mpirun -lsb_mcpu_hosts /opt/pmpi/opt/ibm/platform_mpi/help/helloworld
  Job <210> is submitted to default queue <normal>.
  [root@server3 conf]# bjobs
  JOBID  USER  STAT  QUEUE   FROM_HOST  EXEC_HOST  JOB_NAME    SUBMIT_TIME
  210    root  PEND  normal  server3               *elloworld  May 9 10:55
  [root@server3 conf]# cat 210.out
  Sender: LSF System <jessi@computer007>
  Subject: Job 210: </opt/pmpi/opt/ibm/platform_mpi/bin/mpirun -lsb_mcpu_hosts /opt/pmpi/opt/ibm/platform_mpi/help/helloworld> in cluster <jessi_cluster> Done

  Job </opt/pmpi/opt/ibm/platform_mpi/bin/mpirun -lsb_mcpu_hosts /opt/pmpi/opt/ibm/platform_mpi/help/helloworld> was submitted from host <server3> by user <root> in cluster <jessi_cluster>.
  Job was executed on host(s) <4*computer007>, in queue <normal>, as user <root> in cluster <jessi_cluster>.
  </root> was used as the home directory.
  </opt/lsf/conf> was used as the working directory.
  Started at Thu May 9 18:49:06 2013
  Results reported at Thu May 9 18:49:07 2013
  Your job looked like:
  ------------------------------------------------------------
  # LSBATCH: User input
  /opt/pmpi/opt/ibm/platform_mpi/bin/mpirun -lsb_mcpu_hosts /opt/pmpi/opt/ibm/platform_mpi/help/helloworld
  ------------------------------------------------------------
  Successfully completed.
  Resource usage summary:
  CPU time:               0.23 sec.
  Max Memory:             2 MB
  Average Memory:         2.00 MB
  Total Requested Memory: -
  Delta Memory:           -
  (Delta: the difference between total requested memory and actual max usage.)
  Max Swap:               36 MB
  Max Processes:          1
  Max Threads:            1
  The output (if any) follows:
  Hello world! I'm 2 of 4 on computer007
  Hello world! I'm 0 of 4 on computer007
  Hello world! I'm 1 of 4 on computer007
  Hello world! I'm 3 of 4 on computer007
  PS: Read file <.210.err> for stderr output of this job.

Or, with more parameters:

  $ /opt/platform_mpi/bin/mpirun -np 120 -ibv -hostlist "cn-22-001 cn-22-002 cn-22-003 cn-22-004 cn-22-005 cn-22-006 cn-22-007 cn-22-008 cn-22-009 cn-22-010" /data/hello_world

To run an MPI job outside LSF, set the MPI_USELSF environment variable to n.

3.6 Open MPI jobs

Download the Open MPI package and build it:

  ./configure LIBS=-ldl --with-lsf=yes --prefix=/usr/local/ompi/

Open MPI versions above 1.3.2 are tightly integrated with LSF blaunch. Submit an Open MPI job:

  bsub -n 2 -o %J.out -e %J.err mpiexec mympi.out

3.7 Intel MPI jobs

3.7.1 Express edition, without accounting

If you need job accounting, use the blaunch integration (3.7.2) instead. Environment setup in .bashrc:

  export PATH=/gpfs/software/intel/composerxe/bin/:/gpfs/software/intel/mpi_41_0_024/include:/gpfs/software/intel/mpi_41_0_024/bin64:/gpfs/software/intel/composerxe/mkl:$PATH
  source /gpfs/software/intel/composerxe/bin/compilervars.sh intel64
  source /gpfs/software/intel/mpi_41_0_024/bin64/mpivars.sh
  source /gpfs/software/intel/composerxe/mkl/bin/mklvars.sh intel64

MPI test program helloworld.c:

  #include "mpi.h"
  #include <stdio.h>
  #include <math.h>
  int main(int argc, char **argv)
  {
      int myid, numprocs;
      int namelen;
      char processor_name[MPI_MAX_PROCESSOR_NAME];
      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &myid);
      MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
      MPI_Get_processor_name(processor_name, &namelen);
      fprintf(stderr, "Hello World! Process %d of %d on %s\n", myid, numprocs, processor_name);
      MPI_Finalize();
      return 0;
  }

Run over TCP:

  mpirun -machine host.eth -env I_MPI_FABRICS shm:tcp -np 32 ./helloworld.icc

Run over InfiniBand:

  mpirun -machine host.ib -env I_MPI_FABRICS shm:ofa -np 32 ./helloworld.icc

Run in debug mode:

  mpirun -machine host.ib -env I_MPI_FABRICS shm:ofa -env I_MPI_DEBUG 2 -np 32 ./helloworld.icc

LSF submission script bsub_intelmpi_ib.sh:

  #!/bin/sh
  #BSUB -cwd .
  #BSUB -R "span[ptile=4]"
  #BSUB -e %J.err
  #BSUB -o %J.out
  mpirun -machine $LSB_DJOB_HOSTFILE -env I_MPI_FABRICS shm:ofa ./helloworld.icc

Submit the job:

  bsub < bsub_intelmpi_ib.sh

3.7.2 Express edition, blaunch with accounting

3.7.2.1 Write the mpdboot.lsf file

  #!/usr/bin/env python
  """mpdboot for LSF
  [-f|--hostfile hostfile]
  [-i|--ifhn=alternate_interface_hostname_of_ip_address -f|--hostfile hostfile]
  [-h]
  """
  import re
  import string
  import time
  import sys
  import getopt
  from time import ctime
  from os import environ, path
  from sys import argv, exit, stdout
  from popen2 import Popen4
  from socket import gethostname, gethostbyname

  def mpdboot():
      # change me
      MPI_ROOTDIR = "/opt/intel/impi/4.0.0.025"
      mpdCmd = "%s/bin/mpd" % MPI_ROOTDIR
      mpdtraceCmd = "%s/bin/mpdtrace" % MPI_ROOTDIR
      mpdtraceCmd2 = "%s/bin/mpdtrace -l" % MPI_ROOTDIR
      nHosts = 1
      host = ""
      ip = ""
      localHost = ""
      localIp = ""
      found = False
      MAX_WAIT = 5
      t1 = 0
      hostList = ""
      hostTab = {}
      cols = []
      hostArr = []
      hostfile = environ.get('LSB_DJOB_HOSTFILE')
      binDir = environ.get('LSF_BINDIR')
      if environ.get('LSB_MCPU_HOSTS') == None \
         or hostfile == None \
         or binDir == None:
          print "not running in LSF"
          exit(-1)
      rshCmd = binDir + "/blaunch"
      p = re.compile("\S+_\d+\s+\(\d+\.\d+\.\d+\.\d+")
      try:
          opts, args = getopt.getopt(sys.argv[1:], "hf:i:", ["help", "hostfile=", "ifhn="])
      except getopt.GetoptError, err:
          print str(err)
          usage()
          sys.exit(-1)
      fileName = None
      ifhn = None
      for o, a in opts:
          if o == "-v":
              version(); sys.exit()
          elif o in ("-h", "--help"):
              usage()
              sys.exit()
          elif o in ("-f", "--hostfile"):
              fileName = a
          elif o in ("-i", "--ifhn"):
              ifhn = a
          else:
              print "option %s unrecognized" % o
              usage()
              sys.exit(-1)
      if fileName == None:
          if ifhn != None:
              print "--ifhn requires a hostfile containing 'hostname ifhn=alternate_interface_hostname_of_ip_address'\n"
              sys.exit(-1)
          # use LSB_DJOB_HOSTFILE
          fileName = hostfile
      localHost = gethostname()
      localIp = gethostbyname(localHost)
      pifhn = re.compile("\w+\s+\ifhn=\d+\.\d+\.\d+\.\d+")
      # pifhn = re.compile("\S+\ifhn=\d+\.\d+\.\d+\.\d+")
      try:
          # check the hostfile
          machinefile = open(fileName, "r")
          for line in machinefile:
              if not line or line[0] == '#':
                  continue
              line = re.split('#', line)[0]
              line = line.strip()
              if not line:
                  continue
              if not pifhn.match(line):
                  # should not have --ifhn option
                  if ifhn != None:
                      print "hostfile %s not valid for --ifhn" % (fileName)
                      print "hostfile should contain 'hostname ifhn=ip_address'"
                      sys.exit(-1)
                  host = re.split(r'\s+', line)[0]
                  if cmp(localHost, host) == 0 \
                     or cmp(localIp, gethostbyname(host)) == 0:
                      continue
                  hostTab[host] = None
              else:
                  # multiple blaunch-es
                  cols = re.split(r'\s+\ifhn=', line)
                  host = cols[0]
                  ip = cols[1]
                  if cmp(localHost, host) == 0 \
                     or cmp(localIp, gethostbyname(host)) == 0:
                      continue
                  hostTab[host] = ip
              # print "line: %s" % (line)
          machinefile.close()
      except IOError, err:
          print str(err)
          exit(-1)
      # launch an mpd on the local host
      if ifhn != None:
          # cmd = mpdCmd + " --ifhn=%s" % (ifhn)
          cmd = "%s -n %s %s --ifhn=%s" % (rshCmd, localHost, mpdCmd, ifhn)
      else:
          # cmd = mpdCmd
          cmd = "%s -n %s %s" % (rshCmd, localHost, mpdCmd)
      print "Starting an mpd on local host:", cmd
      Popen4(cmd, 0)
      # wait til 5 seconds at max
      while t1 < MAX_WAIT:
          time.sleep(1)
          trace = Popen4(mpdtraceCmd2, 0)
          # hostname_portnumber (IP address)
          line = trace.fromchild.readline()
          if not p.match(line):
              t1 += 1
              continue
          strings = re.split('\s+', line)
          (basehost, baseport) = re.split('_', strings[0])
          # print "host:", basehost, "port:", baseport
          found = True
          host = ""
          break
      if not found:
          print "Cannot start mpd on local host"
          sys.exit(-1)
      else:
          print "Done starting an mpd on local host"
      # launch mpd on the rest of the hosts
      for host, ip in hostTab.items():
          nHosts += 1
      if nHosts < 2:
          sys.exit(0)
      print "Constructing an mpd ring..."
      if ifhn != None:
          for host, ip in hostTab.items():
              # print "host: %s ifhn %s\n" % (host, ip)
              cmd = "%s %s %s -h %s -p %s --ifhn=%s" % (rshCmd, host, mpdCmd, basehost, baseport, ip)
              # print "cmd:", cmd
              Popen4(cmd, 0)
      else:
          for host, ip in hostTab.items():
              # print "host: %s ifhn %s\n" % (host, ip)
              hostArr.append(host + "")
          hostList = string.join(hostArr)
          print "hostList: %s" % (hostList)
          cmd = "%s -z '%s' %s -h %s -p %s" % (rshCmd, hostList, mpdCmd, basehost, baseport)
          print "cmd:", cmd
          Popen4(cmd, 0)
      # wait till all mpds are started
      MAX_TIMEOUT = 300 + 0.1 * (nHosts)
      t1 = 0
      started = False
      while t1 < MAX_TIMEOUT:
          time.sleep(1)
          trace = Popen4(mpdtraceCmd, 0)
          if len(trace.fromchild.readlines()) < nHosts:
              t1 += 1
              continue
          started = True
          break
      if not started:
          print "Failed to construct an mpd ring"
          exit(-1)
      print "Done constructing an mpd ring at", ctime()

  def usage():
      print __doc__

  if __name__ == '__main__':
      mpdboot()

3.7.2.2 Submission script (spooling file) cpi.sh

  # LSBATCH: User input
  #BSUB -n 2
  #BSUB -P I210105G
  ##BSUB -W 00:33
  #BSUB -o %J.out
  #BSUB -e %J.err
  #BSUB -J IMPI
  #BSUB -R 'span[ptile=1]'
  #BSUB -x
  #BSUB -m "iquadcore-01! rhel5-5"
  #BSUB -app djob
  #export LSB_DEBUG_CMD="LC_TRACE LC_EXEC LC_HPC"
  #export LSB_CMD_LOG_MASK=LOG_DEBUG3
  export PATH=/opt/intel/impi/4.0.0.025/bin:$PATH
  #. /usr/share/modules/init/bash
  #module purge
  #module load mpi/3.2.2.006
  set -x
  mpiexec -np $LSB_DJOB_NUMPROC /tmp/cpi 10000
  mpdallexit

3.7.2.3 Submit the job

  bsub < cpi.sh

3.7.3 Standard edition, PAM integration

  [iquadcore-01] 185% source /scratch/intel/impi/3.2.2.006/bin64/mpivars.csh
  [iquadcore-01] 186% env | grep MPI
  I_MPI_ROOT=/scratch/intel/impi/3.2.2.006

3.7.3.1 Configure the intelmpi resource as described in the HPC documentation

Add the intelmpi resource in the lsf.shared file and attach it to each host in the lsf.cluster file. Verify with the following command:

  [iquadcore-01] 189% lshosts
  HOST_NAME    type    model       cpuf   ncpus  maxmem  maxswp  server  RESOURCES
  saspm01      X86_64  PC6000      116.1  2      3008M   3074M   Yes     (intelmpi mpich2 mg openmpi)
  iquadcore-0  X86_64  Intel_EM64T 0.0    8      7974M   4094M   Yes     (intelmpi mg)

(2) Edit the installation path in intelmpi_wrapper:

  [saspm01] 189% sudo vi `which intelmpi_wrapper`
  # Define top directory for Intel MPI
  MPI_TOPDIR="/scratch/intel/impi/3.2.2.006"
  # Define MPI commands used in the script
  MPIEXEC_CMD="$MPI_TOPDIR/bin64/mpiexec"
  MPDEXIT_CMD="$MPI_TOPDIR/bin64/mpdallexit"
  MPDBOOT_CMD="$MPI_TOPDIR/bin64/mpdboot"
  # Check Intel MPI version. Must be 1.0.2 or higher.
  checkMPIversion

3.7.3.2 Verify MPI works outside LSF

  [iquadcore-01] 195% cat p.hosts
  iquadcore-01
  iquadcore-01
  iquadcore-01
  saspm01
  saspm01
  saspm01
  [iquadcore-01] 196% mpiexec -machinefile p.hosts -n 4 ./test
  Hello world: rank 0 of 4 running on iquadcore-01
  Hello world: rank 1 of 4 running on iquadcore-01
  Hello world: rank 2 of 4 running on iquadcore-01
  Hello world: rank 3 of 4 running on saspm01
  [iquadcore-01] 197% mpdtrace -l
  iquadcore-01_42093 (172.20.1.100)
  saspm01_36768 (172.20.6.95)

3.7.3.3 Submit an LSF job with PAM

  [iquadcore-01] 200% bsub -I -a intelmpi -n 4 -m "iquadcore-01 saspm01!" mpirun.lsf ./test
  Job <3814> is submitted to queue <hpc_linux>.
  <<Waiting for dispatch ...>>
  <<Starting on saspm01>>
  Hello world: rank 0 of 4 running on saspm01
  Hello world: rank 1 of 4 running on saspm01
  Hello world: rank 2 of 4 running on iquadcore-01
  Hello world: rank 3 of 4 running on iquadcore-01
  Job /scratch/sup5/zliu/7.0EP5/top/7.0/linux2.6-glibc2.3-x86_64/bin/intelmpi_wrapper ./test
  TID    HOST_NAME   COMMAND_LINE  STATUS  TERMINATION_TIME
  =========================================================
  00000  iquadcore-  ./test        Done    03/16/2010 20:00:49
  00001  iquadcore-  ./test        Done    03/16/2010 20:00:49
  00002  saspm01     ./test        Done    03/16/2010 20:00:39
  00003  saspm01     ./test        Done    03/16/2010 20:00:39
  [iquadcore-01] 201%

Note that there is no "-np 4" after "bsub -n 4 mpirun.lsf".

3.7.3.4 Debugging

Append -pass and the debug options to the submission command:

  bsub -I -a intelmpi -n 4 mpirun.lsf ./test -pass -Dpass3 -TSdebug

4 Installing PAC

1) Check the installation package, e.g. pac8.3_standard_linux-x64.tar.Z; the entitlement file ships inside the package. Place it under the NFS-shared directory /apps/platform/8.3/pac.
2) Unpack pac8.3_standard_linux-x64.tar.Z.
3) Enter pac8.3_standard_linux-x64.
4) Edit the installation path and DB connector path in pacinstall.sh:

  export PAC_TOP="/apps/platform/8.3/pac"
  export MYSQL_JDBC_DRIVER_JAR="/usr/share/java/mysql-connector-java-5.1.12.jar"

5) Install mysql, client and server (yum install mysql* -y), and confirm the mysql service starts correctly: service mysqld status/start/stop. Then edit /opt/lsf/conf/lsbatch/cluster1/configdir/lsb.params, add

  ENABLE_EVENT_STREAM=y

and run badmin reconfig.
6) Run pacinstall.sh to install (make sure the LSF environment has been sourced first).
7) Source the environment: source /apps/platform/8.3/pac/profile.platform (append this command to the end of /etc/profile so it is sourced automatically at login).
8) Start the portal:

  # pmcadmin start
  # perfadmin start all

9) Check that it started correctly:

  # pmcadmin list
  # perfadmin list

10) Access the portal at: http://hostipaddress:8080
11) Log in as an administrator or a user (NIS accounts).
12) To configure VNC, refer to the PAC administrator documentation.

5 Application integration with PAC

The idea of PAC integration: configure and design the XML submission page, then handle the environment variables passed from the XML form in the corresponding backing script, which finally builds the submission logic (at the end of /opt/pac/gui/conf/application/published/app.cmd):

  JOB_RESULT=`/bin/sh -c "bsub -q $SUB_QUEUES $JOB_NAME_OPT $CWD_OPT ${PROJECT_NAME_OPT} ${CWD_DIR} ${QUEUE_OPT} $NCPU_OPT $LSF_RESREQ $RUNHOST_OPT $APP_PARAMS $EXTRA_PARAMS $OUTPUT_OPT $NASTRAN_CMD $INPUT_OPT $MEMORYARCH_OPT $NASTRAN_PARAMS ${NASTRAN_OPTIONS} ${MPI_OPTIONS} 2>&1"`

5.1 Gaussian GUI integration

Log in as lsfadmin at http://hostipaddress:8080/platform/. Select an existing template, click Save As to create a GAUSSIAN template, then open the Modify page to edit it. In the application-parameters section click Add, choose Drop Down List, click Next, and set the environment variable and the drop-down values representing the two Gaussian versions. Click OK on the drop-down page and save. Editors can delete or hide unused fields, and can set defaults for the drop-down variables. Then edit the backing script, either by clicking Submission Script in the GUI or by modifying the file directly. After saving, submit a test job: click Add Server or Add Local File to add a .com file, click Submit Test Job, and watch the test job run. Because of the execution permissions set on Gaussian, hpcadmin cannot run it; use an account in the gaussian user group.

5.2 CFD++ portal page and backing script

Adding and modifying the page works the same way as for Gaussian. Note that the input files must be compressed into a Zip file and uploaded to the cluster's shared storage; the backing script unzips the package directly and runs the rest of the logic. In the CFD++ template, choose the input ZIP package to run, then click Submit.

5.3 Monitoring licenses in PAC

This requires purchasing the License Scheduler module. Edit lsf.licensescheduler and add the corresponding license servers and features, for example:

License servers:

  Begin ServiceDomain
  NAME = LanServer
  LIC_SERVERS = ((1700@serverapp) (7789@ionode))
  End ServiceDomain

License features:

  Begin Feature
  FLEX_NAME = MD_NASTRAN
  NAME = nastran_lic
  DISTRIBUTION = LanServer(default 1 high 1/1)
  End Feature

  Begin Feature
  FLEX_NAME = CFD_FASTPACK_BASE
  NAME = fastran_lic
  DISTRIBUTION = LanServer(default 1 high 1/1)
  End Feature

Then restart the license service with bladmin reconfig, use blstat to check license status, and view license usage in the portal.

6 Installing License Scheduler

6.1 Basic installation test

Get the package lsf9.1_licsched_lnx26-libc23-x64.tar.Z and unpack it:

  zcat lsf9.1_licsched_lnx26-libc23-x64.tar.Z | tar xvf -
  cd lsf9.1_licsched_lnx26-libc23-x64

Source the LSF environment before installation, then run ./setup. Restart LSF to test License Scheduler:

  lsadmin limrestart

Run bld / blstat to check whether the installation is correct.

6.2 Basic configuration example

Use License Scheduler to schedule the nastran and CFD licenses by configuring the lsf.licensescheduler file:

6.2.1 Add the license server addresses

  Begin ServiceDomain
  NAME = LanServer
  LIC_SERVERS = ((1700@serverapp) (7789@ionode))
  End ServiceDomain

6.2.2 Map the license features

  Begin Feature
  FLEX_NAME = MD_NASTRAN
  NAME = nastran_lic
  DISTRIBUTION = LanServer(default 1 high 1/1)
  End Feature

  Begin Feature
  FLEX_NAME = CFD_FASTPACK_BASE
  NAME = fastran_lic
  DISTRIBUTION = LanServer(default 1 high 1/1)
  End Feature

6.2.3 Use the license resource

  bsub -Lp default -R "rusage[fastran_lic=1]" fastran.cmd

6.2.4 Configure license scheduling policies

Omitted for now.

7 Frequently asked questions

If PAC does not start, check the PAC logs, and check whether the default port 8080 is already in use; change the port if necessary.

8 Using the man pages

LSF's man pages are very thorough: man lsf.conf, man bsub, or the name of any other command or configuration file.

9 Technical support

Call the TSS hotline: 800(400)-810-1818, extension 5200. Your customer ICN code is required.

Appendix: script listings

  [root@mn-3650 etc]# cat elim.lic
  #!/bin/sh
  totallicences=`/gpfs/software/cfdpp/mbin/lmutil lmstat -a -c /gpfs/software/cfdpp/mbin/Metacomp.lic | grep "Users of CFD++_SOLV_Ser" | /bin/cut -d' ' -f7`
  while true
  do
      usedlicences=`/gpfs/software/cfdpp/mbin/lmutil lmstat -a -c /gpfs/software/cfdpp/mbin/Metacomp.lic | /bin/grep "Users of CFD++_SOLV_Ser" | /bin/cut -d' ' -f13`
      cfd_lic=$((${totallicences}-${usedlicences}))
      echo "1 cfd_lic ${cfd_lic}"
      /bin/sleep 30
  done

  [hpcadmin@mn-3650 jessi]$ cat /opt/lsf/jobstarter/cfd_starter
  #!/bin/sh
  MPI_RUN=/gpfs/software/cfdpp/hpmpi/bin/mpirun
  case "$PRESSION" in
  SINGLE_PRESSION)
      CFD_CMD=/gpfs/software/cfdpp/mbin/mcfd.11.1/r4_hpmpimcfd
      ;;
  DOUBLE_PRESSION)
      CFD_CMD=/gpfs/software/cfdpp/mbin/mcfd.11.1/hpmpimcfd
      ;;
  esac
  CMD="$* -hostfile $LSB_DJOB_HOSTFILE $CFD_CMD"
  eval "$CMD"

  [hpcadmin@mn-3650 jessi]$ cat cfd.sh
  #!/bin/sh
  #BSUB -n 12
  #BSUB -o %J.out
  #BSUB -e %J.err
  #BSUB -app cfd
  #BSUB -R "rusage[cfd_lic=1]"
  cd /gpfs/software/cfd++/test/ogive/
  /gpfs/software/cfdpp/hpmpi/bin/mpirun -hostfile $LSB_DJOB_HOSTFILE /gpfs/software/cfdpp/mbin/mcfd.11.1/hpmpimcfd

  g03.sh:
  #!/bin/sh
  #BSUB -q qchem
  #BSUB -n 4
  #BSUB -R "span[hosts=1]"
  #BSUB -cwd .
  #BSUB -e %J.err
  #BSUB -o %J.out
  JOB=Full_codes_112_ipr_C1_0588.com
  JOBNAME=`basename "$JOB" .com`
  export g03root=/gpfs/software/Gaussian
  export GAUSS_SCRDIR=/tmp
  source $g03root/g03/bsd/g03.profile
  /gpfs/software/Gaussian/g03/g03 < $JOB > "$JOBNAME.log"

  [ms@mn-3650 ~]$ cat /usr/share/pmc/gui/conf/application/published/GAUSSIAN/GAUSSIAN.cmd
  #!/bin/sh
  # number of tasks per host
  SPAN="span[hosts=1]"
  #LSF_RESREQ="select[type==any]"
  LANG=C
  # Source COMMON functions.
  . ${GUI_CONFDIR}/application/COMMON
  # check BSUB parameters and create final bsub options
  if [ "x$JOB_NAME" != "x" ]; then
      JOB_NAME_OPT="-J \"$JOB_NAME\""
  else
      JOB_NAME_OPT="-J `basename $OUTPUT_FILE_LOCATION`"
  fi
  if [ "x$SPAN" != "x" ]; then
      LSF_RESREQ="$LSF_RESREQ $SPAN"
  fi
  if [ "x$LSF_RESREQ" != "x" ]; then
      LSF_RESREQ="-R \"$LSF_RESREQ\""
  fi
  if [ "x$OUTPUT_FILE_LOCATION" != "x" ]; then
      OUTPUT_OPT="-o \"${OUTPUT_FILE_LOCATION}/output.${EXECUTIONUSER}.txt\""
      CWD_DIR="-cwd \"$OUTPUT_FILE_LOCATION\""
  else
      OUTPUT_OPT="-o ${DEPLOY_HOME}/plugin/lsf/logs/output.${EXECUTIONUSER}.txt"
      CWD_DIR="-cwd /home/$USER"
  fi
  if [ "x$GS_VERSION" = "xG03" ]; then
      export g03root=/gpfs/software/gaussian
      export GAUSS_SCRDIR=/tmp
      source $g03root/g03/bsd/g03.profile
      GS_BIN="/gpfs/software/gaussian/g03/g03"
  fi
  if [ "x$GS_VERSION" = "xG09" ]; then
      export g09root=/gpfs/software/gaussian
      export GAUSS_SCRDIR=/tmp
      source $g09root/g09/bsd/g09.profile
      GS_BIN="/gpfs/software/gaussian/g09/g09"
  fi
  JOB_RESULT=`/bin/sh -c "bsub -J $JOB_NAME ${CWD_DIR} -q qchem -n 4 $LSF_RESREQ $OUTPUT_OPT $GS_BIN \< $INPUT_FILE_COM \> \"$INPUT_FILE_COM.log\" 2>&1"`
  export JOB_RESULT OUTPUT_FILE_LOCATION
  ${GUI_CONFDIR}/application/job-result.sh

  [hpcadmin@mn-3650 jessi]$ cat /usr/share/pmc/gui/conf/application/published/CFD/CFD.cmd
  #!/bin/sh
  MPI_RUN=/gpfs/software/cfdpp/hpmpi/bin/mpirun
  . /gpfs/software/cfdpp/mcfdenv.sh
  export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/gpfs/software/cfdpp/glib
  export MPI_ROOT=/gpfs/software/cfdpp/hpmpi
  export PATH=$PATH:$MPI_ROOT/bin
  export MANPATH=$MANPATH:$MPI_ROOT/share/man
  export MPI_REMSH=ssh
  SPAN="span[ptile=4]"
  # Source COMMON functions.
  . ${GUI_CONFDIR}/application/COMMON
  if [ "x$NCPU" != "x" ]; then
      NCPU_OPT="-n $NCPU"
  fi
  if [ "x$DECK" != "x" ]; then
      DECK=`formatFilePath "${DECK}"`
      INPUT_OPT="`basename $DECK`"
  else
      echo "You must specify an input zip file to submit a CFD job." 1>&2
      exit 1
  fi
  if [ "x$EXTRA_PARAMS" != "x" ]; then
      RESULT=`isValidOption "$EXTRA_PARAMS"`
      if [ "$RESULT" = "N" ]; then
          exit 1
      fi
  fi
  if [ "x$QUEUE" != "x" ]; then
      QUEUE_OPT="-q $QUEUE"
  else
      QUEUE_OPT=""
  fi
  if [ "x$JOB_NAME" != "x" ]; then
      JOB_NAME_OPT="-J \"$JOB_NAME\""
  else
      JOB_NAME_OPT="-J `basename $OUTPUT_FILE_LOCATION`"
  fi
  if [ "x$SPAN" != "x" ]; then
      LSF_RESREQ="$LSF_RESREQ $SPAN"
  fi
  if [ "x$LSF_RESREQ" != "x" ]; then
      LSF_RESREQ="-R \"$LSF_RESREQ\""
  fi
  if [ "x$OUTPUT_FILE_LOCATION" != "x" ]; then
      OUTPUT_OPT="-o \"${OUTPUT_FILE_LOCATION}/output.${EXECUTIONUSER}.txt\""
      CWD_DIR="-cwd \"$OUTPUT_FILE_LOCATION\""
  else
      OUTPUT_OPT="-o ${DEPLOY_HOME}/plugin/lsf/logs/output.${EXECUTIONUSER}.txt"
      CWD_DIR="-cwd /gpfs/home/$USER"
  fi
  /usr/bin/unzip $DECK -d $OUTPUT_FILE_LOCATION
  JOB_RESULT=`/bin/sh -c "bsub -n 12 -q qchem -app cfd $JOB_NAME_OPT $CWD_OPT ${CWD_DIR} ${QUEUE_OPT} $NCPU_OPT $LSF_RESREQ $EXTRA_PARAMS $OUTPUT_OPT $MPI_RUN 2>&1"`
  export JOB_RESULT OUTPUT_FILE_LOCATION
  ${GUI_CONFDIR}/application/job-result.sh
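The .cmd scripts above all follow the same pattern: each submission field exported by the portal becomes a bsub flag only when it is non-empty, and the flags are concatenated into the final command line. A minimal standalone sketch of that pattern; the variable names here (JOB_NAME, QUEUE, NCPU) are illustrative stand-ins for whatever the portal actually exports:

```shell
#!/bin/sh
# Assemble bsub options from optional environment variables, skipping
# any that are unset or empty, as the PAC .cmd scripts do.
build_bsub_opts() {
    opts=""
    [ -n "$JOB_NAME" ] && opts="$opts -J $JOB_NAME"
    [ -n "$QUEUE" ]    && opts="$opts -q $QUEUE"
    [ -n "$NCPU" ]     && opts="$opts -n $NCPU"
    echo "$opts"
}
# Simulate what the portal would export before calling the script.
JOB_NAME=test1 QUEUE=normal NCPU=4
opts=$(build_bsub_opts)
echo "bsub$opts ./myjob.sh"
```

Unsetting QUEUE, for instance, simply drops the -q flag from the generated command, which is why the real scripts can share one template across very different applications.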