[Somewhat outdated] HACMP 5.x Technical Exchange

HACMP 5.X New Features and Daily Maintenance
He Bing (hebing@cn.ibm.com)

Contents
- Review of HACMP concepts
- New features in HACMP 5.x
- Common HA architectures
- Daily administration

Part 1: Review of HACMP Concepts

Although Hardware Is Now Very Reliable, Hardware Failures Account for a Small Minority of System Outages
Several studies place the proportion between 20% and 45%. Human error, software errors and planned maintenance cause the majority of service outages.

HACMP (High Availability Cluster Multi-Processing)
Why is high availability needed? What is HACMP?
- High availability: maximize system availability (uptime) and minimize system downtime.
- Multi-processing: multiple applications can run on the nodes of one cluster, sharing data or accessing data concurrently.
- The purpose of HACMP: eliminate single points of failure (SPOF) to achieve high availability.
- High availability is fault resilient, not fault tolerant.

High Availability & Fault Tolerance [Figure]

Software Layers on an HACMP Node
- Application: uses the services made highly available by HACMP
- HACMP: makes services highly available for applications and coordinates resource availability across the cluster
- RSCT: provides reliable communication between nodes and coordination of subsystems
- AIX: operating system services
- LVM: logical storage management
- TCP/IP: manages communications at a logical layer

Environments Not Suited to HACMP
- You cannot suffer any downtime: failovers will cause at least some downtime.
- Your environment is not stable: HACMP depends on stable software levels and a stable configuration, and it is susceptible to the "fiddle factor".
- Your application needs manual intervention to recover from a failure (manual reset of a device, etc.).

Considerations for Using HACMP
- The application must be able to recover from a stop/restart operation. It must:
  - release all resources when stopped, whether normally or abnormally
  - tolerate a loss of memory contents
  - tolerate a loss of processor state
  - perform a restart from a checkpoint
  - recover from partial data writes
  - operate in a "transactional" protocol
- There must not be a single point of failure in the HA cluster (shared power supply, non-protected disk, etc.).
- HACMP is a software solution.

Part 2: New Features in HACMP 5.x

HACMP V5.1 New Features (1/3)
This is the version that introduced major changes, from configuration simplification and performance enhancements to changes in HACMP terminology. Some of the important new features in HACMP V5.1 were:
1. HACMP "classic" (HAS) was dropped; only HACMP/ES, based on IBM Reliable Scalable Cluster Technology, remained available.
2. SMIT "Standard" and "Extended" configuration paths (procedures)
3. Automated configuration discovery
4. Custom resource groups

HACMP V5.1 New Features (2/3)
5. Non-IP networks based on heartbeating over disks
6. Fast disk takeover
7. Forced varyon of volume groups
8. Heartbeating over IP aliases
9. Improved security through the cluster communication daemon, which eliminates the need for AIX "r" commands and thus for the /.rhosts file
10. Improved performance for cluster configuration and synchronization
11. Normalization of HACMP terminology (aligning it with other HA products)
12. Simplification of configuration and maintenance

HACMP V5.1 New Features (3/3)
13. Online Planning Worksheets enhancements
14. Various C-SPOC enhancements
15. GPFS integration

HACMP V5.2 New Features (1/2)
Introduced in July 2004, HACMP V5.2 added further improvements in configuration simplification, automation, and performance.
1. Two-Node Configuration Assistant, with both SMIT menus and a Java™ interface (in addition to the SMIT "Standard" and "Extended" configuration paths)
2. File collections
3. User password management
4. Classic resource groups are no longer used; they are replaced by custom resource groups.

HACMP V5.2 New Features (2/2)
5. Automated test procedures
6. Automatic cluster verification
7. Improved Online Planning Worksheets (OLPW), which can now import a configuration from an existing HACMP cluster
8. Event management (EM) has been replaced by the Resource Monitoring and Control (RMC) subsystem (standard in AIX)
9. Enhanced security
10. Resource group dependencies
11. Self-healing clusters (correcting certain cluster configuration errors)
12. HACMP Smart Assist for WebSphere® Application Server

HACMP V5.3 New Features (1/3)
Starting in July 2005, HACMP V5.3 continued the development of HACMP by adding further improvements in management, configuration simplification, automation, and performance.
1. Cluster verification at cluster
   - Additional corrective actions are taken during verification
   - clverify warns of recognizable single points of failure
   - clverify integrates HACMP/XD options: PPRC, GeoRM, GLVM
   - clverify automatically creates the clhosts.client file, to be used as the prototype of the clhosts file on client nodes

HACMP V5.3 New Features (2/3)
2. XML file format for OLPW files, and the ability to convert existing snapshot files into XML cluster configuration files
3. OEM volume and file system support: Veritas Volume Manager, Veritas File System
4. Further integration of HACMP with RSCT
5. More Smart Assist options: DB2® and Oracle Application Server
6. Removal of certain site-related restrictions from HACMP
7. Location dependency added for resource groups
8. Distribution preference for IP service aliases
9. WebSMIT security improved by:
   - client data validation before any HACMP commands are executed
   - server-side validation of parameters
   - WebSMIT authentication tools integrated with the AIX authentication mechanisms

HACMP V5.3 New Features (3/3)
10. The Cluster Manager (clstrmgrES) daemon runs at all times (regardless of cluster status, up or down) to support further automation of cluster configuration and enhanced administration.
11. The cluster multi-peer extension daemon (clsmuxpdES) and the cluster information daemon (clinfoES) have changed.
12. The Cluster Lock Manager (cllockd or cllockdES) is no longer supported as of HACMP 5.2. It is uninstalled during node-by-node migration, and installing HACMP 5.2 or 5.3 removes the Lock Manager binaries and definitions.
13. To improve HACMP security, all HACMP ODMs are owned by root, group hacmp. The group "hacmp" is created if it does not already exist.
14. The command-line utilities cldiag and clverify are removed. All of their functionality is available from SMIT in HACMP 5.3.

HACMP 5.4 New Features (1/4): Simpler
1. Expanded and standardized Smart Assist Framework for automatic application discovery and configuration.
   It has a common look-and-feel with the other Smart Assistants and allows simpler management of clusters with selected applications.
2. Enhanced Smart Assist for Oracle assists those involved with the installation of Oracle Application Server and/or Oracle Database.
3. Improved and extended WebSMIT provides an easier-to-use GUI with enhanced functionality for managing the HACMP cluster.
4. Enhanced Cluster Test Tool provides several additional test scenarios, enabling more thorough validation of the cluster configuration.

HACMP 5.4 New Features (2/4): Simpler (cont.)
5. Manual resource group movement enhancements for better usability
6. Enhancements to "Forced Down" let customers better manage scheduled maintenance and unscheduled downtime.
7. Customers can upgrade to new PTF levels and new releases without disrupting their application service.
8. Customers can start cluster services without disrupting their application service, allowing better quality of service.

HACMP 5.4 New Features (3/4): Faster & Smarter
9. Cluster verification facilities continue to be expanded, to help customers prevent problems before they occur.
10. Fast Failure Detection allows faster detection of node failures on certain cluster configurations.
11. HACMP now supports Linux on System p hardware with a selected feature set, enabling customers to use the proven capabilities of HACMP in a Linux environment.

HACMP 5.4 New Features (4/4): Goes the Distance
12. HACMP/XD GLVM enhancements let customers use up to four data mirroring networks, and use Enhanced Concurrent Volume Groups with GLVM.
13. Support for GPFS 2.3 lets customers use a cluster-wide file system.
14. IPAT support on geographic networks allows better utilization of network resources.
15. HACMP/XD for MetroMirror support leverages customers' storage and MetroMirror investment for business resilience.

Part 3: Common HA Architectures
- Two-node topology
- Two-node resource groups
- Two-node takeover
- Multi-node architectures
- Other high-availability architectures

Two-Node HACMP Topology
[Figure: network clients; two pSeries cluster nodes on an IP network with service and standby network adapters and IP heartbeats; a serial heartbeat link; shared disk]

Cluster Nodes
Since the cluster is treated as a single entity, the individual computers are referred to as nodes. Each node is an independent system. Inter-node communication is defined when the cluster is initialized.

Service IP Aliases
- The "service address" or "service label" is the address clients use to connect to the computer.
- AIX allows many addresses on a single adapter.
- An alias does not affect the original configuration.
- Allows separation of services.
- Faster to move if necessary.

IP Address Takeover (IPAT) Method 1: Replacement
[Table: adapter addresses on Node A and Node B at system boot, with HACMP running, after an adapter failure and after a node failure. Addresses shown: 192.168.0.1 and 192.168.0.2 (boot), 1.1.1.1 and 1.1.1.2 (standby), 192.168.0.6 (service).]
- Two logical IP networks (netmask 255.255.255.0) on one physical network
- Clients always access 192.168.0.6
- MAC address takeover or an ARP cache update is also needed

IP Address Takeover (IPAT) Method 2: Aliasing
[Table: adapter addresses on Node A and Node B at system boot, with HACMP running, after an adapter failure and after a node failure. Boot addresses 192.168.0.1, 192.168.0.2, 1.1.1.1 and 1.1.1.2; other addresses shown: 10.1.1.1, 10.1.1.150 and 10.1.1.160.]
- Initially configured addresses (boot IP)
- Persistent IP addresses: useful for applications such as Tivoli
- Service IP addresses: used by clients to access the cluster; multiple are allowed

Persistent Node IP Label
- An IP alias that can be assigned to a specific node in the cluster
- Always stays on the same node
- Can reside on an adapter that already has a service or non-service IP label
- Does not require an additional physical adapter on the node
- Does not belong to any resource group
- Can be used to administer that specific node
- Only one can be configured per node
- Available as soon as the node has booted, and remains available after HACMP services are stopped
- If its adapter fails, it moves only to another adapter on the same network on the same node
- If the node fails, the IP label does not move to any other node in the cluster

Persistent Node IP Label [Figure]

Heartbeat via Disk
- New in HACMP 5.x
- Can use any of the following shared disk arrays: Fibre Channel, SCSI, or SSA
- The disk used is part of an enhanced concurrent volume group; the only requirement is that this VG must be defined on both nodes

Heartbeat via Disk [Figure]

Common HA Architectures: Two-Node Resource Groups

How Volume Groups Are Handled
- Two types: shared and non-shared
- Shared volume groups can "migrate"; non-shared volume groups are bound to a node
- Application data must be on a shared volume group to be "moved"
- Application code may be on either type of disk

Application Server Scripts
"Application server" is the name given to a set of scripts that:
- start the application
- stop the application
- monitor the application (optional)
- restart the application (optional)
Applications must be able to be started by a script from a previously unknown state, and must be able to be stopped by a script.

Resource Groups
- Logical constructs that group related attributes together
- The "container" HACMP uses to "move" resources
- Participating node list (default node priorities, home node)
- Policies for startup, fallover and fallback
- Distribution policy
- Dependent resource groups

Resource Group Policies: Startup
Resource group startup occurs during initial cluster startup, that is, the initial acquisition of the resource group. It may be modified by a "settling" timer.
- Online on Home Node Only (OHNO): starts only on the highest-priority node
- Online on First Available Node (OFAN): starts on any one node
- Online on All Available Nodes (OAAN): the resource group starts on all nodes
- Online Using Distribution Policy (OUDP): one resource group per network or per node, depending on the distribution policy

Resource Group Policies: Fallover
Resource group fallover occurs when the current node can no longer support the resource group and the group is "moved" to another node, either because a failure has occurred or because the current node is shut down gracefully with takeover.
- Fallover to Next Priority Node (FNPN): the resource group moves to the next node in its node list
- Fallover using Dynamic Node Priority (FDNP): the resource group moves to the next node in its node list, as recalculated based on the dynamic node criteria policy
- Bring Offline on Error Node (BOEN): the resource group is set to an offline state on this node only

Resource Group Policies: Fallback
Resource group fallback occurs when the resource group is not on its home node and a higher-priority node becomes available. It can be modified by a fallback timer.
- Fallback to a Higher Priority Node (FHPN): when the higher-priority node is available and/or the optional timer expires, the resource group moves
- Never Fallback (NFB): even if a higher-priority node becomes available, the resource group does not move

Common HA Architectures: Two-Node Takeover

HACMP Resource Group (Online on Home Node Only)
Policies: Online on Home Node Only; Fallover to Next Priority Node; Fallback to a Higher Priority Node
[Figure]

HACMP Resource Group (Online on First Available Node)
Policies: Online on First Available Node; Fallover to Next Priority Node; Never Fallback
[Figure]

HACMP Resource Group (Online on All Available Nodes)
Policies: Online on All Available Nodes; Bring Offline on Error Node; Never Fallback

Common HA Architectures: Multi-Node Architectures

Failover Possibilities [Figure]

Three-Node Mutual Takeover Cluster
- Increased resiliency vs. a 2-node cluster
- Redundant connections to storage and networks
- Server capacity must be sized to handle the additional workload in failover scenarios
- Ideally, each node should be sized to run all workloads (in case 2 of the 3 nodes fail)
- Some increase in the complexity of cluster configuration and management

"n + 1" HA Cluster
- Increased resiliency vs. a 2-node cluster
- Some efficiency gain (only one server "on standby")
- Server capacity must be considered:
  - Ideally, Server D should be sized to handle all workloads from Servers A, B, and C.
  - Some clients size Server D smaller, assuming that the risk of Servers A, B, and C all failing at once is small.
- Some increase in the complexity of cluster configuration and management

HA Clustering and Virtualization
- Two servers are still needed so that the server itself is not a SPOF.
- Base example shown at right: two-node clusters configured to fail over across physical boundaries.
- Distribute primary nodes evenly across servers, so that a single server failure results in failover of only 50% of the primary nodes.
- Other common cluster configurations can also be used in a virtual environment.

Common HA Architectures: Other High-Availability Architectures

GLVM Architecture [Figure]

Part 4: Daily Administration
- Version management
- Installation and configuration
- Testing key points
- Log management
- DARE
- C-SPOC
- Parameter tuning
- NFS & HACMP
- Common commands

HACMP Software Planning: Hardware Platform
HACMP V5.2, V5.3 and V5.4, as well as HACMP for Linux V5.4, support the following POWER5+ servers:
- IBM System p5 505 and 505Q (9115-505)
- IBM System p5 510 and 510Q (9110-510 and 9110-51A)
- IBM System p5 520 and 520Q (9111-520 and 9131-52A)
- IBM System p5 550 and 550Q (9113-550 and 9133-55A)
- IBM System p5 560Q (9116-561)
- IBM System p5 570 (9117-570)
- IBM System p5 590 and 595 (9119-590 and 9119-595)
- IBM System p5 285 (9111-285)

HACMP Software Planning: System Software
Operating system level and patch requirements:
- Information: http://www-03.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/TD101347
- Patch download: http://www-912.ibm.com/eserver/support/fixes/fcgui.jsp

HACMP Support for POWER6 [Figure]
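
The resource group startup, fallover and fallback policies listed in the Common HA Architectures part can be made concrete with a small simulation. The sketch below is a hypothetical Python model, not HACMP code or an HACMP API: the class, its methods and the policy codes (borrowed from the slide abbreviations OHNO, OFAN, OAAN, FNPN, BOEN, FHPN, NFB) are illustrative assumptions, and it deliberately ignores settling and fallback timers, dynamic node priority (FDNP), distribution policy (OUDP) and resource group dependencies.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceGroup:
    """Hypothetical, simplified model of HACMP resource group policies.

    `nodes` is the participating node list, highest priority (home node) first.
    Policy codes follow the slide abbreviations; this is not an HACMP API.
    """
    nodes: list
    startup: str                                  # "OHNO", "OFAN" or "OAAN"
    fallover: str                                 # "FNPN" or "BOEN"
    fallback: str                                 # "FHPN" or "NFB"
    online_on: set = field(default_factory=set)   # nodes where the group is online
    cluster_up: set = field(default_factory=set)  # nodes running cluster services

    def _priority(self, node):
        return self.nodes.index(node)  # lower index = higher priority

    def node_up(self, node):
        """Cluster services start on `node`."""
        if node not in self.nodes:
            return  # not a participating node of this group
        self.cluster_up.add(node)
        if self.startup == "OAAN":
            self.online_on.add(node)  # concurrent group: online on every available node
        elif not self.online_on:
            # Group is offline: initial acquisition according to the startup policy.
            if self.startup == "OFAN" or node == self.nodes[0]:
                self.online_on = {node}
        elif self.fallback == "FHPN":
            # Group is online elsewhere: fall back if `node` has higher priority.
            current = next(iter(self.online_on))
            if self._priority(node) < self._priority(current):
                self.online_on = {node}

    def node_down(self, node):
        """`node` fails; apply the fallover policy."""
        self.cluster_up.discard(node)
        if node not in self.online_on:
            return
        self.online_on.discard(node)
        if self.fallover == "FNPN" and not self.online_on:
            # Move to the highest-priority node in the list that is still up.
            for candidate in self.nodes:
                if candidate in self.cluster_up:
                    self.online_on = {candidate}
                    break
        # With "BOEN" the group simply goes offline on the failed node.


if __name__ == "__main__":
    rg = ResourceGroup(nodes=["nodeA", "nodeB"], startup="OHNO",
                       fallover="FNPN", fallback="FHPN")
    rg.node_up("nodeB")
    print(rg.online_on)  # set(): OHNO waits for the home node
    rg.node_up("nodeA")
    print(rg.online_on)  # {'nodeA'}
    rg.node_down("nodeA")
    print(rg.online_on)  # {'nodeB'}: fallover to next priority node
    rg.node_up("nodeA")
    print(rg.online_on)  # {'nodeA'}: fallback to higher priority node
```

The demo walks through the Online on Home Node Only case from the two-node takeover slides: the group waits for its home node, falls over to nodeB when nodeA fails, and falls back when nodeA rejoins. Swapping in `startup="OFAN", fallback="NFB"` shows the group staying wherever it first came online.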

HACMP Software Planning: Application Software
In general, the application software versions within a cluster should be kept consistent, which makes management easier. Because HACMP places no strict restrictions on application software, users can choose which applications to include in the cluster according to their actual needs, and manage them with their own scripts.

Daily Administration: Installation and Configuration

Installing the HACMP Software
Components to install:
- Operating system patches
- HACMP software
- HACMP software patches
Installation methods:
- NIM
- CD installation
- Installation from local disk

HACMP Configuration Procedure
Preparation before configuring HACMP:
- Configure IP addresses
- Edit the /etc/hosts file
- Write the application start/stop scripts
- Create volume groups and file systems
- Prepare serial devices and disk heartbeat devices
HACMP Standard configuration path:
- Add the cluster and nodes
- Configure cluster resources
- Create cluster resource groups
- Synchronize the HACMP configuration
HACMP Extended configuration path:
- Add heartbeats
- Customize cluster resources

HACMP Configuration Menus
smitty hacmp (configuration and management)
[Screenshots: Extended Configuration (items 1-3); Extended Topology Configuration (items 1.1-1.5); Extended Resource Configuration (items 2.1-2.2); Extended Resources Configuration (items 2.1.1-2.1.2); Extended Resource Group Configuration (items 2.2.1-2.2.2)]

Starting and Stopping HACMP Services
[Screenshot: Starting HACMP services (V5.4)]
[Screenshot: Stopping HACMP services (V5.4)]
Stop modes: Graceful down, Takeover, Force down

Daily Administration: Testing Key Points

Verifying That Cluster Services Have Stopped [Screenshot]

HACMP Troubleshooting Key Points
- Cluster log files
- Cluster daemons
- Monitoring the cluster: clstat/xclstat
- Check log files
- Check daemons with lssrc -g cluster or ps -ef
- lsvg -o
- ifconfig -a
- netstat -in
- lslpp -l cluster.*
- Dead man switch
- Apply patches

Daily Administration: Log Management

HACMP Log Files (1/7)
/tmp/clstrmgr.debug
Contains time-stamped, formatted messages generated by the clstrmgrES daemon. The default messages are verbose and are typically adequate for troubleshooting most problems; however, IBM support may direct you to enable additional debugging.
Recommended use: information in this file is for IBM Support personnel.

/tmp/cspoc.log
Contains time-stamped, formatted messages generated by HACMP C-SPOC commands. The /tmp/cspoc.log file resides on the node that invokes the C-SPOC command.
Recommended use: use the C-SPOC log file when tracing a C-SPOC command's execution on cluster nodes.

/tmp/emuhacmp.out
Contains time-stamped, formatted messages generated by the HACMP Event Emulator. The messages are collected from output files on each node of the cluster and cataloged together in the /tmp/emuhacmp.out log file. In verbose mode (recommended), this log file contains a line-by-line record of every event emulated. Customized scripts within the event are displayed, but commands within those scripts are not executed.

HACMP Log Files (2/7)
/tmp/hacmp.out
Contains time-stamped, formatted messages generated by HACMP scripts on the current day. In verbose mode (recommended), this log file contains a line-by-line record of every command executed by scripts, including the values of all arguments to each command. An event summary of each high-level event is included at the end of each event's details.
Recommended use: because the information in this log file supplements and expands upon the information in the /usr/es/adm/cluster.log file, it is the primary source of information when investigating a problem.
Note: with recent changes in the way resource groups are handled and prioritized in fallover circumstances, the hacmp.out file and its event summaries have become even more important in tracking the activity and resulting location of your resource groups.
In HACMP releases prior to 5.2, non-recoverable event script failures result in the event_error event being run on the cluster node where the failure occurred; the remaining cluster nodes do not indicate the failure. With HACMP 5.2 and up, all cluster nodes run the event_error event if any node has a fatal error. All nodes log the
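
Since hacmp.out is described above as the primary source of information when investigating a problem, one possible first step is a quick scan for events that did not finish cleanly. The Python sketch below assumes that event boundaries appear as lines containing `EVENT START: <event> ...` and `EVENT COMPLETED: <event> ... <exit status>`; check that assumption against the hacmp.out produced by your own release before relying on it. The function name `scan_events` and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Assumed hacmp.out line shapes (verify against your own log):
#   "<timestamp> EVENT START: <event> <args...>"
#   "<timestamp> EVENT COMPLETED: <event> <args...> <exit status>"
EVENT_RE = re.compile(r"EVENT (START|COMPLETED): (\S+)(.*)$")


def scan_events(lines):
    """Summarize hacmp.out-style lines: unfinished events, non-zero exits, event_error."""
    started, completed = Counter(), Counter()
    nonzero = []   # (event, exit status) for COMPLETED lines ending in a non-zero number
    errors = 0     # number of times event_error was started
    for line in lines:
        m = EVENT_RE.search(line)
        if not m:
            continue
        kind, event, rest = m.group(1), m.group(2), m.group(3).split()
        if kind == "START":
            started[event] += 1
            if event == "event_error":
                errors += 1
        else:
            completed[event] += 1
            # Assumption: the last field of a COMPLETED line is the exit status.
            if rest and rest[-1].isdigit() and int(rest[-1]) != 0:
                nonzero.append((event, int(rest[-1])))
    unfinished = {e: started[e] - completed[e]
                  for e in started if started[e] > completed[e]}
    return {"unfinished": unfinished, "nonzero": nonzero, "event_error": errors}


if __name__ == "__main__":
    # Made-up sample lines; on a cluster node you might pass open("/tmp/hacmp.out").
    sample = [
        "Mar  5 10:15:01 EVENT START: node_up nodeA",
        "Mar  5 10:15:09 EVENT COMPLETED: node_up nodeA 0",
        "Mar  5 10:20:00 EVENT START: acquire_service_addr",
        "Mar  5 10:20:03 EVENT COMPLETED: acquire_service_addr 1",
        "Mar  5 10:20:04 EVENT START: event_error",
    ]
    print(scan_events(sample))
    # {'unfinished': {'event_error': 1}, 'nonzero': [('acquire_service_addr', 1)], 'event_error': 1}
```

Pairing START and COMPLETED lines by event name only, and reading the last field as an exit status, are simplifications: nested events are counted independently, so treat the result as a pointer to where to read hacmp.out in detail rather than as a verdict.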

Format: PowerPoint (PPT); uploaded 2013-01-03.