EMC MISSION-CRITICAL BUSINESS CONTINUITY FOR SAP

Presentation transcript:

1 EMC MISSION-CRITICAL BUSINESS CONTINUITY FOR SAP
Note to Presenter: This presentation supports the solution described in the white paper: EMC Mission-Critical Business Continuity for SAP - EMC VPLEX, Symmetrix VMAX, VNX, VMware vSphere HA, Brocade Networking, Oracle RAC, SUSE Linux Enterprise. It is intended for SAP Basis administrators, Oracle DBAs, storage administrators, IT architects, and technical managers responsible for designing, creating, and managing mission-critical SAP applications in 24/7 landscapes.

Welcome! Today we will discuss an EMC solution that addresses mission-critical business continuity for SAP applications. As you can see from the list of components, the solution combines technologies from EMC, VMware, Brocade, Oracle, and SUSE, and the application platform for which the solution is designed is SAP ERP.

EMC VPLEX, EMC Symmetrix VMAX, EMC VNX, VMware vSphere HA, Brocade networking solutions, Oracle RAC, SUSE Linux Enterprise. EMC Solutions Group

2 Agenda
Solution overview and solution architecture. Solution components and configuration: EMC VPLEX Metro, VMware vSphere, SAP system architecture, Oracle Database, Brocade networking, EMC storage. Testing and validation. Summary and conclusion.

The organization of the presentation is straightforward: a general overview of the solution, including the challenges it addresses and how it addresses them; a review of the overall architecture of the solution; an introduction to each of the enabling technologies in turn, and how they are configured for the solution; a summary of the testing that EMC carried out to validate the solution; and a brief summary of the main points from the presentation and of the business benefits of the solution.

3 Mission-critical business continuity for SAP
Eliminate single points of failure at all layers in the environment. Provide active/active data centers with near-zero RPO and RTO.

When designing a business continuity and high availability strategy, businesses must consider a range of challenges. Recovery point objectives (RPOs) and recovery time objectives (RTOs) are key metrics and answer two fundamental questions that businesses must address: How much data can we afford to lose (RPO)? How fast do we need the system or application to recover (RTO)? For mission-critical applications, minimizing RPO and RTO is a key challenge. The other main challenges include: eliminating single points of failure (technology, people, processes); maximizing resource utilization; reducing infrastructure costs; and managing the complexity of integrating, maintaining, and testing multiple point solutions.

The solution we're presenting here addresses all these challenges for SAP ERP applications. It demonstrates an innovative, active/active deployment model for data centers up to 100 km apart. This transforms the traditional active/passive disaster recovery model into a highly available business continuity solution, with 24/7 application availability, no single points of failure, and near-zero RTOs and RPOs. And the solution is fully automated. The solution scenario consists of two geographically separate data centers, with SAP ERP running on VMware virtual machines and Oracle Database running on physical servers at the two sites. EMC VMAX and VNX arrays provide the physical storage for the environment, and EMC VPLEX Metro provides distributed storage federation across the two sites. At a high level, the solution: eliminates single points of failure at all layers in the environment, including storage, database, application, and network; and provides active/active data centers that support near-zero RPOs and RTOs and mission-critical business continuity. Additional benefits are also identified on this slide. All of these benefits can also deliver reduced costs for the business.

Active/active data centers. Near-zero RTO and RPO. 24x7 application availability. No single points of failure. Simplified high availability management. Fully automated failure handling and load balancing. Maintenance without downtime. Simplified Oracle RAC deployment on extended distance clusters. Increased infrastructure utilization.

4 Challenge and solution
Challenge: SAP single points of failure. Solution: high availability and business continuity.

SAP implementations: the challenge and the solution. Traditional SAP implementations have several single points of failure (SPOFs), including: Central Services (enqueue server, message server), the database server, single-site deployment, and local disk storage. The diagram on the left here illustrates these single points of failure, and the diagram on the right illustrates the solution components that address these SPOFs, though this is not an exact one-to-one mapping. So, for example, VMware vSphere virtualizes the SAP application components and eliminates these as single points of failure, and VPLEX Metro virtualizes the storage layer and enables an active/active data center distributed across two geographically separate sites. Overall, the architecture and components of the solution create an active/active clustered solution for the entire SAP stack. This enhances reliability and availability while simplifying the deployment and management of the environment. Note that, in this solution, the SAP enqueue and message servers are implemented as services within the ASCS instance.

5 Eliminating single points of failure

This slide illustrates the high-availability solutions implemented at each layer of the environment to provide mission-critical high availability.

a) Independent physical servers with local storage: The EMC validation team initially installed and validated the environment without any high-availability or business continuity protection schemes. The SAP application and database components resided on independent physical servers with local storage. Each single point of failure was then mitigated by using fault-tolerant components and high-availability clustering technologies.

b) Storage layer HA: All the storage required by the servers in the environment was moved to enterprise-class EMC storage arrays: a Symmetrix VMAX at Site A and a VNX5700 at Site B. In addition, Brocade 8510 Backbones were deployed to provide a redundant SAN fabric for storage access. This takes advantage of the proven five-9s uptime provided by the arrays and the SAN Backbones, including their advanced manageability and business continuity features.

c) Database HA: At the database layer, the back-end database server was converted from an Oracle single-instance database to a four-node Oracle RAC database on Oracle ASM. This eliminates the database server as a single point of failure.

d) SAP application HA: The SAP application servers were fully virtualized using VMware ESXi 5.0. Each of the SAP virtual machines was deployed using SUSE Linux Enterprise Server for SAP Applications as the guest operating system. SUSE Linux Enterprise High Availability Extension and SAP Enqueue Replication Server (ERS) were also deployed to protect the SAP message server and enqueue server. This eliminates the ASCS as a single point of failure.

e) Data center HA: The high-availability cluster configuration implemented thus far protects SAP within the data center. For high availability between the two data centers, the solution uses EMC VPLEX Metro storage virtualization technology. VPLEX Metro's unique active/active clustering technology allows read/write access to distributed volumes across synchronous distances, enabling users at both locations to access the same information at the same time. This solution combines VPLEX Metro with SUSE Linux Enterprise HAE (at the operating system layer) and Oracle RAC (at the database layer) to remove the data center as a single point of failure and provide a robust business continuity strategy for mission-critical applications.

f) Network HA: We will look at this in a later slide.

6 Solution components
Mission-critical business continuity for SAP ERP combines technologies from EMC, VMware, Oracle, SUSE, and Brocade: EMC VPLEX Metro; EMC VPLEX Witness; EMC Symmetrix VMAX and EMC VNX; Oracle RAC on Extended Distance Clusters; VMware vSphere; VMware vSphere High Availability; SUSE Linux Enterprise Server for SAP Applications with SUSE Linux Enterprise High Availability Extension; SAP Enqueue Replication Server; Brocade MLXe core routers; Brocade DCX 8510 Backbones.

These are the main technologies used by the solution: EMC VPLEX Metro is the primary enabling technology. It is a SAN-based storage federation solution that delivers both local and distributed storage federation. In the context of this solution, it is the technology that provides the virtual storage layer that enables an active/active Metro data center. EMC VPLEX Witness is a high-availability component that supports continuous application availability, even in the event of a disruption at one of the data centers. EMC VMAX and EMC VNX arrays provide the enterprise-class storage platform for the solution, with proven five-9s availability, Fully Automated Storage Tiering (FAST), and a choice of replication technologies. The solution is designed for an SAP ERP system with SAP services on virtual machines and the database on physical servers. Oracle Database 11g provides the database platform for the solution. A single-instance database was migrated to Oracle RAC on Extended Distance Clusters to remove single points of failure at the database layer, across distance. VMware vSphere virtualizes the SAP application components and eliminates these as single points of failure, and VMware High Availability (HA) protects the virtual machines in the case of physical server and OS failures. SUSE Linux Enterprise Server for SAP Applications, with SUSE Linux Enterprise High Availability Extension and SAP Enqueue Replication Server, protects the SAP central services across two cluster nodes to eliminate these services as single points of failure. Brocade Ethernet fabrics and MLXe core routers provide seamless networking and Layer 2 extension between sites. Brocade DCX 8510 Backbones provide redundant SAN infrastructure, including fabric extension.

7 Solution architecture

This diagram illustrates the physical architecture of all layers of the solution, including the network components. In each data center, an Ethernet fabric was built using Brocade virtual cluster switch (VCS) technology, which delivers a self-healing and resilient access layer with all links forwarding. Virtual Link Aggregation Groups (vLAGs) connect the VCS fabrics to the Brocade MLXe core routers that extend the Layer 2 network across the two data centers.

VPLEX Witness: This diagram also shows the VPLEX Witness component that the solution uses to monitor connectivity between the two VPLEX clusters and to ensure continued availability in the event of an inter-cluster network partition failure or a cluster failure. VPLEX Witness is deployed on a virtual machine at a third, separate failure domain.

8 Protection layers

Before we move on to see how each of the enabling technologies was configured for the solution, this slide briefly summarizes the HA layers that the solution uses to eliminate single points of failure. The table at the center of the diagram summarizes the components deployed to provide local high availability. VPLEX Metro then extends this local high availability with a clustering architecture that breaks the boundaries of the data center and allows servers at multiple data centers to have read/write access to shared block storage devices. An even higher degree of resilience is then achieved by using VPLEX Witness and a VPLEX Cross-Cluster Connect configuration, both of which we will discuss later in the presentation.

9 VPLEX Metro: introduction
Site A / Site B with VPLEX Cross-Cluster Connect. SAN-based storage federation. Active/active data centers. Distance of up to approximately 100 km. Workload rebalancing. Near-zero RPO/RTO. Data center migration.

Note to Presenter: This slide contains animation. The first click reveals the VPLEX Witness components; the second click reveals the Cross-Cluster Connect configuration.

VPLEX: EMC VPLEX is a storage virtualization solution for both EMC and non-EMC storage arrays. EMC offers VPLEX in three configurations: VPLEX Local, which enables storage virtualization within a data center; VPLEX Metro, which enables storage virtualization across synchronous distances; and VPLEX Geo, which enables storage virtualization across asynchronous distances.

VPLEX Metro: This solution is based on VPLEX Metro. VPLEX Metro uses a unique clustering architecture, called AccessAnywhere, to help customers break the boundaries of the data center by enabling the same data to exist in two separate geographical locations, and to be accessed and updated at both locations at the same time. The two data centers can be up to 100 km apart, or have a round-trip time of up to 5 ms. This architecture delivers active/active, block-level access to data on two sites within synchronous distances, and supports workload balancing, near-zero RPOs and RTOs, and non-disruptive data center migration.

Note to Presenter: Click now to display VPLEX Witness.

VPLEX High Availability: VPLEX Metro enables application and data mobility and, when configured with VPLEX Witness, provides a high-availability infrastructure for clustered applications such as Oracle RAC. VPLEX Witness is an optional external server that is installed as a virtual machine in a separate failure domain from the VPLEX clusters. It connects to both VPLEX clusters using a VPN over the management IP network. By reconciling its own observations with information reported periodically by the clusters, the Witness enables the clusters to distinguish between inter-cluster network partition failures and cluster failures and to automatically resume I/O at the appropriate site. VPLEX Metro enables you to build an extended or stretch cluster as if it were a local cluster, and removes the data center as a single point of failure. Moreover, as the data and applications are active at both sites, the solution provides a simple business continuity strategy.

Note to Presenter: Click now to display VPLEX Cross-Cluster Connect.

An even higher degree of availability can be achieved by using a VPLEX Cross-Cluster Connect configuration. In this case, each host is connected to the VPLEX clusters at both sites. This ensures that, in the unlikely event of a full VPLEX cluster failure, the host has an alternate path to the remaining VPLEX cluster.

Slide labels: Site C, VPLEX Witness, AccessAnywhere, VPLEX High Availability, VPLEX Cross-Cluster Connect, Active/Active.

10 VPLEX Metro configuration
VPLEX logical structures (top to bottom): consistency group, virtual volume, distributed device, device, extent, storage volume.

VPLEX encapsulates traditional physical storage array devices and applies layers of logical abstraction to these exported LUNs. This slide provides an overview of VPLEX logical storage structures and how these are configured for the solution. Starting at the bottom of the storage structure hierarchy: A storage volume is a LUN exported from an array and encapsulated by VPLEX. An extent is the mechanism VPLEX uses to divide storage volumes and may use all or part of the capacity of the underlying storage volume. A device encapsulates an extent, or combines multiple extents or other devices into one large device with a specific RAID type. For the solution, there is a one-to-one mapping between storage volumes, extents, and devices at each site. The devices encapsulated at Site A are virtually provisioned thin devices, while the devices encapsulated at Site B are traditional LUNs. Next in the hierarchy are distributed devices. These encapsulate other devices from two separate VPLEX clusters. At the top layer of the storage structure are virtual volumes. These are created from a top-level device, which can be either a device or a distributed device. Virtual volumes are the elements that VPLEX exposes to hosts. To create distributed devices for the solution, all cluster-1 devices are mirrored remotely on cluster-2, in a distributed RAID 1 configuration. These distributed devices are encapsulated by virtual volumes, which are then presented to the hosts through storage views. The storage views define which hosts access which virtual volumes on which VPLEX ports. Next are consistency groups, which aggregate virtual volumes so that the same properties can be applied to them all. VPLEX Metro uses synchronous (as opposed to asynchronous) consistency groups. With synchronous consistency groups, clusters can be separated by up to 5 ms of latency. In this case, VPLEX Metro sends writes to the back-end storage volumes, and acknowledges a write to the application only when the back-end storage volumes in both clusters acknowledge the write. For the solution, a single consistency group contains all the virtual volumes that hold the Oracle database binaries, the ASM disk groups, and the OCR and voting files. A detach rule is defined for the consistency group to specify cluster-1 (or Site A) as the preferred cluster. A sketch of the corresponding CLI workflow follows below.
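As a rough illustration of this provisioning flow, the VPlexcli sketch below claims a storage volume, builds a distributed RAID 1 device, exposes it as a virtual volume, and sets the detach rule. Command names follow the GeoSynchrony-era CLI, but exact flags and the object names used here (sv_sap_c1, dd_sap, sap_oracle_cg) are assumptions for illustration, not the validated solution's exact commands.

    # Claim back-end LUNs and build local devices at each cluster (names assumed)
    storage-volume claim --storage-volumes VPD83T3:600... --name sv_sap_c1
    extent create --storage-volumes sv_sap_c1
    local-device create --name dev_sap_c1 --geometry raid-0 --extents extent_sv_sap_c1_1
    # Mirror the cluster-1 device to cluster-2 as a distributed RAID 1 device
    ds dd create --name dd_sap --devices dev_sap_c1,dev_sap_c2
    virtual-volume create --device dd_sap
    # Group the SAP/Oracle volumes and make cluster-1 (Site A) the preferred cluster
    consistency-group create --name sap_oracle_cg --cluster cluster-1
    consistency-group add-virtual-volumes --virtual-volumes dd_sap_vol --consistency-groups sap_oracle_cg
    consistency-group set-detach-rule winner --cluster cluster-1 --delay 5s --consistency-groups sap_oracle_cg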

11 VMware virtualization components
vSphere 5.0; vMotion; Storage vMotion; VMware HA; DRS (Distributed Resource Scheduler); EMC PowerPath/VE; EMC Virtual Storage Integrator (VSI).

This slide provides an overview of the virtualization platform for the solution. As you can see, the SAP application servers are fully virtualized using VMware vSphere 5.0, and VPLEX Witness is also deployed on a virtual machine. VMware vMotion and VMware Storage vMotion are implemented as part of the solution, as are VMware High Availability and the vSphere Distributed Resource Scheduler. The last two items here are EMC plug-ins for vSphere: PowerPath/VE is a multipathing plug-in that provides enhanced path management capabilities to ESXi hosts. VSI is a vSphere plug-in that provides a single management interface for managing EMC storage within the vSphere environment.

12 VMware vSphere with VPLEX Metro
Cross-Cluster Connect.

We'll take a look now at VMware deployments on VPLEX Metro in general and then discuss the particular configuration used for this solution. VPLEX Metro delivers concurrent access to the same set of devices at two physically separate locations. This provides an active/active infrastructure that enables geographically stretched clusters based on VMware vSphere. And the use of Brocade vLAG technology enables extension of VLANs, and hence subnets, across the two physical data centers. So what can we achieve by deploying vSphere features and components together with VPLEX Metro?

vMotion: Provides the ability to live migrate virtual machines between the two sites in anticipation of planned events such as hardware maintenance.
Storage vMotion: Provides the ability to migrate a virtual machine's storage without any interruption in the availability of the virtual machine. This allows the relocation of live virtual machines to new datastores.
VMware DRS: Provides automatic load distribution and virtual machine placement across the two sites through the use of DRS groups and affinity rules.
VMware HA: VMware HA is a host failover clustering technology that leverages multiple ESXi hosts, configured as a cluster, to provide rapid recovery from outages and cost-effective high availability for applications running in virtual machines. It protects against server failure by restarting VMs on other ESXi servers within the cluster, and it protects against application failure by monitoring VMs and resetting them in the event of guest OS failure. Combining VPLEX Metro HA with VMware HA provides automatic application restart for any site-level disaster.

VPLEX Metro HA Cross-Cluster Connect: Protection of the VMware HA cluster can be further increased by adding a cross-cluster connection between the local VMware ESXi servers and the VPLEX cluster on the remote site, as shown in this slide. Cross-connecting vSphere environments to VPLEX clusters protects against local data unavailability events (which VMware vSphere 5.0 does not recognize) and ensures that failed virtual machines automatically move to the surviving site. This solution uses VPLEX Metro HA with Cross-Cluster Connect to maximize the availability of the VMware virtual machines. Note: VPLEX Cross-Cluster Connect is available for up to 1 ms of distance-induced latency.

13 VMware stretched cluster configuration

The screenshots in this slide illustrate the configuration of the VMware stretched cluster for the solution. A single vSphere cluster is stretched between Site A and Site B by using a distributed VPLEX virtual volume with VMware HA and VMware DRS. There are four hosts in the cluster, two at each site. VPLEX Metro HA Cross-Cluster Connect provides increased resilience to the configuration. The first screenshot is from vCenter and shows the configuration of the vSphere cluster, with its four hosts and with vSphere DRS and HA enabled. The second screenshot shows the configuration of the datastore (EXT_SAP_VPLEX_DS01) created for the solution. This datastore was created on a 1 TB VPLEX distributed volume and presented to the ESXi hosts in the stretched cluster. All virtual machines were migrated to this datastore, using Storage vMotion, either because they needed to share virtual disks or because they needed to be able to vMotion between sites; a sketch of such a migration follows below.

vCenter screenshots
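For illustration, a Storage vMotion of this kind can be scripted with VMware PowerCLI. The vCenter hostname below is an assumption, while the VM and datastore names come from the slides; this is a sketch, not the validated procedure.

    # Connect to vCenter (server name assumed)
    Connect-VIServer -Server vcenter.example.local
    # Live-migrate a VM's disks to the VPLEX distributed datastore (Storage vMotion)
    Move-VM -VM SAPDI1 -Datastore (Get-Datastore -Name EXT_SAP_VPLEX_DS01)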

14 VMware HA and DRS configuration
HA restart priority for the SAP VMs. HA and DRS enabled for the VMware stretched cluster. HA heartbeat datastores. DRS VM-VM affinity rule.

We've seen that both vSphere HA and DRS were enabled for the VMware stretched cluster. The first screenshot on this slide shows these options being configured for the cluster.

vSphere HA configuration: VM Monitoring was configured to restart individual virtual machines if their heartbeat is not received within 60 seconds. The VM Restart Priority option for the four SAP VMs was set to High, as shown in the second screenshot. This ensures that these VMs are powered on first in the event of an outage. This screen also shows the Host Isolation Response setting, which was left at the default value of 'Leave powered on'. The next screenshot shows the datastores used for heartbeating. As vSphere HA requires at least two datastores to implement heartbeating, a second datastore was created on a 20 GB VPLEX distributed volume and presented to all the ESXi hosts.

vSphere DRS: The final screenshot shows the DRS affinity rule configured for the solution. This is a VM-VM affinity rule which specifies that the ASCS (SAPASCS1) and ERS (SAPASCS2) virtual machines should always be kept on separate hosts. A scripted equivalent of these settings is sketched below.
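The same cluster settings can be approximated with PowerCLI. The cluster name ('SAP-Stretch') is an assumption and the VM names come from the slides; the VM Monitoring timeout is left to the vSphere Client, as it is not exposed by these basic cmdlets. A sketch, assuming an existing vCenter connection:

    # Enable vSphere HA and DRS on the stretched cluster (cluster name assumed)
    Set-Cluster -Cluster 'SAP-Stretch' -HAEnabled:$true -DrsEnabled:$true -Confirm:$false
    # Give the SAP VMs the highest HA restart priority
    Get-VM SAPASCS1, SAPASCS2, SAPDI1, SAPDI2 | Set-VM -HARestartPriority High -Confirm:$false
    # VM-VM anti-affinity rule: keep the ASCS and ERS VMs on separate hosts
    New-DrsRule -Cluster (Get-Cluster 'SAP-Stretch') -Name 'Separate-ASCS-ERS' -KeepTogether:$false -VM (Get-VM SAPASCS1, SAPASCS2)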

15 EMC Virtual Storage Integrator and VPLEX

EMC Virtual Storage Integrator (VSI) for VMware vSphere is a plug-in to the VMware vSphere Client that provides a single management interface for managing EMC storage within the vSphere environment. It provides enhanced visibility into VPLEX directly from the vCenter GUI. The Storage Viewer and Path Management features are accessible through the EMC VSI tab. In the solution, VPLEX distributed volumes host the EXT_SAP_VPLEX_DS01 VMFS datastore, and Storage Viewer provides details of the datastore's virtual volumes, storage volumes, and paths. The screenshot here also shows that the LUNs which make up this datastore are four 256 GB distributed RAID 1 VPLEX Metro volumes that are accessible via PowerPath.

EMC VSI tab in the vCenter GUI

16 SAP system architecture
SAP application software: SAP Enhancement Package 4 for SAP ERP 6.0 IDES; SAP NetWeaver Application Server for ABAP 7.01; SAP Enqueue Replication Server. Operating system: SUSE Linux Enterprise Server (SLES) for SAP Applications 11 SP1; SUSE Linux Enterprise High Availability Extension. Virtualization: SAP services on VMware virtual machines. Database: Oracle RAC on physical servers.

This slide illustrates the SAP system architecture for the solution. The SAP application layer is based on SAP ERP 604 and SAP NetWeaver Application Server for ABAP 7.01. The SAP ASCS instance, ERS instance, and dialog instances are virtualized on VMware ESXi servers. Each of the SAP VMs is deployed using SUSE Linux Enterprise Server for SAP Applications as the guest operating system. In addition, SUSE Linux Enterprise High Availability Extension and SAP Enqueue Replication Server are deployed to protect the SAP message server and enqueue server.

17 SAP system architecture: design considerations
Enqueue and message servers decoupled from the Central Instance and implemented as services within the ASCS instance. ERS installed as part of the HA architecture to ensure zero application lock loss. Two dialog instances provide redundant work processes such as dialog, background, update, and spool. ASCS instance installed with a virtual hostname to decouple it from the VM hostname. ERS instance installed with a different instance number to avoid confusion when both ASCS and ERS are under cluster control.

The solution implements a high-availability SAP system architecture, with these features: The enqueue and message servers are decoupled from the Central Instance and implemented as services within the ASCS instance. SAP ERS is installed as part of the HA architecture to provide zero application lock loss and further protect the enqueue server. Two dialog instances are installed to provide redundant work processes such as dialog, background, update, and spool. The SAP system deployed for the solution also implements several key design features: The ASCS instance is installed with a virtual hostname to decouple it from the VM hostname; a sketch of this installation option follows below. The ERS instance is installed with a different instance number to avoid future confusion when both ASCS and ERS are under cluster control.
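Installing an instance against a virtual hostname is done by passing the standard SAPINST_USE_HOSTNAME property to SAPinst; the hostname value below is an assumed example, not taken from the validated environment.

    # Install the ASCS instance bound to a virtual hostname rather than the VM hostname
    ./sapinst SAPINST_USE_HOSTNAME=sapascs-vip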

18 SAP system architecture: design considerations (continued)
SAP update processes configured on additional application server instances. ASCS, ERS, start, and dialog instance profiles updated with ERS configurations. SAP shared file systems stored on Oracle ACFS and mounted as NFS shares on the SAP VMs, presented as highly available NFS resources managed by Oracle Clusterware. Storage for the SAP environment encapsulated, distributed across the two sites, and made available to the SAP servers through VPLEX Metro.

As noted on the previous slide, the solution implements a high-availability SAP system architecture. The SAP system deployed for the solution also implements several further design features: SAP update processes are configured on the additional application server instances. The SAP ASCS, ERS, start, and dialog instance profiles are updated with ERS configurations. SAP shared file systems are stored on Oracle ACFS and mounted as NFS shares on the SAP VMs. These shared file systems are presented as a highly available NFS resource that is managed by Oracle Clusterware; a sketch of this configuration follows below. The storage for the entire SAP environment is encapsulated and virtualized for the solution. The storage is distributed across the two sites and made available to the SAP servers through VPLEX Metro.
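Oracle Grid Infrastructure 11.2 can serve ACFS paths over NFS under Clusterware control through its HAVIP and export file system resources. In the sketch below, the resource IDs, VIP address, and netmask are assumptions, while the exported path comes from the solution's ACFS layout.

    # Create a highly available VIP for NFS serving (ID, address, and netmask assumed)
    srvctl add havip -id sapnfs -A 192.168.10.50/255.255.255.0
    # Place the NFS export of the shared SAP file system under Clusterware control
    srvctl add exportfs -id sapnfs -path /sapmnt/VSE -name sapmnt -options rw
    srvctl start havip -id sapnfs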

19 SUSE Linux Enterprise HAE configuration
SLES HAE protects the enqueue and message servers across two cluster nodes built on VMware VMs. VMware High Availability protects the VMs. The SAPInstance, master/slave, and virtual IP address resource agents monitor and control resource availability. The SAPInstance agent controls the ASCS and ERS instances, configured as a master/slave resource to ensure that ASCS and ERS are never started on the same node. A VMDK partition is used as the SBD STONITH device, with the multi-writer option configured to allow write access from multiple VMs.

This slide shows how the solution uses SUSE Linux Enterprise High Availability Extension to protect the central services (message server and enqueue server) across two cluster nodes built on VMware virtual machines, with VMHA protecting the virtual machines. The key components of SUSE Linux Enterprise HAE that are implemented in this solution include: OpenAIS/Corosync, which acts as a high-availability cluster manager that supports multinode failover; resource agents that monitor and control the availability of resources (the resource agents implemented are virtual IP address, master/slave, and SAPInstance); and a high-availability GUI and various command line tools.

The SUSE HAE system deployed for the solution also implements several key design features: The SBD STONITH device for the solution uses a partition of a virtual disk. This means that both cluster nodes must have simultaneous access to this disk. The virtual disk is stored in the same datastore as the SAP virtual machines. This is provisioned and protected by VPLEX and is available on both sites. By default, VMFS prevents multiple virtual machines from accessing and writing to the same VMDK. However, sharing was enabled by configuring the multi-writer option. The SAPInstance resource agent controls the ASCS instance and ERS instance and is configured as a master/slave resource. In the event of a failure, the slave is promoted to the role of master and starts the SAP ASCS instance. Similarly, the master is demoted to the role of slave and starts the ERS instance. This master/slave mode ensures that an ASCS instance is never started on the same node as the ERS.

Note: Corosync token parameter configuration. In the Corosync configuration file, corosync.conf, the token timeout specifies the time (in milliseconds) after which a token loss is declared if a token is not received. This timeout corresponds to the time spent detecting the failure of a processor in the current configuration. For this solution, the value of this parameter is set to 10,000 ms in order to cope with the switchover of the underlying layers without unnecessary cluster service failover. The sketch below illustrates these two settings.
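Two of these settings can be shown concretely: the raised Corosync token timeout (the 10,000 ms value is from the solution) and the VMX multi-writer flag for the shared SBD disk (the SCSI address is an assumed example). A sketch, not the validated files:

    # /etc/corosync/corosync.conf excerpt: declare token loss only after 10,000 ms
    totem {
        version: 2
        token: 10000
    }

    # .vmx excerpt on both cluster VMs: let multiple VMs open the shared SBD VMDK
    scsi1:0.sharing = "multi-writer"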

20 Oracle Database architecture
Oracle components: Oracle Database 11g Release 2 Enterprise Edition, Oracle ASM, Oracle ACFS, Oracle Clusterware. Single-instance database migrated to a four-node physical RAC cluster on ASM. Oracle Extended RAC on VPLEX: simplified management; hosts connect only to the local VPLEX cluster; hosts send I/O only once, to the local cluster, so no double writes are needed; no need to deploy Oracle voting disks and Oracle Clusterware on a third site; eliminates costly host CPU cycles consumed by host-based mirroring; protects multiple databases and/or applications as a unit.

Oracle components and configuration: Oracle Database 11g Release 2 provides the underlying database for the SAP applications. At each data center, the database originated as a physical single instance. However, to eliminate the database server as a single point of failure, the single-instance database was migrated to a four-node physical Oracle RAC cluster with the Oracle database residing on ASM. The Oracle database files and SAP ERP application files reside on Oracle ASM Cluster File System (ACFS). An Oracle RAC on Extended Distance Clusters architecture is deployed to allow servers in the cluster to reside in physically separate locations and to remove the data center as a single point of failure.

Why Oracle Extended RAC over VPLEX: Oracle RAC is normally run in a local data center due to the potential impact of distance-induced latency and the relative complexity and overhead of extending Oracle RAC across data centers with host-based mirroring using Oracle ASM. With EMC VPLEX Metro, however, an Oracle Extended RAC deployment, from the Oracle DBA perspective, becomes a standard Oracle RAC install and configuration. The main benefits of deploying Oracle Extended RAC with VPLEX include: VPLEX simplifies management of Extended Oracle RAC, as cross-site high availability is built in at the infrastructure level. To the Oracle DBA, installation, configuration, and maintenance are exactly the same as for a single-site implementation of Oracle RAC. VPLEX eliminates the need for host-based mirroring of ASM disks and the host CPU cycles that this consumes. With VPLEX, ASM disk groups are configured with external redundancy and are protected by VPLEX distributed mirroring; a sketch follows below. Hosts need to connect to their local VPLEX cluster only, and I/O is sent only once from that node. However, hosts have full read/write access to the same database at both sites. There is no need to deploy an Oracle voting disk on a third site to act as a quorum device at the application level. VPLEX enables you to create consistency groups that protect multiple databases and/or applications as a unit.
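For illustration, an external-redundancy disk group on VPLEX-backed disks looks like the following. The disk group name matches the solution's layout, but the disk path and attribute values are assumptions.

    -- Mirroring is delegated to VPLEX distributed RAID 1, so ASM uses EXTERNAL REDUNDANCY
    CREATE DISKGROUP EA_SAP_DATA EXTERNAL REDUNDANCY
      DISK '/dev/mapper/vplex_sap_data*'
      ATTRIBUTE 'compatible.asm' = '11.2', 'compatible.rdbms' = '11.2';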

21 Oracle Database configuration
Four ACFS volumes mounted across the RAC cluster; USRSAPTRANS, ASCS00, and SAPMNT exported as NFS shares to the SAP servers. Shared file systems presented as highly available NFS resources managed by Oracle Clusterware. ASM disk groups configured to reflect the existing single-instance layout.

The diagram in this slide provides a logical representation of the solution's deployment of Oracle Extended RAC on VPLEX Metro. This solution uses four ACFS volumes mounted across the Oracle RAC cluster. Three of the ACFS volumes (SAPMNT, USRSAPTRANS, and ASCS00) were then exported as NFS shares to the SAP servers, using a virtual IP address and a highly available NFS resource under control of Oracle Clusterware. The ASM disk groups for the solution were configured to reflect the existing single-instance Oracle database layout; the lowermost table in the slide shows the ASM disk groups and their configuration. A sketch of the ACFS provisioning steps follows below.

ACFS volume and mount point:
SAP_O_HOME: /oracle/VSE/112
SAPMNT: /sapmnt/VSE
USRSAPTRANS: /usr/sap/trans
ASCS00: /usr/sap/VSE/ASCS00

ASM disk group | No. of disks | Disk group size (GB) | Redundancy
OCR | 5 | 40 | Normal
EA_SAP_ACFS | 4 | 64 | External
EA_SAP_DATA | 16 | 2,048 | External
EA_SAP_REDO | - | - | External
EA_SAP_REDOM | - | - | External
EA_SAP_FRA | - | 256 | External
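The ACFS file systems themselves are created from ASM volumes and can be registered with Clusterware. The sketch below assumes the volume size and the system-assigned device suffix (-123), while the disk group and mount point come from the tables above.

    # Create an ASM volume in the ACFS disk group (size assumed)
    asmcmd volcreate -G EA_SAP_ACFS -s 10G sapmnt
    # Format the volume device with ACFS (device suffix is system-assigned; assumed here)
    mkfs -t acfs /dev/asm/sapmnt-123
    # Register the file system with Clusterware and start it across the cluster
    srvctl add filesystem -d /dev/asm/sapmnt-123 -g EA_SAP_ACFS -v sapmnt -m /sapmnt/VSE -u oracle
    srvctl start filesystem -d /dev/asm/sapmnt-123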

22 Brocade network infrastructure

This slide shows the IP and SAN networks deployed for the solution in the two data centers, and the Layer 2 extension between the data centers. These networks are created using Brocade networking technologies.

IP network: In each data center, the IP network is built using two Brocade VDX 6720 switches, which are deployed in a virtual cluster switch (VCS) configuration. All servers are connected to the network using redundant 10 GbE connections provided by Brocade 1020 CNAs. The two VDX switches at each site are connected to a Brocade MLX Series router using a Virtual Link Aggregation Group (vLAG). The MLX Series routers extend the Layer 2 network between the two data centers. All traffic between Site A and Site B is routed through the MLX routers using multiple ports configured as a LAG. Oracle RAC relies on a highly available virtual IP for private network communication. For the solution, a separate VLAN, VLAN 10, is used for this interconnect, while VLAN 20 handles all public traffic.

SAN network: The SAN in each data center is built with Brocade DCX 8510 Backbones. All servers are connected to the SAN using redundant 8 Gb connections that are provided by Brocade 825 HBAs. The VPLEX-to-VPLEX connection between the data centers uses multiple FC connections between the DCX 8510 Backbones. These are used in active/active mode with failover.

23 EMC storage layout
Site A: EMC Symmetrix VMAX, Virtual Provisioning. Site B: EMC VNX5700, traditional RAID groups and LUNs.

The storage at each site is provided by enterprise-class EMC storage arrays: a Symmetrix VMAX at Site A and a VNX5700 at Site B. Both the VMAX and VNX provide proven five-9s availability. Both also support EMC FAST (Fully Automated Storage Tiering) technology on a range of drive types, and both are powered by Intel Xeon processors. VPLEX virtualizes storage on heterogeneous arrays, in this case a VMAX and a VNX. However, it is still important to follow best practices for whichever storage arrays you are using.

For the VMAX in the solution: VPLEX Metro, Oracle Extended RAC, and SAP volumes are laid out using EMC Virtual Provisioning. This configuration places the Oracle data files and log files in separate thin pools and allows each to use distinct RAID protection. The data files reside in a RAID 5 protected pool and the redo logs in a RAID 1 protected pool. You can see this layout in the first diagram in the slide. Storage was not pre-allocated to any of the devices, except for the Oracle redo log devices, as recommended by EMC.

For the VNX5700 in the solution: VPLEX Metro, Oracle Extended RAC, and SAP volumes were laid out using traditional RAID groups and LUNs. This configuration places the Oracle data files and log files in separate RAID groups and allows each to use distinct RAID protection. The data files reside in a RAID 5 protected RAID group and the redo logs in a RAID 10 protected RAID group. The FRA disk group resides on NL-SAS drives with RAID 6 protection. You can see this layout in the right-hand diagram in the slide. Similar EMC best practices apply to both Virtual Provisioning and traditional provisioning methods, and the same ASM disk groups were created on the VNX and VMAX. In addition, the LUNs created on the VNX match the number and size of the thin devices created on the VMAX.

24 Testing and validation
Tests: SAP enqueue service process failure; SAP ASCS instance virtual machine failure; Oracle RAC node failure; site failure (VPLEX cluster, ESXi servers, network, RAC nodes); VPLEX cluster isolation. Expected behavior: the application continues to run without interruption.

The EMC validation team initially installed and validated the solution environment without any high-availability or business continuity protection schemes. They then transformed the environment into the mission-critical business continuity solution described in this presentation. To validate the solution, and to demonstrate the elimination of all single points of failure, the validation team carried out the tests listed in this slide. The result was the same for each test: the application continued without interruption.

25 SAP enqueue service process failure
1. The SAPInstance resource agent detects and reports the failure. 2. The master/slave resource agent promotes SAPASCS1 to master (hosting the ASCS services). 3. The master/slave resource agent starts ERS on SAPASCS2 when it rejoins the cluster. 4. The replicated lock table is restored. Result: the application continues to run without interruption, and no administrative intervention is required.

This slide summarizes the 'SAP Enqueue Service Process Failure' test carried out by the EMC validation team. This test validates how the system behaves in the event of a SAP enqueue server process failure.

Failure simulation: To test this type of failure, the enqueue service process on the active ASCS node was terminated by running the kill command; a sketch follows below.

System behavior: When the process fails, the failure is reported, as shown in the uppermost screen segment in the slide. Then, what was previously the slave node (SAPASCS1) is promoted to become the master node and host the ASCS services, as shown in the main screenshot. When SAPASCS2 (the failed node) rejoins the cluster, the ERS is restarted on that node. Finally, the replicated lock table is restored.

Result: This test demonstrates that the application continues without interruption if the enqueue process fails and that no administrative intervention is required to deal with the failure.
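The failure injection itself is a one-liner. The enqueue process name pattern below assumes SID VSE and instance ASCS00, so adjust it for the actual system; a sketch:

    # On the active ASCS node: find and kill the enqueue server process (name pattern assumed)
    ps -ef | grep '[e]n.sapVSE_ASCS00'
    pkill -9 -f 'en.sapVSE_ASCS00'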

26 SAP ASCS instance VM failure
1. SAPASCS2 is no longer available from the vSphere Client. 2. The SAPInstance resource agent detects and reports the failure. 3. VMHA restarts the failed VM on the surviving ESXi host. 4. The master/slave resource agent promotes SAPASCS1 to master (hosting the ASCS services) and starts ERS on SAPASCS2 when it rejoins the cluster. 5. The replicated lock table is restored. Result: the application continues to run without interruption, and no administrative intervention is required.

This slide summarizes the 'SAP ASCS Instance VM Failure' test carried out by the EMC validation team. This test validates how the system behaves if the VM on which the ASCS instance is running fails.

Failure simulation: To simulate this type of failure, the ESXi server hosting the ASCS instance VM was powered off via DRAC. The server was then rebooted without entering maintenance mode.

System behavior: The system responded to the failure as shown in steps 1 to 5 in the slide.

Result: This test demonstrates that the application continues without interruption if the ASCS instance VM fails and that no administrative intervention is required to deal with the failure.

27 Oracle RAC node failure
1. The RAC node goes offline: instance VSE003 is no longer available. 2. The SAP instance work process connects to another RAC node. Result: End users see longer transaction response times while the DI work process reconnects to the other RAC node. Uncommitted transactions are rolled back at the database level to ensure data consistency. End users receive a system error message and must restart the transaction. No administrative intervention is required.

This slide summarizes the 'Oracle RAC Node Failure' test carried out by the EMC validation team. This test validates how the system behaves in the event of an unexpected RAC node failure.

Failure simulation: To test this type of failure, the server was rebooted so that the Oracle RAC node running on it went offline.

System behavior: When the RAC node went offline, instance VSE003 became unavailable. The SAP instance work process then automatically connected to another RAC node, as illustrated by the screenshots in the slide; the status commands sketched below show the same picture from the cluster side.

Result: The test results are also summarized on the slide.
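Instance status after such a failure can be checked from any surviving node with standard Grid Infrastructure commands; the database name VSE is inferred from the instance name on the slide.

    # Show which database instances are running and where (database name assumed)
    srvctl status database -d VSE
    # Cluster-wide view of all Clusterware-managed resources
    crsctl stat res -t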

28 Environment status before the site failure
All RAC nodes running. VPLEX clusters available at both sites. ESXi servers available at both sites. SAP virtual machines at Site A and Site B operational.

The next two tests relate to a complete site failure and isolation of a VPLEX cluster. This slide shows the status of the environment before these tests were carried out.

29 Site failure
1. VPLEX Witness overrides the consistency group's detach rule so that the VPLEX at Site B remains available. 2. The RAC nodes at Site B remain available. 3. VMHA restarts SAPASCS1 and SAPDI1 at Site B. 4. SLE HAE detects the failure of SAPASCS1 and restarts ERS when the node rejoins the cluster. 5. End users on SAPDI1 lose their sessions but can log in again when it is restarted at Site B; during the restart, new users are directed to SAPDI2.

This slide summarizes the 'Site Failure' test carried out by the EMC validation team. This test validates how the system behaves in the event of a complete site failure.

Failure simulation: To test this failure scenario, the validation team simulated a complete failure of Site A, including the VPLEX cluster, ESXi server, network, and Oracle RAC node components. The VPLEX Witness remained available on Site C, and on Site B, VPLEX cluster-2 remained in communication with the VPLEX Witness.

System behavior: Steps 1 to 5 outline how the system responds to the failure, and the diagram to the left illustrates the status of the environment after the site failure: When Site A fails, VPLEX Witness ensures that the consistency group's detach rule, which defines cluster-1 as the preferred cluster, is overridden and that the storage served by VPLEX cluster-2 on Site B remains available. RAC nodes sse-ea-erac-n03 and sse-ea-erac-n04 on Site B remain available. When the ESXi servers on Site A fail, VMHA restarts SAPASCS1 and SAPDI1 on Site B, with SAPASCS1 restarted on a different ESXi host from SAPASCS2. SUSE Linux Enterprise HAE detects the failure of cluster node SAPASCS1. Because the ERS was running on this node, the cluster takes no action except to restart the ERS when SAPASCS1 rejoins the cluster. The lock table is preserved and operational the entire time. End users on SAPDI1 lose their sessions due to the ESXi server failure. During the restart process, new users are directed to SAPDI2. When SAPDI1 restarts on Site B, users can log in to SAPDI1 again.

Result: The overall result is that the application continues without interruption even in the event of a complete site failure.

30 VPLEX cluster isolation
1. VPLEX Witness overrides the consistency group's detach rule so that the VPLEX at Site B remains available. 2. The RAC nodes at Site B remain available. 3. The RAC nodes at Site A are ejected. 4. The ESXi servers at Site A remain available. 5. The SAPASCS1 and SAPDI1 virtual machines remain active thanks to VPLEX Metro HA Cross-Cluster Connect.

This slide summarizes the 'VPLEX Cluster Isolation' test carried out by the EMC validation team. This test validates how the system behaves in the event of isolation of one of the VPLEX clusters.

Failure simulation: To test this failure scenario, the validation team simulated isolation of the preferred cluster on Site A, with both the external management IP network and the VPLEX WAN communications network partitioned. The LAG network remained available. VPLEX Witness remained available on Site C. On Site B, VPLEX cluster-2 remained in communication with VPLEX Witness.

System behavior: Steps 1 to 5 outline how the system responds to the failure, and the diagram to the left illustrates the status of the environment after cluster isolation: When the VPLEX on Site A becomes isolated, the VPLEX Witness ensures that the consistency group's detach rule, which defines cluster-1 as the preferred cluster, is overridden and that the storage served by VPLEX cluster-2 on Site B remains available. RAC nodes sse-ea-erac-n03 and sse-ea-erac-n04 on Site B remain available. RAC nodes sse-ea-erac-n01 and sse-ea-erac-n02 on Site A are ejected. The ESXi servers on Site A remain available. Virtual machines SAPASCS1 and SAPDI1 remain active due to the use of VPLEX Metro HA Cross-Cluster Connect.

Result: The overall result is that the application continues without interruption even if the preferred VPLEX cluster is isolated.

31 Testing and validation
Tests: SAP enqueue service process failure; SAP ASCS instance virtual machine failure; Oracle RAC node failure; site failure (VPLEX cluster, ESXi servers, network, RAC nodes); VPLEX cluster isolation. Observed behavior: the application continued to run without interruption.

The EMC validation team initially installed and validated the solution environment without any high-availability or business continuity protection schemes. They then transformed the environment into the mission-critical business continuity solution described in this presentation. To validate the solution, and to demonstrate the elimination of all single points of failure, the validation team carried out the tests listed in this slide. The result was the same for each test: the application continued without interruption.

32 Summary and conclusion
The solution combines EMC, SAP, VMware, Oracle, SUSE, and Brocade technologies to: eliminate single points of failure at all layers in the environment; provide active/active data centers with near-zero RPO and RTO; and enable mission-critical business continuity for SAP applications.

This solution demonstrates the transformation of a traditional active/passive SAP deployment into a highly available business continuity solution with active/active data centers and always-on application availability. The solution combines EMC, VMware, Oracle, SUSE, and Brocade high-availability components to: eliminate single points of failure at all layers in the environment; provide active/active data centers that support near-zero RPOs and RTOs; and enable mission-critical business continuity for SAP applications. Each single point of failure was identified and mitigated by using fault-tolerant components and high-availability clustering technologies. Resource utilization was increased by enabling active/active data access. And failure handling was fully automated to eliminate the final and often most unpredictable SPOF from the architecture: people and processes. In addition, the use of management and monitoring tools such as the vSphere Client, EMC Virtual Storage Integrator, and the VPLEX performance tools simplifies operational management and allows monitoring and mapping of the infrastructure stack.

The testing performed by the EMC validation team demonstrates how using the solution design principles and components eliminated single points of failure at the local level and created an active/active data center that enables mission-critical business continuity for SAP. The components involved here are: EMC VPLEX Metro, EMC VPLEX Witness, EMC Symmetrix VMAX, VMware vSphere HA, Oracle RAC, SUSE Linux Enterprise HAE, and Brocade networking. The testing also demonstrates how VPLEX Metro, combined with SUSE Linux Enterprise HAE, Oracle Extended RAC, and Brocade networking, extends this high availability to break the boundaries of the data center and allow servers at multiple data centers to have read/write access to shared block storage devices. VPLEX Witness and Cross-Cluster Connect provide an even higher level of resilience. Together, these technologies enable transformation of a traditional active/passive data center deployment into a mission-critical business continuity solution with active/active data centers, 24/7 application availability, no single points of failure, and near-zero RTOs and RPOs.

Active/active data centers. Near-zero RTO and RPO. 24x7 application availability. No single points of failure. Simplified high availability management. Fully automated failure handling and load balancing. Maintenance without downtime. Simplified Oracle RAC deployment on extended distance clusters. Increased infrastructure utilization.

33 Thank you.

