1 Understanding Oracle Exalogic

This chapter provides an overview of Oracle Exalogic and how Exalogic functions in an Oracle Fusion Middleware enterprise deployment.

In particular, it provides information on Exalogic components, Exalogic architecture, Exalogic networking, and deploying Exalogic with Exadata.

1.1 What is Exalogic?

Oracle Exalogic is an integrated hardware and software system designed to provide a complete platform for a wide range of application types and widely varied workloads.

Exalogic is intended for large-scale, performance-sensitive, mission-critical application deployments. It combines Oracle Fusion Middleware software and industry-standard Sun hardware to enable a high degree of isolation between concurrently deployed applications, which have varied security, reliability, and performance requirements. With Exalogic, you can develop a single environment that can support end-to-end consolidation of your applications.

1.2 Understanding Exalogic Components

Oracle Exalogic is delivered as a rack of hardware. In addition to the hardware components, Exalogic can be combined with the Oracle Exalogic Elastic Cloud software, which consists of pre-integrated, standard technologies including the operating system, virtualization technology, networking software, device drivers, and firmware.

It consists of the following components:

  • Compute Nodes (Servers)

  • Sun ZFS Storage Appliance (shared storage)

  • Integrated InfiniBand Networking

For more information about Exalogic, see 'Introduction to Exalogic Machine' in the Oracle Exalogic Elastic Cloud Machine Owner's Guide.

1.3 About the Exalogic Hardware Architecture

This section describes the Oracle Exalogic hardware architecture.

Oracle Exalogic was tested extensively on a wide range of hardware configurations to arrive at the optimal configuration for middleware type deployments. Design considerations included high availability, compute density, state-of-the-art components, balanced system design, field serviceability, centralized storage, and high-performance networking.

Figure 1-1 Exalogic Hardware Architecture


This section contains the following topics:

1.3.1 About Compute Nodes

The compute nodes are essentially servers: each contains CPUs, memory, networking adapters, and internal flash storage.

Processing is performed by compute nodes. A full rack of Exalogic has 30 compute nodes, a half-rack has 16 compute nodes, a quarter-rack has 8 compute nodes, and a one-eighth rack has 4 compute nodes.

The compute node resembles traditional server hardware and is designed to be a general-purpose processing unit; however, its hardware and software have been specifically constructed and tuned to run Java-based middleware software.

Compute nodes are pre-loaded with the Exalogic Linux base image. They can be re-imaged with either an Oracle Solaris image or the Exalogic Elastic Cloud Software (EECS) server image. You can run any application on a compute node, provided that the application is supported on the node's operating system.

Compute nodes balance high performance with high density. Density is a measure of computing power within a given amount of floor space in a data center. Multiple applications can be deployed on a single compute node, and a compute node can be configured with a backup compute node.

The number of processor cores on an Exalogic compute node depends on the machine version. For example, a compute node on a standard Exalogic X2-2 machine has two 6-core processors (12 cores in total), and a compute node on a standard X3-2 machine has two 8-core processors (16 cores in total).

1.3.2 About Exalogic Storage

Shared storage is provided by a Sun ZFS Storage ZS3 appliance, which is accessible by all the compute nodes. The ZFS storage appliance provides compression, performance, and reliability optimizations, and is built in to the Exalogic machine.

With ZFS, storage has been specifically engineered to hold the binaries and configurations for both middleware and applications, thereby reducing the number of installations and simplifying configuration management on the Exalogic system.

The Exalogic storage subsystem consists of two physically separate storage heads in an active/standby configuration and a large shared disk array. Each of the storage heads is directly attached to the I/O fabric with redundant Quad Data Rate (QDR) InfiniBand. The storage subsystem is accelerated with two types of solid state memory that are used as read and write caches, respectively, to increase system performance. The storage heads transparently integrate the many Serial Attached SCSI disks in the disk array into a single ZFS cluster, which is then made available to Exalogic compute nodes, or to virtual machines, depending on the configuration, through standard network file systems supported by the compute node's operating system.

1.3.3 About Exalogic Networking

InfiniBand and Ethernet switches enable network communication in Exalogic.

InfiniBand provides reliable delivery, security, and quality of service at the physical layer in the networking stack, with a maximum bandwidth of 40 Gb/s and latency on the order of a microsecond. The compute and storage nodes include InfiniBand network adapters, also referred to as host channel adapters (HCAs). The dual-port InfiniBand HCA provides a private internal network connecting the compute nodes and storage nodes to the system's I/O fabric.

In addition, the operating system images shipped with Exalogic are bundled with a suite of InfiniBand drivers and utilities called the OpenFabrics Enterprise Distribution (OFED). OFED is a core component of what Oracle refers to as the Exalogic Elastic Cloud Software. The Exalogic Elastic Cloud Software also includes optimizations that have been engineered into Oracle Fusion Middleware and that leverage OFED to provide higher performance over InfiniBand.

IB networking is used for all communications and data transfers within the Exalogic machine and can be used to connect multiple Oracle Engineered Systems together to create a very high performance, multi-purpose computing environment.

Although the hardware within Exalogic utilizes an InfiniBand fabric, the rest of your data center, along with the outside world, still speaks only Ethernet. This includes your application clients, such as web browsers, as well as legacy enterprise information systems with which components running within Exalogic may need to communicate. Exalogic's switches and nodes enable this communication through the Ethernet over InfiniBand (EoIB) protocol. As the name suggests, EoIB gives InfiniBand devices the ability to emulate an Ethernet connection using IB hardware.

1.4 About Oracle Exalogic Elastic Cloud

Oracle Exalogic Elastic Cloud is Oracle's first engineered system for enterprise Java applications.

These applications include Oracle Fusion Middleware and any application that can run on one of the Exalogic-supported operating systems, namely Oracle Linux or Oracle Solaris. Hardware and software are engineered together to optimize extreme Java performance.

Oracle Exalogic can also be used for consolidation: with the addition of the Oracle Exalogic Elastic Cloud software, the Exalogic platform can be used to serve virtual server farms.

Figure 1-2 Oracle Exalogic Elastic Cloud


Oracle has made unique optimizations and enhancements to the Exalogic components, as well as to Oracle Fusion Middleware and Oracle applications, to meet the highest standards of reliability, availability, scalability, and performance. These optimizations include on-chip network virtualization, high-performance Remote Direct Memory Access (RDMA) at the operating system and Java Virtual Machine (JVM) layers, and Exalogic-aware workload management in Oracle WebLogic Server (Oracle's Java EE application server).

Exalogic Elastic Cloud comprises Exabus, which is a set of hardware, firmware, and software optimizations that enable the operating system, middleware components, and even certain Oracle applications to make full use of the InfiniBand fabric, as well as Oracle Traffic Director.

The InfiniBand network fabric offers extremely high bandwidth and low latency, which provides major performance gains with respect to communication between the application server and the database server, and with respect to communication between different application server instances running within the Exalogic system.

Figure 1-3 Exalogic Elastic Cloud Software (v2.X) Performance Benchmark


The current release of the Exalogic Elastic Cloud software includes a tightly integrated server virtualization layer. It allows multiple, separate virtual machines containing applications or middleware to be consolidated on each server node, while introducing essentially no I/O virtualization overhead to the Exabus InfiniBand network and storage fabric.

Physically, Oracle Exalogic Elastic Cloud can be viewed as a rack of physical server machines plus centralized storage, which all have been designed together to cater to typical high-performance Java application use cases.

This section contains the following topics:

1.4.1 Understanding Exalogic Elastic Cloud Architecture

This section describes the Exalogic Elastic Cloud architecture.

The Exalogic system consists of the following two major elements:

  • Exalogic X5-2 - A high-performance hardware system, assembled by Oracle, that integrates storage and compute resources using a high-performance I/O subsystem called Exabus, which is built on Oracle's Quad Data Rate (QDR) InfiniBand.

  • Exalogic Elastic Cloud Software - An essential package of Exalogic-specific software, device drivers, and firmware that is pre-integrated with Oracle Linux and Solaris, enabling Exalogic's advanced performance and Infrastructure-as-a-Service (IaaS) capability, server and network virtualization, storage, and cloud management capabilities.

    • WebLogic Server - Session replication uses the SDP layer of IB networking to maximize the performance of large-scale data operations, because it avoids some of the typical TCP/IP network processing overhead. When processing HTTP requests, WebLogic Server makes native use of the SDP protocol when called by Oracle Traffic Director, or when making HTTP requests to it. Through its Active GridLink for RAC feature, WebLogic Server JDBC connections and connection pools can be configured to use the low-level SDP protocol when communicating natively with Exadata over the IB fabric (see the sketch after this list).

    • Coherence - Cluster communication has been dramatically redesigned to further minimize network latency when processing data sets across caches. Its Elastic Data feature increases performance, in conjunction with the compute nodes' built-in solid state drives, by optimizing both the use of RAM and garbage collection processing to minimize network and memory use. When sending data between caches, it uses only an RDMA-level IB verb set, thus avoiding nearly all of the TCP/IP network processing overhead.

    • Tuxedo - Tuxedo has been similarly enhanced to make increasing use of SDP and RDMA protocols to optimize the performance of inter-process communications within and between compute nodes.
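
The following sketch illustrates how a JDBC connection can be directed over SDP to an Exadata listener on the InfiniBand fabric. It is a minimal sketch, assuming the database listener has been configured for SDP; the host name, port, service name, and credentials are hypothetical, and in a WebLogic deployment the same URL would normally be placed in an Active GridLink data source configuration rather than in application code.

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class SdpJdbcSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical Exadata InfiniBand listener address and service name.
            // PROTOCOL=SDP asks the Oracle Net layer to use SDP instead of TCP.
            String url = "jdbc:oracle:thin:@(DESCRIPTION="
                    + "(ADDRESS=(PROTOCOL=SDP)(HOST=exadata-ib.example.com)(PORT=1522))"
                    + "(CONNECT_DATA=(SERVICE_NAME=edgdb.example.com)))";

            // SDP support in the JDBC thin driver is typically switched on with JVM
            // options such as: -Doracle.net.SDP=true -Djava.net.preferIPv4Stack=true
            // (the Oracle JDBC driver must be on the classpath).
            try (Connection conn = DriverManager.getConnection(url, "app_user", "app_password")) {
                System.out.println("Connected over SDP: " + conn.isValid(5));
            }
        }
    }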

1.4.2 Commissioning an Oracle Exalogic Elastic Cloud

Oracle Fusion Middleware software has been enhanced with performance optimizations for deployment on Exalogic.

For specific Fusion Middleware applications, follow the individual enterprise deployment guides.

For example:
  • Identity and Access Enterprise Deployment Guide

  • WebCenter Enterprise Deployment Guide

  • SOA Enterprise Deployment Guide

WebLogic Server Exalogic optimizations can be enabled as described in the "Exalogic Elastic Cloud Software Support" section under Core Server in the What's New in Oracle WebLogic Server guide.

This document does not describe how to commission your Exalogic hardware or how to install the Oracle Exalogic Elastic Cloud software. For information on how to do this, refer to the Oracle Exalogic Documentation Library, Exalogic Release EL X2-2, X3-2, X4-2, and X5-2:

http://docs.oracle.com/cd/E18476_01/

1.5 Understanding Exalogic Networking

This topic provides information on Exalogic networking.

The following topics describe how an Exalogic machine is networked:

1.5.1 Network Diagram for Exalogic Machine

This topic shows the network diagram for an Exalogic machine.

Figure 1-4 shows the network diagram for an Oracle Exalogic machine.

Figure 1-4 Exalogic Machine Network Overview

The schematic representation of Oracle Exalogic machine's network connectivity includes the following:
  • Default BOND0 interface, which is the private InfiniBand fabric including the compute nodes connected via Sun Network QDR InfiniBand Gateway Switches

    Typical Uses of this network are:
    • To communicate between compute nodes

    • To access the internal Oracle ZFS Storage Appliance and other Engineered Systems on the fabric

    • To communicate between vServers

    • InfiniBand partitions and memberships provide network isolation and security

    Note:

    InfiniBand BOND0 interfaces are the default channel of communication among Exalogic compute nodes and storage server head. IP subnets and additional bonds can be added on top of this default bonded interface.

    The device nodes representing the IPoIB network interface for Oracle Linux are referred to as ib0 and ib1. The corresponding logical devices created by Oracle Solaris are referred to as ibp0 and ibp1. The default IPoIB bonded interface BOND0 or IPMP0, configured by the Exalogic Configuration Utility, comprises these Linux-specific interfaces or Solaris-specific interfaces, respectively.

  • BOND1 interface, which is the Ethernet over InfiniBand (EoIB) link

    Typical Uses of this network are:
    • EoIB External Management Network on a vLAN

    • IP addresses for this network are provided in the ECU spreadsheet and assigned by the Exalogic Configuration Utility (ECU) configuration process

    • Used for Cloud Administration via Exalogic Control

    • EoIB user access networks on separate vLANs

    • Created by the Exalogic administrator after the initial Exalogic installation

    • Used to access guest vServers and their application services

    Note:

    The device nodes representing the EoIB network interface for Oracle Linux are referred to as vnic0 and vnic1. The Linux kernel creates eth device nodes that correspond to the vnic0 and vnic1 instances that are created on the Sun Network QDR InfiniBand Gateway Switch.

    The corresponding logical devices created by Oracle Solaris are referred to as eoib0 and eoib1. The EoIB bonded interface BOND1 or IPMP1 must be configured manually. When you configure them, choose the network interfaces specific to your operating system.

  • NET0 interface, which is associated with the host Ethernet port 0 IP address for every compute node and storage server head

    Typical Uses of this network are:
    • To access all physical components and ILOMs

    • To perform system administration and life cycle management

    • Used by the Exalogic Control stack

    Note:

    The device node representing the management network interface for Oracle Linux is referred to as eth0. The corresponding logical device created by Oracle Solaris is referred to as igb0.

  • Client access network for external data center connectivity

1.5.2 Understanding Network Protocols

In an Exalogic deployment, all networking within the machine is via InfiniBand.

Most likely, your corporate network is Ethernet-based. However, you can configure the InfiniBand networks so that they understand Ethernet traffic, and then you can attach your Exalogic machine to the corporate network. This is known as Ethernet over InfiniBand (EoIB). The EoIB network communicates with your corporate network using 10 Gb Ethernet. This network is known as the client/public/external network.

If you are communicating with other components inside the Exalogic machine, then you do not need to use Ethernet. InfiniBand adapters (HCAs) provide advanced features that can be used via the native "verbs" programming interface:

  • Data transfers can be initiated directly from user space to the hardware, bypassing the kernel and avoiding the overhead of a system call.

  • The adapter can handle all of the network protocol of breaking a large message (even many megabytes) into packets, generating ACKs, retransmitting lost packets, etc. without using any CPU on either the sender or receiver.

  • IPoIB (IP-over-InfiniBand) is a protocol that defines how to send IP packets over IB; for example, Linux has an "ib_ipoib" driver that implements this protocol. This driver creates a network interface for each InfiniBand port on the system, which makes an HCA act like an ordinary NIC.

IPoIB does not make full use of HCA capabilities; network traffic goes through the normal IP stack. This means a system call is required for every message, and the host CPU must handle breaking data up into packets. However, it also means that applications that use normal IP sockets work unmodified on top of the full speed of the IB link.

IPoIB provides a normal IP NIC interface that can run TCP (or UDP) sockets on top of it.
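
Because IPoIB presents an ordinary network interface, no InfiniBand-specific code is required. The following minimal sketch uses a hypothetical host name and port; it is standard Java socket code that runs unchanged whether the address resolves to an Ethernet or an IPoIB interface.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.PrintWriter;
    import java.net.Socket;

    public class IpoibClientSketch {
        public static void main(String[] args) throws Exception {
            // "compute2-ib" is a hypothetical name resolving to an address on the
            // IPoIB (BOND0) network; the traffic stays on the InfiniBand fabric,
            // but the code is plain TCP socket code.
            try (Socket socket = new Socket("compute2-ib", 7001);
                 PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(socket.getInputStream()))) {
                out.println("ping");
                System.out.println("reply: " + in.readLine());
            }
        }
    }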

SDP (Sockets Direct Protocol) is a transport-agnostic protocol to support stream sockets over Remote Direct Memory Access (RDMA) network fabrics. It is specifically designed for InfiniBand networks.

The purpose of the Sockets Direct Protocol is to provide an RDMA-accelerated alternative to the TCP protocol on IP. The goal is to do this in a manner that is transparent to the application.

SDP only deals with stream sockets, and if installed in a system, bypasses the operating system resident TCP stack for stream connections between any endpoints on the RDMA fabric. All other socket types (such as datagram, raw, packet, etc.) are supported by the Linux IP stack and operate over standard IP interfaces (that is, IPoIB on InfiniBand fabrics). The IP stack has no dependency on the SDP stack; however, the SDP stack depends on IP drivers for local IP assignments and for IP address resolution for endpoint identifications.
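
Because SDP is transparent to the application, enabling it typically requires no code changes. The following minimal sketch assumes a Java 7 or later JVM on an operating system with SDP support (for example, Oracle Linux with OFED); the addresses, port, and configuration file contents are hypothetical.

    import java.net.InetSocketAddress;
    import java.net.Socket;

    public class SdpTransparencySketch {
        public static void main(String[] args) throws Exception {
            // Ordinary stream-socket code. SDP is switched on at JVM startup, for example:
            //   java -Dcom.sun.sdp.conf=sdp.conf -Djava.net.preferIPv4Stack=true SdpTransparencySketch
            // where sdp.conf contains rules such as:
            //   connect 192.168.10.0/24 1521
            // Connections matching a rule are carried over SDP; all others fall back
            // to the normal TCP/IP stack (IPoIB on an InfiniBand fabric).
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress("192.168.10.21", 1521), 5000);
                System.out.println("connected: " + socket.isConnected());
            }
        }
    }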

The IPoIB network is known as the internal network in this guide.

Both networks (EoIB and IPoIB) are accessed through an attached IP address. If you want to route traffic through the EoIB network, you send traffic through the IP address associated with that network. Similarly, if you want traffic to go through the internal network, you use the IP address associated with that network. For example:

  • host1-int is associated with the internal (IPoIB) network

  • host1-ext is associated with the external (EoIB) network

If you want to communicate with host1 through the internal network, you send traffic to host1-int. If you want to use the external network, use host1-ext.
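
The choice of network is therefore made simply by choosing which name (or address) to connect to. The short sketch below, using the hypothetical host entries above, resolves both names so you can confirm which subnet each one maps to.

    import java.net.InetAddress;

    public class NetworkSelectionSketch {
        public static void main(String[] args) throws Exception {
            // host1-int resolves to the IPoIB (internal) address of the node,
            // host1-ext to its EoIB (client/external) address.
            InetAddress internal = InetAddress.getByName("host1-int");
            InetAddress external = InetAddress.getByName("host1-ext");
            System.out.println("internal (IPoIB) address: " + internal.getHostAddress());
            System.out.println("external (EoIB) address:  " + external.getHostAddress());
            // Connecting to host1-int keeps traffic on the InfiniBand fabric;
            // connecting to host1-ext routes it over the EoIB client network.
        }
    }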

1.6 About Deploying Exalogic with Exadata

Most Oracle Fusion Middleware applications interact with an Oracle Database. This database can reside on external hardware that is connected to the Exalogic machine via the external 10 Gb Ethernet connection to the data center network.

However, you can obtain maximum performance if your database resides on an Exadata appliance connected directly to the Exalogic machine. If Exalogic is connected to Exadata, you have the option of communicating with the database using IP over InfiniBand (IPoIB).
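
For example, simply pointing the JDBC URL at a name that resolves to the Exadata IPoIB addresses keeps database traffic on the InfiniBand fabric. The sketch below is illustrative only; the SCAN name, port, service name, and credentials are hypothetical.

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class IpoibJdbcSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical SCAN name that resolves to the Exadata IPoIB (BOND0) addresses.
            // Using the data-center Ethernet name instead would route the same traffic
            // over the 10 Gb client network.
            String url = "jdbc:oracle:thin:@//exadata-ibscan.example.com:1521/edgdb.example.com";
            try (Connection conn = DriverManager.getConnection(url, "app_user", "app_password")) {
                System.out.println("connected over IPoIB: " + conn.isValid(5));
            }
        }
    }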

For information about connecting an Oracle Exalogic machine to an Oracle Exadata Database machine, see the Oracle Fusion Middleware Exalogic Machine Multirack Cabling Guide.

1.7 Understanding Types of Deployment

This section describes the types of Exalogic deployment.

You can configure Exalogic as either a physical or a virtual deployment.

This section contains the following topics:

1.7.1 About a Physical Exalogic Configuration

In a physical Exalogic configuration, the application software is deployed on compute nodes. Each compute node runs its own single operating system.

All applications, including WebLogic Server, Coherence, and Tuxedo, then share this operating system kernel and the local compute node resources.

The Exalogic compute nodes are engineered servers and thus provide extreme performance to Java-based Middleware software deployed on the compute nodes.

This configuration does not include EECS. Middleware and applications running on the Exalogic platform are deployed and managed in very much the same way as they are on traditional platforms; new deployments are associated with appropriate physical compute, storage, memory, and I/O resources. The primary administration tool is Oracle Enterprise Manager Cloud Control; for more information, see the Managing and Monitoring an Oracle Exalogic Elastic Cloud Machine guide.

1.7.2 About a Virtual Exalogic Configuration

The purpose of server virtualization is to fundamentally isolate the operating system and applications stack from the constraints and boundaries of the underlying physical servers. By doing this, multiple virtual machines can be presented with the impression that they are each running on their own physical hardware when, in fact, they are sharing a physical server with other virtual machines.

This allows server consolidation in order to maximize the utilization of server hardware, while minimizing the costs associated with the proliferation of physical servers, namely hardware, cooling, and real estate expenses.

This hardware isolation is accomplished either through software-based sharing or through direct device assignment (where an I/O device is directly assigned to a VM). Software-based sharing is achieved by inserting a very thin layer of software between the operating system in the virtual machine and the underlying hardware, either to directly emulate the hardware or to otherwise manage the flow and control of everything from CPU scheduling across the multiple VMs, to I/O management, to error handling.

The challenge with virtualization is to achieve a high enough consolidation ratio to achieve the cost benefits you need while still being able to provide the exceptional, predictable performance required from your core applications.

The Oracle Exalogic Elastic Cloud provides a unique Input/Output subsystem called Exabus. Exabus employs a converged network fabric to provide all Input/Output services to the applications running within an Exalogic system. Applications residing within an Exalogic system can access all network services provided within the datacenter network through Exabus.

In the latest version of Oracle Exalogic, Oracle has virtualized the InfiniBand connectivity in Exabus, using state-of-the-art, standards-based technology to permit the consolidation of multiple virtual machines per physical server with no impact on performance.

Exalogic includes support for a highly optimized version of the EECS hypervisor, which can be used to subdivide a physical compute node into multiple virtual servers (vServers), each of which may run a separate Oracle Linux operating system instance and applications.

The logical vServers can have specific amounts of physical compute, storage, memory and I/O resources, optionally pre-configured with middleware and applications. This approach allows for maximum levels of resource sharing and agility as vServers can share physical resources and can be provisioned in minutes. Pre-configured OVM templates for Oracle applications are available to download.

EECS has been engineered for tight integration with Exalogic's Exabus I/O backplane using a technique called Single Root I/O Virtualization (SR-IOV).

SR-IOV eliminates virtualization overhead to deliver the maximum performance and scalability, while also allowing the same InfiniBand I/O adapter to be shared by up to 63 virtual machines, each with a redundant pair of InfiniBand connections, enabling highly efficient, consolidated operations. SR-IOV's unique ability to nearly eliminate virtualization overhead while still allowing the sharing of hardware permits a much higher server consolidation ratio and higher performance.

1.7.3 About Choosing a Type of Deployment

Both of the Exalogic implementation styles (physical and virtual) can support the creation of a private cloud.

In a virtualized system, Exalogic Control is used to define, manage, and monitor cloud users and services. In a physical system, equivalent functionality is provided by Enterprise Manager with the Cloud Management Pack.

Among the benefits of the virtualized approach are application consolidation, tenant isolation (provisioning secure Exalogic resources to multiple tenants), and deployment simplification, including scaling up or down. With the advent of Exalogic Elastic Cloud technology, the impact of virtualization on application throughput and latency has become negligible. Applications running in Exalogic vServers perform on par with deployments on bare metal, while retaining all of the manageability and efficiency benefits that come with server virtualization.

If you deploy your application using a bare metal (physical) deployment, then you have at your disposal the raw processing power of the compute node. A single compute node is likely to offer more processing power than a single component of your application needs. So, to make the best use of the available processing power, several application components or applications are typically installed on the same compute node.

If you deploy your application using a virtual deployment, you can modularize the components, creating several smaller virtual servers to provide application component isolation. In a virtual deployment, if the underlying hardware fails, then a virtual server can be moved to a different underlying physical host to resume processing. Having a distributed deployment allows you to isolate failures to smaller areas.