
Architecting Highly Available CompactPCI Systems

2003/04/04

High Availability is an overused term in today's marketplace. Vendors have used it to describe architectures ranging from something as simple as redundant power supplies to fully redundant systems. This raises the question: what is High Availability? It may be easier to think of High Availability as an increase in the availability of a system or, equivalently, a decrease in its downtime. Many of today's telecommunications systems require 5NINES availability, or 99.999% uptime. The downtime allowed in these systems is 5.26 minutes per year (525,600 minutes/year × 0.001%). Those 5 minutes of downtime include scheduled maintenance as well as any outage that results from the failure of any part of the system.

Designing High Availability systems capable of 5NINES availability generally requires that every function in the system be redundant, that is, that there be no single point of failure. The road to High Availability systems generally includes redundant power supplies, fan trays, and mirrored hard drives. Adding these redundant components decreases the probability that a component failure will cause a system failure; the availability of the system has increased, so it is now more highly available. As you might expect, adding redundancy to power supplies, fans and hard drives is relatively straightforward. Providing redundant compute elements in a system is a more complicated challenge.
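The downtime arithmetic above, and the effect of adding a redundant component, can be checked with a few lines of Python. This is a minimal sketch, assuming a 365-day year, that two redundant units fail independently, and that the function survives as long as either unit is up. Real systems also suffer common-mode failures and imperfect failover, so the redundant figure is an optimistic upper bound. The function names are illustrative only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes (leap years ignored)


def downtime_minutes_per_year(availability: float) -> float:
    """Allowed downtime per year for a given availability, e.g. 0.99999."""
    return MINUTES_PER_YEAR * (1.0 - availability)


def parallel_availability(a: float, copies: int = 2) -> float:
    """Availability of `copies` redundant units, ASSUMING independent failures
    and that the function stays up while at least one unit is up."""
    return 1.0 - (1.0 - a) ** copies


if __name__ == "__main__":
    for nines in range(3, 6):
        a = 1.0 - 10.0 ** -nines
        print(f"{nines} nines: {downtime_minutes_per_year(a):8.2f} min/year")

    single = 0.999
    pair = parallel_availability(single)
    print(f"single unit: {downtime_minutes_per_year(single):.1f} min/year")
    print(f"redundant pair: {pair:.6f}, {downtime_minutes_per_year(pair):.3f} min/year")
```

With a single unit at three nines (about 526 minutes of downtime a year), an independent redundant pair reaches six nines on paper, which is why the text calls for every function in a 5NINES system to be redundant.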

Application of CompactPCI to High Availability Applications

Developers have been applying systems compliant with the PICMG 2.0 CompactPCI Specification to a variety of High Availability applications for years. As market requirements for High Availability have increased, CompactPCI systems have had to evolve to meet the new challenges. The original CompactPCI systems were simple bus-based architectures. Figure 1 shows a typical first-generation CompactPCI architecture.


PICMG 2.0 CompactPCI-compliant systems are composed of one or more CompactPCI bus segments. Each segment can contain up to eight CompactPCI board slots: one System Slot and up to seven Peripheral Slots. The PCI bus is the primary communication path between the slots in each bus segment. In this architecture the PCI bus and the System Slot are single points of failure; a misbehaving Peripheral Slot can bring down the entire PCI bus segment, preventing communication between any of the slots. This single point of failure was a significant obstacle to the adoption of CompactPCI in High Availability applications, and early architects of CompactPCI High Availability systems had to work around it. The typical solution was to add a second CompactPCI bus segment and duplicate the functionality in both segments. Figure 2 shows an example of a dual CompactPCI bus architecture.
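The single-point-of-failure argument can be made concrete with a small model of one bus segment. The sketch below is hypothetical (the class, slot and bus names are invented for illustration): it checks the one-System-Slot-plus-seven-Peripheral-Slots limit and then collects the components that every peripheral-to-peripheral transfer depends on. Anything in that shared set is a single point of failure for the whole segment.

```python
from dataclasses import dataclass, field

MAX_SLOTS_PER_SEGMENT = 8  # one System Slot plus up to seven Peripheral Slots


@dataclass
class BusSegment:
    """Hypothetical model of one PICMG 2.0 CompactPCI bus segment."""
    name: str
    system_slot: str
    peripheral_slots: list[str] = field(default_factory=list)

    def validate(self) -> None:
        slots = 1 + len(self.peripheral_slots)  # the System Slot plus peripherals
        if slots > MAX_SLOTS_PER_SEGMENT:
            raise ValueError(f"{self.name}: {slots} slots exceeds {MAX_SLOTS_PER_SEGMENT}")

    def transfer_dependencies(self, bus: str) -> list[set[str]]:
        """Components each peripheral-to-peripheral transfer relies on: both
        endpoints, the shared bus, and the System Slot (clock, arbitration,
        interrupt servicing)."""
        return [{src, dst, bus, self.system_slot}
                for src in self.peripheral_slots
                for dst in self.peripheral_slots if src != dst]


def shared_dependencies(transfers: list[set[str]]) -> set[str]:
    """Components that every transfer depends on; if any one of them fails,
    no slot in the segment can communicate."""
    return set.intersection(*transfers) if transfers else set()


seg = BusSegment("segment_1", "system_slot_1", [f"peripheral_{i}" for i in range(1, 8)])
seg.validate()
print(sorted(shared_dependencies(seg.transfer_dependencies("pci_bus_1"))))
# ['pci_bus_1', 'system_slot_1']
```

No individual Peripheral Slot appears in the shared set, but the bus and the System Slot always do, which is exactly the weakness the dual-segment designs set out to remove.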



In Figure 2, dual bus segments and dual System Slots provide redundancy for the single points of failure that exist in standard CompactPCI architectures. In the dual-segment architecture, either System Slot can control either of the two PCI bus segments, so the failure of one System Slot can be compensated for by the other. This architecture also covers a fault in a PCI bus: if a fault occurs in PCI Bus 1, PCI Bus 2 is available to handle the task. The engineering challenges of this kind of architecture are considerable. The System Slot provides the clocks, arbitration and interrupt servicing for a bus segment, so a System Slot failover requires that the clock drivers, request/grant arbitration and interrupt controllers all transfer to the newly active System Slot. Detecting that a bus has failed, and then bringing up the redundant System Slot without affecting overall system availability, is difficult. In 1999 PICMG formed a subcommittee to standardize an implementation of Redundant System Slots; the PICMG 2.13 Redundant System Slot specification was abandoned three years later. PICMG 2.13 is the only subcommittee that was disbanded without completing a specification, largely because of the complexity of the problem and the proprietary solutions that already exist. Redundant System Slots in CompactPCI can clearly increase system availability, but at a cost and a level of complexity that are prohibitive. Vendors that provide this type of architecture are selling proprietary solutions, not open architectures.
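To illustrate why System Slot failover is hard to get right, here is a deliberately simplified active/standby model in Python. It is a sketch under stated assumptions, not how any PICMG 2.13 draft or vendor product works: failure is detected by a hypothetical heartbeat with a fixed timeout, and "transferring" the clock, arbitration and interrupt roles is reduced to moving a set between two objects. Even in this toy form the ordering matters: the failed slot must give up its roles before the standby asserts them, and the new active slot needs a fresh timeout window or it would be failed over immediately.

```python
from dataclasses import dataclass, field

# Functions a System Slot provides for its bus segment, per the text above.
SYSTEM_SLOT_ROLES = frozenset({"clock", "arbitration", "interrupts"})


@dataclass
class SystemSlot:
    name: str
    roles: set[str] = field(default_factory=set)
    last_heartbeat: float = 0.0


class RedundantSystemSlotPair:
    """Hypothetical active/standby controller. If the active System Slot has not
    sent a heartbeat within `timeout` seconds, the standby takes over every role."""

    def __init__(self, active: SystemSlot, standby: SystemSlot, timeout: float):
        self.active, self.standby, self.timeout = active, standby, timeout
        self.active.roles = set(SYSTEM_SLOT_ROLES)

    def heartbeat(self, slot: SystemSlot, now: float) -> None:
        slot.last_heartbeat = now

    def check(self, now: float) -> bool:
        """Return True if a failover was performed at time `now`."""
        if now - self.active.last_heartbeat <= self.timeout:
            return False
        # Strip the roles from the failed slot first, so the model never has two
        # slots driving the clock or arbiter at once. (In hardware this means
        # isolating the failed slot's drivers, which is the hard part.)
        roles, self.active.roles = self.active.roles, set()
        self.standby.roles = roles
        self.active, self.standby = self.standby, self.active
        self.active.last_heartbeat = now  # new active slot starts a fresh window
        return True


slot_1, slot_2 = SystemSlot("system_slot_1"), SystemSlot("system_slot_2")
pair = RedundantSystemSlotPair(slot_1, slot_2, timeout=0.05)
pair.heartbeat(slot_1, now=0.00)
print(pair.check(now=0.03))  # False: heartbeat is still fresh
print(pair.check(now=0.10))  # True: system_slot_1 silent too long
print(pair.active.name, sorted(pair.active.roles))
```

What the model leaves out (telling a failed bus apart from a failed slot, in-flight bus transactions, re-enumerating peripherals) is precisely where the real engineering difficulty lies.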

Adding IP Data Transport to CompactPCI

In September 2001, PICMG approved the PICMG 2.16 Packet Switched Backplane specification. This specification defines 10/100/1000 Mbit/s Ethernet interconnects between the peripheral slots and the fabric slots in a CompactPCI segment, and the fabric slots themselves are redundant. PICMG 2.16-compliant systems have been deployed in a variety of applications. The ubiquity of Ethernet interconnects and the need for IP data transport have led to high levels of adoption among system providers. Figure 3 shows a typical combined PICMG 2.0 and 2.16 architecture.
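The redundancy of the PICMG 2.16 fabric can be checked with a simple graph model. This sketch assumes a dual-star arrangement in which every peripheral slot has one Ethernet link to each of two fabric slots; the slot names are hypothetical, and switch internals and the boards' own Ethernet controllers are not modeled. It verifies that every pair of peripheral slots can still reach each other after the loss of any single fabric slot or any single link, and shows that a star with only one fabric slot lacks that property.

```python
from itertools import combinations

FABRICS = ("fabric_a", "fabric_b")  # the two redundant fabric slots


def dual_star(nodes, fabrics=FABRICS):
    """One Ethernet link from every peripheral slot to every fabric slot."""
    return {frozenset((n, f)) for n in nodes for f in fabrics}


def reachable(start, links, dead_slots):
    """Depth-first search from `start`, skipping any link that touches a dead slot."""
    seen, stack = {start}, [start]
    while stack:
        cur = stack.pop()
        for link in links:
            if cur in link and not (link & dead_slots):
                (other,) = link - {cur}
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
    return seen


def tolerates_any_single_failure(nodes, fabrics=FABRICS):
    """True if every pair of peripheral slots can still communicate after
    losing any one fabric slot or any one link."""
    links = dual_star(nodes, fabrics)
    scenarios = [(links, {f}) for f in fabrics]                  # one fabric slot dead
    scenarios += [(links - {link}, set()) for link in links]     # one link cut
    return all(b in reachable(a, live, dead)
               for live, dead in scenarios
               for a, b in combinations(nodes, 2))


peripherals = [f"peripheral_{i}" for i in range(1, 9)]
print(tolerates_any_single_failure(peripherals))                         # True
print(tolerates_any_single_failure(peripherals, fabrics=("fabric_a",)))  # False
```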

In PICMG 2.16-compliant systems the IP data transport can be used as the primary communications channel within the system. This communications path has redundant links to redundant Fabric Slots.

The PICMG 2.16 specification allows an architect to avoid using the CompactPCI bus altogether, and provides a way of increasing system availability without increasing the cost of the system. PICMG 2.16-compliant systems are inherently redundant: there is no single point of failure. The Ethernet fabric is also a convenient way to handle the packet-based data transport seen in next-generation applications.
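One simple way for a board to use its two links to the redundant Fabric Slots is active/backup selection: transmit on a preferred link and fall back to the other when the preferred link goes down. The sketch below shows that selection logic in Python. It is similar in spirit to the active-backup mode of Linux NIC bonding, but the `Port` and `ActiveBackup` classes are hypothetical, and link state is simply a boolean that a real driver would derive from link detection.

```python
from dataclasses import dataclass


@dataclass
class Port:
    name: str             # hypothetical label for one of the board's fabric links
    link_up: bool = True


class ActiveBackup:
    """Transmit on the first port (in preference order) whose link is up."""

    def __init__(self, ports: list[Port]):
        self.ports = ports

    def select(self) -> Port:
        for port in self.ports:
            if port.link_up:
                return port
        raise RuntimeError("no usable link to either fabric slot")


to_a, to_b = Port("link_to_fabric_a"), Port("link_to_fabric_b")
bond = ActiveBackup([to_a, to_b])
print(bond.select().name)  # link_to_fabric_a
to_a.link_up = False       # e.g. fabric slot A removed or its link cut
print(bond.select().name)  # link_to_fabric_b
```

Because the preferred port is checked first on every call, traffic returns to the first link as soon as it recovers; a real design might instead stay on the backup link to avoid flapping.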

The next step in the evolution of highly available CompactPCI systems is the removal of the System Slot. As applications take advantage of the IP interconnects in today's systems, the PCI bus is becoming an unused expense. PICMG is working on a specification called CompactTCA, which is expected to combine the system management capabilities defined in AdvancedTCA (PICMG 3.0), the form factor defined in PICMG 2.0, and the data transport defined in PICMG 2.16. This architecture will not contain a PCI bus, and such a system will be able to support 24 Peripheral Slots and two Fabric Slots. Eliminating the PCI bus will reduce the cost of the boards used in CompactPCI systems, avoid the complexity of providing redundant System Slots, and increase the total slot count. Figure 4 shows an example of a possible CompactTCA system.
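The slot-count claim can be put into numbers. The sketch below assumes the PICMG 2.0 limit of seven Peripheral Slots per bus segment and a dual-star fabric with one link from each peripheral slot to each Fabric Slot, and compares what 24 peripherals would need in each architecture. The function names are illustrative.

```python
import math

PCI_PERIPHERALS_PER_SEGMENT = 7  # PICMG 2.0: one System Slot + up to 7 Peripheral Slots


def pci_segments_needed(peripherals: int) -> int:
    """Bus segments (each with its own System Slot) needed for `peripherals` boards."""
    return math.ceil(peripherals / PCI_PERIPHERALS_PER_SEGMENT)


def dual_star_links(peripherals: int, fabric_slots: int = 2) -> int:
    """Point-to-point links when every peripheral connects to every Fabric Slot."""
    return peripherals * fabric_slots


n = 24  # Peripheral Slots in the CompactTCA example above
print(f"bus-based: {pci_segments_needed(n)} segments, each needing its own System Slot")
print(f"packet-switched: 2 Fabric Slots, {dual_star_links(n)} point-to-point links, "
      "no System Slot")
```

The bus-based count excludes any duplicated System Slots that a redundant design would add on top.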

Summary

PICMG 2.16 Packet Switched Backplane is a viable way to improve the availability of systems built today. Eliminating the single points of failure found in first-generation CompactPCI systems and adding redundant data transports provide the building blocks necessary to achieve 5NINES availability. System designers should beware of vendors offering products based on proprietary Redundant System Slot architectures; these closed systems will not benefit from the CompactPCI ecosystem that exists today. CompactPCI systems using PICMG 2.16 Packet Switched Backplanes will provide the combination of point-to-point data transport and redundancy necessary to achieve 5NINES availability, as well as a migration path to future technologies.

Contributed by ADLINK Technology (凌华科技); edited by CTI Forum (CTI论坛).