Computer Networks 52 (2008) 1975–1987
Journal homepage: www.elsevier.com/locate/comnet

Introducing OMS protection in GMPLS-based optical ring networks

Luis Velasco *, Salvatore Spadaro, Jaume Comellas, Gabriel Junyent
Optical Communications Group, Universitat Politècnica de Catalunya (UPC), C/Jordi Girona, 1-3 D6-107, 08034 Barcelona, Spain

Article info
Article history: Received 1 June 2007; received in revised form 12 February 2008; accepted 13 February 2008; available online 21 March 2008
Keywords: ASON/GMPLS; ROADM; LMP; OMS dedicated protection; OMS shared protection

Abstract
Legacy ring-based networks have been deployed in conjunction with SONET/SDH technology to provide survivability to transport networks, and they achieve service recovery within 50 ms after fault detection. Current generalized multiprotocol label switching (GMPLS)-controlled optical transport networks need efficient resilience mechanisms to allow recovery times equivalent to those granted by SONET/SDH networks. In this paper, dedicated and shared optical multiplex section (OMS) protection systems are proposed. Both solutions consist of a mechanism based on extensions of the GMPLS link management protocol (LMP) to properly manage the protection actions, and both utilize a new reconfigurable optical add/drop multiplexer (ROADM) design to support the protection schemes. The performance of both solutions has been experimentally evaluated.
© 2008 Elsevier B.V. All rights reserved.

1. Introduction

Legacy SONET/SDH ring-based networks are well-known for their inherently fast protection switching capability, which allows service recovery within 50 ms after fault detection [1]. An interruption of 50 ms or less in a transmission signal is perceived by higher layers as a transmission error. It may cause a packet retransmission handled by TCP/IP at the IP layer, but no TCP sessions will be affected at all.
In VoIP applications, users do not perceive 100 ms outages [2]. A complete discussion about the 50 ms figure can be found in [3].

The introduction of OADMs into transport networks allows them to be configured in ring-based topologies similar to traditional SONET/SDH networks. An OADM allows the dropping of a specific wavelength out of the bundle of dense wavelength division multiplexing (DWDM)-multiplexed signals and the addition of another channel on the same wavelength. OADMs are currently evolving into reconfigurable (and remotely controlled) devices, paving the way for future flexible optical networks. In fact, dynamic optical rings using ROADMs play a crucial role in the migration to the ASON/GMPLS paradigm [4,5].

* Corresponding author. Tel.: +34 93 401 69 99. E-mail addresses: luis.velasco@tsc.upc.edu (L. Velasco), spadaro@tsc.upc.edu (S. Spadaro), comellas@tsc.upc.edu (J. Comellas), junyent@tsc.upc.edu (G. Junyent).
1389-1286/$ - see front matter © 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.comnet.2008.02.022

An automatically switched optical network (ASON) [4] is an optical transport network that has dynamic connection set up/tear down capability. This functionality is accomplished by means of a control plane that carries out, among other things, routing and signaling functions. GMPLS is a technology that provides enhancements to MPLS to support switching capabilities not only at the packet level but also at the time slot, wavelength, or even fiber levels [5]. GMPLS provides a suitable control plane for dynamic optical networks. It includes the traffic engineering (TE) extensions of RSVP-TE [6] for signaling and the intra-domain link-state OSPF-TE [7] for routing.

The use of DWDM technology implies a very large number of parallel links (i.e., hundreds or even thousands of wavelengths if multiple fibers are used) between two adjacent nodes. The manual configuration and control of such a huge amount of resources becomes impractical.
The link management protocol (LMP) [8] has been specified to resolve this issue. The LMP specification defines two core procedures, namely the control channel management and the link property correlation [8]. The former procedure covers, among other functionalities, the maintenance of an IP control channel between each pair of neighboring LMP nodes. It monitors the periodic exchange of Hello messages between neighbor optical node connection controllers (OCC) to confirm that the control channel is operational. The latter procedure can be used for fault management in cases of failure.

Currently, the GMPLS recovery framework covers only optical channel (OCh) resilience [9,10]. However, optical multiplex section (OMS) protection schemes allow recovery of the complete bundle of DWDM channels in a fiber with just one protection action. This paper focuses on OMS protection for GMPLS-controlled optical ring networks.

Previous work in the literature regarding protection at the OMS level must be taken into consideration. In [11], different architectures for resilient ring and mesh based optical networks are described and compared. In [12], different optical protection ring architectures are described. In particular, OCh shared protection rings (OCh-SPRing) including node architecture designs are discussed in detail. In [13], the authors propose a dedicated protection mechanism based on extensions to the RSVP-TE protocol for fault location and notification. The proposed mechanism can be applied to small metropolitan networks. However, the reported protection time (45 ms for a three-node optical ring with links of 35 km) suggests that this solution would not scale to larger optical rings.

In this paper, we focus on dynamic optical rings supporting either dedicated or shared link protection (hereafter OMS DPRing and OMS SPRing, respectively).
The OMS DPRing scheme is deployed over two-fiber unidirectional rings. One fiber is dedicated to the working traffic, whereas the other is reserved for protection. The OMS SPRing scheme is deployed over two-fiber bidirectional rings. The total capacity of each fiber is thus divided into two wavebands: one waveband is reserved for transporting working channels, and the other is used for transporting protection channels. Working and protection channels share each fiber in this case.

We propose and evaluate complete solutions for building ring-based dynamic optical networks with OMS DPRing and OMS SPRing protection capabilities. Both proposals consist of: (1) a novel GMPLS automatic protection switching (GAPS) mechanism that coordinates the protection actions after failures and (2) a new ROADM design to support OMS protection. The performance of both solutions has been experimentally evaluated over the ASON/GMPLS CARISMA network test-bed [14].

Some other work related to protection has also been considered. In [15], the concept of differentiated reliability (DiR) was introduced. A reliability degree was assigned to each individual connection irrespective of the underlying protection mechanism. In our proposal, all of the network lightpaths have the same priority. In [16], a routing algorithm with shared-risk link groups (SRLG) disjoint protection for mesh networks was presented. In general, link failure dependency is an important factor to be considered when calculating disjoint routes. However, we assume that the optical topology has been designed during the planning phase in such a way that no common infrastructure (e.g., optical cables or conduits) is used by any two links in the network. The authors in [17] provide a framework for waveband switching (WBS). In WBS, wavelengths are grouped into bands and switched as single entities. Thus, a waveband is an intermediate entity between fibers and wavelengths.
Our solution for OMS SPRing is based on separating wavelengths into two bands, one for working and one for protection. When a failure occurs, working and protection bands are switched. Our solution for OMS DPRing, on the other hand, is based on fiber switching.

The remainder of the paper is organized as follows. Section 2 provides an overview of protection mechanisms for ring-based networks. In Section 3, an availability model for lightpaths that is useful for comparing the performance of the proposed schemes is presented. Section 4 is devoted to the basic description of the GAPS mechanism. In Section 5, the design for the utilized optical nodes is presented. Some experimental results are presented in Section 6. Finally, Section 7 draws the main conclusions of this work.

2. Protection mechanisms for ring-based networks

Failures at the optical layer have a high impact on overall optical network performance due to the high bandwidth available per wavelength and the number of wavelengths per fiber. For example, fiber cuts resulting from digging works or the failure of individual transmitters or receivers are quite common [18].

In ring-based dynamic optical networks, protection at the optical layer can be implemented at either the OMS or OCh layer. At the OCh layer, the protection action is performed when the ROADMs inject/extract the selected wavelength to/from the ring network. At the OMS layer, the protection action is performed by the ROADMs adjacent to the failure. OMS protection is more appropriate for fiber failures, whereas OCh protection is adequate when a single channel fails. A general comparison in terms of cost, availability, and recovery time of different network architectures can be found in [11].

With the OCh schemes, it is possible to mix protected and unprotected traffic; in contrast, OMS schemes are more rigid because all channels are simultaneously recovered.
On the other hand, if most of the channels need to be protected in some subareas of the network (typically, on the core network), OMS schemes are a good option. For example, in a network where all lightpaths are protected, OMS protection allows recovery of all connections with only one action. On the contrary, OCh protection requires one action for each impacted lightpath. This considerably increases the quantity of signaling messages required, as well as the optical node complexity.

The protection scheme can be dedicated (dedicated protection ring, DPRing) or shared (shared protection ring, SPRing). Although the shared protection scheme consumes fewer resources than the dedicated approach, its implementation and management are typically more complex [11,19]. A detailed classification of resilience schemes for ring networks including both OCh and OMS layers can be found in [3] and [19]. In the next two subsections, we introduce two OMS protection schemes for ring-based dynamic optical networks.

2.1. OMS dedicated protection ring (OMS DPRing)

OMS DPRing consists of two counter-rotating unidirectional rings that transmit in opposite directions (Fig. 1a). Only one fiber is dedicated for working traffic, and the other is reserved for protection. Both flows of a bidirectional lightpath are routed on different sides of the ring using the same wavelength. There is thus no possibility of reusing wavelengths on the ring for different lightpaths. Therefore, the maximum capacity that can be allocated on the ring is limited to the capacity of a single link. When a link failure occurs, it is detected by the two optical nodes adjacent to the failure. Both nodes loop back the bundle of optical channels on the protection ring in the opposite direction (dashed lines in Fig. 1b).
To perform and manage efficient switching to the protection fiber, an automatic protection switching (APS)-like protocol [1] is required.

2.2. OMS shared protection ring (OMS SPRing)

In OMS SPRing, the total capacity of each fiber is divided into two wavebands (B1, B2) as shown in Fig. 2a. One waveband on each fiber (B1 clockwise and B2 counter-clockwise) is reserved to transport working channels, whereas the other is used to transport protection channels. In the OMS SPRing scheme, therefore, working and protection channels share each fiber. Working connections in one fiber are protected by the available capacity in the other fiber in the opposite direction of the ring. This way, no wavelength converters are needed when channels are moved from working to protection bands. Both directions of a bidirectional lightpath are routed along the same side of the ring in different fibers. The same wavelength can therefore be reused to accommodate a connection between other nodes whose routes do not overlap the existing connection (connections A–D and E–F in Fig. 2a).

When a link (or node) failure is detected at the OMS level, the nodes adjacent to the failure will loop back all lightpaths at once on the protection channels of the ring (Fig. 2b). Similar to the OMS DPRing scheme, an APS-like protocol is required to manage the switching actions and ensure the correct use of the shared protection capacity.

Although the implementation of OMS SPRing is more complex than that of OMS DPRing, it provides better bandwidth efficiency. For example, the maximum number of protected lightpaths that can be transported in an n-node ring with OMS DPRing is limited to the number of wavelengths available in each link (e.g., W). On the contrary, the maximum number of protected lightpaths that can be transported using OMS SPRing depends on the traffic pattern.
In our example, it ranges from W for hub-like traffic (one node sources all traffic) to a maximum of Wn/2 for the case where the nodes only send traffic to their adjacent nodes.

Fig. 1. An OMS DPRing transporting one lightpath (a) before and (b) after a failure in the link B–C.

Fig. 2. An OMS SPRing transporting two lightpaths (a) before and (b) after a failure in the link B–C.

3. Availability model for OMS protection schemes

When comparing different protection schemes in optical networks, a crucial aspect involves the lightpaths' availability. In this section, we define an availability model for the OMS DPRing and OMS SPRing schemes. The aim of this model is to justify that a protection scheme is strictly required.

Generally speaking, availability is the probability that a system will be found in the operating state at a random time in the future. Steady state availability can be expressed as [3]:

$$A = \frac{\text{UpTime}}{\text{UpTime} + \text{DownTime}} = \frac{MTTF}{MTTF + MTTR} \qquad (1)$$

where

• MTTF is the mean time to failure, the expected time to the next failure of the network component following completion of the repair. MTTF is usually expressed in hours or in FITs (number of failures in $10^9$ h).
• MTTR is the mean time to repair, the expected time needed to repair the network component.

The probabilistic complement of the availability A is unavailability (U), defined as

$$U = 1 - A \qquad (2)$$

For the purpose of availability analysis, let us consider the figures shown in Table 1 for the MTTF and MTTR [18,20]. In long-haul networks, the system components with the highest failure rate are the optical cables (Table 1). Therefore, the availability model can be accurately estimated when taking into account only link failures.

Table 1. MTTF and MTTR values
  Optical node failure rate          10,867 FITs
  Fiber-optic cable failure rate     311 FITs/km
  Plug-replacement equipment MTTR    2 h
  Fiber-optic cable MTTR             12 h

Let us denote $E_x$ and $\bar{E}_x$ as an event and a negate event, respectively, associated with a functional element or system x. In our study, $E_x$ ($\bar{E}_x$) implies that x is (not) operating at the time t independent of the past history of events. Thus, $P\{E_x\}$ represents the x availability ($A_x$) and $P\{\bar{E}_x\}$ its corresponding unavailability ($U_x$).

In an OMS DPRing, the lightpaths' availability is given by the union of two disjoint groups of events, namely: (1) all links i in the ring are available and (2) one link in the ring is unavailable but the rest of the links are available and can be used for ring protection. In OMS DPRing, lightpaths use resources in every link of the ring as shown in Fig. 1. Therefore, ring and lightpath availability are coincident, and this value is given by the following mathematical expression:

$$P\{E_{DPRing}\} = P\{E_{lightpath}\} = P\left\{ \left( \bigcap_{\forall i \in ring} E_i \right) \cup \left( \bigcup_{\forall i \in ring} \left( \bar{E}_i \cap \bigcap_{\substack{\forall j \neq i \\ i,j \in ring}} E_j \right) \right) \right\} \qquad (3)$$

Generally speaking, optical links may be supported by common cables or conduits, and thus links in the ring may fail dependently [16]. When long-haul core networks connecting main cities are deployed, however, the planning phase must address this issue by choosing fibers supported by a disjoint infrastructure. In the present study, therefore, we consider the links in the ring to be mutually failure-independent and can express availability in OMS DPRing as

$$A_{DPRing} = A_{lightpath} = \prod_{\forall i \in ring} A_i + \sum_{\forall i \in ring} \left( U_i \cdot \prod_{\substack{\forall j \neq i \\ i,j \in ring}} A_j \right) \qquad (4)$$

As an example, we calculate the availability for the lightpath A–D in the OMS DPRing network shown in Fig. 1a assuming that all links have the same length (300 km). Using the values given in Table 1, the availability will be:

$$A_{DPRing} = A_{link}^{6} + 6\,U_{link} A_{link}^{5} = 99.9981\% \qquad (5)$$

Availability figures close to 100% are difficult to compare. For this reason, we will use the unavailability figure. Applying (2), the lightpath A–D unavailability is

$$U_{DPRing} = 1.88 \times 10^{-5} \qquad (6)$$

This is equivalent to saying that the A–D lightpath will be unavailable, on average, for 9.86 min per year over the OMS DPRing.

We can also use (4) to calculate the lightpaths' availability in OMS SPRing, taking into account that the network is bidirectional and lightpaths will be routed strictly through the shortest route in this case. In OMS SPRing, therefore, each lightpath has a different availability depending on its route. According to this, we can express the lightpath availability over an OMS SPRing as

$$A^{lightpath}_{SPRing} = \prod_{\forall i \in lightpath} A_i + \sum_{\forall i \in lightpath} \left( U_i \cdot \prod_{\substack{\forall j \neq i \\ i \in lightpath,\ j \in ring}} A_j \right) \qquad (7)$$

In this case, the unavailability for the lightpath A–D in the OMS SPRing shown in Fig. 2a is

$$U^{A-D}_{SPRing} = 1 - \left( A_{link}^{3} + 3\,U_{link} A_{link}^{5} \right) = 1.50 \times 10^{-5} \qquad (8)$$

The A–D lightpath will be unavailable, on average, 7.89 min per year over this OMS SPRing.

On the basis of the previous results, we can say that OMS shared protection provides better lightpath availability than OMS dedicated protection. OMS shared protection makes it possible to find shortest routes for the lightpaths. This is opposite to the path protection case, where dedicated path protection provides better lightpath availability than shared path protection due to the fact that the protecting route is shared by several lightpaths. In fact, 'shared' in OMS protection refers to the fact that working and protection resources share one fiber; each optical working channel has been assigned a backup optical channel in the protection capacity.

Fig.
3 shows the unavailability of the longest possible lightpaths in an OMS SPRing (solid lines) and OMS DPRing (dashed lines) as a function of the number of nodes (n) in the ring for several average link lengths (L). The target lightpath availability in a network has to be chosen according to the distances in that network. In metropolitan networks, an availability of 0.99999 (five nines), i.e., 5.26 min/year of total outage, is sometimes taken as the objective. In long-haul core networks, however, a four nines availability objective (less than 53 min/year of total outage) is more appropriate. Note that the values of $U_{link}$ range from $10^{-4}$ to $10^{-3}$ for lengths ranging from 30 to 300 km, respectively, using the figures in Table 1. Thus, the graph in Fig. 3 also draws the unavailability objective of $10^{-4}$, which corresponds to a target availability of 0.9999.

From Fig. 3, we can conclude that the maximum overall length that allows us to meet the strict unavailability objective is roughly 3800 km for the OMS DPRing scheme and roughly 4500 km for the OMS SPRing scheme. The OMS SPRing scheme provides an improvement of about 25% in the expected lightpaths' unavailability over the OMS DPRing scheme. If no protection scheme is implemented and (1) is applied to a ring network whose length is 3800 km, an unavailability of $1.4 \times 10^{-2}$ would be found. This implies more than 5 days/year (or 20 min/day) of total outage, which clearly does not meet the required network availability. Therefore, a protection scheme is strictly required.

Finally, let us analyze the behavior of OMS schemes in a multiple failure scenario. In OMS DPRing, all lightpaths will become unavailable under a double-link failure since lightpaths have the same route through all links in the ring. This is different in OMS SPRing, since each lightpath may have a different route.
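The single-failure availability figures quoted in this section can be checked numerically. The following is a minimal sketch, not the authors' tooling: it assumes identical, mutually independent links, takes the cable failure rate and MTTR from Table 1, and uses hypothetical helper names (`link_availability`, `protected_availability`). With those assumptions, expression (7) collapses to a closed form that also covers (4) when the lightpath uses every link of the ring.

```python
HOURS_PER_FIT_UNIT = 1e9          # one FIT is one failure per 10^9 hours
CABLE_FITS_PER_KM = 311.0         # fiber-optic cable failure rate (Table 1)
CABLE_MTTR_H = 12.0               # fiber-optic cable MTTR (Table 1)
MINUTES_PER_YEAR = 365 * 24 * 60


def link_availability(length_km: float) -> float:
    """Steady-state availability of one link, Eq. (1), cable failures only."""
    mttf_h = HOURS_PER_FIT_UNIT / (CABLE_FITS_PER_KM * length_km)
    return mttf_h / (mttf_h + CABLE_MTTR_H)


def protected_availability(a_link: float, lightpath_links: int, ring_links: int) -> float:
    """Eq. (7) for identical independent links (an assumption of this sketch).

    With lightpath_links == ring_links this is Eq. (4), the OMS DPRing case.
    """
    u_link = 1.0 - a_link
    all_lightpath_links_up = a_link ** lightpath_links
    # One lightpath link is down and the remaining ring links are all up.
    one_link_down = lightpath_links * u_link * a_link ** (ring_links - 1)
    return all_lightpath_links_up + one_link_down


# Six-node ring with 300 km links; lightpath A-D spans three links in the SPRing.
a_link = link_availability(300.0)
u_dpring = 1.0 - protected_availability(a_link, lightpath_links=6, ring_links=6)
u_spring = 1.0 - protected_availability(a_link, lightpath_links=3, ring_links=6)

for name, u in (("OMS DPRing", u_dpring), ("OMS SPRing", u_spring)):
    print(f"{name}: U = {u:.3e}, outage = {u * MINUTES_PER_YEAR:.2f} min/year")
```

Under these assumptions the two results land within about 1% of the 1.88E-5 and 1.50E-5 values reported above; the small differences presumably come from rounding in the original calculation.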
A lightpath will remain working under a double-link failure only if neither failed link supports it; in all other cases, the lightpath will become unavailable. As an example, let us consider the OMS SPRing in Fig. 2a, where links A–F and F–E fail simultaneously. In this case, lightpath F–E will become unavailable whereas lightpath A–D will remain working. Eqs. (4) and (7) can be used to calculate the expected lightpath availability under any arbitrary number of failures.

4. GMPLS-controlled OMS protection

In this section, we present the GAPS mechanism. This mechanism is based on extensions of the LMP protocol that can be used for fault management purposes in OMS protection. We first introduce the mechanism to control the OMS DPRing protection scheme, and then we extend the GAPS mechanism to control the OMS SPRing scheme. Protection time models for GAPS-controlled OMS protected rings are also defined.

4.1. The GAPS mechanism

As stated above, the OMS DPRing configuration consists of two counter-rotating rings. In the normal state (Fig. 4a), the working links in the transport plane carry regular traffic. When a network component fails, a switch event occurs and the working link is protected using backup links. Let us assume that OMS DPRings are remotely controlled by a GMPLS control plane that can be transported out-of-band in-fiber or out-of-fiber [5]. For the sake of simplicity, we assume that the topology of both the control and transport planes is the same (Fig. 4a), although the GAPS mechanism would work for any control plane topology independent of the transport plane.

Fig. 3. Lightpaths unavailability in OMS DPRing and in OMS SPRing (curves for L = 100, 200 and 300 km and the limit U lim, plotted against the number of nodes in the ring, n).

Fig. 4.
OMS DPRing controlled by the GAPS mechanism.

A link failure results in a loss of light (LoL) detection. After the detection, the failure must be corrected by its adjacent transport nodes. These nodes, called switching nodes, use the bridge and switch actions for the protection of the working link (Fig. 4b). Specifically, when an optical node detects a LoL, it notifies its corresponding OCC in the GMPLS-based control plane of the failure; this OCC becomes the head end. It conveys the failure detection to the OCC (tail end) corresponding to the other adjacent optical node, which executes a bridge. GAPS messages include the information depicted in Table 2.

To illustrate how the GAPS mechanism works, Fig. 5 shows the recovery from a link failure. The initial state of the ring is the normal state. In this state (T0), all OCCs in the ring have exchanged normal state messages 1–4 with their neighbors. At time T1, node A detects a LoL on its working link and notifies its OCC. When an OCC receives notification of failure detection, it sends a switching request (i.e., GAPS messages) to the OCC of the adjacent node over the control network on both the short and long paths. The short path connects the head and tail OCCs directly, whereas the long path connects them through intermediate OCCs using the opposite side of the ring. Node A then becomes a switching node, and its OCC becomes the head end. The head end OCC sends a bridge request. All intermediate OCCs on the long path enter the full pass-through state. OCC D, upon receiving the bridge request from OCC A on the short path, transmits a LoL ring bridge. OCC D, upon receiving the bridge request from OCC A on the long path, executes a bridge and updates its status. OCC A, upon receiving the ACK from OCC D on the long path, executes a ring switch and updates its status. Signaling then reaches the steady state. At time T2, the LoL clears.
Node A notifies its OCC of this, and OCC A enters the wait-to-restore (WTR) state and advertises its new state to OCC D. Upon receiving the WTR bridge request on the short path, OCC D sends out a message with the WTR code. At time T3, the WTR interval expires. OCC A sends out a no request message. OCC D, upon receiving the no request from OCC A on the long path, drops its bridge and generates the Idle code. OCC A, upon receiving the Idle code on the long path, drops its switch and also generates the Idle code. All OCCs then return to the normal state.

Since the protection channels are shared among all links, contention among the nodes may arise when multiple simultaneous failures occur. In these cases, the request with the lowest head node identifier (ID) has priority. This mechanism is useful in OMS SPRing, where some lightpaths can continue working in a double-failure scenario.

If in-fiber signaling is used, the GAPS mechanism is able to efficiently manage node failures. When a LoL is detected in this case, the detecting OCC includes the identifier of the destination node in the GAPS message. If the GAPS message reaches the other end of the failure (i.e., the destination OCC), it will find itself as the destination OCC. If this occurs, the failure was indeed a link failure as assumed. On the contrary, a node failure will be assumed if the message reaches an OCC adjacent to the destination and the destination OCC is unreachable. In the latter, the OCC adjacent to the failure node will act as the destination and assume its protecting role.

Table 2. Information transported by GAPS messages
  Source/destination node ID: Identifies the origin/destination nodes for this GAPS message. Depending on the semantics for the message type, origin and destination nodes represent head and tail nodes or vice versa.
  Request type: Indicates the type of request. A request can be a condition (LoL), a state (normal) or an external request (not covered in this paper).
  Path: Indicates whether the message is being sent over the short or the long path.
  Status: Indicates the status of the protection switch.

Fig. 5. Failures management: GAPS messages.
  Legend: NR: No Request; LOL: Loss of Light; WTR: Wait To Restore; S: Short path; L: Long path; RDI: Remote Defect Indication; Br: Bridged; Sw: Switched.
  Messages as labeled in the figure: 1a NR/D A/S/IDLE; 1b NR/B A/S/IDLE; 2a NR/A B/S/IDLE; 2b NR/C B/S/IDLE; 3a NR/B C/S/IDLE; 3b NR/D C/S/IDLE; 4a NR/C D/S/IDLE; 4b NR/A D/S/IDLE; 5a LOL/D A/S/RDI; 5b LOL/D A/L/IDLE; 6a LOL/A D/S/IDLE; 6b LOL/A D/L/IDLE; 7a LOL/A D/S/Br; 7b LOL/A D/L/Br; 8a LOL/D A/S/RDI; 8b LOL/D A/L/Sw; 9a WTR/D A/S/Sw; 9b WTR/D A/L/Sw; 10a NR/D A/S/Sw; 10b NR/D A/L/Sw.

A simplified finite state machine for the GAPS mechanism is illustrated in Fig. 6. When in-fiber signaling is used, messages that head and tail end OCCs would exchange through the short path are never received. The transition from the normal state to the ring bridge destination state can be completed either with an intermediate transition upon reception of the request message through the short path or directly upon reception of the request through the long path. Receiving the request message through the short path permits acceleration of the switching process by preparing the optical node.
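The message fields of Table 2, the contention rule, and the link-versus-node failure decision described above can be sketched in a few lines. This is only an illustration under stated assumptions: the class and function names are hypothetical, the enumeration values reuse the abbreviations from the Fig. 5 legend, and reachability of the destination OCC is reduced to a boolean supplied by the caller. It does not reproduce the wire format of the GAPS object.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class RequestType(Enum):
    NO_REQUEST = "NR"
    LOSS_OF_LIGHT = "LOL"
    WAIT_TO_RESTORE = "WTR"


class Path(Enum):
    SHORT = "S"
    LONG = "L"


class Status(Enum):
    IDLE = "IDLE"
    BRIDGED = "Br"
    SWITCHED = "Sw"
    REMOTE_DEFECT_INDICATION = "RDI"


@dataclass(frozen=True)
class GapsMessage:
    """Fields listed in Table 2 (field names are assumptions of this sketch)."""
    source: str
    destination: str
    request: RequestType
    path: Path
    status: Status


def winning_head(head_ids: Iterable[str]) -> str:
    """Contention rule: the request with the lowest head node ID has priority."""
    return min(head_ids)


def classify_failure(message: GapsMessage, receiving_occ: str,
                     destination_reachable: bool = True) -> Optional[str]:
    """Return 'link', 'node', or None if this OCC cannot decide yet."""
    if receiving_occ == message.destination:
        return "link"   # the far end of the failed link received the request
    if not destination_reachable:
        return "node"   # the neighbor of an unreachable destination takes over
    return None         # intermediate OCC: keep passing the message through


bridge_request = GapsMessage("A", "D", RequestType.LOSS_OF_LIGHT, Path.LONG, Status.IDLE)
print(classify_failure(bridge_request, "D"))
print(classify_failure(bridge_request, "C", destination_reachable=False))
```

Keeping the message immutable (`frozen=True`) is a design choice of the sketch: a pass-through OCC forwards a request unchanged, so nothing downstream needs to mutate it.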
The same can be done in the intermediate nodes upon reception of a bridge request directed to the tail end.

Fig. 6. GAPS mechanism: finite state machine.

When considering bidirectional rings (OMS SPRing protection), two GAPS entities are needed so that one entity is available for each direction. Under normal conditions, the working wavebands in the transport plane are used to carry regular traffic. When a network component fails, a switch event occurs and the working wavebands are protected using the protection wavebands. A bidirectional link failure implies LoL detection in the adjacent optical nodes, which notify their OCCs in the GMPLS-based control plane of the failure (Fig. 7). The adjacent OCCs exchange bridge requests.

Fig. 7. A failure in a bidirectional link is detected by its adjacent nodes.

4.2. GAPS LMP extensions definition

We define GAPS as an LMP extension running in the GMPLS-based control plane of OMS protected networks. In this way, we avoid the implementation of a new control protocol that would increase the signaling overhead. Specifically, GAPS relies on control channel management functionalities provided by the LMP protocol. Once a control channel is activated between two adjacent nodes, the LMP Hello messages exchanged can be used to maintain control channel connectivity between the nodes. In order to run the GAPS mechanism, however, the definition of a novel LMP message is required:

    <GAPS Message> ::= <Common Header> <GAPS>

This message is used to transmit GAPS information when the LMP adjacency is part of an OMS protected ring. The GAPS message (Table 3) contains all the information needed by the GAPS protocol. From a functional point of view, the GAPS agent is located on top of two LMP agents (LMP east and LMP west). This way, GAPS messages can be sent through either the east or west control channel.

Table 3. GAPS Object Format

4.3. Protection time models for GAPS-controlled OMS protected rings

In this subsection, we present models for calculating the protection time for OMS DPRing and OMS SPRing running with the GAPS mechanism. Our aim is to determine the switching time requirements to be imposed on the optical nodes necessary to meet the protection time target.

Let us define the protection time ($T_{DPRing}$) in an OMS DPRing as the interval from the decision to switch to the completion of the switching operation at the node initiating the bridge request. It includes, therefore, the notification from the initiating optical node to its OCC ($T_{config}$), the propagation delay in each control network link ($T_{link}$), the processing time in each OCC ($T_{control}$), the time to configure each optical node in the ring ($T_{config}$) to perform the switching action, and the time to switch itself ($T_{switch}$). Thus, $T_{DPRing}$ can be expressed as

$$T_{DPRing} = 2T_{config} + T_{switch} + (2n - 1)\,T_{control} + 2(n - 1)\,T_{link} \qquad (9)$$

$T_{switch}$ is predefined by the switching device. We use a switch with a response time below 1 ms, which is in line with the devices currently available on the market. $T_{link}$ depends on the ring link lengths (L) and the signal speed through the fiber. Note that $T_{link}$ is negligible for metropolitan ring networks.

The protection time model for GAPS controlling a bidirectional ring is different than that defined in (9) for OMS DPRing. In fact, GAPS has been extended in the case of OMS SPRing to coordinate both protection actions (one for each direction) to be performed by the nodes adjacent to the failure. We assume that configuration actions are executed in the optical node in a serial manner. Let us define the protection time ($T_{SPRing}$) in an OMS SPRing as the interval from the decision to switch to the completion of the switch operation at the node initiating the bridge request. In this case, $T_{SPRing}$ can be expressed as

$$T_{SPRing} = 2T_{config} + T_{switch} + n\,T_{control} + (n - 1)\,T_{link} + \mathrm{Max}\big(T_{config},\ (n - 1)\,T_{control} + (n - 1)\,T_{link}\big) \qquad (10)$$

The term Max(a, b) expresses the idea of configuration actions that are performed in a serial manner in the optical node. Note that (10) gives the same values as (9) unless the time to configure the optical node is higher than the time to transport the GAPS message around the ring. This will happen if the number of nodes in the ring is low or the distances between ring nodes are short. In such cases, the protection time in OMS SPRing will be lower than the objective even though it is higher than that of the OMS DPRing.

Fig. 8. Protection time for OMS DPRings (theoretical protection time in ms versus the number of nodes in the ring, n, for L = 100, 200 and 300 km, together with the objective).

Fig. 9. Protection time for OMS SPRings (theoretical protection time in ms versus the number of nodes in the ring, n, for L = 100, 200 and 300 km, together with the objective).

Figs. 8 and 9 show the theoretical protection time for OMS DPRing and OMS SPRing, respectively, as functions of the number of nodes (n) in the ring for several link lengths. They show the scalability of GAPS when the number of nodes in the ring is increased. In this analysis, we assume $T_{config}$ to be less than 5 ms and $T_{control}$ to be about 0.2 ms.
From the GAPS mechanism, we can conclude that the typical target protection time (i.e. 50 ms) is reached even when rings are composed of a large number of nodes. However, it implies strict requirements for the hardware of the optical nodes (e.g., T control and T config ). Besides the design of the GAPS mechanism, we have designed two new optical nodes (one for OMS DPRings and another for OMS SPRings) capable of satisfying the requirements derived in the last section. Optical nodes are based on two wavelength selective switches (WSS). One of these is used for adding and the other for dropping the local traffic [21]. In the OMS DPRing node (Fig. 10a), two optical power meters (labeled with M in Fig. 10) measure the incoming optical power at the east and west ports. Two 2  2 optical switches have been added to the WSS components to allow OMS protection. The two pairs of optical Mux/demux are responsible for coupling the WDM-multiplexed bundle with the in-fiber optical supervisory channel (OSC), which transports the control channel. Two 1300 nm optical transponders ðkOSC Þ are used to convert electrical fast Ethernet signals to the optical domain. Additionally, a node controller (not shown in Fig. 10) is needed to manage the extra resources. The optical power meters monitor the incoming optical power levels at the west and east inputs and notify the node controller upon receiving out-of-bounds levels. Upon receiving this information, the node controller will send a LoL notification to the OCC. Note that if the link is not affected by a failure, optical power must always be received at each end of the link. When considering out-of-fiber signaling, the OSC does not transport the control channel. Nevertheless, the associated optical hardware cannot be eliminated because an optical pilot signal is still necessary to detect the repair of any failure affecting the link. The OMS SPRing optical node is also based on WSS as shown in Fig. 10b. 
We use WSS components with a capacity of 40 channels in the C-band. Although the optical node is defined as bidirectional, it is important to highlight that, as in the OMS DPRing optical node, only two WSS are used; this keeps the cost and complexity of the node low. We have defined waveband B1 as channels 1–20 and B2 as channels 21–40. Under normal conditions, B1 received from east and B2 received from west are used to transport the traffic, and B2 east and B1 west are used for protection. Band splitter (BS) components are used to divide the WDM bundle into two bands, and splitters (S) are used to separate optical signals or join both bands. To implement OMS SPRing protection, four 2 × 2 optical switches decide which bands (B1, B2) are used to transport the traffic and which bands are used for protection. The resulting cost and complexity of the OMS SPRing optical node are not much higher than those of the OMS DPRing optical node; in fact, only two 2 × 2 optical switches and a set of passive components (splitters and band splitters) are added.

Fig. 10. Optical nodes design to support OMS protection. (a) OMS DPRing scheme (b) OMS SPRing scheme.

The functionality depicted in Fig. 10 has been separated into several building blocks. Each block has been implemented as a separate card. Therefore, in addition to the passive components (mux/demux, splitters, and band splitters), two additional cards equipped with active components, the transponder card and the optical switching and monitoring (OSNL) card, have been added to the WSS components. The OSNL card includes the monitoring and switching devices. The switching device has a response time below 1 ms and insertion losses lower than 0.9 dB.
The monitoring device extracts a small part of the incoming optical power, transforms the sample into a digital value by means of an A/D converter, and stores the converted value in a register. The monitoring sweep time is 10 µs.

Each active card in the optical node is equipped with an ARM7 32-bit RISC processor [22] running at 100 MHz. The card processor controls the different components of its card and manages the communication with the node controller, which is implemented in a separate card (the so-called 'Master card'). The Master card communicates through an internal serial bus with the rest of the optical node cards and through a Fast Ethernet interface with the control and management planes. The Master card is based on the UNC90 microcontroller module [23], which is equipped with an ARM9 32-bit RISC processor [22] running at 180 MHz. In addition to the RISC processor, the UNC90 module includes, among other elements, 32 MByte of SDRAM and 32 MByte of Flash memory. The internal architecture of the optical node is shown in Fig. 11.

The OSNL card processor includes an interrupt-driven system to allow the CPU to continue processing instructions when a request from the Master card arrives. In the meantime, the card processor executes a polling loop to continuously read samples from the monitoring register. If the values of the samples read within 1 ms are considered to be out-of-bounds, the card processor declares a LoL condition. This condition has to be communicated to the Master card by sending a proprietary message through the serial bus.

The Master processor runs a Linux operating system. An application (the node agent) has been developed to manage the entire optical node by providing an interface between the cards and the control and management planes. The agent listens for incoming data from serial and TCP/UDP ports.
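The LoL declaration performed by the OSNL card polling loop can be illustrated with a short sketch. It assumes that a LoL is declared only when every sample in a full 1 ms window is out of bounds, which at a 10 µs sweep time means 100 consecutive samples. The power bounds and the function name `detect_lol` are hypothetical and chosen only for illustration; the real card firmware is not described at this level of detail.

```python
SWEEP_TIME_US = 10    # monitoring sweep time stated in the text
LOL_WINDOW_US = 1000  # 1 ms observation window stated in the text
SAMPLES_PER_WINDOW = LOL_WINDOW_US // SWEEP_TIME_US  # 100 samples


def detect_lol(samples_dbm, low_dbm=-30.0, high_dbm=0.0):
    """Return the index of the sample at which LoL is declared, or None.

    Assumption: LoL is declared once SAMPLES_PER_WINDOW consecutive
    samples fall outside [low_dbm, high_dbm]. The bounds are illustrative.
    """
    consecutive = 0
    for index, power in enumerate(samples_dbm):
        if power < low_dbm or power > high_dbm:
            consecutive += 1
            if consecutive >= SAMPLES_PER_WINDOW:
                return index
        else:
            # An in-bounds sample restarts the observation window.
            consecutive = 0
    return None
```

With these assumptions, a short power glitch of fewer than 100 samples does not trigger a LoL, whereas a sustained drop is declared 1 ms after its first out-of-bounds sample.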
When a message indicating a LoL condition is received through the serial port, the agent sends a simple network management protocol (SNMP) trap that brings the related information to the OCC in the control plane. The agent on the Master card accepts request-response commands using an XML-based proprietary protocol. When a message is received through a TCP/UDP port, the agent decodes it and initiates the appropriate communication with another card in the optical node through the serial bus.

As defined in the previous section, $T_{\mathrm{config}}$ is the configuration time of the optical node (i.e., the time to process a request from the OCC or inform the OCC of any event). In order to achieve a recovery time shorter than 50 ms after fault detection, we specified 5 ms as the maximum value for $T_{\mathrm{config}}$ in the previous section.

During optimization of the system, some bottlenecks were detected and corrected. One of the more important ones relates to the TTY device driver architecture in the Linux kernel [24]. Linux considers serial ports to be high latency devices. When data is received, therefore, the TTY device driver schedules itself to push the data to the user application at some later point in the near future. This behavior introduces an unacceptable delay in the system. To avoid this high latency in the serial transmission, the Linux kernel was modified to define the serial driver as a low latency driver that immediately pushes the data to the user application (the node agent).

Fig. 11. Optical node internal architecture.

Fig. 12 shows the physical layout of the resulting OSNL, Master, and Transponder cards. It represents the frontal view of the optical node and the test-bed where the complete architecture has been tested.

Fig. 12. Physical layout and testbed.

6. Experimental results

The performance of the GAPS mechanism with the ROADM nodes discussed in the previous sections has been experimentally evaluated using the ASON/GMPLS CARISMA network test-bed [14]. The CARISMA GMPLS control plane uses the RSVP-TE protocol for signaling, the OSPF-TE protocol for routing, and the LMP protocol for control channel management and link property correlation. The OCCs have been implemented using Linux-based routers. Each pair of OCCs communicates through a single IP control channel implemented with full duplex Fast Ethernet links. Finally, each OCC also has a connection controller interface (CCI) for communicating with the optical nodes.

Fig. 13a shows the switching time measured when the protection decision is made by the OCC at the control plane upon reception of a LoL notification from the optical node ($2T_{\mathrm{config}} + T_{\mathrm{switch}}$). Thus, it does not include any interchange of GAPS messages.

Fig. 13b shows the experimental results for the protection time as a function of the number of nodes for OMS DPRing and OMS SPRing. Because of the testing environment, $T_{\mathrm{link}}$ is negligible. In order to provide enough accuracy, the figures reflect the average values from 10 experiments. From these results and applying (9) for OMS DPRing and (10) for OMS SPRing, we found that $T_{\mathrm{control}}$ is less than 0.1 ms and $T_{\mathrm{config}}$ is less than 4.5 ms. These results are better than those specified in previous sections. Thus, the obtained behavior will also be better than that shown in Figs. 8 and 9 for OMS DPRing and OMS SPRing, respectively.

Note that the experimental protection times in Fig. 13b do not include any propagation time (the $T_{\mathrm{link}}$ terms in (9) and (10)). Therefore, these experimental times must be incremented with the specific propagation time. For example, in a 12 node OMS DPRing with links of 300 km, the term $2(n - 1)\,T_{\mathrm{link}}$ represents an additional delay of 33 ms.
The protection time in this case would be 11.90 + 33 = 44.90 ms. This is better than the 48.6 ms value obtained with the parameters specified in previous sections. Therefore, it is possible to deploy rings with 20 nodes and a total length of 2000 km, or 16 nodes and 3200 km, with a recovery time below 50 ms.

As an example of the results in Fig. 13, Fig. 14a and b show the experimental $T_{\mathrm{DPRing}}$ and $T_{\mathrm{SPRing}}$, respectively, for the worst scenario considered, with 18 nodes. When the link failure is repaired, optical power is detected again by the adjacent optical nodes. At this moment, the WTR period starts. After the WTR time, the protection is reverted and the signal is switched from the protection to the working links.

Fig. 13. Experimental results (a) experimental $2T_{\mathrm{config}} + T_{\mathrm{switch}}$ value (9.89 ms; optical power in dBm versus time in ms) (b) evolution of protection time (ms) with the number of nodes in the ring (n) for DPRing and SPRing.

Fig. 14. Experimental results for rings with 18 nodes (a) OMS DPRing protection time (12.85 ms) (b) OMS SPRing protection time (15.71 ms). (Optical power in dBm versus time in ms.)

These results show that our GAPS-based solutions scale linearly with both the number of nodes in the ring and link length. Thus, we can conclude that GAPS in conjunction with the designed ROADMs provides OMS protection under 50 ms in rings with high numbers of nodes.

Finally, a comparison of both solutions is shown in Table 4. Although the cost and complexity of the OMS DPRing solution are lower than those of the OMS SPRing, the increment in the ROADM cost due to the OMS support is very low in both cases. OMS SPRing is more bandwidth efficient and provides better availability than OMS DPRing.
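The arithmetic used in this section can be reproduced with a small sketch of the protection-time models (9) and (10). Two values are assumptions introduced here rather than figures stated in the paper: a signal speed of about 200 km per ms in the fiber (which gives the 1.5 ms per 300 km link implied by the 33 ms example), and $T_{\mathrm{switch}} = 1$ ms as the upper bound of the switch response time. The function names are hypothetical.

```python
# All times are in milliseconds, all lengths in kilometers.
FIBER_KM_PER_MS = 200.0  # assumption: about 2e5 km/s signal speed in fiber


def t_link(link_km):
    """Propagation delay of one control network link."""
    return link_km / FIBER_KM_PER_MS


def t_dpring(n, link_km, t_config, t_switch, t_control):
    """Protection time of an OMS DPRing, equation (9)."""
    return (2 * t_config + t_switch
            + (2 * n - 1) * t_control
            + 2 * (n - 1) * t_link(link_km))


def t_spring(n, link_km, t_config, t_switch, t_control):
    """Protection time of an OMS SPRing, equation (10)."""
    around_ring = (n - 1) * t_control + (n - 1) * t_link(link_km)
    return (2 * t_config + t_switch
            + n * t_control + (n - 1) * t_link(link_km)
            + max(t_config, around_ring))


if __name__ == "__main__":
    # Design values: T_config = 5 ms, T_control = 0.2 ms, T_switch = 1 ms (assumed).
    design = dict(t_config=5.0, t_switch=1.0, t_control=0.2)
    print(round(t_dpring(12, 300, **design), 2))  # 12 nodes, 300 km links
    print(round(t_spring(12, 300, **design), 2))
    print(round(t_spring(4, 0, **design), 2))     # short ring, T_link neglected
```

With the design values, the 12-node DPRing with 300 km links evaluates to the 48.6 ms quoted in the text. The two models coincide whenever the time to carry the GAPS message around the ring exceeds $T_{\mathrm{config}}$, and the SPRing value is larger otherwise.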
Table 4. Comparison of OMS solutions

| | OMS DPRing | OMS SPRing |
|---|---|---|
| Transported traffic (lightpaths) | W (num. wavelengths/link). Independent of the number of nodes (n) in the ring | From W to Wn/2, depending on the traffic pattern |
| Availability | High | Highest |
| Protection time | Fastest (<50 ms) | Fast (<50 ms) |
| Cost | Lowest | Low |

7. Conclusions

In this paper, we have presented two OMS protection solutions (DPRing and SPRing) for GMPLS-controlled optical ring networks based on a novel GAPS mechanism. OMS protection makes possible the recovery of all optical channels in a fiber with just one protection action. Since the overall protection time increases linearly with the number of nodes, the scalability of the GAPS mechanism has been demonstrated. From the obtained results, we conclude that a ring-based network using the designed ROADM nodes and controlled by the GAPS protocol will provide survivability with an SDH-like service recovery time (<50 ms) even in large optical rings.

A pay-as-you-grow strategy can be implemented using both schemes. OMS DPRing can be used in networks where the expected traffic demand is lower than the number of wavelengths available in each link. If the traffic grows, the migration from OMS DPRing to OMS SPRing consists of adding one OSNL card and one card for the passive components (splitters and band splitters) to every optical node in the ring.

Acknowledgements

This work has been partially funded by the i2Cat Foundation through the TRILOGY project and by the Spanish Science Ministry through the TEC-2005-08051-C03-02 RINGING project.

References

[1] ITU-T Rec. G.841, Types and characteristics of SDH network protection architectures, 1998.
[2] G. Iannaccone, C. Chuah, R. Mortier, S. Bhattacharyya, C. Diot, Analysis of link failures in an IP backbone, in: Proceedings of ACM SIGCOMM IMW'02, Marseille, France, November 2002.
[3] W.D. Grover, Mesh-Based Survivable Networks, Prentice Hall PTR, New Jersey, 2004.
[4] ITU-T Rec. G.8080/Y.1304, Architecture for the Automatically Switched Optical Networks, 2001 and Am. 1, 2003.
[5] E. Mannie, Generalized multi-protocol label switching (GMPLS) architecture, RFC 3945, 2004.
[6] L. Berger, Generalized multi-protocol label switching (GMPLS) signaling resource reservation protocol-traffic engineering (RSVP-TE) extensions, RFC 3473, 2003.
[7] D. Katz, K. Kompella, D. Yeung, Traffic engineering (TE) extensions to OSPF Version 2, RFC 3630, 2003.
[8] J. Lang, Link management protocol (LMP), RFC 4204, 2005.
[9] J.P. Lang et al., RSVP-TE extensions in support of end-to-end generalized multi-protocol label switching (GMPLS) recovery, RFC 4872, 2007.
[10] L. Berger et al., GMPLS segment recovery, RFC 4873, 2007.
[11] P. Arijs et al., Design of ring and mesh based WDM transport networks, Optical Networks Magazine 3 (2000) 25–40.
[12] M.-J. Li et al., Transparent optical protection ring architectures and applications, IEEE Journal of Lightwave Technology 23 (10) (2005) 3388–3403.
[13] R. Muñoz et al., Experimental GMPLS fault management for OULSR transport networks, in: Optical Fiber Communications, OFC/NFOEC 3, 2005, paper JWA50.
[14] J. Perelló, E. Escalona, S. Spadaro, J. Comellas, G. Junyent, Resource discovery in ASON/GMPLS transport networks, IEEE Communications Magazine 45 (10) (2007) 86–92.
[15] A. Fumagalli, M. Tacca, Differentiated reliability (DiR) in wavelength division multiplexing rings, IEEE/ACM Transactions on Networking 14 (1) (2006) 159–168.
[16] L. Guo, L. Lemin, A novel survivable routing algorithm with partial shared-risk link groups (SRLG)-disjoint protection based on differentiated reliability constraints in WDM optical mesh networks, IEEE Journal of Lightwave Technology 25 (6) (2007) 1410–1415.
[17] X. Cao, V. Anand, C. Qiao, Framework for waveband switching in multigranular optical networks part I-multigranular cross-connect architectures, Journal of Optical Networking 5 (12) (2006) 1043–1055.
[18] M. To, P. Neusy, Unavailability analysis of long-haul networks, IEEE Journal on Selected Areas in Communication 12 (1994) 100–109.
[19] J.-P. Vasseur, M. Pickavet, P. Demeester, Network Recovery – Protection and Restoration of Optical, SONET-SDH, IP and MPLS, Elsevier, San Francisco, 2004.
[20] S. Verbrugge et al., General availability model for multilayer transport networks, in: Proceedings of DRCN, 2005, pp. 85–92.
[21] S. Sygletos, A. Tzanakaki, I. Tomkos, Numerical study of cascadability performance of continuous spectrum wavelength blocker/selective switch at 10/40/160 Gb/s, IEEE Photonics Technology Letters 18 (24) (2006) 2608–2610.
[22] ARM: <http://www.arm.com>.
[23] DIGI UNC90 - Datasheet: <http://www.digi.com/pdf/hwref_cc9u.pdf>.
[24] J. Corbet, A. Rubini, G. Kroah-Hartman, Linux Device Drivers, third ed., O'Reilly Media, Sebastopol, 2005.

Luis Velasco (luis.velasco@tsc.upc.edu) received the M.Sc. degree in Telecommunications Engineering from Universidad Politécnica de Madrid (UPM) in 1989. In the same year, he joined Telefónica de España and was involved in the specifications and first office application of Telefónica's SDH transport network. In 2003 he joined Universitat Politècnica de Catalunya (UPC), where he is currently an assistant professor. He is currently working towards the Ph.D. degree at the Optical Communications group of UPC. His research interests include signaling, routing and resilience mechanisms architectures in ASON/GMPLS-based networks.

Salvatore Spadaro (spadaro@tsc.upc.edu) received the M.Sc. and the Ph.D. degrees in Telecommunications Engineering from UPC (Barcelona, Spain) in 2000 and 2005, respectively. He also received the M.Sc. degree in Electrical Engineering from Politecnico di Torino, Italy, in 2000. He is currently an associate professor in the Optical Communications group of the Signal Theory and Communications Department of UPC. He has been involved in international and national research projects.
He has co-authored about 60 papers in international journals and conferences. His research interests are in the fields of all-optical networks with emphasis on traffic engineering and resilience.

Jaume Comellas (comellas@tsc.upc.edu) received M.S. (1993) and Ph.D. (1999) degrees in Telecommunications Engineering from UPC. Since 1992 he has been a staff member of the Optical Communications Research Group of UPC. His current research interests mainly concern optical transmission and IP over WDM networking topics. He has participated in different research projects funded by the Spanish government and the European Commission. He has co-authored more than 100 research articles in national and international journals and conferences. He is an associate professor at the Signal Theory and Communications Department of UPC.

Gabriel Junyent (junyent@tsc.upc.edu) is a telecommunications engineer (Universidad Politécnica de Madrid, UPM, 1973), and holds a Ph.D. degree in communications (UPC, 1979). He has been a teaching assistant (UPC, 1973–1977), adjunct professor (UPC, 1977–1983), associate professor (UPC, 1983–1985), and professor (UPC, 1985–1989), and has been a full professor since 1989. In the last 15 years he has participated in more than 30 national and international R&D projects, and has published more than 30 journal papers and book chapters and 100 conference papers.