Computer Networks 52 (2008) 1975–1987
Contents lists available at ScienceDirect
Computer Networks
journal homepage: www.elsevier.com/locate/comnet
Introducing OMS protection in GMPLS-based optical ring networks
Luis Velasco *, Salvatore Spadaro, Jaume Comellas, Gabriel Junyent
Optical Communications Group, Universitat Politècnica de Catalunya (UPC), C/Jordi Girona, 1-3 D6-107, 08034 Barcelona, Spain
Article info
Article history:
Received 1 June 2007
Received in revised form 12 February 2008
Accepted 13 February 2008
Available online 21 March 2008
Keywords:
ASON/GMPLS
ROADM
LMP
OMS dedicated protection
OMS shared protection
Abstract
Legacy ring-based networks have been deployed in conjunction with SONET/SDH technology to provide survivability to transport networks, and they achieve service recovery
within 50 ms after fault detection. Current generalized multiprotocol label switching
(GMPLS)-controlled optical transport networks need efficient resilience mechanisms to
allow recovery times equivalent to those granted by SONET/SDH networks. In this paper,
dedicated and shared optical multiplex section (OMS) protection systems are proposed.
Both solutions consist of a mechanism based on extensions of the GMPLS link management
protocol (LMP) to properly manage the protection actions, and both utilize a new reconfigurable optical add/drop multiplexer (ROADM) design to support the protection schemes.
The performance of both solutions has been experimentally evaluated.
© 2008 Elsevier B.V. All rights reserved.
1. Introduction
Legacy SONET/SDH ring-based networks are well-known
for their inherently fast protection switching capability,
which allows service recovery within 50 ms after fault
detection [1]. An interruption of 50 ms or less in a transmission signal is perceived by higher layers as a transmission error. It may cause a packet retransmission handled by TCP/IP
at the IP layer, but no TCP sessions will be affected at all. In
VoIP applications, users do not perceive 100 ms outages
[2]. A complete discussion about the 50 ms figure can be
found in [3].
The introduction of OADMs into transport networks allows them to be configured in ring-based topologies similar to traditional SONET/SDH networks. An OADM allows
the dropping of a specific wavelength out of the bundle
of dense wavelength division multiplexing (DWDM)-multiplexed signals and the addition of another channel on
the same wavelength. OADMs are currently evolving into
reconfigurable (and remotely controlled) devices, paving
the way for future flexible optical networks. In fact, dynamic optical rings using ROADMs play a crucial role in
the migration to the ASON/GMPLS paradigm [4,5]. An automatically switched optical network (ASON) [4] is an optical
transport network that has dynamic connection set up/tear
down capability. This functionality is accomplished by
means of a control plane that carries out, among other
things, routing and signaling functions. GMPLS is a technology that provides enhancements to MPLS to support
switching capabilities not only at the packet level but also
at the time slot, wavelength, or even fiber levels [5]. GMPLS
provides a suitable control plane for dynamic optical networks. It includes the traffic engineering (TE) extensions
of RSVP-TE [6] for signaling and the intra-domain link-state OSPF-TE [7] for routing.
* Corresponding author. Tel.: +34 93 401 69 99.
E-mail addresses: luis.velasco@tsc.upc.edu (L. Velasco), spadaro@tsc.upc.edu (S. Spadaro), comellas@tsc.upc.edu (J. Comellas), junyent@tsc.upc.edu (G. Junyent).
1389-1286/$ - see front matter © 2008 Elsevier B.V. All rights reserved.
doi:10.1016/j.comnet.2008.02.022
The use of DWDM technology implies a very large number of parallel links (i.e., hundreds or even thousands of
wavelengths if multiple fibers are used) between two adjacent nodes. The manual configuration and control of such a
huge amount of resources becomes impractical. The link
management protocol (LMP) [8] has been specified to resolve this issue. The LMP specification defines two core procedures, namely the control channel management and the
link property correlation [8]. The former procedure covers,
among other functionalities, the maintenance of an IP
control channel between each pair of neighboring LMP
nodes. It monitors the periodic exchange of Hello messages
between neighbor optical node connection controllers
(OCC) to confirm that the control channel is operational.
The latter procedure can be used for fault management in
cases of failure.
Currently, the GMPLS recovery framework covers only
optical channel (OCh) resilience [9,10]. However, optical
multiplex section (OMS) protection schemes allow recovery
of the complete bundle of DWDM channels in a fiber with
just one protection action. This paper focuses on OMS protection for GMPLS-controlled optical ring networks.
Previous work in the literature regarding protection at
the OMS level must be taken into consideration. In [11],
different architectures for resilient ring and mesh based
optical networks are described and compared. In [12], different optical protection ring architectures are described.
In particular, OCh shared protection rings (OCh-SPRing)
including node architecture designs are discussed in detail.
In [13], the authors propose a dedicated protection mechanism based on extensions to the RSVP-TE protocol for
fault location and notification. The proposed mechanism
can be applied to small metropolitan networks. However, extrapolating the obtained protection time (45 ms
for a three-node optical ring with links of 35 km) to larger
optical rings shows that this solution lacks scalability.
In this paper, we focus on dynamic optical rings supporting either dedicated or shared link protection (hereafter OMS DPRing and OMS SPRing, respectively). The OMS
DPRing scheme is deployed over two-fiber unidirectional
rings. One fiber is dedicated to the working traffic, whereas
the other is reserved for protection. The OMS SPRing
scheme is deployed over two-fiber bidirectional rings.
The total capacity of each fiber is thus divided in two wavebands: one waveband is reserved for transporting working
channels, and the other is used for transporting protection
channels. Working and protection channels share each fiber in this case.
We propose and evaluate complete solutions for building ring-based dynamic optical networks with OMS DPRing
and OMS SPRing protection capabilities. Both proposals
consist of: (1) a novel GMPLS automatic protection switching (GAPS) mechanism that coordinates the protection actions after failures and (2) a new
ROADM design to support OMS protection. The performance of both solutions has been experimentally evaluated
over the ASON/GMPLS CARISMA network test-bed [14].
Some other work related to protection has also been
considered. In [15], the concept of differentiated reliability
(DiR) was introduced. A reliability degree was assigned to
each individual connection irrespective of the underlying
protection mechanism. In our proposal, all of the network
lightpaths have the same priority. In [16], a routing algorithm with shared-risk link groups (SRLG) disjoint protection for mesh networks was presented. In general, link
failure dependency is an important factor to be considered
when calculating disjoint routes. However, we assume that
the optical topology has been designed during the planning
phase in such a way that no common infrastructure (e.g.,
optical cables or conduits) is used by any two links in the
network. The authors in [17] provide a framework for
waveband switching (WBS). In WBS, wavelengths are
grouped into bands and switched as single entities. Thus,
a waveband is an intermediate entity between fibers and
wavelengths. Our solution for OMS SPRing is based on separating wavelengths into two bands, one for working and
one for protection. When a failure occurs, working and protection bands are switched. Our solution for OMS DPRing,
on the other hand, is based on fiber switching.
The remainder of the paper is organized as follows. Section 2 provides an overview of protection mechanisms for
ring-based networks. In Section 3, an availability model for
lightpaths that is useful for comparing the performance of
the proposed schemes is presented. Section 4 is devoted to
the basic description of the GAPS mechanism. In Section 5,
the design for the utilized optical nodes is presented. Some
experimental results are presented in Section 6. Finally,
Section 7 draws the main conclusions of this work.
2. Protection mechanisms for ring-based networks
Failures at the optical layer have a high impact on overall optical network performance due to the high bandwidth
available per wavelength and the number of wavelengths
per fiber. For example, fiber cuts resulting from digging
works or the failure of individual transmitters or receivers
are quite common [18].
In ring-based dynamic optical networks, protection at
the optical layer can be implemented at either the OMS
or OCh layer. At the OCh layer, the protection action is performed when the ROADMs inject/extract the selected
wavelength to/from the ring network. At the OMS layer,
the protection action is performed by the ROADMs adjacent to the failure. OMS protection is more appropriate
for fiber failures, whereas OCh protection is adequate
when a single channel fails. A general comparison in terms
of cost, availability, and recovery time of different network
architectures can be found in [11]. With the OCh schemes,
it is possible to mix protected and unprotected traffic; in
contrast, OMS schemes are more rigid because all channels
are simultaneously recovered. On the other hand, if most of
the channels need to be protected in some subareas of the
network (typically, on the core network), OMS schemes are
a good option. For example, in a network where all lightpaths are protected, OMS protection allows recovery of
all connections with only one action. On the contrary,
OCh protection requires one action for each impacted
lightpath. This considerably increases the number of signaling
messages required, as well as the optical node complexity.
The protection scheme can be dedicated (dedicated protection ring, DPRing) or shared (shared protection ring,
SPRing). Although the shared protection scheme consumes
fewer resources than the dedicated approach, its implementation and management are typically more complex
[11,19].
A detailed classification of resilience schemes for ring
networks including both OCh and OMS layers can be found
in [3] and [19]. In the next two subsections, we introduce
two OMS protection schemes for ring-based dynamic optical networks.
2.1. OMS dedicated protection ring (OMS DPRing)
OMS DPRing consists of two counter-rotating unidirectional rings, each of which transmits in an opposite direction relative to the other (Fig. 1a). Only one fiber is
dedicated for working traffic, and the other is reserved
for protection. Both flows of a bidirectional lightpath are
routed on different sides of the ring using the same wavelength. There is thus no possibility of reusing wavelengths
on the ring for different lightpaths. Therefore, the maximum capacity that can be allocated on the ring is limited
to the capacity of a single link. When a link failure occurs,
it is detected by the two optical nodes adjacent to the failure. Both nodes loop back the bundle of optical channels on
the protection ring in the opposite direction (dashed lines
in Fig. 1b). To perform and manage efficient switching to
the protection fiber, an automatic protection switching
(APS)-like protocol [1] is required.
2.2. OMS shared protection ring (OMS SPRing)
In OMS SPRing, the total capacity of each fiber is divided
in two wavebands (B1, B2) as shown in Fig. 2a. One waveband on each fiber (B1 clockwise and B2 counter-clockwise)
is reserved to transport working channels, whereas the
other is used to transport protection channels. In the
OMS SPRing scheme, therefore, working and protection
channels share each fiber. Working connections in one fiber are protected by the available capacity in the other fiber in the opposite direction of the ring. This way, no
wavelength converters are needed when channels are
moved from working to protection bands.
Both directions of a bidirectional lightpath are routed
along the same side of the ring in different fibers. The same
wavelength can therefore be reused to accommodate a
connection between other nodes whose routes do not
overlap the existing connection (connections A–D and E–
F in Fig. 2a).
When a link (or node) failure is detected at the OMS level, the nodes adjacent to the failure will loop back all
lightpaths at once on the protection channels of the ring
(Fig. 2b). Similar to the OMS DPRing scheme, an APS-like
protocol is required to manage the switching actions and
ensure the correct use of the shared protection capacity.
Although the implementation of OMS SPRing is more
complex than that of OMS DPRing, it provides better bandwidth efficiency. For example, the maximum number of
protected lightpaths that can be transported in an n node
ring with OMS DPRing is limited to the number of wavelengths available in each link (e.g., W). On the contrary,
the maximum number of protected lightpaths that can
be transported using OMS SPRing depends on the traffic
pattern. In our example, it ranges from W for hub-like traffic (one node sources all traffic) to a maximum of Wn/2 for
the case where the nodes only send traffic to their adjacent
nodes.
Fig. 1. An OMS DPRing transporting one lightpath (a) before and (b) after a failure in the link B–C.
Fig. 2. An OMS SPRing transporting two lightpaths (a) before and (b) after a failure in the link B–C.
3. Availability model for OMS protection schemes
When comparing different protection schemes in optical networks, a crucial aspect involves the lightpaths’ availability. In this section, we define an availability model for
the OMS DPRing and OMS SPRing schemes. The aim of this
model is to justify that a protection scheme is strictly
required.
Generally speaking, availability is the probability that a
system will be found in the operating state at a random
time in the future. Steady state availability can be expressed as [3]:
$$A = \frac{\mathrm{UpTime}}{\mathrm{UpTime} + \mathrm{DownTime}} = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}} \tag{1}$$
where
MTTF is the mean time to failure, the expected time to
the next failure of the network component following
completion of the repair. MTTF is usually expressed in
hours or in FITs (number of failures in $10^9$ h).
MTTR is the mean time to repair, the expected time
needed to repair the network component.
The probabilistic complement of the availability A is
unavailability (U), defined as
$$U = 1 - A \tag{2}$$
For the purpose of availability analysis, let us consider the figures shown in Table 1 for the MTTF and MTTR
[18,20]. In long-haul networks, the system components
with the highest failure rate are the optical cables (Table
1). Therefore, the availability model can be accurately estimated when taking into account only link failures.
Let us denote $E_x$ and $\bar{E}_x$ as an event and a negate event,
respectively, associated with a functional element or system x. In our study, $E_x$ ($\bar{E}_x$) implies that x is (not) operating
at the time t independent of the past history of events.
Thus, $P\{E_x\}$ represents the x availability ($A_x$) and $P\{\bar{E}_x\}$ its
corresponding unavailability ($U_x$).
In an OMS DPRing, the lightpaths’ availability is given
by the union of two disjoint groups of events, namely:
(1) all links i in the ring are available and (2) one link in
the ring is unavailable but the rest of the links are available
and can be used for ring protection. In OMS DPRing, lightpaths use resources in every link of the ring as shown in
Fig. 1. Therefore, ring and lightpath availability are coincident, and this value is given by the following mathematical
expression:
$$P\{E_{DPRing}\} = P\{E_{lightpath}\} = P\left\{ \left( \bigcap_{\forall i \in ring} E_i \right) \cup \left( \bigcup_{\forall i \in ring} \left( \bar{E}_i \cap \bigcap_{\substack{\forall j \neq i \\ i,j \in ring}} E_j \right) \right) \right\} \tag{3}$$
Generally speaking, optical links may be supported by
common cables or conduits, and thus links in the ring
may fail dependently [16]. When long-haul core networks
connecting main cities are deployed, however, the planning phase must address this issue by choosing fibers supported by a disjoint infrastructure. In the present study,
therefore, we consider the links in the ring to be mutually
failure-independent and can express availability in OMS
DPRing as
$$A_{DPRing} = A_{lightpath} = \prod_{\forall i \in ring} A_i + \sum_{\forall i \in ring} \left( U_i \prod_{\substack{\forall j \neq i \\ i,j \in ring}} A_j \right) \tag{4}$$
As an example, we calculate the availability for the lightpath A–D in the OMS DPRing network shown in Fig. 1a
assuming that all links have the same length (300 km).
Using the values given in Table 1, the availability
will be:
$$A_{DPRing} = A_{link}^6 + 6 U_{link} A_{link}^5 = 99.9981\% \tag{5}$$
Availability figures close to 100% are difficult to compare.
For this reason, we will use the unavailability figure.
Applying (2), the lightpath A–D unavailability is
$$U_{DPRing} = 1.88 \times 10^{-5} \tag{6}$$
This is equivalent to saying that the A–D lightpath will be
unavailable, on average, for 9.86 min per year over the
OMS DPRing.
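The DPRing figures above can be checked numerically. The following is a minimal sketch, assuming the cable failure rate and cable MTTR of Table 1 and six failure-independent 300 km links; the function names are illustrative and not part of the original study. The result should land close to the value in (6), with small differences in the last digit depending on rounding.

```python
from math import prod

MINUTES_PER_YEAR = 365 * 24 * 60

def link_availability(length_km, fits_per_km=311.0, mttr_h=12.0):
    """Eq. (1) for a fiber link; a FIT is one failure per 10^9 hours."""
    mttf_h = 1e9 / (fits_per_km * length_km)
    return mttf_h / (mttf_h + mttr_h)

def dpring_availability(link_avail):
    """Eq. (4): all links up, or exactly one link down and all others up."""
    all_up = prod(link_avail)
    one_down = sum(
        (1.0 - a_i) * prod(a_j for j, a_j in enumerate(link_avail) if j != i)
        for i, a_i in enumerate(link_avail)
    )
    return all_up + one_down

# Six 300 km links, as assumed for the ring of Fig. 1a.
links = [link_availability(300.0)] * 6
u_dpring = 1.0 - dpring_availability(links)
print(f"U_DPRing = {u_dpring:.2e} ({u_dpring * MINUTES_PER_YEAR:.1f} min/year)")
```

Passing a list of per-link availabilities, rather than a single value, keeps the sketch usable for rings with unequal link lengths.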
We can also use (4) to calculate the lightpaths’ availability in OMS SPRing, taking into account that the network is
bidirectional and lightpaths will be routed strictly through
the shortest route in this case. In OMS SPRing, therefore,
each lightpath has a different availability depending on its
route. According to this, we can express the lightpath availability over an OMS SPRing as
$$A^{lightpath}_{SPRing} = \prod_{\forall i \in lightpath} A_i + \sum_{\forall i \in lightpath} \left( U_i \prod_{\substack{\forall j \neq i \\ i \in lightpath \\ j \in ring}} A_j \right) \tag{7}$$
Table 1
MTTF and MTTR values
Optical node failure rate            10,867 FITs
Fiber-optic cable failure rate       311 FITs/km
Plug-replacement equipment MTTR      2 h
Fiber-optic cable MTTR               12 h
In this case, the unavailability for the lightpath A–D in the
OMS SPRing shown in Fig. 2a is
$$U^{A-D}_{SPRing} = 1 - (A_{link}^3 + 3 U_{link} A_{link}^5) = 1.50 \times 10^{-5} \tag{8}$$
The A–D lightpath will be unavailable, on average, 7.89 min
per year over this OMS SPRing.
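The same check can be made for the SPRing case. This is a sketch under the same assumptions (Table 1 cable values, six independent 300 km links); representing the working route as a list of link indices is an illustrative choice, and the A–D route is taken to use three links.

```python
from math import prod

def link_availability(length_km, fits_per_km=311.0, mttr_h=12.0):
    """Eq. (1) for a fiber link; a FIT is one failure per 10^9 hours."""
    mttf_h = 1e9 / (fits_per_km * length_km)
    return mttf_h / (mttf_h + mttr_h)

def spring_lightpath_availability(ring_avail, path_links):
    """Eq. (7): every link of the working route is up, or exactly one link
    of the route is down while every other link of the ring is up."""
    all_path_up = prod(ring_avail[i] for i in path_links)
    one_path_link_down = sum(
        (1.0 - ring_avail[i])
        * prod(a for j, a in enumerate(ring_avail) if j != i)
        for i in path_links
    )
    return all_path_up + one_path_link_down

ring = [link_availability(300.0)] * 6      # six 300 km links
path_a_d = [0, 1, 2]                       # assumed indices of A-B, B-C, C-D
u_a_d = 1.0 - spring_lightpath_availability(ring, path_a_d)
print(f"U_SPRing(A-D) = {u_a_d:.2e}")
```

When the route covers every link of the ring, the function reduces to the DPRing expression (4), which is a convenient consistency check.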
On the basis of the previous results, we can say that
OMS shared protection provides better lightpath availability than OMS dedicated protection. OMS shared protection
makes it possible to find shortest routes for the lightpaths.
This is opposite to the path protection case, where dedicated path protection provides better lightpath availability
than shared path protection due to the fact that the protecting route is shared by several lightpaths. In fact,
‘shared’ in OMS protection refers to the fact that working
and protection resources share one fiber; each optical
working channel has been assigned a backup optical channel in the protection capacity.
Fig. 3 shows the unavailability of the longest possible
lightpaths in an OMS SPRing (solid lines) and OMS DPRing
(dashed lines) as a function of the number of nodes (n) in
the ring for several average link lengths (L).
The target lightpath availability in a network has to be
chosen according to the distances in that network. In
metropolitan networks, an availability of
0.99999 (five nines), or 5.26 min/year of total outage, is
sometimes taken as the objective. In
long-haul core networks, however, a four nines availability
objective (less than 53 min/year of total outage) is more
appropriate. Note that the values of $U_{link}$ range from $10^{-4}$
to $10^{-3}$ for lengths ranging from 30 to 300 km, respectively,
using the values in Table 1. Thus, the graph in Fig. 3 also draws the unavailability objective of $10^{-4}$, which corresponds to a target
availability of 0.9999.
From Fig. 3, we can conclude that the maximum overall
length that allows us to meet the strict unavailability
objective is roughly 3800 km for the OMS DPRing scheme
and roughly 4500 km for the OMS SPRing scheme. The
OMS SPRing scheme provides an improvement of about
25% in the expected lightpaths’ unavailability over the
OMS DPRing scheme.
If no protection scheme is implemented, applying (1)
to a ring network whose length is 3800 km yields an unavailability of $1.4 \times 10^{-2}$. This implies more than 5
days/year (or 20 min/day) of total outage, which clearly
does not meet the required network availability. Therefore,
a protection scheme is strictly required.
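The outage figures quoted in this section follow from simple conversions. The sketch below assumes the cable values of Table 1 and, as one reading of the unprotected case, treats the whole 3800 km ring as a single series element; the helper names are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60

def outage_minutes_per_year(availability):
    """Expected yearly outage for a given steady-state availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR

def unprotected_unavailability(total_km, fits_per_km=311.0, mttr_h=12.0):
    """Eq. (1) applied to the total fiber length with no protection."""
    mttf_h = 1e9 / (fits_per_km * total_km)
    return mttr_h / (mttf_h + mttr_h)

five_nines = outage_minutes_per_year(0.99999)   # metropolitan objective
four_nines = outage_minutes_per_year(0.9999)    # long-haul objective

u_ring = unprotected_unavailability(3800.0)
days_per_year = u_ring * 365
minutes_per_day = u_ring * 24 * 60
print(f"U = {u_ring:.2e}, {days_per_year:.1f} days/year, "
      f"{minutes_per_day:.0f} min/day")
```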
Finally, let us analyze the behavior of OMS schemes in a
multiple failure scenario. In OMS DPRing, all lightpaths will
become unavailable under a double-link failure since lightpaths have the same route through all links in the ring. This
is different in OMS SPRing, since each lightpath may have a
different route. One lightpath will remain working under a
double-link failure if the failure affects links that do not
support a given lightpath; in all other cases, the lightpath
will become unavailable. As an example, let us consider
the OMS SPRing in Fig. 2a, where links A–F and F–E fail
simultaneously. In this case, lightpath F–E will become
unavailable whereas lightpath A–D will remain working.
Eqs. (4) and (7) can be used to calculate the expected
lightpath availability under any arbitrary number of
failures.
4. GMPLS-controlled OMS protection
In this Section, we present the GAPS mechanism. This
mechanism is based on extensions of the LMP protocol that
can be used for fault management purposes in OMS protection. We first introduce the mechanism to control the OMS
DPRing protection scheme, and then we extend the GAPS
mechanism to control the OMS SPRing scheme. Protection
time models for GAPS-controlled OMS protected rings are
defined.
4.1. The GAPS mechanism
As stated above, the OMS DPRing configuration consists
of two counter-rotating rings. In the normal state (Fig. 4a),
the working links in the transport plane carry regular traffic. When a network component fails, a switch event occurs
and the working link is protected using backup links. Let us
assume that OMS DPRings are remotely controlled by a
GMPLS control plane that can be transported out-of-band
in-fiber or out-of-fiber [5]. For the sake of simplicity, we
assume that the topology of both the control and transport
planes is the same (Fig. 4a), although the GAPS mechanism
would work for any control plane topology independent of
the transport plane.
[Plot: expected unavailability (U) versus the number of nodes in the ring (n); curves U(SPRing) and U(DPRing) for L = 100, 200 and 300 km, plus the limit U lim.]
Fig. 3. Lightpaths unavailability in OMS DPRing and in OMS SPRing.
Fig. 4. OMS DPRing controlled by the GAPS mechanism.
The link failure implies a Loss of Light (LoL) detection.
After the detection, the failure must be corrected by its
adjacent transport nodes. These nodes, called switching
nodes, use the bridge and switch actions for the protection
of the working link (Fig. 4b). Specifically, when an optical
node detects a LoL, it notifies the failure to its corresponding OCC in the GMPLS-based control plane; this OCC becomes the head end. It conveys the failure detection to
the OCC (tail end) corresponding to the other adjacent
optical node, which executes a bridge. GAPS messages include the information depicted in Table 2.
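The fields of Table 2 can be sketched as a small data structure. This is only an illustration: the enumerated values are the abbreviations used in Fig. 5, whereas the class and field names, and the use of strings for node IDs, are assumptions; the on-the-wire encoding is not specified here.

```python
from dataclasses import dataclass
from enum import Enum

class RequestType(Enum):
    NR = "No Request"
    LOL = "Loss of Light"
    WTR = "Wait To Restore"

class Path(Enum):
    SHORT = "S"
    LONG = "L"

class Status(Enum):
    IDLE = "Idle"
    RDI = "Remote Defect Indication"
    BRIDGED = "Br"
    SWITCHED = "Sw"

@dataclass(frozen=True)
class GapsMessage:
    """Hypothetical in-memory view of the information in Table 2."""
    source: str          # node ID of the OCC generating the message
    destination: str     # node ID of the OCC the message is addressed to
    request: RequestType
    path: Path
    status: Status

# Bridge request from the head end (A) to the tail end (D) on the long path.
request = GapsMessage("A", "D", RequestType.LOL, Path.LONG, Status.IDLE)
print(request)
```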
To illustrate how the GAPS mechanism works, Fig. 5
shows the recovery from a link failure. The initial state of
the ring is the normal state. In this state (T0), all OCCs in
the ring have exchanged normal state messages 1–4 with
their neighbors.
At time T1, node A detects a LoL on its working link and
notifies its OCC. When an OCC receives notification of failure detection, it sends a switching request (i.e. GAPS messages) to the OCC of the adjacent node over the control
network on both the short and long paths. The short path
connects the head and tail OCCs directly, whereas the long
path connects them through intermediate OCCs using the
opposite side of the ring. Node A then becomes a switching
node, and its OCC becomes the head end. The head end OCC
sends a bridge request. All intermediate OCCs on the long
path enter the full pass-through state. OCC D, upon receiving the bridge request from OCC A on the short path, transmits a LoL ring bridge. OCC D, upon receiving the bridge
request from OCC A on the long path, executes a bridge
and updates its status. OCC A, upon receiving the ACK from
OCC D on the long path, executes a ring switch and updates
its status. Signaling then reaches the steady-state.
At time T2, the LoL clears. Node A notifies its OCC of
this, and OCC A enters the wait-to-restore (WTR) state
and advertises its new state to OCC D. Upon receiving the
WTR bridge request on the short path, OCC D sends out a
message with the WTR code.
At time T3, the WTR interval expires. OCC A sends out a
no request message. OCC D, upon receiving the no request
from OCC A on the long path, drops its bridge and generates the Idle code. OCC A, upon receiving the Idle code on
the long path, drops its switch and also generates the Idle
code. All OCCs return then to the normal state.
Since the protection channels are shared among all
links, contention among the nodes may arise when multiple simultaneous failures occur. In these cases, the request
with the lowest head node identifier (ID) has priority. This
mechanism is useful in OMS SPRing, where some lightpaths can continue working in a double-failure scenario.
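The contention rule can be sketched in a few lines. The sketch assumes that node IDs are comparable integers and that each pending request records its head and tail nodes; both the `BridgeRequest` type and the function name are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BridgeRequest:
    head_id: int   # ID of the head-end node that issued the request
    tail_id: int   # ID of the tail-end node expected to bridge

def winning_request(requests):
    """Return the request with the lowest head node ID, which has priority."""
    return min(requests, key=lambda r: r.head_id)

pending = [BridgeRequest(head_id=4, tail_id=5), BridgeRequest(head_id=2, tail_id=3)]
print(winning_request(pending))
```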
Table 2
Information transported by GAPS messages
Source/destination node ID   Identifies the origin/destination nodes for this GAPS message. Depending on the semantics for the message type, origin and destination nodes represent head and tail nodes or vice versa
Request type                 Indicates the type of request. A request can be a condition (LoL), a state (normal) or an external request (not covered in this paper)
Path                         Indicates whether the message is being sent on the short or the long path
Status                       Indicates the status of the protection switch
Fig. 5. Failures management: GAPS messages. The diagram shows the messages exchanged among OCCs A, B, C and D at T0 (normal state), T1 (LoL received, enter switching state), T2 (LoL cleared, WTR starts) and T3 (WTR expires).
Abbreviations in Fig. 5: NR, No Request; LOL, Loss of Light; WTR, Wait To Restore; S, short path; L, long path; RDI, Remote Defect Indication; Br, bridged; Sw, switched.
Messages in Fig. 5: 1a NR/D A/S/IDLE; 1b NR/B A/S/IDLE; 2a NR/A B/S/IDLE; 2b NR/C B/S/IDLE; 3a NR/B C/S/IDLE; 3b NR/D C/S/IDLE; 4a NR/C D/S/IDLE; 4b NR/A D/S/IDLE; 5a LOL/D A/S/RDI; 5b LOL/D A/L/IDLE; 6a LOL/A D/S/IDLE; 6b LOL/A D/L/IDLE; 7a LOL/A D/S/Br; 7b LOL/A D/L/Br; 8a LOL/D A/S/RDI; 8b LOL/D A/L/Sw; 9a WTR/D A/S/Sw; 9b WTR/D A/L/Sw; 10a NR/D A/S/Sw; 10b NR/D A/L/Sw.
If in-fiber signaling is used, the GAPS mechanism is able
to efficiently manage node failures. When a LoL is detected
in this case, the detecting OCC includes the identifier of the
destination node in the GAPS message. If the GAPS message
reaches the other end of the failure (i.e., the destination
OCC), that OCC will recognize itself as the destination. If this occurs,
the failure was indeed a link failure as assumed. On the
contrary, a node failure will be assumed if the message
reaches an OCC adjacent to the destination and the destination OCC is unreachable. In the latter case, the OCC adjacent
to the failed node will act as the destination and assume
its protecting role.
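The distinction between link and node failures can be illustrated as follows. This is a simplified sketch, not the GAPS implementation: it assumes a ring of at least three nodes listed in ring order, a single failure, and a set `alive` of reachable OCCs that stands in for the actual reachability test; all names are hypothetical.

```python
def resolve_destination(ring, head, destination, alive):
    """Follow the long path from the head end towards the destination.

    Returns ("link", destination) if the destination OCC is reached, or
    ("node", neighbor) if the destination is unreachable, where neighbor is
    the last OCC on the long path, which then acts as the destination.
    """
    n = len(ring)
    pos = ring.index(head)
    # The short path is the direct hop to the destination; the long path
    # leaves the head end in the opposite direction.
    step = -1 if ring[(pos + 1) % n] == destination else 1
    while True:
        nxt = (pos + step) % n
        if ring[nxt] == destination:
            if destination in alive:
                return ("link", destination)
            return ("node", ring[pos])
        pos = nxt

ring = ["A", "B", "C", "D"]
print(resolve_destination(ring, "A", "D", alive={"A", "B", "C", "D"}))
print(resolve_destination(ring, "A", "D", alive={"A", "B", "C"}))
```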
A simplified finite state machine for the GAPS mechanism is illustrated in Fig. 6. When in-fiber signaling is used,
messages that head and tail end OCCs would exchange
through the short path are never received. The transition
from the normal state to the ring bridge destination state
can be completed either with an intermediate transition
upon reception of the request message through the short
path or directly upon reception of the request through
the long path. Receiving the request message through the
short path permits acceleration of the switching process
by preparing the optical node. The same can be done in
the intermediate nodes upon reception of a bridge request
directed to the tail end.
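A reduced version of this state machine can be sketched as a transition table. The state and event names below are hypothetical and loosely follow the labels of Fig. 6; the restoration handshake (no request and Idle messages) is collapsed into single transitions, so this is an approximation of the behaviour described in the text rather than the complete machine.

```python
from enum import Enum, auto

class State(Enum):
    NORMAL = auto()
    PASS_THROUGH = auto()
    SWITCHING_HEAD = auto()
    HEAD_SWITCHED = auto()
    WTR_HEAD = auto()
    SWITCHING_DEST = auto()
    DEST_BRIDGED = auto()
    WTR_DEST = auto()

TRANSITIONS = {
    # Intermediate OCC on the long path
    (State.NORMAL, "bridge_req_not_dest"): State.PASS_THROUGH,
    (State.PASS_THROUGH, "no_request"): State.NORMAL,
    # Tail end: optional preparation step on the short-path request
    (State.NORMAL, "bridge_req_short"): State.SWITCHING_DEST,
    (State.NORMAL, "bridge_req_long"): State.DEST_BRIDGED,
    (State.SWITCHING_DEST, "bridge_req_long"): State.DEST_BRIDGED,
    (State.DEST_BRIDGED, "wtr_bridge_req"): State.WTR_DEST,
    (State.WTR_DEST, "no_request"): State.NORMAL,
    # Head end
    (State.NORMAL, "signal_fail"): State.SWITCHING_HEAD,
    (State.SWITCHING_HEAD, "ring_bridged"): State.HEAD_SWITCHED,
    (State.HEAD_SWITCHED, "signal_clear"): State.WTR_HEAD,
    (State.WTR_HEAD, "wtr_expires"): State.NORMAL,
}

def step(state, event):
    """Return the next state; events with no entry leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)

def run(events, state=State.NORMAL):
    for event in events:
        state = step(state, event)
    return state

print(run(["bridge_req_short", "bridge_req_long"]))
```

With in-fiber signaling the short-path request is never received, and the tail end reaches the bridged state through the long-path request alone.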
When considering bidirectional rings (OMS SPRing protection), two GAPS entities are needed so that one entity is
available for each direction. Under normal conditions, the
working wavebands in the transport plane are used to carry
regular traffic. When a network component fails, a switch
event occurs and the working wavebands are protected
using the protection wavebands. A bidirectional link failure
implies LoL detection in the adjacent optical nodes, which
notify their OCCs in the GMPLS-based control plane of
the failure (Fig. 7). The adjacent OCCs exchange bridge
requests.
Fig. 6. GAPS mechanism: finite state machine.
The GAPS message (Table 3) contains all the information
needed by the GAPS protocol.
From a functional point of view, the GAPS agent is located on top of two LMP agents (LMP east and LMP west).
This way, GAPS messages can be sent through either the
east or west control channel.
Fig. 7. A failure in a bidirectional link is detected by its adjacent nodes.
4.2. GAPS LMP extensions definition
We define GAPS as an LMP extension running in the
GMPLS-based control plane of OMS protected networks.
In this way, we avoid the implementation of a new control
protocol that would increase the signaling overhead. Specifically, GAPS relies on control channel management functionalities provided by the LMP protocol. Once a control
channel is activated between two adjacent nodes, the
LMP Hello messages exchanged can be used to maintain
control channel connectivity between the nodes. In order
to run the GAPS mechanism, however, the definition of a
novel LMP message is required:
GAPS Message
<GAPS Message> ::= <Common Header> <GAPS>
This message is used to transmit GAPS information
when the LMP adjacency is part of an OMS protected ring.
Table 3
GAPS Object Format
4.3. Protection time models for GAPS-controlled OMS protected rings
In this Subsection, we present models for calculating
the protection time for OMS DPRing and OMS SPRing running with the GAPS mechanism. Our aim is to determine
the switching time requirements to be imposed on the
optical nodes necessary to meet the protection time target.
Let us define the protection time ($T_{DPRing}$) in an OMS
DPRing as the interval from the decision to switch to the
completion of the switching operation at the node initiating the bridge request. It includes, therefore, the notification from the initiating optical node to its OCC ($T_{config}$),
the propagation delay in each control network link ($T_{link}$),
the processing time in each OCC ($T_{control}$), the time to configure each optical node in the ring ($T_{config}$) to perform the
switching action, and the time to switch itself ($T_{switch}$).
Thus, $T_{DPRing}$ can be expressed as
$$T_{DPRing} = 2T_{config} + T_{switch} + (2n-1)T_{control} + 2(n-1)T_{link} \tag{9}$$
[Plot: theoretical protection time (ms) versus the number of nodes in the ring (n); curves T(DPRing) for L = 100, 200 and 300 km, plus the objective.]
Fig. 8. Protection time for OMS DPRings.
[Plot: theoretical protection time (ms) versus the number of nodes in the ring (n); curves T(SPRing) for L = 100, 200 and 300 km, plus the objective.]
Fig. 9. Protection time for OMS SPRings.
5. ROADM design to support OMS protection schemes
Tswitch is predefined by the switching device. We use a
switch with a response time below 1 ms, which is in line
with the devices currently available on the market. T link depends on the ring link lengths (L) and the signal speed
through the fiber. Note that T link is negligible for metropolitan ring networks.
The protection time model for GAPS controlling a bidirectional ring is different than that defined in (9) for OMS
DPRing. In fact, GAPS has been extended in the case of
OMS SPRing to coordinate both protection actions (one
for each direction) to be performed by the nodes adjacent
to the failure. We assume that configuration actions are
executed in the optical node in a serial manner. Let us define the protection time ðT SPRing Þ in an OMS SPRing as the
interval from the decision to switch to the completion of
the switch operation at the node initiating the bridge request. In this case, T SPRing can be expressed as
T SPRing ¼ 2T config þ T switch þ nT control þ ðn 1ÞT link
þ MaxðT config ; ðn 1ÞT control þ ðn 1ÞT link Þ
ð10Þ
The term Maxða; bÞ expresses the idea of configuration actions that are performed in a serial manner in the optical
node. Note that (10) will give the same values as (9) when
the time to configure the optical node is higher than the
time to transport the GAPS message around the ring. This
will happen if the number of nodes in the ring is low or
the distances between ring nodes are short. In such cases,
the protection time in OMS SPRing will be lower than the
objective even though it is higher than that of the OMS
DPRing.
Figs. 8 and 9 show the theoretical protection time for
OMS DPRing and OMS SPRing, respectively, as functions
of the number of nodes (n) in the ring for several link
lengths. They show the scalability of GAPS when the number of nodes in the ring is increased. In this analysis, we assume T config to be less than 5 ms and T control to be about
0.2 ms. From the GAPS mechanism, we can conclude that
the typical target protection time (i.e. 50 ms) is reached
even when rings are composed of a large number of nodes.
However, it implies strict requirements for the hardware of
the optical nodes (e.g., T control and T config ).
Besides the design of the GAPS mechanism, we have designed two new optical nodes (one for OMS DPRings and
another for OMS SPRings) capable of satisfying the requirements derived in the last section. Optical nodes are based
on two wavelength selective switches (WSS). One of these
is used for adding and the other for dropping the local traffic [21].
In the OMS DPRing node (Fig. 10a), two optical power meters (labeled M in Fig. 10) measure the incoming optical power at the east and west ports. Two 2 × 2 optical switches have been added to the WSS components to allow OMS protection. The two pairs of optical mux/demux are responsible for coupling the WDM-multiplexed bundle with the in-fiber optical supervisory channel (OSC), which transports the control channel. Two 1300 nm optical transponders ($\lambda_{\mathrm{OSC}}$) are used to convert electrical Fast Ethernet signals to the optical domain. Additionally, a node controller (not shown in Fig. 10) is needed to manage the extra resources.
The optical power meters monitor the incoming optical power levels at the west and east inputs and notify the node controller when they detect out-of-bounds levels. Upon receiving this information, the node controller sends a LoL notification to the OCC. Note that if the link is not affected by a failure, optical power must always be received at each end of the link. When out-of-fiber signaling is used, the OSC does not transport the control channel. Nevertheless, the associated optical hardware cannot be eliminated because an optical pilot signal is still necessary to detect the repair of any failure affecting the link.
The OMS SPRing optical node is also based on WSS as
shown in Fig. 10b. We use WSS components with a capacity of 40 channels in the C-band. Although the optical node
is defined as bidirectional, it is important to highlight that,
as in the OMS DPRing optical node, only two WSS are used;
this keeps the cost and complexity of the node low. We
have defined waveband B1 as channels 1–20 and B2 as
channels 21–40.
Under normal conditions, B1 received from the east and B2 received from the west are used to transport the traffic, and B2 east and B1 west are used for protection. Band splitter (BS) components are used to divide the WDM bundle into two bands, and splitters (S) are used to separate optical signals or join both bands. To implement OMS SPRing protection, four 2 × 2 optical switches decide which bands (B1, B2) are used to transport the traffic and which bands are used for protection. The resulting cost and complexity of the OMS SPRing optical node are not much higher than those of the OMS DPRing optical node; in fact, only two 2 × 2 optical switches and a set of passive components (splitters and band splitters) are added.
Fig. 10. Optical nodes design to support OMS protection. (a) OMS DPRing scheme (b) OMS SPRing scheme.
The functionality depicted in Fig. 10 has been separated into several building blocks, and each block has been implemented as a separate card. Therefore, besides the passive components (mux/demux, splitters, and band splitters), two additional cards equipped with active components, the transponder card and the optical switching and monitoring (OSNL) card, have been added to the WSS components. The OSNL card includes the monitoring and switching devices. The switching device has a response time below 1 ms and insertion losses lower than 0.9 dB. The monitoring device extracts a small part of the incoming optical power, transforms the sample into a digital value by means of an A/D converter, and stores the converted value in a register. The monitoring sweep time is 10 µs.
Each active card in the optical node is equipped with an ARM7 32-bit RISC processor [22] running at 100 MHz. The card processor controls the different card components and manages the communication with the node controller, which is implemented in a separate card (the so-called ‘Master card’).
The Master card communicates through an internal serial bus with the rest of the optical node cards and through a Fast Ethernet interface with the control and management planes. The Master card is based on the UNC90 microcontroller module [23], which is equipped with an ARM9 32-bit RISC processor [22] running at 180 MHz. In addition to the processor and other elements, the UNC90 module includes 32 MByte of SDRAM and 32 MByte of Flash memory. The internal architecture of the optical node is shown in Fig. 11.
The OSNL card processor includes an interrupt-driven system to allow the CPU to continue processing instructions until a request from the Master card arrives. In the meantime, the card processor executes a polling loop to continuously read samples from the monitoring register. If the values of the samples read within 1 ms are considered to be out-of-bounds, the card processor declares a LoL condition. This condition is communicated to the Master card by sending a proprietary message through the serial bus.
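The polling logic described above can be sketched as follows. This is only an illustrative model, not the firmware running on the OSNL card: it assumes that a LoL condition is declared when every sample read during a 1 ms window is out-of-bounds, and the power thresholds (`low_dbm`, `high_dbm`) are hypothetical values chosen for the example. With the stated sweep time of 10 µs, a 1 ms window corresponds to 100 consecutive samples.

```python
from typing import Iterable, Optional

SWEEP_TIME_US = 10     # monitoring sweep time stated in the text
WINDOW_US = 1000       # 1 ms decision window stated in the text
WINDOW_SAMPLES = WINDOW_US // SWEEP_TIME_US  # 100 samples per window


def detect_lol(samples_dbm: Iterable[float],
               low_dbm: float = -30.0,   # hypothetical lower bound
               high_dbm: float = 0.0,    # hypothetical upper bound
               window: int = WINDOW_SAMPLES) -> Optional[int]:
    """Return the index of the sample at which LoL is declared, or None.

    LoL is declared once `window` consecutive samples are out-of-bounds;
    any in-bounds sample resets the count.
    """
    consecutive = 0
    for index, power in enumerate(samples_dbm):
        if low_dbm <= power <= high_dbm:
            consecutive = 0
        else:
            consecutive += 1
            if consecutive >= window:
                return index
    return None


# A fiber cut after 50 good samples: LoL is declared 100 samples later.
cut = [-15.0] * 50 + [-45.0] * 100
print(detect_lol(cut))  # 149

# A dip shorter than the window does not trigger LoL.
glitch = [-15.0] * 10 + [-45.0] * 99 + [-15.0] * 10
print(detect_lol(glitch))  # None
```

Requiring a full window of out-of-bounds samples trades about 1 ms of detection delay for robustness against short power dips.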
The Master processor runs a Linux operating system. An application (the node agent) has been developed to
manage the entire optical node by providing an interface
between the cards and the control and management planes.
The agent listens for incoming data from serial and TCP/UDP
ports. When a message indicating a LoL condition is received through the serial port, the agent sends a simple network management protocol (SNMP) trap that brings the
related information to the OCC in the control plane.
The agent on the Master card accepts request-response
commands using an XML-based proprietary protocol.
When a message through a TCP/UDP port is received, the
agent decodes it and initiates the appropriate communication to another card in the optical node through the serial
bus.
As defined in the previous section, $T_{\mathrm{config}}$ is the configuration time of the optical node (i.e., the time to process a request from the OCC or to inform the OCC of any event). In order to achieve a recovery time shorter than 50 ms after fault detection, we specified 5 ms as the maximum value for $T_{\mathrm{config}}$ in the previous section. During optimization of the system, some bottlenecks were detected and corrected. One of the most important relates to the TTY device driver architecture in the Linux kernel [24]. Linux considers serial ports to be high-latency devices. When data is received, therefore, the TTY device driver schedules itself to push the data to the user application at some later point in the near future. This behavior introduces an unacceptable delay in the system. To avoid this high latency in the serial transmission, the Linux kernel was modified to define the serial driver as a low-latency driver that immediately pushes the data to the user application (the node agent).
Fig. 11. Optical node internal architecture.
Fig. 12 shows the physical layout of the resulting OSNL, Master, and Transponder cards. It shows the front view of the optical node and the test-bed where the complete architecture has been tested.
6. Experimental results
The performance of the GAPS mechanism with the
ROADM nodes discussed in the previous sections has
been experimentally evaluated using the ASON/GMPLS
CARISMA network test-bed [14]. The CARISMA GMPLS
control plane uses the RSVP-TE protocol for signaling,
the OSPF-TE protocol for routing, and the LMP protocol
for control channel management and link property correlation. The OCCs have been implemented using Linux-based routers. Each pair of OCCs communicates through a single IP control channel implemented with full-duplex Fast Ethernet links. Finally, each OCC also has a connection controller interface (CCI) for communicating with the optical nodes.
Fig. 13a shows the switching time measured when the protection decision is made by the OCC at the control plane upon reception of a LoL notification from the optical node ($2T_{\mathrm{config}} + T_{\mathrm{switch}}$). Thus, it does not include any exchange of GAPS messages.
Fig. 12. Physical layout and testbed.
Fig. 13b shows the experimental results for the protection time as a function of the number of nodes for OMS DPRing and OMS SPRing. Because of the testing environment, $T_{\mathrm{link}}$ is negligible. In order to provide enough accuracy, the figures reflect the average values from 10 experiments. From these results, and applying (9) for OMS DPRing and (10) for OMS SPRing, we found that $T_{\mathrm{control}}$ is less than 0.1 ms and $T_{\mathrm{config}}$ is less than 4.5 ms. These results are better than the values assumed in the previous sections. Thus, the obtained behavior will also be better than that shown in Figs. 8 and 9 for OMS DPRing and OMS SPRing, respectively.
Note that the experimental protection times in Fig. 13b do not include any propagation time (the $T_{\mathrm{link}}$ terms in (9) and (10)). Therefore, the specific propagation time must be added to these experimental times. For example, in a 12-node OMS DPRing with links of 300 km, the term $2(n-1)T_{\mathrm{link}}$ represents an additional delay of 33 ms. The protection time in this case would be 11.90 + 33 = 44.90 ms. This is better than the 48.6 ms theoretical value from the previous sections.
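The arithmetic in this example can be checked with a short sketch of the protection-time models (9) and (10). The function names are illustrative, and two values are assumptions rather than figures stated in the text: a fiber propagation delay of about 5 µs per km, which reproduces the 33 ms term, and $T_{\mathrm{switch}} = 1$ ms, taken as the upper bound of the switch response time. The theoretical case uses the $T_{\mathrm{config}} = 5$ ms and $T_{\mathrm{control}} = 0.2$ ms values assumed for Figs. 8 and 9.

```python
# Protection-time models (9) and (10); all times are in milliseconds.

# Assumption: about 5 microseconds of propagation delay per km of fiber.
FIBER_DELAY_MS_PER_KM = 0.005


def t_link(link_km: float) -> float:
    """Propagation delay of one ring link."""
    return link_km * FIBER_DELAY_MS_PER_KM


def t_dpring(n: int, t_config: float, t_control: float,
             t_switch: float, t_link_ms: float) -> float:
    """Equation (9): protection time of an n-node OMS DPRing."""
    return (2 * t_config + t_switch
            + (2 * n - 1) * t_control
            + 2 * (n - 1) * t_link_ms)


def t_spring(n: int, t_config: float, t_control: float,
             t_switch: float, t_link_ms: float) -> float:
    """Equation (10): protection time of an n-node OMS SPRing."""
    around_ring = (n - 1) * (t_control + t_link_ms)
    return (2 * t_config + t_switch
            + n * t_control + (n - 1) * t_link_ms
            + max(t_config, around_ring))


# 12-node OMS DPRing with 300 km links.
propagation_ms = 2 * (12 - 1) * t_link(300)
theoretical_ms = t_dpring(12, t_config=5.0, t_control=0.2,
                          t_switch=1.0, t_link_ms=t_link(300))
experimental_ms = 11.90 + propagation_ms  # measured value plus propagation

print(f"{propagation_ms:.2f} {theoretical_ms:.2f} {experimental_ms:.2f}")
# prints "33.00 48.60 44.90"
```

With long links, the message transport time around the ring exceeds $T_{\mathrm{config}}$, so `t_spring` returns the same value as `t_dpring`; with negligible link delay, the `max` term selects $T_{\mathrm{config}}$ and the SPRing value is the larger of the two.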
As an example, Fig. 14a and b show the experimental $T_{\mathrm{DPRing}}$ and $T_{\mathrm{SPRing}}$, respectively, for the worst scenario considered, with 18 nodes.
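One possible way to back out the node parameters from such measurements is sketched below; it is not necessarily the procedure used to obtain the bounds reported in this section. With negligible $T_{\mathrm{link}}$, (9) is linear in n with slope $2T_{\mathrm{control}}$, and (9) and (10) differ by $T_{\mathrm{config}} - (n-1)T_{\mathrm{control}}$ whenever $T_{\mathrm{config}}$ exceeds the message transport time. The inputs are the 11.90 ms value quoted for a 12-node OMS DPRing and the 12.85 ms and 15.71 ms values marked in Fig. 14, here assumed to correspond to the 18-node OMS DPRing and OMS SPRing, respectively. The function names are illustrative.

```python
def estimate_t_control(n1: int, t1_ms: float, n2: int, t2_ms: float) -> float:
    """Slope of (9) with T_link = 0: T(n) = 2*T_config + T_switch + (2n - 1)*T_control."""
    return (t2_ms - t1_ms) / (2 * (n2 - n1))


def estimate_t_config(n: int, t_dpring_ms: float, t_spring_ms: float,
                      t_control_ms: float) -> float:
    """From (10) minus (9) with T_link = 0, valid when T_config >= (n - 1)*T_control."""
    return (t_spring_ms - t_dpring_ms) + (n - 1) * t_control_ms


# OMS DPRing measurements for 12 and 18 nodes.
t_control = estimate_t_control(12, 11.90, 18, 12.85)
# OMS DPRing versus OMS SPRing for 18 nodes.
t_config = estimate_t_config(18, 12.85, 15.71, t_control)

print(f"T_control ~ {t_control:.3f} ms, T_config ~ {t_config:.2f} ms")
```

Under these assumptions the estimates come out at roughly 0.08 ms and 4.2 ms, which is consistent with the stated bounds of 0.1 ms for $T_{\mathrm{control}}$ and 4.5 ms for $T_{\mathrm{config}}$.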
When the link failure is repaired, optical power is detected again by the adjacent optical nodes. At this moment,
the WTR period starts. After the WTR time, the protection is
reverted and the signal is switched from the protection to
the working links.
Therefore, it is possible to deploy rings with 20 nodes
and a total length of 2000 km or 16 nodes and 3200 km
with a recovery time below 50 ms. These results show that
our GAPS-based solutions scale linearly with both the
number of nodes in the ring and link length. Thus, we
can conclude that GAPS, in conjunction with the designed ROADMs, provides OMS protection under 50 ms in rings with high numbers of nodes.
Finally, a comparison of both solutions is shown in Table 4. Although the cost and complexity of the OMS DPRing solution are lower than those of the OMS SPRing, the increase in the ROADM cost due to the OMS support is very low in both cases. OMS SPRing is more bandwidth efficient and provides better availability than OMS DPRing.
Fig. 13. Experimental results (a) experimental $2T_{\mathrm{config}} + T_{\mathrm{switch}}$ value (b) evolution of protection time with the number of nodes. [Panel (a): optical power (dBm) versus time (ms), with a marked interval of 9.89 ms. Panel (b): experimental protection time (ms) versus number of nodes in the ring (n) for DPRing and SPRing.]
Fig. 14. Experimental results for rings with 18 nodes (a) OMS DPRing protection time (b) OMS SPRing protection time. [Both panels: optical power (dBm) versus time (ms), with marked intervals of 12.85 ms and 15.71 ms.]
Table 4
Comparison of OMS solutions

| | OMS DPRing | OMS SPRing |
|---|---|---|
| Transported traffic (lightpaths) | W (num. wavelengths/link), independent of the number of nodes (n) in the ring | From W to Wn/2, depending on the traffic pattern |
| Availability | High | Highest |
| Protection time | Fastest (<50 ms) | Fast (<50 ms) |
| Cost | Lowest | Low |
7. Conclusions
In this paper, we have presented two OMS protection
solutions (DPRing and SPRing) for GMPLS-controlled optical
ring networks based on a novel GAPS mechanism. OMS protection makes possible the recovery of all optical channels in
a fiber with just one protection action. Since the overall protection time increases linearly with the number of nodes, the
scalability of the GAPS mechanism has been demonstrated.
From the obtained results, we conclude that a ring-based network using the designed ROADM nodes and controlled by the GAPS protocol will provide survivability with an SDH-like service recovery time (<50 ms) even in large optical rings.
A pay-as-you-grow strategy can be implemented using
both schemes. OMS DPRing can be used in networks where
the expected traffic demand is lower than the number of
wavelengths available in each link. If the traffic grows,
the migration from OMS DPRing to OMS SPRing consists
of adding one OSNL card and one card for the passive components (splitters and band splitters) to every optical node
in the ring.
Acknowledgements
This work has been partially funded by the i2Cat Foundation through the TRILOGY project and by the Spanish
Science Ministry through the TEC-2005-08051-C03-02
RINGING project.
References
[1] ITU-T Rec. G.841, Types and characteristics of SDH network
protection architectures, 1998.
[2] G. Iannaccone, C. Chuah, R. Mortier, S. Bhattacharyya, C. Diot,
Analysis of link failures in an IP backbone, in: Proceedings of ACM
SIGCOMM IMW’02, Marseille, France, November 2002.
[3] W.D. Grover, Mesh-Based Survivable Networks, Prentice Hall PTR,
New Jersey, 2004.
[4] ITU-T Rec. G.8080/Y.1304, Architecture for the Automatically
Switched Optical Networks, 2001 and Am. 1, 2003.
[5] E. Mannie, Generalized multi-protocol label switching (GMPLS)
architecture, RFC-3945, 2004.
[6] L. Berger, Generalized multi-protocol label switching (GMPLS)
signaling resource reservation protocol-traffic engineering (RSVP-TE) extensions, RFC 3473, 2003.
[7] D. Katz, K. Kompella, D. Yeung, Traffic engineering (TE) extensions to
OSPF Version 2, RFC 3630, 2003.
[8] J. Lang, Link management protocol (LMP), RFC 4204, 2005.
[9] J.P. Lang et al., RSVP-TE extensions in support of end-to-end
generalized multi-protocol label switching (GMPLS) recovery, RFC
4872, 2007.
[10] L. Berger et al., GMPLS segment recovery, RFC 4873, 2007.
[11] P. Arijs et al., Design of ring and mesh based WDM transport
networks, Optical Networks Magazine 3 (2000) 25–40.
[12] M.-J. Li et al., Transparent optical protection ring architectures and
applications, IEEE Journal of Lightwave Technology 23 (10) (2005)
3388–3403.
[13] R. Muñoz et al., Experimental GMPLS fault management for OULSR
transport networks, in: Optical Fiber Communications, OFC/NFOEC
3, 2005, paper JWA50.
[14] J. Perelló, E. Escalona, S. Spadaro, J. Comellas, G. Junyent, Resource
discovery in ASON/GMPLS transport networks, IEEE
Communications Magazine 45 (10) (2007) 86–92.
[15] A. Fumagalli, M. Tacca, Differentiated reliability (DiR) in wavelength
division multiplexing rings, IEEE/ACM Transactions on Networking
14 (1) (2006) 159–168.
[16] L. Guo, L. Lemin, A novel survivable routing algorithm with partial
shared-risk link groups (SRLG)-disjoint protection based on
differentiated reliability constraints in WDM optical mesh
networks, IEEE Journal of Lightwave Technology 25 (6) (2007)
1410–1415.
[17] X. Cao, V. Anand, C. Qiao, Framework for waveband switching in
multigranular optical networks part I-multigranular cross-connect
architectures, Journal of Optical Networking 5 (12) (2006) 1043–
1055.
[18] M. To, P. Neusy, Unavailability analysis of long-haul networks, IEEE
Journal on Selected Areas in Communication 12 (1994) 100–109.
[19] J.-P. Vasseur, M. Pickavet, P. Demeester, Network Recovery –
Protection and Restoration of Optical, SONET-SDH, IP and MPLS,
Elsevier, San Francisco, 2004.
[20] S. Verbrugge et al., General availability model for multilayer
transport networks, in: Proceedings of DRCN, 2005, pp. 85–92.
[21] S. Sygletos, A. Tzanakaki, I. Tomkos, Numerical study of cascadability
performance of continuous spectrum wavelength blocker/selective
switch at 10/40/160 Gb/s, IEEE Photonics Technology Letters 18
(24) (2006) 2608–2610.
[22] ARM: <http://www.arm.com>.
[23] DIGI UNC90 - Datasheet: <http://www.digi.com/pdf/hwref_cc9u.pdf>.
[24] J. Corbet, A. Rubini, G. Kroah-Hartman, Linux Device Drivers, third
ed., O’Reilly Media, Sebastopol, 2005.
Luis Velasco (luis.velasco@tsc.upc.edu) received the M.Sc. degree in Telecommunications Engineering from Universidad Politécnica de Madrid (UPM), in 1989. In the same year, he joined Telefónica de España and was involved in the specifications and first office application of Telefónica’s SDH transport network. In 2003 he joined Universitat Politècnica de Catalunya (UPC), where he is currently an assistant professor. He is currently working towards the Ph.D. degree at the Optical Communications group of UPC. His research interests include signaling, routing, and resilience mechanisms and architectures in ASON/GMPLS-based networks.
Salvatore Spadaro (spadaro@tsc.upc.edu)
received the M.Sc. and the Ph.D. degrees in
Telecommunications Engineering from UPC
(Barcelona, Spain) in 2000 and 2005, respectively. He also received the M.Sc. degree in
Electrical Engineering from Politecnico di
Torino, Italy, in 2000. He is currently associate
professor in the Optical Communications
group of the Signal Theory and Communications Department of UPC. He has been
involved in international and national
research projects. He has co-authored about
60 papers in international journals and conferences. His research interests
are in the fields of all-optical networks with emphasis on traffic engineering and resilience.
Jaume Comellas (comellas@tsc.upc.edu) received
M.S. (1993) and Ph.D. (1999) degrees in
Telecommunications Engineering from UPC.
Since 1992 he has been a staff member of the
Optical Communications Research Group of
UPC. His current research interests mainly
concern optical transmission and IP over
WDM networking topics. He has participated
in different research projects funded by the
Spanish government and the European Commission. He has co-authored more than 100
research articles in national and international
journals and conferences. He is associate professor at the Signal Theory
and Communications Department of UPC.
Gabriel Junyent (junyent@tsc.upc.edu) is a
telecommunications engineer (Universidad
Politécnica de Madrid, UPM, 1973), and holds
a Ph.D. degree in communications (UPC,
1979). He has been a teaching assistant (UPC,
1973–1977), adjunct professor (UPC,1977–
1983), associate professor (UPC, 1983–1985),
and professor (UPC, 1985–1989), and has been
a full professor since 1989. In the last 15 years
he has participated in more than 30 national
and international R&D projects, and has published more than 30 journal papers and book
chapters and 100 conference papers.