PCI Express Base Specification
Revision 1.0
April 29, 2002


Revision  Revision History  Date
1.0       Initial release.  4/29/02

PCI-SIG disclaims all warranties and liability for the use of this document and the information contained herein and assumes no responsibility for any errors that may appear in this document, nor does PCI-SIG make a commitment to update the information contained herein.

Contact the PCI-SIG office to obtain the latest revision of the specification.

Questions regarding the PCI Express Base Specification or membership in PCI-SIG may be forwarded to:

Membership Services
www.pcisig.com
E-mail: administration@pcisig.com
Phone: 1-800-433-5177 (Domestic Only), 503-291-2569
Fax: 503-297-1090

Technical Support
techsupp@pcisig.com

DISCLAIMER
This draft Specification is being provided to you for review purposes pursuant to Article 15.2 of the Bylaws of PCI-SIG. This draft Specification is subject to amendment until it is officially adopted by the Board of Directors of PCI-SIG. The Board of Directors may, at its discretion, initiate additional review periods, in which case you will be notified of the same. Pursuant to Article 14 of the Bylaws, this draft Specification is to be considered PCI-SIG Confidential until adopted by the Board of Directors.

All product names are trademarks, registered trademarks, or servicemarks of their respective owners.

Copyright © 2002 PCI-SIG


PCI EXPRESS BASE SPECIFICATION, REV 1.0

Contents
PREFACE
OBJECTIVE OF THE SPECIFICATION
DOCUMENT ORGANIZATION
DOCUMENTATION CONVENTIONS
TERMS AND ABBREVIATIONS
REFERENCE DOCUMENTS
1. INTRODUCTION
1.1. A THIRD GENERATION I/O INTERCONNECT
1.2. PCI EXPRESS LINK
1.3. PCI EXPRESS FABRIC TOPOLOGY
1.3.1. Root Complex
1.3.2. Endpoints
1.3.3. Switch
1.3.4. PCI Express-PCI Bridge
1.4. PCI EXPRESS FABRIC TOPOLOGY CONFIGURATION
1.5. PCI EXPRESS LAYERING OVERVIEW
1.5.1. Transaction Layer
1.5.2. Data Link Layer
1.5.3. Physical Layer
1.5.4. Layer Functions and Services
1.6. ADVANCED PEER-TO-PEER COMMUNICATION OVERVIEW
2. TRANSACTION LAYER SPECIFICATION
2.1. TRANSACTION LAYER OVERVIEW
2.2. ADDRESS SPACES, TRANSACTION TYPES, AND USAGE
2.2.1. Memory Transactions
2.2.2. I/O Transactions
2.2.3. Configuration Transactions
2.2.4. Message Transactions
2.3. PACKET FORMAT OVERVIEW
2.4. TRANSACTION DESCRIPTOR
2.4.1. Overview
2.4.2. Transaction Descriptor – Transaction ID Field
2.4.3. Transaction Descriptor – Attributes Field
2.4.4. Transaction Descriptor – Traffic Class Field
2.5. TRANSACTION ORDERING
2.6. VIRTUAL CHANNEL (VC) MECHANISM
2.6.1. Virtual Channel Identification (VC ID)
2.6.2. VC Support Options
2.6.3. TC to VC Mapping
2.6.4. VC and TC Rules
2.7. TRANSACTION LAYER PROTOCOL - PACKET DEFINITION AND HANDLING
2.7.1. Transaction Layer Packet Definition Rules
2.7.2. TLP Digest Rules
2.7.3. TLPs with Data Payloads - Rules
2.7.4. Requests
2.7.5. Completions
2.7.6. Handling of Received TLPs
2.8. MESSAGES
2.8.1. Baseline Messages
2.8.2. Advanced Switching Support Message Group
2.9. ORDERING AND RECEIVE BUFFER FLOW CONTROL
2.9.1. Overview and Definitions
2.9.2. Flow Control Rules
2.10. DATA INTEGRITY
2.10.1. Introduction
2.10.2. ECRC Rules
2.11. ERROR FORWARDING
2.11.1. Error Forwarding Usage Model
2.11.2. Rules For Use of Data Poisoning
2.12. COMPLETION TIMEOUT MECHANISM
2.13. TRANSACTION LAYER BEHAVIOR IN DL_DOWN STATUS
2.14. TRANSACTION LAYER BEHAVIOR IN DL_UP STATUS
3. DATA LINK LAYER SPECIFICATION
3.1. DATA LINK LAYER OVERVIEW
3.2. DATA LINK CONTROL AND MANAGEMENT STATE MACHINE
3.2.1. Data Link Control and Management State Machine Rules
3.3. FLOW CONTROL INITIALIZATION PROTOCOL
3.3.1. Flow Control Initialization State Machine Rules
3.4. DATA LINK LAYER PACKETS (DLLPS)
3.4.1. Data Link Layer Packet Rules
3.5. DATA INTEGRITY
3.5.1. Introduction
3.5.2. LCRC, Sequence Number, and Retry Management (TLP Transmitter)
3.5.3. LCRC and Sequence Number (TLP Receiver)
4. PHYSICAL LAYER SPECIFICATION
4.1. INTRODUCTION
4.2. LOGICAL SUB-BLOCK
4.2.1. Symbol Encoding
4.2.2. Framing and Application of Symbols to Lanes
4.2.3. Data Scrambling
4.2.4. Link Initialization and Training
4.2.5. Link Training and Status State Machine (LTSSM)
4.2.6. Link Training and Status State Descriptions
4.2.7. Clock Tolerance Compensation
4.2.8. Compliance Pattern
4.3. ELECTRICAL SUB-BLOCK
4.3.1. Electrical Sub-Block Requirements
4.3.2. Electrical Signal Specifications
4.3.3. Differential Transmitter (Tx) Output Specifications
4.3.4. Differential Receiver (Rx) Input Specifications
5. SOFTWARE INITIALIZATION AND CONFIGURATION
5.1. CONFIGURATION TOPOLOGY
5.2. PCI EXPRESS CONFIGURATION MECHANISMS
5.2.1. PCI 2.3 Compatible Configuration Mechanism
5.2.2. PCI Express Enhanced Configuration Mechanism
5.2.3. Root Complex Register Block
5.3. CONFIGURATION TRANSACTION RULES
5.3.1. Device Number
5.3.2. Configuration Transaction Addressing
5.3.3. Configuration Request Routing Rules
5.3.4. Generating PCI Special Cycles using PCI Configuration Mechanism #1
5.4. CONFIGURATION REGISTER TYPES
5.5. PCI-COMPATIBLE CONFIGURATION REGISTERS
5.5.1. Type 0/1 Common Configuration Space
5.5.2. Type 0 Configuration Space Header
5.5.3. Type 1 Configuration Space Header
5.6. PCI POWER MANAGEMENT CAPABILITY STRUCTURE
5.7. MSI CAPABILITY STRUCTURE
5.8. PCI EXPRESS CAPABILITY STRUCTURE
5.8.1. PCI Express Capability List Register (Offset 00h)
5.8.2. PCI Express Capabilities Register (Offset 02h)
5.8.3. Device Capabilities Register (Offset 04h)
5.8.4. Device Control Register (Offset 08h)
5.8.5. Device Status Register (Offset 0Ah)
5.8.6. Link Capabilities Register (Offset 0Ch)
5.8.7. Link Control Register (Offset 10h)
5.8.8. Link Status Register (Offset 12h)
5.8.9. Slot Capabilities Register (Offset 14h)
5.8.10. Slot Control Register (Offset 18h)
5.8.11. Slot Status Register (Offset 1Ah)
5.8.12. Root Control Register (Offset 1Ch)
5.8.13. Root Status Register (Offset 20h)
5.9. PCI EXPRESS EXTENDED CAPABILITIES
5.9.1. Extended Capabilities in Configuration Space
5.9.2. Extended Capabilities in the Root Complex Register Block
5.9.3. PCI Express Enhanced Capability Header
5.10. ADVANCED ERROR REPORTING CAPABILITY
5.10.1. Advanced Error Reporting Enhanced Capability Header (Offset 00h)
5.10.2. Uncorrectable Error Status Register (Offset 04h)
5.10.3. Uncorrectable Error Mask Register (Offset 08h)
5.10.4. Uncorrectable Error Severity Register (Offset 0Ch)
5.10.5. Correctable Error Status Register (Offset 10h)
5.10.6. Correctable Error Mask (Offset 14h)
5.10.7. Advanced Error Capabilities and Control Register (Offset 18h)
5.10.8. Header Log Register (Offset 1Ch)
5.10.9. Root Error Command Register (Offset 2Ch)
5.10.10. Root Error Status Register (Offset 30h)
5.10.11. Error Source Identification Register (Offset 34h)
5.11. VIRTUAL CHANNEL CAPABILITY
5.11.1. Virtual Channel Enhanced Capability Header
5.11.2. Port VC Capability Register 1
5.11.3. Port VC Capability Register 2
5.11.4. Port VC Control Register
5.11.5. Port VC Status Register
5.11.6. VC Resource Capability Register
5.11.7. VC Resource Control Register
5.11.8. VC Resource Status Register
5.11.9. VC Arbitration Table
5.11.10. Port Arbitration Table
5.12. DEVICE SERIAL NUMBER CAPABILITY
5.12.1. Device Serial Number Enhanced Capability Header (Offset 00h)
5.12.2. Serial Number Register (Offset 04h)
5.13. POWER BUDGETING CAPABILITY
5.13.1. Power Budgeting Enhanced Capability Header (Offset 00h)
5.13.2. Data Select Register (Offset 04h)
5.13.3. Data Register (Offset 08h)
5.13.4. Power Budget Capability Register (Offset 0Ch)
6. POWER MANAGEMENT
6.1. OVERVIEW
6.1.1. Statement of Requirements
6.2. LINK STATE POWER MANAGEMENT
6.3. PCI-PM SOFTWARE COMPATIBLE MECHANISMS
6.3.1. Device Power Management States (D-States) of a Function
6.3.2. PM Software Control of the Link Power Management State
6.3.3. Power Management Event Mechanisms
6.4. NATIVE PCI EXPRESS POWER MANAGEMENT MECHANISMS
6.4.1. Active-State Power Management
6.5. AUXILIARY POWER SUPPORT
6.5.1. Auxiliary Power Enabling
6.6. POWER MANAGEMENT SYSTEM MESSAGES AND DLLPS
6.6.1. Power Management System Messages
6.6.2. Power Management DLLPs
7. PCI EXPRESS SYSTEM ARCHITECTURE
7.1. INTERRUPT SUPPORT
7.1.1. Rationale for PCI Express Interrupt Model
7.1.2. PCI Compatible INTx Emulation
7.1.3. INTx Emulation Software Model
7.1.4. Message Signaled Interrupt (MSI) Support
7.1.5. MSI Software Model
7.1.6. PME Support
7.1.7. PME Software Model
7.1.8. PME Routing Between PCI Express and PCI Hierarchies
7.2. ERROR SIGNALING AND LOGGING
7.2.1. Scope
7.2.2. Error Classification
7.2.3. Error Signaling
7.2.4. Error Logging
7.2.5. Error Listing and Rules
7.2.6. Real and Virtual PCI Bridge Error Handling
7.3. VIRTUAL CHANNEL SUPPORT
7.3.1. Introduction and Scope
7.3.2. Supported TC/VC Configurations
7.3.3. VC Arbitration
7.3.4. Isochronous Support
7.4. DEVICE SYNCHRONIZATION STOP MECHANISM
7.5. LOCKED TRANSACTIONS
7.5.1. Introduction
7.5.2. Initiation and Propagation of Locked Transactions - Rules
7.5.3. Switches and Lock - Rules
7.5.4. PCI Express/PCI Bridges and Lock - Rules
7.5.5. Root Complex and Lock - Rules
7.5.6. Legacy Endpoints
7.5.7. PCI Express Endpoints
7.6. PCI EXPRESS RESET - RULES
7.7. PCI EXPRESS NATIVE HOT PLUG SUPPORT
7.7.1. PCI Express Hot Plug Usage Model
7.7.2. Event Behavior
7.7.3. Registers Grouped by Device Association
7.7.4. Messages
7.7.5. PCI Express Hot Plug Interrupt/Wake Signal Logic
7.7.6. The Operating System Hot Plug Method
7.8. POWER BUDGETING CAPABILITY
7.8.1. System Power Budgeting Process Recommendations
7.9. SLOT POWER LIMIT CONTROL
A. ISOCHRONOUS APPLICATIONS AND SUPPORT
A.1. INTRODUCTION
A.2. ISOCHRONOUS CONTRACT AND CONTRACT PARAMETERS
A.2.1. Isochronous Time Period and Isochronous Virtual Timeslot
A.2.2. Isochronous Payload Size
A.2.3. Isochronous Bandwidth Allocation
A.2.4. Isochronous Transaction Latency
A.2.5. An Example Illustrating Isochronous Parameters
A.3. ISOCHRONOUS TRANSACTION RULES
A.4. TRANSACTION ORDERING
A.5. ISOCHRONOUS DATA COHERENCY
A.6. FLOW CONTROL
A.7. TOPOLOGY RESTRICTIONS
A.8. TRANSFER RELIABILITY
A.9. CONSIDERATIONS FOR BANDWIDTH ALLOCATION
A.9.1. Isochronous Bandwidth of PCI Express Links
A.9.2. Isochronous Bandwidth of Endpoint Devices
A.9.3. Isochronous Bandwidth of Switches
A.9.4. Isochronous Bandwidth of Root Complex
A.10. CONSIDERATIONS FOR PCI EXPRESS COMPONENTS
A.10.1. A PCI Express Endpoint Device as a Requester
A.10.2. A PCI Express Endpoint Device as a Completer
A.10.3. Switches
A.10.4. Root Complex
B. SYMBOL ENCODING
C. PHYSICAL LAYER APPENDIX
C.1. DATA SCRAMBLING


Figures
FIGURE 1-1: PCI EXPRESS LINK
FIGURE 1-2: EXAMPLE TOPOLOGY
FIGURE 1-3: LOGICAL BLOCK DIAGRAM OF A SWITCH
FIGURE 1-4: HIGH-LEVEL LAYERING DIAGRAM
FIGURE 1-5: PACKET FLOW THROUGH THE LAYERS
FIGURE 1-6: ADVANCED PEER-TO-PEER COMMUNICATION
FIGURE 2-1: LAYERING DIAGRAM HIGHLIGHTING THE TRANSACTION LAYER
FIGURE 2-2: GENERIC TRANSACTION LAYER PACKET FORMAT
FIGURE 2-3: TRANSACTION DESCRIPTOR
FIGURE 2-4: TRANSACTION ID
FIGURE 2-5: ATTRIBUTES FIELD OF TRANSACTION DESCRIPTOR
FIGURE 2-6: VIRTUAL CHANNEL CONCEPT – AN ILLUSTRATION
FIGURE 2-7: VIRTUAL CHANNEL CONCEPT – SWITCH INTERNALS (UPSTREAM FLOW)
FIGURE 2-8: AN EXAMPLE OF TC/VC CONFIGURATIONS
FIGURE 2-9: REQUEST HEADER FORMAT FOR 32B ADDRESSING OF MEMORY
FIGURE 2-10: REQUEST HEADER FORMAT FOR 64B ADDRESSING OF MEMORY
FIGURE 2-11: REQUEST HEADER FORMAT FOR I/O TRANSACTIONS
FIGURE 2-12: REQUEST HEADER FORMAT FOR CONFIGURATION TRANSACTIONS
FIGURE 2-13: REQUEST HEADER FORMAT FOR MSG REQUEST
FIGURE 2-14: REQUEST HEADER FORMAT FOR MSGD REQUEST
FIGURE 2-15: REQUEST HEADER FORMAT FOR MSGAS REQUEST
FIGURE 2-16: REQUEST HEADER FORMAT FOR MSGASD REQUEST
FIGURE 2-17: COMPLETION HEADER FORMAT
FIGURE 2-18: COMPLETER ID
FIGURE 2-19: FLOWCHART FOR HANDLING OF RECEIVED TLPS
FIGURE 2-20: FLOWCHART FOR SWITCH HANDLING OF TLPS
FIGURE 2-21: FLOWCHART FOR HANDLING OF RECEIVED REQUEST
FIGURE 2-22: INTX COLLAPSING IN A DUAL-HEADED BRIDGE
FIGURE 2-23: PAYLOAD_DEFINED MESSAGE
FIGURE 2-24: RELATIONSHIP BETWEEN REQUESTER AND ULTIMATE COMPLETER
FIGURE 2-25: CALCULATION OF 32B ECRC FOR TLP END TO END DATA INTEGRITY PROTECTION
FIGURE 3-1: LAYERING DIAGRAM HIGHLIGHTING THE DATA LINK LAYER
FIGURE 3-2: DATA LINK CONTROL AND MANAGEMENT STATE MACHINE
FIGURE 3-3: FLOWCHART DIAGRAM OF FLOW CONTROL INITIALIZATION PROTOCOL
FIGURE 3-4: DLLP TYPE AND CRC FIELDS
FIGURE 3-5: DATA LINK LAYER PACKET FORMAT FOR ACK AND NAK
FIGURE 3-6: DATA LINK LAYER PACKET FORMAT FOR INITFC1
FIGURE 3-7: DATA LINK LAYER PACKET FORMAT FOR INITFC2
FIGURE 3-8: DATA LINK LAYER PACKET FORMAT FOR UPDATEFC
FIGURE 3-9: PM DATA LINK LAYER PACKET FORMAT
FIGURE 3-10: VENDOR SPECIFIC DATA LINK LAYER PACKET FORMAT
FIGURE 3-11: DIAGRAM OF CRC CALCULATION FOR DLLPS
FIGURE 3-12: TLP WITH LCRC AND SEQUENCE NUMBER APPLIED
FIGURE 3-13: TLP FOLLOWING APPLICATION OF SEQUENCE NUMBER AND RESERVED BITS
FIGURE 3-14: CALCULATION OF LCRC
FIGURE 3-15: RECEIVED DLLP ERROR CHECK FLOWCHART
FIGURE 3-16: ACK/NAK DLLP PROCESSING FLOWCHART
FIGURE 3-17: RECEIVE DATA LINK LAYER HANDLING OF TLPS
FIGURE 4-1: HIGH LEVEL LAYERING DIAGRAM HIGHLIGHTING PHYSICAL LAYER
FIGURE 4-2: CHARACTER TO SYMBOL MAPPING
FIGURE 4-3: BIT TRANSMISSION ORDER ON PHYSICAL LANES - X1 EXAMPLE
FIGURE 4-4: BIT TRANSMISSION ORDER ON PHYSICAL LANES - X4 EXAMPLE
FIGURE 4-5: TLP WITH FRAMING SYMBOLS APPLIED
FIGURE 4-6: DLLP WITH FRAMING SYMBOLS APPLIED
FIGURE 4-7: FRAMED TLP ON A X1 LINK
FIGURE 4-8: FRAMED TLP ON A X2 LINK
FIGURE 4-9: FRAMED TLP ON A X4 LINK
FIGURE 4-10: LFSR WITH SCRAMBLING POLYNOMIAL
FIGURE 4-11: WIDTH NEGOTIATION, SIMPLIFIED STATE MACHINE, DOWNSTREAM COMPONENT (PART 1)
FIGURE 4-12: WIDTH NEGOTIATION, SIMPLIFIED STATE MACHINE, DOWNSTREAM COMPONENT (PART 2)
FIGURE 4-13: WIDTH NEGOTIATION, SIMPLIFIED STATE MACHINE, UPSTREAM COMPONENT (PART 1)
FIGURE 4-14: WIDTH NEGOTIATION, SIMPLIFIED STATE MACHINE, UPSTREAM COMPONENT (PART 2)
FIGURE 4-15: WIDTH NEGOTIATION EXAMPLE
FIGURE 4-16: LINK WIDTH NEGOTIATION; STEPS 1, 2
FIGURE 4-17: LINK WIDTH NEGOTIATION; STEPS 3, 4
FIGURE 4-18: LINK WIDTH NEGOTIATION; STEPS 5, 6
FIGURE 4-19: MAIN STATE DIAGRAM FOR LINK TRAINING AND STATUS STATE MACHINE
FIGURE 4-20: DETECT SUB-STATE MACHINE
FIGURE 4-21: POLLING SUB-STATE MACHINE
FIGURE 4-22: CONFIGURATION SUB-STATE MACHINE
FIGURE 4-23: RECOVERY SUB-STATE MACHINE
FIGURE 4-24: L0S SUB-STATE MACHINE
FIGURE 4-25: L1 SUB-STATE MACHINE
FIGURE 4-26: L2 SUB-STATE MACHINE
FIGURE 4-27: LOOPBACK STATE MACHINE
FIGURE 4-28: SAMPLE DIFFERENTIAL SIGNAL
FIGURE 4-29: SAMPLE TRANSMITTED WAVEFORM SHOWING -3.5 DB DE-EMPHASIS AROUND A 0.5 V COMMON MODE
FIGURE 4-30: A 30 KHZ BEACON SIGNALING THROUGH A 75 NF CAPACITOR
FIGURE 4-31: BEACON, WHICH INCLUDES A 2 NS PULSE THROUGH A 75 NF CAPACITOR


FIGURE 4-32: MINIMUM TRANSMITTER TIMING AND VOLTAGE OUTPUT COMPLIANCE SPECIFICATION
FIGURE 4-33: COMPLIANCE TEST/MEASUREMENT LOAD
FIGURE 4-34: MINIMUM RECEIVER EYE TIMING AND VOLTAGE COMPLIANCE SPECIFICATION
FIGURE 5-1: PCI EXPRESS ROOT COMPLEX DEVICE MAPPING
FIGURE 5-2: PCI EXPRESS SWITCH DEVICE MAPPING
FIGURE 5-3: PCI EXPRESS CONFIGURATION SPACE LAYOUT
FIGURE 5-4: COMMON CONFIGURATION SPACE HEADER
FIGURE 5-5: TYPE 0 CONFIGURATION SPACE HEADER
FIGURE 5-6: TYPE 1 CONFIGURATION SPACE HEADER
FIGURE 5-7: PCI POWER MANAGEMENT CAPABILITY STRUCTURE
FIGURE 5-8: POWER MANAGEMENT CAPABILITIES
FIGURE 5-9: POWER MANAGEMENT STATUS/CONTROL
FIGURE 5-10: PCI EXPRESS CAPABILITY STRUCTURE
FIGURE 5-11: PCI EXPRESS CAPABILITY LIST REGISTER
FIGURE 5-12: PCI EXPRESS CAPABILITIES REGISTER
FIGURE 5-13: DEVICE CAPABILITIES REGISTER
FIGURE 5-14: DEVICE CONTROL REGISTER
FIGURE 5-15: DEVICE STATUS REGISTER
FIGURE 5-16: LINK CAPABILITIES REGISTER
FIGURE 5-17: LINK CONTROL REGISTER
FIGURE 5-18: LINK STATUS REGISTER
FIGURE 5-19: SLOT CAPABILITIES REGISTER
FIGURE 5-20: SLOT CONTROL REGISTER
FIGURE 5-21: SLOT STATUS REGISTER
FIGURE 5-22: ROOT CONTROL REGISTER
FIGURE 5-23: ROOT STATUS REGISTER
FIGURE 5-24: PCI EXPRESS EXTENDED CONFIGURATION SPACE LAYOUT
FIGURE 5-25: PCI EXPRESS ENHANCED CAPABILITY HEADER
FIGURE 5-26: PCI EXPRESS ADVANCED ERROR REPORTING EXTENDED CAPABILITY STRUCTURE
FIGURE 5-27: ADVANCED ERROR REPORTING ENHANCED CAPABILITY HEADER
FIGURE 5-28: UNCORRECTABLE ERROR STATUS REGISTER
FIGURE 5-29: UNCORRECTABLE ERROR MASK REGISTER
FIGURE 5-30: UNCORRECTABLE ERROR SEVERITY REGISTER
FIGURE 5-31: CORRECTABLE ERROR STATUS REGISTER
FIGURE 5-32: CORRECTABLE ERROR MASK REGISTER
FIGURE 5-33: ADVANCED ERROR CAPABILITIES AND CONTROL REGISTER
FIGURE 5-34: HEADER LOG REGISTER
FIGURE 5-35: ROOT ERROR COMMAND REGISTER
FIGURE 5-36: ROOT ERROR STATUS REGISTER
FIGURE 5-37: ERROR SOURCE IDENTIFICATION REGISTER
FIGURE 5-38: PCI EXPRESS VIRTUAL CHANNEL CAPABILITY STRUCTURE
FIGURE 5-39: VIRTUAL CHANNEL ENHANCED CAPABILITY HEADER
FIGURE 5-40: PORT VC CAPABILITY REGISTER 1


FIGURE 5-41: PORT VC CAPABILITY REGISTER 2
FIGURE 5-42: PORT VC CONTROL REGISTER
FIGURE 5-43: PORT VC STATUS REGISTER
FIGURE 5-44: VC RESOURCE CAPABILITY REGISTER
FIGURE 5-45: VC RESOURCE CONTROL REGISTER
FIGURE 5-46: VC RESOURCE STATUS REGISTER
FIGURE 5-47: STRUCTURE OF AN EXAMPLE VC ARBITRATION TABLE WITH 32 PHASES
FIGURE 5-48: EXAMPLE PORT ARBITRATION TABLE WITH 128 PHASES AND 2-BIT TABLE ENTRIES
FIGURE 5-49: PCI EXPRESS DEVICE SERIAL NUMBER CAPABILITY STRUCTURE
FIGURE 5-50: DEVICE SERIAL NUMBER ENHANCED CAPABILITY HEADER
FIGURE 5-51: SERIAL NUMBER REGISTER
FIGURE 5-52: PCI EXPRESS POWER BUDGETING CAPABILITY STRUCTURE
FIGURE 5-53: POWER BUDGETING ENHANCED CAPABILITY HEADER
FIGURE 5-54: POWER BUDGETING DATA REGISTER
FIGURE 5-55: POWER BUDGET CAPABILITY REGISTER
FIGURE 6-1: LINK POWER MANAGEMENT STATE TRANSITIONS
FIGURE 6-2: ENTRY INTO L1 LINK STATE
FIGURE 6-3: EXIT FROM L1 LINK STATE INITIATED BY UPSTREAM COMPONENT
FIGURE 6-4: A CONCEPTUAL PME CONTROL STATE MACHINE
FIGURE 6-5: L1 TRANSITION SEQUENCE ENDING WITH A REJECTION
FIGURE 6-6: L1 SUCCESSFUL TRANSITION SEQUENCE
FIGURE 6-7: EXAMPLE OF L1 EXIT LATENCY COMPUTATION
FIGURE 6-8: EXAMPLE OF PME MESSAGE ADDRESSING IN A PCI EXPRESS-TO-PCI BRIDGE
FIGURE 7-1: ERROR CLASSIFICATION
FIGURE 7-2: AN EXAMPLE OF SYMMETRICAL TC TO VC MAPPING
FIGURE 7-3: AN EXAMPLE OF ASYMMETRICAL TC TO VC MAPPING
FIGURE 7-4: AN EXAMPLE OF TRAFFIC FLOW ILLUSTRATING INGRESS AND EGRESS
FIGURE 7-5: AN EXAMPLE OF DIFFERENTIATED TRAFFIC FLOW THROUGH A SWITCH
FIGURE 7-6: SWITCH ARBITRATION STRUCTURE
FIGURE 7-7: VC ID AND PRIORITY ORDER – AN EXAMPLE
FIGURE 7-8: HOT PLUG LOGIC
FIGURE A-1: AN EXAMPLE SHOWING ENDPOINT-TO-ROOT-COMPLEX AND PEER-TO-PEER COMMUNICATION MODELS
FIGURE A-2: TWO BASIC BANDWIDTH RESOURCING PROBLEMS: OVER-SUBSCRIPTION AND CONGESTION
FIGURE A-3: A SIMPLIFIED EXAMPLE ILLUSTRATING PCI EXPRESS ISOCHRONOUS PARAMETERS
FIGURE A-4: AN EXAMPLE OF PCI EXPRESS TOPOLOGY SUPPORTING ISOCHRONOUS APPLICATIONS
FIGURE C-1: SCRAMBLING SPECTRUM FOR DATA VALUE OF 0


Tables

TABLE 2-1: TRANSACTION TYPES FOR DIFFERENT ADDRESS SPACES
TABLE 2-2: ORDERING ATTRIBUTES
TABLE 2-3: CACHE COHERENCY MANAGEMENT ATTRIBUTE
TABLE 2-4: DEFINITION OF TC FIELD ENCODINGS
TABLE 2-5: ORDERING RULES SUMMARY TABLE
TABLE 2-6: TC TO VC MAPPING EXAMPLE
TABLE 2-7: TD AND EP FIELD VALUES
TABLE 2-8: FMT[1:0] AND TYPE[4:0] FIELD ENCODINGS
TABLE 2-9: MESSAGE ROUTING
TABLE 2-10: MSG CODES
TABLE 2-11: MSGD CODES
TABLE 2-12: SWITCH MAPPING FOR INTX
TABLE 2-13: POWER MANAGEMENT SYSTEM MESSAGES
TABLE 2-14: ERROR MESSAGES
TABLE 2-15: HOT PLUG SIGNALING MESSAGES
TABLE 2-16: FLOW CONTROL CREDIT TYPES
TABLE 2-17: TLP FLOW CONTROL CREDIT CONSUMPTION
TABLE 2-18: MINIMUM FLOW CONTROL ADVERTISEMENTS
TABLE 2-19: UPDATEFC TRANSMISSION LATENCY GUIDELINES BY LINK WIDTH AND MAX PAYLOAD (SYMBOL TIMES)
TABLE 2-20: MAPPING OF BITS INTO ECRC FIELD
TABLE 3-1: DLLP TYPE ENCODINGS
TABLE 3-2: MAPPING OF BITS INTO CRC FIELD
TABLE 3-3: MAPPING OF BITS INTO LCRC FIELD
TABLE 3-4: REPLAY_TIMER LIMITS BY LINK WIDTH AND MAX_PAYLOAD_SIZE (SYMBOL TIMES) TOLERANCE: -0% / +100%
TABLE 3-5: ACK TRANSMISSION LATENCY LIMIT AND ACKFACTOR BY LINK WIDTH AND MAX PAYLOAD (SYMBOL TIMES)
TABLE 4-1: SPECIAL SYMBOLS
TABLE 4-2: TS1 ORDERED-SET
TABLE 4-3: TS2 ORDERED-SET
TABLE 4-4: DIFFERENTIAL TRANSMITTER (TX) OUTPUT SPECIFICATIONS
TABLE 4-5: DIFFERENTIAL RECEIVER (RX) INPUT SPECIFICATIONS
TABLE 5-1: CONFIGURATION ADDRESS MAPPING
TABLE 5-2: REGISTER (AND REGISTER BIT-FIELD) TYPES
TABLE 5-3: COMMAND REGISTER
TABLE 5-4: STATUS REGISTER
TABLE 5-5: SECONDARY STATUS REGISTER
TABLE 5-6: BRIDGE CONTROL REGISTER
TABLE 5-7: POWER MANAGEMENT CAPABILITIES
TABLE 5-8: POWER MANAGEMENT STATUS/CONTROL
TABLE 5-9: PCI EXPRESS CAPABILITY LIST REGISTER
TABLE 5-10: PCI EXPRESS CAPABILITIES REGISTER


TABLE 5-11: DEVICE CAPABILITIES REGISTER
TABLE 5-12: DEVICE CONTROL REGISTER
TABLE 5-13: DEVICE STATUS REGISTER
TABLE 5-14: LINK CAPABILITIES REGISTER
TABLE 5-15: LINK CONTROL REGISTER
TABLE 5-16: LINK STATUS REGISTER
TABLE 5-17: SLOT CAPABILITIES REGISTER
TABLE 5-18: SLOT CONTROL REGISTER
TABLE 5-19: SLOT STATUS REGISTER
TABLE 5-20: ROOT CONTROL REGISTER
TABLE 5-21: ROOT STATUS REGISTER
TABLE 5-22: PCI EXPRESS ENHANCED CAPABILITY HEADER
TABLE 5-23: ADVANCED ERROR REPORTING ENHANCED CAPABILITY HEADER
TABLE 5-24: UNCORRECTABLE ERROR STATUS REGISTER
TABLE 5-25: UNCORRECTABLE ERROR MASK REGISTER
TABLE 5-26: UNCORRECTABLE ERROR SEVERITY REGISTER
TABLE 5-27: CORRECTABLE ERROR STATUS REGISTER
TABLE 5-28: CORRECTABLE ERROR MASK REGISTER
TABLE 5-29: ADVANCED ERROR CAPABILITIES REGISTER
TABLE 5-30: HEADER LOG REGISTER
TABLE 5-31: ROOT ERROR COMMAND REGISTER
TABLE 5-32: ROOT ERROR STATUS REGISTER
TABLE 5-33: ERROR SOURCE IDENTIFICATION REGISTER
TABLE 5-34: VIRTUAL CHANNEL ENHANCED CAPABILITY HEADER
TABLE 5-35: PORT VC CAPABILITY REGISTER 1
TABLE 5-36: PORT VC CAPABILITY REGISTER 2
TABLE 5-37: PORT VC CONTROL REGISTER
TABLE 5-38: PORT VC STATUS REGISTER
TABLE 5-39: VC RESOURCE CAPABILITY REGISTER
TABLE 5-40: VC RESOURCE CONTROL REGISTER
TABLE 5-41: VC RESOURCE STATUS REGISTER
TABLE 5-42: DEFINITION OF THE 4-BIT ENTRIES IN THE VC ARBITRATION TABLE
TABLE 5-43: LENGTH OF THE VC ARBITRATION TABLE
TABLE 5-44: LENGTH OF PORT ARBITRATION TABLE
TABLE 5-45: DEVICE SERIAL NUMBER ENHANCED CAPABILITY HEADER
TABLE 5-46: SERIAL NUMBER REGISTER
TABLE 5-47: POWER BUDGETING ENHANCED CAPABILITY HEADER
TABLE 5-48: POWER BUDGETING DATA REGISTER
TABLE 5-49: POWER BUDGET CAPABILITY REGISTER
TABLE 6-1: SUMMARY OF PCI EXPRESS LINK POWER MANAGEMENT STATES
TABLE 6-2: RELATION BETWEEN POWER MANAGEMENT STATES OF LINK AND COMPONENTS
TABLE 6-3: ENCODING OF THE ACTIVE STATE LINK PM SUPPORT FIELD
TABLE 6-4: DESCRIPTION OF THE SLOT CLOCK CONFIGURATION FIELD
TABLE 6-5: DESCRIPTION OF THE COMMON CLOCK CONFIGURATION FIELD
TABLE 6-6: ENCODING OF THE L0S EXIT LATENCY FIELD


TABLE 6-7: ENCODING OF THE L1 EXIT LATENCY FIELD
TABLE 6-8: ENCODING OF THE ENDPOINT L0S ACCEPTABLE LATENCY FIELD
TABLE 6-9: ENCODING OF THE ENDPOINT L1 ACCEPTABLE LATENCY FIELD
TABLE 6-10: ENCODING OF THE ACTIVE STATE LINK PM CONTROL FIELD
TABLE 6-11: POWER MANAGEMENT SYSTEM MESSAGES AND DLLPS
TABLE 7-1: ERROR MESSAGES
TABLE 7-2: PHYSICAL LAYER ERROR LIST
TABLE 7-3: DATA LINK LAYER ERROR LIST
TABLE 7-4: TRANSACTION LAYER ERROR LIST
TABLE 7-5: ELEMENTS OF THE STANDARD USAGE MODEL
TABLE 7-6: ATTENTION INDICATOR STATES
TABLE 7-7: POWER INDICATOR STATES
TABLE 7-8: EVENT BEHAVIOR
TABLE A-1: ISOCHRONOUS BANDWIDTH RANGES AND GRANULARITIES
TABLE A-2: MAXIMUM NUMBER OF VIRTUAL TIMESLOTS ALLOWED FOR DIFFERENT PCI EXPRESS LINKS AT 2.5 GHZ
TABLE B-1: 8B/10B DATA SYMBOL CODES
TABLE B-2: 8B/10B SPECIAL CHARACTER SYMBOL CODES




Preface

Traditional multi-drop, parallel bus technology is approaching its practical performance limits. It is clear that balancing system performance requires I/O bandwidth to scale with processing and application demands. There is an industry mandate to re-engineer I/O connectivity within cost constraints. PCI Express addresses the many I/O requirements presented across the spectrum of computing and communications platforms, and rolls them into a common scalable and extensible I/O industry specification. Alongside these increasing performance demands, the enterprise server and communications markets need improved reliability, security, and quality of service guarantees. This specification will therefore be applicable to multiple market segments.

Technology advances in high-speed, point-to-point interconnects enable us to break away from the bandwidth limitations of multi-drop, parallel buses. The basic PCI Express physical layer consists of a differential transmit pair and a differential receive pair. Dual simplex data on these point-to-point connections is self-clocked, and its bandwidth increases linearly with interconnect width and frequency. PCI Express takes the additional step of including a message space within its bus protocol that is used to implement legacy “sideband” signals. This further reduction of signal pins produces a very low pin count connection for components and adapters. The PCI Express Transaction, Data Link, and Physical Layers are optimized for chip-to-chip and board-to-board interconnect applications.

An inherent limitation of today’s PCI-based platforms is the lack of support for isochronous data delivery, an attribute that is especially important to streaming media applications. To enable these emerging applications, PCI Express adds a virtual channel mechanism. In addition to supporting isochronous traffic, the virtual channel mechanism provides an infrastructure for future extensions that support new applications. Because PCI Express adheres to the PCI Software Model, today’s applications migrate easily even as emerging applications are enabled.

Key PCI Express architectural attributes include:

• Continuation of the PCI Software Model
• Serial, differential, low-voltage signaling
• Layered architecture enabling physical layer attachment to copper, optical, or emerging physical signaling media
• Predictable, low latency suitable for applications requiring isochronous data delivery
• Robust data integrity and error handling in support of highly reliable systems
• Embedded clocking scheme using 8 bit/10 bit encoding
• High bandwidth per pin
• Bandwidth scalability through Lane width and frequency
• Hot attach and detach capability
• Aggressive power management capabilities
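The claims that bandwidth scales linearly with Lane width and frequency, and that clocking is embedded through 8 bit/10 bit encoding, can be made concrete with a little arithmetic. The sketch below is illustrative only and not part of this specification: it assumes a raw signaling rate of 2.5 Gb/s per Lane in each direction (consistent with the 2.5 GHz figure this document quotes as reference information), and the helper names `unit_interval_ps` and `link_bytes_per_second` are hypothetical.

```python
"""Rough per-direction data bandwidth of a PCI Express Link (illustrative sketch)."""

# Assumption: each Lane signals at 2.5 Gb/s in each direction (consistent with
# the 2.5 GHz figure quoted as reference information in this document).
RAW_RATE_BPS = 2_500_000_000


def unit_interval_ps(rate_bps: int = RAW_RATE_BPS) -> float:
    """Duration of one transmitted bit (one unit interval) in picoseconds."""
    return 10**12 / rate_bps


def link_bytes_per_second(lanes: int, rate_bps: int = RAW_RATE_BPS) -> int:
    """Usable data bytes per second in ONE direction of an xN Link.

    8b/10b encoding sends every 8-bit Character as a 10-bit Symbol, so each
    Lane carries one byte of data per 10 bit times. The result grows linearly
    with both the number of Lanes and the signaling rate. Packet framing,
    headers, and DLLP traffic are ignored, so real throughput is lower.
    """
    if lanes < 1:
        raise ValueError("a Link has at least one Lane")
    return lanes * rate_bps // 10


if __name__ == "__main__":
    print(f"unit interval: {unit_interval_ps():.0f} ps")
    for width in (1, 4, 16):
        mb_per_s = link_bytes_per_second(width) / 1_000_000
        print(f"x{width}: {mb_per_s:.0f} MB/s per direction")
```

Under these assumptions an x1 Link carries 250 MB/s of data in each direction and an x4 Link 1000 MB/s. Because the Link is dual simplex, both directions carry that rate at the same time. The helper does not check whether a given Lane count is a width this specification actually permits.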


Objective of the Specification

This specification describes the PCI Express architecture, interconnect attributes, bus management, and the programming interface required to design and build systems and peripherals that are compliant with the PCI Express specification.

The goal is to enable such devices from different vendors to inter-operate in an open architecture. The specification is intended as an enhancement to the PCI architecture spanning multiple market segments: Clients (Desktops and Mobile), Servers (Standard and Enterprise), and Embedded and Communication devices. The specification allows system OEMs and peripheral developers adequate room for product versatility and market differentiation without the burden of carrying obsolete interfaces or losing compatibility.

Document Organization

The PCI Express specification is organized as a Base Specification and a set of companion documents. At this time, the PCI Express Base Specification and the PCI Express Card Electromechanical Specification are being published. As the PCI Express definition evolves, other companion documents will be published.

The PCI Express Base Specification contains the technical details of the architecture, protocol, Link Layer, Physical Layer, and software interface. The PCI Express Base Specification applies to all PCI Express implementations.

The PCI Express Card Electromechanical Specification focuses on the information necessary for implementing an evolutionary strategy with the current PCI desktop/server mechanicals as well as electricals. The mechanical chapters of that specification contain the definition of evolutionary PCI Express card edge connectors, while the electrical chapters cover auxiliary signals, power delivery, and the add-in card interconnect electrical budget.


Documentation Conventions

Capitalization

Some terms are capitalized to distinguish their definition in the context of this document from their common English meaning. Words not capitalized have their common English meaning. When terms such as “memory write” or “memory read” appear completely in lower case, they include all transactions of that type.

Register names and the names of fields and bits in registers and headers are presented with the first letter capitalized and the remainder in lower case.

Numbers and Number Bases

Hexadecimal numbers are written with a lower case “h” suffix, e.g., 0FFFFh and 80h. Hexadecimal numbers larger than four digits are represented with a space dividing each group of four digits, as in 1E FFFF FFFFh. Binary numbers are written with a lower case “b” suffix, e.g., 1001b and 10b. Binary numbers larger than four digits are written with a space dividing each group of four digits, as in 1000 0101 0010b.

All other numbers are decimal.

Reference Information

Reference information is provided in various places to assist the reader and does not represent a requirement of this document. Such references are indicated by the abbreviation “(ref).” For example, in some places, a clock that is specified to have a minimum period of 400 ps also includes the reference information maximum clock frequency of “2.5 GHz (ref).” Requirements of other specifications also appear in various places throughout this document and are marked as reference information. Every effort has been made to ensure that this information accurately reflects the referenced document; however, in case of a discrepancy, the original document takes precedence.

Implementation Notes

Implementation Notes should not be considered to be part of this specification. They are included for clarification and illustration only. Implementation Notes within this document are enclosed in a box and set apart from other text.
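The rules under Numbers and Number Bases (an “h” or “b” suffix, with spaces grouping digits in fours) are simple enough to apply mechanically, for example when copying values from this document into a test script. The sketch below is a hypothetical helper, `parse_spec_number`, written under the assumption that hexadecimal digits appear in upper case as in the examples above. It is not defined by this specification.

```python
"""Read numbers written in this document's notation (illustrative sketch)."""


def parse_spec_number(text: str) -> int:
    """Convert a number such as '80h', '1E FFFF FFFFh', '1001b', or '400' to an int.

    The spaces that split long numbers into groups of four digits are removed
    first. A trailing lower-case 'h' selects base 16, a trailing lower-case 'b'
    selects base 2, and anything else is read as decimal. Because the 'h'
    suffix is checked first, a hex value whose last digit is B (e.g. '0Bh')
    is still read as hexadecimal.
    """
    digits = text.strip().replace(" ", "")
    if digits.endswith("h"):
        return int(digits[:-1], 16)
    if digits.endswith("b"):
        return int(digits[:-1], 2)
    return int(digits, 10)


if __name__ == "__main__":
    for sample in ("0FFFFh", "80h", "1E FFFF FFFFh", "1001b", "1000 0101 0010b", "400"):
        print(f"{sample!r:>20} -> {parse_spec_number(sample)}")
```

The helper rejects malformed input (for example a binary value containing the digit 2) by letting `int` raise `ValueError`, rather than guessing at the intended base.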


Terms and Abbreviations
8b/10b: The data encoding scheme¹ used in the PCI Express Physical Layer.
Advertise (Credits): The term Advertise is used in the context of Flow Control to refer to the act of a Receiver sending information regarding its Flow Control Credit availability by using a Flow Control Update Message.
asserted: The active logical state of a conceptual or actual signal.
attribute: Transaction handling preferences indicated by specified Packet header bits and fields (for example, no-snoop).
core features: A set of required features that must be supported by a device for it to be considered compliant with the PCI Express Specification.
Beacon: A 30 kHz–500 MHz signal used to exit L2.
Bridge: A device which virtually or actually connects a PCI/PCI-X segment or PCI Express Port with an internal component interconnect or another PCI/PCI-X segment or PCI Express Port. A Bridge must include a software configuration interface as described in this document.
x8: Refers to a Link or Port with eight Physical Lanes.
x1: Refers to a Link or Port with one Physical Lane.
xN: Refers to a Link with “N” Physical Lanes.
Character: An 8-bit quantity treated as an atomic entity; a Byte.
cold reset: A “Power Good Reset” following the application of power.
Completer: The logical device addressed by a Request.
Completer ID: The combination of a Completer's Bus Number, Device Number, and Function Number which uniquely identifies the Completer of the Request.
Completion: A Packet used to terminate, or to partially terminate, a Sequence is referred to as a Completion. A Completion always corresponds to a preceding Request, and in some cases includes data.
Configuration Space: One of the four address spaces within the PCI Express architecture. Packets with a Configuration Space address are used to configure a device.
conventional PCI: Protocol conforming to the PCI Local Bus Specification, Rev. 2.3.
component: A physical device (a single package).
Data Link Layer: The intermediate layer of the PCI Express architecture that sits between the Transaction Layer and the Physical Layer.
DLLP or Data Link Layer Packet: A Packet generated in the Data Link Layer to support Link management functions.

¹ Widmer and Franaszek, “A DC-Balanced, Partitioned-Block 8B/10B Transmission Code,” IBM Journal of Research and Development, Vol. 27, No. 5, Sept. 1983.


Data Payload: Some Packets include information following the header that is destined for consumption by the logical device receiving the Packet (for example, Write Requests or Read Completions). This information is called a Data Payload.
deasserted: The inactive logical state of a conceptual or actual signal.
device: A logical device, corresponding to a PCI device configuration space. May be used to refer to either a single-function or a multi-function device.
Downstream: Downstream refers either to the relative position of an interconnect/system element (Link/device) as something that is farther from the Root Complex, or to a direction of information flow, i.e., when information is flowing away from the Root Complex. The Ports on a Switch which are not the Upstream Port are Downstream Ports. All Ports on a Root Complex are Downstream Ports. The Downstream component on a Link is the component farther from the Root Complex.
DFT: Acronym for Design for Testability.
DWORD, DW: Four bytes of data on a naturally aligned four-byte boundary (i.e., the least significant two bits of the address are 00b).
egress: Refers to direction. Means outgoing, i.e., transmitting direction.
Egress Port: Transmitting port, i.e., the port that sends outgoing traffic. Typically used as a reference to the role that port of the Switch has in the context of a transaction or more broadly in the context of traffic flow.
Electrical Idle: The state of the output driver in which both lines, D+ and D-, are driven to the DC common mode voltage.
Electrical Idle Exit: The event in which a Receiver currently in Electrical Idle detects a signal at its input port.
Endpoint: A PCI Express device with a Type 00h Configuration Space header.
Error Recovery, Error Detection: Refers to the mechanisms for ensuring integrity of data transfer, including the management of the transmit side retry buffer(s).
Flow Control: A method for communicating receive buffer information from a Receiver to a Transmitter to prevent receive buffer overflow and allow Transmitter compliance with ordering rules.
FCP or Flow Control Packet: A DLLP used to send Flow Control information from the Transaction Layer in one component to the Transaction Layer in another component.
function: A logical function corresponding to a PCI function configuration space. May be used to refer to one function of a multi-function device, or to the only function in a single-function device.
header: A set of fields that appear at the front of a Packet and contain the information required to determine the characteristics and purpose of the Packet.
Hierarchy: The Hierarchy defines the I/O interconnect topology supported by the PCI Express Architecture.


Hierarchy Domain: A PCI Express Hierarchy is segmented into multiple fragments by a Root Complex that sources more than one PCI Express interface. These sub-hierarchies are called Hierarchy Domains.
Host Bridge: A Host Bridge is the part of a Root Complex that connects a host CPU or CPUs to a PCI Express Hierarchy.
hot reset: A reset propagated in-band across a Link using a Physical Layer mechanism.
ingress: Refers to direction. Means incoming, i.e., receiving direction.
Ingress Port: Receiving port, i.e., the port that accepts incoming traffic. Typically used as a reference to the role that port of the Switch has in the context of a transaction or more broadly in the context of traffic flow.
I/O Space: One of the four address spaces of the PCI Express architecture. Identical to the I/O space defined in PCI.
isochronous: Refers to data associated with time-sensitive applications, such as audio or video applications.
invariant: An invariant field of a TLP Header contains a value which cannot legally be modified as the TLP flows through the PCI Express fabric.
Lane: A set of differential signal pairs, one pair for transmission and one pair for reception. A by-N Link is composed of N Lanes.
Layer: Unit of distinction applied to the PCI Express Specification to clarify the behavior of key elements of the interface. The use of the term Layer is not intended to imply a specific implementation.
Link: A dual-simplex communications path between two components; the collection of two Ports and their interconnecting Lanes.
LinkUp: Status from the Physical Layer to the Data Link Layer indicating that both ends of the Link are connected.
Logical Bus: The logical connection among a collection of devices that have the same bus number in Configuration Space.
logical device: An element of a PCI Express system that responds to a unique device number in Configuration Space. As for physical devices in PCI 2.3, logical devices either include a single function or are multi-function devices. Furthermore, the term “logical device” is often used when describing requirements that apply individually to all functions within the logical device. Unless otherwise specified, logical device requirements in this specification apply to single-function logical devices and to each function individually of a multi-function logical device.
Logical Idle: A period of one or more Symbol Times when no information (TLPs, DLLPs, or any special Symbol) is being transmitted or received. Unlike Electrical Idle, during Logical Idle the idle Character is being transmitted and received.
Malformed Packet: A TLP which violates TLP formation rules.
Memory Space: One of the four address spaces of the PCI Express architecture. Identical to the memory space defined in PCI.
Message: A Packet with a Message Space type.


Message Signaled Interrupt, MSI: An optional feature that enables a device to request service by writing a system-specified DW of data to a system-specified address using a Memory Write semantic Request.
Message Space: One of the four address spaces of the PCI Express architecture.
naturally aligned: Used in reference to a data payload whose length (L) is some power of two; indicates that the starting address of the data payload equals an integer multiple of L.
Packet: A fundamental unit of information transfer consisting of a header that, in some cases, is followed by a Data Payload.
PCI bus: The PCI Local Bus, as specified in the PCI 2.3 and PCI-X 1.0a specifications.
PCI Software Model: The software model necessary to initialize, discover, configure, and use a PCI device, as specified in the PCI 2.3, PCI-X 1.0a, and PCI BIOS specifications.
Phantom Function Number, PFN: An unclaimed function number that may be used to expand the number of outstanding transaction identifiers by logically combining the PFN with the Tag identifier to create a unique transaction identification tuple.
Physical Lane: See Lane.
Physical Layer: The layer of the PCI Express architecture that directly interacts with the communication medium between the two components.
Port: In a logical sense, an interface associated with a component, between that component and a PCI Express Link. In physical terms, a group of transmitters and receivers physically located on the same chip that define a Link.
PPM: Parts per Million. Applied to frequency, this is the difference between some stated ideal frequency and the measured long-term average of a frequency, expressed in millionths of the ideal frequency.
QWORD, QW: Sixty-four bits (eight bytes) of data on a naturally aligned eight-byte boundary (i.e., the least significant three bits of the address are 000b).
Receiver: The component receiving Packet information across a Link.
Receiving Port: A Port on which a Packet is received.
reserved: The contents, states, or information are not defined at this time. Using any reserved area (for example, packet header bit-fields, configuration register bits) in the PCI Express Specification is not permitted. Any use of the reserved areas of the PCI Express Specification will result in a product that is not PCI Express-compliant. The functionality of any such product cannot be guaranteed in this or any future revision of the PCI Express Specification.
Request: A Packet used to initiate a Sequence is referred to as a Request. A Request includes some operation code and, in some cases, address and length, data, or other information.
Requester: A logical device that first introduces a Sequence into the PCI Express domain.


Requester ID: The combination of a Requester's Bus Number, Device Number, and Function Number that uniquely identifies the Requester. In most cases, a PCI Express bridge or Switch forwards Requests from one interface to another without modifying the Requester ID. A bridge from a bus other than PCI Express (including a PCI bus operating in conventional mode) must store the Requester ID for use when creating a Completion for the Request.
Root Complex: An entity that includes a Host Bridge and one or more Root Ports.
Root Port: A PCI Express Port, on a Root Complex, that maps a portion of the PCI Express interconnect Hierarchy through an associated virtual PCI-PCI Bridge.
Sequence: A single Request and zero or more Completions associated with carrying out a single logical transfer by a Requester.
Standard Hot-Plug Controller (SHPC): A PCI hot-plug controller compliant with SHPC 1.0.
Split Transaction: A single logical transfer containing an initial transaction (the Split Request) that the target (the completer or a bridge) terminates with a Split Response, followed by one or more transactions (the Split Completions) initiated by the completer (or bridge) to send the read data (if a read) or a completion message back to the requester.
Switch: A Switch connects two or more Ports to allow Packets to be routed from one Port to another. To configuration software, a Switch presents the appearance of an assemblage of PCI-to-PCI Bridges.
Symbol: A 10-bit quantity produced as the result of 8b/10b encoding.
Symbol Time: The period of time required to place a Symbol on a Lane (ten times the Unit Interval).
Tag: A number assigned to a given Non-posted Request to distinguish Completions for that Request from other Requests.
TBD: To be defined by PCI-SIG.
Transaction Descriptor: An element of a Packet header that, in addition to Address, Length, and Type, describes the properties of the Transaction.
TLP or Transaction Layer Packet: A Packet generated in the Transaction Layer to convey a Request or Completion.
Transaction Layer: The outermost layer of the PCI Express architecture that operates at the level of transactions (for example, read, write).
Transceiver: The physical transmitter and receiver pair on a single chip.
Transmitter: The component sending Packet information across a Link.
Unsupported Request, UR: A Request Packet that specifies some action or access to some space that is not supported by the Target.


Unit Interval, UI: Given a data stream of a 1010… pattern, the Unit Interval is the value measured by averaging the time interval between voltage transitions, over a time interval long enough to make all intentional frequency modulation of the source clock negligible.
Upstream: Upstream refers either to the relative position of an interconnect/system element (Link/device) as something that is closer to the Root Complex, or to a direction of information flow, i.e., when information is flowing towards the Root Complex. The Port on a Switch which is topologically closest to the Root Complex is the Upstream Port. The Port on an Endpoint or Bridge component is an Upstream Port. The Upstream component on a Link is the component closer to the Root Complex.
variant: A variant field of a TLP Header contains a value which is subject to possible modification according to the rules of this specification as the TLP flows through the PCI Express fabric.
warm reset: A reset caused by driving “Power Good” inactive and then active, but without cycling the supplied power.

Reference Documents
• PCI Express Card Electromechanical Specification, Rev. 1.0
• PCI Local Bus Specification, Rev. 2.3
• PCI-X Addendum to the PCI Local Bus Specification, Rev. 1.0a
• PCI Hot-Plug Specification, Rev. 1.1
• PCI Standard Hot-Plug Controller and Subsystem Specification, Rev. 1.0
• PCI-to-PCI Bridge Architecture Specification, Rev. 1.1
• PCI Power Management Interface Specification, Rev. 1.1
• Advanced Configuration and Power Interface Specification, Rev. 2.0
• Guidelines for 64-bit Global Identifier (EUI-64) Registration Authority
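The Requester ID, Tag, Sequence, and Completion definitions in the glossary combine into a simple matching rule: a Requester can route each Completion back to its Request using the pair (Requester ID, Tag). The sketch below assumes the conventional PCI split of 8 Bus, 5 Device, and 3 Function Number bits, which this glossary does not state. `pack_id` and `PendingRequests` are hypothetical names, and Tag-field width limits are deliberately not modelled.

```python
from dataclasses import dataclass, field


def pack_id(bus: int, device: int, function: int) -> int:
    """Pack Bus, Device, and Function Numbers into a 16-bit ID (8, 5, and 3 bits)."""
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("Bus, Device, or Function Number out of range")
    return (bus << 8) | (device << 3) | function


@dataclass
class PendingRequests:
    """Hypothetical Requester-side table of Non-posted Requests awaiting Completions."""
    entries: dict = field(default_factory=dict)

    def issue(self, requester_id: int, tag: int, note: str) -> None:
        key = (requester_id, tag)
        if key in self.entries:
            raise RuntimeError(f"Tag {tag} is still outstanding")
        self.entries[key] = note

    def complete(self, requester_id: int, tag: int, final: bool = True) -> str:
        """Match a Completion to its Request; only a final Completion frees the Tag."""
        key = (requester_id, tag)
        if key not in self.entries:
            raise KeyError("Completion does not match any outstanding Request")
        return self.entries.pop(key) if final else self.entries[key]


rid = pack_id(bus=1, device=0, function=0)
table = PendingRequests()
table.issue(rid, tag=5, note="memory read, 64 bytes")
print(hex(rid))                             # 0x100
print(table.complete(rid, 5, final=False))  # partial Completion; the Request stays open
print(table.complete(rid, 5))               # final Completion; Tag 5 is free again
```

Keying on the pair rather than on the Tag alone mirrors the glossary's point that the Requester ID, not just the Tag, identifies where a Completion belongs.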




1. Introduction
This chapter presents an overview of the PCI Express architecture and key concepts. PCI Express is a high performance, general purpose I/O interconnect defined for a wide variety of future computing and communication platforms. Key PCI attributes, such as its usage model, load-store architecture, and software interfaces, are maintained, whereas its bandwidth-limiting, parallel bus implementation is replaced by a highly scalable, fully serial interface. PCI Express takes advantage of recent advances in point-to-point interconnects, Switch-based technology, and packetized protocol to deliver new levels of performance and features. Power Management, Quality of Service (QoS), Hot Plug/Hot Swap support, Data Integrity, and Error Handling are among the advanced features supported by PCI Express.

1.1. A Third Generation I/O Interconnect
The high-level requirements for this third generation I/O interconnect are as follows:
• Supports multiple market segments and emerging applications:
  • Unifying I/O architecture for desktop, mobile, workstation, server, communications platforms, and embedded devices
• Ability to deliver low cost, high volume solutions:
  • Cost at or below PCI cost structure at the system level
• Support multiple platform interconnect usages:
  • Chip-to-chip, board-to-board via connector or cabling
• New mechanical form factors:
  • Mobile, PCI-like form factor and modular, cartridge form factor
• PCI compatible software model:
  • Ability to enumerate and configure PCI Express hardware using PCI system firmware implementations with no modifications
  • Ability to boot existing operating systems with no modifications
  • Ability to support existing I/O device drivers with no modifications
  • Ability to configure/enable new PCI Express functionality by adopting the PCI configuration paradigm


• Performance:
  • Low-overhead, low-latency communications to maximize application payload bandwidth and Link efficiency
  • High bandwidth per pin to minimize pin count per device and connector interface
  • Scalable performance via aggregated Lanes and signaling frequency
• Advanced features:
  • Comprehend different data types and ordering rules
  • Power management and budgeting
    • Ability to identify power management capabilities of a given function
    • Ability to transition a function into a specific power state
    • Ability to receive notification of the current power state of a function
    • Ability to propagate an event to wake the system
    • Ability to sequence device power-up to allow graceful platform policy in power budgeting
  • Ability to support differentiated services, i.e., different qualities of service (QoS)
    • Ability to create end-to-end isochronous (time-based, injection rate control) solutions
    • Ability to have dedicated Link resources per QoS data flow to improve fabric efficiency/effective performance in the face of head-of-line blocking
    • Ability to configure fabric QoS arbitration policies within every component
    • Ability to tag end-to-end QoS with each packet
  • Hot Plug and Hot Swap support
    • Ability to support existing PCI hot-plug and hot-swap solutions
    • Ability to support native hot-plug and hot-swap solutions (no side-band signals required)
    • Ability to support a unified software model for all form factors
  • Multi-hierarchy and advanced peer-to-peer communications
    • Ability to support vendor-specific and PCI Express-standard peer-to-peer communications messaging
    • Ability to Cross Link multiple hierarchies to support peer-to-peer communications across large fabric topologies


  • Data Integrity
    • Ability to support Link-level data integrity for all types of transaction and Data Link packets
    • Ability to support end-to-end data integrity for high availability solutions
  • Error Handling
    • Ability to support PCI error handling
    • Ability to support advanced error reporting and handling to improve fault isolation and recovery solutions
  • Process Technology Independence
    • Ability to support different DC common mode voltages at transmitter and receiver
  • Ease of Testing
    • Ability to test electrical compliance via simple connection to test equipment

1.2. PCI Express Link
A Link represents a dual-simplex communications channel between two components. The fundamental PCI Express Link consists of two low-voltage, differentially driven signal pairs: a transmit pair and a receive pair, as shown in Figure 1-1.
[Figure 1-1: PCI Express Link. Component A and Component B exchange Packets in both directions.]
The primary Link attributes are:
• The basic Link – A PCI Express Link consists of dual unidirectional differential links, implemented as a transmit pair and a receive pair. A data clock is embedded using the 8b/10b encoding scheme to achieve very high data rates.


• Signaling rate – Once initialized, each Link must only operate at one of the supported signaling rates. For this version of the specification, there is only one signaling rate, which provides 2.5 Gigabits/second/Lane/direction of raw bandwidth. The data rate is expected to increase with technology advances in the future.
• Lanes – A Link must support at least one Lane; each Lane represents a set of differential signal pairs (one pair for transmission, one pair for reception). To scale bandwidth, a Link may aggregate multiple Lanes, denoted by xN, where N may be any of the supported Link widths. For example, an x8 Link represents an aggregate raw bandwidth of 20 Gigabits/second in each direction. This version of the Physical Layer supports x1, x2, x4, x8, x12, x16, and x32 Lane widths.
• Initialization – During hardware initialization, each PCI Express Link is set up following a negotiation of Lane widths and frequency of operation by the two agents at each end of the Link. No firmware or operating system software is involved.
• Symmetry – Each Link must support a symmetric number of Lanes in each direction, i.e., an x16 Link indicates there are 16 differential signal pairs in each direction.

1.3. PCI Express Fabric Topology
A fabric is composed of point-to-point Links that interconnect a set of components; an example fabric topology is shown in Figure 1-2. This figure illustrates a single fabric instance called a hierarchy, composed of a Root Complex (RC), multiple Endpoints (I/O devices), a Switch, and a PCI Express-PCI Bridge, all interconnected via PCI Express Links. Each of the components of the topology is mapped in a single flat address space and can be addressed by PCI-like load-store accesses.
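The Signaling rate and Lanes attributes of Section 1.2 reduce to short arithmetic. The sketch below scales the 2.5 Gb/s per-Lane, per-direction raw rate by the Link width, reproducing the x8 figure of 20 Gb/s, and then applies the ratio implied by the Symbol definition (ten bits carry one 8-bit Character) to estimate what remains for data. It also derives the 400 ps Unit Interval and a Symbol Time of ten Unit Intervals. This is an illustration under those assumptions only; it ignores packet and protocol overhead.

```python
RAW_GBPS_PER_LANE = 2.5           # raw signaling rate per Lane, per direction (Rev. 1.0)
SUPPORTED_WIDTHS = (1, 2, 4, 8, 12, 16, 32)
BITS_PER_SYMBOL = 10              # each 8b/10b Symbol carries one 8-bit Character


def link_bandwidth_gbps(width: int) -> tuple:
    """Return (raw, after 8b/10b) bandwidth in Gb/s for one direction of an xN Link."""
    if width not in SUPPORTED_WIDTHS:
        raise ValueError(f"x{width} is not a Link width supported by this version")
    raw = RAW_GBPS_PER_LANE * width
    return raw, raw * 8 / BITS_PER_SYMBOL


def unit_interval_ps() -> float:
    """One Unit Interval is one bit time at the signaling rate: 1 / 2.5 GHz = 400 ps."""
    return 1000 / RAW_GBPS_PER_LANE


def symbol_time_ns() -> float:
    """A Symbol Time is ten Unit Intervals."""
    return BITS_PER_SYMBOL * unit_interval_ps() / 1000


for n in (1, 8, 16):
    raw, data = link_bandwidth_gbps(n)
    print(f"x{n}: {raw} Gb/s raw, {data} Gb/s after 8b/10b, per direction")
```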


[Figure 1-2: Example Topology. A Root Complex, attached to the CPU and Memory, connects over PCI Express Links to a PCI Express Endpoint, a PCI Express-PCI Bridge leading to PCI/PCI-X, and a Switch; the Switch connects to two Legacy Endpoints and two PCI Express Endpoints.]

1.3.1. Root Complex
• A Root Complex (RC) denotes the root of an I/O hierarchy that connects the CPU/memory subsystem to the I/O.
• As illustrated in Figure 1-2, a Root Complex may support one or more PCI Express Ports. Each interface defines a separate I/O hierarchy domain. Each hierarchy domain may be composed of a single I/O Endpoint or a sub-hierarchy containing one or more Switch components and I/O Endpoints.
• The capability to route peer-to-peer transactions between hierarchy domains through a Root Complex is optional and implementation dependent. For example, an implementation may incorporate a real or virtual switch internally within the Root Complex to enable full peer-to-peer support in a software transparent way.
• A Root Complex must support generation of configuration requests as a Requester.
• A Root Complex is permitted to support the generation of I/O requests as a Requester.
• A Root Complex must not support Lock semantics as a Completer.
• A Root Complex is permitted to support generation of Locked Requests as a Requester.
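The Root Complex rules and Figure 1-2 describe a tree: each Root Complex Port starts its own hierarchy domain, and every Link has an Upstream end (closer to the Root Complex) and a Downstream end. The sketch below models that tree with a hypothetical `Component` class. It is a simplification: it treats each child of the Root Complex as reached through a separate Root Port, ignores peer-to-peer routing, and does not model the PCI/PCI-X side of the Bridge.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    children: list = field(default_factory=list)  # components one Link further Downstream


def hierarchy_domains(root_complex: Component) -> dict:
    """Group components by the Root Complex Port whose subtree they sit in."""
    domains = {}
    for port, top in enumerate(root_complex.children):
        members, stack = [], [top]
        while stack:                      # depth-first walk of one sub-hierarchy
            node = stack.pop()
            members.append(node.name)
            stack.extend(reversed(node.children))
        domains[f"Root Port {port}"] = members
    return domains


def port_roles(root_complex: Component) -> dict:
    """Count Upstream and Downstream PCI Express Ports on each component."""
    roles = {}

    def walk(node: Component, is_root: bool) -> None:
        roles[node.name] = {
            "upstream": 0 if is_root else 1,   # the Port facing the Root Complex
            "downstream": len(node.children),  # one Port per Link leading away from it
        }
        for child in node.children:
            walk(child, False)

    walk(root_complex, True)
    return roles


# Figure 1-2, modelled loosely; the Bridge's PCI/PCI-X side is not a PCI Express Port.
switch = Component("Switch", [Component("Legacy Endpoint A"), Component("Legacy Endpoint B"),
                              Component("PCI Express Endpoint B"), Component("PCI Express Endpoint C")])
rc = Component("Root Complex", [Component("PCI Express Endpoint A"),
                                Component("PCI Express-PCI Bridge"), switch])
print(hierarchy_domains(rc))
print(port_roles(rc)["Switch"])  # {'upstream': 1, 'downstream': 4}
```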


1.3.2. Endpoints
“Endpoint” refers to a type of device that can be the Requester or Completer of a PCI Express transaction either on its own behalf or on behalf of a distinct non-PCI Express device (other than a PCI device or Host CPU), e.g., a PCI Express attached graphics controller or a PCI Express-USB interface. Endpoints are classified as either legacy or PCI Express Endpoints. The specific rules for each are described in Sections 1.3.2.1 and 1.3.2.2.

1.3.2.1. Legacy Endpoint Rules
• A Legacy Endpoint must be a device with a Type 00h Configuration Space header.
• A Legacy Endpoint must support Configuration Requests as a Completer.
• A Legacy Endpoint may support I/O Requests as a Completer.
• A Legacy Endpoint may generate I/O Requests.
• A Legacy Endpoint may support Lock memory semantics as a Completer if that is required by the device’s legacy software support requirements.
• A Legacy Endpoint must not issue a Locked Request.

1.3.2.2. PCI Express Endpoint Rules
• A PCI Express Endpoint must be a device with a Type 00h Configuration Space header.
• A PCI Express Endpoint must support Configuration Requests as a Completer.
• A PCI Express Endpoint must not require I/O resources claimed through BAR(s).
• A PCI Express Endpoint must not generate I/O Requests.
• A PCI Express Endpoint must not support Locked Requests as a Completer or generate them as a Requester. PCI Express-compliant software drivers and applications must be written to prevent the use of lock semantics when accessing a PCI Express Endpoint.
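The two rule lists differ mainly in I/O and Lock usage, which makes them easy to encode as data-driven checks. The sketch below is illustrative only: `EndpointTraits` and `endpoint_rule_violations` are hypothetical, and the fields summarize what a device does rather than reading any real register. As an assumption, it treats a Legacy Endpoint's need for I/O BARs as permitted, since Legacy Endpoints may complete I/O Requests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointTraits:
    """Hypothetical summary of an Endpoint's behavior; field names are illustrative."""
    header_type: int
    completes_config_requests: bool
    requires_io_bar: bool
    generates_io_requests: bool
    completes_locked_requests: bool
    issues_locked_requests: bool


def endpoint_rule_violations(t: EndpointTraits, legacy: bool) -> list:
    """Return the Section 1.3.2 rules that the described Endpoint breaks."""
    problems = []
    # Rules shared by Legacy and PCI Express Endpoints.
    if t.header_type != 0x00:
        problems.append("must have a Type 00h Configuration Space header")
    if not t.completes_config_requests:
        problems.append("must support Configuration Requests as a Completer")
    if t.issues_locked_requests:
        problems.append("must not issue Locked Requests")
    # I/O and Lock usage is permitted for Legacy Endpoints but not for PCI Express Endpoints.
    if not legacy:
        if t.requires_io_bar:
            problems.append("must not require I/O resources claimed through BARs")
        if t.generates_io_requests:
            problems.append("must not generate I/O Requests")
        if t.completes_locked_requests:
            problems.append("must not support Locked Requests as a Completer")
    return problems


old_device = EndpointTraits(0x00, True, True, True, True, False)
print(endpoint_rule_violations(old_device, legacy=True))   # []
print(endpoint_rule_violations(old_device, legacy=False))  # the three I/O and Lock violations
```

The same device passes as a Legacy Endpoint and fails as a PCI Express Endpoint, which is the practical difference between Sections 1.3.2.1 and 1.3.2.2.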


1.3.3. Switch
A Switch is defined as a logical assembly of multiple “virtual” PCI-to-PCI bridge devices, as illustrated in Figure 1-3. All Switches are governed by the following base rules (advanced Switch components will support additional capabilities beyond those described below).
[Figure 1-3: Logical Block Diagram of a Switch. A virtual PCI-PCI Bridge sits behind the Upstream Port, and a virtual PCI-PCI Bridge sits behind each Downstream Port; each Port attaches to a PCI Express Link.]
• Switches appear to configuration software as two or more logical PCI-to-PCI Bridges.
• A Switch forwards transactions using PCI bridge mechanisms, e.g., address-based routing.
• A Switch may only forward peer-to-peer transactions between two downstream ports.
• Except as noted in this document, a Switch must forward all types of TLPs (Transaction Layer Packets) between any set of ports.
• Locked Requests must be supported as specified in Section 7.2. Switches are not required to support downstream Ports as initiating ports for Locked Requests.
• Each enabled Switch Port must comply with the flow control specification within this document.
• Each Switch must comply with the Link-level data integrity specification within this document.


• A Switch is not allowed to split a packet into smaller packets, e.g., a single packet with a 256-byte payload must not be divided into two packets each with a 128-byte payload.
• Arbitration between Ingress Ports (inbound Links) of a Switch may be implemented using round robin or weighted round robin when contention occurs on the same Virtual Channel. This is described in more detail later within the specification.

1.3.4. PCI Express-PCI Bridge
• A PCI Express to PCI/PCI-X Bridge has one PCI Express Port and one or more PCI/PCI-X bus interfaces.
• A PCI Express to PCI/PCI-X Bridge must support all required PCI and/or PCI-X transactions on its PCI interface.
• Locked Requests must be supported as specified in Chapter 7. PCI Express-PCI Bridges must not generate (propagate) Locked Requests from PCI to PCI Express, but, for deadlock prevention, are required to support Locked Requests from PCI Express to PCI.
• The PCI Express Port of a PCI Express-PCI Bridge must comply with the flow control specification within this document.
• The PCI Express Port of a PCI Express-PCI Bridge must comply with the Link-level data integrity specification within this document.

1.4. PCI Express Fabric Topology Configuration
The PCI Express Configuration model supports two mechanisms:
• PCI compatible configuration mechanism: The PCI compatible mechanism supports 100% binary compatibility with PCI 2.3 or later aware operating systems and their corresponding bus enumeration and configuration software.
• PCI Express enhanced configuration mechanism: The enhanced mechanism is provided to increase the size of available configuration space and to optimize access mechanisms.
Each PCI Express Link is mapped through a PCI-to-PCI Bridge structure and has a logical PCI bus associated with it. The PCI-to-PCI Bridge structure that represents a Link may be a PCI Express Root Complex port, a Switch upstream port, or a Switch downstream port. The Root Port is a PCI-to-PCI bridge structure that originates a PCI Express Hierarchy domain from a PCI Express Root Complex. Logical devices are mapped into configuration space such that each will respond to a particular device number.
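The enhanced configuration mechanism is defined in detail later in the specification, not in this overview. As an assumption drawn from that later material, each function receives 4 KB of configuration space (the first 256 bytes being the PCI compatible region), and a register's memory address is formed by placing the Bus, Device, and Function Numbers above a 12-bit register offset. The sketch below computes such an address; the base address is hypothetical, since the platform chooses it.

```python
LEGACY_CONFIG_BYTES = 256        # per-function space reachable with the PCI compatible mechanism
EXTENDED_CONFIG_BYTES = 4096     # per-function space reachable with the enhanced mechanism


def enhanced_config_address(base: int, bus: int, device: int, function: int, offset: int) -> int:
    """Memory address of a configuration register, assuming Bus[27:20], Device[19:15],
    Function[14:12], and register offset[11:0] relative to a platform-chosen base."""
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("Bus, Device, or Function Number out of range")
    if not 0 <= offset < EXTENDED_CONFIG_BYTES:
        raise ValueError("register offset outside the 4 KB function space")
    return base + (bus << 20) + (device << 15) + (function << 12) + offset


REGION_BYTES = 256 * 32 * 8 * EXTENDED_CONFIG_BYTES  # all buses, devices, functions: 256 MB
BASE = 0xE000_0000                                    # hypothetical platform base address
print(hex(enhanced_config_address(BASE, bus=1, device=2, function=3, offset=0x100)))  # 0xe0113100
```

An offset below 256 (LEGACY_CONFIG_BYTES) lands in the region that the PCI compatible mechanism can also reach; offsets from 256 up to 4095 are only reachable through the enhanced mechanism.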


1.5. PCI Express Layering Overview

This document specifies the architecture in terms of three discrete logical layers: the Transaction Layer, the Data Link Layer, and the Physical Layer. Each of these layers is divided into two sections: one that processes outbound (to be transmitted) information and one that processes inbound (received) information, as shown in Figure 1-4.

The fundamental goal of this layering definition is to facilitate the reader's understanding of the specification. Note that this layering does not imply a particular PCI Express implementation.

[Figure 1-4: High-Level Layering Diagram. Two components, each with a Transaction Layer, a Data Link Layer, and a Physical Layer (Logical Sub-block and Electrical Sub-block), connected by TX and RX paths.]

PCI Express uses packets to communicate information between components. Packets are formed in the Transaction and Data Link Layers to carry the information from the transmitting component to the receiving component. As the transmitted packets flow through the other layers, they are extended with additional information necessary to handle packets at those layers. At the receiving side the reverse process occurs: packets are transformed from their Physical Layer representation to the Data Link Layer representation and finally (for Transaction Layer Packets) to the form that can be processed by the Transaction Layer of the receiving device. Figure 1-5 shows the conceptual flow of transaction level packet information through the layers.
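The layer-by-layer wrapping described above can be sketched in a few lines. This is an illustration only, and several details are assumptions. It carries a 12-bit sequence number in 2 bytes and appends a 4-byte CRC. It uses Python's `zlib.crc32` as a stand-in for the Link CRC, whose actual polynomial and bit ordering are defined in the Data Link Layer chapter. It also models the Physical Layer framing symbols as the strings "STP" and "END".

```python
import zlib

STP, END = "STP", "END"  # start/end framing symbols, modelled as strings


def transmit(header: bytes, data: bytes, seq: int) -> list:
    """Wrap a TLP layer by layer on the transmit side."""
    tlp = header + data                                   # Transaction Layer
    seq_field = (seq & 0xFFF).to_bytes(2, "big")          # Data Link Layer: sequence number
    crc = zlib.crc32(seq_field + tlp).to_bytes(4, "big")  # Data Link Layer: stand-in CRC
    return [STP, seq_field + tlp + crc, END]              # Physical Layer: framing


def receive(symbols: list) -> tuple:
    """Strip the layers in reverse order; return (sequence number, TLP)."""
    start, body, end = symbols
    if start != STP or end != END:
        raise ValueError("framing error")
    seq_field, tlp, crc = body[:2], body[2:-4], body[-4:]
    if zlib.crc32(seq_field + tlp).to_bytes(4, "big") != crc:
        raise ValueError("CRC mismatch: the Data Link Layer would request retransmission")
    return int.from_bytes(seq_field, "big") & 0xFFF, tlp
```

The receive path mirrors Figure 1-5. Framing is checked and removed first, then the Data Link Layer verifies the CRC, and only a verified TLP is handed up to the Transaction Layer.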


[Figure 1-5: Packet Flow Through the Layers. Framing | Sequence Number | Header | Data | CRC | Framing: the Transaction Layer forms the Header and Data, the Data Link Layer adds the Sequence Number and CRC, and the Physical Layer adds the Framing.]

Note that a simpler form of packet communication is supported between two Data Link Layers (connected to the same Link) for the purpose of Link management.

1.5.1. Transaction Layer

The upper layer of the architecture is the Transaction Layer. The Transaction Layer's primary responsibility is the assembly and disassembly of Transaction Layer Packets (TLPs). TLPs are used to communicate transactions, such as read and write, as well as certain types of events. The Transaction Layer is also responsible for managing credit-based flow control for TLPs.

Every request packet requiring a response packet is implemented as a split transaction. Each packet has a unique identifier that enables response packets to be directed to the correct originator. The packet format supports different forms of addressing (memory, I/O, configuration, and message) depending on the type of the transaction. Packets may also have attributes such as "no-snoop" and "relaxed-ordering", which may be used to optimally route these packets through the system.

The Transaction Layer supports four address spaces: it includes the three PCI address spaces (memory, I/O, and configuration) and adds a Message Space. This specification uses the Message Signaled Interrupt concept as a primary method for interrupt processing. It uses Message Space to support all prior side-band signals, such as interrupts and power-management requests, as in-band Message transactions. PCI Express Message transactions can be thought of as "virtual wires", since their effect is to eliminate the wide array of sideband signals currently used in a platform implementation.

1.5.2. Data Link Layer

The middle layer in the stack, the Data Link Layer, serves as an intermediate stage between the Transaction Layer and the Physical Layer. Responsibilities of the Data Link Layer include Link management, error detection, and error correction.

The transmission side of the Data Link Layer accepts TLPs assembled by the Transaction Layer, calculates and applies a data protection code and TLP sequence number, and submits them to the Physical Layer for transmission across the Link. The receiving Data Link Layer is responsible for checking the integrity of received TLPs and for submitting them to the Transaction Layer for further processing. On detection of TLP error(s), this layer is responsible for requesting retransmission of TLPs until the information is correctly received or the Link is determined to have failed.

The Data Link Layer also generates and consumes packets that are used for Link management functions. To differentiate these packets from those used by the Transaction Layer (TLPs), the term Data Link Layer Packet (DLLP) is used when referring to packets that are generated and consumed at the Data Link Layer.

1.5.3. Physical Layer

The Physical Layer includes all circuitry for interface operation, including driver and input buffers, parallel-to-serial and serial-to-parallel conversion, PLL(s), and impedance matching circuitry. It also includes logical functions related to interface initialization and maintenance. The Physical Layer exchanges information with the Data Link Layer in an implementation-specific format. This layer is responsible for converting information received from the Data Link Layer into an appropriate serialized format and transmitting it across the PCI Express Link at a frequency and width compatible with the remote device.

The PCI Express architecture has "hooks" to support future performance enhancements via speed upgrades and advanced encoding techniques. Future speeds, encoding techniques, or media may impact only the Physical Layer definition.

1.5.4. Layer Functions and Services

1.5.4.1. Transaction Layer Services

The Transaction Layer, in the process of generating and receiving TLPs, exchanges Flow Control information with its complementary Transaction Layer on the other side of the Link.
It is also responsible for supporting both software-initiated and hardware-initiated power management.

Initialization and configuration functions require the Transaction Layer to:
• Store Link configuration information generated by the processor or management device
• Store Link capabilities generated by Physical Layer hardware negotiation of width

A Transaction Layer's Packet generation and processing services require it to:
• Generate TLPs from device core Requests
• Convert received Request TLPs into Requests for the device core
• Convert received Completion Packets into a payload, or status information, deliverable to the core
• Generate "no-snoop required" transactions
• Detect unsupported TLPs and invoke appropriate mechanisms for handling them
• Provide transaction-level support for switching and advanced communication applications
• If end-to-end data integrity is supported, generate the end-to-end data integrity CRC and update the TLP header accordingly

Flow control services:
• The Transaction Layer tracks flow control credits for TLPs across the Link.
• Transaction credit status is periodically transmitted to the remote Transaction Layer using transport services of the Data Link Layer.
• Remote Flow Control information is used to throttle TLP transmission.

Ordering rules:
• PCI/PCI-X compliant producer-consumer ordering model
• Extensions to support relaxed ordering

Power management services:
• ACPI/PCI power management, as dictated by system software
• Hardware-controlled autonomous power management minimizes power during full-on power states

Virtual Channels and Traffic Class:
• The combination of the Virtual Channel mechanism and Traffic Class identification is provided to support differentiated services and QoS for certain classes of applications.
• Virtual Channels: Virtual Channels provide a means to support multiple independent logical data flows over the common physical resources of the Link. Conceptually, this involves multiplexing different data flows onto a single physical Link.
• Traffic Class: The Traffic Class is a Transaction Layer Packet label that is transmitted unmodified end-to-end through the fabric. At every service point (e.g., a Switch) within the fabric, Traffic Class labels are used to apply appropriate servicing policies. Packets with different labels have no ordering requirements among each other, which allows independent traffic flows that are not subject to global blocking conditions.
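The flow control services above amount to a gate in front of the transmitter: a TLP goes out only when the receiver has advertised enough credits. The sketch below assumes credit counters of a fixed width that wrap around, so the comparison is done modulo the counter range. The 8-bit width, the class name `CreditGate`, and the single credit type are assumptions. The actual credit types and update rules are defined in the flow control section.

```python
FIELD_BITS = 8                 # assumed width of the credit counters
MOD = 1 << FIELD_BITS


class CreditGate:
    """Transmitter-side credit tracking for one credit type (hypothetical)."""

    def __init__(self, credit_limit: int):
        self.credit_limit = credit_limit % MOD  # advertised by the receiver
        self.credits_consumed = 0               # credits used by TLPs sent so far

    def can_send(self, required: int) -> bool:
        # Counters wrap, so compare modulo 2**FIELD_BITS: the TLP fits if the
        # remaining gap is at most half the counter range.
        gap = (self.credit_limit - (self.credits_consumed + required)) % MOD
        return gap <= MOD // 2

    def send(self, required: int) -> bool:
        if not self.can_send(required):
            return False                        # throttle: hold the TLP
        self.credits_consumed = (self.credits_consumed + required) % MOD
        return True

    def update_limit(self, new_limit: int) -> None:
        # Called when the remote Transaction Layer reports freed buffer space
        self.credit_limit = new_limit % MOD
```

With a limit of 4, four single-credit TLPs pass and the fifth is held until `update_limit` raises the limit. Because the receiver advertises credits in advance, no TLP is sent and then rejected, which is why credit-based flow control avoids wasting Link bandwidth on retries.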


1.5.4.2. Data Link Layer Services

The Data Link Layer is responsible for reliably exchanging information with its counterpart on the opposite side of the Link.

Initialization and power management services:
• Accept power state Requests from the Transaction Layer and convey them to the Physical Layer
• Convey active/reset/disconnected/power managed state to the Transaction Layer

Data protection, error checking, and retry services:
• CRC generation
• Transmitted TLP storage for Data Link level retry
• Error checking
• TLP acknowledgment and retry messages
• Error indication for error reporting and logging
• Link ACK timeout mechanism

1.5.4.3. Physical Layer Services

Interface initialization, maintenance control, and status tracking:
• Reset/Hot Plug control/status
• Interconnect power management
• Width and Lane mapping negotiation
• Polarity reversal

Symbol and special ordered-set generation:
• 8-bit/10-bit encoding/decoding
• Embedded clock tuning and alignment

Symbol transmission and alignment:
• Transmission circuits
• Reception circuits
• Elastic buffer at receiving side
• Multi-Lane de-skew (for widths > x1) at receiving side
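Multi-Lane operation implies that a transmitter spreads data across Lanes and the receiver reassembles it after de-skew. The sketch below assumes simple byte-by-byte round-robin striping: byte 0 goes on Lane 0, byte 1 on Lane 1, and so on. It ignores 8b/10b encoding, framing placement, and ordered sets. `unstripe` assumes the Lanes have already been de-skewed, and both function names are hypothetical.

```python
def stripe(data: bytes, lanes: int) -> list:
    """Send byte i on lane i % lanes (transmit side)."""
    if lanes < 1:
        raise ValueError("a Link has at least one Lane")
    return [data[lane::lanes] for lane in range(lanes)]


def unstripe(lane_data: list) -> bytes:
    """Rebuild the byte stream from lanes that are already de-skewed."""
    out = bytearray()
    depth = max((len(lane) for lane in lane_data), default=0)
    for i in range(depth):
        for lane in lane_data:
            if i < len(lane):   # the last column may be partially filled
                out.append(lane[i])
    return bytes(out)
```

For example, `stripe(b"ABCDEFGH", 4)` places "AE", "BF", "CG", and "DH" on Lanes 0 to 3. If one Lane arrived a symbol late and were not de-skewed first, `unstripe` would interleave bytes from different columns, which is why de-skew has to happen before reassembly.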


System DFT mechanism(s):
• Loop-back mode

1.5.4.4. Inter-Layer Interfaces

1.5.4.4.1. Transaction/Data Link Interface

The Transaction to Data Link interface provides:
• Byte or multi-byte data to be sent across the Link
  o Local TLP-transfer handshake mechanism
  o TLP boundary information
• Requested power state for the Link

The Data Link to Transaction interface provides:
• Byte or multi-byte data received from the PCI Express Link
• TLP framing information for the received byte
• Actual power state for the Link
• Link status information

1.5.4.4.2. Data Link/Physical Interface

The Data Link to Physical interface provides:
• Byte or multi-byte wide data to be sent across the Link
  o Data transfer handshake mechanism
  o TLP and DLLP boundary information for bytes
• Requested power state for the Link

The Physical to Data Link interface provides:
• Byte or multi-byte wide data received from the PCI Express Link
• TLP and DLLP framing information for data
• Indication of errors detected by the Physical Layer
• Actual power state for the Link
• Connection status information
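The Data Link Layer services in Section 1.5.4.2 include TLP storage for retry, TLP acknowledgment, and retry messages. A minimal sketch of the transmit-side retry buffer follows, under several assumptions. Sequence numbers are taken to be 12 bits wide, and acknowledgments are treated as cumulative. One acknowledgment releases every stored TLP up to and including the acknowledged sequence number, and a negative acknowledgment triggers a replay of everything after it. The class and method names are hypothetical, and the Link ACK timeout is left out.

```python
from collections import deque

SEQ_MOD = 1 << 12  # assumed 12-bit sequence number space


def seq_le(a: int, b: int) -> bool:
    """True if sequence number a is at or before b, allowing for wrap-around."""
    return (b - a) % SEQ_MOD < SEQ_MOD // 2


class RetryBuffer:
    """Keeps transmitted TLPs until the far side acknowledges them (hypothetical)."""

    def __init__(self):
        self.next_seq = 0
        self.pending = deque()  # (sequence number, TLP) awaiting acknowledgment

    def send(self, tlp: bytes) -> int:
        seq = self.next_seq
        self.pending.append((seq, tlp))
        self.next_seq = (seq + 1) % SEQ_MOD
        return seq

    def ack(self, seq: int) -> None:
        # A cumulative acknowledgment frees every stored TLP up to and including seq
        while self.pending and seq_le(self.pending[0][0], seq):
            self.pending.popleft()

    def nak(self, seq: int) -> list:
        # TLPs up to seq arrived intact; everything after must be replayed in order
        self.ack(seq)
        return [tlp for _, tlp in self.pending]
```

Replayed TLPs stay in the buffer until they are acknowledged, because a replay can itself be corrupted on the Link.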


1.6. Advanced Peer-to-Peer Communication Overview

Advanced peer-to-peer communication is optional functionality used to support peer-to-peer communications across one or more hierarchies that constitute a single fabric instance. Figure 1-6 shows an example of a fabric with multiple hierarchies.

[Figure 1-6: Advanced Peer-to-Peer Communication. Host A and Host B, each with CPUs and a Root Complex, are connected through Switches and Advanced Switches joined by a Cross-Link, with Endpoints and a PCI Express-PCI Bridge attached below.]

The primary attributes/requirements are:
• Push-only communications paradigm: Endpoints use a "mailbox" approach to exchange control and data packets.
• Optional support for multicast packet replication within advanced Switch components. Multicast allows an Endpoint to inject a single packet into the fabric targeted at a multicast group identifier and have the advanced Switch components replicate this packet to all participating Endpoints. This eliminates the need for the injecting Endpoint to know all of the participating Endpoints within the multicast group.
• Use of a 16-bit global address space extender to uniquely identify an Endpoint port or a multicast group within an I/O fabric. A global address is referred to as a Route Identifier.
• Each hierarchy defines an individual partition within the fabric.
  o At any given time, one Root Complex (RC) controls a partition.
  o Multiple partitions may be collapsed into a single partition by assigning ownership of the partition to one of the active RCs. An example is dual-redundant fabrics that are interlinked such that either RC may take over for the other should one RC fail.


• Cross-Link devices are used to facilitate non-tree enumeration and peer-to-peer communications. Software, using Cross-Link devices connected between Switches, establishes peer-to-peer connections between PCI Express agents residing either within the same hierarchy or within different PCI Express hierarchies.
• Communication between two hierarchies is not allowed until both hierarchies are initialized and configured. At this point, communications software can configure the Links between the hierarchies (contained within advanced Switch components). During this step, RIDs (Route Identifiers) are assigned.
• RCs, Switches, and Endpoints that support advanced peer-to-peer communication must support all mandatory PCI Express functionality to ensure interoperability with base RCs, Switches, and Endpoints.
• A Switch that supports advanced peer-to-peer communication must translate RID-forwarded packets to address-routed packets if the attached RC or Endpoint does not support advanced peer-to-peer communication.

The scope of the information provided in this base specification is limited to the definition of basic primitives required to support advanced PCI Express packet switching applications. Detailed descriptions of typical usage models, and of the operation of the capabilities enabled by these optional features, are beyond the scope of this document. They will be provided in a separate document, the Advanced PCI Express Packet Switching Specification, a companion specification to the PCI Express Base Specification.
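The multicast replication attribute can be pictured as a group table inside an advanced Switch. Everything in this sketch is hypothetical: the `MulticastTable` class, keying groups by a 16-bit Route Identifier, and the rule that no copy is sent back out of the ingress port. The actual mechanism is left to the Advanced PCI Express Packet Switching Specification.

```python
from typing import Dict, List, Set, Tuple


class MulticastTable:
    """Hypothetical multicast group table for an advanced Switch."""

    def __init__(self):
        self.groups: Dict[int, Set[int]] = {}  # Route Identifier -> member egress ports

    def join(self, route_id: int, port: int) -> None:
        if not 0 <= route_id <= 0xFFFF:
            raise ValueError("a Route Identifier is a 16-bit value")
        self.groups.setdefault(route_id, set()).add(port)

    def replicate(self, route_id: int, ingress_port: int, packet: bytes) -> List[Tuple[int, bytes]]:
        """Return one (port, packet) copy per member port, skipping the ingress port."""
        ports = self.groups.get(route_id, set())
        return [(port, packet) for port in sorted(ports) if port != ingress_port]
```

The injecting Endpoint sends a single packet addressed to the group. The fan-out happens in the Switch, so the Endpoint never needs the membership list.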


2. Transaction Layer Specification

2.1. Transaction Layer Overview

[Figure 2-1: Layering Diagram Highlighting the Transaction Layer]

One of the primary goals of the PCI Express Architecture is to maximize the efficiency of communication between devices. To this end, the Transaction Layer implements:
• A pipelined full split-transaction protocol
• Mechanisms for differentiating the ordering and processing requirements of Transaction Layer Packets (TLPs)
• Credit-based flow control, which eliminates wasted Link bandwidth due to retries
• Optional support for data poisoning and end-to-end data integrity detection


The Transaction Layer comprehends the following:
• TLP construction and processing
• Association of PCI Express transaction-level mechanisms with device resources, including:
  o Flow Control
  o Virtual Channel management
• Rules for ordering and management of TLPs
  o Including Traffic Class differentiation

This chapter specifies the behaviors associated with the Transaction Layer.

2.2. Address Spaces, Transaction Types, and Usage

Transactions form the basis for information transfer between a Requester and a Completer. Four address spaces are defined within the PCI Express architecture. Different Transaction types, each with its own intended usage, are defined within each address space, as shown in Table 2-1.

Table 2-1: Transaction Types for Different Address Spaces

Address Space | Transaction Types | Basic Usage
Memory | Read, Write | Transfer data to/from a memory-mapped location
I/O | Read, Write | Transfer data to/from an I/O-mapped location
Configuration | Read, Write | Device configuration/setup
Message | Baseline, Vendor-defined, Advanced Switching | From event signaling mechanism to general purpose messaging

2.2.1. Memory Transactions

Memory Transactions include the following types:
• Read Request/Completion
• Write Request

Memory Transactions use two different address formats:
• Short Address Format: 32-bit address
• Long Address Format: 64-bit address
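A Requester has to pick one of the two memory address formats for every Request. Pending the detailed rules in Section 2.7, the helper below assumes that the 32-bit Short format is used whenever the address lies below 4 GB and that the 64-bit Long format is used otherwise. The I/O branch reflects Section 2.2.2, where I/O Transactions use only the Short format. The function name and its string return values are illustrative.

```python
def address_format(address: int, space: str = "memory") -> str:
    """Choose "short" (32-bit) or "long" (64-bit) addressing for a Request.

    Assumed rule: use the Short format whenever the address is below 4 GB.
    """
    if not 0 <= address < 1 << 64:
        raise ValueError("address must fit in 64 bits")
    if address < 1 << 32:
        return "short"   # Short Address Format: 32-bit address
    if space == "io":
        raise ValueError("I/O Transactions support only the 32-bit Short format")
    return "long"        # Long Address Format: 64-bit address
```

For example, a Memory Request to 0xFEE00000 would use the Short format, while one to 0x1_0000_0000 needs the Long format.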


Details about the rules associated with usage of these two address formats, and about the associated Transaction Layer Packet (TLP) formats, are outlined in Section 2.7.

2.2.2. I/O Transactions

PCI Express supports I/O Space for compatibility with legacy devices that require its use. Future revisions of this specification are expected to deprecate the use of I/O Space.

I/O Transactions include the following types:
• Read Request/Completion
• Write Request/Completion

I/O Transactions use a single address format:
• Short Address Format: 32-bit address

Details about the rules associated with I/O addresses, and about the associated TLP formats, are outlined in Section 2.7.

2.2.3. Configuration Transactions

Configuration Transactions are used to access configuration registers of PCI Express devices. Mechanisms for generating these Transactions are platform specific.

Configuration Transactions include the following types:
• Read Request/Completion
• Write Request/Completion

Details about the rules associated with configuration addresses, and about the associated Packet formats, are outlined in Section 2.7.

2.2.4. Message Transactions

The Message Transactions, or simply Messages, support two primary usage models:
• In-band communication of events between PCI Express devices
• Peer-to-peer communication between PCI Express devices

These two usage models map to two different groups of Messages in PCI Express. The first group supports the first usage model. The second group, which is associated with Advanced Switching support, supports the second usage model.


In terms of how Message Requests are routed, this specification differentiates between the following two routing mechanisms:
• Implied Routing: without specific address/routing information contained within the Message packet header
  o Destination is the other component on the Link, or
  o Destination is the Root Complex, or
  o Message is broadcast from the Root Complex to all downstream devices.
• Explicit Routing: with specific address/routing information contained within the Message packet header
  o Destination is another device within the local PCI Express hierarchy, or
  o Destination is a device within a different PCI Express hierarchy.

Note that the explicit routing mechanism is used by the Messages that are defined for support of advanced switching applications.

PCI Express provides support for vendor-defined messages using specific reserved codes given in this document. The definition of specific vendor-defined messages is outside the scope of this document.

2.2.4.1. Types of Messages

Messages defined within the PCI Express specification include the following types:
• System Management Message Group
  o Interrupt Signaling
  o Error Signaling
  o Power Management
  o Locked Transaction Support
  o Payload Defined
  o Vendor Specific Messages
  o Hot Plug Signaling
• Advanced Switching Support Message Group
  o Data Packet Messages
  o Signal Packet Messages
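Under the three implied-routing cases, a Switch forwards a Message without consulting any address. The model below is a hypothetical simplification. It returns the egress Ports for each case, and it treats a Message arriving from an unexpected direction as dropped. The specification does not state that dropping behavior here.

```python
from enum import Enum


class ImpliedRoute(Enum):
    LOCAL = "terminate at the other component on the Link"
    TO_ROOT_COMPLEX = "route toward the Root Complex"
    BROADCAST_FROM_ROOT = "broadcast from the Root Complex"


def switch_egress(route: ImpliedRoute, arrived_on_upstream: bool, downstream_ports: list) -> list:
    """Egress ports a Switch uses for an implicitly routed Message (hypothetical model).

    "upstream" names the Switch's upstream Port. An empty list means the Message
    is consumed here or, in this simplified model, dropped because it arrived
    from an unexpected direction.
    """
    if route is ImpliedRoute.LOCAL:
        return []                                   # consumed by the receiving Port
    if route is ImpliedRoute.TO_ROOT_COMPLEX:
        return [] if arrived_on_upstream else ["upstream"]
    # BROADCAST_FROM_ROOT: fan out to every downstream Port
    return list(downstream_ports) if arrived_on_upstream else []
```

For a Message routed toward the Root Complex, the Switch sends it out of its upstream Port. A broadcast from the Root Complex is copied to every downstream Port.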


2.2.4.2. Vendor-defined Messages

This specification establishes a standard framework within which vendors can specify their own Vendor-defined Messages tailored to fit the specific requirements of their platforms (see Sections 2.8.1.5 and 2.8.1.7).

Note that these Vendor-defined Messages are not guaranteed to be interoperable with components from different vendors.

2.3. Packet Format Overview

Transactions consist of Requests and Completions, which are communicated using packets. Figure 2-2 shows a high-level view of a Transaction Layer Packet. It consists of a header, a data payload (for some types of packets), and an optional TLP digest. The following sections of this chapter define the detailed structure of packet headers.

[Figure 2-2: Generic Transaction Layer Packet Format. Header starting at Byte 0; Data (included when applicable) starting with Data Byte 0 at Byte J; optional TLP Digest at Byte K+4.]

Depending on the type of a packet, the header for that packet will include some of the following types of fields:
• Format of the packet
• Type of the packet
• Length for any associated data
• Transaction Descriptor, including:
  o Transaction ID
  o Attributes
  o Traffic Class
• Address/routing information
• Byte enables
• Message encoding
• Completion status
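Several of the header fields listed above sit in the first doubleword (DW0) of a TLP. The parser below assumes the DW0 layout of the TLP formats outlined in Section 2.7:
• Fmt in bits 30:29
• Type in bits 28:24
• TC in bits 22:20
• TD (TLP Digest present) in bit 15
• EP (poisoned data) in bit 14
• Attr in bits 13:12
• Length in bits 9:0, counted in doublewords, with 0 meaning 1024

Treat these bit positions and the `parse_dw0` name as assumptions; the authoritative layout is the header format section.

```python
from typing import NamedTuple


class Dw0(NamedTuple):
    fmt: int        # header size / data-present format code
    pkt_type: int   # TLP type
    tc: int         # Traffic Class
    td: bool        # TLP Digest present
    ep: bool        # data poisoned
    attr: int       # Relaxed Ordering and Snoop Not Required bits
    length_dw: int  # payload length in doublewords


def parse_dw0(dw0: int) -> Dw0:
    """Decode the first header doubleword using the assumed layout."""
    length = dw0 & 0x3FF
    return Dw0(
        fmt=(dw0 >> 29) & 0x3,
        pkt_type=(dw0 >> 24) & 0x1F,
        tc=(dw0 >> 20) & 0x7,
        td=bool((dw0 >> 15) & 0x1),
        ep=bool((dw0 >> 14) & 0x1),
        attr=(dw0 >> 12) & 0x3,
        length_dw=length if length else 1024,  # a Length of 0 encodes 1024 DW
    )
```

For instance, `parse_dw0(0x40000001)` yields Fmt 2, Type 0, TC 0, and a Length of 1 DW. Under the assumed encodings, that is a 3 DW header with data, i.e., a Memory Write carrying one doubleword.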


2.4. Transaction Descriptor

2.4.1. Overview

The Transaction Descriptor is a mechanism for carrying Transaction information between the Requester and the Completer. Transaction Descriptors are composed of three fields:
• Transaction ID: identifies outstanding Transactions
• Attributes field: specifies characteristics of the Transaction
• Traffic Class (TC) field: associates the Transaction with the type of required service

Figure 2-3 shows the fields of the Transaction Descriptor. Note that these fields are shown together to highlight their relationship as parts of a single logical entity. The fields are not contiguous in the packet header.

[Figure 2-3: Transaction Descriptor. Transaction ID (Requester ID 15:0, Tag 7:0), Attributes 1:0, Traffic Class 2:0.]

2.4.2. Transaction Descriptor – Transaction ID Field

The Transaction ID Field consists of two major sub-fields, Requester ID and Tag, as shown in Figure 2-4.

[Figure 2-4: Transaction ID. Requester ID = Bus Number 7:0, Device Number 4:0, Function Number 2:0; Tag 7:0.]
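Figure 2-4 and the Tag rules that follow translate directly into bit packing. The sketch below makes three assumptions:
• The Bus, Device, and Function Numbers are packed most-significant first, in that order.
• The 24-bit Transaction ID integer is only an illustrative key, since the Descriptor fields are not contiguous in the header.
• A hypothetical `TagAllocator` models the Tag limits: 32 Tags by default, or 256 when the Extended Tag Field Enable bit is set.

```python
def pack_requester_id(bus: int, device: int, function: int) -> int:
    """Build the 16-bit Requester ID: Bus Number 7:0, Device 4:0, Function 2:0."""
    if not (0 <= bus <= 0xFF and 0 <= device <= 0x1F and 0 <= function <= 0x7):
        raise ValueError("field out of range")
    return (bus << 8) | (device << 3) | function


def unpack_requester_id(requester_id: int) -> tuple:
    """Split a Requester ID back into (bus, device, function)."""
    return (requester_id >> 8) & 0xFF, (requester_id >> 3) & 0x1F, requester_id & 0x7


def transaction_id(requester_id: int, tag: int) -> int:
    """Concatenate Requester ID and Tag into one 24-bit key (illustrative encoding)."""
    return (requester_id << 8) | (tag & 0xFF)


class TagAllocator:
    """Hand out Tags unique among a Requester's outstanding non-posted Requests."""

    def __init__(self, extended_tags: bool = False):
        # 32 Tags (upper 3 bits zero) by default, 256 with Extended Tag Field Enable set
        self.limit = 256 if extended_tags else 32
        self.in_use = set()

    def allocate(self):
        for tag in range(self.limit):
            if tag not in self.in_use:
                self.in_use.add(tag)
                return tag
        return None  # all Tags busy: wait for a Completion before issuing more

    def release(self, tag: int) -> None:
        # In this sketch, called when the final Completion for the Request arrives
        self.in_use.discard(tag)
```

For example, bus 3, device 31, function 7 packs to Requester ID 0x3FF. A default allocator returns Tags 0 to 31 and then `None` until a Tag is released, so the upper three Tag bits stay zero.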


• Tag[7:0] is an 8-bit field generated by each Requester, and it must be unique for all outstanding Requests that require a Completion for that Requester.
  o By default, the maximum number of outstanding Requests per device/function shall be limited to 32. Only the lower 5 bits of the Tag field are used, and the remaining 3 bits are required to be all 0's.
  o If the Extended Tag Field Enable bit (see Section 5.8.4) is set, the maximum is increased to 256, and the entire Tag field is used.
  o Receiver/Completer behavior is undefined if multiple Requests are issued with non-unique Tag values.
• For Requests that do not require a Completion (Posted Requests), the value in the Tag[7:0] field is undefined and may contain any value.
  o For Posted Requests, the value in the Tag[7:0] field must not affect Receiver processing of the Request.
• Requester ID and Tag combined form a global identifier for each Transaction within a Hierarchy.
• Transaction ID is included with all Requests and Completions.
• The Requester ID field is a 16-bit value that is unique for every PCI Express function.
• Functions must capture the Bus and Device Numbers supplied with all Configuration Requests (Type 0) completed by the function. They must supply these numbers in the Bus and Device Number fields of the Requester ID for all Requests initiated by the device/function.
  o Note that the Bus Number and Device Number may be changed at run time, so it is necessary to re-capture this information with each and every Configuration Request.
  o Exception: The assignment of bus numbers to the logical devices within a Root Complex may be done in an implementation specific way.

Example: When a device (or a function of a multi-function device) receives a Type 0 Configuration Read or Write Request, the device comprehends that it is the intended recipient of the Request because it is a Type 0 Request. The routing information fields of the Request include the recipient's Bus Number and Device Number values (Figure 2-12). These values are captured by the device and used to generate the Requester ID field.

• Prior to the initial Configuration Write to a device, the device is not permitted to initiate Requests.
  o Exception: Logical devices within a Root Complex are permitted to initiate Requests prior to software-initiated configuration for accesses to system boot device(s).
  o Note that this rule and the exception are consistent with the existing PCI model for system initialization and configuration.


• Each function associated with a logical device must be designed to respond to a unique Function Number for Configuration Requests addressing that logical device. Note: Each logical device may contain up to eight logical functions.
• A Switch must forward Requests without modifying the Transaction ID.
• A PCI Express-PCI-X Bridge operating in conventional PCI mode, as well as a PCI Express-PCI Bridge, must forward Requests initiated on PCI using the Bus Number, Device Number, and Function Number associated with the Bridge to form the Requester ID.

Implementation Note: Increasing Outstanding Requests
To increase the maximum possible number of outstanding Requests requiring Completion beyond 256, a single-function device may, if the Phantom Function Number Enable bit is set (see Section 5.8.4), use Function Numbers 1-7 to logically extend the Tag identifier. This allows up to an 8-fold increase in the maximum number of outstanding Requests. Unclaimed function numbers are termed "Phantom Function Numbers (PFN)."

2.4.3. Transaction Descriptor – Attributes Field

The Attributes Field is used to provide additional information that allows modification of the default handling of Transactions. These modifications apply to different aspects of handling the Transactions within the system, such as:
• Ordering
• Hardware coherency management (snoop)

Note that attributes are hints that allow for optimizations in the handling of traffic. The level of support depends on the target applications of particular PCI Express peripherals and platform building blocks.

[Figure 2-5: Attributes Field of Transaction Descriptor. Two attribute bits: Relaxed Ordering and Snoop Not Required.]


2.4.3.1. Relaxed Ordering Attribute

Table 2-2 defines the states of the Relaxed Ordering attribute field. This Attribute is discussed in Section 2.5.

Table 2-2: Ordering Attributes

Ordering Attribute | Ordering Type | Ordering Model
0 | Default Ordering | PCI Producer/Consumer-based Ordering Model
1 | Relaxed Ordering | PCI-X Relaxed Ordering Model

2.4.3.2. "Snoop Not Required" Attribute

Table 2-3 defines the states of the "Snoop Not Required" attribute field. Note that the "Snoop Not Required" attribute does not alter Transaction ordering.

Table 2-3: Cache Coherency Management Attribute

Snoop Not Required Attribute | Cache Coherency Management Type | Coherency Model
0 | Default | Hardware enforced cache coherency expected
1 | Snoop Not Required | Hardware enforced cache coherency not expected

2.4.4. Transaction Descriptor – Traffic Class Field

The Traffic Class (TC) is a 3-bit field that allows differentiation of transactions into eight traffic classes.

Together with the PCI Express Virtual Channel support, the TC mechanism is a fundamental element for enabling differentiated traffic servicing. Every PCI Express Transaction Layer Packet uses TC information as an invariant label that is carried end to end within the PCI Express fabric. As the packet traverses the fabric, this information is used at every Link and within each Switch element to make decisions about proper servicing of the traffic. A key aspect of servicing is the routing of the packets based on their TC labels through corresponding Virtual Channels. (Section 2.6 covers the details of the VC mechanism.)
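Routing packets by TC label through Virtual Channels can be pictured as a lookup table. The sketch below builds a TC-to-VC table from one 8-bit bitmap per VC, where bit n set means TCn uses that VC. It rejects any TC that appears in two VCs, since Section 2.5 states that transactions with the same TC label may not be mapped to multiple VCs on a Link. The bitmap representation, the function name `tc_to_vc`, and the check that TC0 lands on VC0 are assumptions made for illustration; Section 2.6 defines the actual VC mechanism.

```python
def tc_to_vc(tc_maps: list) -> dict:
    """Build a TC -> VC lookup from per-VC 8-bit TC bitmaps.

    tc_maps[vc] has bit n set when TCn is mapped to that VC.
    """
    mapping = {}
    for vc, bitmap in enumerate(tc_maps):
        for tc in range(8):
            if (bitmap >> tc) & 1:
                if tc in mapping:
                    raise ValueError(f"TC{tc} mapped to both VC{mapping[tc]} and VC{vc}")
                mapping[tc] = vc
    if mapping.get(0) != 0:
        # Assumption: the default class TC0 always travels on VC0
        raise ValueError("TC0 must be mapped to VC0")
    return mapping
```

For example, `tc_to_vc([0x7F, 0x80])` keeps TC0 to TC6 on VC0 and gives TC7 a VC of its own, which could then be serviced with a higher weight or priority.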


Table 2-4 defines the TC encodings:

Table 2-4: Definition of TC Field Encodings

TC Field Value | Definition
000 | TC0: Best Effort service class (General Purpose I/O). Default TC; must be supported by every PCI Express device.
001-111 | TC1-TC7: Differentiated service classes (differentiation based on Weighted-Round-Robin and/or Priority)

It is up to the system software to determine TC labeling and TC/VC mapping in order to provide differentiated services that meet target platform requirements. For example, for a platform that supports isochronous data traffic, TC7 is reserved for isochronous transactions, and TC7 must be mapped to the VC with the highest weight/priority. See Section 7.3.4 for details on isochronous support.

The concept of Traffic Class applies only within the PCI Express interconnect fabric. Specific requirements of how PCI Express TC service policies are translated into policies on non-PCI Express interconnects, or within a Root Complex or Endpoints, are outside the scope of this specification.

2.5. Transaction Ordering

Table 2-5 defines the ordering requirements for PCI Express Transactions. The rules defined in this table apply uniformly to all types of Transactions on PCI Express, including Memory, I/O, Configuration, and Messages. The ordering rules defined in this table apply within a single Traffic Class (TC). There is no ordering among transactions within different TCs.
Note that this also implies that no ordering is required between traffic that flows through different Virtual Channels, since transactions with the same TC label are not allowed to be mapped to multiple VCs on any PCI Express Link.

For Table 2-5, the columns represent a first issued Transaction, and the rows represent a subsequently issued Transaction. The table entry indicates the ordering relationship between the two Transactions. The table entries are defined as follows:
• Yes: the second Transaction must be allowed to pass the first to avoid deadlock. (When blocking occurs, the second Transaction is required to pass the first Transaction. Fairness must be comprehended to prevent starvation.)
• Y/N: there are no requirements. The second Transaction may optionally pass the first Transaction or be blocked by it.
• No: the second Transaction must not be allowed to pass the first Transaction. This is required to support the Producer-Consumer strong ordering model.


Table 2-5: Ordering Rules Summary Table

Row Pass Column? | Posted: Memory Write or Message Request (Col 2) | Non-Posted: Read Request (Col 3) | Non-Posted: I/O or Configuration Write Request (Col 4) | Completion: Read Completion (Col 5) | Completion: I/O or Configuration Write Completion (Col 6)
Posted Request: Memory Write or Message Request (Row A) | a) No; b) Y/N | Yes | Yes | a) Y/N; b) Yes | a) Y/N; b) Yes
Non-Posted Request: Read Request (Row B) | No | Y/N | Y/N | Y/N | Y/N
Non-Posted Request: I/O or Configuration Write Request (Row C) | No | Y/N | Y/N | Y/N | Y/N
Completion: Read Completion (Row D) | a) No; b) Y/N | Yes | Yes | a) Y/N; b) No | Y/N
Completion: I/O or Configuration Write Completion (Row E) | Y/N | Yes | Yes | Y/N | Y/N

Explanation of entries in Table 2-5:

A2 a: A Memory Write or Message Request with the Relaxed Ordering Attribute bit clear ('0') must not pass any other Memory Write or Message Request.
A2 b: A Memory Write or Message Request with the Relaxed Ordering Attribute bit set ('1') is permitted to pass any other Memory Write or Message Request.
A3, A4: A Memory Write or Message Request must be allowed to pass Read Requests and I/O or Configuration Write Requests to avoid deadlocks.
A5, A6 a: Endpoints, Switches, and the Root Complex may allow Memory Write and Message Requests to pass Completions or be blocked by Completions.
A5, A6 b: PCI Express to PCI Bridges and PCI Express to PCI-X Bridges, when operating the PCI segment in conventional mode, must allow Memory Write and Message Requests to pass Completions traveling in the PCI Express to PCI direction (Primary side of Bridge to Secondary side of Bridge) to avoid deadlock.
B2, C2: These Requests cannot pass a Memory Write or Message Request. This preserves the strong write ordering required to support the Producer/Consumer usage model.


B3, B4, C3, C4: Read Requests and I/O or Configuration Write Requests are permitted to be blocked by or to pass other Read Requests and I/O or Configuration Write Requests.
B5, B6, C5, C6: These Requests are permitted to be blocked by or to pass Completions.
D2 a: If the Relaxed Ordering attribute bit is not set, then a Read Completion cannot pass a previously enqueued Memory Write or Message Request.
D2 b: If the Relaxed Ordering attribute bit is set, then a Read Completion is permitted to pass a previously enqueued Memory Write or Message Request.
D3, D4, E3, E4: Completions must be allowed to pass Read and I/O or Configuration Write Requests to avoid deadlocks.
D5 a: Read Completions associated with different Read Requests are allowed to be blocked by or to pass each other.
D5 b: Read Completions for one Request (which will have the same Transaction ID) must return in address order.
D6: Read Completions are permitted to be blocked by or to pass I/O or Configuration Write Completions.
E2: I/O or Configuration Write Completions are permitted to be blocked by or to pass Memory Write and Message Requests. Such Transactions are actually moving in the opposite direction and, therefore, have no ordering relationship.
E5, E6: I/O or Configuration Write Completions are permitted to be blocked by or to pass Read Completions and other I/O or Configuration Write Completions.

Additional Rules:
• PCI Express Switches are permitted to allow a Memory Write or Message Request with the Relaxed Ordering bit set to pass any previously posted Memory Write or Message Request moving in the same direction. Switches must forward the Relaxed Ordering attribute unmodified. The Root Complex is also permitted to allow data bytes within the Request to be written to system memory in any order. (The bytes must be written to the correct system memory locations; only the order in which they are written is unspecified.)
  PCI Express-PCI-X Bridge devices must forward the Relaxed Ordering attribute unmodified but must treat all transactions as if the Relaxed Ordering attribute bit is not set.
  Note: This maintains compatibility with PCI-X relaxed ordering usage models and corresponding rules. For more details, refer to the PCI-X Addendum to the PCI Local Bus Specification, Rev 1.0a.


• For Root Complex and Switch, Memory Write combining (as defined in the PCI Specification) is prohibited.
  Note: This is required so that devices can be permitted to optimize their receive buffer and control logic for Memory Write sizes matching their natural expected sizes, rather than being required to support the maximum possible Memory Write payload size.
• Combining of Memory Read Requests, and/or Completions for different Requests, is prohibited.
• The “Snoop Not Required” bit does not affect the required ordering behavior.
  Note: Main memory writes from the CPU accepted by the Root Complex are architecturally part of the system memory image; the Root Complex must ensure coherency for subsequent device reads from main memory.

Implementation Note: Large Memory Reads vs. Multiple Smaller Memory Reads
Note that the rule associated with entry D5 b in Table 2-5 ensures that, for a single Memory Read Request serviced with multiple Completions, the Completions will be returned in address order. However, the rule associated with entry D5 a permits Completions associated with distinct Memory Read Requests to be returned in an order different from the issue order of the Requests.
For example, if a device issues a single Memory Read Request for 256B from location 1000h, and the Request is returned using two Completions (see Section 2.7.6.2.1) of 128B each, it is guaranteed that the two Completions will return in the following order:
  1st Completion returned: Data from 1000h to 107Fh.
  2nd Completion returned: Data from 1080h to 10FFh.
However, if the device issues two Memory Read Requests for 128B each, first to location 1000h, then to location 1080h, the two Completions may return in either order:
  1st Completion returned: Data from 1000h to 107Fh.
  2nd Completion returned: Data from 1080h to 10FFh.
or
  1st Completion returned: Data from 1080h to 10FFh.
  2nd Completion returned: Data from 1000h to 107Fh.
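As a non-normative reading aid, Table 2-5 and its entry explanations can be encoded as a single lookup. The Python sketch below is hypothetical: the `Txn` enum, the `may_pass` function, and its keyword flags are illustrative names, not defined by this specification. It assumes that `relaxed_ordering` is the Relaxed Ordering attribute of the later (row) Transaction, as entries A2 and D2 are phrased; that `pci_bridge_conventional` selects the A5 b/A6 b case (a PCI Express to PCI or PCI-X Bridge operating its PCI segment in conventional mode, for Completions traveling from the Primary to the Secondary side); and that `same_request` marks two Read Completions for the same Request (entry D5 b).

```python
from enum import Enum


class Txn(Enum):
    """Transaction categories of Table 2-5 (row letter / column number)."""
    POSTED = "A/2"            # Memory Write or Message Request
    READ_REQ = "B/3"          # Read Request
    IO_CFG_WRITE_REQ = "C/4"  # I/O or Configuration Write Request
    READ_CPL = "D/5"          # Read Completion
    IO_CFG_WRITE_CPL = "E/6"  # I/O or Configuration Write Completion


NON_POSTED = {Txn.READ_REQ, Txn.IO_CFG_WRITE_REQ}


def may_pass(row: Txn, col: Txn, *, relaxed_ordering: bool = False,
             pci_bridge_conventional: bool = False,
             same_request: bool = False) -> str:
    """Return "Yes", "Y/N", or "No": may `row` (issued later) pass `col`?"""
    if row is Txn.POSTED:
        if col is Txn.POSTED:
            return "Y/N" if relaxed_ordering else "No"      # A2 b / A2 a
        if col in NON_POSTED:
            return "Yes"                                    # A3, A4
        return "Yes" if pci_bridge_conventional else "Y/N"  # A5/A6 b / a
    if row in NON_POSTED:
        return "No" if col is Txn.POSTED else "Y/N"         # B2, C2 / B3-C6
    # Remaining rows are Completions (Rows D and E).
    if col in NON_POSTED:
        return "Yes"                                        # D3, D4, E3, E4
    if row is Txn.READ_CPL:
        if col is Txn.POSTED:
            return "Y/N" if relaxed_ordering else "No"      # D2 b / D2 a
        if col is Txn.READ_CPL:
            return "No" if same_request else "Y/N"          # D5 b / D5 a
        return "Y/N"                                        # D6
    return "Y/N"                                            # E2, E5, E6
```

For the Implementation Note example, `may_pass(Txn.READ_CPL, Txn.READ_CPL, same_request=True)` gives "No" for the two 128B Completions of the single 256B Request, while the two separate 128B Requests fall under D5 a and give "Y/N".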


2.6. Virtual Channel (VC) Mechanism
The PCI Express Virtual Channel (VC) mechanism provides support for carrying traffic that is differentiated using TC labels throughout the PCI Express fabric. The foundation of VCs is a set of independent fabric resources (queues/buffers and associated control logic). These resources are used to move information across PCI Express Links with fully independent flow control between different VCs. This is key to solving the problem of flow-control induced blocking, where a single traffic flow may create a bottleneck for all traffic within the system.

Traffic is associated with VCs by mapping packets with particular TC labels to their corresponding VCs. The PCI Express VC mechanism allows flexible mapping of TCs onto the VCs. In the simplest form, TCs can be mapped to VCs on a 1:1 basis. To allow performance/cost tradeoffs, PCI Express provides the capability of mapping multiple TCs onto a single VC. Section 2.6.3 covers details of TC to VC mapping.

A Virtual Channel is established when one or multiple TCs are associated with a physical VC resource designated by a VC ID. This process is controlled by the PCI Express configuration software as described in Sections 5.11 and 7.3.

Support for TCs and VCs beyond the default TC0/VC0 pair is optional. The association of TC0 with VC0 is fixed, i.e., “hardwired”, and must be supported by all PCI Express components. Therefore, the baseline TC/VC setup does not require any VC-specific hardware or software configuration.
In order to ensure interoperability, PCI Express components that do not implement the optional PCI Express Virtual Channel Capability Structure must obey the following rules:
• A Requester must only generate requests with the TC0 label. (Note that if it initiates requests with a TC label other than TC0, the requests may be treated as illegal by the component on the other side of the Link that implements the extended VC capability and applies TC filtering.)
• A Completer must accept requests with a TC label other than TC0, and must preserve the TC label, i.e., any completion that it generates must have the same TC label as the label of the request.
• A Switch must map all TCs to VC0 and must forward all transactions regardless of the TC label.

A PCI Express Endpoint or Root Complex that intends to be a Requester issuing requests with a TC label other than TC0 must implement the PCI Express Virtual Channel Capability Structure, even if it only supports the default VC. This is required in order to enable mapping of TCs beyond the default configuration. It must follow the TC/VC mapping rules according to the software programming of the VC Capability Structure.

Figure 2-6 illustrates the Virtual Channel concept. The enlarged area shows VC resources in one direction (Switch to RC). Conceptually, traffic that flows through VCs is muxed onto a common physical Link resource on the transmit side and de-muxed into separate VC paths on the receive side.


Figure 2-6: Virtual Channel Concept – An Illustration (figure labels: Root Complex; Switch; Components A, B, C, D, and E; a single Link with 2 VCs, VC0 and VCn; legend: Default VC (VC0), Another VC)

Internal to the Switch, every Virtual Channel requires dedicated physical resources (queues/buffers and control logic) that support independent traffic flows inside the Switch. Figure 2-7 shows conceptually the VC resources within the Switch (shown in Figure 2-6) that are required to support traffic flow in the upstream direction.

Figure 2-7: Virtual Channel Concept – Switch Internals (Upstream Flow) (figure labels: Switch Downstream Ports with VC #0 and VC #1 resources; Upstream Port)


2.6.1. Virtual Channel Identification (VC ID)
A PCI Express Port can support up to eight Virtual Channels. These VCs are uniquely identified using the Virtual Channel Identification (VC ID) mechanism.

Note that TLPs do not include VC ID information. The association of TLPs with a VC ID for the purpose of Flow Control accounting is done at each Port of the Link using TC to VC mapping, as discussed in Section 2.6.3.

All PCI Express Ports that support more than VC0 must provide the VC Capability Structure according to the definition in Section 5.11. Providing this extended structure is optional for Ports that support only the default TC0/VC0 configuration. PCI Express configuration software is responsible for configuring Ports on both sides of the Link for a matching number of VCs. This is accomplished by scanning the PCI Express hierarchy and using the VC Capability registers associated with Ports (that support more than the default VC0) to establish the number of VCs for the Link.
Rules for assigning VC IDs to VC hardware resources are as follows:
• VC ID assignment must be unique per PCI Express Port: the same VC ID cannot be assigned to different VC hardware resources within the same Port.
• VC ID assignment must be the same (matching in terms of the number of VCs and their IDs) for the two PCI Express Ports on both sides of a PCI Express Link.
• VC ID 0 is assigned and fixed to the default VC.
• For a PCI Express Port that supports the VC Capability Structure, the first VC hardware resource must be the default VC.
• VC ID assignment must be in increasing order (but not necessarily contiguous) for a PCI Express Port that supports multiple VCs.

2.6.2. VC Support Options
To simplify interoperability when configuring the number of supported VCs per Link, the PCI Express specification limits the set of valid VC configuration options to 1, 2, 4, and 8 VCs. Other VC configurations, such as 3, 5, 6, and 7 VCs, are not allowed.
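A minimal sketch of the VC ID assignment rules and the VC count restriction of Section 2.6.2, assuming each Port's VC IDs are given as a Python list in VC hardware resource order. The function names are hypothetical; configuration software would obtain these values from the VC Capability Structure (Section 5.11), which is not modeled here.

```python
VALID_VC_COUNTS = (1, 2, 4, 8)  # Section 2.6.2


def port_vc_id_violations(vc_ids):
    """Check one Port's VC IDs, listed in VC hardware resource order."""
    problems = []
    if len(vc_ids) not in VALID_VC_COUNTS:
        problems.append(f"{len(vc_ids)} VCs is not a valid configuration")
    if len(set(vc_ids)) != len(vc_ids):
        problems.append("a VC ID is assigned to more than one VC resource")
    if not vc_ids or vc_ids[0] != 0:
        problems.append("the first VC resource must be the default VC (VC ID 0)")
    # Increasing, but gaps between IDs are allowed.
    if any(later <= earlier for earlier, later in zip(vc_ids, vc_ids[1:])):
        problems.append("VC IDs are not in increasing order")
    return problems


def link_vc_id_violations(port_a, port_b):
    """Both Ports of a Link must pass the per-Port rules and match exactly."""
    problems = port_vc_id_violations(port_a) + port_vc_id_violations(port_b)
    if list(port_a) != list(port_b):
        problems.append("VC ID assignment differs between the two Ports")
    return problems
```

For example, `[0, 2, 5, 7]` is accepted because IDs need only increase, whereas `[0, 1, 2]` is rejected because three VCs is not one of the permitted configurations.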


2.6.3. TC to VC Mapping
Every Traffic Class that is supported must be mapped to one of the Virtual Channels. The mapping of TC0 to VC0 is fixed.

The mapping of TCs other than TC0 is system software specific. However, the mapping algorithm must obey the following rules:
• One or multiple TCs can be mapped to a VC.
• One TC must not be mapped to multiple VCs in any PCI Express Port.
• TC/VC mapping must be identical for PCI Express Ports on both sides of a PCI Express Link.
• For any two TCs (TCb > TCa), TCb must be mapped to the same VC as TCa or be mapped to a VC with a higher VC ID. It is not allowed to map TCb to a VC with a lower VC ID than the one TCa is mapped to.

Table 2-6 provides an example of TC to VC mapping.

Table 2-6: TC to VC Mapping Example
Supported VC Configurations | TC/VC Mapping Options
VC0                         | TC(0-7)/VC0
VC0, VC1                    | TC(0-6)/VC0, TC7/VC1
VC0-VC3                     | TC(0-1)/VC0, TC(2-4)/VC1, TC(5-6)/VC2, TC7/VC3
VC0-VC7                     | TC[0:7]/VC[0:7]

Notes on conventions:
• TCn/VCk = TCn mapped to VCk
• TC(n-m)/VCk = all TCs in the range n-m mapped to VCk (i.e., to the same VC)
• TC[n:m]/VC[n:m] = TCn/VCn, TCn+1/VCn+1, ..., TCm/VCm

Figure 2-8 provides a graphical illustration of TC to VC mapping in several different Link configurations. For additional considerations on TC/VC mapping, including symmetrical and asymmetrical mapping within PCI Express Switches, refer to Section 7.3.
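The TC to VC mapping rules, together with the Ingress Port check summarized in Section 2.6.4, can be sketched as follows. This is an illustrative model only: it assumes a Port's mapping is held as a Python dict from TC number to VC ID and the Port's enabled VCs as a set; both representations and the function names are hypothetical.

```python
def tc_vc_map_violations(tc_to_vc):
    """Check a Port's TC-to-VC map (dict: TC number -> VC ID) against Section 2.6.3."""
    problems = []
    if any(not 0 <= tc <= 7 for tc in tc_to_vc):
        problems.append("TC numbers must fit the 3-bit TC[2:0] field")
    if tc_to_vc.get(0) != 0:
        problems.append("TC0 must be mapped to VC0")
    tcs = sorted(tc_to_vc)
    # Checking neighbours in TC order suffices: if every adjacent pair is
    # non-decreasing, every pair TCa < TCb is as well.
    for tc_a, tc_b in zip(tcs, tcs[1:]):
        if tc_to_vc[tc_b] < tc_to_vc[tc_a]:
            problems.append(f"TC{tc_b} maps to a lower VC ID than TC{tc_a}")
    return problems


def vc_for_tc(tc_to_vc, enabled_vcs, tc):
    """Return the VC carrying `tc`, or None if it is not mapped to an enabled VC.

    At an Ingress Port, None corresponds to a malformed transaction (Section 2.6.4).
    """
    vc = tc_to_vc.get(tc)
    return vc if vc in enabled_vcs else None
```

Because a dict holds exactly one VC per TC, the rule that one TC must not be mapped to multiple VCs holds by construction, and the requirement that both Ports of a Link use identical mappings reduces to comparing their two dicts for equality.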


Figure 2-8: An Example of TC/VC Configurations (figure labels: Root Complex, Switches, and Endpoints connected by Links; the mappings shown are TC[0:7]/VC0; TC[0:6]/VC0 with TC7/VC1; and TC[0:1]/VC0, TC[2:4]/VC1, TC[5:6]/VC2, TC7/VC3)

2.6.4. VC and TC Rules
Here is a summary of key rules associated with the TC/VC mechanism:
• All PCI Express devices must support the general purpose I/O Traffic Class, i.e., TC0, and must implement the default VC0.
• Each Virtual Channel (VC) has independent Flow Control.
• There are no ordering relationships required between different TCs.
• There are no ordering relationships required between different VCs.
• A Switch’s peer-to-peer capability applies to all Virtual Channels supported by the Switch.
• Transactions with a TC that is not mapped to any enabled VC in a PCI Express Ingress Port are treated as malformed transactions by the receiving device.
• For Switches, transactions with a TC that is not mapped to any of the enabled VCs in the target Egress Port are treated as illegal transactions.


• For a Root Port, transactions with a TC that is not mapped to any of the enabled VCs in the target RCRB are treated as illegal transactions.
• Switches must support independent TC/VC mapping configuration for each Port.
• The Root Complex must support independent TC/VC mapping configuration for each RCRB and the associated Root Ports.

For more details on the VC and TC mechanisms, including configuration, mapping, and arbitration, refer to Section 7.3.

2.7. Transaction Layer Protocol - Packet Definition and Handling
PCI Express uses a packet-based protocol to exchange information between the Transaction Layers of the two components communicating with each other over the Link. PCI Express supports the following basic transaction types: Memory, I/O, Configuration, and Messages. Two addressing formats for Memory Requests are supported: 32 bit and 64 bit.

Transactions are carried using Requests and Completions. Completions are used only where required, for example, to return read data, or to acknowledge Completion of I/O and Configuration Write Transactions. Completions are associated with their corresponding Requests by the value in the Requester ID field of the Packet header.

2.7.1. Transaction Layer Packet Definition Rules
• All Transaction Layer Packets (TLPs) must start with one of the headers defined in this section.
  o Some TLPs include data following the header as determined by the Fmt[1:0] field specified in the TLP header.
• TLP data must be four-byte naturally aligned and in increments of four-Byte Double Words (DW).
• All TLP headers include the following fields:
  o Fmt[1:0] – Specifies the global Format of the TLP:
      • 00 - 3DW header, no data
      • 01 - 4DW header, no data
      • 10 - 3DW header, with data
      • 11 - 4DW header, with data


  o Type[4:0] – See Table 2-8 for type encodings.
      • Both Fmt[1:0] and Type[4:0] must be decoded to determine the specifics of the TLP format.
  o Length[9:0] – Length of data payload in DW:
      • 00 0000 0001 = 1DW
      • 00 0000 0010 = 2DW
      • …
      • 11 1111 1111 = 1023DW
      • 00 0000 0000 = 1024DW
  o Permitted Fmt[1:0] and Type[4:0] field values are shown in Table 2-8.
      • All other encodings are reserved.
  o TD – ‘1’ indicates the presence of a TLP “digest” in the form of a single DW at the end of the TLP (see Figure 2-2).
  o EP –
      • If TD=’1’, EP=’0’ means the TLP digest is used for data poisoning; EP=’1’ means the TLP digest is used for an end-to-end CRC (ECRC) field.
      • If TD=’0’, EP=’0’ means the TLP is not poisoned; EP=’1’ means the TLP is poisoned.
      • Thus, the combination of TD and EP is interpreted as shown in Table 2-7.
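The following sketch decodes the first header DW into the common fields listed above, including the Length encoding in which 00 0000 0000 means 1024 DW. It is a hypothetical helper, not part of this specification; it assumes the bit positions of the Figure 2-9 header layout (Byte 0 first, bit 7 most significant): Fmt in Byte 0 bits 6:5, Type in Byte 0 bits 4:0, TC in Byte 1 bits 6:4, TD, EP, and Attr in Byte 2 bits 7, 6, and 5:4, and Length[9:8] in Byte 2 bits 1:0 with Length[7:0] in Byte 3.

```python
from dataclasses import dataclass


@dataclass
class HeaderDw0:
    fmt: int        # Fmt[1:0]
    tlp_type: int   # Type[4:0]
    tc: int         # TC[2:0]
    td: int         # 1 = TLP Digest present
    ep: int
    attr: int       # Attr[1:0]
    length_dw: int  # decoded Length, 1..1024 DW

    @property
    def header_dws(self) -> int:
        return 4 if self.fmt & 0b01 else 3   # Fmt x1 -> 4DW header

    @property
    def has_data(self) -> bool:
        return bool(self.fmt & 0b10)         # Fmt 1x -> header followed by data


def decode_length(field: int) -> int:
    """Length[9:0]: values 1..1023 encode themselves, 0 encodes 1024 DW."""
    if not 0 <= field <= 0x3FF:
        raise ValueError("Length is a 10-bit field")
    return 1024 if field == 0 else field


def encode_length(length_dw: int) -> int:
    if not 1 <= length_dw <= 1024:
        raise ValueError("Length must be 1..1024 DW")
    return length_dw & 0x3FF                 # 1024 wraps to 00 0000 0000


def parse_dw0(header: bytes) -> HeaderDw0:
    b0, b1, b2, b3 = header[:4]
    return HeaderDw0(
        fmt=(b0 >> 5) & 0b11,
        tlp_type=b0 & 0b11111,
        tc=(b1 >> 4) & 0b111,
        td=(b2 >> 7) & 1,
        ep=(b2 >> 6) & 1,
        attr=(b2 >> 4) & 0b11,
        length_dw=decode_length(((b2 & 0b11) << 8) | b3),
    )
```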


Table 2-7: TD and EP Field Values
TD | EP | TLP Digest  | End-to-End Data Integrity | Error Forwarding | Digest Value  | Comments
0  | 0  | Not present | No                        | No               | N/A           |
0  | 1  | Not present | No                        | Yes              | N/A           | Poisoned data
1  | 0  | Present     | No                        | Yes              | FFFFFFFFh     | Poisoned data (see Section 2.11)
   |    |             |                           |                  | Otherwise     | Not poisoned data
1  | 1  | Present     | Yes                       | Yes              | Valid ECRC    | No errors (see Section 2.10)
   |    |             |                           |                  | FFFFFFFFh (2) | Poisoned data
   |    |             |                           |                  | Otherwise     | ECRC error

Different types of TLPs are discussed in more detail in the following sections.

Table 2-8: Fmt[1:0] and Type[4:0] Field Encodings
TLP Type | Fmt[1:0] (3) | Type[4:0] | Description
MRd      | 00, 01       | 0 0000    | Memory Read Request
MRdLk    | 00, 01       | 0 0001    | Memory Read Request – Locked
MWr      | 10, 11       | 0 0000    | Memory Write Request
IORd     | 00           | 0 0010    | I/O Read Request
IOWr     | 10           | 0 0010    | I/O Write Request
CfgRd0   | 00           | 0 0100    | Configuration Read Type 0
CfgWr0   | 10           | 0 0100    | Configuration Write Type 0
CfgRd1   | 00           | 0 0101    | Configuration Read Type 1
CfgWr1   | 10           | 0 0101    | Configuration Write Type 1

(2) Note that FFFFFFFFh cannot occur as a valid ECRC value.
(3) Requests with two Fmt[1:0] values shown can use either the 32b (the first value) or the 64b (the second value) Addressing Packet formats.


Table 2-8 (continued)
TLP Type | Fmt[1:0] (3) | Type[4:0] | Description
Msg      | 01           | 1 0r2r1r0 | Message Request – The sub-field r[2:0] specifies the Message routing mechanism – see Table 2-9.
MsgD     | 11           | 1 0r2r1r0 | Message Request with data payload – The sub-field r[2:0] specifies the Message routing mechanism – see Table 2-9.
MsgAS    | 01           | 1 1n2n1n0 | Message for Advanced Switching – The sub-field n[2:0] specifies the message type: 1n2n1n0 – Signaling Packet Messages. A detailed description of message types and message headers will be presented in a separate document entitled Advanced PCI Express Packet Switching Specification. This is a companion specification to the PCI Express Base Specification.
MsgASD   | 11           | 1 1c2c1c0 | Message for Advanced Switching – The sub-field c[2:0] specifies the message type: 1c2c1c0 – Data Packet Messages. A detailed description of message types and message headers will be presented in a separate document entitled Advanced PCI Express Packet Switching Specification. This is a companion specification to the PCI Express Base Specification.
Cpl      | 00           | 0 1010    | Completion without Data – used for I/O and Configuration Write Completions, and Memory Read Completions with Completion Status other than Successful Completion
CplD     | 10           | 0 1010    | Completion with Data – used for Memory, I/O, and Configuration Read Completions
CplLk    | 00           | 0 1011    | Completion for Locked Memory Read without Data – used only in error case
CplDLk   | 10           | 0 1011    | Completion for Locked Memory Read – otherwise like CplD

All encodings not shown above are Reserved.
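A possible decoder for Table 2-8, extended with the Message routing sub-field of Table 2-9, is sketched below. The table names and `classify_tlp` are illustrative assumptions. The sketch treats every (Fmt, Type) pair not listed in Table 2-8 as Reserved, and reports routing only for Msg and MsgD, since the MsgAS and MsgASD sub-fields are defined in the companion Advanced Switching specification.

```python
# (Fmt[1:0], Type[4:0]) pairs from Table 2-8, excluding the Message types.
TLP_TYPES = {
    (0b00, 0b00000): "MRd",    (0b01, 0b00000): "MRd",
    (0b00, 0b00001): "MRdLk",  (0b01, 0b00001): "MRdLk",
    (0b10, 0b00000): "MWr",    (0b11, 0b00000): "MWr",
    (0b00, 0b00010): "IORd",   (0b10, 0b00010): "IOWr",
    (0b00, 0b00100): "CfgRd0", (0b10, 0b00100): "CfgWr0",
    (0b00, 0b00101): "CfgRd1", (0b10, 0b00101): "CfgWr1",
    (0b00, 0b01010): "Cpl",    (0b10, 0b01010): "CplD",
    (0b00, 0b01011): "CplLk",  (0b10, 0b01011): "CplDLk",
}

# Table 2-9: r[2:0] routing values; 101-111 are Reserved.
MSG_ROUTING = {
    0b000: "Routed to Root Complex",
    0b001: "Routed by Address",
    0b010: "Routed by ID",
    0b011: "Broadcast from Root Complex",
    0b100: "Local - Terminate at Receiver",
}


def classify_tlp(fmt: int, tlp_type: int):
    """Return (mnemonic, routing); routing is None except for Msg/MsgD."""
    if tlp_type >> 3 == 0b10:                       # Type = 1 0r2r1r0
        name = {0b01: "Msg", 0b11: "MsgD"}.get(fmt, "Reserved")
        if name == "Reserved":
            return name, None
        return name, MSG_ROUTING.get(tlp_type & 0b111,
                                     "Reserved - Terminate at Receiver")
    if tlp_type >> 3 == 0b11:                       # Type = 1 1xxx (AS)
        return {0b01: "MsgAS", 0b11: "MsgASD"}.get(fmt, "Reserved"), None
    return TLP_TYPES.get((fmt, tlp_type), "Reserved"), None
```

Note that an I/O or Configuration Request with a 4DW Fmt decodes as Reserved, reflecting that Table 2-8 lists only one Fmt value for those types.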


Table 2-9: Message Routing
r[2:0]  | Description
000     | Routed to Root Complex
001     | Routed by Address
010     | Routed by ID (4)
011     | Broadcast from Root Complex
100     | Local - Terminate at Receiver
101-111 | Reserved - Terminate at Receiver

(4) Similar to a Completion or a Configuration Request.

2.7.2. TLP Digest Rules
• For any TLP, a value of ‘1’ in the TD field indicates the presence of the TLP Digest field at the end of the TLP.
  o The presence or absence of the TLP Digest field must be checked for all TLPs.
  o A TLP with a ‘1’ in the TD field but without a TLP Digest, or a TLP with a TLP Digest but without a ‘1’ in the TD field, is a Malformed TLP.
      • This is a reported error associated with the Receiving Port (see Section 7.2).
• For any TLP with a TLP Digest field, a value of ‘1’ in the EP field indicates that the TLP Digest field is used for an end-to-end CRC (ECRC).
  o The presence or absence of the ECRC must be checked for all TLPs.
• If the device at the ultimate destination of the TLP:
  o supports neither data poisoning nor ECRC checking, the device must ignore the TLP Digest;
  o supports data poisoning but not ECRC checking, the device interprets the value in the TLP Digest field according to Section 2.11;
  o supports ECRC checking, the device interprets the value in the TLP Digest field as an ECRC value, according to the rules in Section 2.10.
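The TD/EP interpretation of Table 2-7, combined with the Malformed TLP check above, might be modeled as below. This sketch assumes a receiver that supports both data poisoning and ECRC checking. The ECRC calculation itself (Section 2.10) is outside its scope, so the caller passes in the value it computed as `expected_ecrc`; `interpret_digest` and `MalformedTLP` are hypothetical names.

```python
POISON_DIGEST = 0xFFFFFFFF  # never a valid ECRC value (Table 2-7, note 2)


class MalformedTLP(Exception):
    """TD and the presence of a TLP Digest disagree (Section 2.7.2)."""


def interpret_digest(td, ep, digest=None, expected_ecrc=None):
    """Classify a received TLP per Table 2-7.

    digest is the trailing DW of the TLP, or None if there is none.
    expected_ecrc is the ECRC the receiver computed (Section 2.10).
    """
    if (td == 1) != (digest is not None):
        raise MalformedTLP("TD bit does not match presence of TLP Digest")
    if td == 0:
        return "Poisoned data" if ep else "Not poisoned data"
    if ep == 0:  # TD=1, EP=0: the digest only signals poisoning
        return "Poisoned data" if digest == POISON_DIGEST else "Not poisoned data"
    # TD=1, EP=1: the digest is an ECRC
    if digest == POISON_DIGEST:
        return "Poisoned data"
    return "No errors" if digest == expected_ecrc else "ECRC error"
```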


2.7.3. TLPs with Data Payloads - Rules
• Length is specified as a number of naturally aligned DW.
• Length[9:0] is reserved for all Messages except those which explicitly refer to a Data Length.
  o See the Message Code table in Section 2.7.4.4.
• The data payload of a TLP must not exceed the length specified by the value in the Max_Payload_Size field of the Link Command Register (see Section 5.8.7).
  o Note: Max_Payload_Size applies only to TLPs with data payloads; Memory Read Requests are not restricted in length by Max_Payload_Size. The size of the Memory Read Request is controlled by the Length field.
  o Receivers must check for violations of this rule. If a Receiver determines that a TLP violates this rule, the TLP is a Malformed TLP.
      • This is a reported error associated with the Receiving Port (see Section 7.2).
• For TLPs that include data, the value in the Length field and the actual amount of data included in the TLP must be equal.
  o Receivers must check for violations of this rule. If a Receiver determines that a TLP violates this rule, the TLP is a Malformed TLP.
      • This is a reported error associated with the Receiving Port (see Section 7.2).
• Requests must not specify an Address/Length combination which causes a Memory Space access to cross a 4K boundary.
  o Receivers may optionally check for violations of this rule.
    If a Receiver implementing this check determines that a TLP violates this rule, the TLP is a Malformed TLP.
      • If checked, this is a reported error associated with the Receiving Port (see Section 7.2).
• Note: The Length specified in the Length field applies only to data; the TLP Digest is not included in the Length.
• When a data payload is included in a TLP, the first Byte of data following the header corresponds to the Byte address closest to zero, and the succeeding Bytes are in increasing Byte address sequence.
  o Example: For a 16B write to location 100h, the first byte following the header would be the byte to be written to location 100h, the second byte would be written to location 101h, and so on, with the final byte written to location 10Fh.
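The receiver-side checks of this section and the byte ordering rule can be sketched as follows. The helpers are hypothetical. `max_payload_bytes` is assumed to be the Max_Payload_Size value already converted to bytes (the register encoding is defined in Section 5.8.7 and is not modeled here), and addresses are assumed DW aligned, as carried in the Address[31:2] field. Because 4096 is a multiple of 4, checking the DW-granular start and end of an access is sufficient for the 4K boundary rule.

```python
def crosses_4k_boundary(address: int, length_dw: int) -> bool:
    """True if a DW-aligned access of length_dw DWs spans two 4 KB regions."""
    if address % 4:
        raise ValueError("Request addresses are DW aligned (Address[31:2])")
    last_byte = address + 4 * length_dw - 1
    return address // 4096 != last_byte // 4096


def payload_violations(length_dw: int, payload: bytes, max_payload_bytes: int):
    """Malformed-TLP checks of Section 2.7.3 for a TLP that carries data."""
    problems = []
    if len(payload) != 4 * length_dw:
        problems.append("Length field does not match the amount of data")
    if len(payload) > max_payload_bytes:
        problems.append("data payload exceeds Max_Payload_Size")
    return problems


def write_targets(address: int, payload: bytes):
    """Pair each payload byte with its target: the first byte goes to the lowest address."""
    return [(address + i, value) for i, value in enumerate(payload)]
```

For the 16B example above, `write_targets(0x100, data)` pairs the first payload byte with 100h and the last with 10Fh.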


Implementation Note: Maintaining Alignment in Data Payloads
Section 2.7.6.2.1 discusses rules for forming Read Completions respecting certain natural address boundaries. Memory Write performance can be significantly improved by respecting similar address boundaries in the formation of the Write Request. Specifically, forming Write Requests such that natural address boundaries of 64 or 128 Bytes are respected will help to improve system performance.

2.7.4. Requests
Requests include a Request header which, for some types of Requests, will be followed by some number of DW of data. The rules for each of the fields of the Request header are defined in the following sections.

2.7.4.1. Address Field Rules
Two Address formats are specified, a 32b format and a 64b format. Figure 2-9 shows the Request header format for 32b Addressing and Figure 2-10 shows the Request header format for 64b Addressing.
• Memory Read Requests and Memory Write Requests can use either format.
  o For Addresses below 4 GB, Requesters must use the 32b format.
• I/O Read Requests and I/O Write Requests use the format shown in Figure 2-11.
• Configuration Read Requests and Configuration Write Requests use the format shown in Figure 2-12.
• Msg and MsgD Requests use the formats shown in Figure 2-13 and Figure 2-14, respectively.
• MsgAS and MsgASD Requests use the formats shown in Figure 2-15 and Figure 2-16, respectively.
• All PCI Express Agents must decode all address bits in the header; address aliasing is not allowed.

Implementation Note: Prevention of Address Aliasing
For correct software operation, full address decoding is required even in systems where it may be known to the system hardware architect/designer that fewer than 64 bits of address are actually meaningful in the system.
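A sketch of two Requester-side decisions described above: choosing the 32b or 64b header format from the target address, and, following the alignment Implementation Note, cutting a large write into pieces that respect a natural boundary. The function names and the 128 Byte default are illustrative choices; the Note recommends, but does not require, respecting 64 or 128 Byte boundaries.

```python
FOUR_GB = 1 << 32


def memory_request_fmt(address: int, with_data: bool) -> int:
    """Fmt[1:0] for a Memory Request: 32b (3DW) header below 4 GB, else 64b (4DW)."""
    if not 0 <= address < 1 << 64:
        raise ValueError("address must fit in 64 bits")
    fmt = 0b10 if with_data else 0b00
    if address >= FOUR_GB:
        fmt |= 0b01
    return fmt


def split_on_boundaries(address: int, nbytes: int, boundary: int = 128):
    """Split a write into (address, size) pieces that never cross `boundary`."""
    pieces = []
    while nbytes > 0:
        size = min(boundary - address % boundary, nbytes)
        pieces.append((address, size))
        address += size
        nbytes -= size
    return pieces
```

For example, a 256 Byte write starting at 1040h becomes three Requests of 64, 128, and 64 Bytes, the middle one aligned to a 128 Byte boundary.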


Figure 2-9: Request Header Format for 32b Addressing of Memory
  Byte 0-3:  R | Fmt (x0) | Type | R | TC | Reserved | TD | EP | Attr | R | Length
  Byte 4-7:  Requester ID | Tag | Last DW BE | 1st DW BE
  Byte 8-11: Address[31:2] | R
  (field widths in bits, most significant first: R 1, Fmt 2, Type 5, R 1, TC 3, Reserved 4, TD 1, EP 1, Attr 2, R 2, Length 10; Requester ID 16, Tag 8, Last DW BE 4, 1st DW BE 4; Address[31:2] 30, R 2)

Figure 2-10: Request Header Format for 64b Addressing of Memory
  Byte 0-3:   R | Fmt (x1) | Type | R | TC | Reserved | TD | EP | Attr | R | Length
  Byte 4-7:   Requester ID | Tag | Last DW BE | 1st DW BE
  Byte 8-11:  Address[63:32]
  Byte 12-15: Address[31:2] | R

Figure 2-11: Request Header Format for I/O Transactions
  Byte 0-3:  R | Fmt (x0) | Type | R | TC (000) | Reserved | TD | EP | Attr (00) | R | Length (00 0000 0001)
  Byte 4-7:  Requester ID | Tag | Last DW BE (0000) | 1st DW BE
  Byte 8-11: Address[31:2] | R

Figure 2-12: Request Header Format for Configuration Transactions
  Byte 0-3:  R | Fmt (x0) | Type | R | TC (000) | Reserved | TD | EP | Attr (00) | R | Length (00 0000 0001)
  Byte 4-7:  Requester ID | Tag | Last DW BE (0000) | 1st DW BE
  Byte 8-11: Bus Number | Device Number | Function Number | Reserved | Ext. Register Number | Register Number | R


Figure 2-13: Request Header Format for Msg Request
  Byte 0-3:   R | Fmt (01) | Type | R | TC | Reserved | TD | EP | Attr (00) | R | Length - Reserved (00 0000 0000)
  Byte 4-7:   Requester ID | Tag | Message Code
  Byte 8-11:  Address[63:32]/Reserved
  Byte 12-15: Address[31:2]/Reserved | R

Figure 2-14: Request Header Format for MsgD Request
  Byte 0-3:   R | Fmt (11) | Type | R | TC | Reserved | TD | EP | Attr (00) | R | Length
  Byte 4-7:   Requester ID | Tag | Message Code
  Byte 8-11:  Address[63:32]/Reserved
  Byte 12-15: Address[31:2]/Reserved | R

Figure 2-15: Request Header Format for MsgAS Request
  Byte 0-3:   R | Fmt (01) | Type | R | TC | Reserved | TD | EP | Attr | R | Length - Reserved (00 0000 0000)
  Byte 4-15:  Reserved for MsgAS (A detailed description of message types and message headers will be included in a separate document, called the Advanced PCI Express Packet Switching Specification, a companion specification to the PCI Express Base Specification.)

Figure 2-16: Request Header Format for MsgASD Request
  Byte 0-3:   R | Fmt (11) | Type | R | TC | Reserved | TD | EP | Attr | R | Length
  Byte 4-15:  Reserved for MsgASD (A detailed description of message types and message headers will be included in a separate document, called the Advanced PCI Express Packet Switching Specification, a companion specification to the PCI Express Base Specification.)
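To show how the fields of Figure 2-9 fit together, the sketch below packs a 3DW Memory Read or Write Request header. It is a hypothetical helper that assumes each DW is emitted Byte 0 first (big-endian within the DW, matching the figure's byte numbering), leaves TD and EP at 0 (no TLP Digest), and does not enforce the Byte Enable rules of Section 2.7.4.2.

```python
import struct


def build_mem_request_3dw(*, write: bool, requester_id: int, tag: int,
                          address: int, length_dw: int, first_be: int,
                          last_be: int, tc: int = 0, attr: int = 0) -> bytes:
    """Pack a 32b-address MRd/MWr header following the Figure 2-9 layout."""
    if address >= 1 << 32 or address % 4:
        raise ValueError("need a DW-aligned address below 4 GB")
    if not 1 <= length_dw <= 1024:
        raise ValueError("Length must be 1..1024 DW")
    for name, value, bits in (("requester_id", requester_id, 16), ("tag", tag, 8),
                              ("first_be", first_be, 4), ("last_be", last_be, 4),
                              ("tc", tc, 3), ("attr", attr, 2)):
        if not 0 <= value < 1 << bits:
            raise ValueError(f"{name} does not fit in {bits} bits")
    fmt = 0b10 if write else 0b00       # 3DW header, with or without data
    mem_type = 0b00000                  # MRd / MWr in Table 2-8
    dw0 = ((fmt << 29) | (mem_type << 24) | (tc << 20)
           | (attr << 12) | (length_dw & 0x3FF))   # TD = EP = 0
    dw1 = (requester_id << 16) | (tag << 8) | (last_be << 4) | first_be
    dw2 = address                       # Address[31:2]; bits 1:0 (R) stay 0
    return struct.pack(">III", dw0, dw1, dw2)
```

A 256B read from 1000h with Requester ID 0100h and Tag 05h, for instance, packs to the twelve bytes 00000040 010005FF 00001000 (hex).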


2.7.4.2. First/Last DW Byte Enable Rules
• The first DW Byte Enables[3:0] field contains byte enables for the first DW of any Memory Read or Write Request, and for the only DW of an I/O or Configuration Request.
  o If there is only one DW for a Memory Request, this byte enable field is used.
  o If the Length field for a Request indicates a length of greater than 1DW, this field must not be inactive (must not equal 0000b).
• The Last DW Byte Enables[3:0] field contains byte enables for the last DW of any Memory Read or Write Request.
  o If the Length field for a Request indicates a length of 1DW, this field must be inactive (must equal 0000b).
  o If the Length field for a Request indicates a length of greater than 1DW, this field must not be inactive (must not equal 0000b).
• These fields are never used with Msg, MsgD, MsgAS, or MsgASD Requests.
  o Note that these fields overlap the Message Code/Dest RID[7:0] fields.
• For each bit of the Byte Enables fields:
  o a value of ‘0’ indicates that the corresponding Byte of data must not be written or, if non-prefetchable, must not be read at the Completer;
  o a value of ‘1’ indicates that the corresponding Byte of data must be written or read at the Completer.
• If a Read Request of 1 DW specifies that no Bytes are enabled to be read (1st DW Byte Enables[3:0] field = 0000b), the corresponding Completion must specify a Length of 1 DW, and include a data payload of 1 DW.
  o The contents of the data payload are unspecified and may be any value.
• Receiver/Completer behavior is undefined for a TLP violating the Byte Enables rules specified in this section.
• Receivers may optionally check for violations of the Byte Enables rules specified in this section.
  If a Receiver implementing such checks determines that a TLP violates one or more Byte Enable rules, the TLP is a Malformed TLP.
  o If Byte Enable rules are checked, a violation is a reported error associated with the Receiving Port (see Section 7.2).

Implementation Note: Zero Length Read
A Memory Read Request of 1 DW with no Bytes enabled, or “zero length Read,” may be used by devices as a type of “flush” Request. For a Requester, the “flush” semantic allows a device to ensure that previously issued Posted Writes have been completed at their PCI Express destination.


The “flush” semantic has wide application, and all Completers must implement the functionality associated with this semantic. Because a Requester may use the “flush” semantic without comprehending the characteristics of the Completer, Completers must ensure that zero length reads do not have side-effects. This is really just a specific case of the rule that in a non-prefetchable space, non-enabled Bytes must not be read at the Completer. Note that the “flush” applies only to traffic in the same Traffic Class as the zero length Read.

• Of the first DW Byte Enables[3:0] field:
  o Bit 0 corresponds to Byte 0 of the first DW of data.
  o Bit 1 corresponds to Byte 1 of the first DW of data.
  o Bit 2 corresponds to Byte 2 of the first DW of data.
  o Bit 3 corresponds to Byte 3 of the first DW of data.
• Of the Last DW Byte Enables[3:0] field:
  o Bit 0 corresponds to Byte 0 of the last DW of data.
  o Bit 1 corresponds to Byte 1 of the last DW of data.
  o Bit 2 corresponds to Byte 2 of the last DW of data.
  o Bit 3 corresponds to Byte 3 of the last DW of data.

Figure 2-9, Figure 2-10, Figure 2-11, and Figure 2-12 show the Byte Enable fields for Memory, I/O, and Configuration Requests.

2.7.4.3. Rules for Tag, Requester ID, Traffic Class, and Attribute Fields
• The Tag[7:0] field contains the Tag as described in Section 2.4.2.
• The Requester ID[15:0] field contains the Requester ID as described in Section 2.4.2.
• The TC[2:0] field contains the Traffic Class identification as described in Section 2.4.4.
• The Attr[1:0] field contains the Transaction Descriptor attribute as described in Section 2.4.3.

Figure 2-9, Figure 2-10, Figure 2-11, Figure 2-12, Figure 2-13, and Figure 2-15 show these fields.
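The Byte Enable rules of Section 2.7.4.2 might be checked, and the bytes covered by a Memory Request enumerated, as in the following sketch. The function names are hypothetical, and the sketch assumes that every DW between the first and last DW of a Memory Request is fully enabled, since only the first and last DW carry Byte Enable fields.

```python
def byte_enable_violations(length_dw: int, first_be: int, last_be: int):
    """Byte Enable rules of Section 2.7.4.2 for a Memory Request."""
    problems = []
    if length_dw == 1:
        if last_be != 0b0000:
            problems.append("Last DW BE must be 0000b when Length is 1 DW")
    else:
        if first_be == 0b0000:
            problems.append("1st DW BE must not be 0000b when Length > 1 DW")
        if last_be == 0b0000:
            problems.append("Last DW BE must not be 0000b when Length > 1 DW")
    return problems


def enabled_byte_addresses(address: int, length_dw: int, first_be: int, last_be: int):
    """Byte addresses the Completer may access; bit n of a BE field covers Byte n.

    `address` is the DW-aligned start address; middle DWs are assumed fully enabled.
    """
    addresses = []
    for i in range(length_dw):
        if i == 0:
            be = first_be
        elif i == length_dw - 1:
            be = last_be
        else:
            be = 0b1111
        base = address + 4 * i
        addresses.extend(base + n for n in range(4) if (be >> n) & 1)
    return addresses
```

A zero length Read (Length of 1 DW with both Byte Enable fields 0000b) passes these checks and enables no bytes, which is why a Completer can service it as a “flush” without side-effects.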


2.7.4.4. Message Space Rules
• Message codes and support requirements are defined in Table 2-10.
• All devices must fully decode all Messages to distinguish supported Messages from unsupported Messages (aliasing is not permitted).
• Message Requests are posted and do not require Completion.
• Message Requests follow the same ordering rules as Memory Write Requests.
• Except as noted, the Address field is Reserved.
• Except as noted, the Attr (Attribute) field is set to 00b.
• Except as noted, Messages use Traffic Class = 0.
  o Receivers must check for violations of this rule. If a Receiver determines that a TLP violates this rule, the TLP is a Malformed TLP.
      • This is a reported error associated with the Receiving Port (see Section 7.2).
• Message Codes in the range 1000 0000 through 1111 1111 are reserved for Vendor Specific use.
• Receipt of an unsupported Message is an Unsupported Request.
  o An Unsupported Request is a reported error associated with the Receiving device/function (see Section 7.2).
  o Note that many Messages are specified to be simply discarded by the Receiver without effect; such Messages are not considered Unsupported Requests and are, therefore, not errors.
      Example: A PCI Express Endpoint receiving an Unlock Message.
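A sketch of the receive-side Message rules follows. The code values are only those visible in the portion of Table 2-10 reproduced here (the table continues beyond this excerpt). `handled_codes` and the `tc0_required` flag are hypothetical parameters; the latter stands in for the “Except as noted” cases, which are not enumerated in this section.

```python
# Message codes from the portion of Table 2-10 reproduced in this section.
MSG_CODES = {
    0x00: "Unlock",
    0x14: "PM_Active_State_Nak",
    0x18: "PM_PME",
    0x19: "PME_Turn_Off",
    0x1B: "PME_TO_Ack",
    0x20: "Assert_INTA", 0x21: "Assert_INTB",
    0x22: "Assert_INTC", 0x23: "Assert_INTD",
    0x24: "Deassert_INTA", 0x25: "Deassert_INTB",
    0x26: "Deassert_INTC", 0x27: "Deassert_INTD",
    0x30: "ERR_COR", 0x31: "ERR_NONFATAL", 0x33: "ERR_FATAL",
}


def message_name(code: int):
    if not 0 <= code <= 0xFF:
        raise ValueError("Message Code is an 8-bit field")
    if code >= 0b1000_0000:
        return "Vendor Specific"
    return MSG_CODES.get(code)  # None: not listed in this excerpt


def receive_message(code: int, tc: int, handled_codes, tc0_required: bool = True) -> str:
    """Outcome of a received Message under the Section 2.7.4.4 rules.

    handled_codes lists the codes this device supports, including Messages it
    is specified to discard silently (those are not Unsupported Requests).
    """
    if tc0_required and tc != 0:
        return "Malformed TLP"
    if code in handled_codes:
        return "Accepted"
    return "Unsupported Request"
```

An Endpoint that lists Unlock (00h) in `handled_codes` and simply drops it therefore reports no error, matching the example above.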


Table 2-10: Msg Codes

| Name | Code[7:0] | Routing r[2:0] | Support (note 5) | Req ID (note 6) | Description/Comments |
| Unlock | 0000 0000 | 011 | t r r | BD | Unlock Completer |
| ERR_COR | 0011 0000 | 000 | r t t | B, BD, BDF | Signal detection of a correctable error |
| ERR_NONFATAL | 0011 0001 | 000 | r t t | B, BD, BDF | Signal detection of an uncorrectable error |
| ERR_FATAL | 0011 0011 | 000 | r t t | B, BD, BDF | Signal detection of a fatal error |
| PM_Active_State_Nak | 0001 0100 | 100 | t r tr r | B | Power Management related; see Chapter 6 |
| PM_PME | 0001 1000 | 000 | If PME supported: t | BDF | Power Management related; see Chapter 6 |
| PME_Turn_Off | 0001 1001 | 011 | t r r | BDF | Power Management related; see Chapter 6 |
| PME_TO_Ack | 0001 1011 | 000 | r t t (Note: Switch handling is special) | BDF | Power Management related; see Chapter 6 |
| Assert_INTA | 0010 0000 | 100 | All: r; As Required: t t | B | Assert INTA virtual signal. Note: These Messages are used for PCI 2.3 compatible INTx emulation. |
| Assert_INTB | 0010 0001 | 100 | All: r; As Required: t t | B | Assert INTB virtual signal |
| Assert_INTC | 0010 0010 | 100 | All: r; As Required: t t | B | Assert INTC virtual signal |
| Assert_INTD | 0010 0011 | 100 | All: r; As Required: t t | B | Assert INTD virtual signal |
| Deassert_INTA | 0010 0100 | 100 | All: r; As Required: t t | B | De-assert INTA virtual signal |
| Deassert_INTB | 0010 0101 | 100 | All: r; As Required: t t | B | De-assert INTB virtual signal |
| Deassert_INTC | 0010 0110 | 100 | All: r; As Required: t t | B | De-assert INTC virtual signal |
| Deassert_INTD | 0010 0111 | 100 | All: r; As Required: t t | B | De-assert INTD virtual signal |
| Attention_Indicator_On | 0100 0001 | 100 | t r tr r (Required for Hot Plug Support) | BDF | Attention Indicator On |
| Attention_Indicator_Blink | 0100 0011 | 100 | t r tr r (Required for Hot Plug Support) | BDF | Attention Indicator Blink |
| Attention_Indicator_Off | 0100 0000 | 100 | t r tr r (Required for Hot Plug Support) | BDF | Attention Indicator Off |
| Power_Indicator_On | 0100 0101 | 100 | t r tr r (Required for Hot Plug Support) | BDF | Power Indicator On |
| Power_Indicator_Blink | 0100 0111 | 100 | t r tr r (Required for Hot Plug Support) | BDF | Power Indicator Blink |
| Power_Indicator_Off | 0100 0100 | 100 | t r tr r (Required for Hot Plug Support) | BDF | Power Indicator Off |
| Attention_Button_Pressed | 0100 1000 | 100 | r t r t (Required for Hot Plug Support) | BDF | Attention Button Pressed |
| Vendor Specific | 1000 0000 to 1111 1111 | 000 to 100 (note 7) | See note | See note | Codes in this range are reserved for vendor definition. |

Note (Vendor Specific): Implementation specific.

5 Abbreviations: RC = Root Complex; Ep = Endpoint; Sw = Switch (only used with “Link” routing); Br = PCI Express/PCI Bridge; r = Supports as Receiver; t = Supports as Transmitter. Note that Switches must support passing Messages on all legal routing paths. Only Messages specifying Local (0100b) routing or a reserved field value are terminated locally at the Receiving Port on a Switch.

6 The Requester ID includes sub-fields for Bus Number, Device Number and Function Number. Some Messages are not associated with specific Devices or Functions in a component, and for such Messages these fields are Reserved; this is shown in this column using a code. Some Messages can be used in more than one context, and therefore more than one code may be listed. The codes in this column are:
   B = Bus Number included; Device Number and Function Number are Reserved
   BD = Bus Number and Device Number included; Function Number is Reserved
   BDF = Bus Number, Device Number, and Function Number are included

7 Any value in this range is permitted.
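For software that decodes traffic, the Msg rows of Table 2-10 can be captured as a lookup keyed by Code[7:0]. This is a hypothetical sketch: MSG_CODES and msg_name are illustrative names, and the support and Requester ID columns are deliberately omitted.

```python
# Msg codes from Table 2-10: Code[7:0] -> (Name, Routing r[2:0]).
MSG_CODES: dict[int, tuple[str, int]] = {
    0b0000_0000: ("Unlock", 0b011),
    0b0011_0000: ("ERR_COR", 0b000),
    0b0011_0001: ("ERR_NONFATAL", 0b000),
    0b0011_0011: ("ERR_FATAL", 0b000),
    0b0001_0100: ("PM_Active_State_Nak", 0b100),
    0b0001_1000: ("PM_PME", 0b000),
    0b0001_1001: ("PME_Turn_Off", 0b011),
    0b0001_1011: ("PME_TO_Ack", 0b000),
    0b0100_0001: ("Attention_Indicator_On", 0b100),
    0b0100_0011: ("Attention_Indicator_Blink", 0b100),
    0b0100_0000: ("Attention_Indicator_Off", 0b100),
    0b0100_0101: ("Power_Indicator_On", 0b100),
    0b0100_0111: ("Power_Indicator_Blink", 0b100),
    0b0100_0100: ("Power_Indicator_Off", 0b100),
    0b0100_1000: ("Attention_Button_Pressed", 0b100),
}
# Assert_INTA..D occupy 0010 0000b..0010 0011b; Deassert_INTA..D occupy 0010 0100b..0010 0111b.
for _i, _pin in enumerate("ABCD"):
    MSG_CODES[0b0010_0000 + _i] = (f"Assert_INT{_pin}", 0b100)
    MSG_CODES[0b0010_0100 + _i] = (f"Deassert_INT{_pin}", 0b100)


def msg_name(code: int) -> str:
    """Name a Msg Message Code; 1000 0000b and above are Vendor Specific."""
    if not 0 <= code <= 0xFF:
        raise ValueError("Message Code is an 8-bit field")
    if code >= 0b1000_0000:
        return "Vendor Specific"
    entry = MSG_CODES.get(code)
    return entry[0] if entry else "undefined"
```

A code that maps to "undefined" would, under the Message Space Rules, be handled as an Unsupported Request by a Receiver.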


Table 2-11: MsgD Codes

| Name | Code[7:0] | Routing r[2:0] | Support | Req ID | Description/Comments |
| Set_Slot_Power_Limit | 0101 0000 | 100 | t r tr r | BDF | Set Slot Power Limit in Upstream Port |
| Payload_Defined | 0111 1111 | 000 to 100 | See note | See note | See Section 2.8.1.5 |
| Vendor Specific | 1000 0000 to 1111 1111 | 000 to 100 | See note | See note | Codes in this range are reserved for vendor definition. |

Note: Implementation specific.

2.7.5. Completions

All Read Requests and Non-Posted Write Requests require Completion. Completions include a Completion header that, for some types of Completions, is followed by some number of DW of data. The rules for each of the fields of the Completion header are defined in the following sections.

Figure 2-17 shows the format of a Completion header.

Figure 2-17: Completion Header Format
(Byte 0: R, Fmt, Type, R, TC, Reserved, TD, EP, Attr, R, Length. Byte 4: Completer ID, Completion Status, and Reserved bits. Byte 8: Requester ID, Tag, and Reserved bits. R = Reserved.)

2.7.5.1. Rules for Completers

• The Completion Status[2:0] field indicates the status for a Completion:
   o 000b: Successful Completion (SC)
   o 001b: Unsupported Request (UR)
   o 010b: Configuration Request Retry Status (CRS)
   o 100b: Completer Abort (CA)
   o All others: Reserved


• Rules for determining the value in the Completion Status[2:0] field are in Section 2.7.6.2.
• The Completer ID[15:0] field is a 16-bit value that is unique for every PCI Express function (see Figure 2-18).
• Functions must capture the Bus and Device Numbers supplied with all Configuration Requests (Type 0) completed by the function, and supply these numbers in the Bus and Device Number fields of the Completer ID for all Completions generated by the device/function.
   o If a function must generate a Completion prior to the initial device Configuration Request, 0's must be entered into the Bus Number and Device Number fields.
   o Note that Bus Number and Device Number may be changed at run time, and so it is necessary to re-capture this information with each and every Configuration Request.
   o Exception: The assignment of bus numbers to the logical devices within a Root Complex may be done in an implementation specific way.
• In some cases, a Completion with the UR status may be generated by a multi-function device without associating the Completion with a specific function within the device; in this case, the Function Number field is Reserved, and is set to all ‘0’s.
   o Example: A multi-function device receives a Read Request which does not target any resource associated with any of the functions of the device. The device generates a Completion with UR status and sets a value of all ‘0’s in the Function Number field of the Completer ID.

Completer ID[15:0] = Bus Number[7:0] (bits 15:8) | Device Number[4:0] (bits 7:3) | Function Number[2:0] (bits 2:0)
Figure 2-18: Completer ID

• Completion headers must supply the same values for the Requester ID, Tag, Attribute and Traffic Class as were supplied in the header of the corresponding Request.
   Note: Prior to system initialization, Requester ID values may not be established. It is required that all Requests made prior to system initialization be initiated by the Root Complex, as all Completions will be routed to the Root Complex.
• The Completer ID field is not meaningful prior to the software initialization and configuration of the completing device (using at least one Configuration Write Request); until then, the Requester must ignore the value returned in the Completer ID field.


• A Completion including data must specify the actual amount of data returned in that Completion, and must include the amount of data specified.
   o It is a TLP formation error to include more or less data than specified in the Length field, and the resulting TLP is a Malformed TLP.
   Note: This is simply a specific case of the general rule requiring the TLP data payload length to match the value in the Length field.

2.7.6. Handling of Received TLPs

2.7.6.1. Handling of Received TLPs – Rules

This section describes how all Received TLPs are handled when they are delivered to the Receive Transaction Layer from the Receive Data Link Layer, after the Data Link Layer has validated the integrity of the received TLP. The rules are diagrammed in the flowchart shown in Figure 2-19.

• Values in Reserved fields must be ignored by the Receiver.
• All Received TLPs which fail the required (and implemented optional) checks of TLP formation rules described in this section, or which use undefined Type field values, are Malformed TLPs (MP) and must be discarded without updating Receiver Flow Control information.
   o This is a reported error associated with the Receiving Port (see Section 7.2).
• If the value in the Type field is a defined value, update Receiver Flow Control tracking information (see Section 2.9).
• If the value in the Type field indicates the TLP is a Request, handle it according to the Request Handling Rules; otherwise, the TLP is a Completion, and is handled according to the Completion Handling Rules (following sections).
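The top-level decision of these rules can be written as a short function. This is a hypothetical sketch: the caller is assumed to have evaluated the formation checks with Reserved fields ignored, and update_flow_control is an illustrative callback standing in for Receiver Flow Control bookkeeping. The point it shows is ordering: a Malformed TLP returns before any Flow Control update.

```python
from enum import Enum
from typing import Callable


class RxResult(Enum):
    MALFORMED = "discard TLP, report Malformed TLP"
    REQUEST = "apply Request Handling Rules"
    COMPLETION = "apply Completion Handling Rules"


def handle_received_tlp(follows_formation_rules: bool, type_is_defined: bool,
                        type_is_request: bool,
                        update_flow_control: Callable[[], None]) -> RxResult:
    """Top-level Receive Transaction Layer decision of Section 2.7.6.1.

    The caller evaluates the formation checks with Reserved fields ignored,
    since Receivers must not check Reserved header fields.
    """
    if not (follows_formation_rules and type_is_defined):
        # Malformed: discarded WITHOUT updating Receiver Flow Control information
        return RxResult.MALFORMED
    update_flow_control()   # only TLPs with a defined Type update Flow Control
    return RxResult.REQUEST if type_is_request else RxResult.COMPLETION
```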


Figure 2-19: Flowchart for Handling of Received TLPs
(Flowchart summary:
1. Start. Ignoring Reserved fields*, does the TLP follow formation rules? If No: the TLP is Malformed; discard the TLP, report Malformed Packet, End.
2. If Yes: is the value in the Type field defined? If No: the TLP is Malformed, handled as in step 1.
3. If Yes: update Flow Control tracking.
4. Is the TLP a Request? If Yes: the TLP is a Request; see rules for Request Handling. If No: the TLP is a Completion; see rules for Completion Handling.
*TLP Header fields which are marked Reserved are not checked at the Receiver.)

Switches must process both TLPs which address resources within the Switch and TLPs which address resources residing outside the Switch. Switches handle all TLPs which address internal resources of the Switch according to the rules above. TLPs which pass through the Switch, or which address the Switch as well as passing through it, are handled according to the following rules (see Figure 2-20):

• If the value in the Type field indicates the TLP is not a Msg or MsgD Request, the TLP must be routed according to the Switch routing rules.
• Switches route Completions using the information in the Requester ID field of the Completion.
• If the value in the Type field indicates the TLP is a Msg or MsgD Request, route the Request according to the routing mechanism indicated in the r[2:0] sub-field of the Type field.
   o If the value in r[2:0] indicates the Msg/MsgD terminates at the Receiver, or if the Message Code field value is defined and corresponds to a Message which must be comprehended by the Switch, the Switch must process the Message according to the Message processing rules.
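The Switch pass-through rules above (diagrammed in Figure 2-20) can be sketched as a function returning the actions taken for one TLP. The function name, its boolean inputs, and the action strings are hypothetical; decoding r[2:0] into “terminates at the Receiver” is left to the caller rather than assuming specific encodings.

```python
def switch_actions(is_msg_or_msgd: bool, terminates_at_receiver: bool = False,
                   must_be_comprehended_by_switch: bool = False,
                   message_code_defined: bool = True) -> list[str]:
    """Actions a Switch takes for a TLP passing through it (Figure 2-20).

    terminates_at_receiver: r[2:0] says the Message stops at this Port.
    must_be_comprehended_by_switch: the (defined) Message Code also targets
    the Switch itself, even though the Message continues onward.
    """
    if not is_msg_or_msgd:
        # Requests follow the Switch routing rules; Completions use Requester ID
        return ["route to egress Port"]
    actions = []
    if not terminates_at_receiver:
        actions.append("propagate to egress Port(s) per r[2:0]")
    if terminates_at_receiver or must_be_comprehended_by_switch:
        actions.append("process Message" if message_code_defined
                       else "Unsupported Request")
    return actions
```

A Message can therefore yield two actions at once: it is forwarded per r[2:0] and also processed locally when the Switch must comprehend it.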


Figure 2-20: Flowchart for Switch Handling of TLPs
(Flowchart summary:
1. Start. Is the TLP a Msg or MsgD Request? If No: route the TLP to the egress Port according to routing rules; End.
2. If Yes: propagate to egress Port(s) according to the r[2:0] sub-field of the Type field; End. In addition, is the Message (also) directed to the Receiving Port of the Switch? If No: End.
3. If Yes: is the value in the Message Code field defined? If Yes: process the Message according to Message handling rules; End. If No: Unsupported Request; End.)

2.7.6.2. Request Handling Rules

This section describes how Received Requests are handled, following the initial processing done with all TLPs. The rules are diagrammed in the flowchart shown in Figure 2-21.

• If the Request Type is not supported by the device, the Request is an Unsupported Request, and is reported according to Section 7.2.
   o If the Request requires Completion, a Completion Status of UR is returned (see Section 2.7.5).
• If the Request is a Message, and the Message Code specifies an undefined or unsupported value, the Request is an Unsupported Request, and is reported according to Section 7.2.
   o If the Message Code is a supported value, process the Message according to the corresponding Message processing rules.

If the Request is not a Message, and is a supported Type, specific implementations may be optimized based on a defined programming model which ensures that certain types of (otherwise legal) Requests will never occur. Such implementations may take advantage of the following rule:

• If the Request violates the programming model of the device, the device may optionally treat the Request as a Completer Abort, instead of handling the Request normally.
   o If the Request is treated as a Completer Abort, this is a reported error associated with the device/function (see Section 7.2).
   o If the Request requires Completion, a Completion Status of CA is returned (see Section 2.7.5).

Implementation Note: Optimizations Based on Restricted Programming Model
When a device’s programming model restricts (vs. what is otherwise permitted in PCI Express) the characteristics of a Request, that device is permitted to “Completer Abort” any Requests which violate the programming model. Examples include unaligned or wrong-size access to a register block and unsupported size of request to a memory space.

Generally, devices are able to assume a restricted programming model when all communication will be between the device’s driver software and the device itself. Devices which may be accessed directly by operating system software or by applications which may not comprehend the restricted programming model of the device (typically devices which implement “legacy” capabilities) should be designed to support all types of Requests which are possible in the existing usage model for the device. If this is not done, the device may fail to operate with existing software.

• Otherwise (supported Request Type, not a Message), process the Request.
   o If the Completer is permanently unable to process the Request due to a device-specific error condition, the Completer must, if possible, handle the Request as a Completer Abort.
      • This is a reported error associated with the Receiving device/function, if the error can be isolated to a specific device/function in the component, or with the Receiving Port if the error cannot be isolated (see Section 7.2).
   o For Configuration Requests only, following reset it is possible for a device to indicate that it is temporarily unable to process the Request; in this case, the Configuration Request Retry Status Completion Status is used (see Section 7.6).
   o In the process of servicing the Request, the Completer may determine that the (otherwise acceptable) Request must be handled as an error, in which case the Request is handled according to the type of the error.
      • Example: A PCI Express/PCI Bridge may initially accept a Request because it specifies a memory range mapped to the secondary side of the Bridge, but the Request may Master Abort or Target Abort on the PCI side of the Bridge. From the PCI Express perspective, the status of the Request in this case is UR (for Master Abort) or CA (for Target Abort). If the Request requires Completion on PCI Express, the corresponding Completion Status is returned.


• If the Request is a type which requires a Completion to be returned, generate a Completion according to the rules for Completion Formation (see Section 2.7.5).
   o The Completion Status is determined by the result of handling the Request.

Figure 2-21: Flowchart for Handling of Received Request
(Flowchart summary:
1. Start. Is the Request Type supported? If No: Unsupported Request. Does the Request require a Completion? If Yes: send Completion with Completion Status = UR; End. If No: End.
2. If Yes: is the Request Type Msg? If Yes: is the value in the Message Code field defined? If Yes: process the Message according to Message Handling Rules; End. If No: Unsupported Request; End.
3. If the Request is not a Msg (optional check): does the Request violate the device programming model? If Yes: does the Request require a Completion? If Yes: send Completion with Completion Status = CA; End. If No: End.
4. If No: process the Request according to Request handling rules (determine Completion Status, if applicable). Does the Request require a Completion? If Yes: send Completion; End. If No: End.)
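Figure 2-21 can be read as a function that returns the Completion Status to send, or None when no Completion is sent. This is a minimal, hypothetical sketch: the keyword arguments stand in for device-specific decoding, and process is an illustrative callback that services the Request and reports its resulting status as a short string ("SC", "UR", "CA", or "CRS").

```python
from typing import Callable, Optional


def handle_request(*, type_supported: bool, is_message: bool,
                   message_code_supported: bool, requires_completion: bool,
                   violates_programming_model: bool,
                   process: Callable[[], str]) -> Optional[str]:
    """Return the Completion Status to send for a Request, or None."""
    if not type_supported:
        # Unsupported Request (reported per Section 7.2)
        return "UR" if requires_completion else None
    if is_message:
        # Messages are posted: an unsupported Message Code is a UR, but no
        # Completion is ever returned
        if message_code_supported:
            process()
        return None
    if violates_programming_model:
        # Optional Completer Abort for a restricted programming model
        return "CA" if requires_completion else None
    status = process()
    return status if requires_completion else None
```

Keyword-only arguments are used so that the five boolean conditions cannot be passed in the wrong order.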


Implementation Note: Configuration Retry Status
Some devices require a lengthy self-initialization sequence to complete before they are able to service Configuration Requests (common with intelligent I/O solutions on PCI). PCI/PCI-X architecture has specified a 2^25 (PCI) or 2^26 (PCI-X) clock “recovery time” following reset to provide the required self-initialization time for such devices. PCI Express “softens” the need for this time-based recovery period by implementing a Configuration Request Retry Completion Status. A device in receipt of a Configuration Request may respond with a Configuration Request Retry Completion Status to effectively stall the Configuration Request until such time that the subsystem has completed local initialization and is ready to communicate with the host. Note that it is only legal to respond with a Configuration Request Retry Completion Status in response to a Configuration Request. Sending this Completion Status in response to any other Request type will result in the generation of an error condition (Malformed TLP; see Section 7.2).

A Root Complex in receipt of a Configuration Request Retry Completion Status in response to its Configuration Request may choose to re-issue the Configuration Request as a new Request on PCI Express or complete the Request to the host as a failed transaction. Root Complex implementations may further choose to only allow a fixed number of Configuration Request/Retry Completion Status loops before determining that something is wrong with the target of the Request and taking appropriate action. When used in systems including PCI Express to PCI/PCI-X bridges, the Root Complex must comprehend the limit T_rhfa for PCI/PCI-X agents.

The net result is that existing enumeration and configuration code will not see the lower level retry protocol semantics that are kept at the hardware level. The CPU will only see latency associated with the initial Configuration Request. See Section 7.6 for more information on reset.

2.7.6.2.1. Data Return for Read Requests

• Individual Completions for Memory Read Requests may provide less than the full amount of data Requested so long as all Completions for a given Request, when combined, return exactly the amount of data Requested in the Read Request.
   o Completions for different Requests cannot be combined.
   o I/O and Configuration Reads must be completed with exactly one Completion.
   o The Completion Status for a sub-Completion corresponds only to the status associated with the data returned with that sub-Completion.
      • A sub-Completion with status other than Successful Completion, or, for a Configuration Read only, Configuration Retry Status, terminates the Completions for a single Read Request.
         • In this case, the value in the Length field is undefined, and must be ignored by the Receiver.


• Completions must not include more data than permitted by the Max_Payload_Size parameter, calculated as a naturally aligned boundary.
   o Receivers must check for violations of this rule; TLPs in violation are Malformed TLPs.
      • This is a reported error associated with the Receiving Port (see Section 7.2).
   Note: This is simply a special case of the rules which apply to all TLPs with data payloads.
• Read Requests may be completed with one, or in some cases, multiple Completions.
• There is a parameter, R, which determines the naturally aligned address boundaries on which a Read Request may be serviced with multiple Completions.
   o For a Root Complex, R is 64B or 128B.
      • This value is reported through a configuration register (see Section 5.8).
      Note: Bridges and Endpoints may implement a corresponding command bit which may be set by system software to indicate the R value for the Root Complex, allowing the Bridge/Endpoint to optimize its behavior when the Root Complex’s R is 128B.
   o For all other system elements, R is 128B.
• Completions for Requests which do not cross the naturally aligned address boundaries at integer multiples of R Bytes must include all data specified in the Request.
• Requests which do cross the address boundaries at integer multiples of R Bytes may be completed using more than one Completion, but the data must not be fragmented except along the address boundaries.
   o The first Completion must start with the address specified in the Request, and must end at one of the following:
      • the address specified in the Request plus the length specified by the Request (i.e., the entire Request)
      • an address boundary between the start and end of the Request at an integer multiple of R Bytes
   o The final Completion must end with the address specified in the Request plus the length specified by the Request.
   o All Completions between, but not including, the first and final Completions must be an integer multiple of R Bytes in length.
• Receivers may optionally check for violations of R. If a Receiver implementing this check determines that a Completion violates this rule, it must handle the Completion as a Malformed TLP.
   o This is a reported error associated with the Receiving Port (see Section 7.2).
• Multiple Memory Read Completions for a single Read Request must return data in increasing address order.


• When a Read Completion is generated with a Completion Status other than “Successful Completion”:
   o No data is included with the Completion.
      • The Cpl (or CplLk) encoding is used instead of CplD (or CplDLk).
   o This Completion is the final Completion for the Request.
      • The Completer must not transmit additional Completions for this Request.
         • Example: The Completer split the Request into four parts for servicing; the second Completion had a Completer Abort Completion Status; the Completer terminated servicing for the Request, and did not transmit the remaining two Completions.


Implementation Note: Restricted Programming Model
When a device’s programming model restricts (vs. what is otherwise permitted in PCI Express) the size and/or alignment of Read Requests directed to the device, that device is permitted to use a Completer Abort Completion Status for Read Requests which violate the programming model. An implication of this is that such devices, generally devices where all communication will be between the device’s driver software and the device itself, need not necessarily implement the buffering required to generate Completions of length R. However, in all cases, the boundaries specified by R must be respected for all reads which the device will Complete with Successful Completion status.

Examples:

1: A Memory Read Request with Address of 10000h and Length of C0h Bytes (192 decimal) could be completed by a Root Complex with an R value of 64 Bytes with one of the following combinations of Completions (Bytes):
   [192] or [128, 64] or [64, 128] or [64, 64, 64]

2: A Memory Read Request with Address of 10000h and Length of C0h Bytes (192 decimal) could be completed by a Root Complex with an R value of 128 Bytes with one of the following combinations of Completions (Bytes):
   [192] or [128, 64]

3: A Memory Read Request with Address of 10020h and Length of 100h Bytes (256 decimal) could be completed by a Root Complex with an R value of 64 Bytes with one of the following combinations of Completions (Bytes):
   [256] or [32, 224] or [32, 64, 160] or [32, 64, 64, 96] or [32, 64, 64, 64, 32] or [32, 64, 128, 32] or [32, 128, 96] or [32, 128, 64, 32] or [32, 192, 32] or [96, 160] or [96, 128, 32] or [96, 64, 96] or [96, 64, 64, 32] or [160, 96] or [160, 64, 32] or [224, 32]

4: A Memory Read Request with Address of 10020h and Length of 100h Bytes (256 decimal) could be completed by an Endpoint with one of the following combinations of Completions (Bytes):
   [256] or [96, 160] or [96, 128, 32] or [224, 32]
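The combinations in these examples follow mechanically from the R rules: a Completion may end only at a naturally aligned multiple of R strictly inside the Request, and any subset of those boundaries gives a legal split. The following is a minimal sketch (legal_splits is a hypothetical name) that enumerates them. It does not model Max_Payload_Size, which may rule out some long Completions in a real system.

```python
from itertools import combinations


def legal_splits(address: int, length: int, r: int) -> set[tuple[int, ...]]:
    """Enumerate Completion length sequences allowed for one Memory Read Request.

    Completions may break only at naturally aligned multiples of r that lie
    strictly inside [address, address + length); any subset of those
    boundaries is permitted.
    """
    end = address + length
    first_boundary = (address // r + 1) * r          # next multiple of r after address
    boundaries = list(range(first_boundary, end, r))
    splits = set()
    for k in range(len(boundaries) + 1):
        for cut in combinations(boundaries, k):
            points = [address, *cut, end]
            splits.add(tuple(b - a for a, b in zip(points, points[1:])))
    return splits
```

With four interior boundaries, Example 3 admits 2^4 = 16 combinations; Example 4 corresponds to R = 128 Bytes, as required for an Endpoint.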


2.7.6.3. Completion Handling Rules

• When a device receives a Completion which does not correspond to any of the outstanding Requests issued by that device, the Completion is called an “Unexpected Completion.”
• Receipt of an Unexpected Completion is an error and must be handled according to the following rules:
   o The Agent receiving an Unexpected Completion must discard the Completion.
   o An Unexpected Completion is a reported error associated with the Receiving Port (see Section 7.2).
   Note: Unexpected Completions are assumed to occur mainly due to Switch misrouting of the Completion. The Requester of the Request may not receive a Completion for its Request in this case, and the Requester’s Completion Timeout mechanism (see Section 2.12) will terminate the Request.
• Completions with a Completion Status other than Successful Completion or Configuration Request Retry Status (the latter in response to a Configuration Request only) must cause the Requester to:
   o Free any Flow Control credits and other resources associated with the Request.
   o Report the error according to the rules in Section 7.2.
• Completions with a Configuration Request Retry Status in response to a Request other than a Configuration Request are Malformed TLPs.
   o This is a reported error associated with the Receiving Port (see Section 7.2).
• Completions with a Reserved Completion Status value are treated as if the Completion Status was Unsupported Request (UR).
   o This is a reported error associated with the Receiving device/function, normally the same as the Requester (see Section 7.2).
• When a Read Completion is received with a Completion Status other than “Successful Completion”:
   o No data is included with the Completion.
      • The Cpl (or CplLk) encoding is used instead of CplD (or CplDLk).
   o This Completion is the final Completion for the Request.
      • The Requester must consider the Request terminated, and not expect additional Completions.
      • Handling of partial Completions Received earlier is implementation specific.
   Example: The Requester received 32B of Read data for a 128B Read Request it had issued, then a Completion with the Completer Abort Completion Status. The Requester then must free the internal resources which had been allocated for that particular Read Request.
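A Requester’s reaction to one received Completion can be sketched from the Completion Status encodings of Section 2.7.5.1 and the rules above. This is a hypothetical sketch: the function names and action strings are illustrative, and the CRS branch simply names the Root Complex choices described in the Configuration Retry Status Implementation Note.

```python
COMPLETION_STATUS = {0b000: "SC", 0b001: "UR", 0b010: "CRS", 0b100: "CA"}


def decode_completion_status(bits: int) -> str:
    """Decode Completion Status[2:0]; Reserved encodings are treated as UR."""
    if not 0 <= bits <= 0b111:
        raise ValueError("Completion Status is a 3-bit field")
    return COMPLETION_STATUS.get(bits, "UR")


def requester_action(status_bits: int, for_config_request: bool) -> str:
    """What a Requester does with one received Completion (Section 2.7.6.3)."""
    status = decode_completion_status(status_bits)
    if status == "CRS" and not for_config_request:
        return "Malformed TLP"                 # CRS is legal only for Config
    if status == "SC":
        return "accept data"
    if status == "CRS":
        return "re-issue or fail the Configuration Request"   # RC policy
    # UR, CA, or Reserved (treated as UR): this is the final Completion
    return "terminate Request, free Flow Control credits and resources, report error"
```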


Implementation Note: Read Data Values with UR Completion Status
Some system configuration software depends on reading a data value of all ‘1’s when a Configuration Read Request is terminated as an Unsupported Request, particularly when probing to determine the existence of a device in the system. A Root Complex intended for use with software that depends on a read-data value of all ‘1’s must synthesize this value when UR Completion Status is returned for a Configuration Read Request.

2.8. Messages

The PCI Express specification defines the following Messages:

• Baseline Message Group
   o Interrupt Signaling
   o Power Management
   o Error Signaling
   o Locked Transaction Support
   o Slot Power Limit Support
   o Payload Defined
   o Vendor Specific Messages
   o Hot Plug Signaling
• Advanced Switching Support Message Group
   o Data Packet Messages
   o Signal Packet Messages

2.8.1. Baseline Messages

2.8.1.1. Interrupt Signaling - Rules

• MSIs follow the rules defined for PCI.
   o MSIs are expressed as Memory Writes, and follow rules for Packet formation, Flow Control, and Data Integrity in the same way as Memory Writes.
   o MSIs enforce data consistency by pushing ahead of them any previously posted write data using the same TC (as required by the ordering rules in Section 2.5).
• When MSIs are not enabled, interrupts are signaled using the Assert_INTx and Deassert_INTx Messages.


• Assert_INTx/Deassert_INTx Messages are only issued Upstream (towards the Root Complex).
   o Receivers may optionally check for violations of this rule. If a Receiver implementing this check determines that an Assert_INTx/Deassert_INTx violates this rule, it must handle the TLP as a Malformed TLP.
      • This is a reported error associated with the Receiving Port (see Section 7.2).
• The following have no effect, but are not errors:
   o For a particular ‘x’ (A, B, C or D), receipt of an Assert_INTx Message following an earlier Assert_INTx without a Deassert_INTx Message between.
   o For a particular ‘x’ (A, B, C or D), receipt of a Deassert_INTx Message following an earlier Deassert_INTx without an Assert_INTx Message between.
• All Assert_INTx and Deassert_INTx interrupt Messages must use the default Traffic Class designator (TC0). Receivers must check for violations of this rule. If a Receiver determines that a TLP violates this rule, it must handle the TLP as a Malformed TLP.
   o This is a reported error associated with the Receiving Port (see Section 7.2).

Implementation Note: Synchronization of Data Traffic and Interrupts
All Assert_INTx and Deassert_INTx interrupt Requests must use TC0, which ensures that the classic ordering behavior expected in legacy hardware is maintained. MSIs may use the TC that is most appropriate for the device’s programming model. This is generally the same TC as is used to transfer data; for legacy I/O, TC0 is used.

If a device uses more than one TC, it must explicitly ensure that proper synchronization is maintained between data traffic and interrupt Message(s) not using the same TC. Methods for ensuring this synchronization are implementation specific. One option is for a device to issue a zero length Read (as described in Section 2.7.4.2) using each additional TC used for data traffic prior to issuing the MSI. Other methods are also possible. Note, however, that platform software (e.g., a device driver) is generally only capable of issuing transactions using TC0.

The Assert_INTx/Deassert_INTx Message pairs constitute four “virtual wires,” one for each of the legacy PCI interrupts designated A, B, C, and D. The above rules for INTx messaging apply to all PCI Express compliant components, and facilitate the logical emulation of level-sensitive interrupt lines. A set of four virtual INTx wires is associated with each and every PCI Express Link in a hierarchy, and the components at both ends of each Link must track their logical state. Further rules apply to Switches, Bridges and Root Complexes to enable proper emulation of level-sensitive PCI interrupts.
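The per-Link state that both ends must track can be sketched as a small class. This is a hypothetical sketch: VirtualWires is an illustrative name, a Malformed TLP is signaled by raising ValueError purely for demonstration, and the optional Upstream-only check is not modeled. It shows why a repeated Assert_INTx or Deassert_INTx has no effect: the wire is level state, not an event counter.

```python
class VirtualWires:
    """State of the four INTx virtual wires on one Link, as tracked by a Receiver."""

    PINS = "ABCD"

    def __init__(self) -> None:
        self.asserted = {pin: False for pin in self.PINS}

    def receive(self, message: str, tc: int) -> bool:
        """Apply one Assert_INTx/Deassert_INTx Message; return True if a wire changed."""
        if tc != 0:
            # INTx Messages must use TC0; a violation is a Malformed TLP
            raise ValueError("Malformed TLP: INTx Messages must use TC0")
        kind, _, pin = message.partition("_INT")
        if kind not in ("Assert", "Deassert") or len(pin) != 1 or pin not in self.PINS:
            raise ValueError(f"not an INTx Message: {message}")
        new_state = kind == "Assert"
        changed = self.asserted[pin] != new_state
        self.asserted[pin] = new_state   # a repeated Assert (or Deassert) has no effect
        return changed
```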


Rules for Assert_INTx/Deassert_INTx specific to Switches and Bridges:

• Components providing multiple Downstream Links must track the state of the four “virtual wires” embodied in the Assert_INTx/Deassert_INTx pairs independently for each of their Downstream Links, and present a “collapsed” set of Assert_INTx/Deassert_INTx pairs on their Upstream Link following the rules outlined above.
   o Collapsing of Downstream virtual wire state onto Upstream virtual wire state must follow the mapping rules provided below.
• In the event that a Downstream Link goes to the DL_Down status (due to surprise removal, hardware failure, software-initiated reset, etc.), the “virtual wires” embodied in the Assert_INTx/Deassert_INTx pairs associated with that Link must be de-asserted. If that results in de-assertion of any Upstream Assert_INTx/Deassert_INTx “virtual wires,” then the appropriate Deassert_INTx Message(s) must be sent Upstream.

Rules for Assert_INTx/Deassert_INTx specific to the Root Complex:

• The Root Complex must track the state of the four “virtual wires” embodied in the Assert_INTx/Deassert_INTx pairs independently for each of its Downstream Links, and map these virtual signals to system interrupt resources.
   o Details of mapping to system interrupt resources are beyond the scope of this specification.
• In the event that a Link attached to the Root Complex goes to the DL_Down status, the “virtual wires” embodied in the Assert_INTx/Deassert_INTx pairs associated with that Link must be de-asserted, and any associated system interrupt resource request must also be discarded.

Within a Switch or below a Bridge, there are typically multiple devices (the virtual PCI bridges for each Downstream Port in the case of a Switch) which must have their associated INTx states mapped to the Upstream Port. The following rules describe how this mapping must be done.

• Switches must collapse the INTx “virtual wires” from each of their Downstream PCI Express Links according to Table 2-12.
   o The mapping is based on the device number (irrespective of the function number) of the PCI to PCI bridge structure representing the Downstream Port of the Switch.


Table 2-12: Switch Mapping for INTx

| Dev # of P2P Representing Switch Downstream Port | INTx Message from Downstream PCI Express Link | Mapping to INTx Message on Upstream PCI Express Link |
|---|---|---|
| 0, 4, 8, 12, 16, 20, 24, 28 | INTA | INTA |
| | INTB | INTB |
| | INTC | INTC |
| | INTD | INTD |
| 1, 5, 9, 13, 17, 21, 25, 29 | INTA | INTB |
| | INTB | INTC |
| | INTC | INTD |
| | INTD | INTA |
| 2, 6, 10, 14, 18, 22, 26, 30 | INTA | INTC |
| | INTB | INTD |
| | INTC | INTA |
| | INTD | INTB |
| 3, 7, 11, 15, 19, 23, 27, 31 | INTA | INTD |
| | INTB | INTA |
| | INTC | INTB |
| | INTD | INTC |

• PCI Express-PCI/X Bridges must collapse the INTA-INTD pins from each of their Downstream PCI/X buses into just four INTx “virtual wires” on their Upstream Port.
  o The mapping between the INTx pin on the PCI/X bus and the corresponding INTx messages on PCI Express is based on the device number of the PCI/X device requesting the interrupt. The mapping is essentially the same as for Switches (shown in Table 2-12) except that the “device numbers” column represents the PCI/X device numbers.
  o Multi-headed PCI Express-PCI/X Bridges will collapse the interrupts across the multiple PCI/X buses following the same rules as described for Switches. For example, a dual-headed PCI Express-PCI-X Bridge would collapse the INTA pins from its two Downstream PCI/X buses as shown in Figure 2-22.


[Figure 2-22: INTx Collapsing in a Dual-Headed Bridge. Inside the PCI Express-PCI(-X) Bridge, the INTA# pins of PCI Bus A and PCI Bus B are combined by an OR function to drive a single Assert_INTA/Deassert_INTA virtual wire on the PCI Express Link.]

• All internal devices integrated within a Switch or Bridge follow the same mapping rules for interrupt collapsing as described for Switches above.
  o Mapping is based on the device number of the integrated device, using Table 2-12 with the “device number” column interpreted as the device number of the integrated device.
  o The Bridge/Switch consolidates interrupts from its Downstream Ports/Links and the internal integrated devices following all of the rules stated above, and creates just four INTx “virtual wires” for its Upstream Port.

Note that the Requester ID of an Assert_INTx/Deassert_INTx Message will correspond to the Transmitter of the message on that Link, and not necessarily to the original source of the interrupt.

Implementation Note: System Interrupt Mapping

Note that system software (including BIOS and operating system) needs to comprehend the remapping of legacy interrupts (INTx mechanism) in the entire topology of the system (including hierarchically connected Switches and subordinate PCI Express/PCI Bridges) to establish proper correlation between PCI Express device interrupts and the associated interrupt resources in the system interrupt controller. The remapping described by Table 2-12 is applied hierarchically at every Switch. In addition, PCI Express/PCI and PCI/PCI Bridges perform a similar mapping function.
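Table 2-12 amounts to adding the device number of the Downstream Port's P2P bridge to the INTx wire index, modulo 4, with INTA through INTD numbered 0 through 3. The Python sketch below covers three things:

- that mapping;
- its repeated application at every Switch level, as the Implementation Note describes;
- the Switch rules for per-Port tracking, OR-style collapsing, and DL_Down de-assertion.

The wire numbering, the message strings, and the names `swizzle`, `upstream_wire_at_root`, and `SwitchIntx` are illustrative assumptions, not definitions from this specification. As in the text, Root Complex mapping to system interrupt resources is out of scope.

```python
from typing import Dict, List

# Assumption: wires are numbered 0..3 for INTA..INTD; messages are plain strings.
INTX_NAMES = ("INTA", "INTB", "INTC", "INTD")


def swizzle(device_number: int, intx: int) -> int:
    """Upstream wire for a Downstream wire per Table 2-12 (function number ignored)."""
    if not 0 <= device_number <= 31 or not 0 <= intx <= 3:
        raise ValueError("device number must be 0..31 and INTx wire 0..3")
    return (intx + device_number) % 4


def upstream_wire_at_root(device_numbers_bottom_up: List[int], intx: int) -> int:
    """Apply Table 2-12 at each Switch between a device and the Root Complex.

    device_numbers_bottom_up (hypothetical input) lists, starting at the Switch
    nearest the device, the device number of the Downstream Port P2P bridge
    through which the message enters each Switch.
    """
    for device_number in device_numbers_bottom_up:
        intx = swizzle(device_number, intx)
    return intx


class SwitchIntx:
    """Per-Downstream-Port wire state plus the collapsed Upstream wire state."""

    def __init__(self, port_device_numbers: List[int]):
        self.ports: Dict[int, List[bool]] = {d: [False] * 4 for d in port_device_numbers}
        self.upstream = [False] * 4

    def _recollapse(self) -> List[str]:
        # An Upstream wire is asserted if ANY Downstream wire mapped onto it is.
        new = [False] * 4
        for dev, wires in self.ports.items():
            for intx, asserted in enumerate(wires):
                if asserted:
                    new[swizzle(dev, intx)] = True
        # Send messages only for Upstream wires whose logical state changed.
        messages = []
        for wire in range(4):
            if new[wire] != self.upstream[wire]:
                prefix = "Assert_" if new[wire] else "Deassert_"
                messages.append(prefix + INTX_NAMES[wire])
        self.upstream = new
        return messages

    def receive(self, dev: int, message: str) -> List[str]:
        """Handle an Assert_INTx/Deassert_INTx message from one Downstream Port."""
        kind, _, name = message.partition("_")
        self.ports[dev][INTX_NAMES.index(name)] = kind == "Assert"
        return self._recollapse()

    def link_down(self, dev: int) -> List[str]:
        """DL_Down on a Downstream Link de-asserts all four of its wires."""
        self.ports[dev] = [False] * 4
        return self._recollapse()
```

As an example, take Downstream Ports at device numbers 1 and 2. An Assert_INTA from the first Port and an Assert_INTD from the second both map to Upstream INTB, so only a single Assert_INTB is sent. Deassert_INTB goes Upstream only after the last contributing wire is de-asserted or its Link goes to DL_Down.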


2.8.1.2. Power Management Group

This Message group is used to support PCI Express power management. Table 2-13 summarizes the Messages that belong to this group.

Table 2-13: Power Management System Messages

| Message | Parameters | Comments |
|---|---|---|
| PM_Active_State_Nak | None | Terminate at Receiver |
| PM_PME | None | Sent Upstream by PME-requesting component. Propagates Upstream. |
| PME_Turn_Off | None | Broadcast Downstream |
| PME_TO_Ack | None | Sent Upstream by Endpoint. Sent Upstream by Switch when received on all Downstream Ports. |

Notes:
• The Address field for all these Messages is reserved.
• The Length field is reserved for all Power Management Messages.
• All power management system Messages must use the default Traffic Class designator (TC0).

For more details on the usage of Power Management Messages, refer to Chapter 6.

2.8.1.3. Error Signaling/Logging Group

Error Messages are used to signal errors that occur on specific transactions, as well as errors that are not necessarily associated with a particular transaction (e.g., Link training fails). These Messages are initiated by the agent that detected the error.

All Error Messages must use the default Traffic Class designator (TC0).


Table 2-14: Error Messages

| Error Message | Description |
|---|---|
| ERR_COR | This Message is issued when the component or device detects a correctable error on the PCI Express interface. The Root Complex is the ultimate recipient for this Message. |
| ERR_NONFATAL | This Message is issued when the component or device detects a non-fatal, uncorrectable error on the PCI Express interface. The Root Complex is the ultimate recipient for this Message. |
| ERR_FATAL | This Message is issued when the component or device detects a fatal, uncorrectable error on the PCI Express interface. The Root Complex is the ultimate recipient for this Message. |

The initiator of the Message is identified by the Requester ID in the Message header. The Root Complex translates these error Messages into platform-level events. Refer to Section 7.2 for details on uses for these Messages.

2.8.1.4. Messages for Support of Locked Transactions

The PCI Express specification defines the Unlock Message to support Lock Transaction sequences. The following rule applies to the Unlock Message:

• The Unlock Message must use the default Traffic Class designator (TC0).

See Section 7.5 for details on implementing support for Lock Transaction sequences.

2.8.1.5. Slot Power Limit Support

The Set_Slot_Power_Limit Message includes a one DW data payload. This Message is used to convey a slot power limitation value from a Downstream Port (of a Root Complex or a Switch) to the Upstream Port of a component (Endpoint, Switch, or PCI Express-PCI Bridge) attached to the same Link. The data payload is copied from the Slot Capabilities Register of the Downstream Port and is written into the Device Capabilities Register of the Upstream Port on the other side of the Link. Bits 9:8 of the data payload map to the Slot Power Limit Scale field, and Bits 7:0 map to the Slot Power Limit Value field. The Downstream Port (of a Root Complex or a Switch) sends this Message automatically when one of the following events occurs:

  o On a Configuration Write to the Slot Capabilities Register (see Section 5.8.9) when the Data Link Layer reports DL_Up status.
  o Any time the Link transitions from a non-DL_Up status to a DL_Up status (see Section 2.14).

The component on the other side of the Link (Endpoint, Switch, or PCI Express-PCI Bridge) that receives a Set_Slot_Power_Limit Message must copy the values in the data payload into the Device Capabilities Register associated with the component’s Upstream Port. Two kinds of PCI Express component are permitted to hardwire the value “0” in the Slot Power Limit Scale and Slot Power Limit Value fields of the Device Capabilities Register, and are not required to copy the Set_Slot_Power_Limit payload into that register. The first is a component targeted exclusively for integration on the system planar (e.g., motherboard). The second is a component targeted for integration on a card/module whose entire power consumption is below the lowest power limit specified for the card/module form factor (as defined in the corresponding electromechanical specification).

For more details on the Power Limit control mechanism, see Section 7.9.

2.8.1.6. Payload_Defined Message

The Payload_Defined Message allows expansion of PCI Express messaging capabilities, either as a general extension to the PCI Express specification or as a vendor-specific extension. Such extensions are not covered specifically in this document. This section defines the rules associated with this Message generically.

• The Payload_Defined Message includes at least 1 DW of data (see Figure 2-23).
  o The DW of data immediately following the Header includes two 16-bit fields:
    • Vendor ID (same value as used in the Configuration Space Header – see Section 5.5)
      • The value 0000h is reserved for non-vendor-specific extensions.
    • Message Sub-Type
  o There may be additional data following this 1 DW.
  o The value in the Length field must correspond to the size of the entire data payload associated with the TLP, including the required DW and any additional data payload.
• Receivers silently discard Payload_Defined Messages that they are not designed to receive – this is not an error condition.
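Here is one way the Payload_Defined receive rules above might look in Python. Several details are assumptions for illustration, not stated by this section:

- byte order: the Vendor ID comes first, and each 16-bit field is sent most significant byte first;
- the handler table;
- raising an exception on a Length mismatch (this section does not say how a Receiver treats a bad Length).

```python
from typing import Callable, Dict, Optional, Tuple

Handler = Callable[[bytes], object]  # hypothetical per-(Vendor ID, Sub-Type) handler


def parse_payload_defined(length_dw: int, payload: bytes) -> Tuple[int, int, bytes]:
    """Split the required first data DW into (Vendor ID, Message Sub-Type, rest)."""
    if length_dw < 1:
        raise ValueError("a Payload_Defined Message carries at least 1 DW of data")
    if len(payload) != 4 * length_dw:
        raise ValueError("Length must cover the entire data payload")
    vendor_id = int.from_bytes(payload[0:2], "big")  # assumed byte order
    sub_type = int.from_bytes(payload[2:4], "big")
    return vendor_id, sub_type, payload[4:]


def receive_payload_defined(
    length_dw: int, payload: bytes, handlers: Dict[Tuple[int, int], Handler]
) -> Optional[object]:
    """Dispatch a supported Message; silently discard (no error) otherwise."""
    vendor_id, sub_type, extra = parse_payload_defined(length_dw, payload)
    handler = handlers.get((vendor_id, sub_type))
    if handler is None:
        return None  # not designed to receive it: discard, not an error condition
    return handler(extra)
```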
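In Section 2.8.1.5, the Set_Slot_Power_Limit payload packs both Slot Power Limit fields into a single DW. What follows is a minimal Python decoding sketch. The scale multipliers (00b = 1.0x, 01b = 0.1x, 10b = 0.01x, 11b = 0.001x) come from the Slot Capabilities Register definition, not from this section. The function name is hypothetical.

```python
from decimal import Decimal
from typing import Tuple

# Slot Power Limit Scale multipliers, indexed by the 2-bit field value (assumed from
# the Slot Capabilities Register definition; not restated in Section 2.8.1.5).
SCALE_MULTIPLIER = (Decimal("1.0"), Decimal("0.1"), Decimal("0.01"), Decimal("0.001"))


def decode_slot_power_limit(payload_dw: int) -> Tuple[int, int, Decimal]:
    """Return (Scale field, Value field, limit in watts) from the 1 DW payload."""
    value = payload_dw & 0xFF          # Bits 7:0 -> Slot Power Limit Value
    scale = (payload_dw >> 8) & 0x3    # Bits 9:8 -> Slot Power Limit Scale
    return scale, value, value * SCALE_MULTIPLIER[scale]
```

Decimal keeps 250 x 0.1 exactly equal to 25, which a binary float would not guarantee.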


[Figure 2-23: Payload_Defined Message. A 4 DW Message header (Fmt, Type, TC, TD, EP, Attr, Length, Requester ID, Tag, Message Code – Payload_Defined, Address[63:32]/Reserved, Address[31:2]/Reserved) followed by the data DW carrying the Vendor ID and the Message Sub-Type.]

2.8.1.7. Vendor Specific Message

Vendor Specific Messages use the Code values 128 to 255.

• Receivers silently discard Vendor Specific Messages that they are not designed to receive – this is not an error condition.

2.8.1.8. Hot Plug Signaling Messages

The Hot Plug Signaling Messages are virtual signals exchanged between two parties. On one side are Switches/Root Ports that support Hot Plug Event signaling. On the other are devices on cards that support Removal Request functionality (doorbell mechanism) on the card. The Messages are defined to replicate the events and registers defined for doorbell mechanisms wired directly to the Switch/Root Port. For more information see Section 7.7.

Note that only devices on cards that support Removal Request functionality (doorbell mechanism) on the card, and the Switch Ports/Root Ports that support such cards, are required to implement the Hot Plug Signaling Messages.
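The indicator encodings listed in Table 2-15 can drive two decisions: which Message a Switch/Root Port sends, and how an end device disposes of it. The Python below is a sketch of both. The function names are assumptions, as is treating the unlisted 00b encoding as "send nothing".

```python
from typing import Optional

# 2-bit indicator control values listed in Table 2-15.
_INDICATOR_STATE = {0b01: "On", 0b10: "Blink", 0b11: "Off"}


def indicator_message(indicator: str, control: int) -> Optional[str]:
    """Message a Switch/Root Port issues for a control value (hypothetical helper)."""
    if indicator not in ("Attention", "Power"):
        raise ValueError("indicator must be 'Attention' or 'Power'")
    state = _INDICATOR_STATE.get(control & 0b11)
    return None if state is None else f"{indicator}_Indicator_{state}"


def end_device_handles(message: str, has_indicators: bool) -> str:
    """End device terminates the Message; with no indicators it is discarded."""
    if not has_indicators:
        return "discarded"
    return message.rsplit("_", 1)[1]  # e.g. "Blink": drive the card's indicator
```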


Table 2-15: Hot Plug Signaling Messages

| Message | Description |
|---|---|
| Attention_Indicator_On | This Message is issued by the Switch/Root Port when the Attention Indicator Control is set to 01b. The end device receiving the Message will terminate the Message and initiate appropriate action to cause the Attention Indicator located on the card to turn on. If no indicators are present on the card, the Message is discarded. For more implementation information see Section 7.7. |
| Attention_Indicator_Blink | This Message is issued by the Switch/Root Port when the Attention Indicator Control is set to 10b. The end device receiving the Message will terminate the Message and initiate appropriate action to cause the Attention Indicator located on the card to blink. If no indicators are present on the card, the Message is discarded. For more implementation information see Section 7.7. |
| Attention_Indicator_Off | This Message is issued by the Switch/Root Port when the Attention Indicator Control is set to 11b. The end device receiving the Message will terminate the Message and initiate appropriate action to cause the Attention Indicator located on the card to turn off. If no indicators are present on the card, the Message is discarded. For more implementation information see Section 7.7. |
| Power_Indicator_On | This Message is issued by the Switch/Root Port when the Power Indicator Control is set to 01b. The end device receiving the Message will terminate the Message and initiate appropriate action to cause the Power Indicator located on the card to turn on. If no indicators are present on the card, the Message is discarded. For more implementation information see Section 7.7. |
| Power_Indicator_Blink | This Message is issued by the Switch/Root Port when the Power Indicator Control is set to 10b. The end device receiving the Message will terminate the Message and initiate appropriate action to cause the Power Indicator located on the card to blink. If no indicators are present on the card, the Message is discarded. For more implementation information see Section 7.7. |
| Power_Indicator_Off | This Message is issued by the Switch/Root Port when the Power Indicator Control is set to 11b. The end device receiving the Message will terminate the Message and initiate appropriate action to cause the Power Indicator located on the card to turn off. If no indicators are present on the card, the Message is discarded. For more implementation information see Section 7.7. |
| Attention_Button_Pressed | This Message is issued by a device in a slot that implements an Attention Button on the card to signal the Switch/Root Port to generate the Attention Button Pressed Event. The Switch/Root Port terminates the Message and sets the Attention Button Pressed register to 1b, which may result in an interrupt being generated. For more implementation information see Section 7.7. |

All Endpoint devices must be able to handle the Attention and Power Indicator Messages even if the device does not implement the indicators. All Downstream Ports of Switches, and all Root Ports, must be able to handle the Attention_Button_Pressed Message.

2.8.2. Advanced Switching Support Message Group

The Messages that belong to this group can be divided into the following two types:

• Data Packet Messages:
  o Unicast, Data Packet
  o Multicast, Data Packet
• Signaling Packet Messages:
  o Signaling Packet, without interrupt
  o Null Signaling Packet, interrupt to Host in the destination Hierarchy
  o Null Signaling Packet, interrupt to destination device
  o Signaling Packet, with interrupt to Host in the destination Hierarchy
  o Signaling Packet, with interrupt to destination device

A detailed description of Message types and Message headers will be presented in a separate document entitled Advanced PCI Express Packet Switching Specification. This is a companion specification to the PCI Express Base Specification.


2.9. Ordering and Receive Buffer Flow Control

2.9.1. Overview and Definitions

Flow Control (FC) is used to prevent overflow of Receiver buffers and to enable compliance with the ordering rules defined in Section 2.5. Note that the Requester uses the Flow Control mechanism to track the queue/buffer space available in the Agent across the Link, as shown in Figure 2-24. That is, Flow Control is point-to-point (across a Link) and not end-to-end. Flow Control does not imply that a Request has reached its ultimate Completer.

[Figure 2-24: Relationship between Requester and Ultimate Completer. The Requester connects over a Link to Intermediate Component(s), which connect over a Link to the Ultimate Completer.]

Flow Control is orthogonal to the data integrity mechanisms used to implement reliable information exchange between Transmitter and Receiver. Flow Control can treat the flow of TLP information from Transmitter to Receiver as perfect, since the data integrity mechanisms ensure that corrupted and lost TLPs are corrected through retransmission (see Section 3.5).

Each Virtual Channel maintains an independent Flow Control credit pool. The FC information is conveyed between the two sides of the Link using DLLPs. The VC ID field of the DLLP carries the Virtual Channel Identification required for proper flow-control credit accounting.

Flow Control is handled by the Transaction Layer in cooperation with the Data Link Layer. The Transaction Layer performs Flow Control accounting functions for Received TLPs and “gates” TLP Transmissions based on available credits for transmission.

Note: Flow Control is a function of the Transaction Layer. Therefore the following types of information transmitted on the interface are not associated with Flow Control Credits:

• LCRC
• Packet Framing Symbols
• other Special Symbols
• Data Link Layer to Data Link Layer inter-communication packets

An implication of this fact is that these types of information must be processed by the Receiver at the rate they arrive (except as explicitly noted in this specification).

Also, any TLPs transferred from the Transaction Layer to the Data Link and Physical Layers must have first passed the Flow Control “gate.” Thus, both Transmit and Receive Flow Control mechanisms are unaware if the Data Link Layer transmits a TLP repeatedly due to errors on the Link.


2.9.2. Flow Control Rules

In this and other sections of this specification, rules are described using conceptual “registers” that a PCI Express device could use in order to implement a PCI Express compliant design. This description does not imply or require a particular implementation and is used only to clarify the requirements.

• Flow Control information is transferred using Flow Control Packets (FCPs), which are a type of DLLP (see Section 3.4).
• The unit of Flow Control credit is 16 Bytes for Data.
• For headers, the unit of Flow Control credit is one header.
• Each Virtual Channel has independent Flow Control.
• Flow Control distinguishes three types of TLPs (note relationship to ordering rules – see Section 2.5):
  o Posted Requests (P) – Messages and Memory Writes
  o Non-Posted Requests (NP) – All Reads, I/O, and Configuration Writes
  o Completions (CPL) – Associated with corresponding NP Requests
• In addition, Flow Control distinguishes the following types of TLP information within each of the three types:
  o Headers (H)
  o Data (D)
• Thus, there are six types of information tracked by Flow Control for each Virtual Channel, as shown in Table 2-16.

Table 2-16: Flow Control Credit Types

| Credit Type | Applies to This Type of TLP Information |
|---|---|
| PH | Posted Request Headers |
| PD | Posted Request Data payload |
| NPH | Non-Posted Request Headers |
| NPD | Non-Posted Request Data payload |
| CPLH | Completion Headers |
| CPLD | Completion Data payload |

• TLPs consume Flow Control credits as shown in Table 2-17.


Table 2-17: TLP Flow Control Credit Consumption

| TLP | Credit Consumed [8] |
|---|---|
| Memory, I/O, Configuration Read Request | 1 NPH unit |
| Memory Write Request | 1 PH + n PD units [9] |
| I/O, Configuration Write Request | 1 NPH + 1 NPD (Note: size of data written is never more than one (aligned) DW) |
| Message Requests without data | 1 PH unit |
| Message Requests with data | 1 PH + n PD units |
| Memory Read Completion | 1 CPLH + n CPLD units |
| I/O, Configuration Read Completions | 1 CPLH unit + 1 CPLD unit |
| I/O, Configuration Write Completions | 1 CPLH unit |

• Components must implement independent Flow Control for all Virtual Channels that are supported by that component.
• Flow Control is initialized autonomously by hardware only for the default Virtual Channel (VC0).
  o VC0 is initialized when the Data Link Layer is in the DL_Init state following reset (see Sections 3.2 and 3.3).
• When other Virtual Channels are enabled by software, each newly enabled VC will follow the Flow Control initialization protocol (see Section 3.3).
  o Software enables a Virtual Channel by setting the VC Enable bits for that Virtual Channel in both components on a Link (see Section 5.11).
    • For a multi-function device, a given VC is enabled when the VC Enable bit is set for any function; enabling the same VC in additional functions enables those functions to use the VC, but the VC is initialized only once.
  Note: It is possible for multiple VCs to be following the Flow Control initialization protocol simultaneously – each follows the initialization protocol as an independent process.
• Software disables a Virtual Channel by clearing the VC Enable bits for that Virtual Channel in both components on a Link.
  o For a multi-function device, the VC Enable bit must be clear for all functions.
  o Disabling a Virtual Channel for a component resets the Flow Control tracking mechanisms for that Virtual Channel in that component.

[8] Each Header credit implies the ability to accept a TLP Digest along with the corresponding TLP.
[9] For all cases where “n” appears, n = Roundup(DataLen/FC unit size).


• InitFC1 and InitFC2 FCPs are used only for Flow Control initialization (see Section 3.3).
• An InitFC1, InitFC2, or UpdateFC FCP that specifies a Virtual Channel which is not enabled is discarded without effect.
• During FC initialization for any Virtual Channel, including the default VC initialized as a part of Link initialization, Receivers must initially advertise VC credit values equal to or greater than those shown in Table 2-18.
  o Components may optionally check for violations of this rule. If a component implementing this check determines a violation of this rule, the violation is a Flow Control Protocol Error (FCPE).
    • If checked, this is a reported error associated with the Receiving Port (see Section 7.2).

Table 2-18: Minimum Flow Control Advertisements

| Credit Type | Minimum Advertisement |
|---|---|
| PH | 1 unit – credit value of 01h |
| PD | Largest possible setting of the Max_Payload_Size for the component divided by FC Unit Size. Example: If the largest Max_Payload_Size value supported is 1024B, the smallest permitted initial credit value would be 040h. |
| NPH | 1 unit – credit value of 01h |
| NPD | 1 unit – credit value of 01h |
| CPLH | Switch and PCI Express to PCI-X Bridge (PCI-X mode only): 1 FC unit – credit value of 01h. Root Complex, Endpoint, and PCI Express to PCI Bridge: “infinite” FC units – initial credit value of all ‘0’s [10] |
| CPLD | Switch and PCI Express to PCI-X Bridge (PCI-X mode only): Largest possible setting of the Max_Payload_Size for the component divided by FC Unit Size, or the size of the largest Read Request the component will ever generate, whichever is smaller. Root Complex, Endpoint, and PCI Express to PCI Bridge: “infinite” FC units – initial credit value of all ‘0’s |

• If an “infinite” credit advertisement has been made for CPL during initialization, no Flow Control updates (UpdateFC) are sent for CPL following initialization.
  o Components may optionally check for violations of this rule. If a component implementing this check determines a violation of this rule, the violation is a Flow Control Protocol Error (FCPE).

[10] This value is interpreted as infinite by the Transmitter, which will, therefore, never throttle.


    • If checked, this is a reported error associated with the Receiving Port (see Section 7.2).
• A TLP using an uninitialized VC is a Malformed TLP.
  o This is a reported error associated with the Receiving Port (see Section 7.2).
• For each type of information tracked, there are two quantities tracked for Flow Control TLP Transmission gating:
  o CREDITS_CONSUMED
    • Count of the total number of FC units consumed by TLP Transmissions made since Flow Control initialization.
    • Set to all ‘0’s at Interface Initialization.
    • Incremented for each TLP the Transaction Layer allows to pass the Flow Control gate for Transmission.
      • Size of increment corresponds to the number of credits consumed by the information committed to be sent.
    • Incremented as shown:

$$\mathrm{CREDITS\_CONSUMED} := (\mathrm{CREDITS\_CONSUMED} + \mathrm{Increment}) \bmod 2^{[\mathrm{Field\ Size}]}$$

      where Increment is the size in FC credits of the corresponding part of the TLP sent, and [Field Size] is 8 for PH, NPH, and CPLH and 12 for PD, NPD, and CPLD.
  o CREDIT_LIMIT
    • The limit for the total number of FC units that have been advertised by the Receiver since Flow Control initialization.
    • Undefined at Interface Initialization.
    • Set to the value indicated during Flow Control initialization.


    • For each FC update received:
      • If CREDIT_LIMIT is not equal to the update value, set CREDIT_LIMIT to the update value.
      • Optionally, check the update value validity by evaluating the equation:

$$(\mathrm{update\ value} - \mathrm{CREDIT\_LIMIT}) \bmod 2^{[\mathrm{Field\ Size}]} > 2^{[\mathrm{Field\ Size}]}/2$$

        If the equation evaluates as true, the violation is a Flow Control Protocol Error.
        o If checked, this is a reported error associated with the Receiving Port (see Section 7.2).
      Note: In accordance with this rule, the largest permitted initial value, change in value, or advertised total for any PH, NPH, or CPLH credit value is 128. The largest permitted initial value, change in value, or advertised total for any PD, NPD, or CPLD credit value is 2048.
• The Transmitter gating function must determine if sufficient credits have been advertised to permit the transmission of a given TLP. If the Transmitter does not have enough credits to transmit the TLP, it must block the transmission of the TLP, possibly stalling other TLPs that are using the same Virtual Channel. The Transmitter must follow the ordering and deadlock avoidance rules specified in Section 2.5, which require that certain types of TLPs bypass other specific types of TLPs when the latter are blocked. Note that TLPs using different Virtual Channels have no ordering relationship and must not block each other.
• The Transmitter gating function test is performed as follows:
  o For each required type of credit, the number of credits required is calculated as:

$$\mathrm{CREDITS\_REQUIRED} = (\mathrm{CREDITS\_CONSUMED} + \langle\text{credit units required for the pending TLP}\rangle) \bmod 2^{[\mathrm{Field\ Size}]}$$

  o Unless CREDIT_LIMIT was specified as “infinite” during Flow Control initialization, the Transmitter is permitted to Transmit a TLP if, for each type of information in the TLP, the following equation is satisfied:

$$(\mathrm{CREDIT\_LIMIT} - \mathrm{CREDITS\_REQUIRED}) \bmod 2^{[\mathrm{Field\ Size}]} \le 2^{[\mathrm{Field\ Size}]}/2$$


• When accounting for credit use and return, information from different TLPs is never mixed within one credit.
• When some TLP is blocked from Transmission by a lack of FC Credit, Transmitters must follow the ordering rules specified in Section 2.5 when determining what types of TLPs must be permitted to bypass the stalled TLP.
• The return of FC credits for a Transaction must not be interpreted to mean that the Transaction has completed or achieved system visibility.
  o Flow Control credit return is used for receive buffer management only, and Agents must not make any judgment about the Completion status or system visibility of a Transaction based on the return or lack of return of Flow Control information.
• When a Transmitter sends a nullified TLP (with inverted LCRC and using EDB as the end Symbol), the Transmitter does not modify CREDITS_CONSUMED for that TLP (see Section 3.5.2.1).
• For each type of information tracked, the following quantities are tracked for Flow Control TLP Receiver accounting:
  o CREDITS_ALLOCATED
    • Count of the total number of credits granted to the Transmitter since initialization.
    • Initially set according to the buffer size and allocation policies of the Receiver.
    • This value is included in the InitFC and UpdateFC DLLPs (see Section 3.4).
    • Incremented as the Receiver Transaction Layer makes additional receive buffer space available by processing Received TLPs.
      • Increment size corresponds to the size of the space made available.
    • Incremented as shown:

$$\mathrm{CREDITS\_ALLOCATED} := (\mathrm{CREDITS\_ALLOCATED} + \mathrm{Increment}) \bmod 2^{[\mathrm{Field\ Size}]}$$

      where [Field Size] is 8 for PH, NPH, and CPLH and 12 for PD, NPD, and CPLD.


  o CREDITS_RECEIVED (Optional – for the optional error check described below)
    • Count of the total number of FC units consumed by valid TLPs Received since Flow Control initialization.
    • Set to all ‘0’s at Interface Initialization.
    • Incremented for each Received TLP according to the number of FC units of the given type consumed by the Received TLP, provided that the TLP:
      • passes the Data Link Layer integrity checks,
      • is not malformed, and
      • does not consume more credits than have been allocated (see following rule).
• If a Receiver implements the CREDITS_RECEIVED counter, then when a nullified TLP (with inverted LCRC and using EDB as the end Symbol) is received, the Receiver does not modify CREDITS_RECEIVED for that TLP (see Section 3.5.2.1).
• A Receiver may optionally check for Receiver Overflow errors (TLPs exceeding CREDITS_ALLOCATED) by checking the following equation:

$$(\mathrm{CREDITS\_ALLOCATED} - \mathrm{CREDITS\_RECEIVED}) \bmod 2^{[\mathrm{Field\ Size}]} \ge 2^{[\mathrm{Field\ Size}]}/2$$

  If the check is implemented and this equation evaluates as true, the Receiver must:
  o discard the TLP(s) without modifying CREDITS_RECEIVED
  o de-allocate any resources which it had allocated for the TLP(s)
  If checked, this is a reported error associated with the Receiving Port (see Section 7.2).
  Note: Following a Receiver Overflow error, Receiver behavior is undefined, but it is encouraged that the Receiver continue to operate, processing Flow Control updates and accepting any TLPs that do not exceed allocated credits.


• For NPH, NPD, PH, and (if non-infinite) CPLH types, an UpdateFC FCP must be scheduled for Transmission each time the following sequence of events occurs:
  o all advertised FC units for a particular type of credit are consumed by TLPs received
  o one or more units of that type are made available by TLPs processed
• For PD and (if non-infinite) CPLD types, when the number of available credits is less than Max_Payload_Size, an UpdateFC FCP must be scheduled for Transmission each time one or more units of that type are made available by TLPs processed.
• UpdateFC FCPs may be scheduled for Transmission more frequently than is required.
• When the Link is Active, UpdateFC FCPs for each enabled type of FC credit must be scheduled for transmission at least once every 10 µs.
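Tables 2-17 and 2-18, together with footnote 9, boil down to a small amount of arithmetic. The Python sketch below takes data lengths in bytes and labels the rows of Table 2-17 with hypothetical kind names. It is not a complete TLP decoder.

```python
from typing import Dict

FC_UNIT_BYTES = 16  # one data credit covers 16 bytes


def data_credits(data_len_bytes: int) -> int:
    """n = Roundup(DataLen / FC unit size), as in footnote 9 (ceiling division)."""
    return -(-data_len_bytes // FC_UNIT_BYTES)


def credits_consumed(kind: str, data_len_bytes: int = 0) -> Dict[str, int]:
    """Credits one TLP consumes, per Table 2-17 (kind labels are hypothetical)."""
    n = data_credits(data_len_bytes)
    table = {
        "mem_io_cfg_read": {"NPH": 1},
        "mem_write": {"PH": 1, "PD": n},
        "io_cfg_write": {"NPH": 1, "NPD": 1},  # at most one aligned DW of data
        "msg": {"PH": 1},
        "msg_with_data": {"PH": 1, "PD": n},
        "mem_read_cpl": {"CPLH": 1, "CPLD": n},
        "io_cfg_read_cpl": {"CPLH": 1, "CPLD": 1},
        "io_cfg_write_cpl": {"CPLH": 1},
    }
    return table[kind]


def min_pd_advertisement(largest_max_payload_bytes: int) -> int:
    """Smallest permitted initial PD credit value (Table 2-18)."""
    return largest_max_payload_bytes // FC_UNIT_BYTES
```

For instance, a 128-byte Memory Write consumes 1 PH and 8 PD credits. A component whose largest Max_Payload_Size is 1024B must advertise at least 040h PD credits, which matches the example in Table 2-18.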
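A small Python model of one credit type on one Virtual Channel is enough to exercise the counter rules above. It follows three modular equations: the one for CREDITS_REQUIRED, the optional UpdateFC validity check, and the optional Receiver Overflow check. Some choices belong to this sketch rather than to the specification:

- the class names;
- the boolean error results;
- the explicit `infinite` flag;
- evaluating the overflow check with the incoming TLP already counted.

```python
FIELD_SIZE = {"PH": 8, "NPH": 8, "CPLH": 8, "PD": 12, "NPD": 12, "CPLD": 12}


class TxCreditGate:
    """Transmitter side: CREDITS_CONSUMED and CREDIT_LIMIT for one credit type."""

    def __init__(self, credit_type: str, initial_limit: int, infinite: bool = False):
        self.mod = 1 << FIELD_SIZE[credit_type]    # 2^[Field Size]
        self.half = self.mod // 2
        self.infinite = infinite                   # "infinite" advertisement
        self.consumed = 0                          # all 0s at initialization
        self.limit = initial_limit % self.mod      # value from FC initialization

    def can_send(self, units: int) -> bool:
        if self.infinite:
            return True
        required = (self.consumed + units) % self.mod          # CREDITS_REQUIRED
        return (self.limit - required) % self.mod <= self.half

    def send(self, units: int) -> None:
        if not self.can_send(units):
            raise RuntimeError("insufficient credits: the TLP must be blocked")
        self.consumed = (self.consumed + units) % self.mod

    def on_update(self, update_value: int) -> bool:
        """Apply an UpdateFC value; False means the optional check found an FCPE."""
        valid = (update_value - self.limit) % self.mod <= self.half
        self.limit = update_value % self.mod
        return valid


class RxCreditCounter:
    """Receiver side: CREDITS_ALLOCATED and the optional CREDITS_RECEIVED."""

    def __init__(self, credit_type: str, initial_allocated: int):
        self.mod = 1 << FIELD_SIZE[credit_type]
        self.half = self.mod // 2
        self.allocated = initial_allocated % self.mod
        self.received = 0

    def free(self, units: int) -> None:
        """Buffer space made available by processing Received TLPs."""
        self.allocated = (self.allocated + units) % self.mod

    def accept(self, units: int) -> bool:
        """False = Receiver Overflow: TLP discarded, CREDITS_RECEIVED unchanged."""
        candidate = (self.received + units) % self.mod
        if (self.allocated - candidate) % self.mod >= self.half:
            return False
        self.received = candidate
        return True
```

Since every comparison is taken modulo 2^[Field Size], the 8-bit and 12-bit counters can wrap freely. Suppose CREDIT_LIMIT for a header type is 228 and an UpdateFC advertises 40. The forward distance is (40 - 228) mod 256 = 68, so the update passes the validity check.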


PCI EXPRESS BASE SPECIFICATION, REV. 1.0

Implementation Note: Flow Control Update Frequency

For components subject to receiving streams of TLPs, it is desirable to implement receive buffers larger than the minimum size required to prevent Transmitter throttling due to lack of available credits. Likewise, UpdateFC FCPs must be returned such that the time required to send, receive and process the UpdateFC is sufficient to prevent such throttling. Table 2-19 shows recommended values for the frequency of transmission based on Link Width and Max_Payload_Size values. The values are calculated as a function of the largest TLP payload size and Link width. The values are measured at the Port of the TLP Receiver, from the time the last Symbol of a TLP is received to the time the first Symbol of the UpdateFC DLLP is transmitted. The values are calculated using the formula:

$$\text{Latency} = \frac{\text{Max\_Payload\_Size} + \text{TLPOverhead}}{\text{LinkWidth}} \times \text{UpdateFactor} + \text{InternalDelay}$$

where:
• Max_Payload_Size – The value in the Max_Payload_Size field of the Device Control register
• TLPOverhead – Represents the additional TLP components which consume Link bandwidth (Header, LCRC, framing Symbols) and is treated here as a constant value of 24 Symbols
• UpdateFactor – Used to balance Link bandwidth efficiency and receive buffer sizes; the value varies according to Max_Payload_Size and Link width, and is included in Table 2-19
• LinkWidth – The operating width of the Link
• InternalDelay – Represents the internal processing delays for received TLPs and transmitted DLLPs, and is treated here as a constant value of 11 Symbol Times

Table 2-19: UpdateFC Transmission Latency Guidelines by Link Width and Max Payload (Symbol Times)

Max_Payload_Size | x1 | x2 | x4 | x8 | x12 | x16 | x32 (Link Operating Width)
128B | 223 (UF = 1.4) | 117 (UF = 1.4) | 64 (UF = 1.4) | 58 (UF = 2.5) | 49 (UF = 3.0) | 39 (UF = 3.0) | 25 (UF = 3.0)
256B | 403 (UF = 1.4) | 207 (UF = 1.4) | 109 (UF = 1.4) | 98 (UF = 2.5) | 81 (UF = 3.0) | 63 (UF = 3.0) | 37 (UF = 3.0)
512B | 547 (UF = 1.0) | 279 (UF = 1.0) | 145 (UF = 1.0) | 78 (UF = 1.0) | 100 (UF = 2.0) | 78 (UF = 2.0) | 44 (UF = 2.0)
1024B | 1059 (UF = 1.0) | 535 (UF = 1.0) | 273 (UF = 1.0) | 142 (UF = 1.0) | 185 (UF = 2.0) | 142 (UF = 2.0) | 76 (UF = 2.0)
2048B | 2083 (UF = 1.0) | 1047 (UF = 1.0) | 529 (UF = 1.0) | 270 (UF = 1.0) | 356 (UF = 2.0) | 270 (UF = 2.0) | 140 (UF = 2.0)
4096B | 4131 (UF = 1.0) | 2071 (UF = 1.0) | 1041 (UF = 1.0) | 526 (UF = 1.0) | 697 (UF = 2.0) | 526 (UF = 2.0) | 268 (UF = 2.0)
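The entries of Table 2-19 can be reproduced directly from the formula. The Python sketch below is illustrative only. It assumes, consistent with every published entry, that fractional results are truncated. It also stores UpdateFactor in tenths so the arithmetic stays in exact integers. The function names are hypothetical.

```python
# Constants taken from the Implementation Note above.
TLP_OVERHEAD = 24     # Symbols (Header, LCRC, framing Symbols)
INTERNAL_DELAY = 11   # Symbol Times
WIDTHS = (1, 2, 4, 8, 12, 16, 32)
PAYLOADS = (128, 256, 512, 1024, 2048, 4096)

def update_factor_tenths(max_payload: int, width: int) -> int:
    """UpdateFactor from Table 2-19, scaled by 10 to keep the arithmetic exact."""
    if max_payload <= 256:                                 # 128B and 256B rows
        return {1: 14, 2: 14, 4: 14, 8: 25}.get(width, 30)
    return 10 if width <= 8 else 20                        # 512B through 4096B rows

def updatefc_latency(max_payload: int, width: int) -> int:
    """(Max_Payload_Size + TLPOverhead) / LinkWidth * UpdateFactor + InternalDelay.

    Assumption: the fractional part is dropped, which matches the table.
    """
    uf10 = update_factor_tenths(max_payload, width)
    return (max_payload + TLP_OVERHEAD) * uf10 // (10 * width) + INTERNAL_DELAY

# Print one row of the table per Max_Payload_Size value.
for mp in PAYLOADS:
    print(f"{mp:>5}B", [updatefc_latency(mp, w) for w in WIDTHS])
```

Integer floor division is used so that a value such as 280 × 1.4 is not nudged below an integer boundary by binary floating-point rounding.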


2.10. Data Integrity

2.10.1. Introduction

The basic data reliability in PCI Express is achieved within the Link Layer, which uses a 32-bit CRC (LCRC) code to detect errors in TLPs on a Link-by-Link basis, and applies a Link-by-Link retransmit mechanism for error recovery. A TLP is a unit of data and transaction control that is created by a data-source at the “edge” of the PCI Express domain (such as an Endpoint or Root Complex), potentially routed through intermediate components (i.e., Switches) and consumed by the ultimate PCI Express recipient. As a TLP passes through a Switch, the Switch may need to change some control fields without modifying other fields that should not change as the packet traverses the path. Therefore, the LCRC is regenerated by Switches. Data corruption may occur internally to the Switch, and the regeneration of a good LCRC for corrupted data masks the existence of errors. To ensure end-to-end data integrity detection in systems that require high data reliability, a Transaction Layer end-to-end 32-bit CRC (ECRC) can be placed in the TLP Digest field at the end of a TLP. The ECRC covers all fields that do not change as the TLP traverses the path (invariant fields). The ECRC is generated by the Transaction Layer in the source component, and checked in the destination component. A Switch that supports ECRC checking checks ECRC on TLPs that target the Switch itself. On all other TLPs a Switch must preserve the ECRC (forward it untouched) as an integral part of the TLP.

2.10.2. ECRC Rules

The capability to generate and check ECRC is reported to software, and the ability to do so is enabled by software (see Section 5.8.3).

• If a device is enabled to generate ECRC, it must calculate and apply ECRC for all TLPs originated by the device
• Switches must pass TLPs with ECRC unchanged from the Ingress Port to the Egress Port
• If a device reports the capability to check ECRC, it must support Advanced Error Reporting (see Section 7.2)
• If a device is enabled to check ECRC, it must do so for all TLPs it receives that include ECRC
  o Note that it is still possible for the device to receive TLPs without ECRC, and these are processed normally – this is not an error

Note that a Switch may perform ECRC checking on TLPs passing through the Switch. ECRC Errors detected by the Switch are reported in the same way any other device would report them, but do not alter the TLP’s passage through the Switch.


A 32b ECRC is calculated for the entire TLP (header and data payload) using the following algorithm and appended to the end of the TLP (see Figure 2-2):

• The ECRC value is calculated using the following algorithm (see Figure 2-25).
• The polynomial used has coefficients expressed as 04C1 1DB7h
• The seed value (initial value for ECRC storage registers) is FFFF FFFFh
• All invariant fields of the TLP header and the entire data payload (if present) are included in the ECRC calculation; all bits in variant fields must be set to ‘1’ for ECRC calculations.
  o Bit 0 of the Type field is variant
  o The EP field is variant
  o All other fields are invariant
• ECRC calculation starts with bit 0 of Byte 0 and proceeds from bit 0 to bit 7 of each Byte of the TLP
• The result of the ECRC calculation is complemented, and the complemented result bits are mapped into the 32b TLP Digest field as shown in Table 2-20.

Table 2-20: Mapping of Bits into ECRC Field

ECRC Result Bit | Corresponding Bit Position in the 32b TLP Digest Field
0 | 7
1 | 6
2 | 5
3 | 4
4 | 3
5 | 2
6 | 1
7 | 0
8 | 15
9 | 14
10 | 13
11 | 12
12 | 11
13 | 10
14 | 9
15 | 8
16 | 23
17 | 22
18 | 21
19 | 20
20 | 19
21 | 18
22 | 17
23 | 16
24 | 31
25 | 30
26 | 29
27 | 28
28 | 27
29 | 26
30 | 25
31 | 24

• The 32b ECRC value is placed in the TLP Digest field at the end of the TLP (see Figure 2-2)
• For TLPs including a TLP Digest field used for an ECRC value, receivers which support end-to-end data integrity checking check the ECRC value in the TLP Digest field by:
  o applying the same algorithm used for ECRC calculation (above) to the received TLP, not including the 32b TLP Digest field of the received TLP
  o comparing the calculated result with the value in the TLP Digest field of the received TLP

How the Receiver makes use of the end-to-end data integrity check provided through the ECRC is beyond the scope of this document.
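The generation and checking steps above can be modeled as a bit-serial shift register. This Python sketch is illustrative, not a reference implementation. It assumes the TLP is supplied as a bytes object without its Digest. It also assumes that bit 0 of the Type field is bit 0 of Byte 0 and that the EP field is bit 6 of Byte 2; those header bit positions come from the TLP header layout, not from this section. All function names are hypothetical.

```python
POLY = 0x04C11DB7   # ECRC polynomial coefficients (04C1 1DB7h)
SEED = 0xFFFFFFFF   # initial value of the ECRC storage registers

def crc_result(data: bytes) -> int:
    """Bit-serial CRC: bit 0 of Byte 0 enters first; the result is complemented."""
    reg = SEED
    for byte in data:
        for bit in range(8):                        # bit 0 through bit 7 of each Byte
            feedback = ((reg >> 31) & 1) ^ ((byte >> bit) & 1)
            reg = (reg << 1) & 0xFFFFFFFF
            if feedback:
                reg ^= POLY
    return reg ^ 0xFFFFFFFF

def map_to_digest(result: int) -> int:
    """Table 2-20: result bit i lands at Digest bit 8*(i//8) + 7 - i%8."""
    digest = 0
    for i in range(32):
        if (result >> i) & 1:
            digest |= 1 << (8 * (i // 8) + 7 - (i % 8))
    return digest

def mask_variant_fields(tlp: bytes) -> bytes:
    """Force variant bits to '1'. Assumed positions: Type[0] = Byte 0 bit 0, EP = Byte 2 bit 6."""
    b = bytearray(tlp)
    b[0] |= 0x01
    b[2] |= 0x40
    return bytes(b)

def ecrc(tlp_without_digest: bytes) -> int:
    """32b TLP Digest field value for a TLP (header plus any data payload)."""
    return map_to_digest(crc_result(mask_variant_fields(tlp_without_digest)))

def ecrc_ok(tlp_without_digest: bytes, digest_field: int) -> bool:
    """Receiver check: recompute over the TLP minus its Digest and compare."""
    return ecrc(tlp_without_digest) == digest_field

# Hypothetical TLP bytes, used only to exercise the functions.
tlp = bytes([0x40, 0x00, 0x00, 0x01]) + bytes(8) + bytes([0xDE, 0xAD, 0xBE, 0xEF])
digest = ecrc(tlp)
print(hex(digest), ecrc_ok(tlp, digest))
```

The variant bits are forced to ‘1’ before the calculation. As a result, changing the EP bit in transit does not change the computed ECRC.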


Figure 2-25: Calculation of 32b ECRC for TLP End to End Data Integrity Protection (TLP bytes enter Byte 0 first, bit 0 first; polynomial 04C1 1DB7h; 32b storage registers seeded with FF FF FF FF)


Implementation Note: Protection of TD Bit Inside Switches

It is of utmost importance that Switches ensure and maintain the integrity of the TD bit in TLPs that they receive and forward (i.e., by applying a special internal protection mechanism), since corruption of the TD bit will cause a tandem device to misinterpret the presence or absence of the TLP Digest field.

Similarly, it is highly recommended that Switches provide internal protection to other variant fields in TLPs that they receive and forward, as the end-to-end integrity of variant fields is not sustained by the ECRC.

Implementation Note: Data Link Layer Does Not Have Internal TLP Visibility

Since the Data Link Layer does not process the TLP header (it determines the start and end of the TLP based on indications from the Physical Layer), it is not aware of the existence of the TLP Digest field, and simply passes it to the Transaction Layer as a part of the TLP.

2.11. Error Forwarding

Error Forwarding (also known as data poisoning) is enabled in PCI Express by either modifying the value placed in the TLP Digest field or by setting the proper value in the TD and EP fields. The rules for doing this are specified below. Here are some examples of cases where Error Forwarding might be used:

• Example #1: A read from main memory encounters an uncorrectable error
• Example #2: Parity error on a PCI write to main memory
• Example #3: Data integrity error on an internal data buffer or cache

2.11.1. Error Forwarding Usage Model

• Error Forwarding is only used for Read Completion Data or Write Data, never for the cases when the error is in the “header” (request phase, address/command, etc.). Requests/Completions with header errors cannot be forwarded in general since the true destination cannot be positively known and, therefore, forwarding may cause direct or side effects such as data corruption, system failures, etc.
• Used for controlled propagation of errors through the system, system diagnostics, etc.
• Does not cause Link Layer Retry – Poisoned TLPs will be retried only if there are transmission errors on PCI Express as determined by the TLP error detection mechanisms in the Data Link Layer. The Poisoned TLP may ultimately cause the originator of the request to re-issue it (at the Transaction Layer or above) in the case of a read operation, or to take some other action. Such use of Error Forwarding information is beyond the scope of this specification.
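Poisoning is signaled either through the TLP Digest value or through the TD and EP fields. Section 2.11.2 lists the two concrete mechanisms: TD = ‘1’ with an all-‘1’s “stomp code” in the Digest, or TD = ‘0’ with EP = ‘1’. A Receiver-side test for them might look like the hypothetical Python sketch below. It takes the TD, EP, and Digest values as already-decoded integers. When TD = ‘1’ it examines only the Digest, following the first mechanism literally.

```python
from typing import Optional

STOMP_CODE = 0xFFFFFFFF   # TLP Digest value of all '1's

def is_poisoned(td: int, ep: int, digest: Optional[int] = None) -> bool:
    """Apply the two poisoning mechanisms literally (hypothetical helper)."""
    if td == 1:
        return digest == STOMP_CODE   # TD = '1': Digest carries the stomp code
    return ep == 1                    # TD = '0' and EP = '1'

def receive_payload(data: bytes, td: int, ep: int, digest: Optional[int] = None) -> dict:
    """Hypothetical Receiver use: keep the data but tag it as bad if poisoned."""
    return {"data": data, "poisoned": is_poisoned(td, ep, digest)}

print(receive_payload(b"\x00\x01", td=0, ep=1))
```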


2.11.2. Rules For Use of Data Poisoning

• Support for TLP poisoning in a Transmitter is optional.
• Data Poisoning applies only to the data within a Write Request (Posted or Non-Posted) or a Read Completion.
• Poisoning of a TLP is indicated using one of the following two mechanisms:
  o TD field = ‘1’: The value used for the TLP Digest field is the “stomp code” of all ‘1’s
  o TD field = ‘0’ and EP field = ‘1’
• If a Transmitter supports data poisoning, TLPs that are known to the Transmitter to include bad data must use one of the two poisoning mechanisms defined above.
• The Receiver must consider all the information within a poisoned TLP to be affected
  o If applying Error Forwarding, the Receiver must cause all data from the indicated TLP to be tagged as bad (“poisoned”).
• Receipt of a poisoned TLP is a reported error associated with the Receiving device/function (see Section 7.2)

Note: For some applications it may be desirable for the Receiver to use data marked corrupt – such use is not forbidden. How the Receiver makes use of the information that a TLP is poisoned is beyond the scope of this document.

2.12. Completion Timeout Mechanism

In any split transaction protocol, there is a risk associated with the failure of a Requester to receive an expected Completion. To allow Requesters to attempt recovery from this situation in a standard manner, the Completion Timeout mechanism is defined. This mechanism is intended to be activated only when there is no reasonable expectation that the Completion will be returned, and should never occur under normal operating conditions. Note that the values specified here do not reflect expected service latencies, and must not be used to estimate typical response times.

The PCI Express elements that are capable of initiating Requests that invoke Completions must implement the Completion Timeout mechanism. An exception is made for Configuration Requests (see below). This mechanism is activated for each Request which requires Completion when the Request is transmitted. Since PCI Express Switches do not autonomously initiate Requests (that need Completion), the requirement for Completion Timeout support is limited only to Root Complexes, PCI Express-PCI Bridges, and Endpoint devices.

The PCI Express Specification defines the following range for the min/max acceptable timer values for the Completion Timeout mechanism:

• The Completion Timeout timer must not expire (i.e., cause a timeout event) in less than 10 ms.
• The Completion Timeout timer must expire if a Request is not completed in 50 ms.
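A Requester-side timer that respects the 10 ms to 50 ms window might be modeled as follows. This Python sketch is hypothetical. The 25 ms default, the tag-keyed bookkeeping, and the byte counting are illustrative choices. Time is passed in explicitly, in milliseconds, so the behavior is deterministic. Following the note on multiple Completions later in this section, a Memory Read counts as completed only when all requested bytes have arrived.

```python
from dataclasses import dataclass, field

MIN_TIMEOUT_MS = 10   # timer must not expire earlier than this
MAX_TIMEOUT_MS = 50   # timer must expire if the Request is still incomplete by this

@dataclass
class Outstanding:
    issued_ms: float
    bytes_remaining: int

@dataclass
class CompletionTimeoutTracker:
    timeout_ms: float = 25.0                     # hypothetical choice inside the window
    pending: dict = field(default_factory=dict)  # tag -> Outstanding

    def __post_init__(self) -> None:
        if not MIN_TIMEOUT_MS <= self.timeout_ms <= MAX_TIMEOUT_MS:
            raise ValueError("Completion Timeout must lie between 10 ms and 50 ms")

    def request_sent(self, tag: int, byte_count: int, now_ms: float) -> None:
        # The timer is activated when the Request is transmitted.
        self.pending[tag] = Outstanding(now_ms, byte_count)

    def completion_received(self, tag: int, byte_count: int) -> None:
        req = self.pending.get(tag)
        if req is None:
            return
        req.bytes_remaining -= byte_count
        if req.bytes_remaining <= 0:             # all Completions received
            del self.pending[tag]

    def expired(self, now_ms: float) -> list:
        """Tags whose timer ran out; each is a reported Completion Timeout error."""
        late = [t for t, r in self.pending.items() if now_ms - r.issued_ms >= self.timeout_ms]
        for t in late:
            del self.pending[t]
        return late

trk = CompletionTimeoutTracker()
trk.request_sent(tag=1, byte_count=256, now_ms=0.0)
trk.completion_received(1, 128)    # partial data only: still outstanding
print(trk.expired(now_ms=30.0))
```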


A Completion Timeout is a reported error associated with the Requester device/function (see Section 7.2).

Note: A Memory Read Request for which there are multiple Completions must be considered “completed” only when all Completions have been received by the Requester. If some, but not all, requested data is returned before the Completion Timeout timer expires, the Requester is permitted to keep or to discard the data which was returned prior to timer expiration.

Configuration Requests have special requirements (see Sections 2.7.6.2 and 7.6). Because of these special requirements, the support and timer values for a Completion Timeout for Configuration Requests are implementation specific.

2.13. Transaction Layer Behavior in DL_Down Status

DL_Down status indicates that there is no connection with another component on the Link, or that the connection with the other component has been lost and is not recoverable by the Physical or Data Link Layers. This section specifies the Transaction Layer’s behavior when the Data Link Layer reports DL_Down status to the Transaction Layer, indicating that the Link is non-operational.

For a Root Complex, or any Port on a Switch other than the one closest to the Root Complex, DL_Down status is handled by:

• returning all internal logic to the state specified for Link initialization
• forming completions for any Requests submitted by the device core for Transmission, returning “Unsupported Request” Completion Status, then discarding the Requests
  o This is a reported error associated with the device/function for the (virtual) Bridge associated with the Port (see Section 7.2)
  o Requests already being processed by the Transaction Layer, for which it may not be practical to return Completions, are discarded
    Note: This is equivalent to the case where the Request had been Transmitted but not yet Completed before the Link status became DL_Down
    • These cases are handled by the Requester using the Completion Timeout mechanism
  Note: The point at which a Request becomes “uncompletable” is implementation specific
• discarding all Completions submitted by the device core for Transmission

For a Port on an Endpoint, and the Port on a Switch or Bridge which is closest to the Root Complex, DL_Down status is handled as a Link reset by:

• returning all internal logic to the state specified for Link initialization
• discarding all TLPs being processed
• (for Switch and Bridge) propagating Link Reset to all other Ports
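The two handling rules above depend only on where the Port sits relative to the Root Complex. A minimal Python sketch of that dispatch follows. The PortRole names and the action strings are hypothetical labels for the bullets above, not terms defined by this specification.

```python
from enum import Enum, auto
from typing import List

class PortRole(Enum):
    ROOT_COMPLEX = auto()        # a Port on a Root Complex
    SWITCH_DOWNSTREAM = auto()   # Switch Port other than the one closest to the Root Complex
    SWITCH_UPSTREAM = auto()     # Switch Port closest to the Root Complex
    BRIDGE_UPSTREAM = auto()     # Bridge Port closest to the Root Complex
    ENDPOINT = auto()            # the Port on an Endpoint

RESET_LOGIC = "return internal logic to the Link initialization state"

def dl_down_actions(role: PortRole) -> List[str]:
    """Transaction Layer actions on DL_Down (hypothetical labels)."""
    if role in (PortRole.ROOT_COMPLEX, PortRole.SWITCH_DOWNSTREAM):
        return [
            RESET_LOGIC,
            "complete core Requests with Unsupported Request status, then discard them",
            "discard Completions submitted by the device core",
        ]
    # The remaining roles treat DL_Down as a Link reset.
    actions = [RESET_LOGIC, "discard all TLPs being processed"]
    if role in (PortRole.SWITCH_UPSTREAM, PortRole.BRIDGE_UPSTREAM):
        actions.append("propagate Link Reset to all other Ports")
    return actions

for role in PortRole:
    print(role.name, dl_down_actions(role))
```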


2.14. Transaction Layer Behavior in DL_Up Status

DL_Up status indicates that a connection has been established with another component on the associated Link. This section specifies the Transaction Layer’s behavior when the Data Link Layer reports entry to the DL_Up status to the Transaction Layer, indicating that the Link is operational. These behaviors relate to Slot Power Limit support.

For a Downstream Port on a Root Complex or a Switch:

• When transitioning from a non-DL_Up Status to a DL_Up Status, the Port must initiate the transmission of a Set_Slot_Power_Limit message to the other component on the Link to convey the value programmed in the Slot Power Limit Scale and Value fields of the Slot Capabilities register.


3. Data Link Layer Specification

The Data Link Layer acts as an intermediate stage between the Transaction Layer and the Physical Layer. Its primary responsibility is providing a reliable mechanism for exchanging Transaction Layer Packets (TLPs) between the two components on a Link.

3.1. Data Link Layer Overview

Figure 3-1: Layering Diagram Highlighting the Data Link Layer (Transaction, Data Link, and Physical Layers, the Physical Layer split into Logical and Electrical Sub-blocks, for each of the two components; paths 1 through 4 are described in the text)

The Data Link Layer is responsible for reliably conveying Transaction Layer Packets (TLPs) supplied by the Transaction Layer across a PCI Express Link to the other component’s Transaction Layer. Services provided by the Data Link Layer include:

Data Exchange:
• Accept TLPs for transmission from the Transmit Transaction Layer and convey them to the Transmit Physical Layer
• Accept TLPs received over the Link from the Physical Layer and convey them to the Receive Transaction Layer


Error Detection and Retry:
• TLP Sequence Number and LCRC generation
• Transmitted TLP storage for Data Link Layer Retry
• Data integrity checking for TLPs and Data Link Layer Packets (DLLPs)
• Acknowledgement and Retry DLLPs
• Error indications for error reporting and logging mechanisms
• Link Acknowledgement Timeout replay mechanism

Initialization and power management services:
• Track Link state and convey active/reset/disconnected state to Transaction Layer

Data Link Layer Packets (DLLPs) are:
• used for Link Management functions including TLP acknowledgement, power management, and conveyance of Flow Control information.
• transferred between Data Link Layers of the two directly connected components on a Link

DLLPs are sent point-to-point, between the two components on one Link. TLPs are routed from one component to another, potentially through one or more intermediate components.

Data Integrity checking for DLLPs and TLPs is done using a CRC included with each packet sent across the Link. DLLPs use a 16b CRC and TLPs (which can be much longer) use a 32b LCRC. TLPs additionally include a sequence number, which is used to detect cases where one or more entire TLPs have been lost.

Received DLLPs which fail the CRC check are discarded. The mechanisms which use DLLPs may suffer a performance penalty from this loss of information, but are self-repairing such that a successive DLLP will supersede any information lost.

TLPs which fail the data integrity checks (LCRC and sequence number), or which are lost in transmission from one component to another, are re-sent by the transmitter. The transmitter stores a copy of all TLPs sent, re-sending these copies when required, and purges the copies only when it receives a positive acknowledgement of error-free receipt from the other component. If a positive acknowledgement has not been received within a specified time period, the transmitter will automatically start re-transmission. The receiver can request an immediate re-transmission using a negative acknowledgement.

The Data Link Layer appears as an information conduit with varying latency to the Transaction Layer. On any given individual Link all TLPs fed into the Transmit Data Link Layer (1 and 3) will appear at the output of the Receive Data Link Layer (2 and 4) in the same order at a later time, as illustrated in Figure 3-1. The latency will depend on a number of factors, including pipeline latencies, width and operational frequency of the Link, transmission of electrical signals across the Link, and delays caused by Data Link Layer Retry. Because of these delays, the Transmit Data Link Layer (1 and 3) can apply backpressure to the Transmit Transaction Layer, and the Receive Data Link Layer (2 and 4) communicates the presence or absence of valid information to the Receive Transaction Layer.

3.2. Data Link Control and Management State Machine

The Data Link Layer tracks the state of the Link. It communicates Link status with the Transaction and Physical Layers, and performs Link Management through the Physical Layer. The Data Link Layer contains a Link Control and Management State Machine to perform these tasks. The states for this machine are described below, and are shown in Figure 3-2.

States:
• DL_Inactive – Physical Layer reporting Link is non-operational or Port is not connected
• DL_Init – Physical Layer reporting Link is operational, initialize Flow Control for the default Virtual Channel
• DL_Active – Normal operation mode

Figure 3-2: Data Link Control and Management State Machine
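The states above, together with the exit conditions given in Section 3.2.1, can be summarized as a transition function. This is a minimal Python sketch that assumes each input is sampled as a boolean. The parameter names are hypothetical: link_up stands for Physical LinkUp, link_enabled for the Transaction Layer's indication that software has not disabled the Link, and fc_init_done for successful VC0 Flow Control initialization.

```python
from enum import Enum

class DLState(Enum):
    DL_INACTIVE = "DL_Inactive"
    DL_INIT = "DL_Init"
    DL_ACTIVE = "DL_Active"

def next_state(state: DLState, link_up: bool, link_enabled: bool, fc_init_done: bool) -> DLState:
    """One evaluation of the exit conditions (hypothetical input names)."""
    if state is DLState.DL_INACTIVE:
        # Leave only if software has not disabled the Link and Physical LinkUp = 1.
        return DLState.DL_INIT if (link_enabled and link_up) else DLState.DL_INACTIVE
    if not link_up:
        # Physical LinkUp = 0 returns both DL_Init and DL_Active to DL_Inactive.
        return DLState.DL_INACTIVE
    if state is DLState.DL_INIT and fc_init_done:
        return DLState.DL_ACTIVE
    return state

def reported_status(state: DLState, in_fc_init2: bool = False) -> str:
    """Status reported upward; DL_Init reports DL_Up only while in FC_INIT2."""
    if state is DLState.DL_ACTIVE or (state is DLState.DL_INIT and in_fc_init2):
        return "DL_Up"
    return "DL_Down"

s = DLState.DL_INACTIVE
for link_up, fc_done in [(True, False), (True, True), (False, True)]:
    s = next_state(s, link_up, link_enabled=True, fc_init_done=fc_done)
    print(s.value, reported_status(s))
```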


3.2.1. Data Link Control and Management State Machine Rules

Rules per state:

• DL_Inactive
  o Initial state following PCI Express hot, warm, or cold reset (see Section 7.6)
  o Upon entry to DL_Inactive:
    • Reset all Data Link Layer state information to default values
    • Discard the contents of the Data Link Layer Retry Buffer (refer to Section 3.5)
  o While in DL_Inactive:
    • Report DL_Down status to the Transaction Layer as well as to the rest of the Data Link Layer
      Note: This will cause the Transaction Layer to discard any outstanding transactions and to terminate internally any attempts to transmit a TLP. For a Port on a Root Complex or at the “bottom” of a Switch, this is equivalent to a “hot remove.” For a Port on an Endpoint or at the “top” of a Switch, having the Link go down is equivalent to a hot reset (see Section 2.13).
    • Discard TLP information from the Transaction and Physical Layers
    • Do not generate or accept DLLPs
  o Exit to DL_Init if:
    • Indication from the Transaction Layer that the Link is not disabled by software and the Physical Layer reports Physical LinkUp = 1
• DL_Init
  o While in DL_Init:
    • Initialize Flow Control for the default Virtual Channel, VC0, following the Flow Control initialization protocol described in Section 3.3
    • Report DL_Down status while in state FC_INIT1; DL_Up status in state FC_INIT2
  o Exit to DL_Active if:
    • Flow Control initialization completes successfully, and the Physical Layer continues to report Physical LinkUp = 1
  o Terminate attempt to initialize Flow Control for VC0 and Exit to DL_Inactive if:
    • Physical Layer reports Physical LinkUp = 0


• DL_Active
  o DL_Active is referred to as the normal operating state
  o While in DL_Active:
    • Accept and transfer TLP information with the Transaction and Physical Layers as specified in this chapter
    • Generate and accept DLLPs as specified in this chapter
    • Report DL_Up status to the Transaction and Data Link Layers
  o Exit to DL_Inactive if:
    • Physical Layer reports Physical LinkUp = 0

3.3. Flow Control Initialization Protocol

Before starting normal operation following power-up or interconnect Reset, it is necessary to initialize Flow Control for the default Virtual Channel, VC0 (see Section 7.6). In addition, when additional Virtual Channels (VCs) are enabled, the Flow Control initialization process must be completed for each newly enabled VC before it can be used (see Section 2.6). This section describes the initialization process which is used for all VCs. Note that since VC0 is enabled before all other VCs, no TLP traffic of any kind will be active prior to initialization of VC0. However, when additional VCs are being initialized there will typically be TLP traffic flowing on other, already enabled, VCs. Such traffic has no direct effect on the initialization process for the additional VC(s).

There are two states in the VC initialization process. These states are:
• FC_INIT1
• FC_INIT2

The rules for this process are given in the following section. Figure 3-3 shows a flowchart of the process.


Figure 3-3: Flowchart Diagram of Flow Control Initialization Protocol (State: FC_INIT1, followed by State: FC_INIT2)


3.3.1. Flow Control Initialization State Machine Rules

• Rules for state FC_INIT1:
  o Entered when initialization of a VC (VCx) is required
    • Entrance to DL_Init state
    • When a VC is enabled by software (see Section 5.11)
  o While in FC_INIT1:
    • Transaction Layer must block transmission of TLPs using VCx
    • Transmit the following uninterrupted sequence of three successive InitFC1 DLLPs for VCx in the following pattern:
      • InitFC1 – P (first)
      • InitFC1 – NP (second)
      • InitFC1 – Cpl (third)
    • Repeat this InitFC1 DLLP transmission sequence as follows:
      • For VC0, transmit continuously at the maximum rate possible on the Link (resend timer value is 0)
      • For VCs other than VC0, repeat the sequence when no other TLPs or DLLPs are available for Transmission, but no less frequently than at an interval of 8 µs (-0% / +100%), measured from the start of transmission of the preceding sequence
    • Process received InitFC1 and InitFC2 DLLPs:
      • Record the indicated FC unit values
      • Set Flag FI1 once FC unit values have been recorded for each of P, NP and Cpl
  o Exit to FC_INIT2 if:
    • Flag FI1 has been set indicating that FC unit values have been recorded for each of P, NP and Cpl for VCx


• Rules for state FC_INIT2:
  o While in FC_INIT2:
    • Transmission of TLPs using VCx by the Transaction Layer is permitted
    • Transmit the following uninterrupted sequence of three successive InitFC2 DLLPs for VCx in the following pattern:
      • InitFC2 – P (first)
      • InitFC2 – NP (second)
      • InitFC2 – Cpl (third)
    • Repeat this InitFC2 DLLP transmission sequence as follows:
      • For VC0, transmit continuously at the maximum rate possible on the Link (resend timer value is 0)
      • For VCs other than VC0, repeat the sequence when no other TLPs or DLLPs are available for Transmission, but no less frequently than at an interval of 8 µs (-0% / +100%), measured from the start of transmission of the preceding sequence
    • Process received InitFC1 and InitFC2 DLLPs:
      • Ignore the indicated FC unit values
      • Set flag FI2 on receipt of any InitFC2 DLLP for VCx
    • Set flag FI2 on receipt of any TLP on VCx, or any UpdateFC DLLP for VCx
  o Signal completion and exit if:
    • Flag FI2 has been set
• Violations of Flow Control initialization protocol are Data Link Layer Protocol Errors (DLLPE). Checking for such errors in FC initialization protocol is optional. If checking is implemented, any detected error is a reported error associated with the Port (see Section 7.2)
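The two-state procedure for one VC can be sketched as an event-driven model of the flags. This Python sketch is illustrative. It models only flag handling, recorded credit values, and TLP blocking. It does not model the resend timer or the transmission of the InitFC1/InitFC2 sequences. The class, method, and event names are hypothetical.

```python
from typing import Dict, Tuple

FC_TYPES = ("P", "NP", "Cpl")

class FcInit:
    """Flow Control initialization flags for one VC (VCx); hypothetical model."""

    def __init__(self) -> None:
        self.state = "FC_INIT1"
        self.credits: Dict[str, Tuple[int, int]] = {}   # type -> (HdrFC, DataFC)
        self.fi2 = False

    def tlp_transmission_allowed(self) -> bool:
        # TLPs on VCx are blocked in FC_INIT1 and permitted from FC_INIT2 on.
        return self.state != "FC_INIT1"

    def on_dllp(self, kind: str, fc_type: str, hdr_fc: int = 0, data_fc: int = 0) -> None:
        if self.state == "FC_INIT1" and kind in ("InitFC1", "InitFC2"):
            self.credits[fc_type] = (hdr_fc, data_fc)       # record FC unit values
            if all(t in self.credits for t in FC_TYPES):    # Flag FI1 set
                self.state = "FC_INIT2"
        elif self.state == "FC_INIT2" and kind in ("InitFC2", "UpdateFC"):
            # FC unit values are ignored in FC_INIT2; these DLLPs only set FI2.
            self._set_fi2()

    def on_tlp(self) -> None:
        if self.state == "FC_INIT2":
            self._set_fi2()

    def _set_fi2(self) -> None:
        self.fi2 = True
        self.state = "DONE"    # signal completion

vc = FcInit()
for t in FC_TYPES:
    vc.on_dllp("InitFC1", t, hdr_fc=8, data_fc=64)    # hypothetical credit values
vc.on_tlp()
print(vc.state, vc.credits)
```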


3.4. Data Link Layer Packets (DLLPs)

The following DLLPs are used to support Link data integrity mechanisms:
• Ack DLLP: TLP Sequence number acknowledgement; used to indicate successful receipt of some number of TLPs
• Nak DLLP: TLP Sequence number negative acknowledgement; used to initiate a Data Link Layer Retry
• InitFC1, InitFC2 and UpdateFC DLLPs: For Flow Control
• Plus additional DLLPs used for Power Management

3.4.1. Data Link Layer Packet Rules

All DLLPs include the following fields:
• DLLP Type - Specifies the type of DLLP. The defined encodings are shown in Table 3-1.
• 16b CRC

See Figure 3-4.

Table 3-1: DLLP Type Encodings

Encodings | DLLP Type
0000 0000 | Ack
0001 0000 | Nak
0010 0000 | PM_Enter_L1
0010 0001 | PM_Enter_L23
0010 0010 | PM_Active_State_Request_L0s
0010 0011 | PM_Active_State_Request_L1
0010 0100 | PM_Request_Ack
0011 0000 | Vendor Specific – Not used in normal operation
0100 0v2v1v0 | InitFC1-P (v[2:0] specifies Virtual Channel)
0101 0v2v1v0 | InitFC1-NP
0110 0v2v1v0 | InitFC1-Cpl
1100 0v2v1v0 | InitFC2-P
1101 0v2v1v0 | InitFC2-NP
1110 0v2v1v0 | InitFC2-Cpl
1000 0v2v1v0 | UpdateFC-P
1001 0v2v1v0 | UpdateFC-NP
1010 0v2v1v0 | UpdateFC-Cpl
All other encodings | Reserved
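Decoding the DLLP Type byte against Table 3-1 amounts to a table lookup. Below is a hypothetical Python helper. It assumes the Type byte is given as an integer whose bit 7 is the leftmost bit of the encodings shown. It returns the DLLP name and, for Flow Control DLLPs, the VC ID carried in v[2:0].

```python
from typing import Optional, Tuple

# Encodings with no embedded VC ID.
FIXED = {
    0x00: "Ack", 0x10: "Nak",
    0x20: "PM_Enter_L1", 0x21: "PM_Enter_L23",
    0x22: "PM_Active_State_Request_L0s", 0x23: "PM_Active_State_Request_L1",
    0x24: "PM_Request_Ack", 0x30: "Vendor Specific",
}
# Flow Control DLLPs: upper nibble selects the type, bit 3 is 0, bits 2:0 are v[2:0].
FC_NIBBLE = {
    0x4: "InitFC1-P", 0x5: "InitFC1-NP", 0x6: "InitFC1-Cpl",
    0xC: "InitFC2-P", 0xD: "InitFC2-NP", 0xE: "InitFC2-Cpl",
    0x8: "UpdateFC-P", 0x9: "UpdateFC-NP", 0xA: "UpdateFC-Cpl",
}

def decode_dllp_type(t: int) -> Tuple[str, Optional[int]]:
    """Return (DLLP type name, VC ID or None); undefined encodings are Reserved."""
    if t in FIXED:
        return FIXED[t], None
    if (t >> 4) in FC_NIBBLE and not (t & 0x08):
        return FC_NIBBLE[t >> 4], t & 0x07
    return "Reserved", None

print(decode_dllp_type(0x43), decode_dllp_type(0xA7), decode_dllp_type(0x48))
```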


• For Ack and Nak DLLPs (see Figure 3-5):
  o The AckNak_Seq_Num field is used to indicate what TLPs are affected
  o Transmission and Reception are handled by the Data Link Layer according to the rules elsewhere in this chapter.
• For InitFC1, InitFC2, and UpdateFC DLLPs:
  o The HdrFC field contains the credit value for Headers of the indicated type (P, NP, or Cpl)
  o The DataFC field contains the credit value for payload Data of the indicated type (P, NP, or Cpl)
  o The packet formats are shown in Figure 3-6, Figure 3-7, and Figure 3-8
  o Transmission is triggered by the Data Link Layer when initializing Flow Control for a Virtual Channel (see Section 3.3), and following Flow Control initialization by the Transaction Layer according to the rules in Section 2.9
  o Checked for integrity on reception by the Data Link Layer, then the information content of the DLLP is passed to the Transaction Layer
  Note: InitFC1 and InitFC2 DLLPs are used only for VC initialization
• Power Management (PM) DLLPs (see Figure 3-9):
  o Transmission is triggered by the component’s power management logic according to the rules in Chapter 6
  o Checked for integrity on reception by the Data Link Layer, then passed to the component’s power management logic
• Vendor Specific (see Figure 3-10)

Figure 3-4: DLLP Type and CRC Fields (DLLP Type in Byte 0; 16b CRC starting at Byte 4)


Figure 3-5: Data Link Layer Packet Format for Ack and Nak (Byte 0: 0000 0000 = Ack, 0001 0000 = Nak; followed by Reserved bits and the AckNak_Seq_Num field; 16b CRC starting at Byte 4)

Figure 3-6: Data Link Layer Packet Format for InitFC1 (Byte 0: 0100 = P, 0101 = NP, or 0110 = Cpl, then 0 and v[2:0] (VC ID); HdrFC and DataFC fields with Reserved (R) bits; 16b CRC starting at Byte 4)

Figure 3-7: Data Link Layer Packet Format for InitFC2 (Byte 0: 1100 = P, 1101 = NP, or 1110 = Cpl, then 0 and v[2:0] (VC ID); HdrFC and DataFC fields with Reserved (R) bits; 16b CRC starting at Byte 4)

Figure 3-8: Data Link Layer Packet Format for UpdateFC (Byte 0: 1000 = P, 1001 = NP, or 1010 = Cpl, then 0 and v[2:0] (VC ID); HdrFC and DataFC fields with Reserved (R) bits; 16b CRC starting at Byte 4)

Figure 3-9: PM Data Link Layer Packet Format (Byte 0: 0010 0xxx; remaining bytes Reserved; 16b CRC starting at Byte 4)


Figure 3-10: Vendor Specific Data Link Layer Packet Format (Byte 0: 0011 0000; remaining bytes defined by vendor; 16b CRC starting at Byte 4)

The following are the characteristics and rules associated with Data Link Layer Packets (DLLPs):
• DLLPs are differentiated from TLPs when they are presented to, or received from, the Physical Layer.
• DLLP data integrity is protected using a 16b CRC
• The CRC value is calculated using the following rules (see Figure 3-11):
  o The polynomial used for CRC calculation has coefficients expressed as 100Bh
  o The seed value (initial value for CRC storage registers) is FFFFh
  o CRC calculation starts with bit 0 of Byte 0 and proceeds from bit 0 to bit 7 of each Byte
  o Note that CRC calculation uses all bits of the DLLP, regardless of field type, including reserved fields. The result of the calculation is complemented, then placed into the 16b CRC field of the DLLP as shown in Table 3-2.


Table 3-2: Mapping of Bits into CRC Field

CRC Result Bit | Corresponding Bit Position in the 16b CRC Field
0 | 7
1 | 6
2 | 5
3 | 4
4 | 3
5 | 2
6 | 1
7 | 0
8 | 15
9 | 14
10 | 13
11 | 12
12 | 11
13 | 10
14 | 9
15 | 8

Figure 3-11: Diagram of CRC Calculation for DLLPs (DLLP bytes enter Byte 0 first, bit 0 first; polynomial 100Bh; 16b storage registers seeded with FFFFh)
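The 16b DLLP CRC follows the same bit-serial pattern as the 32b CRCs, using a 16-bit register. This is a Python sketch under stated assumptions. It assumes the CRC covers DLLP Bytes 0 through 3, the bytes that precede the CRC field at Byte 4 in Figure 3-4. The function names and the sample Ack contents are hypothetical.

```python
POLY16 = 0x100B   # DLLP CRC polynomial coefficients (100Bh)
SEED16 = 0xFFFF   # initial value of the CRC storage registers

def dllp_crc_result(body: bytes) -> int:
    """Complemented CRC over the DLLP bytes, bit 0 of Byte 0 first."""
    reg = SEED16
    for byte in body:
        for bit in range(8):                        # bit 0 through bit 7 of each Byte
            feedback = ((reg >> 15) & 1) ^ ((byte >> bit) & 1)
            reg = (reg << 1) & 0xFFFF
            if feedback:
                reg ^= POLY16
    return reg ^ 0xFFFF

def map_to_crc_field(result: int) -> int:
    """Table 3-2: result bit i goes to CRC field bit 8*(i//8) + 7 - i%8."""
    field = 0
    for i in range(16):
        if (result >> i) & 1:
            field |= 1 << (8 * (i // 8) + 7 - (i % 8))
    return field

def dllp_crc_field(body: bytes) -> int:
    """16b CRC field value for DLLP Bytes 0 to 3 (assumed coverage)."""
    return map_to_crc_field(dllp_crc_result(body))

ack = bytes([0x00, 0x00, 0x00, 0x05])   # hypothetical Ack DLLP bytes
print(hex(dllp_crc_field(ack)))
```

Because reserved bits are included in the calculation, a receiver must compute the CRC over the DLLP bytes exactly as received.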


3.5. Data Integrity

3.5.1. Introduction

The Transaction Layer provides TLP boundary information to the Data Link Layer. This allows the Data Link Layer to apply a Sequence Number and Link CRC (LCRC) error detection to the TLP. The Receive Data Link Layer validates received TLPs by checking the Sequence Number, LCRC code and any error indications from the Receive Physical Layer. In case of error in a TLP, Data Link Layer Retry is used for recovery.

The format of a TLP with the Sequence Number and LCRC code applied is shown in Figure 3-12.

Figure 3-12: TLP with LCRC and Sequence Number Applied (Reserved bits and the TLP Sequence Number precede the TLP Header; the 32b LCRC occupies the last four bytes, +(N-3) through +N)

3.5.2. LCRC, Sequence Number, and Retry Management (TLP Transmitter)

The TLP transmission path through the Data Link Layer (paths labeled 1 and 3 in Figure 3-1) prepares each TLP for transmission by applying a sequence number, then calculating and appending a Link CRC (LCRC) which is used to ensure the integrity of TLPs during transmission across a Link from one component to another. TLPs are stored in a retry buffer, and are re-sent unless a positive acknowledgement of receipt is received from the other component. If repeated attempts to transmit a TLP are unsuccessful, the transmitter will determine that the Link is not operating correctly, and instruct the Physical Layer to retrain the Link. If Link retraining fails, the Physical Layer will indicate that the Link is no longer up, causing the Data Link Control and Management State Machine (DLCMSM) to move to the DL_Inactive state.

The mechanisms used to determine the TLP LCRC and the Sequence Number and to support Data Link Layer Retry are described in terms of conceptual “counters” and “flags”. This description neither implies nor requires a particular implementation and is used only to clarify the requirements.


3.5.2.1. LCRC and Sequence Number Rules (TLP Transmitter)

The following counters and timer are used to explain the remaining rules in this section:

• The following 12 bit counters are used:
  o NEXT_TRANSMIT_SEQ – Stores the packet sequence number applied to TLPs
     • Set to all ‘0’s in DL_Inactive state
  o ACKD_SEQ – Stores the sequence number acknowledged in the most recently received Ack or Nak DLLP
     • Set to all ‘1’s in DL_Inactive state
• The following 2 bit counter is used:
  o REPLAY_NUM – Counts the number of times the Retry Buffer has been re-transmitted
     • Set to all ‘0’s in DL_Inactive state
• The following timer is used:
  o REPLAY_TIMER – Counts time since the last Ack or Nak DLLP was received
     • Started at the start of any TLP transmission or retransmission, if not already running
     • Restarts for each Ack/Nak DLLP received while there are unacknowledged TLPs outstanding, if, and only if, the received Ack or Nak DLLP acknowledges some TLP in the retry buffer
        • Note: This ensures that REPLAY_TIMER is reset only when forward progress is being made
     • Resets and holds when there are no outstanding unacknowledged TLPs

The following rules describe how a TLP is prepared for transmission before being passed to the Physical Layer:

• The Transaction Layer indicates the start and end of the TLP to the Data Link Layer while transferring the TLP
  o The Data Link Layer treats the TLP as a “black box” and does not process or modify the contents of the TLP


• Each TLP is assigned a 12 bit sequence number when it is accepted from the Transmit side of the Transaction Layer
  o Upon acceptance of the TLP from the Transaction Layer, the packet sequence number is applied to the TLP by:
     • prepending the 12 bit value in NEXT_TRANSMIT_SEQ to the TLP
     • prepending four Reserved bits to the TLP, preceding the sequence number (see Figure 3-12)
  o If the equation:
        (NEXT_TRANSMIT_SEQ – ACKD_SEQ) mod 4096 >= 2048
    is true, the Transmitter must cease accepting TLPs from the Transaction Layer until the equation is no longer true
  o Following the application of NEXT_TRANSMIT_SEQ to a TLP accepted from the Transmit side of the Transaction Layer, NEXT_TRANSMIT_SEQ is incremented:
        NEXT_TRANSMIT_SEQ := (NEXT_TRANSMIT_SEQ + 1) mod 4096

Figure 3-13: TLP Following Application of Sequence Number and Reserved Bits

• TLP data integrity is protected during transfer between Data Link Layers using a 32b LCRC
• The LCRC value is calculated using the following algorithm (see Figure 3-14):
  o The polynomial used has coefficients expressed as 04C1 1DB7h
  o The seed value (initial value for LCRC storage registers) is FFFF FFFFh
  o The LCRC is calculated using the TLP following sequence number application (see Figure 3-13)
  o LCRC calculation starts with bit 0 of Byte 0 (bit 8 of the TLP sequence number) and proceeds from bit 0 to bit 7 of each successive Byte
     • Note that LCRC calculation uses all bits of the TLP, regardless of field type, including reserved fields
  o The result of the LCRC calculation is complemented, and the complemented result bits are mapped into the 32b LCRC field as shown in Table 3-3.


Table 3-3: Mapping of Bits into LCRC Field

LCRC Result Bit → Corresponding Bit Position in the 32b LCRC Field
 0 → 7       8 → 15      16 → 23      24 → 31
 1 → 6       9 → 14      17 → 22      25 → 30
 2 → 5      10 → 13      18 → 21      26 → 29
 3 → 4      11 → 12      19 → 20      27 → 28
 4 → 3      12 → 11      20 → 19      28 → 27
 5 → 2      13 → 10      21 → 18      29 → 26
 6 → 1      14 → 9       22 → 17      30 → 25
 7 → 0      15 → 8       23 → 16      31 → 24

  o The 32b LCRC field is appended to the TLP following the bytes received from the Transaction Layer (see Figure 3-12)
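The sequence-number rules above can be modelled with the two 12-bit counters. The sketch below is illustrative only: `TxSeqState`, `can_accept` and `apply_sequence_number` are hypothetical names, and the two prepended bytes follow Figure 3-13 (four Reserved bits, assumed here to be sent as 0, followed by the 12-bit sequence number, so that bit 0 of Byte 0 is sequence number bit 8).

```python
from dataclasses import dataclass


@dataclass
class TxSeqState:
    next_transmit_seq: int = 0     # all '0's in DL_Inactive
    ackd_seq: int = 0xFFF          # all '1's in DL_Inactive

    def can_accept(self) -> bool:
        """False while (NEXT_TRANSMIT_SEQ - ACKD_SEQ) mod 4096 >= 2048."""
        return (self.next_transmit_seq - self.ackd_seq) % 4096 < 2048

    def apply_sequence_number(self, tlp: bytes) -> bytes:
        """Prepend the Reserved bits and sequence number, then bump the counter."""
        if not self.can_accept():
            raise RuntimeError("Transmitter must stop accepting TLPs")
        seq = self.next_transmit_seq
        prefix = bytes([(seq >> 8) & 0x0F, seq & 0xFF])  # Reserved bits = 0 (assumed)
        self.next_transmit_seq = (seq + 1) % 4096
        return prefix + tlp
```

The `% 4096` arithmetic keeps the comparison valid across the 4095 → 0 wrap. With the DL_Inactive reset values the difference starts at 1, so in this model the Transmitter stalls once 2047 TLPs are outstanding.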


Figure 3-14: Calculation of LCRC
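The LCRC rules translate directly into a bit-serial sketch. It feeds each byte of the sequence-numbered TLP, bit 0 first, into a 32-bit register seeded with FFFF FFFFh and using polynomial 04C1 1DB7h. It then complements the result, applies the Table 3-3 permutation, and emits the field with bits 31:24 first. That byte order is an assumption read from Figure 3-12, where LCRC bit 31 sits in Byte +(N-3). The `nullified` flag covers the nullification rule that follows, where the remainder is used without inversion. Function names are illustrative.

```python
LCRC_POLY = 0x04C11DB7
LCRC_SEED = 0xFFFFFFFF


def lcrc_raw(seq_tlp: bytes) -> int:
    """Register contents after feeding every bit of the TLP, bit 0 of each byte first."""
    reg = LCRC_SEED
    for byte in seq_tlp:
        for i in range(8):
            feedback = ((reg >> 31) & 1) ^ ((byte >> i) & 1)
            reg = (reg << 1) & 0xFFFFFFFF
            if feedback:
                reg ^= LCRC_POLY
    return reg


def map_to_lcrc_field(result: int) -> int:
    """Table 3-3: result bit i lands in field bit 8*(i//8) + 7 - i%8."""
    field = 0
    for i in range(32):
        if (result >> i) & 1:
            field |= 1 << (8 * (i // 8) + 7 - i % 8)
    return field


def lcrc_bytes(seq_tlp: bytes, nullified: bool = False) -> bytes:
    """LCRC field for a TLP that already carries its Reserved bits and sequence number."""
    remainder = lcrc_raw(seq_tlp)
    result = remainder if nullified else ~remainder & 0xFFFFFFFF
    return map_to_lcrc_field(result).to_bytes(4, "big")  # field bits 31:24 first
```

For the common check string `b"123456789"`, this sketch yields the field bytes 26 39 F4 CB. That is the familiar CRC-32 check value CBF4 3926h in little-endian order, which suggests a table-driven reflected CRC-32 could produce the same bytes. Treat that equivalence as an observation about this sketch, not a statement from the specification.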


To support cut-through routing of TLPs, a Transmitter is permitted to modify a transmitted TLP to indicate that the Receiver must ignore that TLP (“nullify” the TLP).

• A Transmitter is permitted to nullify a TLP being transmitted; to do this in a way that robustly prevents misinterpretation or corruption, the Transmitter must do both of the following:
  o use the remainder of the calculated LCRC value without inversion
  o indicate to the Transmit Physical Layer that the final framing Symbol must be EDB instead of END
• When this is done, the Transmitter does not increment NEXT_TRANSMIT_SEQ

The following rules describe the operation of the Data Link Layer Retry Buffer, from which TLPs are re-transmitted when necessary:

• Copies of transmitted TLPs must be stored in the Data Link Layer Retry Buffer
• If the Transmit Retry Buffer contains TLPs for which no Ack or Nak DLLP has been received, and (as indicated by REPLAY_TIMER) no Ack or Nak DLLP has been received for a period exceeding the time indicated in Table 3-4, the Transmitter:
  o blocks acceptance of new TLPs from the Transmit Transaction Layer
  o completes transmission of the TLP currently being transmitted, if any
  o starts re-transmitting TLPs from the Retry Buffer, starting with the oldest TLP in the buffer and continuing in original transmission order
  o stops re-transmission from the Retry Buffer and increments REPLAY_NUM, once all entries in the Retry Buffer have been re-transmitted
  o re-enables acceptance of new TLPs from the Transmit Transaction Layer
  This is a reported error associated with the Port (see Section 7.2).
• If REPLAY_NUM rolls over from “11” to “00” (indicating the Retry Buffer has been re-transmitted four times without receiving an Ack or Nak), the Transmitter signals the Physical Layer to retrain the Link. This is a reported error associated with the Port (see Section 7.2).
  o Note that Data Link Layer state, including the contents of the Retry Buffer, is not reset by this action unless the Physical Layer reports Physical LinkUp = 0 (causing the Data Link Control and Management State Machine to transition to the DL_Inactive state)
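The interaction between REPLAY_TIMER expiry and the 2-bit REPLAY_NUM counter can be sketched as a list of actions. `ReplayState` and `on_replay_timer_expired` are hypothetical names, the timer itself is not modelled, and the relative order of "retrain Link" and "unblock new TLPs" is a choice made for this sketch rather than a rule from the text.

```python
from dataclasses import dataclass, field


@dataclass
class ReplayState:
    retry_buffer: list = field(default_factory=list)  # oldest TLP first
    replay_num: int = 0                                # 2-bit counter, '00' in DL_Inactive

    def on_replay_timer_expired(self) -> list:
        """Actions taken, in order, when REPLAY_TIMER passes its Table 3-4 limit."""
        actions = ["block new TLPs", "finish current TLP"]
        actions += [f"resend {tlp}" for tlp in self.retry_buffer]  # original order
        self.replay_num = (self.replay_num + 1) % 4
        if self.replay_num == 0:  # rolled over from '11' to '00'
            actions.append("retrain Link")
        actions.append("unblock new TLPs")
        return actions
```

With no Ack or Nak arriving in between, the fourth expiry is the one that requests retraining.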


Table 3-4 defines the threshold values for the REPLAY_TIMER timer. The values are specified according to the largest TLP payload size and Link width.

The values are measured at the Port of the TLP Transmitter, from the last Symbol of a TLP to the first Symbol of the TLP retransmission. The values are calculated using the formula (note – this is simply three times the Ack Latency value – see Section 3.5.3.1):

$$\left(\frac{\text{Max\_Payload\_Size} + \text{TLPOverhead}}{\text{LinkWidth}} \times \text{AckFactor} + \text{InternalDelay}\right) \times 3$$

where

Max_Payload_Size is the value in the Max_Payload_Size field of the Link Command Register
TLPOverhead represents the additional TLP components which consume Link bandwidth (Header, LCRC, framing Symbols) and is treated here as a constant value of 24 Symbols
AckFactor is used to balance Link bandwidth efficiency and retry buffer size – the value varies according to Max_Payload_Size and Link width, and is included in Table 3-5
LinkWidth is the operating width of the Link
InternalDelay represents the internal processing delays for received TLPs and transmitted DLLPs, and is treated here as a constant value of 11 Symbol Times

Table 3-4: REPLAY_TIMER Limits by Link Width and Max_Payload_Size (Symbol Times) Tolerance: -0% / +100%

                    Link Operating Width
Max_Payload_Size    x1      x2      x4      x8      x12     x16     x32
128B                669     351     192     174     147     117     75
256B                1209    621     327     294     243     189     111
512B                1641    837     435     234     300     234     132
1024B               3177    1605    819     426     555     426     228
2048B               6249    3141    1587    810     1068    810     420
4096B               12393   6213    3123    1578    2091    1578    804
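The Table 3-4 entries can be regenerated from the formula. In the sketch below, the AckFactor values are copied from Table 3-5 and held as tenths so that the arithmetic stays exact. One assumption is not stated in the text: the Ack Latency term is truncated to a whole Symbol Time before being tripled. That reproduces every entry of Table 3-4. For example, 128B on x1 gives 152 × 1.4 + 11 = 223.8, truncated to 223, and 3 × 223 = 669.

```python
TLP_OVERHEAD = 24                  # Symbols
INTERNAL_DELAY = 11                # Symbol Times
WIDTHS = (1, 2, 4, 8, 12, 16, 32)

# AckFactor x 10 from Table 3-5, in WIDTHS order.
ACK_FACTOR_X10 = {
    128: (14, 14, 14, 25, 30, 30, 30),
    256: (14, 14, 14, 25, 30, 30, 30),
    512: (10, 10, 10, 10, 20, 20, 20),
    1024: (10, 10, 10, 10, 20, 20, 20),
    2048: (10, 10, 10, 10, 20, 20, 20),
    4096: (10, 10, 10, 10, 20, 20, 20),
}


def ack_latency(max_payload: int, width: int) -> int:
    """floor((Max_Payload_Size + TLPOverhead) / LinkWidth * AckFactor) + InternalDelay."""
    af10 = ACK_FACTOR_X10[max_payload][WIDTHS.index(width)]
    return (max_payload + TLP_OVERHEAD) * af10 // (width * 10) + INTERNAL_DELAY


def replay_timer_limit(max_payload: int, width: int) -> int:
    """Three times the (truncated) Ack Latency, in Symbol Times."""
    return 3 * ack_latency(max_payload, width)
```

Given the -0% / +100% tolerance, these computed values would appear to be lower bounds, with expiry permitted anywhere up to twice the listed value.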


Implementation Note: Recommended Priority of Scheduled Transmissions

When multiple DLLPs of the same type are scheduled for transmission but have not yet been transmitted, it is possible in many cases to “collapse” them into a single DLLP. For example, if a scheduled Ack DLLP transmission is stalled waiting for another transmission to complete, and during this time another Ack is scheduled for transmission, it is only necessary to transmit the second Ack, since the information it provides will supersede the information in the first Ack.

In addition to any TLP from the Transaction Layer (or the Retry Buffer, if a retry is in progress), multiple DLLPs of different types may be scheduled for transmission at the same time, and must be prioritized for transmission. The following list shows the preferred priority order for selecting information for transmission. Note that the priority of the vendor specific DLLP is not listed, as this is completely implementation specific, and there is no recommended priority. Note that this priority order is a guideline; in all cases, a fairness mechanism is highly recommended to ensure that no type of traffic is blocked for an extended or indefinite period of time by any other type of traffic. Note that the Ack Latency value and REPLAY_TIMER limit specify requirements measured at the Port of the component, and the internal arbitration policy of the component must ensure that these externally measured requirements are met.

1) completion of any transmission (TLP or DLLP) currently in progress (highest priority)
2) Nak DLLP transmissions
3) Ack DLLP transmissions scheduled for transmission as soon as possible due to receipt of a duplicate TLP –OR– expiration of the Ack latency timer (see Section 3.5.3.1)
4) FC DLLP transmissions required to satisfy Section 2.9
5) Retry Buffer re-transmissions
6) TLPs from the Transaction Layer
7) FC DLLP transmissions other than those required to satisfy Section 2.9
8) All other DLLP transmissions (lowest priority)
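One way to express the recommended order is a fixed-priority pick over the pending work. The category names are hypothetical labels for items 2 to 8 of the list. Item 1, finishing the transmission already in progress, is assumed to be handled by the caller. A real design would add the recommended fairness mechanism on top of this strict ordering.

```python
# Hypothetical labels for items 2-8 of the recommended order, highest first.
PRIORITY = [
    "nak",           # 2) Nak DLLPs
    "urgent_ack",    # 3) Ack after a duplicate TLP or Ack latency timer expiry
    "required_fc",   # 4) FC DLLPs needed to satisfy Section 2.9
    "replay",        # 5) Retry Buffer re-transmissions
    "new_tlp",       # 6) TLPs from the Transaction Layer
    "other_fc",      # 7) other FC DLLPs
    "other_dllp",    # 8) all other DLLPs
]


def pick_next(pending: set):
    """Return the highest-priority pending category, or None if nothing is pending."""
    for kind in PRIORITY:
        if kind in pending:
            return kind
    return None
```

A strict priority pick like this can starve low-priority items, such as "other_dllp", under sustained traffic. That is exactly the situation the fairness recommendation is meant to prevent.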


Since Ack/Nak and Flow Control DLLPs affect TLPs flowing in the opposite direction across the Link, the TLP transmission mechanisms in the Data Link Layer are also responsible for Ack/Nak and Flow Control DLLPs received from the other component on the Link. These DLLPs are processed according to the following rules (see Figure 3-15):

• If the Physical Layer indicates a Receiver Error, discard any DLLP currently being received and free any storage allocated for the DLLP. Note that reporting such errors to software is done by the Physical Layer (and so such errors are not reported by the Data Link Layer).
• For all received DLLPs, the CRC value is checked by:
  o applying the same algorithm used for calculation (above) to the received DLLP, not including the 16b CRC field of the received DLLP
  o comparing the calculated result with the value in the CRC field of the received DLLP
     • if not equal, the DLLP is corrupt
  o A corrupt received DLLP is discarded, and is a reported error associated with the Port (see Section 7.2).
• A received DLLP which is not corrupt, but which uses unsupported DLLP Type encodings, is discarded without further action. This is not considered an error.
• Non-zero values in Reserved fields are ignored.
• Receivers must process all DLLPs received at the rate they are received


Figure 3-15: Received DLLP Error Check Flowchart

• Received FC DLLPs are passed to the Transaction Layer
• Received PM DLLPs are passed to the component’s power management control logic
• For Ack and Nak DLLPs, the following steps are followed (see Figure 3-16):
  o If the AckNak_Seq_Num does not specify the Sequence Number of an unacknowledged TLP, or of the most recently acknowledged TLP, the DLLP is discarded
     • If the DLLP is an Ack DLLP, this is a DL Layer Protocol Error, which is a reported error associated with the Port (see Section 7.2).


  o If the AckNak_Seq_Num does not specify the Sequence Number of the most recently acknowledged TLP, then the DLLP acknowledges some TLPs in the retry buffer:
     • Purge from the retry buffer all TLPs from the oldest to the one corresponding to the AckNak_Seq_Num
     • Load ACKD_SEQ with the value in the AckNak_Seq_Num field
     • Reset REPLAY_NUM and REPLAY_TIMER
  o If the DLLP is a Nak, initiate a replay (see below)
  o If REPLAY_TIMER expires due to a failure to make progress on unacknowledged TLPs, initiate a replay. This is a reported error associated with the Port (see Section 7.2).
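The Ack/Nak steps above can be sketched as follows, under stated assumptions. `TxRetryState` and `on_ack_nak` are hypothetical names. The retry buffer holds (sequence number, TLP) pairs, oldest first. The test for "an unacknowledged TLP, or the most recently acknowledged TLP" is written with modular distances from ACKD_SEQ, a formulation chosen for this sketch rather than taken from Figure 3-16.

```python
from collections import deque


class TxRetryState:
    """Transmitter-side view of ACKD_SEQ, the retry buffer and REPLAY_NUM."""

    def __init__(self, next_transmit_seq: int, ackd_seq: int, retry_buffer):
        self.next_transmit_seq = next_transmit_seq
        self.ackd_seq = ackd_seq
        self.retry_buffer = deque(retry_buffer)  # (sequence number, TLP), oldest first
        self.replay_num = 0
        self.replay_requested = False

    def on_ack_nak(self, is_nak: bool, acknak_seq_num: int) -> str:
        # Distance past the most recently acknowledged TLP, and the number of
        # unacknowledged TLPs, both modulo 4096.
        distance = (acknak_seq_num - self.ackd_seq) % 4096
        outstanding = (self.next_transmit_seq - 1 - self.ackd_seq) % 4096
        if distance > outstanding:
            # Neither an unacknowledged TLP nor the most recently acknowledged one.
            return "discard" if is_nak else "discard (DL Layer Protocol Error)"
        if distance > 0:
            # Purge from the oldest entry up to and including AckNak_Seq_Num.
            while self.retry_buffer and (self.retry_buffer[0][0] - self.ackd_seq) % 4096 <= distance:
                self.retry_buffer.popleft()
            self.ackd_seq = acknak_seq_num
            self.replay_num = 0  # REPLAY_TIMER would be reset here as well
        if is_nak:
            self.replay_requested = True  # the replay itself is not modelled
        return "processed"
```

A distance of 0 corresponds to the most recently acknowledged TLP. That case is accepted but purges nothing, which matches the rule that only a later sequence number "acknowledges some TLPs".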


The following rules describe the operation of the Data Link Layer Retry Buffer, from which TLPs are re-transmitted when necessary:

• Copies of transmitted TLPs must be stored in the Data Link Layer Retry Buffer

When a replay is initiated, either due to reception of a Nak or due to REPLAY_TIMER expiration, the following rules must be followed:

• If all TLPs transmitted have been acknowledged, terminate replay; otherwise continue
• Increment REPLAY_NUM
• Complete transmission of any TLP currently being transmitted
• Retransmit unacknowledged TLPs, starting with the oldest unacknowledged TLP and continuing in original transmission order
  o Once all unacknowledged TLPs have been re-transmitted, return to normal operation
  o If any Ack or Nak DLLPs are received during a replay, the transmitter is permitted to complete the replay without regard to the Ack or Nak DLLP(s), or to skip retransmission of any newly acknowledged TLPs
     • Once the transmitter has started to resend a TLP, it must complete transmission of that TLP in all cases
  o Ack and Nak DLLPs received during a replay must be processed, and may be collapsed
     • Example: If multiple Acks are received, only the one specifying the latest Sequence Number value must be considered – Acks specifying earlier Sequence Number values are effectively “collapsed” into this one
     • Example: During a replay, a Nak is received, followed by an Ack specifying a later Sequence Number – the Ack supersedes the Nak, and the Nak is ignored
  o Note: Since all entries in the Retry Buffer have already been allocated space in the Receiver by the Transmitter’s Flow Control gating logic, no further flow control synchronization is necessary.
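The collapsing examples above amount to keeping the Ack or Nak whose sequence number lies furthest ahead of ACKD_SEQ, modulo 4096. The sketch below makes that concrete. `collapse_ack_nak` is a hypothetical helper, all entries are assumed to have already passed the validity check, and letting a later-received DLLP win a tie is an assumption this section does not address.

```python
def collapse_ack_nak(ackd_seq: int, dllps: list):
    """Reduce Ack/Nak DLLPs received during a replay to the one that matters.

    Each entry is ("ack" | "nak", AckNak_Seq_Num), in order of arrival. The
    entry furthest ahead of ACKD_SEQ (mod 4096) wins; ties go to the
    later-received entry (an assumption of this sketch).
    """
    best = None
    best_distance = -1
    for kind, seq in dllps:
        distance = (seq - ackd_seq) % 4096
        if distance >= best_distance:
            best, best_distance = (kind, seq), distance
    return best
```

The modular distance matters near the wrap point. With ACKD_SEQ = 4094, an Ack for sequence number 1 is newer than one for 4095.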


3.5.3. LCRC and Sequence Number (TLP Receiver)

The TLP receive path through the Data Link Layer (paths labeled 2 and 4 in Figure 3-1) processes TLPs received by the Physical Layer by checking the LCRC and sequence number, passing the TLP to the receive Transaction Layer if OK, and requesting a retry if corrupted.

The mechanisms used to check the TLP LCRC and the Sequence Number and to support Data Link Layer Retry are described in terms of conceptual “counters” and “flags”. This description does not imply or require a particular implementation and is used only to clarify the requirements.

3.5.3.1. LCRC and Sequence Number Rules (TLP Receiver)

The following counter, flag, and timer are used to explain the remaining rules in this section:

• The following 12 bit counter is used:
  o NEXT_RCV_SEQ – Stores the expected Sequence Number for the next TLP
     • Set to all ‘0’s in DL_Inactive state
• The following flag is used:
  o NAK_SCHEDULED
     • Cleared when in DL_Inactive state
• The following timer is used:
  o AckNak_LATENCY_TIMER – Counts time since an Ack or Nak DLLP was scheduled for transmission
     • Set to 0 in DL_Inactive state
     • Restart from 0 each time an Ack or Nak DLLP is scheduled for transmission; reset to 0 when all TLPs received have been acknowledged with an Ack DLLP
     • If there are initially no unacknowledged TLPs and a TLP is then received, the AckNak_LATENCY_TIMER starts counting only when the TLP has been forwarded to the Receive Transaction Layer


The following rules are applied in sequence to describe how received TLPs are processed, and what events trigger the transmission of Ack and Nak DLLPs (see Figure 3-17):

• If the Physical Layer indicates a Receiver Error, discard any TLP currently being received and free any storage allocated for the TLP. Note that reporting such errors to software is done by the Physical Layer (and so such errors are not reported by the Data Link Layer).
  o If a TLP was being received at the time the receive error was indicated and the NAK_SCHEDULED flag is clear,
     • schedule a Nak DLLP for transmission
     • set the NAK_SCHEDULED flag
• If the Physical Layer reports that the received TLP end framing Symbol was EDB, and the LCRC is the logical NOT of the calculated value, discard the TLP and free any storage allocated for the TLP. This is not considered an error.
• The LCRC value is checked by:
  o applying the same algorithm used for calculation (above) to the received TLP, not including the 32b LCRC field of the received TLP
  o comparing the calculated result with the value in the LCRC field of the received TLP
     • if not equal, the TLP is corrupt – discard the TLP and free any storage allocated for the TLP
        • If the NAK_SCHEDULED flag is clear,
           o schedule a Nak DLLP for transmission
           o set the NAK_SCHEDULED flag
       This is a reported error associated with the Port (see Section 7.2).
• If the TLP Sequence Number is not equal to the expected value, stored in NEXT_RCV_SEQ:
  o discard the TLP and free any storage allocated for the TLP
  o If the TLP Sequence Number satisfies the following equation:
        (NEXT_RCV_SEQ - TLP Sequence Number) mod 4096


  o Otherwise, the TLP is out of sequence (indicating one or more lost TLPs):
     • if the NAK_SCHEDULED flag is clear,
        • schedule a Nak DLLP for transmission
        • set the NAK_SCHEDULED flag
     • report TLP missing
    This is a reported error associated with the Port (see Section 7.2).
• If the TLP Sequence Number is equal to the expected value stored in NEXT_RCV_SEQ:
  o The Reserved bits, Sequence Number, and LCRC are removed and the remainder of the TLP is forwarded to the Receive Transaction Layer
     • The Data Link Layer indicates the start and end of the TLP to the Transaction Layer while transferring the TLP
        • The Data Link Layer treats the TLP as a “black box” and does not process or modify the contents of the TLP
     • Note that the Receiver Flow Control mechanisms do not account for any received TLPs until the TLP(s) are forwarded to the Receive Transaction Layer
  o NEXT_RCV_SEQ is incremented
  o If set, the NAK_SCHEDULED flag is cleared




• In addition to the other requirements for sending Ack DLLPs, an Ack or Nak DLLP must be transmitted when all of the following conditions are true:
  o The Data Link Control and Management State Machine is in the DL_Active state
  o TLPs have been accepted, but not yet acknowledged by sending an Acknowledgement DLLP
  o The AckNak_LATENCY_TIMER reaches or exceeds the value specified in Table 3-5
• Data Link Layer Acknowledgement DLLPs may be transmitted more frequently than required
• Data Link Layer Ack and Nak DLLPs specify the value (NEXT_RCV_SEQ - 1) in the AckNak_Seq_Num field

Table 3-5 defines the threshold values for the AckNak_LATENCY_TIMER timer, which for any specific case is called the Ack Latency. The values are specified according to the largest TLP payload size and Link width. The values are measured at the Port of the TLP Receiver, from the time the last Symbol of a TLP is received to the first Symbol of the Ack/Nak DLLP being transmitted. The values are calculated using the formula:

$$\frac{\text{Max\_Payload\_Size} + \text{TLPOverhead}}{\text{LinkWidth}} \times \text{AckFactor} + \text{InternalDelay}$$

where

Max_Payload_Size is the value in the Max_Payload_Size field of the Link Command Register
TLPOverhead represents the additional TLP components which consume Link bandwidth (Header, LCRC, framing Symbols) and is treated here as a constant value of 24 Symbols
AckFactor is used to balance Link bandwidth efficiency and retry buffer size – the value varies according to Max_Payload_Size and Link width, and is defined in Table 3-5
LinkWidth is the operating width of the Link
InternalDelay represents the internal processing delays for received TLPs and transmitted DLLPs, and is treated here as a constant value of 11 Symbol Times
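A condensed sketch of the receive-side rules of Section 3.5.3.1 follows, written under several assumptions. `RxState`, `receive_tlp` and `lcrc_field` are hypothetical names. The LCRC is computed with Python's `zlib.crc32`, assuming its little-endian bytes match the field produced by the bit-serial procedure of Section 3.5.2.1; the sketch relies on that equivalence, and the specification does not state it. A sequence-number mismatch is only reported, because distinguishing a duplicate from a lost TLP depends on the sequence-number equation in the rules, which is not modelled here.

```python
import zlib
from dataclasses import dataclass, field


def lcrc_field(seq_tlp: bytes) -> bytes:
    """LCRC field bytes in transmission order (assumed equal to CRC-32, little-endian)."""
    return zlib.crc32(seq_tlp).to_bytes(4, "little")


@dataclass
class RxState:
    next_rcv_seq: int = 0          # all '0's in DL_Inactive
    nak_scheduled: bool = False    # cleared in DL_Inactive
    delivered: list = field(default_factory=list)

    def _schedule_nak(self) -> None:
        if not self.nak_scheduled:  # only one Nak scheduled at a time
            self.nak_scheduled = True

    def acknak_seq_num(self) -> int:
        """Value carried in Ack and Nak DLLPs: NEXT_RCV_SEQ - 1 (mod 4096)."""
        return (self.next_rcv_seq - 1) % 4096

    def receive_tlp(self, framed: bytes, ended_with_edb: bool = False,
                    receiver_error: bool = False) -> str:
        """`framed`: Reserved/sequence bytes + TLP + 4 LCRC bytes, framing Symbols removed."""
        if receiver_error:
            self._schedule_nak()
            return "receiver error"
        body, received = framed[:-4], framed[-4:]
        calculated = lcrc_field(body)
        if ended_with_edb and received == bytes(b ^ 0xFF for b in calculated):
            return "nullified"  # discarded silently; not an error
        if received != calculated:  # EDB without an inverted LCRC also lands here
            self._schedule_nak()
            return "bad LCRC"
        seq = ((body[0] & 0x0F) << 8) | body[1]
        if seq != self.next_rcv_seq:
            return "sequence mismatch"  # duplicate vs. lost TLP not modelled
        self.delivered.append(body[2:])  # strip Reserved bits and sequence number
        self.next_rcv_seq = (self.next_rcv_seq + 1) % 4096
        self.nak_scheduled = False
        return "forward"
```

Note how a nullified TLP neither advances NEXT_RCV_SEQ nor schedules a Nak. This is consistent with the Transmitter not incrementing NEXT_TRANSMIT_SEQ when it nullifies.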


Table 3-5: Ack Transmission Latency Limit and AckFactor by Link Width and Max Payload (Symbol Times)

Each entry gives the Ack Latency limit, with the AckFactor (AF) in parentheses.

                    Link Operating Width
Max_Payload_Size    x1           x2           x4           x8          x12          x16         x32
128B                223 (1.4)    117 (1.4)    64 (1.4)     58 (2.5)    49 (3.0)     39 (3.0)    25 (3.0)
256B                403 (1.4)    207 (1.4)    109 (1.4)    98 (2.5)    81 (3.0)     63 (3.0)    37 (3.0)
512B                547 (1.0)    279 (1.0)    145 (1.0)    78 (1.0)    100 (2.0)    78 (2.0)    44 (2.0)
1024B               1059 (1.0)   535 (1.0)    273 (1.0)    142 (1.0)   185 (2.0)    142 (2.0)   76 (2.0)
2048B               2083 (1.0)   1047 (1.0)   529 (1.0)    270 (1.0)   356 (2.0)    270 (2.0)   140 (2.0)
4096B               4131 (1.0)   2071 (1.0)   1041 (1.0)   526 (1.0)   697 (2.0)    526 (2.0)   268 (2.0)

Implementation Note: Retry Buffer Sizing

The Retry Buffer should be large enough to ensure that, under normal operating conditions, transmission is never throttled because the retry buffer is full. In determining the optimal buffer size, one must consider the Ack Latency value (Table 3-5), any differences between the actual implementation and the internal processing delay used to generate these values, and the delays caused by the physical Link interconnect.

Note that the Ack Latency values specified ensure that the range of permitted outstanding Sequence Numbers will never be the limiting factor causing transmission stalls.




4. Physical Layer Specification

4.1. Introduction

The Physical Layer isolates the Transaction and Data Link Layers from the signaling technology used for Link data interchange. The Physical Layer is divided into the Logical and Electrical functional sub-blocks (see Figure 4-1).

Figure 4-1: High Level Layering Diagram Highlighting Physical Layer

4.2. Logical Sub-block

The Logical sub-block has two main sections: a Transmit section that prepares outgoing information passed from the Data Link Layer for transmission by the Electrical sub-block, and a Receiver section that identifies and prepares received information before passing it to the Data Link Layer.

The Logical sub-block and Electrical sub-block coordinate the state of each transceiver through a status and control register interface or functional equivalent. The Logical sub-block directs control and management functions of the Physical Layer.

Receivers may optionally check for violations of the rules associated with Receiver functions such as Symbol decoding and the like. If such checking is implemented, violations cause the indication of a Receiver Error to the Data Link Layer. A Receiver Error is a reported error associated with the Port (see Section 7.2).

4.2.1. Symbol Encoding

PCI Express uses an 8b/10b transmission code. The definition of this transmission code is identical to that specified in ANSI X3.230-1994, clause 11 (and also IEEE 802.3z, 36.2.4). Using this scheme, each eight bit Character, together with one control bit, is treated as three bits and five bits, which are mapped onto a four-bit code group and a six-bit code group, respectively. The control bit, in conjunction with the data Character, is used to identify when to encode one of the 12 Special Symbols included in the 8b/10b transmission code. These code groups are concatenated to form a ten-bit Symbol. As shown in Figure 4-2, ABCDE maps to abcdei and FGH maps to fghj.

Figure 4-2: Character to Symbol Mapping (Transmit: 8 bits + Control, H,G,F,E,D,C,B,A,Z, pass through 8b/10b Encode to give 10 bits, j,h,g,f,i,e,d,c,b,a; Receive: the reverse, through 10b/8b Decode)


4.2.1.1. Serialization and De-serialization of Data

The bits of a Symbol are placed on a Lane starting with bit ‘a’ and ending with bit ‘j’. Examples are shown in Figure 4-3 and Figure 4-4.

Figure 4-3: Bit Transmission Order on Physical Lanes - x1 Example (the Symbols for Bytes 0 to 4 are sent one per Symbol Time, each in the order a b c d e i f g h j)

Figure 4-4: Bit Transmission Order on Physical Lanes - x4 Example (Bytes 0 to 3 on Lanes 0 to 3 in the first Symbol Time, Bytes 4 to 7 on Lanes 0 to 3 in the next)
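The bit and Lane ordering of Figures 4-3 and 4-4 can be sketched as below. This assumes each 10-bit Symbol is held as an integer with bit ‘a’ in bit 0 and ‘j’ in bit 9, following the labels of Figure 4-2, which put ‘i’ at bit 5. `stripe`, `serialize` and `lane_bitstreams` are illustrative names.

```python
def stripe(symbols: list, lanes: int) -> list:
    """Symbol k goes to Lane k % lanes; returns one list of Symbols per Lane."""
    return [symbols[lane::lanes] for lane in range(lanes)]


def serialize(symbol: int) -> list:
    """Bits of one 10-bit Symbol in transmission order: a (bit 0) first, j (bit 9) last."""
    return [(symbol >> i) & 1 for i in range(10)]


def lane_bitstreams(symbols: list, lanes: int) -> list:
    """Per-Lane serial bit streams for a sequence of encoded Symbols."""
    return [[bit for sym in lane_syms for bit in serialize(sym)]
            for lane_syms in stripe(symbols, lanes)]
```

On an x1 Link, `stripe` leaves the sequence unchanged, which matches the text's description of x1 as the degenerate case.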


4.2.1.2. Special Symbols for Framing and Link Management (K Codes)

The 8b/10b encoding scheme used by PCI Express provides Special Symbols that are distinct from the Data Symbols used to represent Characters. These Special Symbols are used for various Link Management mechanisms described later in this chapter. Special Symbols are also used to frame DLLPs and TLPs, using distinct Special Symbols to allow these two types of Packets to be quickly and easily distinguished.

Table 4-1 shows the Special Symbols used for PCI Express and provides a brief description for the use of each. The use of these Symbols will be discussed in greater detail in following sections.

Table 4-1: Special Symbols

Encoding   Symbol   Name                      Description
K28.5      COM      Comma                     Used for Lane and Link initialization and management
K27.7      STP      Start TLP                 Marks the start of a Transaction Layer Packet
K28.2      SDP      Start DLLP                Marks the start of a Data Link Layer Packet
K29.7      END      End                       Marks the end of a Transaction Layer Packet or a Data Link Layer Packet
K30.7      EDB      EnD Bad                   Marks the end of a nullified TLP
K23.7      PAD      Pad                       Used in Framing and Link Width and Lane ordering negotiations
K28.0      SKP      Skip                      Used for compensating for different bit rates for two communicating Ports
K28.1      FTS      Fast Training Sequence    Used within an ordered-set to exit from L0s to L0
K28.3      IDL      Idle                      Electrical Idle symbol used in the electrical idle ordered-set
K28.4               Reserved
K28.6               Reserved
K28.7               Reserved
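Table 4-1 names each Special Symbol in the usual 8b/10b Kx.y form. Here x is the decimal value of bits EDCBA (bits 4:0 of the Character) and y is the decimal value of bits HGF (bits 7:5). The sketch below converts those names into the 8-bit Character presented to the encoder with the control bit Z set. The dictionary and function names are illustrative.

```python
# Table 4-1 Special Symbols (reserved encodings omitted).
SPECIAL_SYMBOLS = {
    "COM": "K28.5", "STP": "K27.7", "SDP": "K28.2", "END": "K29.7", "EDB": "K30.7",
    "PAD": "K23.7", "SKP": "K28.0", "FTS": "K28.1", "IDL": "K28.3",
}


def k_code_character(name: str) -> int:
    """Turn 'Kx.y' into its 8-bit Character: x -> bits 4:0 (EDCBA), y -> bits 7:5 (HGF)."""
    if not name.startswith("K"):
        raise ValueError(f"not a K code: {name}")
    x_text, y_text = name[1:].split(".")
    x, y = int(x_text), int(y_text)
    if not (0 <= x <= 31 and 0 <= y <= 7):
        raise ValueError(f"out of range: {name}")
    return (y << 5) | x
```

For example, COM (K28.5) is the Character BCh and STP (K27.7) is FBh.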


4.2.2. Framing and Application of Symbols to Lanes

The Framing mechanism uses Special Symbol K28.2 “SDP” to start a DLLP and Special Symbol K27.7 “STP” to start a TLP. The Special Symbol K29.7 “END” is used to mark the end of either a TLP or a DLLP.

The conceptual stream of Symbols must be mapped from its internal representation, which is implementation dependent, onto the external Lanes. The Symbols are mapped onto the Lanes such that the first Symbol (representing Character 0) is placed onto Lane 0, the second is placed onto Lane 1, etc. The x1 Link represents a degenerate case, and the mapping is trivial, with all Symbols placed onto the single Lane in order.

When no packet information or special ordered-sets are being transmitted, the Transmitter is in the Logical Idle state. During this time idle data must be transmitted. The idle data must consist of the data byte 0 (00h), scrambled according to the rules of Section 4.2.3 and 8b/10b encoded according to the rules of Section 4.2.1, in the same way that TLP and DLLP data characters are scrambled and encoded. Likewise, when the Receiver is not receiving any packet information or special ordered-sets, the Receiver is in Logical Idle and shall receive idle data as described above. During transmission of the idle data, the skip ordered-set must continue to be transmitted as specified in Section 4.2.7.

4.2.2.1. Framing and Application of Symbols to Lanes – Rules

In this section, “placed” is defined to mean a requirement on the Transmitter to put the Symbol into the proper Lane of a Link.

• TLPs must be framed by placing an STP Symbol at the start of the TLP and an END Symbol or EDB Symbol at the end of the TLP (see Figure 4-5).
• DLLPs must be framed by placing an SDP Symbol at the start of the DLLP and an END Symbol at the end of the DLLP.
• Logical Idle is defined to be a period of one or more Symbol Times when no information (TLPs, DLLPs, or any type of Special Symbol) is being transmitted or received. Unlike Electrical Idle, during Logical Idle the Idle character (00h) is being transmitted and received.
  o When the Transmitter is in Logical Idle, the Idle data character (00h) shall be transmitted on all Lanes. This is scrambled according to the rules in Section 4.2.3.
  o Receivers must ignore incoming Logical Idle data, and must not have any dependency other than scramble sequencing on any specific data patterns.
• For Links wider than x1, the STP Symbol (representing the start of a TLP) must be placed in Lane 0 when starting transmission of a TLP from a Logical Idle Link condition.
• For Links wider than x1, the SDP Symbol (representing the start of a DLLP) must be placed in Lane 0 when starting transmission of a DLLP from a Logical Idle Link condition.


• The STP Symbol must not be placed on the Link more frequently than once per Symbol Time.
• The SDP Symbol must not be placed on the Link more frequently than once per Symbol Time.
• As long as the above rules are satisfied, TLP and DLLP transmissions are permitted to follow each other successively.
• One STP Symbol and one SDP Symbol may be placed on the Link in the same Symbol Time.
  Note: For x8 and wider Links, this means that STP and SDP Symbols can be placed in Lane 4*N, where N is a non-negative integer. For example, for x8, STP and SDP Symbols can be placed in Lanes 0 and 4; and for x16, STP and SDP Symbols can be placed in Lanes 0, 4, 8, or 12.
• For xN Links where N is 8 or more, if an END Symbol is placed in a Lane K, where K does not equal N-1, and is not followed by an STP or SDP Symbol in Lane K+1 (i.e., there is no TLP or DLLP immediately following), then PAD Symbols must be placed in Lanes K+1 to N-1.
  o Example: on a x8, if END is placed in Lane 3, PAD must be placed in Lanes 4 to 7, when not followed by STP or SDP.
• The EDB Symbol is used to mark the end of a nullified TLP. Refer to Section 3.5.2.1 for information on the usage of EDB.
• Receivers may optionally check for violations of the rules of this section. If such checking is implemented, violations cause the indication of a Receiver Error to the Data Link Layer. A Receiver Error is a reported error associated with the Port (see Section 7.2).

Figure 4-5: TLP with Framing Symbols Applied (STP, Reserved bits, and Packet Sequence Number at the start; LCRC Value and END at the end)
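The Lane-placement rules can be exercised with a small sketch. It packs framed packets back to back across an xN Link and fills the rest of the last Symbol Time with PAD. It assumes the packets are already in transmission order and that each framed packet is a multiple of four Symbols long, so every STP or SDP lands in Lane 4*N. Symbols are plain strings, and `place_packets` is a hypothetical name.

```python
def place_packets(packets: list, lanes: int) -> list:
    """Return rows of Symbols, one row per Symbol Time, for an xN Link.

    Assumes each framed packet is a multiple of 4 Symbols long, so a partial
    final row can only occur on x8 and wider Links.
    """
    stream = [sym for packet in packets for sym in packet]
    rows = [stream[i:i + lanes] for i in range(0, len(stream), lanes)]
    if rows and len(rows[-1]) < lanes:
        # END landed in some Lane K < N-1 with nothing following: PAD up to Lane N-1.
        rows[-1] = rows[-1] + ["PAD"] * (lanes - len(rows[-1]))
    return rows
```

On an x16 Link, for instance, a lone eight-Symbol DLLP ends in Lane 7, and Lanes 8 to 15 of the same Symbol Time carry PAD.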


Figure 4-6: DLLP with Framing Symbols Applied (SDP in Symbol 0, END in Symbol 7)

Figure 4-7: Framed TLP on a x1 Link (STP Framing Symbol added by Physical Layer; Reserved bits and Sequence Number added by Data Link Layer; TLP generated by Transaction Layer; LCRC added by Data Link Layer; END Framing Symbol added by Physical Layer)


[Figure 4-8: Framed TLP on a x2 Link (Lanes 0-1). Legend: STP/END Framing Symbols - Physical Layer; Sequence Number/LCRC - Data Link Layer; TLP - Transaction Layer.]

[Figure 4-9: Framed TLP on a x4 Link (Lanes 0-3). Legend: STP/END Framing Symbols - Physical Layer; Sequence Number/LCRC - Data Link Layer; TLP - Transaction Layer.]

4.2.3. Data Scrambling
The scrambling function can be implemented with one or many Linear Feedback Shift Registers (LFSRs) on a multi-Lane Link. When there is more than one transmit LFSR per Link, these must operate in concert, maintaining the same simultaneous (see Table 4-4) value in each LFSR. When there is more than one receive LFSR per Link, these must operate in concert, maintaining the same simultaneous (see Table 4-5) value in each LFSR. Regardless of how they are implemented, the LFSRs must interact with data on a Lane-by-Lane basis as if there were a separate LFSR, as described here, for each Lane within that Link. On the transmit side, scrambling is applied to characters prior to 8b/10b encoding. On the receive side, de-scrambling is applied to characters after 8b/10b decoding.

The LFSR is graphically represented in Figure 4-10. Scrambling or unscrambling is performed by serially XORing the 8-bit (D0-D7) character with the 16-bit (S0-S15) output of the LFSR. An output of the LFSR, S15, is XORed with D0 of the data to be processed. The LFSR and data register are then serially advanced and the output processing is repeated for D1 through D7. The LFSR is advanced after the data is XORed. The LFSR implements the polynomial:

$$G(X) = X^{16} + X^{15} + X^{13} + X^{4} + 1$$

Data scrambling rules:
• The COM character initializes the LFSR.
• The LFSR value is advanced eight serial shifts for each character except the SKP.
• All data characters (D codes) except those within a Training Sequence Ordered-set (TS1, TS2) and the Compliance Pattern are scrambled.
• Special characters (K codes) are not scrambled.

The initialized value of an LFSR seed (S0-S15) is 0FFFFh. Immediately after a COM exits the transmit LFSR, the LFSR on the transmit side is initialized. Every time a COM enters the receive LFSR on any Lane of that Link, the LFSR on the receive side is initialized.

Scrambling is enabled by default. It can be disabled for diagnostic purposes by setting bit 3 in Symbol 5 of the training sequence ordered-sets. If a training sequence ordered-set is received with this bit set in all Lanes, scrambling must be disabled until the next reset occurs. See Table 4-2 and Table 4-3.

For more information on scrambling, see Appendix C.

[Figure 4-10: LFSR with Scrambling Polynomial (register stages S0-S15; data bits D0-D7 with Data In and Data Out).]

4.2.4. Link Initialization and Training
This section defines the Physical Layer control process that configures and initializes each Link for normal operation. This section covers the following functions:
• Configuring and initializing the Link.
• Supporting normal packet transfers.
• Supported state transitions when recovering from Link errors.
• Restarting a Port from low power states.
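Returning to the scrambling rules of Section 4.2.3, one Lane's scrambler can be sketched as a Galois-form LFSR in which S15 is XORed into D0 first and the register then shifts once per bit. This is a minimal sketch under stated assumptions: the feedback mask is derived mechanically from G(X) exactly as printed in Section 4.2.3 (bits 15, 13, 4, and 0, i.e. A011h) and should be checked against Figure 4-10 and Appendix C before it is relied on; characters are passed as 8-bit values with a flag marking K codes; and the `Scrambler` class and its methods are hypothetical names. The mask is a constructor parameter so a different tap set can be substituted.

```python
COM = 0xBC     # K28.5 as an 8-bit value
SKP = 0x1C     # K28.0 as an 8-bit value
SEED = 0xFFFF  # initialized LFSR value (S0-S15)


def feedback_mask(exponents) -> int:
    """Galois feedback mask for X^16 + sum(X^e) + 1 (bit e set for each X^e)."""
    mask = 0x0001  # the "+1" term feeds S15 back into S0
    for e in exponents:
        if not 0 < e < 16:
            raise ValueError("inner exponents must lie in 1..15")
        mask |= 1 << e
    return mask


# Assumption: taps copied from G(X) as printed in Section 4.2.3.
DEFAULT_MASK = feedback_mask([15, 13, 4])


class Scrambler:
    """One Lane's scrambler; XOR with the key stream also descrambles."""

    def __init__(self, mask: int = DEFAULT_MASK) -> None:
        self.mask = mask
        self.lfsr = SEED

    def _shift(self) -> None:
        out = (self.lfsr >> 15) & 1            # S15 leaves the register
        self.lfsr = (self.lfsr << 1) & 0xFFFF
        if out:
            self.lfsr ^= self.mask             # feed S15 into the tap stages

    def process(self, value: int, is_k: bool, in_training: bool = False) -> int:
        """Return the scrambled (or descrambled) form of one character."""
        if is_k and value == COM:
            self.lfsr = SEED                   # COM (re)initializes the LFSR
            return value
        if is_k and value == SKP:
            return value                       # SKP does not advance the LFSR
        result = 0
        for bit in range(8):                   # D0 first, XORed with S15
            key = (self.lfsr >> 15) & 1
            result |= (((value >> bit) & 1) ^ key) << bit
            self._shift()                      # advance after the XOR
        # K codes and TS1/TS2 or Compliance Pattern data still advance the
        # LFSR but are passed through unscrambled.
        return value if (is_k or in_training) else result


tx, rx = Scrambler(), Scrambler()
sent = [tx.process(b, is_k=False) for b in [0x00, 0x12, 0xFF]]
print([rx.process(b, is_k=False) for b in sent])  # [0, 18, 255]
```

Because the key stream depends only on the LFSR state and never on the data, a receiver that sees the same COM and SKP positions regenerates the same key and recovers the original characters.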


The following are discovered and determined during the training process:
• Link width.
• Link data rate[11].
• Lane reversal.
• Polarity inversion.

Training performs:
• Link data rate[12] negotiation.
• Bit synchronization per Lane.
• Lane polarity.
• Symbol synchronization per Lane.
• Lane ordering within a Link.
• Link width negotiation.
• Lane-to-Lane de-skew within a multi-Lane Link.

Receivers may optionally check for violations of the Link Initialization and Training Protocols. If such checking is implemented, any violation is a Training Error. A Training Error is a reported error associated with the Port (see Section 7.2). A Training Error is considered fatal to the Link.

4.2.4.1. Training Sequence Ordered-sets
Training sequences are composed of ordered-sets used for bit alignment, symbol alignment, and exchanging Physical Layer parameters. Training sequence ordered-sets are never scrambled but are always 8b/10b encoded. SKP ordered-sets may be transmitted during training sequences but never interrupt a TS1 or TS2 ordered-set.

Any reference in the state machine section indicating that 16 ordered-sets are to be transmitted after receiving “n” ordered-sets means to send at least 16 additional ordered-sets after the reception of at least “n” ordered-sets. This is in addition to the ordered-sets sent while waiting for “n” ordered-sets to be received.

For N_FTS to be valid, two or more TSx ordered-sets must be received with the same value.

Any time two consecutive TS1 or TS2 ordered-sets are received in any state with the Reset bit set, the Link Control Reset state must be entered directly.

Any time two consecutive TS1 or TS2 ordered-sets are received in any state with the Loopback bit set, the Loopback state must be entered directly.

[11] This specification only defines one data rate. Future revisions will define additional rates.
[12] This specification defines the mechanism for negotiating the Link operational bit rate to the highest supported operational data rate.


Any time two consecutive TS1 or TS2 ordered-sets are received in any state with the Disable bit set, the Disable state must be entered directly.

When scrambling is to be disabled, the Scrambling Disable bit must be set in all TS1 and TS2 ordered-sets. If TS1 and TS2 ordered-sets are received with the Scrambling Disable bit set, scrambling is disabled for that entire Lane (both directions). Scrambling remains disabled until the Link is reset.

SKP ordered-sets may be sent between consecutive TS1 or TS2 ordered-sets. Idle data is not allowed between consecutive TS1 or TS2 ordered-sets.

The Link control bits for Scrambling Disable, Reset, Link Disable, and Loopback Enable are mutually exclusive; only one of these bits may be set at a time. If more than one of the Scrambling Disable, Reset, Link Disable, or Loopback Enable bits is set, the behavior is undefined.

Table 4-2: TS1 Ordered-Set

Symbol Number | Allowed Values | Encoded Values | Description
0 | | K28.5 | COMMA code group for symbol alignment
1 | 0-255 | D0.0 - D31.7, K23.7 | Link Number within component
2 | 0-31 | D0.0 - D31.0, K23.7 | Lane Number within Port
3 | 0-255 | D0.0 - D31.7 | N_FTS. This is the number of fast training ordered-sets required by the receiver to obtain reliable bit and symbol lock.
4 | 1 | D1.0 | Data Rate Identifier. Bit 0: Reserved, set to 0. Bit 1 = 1: generation 1 (2.5 Gb/s) data rate supported. Bits 2:7: Reserved, set to 0.
5 | Bit 0 = 0, 1; Bit 1 = 0, 1; Bit 2 = 0, 1; Bit 3 = 0, 1; Bits 4:7 = 0 | D0.0, D1.0, D2.0, D4.0, D8.0 | Link Control. Bit 0 = 0: De-assert Reset; Bit 0 = 1: Assert Reset. Bit 1 = 0: Enable Link; Bit 1 = 1: Disable Link. Bit 2 = 0: No Loopback; Bit 2 = 1: Enable Loopback. Bit 3 = 0: Enable Scrambling; Bit 3 = 1: Disable Scrambling. Bits 4:7: Reserved.
6-15 | | D10.2 | TS1 Identifier
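The Link Control rules above (Symbol 5, bits 0-3, and the two-consecutive-ordered-set rule for Reset, Disable, and Loopback) can be sketched as a decoder plus a small tracker. This is a hypothetical helper, not defined by the specification: it rejects more than one set control bit because the text leaves that case undefined, it expects only TS1/TS2 ordered-sets to be fed to it (SKP ordered-sets in between are skipped by the caller), and it does not model Scrambling Disable, which the text does not tie to a consecutive-count rule.

```python
RESET = 0x01               # bit 0: Assert Reset
DISABLE_LINK = 0x02        # bit 1: Disable Link
LOOPBACK = 0x04            # bit 2: Enable Loopback
DISABLE_SCRAMBLING = 0x08  # bit 3: Disable Scrambling


def decode_link_control(symbol5: int) -> dict:
    """Split the TS1/TS2 Link Control Symbol into named flags."""
    if symbol5 & 0xF0:
        raise ValueError("bits 4:7 are reserved and must be 0")
    if bin(symbol5).count("1") > 1:
        raise ValueError("control bits are mutually exclusive; behavior undefined")
    return {
        "reset": bool(symbol5 & RESET),
        "disable_link": bool(symbol5 & DISABLE_LINK),
        "loopback": bool(symbol5 & LOOPBACK),
        "disable_scrambling": bool(symbol5 & DISABLE_SCRAMBLING),
    }


class DirectedStateTracker:
    """Reports the state to enter once two consecutive TSx carry the same bit."""

    STATES = (
        (RESET, "Link Control Reset"),
        (DISABLE_LINK, "Disable"),
        (LOOPBACK, "Loopback"),
    )

    def __init__(self) -> None:
        self.previous = 0

    def feed(self, symbol5: int):
        decode_link_control(symbol5)          # validate before tracking
        repeated = self.previous & symbol5    # bits set in both ordered-sets
        self.previous = symbol5
        for bit, state in self.STATES:
            if repeated & bit:
                return state
        return None


tracker = DirectedStateTracker()
print(tracker.feed(RESET), tracker.feed(RESET))  # None Link Control Reset
```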


Table 4-3: TS2 Ordered-Set

Symbol Number | Allowed Values | Encoded Values | Description
0 | | K28.5 | COMMA code group for symbol alignment
1 | 0-255 | D0.0 - D31.7, K23.7 | Link Number within component
2 | 0-31 | D0.0 - D31.0, K23.7 | Lane Number within Port
3 | 0-255 | D0.0 - D31.7 | N_FTS. This is the number of fast training ordered-sets required by the receiver to obtain reliable bit and symbol lock.
4 | 1 | D1.0 | Data Rate Identifier. Bit 0: Reserved, set to 0. Bit 1 = 1: generation 1 (2.5 Gb/s) data rate supported. Bits 2:7: Reserved, set to 0.
5 | Bit 0 = 0, 1; Bit 1 = 0, 1; Bit 2 = 0, 1; Bit 3 = 0, 1; Bits 4:7 = 0 | D0.0, D1.0, D2.0, D4.0, D8.0 | Link Control. Bit 0 = 0: De-assert Reset; Bit 0 = 1: Assert Reset. Bit 1 = 0: Enable Link; Bit 1 = 1: Disable Link. Bit 2 = 0: No Loopback; Bit 2 = 1: Enable Loopback. Bit 3 = 0: Enable Scrambling; Bit 3 = 1: Disable Scrambling. Bits 4:7: Reserved.
6-15 | | D5.2 | TS2 Identifier

4.2.4.2. Lane Polarity Inversion
During the training sequence, the receiver looks at Symbols 6-15 of TS1 and TS2 as the indicator of Lane polarity inversion (D+ and D- are swapped). If Lane polarity inversion occurs, the TS1 Symbols 6-15 received will be D21.5 as opposed to the expected D10.2. Similarly, if Lane polarity inversion occurs, Symbols 6-15 of the TS2 ordered-set will be D26.5 as opposed to the expected D5.2. This provides a clear indication of Lane polarity inversion.

If polarity inversion is detected, the receiver must invert the received data. The transmitter must never invert the transmitted data. Support for Lane Polarity Inversion is required on all PCI Express Lanes.
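A receiver-side check of the identifier Symbols can be sketched as below. The byte values follow from the Dx.y naming (value = 32*y + x): D10.2 = 4Ah, D5.2 = 45h, D21.5 = B5h, and D26.5 = BAh, so each inverted identifier is the bitwise complement of the expected one. The function name and its return convention are hypothetical; a receiver that gets `True` for the second element would then invert its received data as required above.

```python
D10_2, D5_2 = 0x4A, 0x45    # expected TS1 / TS2 identifiers
D21_5, D26_5 = 0xB5, 0xBA   # identifiers decoded when D+ and D- are swapped

IDENTIFIERS = {
    D10_2: ("TS1", False),
    D21_5: ("TS1", True),
    D5_2: ("TS2", False),
    D26_5: ("TS2", True),
}


def classify_identifier(symbols_6_to_15: list[int]):
    """Return (ordered-set type, polarity inverted?) or None if unrecognized."""
    if len(symbols_6_to_15) != 10:
        raise ValueError("expected the ten identifier Symbols 6-15")
    first = symbols_6_to_15[0]
    if any(s != first for s in symbols_6_to_15):
        return None
    return IDENTIFIERS.get(first)


# Each inverted identifier is the complement of the expected byte.
assert (D10_2 ^ 0xFF) == D21_5 and (D5_2 ^ 0xFF) == D26_5
print(classify_identifier([D21_5] * 10))  # ('TS1', True)
```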


4.2.4.3. Fast Training Sequence (FTS)
FTS is the mechanism used for bit and symbol synchronization when transitioning from L0s to L0. The FTS is used by the receiver to detect the exit from Electrical Idle and align the receiver’s bit/symbol receive circuitry to the incoming data. See Section 4.2.5 for a description of L0 and L0s.

A single FTS training sequence is an ordered-set composed of one K28.5 (COM) Symbol and three K28.1 Symbols. The maximum number of FTS ordered-sets (N_FTS) is 255, providing a bit time synchronization of 4 * 255 * 10 * UI.

After initial power up, the N_FTS value is exchanged in the TS1/TS2 training sequence. N_FTS defines the number of FTS ordered-sets that must be transmitted when transitioning from L0s to L0. For the data rate in this specification, this corresponds to a bit lock time of 16 ns to 4 µs.

When transitioning from L0s to L0, the receiver shall observe the period of time from Electrical Idle Exit to the time that the receiver obtains bit and symbol alignment. If the N_FTS period of time expires before the receiver obtains alignment on all Lanes of a Link, the receiver must transition to the Recovery state in order to recover the Link alignment. This sequence is detailed in the LTSSM in Section 4.2.5.

4.2.4.4. Link Error Recovery
At any time, the Physical Layer can be directed to enter the Recovery state, as described in Section 3.5. Refer to Section 7.2 for more information on behavior when the Physical Layer reports errors.

4.2.4.5. Link Reset
There are two types of reset: one at the Physical Layer that is platform specific (“Power Good Reset” or cold/warm reset), and one that is passed in the Link Control Register (bit 0 of Symbol 5) in the TS1 and TS2 ordered-sets. The latter is called the Link Control Reset or hot reset.
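As a worked check of the FTS timing in Section 4.2.4.3: one FTS ordered-set is 4 Symbols of 10 bits each, and one UI at 2.5 Gb/s is 400 ps, so each FTS ordered-set lasts 16 ns and N_FTS = 255 lasts 4.08 µs, consistent with the 16 ns to 4 µs range quoted there. The helper name below is hypothetical.

```python
UI_PS = 400           # one Unit Interval at 2.5 Gb/s, in picoseconds
SYMBOLS_PER_FTS = 4   # one K28.5 (COM) plus three K28.1
BITS_PER_SYMBOL = 10  # 8b/10b encoded


def fts_duration_ns(n_fts: int) -> float:
    """Time to transmit n_fts FTS ordered-sets at the generation 1 data rate."""
    if not 0 <= n_fts <= 255:
        raise ValueError("N_FTS is an 8-bit field (0-255)")
    return n_fts * SYMBOLS_PER_FTS * BITS_PER_SYMBOL * UI_PS / 1000


print(fts_duration_ns(1))    # 16.0
print(fts_duration_ns(255))  # 4080.0
```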


4.2.4.5.1. Physical Layer Reset (“Power Good”)
A Physical Layer Reset is provided by the system to the logical sub-block and is used to properly initialize the port. This Physical Layer Reset must be asserted when the power to the device does not meet the device specifications. This Physical Layer Reset may also be asserted by other control agents in the device (for instance, the Link Layer, the Transaction Layer, or a software mechanism) to reset the Physical Layer. The following must be met when this reset is asserted:
• The receiver terminations are disabled.
• The transmitter must hold a constant DC common mode voltage on the differential pair using a high impedance driver. For the definition of high impedance in this context, see Table 4-4.

4.2.4.5.2. Link Control Reset (Hot Reset)
In addition to Physical Layer Reset, a Protocol Reset is defined. This Link Control Reset uses a reset indicator bit defined in the Link Control Register (Table 4-2, Table 4-3) that is sent during the training sequence. An upstream device sets this bit to force a reset of all of the downstream devices and links. Optionally, a downstream device may use this reset to reset other logic within the device. The method and mechanisms to do this are implementation specific.

When a bridge receives a training sequence with the reset bit asserted, it must propagate that reset onto all downstream links by transmitting TS1 ordered-sets with the reset bit asserted. Link Control Reset shall not propagate upstream. All other Physical Layer information exchanged in those ordered-sets must be accurate and correct.

All Lanes within a multi-Lane Link transmit the TS1 and TS2 ordered-sets during Link Control Reset. When Link Control Reset is removed, each transmitter and receiver must enter Detect.

Unless otherwise specified, the terms “reset” and “power-on/reset” in this chapter refer to the Physical Layer Reset.

4.2.4.6. Link Disable
A Link can be disabled if directed. When directed to this state, the following behavior occurs:
• The Port drives its transmitters to high impedance.
• The receiver terminations must be disabled.
• There should be no response to any received data.

When directed to disable a Link, all Lanes within a multi-Lane Link transmit a minimum of 4 and a maximum of 16 TS1 ordered-sets with the Disable Link bit set. The Link remains disabled until it is directed out of Link Disable or a physical reset occurs.


After a physical reset, or after being directed out of Link Disable, the next state is Detect.

4.2.4.7. Link Data Rate Negotiation
All devices are required to initialize and configure with the generation 1 data rate on each Lane. During initialization, a field is passed in the training sequence (see Section 4.2.4) to indicate the maximum data rate the Lane is capable of. This document specifies the data rate of 2.5 Gb/s in each direction on each Lane.

4.2.4.8. Link Width and Lane Sequence Negotiation
PCI Express Links must consist of 1, 2, 4, 8, 12, 16, or 32 Lanes in parallel, referred to as x1, x2, x4, x8, x12, x16, and x32 links respectively. All Lanes within a Link shall transmit data based on the exact same frequency.

The negotiation process is described as a sequence of steps. The negotiation establishes values for Link Number and Lane Number for each Lane that is part of a valid Link; each Lane that is not part of a valid Link exits the negotiation with values of K23.7 (PAD-out of range) for Link Number and Lane Number.

During Link width and Lane sequence negotiation, the two communicating ports must accommodate the maximum allowed Lane-to-Lane skew as specified by L_RX-SKEW in Table 4-5.

Optional behaviors are described to comprehend fixed configuration components and components to be used in the implementation of advanced switching cross-links (Section 1.6). Annex specifications to this specification may impose other rules and restrictions that must be comprehended by components compliant to those annex specifications; the intent of this specification is to comprehend interoperability for a broad range of component capabilities.

4.2.4.8.1. Required/Optional Port Behavior
The ability for a set of transceivers to become one port and form one Link, or to become multiple ports and form multiple links, is optional.

An xN port must be capable of forming an xN Link as well as a x1 Link (where N can be 32, 16, 12, 8, 4, 2, or 1). All other widths between N and 1 are optional.

Support for Lane reversal at any and all ports is optional.
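The width requirements of Sections 4.2.4.8 and 4.2.4.8.1 can be expressed as a small capability check. This is a minimal sketch: describing a port by its maximum width plus a set of supported widths, and the function name `port_widths_valid`, are assumptions made for illustration.

```python
LINK_WIDTHS = {1, 2, 4, 8, 12, 16, 32}


def port_widths_valid(max_width: int, supported: set[int]) -> bool:
    """True if an xN port's supported widths satisfy the rules above."""
    return (
        max_width in LINK_WIDTHS
        and supported <= LINK_WIDTHS        # only the allowed Link widths
        and {max_width, 1} <= supported     # xN and x1 are mandatory
        and max(supported) == max_width     # nothing wider than the port
    )


print(port_widths_valid(16, {16, 8, 2, 1}))  # True
print(port_widths_valid(8, {8, 4}))          # False: x1 is missing
```

Widths between N and 1 other than N and 1 themselves are optional, so the check only requires the two endpoints and rejects widths outside the allowed list.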


4.2.4.8.2. Steps to Negotiate the Width and Lane Ordering of Links
While in the configuration state, for Lanes that have successfully completed bit synchronization, polarity inversion, and symbol synchronization training, components negotiate the Link width and the sequence of the Lanes within each Link via the following steps:

Step 1: The upstream component initializes Link numbers:
An upstream component (downstream port) assigns unique Link numbers to groups of Lanes capable of being unique links[13]. Indivisible groups of Lanes (those that can only be configured as Lanes within one Link) must connect to at most one downstream component (upstream port). The initial Link numbers are presented on each Lane to the downstream component(s). Until indicated (Step 3), Lane numbers are presented as K23.7 (PAD-out of range). Upstream ports present their Link numbers and Lane numbers as K23.7 (PAD-out of range).

Mechanism: The upstream component shall send out TS1 ordered-sets with the assigned Link numbers inserted into the Link number field (Symbol 1) on the groups of Lanes capable of being unique Links, and with the Lane number field (Symbol 2) set to K23.7.

Example of a set of eight lanes on an upstream component capable of negotiating to become one x8 port when connected to one downstream component, or two x4 ports when connected to two different downstream components: The upstream component (downstream port(s)) sends out TS1 ordered-sets with the Link number set to N on four lanes and the Link number set to N+1 on the other four lanes. The Lane numbers are all set to K23.7. The resulting number of links formed, as well as their width(s), depends on the system configuration and on the capabilities of the downstream components.

Note: From this point on, the rules are written to describe how each individual Link is configured. Regardless of the number of links a component supports, each Link is negotiated with the same rules that follow. Lanes within unique (only capable of being configured into one Link) and aggregated (capable of being configured into more than one) links must comply with the timing rules in Section 4.2.4.9. Independent, unique links have independent timing and control of negotiation. Unique and aggregated links are mapped with one and only one PCI-to-PCI bridge structure (Section 1.4).

[13] The most flexible case is that all Lanes could be separate x1 Links. The most restrictive case is that all Lanes are part of one Link.
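The Step 1 mechanism, applied to the eight-lane example above, can be sketched as the list of (Link number, Lane number) pairs placed in Symbols 1 and 2 of TS1 on each lane. The helper name, the `PAD` string standing in for the K23.7 Symbol, and the choice of consecutive Link numbers starting at a caller-supplied base are assumptions for illustration only.

```python
PAD = "K23.7"  # stand-in for the K23.7 (PAD-out of range) Symbol


def step1_fields(group_sizes: list[int], first_link: int) -> list[tuple]:
    """(Link number, Lane number) sent on each lane, groups in lane order."""
    fields = []
    for offset, size in enumerate(group_sizes):
        link = first_link + offset
        if not 0 <= link <= 255:
            raise ValueError("Link number must fit in Symbol 1 (0-255)")
        fields.extend([(link, PAD)] * size)  # Lane numbers stay PAD until Step 3
    return fields


# Eight lanes that may form one x8 or two x4 Links: N on four lanes, N+1 on four.
print(step1_fields([4, 4], first_link=0))
```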


Step 2: The upstream port (downstream component) responds with Link number assignments:
A downstream component (upstream port) assigns the Link number (label) by assigning a common Link number to each of its lanes connected to the upstream component (downstream port), where the assigned Link number is selected from one of the Link numbers the downstream component received from the upstream component. If the downstream component is restricted as to its placement of Lane number 0[14], it must select the Link number received on that Lane. If the downstream component is restricted to Link widths other than what is presented, it must only transition the Link number of the subset of lanes that it can support within the Link.

Mechanism: The upstream port (downstream component) shall send out the TS1 ordered-set with the assigned common Link number inserted into the Link number field (Symbol 1) and the Lane number field (Symbol 2) set to K23.7 for all lanes within the widths supported by the port. Lanes that cannot be included due to supported width restrictions shall continue sending TS1 ordered-sets with the Link number and Lane number fields both set to K23.7.

Example 1: a x8 port: The upstream port (downstream component) sends out TS1 ordered-sets with the Link number set to one of the Link numbers presented from the upstream component and the Lane number set to K23.7 on all 8 lanes. Per the example under Step 1 above, it must choose between N and N+1. If the upstream port (downstream component) does not support Lane reversal, it must choose the Link number presented on its Lane 0.

Example 2: a x16 port that is not capable of becoming a x8 Link, but is capable of being a x4 Link: The upstream port (downstream component) sends out TS1 ordered-sets with the Link number set to the Link number presented from the downstream port (upstream component) on the four lanes it can support within the Link; the Lane numbers remain set to K23.7 on those four lanes. It shall send out TS1 ordered-sets with the Link numbers and Lane numbers set to K23.7 on the twelve remaining lanes.

Note per Step 2: There may be times when an upstream port (downstream component) is connected to another upstream port (downstream component) (cross-link). The rule below defines the behavior in this situation. Support for this behavior is optional.

Upstream port (downstream component) connected to upstream port (downstream component):

[14] A simple example of this is a port that does not support Lane reversal.


If, after a minimum of 16 TS1 ordered-sets have been received on each lane within the respective Link, the upstream port (downstream component) has not received a Data Symbol for its Link number, the port may optionally assume the role of a downstream port and transition this port to Step 1. If this feature is not supported, it must maintain the values of K23.7 for the Link and Lane number fields and exit the negotiation.

Step 3: The downstream port (upstream component) initializes Lane numbers:
The downstream port (upstream component) must acknowledge the assigned Link number received from the upstream port (downstream component) by transitioning the Link number to the assigned Link number on each lane to the upstream port (downstream component), as well as transitioning the Lane number fields to its preferred Lane numbers, while maintaining Link widths consistent with the width restrictions above. The preferred Lane numbers must be consecutive, and one Lane number must be assigned to 0. If the downstream port (upstream component) has not received a Data Symbol for its Link number after receiving an additional 16 (or more) TS1 ordered-sets on all lanes in the respective Link, all lanes of the Link must maintain values of K23.7 for the Link and Lane number fields and exit the negotiation. If the assigned Link number does not match any of its initial Link numbers, see the note below.

Mechanism: The downstream port (upstream component) shall send out the TS1 ordered-set with the Link number field (Symbol 1) set to the assigned Link number and the Lane number field (Symbol 2) set to its preferred number on all lanes that received a Link number from the upstream port (downstream component) and that can be accommodated within the Link widths that port can support. It must transition the Link numbers and Lane numbers to K23.7 on the lanes that were previously rejected by the upstream port (downstream component) and on any additional lanes the downstream port (upstream component) cannot accommodate within its supported Link widths.

Note per Step 3: There may be times when a downstream port is connected to another downstream port. The rules below define the behavior in this situation. Support for this behavior is optional.

Downstream port (upstream component) connected to downstream port (upstream component):
If the downstream port (upstream component) receives an assigned Link number that does not match any of its initial Link numbers, it may optionally compare the received Link number to its initial Link number. If the received Link number is less, it must transition these lanes to Step 2, assuming the role of an upstream port of a downstream component. A downstream port (upstream component) must remain in Step 3 if its assigned Link number was greater than the received Link number or if it does not support this feature. If, after a minimum of 16 TS1 ordered-sets have been received on each lane within the respective Link, the downstream port (upstream component) has not received a Link number that matches any of its initial Link numbers, it must maintain the values of K23.7 for the Link and Lane number fields and exit the negotiation.

Step 4: The upstream port (downstream component) responds with Lane number assignments:
The upstream port (downstream component) must accommodate the Link width and Lane numbers presented by the downstream port (upstream component) if it is possible to do so (see the System Designer Rules below in this section). If the upstream port (downstream component) can accept the Link width presented but not the Lane numbers, it must acknowledge with its preferred ordering of Lane numbers at this time. The preferred Lane numbers must be consecutive, and one Lane number must be assigned to 0. If the upstream port (downstream component) is restricted to Link widths other than what is presented, it must only transition the Lane numbers on the subset of lanes that can be accommodated within the Link widths that port can support. It must transition the Link numbers and Lane numbers to K23.7 on the lanes that cannot be accommodated within the widths that port supports.

Mechanism: If the upstream port (downstream component) has not received Data Symbols for its Lane numbers after receiving an additional 16 or more TS1 ordered-sets, all respective lanes of the Link must maintain values of K23.7 for the Link and Lane number fields and exit the negotiation. If the upstream port (downstream component) can accommodate the Lane numbers received from the downstream port (upstream component), it shall send out TS1 ordered-sets with the Link number field (Symbol 1) set to the assigned Link number and the Lane number field (Symbol 2) set to the Lane numbers assigned by the downstream port (upstream component). Otherwise, it shall insert its preferred numbers on all lanes with Lane numbers currently assigned by the downstream port (upstream component) that it can accommodate within the Link widths that port can support. It must transition the Link numbers and Lane numbers to K23.7 on the lanes that were previously rejected by the downstream port (upstream component) and on any additional lanes the upstream port (downstream component) cannot accommodate in the Link width.

Step 5: The downstream port (upstream component) confirms Link number and Lane assignments:
The downstream port (upstream component) must accommodate any Lane numbers that are consistent with all system and component rules but do not match its preferred ordering, completing the Link width and numbering negotiation (see the System Designer Rules below in this section). If the upstream port (downstream component) has assigned Lane numbers to a number of lanes resulting in a Link width that port cannot support, the downstream port (upstream component) must accommodate the upstream port’s (downstream component’s) Lane 0, establishing a Link of width 1.


Mechanism: If the downstream port (upstream component) has not received Data Symbols for its Lane numbers after receiving an additional 16 or more TS1 ordered-sets, all lanes of the Link must maintain values of K23.7 for the Link and Lane number fields and exit the negotiation. If the downstream port (upstream component) is to further reduce the width requested by the upstream port (downstream component) to a width greater than x1, the downstream port (upstream component) must return to Step 3[15]. Otherwise, the downstream port (upstream component) shall transition to sending TS2 ordered-sets with the Link number fields (Symbol 1) and Lane number fields (Symbol 2) set to the negotiated values. It must transition the Link numbers and Lane numbers to K23.7 on the lanes that were previously rejected by the upstream port (downstream component) and on any additional lanes the downstream port (upstream component) cannot accommodate in the Link width.

Step 6: The upstream port (downstream component) confirms Link number and Lane assignments:
Mechanism: After receiving at least one TS2 ordered-set, the upstream port (downstream component) shall transition to sending TS2 ordered-sets with the Link number fields (Symbol 1) and Lane number fields (Symbol 2) set to the negotiated values. It must transition the Link numbers and Lane numbers to K23.7 on the lanes that were previously rejected by the downstream port (upstream component).

Step 7: The downstream and upstream ports (upstream and downstream components, respectively) settle on Link number and Lane assignments:
Both ports continue sending TS2 ordered-sets. If, after completing the negotiation (Steps 1-6), either port again transitions the Lane numbers (which should only occur if this Link is an advanced switching cross-link (refer to Section 1.6) where both ports have been implemented as downstream ports of upstream components), the port may optionally silently accept the received Lane numbers as the correct labeling of the other port’s transmitters and therefore of its own receivers; otherwise, all lanes of the Link must maintain values of K23.7 for the Link and Lane number fields and exit the negotiation. No further changes to Link number and Lane number are allowed at this point without a complete re-training and re-configuration of the ports and associated Link(s). Label numbers are to be retained, skipping the Link width and Lane sequence negotiation steps, unless the Link is transitioned to the Polling.Quiet state.

When these steps are skipped, the previously negotiated Link and Lane numbers are retained and inserted into the appropriate fields of the TS1 and TS2 ordered-sets.

[15] It is only possible to return to Step 3 one time due to the limited number of Link widths allowed (x1, x2, x4, x8, x12, x16, x32). The longest sequence of width negotiation consists of an upstream component (downstream port) that supports x16, x8, x2, and x1 connected to a downstream component (upstream port) that supports x32, x12, x4, and x1. Step 1 would attempt to create a x16 Link, Step 2 a x12 Link, Step 3 (first pass) a x8 Link, Step 4 (first pass) a x4 Link, Step 3 (second pass) a x2 Link, and Step 4 (second pass) a x1 Link.


Mechanism: At least 16 TS2 ordered-sets are sent after receiving one TS2 ordered-set. If the received TS2 ordered-sets have Lane numbers that do not match those in the transmitted TS2 ordered-sets, the port may internally disassociate the transmitter and receiver from the same Lane number label, associating the receiver with the Lane number received. The transmitter association to Lane number must not change once TS2 ordered-sets have been sent. If the receiver cannot be associated with the Lane number received, all lanes of the Link must maintain values of K23.7 for the Link and Lane number fields and exit the negotiation. The port returns to Config.Idle after at least 8 TS2 ordered-sets are received.

All lanes that are connected to the other port but not included in the negotiated Link must maintain values of K23.7 for the Link and Lane number fields assigned by the upstream and downstream ports.

Any lanes that fail to establish Data Symbol values for the Link and Lane number fields are inactive and will not exchange information with the Data Link Layer. All data and control to the Data Link Layer from active lanes shall be consistent with the agreed Lane numbering. When negotiating Link width and Lane sequence, each downstream port (upstream component) must transition between steps in unison across all of its lanes, and each upstream port (downstream component) must transition between steps in unison across all of its lanes.

The following graphical flow diagrams demonstrate interoperability of components with simplified negotiation machines. Components/ports with complex negotiation machines are added to facilitate clarity.


[Figure 4-11: Width Negotiation, Simplified State Machine, Downstream Component (Part 1). Start for downstream component: entry from Polling with all lanes of the potential Link in bit, polarity, and symbol alignment; Lane number and Link number initially PAD. Wait until the received training sequences have a Link number other than PAD (2 ms timeout: Error Exit). Then insert one of the Link numbers received into TS1 and transmit it on all lanes of the Link (see "Step 2"). Wait until the received training sequences have Lane numbers other than PAD (2 ms timeout: Error Exit), then continue to Downstream Part 2.]


[Figure 4-12: Width Negotiation, Simplified State Machine, Downstream Component (Part 2). Insert the Lane numbers for this Link into TS1 and transmit on all lanes of the Link (see "Step 4"). If the received width does not match the desired width or the Lane numbers are not acceptable, lower the Link width (branches labeled "Next lower link width" and "No lower width"). Otherwise, switch to sending TS2; at least 8 TS2 must be sent after receiving one TS2. When 8 TS2 ordered-sets are received, exit to Config.Idle (2 ms timeout: Error Exit).]


[Figure 4-13: Width Negotiation, Simplified State Machine, Upstream Component (Part 1)]


[Figure 4-14: Width Negotiation, Simplified State Machine, Upstream Component (Part 2)]


System Designer Rules:

System designers must connect Lanes within a Link interconnected through a connector or other suitable reference such that components are capable of labeling Lanes consistent with consecutive Lanes of the reference, inclusive of reference Lane 0. Components with fixed Lane ordering will interoperate with other compliant components that are flexible enough to also support other labelings. A simple example of increased flexibility would be to accommodate Lanes connected in reverse order. It is straightforward for a component to reverse Lane order upon receiving a Lane number of 0 on its most significant Lane, or upon failing to detect an active inbound Lane on its own Lane 0.

Example of Simple, Compliant Components:

Figure 4-15 illustrates the transitions in the training exchange of Link and Lane numbers between an upstream component's 5th downstream port, which supports Link widths of x32, x12, x4, or x1 only, and a downstream component's upstream port, which supports Link widths of x16, x8, x2, or x1 only. This is a worst-case scenario, with negotiation occurring at each step and x1 as the only common width; at each step, the next largest supported Link width is implicit in the next exchange in the sequence. The Link number is shown first (above) and the Lane number second (below), with K representing the K23.7 symbol; each transition triggers the next step in the negotiation. The upstream component's (downstream port's) Lane transitions are shown in white, and the downstream component's (upstream port's) Lane transitions are shown in gray.
Only the 16 Lanes that have successfully completed the bit synchronization, polarity reversal and symbol synchronization training are shown.

[Figure 4-15: Width Negotiation Example]
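The worst-case exchange in Figure 4-15 can be reproduced with a small model in which the two ports take turns proposing the largest width they support that does not exceed the current proposal, until both propose the same width. The alternation rule and the function name are assumptions for illustration; the real exchange is carried by Link and Lane number transitions in TS1/TS2 ordered-sets, not by integers.

```python
def negotiate_width(active_lanes, down_port_widths, up_port_widths):
    """Return (final_width, proposals), or (None, proposals) if no common width.

    down_port_widths: widths supported by the upstream component's downstream port
    up_port_widths:   widths supported by the downstream component's upstream port
    """
    sides = (down_port_widths, up_port_widths)
    proposals, limit, turn = [], active_lanes, 0
    while True:
        fits = [w for w in sides[turn] if w <= limit]
        if not fits:
            return None, proposals
        proposal = max(fits)  # largest width this side can form within the limit
        proposals.append(proposal)
        if len(proposals) >= 2 and proposals[-1] == proposals[-2]:
            return proposal, proposals  # both sides now agree on the same width
        limit, turn = proposal, 1 - turn  # the other side must fit inside this


final, steps = negotiate_width(16, {32, 12, 4, 1}, {16, 8, 2, 1})
print(final, steps)  # 1 [12, 8, 4, 2, 1, 1]
```

With 16 attached Lanes, each side can only ever shrink the proposal, so the loop always terminates; in this example it walks through x12, x8, x4, x2 and settles on x1, matching the "negotiation at each step" description.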


4.2.4.8.3. Port to Port Width Negotiation Example

The following text and diagrams show additional examples of two ports negotiating the width and Lane ordering of a Link.

Configuring groups of Lanes to form single logical Links is done via a negotiation process that modifies the values of the TS1 and TS2 symbols representing the Link number and Lane number for each Lane. The process can be viewed as the presentation of proposals and counter-proposals in the form of transitioning symbol values in discrete steps. Ordered-sets are symbol-synchronized across the Lanes that potentially make up a single logical Link, allowing the transitions of symbol values across Lanes to be examined together. The ordered-sets containing each proposal are repeated until a new counter-proposal is detected via the receipt of appropriate symbol transitions. The association of a Lane with a particular logical Link is indicated by its final Link number symbol, and its position (ordering) within the Link is indicated by its final Lane number symbol. Lanes that have not been included in any logical Link will have final symbol values identical to the pre-negotiation value of PAD.

Steps 1 and 2 establish the number of Links an upstream component (downstream port(s)) is attached to.

Steps 1 and 2 also begin (but do not necessarily complete) establishing the width of the resultant Link(s).


[Figure 4-16: Link Width Negotiation; Steps 1, 2]

In order to enter the configuration state, Lanes within a prospective Link have already exchanged TS1 ordered-sets and completed the bit synchronization, polarity inversion (if needed) and symbol synchronization functions. Prior to entering the configuration state, the Link number and Lane number fields have been set to PAD (K23.7), and TS1 ordered-sets are sent repeatedly.

Step 1:
Upon entering Config.RcvrCfg, the downstream port(s) start the Link width and Lane ordering negotiations by sending TS1 ordered-sets with a unique Link number on each set of Lanes that the component could support as a unique Link; the Lane numbers continue to be set to PAD.

Step 2:
Upon receipt of TS1 ordered-sets with Link numbers (non-PAD) present in the Link number field, the upstream port shall respond by choosing one of the Link numbers it received. By returning this one Link number, the upstream port indicates to the downstream port(s) the number of Links that are to be negotiated.

The upstream port responds with a Link number only on the Lanes on which it received a Link number and that it can support in one Link. A simple example: a port may


be designed to support a x32 Link. Only 16 of those Lanes may have been attached, and therefore TS1s are received on only 16 Lanes. The port may not support a x16 Link, but may support a x12 Link. In that case, the upstream port returns TS1 ordered-sets with a Link number only on the 12 Lanes that it is capable of supporting in a x12 Link, and with the Link number set to PAD on the 4 remaining Lanes. This is the first counter-proposal towards establishing the final Link width.

Additional notes on steps 1 and 2:

One method to create a cross-link (Section 1.6) is to connect a downstream port to another downstream port. One of two scenarios can occur: a.) the two ports choose different Link numbers to begin negotiations, or b.) the two ports choose the same Link number to begin negotiation. If a.) occurs, the rules are described as part of Step 3. If b.) occurs, the two ports will not yet be able to differentiate the Step 1 behavior of a cross-link condition from the Step 2 behavior of a normal upstream port.

Note: If a system designer connects two (or more) downstream ports on one upstream component that are capable of being aggregated into one Link (Link aggregation) in a cross-link to two downstream ports on a different upstream component, the configuration results are undefined.

[Figure 4-17: Link Width Negotiation; Steps 3, 4]

Steps 3 and 4 establish Lane ordering within each Link established in Steps 1 and 2. To find a supported Link width common to both components, Steps 3 and 4 continue to reduce the Link width by removing selected Lanes from the negotiation; Lanes never join a Link negotiation through these steps.
Returning a Lane's Link number to the value of PAD indicates removal; otherwise, Link numbers persist with the value assigned in Step 2.
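Step 2's first counter-proposal, returning the Link number on the Lanes that fit a supported width and PAD on the rest, can be sketched as below. PAD is modeled as None, the port is assumed to keep the lowest-numbered Lanes, and it simply picks the first Link number offered; the function name and these choices are hypothetical.

```python
PAD = None  # stands in for the K23.7 PAD symbol


def upstream_step2_response(received_link_numbers, supported_widths):
    """Return the Link number field this upstream port sends on each Lane.

    received_link_numbers: per-Lane Link number from Step 1 (PAD if none).
    supported_widths: Link widths this port can form as one Link.
    """
    offered = [n for n in received_link_numbers if n is not PAD]
    if not offered:
        return list(received_link_numbers)  # nothing to respond to yet
    chosen = offered[0]  # "one of the Link numbers it received"
    lanes_with_link = [i for i, n in enumerate(received_link_numbers) if n == chosen]
    fits = [w for w in supported_widths if w <= len(lanes_with_link)]
    width = max(fits) if fits else 0
    keep = set(lanes_with_link[:width])
    # Lanes outside the chosen width go back to PAD, removing them from the Link.
    return [chosen if i in keep else PAD for i in range(len(received_link_numbers))]


# x32-capable port, only 16 Lanes attached, supports x12 but not x16.
rx = [5] * 16 + [PAD] * 16
resp = upstream_step2_response(rx, {32, 12, 4, 1})
print(resp.count(5), resp[12:16])  # 12 [None, None, None, None]
```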


Step 3:
Upon receipt of TS1 ordered-sets with a Link number inserted in that field on each Lane, the downstream port transitions to Step 3, making its first proposal for Lane numbers within each group of Lanes with common received Link numbers.

Note 1 (Figure 4-17): In the event that a set of ports is connected to a single upstream port, those ports would all see the same Link number returned. This is the mechanism that allows those ports to be aggregated into one Link. If those ports are not capable of being aggregated into one Link, the upstream component must continue negotiation with only one of those ports and transition the Link number to PAD on the Lanes of the remaining ports, removing them from the negotiation process.

Note 2 (Figure 4-17): Components in a cross-link as described in scenario a.) above start negotiations by presenting different Link numbers to each other. Components designed to comprehend the cross-link condition implement the optional comparison of the two Link numbers. The component that receives a Link number smaller than the Link number it presented on its port assumes the role of an upstream port and transitions to Step 2. The other component, which receives a Link number greater than the Link number presented on its port, remains in Step 3 to await a further Link number transition matching its own.

Step 4:
Upon receipt of TS1 ordered-sets with Lane numbers present in the Lane number fields and a common Link number present in the Link number fields, the upstream port transitions to Step 4, asserting an appropriate set of Lane numbers. The upstream port should only counter-propose Lane numbers if it has a fixed ordering of its Lanes and the downstream port is connected in a reversed-Lane fashion; otherwise, Step 4 acknowledges the downstream port's proposal.
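The optional cross-link comparison in Note 2 and the Step 4 decision can be sketched as two small functions. The names, the string role labels, and the reversal test (the downstream port's proposal is the exact reverse of this port's fixed ordering) are assumptions made for illustration.

```python
def crosslink_role(presented_link_number, received_link_number):
    """Note 2: role of a downstream port that finds itself in a cross-link."""
    if received_link_number < presented_link_number:
        return "upstream port (go to Step 2)"
    if received_link_number > presented_link_number:
        return "downstream port (stay in Step 3)"
    return "undetermined (same Link number, scenario b)"


def step4_lane_numbers(proposed, fixed_ordering):
    """Step 4: acknowledge the proposal, or counter-propose for a reversed Link.

    proposed: Lane numbers received on this upstream port's physical Lanes 0..n-1.
    fixed_ordering: True if this port cannot reverse its own Lanes.
    """
    natural = list(range(len(proposed)))
    if fixed_ordering and proposed == natural[::-1]:
        return natural  # counter-propose this port's own fixed ordering
    return list(proposed)  # otherwise simply acknowledge the proposal


print(crosslink_role(3, 7))                                     # downstream port (stay in Step 3)
print(step4_lane_numbers([3, 2, 1, 0], fixed_ordering=True))    # [0, 1, 2, 3]
print(step4_lane_numbers([3, 2, 1, 0], fixed_ordering=False))   # [3, 2, 1, 0]
```

Because only the port that receives the smaller number backs down, the two sides of a scenario a.) cross-link always end up in complementary roles.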


[Figure 4-18: Link Width Negotiation; Steps 5, 6]

Steps 5 and 6 acknowledge the completed Link width and Lane sequence negotiation.

Step 5:
The downstream port transitions to sending TS2 ordered-sets with the Link number and the Lane numbers inserted in the defined fields. At this time, the downstream port also removes Lanes that have been removed from the negotiation by the upstream port.

Note 3 (Figure 4-17): If additional Lanes are removed from the negotiation, as may occur in the extreme mismatch of supported Link widths described in Step 4 of the rules, the downstream port can only transition to TS2 if a x1 Link is to be formed. Fewer additional Lanes removed, resulting in a x2 Link, require the downstream port to return to Step 3, remaining in TS1.

Step 6:
The upstream port transitions to sending TS2 ordered-sets when it receives TS2 ordered-sets with the agreed-upon Link number and the Lane ordering it last presented to the downstream port. At least Lane 0 is retained in this step; the upstream port removes Lanes that have been removed from the negotiation by the downstream port.

Step 7: (not shown in Figure 4-18)
As noted earlier, there is a case where two downstream ports have been negotiating with each other without realizing it until this step. This can occur in a cross-link when the two downstream ports both choose the same Link number to begin negotiation; both


<strong>PCI</strong> EXPRESS BASE SPECIFICATION, REV. 1.0ports present their Link number in their Step 1, receiving that same Link number in Step3. In Step 3, they both presented their preferred Lane numbers; if connected such thatthese align, Step 5 will not modify Lane numbers and simply transition to sending TS2ordered-sets. However, if one component is connected in reverse Lane fashion andboth ports support Lane reversal, Step 5 will cause both to accommodate the other. Themechanism to observe this is when a TS2 ordered-sets arrives with Lane numbers thatdo not match the Lane numbers being transmitted. The optional behavior to resolvethis conflict is to continue sending TS2 ordered-set with the agreed upon Link numberand conflicting Lane numbers. However, the Lane numbers now represent thetransmitter Lane number. The port must then disassociate its transmitter with a receiverand reverse the ordering of the receivers to match the Lane ordering of the other port.4.2.4.9. Lane-to-Lane De-skewLane-to-Lane de-skew shall be done across all Lanes within multi-Lane links. Anunambiguous de-skew mechanism is the COM symbol transmitted during training sequenceor skip ordered-sets across all Lanes within the Link (at what the transmitter believes is)simultaneously. Other de-skew mechanisms may also be employed. The receiver mustcompensate for the allowable skew between Lanes within a multi-Lane Link beforedelivering the data and control to the Data Link Layer.4.2.4.10. Lane vs. Link TrainingThe initialization Link training process builds unassociated Lanes on a device into associatedLanes that form a Link. This occurs during the first state of the configuration state machineConfig.RcvrCfg where the links are configured (e.g. width negotiation and optional Lanereversal). 
State machines prior to Config.RcvrCfg operate on a per-Lane basis; after Config.RcvrCfg, the operations are on a Link basis. For example, prior to Config.RcvrCfg the transmitter sends the specified data on all Lanes of the device; after the Config.RcvrCfg state, the transmitter sends the specified data on all Lanes of the configured Link.

4.2.5. Link Training and Status State Machine (LTSSM)

All timeout values specified in the Link training and status state machine (LTSSM) have a tolerance of minus 0 seconds and plus 50% unless explicitly stated otherwise. All timeout values must be set to the specified values after power-on/reset. All counter values must be set to the specified values after power-on/reset.

The LTSSM states are illustrated in Figure 4-19. These states are described in the following sections.

4.2.5.1. Detect

The purpose of this state is to detect when a far-end receiver is powered on in order to avoid transferring common mode between the transmitter and receiver.
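The minus 0 / plus 50% tolerance on LTSSM timeouts in Section 4.2.5 means a timer may expire anywhere from its nominal value up to 1.5 times that value, but never early. A minimal check, using a hypothetical helper name and millisecond inputs, could look like this:

```python
def timeout_ok(nominal_ms, actual_ms):
    """True if a timer that fired after actual_ms honors the -0/+50% tolerance."""
    # -0%: never before the nominal value; +50%: at most 1.5 x nominal.
    # The integer form 2*actual <= 3*nominal avoids floating-point rounding.
    return nominal_ms <= actual_ms and 2 * actual_ms <= 3 * nominal_ms


print(timeout_ok(64, 96))  # True: a 64 ms timeout may run as long as 96 ms
print(timeout_ok(64, 63))  # False: expiring early is not allowed
print(timeout_ok(2, 3))    # True: a 2 ms timeout may run as long as 3 ms
```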


4.2.5.2. Polling

The port transmits training ordered-sets and responds to the received training ordered-sets.
In this state, bit lock and symbol lock are established, Lane polarity is configured, and the Lane data rate is established.

4.2.5.3. Configuration

In Configuration, both the transmitter and receiver are sending and receiving data at the negotiated data rate. The Link configures width and Lane reversal and manages Lane-to-Lane skew within the Link.

4.2.5.4. Recovery

In Recovery, both the transmitter and receiver are sending and receiving data at the previously negotiated data rate. The port transmits training ordered-sets and responds to the received training ordered-sets. In this state, bit lock and symbol lock are re-established.

4.2.5.5. L0

L0 is the normal operational state, in which data and control packets can be transmitted and received.

4.2.5.6. L0s

L0s is intended as a power savings state.
L0s allows a Link to quickly enter and recover from a power conservation state without going through the Configuration or Recovery states.
Entry to L0s occurs after receiving an Electrical Idle ordered-set.
A transmitter and receiver Lane pair on a port are not required to both be in L0s simultaneously.

4.2.5.7. L1

L1 is intended as a power savings state.
The L1 state allows additional power savings over L0s at the cost of additional resume latency.
The receiver must be able to recover from this state within 64 µs, including reacquiring bit and symbol synchronization.
Entry to L1 occurs after being directed by the Data Link Layer and receiving an Electrical Idle ordered-set.


4.2.5.8. L2

Power can be aggressively conserved in L2. Most of the transmitter and receiver may be disabled.¹⁶ Main power and clocks are not guaranteed, but aux¹⁷ power is available.
An upstream port must be able to send, and a downstream port must be able to receive, a wakeup signal referred to as a Beacon.¹⁸
Entry to L2 occurs after being directed by the Data Link Layer and receiving an Electrical Idle ordered-set.

4.2.5.9. External Loopback

Loopback is intended strictly for testing and validation purposes. When a Link is in loopback, the symbols received are "looped back" to the transmitter on the same Lane.
A Loopback master is the component requesting loopback.
A Loopback slave is the component looping back the data.
Loopback is entered whenever two consecutive TS1 or TS2 ordered-sets are received with the Loopback bit set.
Loopback is exited by sending an Electrical Idle ordered-set followed by Electrical Idle.

4.2.5.10. Disabled

In Disabled, the receiver terminations must remain enabled and the transmitter is in a high impedance Electrical Idle state.
The Receiver Detection sequence (see Section 4.3.1.8) is allowed while in the Disabled state if desired.
Disabled is entered when directed by the Data Link Layer.

4.2.5.11. Link Control Reset

Link Control Reset is entered when directed or when two consecutive TS1 or TS2 ordered-sets are received with the Reset bit set.

16 The exception is the receiver termination, which must remain in a low impedance state.
17 In this context, "aux" power means a power source which can be used to drive the Beacon and Receiver Detection circuitry.
18 A device generates Beacons in order to wake a system that is in D3cold. See Section 4.3.2.4 for information on the electrical requirements of the Beacon. Refer to Chapter 6 for more information on how a device may use the Beacon as the wake mechanism.


4.2.6. Link Training and Status State Descriptions

[Figure 4-19: Main State Diagram for Link Training and Status State Machine (states: Detect, Polling, Configuration, Recovery, L0, L0s, L1, L2, Disabled, External Loopback, Link Control Reset)]

4.2.6.1. Detect

4.2.6.1.1. Detect.Quiet
• Transmitter is in a high impedance Electrical Idle state.
• Lane number and Link number are initialized to K23.7.
• Generation 1 data rate is selected.
• LinkUp = 0 (status is cleared).
• Next state is Detect.Active after a 64 ms timeout.


4.2.6.1.2. Detect.Active
• The transmitter performs a high impedance Receiver Detection sequence (see Section 4.3.1.8 for more information).
• Next state is Detect.Charge if a receiver is detected.
• Next state is Detect.Quiet if a receiver is not detected.

4.2.6.1.3. Detect.Charge
• Transmitter is in a high impedance Electrical Idle state.
• Next state is Polling after a 64 ms timeout or when the operating DC common mode voltage is stable and within specification.¹⁹

[Figure 4-20: Detect Sub-State Machine]

19 The common mode being driven must meet the Absolute Delta Between DC Common Mode During L0 and Electrical Idle (V_TX-CM-DC-ACTIVE-IDLE-DELTA) specification (see Table 4-4).
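The three Detect sub-states above can be sketched as a single step function. This is a hypothetical software model: the caller supplies the elapsed time in the current sub-state and the results of the Receiver Detection sequence and common mode check, which in hardware come from timers and analog circuitry.

```python
def detect_next(state, elapsed_ms=0, receiver_detected=False, cm_stable=False):
    """Evaluate one step of the Detect sub-state machine.

    elapsed_ms: time spent in the current sub-state (caller-maintained timer).
    receiver_detected: result of the Receiver Detection sequence.
    cm_stable: DC common mode voltage is stable and within specification.
    """
    if state == "Detect.Quiet":
        return "Detect.Active" if elapsed_ms >= 64 else state
    if state == "Detect.Active":
        return "Detect.Charge" if receiver_detected else "Detect.Quiet"
    if state == "Detect.Charge":
        return "Polling" if elapsed_ms >= 64 or cm_stable else state
    raise ValueError(f"not a Detect sub-state: {state!r}")


# One successful pass: wait out Quiet, detect a receiver, charge, exit.
s = detect_next("Detect.Quiet", elapsed_ms=64)
s = detect_next(s, receiver_detected=True)
s = detect_next(s, cm_stable=True)
print(s)  # Polling
```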


4.2.6.2. Polling

4.2.6.2.1. Polling.Quiet
• Transmitter is in Electrical Idle.
• A Receiver Detection sequence (see Section 4.3.1.8 for more information) is performed.
  o If no receiver is present, the next state is Detect.
• LinkUp = 0 (status is cleared).
• Next state is Polling.Configuration if a single TS1 or TS2 ordered-set or their complement is received.
• Next state is Polling.Active after a minimum of 64 ms.

4.2.6.2.2. Polling.Active
• Transmitter sends a minimum of 1024 consecutive TS1 ordered-sets on all Lanes.
  o Note: This guarantees a minimum of 64 µs for the bit lock time at generation 1 data rates.
• Next state is Polling.Configuration if a single TS1 or TS2 ordered-set or their complement is received.
• Next state is Polling.Compliance if the transmitter has entered Polling.Active 32 consecutive times without receiving a single TS1 or TS2 ordered-set and the receiver has never detected an exit from Electrical Idle after the first time entering Polling.
  o Note: The compliance mode is entered only if no signal was detected at any receiver on a Link since the time of reset.
• Next state is Polling.Quiet if the transmitter sends 1024 TS1 ordered-sets without receiving a single TS1 or TS2 ordered-set.

4.2.6.2.3. Polling.Compliance
• 8b/10b encoder is set to positive disparity.
• Transmitter sends out the compliance pattern (see Section 4.2.8).
• Next state is Polling.Active if Electrical Idle is no longer detected at the receiver.
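The 64 µs bit lock figure in Polling.Active can be checked with simple arithmetic. This section does not state the TS1 length or the Generation 1 bit rate, so the sketch assumes a 16-symbol TS1 ordered-set, 10 bits per 8b/10b symbol, and 2.5 Gb/s signaling (400 ps per bit); integer picoseconds keep the result exact.

```python
BIT_TIME_PS = 400       # assumed 2.5 Gb/s Generation 1 signaling -> 400 ps per bit
BITS_PER_SYMBOL = 10    # one 8b/10b code group
SYMBOLS_PER_TS1 = 16    # assumed TS1 ordered-set length in symbols


def ts1_burst_ns(count):
    """Time to send `count` back-to-back TS1 ordered-sets on one Lane, in ns."""
    return count * SYMBOLS_PER_TS1 * BITS_PER_SYMBOL * BIT_TIME_PS // 1000


t = ts1_burst_ns(1024)
print(t, t >= 64_000)  # 65536 True
```

Under these assumptions, 1024 TS1 ordered-sets take 65.536 µs, which is consistent with the "minimum of 64 µs" note.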


4.2.6.2.4. Polling.Configuration
• Receiver inverts polarity if necessary (see Section 4.2.4.2).
• Transmitter sends TS1 ordered-sets on the port. At least 16 TS1 ordered-sets are sent after receiving one TS1 or TS2 ordered-set.
• Next state is Configuration if eight consecutive TS1 or TS2 ordered-sets are received and no higher data rate is supported.
  o Otherwise, next state is Polling.Speed if eight consecutive TS1 or TS2 ordered-sets are received.
• Otherwise, next state is Polling.Active after a 2 ms timeout.

4.2.6.2.5. Polling.Speed
• The transmitter enters Electrical Idle for a minimum of T_TX-IDLE-MIN (see Table 4-4).
• Data rate is changed to the highest common data rate supported in the training sequence (see Section 4.2.4.1).
• Transmitter sends a minimum of 1024 consecutive TS1 ordered-sets on all Lanes.
  o Note: This guarantees a minimum bit lock time.
• Next state is Configuration.

[Figure 4-21: Polling Sub-State Machine]
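The exit decision from Polling.Configuration, together with the "highest common data rate" step of Polling.Speed, can be sketched as below. Reading "no higher data rate is supported" as "no common rate above the current one" is an interpretation, and the rate values in the tests (2.5 and a hypothetical second rate of 5.0) are placeholders.

```python
def highest_common_rate(local_rates, remote_rates):
    """Polling.Speed: highest data rate advertised by both sides (None if none)."""
    common = set(local_rates) & set(remote_rates)
    return max(common) if common else None


def polling_configuration_next(consecutive_tsx, current_rate, local_rates,
                               remote_rates, elapsed_ms):
    """Next state out of Polling.Configuration.

    consecutive_tsx: consecutive TS1/TS2 ordered-sets received so far.
    elapsed_ms: time spent in Polling.Configuration.
    """
    if consecutive_tsx >= 8:
        best = highest_common_rate(local_rates, remote_rates)
        # Only detour through Polling.Speed when a higher common rate exists.
        if best is not None and best > current_rate:
            return "Polling.Speed"
        return "Configuration"
    if elapsed_ms >= 2:
        return "Polling.Active"
    return "Polling.Configuration"


print(polling_configuration_next(8, 2.5, [2.5], [2.5], 0.3))  # Configuration
```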


4.2.6.3. Configuration

4.2.6.3.1. Config.RcvrCfg
• Transmitter sends TS1 ordered-sets on the Lanes. At least 16 TS1 ordered-sets are sent after receiving one TS1 or TS2 ordered-set.
  o Note: All Lanes must have achieved bit and symbol lock by this state, as ensured by Polling.
• Link width and Lane reversal are performed as described in Section 4.2.4.8.
• Note: If some Lanes do not configure successfully, they may be disabled or may be returned to Polling.
  o Note: Disabled Lanes should be re-enabled if any active Lanes within the same Link enter Detect.
  o Note: All Lanes on a configured Link must operate at the same data rate.
• Next state is Config.Idle if a receiver negotiates a valid configuration and receives eight consecutive TS1 or TS2 ordered-sets on all configured Lanes.

Otherwise, the data rate that the port indicates it supports is dropped to the next lower data rate and the next state is Polling. See Section 4.2.4.7 for information on data rate negotiation.

4.2.6.3.2. Config.Idle
• Transmitter sends Idle data symbols on all configured Lanes. At least 16 Idle data symbols are sent after receiving one Idle data symbol.
• Receiver waits for Idle data.
• LinkUp = 1 (status is set true).
• Next state is L0 if eight consecutive symbol times of Idle data are received on all configured Lanes.
• Otherwise, after a minimum 2 ms timeout, the data rate that the port indicates it supports is dropped to the next lower data rate and the next state is Polling. See Section 4.2.4.7 for information on data rate negotiation.
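The Config.Idle exit rule, including the drop to the next lower advertised data rate on timeout, can be sketched as follows. The function names are hypothetical, and returning None when no lower rate remains is an assumption; this section does not say what happens in that case.

```python
def next_lower_rate(advertised, current):
    """Next lower advertised data rate, or None if none is left (assumed)."""
    lower = [r for r in advertised if r < current]
    return max(lower) if lower else None


def config_idle_next(idle_run_per_lane, elapsed_ms, advertised, current_rate):
    """Evaluate Config.Idle; returns (next_state, rate_to_advertise).

    idle_run_per_lane: consecutive symbol times of Idle data seen on each
    configured Lane.
    """
    if all(run >= 8 for run in idle_run_per_lane):
        return "L0", current_rate
    if elapsed_ms >= 2:
        return "Polling", next_lower_rate(advertised, current_rate)
    return "Config.Idle", current_rate


print(config_idle_next([8, 9, 12, 8], 0.5, [2.5], 2.5))  # ('L0', 2.5)
```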


[Figure 4-22: Configuration Sub-State Machine]

4.2.6.4. Recovery

4.2.6.4.1. Recovery.RcvrCfg
• Transmitter sends TS2 ordered-sets on all configured Lanes. At least 16 TS2 ordered-sets are sent after receiving one TS2 ordered-set.
• Next state is Recovery.Idle if 8 consecutive TS2 ordered-sets are received on all configured Lanes.
• Otherwise, after 2 ms, an error is reported to the Data Link Layer and the next state is Polling.

4.2.6.4.2. Recovery.Idle
• Transmitter sends Idle data (minimum of 16 symbol times) on configured Lanes.
• Receiver waits for Idle data.
• Next state is L0 if eight consecutive symbol times of Idle data are received on all configured Lanes.
• Otherwise, after 2 ms, an error is reported to the Data Link Layer and the next state is Polling.
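The "8 consecutive TS2 ordered-sets on all configured Lanes" condition in Recovery.RcvrCfg needs an independent run counter per Lane that resets whenever anything other than a TS2 arrives. A minimal, hypothetical model, with ordered-sets represented as strings:

```python
class ConsecutiveCounter:
    """Counts consecutive TS2 ordered-sets per configured Lane."""

    def __init__(self, lanes, target=8):
        self.runs = [0] * lanes
        self.target = target

    def observe(self, lane, ordered_set):
        # Any other ordered-set breaks the run on that Lane only.
        self.runs[lane] = self.runs[lane] + 1 if ordered_set == "TS2" else 0

    def satisfied(self):
        """True once every configured Lane has seen `target` TS2s in a row."""
        return all(r >= self.target for r in self.runs)


c = ConsecutiveCounter(lanes=2)
for _ in range(8):
    c.observe(0, "TS2")
    c.observe(1, "TS2")
print(c.satisfied())  # True
```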


[Figure 4-23: Recovery Sub-State Machine]

4.2.6.5. L0

This is the normal operational state.
• Transmitter and receiver are enabled in a low impedance state.
• Next state is Recovery if a TS1 or TS2 ordered-set is received.
• Next state is Recovery if directed to this state.
• Next state is Polling if directed to this state.
• Next state is Detect if directed to this state.
• Next state is L0s if the receiver detects an Electrical Idle ordered-set.
• Next state of the transmitter is L0s if directed to this state.
• Next state is L1 if the receiver detects an Electrical Idle ordered-set and is directed to this state.
• Next state is L2 if the receiver detects an Electrical Idle ordered-set and is directed to this state.
• Next state is Link Control Reset if directed to this state.
• Next state is Disabled if directed to this state.
• Next state is External Loopback if directed to this state.
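The L0 exit conditions above can be collected into one decision function. The list does not define a priority between simultaneous conditions, so the order in which they are tested here is an assumption, as are the argument names.

```python
# States entered from L0 purely because higher layers direct it.
DIRECTED_NOW = {"Recovery", "Polling", "Detect", "Link Control Reset",
                "Disabled", "External Loopback", "L0s"}


def l0_next_state(directed=None, rx_eios=False, rx_ts=False):
    """Return the next LTSSM state from L0.

    directed: state requested by higher layers, or None.
    rx_eios: receiver detected an Electrical Idle ordered-set.
    rx_ts: a TS1 or TS2 ordered-set was received.
    """
    if directed in ("L1", "L2"):
        # L1/L2 also require the Electrical Idle ordered-set to be received.
        return directed if rx_eios else "L0"
    if directed in DIRECTED_NOW:
        return directed  # a directed "L0s" applies to the transmitter side
    if rx_ts:
        return "Recovery"
    if rx_eios:
        return "L0s"  # receiver side enters L0s
    return "L0"


print(l0_next_state(directed="L1"))                # L0: still waiting for EIOS
print(l0_next_state(directed="L1", rx_eios=True))  # L1
```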


4.2.6.6. L0s

4.2.6.6.1. Receiver L0s

4.2.6.6.1.1. Rx_L0s.Entry
• Next state is Rx_L0s.Idle after a T_TX-IDLE-MIN (Table 4-4) timeout.

4.2.6.6.1.2. Rx_L0s.Idle
• Next state is Rx_L0s.FTS if the receiver detects an exit from Electrical Idle.

4.2.6.6.1.3. Rx_L0s.FTS
• Receiver locks to the incoming bit stream and acquires symbol alignment.
• Next state is Recovery if the receiver does not detect bit and symbol alignment within the N_FTS duration on all Lanes of the Link.
• Otherwise, if bit and symbol lock is obtained, the next state is L0.

4.2.6.6.2. Transmitter L0s

4.2.6.6.2.1. Tx_L0s.Entry
• Transmitter is in Electrical Idle.
• Next state is Tx_L0s.Idle after a T_TX-IDLE-MIN (Table 4-4) timeout.

4.2.6.6.2.2. Tx_L0s.Idle
• Next state is Tx_L0s.FTS if directed.

4.2.6.6.2.3. Tx_L0s.FTS
• Transmitter sends N_FTS Fast Training Sequences.
• Transmitter sends a single SKP ordered-set on all configured Lanes.
• Next state is L0.
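The symbol stream one Lane sends in Tx_L0s.FTS, and how long it takes, can be sketched as below. The SKP ordered-set (COM followed by three SKP symbols) is as defined in Section 4.2.7; the FTS composition (COM followed by three FTS symbols) and the 4 ns Generation 1 symbol time are assumptions not stated in this section.

```python
SKP_ORDERED_SET = ["COM", "SKP", "SKP", "SKP"]   # as defined in Section 4.2.7
FTS_ORDERED_SET = ["COM", "FTS", "FTS", "FTS"]   # assumed FTS composition
SYMBOL_TIME_NS = 4                               # assumed: 10 bits x 400 ps


def tx_l0s_fts_stream(n_fts):
    """Symbols one Lane sends in Tx_L0s.FTS before returning to L0."""
    return FTS_ORDERED_SET * n_fts + SKP_ORDERED_SET


def fts_exit_time_ns(n_fts):
    """Duration of the whole Tx_L0s.FTS stream under the assumptions above."""
    return len(tx_l0s_fts_stream(n_fts)) * SYMBOL_TIME_NS


stream = tx_l0s_fts_stream(2)
print(stream[-4:], fts_exit_time_ns(2))  # ['COM', 'SKP', 'SKP', 'SKP'] 48
```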


[Figure 4-24: L0s Sub-State Machine (panels: L0s: Receiver, L0s: Transmitter)]


4.2.6.7. L1

4.2.6.7.1. L1.Entry
• Transmitter is in Electrical Idle.
• Receiver waits for at least the Electrical Idle T_TX-IDLE-SET-TO-IDLE time given in Table 4-4.
• The next state is L1.Idle after a T_TX-IDLE-MIN (Table 4-4) timeout.
• Next state is L1.Quiet.

4.2.6.7.2. L1.Idle
• Transmitter is in Electrical Idle.
• Next state is Recovery if directed or if the receiver detects an exit from Electrical Idle.

[Figure 4-25: L1 Sub-State Machine]


4.2.6.8. L2

4.2.6.8.1. L2.Idle
• Transmitter is in a high impedance Electrical Idle state for a minimum of 64 ms.
• Next state is Polling if detection of a Beacon occurs on Lane 0.
• Next state is L2.Detect if directed to transmit a Beacon.

4.2.6.8.2. L2.Detect
• Transmitter is in a high impedance Electrical Idle state for a minimum of 64 ms.
• A high impedance Receiver Detection sequence is performed (see Section 4.3.1.8 for more information).
• Next state is L2.TransmitWake if a receiver is detected.
• Next state is Detect if a receiver is not detected.

4.2.6.8.3. L2.TransmitWake
• Transmitter is in a high impedance Electrical Idle state for a minimum of 64 ms.
• The transmitter transmits the Beacon on at least Lane 0 of the Link (refer to Section 4.3.2.4).
• Next state is Polling if Electrical Idle is exited on any incoming receiver Lane.

[Figure 4-26: L2 Sub-State Machine]
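The L2 wake path can be modeled with a step function in the same style as the Detect sketch. It is a hypothetical model: the inputs are caller-supplied flags, and the 64 ms minimum Electrical Idle residency in each sub-state is not modeled.

```python
def l2_next(state, beacon_on_lane0=False, directed_wake=False,
            receiver_detected=False, rx_idle_exit=False):
    """One evaluation of the L2 sub-state machine."""
    if state == "L2.Idle":
        if beacon_on_lane0:
            return "Polling"  # woken by the far end's Beacon
        return "L2.Detect" if directed_wake else "L2.Idle"
    if state == "L2.Detect":
        return "L2.TransmitWake" if receiver_detected else "Detect"
    if state == "L2.TransmitWake":
        # Keep sending the Beacon until any receiver Lane leaves Electrical Idle.
        return "Polling" if rx_idle_exit else "L2.TransmitWake"
    raise ValueError(f"not an L2 sub-state: {state!r}")


s = l2_next("L2.Idle", directed_wake=True)
s = l2_next(s, receiver_detected=True)
s = l2_next(s, rx_idle_exit=True)
print(s)  # Polling
```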


4.2.6.9. Disabled
• Entrance to and exit from this state occur only when directed.
• Transmitter sends between 4 and 16 TS1 ordered-sets with the Disable bit set.
• Transmitter then goes into a high impedance Electrical Idle state.
• Next state is Detect when directed.

4.2.6.10. Loopback

This mode is intended for test and fault isolation use only and is not a normal operational mode. Only the entry and exit behavior is specified. All other details are implementation specific.

4.2.6.10.1. Loopback.Active
• The Loopback Slave must receive valid 8b/10b data. If SKP ordered-sets are received, they are also looped back to the Loopback Master. SKP symbols may be added or removed by the Loopback Slave as needed.
• The Loopback Slave re-transmitter sends the 10-bit data as received. If the received data was not valid 8b/10b, the transmitter sends back the special EDB control character in place of the invalid character.
  o Note: The Loopback Slave must transmit the data with the same disparity as was received.
• Next state is Loopback.Exit when an Electrical Idle ordered-set is received by the Loopback Slave after the Electrical Idle ordered-set is transmitted back to the Loopback Master.
• Next state is Loopback.Exit if Electrical Idle is detected continuously for a minimum of 1 UI at the Loopback Slave receiver.

4.2.6.10.2. Loopback.Exit
• Transmitter is in Electrical Idle.
• Next state is Detect.
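The Loopback Slave's echo rule (retransmit each 10-bit code group as received, substituting EDB for anything that is not valid 8b/10b) can be sketched as below. The validity check is a caller-supplied, hypothetical predicate; the toy set used here contains only the four compliance-pattern code groups from Section 4.2.8, and "EDB" is a symbolic placeholder rather than its actual 10-bit encoding.

```python
def loopback_slave(code_groups, is_valid):
    """Loop received 10-bit code groups back, substituting EDB for invalid ones.

    Valid groups are echoed bit-for-bit, so their disparity is preserved.
    """
    return [cg if is_valid(cg) else "EDB" for cg in code_groups]


# Toy validity check built only from the compliance-pattern code groups.
KNOWN = {"0011111010", "1010101010", "1100000101", "0101010101"}
echo = loopback_slave(["0011111010", "1111111111", "1010101010"], KNOWN.__contains__)
print(echo)  # ['0011111010', 'EDB', '1010101010']
```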


[Figure 4-27: Loopback State Machine]

4.2.6.11. Link Control Reset

4.2.6.11.1. Link Control Reset Active
• Link enters the reset state and transmits a minimum of 1024 TS1 ordered-sets with the Reset bit set on all downstream ports.
• All transmitters on upstream ports transmit one Electrical Idle ordered-set, then enter Electrical Idle.
• Next state is Polling.

4.2.7. Clock Tolerance Compensation

Skip ordered-sets (defined below) are used to compensate for differences in frequencies between bit rates at the two ends of a Link. The receiver Physical Layer logical sub-block must include elastic buffering which performs this compensation. The interval between skip ordered-set transmissions is derived from the absolute value of the transmit and receive clock frequency difference specified in Table 4-4. Having worst-case clock frequencies at the limits of the specified tolerance will result in a 600 ppm difference between the transmit and receive clocks of a Link. As a result, the transmit and receive clocks can shift one clock every 1666 clocks.
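The 1666-clock figure follows directly from the 600 ppm worst case: 10^6 / 600 is about 1666.7. The same arithmetic shows how much drift accumulates over the longest scheduled skip interval of 1538 symbol times from the transmitter rules below. The helper names are hypothetical.

```python
def symbols_per_slip(ppm):
    """Whole symbol times before two clocks `ppm` apart drift by one symbol."""
    return 1_000_000 // ppm


def drift_symbols(interval_symbols, ppm):
    """Drift, in symbols, accumulated over `interval_symbols` at `ppm` offset."""
    return interval_symbols * ppm / 1_000_000


print(symbols_per_slip(600))     # 1666
print(drift_symbols(1538, 600))  # 0.9228
```

At the maximum scheduling interval, the accumulated drift stays below one symbol between consecutive skip ordered-sets.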


Rules for Transmitters
• All Lanes shall transmit symbols at the same frequency (the difference between bit rates is 0 ppm within all multi-Lane Links).
• When transmitted, the skip ordered-set shall be transmitted simultaneously on all Lanes of a multi-Lane Link (see Section 4.2.4.9 and Table 4-4 for the definition of simultaneous in this context).
• The transmitted skip ordered-set is one COM symbol followed by three consecutive SKP symbols.
• The skip ordered-set shall be scheduled for insertion at an interval between 1180 and 1538 symbol times.
• Scheduled skip ordered-sets shall be transmitted if a packet or ordered-set is not already in progress; otherwise, they are accumulated and then inserted consecutively at the next packet or ordered-set boundary.

Rules for Receivers
• Receivers shall recognize a received skip ordered-set consisting of one COM symbol followed consecutively by one to five SKP symbols.
• Receivers shall tolerate receiving and processing skip ordered-sets at an average interval between 1180 and 1538 symbol times.
• Receivers shall tolerate receiving and processing consecutive skip ordered-sets.
• Receivers shall tolerate receiving and processing skip ordered-sets separated from each other by at most 5664 symbol times, measured as the distance between the leading COM symbols.
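Both rule lists can be sketched in a few lines: a receiver check for COM followed by one to five SKP symbols, and a transmitter scheduler that accumulates skip ordered-sets falling due during a packet and releases them together at the next boundary. Symbols are modeled as strings, `busy` is a hypothetical per-symbol-time flag, and the scheduler ignores the time the inserted symbols themselves occupy.

```python
def is_skip_ordered_set(symbols):
    """Receiver rule: one COM followed by one to five SKP symbols."""
    return (2 <= len(symbols) <= 6 and symbols[0] == "COM"
            and all(s == "SKP" for s in symbols[1:]))


def schedule_skips(busy, interval=1180):
    """Transmitter rule: return (symbol_time, count) insertion events.

    busy[t] is True while a packet or ordered-set is in progress at symbol
    time t. A skip ordered-set falls due every `interval` symbol times; any
    that fall due while busy are accumulated and inserted back-to-back at the
    next boundary. Simplification: inserted symbols do not shift `busy`.
    """
    if not 1180 <= interval <= 1538:
        raise ValueError("interval must be between 1180 and 1538 symbol times")
    events, pending = [], 0
    for t, in_progress in enumerate(busy):
        if t and t % interval == 0:
            pending += 1  # one more skip ordered-set is now due
        if pending and not in_progress:
            events.append((t, pending))  # insert all accumulated ones together
            pending = 0
    return events


# A long packet covers two due points; both are inserted at its end.
print(schedule_skips([False] * 1100 + [True] * 1400 + [False] * 100))  # [(2500, 2)]
```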


4.2.8. Compliance Pattern
During Polling, the compliance substate of the Polling state machine may be entered (see Section 4.2.5.3). The compliance pattern consists of the sequence of 8b/10b symbols K28.5, D21.5, K28.5, and D10.2, repeating. The current running disparity must be set to negative before the first symbol is sent.
The compliance pattern is not entered if the receiver has previously detected an exit from Electrical Idle.
The compliance sequence is (Current Disparity: 0 = negative, 1 = positive):
Symbol              K28.5        D21.5        K28.5        D10.2
Current Disparity   0            1            1            0
Pattern             0011111010   1010101010   1100000101   0101010101
For any given device that has multiple Lanes, every fourth Lane is delayed by a total of 4 symbols. A 2 symbol delay occurs at both the beginning and the end of the 4 symbol sequence, for a total of 8 symbols.
This delay sequence on every fourth Lane is then:
Symbol: D D K28.5 D21.5 K28.5 D10.2 D D
where each D is a delay symbol. The two D symbols are identical, so that running disparity is preserved after both have been sent. Examples of suitable D symbols are K28.5 and D10.2.
After the 8 symbols are sent, the delay symbols are advanced to the next Lane and the process is repeated.
This looks like:
Lane 0   D       D       K28.5-  D21.5   K28.5+  D10.2   D       D       K28.5-  D21.5   K28.5+  D10.2
Lane 1   K28.5-  D21.5   K28.5+  D10.2   K28.5-  D21.5   K28.5+  D10.2   D       D       K28.5-  D21.5
Lane 2   K28.5-  D21.5   K28.5+  D10.2   K28.5-  D21.5   K28.5+  D10.2   K28.5-  D21.5   K28.5+  D10.2
Lane 3   K28.5-  D21.5   K28.5+  D10.2   K28.5-  D21.5   K28.5+  D10.2   K28.5-  D21.5   K28.5+  D10.2
Lane 4   D       D       K28.5-  D21.5   K28.5+  D10.2   D       D       K28.5-  D21.5   K28.5+  D10.2
Lane 5   K28.5-  D21.5   K28.5+  D10.2   K28.5-  D21.5   K28.5+  D10.2   D       D       K28.5-  D21.5
Key:
K28.5-   Comma when disparity is negative, specifically "0011111010"
K28.5+   Comma when disparity is positive, specifically "1100000101"
D21.5    Out of phase data character, specifically "1010101010"
D10.2    Out of phase data character, specifically "0101010101"
D        Delay character
This sequence of delays ensures that the maximum possible interference effects of adjacent Lanes occur when the compliance pattern is measured.
The compliance pattern is exited only if an Electrical Idle Exit condition is detected at the receiver or if a physical reset occurs.
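The per-Lane listing can be reproduced with a short generator. This is a sketch under one interpretation: "advanced to the next Lane" is read as a round-robin over the four Lane groups (lane % 4), which is what the Lane 0 to Lane 5 listing shows. Symbols are plain string labels rather than the output of an 8b/10b encoder, and "D" stands in for whichever delay symbol an implementation chooses.

```python
PATTERN = ["K28.5-", "D21.5", "K28.5+", "D10.2"]
DELAYED_BLOCK = ["D", "D"] + PATTERN + ["D", "D"]  # 8 Symbol Times, 4 pattern symbols
NORMAL_BLOCK = PATTERN * 2                         # 8 Symbol Times, 8 pattern symbols

# 10-bit code groups taken from the Key
CODE_GROUPS = {
    "K28.5-": "0011111010",
    "D21.5": "1010101010",
    "K28.5+": "1100000101",
    "D10.2": "0101010101",
}


def lane_symbols(lane: int, blocks: int) -> list:
    """Symbols on `lane` over `blocks` windows of 8 Symbol Times.

    In window b, Lanes with lane % 4 == b % 4 send the delay sequence, so the
    delay moves to the next Lane after every 8 symbols.
    """
    out = []
    for b in range(blocks):
        out += DELAYED_BLOCK if lane % 4 == b % 4 else NORMAL_BLOCK
    return out


def disparity(bits: str) -> int:
    """Ones minus zeros in a code group (positive means more ones)."""
    return bits.count("1") - bits.count("0")


for lane in range(6):
    print(f"Lane {lane}", " ".join(lane_symbols(lane, 2)[:12]))
print(sum(disparity(CODE_GROUPS[s]) for s in PATTERN))  # 0
```

The final line confirms that one K28.5-, D21.5, K28.5+, D10.2 cycle is DC balanced (20 ones in 40 bits), which is why the running disparity returns to negative at the start of each repetition.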


4.3. Electrical Sub-Block
The Electrical sub-block contains a Transmitter and a Receiver. The Transmitter is supplied by the Logical sub-block with Symbols, which it serializes and transmits onto a Lane. The Receiver is supplied with serialized Symbols from the Lane. It transforms the electrical signals into a bit stream, which is de-serialized and supplied to the Logical sub-block along with a Link clock recovered from the incoming serial stream.
4.3.1. Electrical Sub-Block Requirements
4.3.1.1. Clocking Dependencies
The ports on the two ends of a Link must transmit data at a rate that is within 600 parts per million (ppm) of each other at all times. This is specified to allow bit rate clock sources with a +/- 300 ppm tolerance.
4.3.1.1.1. Spread Spectrum Clock (SSC) Sources
The data rate can be modulated from +0% to -0.5% of the nominal data rate frequency, at a modulation rate in the range of 30 kHz to 33 kHz. The +/- 300 ppm requirement still holds, which requires the two communicating ports to be modulated such that they never exceed a total difference of 600 ppm. For most implementations, this requires both ports to use the same bit rate clock source when the data is modulated with SSC.
4.3.1.2. AC Coupling
Each Lane of a Link must be AC coupled. The minimum and maximum values for the capacitance are given in Table 4-4. The requirement for the inclusion of AC coupling capacitors on the interconnect media is associated with the transmitter.
4.3.1.3. Interconnect
In the context of this specification, the interconnect consists of everything between the pins of a transmitter package and the pins of a receiver package. Often, this will consist of traces on a printed circuit board or other suitable medium, AC coupling capacitors, and perhaps connectors.
Regardless of what physically makes up the interconnect, the total capacitance of the interconnect seen by the receiver detection circuit (see Section 4.3.1.8) must not exceed 3 nF.
4.3.1.4. Termination
Low and high impedance states are defined for both the transmitter and the receiver and are listed in Table 4-4 and Table 4-5.


The only time the transmitter high impedance state is required is to initialize and maintain Electrical Idle during times when a hot plug/removal or asynchronous power-up event could occur.[20] The transmitter low impedance state is required any time differential data is to be sent.
The only time the receiver must be in a high impedance state is when the receiver does not have power. Otherwise, the receiver must always be in a low impedance state.
4.3.1.5. DC Common Mode
The receiver DC common mode is always 0 V during all states.
The transmitter DC common mode is initially established during Detect and is held at the same value during all subsequent states.
4.3.1.6. ESD
All signal and power pins must withstand 2000 V of ESD using the human body model (Class 2 per JEDEC JESD22-A114-A) and 800 V using the charged device model without damage.
This ESD protection mechanism also protects the powered-down receiver from potential common mode transients during some possible reset or surprise insertion situations.
4.3.1.7. Short Circuit Requirements
All Transmitters and Receivers must support surprise hot insertion/removal without damage to the component. The transmitter and receiver must be capable of withstanding a sustained short circuit to ground of D+ and D-.
4.3.1.8. Receiver Detection
The receiver detection sequence is used to avoid unwanted common mode transfers between the receiver and transmitter.
Receiver detection can be performed in either a low or a high impedance state unless explicitly specified otherwise.
[20] Any time high impedance is required, it is explicitly stated in the Link Training and Status State Machine (LTSSM).


The behavior of the receiver detection sequence is described below.
Step 1. The transmitter is in a stable Electrical Idle state.
Step 2. The transmitter changes the common mode voltage on both the D+ and D- lines[21] to a different value.
   a. A receiver is detected based on the rate[22] at which the lines change to the new voltage.
      i. The receiver is not present if the voltage at the transmitter charges at a rate dictated by the transmitter impedance and the capacitance of the interconnect.
      ii. The receiver is present if the voltage at the transmitter charges at a rate dictated by the transmitter impedance, the series capacitor, the interconnect capacitance, and the receiver termination.
The AC capacitance of the worst-case transmission line must not exceed 3 nF total. The minimum and maximum AC capacitance for the AC coupling capacitors is given in Table 4-4.
4.3.1.9. Disable/Surprise Removal Detection
Two separate events may signal that one end of a Link has either been disabled or disconnected:
1. During L0, Electrical Idle is detected without the Electrical Idle ordered-set having been received. The Link immediately enters Detect.
2. During Electrical Idle and at certain specified times, the transmitter must poll for the presence of a powered receiver as described in Section 4.3.1.8. If a receiver is no longer present, the Link immediately enters Detect.
4.3.1.10. Electrical Idle
Electrical Idle is a steady state condition in which the Transmitter and Receiver voltages are held constant. Electrical Idle is primarily used for power saving and common mode initialization.
Before a transmitter enters Electrical Idle, it must send the Electrical Idle ordered-set, a K28.5 (COM) followed by three K28.3 (IDL) (see Table 4-4). After sending the last symbol of the Electrical Idle ordered-set, the transmitter must be in a valid Electrical Idle state as specified by T_TX-IDLE-SET-TO-IDLE (see Table 4-4).
The receiver shall use this ordered-set to enter Electrical Idle.
The Receiver terminations must remain enabled in Electrical Idle. The transmitter must meet the DC common mode specification while transitioning into and out of Electrical Idle, which can be done in a low or high impedance state unless otherwise specified.
[21] The maximum change in common mode voltage can be no more than V_TX-CM-RCV-DETECT in Table 4-4.
[22] The rate of change should differ by a factor of at least 40 between a receiver present and a receiver not present.


Any time a transmitter enters Electrical Idle, it must remain in Electrical Idle for a minimum of T_TX-IDLE-MIN (see Table 4-4). The receiver should expect the Electrical Idle ordered-set followed by a minimum amount of time in Electrical Idle (T_TX-IDLE-SET-TO-IDLE) to arm its Electrical Idle Exit detector.
See Section 4.3.1.9 for additional notes related to Electrical Idle.
4.3.2. Electrical Signal Specifications
A Differential Signal is defined by taking the voltage difference between two conductors. In this specification, a differential signal or differential pair is comprised of a voltage on a positive conductor, $V_{D+}$, and a negative conductor, $V_{D-}$. The differential voltage ($V_{DIFF}$) is defined as the difference of the positive conductor voltage and the negative conductor voltage ($V_{DIFF} = V_{D+} - V_{D-}$). The Common Mode Voltage ($V_{CM}$) is defined as the average or mean voltage present on the same differential pair ($V_{CM} = (V_{D+} + V_{D-})/2$). This document's electrical specifications often refer to peak-to-peak measurements or peak measurements, which are defined by the following equations.
• $V_{DIFFp\text{-}p} = 2\max|V_{D+} - V_{D-}|$ (This applies to a symmetric differential swing.)
• $V_{DIFFp\text{-}p} = \max_{V_{D+} > V_{D-}}|V_{D+} - V_{D-}| + \max_{V_{D+} < V_{D-}}|V_{D+} - V_{D-}|$ (This applies to an asymmetric differential swing.)
• $V_{DIFFp} = \max|V_{D+} - V_{D-}|$ (This applies to a symmetric differential swing.)
• $V_{DIFFp} = \max\left(\max_{V_{D+} > V_{D-}}|V_{D+} - V_{D-}|,\ \max_{V_{D+} < V_{D-}}|V_{D+} - V_{D-}|\right)$ (This applies to an asymmetric differential swing.)
• $V_{CMp} = \max|V_{D+} + V_{D-}|/2$
Note: The maximum value is calculated on a per unit interval evaluation.
The maximum function as described is implicit for all peak-to-peak and peak equations throughout the rest of this chapter, and thus a max function will not appear in any following representations of these equations.
In this section, DC is defined as all frequency components below F_dc = 30 kHz. AC is defined as all frequency components above F_dc = 30 kHz. These definitions pertain to all voltage and current specifications.
An example waveform is shown in Figure 4-28. In this waveform, the differential peak-to-peak signal is approximately 0.6 V, the differential peak signal is approximately 0.3 V, and the common mode is approximately 0.25 V.
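The peak and peak-to-peak definitions above translate directly into code over sampled D+ and D- voltages. This sketch evaluates the asymmetric forms (which reduce to the symmetric ones for a symmetric swing) over a whole capture rather than per unit interval, so it omits the per-UI evaluation the Note requires; the sample values are hypothetical, chosen to mirror the approximate numbers quoted for Figure 4-28.

```python
def one_sided_peaks(d_plus, d_minus):
    """Largest |D+ - D-| where D+ > D-, and where D+ < D- (0.0 if never)."""
    diffs = [p - n for p, n in zip(d_plus, d_minus)]
    pos = max((d for d in diffs if d > 0), default=0.0)
    neg = max((-d for d in diffs if d < 0), default=0.0)
    return pos, neg


def v_diff_pp(d_plus, d_minus):
    """Asymmetric peak-to-peak form; equals 2*max|V_D+ - V_D-| for a symmetric swing."""
    pos, neg = one_sided_peaks(d_plus, d_minus)
    return pos + neg


def v_diff_p(d_plus, d_minus):
    """Asymmetric peak form: the greater of the two one-sided peaks."""
    return max(one_sided_peaks(d_plus, d_minus))


def v_cm_p(d_plus, d_minus):
    """Peak common mode: max|V_D+ + V_D-| / 2."""
    return max(abs(p + n) for p, n in zip(d_plus, d_minus)) / 2


# Hypothetical samples: D+ and D- swing in opposition between 0.1 V and 0.4 V.
d_plus = [0.1, 0.4, 0.1, 0.4]
d_minus = [0.4, 0.1, 0.4, 0.1]
print(round(v_diff_pp(d_plus, d_minus), 3), round(v_diff_p(d_plus, d_minus), 3), v_cm_p(d_plus, d_minus))
# 0.6 0.3 0.25
```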


Figure 4-28: Sample Differential Signal (D+ and D- in volts versus time in ns, one sample UI marked)
4.3.2.1. Loss
Loss (attenuation of the differential voltage swing) in this system is a critical parameter that must be properly considered and managed in order to ensure proper system functioning. Failure to do so may result in a differential signal swing arriving at the Receiver that does not meet specifications. The interconnect loss is specified in terms of the amount of attenuation or loss it can tolerate between the Transmitter (Tx) and Receiver (Rx). The Tx is responsible for producing the specified differential eye height at the pins of its package. Together, the Tx and the interconnect are responsible for producing the specified differential eye height at the Rx pins (see Figure 4-34).
The worst-case operational loss budget is calculated by dividing the minimum output voltage (V_TX-DIFFp-p = 800 mV) by the minimum input voltage to the receiver (V_RX-DIFFp-p = 175 mV) and expressing the ratio in decibels, $20\log_{10}(800/175) \approx 13.2$ dB. Additional headroom in the loss budget can be achieved by driving a larger differential output voltage at the transmitter.
4.3.2.2. Jitter
The jitter budget is derived assuming a maximum bit error rate (BER) of $10^{-12}$. The allocation anticipates both data dependent and random jitter contributions. The total jitter budget is the sum of the deterministic jitter and 14 times the standard deviation (RMS value) of the random jitter distribution. Total jitter is the combined peak-to-peak measured jitter from all sources.
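Both budgets above are short calculations. The sketch below reproduces the 13.2 dB loss figure and shows where the factor of 14 plausibly comes from: for a Gaussian random jitter distribution, a two-sided BER of $10^{-12}$ corresponds to about 7.03 standard deviations per side, or roughly 14.07 in total. The bisection solver and the deterministic/random jitter values in the usage line are illustrative assumptions, not values taken from this specification.

```python
import math


def loss_budget_db(v_tx_min_mv: float, v_rx_min_mv: float) -> float:
    """Voltage ratio in dB: 20 * log10(Vout / Vin)."""
    return 20 * math.log10(v_tx_min_mv / v_rx_min_mv)


def gaussian_q(ber: float) -> float:
    """Solve BER = 0.5 * erfc(Q / sqrt(2)) for Q by bisection."""
    lo, hi = 0.0, 20.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if 0.5 * math.erfc(mid / math.sqrt(2)) > ber:
            lo = mid  # tail probability still too large: Q must grow
        else:
            hi = mid
    return (lo + hi) / 2


def total_jitter(dj: float, rj_rms: float, multiplier: float = 14.0) -> float:
    """Total jitter = deterministic jitter + multiplier * RMS random jitter."""
    return dj + multiplier * rj_rms


print(round(loss_budget_db(800, 175), 1))  # 13.2
print(round(2 * gaussian_q(1e-12), 2))     # 14.07
print(total_jitter(0.10, 0.01))            # hypothetical DJ = 0.10 UI, RJ = 0.01 UI RMS
```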


4.3.2.3. De-emphasis
De-emphasis is included to minimize inter-symbol interference (ISI) caused by the difference in loss across the primary fundamental transient frequencies (i.e., Generation 1 fundamental band = 250 MHz to 1.25 GHz).
De-emphasis must be implemented when multiple bits of the same polarity are output in succession. Subsequent bits are driven at a differential voltage level 3.5 dB (+/- 0.5 dB) below the first bit. The first bit of each run, including an isolated single bit, is always driven at the full voltage level.
The only exception pertains to transmitting the Beacon (see Section 4.3.2.4).
Note: The specified amount of de-emphasis was chosen to maximize interoperability while minimizing the complexity of managing configurable de-emphasis values. Thus, the de-emphasis was targeted to work for the worst-case loss budget of 11-13.2 dB, which tends to make it less optimal for lower loss systems. The fact that it is less optimal for lower loss systems is more than offset by the inherently greater voltage margin in lower loss systems.
4.3.2.3.1. De-emphasis Example
An example waveform representing the 10-bit symbol 243H is shown in Figure 4-29.
Figure 4-29: Sample Transmitted Waveform Showing -3.5 dB De-emphasis Around a 0.5 V Common Mode (D+ and D- in volts versus time in nanoseconds)
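The de-emphasis rule can be expressed as a per-bit level mapping. This sketch assumes a full-level differential peak of 0.4 V (half of the 800 mV minimum peak-to-peak swing), sends the bits of 243H MSB first purely for illustration (the real 8b/10b bit order is defined elsewhere in the specification), and does not model the Beacon exception. It also shows that the 566 mV and 505 mV bounds of the transmitter eye diagram are simply 800 mV reduced by 3 dB and 4 dB.

```python
def db_to_ratio(db: float) -> float:
    """Voltage ratio for a level `db` below the full swing."""
    return 10 ** (-db / 20)


def deemphasized_levels(bits, full_peak_v=0.4, deemphasis_db=3.5):
    """Differential voltage for each bit.

    The first bit of each run is driven at the full level; repeated bits of the
    same polarity are driven `deemphasis_db` lower.
    """
    levels = []
    previous = None
    for bit in bits:
        amplitude = full_peak_v if bit != previous else full_peak_v * db_to_ratio(deemphasis_db)
        levels.append(amplitude if bit == 1 else -amplitude)
        previous = bit
    return levels


bits = [int(c) for c in format(0x243, "010b")]
print(bits)  # [1, 0, 0, 1, 0, 0, 0, 0, 1, 1]
print([round(v, 3) for v in deemphasized_levels(bits)])

# De-emphasized eye bounds for an 800 mV transition-bit swing
print(round(800 * db_to_ratio(3.0)), round(800 * db_to_ratio(4.0)))  # 566 505
```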


4.3.2.4. Beacon
All transmitter electrical specifications must be met while sending a Beacon, with the following exceptions and clarifications:
• The period of the Beacon must be no greater than 33.333 µs.
• All Beacons must be transmitted on at least Lane 0 of multi-Lane Links.[23]
• The Beacon signal must contain pulses that are at least 2 ns wide.
• The Beacon must be DC balanced (i.e., any Beacon must contain an equal number of 1s and 0s).
• The output Beacon voltage level must be at a -6 dB de-emphasis level for Beacon pulses with a width greater than 500 ns.
• The output Beacon voltage level can range between the pre-emphasized and corresponding -3.5 dB de-emphasized voltage levels for Beacon pulses smaller than 500 ns.
• The output Beacon voltage level must be at the de-emphasis level for Beacon pulses with a width greater than 500 ns. Otherwise, the Beacon output voltage can range between the pre-emphasized and corresponding de-emphasized voltage levels.
• A Receiver Detection sequence (Section 4.3.1.8) must occur every 100 ms, and if no receiver is found, the Link returns to Detect.
• The Lane-to-Lane Output Skew and Skip Symbol Output specifications do not apply.
When any bridge and/or switch receives a Beacon, that component must propagate a Beacon upstream.
4.3.2.4.1. Beacon Example
An example receiver waveform driven at the -6 dB level for a 30 kHz Beacon is shown in Figure 4-30. An example receiver waveform using the COM character at full speed signaling is shown in Figure 4-31. Other waveforms and signaling besides the two examples shown below are possible (e.g., Polling is another valid Beacon signal).
[23] Lane 0 as defined after Link Width and Lane reversal negotiations are complete.


Figure 4-30: A 30 kHz BEACON Signaling Through a 75 nF Capacitor (D+ and D- in volts versus time in microseconds)
Figure 4-31: BEACON, Which Includes a 2 ns Pulse Through a 75 nF Capacitor (D+ and D- in volts versus time in nanoseconds)
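The Beacon timing rules in Section 4.3.2.4 can be checked mechanically for one period of a candidate waveform. This is a sketch under stated assumptions: the waveform is described as a hypothetical list of `Pulse` records, "equal number of 1s and 0s" is interpreted as equal high and low time, and a pulse of exactly 500 ns (which the text does not cover) is grouped with the short pulses. Lane assignment and the 100 ms Receiver Detection cadence are not modelled.

```python
from dataclasses import dataclass

MAX_PERIOD_NS = 33_333.0  # 33.333 us, one period of a 30 kHz Beacon
MIN_PULSE_NS = 2.0
LONG_PULSE_NS = 500.0


@dataclass
class Pulse:
    width_ns: float
    high: bool  # True for a "1" pulse, False for a "0" pulse


def beacon_violations(pulses):
    """List the Beacon timing rules that one period of pulses breaks."""
    problems = []
    period = sum(p.width_ns for p in pulses)
    if period > MAX_PERIOD_NS:
        problems.append("period longer than 33.333 us")
    if any(p.width_ns < MIN_PULSE_NS for p in pulses):
        problems.append("pulse shorter than 2 ns")
    high_time = sum(p.width_ns for p in pulses if p.high)
    if abs(2 * high_time - period) > 1e-9:
        problems.append("not DC balanced")
    return problems


def allowed_level_db(width_ns):
    """(lowest, highest) output level in dB relative to the full transition-bit level."""
    if width_ns > LONG_PULSE_NS:
        return (-6.0, -6.0)
    return (-3.5, 0.0)  # assumption: exactly 500 ns is treated like a short pulse


print(beacon_violations([Pulse(16_666.0, True), Pulse(16_666.0, False)]))  # []
print(beacon_violations([Pulse(1.0, True), Pulse(3.0, False)]))
# ['pulse shorter than 2 ns', 'not DC balanced']
```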


4.3.3. Differential Transmitter (Tx) Output Specifications
The following table defines the parameters for the differential output of all Transmitters (Txs). The parameters are specified at the component pins.
Table 4-4: Differential Transmitter (Tx) Output Specifications
• UI, Unit Interval: min 399.88, nom 400, max 400.12 ps. Each UI is 400 ps +/- 300 ppm. UI does not account for SSC-dictated variations. See Note 1.
• V_TX-DIFFp-p, Differential Peak to Peak Output Voltage: min 0.800, max 1.2 V. V_TX-DIFFp-p = 2*|V_TX-D+ - V_TX-D-|, measured at the package pins of the transmitter. See Note 2.
• V_TX-DE-RATIO, De-Emphasized Differential Output Voltage (Ratio): min -3.0, nom -3.5, max -4.0 dB.
• T_TX-EYE, Minimum TX Eye Width: min 0.70 UI.
• T_TX-EYE-MEDIAN-to-MAX-JITTER, Maximum time between the jitter median and maximum deviation from the median.
• T_TX-RISE, T_TX-FALL, D+/D- TX Output Rise/Fall Time.
• V_TX-CM-ACp, AC Peak Common Mode Output Voltage.


• |V_TX-CM-DC [during L0] - V_TX-CM-Idle-DC [during Electrical Idle]|


• RL_TX-CM, Common Mode Return Loss: min 6 dB. Measured over 50 MHz to 1.25 GHz. See Note 4.
• Z_TX-DIFF-DC, DC Differential TX Impedance: min 90, nom 100, max 110 Ω. TX DC Differential Mode Low Impedance.
• Z_TX-Match-DC, D+/D- TX Impedance Matching: min -5, max +5 Ω. TX DC impedance matching between D+ and D- on a given Lane.
• Z_TX-COM-High-IMP-DC, Transmitter Common Mode High Impedance State (DC): min 5k, max 20k Ω. Tx DC High Impedance.
• L_TX-SKEW, Lane-to-Lane Output Skew: 500 ps. Between any two Lanes within a single Transmitter.
• C_TX, AC Coupling Capacitor: min 75, max 500 nF. All transmitters shall be AC coupled to the media.
Notes:
1. No test load is necessarily associated with this value.
2. Specified at the package pins into a timing and voltage compliance test load as shown in Figure 4-33 and measured over at least 250 Tx UIs (also refer to the Transmitter Compliance Eye Diagram shown in Figure 4-32).
3. A T_TX-EYE = 0.70 UI provides for a total sum of deterministic and random jitter budget of T_TX-JITTER-MAX = 0.30 UI for the transmitter, collected over at least 250 Tx UIs. The T_TX-EYE-MEDIAN-to-MAX-JITTER specification ensures a jitter distribution in which the time between the median and the maximum deviation from the median is less than half of the total Tx jitter budget collected over at least 250 Tx UIs. It should be noted that the median is not the same as the mean. The jitter median describes the point in time where the number of jitter points on either side is approximately equal, as opposed to the averaged time value.
4. The transmitter input impedance shall result in a differential return loss greater than or equal to 12 dB and a common mode return loss greater than or equal to 6 dB over a frequency range of 50 MHz to 1.25 GHz. This input impedance requirement applies to all valid input levels.
The reference impedance for return loss measurements is 50 ohms to ground for both the D+ and D- line (i.e., as measured by a Vector Network Analyzer with 50 ohm probes; see Figure 4-33). Note that the series capacitor C_TX is optional for the return loss measurement.
5. Measured between 20% and 80% at the Transmitter package pins into a test load as shown in Figure 4-33 for both V_TX-D+ and V_TX-D-. The maximum rise/fall times of the D+ and D- signals in Table 4-4 are relative bounds; the absolute bounds for the maximum rise/fall time are dictated by the Transmitter Compliance Eye Diagram shown in Figure 4-32.
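Note 3's two jitter conditions (a 0.30 UI total budget and a median-to-maximum deviation under half that budget) can be sketched as a check over time-interval-error (TIE) samples. The TIE input is hypothetical, and treating the eye width as 1 UI minus the peak-to-peak TIE is a timing-only simplification that ignores voltage effects. The 0.15 UI limit is derived from Note 3's "less than half of the total TX jitter budget" wording.

```python
import statistics


def tx_jitter_summary(tie_ui):
    """Summarize time-interval-error samples (in UI) for one transmitter capture."""
    total = max(tie_ui) - min(tie_ui)  # peak-to-peak jitter
    median = statistics.median(tie_ui)
    median_to_max = max(abs(x - median) for x in tie_ui)
    return {"total_ui": total, "eye_width_ui": 1.0 - total, "median_to_max_ui": median_to_max}


def tx_jitter_passes(tie_ui, budget_ui=0.30, min_uis=250):
    """Eye width >= 1 - budget and median-to-max deviation < budget / 2 (Note 3)."""
    if len(tie_ui) < min_uis:
        raise ValueError("collect at least 250 Tx UIs")
    s = tx_jitter_summary(tie_ui)
    return s["total_ui"] <= budget_ui and s["median_to_max_ui"] < budget_ui / 2


symmetric = [0.05 if i % 2 else -0.05 for i in range(250)]
skewed = [0.0] * 248 + [0.16, -0.10]
print(tx_jitter_passes(symmetric), tx_jitter_passes(skewed))  # True False
```

The skewed capture stays within the 0.30 UI total budget (0.26 UI) but fails because one edge lies 0.16 UI from the median, which is the kind of lopsided distribution the median-to-max requirement is meant to catch.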


4.3.3.1. Transmitter Compliance Eye Diagram
Figure 4-32: Minimum Transmitter Timing and Voltage Output Compliance Specification (transition bit: V_TX-DIFFp-p-MIN = 800 mV; de-emphasized bit: 566 mV (3 dB) >= V_TX-DIFFp-p-MIN >= 505 mV (4 dB); eye width 0.7 UI = UI - 0.3 UI (J_TX-TOTAL-MAX); crossing points labeled V_TX-DIFF = 0 mV (D+/D- crossing point))
There are two eye diagrams that must be met for the transmitter. Both eye diagrams must be aligned in time and meet the minimum 0.7 UI requirement. The two eye diagrams differ in voltage depending on whether the bit is a transition bit or a de-emphasized bit. The eye diagram must be valid for at least 250 UIs. The Tx UI must be used as the trigger for the eye diagram.


4.3.3.2. Compliance Test and Measurement Load
The AC timing and voltage parameters should be verified at the package pins into the test/measurement load shown in Figure 4-33.
Figure 4-33: Compliance Test/Measurement Load (labels: D+ and D- package pins; C = C_TX on each line; R = 50 ohms on each line)
The test load is shown at the transmitter package reference plane, but the same test/measurement load is applicable to the receiver package reference plane.
Return loss measurements do not require that C_TX be part of the measurement test load.


4.3.4. Differential Receiver (Rx) Input Specifications
The following table defines the parameters for all differential Receivers (Rxs). The parameters are specified at the component pins.
Table 4-5: Differential Receiver (Rx) Input Specifications
• UI, Unit Interval: min 399.88, nom 400, max 400.12 ps.
• V_RX-DIFFp-p, Differential Input Peak to Peak Voltage: min 0.175, max 1.200 V.
• T_RX-EYE, Minimum Receiver Eye Width: min 0.4 UI.
• T_RX-EYE-MEDIAN-to-MAX-JITTER, Maximum time between the jitter median and maximum deviation from the median.


• Z_RX-COM-DC, DC Input Common Mode Input Impedance: min 45, nom 50, max 55 Ω. RX DC Common Mode impedance 50 Ω +/- 10% tolerance. See Note 10. See Note 7.
• Z_RX-Match-DC, Differential Pair Impedance Match: min -5, max +5 Ω. RX DC impedance matching between D+ and D- on a given Lane. See Note 10.
• Z_RX-COM-Initial-DC, Initial DC Input Common Mode Input Impedance: min 5, nom 50, max 55 Ω. RX DC Common Mode impedance allowed when the receiver terminations are first powered on. See Note 11.
• Z_RX-COM-HIGH-IMP-DC, Powered Down DC Input Common Mode Input Impedance: 200 kΩ. RX DC Common Mode impedance when the receiver terminations are not powered (i.e., no power). See Note 12.
• V_RX-IDLE-DET-DIFFp-p, Electrical Idle Detect Threshold: min 65, max 175 mV. V_RX-IDLE-DET-DIFFp-p = 2*|V_RX-D+ - V_RX-D-|, measured at the package pins of the Receiver.
• T_RX-IDLE-DET-DIFF-ENTERTIME, Unexpected Electrical Idle Enter Detect Threshold Integration Time: 10 ms.
• L_RX-SKEW, Total Skew: 20 ns.
V_RX-DIFFp-p


250 Tx UIs. It should be noted that the median is not the same as the mean. The jitter median describes the point in time where the number of jitter points on either side is approximately equal, as opposed to the averaged time value.
9. The receiver input impedance shall result in a differential return loss greater than or equal to 15 dB and a common mode return loss greater than or equal to 6 dB over a frequency range of 50 MHz to 1.25 GHz. This input impedance requirement applies to all valid input levels. The reference impedance for return loss measurements is 50 ohms to ground for both the D+ and D- line (i.e., as measured by a Vector Network Analyzer with 50 ohm probes; see Figure 4-33). Note that the series capacitor C_TX is optional for the return loss measurement.
10. Impedance during all operating conditions except when in disable.
11. The Rx DC common mode impedance that must be present when the receiver terminations are first enabled to ensure that Receiver Detect occurs properly. Compensation of this impedance can start immediately, and the Rx DC Common Mode Impedance (Z_RX-COM-DC) must be within the 45 ohm to 55 ohm range by the time Detect is entered.
12. The Rx DC common mode impedance that exists when the receiver terminations are disabled or when no power is present. This helps ensure that the Receiver Detect circuit will not falsely assume a receiver is enabled when it is not.
13. If a receiver is not in Electrical Idle or directed to go into Electrical Idle, and a peak-to-peak differential signal remains below the Electrical Idle threshold for T_RX-IDLE-DET-DIFF-ENTERTIME, a surprise removal or disable has occurred.
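Note 13 describes a timer: a signal that stays below the Electrical Idle threshold for T_RX-IDLE-DET-DIFF-ENTERTIME, while idle is not expected, indicates surprise removal or disable. The sketch below models that rule over hypothetical time-stamped amplitude samples. The detector threshold is a parameter because Table 4-5 only bounds it to 65-175 mV; the 100 mV value in the usage line is an arbitrary choice within that range, and treating "remains below for 10 ms" as an inclusive comparison is an assumption.

```python
IDLE_THRESHOLD_RANGE_MV = (65.0, 175.0)  # V_RX-IDLE-DET-DIFFp-p limits, Table 4-5


def detect_unexpected_idle(samples, threshold_mv, enter_time_ms=10.0):
    """Return the time (ms) at which surprise removal/disable is flagged, else None.

    `samples` is a time-ordered iterable of (time_ms, vpp_mv, idle_expected)
    tuples; idle_expected is True while the receiver is in, or has been
    directed into, Electrical Idle.
    """
    low, high = IDLE_THRESHOLD_RANGE_MV
    if not low <= threshold_mv <= high:
        raise ValueError("threshold outside the 65-175 mV range")
    below_since = None
    for time_ms, vpp_mv, idle_expected in samples:
        if idle_expected or vpp_mv >= threshold_mv:
            below_since = None  # signal present, or idle was expected: restart timing
            continue
        if below_since is None:
            below_since = time_ms
        if time_ms - below_since >= enter_time_ms:
            return time_ms
    return None


# Hypothetical 1 ms samples: a 400 mV signal collapses to 20 mV at t = 5 ms.
trace = [(t, 400.0 if t < 5 else 20.0, False) for t in range(21)]
print(detect_unexpected_idle(trace, threshold_mv=100.0))  # 15
```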


4.3.4.1. Receiver Compliance Eye Diagram
Figure 4-34: Minimum Receiver Eye Timing and Voltage Compliance Specification (V_RX-DIFFp-p-MIN > 175 mV; eye width 0.4 UI = T_RX-EYE-MIN; crossing points labeled V_RX-DIFF = 0 mV (D+/D- crossing point))


5. Software Initialization and Configuration
The PCI Express Configuration model supports two configuration space access mechanisms:
• PCI compatible configuration mechanism
• PCI Express enhanced configuration mechanism
The PCI compatible mechanism supports 100% binary compatibility with PCI 2.3 (or later) aware operating systems and their corresponding bus enumeration and configuration software.
The enhanced mechanism is provided to increase the size of available configuration space and to optimize access mechanisms.
5.1. Configuration Topology
To maintain compatibility with PCI software configuration mechanisms, all PCI Express elements have a PCI-compatible configuration space representation. Each PCI Express Link originates from a logical PCI-PCI Bridge and is mapped into configuration space as the secondary bus of this bridge. The Root Port is a PCI-PCI Bridge structure that originates a PCI Express Link from a PCI Express Root Complex.
A PCI Express Switch is represented by multiple PCI-PCI Bridge structures connecting PCI Express Links to an internal logical PCI bus.
The Switch Upstream Port is a PCI-PCI Bridge; the secondary bus of this bridge represents the Switch's internal routing logic. Switch Downstream Ports are PCI-PCI Bridges bridging from the internal bus to buses representing the downstream PCI Express Links from a PCI Express Switch.
A PCI Express endpoint is mapped into configuration space as a single logical device (Device 0) with one or more logical functions.


Figure 5-1: PCI Express Root Complex Device Mapping (PCI Express Root Complex containing a Root Complex Register Block, a PCI Compatible Host Bridge Device, and a PCI-PCI Bridge representing the Root PCI Express Port, which originates a PCI Express Link)
Figure 5-2: PCI Express Switch Device Mapping[24] (PCI Express Switch containing a PCI-PCI Bridge representing the Upstream PCI Express Port and PCI-PCI Bridges representing Downstream PCI Express Ports, each with a PCI Express Link)
5.2. PCI Express Configuration Mechanisms
PCI Express extends the configuration space to 4096 bytes per device function, compared to the 256 bytes allowed by PCI Specification Revision 2.3. PCI Express configuration space is divided into a PCI 2.3 compatible region, which consists of the first 256 bytes of a logical device's configuration space, and an extended PCI Express configuration space region, which consists of the remaining configuration space. The PCI 2.3 compatible region can be accessed using either the mechanism defined in the PCI 2.3 specification or the enhanced PCI Express configuration access mechanism described later in this section.
All changes made using either access mechanism are equivalent; however, software is not allowed to simultaneously use (interleave) both PCI Express and PCI access mechanisms to access the configuration registers of devices. The extended PCI Express region can only be accessed using the enhanced PCI Express configuration access mechanism.[25]
Figure 5-3: PCI Express Configuration Space Layout (offsets 0 to 3Fh: PCI 2.3 Compatible Configuration Space Header; up to FFh: PCI Configuration Space, available on legacy operating systems through legacy PCI mechanisms, including the PCI Express Capability Structure needed by BIOS or by driver software on non-PCI Express aware operating systems; above FFh up to FFFh: PCI Express Extended Configuration Space for PCI Express parameters and capabilities, not available on legacy operating systems)
5.2.1. PCI 2.3 Compatible Configuration Mechanism
The PCI 2.3 compatible PCI Express configuration mechanism supports the PCI configuration space programming model defined in the PCI Local Bus Specification, Rev. 2.3. By adhering to this model, systems incorporating PCI Express interfaces remain compliant with conventional PCI bus enumeration and configuration software.
In the same manner as PCI 2.3 devices, PCI Express devices are required to provide a configuration register space for software-driven initialization and configuration.
[24] Future PCI Express Switches may be implemented as a single Switch Device component (without the PCI-PCI bridges) that is not limited by legacy compatibility requirements imposed by existing PCI software.
Except for the differences described in this chapter, the PCI Express configuration header space registers are organized to correspond with the format and behavior defined in the PCI 2.3 Specification (Section 6.1).
The PCI 2.3 compatible configuration access mechanism uses the same Request format as the enhanced PCI Express mechanism. For PCI compatible Configuration Requests, the Extended Register Address field must be all zeros.
[25] Accesses strictly to PCI Express extended configuration space using the enhanced PCI Express configuration access mechanism are allowed to be interleaved with PCI 2.3 configuration access mechanism accesses.
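The split between the PCI 2.3 compatible region and the extended region comes down to the upper bits of a configuration register offset. This sketch takes the bit positions from Table 5-1 (Extended Register[3:0] in bits 11:8, Register[7:0] in bits 7:0) and shows why an offset of 100h or above is unreachable through a mechanism that must leave the Extended Register Address at zero. The function names are illustrative only.

```python
def split_config_offset(offset):
    """Split a configuration byte offset (0h-FFFh) into (Extended Register, Register)."""
    if not 0 <= offset <= 0xFFF:
        raise ValueError("configuration space is 4096 bytes per function")
    return offset >> 8, offset & 0xFF


def pci_compatible_reachable(offset):
    """The PCI 2.3 mechanism only issues requests whose Extended Register Address is 0."""
    extended, _ = split_config_offset(offset)
    return extended == 0


print(split_config_offset(0x3C), split_config_offset(0x104))            # (0, 60) (1, 4)
print(pci_compatible_reachable(0xFF), pci_compatible_reachable(0x100))  # True False
```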


PCI EXPRESS BASE SPECIFICATION, REV. 1.0

5.2.2. PCI Express Enhanced Configuration Mechanism

The enhanced PCI Express configuration access mechanism uses a flat memory-mapped address space to access device configuration registers. In this case, the memory address determines which configuration register is accessed, and the memory data returns the contents of the addressed register. The mapping from memory address A[27:0] to PCI Express configuration space address is defined in Table 5-1. The base address A[63:28] is allocated in an implementation-specific manner and reported by the system firmware to the operating system.

Table 5-1: Configuration Address Mapping

Memory Address   PCI Express Configuration Space
A[27:20]         Bus[7:0]
A[19:15]         Device[4:0]
A[14:12]         Function[2:0]
A[11:8]          Extended Register[3:0]
A[7:0]           Register[7:0]

5.2.2.1. Host Bridge Requirements

The PCI Express Host Bridge is required to translate the memory-mapped PCI Express configuration space accesses from the host processor to PCI Express configuration transactions. The use of the Host Bridge PCI class code is reserved for backwards compatibility. Host bridge configuration space is opaque to standard PCI Express software and may be implemented in an implementation-specific manner that is compatible with PCI Host Bridge Type 0 configuration space.

5.2.2.2. PCI Express Device Requirements

Devices must support an additional 4 bits for decoding configuration register accesses. That is, they must decode the Extended Register Address[3:0] field of the Configuration Request header.

5.2.3. Root Complex Register Block

Each Root Port is associated with a 4096-byte block of memory-mapped registers referred to as the Root Complex Register Block (RCRB). These registers are used in a manner similar to configuration space and can include PCI Express extended capabilities and other implementation-specific registers that apply to the Root Complex. The structure of the RCRB is described in Section 5.9.2.

System firmware communicates the base address of the RCRB for each Root Port to the operating system. Multiple Root Ports may be associated with the same RCRB. The RCRB memory-mapped registers must not reside in the same address space as the memory-mapped configuration space.
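The address mapping of Table 5-1 can be sketched as follows. This is a minimal Python sketch, not part of the specification. The function names `ecam_offset`, `ecam_address`, and `ecam_split` are hypothetical. The sketch assumes the base value has already been obtained from system firmware, which, per the text, supplies A[63:28].

```python
def ecam_offset(bus: int, device: int, function: int, register: int) -> int:
    """Build A[27:0] for one configuration register, following Table 5-1.

    `register` is the full 12-bit byte offset: Extended Register[3:0]
    occupies bits 11:8 and Register[7:0] occupies bits 7:0.
    """
    if not (0 <= bus <= 0xFF and 0 <= device <= 0x1F and 0 <= function <= 0x7):
        raise ValueError("bus/device/function out of range")
    if not 0 <= register <= 0xFFF:
        raise ValueError("register offset must fit in 12 bits")
    return (bus << 20) | (device << 15) | (function << 12) | register


def ecam_address(base: int, bus: int, device: int, function: int, register: int) -> int:
    """Combine a firmware-reported base (supplying A[63:28]) with A[27:0]."""
    # The base only supplies A[63:28], so its low 28 bits must be zero.
    if base & ((1 << 28) - 1):
        raise ValueError("base address must have A[27:0] == 0")
    return base | ecam_offset(bus, device, function, register)


def ecam_split(offset: int) -> tuple:
    """Inverse of ecam_offset: (bus, device, function, ext_register, register)."""
    return (
        (offset >> 20) & 0xFF,
        (offset >> 15) & 0x1F,
        (offset >> 12) & 0x7,
        (offset >> 8) & 0xF,
        offset & 0xFF,
    )
```

A register offset below 100h yields an Extended Register field of 0. This is the only form the PCI 2.3 compatible access mechanism can produce, since its Configuration Requests must carry an all-zero Extended Register Address field.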


5.3. Configuration Transaction Rules

5.3.1. Device Number

As in conventional PCI and PCI-X, all PCI Express components are restricted to implementing a single device number on their primary interface (Upstream Port). They may, however, implement up to eight independent functions within that device number. Each internal function is selected based on decoded address information that is provided as part of the address portion of Configuration Request packets.

Switches and Root Complexes must associate only Device 0 on the logical bus from a Downstream Port or a Root Port. Configuration Requests targeting the Bus Number associated with a Port and specifying Device Number 0 are delivered to that Port. Configuration Requests specifying all other Device Numbers (1-31) must be terminated with an Unsupported Request Completion Status (equivalent to Master Abort in PCI; see footnote 26).

Switches, and components wishing to incorporate more than eight functions at their Upstream Port, may implement one or more Type 1 (PCI-to-PCI Bridge) configuration space headers. This allows them to introduce an “internal bus” on which all the device numbers may be used. In this case, all address information fields (bus, device, and function numbers) must be completely decoded to access the correct register. Any configuration access targeting an unimplemented bus, device, or function must return a Completion with Unsupported Request Completion Status.

The following section provides details of the Configuration Space addressing mechanism.

5.3.2. Configuration Transaction Addressing

PCI Express Configuration Requests use the following addressing fields:
• Bus Number – PCI Express maps logical PCI Bus Numbers onto PCI Express Links such that PCI 2.3 compatible configuration software views the configuration space of a PCI Express Hierarchy as a PCI Hierarchy including multiple bus segments.
• Device Number – Device Number association is discussed in Section 5.3.1.
• Function Number – PCI Express also supports multi-function devices using the same discovery mechanism as PCI 2.3.
• Extended Register Number and Register Number – Specify the configuration space address of the register being accessed.

Footnote 26: Future switch components that are implemented as a single switch device (without the PCI-PCI Bridges) and that are not limited by legacy compatibility requirements may not have this restriction. To accommodate such future implementations, devices may not assume that Device 0 is associated with their Upstream Port.
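The Device Number rules of Section 5.3.1 reduce to two small decisions. The Python sketch below assumes hypothetical function names and represents "implemented" functions on an internal bus as a plain set of (bus, device, function) triples. It models a Switch or Root Complex only; per footnote 26, a device receiving the Request should not rely on being Device 0.

```python
def port_bus_decision(device: int) -> str:
    """Handle a Configuration Request whose Bus Number is the one associated
    with one of a Switch's Downstream Ports or a Root Complex's Root Ports."""
    if not 0 <= device <= 31:
        raise ValueError("Device Number is a 5-bit field")
    # Only Device 0 exists on the logical bus below a Downstream/Root Port.
    return "deliver" if device == 0 else "unsupported_request"


def internal_bus_decision(bus: int, device: int, function: int,
                          implemented: set) -> str:
    """Components that expose an internal bus decode bus, device, and function
    in full; anything not implemented completes with Unsupported Request."""
    # `implemented` is a hypothetical set of (bus, device, function) triples.
    if (bus, device, function) in implemented:
        return "deliver"
    return "unsupported_request"
```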


5.3.3. Configuration Request Routing Rules

For PCI Express Endpoint devices, the following rules apply:
• If Configuration Request Type is 1,
    o Follow the rules for handling Unsupported Requests
• If Configuration Request Type is 0,
    o Determine if the Request addresses a valid local configuration space
        • If so, process the Request
        • If not, follow the rules for handling Unsupported Requests

For Switches and PCI Express-PCI Bridges, the following rules apply:
• Propagation of Configuration Requests from Downstream to Upstream, as well as peer-to-peer propagation, is not supported
    o Configuration Requests are initiated only by the Host Bridge
• If Configuration Request Type is 0,
    o Determine if the Request addresses a valid local configuration space
        • If so, process the Request
        • If not, follow the rules for handling Unsupported Requests
• If Configuration Request Type is 1, apply the following tests, in sequence, to the Bus Number field:
    o If the Bus Number is equal to the bus number assigned to the secondary PCI bus (in the case of a PCI Express-PCI Bridge), or equal to the bus number and decoded device numbers assigned to one of the Root Ports (Root Complex) or Downstream Ports (Switch) (in the case of a Switch or Root Complex),
        • Transform the Request to Type 0
        • Forward the Request to that Downstream Port (or PCI bus, in the case of a PCI Express-PCI Bridge)
    o If not equal to the bus number of any of the Downstream Ports or the secondary PCI bus, but in the range of bus numbers assigned to one of the Downstream Ports or the secondary PCI bus,
        • Forward the Request to that Downstream Port interface without modification
    o Else (none of the above),
        • The Request is invalid; follow the rules for handling Unsupported Requests
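The Type 1 tests above can be sketched as a small Python function. This is a minimal model under stated assumptions. Each Downstream Port (or the bridge's secondary PCI bus) is reduced to a hypothetical `DownstreamWindow` holding its secondary and subordinate bus numbers. Device-number decoding on a Switch's internal bus is left out, as is the validity check for Type 0 Requests.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DownstreamWindow:
    """Hypothetical bus-number window for one Downstream Port (or the
    secondary PCI bus of a PCI Express-PCI Bridge)."""
    secondary_bus: int
    subordinate_bus: int


def route_config_request(req_type: int, bus: int,
                         windows: List[DownstreamWindow]) -> Tuple[str, Optional[int]]:
    """Apply the Switch / PCI Express-PCI Bridge routing rules in order."""
    if req_type == 0:
        # Type 0 targets this component's own configuration space.
        return ("process_locally", None)
    if req_type != 1:
        raise ValueError("Configuration Request Type must be 0 or 1")
    # Test 1: Bus Number equals a secondary bus -> convert to Type 0, forward.
    for index, window in enumerate(windows):
        if bus == window.secondary_bus:
            return ("convert_to_type0_and_forward", index)
    # Test 2: Bus Number lies inside a window -> forward without modification.
    for index, window in enumerate(windows):
        if window.secondary_bus < bus <= window.subordinate_bus:
            return ("forward_unmodified", index)
    # Otherwise the Request is invalid.
    return ("unsupported_request", None)
```

Checking every window for an exact match before checking any range mirrors the "in sequence" wording of the rules.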


In addition, PCI Express-PCI Bridges must terminate as Unsupported Requests any Configuration Requests directed towards the PCI bus for which the Extended Register Address field is non-zero.

Note: This type of access is a consequence of a programming error.

For Root Complexes:
• Configuration Requests addressing Bus 0 are processed by the Root Complex.
• Configuration Requests addressing other buses are processed according to the rules for Switches (above).

For all types of devices:
All other configuration space addressing fields are decoded according to the PCI Local Bus Specification.

5.3.4. Generating PCI Special Cycles using PCI Configuration Mechanism #1

Generating PCI Special Cycles using PCI Configuration Mechanism Number One (see the PCI Local Bus Specification, Rev. 2.3 for details), and handling of such Requests, is not required.

5.4. Configuration Register Types

Configuration register fields are assigned one of the attributes described in Table 5-2.

Table 5-2: Register (and Register Bit-Field) Types

Register Attribute  Description
RO        Read-only register: Register bits are read-only and cannot be altered by software.
RW        Read-Write register: Register bits are read-write and may be either set or cleared by software to the desired state.
RW1C      Read-only status, Write-1-to-clear status register: Register bits indicate status when read; a set bit indicating a status event may be cleared by writing a 1. Writing a 0 to RW1C bits has no effect.
ROS       Sticky bit - Read-only register: Register bits are read-only and cannot be altered by software. Bits are not cleared by reset and can only be reset with “Power Good Reset” (see Section 7.6). Devices that consume AUX power are not allowed to reset sticky bits on “Power Good Reset” when AUX power consumption (either via AUX power or PME Enable) is enabled.


RWS       Sticky bit - Read-Write register: Register bits are read-write and may be either set or cleared by software to the desired state. Bits are not cleared by reset and can only be reset with “Power Good Reset” (see Section 7.6). Devices that consume AUX power are not allowed to reset sticky bits on “Power Good Reset” when AUX power consumption (either via AUX power or PME Enable) is enabled.
RW1CS     Sticky bit - Read-only status, Write-1-to-clear status register: Register bits indicate status when read; a set bit indicating a status event may be cleared by writing a 1. Writing a 0 to RW1CS bits has no effect. Bits are not cleared by reset and can only be reset with “Power Good Reset” (see Section 7.6). Devices that consume AUX power are not allowed to reset sticky bits on “Power Good Reset” when AUX power consumption (either via AUX power or PME Enable) is enabled.
HwInit    Hardware Initialized: Register bits are initialized by firmware or hardware mechanisms such as pin strapping or serial EEPROM. Bits are read-only after initialization and can only be reset (for write-once by firmware) with “Power Good Reset” (see Section 7.6).
RsvdP     Reserved and Preserved: Reserved for future RW implementations; software must preserve the value read for writes to these bits.
RsvdZ     Reserved and Zero: Reserved for future RW1C implementations; software must use 0 for writes to these bits.

5.5. PCI-Compatible Configuration Registers

The first 256 bytes of the PCI Express configuration space form the PCI 2.3 compatibility region. This region completely aliases the PCI 2.3 configuration space of the device/function. Legacy PCI devices may also be accessed via the enhanced PCI Express configuration access mechanism without requiring any modifications to the device hardware or device driver software. This section establishes the mapping between PCI 2.3 and PCI Express for the format and behavior of PCI 2.3 compatible registers.

All registers and fields not described in this section are assumed to have the exact same definition as in PCI 2.3.
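The attributes in Table 5-2 determine both how hardware reacts to a write and how software should compose one. The Python sketch below models both sides with per-bit masks. The function names are hypothetical. The reset model assumes non-sticky bits return to 0 and does not cover the AUX-power exception or HwInit fields.

```python
def hw_apply_write(current: int, written: int, rw_mask: int, rw1c_mask: int) -> int:
    """Model how a register responds to a software write.

    Bits in rw_mask take the written value; bits in rw1c_mask are cleared
    where a 1 is written and otherwise unchanged; all other bits (RO, ROS,
    HwInit, reserved) ignore the write.
    """
    kept = current & ~(rw_mask | rw1c_mask)
    rw_bits = written & rw_mask
    rw1c_bits = current & rw1c_mask & ~written
    return kept | rw_bits | rw1c_bits


def hw_apply_reset(current: int, sticky_mask: int) -> int:
    """Conventional (not Power Good) reset: sticky bits survive, and the
    remaining bits are assumed to return to 0."""
    return current & sticky_mask


def sw_compose_write(read_value: int, field_mask: int, field_value: int,
                     rsvdp_mask: int = 0, rsvdz_mask: int = 0,
                     rw1c_mask: int = 0) -> int:
    """Build the value software writes when changing only `field_mask` bits."""
    if field_mask & (rsvdp_mask | rsvdz_mask):
        raise ValueError("software must not change reserved bits")
    # Start from the value read so that RsvdP bits are preserved.
    value = (read_value & ~field_mask) | (field_value & field_mask)
    # RsvdZ bits must be written as 0.
    value &= ~rsvdz_mask
    # Writing back a 1 to an RW1C bit would clear it, so write 0 unless targeted.
    value &= ~(rw1c_mask & ~field_mask)
    return value
```

The last step of `sw_compose_write` matters for registers such as Status, which mix RO and RW1C bits. A naive read-modify-write of such a register would clear every pending status bit that happened to read as 1.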


5.5.1. Type 0/1 Common Configuration Space

Figure 5-4 details allocation for common register fields of PCI 2.3 Type 0 and Type 1 Configuration Space Headers for PCI Express devices.

Byte Offset  Register fields (bit 31 on the left, bit 0 on the right)
00h          Device ID | Vendor ID
04h          Status | Command
08h          Class Code | Revision ID
0Ch          BIST | Header Type | Master Latency Timer | Cache Line Size
10h-30h      Header Type Specific
34h          Header Type Specific | Capabilities Pointer
38h          Header Type Specific
3Ch          Header Type Specific | Interrupt Pin | Interrupt Line

Figure 5-4: Common Configuration Space Header

These registers are defined for both Type 0 and Type 1 Configuration Space Headers. The PCI Express-specific interpretation of these registers is defined in this section.
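As Figure 5-4 shows, the dword at offset 0Ch packs four byte-wide registers. The Python sketch below (hypothetical function name) splits one such 32-bit read. The interpretation of the Header Type byte comes from PCI 2.3, not from this section: bit 7 flags a multi-function device, and bits 6:0 select the layout (00h for Type 0, 01h for Type 1).

```python
def split_dword_0ch(dword: int) -> dict:
    """Split the 32-bit value read at offset 0Ch (Figure 5-4) into its fields."""
    header_type = (dword >> 16) & 0xFF
    return {
        "cache_line_size": dword & 0xFF,               # offset 0Ch
        "master_latency_timer": (dword >> 8) & 0xFF,   # offset 0Dh, hardwired to 0
        "header_type": header_type,                    # offset 0Eh
        # PCI 2.3 convention (assumption, not restated in this section):
        "multi_function": bool(header_type & 0x80),
        "layout": header_type & 0x7F,                  # 00h = Type 0, 01h = Type 1
        "bist": (dword >> 24) & 0xFF,                  # offset 0Fh
    }
```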


5.5.1.1. Command Register (Offset 04h)

Table 5-3 establishes the mapping between PCI 2.3 and PCI Express for the PCI 2.3 configuration space Command register.

Table 5-3: Command Register

Bit Location  Attributes  Register Description
2             RW          Bus Master Enable – Controls the ability of a PCI Express agent to issue memory and I/O read/write requests. Disabling this bit prevents a PCI Express agent from issuing any memory or I/O read/write requests. Note that as MSI interrupt messages are in-band memory writes, disabling the Bus Master Enable bit disables MSI interrupt messages as well. Default value of this field is 0.
3             RO          Special Cycle Enable – Does not apply to PCI Express. Must be hardwired to 0.
4             RO          Memory Write and Invalidate – Does not apply to PCI Express. Must be hardwired to 0.
5             RO          VGA Palette Snoop – Does not apply to PCI Express. Must be hardwired to 0.
6             RW          Parity Error Enable – See Section 5.5.1.7. Default value of this field is 0.
7             RO          IDSEL Stepping / Wait Cycle Control – Does not apply to PCI Express. Must be hardwired to 0.
8             RW          SERR Enable – See Section 5.5.1.7. This bit, when set, enables reporting of non-fatal and fatal errors to the Root Complex. Note that PCI Express-specific error register bits take precedence over this bit. Default value of this field is 0.
9             RO          Fast Back-to-Back Transactions Enable – Does not apply to PCI Express. Must be hardwired to 0.
10            RW          Interrupt Disable – Controls the ability of a PCI Express device to generate INTx interrupt messages. When set, devices are prevented from generating INTx interrupt messages. Any INTx emulation interrupts already asserted must be deasserted when this bit is set. Default value of this field is 0.
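Because MSI messages are in-band memory writes, a function that signals interrupts only through MSI needs Bus Master Enable set, and can set Interrupt Disable to suppress INTx messages. The Python sketch below builds such a Command value from Table 5-3. The constant and function names are hypothetical, and bits 0 and 1 (defined by PCI 2.3, not listed here) are simply preserved.

```python
BUS_MASTER_ENABLE = 1 << 2
PARITY_ERROR_ENABLE = 1 << 6
SERR_ENABLE = 1 << 8
INTERRUPT_DISABLE = 1 << 10

# Bits 3, 4, 5, 7, and 9 are hardwired to 0 in PCI Express (Table 5-3).
HARDWIRED_ZERO = (1 << 3) | (1 << 4) | (1 << 5) | (1 << 7) | (1 << 9)


def command_for_msi(command: int) -> int:
    """Return a Command value that lets the function issue memory writes
    (and therefore MSI) while preventing INTx interrupt messages."""
    value = command | BUS_MASTER_ENABLE | INTERRUPT_DISABLE
    return value & ~HARDWIRED_ZERO & 0xFFFF


def msi_possible(command: int) -> bool:
    """MSI messages are memory writes, so they require Bus Master Enable."""
    return bool(command & BUS_MASTER_ENABLE)
```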


5.5.1.2. Status Register (Offset 06h)

Table 5-4 establishes the mapping between PCI 2.3 and PCI Express for the PCI 2.3 configuration space Status register.

Table 5-4: Status Register

Bit Location  Attributes  Register Description
3             RO          Interrupt Status – Indicates that an INTx interrupt message is pending internally to the device. Default value of this field is 0.
4             RO          Capabilities List – Indicates the presence of an extended capability list item. Since all PCI Express devices are required to implement the PCI Express capability structure, this bit must be set to 1.
5             RO          66 MHz Capable – Does not apply to PCI Express. Must be hardwired to 0.
7             RO          Fast Back-to-Back Transactions Capable – Does not apply to PCI Express. Must be hardwired to 0.
8             RW1C        Master Data Parity Error – See Section 5.5.1.7. This bit is set by a Requestor (Primary Side for Type 1 Configuration Space Header Device) if its Parity Error Enable bit is set and either of the following two conditions occurs:
                          • Requestor receives a Completion marked poisoned
                          • Requestor poisons a write Request
                          If the Parity Error Enable bit is cleared, this bit is never set. Default value of this field is 0.
10:9          RO          DEVSEL Timing – Does not apply to PCI Express. Must be hardwired to 0.
11            RW1C        Signaled Target Abort – See Section 5.5.1.7. This bit is set when a device (Primary Side for Type 1 Configuration Space Header device for requests completed by the Type 1 Header device itself) completes a Request using Completer Abort Completion Status. Default value of this field is 0.
12            RW1C        Received Target Abort – See Section 5.5.1.7. This bit is set when a Requestor (Primary Side for Type 1 Configuration Space Header device for requests initiated by the Type 1 Header device itself) receives a Completion with Completer Abort Completion Status. Default value of this field is 0.
13            RW1C        Received Master Abort – See Section 5.5.1.7. This bit is set when a Requestor (Primary Side for Type 1 Configuration Space Header device for requests initiated by the Type 1 Header device itself) receives a Completion with Unsupported Request Completion Status. Default value of this field is 0.
14            RW1C        Signaled System Error – See Section 5.5.1.7. This bit is set when a device sends an ERR_FATAL or ERR_NONFATAL message. Default value of this field is 0.
15            RW1C        Detected Parity Error – See Section 5.5.1.7. This bit is set by a device (Primary Side for Type 1 Configuration Space Header device) whenever it receives a poisoned TLP, regardless of the state of the Parity Error Enable bit. Default value of this field is 0.

5.5.1.3. Cache Line Size Register (Offset 0Ch)

The Cache Line Size register is set by the system firmware and the operating system to the system cache line size. Note, however, that legacy PCI 2.3 software may not always be able to program this field correctly, especially in the case of hot-plug devices. This field is implemented by PCI Express devices as a read-write field for legacy compatibility purposes, but it has no impact on any PCI Express device functionality.

5.5.1.4. Master Latency Timer Register (Offset 0Dh)

This register is also referred to as the primary latency timer for Type 1 Configuration Space Header devices. The primary/master latency timer does not apply to PCI Express. This register must be hardwired to 0.

5.5.1.5. Interrupt Line Register (Offset 3Ch)

As in PCI 2.3, the Interrupt Line register communicates interrupt line routing information. The register is read/write and must be implemented by any device (or device function) that uses an interrupt pin (see following description). Values in this register are programmed by system software and are system architecture specific. The device itself does not use this value; rather, the value in this register is used by device drivers and operating systems.

5.5.1.6. Interrupt Pin Register (Offset 3Dh)

The Interrupt Pin is a read-only register that identifies the legacy interrupt message(s) the device (or device function) uses; refer to Section 7.1 for further details. Valid values are 1, 2, 3, and 4, which map to legacy interrupt messages for INTA, INTB, INTC, and INTD respectively. A value of 0 indicates that the device uses no legacy interrupt message(s).

5.5.1.7. Error Registers

The error control/status register bits in the Command and Status registers (see Section 5.5.1.1 and Section 5.5.1.2 respectively) control PCI compatible error reporting for both PCI and PCI Express devices. Mapping of PCI Express errors onto PCI errors is also discussed in Section 7.2.5.1. In addition to the PCI compatible error control and status, PCI Express error reporting may be controlled separately from PCI devices through the PCI Express Capability Structure described in Section 5.8. The PCI compatible error control and status register fields do not have any effect on PCI Express error reporting enabled through the PCI Express Capability Structure. PCI Express devices may also implement optional advanced error reporting as described in Section 5.10.
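The Python sketch below covers two Status and Interrupt Pin details. First, because the Status error bits in Table 5-4 are RW1C, software clears the ones it has handled by writing 1s to exactly those positions. Masking to the RW1C bits is only a clarity measure, since RO bits ignore writes. Second, the Interrupt Pin value maps directly to a legacy message name. All constant and function names are hypothetical.

```python
# RW1C error bits of the Status register (Table 5-4).
MASTER_DATA_PARITY_ERROR = 1 << 8
SIGNALED_TARGET_ABORT = 1 << 11
RECEIVED_TARGET_ABORT = 1 << 12
RECEIVED_MASTER_ABORT = 1 << 13
SIGNALED_SYSTEM_ERROR = 1 << 14
DETECTED_PARITY_ERROR = 1 << 15
STATUS_RW1C_MASK = (MASTER_DATA_PARITY_ERROR | SIGNALED_TARGET_ABORT
                    | RECEIVED_TARGET_ABORT | RECEIVED_MASTER_ABORT
                    | SIGNALED_SYSTEM_ERROR | DETECTED_PARITY_ERROR)


def status_clear_value(status_read: int) -> int:
    """Value to write to Status so that exactly the error bits read as 1 clear.
    RO bits such as Interrupt Status and Capabilities List are masked out."""
    return status_read & STATUS_RW1C_MASK


_PIN_NAMES = {0: None, 1: "INTA", 2: "INTB", 3: "INTC", 4: "INTD"}


def interrupt_pin_name(pin: int):
    """Map the Interrupt Pin register (offset 3Dh) to a legacy message name;
    None means the function uses no legacy interrupt message."""
    if pin not in _PIN_NAMES:
        raise ValueError(f"invalid Interrupt Pin value {pin}")
    return _PIN_NAMES[pin]
```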


5.5.2. Type 0 Configuration Space Header

Figure 5-5 details allocation for register fields of PCI 2.3 Type 0 Configuration Space Header for PCI Express devices.

Byte Offset  Register fields (bit 31 on the left, bit 0 on the right)
00h          Device ID | Vendor ID
04h          Status | Command
08h          Class Code | Revision ID
0Ch          BIST | Header Type | Master Latency Timer | Cache Line Size
10h-24h      Base Address Registers
28h          Cardbus CIS Pointer
2Ch          Subsystem ID | Subsystem Vendor ID
30h          Expansion ROM Base Address
34h          Reserved | Capabilities Pointer
38h          Reserved
3Ch          Max_Lat | Min_Gnt | Interrupt Pin | Interrupt Line

Figure 5-5: Type 0 Configuration Space Header

Section 5.5.1 details the PCI Express-specific registers that are valid for all Configuration Space Header types. The PCI Express-specific interpretation of registers specific to the Type 0 PCI 2.3 Configuration Space Header is defined in this section.

5.5.2.1. Min_Gnt/Max_Lat Registers (Offset 3Eh/3Fh)

These registers do not apply to PCI Express. They must be read-only and hardwired to 0.
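Figure 5-5 can be expressed as a Python `ctypes` structure. The byte offsets then become checkable, and a raw 64-byte snapshot can be decoded in one step. The class and field names are hypothetical. Natural alignment places every field at the offset shown in the figure without any padding.

```python
import ctypes


class Type0Header(ctypes.LittleEndianStructure):
    """Figure 5-5 as a ctypes structure; comments give each byte offset."""
    _fields_ = [
        ("vendor_id", ctypes.c_uint16),             # 00h
        ("device_id", ctypes.c_uint16),             # 02h
        ("command", ctypes.c_uint16),               # 04h
        ("status", ctypes.c_uint16),                # 06h
        ("revision_id", ctypes.c_uint8),            # 08h
        ("class_code", ctypes.c_uint8 * 3),         # 09h-0Bh
        ("cache_line_size", ctypes.c_uint8),        # 0Ch
        ("master_latency_timer", ctypes.c_uint8),   # 0Dh (hardwired to 0)
        ("header_type", ctypes.c_uint8),            # 0Eh
        ("bist", ctypes.c_uint8),                   # 0Fh
        ("base_address", ctypes.c_uint32 * 6),      # 10h-27h
        ("cardbus_cis_pointer", ctypes.c_uint32),   # 28h
        ("subsystem_vendor_id", ctypes.c_uint16),   # 2Ch
        ("subsystem_id", ctypes.c_uint16),          # 2Eh
        ("expansion_rom_base", ctypes.c_uint32),    # 30h
        ("capabilities_pointer", ctypes.c_uint8),   # 34h
        ("reserved", ctypes.c_uint8 * 7),           # 35h-3Bh
        ("interrupt_line", ctypes.c_uint8),         # 3Ch
        ("interrupt_pin", ctypes.c_uint8),          # 3Dh
        ("min_gnt", ctypes.c_uint8),                # 3Eh (hardwired to 0)
        ("max_lat", ctypes.c_uint8),                # 3Fh (hardwired to 0)
    ]
```

Usage: `hdr = Type0Header.from_buffer_copy(raw)` decodes a 64-byte `bytes` snapshot of the header.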


5.5.3. Type 1 Configuration Space Header

Figure 5-6 details allocation for register fields of PCI 2.3 Type 1 Configuration Space Header for PCI Express devices.

Byte Offset  Register fields (bit 31 on the left, bit 0 on the right)
00h          Device ID | Vendor ID
04h          Status | Command
08h          Class Code | Revision ID
0Ch          BIST | Header Type | Primary Latency Timer | Cache Line Size
10h          Base Address Register 0
14h          Base Address Register 1
18h          Secondary Latency Timer | Subordinate Bus Number | Secondary Bus Number | Primary Bus Number
1Ch          Secondary Status | I/O Limit | I/O Base
20h          Memory Limit | Memory Base
24h          Prefetchable Memory Limit | Prefetchable Memory Base
28h          Prefetchable Base Upper 32 Bits
2Ch          Prefetchable Limit Upper 32 Bits
30h          I/O Limit Upper 16 Bits | I/O Base Upper 16 Bits
34h          Reserved | Capability Pointer
38h          Expansion ROM Base Address
3Ch          Bridge Control | Interrupt Pin | Interrupt Line

Figure 5-6: Type 1 Configuration Space Header

Section 5.5.1 details the PCI Express-specific registers that are valid for all Configuration Space Header types. The PCI Express-specific interpretation of registers specific to the Type 1 PCI 2.3 Configuration Space Header is defined in this section.

5.5.3.1. Secondary Latency Timer (Offset 1Bh)

This register does not apply to PCI Express. It must be read-only and hardwired to 0.
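The Type 1 layout in Figure 5-6 can be captured the same way as a Python `ctypes` structure. The sketch adds a hypothetical `bus_window` helper that returns the bus numbers reachable below the bridge. It assumes the conventional PCI-to-PCI bridge meaning of Secondary through Subordinate Bus Number, inclusive, which is the "range of bus numbers" consulted by the Type 1 routing rules.

```python
import ctypes


class Type1Header(ctypes.LittleEndianStructure):
    """Figure 5-6 as a ctypes structure; comments give each byte offset."""
    _fields_ = [
        ("vendor_id", ctypes.c_uint16),                   # 00h
        ("device_id", ctypes.c_uint16),                   # 02h
        ("command", ctypes.c_uint16),                     # 04h
        ("status", ctypes.c_uint16),                      # 06h
        ("revision_id", ctypes.c_uint8),                  # 08h
        ("class_code", ctypes.c_uint8 * 3),               # 09h-0Bh
        ("cache_line_size", ctypes.c_uint8),              # 0Ch
        ("primary_latency_timer", ctypes.c_uint8),        # 0Dh (hardwired to 0)
        ("header_type", ctypes.c_uint8),                  # 0Eh
        ("bist", ctypes.c_uint8),                         # 0Fh
        ("base_address", ctypes.c_uint32 * 2),            # 10h-17h
        ("primary_bus", ctypes.c_uint8),                  # 18h
        ("secondary_bus", ctypes.c_uint8),                # 19h
        ("subordinate_bus", ctypes.c_uint8),              # 1Ah
        ("secondary_latency_timer", ctypes.c_uint8),      # 1Bh (hardwired to 0)
        ("io_base", ctypes.c_uint8),                      # 1Ch
        ("io_limit", ctypes.c_uint8),                     # 1Dh
        ("secondary_status", ctypes.c_uint16),            # 1Eh
        ("memory_base", ctypes.c_uint16),                 # 20h
        ("memory_limit", ctypes.c_uint16),                # 22h
        ("prefetchable_memory_base", ctypes.c_uint16),    # 24h
        ("prefetchable_memory_limit", ctypes.c_uint16),   # 26h
        ("prefetchable_base_upper32", ctypes.c_uint32),   # 28h
        ("prefetchable_limit_upper32", ctypes.c_uint32),  # 2Ch
        ("io_base_upper16", ctypes.c_uint16),             # 30h
        ("io_limit_upper16", ctypes.c_uint16),            # 32h
        ("capabilities_pointer", ctypes.c_uint8),         # 34h
        ("reserved", ctypes.c_uint8 * 3),                 # 35h-37h
        ("expansion_rom_base", ctypes.c_uint32),          # 38h
        ("interrupt_line", ctypes.c_uint8),               # 3Ch
        ("interrupt_pin", ctypes.c_uint8),                # 3Dh
        ("bridge_control", ctypes.c_uint16),              # 3Eh
    ]


def bus_window(hdr: Type1Header) -> range:
    """Bus numbers below this bridge: Secondary..Subordinate, inclusive."""
    return range(hdr.secondary_bus, hdr.subordinate_bus + 1)
```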


5.5.3.2. Secondary Status Register (Offset 1Eh)

Table 5-5 establishes the mapping between PCI 2.3 and PCI Express for the PCI 2.3 configuration space Secondary Status register.

Table 5-5: Secondary Status Register

Bit Location  Attributes  Register Description
5             RO          66 MHz Capable – Does not apply to PCI Express. Must be hardwired to 0.
7             RO          Fast Back-to-Back Transactions Capable – Does not apply to PCI Express. Must be hardwired to 0.
8             RW1C        Master Data Parity Error – See Section 5.5.1.7. This bit is set by the Secondary Side Requestor if the Parity Error Response bit is set and either of the following two conditions occurs:
                          • Requestor receives a Completion marked poisoned
                          • Requestor poisons a write Request
                          If the Parity Error Response bit is cleared, this bit is never set. Default value of this field is 0.
10:9          RO          DEVSEL Timing – Does not apply to PCI Express. Must be hardwired to 0.
11            RW1C        Signaled Target Abort – See Section 5.5.1.7. This bit is set when the Secondary Side for Type 1 Configuration Space Header device (for requests completed by the Type 1 Header device itself) completes a Request using Completer Abort Completion Status. Default value of this field is 0.
12            RW1C        Received Target Abort – See Section 5.5.1.7. This bit is set when the Secondary Side for Type 1 Configuration Space Header device (for requests initiated by the Type 1 Header device itself) receives a Completion with Completer Abort Completion Status. Default value of this field is 0.
13            RW1C        Received Master Abort – See Section 5.5.1.7. This bit is set when the Secondary Side for Type 1 Configuration Space Header device (for requests initiated by the Type 1 Header device itself) receives a Completion with Unsupported Request Completion Status. Default value of this field is 0.
14            RW1C        Received System Error – See Section 5.5.1.7. This bit is set when a device sends an ERR_FATAL or ERR_NONFATAL message. Default value of this field is 0.
15            RW1C        Detected Parity Error – See Section 5.5.1.7. This bit is set by the Secondary Side for a Type 1 Configuration Space Header device whenever it receives a poisoned TLP, regardless of the state of the Parity Error Response bit. Default value of this field is 0.

5.5.3.3. Bridge Control Register (Offset 3Eh)

Table 5-6 establishes the mapping between PCI 2.3 and PCI Express for the PCI 2.3 configuration space Bridge Control register.

Table 5-6: Bridge Control Register

Bit Location  Attributes  Register Description
0             RW          Parity Error Response Enable – See Section 5.5.1.7. This bit controls the response to poisoned TLPs. Default value of this field is 0.
1             RW          SERR Enable – See Section 5.5.1.7. This bit controls forwarding of ERR_COR, ERR_NONFATAL, and ERR_FATAL from secondary to primary. Default value of this field is 0.
5             RO          Master Abort Mode – Does not apply to PCI Express. Must be hardwired to 0.
6             RW          Secondary Bus Reset – Setting this bit triggers a warm reset on the corresponding PCI Express Port and the PCI Express hierarchy domain subordinate to the Port. Default value of this field is 0.
7             RO          Fast Back-to-Back Transactions Enable – Does not apply to PCI Express. Must be hardwired to 0.
8             RO          Primary Discard Timer – Does not apply to PCI Express. Must be hardwired to 0.
9             RO          Secondary Discard Timer – Does not apply to PCI Express. Must be hardwired to 0.
10            RO          Discard Timer Status – Does not apply to PCI Express. Must be hardwired to 0.
11            RO          Discard Timer SERR Enable – Does not apply to PCI Express. Must be hardwired to 0.
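Secondary Bus Reset is an RW bit, so software typically pulses it: one write sets the bit, and a later write clears it, with all other RW bits preserved each time. The Python sketch below only computes those two values. The function and constant names are hypothetical. How long the bit must stay set is not specified in this section, so timing is left to the caller.

```python
PARITY_ERROR_RESPONSE_ENABLE = 1 << 0
BRIDGE_SERR_ENABLE = 1 << 1
SECONDARY_BUS_RESET = 1 << 6
# Bits 5, 7, 8, 9, 10, and 11 are hardwired to 0 in PCI Express (Table 5-6).
BRIDGE_HARDWIRED_ZERO = sum(1 << bit for bit in (5, 7, 8, 9, 10, 11))


def secondary_bus_reset_writes(bridge_control: int) -> tuple:
    """Return (assert_value, deassert_value) for pulsing Secondary Bus Reset
    while keeping Parity Error Response Enable, SERR Enable, and other RW bits."""
    base = bridge_control & ~BRIDGE_HARDWIRED_ZERO & 0xFFFF
    return base | SECONDARY_BUS_RESET, base & ~SECONDARY_BUS_RESET
```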


5.6. PCI Power Management Capability Structure

This structure is required for all PCI Express devices. Figure 5-7 details allocation of the PCI PM Capability Structure register fields in a PCI Express context. PCI Express devices are required to support the D0 and D3 device states (refer to Section 6.1.1). PCI-PCI bridge structures representing PCI Express ports, as described in Section 5.1, are required to indicate PME wake capability due to the in-band nature of PME messaging for PCI Express.

The PME Status bit for the PCI-PCI bridge structure representing PCI Express ports, however, is only set when the PCI-PCI bridge function is itself generating a PME. The PME Status bit is not set when the bridge is propagating a PME but the PCI-PCI bridge function itself is not internally asserting PME.

Byte Offset  Register fields (bit 31 on the left, bit 0 on the right)
00h          Power Management Capabilities | Next PTR | Capability ID
04h          Data | PM Control/Status Bridge Extensions | Power Management Status and Control

Figure 5-7: PCI Power Management Capability Structure

Bits    Field
31:27   PME Support
26      D2 Support
25      D1 Support
24:22   AUX Current
21      Device Specific Initialization (DSI)
20      RsvdP
19      PME Clock
18:16   Version
15:8    Next PTR
7:0     Cap ID

Figure 5-8: Power Management Capabilities

Figure 5-8 details allocation of register fields for the Power Management Capabilities register; Table 5-7 establishes the mapping between PCI 2.3 and PCI Express for this register.

Table 5-7: Power Management Capabilities

Bit Location  Attributes  Register Description
7:0           RO          Capability ID – Must be set to 01h
15:8          RO          Next Capability Pointer
18:16         RO          Version – Set to 02h for this version of the specification.
19            RO          PME Clock – Does not apply to PCI Express. Must be hardwired to 0.
21            RO          Device Specific Initialization
24:22         RO          AUX Current
25            RO          D1 Support
26            RO          D2 Support
31:27         RO          PME Support – Must be set for PCI-PCI bridge structures representing ports on Root Complexes/Switches.

Bits    Field
31:24   Data
23      Bus Power/Clock Control Enable
22      B2/B3 Support
21:16   RsvdP
15      PME Status
14:13   Data Scale
12:9    Data Select
8       PME Enable
7:2     RsvdP
1:0     Power State

Figure 5-9: Power Management Status/Control

Figure 5-9 details allocation of register fields for the Power Management Status and Control register; Table 5-8 establishes the mapping between PCI 2.3 and PCI Express for this register.

Table 5-8: Power Management Status/Control

Bit Location  Attributes  Register Description
1:0           RW          Power State
8             RWS         PME Enable
12:9          RW          Data Select
14:13         RO          Data Scale
15            RW1CS       PME Status
22            RO          B2/B3 Support – Does not apply to PCI Express. Must be hardwired to 0.
23            RO          Bus Power/Clock Control Enable – Does not apply to PCI Express. Must be hardwired to 0.
31:24         RO          Data
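The Python sketch below decodes the first dword of the PM capability per Figure 5-8. It also composes a PMCSR write that changes Power State without side effects. Because PME Status is RW1CS, the write must carry 0 in bit 15, or a pending PME would be cleared. The Power State encodings (00b = D0 through 11b = D3hot) are taken from the PCI Power Management Interface Specification and are an assumption here, since Table 5-8 does not list them. Function names are hypothetical.

```python
# Power State encodings (assumption: from the PCI PM Interface Specification).
D0, D1, D2, D3HOT = 0b00, 0b01, 0b10, 0b11

PME_ENABLE = 1 << 8    # RWS
PME_STATUS = 1 << 15   # RW1CS


def decode_pm_capabilities(dword0: int) -> dict:
    """Decode the first dword of the PM capability (Figure 5-8)."""
    return {
        "capability_id": dword0 & 0xFF,          # must be 01h
        "next_pointer": (dword0 >> 8) & 0xFF,
        "version": (dword0 >> 16) & 0x7,
        "pme_clock": bool((dword0 >> 19) & 1),   # hardwired to 0 in PCI Express
        "dsi": bool((dword0 >> 21) & 1),
        "aux_current": (dword0 >> 22) & 0x7,
        "d1_support": bool((dword0 >> 25) & 1),
        "d2_support": bool((dword0 >> 26) & 1),
        "pme_support": (dword0 >> 27) & 0x1F,
    }


def pmcsr_write_for_state(pmcsr_read: int, state: int) -> int:
    """16-bit value to write to PMCSR to change Power State.

    PME Enable and Data Select keep the values read; PME Status is written
    as 0 so that a pending PME is not cleared by accident.
    """
    if state not in (D0, D1, D2, D3HOT):
        raise ValueError("invalid power state")
    value = (pmcsr_read & ~0x3) | state
    value &= ~PME_STATUS
    return value & 0xFFFF
```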


5.7. MSI Capability Structure

This structure is required for all PCI Express devices that are capable of generating interrupts. The definition of the register structure associated with MSI is compatible with the PCI 2.3 specification.

5.8. PCI Express Capability Structure

PCI Express defines a capability structure in PCI 2.3 compatible configuration space (the first 256 bytes), as shown in Figure 5-3. This structure identifies a PCI Express device and indicates support for new PCI Express features. The PCI Express Capability Structure is required for PCI Express devices. The capability structure is a mechanism for enabling PCI software-transparent features requiring support on legacy operating systems. In addition to identifying a PCI Express device, the PCI Express Capability Structure is used to provide access to PCI Express-specific Control/Status registers and related Power Management enhancements.

Figure 5-10 details allocation of register fields in the PCI Express Capability Structure. The PCI Express Capabilities, Device Capabilities, Device Status/Control, Link Capabilities, and Link Status/Control registers are required for all PCI Express devices. Endpoints are not required to implement registers other than those listed above and may terminate the capability structure after them.

Slot Capabilities and Slot Status/Control registers are required for Switch Downstream Ports and Root Ports if a slot is implemented on the port. Root Control/Status registers are required for Root Ports. Root Ports must implement the entire PCI Express Capability Structure.

Byte Offset  Register fields (bit 31 on the left, bit 0 on the right)                  Required for
00h          PCI Express Capabilities Register | Next Cap Pointer | PCI Express Cap ID  All Devices
04h          Device Capabilities                                                          All Devices
08h          Device Status | Device Control                                               All Devices
0Ch          Link Capabilities                                                            All Devices
10h          Link Status | Link Control                                                   All Devices
14h          Slot Capabilities                                                            Ports with Slots
18h          Slot Status | Slot Control                                                   Ports with Slots
1Ch          RsvdP | Root Control                                                         Root Ports
20h          Root Status                                                                  Root Ports

Figure 5-10: PCI Express Capability Structure
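The implementation requirements above can be condensed into a single size rule over Figure 5-10. The Python sketch below returns the minimum number of bytes of the structure a function must implement. It uses the Device/Port Type encodings from Table 5-10 and the Slot Implemented bit. The function and constant names are hypothetical.

```python
# Device/Port Type encodings from Table 5-10 that affect this rule.
ROOT_PORT = 0b0100
SWITCH_DOWNSTREAM_PORT = 0b0110

ALL_DEVICES_END = 0x14        # 00h-13h: through Link Status/Control
PORTS_WITH_SLOTS_END = 0x1C   # adds Slot Capabilities and Slot Status/Control
ROOT_PORTS_END = 0x24         # adds Root Control and Root Status


def required_structure_size(port_type: int, slot_implemented: bool) -> int:
    """Minimum bytes of the PCI Express Capability Structure to implement."""
    if port_type == ROOT_PORT:
        # Root Ports must implement the entire structure.
        return ROOT_PORTS_END
    if port_type == SWITCH_DOWNSTREAM_PORT and slot_implemented:
        return PORTS_WITH_SLOTS_END
    # Endpoints and all other functions may stop after Link Status/Control.
    return ALL_DEVICES_END
```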


5.8.1. PCI Express Capability List Register (Offset 00h)

The PCI Express Capability List register enumerates the PCI Express Capability Structure in the PCI 2.3 configuration space capability list. Figure 5-11 details allocation of register fields in the PCI Express Capability List register; Table 5-9 provides the respective bit definitions.

Bits    Field
15:8    Next Capability Pointer
7:0     Capability ID

Figure 5-11: PCI Express Capability List Register

Table 5-9: PCI Express Capability List Register

Bit Location  Attributes  Register Description
7:0           RO          Capability ID – Indicates PCI Express Capability Structure. This field must return a Capability ID of (value to be assigned by PCI-SIG) indicating that this is a PCI Express Capability Structure.
15:8          RO          Next Capability Pointer – The offset to the next PCI capability structure, or 00h if no other items exist in the linked list of capabilities.

5.8.2. PCI Express Capabilities Register (Offset 02h)

The PCI Express Capabilities register identifies the PCI Express device type and associated capabilities. Figure 5-12 details allocation of register fields in the PCI Express Capabilities register; Table 5-10 provides the respective bit definitions.

Bits    Field
15:14   RsvdP
13:9    Interrupt Message Number
8       Slot Implemented
7:4     Device/Port Type
3:0     Capability Version

Figure 5-12: PCI Express Capabilities Register
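Software locates the PCI Express Capability Structure by walking the PCI 2.3 capability list. It starts at the Capabilities Pointer (34h) and follows each Next Capability Pointer until it reaches 00h. In this revision, the PCI Express Capability ID is "to be assigned by PCI-SIG", so the Python sketch below takes the ID as a parameter rather than hard-coding a value. The sketch masks the low two bits of every pointer (a PCI 2.3 convention) and guards against circular lists. The callback `read8` is a hypothetical byte-read accessor. The second function splits the Capabilities register per Figure 5-12.

```python
from typing import Callable, Optional

CAPABILITIES_POINTER_OFFSET = 0x34


def find_capability(read8: Callable[[int], int], cap_id: int) -> Optional[int]:
    """Return the offset of the first capability whose ID is `cap_id`, or None.
    (Checking the Status register's Capabilities List bit first is omitted.)"""
    visited = set()
    ptr = read8(CAPABILITIES_POINTER_OFFSET) & 0xFC
    while ptr != 0x00:
        if ptr in visited:  # malformed, circular list
            return None
        visited.add(ptr)
        if read8(ptr) == cap_id:           # Capability ID, bits 7:0
            return ptr
        ptr = read8(ptr + 1) & 0xFC        # Next Capability Pointer, bits 15:8
    return None


def decode_pcie_capabilities(reg: int) -> dict:
    """Split the 16-bit PCI Express Capabilities register (Figure 5-12)."""
    return {
        "capability_version": reg & 0xF,
        "device_port_type": (reg >> 4) & 0xF,
        "slot_implemented": bool((reg >> 8) & 1),
        "interrupt_message_number": (reg >> 9) & 0x1F,
    }
```

Usage: with a configuration-space snapshot held in a `bytearray` named `cfg`, `find_capability(cfg.__getitem__, 0x01)` locates the Power Management capability, whose ID is fixed at 01h in Table 5-7.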


Table 5-10: PCI Express Capabilities Register
Bit Location | Register Description | Attributes

3:0 | Capability Version | RO
Indicates the PCI-SIG defined PCI Express capability structure version number. Must be 1h for this specification.

7:4 | Device/Port Type | RO
Indicates the type of PCI Express device. Defined encodings are:
  0000b  PCI Express Endpoint device
  0001b  Legacy PCI Express Endpoint device
  0100b  Root Port of PCI Express Root Complex*
  0101b  Upstream Port of PCI Express Switch*
  0110b  Downstream Port of PCI Express Switch*
  0111b  PCI Express-to-PCI/PCI-X Bridge*
All other encodings are reserved.
*This value is only valid for devices/functions that implement a Type 01h PCI Configuration Space Header.
Native PCI Express Endpoint devices that do not require I/O resources for correct operation indicate a Device/Port Type of 0000b; such devices may request I/O resources (through BARs) for legacy boot support, but system software is allowed to close the requested I/O resources once appropriate services are made available to device-specific software for access to device-specific resources claimed through memory BARs.
Legacy PCI Express Endpoint devices that require I/O resources claimed through BARs for correct operation indicate a Device/Port Type of 0001b.

8 | Slot Implemented | HwInit
This bit, when set, indicates that the PCI Express Link associated with this port is connected to a slot (as compared to being connected to an integrated component or being disabled).
This field is valid for the following PCI Express Device/Port Types:
  0100b  Root Port of PCI Express Root Complex
  0110b  Downstream Port of PCI Express Switch

13:9 | Interrupt Message Number | RO
If this function is allocated more than one MSI interrupt number, this register is required to contain the offset between the base Message Data and the MSI Message that is generated when any of the status bits in either the Slot Status register or the Root Status register of this capability structure are set.
Hardware is required to update this field so that it is correct if the number of MSI Messages assigned to the device changes.
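Each field of Table 5-10 is a fixed bit slice of the 16-bit register. A minimal C sketch of the decoding, assuming the caller has already read the register value (the helper names are hypothetical), also shows that Slot Implemented should only be trusted for the two Device/Port Types listed for bit 8:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field extraction for the PCI Express Capabilities register (Table 5-10). */
static inline unsigned pcie_cap_version(uint16_t caps)  { return caps & 0xFu; }         /* 3:0 */
static inline unsigned pcie_port_type(uint16_t caps)    { return (caps >> 4) & 0xFu; }  /* 7:4 */
static inline bool pcie_slot_implemented(uint16_t caps) { return (caps >> 8) & 0x1u; }  /* 8 */
static inline unsigned pcie_int_msg_num(uint16_t caps)  { return (caps >> 9) & 0x1Fu; } /* 13:9 */

/* Human-readable names for the defined Device/Port Type encodings. */
const char *pcie_port_type_name(unsigned type)
{
    switch (type) {
    case 0x0: return "PCI Express Endpoint";
    case 0x1: return "Legacy PCI Express Endpoint";
    case 0x4: return "Root Port of Root Complex";
    case 0x5: return "Upstream Port of Switch";
    case 0x6: return "Downstream Port of Switch";
    case 0x7: return "PCI Express-to-PCI/PCI-X Bridge";
    default:  return "Reserved";
    }
}

/* Slot Implemented is only valid for Root Ports (0100b) and Switch
 * Downstream Ports (0110b); for other types the bit is ignored. */
bool pcie_has_slot(uint16_t caps)
{
    unsigned type = pcie_port_type(caps);
    return (type == 0x4 || type == 0x6) && pcie_slot_implemented(caps);
}
```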


5.8.3. Device Capabilities Register (Offset 04h)

The Device Capabilities register identifies PCI Express device specific capabilities. Figure 5-13 details allocation of register fields in the Device Capabilities register; Table 5-11 provides the respective bit definitions.

[Figure 5-13: Device Capabilities Register – bits 31:28 RsvdP; 27:26 Slot Power Scale; 25:18 Slot Power Value; 17:15 Max Read Request Size Supported; 14 Power Indicator Present On Device; 13 Attention Indicator Present On Device; 12 Attention Button Present On Device; 11:9 Endpoint L1 Acceptable Latency; 8:6 Endpoint L0s Acceptable Latency; 5 Extended Tag Field Supported; 4:3 Phantom Functions Supported; 2:0 Max Payload Size Supported]

Table 5-11: Device Capabilities Register
Bit Location | Register Description | Attributes

2:0 | Max_Payload_Size Supported | RO
This field indicates the maximum payload size that the device can support for TLPs. Defined encodings are:
  000b  128B max payload size
  001b  256B max payload size
  010b  512B max payload size
  011b  1024B max payload size
  100b  2048B max payload size
  101b  4096B max payload size
  110b  Reserved
  111b  Reserved


4:3 | Phantom Functions Supported | RO
This field indicates support for the use of unclaimed function numbers to extend the number of outstanding transactions allowed, by logically combining unclaimed function numbers (called Phantom Functions) with the Tag identifier. See Section 2.4.2 for a description of Tag Extensions.
This field indicates the number of most significant bits of the function number portion of the Requester ID that are logically combined with the Tag identifier. Defined encodings are:
  00b  No function number bits are used for Phantom Functions; the device may implement all function numbers.
  01b  The most significant bit of the function number in the Requester ID is used for Phantom Functions; the device may implement functions 0-3. Functions 0, 1, 2, and 3 may claim functions 4, 5, 6, and 7, respectively, as Phantom Functions.
  10b  The two most significant bits of the function number in the Requester ID are used for Phantom Functions; the device may implement functions 0-1. Function 0 may claim functions 2, 4, and 6 as Phantom Functions; function 1 may claim functions 3, 5, and 7 as Phantom Functions.
  11b  All three bits of the function number in the Requester ID are used for Phantom Functions; the device must be a single-function device (function 0) that may claim all other functions as Phantom Functions.
Note that Phantom Function support for the Device must be enabled by the corresponding control field in the Device Control register.
A Root Port must always return 00b in this field.

5 | Extended Tag Field Supported | RO
This field indicates the maximum supported size of the Tag field. Defined encodings are:
  0b  5-bit Tag field supported
  1b  8-bit Tag field supported
Note that 8-bit Tag field support must be enabled by the corresponding control field in the Device Control register.
A Root Port must always return 0b in this field.


8:6 | Endpoint L0s Acceptable Latency | RO
This field indicates the acceptable latency that an Endpoint can withstand due to the transition from the L0s state to the L0 state. It is essentially an indirect measure of the Endpoint's internal buffering.
Power management software uses the reported L0s Acceptable Latency number to compare against the L0s exit latencies reported by all components comprising the data path from this Endpoint to the Root Complex Root Port, to determine whether Active State Link PM L0s entry can be used with no loss of performance. Defined encodings are:
  000b  Less than 64 ns
  001b  64 ns-128 ns
  010b  128 ns-256 ns
  011b  256 ns-512 ns
  100b  512 ns-1 µs
  101b  1 µs-2 µs
  110b  2 µs-4 µs
  111b  More than 4 µs

11:9 | Endpoint L1 Acceptable Latency | RO
This field indicates the acceptable latency that an Endpoint can withstand due to the transition from the L1 state to the L0 state. It is essentially an indirect measure of the Endpoint's internal buffering.
Power management software uses the reported L1 Acceptable Latency number to compare against the L1 Exit Latencies reported (see Section 5.8.6) by all components comprising the data path from this Endpoint to the Root Complex Root Port, to determine whether Active State Link PM L1 entry can be used with no loss of performance. Defined encodings are:
  000b  Less than 1 µs
  001b  1 µs-2 µs
  010b  2 µs-4 µs
  011b  4 µs-8 µs
  100b  8 µs-16 µs
  101b  16 µs-32 µs
  110b  32 µs-64 µs
  111b  More than 64 µs

12 | Attention Button Present | RO
This bit, when set, indicates that an Attention Button is implemented on the card or module.
This bit is valid for the following PCI Express device types:
  0000b  PCI Express Endpoint device
  0001b  Legacy PCI Express Endpoint device


13 | Attention Indicator Present | RO
This bit, when set, indicates that an Attention Indicator is implemented on the card or module.
This bit is valid for the following PCI Express device types:
  0000b  PCI Express Endpoint device
  0001b  Legacy PCI Express Endpoint device

14 | Power Indicator Present | RO
This bit, when set, indicates that a Power Indicator is implemented on the card or module.
This bit is valid for the following PCI Express device types:
  0000b  PCI Express Endpoint device
  0001b  Legacy PCI Express Endpoint device

17:15 | Max_Read_Request_Size Supported (Root Complex only) | RO
This field indicates the maximum Read Request size for the Device as a Completer. Defined encodings are:
  000b  Reserved
  001b  Reserved
  010b  512B max read request size
  011b  1024B max read request size
  100b  2048B max read request size
  101b  4096B max read request size
  110b  Reserved
  111b  Reserved

25:18 | Slot Power Limit Value (Upstream Ports only) | RO
In combination with the Slot Power Limit Scale value, specifies the upper limit on power supplied by the slot.
The power limit (in Watts) is calculated by multiplying the value in this field by the value in the Slot Power Limit Scale field.
This value is set by the Set_Slot_Power_Limit message or hardwired to 0000 0000b (see Section 7.9). The default value is 0000 0000b.

27:26 | Slot Power Limit Scale (Upstream Ports only) | RO
Specifies the scale used for the Slot Power Limit Value. Range of values (Watts):
  00b = 1.0x (25.5-255)
  01b = 0.1x (2.55-25.5)
  10b = 0.01x (0.255-2.55)
  11b = 0.001x (0.0-0.255)
This value is set by the Set_Slot_Power_Limit message or hardwired to 00b (see Section 7.9). The default value is 00b.
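Several Table 5-11 fields need a small calculation before they are useful. A minimal C sketch, with hypothetical helper names: Max_Payload_Size Supported becomes a byte count, Slot Power Limit Value times Scale becomes milliwatts (integer milliwatts are a choice made here to avoid floating point), and the outstanding-transaction bound multiplies the Tag space by 2^(phantom function bits). That last formula is this sketch's reading of Phantom Functions Supported, not a formula stated in this section.

```c
#include <assert.h>
#include <stdint.h>

/* Max_Payload_Size Supported, bits 2:0: 000b = 128 B ... 101b = 4096 B.
 * Returns 0 for the reserved encodings 110b and 111b. */
unsigned devcap_max_payload_bytes(uint32_t devcap)
{
    unsigned enc = devcap & 0x7u;
    return enc <= 5 ? 128u << enc : 0;
}

/* Upper bound on distinct transaction IDs per implemented function:
 * 5- or 8-bit Tag (bit 5), extended by the number of function-number bits
 * borrowed as Phantom Functions (bits 4:3). Interpretation, see lead-in. */
unsigned devcap_max_outstanding(uint32_t devcap)
{
    unsigned phantom_bits = (devcap >> 3) & 0x3u;
    unsigned tag_bits = ((devcap >> 5) & 0x1u) ? 8u : 5u;
    return (1u << tag_bits) << phantom_bits;
}

/* Slot Power Limit Value (25:18) times Slot Power Limit Scale (27:26),
 * returned in milliwatts. */
unsigned long devcap_slot_power_mw(uint32_t devcap)
{
    static const unsigned long unit_mw[4] = {1000, 100, 10, 1}; /* 1.0x, 0.1x, 0.01x, 0.001x */
    unsigned long value = (devcap >> 18) & 0xFFu;
    unsigned scale = (devcap >> 26) & 0x3u;
    return value * unit_mw[scale];
}
```

For example, a Value of 250 (FAh) with Scale 01b (0.1x) describes a 25.0 W limit, which the helper reports as 25000 mW.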


5.8.4. Device Control Register (Offset 08h)

The Device Control register controls PCI Express device specific parameters. Figure 5-14 details allocation of register fields in the Device Control register; Table 5-12 provides the respective bit definitions.

[Figure 5-14: Device Control Register – bit 15 RsvdP; 14:12 Max Read Request Size; 11 Stop; 10 Aux Power PM Enable; 9 Phantom Functions Enable; 8 Extended Tag Field Enable; 7:5 Max Payload Size; 4 Unsupported Request Severity; 3 Unsupported Request Reporting Enable; 2 Fatal Error Reporting Enable; 1 Non-Fatal Error Reporting Enable; 0 Correctable Error Reporting Enable]

Table 5-12: Device Control Register
Bit Location | Register Description | Attributes

0 | Correctable Error Reporting Enable | RW
This bit controls reporting of correctable errors. Refer to Section 7.2 for further details. For a multi-function device, this bit controls error reporting for each function from the point of view of the respective function.
For a Root Port, the reporting of correctable errors is internal to the root. No external ERR_CORR message is generated.
Default value of this field is 0.

1 | Non-Fatal Error Reporting Enable | RW
This bit controls reporting of non-fatal errors. Refer to Section 7.2 for further details. For a multi-function device, this bit controls error reporting for each function from the point of view of the respective function.
For a Root Port, the reporting of non-fatal errors is internal to the root. No external ERR_NONFATAL message is generated.
Default value of this field is 0.

2 | Fatal Error Reporting Enable | RW
This bit controls reporting of fatal errors. Refer to Section 7.2 for further details. For a multi-function device, this bit controls error reporting for each function from the point of view of the respective function.
For a Root Port, the reporting of fatal errors is internal to the root. No external ERR_FATAL message is generated.
Default value of this field is 0.


3 | Unsupported Request Reporting Enable | RW
This bit, when set, enables reporting of Unsupported Requests. Refer to Section 7.2 for further details. For a multi-function device, this bit controls error reporting for each function from the point of view of the respective function. Note that the reporting of error messages (ERR_CORR, ERR_NONFATAL, ERR_FATAL) received by a Root Port is controlled exclusively by the Root Control register described in Section 5.8.12.
Default value of this field is 0.

4 | Unsupported Request Severity | RW
This bit controls whether ERR_NONFATAL (0) or ERR_FATAL (1) is used for reporting Unsupported Request errors.
Default value of this field is 0.

7:5 | Max_Payload_Size | RW
This field sets the maximum TLP payload size for the device. As a receiver, the device must handle TLPs as large as the set value; as a transmitter, the device must not generate TLPs exceeding the set value. Permissible values that can be programmed are indicated by the Max_Payload_Size Supported field in the Device Capabilities register (refer to Section 5.8.3). Defined encodings for this field are:
  000b  128B max payload size
  001b  256B max payload size
  010b  512B max payload size
  011b  1024B max payload size
  100b  2048B max payload size
  101b  4096B max payload size
  110b  Reserved
  111b  Reserved
Default value of this field is 001b.

8 | Extended Tag Field Enable | RW
When set, this bit enables a device to use an 8-bit Tag field as a requester. If the bit is cleared, the device is restricted to a 5-bit Tag field. See Section 2.4.2 for a description of Tag extensions.
Default value of this field is 0.
A Root Port does not implement this field.

9 | Phantom Functions Enable | RW
When set, this bit enables a device to use unclaimed functions as Phantom Functions to extend the number of outstanding transaction identifiers. If the bit is cleared, the device is not allowed to use Phantom Functions. See Section 2.4.2 for a description of Tag extensions.
Default value of this field is 0.
A Root Port does not implement this field.


10 | Auxiliary (AUX) Power PM Enable | RW
This bit, when set, enables a device to draw AUX power independent of PME AUX power. Devices that require AUX power on legacy operating systems should continue to indicate PME AUX power requirements. AUX power is allocated as requested in the AUX_Current field of the Power Management Capabilities register (PMC), independent of the PME_En bit in the Power Management Control/Status register (PMCSR) (see Chapter 6). For multi-function devices, a component is allowed to draw AUX power if at least one of the functions has this bit set.
Default value of this field is 0.

11 | Stop | RW
Writing 1 to this bit signals the device to complete pending transactions. Refer to Section 7.4 for the device/function stop synchronization mechanism.
This bit always returns 0 when read.

14:12 | Max_Read_Request_Size | RW
This field sets the maximum Read Request size for the Device as a Requester. The Device must not generate read requests with a size exceeding the set value. Permissible values that can be programmed are indicated by the Max_Read_Request_Size Supported field in the Device Capabilities register (refer to Section 5.8.3). Defined encodings for this field are:
  000b  128B max read request size
  001b  256B max read request size
  010b  512B max read request size
  011b  1024B max read request size
  100b  2048B max read request size
  101b  4096B max read request size
  110b  Reserved
  111b  Reserved
Default value of this field is 010b.
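Max_Payload_Size (7:5) and Max_Read_Request_Size (14:12) share one 3-bit size encoding, so configuration software can update both with a read-modify-write. The minimal C sketch below assumes a hypothetical policy in the spirit of the "Use of Max_Read_Request_Size" implementation note: clamp Max_Payload_Size to the hierarchy-wide value and to the device's Max_Payload_Size Supported, then set Max_Read_Request_Size to the same encoding. Function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define DEVCTL_MPS_SHIFT  5   /* Max_Payload_Size, bits 7:5 */
#define DEVCTL_MRRS_SHIFT 12  /* Max_Read_Request_Size, bits 14:12 */

/* Largest encoding whose size (128 B << enc) does not exceed `bytes`,
 * with a minimum of 000b (128 B) and a maximum of 101b (4096 B). */
unsigned size_to_enc(unsigned bytes)
{
    unsigned enc = 0;
    while (enc < 5 && (128u << (enc + 1)) <= bytes)
        enc++;
    return enc;
}

/* Returns `devctl` with Max_Payload_Size limited to both the hierarchy value
 * and Max_Payload_Size Supported (Device Capabilities 2:0), and with
 * Max_Read_Request_Size set to the same encoding. Other bits are kept. */
uint16_t devctl_apply_mps(uint16_t devctl, uint32_t devcap, unsigned hierarchy_mps_bytes)
{
    unsigned cap_enc = devcap & 0x7u;
    assert(cap_enc <= 5);                  /* 110b and 111b are reserved */
    unsigned enc = size_to_enc(hierarchy_mps_bytes);
    if (enc > cap_enc)
        enc = cap_enc;                     /* never exceed the capability */
    devctl &= (uint16_t)~((0x7u << DEVCTL_MPS_SHIFT) | (0x7u << DEVCTL_MRRS_SHIFT));
    devctl |= (uint16_t)((enc << DEVCTL_MPS_SHIFT) | (enc << DEVCTL_MRRS_SHIFT));
    return devctl;
}
```

Rounding down in `size_to_enc` is deliberate: a limit that is rounded up could let the device exceed what the rest of the hierarchy was configured for.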


Implementation Note: Use of Max_Read_Request_Size
The Max_Read_Request_Size mechanism allows improved control of bandwidth allocation in systems where quality of service (QoS) is important for the target applications. For example, an arbitration scheme based on counting requests (and not the sizes of those requests) provides poor bandwidth allocation when some Requesters use much larger sizes than others. The Max_Read_Request_Size mechanism can be used to force more uniform allocation of bandwidth by restricting the upper size of read requests.
The mechanism also provides a way to simplify a Root Complex implementation by limiting the size of the read requests that the Root Complex, as a Completer, must handle.
PCI Express aware operating systems may use the Max_Read_Request_Size mechanism to help enable correct operation of a device whose Max_Payload_Size capability is smaller than the Max_Payload_Size configured for other devices within the same Hierarchy Domain. For such a device, its Max_Read_Request_Size can be configured to equal its Max_Payload_Size. Thus, read completion packets destined for that device are guaranteed never to exceed its Max_Payload_Size, even when the Completer's Max_Payload_Size is configured to a higher value. Otherwise, the Max_Payload_Size of the other devices would have to be reduced to the “lowest common denominator” of the devices they send read completions to.
Use of the Max_Read_Request_Size mechanism as described above does not address the issue of devices sending large posted writes to a device whose Max_Payload_Size capability is smaller than their configured Max_Payload_Size. However, for many devices, the programming model doesn't require them to receive posted writes of a size exceeding their Max_Payload_Size capability anyway, making the posted writes issue irrelevant.

5.8.5. Device Status Register (Offset 0Ah)

The Device Status register provides information about PCI Express device specific parameters. Figure 5-15 details allocation of register fields in the Device Status register; Table 5-13 provides the respective bit definitions.

[Figure 5-15: Device Status Register – bits 15:6 RsvdZ; 5 Transactions Pending; 4 AUX Power Detected; 3 Unsupported Request Detected; 2 Fatal Error Detected; 1 Non-Fatal Error Detected; 0 Correctable Error Detected]


Table 5-13: Device Status Register
Bit Location | Register Description | Attributes

0 | Correctable Error Detected | RW1C
This bit indicates the status of correctable errors detected. Errors are logged in this register regardless of whether error reporting is enabled in the Device Control register. For a multi-function device, each function indicates the status of errors as perceived by the respective function.
Default value of this field is 0.

1 | Non-Fatal Error Detected | RW1C
This bit indicates the status of non-fatal errors detected. Errors are logged in this register regardless of whether error reporting is enabled in the Device Control register. For a multi-function device, each function indicates the status of errors as perceived by the respective function.
Default value of this field is 0.

2 | Fatal Error Detected | RW1C
This bit indicates the status of fatal errors detected. Errors are logged in this register regardless of whether error reporting is enabled in the Device Control register. For a multi-function device, each function indicates the status of errors as perceived by the respective function.
Default value of this field is 0.

3 | Unsupported Request Detected | RW1C
This bit indicates that the device received an Unsupported Request. Errors are logged in this register regardless of whether error reporting is enabled in the Device Control register. For a multi-function device, each function indicates the status of errors as perceived by the respective function.
Default value of this field is 0.

4 | AUX Power Detected | RO
Devices that require AUX power report this bit as set if AUX power is detected by the device.

5 | Transactions Pending | RO
Indicates whether a device has any transactions pending. A device indicates that transactions are pending (including completions for any outstanding non-posted requests for all used Traffic Classes) by reporting this bit as set. A device may report this bit cleared only when all pending transactions (including completions for any outstanding non-posted requests on any used virtual channel) have been completed. Refer to Section 7.4 for the device/function stop synchronization mechanism.
This bit must be set by hardware when a 1 is written to the Stop bit in the Device Control register and subsequently cleared (by hardware) when all pending transactions have been completed.
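The RW1C attribute in Table 5-13 means software clears an error bit by writing 1 to it, while writing 0 leaves it unchanged and the RO bits ignore the write. A minimal C sketch that models those semantics and the Stop/Transactions Pending handshake; `read_devsta` is a hypothetical accessor supplied by the caller, not an existing API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEVSTA_CORR_ERR   (1u << 0) /* RW1C */
#define DEVSTA_NFATAL_ERR (1u << 1) /* RW1C */
#define DEVSTA_FATAL_ERR  (1u << 2) /* RW1C */
#define DEVSTA_UR         (1u << 3) /* RW1C */
#define DEVSTA_AUX_PWR    (1u << 4) /* RO */
#define DEVSTA_TRANS_PEND (1u << 5) /* RO */
#define DEVSTA_RW1C_MASK  (DEVSTA_CORR_ERR | DEVSTA_NFATAL_ERR | DEVSTA_FATAL_ERR | DEVSTA_UR)

/* Models how the register responds to a software write: each RW1C bit
 * written as 1 is cleared; 0s and RO bits leave the register unchanged. */
uint16_t devsta_apply_write(uint16_t current, uint16_t written)
{
    return (uint16_t)(current & ~(written & DEVSTA_RW1C_MASK));
}

/* Value to write back so that only the error bits software actually
 * observed are cleared. */
uint16_t devsta_clear_value(uint16_t observed)
{
    return (uint16_t)(observed & DEVSTA_RW1C_MASK);
}

/* After software writes 1 to Stop (Device Control bit 11), polls
 * Transactions Pending until it clears; false if still set after
 * `max_polls` reads. */
bool devsta_wait_idle(uint16_t (*read_devsta)(void *ctx), void *ctx, unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++)
        if (!(read_devsta(ctx) & DEVSTA_TRANS_PEND))
            return true;
    return false;
}
```

Writing back the observed value, rather than a blanket all-ones pattern, avoids clearing an error bit that software has not yet looked at.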


5.8.6. Link Capabilities Register (Offset 0Ch)

The Link Capabilities register identifies PCI Express Link specific capabilities. Figure 5-16 details allocation of register fields in the Link Capabilities register; Table 5-14 provides the respective bit definitions.

[Figure 5-16: Link Capabilities Register – bits 31:24 Port #; 23:18 RsvdP; 17:15 L1 Exit Latency; 14:12 L0s Exit Latency; 11:10 Active State Link PM Support; 9:4 Maximum Link Width; 3:0 Maximum Link Speed]

Table 5-14: Link Capabilities Register
Bit Location | Register Description | Attributes

3:0 | Maximum Link Speed | RO
This field indicates the maximum Link speed of the given PCI Express Link. Defined encodings are:
  0001b  2.5 Gb/s Link
All other encodings are reserved.

9:4 | Maximum Link Width | RO
This field indicates the maximum width of the given PCI Express Link. Defined encodings are:
  000000b  Reserved
  000001b  x1
  000010b  x2
  000100b  x4
  001000b  x8
  001100b  x12
  010000b  x16
  100000b  x32

11:10 | Active State Link PM Support | RO
This field indicates the level of active state power management supported on the given PCI Express Link. Defined encodings are:
  00b  Reserved
  01b  L0s Entry Supported
  10b  Reserved
  11b  L0s and L1 Supported


14:12 | L0s Exit Latency | RO
This field indicates the L0s exit latency for the given PCI Express Link. The value reported indicates the length of time this Port requires to complete the transition from L0s to L0. Defined encodings are:
  000b  Less than 64 ns
  001b  64 ns-128 ns
  010b  128 ns-256 ns
  011b  256 ns-512 ns
  100b  512 ns-1 µs
  101b  1 µs-2 µs
  110b  2 µs-4 µs
  111b  Reserved
Note that exit latencies may be influenced by the PCI Express reference clock configuration, depending upon whether a component uses a common or separate reference clock.

17:15 | L1 Exit Latency | RO
This field indicates the L1 exit latency for the given PCI Express Link. The value reported indicates the length of time this Port requires to complete the transition from L1 to L0. Defined encodings are:
  000b  Less than 1 µs
  001b  1 µs-2 µs
  010b  2 µs-4 µs
  011b  4 µs-8 µs
  100b  8 µs-16 µs
  101b  16 µs-32 µs
  110b  32 µs-64 µs
  111b  L1 transition not supported
Note that exit latencies may be influenced by the PCI Express reference clock configuration, depending upon whether a component uses a common or separate reference clock.

31:24 | Port Number | HwInit
This field indicates the PCI Express port number for the given PCI Express Link.
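The Endpoint L0s/L1 Acceptable Latency fields (Table 5-11) are meant to be compared with the exit latencies in Table 5-14 along the path to the Root Port. A minimal C sketch of that comparison under two explicit assumptions: each 3-bit range is reduced to its upper bound, and the exit latencies of all ports on the path are summed, which is conservative because this section does not say how per-component values combine. The sketch also notes that the defined Maximum Link Width encodings equal the lane count in binary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LAT_UNLIMITED UINT32_MAX /* 111b: reserved, not supported, or "more than" */

/* Upper bounds, in ns, of the 3-bit L0s ranges (<64 ns, 64-128 ns, ..., 2-4 us). */
static const uint32_t l0s_bound_ns[7] = {64, 128, 256, 512, 1000, 2000, 4000};
/* Upper bounds, in ns, of the 3-bit L1 ranges (<1 us, 1-2 us, ..., 32-64 us). */
static const uint32_t l1_bound_ns[7] = {1000, 2000, 4000, 8000, 16000, 32000, 64000};

static uint32_t bound(const uint32_t table[7], unsigned enc)
{
    return enc < 7 ? table[enc] : LAT_UNLIMITED;
}

/* Link Capabilities (Table 5-14). */
unsigned linkcap_max_width(uint32_t lc)   { return (lc >> 4) & 0x3Fu; } /* encoding == lanes */
unsigned linkcap_port_number(uint32_t lc) { return (lc >> 24) & 0xFFu; }
uint32_t linkcap_l0s_exit_ns(uint32_t lc) { return bound(l0s_bound_ns, (lc >> 12) & 0x7u); }
uint32_t linkcap_l1_exit_ns(uint32_t lc)  { return bound(l1_bound_ns, (lc >> 15) & 0x7u); }

/* Device Capabilities (Table 5-11). */
uint32_t devcap_l0s_acceptable_ns(uint32_t dc) { return bound(l0s_bound_ns, (dc >> 6) & 0x7u); }
uint32_t devcap_l1_acceptable_ns(uint32_t dc)  { return bound(l1_bound_ns, (dc >> 9) & 0x7u); }

/* ASSUMPTION: sums the exit latencies of every port on the Endpoint-to-Root
 * Port path and compares the total with the Endpoint's acceptable latency. */
bool aspm_path_ok(const uint32_t *exit_ns, int n, uint32_t acceptable_ns)
{
    uint64_t total = 0;
    for (int i = 0; i < n; i++) {
        if (exit_ns[i] == LAT_UNLIMITED)
            return false;              /* reserved encoding or L1 not supported */
        total += exit_ns[i];
    }
    return acceptable_ns == LAT_UNLIMITED || total <= acceptable_ns;
}
```

Using upper bounds on both sides is one reading of the ranges; a stricter policy could instead compare the exit latency's upper bound against the acceptable latency's lower bound.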


5.8.7. Link Control Register (Offset 10h)

The Link Control register controls PCI Express Link specific parameters. Figure 5-17 details allocation of register fields in the Link Control register; Table 5-15 provides the respective bit definitions.

[Figure 5-17: Link Control Register – bits 15:7 RsvdP; 6 Common Clock Configuration; 5 Retrain Link; 4 Link Disable; 3 Read Request Return Parameter “R” Control; 2 Link Loop Back Mode; 1:0 Active State PM Control]

Table 5-15: Link Control Register
Bit Location | Register Description | Attributes

1:0 | Active State Link PM Control | RW
This field controls the level of active state PM supported on the given PCI Express Link. Defined encodings are:
  00b  Disabled
  01b  L0s Entry Supported
  10b  Reserved
  11b  L0s and L1 Entry Supported
Default value for this field is 00b.

2 | Link Loop Back Mode | RW
This bit puts a Link in loop-back mode for debug/diagnostic purposes. For multi-function devices, a Link is put in loop-back mode if all functions of the component have this bit set.
Default value of this field is 0.


3 | Read Request Return Parameter “R” Control | RW
Refer to Section 2.7.6.2.1 for the definition of the Read Request Return Parameter.
Defined encodings for “R” capabilities are:
  0b  64 byte
  1b  128 byte
PCI Express Endpoints and Switches that do not implement this feature must hardwire the field to 0b.
This field is hardwired for a Root Port and returns its “R” support capabilities.

4 | Link Disable | RW
This bit disables the Link when set; this field is not applicable and is reserved for endpoint devices and Upstream Ports of a Switch.
Default value of this field is 0.

5 | Retrain Link | RW
This bit initiates Link retraining when set; this field is not applicable and is reserved for endpoint devices and Upstream Ports of a Switch.
This bit always returns 0 when read.

6 | Common Clock Configuration | RW
This bit, when set, indicates that this component and the component at the opposite end of this Link are operating with a distributed common reference clock.
A value of 0 indicates that this component and the component at the opposite end of this Link are operating with asynchronous reference clocks.
Components utilize this common clock configuration information to report the correct L0s and L1 Exit Latencies.
Default value of this field is 0.
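Two common Link Control updates are enabling ASPM no further than the Link Capabilities advertise, and retraining after Common Clock Configuration changes so exit latencies are reported for the new clocking. A minimal C sketch, assuming software writes Retrain Link on the downstream end of the Link (the bit is reserved for endpoints and Switch Upstream Ports) and then polls the Link Training bit (bit 11) of the Link Status register. That write-then-poll ordering is this sketch's assumption, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LNKCTL_ASPM_MASK  0x3u       /* Active State Link PM Control, bits 1:0 */
#define LNKCTL_RETRAIN    (1u << 5)  /* Retrain Link; reads back as 0 */
#define LNKCTL_COMMON_CLK (1u << 6)  /* Common Clock Configuration */
#define LNKSTA_TRAINING   (1u << 11) /* Link Status: Link Training */

/* ASPM Control encodings from Table 5-15. */
enum aspm_ctl { ASPM_DISABLED = 0x0, ASPM_L0S = 0x1, ASPM_L0S_L1 = 0x3 };

/* Limits the requested level to Link Capabilities bits 11:10. Both fields
 * are treated as flags (bit 0 = L0s, bit 1 = L1); 10b is reserved here. */
uint16_t lnkctl_set_aspm(uint16_t lnkctl, uint32_t linkcap, enum aspm_ctl want)
{
    unsigned supported = (linkcap >> 10) & 0x3u;
    unsigned level = (unsigned)want & supported;
    if (level == 0x2u)
        level = 0;
    return (uint16_t)((lnkctl & ~LNKCTL_ASPM_MASK) | level);
}

/* Value to write to start retraining with the given clock configuration. */
uint16_t lnkctl_retrain_value(uint16_t lnkctl, bool common_clock)
{
    uint16_t v = common_clock ? (uint16_t)(lnkctl | LNKCTL_COMMON_CLK)
                              : (uint16_t)(lnkctl & ~LNKCTL_COMMON_CLK);
    return (uint16_t)(v | LNKCTL_RETRAIN);
}

/* True once hardware has cleared Link Training in Link Status. */
bool link_training_done(uint16_t lnksta)
{
    return !(lnksta & LNKSTA_TRAINING);
}
```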


5.8.8. Link Status Register (Offset 12h)

The Link Status register provides information about PCI Express Link specific parameters. Figure 5-18 details allocation of register fields in the Link Status register; Table 5-16 provides the respective bit definitions.

[Figure 5-18: Link Status Register – bits 15:13 RsvdZ; 12 Slot Clock Configuration; 11 Link Training; 10 Link Training Error; 9:4 Negotiated Link Width; 3:0 Link Speed]

Table 5-16: Link Status Register
Bit Location | Register Description | Attributes

3:0 | Link Speed | RO
This field indicates the negotiated Link speed of the given PCI Express Link. Defined encodings are:
  0001b  2.5 Gb/s PCI Express Link
All other encodings are reserved.

9:4 | Negotiated Link Width | RO
This field indicates the negotiated width of the given PCI Express Link. Defined encodings are:
  000001b  x1
  000010b  x2
  000100b  x4
  001000b  x8
  001100b  x12
  010000b  x16
  100000b  x32
All other encodings are reserved.

10 | Link Training Error | RO
This read-only bit indicates that a Link training error occurred.

11 | Link Training | RO
This read-only bit indicates that Link training is in progress; hardware clears this bit once Link training is complete.


12 | Slot Clock Configuration | RO
This bit indicates that the component uses the same physical reference clock that the platform provides on the connector. If the device uses an independent clock irrespective of the presence of a reference on the connector, this bit must be clear.

5.8.9. Slot Capabilities Register (Offset 14h)

The Slot Capabilities register identifies PCI Express Slot specific capabilities. Figure 5-19 details allocation of register fields in the Slot Capabilities register; Table 5-17 provides the respective bit definitions.

[Figure 5-19: Slot Capabilities Register – bits 31:22 Physical Slot Number; 21:17 RsvdP; 16:15 Slot Power Scale; 14:7 Slot Power Value; 6 Hot-Plug Capable; 5 Hot-Plug Surprise; 4 Power Indicator Present; 3 Attention Indicator Present; 2 MRL Sensor Present; 1 Power Controller Present; 0 Attention Button Present]

Table 5-17: Slot Capabilities Register
Bit Location | Register Description | Attributes

0 | Attention Button Present | HwInit
This bit, when set, indicates that an Attention Button is implemented on the chassis for this slot.

1 | Power Controller Present | HwInit
This bit, when set, indicates that a Power Controller is implemented for this slot.

2 | MRL Sensor Present | HwInit
This bit, when set, indicates that an MRL Sensor is implemented on the chassis for this slot.

3 | Attention Indicator Present | HwInit
This bit, when set, indicates that an Attention Indicator is implemented on the chassis for this slot.

4 | Power Indicator Present | HwInit
This bit, when set, indicates that a Power Indicator is implemented on the chassis for this slot.

5 | Hot-plug Surprise | HwInit
This bit, when set, indicates that a device present in this slot might be removed from the system without any prior notification.


6 | Hot-plug Capable | HwInit
This bit, when set, indicates that this slot is capable of supporting Hot-plug operations.

14:7 | Slot Power Limit Value | HwInit
In combination with the Slot Power Limit Scale value, specifies the upper limit on power supplied by the slot.
The power limit (in Watts) is calculated by multiplying the value in this field by the value in the Slot Power Limit Scale field.
This register must be implemented if the Slot Implemented bit is set.
The default value is 0000 0000b.

16:15 | Slot Power Limit Scale | HwInit
Specifies the scale used for the Slot Power Limit Value. Range of values (Watts):
  00b = 1.0x (25.5-255)
  01b = 0.1x (2.55-25.5)
  10b = 0.01x (0.255-2.55)
  11b = 0.001x (0.0-0.255)
This register must be implemented if the Slot Implemented bit is set.
The default value is 00b.

31:22 | Physical Slot Number | HwInit
This hardware-initialized field indicates the physical slot number attached to this Port. This field must be hardware initialized to a value that assigns a slot number that is globally unique within the chassis. This field should be initialized to 0 for ports connected to devices that are either integrated on the motherboard or integrated within the same silicon as the Switch device or Root Port.
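The Range of values column implies an encoding rule: pick the finest scale whose range still holds the budget. A minimal C sketch of going from a power budget to Slot Power Limit Value (14:7) and Scale (16:15), as platform initialization of these HwInit fields might do. The milliwatt input, the round-down choice (so the advertised limit never exceeds the budget), the clamp above 255 W, and all helper names are assumptions of this sketch.

```c
#include <assert.h>
#include <stdint.h>

struct slot_power_limit {
    uint8_t value; /* Slot Power Limit Value, 8 bits */
    uint8_t scale; /* Slot Power Limit Scale, 00b..11b */
};

/* Encodes a budget in milliwatts using the finest scale that fits,
 * rounding down; budgets above 255 W are clamped to 255 W. */
struct slot_power_limit slot_power_encode(uint32_t mw)
{
    static const uint32_t unit_mw[4] = {1000, 100, 10, 1}; /* 1.0x .. 0.001x */
    for (int scale = 3; scale > 0; scale--) {
        if (mw / unit_mw[scale] <= 255) {
            struct slot_power_limit r = { (uint8_t)(mw / unit_mw[scale]), (uint8_t)scale };
            return r;
        }
    }
    uint32_t watts = mw / 1000;                            /* scale 00b: 1 W units */
    struct slot_power_limit r = { (uint8_t)(watts > 255 ? 255 : watts), 0 };
    return r;
}

/* Places an encoded limit into a Slot Capabilities value, keeping other bits. */
uint32_t slotcap_with_power(uint32_t slotcap, struct slot_power_limit p)
{
    slotcap &= ~((0xFFu << 7) | (0x3u << 15));
    return slotcap | ((uint32_t)p.value << 7) | ((uint32_t)p.scale << 15);
}
```

For instance, a 25 W budget does not fit the 0.01x range (2500 > 255), so it is encoded as Value 250 with Scale 01b (0.1x).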


5.8.10. Slot Control Register (Offset 18h)

The Slot Control register controls PCI Express Slot specific parameters. Figure 5-20 details allocation of register fields in the Slot Control register; Table 5-18 provides the respective bit definitions.

[Figure 5-20: Slot Control Register – bits 15:11 RsvdP; 10 Power Controller Control; 9:8 Power Indicator Control; 7:6 Attention Indicator Control; 5 Hot-plug Interrupt Enable; 4 Command Completed Interrupt Enable; 3 Presence Detect Changed Enable; 2 MRL Sensor Changed Enable; 1 Power Fault Detected Enable; 0 Attention Button Pressed Enable]

Table 5-18: Slot Control Register
Bit Location | Register Description | Attributes

0 | Attention Button Pressed Enable | RW
This bit, when set, enables the generation of a hot plug interrupt or wake message on an attention button pressed event.
Default value of this field is 0.

1 | Power Fault Detected Enable | RW
This bit, when set, enables the generation of a hot plug interrupt or wake message on a power fault event.
Default value of this field is 0.

2 | MRL Sensor Changed Enable | RW
This bit, when set, enables the generation of a hot plug interrupt or wake message on an MRL sensor changed event.
Default value of this field is 0.

3 | Presence Detect Changed Enable | RW
This bit, when set, enables the generation of a hot plug interrupt or wake message on a presence detect changed event.
Default value of this field is 0.

4 | Command Completed Interrupt Enable | RW
This bit, when set, enables the generation of a hot plug interrupt when a command is completed by the Hot plug controller.
Default value of this field is 0.

5 | Hot plug Interrupt Enable | RW
This bit, when set, enables generation of a hot plug interrupt on enabled hot plug events.
Default value of this field is 0.


7:6 | Attention Indicator Control | RW
Reads to this register return the current state of the Attention Indicator; writes to this register set the Attention Indicator. Defined encodings are:
  00b  Reserved
  01b  On
  10b  Blink
  11b  Off
A write to this register causes the Port to send the appropriate ATTENTION_INDICATOR_* messages.

9:8 | Power Indicator Control | RW
Reads to this register return the current state of the Power Indicator; writes to this register set the Power Indicator. Defined encodings are:
  00b  Reserved
  01b  On
  10b  Blink
  11b  Off
Writes to this register cause the Port to send the appropriate POWER_INDICATOR_* messages.

10 | Power Controller Control | RW
When read, this register returns the current state of the power applied to the slot; when written, it sets the power state of the slot per the defined encodings:
  0b  Power On
  1b  Power Off
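Because the Attention and Power Indicator Control fields share one encoding (01b On, 10b Blink, 11b Off, 00b reserved), a hot-plug driver can update either with the same read-modify-write. A minimal C sketch with hypothetical helper names; the "enable every event" policy and the power-down sequence are illustrative assumptions, not behavior required by this section.

```c
#include <assert.h>
#include <stdint.h>

/* Indicator encodings shared by Attention (7:6) and Power (9:8) Indicator Control. */
enum indicator { IND_ON = 0x1, IND_BLINK = 0x2, IND_OFF = 0x3 }; /* 00b is reserved */

#define SLTCTL_ATTN_BTN_EN  (1u << 0)
#define SLTCTL_PWR_FAULT_EN (1u << 1)
#define SLTCTL_MRL_CHG_EN   (1u << 2)
#define SLTCTL_PRES_CHG_EN  (1u << 3)
#define SLTCTL_CMD_CPL_EN   (1u << 4)
#define SLTCTL_HP_INT_EN    (1u << 5)
#define SLTCTL_PWR_OFF      (1u << 10) /* Power Controller Control: 1b = Power Off */

uint16_t sltctl_set_attn(uint16_t v, enum indicator s)
{
    return (uint16_t)((v & ~(0x3u << 6)) | ((unsigned)s << 6));
}

uint16_t sltctl_set_power_ind(uint16_t v, enum indicator s)
{
    return (uint16_t)((v & ~(0x3u << 8)) | ((unsigned)s << 8));
}

/* Example policy: enable every event source (bits 4:0) plus the Hot plug
 * Interrupt Enable bit (5), leaving indicator and power bits untouched. */
uint16_t sltctl_enable_hotplug_events(uint16_t v)
{
    return (uint16_t)(v | SLTCTL_ATTN_BTN_EN | SLTCTL_PWR_FAULT_EN | SLTCTL_MRL_CHG_EN |
                      SLTCTL_PRES_CHG_EN | SLTCTL_CMD_CPL_EN | SLTCTL_HP_INT_EN);
}

/* Example power-down: Power Indicator Off and Power Controller Control = Power Off. */
uint16_t sltctl_power_down(uint16_t v)
{
    v = sltctl_set_power_ind(v, IND_OFF);
    return (uint16_t)(v | SLTCTL_PWR_OFF);
}
```

Each write that changes an indicator field also causes the Port to send the corresponding ATTENTION_INDICATOR_* or POWER_INDICATOR_* message, so software would normally write only when the state actually changes.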


5.8.11. Slot Status Register (Offset 1Ah)

The Slot Status register provides information about PCI Express slot-specific parameters. Figure 5-21 details the allocation of register fields in the Slot Status register; Table 5-19 provides the respective bit definitions.

[Figure 5-21: Slot Status Register. Bits 6:0 hold the fields defined in Table 5-19; bits 15:7 are RsvdZ.]

Table 5-19: Slot Status Register

Bit Location | Register Description | Attributes
0 | Attention Button Pressed – This bit is set when the attention button is pressed. Default value of this field is 0. | RW1C
1 | Power Fault Detected – This bit is set when the Power Controller detects a power fault at this slot. Default value of this field is 0. | RW1C
2 | MRL Sensor Changed – This bit is set when an MRL sensor state change is detected. Default value of this field is 0. | RW1C
3 | Presence Detect Changed – This bit is set when a presence detect change is detected. Default value of this field is 0. | RW1C
4 | Command Completed – This bit is set when the hot-plug controller completes an issued command. Default value of this field is 0. | RW1C
5 | MRL Sensor State – This field reports the status of the MRL sensor, if one is implemented. Defined encodings are: 0b MRL Closed, 1b MRL Open. | RO
6 | Presence Detect State – This bit indicates the presence of a card in the slot. It reflects the status of the Presence Detect pin as defined in the PCI Express Card Electromechanical Specification. Defined encodings are: 0b Slot Empty, 1b Card Present in slot. This field is required to be implemented on all Switch Downstream Ports and Root Ports, and whenever a slot is implemented. For Switch Downstream Ports or Root Ports not connected to any slot, the Presence Detect State field should be hardwired to 1. | RO

5.8.12. Root Control Register (Offset 1Ch)

The Root Control register controls PCI Express Root Complex-specific parameters. Figure 5-22 details the allocation of register fields in the Root Control register; Table 5-20 provides the respective bit definitions.

[Figure 5-22: Root Control Register. Bits 3:0 hold the fields defined in Table 5-20; bits 15:4 are RsvdP.]


Table 5-20: Root Control Register

Bit Location | Register Description | Attributes
0 | System Error on Correctable Error Enable – This bit controls the Root Complex's response to correctable errors. When set, it indicates that a System Error should be generated if a correctable error is reported by any of the devices in the hierarchy associated with this Root Port. | RW
1 | System Error on Non-Fatal Error Enable – This bit controls the Root Complex's response to non-fatal errors. When set, it indicates that a System Error should be generated if a non-fatal error is reported by any of the devices in the hierarchy associated with this Root Port. | RW
2 | System Error on Fatal Error Enable – This bit controls the Root Complex's response to fatal errors. When set, it indicates that a System Error should be generated if a fatal error is reported by any of the devices in the hierarchy associated with this Root Port. | RW
3 | PME Interrupt Enable – When set, this bit enables interrupt generation upon receipt of a PME message, as reflected in the PME Status bit (see Table 5-21). A PME interrupt is also generated if the PME Status bit is already set when this bit changes from cleared to set. Default value of this field is 0. | RW

5.8.13. Root Status Register (Offset 20h)

The Root Status register provides information about PCI Express device-specific parameters. Figure 5-23 details the allocation of register fields in the Root Status register; Table 5-21 provides the respective bit definitions.

[Figure 5-23: Root Status Register. Bits 17:0 hold the fields defined in Table 5-21; bits 31:18 are RsvdZ.]


Table 5-21: Root Status Register

Bit Location | Register Description | Attributes
15:0 | PME Requestor ID – This field indicates the PCI Requestor ID of the last PME requestor. | RO
16 | PME Status – This bit indicates that PME was asserted by the requestor whose ID is indicated in the PME Requestor ID field. Subsequent PMEs are kept pending until software clears this bit by writing a 1 to it. | RW1C
17 | PME Pending – This read-only bit indicates that another PME is pending while the PME Status bit is set. When software clears the PME Status bit, hardware delivers the pending PME by setting the PME Status bit again and updating the PME Requestor ID field accordingly. Hardware clears the PME Pending bit when no more PMEs are pending. | RO

5.9. PCI Express Extended Capabilities

PCI Express Extended Capability registers are located either in device configuration space at offsets 256 (100h) or greater, as shown in Figure 5-24, or in the Root Complex Register Block (RCRB). When located in device configuration space, these registers are accessible only through the PCI Express extended configuration space flat memory-mapped access mechanism.

PCI Express Extended Capability structures are organized as a linked list of optional or required PCI Express Extended Capabilities, in a format resembling PCI capability structures.
The first DWORD of each capability structure identifies the capability and its version, and points to the next capability, as shown in Figure 5-24.

[Figure 5-24: PCI Express Extended Configuration Space Layout. Offsets 00h through FFh hold PCI Configuration Space; offsets 100h through FFFh hold PCI Express Extended Configuration Space, and PCI Express extended capabilities start at the base of the extended configuration region. Each PCI Express Extended Capability starts with a header DWORD (bits 15:0 Capability ID, bits 19:16 Capability Version Number, bits 31:20 Next Capability Offset, 0-based), followed by capability data whose length is implied by the Capability ID and Version Number.]


5.9.1. Extended Capabilities in Configuration Space

Extended Capabilities in device configuration space always begin at offset 100h with a PCI Express Enhanced Capability Header (Section 5.9.3). The absence of any Extended Capabilities is required to be indicated by an Enhanced Capability Header with a Capability ID of FFFFh and a Next Capability Offset of 0h.

5.9.2. Extended Capabilities in the Root Complex Register Block

Extended Capabilities in a Root Complex Register Block always begin at offset 0h with a PCI Express Enhanced Capability Header (Section 5.9.3). The absence of any Extended Capabilities is required to be indicated by an Enhanced Capability Header with a Capability ID of FFFFh and a Next Capability Offset of 0h.

5.9.3. PCI Express Enhanced Capability Header

All PCI Express extended capabilities must begin with a PCI Express Enhanced Capability Header.

[Figure 5-25: PCI Express Enhanced Capability Header. Bits 15:0 PCI Express Extended Capability ID, bits 19:16 Capability Version, bits 31:20 Next Capability Offset.]

Table 5-22: PCI Express Enhanced Capability Header

Bit Location | Description | Register Attribute
15:0 | PCI Express Extended Capability ID – This field is a PCI-SIG defined ID number that indicates the nature and format of the extended capability. | RO
19:16 | Capability Version – This field is a PCI-SIG defined version number that indicates the version of the capability structure present. | RO
31:20 | Next Capability Offset – This field contains the offset to the next PCI Express capability structure, or 000h if no other items exist in the linked list of capabilities. For Extended Capabilities implemented in device configuration space, this offset is relative to the beginning of PCI-compatible configuration space and thus must always be either 000h (terminating the list of capabilities) or greater than 0FFh. | RO


5.10. Advanced Error Reporting Capability

The PCI Express Advanced Error Reporting capability is an optional extended capability that may be implemented by PCI Express devices supporting advanced error control and reporting. The Advanced Error Reporting capability structure includes additional registers that apply only to Root Ports; software must interpret the PCI Express Device/Port Type field (Section 5.8.1) in the PCI Express Capability Structure to determine whether these additional registers are available.

Figure 5-26 shows the PCI Express Advanced Error Reporting Capability Structure.

Note that if an error reporting bit field is marked as optional in the error registers, the bits must be implemented or not implemented as a group across the Status, Mask and Severity registers. In other words, a device is required to implement the same error bit fields in the corresponding Status, Mask and Severity registers. Bits corresponding to bit fields that are not implemented must be hardwired to 0.

[Figure 5-26: PCI Express Advanced Error Reporting Extended Capability Structure. Byte offsets:
00h PCI Express Enhanced Capability Header
04h Uncorrectable Error Status Register
08h Uncorrectable Error Mask Register
0Ch Uncorrectable Error Severity Register
10h Correctable Error Status Register
14h Correctable Error Mask Register
18h Advanced Error Capabilities and Control Register
1Ch Header Log Register (16 bytes, through 2Bh)
2Ch Root Error Command Register (valid only for Root Ports)
30h Root Error Status Register (valid only for Root Ports)
34h Error Source Identification Register (valid only for Root Ports)]


5.10.1. Advanced Error Reporting Enhanced Capability Header (Offset 00h)

See Section 5.9.3 for a description of the PCI Express Enhanced Capability Header. The Extended Capability ID for the Advanced Error Reporting Capability is 0001h.

[Figure 5-27: Advanced Error Reporting Enhanced Capability Header. Bits 15:0 PCI Express Extended Capability ID, bits 19:16 Capability Version, bits 31:20 Next Capability Offset.]

Table 5-23: Advanced Error Reporting Enhanced Capability Header

Bit Location | Description | Register Attribute
15:0 | PCI Express Extended Capability ID – This field is a PCI-SIG defined ID number that indicates the nature and format of the extended capability. The Extended Capability ID for the Advanced Error Reporting Capability is 0001h. | RO
19:16 | Capability Version – This field is a PCI-SIG defined version number that indicates the version of the capability structure present. Must be 1h for this version of the specification. | RO
31:20 | Next Capability Offset – This field contains the offset to the next PCI Express capability structure, or 000h if no other items exist in the linked list of capabilities. For Extended Capabilities implemented in device configuration space, this offset is relative to the beginning of PCI-compatible configuration space and thus must always be either 000h (terminating the list of capabilities) or greater than 0FFh. | RO


5.10.2. Uncorrectable Error Status Register (Offset 04h)

[Figure 5-28: Uncorrectable Error Status Register. Bits 0, 4 and 12 through 19 hold the fields defined in Table 5-24; bits 3:1, 11:5 and 31:20 are RsvdZ.]

The Uncorrectable Error Status register reports the error status of individual error sources on a PCI Express device. A set error status bit indicates that the corresponding error occurred; software may clear an error status by writing a 1 to the respective bit. Refer to Section 7.2 for further details.

Table 5-24: Uncorrectable Error Status Register

Bit Location | Description | Register Attribute | Default Value
0 | Training Error Status (Optional) | RW1CS | 0
4 | Data Link Protocol Error Status | RW1CS | 0
12 | Poisoned TLP Status | RW1CS | 0
13 | Flow Control Protocol Error Status (Optional) | RW1CS | 0
14 | Completion Timeout Status | RW1CS | 0
15 | Completer Abort Status (Optional) | RW1CS | 0
16 | Unexpected Completion Status | RW1CS | 0
17 | Receiver Overflow Status (Optional) | RW1CS | 0
18 | Malformed TLP Status | RW1CS | 0
19 | ECRC Error Status (Optional) | RW1CS | 0


5.10.3. Uncorrectable Error Mask Register (Offset 08h)

[Figure 5-29: Uncorrectable Error Mask Register. Bits 0, 4 and 12 through 19 hold the fields defined in Table 5-25; bits 3:1, 11:5 and 31:20 are RsvdP.]

The Uncorrectable Error Mask register controls the reporting of individual errors by the device to the PCI Express Root Complex via a PCI Express error message. A masked error (respective bit set in the mask register) is not reported to the Root Complex by an individual device. Refer to Section 7.2 for further details. There is one mask bit for each bit of the Uncorrectable Error Status register.

Table 5-25: Uncorrectable Error Mask Register

Bit Location | Description | Register Attribute | Default Value
0 | Training Error Mask (Optional) | RWS | 0
4 | Data Link Protocol Error Mask | RWS | 0
12 | Poisoned TLP Mask | RWS | 0
13 | Flow Control Protocol Error Mask (Optional) | RWS | 0
14 | Completion Timeout Mask | RWS | 0
15 | Completer Abort Mask (Optional) | RWS | 0
16 | Unexpected Completion Mask | RWS | 0
17 | Receiver Overflow Mask (Optional) | RWS | 0
18 | Malformed TLP Mask | RWS | 0
19 | ECRC Error Mask (Optional) | RWS | 0


5.10.4. Uncorrectable Error Severity Register (Offset 0Ch)

[Figure 5-30: Uncorrectable Error Severity Register. Bits 0, 4 and 12 through 19 hold the fields defined in Table 5-26; bits 3:1, 11:5 and 31:20 are RsvdP.]

The Uncorrectable Error Severity register controls whether an individual error is reported as a non-fatal or a fatal error. An error is reported as fatal when the corresponding error bit in the severity register is set; if the bit is cleared, the corresponding error is considered non-fatal. Refer to Section 7.2 for further details.

Table 5-26: Uncorrectable Error Severity Register

Bit Location | Description | Register Attribute | Default Value
0 | Training Error Severity (Optional) | RWS | 1
4 | Data Link Protocol Error Severity | RWS | 1
12 | Poisoned TLP Severity | RWS | 0
13 | Flow Control Protocol Error Severity (Optional) | RWS | 0
14 | Completion Timeout Error Severity | RWS | 0
15 | Completer Abort Error Severity (Optional) | RWS | 0
16 | Unexpected Completion Error Severity | RWS | 0
17 | Receiver Overflow Error Severity (Optional) | RWS | 1
18 | Malformed TLP Severity | RWS | 1
19 | ECRC Error Severity (Optional) | RWS | 0


5.10.5. Correctable Error Status Register (Offset 10h)

[Figure 5-31: Correctable Error Status Register. Bits 0, 6, 7, 8 and 12 hold the fields defined in Table 5-27; bits 5:1, 11:9 and 31:13 are RsvdZ.]

The Correctable Error Status register reports the error status of individual correctable error sources on a PCI Express device. A set error status bit indicates that a particular error occurred; software may clear an error status by writing a 1 to the respective bit. Refer to Section 7.2 for further details.

Table 5-27: Correctable Error Status Register

Bit Location | Description | Register Attribute | Default Value
0 | Receiver Error Status (Optional) | RW1CS | 0
6 | Bad TLP Status | RW1CS | 0
7 | Bad DLLP Status | RW1CS | 0
8 | REPLAY_NUM Rollover Status | RW1CS | 0
12 | Replay Timer Timeout Status | RW1CS | 0

5.10.6. Correctable Error Mask Register (Offset 14h)

[Figure 5-32: Correctable Error Mask Register. Bits 0, 6, 7, 8 and 12 hold the fields defined in Table 5-28; bits 5:1, 11:9 and 31:13 are RsvdP.]

The Correctable Error Mask register controls the reporting of individual correctable errors by the device to the PCI Express Root Complex via a PCI Express error message. A masked error (respective bit set in the mask register) is not reported to the Root Complex by an individual device. Refer to Section 7.2 for further details. There is one mask bit for each bit in the Correctable Error Status register.

Table 5-28: Correctable Error Mask Register

Bit Location | Description | Register Attribute | Default Value
0 | Receiver Error Mask (Optional) | RWS | 0
6 | Bad TLP Mask | RWS | 0
7 | Bad DLLP Mask | RWS | 0
8 | REPLAY_NUM Rollover Mask | RWS | 0
12 | Replay Timer Timeout Mask | RWS | 0

5.10.7. Advanced Error Capabilities and Control Register (Offset 18h)

[Figure 5-33: Advanced Error Capabilities and Control Register. Bits 8:0 hold the fields defined in Table 5-29; bits 31:9 are RsvdP.]

Figure 5-33 details the allocation of register fields in the Advanced Error Capabilities and Control register; Table 5-29 provides the respective bit definitions. Handling of multiple errors is discussed in Section 7.2.4.2.

Table 5-29: Advanced Error Capabilities and Control Register

Bit Location | Description | Register Attribute
4:0 | First Error Pointer – A read-only field that identifies the bit position of the first error reported in the Uncorrectable Error Status register. Refer to Section 7.2 for further details. | ROS
5 | ECRC Generation Capable – This bit indicates that the device is capable of generating ECRC (see Section 2.10). | RO
6 | ECRC Generation Enable – When set, this bit enables ECRC generation (see Section 2.10). Default value of this field is 0. | RWS
7 | ECRC Check Capable – This bit indicates that the device is capable of checking ECRC (see Section 2.10). | RO
8 | ECRC Check Enable – When set, this bit enables ECRC checking (see Section 2.10). Default value of this field is 0. | RWS

5.10.8. Header Log Register (Offset 1Ch)

The Header Log register captures the header of the transaction that generated an error; refer to Section 7.2 for further details. Section 7.2 also enumerates the conditions under which the packet header is logged. This register is 16 bytes and adheres to the format of the headers defined throughout this specification.

[Figure 5-34: Header Log Register (bits 127:0).]

Table 5-30: Header Log Register

Bit Location | Description | Register Attribute | Default Value
127:0 | Header of TLP associated with error | ROS | 0


5.10.9. Root Error Command Register (Offset 2Ch)

[Figure 5-35: Root Error Command Register. Bits 2:0 hold the fields defined in Table 5-31; bits 31:3 are RsvdP.]

The Root Error Command register gives finer control over the Root Complex's response to Correctable, Non-Fatal and Fatal error messages than the basic Root Complex capability to generate system errors in response to error messages. Its bit fields enable or disable the generation of interrupts (claimed by the Root Port), in addition to system error messages, according to the definitions in Table 5-31.

Table 5-31: Root Error Command Register

Bit Location | Description | Register Attribute | Default Value
0 | Correctable Error Reporting Enable – When set, this bit enables the generation of an interrupt when a correctable error is reported by any of the devices in the hierarchy associated with this Root Port. Refer to Section 7.2 for further details. | RW | 0
1 | Non-Fatal Error Reporting Enable – When set, this bit enables the generation of an interrupt when a non-fatal error is reported by any of the devices in the hierarchy associated with this Root Port. Refer to Section 7.2 for further details. | RW | 0
2 | Fatal Error Reporting Enable – When set, this bit enables the generation of an interrupt when a fatal error is reported by any of the devices in the hierarchy associated with this Root Port. Refer to Section 7.2 for further details. | RW | 0

When advanced error reporting via interrupts is enabled, system software may turn off system error generation in response to PCI Express error messages by using the PCI Express Capability Structure described in Section 5.8. Refer to Section 7.2 for further details.


5.10.10. Root Error Status Register (Offset 30h)

[Figure 5-36: Root Error Status Register. Bits 3:0 and 31:27 (Advanced Error Interrupt Message Number) hold the fields defined in Table 5-32; bits 26:4 are RsvdZ.]

The Root Error Status register reports the status of errors received by the Root Complex. Each error class, correctable and uncorrectable (non-fatal and fatal), has a first error bit and a next error bit associated with it. When an error is received by a Root Complex, the respective first error bit is set and the Requestor ID is logged in the Error Source Identification register. A set error status bit indicates that a particular error occurred; software may clear an error status by writing a 1 to the respective bit. If software does not clear the first reported error before another error message is received, the next error status bit is set, but the Requestor ID of the subsequent error message is discarded. Software may also clear the next error status bits by writing a 1 to the respective bit. Refer to Section 7.2 for further details.

Table 5-32: Root Error Status Register

Bit Location | Description | Register Attribute
0 | First Correctable Error Detected – Set when a correctable error is received and First Correctable Error Detected is not already set. Default value of this field is 0. | RW1CS
1 | Next Correctable Error Detected – Set when a correctable error is received and First Correctable Error Detected is already set. This indicates that one or more error message Requestor IDs were lost. Default value of this field is 0. | RW1CS
2 | First Uncorrectable Error Detected – Set when either a fatal or a non-fatal error is received and First Uncorrectable Error Detected is not already set. Default value of this field is 0. | RW1CS
3 | Next Uncorrectable Error Detected – Set when either a fatal or a non-fatal error is received and First Uncorrectable Error Detected is already set. This indicates that one or more error message Requestor IDs were lost. Default value of this field is 0. | RW1CS
31:27 | Advanced Error Interrupt Message Number – If this function is allocated more than one MSI interrupt number, this field is required to contain the offset between the base Message Data and the MSI Message that is generated when any of the status bits of this capability are set. Hardware is required to update this field so that it remains correct if the number of MSI Messages assigned to the device changes. | RO

5.10.11. Error Source Identification Register (Offset 34h)

[Figure 5-37: Error Source Identification Register. Bits 31:16 Uncorrectable Error Source Identification, bits 15:0 Correctable Error Source Identification.]

The Error Source Identification register identifies the source (Requestor ID) of the first correctable error and of the first non-fatal/fatal error reported in the Root Error Status register. Refer to Section 7.2 for further details.

Table 5-33: Error Source Identification Register

Bit Location | Description | Register Attribute
15:0 | Correctable Error Source Identification – Loaded with the Requestor ID of the source when a correctable error is received and First Correctable Error Detected is not already set. Default value of this field is 0. | ROS
31:16 | Uncorrectable Error Source Identification – Loaded with the Requestor ID of the source when a non-fatal or fatal error is received and First Uncorrectable Error Detected is not already set. Default value of this field is 0. | ROS


5.11. Virtual Channel Capability

The PCI Express Virtual Channel capability is an optional extended capability that is required to be implemented by PCI Express ports of devices that support PCI Express functionality beyond general-purpose I/O traffic, i.e., beyond the default Traffic Class 0 (TC0) over the default Virtual Channel 0 (VC0). This may apply to devices with only one VC that support TC filtering or to devices that support multiple VCs. Note that a PCI Express device that supports only TC0 over VC0 does not require the VC extended capability and its associated registers. Figure 5-38 provides a high-level view of the PCI Express Virtual Channel Capability Structure for all devices. This structure controls Virtual Channel assignment for PCI Express Links and may be present in Endpoint devices, Switch ports (Upstream and Downstream), Root Ports and RCRBs. Some registers/fields in the PCI Express Virtual Channel Capability Structure may be interpreted differently for Endpoint devices, Switch ports, Root Ports and RCRBs.
Software must interpret the PCI Express Device/Port Type field (Section 5.8.1) in the PCI Express Capability Structure to determine the availability and meaning of these registers/fields.

The PCI Express Virtual Channel Capability Structure can be present in the Extended Configuration Space of all devices or in an RCRB, with the restriction that, for a device's Upstream Port, it is present only in the Extended Configuration Space of Function 0.

[Figure 5-38: PCI Express Virtual Channel Capability Structure. Byte offsets:
00h PCI Express Enhanced Capability Header
04h Port VC Capability Register 1 (bits 2:0 hold n, the Extended VC Count)
08h Port VC Capability Register 2 (bits 31:24 hold the VC Arbitration Table Offset)
0Ch Port VC Control Register (bits 15:0) and Port VC Status Register (bits 31:16)
10h + n * 0Ch VC Resource Capability Register (n) (bits 31:24 hold the Port Arbitration Table Offset)
14h + n * 0Ch VC Resource Control Register (n)
18h + n * 0Ch RsvdP (bits 15:0) and VC Resource Status Register (n) (bits 31:16)
VAT_Offset * 10h VC Arbitration Table (the offset is in DQWORDs; see Table 5-36)
PAT_Offset(n) * 10h Port Arbitration Table (n)
One VC Resource register set and one Port Arbitration Table are present for each resource 0 through n, where n = Extended VC Count.]


The following registers/fields are defined for the PCI Express Virtual Channel Capability Structure.

5.11.1. Virtual Channel Enhanced Capability Header

See Section 5.9.3 for a description of the PCI Express Enhanced Capability Header. The Extended Capability ID for the Virtual Channel Capability is 0002h.

[Figure 5-39: Virtual Channel Enhanced Capability Header. Bits 15:0 PCI Express Extended Capability ID, bits 19:16 Capability Version, bits 31:20 Next Capability Offset.]

Table 5-34: Virtual Channel Enhanced Capability Header

Bit Location | Description | Register Attribute
15:0 | PCI Express Extended Capability ID – This field is a PCI-SIG defined ID number that indicates the nature and format of the extended capability. The Extended Capability ID for the Virtual Channel Capability is 0002h. | RO
19:16 | Capability Version – This field is a PCI-SIG defined version number that indicates the version of the capability structure present. Must be 1h for this version of the specification. | RO
31:20 | Next Capability Offset – This field contains the offset to the next PCI Express capability structure, or 000h if no other items exist in the linked list of capabilities. For Extended Capabilities implemented in device configuration space, this offset is relative to the beginning of PCI-compatible configuration space and thus must always be either 000h (terminating the list of capabilities) or greater than 0FFh. | RO


5.11.2. Port VC Capability Register 1

The Port VC Capability Register 1 describes the configuration of the Virtual Channels associated with a PCI Express port. Figure 5-40 details the allocation of register fields in the Port VC Capability Register 1; Table 5-35 provides the respective bit definitions.

[Figure 5-40: Port VC Capability Register 1. Bits 2:0, 6:4, 9:8 and 11:10 hold the fields defined in Table 5-35; bits 3, 7 and 31:12 are RsvdP.]

Table 5-35: Port VC Capability Register 1

Bit Location | Description | Attribute
2:0 | Extended VC Count – Indicates the number of (extended) Virtual Channels supported by the device in addition to the default VC. This field is valid for all devices. The minimum value of this field is 0 (for devices that support only the default VC); the maximum value is 7. | RO
6:4 | Low Priority Extended VC Count – Indicates the number of (extended) Virtual Channels, in addition to the default VC, that belong to the low-priority VC (LPVC) group, which has the lowest priority with respect to other VC resources in strict-priority VC Arbitration. This field is valid for all devices. The minimum value of this field is 0 and the maximum value is Extended VC Count. | RO
9:8 | Reference Clock – Indicates the reference clock for Virtual Channels that support time-based WRR Port Arbitration. This field is valid only for RCRBs and Switch Upstream Ports; it is not valid and must be set to 0 for Endpoint devices, Root Ports and Switch Downstream Ports. Defined encodings are: 00b 100 ns reference clock; 01b through 11b Reserved. Note: Time-based WRR Port Arbitration can be supported by multiple Switch ports when they serve as egress for peer-to-peer traffic; however, only the Upstream Port of a Switch contains a valid Reference Clock. | RO
11:10 | Port Arbitration Table Entry Size – Indicates the size (in bits) of a Port Arbitration Table entry in the device. This field is valid only for RCRBs and Switch Upstream Ports; it is not valid and must be set to 0 for Endpoint devices, Root Ports and Switch Downstream Ports. Defined encodings are: 00b 1 bit, 01b 2 bits, 10b 4 bits, 11b 8 bits. | RO


5.11.3. Port VC Capability Register 2

The Port VC Capability Register 2 provides further information about the configuration of the Virtual Channels associated with a PCI Express port. Figure 5-41 details the allocation of register fields in the Port VC Capability Register 2; Table 5-36 provides the respective bit definitions.

[Figure 5-41: Port VC Capability Register 2. Bits 7:0 VC Arbitration Capability, bits 31:24 VC Arbitration Table Offset; bits 23:8 are RsvdP.]

Table 5-36: Port VC Capability Register 2

Bit Location | Description | Attribute
7:0 | VC Arbitration Capability – Indicates the types of VC Arbitration supported by the device for the LPVC group. This field is valid for all devices that report a Low Priority Extended VC Count greater than 0. Each bit location within this field corresponds to a VC Arbitration capability defined below. When more than one bit in this field is set, the port can be configured to provide different VC arbitration services. Defined bit positions are: Bit 0, hardware-fixed Round-Robin (RR) or RR-like arbitration scheme; Bit 1, Weighted Round-Robin (WRR) arbitration with 32 phases; Bit 2, WRR arbitration with 64 phases; Bit 3, WRR arbitration with 128 phases; Bits 4-7, Reserved. | RO
31:24 | VC Arbitration Table Offset – Indicates the location of the VC Arbitration Table. This field is valid for all devices. It contains the zero-based offset of the table in DQWORDs (16 bytes) from the base address of the Virtual Channel Capability Structure. A value of 0 indicates that the table is not present. | RO


5.11.4. Port VC Control Register

Figure 5-42 details the allocation of register fields in the Port VC Control Register; Table 5-37 provides the respective bit definitions.

Figure 5-42: Port VC Control Register (bits 15:4 RsvdP; bits 3:1 VC Arbitration Select; bit 0 Load VC Arbitration Table)

Table 5-37: Port VC Control Register

0 Load VC Arbitration Table (RW) – Used by software to update the VC Arbitration Table. This field is valid for all devices when the VC Arbitration Table is used by the selected VC Arbitration.
Software sets this bit to request that hardware apply the new values programmed into the VC Arbitration Table; clearing this bit has no effect. Software checks the VC Arbitration Table Status field to confirm that the new values stored in the VC Arbitration Table have been latched by the VC arbitration logic.
This bit always returns 0 when read.

3:1 VC Arbitration Select (RW) – Used by software to configure VC arbitration by selecting one of the supported VC Arbitration schemes indicated by the VC Arbitration Capability field in the Port VC Capability Register 2. This field is valid for all devices.
The value of this field is the number corresponding to one of the asserted bits in the VC Arbitration Capability field.
This field cannot be modified when more than one VC in the LPVC group is enabled.
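The select-and-load handshake described above can be sketched as follows. Software writes VC Arbitration Select together with Load VC Arbitration Table, then polls the VC Arbitration Table Status bit (bit 0 of the Port VC Status Register, Section 5.11.5) until hardware clears it. This is a minimal Python illustration under stated assumptions. `write_control` and `read_status` are hypothetical register accessors, and the polling limit is an arbitrary choice rather than a specification requirement. The caller is still responsible for the rule that VC Arbitration Select cannot change while more than one LPVC VC is enabled.

```python
LOAD_VC_ARB_TABLE = 1 << 0                 # Port VC Control Register, bit 0
VC_ARB_SELECT_SHIFT = 1                    # Port VC Control Register, bits 3:1
VC_ARB_SELECT_MASK = 0b111 << VC_ARB_SELECT_SHIFT
VC_ARB_TABLE_STATUS = 1 << 0               # Port VC Status Register, bit 0


def port_vc_control_value(current: int, select: int, capability: int) -> int:
    """Build a control value that selects scheme `select` and requests a table load."""
    if not 0 <= select <= 7:
        raise ValueError("VC Arbitration Select is a 3-bit field")
    if not (capability >> select) & 1:
        raise ValueError(f"VC Arbitration Capability bit {select} is not set")
    value = (current & ~VC_ARB_SELECT_MASK) | (select << VC_ARB_SELECT_SHIFT)
    return value | LOAD_VC_ARB_TABLE


def load_vc_arbitration_table(write_control, read_status, current: int, select: int,
                              capability: int, max_polls: int = 1000) -> bool:
    """Request a table load, then poll until hardware clears VC Arbitration Table Status."""
    write_control(port_vc_control_value(current, select, capability))
    for _ in range(max_polls):
        if not (read_status() & VC_ARB_TABLE_STATUS):
            return True   # new table values are latched by the VC arbitration logic
    return False          # still pending after max_polls reads (hypothetical timeout policy)
```

Because Load VC Arbitration Table always reads back as 0, the sketch does not need to clear it before the next write.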


5.11.5. Port VC Status Register

The Port VC Status Register provides the status of the configuration of Virtual Channels associated with a port. Figure 5-43 details the allocation of register fields in the Port VC Status Register; Table 5-38 provides the respective bit definitions.

Figure 5-43: Port VC Status Register (bits 15:1 RsvdZ; bit 0 VC Arbitration Table Status)

Table 5-38: Port VC Status Register

0 VC Arbitration Table Status (RO) – Indicates the coherency status of the VC Arbitration Table. This field is valid for all devices when the VC Arbitration Table is used by the selected VC Arbitration.
This bit is set by hardware when any entry of the VC Arbitration Table is written by software. This bit is cleared by hardware when hardware finishes loading the values stored in the VC Arbitration Table after software sets the Load VC Arbitration Table field in the Port VC Control Register.
Default value of this field is 0.

5.11.6. VC Resource Capability Register

The VC Resource Capability Register describes the capabilities and configuration of a particular Virtual Channel resource. Figure 5-44 details the allocation of register fields in the VC Resource Capability Register; Table 5-39 provides the respective bit definitions.

Figure 5-44: VC Resource Capability Register (bits 31:24 Port Arbitration Table Offset; bit 23 RsvdP; bits 22:16 Maximum Time Slots; bit 15 Snoop Transaction Permitted; bits 14:8 RsvdP; bits 7:0 Port Arbitration Capability)


Table 5-39: VC Resource Capability Register

7:0 Port Arbitration Capability (RO) – Indicates the types of Port Arbitration supported by the VC resource. This field is valid for all Switch ports and RCRB, but not for PCI Express Endpoint devices or Root Ports.
Each bit location within this field corresponds to a Port Arbitration capability defined below. When more than one bit in this field is set, it indicates that the VC resource can be configured to provide different arbitration services.
Software selects among these capabilities by writing to the Port Arbitration Select field (see below). Defined bit positions are:
    Bit 0       Hardware-fixed Round-Robin (RR) or RR-like arbitration scheme
    Bit 1       Weighted Round Robin (WRR) arbitration with 32 phases
    Bit 2       WRR arbitration with 64 phases
    Bit 3       WRR arbitration with 128 phases
    Bit 4       Time-based WRR with 128 phases
    Bits 5-7    Reserved

15 Snoop Transaction Permitted (HwInit) – Indicates whether snoop transactions are permitted over the VC resource. This field is valid only for RCRB, but not for PCI Express Endpoint devices, Switch ports, or Root Ports.
When this field is set, it indicates that the Root Complex is able to honor the "Snoop Not Required" Attribute field in the TLP header and ensure cache coherency for the transactions over the VC resource. When this field is 0, it indicates that the Root Complex ignores the "Snoop Not Required" Attribute field in the TLP header and does not perform Snoop operations for transactions over the VC resource.

22:16 Maximum Time Slots (HwInit) – Indicates the maximum number of time slots (minus one) that the VC resource is capable of supporting when it is configured for time-based WRR Port Arbitration. For example, a value of 0 in this field indicates that the supported maximum number of time slots is 1, and a value of 127 indicates that the supported maximum number of time slots is 128. This field is valid for all Switch ports, Root Ports, and RCRB, but not for PCI Express Endpoint devices. In addition, this field is valid only when Port Arbitration Capability indicates that the VC resource supports time-based WRR Port Arbitration.
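The following Python sketch decodes a raw VC Resource Capability Register value according to Figure 5-44 and Table 5-39. It applies the "minus one" encoding of Maximum Time Slots, and reports that field only when bit 4 (time-based WRR) of Port Arbitration Capability is set, since the field is not valid otherwise. The class and scheme labels are illustrative names, not part of the specification.

```python
from __future__ import annotations

from dataclasses import dataclass

# Port Arbitration Capability bit positions (Table 5-39, bits 7:0); bits 5-7 reserved.
PORT_ARB_SCHEMES = {
    0: "Hardware-fixed RR or RR-like",
    1: "WRR, 32 phases",
    2: "WRR, 64 phases",
    3: "WRR, 128 phases",
    4: "Time-based WRR, 128 phases",
}
TIME_BASED_WRR_BIT = 4


@dataclass(frozen=True)
class VcResourceCapability:
    port_arb_capability: int     # bits 7:0
    snoop_permitted: bool        # bit 15 (meaningful for RCRB only)
    max_time_slots: int | None   # bits 22:16 plus one; None if time-based WRR unsupported
    port_arb_table_offset: int   # bits 31:24, in DQWORDs (16 bytes); 0 = no table

    @classmethod
    def decode(cls, raw: int) -> VcResourceCapability:
        capability = raw & 0xFF
        time_based = bool((capability >> TIME_BASED_WRR_BIT) & 1)
        return cls(
            port_arb_capability=capability,
            snoop_permitted=bool((raw >> 15) & 1),
            max_time_slots=((raw >> 16) & 0x7F) + 1 if time_based else None,
            port_arb_table_offset=(raw >> 24) & 0xFF,
        )

    def supported_schemes(self) -> list[str]:
        return [name for bit, name in PORT_ARB_SCHEMES.items()
                if (self.port_arb_capability >> bit) & 1]
```

A Maximum Time Slots field value of 127 therefore decodes to 128 slots, matching the example in the table.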


31:24 Port Arbitration Table Offset (RO) – Indicates the location of the Port Arbitration Table associated with the VC resource. This field is valid for all Switch ports and RCRB, but not for PCI Express Endpoint devices or Root Ports.
This field contains the zero-based offset of the table in DQWORDs (16 bytes) from the base address of the Virtual Channel Capability Structure. A value of 0 indicates that the table is not present.

5.11.7. VC Resource Control Register

Figure 5-45 details the allocation of register fields in the VC Resource Control Register; Table 5-40 provides the respective bit definitions.

Figure 5-45: VC Resource Control Register (bit 31 VC Enable; bits 30:27 RsvdP; bits 26:24 VC ID; bits 23:20 RsvdP; bits 19:17 Port Arbitration Select; bit 16 Load Port Arbitration Table; bits 15:8 RsvdP; bits 7:0 TC/VC Map)

Table 5-40: VC Resource Control Register

7:0 TC/VC Map (RW; see the note for exceptions) – This field indicates the TCs that are mapped to the VC resource. This field is valid for all devices.
Bit locations within this field correspond to TC values. For example, when bit 7 is set in this field, TC7 is mapped to this VC resource. When more than one bit in this field is set, it indicates that multiple TCs are mapped to the VC resource.
In order to remove one or more TCs from the TC/VC Map of an enabled VC, software must ensure that no new or outstanding transactions with those TC labels are targeted at the given Link.
Default value of this field is FFh for the first VC resource and 00h for other VC resources.
Note: Bit 0 of this field is read-only. It must be set by hardware ("hardwired") for the first VC resource (default VC) and cleared for other VC resources when present.
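The bit-per-TC encoding of the TC/VC Map can be illustrated with two small helpers. This is a Python sketch with hypothetical function names. It mirrors the hardwired behavior of bit 0 (TC0) by forcing that bit rather than taking it from the caller. It does not handle the requirement to quiesce traffic before removing TCs from an enabled VC; that remains the caller's job.

```python
def tc_vc_map(tcs: list[int], default_vc: bool) -> int:
    """Build the 8-bit TC/VC Map: bit n set means TCn is mapped to this VC resource.

    Bit 0 (TC0) is read-only: hardwired to 1 for the first (default) VC resource
    and to 0 for the others, so it is forced here instead of taken from `tcs`.
    """
    mask = 0
    for tc in tcs:
        if not 0 <= tc <= 7:
            raise ValueError(f"TC{tc} is outside the range TC0-TC7")
        mask |= 1 << tc
    return (mask | 0x01) if default_vc else (mask & 0xFE)


def mapped_tcs(tc_vc_map_value: int) -> list[int]:
    """List the Traffic Classes whose bits are set in an 8-bit TC/VC Map."""
    return [tc for tc in range(8) if (tc_vc_map_value >> tc) & 1]
```

With these helpers, mapping only TC7 to a non-default VC yields 80h, and the FFh reset value of the first VC resource decodes to all eight TCs.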


Table 5-40: VC Resource Control Register (continued)

16 Load Port Arbitration Table (RW) – This bit, when set, updates the Port Arbitration logic from the Port Arbitration Table for the VC resource. This field is valid for all Switch ports and RCRB, but not for PCI Express Endpoint devices or Root Ports. In addition, this field is only valid when the Port Arbitration Table is used by the selected Port Arbitration scheme (that is, the scheme indicated by the set bit in the Port Arbitration Capability field selected by Port Arbitration Select).
Software sets this bit to signal hardware to update the Port Arbitration logic with the new values stored in the Port Arbitration Table; clearing this bit has no effect. Software uses the Port Arbitration Table Status bit to confirm whether the new values of the Port Arbitration Table have been completely latched by the arbitration logic.
This bit always returns 0 when read.
Default value of this field is 0.

19:17 Port Arbitration Select (RW) – This field configures the VC resource to provide a particular Port Arbitration service. This field is valid only for RCRB, but not for PCI Express Endpoint devices, Switch Ports, or Root Ports.
The permissible value of this field is a number corresponding to one of the asserted bits in the Port Arbitration Capability field of the VC resource.
This field cannot be modified when the VC is already enabled.

26:24 VC ID (RW; see the note for exceptions) – This field assigns a VC ID to the VC resource. This field is valid for all devices.
This field cannot be modified when the VC is already enabled.
Note: For the first VC resource (default VC), this field is a read-only field that must be set to 0 ("hardwired").
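Putting the field positions of Figure 5-45 together, a VC Resource Control value can be packed as shown below. The second helper checks the rule that VC ID and Port Arbitration Select cannot be modified while the VC is enabled. This is a Python sketch with hypothetical names. It does not model the read-only fields of the default VC (VC ID hardwired to 0, VC Enable hardwired to 1).

```python
VC_ENABLE = 1 << 31               # bit 31
VC_ID_SHIFT = 24                  # bits 26:24
PORT_ARB_SELECT_SHIFT = 17        # bits 19:17
LOAD_PORT_ARB_TABLE = 1 << 16     # bit 16
# Fields that must not change while VC Enable is set.
FROZEN_WHILE_ENABLED = (0b111 << VC_ID_SHIFT) | (0b111 << PORT_ARB_SELECT_SHIFT)


def vc_resource_control(tc_vc_map: int, vc_id: int, port_arb_select: int = 0,
                        load_port_arb_table: bool = False, enable: bool = False) -> int:
    """Pack the VC Resource Control Register fields into a 32-bit value."""
    if not 0 <= tc_vc_map <= 0xFF:
        raise ValueError("TC/VC Map is an 8-bit field")
    if not 0 <= vc_id <= 7 or not 0 <= port_arb_select <= 7:
        raise ValueError("VC ID and Port Arbitration Select are 3-bit fields")
    value = tc_vc_map | (port_arb_select << PORT_ARB_SELECT_SHIFT) | (vc_id << VC_ID_SHIFT)
    if load_port_arb_table:
        value |= LOAD_PORT_ARB_TABLE
    if enable:
        value |= VC_ENABLE
    return value


def reprogram_allowed(current: int, new: int) -> bool:
    """False if `new` would change VC ID or Port Arbitration Select of an enabled VC."""
    if current & VC_ENABLE:
        return (current & FROZEN_WHILE_ENABLED) == (new & FROZEN_WHILE_ENABLED)
    return True
```

Note that the TC/VC Map is not in the frozen set, so an enabled VC may still have its TC mapping changed, subject to the traffic rule in the TC/VC Map description.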


Table 5-40: VC Resource Control Register (continued)

31 VC Enable (RW; see note 1 for exceptions) – This field, when set, enables a Virtual Channel. The Virtual Channel is disabled when this field is cleared. This field is valid for all devices.
Software must use the VC Negotiation Pending bit to check whether the VC negotiation is complete. When the VC Negotiation Pending bit is cleared, a 1 read from this VC Enable bit indicates that the VC is enabled (Flow Control Initialization is completed for the PCI Express port); a 0 read from this bit indicates that the Virtual Channel is currently disabled.
Default value of this field is 1 for the first VC resource and 0 for other VC resources.
Notes:
    1. This bit is hardwired to 1 for the default VC (VC0); that is, writing to this field has no effect for VC0.
    2. To enable a Virtual Channel, the VC Enable bits for that Virtual Channel must be set in both components on a Link.
    3. To disable a Virtual Channel, the VC Enable bits for that Virtual Channel must be cleared in both components on a Link.
    4. Software must ensure that no traffic is using a Virtual Channel at the time it is disabled.
    5. Software must fully disable a Virtual Channel in both components on a Link before re-enabling the Virtual Channel.

5.11.8. VC Resource Status Register

Figure 5-46 details the allocation of register fields in the VC Resource Status Register; Table 5-41 provides the respective bit definitions.

Figure 5-46: VC Resource Status Register (bits 15:2 RsvdZ; bit 1 VC Negotiation Pending; bit 0 Port Arbitration Table Status)
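The VC Enable notes imply a two-sided procedure: update VC Enable in both components on the Link, then wait until VC Negotiation Pending (bit 1 of the VC Resource Status Register) is clear in both, and only then trust the VC Enable readback. The Python sketch below assumes a hypothetical `VcRegs` bundle of accessor callbacks for one component's registers for the VC resource in question. The polling limit is arbitrary. Ensuring that no traffic uses the VC before disabling it is left to the caller.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

VC_ENABLE = 1 << 31                # VC Resource Control Register, bit 31
VC_NEGOTIATION_PENDING = 1 << 1    # VC Resource Status Register, bit 1


@dataclass
class VcRegs:
    """Hypothetical accessors for one component's registers of a given VC resource."""
    read_control: Callable[[], int]
    write_control: Callable[[int], None]
    read_status: Callable[[], int]


def set_vc_enable(components: list[VcRegs], enable: bool, max_polls: int = 1000) -> bool:
    """Set or clear VC Enable on every component of the Link, then wait for negotiation."""
    for regs in components:
        control = regs.read_control()
        regs.write_control((control | VC_ENABLE) if enable else (control & ~VC_ENABLE))
    for _ in range(max_polls):
        if not any(regs.read_status() & VC_NEGOTIATION_PENDING for regs in components):
            # Negotiation finished: VC Enable now reflects the real state of the VC.
            return all(bool(regs.read_control() & VC_ENABLE) == enable for regs in components)
    return False  # negotiation still pending after max_polls reads
```

Calling `set_vc_enable` with `enable=False` before re-enabling follows note 5, since it returns only after both components report the VC fully disabled.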


Table 5-41: VC Resource Status Register

0 Port Arbitration Table Status (RO) – This bit indicates the coherency status of the Port Arbitration Table associated with the VC resource. This field is valid for RCRB, but not for PCI Express Endpoint devices, Switch Ports, or Root Ports. In addition, this field is valid only when the Port Arbitration Table is used by the selected Port Arbitration for the VC resource.
This bit is set by hardware when any entry of the Port Arbitration Table is written by software. This bit is cleared by hardware when hardware finishes loading the values stored in the Port Arbitration Table after software sets the Load Port Arbitration Table field.
Default value of this field is 0.

1 VC Negotiation Pending (RO) – This bit indicates whether the Virtual Channel negotiation (initialization or disabling) is in a pending state. This field is valid for all devices.
When this bit is set by hardware, it indicates that the VC resource is still in the process of negotiation. This bit is cleared by hardware after the VC negotiation is complete. For a non-default Virtual Channel, software may use this bit when enabling or disabling the VC. For the default VC, this bit indicates the status of the process of Flow Control initialization.
Before using a Virtual Channel, software must check whether the VC Negotiation Pending fields for that Virtual Channel are cleared in both components on a Link.

5.11.9. VC Arbitration Table

The VC Arbitration Table is a read-write register array that is used to store the arbitration table for VC Arbitration. This table is valid for all devices when a WRR table is used by the selected VC Arbitration. If it exists, the VC Arbitration Table is located by the VC Arbitration Table Offset field.

The VC Arbitration Table is a register array with fixed-size entries of 4 bits.
Figure 5-47 depicts the structure of an example VC Arbitration Table with 32 phases. Each 4-bit table entry corresponds to a phase within a WRR arbitration period. The definition of a table entry is given in Table 5-42. The lower three bits (bits 0 to 2) contain the VC ID value, indicating that the corresponding phase within the WRR arbitration period is assigned to the Virtual Channel indicated by that VC ID.

A phase containing a VC ID that does not correspond to any enabled VC is simply skipped in the WRR arbitration.

The highest bit (bit 3) of the table entry is reserved. The length of the table depends on the selected VC Arbitration, as shown in Table 5-43.
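To make the entry layout concrete, the following Python sketch packs per-phase VC IDs into DWORDs following Figure 5-47: eight 4-bit entries per DWORD, with Phase[0] in bits 3:0 of the first DWORD. It also shows the skipping rule for phases whose VC ID is not enabled. The function names are illustrative. Writing the DWORDs to configuration space, and then setting Load VC Arbitration Table, are not shown.

```python
from __future__ import annotations

ENTRY_BITS = 4                          # each VC Arbitration Table entry is 4 bits
ENTRIES_PER_DWORD = 32 // ENTRY_BITS    # 8 phases per DWORD
VC_ID_MASK = 0b111                      # bits 2:0 hold the VC ID; bit 3 is reserved


def pack_vc_arb_table(phases: list[int]) -> list[int]:
    """Pack per-phase VC IDs into DWORDs; Phase[0] occupies bits 3:0 of the first DWORD."""
    if len(phases) % ENTRIES_PER_DWORD:
        raise ValueError("phase count must be a multiple of 8")
    dwords = [0] * (len(phases) // ENTRIES_PER_DWORD)
    for phase, vc_id in enumerate(phases):
        if not 0 <= vc_id <= VC_ID_MASK:
            raise ValueError(f"Phase[{phase}]: VC ID {vc_id} does not fit in bits 2:0")
        dwords[phase // ENTRIES_PER_DWORD] |= vc_id << (ENTRY_BITS * (phase % ENTRIES_PER_DWORD))
    return dwords


def unpack_vc_arb_table(dwords: list[int]) -> list[int]:
    """Inverse of pack_vc_arb_table; the reserved bit 3 of each entry is ignored."""
    return [(dword >> (ENTRY_BITS * slot)) & VC_ID_MASK
            for dword in dwords for slot in range(ENTRIES_PER_DWORD)]


def wrr_period(phases: list[int], enabled_vcs: set[int]) -> list[int]:
    """VC IDs served in one WRR period; phases naming a non-enabled VC are skipped."""
    return [vc_id for vc_id in phases if vc_id in enabled_vcs]
```

For instance, the repeating pattern 0, 1, 0, 2, 0, 1, 0, 3 packs into the DWORD 30102010h, and with only VC0 and VC1 enabled the phases for VC2 and VC3 simply drop out of the period.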


When the VC Arbitration Table is used by the default VC Arbitration method, the default values of the table entries must be all zero to ensure forward progress for the default VC (with VC ID of 0).

Figure 5-47: Structure of an Example VC Arbitration Table with 32 Phases
    Byte 00h: Phase[7] (bits 31:28) … Phase[1] (bits 7:4), Phase[0] (bits 3:0)
    Byte 04h: Phase[15] … Phase[9], Phase[8]
    Byte 08h: Phase[23] … Phase[17], Phase[16]
    Byte 0Ch: Phase[31] … Phase[25], Phase[24]

Table 5-42: Definition of the 4-bit Entries in the VC Arbitration Table
    Bit Location    Description    Attribute
    2:0             VC ID          RW
    3               Reserved       RW

Table 5-43: Length of the VC Arbitration Table
    VC Arbitration Select    VC Arbitration Table Length (in # of Entries)
    001b                     32
    010b                     64
    011b                     128

5.11.10. Port Arbitration Table

The Port Arbitration Table register is a read-write register array that is used to store the WRR arbitration table for Port Arbitration for the VC resource. This register array is valid for all Switch ports and RCRB, but not for Endpoint devices or Root Ports. It is only present when one or more asserted bits in the Port Arbitration Capability field indicate that the device supports a Port Arbitration scheme that uses a programmable arbitration table. Furthermore, it is only valid when one of the above-mentioned bits in the Port Arbitration Capability field is selected by the Port Arbitration Select field.

The Port Arbitration Table represents one port arbitration period. Figure 5-48 shows the structure of an example Port Arbitration Table with 128 phases and 2-bit table entries. Each table entry containing a Port Number corresponds to a phase within a port arbitration period. For example, a table with 2-bit entries can be used by a Switch component with up to 4 ports.
A Port Number written to a table entry indicates that the phase within the Port Arbitration period is assigned to the selected PCI Express port.
• When WRR Port Arbitration is used for a VC of any given port (as an Egress Port for the traffic flow over the VC), a phase containing that port's Port Number is simply skipped by the Port Arbiter.


• When Time-based WRR Port Arbitration is used for a VC of any given port, a phase containing that port's Port Number indicates an "idle" time slot for the Port Arbiter.

The table entry size is determined by the Port Arbitration Table Entry Size field in the Port VC Capability Register 1. The length of the table is determined by the Port Arbitration Select field, as shown in Table 5-44.

When the Port Arbitration Table is used by the default Port Arbitration for the default VC, the default values for the table entries must contain at least one entry for each of the other PCI Express ports of the device to ensure forward progress for the default VC for each port. The table may contain RR or RR-like fair Port Arbitration for the default VC.

Figure 5-48: Example Port Arbitration Table with 128 Phases and 2-bit Table Entries
    Byte 00h: Phase[15] (bits 31:30) … Phase[1] (bits 3:2), Phase[0] (bits 1:0)
    Byte 04h: Phase[31] … Phase[17], Phase[16]
    Bytes 08h, 0Ch, 10h, 14h: Phase[47] … Phase[32] through Phase[95] … Phase[80], following the same pattern
    Byte 18h: Phase[111] … Phase[97], Phase[96]
    Byte 1Ch: Phase[127] … Phase[113], Phase[112]

Table 5-44: Length of Port Arbitration Table
    Port Arbitration Select    Port Arbitration Table Length (in # of Entries)
    001b                       32
    010b                       64
    011b                       128
    100b                       128
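The Port Arbitration Table generalizes the same packing to entries of 1, 2, 4, or 8 bits, with the table length chosen by Port Arbitration Select (Table 5-44). The Python sketch below packs Port Numbers so that Phase[0] occupies the lowest bits of the first DWORD, as in Figure 5-48. It also illustrates how one egress port's arbiter treats phases that name the port itself: WRR skips them, and time-based WRR treats them as idle slots. Function names and the use of `None` for an idle slot are illustrative choices, not specification terms.

```python
from __future__ import annotations

# Table 5-44: Port Arbitration Select -> table length (number of entries).
PORT_ARB_TABLE_LENGTH = {0b001: 32, 0b010: 64, 0b011: 128, 0b100: 128}


def entry_bits(entry_size_encoding: int) -> int:
    """Port Arbitration Table Entry Size (Port VC Capability Register 1, bits 11:10).

    00b -> 1 bit, 01b -> 2 bits, 10b -> 4 bits, 11b -> 8 bits.
    """
    return 1 << (entry_size_encoding & 0b11)


def pack_port_arb_table(ports: list[int], entry_size_encoding: int,
                        port_arb_select: int) -> list[int]:
    """Pack per-phase Port Numbers into DWORDs; Phase[0] occupies the lowest bits."""
    bits = entry_bits(entry_size_encoding)
    length = PORT_ARB_TABLE_LENGTH[port_arb_select]
    if len(ports) != length:
        raise ValueError(f"expected {length} entries, got {len(ports)}")
    dwords = [0] * (length * bits // 32)
    for phase, port in enumerate(ports):
        if not 0 <= port < (1 << bits):
            raise ValueError(f"Phase[{phase}]: Port Number {port} needs more than {bits} bits")
        bit_position = phase * bits
        dwords[bit_position // 32] |= port << (bit_position % 32)
    return dwords


def arbitration_period(ports: list[int], egress_port: int,
                       time_based: bool) -> list[int | None]:
    """Phases seen by the Port Arbiter of `egress_port` during one arbitration period.

    WRR skips phases naming the egress port itself; time-based WRR keeps them
    as idle time slots, represented here as None.
    """
    if time_based:
        return [None if port == egress_port else port for port in ports]
    return [port for port in ports if port != egress_port]
```

With 2-bit entries cycling through ports 0 to 3, every byte of the table becomes E4h, so the 128-phase table is eight DWORDs of E4E4E4E4h.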


5.12. Device Serial Number Capability

The PCI Express Device Serial Number capability is an optional extended capability that may be implemented by any PCI Express device. The Device Serial Number is a read-only 64-bit value that is unique for a given PCI Express device.

All multi-function devices that implement this capability must implement it for function 0; other functions that implement this capability must return the same Device Serial Number value as that reported by function 0.

A PCI Express multi-device component, such as a PCI Express Switch, that implements this capability must return the same Device Serial Number for each device.

Figure 5-49: PCI Express Device Serial Number Capability Structure
    Byte offset 00h: PCI Express Enhanced Capability Header
    Byte offsets 04h and 08h: Serial Number Register

5.12.1. Device Serial Number Enhanced Capability Header (Offset 00h)

See Section 5.9.3 for a description of the PCI Express Enhanced Capability Header. The Extended Capability ID for the Device Serial Number Capability is 0003h.

Figure 5-50: Device Serial Number Enhanced Capability Header (bits 31:20 Next Capability Offset; bits 19:16 Capability Version; bits 15:0 PCI Express Extended Capability ID)

Table 5-45: Device Serial Number Enhanced Capability Header

15:0 PCI Express Extended Capability ID (RO) – This field is a PCI-SIG defined ID number that indicates the nature and format of the extended capability.
Extended Capability ID for the Device Serial Number Capability is 0003h.

19:16 Capability Version (RO) – This field is a PCI-SIG defined version number that indicates the version of the capability structure present.
Must be 1h for this version of the specification.

31:20 Next Capability Offset (RO) – This field contains the offset to the next PCI Express capability structure, or 000h if no other items exist in the linked list of capabilities.
For Extended Capabilities implemented in device configuration space, this offset is relative to the beginning of PCI-compatible configuration space and thus must always be either 000h (for terminating the list of capabilities) or greater than 0FFh.

5.12.2. Serial Number Register (Offset 04h)

The Serial Number register is a 64-bit field that contains the IEEE-defined 64-bit extended unique identifier (EUI-64).

Figure 5-51: Serial Number Register (bits 63:0)

Table 5-46: Serial Number Register

63:0 PCI Express Device Serial Number (RO) – This field contains the IEEE-defined 64-bit extended unique identifier (EUI-64). This identifier includes a 24-bit company id value assigned by the IEEE registration authority and a 40-bit extension identifier assigned by the manufacturer.
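As an illustration, the following Python sketch locates the Device Serial Number capability by following Next Capability Offset links. It then assembles and splits the 64-bit serial number. Several points are assumptions rather than statements of this section:
- `read32(offset)` is a hypothetical configuration-space DWORD reader.
- The extended capability list is taken to start at offset 100h.
- The lower DWORD of the serial number is taken to be at capability offset 04h and the upper DWORD at 08h.
- The 24-bit company id is taken to occupy the most significant bits, as in the IEEE EUI-64 format.

```python
from __future__ import annotations

DSN_CAPABILITY_ID = 0x0003


def parse_ext_cap_header(dword: int) -> tuple[int, int, int]:
    """Split an Enhanced Capability Header into (Capability ID, Version, Next Offset)."""
    return dword & 0xFFFF, (dword >> 16) & 0xF, (dword >> 20) & 0xFFF


def find_ext_cap(read32, cap_id: int, start: int = 0x100) -> int | None:
    """Follow Next Capability Offset links from `start` until `cap_id` is found."""
    offset, visited = start, set()
    while offset and offset not in visited:  # 000h terminates; `visited` guards loops
        visited.add(offset)
        cap, _version, next_offset = parse_ext_cap_header(read32(offset))
        if cap == cap_id:
            return offset
        if next_offset and next_offset <= 0x0FF:
            raise ValueError(f"invalid Next Capability Offset {next_offset:#05x}")
        offset = next_offset
    return None


def read_serial_number(read32, cap_offset: int) -> int:
    """Assumes bits 31:0 of the EUI-64 are at offset 04h and bits 63:32 at 08h."""
    return (read32(cap_offset + 8) << 32) | read32(cap_offset + 4)


def split_eui64(serial: int) -> tuple[int, int]:
    """Return (24-bit company id, 40-bit extension identifier)."""
    return serial >> 40, serial & ((1 << 40) - 1)
```

The check on `next_offset` enforces the rule in Table 5-45 that a non-terminating offset must be greater than 0FFh.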


5.13. Power Budgeting Capability

The PCI Express Power Budgeting Capability allows the system to properly allocate power to devices that are added to the system at runtime. Through this capability, a device can report the power it consumes on a variety of power rails, in a variety of device power management states, and in a variety of operating conditions. The system uses this information to ensure that it is capable of providing the proper power and cooling levels to the device. Failure to properly indicate device power consumption may risk device or system failure.

This capability is required for all devices that are implemented as PCI Express Modules. Implementation of this capability is optional for PCI Express devices that are implemented either as a PCI Express Card or are integrated on the motherboard. Devices that may be implemented either as a PCI Express Module or a PCI Express Card are required to implement this capability.

Figure 5-52: PCI Express Power Budgeting Capability Structure
    Byte offset 00h: PCI Express Enhanced Capability Header
    Byte offset 04h: Data Select Register (remaining bits RsvdP)
    Byte offset 08h: Data Register
    Byte offset 0Ch: Power Budget Capability Register (remaining bits RsvdP)

5.13.1. Power Budgeting Enhanced Capability Header (Offset 00h)

See Section 5.9.3 for a description of the PCI Express Enhanced Capability Header. The Extended Capability ID for the Power Budgeting Capability is 0004h.

Figure 5-53: Power Budgeting Enhanced Capability Header (bits 31:20 Next Capability Offset; bits 19:16 Capability Version; bits 15:0 PCI Express Extended Capability ID)


Table 5-47: Power Budgeting Enhanced Capability Header

15:0 PCI Express Extended Capability ID (RO) – This field is a PCI-SIG defined ID number that indicates the nature and format of the extended capability.
Extended Capability ID for the Power Budgeting Capability is 0004h.

19:16 Capability Version (RO) – This field is a PCI-SIG defined version number that indicates the version of the capability structure present.
Must be 1h for this version of the specification.

31:20 Next Capability Offset (RO) – This field contains the offset to the next PCI Express capability structure, or 000h if no other items exist in the linked list of capabilities.
For Extended Capabilities implemented in device configuration space, this offset is relative to the beginning of PCI-compatible configuration space and thus must always be either 000h (for terminating the list of capabilities) or greater than 0FFh.

5.13.2. Data Select Register (Offset 04h)

This read-write register indexes the Power Budgeting Data reported through the Data Register and selects the DWORD of Power Budgeting Data that should appear in the Data Register. Index values for this register start at 0 to select the first DWORD of Power Budgeting Data; subsequent DWORDs of Power Budgeting Data are selected by increasing index values.


5.13.3. Data Register (Offset 08h)

This read-only register returns the DWORD of Power Budgeting Data selected by the Data Select Register. Each DWORD of the Power Budgeting Data describes the power usage of the device in a particular operating condition. Power Budgeting Data for different operating conditions is not required to be returned in any particular order, as long as incrementing the Data Select Register causes information for a different operating condition to be returned. If the Data Select Register contains a value greater than or equal to the number of operating conditions for which the device provides power information, this register should return all zeros.

Figure 5-54: Power Budgeting Data Register (bits 31:21 RsvdP; bits 20:18 Power Rail; bits 17:15 Type; bits 14:13 PM State; bits 12:10 PM Sub State; bits 9:8 Data Scale; bits 7:0 Base Power)

The Base Power and Data Scale fields describe the power usage of the device; the Power Rail, Type, PM State, and PM Sub State fields describe the conditions under which the device has this power usage.

Table 5-48: Power Budgeting Data Register

7:0 Base Power (RO) – Specifies in watts the base power value in the given operating condition. This value must be multiplied by the Data Scale to produce the actual power consumption value.

9:8 Data Scale (RO) – Specifies the scale to apply to the Base Power value. The power consumption of the device is determined by multiplying the contents of the Base Power field by the value corresponding to the encoding returned by this field. Defined encodings are:
    00b    1.0x
    01b    0.1x
    10b    0.01x
    11b    0.001x
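The Data Select / Data Register pair is an indexed window, so software typically steps the index from 0 and stops when the Data Register returns all zeros. The Python sketch below illustrates this and converts Base Power × Data Scale into milliwatts, which keeps the 0.1x to 0.001x scales exact as integers. `write_select` and `read_data` are hypothetical register accessors. The `max_entries` cap is an arbitrary safety limit, not a specification value.

```python
from __future__ import annotations

# Data Scale encodings (bits 9:8) expressed as milliwatts per Base Power unit,
# so that 1.0x, 0.1x, 0.01x, and 0.001x remain exact integers.
MILLIWATTS_PER_UNIT = {0b00: 1000, 0b01: 100, 0b10: 10, 0b11: 1}


def power_milliwatts(data: int) -> int:
    """Base Power (bits 7:0) multiplied by Data Scale (bits 9:8), in milliwatts."""
    return (data & 0xFF) * MILLIWATTS_PER_UNIT[(data >> 8) & 0b11]


def read_power_budget(write_select, read_data, max_entries: int = 256) -> list[int]:
    """Step the Data Select Register from 0 and collect Data Register DWORDs.

    Stops at the first all-zero DWORD, which the Data Register returns once the
    index reaches the number of reported operating conditions.
    """
    entries = []
    for index in range(max_entries):
        write_select(index)
        data = read_data()
        if data == 0:
            break
        entries.append(data)
    return entries
```

For example, a Base Power of 25 with Data Scale 01b (0.1x) corresponds to 2.5 W, returned here as 2500 mW.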


Table 5-48: Power Budgeting Data Register (continued)

12:10 PM Sub State (RO) – Specifies the power management sub state of the operating condition being described. Defined encodings are:
    000b           Default Sub State
    001b – 111b    Device Specific Sub State

14:13 PM State (RO) – Specifies the power management state of the operating condition being described. Defined encodings are:
    00b    D0
    01b    D1
    10b    D2
    11b    D3
A device returns 11b in this field and Aux or PME Aux in the Type field to specify the D3-Cold PM State. An encoding of 11b along with any other Type value specifies the D3-Hot state.

17:15 Type (RO) – Specifies the type of the operating condition being described. Defined encodings are:
    000b    PME Aux
    001b    Auxiliary
    010b    Idle
    011b    Sustained
    111b    Maximum
All other encodings are reserved.

20:18 Power Rail (RO) – Specifies the power rail of the operating condition being described. Defined encodings are:
    000b    Power (12V)
    001b    Power (3.3V)
    010b    Power (1.8V)
    111b    Thermal
All other encodings are reserved.

A device that implements the Power Budgeting Capability is required to provide data values for the D0 Max and D0 Sustained PM State/Type combinations for every power rail from which it consumes power; data for the D0 Max Thermal and D0 Sustained Thermal combinations must also be provided if these values are different from the values reported for D0 Max and D0 Sustained on the power rails.

Devices that support auxiliary power or PME from auxiliary power must provide data for the appropriate power type (Aux or PME Aux).
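The condition fields can be decoded with a small lookup, including the rule that PM State 11b means D3-Cold only when Type is Auxiliary or PME Aux. This Python sketch uses the bit positions of Figure 5-54, with Power Rail in bits 20:18. The class name and the "Reserved" label for undefined encodings are illustrative choices.

```python
from __future__ import annotations

from dataclasses import dataclass

PM_STATES = {0b00: "D0", 0b01: "D1", 0b10: "D2", 0b11: "D3"}
TYPES = {0b000: "PME Aux", 0b001: "Auxiliary", 0b010: "Idle",
         0b011: "Sustained", 0b111: "Maximum"}
POWER_RAILS = {0b000: "Power (12V)", 0b001: "Power (3.3V)",
               0b010: "Power (1.8V)", 0b111: "Thermal"}


@dataclass(frozen=True)
class OperatingCondition:
    pm_state: str       # bits 14:13, with D3 resolved to D3-Hot or D3-Cold
    pm_sub_state: int   # bits 12:10; 0 = Default Sub State
    kind: str           # Type, bits 17:15
    power_rail: str     # bits 20:18


def decode_condition(data: int) -> OperatingCondition:
    """Decode the operating-condition fields of one Power Budgeting Data DWORD."""
    kind = TYPES.get((data >> 15) & 0b111, "Reserved")
    pm_state = PM_STATES[(data >> 13) & 0b11]
    if pm_state == "D3":
        # 11b with an Aux or PME Aux Type denotes D3-Cold; any other Type denotes D3-Hot.
        pm_state = "D3-Cold" if kind in ("Auxiliary", "PME Aux") else "D3-Hot"
    return OperatingCondition(
        pm_state=pm_state,
        pm_sub_state=(data >> 10) & 0b111,
        kind=kind,
        power_rail=POWER_RAILS.get((data >> 18) & 0b111, "Reserved"),
    )
```

Combined with the Base Power and Data Scale fields, this is enough to match each reported DWORD to the required D0 Max and D0 Sustained entries per rail.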


5.13.4. Power Budget Capability Register (Offset 0Ch)

This register indicates the power budgeting capabilities of a device.

Figure 5-55: Power Budget Capability Register (bits 7:1 RsvdP; bit 0 System Allocated)

Table 5-49: Power Budget Capability Register

0 System Allocated (HwInit) – This bit, when set, indicates that the power budget for the device is included within the system power budget. Reported Power Budgeting Data for this device should be ignored by software for power budgeting decisions if this bit is set.




6. Power Management

This chapter describes PCI Express power management (PCI Express-PM) capabilities and protocols.

6.1. Overview

PCI Express-PM provides the following services:
• A mechanism to identify the power management capabilities of a given function
• The ability to transition a function into a certain power management state
• Notification of the current power management state of a function
• The option to wake the system on a specific event

PCI Express-PM is compatible with the PCI Bus Power Management Interface Specification, Revision 1.1 (PCI-PM), and the Advanced Configuration and Power Interface Specification, Revision 2.0 (ACPI). This chapter also defines PCI Express native power management extensions. These provide additional power management capabilities beyond the scope of the PCI Bus Power Management Interface Specification.

PCI Express-PM defines Link power management states: states that a PCI Express physical Link is permitted to enter in response to either software-driven D-state transitions or Active State Link PM activities (Active State Link PM is described later). PCI Express Link states are not directly visible to legacy bus driver software, but are derived from the power management state of the components residing on those Links. Defined Link states are L0, L0s, L1, L2, and L3. The power savings increase as the Link state transitions from L0 through L3.

PCI Express components are permitted to wake the system from any supported power management state through the request of a power management event (PME).
PCI Express systems may provide the optional auxiliary power supply (Vaux) needed for PME operation from the “off” system states. PCI Express-PM extends beyond its PCI-PM predecessor in this regard, because PCI Express PME “messages” include the geographical location (Requestor ID) of the requesting agent within the Hierarchy. These PME messages are in-band TLPs routed from the requesting device to a Root Complex.

Another distinction of the PCI Express-PM PME mechanism is its separation of the following two tasks that are associated with PME:
• Reactivation (wake) of the I/O Hierarchy (i.e., re-establishing reference clocks and main power rails to the PCI Express components)
• Sending the actual PME Message (vector) to the Root Complex


An autonomous, hardware-based active-state mechanism (Active State Link PM) enables power savings even when the connected components are in the D0 state. After a period of idle Link time, the Active State Link PM mechanism engages in a physical layer protocol that places the idle Link into a lower power state. Once the Link is in the lower power state, transitions to the fully operative L0 state are triggered by traffic appearing on either side of the Link. Endpoints initiate entry into a low power Link state. This feature may be disabled by software.

Throughout this document, the term Upstream component, or Upstream device, refers to the PCI Express component on the end of the PCI Express Link that is hierarchically closer to the root of the PCI Express tree hierarchy. The term Downstream component, or Downstream device, refers to the PCI Express component on the end of the Link that is hierarchically further from the root of the PCI Express tree hierarchy.

6.1.1. Statement of Requirements

All PCI Express components, with the exception of the Root Complex, are required to meet or exceed the minimum requirements defined by the PCI-PM Software compatible PCI Express-PM features. Root Complexes are required to participate in Link power management DLLP protocols initiated by a downstream device when all functions of a downstream component enter a PCI-PM Software compatible low power state.
For further detail, refer to Section 6.3.2.

The Active State Link PM feature is a required feature (L0s entry at minimum) for all components, including Root Complexes, and is configured separately via the native PCI Express configuration mechanisms.

6.2. Link State Power Management

PCI Express defines Link power management states, replacing the bus power management states that were defined by the PCI-PM specification. Link states are not visible to PCI-PM legacy compatible software; they are either derived from the power management D-states of the corresponding components connected to that Link or determined by Active State power management protocols (refer to Section 6.4.1).

Note that the PCI Express Physical Layer may define additional intermediate states. See Chapter 4 for more detail on each state and how the Physical Layer handles transitions between states.

PCI Express-PM defines the following Link power management states:
• L0 – Active state.
    All PCI Express transactions and other operations are enabled.
    L0 support is required for both Active State Link power management and PCI-PM compatible power management.
• L0s – A low resume latency, energy-saving “standby” state.


  L0s support is required for Active State Link power management. It is not applicable to PCI-PM compatible power management.

  All main power supplies, component reference clocks, and components' internal PLLs must be active at all times during L0s. TLP and DLLP communication over a Link that is in L0s is prohibited. The L0s state is used exclusively for active-state power management.

  The PCI Express Physical Layer provides mechanisms for quick transitions from this state to the L0 state. When common (distributed) reference clocks are used on both sides of a given Link, the transition time from L0s to L0 is typically less than 100 symbol times.

• L1 – Higher latency, lower power "standby" state.

  L1 support is required for PCI-PM compatible power management. L1 is optional for Active State Link power management.

  All platform-provided main power supplies and component reference clocks must remain active at all times during L1. The downstream component's internal PLLs may be shut off during L1, enabling greater energy savings at the cost of increased exit latency.[27]

  The L1 state is entered whenever all functions of a downstream component on a given PCI Express Link are programmed to a D-state other than D0, or when the downstream component requests L1 entry (Active State Link PM) and receives positive acknowledgement for the request.

  Exit from L1 is initiated either by an upstream-initiated transaction targeting the downstream component or by the downstream component's need to initiate a transaction heading upstream.
  Transition from L1 to L0 typically takes a few microseconds.

  TLP and DLLP communication over a Link that is in L1 is prohibited.

• L2/L3 Ready – Staging point for removal of main power.

  L2/L3 Ready transition protocol support is required.

  The L2/L3 Ready state is not directly related to either PCI-PM D-state transitions or Active State Link power management. L2/L3 Ready is the state that a given Link enters when the platform is preparing to enter its system sleep state. Once the L2/L3 Ready transition protocol for that Link has completed, the Link is ready for either L2 or L3, but is not actually in either of those states until main power has been removed. Depending on whether the platform provides a Vaux supply, after main power has been removed the Link settles either into L2 (Vaux is provided) or into a zero power "off" state (L3).

[27] For example, disabling the internal PLL may be desirable in D3hot, but not in D1 or D2.


  The L2/L3 Ready entry transition process must begin as soon as possible following the acknowledgment of a PME_Turn_Off message (i.e., the injection of a PME_TO_Ack TLP). The downstream component initiates L2/L3 Ready entry by injecting a PM_Enter_L23 DLLP onto its transmit Port. Refer to Section 6.6 for further detail on power management system messages.

  TLP and DLLP communication over a Link that is in L2/L3 Ready is prohibited.

  Exit from L2/L3 Ready back to L0 may only be initiated by an upstream-initiated transaction targeting the downstream component, in the same manner that an upstream-initiated transaction triggers the transition from L1 back to L0. An upstream-initiated exit from L2/L3 Ready occurs when, after the Link has transitioned to L2/L3 Ready but before main power is removed, the platform power manager decides not to enter the system sleep state.

  A Link's transition into the L2/L3 Ready state is one of the final stages involving PCI Express protocol before the platform enters a system sleep state in which main power is shut off (e.g., ACPI S3 or S4 sleep state).

• L2 – Auxiliary-powered Link, deep energy saving state.

  L2 support is optional and depends on platform support of Vaux.

  In L2, the downstream component's main power supply inputs and reference clock inputs are shut off. All PME detection logic, Link reactivation "Beacon" logic, PME context, and any other "keep alive" logic is powered by Vaux.

  TLP and DLLP communication over a Link that is in L2 is prohibited.

  Exiting the L2 state is accomplished by reestablishing main power and reference clocks to all components within the domain of the power manager, followed by full Link training and initialization.
  Once a given Link has completed Link training and initialization, it is in the L0 state and may begin sending and receiving TLPs and DLLPs.

• L3 – Link Off state.

  Zero power state.

Refer to Section 4.2 for further detail on entering and exiting each of the PCI Express L-states.

Figure 6-1 highlights the legitimate L-state transitions that may occur during the course of Link operation.


[Figure 6-1: Link Power Management State Transitions. States shown: L0, L0s, L1, L2/L3 Ready, L2, and L3; one arc is marked "See note in text".]

The arc noted in Figure 6-1 indicates the case where the platform does not provide Vaux. In this case, the L2/L3 Ready state transition protocol results in a state of readiness for loss of main power, and once main power is removed the Link settles into the L3 state.

Link PM transitions from any L-state to any other L-state must pass through the L0 state, with the exception of the L2/L3 Ready to L2 or L3 transitions. In that case, the Link transitions from L2/L3 Ready directly to either L2 or L3 when main power to the component is removed. (This parallels the corresponding component's D-state transition from D3hot to D3cold.)

The following sequence, leading up to entering a system sleep state, illustrates the multi-step Link state transition process:

1. System software directs all functions of a downstream component to D3hot.
2. The downstream component then initiates the transition of the Link to L1 as required.
3. System software then causes the Root Complex to broadcast the PME_Turn_Off message in preparation for removing the main power source.
4. This message causes the subject Link to transition back to L0 in order to send it, and to enable the downstream component to respond with PME_TO_Ack.
5. After the PME_TO_Ack is sent, the downstream component initiates the L2/L3 Ready transition protocol.

The resulting Link state sequence is: L0 → L1 → L0 → L2/L3 Ready

Table 6-1 summarizes each L-state, describing when it is used and the corresponding PCI Express platform and component behaviors.

A "Yes" entry indicates that support is required (unless otherwise noted). "On" and "Off" entries indicate the required clocking and power delivery. "On/Off" indicates an optional design choice.


Table 6-1: Summary of PCI Express Link Power Management States

| L-State | Description | Used by SW Directed PM | Used by Active State Link PM | Platform Reference Clocks | Platform Main Power | Component Internal PLL | Platform Vaux |
|---|---|---|---|---|---|---|---|
| L0 | Fully active Link | Yes (D0) | Yes (D0) | On | On | On | On/Off |
| L0s | Standby state | No | Yes (D0) (note 1) | On | On | On | On/Off |
| L1 | Lower power standby | Yes (D1-D3hot) | Yes (opt., D0) (note 2) | On | On | On/Off (note 3) | On/Off |
| L2/L3 Ready | Staging point for power removal | Yes (note 4) | No | On | On | On/Off | On/Off |
| L2 | Low power sleep state (all clocks, main power off) | Yes (note 5) | No | Off | Off | Off | On (note 6) |
| L3 | Off (zero power) | n/a | n/a | Off | Off | Off | Off |

Notes:
1. L0s exit latency will be greatest in Link configurations where the components at opposite ends of a given Link have independent reference clock inputs (as opposed to a common, distributed reference clock).
2. L1 entry may be requested within the Active State Link PM protocol; however, its support is optional.
3. L1 exit latency will be greatest for components that internally shut off their PLLs during this state.
4. The L2/L3 Ready entry sequence is initiated at the completion of the PME_Turn_Off/PME_TO_Ack protocol handshake. It is not directly affiliated with a D-state transition or with a transition in accordance with Active State Link PM policies and procedures.
5. Depending on the platform implementation, the system's sleep state may use the L2 state or transition to being fully off (L3). The L2/L3 Ready state transition protocol is initiated by the downstream component following reception and TLP acknowledgement of the PME_Turn_Off TLP Message. While platform support for an L2 sleep state configuration is optional (i.e., support for Vaux delivery), PCI Express component protocol support for transitioning the Link to the L2/L3 Ready state is required.
6. L2 is distinguished from the L3 state only by the presence of Vaux.
   After the completion of the L2/L3 Ready state transition protocol and before main power has been removed, the Link has indicated its readiness for main power removal.
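The transition rule accompanying Figure 6-1 and the power columns of Table 6-1 can be captured in a small lookup model. The Python below is a minimal sketch, not part of this specification: the names `LState`, `Resources`, `TABLE_6_1`, `DIRECT`, `traffic_allowed`, and `transition_path` are hypothetical, and the arcs from L2 and L3 back to L0 (through power-up and full Link training) are an assumption drawn from the L2 exit description rather than from an explicit rule. A breadth-first search over the direct arcs shows why a Link in L1 must return to L0 before it can enter L2/L3 Ready.

```python
from collections import deque
from enum import Enum
from typing import NamedTuple


class LState(Enum):
    L0 = "L0"
    L0S = "L0s"
    L1 = "L1"
    L2_L3_READY = "L2/L3 Ready"
    L2 = "L2"
    L3 = "L3"


class Resources(NamedTuple):
    """Platform/component columns of Table 6-1 ("On/Off" = optional design choice)."""
    ref_clocks: str
    main_power: str
    internal_pll: str
    vaux: str


TABLE_6_1 = {
    LState.L0: Resources("On", "On", "On", "On/Off"),
    LState.L0S: Resources("On", "On", "On", "On/Off"),
    LState.L1: Resources("On", "On", "On/Off", "On/Off"),
    LState.L2_L3_READY: Resources("On", "On", "On/Off", "On/Off"),
    LState.L2: Resources("Off", "Off", "Off", "On"),
    LState.L3: Resources("Off", "Off", "Off", "Off"),
}

# Direct arcs: every transition goes through L0 except L2/L3 Ready -> L2 or L3.
# Assumption: L2 and L3 return to L0 via power-up and full Link training.
DIRECT = {
    LState.L0: {LState.L0S, LState.L1, LState.L2_L3_READY},
    LState.L0S: {LState.L0},
    LState.L1: {LState.L0},
    LState.L2_L3_READY: {LState.L0, LState.L2, LState.L3},
    LState.L2: {LState.L0},
    LState.L3: {LState.L0},
}


def traffic_allowed(state: LState) -> bool:
    """TLP and DLLP communication is permitted only while the Link is in L0."""
    return state is LState.L0


def transition_path(src: LState, dst: LState) -> list[LState]:
    """Shortest legal sequence of L-states from src to dst (breadth-first search)."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] is dst:
            return path
        for nxt in DIRECT[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    raise ValueError(f"no legal path from {src.value} to {dst.value}")
```

For example, `transition_path(LState.L1, LState.L2_L3_READY)` yields L1, L0, L2/L3 Ready, matching the sleep-entry sequence in which the Link returns to L0 to carry the turn-off message and its acknowledgement.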


6.3. PCI-PM Software Compatible Mechanisms

6.3.1. Device Power Management States (D-States) of a Function

PCI Express supports all PCI-PM device power management states. All functions must support the D0 and D3 states (both D3hot and D3cold). The D1 and D2 states are optional. Refer to the PCI Bus Power Management Interface Specification for further detail on the PCI-PM compatible features described in this specification. Note that where this specification departs from the PCI Bus Power Management Interface Specification, this specification takes precedence for PCI Express components and Link hierarchies.

6.3.1.1. D0 State

All PCI Express functions must support the D0 state. D0 is divided into two distinct sub-states, the "uninitialized" sub-state and the "active" sub-state. When power is initially applied to a PCI Express component, it defaults to the D0 uninitialized state. Components in this state are enumerated and configured by the PCI Express Hierarchy enumeration process. Following the completion of the enumeration and configuration process, the function enters the D0 active state, the fully operational state for a PCI Express function. A function enters the D0 active state whenever system software has set any one or more of the function's Memory Space Enable, I/O Space Enable, or Bus Master Enable bits.

6.3.1.2. D1 State

D1 support is optional. While in the D1 state, a function must not initiate any TLPs on the Link with the exception of a PME Message as defined in Section 6.3.3.
Configuration Requests are the only TLPs accepted (as target) by a function that is currently in the D1 state. All other received Requests must be handled as Unsupported Requests.

Note that a function's software driver participates in the process of transitioning the function from D0 to D1. It contributes to the process by saving any functional state (if necessary) and otherwise preparing the function for the transition to D1. As part of this quiescence process, the function's software driver must ensure that any mid-transaction TLPs (i.e., Requests with outstanding Completions) are terminated prior to handing control to the system configuration software that then completes the transition to D1.

6.3.1.3. D2 State

D2 support is optional. While in the D2 state, a function must not initiate any TLPs on the Link with the exception of a PME Message as defined in Section 6.3.3. Configuration Requests are the only TLPs accepted (as target) by a function that is currently in the D2 state. All other received Requests must be handled as Unsupported Requests.


Note that a function's software driver participates in the process of transitioning the function from D0 to D2. It contributes to the process by saving any functional state (if necessary) and otherwise preparing the function for the transition to D2. As part of this quiescence process, the function's software driver must ensure that any mid-transaction TLPs (i.e., Requests with outstanding Completions) are terminated prior to handing control to the system configuration software that then completes the transition to D2.

6.3.1.4. D3 State

D3 support is required (both the D3cold and the D3hot states). Functions supporting PME generation from D3 must support it for both the D3cold and the D3hot states.

Functional context does not need to be maintained by functions in the D3 state. Software is required to re-initialize the function following a D3 → D0 transition.

The minimum recovery time following a D3hot → D0 transition is 10 ms. The transitioning component may use this recovery time to bootstrap any of its component interfaces (e.g., from serial ROM) before it becomes accessible. Attempts to target the function during the recovery time (including Configuration Request packets) will result in undefined behavior.

6.3.1.4.1. D3hot State

When a function is in D3hot, it must respond to configuration accesses targeting it. It must also participate in the PME_Turn_Off/PME_TO_Ack protocol; refer to Section 6.3.3 for details. Once in D3hot, the function can later be transitioned into D3cold (by removing power from its host component).

Transitions into the D3hot state are used to establish a standard process for gracefully saving functional state immediately prior to entering a deeper power savings state where power is removed.

Note that a function's software driver participates in the process of transitioning the function from D0 to D3hot.
It contributes to the process by saving any functional state that would otherwise be lost with the removal of main power, and otherwise preparing the function for the transition to D3hot. As part of this quiescence process, the function's software driver must ensure that any outstanding transactions (i.e., Requests with outstanding Completions) are terminated prior to handing control to the system configuration software that then completes the transition to D3hot.

Note that D3hot is also a useful state for reducing power consumption by idle components in an otherwise running system.
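The D0 sub-state rule and the target-side behavior of functions in D1, D2, and D3hot can be summarized as a few predicates. This Python sketch is illustrative only: request and message labels such as `"config"` or `"PM_PME"` are hypothetical strings, the bit positions are those of the conventional PCI Command register, and treating non-configuration Requests in D3hot as Unsupported Requests is an assumption, since the D3hot text above only requires that configuration accesses be answered and that the function take part in the turn-off handshake. Completions returned for accepted Configuration Requests are responses rather than initiated TLPs and are not modeled.

```python
from enum import Enum


class DState(Enum):
    D0 = "D0"
    D1 = "D1"
    D2 = "D2"
    D3HOT = "D3hot"


# Enable bits of the conventional PCI Command register.
IO_SPACE_ENABLE = 1 << 0
MEMORY_SPACE_ENABLE = 1 << 1
BUS_MASTER_ENABLE = 1 << 2

D3HOT_TO_D0_RECOVERY_MS = 10  # minimum recovery time after D3hot -> D0


def d0_is_active(command: int) -> bool:
    """D0 is 'active' once software has set any of the three enable bits."""
    enables = IO_SPACE_ENABLE | MEMORY_SPACE_ENABLE | BUS_MASTER_ENABLE
    return bool(command & enables)


def target_response(state: DState, request: str) -> str:
    """Handling of a received Request by a function in D1, D2 or D3hot."""
    if state is DState.D0:
        raise ValueError("D0 handling is outside the scope of this sketch")
    if request == "config":
        return "accept"
    if state is DState.D3HOT and request == "PME_Turn_Off":
        return "accept"  # D3hot functions take part in the turn-off handshake
    # Required for D1/D2 by the text; assumed here for D3hot.
    return "unsupported_request"


def may_initiate(state: DState, tlp: str) -> bool:
    """Outside D0, only a PME Message (plus PME_TO_Ack in D3hot) may be initiated."""
    if state is DState.D0:
        return True  # not restricted by these rules
    allowed = {"PM_PME"}
    if state is DState.D3HOT:
        allowed.add("PME_TO_Ack")
    return tlp in allowed


def access_is_defined(ms_since_d3hot_exit: float) -> bool:
    """Targeting the function inside the 10 ms recovery window is undefined."""
    return ms_since_d3hot_exit >= D3HOT_TO_D0_RECOVERY_MS
```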


6.3.1.4.2. D3cold State

A function transitions to the D3cold state when its power is removed. A power-on sequence transitions a function from the D3cold state to the D0 uninitialized state. At this point, software must perform a full initialization of the function in order to re-establish all functional context, completing the restoration of the function to its D0 active state.

Functions that support PME assertion from D3cold must maintain their PME context for inspection by PME service routine software during the course of the resume process. Functions may only generate PME messages from D3cold if the platform supplies them with a Vaux supply or if they have an independent source of power.[28] PME context consists of all information relating to the function's assertion of PME.

Implementation Note: PME Context
Examples of PME context include, but are not limited to, a function's PME_Status bit, the requesting agent's Requester ID, Caller ID if supported by a modem, IP information for IP-directed network packets that trigger a resume event, SHPC extended context, etc.

A function's PME assertion is acknowledged when system software performs a "write 1 to clear" configuration write to the PME_Status bit in the asserting function's PCI-PM compatible PMCSR register.

An auxiliary power source must be used to support PME event detection, Link reactivation, and preservation of PME context from within D3cold. Note that once the I/O Hierarchy has been brought back to a fully communicating state as a result of the Link reactivation, the waking agent propagates a PME message to the root of the Hierarchy indicating the source of the PME event. Refer to Section 6.3.3 for further PME-specific detail.
Exit from D3cold is accomplished by assertion of PWRGOOD (either provided as an auxiliary signal or internally generated by the component), followed by the Link training sequence.

[28] Note that when a component reports support for PME generation from D3hot and D3cold (PMC register), this does not guarantee that the platform will support the generation of PMEs from D3cold. To be certain of this, software must read the components' PCI Express capability structures to ensure that the components report that Vaux is being provided to them by the platform (refer to Chapter 5 for details).
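The "write 1 to clear" acknowledgement of PME_Status can be modeled in a few lines. This is a hedged sketch: the bit positions (PowerState in bits 1:0, PME_En in bit 8, PME_Status in bit 15) follow the PMCSR layout of the PCI Bus Power Management Interface Specification, while the `Pmcsr` class and the helper names are hypothetical and all other PMCSR fields are ignored. It also illustrates a practical consequence of these semantics: a read-modify-write that changes only PowerState must write 0 to PME_Status, or it will acknowledge a pending PME as a side effect.

```python
# PMCSR fields as laid out by the PCI Bus Power Management Interface Specification.
POWER_STATE_MASK = 0x0003  # bits 1:0; 0b00 = D0, 0b01 = D1, 0b10 = D2, 0b11 = D3hot
PME_EN = 1 << 8            # read/write
PME_STATUS = 1 << 15       # write 1 to clear


class Pmcsr:
    """Toy model of a function's PMCSR; other fields are ignored."""

    def __init__(self) -> None:
        self.value = 0

    def assert_pme(self) -> None:
        # The function records its PME assertion in PME_Status.
        self.value |= PME_STATUS

    def config_write(self, data: int) -> None:
        # Writing 1 to PME_Status clears it; writing 0 leaves it unchanged.
        if data & PME_STATUS:
            self.value &= ~PME_STATUS
        # PowerState and PME_En simply take the written value.
        writable = POWER_STATE_MASK | PME_EN
        self.value = (self.value & ~writable) | (data & writable)


def acknowledge_pme(reg: Pmcsr) -> None:
    """PME service routine: clear PME_Status without touching other fields."""
    reg.config_write(reg.value | PME_STATUS)


def set_power_state(reg: Pmcsr, state: int) -> None:
    """Change PowerState without acknowledging a pending PME by accident."""
    data = (reg.value & ~(POWER_STATE_MASK | PME_STATUS)) | (state & POWER_STATE_MASK)
    reg.config_write(data)
```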


6.3.2. PM Software Control of the Link Power Management State

The power management state of a Link is determined by the D-state of its Downstream component.

Table 6-2 depicts the relationships between the power state of a component (Endpoint, Switch) and its Upstream Link.

Table 6-2: Relation Between Power Management States of Link and Components

| Downstream Component D-State | Permissible Upstream Component D-State | Permissible Interconnect State |
|---|---|---|
| D0 | D0 | L0, L0s, L1 (note 1) |
| D1 | D0-D1 | L1 |
| D2 | D0-D2 | L1 |
| D3hot | D0-D3hot | L1, L2/L3 Ready (note 2) |
| D3cold | D0-D3cold | L2 (note 3), L3 |

Notes:
1. All PCI Express components are required to support Active State Link Power Management with L0s entry during idle at a minimum. The use of L1 within D0 is optional.
2. When all functions within a downstream component are programmed to D3hot, the downstream component must request the transition of its Link to the L1 state using the PM_Enter_L1 DLLP. Once in D3hot, following the execution of a PME_Turn_Off/PME_TO_Ack handshake sequence, the downstream component must then request a Link transition to L2/L3 Ready using the PM_Enter_L23 DLLP. Following the L2/L3 Ready entry transition protocol, the downstream component must be ready for loss of main power and reference clock.
3. If Vaux is provided by the platform, the Link sleeps in L2. In the absence of Vaux, the L-state is L3.

The conditions governing Link state transitions in the software-directed PCI-PM compatible power management scheme are defined as follows:

• A Switch or single-function Endpoint device must initiate a Link state transition of its Upstream Port (Switch) or Port (Endpoint) to L1 based solely on that Port being programmed to D1, D2, or D3hot.
  In the case of the Switch, system software bears the responsibility of ensuring that any D-state programming of a Switch's Upstream Port complies with PCI Express hierarchy-wide PM policies (i.e., the Upstream Port cannot be programmed to a D-state that is any less active than the most active downstream Port and downstream connected component/function(s)).


• Multi-function Endpoints must not initiate a Link state transition to L1 until all of their functions have been programmed to a non-D0 D-state.

6.3.2.1. Entry into the L1 State

Figure 6-2 depicts the process by which a Link is transitioned into the L1 state as a direct result of power management software programming the downstream connected component into a lower power state (D1, D2, or D3hot). This figure and the subsequent description outline the transition process for a single-function downstream component that is being programmed to a non-D0 state.

[Figure 6-2: Entry into L1 Link State. Timeline of steps 1 through 9 between the Upstream and Downstream components (legend: T = Transaction, D = Data Link, P = Physical), as detailed in the text that follows.]
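The Section 6.3.2 conditions, Table 6-2, and the handshake pictured in Figure 6-2 can be approximated with a toy model. The Python below is a simplified, hypothetical sketch rather than a reference implementation: the string encodings of D-states and Link states, the `Side` class, and the tick-based timing (one outstanding TLP acknowledged per tick, and anything a component transmits in one tick observed by its peer in the next) are assumptions made for illustration, and Physical Layer detail is ignored. Step numbers in the comments refer to the numbered steps of the L1 entry process.

```python
from dataclasses import dataclass

# Table 6-2: downstream component D-state -> permissible Link states.
PERMISSIBLE_LINK_STATES = {
    "D0": {"L0", "L0s", "L1"},
    "D1": {"L1"},
    "D2": {"L1"},
    "D3hot": {"L1", "L2/L3 Ready"},
    "D3cold": {"L2", "L3"},
}
D_ORDER = ["D0", "D1", "D2", "D3hot", "D3cold"]  # most active first


def combination_permitted(down_d: str, up_d: str, link: str) -> bool:
    """Upstream must be at least as active as downstream, and the Link state allowed."""
    upstream_ok = D_ORDER.index(up_d) <= D_ORDER.index(down_d)
    return upstream_ok and link in PERMISSIBLE_LINK_STATES[down_d]


def must_request_l1(function_d_states: list[str]) -> bool:
    """Software-directed L1 request: every function is in D1, D2 or D3hot."""
    return all(d in ("D1", "D2", "D3hot") for d in function_d_states)


@dataclass
class Side:
    unacked: int           # TLPs sent but not yet acknowledged by the peer
    blocked: bool = False  # scheduling of new TLPs suspended
    tx_idle: bool = False  # transmit Lanes in electrical idle


def enter_l1(down: Side, up: Side, max_ticks: int = 100) -> list[str]:
    """Step the L1 entry handshake (steps 2 to 9) and return an event log."""
    down.blocked = True                        # step 2
    log = ["down: new TLPs blocked"]
    down_tx = up_tx = ""                       # DLLP each side keeps repeating
    for _ in range(max_ticks):
        # What each side observes this tick was transmitted in the previous tick.
        up_rx, down_rx, up_sees_idle = down_tx, up_tx, down.tx_idle
        if down_rx == "PM_Request_Ack" and not down.tx_idle:          # step 8
            down.tx_idle, down_tx = True, ""
            log.append("down: Link Layer off, electrical idle")
        elif down.unacked:                                            # step 3
            down.unacked -= 1
        elif not down.tx_idle and down_tx != "PM_Enter_L1":           # step 4
            down_tx = "PM_Enter_L1"
            log.append("down: repeating PM_Enter_L1")
        if up_rx == "PM_Enter_L1" and not up.blocked:                 # step 5
            up.blocked = True
            log.append("up: new TLPs blocked")
        elif up.blocked and up.unacked:                               # step 6
            up.unacked -= 1
        elif up.blocked and up_sees_idle and not up.tx_idle:          # step 9
            up.tx_idle, up_tx = True, ""
            log.append("up: Link Layer off, electrical idle")
        elif up.blocked and not up.tx_idle and up_tx != "PM_Request_Ack":  # step 7
            up_tx = "PM_Request_Ack"
            log.append("up: repeating PM_Request_Ack")
        if down.tx_idle and up.tx_idle:
            log.append("Link in L1")
            return log
    raise RuntimeError("L1 entry handshake did not complete")
```

Whatever the number of unacknowledged TLPs on either side, the log keeps the same order: the downstream component sends PM_Enter_L1 only after its own TLPs are acknowledged, and the upstream component answers with PM_Request_Ack only after blocking and draining its own.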


The following text provides additional detail for the Link state transition process pictured above.

PM Software Request:
1. PM software (upstream component) sends a TLP Configuration Request packet to change the downstream function's D-state (to D1, for example).

Downstream Component Link State Transition Initiation Process:
2. The downstream component schedules the Completion corresponding to the configuration write to its PMCSR PowerState field. All new TLP scheduling is suspended.
3. The downstream component then waits until it receives a Link Layer acknowledgement for the PMCSR write Completion and any other TLPs it had previously sent. The component may retransmit a TLP out of its Link Layer Retry buffer if required to do so by Link Layer rules.
4. Once all of the downstream component's TLPs have been acknowledged, the downstream component transmits a PM_Enter_L1 DLLP onto its upstream-directed (transmit) Port. The downstream component sends this DLLP continuously until it receives a response (PM_Request_Ack) from the upstream component.[29] While waiting for all of its TLPs to be acknowledged, the downstream component must not initiate any new TLPs. The downstream component must, however, continue to accept TLPs and DLLPs from the upstream component, and it must also continue to respond with DLLPs as needed per Link Layer protocol. Refer to the Electrical chapter for more details on the Physical Layer behavior.

Upstream Component Link State Transition Process:
5. Upon receiving the PM_Enter_L1 DLLP, the upstream component blocks the scheduling of any future TLPs.
6. The upstream component then must wait until it receives a Link Layer acknowledgement for the last TLP it had previously sent. The upstream component may retransmit a TLP from its Link Layer retry buffer if required to do so by the Link Layer rules.
7. Once all of the upstream component's TLPs have been acknowledged, the upstream component sends a PM_Request_Ack DLLP downstream. The upstream component sends this DLLP continuously until it observes its receive Lanes enter the electrical idle state. See Chapter 4 for more details on the Physical Layer behavior.[30]

Completing the L1 Link State Transition:
8. Once the downstream component has captured the PM_Request_Ack DLLP on its receive Lanes (signaling that the upstream component acknowledged the request to transition to L1), it disables its Link Layer and brings the upstream-directed physical Link into the electrical idle state.
9. When the upstream component observes its receive Lanes enter the electrical idle state, it stops sending PM_Request_Ack DLLPs, disables its Link Layer, and brings its transmit Lanes to electrical idle, completing the transition of the Link to L1.

When the Link interconnecting two components is in L1 as a result of the downstream component being programmed to a non-D0 state, both components suspend the operation of their Flow Control Update, DLLP ACK/NAK Latency, and TLP Completion Timeout counter mechanisms.[31] Refer to the Electrical chapter for more detail on the Physical Layer behavior.

Components on either end of a Link in L1 may optionally disable their internal PLLs in order to conserve more energy. Note, however, that platform-supplied main power and reference clocks must always be supplied to components on both ends of an L1 Link.

[29] If at this point the Downstream component needs to initiate a transfer on the Link, it must first complete the transition to L1 regardless. Once in L1, it is then permitted to initiate an exit from L1 to handle the transfer. This corner case represents an event requiring a PME message that occurs during the component's transition to L1.
[30] If, at this point, the Upstream component for any reason needs to initiate a transfer on the Link, it must first complete the transition to L1 regardless. Once in L1, it is then permitted to initiate an exit from L1 to handle the transfer.
[31] This is the required behavior regardless of whether the L1 state was the result of software-driven PM protocol or of Active State Link PM protocol.

6.3.2.2. Exit from L1 State

L1 exit can be initiated by the component on either end of a PCI Express Link. A downstream component initiates an L1 exit transition in order to bring the Link to L0 so that it may then inject a PME message. The upstream component initiates L1 exit to re-establish normal TLP and DLLP communications on the Link.

In either case, the physical mechanism for transitioning a Link from L1 to L0 is the same and is described in detail in the Electrical chapter.

Figure 6-3 outlines a sequence that would trigger an Upstream component to initiate transition of the Link to the L0 state.


[Figure 6-3: Exit from L1 Link State Initiated by Upstream Component. The figure tracks the Upstream component state (D0 throughout), the Link state (L1, L1, L0, L0), and the Downstream component state (D1, returning to D0) across four phases: standby state; PM Configuration Request triggers L1 to L0 transition; L1 to L0 transition complete; PM Configuration Request delivered.]

Sequence of events:
1. Power management software initiates a configuration cycle targeting a PM configuration register (the PowerState field of the PMCSR in this example) within a function that resides in the Downstream component (e.g., to bring the function back to the D0 state).
2. The Upstream component detects that a configuration cycle is intended for a Link that is currently in a low power state and, as a result, initiates a transition of that Link into the L0 state.
3. In accordance with the Chapter 4 definition, both directions of the Link enter Link training, resulting in the transition of the Link to the L0 state. The L1 → L0 transition is discussed in detail in Chapter 4.
4. Once both directions of the Link are back in the active L0 state, the Upstream Port sends the configuration Packet Downstream.

6.3.2.3. Entry into the L2/L3 Ready State

Transition to the L2/L3 Ready state follows a process that is similar to the L1 entry process. The minor differences between the two are spelled out below.

• The L2/L3 Ready entry transition protocol does not immediately result in an L2 or L3 Link state. The transition to L2/L3 Ready is effectively a handshake to establish the downstream component's readiness for power removal. L2 or L3 is ultimately achieved when the platform removes the components' power and reference clock.


• The time for L2/L3 Ready entry transition is indicated by the completion of the PME_Turn_Off/PME_TO_Ack handshake sequence. Any actions on the part of the downstream component necessary to ready itself for loss of power must be completed prior to initiating the transition to L2/L3 Ready. Once all preparations for loss of power and clock are completed, the downstream component initiates L2/L3 Ready entry by sending the PM_Enter_L23 DLLP upstream.

  In contrast, the time for L1 entry transition is indicated by programming all of the downstream component's function(s) to non-D0 states, or by Active State Link PM policies. No preparations are necessary before initiating a transition to L1.

• The downstream component must be in D3hot prior to being transitioned into the L2/L3 Ready state; i.e., a PME_Turn_Off message must never be sent unless all functions downstream of its point of origin are currently in D3hot.

  In contrast, a downstream component initiating a transition to L1 would always have initially been in D0 and has just been reprogrammed to D1, D2, or D3hot.

• The L2/L3 Ready entry transition protocol uses the PM_Enter_L23 DLLP, whereas the L1 entry protocol uses the PM_Enter_L1 DLLP.

  In either case, the PM_Enter_Lx DLLP is sent repeatedly until the downstream component observes electrical idle on its receive Port.

6.3.3. Power Management Event Mechanisms

6.3.3.1. Motivation

The PCI Express PME mechanism is software compatible with the PME mechanism defined by the PCI-PM specification. Power Management Events are generated by PCI Express functions as a means of requesting a PM state change.
Power Management Events are typically used to revive the system or an individual function from a low power state.

Power management software may transition a PCI Express Hierarchy into a low power state and transition the Upstream Links of these devices into the non-communicating L2 state.[32] The PCI Express PME generation mechanism is therefore broken into two components:

• Waking a non-communicating Hierarchy. This step is required only if the Upstream Link of the device originating the PME is in the non-communicating L2 state, since in that state the device cannot send a PM_PME message upstream.
• Sending a PM_PME message to the root of the PCI Express Hierarchy.

PME indications are propagated to the Root Complex in the form of TLP messages. PM_PME messages include the logical location of the requesting agent within the Hierarchy (in the form of the Requester ID of the PME message header). Explicit identification within the PM_PME message is intended to facilitate quicker PME service routine response, and hence shorter resume time.

[32] The L2 state is defined as "non-communicating" since the component reference clock and main power supply are removed in that state.

6.3.3.2. Link Reactivation

The Link reactivation mechanism provides a means of signaling the platform to re-establish power and reference clocks to the components within its domain. Refer to Section 4.2 for details on the in-band mechanism for Link reactivation. Refer to the PCI Express Card Electromechanical Specification for details on the out-of-band mechanism for Link reactivation.

Systems that allow PME generation from the D3cold state must provide auxiliary power to support Link reactivation when the main system power rails are off.

The reactivation period ends when the upstream-directed Link of a device enters the initialization phase as a result of the Link transition from the L2 state to the L0 state; a power-on sequence transitions the device from the D3cold state to the D0 uninitialized state.

The downstream device shall cease requesting Link reactivation (either in-band or auxiliary out-of-band) once it has entered the D0 uninitialized state.

Once the Link has been reactivated and trained, the requesting agent then propagates a PM_PME message upstream to the Root Complex.

6.3.3.2.1. PME Fence

PCI Express devices need to be notified before their reference clock and main power may be removed so that they can prepare for that eventuality.
<strong>PCI</strong> <strong>Express</strong>-PM introduces afence mechanism that serves to initiate the power removal sequence while also coordinatingthe behavior of the platform’s power management controller and PME handling by <strong>PCI</strong><strong>Express</strong> agents.There exist race conditions where a downstream agent, if not somehow coordinated with theplatform’s power manager, could potentially initiate a PM_PME message while the powermanager was in the process of turning off the main power source to the Link Hierarchy.The net result of hitting this corner condition would be loss of the PME indication. Thefence mechanism ensures this does not happen.PME_Turn_Off Broadcast MessageBefore main component power and reference clocks are turned off the Root Complex orHot Plug controller within a Switch Downstream Port, must issue a broadcast message thatinstructs all agents downstream of that point within the hierarchy to cease initiation of anysubsequent PM_PME messages, effective immediately upon receipt of the PME_Turn_Offmessage.Each <strong>PCI</strong> <strong>Express</strong> agent is required to respond with a TLP “acknowledgement” Packet,PME_TO_ACK that is, as in the case of a PME Message, always routed upstream. In all308


cases, the PME_TO_Ack message must terminate at the PME_Turn_Off message's point of origin.[33]

Note that PM_PME and PME_TO_Ack, like all other PCI Express message packets, are handled as posted transactions. It is their posted transaction nature that ensures that any previously injected PM_PME messages will be pushed ahead of the fence acknowledgement, assuring full in-order delivery of any previously initiated PM_PME messages before the "Turn off" acknowledgement ever reaches the initiator of the PME_Turn_Off message.

For the case where a PME_Turn_Off message is initiated upstream of a PCI Express Switch, the Switch's Upstream Port must report an "aggregate" acknowledgement only after having received PME_TO_Ack packets from each of its downstream ports individually. Once a PME_TO_Ack Packet has arrived on all downstream ports, the Switch then sends a PME_TO_Ack packet on its upstream Port.

All PCI Express components must accept[34] and acknowledge the PME_Turn_Off Packet from within the D3hot state. Once an Endpoint has sent a PME_TO_Ack Packet on its transmit Link, it must then prepare for removal of its power and reference clocks by initiating a transition to the L2/L3 Ready state.

A Switch must also transition its upstream Link to the L2/L3 Ready state in the same manner as described in the previous paragraph for Endpoints. However, the Switch initiates this transition only after all of its downstream ports have entered the L2/L3 Ready state.

The Links attached to the originator of the PME_Turn_Off message are the last to assume the L2/L3 Ready state.
This serves as an indication to the power delivery manager[35] that all Links within that portion of the PCI Express hierarchy have:
• Successfully retired all in-flight PME messages to the point of PME_Turn_Off message origin
• Performed any necessary local conditioning in preparation for power removal

The power delivery manager must wait a minimum of 100 ns after observing all Links corresponding to the point of origin of the PME_Turn_Off message enter L2/L3 Ready before removing the components' reference clock and main power.

[33] Point of origin for the PME_Turn_Off message could be all of the Root Ports for a given Root Complex (full platform sleep state transition), an individual hot plug capable Root Port, or a hot plug capable Switch Downstream Port.
[34] FC credits permitting.
[35] Power delivery control within this context relates to control over the entire PCI Express Link hierarchy, or over a subset of PCI Express Links ranging down to a single PCI Express Link for sub-hierarchies residing downstream of a Hot Plug controller managed interconnect.
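The fence argument rests entirely on posted ordering. A minimal Python sketch, assuming a single in-order FIFO stands in for a Link's VC0 posted request queue (real hardware adds flow control and multiple queues), shows why a PME_TO_Ack enqueued after earlier PM_PME messages can only reach the originator after them, along with the 100 ns wait rule. All function names are illustrative and do not come from this specification.

```python
from collections import deque

# Illustrative message tags; one FIFO stands in for the VC0 posted request queue.
PM_PME = "PM_PME"
PME_TO_ACK = "PME_TO_Ack"


def deliver_upstream(posted_queue):
    """Drain a posted FIFO in order: posted TLPs never pass one another."""
    delivered = []
    while posted_queue:
        delivered.append(posted_queue.popleft())
    return delivered


def pmes_before_ack(delivered):
    """True if no PM_PME reaches the fence originator after the PME_TO_Ack."""
    ack_index = delivered.index(PME_TO_ACK)
    return PM_PME not in delivered[ack_index + 1:]


def may_remove_power(ns_since_all_links_l2l3_ready, min_wait_ns=100):
    """Power delivery manager: wait at least 100 ns after every Link is in L2/L3 Ready."""
    return ns_since_all_links_l2l3_ready >= min_wait_ns


# Two PM_PMEs were injected before the agent received PME_Turn_Off.
queue = deque([PM_PME, PM_PME])
queue.append(PME_TO_ACK)  # agent acknowledges the fence and stops issuing PM_PMEs
order = deliver_upstream(queue)
print(order)                   # ['PM_PME', 'PM_PME', 'PME_TO_Ack']
print(pmes_before_ack(order))  # True
print(may_remove_power(50))    # False
```

Because the acknowledgement travels in the same ordered queue as the PM_PME messages, no separate flush step is needed.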


Implementation Note: PME_TO_Ack Message Proxy by Switch Devices

One of the PME_Turn_Off/PME_TO_Ack handshake's key roles is to ensure that all in-flight PME messages are flushed from the PCI Express fabric prior to sleep state power removal. This is guaranteed to occur because PME messages and PME_TO_Ack messages both use the posted request queue within VC0, so all previously injected PME messages will be made visible to the system before the PME_TO_Ack is received at the Root Complex. Once all downstream ports of the Root Complex receive a PME_TO_Ack message, the Root Complex can then signal the power manager that it is safe to remove power without loss of any PME messages.

Switches create points of hierarchical expansion and therefore must wait for all of their connected downstream ports to receive a PME_TO_Ack message before sending a PME_TO_Ack message upstream on behalf of the sub-hierarchy they have created downstream. This can be accomplished very simply using common scoreboarding techniques. For example, once a PME_Turn_Off message has been broadcast downstream of the Switch, the Switch simply checks off each downstream port as it receives a PME_TO_Ack. Once the last of its active downstream ports receives a PME_TO_Ack, the Switch sends a single PME_TO_Ack message upstream as a proxy on behalf of the entire sub-hierarchy downstream of it. Note that once a downstream port receives a PME_TO_Ack message and the Switch has scored its arrival, the port is then free to drop the packet from its internal queues and free up the corresponding posted request queue FC credits.

Implementation Note: PME_TO_Ack Deadlock Avoidance

As specified earlier, any device that detects a PME_Turn_Off message must reply with a PME_TO_Ack message. However, system behavior must not depend on the correct behavior of any single device.
In order to avoid deadlock in the case that one or more devices do not respond with a PME_TO_Ack message, the power manager must not depend on the acceptance of a PME_TO_Ack message. For example, the power manager may time out after waiting for the PME_TO_Ack message for a given time, after which it proceeds as if the message had been accepted.
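The two implementation notes above can be sketched in a few lines of Python: a scoreboard that sends exactly one proxy PME_TO_Ack once every active downstream port has reported, and a power-manager wait that never blocks indefinitely. This is a hypothetical model; the class and function names are made up, and `timeout_ms` is a platform choice rather than a value defined in this specification.

```python
class SwitchAckScoreboard:
    """Hypothetical scoreboard a Switch could use to proxy PME_TO_Ack upstream."""

    def __init__(self, active_downstream_ports):
        # Ports that still owe a PME_TO_Ack after PME_Turn_Off was broadcast.
        self.pending = set(active_downstream_ports)
        self.proxy_sent = False

    def on_pme_to_ack(self, port):
        """Score one arrival; return True exactly once, when the proxy ack is sent."""
        # After scoring, the port may drop the packet and free its posted FC credits.
        self.pending.discard(port)
        if not self.pending and not self.proxy_sent:
            self.proxy_sent = True  # single PME_TO_Ack upstream for the sub-hierarchy
            return True
        return False


def power_manager_wait(ack_arrival_ms, timeout_ms):
    """Never block forever: proceed after timeout_ms even if no ack ever arrives.

    ack_arrival_ms is None for a device that never responds; timeout_ms is a
    platform choice, not a value from the specification.
    """
    if ack_arrival_ms is not None and ack_arrival_ms <= timeout_ms:
        return "acked"
    return "timed_out"  # proceed as if the PME_TO_Ack had been accepted


sb = SwitchAckScoreboard(["P1", "P2", "P3"])
print([sb.on_pme_to_ack(p) for p in ["P2", "P1", "P3"]])  # [False, False, True]
print(power_manager_wait(None, 10))                        # timed_out
```

Using a set makes the scoreboard order-independent, which matches the requirement that acknowledgements may arrive from downstream ports in any order.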


6.3.3.3. PM_PME Messages

PM_PME messages are posted Transaction Layer Packets (TLPs) that inform the power management software which agent within the PCI Express Hierarchy requests a PM state change. PM_PME messages, like all other Power Management system messages, must use the general purpose Traffic Class, TC0.

PM_PME messages are always routed in the direction of the Root Complex. To send a PM_PME message on its upstream Link, a device must transition the Link to the L0 state (if the Link is not in that state already). Unless otherwise noted, the device will keep the Link in the L0 state following the transmission of a PM_PME message.

6.3.3.3.1. PM_PME "Backpressure" Deadlock Avoidance

A PCI Express Root Complex is typically implemented with local buffering to temporarily store a finite number of PM_PME messages that could potentially be simultaneously propagating through the PCI Express Hierarchy at any given time.
Given the limited number of PM_PME messages that can be stored within the Root Complex, backpressure can be applied to the upstream-directed posted queue if the capacity of this temporary PM_PME message buffer is exceeded.

Deadlock can occur according to the following example scenario:
• Incoming PM_PME messages fill the Root Complex's temporary storage to its full capacity while additional PM_PME messages are still in the Hierarchy making their way upstream.
• The Root Complex, on behalf of system software, issues a split configuration read request targeting one of the PME requesters' PMCSR (e.g., reading its PME_Status bit).
• The corresponding split completion Packet is required, as per producer/consumer ordering rules, to push all previously posted PM_PME messages out ahead of it, which in this case are PM_PME messages that have no place to go.
• The PME service routine cannot make progress, and the PM_PME message storage situation does not improve.
• Deadlock occurs.

Precluding potential deadlocks requires the Root Complex to always enable forward progress under these circumstances. This must be done by accepting any PM_PME messages that posted queue flow control credits allow for, and discarding any PM_PME messages that create an overflow condition. This required behavior ensures that no deadlock will occur in these cases; however, PM_PME messages will be discarded and hence lost in the process.

To ensure that no PM_PME messages are lost permanently, all agents that are capable of generating PM_PME must implement a PME Service Timeout mechanism to ensure that their PME requests are serviced in a reasonable amount of time.

If after 100 ms (+50%/-5%) the PME_Status bit of a requesting agent has not yet been cleared, the PME Service Timeout mechanism expires, triggering the PME requesting agent


to re-send the temporarily lost PM_PME message. If at this time the Link is in a non-communicating state, then prior to re-sending the PM_PME message the agent must reactivate the Link as defined in Section 6.3.3.2.

6.3.3.4. PME Rules

• All PCI Express components supporting PCI Express-PM must implement the PCI-PM PMC and PMCSR registers in accordance with the PCI-PM specification. These registers reside in the PCI-PM compliant PCI Capability List format.
• PME capable functions must implement the PME_Status bit, and underlying functional behavior, in their PMCSR configuration register.
• When a function initiates Link reactivation, or issues a PM_PME Message, it must set its PME_Status bit.
• Switches must route a PM_PME received on any Downstream Port to their Upstream Port.
• PME capable agents must comply with the PME_Turn_Off and PME_TO_Ack fence protocols.
• Before a Link or a portion of the Hierarchy is transferred into a non-communicating state (i.e., a state from which it cannot issue a PM_PME Message), a PME_Turn_Off Message must be broadcast Downstream.

6.3.3.5. PM_PME Delivery State Machine

The following diagram conceptually outlines the PM_PME delivery control state machine. This state machine determines the ability of a Link to service PME events by issuing PM_PME immediately versus requiring initial Link reactivation.


[Figure 6-4: A Conceptual PME Control State Machine (states: Communicating, Non-Communicating, PME Sent, Link Reactivation)]

Communicating State
At initial power-up, the Upstream Link enters the "Communicating" state.
• If PME_Status is asserted (assuming PME delivery is enabled), a PM_PME Message will be issued Upstream, terminating at the root of the PCI Express Hierarchy. The next state is the "PME Sent" state.
• If a PME_Turn_Off Message is received, the Link enters the "Non-Communicating" state following its acknowledgment of the message and subsequent entry into the L2/L3 Ready state.

Non-Communicating State
• If a Power Good signal transitions from inactive to active (an indication that power and clock have been restored), the next state is the "Communicating" state.
• If PME_Status is asserted, the Link will transition to the "Link Reactivation" state and activate the wake mechanism.


PME Sent State
• If PME_Status is cleared, the function becomes PME capable again. The next state is the "Communicating" state.
• If the PME_Status bit is not cleared by the time the PME service timeout expires, a PM_PME message is re-sent upstream. See Section 6.3.3.3.1 for an explanation of the timeout mechanism.
• If a PME message has been issued but PME_Status has not been cleared by software when the Link is about to be transitioned into a messaging-incapable state (a PME_Turn_Off Message is received), the Link transitions into the "Link Reactivation" state after sending a PME_TO_Ack message. The device also activates the wake mechanism.

Link Reactivation State
• If a Power Good signal transitions from inactive to active, the Link resumes a transaction-capable state. The device clears the wake signaling, issues a PM_PME Upstream, and transitions into the "PME Sent" state.
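The state machine of Figure 6-4 and the overflow and timeout rules of Section 6.3.3.3.1 can be tied together in a small Python sketch. It is a conceptual model under stated assumptions, not a hardware description: events are plain strings, the passage of roughly 100 ms is abstracted into a `"timeout"` event, and the Root Complex buffer discards on overflow instead of applying backpressure. All class and event names are hypothetical.

```python
from enum import Enum, auto

# PME Service Timeout: 100 ms with +50% / -5% tolerance, i.e. 95 ms to 150 ms.
PME_TIMEOUT_MIN_MS = 95.0
PME_TIMEOUT_MAX_MS = 150.0


def timeout_in_spec(t_ms):
    return PME_TIMEOUT_MIN_MS <= t_ms <= PME_TIMEOUT_MAX_MS


class State(Enum):
    COMMUNICATING = auto()
    NON_COMMUNICATING = auto()
    PME_SENT = auto()
    LINK_REACTIVATION = auto()


class PmeAgent:
    """Conceptual model of Figure 6-4; event strings are illustrative."""

    def __init__(self):
        self.state = State.COMMUNICATING  # state entered at initial power-up
        self.sent = []                    # messages issued upstream, in order
        self.wake = False                 # wake signaling (in-band or out-of-band)

    def handle(self, event):
        s = self.state
        if s is State.COMMUNICATING:
            if event == "pme_status_set":
                self.sent.append("PM_PME")
                self.state = State.PME_SENT
            elif event == "turn_off":
                self.sent.append("PME_TO_Ack")  # then the Link enters L2/L3 Ready
                self.state = State.NON_COMMUNICATING
        elif s is State.NON_COMMUNICATING:
            if event == "power_good":
                self.state = State.COMMUNICATING
            elif event == "pme_status_set":
                self.wake = True
                self.state = State.LINK_REACTIVATION
        elif s is State.PME_SENT:
            if event == "pme_status_cleared":
                self.state = State.COMMUNICATING  # function is PME capable again
            elif event == "timeout":
                self.sent.append("PM_PME")        # re-send a possibly discarded PM_PME
            elif event == "turn_off":
                self.sent.append("PME_TO_Ack")
                self.wake = True
                self.state = State.LINK_REACTIVATION
        elif s is State.LINK_REACTIVATION:
            if event == "power_good":
                self.wake = False
                self.sent.append("PM_PME")
                self.state = State.PME_SENT
        return self.state


class RootComplexPmeBuffer:
    """Finite PM_PME storage that discards on overflow instead of back-pressuring."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.stored = []
        self.discarded = 0

    def offer(self, msg):
        if len(self.stored) < self.capacity:
            self.stored.append(msg)
            return True
        self.discarded += 1  # lost for now; the agent's timeout will recover it
        return False

    def service_one(self):
        return self.stored.pop(0)  # software handles the oldest PM_PME


rc = RootComplexPmeBuffer(capacity=1)
rc.offer("PM_PME from another agent")      # buffer now full
agent = PmeAgent()
agent.handle("pme_status_set")
print(rc.offer(agent.sent[-1]))            # False: discarded on overflow
rc.service_one()                           # software frees a slot
agent.handle("timeout")                    # PME_Status still set after the timeout
print(rc.offer(agent.sent[-1]))            # True: re-sent PM_PME is accepted
print(agent.handle("pme_status_cleared"))  # State.COMMUNICATING
```

The usage lines walk through the recovery path the text describes: the Root Complex drops a PM_PME to preserve forward progress, and the requester's service timeout later re-delivers it.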


Implementation Note: PCI Express-to-PCI Bridge PME Considerations

PCI Express-to-PCI Bridges must "bridge" power management events from the original wired-OR PME# signal connections to the PCI Express in-band PME messaging scheme. PCI Express-to-PCI Bridges are required to identify all PME messages that they issue on behalf of downstream legacy PCI functions as coming from the PCI bus segment where the PME# originated.

A design consideration that must be comprehended is the potential for lost PME indications. This particular issue is unique to PCI Express-to-PCI Bridges, where the level-sensitive PME# signal is transformed into what is effectively an edge-triggered PME messaging scheme, and it manifests itself as a race condition. The corner case corresponds to the situation where one of the legacy PCI components asserts PME# (which now must be input into the bridge, and not routed around it as in PCI-PM PME# routing). Following this, the PCI Express-to-PCI Bridge injects a PME message on behalf of the legacy agent. If another legacy PME# assertion occurs (on the same PME# input to the bridge) before the original PME# service routine has cleared the PME_Status bit of the original PME# initiator, then, given the wired-OR nature of the PCI-PM PME#, the PME# input at the bridge will remain asserted following the clearing of the first agent's PME_Status bit.

The net result is that the first PM_PME was serviced successfully (the single PM_PME message propagated upstream facilitated this); however, the second PM_PME was lost.
In order to avoid loss of PM_PMEs in the conversion of the level-triggered PCI PME# to the edge-triggered PCI Express PM_PME message, the PCI PME# signal must be periodically polled, and a PCI Express PM_PME message must be generated if PCI PME# is sensed asserted. While this scheme introduces the possibility of spurious PM_PMEs, these are deemed benign and would be ignored by the operating system.

It is the responsibility of the PCI Express-to-PCI Bridge to engage in the PME fence protocol on behalf of its downstream PCI devices. The PME_Turn_Off message will terminate at the PCI Express-to-PCI Bridge and will not be communicated to the downstream PCI devices. The PCI Express-to-PCI Bridge will not issue a PM_PME message on behalf of a downstream PCI device while its upstream Link is in the L2 non-communicating state.
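The polling requirement can be illustrated with a short Python sketch of the race described in this note. It assumes a hypothetical `BridgePmePoller` that is called on every poll tick with the current wired-OR level of PME#; the poll interval, the class name and the descriptor tuple are illustrative, not defined by this specification.

```python
class BridgePmePoller:
    """Hypothetical PCI Express-to-PCI Bridge logic that samples PME# periodically."""

    def __init__(self, secondary_bus):
        self.secondary_bus = secondary_bus  # PM_PME is attributed to this bus segment
        self.upstream_link_in_l2 = False    # no PM_PME is issued while the Link is in L2

    def poll(self, pme_asserted):
        """Call on every poll tick; return a PM_PME descriptor or None."""
        if not pme_asserted or self.upstream_link_in_l2:
            return None
        # Level-to-message conversion: re-issue on every tick while PME# stays
        # asserted, accepting spurious duplicates to avoid losing a PME.
        return ("PM_PME", self.secondary_bus)


# PME# is wired-OR: it stays asserted while any device's PME_Status is set.
pme_status = {"dev_a": True, "dev_b": False}
bridge = BridgePmePoller(secondary_bus=3)
messages = [bridge.poll(any(pme_status.values()))]  # dev_a's PME is reported
pme_status["dev_b"] = True    # dev_b asserts PME# before dev_a is serviced
pme_status["dev_a"] = False   # software clears dev_a's PME_Status
messages.append(bridge.poll(any(pme_status.values())))
print(messages)  # [('PM_PME', 3), ('PM_PME', 3)]: dev_b's PME is not lost
```

A purely edge-triggered converter would see no new edge in this scenario, because PME# never deasserts between the two assertions; sampling the level on each tick is what recovers the second event.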


6.4. Native PCI Express Power Management Mechanisms

The following sections define power management features that require new software. While the presence of these features in new PCI Express designs will not break legacy software compatibility, taking full advantage of them requires new code to manage them.

These features are enumerated and configured using PCI Express native configuration mechanisms as described in Chapter 5 of this specification. Refer to Chapter 5 for specific register locations, bit assignments, and access mechanisms associated with these PCI Express-PM features.

6.4.1. Active-State Power Management

All PCI Express components are required to support the minimum requirements defined herein for Active State Link PM. This feature must be treated as being orthogonal to the PCI-PM Software compatible features from a minimum requirements perspective. For example, the Root Complex is exempt from the PCI-PM Software compatible features requirements; however, it must implement Active State Link PM's minimum requirements.

Components in the D0 state (i.e., fully active state) normally keep their Upstream Link in the active L0 state, as defined in Section 6.3.2. Active State Link Power Management defines a protocol for components in the D0 state to reduce Link power by placing their Upstream Links into a low power state and instructing the other end of the Link to do likewise.
This capability allows hardware-autonomous, dynamic Link power reduction beyond what is achievable by software-only controlled (i.e., PCI-PM Software driven) power management.

Two low power "standby" Link states are defined for Active State Link Power Management. The L0s low power Link state is optimized for short entry and exit latencies, while providing substantial power savings. If the L0s state is enabled in a device, it is required to bring any transmit Link into the L0s state whenever that Link is not in use (refer to Section 6.4.1.1.1 for details relating to the L0s invocation policy). All PCI Express components must support the L0s Link state from within the D0 device state.

The L1 Link state is optimized for maximum power savings at a cost of longer entry and exit latencies. L1 reduces Link power beyond the L0s state for cases where very low power is required and longer transition times are acceptable. Active State Link PM support for the L1 Link state is optional.

Each PCI Express component must report its level of support for Active State Link Power Management in the Active State Link PM Support configuration field.

Each PCI Express component shall also report its L0s and L1 exit latencies (the time it requires to transition from the L0s or L1 state to the L0 state). Endpoints must also report the worst-case latency that they can withstand before risking, for example, internal FIFO overruns due to the transition latency from L0s or L1 to the L0 state. Power management software can then use this information to enable the appropriate level of Active State Link Power Management.
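This section leaves the enabling decision to power management software. The Python sketch below shows one plausible policy for a single Link, not an algorithm from this specification: it assumes exit latencies have already been decoded into nanoseconds, and, as a conservative simplification, judges each state by the slower of the two Link ends. The function and parameter names are hypothetical.

```python
def choose_aspm_level(l0s_exit_ns, l1_exit_ns, acceptable_l0s_ns, acceptable_l1_ns,
                      l1_supported=(True, True)):
    """Hypothetical software policy for a single Link.

    l0s_exit_ns, l1_exit_ns: exit latencies reported by the two ends (a pair each).
    acceptable_*_ns: latencies the Endpoint reports it can withstand.
    l1_supported: whether each end supports Active State Link PM L1.
    """
    # Conservative simplification: judge each state by the slower end of the Link.
    if max(l0s_exit_ns) > acceptable_l0s_ns:
        return "disabled"
    if all(l1_supported) and max(l1_exit_ns) <= acceptable_l1_ns:
        return "L0s and L1"
    return "L0s"


print(choose_aspm_level((256, 512), (2000, 4000), 1000, 8000))  # L0s and L1
print(choose_aspm_level((256, 512), (2000, 4000), 1000, 3000))  # L0s
```

The three possible results mirror the disabled, L0s-only, and L0s-and-L1 settings discussed for the Active State Link PM Control field.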


The L0s exit latency may differ significantly depending on whether the reference clock for opposing sides of a given Link is provided from the same source or delivered to each component from a different source. PCI Express-PM software informs each PCI Express device of its clock configuration via the "common clock configuration" bit in its PCI Express Capability Structure's Link Control Register. This bit serves as the determining factor in the L0s exit latency value reported by the device. All PCI Express devices power on with Active State Link Power Management turned off by default. Software can enable Active State Link Power Management using the process described in Section 6.4.1.3.1.

Power management software enables (or disables) Active State Link Power Management in each Port of a component by programming the Active State Link PM Control field. Note that new BIOS code can effectively enable or disable Active State Link PM functionality even when running with a legacy operating system.

Implementation Note: Isochronous Traffic and Active State Link Power Management

Isochronous traffic requires bounded service latency. Active State Link Power Management may add latency to isochronous transactions beyond expected limits. A possible solution would be to disable Active State Link Power Management for devices that are configured with an Isochronous Virtual Channel.

Multi-function Endpoints may be programmed with different values in the Active_PM_En registers of their functions.
The policy for such a component is dictated by the lowest common denominator among all D0 functions, according to the following rules:
• Functions in a non-D0 state (D1 and deeper) are ignored in determining the Active State Link Power Management policy.
• If any of the D0 functions has its Active State Link Power Management disabled (Active State Link PM Control field = 00b), then Active State Link Power Management is disabled for the entire component.
• Else, if at least one of the D0 functions is enabled for L0s only (Active State Link PM Control field = 01b), then Active State Link Power Management is enabled for L0s only.
• Else, Active State Link Power Management is enabled for both the L0s and L1 states.

Note that components must be capable of changing their behavior during runtime as devices enter and exit low power device states. For example, if one function within a multi-function component is programmed to disable Active State Link Power Management, then Active State Link Power Management will be disabled for that component while that function is in the D0 state. Once the function transitions to a non-D0 state, Active State Link Power Management will be enabled to at least the L0s state if all other functions are enabled for Active State Link PM.
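The combining rules above amount to taking the most restrictive setting among the functions currently in D0. A minimal Python sketch follows. The 00b and 01b encodings come from the text; using 11b for "L0s and L1" is an assumption to confirm against Chapter 5, and returning `None` when no function is in D0 is a placeholder for a case the rules do not address.

```python
from enum import IntEnum


class AspmCtl(IntEnum):
    # Active State Link PM Control encodings. 00b and 01b are given in the text;
    # 11b for "L0s and L1" is an assumption to confirm against Chapter 5.
    DISABLED = 0b00
    L0S_ONLY = 0b01
    L0S_AND_L1 = 0b11


def component_aspm_policy(functions):
    """functions: iterable of (device_state, AspmCtl), e.g. ("D0", AspmCtl.L0S_ONLY).

    Returns the component-wide setting, or None when no function is in D0,
    a case the rules above do not address.
    """
    d0_settings = [ctl for state, ctl in functions if state == "D0"]  # non-D0 ignored
    if not d0_settings:
        return None
    if AspmCtl.DISABLED in d0_settings:
        return AspmCtl.DISABLED
    if AspmCtl.L0S_ONLY in d0_settings:
        return AspmCtl.L0S_ONLY
    return AspmCtl.L0S_AND_L1


fns = [("D0", AspmCtl.L0S_AND_L1), ("D0", AspmCtl.DISABLED)]
print(component_aspm_policy(fns).name)  # DISABLED
fns[1] = ("D3hot", AspmCtl.DISABLED)    # the restrictive function leaves D0
print(component_aspm_policy(fns).name)  # L0S_AND_L1
```

Re-evaluating the function whenever any function changes device state gives the runtime behavior the final paragraph requires.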


6.4.1.1. L0s Active State Link Power Management State

All PCI Express devices must support the L0s low power Link state. All components power up to a default state where Active State Link PM is disabled.

Transaction Layer and Link Layer timers are not affected by a transition to the L0s state (i.e., they must follow the rules as defined in their respective chapters).

Implementation Note: Minimizing L0s Exit Latency

L0s exit latency depends mainly on the ability of the receiver to quickly acquire bit and symbol synchronization. Different approaches exist for high-frequency clocking solutions, which may differ significantly in their L0s exit latency, and therefore in the efficiency of Active State Link Power Management. To achieve maximum power savings efficiency with Active State Link Power Management, L0s exit latency should be kept low by proper selection of the clocking solution.

6.4.1.1.1. Entry to L0s State

Entry into the L0s state is managed separately for each direction of the Link. It is the responsibility of each device at either end of the Link to initiate an entry into the L0s state on its transmitting Lanes.

A Port that is disabled for the L0s state must not transition its transmitting Lanes to the L0s state. It must, however, still be able to tolerate having its receiver Port Lanes enter L0s (as a result of the device at the other end bringing its transmitting Lanes into the L0s state) and then later return to the L0 state.

L0s Invocation Policy

PCI Express ports that are enabled for L0s entry must transition their transmit Lanes to the L0s state if the defined idle conditions are met for a specified period of time.
The port may choose this period of time, t, to be anywhere within the range:

(port's reported L0s exit latency)/4 ≤ t ≤ (port's reported L0s exit latency)

Defining the invocation time as a range enables the tuning of ASPM behavior, balancing power savings with performance.

Definition of Idle

The definition of "idle" varies with device category.

An Endpoint Port or Root Complex Root Port is determined to be idle if the following conditions are met:
• No TLP is pending to transmit over the Link, or no FC credits are available to transmit anything
• No ACK, NAK, or ACK Timeout DLLPs are pending for transmission
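The invocation window and the Endpoint/Root Port idle test can be expressed as a short Python sketch. Latencies are assumed to be in nanoseconds, and `aggressiveness` is a hypothetical tuning knob for where in the permitted window a port places its timer; neither is defined by this specification.

```python
from dataclasses import dataclass


def l0s_entry_window(exit_latency_ns):
    """Permitted idle time t before L0s entry: exit_latency/4 <= t <= exit_latency."""
    return (exit_latency_ns / 4, exit_latency_ns)


def choose_invocation_time(exit_latency_ns, aggressiveness):
    """Tune within the window: aggressiveness 1.0 picks the shortest t (most savings)."""
    lo, hi = l0s_entry_window(exit_latency_ns)
    return hi - aggressiveness * (hi - lo)


@dataclass
class TxStatus:
    tlp_pending: bool
    fc_credits_available: bool
    ack_nak_dllp_pending: bool  # ACK, NAK, or ACK Timeout DLLPs


def endpoint_or_root_port_idle(s):
    """Idle definition for an Endpoint Port or a Root Complex Root Port."""
    nothing_sendable = not s.tlp_pending or not s.fc_credits_available
    return nothing_sendable and not s.ack_nak_dllp_pending


def should_enter_l0s(l0s_enabled, idle_for_ns, invocation_ns):
    return l0s_enabled and idle_for_ns >= invocation_ns


t = choose_invocation_time(1000, aggressiveness=1.0)  # 250.0
print(should_enter_l0s(True, 300, t))                 # True
```

Note that a pending TLP with no FC credits still counts as idle, since nothing can actually be transmitted.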


A Switch's upstream Port is determined to be idle if the following conditions are met:
• All of the Switch's Downstream Port receive Lanes are in the L0s state
• No TLPs are pending to transmit, or no FC credits are available to transmit anything
• No ACK, NAK, or ACK Timeout DLLPs are pending for transmission

A Switch's downstream Port is determined to be idle if the following conditions are met:
• The Switch's upstream Port's receive Lanes are in the L0s state
• No TLPs are pending to transmit on this Link, or no FC credits are available
• No ACK, NAK, or ACK Timeout DLLPs are pending for transmission

See Section 4.2 for details on L0s entry by the Physical Layer.

6.4.1.1.2. Exit from L0s State

Components at either end of a PCI Express Link may initiate an exit from the L0s low power Link state.

Note that a transition from the L0s Link state should never depend on the status (or availability) of FC credits. The Link must be able to reach the Link Active state and exchange FC credits across the Link. For example, if all credits of some type were consumed when the Link entered L0s, then any component on either side of the Link must still be able to transition the Link to the L0 state, where new credits can be sent across the Link.

Downstream Initiated Exit

An Endpoint or Switch is permitted to initiate an exit from the L0s low power state on its transmit Link (Upstream Port transmit Lanes in the case of a downstream Switch) if it needs to communicate through the Link.
The component initiates a transition to the L0 state on Lanes in the upstream direction as described in Section 4.2.

If the Upstream component is a Switch (i.e., it is not the Root Complex), then it must initiate a transition on its Upstream Port transmit Lanes (if the Upstream Port's transmit Lanes are in a low power state) as soon as it detects an exit from L0s on any of its downstream ports.

Upstream Initiated Exit

The Root Complex or Switch (Downstream Port) is permitted to initiate an exit from the L0s low power state on any of its transmit Links if it needs to communicate through the Link. The component initiates a transition to the L0 state on Lanes in the downstream direction as described in Chapter 4.

If the Downstream component is a Switch (i.e., it is not an Endpoint), it must initiate a transition on all of its Downstream Port transmit Lanes that are in L0s at that time as soon as it detects an exit from L0s on its Upstream Port. Links that are already in the L0 state do not participate in the exit transition. Links whose downstream component is in a low power state (i.e., D1-D3hot states) are also not affected by the exit transitions.


For example, consider a Switch with an upstream Port in L0s and a downstream device in the D1 state. A configuration request packet travels downstream to the Switch, intended ultimately to reprogram the downstream device from D1 to D0. The Switch's upstream Port Link will transition to the L0 state to allow the packet to reach the Switch. The downstream Link connecting to the device in the D1 state will not transition to the L0 state yet; it will remain in the L1 state. The captured packet is checked and routed to the downstream Port that shares a Link with the downstream device that is in D1. As described in Section 4.2, the Switch now transitions the downstream Link to the L0 state. Note that the transition to the L0 state was triggered by the packet being routed to that particular downstream L1 Link, and not by the transition of the upstream Port's Link into the L0 state. If the packet had been targeting a different downstream Link, then that particular downstream Link would have remained in the L1 state.

6.4.1.2. L1 Active State Link Power Management State

A component may optionally support the Active State Link PM L1 state, a state that provides greater power savings at the expense of longer exit latency. L1 exit latency is visible to software and is reported via the configuration status register defined in Section 5.6.

When supported, L1 entry is disabled by default in the Active State Link PM Control configuration field.

6.4.1.2.1. Entry to L1 State

An Endpoint enabled for L1 Active State Link PM entry may initiate entry into the L1 Link state.

Implementation Note: Initiating L1

This specification does not dictate when an Endpoint must initiate a transition to the L1 state on its transmit Lanes.
The interoperable mechanisms for transitioning into and out of L1 are defined within this specification; however, the specific Active State Link PM policy governing when to transition into L1 is left to the implementer.

One possible approach would be for the downstream device to initiate a transition to the L1 state once the Link has been in the L0s state for a set amount of time.
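The "possible approach" suggested above can be sketched as a dwell timer in Python. This is only one implementer's policy under stated assumptions: `dwell_ns` is a hypothetical tuning parameter with no value given by this specification, and Link states are represented as plain strings.

```python
class L1EntryHeuristic:
    """One possible Endpoint policy: request L1 after dwell_ns of continuous L0s.

    The specification leaves this policy to the implementer; dwell_ns is a
    hypothetical tuning parameter, not a specified value.
    """

    def __init__(self, l1_enabled, dwell_ns):
        self.l1_enabled = l1_enabled
        self.dwell_ns = dwell_ns
        self.l0s_since_ns = None  # time the transmit Lanes last entered L0s

    def on_link_state(self, state, now_ns):
        if state == "L0s":
            if self.l0s_since_ns is None:
                self.l0s_since_ns = now_ns
        else:
            self.l0s_since_ns = None  # any return to L0 restarts the dwell timer

    def should_request_l1(self, now_ns):
        return (self.l1_enabled
                and self.l0s_since_ns is not None
                and now_ns - self.l0s_since_ns >= self.dwell_ns)


h = L1EntryHeuristic(l1_enabled=True, dwell_ns=10_000)
h.on_link_state("L0s", now_ns=0)
print(h.should_request_l1(5_000), h.should_request_l1(10_000))  # False True
```

When `should_request_l1` returns True, the Endpoint would begin the negotiation described next by sending PM_Active_State_Request_L1.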


Three power management messages provide support for Active State Link Power Management of the L1 state:
• PM_Active_State_Request_L1 (DLLP)
• PM_Request_Ack (DLLP)
• PM_Active_State_Nak (TLP)

Endpoints that have their Active State Link PM L1 entry enabled negotiate for the resultant L-state with the component on the upstream end of the Link. If the Endpoint receives a negative acknowledgement in response to its issuance of a PM_Active_State_Request_L1 DLLP, then the Endpoint must enter the L0s state as soon as possible.[36] Note that the component on the upstream side of the Link may not support L1 Active State Link PM, or it may be disabled, and so it could legitimately respond to the L1 entry request with a negative acknowledgement.

A Root Complex Root Port or Switch Downstream Port must accept a request to enter a low power L1 state if all of the following conditions are true:
• The Port supports Active State Link PM L1 entry, and Active State Link PM L1 entry is enabled.
• No TLP is scheduled for transmission.
• No Ack or Nak DLLP is scheduled for transmission.

A Switch Upstream Port may request L1 entry on its Link provided all of the following conditions are true for an implementation-specific set amount of time:
• The Upstream Port supports Active State Link PM L1 entry and it is enabled
• All of the Switch's Downstream Port Links are in the L1 state (or deeper)
• No pending TLPs to transmit
• No pending ACK, NAK, or ACK Timeout DLLPs to transmit
• The Upstream Port's receive Lanes are idle

If the Switch's upstream Port receives a negative acknowledgement in response to its issuance of a PM_Active_State_Request_L1 DLLP, then the Switch's upstream Port transmit Lanes must instead transition to the L0s state as soon as possible.[37]

Note that it is legitimate for a Switch to be enabled for the Active State Link PM L1 Link state on any of its downstream ports and to be disabled or not even supportive of Active State Link PM L1 on its
upstream Port. In that case, downstream ports may enter the L1 Link state, but the Switch will never initiate an Active State Link PM L1 entry transition on its upstream Port.

[36] Assuming that the conditions for L0s entry are met.
[37] Assuming that the conditions for L0s entry are met.
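The two condition lists above translate directly into predicates. In the minimal Python sketch below, `must_accept_l1` only reports when acceptance is mandatory; when it returns False the text does not force acceptance, and the Port may reject the request. The set of Link states treated as "L1 or deeper" and all names are assumptions for illustration, and `required_ns` stands for the implementation-specific hold time.

```python
from dataclasses import dataclass

# Link states treated as "L1 or deeper" here; this set is an assumption.
L1_OR_DEEPER = {"L1", "L2/L3 Ready", "L2", "L3"}


@dataclass
class DownstreamPortView:
    """What a Root Port or Switch Downstream Port checks on a PM_Active_State_Request_L1."""
    l1_supported: bool
    l1_enabled: bool
    tlp_scheduled: bool
    ack_nak_scheduled: bool


def must_accept_l1(p):
    """True when every acceptance condition holds, so the Port must accept."""
    return p.l1_supported and p.l1_enabled and not (p.tlp_scheduled or p.ack_nak_scheduled)


def switch_upstream_may_request_l1(l1_supported_and_enabled, downstream_link_states,
                                   tlp_pending, dllp_pending, rx_idle,
                                   held_for_ns, required_ns):
    """Switch Upstream Port request conditions; required_ns is implementation specific."""
    conditions = (l1_supported_and_enabled
                  and all(s in L1_OR_DEEPER for s in downstream_link_states)
                  and not tlp_pending and not dllp_pending and rx_idle)
    return conditions and held_for_ns >= required_ns


print(must_accept_l1(DownstreamPortView(True, True, False, False)))  # True
print(switch_upstream_may_request_l1(True, ["L1", "L0s"], False, False, True,
                                     1_000, 500))                     # False
```

The second call fails because one downstream Link is only in L0s, which illustrates why a Switch requests L1 upstream only after its whole sub-hierarchy has settled into L1 or deeper.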


Active State Link PM L1 Negotiation Rules (see Figure 6-5 and Figure 6-6)

• Upon deciding to enter a low power Link state, the downstream component must block scheduling of any TLPs (including completion packets).
• The downstream component must wait until it receives a Link layer acknowledgement for the last TLP it had previously sent. The component may retransmit a TLP if required by the Link layer rules.
• The downstream component must also wait until it accumulates at least the minimum number of credits required to send the largest possible packet for any FC type. Note that this is required so that the component can immediately issue a TLP after it exits the L1 state.
• The downstream component then initiates Active State Link PM negotiation by sending a PM_Active_State_Request_L1 DLLP onto its transmit Lanes. The downstream component sends this DLLP continuously until it receives a response from the upstream device (see below). The downstream component remains in this loop waiting for a response from the Upstream Agent.
  o During this waiting period, the downstream component must not initiate any Transaction Layer transfers. It must still accept TLPs and DLLPs from the upstream component. It also responds with DLLPs as needed by the Link layer protocol.
  o If the downstream component for any reason needs to initiate a transfer on the Link, it must first complete the transition to the low power Link state. Once in a lower power Link state, the downstream component is then permitted to exit the low power Link state to handle the transfer.
• The Upstream component must immediately respond to the request with either an acceptance or a rejection of the request.

Rules in case of rejection:
• In the case of a rejection, the upstream component must schedule, as soon as possible, a rejection (NAK) by sending the PM_Active_State_Nak Message to the downstream requesting agent.
Once the PM_Active_State_Nak Message is sent, the upstreamcomponent is permitted to initiate any TLP or DLLP transfers.• If the request was rejected, the downstream component must immediately transition itstransmit Lanes into the L0s state, provided that conditions for L0s entry are met.Rules in case of acceptance:• If the upstream agent is ready to accept the request, it must block scheduling of anyTLPs.• The upstream component then must wait until it receives a Link layer acknowledgementfor the last TLP it had previously sent. The upstream component may retransmit a TLPif required by the Link layer rules.322


• The upstream component must also wait until it accumulates at least the minimum number of credits required to send the largest possible packet for any FC type. This is required so that the component can issue a TLP immediately after it exits the L1 state.
• Once all TLPs have been acknowledged and enough FC credits have been accumulated, the upstream component sends a PM_Request_Ack DLLP downstream. The upstream component sends this DLLP continuously until it observes its receive Lanes enter the electrical idle state. See Chapter 4 for more details on the physical layer behavior.
• If the upstream component needs, for any reason, to initiate a transfer on the Link after it sends a PM_Request_Ack DLLP, it must first complete the transition to the low power state. It is then permitted to exit the low power state to handle the transfer once the Link is back in L0.
• When the downstream component detects a PM_Request_Ack DLLP on its receive Lanes (signaling that the upstream component acknowledged the request to transition to L1), the downstream component ceases sending the PM_Active_State_Request_L1 DLLP, disables its Link layer, and brings its transmit Lanes into the electrical idle state.
• When the upstream component detects electrical idle on its receive Lanes (signaling that the downstream component has entered the L1 state), it ceases sending the PM_Request_Ack DLLP, disables its Link layer, and brings the downstream direction of the Link into the electrical idle state.

Notes:
1. The Transaction Layer Completion Timeout mechanism is not affected by the transition to the L1 state (i.e., it must keep counting).
2. Flow Control Update timers are frozen while the Link is in the L1 state to prevent a timer expiration that would unnecessarily transition the Link back to the L0 state.
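The negotiation rules above can be modeled as a short event trace. The following Python sketch is illustrative only: the `Side` class, the `negotiate_l1` function, and the event strings are hypothetical names, readiness is reduced to two booleans (last TLP acknowledged, enough FC credits for the largest packet of every FC type), and a component that is not yet ready is shown as waiting rather than looping.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Side:
    """Hypothetical per-component state relevant to L1 entry."""
    last_tlp_acked: bool    # Link layer acknowledged the last TLP sent
    has_max_credits: bool   # credits for the largest packet of every FC type

    def ready_for_l1(self) -> bool:
        return self.last_tlp_acked and self.has_max_credits


def negotiate_l1(down: Side, up: Side, up_accepts: bool,
                 l0s_conditions_met: bool = True) -> list[str]:
    """Return the ordered events of one Active State Link PM L1 negotiation."""
    events = ["down: block TLP scheduling"]
    if not down.ready_for_l1():
        # A real component keeps waiting (and may retransmit); the sketch stops here.
        return events + ["down: waiting for Ack/credits"]
    events.append("down: send PM_Active_State_Request_L1 (repeated)")

    if not up_accepts:
        events.append("up: send PM_Active_State_Nak Message")
        # Downstream transmit Lanes enter L0s only if the L0s entry conditions hold
        # (staying in L0 otherwise is an assumption of this sketch).
        events.append("down: transmit Lanes -> L0s" if l0s_conditions_met
                      else "down: transmit Lanes stay in L0")
        return events

    events.append("up: block TLP scheduling")
    if not up.ready_for_l1():
        return events + ["up: waiting for Ack/credits"]
    events += [
        "up: send PM_Request_Ack (repeated)",
        "down: stop request, disable Link layer, transmit Lanes -> electrical idle",
        "up: sees electrical idle, stop PM_Request_Ack, disable Link layer, "
        "transmit Lanes -> electrical idle",
        "Link in L1",
    ]
    return events


for line in negotiate_l1(Side(True, True), Side(True, True), up_accepts=True):
    print(line)
```

Keeping the readiness checks on both sides mirrors the symmetry of the rules: each component blocks new TLPs, drains acknowledgements, and gathers credits before it commits to L1.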


Figure 6-5: L1 Transition Sequence Ending with a Rejection (the downstream component blocks new TLPs, receives the acknowledgment for its last TLP, and sends PM_Active_State_Request_L1 DLLPs; the upstream component answers with a PM_Active_State_Nak Message, and the downstream component enters L0s)

Figure 6-6: L1 Successful Transition Sequence (after the same request, the upstream component blocks new TLPs, receives the acknowledgment for its last TLP, and sends PM_Request_Ack DLLPs; the downstream component then takes the upstream direction to electrical idle, followed by the upstream component taking the downstream direction to electrical idle)


Figure 6-7: Example of L1 Exit Latency Computation (a Root connected through Link 1 to Switch A, through Link 2 to Switch B, and through Link 3 to Endpoint C; the Ports are annotated with L1 exit latencies of 8 µs or 32 µs)

Switches are not required to initiate an L1 exit transition on any of their other Downstream Port Links.

Upstream Initiated Exit

A Root Complex or a Switch is permitted to initiate an exit from L1 on any of its Root Ports or Downstream Port Links if it needs to communicate through that Link. The component initiates a transition to the L0 state as described in Chapter 4. The downstream component must respond by initiating a similar transition on its transmit Lanes.

If the downstream component is a Switch (i.e., it is not an Endpoint), it must initiate a transition on all of its Downstream Links (assuming the Downstream Link is in an Active State Link Power Management L1 state) as soon as it detects an exit from the L1 state on its Upstream Port Link. Since L1 exit latencies are relatively long, a Switch must not wait until its Upstream Port Link has fully exited to L0 before initiating an L1 exit transition on its Downstream Port Links. If it did, a message traveling through multiple PCI Express Switches would experience latency that accumulates as it traverses each Switch.

A Switch is required to initiate a transition from the L1 state on all of its Downstream Port Links that are currently in L1 no more than 1 µs after the beginning of a transition from the L1 state on its Upstream Port. Refer to Section 4.2 for details of the Physical Layer signaling during L1 exit. Downstream Port Links that are already in the L0 state do not participate in the exit transition. Downstream Port Links whose downstream component is in a low power D-state (D1-D3hot) are also not affected by the L1 exit transitions (i.e., such Links must not be transitioned to the L0 state).

6.4.1.3. Active State Link PM Configuration

All PCI Express functions must implement the following configuration bits in support of Active State Link PM. Refer to Chapter 5 for configuration register assignment and access mechanisms.

Each PCI Express component reports its level of support for Active State Link Power Management in the Active State Link PM Support configuration field below. All PCI Express components must support transition to the L0s Link state. Support for transition to the L1 Link state while in the D0 active state is optional.

Table 6-3: Encoding of the Active State Link PM Support Field
Field:          Active State Link PM Support
Read/Write:     RO
Default Value:  must be 01b or 11b
Description:    00b – Reserved
                01b – L0s supported
                10b – Reserved
                11b – L0s and L1 supported

Each PCI Express component reports the source of its reference clock in the “Slot Clock Configuration” bit located in the Link Status Register of its PCI Express Capability Structure.

Table 6-4: Description of the Slot Clock Configuration Field
Field:          Slot Clock Configuration
Read/Write:     RO
Default Value:  HWInit
Description:    This bit indicates that the component uses the same physical reference clock that the platform provides on the connector. If the device uses an independent clock irrespective of the presence of a reference on the connector, this bit must be clear. For Root and Switch downstream ports, this bit, when set, indicates that the downstream port is using the same reference clock as the downstream device or the slot. For Switch and bridge upstream ports, this bit, when set, indicates that the upstream port is using the same reference clock that the platform provides.
                Otherwise, it is clear.

Each PCI Express component must support the Common Clock Configuration bit in the Link Control Register of its PCI Express Capability Structure. Software writes to this register bit to indicate to the device whether it is sharing the same clock source as the device on the other end of the Link.

Table 6-5: Description of the Common Clock Configuration Field
Field:          Common Clock Configuration
Read/Write:     RW
Default Value:  0
Description:    This bit, when set, indicates that this component and the component at the opposite end of the Link are operating with a common clock source. A value of 0 indicates that this component and the component at the opposite end of the Link are operating with separate reference clock sources. The default value of this field is 0. Components use this common clock configuration information to report the correct L0s and L1 Exit Latencies.

Each PCI Express component reports its L0s and L1 exit latencies (the time it requires to transition its transmit Lanes from the L0s or L1 state to the L0 state) in the L0s Exit Latency and L1 Exit Latency configuration fields, respectively.

Table 6-6: Encoding of the L0s Exit Latency Field
Field:          L0s Exit Latency
Read/Write:     RO
Default Value:  N/A
Description:    000b – Less than 64 ns
                001b – 64 ns-128 ns
                010b – 128 ns-256 ns
                011b – 256 ns-512 ns
                100b – 512 ns-1 µs
                101b – 1 µs-2 µs
                110b – 2 µs-4 µs
                111b – Reserved
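As an illustration of Table 6-6, the Python sketch below decodes a 3-bit L0s Exit Latency field into its latency range. The dictionary and function names are hypothetical; the sketch models "Less than 64 ns" as the range 0 to 64 ns and rejects the Reserved encoding. Because the reported value depends on the Common Clock Configuration setting (Table 6-5), software would read this field only after that bit has been programmed.

```python
from __future__ import annotations

# Table 6-6: L0s Exit Latency encodings as (low, high) bounds in nanoseconds.
# 000b ("Less than 64 ns") is modeled as 0..64 ns; 111b is Reserved.
L0S_EXIT_LATENCY_NS = {
    0b000: (0, 64),
    0b001: (64, 128),
    0b010: (128, 256),
    0b011: (256, 512),
    0b100: (512, 1_000),
    0b101: (1_000, 2_000),
    0b110: (2_000, 4_000),
}


def decode_l0s_exit_latency(field: int) -> tuple[int, int]:
    """Return the (low, high) L0s exit latency range in ns for a 3-bit field."""
    if not 0 <= field <= 0b111:
        raise ValueError("field must be a 3-bit value")
    if field == 0b111:
        raise ValueError("111b is Reserved")
    return L0S_EXIT_LATENCY_NS[field]


print(decode_l0s_exit_latency(0b011))  # (256, 512)
```

A conservative power management policy would typically use the upper bound of the range when comparing against what an Endpoint can tolerate.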


Table 6-7: Encoding of the L1 Exit Latency Field
Field:          L1 Exit Latency
Read/Write:     RO
Default Value:  N/A
Description:    000b – Less than 1 µs
                001b – 1 µs-2 µs
                010b – 2 µs-4 µs
                011b – 4 µs-8 µs
                100b – 8 µs-16 µs
                101b – 16 µs-32 µs
                110b – 32 µs-64 µs
                111b – L1 transition not supported

Endpoints also report the additional latency that they can absorb due to the transition from the L0s state or L1 state to the L0 state. This is reported in the Endpoint L0s Acceptable Latency and Endpoint L1 Acceptable Latency fields, respectively.

Power management software, using the latency information reported by all components in the PCI Express hierarchy, can enable the appropriate level of Active State Link Power Management by comparing the exit latency for each given path from the Root to an Endpoint against the acceptable latency that the corresponding Endpoint can withstand.

Table 6-8: Encoding of the Endpoint L0s Acceptable Latency Field
Field:          Endpoint L0s Acceptable Latency
Read/Write:     RO
Default Value:  N/A
Description:    000b – Less than 64 ns
                001b – 64 ns-128 ns
                010b – 128 ns-256 ns
                011b – 256 ns-512 ns
                100b – 512 ns-1 µs
                101b – 1 µs-2 µs
                110b – 2 µs-4 µs
                111b – More than 4 µs
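Because a Switch must begin L1 exit on its Downstream Port Links within 1 µs of an exit starting on its Upstream Port, the Links along a path exit L1 largely in parallel. The Python sketch below models this under that assumption: the worst-case L1 exit latency from the Root to an Endpoint is taken as the largest L1 Exit Latency upper bound (Table 6-7) among the Ports on the path, plus 1 µs for each Switch crossed. The names are hypothetical, and simply adding the per-Link values, as the software flow in Section 6.4.1.3.1 describes, would give a more conservative bound.

```python
from __future__ import annotations

# Table 6-7: upper bound (in µs) of each L1 Exit Latency encoding.
# 111b ("L1 transition not supported") is represented by None.
L1_EXIT_UPPER_US = {0b000: 1, 0b001: 2, 0b010: 4, 0b011: 8,
                    0b100: 16, 0b101: 32, 0b110: 64, 0b111: None}

SWITCH_EXIT_PROPAGATION_US = 1  # maximum Switch delay before propagating L1 exit


def path_l1_exit_us(links: list[tuple[int, int]]) -> int | None:
    """Worst-case L1 exit latency (µs) from the Root to an Endpoint.

    `links` lists the Links from the Root down to the Endpoint; each entry holds
    the L1 Exit Latency codes of the Link's two Ports. Every Link after the
    first is reached through one Switch. Returns None if any Port lacks L1.
    """
    if not links:
        raise ValueError("a path has at least one Link")
    slowest = 0
    for code_up, code_down in links:
        a, b = L1_EXIT_UPPER_US[code_up], L1_EXIT_UPPER_US[code_down]
        if a is None or b is None:
            return None
        # A Link is usable only after both of its Ports have left L1.
        slowest = max(slowest, a, b)
    switches = len(links) - 1
    return slowest + switches * SWITCH_EXIT_PROPAGATION_US


# Hypothetical three-Link path (Root -> Switch A -> Switch B -> Endpoint).
print(path_l1_exit_us([(0b101, 0b011), (0b011, 0b101), (0b101, 0b011)]))  # 34
```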


Table 6-9: Encoding of the Endpoint L1 Acceptable Latency Field
Field:          Endpoint L1 Acceptable Latency
Read/Write:     RO
Default Value:  N/A
Description:    000b – Less than 1 µs
                001b – 1 µs-2 µs
                010b – 2 µs-4 µs
                011b – 4 µs-8 µs
                100b – 8 µs-16 µs
                101b – 16 µs-32 µs
                110b – 32 µs-64 µs
                111b – More than 64 µs

Power management software enables (or disables) Active State Link Power Management in each component by programming the Active State Link PM Control field.

Table 6-10: Encoding of the Active State Link PM Control Field
Field:          Active State Link PM Control
Read/Write:     R/W
Default Value:  00b
Description:    00b – Disabled
                01b – L0s Entry Enabled
                10b – Reserved
                11b – L0s and L1 Entry Enabled

Active State Link PM Control = 00b
• The Port must not bring a Link into the L0s state.
• Ports connected to the Downstream end of the Link must not issue a PM_Active_State_Request_L1 DLLP on their Upstream Link.
• Ports connected to the Upstream end of the Link that receive an L1 request must respond with a negative acknowledgement.

Active State Link PM Control = 01b
• The Port must bring a Link into the L0s state if all conditions are met.
• Ports connected to the Downstream end of the Link must not issue a PM_Active_State_Request_L1 DLLP on their Upstream Link.
• Ports connected to the Upstream end of the Link that receive an L1 request must respond with a negative acknowledgement.


Active State Link PM Control = 11b
• The Port must bring a Link into the L0s state if all conditions are met.
• Ports connected to the Downstream end of the Link may issue PM_Active_State_Request_L1 DLLPs.
• Ports connected to the Upstream end of the Link must respond with a positive acknowledgement to an L1 request and transition into L1 if the conditions for a Root Complex Root Port or Switch downstream Port in Section 6.4.1.2.1 are met.

6.4.1.3.1. Software Flow for Enabling Active State Link Power Management

The following is an example software algorithm that highlights how to enable Active State Link Power Management in a PCI Express component.

• PCI Express components power up with Active State Link Power Management disabled.
• PCI Express components power up with an appropriate value in their “Slot Clock Configuration” bit. The method by which they initialize this bit is device-specific.
• PCI Express system software scans the “Slot Clock Configuration” bit in the components on both ends of each Link to determine whether both use the same reference clock source or reference clocks from separate sources. If the “Slot Clock Configuration” bits in both devices are set, they both use the same reference clock source; otherwise, they do not.
• PCI Express software updates the “Common Clock Configuration” bits in the components on both ends of each Link to indicate whether those devices share the same reference clock.
• Devices must reflect the appropriate L0s/L1 exit latency in their “L0s/L1 Exit Latency” register bits, according to the setting of the “Common Clock Configuration” bit.
• PCI Express system software then reads and adds up the L0s/L1 exit latency numbers from all components on a given PCI Express hierarchy reaching up to each Endpoint component.
• For each Endpoint component, PCI Express system software examines the “Endpoint L0s/L1 Acceptable Latency,” as reported by the Endpoint in its Link Capabilities register, and accordingly enables (or leaves disabled) L0s/L1 entry, via the Active State Link PM Control bits in the Link Control register, in some or all of the intervening device ports on that hierarchy.
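The software flow above can be condensed into two decisions: whether a Link shares a reference clock, and which Active State Link PM Control value to program. The Python sketch below is one possible reading, not a required algorithm. It uses hypothetical function names, compares path exit latencies against the upper bound of each Endpoint Acceptable Latency encoding (Tables 6-8 and 6-9), treats 111b ("More than ...") as no limit, and makes a single decision for the whole path rather than per Port. Because 10b is Reserved in Table 6-10, L1 can only be enabled together with L0s.

```python
from __future__ import annotations

# Upper bounds (ns) of the Endpoint Acceptable Latency encodings, indexed by
# the 3-bit code; 111b ("More than ...") is treated as "no limit" (None).
L0S_ACCEPTABLE_NS = [64, 128, 256, 512, 1_000, 2_000, 4_000, None]              # Table 6-8
L1_ACCEPTABLE_NS = [1_000, 2_000, 4_000, 8_000, 16_000, 32_000, 64_000, None]   # Table 6-9

ASPM_DISABLED, ASPM_L0S, ASPM_L0S_L1 = 0b00, 0b01, 0b11  # Table 6-10


def common_clock(slot_clock_a: bool, slot_clock_b: bool) -> bool:
    """Value to write into Common Clock Configuration on both ends of a Link."""
    # Both Slot Clock Configuration bits set => same reference clock source.
    return slot_clock_a and slot_clock_b


def fits(path_ns: int, limit_ns: int | None) -> bool:
    return limit_ns is None or path_ns <= limit_ns


def choose_aspm_control(support: int, l0s_path_ns: int, l1_path_ns: int | None,
                        l0s_acceptable: int, l1_acceptable: int) -> int:
    """Pick an Active State Link PM Control value for the Ports on one path.

    `support` is the Active State Link PM Support field (01b or 11b, Table 6-3);
    `l1_path_ns` is None when some Port on the path does not support L1.
    """
    if not fits(l0s_path_ns, L0S_ACCEPTABLE_NS[l0s_acceptable]):
        return ASPM_DISABLED
    l1_ok = (support == 0b11 and l1_path_ns is not None
             and fits(l1_path_ns, L1_ACCEPTABLE_NS[l1_acceptable]))
    return ASPM_L0S_L1 if l1_ok else ASPM_L0S
```

For example, with a 34 µs L1 path latency and an Endpoint that accepts up to 64 µs, `choose_aspm_control(0b11, 300, 34_000, 0b011, 0b110)` returns `0b11`; if that Endpoint only accepted up to 16 µs (`0b100`), the result would drop to `0b01`.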


6.5. Auxiliary Power Support

6.5.1. Auxiliary Power Enabling

The PCI-PM specification requires that a function support PME generation in order to consume the maximum allowance of auxiliary current (375 mA vs. 20 mA). However, there are instances where functions need to consume power even if they are "PME Disabled" or PME incapable by design. One example is a component with its system management mode active during a system low power state.

PCI Express PM provides a new control bit, “Aux_En,” that provides the means for enabling a function to draw the maximum allowance of auxiliary current independent of its level of support for PME generation.

A PCI Express function requests aux power allocation by specifying a non-zero value in the Aux_Current field of the Power Management Capabilities Register (PMC). Refer to Chapter 5 for the Aux_En register bit assignment and access mechanism.

Legacy PCI-PM software is unaware of this new bit and will only be able to enable aux current to a given function based on the function’s reported PME support, the Aux_Current field value, and the function’s PME_Enable bit.

Allocation of aux power using Aux_En is determined as follows:

Aux_En = 1b:
Aux power is allocated as requested in the Aux_Current field of the Power Management Capabilities Register (PMC), independent of the PME_En bit in the Power Management Control/Status Register (PMCSR). The PME_En bit still controls the ability to master PME.

Aux_En = 0b:
Aux power allocation is controlled by the PME_En bit as defined in the PCI-PM specification.

The Aux_En bit is sticky, meaning that its state is not affected by transitions from the D3cold to the D0 Uninitialized state.
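The allocation rules above reduce to a small decision. In the Python sketch below, the function names are hypothetical, the 375 mA and 20 mA figures are taken from the comparison at the start of Section 6.5.1, and the 20 mA value is assumed to be the allowance of a function whose aux current request has not been granted. PME mastering is still governed by PME_En alone, so it is modeled separately.

```python
MAX_AUX_MA = 375      # maximum allowance of auxiliary current (Section 6.5.1)
BASELINE_AUX_MA = 20  # assumed allowance when the request is not granted


def aux_allowance_ma(aux_current_ma: int, aux_en: bool, pme_en: bool) -> int:
    """Aux current (mA) a function may draw.

    aux_current_ma: current requested through the Aux_Current field of PMC.
    aux_en:         PCI Express Aux_En bit.
    pme_en:         PME_En bit in PMCSR.
    """
    # Aux_En = 1b: allocate as requested, independent of PME_En.
    # Aux_En = 0b: allocation follows PME_En, as in the PCI-PM specification.
    granted = aux_current_ma > 0 and (aux_en or pme_en)
    return min(aux_current_ma, MAX_AUX_MA) if granted else BASELINE_AUX_MA


def may_signal_pme(pme_en: bool) -> bool:
    # PME_En still controls the ability to master PME, whatever Aux_En says.
    return pme_en
```

Separating the two functions reflects the point of Aux_En: it decouples the power budget from PME capability without changing who may signal PME.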


6.6. Power Management System Messages and DLLPs

Table 6-11 defines the location of each PM packet in the PCI Express stack.

Table 6-11: Power Management System Messages and DLLPs
Packet                        Type
PM_Enter_L1                   DLLP
PM_Enter_L23                  DLLP
PM_Active_State_Request_L1    DLLP
PM_Request_Ack                DLLP
PM_Active_State_Nak           Transaction Layer message
PM_PME                        Transaction Layer message
PME_Turn_Off                  Transaction Layer message
PME_TO_Ack                    Transaction Layer message

6.6.1. Power Management System Messages

Power management messages follow the general rules for PCI Express system messages. Message fields follow these rules:

• The Length field is reserved.
• The Attribute field must be set to the default values (all 0’s).
• The Address field is reserved.
• Requester ID
   o PM_PME Message
      • Endpoints report their upstream Link bus number and the device and function number where the PME originated.
      • PCI Express to PCI Bridges: When the PME comes from a legacy agent on a PCI bus downstream, the PM_PME Message Requester ID reports the legacy bus number where the PME originated, and the reported device and function numbers must both be zero. When the PCI Express-PCI bridge initiates an internal PME message (e.g., when a hot plug event arrives while the SHPC is in a non-D0 state), the Requester ID carries the bus number associated with that function, together with the device and function numbers of the internal function that needs to be awakened (the SHPC function in this example). In the example depicted in Figure 6-8, a PME is generated by the SHPC function; the Requester ID in the PM_PME message contains bus = 7, device = 1, function = 1.


   o All other messages report their upstream Link bus number, and the device and function numbers must both be zero.
• The Virtual Channel field must use the default virtual channel (VC0).

Figure 6-8: Example of PME Message Addressing in a PCI Express-to-PCI Bridge (a 3GIO-PCI legacy bridge whose upstream P2P bridge port is Device 0, Function 0 on Bus 6; on internal Bus 7, Function 0 of Device 0 and Function 0 of Device 1 are downstream P2P bridge ports for Bus 8 and Bus 9, Function 1 of Device 1 is the SHPC, and Bus 8 and Bus 9 each have a wire-OR'd PME# signal)

6.6.2. Power Management DLLPs

For information on the structure of the power management DLLPs, refer to Section 3.4.
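The Requester ID rules in Section 6.6.1 can be illustrated with the 16-bit Requester ID layout (bus number in bits 15:8, device number in bits 7:3, function number in bits 2:0). The Python helpers below are hypothetical; they encode the PM_PME cases described above and reproduce the Figure 6-8 example (bus 7, device 1, function 1).

```python
def requester_id(bus: int, device: int, function: int) -> int:
    """Pack bus/device/function numbers into a 16-bit Requester ID."""
    if not (0 <= bus <= 255 and 0 <= device <= 31 and 0 <= function <= 7):
        raise ValueError("bus, device, or function out of range")
    return (bus << 8) | (device << 3) | function


def pm_pme_id_endpoint(upstream_bus: int, device: int, function: int) -> int:
    # Endpoint: upstream Link bus number plus the originating device/function.
    return requester_id(upstream_bus, device, function)


def pm_pme_id_legacy_pci(legacy_bus: int) -> int:
    # PME from a legacy agent behind a PCI Express-to-PCI Bridge:
    # the legacy bus number, with device and function both zero.
    return requester_id(legacy_bus, 0, 0)


def pm_pme_id_bridge_internal(bridge_bus: int, device: int, function: int) -> int:
    # PME raised by an internal bridge function (e.g., the SHPC).
    return requester_id(bridge_bus, device, function)


print(hex(pm_pme_id_bridge_internal(7, 1, 1)))  # 0x709
```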


7. PCI Express System Architecture

This chapter addresses various aspects of the PCI Express interconnect architecture in a platform context. It covers the details of interrupt support, error signaling and logging, VCs, isochronous support, Hot Plug, and Lock.

7.1. Interrupt Support

The PCI Express interrupt model supports two mechanisms:
• INTx emulation
• Message Signaled Interrupt (MSI) support

For legacy compatibility, PCI Express provides a PCI INTx emulation mechanism to signal interrupts to the system interrupt controller (typically part of the system core logic). This mechanism is compatible with existing PCI software, provides the same level and type of service as the corresponding PCI interrupt signaling mechanism, and is independent of system interrupt controller specifics. This legacy compatibility mechanism allows boot device support without requiring complex BIOS-level interrupt configuration/control service stacks. It virtualizes PCI physical interrupt signals by using an in-band signaling mechanism.

In addition to PCI INTx compatible interrupt emulation, PCI Express requires support of the Message Signaled Interrupt (MSI) mechanism. The PCI Express MSI mechanism is compatible with the MSI capability defined in the PCI 2.3 Specification.

7.1.1. Rationale for PCI Express Interrupt Model

PCI Express takes an evolutionary approach from PCI with respect to interrupt support.

As required for PCI/PCI-X interrupt mechanisms, each device is required to differentiate between INTx (legacy) and MSI (native) modes of operation. The PCI Express device complexity required to support both schemes is no different than that for PCI/PCI-X devices today. The advantages of this approach include:
• Compatibility with existing PCI software models
• Direct support for boot devices
• Easier End of Life (EOL) for INTx legacy mechanisms

The existing software model is used to differentiate legacy (INTx) vs. MSI modes of operation; thus, no special software support is required for PCI Express.


7.1.2. PCI Compatible INTx Emulation

PCI Express supports the PCI interrupts as defined in the PCI Specification, Rev. 2.3, including the Interrupt Pin and Interrupt Line registers of the PCI configuration space for PCI devices. PCI Express devices support these registers for backwards compatibility; however, interrupts are asserted using in-band messages in the form of Transaction Layer Packets (TLPs) rather than by asserting physical pins.

PCI Express defines two message transactions, Assert_INTx and Deassert_INTx, for emulation of PCI INTx signaling, where x is A, B, C, or D for the respective PCI interrupt signals. These messages are routed to the Root Complex, where the Requester ID information (included in all Requester packets) enables flexibility in mapping device interrupts to the system interrupt controller. PCI Express devices must use assert/de-assert messages in pairs to emulate PCI level-triggered interrupt signaling. The actual mapping of PCI Express INTx emulation to system interrupts is implementation specific, as is the mapping of physical interrupt signals in PCI today.

The legacy INTx emulation mechanism may be deprecated in a future version of this specification.

7.1.3. INTx Emulation Software Model

The software model for legacy INTx emulation matches that of PCI.
The system BIOS reporting of chipset/platform interrupt mapping and the association of a device’s interrupt with PCI interrupt lines are handled in exactly the same manner as with previous PCI systems. Legacy software reads the device’s Interrupt Pin register to determine whether the device is interrupt driven. A value between 01h and 04h indicates that the device uses an interrupt pin (INTA# through INTD#, respectively) to generate an interrupt.

Note that, as with physical interrupt signals, the INTx emulation mechanism may cause spurious interrupts that must be handled by the system software.

7.1.4. Message Signaled Interrupt (MSI) Support

The Message Signaled Interrupt (MSI) capability is defined in the PCI 2.3 Specification. MSI interrupt support, which is optional for PCI 2.3 devices, is required for PCI Express devices. MSI-capable devices deliver interrupts by performing memory write transactions. MSI is an edge-triggered interrupt; interrupt sharing is prohibited when using MSI. Neither the PCI 2.3 Specification nor this specification supports level-triggered MSI interrupts.

Note that, unlike INTx emulation messages, MSIs are not restricted to TC0.
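The pairing rule in Section 7.1.2 means a device sends Assert_INTx when its internal interrupt condition becomes pending and Deassert_INTx when the condition clears, so the Root Complex can track a virtual level-sensitive wire. The Python sketch below models the device side; the class name is hypothetical, and the mapping of Interrupt Pin values 01h to 04h onto INTA# to INTD# is the standard PCI encoding.

```python
from __future__ import annotations

# Interrupt Pin register value -> virtual INTx wire (00h means no INTx).
INTERRUPT_PIN_TO_WIRE = {1: "A", 2: "B", 3: "C", 4: "D"}


class IntxEmulator:
    """Emits Assert_INTx/Deassert_INTx Messages in pairs for one device."""

    def __init__(self, interrupt_pin: int) -> None:
        if interrupt_pin not in INTERRUPT_PIN_TO_WIRE:
            raise ValueError("Interrupt Pin must be 01h-04h to use INTx")
        self.wire = INTERRUPT_PIN_TO_WIRE[interrupt_pin]
        self.asserted = False
        self.sent: list[str] = []   # Messages that would be sent upstream

    def update(self, condition_pending: bool) -> None:
        """Call whenever the device's internal interrupt condition may change."""
        if condition_pending and not self.asserted:
            self.sent.append(f"Assert_INT{self.wire}")
            self.asserted = True
        elif not condition_pending and self.asserted:
            self.sent.append(f"Deassert_INT{self.wire}")
            self.asserted = False
        # No Message when the level is unchanged, so Messages stay paired.


dev = IntxEmulator(interrupt_pin=1)
for pending in (True, True, False, False, True):
    dev.update(pending)
print(dev.sent)  # ['Assert_INTA', 'Deassert_INTA', 'Assert_INTA']
```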


7.1.5. MSI Software Model

It is the system software’s responsibility to ensure that multiple MSI-capable devices cannot generate the same interrupt message.

It is implementation specific whether an interrupt message is accepted or lost when an interrupt with the same interrupt message vector is already in service. Additional interrupt messages with the same vector may be lost, depending on the specifics of the core logic (i.e., chipset) and system interrupt controller implementation.

MSI-capable devices that require servicing of every interrupt message must not generate multiple outstanding interrupt messages with the same vector. These devices must not generate another interrupt message until the device driver indicates that the previous interrupt message of the same vector was serviced. The device driver might indicate that an interrupt is serviced by reading the device’s interrupt status register.

For particular usage models, it might be acceptable to generate a new MSI message without requiring the device driver to acknowledge the previous interrupt message. A common example is a timer (e.g., timer interrupts). There is no guarantee that all the interrupt messages from such a device will be serviced. If all interrupt events must be recognized in a deterministic manner, devices that are sources of interrupts must not generate successive MSIs without an explicit acknowledgement that each MSI has been detected and serviced. This explicit acknowledgement is typically supported by interrupt handler software reading or writing a particular internal status or control register of the interrupting device. Details of this “handshake” mechanism, such as whether a read or a write is used as the synchronizing operation and the location and type of address space (Memory, I/O, or Configuration space) of the control/status register, are implementation specific.
Note, however, that for the purpose of supporting synchronization between hardware (interrupt source) and software (interrupt handler), it is recommended to use memory-mapped register locations.

Certain PCI devices and their drivers rely on INTx-type level-triggered interrupt behavior (addressed by the PCI Express legacy INTx emulation mechanism). These devices and their drivers must be redesigned to take advantage of the MSI capability and edge-triggered interrupt semantics.

7.1.6. PME Support

PCI Express supports power management events from native PCI Express devices as well as from PME-capable PCI devices.

PME signaling is accomplished using an in-band Transaction Layer PME message (PM_PME) as described in Chapter 6.
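For a device that must have every interrupt serviced, the rule in Section 7.1.5 amounts to a per-vector gate: once an MSI for a vector is outstanding, further events on that vector are remembered but not signaled until the driver acknowledges the previous one (for example, by reading a status register). The Python sketch below is hypothetical; `driver_ack` stands in for whatever implementation-specific handshake the device uses, and `sent` stands in for the MSI memory writes.

```python
from __future__ import annotations


class MsiGate:
    """At most one outstanding MSI per vector until the driver acknowledges it."""

    def __init__(self) -> None:
        self.outstanding: set[int] = set()  # vectors signaled but not yet serviced
        self.pending: set[int] = set()      # events that arrived while outstanding
        self.sent: list[int] = []           # stands in for MSI memory writes

    def event(self, vector: int) -> None:
        if vector in self.outstanding:
            self.pending.add(vector)        # do not send a second MSI yet
        else:
            self._send(vector)

    def driver_ack(self, vector: int) -> None:
        """Driver indicates the previous MSI on this vector was serviced."""
        self.outstanding.discard(vector)
        if vector in self.pending:
            self.pending.discard(vector)
            self._send(vector)

    def _send(self, vector: int) -> None:
        self.outstanding.add(vector)
        self.sent.append(vector)
```

In this sketch, repeated events on one vector while an MSI is outstanding collapse into a single follow-up MSI after the acknowledgement; the device's own status register is assumed to record how many events actually occurred.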


7.1.7. PME Software Model

From a software standpoint, PME behaves like an edge-triggered interrupt. This differs from the level-triggered PME mechanism used for PCI. However, this does not impact operating system software compatibility, because PME reporting to the operating system is abstracted by the ACPI BIOS. Current ACPI-compatible operating systems support both edge-triggered and level-triggered modes for PME.

To signal PME, a PCI Express device generates a PME message on the PCI Express Link. System power management logic, which is typically part of the core logic (chipset), receives this PME message and asserts an ACPI General Purpose Event (GPE) corresponding to the PME message. ACPI ASL code may use the PCI Express Requester ID in the PM_PME message to inform the operating system which device caused the wake.

Note that the PCI Express architecture guarantees that the PME message is delivered reliably, since PCI Express Messages are communicated using TLPs. For more details on PME signaling, see Chapter 6 (Power Management) of this specification.

7.1.8. PME Routing Between PCI Express and PCI Hierarchies

PME-capable PCI devices assert the PME# pin to signal a power management event. The physical PME signal from PCI devices may either be converted to a PCI Express in-band PME message by a PCI Express-PCI bridge or routed directly to a GPE pin on the core-logic chipset. Delivery of PME signaling from PCI devices is implementation specific, as it is in PCI-based systems today.
When converting from PCI level-triggered PME signaling to edge-triggered PCI Express PME messages, care must be taken not to lose any PMEs from PCI devices. Such a conversion mechanism may also result in spurious PMEs being generated.

7.2. Error Signaling and Logging

This document identifies the errors that must be checked and the errors that may optionally be checked. Each such error is associated either with the Port or with a specific device (or function in a multi-function device), and this association is given along with the description of the error. This section discusses how errors are classified and reported.

7.2.1. Scope

This section explains the error signaling and logging requirements for PCI Express components. This includes errors that occur on the PCI Express interface itself and errors that occur on behalf of transactions initiated on PCI Express. This section does not focus on errors that occur within the component and are unrelated to a particular PCI Express transaction. This type of error signaling is better handled through proprietary methods employing device-specific interrupts.


PCI Express defines two error reporting paradigms: the baseline capability and the Advanced Error Reporting capability. The baseline error reporting capabilities are required of all PCI Express devices and define the minimum error reporting requirements. The Advanced Error Reporting capability is defined for more robust error reporting and is implemented with a specific PCI Express capability structure (refer to Chapter 5 for a definition of this optional capability). This section explicitly calls out all error handling differences between the baseline and the Advanced Error Reporting capability.

All PCI Express devices support existing, non-PCI Express-aware software for error handling by mapping PCI Express errors to existing PCI reporting mechanisms, in addition to the PCI Express-specific mechanisms.

7.2.2. Error Classification

PCI Express errors can be classified into two types: Uncorrectable errors and Correctable errors. This classification separates the errors that result in functional failure from those that result in degraded performance. Uncorrectable errors can further be classified as Fatal or Non-Fatal.

Figure 7-1: Error Classification (Correctable Errors and Uncorrectable Errors, with the labels ERR_COR, ERR_UNC, and ERR_FATAL and error sources in the Physical (PHY), Data Link (DLNK), and Transaction (TXN) layers)

Classification of error severity as Fatal, Uncorrectable, and Correctable provides the platform with mechanisms for mapping the error to a suitable handling mechanism. For example, the platform might choose to respond to correctable errors with low-priority performance-monitoring software. Such software could count the frequency of correctable errors and provide Link integrity information.
On the other hand, a platform designer might choose to map fatal errors to a system-wide reset. It is the decision of the platform designer to map these PCI Express severity levels onto platform-level severities.
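The classification in Section 7.2.2 and its subsections can be summarized as two questions: did hardware recover without loss of information, and, if not, is the Link still otherwise reliable? The Python sketch below encodes that decision. The enum, the function, and especially `EXAMPLE_POLICY` are hypothetical; the specification leaves the mapping onto platform-level handling to the platform designer, and the policy shown only echoes the examples given in the text.

```python
from enum import Enum


class Severity(Enum):
    CORRECTABLE = "correctable"
    NON_FATAL = "uncorrectable, non-fatal"
    FATAL = "uncorrectable, fatal"


def classify(recovered_without_loss: bool, link_still_reliable: bool) -> Severity:
    """Classify a PCI Express error by the criteria of Section 7.2.2."""
    if recovered_without_loss:      # e.g., LCRC error fixed by Data Link Level Retry
        return Severity.CORRECTABLE
    if link_still_reliable:         # only the transaction in error is affected
        return Severity.NON_FATAL
    return Severity.FATAL           # Link and related hardware are unreliable


# Hypothetical platform policy; the mapping is the platform designer's choice.
EXAMPLE_POLICY = {
    Severity.CORRECTABLE: "count in low-priority monitoring software",
    Severity.NON_FATAL: "let system software recover without resetting the Link",
    Severity.FATAL: "reset (possibly system-wide)",
}

print(EXAMPLE_POLICY[classify(True, True)])
```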


7.2.2.1. Correctable Errors

Correctable errors include those error conditions where the PCI Express protocol can recover without any loss of information. Hardware corrects these errors and software intervention is not required. For example, an LCRC error in a TLP that is corrected by Data Link Level Retry is considered a correctable error. Logging the frequency of correctable errors may be helpful for profiling the integrity of a Link.

7.2.2.2. Uncorrectable Errors

Uncorrectable errors are those error conditions that impact functionality of the interface. There is no PCI Express mechanism defined in this specification to correct these errors. For more robust error handling by the system, PCI Express further classifies uncorrectable errors as Fatal and Non-fatal.

7.2.2.2.1. Fatal Errors

Fatal errors are uncorrectable error conditions which render the particular PCI Express Link and related hardware unreliable. For fatal errors, a reset of the Link may be required to return to reliable operation. Platform handling of fatal errors, and any efforts to limit the effects of these errors, is platform implementation specific.

Compared with PCI/PCI-X, reporting a fatal error is somewhat analogous to asserting SERR#.

7.2.2.2.2. Non-Fatal Errors

Non-fatal errors are uncorrectable errors which cause a particular transaction to be unreliable while the Link is otherwise fully functional. Isolating non-fatal from fatal errors provides system management software the opportunity to recover from the error without resetting the Link(s) and disturbing other transactions in progress. Devices not associated with the transaction in error are not impacted by the error.

7.2.3. Error Signaling

There are two complementary mechanisms in PCI Express which allow the agent detecting an error to alert the system or the initiating device that an error has occurred. The first mechanism is the Completion Status, and the second is in-band error Messages.

Note that it is the responsibility of the agent detecting the error to signal the error appropriately.

Section 7.2.5 enumerates all the errors and how the hardware is required to respond when the error is detected.


7.2.3.1. Completion Status

The Completion Status field in the Completion header indicates when the associated Request failed (refer to Section 2.7.5). This is the only method of error reporting in PCI Express which enables the Requester to associate an error with a specific Request. In other words, since Non-Posted Requests are not considered complete until after the Completion returns, the Completion Status field gives the initiator an opportunity to "fix" the problem at some higher level protocol (outside the scope of this specification). For example, if a Read is issued to prefetchable memory space and the Completion returns with an Unsupported Request Completion Status, perhaps due to a temporary condition, the initiator may choose to reissue the Read Request without side effects. Note that from a PCI Express point of view, the reissued Read Request is a distinct Request, and there is no relationship (on PCI Express) between the first Request and the reissued Request.

7.2.3.2. Error Messages

Error Messages are sent to the Root Complex to report the detection of errors according to the severity of the error.

When multiple errors of the same severity are detected, the corresponding error Messages may be merged. At least one error Message must be sent for the detected errors of each severity level.

Table 7-1: Error Messages
Error Message | Description
ERR_COR | This Message is issued when the component or device detects a correctable error on the PCI Express interface. Refer to Section 7.2.2.1 for the definition of a correctable error.
ERR_NONFATAL | This Message is issued when the component or device detects a non-fatal, uncorrectable error on the PCI Express interface. Refer to Section 7.2.2.2.2 for the definition of a non-fatal, uncorrectable error.
ERR_FATAL | This Message is issued when the component or device detects a fatal, uncorrectable error on the PCI Express interface. Refer to Section 7.2.2.2.1 for the definition of a fatal, uncorrectable error.

For these Messages, the Root Complex identifies the initiator of the Message by the Requester ID in the Message header. The Root Complex translates these error Messages into platform-level events.

7.2.3.2.1. Uncorrectable Error Severity Programming (Advanced Error Reporting)

For devices implementing the Advanced Error Reporting capability, the Uncorrectable Errors Severity register allows each uncorrectable error to be programmed to Fatal or Non-Fatal. Uncorrectable errors are not recoverable using defined PCI Express mechanisms. However, some platforms or devices might consider a particular error fatal to a Link or device while another platform considers that error non-fatal. The default value of the Uncorrectable Errors Severity register serves as a starting point for this specification, but the register can be reprogrammed if the device driver or platform software requires more robust error handling.

Baseline error handling does not support severity programming.

7.2.3.2.2. Masking Error Messages

Section 7.2.5 lists all the errors governed by this specification and enumerates when each of the above error Messages is issued. For devices implementing the Advanced Error Reporting capability, each of the errors is captured in the Uncorrectable Error Status register or the Correctable Error Status register. The Uncorrectable Errors Mask register and Correctable Errors Mask register allow each error condition to be masked independently. For devices that do not implement the Advanced Error Reporting capability, errors are reported with the Error Status field of the Link Status register and masked with the Error Control field of the Link Command register (see Section 5.6).

When an error is masked, it is still logged, but the error reporting Message is not sent to the Root Complex. Errors masked with the Link Command register are masked independent of the bit settings in the Uncorrectable Errors Mask register and Correctable Errors Mask register.

7.2.3.2.3. Error Pollution

Error pollution can occur if error conditions for a given transaction are not isolated to the error's first occurrence. For example, assume the Physical Layer detects a Receiver Error. This error is detected at the Physical Layer and an error is reported to the Root Complex. To avoid having this error propagate and cause subsequent errors at upper layers (for example, a TLP error at the Data Link Layer), making it more difficult to determine the root cause of the error, subsequent errors which occur for the same packet will not be signaled at the Data Link or Transaction layers.
Similarly, when the Data Link layer detects an error, subsequent errors which occur for the same packet will not be signaled at the Transaction layer. This behavior applies only to errors which are associated with a particular packet; other errors are reported for each occurrence.
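Taken together, the rules of Sections 7.2.3.2 through 7.2.3.2.3 decide which error Messages a device sends for a batch of newly detected errors. The following C sketch illustrates them under stated assumptions: one bit per error in plain 32-bit words (not the actual Chapter 5 register layouts), a hypothetical `link_enable` word standing in for the Error Control field of the Link Command register, and hypothetical function names `signaling_layer` and `error_messages`.

```c
#include <stdint.h>

/* Error Messages of Table 7-1, one bit each in a message set. */
#define MSG_ERR_COR      0x1u
#define MSG_ERR_NONFATAL 0x2u
#define MSG_ERR_FATAL    0x4u
#define MSG_ALL          (MSG_ERR_COR | MSG_ERR_NONFATAL | MSG_ERR_FATAL)

/* Layers that can flag an error on the same packet, lowest layer first. */
#define LAYER_PHY 0x1u
#define LAYER_DLL 0x2u
#define LAYER_TL  0x4u

/* Error pollution (7.2.3.2.3): for errors tied to one packet, only the lowest
 * layer that detected an error signals it; upper-layer errors are suppressed. */
unsigned signaling_layer(unsigned detecting_layers)
{
    return detecting_layers & (~detecting_layers + 1u);  /* isolate lowest set bit */
}

/* Decide which Messages to send for one batch of newly detected errors.
 * cor_detected / unc_detected: one bit per error (illustrative layout). The
 *   caller is assumed to have already set the status bits, because masked
 *   errors are still logged.
 * cor_mask / unc_mask: 1 = error masked, no Message.
 * unc_severity: 1 = Fatal, 0 = Non-Fatal (fixed defaults on baseline devices).
 * link_enable: hypothetical per-Message enables standing in for the Link
 *   Command register's Error Control field, applied regardless of the masks. */
unsigned error_messages(uint32_t cor_detected, uint32_t cor_mask,
                        uint32_t unc_detected, uint32_t unc_mask,
                        uint32_t unc_severity, unsigned link_enable)
{
    uint32_t cor = cor_detected & ~cor_mask;
    uint32_t unc = unc_detected & ~unc_mask;
    unsigned msgs = 0;

    /* Errors of the same severity merge into a single Message. */
    if (cor)
        msgs |= MSG_ERR_COR;
    if (unc & unc_severity)
        msgs |= MSG_ERR_FATAL;
    if (unc & ~unc_severity)
        msgs |= MSG_ERR_NONFATAL;
    return msgs & link_enable;
}
```

For example, a batch with two unmasked uncorrectable errors, one programmed Fatal and one Non-Fatal, yields exactly one ERR_FATAL and one ERR_NONFATAL, however many errors of each severity were detected.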


7.2.4. Error Logging

Section 7.2.5 lists all the errors governed by this specification and specifies the logging requirements for each error. Devices that do not support the Advanced Error Reporting capability log only the Link Status register bits indicating that a Correctable, Uncorrectable Non-fatal, or Uncorrectable Fatal error has occurred. Note that some errors are also reported using the reporting mechanisms in the PCI-compatible (Type 00h and 01h) configuration registers. Section 5.5 describes how these register bits are affected by the different types of error conditions described in this section.

For devices supporting the Advanced Error Reporting capability, each of the errors in Table 7-2, Table 7-3, and Table 7-4 corresponds to a particular bit in the Uncorrectable Error Status register or Correctable Error Status register, except for Unsupported Request, which is covered in the PCI Express device registers directly (see Section 5.8). These registers are used by software to determine more precisely which error occurred and with what severity. For many of the Transaction Layer errors, the associated TLP header is logged in the Header Log register. This helps system software to isolate errors to a particular application, and is useful for robust error handling by allowing system software to keep the remainder of the platform running normally.

7.2.4.1. Root Complex Considerations (Advanced Error Reporting)

In addition to the above logging, a Root Complex that supports the Advanced Error Reporting capability is required to implement the Error Source Identification register, which records the Requester ID of the first ERR_NONFATAL/ERR_FATAL (uncorrectable errors) and ERR_COR (correctable errors) Messages received by the Root Complex.
System software written to support Advanced Error Reporting can use the Root Port Error Status register to determine which fields hold valid information.

7.2.4.2. Multiple Error Handling (Advanced Error Reporting Capability)

For the Advanced Error Reporting capability, the Uncorrectable Error Status register and Correctable Error Status register accumulate the collection of errors which occur on that particular PCI Express interface. The bits remain set until explicitly cleared by software or reset. Since multiple bits might be set in the Uncorrectable Error Status register, the First Error Pointer register points to the uncorrectable error that occurred first, except in the case where a non-fatal uncorrectable error is followed by a fatal error, in which case the information for the first fatal error is stored. Likewise, the TLP Header Log register stores the header for the first occurrence of a particular severity error, but is replaced when a higher severity error is detected. For example, suppose the TLP Header Log register is loaded due to a correctable error: a subsequent correctable error leaves the TLP Header Log register unmodified, but a following uncorrectable error causes the replacement of the original contents of the TLP Header Log register.
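The update rules for the First Error Pointer, the TLP Header Log, and the Root Complex Error Source Identification register can be modeled in a few lines of C. This is a behavioral sketch, not a register-accurate implementation: the structs, the severity-ordering enum, the `error_bit` index, and the boolean valid flags (standing in for the Root Port Error Status register) are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Ordered so that a larger value means a higher severity. */
enum severity { SEV_NONE = 0, SEV_COR = 1, SEV_NONFATAL = 2, SEV_FATAL = 3 };

/* Illustrative model of the AER logging state of one device. */
struct aer_log {
    int           first_error;  /* First Error Pointer (uncorrectable only), -1 = none */
    enum severity first_sev;    /* severity of the error that pointer names            */
    enum severity header_sev;   /* severity of the header currently held in the log    */
    uint32_t      header[4];    /* TLP Header Log                                       */
};

void aer_log_init(struct aer_log *log)
{
    log->first_error = -1;
    log->first_sev = SEV_NONE;
    log->header_sev = SEV_NONE;
    memset(log->header, 0, sizeof log->header);
}

/* Record one detected error. error_bit identifies the error for the First
 * Error Pointer (ignored for correctable errors); hdr points to 4 DWs, or is
 * NULL when the error has no TLP header to log. */
void aer_record(struct aer_log *log, int error_bit, enum severity sev,
                const uint32_t *hdr)
{
    if (sev >= SEV_NONFATAL) {
        /* First uncorrectable error wins, except that the first fatal error
         * replaces an earlier non-fatal one. */
        if (log->first_error < 0 ||
            (sev == SEV_FATAL && log->first_sev == SEV_NONFATAL)) {
            log->first_error = error_bit;
            log->first_sev = sev;
        }
    }
    /* Header Log: keep the first header of a severity; replace it only when
     * a higher-severity error arrives. */
    if (hdr != NULL && sev > log->header_sev) {
        memcpy(log->header, hdr, sizeof log->header);
        log->header_sev = sev;
    }
}

/* Root Complex: Requester IDs of the first ERR_COR and of the first
 * ERR_NONFATAL/ERR_FATAL Message received (valid flags are hypothetical). */
struct rc_error_source {
    bool     cor_valid, unc_valid;
    uint16_t cor_source_id, unc_source_id;
};

void rc_on_error_message(struct rc_error_source *s, enum severity sev,
                         uint16_t requester_id)
{
    if (sev == SEV_COR && !s->cor_valid) {
        s->cor_source_id = requester_id;
        s->cor_valid = true;
    } else if (sev >= SEV_NONFATAL && !s->unc_valid) {
        s->unc_source_id = requester_id;
        s->unc_valid = true;
    }
}
```

Replaying the example from Section 7.2.4.2: a correctable error loads the Header Log, a second correctable error leaves it unchanged, and a subsequent uncorrectable error replaces it.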


7.2.5. Error Listing and Rules

The tables below list all of the PCI Express errors defined by this specification. Each error is listed with a short-hand name, how the error is detected in hardware, the default severity of the error, and the expected action taken by the agent which detects the error. These actions form the rules for PCI Express error reporting and logging.

The Default Severity column specifies the default severity for the error without any software reprogramming. For devices supporting the Advanced Error Reporting capability, the uncorrectable errors are programmable to Fatal or Non-fatal with the Error Severity register. Devices without the Advanced Error Reporting capability use the default associations and are not reprogrammable.

Table 7-2: Physical Layer Error List
Error Name | Default Severity | Detecting Agent Action
Receiver Error | Correctable | Receiver (if checking): Send ERR_COR to Root Complex unless masked.
Training Error | Uncorrectable (Fatal) | If checking, send ERR_FATAL/ERR_NONFATAL to Root Complex (note 38) unless masked.

Note 38: Only the component closer to the Root Complex is typically capable of sending the error Message.

Table 7-3: Data Link Layer Error List
Error Name | Default Severity | Detecting Agent Action
Bad TLP | Correctable | Receiver: Send ERR_COR to Root Complex unless masked. If the detecting agent supports the Advanced Error Reporting Capability, log the header of the TLP that encountered the error. Note that the header may be unreliable.
Bad DLLP | Correctable | Receiver: Send ERR_COR to Root Complex unless masked.
Replay Timeout | Correctable | Transmitter: Send ERR_COR to Root Complex unless masked.
REPLAY_NUM Rollover | Correctable | Transmitter: Send ERR_COR to Root Complex unless masked.
Data Link Layer Protocol Error | Uncorrectable (Fatal) | If checking, send ERR_FATAL/ERR_NONFATAL to Root Complex unless masked.

Table 7-4: Transaction Layer Error List
Error Name | Default Severity | Detecting Agent Action
Poisoned TLP Received | Uncorrectable (Non-Fatal) | Receiver (if data poisoning is supported): Send ERR_NONFATAL/ERR_FATAL to Root Complex unless masked. If the detecting agent supports the Advanced Error Reporting Capability, log the header of the poisoned TLP.
ECRC Check | Uncorrectable (Non-Fatal) | Receiver: Send ERR_NONFATAL/ERR_FATAL to Root Complex unless masked. If the detecting agent supports the Advanced Error Reporting Capability, log the header of the TLP that encountered the ECRC error.
Unsupported Request (UR) | Uncorrectable (Non-Fatal) | Request Receiver: Send ERR_NONFATAL/ERR_FATAL to Root Complex unless masked. If the detecting agent supports the Advanced Error Reporting Capability, log the header of the transaction that encountered the error.
Completion Timeout | Uncorrectable (Non-Fatal) | Requester: Send ERR_NONFATAL/ERR_FATAL to Root Complex.
Completer Abort | Uncorrectable (Non-Fatal) | Completer (if device generates Completer Abort status): Send ERR_NONFATAL/ERR_FATAL to Root Complex.
Unexpected Completion | Uncorrectable (Non-Fatal) | Receiver: Send ERR_NONFATAL/ERR_FATAL to Root Complex. If the detecting agent supports the Advanced Error Reporting Capability, log the header of the Completion that encountered the error. Note that if Unexpected Completion is a result of misrouting, the Completion Timeout mechanism will be triggered at the original Requester.
Receiver Overflow | Uncorrectable (Fatal) | Receiver (if checking): Send ERR_FATAL/ERR_NONFATAL to Root Complex.
Flow Control Protocol Error | Uncorrectable (Fatal) | Receiver (if checking): Send ERR_FATAL/ERR_NONFATAL to Root Complex.
Malformed TLP | Uncorrectable (Fatal) | Receiver: Send ERR_FATAL/ERR_NONFATAL to Root Complex. If the detecting agent supports the Advanced Error Reporting Capability, log the header of the TLP that encountered the error.

7.2.5.1. PCI Mapping

In order to support PCI driver and software compatibility, PCI Express error conditions, where appropriate, must be mapped onto the PCI Status register bits for error reporting. In other words, when certain PCI Express errors are detected, the appropriate PCI Status register bit is set, alerting legacy PCI software to the error. While the PCI Express error results in setting the PCI Status register, clearing the PCI Status register will not result in clearing bits in the Uncorrectable Error Status register and Correctable Error Status register. Similarly, clearing bits in the Uncorrectable Error Status register and Correctable Error Status register will not result in clearing the PCI Status register.

The PCI Command register has bits which control PCI error reporting. However, the PCI Command register does not affect the setting of the PCI Express error register bits.

7.2.6. Real and Virtual PCI Bridge Error Handling

Virtual PCI Bridge configuration headers are associated with each PCI Express Port in a Root Complex or a Switch. Naturally, PCI/PCI-X Bridges also implement PCI Bridge configuration headers.
For all of these cases, PCI Express error concepts require appropriate mapping to the PCI error reporting structures. This section addresses the cases related to the virtual PCI Bridge associated with PCI Express Ports in Root Complexes and Switches. The mapping for PCI/PCI-X Bridges is similar, and is covered in detail elsewhere.
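The independence of the legacy PCI Status register and the PCI Express error status registers described in Section 7.2.5.1 can be illustrated with a small C model. It assumes that a received poisoned TLP is one of the errors mapped onto the Detected Parity Error bit (as the bridge rules in Section 7.2.6.1 also require). Detected Parity Error is bit 15 of the PCI Status register; the position used for the Poisoned TLP bit in the Uncorrectable Error Status register, the struct, and the function names are hypothetical, and only the write-1-to-clear behavior of status registers is modeled.

```c
#include <stdint.h>

/* Detected Parity Error is bit 15 of the PCI Status register. */
#define PCI_STATUS_DETECTED_PARITY (1u << 15)
/* Hypothetical position of the Poisoned TLP bit in the Uncorrectable Error Status register. */
#define UNC_POISONED_TLP           (1u << 12)

struct err_regs {
    uint16_t pci_status;   /* legacy PCI Status register                    */
    uint16_t pci_command;  /* legacy PCI Command register: never consulted  */
                           /* when setting the PCI Express error bits       */
    uint32_t unc_status;   /* PCI Express Uncorrectable Error Status        */
};

/* On a received poisoned TLP, set the PCI Express bit and the mapped legacy bit. */
void on_poisoned_tlp(struct err_regs *r)
{
    r->unc_status |= UNC_POISONED_TLP;
    r->pci_status |= PCI_STATUS_DETECTED_PARITY;
}

/* Status registers are write-1-to-clear; each write affects only its own register. */
void write_pci_status(struct err_regs *r, uint16_t value)
{
    r->pci_status &= (uint16_t)~value;
}

void write_unc_status(struct err_regs *r, uint32_t value)
{
    r->unc_status &= ~value;
}
```

Legacy software that clears Detected Parity Error therefore leaves the Poisoned TLP bit set for PCI Express-aware software to find, and vice versa.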


7.2.6.1. Error Forwarding and PCI Mapping for Bridge - Rules

In general, a TLP is either passed from one side of the Virtual PCI Bridge to the other, or is handled at the ingress side of the Bridge according to the same rules which apply to the ultimate recipient of a TLP. The following rules cover PCI Express-specific error-related cases:

• If a Request does not address a space mapped to the egress side of the Bridge, the Request is terminated at the ingress side as an Unsupported Request.
• Poisoned TLPs are forwarded according to the same rules as non-poisoned TLPs.
  o When forwarding a poisoned TLP:
    • the Receiving side must set the Detected Parity Error bit in the (Secondary) Status register
    • the Transmitting side must set the Master Data Parity Error bit in the Secondary Status register if the Parity Error Response bit in the Bridge Control register is set
• ERR_COR, ERR_NONFATAL, and ERR_FATAL are forwarded from the secondary interface to the primary interface if the SERR# Enable bit in the Command and Bridge Control register is set.

7.3. Virtual Channel Support

7.3.1. Introduction and Scope

The Virtual Channel mechanism provides a foundation for supporting differentiated services within the PCI Express fabric. It enables deployment of independent physical resources that, together with traffic labeling, are required for optimized handling of differentiated traffic. Traffic labeling is supported using Traffic Class TLP-level labels. The exact policy for traffic differentiation is determined by the TC/VC mapping and by the VC-based arbitration. The TC/VC mapping depends on the platform application requirements.
These requirements drive the choice of VC arbitration algorithm, and the configurability/programmability of arbiters allows detailed tuning of the traffic servicing policy.

The basic definition of the Virtual Channel mechanism and the associated Traffic Class labeling mechanism is covered in Chapter 2 of this specification. The VC configuration/programming model is defined in Section 5.11 of this document.


The remaining sections of this chapter cover VC mechanisms from the system perspective. They address the next level of detail on:
• Supported TC/VC configurations
• VC-based arbitration: algorithms and rules
• Traffic ordering considerations
• Isochronous support as a specific usage model

7.3.2. Supported TC/VC Configurations

A Virtual Channel is established when one or multiple TCs are associated with a physical VC resource designated by the VC ID. Every Traffic Class that is supported must be mapped to one of the Virtual Channels. The baseline PCI Express configuration requires support for the default TC0/VC0 pair, which is "hardwired" (i.e., not configurable). Any support above that level is optional. The TC/VC configuration process is controlled by system software using the programming model described in Section 5.11.

To simplify interoperability when configuring multiple VCs over a PCI Express Link, this specification provides rules that restrict the set of valid VC configurations; these rules can be found in Section 2.6. In general, mapping of TCs to VCs other than TC0/VC0 is up to system software. Two basic TC/VC configurations are described here as examples:
• Symmetrical TC to VC Mapping
• TC to VC Re-mapping

Note that multi-port components (Switches and Root Complex) are required to support independent TC/VC mapping for each PCI Express Port; therefore, they must support both configurations.

7.3.2.1. Symmetrical TC to VC Mapping

Differentiated servicing of transactions with different TC labels can be realized through proper mapping of TCs to VCs that provide certain service disciplines. In many applications, the same service discipline is applied to transactions with the same TC label regardless of the source of the transaction.
Therefore, within a PCI Express fabric component such as a Switch, the TC to VC mapping is selected such that it is the same for all ports of the Switch. This is called symmetrical TC to VC mapping.

Figure 7-2 shows a symmetrical TC to VC mapping example, where the Switch has two downstream Ports and one upstream Port and supports four Virtual Channels at each Port. After VC configuration is established, the upstream Port and the downstream Port connecting to Endpoint B have four VCs enabled with VC IDs of 0, 1, 2, and 3, respectively, and have the following TC to VC mapping: TC(0-1)/VC0, TC(2-4)/VC1, TC(5-6)/VC2, TC7/VC3. The connection to Endpoint A only has two VCs enabled, with the following TC to VC mapping: TC(0-1)/VC0, TC7/VC3. In this example, the second VC (at the Link that connects Endpoint A and the Switch) is assigned a VC ID of 3. In this configuration, all enabled VCs of the Switch ports have the same set of associated TCs.
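The symmetrical configuration just described can be captured by giving each enabled VC a VC ID and a set of TCs. The following C sketch assumes a hypothetical 8-bit mask per VC (bit n set means TCn is mapped to that VC); it checks the two basic rules stated in this section (TC0 on VC0, each TC on at most one VC) and whether two Ports agree on the TC set of every VC ID they both enable. The complete set of valid-configuration rules is in Section 2.6 and the register-level programming model in Section 5.11; neither is reproduced here, and all names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_VCS 8
#define NO_VC   (-1)

/* Hypothetical view of one Port's enabled VC resources. */
struct port_vc_map {
    int     num_vcs;            /* number of enabled VC resources      */
    uint8_t vc_id[MAX_VCS];     /* VC ID of each resource              */
    uint8_t tc_mask[MAX_VCS];   /* bit n set => TCn maps to that VC    */
};

/* VC ID carrying `tc` at this Port, or NO_VC if the TC is not enabled here. */
int tc_to_vc(const struct port_vc_map *p, unsigned tc)
{
    for (int i = 0; i < p->num_vcs; i++)
        if (p->tc_mask[i] & (1u << tc))
            return p->vc_id[i];
    return NO_VC;
}

/* Basic checks: no TC on two VCs, and TC0 on VC0. */
bool map_is_valid(const struct port_vc_map *p)
{
    uint8_t seen = 0;
    for (int i = 0; i < p->num_vcs; i++) {
        if (p->tc_mask[i] & seen)
            return false;
        seen |= p->tc_mask[i];
    }
    return tc_to_vc(p, 0) == 0;
}

/* TC mask of the VC with this VC ID, or -1 if that VC is not enabled. */
int tc_mask_of(const struct port_vc_map *p, int vc_id)
{
    for (int i = 0; i < p->num_vcs; i++)
        if (p->vc_id[i] == vc_id)
            return p->tc_mask[i];
    return -1;
}

/* Symmetrical: every VC ID enabled at both Ports carries the same TCs. */
bool is_symmetric(const struct port_vc_map *a, const struct port_vc_map *b)
{
    for (int i = 0; i < a->num_vcs; i++) {
        int mb = tc_mask_of(b, a->vc_id[i]);
        if (mb >= 0 && mb != a->tc_mask[i])
            return false;
    }
    return true;
}
```

Encoding the upstream Port as masks 0x03, 0x1C, 0x60, 0x80 on VC IDs 0 through 3 and the Endpoint A Port as 0x03, 0x80 on VC IDs 0 and 3, `is_symmetric` returns true and `tc_to_vc` reports no VC for TC4 at the Endpoint A Port.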


Note that TC[2:6] are not mapped to the Link that connects Endpoint A and the Switch, which means that traffic labeled with TC[2:6] is not allowed between the Switch and Endpoint A. Traffic labeled with a TC number that is not in the list of TCs enabled for a PCI Express Port is treated as an illegal transaction. Corresponding packets will be dropped at the receiving Port. This mechanism is referred to as TC filtering.

Figure 7-2: An Example of Symmetrical TC to VC Mapping [figure: Root Complex, Switch, Endpoint A, and Endpoint B; the Endpoint A Link carries TC[0:1]/VC0 and TC7/VC3, while the upstream Link and the Endpoint B Link carry TC[0:1]/VC0, TC[2:4]/VC1, TC[5:6]/VC2, and TC7/VC3]

7.3.2.2. TC to VC Re-mapping

For some systems where PCI Express components with different capabilities are connected in a PCI Express fabric, improved traffic differentiation can be achieved using TC to VC re-mapping. TC to VC re-mapping refers to the configuration of a multi-port PCI Express component whereby, for traffic flowing from an Ingress Port to an Egress Port of the component, the TC to VC mappings of the two ports are different.

Figure 7-3 shows an example of TC to VC re-mapping. A simple Switch with one downstream Port and one upstream Port connects an Endpoint to a Root Complex. At the upstream Port, two VCs (VC0 and VC1) are enabled with the following mapping: TC(0-6)/VC0, TC7/VC1. At the downstream Port, only the default VC (VC0) is enabled and all TCs are mapped to VC0. In this example, while TC7 is mapped to VC0 at the downstream Port, it is re-mapped to VC1 at the upstream Port.
Although the Endpoint device only supports the default VC, when it labels transactions with different TCs, transactions with the TC7 label from/to the Endpoint device can take advantage of the two Virtual Channels enabled between the Switch and the Root Complex.
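The essence of re-mapping is that the TC label travels unchanged with the TLP while each Port derives the VC from its own map. The C sketch below models the Switch of Figure 7-3 with a hypothetical TC-indexed table per Port, and it also drops a TLP whose TC is not enabled at the receiving Port, as TC filtering requires. How a Switch should handle a TC that is enabled at the Ingress Port but not at the Egress Port is not modeled; the sketch only reports it. All names are illustrative.

```c
#include <stdint.h>

#define NO_VC (-1)

/* Hypothetical TC-indexed view of one Port's mapping: vc_of_tc[tc] is the VC ID
 * carrying that TC at the Port, or NO_VC if the TC is not enabled there. */
struct port_tc_map { int vc_of_tc[8]; };

struct forward_result {
    int ingress_vc;   /* VC the TLP arrived on, or NO_VC if filtered            */
    int egress_vc;    /* VC at the Egress Port, or NO_VC if it has no VC for tc */
};

/* The TC stays with the TLP; each Port maps it to a VC independently. */
struct forward_result switch_forward(const struct port_tc_map *ingress,
                                     const struct port_tc_map *egress,
                                     unsigned tc)
{
    struct forward_result r = { NO_VC, NO_VC };
    if (tc > 7)
        return r;
    r.ingress_vc = ingress->vc_of_tc[tc];
    if (r.ingress_vc == NO_VC)
        return r;                       /* TC filtering: dropped at the receiving Port */
    r.egress_vc = egress->vc_of_tc[tc]; /* NO_VC here is reported, not handled */
    return r;
}
```

For a TC7 TLP arriving from the Endpoint, `switch_forward` reports VC0 at the downstream Port and VC1 at the upstream Port, which is the re-mapping of Figure 7-3.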


Figure 7-3: An Example of Asymmetrical TC to VC Mapping [figure: Endpoint to Switch Link carries TC[0:7]/VC0; Switch to Root Complex Link carries TC[0:6]/VC0 and TC7/VC1]

Implementation Note: Multiple TCs Over a Single VC

A single VC implementation may benefit from using multiple TC labels. TC labels provide ordering domains that may be used to differentiate traffic within the Endpoint or the RC independent of the number of VCs supported.

In a simple configuration there might be only a default VC supported. Within this platform, traffic differentiation may not be supported in an optimum manner since the different traffic classes cannot be physically segregated. However, the benefits of carrying multiple TC labels can still be exploited, particularly in small and "shallow" topologies where Endpoints are connected directly to the RC rather than through cascaded Switches. In these topologies, traffic that is targeting the RC only needs to traverse a single Link, and optimized scheduling of packets on both sides (Endpoint and RC) based on TC labels may accomplish significant improvement over the case when a single TC label is used. Still, the inability to route differentiated traffic through separate resources with fully independent flow control and independent ordering exposes all of the traffic to potential head-of-line blocking conditions. Optimizing the Endpoint's internal architecture to minimize exposure to these blocking conditions can reduce those risks.

7.3.3. VC Arbitration

Arbitration is one of the key aspects of the Virtual Channel mechanism, and it is defined in a manner that fully enables configurability to the specific application.
In general, the definition of the PCI Express VC-based arbitration mechanism is driven by the following objectives:
• To provide data flow forward progress required to avoid false transaction timeouts.
• To provide differentiated services between data flows within the fabric.
• To provide guaranteed bandwidth with deterministic (and reasonably small) end-to-end latency between components.

As PCI Express Links are bidirectional, each PCI Express Port can be an Ingress or an Egress Port depending on the direction of traffic flow. This is illustrated by the example of a 3-Port Switch in Figure 7-4, where traffic flows between Switch ports are highlighted with different types of lines. In the following sections, PCI Express VC Arbitration is defined using a Switch arbitration model, since the Switch is the PCI Express element that represents a functional superset from the arbitration perspective. In addition, one-directional data flows are used in the description.

Figure 7-4: An Example of Traffic Flow Illustrating Ingress and Egress [figure: a 3-Port Switch whose three Links each have a TX (Egress) side and an RX (Ingress) side]

7.3.3.1. Traffic Flow and Switch Arbitration Model

The following set of figures (Figure 7-5 and Figure 7-6) illustrates traffic flow through the Switch and summarizes the key aspects of the arbitration.

Figure 7-5: An Example of Differentiated Traffic Flow Through a Switch [figure: packets such as 2.0a, 2.1b, and 3.1a flow from Ingress Ports 0 and 1 to Egress Ports 2 and 3, each labeled "Egress Port #.Ingress Port #" and shown in priority order]


At each Ingress Port, an incoming traffic stream is represented in Figure 7-5 by small boxes. These boxes represent packets that are carried within different VCs, distinguished using different levels of gray. Each box that represents a packet belonging to a different VC includes the designation of the ingress and Egress Ports to indicate where the packet is coming from and where it is going. For example, the designation "3.0" means that this packet is arriving at Port #0 (ingress) and is destined to Port #3 (egress). Within the Switch, packets are routed and serviced based on Switch internal arbitration mechanisms.

The Switch arbitration model defines the required arbitration infrastructure and functionality within a Switch. This functionality is needed to support a set of arbitration policies that control traffic contention for an Egress Port from multiple Ingress Ports.

Figure 7-6 shows a conceptual model of a Switch, highlighting resources and associated functionality in the ingress to egress direction. Note that each Port in the Switch can have the role of an Ingress or Egress Port. Therefore, this figure only shows one particular scenario, where the 4-Port Switch in this example has ingress traffic on Port #0 and Port #1 that targets Port #2 as an Egress Port. A different example may show a different flow of traffic, implying different roles for the ports on the Switch.
PCI Express architecture enables peer-to-peer communication through the Switch and, therefore, possible scenarios using the same example may include multiple separate and simultaneous ingress to egress flows (e.g., Port 0 to Port 2 and Port 1 to Port 3).

Figure 7-6: Switch Arbitration Structure [figure: traffic from Ingress Ports 0 and 1 passes through the TC/VC Mapping of the Egress Port; Port Arbitration (ARB) within each VC (VC 0, VC 1) of the Egress Port is followed by VC Arbitration for the Egress Port; these structures replicate for each Egress Port (Ports 2 and 3)]

Routing of traffic received by the Switch on Port 0 and Port 1 and destined to Port 2 can be conceptually described by the following two steps. First, the target Egress Port is determined based on address/routing information in the TLP header. Second, the target VC of the Egress Port is determined based on the TC/VC map of the Egress Port. Transactions that target the same VC in the Egress Port but come from different Ingress Ports must be arbitrated before they can be forwarded to the corresponding resource in the Egress Port. This arbitration is referred to as Port Arbitration.
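The two routing steps and the Port Arbitration stage can be sketched in C as follows. The address windows used for step one, the TC-indexed map used for step two, and the plain round-robin choice among Ingress Ports are all assumptions for illustration; the Port Arbitration policies an actual Switch supports are not modeled here, and all names are hypothetical.

```c
#include <stdint.h>

#define NUM_PORTS 4
#define NO_PORT   (-1)

/* Hypothetical address window per Port: base inclusive, limit exclusive. */
struct addr_window { uint64_t base, limit; };

struct tlp { uint64_t addr; unsigned tc; int ingress_port; };

/* Step 1: choose the Egress Port from the TLP's address routing information. */
int route_egress_port(const struct addr_window win[NUM_PORTS], const struct tlp *t)
{
    for (int p = 0; p < NUM_PORTS; p++)
        if (p != t->ingress_port && t->addr >= win[p].base && t->addr < win[p].limit)
            return p;
    return NO_PORT;
}

/* Step 2: choose the VC from the Egress Port's TC/VC map (TC is a 3-bit label). */
int route_egress_vc(const int egress_vc_of_tc[8], const struct tlp *t)
{
    return egress_vc_of_tc[t->tc & 7u];
}

/* Port Arbitration for one (Egress Port, VC) resource: pick among Ingress Ports
 * with a TLP pending for it (bit p of the mask). Round robin is a placeholder. */
int port_arbitrate(unsigned pending_ingress_mask, int *rr_next)
{
    for (int i = 0; i < NUM_PORTS; i++) {
        int p = (*rr_next + i) % NUM_PORTS;
        if (pending_ingress_mask & (1u << p)) {
            *rr_next = (p + 1) % NUM_PORTS;
            return p;
        }
    }
    return NO_PORT;
}
```

In the scenario of Figure 7-6, TLPs from Ports 0 and 1 that both resolve to VC1 of Port 2 would contend in one `port_arbitrate` call, and the winner then proceeds to VC Arbitration.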


Once traffic reaches the corresponding destination VC resource in the Egress Port, it is subject to arbitration for the shared Link. From the Egress Port point of view, this arbitration can be conceptually defined as a simple form of multiplexing where the multiplexing control is based on arbitration policies that are either fixed or configurable/programmable. This stage of arbitration between different VCs at an Egress Port is called the VC Arbitration of the Egress Port.

Independent of VC arbitration policy, management/control logic associated with each VC must observe transaction ordering and flow control rules before it can make pending traffic visible to the arbitration mechanism.

Implementation Note: VC Control Logic Requirements at Egress Port

Part of the VC control logic resources at every Port includes:
• VC Flow Control logic
• VC Ordering Control logic

Flow control credits are exchanged between two ports connected to the same Link. Availability of flow control credits is one of the qualifiers that VC control logic must use to decide when a VC is allowed to compete for the shared Link resource (i.e., the DLL transmit/retry buffer). If a candidate packet cannot be submitted due to the lack of an adequate number of flow control credits, VC control logic must mask the presence of the pending packet to prevent blockage of traffic from other VCs. Note that since each VC includes buffering resources for Posted Requests, Non-Posted Requests, and Completion packets, the VC control logic must also take into account the availability of flow control credits for the particular candidate packet.
In addition, VC control logic must observe ordering rules (see Section 2.5 for more details) for Posted/Non-Posted/Completion transactions to prevent deadlocks and violation of the producer-consumer ordering model.

Implementation Note: Arbitration for Multi-Function Endpoints

The arbitration of data flows from different functions of a multi-function Endpoint is beyond the scope of this specification. Mapping of different data flows (within a multi-function Endpoint) to different TCs and VCs is implementation specific. Multi-function Endpoints, however, should support the PCI Express VC-based arbitration control mechanism if multiple VCs are implemented for the PCI Express Link.

When a common VC on the PCI Express Link is shared by multiple functions, the aggregated traffic over the VC is subject to the bandwidth and latency regulations for that VC on the PCI Express Link. Multi-function Endpoints should implement proper arbitration for data flows from different functions in order to share the Link resources and achieve the desired end-to-end services.
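The flow-control qualifier described in the Implementation Note on VC control logic at the Egress Port can be sketched as a request mask: a VC appears in the mask only if its head packet is covered by both header and data credits of the matching type. This C sketch assumes a simplified credit model (one header credit per TLP, data credits in the 16-byte (4 DW) unit used by PCI Express flow control, no infinite-credit advertisement) and omits the ordering checks; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum tlp_type { TLP_POSTED, TLP_NONPOSTED, TLP_COMPLETION };

/* Credits currently available from the receiver for one VC, per TLP type. */
struct vc_credits {
    uint32_t hdr[3];    /* header credits, indexed by enum tlp_type */
    uint32_t data[3];   /* data credits, indexed by enum tlp_type   */
};

/* One data credit covers 16 bytes (4 DW) of payload; round up. */
uint32_t data_credits_needed(uint32_t payload_bytes)
{
    return (payload_bytes + 15u) / 16u;
}

/* A VC may compete for the Link only if its candidate TLP is fully covered;
 * otherwise it is masked so it cannot block the other VCs. */
bool vc_eligible(const struct vc_credits *c, enum tlp_type type, uint32_t payload_bytes)
{
    return c->hdr[type] >= 1u &&
           c->data[type] >= data_credits_needed(payload_bytes);
}

/* Request mask presented to the VC arbiter: bit n = VC resource n has an
 * eligible head packet. Ordering-rule checks are omitted from this sketch. */
unsigned vc_request_mask(const struct vc_credits credits[], const bool has_pending[],
                         const enum tlp_type head_type[], const uint32_t head_bytes[],
                         int num_vcs)
{
    unsigned mask = 0;
    for (int vc = 0; vc < num_vcs; vc++)
        if (has_pending[vc] && vc_eligible(&credits[vc], head_type[vc], head_bytes[vc]))
            mask |= 1u << vc;
    return mask;
}
```

Because a starved VC simply drops out of the mask, a VC waiting for Posted credits never prevents another VC that has credits from winning VC Arbitration.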


7.3.3.2. VC Arbitration − Arbitration Between VCs

The VC Identification (VC ID) provides an inherent, i.e., default, “prioritization” of VCs. Therefore, all VC resources are arranged in ascending order of relative priority in the PCI Express Virtual Channel Capability Structure. Figure 7-7 shows an example in which 8 VCs are supported by a PCI Express Port. The VCs are associated with default priority levels, where VC0 is the lowest priority and VC7 is the highest priority.

[Figure 7-7: VC ID and Priority Order – An Example. The figure shows eight VC resources (Extended VC Count = 7), VC0 through VC7, in ascending priority order. The upper group (VC4 to VC7) uses Strict Priority. The lower group (VC0 to VC3, Low Priority Extended VC Count = 3) is governed by the VC Arbitration Capability field (e.g., WRR). Example usages: isochronous traffic; other low-latency usage (such as over-subscribable real-time streams, or low-latency messaging); QoS usage; VC0 as the default VC (3GIO/PCI).]

However, this inherent prioritization does not restrict the algorithms that can be deployed for handling VC arbitration. Before VCs above the default VC0 can be enabled, they must be configured for the appropriate arbitration policy. The PCI Express architecture defines the following arbitration methods:
• Strict Priority – Based on inherent prioritization, i.e., VC0 = lowest, VC7 = highest
• Round Robin (RR) – Simplest form of arbitration, where all VCs have equal priority
• Weighted RR – A programmable weight factor determines the level of service

The PCI Express VC Capability programming model allows different arbitration methods to be mixed by grouping VCs into two groups: the lower group (VC0 to VC3) and the upper group (VC4 to VC7). Figure 7-7 shows an example.
The upper group operates using a strict priority scheme, and the lower group as a whole is treated as the lowest-priority member of the strict priority arbitration stack. The arbitration within the lower group can be configured to one of the supported arbitration methods. The size of this group is indicated by the Low Priority Extended VC Count field in the Port VC Capability Register 1. The arbitration methods are listed in the VC Arbitration Capability field in the Port VC Capability Register 2. See Section 5.11 for details. When the Low Priority Extended VC Count field is set to zero, all VCs are governed by strict-priority VC arbitration. When the field is equal to the Extended VC Count, all VCs are governed by the VC arbitration indicated by the VC Arbitration Capability field.

7.3.3.2.1. Strict Priority Arbitration Model

Strict priority provides minimal latency for high-priority transactions. However, if it is not applied correctly, it may starve low-priority traffic. Using a strict priority scheme implies that traffic at every priority level (except the lowest) is regulated in terms of both maximum peak bandwidth and duration of peak bandwidth usage. This regulation must be provided at the sources of the traffic or at the Ports where traffic enters the PCI Express fabric. It is assumed that the lowest priority will receive an adequate leftover of bandwidth to allow reasonable forward progress and to prevent application timeouts. For example, each PCI Express component must serve isochronous traffic at the highest priority. As detailed in Section 7.3.4, regulation of isochronous resource usage is managed by software and enforced by PCI Express fabric components, such as Switches and the Root Complex, in a manner that prevents over-subscription.

7.3.3.2.2. Round Robin Arbitration Model

The Round Robin model provides simple arbitration that gives all traffic equal opportunities at the transaction level.[39]
Note that this scheme is used where different unordered streams need to be serviced with the same priority.

Where differentiation is required, a Weighted Round Robin (WRR) scheme can be used. The WRR scheme is commonly used where the sources of traffic do not enforce bandwidth regulation. In that case the priority scheme cannot be used without risking starvation of lower-priority traffic. The key property is that this scheme provides fairness during traffic contention by allowing at least one arbitration win per arbitration loop. The assigned weight regulates both the minimum allowed bandwidth and the maximum burstiness of each VC during contention, which bounds the arbitration latency for traffic from different VCs. Note that latencies also depend on the maximum packet sizes allowed for traffic that is mapped onto those VCs.

One of the key usage models for the WRR scheme is support for a QoS policy, where different QoS levels can be provided using different weights.

Weights can be fixed (by hardware implementation) for certain applications. To provide more generic support for different applications, however, PCI Express components that support the WRR scheme are recommended to make the weights programmable. Programming of WRR is controlled through the software interface defined in Section 5.11.

[39] Note that this does not imply equivalence and fairness in terms of bandwidth usage.
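The grouping rules of Section 7.3.3.2 can be summarized in a short Python sketch of an Egress Port VC arbiter. This is an illustrative model, not the register-level programming interface of Section 5.11. The sketch makes these assumptions:
• `ext_vc_count` and `low_pri_count` stand in for the Extended VC Count and Low Priority Extended VC Count fields.
• VC resources are identified by their index.
• The lower group is arbitrated with a hypothetical table of phases. Each entry names a VC, and a VC's weight is the number of entries it occupies.

Upper-group VCs win by strict priority, and the lower group as a whole ranks below them.

```python
class VcArbiter:
    """Illustrative VC arbiter: strict priority for the upper group and a
    weighted round-robin phase table for the lower group (VC 0..low_pri_count)."""

    def __init__(self, ext_vc_count, low_pri_count, wrr_table=None):
        if not 0 <= low_pri_count <= ext_vc_count:
            raise ValueError("low_pri_count must be in 0..ext_vc_count")
        self.ext_vc_count = ext_vc_count
        self.low_pri_count = low_pri_count
        # Default table: plain round robin, one phase per lower-group VC.
        self.table = list(wrr_table) if wrr_table else list(range(low_pri_count + 1))
        if any(not 0 <= vc <= low_pri_count for vc in self.table):
            raise ValueError("WRR table may only name lower-group VCs")
        self.phase = 0

    def grant(self, pending):
        """Return the VC that wins this arbitration round, or None if idle."""
        # Upper group: the highest VC with pending traffic wins outright.
        for vc in range(self.ext_vc_count, self.low_pri_count, -1):
            if vc in pending:
                return vc
        # Lower group: scan the table from the current phase; a VC with
        # nothing pending is skipped, so no phase is wasted.
        for step in range(len(self.table)):
            idx = (self.phase + step) % len(self.table)
            if self.table[idx] in pending:
                self.phase = (idx + 1) % len(self.table)
                return self.table[idx]
        return None
```

The sketch mirrors the two boundary cases of the Low Priority Extended VC Count field. With `low_pri_count=0`, every VC above VC0 falls in the strict-priority group. With `low_pri_count == ext_vc_count`, every VC goes through the table.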


7.3.3.3. Port Arbitration − Arbitration Within VC

Arbitration within a VC refers to the arbitration between traffic that is mapped onto the same VC but comes from different Ingress Ports. The inherent prioritization scheme that makes sense for arbitration among VCs does not apply in this context, since it would imply strict arbitration priority for different Ports. Traffic from different Ports can be arbitrated using the following supported schemes:
• Hardware-fixed Round Robin or RR-like arbitration scheme
• Programmable WRR arbitration scheme
• Programmable Time-based WRR arbitration scheme

The hardware-fixed RR or RR-like scheme is the simplest to implement, since it does not require any programmability. It gives all Ports equal priority. This is acceptable for applications that require no software-managed differentiation or per-Port bandwidth budgeting.

Programmable WRR is flexible: it can operate as flat RR or, if differentiation is required, apply different weights to traffic coming from different Ports, in a manner similar to that described in Section 7.3.3.2. This scheme is used where different Ports need different allocations of bandwidth.

Time-based WRR is used for applications that require not only different allocations of bandwidth but also tight control over how that bandwidth is used. This scheme controls the amount of traffic that can be injected from different Ports within a fixed period of time. Certain applications require this control, such as isochronous applications, where traffic must meet a strict deadline. Section 7.3.4 provides basic rules to support isochronous applications. For more details on time-based arbitration, and on isochronous applications as a usage model for this arbitration scheme, refer to Appendix A.

7.3.4. Isochronous Support

Servicing isochronous data transfer requires a system to provide not only guaranteed data bandwidth but also deterministic service latency. The isochronous support mechanisms in PCI Express are defined to ensure that isochronous traffic receives its allocated bandwidth over a relevant period of time while also preventing starvation of the other traffic in the system. Isochronous support mechanisms apply to communication between an Endpoint and the Root Complex as well as to peer-to-peer communication, with the following restrictions:
• In the Endpoint-to-Root Complex communication model, isochronous traffic consists of read and write requests to the Root Complex and read completions from the Root Complex.
• In the peer-to-peer model, isochronous traffic is limited to unicast, push-only transactions (memory writes or messages). The push-only transactions can be within a single host domain or across multiple host domains.
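The isochronous rules in this section rely on time-based WRR Port Arbitration, described in Section 7.3.3.3. The minimal Python sketch below uses assumptions of its own, not definitions from this section:
• The arbitration table is a list of virtual time slots. Each slot names the Ingress Port allowed to inject one request during that slot, and `None` marks an idle slot.
• One pass through the table is the fixed period.
• A slot whose Port has nothing pending is left unused rather than given to another Port.

The class name `TimeBasedPortArbiter` is hypothetical. Appendix A describes the actual model.

```python
from collections import Counter


class TimeBasedPortArbiter:
    """Hypothetical time-based WRR Port Arbitration for one VC."""

    def __init__(self, table):
        # table[i] = Ingress Port allowed to inject in slot i, or None (idle slot)
        self.table = list(table)
        self.slot = 0

    def per_period_budget(self):
        """Maximum number of requests each Port can inject per period."""
        return Counter(port for port in self.table if port is not None)

    def tick(self, pending):
        """Advance one time slot; return the Port granted, or None.

        pending maps Port -> number of queued requests and is updated in place.
        """
        port = self.table[self.slot]
        self.slot = (self.slot + 1) % len(self.table)
        if port is not None and pending.get(port, 0) > 0:
            pending[port] -= 1
            return port
        return None
```

In a table such as `[1, 2, 1, None]`, Port 1 owns two of the four slots. It can never inject more than two requests per period, however much it has queued. Under these assumptions, that cap is what lets software budget bandwidth per Port.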


Isochronous service is realized through proper use of PCI Express mechanisms such as Traffic Class (TC) transaction labeling, the Virtual Channel (VC) data-transfer protocol, and TC-to-VC mapping. End-to-end isochronous service requires software to set up the proper configuration along the path between the Requester and the Completer. This section describes the rules for software configuration and the rules that hardware components must follow to provide end-to-end isochronous services. Appendix A provides more information and background material on isochronous applications, along with an isochronous design guide.

7.3.4.1. Rules for Software Configuration

System software MUST obey the following rules to configure the PCI Express fabric for isochronous transactions:
• Software must designate one or more TCs for isochronous transactions. This applies within a PCI Express hierarchy domain, or within multiple PCI Express hierarchy domains spawned from Root Ports that have a common Root Complex Register Block (RCRB). In the rest of this section, a TC designated for isochronous transactions is referred to as an Isochronous TC.
• The setting of the Attribute fields of all isochronous requests targeting the same Completer must be fixed and identical.
• On any PCI Express Link, software must assign all Isochronous TCs to the VC with the highest VC ID. In the rest of this section, this VC is referred to as the Isochronous VC.
• On any PCI Express Port, software must configure the Isochronous VC so that it is served with the highest priority in VC arbitration. One way is to enable strict priority VC arbitration, where the Isochronous VC, having the highest VC ID, is served with the highest priority. Another way is to configure a different VC arbitration mechanism that achieves an equivalent effect.
• For Switch Ports and the RCRB, the Isochronous VC must support, and be configured with, time-based Port Arbitration.
• Software must not assign other TCs to the Isochronous VC.
• Software must not assign an Isochronous TC to any other VC.
• Software must not assign a number of isochronous transactions to a PCI Express Port or RCRB that exceeds the Maximum Time Slots capability reported by that Port or RCRB. To ensure forward progress of other transactions, software must not assign all PCI Express Link capacity to isochronous traffic.
• Software must limit the Max_Payload_Size for each PCI Express hierarchy domain to meet the isochronous latency requirements.
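Several of the software configuration rules are mechanical enough to check before a configuration is committed. The Python sketch below checks one Link using a hypothetical input format:
• a TC-to-VC ID mapping and the enabled VC IDs
• the set of Isochronous TCs
• the time slots assigned versus the reported Maximum Time Slots
• isochronous and Link bandwidth in any common unit

The function name and data shapes are illustrative only. The Attribute and Max_Payload_Size rules are not covered.

```python
def check_isoch_config(tc_to_vc, vc_ids, iso_tcs, assigned_slots,
                       max_time_slots, iso_bw, link_bw):
    """Return a list of rule violations for one Link (empty if none found)."""
    if not iso_tcs:
        return []  # no Isochronous TCs designated: nothing to check
    errors = []
    iso_vc = max(vc_ids)  # the Isochronous VC is the VC with the highest VC ID
    for tc in sorted(iso_tcs):
        if tc_to_vc.get(tc) != iso_vc:
            errors.append(f"TC{tc} is isochronous but not mapped to VC{iso_vc}")
    for tc, vc in sorted(tc_to_vc.items()):
        if vc == iso_vc and tc not in iso_tcs:
            errors.append(f"non-isochronous TC{tc} is mapped to VC{iso_vc}")
    if assigned_slots > max_time_slots:
        errors.append("assigned time slots exceed Maximum Time Slots")
    if iso_bw >= link_bw:
        errors.append("isochronous traffic would take all Link capacity")
    return errors
```

A clean configuration returns an empty list. Swapping the TC0 and TC7 mappings, over-assigning slots, and giving isochronous traffic the whole Link together report four separate violations.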


7.3.4.2. Rules for Requesters

A Requester requiring isochronous services must obey the following rules:
• The value in the Length field of read requests must never exceed Max_Payload_Size.
• Read and write requests must never cross naturally aligned address boundaries.
• When system software indicates to the Requester's device driver that the Completer does not allow snoop transactions, the Requester must set the "Snoop Not Required" Attribute bit.
• MSI must not be mapped to any TC used for isochronous traffic.

Note: An isochronous Requester that uses the MSI mechanism must transmit MSI packets on a TC other than the one used for isochronous traffic. Before an MSI can be generated, the Requester is required to perform synchronization (for example, "flushing" using a zero-length Memory Read).

7.3.4.3. Rules for Completers

A Completer providing isochronous services must obey the following rules:
• A Completer must not apply backpressure (due to flow control) to isochronous requests injected uniformly to the PCI Express Link.
• A Completer must report its isochronous bandwidth capability in the Max Time Slots field in the VC Resource Capability Register intended for isochronous use. Note that a Completer must account for partial writes.
• A Completer must observe the maximum isochronous transaction latency.
• A Root Complex acting as a Completer must implement an RCRB and support the time-based Port Arbitration mechanism for the Isochronous VC. Note that time-based Port Arbitration applies only to request transactions.

7.3.4.4. Rules for Switch Components

A Switch component providing isochronous services must obey the following rules:
• A Switch Port must not apply backpressure (due to flow control) to isochronous requests injected uniformly to the PCI Express Link.
• A Switch component must observe the maximum isochronous transaction latency.
• A Switch component must return isochronous read completions in strictly the same order as the corresponding isochronous read requests.
• A Switch component must serve and forward isochronous write requests in strictly the same order in which they were received.


• A Switch component must support the time-based Port Arbitration mechanism for the Isochronous VC. Note that time-based Port Arbitration applies only to request transactions, not to completion transactions.
• A Switch component must allow isochronous write requests (peer-to-peer) to pass isochronous read completions (Root Complex to Endpoint).

7.4. Device Synchronization STOP Mechanism

System software may renumber buses during system operation. Because a device's Requester ID is based on its bus number, renumbering may change that ID. Any requests or completions for that device still in flight may then be rendered invalid. It is also desirable to ensure that there are no outstanding transactions during an orderly hot-plug removal. A device synchronization stop mechanism lets system software ensure that no transactions are in flight for a particular Endpoint device before it performs a bus renumbering operation that changes that device's bus number (and Requester ID).

The device synchronization stop mechanism for Endpoint devices is implemented via the Stop mechanism and the associated Stop and Transactions Pending bits (see Section 5.8). System software signals a device to stop by setting the Stop bit in the Device Command register of the device.
Software assumes that the Stop operation has completed when the device signals that no more transactions are pending by clearing the Transactions Pending status bit in the Device Status register. A device is not permitted to issue any new requests after the Stop bit is set.

Prior to clearing the Transactions Pending bit, an Endpoint must ensure that:
• Completions for outstanding non-posted requests for all used Traffic Classes have been received by the corresponding Requesters.
• All requests with completions initiated by this device have returned completions.
• All posted requests of all Traffic Classes have been “flushed” (i.e., have been received by their intended targets) in all directions. This includes traffic between the Endpoint and the host and between peer-to-peer Endpoints.

Implementation Note: Flush Mechanisms

In a simple case, such as an Endpoint device communicating only with host memory, a “flush” can be implemented using a directed memory read. A memory read needs to be performed on all TCs that the device is using. If a device has pending peer-to-peer transactions (including pending completions), it must perform the “flush” with a non-posted transaction, such as a directed memory read targeted to the specific peer destination. The specific mechanism used is implementation specific, but hardware must perform it without software assistance.
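The Stop / Transactions Pending handshake can be modeled in a few lines of Python. This is a behavioral sketch, not a hardware design. The `Endpoint` class and its method names are hypothetical. Only the device's own requests are tracked; the rule about Completions this device owes to other Requesters is not modeled. Following the Flush Mechanisms note, the flush stands for a directed memory read on every TC in use. Treating that read as part of Stop processing rather than as a new request is an assumption of the sketch.

```python
class Endpoint:
    """Hypothetical Endpoint model of the Stop / Transactions Pending handshake."""

    def __init__(self, tcs_in_use):
        self.tcs_in_use = set(tcs_in_use)
        self.stop = False                   # Stop bit (Device Command register)
        self.transactions_pending = False   # Transactions Pending bit (Device Status)
        self.outstanding_nonposted = 0      # own requests still awaiting Completions
        self.unflushed_tcs = set()          # TCs carrying posted writes not yet flushed

    def issue_request(self, tc, posted):
        if self.stop:
            raise RuntimeError("no new requests may be issued after Stop is set")
        if tc not in self.tcs_in_use:
            raise ValueError(f"TC{tc} is not in use by this device")
        self.transactions_pending = True
        if posted:
            self.unflushed_tcs.add(tc)
        else:
            self.outstanding_nonposted += 1

    def completion_received(self):
        self.outstanding_nonposted -= 1
        self._update()

    def flush(self):
        # Stands for a directed memory read on every TC in use, done by
        # hardware; its Completions returning is taken to mean the earlier
        # posted writes on those TCs have reached their targets.
        self.unflushed_tcs -= self.tcs_in_use
        self._update()

    def set_stop(self):
        self.stop = True
        self._update()

    def _update(self):
        # Transactions Pending may clear only after Stop, with nothing outstanding.
        if self.stop and self.outstanding_nonposted == 0 and not self.unflushed_tcs:
            self.transactions_pending = False
```

In this model, software polling the Transactions Pending bit after setting Stop corresponds to reading `transactions_pending` until it becomes False.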


7.5. Locked Transactions

7.5.1. Introduction

Locked Transaction support is required to prevent deadlock in systems that use legacy software which causes locked accesses to I/O devices. Note that some CPUs may generate locked accesses as a result of executing instructions that implicitly trigger lock. Some legacy software misuses these transactions and generates locked sequences even when exclusive access is not required. Locked accesses to I/O devices introduce potential deadlocks beyond those mentioned above, as well as serious performance degradation. For this reason, PCI Express Endpoints are prohibited from supporting locked accesses, and new software must not use instructions that will cause locked accesses to I/O devices. Legacy Endpoints support locked accesses only for compatibility with existing software.

Only the Root Complex is allowed to initiate Locked Requests on PCI Express. Locked Requests initiated by Endpoints and Bridges are not supported. This is consistent with the limitations on locked transaction use outlined in the PCI Local Bus Specification, Rev 2.3 (Appendix F – Exclusive Accesses).

This section specifies the rules associated with supporting locked accesses from the Host CPU to Legacy Endpoints, including the propagation of those transactions through Switches and PCI Express/PCI Bridges.

7.5.2. Initiation and Propagation of Locked Transactions - Rules

Locked sequences are generated by the Host CPU(s) as one or more reads followed by an equal number of writes to the same location(s).
When a lock is established, all other traffic is blocked from using the path between the Root Complex and the locked Legacy Endpoint or Bridge.
• Lock is initiated on PCI Express using the “lock”-type Read Request/Completion (MRdLk/CplDLk) and terminated with the Unlock Message
  o MRdLk, CplDLk, and Unlock semantics are allowed only for the default Traffic Class (TC0)
• The Unlock Message is broadcast from the Root Complex to all Endpoints and Bridges
  o Any device which is not involved in the locked sequence must ignore this Message

The initiation and propagation of a locked transaction sequence through PCI Express is performed as follows:
• A locked transaction sequence is started with an MRdLk Request
  o Any successive reads for the locked transaction also use MRdLk Requests
  o The Completions for any MRdLk Request use the CplDLk Completion type


• All writes for the locked sequence use MWr Requests
• The Unlock Message is used to indicate the end of a locked sequence
  o A Switch must propagate Unlock Messages to all Ports other than the Ingress Port, regardless of the state of the Switch or PCI Express/PCI Bridge with respect to lock
  o A PCI Express/PCI Bridge may propagate Unlock by deasserting LOCK# on its PCI interface
• Upon receiving an Unlock Message, a Legacy Endpoint or Bridge must unlock itself if it is in a locked state
  o If the Receiver is not locked, or is a PCI Express Endpoint or Bridge that does not support lock, it ignores and discards the Unlock Message

7.5.3. Switches and Lock - Rules

Switches must distinguish transactions associated with locked sequences from other transactions. This prevents other transactions from interfering with the lock and potentially causing deadlock. The following rules cover how this is done. Note that locked accesses are limited to TC0, which is always mapped to VC0.
• When a Switch propagates an MRdLk Request from the Ingress Port (closest to the Root Complex) to the Egress Port, it must block all Requests which map to the default Virtual Channel (VC0) from being propagated to the Egress Port
  o If the Egress Port is enabled to use one or more non-default VCs (VCs other than VC0), and a Request specifies a Traffic Class which maps to a non-default VC on the Egress Port, then the Request must not be blocked
  o If a subsequent MRdLk Request is Received at this Ingress Port addressing a different Egress Port, the behavior of the Switch is undefined
    Note: This sort of split-lock access is not supported by PCI Express, and software must not cause such a locked access. System deadlock may result from such accesses.
• When the CplDLk for the first MRdLk Request is returned, the Switch must block all Requests from all other Ports from being propagated to either of the Ports involved in the locked access, except for Requests which map to non-default VCs on the Egress Port


• The two Ports involved in the locked sequence must remain blocked as described above until the Switch receives the Unlock Message (at the Ingress Port for the initial MRdLk Request)
  o The Unlock Message must be broadcast to all other Ports
  o The Ingress Port is unblocked once the Unlock Message arrives. The Egress Port(s) that were blocked are unblocked after the Unlock Message is Transmitted out of the Egress Ports
    • Ports which were not involved in the locked access are unaffected by the Unlock Message

7.5.4. PCI Express/PCI Bridges and Lock - Rules

The requirements for PCI Express/PCI Bridges are similar to those for Switches, with one difference. PCI Express/PCI Bridges use only the default Virtual Channel and Traffic Class, so all other traffic is blocked during the locked access.
The requirements on the PCI bus side of the PCI Express/PCI Bridge match the requirements for a PCI/PCI Bridge (see the PCI-to-PCI Bridge Architecture Specification, Rev 1.1).
• When a PCI Express/PCI Bridge propagates a Locked Read Request through the Bridge, it must block all Requests not associated with the locked access from being propagated through the Bridge in the same direction as the Locked Read Request
• When the Locked Completion for the first Locked Read Request is returned, the Bridge must block all Requests not associated with the locked access from flowing through the Bridge in either direction
• The Bridge must remain blocked as described above until the Bridge is unlocked from the side which initiated the locked access
  o The Bridge unlocks itself and propagates the unlock to the other side of the Bridge

7.5.5. Root Complex and Lock - Rules

A Root Complex is permitted to support locked transactions as a Requester. If locked transactions are supported, a Root Complex must follow the sequence described in Section 7.5.2 to perform a locked access. The mechanisms used by the Root Complex to interface PCI Express to the Host CPU(s) are outside the scope of this document.
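The Switch blocking rules of Section 7.5.3 can be condensed into a forwarding check. The Python sketch below is illustrative only, and its assumptions are:
• `LockingSwitch` and its methods are hypothetical.
• The TC-to-VC map is keyed by (Egress Port, TC), and unmapped entries default to VC0.
• Unblocking on Unlock is instantaneous.
• The non-default-VC exception applies only to Requests headed for the locked Egress Port. This is one reading of the rule.

```python
class LockingSwitch:
    """Hypothetical forwarding check for the Switch lock rules (Section 7.5.3)."""

    def __init__(self, tc_to_vc):
        self.tc_to_vc = tc_to_vc      # (egress_port, tc) -> VC ID; default VC0
        self.lock_ingress = None      # Port that received the MRdLk
        self.lock_egress = None       # Port the MRdLk was forwarded to
        self.established = False      # True once the first CplDLk has returned

    def _vc(self, port, tc):
        return self.tc_to_vc.get((port, tc), 0)

    def mrdlk(self, ingress, egress):
        if self.lock_egress is not None and egress != self.lock_egress:
            raise RuntimeError("split lock: Switch behavior is undefined")
        self.lock_ingress, self.lock_egress = ingress, egress

    def cpldlk_returned(self):
        self.established = True

    def unlock(self):
        self.lock_ingress = self.lock_egress = None
        self.established = False

    def may_forward(self, src, dst, tc):
        """May a Request not belonging to the locked sequence go src -> dst?"""
        if self.lock_egress is None:
            return True
        if dst == self.lock_egress:
            # VC0 traffic to the Egress Port is blocked; non-default VCs pass.
            return self._vc(dst, tc) != 0
        if self.established and dst == self.lock_ingress and src != self.lock_egress:
            # Once the lock is established, other Ports may not reach the Ingress Port.
            return False
        return True
```

Ports uninvolved in the lock keep forwarding to each other throughout, matching the rule that they are unaffected.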


7.5.6. Legacy Endpoints

Legacy Endpoints are permitted to support locked accesses, although their use is discouraged. If locked accesses are supported, Legacy Endpoints must handle them as follows:
• The Legacy Endpoint becomes locked when it Transmits the first Completion for the first Read Request of the locked access
  o Once locked, the Legacy Endpoint must remain locked until it receives the Unlock Message
• While locked, a Legacy Endpoint must not issue any Requests using Traffic Classes which map to the default Virtual Channel (VC0)
  Note that if the Endpoint has more than one possible source of Requests, this requirement applies to all of them.
  o Requests may be issued using Traffic Classes which map to VCs other than the default Virtual Channel

7.5.7. PCI Express Endpoints

PCI Express Endpoints do not support lock. A PCI Express Endpoint must treat an MRdLk Request as an Unsupported Request (see Chapter 2).

7.6. PCI Express Reset - Rules

This section specifies the behavior of PCI Express Link reset. The reset can be generated by the platform or on the component. Any relationship between the PCI Express Link reset and a component reset is component specific, and any relationship with a platform reset is platform specific.
• There must be a hardware mechanism for setting or returning all Port state to the initial conditions specified in this document – this mechanism is called “Power Good Reset”
  o A “Power Good Reset” will occur following the application of power to the component. This is called a “cold” reset
  o In some cases, hardware may be able to trigger the “Power Good Reset” mechanism without removing and re-applying power to the component. This is called a “warm” reset
  o Note that there is also an in-band mechanism for propagating reset across a Link. This is called a “hot” reset and is described in Section 4.2.4.5.
  o Note also that entering the DL_Inactive state is in some ways identical to a “hot” reset – see Section 2.13.
• On exit from any type of reset (cold, warm, or hot), all Port registers and state machines must be set to their initialization values as specified in this document


• On exit from a “Power Good Reset”, the Physical Layer will attempt to bring up the Link (see Section 4.2.5). Once both components on a Link have entered the initial Link Training state, they will proceed through Link initialization for the Physical Layer and then through Flow Control initialization for VC0, making the Data Link and Transaction Layers ready to use the Link
  o Following Flow Control initialization for VC0, it is possible for TLPs and DLLPs to be transferred across the Link

Following a reset, some devices may require additional time before they are able to respond to Requests they receive. Components and devices must behave deterministically, particularly for Configuration Requests, and the following rules address this.

The first set of rules addresses requirements for components and devices:
• A component must enter the initial active Link Training state within 80 ms of the end of “Power Good Reset” (Link Training is described in Section 4.2.5)
  o Note: In some systems, it is possible that the two components on a Link may exit “Power Good Reset” at different times. Each component must observe the requirement to enter the initial active Link Training state within 80 ms of the end of “Power Good Reset” from its own point of view.
• On the completion of Link Training (entering the DL_Active state, see Section 3.2), a component must be able to receive and process TLPs and DLLPs

The second set of rules addresses requirements placed on the system:
• To allow components to perform internal initialization, system software must wait at least 100 ms from the end of a reset (cold/warm/hot) before it is permitted to issue Configuration Requests to PCI Express devices
  o A system must guarantee that all components intended to be software visible at boot time are ready to receive Configuration Requests within 100 ms of the end of “Power Good Reset” – how this is done is beyond the scope of this specification
• The Root Complex and/or system software must allow 1.0 s (+50% / -0%) after a reset (hot/warm/cold) before it may determine that a device is broken because it fails to return a Successful Completion status for a valid Configuration Request
  o That is, if the Root Complex repeats Configuration Requests terminated with Configuration Request Retry Status, it must keep repeating the Request(s) until 1 s after T_0RC. At that point it is permitted to terminate the Request as a UR
  o Note: This delay is analogous to the T_rhfa parameter specified for PCI/PCI-X, and is intended to allow an adequate amount of time for devices which require self initialization.
• When attempting a Configuration access to devices on a PCI or PCI-X segment behind a PCI Express/PCI(-X) Bridge, the timing parameter T_rhfa must be respected
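The system-side timing rules can be sketched as an enumeration helper. The Python sketch below rests on these assumptions:
• `probe_device` is hypothetical.
• The hypothetical `config_read` callback returns `("SC", value)` for a Successful Completion or `("CRS", None)` for Configuration Request Retry Status.
• The clock and sleep functions are injectable so the timing can be tested.
• The 10 ms retry interval is an arbitrary choice, and every non-SC status is retried.

The helper waits the 100 ms minimum. It then keeps retrying until at least 1.0 s after the end of reset before treating the device as broken.

```python
import time

CONFIG_READY_DELAY_S = 0.100   # minimum wait after reset before Configuration Requests
BROKEN_DEVICE_TIMEOUT_S = 1.0  # minimum time before a device may be declared broken


def probe_device(config_read, reset_end, now=time.monotonic, sleep=time.sleep,
                 retry_interval_s=0.010):
    """Return the value from a Successful Completion, or None if the device is
    deemed broken. reset_end is the now()-based time at which reset ended."""
    wait = reset_end + CONFIG_READY_DELAY_S - now()
    if wait > 0:
        sleep(wait)
    while True:
        status, value = config_read()
        if status == "SC":
            return value
        if now() - reset_end >= BROKEN_DEVICE_TIMEOUT_S:
            return None  # permitted to give up (e.g., terminate the Request as UR)
        sleep(retry_interval_s)
```

The timeout check runs only after a failed attempt. A device that becomes ready just before the deadline is therefore still found.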


When a Link is in normal operation, the following rules apply:
• If a normally operating Link goes down for any reason, the Transaction and Data Link Layers will enter the DL_Inactive state (see Sections 2.13 and 3.2.1)
• For any virtual or actual PCI Bridge, any of the following must cause a reset of the secondary side of the Bridge using the Physical Layer mechanism for communicating Link Reset (see Section 4.2.4.5):
  o Setting the Secondary Bus Reset bit of the Bridge Control register
  o Entering DL_Inactive on the primary side of the Bridge
  o Link reset using the Physical Layer mechanism for communicating Link Reset

Certain aspects of “Power Good Reset” are specified in this document, and others are specific to a platform, form factor, and/or implementation. Specific platforms, form factors, or application spaces may require additional specification of the timing and/or sequencing relationships between the components of the system for “Power Good Reset”. For example, it might be required that all PCI Express components within a chassis observe the assertion and deassertion of “Power Good Reset” at the same time (to within some tolerance).
In a multi-chassis environment, it might be necessary to specify that the chassis containing the Root Complex be the last to exit “Power Good Reset.”

In all cases where power is supplied, the following parameters must be defined:
• T_pvpgl – “Power Good” must remain inactive at least this long after power becomes valid
• T_pwrgd – When deasserted, “Power Good” must remain deasserted at least this long
• T_fail – When power becomes invalid, “Power Good” must be deasserted within this time
Additional parameters may be specified.

In all cases where a reference clock is supplied, the following parameter must be defined:
• T_pwrgd-clk – “Power Good” must remain inactive at least this long after any supplied reference clock is stable
Additional parameters may be specified.
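The values of these parameters are platform specific, so a checker has to take them as inputs. The Python sketch below validates timestamps captured from one power-up and one power failure against such values. The `PowerGoodParams` name, the function names, and the numbers used in any example are placeholders, not values defined by this specification.

```python
from dataclasses import dataclass


@dataclass
class PowerGoodParams:
    """Platform-defined limits in seconds (placeholder values in examples)."""
    t_pvpgl: float      # min time Power Good stays inactive after power is valid
    t_pwrgd: float      # min time Power Good stays deasserted once deasserted
    t_fail: float       # max time to deassert Power Good after power becomes invalid
    t_pwrgd_clk: float  # min time Power Good stays inactive after refclk is stable


def check_power_up(p, power_valid, clock_stable, pg_assert):
    """Check one power-up; clock_stable is None if no reference clock is supplied.
    Returns the names of the violated parameters."""
    errors = []
    if pg_assert - power_valid < p.t_pvpgl:
        errors.append("T_pvpgl")
    if clock_stable is not None and pg_assert - clock_stable < p.t_pwrgd_clk:
        errors.append("T_pwrgd-clk")
    return errors


def check_power_fail(p, power_invalid, pg_deassert, pg_reassert=None):
    """Check one power failure and, optionally, the following reassertion."""
    errors = []
    if pg_deassert - power_invalid > p.t_fail:
        errors.append("T_fail")
    if pg_reassert is not None and pg_reassert - pg_deassert < p.t_pwrgd:
        errors.append("T_pwrgd")
    return errors
```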


7.7. PCI Express Native Hot Plug Support

The PCI Express architecture is designed to natively support both hot plug and hot removal of devices. This section defines the standard usage model for all PCI Express form factors that support hot plug and hot removal of devices. This usage model provides the foundation for how indicators and push-buttons should behave if implemented in a system. The definitions of indicators and push-buttons apply to all PCI Express Hot-Plug models.

7.7.1. PCI Express Hot Plug Usage Model

7.7.1.1. Why Specify a Usage Model?

A standard usage model is beneficial to customers who buy systems with hot-plug slots, because many customers use hardware and software from different vendors. A standard usage model allows customers to use the hot-plug slots on all of their systems without having to retrain operators. The PCI Express Hot-Plug standard usage model is derived from the standard usage model defined in the PCI Standard Hot-Plug Controller and Subsystem Specification, Rev 1.0, and is identical from the user perspective.
Note that only slight changes were made to the register definitions. Conformance to the standard usage model is required for all PCI Express form factors that implement hot plug and use indicators and buttons.

Implementation Note: All PCI Express Form Factors that Support Hot-Plug/Remove Should Not Deviate from the Standard Usage Model

Deviating from the Standard Usage Model makes the solution non-compliant with PCI Express and creates issues that would not exist otherwise, such as:
• User confusion
• More extensive hardware testing
• Functional incompatibilities with system software
• Encountering untested paths in system software


7.7.1.2. Elements of the Standard Usage Model

Table 7-5: Elements of the Standard Usage Model
Element | Purpose
Indicators | Shows the power and attention state of the slot
Manually-operated Retention Latch (MRL) | Holds add-in cards in place
MRL Sensor | Allows the port and system software to detect the MRL being opened
Electromechanical Interlock | Prevents removal of add-in cards while slot is powered
Attention Button | Allows user to request hot-plug operations
Software User Interface | Allows user to request hot-plug operations
Slot Numbering | Provides visual identification of slots

7.7.1.2.1. Indicators

The Standard Usage Model defines two indicators: the Power Indicator and the Attention Indicator. The Platform can provide the two indicators per slot or module bay, and the indicators can be implemented on the chassis or on the module; see the form factor hot-plug requirements for implementation details. Each indicator is in one of three states: on, off, or blinking. Hot-plug system software has exclusive control of the indicator states, which it sets by writing the command status registers associated with the indicator.

The Hot-Plug capable port controls blink frequency, duty cycle, and phase. Blinking indicators operate at a frequency of 1 to 2 Hz and a 50% (+/- 5%) duty cycle. Blinking indicators are not required to be synchronous or in phase between ports.

If indicators are implemented on the chassis, they must be placed in close proximity to their associated hot-plug slot so that the association between the indicators and the hot-plug slot is clear.

Both indicators are completely under the control of system software. The Switch device or Root Port never changes the state of an indicator in response to an event such as a power fault or unexpected MRL opening unless commanded to do so by software. An exception is granted to Platforms capable of detecting stuck-on power faults.
In the specific case of a stuck-on power fault, the Platform is permitted to override the Switch device or Root Port and force the Power Indicator to be on (as an indication that the add-in card should not be removed). In all cases, the port's internal state for the Power Indicator must match the software-selected state. The handling of stuck-on faults by system software is optional and not described elsewhere. Therefore, the Platform vendor must ensure that this optional feature of the Standard Usage Model is addressed via other software, Platform documentation, or other means.
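The blink requirements translate directly into a check on a measured period and on-time. This is a sketch with a hypothetical function name; it assumes the +/- 5% duty-cycle tolerance means percentage points (45% to 55%), which the text above does not state explicitly.

```python
def blink_within_spec(period_s: float, on_time_s: float,
                      duty_tolerance: float = 0.05) -> bool:
    """Check a blinking indicator against 1 to 2 Hz and a 50% (+/- 5%) duty cycle.

    Assumes the tolerance is in percentage points, i.e. 45% to 55% on-time.
    """
    if period_s <= 0 or not 0 <= on_time_s <= period_s:
        return False
    frequency_hz = 1.0 / period_s
    duty_cycle = on_time_s / period_s
    return (1.0 <= frequency_hz <= 2.0
            and 0.5 - duty_tolerance <= duty_cycle <= 0.5 + duty_tolerance)


print(blink_within_spec(0.75, 0.375))  # 1.33 Hz at 50% duty: True
```

Because blinking does not have to be synchronized between ports, a check like this applies to each indicator in isolation.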


7.7.1.2.1.1 Attention Indicator

The Attention Indicator is yellow or amber in color and is used to indicate that an operational problem exists or that the hot-plug slot is being identified so that a human operator can locate it easily.

Table 7-6: Attention Indicator States
Indicator Appearance | Meaning
Off | Normal - Normal operation
On | Attention - Operational problem at this slot
Blinking | Locate - Slot is being identified at the user’s request

Attention Indicator Off
When the Attention Indicator is off, neither the add-in card (if one is present) nor the hot-plug slot requires attention.

Attention Indicator On
When the Attention Indicator is on, an operational problem exists at the card or slot.

An operational problem is a condition that prevents continued operation of an add-in card. The operating system or other system software determines whether a specific condition prevents continued operation of an add-in card and whether lighting the Attention Indicator is appropriate. Examples of operational problems include problems related to external cabling, add-in cards, software drivers, and power faults. In general, when the Attention Indicator is on, it means that an operation was attempted and failed or that an unexpected event occurred.

The Attention Indicator is not used to report problems detected while validating the request for a hot-plug operation. Validation is a term applied to any check that system software performs to assure that the requested operation is viable, permitted, and will not cause problems. Examples of validation failures include denial of permission to perform a hot-plug operation, insufficient power budget, and other conditions that may be detected before an operation begins.

Attention Indicator Blinking
When the Attention Indicator is blinking, system software is identifying this slot for a human operator to find.
This behavior is controlled by a user (for example, from a software user interface or management tool).


7.7.1.2.1.2 Power Indicator

The Power Indicator is green in color and is used to indicate the power state of the slot.

Table 7-7: Power Indicator States
Indicator Appearance | Meaning
Off | Power Off - Insertion or removal of add-in cards is permitted. All supply voltages (except Vaux) have been removed from the slot if required for add-in card removal. Note that Vaux is removed when the MRL is open.
On | Power On - The slot is powered on. Insertion or removal of add-in cards is not permitted.
Blinking | Power Transition - The slot is in the process of powering up or down. Insertion or removal of add-in cards is not permitted.

Power Indicator Off
When the Power Indicator is off, insertion or removal of an add-in card is permitted. Main power to the slot is off if required by the form factor; the PCI Express card form factor is an example of a form factor that requires main power removal. If the Platform provides Vaux to hot-plug slots and the MRL is closed, any signals switched by the MRL are connected to the slot even when the Power Indicator is off. Signals switched by the MRL are disconnected when the MRL is opened. System software must cause a slot’s Power Indicator to be turned off when the slot is not powered and/or it is permissible to insert or remove add-in cards. See the appropriate electromechanical specifications for form factor details.

Power Indicator On
When the Power Indicator is on, main power to the slot is on and insertion or removal of an add-in card is not permitted.

Power Indicator Blinking
When the Power Indicator is blinking, the slot is powering up or powering down and insertion or removal of an add-in card is not permitted. A blinking Power Indicator also provides visual feedback to the human operator when the Attention Button is pressed.

7.7.1.2.2. Manually-operated Retention Latch (MRL)

An MRL is a manually-operated retention mechanism that holds an add-in card in the slot and prevents the user from removing the card. The MRL rigidly holds the card in the slot so that cables may be attached without the risk of creating intermittent contact. MRLs that hold down two or more add-in cards simultaneously are permitted in Platforms that do not provide MRL Sensors.
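Tables 7-6 and 7-7 can be captured as a lookup plus one rule that a hot-plug utility might enforce before telling an operator to pull a card. The names below are hypothetical; the meanings are transcribed from the two tables.

```python
from enum import Enum


class IndicatorState(Enum):
    OFF = "off"
    ON = "on"
    BLINKING = "blinking"


# Meanings transcribed from Table 7-6 (Attention) and Table 7-7 (Power).
ATTENTION_MEANING = {
    IndicatorState.OFF: "Normal",
    IndicatorState.ON: "Attention",
    IndicatorState.BLINKING: "Locate",
}
POWER_MEANING = {
    IndicatorState.OFF: "Power Off",
    IndicatorState.ON: "Power On",
    IndicatorState.BLINKING: "Power Transition",
}


def card_insertion_or_removal_permitted(power_indicator: IndicatorState) -> bool:
    """Per Table 7-7, only a Power Indicator that is off permits insertion or removal."""
    return power_indicator is IndicatorState.OFF


print(POWER_MEANING[IndicatorState.BLINKING])  # Power Transition
```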


7.7.1.2.3. MRL Sensor

The MRL Sensor is a switch, optical device, or other type of sensor that reports the position of a slot’s MRL to the port. The MRL Sensor reports closed when the MRL is fully closed and open at all other times (that is, fully open and intermediate positions).

If Vaux is wired to hot-plug slots, the signals switched by the MRL must be automatically removed from the slot when the MRL Sensor indicates that the MRL is open and must be restored to the slot when the MRL Sensor indicates that the MRL has closed again.

The MRL Sensor allows the port to monitor the position of the MRL and therefore to detect unexpected openings of the MRL. When an unexpected opening of the MRL associated with a slot is detected, the port changes the state of that slot to disabled and notifies system software. The port does not autonomously change the state of either the Power Indicator or the Attention Indicator.

7.7.1.2.4. Electromechanical Interlock

An electromechanical interlock is a mechanism for physically locking the add-in card or MRL in place until the system software and port release it. Implementation of the interlock is optional. There is no mechanism in the programming interface for explicit control of electromechanical interlocks. The Standard Usage Model assumes that if electromechanical interlocks are implemented, they are controlled by the same port output signal that enables main power to the slot. Systems may optionally expand control of interlocks to provide physical security of the add-in cards.

7.7.1.2.5. Attention Button

An Attention Button is a momentary-contact push-button, located adjacent to each hot-plug slot or on a module, that is pressed by the user to initiate a hot-insertion or a hot-removal at that slot.

If the system software accepts the request initiated by the Attention Button, the Power Indicator provides visual feedback to the human operator by blinking.
Once the Power Indicator begins blinking, a 5-second abort interval exists during which a second press of the Attention Button cancels the operation.

If an operation initiated by an Attention Button fails for any reason, it is recommended that system software present a message explaining the failure via a software user interface or add the message to a system log.

7.7.1.2.6. Software User Interface

System software provides a user interface that allows hot-insertions and hot-removals to be initiated and that allows occupied slots to be monitored. A detailed discussion of hot-plug user interfaces is operating system specific and is therefore beyond the scope of this document.

On systems with multiple hot-plug slots, the system software must allow the user to initiate operations at each slot independent of the states of all other slots. Therefore, the user is permitted to initiate a hot-plug operation on one slot using either the software user interface or the Attention Button while a hot-plug operation on another slot is in process, regardless of which interface was used to start the first operation.

7.7.1.2.7. Slot Numbering

A Physical Slot Identifier (as defined in PCI HP 1.1, Section 1.5) consists of an optional chassis number and the physical slot number of the hot-plug slot. System software determines the physical slot number from registers in the port. The chassis number is 0 for the main chassis. The chassis number for any other chassis must be a non-zero value obtained from a PCI-to-PCI bridge’s Chassis Number register (see PCI Bridge 1.1, Section 13.4).

The Standard Usage Model also requires that each physical slot number be globally unique within a chassis.

7.7.2. Event Behavior

Depending on the power state of the Switch device or Root Port, it may be programmed to generate a system interrupt or PME (see Table 7-8).

Table 7-8: Event Behavior
Event | Register Bit Set When Detected | Cleared by | Port Optionally Generates the Following When Event is Detected
Presence Detect Change | Presence Detect Event Status | Writing a 1 to the detected bit | System Interrupt, PME
Attention Button Pressed | Attention Button Pressed Event | Writing a 1 to the detected bit | System Interrupt, PME
MRL Sensor Changed | MRL Sensor Change Detected Event | Writing a 1 to the detected bit | System Interrupt, PME
Power Fault | Power Fault Detected Event | Writing a 1 to the detected bit | System Interrupt, PME

7.7.3. Registers Grouped by Device Association

The registers listed below are grouped by device to convey all registers associated with implementing each device in ports. These registers are unique to each Switch device or Root Port implementing hot-plug slots.
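Every event in Table 7-8 is cleared by writing a 1 to its status bit, which is the RW1C attribute used in the register tables of Section 7.7.3. A minimal Python model of that behavior follows; the bit positions are hypothetical because this section names the bits but does not give their offsets.

```python
from enum import IntFlag


class SlotEvent(IntFlag):
    """Event status bits named in Table 7-8 (bit positions here are hypothetical)."""
    ATTENTION_BUTTON_PRESSED = 1 << 0
    POWER_FAULT_DETECTED = 1 << 1
    MRL_SENSOR_CHANGED = 1 << 2
    PRESENCE_DETECT_CHANGED = 1 << 3


class SlotStatusModel:
    """Software model of RW1C status bits: hardware sets them, writing 1 clears them."""

    def __init__(self) -> None:
        self.bits = SlotEvent(0)

    def hardware_detects(self, event: SlotEvent) -> None:
        self.bits |= event

    def software_writes(self, data: SlotEvent) -> None:
        # RW1C: a 1 in the written data clears that bit; a 0 leaves it unchanged.
        self.bits = SlotEvent(int(self.bits) & ~int(data))


status = SlotStatusModel()
status.hardware_detects(SlotEvent.POWER_FAULT_DETECTED)
status.hardware_detects(SlotEvent.MRL_SENSOR_CHANGED)
pending = status.bits                                    # software reads both events
status.software_writes(SlotEvent.POWER_FAULT_DETECTED)   # and acknowledges only one
```

Writing back exactly the bits that were read acknowledges only those events, so an event that hardware sets between the read and the write stays pending instead of being lost.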


7.7.3.1. Attention Button Registers

Description | Register Attribute | Default Value
Attention Button Present – This bit indicates if an Attention Button is implemented on the chassis or card. | HwInit | N/A
Attention Button Pressed – This bit is set when the Attention Button is pressed. This register is set by the debounced output of an Attention Button. This bit is also set by the port receiving the Attention_Button_Pressed message from the end device. | RW1C | 0
Attention Button Pressed Enable – This bit, when set, enables the generation of the hot plug interrupt or a wake signal on an Attention Button Pressed event. | RW | 0

7.7.3.2. Attention Indicator Registers

Description | Register Attribute | Default Value
Attention Indicator Present – This bit indicates if an Attention Indicator is implemented on the chassis or card. | HwInit | N/A
Attention Indicator Control – When read, this register returns the current state of the Attention Indicator; when written, the Attention Indicator is set to this state. If an Attention Indicator is implemented on the card, when this register is written, the port will send the appropriate Attention Indicator message (determined by the decoding) to the device on the card. Defined encodings are: 00b = Reserved, 01b = On, 10b = Blink, 11b = Off. | RW | N/A
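The Indicator Control encoding, and the card-mounted case in which the port turns a register write into a message, can be sketched as a small decoder. The function names are hypothetical; the encodings and message names come from Sections 7.7.3.2 and 7.7.4.1.

```python
INDICATOR_CONTROL = {0b01: "ON", 0b10: "BLINK", 0b11: "OFF"}  # 00b is reserved


def decode_indicator_control(field: int) -> str:
    """Decode the 2-bit Attention/Power Indicator Control field."""
    if not 0 <= field <= 0b11:
        raise ValueError("Indicator Control is a 2-bit field")
    if field == 0b00:
        raise ValueError("00b is a reserved encoding")
    return INDICATOR_CONTROL[field]


def attention_indicator_message(field: int) -> str:
    """Message a port would send when the Attention Indicator is on the card."""
    return "ATTENTION_INDICATOR_" + decode_indicator_control(field)


print(attention_indicator_message(0b10))  # ATTENTION_INDICATOR_BLINK
```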


7.7.3.3. Power Indicator Registers

Description | Register Attribute | Default Value
Power Indicator Present – This bit indicates if a Power Indicator is implemented on the chassis or card. | HwInit | N/A
Power Indicator Control – When read, this register returns the current state of the Power Indicator; when written, the Power Indicator is set to this state. If a Power Indicator is implemented on the card, when this register is written, the port will send the appropriate Power Indicator message (determined by the decoding) to the device on the card. Defined encodings are: 00b = Reserved, 01b = On, 10b = Blink, 11b = Off. | RW | N/A

7.7.3.4. Power Controller Registers

Description | Register Attribute | Default Value
Power Controller Present – This bit indicates if a Power Controller is implemented for this slot. | HwInit | N/A
Power Controller Control – When read, this register returns the current state of the power applied to the slot; when written, the Power Controller turns power to the slot on or off. Defined encodings are: 0b = Power On, 1b = Power Off. | RW | N/A
Power Fault Detected – This bit is set when the Power Controller detects a power fault at this slot. | RW1C | 0
Power Fault Detected Enable – This bit, when set, enables the generation of the hot plug interrupt or a wake signal on a power fault event. | RW | 0
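Because the Power Controller Control bit uses 0b for Power On, it is easy to get the polarity backwards. The sketch below pairs it with the Power Indicator states from Table 7-7. `SlotControlModel` and the write ordering are assumptions for illustration; real software would also wait for command completion and for power to stabilize between steps, which this model omits.

```python
POWER_ON, POWER_OFF = 0b0, 0b1                 # Power Controller Control (0b = on)
IND_ON, IND_BLINK, IND_OFF = 0b01, 0b10, 0b11   # Indicator Control encodings


class SlotControlModel:
    """Hypothetical stand-in for a port's slot control fields; records each write."""

    def __init__(self) -> None:
        self.power_controller = POWER_OFF
        self.power_indicator = IND_OFF
        self.writes: list = []

    def write(self, field: str, value: int) -> None:
        setattr(self, field, value)
        self.writes.append((field, value))


def power_up(slot: SlotControlModel) -> None:
    slot.write("power_indicator", IND_BLINK)   # Power Transition (Table 7-7)
    slot.write("power_controller", POWER_ON)   # writing 0 turns power on
    slot.write("power_indicator", IND_ON)      # Power On


def power_down(slot: SlotControlModel) -> None:
    slot.write("power_indicator", IND_BLINK)
    slot.write("power_controller", POWER_OFF)
    slot.write("power_indicator", IND_OFF)     # insertion/removal now permitted
```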


7.7.3.5. Presence Detect Registers

Description | Register Attribute | Default Value
Presence Detect State – This bit indicates the presence of a card in the slot. The bit reflects the status of the Presence Detect pin as defined in the PCI Express Card Electromechanical Specification. Defined encodings are: 0b = Slot Empty, 1b = Card Present in slot. This register is required to be implemented on all Switch devices and Root Ports. The presence detect pin for Switch devices or Root Ports not connected to slots should be hardwired to 1. | RO | N/A
Presence Detect Changed Event – This bit is set when the value of Presence Detect State changes. | RW1C | 0

7.7.3.6. MRL Sensor Registers

Description | Register Attribute | Default Value
MRL Sensor Present – This bit indicates if an MRL Sensor is implemented on the chassis. | HwInit | N/A
MRL Sensor Changed – This bit is set when the value of the MRL Sensor State changes. | RW1C | 0
Presence Detect Changed Enable – This bit, when set, enables the generation of the hot plug interrupt or a wake signal on a presence detect changed event. | RW | 0
MRL Sensor State – This register reports the status of the MRL sensor if it is implemented. Defined encodings are: 0b = MRL Closed, 1b = MRL Open. | RO | N/A
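System software typically combines Presence Detect State and MRL Sensor State before acting on a slot. The policy below (apply power only when a card is present and the MRL is closed) is a hypothetical example, not a rule stated in this section; the encodings are the ones listed above. It also assumes that when no MRL Sensor is present, the MRL Sensor State field carries no meaning and is ignored.

```python
def card_present(presence_detect_state: int) -> bool:
    return presence_detect_state == 0b1          # 1b = Card Present in slot


def mrl_closed(mrl_sensor_present: bool, mrl_sensor_state: int) -> bool:
    # Assumption: without an MRL Sensor there is no state to honor.
    return (not mrl_sensor_present) or mrl_sensor_state == 0b0   # 0b = MRL Closed


def ready_to_power(presence_detect_state: int, mrl_sensor_present: bool,
                   mrl_sensor_state: int) -> bool:
    """Hypothetical policy: power only a present card behind a closed MRL."""
    return (card_present(presence_detect_state)
            and mrl_closed(mrl_sensor_present, mrl_sensor_state))
```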


7.7.3.7. Port Capabilities and Slot Information Registers

Description | Register Attribute | Default Value
Slot Implemented – This bit, when set, indicates that the Link associated with this downstream port is connected to a slot, as opposed to being connected to an integrated device or being disabled. | HwInit | N/A
Physical Slot Number – This hardware-initialized field indicates the physical slot number attached to the port. This field must be hardware initialized to a value that assigns a slot number that is globally unique within the chassis. This field should be initialized to 0 for ports connected to integrated devices on the motherboard or integrated within the same silicon as the Switch device or Root Port. | HwInit | N/A
Hot-Plug Capable – This bit, when set, indicates this slot is capable of supporting Hot-Plug. | HwInit | N/A
Hot-Plug Surprise – This bit, when set, indicates that the device might be removed from the system without any prior notification. | HwInit | N/A

7.7.3.8. Hot Plug Interrupt Control Registers

Description | Register Attribute | Default Value
Hot Plug Interrupt Enable – This bit, when set, enables generation of the hot plug interrupt on enabled hot plug events. | RW | 0
Command Completed Interrupt Enable – This bit, when set, enables the generation of the hot plug interrupt when a command is completed by the hot plug control logic. | RW | 0
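The Physical Slot Number requirement can be checked by configuration software that walks all Downstream Ports in a chassis. This sketch assumes that ports with Slot Implemented clear (integrated devices, number 0) are excluded from the check; the data structure and function names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class PortSlotInfo:
    chassis: int               # 0 for the main chassis
    slot_implemented: bool     # Slot Implemented bit
    physical_slot_number: int  # Physical Slot Number field


def duplicate_slot_ids(ports: Iterable[PortSlotInfo]) -> List[Tuple[int, int]]:
    """Return (chassis, slot) pairs used by more than one slot-connected port."""
    counts = Counter((p.chassis, p.physical_slot_number)
                     for p in ports if p.slot_implemented)
    return sorted(key for key, n in counts.items() if n > 1)


ports = [
    PortSlotInfo(0, True, 1),
    PortSlotInfo(0, True, 2),
    PortSlotInfo(0, False, 0),   # integrated device: slot number 0, not checked
    PortSlotInfo(0, False, 0),
    PortSlotInfo(1, True, 1),    # same number in a different chassis is allowed
]
print(duplicate_slot_ids(ports))  # []
```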


7.7.4. Messages

The messages defined here allow cards to implement indicators and buttons on the card without having to connect signals directly to the port. A detailed explanation of each message is located in Chapter 2.

7.7.4.1. Messages for Attention Indicator

This series of messages allows the Attention Indicator to be implemented on the card as opposed to the chassis. These messages are sent by the downstream port to the device and instruct the device to set its Attention Indicator to the indicated state. The following messages are used:
• ATTENTION_INDICATOR_ON
• ATTENTION_INDICATOR_BLINK
• ATTENTION_INDICATOR_OFF

All Endpoint devices are required to handle the Attention Indicator messages even if the device does not implement the indicators.

7.7.4.2. Messages for Power Indicator

This series of messages allows the Power Indicator to be implemented on the card as opposed to the chassis. These messages are sent by the downstream port to the device and instruct the device to set its Power Indicator to the indicated state. The following messages are used:
• POWER_INDICATOR_ON
• POWER_INDICATOR_BLINK
• POWER_INDICATOR_OFF

All Endpoint devices are required to handle the Power Indicator messages even if the device does not implement the indicators.

7.7.4.3. Messages for Attention Button

ATTENTION_BUTTON_PRESSED - This message allows the attention button to be implemented on the card and informs the port that the attention button has been pressed. Upon receipt of this message, the port terminates the message and sets the Attention Button Pressed bit in the Hot-plug Event Register.

All downstream ports of switches and root ports are required to handle the Attention_Button_Pressed message.
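The message rules above mean an Endpoint must accept all six indicator messages whether or not it has the LEDs, and a downstream port consumes ATTENTION_BUTTON_PRESSED rather than forwarding it. A minimal model follows, with hypothetical class names and plain strings standing in for the message encodings defined in Chapter 2.

```python
INDICATOR_MESSAGES = {
    f"{indicator}_INDICATOR_{state}"
    for indicator in ("ATTENTION", "POWER")
    for state in ("ON", "BLINK", "OFF")
}


class EndpointModel:
    """Hypothetical Endpoint: accepts every indicator message, drives only LEDs it has."""

    def __init__(self, has_attention: bool = False, has_power: bool = False) -> None:
        self.present = {"ATTENTION": has_attention, "POWER": has_power}
        self.led = {"ATTENTION": None, "POWER": None}

    def receive(self, message: str) -> bool:
        if message not in INDICATOR_MESSAGES:
            return False                  # not an indicator message; handled elsewhere
        indicator, _, state = message.partition("_INDICATOR_")
        if self.present[indicator]:
            self.led[indicator] = state   # drive the on-card LED
        return True                       # handled, even with no LED to drive


class DownstreamPortModel:
    """Hypothetical port: terminates ATTENTION_BUTTON_PRESSED and latches the bit."""

    def __init__(self) -> None:
        self.attention_button_pressed = 0

    def receive_from_device(self, message: str) -> bool:
        if message == "ATTENTION_BUTTON_PRESSED":
            self.attention_button_pressed = 1   # set the bit; message goes no further
            return True
        return False
```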


7.7.5. PCI Express Hot Plug Interrupt/Wake Signal Logic

A port with hot plug capabilities supports generation of hot plug interrupts on the following hot plug events:
• Attention Button Pressed
• Power Fault Detected
• MRL Sensor Changed
• Presence Detect Changed

When the system is in a sleep state or the hot plug capable port is in device state D1, D2, or D3hot, the enabled hot plug controller events generate a wake message (using the PME mechanism) instead of a hot plug interrupt.

A hot plug capable port also supports generation of a hot plug interrupt when the hot plug control logic completes an issued command. However, if the system is in a sleep state or the hot plug capable port is in device state D1, D2, or D3hot, command completion does not generate a wake event.

Figure 7-8 shows the logical connection between the hot plug event logic and the system interrupt/wake generation logic.
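The choice between a hot plug interrupt and a wake message can be written as a small decision function. This sketch assumes the relevant event enable bits (and Hot Plug Interrupt Enable or PME_En) are already set; the function and event names are hypothetical.

```python
from enum import Enum


class DeviceState(Enum):
    D0 = "D0"
    D1 = "D1"
    D2 = "D2"
    D3HOT = "D3hot"


HOT_PLUG_EVENTS = {
    "attention_button_pressed",
    "power_fault_detected",
    "mrl_sensor_changed",
    "presence_detect_changed",
}


def notification_for(event: str, system_sleeping: bool, port_state: DeviceState) -> str:
    """Return "interrupt", "wake" or "none" for an event whose enable bits are set."""
    low_power = system_sleeping or port_state in (DeviceState.D1, DeviceState.D2,
                                                  DeviceState.D3HOT)
    if event in HOT_PLUG_EVENTS:
        return "wake" if low_power else "interrupt"   # wake uses the PME mechanism
    if event == "command_completed":
        return "none" if low_power else "interrupt"   # never a wake source
    raise ValueError(f"unknown hot plug event: {event}")
```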


[Figure 7-8: Hot Plug Logic. Slot Status Register bits (RW1C): Command Completed, Attention Button Pressed, Power Fault Detected, MRL Sensor Changed, Presence Detect Changed. Slot Control Register bits (read/write): Command Completed Interrupt Enable, Attention Button Pressed Enable, Power Fault Detected Enable, MRL Sensor Changed Enable, Presence Detect Changed Enable, Hot-plug Interrupt Enable. Outputs: INTx#, MSI generation logic (MSI Enable, SHPC Interrupt Message), and Wakeup Signal (PME_En bit or host bridge equivalent).]


7.7.6. The Operating System Hot Plug Method

Some systems include hot plug capable root ports and switches that are released before ACPI-compliant operating systems with native hot plug support are available. Such systems can use ACPI firmware to propagate hot plug events. Firmware control of the hot plug registers must be disabled if an operating system with native support is used. Platforms that provide ACPI firmware to propagate hot plug events must also provide a control method to transfer control to the operating system. This method is called Operating System Hot Plug (OSHP) and is provided for each port that is hot plug capable and being controlled by ACPI firmware.

Operating systems with native hot plug support must execute the OSHP method, if present, for each hot plug capable port before accessing the hot plug registers and when returning from a hibernated state. If a port’s OSHP method is executed multiple times and control has already been transferred to the operating system, the method must return successfully without doing anything. After the OSHP method is executed, the firmware must not access the port’s hot plug registers. If any signals such as the System Interrupt or PME# have been redirected for servicing by the firmware, they must be restored appropriately for operating system control.

The following is an example of a namespace entry for an SHPC that is managed by firmware.

    Device(PPB1) {
        // ...
        Method(OSHP, 0) {
            // Disable firmware access to SHPC and restore
            // the normal System Interrupt and Wakeup Signal
            // connection.
        }
        // ...
    }

Implementation Note: Controlling Hot Plug by Using ACPI

When using ACPI to control the hot plug events, the following should be considered:

Firmware should redirect the System Interrupt to a GPE so that ACPI can service the interrupts instead of the operating system. An appropriate _Exx GPE handler should be provided.
When an operating system with native hot plug support executes the OSHP method, the firmware restores the normal System Interrupt so the interrupts can be serviced by the operating system.
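The OSHP contract (transfer control once, succeed on every later call, and hand the System Interrupt back) can be modeled as follows. This is a hypothetical Python model of the behavior described above, not ACPI code; an actual OSHP method is written in ASL, as in the namespace example.

```python
from typing import Dict, Optional


class HotPlugFirmwareModel:
    """Hypothetical firmware-side state for one port that has an OSHP method."""

    def __init__(self) -> None:
        self.os_has_control = False
        self.system_interrupt_to_gpe = True   # redirected so ACPI can service it

    def oshp(self) -> bool:
        if not self.os_has_control:
            self.system_interrupt_to_gpe = False  # restore the normal System Interrupt
            self.os_has_control = True            # firmware stops touching the registers
        return True                               # repeat calls succeed and do nothing

    def firmware_may_access_registers(self) -> bool:
        return not self.os_has_control


def take_native_control(ports: Dict[str, Optional[HotPlugFirmwareModel]]) -> bool:
    """OS side: run OSHP, where present (None = absent), on every hot plug port first."""
    return all(fw.oshp() for fw in ports.values() if fw is not None)
```

Because the handoff must also be repeated when returning from a hibernated state, an operating system following this model would call `take_native_control` again on resume before touching the hot plug registers.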


7.8. Power Budgeting Capability

With the addition of hot plug capability for add-in cards, the system needs to be able to properly allocate power to any new devices added to the system. This capability is a separate and distinct function from power management, and a basic level of support is required to ensure proper operation of the system. The power budgeting concept puts in place the building blocks that allow devices to interact with the system to achieve these goals. There are many ways in which the system can implement the actual power management capabilities, and as such, they are beyond the scope of this specification.

Devices that will be present on hot-pluggable add-in cards are required to implement the power budgeting capabilities. Devices that are implemented for use on add-in cards or on the motherboard have the option of supporting the power budgeting capability. Devices that are designed for both add-in cards and modules must implement power budgeting. PCI Express requires the devices and/or add-in cards to remain under the configuration power limit specified in the corresponding electromechanical specification until they have been configured and enabled by the system. The system should guarantee that power has been properly budgeted prior to enabling an add-in card.

7.8.1. System Power Budgeting Process Recommendations

It is recommended that system firmware provide the power budget management agent with the following information:
• Total system power budget (power supply information).
• Total power allocated by system firmware (motherboard devices).
• Total number of slots and the types of slots.

System firmware is responsible for allocating power for all devices on the motherboard that do not have power budgeting capabilities.
The firmware may or may not include standard PCI Express devices that are connected to the standard power rails. When the firmware allocates the power for a device, it must set the SYSTEM_ALLOC bit to “1” to indicate that the power has been properly allocated. The power budget manager is responsible for allocating power to all PCI Express devices, including motherboard devices, that have the power budgeting capability and have not been marked allocated. The power budget manager is also responsible for determining whether hot-plugged devices can be budgeted and enabled in the system.

There are alternate methods that may provide the same functionality, and it is not required that the Power Budgeting Process be implemented in this manner.
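The recommendations in Section 7.8.1 suggest a simple admission check for hot-plugged devices. The sketch below assumes power figures in watts and uses hypothetical names and numbers; as the text notes, other methods may provide the same functionality.

```python
from dataclasses import dataclass, field
from typing import List


def needs_budget_manager(has_power_budgeting: bool, system_alloc: bool) -> bool:
    """Devices with the capability that firmware has not marked SYSTEM_ALLOC."""
    return has_power_budgeting and not system_alloc


@dataclass
class PowerBudget:
    total_w: float                  # total system power budget (power supply)
    firmware_allocated_w: float     # allocated by firmware (motherboard devices)
    granted_w: List[float] = field(default_factory=list)

    def available_w(self) -> float:
        return self.total_w - self.firmware_allocated_w - sum(self.granted_w)

    def try_admit(self, device_w: float) -> bool:
        """Enable a hot-plugged device only if its power fits in what remains."""
        if device_w <= self.available_w():
            self.granted_w.append(device_w)
            return True
        return False


budget = PowerBudget(total_w=300.0, firmware_allocated_w=200.0)
print(budget.try_admit(75.0), budget.try_admit(30.0))  # True False
```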


7.9. Slot Power Limit Control

PCI Express provides a mechanism for software-controlled limiting of the maximum power that a PCI Express card/module associated with a slot can consume. The key elements of this mechanism are:
• Slot Power Limit Value and Scale fields of the Slot Capability register implemented in the Downstream Ports of a Root Complex and a Switch
• Slot Power Limit Value and Scale fields of the Device Capability register implemented in the Upstream Ports of an Endpoint, Switch, and PCI Express-PCI Bridge
• Set_Slot_Power_Limit message that conveys the content of the Slot Power Limit Value and Scale fields of the Slot Capability register of the Downstream Port (of a Root Complex or a Switch) to the corresponding Slot Power Limit Value and Scale fields of the Device Capability register in the Upstream Port of the component connected to the same Link

Power limits on the platform are typically controlled by the software (for example, platform firmware) that comprehends the specifics of the platform, such as:
• partitioning of the platform, including slots for IO expansion using add-in cards/modules
• power delivery capabilities
• thermal capabilities

This software is responsible for correctly programming the Slot Power Limit Value and Scale fields of the Slot Capability registers of the Downstream Ports connected to IO expansion slots. After the value has been written into the register within the Downstream Port, it is conveyed to the other component connected to that port using the Set_Slot_Power_Limit message (see Section 2.8.1.5). The recipient of the message must use the value in the message data payload to limit usage of the power for the entire card/module, unless the card/module will never exceed the lowest value specified in the corresponding electromechanical specification.
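The Value and Scale pair carried in these registers and in Set_Slot_Power_Limit decodes to a single power figure. This sketch assumes the field widths and Scale encoding from the Slot Capability and Device Capability register definitions elsewhere in this specification (an 8-bit Value; Scale 00b = 1.0x, 01b = 0.1x, 10b = 0.01x, 11b = 0.001x watts). It works in integer milliwatts to avoid floating-point rounding, and the function name is hypothetical.

```python
# Slot Power Limit Scale (2-bit field) as milliwatts per unit of Value:
# 00b = 1.0x, 01b = 0.1x, 10b = 0.01x, 11b = 0.001x watts.
SCALE_MW_PER_UNIT = {0b00: 1000, 0b01: 100, 0b10: 10, 0b11: 1}


def slot_power_limit_mw(value: int, scale: int) -> int:
    """Decode Slot Power Limit Value (8 bits) and Scale (2 bits) into milliwatts."""
    if not 0 <= value <= 0xFF:
        raise ValueError("Slot Power Limit Value is an 8-bit field")
    if scale not in SCALE_MW_PER_UNIT:
        raise ValueError("Slot Power Limit Scale is a 2-bit field")
    return value * SCALE_MW_PER_UNIT[scale]


print(slot_power_limit_mw(25, 0b00), slot_power_limit_mw(250, 0b01))  # 25000 25000
```

The same limit can have more than one encoding, as the example shows; a driver comparing limits should compare decoded values rather than raw fields.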
It is assumed that device driver software associated with the card/module will be able (by reading the values of the Slot Power Limit Value and Scale fields of the Device Capability register) to configure the hardware of the card/module to guarantee that the card/module will not exceed the imposed limit. In the case where the platform imposes a limit that is below the minimum needed for adequate operation, the device driver will be able to communicate this discrepancy to higher level configuration software.

The following rules cover the Slot Power Limit control mechanism:

For Cards/Modules:
• Until and unless a Set_Slot_Power_Limit message is received indicating a Slot Power Limit value greater than the lowest value specified in the electromechanical specification for the card/module's form factor, the card/module must not consume more than the lowest value specified.


• A card/module must never consume more power than what was specified in the most recently received Set_Slot_Power_Limit message.
• Endpoint, Switch, and PCI Express-PCI Bridge components that are targeted for integration on a card/module where total consumed power is below the lowest limit defined for the targeted form factor are permitted to ignore Set_Slot_Power_Limit messages and to return a value of 0 in the Slot Power Limit Value and Scale fields of the Device Capability register.
• Such components still must be able to receive the Set_Slot_Power_Limit message correctly but simply discard the message.

For Root Complex and Switches which source slots:
• A Downstream Port must not transmit a Set_Slot_Power_Limit message which indicates a limit that is lower than the lowest value specified in the electromechanical specification for the slot's form factor.

Implementation Note: Slot Power Limit Control Registers

Typically, Slot Power Limit registers within Downstream Ports of a Root Complex or a Switch device will be programmed by platform-specific software. Some implementations may use a hardware method for initializing the values of these registers and therefore not require software support.

Endpoint, Switch, and PCI Express-PCI Bridge components that are targeted for integration on a card/module where total consumed power is below the lowest limit defined for that form factor are allowed to ignore Set_Slot_Power_Limit messages. Note that PCI Express components that take this implementation approach may not be compatible with potential future form factors. Such form factors may impose a lower power limit that is below the minimum required by a new card/module based on the existing component.
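Taken together, the two card/module rules reduce to one value: the most recently received Set_Slot_Power_Limit value, or the form factor's lowest value if no message has arrived yet. The first rule only binds until a greater value is received, while the second always applies. The sketch below uses hypothetical names and milliwatt units and does not model components that are permitted to ignore the message.

```python
from typing import Optional


class CardPowerLimitModel:
    """Card/module view of the Slot Power Limit rules (all values in mW)."""

    def __init__(self, form_factor_min_mw: int) -> None:
        self.form_factor_min_mw = form_factor_min_mw
        self.last_message_mw: Optional[int] = None

    def on_set_slot_power_limit(self, limit_mw: int) -> None:
        self.last_message_mw = limit_mw

    def allowed_mw(self) -> int:
        # No message yet: stay at or below the form factor's lowest value.
        # Otherwise the most recently received message always governs.
        if self.last_message_mw is None:
            return self.form_factor_min_mw
        return self.last_message_mw


def port_may_transmit(limit_mw: int, form_factor_min_mw: int) -> bool:
    """A Downstream Port must not send a limit below the form factor's lowest value."""
    return limit_mw >= form_factor_min_mw
```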


A. Isochronous Applications and Support

A.1. Introduction

Data traffic in a PCI Express environment can be generally classified into two categories: bulk data traffic and real-time traffic.

While a PCI Express Endpoint device with bulk data transfer generally requires high throughput and low latency to achieve good performance, it can tolerate occasional data transfers that complete with arbitrarily long delays. The normal semantics for general-purpose I/O transactions, as defined for the PCI Express default Traffic Class (TC0), are supported by the default PCI Express Virtual Channel (VC0). VC0 supports bulk data transfer by providing a “best-effort” class of service. Because there is no traffic regulation for VC0, during any given time period any device may issue more transactions than PCI Express Links can support and may saturate the physical PCI Express Links. Therefore, VC0 provides no guaranteed bandwidth or deterministic latency to the device. This is why the default general-purpose I/O Traffic Class is referred to as the "best-effort" Traffic Class.

On the other hand, a PCI Express Endpoint device with real-time data transfer requirements, such as audio and video data streaming, would continuously or periodically generate PCI Express transactions. The amount of bandwidth that the device can consume depends on the device’s requirements and may be subject to limitations imposed by the PCI Express platform software and hardware.
The isochronous data transfer protocol in PCI Express is designed to provide not only guaranteed data bandwidth but also deterministic service latency. The design goal of the isochronous mechanisms in PCI Express is to ensure that isochronous traffic receives its allocated bandwidth over a relevant time period while also preventing starvation of other, non-isochronous traffic.

Furthermore, there may be data traffic that requires a level of service between those required for bulk data traffic and for isochronous data traffic. Such transactions can generally be supported by using Traffic Classes TC1 to TC7 associated with differentiated services. However, the details of service policies for these Traffic Classes are not addressed in this section.

Two paradigms of PCI Express communication are supported by the PCI Express isochronous mechanisms: the Endpoint-to-Root-Complex communication model and the Peer-to-Peer (Endpoint-to-Endpoint) communication model. In the Endpoint-to-Root-Complex model, the primary isochronous traffic is memory read and write requests to the Root Complex and read completions from the Root Complex. In the Peer-to-Peer model, isochronous traffic is limited to unicast push-only transactions (memory writes or messages). The push-only transactions can be within a single host domain or across multiple host domains.

Figure A-1 shows an example of a simple system with both communication models. In the figure, devices A and B, called Requesters, are PCI Express Endpoint devices capable of issuing isochronous request transactions, while device C and the Root Complex, called Completers, are capable of being the targets of isochronous request transactions. An Endpoint-to-Root-Complex communication is established between device A and the Root Complex, and a Peer-to-Peer communication is established between device B and device C. In the rest of this section, the terms Requester and Completer refer to the PCI Express elements involved in transactions. The specific aspects of each communication model will be called out explicitly.

[Figure A-1: An Example Showing Endpoint-to-Root-Complex and Peer-to-Peer Communication Models (labels: Root Complex (Completer), Switch, Device A (Requester), Device B (Requester), Device C (Completer); read/write requests, write requests, and read completions marked as isochronous traffic flow)]

• Guaranteed bandwidth and deterministic latency require end-to-end isochronous service. If the Isochronous TC is ever mixed with other Traffic Classes in the same Virtual Channel on a PCI Express Link, head-of-line blocking caused by traffic interaction and flow control may compromise the Quality of Service (QoS) for isochronous transactions. Although some level of QoS may be provided if this traffic mixing occurs only on a small portion of the data path, that level may not be quantifiable. Therefore, for the rest of this section, we assume that a dedicated Virtual Channel is provided for the Isochronous TC on each PCI Express Link to provide end-to-end isochronous service, and that all PCI Express components along the path between the Requester and the Completer meet the requirements described in this section. The dedicated Virtual Channel for the Isochronous TC is also called the Isochronous VC. Specifically, system software must obey the rules described in Section 2.6.4.
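The dedicated-VC assumption can be checked by configuration software before isochronous service is enabled. A minimal Python sketch, assuming each VC resource exposes an 8-bit TC/VC map in which bit i set means TCi is mapped to that VC; the function name and the dictionary input format are hypothetical.

```python
def iso_vc_is_dedicated(tc_vc_maps: dict, iso_tc: int) -> bool:
    """Return True if iso_tc maps to exactly one VC and no other TC shares it.

    tc_vc_maps maps a VC ID to its 8-bit TC/VC map (bit i set => TCi uses that VC).
    """
    iso_bit = 1 << iso_tc
    vcs = [vc for vc, tc_map in tc_vc_maps.items() if tc_map & iso_bit]
    return len(vcs) == 1 and tc_vc_maps[vcs[0]] == iso_bit


# TC0-TC6 share VC0; TC7 alone on VC1 satisfies the dedicated-VC assumption.
print(iso_vc_is_dedicated({0: 0b0111_1111, 1: 0b1000_0000}, iso_tc=7))  # True
# TC6 and TC7 sharing VC1 mixes Traffic Classes on the isochronous VC.
print(iso_vc_is_dedicated({0: 0b0011_1111, 1: 0b1100_0000}, iso_tc=7))  # False
```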


A.2. Isochronous Contract and Contract Parameters

In order to support isochronous data transfer with guaranteed bandwidth and deterministic latency, an isochronous contract must be established between a Requester/Completer pair and the PCI Express fabric. This contract must enforce both resource reservation and traffic regulation. Without such a contract, two basic problems, over-subscription and congestion, may occur, as illustrated in Figure A-2. When interconnect bandwidth resources are over-subscribed, the increased latency may cause failure of isochronous service and starvation of non-isochronous services. Traffic congestion occurs when too many isochronous requests are issued in a short time window; this potentially causes excessive service latencies for both isochronous and non-isochronous traffic.

[Figure A-2: Two Basic Bandwidth Resourcing Problems: Over-Subscription and Congestion (two timelines of isochronous requests, labeled Over-subscription and Congestion)]

The isochronous transfer mechanism in this specification addresses these problems with traffic regulation, including admission control and service discipline. Under software-managed admission control, a Requester must not issue isochronous transactions unless the required isochronous bandwidth and resources have been allocated. Specifically, the isochronous bandwidth is given by the following formula:

$$BW = \frac{N \cdot Y}{T}$$

The formula defines the allocated bandwidth (BW) as a function of a specified number (N) of transactions of a specified payload size (Y) within a specified time period (T). Another important parameter in the isochronous contract is latency. Based on the contract, isochronous transactions are completed within a specified latency (L).
Once a Requester/Completer pair is admitted for isochronous communication, the bandwidth and latency are guaranteed to the Requester (a PCI Express Endpoint device) by the Completer (the Root Complex for Endpoint-to-Root-Complex communication, or another PCI Express Endpoint device for Peer-to-Peer communication) and by the PCI Express fabric components (Switches). Specific service disciplines must be implemented by isochronous-capable PCI Express components. The service disciplines are imposed on PCI Express Switches and Completers in such a manner that the service of isochronous requests is subject to a specific service interval (t). This mechanism provides the method of controlling when an isochronous packet injected by a Requester is serviced. Consequently, isochronous traffic is policed in such a manner that only packets that can be injected into the fabric in compliance with the isochronous contract are allowed to make immediate progress and start being serviced by the PCI Express fabric. A non-compliant Requester that tries to inject more isochronous transactions than the contract allows is prevented from doing so by the flow-control mechanism. In this way, the isochronous service to other, well-behaved (compliant) Requesters is not affected by the non-compliant device.

In the Endpoint-to-Root-Complex model, since the aggregated isochronous traffic is eventually limited by the host memory subsystem's bandwidth capabilities, isochronous read requests and write requests (and messages) are budgeted together. A Requester may divide the isochronous bandwidth between read requests and write requests as appropriate.

In the (push-only) Peer-to-Peer model, isochronous bandwidth applies only to request transactions.

A.2.1. Isochronous Time Period and Isochronous Virtual Timeslot

The PCI Express isochronous time period (T) is uniformly divided into units of virtual timeslots (t). At most one isochronous request is allowed within one virtual timeslot. The virtual timeslot supported by a PCI Express component is reported through the Reference Clock field in the PCI Express Virtual Channel Capability Structure defined in Section 5.11. When Reference Clock = 00b, the duration of a virtual timeslot t is 100 ns. The duration of the isochronous time period T depends on the number of phases (the table size) of the supported time-based WRR Port Arbitration Table.
When the time-based WRR Port Arbitration Table size equals 128, there are 128 virtual timeslots (t) in an isochronous time period, i.e., T = 128 × 100 ns = 12.8 µs.

Note that neither the isochronous period T nor the virtual timeslots t need to be aligned and synchronized among different PCI Express isochronous devices; that is, the notion of {T, t} is local to each individual isochronous device.

A.2.2. Isochronous Payload Size

The payload size (Y) for isochronous transactions must not exceed Max Payload Size (see Section 5.8.4). After configuration, Max Payload Size is fixed within a PCI Express hierarchy domain. The fixed Max Payload Size value is used for isochronous bandwidth budgeting regardless of the actual size of the data payload associated with isochronous transactions. For isochronous bandwidth budgeting, we have

$$Y = \text{Max\_Payload\_Size}$$

In order for Completers to meet the isochronous contract, Requesters must ensure that any isochronous request contains a naturally aligned data block. A transaction with a partial write is treated as a normally accounted transaction. A Completer must account for partial writes as part of bandwidth assignment (for worst-case servicing time).
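As a worked illustration of the contract formula with the parameters of A.2.1 and A.2.2, the following Python sketch computes T from the table size and t, then budgets BW = N·Y/T with Y fixed at Max_Payload_Size. The function names are hypothetical, and MB/s here means 10^6 bytes per second.

```python
from fractions import Fraction

SLOT_NS = 100  # virtual timeslot t when Reference Clock = 00b


def iso_period_ns(wrr_table_entries: int, t_ns: int = SLOT_NS) -> int:
    """T = (time-based WRR Port Arbitration Table size) x t."""
    return wrr_table_entries * t_ns


def contract_bandwidth_mb_s(n: int, max_payload_size: int, period_ns: int) -> Fraction:
    """BW = N * Y / T, with Y = Max_Payload_Size whatever the actual payload is."""
    bytes_per_ns = Fraction(n * max_payload_size, period_ns)
    return bytes_per_ns * 1000  # 1 byte/ns = 1000 MB/s (decimal megabytes)


T = iso_period_ns(128)
print(T)                                   # 12800 (ns), i.e. 12.8 us
print(contract_bandwidth_mb_s(4, 256, T))  # 80
```

For instance, a Requester that actually moves only 64 bytes per request still consumes 80 MB/s of budget for N = 4 when Max_Payload_Size is 256 bytes, because budgeting uses Y = Max_Payload_Size.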


A.2.3. Isochronous Bandwidth Allocation

Given T, t, and Y, the maximum number of virtual timeslots within a time period is

$$N_{max} = \frac{T}{t},$$

and the maximum specifiable isochronous bandwidth is

$$BW_{max} = \frac{Y}{t}.$$

The granularity with which isochronous bandwidth can be allocated is defined as

$$BW_{granularity} = \frac{Y}{T}.$$

Given T and t of 12.8 µs and 100 ns, respectively, $N_{max}$ is 128. As shown in Table A-1, $BW_{max}$ and $BW_{granularity}$ are functions of the isochronous payload size Y.

Table A-1: Isochronous Bandwidth Ranges and Granularities

Y (bytes)                128    256    512    1024
BW_max (MB/s)           1280   2560   5120   10240
BW_granularity (MB/s)     10     20     40      80

Assigning isochronous bandwidth $BW_{link}$ to a PCI Express Link is equivalent to assigning $N_{link}$ virtual timeslots per isochronous period, where $N_{link}$ is given by

$$N_{link} = \frac{BW_{link}}{BW_{granularity}}.$$

For a Switch port serving as an Egress Port (or an RCRB serving as a "virtual" Egress Port) for isochronous traffic, the $N_{max}$ virtual timeslots within T are represented by the time-based WRR Port Arbitration Table in the PCI Express Virtual Channel Capability Structure detailed in Section 5.11. The table consists of $N_{max}$ entries. An entry in the table represents one virtual timeslot in the isochronous time period. When a table entry is given a value of PN, that timeslot is assigned to the ingress port (with respect to the isochronous traffic targeting the Egress Port) designated by Port Number PN. Therefore, $N_{link}$ virtual timeslots are assigned to the ingress port when $N_{link}$ entries in the table are given the value PN. The Egress Port may admit one isochronous request transaction from the ingress port for further service only when the table entry reached by the Egress Port's isochronous time ticker (which increments by 1 every t and wraps around when reaching T) is set to PN.
Even if outstanding isochronous requests are ready in the ingress port, they will not be served until the next round of time-based WRR arbitration. In this manner, the time-based Port Arbitration Table serves both isochronous bandwidth assignment and isochronous traffic regulation.

For a PCI Express Endpoint device serving as a Requester or a Completer, isochronous bandwidth allocation is accomplished through negotiation between system software and the device driver, which is outside the scope of this specification.
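The A.2.3 quantities and the time-based WRR mechanism can be sketched together in Python: the code recomputes the rows of Table A-1, converts a Link allocation to $N_{link}$, programs a table with $N_{link}$ entries per ingress Port Number, and walks the ticker once around the period. All names are hypothetical. Spreading each port's entries evenly across the table is an illustrative choice that this section does not require, and rejecting allocations that are not a multiple of $BW_{granularity}$ is likewise a design choice of the sketch.

```python
from fractions import Fraction

T_NS, SLOT_NS = 12_800, 100   # T = 12.8 us, t = 100 ns
N_MAX = T_NS // SLOT_NS        # 128 virtual timeslots per period


def bw_max_mb_s(y: int) -> Fraction:
    return Fraction(y, SLOT_NS) * 1000   # Y / t, bytes/ns -> MB/s


def bw_granularity_mb_s(y: int) -> Fraction:
    return Fraction(y, T_NS) * 1000      # Y / T, bytes/ns -> MB/s


def n_link(bw_link_mb_s, y: int) -> int:
    """N_link = BW_link / BW_granularity; rejects non-multiples and values above N_max."""
    n = Fraction(bw_link_mb_s) / bw_granularity_mb_s(y)
    if n.denominator != 1 or not 0 <= n <= N_MAX:
        raise ValueError("BW_link must be a multiple of BW_granularity, at most BW_max")
    return int(n)


def build_wrr_table(allocations: dict, size: int = N_MAX) -> list:
    """Give each ingress Port Number PN allocations[PN] entries (None = unassigned)."""
    if sum(allocations.values()) > size:
        raise ValueError("more timeslots requested than the table holds")
    table = [None] * size
    for pn, count in allocations.items():
        for k in range(count):
            slot = k * size // count          # even spread (illustrative choice)
            while table[slot] is not None:    # probe forward to the next free entry
                slot = (slot + 1) % size
            table[slot] = pn
    return table


def admitted_in_one_period(table: list, pending: dict) -> dict:
    """The ticker visits one entry per t; an entry equal to PN admits one request from PN."""
    admitted = {pn: 0 for pn in pending}
    for entry in table:
        if entry in admitted and admitted[entry] < pending[entry]:
            admitted[entry] += 1
    return admitted


print([int(bw_max_mb_s(y)) for y in (128, 256, 512, 1024)])          # [1280, 2560, 5120, 10240]
print([int(bw_granularity_mb_s(y)) for y in (128, 256, 512, 1024)])  # [10, 20, 40, 80]

table = build_wrr_table({1: n_link(80, 256), 2: n_link(40, 256)})   # 4 and 2 timeslots
print(admitted_in_one_period(table, {1: 10, 2: 1}))                  # {1: 4, 2: 1}
```

In this example, port 1 has ten requests waiting but is admitted only four per period, which is the traffic-regulation effect described above.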


A.2.4. Isochronous Transaction Latency

Transaction latency is composed of the latency through the PCI Express fabric and the latency contributed by the rest of the system. For memory transactions, transaction latency is the accumulated delay across the PCI Express fabric plus the service delay of the Completer. Isochronous transaction latency is defined for each transaction and measured in units of virtual timeslot t.

• For a Requester in the Endpoint-to-Root-Complex model, the read latency is defined as the round-trip latency: the delay from the time the device submits a memory read request packet to its Transaction Layer (transmit side) to the time the corresponding read completion arrives at the device's Transaction Layer (receive side).
• For a Requester in both the Endpoint-to-Root-Complex and Peer-to-Peer models, the write latency is defined as the delay from the time the Requester posts a memory write request to its PCI Express Transaction Layer (transmit side) to the time the data write becomes globally visible within the memory subsystem of the Completer. A write to memory reaches the point of global visibility when all agents accessing that memory address get the updated data.

As part of the isochronous contract, the upper and lower bounds of isochronous transaction latency are provided. The size of isochronous data buffers in a Requester can be determined using the minimum and maximum isochronous transaction latencies. As shown later, for most common platforms, the minimum isochronous transaction latency is much smaller than the maximum isochronous transaction latency.
As a conservative measure, we set the minimum isochronous transaction latency to zero and provide guidelines only for measuring the maximum isochronous transaction latency.

For a Requester, the maximum isochronous (read or write) transaction latency (L) can be accounted as follows:

$$L = L_{Fabric} + L_{Completer},$$

where $L_{Fabric}$ is the maximum latency of the PCI Express fabric and $L_{Completer}$ is the maximum latency of the Completer.

Transaction latency for a PCI Express Link or a PCI Express fabric, $L_{Fabric}$, is defined as the delay from the time a transaction is posted at the transmitting end to the time it is available at the receiving end. This applies to both read and write transactions. (Note that read transactions traverse the PCI Express fabric twice: first during the request phase and again during the completion phase.) $L_{Fabric}$ depends on the topology and on the latency due to each PCI Express Link and arbitration point in the path from the Requester to the Completer. The latency on a PCI Express Link depends on pipeline delays, the width and operational frequency of the Link, transmission of electrical signals across the medium, wake-up latency from low-power states, and delays caused by Data Link Layer Retry.

As specified later, a restriction on the PCI Express topology is imposed for each targeted platform in order to provide a practically meaningful guideline for $L_{Fabric}$. The values of $L_{Fabric}$ provided in the guideline should be reasonable and serve as practical upper limits under normal operating conditions.
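A small Python sketch of this latency accounting, with every delay converted to virtual timeslots. The per-segment and Completer delays in the example are hypothetical numbers, not guideline values, and the buffer estimate (at most ceil(L·N/T) uniformly injected requests in flight, each budgeted at Y bytes) is an illustrative assumption rather than a formula given by this specification.

```python
SLOT_NS = 100  # virtual timeslot t


def to_slots(latency_ns: int) -> int:
    """Round a latency up to whole virtual timeslots (integer ceiling division)."""
    return -(-latency_ns // SLOT_NS)


def max_latency_slots(fabric_segments_ns: list, completer_ns: int) -> int:
    """L = L_Fabric + L_Completer, each rounded up to timeslots (conservative).

    fabric_segments_ns lists every Link and arbitration delay on the path; for a
    read, include both the request and the completion traversal.
    """
    return to_slots(sum(fabric_segments_ns)) + to_slots(completer_ns)


def requester_buffer_bytes(l_slots: int, n: int, period_slots: int, y: int) -> int:
    """Rough estimate: with one request every T/N slots, at most ceil(L*N/T)
    requests are in flight, each budgeted at Y bytes."""
    in_flight = -(-l_slots * n // period_slots)
    return in_flight * y


# Hypothetical delays: four fabric segments (request and completion paths) and a Completer.
L = max_latency_slots([300, 250, 300, 250], completer_ns=900)
print(L)                                                          # 20
print(requester_buffer_bytes(L, n=32, period_slots=128, y=256))   # 1280
```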


The value of $L_{Completer}$ depends on the memory technology, the specific memory configuration settings, and the arbitration policies in the Completer that comprehend PCI Express isochronous traffic. The target value for $L_{Completer}$ should provide enough headroom to allow for implementation tradeoffs.

Definitions of read and write transaction latencies for a Completer are different:
• Read transaction latency for the Completer is defined as the delay from the time a memory read transaction is available at the receiving end of a PCI Express Port in the Completer to the time the corresponding read completion transaction is posted to the transmitting end of the PCI Express Port.
• Write transaction latency is defined as the delay from the time a memory write transaction is available at the receiving end of a PCI Express Port in the Completer to the time the transmitted data is globally visible.

All of the isochronous transaction latencies defined above are based on the assumption that the Requester injects isochronous transactions uniformly. According to an isochronous contract of {N, T, t}, uniform traffic injection is defined such that up to N transactions are evenly distributed over the isochronous period T based on a ticker granularity of virtual timeslot t. For a Requester with non-uniform isochronous transaction injection, the Requester is responsible for accounting for any additional delay due to the deviation of its injection pattern from a uniform injection pattern.

A.2.5. An Example Illustrating Isochronous Parameters

Figure A-3 illustrates the key isochronous parameters using a simplified example with T = 20t and L = 22t. A Requester has reserved isochronous bandwidth of four transactions per T.
The device shares the allocated isochronous bandwidth between read requests and write requests. As shown, during one isochronous time period, the Requester issues two read requests and two write requests. All requests are completed within the designated transaction latency L. As also shown in the figure, there is no time dependency between the service time of write requests and the arrival time of read completions.

[Figure A-3: A Simplified Example Illustrating PCI Express Isochronous Parameters (Requests row: R1, W1, R2, W2 within period T; Completions row: W1, W2, R1, R2; latency L and timeslot t marked)]
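The uniform-injection definition and the Figure A-3 scenario can be reproduced in a few lines of Python. The function names are hypothetical and the completion times are invented for illustration; only N = 4, T = 20t, L = 22t and the request/completion order come from the example.

```python
def uniform_injection_slots(n: int, period_slots: int) -> list:
    """Spread up to N requests evenly over one period T, on whole-timeslot ticks."""
    return [k * period_slots // n for k in range(n)]


def all_within_latency(issued: dict, completed: dict, l_slots: int) -> bool:
    """Every transaction must complete within L timeslots of being issued."""
    return all(completed[name] - issued[name] <= l_slots for name in issued)


# The Figure A-3 scenario: N = 4, T = 20t, L = 22t.
issued = dict(zip(["R1", "W1", "R2", "W2"], uniform_injection_slots(4, 20)))
print(issued)  # {'R1': 0, 'W1': 5, 'R2': 10, 'W2': 15}

# Hypothetical completion times, in the W1, W2, R1, R2 order drawn in the figure.
completed = {"W1": 8, "W2": 18, "R1": 20, "R2": 30}
print(all_within_latency(issued, completed, l_slots=22))  # True
```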


A.3. Isochronous Transaction Rules

Isochronous transactions follow the same rules as described in Chapter 2. In order to assist the Completer in meeting latency requirements, the following additional rules further illustrate and clarify the proper behavior of isochronous transactions:
• The value in the Length field of read requests must never exceed Max Payload Size.
• Read and write requests must never cross naturally aligned address boundaries.

A.4. Transaction Ordering

In general, isochronous transactions follow the same ordering rules as described in Section 2.5. The following ordering rules further illustrate and clarify the proper behavior of isochronous transactions:
• There is no ordering between isochronous transactions and other PCI Express transactions, since on each PCI Express Link isochronous transactions are mapped to a dedicated Virtual Channel and are not mixed with transactions of other Traffic Classes.
• Isochronous write requests are served on any PCI Express Link in strictly the same order in which the isochronous write requests are posted.
• Switches must allow isochronous write requests to pass isochronous read completions.

A.5. Isochronous Data Coherency

Cache coherency for isochronous transactions is not an I/O interconnect issue but rather an operating system software and Root Complex hardware issue.
This specification provides the necessary mechanism to control Root Complex behavior in terms of enforcing hardware cache coherency on a per-transaction basis.

For platforms where snoop latency in a Root Complex is either unbounded or can be excessively large, all isochronous transactions must have the "Snoop Not Required" Attribute bit set in order to meet a tight maximum isochronous transaction latency $L_{Completer}$, or more precisely $L_{Root\_Complex}$.

The Root Complex must report its capability to system software by setting the Snoop Transaction Permitted field in the VC Resource Capability Register (for the VC resource intended for isochronous traffic) in the RCRB. Based on whether or not a Root Complex is capable of providing hardware-enforced cache coherency for isochronous traffic while still meeting the isochronous latency target, system software can then inform the device drivers of Endpoint devices to set or clear the "Snoop Not Required" Attribute bit for isochronous transactions.

Note that cache coherency considerations for isochronous traffic do not apply to Peer-to-Peer communication.


A.6. Flow Control

Completers (PCI Express Endpoint devices or the Root Complex) and PCI Express fabric components should implement proper buffer sizing such that, under normal operating conditions, no back-pressure due to flow control is applied to isochronous traffic injected uniformly by a Requester. For Requesters that are compliant with the isochronous contract but have bursty injection behavior, Switches and Completers may apply flow control back-pressure as long as the admitted isochronous traffic is uniform and compliant with the isochronous contract. Under abnormal conditions, when isochronous traffic jitter becomes significant or when isochronous traffic is oversubscribed due to excessive Data Link Layer Retry, flow control provides a natural mechanism to ensure functional correctness.

A.7. Topology Restrictions

Total service latency for a Requester depends on the position of that device within a particular PCI Express topology. In order to provide a realistic upper bound on such latency, it is necessary to establish topology restrictions for target platforms.

For desktop, volume workstation, and mobile platforms, the worst-case topology is three levels deep, as shown in Figure A-4. In other words, for Endpoint-to-Root-Complex communication, a PCI Express Endpoint device with an isochronous service request needs to be able to work on a platform with two levels of Switches between it and the Root Complex. Peer-to-Peer communication should also work in the same three-level-deep PCI Express topology for two PCI Express Endpoint devices that support Peer-to-Peer communication.

For server, high-end workstation, and embedded communication platforms, the worst-case topology can go beyond three levels deep.
In these platforms, a PCI Express Endpoint device with an isochronous service request may connect to a Root Complex or other peer Endpoint devices through cascaded Switches.


[Figure A-4: An Example of PCI Express Topology Supporting Isochronous Applications (labels: Root Complex, two Switches, three EndPoints)]

A.8. Transfer Reliability

As with non-isochronous traffic, reliable transfer is provided for isochronous traffic by the PCI Express interconnect and the Completer's memory subsystem. In other words, once an isochronous request is accepted into the PCI Express fabric, it will not be dropped by any PCI Express component. When the request requires completion, the corresponding completion packet(s) will be returned to the Requester. Requesters are responsible for shaping and conditioning isochronous traffic. With the resource reservation and traffic regulation mechanisms described above, guaranteed isochronous service is provided under normal operating conditions. When such conditions are not met, errors due to retry and flow control manifest as excessive latencies for isochronous transactions. In order to resolve the congestion caused by excessive retries and flow control (for example, one retry per isochronous period per Link may be budgeted in the isochronous resource reservation managed by system software), a Requester may delay or drop non-committed isochronous requests. It may also drop late-received completions. For late-received data packets in the Completer's memory subsystem, it is up to the application and/or driver software to determine whether the data should be discarded.


A.9. Considerations for Bandwidth Allocation

A.9.1. Isochronous Bandwidth of PCI Express Links

Isochronous bandwidth budgeting for PCI Express Links can be derived from Link parameters such as the isochronous payload size and the speed and width of the Link.

Isochronous bandwidth allocation for a PCI Express Link is limited to a certain percentage of the maximum effective Link bandwidth, in order to leave sufficient bandwidth for non-isochronous traffic and to account for temporary Link bandwidth reduction due to retries. Link utilization is counted based on the actual cycles consumed on the physical PCI Express Link. The maximum number of virtual timeslots allowed per Link ($N_{link}$) depends on the isochronous packet payload size and on the speed and width of the Link. Table A-2 shows $N_{link}$ and Link utilization as functions of isochronous payload size and PCI Express Link width when the Link runs at 2.5 GHz and isochronous Link utilization is limited to 50%.
For Links of low to medium width (between 1 and 8 Lanes), the relatively low Link bandwidth limits the isochronous resource (virtual timeslot) allocation. However, for wider PCI Express Links (with 12 or 16 Lanes), the relatively large virtual timeslot (100 ns) limits the isochronous resource allocation.

Table A-2: Maximum Number of Virtual Timeslots Allowed for Different PCI Express Links at 2.5 GHz

Lanes:       1             2             4             8             12            16
Y (bytes)    N_link %Util  N_link %Util  N_link %Util  N_link %Util  N_link %Util  N_link %Util
128          11     50%    22     50%    44     50%    88     50%    128    48%    128    36%
256          5      43%    11     47%    23     49%    46     49%    70     50%    93     50%
512          3      50%    6      50%    12     50%    24     50%    36     50%    48     50%

Because isochronous bandwidth allocation on a PCI Express Link is based on the number of transactions $N_{link}$ per isochronous period, there is no distinction between read requests and write requests in budgeting isochronous bandwidth on a PCI Express Link. In other words, even though a read request packet (without payload) can be much smaller than a write request packet (with payload), its Link utilization is accounted the same as that of the larger one (a write request). This is because for each read request in one direction of a PCI Express Link, there will be one or more read completions with payload in the other direction of the Link. Without differentiating between read and write request transactions, the allocated isochronous bandwidth for a PCI Express Link in the Endpoint-to-Root-Complex model is assumed to consume bandwidth in both directions. For the push-only Peer-to-Peer model, software may take advantage of the unidirectional isochronous traffic pattern in budgeting PCI Express Link resources.
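The shape of Table A-2 can be reproduced with a simple model, sketched in Python below under stated assumptions: 250 MB/s of usable bandwidth per Lane (2.5 Gb/s with 8b/10b encoding), hence 3200 bytes per Lane per 12.8 µs period, and a fixed per-TLP cost of Y + 17 bytes of Link time. The 17-byte overhead is not a value given in this section; it is a fitted parameter, chosen only because it reproduces every cell of Table A-2. The function and constant names are hypothetical.

```python
N_MAX = 128                       # timeslots per 12.8 us period (t = 100 ns)
BYTES_PER_LANE_PER_PERIOD = 3200  # 250 MB/s per Lane (2.5 Gb/s, 8b/10b) x 12.8 us
TLP_OVERHEAD_BYTES = 17           # fitted so that the output matches Table A-2
ISO_UTIL_LIMIT_PCT = 50


def n_link_and_util(lanes: int, y: int) -> tuple:
    """Return (N_link, %Util) for a Link of `lanes` Lanes and payload size Y."""
    capacity = lanes * BYTES_PER_LANE_PER_PERIOD
    budget = capacity * ISO_UTIL_LIMIT_PCT // 100
    packet = y + TLP_OVERHEAD_BYTES
    # Narrow Links run out of bandwidth first; wide Links run out of timeslots (N_max).
    n = min(N_MAX, budget // packet)
    return n, round(100 * n * packet / capacity)


if __name__ == "__main__":
    for y in (128, 256, 512):
        print(y, [n_link_and_util(lanes, y) for lanes in (1, 2, 4, 8, 12, 16)])
```

The `min()` makes the two regimes explicit: for 1 to 8 Lanes the bandwidth budget is the binding limit, while at 12 and 16 Lanes with 128-byte payloads $N_{link}$ saturates at $N_{max}$ = 128.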


A.9.2. Isochronous Bandwidth of Endpoint Devices

For Peer-to-Peer communication, when a PCI Express Endpoint device serves as the Completer of isochronous traffic, its device driver is responsible for reporting to the operating-system-level PCI Express isochronous configuration software whether the device is capable of being a Completer for isochronous transactions. In addition, the device driver must report whether there is enough bandwidth to service the requests within the Completer's memory subsystem. The specifics of the reporting mechanism are outside the scope of this specification.

A.9.3. Isochronous Bandwidth of Switches

Allocation of isochronous bandwidth for a Switch must consider the capacity and utilization of the PCI Express Links associated with the ingress Port and the Egress Port of the Switch that connect the Requester and the Completer, respectively. The more constrained of the two determines whether a requested isochronous bandwidth can be supported.

A.9.4. Isochronous Bandwidth of Root Complex

The isochronous bandwidth of a Root Complex is reported to software through the RCRB structure. Specifically, the Maximum Time Slots field of the VC Resource Capability Register in the VC Capability Structure indicates the total isochronous bandwidth shared by the Root Ports associated with the RCRB. Details of the platform budgeting for available isochronous bandwidth within a Root Complex are outside the scope of this specification.

A.10. Considerations for PCI Express Components

A.10.1. A PCI Express Endpoint Device as a Requester

Before a PCI Express Endpoint device acting as a Requester can start issuing isochronous request transactions, software must perform the following configuration steps:
• Configuration of an Isochronous Virtual Channel to which the Isochronous Traffic Class is mapped.
• Enabling of the Isochronous VC.

According to the rules stated in Chapter 2, an Endpoint Requester must issue isochronous transactions using the Flow Control credits available for the corresponding Isochronous VC. When isochronous transactions (requests) are injected uniformly, the receiving Port, whether a Switch Port or a Root Port, will return Flow Control credits promptly so that no back-pressure is applied to the Isochronous VC. Therefore, the Endpoint Requester can size its buffer based on the PCI Express fabric latency $L_{Fabric}$ plus the Completer's latency $L_{Completer}$. When isochronous transactions are injected non-uniformly, either some transactions experience longer PCI Express fabric delay or the Endpoint Requester gets back-pressured on the Isochronous VC. Such a Requester must size its buffer to account for the deviation of its injection pattern from uniformity.

A.10.2. A PCI Express Endpoint Device as a Completer

A PCI Express Endpoint device may serve as a Completer for isochronous Peer-to-Peer communication. Before a PCI Express Endpoint device starts serving isochronous transactions, its PCI Express Port must be configured by operating-system-level configuration software to enable an Isochronous VC.

An Endpoint Completer must observe the maximum isochronous transaction latency ($L_{Completer}$). How an Endpoint Completer schedules memory cycles for PCI Express isochronous transactions and other memory transactions is outside the scope of this specification, as long as $L_{Completer}$ is met for PCI Express isochronous transactions.

An Endpoint Completer communicates with a Requester through PCI Express fabric components such as Switches. Since isochronous requests injected into an Endpoint Completer have already been regulated by Switches to conform to the isochronous contract, the Endpoint Completer does not have to regulate isochronous request traffic.
However, an Endpoint Completer must size its internal buffer such that no back-pressure is applied to the Isochronous VC.

Since Switches do not check for the additional isochronous transaction rules stated in Section A.3, an Endpoint Completer may perform the following operations for invalid isochronous transactions:
• Return partial completions for read requests with a value in the Length field exceeding Max Payload Size.
• Return partial completions for read requests that cross naturally aligned address boundaries.
• Write partial data for write requests that cross naturally aligned address boundaries.
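The A.3 checks an Endpoint Completer may apply, and one way to form the partial pieces, are sketched below in Python. Assumptions: lengths are in bytes (the Length field itself counts DW), and the "naturally aligned address boundary" is taken to be a multiple of Max_Payload_Size, which this section does not state explicitly. The function names are hypothetical.

```python
def violates_iso_rules(addr: int, length: int, mps: int) -> bool:
    """True if the request is longer than Max_Payload_Size or crosses an
    MPS-aligned boundary (boundary size is an assumption of this sketch)."""
    crosses = addr // mps != (addr + length - 1) // mps
    return length > mps or crosses


def split_at_boundaries(addr: int, length: int, mps: int) -> list:
    """Split [addr, addr + length) into (address, byte_count) pieces, none of which
    crosses an MPS-aligned boundary, as a Completer returning partial data might."""
    pieces = []
    while length > 0:
        chunk = min(mps - addr % mps, length)  # bytes left before the next boundary
        pieces.append((addr, chunk))
        addr += chunk
        length -= chunk
    return pieces


print(violates_iso_rules(0x1080, 256, 256))                          # True: crosses 0x1100
print([hex(a) for a, _ in split_at_boundaries(0x1080, 256, 256)])    # ['0x1080', '0x1100']
```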


A.10.3. Switches

A Switch may have multiple ports capable of supporting isochronous transactions. Before a Switch starts serving isochronous transactions for a port, software must perform the following configuration steps:

• Configuration of an Isochronous Virtual Channel to which the Isochronous Traffic Class is mapped.
• Configuration of the port as an ingress port:
  o Configuration (or reconfiguration, if the Egress Port's Isochronous VC is already enabled) of the time-based WRR Port Arbitration Table of the targeted Egress Port to include N_link entries set to the ingress port's Port Number. Here N_link is the isochronous allocation for the ingress port.
  o Enabling the targeted Egress Port to load the newly programmed Port Arbitration Table.
• Configuration of the port as an Egress Port:
  o Configuration of the Isochronous VC's Port Arbitration Table, with the number of entries set according to the assigned isochronous bandwidth for all ingress ports with isochronous traffic targeting the Egress Port.
  o Selection of a proper VC Arbitration scheme, such as strict-priority-based VC Arbitration.
  o If required, configuration of the port's VC Arbitration Table with large weights assigned to the Isochronous VC.
• Enabling of the Isochronous VC for the port.

The Isochronous VC needs to be served with the highest priority when arbitrating for the shared PCI Express Link resource at an Egress Port, and a Switch's internal arbitration scheme takes this into account. The Isochronous VC is assigned the highest VC ID. A Switch port that supports priority-based VC arbitration therefore serves it with the highest arbitration priority. For a Switch port that supports WRR-based VC arbitration, software must program the weights for the Isochronous VC to be large enough that its service is equivalent to highest priority.

In addition, a Switch port may use a "just in time" scheduling mechanism to reduce VC arbitration latency.
With this mechanism, the Switch port does not pipeline non-isochronous Transaction Layer Packets to the Egress Port's Data Link Layer until its transmit buffer saturates. Instead, it may hold off scheduling a new non-isochronous packet to the Data Link Layer for as long as possible without incurring unnecessary Link idle time.

When an Isochronous VC is enabled for a Switch port (ingress) that is connected to a Requester, the Switch must enforce proper traffic regulation. This ensures that isochronous traffic from the ingress port conforms to this specification: N_link transactions per isochronous period, as programmed in the target Switch Egress Port's Port Arbitration Table. With such enforcement, the misbehavior of any noncompliant Requester will not affect normal isochronous transactions from compliant Requesters.
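One way to picture the regulation described above is to model the Egress Port's time-based WRR Port Arbitration Table as an array of N_max entries. Each entry is one virtual timeslot and holds the Port Number that owns it. The regulator then grants at most one isochronous request per timeslot, and only to the owning ingress port. The C sketch below assumes exactly that model. The function and type names, the even-spacing placement of entries, and the "one request per owned timeslot" rule are illustrative choices, not requirements of this specification.

```c
#include <assert.h>
#include <stdbool.h>

#define PORT_NONE (-1)  /* hypothetical marker for an unassigned entry */

/* Program n_link entries of an n_max-entry table with 'port'. The sketch
   aims for evenly spaced timeslots and probes forward past entries that
   other ports already own. Returns the number of entries assigned. */
int wrr_assign(int *table, int n_max, int port, int n_link)
{
    int assigned = 0;
    assert(n_max > 0 && n_link > 0 && n_link <= n_max);
    for (int k = 0; k < n_link; k++) {
        int slot = (k * n_max) / n_link;      /* evenly spaced target */
        int tries = 0;
        while (table[slot] != PORT_NONE && tries < n_max) {
            slot = (slot + 1) % n_max;        /* next entry, wrapping */
            tries++;
        }
        if (table[slot] != PORT_NONE)
            break;                            /* table is full */
        table[slot] = port;
        assigned++;
    }
    return assigned;
}

/* Hypothetical regulator state for one Egress Port's Isochronous VC. */
struct iso_regulator {
    const int *table;       /* Port Arbitration Table, n_max entries     */
    int n_max;              /* entries (virtual timeslots) per period T  */
    unsigned long t_ns;     /* duration of one virtual timeslot t        */
    long last_slot;         /* absolute timeslot last granted, or -1     */
};

/* Grant a request from 'port' at time now_ns only if the current virtual
   timeslot belongs to that port and has not been used yet. Over one
   period, this limits each port to the N_link entries it owns. */
bool iso_grant(struct iso_regulator *r, unsigned long now_ns, int port)
{
    long abs_slot = (long)(now_ns / r->t_ns);
    int entry = (int)(abs_slot % r->n_max);
    if (r->table[entry] != port || abs_slot == r->last_slot)
        return false;
    r->last_slot = abs_slot;
    return true;
}
```

Take an 8-entry table as an example. Assigning two entries to port 3 places them at entries 0 and 4. A later four-entry assignment for port 5 then probes around them and lands in entries 1, 2, 5 and 6.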


Isochronous traffic regulation from any ingress port is implemented as part of the Port Arbitration of the target Egress Port. Specifically, a time-based WRR Port Arbitration is used to schedule isochronous read and/or write request transactions. The N_max virtual timeslots (t) within the isochronous time period (T) are represented by the time-based WRR Table in the PCI Express Virtual Channel Capability Structure detailed in Section 5.11. The table consists of N_max entries, and each entry represents one virtual timeslot. An ingress port is assigned N_link virtual timeslots when N_link entries in the target Egress Port's time-based WRR Port Arbitration Table are set to the ingress port's Port Number.

This isochronous traffic regulation mechanism applies only to request transactions, not to completion transactions. Read completion transactions only come from the upstream port and go to downstream ports, so no Port Arbitration is needed for them. When Endpoint-to-Root-Complex and Peer-to-Peer communications co-exist in a Switch, a downstream (egress) port may mix isochronous write requests and read completions in the same direction. In the case of contention, the Egress Port must allow write requests to pass read completions so that the Switch meets the latency requirement for isochronous requests.

A.10.4. Root Complex

A Root Complex may have multiple Root Ports capable of supporting isochronous transactions.
Before a Root Complex starts serving isochronous transactions for a Root Port, operating system-level PCI Express configuration software must configure the port to enable an Isochronous VC. It uses the following configuration steps:

• Configuration of an Isochronous Virtual Channel to which the Isochronous Traffic Class is mapped.
• Configuration of the Root Port as an Ingress Port:
  o Configuration (or reconfiguration, if the Isochronous VC in the RCRB is already enabled) of the time-based WRR Port Arbitration Table of the targeted RCRB to include N_link entries set to the ingress port's Port Number. Here N_link is the isochronous allocation for the ingress port.
  o Enabling the targeted RCRB to load the newly programmed Port Arbitration Table.
• Configuration of the Root Port as an Egress Port:
  o If supported, configuration of the Root Port's VC Arbitration Table with large weights assigned to the Isochronous VC.
• Enabling of the Isochronous VC for the Root Port.

A Root Complex must observe the maximum isochronous transaction latency (L_Completer, or more precisely L_Root_Complex) that applies to all the Root Ports in the Root Complex. This specification does not define how a Root Complex schedules memory cycles for PCI Express isochronous transactions and other memory transactions, as long as L_Root_Complex is met for PCI Express isochronous transactions.

When an Isochronous VC is enabled for a Root Port, the Root Complex must enforce proper traffic regulation. This ensures that isochronous traffic from the Root Port conforms to this specification (N_link transactions per isochronous period). With such enforcement, the misbehavior of any noncompliant Requester will not affect normal isochronous transactions from compliant Requesters. Isochronous traffic regulation is implemented using the time-based Port Arbitration Table in the RCRB.

Switches do not check for the additional isochronous transaction rules stated in Section A.3. A Root Complex may therefore perform the following operations for invalid isochronous transactions:

• Return partial completions for read requests whose Length field value exceeds Max Payload Size.
• Return partial completions for read requests that cross naturally aligned address boundaries.
• Write partial data for write requests that cross naturally aligned address boundaries.
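The N_link allocation translates directly into bandwidth. Suppose each virtual timeslot that an ingress port owns carries one isochronous transaction of a given payload size. The port then receives at most N_link times that payload per isochronous period T. The allocations of all ports sharing one Port Arbitration Table also cannot exceed its N_max entries. The C sketch below does this arithmetic. The one-transaction-per-timeslot model and the function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Upper bound on isochronous bandwidth, in bytes per second, for an
   ingress port that owns n_link timeslots of a period lasting period_ns.
   Assumes one transaction of payload_bytes per owned timeslot. */
uint64_t iso_bytes_per_sec(uint32_t n_link, uint32_t payload_bytes,
                           uint64_t period_ns)
{
    assert(period_ns > 0);
    return (uint64_t)n_link * payload_bytes * 1000000000ULL / period_ns;
}

/* True if the allocations of all ingress ports targeting one Port
   Arbitration Table fit in its n_max entries. */
bool iso_allocations_fit(const uint32_t *n_link, int ports, uint32_t n_max)
{
    uint64_t total = 0;
    for (int i = 0; i < ports; i++)
        total += n_link[i];
    return total <= n_max;
}

/* Duration of one virtual timeslot t when a period T holds n_max slots. */
uint64_t iso_timeslot_ns(uint64_t period_ns, uint32_t n_max)
{
    assert(n_max > 0);
    return period_ns / n_max;
}
```

Take a hypothetical T of 12,800 ns, N_link = 8 and 128-byte payloads. iso_bytes_per_sec(8, 128, 12800) then evaluates to 80,000,000 bytes per second. These numbers are examples only, not values taken from this section.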


B. Symbol Encoding

Table B-1 shows the Byte to Symbol encodings for data characters. Table B-2 shows the Symbol encodings for the Special Symbols used for TLP/DLLP Framing and for interface management. RD- and RD+ refer to the Running Disparity of the Symbol sequence on a per-Lane basis.

Table B-1: 8b/10b Data Symbol Codes

Data Byte  Data Byte  Bits       Current RD-  Current RD+
Name       Value      HGF EDCBA  abcdei fghj  abcdei fghj
D0.0       00         000 00000  100111 0100  011000 1011
D1.0       01         000 00001  011101 0100  100010 1011
D2.0       02         000 00010  101101 0100  010010 1011
D3.0       03         000 00011  110001 1011  110001 0100
D4.0       04         000 00100  110101 0100  001010 1011
D5.0       05         000 00101  101001 1011  101001 0100
D6.0       06         000 00110  011001 1011  011001 0100
D7.0       07         000 00111  111000 1011  000111 0100
D8.0       08         000 01000  111001 0100  000110 1011
D9.0       09         000 01001  100101 1011  100101 0100
D10.0      0A         000 01010  010101 1011  010101 0100
D11.0      0B         000 01011  110100 1011  110100 0100
D12.0      0C         000 01100  001101 1011  001101 0100
D13.0      0D         000 01101  101100 1011  101100 0100
D14.0      0E         000 01110  011100 1011  011100 0100
D15.0      0F         000 01111  010111 0100  101000 1011
D16.0      10         000 10000  011011 0100  100100 1011
D17.0      11         000 10001  100011 1011  100011 0100
D18.0      12         000 10010  010011 1011  010011 0100
D19.0      13         000 10011  110010 1011  110010 0100
D20.0      14         000 10100  001011 1011  001011 0100
D21.0      15         000 10101  101010 1011  101010 0100
D22.0      16         000 10110  011010 1011  011010 0100


D23.0      17         000 10111  111010 0100  000101 1011
D24.0      18         000 11000  110011 0100  001100 1011
D25.0      19         000 11001  100110 1011  100110 0100
D26.0      1A         000 11010  010110 1011  010110 0100
D27.0      1B         000 11011  110110 0100  001001 1011
D28.0      1C         000 11100  001110 1011  001110 0100
D29.0      1D         000 11101  101110 0100  010001 1011
D30.0      1E         000 11110  011110 0100  100001 1011
D31.0      1F         000 11111  101011 0100  010100 1011
D0.1       20         001 00000  100111 1001  011000 1001
D1.1       21         001 00001  011101 1001  100010 1001
D2.1       22         001 00010  101101 1001  010010 1001
D3.1       23         001 00011  110001 1001  110001 1001
D4.1       24         001 00100  110101 1001  001010 1001
D5.1       25         001 00101  101001 1001  101001 1001
D6.1       26         001 00110  011001 1001  011001 1001
D7.1       27         001 00111  111000 1001  000111 1001
D8.1       28         001 01000  111001 1001  000110 1001
D9.1       29         001 01001  100101 1001  100101 1001
D10.1      2A         001 01010  010101 1001  010101 1001
D11.1      2B         001 01011  110100 1001  110100 1001
D12.1      2C         001 01100  001101 1001  001101 1001
D13.1      2D         001 01101  101100 1001  101100 1001
D14.1      2E         001 01110  011100 1001  011100 1001
D15.1      2F         001 01111  010111 1001  101000 1001
D16.1      30         001 10000  011011 1001  100100 1001
D17.1      31         001 10001  100011 1001  100011 1001
D18.1      32         001 10010  010011 1001  010011 1001
D19.1      33         001 10011  110010 1001  110010 1001
D20.1      34         001 10100  001011 1001  001011 1001
D21.1      35         001 10101  101010 1001  101010 1001
D22.1      36         001 10110  011010 1001  011010 1001
D23.1      37         001 10111  111010 1001  000101 1001
D24.1      38         001 11000  110011 1001  001100 1001


D25.1      39         001 11001  100110 1001  100110 1001
D26.1      3A         001 11010  010110 1001  010110 1001
D27.1      3B         001 11011  110110 1001  001001 1001
D28.1      3C         001 11100  001110 1001  001110 1001
D29.1      3D         001 11101  101110 1001  010001 1001
D30.1      3E         001 11110  011110 1001  100001 1001
D31.1      3F         001 11111  101011 1001  010100 1001
D0.2       40         010 00000  100111 0101  011000 0101
D1.2       41         010 00001  011101 0101  100010 0101
D2.2       42         010 00010  101101 0101  010010 0101
D3.2       43         010 00011  110001 0101  110001 0101
D4.2       44         010 00100  110101 0101  001010 0101
D5.2       45         010 00101  101001 0101  101001 0101
D6.2       46         010 00110  011001 0101  011001 0101
D7.2       47         010 00111  111000 0101  000111 0101
D8.2       48         010 01000  111001 0101  000110 0101
D9.2       49         010 01001  100101 0101  100101 0101
D10.2      4A         010 01010  010101 0101  010101 0101
D11.2      4B         010 01011  110100 0101  110100 0101
D12.2      4C         010 01100  001101 0101  001101 0101
D13.2      4D         010 01101  101100 0101  101100 0101
D14.2      4E         010 01110  011100 0101  011100 0101
D15.2      4F         010 01111  010111 0101  101000 0101
D16.2      50         010 10000  011011 0101  100100 0101
D17.2      51         010 10001  100011 0101  100011 0101
D18.2      52         010 10010  010011 0101  010011 0101
D19.2      53         010 10011  110010 0101  110010 0101
D20.2      54         010 10100  001011 0101  001011 0101
D21.2      55         010 10101  101010 0101  101010 0101
D22.2      56         010 10110  011010 0101  011010 0101
D23.2      57         010 10111  111010 0101  000101 0101
D24.2      58         010 11000  110011 0101  001100 0101
D25.2      59         010 11001  100110 0101  100110 0101
D26.2      5A         010 11010  010110 0101  010110 0101


D27.2      5B         010 11011  110110 0101  001001 0101
D28.2      5C         010 11100  001110 0101  001110 0101
D29.2      5D         010 11101  101110 0101  010001 0101
D30.2      5E         010 11110  011110 0101  100001 0101
D31.2      5F         010 11111  101011 0101  010100 0101
D0.3       60         011 00000  100111 0011  011000 1100
D1.3       61         011 00001  011101 0011  100010 1100
D2.3       62         011 00010  101101 0011  010010 1100
D3.3       63         011 00011  110001 1100  110001 0011
D4.3       64         011 00100  110101 0011  001010 1100
D5.3       65         011 00101  101001 1100  101001 0011
D6.3       66         011 00110  011001 1100  011001 0011
D7.3       67         011 00111  111000 1100  000111 0011
D8.3       68         011 01000  111001 0011  000110 1100
D9.3       69         011 01001  100101 1100  100101 0011
D10.3      6A         011 01010  010101 1100  010101 0011
D11.3      6B         011 01011  110100 1100  110100 0011
D12.3      6C         011 01100  001101 1100  001101 0011
D13.3      6D         011 01101  101100 1100  101100 0011
D14.3      6E         011 01110  011100 1100  011100 0011
D15.3      6F         011 01111  010111 0011  101000 1100
D16.3      70         011 10000  011011 0011  100100 1100
D17.3      71         011 10001  100011 1100  100011 0011
D18.3      72         011 10010  010011 1100  010011 0011
D19.3      73         011 10011  110010 1100  110010 0011
D20.3      74         011 10100  001011 1100  001011 0011
D21.3      75         011 10101  101010 1100  101010 0011
D22.3      76         011 10110  011010 1100  011010 0011
D23.3      77         011 10111  111010 0011  000101 1100
D24.3      78         011 11000  110011 0011  001100 1100
D25.3      79         011 11001  100110 1100  100110 0011
D26.3      7A         011 11010  010110 1100  010110 0011
D27.3      7B         011 11011  110110 0011  001001 1100
D28.3      7C         011 11100  001110 1100  001110 0011


D29.3      7D         011 11101  101110 0011  010001 1100
D30.3      7E         011 11110  011110 0011  100001 1100
D31.3      7F         011 11111  101011 0011  010100 1100
D0.4       80         100 00000  100111 0010  011000 1101
D1.4       81         100 00001  011101 0010  100010 1101
D2.4       82         100 00010  101101 0010  010010 1101
D3.4       83         100 00011  110001 1101  110001 0010
D4.4       84         100 00100  110101 0010  001010 1101
D5.4       85         100 00101  101001 1101  101001 0010
D6.4       86         100 00110  011001 1101  011001 0010
D7.4       87         100 00111  111000 1101  000111 0010
D8.4       88         100 01000  111001 0010  000110 1101
D9.4       89         100 01001  100101 1101  100101 0010
D10.4      8A         100 01010  010101 1101  010101 0010
D11.4      8B         100 01011  110100 1101  110100 0010
D12.4      8C         100 01100  001101 1101  001101 0010
D13.4      8D         100 01101  101100 1101  101100 0010
D14.4      8E         100 01110  011100 1101  011100 0010
D15.4      8F         100 01111  010111 0010  101000 1101
D16.4      90         100 10000  011011 0010  100100 1101
D17.4      91         100 10001  100011 1101  100011 0010
D18.4      92         100 10010  010011 1101  010011 0010
D19.4      93         100 10011  110010 1101  110010 0010
D20.4      94         100 10100  001011 1101  001011 0010
D21.4      95         100 10101  101010 1101  101010 0010
D22.4      96         100 10110  011010 1101  011010 0010
D23.4      97         100 10111  111010 0010  000101 1101
D24.4      98         100 11000  110011 0010  001100 1101
D25.4      99         100 11001  100110 1101  100110 0010
D26.4      9A         100 11010  010110 1101  010110 0010
D27.4      9B         100 11011  110110 0010  001001 1101
D28.4      9C         100 11100  001110 1101  001110 0010
D29.4      9D         100 11101  101110 0010  010001 1101
D30.4      9E         100 11110  011110 0010  100001 1101


D31.4      9F         100 11111  101011 0010  010100 1101
D0.5       A0         101 00000  100111 1010  011000 1010
D1.5       A1         101 00001  011101 1010  100010 1010
D2.5       A2         101 00010  101101 1010  010010 1010
D3.5       A3         101 00011  110001 1010  110001 1010
D4.5       A4         101 00100  110101 1010  001010 1010
D5.5       A5         101 00101  101001 1010  101001 1010
D6.5       A6         101 00110  011001 1010  011001 1010
D7.5       A7         101 00111  111000 1010  000111 1010
D8.5       A8         101 01000  111001 1010  000110 1010
D9.5       A9         101 01001  100101 1010  100101 1010
D10.5      AA         101 01010  010101 1010  010101 1010
D11.5      AB         101 01011  110100 1010  110100 1010
D12.5      AC         101 01100  001101 1010  001101 1010
D13.5      AD         101 01101  101100 1010  101100 1010
D14.5      AE         101 01110  011100 1010  011100 1010
D15.5      AF         101 01111  010111 1010  101000 1010
D16.5      B0         101 10000  011011 1010  100100 1010
D17.5      B1         101 10001  100011 1010  100011 1010
D18.5      B2         101 10010  010011 1010  010011 1010
D19.5      B3         101 10011  110010 1010  110010 1010
D20.5      B4         101 10100  001011 1010  001011 1010
D21.5      B5         101 10101  101010 1010  101010 1010
D22.5      B6         101 10110  011010 1010  011010 1010
D23.5      B7         101 10111  111010 1010  000101 1010
D24.5      B8         101 11000  110011 1010  001100 1010
D25.5      B9         101 11001  100110 1010  100110 1010
D26.5      BA         101 11010  010110 1010  010110 1010
D27.5      BB         101 11011  110110 1010  001001 1010
D28.5      BC         101 11100  001110 1010  001110 1010
D29.5      BD         101 11101  101110 1010  010001 1010
D30.5      BE         101 11110  011110 1010  100001 1010
D31.5      BF         101 11111  101011 1010  010100 1010
D0.6       C0         110 00000  100111 0110  011000 0110


D1.6       C1         110 00001  011101 0110  100010 0110
D2.6       C2         110 00010  101101 0110  010010 0110
D3.6       C3         110 00011  110001 0110  110001 0110
D4.6       C4         110 00100  110101 0110  001010 0110
D5.6       C5         110 00101  101001 0110  101001 0110
D6.6       C6         110 00110  011001 0110  011001 0110
D7.6       C7         110 00111  111000 0110  000111 0110
D8.6       C8         110 01000  111001 0110  000110 0110
D9.6       C9         110 01001  100101 0110  100101 0110
D10.6      CA         110 01010  010101 0110  010101 0110
D11.6      CB         110 01011  110100 0110  110100 0110
D12.6      CC         110 01100  001101 0110  001101 0110
D13.6      CD         110 01101  101100 0110  101100 0110
D14.6      CE         110 01110  011100 0110  011100 0110
D15.6      CF         110 01111  010111 0110  101000 0110
D16.6      D0         110 10000  011011 0110  100100 0110
D17.6      D1         110 10001  100011 0110  100011 0110
D18.6      D2         110 10010  010011 0110  010011 0110
D19.6      D3         110 10011  110010 0110  110010 0110
D20.6      D4         110 10100  001011 0110  001011 0110
D21.6      D5         110 10101  101010 0110  101010 0110
D22.6      D6         110 10110  011010 0110  011010 0110
D23.6      D7         110 10111  111010 0110  000101 0110
D24.6      D8         110 11000  110011 0110  001100 0110
D25.6      D9         110 11001  100110 0110  100110 0110
D26.6      DA         110 11010  010110 0110  010110 0110
D27.6      DB         110 11011  110110 0110  001001 0110
D28.6      DC         110 11100  001110 0110  001110 0110
D29.6      DD         110 11101  101110 0110  010001 0110
D30.6      DE         110 11110  011110 0110  100001 0110
D31.6      DF         110 11111  101011 0110  010100 0110
D0.7       E0         111 00000  100111 0001  011000 1110
D1.7       E1         111 00001  011101 0001  100010 1110
D2.7       E2         111 00010  101101 0001  010010 1110


D3.7       E3         111 00011  110001 1110  110001 0001
D4.7       E4         111 00100  110101 0001  001010 1110
D5.7       E5         111 00101  101001 1110  101001 0001
D6.7       E6         111 00110  011001 1110  011001 0001
D7.7       E7         111 00111  111000 1110  000111 0001
D8.7       E8         111 01000  111001 0001  000110 1110
D9.7       E9         111 01001  100101 1110  100101 0001
D10.7      EA         111 01010  010101 1110  010101 0001
D11.7      EB         111 01011  110100 1110  110100 1000
D12.7      EC         111 01100  001101 1110  001101 0001
D13.7      ED         111 01101  101100 1110  101100 1000
D14.7      EE         111 01110  011100 1110  011100 1000
D15.7      EF         111 01111  010111 0001  101000 1110
D16.7      F0         111 10000  011011 0001  100100 1110
D17.7      F1         111 10001  100011 0111  100011 0001
D18.7      F2         111 10010  010011 0111  010011 0001
D19.7      F3         111 10011  110010 1110  110010 0001
D20.7      F4         111 10100  001011 0111  001011 0001
D21.7      F5         111 10101  101010 1110  101010 0001
D22.7      F6         111 10110  011010 1110  011010 0001
D23.7      F7         111 10111  111010 0001  000101 1110
D24.7      F8         111 11000  110011 0001  001100 1110
D25.7      F9         111 11001  100110 1110  100110 0001
D26.7      FA         111 11010  010110 1110  010110 0001
D27.7      FB         111 11011  110110 0001  001001 1110
D28.7      FC         111 11100  001110 1110  001110 0001
D29.7      FD         111 11101  101110 0001  010001 1110
D30.7      FE         111 11110  011110 0001  100001 1110
D31.7      FF         111 11111  101011 0001  010100 1110
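Table B-1 is regular enough to be generated from three parts: a 32-entry 6-bit table, an 8-entry 4-bit table, and the running-disparity rules. The C sketch below is an illustration under that assumption, not a normative definition, and it has not been checked against every row. It works as follows:

• It takes the 6-bit and 4-bit groups from the Current RD- column.
• It complements a group when the running disparity entering it is positive and the group is unbalanced, and also for the D.7 6-bit code and the x.3 4-bit code.
• It flips the running disparity after any unbalanced group.
• It uses the alternate x.7 code (0111/1000) for D17, D18 and D20 at negative disparity and for D11, D13 and D14 at positive disparity, as rows D17.7 and D11.7 show.

The function name is hypothetical and K symbols are not handled. The returned value holds a in bit 9 and j in bit 0, so its binary form reads in the table's order.

```c
#include <assert.h>
#include <stdint.h>

/* abcdei for EDCBA = 0..31, from the Current RD- column of Table B-1
   (a is the most significant of the six bits). */
static const uint8_t code6[32] = {
    0x27, 0x1D, 0x2D, 0x31, 0x35, 0x29, 0x19, 0x38,  /* D0  - D7  */
    0x39, 0x25, 0x15, 0x34, 0x0D, 0x2C, 0x1C, 0x17,  /* D8  - D15 */
    0x1B, 0x23, 0x13, 0x32, 0x0B, 0x2A, 0x1A, 0x3A,  /* D16 - D23 */
    0x33, 0x26, 0x16, 0x36, 0x0E, 0x2E, 0x1E, 0x2B   /* D24 - D31 */
};

/* fghj for HGF = 0..7 when the disparity entering the 4-bit group is
   negative; index 7 holds the primary x.7 code (1110). */
static const uint8_t code4[8] = { 0xB, 0x9, 0x5, 0xC, 0xD, 0xA, 0x6, 0xE };

static int ones(unsigned v)
{
    int n = 0;
    for (; v != 0; v >>= 1)
        n += (int)(v & 1u);
    return n;
}

/* Encode data byte HGFEDCBA. *rd is the running disparity (-1 or +1)
   and is updated. Returns abcdei fghj with a in bit 9 and j in bit 0. */
uint16_t encode_data_8b10b(uint8_t byte, int *rd)
{
    unsigned x = byte & 0x1Fu;      /* EDCBA */
    unsigned y = byte >> 5;         /* HGF   */
    unsigned six = code6[x];
    unsigned four;

    assert(*rd == -1 || *rd == 1);
    if (*rd > 0 && (ones(six) != 3 || x == 7))
        six ^= 0x3Fu;               /* RD+ column: complemented group */
    if (ones(six) != 3)
        *rd = -*rd;                 /* an unbalanced group flips RD */

    if (y == 7 && ((*rd < 0 && (x == 17 || x == 18 || x == 20)) ||
                   (*rd > 0 && (x == 11 || x == 13 || x == 14))))
        four = 0x7;                 /* alternate x.7 code (0111 / 1000) */
    else
        four = code4[y];
    if (*rd > 0 && (ones(four) != 2 || y == 3))
        four ^= 0xFu;
    if (ones(four) != 2)
        *rd = -*rd;

    return (uint16_t)((six << 4) | four);
}
```

For example, encoding 0x00 (D0.0) starting at RD- returns 0x274, which is 100111 0100, and leaves the running disparity negative. This matches the first row of the table.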


Table B-2: 8b/10b Special Character Symbol Codes

Data Byte  Data Byte  Bits       Current RD-  Current RD+
Name       Value      HGF EDCBA  abcdei fghj  abcdei fghj
K28.0      1C         000 11100  001111 0100  110000 1011
K28.1      3C         001 11100  001111 1001  110000 0110
K28.2      5C         010 11100  001111 0101  110000 1010
K28.3      7C         011 11100  001111 0011  110000 1100
K28.4      9C         100 11100  001111 0010  110000 1101
K28.5      BC         101 11100  001111 1010  110000 0101
K28.6      DC         110 11100  001111 0110  110000 1001
K28.7      FC         111 11100  001111 1000  110000 0111
K23.7      F7         111 10111  111010 1000  000101 0111
K27.7      FB         111 11011  110110 1000  001001 0111
K29.7      FD         111 11101  101110 1000  010001 0111
K30.7      FE         111 11110  011110 1000  100001 0111
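A receiver needs the reverse mapping for the twelve symbols of Table B-2. The sketch below, with hypothetical names, stores each row's 6-bit and 4-bit groups for both running disparities. It identifies a received 10-bit symbol by comparing the symbol against those stored groups. The symbol uses the same bit layout as the table (a in bit 9, j in bit 0). The sketch deliberately does not check whether the received symbol is legal for the current running disparity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct k_symbol {
    const char *name;
    uint8_t six_m, four_m;   /* Current RD- abcdei, fghj */
    uint8_t six_p, four_p;   /* Current RD+ abcdei, fghj */
};

/* Rows of Table B-2 written in hex (001111 = 0x0F, 110000 = 0x30, ...). */
static const struct k_symbol k_table[] = {
    { "K28.0", 0x0F, 0x4, 0x30, 0xB },
    { "K28.1", 0x0F, 0x9, 0x30, 0x6 },
    { "K28.2", 0x0F, 0x5, 0x30, 0xA },
    { "K28.3", 0x0F, 0x3, 0x30, 0xC },
    { "K28.4", 0x0F, 0x2, 0x30, 0xD },
    { "K28.5", 0x0F, 0xA, 0x30, 0x5 },
    { "K28.6", 0x0F, 0x6, 0x30, 0x9 },
    { "K28.7", 0x0F, 0x8, 0x30, 0x7 },
    { "K23.7", 0x3A, 0x8, 0x05, 0x7 },
    { "K27.7", 0x36, 0x8, 0x09, 0x7 },
    { "K29.7", 0x2E, 0x8, 0x11, 0x7 },
    { "K30.7", 0x1E, 0x8, 0x21, 0x7 },
};

/* Returns the Table B-2 name of a received 10-bit symbol (a in bit 9),
   or NULL if it is not one of the special symbols. */
const char *k_symbol_name(uint16_t sym)
{
    for (size_t i = 0; i < sizeof k_table / sizeof k_table[0]; i++) {
        const struct k_symbol *k = &k_table[i];
        if (sym == ((k->six_m << 4) | k->four_m) ||
            sym == ((k->six_p << 4) | k->four_p))
            return k->name;
    }
    return NULL;
}
```

For instance, both 0x0FA (001111 1010) and 0x305 (110000 0101) map to K28.5.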




C. Physical Layer Appendix

C.1. Data Scrambling

The following subroutines encode and decode an eight-bit value contained in "inbyte" with the LFSR. This is presented as one example only; there are many ways to obtain the proper output. This example demonstrates how to advance the LFSR eight times in one operation and how to XOR the data in one operation. Many other implementations are possible, but they must all produce the same output as that shown here.

The following algorithm uses the "C" programming language conventions. "<<" and ">>" represent the shift left and shift right operators, ">" is the compare greater than operator, "^" is the exclusive or operator, and "&" is the bitwise "AND" operator.

/*
this routine implements the serial descrambling algorithm in parallel form
this advances the lfsr 8 bits every time it is called
this requires fewer than 36 xor gates to implement (with a static register)

The XOR required to advance 8 bits / clock is:
bit   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
      8  9 10 11  8  9 10 11  0  1  2  3  4  5  6  7
      9 10 11 12  9 10 11 12 12 13 14 15     8  9  8
     10 11 12 13 10 11 12 13 13 14 15        9 10  9
     12 13 14 15 13 14 15    14 15          10 11 11
     15          14 15                      12 13 14
                 15                         15    15
*/


/* XOR required for creating the serial data is
bit   0  1  2  3  4  5  6  7
     15 14 13 12 11 10  9  8
        15 14 13 12 11 10  9
           15 14 13 12 11 10
                 15 14 13 12
                          15
*/

int scramble_byte(int inbyte)
{
    static int scrambit[16];
    static int bit[16];
    static int bit_out[16];
    static unsigned short lfsr = 0xffff;   // 16 bit short for polynomial
    int i, outbyte;

    if (inbyte == COMMA)                   // if this is a comma
    {
        lfsr = 0xffff;                     // reset the LFSR
        return (COMMA);                    // and return the same data
    }
    if (inbyte == SKIP)                    // don't advance or encode on skip
        return (SKIP);

    for (i = 0; i < 16; i++)               // convert LFSR to bit array for legibility
        bit[i] = (lfsr >> i) & 1;
    for (i = 0; i < 8; i++)                // convert byte to be scrambled for legibility
        scrambit[i] = (inbyte >> i) & 1;

    // apply the xor to the data
    if (!(inbyte & 0x100) &&               // if not a KCODE, scramble the data
        !(TrainingSequence == TRUE))       // and if not in the middle of a training sequence
    {
        scrambit[0] ^= bit[15];
        scrambit[1] ^= bit[14] ^ bit[15];
        scrambit[2] ^= bit[13] ^ bit[14] ^ bit[15];
        scrambit[3] ^= bit[12] ^ bit[13] ^ bit[14];
        scrambit[4] ^= bit[11] ^ bit[12] ^ bit[13] ^ bit[15];
        scrambit[5] ^= bit[10] ^ bit[11] ^ bit[12] ^ bit[14];
        scrambit[6] ^= bit[9] ^ bit[10] ^ bit[11] ^ bit[13];
        scrambit[7] ^= bit[8] ^ bit[9] ^ bit[10] ^ bit[12] ^ bit[15];
    }


    outbyte = 0;
    for (i = 0; i < 8; i++)                // convert data back to an integer
        outbyte += (scrambit[i] << i);


    if (inbyte == COMMA)                   // if this is a comma
    {
        lfsr = 0xffff;                     // reset the LFSR
        return (COMMA);                    // and return the same data
    }
    if (inbyte == SKIP)                    // don't advance or encode on skip
        return (SKIP);

    for (i = 0; i < 16; i++)               // convert LFSR to bit array for legibility
        bit[i] = (lfsr >> i) & 1;
    for (i = 0; i < 8; i++)                // convert byte to be descrambled for legibility
        descrambit[i] = (inbyte >> i) & 1;

    // apply the xor to the data
    if (!(inbyte & 0x100) &&               // if not a KCODE, descramble the data
        !(TrainingSequence == TRUE))       // and if not in the middle of a training sequence
    {
        descrambit[0] ^= bit[15];
        descrambit[1] ^= bit[14] ^ bit[15];
        descrambit[2] ^= bit[13] ^ bit[14] ^ bit[15];
        descrambit[3] ^= bit[12] ^ bit[13] ^ bit[14];
        descrambit[4] ^= bit[11] ^ bit[12] ^ bit[13] ^ bit[15];
        descrambit[5] ^= bit[10] ^ bit[11] ^ bit[12] ^ bit[14];
        descrambit[6] ^= bit[9] ^ bit[10] ^ bit[11] ^ bit[13];
        descrambit[7] ^= bit[8] ^ bit[9] ^ bit[10] ^ bit[12] ^ bit[15];
    }

    outbyte = 0;
    for (i = 0; i < 8; i++)                // convert data back to an integer
        outbyte += (descrambit[i] << i);


    // advance the LFSR 8 serial clocks
    bit_out[0]  = bit[8] ^ bit[9] ^ bit[10] ^ bit[12] ^ bit[15];
    bit_out[1]  = bit[9] ^ bit[10] ^ bit[11] ^ bit[13];
    bit_out[2]  = bit[10] ^ bit[11] ^ bit[12] ^ bit[14];
    bit_out[3]  = bit[11] ^ bit[12] ^ bit[13] ^ bit[15];
    bit_out[4]  = bit[8] ^ bit[9] ^ bit[10] ^ bit[13] ^ bit[14] ^ bit[15];
    bit_out[5]  = bit[9] ^ bit[10] ^ bit[11] ^ bit[14] ^ bit[15];
    bit_out[6]  = bit[10] ^ bit[11] ^ bit[12] ^ bit[15];
    bit_out[7]  = bit[11] ^ bit[12] ^ bit[13];
    bit_out[8]  = bit[0] ^ bit[12] ^ bit[13] ^ bit[14];
    bit_out[9]  = bit[1] ^ bit[13] ^ bit[14] ^ bit[15];
    bit_out[10] = bit[2] ^ bit[14] ^ bit[15];
    bit_out[11] = bit[3] ^ bit[15];
    bit_out[12] = bit[4];
    bit_out[13] = bit[5] ^ bit[8] ^ bit[9] ^ bit[10] ^ bit[12] ^ bit[15];
    bit_out[14] = bit[6] ^ bit[9] ^ bit[10] ^ bit[11] ^ bit[13];
    bit_out[15] = bit[7] ^ bit[8] ^ bit[9] ^ bit[11] ^ bit[14] ^ bit[15];

    lfsr = 0;
    for (i = 0; i < 16; i++)               // convert the LFSR back to an integer
        lfsr += (bit_out[i] << i);


      0,8   1,9   2,A   3,B   4,C   5,D   6,E   7,F
68    B78A  CF05  3E1B  7C36  F86C  50C9  A192  E335
70    667B  CCF6  39FD  73FA  E7F4  6FF9  DFF2  1FF5
78    3FEA  7FD4  FFA8  5F41  BE82  DD15  1A3B  3476

An 8-bit value of 0 repeatedly encoded with the LFSR after reset produces the following consecutive 8-bit values:

    00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00  8D 76 D2 C2 68 B3 26 1F 6C 43 08 A5 54 D3 52 34
10  02 95 55 8A 1B BE 1A BB 3D B7 56 FA 1B 0B F6 53
20  41 9C 80 F0 4A C3 7F 74 91 52 86 BE B6 A7 6F 7A
30  D6 E6 63 31 0C FE 36 F0 29 EA F3 1E 94 26 34 EB
40  17 3B 85 53 4D DC 4A E9 88 0E 20 5D D0 ED 01 69
50  DE 38 9E FA 07 4B DB 68 7B 43 0A 45 08 D7 0D 96
60  98 E6 35 3F A5 98 76 FE 15 F7 0E C8 AF 90 60 66
70  CB D5 F0 FA 9F 00 82 2B 91 74 31 0E 1E 6A F4 76
80  48 69 6D F4 93 8A CD 7B 7E AD 13 15 EE FE 72 1E
90  3B AA 14 A0 E7 D4 AA 23 67 9E DC B0 FB 73 A5 E0
A0  4F 94 CA 06 12 92 E2 63 6D 62 78 45 93 0C 26 53
B0  02 22 59 3E 63 CA 6E 2B 1F 1F 6A 63 ED A9 B5 35
C0  FD A0 A2 4A 96 E1 AF 71 62 7B D5 E1 8A 56 A0 55
D0  68 89 D1 82 FF B4 4C 23 7F 1E 48 83 7F E8 1A B2
E0  CD EA C5 A9 C3 AC 01 62 CE 39 09 F6 7D 76 5F 39
B0  41 58 A5 2F 7A 4A 6D 83 7A 58 8D 38 5C FF 3D 77
C0  B3 9C 23 3C A0 91 4D 56 E1 0B ED 43 A7 29 74 98
D0  A2 DB 2D E5 7F C8 8D E7 69 C6 B8 0A C9 83 D0 64
F0  3A F9 3D 05 EA D9 E9 EE 97 3B BD 44 8C 4B E2 0F

Scrambling produces the power spectrum shown in Figure C-1.
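The listing above is split across pages, and parts of it did not survive. The following self-contained C sketch recombines the surviving pieces into one routine. It rests on these assumptions:

• The scrambler advances its LFSR with the same bit_out equations shown for the descrambler.
• The LFSR is advanced after the data byte is XORed.
• Bit 8 of a symbol flags a K code, as the (inbyte & 0x100) test implies.
• COMMA and SKIP are K28.5 and K28.0 with that flag set.

The LFSR state is passed explicitly instead of held in a static variable, and a parameter replaces the global TrainingSequence flag. Scrambling and descrambling share one routine because both XOR the data with the same mask. All names are hypothetical. Under these assumptions, scrambling a run of zero bytes after a reset begins 8D 76 D2, the first three values of the row labelled 00 in the table above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical symbol values: bit 8 marks a K code (the listing tests
   inbyte & 0x100); COMMA = K28.5 and SKIP = K28.0 are assumptions. */
#define KCODE 0x100
#define COMMA (KCODE | 0xBC)
#define SKIP  (KCODE | 0x1C)

/* XOR mask for data bits 0..7: the scrambit/descrambit equations. */
static unsigned lfsr_mask(uint16_t lfsr)
{
    int b[16], m[8], i;
    unsigned mask = 0;
    for (i = 0; i < 16; i++)
        b[i] = (lfsr >> i) & 1;
    m[0] = b[15];
    m[1] = b[14] ^ b[15];
    m[2] = b[13] ^ b[14] ^ b[15];
    m[3] = b[12] ^ b[13] ^ b[14];
    m[4] = b[11] ^ b[12] ^ b[13] ^ b[15];
    m[5] = b[10] ^ b[11] ^ b[12] ^ b[14];
    m[6] = b[9] ^ b[10] ^ b[11] ^ b[13];
    m[7] = b[8] ^ b[9] ^ b[10] ^ b[12] ^ b[15];
    for (i = 0; i < 8; i++)
        mask |= (unsigned)m[i] << i;
    return mask;
}

/* Advance the LFSR by 8 serial clocks: the bit_out equations. */
static uint16_t lfsr_advance8(uint16_t lfsr)
{
    int b[16], o[16], i;
    unsigned next = 0;
    for (i = 0; i < 16; i++)
        b[i] = (lfsr >> i) & 1;
    o[0]  = b[8] ^ b[9] ^ b[10] ^ b[12] ^ b[15];
    o[1]  = b[9] ^ b[10] ^ b[11] ^ b[13];
    o[2]  = b[10] ^ b[11] ^ b[12] ^ b[14];
    o[3]  = b[11] ^ b[12] ^ b[13] ^ b[15];
    o[4]  = b[8] ^ b[9] ^ b[10] ^ b[13] ^ b[14] ^ b[15];
    o[5]  = b[9] ^ b[10] ^ b[11] ^ b[14] ^ b[15];
    o[6]  = b[10] ^ b[11] ^ b[12] ^ b[15];
    o[7]  = b[11] ^ b[12] ^ b[13];
    o[8]  = b[0] ^ b[12] ^ b[13] ^ b[14];
    o[9]  = b[1] ^ b[13] ^ b[14] ^ b[15];
    o[10] = b[2] ^ b[14] ^ b[15];
    o[11] = b[3] ^ b[15];
    o[12] = b[4];
    o[13] = b[5] ^ b[8] ^ b[9] ^ b[10] ^ b[12] ^ b[15];
    o[14] = b[6] ^ b[9] ^ b[10] ^ b[11] ^ b[13];
    o[15] = b[7] ^ b[8] ^ b[9] ^ b[11] ^ b[14] ^ b[15];
    for (i = 0; i < 16; i++)
        next |= (unsigned)o[i] << i;
    return (uint16_t)next;
}

/* Scramble (or descramble) one symbol. COMMA resets the LFSR, and SKIP
   passes through without advancing it. K codes and training-sequence
   bytes pass through unscrambled but still advance the LFSR, as the
   surviving listing appears to do. */
int scramble_symbol(uint16_t *lfsr, int sym, int in_training_sequence)
{
    int out = sym;
    if (sym == COMMA) {
        *lfsr = 0xFFFF;
        return COMMA;
    }
    if (sym == SKIP)
        return SKIP;
    if (!(sym & KCODE) && !in_training_sequence)
        out = sym ^ (int)lfsr_mask(*lfsr);
    *lfsr = lfsr_advance8(*lfsr);
    return out;
}
```

Scrambling and descrambling apply the same XOR mask from LFSRs that advance identically. Feeding a transmitter's output into a second instance that starts from the same state therefore recovers the original bytes.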


[Plot titled "3GIO Scrambled '0'": power (dBm/Hz, -140 to -80) versus frequency (0 to 10000 MHz)]
Figure C-1: Scrambling Spectrum for Data Value of 0


