Saturday, January 7, 2017

CCIE SPv4 - MPLS L3 VPN - BGP PE-CE Routing

Software versions:
IOS XE 15.5
IOS XR 5.3

The topology for this demo:
This post will focus on BGP as the PE-CE protocol, the de facto standard used by most ISPs. BGP is relatively easy to deploy: no redistribution is required on the PEs, because BGP automatically moves routes between address families (AFIs). So we'll have IPv4/IPv6 unicast on the PE-CE attachment circuit (AC) and VPNv4/VPNv6 between the PEs. IPv4 is currently the only AFI running, so we are going to test IPv4/VPNv4 only; we can enable IPv6 in upcoming scenarios. I'll list out the addressing for the PE-CE connections; all the VRFs will have the same subnet. The router number is the host address in the subnet, and XR1 is .11.

R1
IPv4 - 131.0.0.0 255.255.255.0
IPv6 - 2131:CC1E::/64

R3
IPv4 - 83.0.0.0 255.255.255.0, 73.0.0.0 255.255.255.0
IPv6 - 2083:CC1E::/64, 2073:CC1E::/64

R5
IPv4 - 59.0.0.0 255.255.255.0
IPv6 - 2059:CC1E::/64

R6
IPv4 - 116.0.0.0 255.255.255.0, 106.0.0.0 255.255.255.0
IPv6 - 2116:CC1E::/64, 2106:CC1E::/64

XR1
IPv4 - 113.0.0.0/24, 121.0.0.0/24
IPv6 - 2113:CC1E::/64, 2121:CC1E::/64

XR2
IPv4 - 214.0.0.0/24
IPv6 - 2214:CC1E::/64

XR3
IPv4 - 143.0.0.0/24
IPv6 - 2143:CC1E::/64


BGP as the PE-CE protocol is widely deployed and easy to manipulate. As long as the VRF, VPNv4 and IGP/LDP are in place, everything else is ready to go. We're going to start by configuring R1, R3, R5, R6 and XR3 as PE routers. The VPNv4 piece is already in place, so we only need to focus on the PE-CE part.

Since a VRF is required to connect to the customer, we have to use VRF-aware BGP: the peering with the CE is configured under the VRF address family of the PE's BGP process. The CE is going to be using the global RIB for BGP, and VRF-defined RIBs for the remaining IGP/static routing scenarios.

R1
router bgp 50693
 address-family ipv4 vrf BGP
  neighbor 131.0.0.13 remote-as 134
  neighbor 131.0.0.13 activate
 exit-address-family

R13
router bgp 134
 bgp log-neighbor-changes
 no bgp default ipv4-unicast
 neighbor 131.0.0.1 remote-as 50693
 !
 address-family ipv4
  network 131.0.0.0 mask 255.255.255.0
  neighbor 131.0.0.1 activate
 exit-address-family


R3
router bgp 50693
 address-family ipv4 vrf BGP
  neighbor 83.0.0.8 remote-as 8
  neighbor 83.0.0.8 activate
 exit-address-family

R8
router bgp 8
 bgp log-neighbor-changes
 no bgp default ipv4-unicast
 neighbor 83.0.0.3 remote-as 50693
 !
 address-family ipv4
  network 83.0.0.0 mask 255.255.255.0
  neighbor 83.0.0.3 activate
 exit-address-family


R5
router bgp 50693
 address-family ipv4 vrf BGP
  neighbor 59.0.0.9 remote-as 9
  neighbor 59.0.0.9 activate
 exit-address-family

R9
router bgp 9
 bgp log-neighbor-changes
 no bgp default ipv4-unicast
 neighbor 59.0.0.5 remote-as 50693
 !
 address-family ipv4
  network 59.0.0.0 mask 255.255.255.0
  neighbor 59.0.0.5 activate
 exit-address-family


R6
router bgp 50693
 address-family ipv4 vrf BGP
  neighbor 106.0.0.10 remote-as 10
  neighbor 106.0.0.10 activate
 exit-address-family

R10
router bgp 10
 bgp log-neighbor-changes
 no bgp default ipv4-unicast
 neighbor 106.0.0.6 remote-as 50693
 !
 address-family ipv4
  network 106.0.0.0 mask 255.255.255.0
  neighbor 106.0.0.6 activate
 exit-address-family


XR3
router bgp 50693
 vrf BGP
  rd 20:50693
  address-family ipv4 unicast
  !
  neighbor 143.0.0.14
   remote-as 143
   address-family ipv4 unicast
    route-policy PASS in
    route-policy PASS out
!
route-policy PASS
  pass
end-policy

R14
router bgp 143
 bgp log-neighbor-changes
 no bgp default ipv4-unicast
 neighbor 143.0.0.13 remote-as 50693
 !
 address-family ipv4
  network 143.0.0.0 mask 255.255.255.0
  neighbor 143.0.0.13 activate
 exit-address-family


Now that we have all the configuration in place, I'll go through the step-by-step verification that is required to make sure that all is well.

To level set, there will be 2 labels used: the "transport" label and the "VPN" label. The VPN label is allocated by BGP upon receipt of a prefix from a customer in that customer's VRF. VPNv4 carries the label along with the prefix. This label identifies customer traffic, and it does not change on the way through the core.

The transport label is a bit different: it is allocated via LDP. Once an LDP adjacency is formed, LDP assigns a label for every route the IGP has in the RIB. This is done by default and can be modified if required. The transport label changes on a per-hop basis through the MPLS core, and it is used to move the VPN label from the ingress PE to the egress PE. The 2-label stack is transport label, then VPN label, then payload.
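
To make the stack order concrete, here's a rough Python sketch of how the two labels sit in front of the payload. It assumes the standard 4-byte MPLS label stack entry layout (20-bit label, 3-bit EXP, 1-bit bottom-of-stack flag, 8-bit TTL). The function names and the sample labels 25 and 26 are only for illustration, not something a router exposes.

```python
import struct

def encode_label_stack(labels, exp=0, ttl=255):
    """Pack labels (top of stack first) into 4-byte MPLS label stack entries."""
    out = b""
    for i, label in enumerate(labels):
        if not 0 <= label < 2**20:
            raise ValueError("MPLS labels are 20 bits wide")
        # The bottom-of-stack (S) bit is set only on the last entry.
        bottom = 1 if i == len(labels) - 1 else 0
        entry = (label << 12) | (exp << 9) | (bottom << 8) | ttl
        out += struct.pack("!I", entry)
    return out

def decode_label_stack(data):
    """Unpack consecutive 4-byte entries back into dictionaries."""
    entries = []
    for (entry,) in struct.iter_unpack("!I", data):
        entries.append({
            "label": entry >> 12,
            "exp": (entry >> 9) & 0x7,
            "bottom": bool((entry >> 8) & 0x1),
            "ttl": entry & 0xFF,
        })
    return entries

# Illustrative values: transport label 25 on top, VPN label 26 at the bottom.
stack = encode_label_stack([25, 26])
for e in decode_label_stack(stack):
    print(e)
```

The only thing that tells a router the VPN label is the last one before the payload is the bottom-of-stack bit, which is why the order matters.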

Let's start on R1.

R1#sh bgp vpnv4 unicast all | b Route
Route Distinguisher: 20:50693 (default for vrf BGP)
 * i 59.0.0.0/24      192.168.1.5              0    100      0 9 i
 *>i                  192.168.1.5              0    100      0 9 i
 * i 83.0.0.0/24      192.168.1.3              0    100      0 8 i
 *>i                  192.168.1.3              0    100      0 8 i
 * i 106.0.0.0/24     192.168.1.6              0    100      0 10 i
 *>i                  192.168.1.6              0    100      0 10 i
 r>  131.0.0.0/24     131.0.0.13               0             0 134 i
 * i 143.0.0.0/24     192.168.1.13             0    100      0 143 i
 *>i                  192.168.1.13             0    100      0 143 i

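The r> flag marks a RIB failure: BGP selected the path as best, but it could not be installed in the routing table because the same prefix is already there from a source with a better administrative distance, in this case the connected route. You can see we learn the other routes 2 different times, like 59.0.0.0/24. Let's expand that route to see why.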
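
If the RIB failure part is fuzzy, this small Python sketch shows the idea. It assumes the default IOS administrative distances (connected 0, static 1, eBGP 20, iBGP 200); the dictionary layout and the install() helper are made up for the example.

```python
# Default IOS administrative distances (lower wins).
ADMIN_DISTANCE = {"connected": 0, "static": 1, "ebgp": 20, "ibgp": 200}

def install(candidates):
    """Return (winner, losers) for one prefix; the losers are RIB failures."""
    ranked = sorted(candidates, key=lambda r: ADMIN_DISTANCE[r["source"]])
    return ranked[0], ranked[1:]

# R1 sees 131.0.0.0/24 both as connected and from the CE via eBGP.
candidates = [
    {"prefix": "131.0.0.0/24", "source": "ebgp", "next_hop": "131.0.0.13"},
    {"prefix": "131.0.0.0/24", "source": "connected", "next_hop": None},
]

winner, rib_failures = install(candidates)
print("installed:", winner["source"])
print("rib-failure:", [r["source"] for r in rib_failures])
```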

R1#sh bgp vpnv4 unicast all 59.0.0.0
BGP routing table entry for 20:50693:59.0.0.0/24, version 14
Paths: (2 available, best #2, table BGP)
  Advertised to update-groups:
     3
  Refresh Epoch 1
  9
    192.168.1.5 (metric 4) (via default) from 192.168.1.14 (192.168.1.14)
      Origin IGP, metric 0, localpref 100, valid, internal
      Extended Community: RT:20:50693
      Originator: 192.168.1.5, Cluster list: 192.168.1.14
      mpls labels in/out nolabel/26
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 1
  9
    192.168.1.5 (metric 4) (via default) from 192.168.1.2 (192.168.1.2)
      Origin IGP, metric 0, localpref 100, valid, internal, best
      Extended Community: RT:20:50693
      Originator: 192.168.1.5, Cluster list: 192.168.1.2
      mpls labels in/out nolabel/26
      rx pathid: 0, tx pathid: 0x0

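The reason we get two entries is that we are learning the route from both R2 and XR4, the BGP route reflectors. The BGP path selection process works through an ordered list of tie-breakers from the top down. Here everything ties until the very end: both paths have the same Originator (192.168.1.5, which stands in for the router ID on reflected routes) and the same Cluster list length, so the decision falls to the lowest neighbor address. R2 (192.168.1.2) is lower than XR4 (192.168.1.14), which is why the bottom path was chosen.

You'll also notice that below the Originator and Cluster list is "mpls labels in/out nolabel/26". The "in" value is the local label R1 allocated for the prefix, and there is none because 59.0.0.0/24 lives behind another PE. The "out" value, 26, is the VPN label that R5 advertised for the prefix, which R1 imposes when it sends traffic toward R5. R1 allocates VPN labels of its own for the prefixes in its VRF, so if we look in the LFIB we should see a local label mapped to 131.0.0.0/24 [V], the [V] identifying this as a VPN route. In this lab that label also happens to be 26.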
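
Here's a rough Python sketch of just the last few tie-breakers, fed with the two paths from the output above. It assumes every earlier step (weight, local preference, AS path, origin, MED, eBGP vs. iBGP, IGP metric) is a tie, which matches what R1 shows. The tie_break_key() helper and the dictionary fields are hypothetical names for the example.

```python
import ipaddress

def tie_break_key(path):
    """Sort key for the final BGP tie-breakers; the lowest tuple wins."""
    ip = ipaddress.ip_address
    # On reflected routes the Originator ID is compared in place of the router ID.
    router_id = ip(path.get("originator") or path["router_id"])
    return (router_id, len(path["cluster_list"]), ip(path["neighbor"]))

# The two paths for 59.0.0.0/24 as R1 sees them.
paths = [
    {"neighbor": "192.168.1.14", "router_id": "192.168.1.14",
     "originator": "192.168.1.5", "cluster_list": ["192.168.1.14"]},
    {"neighbor": "192.168.1.2", "router_id": "192.168.1.2",
     "originator": "192.168.1.5", "cluster_list": ["192.168.1.2"]},
]

best = min(paths, key=tie_break_key)
print("best path learned from", best["neighbor"])
```

Addresses are compared as numbers through the ipaddress module, since a plain string compare would put "192.168.1.14" ahead of "192.168.1.2".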

R1#sh mpls forwarding-table vrf BGP
Local      Outgoing   Prefix           Bytes Label   Outgoing   Next Hop
Label      Label      or Tunnel Id     Switched      interface
26         No Label   131.0.0.0/24[V]  1597232       aggregate/BGP

The "aggregate/BGP" entry means this is an aggregate label for the VRF named BGP: R1 pops the label and then does an IP lookup in that VRF's table, which is typical for a connected prefix like this one. R1 allocated label 26 via BGP, so any other PE that wants to reach this prefix uses label 26 as the VPN label to get to R1. The ingress PE uses LDP-assigned labels to get to R1. When R1 receives the incoming traffic and pops the VPN label, it knows where to send the traffic.

R9#traceroute 131.0.0.13 num
Type escape sequence to abort.
Tracing the route to 131.0.0.13
VRF info: (vrf in name/id, vrf out name/id)
  1 59.0.0.5 1 msec 1 msec 1 msec
  2 10.4.5.4 [MPLS: Labels 25/26 Exp 0] 5 msec 4 msec 5 msec
  3 10.15.4.15 [MPLS: Labels 24031/26 Exp 0] 28 msec 21 msec 21 msec
  4 131.0.0.1 [AS 134] 20 msec 12 msec 19 msec
  5 131.0.0.13 [AS 134] 7 msec *  5 msec

As R9 sends traffic to R13, it doesn't know about the MPLS core; it only knows, via BGP, to send traffic to R5. When R5 receives the packet, it does a RIB lookup in the VRF to see if it knows about the 131.0.0.0/24 prefix. It does, and the next hop is 192.168.1.1.

R5#sh ip route vrf BGP 131.0.0.0

Routing Table: BGP
Routing entry for 131.0.0.0/24, 1 known subnets
B        131.0.0.0 [200/0] via 192.168.1.1, 1d00h

Now we know that 131.0.0.0/24 is there and that the next hop is R1, or 192.168.1.1. There is no outgoing interface listed in the RIB, so the LFIB is consulted.

We can't look up 131.0.0.0/24 there, since the MPLS core isn't aware of the customer's network, so R5 looks up the next hop info for R1 instead.

R5#sh mpls forwarding-table 192.168.1.1
Local      Outgoing   Prefix           Bytes Label   Outgoing   Next Hop
Label      Label      or Tunnel Id     Switched      interface
25         25         192.168.1.1/32   290524        Gi1.45     10.4.5.4

OK, so now we know that label 25 is the outgoing label to R4 on Gi1.45.

This can also be looked up with CEF if you know the valid next hop info.

R5#sh ip cef 192.168.1.1
192.168.1.1/32
  nexthop 10.4.5.4 GigabitEthernet1.45 label 25
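
The two lookups R5 just did can be sketched in a few lines of Python. This is a toy model, not how CEF is implemented: the tables are plain dictionaries with exact-match keys, the resolve() helper is a made-up name, and VPN label 26 is taken from the traceroute output rather than from R5's RIB.

```python
# Values copied from the R5 outputs in this post; the dict layout is hypothetical.
VRF_RIB = {
    "BGP": {"131.0.0.0/24": {"next_hop": "192.168.1.1", "vpn_label": 26}},
}
GLOBAL_LFIB = {
    "192.168.1.1/32": {"out_label": 25, "interface": "Gi1.45", "next_hop": "10.4.5.4"},
}

def resolve(vrf, prefix):
    """Recurse from a VRF prefix to the label stack and exit interface."""
    route = VRF_RIB[vrf][prefix]                  # step 1: VRF RIB lookup
    lsp = GLOBAL_LFIB[route["next_hop"] + "/32"]  # step 2: recurse on the BGP next hop
    return {
        "labels": [lsp["out_label"], route["vpn_label"]],  # transport on top, VPN below
        "interface": lsp["interface"],
        "next_hop": lsp["next_hop"],
    }

result = resolve("BGP", "131.0.0.0/24")
print(result)
```

The result lines up with hop 2 of the traceroute, where R4 receives the packet with labels 25/26.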

The process repeats itself on R4, and then on XR5, where it gets interesting.

RP/0/0/CPU0:XR5#sh mpls forwarding prefix 192.168.1.1/32
Sat Jan  7 01:32:47.232 UTC
Local  Outgoing    Prefix             Outgoing     Next Hop        Bytes
Label  Label       or ID              Interface                    Switched
------ ----------- ------------------ ------------ --------------- ------------
24031  Pop         192.168.1.1/32     Gi0/0/0/0.115 10.1.15.1       2433594

So, now the traffic arrives on XR5. Instead of a label value in the outgoing label field, we see "Pop", which tells us a couple of things: first, XR5 is directly connected to R1, and second, R1 has advertised "implicit null" for this prefix. R1 advertises this to its directly connected LDP peers.

R1#sh mpls ldp bindings local detail | b 192.168.1.1/32
  lib entry: 192.168.1.1/32, rev 2, chkpt: none
        local binding:  label: imp-null (owner LDP)
          Advertised to:
          192.168.1.12:0         192.168.1.15:0         192.168.1.11:0

R1 is telling XR2, XR5 and XR1 to pop the transport label before they forward traffic for this prefix to R1.

RP/0/0/CPU0:XR5#sh mpls ldp bindings 192.168.1.1/32
Sat Jan  7 01:37:10.064 UTC
192.168.1.1/32, rev 172
        Local binding: label: 24031
        Remote bindings: (3 peers)
            Peer                Label
            -----------------   ---------
            192.168.1.1:0       ImpNull

As you can see, XR5 sees implicit null being advertised from R1, so it pops the transport label and the packet reaches R1 with only the VPN label. That is why, when R1 sees traffic coming in with label 26, it knows where to send it: out the VRF BGP interface to R13.

It honestly takes this much digging to understand the depth of MPLS. Knowing how to recurse is really important. Since we are only using LDP and not RSVP TE or MPLS TE or Segment Routing or MPLS TP yet, this is still very basic.
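
To tie the whole walk together, here's a hedged Python sketch that replays the label operations hop by hop. The label values come from the outputs in this post, except R4's swap from 25 to 24031, which is inferred from the traceroute since R4's LFIB isn't shown. The label_switch() helper and the "pop"/"aggregate" markers are names made up for the example.

```python
def label_switch(stack, local, out):
    """Apply one LFIB entry to a label stack (top of stack is index 0)."""
    if not stack or stack[0] != local:
        raise ValueError(f"no LFIB match for stack {stack}")
    if out in ("pop", "aggregate"):
        # "pop" models PHP at the penultimate hop; "aggregate" models the
        # egress PE removing the VPN label before its VRF IP lookup.
        return stack[1:]
    return [out] + stack[1:]  # swap the top label, leave the rest alone

# (router, local label, outgoing action) along the path from R5 to R1.
HOPS = [
    ("R4", 25, 24031),        # inferred from the traceroute
    ("XR5", 24031, "pop"),    # implicit null advertised by R1
    ("R1", 26, "aggregate"),  # aggregate/BGP entry in R1's LFIB
]

stack = [25, 26]              # imposed by R5: transport label, then VPN label
history = [("R5", list(stack))]
for router, local, out in HOPS:
    stack = label_switch(stack, local, out)
    history.append((router, list(stack)))

for router, labels in history:
    print(f"after {router}: {labels}")
```

The VPN label 26 never changes until R1 removes it, while the transport label is swapped or popped at every hop.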

Thanks for stopping by!
Rob Riker, CCIE #50693
