* [chassis/multi-asic] Make sure iBGP session established as directly connected (#16777)
What I did:
Make Sure for internal iBGP we are one-hop away (directly connected) by using Generic TTL security mechanism.
Why I did:
Without this change it's possible on packet chassis i-BGP can be established even if there no direct connection. Below is the example
- Let's say we have 3 LC's LC1/LC2/LC3 each having i-BGP session session with each other over Loopback4096
- Each LC's have static route towards other LC's Loopback4096 to establish i-BGP session
- LC1 learn default route 0.0.0.0/0 from it's e-BGP peers and send it over to LC2 and LC3 over i-BGP
- Now for some reason on LC2 static route towards LC3 is removed/not-present/some-issue we expect i-BGP session should go down between LC2 and LC3
- However i-BGP between LC2 and LC3 does not go down because of feature ip nht-resolve-via-default where LC2 will use default route to reach Loopback4096 of LC3. As it's using default route BGP packets from LC2 towards LC3 will first route to LC1 and then go to LC3 from there.
Above scenario can result in packet mis-forwarding on data plane
How I fixed it:-
To make sure BGP packets between i-BGP peers are not going with extra routing hop enable using GTSM feature
neighbor PEER ttl-security hops NUMBER
This command enforces Generalized TTL Security Mechanism (GTSM), as specified in RFC 5082. With this command, only neighbors that are the specified number of hops away will be allowed to become neighbors. This command is mutually exclusive with ebgp-multihop.
We set hop count as 1 which makes FRR to reject BGP connection if we receive BGP packets if it's TTL < 255. Also setting this attribute make sure i-BGP frames are originated with IP TTL of 255.
How I verify:
Manual Verification of above scenario. See blow BGP packets receive with IP TTL 254 (additional routing hop) we are seeing FIN TCP flags as BGP is rejecting the connection
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
* Update peer-group.conf.j2
* Update result_all.conf
* Update result_base.conf
---------
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
#### Why I did it
src/sonic-utilities
```
* 2b6b6580 - (HEAD -> 202305, origin/202305) Added support to display only nonzero queue counter. (#2978) (#3046) (15 hours ago) [abdosi]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-host-services
```
* 689395a - (HEAD -> 202305, origin/202305) Updated the iptable rule to use parent/base name of midplane interface of chassis. (#75) (2 days ago) [abdosi]
* 45212a8 - [DualToR][caclmgrd] Fix IPtables rules for multiple vlan interfaces for DualToR config (#82) (2 days ago) [vdahiya12]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-linux-kernel
```
* 6ff3cc2 - (HEAD -> 202305, origin/202305) arm64: Kconfig inclusions to fix PCI hang and MTD detection (#362) (2 days ago) [Pavan Naregundi]
```
#### How I did it
#### How to verify it
#### Description for the changelog
What I did:
Enable Sending BGP Community over internal neighbors over iBGP Session
Microsoft ADO: 25268695
Why I did:
Without this change BGP community send by e-BGP Peers are not carry-forward to other e-BGP peers.
str2-xxxx-lc1-2# show bgp ipv6 20c0:a801::/64
BGP routing table entry for 20c0:a801::/64, version 52141
Paths: (1 available, best #1, table default)
Not advertised to any peer
65000 65500
2603:10e2:400::6 from 2603:10e2:400::6 (3.3.3.6)
Origin IGP, localpref 100, valid, internal, best (First path received)
Last update: Tue Sep 26 16:08:26 2023
str2-xxxx-lc1-2# show ip bgp 192.168.35.128/25
BGP routing table entry for 192.168.35.128/25, version 52688
Paths: (1 available, best #1, table default)
Not advertised to any peer
65000 65502
3.3.3.6 from 3.3.3.6 (3.3.3.6)
Origin IGP, localpref 100, valid, internal, best (First path received)
Last update: Tue Sep 26 15:45:51 2023
After the change
str2-xxxx-lc2-2(config)# router bgp 65100
str2-xxxx-lc2-2(config-router)# address-family ipv4
str2-xxxx-lc2-2(config-router-af)# neighbor INTERNAL_PEER_V4 send-community
str2-xxxx-lc2-2(config-router-af)# exit
str2-xxxx-lc2-2(config-router)# address-family ipv6
str2-xxxx-lc2-2(config-router-af)# neighbor INTERNAL_PEER_V6 send-community
str2-xxxx-lc1-2# show bgp ipv6 20c0:a801::/64
BGP routing table entry for 20c0:a801::/64, version 52400
Paths: (1 available, best #1, table default)
Not advertised to any peer
65000 65500
2603:10e2:400::6 from 2603:10e2:400::6 (3.3.3.6)
Origin IGP, localpref 100, valid, internal, best (First path received)
**Community: 1111:1111**
Last update: Tue Sep 26 16:10:19 2023
str2-xxxx-lc1-2# show ip bgp 192.168.35.128/25
BGP routing table entry for 192.168.35.128/25, version 52947
Paths: (1 available, best #1, table default)
Not advertised to any peer
65000 65502
3.3.3.6 from 3.3.3.6 (3.3.3.6)
Origin IGP, localpref 100, valid, internal, best (First path received)
**Community: 1111:1111**
Last update: Tue Sep 26 16:10:09 2023
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
#### Why I did it
src/sonic-linux-kernel
```
* f086121 - (HEAD -> 202305, origin/202305) Intgerate HW-MGMT 7.0030.2008 Changes (#361) (12 hours ago) [Vivek]
* 7551dd9 - arm64: Enable CONFIG_KEXEC_FILE (#360) (13 hours ago) [Pavan Naregundi]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Update SAI version to SAIBuild2305.26.0.0
New features
FDB entries are now restored after warmboot to prevent temporary system flooding.
Update SDK/FW to 4.6.2102/2012.2102
Fixed Issues:
Some of the Warmboot related files which were created by SDK during switch create are now generated during pre shutdown flow
Work item tracking
Microsoft ADO (number only):
How I did it
Updating the versions in make file.
How to verify it
Running sonic-mgmt regression.
cherry pick #16894
Why I did it
Privileges and volumes were incorrectly set in macsec container. Privileged flag is set to false and volumes are not mounted properly.
admin@vlab-01:~$ docker inspect macsec0 | grep Privi
"Privileged": false,
admin@vlab-01:~$ docker inspect macsec0 | grep -A 10 Binds
"Binds": [
"/var/run/redis0:/var/run/redis:rw",
"/var/run/redis-chassis:/var/run/redis-chassis:ro",
"/usr/share/sonic/device/x86_64-nokia_ixr7250e_36x400g-r0/Nokia-IXR7250E-36x100G/0:/usr/share/sonic/hwsku:ro",
"/var/run/redis0/:/var/run/redis0/:rw",
"/usr/share/sonic/device/x86_64-nokia_ixr7250e_36x400g-r0:/usr/share/sonic/platform:ro"
],
Work item tracking
Microsoft ADO (number only):
How I did it
How to verify it
Make sure privileged settings remain unchanged and make sure volumes are properly mounted
admin@vlab-01:~$ docker inspect macsec | grep Privi
"Privileged": false,
admin@vlab-01:~$ docker inspect macsec | grep -A 10 Binds
"Binds": [
"/etc/timezone:/etc/timezone:ro",
"/var/run/redis:/var/run/redis:rw",
"/var/run/redis-chassis:/var/run/redis-chassis:ro",
"/etc/fips/fips_enable:/etc/fips/fips_enable:ro",
"/usr/share/sonic/templates/rsyslog-container.conf.j2:/usr/share/sonic/templates/rsyslog-container.conf.j2:ro",
"/etc/sonic:/etc/sonic:ro",
"/host/warmboot:/var/warmboot",
"/usr/share/sonic/device/x86_64-kvm_x86_64-r0/Force10-S6000/:/usr/share/sonic/hwsku:ro",
"/usr/share/sonic/device/x86_64-kvm_x86_64-r0:/usr/share/sonic/platform:ro"
],
* Support lazy install of sdk drivers
This patch adds support for lazy install of Marvell prestera SDK
drivers for platform-nokia. Lazy install for drivers is added as
updated sdk driver needs to classify the drivers required for platform
during compile time. SDK drivers and platform files are now fetched
from a submodule(mrvl-prestera).
Additionaly, DTB required for sonic_fit creation during compile time
is sourced from sonic-linux-kernel.
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
* Add hugepage cmdline agrument
Updated sdk & driver requries hugepage to be reserved during kernel
boot. These kernel command line agrument are passed from installer.conf
in device folder.
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
* Update SAI deb to 1.12.0-3
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
---------
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
Why I did it
A race condition exists while the TPH is processing a netlink message - if a second netlink message arrives during processing it will be missed since TPH is not listening for other messages.
Another bug was found where TPH was unnecessarily restarting since it was checking admin status instead of operational status of portchannels.
How I did it
Subscribe to APPL_DB for updates on LAG operational state
Track currently sniffed interfaces
How to verify it
Send tunnel packets with destination IP of an unresolved neighbor, verify that ping commands are run
Shut down a portchannel interface, verify that sniffer does not restart
Send tunnel packets, verify ping commands are still run
Bring up portchannel interface, verify that sniffer restarts
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
Orchagent uses PORTCHANNEL term when parsing this field. Change the YANG model to align to orchagent.
- Why I did it
When specifying PORTCHANNEL in ACL_TABLE_TYPE table YAGN model validation does not pass, when using term LAG orchagent does not accept such table type.
Fix it by aligning YANG model to orchagent.
- How I did it
Fix in YANG model.
- How to verify it
Create custom ACL table type.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
#### Why I did it
src/sonic-utilities
```
* 3609e417 - (HEAD -> 202305, origin/202305) [sonic-package-manager] do not modify config_db.json (#3032) (2 hours ago) [Stepan Blyshchak]
* 354dfe80 - [sonic_installer]: Improve exception handling: introduce notes. (#3028) (3 hours ago) [Nazarii Hnydyn]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Share docker image to support gnmi container and telemetry container
backport #16863
Work item tracking
Microsoft ADO 25423918:
How I did it
Create telemetry image from gnmi docker image.
Enable gnmi container and disable telemetry container by default.
How to verify it
Run end to end test.
#### Why I did it
src/sonic-swss
```
* 65720c1a - (HEAD -> 202305, origin/202305) Send hearbeat during warm reboot freese (#2923) (#2956) (14 hours ago) [Hua Liu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-swss
```
* 9b9ac4fd - (HEAD -> 202305, origin/202305) Add more debug information when PFC WD is triggered (#2858) (8 minutes ago) [Stephen Sun]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
To avoid orchagent crash issue like sonic-net/sonic-swss#2935, disable unsupported counters on SONiC management devices.
Work item tracking
Microsoft ADO (number only): 25437720
How I did it
Update the minigraph parser to disable unsupported counters on management devices.
How to verify it
Verified by unittest.
Manually apply patch to DUT and do config load_minigraph
Co-authored-by: Zhijian Li <zhijianli@microsoft.com>
Co-authored-by: StormLiangMS <89824293+StormLiangMS@users.noreply.github.com>
Why I did it
The current DEVICE_NEIGHBOR_METADATA yang model has two issues that would block GCU operation when it checks if the current config aligns with the YANG model:
Missing cluster field in YANG
Incomplete set of device type. The device type in YANG model doesn't include all the device type.
Work item tracking
Microsoft ADO (number only): 25577813
How I did it
Add cluster field in DEVICE_NEIGHBOR_METADATA YANG model.
Change device type to string.
Fix the UT test accordingly.
How to verify it
Build the image and verify the unit tests passed.
Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
Write error message to syslog when add user failed or connect to TACACS server failed.
Why I did it
With these messages, we can downgrade TACACS server with issue to lower priority.
Work item tracking
Microsoft ADO: 24667696
How I did it
Write error message to syslog when add user failed or connect to TACACS server failed.
How to verify it
Pass all UT.
Manually verify error message generated.
Why I did it
Drop for 8111-32EH-O:
Fix for clear_trap_configuration errors
Fix OREDERED ECMP NHG drop when route is added before members are added
Fix port handling of empty ecmp group to drop packets
Fix for link_notification_handle error
Auto FPD upgrade support
Work item tracking
Microsoft ADO (number only):
How I did it
update platform to 202305.1.0.1
#### Why I did it
src/sonic-gnmi
```
* a49ca56 - (HEAD -> 202305, origin/202305) Merge pull request #167 from zbud-msft/cherry-pick-fix-panic-202305 (11 hours ago) [StormLiangMS]
* 6ba1125 - Merge branch '202305' into cherry-pick-fix-panic-202305 (2 weeks ago) [Zain Budhwani]
* 3a0fbb9 - Fix build error (2 weeks ago) [Zain Budhwani]
* 7fad847 - Recover from potential panic when doing map to JSON serialization (#161) (2 weeks ago) [Zain Budhwani]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-platform-common
```
* e7325db - (HEAD -> 202305, origin/202305) Fix SSD health percentage issue for vendor Virtium (#407) (#408) (11 hours ago) [Stephen Sun]
* 87e33ab - [Credo][Ycable] Remove the thread locker protection from the thread-safe APIs (#388) (11 hours ago) [Xinyu Lin]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-sairedis
```
* 5a052ed - (HEAD -> 202305, origin/202305) [warmboot] Add workaround for `INIT_VIEW` failure (#1252) (11 hours ago) [Jing Zhang]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
Add the create_only_config_db_buffers attribute to the DEVICE_METADATA|localhost. If the "create_only_config_db_buffers" exists and is equal to "true" - the buffers will be created according to the config_db configuration (for example BUFFER_QUEUE|* table), otherwise the maximum available buffers (which are read from SAI) will be created, regardless of the CONFIG_DB buffers configuration.
Work item tracking
Microsoft ADO (number only):
How I did it
Add the create_only_config_db_buffers.json files for Mellanox devices (not MSFT SKU's), and inject the content to the CONFIG_DB during the swss docker container start.
How to verify it
Manual verification:
Install the image with this PR included on the not MSFT SKU switch
Check the show queue counters output and verify that only configured in CONFIG_DB buffers are created
root@sonic:/home/admin# show queue counters
Port TxQ Counter/pkts Counter/bytes Drop/pkts Drop/bytes
--------- ----- -------------- --------------- ----------- ------------
Ethernet0 UC0 0 0 0 N/A
Ethernet0 UC1 0 0 0 N/A
Ethernet0 UC2 0 0 0 N/A
Ethernet0 UC3 0 0 0 N/A
Ethernet0 UC4 0 0 0 N/A
Ethernet0 UC5 0 0 0 N/A
Ethernet0 UC6 0 0 0 N/A
Open the /usr/share/sonic/device/$DEVICE/$SKU/create_only_config_db_buffers.json and change it to:
"create_only_config_db_buffers": "false"
Do config reload
Check the show queue counters output and verify that all available buffers are created
root@sonic:/home/admin# show queue counters
Port TxQ Counter/pkts Counter/bytes Drop/pkts Drop/bytes
--------- ----- -------------- --------------- ----------- ------------
Ethernet0 UC0 0 0 0 N/A
Ethernet0 UC1 0 0 0 N/A
Ethernet0 UC2 0 0 0 N/A
Ethernet0 UC3 0 0 0 N/A
Ethernet0 UC4 0 0 0 N/A
Ethernet0 UC5 0 0 0 N/A
Ethernet0 UC6 0 0 0 N/A
Ethernet0 UC7 60 15346 0 N/A
Ethernet0 MC8 N/A N/A N/A N/A
Ethernet0 MC9 N/A N/A N/A N/A
Ethernet0 MC10 N/A N/A N/A N/A
Ethernet0 MC11 N/A N/A N/A N/A
Ethernet0 MC12 N/A N/A N/A N/A
Ethernet0 MC13 N/A N/A N/A N/A
Ethernet0 MC14 N/A N/A N/A N/A
Ethernet0 MC15 N/A N/A N/A N/A
Why I did it
To improve FAST reboot dataplane downtime
Work item tracking
N/A
How I did it
Updated SAI xml config file
How to verify it
Run sonic-mgmt tests of fastboot