Why I did it
To fix the EVPN type5 failure seen in FRR when there are multipaths for nexthop. The type5 routes were queued
show ip route vrf Vrf1
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
f - OpenFabric,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
VRF Vrf1:
B>q 5.5.5.0/24 [200/0] via 30.0.0.2, Vlan100 onlink, weight 1, 00:00:40
q via 40.0.0.3, Vlan100 onlink, weight 1, 00:00:40
C>* 10.0.0.0/24 is directly connected, Vlan10, 00:00:43
B>q 100.0.0.0/24 [200/0] via 30.0.0.2, Vlan100 onlink, weight 1, 00:00:40
q via 40.0.0.3, Vlan100 onlink, weight 1, 00:00:40
Work item tracking
Microsoft ADO (number only):
How I did it
Porting the FRR fixFRRouting/frr#14835
How to verify it
Validated EVPN multipath with the scenario and confirmed its working.
The format of the media_settings.json file was updated to support the Port SI Per Speed Enhancements. Since media_checker is the validator for the media_settings.json file, it needs to be updated to align with the new format.
How I did it
I added six new SI parameter names introduced as part of the Port SI Per Speed Enhancements. Additionally, I implemented handling for the new hierarchy level (lane_speed_key) in the updated media_settings.json format while maintaining backward compatibility with vendors whose JSON does not support port SI per speed.
How to verify it
I locally built the Debian package using 'make target/debs/bullseye/sonic-device-data_1.0-1_all.deb,' and it completed successfully. Jenkins also built the entire image, which includes the media_checker as part of its process.
[arp_update]: Flush MAC mismatch neighbors
- Check for MAC mismatch between neighbor entries in the kernel and APPL_DB
- Flush any entries with a mismatch
Currently in this repo would not build dhcp_server container image by default, which would cause that building issue for dhcp_server introduced by other modules cannot be noticed in time.
This PR is to set build dhcp_server container in vs image.
* Update sonic-utilities to master branch version
sonic-utilities was (intentionally) pointing to a commit on a fork,
since merging sonic-utilities's changes for Bookworm first onto the
master branch would result in PR checker failures. Now that
sonic-buildimage is on master branch and the Bookworm changes in
sonic-utilities have been merged into master, sonic-utilties can now
point to master.
17e77fe2 Revert "Run yang validation in unit test (#3025)" (#3055)
96dd5559 [dhcp_relay] Fix dhcp_relay counter display issue (#3054)
6dfeee69 [sflow][db_migrator] Egress Sflow support (#3020)
02a588b7 Don't collect /proc/sched_debug
d7ec3251 Fix error about having a mutable default for field headers in dataclass
0ab3ab91 Fix test execution on Bookworm (#3041)
ef8f6f83 Specify test dependencies under extra_requires
61c44e80 Update python packages
1e813105 [wol] Implement wol command line utility (#3048)
8ebc56a0 [sonic_installer]: Improve exception handling: introduce notes. (#3029)
3610ce93 [sonic-package-manager] Fix YANG validation failure on upgrade when feature has constraints in YANG model on FEATURE table (#2933)
cfd2dd39 Add container rsyslog.conf to the sys dump (#3039)
c4b07828 Support new platform in generic configuration update (#3038)
a8d236c8 [fast-reboot-filter-routes.py] Remove click and improve error reporting (#3030)
75199c0f [sonic-package-manager] insert newline in /etc/sonic/generated_services.conf (#3040)
cd855698 [VOQ][saidump] Modify generate_dump: replace save_saidump with save_saidump_by_route_size (#2972)
f1e24ae5 GCU support for Cisco-8000 features (#3010)
67e1c3dc Update GCU rsyslog validator (#3012)
253b7975 [sonic-package-manager] do not modify config_db.json (#3032)
177dd8e8 [sonic-package-manager] add generated service to /etc/sonic/generated_services.conf (#3037)
62fcd77a Configure NTP according to extended configuration (#2835)
ced09404 [dualtor_neighbor_check] Adjust zero-mac check condition (#3034)
a4eeb698 [config] config reload should generate sysinfo if missing (#3031)
e01fc891 Run yang validation in unit test (#3025)
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
#### Why I did it
src/sonic-host-services
```
* 445ec8b - (HEAD -> master, origin/master, origin/HEAD) Revert "Add support to make determine/process reboot-cause services restartable (#86)" (#89) (31 hours ago) [anamehra]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-platform-daemons
```
* e2d9f87 - (HEAD -> master, origin/master, origin/HEAD) Add dynamic sensor logic for fixed and psu presence/state checking in thermalctld (#401) (27 hours ago) [Gregory Boudreau]
```
#### How I did it
#### How to verify it
#### Description for the changelog
How I did it
Remove Python3 venv in Python3-only sonic-mgmt-docker
How to verify it
There is no impact to sonic-mgmt-docker:latest tag.
Build sonic-mgmt-docker with LEGACY_SONIC_MGMT_DOCKER=y, see python3 venv is there.
Build sonic-mgmt-docker with LEGACY_SONIC_MGMT_DOCKER=n, see python3 venv is NOT included.
#### Why I did it
src/sonic-swss
```
* 14408ca3 - (HEAD -> master, origin/master, origin/HEAD) [Chassis][master][orchagent] : Added test case to verify WRED profile on system ports (#2954) (9 hours ago) [vmittal-msft]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-mgmt-common
```
* d96bfcd - (HEAD -> master, origin/master, origin/HEAD) YANG tree generator and linter (#113) (6 hours ago) [faraazbrcm]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-platform-common
```
* 2efe97e - (HEAD -> master, origin/master, origin/HEAD) Fix VDM freeze and unfreeze needed for PM stats collection (#402) (3 hours ago) [jaganbal-a]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-sairedis
```
* 9621316 - (HEAD -> master, origin/master, origin/HEAD) [syncd] Remove notify pointers manual handling (#1326) (19 hours ago) [Kamil Cudnik]
```
#### How I did it
#### How to verify it
#### Description for the changelog
* Fix sonic-mgmt docker build due to Bookworm changes
Because of the Bookworm upgrade, when some build target is specified on
the command line, the build system will try to build everything for
buster and bullseye distros, even if it's not needed by the target.
As a workaround, call the underlying Makefile.work script with
`BLDENV=bullseye` to have it only build packages related to sonic-mgmt.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
* Mark docker-sonic-mgmt as a Bullseye container
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
---------
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
This change was submitted directly to 202205 but it's also needed in master and 202305 with SAI9.x
#13346
There has been a couple CSPs for this as well:
CS00012273013 - [7.1][J2, J2c+] Disable SA Equals DA trap on DNX
CS00012320965 - SAI9.2: iBGP doesn't work due to SA_EQUALS_DA trap
If SA_EQUALS_DA trap is enabled iBGP won't work as the Ethernet-IB0 ports are expected to get packets with SA==DA.
In the VOQ chassis design, for outgoing control plane packets, the packets goes the recycle port for routing, therefore the dmac of the packet should be the asic router mac. The source mac is assigned by the kernel, so it is also the asic router mac.
Why I did it
sonic_dhcp_server.whl contains not only dhcp_server functionality but also part of dhcp_relay functionality, the existing naming is not appropriate.
#### Why I did it
src/sonic-sairedis
```
* 4ee9c25 - (HEAD -> master, origin/master, origin/HEAD) Add TestSwitch missing attribute (#1327) (12 hours ago) [noaOrMlnx]
* 4cbbeed - Add SAI Notification support for host_tx_ready (#1307) (18 hours ago) [noaOrMlnx]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-swss-common
```
* 8dc6218 - (HEAD -> master, origin/master, origin/HEAD) Add STATE_TRANSCEIVER_INFO_TABLE_NAME to shcema.h (#824) (12 hours ago) [noaOrMlnx]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-mgmt-common
```
* 268b67c - (HEAD -> master, origin/master, origin/HEAD) Integrating the transformer infra GET optimization, Request context cancel handling and other bug fixes (#111) (2 hours ago) [Balachandar Mani]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
HLD implementation: Container Hardening (sonic-net/SONiC#1364)
Work item tracking
Microsoft ADO (number only): 14807420
How I did it
Reduce linux capabilities in privileged flag
How to verify it
Check container's settings: Privileged is false and container only has default Linux caps, does not have extended caps.
Modify j2 template files in docker-dhcp-relay. Add dhcprelayd to group dhcp-relay instead of isc-dhcp-relay-VlanXXX, which would make dhcprelayd to become critical process.
In dhcprelayd, subscribe FEATURE table to check whether dhcp_server feature is enabled.
2.1 If dhcp_server feature is disabled, means we need original dhcp_relay functionality, dhcprelayd would do nothing. Because dhcrelay/dhcpmon configuration is generated in supervisord configuration, they will automatically run.
2.2 If dhcp_server feature is enabled, dhcprelayd will stop dhcpmon/dhcrelay processes started by supervisord and subscribe dhcp_server related tables in config_db to start dhcpmon/dhcrelay processes.
2.3 While dhcprelayd running, it will regularly check feature status (by default per 5s) and would encounter below 4 state change about dhcp_server feature:
A) disabled -> enabled
In this scenario, dhcprelayd will subscribe dhcp_server related tables and stop dhcpmon/dhcrelay processes started by supervisord and start new pair of dhcpmon/dhcrelay processes. After this, dhcpmon/dhcrelay processes are totally managed by dhcprelayd.
B) enabled -> enabled
In this scenaro, dhcprelayd will monitor db changes in dhcp_server related tables to determine whether to restart dhcpmon/dhrelay processes.
C) enabled -> disabled
In this scenario, dhcprelayd would unsubscribe dhcp_server related tables and kill dhcpmon/dhcrelay processes started by itself. And then dhcprelayd will start dhcpmon/dhcrelay processes via supervisorctl.
D) disabled -> disabled
dhcprelayd will check whether dhcrelay processes running status consistent with supervisord configuration file. If they are not consistent, dhcprelayd will kill itself, then dhcp_relay container will stop because dhcprelayd is critical process.
Why I did it
Fixing CVEs CVE-2023-46752 CVE-2023-46753 CVE-2023-47234 CVE-2023-47235
Work item tracking
Microsoft ADO (number only):
How I did it
Porting the fixes in the below PRs
FRRouting/frr#14645FRRouting/frr#14716
How to verify it
Running regression
- Why I did it
The current low power mode setting implementation requests the user to set the port to admin down first before toggling LP mode, this is not backward compatible, now revert it to the old way so that the user can toggle the LP mode regardless of the port admin status.
- How I did it
Revert the recent changes related to LPM in PR #14130 and #16545
- How to verify it
Run all sfputil and SFP platform API related tests on all the Mellanox platforms.
Signed-off-by: Kebo Liu <kebol@nvidia.com>
Why I did it
Fixing sonic-cfg-help to handle nested container scenario. In case of nested container, the inner container name acts as key for the table. For e.g.
"AUTO_TECHSUPPORT": {
"GLOBAL": {
}
}
Previous output
AUTO_TECHSUPPORT
Description: AUTO_TECHSUPPORT part of config_db.json
+-------------------------+----------------------------------------------------+-------------+-----------+-------------+
| Field | Description | Mandatory | Default | Reference |
+=========================+====================================================+=============+===========+=============+
| state | Knob to make techsupport invocation event-driven | | | |
| | based on core-dump generation | | | |
+-------------------------+----------------------------------------------------+-------------+-----------+-------------+
| rate_limit_interval | Minimum time in seconds between two successive | | | |
| | techsupport invocations. Configure 0 to explicitly | | | |
| | disable | | | |
+-------------------------+----------------------------------------------------+-------------+-----------+-------------+
| max_techsupport_limit | Max Limit in percentage for the cummulative size | | | |
| | of ts dumps. No cleanup is performed if the value | | | |
| | isn't configured or is 0.0 | | | |
+-------------------------+----------------------------------------------------+-------------+-----------+-------------+
| max_core_limit | Max Limit in percentage for the cummulative size | | | |
| | of core dumps. No cleanup is performed if the | | | |
| | value isn't congiured or is 0.0 | | | |
+-------------------------+----------------------------------------------------+-------------+-----------+-------------+
| available_mem_threshold | Memory threshold; 0 to disable techsupport | | 10.0 | |
| | invocation on memory usage threshold crossing | | | |
+-------------------------+----------------------------------------------------+-------------+-----------+-------------+
| min_available_mem | Minimum Free memory (in MB) that should be | | 200 | |
| | available for the techsupport execution to start | | | |
+-------------------------+----------------------------------------------------+-------------+-----------+-------------+
| since | Only collect the logs & core-dumps generated since | | | |
| | the time provided. A default value of '2 days ago' | | | |
| | is used if this value is not set explicitly or a | | | |
| | non-valid string is provided | | | |
+-------------------------+----------------------------------------------------+-------------+-----------+-------------+
New output
AUTO_TECHSUPPORT
Description: AUTO_TECHSUPPORT part of config_db.json
key - GLOBAL
+-------------------------+----------------------------------------------------+-------------+-----------+-------------+
| Field | Description | Mandatory | Default | Reference |
+=========================+====================================================+=============+===========+=============+
| state | Knob to make techsupport invocation event-driven | | | |
| | based on core-dump generation | | | |
+-------------------------+----------------------------------------------------+-------------+-----------+-------------+
| rate_limit_interval | Minimum time in seconds between two successive | | | |
| | techsupport invocations. Configure 0 to explicitly | | | |
| | disable | | | |
+-------------------------+----------------------------------------------------+-------------+-----------+-------------+
| max_techsupport_limit | Max Limit in percentage for the cummulative size | | | |
| | of ts dumps. No cleanup is performed if the value | | | |
| | isn't configured or is 0.0 | | | |
+-------------------------+----------------------------------------------------+-------------+-----------+-------------+
| max_core_limit | Max Limit in percentage for the cummulative size | | | |
| | of core dumps. No cleanup is performed if the | | | |
| | value isn't congiured or is 0.0 | | | |
+-------------------------+----------------------------------------------------+-------------+-----------+-------------+
| available_mem_threshold | Memory threshold; 0 to disable techsupport | | 10.0 | |
| | invocation on memory usage threshold crossing | | | |
+-------------------------+----------------------------------------------------+-------------+-----------+-------------+
| min_available_mem | Minimum Free memory (in MB) that should be | | 200 | |
| | available for the techsupport execution to start | | | |
+-------------------------+----------------------------------------------------+-------------+-----------+-------------+
| since | Only collect the logs & core-dumps generated since | | | |
| | the time provided. A default value of '2 days ago' | | | |
| | is used if this value is not set explicitly or a | | | |
| | non-valid string is provided | | | |
+-------------------------+----------------------------------------------------+-------------+-----------+-------------+
Work item tracking
Microsoft ADO (number only):
How I did it
Fixing sonic-cfg-help tool to handle nested container
How to verify it
Added UT to verify it.
#### Why I did it
src/sonic-snmpagent
```
* 3b6a4ad - (HEAD -> master, origin/master, origin/HEAD) Enable faulthandler to provide more fault information (#301) (22 hours ago) [Hua Liu]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-sairedis
```
* 9804bd7 - (HEAD -> master, origin/master, origin/HEAD) Fix compilation issue due to PORT_STATE_CHANGE_QUEUE_SIZE undefined (#1324) (2 days ago) [Ashish Singh]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-swss
```
* 2ca3deb0 - (HEAD -> master, origin/master, origin/HEAD) [dash] fix DASH ACL Rule protocol use-after-free (#2958) (9 hours ago) [Yakiv Huryk]
* b8841ecb - [orchagent]: Extend the SRv6Orch to support the programming of the L3Adj (#2902) (24 hours ago) [Carmine Scarpitta]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-platform-common
```
* cb80f17 - (HEAD -> master, origin/master, origin/HEAD) Fix issue: QSFP module with id 0x0d can be parsed using 8636 (#412) (20 hours ago) [Stephen Sun]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-linux-kernel
```
* b2601c7 - (HEAD -> master, origin/master, origin/HEAD) [mellanox] Update Kernel patches and Kconfig for Linux 6.1.x (#359) (3 hours ago) [Vivek]
* ba37b4d - Ported Fullcone NAT changes are ported from 5.10 to 6.1 kernel. (#357) (3 hours ago) [Akhilesh Samineni]
* b899479 - Bookworm:AMD-Pensando ELBA SOC support (#353) (3 hours ago) [Shantanu Shrivastava]
* 07a6d64 - [marvell-arm64]: Update kernel patches for Linux 6.1.x (#352) (3 hours ago) [Keshav Gupta]
* 73abe79 - Set CONFIG_IGB to m for the build to work (#340) (3 hours ago) [Vivek]
* 0c12436 - Use bookworm-tagged slave container for now (3 hours ago) [Saikrishna Arcot]
* aca1572 - Use bookworm slave container (3 hours ago) [Saikrishna Arcot]
* bbf045a - Update kernel to 6.1.38 (3 hours ago) [Saikrishna Arcot]
```
#### How I did it
#### How to verify it
#### Description for the changelog
This is change taken as part of the HLD: sonic-net/SONiC#1470.
In this PR we add the logic to parse the SecondarySubnets field in the minigraph and add a flag in "secondary" in the vlan_interface table of the config db.
Microsoft ADO (number only): 16784946
How I did it
Made changes in the minigraph.py to parse the xml entry and add the parsed value to the config db
How to verify it
Added python tests in the sonic-config-engine folder to test the config db entries.
This is change taken as part of the HLD: sonic-net/SONiC#1470 and this is a follow up on the PR #16827 where in the docker-dhcp we pick the value of primary gateway of the interface from the VLAN_Interface table which has "secondary" flag set in the config_db
Microsoft ADO (number only): 16784946
How did I do it
- Changes in the j2 file to add a new "-pg" parameter in the dhcpv4-relay.agents.j2, the ip would be retrieved from the config db's vlan_interface table such that the interface which are picked will have secondary field set.
- Changes in isc-dhcp to re-order the addresses of the discovered interface and which has the ip which has the passed parameter.
Change DHCP rate limit in SONiC copp configuration to 100 PPS as this is
necessary to ensure that DHCP flood does not cause LACP/BGP flaps in all
scenarios
This is an extension to the change in image_config: copp: Enable rate limiting
for bgp, lacp, dhcp, lldp, macsec and udld #14859 and sonic-mgmt change in
[tests/copp]: Update copp mgmt tests to support new rate-limits sonic-mgmt#8199
Why I did it
300 PPS is not sufficient to prevent LACP/BGP flaps in all cases. 100 PPS seems to
provide better resiliency against DHCP traffic flood to CPU.
Microsoft ADO 25776614:
Send DHCP broadcast packets to DUT and verify that they are trapped to CPU at 100 PPS.
Signed-off-by: Prabhat Aravind <paravind@microsoft.com>
#### Why I did it
src/sonic-host-services
```
* 5dcd1e5 - (HEAD -> master, origin/master, origin/HEAD) Add support to make determine/process reboot-cause services restartable (#86) (6 hours ago) [anamehra]
```
#### How I did it
#### How to verify it
#### Description for the changelog
#### Why I did it
src/sonic-swss
```
* 194566a7 - (HEAD -> master, origin/master, origin/HEAD) Fix the Orchagent Qos error messages reported in Issue #16787 (#2947) (6 hours ago) [saksarav-nokia]
```
#### How I did it
#### How to verify it
#### Description for the changelog
Why I did it
[Bookworm] Update platform-modules-dell for Bookworm #16735
How I did it
Modified platform driver to comply with bookworm kernel.
Removed MODULE_SUPPORTED_DEVICE wherever used.
Modified python build commands for building whl packages.
How to verify it
Verify whether all the platform bookworm debs are built.
make target/debs/bookworm/platform-modules-z9100_1.1_amd64.deb
Load the platform debian into the device and install it in bookworm image.
Verify the platform related CLI and the functionality
Why I did it
Update SDK/SAI and FW for Mellanox Platform
How I did it
Update SDK/FW to v4.6.2104/v2012.2104
Fixed Issues:
Some of the Warmboot related files which were created by SDK during switch create are now generated during pre shutdown flow
New Features:
Debian 12 and kernel 6.1 support
Update SAI
New Features:
Auto Fec Support
FDB entries are now restored after warmboot to prevent temporary system flooding.
Minor Enhancement and Bug Fix in integrate-mlnx-sdk
How to verify it
Build Image and run tests
Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
Why I did it
Add platform support for Debian 12 (Bookworm) on Mellanox Platform
How I did it
Update hw-management to v7.0030.2008
Deprecate the sfp_count == module_count approach in favour of asic init completion
Ref: Mellanox/hw-mgmt@bf4f593
Add xxd package to base image which is required by hw-management scripts
Add the non-upstream flag into linux kernel cache options
Update the thermalctl logic based on new sysfs attributes
Fix the integrate-mlnx-hw-mgmt script to not populate the arm64 Kconfig
How to verify it
Build kernel and run platform tests
Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
Co-authored-by: Junchao-Mellanox <junchao@nvidia.com>
Co-authored-by: Junchao-Mellanox <57339448+Junchao-Mellanox@users.noreply.github.com>
Add a note saying if running on a recent kernel, then Docker 20.10.10 or
newer needs to be used. This is because in Bookworm, glibc will use the
`clone3` syscall, which is not properly handled by Docker's seccomp
filter in versions older than 20.10.10.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>