Commit Graph

8256 Commits

Author SHA1 Message Date
mssonicbld
f215595699
[submodule] Update submodule sonic-sairedis to the latest HEAD automatically (#17454)
src/sonic-sairedis

* 9621316 - (HEAD -> 202311, origin/202311) [syncd] Remove notify pointers manual handling (#1326) (2 weeks ago) [Kamil Cudnik]
* 4ee9c25 - Add TestSwitch missing attribute (#1327) (2 weeks ago) [noaOrMlnx]
* 4cbbeed - Add SAI Notification support for host_tx_ready (#1307) (2 weeks ago) [noaOrMlnx]
* 9804bd7 - Fix compilation issue due to PORT_STATE_CHANGE_QUEUE_SIZE undefined (#1324) (3 weeks ago) [Ashish Singh]
2023-12-13 15:34:35 -08:00
mssonicbld
2e8c2eba14
Revert "[swss/syncd] remove dependency on interfaces-config.service (#13084) (#14341)" (#15094) (#17367) (#17447)
2023-12-09 10:22:55 +08:00
Aravind-Subbaroyan
62429a2328
Update cisco-8000.ini (#17429)
FCS/CRC Errors will only be reported as RX_ERR.
Fix to avoid the mac port related errors.
Fix for sharedResSize testcase failure in QoS-SAI
Fix the issue related to voltage in 'show platform psustatus'.
Support WRED drop for lossy queues.
Fixed an issue where lossy traffic was getting dropped.
Enhancement of SAI logging for errors and interrupts
2023-12-07 17:04:45 -08:00
Ying Xie
6d22649c81
[202311] lock down some sub module branches (#17405)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2023-12-04 18:35:14 -08:00
zitingguo-ms
897a023637 Upgrade xgs SAI version to 8.4.31.0 (#17059)
Why I did it
Upgrade the xgs SAI version to 8.4.31.0 to include the following changes:

8.4.22.0: [SDK upgrade][CSP CS00012314723][SAI_BRANCH rel_ocp_sai_8_4] SID:bcmtmPfcDdrScan thread takes 100% CPU utilization
8.4.23.0: [SDK upgrade][CSP CS00012290176][SAI_BRANCH rel_ocp_sai_8_4] SDK-323160: bcm_l3_ecmp_member_add returns Table Full error while ISSU
8.4.24.0:
[SDK upgrade]Merge "[CSP NA][SAI_BRANCH rel_ocp_sai_8_4] SID: Software LinkScan Not Catching Short Local/Remote Fault Events" into hsdk_6.5.27_SAI_8.4.0_GA
[SDK upgrade][CSP NA][SAI_BRANCH rel_ocp_sai_8_4] SID: Software LinkScan Not Catching Short Local/Remote Fault Events
8.4.25.0: [SAI_BRANCH rel_ocp_sai_8_4]CLONE - SAI - 8.4 - _brcm_sai_cosq_stat_get errors for CPU queue 41
8.4.26.0: [CSP CS00012307911] Fixed incorrect CPU related SAI port obj encoding/decoding in most subsystems
8.4.27.0: [CSP CS00012309154] [TD3] SAI_STATUS_INVALID_PARAMETER on setting SAI_BUFFER_POOL_ATTR_SIZE, OA crash
8.4.28.0: [CSP CS00012315552] Excessive logging from _brcm_sai_acl_tbl_grp_mbr_migration
8.4.29.0: [CSP CS00012321369] Fix TH2 regression with MMU/pool size
8.4.30.0: [SDK upgrade][CSP CS00012316299][SAI_BRANCH rel_ocp_sai_8_4] L3 entry delete failed when SER error is present
8.4.31.0: [CSP CS00012307911] Revert and limit scope of previous change due to WB issue.
Work item tracking
Microsoft ADO (number only): 26021230
How I did it
Upgrade the SAI version in sai.mk file.

How to verify it
Run advanced reboot on TH2 and TD3:

https://dev.azure.com/mssonic/internal/_build/results?buildId=422024&view=results
https://dev.azure.com/mssonic/internal/_build/results?buildId=423352&view=results
@saiarcot895 ran warm reboot from 202012 to the target image, and the runs passed:
TH2: https://dev.azure.com/mssonic/internal/_build/results?buildId=423112&view=logs&j=76acabad-01e9-5c52-6fe6-d396d63e85d2&t=0d14fb40-14d5-50ca-4a23-af1778140cbf
TH: https://dev.azure.com/mssonic/internal/_build/results?buildId=423119&view=logs&j=76acabad-01e9-5c52-6fe6-d396d63e85d2&t=0d14fb40-14d5-50ca-4a23-af1778140cbf
TD3: https://dev.azure.com/mssonic/internal/_build/results?buildId=423074&view=logs&j=76acabad-01e9-5c52-6fe6-d396d63e85d2&t=0d14fb40-14d5-50ca-4a23-af1778140cbf
2023-12-04 22:14:03 +00:00
Kebo Liu
2528b70630 [Mellanox] Add special rsyslog filter for MSN2410 platform (#17365)
- Why I did it
Mellanox MSN2410 platforms log an error that has no functional impact: "ERR pmon#sensord: Error getting sensor data: dps460/#10: Can't read". This error is caused by a firmware issue with some PSUs, and we are not able to upgrade the FW online. Since there is no functional impact, this error log can be safely ignored.

- How I did it
Add a new rsyslog rule to rsyslog-container.conf.j2. If the docker name is pmon and the platform name matches, the new rule is inserted into the docker rsyslogd.conf.

- How to verify it
Run regression on the MSN2410 platform to make sure the error log is not printed to the syslog.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2023-12-04 22:14:03 +00:00
Sudharsan Dhamal Gopalarathnam
8c782c91a4 [FRR]zebra: Fix fpm multipath encap addition (#17247)
Why I did it
To fix the EVPN type5 failure seen in FRR when a nexthop has multiple paths. The type5 routes were left queued (marked q in the output below):

show ip route vrf Vrf1
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

VRF Vrf1:
B>q 5.5.5.0/24 [200/0] via 30.0.0.2, Vlan100 onlink, weight 1, 00:00:40
  q                    via 40.0.0.3, Vlan100 onlink, weight 1, 00:00:40
C>* 10.0.0.0/24 is directly connected, Vlan10, 00:00:43
B>q 100.0.0.0/24 [200/0] via 30.0.0.2, Vlan100 onlink, weight 1, 00:00:40
  q                      via 40.0.0.3, Vlan100 onlink, weight 1, 00:00:40
Work item tracking
Microsoft ADO (number only):
How I did it
Porting the FRR fix FRRouting/frr#14835

How to verify it
Validated EVPN multipath with the scenario and confirmed it's working.
2023-12-04 22:14:03 +00:00
Dev Ojha
15d9177c14 [Snappi] Update snappi module on sonic-mgmt docker (#17269)
* Update snappi module on Dockerfile.j2

* Update snappi module on Dockerfile.j2

* Update snappi module for py2 and venv
2023-12-04 22:14:03 +00:00
Tomer Shalvi
dccc5bf6cf Media_settings.json Validator Update (#16908)
The format of the media_settings.json file was updated to support the Port SI Per Speed Enhancements. Since media_checker is the validator for the media_settings.json file, it needs to be updated to align with the new format.


How I did it
I added six new SI parameter names introduced as part of the Port SI Per Speed Enhancements. Additionally, I implemented handling for the new hierarchy level (lane_speed_key) in the updated media_settings.json format while maintaining backward compatibility with vendors whose JSON does not support port SI per speed.

How to verify it
I locally built the Debian package using 'make target/debs/bullseye/sonic-device-data_1.0-1_all.deb,' and it completed successfully. Jenkins also built the entire image, which includes the media_checker as part of its process.
2023-12-04 22:14:03 +00:00
Pavan-Nokia
451398f801 [Nokia-7215][armhf] Enable Watchdog service (#16612)
Enable CPUWDT service to enable watchdog
2023-12-04 22:14:03 +00:00
Lawrence Lee
7d308e340a [arp_update]: Flush neighbors with incorrect MAC info (#17238)
[arp_update]: Flush MAC mismatch neighbors

- Check for MAC mismatch between neighbor entries in the kernel and APPL_DB
- Flush any entries with a mismatch
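
The two steps above can be sketched in Python. This is only an illustration of the comparison logic, not the arp_update script itself: the function name and the `(interface, ip) -> MAC` dictionaries are assumptions made for the example, and the actual flush is left to the caller.

```python
def find_mac_mismatches(kernel_neighbors, appl_db_neighbors):
    """Return neighbor keys whose MAC differs between the kernel and APPL_DB.

    Both arguments are hypothetical snapshots mapping (interface, ip) -> MAC.
    """
    mismatched = []
    for key, kernel_mac in kernel_neighbors.items():
        db_mac = appl_db_neighbors.get(key)
        # Only entries present on both sides can disagree; compare
        # case-insensitively because MAC strings may differ in case.
        if db_mac is not None and db_mac.lower() != kernel_mac.lower():
            mismatched.append(key)
    return sorted(mismatched)


kernel = {
    ("Vlan100", "10.0.0.2"): "aa:bb:cc:dd:ee:01",
    ("Vlan100", "10.0.0.3"): "AA:BB:CC:DD:EE:02",
}
appl_db = {
    ("Vlan100", "10.0.0.2"): "aa:bb:cc:dd:ee:ff",
    ("Vlan100", "10.0.0.3"): "aa:bb:cc:dd:ee:02",
}
# Entries returned here are the candidates to flush.
to_flush = find_mac_mismatches(kernel, appl_db)
```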
2023-12-04 22:14:03 +00:00
Yaqiang Zhu
345064dccb [dhcp_server] Set to build dhcp_server image in vs image (#17340)
Currently this repo does not build the dhcp_server container image by default, so build issues introduced into dhcp_server by other modules cannot be noticed in time.
This PR enables building the dhcp_server container in the vs image.
2023-12-04 22:14:03 +00:00
ShiyanWangMS
936f8689b9 Remove Python3 venv in Python3-only sonic-mgmt-docker (#17337)
How I did it
Remove Python3 venv in Python3-only sonic-mgmt-docker

How to verify it
There is no impact to sonic-mgmt-docker:latest tag.
Build sonic-mgmt-docker with LEGACY_SONIC_MGMT_DOCKER=y, see python3 venv is there.
Build sonic-mgmt-docker with LEGACY_SONIC_MGMT_DOCKER=n, see python3 venv is NOT included.
2023-12-04 22:14:03 +00:00
Xincun Li
b78e3a0d20 Ensure that 'logrotate-config.service' is set as a dependency to start before 'logrotate.service'. (#17312)
* Ensure that 'logrotate-config.service' is set as a dependency to start before 'logrotate.service'.
2023-12-04 22:14:03 +00:00
arista-nwolfe
dfe7c1e720 [Arista]: Disable SA_EQUALS_DA trap on DNX LC SKUs (#17206)
This change was submitted directly to 202205 but it's also needed in master and 202305 with SAI9.x
#13346

There have been a couple of CSPs for this as well:
CS00012273013 - [7.1][J2, J2c+] Disable SA Equals DA trap on DNX
CS00012320965 - SAI9.2: iBGP doesn't work due to SA_EQUALS_DA trap

If SA_EQUALS_DA trap is enabled iBGP won't work as the Ethernet-IB0 ports are expected to get packets with SA==DA.

In the VOQ chassis design, outgoing control plane packets go through the recycle port for routing, so the dmac of the packet should be the asic router mac. The source mac is assigned by the kernel, so it is also the asic router mac.
2023-12-04 22:14:03 +00:00
Yaqiang Zhu
82cebcd690 [dhcp_server] Rename sonic_dhcp_server to sonic_dhcp_utilities (#17276)
Why I did it
sonic_dhcp_server.whl contains not only dhcp_server functionality but also part of the dhcp_relay functionality, so the existing name is not appropriate.
2023-12-04 22:14:03 +00:00
Mai Bui
a40daff883 [docker-sonic-mgmt-framework] limit privileged flag for mgmt-framework container (#17217)
Why I did it
HLD implementation: Container Hardening (sonic-net/SONiC#1364)

Work item tracking
Microsoft ADO (number only): 14807420
How I did it
Reduce linux capabilities in privileged flag

How to verify it
Check container's settings: Privileged is false and container only has default Linux caps, does not have extended caps.
2023-12-04 22:14:03 +00:00
Yaqiang Zhu
ab8af94a2c [dhcp_server] Mark dhcp_server docker as Bullseyse docker (#17290)
How I did it
Mark dhcp_server docker as Bullseye docker

How to verify it
Set INCLUDE_DHCP_SERVER to y and build the image; the build succeeds.
2023-12-04 22:14:03 +00:00
Yaqiang Zhu
7764805aa8 [dhcp_server] Add support for only configures 1 ip in dhcp_server range (#17280)
How I did it
Add support for configuring only one IP in a dhcp_server range.
Treat a range whose values are out of order as invalid.
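
The two rules can be illustrated with a small Python sketch. It is a sketch under assumptions, not the dhcp_server code: the function name and the list-of-strings input are hypothetical, and only the standard `ipaddress` module is used.

```python
import ipaddress


def parse_range(ips):
    """Return (start, end) for a one- or two-element IPv4 range, or None if invalid."""
    if len(ips) not in (1, 2):
        return None
    try:
        addrs = [ipaddress.IPv4Address(ip) for ip in ips]
    except ipaddress.AddressValueError:
        return None
    # A single IP is treated as a range whose start and end coincide.
    start, end = addrs[0], addrs[-1]
    # A range whose values are out of order is rejected.
    if start > end:
        return None
    return start, end
```

With this shape, `parse_range(["192.168.0.5"])` yields a one-address range, while `parse_range(["192.168.0.10", "192.168.0.2"])` is rejected.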
2023-12-04 22:14:03 +00:00
Pavan-Nokia
6020fbfac3 [armhf][Nokia-7215] Remove platform reboot (#17010)
2023-12-04 22:14:03 +00:00
Vivek
5c36732f3b [lldp] Clean up service start logic owing to port init start optimization (#17268)
Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
2023-12-04 22:14:02 +00:00
Yaqiang Zhu
f48e8b61cf [dhcp_relay] Use dhcprelayd to manage critical processes (#17236)
1. Modify the j2 template files in docker-dhcp-relay. Add dhcprelayd to the dhcp-relay group instead of isc-dhcp-relay-VlanXXX, which makes dhcprelayd a critical process.
2. In dhcprelayd, subscribe to the FEATURE table to check whether the dhcp_server feature is enabled.
2.1 If the dhcp_server feature is disabled, the original dhcp_relay functionality is needed, so dhcprelayd does nothing. Because the dhcrelay/dhcpmon configuration is generated in the supervisord configuration, they run automatically.
2.2 If the dhcp_server feature is enabled, dhcprelayd stops the dhcpmon/dhcrelay processes started by supervisord and subscribes to the dhcp_server related tables in config_db to start dhcpmon/dhcrelay processes.
2.3 While dhcprelayd is running, it regularly checks the feature status (every 5s by default) and can encounter the following four state changes of the dhcp_server feature:
A) disabled -> enabled
In this scenario, dhcprelayd subscribes to the dhcp_server related tables, stops the dhcpmon/dhcrelay processes started by supervisord, and starts a new pair of dhcpmon/dhcrelay processes. After this, the dhcpmon/dhcrelay processes are fully managed by dhcprelayd.
B) enabled -> enabled
In this scenario, dhcprelayd monitors db changes in the dhcp_server related tables to determine whether to restart the dhcpmon/dhcrelay processes.
C) enabled -> disabled
In this scenario, dhcprelayd unsubscribes from the dhcp_server related tables and kills the dhcpmon/dhcrelay processes it started itself. dhcprelayd then starts the dhcpmon/dhcrelay processes via supervisorctl.
D) disabled -> disabled
dhcprelayd checks whether the running status of the dhcrelay processes is consistent with the supervisord configuration file. If it is not, dhcprelayd kills itself, and the dhcp_relay container then stops because dhcprelayd is a critical process.
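
The four transitions can be summarized as a small decision function. This is a minimal sketch of the state handling described above, not the dhcprelayd implementation: the function name and the action labels are hypothetical and exist only to make the mapping explicit.

```python
def plan_actions(was_enabled, is_enabled, consistent_with_supervisord=True):
    """Map a dhcp_server feature state change to a list of (hypothetical) actions."""
    if not was_enabled and is_enabled:
        # A) disabled -> enabled: take over process management.
        return [
            "subscribe_dhcp_server_tables",
            "stop_supervisord_processes",
            "start_managed_processes",
        ]
    if was_enabled and is_enabled:
        # B) enabled -> enabled: keep watching the tables for changes.
        return ["monitor_db_changes"]
    if was_enabled and not is_enabled:
        # C) enabled -> disabled: hand the processes back to supervisord.
        return [
            "unsubscribe_dhcp_server_tables",
            "kill_managed_processes",
            "start_processes_via_supervisorctl",
        ]
    # D) disabled -> disabled: only act when the running processes do not
    # match the supervisord configuration.
    return [] if consistent_with_supervisord else ["exit_daemon"]
```

In this sketch the periodic check would call `plan_actions` with the previous and current feature status on every poll.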
2023-12-04 22:14:02 +00:00
Sudharsan Dhamal Gopalarathnam
e86ceaac90 [FRR]Fixing CVEs CVE-2023-46752 CVE-2023-46753 CVE-2023-47234 CVE-2023-47235 (#17259)
Why I did it
Fixing CVEs CVE-2023-46752 CVE-2023-46753 CVE-2023-47234 CVE-2023-47235

Work item tracking
Microsoft ADO (number only):
How I did it
Porting the fixes in the below PRs

FRRouting/frr#14645
FRRouting/frr#14716

How to verify it
Running regression
2023-12-04 22:14:02 +00:00
Kebo Liu
f96742fb98 [Mellanox] Revert LPM implementation to the old way (#17096)
- Why I did it
The current low power mode setting implementation requires the user to set the port to admin down before toggling LP mode, which is not backward compatible. Revert it to the old way so that the user can toggle LP mode regardless of the port admin status.

- How I did it
Revert the recent changes related to LPM in PR #14130 and #16545

- How to verify it
Run all sfputil and SFP platform API related tests on all the Mellanox platforms.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2023-12-04 22:14:02 +00:00
Sudharsan Dhamal Gopalarathnam
8c1bd85830 [yang]Fixing sonic-cfg-help to handle nested container (#17260)
Why I did it
Fixing sonic-cfg-help to handle the nested container scenario. In a nested container, the inner container name acts as the key for the table. For example:

"AUTO_TECHSUPPORT": {
        "GLOBAL": {
         }
}
Previous output

AUTO_TECHSUPPORT
Description: AUTO_TECHSUPPORT part of config_db.json

+-------------------------+----------------------------------------------------+-------------+-----------+-------------+
| Field                   | Description                                        | Mandatory   | Default   | Reference   |
+=========================+====================================================+=============+===========+=============+
| state                   | Knob to make techsupport invocation event-driven   |             |           |             |
|                         | based on core-dump generation                      |             |           |             |
+-------------------------+----------------------------------------------------+-------------+-----------+-------------+
| rate_limit_interval     | Minimum time in seconds between two successive     |             |           |             |
|                         | techsupport invocations. Configure 0 to explicitly |             |           |             |
|                         | disable                                            |             |           |             |
+-------------------------+----------------------------------------------------+-------------+-----------+-------------+
| max_techsupport_limit   | Max Limit in percentage for the cummulative size   |             |           |             |
|                         | of ts dumps. No cleanup is performed if the value  |             |           |             |
|                         | isn't configured or is 0.0                         |             |           |             |
+-------------------------+----------------------------------------------------+-------------+-----------+-------------+
| max_core_limit          | Max Limit in percentage for the cummulative size   |             |           |             |
|                         | of core dumps. No cleanup is performed if the      |             |           |             |
|                         | value isn't congiured or is 0.0                    |             |           |             |
+-------------------------+----------------------------------------------------+-------------+-----------+-------------+
| available_mem_threshold | Memory threshold; 0 to disable techsupport         |             | 10.0      |             |
|                         | invocation on memory usage threshold crossing      |             |           |             |
+-------------------------+----------------------------------------------------+-------------+-----------+-------------+
| min_available_mem       | Minimum Free memory (in MB) that should be         |             | 200       |             |
|                         | available for the techsupport execution to start   |             |           |             |
+-------------------------+----------------------------------------------------+-------------+-----------+-------------+
| since                   | Only collect the logs & core-dumps generated since |             |           |             |
|                         | the time provided. A default value of '2 days ago' |             |           |             |
|                         | is used if this value is not set explicitly or a   |             |           |             |
|                         | non-valid string is provided                       |             |           |             |
+-------------------------+----------------------------------------------------+-------------+-----------+-------------+


New output

AUTO_TECHSUPPORT
Description: AUTO_TECHSUPPORT part of config_db.json

key - GLOBAL
+-------------------------+----------------------------------------------------+-------------+-----------+-------------+
| Field                   | Description                                        | Mandatory   | Default   | Reference   |
+=========================+====================================================+=============+===========+=============+
| state                   | Knob to make techsupport invocation event-driven   |             |           |             |
|                         | based on core-dump generation                      |             |           |             |
+-------------------------+----------------------------------------------------+-------------+-----------+-------------+
| rate_limit_interval     | Minimum time in seconds between two successive     |             |           |             |
|                         | techsupport invocations. Configure 0 to explicitly |             |           |             |
|                         | disable                                            |             |           |             |
+-------------------------+----------------------------------------------------+-------------+-----------+-------------+
| max_techsupport_limit   | Max Limit in percentage for the cummulative size   |             |           |             |
|                         | of ts dumps. No cleanup is performed if the value  |             |           |             |
|                         | isn't configured or is 0.0                         |             |           |             |
+-------------------------+----------------------------------------------------+-------------+-----------+-------------+
| max_core_limit          | Max Limit in percentage for the cummulative size   |             |           |             |
|                         | of core dumps. No cleanup is performed if the      |             |           |             |
|                         | value isn't congiured or is 0.0                    |             |           |             |
+-------------------------+----------------------------------------------------+-------------+-----------+-------------+
| available_mem_threshold | Memory threshold; 0 to disable techsupport         |             | 10.0      |             |
|                         | invocation on memory usage threshold crossing      |             |           |             |
+-------------------------+----------------------------------------------------+-------------+-----------+-------------+
| min_available_mem       | Minimum Free memory (in MB) that should be         |             | 200       |             |
|                         | available for the techsupport execution to start   |             |           |             |
+-------------------------+----------------------------------------------------+-------------+-----------+-------------+
| since                   | Only collect the logs & core-dumps generated since |             |           |             |
|                         | the time provided. A default value of '2 days ago' |             |           |             |
|                         | is used if this value is not set explicitly or a   |             |           |             |
|                         | non-valid string is provided                       |             |           |             |
+-------------------------+----------------------------------------------------+-------------+-----------+-------------+


Work item tracking
Microsoft ADO (number only):
How I did it
Fixing sonic-cfg-help tool to handle nested container
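
As a rough illustration of the idea, the sketch below renders a help listing in which an inner container name is printed as a `key - <name>` line before its fields. It is not the sonic-cfg-help code: the function name, the plain-dict model (a dict value denotes a nested container, a string value a leaf description), and the simplified line format are all assumptions.

```python
def render_help(table_name, model):
    """Render help lines for a table; nested dicts are treated as inner containers."""
    lines = [table_name]
    # Leaves that sit directly under the table are listed first.
    for name, desc in model.items():
        if not isinstance(desc, dict):
            lines.append(f"  {name}: {desc}")
    # Each inner container contributes a "key - <name>" line and its leaves.
    for key, inner in model.items():
        if isinstance(inner, dict):
            lines.append(f"key - {key}")
            for name, desc in inner.items():
                lines.append(f"  {name}: {desc}")
    return lines


model = {"GLOBAL": {"state": "Knob to make techsupport invocation event-driven"}}
output = render_help("AUTO_TECHSUPPORT", model)
```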

How to verify it
Added UT to verify it.
2023-12-04 22:14:02 +00:00
Sudharsan Dhamal Gopalarathnam
2f3b48fe64 [FRR] Fixing zebra to handle non notification of better admin won (#17184)
* [FRR]Fixing zebra to handle non notification of better admin won

* Updating the patch with latest changes from FRR
2023-12-04 22:14:02 +00:00
Shashanka Balakuntala
fad1081b2f [minigraph]: Adding new secondary field to VLAN_INTERFACE table (#16827)
This is change taken as part of the HLD: sonic-net/SONiC#1470.
In this PR we add the logic to parse the SecondarySubnets field in the minigraph and add a "secondary" flag to the vlan_interface table of the config db.

Microsoft ADO (number only): 16784946

How I did it
Made changes in the minigraph.py to parse the xml entry and add the parsed value to the config db
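
A minimal sketch of the flagging step, assuming the secondary subnets have already been extracted from the minigraph XML. The function name, the input shapes, and the stored value "true" are assumptions for illustration and are not taken from minigraph.py.

```python
import ipaddress


def build_vlan_interface_table(vlan_prefixes, secondary_subnets):
    """Build VLAN_INTERFACE-style entries, flagging prefixes in secondary subnets.

    vlan_prefixes: list of (vlan_name, "ip/prefixlen") tuples (hypothetical input).
    secondary_subnets: list of subnet strings parsed from the minigraph.
    """
    secondary = {ipaddress.ip_network(s) for s in secondary_subnets}
    table = {}
    for vlan, prefix in vlan_prefixes:
        attrs = {}
        # Compare the network the address belongs to, not the host address.
        if ipaddress.ip_interface(prefix).network in secondary:
            attrs["secondary"] = "true"  # assumed flag value
        table[f"{vlan}|{prefix}"] = attrs
    return table


table = build_vlan_interface_table(
    [("Vlan1000", "192.168.0.1/21"), ("Vlan1000", "192.169.0.1/22")],
    ["192.169.0.0/22"],
)
```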

How to verify it
Added python tests in the sonic-config-engine folder to test the config db entries.
2023-12-04 22:14:02 +00:00
Shashanka Balakuntala
c0963db5a3 [dhcp-relay]: Modify dhcp relay to pick primary address (#17012)
This change is part of the HLD sonic-net/SONiC#1470 and is a follow-up to PR #16827. In docker-dhcp, we pick the primary gateway of the interface from the VLAN_Interface table, which has the "secondary" flag set in config_db.

Microsoft ADO (number only): 16784946

How did I do it
- Changes in the j2 file to add a new "-pg" parameter in dhcpv4-relay.agents.j2. The IP is retrieved from the config db's vlan_interface table, so that the interfaces picked are those with the secondary field set.

- Changes in isc-dhcp to reorder the addresses of the discovered interface based on the IP passed in the parameter.
2023-12-04 22:14:02 +00:00
prabhataravind
26ade35fdf [image_config]: Update DHCP rate-limit (#17132)
Change DHCP rate limit in SONiC copp configuration to 100 PPS as this is
necessary to ensure that DHCP flood does not cause LACP/BGP flaps in all
scenarios

This is an extension to the change in image_config: copp: Enable rate limiting 
for bgp, lacp, dhcp, lldp, macsec and udld #14859 and sonic-mgmt change in 
[tests/copp]: Update copp mgmt tests to support new rate-limits sonic-mgmt#8199

Why I did it
300 PPS is not sufficient to prevent LACP/BGP flaps in all cases. 100 PPS seems to
provide better resiliency against DHCP traffic flood to CPU.

Microsoft ADO 25776614:

Send DHCP broadcast packets to DUT and verify that they are trapped to CPU at 100 PPS.

Signed-off-by: Prabhat Aravind <paravind@microsoft.com>
2023-12-04 22:14:02 +00:00
Ze Gan
a87cddc6c9
[docker-database-init.sh]: Fix wrong creating of database_global.json in multi asic platform (#17221)
Fix bug: #17161 (comment)
On multi-asic platforms it will never go to the else part, as DATABASE_TYPE is always "".


Microsoft ADO (number only): 25072889

Move the NAMESPACE_ID == "" check back

Signed-off-by: Ze Gan <ganze718@gmail.com>
2023-11-21 09:41:20 -08:00
Xichen96
ee38e2447d
[dhcp_server] Add show dhcp_server ipv4 lease (#17125)
* Add show dhcp_server ipv4 lease
* add ut for show dhcp_server ipv4 lease
2023-11-21 08:42:07 -08:00
mssonicbld
1bf2012de4
[submodule] Update submodule sonic-host-services to the latest HEAD automatically (#17248)
#### Why I did it
src/sonic-host-services
```
* 50db9d3 - (HEAD -> master, origin/master, origin/HEAD) Move sonic-host-services-data from sonic-buildimage into this repo (3 hours ago) [Saikrishna Arcot]
* 1a9442f - Replace libpam-cracklib with libpam-pwquality (3 hours ago) [Saikrishna Arcot]
* 31590a1 - Fix diff output in test for Python 3 (3 hours ago) [Saikrishna Arcot]
* cc3e330 - Specify test dependencies under extra_requires (3 hours ago) [Saikrishna Arcot]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-11-21 16:34:30 +08:00
Oleksandr Ivantsiv
c2af11064f
[Mellanox] Change the default breakout mode for internal ports of the Mellanox-SN4700-O28 SKU. (#17192)
- Why I did it
Fix the issue with configuration generation from the minigraph.

- How I did it
Change the default breakout mode for internal ports to the mode that corresponds to the platform.json configuration.

- How to verify it
1. Deploy minigraph
2. Run config load_minigraph -y command
2023-11-21 09:51:36 +02:00
mssonicbld
2e32cba321
[submodule] Update submodule sonic-sairedis to the latest HEAD automatically (#17230)
2023-11-21 15:08:09 +08:00
Mai Bui
6ea03f9f78
[docker-restapi] limit privileged flag for restapi container (#17138)
Why I did it
HLD implementation: Container Hardening (sonic-net/SONiC#1364)

Work item tracking
Microsoft ADO (number only): 14807420
How I did it
Reduce linux capabilities in privileged flag

How to verify it
Run restapi sonic-mgmt tests on sn4600c
Check container's settings: Privileged is false and container only has default Linux caps, does not have extended caps.
2023-11-21 14:50:31 +08:00
jfeng-arista
6dfaf5e293
[sonic-vs]: Add fabric port data for vs test, and start fabricmgrd in vs environment (#16791)
Add fabric port data for vs test, and start fabricmgrd in vs environment.

This PR depends on sonic-net/sonic-sairedis#1301

sonic-net/sonic-swss#2920 needs this one merge first.
2023-11-20 16:21:03 -08:00
Pavan Naregundi
307e39bde4
[Marvell-arm64] Add platform support for rd98DX35xx (#16874)
* [Marvell-arm64] Add platform support for rd98DX35xx

This change adds following two variants of rd98DX35xx board to arm64
build.

Board with CPU integrated into the 98DX35xx switching chip:

 Platform: arm64-marvell_rd98DX35xx-r0
 HwSKU: rd98DX35xx
 ASIC: marvell
 Port Config: 32x1G + 16x2.5G + 6x25G

Board with external CN9131 CPU connected over PCI to 98DX35xx
switching chip:

 Platform: arm64-marvell_rd98DX35xx_cn9131-r0
 HwSKU: rd98DX35xx_cn9131
 ASIC: marvell
 Port Config: 32x1G + 16x2.5G + 6x25G

Change-Id: I21dc9fe972417daaabb20a5bddf7779d72b7972e
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>

* Add HWSKU for rd98DX35xx and rd98DX35xx_cn9131

This patch adds new HWSKU's for Marvell arm64 platforms rd98DX35xx
and rd98DX35xx_cn9131.

Change-Id: Id7c14f49f0e304335cc4ca73dcae52362c49d231
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>

---------

Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
2023-11-20 09:43:02 -08:00
abdosi
4a7aa2634f
[chassis] Support advertisement of Loopback0 of all LC's across all e-BGP peers in TSA mode (#16714)
What I did:
In chassis TSA mode, the Loopback0 IPs of each LC should be advertised through the e-BGP peers of each remote LC.

How I did:

- Route-map policy to advertise the own/self Loopback IP to other internal iBGP peers with a community internal_community as defined in constants.yml
- Route-map policy to match on the above internal_community when a route is received from internal iBGP peers, set an internal tag as defined in constants.yml, and also delete the internal_community so we don't send it to any e-BGP peers
- In TSA, a new route-map matches on the above internal tag, permits the route (Loopback0 IPs of remote LCs), and sets the community to traffic_shift_community.
- In TSB, delete the above new route-map.

How I verify:

Manual Verification

UT updated.
sonic-mgmt PR: sonic-net/sonic-mgmt#10239


Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2023-11-20 09:42:02 -08:00
Nazarii Hnydyn
c99ec1f80a
[hash] Add ECMP/LAG Hash Algorithm YANG model (#17079)
- Why I did it
Added YANG model as part of Generic Hash feature development

- How I did it
Added YANG model

- How to verify it
1. Add UT
2. Verified manually with the feature qualification

Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2023-11-20 17:43:58 +02:00
Nazarii Hnydyn
c43ea1c904
[installer] Create a blank grubenv if doesn't exist. (#17216)
- Why I did it
To fix BIOS firmware update after fresh image installation from ONiE

- How I did it
Initialized empty GRUB environment file after ONiE installation

- How to verify it
1. Install image from ONiE
2. Run BIOS firmware upgrade

Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2023-11-20 17:33:39 +02:00
Stephen Sun
b93852d53d
[Mellanox] Support running hw-management service on MSN4700 emulation platform (#16584)
- Why I did it
Support running hw-management service on MSN4700 emulation platform.

- How I did it
Use physical EEPROM instead of the fake one
Do not skip PSUd, PCId, thermal control daemon
Adjust PCIe and thermal configuration files
Adjust platform.json for different chassis names and thermals
Remove a patch to hw-management in order to enable it

- How to verify it
Run Nvidia simulation on SN4700 (ASIC and Platform)

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2023-11-19 11:03:46 +02:00
Volodymyr Samotiy
672781e24a
[mlnx-fw-upgrade] Add FW reactivation in case 2 FW upgrades were done without reboot (#17092)
- Why I did it
A reboot is required to activate FW after it has been upgraded.
If the reboot was not performed and the user needs to upgrade to another SONiC image, the upgrade fails.
The reason is that the SONiC upgrade has to install new FW, which fails because the previously installed FW was not activated.
To allow a second FW upgrade without a reboot in between, the FW image needs to be reactivated.
This change handles that flow.

Example of issue scenario:

User installed SONiC image on the switch
Then for some reason FW was upgraded by user or script but reboot was not performed to activate it.
After that, the upgrade to a new SONiC image fails because the new image needs to install FW, which fails because the previous FW was not activated.

- How I did it
In the "mlnx-fw-upgrade" script, check whether the FW upgrade failed with the error that FW was already installed but a reboot was not performed.
If so, perform FW image reactivation and try to upgrade the FW again.
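
The retry flow can be sketched as follows. This is an illustration only: the real logic lives in the mlnx-fw-upgrade script, and the function name, the injected callables, and the status strings used here are hypothetical.

```python
PENDING_REBOOT = "pending_reboot"  # hypothetical status for "installed but not activated"


def upgrade_with_reactivation(upgrade, reactivate):
    """Run upgrade(); on the pending-reboot failure, reactivate and retry once.

    upgrade and reactivate are caller-supplied callables (assumed for this sketch);
    upgrade returns a status string.
    """
    status = upgrade()
    if status == PENDING_REBOOT:
        # Previous FW was installed but never activated by a reboot:
        # reactivate the FW image, then try the upgrade a second time.
        reactivate()
        status = upgrade()
    return status


# Usage with stand-in callables: first attempt fails, second succeeds.
calls = []
results = iter([PENDING_REBOOT, "ok"])
final = upgrade_with_reactivation(
    upgrade=lambda: calls.append("upgrade") or next(results),
    reactivate=lambda: calls.append("reactivate"),
)
```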

- How to verify it
Install SONiC image on the switch
Then upgrade FW but don't perform reboot.
After that, upgrade to a new SONiC image and check that the upgrade was successful.

Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2023-11-19 11:01:31 +02:00
Samuel Angebault
c2899eb44c
[Arista] Update platform library submodules (#16701)
Why I did it

- Convert hw-dump into generate-dump plugins
- Enable DRAM scrubber on some products
- Fix xcvr driver active low register bit logic
- Improve cooling algorithm (now considers xcvrs and modules)
- Add linecard graceful shutdown (disabled by default)

The scrubber was enabled for the following products:

- DCS-7050QX-32S
- DCS-7050CX3-32S
- DCS-7060CX-32S
2023-11-17 17:15:39 -08:00
abdosi
e37b4f3cfa
Revert iBGP GTSM feature for VOQ Chassis (#17037)
What I did:

Revert the GTSM feature for VOQ iBGP session done as part of #16777.

Why I did:
On VOQ chassis, BGP packets go over the Recycle Port and then through Ingress Pipeline Routing, which makes the TTL 254 and fails the single-hop check.
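
The arithmetic behind the failure can be shown with a short sketch of the GTSM acceptance rule (RFC 5082 style: packets are sent with TTL 255 and accepted only if the received TTL is at least 255 - hops + 1). The function names are hypothetical and the code is not taken from FRR or SONiC.

```python
MAX_TTL = 255


def gtsm_min_ttl(hops):
    """Lowest TTL accepted when the peer may be at most `hops` hops away."""
    return MAX_TTL - hops + 1


def gtsm_accepts(ttl, hops=1):
    """Return True if a packet received with `ttl` passes the GTSM check."""
    return ttl >= gtsm_min_ttl(hops)


# A directly connected peer delivers TTL 255 and passes the single-hop check.
direct = gtsm_accepts(255, hops=1)
# A packet routed once on the way (TTL decremented to 254) fails it.
recycled = gtsm_accepts(254, hops=1)
```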

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2023-11-17 17:03:37 -08:00
saksarav-nokia
534eed9de7
[Nokia][Nokia-IXR7250E-SUP-10] Update BCM config for supervisor card to reduce the CPU usage (#16790)
Disabled the bcmCNTR thread to reduce the CPU usage for Nokia SFM cards.

Signed-off-by: saksarav <sakthivadivu.saravanaraj@nokia.com>
2023-11-17 15:11:05 -08:00
Ze Gan
9f08f88a0d
[dpu]: Add DPU database service (#17161)
Sub PRs:

sonic-net/sonic-host-services#84
#17191

Why I did it
According to the design, the database instances of DPU will be kept in the NPU host.

Microsoft ADO (number only): 25072889

How I did it
To follow the multiple ASIC design, I assume a new platform environment variable NUM_DPU will be defined in the /usr/share/sonic/device/$PLATFORM/platform_env.conf. Based on this number, NPU host will launch a corresponding number of instances for the DPU database.
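
A minimal sketch of this approach, assuming platform_env.conf holds simple `KEY=value` lines. The parsing function, the instance naming scheme `dpu<N>`, and the default of 0 when NUM_DPU is absent are assumptions made for the example, not details confirmed by the PR.

```python
def read_num_dpu(conf_text):
    """Return the NUM_DPU value from KEY=value text, or 0 if it is not defined."""
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip comments and lines that are not assignments.
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "NUM_DPU":
            return int(value.strip())
    return 0


def dpu_instance_names(num_dpu):
    """One hypothetical database instance name per DPU."""
    return [f"dpu{i}" for i in range(num_dpu)]


conf = "# platform environment\nNUM_ASIC=1\nNUM_DPU=2\n"
instances = dpu_instance_names(read_num_dpu(conf))
```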

Signed-off-by: Ze Gan <ganze718@gmail.com>
2023-11-17 09:10:03 -08:00
arista-nwolfe
00a9412880
[Arista]: Set SYNCD_SHM_SIZE for Arista DNX Devices (#17205)
SAI 9.x requires SYNCD_SHM_SIZE to be specified; otherwise it defaults to 64mb, which is insufficient for syncd.

Examples of failures seen when insufficient shmem was set:

ha_init:  The file: warmboot_data_0 is of size=762[MB] and is beyond the directory: /dev/shm available storage of size=64[MB]#015
syncd.sh[26074]: Cannot get SYNCD_SHM_SIZE for chip: [869] in /usr/share/sonic/device/x86_64-broadcom_common/syncd_shm.ini. Skip set SYNCD_SHM_SIZE.

Syncd hangs here:

syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_shr_ha_section_resize:536 start=0x7f6e641b4000, end=0x7f6e645b4000, len=302276608, free=0x7f6e641b4000
Broadcom recommended using 1gb for DNX devices.

Since currently we don't use SAI9.x on master and 202305 this change won't fix anything until we upgrade the SAI on those branches.
2023-11-17 09:06:25 -08:00
mssonicbld
e4878ff1ad
[submodule] Update submodule sonic-dbsyncd to the latest HEAD automatically (#17207)
#### Why I did it
src/sonic-dbsyncd
```
* e294eb0 - (HEAD -> master, origin/master, origin/HEAD) Update the code coverage rate to 80% (#63) (16 hours ago) [xumia]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-11-17 16:33:54 +08:00
mssonicbld
ff435ec6cf
[submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically (#17209)
#### Why I did it
src/sonic-platform-daemons
```
* 55a6828 - (HEAD -> master, origin/master, origin/HEAD) Update the code coverage rate to 80% (#406) (16 hours ago) [xumia]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-11-17 16:33:46 +08:00
mssonicbld
3393b3069e
[submodule] Update submodule sonic-swss-common to the latest HEAD automatically (#17213)
2023-11-17 15:25:54 +08:00