Commit Graph

8510 Commits

Author SHA1 Message Date
Zain Budhwani
ff5efe8fb3
[eventd] Fix eventd UT flakiness (#17055)
### Why I did it

Fix flakiness of eventd UT - run sub after capture service starts

##### Work item tracking
- Microsoft ADO **(number only)**:25650744

#### How I did it

Run sub socket after capture socket is initialized

#### How to verify it

Pipeline
2024-02-12 21:52:38 -08:00
Pavan Naregundi
c6602c9585
[Marvell-arm64]: Fix SYNCD_RPC build (#17266)
Change-Id: I0bd4932d03141f3f7bc523b49a1bf3d1809817a8

Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
2024-02-12 15:11:19 -08:00
Pavan Naregundi
b31a3030fb
[Marvell-arm64] Fix boot issue on rd98DX35xx_cn9131 (#17277)
Change-Id: I411f12963fb8dc0eb3569faf4df68082b852e3a8

Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
2024-02-12 15:11:00 -08:00
Prince George
0564ce48c9
[baseimage]: Update smartmontool version >= v7.4 (#17635)
Why I did it
Update the smartmontools version to 7.4. This prevents the smartmontools service from exiting with a non-zero exit status on platforms that do not have an SSD/disk to be monitored.

Until Debian Bullseye (which had smartmontools 7.2), Debian had a patch applied that changed the default quit mode to never exit. A bug report was filed on Debian, saying that the source code patch isn't needed and could just be done via command line options, and also that smartmontools 7.3 has a new built-in option to exit with 0 if there are no monitorable devices found (which prevents systemd from treating it as a service failure). Because of that, Debian Bookworm (which also upgraded to 7.3) removed the patch and restored the default behavior of exiting with exit code 17 if there are no devices found.

Smartmontools v7.3 has an issue because of which smartd exits with a non-zero exit status even with the "-q" option.

How I did it
Update smartmontools to version 7.4, which has the fix for exiting gracefully if no device to monitor is found.
Add the smartd option "-q nodev0" to allow smartd to exit with status 0 if no device to monitor is found.
2024-02-12 09:37:12 -08:00
Stepan Blyshchak
cac73d80ca
[bootchart] enable command line recording (#17778)
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2024-02-12 08:36:44 -08:00
Kebo Liu
1b5f72127a
[Mellanox] Remove SFP sensors from sensors.conf (#17631)
- Why I did it
The cable thermal sensors will be deprecated from the kernel driver. When cable host management is enabled, the NOS will fetch the cable temperature from the cable EEPROM, and the kernel driver will no longer provide the sysfs.

- How I did it
Remove the relevant sensors from the conf files

- How to verify it
Run sonic mgmt sensor test

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2024-02-12 16:12:57 +02:00
Saikrishna Arcot
34bdfc8b39
Add Bookworm swss-layer (#18062)
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2024-02-09 15:56:26 -08:00
mssonicbld
bd47fd1559
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#18078)
#### Why I did it
src/sonic-utilities
```
* 81c5349f - (HEAD -> master, origin/master, origin/HEAD) [chassis] fix show bgp summary when no neighbors are present on one ASIC (#3158) (10 hours ago) [Arvindsrinivasan Lakshmi Narasimhan]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2024-02-09 16:32:30 +08:00
snider-nokia
7f3fd1377d
[Nokia-IXR7250E][Devicedata] Update the device data for Nokia IXR7250E platform (thermal logging thresholds) (#18063)
These changes adjust Nokia IXR7250 thermal sensor logging thresholds.

Why I did it
To modify the thermal sensor logging thresholds used on LC and Supervisor.

How I did it
Modified the JSON based thermal logging thresholds used to determine when to log current high sensor temperature and hottest sensor margin fluctuations.

How to verify it
Verify that syslog messages indicating current (high) temperature and margin values are only logged when these respective values fluctuate by at least 5 degrees.
2024-02-08 13:03:05 -08:00
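The verification step of #18063 describes logging a value only when it has moved by at least 5 degrees. A minimal Python sketch of that kind of change-threshold logging follows. The class name `ThresholdLogger`, the helper `filter_readings`, the decision to always log the first reading, and the comparison against the last logged value are all assumptions made for illustration; the platform's actual JSON-driven implementation is not shown in the commit message.

```python
class ThresholdLogger:
    """Report a reading only when it moved far enough from the last reported one."""

    def __init__(self, threshold=5.0):
        self.threshold = threshold   # minimum change (degrees) that triggers a log
        self.last_logged = None      # last value that was actually logged

    def should_log(self, value):
        # Assumption: the first reading is always logged, later ones only
        # when they differ from the last logged value by >= threshold.
        if self.last_logged is None or abs(value - self.last_logged) >= self.threshold:
            self.last_logged = value
            return True
        return False


def filter_readings(readings, threshold=5.0):
    """Return the subset of readings that would be logged."""
    logger = ThresholdLogger(threshold)
    return [r for r in readings if logger.should_log(r)]
```

Comparing against the last logged value (rather than the previous sample) keeps slow drifts from being missed while still suppressing small oscillations.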
Arvindsrinivasan Lakshmi Narasimhan
4703192d0f
[nokia][chassis][voq] update the sai_post_init soc file with interrupt ids (#18066)
Update/Add the sai_postinit_cmd.soc with the interrupt-ids

Microsoft ADO 26730061:

How to verify it
Verify on the Chassis LCs
2024-02-08 13:01:51 -08:00
dbarashinvd
7a34d4a275
[Mellanox] fix code for warm reboot to work with FW controlled ports (#18065)
- Why I did it
Fix the code so that FW-controlled ports also work after a warm reboot.
During a warm reboot, the control state sysfs of each port does not change, unlike in a reboot or fast boot.

- How I did it
1. Check the procfs cmdline to determine whether a warm reboot was performed. This is needed because pmon cannot detect a warm reboot while it is taking place, since pmon is loaded only after the warm reboot has finished.
2. If a warm reboot was performed, check in the static detection part whether each port is FW controlled. If so, leave it that way and stop the state machine flow (set it to the final state).

- How to verify it
1. Boot a switch with CMIS host management and at least one FW-controlled port (non-active cables or non-CMIS cables), then run a warm reboot.
2. Verify that no sysfs read errors appear for the control sysfs
2024-02-08 14:49:56 +02:00
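The two steps described in #18065 (detect a warm reboot from the kernel command line, then short-circuit the state machine for FW-controlled ports) can be sketched roughly as below. This is an illustration only: the token `SONIC_BOOT_TYPE=warm`, the function names, and the state names `"FINAL"` and `"CONTINUE"` are assumptions and are not taken from the commit itself.

```python
def is_warm_reboot(cmdline, token="SONIC_BOOT_TYPE=warm"):
    """Return True if the kernel command line carries the warm-boot token."""
    # Kernel parameters are whitespace separated; compare whole tokens so
    # that a longer value such as "SONIC_BOOT_TYPE=warmish" does not match.
    return token in cmdline.split()


def read_cmdline(path="/proc/cmdline"):
    """Read the kernel command line of the running system."""
    with open(path) as f:
        return f.read()


def next_port_state(fw_controlled, warm_reboot):
    """Decide where the per-port state machine goes after static detection."""
    if warm_reboot and fw_controlled:
        return "FINAL"      # leave the port under FW control and stop the flow
    return "CONTINUE"       # hypothetical name for "proceed as before"
```

On a live system one would call `is_warm_reboot(read_cmdline())` once at startup and pass the result into the per-port decision.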
mssonicbld
a554ac40a7
[submodule] Update submodule sonic-sairedis to the latest HEAD automatically (#18068)
#### Why I did it
src/sonic-sairedis
```
* a504933 - (HEAD -> master, origin/master, origin/HEAD) Change dash API pipeline name (#1351) (11 hours ago) [Kamil Cudnik]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2024-02-08 16:32:49 +08:00
mssonicbld
bd4bf76163
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#18070)
#### Why I did it
src/sonic-swss
```
* b3b6a838 - (HEAD -> master, origin/master, origin/HEAD) [test_mux] Multi-mux-nh full test coverage (#3028) (25 minutes ago) [Nikola Dancejic]
* 3bd01444 - Bfd support for TSA state. (#2926) (6 hours ago) [siqbal1986]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2024-02-08 16:32:42 +08:00
mssonicbld
f49711f246
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#18071)
#### Why I did it
src/sonic-utilities
```
* a3cf5c02 - (HEAD -> master, origin/master, origin/HEAD) Fix the sfputil treats page number as decimal instead of hexadecimal (#3153) (6 hours ago) [Kebo Liu]
* 167f9966 - [Mellanox] Add support of the nvidia-bluefield platform to generate-dump utility. (#3091) (20 hours ago) [Oleksandr Ivantsiv]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2024-02-08 16:32:39 +08:00
jfeng-arista
5a20589415
Start fabric mgr daemon in swss container. (#17473)
The fabricmgr daemon has been started in the vs environment for testing since #16791; we now start the daemon in product code as well.
2024-02-07 23:45:10 -08:00
mssonicbld
fffd6e6607
[submodule] Update submodule sonic-platform-common to the latest HEAD automatically (#18047)
#### Why I did it
src/sonic-platform-common
```
* 888075d - (HEAD -> master, origin/master, origin/HEAD) [ssd_generic] Add support Transcend ssd-health. (#436) (31 hours ago) [Michael Shih]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2024-02-07 16:32:48 +08:00
mssonicbld
f9e510ccfa
[submodule] Update submodule sonic-dash-api to the latest HEAD automatically (#18050)
#### Why I did it
src/sonic-dash-api
```
* da6899b - (HEAD -> master, origin/master, origin/HEAD) Add/update fields needed for private link implementation (9 hours ago) [Prince Sunny]
* 960eab3 - Merge branch 'master' into pl-api (33 hours ago) [Prince Sunny]
* bc29979 - Merge branch 'master' into pl-api (4 days ago) [Lawrence Lee]
* 2d565d3 - Merge branch 'master' into pl-api (4 days ago) [Lawrence Lee]
* df6c512 - remove tunnel_key (4 days ago) [Lawrence Lee]
* 4d5ebda - Update proto files for PL (4 days ago) [Lawrence Lee]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2024-02-07 16:32:44 +08:00
mssonicbld
22ac869f55
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#18051)
#### Why I did it
src/sonic-swss
```
* b18cbac6 - (HEAD -> master, origin/master, origin/HEAD) [Ci] Fix the test script naming issue (#3021) (81 minutes ago) [xumia]
* 5fd896f6 - [PortOrch] Add FEC codeword errors in port stats (#3029) (87 minutes ago) [vdahiya12]
* 77d56e6e - Fix the Orchagent crash seen during Port channel OC test cases. (#3042) (9 hours ago) [saksarav-nokia]
* 4d470592 - Fix memory leak and object copying bugs in orchagent (#3017) (10 hours ago) [Saikrishna Arcot]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2024-02-07 16:32:41 +08:00
mssonicbld
18bba22f88
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#18052)
#### Why I did it
src/sonic-utilities
```
* 0408226f - (HEAD -> master, origin/master, origin/HEAD) Fix `sudo config load_mgmt_config` fails with error "File /var/run/dhclient.eth0.pid does not exist" (#3149) (18 hours ago) [Mai Bui]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2024-02-07 16:32:37 +08:00
zitingguo-ms
74494010e1
[Broadcom] Upgrade xgs SAI to 10.1.6.0 (#18044)
Why I did it
Upgrade the xgs SAI version to 10.1.6.0 to include the following fix:

10.1.6.0: [CS00012332630][SAI_BRANCH rel_ocp_sai_10_1] SAI - OTHER - [SAI BUG] sflow use psample to send packet, but the psample in linux version is not right.
10.1.4.0: [CS00012329827]ECMP LB traffic polarization, configure hash_offset along with hash_seed attr
10.1.3.0: Double commit test code fixes in EM for 10.1.
10.1.2.0: fix ODP packaging in rel_ocp_sai_10_1
10.1.1.0: Use knet-cb procfs path for DNX port speed sampling rate (does not use new genl)
Work item tracking
Microsoft ADO (number only): 26720003
How I did it
Upgrade xgs SAI version in sai.mk file.

How to verify it
Run full qual on s6100 T1: https://elastictest.org/scheduler/testplan/65c1c2e69e3e72f540cae34b
2024-02-07 09:29:40 +08:00
mssonicbld
c8371422fb
[submodule] Update submodule dhcprelay to the latest HEAD automatically (#18046)
#### Why I did it
src/dhcprelay
```
* 363fa06 - (HEAD -> master, origin/master, origin/HEAD) Skip vlans with no dhcpv6 server configured (#46) (8 hours ago) [kellyyeh]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2024-02-06 16:32:27 +08:00
mssonicbld
858107eb28
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#18048)
#### Why I did it
src/sonic-swss
```
* d566e15a - (HEAD -> master, origin/master, origin/HEAD) Allow L4 port range egress ACL rules on DNX (#3014) (9 hours ago) [arista-nwolfe]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2024-02-06 16:32:22 +08:00
mssonicbld
3d9cf77c26
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#18049)
#### Why I did it
src/sonic-utilities
```
* b5487357 - (HEAD -> master, origin/master, origin/HEAD) [route_check.py] account static routes in route_check.py (#3120) (9 hours ago) [Stepan Blyshchak]
* 64e1f9f4 - [Mellanox buffer migrator] Do not touch the buffer model on generic SKUs if the buffer configuration is empty (#3114) (19 hours ago) [Stephen Sun]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2024-02-06 16:32:18 +08:00
Oleksandr Ivantsiv
ea02734b8d
[dhcp-server] Change the kea-dhcp4 PID file directory to tmpfs. (#17974) 2024-02-05 10:26:46 -08:00
Yaqiang Zhu
c323ccfa72
[dhcp_server][yang] Update supported option type to string (#18029) 2024-02-05 10:25:55 -08:00
Yevhen Fastiuk
2f35079979
[Mellanox] Fix uninitialized variable on module plug event (#17011)
- Why I did it
To fix uninitialized variable

- How I did it
Add initial value

Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>
2024-02-05 19:41:16 +02:00
dbarashinvd
0aacc1f28e
[Mellanox] fix sysfs reading that gets garbage end of line using strip (#17830)
- Why I did it
When reading the sysfs fd upon Python poller events, there is end-of-line garbage like "#012" trailing the real value of 1 or 0

- How I did it
Use Python strip() to remove the end-of-line characters

- How to verify it
Run the CMIS host management feature on a switch,
wait a few minutes until the switch completes its boot-up sequence, including the CMIS host manager,
then disconnect or reconnect a port to create a poller event
2024-02-05 19:39:55 +02:00
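The strip() fix from #17830 can be illustrated with a short sketch. The "#012" seen in the logs is most likely the newline character (octal 012) as escaped by syslog, so the raw sysfs text is assumed here to be `"1\n"` or `"0\n"`. The function names are hypothetical and do not come from the Mellanox platform code.

```python
def parse_sysfs_flag(raw):
    """Convert the raw text read from a 0/1 sysfs attribute to an int."""
    # sysfs attributes normally end with "\n"; strip() drops it along with
    # any other leading or trailing whitespace before the conversion.
    return int(raw.strip())


def read_sysfs_flag(path):
    """Read a 0/1 sysfs attribute from disk (path is supplied by the caller)."""
    with open(path) as f:
        return parse_sysfs_flag(f.read())
```

Stripping before comparing or converting also keeps the trailing newline out of any log message that prints the value.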
mssonicbld
529031210f
[submodule] Update submodule sonic-sairedis to the latest HEAD automatically (#18019)
#### Why I did it
src/sonic-sairedis
```
* e5b8d4e - (HEAD -> master, origin/master, origin/HEAD) Make changes to support compiling on Bookworm (with GCC 12) (#1344) (3 days ago) [Saikrishna Arcot]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2024-02-05 16:32:54 +08:00
Stepan Blyshchak
e1a8d2a6e8
[nvidia][syncd] fix incorrect permission of /tmp in syncd container (#17777)
Fixes #16034
2024-02-05 00:00:29 -08:00
mssonicbld
412cd7acbf
[submodule] Update submodule sonic-dash-api to the latest HEAD automatically (#18017)
#### Why I did it
src/sonic-dash-api
```
* ec15bc7 - (HEAD -> master, origin/master, origin/HEAD) Revert "rename VnetMapping.action_type" (#17) (2 hours ago) [Ze Gan]
* ad0f59e - Add unspecified default value to all enums (2 days ago) [Lawrence Lee]
*   dd844b1 - Merge branch 'add-enum-default' of github.com:theasianpianist/sonic-dash-api into add-enum-default (4 days ago) [Lawrence Lee]
|\  
| * 4b31135 - Merge branch 'master' into add-enum-default (4 days ago) [Lawrence Lee]
* | 4b41ea7 - rename VnetMapping.action_type (4 days ago) [Lawrence Lee]
|/  
* b1ab99f - Add unspecified default value to all enums (4 days ago) [Lawrence Lee]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2024-02-05 14:33:07 +08:00
StormLiangMS
185d2f4e62
fix the compile issue for slim image (#18015)
Why I did it
PR #17905 introduced a bug in the slim image build: sonic_asic_platform is missing when building docker images for the slim image.

[ building ] [ target/docker-dhcp-relay.gz ]
/sonic/dockers/docker-dhcp-relay/cli-plugin-tests /sonic
/sonic
Traceback (most recent call last):
  File "/usr/local/bin/j2", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/dist-packages/j2cli/cli.py", line 202, in main
    output = render_command(
  File "/usr/local/lib/python3.9/dist-packages/j2cli/cli.py", line 186, in render_command
    result = renderer.render(args.template, context)
  File "/usr/local/lib/python3.9/dist-packages/j2cli/cli.py", line 85, in render
    return self._env \
  File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 1090, in render
    self.environment.handle_exception()
  File "/usr/lib/python3/dist-packages/jinja2/environment.py", line 832, in handle_exception
    reraise(*rewrite_traceback_stack(source=source))
  File "/usr/lib/python3/dist-packages/jinja2/_compat.py", line 28, in reraise
    raise value.with_traceback(tb)
  File "/sonic/dockers/docker-dhcp-relay/Dockerfile.j2", line 48, in top-level template code
    {% if build_reduce_image_size != "y" or sonic_asic_platform != "broadcom" %}
jinja2.exceptions.UndefinedError: 'sonic_asic_platform' is undefined
make: *** [slave.mk:1072: target/docker-dhcp-relay.gz] Error 1
make: *** Waiting for unfinished jobs....
[ finished ] [ target/docker-swss-layer-bullseye.gz ]
[ finished ] [ target/docker-syncd-brcm-dnx.gz ]
make[1]: *** [Makefile.work:608: target/sonic-broadcom.bin] Error 2
make[1]: Leaving directory '/data/work/1/s'
make: *** [Makefile:41: target/sonic-broadcom.bin] Error 2
Why did it slip past the PR test? The PR test doesn't compile with the slim option, so sonic_asic_platform != "broadcom" is never checked in the PR build.

Work item tracking
Microsoft ADO (number only):
How I did it
Export sonic_asic_platform for docker build in slave.mk

How to verify it
build with slim image option.
2024-02-04 10:30:58 +08:00
mssonicbld
6c258bec64
[submodule] Update submodule sonic-swss-common to the latest HEAD automatically (#18021)
#### Why I did it
src/sonic-swss-common
```
* 3c3ae57 - (HEAD -> master, origin/master, origin/HEAD) Provide build flag to Disable compilation of libyang dependent interfaces (#853) (5 hours ago) [Vivek]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2024-02-03 16:32:18 +08:00
mssonicbld
665184ee43
[submodule] Update submodule sonic-platform-common to the latest HEAD automatically (#18010)
#### Why I did it
src/sonic-platform-common
```
* 538ec67 - (HEAD -> master, origin/master, origin/HEAD) Tx/Rx power values should be rounded up to 3 decimal places (#432) (6 hours ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2024-02-02 16:32:19 +08:00
wenyiz2021
892f171b80
[Master] [DNX SAI] Update DNX SAI to 9.2.X and SDK on master branch (#17935)
SAI 9.2.x was sanitized and posted on 202305 branch: https://github.com/sonic-net/sonic-buildimage/pull/17432/files

Posting SAI 9.2.x to master branch also.

26607678
2024-02-01 17:44:48 -08:00
Ze Gan
89137b8fc9
[ci]: Enable daily building for ubuntu20.04 to every branch (#17520)
- Ubuntu 20.04 is needed by 202311
- Because the Ubuntu 20.04 artifacts are used by other repos, a daily build is needed even when this repo goes a long time without updates.

Signed-off-by: Ze Gan <ganze718@gmail.com>
2024-02-01 11:14:30 -08:00
Ze Gan
1c901b8f12
[docker-database]: Install sonic-dash-api CLI in database container (#17479)
Add sonic-dash-api CLI in database container for decoding the dash objects from protobuf to readable json.

Signed-off-by: Ze Gan <ganze718@gmail.com>
2024-02-01 11:13:51 -08:00
Dror Prital
4af43dc63b
[Mellanox] Update SIMX version to 23.10-1123 (#17958)
- Why I did it
Update NVIDIA SIMX Version to 23.10-1123

- How I did it
Changed fw.mk file
2024-01-31 19:41:23 +02:00
mssonicbld
36cd5b6a24
[submodule] Update submodule sonic-swss-common to the latest HEAD automatically (#17955)
#### Why I did it
src/sonic-swss-common
```
* 253ceb6 - (HEAD -> master, origin/master, origin/HEAD) Fix race condition in ZmqServer. (#850) (23 hours ago) [mint570]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2024-01-31 16:32:33 +08:00
Sudharsan Dhamal Gopalarathnam
77384494b3
[Mellanox]Update SDK/FW to 4.6.2202/2012.2202 (#17947)
- Why I did it
Update SDK/FW version to 4.6.2202/2012.2202

Fixed issues:
1. On Spectrum-3 systems, ports' toggling while sending traffic on 400G speed ports, might result in stuck FW.
2. In Spectrum-1 switch systems, 50G SR2 speed mode is not supported when AutoNeg is enabled. In this case although the max interface speed is 50G for SR2 or SR4 or SR, the actual max interface speed negotiated between the loopback is 25G.
3. On Spectrum-2 and Spectrum-3, Switch create in fastboot might take more than 40 seconds in case there are no active links.
4. When performing warmboot from a version prior to 202205 to 202205 or above, no aging or MAC move takes place

- How I did it
Updating make files.

- How to verify it
Running regression
2024-01-31 08:35:16 +02:00
mssonicbld
3cdc76e18c
[submodule] Update submodule sonic-platform-pde to the latest HEAD automatically (#17953)
#### Why I did it
src/sonic-platform-pde
```
* f2cc748 - (HEAD -> master, origin/master, origin/HEAD) Merge pull request #35 from nonodark/local (21 hours ago) [賓少鈺]
* 607e920 - Fix 'Chassis' object has no attribute 'get_num_psu' in test_psu.py (3 weeks ago) [nonodark]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2024-01-31 14:32:16 +08:00
Baorong Liu
d1cce42f4a
[staticroutebfd] fix an error in error logging (#17043)
Why I did it
Fix an error in the log_err call.
This error can be triggered by an invalid static route key. Usually the code cannot reach this path with a normal config file, but the issue was hit with an invalid key during manual testing with redis-cli directly. The file is scanned by Python lint to prevent such errors.

Work item tracking
Microsoft ADO ():26250268

How I did it
fix the format error.

How to verify it
1. Ran pylint to check the design and make sure there is no such error in the design file.
2. Wrote a separate Python program to verify the log call.
Current logging-related tests usually use patch/mock for logging. This specific error cannot be triggered when a mock function is called instead of the real function in the design, so lint checking is needed for code changes.
2024-01-30 22:21:46 -08:00
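The point made in #17043, that a mocked logger can hide a broken log call, can be demonstrated with a small sketch. The commit does not show the actual format error, so the bug below (passing printf-style arguments to a helper that accepts a single string) is a hypothetical stand-in, and `log_err`, `handle_key`, and `call_fails` are illustrative names rather than the staticroutebfd code.

```python
from unittest import mock


def log_err(msg):
    """Stand-in for a real logging helper that takes one pre-formatted string."""
    return "ERR: " + msg


def handle_key(key, logger=log_err):
    # Hypothetical bug: the caller passes printf-style arguments, but the
    # real helper accepts only a single message string.
    return logger("invalid static route key %s", key)


def call_fails(logger):
    """Return True if handle_key raises TypeError with the given logger."""
    try:
        handle_key("bad|key", logger=logger)
    except TypeError:
        return True
    return False
```

With the real helper, `call_fails(log_err)` is true because the extra argument raises `TypeError`; with `mock.Mock()` the call is accepted silently, which is why a unit test built on mocks would not catch this class of error and a lint pass or a test against the real function is needed.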
Zain Budhwani
c8439cdd4b
Disable eventd and rsyslog plugin in slim images (#17905)
### Why I did it

Disable eventd at buildtime for slim images

##### Work item tracking
- Microsoft ADO **(number only)**:26386286

#### How I did it

Add flags for disabling eventd and only copy rsyslog conf files when eventd is included and not slim image

#### How to verify it

Manual testing
2024-01-30 22:14:23 -08:00
Lior Avramov
865042ed23
[Nvidia] Update syncd docker to use python version 3 (#17735)
* Remove python2 from compilation of python-sdk-api

* Upgrade Python version in syncd RPC docker image to Python3
2024-01-30 13:47:39 -08:00
kellyyeh
90056a92ac
Only add to DHCP_RELAY if dhcpv6 servers exist (#17770) 2024-01-30 10:02:34 -08:00
xumia
bb5a420de5
[Build] Fix krb5 package not found issue (#17926)
Why I did it
Fix the build issue caused by the wrong version specified.

See the build error logs:

Try 4: /usr/bin/wget --retry-connrefused failed to get: -O
--2024-01-26 11:38:23--  https://sonicstorage.blob.core.windows.net/public/fips/bullseye/0.10/amd64/libk5crypto3_1.18.3-6+deb11u14+fips_amd64.deb
Resolving sonicstorage.blob.core.windows.net (sonicstorage.blob.core.windows.net)... 20.60.59.131
Connecting to sonicstorage.blob.core.windows.net (sonicstorage.blob.core.windows.net)|20.60.59.131|:443... connected.
HTTP request sent, awaiting response... 404 The specified blob does not exist.
2024-01-26 11:38:23 ERROR 404: The specified blob does not exist..

Try 5: /usr/bin/wget --retry-connrefused failed to get: -O
make[1]: *** [Makefile:12: /sonic/target/debs/bullseye/symcrypt-openssl_0.10_amd64.deb] Error 8
make[1]: Leaving directory '/sonic/src/sonic-fips'
Work item tracking
Microsoft ADO (number only): 26577929
The package not installed but PR passed issue is traced in another issue #17927

How I did it
Add the libkrb5-dev and the depended packages to fix docker-sonic-vs build failure.
The package libzmq3-dev has dependency on the libkrb5-dev.
2024-01-30 21:44:32 +08:00
mssonicbld
2683e378e9
[submodule] Update submodule sonic-sairedis to the latest HEAD automatically (#17939)
#### Why I did it
src/sonic-sairedis
```
* 5b2a517 - (HEAD -> master, origin/master, origin/HEAD) Revert "add if statement for module control mode support" (#1341) (22 hours ago) [dbarashinvd]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2024-01-30 16:32:35 +08:00
mssonicbld
bf9b6091d9
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#17940)
#### Why I did it
src/sonic-utilities
```
* 3d45c0c6 - (HEAD -> master, origin/master, origin/HEAD) Migrate GNMI table (#3053) (9 hours ago) [ganglv]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2024-01-30 16:32:31 +08:00
Liping Xu
c23c8afbf4
handle json load exception in bgpmon (#17856)
Why I did it
An ICM was reported for "BGPMon Process exited", which was caused by a JSON load exception.

Work item tracking
Microsoft ADO (number only):
25916773
How I did it
Add an exception handler around the JSON load.

How to verify it
Verified locally: added a debug change that modifies the output string of the command so that it is not valid JSON, then checked the syslog.
2024-01-29 15:55:28 +08:00
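The handling described in #17856 amounts to catching the exception raised by the JSON load so that the process keeps running. A minimal sketch, assuming the command output arrives as a string: the function name `parse_neighbors`, the logger name, and the choice to return `None` on failure are illustrative assumptions, not the actual bgpmon code.

```python
import json
import logging

logger = logging.getLogger("bgpmon-sketch")


def parse_neighbors(output):
    """Parse command output as JSON; return None instead of raising on bad input."""
    try:
        return json.loads(output)
    except ValueError as err:
        # json.JSONDecodeError is a subclass of ValueError.
        logger.error("failed to parse command output as JSON: %s", err)
        return None
```

The caller can then skip the current polling cycle when `None` is returned instead of letting the exception terminate the daemon.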
Kevin Wang
5516381d7e
[qos] change the template keyword from Compute-AI to ComputeAI (#17902)
Why I did it
Align the keywords to make qos configuration take effect

Work item tracking
Microsoft ADO (number only):
How I did it
Change the keyword to ComputeAI

How to verify it
reload minigraph and check the qos configuration
2024-01-29 10:10:54 +08:00
Volodymyr Samotiy
f1d6655004
[Mellanox] Disable SSD NCQ on Mellanox platforms (#17567)
- Why I did it
Based on some research, some products might experience occasional IO failures in the communication between the CPU and SSD because of NCQ.
There seems to be a problem between some kernel versions and some SATA controllers.

Syslog error message examples:

Error "ata1: SError: { UnrecovData Handshk }" - "failed command: WRITE FPDMA QUEUED".
Error "ata1: SError: { RecovComm HostInt PHYRdyChg CommWake 10B8B DevExch }" - "failed command: READ FPDMA QUEUED".
Some vendors have already disabled NCQ on their platforms in SONiC due to similar issues:

[Arista] Disable ATA NCQ for a few products #13739
[Arista] Disable SSD NCQ on DCS-7050CX3-32S #13964
Also there are other discussions on Debian/Ubuntu forums about similar issues and it was suggested to disable NCQ:

https://askubuntu.com/questions/133946/are-these-sata-errors-dangerous

- How I did it
Add a kernel parameter to tell libata to disable NCQ

- How to verify it
Use FIO tool - fio --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4
2024-01-28 16:26:07 +02:00