5734 Commits

Author SHA1 Message Date
Hua Liu
a20b43e502
[202012] Check config file not empty after modify it in hostcfgd. (#14385)
**What I did**
Check the integrity of /etc/pam.d/sshd after modifying it in hostcfgd.
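A minimal Python sketch of a "not empty after modify" check, assuming the check simply rejects a config file that is empty on disk once it has been rewritten. `write_config_checked` and `ConfigIntegrityError` are hypothetical names, not hostcfgd's actual code.

```python
import os


class ConfigIntegrityError(RuntimeError):
    """Raised when a config file fails the post-write check (hypothetical)."""


def write_config_checked(path: str, content: str) -> None:
    """Write `content` to `path`, then verify the file on disk is not empty.

    Hypothetical helper illustrating the check; not hostcfgd's actual code.
    """
    with open(path, "w") as f:
        f.write(content)
        f.flush()
        os.fsync(f.fileno())  # push the data to disk before checking it

    # Post-write integrity check: fail loudly instead of leaving an empty file.
    if os.path.getsize(path) == 0:
        raise ConfigIntegrityError(f"{path} is empty after modification")
```

A caller could catch `ConfigIntegrityError` and restore a known-good copy; what hostcfgd actually does on failure is not described in this commit.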

**Why I did it**
Found incidents in which /etc/pam.d/sshd became an empty file during an OR upgrade.

**How I verified it**
Passed all UTs.
Added new UTs to cover the new code.

**Details if related**
This is a manual cherry-pick PR of https://github.com/sonic-net/sonic-host-services/pull/36
2023-03-27 00:30:05 -07:00
Neetha John
43aec133da
[202012] [qos] Update RDMA-CENTRIC lossy profile to use static threshold for Th devices (#14398)
Backport #14372 to 202012

Why I did it
For better accounting, update the ingress lossy traffic profile to use a static threshold. This change is intended only for Th devices using RDMA-CENTRIC profiles.

How I did it
Update the buffer templates for Th devices in RDMA-CENTRIC folder to use the correct threshold

Signed-off-by: Neetha John <nejo@microsoft.com>
2023-03-24 10:41:48 -07:00
Ying Xie
a027b37a56
[build] Fix isc-dhcp full version in rules.mk (#13288) (#14376)
During the build process, a dsc file is retrieved from the URL:
http://deb.debian.org/debian/pool/main/i/isc-dhcp/isc-dhcp_4.4.1-2.3.dsc

Depending on the DNS resolution, the server reached may respond with an
HTTP 404 error code, which stops the build process.
In all cases, the URL http://deb.debian.org/debian/pool/main/i/isc-dhcp/
no longer lists this DSC file, but one with a different name format.

The suffix "+deb11u1" is now appended to identify the Debian version.

- Append this suffix to the makefile rules of isc-dhcp.

Signed-off-by: Guillaume Lambert <guillaume.lambert@orange.com>
Co-authored-by: Guilt <guillaume.lambert@orange.com>
2023-03-21 20:06:11 -07:00
Neetha John
cd85a2e2c1
[202012] [submodule] Update submodule for sonic-utilities (#14357)
This PR includes the following commits:
```
5b0f0fc [202012][dhcp_relay] Fix dhcp_relay restart error while add/del vlan (sonic-net/sonic-utilities#2688)
48fd842 [show][muxcable] increase timeout for displaying HW_STATUS (sonic-net/sonic-utilities#2712)
f0a9f4f [dhcp_relay] Add show/clear/counter cli for dhcp_relay (sonic-net/sonic-utilities#2719)
8627944 Revert "[202012] Update load minigraph to load backend acl" (sonic-net/sonic-utilities#2736)
93c7d43 [warm-reboot] Use kexec_file_load instead of kexec_load when available (sonic-net/sonic-utilities#2608)
cc78747 [warm/fast-reboot] Backup logs from tmpfs to disk during fast/warm shutdown (sonic-net/sonic-utilities#2714)
```
2023-03-21 10:55:05 -07:00
vdahiya12
857d74d4fe
[202012][sonic-platform-daemons][sonic-utilities] update submodule (#14048)
For sonic-platform-daemons, the following commits are added to the submodule:

```
dd8fbae (HEAD -> 202012, origin/202012) [ycabled] add more coverage to ycabled; add minor name change for vendor API CLI return key-values pairs (#338)
846555e [thermalctld] fix some redundant removal of state DB tables (#315)
3d92fb9 Use github code scanning instead of LGTM (#316)
```

For sonic-utilities, the following commits are added to the submodule in this PR:
```
git log --oneline 39cdb49c..202012
ec4c6ea5 (HEAD -> 202012, origin/202012) [show][muxcable] add some new commands health, reset-cause, queue_info support for muxcable (#2414) (#2704)
03ef272e [202012][vlan] Remove add field of vlanid to DHCP_RELAY table while adding vlan (#2681)
e00a81ac [202012][dhcp-relay] Add support for dhcp_relay config cli (#2640)
274184e1 [vlan] Refresh dhcpv6_relay config while adding/deleting a vlan (#2660) (#2668)
```

#### Why I did it
Update the sonic-platform-daemons and sonic-utilities submodules.

#### How I did it

Updated the submodules.
2023-03-20 13:43:14 -07:00
Neetha John
6c7e24381e [storage_backend] Add backend acl service (#14229)
Why I did it
This PR addresses the issue mentioned above by loading the acl config as a service on a storage backend device

How I did it
The new ACL service is a oneshot service that starts after swss and retries a few times to ensure that the SWITCH_CAPABILITY info is present before attempting to load the ACL rules. The service is also bound to sonic targets, which ensures that it gets restarted during minigraph reload and config reload.
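The "retry until SWITCH_CAPABILITY is present" step can be sketched as a bounded polling loop. This is a minimal Python illustration, not the service's actual script: the capability check is an injected callable (the real service reads the information from the switch databases), and the retry count and delay are assumed values.

```python
import time
from typing import Callable


def wait_for_capability(
    has_capability: Callable[[], bool],
    retries: int = 5,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `has_capability` up to `retries` times, sleeping `delay_s` between tries.

    Returns True as soon as the check passes, False if all retries are exhausted.
    `retries` and `delay_s` are illustrative defaults, not the service's values.
    """
    for attempt in range(retries):
        if has_capability():
            return True
        if attempt < retries - 1:  # no pointless sleep after the last attempt
            sleep(delay_s)
    return False
```

Only when this returns True would the ACL rules be loaded; injecting `sleep` keeps the loop testable without real delays.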

How to verify it
Built an image with these changes and ran the following tests:

Verified that acl is loaded successfully on a storage backend device after a switch boot up
Verified that acl is loaded successfully on a storage backend ToR after minigraph load and config reload
Verified that acl is not loaded if the device is not a storage backend ToR or the device does not have a DATAACL table

Signed-off-by: Neetha John <nejo@microsoft.com>
2023-03-20 20:25:21 +00:00
Neetha John
94f9942ef6 Update dynamic threshold for TD2 (#14224)
Why I did it
Update dynamic threshold to -1 to get optimal performance for RDMA traffic

How I did it
Modified pg_profile_lookup.ini to reflect the correct value

Signed-off-by: Neetha John <nejo@microsoft.com>
2023-03-20 20:25:17 +00:00
kellyyeh
e528408d14 Update dhcpmon rx/tx packet filtering and fix server rx count (#13898)
Why I did it
Dhcpmon had an incorrect RX count for server-side packets. It did not raise any false alarms, but it could miss a server-side packet count mismatch between the snapshot and the current counter.

Also add a debug mode that prints the counters to syslog.

How I did it
Due to the dual-ToR inbound filter requirement, there are currently two filters, one listening for RX packets and one for TX packets.
Originally, we opened an RX/TX socket for each specified interface, which caused duplicate sockets. Now the sockets are initialized only once. Neither socket is bound to an interface; instead, we use a VLAN-to-interface mapping to filter packets. For inbound uplinks, we use a PortChannel-to-interface mapping.
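The mapping-based filtering described above can be modeled in a few lines: packets from one shared, unbound socket are attributed to a VLAN or PortChannel by looking up the ingress interface. This Python sketch is illustrative only and is not dhcpmon's actual code; the function names and the `(ifname, msg_type)` packet tuples are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def context_for(ifname: str,
                vlan_members: Dict[str, str],
                portchannel_members: Dict[str, str]) -> Optional[str]:
    """Map the interface a packet arrived on to the context it is counted under.

    vlan_members: member port -> VLAN name (downstream, client side)
    portchannel_members: member port -> PortChannel name (upstream, server side)
    """
    if ifname in vlan_members:
        return vlan_members[ifname]
    if ifname in portchannel_members:
        return portchannel_members[ifname]
    return None  # not a monitored interface: ignore the packet


def count_rx(packets: Iterable[Tuple[str, str]],
             vlan_members: Dict[str, str],
             portchannel_members: Dict[str, str]) -> Counter:
    """Count (context, DHCP message type) pairs for packets seen on one shared socket."""
    counts: Counter = Counter()
    for ifname, msg_type in packets:
        ctx = context_for(ifname, vlan_members, portchannel_members)
        if ctx is not None:
            counts[(ctx, msg_type)] += 1
    return counts
```

Because there is a single socket per direction, each packet is counted exactly once, which is what removes the duplicate per-interface counting.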

Dhcpmon counters before this dual-ToR change:
```
[ Agg-Vlan1000- Current rx/tx] Discover: 1/ 4, Offer: 1/ 1, Request: 3/ 12, ACK: 1/ 1
[ eth0- Current rx/tx] Discover: 0/ 0, Offer: 0/ 0, Request: 0/ 0, ACK: 0/ 0
[ eth0- Current rx/tx] Discover: 0/ 0, Offer: 0/ 0, Request: 0/ 0, ACK: 0/ 0
[ PortChannel104- Current rx/tx] Discover: 0/ 1, Offer: 0/ 0, Request: 0/ 3, ACK: 0/ 0
[ PortChannel103- Current rx/tx] Discover: 0/ 1, Offer: 0/ 0, Request: 0/ 3, ACK: 0/ 0
[ PortChannel102- Current rx/tx] Discover: 0/ 2, Offer: 1/ 0, Request: 0/ 6, ACK: 1/ 0
[ PortChannel101- Current rx/tx] Discover: 0/ 0, Offer: 0/ 0, Request: 0/ 0, ACK: 0/ 0
[ Vlan1000- Current rx/tx] Discover: 1/ 0, Offer: 0/ 1, Request: 3/ 0, ACK: 0/ 1
[ Agg-Vlan1000- Current rx/tx] Discover: 1/ 4, Offer: 1/ 1, Request: 3/ 12, ACK: 1/ 1
```

Dhcpmon counters after this PR:
```
[ PortChannel104- Current rx/tx] Discover: 0/ 1, Offer: 0/ 0, Request: 0/ 3, ACK: 0/ 0
[ PortChannel103- Current rx/tx] Discover: 0/ 1, Offer: 0/ 0, Request: 0/ 3, ACK: 0/ 0
[ PortChannel102- Current rx/tx] Discover: 0/ 2, Offer: 1/ 0, Request: 0/ 6, ACK: 1/ 0
[ PortChannel101- Current rx/tx] Discover: 0/ 0, Offer: 0/ 0, Request: 0/ 0, ACK: 0/ 0
[ Vlan1000- Current rx/tx] Discover: 1/ 0, Offer: 0/ 1, Request: 3/ 0, ACK: 0/ 1
[ Agg-Vlan1000- Current rx/tx] Discover: 1/ 4, Offer: 1/ 1, Request: 3/ 12, ACK: 1/ 1
```

How to verify it
Ran the DHCP relay test, sending all four packet types singly and in batches on both single-ToR and dual-ToR setups. The counters were as expected.
2023-03-20 20:25:13 +00:00
Zain Budhwani
a78d4c9750
[202012] Update sonic-telemetry submodule (#14174)
Includes the following commits:
```
b93c4ac Zain Budhwani Wed Mar 1 15:45:43 2023 -0800 Fix crash when retrieving cpu utilization (sonic-net/sonic-gnmi#70) (sonic-net/sonic-gnmi#71)
af1ec19 Zain Budhwani Wed Mar 1 15:13:53 2023 -0800 Add diff cov (sonic-net/sonic-gnmi#85)
3f41377 Zain Budhwani Tue Feb 28 16:48:22 2023 -0800 Add logs for md5 checksum (sonic-net/sonic-gnmi#80)
67b7fb2 Zain Budhwani Mon Feb 27 23:44:49 2023 -0800 Add get-update to azp yml (sonic-net/sonic-gnmi#79)
5d6c47f Zain Budhwani Fri Feb 24 13:11:53 2023 -0800 Add net core and code coverage results (sonic-net/sonic-gnmi#77)
984bc6d Zain Budhwani Wed Feb 22 16:03:01 2023 -0800 [202012] Enable unit test (sonic-net/sonic-gnmi#76)
e8e4335 Zain Budhwani Fri Feb 10 16:27:58 2023 -0800 Change dir name in pipeline (sonic-net/sonic-gnmi#75)
a1cc7ab Zain Budhwani Tue Jan 31 14:11:27 2023 -0800 Add 202012 branch to pr checker (sonic-net/sonic-gnmi#72)
eaea6c5 ganglyu Mon Nov 14 10:18:07 2022 +0800 Fix format
```
2023-03-20 11:58:17 -07:00
mssonicbld
fd33a01796 [ci/build]: Upgrade SONiC package versions 2023-03-19 20:51:09 +08:00
mssonicbld
36cc9ae5d6
[ci/build]: Upgrade SONiC package versions (#14310) 2023-03-18 19:01:08 +08:00
mssonicbld
b791970c1c
[ci/build]: Upgrade SONiC package versions (#14306) 2023-03-18 09:39:48 +08:00
Yakiv Huryk
ab5115846d
[202012][Mellanox] update sdk/fw build procedure (#14025) (#14220)
- Why I did it
To optimize the Mellanox platform build.

- How I did it
SDK debs are now downloaded from the Spectrum-SDK-Drivers-SONiC-Bins release.
The sx kernel is downloaded as a zip from Spectrum-SDK-Drivers.
2023-03-16 12:42:19 +02:00
Prince Sunny
e2e3625500
[202012][Submodule] update for sonic-restapi (#14241)
Update sonic-restapi with the following commits:

44121be - 2023-03-14: Support ipv6 prefix length greater than 64 and check for adv_prefix
47e4b53 - 2023-03-15: Set allowed IPv6 pfx len to be 60
2023-03-15 17:10:28 -07:00
Sudharsan Dhamal Gopalarathnam
79548e472d
[Mellanox]Fix lpmode set when logical port is larger than 64 (#14138) (#14202)
Manual cherry-pick of https://github.com/sonic-net/sonic-buildimage/pull/14138
- Why I did it
In the sfplpm API, the number of logical ports is hardcoded as 64. When a system contains more ports than this, the SDK APIs fail with syslog errors such as:

```
Mar 7 03:53:58.105980 r-leopard-58 ERR syncd#SDK: [MGMT_LIB.ERR] Slot [0] Module [0] has logport [0x00010069] in enabled state
Mar 7 03:53:58.105980 r-leopard-58 ERR syncd#SDK: [SDK_MGMT_LIB.ERR] Failed in __sdk_mgmt_phy_module_pwr_attr_set, error: Internal Error
Mar 7 03:53:58.106118 r-leopard-58 ERR pmon#-c: Error occurred when setting power mode for SFP module 0, slot 0, error code 1
```

- How I did it 
Removed the hardcoded value of 64 and obtained the number of logical ports from the SDK instead.

- How to verify it 
Manual testing
2023-03-14 10:19:02 -07:00
xumia
18d049082e
[ci/build]: Upgrade SONiC package versions (#14205)
Why I did it
[ci/build]: Upgrade SONiC package versions

How I did it
How to verify it
2023-03-14 08:00:29 +08:00
Samuel Angebault
9de3b4936b
Add comment with affected products (#13803)
#### Why I did it

Some products might experience an occasional IO failure in the communication between CPU and SSD.
Based on some research, it could be attributable to a device not handling ATA NCQ (Native Command Queuing).

This issue currently affects 4 products:
 - `DCS-7170-32C*`
 - `DCS-7170-64C`
 - `DCS-7060DX4-32`
 - `DCS-7260CX3-64`
 
#### How I did it

This change disables NCQ on the affected drives for a small set of products.

#### How to verify it

When the fix is applied, these 2 patterns can be found in dmesg:
`ata1.00: FORCE: horkage modified (noncq)`
`NCQ (not used)`
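The verification step can be scripted by searching the kernel log for both patterns. A minimal Python sketch; `ncq_disabled` is a hypothetical helper, and the `ata1.00:` device prefix is deliberately left out of the first pattern on the assumption that the drive index may differ between platforms.

```python
# Patterns taken from the dmesg lines above; the "ata1.00:" prefix is omitted
# on the assumption that the device index can vary.
REQUIRED_PATTERNS = (
    "FORCE: horkage modified (noncq)",
    "NCQ (not used)",
)


def ncq_disabled(dmesg_text: str) -> bool:
    """Return True if every pattern indicating that NCQ was disabled is present."""
    return all(pattern in dmesg_text for pattern in REQUIRED_PATTERNS)
```

On a device, the input could come from `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`, which may require root.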

Test results using: `fio --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4`

with NCQ (`ata1.00: 61865984 sectors, multi 1: LBA48 NCQ (depth 32), AA`)
```
   READ: bw=33.9MiB/s (35.6MB/s), 33.9MiB/s-33.9MiB/s (35.6MB/s-35.6MB/s), io=4073MiB (4270MB), run=120078-120078msec
  WRITE: bw=34.1MiB/s (35.8MB/s), 34.1MiB/s-34.1MiB/s (35.8MB/s-35.8MB/s), io=4100MiB (4300MB), run=120078-120078msec
```
without NCQ (`ata1.00: 61865984 sectors, multi 1: LBA48 NCQ (not used)`)
```
   READ: bw=31.7MiB/s (33.3MB/s), 31.7MiB/s-31.7MiB/s (33.3MB/s-33.3MB/s), io=3808MiB (3993MB), run=120083-120083msec
  WRITE: bw=31.9MiB/s (33.4MB/s), 31.9MiB/s-31.9MiB/s (33.4MB/s-33.4MB/s), io=3830MiB (4016MB), run=120083-120083msec
```

#### Description for the changelog
Disable ATA NCQ for a few Arista products
2023-03-13 13:23:31 -07:00
Ashwin Srinivasan
9e7b038d9c
[202012] Added libpci and pciutils to the pmon docker (#12684) (#14056)
#### Why I did it

This is part of a corresponding change to the PCIe daemon (pcied) that enables it to verify PCI peripherals on a platform against a preconfigured YAML file. These libraries let pcied call the system commands needed for PCI peripheral verification.

#### How I did it
Added the aforementioned libraries to the Dockerfile.j2 file.

#### How to verify it
Run `which setpci` from the pmon docker; it should show the path of the binary.

#### Description for the changelog

Modified pmon's Dockerfile.j2 to include pciutils and libpci libraries.

**cherry-pick of SHA: 7de04504c9518d68aa00c304b7376fdff4e1d318**
2023-03-08 17:32:41 -08:00
Marty Y. Lok
f0c1ef0abc
[marvell-armhf][uboot] Fixed the uboot setting for sonic-installer set-default form 202012 to 202205 branch. (#13911)
#### Why I did it
When using `sonic-installer set-default` to switch the image from 202012 to 202205, the system gets stuck loading the kernel on reboot.

#### How I did it
The issue is caused by the kernel-size-related settings in the U-Boot environment, which are smaller in the 202012 branch than in the 202205 branch. `sonic-installer set-default` only changes the boot_next variable. To fix this issue, we sync the kernel-related settings in the 202012 branch with those in the 202205 branch. This PR applies only to the 202012 branch.

#### How to verify it
1) Install the latest 202205 image (.89 or later) and reboot.
2) Install the 202012 image that contains this fix and reboot.
3) Use `sonic-installer set-default` to select the 202205 image and reboot.
4) The system should start without any issue.
  
2023-03-08 15:27:49 -08:00
prabhataravind
6f949226d1
[202012][swss]: Submodule update (#14171)
* Include the following commits:
  - a21b160 [202012][orchagent]: Handle duplicate routes in a graceful manner (#2666)
  - 1540161 [bfdorch] add default TOS value for BFD packet (#2692)
  - 860430c [ci] run apt-get update before apt-get install (#2686)
2023-03-08 14:35:29 -08:00
Sudharsan Dhamal Gopalarathnam
545b526a49
[202012][mellanox]Fix lpmode set when logical port is larger than 64 (#14137)
This PR is to backport #14138 to 202012.

- Why I did it
In the sfplpm API, the number of logical ports is hardcoded as 64. When a system contains more ports than this, the SDK APIs fail with a trace such as:

```
Enabling low-power mode for port Ethernet0... Traceback (most recent call last):
  File "/usr/share/sonic/platform/plugins/sfplpmset.py", line 167, in
    set_lpmode(handle, cmd, sfp_module)
  File "/usr/share/sonic/platform/plugins/sfplpmset.py", line 128, in set_lpmode
    SX_MGMT_PHY_MOD_PWR_ATTR_PWR_MODE_E, SX_MGMT_PHY_MOD_PWR_MODE_LOW_E)
  File "/usr/share/sonic/platform/plugins/sfplpmset.py", line 115, in pwr_attr_set
    mgmt_phy_mod_pwr_attr_set(handle, module_id, attr_type, power_mode)
  File "/usr/share/sonic/platform/plugins/sfplpmset.py", line 84, in mgmt_phy_mod_pwr_attr_set
    assert SX_STATUS_SUCCESS == rc, "sx_mgmt_phy_mod_pwr_attr_set failed"
AssertionError: sx_mgmt_phy_mod_pwr_attr_set failed
Error! Unable to set LPM for 1, rc = 1, err msg: [+] opening sdk
Mar 07 03:25:28 INFO LOG: Initializing SX log with STDOUT as output file.
Mar 07 03:25:28 ERROR SX_API_PORT: sx_mgmt_phy_mod_pwr_attr_get: This API is deprecated and will be removed in the future. Please use sx_mgmt_phy_module_pwr_attr_get in its place.
Mar 07 03:25:28 ERROR SX_API_PORT: sx_mgmt_phy_mod_pwr_attr_set: This API is deprecated and will be removed in the future. Please use sx_mgmt_phy_module_pwr_attr_set in its place.
```

- How I did it
Removed the hardcoded value of 64 and obtained the number of logical ports from the SDK instead.

- How to verify it
Manual testing
2023-03-09 00:04:09 +02:00
SuvarnaMeenakshi
481f51f45c
[202012][sonic-snmpagent]: Advance submodule (#14111)
#### Why I did it
Update the sonic-snmpagent submodule to include the commit below:
fba50c6  [202012]: snmp vlan support per RFC1213 and added the missing support for RFC2863 (#279)
2023-03-07 11:16:19 -08:00
xumia
2ca6ec484e
[202012][Security][CVE-2022-2309] Upgrade lxml from 4.6.5 to 4.9.1 (#14066)
Why I did it
Fix CVE-2022-2309 by upgrading lxml from 4.6.3 to 4.9.1.
2023-03-07 09:43:46 +00:00
xumia
280939b5c9 [Build] Support to use loosen version when failed to install python packages (#14013)
Why I did it
[Build] Support using a loosened version when installing Python packages fails.
This fixes issue #14012.

How I did it
Retry the installation command without the version constraint.
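The fallback can be sketched as "install with the pinned constraints; if that fails, run the same install without them". This Python sketch is an assumption about the shape of the logic, not the build system's actual wrapper; `install_with_fallback` is hypothetical, and the command runner is injected so the logic can be exercised without pip. `pip install -c <file>` is pip's standard constraints-file option.

```python
from typing import Callable, List, Sequence


def install_with_fallback(packages: Sequence[str],
                          constraints_file: str,
                          run: Callable[[List[str]], int]) -> int:
    """Install `packages` with a pinned constraints file; on failure, retry without it.

    `run` executes a command and returns its exit code (for example a thin
    wrapper around subprocess.run). Returns the exit code of the last attempt.
    """
    pinned = ["pip", "install", "-c", constraints_file, *packages]
    rc = run(pinned)
    if rc == 0:
        return rc
    # Pinned install failed: fall back to the loosened (unconstrained) command.
    loose = ["pip", "install", *packages]
    return run(loose)
```

The trade-off is reproducibility: the fallback keeps the build going, but the installed version is no longer guaranteed to match the pinned one.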

How to verify it
2023-03-07 04:57:35 +00:00
mssonicbld
06be00525a
[ci/build]: Upgrade SONiC package versions (#14080) 2023-03-05 04:31:07 +08:00
Ikki Zhu
be46225033 [Seastone] fix dx010 qsfp eeprom data write issue (#13930)
Why I did it
The platform test cases test_tx_disable, test_tx_disable_channel, and test_power_override failed on dx010.

How I did it
Added an I2C access algorithm for the CPLD I2C adapters.

How to verify it
Verify it with platform_tests/api/test_sfp.py::TestSfpApi test cases.
2023-03-02 20:06:09 +00:00
Zain Budhwani
3776ddb7c8 Remove dialout as critical process (#14006)
#### Why I did it

Remove dialout as a critical process because it is no longer used in production. As future work, dialout can be removed completely.

#### How I did it

Removed it from the critical process list.
2023-03-02 20:06:09 +00:00
jhli-cisco
b7ef7fce16
Update cisco-8000.ini (#14009)
#### Why I did it
Update cisco platform module to 202012-v0.2.6

#### How I did it
Update cisco-8000.ini
2023-03-01 16:05:11 -08:00
Saikrishna Arcot
26b0e7f709 Use tmpfs for /var/log on Arista 7050CX3-32S (#13805)
This is to reduce writes to the SSD on the device.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2023-02-28 18:23:40 +00:00
Ikki Zhu
f47024cdfd add psu fans status led available config (#13926)
Why I did it
Seastone does not have PSU fan status LEDs; this needs to be reflected in platform.json.

How I did it
Set the PSU fan status LED availability to false.

How to verify it
Verify it with platform_tests/api/test_psu_fans.py::TestPsuFans::test_set_fans_led case.
2023-02-28 08:18:28 +00:00
mssonicbld
cc17c7ac11
[ci/build]: Upgrade SONiC package versions (#13992) 2023-02-26 22:57:45 +08:00
Sudharsan Dhamal Gopalarathnam
ca17198f04
[202012][Mellanox] Change MFT version to 4.21.0-100 (#13956)
- Why I did it
Update the MFT version to 4.21.0-100 to include a fix for an issue reported when using mlxlink on QSFP-DD.

- How I did it
Update mft.mk

- How to verify it
Run regression on Mellanox platforms
2023-02-26 09:42:52 +02:00
mssonicbld
7455c56024
[ci/build]: Upgrade SONiC package versions (#13985) 2023-02-25 14:57:37 +08:00
xumia
8636494c6d
[Build] Pin the toposort version to 1.7 in python2 (#13979)
Why I did it
Fix the docker-base-stretch build issue on the nephos platform.

```
Collecting supervisord-dependent-startup==1.4.0
  Downloading d386c3d2cf/supervisord_dependent_startup-1.4.0-py2.py3-none-any.whl
Collecting toposort>=1.5 (from supervisord-dependent-startup==1.4.0)
  Downloading 44e51b4216/toposort-1.9.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    IOError: [Errno 2] No such file or directory: '/tmp/pip-build-LnROQE/toposort/setup.py'

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-LnROQE/toposort/
```

The other platforms have been upgraded to docker-base-buster and are not impacted.

How I did it
Pinned the toposort version to 1.7, since the supervisord-dependent-startup package depends on it.
toposort>=1.8 supports only Python 3 and is not applicable to Python 2.
2023-02-25 07:33:23 +08:00
xumia
09ce5ec7b9
[Build] Clean up the debian preference config file (#13887) (#13976)
Why I did it
Support upgrading packages, and clean up better after the build.

How I did it
Remove the unused version-control preference file after the build.

How to verify it
2023-02-24 13:08:29 -08:00
Junchao-Mellanox
0f47c5be59
[202012] [Mellanox] Fix issue: cannot lable port for logical port is logical port number larger than 64 (#13709)
- Why I did it
sfp_event.py receives a PMPE message when a cable event occurs. The PMPE message does not contain the label port. The current sfp_event.py uses sx_api_port_device_get to get the attributes of 64 logical ports and finds the label port among those 64 attributes. However, if there are more than 64 ports, sfp_event.py might not find the label port and drops the PMPE message.

- How I did it
Instead of the hardcoded 64, get the actual number of logical ports.
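Why the hardcoded limit loses events can be shown with a small lookup: if only the first 64 port attributes are scanned, a logical port beyond that range is never matched. This Python sketch is illustrative; `find_label_port` and the dictionary keys `log_port` and `label_port` are hypothetical stand-ins for the SDK's port attribute fields, not the SDK API.

```python
from typing import Dict, List, Optional


def find_label_port(port_attrs: List[Dict[str, int]],
                    logical_port: int,
                    max_ports: Optional[int] = None) -> Optional[int]:
    """Return the label port for `logical_port`, or None if it is not found.

    By default every entry is scanned (the actual logical port count);
    passing max_ports=64 reproduces the old hardcoded behavior.
    """
    limit = len(port_attrs) if max_ports is None else max_ports
    for attr in port_attrs[:limit]:
        if attr["log_port"] == logical_port:
            return attr["label_port"]
    return None  # caller would drop the event, as the old code did
```

For a system with 80 ports, a lookup for the 71st port succeeds with the default and returns None with `max_ports=64`.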

- How to verify it
Manual test
2023-02-23 08:27:21 +02:00
Stepan Blyshchak
73c7ced753
[202012][Mellanox] Place FW binaries under platform directory instead of squashfs (#13890)
Upgrading from an old image always requires mounting the squashfs to get the next image's FW binary. This can be avoided by placing the FW binary under the platform directory, which is easily accessible after installation:

```
admin@r-spider-05:~$ ls /host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa
/host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa
admin@r-spider-05:~$ ls -al /tmp/image-fw-new-loc.0-dirty-20230208.193534-fs/etc/mlnx/fw-SPC.mfa
lrwxrwxrwx 1 root root 66 Feb  8 17:57 /tmp/image-fw-new-loc.0-dirty-20230208.193534-fs/etc/mlnx/fw-SPC.mfa -> /host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa
```

- Why I did it
202211 and above use a different squashfs compression type that the 201911 kernel cannot handle. Therefore, this change avoids mounting the squashfs altogether.

- How I did it
Placed the FW binary under /host/image-/platform/mlnx/; soft links in /etc/mlnx are created to avoid breaking existing scripts/automation.
/etc/mlnx/fw-SPCX.mfa is a soft link that always points to the FW to be used in the current image.
mlnx-fw-upgrade.sh is updated to prefer the /host/image-/platform/mlnx location and fall back to /etc/mlnx in the squashfs if the new location does not exist. This is necessary for image downgrade.
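The "prefer the new location, fall back to /etc/mlnx" rule can be modeled as an ordered lookup. The real logic lives in the shell script mlnx-fw-upgrade.sh; this Python sketch only illustrates the preference order. `select_fw_binary` is hypothetical, and the preferred directory is passed in by the caller because the image directory name depends on the installed image.

```python
import os
from typing import Callable, Optional


def select_fw_binary(fw_name: str,
                     preferred_dir: str,
                     legacy_dir: str = "/etc/mlnx",
                     exists: Callable[[str], bool] = os.path.exists) -> Optional[str]:
    """Return the first existing FW path, trying `preferred_dir` before `legacy_dir`.

    The legacy fallback covers older images that only ship the FW inside the squashfs.
    """
    for directory in (preferred_dir, legacy_dir):
        candidate = os.path.join(directory, fw_name)
        if exists(candidate):
            return candidate
    return None
```

Injecting `exists` lets the ordering be checked without touching the filesystem.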

- How to verify it
Upgrade from 201911 to 202012
202012 to 201911 downgrade
202012 -> 202012 reboot
ONIE -> 202012 boot (First FW burn)

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2023-02-22 17:38:54 +02:00
Liu Shilong
0cc6f1ca82 [ci] Fix docker hang issue and change template reference branch (#13894)
Why I did it
Azure pipeline change.
Use a common template to make common steps easier to change.
Fix the docker hang issue.

How I did it
2023-02-22 18:36:50 +08:00
mssonicbld
6230ced2b1
[ci/build]: Upgrade SONiC package versions (#13897) 2023-02-21 22:49:29 +08:00
Liu Shilong
d046712b25
[ci] Kill hanged docker build process to avoid build timeout issue. (#13726) (#13731)
Why I did it
Docker builds occasionally hang.
They hang at different steps, so it looks like a bug in the Docker daemon.

How I did it
Start a daemon process that scans for build processes running for more than 1 hour and kills them.
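A minimal Python sketch of the watchdog's core decision, assuming process info arrives as `(pid, elapsed_seconds, command_line)` tuples, for example parsed from `ps -eo pid,etimes,args`. The function names, the `"docker build"` match string, and the choice of SIGKILL are assumptions; the actual CI script may select and signal processes differently.

```python
import os
import signal
from typing import Callable, Iterable, List, Tuple

ONE_HOUR_S = 60 * 60


def find_hung_builds(procs: Iterable[Tuple[int, int, str]],
                     max_runtime_s: int = ONE_HOUR_S,
                     match: str = "docker build") -> List[int]:
    """Return PIDs whose command contains `match` and whose elapsed time exceeds the limit."""
    return [pid for pid, elapsed_s, cmd in procs
            if match in cmd and elapsed_s > max_runtime_s]


def kill_hung_builds(procs: Iterable[Tuple[int, int, str]],
                     kill: Callable[[int, int], None] = os.kill) -> List[int]:
    """Send SIGKILL to every hung build process and return the PIDs signaled."""
    pids = find_hung_builds(procs)
    for pid in pids:
        kill(pid, signal.SIGKILL)
    return pids
```

Running this periodically from a background loop gives the "scan and kill" behavior; `kill` is injectable so the selection logic can be tested safely.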

How to verify it
2023-02-20 18:16:25 +08:00
Junchao-Mellanox
7543993af3
[202012] [Mellanox] Fix issue: SFP eeprom corrupted after replacing cable with different sfp type (#13543)
- Why I did it
There are 3 tasks in xcvrd:

- Main task: runs a loop that recovers missing SFP static information to the DB every 1 minute.
- SFP state task: a process that listens for cable plug in/out events and inserts SFP static information into the DB when a cable is inserted.
- SFP DOM update task: a thread that updates cable DOM information every 1 minute.

Assume a user replaces a QSFP with a QSFP-DD. There are two issues:

- Only the SFP state task listens for cable plug in/out events. The main task and the SFP DOM update task do not know that the SFP type has changed and still “think” it is QSFP, so they use the QSFP standard to parse the QSFP-DD EEPROM, which produces corrupted data.
- There is a race condition between the main task and the SFP state task. Both insert SFP static information into the DB, so depending on timing, the main task may use the wrong SFP type and overwrite the SFP static information.

This PR fixes these two issues.

This issue does not exist on 202205 and above because xcvrd was refactored:

- The SFP state task was changed from a process to a thread, so all 3 tasks share the same memory space and always have the correct SFP type.
- The logic that recovers missing SFP information was moved from the main task to the SFP state task, so there is no race condition anymore.

- How I did it
It is difficult to backport the latest xcvrd because there have been many refactors and new features in xcvrd since the 202012 release; doing so would be a huge effort. Therefore, we decided to fix the issue on the Nvidia platform API side: the SFP type is refreshed before any SFP API that accesses the SFP EEPROM. This causes a small performance degradation. In my test on the 202012 branch, accessing transceiver INFO and DOM INFO for 32 ports took 1.7 seconds before the change and 2.4 seconds after it. I consider this degradation acceptable.
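The "refresh the SFP type before any EEPROM access" approach maps naturally onto a decorator. This Python sketch uses a hypothetical, simplified `Sfp` class and `refresh_sfp_type` decorator; it is not the Nvidia platform API's actual code. The parser names reflect the fact that QSFP-DD modules use the CMIS memory map while QSFP28/QSFP+ use SFF-8636.

```python
import functools
from typing import Callable


def refresh_sfp_type(method: Callable) -> Callable:
    """Decorator: re-read the SFP type before running an EEPROM-accessing API."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        self.sfp_type = self._read_sfp_type()  # extra hardware read on every call
        return method(self, *args, **kwargs)
    return wrapper


class Sfp:
    """Hypothetical, simplified SFP object; not the Nvidia platform API class."""

    def __init__(self, read_type: Callable[[], str]):
        self._read_type = read_type
        self.sfp_type = read_type()  # cached type, may go stale after a cable swap

    def _read_sfp_type(self) -> str:
        return self._read_type()

    @refresh_sfp_type
    def get_transceiver_info(self) -> dict:
        # Choose the EEPROM layout based on the freshly read type.
        parser = "CMIS" if self.sfp_type == "QSFP-DD" else "SFF-8636"
        return {"type": self.sfp_type, "parser": parser}
```

The extra read per call is the source of the overhead measured above; in exchange, a cable swap can no longer leave a stale type behind for the other tasks.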

- How to verify it
Manual test
Regression
2023-02-19 09:47:32 +02:00
Lawrence Lee
5b889543ee
[202012][swss]: Submodule update (#13839)
Includes the following commit:

- 0d95f076 [202012]: Reduce log level when FDB cache lookup fails (#2667)
2023-02-17 10:46:33 -08:00
Hua Liu
2b39cd61fb
[202012] [sonic-swss-common] Update sonic-swss-common submodule (#13813)
#### Why I did it
Submodule update for sonic-swss-common with the following change:
```
3e34309 2023-02-11 | RedisPipeline ignore flush when call dtor from another thread. (#736) [Hua Liu]
```
2023-02-17 10:41:25 -08:00
Samuel Angebault
e01e1860d4 [Arista] Disable ATA NCQ for a few products (#13739)
Why I did it
Some products might experience an occasional IO failure in the communication between CPU and SSD.
Based on some research, it could be attributable to a device not handling ATA NCQ (Native Command Queuing).

This issue currently affects 4 products:

DCS-7170-32C*
DCS-7170-64C
DCS-7060DX4-32
DCS-7260CX3-64

How I did it
This change disables NCQ on the affected drives for a small set of products.

How to verify it
When the fix is applied, these 2 patterns can be found in dmesg:
ata1.00: FORCE: horkage modified (noncq)
NCQ (not used)

Test results using: fio --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4

with NCQ (ata1.00: 61865984 sectors, multi 1: LBA48 NCQ (depth 32), AA)

```
   READ: bw=33.9MiB/s (35.6MB/s), 33.9MiB/s-33.9MiB/s (35.6MB/s-35.6MB/s), io=4073MiB (4270MB), run=120078-120078msec
  WRITE: bw=34.1MiB/s (35.8MB/s), 34.1MiB/s-34.1MiB/s (35.8MB/s-35.8MB/s), io=4100MiB (4300MB), run=120078-120078msec
```
without NCQ (ata1.00: 61865984 sectors, multi 1: LBA48 NCQ (not used))

```
   READ: bw=31.7MiB/s (33.3MB/s), 31.7MiB/s-31.7MiB/s (33.3MB/s-33.3MB/s), io=3808MiB (3993MB), run=120083-120083msec
  WRITE: bw=31.9MiB/s (33.4MB/s), 31.9MiB/s-31.9MiB/s (33.4MB/s-33.4MB/s), io=3830MiB (4016MB), run=120083-120083msec
```
2023-02-16 17:54:16 +00:00
Ikki Zhu
2135c6eb2f [DX010 platform] fix dx010 platform testcase issues (#13595)
Why I did it
1. Fix the chassis test_set_fans_led case
2. Fix the chassis get_name case-mismatch issue
3. Fix the fan_drawer test_set_fans_speed case
4. Fix the component test_components test case

How I did it
Added the corresponding configuration to the chassis JSON file.

How to verify it
Ran the platform test cases to verify that these failures are fixed.
2023-02-16 17:52:12 +00:00
Qi Luo
9731aa36c2
[sonic-snmpagent] Update submodule (#13832)
#### Why I did it
Include below commits:
```
7147354 2023-02-14 | Fix: zero route may have empty nexthop (#276) [Qi Luo]
e60a64c 2022-11-30 | Use github code scanning instead of LGTM (#274) [Liu Shilong]
```
2023-02-16 09:04:36 -08:00
jhli-cisco
592ce16d05
Update cisco-8000.ini (#13793)
#### Why I did it
1.57.x SDK-based incremental drop that addresses:
- Fix for MIGSMSFT-158
- Support for VxLAN and BFD serviceability CLI
- sfputil reset platform fix to handle 100G optics
- Added thermal management for ZR optics sensors

#### How I did it
Update cisco-8000 submodule to v0.2.5
2023-02-14 11:05:09 -08:00
jcaiMR
936679ee47 Set 'origin' and 'AS Path' for T1 SLB routes (#13613)
* set origin and as-path prepend for routes from SLB
2023-02-10 18:38:16 +00:00
Lawrence Lee
4e70a2bfbc
[202012][swss]: Submodule update for SWSS (#13722)
Includes the following commits:
- c98b9f09 [202012][orchagent]: Get bridge port ID from FDBOrch cache instead of SAI API #2657 (#2658)
- 59886b8f [MuxOrch] Enabling neighbor when adding in active state (#2601)
2023-02-09 11:01:58 -08:00
jingwenxie
95893698e2
[202012][sonic-utilites] advance submodule (#13734)
```
39cdb49c7 [202012][show] Add bgpraw to show run all (#2639)
b3ebba2ca [202012][show] add new CLI to show tunnel route objects #2255 (#2659)
d08f59b9f Fixed a bug in "show vnet routes all" causing screen overrun. (#2644) (#2654)
a996abdb5 [202012][show] show logging CLI support for logs stored in tmpfs (#2652)
c60f771c0 [202012][show_bfd] add local discriminator in show bfd command (#2616)
```
2023-02-09 10:04:58 -08:00