#### Why I did it
Fix for link down issue seen with AOI 100G-PSM4 optics on 8102-64H-O [JIRA ID# MIGSMSFT-23]
#### How I did it
update platform module to 0.2.7
Why I did it
mmh3's new version 3.1.0 breaks pipeline build.
bullseye/buster/jessie pined the version to 2.5.1
How I did it
Pin mmh3's version as other dists.
How to verify it
Improve sudo cat command for RO user.
#### Why I did it
RO user can use sudo command show none syslog files.
#### How I did it
Improve sudo cat command for RO user.
#### How to verify it
Pass all UT.
Manually check fixed code work correctly.
#### Description for the changelog
Improve sudo cat command for RO user.
#### Why I did it
Bug in script that was passing in null as log level value if missing from config_db
#### How I did it
Added more robust conditional statement
#### How to verify it
1) Remove log_level from config db
2) config reload -y
3) telemetry should not crash
#### Why I did it
Update sonic-snmpagent submodule to include below commit:
Revert "[202012]: snmp vlan support per RFC1213 and added the missing support for RFC2863 (#279)" (#280)
Why I did it
sonic-slave-stretch build failed for mmh3 version update to 3.10 on Mar 24.
How I did it
Enable reproducible build for vhdx image.
How to verify it
**What I did**
Check /etc/pam.d/sshd integrity after modify it in hostcfgd.
**Why I did it**
Found some incident that /etc/pam.d/sshd become empty file during OR upgrade.
**How I verified it**
Pass all UT.
Add new UT to cover new code.
**Details if related**
This is a manually cherry-pick PR for https://github.com/sonic-net/sonic-host-services/pull/36
Backport #14372 to 202012
Why I did it
For better accounting purposes, updating the ingress lossy traffic profile to use static threshold. This change is only intended for Th devices using RDMA-CENTRIC profiles
How I did it
Update the buffer templates for Th devices in RDMA-CENTRIC folder to use the correct threshold
Signed-off-by: Neetha John <nejo@microsoft.com>
During the build process, a dsc file is retrieved from the URL:
http://deb.debian.org/debian/pool/main/i/isc-dhcp/isc-dhcp_4.4.1-2.3.dsc
Depending on the DNS resolution, the server reached may respond with a
HTTP 404 error code, what stops the build process.
In all cases, the URL http://deb.debian.org/debian/pool/main/i/isc-dhcp/
no more lists this DSC file but one with a different format.
The suffix "+deb11u1" is now appended to identify the debian version.
- append this suffix to the make file rules of isc-dhcp
Signed-off-by: Guillaume Lambert <guillaume.lambert@orange.com>
Co-authored-by: Guilt <guillaume.lambert@orange.com>
For sonic-platform-daemons following commits are added to the submodule
dd8fbae (HEAD -> 202012, origin/202012) [ycabled] add more coverage to ycabled; add minor name change for vendor API CLI return key-values pairs (#338)
846555e [thermalctld] fix some redundant removal of state DB tables (#315)
3d92fb9 Use github code scanning instead of LGTM (#316)
For sonic-utilities the following commits are added in this PR to the submodule
git log --oneline 39cdb49c..202012
ec4c6ea5 (HEAD -> 202012, origin/202012) [show][muxcable] add some new commands health, reset-cause, queue_info support for muxcable (#2414) (#2704)
03ef272e [202012][vlan] Remove add field of vlanid to DHCP_RELAY table while adding vlan (#2681)
e00a81ac [202012][dhcp-relay] Add support for dhcp_relay config cli (#2640)
274184e1 [vlan] Refresh dhcpv6_relay config while adding/deleting a vlan (#2660) (#2668
#### Why I did it
updating the submodule of sonic-platform-daemons, sonic-utilities
#### How I did it
updated the submodule
Why I did it
This PR addresses the issue mentioned above by loading the acl config as a service on a storage backend device
How I did it
The new acl service is a oneshot service which will start after swss and does some retries to ensure that the SWITCH_CAPABILITY info is present before attempting to load the acl rules. The service is also bound to sonic targets which ensures that it gets restarted during minigraph reload and config reload
How to verify it
Build an image with the following changes and did the following tests
Verified that acl is loaded successfully on a storage backend device after a switch boot up
Verified that acl is loaded successfully on a storage backend ToR after minigraph load and config reload
Verified that acl is not loaded if the device is not a storage backend ToR or the device does not have a DATAACL table
Signed-off-by: Neetha John <nejo@microsoft.com>
Why I did it
Update dynamic threshold to -1 to get optimal performance for RDMA traffic
How I did it
Modified pg_profile_lookup.ini to reflect the correct value
Signed-off-by: Neetha John <nejo@microsoft.com>
Why I did it
Dhcpmon had incorrect RX count for server side packets. It does not raise any false alarms, but could miss catching server side packet count mismatch between snapshot and current counter.
Add debug mode which prints counter to syslog
How I did it
Due to dualtor inbound filter requirement, there are currently two filters, each for listening to rx / tx packets.
Originally, we opened up an rx/tx socket for each interface specified, which causes duplicate socket. Now we initialize the sockets only once. Both sockets are not binded to an interface, and we use vlan to interface mapping to filter packets. For inbound uplinks, we use a portchannel to interface mapping.
Previous dhcpmon counter before dual tor change:
[ Agg-Vlan1000- Current rx/tx] Discover: 1/ 4, Offer: 1/ 1, Request: 3/ 12, ACK: 1/ 1
[ eth0- Current rx/tx] Discover: 0/ 0, Offer: 0/ 0, Request: 0/ 0, ACK: 0/ 0
[ eth0- Current rx/tx] Discover: 0/ 0, Offer: 0/ 0, Request: 0/ 0, ACK: 0/ 0
[ PortChannel104- Current rx/tx] Discover: 0/ 1, Offer: 0/ 0, Request: 0/ 3, ACK: 0/ 0
[ PortChannel103- Current rx/tx] Discover: 0/ 1, Offer: 0/ 0, Request: 0/ 3, ACK: 0/ 0
[ PortChannel102- Current rx/tx] Discover: 0/ 2, Offer: 1/ 0, Request: 0/ 6, ACK: 1/ 0
[ PortChannel101- Current rx/tx] Discover: 0/ 0, Offer: 0/ 0, Request: 0/ 0, ACK: 0/ 0
[ Vlan1000- Current rx/tx] Discover: 1/ 0, Offer: 0/ 1, Request: 3/ 0, ACK: 0/ 1
[ Agg-Vlan1000- Current rx/tx] Discover: 1/ 4, Offer: 1/ 1, Request: 3/ 12, ACK: 1/ 1
Dhcpmon counter after this PR:
[ PortChannel104- Current rx/tx] Discover: 0/ 1, Offer: 0/ 0, Request: 0/ 3, ACK: 0/ 0
[ PortChannel103- Current rx/tx] Discover: 0/ 1, Offer: 0/ 0, Request: 0/ 3, ACK: 0/ 0
[ PortChannel102- Current rx/tx] Discover: 0/ 2, Offer: 1/ 0, Request: 0/ 6, ACK: 1/ 0
[ PortChannel101- Current rx/tx] Discover: 0/ 0, Offer: 0/ 0, Request: 0/ 0, ACK: 0/ 0
[ Vlan1000- Current rx/tx] Discover: 1/ 0, Offer: 0/ 1, Request: 3/ 0, ACK: 0/ 1
[ Agg-Vlan1000- Current rx/tx] Discover: 1/ 4, Offer: 1/ 1, Request: 3/ 12, ACK: 1/ 1
How to verify it
Ran dhcp relay test to send all four packets in singles and batches on both single ToR and dual ToR. Counter was as expected.
- Why I did it
To optimize Mellanox platform build
- How I did it
sdk debs are now downloaded as Spectrum-SDK-Drivers-SONiC-Bins release
sx kernel is downloaded as zip from Spectrum-SDK-Drivers
Update sonic-restapi for the following commit:
44121be - 2023-03-14: Support ipv6 prefix length greater than 64 and check for adv_prefix
47e4b53 - 2023-03-15: Set allowed IPv6 pfx len to be 60
Manual cherry-pick of https://github.com/sonic-net/sonic-buildimage/pull/14138
- Why I did it
In sfplpm API, the number of logical ports is hardcoded as 64. When a system contains more port than this, the SDK APIs would fail with a syslog as below
Mar 7 03:53:58.105980 r-leopard-58 ERR syncd#SDK: [MGMT_LIB.ERR] Slot [0] Module [0] has logport [0x00010069] in enabled state
Mar 7 03:53:58.105980 r-leopard-58 ERR syncd#SDK: [SDK_MGMT_LIB.ERR] Failed in __sdk_mgmt_phy_module_pwr_attr_set, error: Internal Error
Mar 7 03:53:58.106118 r-leopard-58 ERR pmon#-c: Error occurred when setting power mode for SFP module 0, slot 0, error code 1
- How I did it
Remove the hardcoded value of 64. Obtained the number of logical ports from SDK
- How to verify it
Manual testing
#### Why I did it
Some products might experience an occasional IO failure in the communication between CPU and SSD.
Based on some research it could be attributable to some device not handling ATA NCQ (Native Command Queue).
This issue currently affect 4 products:
- `DCS-7170-32C*`
- `DCS-7170-64C`
- `DCS-7060DX4-32`
- `DCS-7260CX3-64`
#### How I did it
This change disable NCQ on the affected drive for a small set of products.
#### How to verify it
When the fix is applied, these 2 patterns can be found in the dmesg.
`ata1.00: FORCE: horkage modified (noncq)`
`NCQ (not used)`
Test results using: `fio --direct=1 --rw=randrw --bs=64k --ioengine=libaio --iodepth=64 --runtime=120 --numjobs=4`
with NCQ (`ata1.00: 61865984 sectors, multi 1: LBA48 NCQ (depth 32), AA`)
```
READ: bw=33.9MiB/s (35.6MB/s), 33.9MiB/s-33.9MiB/s (35.6MB/s-35.6MB/s), io=4073MiB (4270MB), run=120078-120078msec
WRITE: bw=34.1MiB/s (35.8MB/s), 34.1MiB/s-34.1MiB/s (35.8MB/s-35.8MB/s), io=4100MiB (4300MB), run=120078-120078msec
```
without NCQ (`ata1.00: 61865984 sectors, multi 1: LBA48 NCQ (not used)`)
```
READ: bw=31.7MiB/s (33.3MB/s), 31.7MiB/s-31.7MiB/s (33.3MB/s-33.3MB/s), io=3808MiB (3993MB), run=120083-120083msec
WRITE: bw=31.9MiB/s (33.4MB/s), 31.9MiB/s-31.9MiB/s (33.4MB/s-33.4MB/s), io=3830MiB (4016MB), run=120083-120083msec
```
#### Description for the changelog
Disable ATA NCQ for a few Arista products
#### Why I did it
This is part of a corresponding change to the pcie daemon that enables it to verify PCI peripherals on a platform against a preconfigured YAML file, and enables the pcied daemon to call the system commands needed for PCI peripheral verification
#### How I did it
Adding aforementioned libraries to the Dockerfile.j2 file
#### How to verify it
run 'which setpci' from the pmon docker - would show the path of the binary
#### Description for the changelog
Modified pmon's Dockerfile.j2 to include pciutils and libpci libraries.
**cherry-pick of SHA: 7de04504c9518d68aa00c304b7376fdff4e1d318**
#### Why I did it
When using ```sonic-install set-default``` to switch the image from 202012 to 202205. The system will be stuck at loading kernel while reboot.
#### How I did it
The issue is caused by the kernal size related setting in uboot environment is smaller in the 202012 branch while they are larger in 202205 branch. The "sonic-installer set-default" just changes the boot_next variable. To fix this issue, we sync up the 202012 branch kernel related setting with the 202205 branch. This PR is only applicable to 202012 branch.
#### How to verify it
1) Install the latest 202205 image .89 or latest and reboot
2) Install the 202012 image which contains this fix and reboot
3) using "sonic-installer set-default 202205 image and reboot
4) system should start without any issue.
#### Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.
* Include the following commits:
- a21b160 [202012][orchagent]: Handle duplicate routes in a graceful manner (#2666)
- 1540161 [bfdorch] add default TOS value for BFD packet (#2692)
- 860430c [ci] run apt-get update before apt-get install (#2686)
This PR is to backport #14138 to 202012.
- Why I did it
In sfplpm API, the number of logical ports is hardcoded as 64. When a system contains more port than this, the SDK APIs would fail with a trace as below
Enabling low-power mode for port Ethernet0... Traceback (most recent call last):
File "/usr/share/sonic/platform/plugins/sfplpmset.py", line 167, in
set_lpmode(handle, cmd, sfp_module)
File "/usr/share/sonic/platform/plugins/sfplpmset.py", line 128, in set_lpmode
SX_MGMT_PHY_MOD_PWR_ATTR_PWR_MODE_E, SX_MGMT_PHY_MOD_PWR_MODE_LOW_E)
File "/usr/share/sonic/platform/plugins/sfplpmset.py", line 115, in pwr_attr_set
mgmt_phy_mod_pwr_attr_set(handle, module_id, attr_type, power_mode)
File "/usr/share/sonic/platform/plugins/sfplpmset.py", line 84, in mgmt_phy_mod_pwr_attr_set
assert SX_STATUS_SUCCESS == rc, "sx_mgmt_phy_mod_pwr_attr_set failed"
AssertionError: sx_mgmt_phy_mod_pwr_attr_set failed
Error! Unable to set LPM for 1, rc = 1, err msg: [+] opening sdk
Mar 07 03:25:28 INFO LOG: Initializing SX log with STDOUT as output file.
Mar 07 03:25:28 ERROR SX_API_PORT: sx_mgmt_phy_mod_pwr_attr_get: This API is deprecated and will be removed in the future. Please use sx_mgmt_phy_module_pwr_attr_get in its place.
Mar 07 03:25:28 ERROR SX_API_PORT: sx_mgmt_phy_mod_pwr_attr_set: This API is deprecated and will be removed in the future. Please use sx_mgmt_phy_module_pwr_attr_set in its place.
- How I did it
Remove the hardcoded value of 64. Obtained the number of logical ports from SDK
- How to verify it
Manual testing
#### Why I did it
Update sonic-snmpagent submodule to include below commit:
fba50c6 [202012]: snmp vlan support per RFC1213 and added the missing support for RFC2863 (#279)
Why I did it
[Build] Support to use loosen version when failed to install python packages
It is to fix the issue #14012
How I did it
Try to use the installation command without constraint
How to verify it
Why I did it
Platform cases test_tx_disable, test_tx_disable_channel, test_power_override failed in dx010.
How I did it
Add i2c access algorithm for CPLD i2c adapters.
How to verify it
Verify it with platform_tests/api/test_sfp.py::TestSfpApi test cases.
#### Why I did it
Remove dialout as critical process as it is no longer used in prod. As part of future work, can remove dialout completely
#### How I did it
Remove from critical process list
Why I did it
Seastone does not have the psu fans' status led, need to reflect it in platform.json.
How I did it
Set the psu fans status led available to false.
How to verify it
Verify it with platform_tests/api/test_psu_fans.py::TestPsuFans::test_set_fans_led case.
- Why I did it
Update MFT version to 4.21.0-100 to include a fix for an issue reported using mlxlink on qsfp-dd
- How I did it
Update mft.mk
- How to verify it
Run regression on Mellanox platforms
Why I did it
Fix the docker-base-stretch build issue in nephos platform.
Collecting supervisord-dependent-startup==1.4.0
Downloading d386c3d2cf/supervisord_dependent_startup-1.4.0-py2.py3-none-any.whl
Collecting toposort>=1.5 (from supervisord-dependent-startup==1.4.0)
Downloading 44e51b4216/toposort-1.9.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
IOError: [Errno 2] No such file or directory: '/tmp/pip-build-LnROQE/toposort/setup.py'
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-LnROQE/toposort/
The other platforms have been upgraded to docker-base-buster, not impacted.
How I did it
Pin the toposort version to 1.7, the package supervisord-dependent-startup has dependency on it.
The toposort>=1.8 only for python3, is not applicable to python2.
Why I did it
Support to upgrade packages, do better cleanup after the build.
How I did it
Remove the no use preference version control file after the build.
How to verify it
- Why I did it
sfp_event.py gets a PMPE message when a cable event is available. In PMPE message, there is no label port available. Current sfp_event.py is using sx_api_port_device_get to get 64 logical ports attributes, and find the label port from those 64 attributes. However, if there are more than 64 ports, sfp_event.py might not be able to find the label port and drop the PMPE message.
- How I did it
Don't use hardcoded 64, get logical port number instead.
- How to verify it
Manual test
Upgrade from old image always requires squashfs mount to get the next image FW binary. This can be avoided if we put FW binary under platform directory which is easily accessible after installation:
admin@r-spider-05:~$ ls /host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa
/host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa
admin@r-spider-05:~$ ls -al /tmp/image-fw-new-loc.0-dirty-20230208.193534-fs/etc/mlnx/fw-SPC.mfa
lrwxrwxrwx 1 root root 66 Feb 8 17:57 /tmp/image-fw-new-loc.0-dirty-20230208.193534-fs/etc/mlnx/fw-SPC.mfa -> /host/image-fw-new-loc.0-dirty-20230208.193534/platform/fw-SPC.mfa
- Why I did it
202211 and above uses different squashfs compression type that 201911 kernel can not handle. Therefore, we avoid mounting squashfs altogether with this change.
- How I did it
Place FW binary under /host/image-/platform/mlnx/, soft links in /etc/mlnx are created to avoid breaking existing scripts/automation.
/etc/mlnx/fw-SPCX.mfa is a soft link always pointing to the FW that should be used in current image
mlnx-fw-upgrade.sh is updated to prefer /host/image-/platform/mlnx location and fallback to /etc/mlnx in squashfs in case new location does not exist. This is necessary to do image downgrade.
- How to verify it
Upgrade from 201911 to 202012
202012 to 201911 downgrade
202012 -> 202012 reboot
ONIE -> 202012 boot (First FW burn)
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>