This PR is to backport #10401 to 202111
- Why I did it
Take new hw-mgmt release to SONiC, including:
new features:
hw-mgmt: add a command to the PSU FW upgrade tool to show the current FW version
hw-mgmt: add support to the PSU FW upgrade tool for FW upgrade on systems with a single PSU
hw-mgmt: add attribute “/firmware” to show FW version of restricted upgradable PSUs only
hw-mgmt: Add NVME temperature reports attributes (_alarm/_crit/_min/_max)
bug fix:
psu: redundant i2c_addr attributes being created for psu 3 & 4 in system having only 2 psus.
hw-mgmt: in SPC1/2 i2c driver removal is too slow vs. ASIC reset causing non-functional log errors
PSU thresholds sysfs changed in 5.10 to “read only”, preventing modification (modification is required due to a PSU HW bug)
CPLD3 sysfs attribute missing after chip down/up flow
sysfs attributes missing when hw-mgmt is restarted (stop/start) within systemd
Release notes are available at https://github.com/Mellanox/hw-mgmt/blob/V.7.0020.2004/debian/Release.txt
- How I did it
Update hw-mgmt make file with new version number
Update hw-mgmt submodule pointer
- How to verify it
Run platform regression on all Mellanox platforms
Signed-off-by: Kebo Liu <kebol@nvidia.com>
Why I did it
Exclude the innovium build from the version upgrade build. That build currently always fails, so exclude it temporarily.
Increase the broadcom build timeout.
- Why I did it
Fastboot delays all counters in CONFIG DB and relies on enable_counters.py to recover the delayed counters. However, enable_counters.py does not recover non-default counters.
- How I did it
For each non-default counter that is present in CONFIG DB, set its delay status to false after the waiting period.
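A minimal sketch of this recovery step, assuming CONFIG DB is modeled as a plain dictionary. The set of default counters, the `FLEX_COUNTER_DELAY_STATUS` field name and the helper `clear_delay_for_non_default` are illustrative assumptions, not the actual enable_counters.py code.

```python
# Hypothetical set of counters that the existing logic already recovers.
DEFAULT_COUNTERS = {"PORT", "QUEUE"}


def clear_delay_for_non_default(flex_counter_table, default_counters=DEFAULT_COUNTERS):
    """Set the delay status to false for configured non-default counters.

    flex_counter_table is a dict standing in for the CONFIG DB table:
    counter name -> dict of fields. Returns the sorted names that were updated.
    """
    updated = []
    for name, entry in flex_counter_table.items():
        if name in default_counters:
            continue  # default counters are left to the existing recovery path
        if entry.get("FLEX_COUNTER_DELAY_STATUS") == "true":
            entry["FLEX_COUNTER_DELAY_STATUS"] = "false"
            updated.append(name)
    return sorted(updated)


# Example: only the non-default ACL counter is touched.
table = {
    "PORT": {"FLEX_COUNTER_STATUS": "enable", "FLEX_COUNTER_DELAY_STATUS": "true"},
    "ACL": {"FLEX_COUNTER_STATUS": "enable", "FLEX_COUNTER_DELAY_STATUS": "true"},
}
recovered = clear_delay_for_non_default(table)
```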
- How to verify it
Manual test
c5c105f (HEAD -> 202111, origin/202111) [PBH] Implement Edit Flows (#2093)
7050581 [techsupport] Handle minor fixes of TS Lock and update auto-TS (#2114)
3602f99 Fix issues in clear_qos (#2122)
4f96d3b Fixing get port speed when oper status is down (#2123)
5bb99c7 Validate LAG has members before mirror session create (#2130)
ec6c8af [vxlan] Remove tunnel map objects on VNET tunnel removal (#2150)
7e7db19 [BFD]Registering BFD state change callback during session creation (#2202)
618fe07 [VNET]Fixing nexthop group delete during route change (#2198)
91b66df [portsorch]: Prevent LAG member configuration when port has active ACL binding (#2165)
29de9d0 Remove redundant and problematic code to skip "pool" field in buffer profile handling (#2197)
ded0b45 [PBH] Implement Edit Flows (#2169)
2ee0f49 [neighsyncd] increase neighsyncd timeout (#2209)
a0160c0 [QosOrch] The notifications cannot be drained in QosOrch in case the first one needs to retry (#2206)
Fix the version file generation failure caused by the artifacts folder change.
When the same template started to be used for PR builds, official builds and package version upgrades, a "target" folder was added to the artifacts folder, so the version upgrade task must be changed accordingly.
Why I did it
Docker Hub limits the pull rate.
Use ACR instead to pull debian related docker image.
How I did it
Set DEFAULT_CONTAINER_REGISTRY in pipeline.
* [Marvell] Update armhf SAI deb version 1.9.1 (#9865)
Move marvell armhf SAI deb to 1.9.1 to address build failures.
Signed-off-by: Rajkumar Pennadam Ramamoorthy <rpennadamram@marvell.com>
* [Marvell] Update armhf driver/sai deb version (#10126)
Fixed Marvell SAI deb version naming issue reported in Marvell-switching/sonic-marvell-binaries#62
Signed-off-by: Rajkumar Pennadam Ramamoorthy <rpennadamram@marvell.com>
* [Build]: only install grpc in amd64 (#10212)
[Build]: only install grpc in amd64
Unblock marvell-armhf build.
Co-authored-by: Rajkumar-Marvell <54936542+rajkumar38@users.noreply.github.com>
Why I did it
Cherry-pick commits from master to 202111 to fix the broken build.
See detail in the commits.
Why I did it
Fix the host image Debian package version issue.
Package dependencies can break when Debian packages from the base image are upgraded. For example, libc is installed in the base image, but if the mirror has a newer version, running "apt-get upgrade" upgrades the package unexpectedly. To avoid this, the versions need to be added when building the host image.
How I did it
The package versions for host-image should include those of host-base-image.
#### Why I did it
When adding and removing ports after the init stage, we saw two issues:
First:
In several cases, after removing a port, lldpmgr keeps trying to add the port to lldp with the lldpcli command. The command keeps failing because the port no longer exists.
Second:
After adding a port, we sometimes see this warning message:
"Command failed 'lldpcli configure ports Ethernet18 lldp portidsubtype local etp5b': 2021-07-27T14:16:54 [WARN/lldpctl] cannot find port Ethernet18"
We added these changes to solve both issues.
#### How I did it
Port create events are taken from APP DB only.
The lldpcli command is executed only when the Linux port is up.
When a port delete event is received, the command for that port is removed from the pending_cmds dictionary.
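The three rules above can be sketched as follows. This is a toy model, not the lldpmgr implementation: the class `PendingLldpCommands` and its method names are hypothetical, and only the `pending_cmds` dictionary name and the lldpcli command string come from the description.

```python
class PendingLldpCommands:
    """Toy model of the pending_cmds bookkeeping (hypothetical class)."""

    def __init__(self):
        # port name -> lldpcli command waiting to be executed
        self.pending_cmds = {}

    def on_port_update(self, port, alias, oper_up):
        # Queue the command only once the Linux port is up, so lldpcli
        # is never run against a port that does not exist yet.
        if oper_up:
            self.pending_cmds[port] = (
                "lldpcli configure ports {} lldp portidsubtype local {}".format(port, alias)
            )

    def on_port_delete(self, port):
        # Drop any queued command so it is not retried for a removed port.
        self.pending_cmds.pop(port, None)


mgr = PendingLldpCommands()
mgr.on_port_update("Ethernet18", "etp5b", oper_up=True)
mgr.on_port_update("Ethernet20", "etp6a", oper_up=False)  # not up: nothing queued
mgr.on_port_update("Ethernet22", "etp6b", oper_up=True)
mgr.on_port_delete("Ethernet18")  # removed port: queued command is dropped
```

Using `dict.pop(port, None)` keeps the delete handler safe when no command was ever queued for the port.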
#### How to verify it
manual tests and running lldp tests
#### Description for the changelog
Dynamic port configuration - solve lldp issues when adding/removing ports
Why I did it
A kernel hang during early boot is caused by the device tree being overwritten while the kernel is uncompressed. Added fdt_high, which gives a safe offset from the kernel location.
How I did it
Setting uboot environment variable fdt_high.
How to verify it
Successful boot of bullseye kernel on Marvell Armada 380/385.
Change-Id: I3e2521780f5ecdb3bdf6cbb6542250814ca11959
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
Why I did it
1) Removing the incorrect check in plt setup for fw_env config: this check was added earlier to compare 2 different types of disk. The transition is now complete, so the check is redundant and no longer required.
2) Removing legacy_volume_label in create_partition: legacy_volume_label is not used in armhf install files. With legacy_volume_label initialized to NULL, the current code always returns true for the check if demo_part exists.
How I did it
The change removes the redundant/incorrect code explained above.
How to verify it
uboot fw_printenv and fw_setenv have been tested.
onie-nos-install has been verified.
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
For Bullseye, Python 2 isn't present at all. This means that in certain
build cases (such as building something only for Bullseye), the version
file may not exist, and so the sort command would fail.
For most normal build commands, this probably won't be an issue, because
the SONiC build will start with Buster (which has both Python 2 and
Python 3 wheels built), and so the py2 and py3 files will be present
even during the Bullseye builds.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Why I did it
[Build]: Enable marvell-armhf PR check
Improve the azp dependencies by making the Test stage depend only on the BuildVS stage. The Test stage is then triggered as soon as the BuildVS stage finishes, which reduces the waiting time.
- Why I did it
With the previous MFT 4.18.1-16, there is a bug in the mstdump tool where it accesses a wrong address. It is confirmed that this issue does not exist in the official 4.18.0-106.
- How I did it
Update the MFT version to 4.18.0-106
- How to verify it
Run regression on Mellanox platforms
Why I did it
Support collecting versions when purging a Debian package.
Support collecting versions multiple times.
How I did it
Add the collection action before purging.
#### Why I did it
To fix https://github.com/Azure/sonic-buildimage/issues/9643
#### How I did it
Instead of ast.literal_eval, added python2 compat code for the unicode -> str conversion of JSON strings.
We need python2 compatibility since py2 sonic config engine (buster/sonic_config_engine-1.0-py2-none-any.whl target) is still included into the build (ENABLE_PY2_MODULES flag is set for buster). Once we abandon buster and python2, this compat and ast.literal_eval could be cleaned up all through the code base.
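A minimal sketch of what such a compat helper could look like. The function name `to_native_str` is hypothetical and this is not the code from the PR: it only illustrates converting `unicode` values returned by `json.loads` into `str` on Python 2 while leaving Python 3 data unchanged.

```python
import json
import sys


def to_native_str(obj):
    """Recursively convert Python 2 unicode values to str; no-op on Python 3."""
    if sys.version_info[0] >= 3:
        return obj  # Python 3 strings are already str
    if isinstance(obj, dict):
        return {to_native_str(k): to_native_str(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [to_native_str(item) for item in obj]
    if isinstance(obj, unicode):  # noqa: F821 - name exists only on Python 2
        return obj.encode("utf-8")
    return obj


data = to_native_str(json.loads('{"PORT": {"Ethernet0": {"lanes": "1,2"}}}'))
```

Unlike `ast.literal_eval`, this path parses the input strictly as JSON, so JSON literals such as `true` and `null` are handled by the JSON parser itself.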
#### How to verify it
run steps from the linked issue
Why I did it
ACL does have an ACCEPT action, but the YANG model doesn't support it.
How I did it
Add 'ACCEPT' enum to sonic-types.yang.j2
How to verify it
Run the YANG model unit tests
6562ad3 (HEAD -> 202111, origin/202111) [sfpshow][recycle_port] sfpshow script needs to skip recycle ports (#2109)
f184a61 Update `config mirror_session` CLI to support heximal gre type value (#2095)
03936ea (HEAD -> 202111, origin/202111) define index for recirc port (#118)
d48f750 [port_util] Fix issue: port_util.get_vlan_interface_oid_map should not raise exception when DB has not RIF data (#117)
- Why I did it
PDDF utils were python2 compliant and they needed to be migrated to Python3 (as per Bullseye)
PDDF common platform APIs file name changed as the name was already in use
Indentation issues
Dead/redundant code needed to be removed
- How I did it
Made files Python3 compliant
Indentation corrected
Redundant code removed
- How to verify it
AS7326 Accton platform uses PDDF. PDDF utils were run on this platform to verify.
cherry-pick of #9393 for 202111
- Use SfpOptoeBase by default to leverage new `sonic_xcvr` refactor
- Add support for `Woodleaf` product
- Move `libsfp-eeprom.so` to a different `.deb` package
- Add new logrotate configuration for arista logs
- Improve logging mechanism for the drivers (IO loglevel, fix syslog duplicates)
- Initialize chassis cards in parallel
- Refactor of `get_change_event` to fix interrupts treated as presence change
Why I did it
[Build]: Fix armhf mirrors not existing issue
The mirror endpoint debian-archive.trafficmanager.net does not support armhf, so use deb.debian.org and security.debian.org instead.
Correct thrift.0.13.0 dependent package name.
In the previous code, the buildout target was named PYTHON3_THRIFT_0_13_0,
but when the package was added to LIBTHRIFT_0_13_0, it was mistyped as PYTHON_THRIFT_0_13_0.
Co-authored-by: Yang Wang<yangwang1@microsoft.com>
9968d60 (HEAD -> 202111, origin/202111) [sonic-package-manager] do not mod_config for whole config db when setting init_cfg (#2055)
4b3d53f [generate_dump] exclude mft and mlx folders from /etc (#2072)
51d92ae Validation check correction while adding a member to PortChannel (#2078)
6a43306 [techsupport] Added a lock to avoid running techsupport in parallel (#2065)
44cfdd9 Try get port operational speed from STATE DB (#2030)
45ea623 Fix sonic-installer failure due to missing import
- Why I did it
Fix issue: a PSU might use the wrong voltage sysfs node, which results in an invalid voltage value. The flow is:
1. User powers off a PSU
2. All sysfs files related to this PSU are removed
3. User does a reboot/config reload
4. PSU uses the wrong sysfs node as its voltage node
- How I did it
Always try to find an existing sysfs node.
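The idea can be sketched as picking the first candidate path that currently exists, rather than assuming a fixed node. The helper name `find_existing_sysfs` and the file names below are hypothetical, and temporary files stand in for real sysfs nodes.

```python
import os
import tempfile


def find_existing_sysfs(candidates):
    """Return the first candidate path that exists, or None if none do."""
    for path in candidates:
        if os.path.exists(path):
            return path
    return None


# Demo: temporary files stand in for sysfs nodes (names are illustrative only).
with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "psu1_volt")
    present = os.path.join(tmp, "psu1_volt_in")
    open(present, "w").close()

    found_present = find_existing_sysfs([missing, present]) == present
    none_found = find_existing_sysfs([missing]) is None
```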
- How to verify it
Manual test
#### Why I did it
PR https://github.com/Azure/sonic-utilities/pull/1825 added validation for the input of `config mirror session add`, and only decimal value is accepted.
An issue https://github.com/Azure/sonic-buildimage/issues/10096 was raised to suggest accepting HEX value as well, and the suggestion makes sense to me.
To accept HEX value for GRE type, and keep backward compatibility as well, I updated the YANG model to support both decimal and hexadecimal input for GRE type.
#### How I did it
Update the regex for GRE type.
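As an illustration of accepting both notations, here is a small sketch in Python. The pattern and the helper `parse_gre_type` are assumptions made for this example and are not the pattern used in the YANG model; the range check assumes the 16-bit GRE protocol type field.

```python
import re

# Assumed pattern: either a 0x-prefixed hexadecimal number or a decimal number.
GRE_TYPE_RE = re.compile(r"(0[xX][0-9a-fA-F]+|[0-9]+)")


def parse_gre_type(text):
    """Parse a GRE type given in decimal or hexadecimal; raise ValueError otherwise."""
    if not GRE_TYPE_RE.fullmatch(text):
        raise ValueError("invalid GRE type: {!r}".format(text))
    base = 16 if text[:2].lower() == "0x" else 10
    value = int(text, base)
    if value > 0xFFFF:  # GRE protocol type is a 16-bit field
        raise ValueError("GRE type out of range: {!r}".format(text))
    return value


hex_value = parse_gre_type("0x88be")
dec_value = parse_gre_type("35006")
```

Both inputs above name the same value, so existing decimal configurations keep working while hexadecimal input is also accepted.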
#### How to verify it
Verified by UT
```
platform linux -- Python 3.9.2, pytest-6.0.2, py-1.10.0, pluggy-0.13.0
rootdir: /sonic/src/sonic-yang-models
plugins: pyfakefs-4.5.4, cov-2.10.1
collected 3 items
tests/test_sonic_yang_models.py .. [ 66%]
tests/yang_model_tests/test_yang_model.py . [100%]
========================================================================================== 3 passed in 2.53s ==========================================================================================
```
#### Description for the changelog
Update YANG model for mirror session to support both decimal and hexadecimal values for GRE type.
This can save 6 sec for teamd LAG restoration - the time between:
```
Mar 9 13:51:10.467757 r-panther-13 WARNING teamd#teamd_PortChannel1[28]: Got SIGUSR1.
Mar 9 13:52:33.310707 r-panther-13 INFO teamd#teamd_PortChannel1[27]: carrier changed to UP
```
- Why I did it
Optimize warm boot. Specifically reduce the time needed for LAG restoration.
- How I did it
Kill teamd docker after graceful shutdown of teamd processes.
- How to verify it
Run warm reboot.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Why I did it
The marvell-armhf build hangs: it does not exit even after waiting for a long time.
It is caused by the process /etc/entropy.py which is started by the postinst script in target/debs/buster/sonic-platform-nokia-7215_1.0_armhf.deb
When mounting the partition that contains `/host` during initramfs, the
mount binary available there (coming from busybox) tries each filesystem
in `/proc/filesystems` and sees which one succeeds. During this time,
there may be some error messages logged into dmesg because some of the
incorrect filesystems failed to mount the partition.
Specify the filesystem type explicitly so that initramfs knows it's that
type, and we know what filesystem will always get used there.
Fixes #9998
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>