- Why I did it
Fix issue: psu might use wrong voltage sysfs which causes invalid voltage value. The flow is like:
1. User power off a PSU
2. All sysfs files related to this PSU are removed
3. User did a reboot/config reload
4. PSU will use wrong sysfs as voltage node
- How I did it
Always try find an existing sysfs.
- How to verify it
Manual test
#### Why I did it
PR https://github.com/Azure/sonic-utilities/pull/1825 added validation for the input of `config mirror session add`, and only decimal value is accepted.
An issue https://github.com/Azure/sonic-buildimage/issues/10096 was raised to suggest accepting HEX value as well, and the suggestion makes sense to me.
To accept HEX value for GRE type, and keep backward compatibility as well, I updated the YANG model to support both decimal and hexadecimal input for GRE type.
#### How I did it
Update the regex for GRE type.
#### How to verify it
Verified by UT
```
platform linux -- Python 3.9.2, pytest-6.0.2, py-1.10.0, pluggy-0.13.0
rootdir: /sonic/src/sonic-yang-models
plugins: pyfakefs-4.5.4, cov-2.10.1
collected 3 items
tests/test_sonic_yang_models.py .. [ 66%]
tests/yang_model_tests/test_yang_model.py . [100%]
========================================================================================== 3 passed in 2.53s ==========================================================================================
```
#### Description for the changelog
Update YANG model for mirror session to support decimal value for GRE type.
This can save 6 sec for teamd LAG restoration - the time between:
```
Mar 9 13:51:10.467757 r-panther-13 WARNING teamd#teamd_PortChannel1[28]: Got SIGUSR1.
Mar 9 13:52:33.310707 r-panther-13 INFO teamd#teamd_PortChannel1[27]: carrier changed to UP
```
- Why I did it
Optimize warm boot. Specifically reduce the time needed for LAG restoration.
- How I did it
Kill teamd docker after graceful shutdown of teamd processes.
- How to verify it
Run warm reboot.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Why I did it
The marvel-armhf build is hung, it does not exit after waiting for a long time.
It is caused by the process /etc/entropy.py which is started by the postinst script in target/debs/buster/sonic-platform-nokia-7215_1.0_armhf.deb
When mounting the partition that contains `/host` during initramfs, the
mount binary available there (coming from busybox) tries each filesystem
in `/proc/filesystems` and sees which one succeeds. During this time,
there may be some error messages logged into dmesg because some of the
incorrect filesystems failed to mount the partition.
Specify the filesystem type explicitly so that initramfs knows it's that
type, and we know what filesystem will always get used there.
Fixes#9998
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
- Why I did it
To implement blocking feature state change.
- How I did it
Record the actual feature state in STATE DB from hostcfg.
- How to verify it
UT + verification by running on the switch and checking STATE DB.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
e56e9b4 Fix CVE-2021-3121 warning (#96)
bf1be4f [ci]: Support code diff coverage threshold 50% (#94)
64e516c Ported Marvell armhf build on x86 for debian buster to use cross-compilation instead of qemu emulation (#80)
e426388 [ci]: Support azp code coverage (#87)
Why I did it
uboot env get and set commands fw_printenv/fw_setenv are not available in bullseye sonic image. Some platforms using them where failing. Ex: sonic-installer commands in marvell-armhf.
In case of buster, u-boot-tools was providing these commands.
How I did it
Added libubootenv-tool which provides these tools along with other uboot tools in build_debian.sh.
How to verify it
root@localhost:# fw_printenv serverip
serverip=10.4.50.39
root@localhost:# fw_setenv serverip 10.4.50.38
root@localhost:~# fw_printenv serverip
serverip=10.4.50.38
Change-Id: I558f8737f41d83d3e8527ce340391ae8f978b6d8
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
Why I did it
It is to fix the issue #10048
When building .raw image, for instance, target/sonic-broadcom.raw, it will generate a .bin image, target/sonic-broadcom.bin, as the intermediate file. The intermediate file is a build target which may contains different dependencies with the raw one.
6a6b711 (HEAD -> 202111, origin/202111) Fix issue: sometimes PFC WD unable to create zero buffer pool (#2164)
459aee0 Use abort instead of exit in case calling SAI API failure (#2170)
e767137 Fix issue config qos reload causing orchagent aborted via tracking dependencies among QoS tables (#2116)
Why I did it
To enable PTF-SAI testing on 202111 branch, cherry-pick necessary PR from master
How I did it
Based on the current ptf docker create a new docker for sai-ptf(saiv2)
upgrade related package
use the latest ptf and install it
How to verify it
Tested on DUT
* [PTF-SAIv2]Add ptf docker for sai-ptf (saiv2) (#9729)
* [PTF-SAIv2]Add ptf dockre for sai-ptf (saiv2)
Base on current ptf docker create a new docker for sai-ptf(saiv2)
upgrade related package
use the latest ptf and install it
test done:
NOJESSIE=1 NOSTRETCH=1 NOBULLSEYE=1 ENABLE_SYNCD_RPC=y make target/docker-ptf-sai.gz
BLDENV=buster make -f Makefile.work target/docker-ptf-sai.gz
* upgrade the thrift to 014
* install xmlrunner python3 version (#10086)
Co-authored-by: Yang Wang <yangwang1@microsoft.com>
Support saiserver v2 with python3 and thrift 0.13.0
add variables to support the saiserverv2
build different thrift in saithrift depends on saiserver version
build differernt versions of saiserver
make the saiserver and saiserver docker with version number
test done:
build two different versions of sasiserver in local build environment
add saiserver to buster
Co-authored-by: Yang Wang <yangwang1@microsoft.comwq>
Generate the sai.profile base on the brcm j2 file if the sai.profile
is not existing in the dut mounted folder.
Change the supervisor service configuration accordingly.
Testing done:
Add the script and config in dut
saiservice server can start automatically with [systemctl start saiserver]
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
Co-authored-by: yangwang <yangwang@microsoft.com>
Cherry pick of #10072
- Why I did it
Removing DPB breakout modes that require adjacent ports to be disabled as that is not supported by the current DPB infrastructure.
Correspondingly had to remove the hwsku.json file from any SKUs which utilized these removed modes such that the system will fall back to ports_config.ini and DPB will not be supported for those SKUs.
- How I did it
Modified the platform.json files of Mellanox devices.
- How to verify it
Execute show int break [Ethernet] on the affected platforms and ensure there are no modes present that would require an adjacent port to be disabled to function.
91d7558 (HEAD -> 202111, origin/202111) Allow IPv4 link-local nexthops (#1903)
ceb5161 Fix for 2053, Fix IPv6 BGP multipath-relax peer-type. (#2062)
b3b279a [crm] Use sai_object_type_get_availability() API to get counters (#2098)
28955f4 Try get port operational speed from STATE DB (#2119)
Why I did it
Fixed the monit container_checker fails due to unexpected "database-chassis" docker running on Supervisor card in the VOQ chassis. fixes#9042
How I did it
Added database-chassis to the always running docker list if platform is supervisor card.
How to verify it
Execute the CLI command "sudo monit status container_checker"
Signed-off-by: mlok <marty.lok@nokia.com>
* Update container_checker for multi-asic devices
Update container_checker for multi-asic devices to add database containers in always_running_containers.
Previous change was made for single-asic, and that database containers were not considered as feature when writing to state_db.
* Update container_checker
Update an indent
Why I did it
Code review was still in progress when #9858 was merged and upon further testing I have arrived at a better solution.
How I did it
Modified supervisord configuration j2 template for pmon to require no minimum uptime for chassisd_db_init and to remove the redundant exit_codes directive
How to verify it
Boot switch and verify in syslog that there are no errors related to chassis_db_init
Why I did it
amrhf build fails while building sonic-config-engine whl package
https://dev.azure.com/mssonic/be1b070f-be15-4154-aade-b1d3bfb17054/_apis/build/builds/77089/logs/9
The reason for the failure is due to the fact that there is a new line generated at the top of the file in buffer config test cases while building for broadcom based platform and this issue is not seen in Marvell based platforms.
How I did it
Removed the new line for all the buffer test cases as there is no need to add it and accordingly changed the buffer_config.j2 where the new line is generated.
Why I did it
During warm-reboot and fast-reboot the below error logs appear
Feb 3 22:05:15.187408 r-lionfish-13 ERR container: docker cmd: kill for nat failed with 404 Client Error for http+docker://localhost/v1.41/containers/nat/json: Not Found ("No such container: nat")
The container command when called for local mode doesn't check if it is enabled before calling docker kill which throws the above errors.
b6ca76b482/scripts/fast-reboot (L699)
How I did it
Checking feature state if local mode and returning error exit code along with valid debug message.
How to verify it
Manually tested with warm-reboot and fast-reboot
Added UT to verify it.
PR #9481 changed auditd's log directory to be /var/log instead of
/var/log/audit, because SONiC mounts a disk image at /var/log during
runtime, and so the /var/log/audit directory might not exist (since it
would've been created during package installation, mounting another
partition at /var/log will hide it). However, for security reasons,
auditd changes the log directory to have 0750 permissions, so that not
everyone knows about the audit logs or read them.
To fix this, revert the change to auditd's log directory, and tell
systemd to create the audit log directory at runtime if it doesn't
exist. Because the disk image gets mounted during initramfs (before
systemd starts), systemd will make sure that the /var/log/audit
directory will exist.
Fixes#9548 and #10015
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
- Why I did it
The latest upgrade of Mellanox hw-mgmt V7.0020.1300 introduced a couple new kernel modules for new Mellanox platforms that have yet to be upstreamed to the linux kernel.
As these new platforms do not have SONiC support we elected not to upstream these new drivers to sonic-linux-kernel but hw-mgmt expects them to exist which is causing a non-functional error on switch boot.
Feb 15 00:09:55.374130 r-leopard-simx-74 ERR systemd-modules-load[269]: Failed to find module 'emc2305'
Feb 15 00:09:55.374141 r-leopard-simx-74 ERR systemd-modules-load[269]: Failed to find module 'ads1015'
To resolve this we can patch hw-mgmt to no longer attempt to load these modules by default.
- How I did it
Added a SONiC patch to Mellanox hw-mgmt in order to remove the unused kernel modules which were not upstreamed to sonic-linux-kernel
- How to verify it
Boot switch and verify there are no error logs regarding kernel modules failing to load.
- Why I did it
Stopping swss and syncd causes some driver module unloading. Those driver modules are depended by PMON. This could trigger ERROR logs in syslog.
- How I did it
Adjust warmboot shutdown order in make file
- How to verify it
Manual test
- Why I did it
In SONiC thermal control algorithm, it compares thermal zone temperature with thermal zone threshold. Previously, a thermal zone with no thermal sensor can still get its threshold. However, a recently driver patch changes this behavior: a thermal zone with no thermal sensor will return 0 for threshold. We need to ignore such thermal zone.
- How I did it
Ignore thermal zones whose temperature is 0.
- How to verify it
Added unit test case and Manual test
- Why I did it
swsscommon.ConfigDBConnector does not automatically close connection when the instance is recycled by python. So, it should not create this instance each time calling check_services. It will cause error like Failed to read from file /var/run/hw-management/led/led_status_capability - OSError(24, 'Too many open files')
- How I did it
Only connect DB once in init
- How to verify it
Manual test
Why I did it
In the recent minigraph changes we add separate BGP session configuration for V4 and V6 internal VoQ neighbors.
This PR is adding different Peer groups for V4 and V6 neighbors
How I did it
Add VOQ_CHASSIS_V4_PEER and VOQ_CHASSIS_V6_PEER groups
Add extra Unit tests
How to verify it
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
5331ecd [vslib]: Fix MACsec bug in SCI and XPN (#1003)
ac04509 Fix build issues on gcc-10 (#999)
1b8ce97 [pipeline] Download swss common artifact in a separated directory (#995)
7a2e096 Change sonic-buildimage.vs artifact source from CI build to official build. (#992)
d5866a3 [vslib]: fix create MACsec SA error (#986)
f36f7ce Added Support for enum query capability of Nexthop Group Type. (#989)
323b89b Support for MACsec statistics (#892)
26a8a12 Prevent other notification event storms to keep enqueue unchecked and drained all memory that leads to crashing the switch router (#968)
0cb253a Fix object availability conversion (#974)
Enable dbgsym package for dhcpmon.
Allow CFLAGS and LDFLAGS from environment variables to be used
in the dhcp6relay build. This makes sure that the -O2 flag from
dpkg-buildflags gets used.
Finally, enable all hardening flags in dpkg-buildflags for
dhcp6relay and dhcpmon. The change from the default set of flags is that
during linking, immediate binding of symbols is done instead of lazy
binding.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
sonic-swss
1aa40f7 Remove port serdes object before removing port (#2152)
876d690 [doc] Updating Policer config in Configuration manual (#2144)
sonic-utilities
dfed952 show_platfom_info not run for simx (#2042)
71fdee7 [aclshow] fix aclshow when clear is called before counters are populated (#2037)
a48a027 [sonic-package-manager] implement blocking feature state change (#2035)
c51871d [ci] Fix python dependencies reference path. (#2060)
Why I did it
Radvd.conf.j2 template creates two copies of the vlan interface when there are more than one ipv6 address assigned to a single vlan interface. Changed the format to add prefixes under the same vlan interface block.
How I did it
Modifies radvd.conf.j2 and added unit tests
How to verify it
Configure multiple ipv6 address to the same vlan, start radvd
Unit test will check if radvd.conf with multiple ipv6 addresses is formed correctly
This issue causes negative threshold value and thus deleting log files even when there is enough space.
This issue causes negative threshold value and thus deleting log files even when there is enough space.
- Why I did it
To fix an issue when log files get deleted even if there is enough space.
- How I did it
Fixed an typo.
- How to verify it
Run the portion of the script that calculates threshold, see that the threshold is calculated correctly.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
Why I did it
the strcpy and buffer allocation is not safe, it corrupts 1 byte on the stack. Depending on the memory layout, it may or may not cause issue immediately.
message type is not validated before updating the counter. Which could cause segment fault.
How I did it
Remove the unsafe strcpy, use config->interface.c_str() instead.
Check message type before updating counters.
How to verify it
The issue (1) caused segment fault on a specific platform. The fix was validated there. Issue (2) was precautionary. Added log in case it triggers.
- Why I did it
Update MFT to version 4.18.1-16 for bugs fixes and new SN2201 support
- How I did it
Advance to MFT tool version to 4.18.1-16
- How to verify it
Manually tested on all Mellanox platforms (ASIC FW Upgrade, link debug tools, CPLD upgrade, etc.)
- Why I did it
Error log was shown on switches during boot
pmon#supervisord 2021-12-22 04:27:16,709 INFO exited: chassis_db_init (exit status 0; not expected)
- How I did it
Add exit code zero as an expected exit code and also disable autorestart.
- How to verify it
Boot the switch and ensure the above log line does not appear.