Why I did it
To incorporate the changes below on the DellEMC S6100 and S6000 platforms:
Enable thermalctld
Backport Platform API changes from master branch.
How I did it
Remove 'skip_thermalctld:true' in pmon_daemon_control.json
Implement the platform API methods in the respective device files
How to verify it
Verified that platform data is displayed by the 'show platform fan' and 'show platform temperature' commands.
Why I did it
Cannot retrieve and display the reboot-cause.
How I did it
Correct the platform initialization definition.
How to verify it
Manual reboot and then 'show reboot-cause'
Backport #9258 to 201911
Why I did it
When a PSU is powered off, it is still installed in the switch and the airflow is unchanged. In this case, it is not necessary to set the fan speed to 100%.
How I did it
When a PSU is powered off, don't treat it as absent.
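A minimal sketch of the intended policy, assuming a hypothetical `PsuState` record and `fan_speed_percent` helper. Neither name is taken from `thermal_infos.py`, and the 60% normal speed is an arbitrary placeholder. The point it illustrates is that only a physically absent PSU forces full fan speed:

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PsuState:
    """Hypothetical PSU record used only for this illustration."""
    present: bool  # PSU is physically inserted in the switch
    powered: bool  # PSU is currently delivering power


def fan_speed_percent(psus: Iterable[PsuState], normal_speed: int = 60) -> int:
    """Return 100 only when some PSU is physically absent.

    A PSU that is present but powered off does not change the airflow,
    so it does not trigger full speed.
    """
    if any(not psu.present for psu in psus):
        return 100
    return normal_speed
```

With this rule, a switch with one PSU powered off but still inserted keeps its normal fan speed, while pulling a PSU out raises the speed to 100%.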
How to verify it
Adjust existing unit test case
Add new case in sonic-mgmt
Conflicts:
platform/mellanox/mlnx-platform-api/sonic_platform/thermal_infos.py
- Why I did it
To include latest fixes.
1. On CMIS modules, after low power configuration, the firmware waited for the module state to be ModuleReady instead of ModuleLowPower causing delays.
2. When connecting Spectrum devices with optical transceivers that support RXLOS, remote side port down might cause the switch firmware to get stuck and cause unexpected switch behavior.
3. On rare occasions, when working with port rates of 1GbE or 10GbE and congestion occurs, packets may get stuck in the chip and may cause the switch to hang.
4. When ECMP has a large number of next-hops based on VLAN interfaces, in some rare cases, packets will get a wrong VLAN tag and will be dropped.
5. When using SN4600C with copper or optical loopback cables at NRZ speeds, the link may take a long time to come up (up to 70 seconds).
6. When connecting SN4600C to SN4600C after Fastboot in 50GbE No_FEC mode with a copper cable, the link up time may take ~20 seconds.
- How I did it
Updated SDK submodule and relevant makefiles with the required versions.
- How to verify it
Build an image and run tests from "sonic-mgmt".
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
3ce811960f19c514a6ca0b1c611b2c453eb3a0a3 (HEAD -> 201911, origin/201911) [201911][port2alias]: Fix to get right number of return values (#1907)
e648290b51fa4ec4d465efe55aa4d27d16edb249 disk_Check: Scan & mount as RW when disk turns into Read-only (#1872)
Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Commits on Oct 26, 2021
Remove exec from platform_reboot call to prevent reboot hang (#1881) 066b5adf6d737a5bd174123d4d00dab4b6110cf6
Commits on Nov 17, 2021
[fdbshow]: Handle FDB cleanup gracefully. (#1918) c80321c98d0741f340d2900108bad7fed76c80cd
a0417f6f [Buffer Manager][201911] Reclaim unused buffer for admin-down ports (1837)
f77d393b [bufferorch][201911] Handle DEL_COMMAND for BUFFER_PG and BUFFER_QUEUE table (1787)
Signed-off-by: Stephen Sun <stephens@nvidia.com>
#### Why I did it
Upgrade Mellanox-SAI to 1.19.3 to support reclaiming reserved buffer on admin down ports
#### How I did it
To support reclaiming reserved buffer on admin down ports.
#### How to verify it
Regression test and manual test.
Why I did it
This PR aims to fix the bug in Monit template file of dhcp_relay container.
If multiple VLANs are configured on the device, multiple dhcrelay processes are spawned in the dhcp_relay container, and the Monit configuration file of the dhcp_relay container then has one entry for each dhcrelay process.
Currently the Monit template file of the dhcp_relay container cannot be rendered correctly to generate the configuration file, which prevents Monit from starting up.
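As a rough illustration of "one entry per dhcrelay process", the sketch below renders one Monit check per VLAN. The function name, the check names, and the matching patterns are hypothetical and are not the contents of the actual template; the only thing it is meant to show is that each process needs its own uniquely named entry:

```python
from typing import List


def render_monit_entries(vlan_names: List[str]) -> str:
    """Render one Monit check per VLAN (illustrative wording only)."""
    entries = []
    for vlan in vlan_names:
        # Each entry gets a unique service name; Monit rejects duplicates.
        entries.append(
            f'check process dhcrelay_{vlan} matching "dhcrelay .*{vlan}"\n'
            f"    if does not exist for 5 times within 5 cycles then alert\n"
        )
    return "\n".join(entries)


if __name__ == "__main__":
    print(render_monit_entries(["Vlan100", "Vlan200"]))
```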
#### Why I did it
Recently, the reserved buffer of admin-down ports is going to be reclaimed.
However, the way to do this differs among vendors.
We need to find a way to pass vendor information to swss docker.
#### How I did it
Fetch the ASIC vendor information when the docker is created and pass it to the docker as environment variable `ASIC_VENDOR`.
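A minimal sketch of the idea, assuming hypothetical helper names (`docker_create_args`, `asic_vendor_from_env`) and example container and image names. It only shows the variable being set with `docker create -e` and read back inside the container; it is not the actual container start script:

```python
from typing import List, Mapping


def docker_create_args(container: str, image: str, asic_vendor: str) -> List[str]:
    """Build a `docker create` command line that exports ASIC_VENDOR."""
    return [
        "docker", "create",
        "--name", container,
        "-e", f"ASIC_VENDOR={asic_vendor}",  # passed into the container environment
        image,
    ]


def asic_vendor_from_env(env: Mapping[str, str]) -> str:
    """Inside the container: read the vendor, or '' if it was not passed."""
    return env.get("ASIC_VENDOR", "")
```

Inside the container, code would typically call `asic_vendor_from_env(os.environ)` and branch on the result.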
Why I did it
Fix error during building docker-sonic-mgmt-framework on 201911
Signed-off-by: Stephen Sun <stephens@nvidia.com>
How I did it
Cause:
While building sonic-mgmt-framework docker, it needs to install grpcio-tools version 1.20.0 which has a dependency on grpcio version >=1.20.0.
As >=1.20.0 is specified, it will install the latest version of grpcio.
It had worked well until the grpcio package version 1.40.0 was released 3 days ago.
Looks like some new dependencies are introduced by the latest version.
Fix:
Designate grpcio version 1.39.0 explicitly, which is the latest version of grpcio that worked well.
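The difference between the open-ended and the pinned requirement can be sketched with a simplified model of version selection. This is not pip's real resolver; the helper names and the list of available versions are assumptions made for illustration:

```python
from typing import List, Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.39.0' into (1, 39, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def pick_version(available: List[str],
                 minimum: Optional[str] = None,
                 exact: Optional[str] = None) -> str:
    """Pick the highest available version that satisfies the constraint."""
    candidates = [
        v for v in available
        if (minimum is None or parse_version(v) >= parse_version(minimum))
        and (exact is None or parse_version(v) == parse_version(exact))
    ]
    return max(candidates, key=parse_version)


available = ["1.20.0", "1.39.0", "1.40.0"]
print(pick_version(available, minimum="1.20.0"))  # open-ended: newest release wins
print(pick_version(available, exact="1.39.0"))    # pinned: stays on the known-good one
```

Under `>=1.20.0` the newest release is always chosen, so a new upstream release can change the build; an exact pin keeps it stable.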
Why I did it
Update FRR to the 7.2.1 head. The following is a list of new commits.
5ae667a1f Merge pull request #9335 from FRRouting/mergify/bp/stable/7.2/pr-9214
eb679e8a1 zebra: bugfix of error quit of zebra, due to no nexthop ACTIVE
80d2eaa98 Merge pull request #8886 from FRRouting/mergify/bp/stable/7.2/pr-8876
1eeab2c1e lib: remove pure attribute from functions that modify memory
eb00dc4ec Merge pull request #6944 from LabNConsulting/working/lb/7.2/valgrind-supp-libyang
b9d6d05bf bgpd: suppress new libyang_1.0 related loss reports
8c26a71eb Merge pull request #6562 from ton31337/fix/configuration_for_labeled_unicast_in_place_7.2
386a1719c bgpd: Make sure network/aggregate-address commands lay down under labeled safi
b01c8bf28 Merge pull request #6526 from ton31337/fix/set_ipv6_ll_if_global_zero_7.2
c382833e8 bgpd: Use IPv6 LL address as nexthop if global was set to ::/LL
99509b835 Merge pull request #6395 from opensourcerouting/7.2/init-config-perms
7eef8f7b1 build: use configfile mode in init script
4cbe07705 Merge pull request #6360 from opensourcerouting/7.2/fix-warnings
84bb11785 nhrpd: clean up SA warning
aac726476 nhrpd: be more careful with linked lists
3a4b6d654 debian: Fix spelling error
756c67c6c Merge pull request #6284 from opensourcerouting/7.2/gcc-10
65a116a64 Merge pull request #6354 from ton31337/fix/communities_bgpd_crash_7.2
f7a00fd67 bgpd: Check to ensure community attributes exist before freeing them
a960f99c2 vrrpd: fix build on Fedora Rawhide
d4caff99f babeld: GCC complaining about no return in non-void function
a014c27ae babeld: fix build on Fedora Rawhide
79ff55b5b bgpd: remove unused variable
ff343e588 pimd: Make frr able to be built by gcc 10
9a3cf1ba2 ldpd: remove multiple definitions of thread_master
a19515bfe ldpd: fix another linking issue with GCC-10
b4c8de38c tests: fix build with GCC 10
4f27e8c85 ldpd: Fix linking error on Fedora Rawhide with GCC 10
How I did it
Update FRR 7.2 pointer and create a tag frr-7.2.1-s4.
- Why I did it
Update SDK\FW version to 4.4.3326\2008.3326. This version contains:
New Features:
1. Add support for Fast Boot for SN3800
Bug fixes:
1. In some cases, when the total number of allocations exceeds the resource limit, an error can occur due to an incorrect resource release procedure. This issue is most likely to affect the following resources: flow counters, ACL actions, PBS, WJH filter, Tunnels, ECMP containers, MC (L2 & L3).
2. On Spectrum systems, when using Async Router API with IPV6, an error message in the log regarding failing to remove ECMP container may show up. This error is not functional and can be safely ignored.
3. On Spectrum-2 systems and above, when using warm boot, setting max_bridge_num to a value greater than 1968 will cause an error and potential crash.
4. Some Molex cables do not support speed after reboot
- How I did it
- How to verify it
Verified by running regression tests, including the complete set of supported sonic-mgmt tests.
Why I did it
The serial-getty service exited randomly on the Dell S6100 device.
How I did it
Added serial-getty to monit services.
How to verify it
Stop serial-getty from an SSH session and check whether the service restarts.
Updated Broadcom SAI Debian package to 3.7.6.1. The major changes are:
- CS00011651922/CS00012192502 SID:Parity error in TDM Calendar memories causes traffic drop after SER correction
- CS00011222060 soc_mem_alpm_delete: unit 0: ALPM delete operation[L3_DEFIP_ALPM_IPV6_128] encountered parity error
- Cesto Phy Recovery enhancement.
- SDK compile with flag -DBCM_MONOTONIC_TIME and -DBCM_MONOTONIC_MUTEXES
Why I did it
The time gap between the last config load and db-listen seems to have increased.
Any config updates that occur in this gap are missed by db-listen.
This could miss updating /etc/pam.d/common-auth-sonic
How I did it
Add a one-shot timer just before db-listen. The timer fires after the subscribe is done.
When the timer fires, reload tacacs & aaa.
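The pattern can be sketched with Python's `threading.Timer`. This is an assumption about the mechanism, not the actual change; `listen`, `reload_config`, and the delay value are hypothetical stand-ins for the db-listen call and the tacacs/aaa reload:

```python
import threading
from typing import Callable


def start_listener_with_reload(listen: Callable[[], None],
                               reload_config: Callable[[], None],
                               delay_seconds: float = 1.0) -> threading.Timer:
    """Arm a one-shot timer, then enter the (blocking) listen call.

    The timer fires once, after the listener has had time to subscribe,
    and re-applies any configuration that changed in the gap.
    """
    timer = threading.Timer(delay_seconds, reload_config)
    timer.daemon = True  # do not keep the process alive just for the timer
    timer.start()        # armed just before listening starts
    listen()             # subscribes, then blocks handling updates
    return timer
```

Because the timer is armed before the listener starts, updates that land between the last config load and the subscription are picked up by the reload instead of being lost.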
Why I did it
To handle newer SSD firmware version in DellEMC S6100 platform (S210506G - 3IE devices).
How I did it
Update s6100_ssd_upgrade_status.sh to handle newer SSD firmware version.
How to verify it
Logs: UT_logs.txt
Signed-off-by: Dror Prital <drorp@nvidia.com>
* [Mellanox] Update FW version to 2008.3218 (#8079)
Update FW version to 2008.3218, fixing the following issues:
- 50G/100G links that are operationally down before warm-reboot are not coming up after warm-reboot
- 50G/100G links with admin shut / no shut commands are not coming up after warm-reboot