Signed-off-by: Yong Zhao yozhao@microsoft.com
Why I did it
Currently we leveraged the Supervisor to monitor the running status of critical processes in each container and it is more reliable and flexible than doing the monitoring by Monit. So we removed the functionality of monitoring the critical processes by Monit.
How I did it
I removed the script process_checker and corresponding Monit configuration entries of critical processes.
How to verify it
I verified this on the device str-7260cx3-acs-1.
This is due to the fact that we use SONIC_OVERRIDE_BUILD_VARS internally
in our build jobs and this is not accounted in caching framework.
So we add MLNX_SDK_DEB_VERSION to force rebuild if we changed it via
SONIC_OVERRIDE_BUILD_VARS.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
#### Why I did it
- After [sonic-linux-kernel#177](https://github.com/Azure/sonic-linux-kernel/pull/177) changes, the I2C mux channels of Baseboard and Switchboard CPLDs are moved from i2c-4 and i2c-5 to i2c-36 and i2c-37 respectively.
- This caused QSFP driver initialization of i2c-36 to i2c-41 to fail causing the ports from Ethernet208 to Ethernet248 fail.
#### How I did it
- The fix to this problem is to change the order of QSFP driver initialization to I2C mux channels.
- Instead of the order i2c-10 to i2c-41, the order i2c-4 to i2c-35 is being utilized.
- Also, need to change the i2c-mux-channel number for Baseboard CPLD and switchboard CPLD in scripts to access them.
#### Why I did it
On our platforms syncd must be up while using the sonic_platform.
The issue is warm-reboot script first disables syncd then instantiate Chassis, which tries to connect syncd in __init__.
#### How I did it
Refactor Chassis to lazy initialize components.
Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
Initialize fans and thermals lists on demand; make them properties in order to reduce Chassis object initialization time
Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
- Why I did it
Update FW version to 2008_3110 fixing SN3800 specific warm boot scenario:
1. Disable interface
2. Warm Boot
3. Enable Interface --> link will remain down.
- How I did it
Use new FW that contains the fix for the problem mentioned above
- How to verify it
Run the scenario mentioned above and make sure that the link is up after warm boot
Signed-off-by: Dror Prital <drorp@nvidia.com>
#### Why I did it
According to thermalctld hld, each fan must belong to a fan drawer, if the fan drawer does not physically exist, put fan into a virtual fan drawer. This PR is to clear fan from chassis._fan_list
#### How I did it
1. Don't put fan to chassis._fan_list
2. Always query fan from fan_drawer
Why I did it
Microsoft reported occasional daemon crashes on devices running 201911. On close inspection it was due to PMBus reads failing on IOError on very rare occasions.
This is the fix for 202012 branch.
How I did it
Add try/except block on performing reads on PMBus GPIOs.
How to verify it
Which release branch to backport (provide reason below if selected)
Co-authored-by: Zhi Yuan (Carl) Zhao <zyzhao@arista.com>
- Why I did it
Pick up fix from new hw-management package:
Fix gearbox thermal zone name, which was lack suffix thermal zone number
- How I did it
Update the hw-management version number in the make file
Update hw-management submodule pointer
- How to verify it
Run platform related test cases on Mellanox platform
Why I did it
This SAI change is to Enable TD2 platforms to also be able to handle VFP-based sub-interfaces.
This is the feature that is also needed for TD2 platforms.
How to verify it
Guohan and Team has validated this feature change on TD2 platforms
#### Why I did it
xcvrd crashes when the application advertisement capability flag is not seen for few transceivers.
#### How I did it
Initialize the additional application capability in dunder init
Add support for Accton as9726-32d platform
This pull request is based on as9716-32d, so I reference as9716-32d to create new model: as9726-32d.
This module do not need led driver to control led, FPGA can handle it.
I also implement API2.0(sonic_platform) for this model, CPLD driver, PSU driver, Fan driver to control these HW behavior.
Why I did it
Fix issues below.
#7133#6602
So, remove the dps200 driver from the platform-specific driver.
Then, add the dps200 module driver to the Linux kernel tree.
How I did it
Remove the dps200 driver from the platform-specific driver and add the dps200 module driver to the Linux kernel.
How to verify it
Build an image with Azure/sonic-linux-kernel#207
Then, install to the Haliburton.
This is to pick up BRCM SAI 4.3.3.5-2 which contains 2 main changes:
1. hsdk-6.5.21-p1.patch (to address some field problems related to SDK issue.
2. Fix for CS00012097141 (remove some unnecessary debug setting that was causing non functional impacting problem at boot time)
Preliminary tests looks fine. BGP neighbors were all up with proper routes programmed
interfaces are all up
Manually ran the following test cases on S6100 DUT and all passed:
```
ipfwd/test_dir_bcast.py
fib/test_fib.py
vxlan/test_vxlan_decap.py
decap/test_decap.py
fdb/test_fdb.py
```
LED_PROC_INIT_SOC variable was incorrectly referenced as LED_SOC_INIT_SOC. Introduced in #5483
Rather than fixing the typo, I decided to simplify the script, removing the need for the conditional altogether by moving the bcmcmd call inside the conditional which checks for the presence of LED_SOC_INIT_SOC.
#### Why I did it
- PSU data is loaded into state DB.
Following errors are seen in syslogs:
"Failed to update PSU data - '<=' not supported between instances of 'float' and 'str'"
- Issue is not seen in master image as the PSU API return type is different.
#### How I did it
- Changed the return type in PSU API's.
#### Why I did it
400G media EEPROM and DOM information are not populated properly in DellEMC Z9332f platform.
#### How I did it
Handled QSFP_DD, QSFP28/QSFP+, SFP+ accordingly based on media type detected.
Incorporate the below changes in DellEMC Z9332F platform:
- Implemented watchdog platform API support
- Implement ‘get_position_in_parent’, ‘is_replaceable’ methods for all device types
- Change return type of SFP methods to match specification in sonic_platform_common/sfp_base.py
- Added platform.json file in device directory.
Co-authored-by: V P Subramaniam <Subramaniam_Vellalap@dell.com>
- Why I did it
Updated FW to xx.2008.2526 version.
Fixed issues:
1. Spectrum-2, Spectrum-3 | sFlow | High CPU load and high on fully loaded switch.
2. Spectrum-2, Spectrum-3 | Fine grain LAG | in rare cases doesn’t update the right entry
- How I did it
Updated submodule pointer and version in a Makefile.
- How to verify it
Full regression and bugs validation
Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
#### Why I did it
Adopt a single way to get fan direction for all ASIC types.
It depends on hw-mgmt V.7.0010.2000.2303. Depends on https://github.com/Azure/sonic-buildimage/pull/7419
#### How I did it
Originally, the get_direction was implemented by fetching and parsing `/var/run/hw-management/system/fan_dir` on the Spectrum-2 and the Spectrum-3 systems. It isn't supported on the Spectrum system.
Now, it is implemented by fetching `/var/run/hw-management/thermal/fanX_dir` for all the platforms.
Signed-off-by: Stephen Sun <stephens@nvidia.com>
- Why I did it
To include below changes:
Set monitoring VLAN hostif up dy default (for VNET ping tool)
- How I did it
Updated SAI submodule pointer
- How to verify it
Create VLAN hostif according to changes in PR: Azure/sonic-swss#1645
Verify it is admin up by default
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
- Support compile sonic arm image on arm server. If arm image compiling is executed on arm server instead of using qemu mode on x86 server, compile time can be saved significantly.
- Add kernel argument systemd.unified_cgroup_hierarchy=0 for upgrade systemd to version 247, according to #7228
- rename multiarch docker to sonic-slave-${distro}-march-${arch}
Co-authored-by: Xianghong Gu <xgu@centecnetworks.com>
Co-authored-by: Shi Lei <shil@centecnetworks.com>
The following error message is observed during chassis object being destroyed
"Exception ignored in: <function Chassis.__del__ at 0x7fd22165cd08>
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/sonic_platform/chassis.py", line 83, in __del__
ImportError: sys.meta_path is None, Python is likely shutting down
The chassis tries to import deinitialize_sdk_handle during being destroyed for the purpose of releasing the sdk_handle.
However, importing another module during shutting down can cause the error because some of the fundamental infrastructures are no longer available."
This error occurs when a chassis object is created and then destroyed in the Python shell.
- How I did it
To fix it, record the deinitialize_sdk_handle in the chassis object when sdk_handle is being initialized and call the deinitialize handler when the chassis object is being destroyed
- How to verify it
Manually test.
- Why I did it
Upgrade hw-mgmt to 7.0100.2303
Bug fixes
1. Fan direction feature fix for fixed FAN system (using shell instead of binutils/strings)
2. Remove cpld 4th link on systems with only 3 CPLD's
3. hw-mgmt: thermal: Add hardcoded critical trip point. Follow-up after patch "Removing critical thermal zones to prevent unexpected software system shutdown".
4. Fix sensor attribute mapping to be label based instead of index based to allow common handling of voltage regulator names independently of hardware changes.
5. Update 'lm-sensors' custom configuration file. Relevant only for users utilizing sensors.conf files coming along with hw-management package.
6. For full feature list please follow https://github.com/Mellanox/hw-mgmt/blob/V.7.0010.2300_BR/debian/Release.txt
- How I did it
Update hw-mgmt pointer
Remove unused patches
Fix existing patch to make sure it apply successfully
- How to verify it
Full platform regression on all mellanox platforms
- Why I did it
Changes in the new release:
1. Fix 10G and 50G speeds in SAI XML to support all interface types
2. Enable SMAC=DMAC and SMAC MC in tunnel debug counter
3. Add tunnel statistics
4. Add isolation group API implementation
5. Fix ACL ANY debug counter to correctly track ACL drops
6. Add VXLAN source port hard coded range, controlled by K/V
7. FW dump me now feature
8. Add mlxtrace to saidump
9. Speed lane setting and AN control
10. Implement query stats API
11. VNI miss part of tunnel decal drop reason
- How I did it
Update the version number in SAI make file, update the mlnx-sai submodule pointer.
- How to verify it
Run full regression tests on Mellanox platforms
Signed-off-by: Dror Prital <drorp@nvidia.com>
This is the SAI 4.3.3.5 code drop from BRCM to address 2 CSP case and initial MMU changes
Note the MMU changes is the same as that of SAI 4.3.3.4-1 (#7341) but with official patch.
- Case CS00012178716 [4.3] Polling SAI_QUEUE_STAT_SHARED_WATERMARK_BYTES fails often on TH3
- Case CS00012159273 [4.3.3.3] [TD3][IPFWD] subnet broadcast flooding end with extra VLAN Tag if member port of VLAN interface deleted then added back to VLAN
Once we have this PR merged, will validate each one of the above to ensure they are indeed fixed...
Preliminary tests looks fine. BGP neighbors were all up with proper routes programmed
interfaces are all up
Manually ran the following test cases on TD3 DUT and all passed:
ipfwd/test_dir_bcast.py
fib/test_fib.py
vxlan/test_vxlan_decap.py
decap/test_decap.py
fdb/test_fdb.py
New features and fixes in the new SDK/FW:
SN4600C | AN/LT support
SN2700 | AN/LT bugs fixes
WJH | FID_MISS support
Signed-off-by: Kebo Liu <kebol@nvidia.com>
#### Why I did it
- xcvrd crash was seen in latest 201811 images.
- For Dell S6100,API 2.0 uses poll mode while 1.0 was still using interrupt mode.
#### How I did it
- Modified get_transceiver_change_event in 1.0 to poll mode.
Modified the MRVL SAI debian version format to include debian revision number. This helps in identifying the SAI deb causing any build/runtime issue.
Signed-off-by: Rajkumar Pennadam Ramamoorthy <rpennadamram@marvell.com>
* 7260cx3 DualToR config.bcm support based on DualToR setting in device metadata at boot time.
For HWSKU Arista-7260CX3-C64 the MMU setting SOC for T0/T1 is also combined into the config.bcm.j2 logic so use just one config file and adding delta based on Switch Roles.
- Why I did it
Add missed files for dynamic buffer calculation for ACS-MSN3420 and ACS-MSN4410
- How I did it
asic_table.j2: Add mapping from platform to ASIC
Add buffer_dynamic.json.j2 for ACS-MSN4410.
- How to verify it
Check whether the dynamic buffer calculation daemon starts successfully.
Signed-off-by: Stephen Sun <stephens@nvidia.com>
- Why I did it
There was a change to replace platform utils with sonic platform API in psuutil. However, psu API is not initialized on host side. The PR is to fix it.
Backport of #7016 to the 202012 branch.
- How I did it
Initialize PSU API on both host and non-host side
- How to verify it
Manual test