Commit Graph

1402 Commits

Author SHA1 Message Date
gechiang
5d29a2c2a6
[202012] BRCM SAI 4.3.3.5-3 Enable VFP-based subintf on td2 (#7728)
Why I did it
This SAI change is to Enable TD2 platforms to also be able to handle VFP-based sub-interfaces.
This is the feature that is also needed for TD2 platforms.

How to verify it
Guohan and Team has validated this feature change on TD2 platforms
2021-05-27 15:11:03 -07:00
Aravind Mani
63354c0daa DellEMC: Z9332f fix SFP issues (#7677)
#### Why I did it
xcvrd crashes when the application advertisement capability flag is not seen for few transceivers.

#### How I did it
Initialize the additional application capability in dunder init
2021-05-24 22:27:26 +00:00
ec-michael-shih
ced3b3f310 [Platform] Accton add to support as9726-32d platform. (#7479)
Add support for Accton as9726-32d platform

This pull request is based on as9716-32d, so I reference as9716-32d to create new model: as9726-32d.
This module do not need led driver to control led, FPGA can handle it.
I also implement API2.0(sonic_platform) for this model, CPLD driver, PSU driver, Fan driver to control these HW behavior.
2021-05-24 22:09:24 +00:00
schobtr
86e78d231e [Haliburton-Celestica] Use psu driver dps200 from Linux kernel (#7247)
Why I did it
Fix issues below.
#7133
#6602

So, remove the dps200 driver from the platform-specific driver.
Then, add the dps200 module driver to the Linux kernel tree.

How I did it
Remove the dps200 driver from the platform-specific driver and add the dps200 module driver to the Linux kernel.

How to verify it
Build an image with Azure/sonic-linux-kernel#207
Then, install to the Haliburton.
2021-05-24 22:08:17 +00:00
Junchao-Mellanox
7a69e0ffb2 [mellaonox]: No need enable thermal zones in thermal_manager.deinitialize since they are enabled by default (#7556)
No need enable thermal zones in thermal_manager.deinitialize since they are enabled by default. And removing this will faster thermalctld exit speed
2021-05-24 22:07:14 +00:00
Aravind Mani
6d732b5857 Dell S6000,S6100 system health changes (#6788)
Needed support for platform system health in Dell platforms
2021-05-19 17:18:44 +00:00
gechiang
3eaadec21c
[202012] BRCM SAI 4.3.3.5-2 picked up BRCM hsdk-6.5.21-p1.patch and CS00012097141 fix (#7581)
This is to pick up BRCM SAI 4.3.3.5-2 which contains 2 main changes:
1.  hsdk-6.5.21-p1.patch (to address some field problems related to SDK issue.
2. Fix for CS00012097141 (remove some unnecessary debug setting that was causing non functional impacting problem at boot time)

Preliminary tests looks fine. BGP neighbors were all up with proper routes programmed
interfaces are all up
Manually ran the following test cases on S6100 DUT and all passed:
```
     ipfwd/test_dir_bcast.py
     fib/test_fib.py
     vxlan/test_vxlan_decap.py
     decap/test_decap.py
     fdb/test_fdb.py
```
2021-05-12 07:56:46 -07:00
Joe LeVeque
ed52c938c1 [brcm] Fix and simplify start_led.sh (#7548)
LED_PROC_INIT_SOC variable was incorrectly referenced as LED_SOC_INIT_SOC. Introduced in #5483

Rather than fixing the typo, I decided to simplify the script, removing the need for the conditional altogether by moving the bcmcmd call inside the conditional which checks for the presence of LED_SOC_INIT_SOC.
2021-05-10 16:01:16 -07:00
Aravind Mani
cf36209ae1 DellEMC: Fix Z9332f xcvrd crash (#7544) 2021-05-10 15:56:59 -07:00
Aravind Mani
11aa05b1d3
[202012][DellEMC] Z9332f PSU data is not updated in state DB (#7543)
#### Why I did it

- PSU data is loaded into state DB.
   Following errors are seen in syslogs:
  "Failed to update PSU data - '<=' not supported between instances of 'float' and 'str'"

- Issue is not seen in master image as the PSU API return type is different.

#### How I did it

- Changed the return type in PSU API's.
2021-05-06 11:53:42 -07:00
gechiang
c7540a8aba
[202012] BRCM SAI 4.3.3.5-1 Added Support for 100G special AN/LT mode (#7534) 2021-05-06 09:20:30 -07:00
Aravind Mani
5dc2860d37 DellEMC: Z9332f SFP enhancements (#7457)
#### Why I did it
400G media EEPROM and DOM information are not populated properly in DellEMC Z9332f platform.

#### How I did it
Handled QSFP_DD, QSFP28/QSFP+, SFP+ accordingly based on media type detected.
2021-05-05 13:48:30 -07:00
vpsubramaniam
7d98a3fe47 DellEMC: Z9332F - Watchdog support, add platform.json, new platform API implementation and fixes (#6988)
Incorporate the below changes in DellEMC Z9332F platform:

- Implemented watchdog platform API support
- Implement ‘get_position_in_parent’, ‘is_replaceable’ methods for all device types
- Change return type of SFP methods to match specification in sonic_platform_common/sfp_base.py
- Added platform.json file in device directory.

Co-authored-by: V P Subramaniam <Subramaniam_Vellalap@dell.com>
2021-05-05 13:47:03 -07:00
shlomibitton
ad05c98d34 [Mellanox] Update FW to xx.2008.2526 (#7511)
- Why I did it
Updated FW to xx.2008.2526 version.

Fixed issues:
1. Spectrum-2, Spectrum-3 | sFlow | High CPU load and high on fully loaded switch.
2. Spectrum-2, Spectrum-3 | Fine grain LAG | in rare cases doesn’t update the right entry

- How I did it
Updated submodule pointer and version in a Makefile.

- How to verify it
Full regression and bugs validation

Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
2021-05-05 09:36:14 -07:00
Junchao-Mellanox
b9680e9e25 [Mellanox] Adjust PSU fan name to align with sysfs file name (#7490)
Change PSU fan name from psu_{psu_index}fan{fan_index} to psu{psu_index}_fan{fan_index}
2021-05-05 09:35:31 -07:00
Stephen Sun
a554ddc91d [Mellanox] Adopt single way to get fan direction for all ASIC types (#7386)
#### Why I did it
Adopt a single way to get fan direction for all ASIC types.
It depends on hw-mgmt V.7.0010.2000.2303. Depends on https://github.com/Azure/sonic-buildimage/pull/7419

#### How I did it
Originally, the get_direction was implemented by fetching and parsing `/var/run/hw-management/system/fan_dir` on the Spectrum-2 and the Spectrum-3 systems. It isn't supported on the Spectrum system.
Now, it is implemented by fetching `/var/run/hw-management/thermal/fanX_dir` for all the platforms.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2021-05-05 09:34:25 -07:00
Volodymyr Samotiy
4152c3e337
[Mellanox] [202012] Update SAI submodule pointer (#7499)
- Why I did it
To include below changes:
Set monitoring VLAN hostif up dy default (for VNET ping tool)

- How I did it
Updated SAI submodule pointer

- How to verify it
Create VLAN hostif according to changes in PR: Azure/sonic-swss#1645
Verify it is admin up by default

Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2021-05-04 17:29:47 +03:00
guxianghong
a0fde3a626 [arm] support compile sonic arm image on arm server (#7285)
- Support compile sonic arm image on arm server. If arm image compiling is executed on arm server instead of using qemu mode on x86 server, compile time can be saved significantly.
- Add kernel argument systemd.unified_cgroup_hierarchy=0 for upgrade systemd to version 247, according to #7228
- rename multiarch docker to sonic-slave-${distro}-march-${arch}

Co-authored-by: Xianghong Gu <xgu@centecnetworks.com>
Co-authored-by: Shi Lei <shil@centecnetworks.com>
2021-05-02 08:11:56 -07:00
Stephen Sun
48908b1c5a Fix issue: exception occurred during chassis object being destroyed (#7446)
The following error message is observed during chassis object being destroyed

"Exception ignored in: <function Chassis.__del__ at 0x7fd22165cd08>
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/sonic_platform/chassis.py", line 83, in __del__
ImportError: sys.meta_path is None, Python is likely shutting down
The chassis tries to import deinitialize_sdk_handle during being destroyed for the purpose of releasing the sdk_handle.
However, importing another module during shutting down can cause the error because some of the fundamental infrastructures are no longer available."

This error occurs when a chassis object is created and then destroyed in the Python shell.

- How I did it
To fix it, record the deinitialize_sdk_handle in the chassis object when sdk_handle is being initialized and call the deinitialize handler when the chassis object is being destroyed

- How to verify it
Manually test.
2021-04-29 10:10:25 -07:00
Junchao-Mellanox
d4e8c3f666 [Mellanox] Upgrade hw-mgmt to 7.0100.2303 (#7419)
- Why I did it
Upgrade hw-mgmt to 7.0100.2303

Bug fixes

1. Fan direction feature fix for fixed FAN system (using shell instead of binutils/strings)
2. Remove cpld 4th link on systems with only 3 CPLD's
3. hw-mgmt: thermal: Add hardcoded critical trip point. Follow-up after patch "Removing critical thermal zones to prevent unexpected software system shutdown".
4. Fix sensor attribute mapping to be label based instead of index based to allow common handling of voltage regulator names independently of hardware changes.
5. Update 'lm-sensors' custom configuration file. Relevant only for users utilizing sensors.conf files coming along with hw-management package.
6. For full feature list please follow https://github.com/Mellanox/hw-mgmt/blob/V.7.0010.2300_BR/debian/Release.txt

- How I did it
Update hw-mgmt pointer
Remove unused patches
Fix existing patch to make sure it apply successfully

- How to verify it
Full platform regression on all mellanox platforms
2021-04-29 10:09:58 -07:00
Dror Prital
16dc30b944
[Mellanox] [202012] Update SAI version 1.18.3.0 (#7427)
- Why I did it
Changes in the new release:
1. Fix 10G and 50G speeds in SAI XML to support all interface types
2. Enable SMAC=DMAC and SMAC MC in tunnel debug counter
3. Add tunnel statistics
4. Add isolation group API implementation
5. Fix ACL ANY debug counter to correctly track ACL drops
6. Add VXLAN source port hard coded range, controlled by K/V
7. FW dump me now feature
8. Add mlxtrace to saidump
9. Speed lane setting and AN control
10. Implement query stats API
11. VNI miss part of tunnel decal drop reason

- How I did it
Update the version number in SAI make file, update the mlnx-sai submodule pointer.

- How to verify it
Run full regression tests on Mellanox platforms

Signed-off-by: Dror Prital <drorp@nvidia.com>
2021-04-26 20:44:36 +03:00
gechiang
4c43ecc81b
[202012] BRCM SAI 4.3.3.5 pick up 2 bug fixes and MMU changes (#7400)
This is the SAI 4.3.3.5 code drop from BRCM to address 2 CSP case and initial MMU changes
Note the MMU changes is the same as that of SAI 4.3.3.4-1 (#7341) but with official patch.

- Case CS00012178716 [4.3] Polling SAI_QUEUE_STAT_SHARED_WATERMARK_BYTES fails often on TH3
- Case CS00012159273 [4.3.3.3] [TD3][IPFWD] subnet broadcast flooding end with extra VLAN Tag if member port of VLAN interface deleted then added back to VLAN

Once we have this PR merged, will validate each one of the above to ensure they are indeed fixed...

Preliminary tests looks fine. BGP neighbors were all up with proper routes programmed
interfaces are all up
Manually ran the following test cases on TD3 DUT and all passed:

     ipfwd/test_dir_bcast.py
     fib/test_fib.py
     vxlan/test_vxlan_decap.py 
     decap/test_decap.py
     fdb/test_fdb.py
2021-04-22 08:34:52 -07:00
Rajkumar-Marvell
b8daa00ef0 [marvell] Move armhf syncd build from stretch to buster. (#7366)
Signed-off-by: Rajkumar Pennadam Ramamoorthy <rpennadamram@marvell.com>
2021-04-21 15:40:52 -07:00
Kebo Liu
2208c9212a [Mellanox] Update SDK to 4.4.2522 and FW to 2008.2520 (#7391)
New features and fixes in the new SDK/FW:

SN4600C | AN/LT support
SN2700 | AN/LT bugs fixes
WJH | FID_MISS support

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2021-04-21 14:05:56 -07:00
Aravind Mani
8832c10fb7 [Dell S6100]: Add dell ich driver (#7336)
dell_ich driver was removed as part of #7309 and it is needed for watchdog tickle in S6100 platform.
2021-04-21 14:02:27 -07:00
Aravind Mani
80fdb29957 Dell S6100: Modify transceiver change event from interrupt to poll mode (#7309)
#### Why I did it

- xcvrd crash was seen in latest 201811 images.
- For Dell S6100,API 2.0 uses poll mode while 1.0 was still using interrupt mode.

#### How I did it

- Modified get_transceiver_change_event in 1.0 to poll mode.
2021-04-21 14:01:53 -07:00
Rajkumar-Marvell
45cb76fe25 [Marvell] Updated armhf SAI deb version info. (#6863)
Modified the MRVL SAI debian version format to include debian revision number. This helps in identifying the SAI deb causing any build/runtime issue.

Signed-off-by: Rajkumar Pennadam Ramamoorthy <rpennadamram@marvell.com>
2021-04-18 08:29:20 -07:00
gechiang
fac5e204c4 7260cx3 DualToR config.bcm support based on DualToR setting in device metadata at boot time (#7168)
* 7260cx3 DualToR config.bcm support based on DualToR setting in device metadata at boot time. 
For HWSKU Arista-7260CX3-C64 the MMU setting SOC for T0/T1 is also combined into the config.bcm.j2 logic so use just one config file and adding delta based on Switch Roles.
2021-04-15 16:12:09 -07:00
gechiang
d273f3fbcb BRCM SAI 4.3.3.4 Pick up 8 major bug fixes (#7218) 2021-04-13 17:17:32 -07:00
Stephen Sun
1312feef1e Bug fix: Support dynamic buffer calculation on ACS-MSN3420 and ACS-MSN4410 (#7113)
- Why I did it
Add missed files for dynamic buffer calculation for ACS-MSN3420 and ACS-MSN4410

- How I did it
asic_table.j2: Add mapping from platform to ASIC
Add buffer_dynamic.json.j2 for ACS-MSN4410.

- How to verify it
Check whether the dynamic buffer calculation daemon starts successfully.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2021-04-08 18:36:27 +00:00
Junchao-Mellanox
d79a456a62
[Mellanox] Initialize PSU API on both host and docker side (#7075)
- Why I did it
There was a change to replace platform utils with sonic platform API in psuutil. However, psu API is not initialized on host side. The PR is to fix it.
Backport of #7016 to the 202012 branch.

- How I did it
Initialize PSU API on both host and non-host side

- How to verify it
Manual test
2021-04-07 20:36:02 +03:00
Wirut Getbamrung
27fe336d81 [device/celestica]: Fix failed test cases of DX010 platform APIs (#6564)
1. Add device/celestica/x86_64-cel_seastone-r0/platform.json 
2. Update functions to support python3.7
3. Add more functions follow latest sonic_platform_base
4. Fix the bug

Co-authored-by: 119064273 <2276096708@qq.com>
Co-authored-by: Eric Zhu <erzhu@celestica.com>
Co-authored-by: doni@celestica.com <doni@celestica.com>
2021-04-05 14:02:41 -07:00
Joe LeVeque
dd9be59cd1
[202012][dockers][supervisor] Increase event buffer size for process exit listener; Set all event buffer sizes to 1024 (#7203)
#### Why I did it

Backport of https://github.com/Azure/sonic-buildimage/pull/7083 to the 202012 branch.

To prevent error [messages](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802) like the following from being logged:

```
Mar 17 02:33:48.523153 vlab-01 INFO swss#supervisord 2021-03-17 02:33:48,518 ERRO pool supervisor-proc-exit-listener event buffer overflowed, discarding event 46
```

This is basically an addendum to https://github.com/Azure/sonic-buildimage/pull/5247, which increased the event buffer size for dependent-startup. While supervisor-proc-exit-listener doesn't subscribe to as many events as dependent-startup, there is still a chance some containers (like swss, as in the example above) have enough processes running to cause an overflow of the default buffer size of 10.

This is especially important for preventing erroneous log_analyzer failures in the sonic-mgmt repo regression tests, which have started occasionally causing PR check builds to fail. Example [here](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802).

I set all supervisor-proc-exit-listener event buffer sizes to 1024, and also updated all dependent-startup event buffer sizes to 1024, as well, to keep things simple, unified, and allow headroom so that we will not need to adjust these values frequently, if at all.
2021-04-01 12:52:19 -07:00
Joe LeVeque
23c7940787 [docker-gbsyncd-vs] Run new gbsyncdmgrd in lieu of deprecated gbsyncd_startup.py (#7154)
To improve management of docker-gbsyncd-vs. gbsyncd_startup.py simply spawned syncd processes and then exited. In that case, supervisord would no longer manage any processes in the container, and thus there was no way to know if a critical process had exited.

I recently created gbsyncdmgrd to be a more complete, robust replacement for gbsyncd_startup.py.

NOTE: This PR is dependent on the inclusion of gbsyncdmgrd in the sonic-sairedis repo. A submodule update is pending at
#7089
2021-03-31 21:39:50 -07:00
Mykola Gerasymenko
567eae6116 [barefoot]: Add psample module to load at boot time on BFN platform (#7164)
The psample module was not loaded on barefoot platform. The loading of this module is a prerequisite for testing SFlow.

* add `.gitignore` to the `barefoot` subdirectory to overwrite ignore "platform/**/debian/*" in the root directory
2021-03-31 08:48:53 -07:00
Stephen Sun
f080220801 [mellanox]: Integrate hw-mgmt package V.7.0010.2002 (#7148)
Integrate hw-management package V.7.0010.2002

Bug fixes:
Removing critical thermal zones to prevent unexpected software system shutdown:
*Kernel 4.9 -0071-mlxsw-core-Remove-critical-trip-point-from-thermal-z.patch
*Kernel 4.19 -076-mlxsw-core-Remove-critical-trip-point-from-thermal-z.patch
Removing redundant link for cpld3 for fixed systems (SN2100, SN2010).
Fix an issue with missed attribute for cpld3 (port CPLD) for SN2700, SN2410.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2021-03-31 08:47:39 -07:00
Myron Sosyak
0ec92765ba [barefoot]: Updated SDK packages to 20210324 (#7142)
Update unsupported SAI attr ('SAI_ACL_TABLE_ATTR_FIELD_OUTER_VLAN_ID') to fix issues on acl table create
2021-03-31 08:46:16 -07:00
arheneus@marvell.com
26ee7cc5f4 [marvell]: Marvell prestera kernel driver (#7066)
Build Marvell kernel driver for prestera sai sdk
Builds interrupt and dma kernel driver
Removed the older method pre-compiled kernel module debian package and its makefile
2021-03-31 01:00:07 -07:00
Volodymyr Samotiy
0269fc19c2 [Mellanox] Update SDK to 4.4.2508 and FW to xx.2008.2508 (#7141)
Fix the following issues:

Spectrum-2, Spectrum-3 | Port | Fix link issue when using 25 GbE rate between two ports while one is on Spectrum-2-based system and the other is on Spectrum-3-based system
All | warmboot | fail to upgrade from earlier SONiC versions with official SDK/FW 4.4.2306 (was on SONiC 201911)
All | What-Just-Happened | When enabling or disabling WJH under high traffic load to the host CPU, in very specific and low probability conditions, an error could occur, that may result in loss of data, channel failure or in extreme cases SW failure

Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2021-03-29 09:34:01 -07:00
vmittal-msft
88b0eabadd [broadcom]: Updated bcmsai to 4.3.3.3 (#7090)
To add latest SAI drop REL_4.3.3.3 to SONIC which addresses the following CSP cases:

CS00012058054: [4.3][IPinIP][TTL-PIPE] IPinIP TTL Pipe Mode is NOT working it is behaving UNIFORM mode even programed as PIPE mode
CS00011227466: [4.3] Warmboot support with tunnel encap
2021-03-29 09:33:30 -07:00
Volodymyr Samotiy
dce4ed5466 [Mellanox] Update FW to xx.2008.2424 (#7118)
Fixed issues:
* Mellanox SN-2700 breakout port not linking up with QSA

Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2021-03-26 17:40:47 +00:00
Junchao-Mellanox
e6a08d8638 Fix issue: should not initialize led color in __init__ file as platform API will be called by multiple daemons (#7114)
- Why I did it
The existing Fan led and Psu led object initialize itself to green color in init method. However, there are multiple daemons calls sonic platform API and there could be a case that:

A PSU is removed from system
Reboot switch
psud detects that 1 PSU is missing and set PSU led to red
Other daemon just start up and call sonic platform API, the API set PSU led to green by call PsuLed.init
This PR is a partial fix for the issue. As we also need guarantee that the led is initialized with a correct value. I checked existing psud and thermalctld code. psud always initialize the PSU led color on boot up, thermalcltd need some changes to initialize led color on the first run

- How I did it
Remove the led color initialization code from FanLed.init and PsuLed.init

- How to verify it
Manual test
2021-03-26 17:40:42 +00:00
Joe LeVeque
42e0ffb32f [docker-gbsyncd-vs] Run gbsyncd_startup.py directly (#7084)
Eliminate the need for `gbsyncd_start.sh`, which simply calls `exec "/usr/bin/gbsyncd_startup.py"`. The shell script is unnecessary.

Once this PR merges, we can remove `gbsyncd_start.sh` from the sonic-sairedis repo.
2021-03-26 17:39:35 +00:00
Kebo Liu
0282f4fd47 [Mellanox] Update SDK to 4.4.2418, FW to 2008.2416, SAI to new commit (#7041)
- Why I did it
To pick up new features and fix from SDK/FW and SAI

SDK/FW new Feature:

All | Added support for multiple modules and cable types. For full list contact Nvidia networking support
Spectrum-3 | SN46000C | Added support for up to 5W on ports 49 to 64 .
SDK/FW bugs' fix:

All | fast reboot | fast boot failure from latest 201811 to 201911 and above
Spectrum | 10GbE/1GbE Transceiver (FTLX8574D3BCV) stopped working after firmware upgrade
Spectrum-2 | When device is rebooted with locked Optical Transceivers in split mode, the firmware may get stuck
Spectrum-2 | SN3700 | When connecting at 200GbE to Ixia K400, Ixia receives CRC errors
Spectrum-2 | SN3800 | On rare occasions packets loss may be experienced due to signal integrity issues
Spectrum-2 | When the port is a member of a LAG, after a warmboot and port toggle on the peer-side, the port remains down
Spectrum-3 | SN4700 | While using Optic cable in Split 4x1 mode in PAM4, when two first ports are toggled, the other 2 ports go down
Spectrum-3 | SN4700 | When working in 400GbE, deleting the headroom configuration (changing buffer size to zero) on the fly may cause continual packet drops
SAI

All | sFlow | Use hardcoded value 1 as netlink group number ax expected by hsflowd
- How I did it
Update the related version number in the make files and update the submodule pointer accordingly.

- How to verify it
Run regression test and everything works good.
2021-03-15 19:18:48 -07:00
Volodymyr Samotiy
0a7cd215ce [Mellanox] Update SDK to 4.4.2318, FW to *.2008.2314 (#6794)
To have the following fixes:
* All | Port status remains down after warm boot and flapping the port on peer side
* All | LAG HASH  | IPv6 SRC_IP is not accounted in LAG hashing [
* All | ASIC driver | Kernel crash observed when driver reload is initiated before it fully loaded
* Spectrum-3 | Buffer | In lossless configuration, headroom is been evicted only when the shared buffers is free
* All | prevent FW access during ISSU

Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2021-03-15 19:18:33 -07:00
Volodymyr Boiko
8932678285 [platform][barefoot] Use urllib.parse.quote (#7010)
Fix Python 2 -> Python 3 migration issue

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-03-15 19:12:54 -07:00
Volodymyr Boiko
2e2c85222e [barefoot][platform] Extend sonic_platform psu.py (#7006)
Improve sonic_platform PSU support

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-03-15 19:12:33 -07:00
DavidZagury
66831f368b [Mellanox] Update MFT to 4.16.0-105 (#7007)
- Why I did it
Update MFT tool version to 4.16.0

Bugs fixes:
mlxlink: Fixed an issue that caused the margin scan to fail with the following message: Eye scan not completed.
mlxcable: Cable firmware burning capability is not supported.

New features:
mlxlink: Enabled margin scan on Network links.
mlxlink: Added PRBS TX/RX polarity inversion using the following flags: --invert_tx_polarity / --invert_rx_polarity

- How I did it
Update MFT make file with new version number.

- How to verify it
Build image and test related functions on Mellanox platform
2021-03-10 15:09:10 -08:00
Wirut Getbamrung
df89f6dcb6 [device/celestica]: Add xcvrd event support for Haliburton (#6517)
#### Why I did it
- The xcvrd service requires an event detection function, unplug or plug in the transceiver.

#### How I did it
- Add sysfs interrupt to notify userspace app of external interrupt
- Implement get_change_event() in chassis api.
- Also begin installing Python 3 sonic-platform package for Celestica platforms
2021-03-10 09:26:19 -08:00
gechiang
5f088e20af BRCM SAI 4.3.3.1-1 pick up Temp Patch to fix Dual TOR ACL issue CS00011559393 (#6980) 2021-03-10 09:25:53 -08:00
Samuel Angebault
e1f8c07d9e
[Arista] Update platform drivers (#6946)
- Provide `hw-management-generate-dump.sh` for `show techsupport`
 - Load `optoe3` for OSFP and QSFP-DD transceivers
 - Enhance reboot-cause caching robustness

Signed-off-by: Samuel Angebault <staphylo@arista.com>
2021-03-08 09:37:16 -08:00
Volodymyr Boiko
3f99959828 [platform][barefoot] Fix as9516bf installation (#6938)
To fix sonic_platform installation on as9516bf platform

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-03-04 21:58:53 +00:00
DavidZagury
fac295a023 [Mellanox] Add support for SN4600 system (#6879)
- Why I did it
Add support for new 64x200G SN4600 systems

- How I did it
Add all relevant files (w/o platform.json and hwsku.json as they will come later) with default SKU.

- How to verify it
Install image on switch, verify all ports are up and configured properly, run full platform SONiC tests.
2021-03-04 21:23:05 +00:00
Volodymyr Boiko
760b3361ae [barefoot][platform] Refactor legacy scripts (#6871)
Removed or adapted obsolete code

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-03-04 21:23:05 +00:00
Volodymyr Boiko
0d15e07d14 [barefoot][device][platform] Moved pcie.yaml (#6862)
To fix #6860

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-03-04 21:23:05 +00:00
Joe LeVeque
b4555b9dbd [Mellanox] Ensure concrete platform API classes call base class initializer (#6854)
In preparation for the merging of Azure/sonic-platform-common#173, which properly defines class and instance members in the Platform API base classes.

It is proper object-oriented methodology to call the base class initializer, even if it is only the default initializer. This also future-proofs the potential addition of custom initializers in the base classes down the road.
2021-03-04 21:23:05 +00:00
Joe LeVeque
88f4c6523d [DellEMC] Ensure concrete platform API classes call base class initializer (#6853)
In preparation for the merging of Azure/sonic-platform-common#173, which properly defines class and instance members in the Platform API base classes.

It is proper object-oriented methodology to call the base class initializer, even if it is only the default initializer. This also future-proofs the potential addition of custom initializers in the base classes down the road.
2021-03-04 21:23:05 +00:00
Arun Saravanan Balachandran
3d52e3f69a DellEMC: S5232, Z9264, Z9332 - Platform API fixes (#6842)
#### Why I did it

To incorporate the below changes in DellEMC S5232, Z9264, Z9332 platforms.

- Update thermal high threshold values
- Make watchdog API Python2 and Python3 compatible
- Fix LGTM alerts
- Z9264: Fix get_change_event timer value

#### How I did it

- Use 'universal_newlines=True' in subprocess.Popen call.
- Change the timeout in 'get_change_event' to milliseconds to match specification in sonic_platform_common/chassis_base.py
2021-03-04 21:23:05 +00:00
rkdevi27
926572ee82 [dell/s6000]: Enable graceful reboot in S6000 (#6835)
The S6000 devices, the cold reboot is abrupt and it is likely to cause issues which will cause the device to land into EFI shell. Hence the platform reboot will happen after graceful unmount of all the filesystems as in S6100.

Moved the platform_reboot to platform_reboot_override and hooked it to the systemd shutdown services as in S6100
2021-03-04 21:23:05 +00:00
shlomibitton
bc3a1de4c0 [Mellanox] Add hw-mgmt patch for SimX platform adaptation (#6782)
- Why I did it
System is stuck on 'starting' state on SimX platform because of infinite loop on 'hw-management-ready.sh' script .
The loop is polling to check if the hw-mgmt sysfs created before proceeding with the flow, for SimX platform the sysfs will never create so the system is not starting properly.

- How I did it
Add a condition to poll on hw-mgmt sysfs only if the switch is real HW and not SimX platform.

- How to verify it
Check "systemctl status hw-management.service" output on a SimX switch with this patch, the state will be "active".

Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
2021-03-04 21:23:05 +00:00
Eran Dahan
a21babef11 [Docker] Added support for python2 (#6753)
- Why I did it
Mellanox SDK APIs support python 2 at the moment.

- How I did it
Mellanox SDK APIs support python 2 at the moment.

- How to verify it
Add python 2 to Mellanox syncd only.

- Which release branch to backport (provide reason below if selected)
docker exec -t syncd /bin/bash -c "sx_api_dbg_generate_dump.py /home/sx_api_dbg_dump"
You can see that it will work and generate /home/sx_api_dbg_dump

Signed-off-by: allas <allas@nvidia.com>
2021-03-04 21:23:05 +00:00
Aravind Mani
4893666b1d DellEMC:Fix EEPROM read error (#6736)
#### Why I did it
EEPROM read failure was seen in Dell platforms

#### How I did it
Make python 2/3 compliant API's to fix the issue
2021-03-04 21:23:05 +00:00
Kebo Liu
5e44bc5e68 [Mellanox] Update hw-management package to version 7.0010.2000 (#6692)
- Why I did it
   Bug fixes
   - In rare cases when thermal algorithm is reactivated after FAN/PSU insertion, FAN remains at high rpm
   - When stop hw-management code received error in the log instead of exit code '0'.
   - In SPC1 i2c sometimes collide with chip reset coming from SDK
   - Remove raw eeprom data link, when working with PSU which don't have eeprom for "msn274x", "msn24xx" and "msn27xx" systems
   - Fix memory leak on mlxsw_core_bus_device module removal

- How I did it
Update the hw-mgmt version number in the make file
Update the hw-mgmt repo pointer

- How to verify it
run platform related test cases on all Mellanox platform

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2021-03-04 21:23:05 +00:00
ec-michael-shih
e13672b94a [Platform] Accton add to support as4630-54te platform. (#6683)
Add support for Accton as4630-54te platform
2021-03-04 21:23:05 +00:00
Sujin Kang
15aed52ef2 [pcie.yaml] Move pcie configuration file path to platform directory (#6475)
- Why I did it
The pcie configuration file location is under plugin directory not under platform directory.
#6437

- How I did it

Move all pcie.yaml configuration file from plugin to platform directory.
Remove unnecessary timer to start pcie-check.service
Move pcie-check.service to sonic-host-services
- How to verify it
Verify on the device
2021-03-04 21:23:05 +00:00
vmittal-msft
9b570cb04d Updated BCM SAI to latest 4.3.3.1 drop (#6947) 2021-03-04 09:20:19 -08:00
Samuel Angebault
a654518968
[Arista] Driver and platform update (#6468) (#6872)
- Add support for `DCS-7050SX3-48YC8` and `DCS-7050SX3-48C8` platform
 - Add support for more variants of `DCS-7280CR3-32[PD]4`
 - Add Supervisor to Linecard consutil support
 - Complete Watchdog platform API support
 - Fix some PSU behavior on `DCS-7050QX-32` and `DCS-7060CX-32S`
 - Fix SEU management on `DCS-7060CX-32S`
 - Allow kernel modules to build up to linux 5.10
 - Rename led color `orange` to `amber`
 - Miscellaneous fixes
2021-02-24 10:09:52 -08:00
roman_savchuk
024b173521 [build]: fixed BFN target build (#6826)
#### Why I did it
Fix runtime issues caused by SONiC update

#### How I did it
- new attribute SAI_ACL_ENTRY_ATTR_FIELD_ACL_IP_TYPE  supported 
- new attribute SAI_SWITCH_ATTR_AVAILABLE_IPMC_ENTRY supported

Signed-off-by: Roman Savchuk <romanx.savchuk@intel.com>
2021-02-23 23:56:01 +00:00
Volodymyr Samotiy
2b64e72fa8 [Mellanox][SAI] update submodule pointer (#6806)
Open ACL Outer VLAN ID for egress for ports part of VLAN RIF

- Why I did it
Open ACL Outer VLAN ID for egress for ports part of VLAN RIF

- How I did it
Updated SAI submodule pointer

- How to verify it
Build an image, deploy and check all is up and running.
Verify ACL sonic-mgmt test is passing

Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2021-02-23 23:56:01 +00:00
dflynn-Nokia
5661d20e21 [Nokia ixs7215] Platform API 2.0 improvements (#6787)
- Improve sonic-mgmt platform test suite pass rate
- Improve coverage of platform unit tests
- Provide platform specific reboot logic as per platform porting guide
- Fix bug due to pcie.yaml file being located in the wrong directory
2021-02-23 23:56:01 +00:00
Myron Sosyak
faf2fd3be1 [BFN] Fix MTU for internal interface (#6783)
Set correct MTU size of internal interface for Newport platform
2021-02-23 23:56:01 +00:00
Joe LeVeque
d7517a704c [PDDF] Build and install Python 3 package (#6286)
- Make PDDF code compliant with both Python 2 and Python 3
- Align code with PEP8 standards using autopep8
- Build and install both Python 2 and Python 3 PDDF packages
2021-02-23 23:56:01 +00:00
lguohan
a5085607b4 [syncd-vs]: remove hardcode version for iproute2 and libcap2-bin (#6713)
Fix #6711 

the requirement was introduced in commit 75104bb35d
to support sflow in stretch build. in buster build, the requirement
is met, no need to pin down the version.

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-02-17 10:31:51 -08:00
Joe LeVeque
27ca6040f1 [platform] Update path to udevprefix.conf file (#6779)
Azure/sonic-utilities#1431 changes the path to the udevprefix.conf file. The file previously inappropriately resided in the <platform>/plugins/ directory. That directory is reserved for now-deprecated Python platform plugins, and will be removed in the near future.
2021-02-16 15:32:42 -08:00
gechiang
5640958322 [broadcom]: BRCM SAI 4.3.0.13-1 Pick up BRCM Patch to fix bogus interface counters (#6775)
This PR is needed to fix the show interface counters output issue where the counters are not correct due to an issue introduced in BRCM SAI 4.3.

- How to verify it
Without the fix if one injects packets, the expected counters for the interfaces involved do not show correct count values. The RX count looks to be TX count while TX count looks to be RX count but even that the values could not be trusted.
After the fix the counters started to look correct. Here is one sample output taken after the fix is applied where I manually injected 10,000 packets into Ethernet92 to be routed out of the port channel member port Ethernet40. Also injected 10,000 invalid packets into the same Ethernet92 and all 10,000 packets were shown RX_DRP correctly.
2021-02-16 15:32:25 -08:00
Volodymyr Boiko
8b3813b637 [barefoot][sonic-platform] Fix sfp reset (#6746)
Fix wrong sfp reset return value

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-02-16 15:31:54 -08:00
Volodymyr Boiko
bcc4a52f56 [barefoot][sonic-platform] Refactor sfp.py (#6770)
Use separate file for each sfp eeprom operation

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-02-16 15:31:44 -08:00
Volodymyr Boiko
5aadc7ff55 [barefoot][sonic-platform] Fix get_system_eeprom_info and refactor eeprom.py (#6739)
Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-02-16 15:31:38 -08:00
carl-nokia
fce7f8a24f [Platform][ixs7215]: Platform API test required files with Updates and Improvements (#6738)
- Why I did it
Enable platform API tests to run successfully by providing required test infrastructure files along with supporting changes.

- How I did it
Added platform.json along with supporting changes.
- Addition of pcie.yaml supporting pcied
- Addition of Real fan drawer support vs Virtual
- Removal of python2 wheel with support in place for python3
- supporting changes platform api tests
2021-02-16 15:31:24 -08:00
Volodymyr Boiko
814cf03f98 [platform][barefoot] Fix sonic_platform for x86_64-accton_wedge100bf_65x-r0 (#6612)
platform modules' deb package rules for x86_64-accton_wedge100bf_65x-r0 was missing the code that installs the sonic_platform wheel package file under device directory on host (which is checked by pmon deamon on start)
2021-02-16 15:31:17 -08:00
vmittal-msft
8aa5fbbe79 [broadcom]: Upgrading bcmsai from 4.3.0.10-5 to 4.3.0.13 (#6767)
Merged bcmsai 4.3.0.13 code to top of master bcmsai 4.3.0.10-5.

- How to verify it
Ran nightly regression on T0 and T1 topology using bcmsai 4.3.0.13. Test results are better than previous runs.
For T0 -
New test passing - CRM, Decap, FDB, platform test, VxLAN
New test failing – PFC unknown MAC
For T1 –
New test passing – Port Channel
New test failing – platform test
2021-02-16 15:31:02 -08:00
Blueve
b92dfb15cb [Celestica][haliburton][cp210x] modprobe cp210x to ensure the driver loaded properly (#6715)
This PR is to fix issue: #6603
The CP210x driver not attached properly after first login (cold-plug), lsusb and dmesg shown that the usb devices has been recognized but driver is missing.
2021-02-16 15:29:46 -08:00
Arun Saravanan Balachandran
9f577512f8 Dell S6100: Watchdog - Fix Python3 incompatibility (#6734)
To make watchdog.arm() method python3 compatible in DellEMC S6100.
2021-02-16 15:29:36 -08:00
Joe LeVeque
dc2ea28b1c [sonic-platform-common] Update submodule (#6742)
Submodule commits included:

* src/sonic-platform-common 6ad0004...bd4dc03 (1):
  > [sonic_sfp/qsfp_dd.py] Update DOM capability method name to align with other drivers (#163)

Also align all calling function names to match.
2021-02-16 15:29:28 -08:00
Tamer Ahmed
becd143a41 [syncd-rpc] Install Libboost Atomic 1.71, Libqtcore And Libqtnetwork (#6689)
When Building syncd-rpc, libthrift has dependency on libboost-atomic1.71.0,
however the debian packager install version 1.67 instead. This PR
preinstalls libboost-atomic v 1.71 to avoid falling back to v 1.67.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2021-02-16 15:29:21 -08:00
jostar-yang
fb6e157ab3 [as7312-54x] Support platform API2.0 (#6272)
Add platform 2.0 support for Accton as7312-54x platform
2021-02-16 15:29:11 -08:00
lguohan
de4a675dd1 [saibcm-modules]: match linux kernel version (#6732)
Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-02-16 15:28:59 -08:00
vmittal-msft
b92c2f5150 [broadcom]: BRCM SAI 4.3.0.10-5 : Fix for ACL entry set attribute for IN_PORTS for TD3 (#6718)
ACL entry set attribute updates all the entries in the table. The correct behavior is to set the attribute on single entry.

- How I did it
Current SDK code, while setting the new attribute, is going through all the entries and updating it. Added a logic to check for requested entry and only allow for that ACL entry.
A case has filed with BRCM. Once an official fix is provided by BRCM, we will then remove this in house fix and apply the official fix.
2021-02-16 15:28:51 -08:00
Junchao-Mellanox
a04d7efddc Fix dynamic minimum fan table issue caused by python3 (#6690)
**- Why I did it**
After migrating to python3, the operator '/' always get a float result, but it gets integer result in python2. Need fix this in thermal_conditions.

**- How I did it**
1. cast float value to int
2. change the unit test case to cover this situation

**- How to verify it**
Manually test and regression test
2021-02-16 15:28:43 -08:00
lguohan
ab03441ce9 [sonic-linux-kernel]: security update to kernel 4.19.152 (#6490)
Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-02-16 15:28:34 -08:00
Roy Lee
191e90e400 [device/accton/as4630-54pe] Fix accton driver not been installed (#6321)
Accton util applies lsmod to check if drivers are installed.
But lsmod may return error on startup and skip module installation.

Signed-off-by: roy_lee <roy_lee@edge-core.com>
2021-02-16 15:28:19 -08:00
Aravind Mani
78ad83fe2a [DellEMC Z9332f] Added support for platform system health daemon (#6642) 2021-02-16 15:28:10 -08:00
Stepan Blyshchak
5fe8352978
[Mellanox][SAI] update submodule pointer (#6728)
- Apply device MAC on port host interface when port is removed from LAG.
- [Shared Headroom]: fixed watermark handling for SHP flow
- Decrease verbosity of policer unbind message when no policer is attached

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2021-02-10 23:13:06 -08:00
Volodymyr Boiko
742bbed255 [barefoot][platform] Fix sonic-platform host installation (#6696)
prerm is needed for platform modules package to be properly removed.
Added prerm to remove installed in postinst wheel packages.

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-02-06 23:32:27 -08:00
gechiang
eccff4bf17 BRCM SAI 4.3.0.10-4 Fix _brcm_sai_indexed_data_get () with unexpected queue causing _brcm_sai_switch_assert () after warm reboot (#6664) 2021-02-05 16:15:49 -08:00
gechiang
a3bdbb79c1 [broadcom]: broadcom sai update to 4.3.0.10-3 (#6620)
1. BRCM SAI Debian build need not have any Kernel version dependency - Starting with 4.3 BRCM made changes in SAI so that this dependency has been cleaned up. We can now remove the Kernel Version dependency from Azure Pipeline build script.

2. Bypass PEER_MODE p2mp setting causing SYNCd crash on non-TD3 SKUs - Temporarily patch BRCM SAI code to not cause SYNCd crash when Orchagent program SAI_TUNNEL_ATTR_PEER_MODE: SAI_TUNNEL_PEER_MODE_P2MP on Non-TD3 SKUs. Will remove this when BRCM provide proper fix to address this issue.
2021-02-05 16:13:08 -08:00
Mahesh Maddikayala
bc2a13136a [BCMSAI] Update BCMSAI debian to 4.3.0.10 with 6.5.21 SDK, and opennsl module to 6.5.21 (#6526)
BCMSAI 4.3.0.10, 6.5.21 SDK release with enhancements and fixes for vxlan, TD3 MMU, TD4-X9 EA support, etc.
2021-02-05 16:08:04 -08:00
Danny Allen
c7d8faee18 Revert "BRCM SAI 4.3.0.10-4 Fix _brcm_sai_indexed_data_get () with unexpected queue causing _brcm_sai_switch_assert () after warm reboot (#6664)"
This reverts commit 9f2a85697f.
2021-02-05 16:07:49 -08:00
Joe LeVeque
78bf8159e8 [platform] Update QSFP method name 'parse_qsfp_dom_capability' -> 'parse_dom_capability' (#6695)
**- Why I did it**
PR https://github.com/Azure/sonic-platform-common/pull/102 modified the name of the SFF-8436 (QSFP) method to align the method name between all drivers, renaming it from `parse_qsfp_dom_capability` to `parse_dom_capability`. Once the submodule was updated, the callers using the old nomenclature broke. This PR updates all callers to use the new naming convention.

**- How I did it**

Update the name of the function globally for all calls into the SFF-8436 driver.

Note that the QSFP-DD driver still uses the old nomenclature and should be modified similarly. I will open a PR to handle this separately.
2021-02-05 15:48:30 -08:00
Arun Saravanan Balachandran
0bae3b44ec DellEMC: S6100, S6000 - Enable thermalctld, Platform API implementation and fixes (#6438)
**- Why I did it**

To incorporate the below changes in DellEMC S6100, S6000 platforms.

- S6100, S6000:
    - Enable 'thermalctld'
    - Implement DeviceBase methods (presence, status, model, serial) for Fantray and Component
    - Implement ‘get_position_in_parent’, ‘is_replaceable’ methods for all device types
    - Implement ‘get_status’ method for Fantray
    - Implement ‘get_temperature’, ‘get_temperature_high_threshold’, ‘get_voltage_high_threshold’, ‘get_voltage_low_threshold’ methods for PSU
    - Implement ‘get_status_led’, ‘set_status_led’ methods for Chassis
    - SFP:
        - Make EEPROM read both Python2 and Python3 compatible
        - Fix ‘get_tx_disable_channel’ method’s return type
        - Implement ‘tx_disable’, ‘tx_disable_channel’ and ‘set_power_override’ methods
- S6000:
    - Move PSU thermal sensors from Chassis to respective PSU
    - Make available the data of both Fans present in each Fantray


**- How I did it**

- Remove 'skip_thermalctld:true' in pmon_daemon_control.json
- Implement the platform API methods in the respective device files
- Use `bytearray` for data read from transceiver EEPROM 
- Change return type of 'get_tx_disable_channel' to match specification in sonic_platform_common/sfp_base.py
2021-02-05 15:48:13 -08:00
Lior Avramov
0244069666 [Mellanox] Update FW upgrade script to use 'mlxfwmanager -d' option for specifying MST device in FW burn operation (#6541)
**- Why I did it**
Reduce the time it takes for the ASIC FW burn as part of the automatic FW upgrade procedure.

**- How I did it**
Add -d option to mlxfwmanager tool to use the faster MST device and not the default one which is not the fastest one.

**- How to verify it**
I manually changed ASIC FW followed by reboot command in order for FW upgrade to take place on deinit.
I manually changed ASIC FW followed by hard reset in order for FW upgrade to take place on init.

Signed-off-by: liora <liora@nvidia.com>
2021-02-05 15:47:40 -08:00
xumia
3a7441c913 [build]: Fix syncd dpkg cache dependency issue (#6680)
* Fix syncd dpkg cache dependency issue
2021-02-05 15:47:28 -08:00
Eran Dahan
d7e9cba966 [MLNX] update SAI submodule to include fix for debug dump (#6667)
**Why I did it**
Disable SDK extended dump due to issue found

**How I did it**
Update SAI submodule

**How to verify it**
Verify the SDK extended dump is not called.

Signed-off-by: Eran Dahan <erand@nvidia.com>
2021-02-05 15:47:03 -08:00
gechiang
9f2a85697f BRCM SAI 4.3.0.10-4 Fix _brcm_sai_indexed_data_get () with unexpected queue causing _brcm_sai_switch_assert () after warm reboot (#6664) 2021-02-05 15:46:37 -08:00
Volodymyr Boiko
d26a4aff9a [platform][barefoot] Install sonic_platform to host (#6644)
- Why I did it
SONiC design requires sonic_platform package to be installed in SONiC host environment, not only in docker containers.

- How I did it
For now, sonic_platform python wheel package, that is used by pmon, is provided via device-specific platform modules deb packages that unpacks the wheel package file into specific device's directory on lazy-install.
The PR makes deb packages' postinst script also install these unpacked wheel packages to host.

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-02-03 10:48:53 -08:00
Stephen Sun
bf8a76634c [syncd-rpc docker] Fix issue: ptf_nn_agent isn't able to start in syncd-rpc docker on buster (#6448)
- Why I did it
Fix issue: ptf_nn_agent isn't able to start in syncd-rpc docker on buster.

- How I did it
The issue is fixed by installing python-dev, cffi and nnpy for python 2 explicitly.

- How to verify it
Run copp test on RPC image.
2021-02-03 10:43:48 -08:00
gechiang
56a689cf30 [broadcom]: Fix BRCM Syncd Error:syncd#/supervisord: syncd sh: 1: ethtool: not found (#6615)
Starting with BRCM SAI 4.3.1.5 we see the following :ethtool not fount" error in syslog during boot up:
```
Jan 27 07:36:14.712472 str-s6100-acs-1 INFO syncd#/supervisord: syncd sh: 1:
Jan 27 07:36:14.712844 str-s6100-acs-1 INFO syncd#/supervisord: syncd ethtool: not found
Jan 27 07:36:14.713228 str-s6100-acs-1 INFO syncd#/supervisord: syncd #015
Jan 27 07:36:14.713840 str-s6100-acs-1 INFO syncd#syncd: [0] SAI_API_HOSTIF:_brcm_sai_hostif_speed_set:11894 cmd ethtool -s Ethernet39 speed 40000 rc:32512
Jan 27 07:36:14.717204 str-s6100-acs-1 NOTICE swss#orchagent: :- setHostIntfsOperStatus: Set operation status DOWN to host interface Ethernet39
Jan 27 07:36:14.717204 str-s6100-acs-1 NOTICE swss#orchagent: :- initPort: Initialized port Ethernet39
Jan 27 07:36:14.717204 str-s6100-acs-1 NOTICE swss#orchagent: :- initializePort: Initializing port alias:Ethernet36 pid:1000000000040
Jan 27 07:36:14.726793 str-s6100-acs-1 NOTICE swss#portsyncd: :- onMsg: nlmsg type:16 key:Ethernet36 admin:0 oper:0 addr:4c:76:25:f5:48:80 ifindex:75 master:0
Jan 27 07:36:14.727967 str-s6100-acs-1 NOTICE swss#portsyncd: :- onMsg: Publish Ethernet36(ok) to state db
Jan 27 07:36:14.729331 str-s6100-acs-1 NOTICE swss#orchagent: :- addHostIntfs: Create host interface for port Ethernet36
Jan 27 07:36:14.752398 str-s6100-acs-1 INFO syncd#/supervisord: syncd sh: 1: ethtool: not found#015
Jan 27 07:36:14.752689 str-s6100-acs-1 INFO syncd#syncd: [0] SAI_API_HOSTIF:_brcm_sai_hostif_speed_set:11894 cmd ethtool -s Ethernet36 speed 40000 rc:32512
Jan 27 07:36:14.756050 str-s6100-acs-1 NOTICE swss#orchagent: :- setHostIntfsOperStatus: Set operation status DOWN to host interface Ethernet36
Jan 27 07:36:14.757585 str-s6100-acs-1 NOTICE swss#orchagent: :- initPort: Initialized port Ethernet36
```
It seems that starting with BRCM SAI 4.2.1.5 syncd is using ethtool to set the host interface speed and since this ethtool was not part of the syncd Docker, we observe these "ethtool not found" issue.
2021-02-03 10:43:48 -08:00
Volodymyr Boiko
3e70b97342 [barefoot][platform] platform API 2.0 fixes (#6607)
To improve python3 support of berefoot's sonic_platform

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-02-03 10:43:48 -08:00
Joe LeVeque
d146079362 [docker-sonic-vs] Install sonic-platform-common package (#6587)
**- Why I did it**

sonic-utilities will become dependent upon sonic-platform-common as of https://github.com/Azure/sonic-utilities/pull/1386.

**- How I did it**

- Add sonic-platform-common as a dependency in docker-sonic-vs.mk
- Additionally, no longer install Python 2 packages of swsssdk and sonic-py-common, as they should no longer be needed.
2021-02-03 10:37:57 -08:00
yozhao101
cc9c3f567e [supervisord] Monitoring the critical processes with supervisord. (#6242)
- Why I did it
Initially, we used Monit to monitor critical processes in each container. If one of critical processes was not running
or crashed due to some reasons, then Monit will write an alerting message into syslog periodically. If we add a new process
in a container, the corresponding Monti configuration file will also need to update. It is a little hard for maintenance.

Currently we employed event listener of Supervisod to do this monitoring. Since processes in each container are managed by
Supervisord, we can only focus on the logic of monitoring.

- How I did it
We borrowed the event listener of Supervisord to monitor critical processes in containers. The event listener will take
following steps if it was notified one of critical processes exited unexpectedly:

The event listener will first check whether the auto-restart mechanism was enabled for this container or not. If auto-restart mechanism was enabled, event listener will kill the Supervisord process, which should cause the container to exit and subsequently get restarted.

If auto-restart mechanism was not enabled for this contianer, the event listener will enter a loop which will first sleep 1 minute and then check whether the process is running. If yes, the event listener exits. If no, an alerting message will be written into syslog.

- How to verify it
First, we need checked whether the auto-restart mechanism of a container was enabled or not by running the command show feature status. If enabled, one critical process should be selected and killed manually, then we need check whether the container will be restarted or not.

Second, we can disable the auto-restart mechanism if it was enabled at step 1 by running the commnad sudo config feature autorestart <container_name> disabled. Then one critical process should be selected and killed. After that, we will see the alerting message which will appear in the syslog every 1 minute.

- Which release branch to backport (provide reason below if selected)

 201811
 201911
[x ] 202006
2021-01-28 09:28:27 -08:00
Kebo Liu
7fc8caa36b [mellanox]: Update SAI to sonic2012 1.18.1.0 (#6566)
Changes in the new release:

1. Policy based hashing optimization
2. New attribute support for Max port headroom
3. Tunnel ECN map fixes
4. Tunnel EVPN skeleton extensions (peer attrib, maps)
5. Bridge port admin not affecting port admin (optimize port down time)
6. CRM new API for neighbors and tunnel termination entries
7. Improve FDB event for flush by bridge port (before, null bridge was reported to SONiC, now the bridge will be extracted from bridge port)
8. DHCP L2 v4+v6 traps (for ZTP use case)
9. Generic counter implementation

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2021-01-28 09:25:05 -08:00
Guohan Lu
11a8e89f27 [build]: add _BUILD_ENV to specify env for dpkg-buildpackage
Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-28 09:23:36 -08:00
Guohan Lu
0b2abafcde [docker-ptf]: build docker ptf
- combine docker-ptf-saithrift into docker-ptf docker
- build docker-ptf under platform vs
- remove docker-ptf for other platforms

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-28 09:23:12 -08:00
Kebo Liu
3706b31b49 Add hw-mgmt patch to support SDK OFFLINE event for handling flow within service firmware upgrade (#6550)
During ISSU, "mlxsw_minimal" driver still trying to access firmware, in some cases FW could return some wrong critical threshold value which will cause switch shutdown.

**- How I did it**
In order to prevent "mlxsw_minimal" driver from accessing ASIC during ISSU, SDK will raise "OFFLINE" 'udev' event
at the early beginning of such flow. When this event is received, hw-management will remove "mlxsw_minimal" driver.
There is no need to implement the opposite "ONLINE" event since this flow is ended up with "kexec".

**- How to verify it**
repeatedly perform warm reboot, make sure there is no switch shutdown occurred.
2021-01-28 09:22:52 -08:00
Kebo Liu
a0fd862620 [mellanox]: Update SDK to 4.4.2308, FW to *.2008.2308 (#6552)
Bugs fixes:
    All | Kernel | During system reload when CPU is loaded with heavy traffic, a Kernel Panic may occur.
    All | Modules, Port split | FW stuck when device rebooted with locked Optical Transceivers in split mode
    Spectrum-3 | PFC | On Spectrum-3 systems, slow reaction time to Rx pause packets on 40GbE ports may lead to buffer overflow on servers.
    Spectrum-3 | SN4700, Port Split | On rare occasion SN4700, conducting 100G split (4x25G) in NRZ when splitter port 1 or 2 are down, ports 3 and 4 will also go down.

Enahncments:
    All | Kernel | new notification on ISSU start, so other kernel drivers can disable any interface to ASIC

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2021-01-28 09:21:56 -08:00
Antonina Melnyk
df76e33bb3 [barefoot] Fixes for platform API (#6487)
There was a mismatch with Eeprom class methods names and methods called from Eeprom class.

Signed-off-by: Antonina Melnyk antoninax.melnyk@intel.com
2021-01-24 22:42:31 -08:00
Danny Allen
bedd05bb43 [DellEMC Z9332f] Remove duplicate ipmihelper.py script (#6536)
Fixes #6445

Because the ipmihelper.py script in the 9332 folder is slightly different than the common one (due to LGTM fixes), when the common one gets copied during build time it causes the workspace/build to become dirty.

Signed-off-by: Danny Allen <daall@microsoft.com>
2021-01-23 21:05:55 -08:00
Qi Luo
28b62bee3f sonic-config-engine uses libswsscommon instead of swsssdk (#6406)
**- Why I did it**
swsssdk will be deprecated. Migrate sonic-config-engine to use libswsscommon library instead

**- How to verify it**
Unit test
2021-01-22 10:56:13 -08:00
lguohan
9acbc591e1 [mellanox]: fix mellanox hw-management build (#6471)
use dpkg-buildpackage build with fakeroot

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-19 01:11:42 -08:00
Kebo Liu
824d7adc2d [Mellanox] Make determine-reboot-cause service start after hw-management service (#6465)
**- Why I did it**

On the Mellanox platform, reboot cause is fetched from some certain sysfs which is created by the hw-management service. So determine-reboot-cause service shall start after hw-management, otherwise it could fail due to the related sysfs is not available yet.

**- How I did it**

Add a patch to the hw-management service to make sure determine-reboot-cause service should start after it.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2021-01-19 01:10:08 -08:00
Wirut Getbamrung
120a8da50d [device/celestica]: Add thermalctld support on DX010 platform APIs (#6089)
**- Why I did it**
- The thermalctld daemon on the Pmon docker requires support from the thermal manager API.

**- How I did it**
- Removed the old function for detecting a faulty fan.
- Removed the old function for detecting excess temperature.
- Implement thermal_manager APIs based on ThermalManagerBase
- Implement thermal_conditions APIs based on ThermalPolicyConditionBase
- Implement thermal_actions APIs based on ThermalPolicyActionBase
- Implement thermal_info APIs based on ThermalPolicyInfoBase
- Add thermal_policy.json
2021-01-19 01:09:54 -08:00
brandonchuang
c40e43aadb [device/accton] Fix accton driver not been installed (#6327)
Accton util applies lsmod to check if drivers are installed.
But lsmod may return error on startup and skip module installation.

Signed-off-by: Brandon Chuang <brandon_chuang@edge-core.com>
2021-01-15 08:23:04 -08:00
Roy Lee
29562d0a4b [device/accton]: As7816-64x, fix memory leakage on accton fan monitor. (#6168)
It's been reported that accton fan monitor process keeps consuming memory after few days.
The amount of memory occupied increases in linear and never leased.

Signed-off-by: roy_lee <roy_lee@edge-core.com>
2021-01-15 08:21:13 -08:00
Kebo Liu
21d4df3dcd [mellanox][platform api] fix a missing import time module (#6458)
“time" module was missed to be imported and will cause an error when the branch hit.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2021-01-15 08:20:57 -08:00
Kalimuthu-Velappan
e6de7d3286 [build]: Fix for missing dependencies in the DPKG framework (#6393)
1. Fixes the missing DPKG file for gbsyncd-vs package
2. Fixes the softlink issue on the Platform-common and ztp package
3. Fixes the PYTHNON_DEBS list is missing for DBG dockers.
2021-01-15 08:18:20 -08:00
Junchao-Mellanox
78ca4d1c1a [Mellanox] Fix issue: need import initialize_sdk_handle in get_sdk_handle (#6435)
Found test_sfp.py failed due to use a method without importing it.
2021-01-15 08:17:50 -08:00
dflynn-Nokia
533b7cc676 [Nokia ixs7215] Add SW assist for platform entropy & fix inband mgmt support (#6417)
- Improve random number generation during early Sonic initialization by providing SW updates to Linux entropy value.
- Improve handling of platform In-Band management port

This commit provides the following updates to the Nokia ixs7215 platform

1. The Marvell Armada-38x SOC requires SW assistance to improve the system
   entropy value available early on in the Sonic boot sequence.
2. The Nokia ixs7215 platform does not have a dedicated Out-Of-Band (OOB) mgmt
   port and thus requires additional logic to optionally support configuring
   front panel port 48 as an In-Band mgmt port. This commit provides additional
   logic to manage and maintain the operation of this In-Band mgmt port.
2021-01-15 08:16:46 -08:00
carl-nokia
d2f684b05c [Platform][nokia]: python3-smbus package add with python3 and jinja fixes (#6416)
fix platform driver breakage due to python3 upgrade and fix load minigraph errors with config load_minigraph -y

**- How I did it**
added python3-smbus to the pmon docker template since the previous was python2 specific 
fixed additional "ord" python2 specific code 
fixed the jinja templates used by qos reload - the template logic required data to be parsed 

**- How to verify it**
run "show platform XXX" commands and verify output
run "sudo config load_minigraph -y" and verify configuration 
run "show interfaces XXX" and verify output 

Co-authored-by: Carl Keene <keene@nokia.com>
2021-01-15 08:16:32 -08:00
guxianghong
9f89da15ba [Centec] upgrade to buster docker for DOCKER_SYNCD_CENTEC_RPC, docker-saiserver-centec and platform-modules (#6423)
Centec syncd have beend upgraded to buster, docker-syncd-centec-rpc do not need generate stretch based docker.

Co-authored-by: Xianghong Gu <xgu@centecnetworks.com>
2021-01-15 08:16:25 -08:00
Kebo Liu
4011e0f559 [Mellanox] [platform API] Fix “local variable 'label_port' referenced before assignment” error (#6419)
In rare case can see that xcvrd failed due to "UnboundLocalError: local variable 'label_port' referenced before assignment"

Init "label_port" as None at the beginning of the function, to avoid the case that "label_port" not assigned.
2021-01-15 08:16:06 -08:00
gechiang
6d9b05c032 Anchor the libprotobuf-dev version based on a fixed version by using debian control dependency (#6420) 2021-01-15 08:15:26 -08:00
lguohan
45b724fe76 [build]: fix dpkg admindir corruption issue in parallel build (#6408)
Fix #119

when parallel build is enable, multiple dpkg-buildpackage
instances are running at the same time. /var/lib/dpkg is shared
by all instances and the /var/lib/dpkg/updates could be corrupted
and cause the build failure.

the fix is to use overlay fs to mount separate /var/lib/dpkg
for each dpkg-buildpackage instance so that they are not affecting
each other.

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-12 06:22:51 -08:00
guxianghong
2ae182623c [Centec ARM64]Upgrade Centec syncd docker to buster and Enable Telemetry on ARM64 (#6386)
* Enable telemetry for ARM64 by default

* [Centec]Upgrade Centec syncd docker to buster; libjemalloc2 have been installed in docker-base-buster, remove libjemalloc1 from docker-syncd-centec's Dockerfile.j2

Co-authored-by: Gu Xianghong <xgu@centecnetworks.com>
2021-01-09 08:29:36 -08:00
gechiang
7462850ba4 [brcm]: BRCM SAI 4.2.1.5-9 Fix _brcm_sai_indexed_data_get () with unexpected queue causing _brcm_sai_switch_assert () after warm reboot (#6374)
Starting from build (master) 176 the warm reboot on BRCM Platform started to experience syncd crash. Upon further debug by Ying it was determined that the crash was related to the following new change:
[Dynamic buffer calc] Support dynamic buffer calculation (#1338)

Ying also debugged further and found The crash was caused by buffer pool profile setting operation SAI_BUFFER_PROFILE_ATTR_SHARED_DYNAMIC_TH

A case has filed with BRCM while a potential fix was tried by Ying that seems to have addressed this issue and we are making this change available in master branch so that it will allow further feature validation/testing especially in the warm reboot area.
Once an official fix is provided by BRCM, we will then remove this in house fix and apply the official fix.

- How to verify it
Just perform warm reboot with any master code 175 or above you should see this issue or issue the following cmd will also cause the crash: "mmuconfig -p egress_lossy_profile -a 0"
2021-01-09 08:27:16 -08:00
carl-nokia
941c27ce2a [Nokia]: Enable Telemetry for armhf and provide required qos files (#6364)
* [platform][Nokia]: Add buffers and qos files for config qos reload

   - providing required files

* [platform][armhf]: remove hardcoded disable for Telemetry on armhf

Co-authored-by: Carl Keene <keene@nokia.com>
2021-01-09 08:26:52 -08:00
Aravind Mani
7305b55c80 DellEMC: Z9332f change SFP detection logic (#6261)
- Dynamically change EEPROM driver based on media type.
- Otherwise, EEPROM INFO and DOM INFO might not be fetched properly and will result in erroneous output.
2021-01-09 08:25:58 -08:00
Samuel Angebault
c6f14c9927
[202012][Arista] Update driver submodules (#6397)
- Cleanup and Refactor of library internals, logic mostly unchanged.
 - Enhance debugability with `arista dump` and `arista diag` commands.
 - Fix power supply detection issue.
2021-01-09 08:05:54 -08:00
lguohan
dc94373a86 [docker-sonic-vs]: reduce the build steps for docker-sonic-vs (#6350)
combine multiple same operation into one operation to reduce
the build steps. this is to avoid max depth exceeded issue
in the build.

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-04 23:25:15 -08:00
Arun Saravanan Balachandran
1b049421aa [DellEMC] Add platform-modules as prerequisite for determine-reboot-cause (#6322)
Add a systemd dependency to make platform-modules service as a prerequisite for determine-reboot-cause service to ensure platform initialization is complete before determine-reboot-cause.service executes.
2021-01-04 23:25:07 -08:00
Myron Sosyak
f7bb635be8 [BFN] Convert platform modules to python 3 (#6347)
Fix syntax errors during xcvrd start with Python 3 daemons
2021-01-04 23:24:19 -08:00
Denys Petryshyn
42fa096883 [BFN] Upgrade docker-syncd-bfn to buster (#6345)
* Add changes to allow migration of bfn syncd to buster

* Update BFN packages for Debian 10

Signed-off-by: Denys Petryshyn <denysx.petryshyn@intel.com>
2021-01-04 23:23:46 -08:00
lguohan
07b9282456 [broadcom]: match the brcm sai filename version to control file version (#6339)
the control file version is 4.2.1.5-7

sonic$ sudo dpkg -i target/debs/buster/libsaibcm-dev_4.2.1.5-8_amd64.deb
(Reading database ... 175880 files and directories currently installed.)
Preparing to unpack .../libsaibcm-dev_4.2.1.5-8_amd64.deb ...
Unpacking libsaibcm-dev (4.2.1.5-7) over (4.2.1.5-7) ...
Setting up libsaibcm-dev (4.2.1.5-7) ...
lgh@491d842369cf:/sonic$ sudo dpkg -i target/debs/buster/libsaibcm_4.2.1.5-8_amd64.deb
(Reading database ... 175880 files and directories currently installed.)
Preparing to unpack .../libsaibcm_4.2.1.5-8_amd64.deb ...
Unpacking libsaibcm (4.2.1.5-7) over (4.2.1.5-7) ...
Setting up libsaibcm (4.2.1.5-7) ...
Processing triggers for libc-bin (2.28-10) ...

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-03 08:47:41 -08:00
Kebo Liu
3acf7006ed
[mellanox]: Update Mellanox SDK to 4.4.2208 FW to *.2008.2208 (#6333)
Features:
    Spectrum-3 | Systems | Added GA-level support for SN4700 A0 system
    All | Shared headroom | Added GA-level support for Shared headroom between PGs

  Bugs fixes:
    All | Counters | Sent traffic in certain size is wrongly increase to a smaller size counter, because port extended counter has a counter for sent traffic per packet-size range
    All | Shared buffer | Configuring shared buffer on the fly may, on occasion, cause the chip to get stuck
    Spectrum-2 | Modules | On occasion, link down is experienced with INPHI COLORZ PAM4 100G optic cables on SN3700 systems
2020-12-31 17:44:02 -08:00
carl-nokia
a1fe203788
[Nokia]: EEPROM platform API Python3 compliance changes (#6318)
- Why I did it
Make EEPROM platform APIs Python3 compliant in Nokia platform.

- How I did it
Handle bytearray type returned by read_eeprom and read_eeprom_bytes methods.

- How to verify it
Boot Nokia ixs7215 and verify PMON docker running and show platform syseeprom

Co-authored-by: Carl Keene <keene@nokia.com>
2020-12-30 07:34:57 -08:00
Ubuntu
273846a412 FRR 7.5
Build libyang1 which is required for frr 7.5
2020-12-29 03:44:49 -08:00
Kebo Liu
9c06df8e1c
update mft tool to 4.15.3 (#6281) 2020-12-27 11:19:13 +02:00
Prince Sunny
8fd50e895c
[submodule]: swss Tunnel Manager changes (#5843)
Introduce tunnel manager daemon. Start the process as part of swss container

Submodule update for swss:
9ed3026 - 2020-12-24 : [NAT] ACL Rule with DO_NOT_NAT action is getting failed. (#1502) [Akhilesh Samineni]
c39a4b1 - 2020-12-23 : Mux/IPTunnel orchagent changes (#1497) [Prince Sunny]
bc8df0e - 2020-12-23 : Add support for headroom pool watermark (#1567) [Neetha John]
2020-12-26 11:17:18 -08:00
Joe LeVeque
d40c9a1e8d
[docker-base-buster][docker-config-engine-buster] No longer install Python 2 (#6162)
**- Why I did it**

As part of migrating SONiC codebase from Python 2 to Python 3

**- How I did it**

- No longer install Python 2 in docker-base-buster or docker-config-engine-buster.
- Install Python 2 and pip2 in the following containers until we can completely eliminate it there:
    - docker-platform-monitor
    - docker-sonic-mgmt-framework
    - docker-sonic-vs
- Pin pip2 version <21 where it is still temporarily needed, as pip version 21 will drop support for Python 2
- Also preform some other cleanup, ensuring that pip3, setuptools and wheel packages are installed in docker-base-buster, and then removing any attempts to re-install them in derived containers
2020-12-25 21:29:25 -08:00
Andriy Kokhan
94e143cebe
[BFN] Updated SAI headers to v1.7.1 (#6294)
Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>

Co-authored-by: Andriy Kokhan <andriyx.kokhan@intel.com>
2020-12-24 18:45:41 -08:00
KISHORE KUNAL
4bb8ab3495
Add support to start fdbsyncd when orchagent docker starts (#5979)
Add support to start fdbsyncd when swss docker starts. 
New demon is added to sync MAC from Kernel to DB and vise versa.
2020-12-24 18:36:01 -08:00