Commit Graph

1310 Commits

Author SHA1 Message Date
Samuel Angebault
b6fc13b89b
[Arista] Update platform drivers (#6945)
- Provide `hw-management-generate-dump.sh` for `show techsupport`
 - Load `optoe3` for OSFP and QSFP-DD transceivers
 - Enhance reboot-cause caching robustness

Signed-off-by: Samuel Angebault <staphylo@arista.com>
2021-03-03 09:09:10 -08:00
vmittal-msft
6eb9a9d0eb
Updated BCM SAI to latest 4.3.3.1 drop (#6947) 2021-03-03 09:07:53 -08:00
fk410167
20f0f069c1
Making PDDF 2.0 base classes python3 compliant (#6924)
- Made python2 to python3 changes
- Removed ord() func as python3 return int instead of str
- Had to change chr(..) to bytes([..]) function while using ctypes class methods
2021-03-01 09:48:59 -08:00
Kebo Liu
0e71d82f72
[Mellanox] Update hw-management package to version 7.0010.2000 (#6692)
- Why I did it
   Bug fixes
   - In rare cases when thermal algorithm is reactivated after FAN/PSU insertion, FAN remains at high rpm
   - When stop hw-management code received error in the log instead of exit code '0'.
   - In SPC1 i2c sometimes collide with chip reset coming from SDK
   - Remove raw eeprom data link, when working with PSU which don't have eeprom for "msn274x", "msn24xx" and "msn27xx" systems
   - Fix memory leak on mlxsw_core_bus_device module removal

- How I did it
Update the hw-mgmt version number in the make file
Update the hw-mgmt repo pointer

- How to verify it
run platform related test cases on all Mellanox platform

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2021-03-01 10:01:50 +02:00
Sangita Maity
18263c99dd
[DPB|master] Update Dynamic Port Breakout Logic for flexible alias support a… (#6831)
To fix [DPB| wrong aliases for interfaces](https://github.com/Azure/sonic-buildimage/issues/6024) issue, implimented flexible alias support [design doc](https://github.com/Azure/SONiC/pull/749)

> [[dpb|config] Fix the validation logic of breakout mode](https://github.com/Azure/sonic-utilities/pull/1440) depends on this

#### How I did it

1. Removed `"alias_at_lanes"` from port-configuration file(i.e. platfrom.json) 
2. Added dictionary to "breakout_modes" values. This defines the breakout modes available on the platform for this parent port, and it maps to the alias list. The alias list presents the alias names for individual ports in order under this breakout mode.
```
{
    "interfaces": {
        "Ethernet0": {
            "index": "1,1,1,1",
            "lanes": "0,1,2,3",
            "breakout_modes": {
                "1x100G[40G]": ["Eth1"],
                "2x50G": ["Eth1/1", "Eth1/2"],
                "4x25G[10G]": ["Eth1/1", "Eth1/2", "Eth1/3", "Eth1/4"],
                "2x25G(2)+1x50G(2)": ["Eth1/1", "Eth1/2", "Eth1/3"],
                "1x50G(2)+2x25G(2)": ["Eth1/1", "Eth1/2", "Eth1/3"]
            }
        }
}
```
#### How to verify it
`config interface breakout`

Signed-off-by: Sangita Maity <samaity@linkedin.com>
2021-02-26 00:13:33 -08:00
Joe LeVeque
ac15a42c57
[DellEMC] Ensure concrete platform API classes call base class initializer (#6853)
In preparation for the merging of Azure/sonic-platform-common#173, which properly defines class and instance members in the Platform API base classes.

It is proper object-oriented methodology to call the base class initializer, even if it is only the default initializer. This also future-proofs the potential addition of custom initializers in the base classes down the road.
2021-02-25 11:20:34 -08:00
Joe LeVeque
516ff8bfff
[Mellanox] Ensure concrete platform API classes call base class initializer (#6854)
In preparation for the merging of Azure/sonic-platform-common#173, which properly defines class and instance members in the Platform API base classes.

It is proper object-oriented methodology to call the base class initializer, even if it is only the default initializer. This also future-proofs the potential addition of custom initializers in the base classes down the road.
2021-02-25 11:06:22 -08:00
ec-michael-shih
66e3e51f70
[Platform] Accton add to support as4630-54te platform. (#6683)
Add support for Accton as4630-54te platform
2021-02-25 10:47:38 -08:00
Aravind Mani
ab785f52d3
DellEMC:Fix EEPROM read error (#6736)
#### Why I did it
EEPROM read failure was seen in Dell platforms

#### How I did it
Make python 2/3 compliant API's to fix the issue
2021-02-25 10:17:05 -08:00
Rajkumar-Marvell
965f4901ec
[Marvell] Updated armhf SAI deb version info. (#6863)
Modified the MRVL SAI debian version format to include debian revision number. This helps in identifying the SAI deb causing any build/runtime issue.

Signed-off-by: Rajkumar Pennadam Ramamoorthy <rpennadamram@marvell.com>
2021-02-25 09:17:45 -08:00
shlomibitton
3de6a67353
[Mellanox] Add hw-mgmt patch for SimX platform adaptation (#6782)
- Why I did it
System is stuck on 'starting' state on SimX platform because of infinite loop on 'hw-management-ready.sh' script .
The loop is polling to check if the hw-mgmt sysfs created before proceeding with the flow, for SimX platform the sysfs will never create so the system is not starting properly.

- How I did it
Add a condition to poll on hw-mgmt sysfs only if the switch is real HW and not SimX platform.

- How to verify it
Check "systemctl status hw-management.service" output on a SimX switch with this patch, the state will be "active".

Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
2021-02-25 12:41:29 +02:00
DavidZagury
5aee92e56d
[Mellanox] Add support for SN4600 system (#6879)
- Why I did it
Add support for new 64x200G SN4600 systems

- How I did it
Add all relevant files (w/o platform.json and hwsku.json as they will come later) with default SKU.

- How to verify it
Install image on switch, verify all ports are up and configured properly, run full platform SONiC tests.
2021-02-25 09:30:43 +02:00
ruijie.com.cn
a582c13e98
[Ruijie] Add ruijie platform & device (#4954)
Add new platform x86_64-ruijie_b6510-48vs8cq-r0 (Trident 3)
    ASIC Vendor: Broadcom
    Switch ASIC: Trident 3
    Port Config: 48x25G+8x100G

Signed-off-by: tim-rj <sonic_rd@ruijie.com.cn>
2021-02-24 16:45:27 -08:00
Arun Saravanan Balachandran
142d93b80a
DellEMC: S5232, Z9264, Z9332 - Platform API fixes (#6842)
#### Why I did it

To incorporate the below changes in DellEMC S5232, Z9264, Z9332 platforms.

- Update thermal high threshold values
- Make watchdog API Python2 and Python3 compatible
- Fix LGTM alerts
- Z9264: Fix get_change_event timer value

#### How I did it

- Use 'universal_newlines=True' in subprocess.Popen call.
- Change the timeout in 'get_change_event' to milliseconds to match specification in sonic_platform_common/chassis_base.py
2021-02-24 12:42:01 -08:00
Samuel Angebault
cf55ca110d
[Arista] Update driver submodules (#6873)
- Fix parent `__init__` call for platform API, based on Azure/sonic-platform-common#173
 - Implementation for more platform API methods
 - Daily storage information reporting in `arista daemon`
 - Enhancements to reboot cause reporting, now multi-sourcing reboot cause information
 - Miscellaneous fixes
2021-02-24 12:32:48 -08:00
rkdevi27
a37824ff0c
[dell/s6000]: Enable graceful reboot in S6000 (#6835)
The S6000 devices, the cold reboot is abrupt and it is likely to cause issues which will cause the device to land into EFI shell. Hence the platform reboot will happen after graceful unmount of all the filesystems as in S6100.

Moved the platform_reboot to platform_reboot_override and hooked it to the systemd shutdown services as in S6100
2021-02-24 12:00:24 -08:00
Eran Dahan
a472cabc0b
[Docker] Added support for python2 (#6753)
- Why I did it
Mellanox SDK APIs support python 2 at the moment.

- How I did it
Mellanox SDK APIs support python 2 at the moment.

- How to verify it
Add python 2 to Mellanox syncd only.

- Which release branch to backport (provide reason below if selected)
docker exec -t syncd /bin/bash -c "sx_api_dbg_generate_dump.py /home/sx_api_dbg_dump"
You can see that it will work and generate /home/sx_api_dbg_dump

Signed-off-by: allas <allas@nvidia.com>
2021-02-24 19:49:58 +02:00
Volodymyr Boiko
8c4fd2b73b
[barefoot][platform] Refactor legacy scripts (#6871)
Removed or adapted obsolete code

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-02-24 09:42:18 -08:00
Volodymyr Boiko
8ec75803a7
[barefoot][device][platform] Moved pcie.yaml (#6862)
To fix #6860

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-02-23 13:16:42 -08:00
dflynn-Nokia
cbe7493b8e
[Nokia ixs7215] Platform API 2.0 improvements (#6787)
- Improve sonic-mgmt platform test suite pass rate
- Improve coverage of platform unit tests
- Provide platform specific reboot logic as per platform porting guide
- Fix bug due to pcie.yaml file being located in the wrong directory
2021-02-23 09:25:14 -08:00
roman_savchuk
a2b7cdfda3
[build]: fixed BFN target build (#6826)
#### Why I did it
Fix runtime issues caused by SONiC update

#### How I did it
- new attribute SAI_ACL_ENTRY_ATTR_FIELD_ACL_IP_TYPE  supported 
- new attribute SAI_SWITCH_ATTR_AVAILABLE_IPMC_ENTRY supported

Signed-off-by: Roman Savchuk <romanx.savchuk@intel.com>
2021-02-22 23:31:33 -08:00
Aravind Mani
3aee87d6dc
Dell S6000,S6100 system health changes (#6788)
Needed support for platform system health in Dell platforms
2021-02-22 23:26:59 -08:00
Myron Sosyak
7ec4d15320
[BFN] Fix MTU for internal interface (#6783)
Set correct MTU size of internal interface for Newport platform
2021-02-22 10:23:46 -08:00
Sujin Kang
d5238ae8dd
[pcie.yaml] Move pcie configuration file path to platform directory (#6475)
- Why I did it
The pcie configuration file location is under plugin directory not under platform directory.
#6437

- How I did it

Move all pcie.yaml configuration file from plugin to platform directory.
Remove unnecessary timer to start pcie-check.service
Move pcie-check.service to sonic-host-services
- How to verify it
Verify on the device
2021-02-21 08:27:37 -08:00
Samuel Angebault
5fb374b03d
[Arista] Driver and platform update (#6468)
- Add support for `DCS-7050SX3-48YC8` and `DCS-7050SX3-48C8` platform
 - Add support for more variants of `DCS-7280CR3-32[PD]4`
 - Add Supervisor to Linecard consutil support
 - Complete Watchdog platform API support
 - Fix some PSU behavior on `DCS-7050QX-32` and `DCS-7060CX-32S`
 - Fix SEU management on `DCS-7060CX-32S`
 - Allow kernel modules to build up to linux 5.10
 - Rename led color `orange` to `amber`
 - Miscellaneous fixes
2021-02-19 10:48:52 -08:00
Volodymyr Samotiy
ea100d2a19
[Mellanox][SAI] update submodule pointer (#6806)
Open ACL Outer VLAN ID for egress for ports part of VLAN RIF

- Why I did it
Open ACL Outer VLAN ID for egress for ports part of VLAN RIF

- How I did it
Updated SAI submodule pointer

- How to verify it
Build an image, deploy and check all is up and running.
Verify ACL sonic-mgmt test is passing

Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2021-02-18 11:53:19 +02:00
Volodymyr Samotiy
6998aef114
[Mellanox] Update SDK to 4.4.2318, FW to *.2008.2314 (#6794)
To have the following fixes:
* All | Port status remains down after warm boot and flapping the port on peer side
* All | LAG HASH  | IPv6 SRC_IP is not accounted in LAG hashing [
* All | ASIC driver | Kernel crash observed when driver reload is initiated before it fully loaded
* Spectrum-3 | Buffer | In lossless configuration, headroom is been evicted only when the shared buffers is free
* All | prevent FW access during ISSU

Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2021-02-16 18:07:11 -08:00
Joe LeVeque
c651fbc5d6
[platform] Update path to udevprefix.conf file (#6779)
Azure/sonic-utilities#1431 changes the path to the udevprefix.conf file. The file previously inappropriately resided in the <platform>/plugins/ directory. That directory is reserved for now-deprecated Python platform plugins, and will be removed in the near future.
2021-02-14 10:40:24 -08:00
gechiang
0252425377
[broadcom]: BRCM SAI 4.3.0.13-1 Pick up BRCM Patch to fix bogus interface counters (#6775)
This PR is needed to fix the show interface counters output issue where the counters are not correct due to an issue introduced in BRCM SAI 4.3.

- How to verify it
Without the fix if one injects packets, the expected counters for the interfaces involved do not show correct count values. The RX count looks to be TX count while TX count looks to be RX count but even that the values could not be trusted.
After the fix the counters started to look correct. Here is one sample output taken after the fix is applied where I manually injected 10,000 packets into Ethernet92 to be routed out of the port channel member port Ethernet40. Also injected 10,000 invalid packets into the same Ethernet92 and all 10,000 packets were shown RX_DRP correctly.
2021-02-12 03:00:16 -08:00
Volodymyr Boiko
b8da05147e
[barefoot][sonic-platform] Fix sfp reset (#6746)
Fix wrong sfp reset return value

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-02-11 18:13:53 -08:00
Volodymyr Boiko
72ca4d740b
[barefoot][sonic-platform] Refactor sfp.py (#6770)
Use separate file for each sfp eeprom operation

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-02-11 17:53:15 -08:00
Volodymyr Boiko
32c497f5e3
[barefoot][sonic-platform] Fix get_system_eeprom_info and refactor eeprom.py (#6739)
Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-02-11 15:00:35 -08:00
carl-nokia
2bf1806a34
[Platform][ixs7215]: Platform API test required files with Updates and Improvements (#6738)
- Why I did it
Enable platform API tests to run successfully by providing required test infrastructure files along with supporting changes.

- How I did it
Added platform.json along with supporting changes.
- Addition of pcie.yaml supporting pcied
- Addition of Real fan drawer support vs Virtual
- Removal of python2 wheel with support in place for python3
- supporting changes platform api tests
2021-02-11 13:17:00 -08:00
Volodymyr Boiko
9cd9ff72ea
[platform][barefoot] Fix sonic_platform for x86_64-accton_wedge100bf_65x-r0 (#6612)
platform modules' deb package rules for x86_64-accton_wedge100bf_65x-r0 was missing the code that installs the sonic_platform wheel package file under device directory on host (which is checked by pmon deamon on start)
2021-02-11 10:52:37 -08:00
vmittal-msft
71c5e74c42
[broadcom]: Upgrading bcmsai from 4.3.0.10-5 to 4.3.0.13 (#6767)
Merged bcmsai 4.3.0.13 code to top of master bcmsai 4.3.0.10-5.

- How to verify it
Ran nightly regression on T0 and T1 topology using bcmsai 4.3.0.13. Test results are better than previous runs.
For T0 -
New test passing - CRM, Decap, FDB, platform test, VxLAN
New test failing – PFC unknown MAC
For T1 –
New test passing – Port Channel
New test failing – platform test
2021-02-11 02:31:56 -08:00
Stepan Blyshchak
0e17525937
[Mellanox][SAI] update submodule pointer (#6729)
Include SAI bug fixes:

- Apply device MAC on port host interface when port is removed from LAG.
- [Shared Headroom]: fixed watermark handling for SHP flow
- Decrease verbosity of policer unbind message when no policer is attached

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2021-02-10 23:13:46 -08:00
Blueve
d8fe592e2b
[Celestica][haliburton][cp210x] modprobe cp210x to ensure the driver loaded properly (#6715)
This PR is to fix issue: #6603
The CP210x driver not attached properly after first login (cold-plug), lsusb and dmesg shown that the usb devices has been recognized but driver is missing.
2021-02-10 06:15:13 -08:00
Arun Saravanan Balachandran
8ec747f2eb
Dell S6100: Watchdog - Fix Python3 incompatibility (#6734)
To make watchdog.arm() method python3 compatible in DellEMC S6100.
2021-02-10 06:13:24 -08:00
Joe LeVeque
7ea0d9e27a
[sonic-platform-common] Update submodule (#6742)
Submodule commits included:

* src/sonic-platform-common 6ad0004...bd4dc03 (1):
  > [sonic_sfp/qsfp_dd.py] Update DOM capability method name to align with other drivers (#163)

Also align all calling function names to match.
2021-02-10 06:12:49 -08:00
Tamer Ahmed
149a68b956
[syncd-rpc] Install Libboost Atomic 1.71, Libqtcore And Libqtnetwork (#6689)
When Building syncd-rpc, libthrift has dependency on libboost-atomic1.71.0,
however the debian packager install version 1.67 instead. This PR
preinstalls libboost-atomic v 1.71 to avoid falling back to v 1.67.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2021-02-10 02:26:31 -08:00
jostar-yang
c54a03fade
[as7312-54x] Support platform API2.0 (#6272)
Add platform 2.0 support for Accton as7312-54x platform
2021-02-09 10:02:09 -08:00
lguohan
cea8c1895e
[saibcm-modules]: match linux kernel version (#6732)
Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-02-09 08:48:23 -08:00
Arun Saravanan Balachandran
3015de1dd0
[sonic-host-service] Move to sonic-host-services package (#6273)
- Why I did it

To move ‘sonic-host-service’ which is currently built as a separate package to ‘sonic-host-services' package. 

- How I did it

- Moved 'sonic-host-server' to 'src/sonic-host-services' and included it as part of the python3 wheel.
- Other files were moved to 'src/sonic-host-services-data' and included as part of the deb package.
- Changed build option ‘INCLUDE_HOST_SERVICE’ to ‘ENABLE_HOST_SERVICE_ON_START’ for enabling sonic-hostservice at boot-up by default.
2021-02-08 19:35:08 -08:00
vmittal-msft
1d99d14edf
[broadcom]: BRCM SAI 4.3.0.10-5 : Fix for ACL entry set attribute for IN_PORTS for TD3 (#6718)
ACL entry set attribute updates all the entries in the table. The correct behavior is to set the attribute on single entry.

- How I did it
Current SDK code, while setting the new attribute, is going through all the entries and updating it. Added a logic to check for requested entry and only allow for that ACL entry.
A case has filed with BRCM. Once an official fix is provided by BRCM, we will then remove this in house fix and apply the official fix.
2021-02-08 08:07:33 -08:00
lguohan
de51ee33df
[syncd-vs]: remove hardcode version for iproute2 and libcap2-bin (#6713)
Fix #6711 

the requirement was introduced in commit 75104bb35d
to support sflow in stretch build. in buster build, the requirement
is met, no need to pin down the version.

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-02-07 19:51:22 -08:00
Junchao-Mellanox
6d4c20efb1
Fix dynamic minimum fan table issue caused by python3 (#6690)
**- Why I did it**
After migrating to python3, the operator '/' always get a float result, but it gets integer result in python2. Need fix this in thermal_conditions.

**- How I did it**
1. cast float value to int
2. change the unit test case to cover this situation

**- How to verify it**
Manually test and regression test
2021-02-07 11:21:44 +02:00
lguohan
834347b8f7
[sonic-linux-kernel]: security update to kernel 4.19.152 (#6490)
Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-02-06 21:02:06 -08:00
Roy Lee
b6a6c0c6b2
[device/accton/as4630-54pe] Fix accton driver not been installed (#6321)
Accton util applies lsmod to check if drivers are installed.
But lsmod may return error on startup and skip module installation.

Signed-off-by: roy_lee <roy_lee@edge-core.com>
2021-02-06 13:56:16 -08:00
Aravind Mani
c6085c6e1b
[DellEMC Z9332f] Added support for platform system health daemon (#6642) 2021-02-06 13:53:35 -08:00
Volodymyr Boiko
3f2a493c14
[barefoot][platform] Fix sonic-platform host installation (#6696)
prerm is needed for platform modules package to be properly removed.
Added prerm to remove installed in postinst wheel packages.

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-02-06 11:33:06 -08:00
Joe LeVeque
18f2c5cfdd
[platform] Update QSFP method name 'parse_qsfp_dom_capability' -> 'parse_dom_capability' (#6695)
**- Why I did it**
PR https://github.com/Azure/sonic-platform-common/pull/102 modified the name of the SFF-8436 (QSFP) method to align the method name between all drivers, renaming it from `parse_qsfp_dom_capability` to `parse_dom_capability`. Once the submodule was updated, the callers using the old nomenclature broke. This PR updates all callers to use the new naming convention.

**- How I did it**

Update the name of the function globally for all calls into the SFF-8436 driver.

Note that the QSFP-DD driver still uses the old nomenclature and should be modified similarly. I will open a PR to handle this separately.
2021-02-05 14:41:05 -08:00
Arun Saravanan Balachandran
fa89c6dd8f
DellEMC: S6100, S6000 - Enable thermalctld, Platform API implementation and fixes (#6438)
**- Why I did it**

To incorporate the below changes in DellEMC S6100, S6000 platforms.

- S6100, S6000:
    - Enable 'thermalctld'
    - Implement DeviceBase methods (presence, status, model, serial) for Fantray and Component
    - Implement ‘get_position_in_parent’, ‘is_replaceable’ methods for all device types
    - Implement ‘get_status’ method for Fantray
    - Implement ‘get_temperature’, ‘get_temperature_high_threshold’, ‘get_voltage_high_threshold’, ‘get_voltage_low_threshold’ methods for PSU
    - Implement ‘get_status_led’, ‘set_status_led’ methods for Chassis
    - SFP:
        - Make EEPROM read both Python2 and Python3 compatible
        - Fix ‘get_tx_disable_channel’ method’s return type
        - Implement ‘tx_disable’, ‘tx_disable_channel’ and ‘set_power_override’ methods
- S6000:
    - Move PSU thermal sensors from Chassis to respective PSU
    - Make available the data of both Fans present in each Fantray


**- How I did it**

- Remove 'skip_thermalctld:true' in pmon_daemon_control.json
- Implement the platform API methods in the respective device files
- Use `bytearray` for data read from transceiver EEPROM 
- Change return type of 'get_tx_disable_channel' to match specification in sonic_platform_common/sfp_base.py
2021-02-05 12:30:08 -08:00
Lior Avramov
f76926add3
[Mellanox] Update FW upgrade script to use 'mlxfwmanager -d' option for specifying MST device in FW burn operation (#6541)
**- Why I did it**
Reduce the time it takes for the ASIC FW burn as part of the automatic FW upgrade procedure.

**- How I did it**
Add -d option to mlxfwmanager tool to use the faster MST device and not the default one which is not the fastest one.

**- How to verify it**
I manually changed ASIC FW followed by reboot command in order for FW upgrade to take place on deinit.
I manually changed ASIC FW followed by hard reset in order for FW upgrade to take place on init.

Signed-off-by: liora <liora@nvidia.com>
2021-02-04 19:44:16 +02:00
xumia
19ccba4d05
[build]: Fix syncd dpkg cache dependency issue (#6680)
* Fix syncd dpkg cache dependency issue
2021-02-04 09:03:14 -08:00
Eran Dahan
984c1cd209
[MLNX] update SAI submodule to include fix for debug dump (#6667)
**Why I did it**
Disable SDK extended dump due to issue found

**How I did it**
Update SAI submodule

**How to verify it**
Verify the SDK extended dump is not called.

Signed-off-by: Eran Dahan <erand@nvidia.com>
2021-02-04 09:12:28 +02:00
gechiang
f00588855d
BRCM SAI 4.3.0.10-4 Fix _brcm_sai_indexed_data_get () with unexpected queue causing _brcm_sai_switch_assert () after warm reboot (#6664) 2021-02-03 15:59:56 -08:00
Volodymyr Boiko
f8ddc39adb
[platform][barefoot] Install sonic_platform to host (#6644)
- Why I did it
SONiC design requires sonic_platform package to be installed in SONiC host environment, not only in docker containers.

- How I did it
For now, sonic_platform python wheel package, that is used by pmon, is provided via device-specific platform modules deb packages that unpacks the wheel package file into specific device's directory on lazy-install.
The PR makes deb packages' postinst script also install these unpacked wheel packages to host.

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-02-02 16:19:43 -08:00
gechiang
7928fbf36b
[broadcom]: broadcom sai update to 4.3.0.10-3 (#6620)
1. BRCM SAI Debian build need not have any Kernel version dependency - Starting with 4.3 BRCM made changes in SAI so that this dependency has been cleaned up. We can now remove the Kernel Version dependency from Azure Pipeline build script.

2. Bypass PEER_MODE p2mp setting causing SYNCd crash on non-TD3 SKUs - Temporarily patch BRCM SAI code to not cause SYNCd crash when Orchagent program SAI_TUNNEL_ATTR_PEER_MODE: SAI_TUNNEL_PEER_MODE_P2MP on Non-TD3 SKUs. Will remove this when BRCM provide proper fix to address this issue.
2021-01-31 16:59:01 -08:00
Stephen Sun
4f50658cfc
[syncd-rpc docker] Fix issue: ptf_nn_agent isn't able to start in syncd-rpc docker on buster (#6448)
- Why I did it
Fix issue: ptf_nn_agent isn't able to start in syncd-rpc docker on buster.

- How I did it
The issue is fixed by installing python-dev, cffi and nnpy for python 2 explicitly.

- How to verify it
Run copp test on RPC image.
2021-01-31 09:11:33 +02:00
gechiang
c5d47791e8
[broadcom]: Fix BRCM Syncd Error:syncd#/supervisord: syncd sh: 1: ethtool: not found (#6615)
Starting with BRCM SAI 4.3.1.5 we see the following :ethtool not fount" error in syslog during boot up:
```
Jan 27 07:36:14.712472 str-s6100-acs-1 INFO syncd#/supervisord: syncd sh: 1:
Jan 27 07:36:14.712844 str-s6100-acs-1 INFO syncd#/supervisord: syncd ethtool: not found
Jan 27 07:36:14.713228 str-s6100-acs-1 INFO syncd#/supervisord: syncd #015
Jan 27 07:36:14.713840 str-s6100-acs-1 INFO syncd#syncd: [0] SAI_API_HOSTIF:_brcm_sai_hostif_speed_set:11894 cmd ethtool -s Ethernet39 speed 40000 rc:32512
Jan 27 07:36:14.717204 str-s6100-acs-1 NOTICE swss#orchagent: :- setHostIntfsOperStatus: Set operation status DOWN to host interface Ethernet39
Jan 27 07:36:14.717204 str-s6100-acs-1 NOTICE swss#orchagent: :- initPort: Initialized port Ethernet39
Jan 27 07:36:14.717204 str-s6100-acs-1 NOTICE swss#orchagent: :- initializePort: Initializing port alias:Ethernet36 pid:1000000000040
Jan 27 07:36:14.726793 str-s6100-acs-1 NOTICE swss#portsyncd: :- onMsg: nlmsg type:16 key:Ethernet36 admin:0 oper:0 addr:4c:76:25:f5:48:80 ifindex:75 master:0
Jan 27 07:36:14.727967 str-s6100-acs-1 NOTICE swss#portsyncd: :- onMsg: Publish Ethernet36(ok) to state db
Jan 27 07:36:14.729331 str-s6100-acs-1 NOTICE swss#orchagent: :- addHostIntfs: Create host interface for port Ethernet36
Jan 27 07:36:14.752398 str-s6100-acs-1 INFO syncd#/supervisord: syncd sh: 1: ethtool: not found#015
Jan 27 07:36:14.752689 str-s6100-acs-1 INFO syncd#syncd: [0] SAI_API_HOSTIF:_brcm_sai_hostif_speed_set:11894 cmd ethtool -s Ethernet36 speed 40000 rc:32512
Jan 27 07:36:14.756050 str-s6100-acs-1 NOTICE swss#orchagent: :- setHostIntfsOperStatus: Set operation status DOWN to host interface Ethernet36
Jan 27 07:36:14.757585 str-s6100-acs-1 NOTICE swss#orchagent: :- initPort: Initialized port Ethernet36
```
It seems that starting with BRCM SAI 4.2.1.5 syncd is using ethtool to set the host interface speed and since this ethtool was not part of the syncd Docker, we observe these "ethtool not found" issue.
2021-01-30 05:28:33 -08:00
Volodymyr Boiko
481870670d
[barefoot][platform] platform API 2.0 fixes (#6607)
To improve python3 support of berefoot's sonic_platform

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-01-29 17:29:37 -08:00
Joe LeVeque
5985d949fa
[docker-sonic-vs] Install sonic-platform-common package (#6587)
**- Why I did it**

sonic-utilities will become dependent upon sonic-platform-common as of https://github.com/Azure/sonic-utilities/pull/1386.

**- How I did it**

- Add sonic-platform-common as a dependency in docker-sonic-vs.mk
- Additionally, no longer install Python 2 packages of swsssdk and sonic-py-common, as they should no longer be needed.
2021-01-28 09:44:43 -08:00
Mahesh Maddikayala
f6b842edd3
[BCMSAI] Update BCMSAI debian to 4.3.0.10 with 6.5.21 SDK, and opennsl module to 6.5.21 (#6526)
BCMSAI 4.3.0.10, 6.5.21 SDK release with enhancements and fixes for vxlan, TD3 MMU, TD4-X9 EA support, etc.
2021-01-28 08:38:47 -08:00
Kebo Liu
7f222e7bc1
[mellanox]: Update SAI to sonic2012 1.18.1.0 (#6566)
Changes in the new release:

1. Policy based hashing optimization
2. New attribute support for Max port headroom
3. Tunnel ECN map fixes
4. Tunnel EVPN skeleton extensions (peer attrib, maps)
5. Bridge port admin not affecting port admin (optimize port down time)
6. CRM new API for neighbors and tunnel termination entries
7. Improve FDB event for flush by bridge port (before, null bridge was reported to SONiC, now the bridge will be extracted from bridge port)
8. DHCP L2 v4+v6 traps (for ZTP use case)
9. Generic counter implementation

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2021-01-27 12:29:28 -08:00
Guohan Lu
044efe72ca [build]: add _BUILD_ENV to specify env for dpkg-buildpackage
Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-27 08:28:21 -08:00
Guohan Lu
ca0e8cbe0e [docker-ptf]: build docker ptf
- combine docker-ptf-saithrift into docker-ptf docker
- build docker-ptf under platform vs
- remove docker-ptf for other platforms

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-27 08:28:21 -08:00
Kebo Liu
9ff56445c9
Add hw-mgmt patch to support SDK OFFLINE event for handling flow within service firmware upgrade (#6550)
During ISSU, "mlxsw_minimal" driver still trying to access firmware, in some cases FW could return some wrong critical threshold value which will cause switch shutdown.

**- How I did it**
In order to prevent "mlxsw_minimal" driver from accessing ASIC during ISSU, SDK will raise "OFFLINE" 'udev' event
at the early beginning of such flow. When this event is received, hw-management will remove "mlxsw_minimal" driver.
There is no need to implement the opposite "ONLINE" event since this flow is ended up with "kexec".

**- How to verify it**
repeatedly perform warm reboot, make sure there is no switch shutdown occurred.
2021-01-27 15:39:54 +02:00
Kebo Liu
84985e103d
[mellanox]: Update SDK to 4.4.2308, FW to *.2008.2308 (#6552)
Bugs fixes:
    All | Kernel | During system reload when CPU is loaded with heavy traffic, a Kernel Panic may occur.
    All | Modules, Port split | FW stuck when device rebooted with locked Optical Transceivers in split mode
    Spectrum-3 | PFC | On Spectrum-3 systems, slow reaction time to Rx pause packets on 40GbE ports may lead to buffer overflow on servers.
    Spectrum-3 | SN4700, Port Split | On rare occasion SN4700, conducting 100G split (4x25G) in NRZ when splitter port 1 or 2 are down, ports 3 and 4 will also go down.

Enahncments:
    All | Kernel | new notification on ISSU start, so other kernel drivers can disable any interface to ASIC

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2021-01-25 10:52:22 -08:00
Guohan Lu
4b5212b8b2 [vstest]: add default vs test
Check #6483

add test to make sure default route change in eth0 does not
affect the default route in the default vrf

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-24 22:25:11 -08:00
Antonina Melnyk
da7f80d70b
[barefoot] Fixes for platform API (#6487)
There was a mismatch with Eeprom class methods names and methods called from Eeprom class.

Signed-off-by: Antonina Melnyk antoninax.melnyk@intel.com
2021-01-24 16:32:18 -08:00
Danny Allen
ef6a05f07e
[DellEMC Z9332f] Remove duplicate ipmihelper.py script (#6536)
Fixes #6445

Because the ipmihelper.py script in the 9332 folder is slightly different than the common one (due to LGTM fixes), when the common one gets copied during build time it causes the workspace/build to become dirty.

Signed-off-by: Danny Allen <daall@microsoft.com>
2021-01-23 00:26:46 -08:00
yozhao101
be3c036794
[supervisord] Monitoring the critical processes with supervisord. (#6242)
- Why I did it
Initially, we used Monit to monitor critical processes in each container. If one of critical processes was not running
or crashed due to some reasons, then Monit will write an alerting message into syslog periodically. If we add a new process
in a container, the corresponding Monti configuration file will also need to update. It is a little hard for maintenance.

Currently we employed event listener of Supervisod to do this monitoring. Since processes in each container are managed by
Supervisord, we can only focus on the logic of monitoring.

- How I did it
We borrowed the event listener of Supervisord to monitor critical processes in containers. The event listener will take
following steps if it was notified one of critical processes exited unexpectedly:

The event listener will first check whether the auto-restart mechanism was enabled for this container or not. If auto-restart mechanism was enabled, event listener will kill the Supervisord process, which should cause the container to exit and subsequently get restarted.

If auto-restart mechanism was not enabled for this contianer, the event listener will enter a loop which will first sleep 1 minute and then check whether the process is running. If yes, the event listener exits. If no, an alerting message will be written into syslog.

- How to verify it
First, we need checked whether the auto-restart mechanism of a container was enabled or not by running the command show feature status. If enabled, one critical process should be selected and killed manually, then we need check whether the container will be restarted or not.

Second, we can disable the auto-restart mechanism if it was enabled at step 1 by running the commnad sudo config feature autorestart <container_name> disabled. Then one critical process should be selected and killed. After that, we will see the alerting message which will appear in the syslog every 1 minute.

- Which release branch to backport (provide reason below if selected)

 201811
 201911
[x ] 202006
2021-01-21 12:57:49 -08:00
Qi Luo
3c16f80382
sonic-config-engine uses libswsscommon instead of swsssdk (#6406)
**- Why I did it**
swsssdk will be deprecated. Migrate sonic-config-engine to use libswsscommon library instead

**- How to verify it**
Unit test
2021-01-20 12:06:08 -08:00
lguohan
755c73797c
[mellanox]: fix mellanox hw-management build (#6471)
use dpkg-buildpackage build with fakeroot

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-18 13:10:27 -08:00
Kebo Liu
4cf9316ec3
[Mellanox] Make determine-reboot-cause service start after hw-management service (#6465)
**- Why I did it**

On the Mellanox platform, reboot cause is fetched from some certain sysfs which is created by the hw-management service. So determine-reboot-cause service shall start after hw-management, otherwise it could fail due to the related sysfs is not available yet.

**- How I did it**

Add a patch to the hw-management service to make sure determine-reboot-cause service should start after it.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2021-01-15 11:38:31 -08:00
Wirut Getbamrung
0ca343422d
[device/celestica]: Add thermalctld support on DX010 platform APIs (#6089)
**- Why I did it**
- The thermalctld daemon on the Pmon docker requires support from the thermal manager API.

**- How I did it**
- Removed the old function for detecting a faulty fan.
- Removed the old function for detecting excess temperature.
- Implement thermal_manager APIs based on ThermalManagerBase
- Implement thermal_conditions APIs based on ThermalPolicyConditionBase
- Implement thermal_actions APIs based on ThermalPolicyActionBase
- Implement thermal_info APIs based on ThermalPolicyInfoBase
- Add thermal_policy.json
2021-01-15 10:20:47 -08:00
Roy Lee
c9d3e25115
[device/accton]: As7816-64x, fix memory leakage on accton fan monitor. (#6168)
It's been reported that accton fan monitor process keeps consuming memory after few days.
The amount of memory occupied increases in linear and never leased.

Signed-off-by: roy_lee <roy_lee@edge-core.com>
2021-01-15 08:06:21 -08:00
Kebo Liu
1b2980540d
[mellanox][platform api] fix a missing import time module (#6458)
“time" module was missed to be imported and will cause an error when the branch hit.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2021-01-15 08:01:11 -08:00
Kalimuthu-Velappan
18350a5dd9
[build]: Fix for missing dependencies in the DPKG framework (#6393)
1. Fixes the missing DPKG file for gbsyncd-vs package
2. Fixes the softlink issue on the Platform-common and ztp package
3. Fixes the PYTHNON_DEBS list is missing for DBG dockers.
2021-01-13 10:32:42 -08:00
Junchao-Mellanox
0a49edb68e
[Mellanox] Fix issue: need import initialize_sdk_handle in get_sdk_handle (#6435)
Found test_sfp.py failed due to use a method without importing it.
2021-01-13 09:42:04 -08:00
dflynn-Nokia
674fac21c1
[Nokia ixs7215] Add SW assist for platform entropy & fix inband mgmt support (#6417)
- Improve random number generation during early Sonic initialization by providing SW updates to Linux entropy value.
- Improve handling of platform In-Band management port

This commit provides the following updates to the Nokia ixs7215 platform

1. The Marvell Armada-38x SOC requires SW assistance to improve the system
   entropy value available early on in the Sonic boot sequence.
2. The Nokia ixs7215 platform does not have a dedicated Out-Of-Band (OOB) mgmt
   port and thus requires additional logic to optionally support configuring
   front panel port 48 as an In-Band mgmt port. This commit provides additional
   logic to manage and maintain the operation of this In-Band mgmt port.
2021-01-12 16:59:42 -08:00
carl-nokia
380edf054d
[Platform][nokia]: python3-smbus package add with python3 and jinja fixes (#6416)
fix platform driver breakage due to python3 upgrade and fix load minigraph errors with config load_minigraph -y

**- How I did it**
added python3-smbus to the pmon docker template since the previous was python2 specific 
fixed additional "ord" python2 specific code 
fixed the jinja templates used by qos reload - the template logic required data to be parsed 

**- How to verify it**
run "show platform XXX" commands and verify output
run "sudo config load_minigraph -y" and verify configuration 
run "show interfaces XXX" and verify output 

Co-authored-by: Carl Keene <keene@nokia.com>
2021-01-12 15:05:06 -08:00
guxianghong
d4f9fa56aa
[Centec] upgrade to buster docker for DOCKER_SYNCD_CENTEC_RPC, docker-saiserver-centec and platform-modules (#6423)
Centec syncd have beend upgraded to buster, docker-syncd-centec-rpc do not need generate stretch based docker.

Co-authored-by: Xianghong Gu <xgu@centecnetworks.com>
2021-01-12 12:36:10 -08:00
Kebo Liu
015b421e5e
[Mellanox] [platform API] Fix “local variable 'label_port' referenced before assignment” error (#6419)
In rare case can see that xcvrd failed due to "UnboundLocalError: local variable 'label_port' referenced before assignment"

Init "label_port" as None at the beginning of the function, to avoid the case that "label_port" not assigned.
2021-01-12 10:43:57 -08:00
jostar-yang
bbd6967c82
[as7326-54x] Remove not need executable flag (#6326)
Remove executable bit from the service files
2021-01-12 10:40:51 -08:00
gechiang
26fd52780d
Anchor the libprotobuf-dev version based on a fixed version by using debian control dependency (#6420) 2021-01-12 09:51:15 -08:00
lguohan
ab2ae41212
[build]: fix dpkg admindir corruption issue in parallel build (#6408)
Fix #119

when parallel build is enable, multiple dpkg-buildpackage
instances are running at the same time. /var/lib/dpkg is shared
by all instances and the /var/lib/dpkg/updates could be corrupted
and cause the build failure.

the fix is to use overlay fs to mount separate /var/lib/dpkg
for each dpkg-buildpackage instance so that they are not affecting
each other.

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-12 06:03:12 -08:00
Samuel Angebault
1498408ce7
[Arista] Update driver submodules (#6396)
- Cleanup and Refactor of library internals, logic mostly unchanged.
 - Enhance debugability with `arista dump` and `arista diag` commands.
 - Fix power supply detection issue.
2021-01-10 07:42:56 -08:00
guxianghong
c64052bb28
[Centec ARM64]Upgrade Centec syncd docker to buster and Enable Telemetry on ARM64 (#6386)
* Enable telemetry for ARM64 by default

* [Centec]Upgrade Centec syncd docker to buster; libjemalloc2 have been installed in docker-base-buster, remove libjemalloc1 from docker-syncd-centec's Dockerfile.j2

Co-authored-by: Gu Xianghong <xgu@centecnetworks.com>
2021-01-09 08:07:30 -08:00
Joe LeVeque
e52581e919
[PDDF] Build and install Python 3 package (#6286)
- Make PDDF code compliant with both Python 2 and Python 3
- Align code with PEP8 standards using autopep8
- Build and install both Python 2 and Python 3 PDDF packages
2021-01-07 10:03:29 -08:00
gechiang
a6907a7c62
[brcm]: BRCM SAI 4.2.1.5-9 Fix _brcm_sai_indexed_data_get () with unexpected queue causing _brcm_sai_switch_assert () after warm reboot (#6374)
Starting from build (master) 176 the warm reboot on BRCM Platform started to experience syncd crash. Upon further debug by Ying it was determined that the crash was related to the following new change:
[Dynamic buffer calc] Support dynamic buffer calculation (#1338)

Ying also debugged further and found The crash was caused by buffer pool profile setting operation SAI_BUFFER_PROFILE_ATTR_SHARED_DYNAMIC_TH

A case has filed with BRCM while a potential fix was tried by Ying that seems to have addressed this issue and we are making this change available in master branch so that it will allow further feature validation/testing especially in the warm reboot area.
Once an official fix is provided by BRCM, we will then remove this in house fix and apply the official fix.

- How to verify it
Just perform warm reboot with any master code 175 or above you should see this issue or issue the following cmd will also cause the crash: "mmuconfig -p egress_lossy_profile -a 0"
2021-01-07 05:46:48 -08:00
carl-nokia
f99dbff722
[Nokia]: Enable Telemetry for armhf and provide required qos files (#6364)
* [platform][Nokia]: Add buffers and qos files for config qos reload

   - providing required files

* [platform][armhf]: remove hardcoded disable for Telemetry on armhf

Co-authored-by: Carl Keene <keene@nokia.com>
2021-01-06 15:09:13 -08:00
Aravind Mani
4632e0b5b1
DellEMC: Z9332f change SFP detection logic (#6261)
- Dynamically change EEPROM driver based on media type.
- Otherwise, EEPROM INFO and DOM INFO might not be fetched properly and will result in erroneous output.
2021-01-06 11:12:04 -08:00
lguohan
6fbd52b1c1
[docker-sonic-vs]: reduce the build steps for docker-sonic-vs (#6350)
combine multiple same operation into one operation to reduce
the build steps. this is to avoid max depth exceeded issue
in the build.

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-04 21:57:58 -08:00
Arun Saravanan Balachandran
19b2b44638
[DellEMC] Add platform-modules as prerequisite for determine-reboot-cause (#6322)
Add a systemd dependency to make platform-modules service as a prerequisite for determine-reboot-cause service to ensure platform initialization is complete before determine-reboot-cause.service executes.
2021-01-04 20:33:44 -08:00
sandycelestica
c38eee8294
Update udev rules to support 96 ttyUSB ports (#6334)
Signed-off-by: Jing Kan jika@microsoft.com
2021-01-05 10:26:05 +08:00
Myron Sosyak
9006b961b7
[BFN] Convert platform modules to python 3 (#6347)
Fix syntax errors during xcvrd start with Python 3 daemons
2021-01-04 15:00:48 -08:00
Denys Petryshyn
9e57f5157e
[BFN] Upgrade docker-syncd-bfn to buster (#6345)
* Add changes to allow migration of bfn syncd to buster

* Update BFN packages for Debian 10

Signed-off-by: Denys Petryshyn <denysx.petryshyn@intel.com>
2021-01-04 10:38:53 -08:00
lguohan
618b4bf243
[broadcom]: match the brcm sai filename version to control file version (#6339)
the control file version is 4.2.1.5-7

sonic$ sudo dpkg -i target/debs/buster/libsaibcm-dev_4.2.1.5-8_amd64.deb
(Reading database ... 175880 files and directories currently installed.)
Preparing to unpack .../libsaibcm-dev_4.2.1.5-8_amd64.deb ...
Unpacking libsaibcm-dev (4.2.1.5-7) over (4.2.1.5-7) ...
Setting up libsaibcm-dev (4.2.1.5-7) ...
lgh@491d842369cf:/sonic$ sudo dpkg -i target/debs/buster/libsaibcm_4.2.1.5-8_amd64.deb
(Reading database ... 175880 files and directories currently installed.)
Preparing to unpack .../libsaibcm_4.2.1.5-8_amd64.deb ...
Unpacking libsaibcm (4.2.1.5-7) over (4.2.1.5-7) ...
Setting up libsaibcm (4.2.1.5-7) ...
Processing triggers for libc-bin (2.28-10) ...

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-03 08:28:05 -08:00
brandonchuang
c4156b8745
[device/accton] Fix accton driver not been installed (#6327)
Accton util applies lsmod to check if drivers are installed.
But lsmod may return error on startup and skip module installation.

Signed-off-by: Brandon Chuang <brandon_chuang@edge-core.com>
2021-01-02 08:02:30 -08:00