Commit Graph

654 Commits

Author SHA1 Message Date
Mahesh Maddikayala
f6b842edd3
[BCMSAI] Update BCMSAI debian to 4.3.0.10 with 6.5.21 SDK, and opennsl module to 6.5.21 (#6526)
BCMSAI 4.3.0.10, 6.5.21 SDK release with enhancements and fixes for vxlan, TD3 MMU, TD4-X9 EA support, etc.
2021-01-28 08:38:47 -08:00
Guohan Lu
ca0e8cbe0e [docker-ptf]: build docker ptf
- combine docker-ptf-saithrift into docker-ptf docker
- build docker-ptf under platform vs
- remove docker-ptf for other platforms

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-27 08:28:21 -08:00
Danny Allen
ef6a05f07e
[DellEMC Z9332f] Remove duplicate ipmihelper.py script (#6536)
Fixes #6445

Because the ipmihelper.py script in the 9332 folder is slightly different than the common one (due to LGTM fixes), when the common one gets copied during build time it causes the workspace/build to become dirty.

Signed-off-by: Danny Allen <daall@microsoft.com>
2021-01-23 00:26:46 -08:00
yozhao101
be3c036794
[supervisord] Monitoring the critical processes with supervisord. (#6242)
- Why I did it
Initially, we used Monit to monitor critical processes in each container. If one of critical processes was not running
or crashed due to some reasons, then Monit will write an alerting message into syslog periodically. If we add a new process
in a container, the corresponding Monti configuration file will also need to update. It is a little hard for maintenance.

Currently we employed event listener of Supervisod to do this monitoring. Since processes in each container are managed by
Supervisord, we can only focus on the logic of monitoring.

- How I did it
We borrowed the event listener of Supervisord to monitor critical processes in containers. The event listener will take
following steps if it was notified one of critical processes exited unexpectedly:

The event listener will first check whether the auto-restart mechanism was enabled for this container or not. If auto-restart mechanism was enabled, event listener will kill the Supervisord process, which should cause the container to exit and subsequently get restarted.

If auto-restart mechanism was not enabled for this contianer, the event listener will enter a loop which will first sleep 1 minute and then check whether the process is running. If yes, the event listener exits. If no, an alerting message will be written into syslog.

- How to verify it
First, we need checked whether the auto-restart mechanism of a container was enabled or not by running the command show feature status. If enabled, one critical process should be selected and killed manually, then we need check whether the container will be restarted or not.

Second, we can disable the auto-restart mechanism if it was enabled at step 1 by running the commnad sudo config feature autorestart <container_name> disabled. Then one critical process should be selected and killed. After that, we will see the alerting message which will appear in the syslog every 1 minute.

- Which release branch to backport (provide reason below if selected)

 201811
 201911
[x ] 202006
2021-01-21 12:57:49 -08:00
Wirut Getbamrung
0ca343422d
[device/celestica]: Add thermalctld support on DX010 platform APIs (#6089)
**- Why I did it**
- The thermalctld daemon on the Pmon docker requires support from the thermal manager API.

**- How I did it**
- Removed the old function for detecting a faulty fan.
- Removed the old function for detecting excess temperature.
- Implement thermal_manager APIs based on ThermalManagerBase
- Implement thermal_conditions APIs based on ThermalPolicyConditionBase
- Implement thermal_actions APIs based on ThermalPolicyActionBase
- Implement thermal_info APIs based on ThermalPolicyInfoBase
- Add thermal_policy.json
2021-01-15 10:20:47 -08:00
Roy Lee
c9d3e25115
[device/accton]: As7816-64x, fix memory leakage on accton fan monitor. (#6168)
It's been reported that accton fan monitor process keeps consuming memory after few days.
The amount of memory occupied increases in linear and never leased.

Signed-off-by: roy_lee <roy_lee@edge-core.com>
2021-01-15 08:06:21 -08:00
jostar-yang
bbd6967c82
[as7326-54x] Remove not need executable flag (#6326)
Remove executable bit from the service files
2021-01-12 10:40:51 -08:00
gechiang
26fd52780d
Anchor the libprotobuf-dev version based on a fixed version by using debian control dependency (#6420) 2021-01-12 09:51:15 -08:00
lguohan
ab2ae41212
[build]: fix dpkg admindir corruption issue in parallel build (#6408)
Fix #119

when parallel build is enable, multiple dpkg-buildpackage
instances are running at the same time. /var/lib/dpkg is shared
by all instances and the /var/lib/dpkg/updates could be corrupted
and cause the build failure.

the fix is to use overlay fs to mount separate /var/lib/dpkg
for each dpkg-buildpackage instance so that they are not affecting
each other.

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-12 06:03:12 -08:00
Samuel Angebault
1498408ce7
[Arista] Update driver submodules (#6396)
- Cleanup and Refactor of library internals, logic mostly unchanged.
 - Enhance debugability with `arista dump` and `arista diag` commands.
 - Fix power supply detection issue.
2021-01-10 07:42:56 -08:00
gechiang
a6907a7c62
[brcm]: BRCM SAI 4.2.1.5-9 Fix _brcm_sai_indexed_data_get () with unexpected queue causing _brcm_sai_switch_assert () after warm reboot (#6374)
Starting from build (master) 176 the warm reboot on BRCM Platform started to experience syncd crash. Upon further debug by Ying it was determined that the crash was related to the following new change:
[Dynamic buffer calc] Support dynamic buffer calculation (#1338)

Ying also debugged further and found The crash was caused by buffer pool profile setting operation SAI_BUFFER_PROFILE_ATTR_SHARED_DYNAMIC_TH

A case has filed with BRCM while a potential fix was tried by Ying that seems to have addressed this issue and we are making this change available in master branch so that it will allow further feature validation/testing especially in the warm reboot area.
Once an official fix is provided by BRCM, we will then remove this in house fix and apply the official fix.

- How to verify it
Just perform warm reboot with any master code 175 or above you should see this issue or issue the following cmd will also cause the crash: "mmuconfig -p egress_lossy_profile -a 0"
2021-01-07 05:46:48 -08:00
Aravind Mani
4632e0b5b1
DellEMC: Z9332f change SFP detection logic (#6261)
- Dynamically change EEPROM driver based on media type.
- Otherwise, EEPROM INFO and DOM INFO might not be fetched properly and will result in erroneous output.
2021-01-06 11:12:04 -08:00
Arun Saravanan Balachandran
19b2b44638
[DellEMC] Add platform-modules as prerequisite for determine-reboot-cause (#6322)
Add a systemd dependency to make platform-modules service as a prerequisite for determine-reboot-cause service to ensure platform initialization is complete before determine-reboot-cause.service executes.
2021-01-04 20:33:44 -08:00
sandycelestica
c38eee8294
Update udev rules to support 96 ttyUSB ports (#6334)
Signed-off-by: Jing Kan jika@microsoft.com
2021-01-05 10:26:05 +08:00
lguohan
618b4bf243
[broadcom]: match the brcm sai filename version to control file version (#6339)
the control file version is 4.2.1.5-7

sonic$ sudo dpkg -i target/debs/buster/libsaibcm-dev_4.2.1.5-8_amd64.deb
(Reading database ... 175880 files and directories currently installed.)
Preparing to unpack .../libsaibcm-dev_4.2.1.5-8_amd64.deb ...
Unpacking libsaibcm-dev (4.2.1.5-7) over (4.2.1.5-7) ...
Setting up libsaibcm-dev (4.2.1.5-7) ...
lgh@491d842369cf:/sonic$ sudo dpkg -i target/debs/buster/libsaibcm_4.2.1.5-8_amd64.deb
(Reading database ... 175880 files and directories currently installed.)
Preparing to unpack .../libsaibcm_4.2.1.5-8_amd64.deb ...
Unpacking libsaibcm (4.2.1.5-7) over (4.2.1.5-7) ...
Setting up libsaibcm (4.2.1.5-7) ...
Processing triggers for libc-bin (2.28-10) ...

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-03 08:28:05 -08:00
brandonchuang
c4156b8745
[device/accton] Fix accton driver not been installed (#6327)
Accton util applies lsmod to check if drivers are installed.
But lsmod may return error on startup and skip module installation.

Signed-off-by: Brandon Chuang <brandon_chuang@edge-core.com>
2021-01-02 08:02:30 -08:00
Pradchaya Phucharoen
bf693be113
[platform/dx010] Remove unused pca9541 device init line (#6280)
Remove the unused codes addressed in #5891 on Dx010 platform.
2020-12-24 01:58:42 -08:00
Srideep
2f1445e6f1
[DellEMC-Z9332f] Fix platform issues (#6250)
[DellEMC-Z9332f] Fix platform issues
* Change to optoe driver
* fix API 2.0 issues
* Support reboot reason
2020-12-22 16:21:09 -08:00
vmittal-msft
ee8c3d34a2
[sonic-sairedis submodule] Updated SAI header 1.7.1 for BRCM and Mellanox SDK/SAI (#6218)
* [Mellanox] Update SAI to 1.18.0
* [Mellanox] Update SDK to 4.4.2112
* Updated Mellanox SAI to 1.18.0.2
* Updated bcmsai debians to use SAI 1.7.1
* Updated Mellanox to use SAI 1.7.1
* Updated submodule sonic-sairedis using SAI 1.7.1

Co-authored-by: Vineet Mittal <vmittalmittal@microsoft.com>
Co-authored-by: Nazarii Hnydyn <nazariig@nvidia.com>
2020-12-20 12:11:06 -08:00
gechiang
f9e96283e4
fix BRCM SAI warm-reboot Removing ip2me trap entry returns error results in orchagent crash (#6244) 2020-12-18 10:08:56 -08:00
Roy Lee
54681f169f
[device/accton] AS****_54X, validate accton util to set sfp's tx_disable (#5941)
bug fix: #5914

Validated for tx_disable function of SFP+ on AS7312-54X, AS5812-54X, AS5712-54x, and AS5812-54x.

Signed-off-by: roy_lee <roy_lee@edge-core.com>
2020-12-17 22:35:59 -08:00
vmittal-msft
a624aa01c7
Upgrade syncd to buster. (#6106)
- Why I did it
To upgrade brcm syncd to buster

- How I did it

Updated BCM SAI using kernel version 4.19.0-12 and debian 10 to support buster.
Updated syncd docker from stretch to buster in sonic-buildimage
- How to verify it

Ensured docker is running synd buster.
After upgrade, ensured all BGP peers and ip interfaces are up.
Ping to BGP neighbors is working fine.
2020-12-17 12:46:45 -08:00
Wirut Getbamrung
4257c792a2
[device/celestica]: Add xcvrd event support for Seastone-DX010 (#5896)
- Add sysfs interrupt to notify userspace app of external interrupt
- Implement get_change_event() in chassis api.
2020-12-14 10:22:56 -08:00
Srideep
3c9a7ec623
[DellEMC Z9332f] Platform API 2.0 Support and bug fixing (#5958)
- Add platform infra to support 2.0 API
- Bug fixing for 9332 known issues
2020-12-10 10:30:44 -08:00
gechiang
0ffadf357e
Upgrade to SAI BCM 4.2.1.5-6 to pick up the fib hash patch (CS00011388674) and the fdb invalid port patch (CS00011298546) (#6165) 2020-12-09 13:53:27 -08:00
Samuel Angebault
44f4c2ed66
[Arista] Update driver submodules (#6151)
- Enhance eeprom parsing robustness on corrupted fields
 - Add chassis provisioning service
 - Disable CPU sleep state on some systems
 - Complete refactor for FanSlots
 - Fix module unload while still in use
2020-12-08 11:17:28 -08:00
Arun Saravanan Balachandran
c7a952ac48
DellEMC S6100: Update reboot-cause determination to support new firmware (#5807)
- Why I did it
For determining reboot-cause while running newer BIOS, SMF firmware.

- How I did it
Made changes in reboot-cause determination script to add support for behavior of newer firmware.

- How to verify it
Performed different type of resets and verified "show reboot-cause" provides the correct reason.
Logs: UT_logs.txt

- Description for the changelog
DellEMC S6100: Update reboot-cause determination to support new firmware
2020-12-01 15:15:11 -08:00
Arun Saravanan Balachandran
165cae73ab
[DellEMC]: S6100, S6000 - Platform API fixes (#6073)
- Change return type of SFP methods to match specification in sonic_platform_common/sfp_base.py
- Use init methods of base classes to initialize common instance variables
- Handle negative timeout values in S6100's watchdog ‘arm’ method
- Return appropriate values for 'get_target_speed', 'set_status_led' to avoid false warnings
2020-12-01 10:43:41 -08:00
Samuel Angebault
6803e1d158
[Arista] Update driver submodules (#6054)
- Implement platform module API
 - Implement platform drawer API
 - Complete and fix a few other platform API
 - Add psu and fan support for chassis
 - Fix dependency issue with determine-reboot-cause
2020-11-30 10:58:14 -08:00
sandycelestica
d475b96aa3
[Celestica] Fix issues where the part of udev rule are not working as expect (#6039)
* Update 50-ttyUSB-C0.rules
Remove some debug codes

* Update popmsg.sh
remove the debug info

* Update udev_prefix.sh
remove debug info

* Update platform-modules-haliburton.postinst
To enable the execute permission of udev scripts

* Update 50-ttyUSB-C0.rules
Fix port 13 - 48 plug out but not  pop out message issue.

Signed-off-by: Jing Kan jika@microsoft.com
2020-11-30 16:19:46 +08:00
Pradchaya Phucharoen
fb53ab7f7b
[platform/cels] Dx010: fix pca9548 downstream device address collision (#5899)
- Why I did it
Fix the i2c device to address conflicts behind the PCA9548 switch.

- How I did it
Load the i2c-mux-pca954x with parameter force-deselect-on-exit=1.
2020-11-26 09:35:58 -08:00
Joe LeVeque
7f4ab8fbd8
[sonic-utilities] Update submodule; Build and install as a Python 3 wheel (#5926)
Submodule updates include the following commits:

* src/sonic-utilities 9dc58ea...f9eb739 (18):
  > Remove unnecessary calls to str.encode() now that the package is Python 3; Fix deprecation warning (#1260)
  > [generate_dump] Ignoring file/directory not found Errors (#1201)
  > Fixed porstat rate and util issues (#1140)
  > fix error: interface counters is mismatch after warm-reboot (#1099)
  > Remove unnecessary calls to str.decode() now that the package is Python 3 (#1255)
  > [acl-loader] Make list sorting compliant with Python 3 (#1257)
  > Replace hard-coded fast-reboot with variable. And some typo corrections (#1254)
  > [configlet][portconfig] Remove calls to dict.has_key() which is not available in Python 3 (#1247)
  > Remove unnecessary conversions to list() and calls to dict.keys() (#1243)
  > Clean up LGTM alerts (#1239)
  > Add 'requests' as install dependency in setup.py (#1240)
  > Convert to Python 3 (#1128)
  > Fix mock SonicV2Connector in python3: use decode_responses mode so caller code will be the same as python2 (#1238)
  > [tests] Do not trim from PATH if we did not append to it; Clean up/fix shebangs in scripts (#1233)
  > Updates to bgp config and show commands with BGP_INTERNAL_NEIGHBOR table (#1224)
  > [cli]: NAT show commands newline issue after migrated to Python3 (#1204)
  > [doc]: Update Command-Reference.md (#1231)
  > Added 'import sys' in feature.py file (#1232)

* src/sonic-py-swsssdk 9d9f0c6...1664be9 (2):
  > Fix: no need to decode() after redis client scan, so it will work for both python2 and python3 (#96)
  > FieldValueMap `contains`(`in`)  will also work when migrated to libswsscommon(C++ with SWIG wrapper) (#94)

- Also fix Python 3-related issues:
    - Use integer (floor) division in config_samples.py (sonic-config-engine)
    - Replace print statement with print function in eeprom.py plugin for x86_64-kvm_x86_64-r0 platform
    - Update all platform plugins to be compatible with both Python 2 and Python 3
    - Remove shebangs from plugins files which are not intended to be executable
    - Replace tabs with spaces in Python plugin files and fix alignment, because Python 3 is more strict
    - Remove trailing whitespace from plugins files
2020-11-25 10:28:36 -08:00
Arun Saravanan Balachandran
dc15fbc0ee
[DellEMC]: EEPROM platform API Python3 compliance changes (#5960)
Make EEPROM platform APIs Python3 compliant in DellEMC platforms by handling bytearray type returned by read_eeprom and read_eeprom_bytes methods.
2020-11-24 17:30:41 -08:00
xumia
98d749128a
Fix docker images rebuilt issue when building each host image (#5925)
* Change back the mtime changed by applying patch

* Fix bug

* Fix bug

* Use the grep or pattern instead to call a new grep command
2020-11-24 21:45:06 +08:00
Joe LeVeque
80bf8691e8
[Syncd] containers still based on Stretch must still use Python 2 (#6010)
Some syncd containers are still based on Debian Stretch, and thus do not have Python 3 available. For these containers, we must still rely on Python 2 to run supervisord_dependent_startup and supervisor-proc-exit-listener.
2020-11-23 22:35:58 -08:00
lguohan
4d3eb18ca7
[supervisord]: use abspath as supervisord entrypoint (#5995)
use abspath makes the entrypoint not affected by PATH env.

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2020-11-22 21:18:44 -08:00
Joe LeVeque
23247514f9
Fix a number of LGTM alerts (#5952)
Fix 259 alerts reported by the LGTM tool:

- 245 for Unused import
- 7 for Testing equality to None
- 5 for Duplicate key in dict literal
- 1 for Module is imported more than once
- 1 for Unused local variable
2020-11-20 10:58:48 -08:00
Joe LeVeque
7bf05f7f4f
[supervisor] Install vanilla package once again, install Python 3 version in Buster container (#5546)
**- Why I did it**

We were building a custom version of Supervisor because I had added patches to prevent hangs and crashes if the system clock ever rolled backward. Those changes were merged into the upstream Supervisor repo as of version 3.4.0 (http://supervisord.org/changes.html#id9), therefore, we should be able to simply install the vanilla package via pip. This will also allow us to easily move to Python 3, as Python 3 support was added in version 4.0.0.

**- How I did it**

- Remove Makefiles and patches for building supervisor package from source
- Install Python 3 supervisor package version 4.2.1 in Buster base container
    - Also install Python 3 version of supervisord-dependent-startup in Buster base container
- Debian package installed binary in `/usr/bin/`, but pip package installs in `/usr/local/bin/`, so rather than update all absolute paths, I changed all references to simply call `supervisord` and let the system PATH find the executable to prevent future need for changes just in case we ever need to switch back to build a Debian package, then we won't need to modify these again.
- Install Python 2 supervisor package >= 3.4.0 in Stretch and Jessie base containers
2020-11-19 23:41:32 -08:00
Mahesh Maddikayala
3be3f4d104
[BCM SAI] update BCM SAI to 4.2.1.5 (#5866)
This release includes changes for supporting Debian Buster, fixes for internally found issues and more enhancement related to spec coverage and feature parity for Broadcom ASICs.

Additional fixes included:

CS00011465498 - Warm reboot
CS00011465061 - interfaces not coming up
CS00011396506 - nexthop resource leak
CS00011452080 - BCM SAI crash while getting lane count
2020-11-18 11:20:43 -08:00
fk410167
a3dd3f55f9
Platform Driver Developement Framework (PDDF) (#4756)
This change introduces PDDF which is described here: https://github.com/Azure/SONiC/pull/536

Most of the platform bring up effort goes in developing the platform device drivers, SONiC platform APIs and validating them. Typically each platform vendor writes their own drivers and platform APIs which is very tailor made to that platform. This involves writing code, building, installing it on the target platform devices and testing. Many of the details of the platform are hard coded into these drivers, from the HW spec. They go through this cycle repetitively till everything works fine, and is validated before upstreaming the code.
PDDF aims to make this platform driver and platform APIs development process much simpler by providing a data driven development framework. This is enabled by:

JSON descriptor files for platform data
Generic data-driven drivers for various devices
Generic SONiC platform APIs
Vendor specific extensions for customisation and extensibility

Signed-off-by: Fuzail Khan <fuzail.khan@broadcom.com>
2020-11-12 10:22:38 -08:00
Srideep
89d9471654
[DellEMC S5232f] Updates and bug fixes for platform (#5887)
* Fix platform sensors
 * Fix issues reported in fpga driver
 * Update fixes for API 2.0 platform code
2020-11-11 12:59:30 -08:00
Ciju Rajan K
609cbdd0f3
[Juniper] Platform bug fixes / improvements (#5541)
* [Juniper] Platform bug fixes / improvements

This patch set introduces the following changes for
the two platforms.

 - QFX5210
   - Fixes a driver bug related to reboot notifier
   - Disable pcied
   - Introduces a wrapper script for fast / warm reboots
     for unloading the driver containing reboot handler
   - Support for PSM4 optics in media_settings

 - QFX5200
   - BCM configuration file updates
   - Bug fixes for EM policy
   - Fixes a driver bug related to reboot notifier
   - Introduces a wrapper script for fast / warm reboots
     for unloading the driver containing reboot handler
   - Disable pcied
   - Support for PSM4 optics

Signed-off-by: Ciju Rajan K <crajank@juniper.net>
2020-11-10 22:13:23 -08:00
Ying Xie
b5cfc02552
[celestica dx010] comment out the initialization of PCA9541 (#5891)
The original code tried to initialize PCA9541 without having the
driver loaded. As result the initialization didn't take effect.

Recently PCA9541 driver was added to the kernel and since then
the initialization takes effect and has negatively impacted the
platform stability.

Commenting the initialization code out to restore the original
behavior while analyzing further.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2020-11-10 16:53:27 -08:00
gechiang
33076348ca
Moving BRCM SAI 4.2.1.3 to 4.2.1.3-1 to pick up fix for CS00011396506 to fix CRM nexthop resource inuse leak (#5878) 2020-11-10 15:28:42 -08:00
Blueve
fb6af9d6d7
[broadcom][cel][udev] Update console usb devices permission (#5840)
* Add MODE "0666" to all console usb tty

Signed-off-by: Jing Kan jika@email.com
2020-11-09 08:54:21 +08:00
Roy Lee
ce6286eb84
[device/accton] Remove the use of python pickle package (#5475)
Pickle is applied to save the order of i2c adapters at installation.
With pickle removed, it just checks the order of i2c buses every time it needs.
2020-11-04 16:24:53 -08:00
abdosi
dddf96933c
[monit] Adding patch to enhance syslog error message generation for monit alert action when status is failed. (#5720)
Why/How I did:

Make sure first error syslog is triggered based on FAULT TOLERANCE condition.

Added support of repeat clause with alert action. This is used as trigger
for generation of periodic syslog error messages if error is persistent

Updated the monit conf files with repeat every x cycles for the alert action
2020-10-31 17:29:49 -07:00
Samuel Angebault
12911ba619
[Arista] Update arista driver submodules (#5736)
- Change `/run/arista` mount to pmon by `/var/run/platform_cache`
 - Python3 by default for Arista platform initialisation
 - Fix outstanding py2/3 compatibility issues (eeprom mostly)
 - Use pytest for unit testing
 - Miscellaneous modular fixes
2020-10-30 04:17:30 -07:00
Arun Saravanan Balachandran
6145e4f6f1
[DellEMC]: FanDrawer and get_high_critical_threshold Platform API implementation for S6000, S6100, Z9100 and Z9264F (#5673)
- Implement FanDrawer and get_high_critical_threshold Platform API for S6000, S6100, Z9100 and Z9264F.
- Fix incorrect fan direction values in S6100, Z9100
2020-10-29 18:05:16 -07:00
Samuel Angebault
5bfe37ca42
[Arista] Update driver submodules (#5686)
- Enable thermalctld support for our platforms
 - Fix Chassis.get_num_sfp which had an off by one
 - Implement read_eeprom and write_eeprom in SfpBase
 - Refactor of Psus and PsuSlots. Psus they are now detected and metadata reported
 - Improvements to modular support

Co-authored-by: Zhi Yuan Carl Zhao <zyzhao@arista.com>
2020-10-23 12:28:36 -07:00