Commit Graph

181 Commits

Author SHA1 Message Date
Sujin Kang
2c41441edd
Revert "[202012] [Arista] Enable syseepromd update database (with 202012 fix) (#8963)" (#9041)
This reverts commit 94456b1680.
2021-10-22 09:54:49 -07:00
zzhiyuan
94456b1680
[202012] [Arista] Enable syseepromd update database (with 202012 fix) (#8963)
Co-authored-by: Zhi Yuan (Carl) Zhao <zyzhao@arista.com>
Why I did it
The previous PR #8914 was reverted due to crashing the 202012 syseepromd.

How I did it
Tested the 202012 image with change and fixed the disparity between master and 202012.

How to verify it
Run the built image on the dut and syseepromd will not crash, and in redis-cli can fetch the eeprom information.
2021-10-14 16:43:57 -07:00
Sujin Kang
4d859eb923
Revert "Add Arista eeprom platform API method to update database (#8914)" (#8955)
This reverts commit 6a6b81b983.
2021-10-12 10:13:48 -07:00
zzhiyuan
6a6b81b983
Add Arista eeprom platform API method to update database (#8914)
Why I did it
Sujin noticed that Arista eeprom platform API cannot update the redis database. Although Arista and Guohan believe that database update logic should be part of the daemon, it is easy enough to implement the fix for Arista for now.

How I did it
Made Arista eeprom platform API inherit from TlvInfoDecoder, then write Arista's own visit_eeprom method.
Co-authored-by: Zhi Yuan (Carl) Zhao <zyzhao@arista.com>
2021-10-07 20:36:08 -07:00
zzhiyuan
380ad2e8fc
[202012] [Arista] Raise ValueError on thermal manager invalid fan speed (#8903)
Why I did it
Vaibhav Dahiya notified me that invalid fan speed policy was expecting an error raised in sonic-mgmt testing, but it was not raised.
This change will fix test_platform_info.py::test_thermal_control_load_invalid_value_json

How I did it
Add in the suggested code chunk to Arista platform submodule to raise ValueError when an invalid fan speed is set in thermal policy.

How to verify it
Vaibhav Dahiya has verified it through sonic-mgmt testing.

Co-authored-by: Zhi Yuan (Carl) Zhao <zyzhao@arista.com>
2021-10-05 18:29:26 -07:00
Samuel Angebault
c76f884af1
[202012][Arista] Update platform library (#8709)
fix mac address format for get_system_eeprom_info
harden pmbus status reading for Clearlake
force loading PSUs on Cloverdale
2021-09-13 07:36:43 -07:00
Samuel Angebault
d74e927a8d
[202012][Arista] Update platform library submodules (#8630) 2021-08-31 19:34:38 -07:00
Samuel Angebault
6a2d9e177c
[202012][Arista] Update platform library submodules (#8530)
Fix Chassis.get_name to return the same value than what's in platform.json
Fix Chassis.get_system_eeprom_info when running from within pmon.
Fix Watchdog.get_remaining_time (fixes [202012 platform_tests] TestWatchdogApi::test_remaining_time failure on vms20-t1-7050cx3-3.1 #8440 and [ 202012 platform_tests ] TestWatchdogApi::test_arm_disarm_states failure on vms20-t1-7050cx3-3.1 #8439)
Implement missing thermal infos and conditions (fixes [202012 platform_tests] test_platform_info.py::test_thermal_control_psu_absence error #8453)
Fix Chassis.set_status_led return value (fixes [2020 platform_tests] TestChassisApi::test_status_led failure on vms20-t0-7050cx3-1  #8464)
2021-08-20 10:30:12 -07:00
roman_savchuk
a06cd18dbe
[BFN]: update bfnsdk package (#8350)
Signed-off-by: Roman Savchuk <romanx.savchuk@intel.com>
2021-08-10 08:19:37 -07:00
Samuel Angebault
99efd5346e
[202012][Arista] Update platform library submodules (#8339)
This PR only contains backports from master

Fix leak discovered on master, though 202012 is not affected it's better to have the fix (fixes [master] thermalctld leak on Arista devices makes them unreachable when memory is exhausted #7515)
Fix EepromDecoderimplementation in the platform API (fixes syseepromd crashing repeatedly on SONiC.20201231.02 #8263)
Fix Mineral platform definition and configuration
Fix build issues in environments where /proc is not mounted/restricted (fixes PLATFORM=broadcom fails arista "ReloadCauseManagerTest" first time #7800)
Fix some pytest issues
Add sfp-eeprom C API and also mount it in pmon
2021-08-05 18:35:31 -07:00
andywongarista
b4832d40a9
[202012] [Arista] Update platform drivers submodules (#7916)
There was an issue on master where `thermal.get_position_in_parent` in the platform API was returning -1 instead of a proper index. This is a backport of the fix for that issue.
2021-06-21 18:00:01 -07:00
Volodymyr Boiko
26c6f2a4b2 [barefoot][platform] Chassis.get_reboot_cause (#7794)
To fix determine-reboot-cause service which was failing due to non-implemented thrown from get_reboot_case, if the reboot was done with `sudo reboot` (cold reboot)

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-06-16 12:38:30 +00:00
yozhao101
fb2c995f53
[202012][Monit] Deprecate the feature of monitoring the critical processes by Monit (#7823)
Signed-off-by: Yong Zhao yozhao@microsoft.com

Why I did it
Currently we leveraged the Supervisor to monitor the running status of critical processes in each container and it is more reliable and flexible than doing the monitoring by Monit. So we removed the functionality of monitoring the critical processes by Monit.

How I did it
I removed the script process_checker and corresponding Monit configuration entries of critical processes.

How to verify it
I verified this on the device str-7260cx3-acs-1.
2021-06-09 09:04:22 -07:00
Volodymyr Boiko
44d9489b8d [barefoot][platform] Refactor chassis.py (#7704)
#### Why I did it
On our platforms syncd must be up while using the sonic_platform.
The issue is warm-reboot script first disables syncd then instantiate Chassis, which tries to connect syncd in __init__.

#### How I did it
Refactor Chassis to lazy initialize components.

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-06-09 08:25:30 +00:00
Volodymyr Boiko
384680a83f [platform][barefoot] Lazy initialize fans and thermals list (#7103)
Initialize fans and thermals lists on demand; make them properties in order to reduce Chassis object initialization time

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-06-09 08:22:44 +00:00
Volodymyr Boiko
6a90c09cc7 [barefoot][sonic-platform] Fix Fan.set_speed (#7763)
Fixed typo

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-06-03 12:13:42 +00:00
Volodymyr Boiko
202c31ebbe [barefoot][platform] Support fans and thermal (#7004)
Add support for fans and thermals to sonic-platform package for Montara platform

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-06-03 12:13:42 +00:00
zzhiyuan
39a4cd9c3a
[202012][Arista] Update Arista submodule to include pmbus fix (#7737)
Why I did it
Microsoft reported occasional daemon crashes on devices running 201911. On close inspection it was due to PMBus reads failing on IOError on very rare occasions.

This is the fix for 202012 branch.

How I did it
Add try/except block on performing reads on PMBus GPIOs.

How to verify it
Which release branch to backport (provide reason below if selected)

Co-authored-by: Zhi Yuan (Carl) Zhao <zyzhao@arista.com>
2021-05-28 15:43:30 -07:00
Joe LeVeque
dd9be59cd1
[202012][dockers][supervisor] Increase event buffer size for process exit listener; Set all event buffer sizes to 1024 (#7203)
#### Why I did it

Backport of https://github.com/Azure/sonic-buildimage/pull/7083 to the 202012 branch.

To prevent error [messages](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802) like the following from being logged:

```
Mar 17 02:33:48.523153 vlab-01 INFO swss#supervisord 2021-03-17 02:33:48,518 ERRO pool supervisor-proc-exit-listener event buffer overflowed, discarding event 46
```

This is basically an addendum to https://github.com/Azure/sonic-buildimage/pull/5247, which increased the event buffer size for dependent-startup. While supervisor-proc-exit-listener doesn't subscribe to as many events as dependent-startup, there is still a chance some containers (like swss, as in the example above) have enough processes running to cause an overflow of the default buffer size of 10.

This is especially important for preventing erroneous log_analyzer failures in the sonic-mgmt repo regression tests, which have started occasionally causing PR check builds to fail. Example [here](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802).

I set all supervisor-proc-exit-listener event buffer sizes to 1024, and also updated all dependent-startup event buffer sizes to 1024, as well, to keep things simple, unified, and allow headroom so that we will not need to adjust these values frequently, if at all.
2021-04-01 12:52:19 -07:00
Mykola Gerasymenko
567eae6116 [barefoot]: Add psample module to load at boot time on BFN platform (#7164)
The psample module was not loaded on barefoot platform. The loading of this module is a prerequisite for testing SFlow.

* add `.gitignore` to the `barefoot` subdirectory to overwrite ignore "platform/**/debian/*" in the root directory
2021-03-31 08:48:53 -07:00
Myron Sosyak
0ec92765ba [barefoot]: Updated SDK packages to 20210324 (#7142)
Update unsupported SAI attr ('SAI_ACL_TABLE_ATTR_FIELD_OUTER_VLAN_ID') to fix issues on acl table create
2021-03-31 08:46:16 -07:00
Volodymyr Boiko
8932678285 [platform][barefoot] Use urllib.parse.quote (#7010)
Fix Python 2 -> Python 3 migration issue

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-03-15 19:12:54 -07:00
Volodymyr Boiko
2e2c85222e [barefoot][platform] Extend sonic_platform psu.py (#7006)
Improve sonic_platform PSU support

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-03-15 19:12:33 -07:00
Samuel Angebault
e1f8c07d9e
[Arista] Update platform drivers (#6946)
- Provide `hw-management-generate-dump.sh` for `show techsupport`
 - Load `optoe3` for OSFP and QSFP-DD transceivers
 - Enhance reboot-cause caching robustness

Signed-off-by: Samuel Angebault <staphylo@arista.com>
2021-03-08 09:37:16 -08:00
Volodymyr Boiko
3f99959828 [platform][barefoot] Fix as9516bf installation (#6938)
To fix sonic_platform installation on as9516bf platform

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-03-04 21:58:53 +00:00
Volodymyr Boiko
760b3361ae [barefoot][platform] Refactor legacy scripts (#6871)
Removed or adapted obsolete code

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-03-04 21:23:05 +00:00
Volodymyr Boiko
0d15e07d14 [barefoot][device][platform] Moved pcie.yaml (#6862)
To fix #6860

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-03-04 21:23:05 +00:00
Sujin Kang
15aed52ef2 [pcie.yaml] Move pcie configuration file path to platform directory (#6475)
- Why I did it
The pcie configuration file location is under plugin directory not under platform directory.
#6437

- How I did it

Move all pcie.yaml configuration file from plugin to platform directory.
Remove unnecessary timer to start pcie-check.service
Move pcie-check.service to sonic-host-services
- How to verify it
Verify on the device
2021-03-04 21:23:05 +00:00
Samuel Angebault
a654518968
[Arista] Driver and platform update (#6468) (#6872)
- Add support for `DCS-7050SX3-48YC8` and `DCS-7050SX3-48C8` platform
 - Add support for more variants of `DCS-7280CR3-32[PD]4`
 - Add Supervisor to Linecard consutil support
 - Complete Watchdog platform API support
 - Fix some PSU behavior on `DCS-7050QX-32` and `DCS-7060CX-32S`
 - Fix SEU management on `DCS-7060CX-32S`
 - Allow kernel modules to build up to linux 5.10
 - Rename led color `orange` to `amber`
 - Miscellaneous fixes
2021-02-24 10:09:52 -08:00
roman_savchuk
024b173521 [build]: fixed BFN target build (#6826)
#### Why I did it
Fix runtime issues caused by SONiC update

#### How I did it
- new attribute SAI_ACL_ENTRY_ATTR_FIELD_ACL_IP_TYPE  supported 
- new attribute SAI_SWITCH_ATTR_AVAILABLE_IPMC_ENTRY supported

Signed-off-by: Roman Savchuk <romanx.savchuk@intel.com>
2021-02-23 23:56:01 +00:00
Myron Sosyak
faf2fd3be1 [BFN] Fix MTU for internal interface (#6783)
Set correct MTU size of internal interface for Newport platform
2021-02-23 23:56:01 +00:00
Volodymyr Boiko
8b3813b637 [barefoot][sonic-platform] Fix sfp reset (#6746)
Fix wrong sfp reset return value

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-02-16 15:31:54 -08:00
Volodymyr Boiko
bcc4a52f56 [barefoot][sonic-platform] Refactor sfp.py (#6770)
Use separate file for each sfp eeprom operation

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-02-16 15:31:44 -08:00
Volodymyr Boiko
5aadc7ff55 [barefoot][sonic-platform] Fix get_system_eeprom_info and refactor eeprom.py (#6739)
Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-02-16 15:31:38 -08:00
Volodymyr Boiko
814cf03f98 [platform][barefoot] Fix sonic_platform for x86_64-accton_wedge100bf_65x-r0 (#6612)
platform modules' deb package rules for x86_64-accton_wedge100bf_65x-r0 was missing the code that installs the sonic_platform wheel package file under device directory on host (which is checked by pmon deamon on start)
2021-02-16 15:31:17 -08:00
Tamer Ahmed
becd143a41 [syncd-rpc] Install Libboost Atomic 1.71, Libqtcore And Libqtnetwork (#6689)
When Building syncd-rpc, libthrift has dependency on libboost-atomic1.71.0,
however the debian packager install version 1.67 instead. This PR
preinstalls libboost-atomic v 1.71 to avoid falling back to v 1.67.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2021-02-16 15:29:21 -08:00
lguohan
ab03441ce9 [sonic-linux-kernel]: security update to kernel 4.19.152 (#6490)
Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-02-16 15:28:34 -08:00
Volodymyr Boiko
742bbed255 [barefoot][platform] Fix sonic-platform host installation (#6696)
prerm is needed for platform modules package to be properly removed.
Added prerm to remove installed in postinst wheel packages.

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-02-06 23:32:27 -08:00
Volodymyr Boiko
d26a4aff9a [platform][barefoot] Install sonic_platform to host (#6644)
- Why I did it
SONiC design requires sonic_platform package to be installed in SONiC host environment, not only in docker containers.

- How I did it
For now, sonic_platform python wheel package, that is used by pmon, is provided via device-specific platform modules deb packages that unpacks the wheel package file into specific device's directory on lazy-install.
The PR makes deb packages' postinst script also install these unpacked wheel packages to host.

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-02-03 10:48:53 -08:00
Stephen Sun
bf8a76634c [syncd-rpc docker] Fix issue: ptf_nn_agent isn't able to start in syncd-rpc docker on buster (#6448)
- Why I did it
Fix issue: ptf_nn_agent isn't able to start in syncd-rpc docker on buster.

- How I did it
The issue is fixed by installing python-dev, cffi and nnpy for python 2 explicitly.

- How to verify it
Run copp test on RPC image.
2021-02-03 10:43:48 -08:00
Volodymyr Boiko
3e70b97342 [barefoot][platform] platform API 2.0 fixes (#6607)
To improve python3 support of berefoot's sonic_platform

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-02-03 10:43:48 -08:00
yozhao101
cc9c3f567e [supervisord] Monitoring the critical processes with supervisord. (#6242)
- Why I did it
Initially, we used Monit to monitor critical processes in each container. If one of critical processes was not running
or crashed due to some reasons, then Monit will write an alerting message into syslog periodically. If we add a new process
in a container, the corresponding Monti configuration file will also need to update. It is a little hard for maintenance.

Currently we employed event listener of Supervisod to do this monitoring. Since processes in each container are managed by
Supervisord, we can only focus on the logic of monitoring.

- How I did it
We borrowed the event listener of Supervisord to monitor critical processes in containers. The event listener will take
following steps if it was notified one of critical processes exited unexpectedly:

The event listener will first check whether the auto-restart mechanism was enabled for this container or not. If auto-restart mechanism was enabled, event listener will kill the Supervisord process, which should cause the container to exit and subsequently get restarted.

If auto-restart mechanism was not enabled for this contianer, the event listener will enter a loop which will first sleep 1 minute and then check whether the process is running. If yes, the event listener exits. If no, an alerting message will be written into syslog.

- How to verify it
First, we need checked whether the auto-restart mechanism of a container was enabled or not by running the command show feature status. If enabled, one critical process should be selected and killed manually, then we need check whether the container will be restarted or not.

Second, we can disable the auto-restart mechanism if it was enabled at step 1 by running the commnad sudo config feature autorestart <container_name> disabled. Then one critical process should be selected and killed. After that, we will see the alerting message which will appear in the syslog every 1 minute.

- Which release branch to backport (provide reason below if selected)

 201811
 201911
[x ] 202006
2021-01-28 09:28:27 -08:00
Guohan Lu
0b2abafcde [docker-ptf]: build docker ptf
- combine docker-ptf-saithrift into docker-ptf docker
- build docker-ptf under platform vs
- remove docker-ptf for other platforms

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-28 09:23:12 -08:00
Antonina Melnyk
df76e33bb3 [barefoot] Fixes for platform API (#6487)
There was a mismatch with Eeprom class methods names and methods called from Eeprom class.

Signed-off-by: Antonina Melnyk antoninax.melnyk@intel.com
2021-01-24 22:42:31 -08:00
Samuel Angebault
c6f14c9927
[202012][Arista] Update driver submodules (#6397)
- Cleanup and Refactor of library internals, logic mostly unchanged.
 - Enhance debugability with `arista dump` and `arista diag` commands.
 - Fix power supply detection issue.
2021-01-09 08:05:54 -08:00
Myron Sosyak
f7bb635be8 [BFN] Convert platform modules to python 3 (#6347)
Fix syntax errors during xcvrd start with Python 3 daemons
2021-01-04 23:24:19 -08:00
Denys Petryshyn
42fa096883 [BFN] Upgrade docker-syncd-bfn to buster (#6345)
* Add changes to allow migration of bfn syncd to buster

* Update BFN packages for Debian 10

Signed-off-by: Denys Petryshyn <denysx.petryshyn@intel.com>
2021-01-04 23:23:46 -08:00
Andriy Kokhan
94e143cebe
[BFN] Updated SAI headers to v1.7.1 (#6294)
Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>

Co-authored-by: Andriy Kokhan <andriyx.kokhan@intel.com>
2020-12-24 18:45:41 -08:00
Samuel Angebault
44f4c2ed66
[Arista] Update driver submodules (#6151)
- Enhance eeprom parsing robustness on corrupted fields
 - Add chassis provisioning service
 - Disable CPU sleep state on some systems
 - Complete refactor for FanSlots
 - Fix module unload while still in use
2020-12-08 11:17:28 -08:00
Dmytro Shevchuk
026f0ec3fb
[Barefoot]: fix unresolved SFP type on Newport/Montara (#6063)
Fix `show interface status` and `sfpshow eeprom` commands showing incorrect information on Newport/Montara platforms
2020-12-04 11:00:03 -08:00