- Improve chassis linecard restartability
- Fix 'show system-health' cli by adding non standard api
- Fix ledd crash on linecards with Recycle/Inband ports
- Refactor DPM management and add ADM1266 support
- Add state machine to update DPM RTC clock periodically
- Improve xcvr temperature reporting
- Fix lane mapping and `default_sku` for `x86_64-arista_7170_32c` platform
- Fix `7170-32C/CD` platform definition
To fix determine-reboot-cause service which was failing due to non-implemented thrown from get_reboot_case, if the reboot was done with `sudo reboot` (cold reboot)
Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
Signed-off-by: Yong Zhao yozhao@microsoft.com
Why I did it
Currently we leveraged the Supervisor to monitor the running status of critical processes in each container and it is more reliable and flexible than doing the monitoring by Monit. So we removed the functionality of monitoring the critical processes by Monit.
How I did it
I removed the script process_checker and corresponding Monit configuration entries of critical processes.
How to verify it
I verified this on the device str-7260cx3-acs-1.
#### Why I did it
On our platforms syncd must be up while using the sonic_platform.
The issue is warm-reboot script first disables syncd then instantiate Chassis, which tries to connect syncd in __init__.
#### How I did it
Refactor Chassis to lazy initialize components.
Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
- Fix `thermal.get_position_in_parent` issue which causes test_snmp_phy_entity failure
- Add support for xcvr thermal info so that thermalctld can incorporate it into the cooling algorithm (QSFP and OSFP/QSFP-DD modules only)
- Add improvements to `arista` CLI
#### Why I did it
To fix the following:
```
# psuutil status
Traceback (most recent call last):
File "/usr/local/bin/psuutil", line 8, in <module>
sys.exit(cli())
File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1137, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/psuutil/main.py", line 93, in status
psu_name = psu.get_name()
File "/usr/local/lib/python3.7/dist-packages/sonic_platform_base/device_base.py", line 28, in get_name
raise NotImplementedError
NotImplementedError
```
#### How I did it
Implemented get_name
Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
Platform library changes
- Fix the use of /proc/modules during testing, fixes#7463
- Add `libsfp-eeprom.so` build to read/write xcvr eeproms in C
- Add some more reboot-cause information
- Write down temperature hw thresholds to the sensors
- Report software thresholds through platform api
- Writ `port_name sysfs` file of optoe`
- Tests enhancements
- Fix dependency issues for chassis provisioning
Platform configuration changes
- Add `pcie.yaml` configuration for a few platforms
- Mount `libsfp-eeprom.so` inside `pmon`
- Fix `Arista-7050SX3-48C8` and `Arista-7050SX3-48YC8' platform and hwsku
- Miscellaneous fixes
Co-authored-by: Boyang Yu <byu@arista.com>
Co-authored-by: Zhi Yuan Carl Zhao <zyzhao@arista.com>
Signed-off-by: Stepan Blyschak stepanb@nvidia.com
This PR is part of SONiC Application Extension
Depends on #5938
- Why I did it
To provide an infrastructure change in order to support SONiC Application Extension feature.
- How I did it
Label every installable SONiC Docker with a minimal required manifest and auto-generate packages.json file based on
installed SONiC images.
- How to verify it
Build an image, execute the following command:
admin@sonic:~$ docker inspect docker-snmp:1.0.0 | jq '.[0].Config.Labels["com.azure.sonic.manifest"]' -r | jq
Cat /var/lib/sonic-package-manager/packages.json file to verify all dockers are listed there.
- Fix build issue when `/proc/cmdline` is not available. Fixes#7145
- Properly detect linecard slot from linecard CPU
- Some fixes on the uart link between supervisor and linecard CPU
- Other small fixes
Platform API
- Fix Watchdog get_remaining_time logic
- Improve Sfp platform API implementation
- Improve EepromDecoder API implementation
- Fix mismatch between Fan name and platform.json
- Add PSU get_maximum_supplied_power
Internal
- Refactor of Xcvr declaration and initialization
- Cleanup of Resets and Gpios
- Add platform library versioning to enhance support capabilities
- Allow supervisor to manage cards from slot 2
- Miscelanous cleanups and refactors
Signed-off-by: Samuel Angebault <staphylo@arista.com>
The psample module was not loaded on barefoot platform. The loading of this module is a prerequisite for testing SFlow.
* add `.gitignore` to the `barefoot` subdirectory to overwrite ignore "platform/**/debian/*" in the root directory
Initialize fans and thermals lists on demand; make them properties in order to reduce Chassis object initialization time
Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
- Provide `hw-management-generate-dump.sh` for `show techsupport`
- Load `optoe3` for OSFP and QSFP-DD transceivers
- Enhance reboot-cause caching robustness
Signed-off-by: Samuel Angebault <staphylo@arista.com>
- Fix parent `__init__` call for platform API, based on Azure/sonic-platform-common#173
- Implementation for more platform API methods
- Daily storage information reporting in `arista daemon`
- Enhancements to reboot cause reporting, now multi-sourcing reboot cause information
- Miscellaneous fixes
#### Why I did it
Fix runtime issues caused by SONiC update
#### How I did it
- new attribute SAI_ACL_ENTRY_ATTR_FIELD_ACL_IP_TYPE supported
- new attribute SAI_SWITCH_ATTR_AVAILABLE_IPMC_ENTRY supported
Signed-off-by: Roman Savchuk <romanx.savchuk@intel.com>
- Why I did it
The pcie configuration file location is under plugin directory not under platform directory.
#6437
- How I did it
Move all pcie.yaml configuration file from plugin to platform directory.
Remove unnecessary timer to start pcie-check.service
Move pcie-check.service to sonic-host-services
- How to verify it
Verify on the device
- Add support for `DCS-7050SX3-48YC8` and `DCS-7050SX3-48C8` platform
- Add support for more variants of `DCS-7280CR3-32[PD]4`
- Add Supervisor to Linecard consutil support
- Complete Watchdog platform API support
- Fix some PSU behavior on `DCS-7050QX-32` and `DCS-7060CX-32S`
- Fix SEU management on `DCS-7060CX-32S`
- Allow kernel modules to build up to linux 5.10
- Rename led color `orange` to `amber`
- Miscellaneous fixes
platform modules' deb package rules for x86_64-accton_wedge100bf_65x-r0 was missing the code that installs the sonic_platform wheel package file under device directory on host (which is checked by pmon deamon on start)
When Building syncd-rpc, libthrift has dependency on libboost-atomic1.71.0,
however the debian packager install version 1.67 instead. This PR
preinstalls libboost-atomic v 1.71 to avoid falling back to v 1.67.
signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
prerm is needed for platform modules package to be properly removed.
Added prerm to remove installed in postinst wheel packages.
Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
- Why I did it
SONiC design requires sonic_platform package to be installed in SONiC host environment, not only in docker containers.
- How I did it
For now, sonic_platform python wheel package, that is used by pmon, is provided via device-specific platform modules deb packages that unpacks the wheel package file into specific device's directory on lazy-install.
The PR makes deb packages' postinst script also install these unpacked wheel packages to host.
Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
- Why I did it
Fix issue: ptf_nn_agent isn't able to start in syncd-rpc docker on buster.
- How I did it
The issue is fixed by installing python-dev, cffi and nnpy for python 2 explicitly.
- How to verify it
Run copp test on RPC image.
- combine docker-ptf-saithrift into docker-ptf docker
- build docker-ptf under platform vs
- remove docker-ptf for other platforms
Signed-off-by: Guohan Lu <lguohan@gmail.com>
- Why I did it
Initially, we used Monit to monitor critical processes in each container. If one of critical processes was not running
or crashed due to some reasons, then Monit will write an alerting message into syslog periodically. If we add a new process
in a container, the corresponding Monti configuration file will also need to update. It is a little hard for maintenance.
Currently we employed event listener of Supervisod to do this monitoring. Since processes in each container are managed by
Supervisord, we can only focus on the logic of monitoring.
- How I did it
We borrowed the event listener of Supervisord to monitor critical processes in containers. The event listener will take
following steps if it was notified one of critical processes exited unexpectedly:
The event listener will first check whether the auto-restart mechanism was enabled for this container or not. If auto-restart mechanism was enabled, event listener will kill the Supervisord process, which should cause the container to exit and subsequently get restarted.
If auto-restart mechanism was not enabled for this contianer, the event listener will enter a loop which will first sleep 1 minute and then check whether the process is running. If yes, the event listener exits. If no, an alerting message will be written into syslog.
- How to verify it
First, we need checked whether the auto-restart mechanism of a container was enabled or not by running the command show feature status. If enabled, one critical process should be selected and killed manually, then we need check whether the container will be restarted or not.
Second, we can disable the auto-restart mechanism if it was enabled at step 1 by running the commnad sudo config feature autorestart <container_name> disabled. Then one critical process should be selected and killed. After that, we will see the alerting message which will appear in the syslog every 1 minute.
- Which release branch to backport (provide reason below if selected)
201811
201911
[x ] 202006
- Enhance eeprom parsing robustness on corrupted fields
- Add chassis provisioning service
- Disable CPU sleep state on some systems
- Complete refactor for FanSlots
- Fix module unload while still in use