Commit Graph

557 Commits

Author SHA1 Message Date
Santhosh Kumar T
9093feb113
[DellEMC][201911] S6100 CPLD upgrade support in 201911 branch porting changes (#10686)
Why I did it
Ported changes from DellEMC: S6100 CPLD upgrade (#4299) and DellEMC S6100 CPLD upgrade support (#3834) to the 201911 branch.
Added CPLD upgrade support for DellEMC S6100 platform.
2022-04-28 09:23:38 -07:00
Santhosh Kumar T
ac35a62747
[DellEMC][201911] S6100 S6000 - Show techsupport enhancement (#10690)
2022-04-27 09:17:35 -07:00
Arun Saravanan Balachandran
33ef26d97b
[201911] DellEMC: S6000, S6100 - Enable thermalctld, Platform API changes (#9384)
Why I did it
To incorporate the following changes into the DellEMC S6100 and S6000 platforms:

Enable thermalctld
Backport Platform API changes from master branch.
How I did it
Remove 'skip_thermalctld:true' from pmon_daemon_control.json
Implement the platform API methods in the respective device files
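To illustrate the effect of removing the flag, here is a minimal Python sketch of how a startup helper could read pmon_daemon_control.json. It assumes the file is a flat JSON object whose `skip_<daemon>` keys default to false when absent; `should_skip` is a hypothetical name, not the actual pmon startup code.

```python
import json


def should_skip(daemon: str, config_text: str) -> bool:
    """Return True if the control file asks pmon not to start `daemon`.

    Assumption: the file is a flat JSON object, and a missing
    "skip_<daemon>" key means the daemon should be started.
    """
    config = json.loads(config_text)
    return bool(config.get("skip_" + daemon, False))


before = '{"skip_thermalctld": true}'  # flag present: thermalctld is skipped
after = '{}'                           # flag removed: thermalctld starts

print(should_skip("thermalctld", before))  # True
print(should_skip("thermalctld", after))   # False
```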
How to verify it
Verified that platform data is displayed by the show platform fan and show platform temperature commands.
2021-12-10 12:23:22 -08:00
Samuel Angebault
dfa77a54d5
[201911][Arista] Backport logrotate configuration (#9455)
Backport logrotate configuration for arista*.log files
2021-12-08 19:11:04 -08:00
Santhosh Kumar T
ddf40cb729
[201911] Dell S6000 I2C not responding to certain optics - porting (#8855)
2021-10-25 15:25:12 +05:30
Aravind Mani
c53822c9e8
[201911] Dell S6100:Add serial-getty service to monit (#8409)
Why I did it
The serial-getty service exited randomly on Dell S6100 devices.

How I did it
Added serial-getty to monit services.

How to verify it
Stop serial-getty in an SSH session and check whether the service restarts.
2021-08-19 10:13:34 -07:00
abdosi
de3d30f36d
Updated Broadcom SAI Debian package to 3.7.6.1 (#8365)
Updated Broadcom SAI Debian package to 3.7.6.1. The major changes are:

- CS00011651922/CS00012192502 SID:Parity error in TDM Calendar memories causes traffic drop after SER correction
- CS00011222060 soc_mem_alpm_delete: unit 0: ALPM delete operation[L3_DEFIP_ALPM_IPV6_128] encountered parity error
- Cesto Phy Recovery enhancement.
- SDK compiled with flags -DBCM_MONOTONIC_TIME and -DBCM_MONOTONIC_MUTEXES
2021-08-06 17:55:41 -07:00
Arun Saravanan Balachandran
d573cd141d
[201911] DellEMC S6100: Update SSD upgrade status checker (#8225)
Why I did it
To handle newer SSD firmware version in DellEMC S6100 platform (S210506G - 3IE devices).

How I did it
Update s6100_ssd_upgrade_status.sh to handle newer SSD firmware version.

How to verify it
Logs: UT_logs.txt
2021-08-05 22:43:53 -07:00
abdosi
0f56f8b4f4
[201911] Updated Broadcom SAI Debian package to 3.7.5.2-3 (#7887)
Updated Broadcom SAI Debian package to 3.7.5.2-3
2021-06-15 16:03:23 -07:00
Joe LeVeque
b6acac4e6a
[brcm] Fix and simplify start_led.sh (#7548)
LED_PROC_INIT_SOC variable was incorrectly referenced as LED_SOC_INIT_SOC. Introduced in #5483

Rather than fixing the typo, I decided to simplify the script, removing the need for the conditional altogether by moving the bcmcmd call inside the conditional which checks for the presence of LED_SOC_INIT_SOC.
2021-05-31 08:09:19 -07:00
zzhiyuan
f2afdf666e
[201911][Arista] Update Arista submodule to include pmbus fix (#7723)
#### Why I did it
Microsoft reported occasional daemon crashes on devices running 201911. On closer inspection, they were caused by PMBus reads failing with IOError on very rare occasions.

#### How I did it
Added a try/except block around reads of the PMBus GPIOs.
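The fix pattern can be sketched in Python as below. This is a minimal illustration assuming a generic read callable; it is not the actual code in the Arista submodule, and `read_pmbus_gpio` and `flaky_read` are hypothetical names.

```python
import logging


def read_pmbus_gpio(read_fn, default=None):
    """Read a PMBus GPIO, tolerating rare transient I/O failures.

    `read_fn` is a hypothetical zero-argument callable standing in for the
    real driver read. On IOError the failure is logged and `default` is
    returned, so the calling daemon keeps running instead of crashing.
    """
    try:
        return read_fn()
    except IOError as exc:  # in Python 3, IOError is an alias of OSError
        logging.warning("PMBus GPIO read failed: %s", exc)
        return default


def flaky_read():
    # Simulates the rare read failure observed on the devices.
    raise IOError("Remote I/O error")


print(read_pmbus_gpio(lambda: 1))              # 1
print(read_pmbus_gpio(flaky_read, default=0))  # 0
```

Returning a default keeps a single failed sample from taking down the whole daemon; whether a caller should retry instead is a separate design choice.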

Co-authored-by: Zhi Yuan (Carl) Zhao <zyzhao@arista.com>
2021-05-27 12:30:38 -07:00
Santhosh Kumar T
04b6112132
[DellEMC] Recovering the SSD upgrade status post reload in S6100 (#7688)
Why I did it
To recover the SSD upgrade state if the ONIE-uninstall or ssd_fw_upgrade folder gets deleted.
To handle a newer SSD version (S21506G - 3IE GPIO7 low devices).
To correct the error messages for non-upgraded S6100 devices.
2021-05-25 15:24:09 -07:00
Santhosh Kumar T
6204a1d809
[201911] DellEMC S6100 SSD Monitor additional changes (#7291)
Why I did it
Added soft-reboot plugin support.
Added a check for SSD version s16425cq.
Added an error message that is displayed on the console/SSH session when reboot is called on faulty or non-upgraded devices.
2021-05-04 09:48:04 -07:00
rkdevi27
47011a8e2c
[201911][DellEMC] Fix abrupt reboot in S6000 (#6909)
On S6000 devices, the cold reboot is abrupt and can cause the device to land in the EFI shell. Hence the platform reboot now happens after all filesystems are gracefully unmounted, as on the S6100.
2021-03-30 16:14:45 -07:00
Joe LeVeque
72b32a96fc
[201911][dockers][supervisor] Increase event buffer size for process exit listener (#7106)
Backport of https://github.com/Azure/sonic-buildimage/pull/7083 to the 201911 branch.

#### Why I did it

To prevent error [messages](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802) like the following from being logged:

```
Mar 17 02:33:48.523153 vlab-01 INFO swss#supervisord 2021-03-17 02:33:48,518 ERRO pool supervisor-proc-exit-listener event buffer overflowed, discarding event 46
```

This is basically an addendum to https://github.com/Azure/sonic-buildimage/pull/5247, which increased the event buffer size for dependent-startup. While supervisor-proc-exit-listener doesn't subscribe to as many events as dependent-startup, there is still a chance some containers (like swss, as in the example above) have enough processes running to cause an overflow of the default buffer size of 10.

This is especially important for preventing erroneous log_analyzer failures in the sonic-mgmt repo regression tests, which have started occasionally causing PR check builds to fail. Example [here](https://dev.azure.com/mssonic/build/_build/results?buildId=2254&view=logs&j=9a13fbcd-e92d-583c-2f89-d81f90cac1fd&t=739db6ba-1b35-5485-5697-de102068d650&l=802).

I set all supervisor-proc-exit-listener event buffer sizes to 1024 and also updated all dependent-startup event buffer sizes to 1024, to keep things simple and unified and to allow headroom so that we will not need to adjust these values frequently, if at all.
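Why the default of 10 overflows can be seen with a simplified Python model of a listener pool's event buffer. It assumes, as the log line suggests, that when the buffer is full the oldest queued event is dropped to make room; `EventPool` and `simulate` are hypothetical names, this is not supervisor's own code, and the burst of 50 events is an arbitrary example.

```python
from collections import deque


class EventPool:
    """Simplified model of an event listener pool's bounded buffer."""

    def __init__(self, buffer_size: int = 10):
        self.buffer_size = buffer_size
        self.buffer = deque()
        self.discarded = []

    def enqueue(self, serial: int) -> None:
        # When full, drop the oldest queued event ("discarding event N").
        if len(self.buffer) >= self.buffer_size:
            self.discarded.append(self.buffer.popleft())
        self.buffer.append(serial)


def simulate(buffer_size: int, burst: int) -> int:
    """Queue `burst` events while the listener is busy; return how many were dropped."""
    pool = EventPool(buffer_size)
    for serial in range(burst):
        pool.enqueue(serial)
    return len(pool.discarded)


print(simulate(10, 50))    # 40 events dropped with the default size
print(simulate(1024, 50))  # 0
```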
2021-03-29 10:07:43 -07:00
Santhosh Kumar T
140576ddbb
[201911] DellEMC S6100 SSD Monitor (#6934)
Why I did it
To monitor the SSD health condition in DellEMC S6100 platform post upgrade.

A daemon is introduced to monitor the SSD every hour.

To check for SSD status at boot time and at the time of cold-reboot.

All these changes are supported only for newer SSD firmware.

Added a platform_reboot_pre_check script to prevent cold-reboot based on SSD status.
Depends on Azure/sonic-utilities#1472
DO NOT MERGE UNTIL ABOVE PR IS MERGED
2021-03-12 17:02:17 -08:00
gechiang
705b0c4daa
[broadcom]: BRCM SAI 3.7.5.2-2 Pick up fix for CS00011729558 SAI_STATUS_INSUFFICIENT_RESOURCE with attr SAI_BUFFER_PROFILE_ATTR_RESERVED_BUFFER_SIZE on the buffer profile using mmuconfig -p egress_lossy_profile (#6900)
This addresses the issue where executing "mmuconfig -p egress_lossy_profile" causes a syncd failure with SAI_STATUS_INSUFFICIENT_RESOURCE for attr SAI_BUFFER_PROFILE_ATTR_RESERVED_BUFFER_SIZE.
This change also requires the change from https://github.com/Azure/sonic-swss/pull/1649.
This SAI change was already tested as part of that PR.
2021-03-03 11:09:32 -08:00
Roy Lee
ce6cc3821f
[device/accton]: As7816-64x, fix memory leakage on accton fan monitor. (#6168)
It was reported that the accton fan monitor process keeps consuming memory after a few days.
The amount of memory occupied increases linearly and is never released.

Signed-off-by: roy_lee <roy_lee@edge-core.com>
2021-02-18 18:10:22 -08:00
Wirut Getbamrung
a5de91069c
[device/celestica]: Add thermalctld support on DX010 platform APIs (#6089)
**- Why I did it**
- The thermalctld daemon on the Pmon docker requires support from the thermal manager API.

**- How I did it**
- Removed the old function for detecting a faulty fan.
- Removed the old function for detecting excess temperature.
- Implemented thermal_manager APIs based on ThermalManagerBase
- Implemented thermal_conditions APIs based on ThermalPolicyConditionBase
- Implemented thermal_actions APIs based on ThermalPolicyActionBase
- Implemented thermal_info APIs based on ThermalPolicyInfoBase
- Added thermal_policy.json
2021-02-18 18:09:57 -08:00
Samuel Angebault
6cc5c93484
[arista]: Update Arista driver submodules (#6670)
On the DCS-7060CX-32S, an SEU (single event upset) can happen on a CPLD, which by default would reboot the platform.
Other SEU scenarios are already handled but this one was missed since it's specific to this platform.
It's a pretty rare case which will now be reported in the syslog the same way others are.
2021-02-10 23:15:55 -08:00
lguohan
fcf93dda12
[sonic-linux-kernel]: kernel security update to 4.9.246 (#6545)
* [sonic-linux-kernel]: kernel security update to 4.9.246
* [Arista] Update driver submodule (#60)
     Update kernel dependency to 4.9.0-14-2

Signed-off-by: Guohan Lu <lguohan@gmail.com>
Co-authored-by: Samuel Angebault <angebault.samuel@gmail.com>
2021-01-28 08:46:07 -08:00
lguohan
22a19e87aa
[build]: wait for conflicts package to be uninstalled (#5039)
When parallel build is enabled, docker-fpm-frr and docker-syncd-brcm are
built at the same time. docker-fpm-frr requires swss, which requires
libsaivs-dev to be installed; docker-syncd-brcm requires the syncd package,
which requires libsaibcm-dev to be installed.

Since libsaivs-dev and libsaibcm-dev install the SAI headers in the same
location, these two packages cannot be installed at the same time. Therefore,
we need to serialize the build between these two packages. Simply
uninstalling the conflicting package is not enough to solve this issue. The
correct solution is to have one package wait for the other package to be
uninstalled.

For example, if syncd is built first, it will install libsaibcm-dev.
Meanwhile, if the swss build job starts and tries to install libsaivs-dev,
it will first query whether libsaibcm-dev is installed. If it is
installed, the job will wait until libsaibcm-dev is uninstalled. After the
syncd job finishes, it will uninstall libsaibcm-dev and the swss build job
will be unblocked.

To solve this issue, _UNINSTALLS is introduced to uninstall a package that
is no longer needed and to allow the blocked job to continue.
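The waiting logic described above can be modeled in Python with a condition variable. This is a sketch of the idea only, not the Makefile-based implementation in the build system; `PackageSlots`, `build_job` and the `CONFLICTS` table are hypothetical, and the release step in `build_job` merely plays the role that _UNINSTALLS plays in the real build.

```python
import threading

# Hypothetical conflict table: both packages install the SAI headers to the
# same location, so they must never be installed at the same time.
CONFLICTS = {
    "libsaivs-dev": {"libsaibcm-dev"},
    "libsaibcm-dev": {"libsaivs-dev"},
}


class PackageSlots:
    """Serialize installs of mutually conflicting packages."""

    def __init__(self):
        self.installed = set()
        self.cond = threading.Condition()

    def install(self, pkg):
        with self.cond:
            # Block while any conflicting package is still installed.
            self.cond.wait_for(lambda: not (CONFLICTS.get(pkg, set()) & self.installed))
            self.installed.add(pkg)

    def uninstall(self, pkg):
        with self.cond:
            self.installed.discard(pkg)
            self.cond.notify_all()  # wake up jobs waiting on this package


def build_job(slots, pkg, log):
    slots.install(pkg)
    log.append("build with " + pkg)
    slots.uninstall(pkg)  # release the package so blocked jobs can continue


slots = PackageSlots()
log = []
slots.install("libsaibcm-dev")  # the syncd job already holds libsaibcm-dev
swss = threading.Thread(target=build_job, args=(slots, "libsaivs-dev", log))
swss.start()
swss.join(timeout=0.2)
was_blocked = swss.is_alive()   # True: the swss job is waiting
slots.uninstall("libsaibcm-dev")  # the syncd job finishes
swss.join()
print(was_blocked, log)  # True ['build with libsaivs-dev']
```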

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-27 14:07:30 -08:00
lguohan
8bcdefbc34
[docker-orchagent]: make build depends only on sairedis package (#6467)
backport c4b5b002c3

Make the swss build depend only on libsairedis instead of syncd. This allows swss to be built without depending
on the vendor SAI library.

Currently, the libsairedis build also builds syncd, which requires the vendor SAI lib. This makes it difficult to build
the swss docker in buster while still keeping the syncd docker in stretch, as swss requires libsairedis, which also
builds syncd and requires the vendor to provide SAI for buster. As the swss docker does not actually contain the syncd
binary, it is not necessary to build syncd for the swss docker.

[submodule]: update sonic-sairedis
1e42517996bfe41ac58d4c25ee3f93502befcb9d (HEAD -> 201911) [build]: add option to build without syncd

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-27 13:51:24 -08:00
zzhiyuan
511541f7f0
[Arista] Use thermalctld instead of fancontrol (#6173)
**- Why I did it**
There is a preference to use thermalctld instead of fancontrol for 201911 release branch. The Arista platform submodule updates and thermal policies in the platforms will allow Arista devices to use thermalctld instead of fancontrol.

**- How I did it**
I cherry-picked the necessary commits from master branch for sonic-platform-modules-arista into 201911 branch. I've also added the file to skip fancontrol and added the thermal policies json.

**- How to verify it**
On Gardena, Upperlake, Clearlake, and Lodoga, thermalctld is up and running with no errors. Fans show ~29%.

Co-authored-by: Zhi Yuan Carl Zhao <zyzhao@arista.com>
2021-01-27 08:31:32 -08:00
judyjoseph
c80a7c837c
Update the correct SAI version in the sai bcm debian package. (#6369)
[201911] Update the version of the SAI debian package to reflect the actual version 3.7.5.2-1
2021-01-06 18:29:51 -08:00
judyjoseph
e6b9c74ee0
Update SAI 3.7 brcm package (#6324)
Released a new SAI bcm package with the new patches merged into the SUG INT_3.7 release branch and the fix provided for CS00011619081.
2020-12-31 08:32:12 -08:00
jostar-yang
8c2242000e
[as5835-54x] Modify qsfp port reset to normal state (#5161)
By default, the hardware holds the QSFP ports in reset, so software needs to set them to the normal state at boot.

1. Modified the CPLD driver to invert the reset offset value.
2. Set the ports to the normal state at boot.
2020-11-14 12:24:32 -08:00
Samuel Angebault
2284dd7a3c
[led]: Skip ledinit if there is no led_proc_init.soc file for broadcom platform (#5483)
Some platforms don't leverage the brcm LED coprocessor.
However, ledinit will try to load a non-existent file and exit with an
error code.
This change is mostly a cosmetic fix.

- How to verify it

Boot a platform without the configuration and verify in the syslog that the exit status of ledinit is 0
Boot a platform with the configuration and verify in the syslog that the exit status of ledinit is 0 and the leds are working.
Verified by adding a dummy led_proc_init.soc on an Arista platform which usually doesn't use it.
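The skip logic amounts to treating a missing .soc file as success. A minimal Python sketch follows (the real change is in a shell script); `run_ledinit` is hypothetical and `cmd` stands in for the actual bcmcmd invocation, which is not reproduced here.

```python
import os
import subprocess
import sys
import tempfile


def run_ledinit(soc_file, cmd):
    """Run the LED processor init command only when `soc_file` exists.

    `cmd` is a hypothetical argv list standing in for the real bcmcmd
    invocation. A missing file means the platform does not use the LED
    coprocessor, so 0 (success) is returned without running anything.
    """
    if not os.path.isfile(soc_file):
        return 0
    return subprocess.call(cmd)


failing_cmd = [sys.executable, "-c", "raise SystemExit(3)"]

# Platform without the file: nothing runs and the exit status is 0.
print(run_ledinit("/nonexistent/led_proc_init.soc", failing_cmd))  # 0

# Platform with the file: the command runs and its exit status is returned.
with tempfile.NamedTemporaryFile(suffix=".soc") as soc:
    print(run_ledinit(soc.name, failing_cmd))  # 3
```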
2020-11-09 12:38:11 -08:00
Samuel Angebault
8efc830718
[Arista] Update driver submodules (#5811)
- Improve SMBus performance
- Introduce devel package currently not used by sonic build system
2020-11-04 17:06:30 -08:00
abdosi
0fad6bdc7f
[monit] Adding patch to enhance syslog error message generation for monit alert action when status is failed. (#5720)
Why/How I did it:

Make sure the first error syslog message is triggered based on the FAULT TOLERANCE condition.

Added support for the repeat clause with the alert action. This is used as a trigger
for generating periodic syslog error messages if the error persists.

Updated the monit conf files with repeat every x cycles for the alert action.
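The alerting behavior can be approximated with a small Python model. It is a simplification under stated assumptions (the first alert fires after `tolerance` consecutive failed cycles, then one alert every `repeat_every` cycles while the failure persists, and any successful cycle resets the count), not monit's implementation; `alert_cycles` is a hypothetical name.

```python
def alert_cycles(statuses, tolerance, repeat_every):
    """Return the cycle indices at which an error syslog would be emitted.

    `statuses` holds one boolean per monit cycle (True = check passed).
    """
    alerts = []
    failures = 0
    for cycle, ok in enumerate(statuses):
        if ok:
            failures = 0  # recovery resets the fault tolerance counter
            continue
        failures += 1
        # First alert once the tolerance is reached, then every repeat_every cycles.
        if failures >= tolerance and (failures - tolerance) % repeat_every == 0:
            alerts.append(cycle)
    return alerts


# The check starts failing at cycle 2 and never recovers.
statuses = [True, True] + [False] * 8
print(alert_cycles(statuses, tolerance=3, repeat_every=2))  # [4, 6, 8]
```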
2020-11-01 10:27:10 -08:00
Aravind Mani
66e0298a89
[Dell S6100] Properly release memory upon ICH driver deinit (#5561)
During platform deinitialization, dell_ich is not removed properly, so when the S6100 platform is initialized again, the ICH driver sysfs attributes are not attached. As a result, get_transceiver_change_event returns an error, which causes xcvrd to crash.
2020-10-30 08:59:18 -07:00
Aravind Mani
3734bf326b
[201911] DellEMC platform API 2.0 for Z9264f, S5232f (#5637)
Add platform API 2.0 support for Z9264f, S5232f in 201911 branch
2020-10-28 10:01:47 -07:00
Samuel Angebault
ce7248604e
[Arista] Update arista driver submodules (#5654)
Only import yaml python module if necessary
2020-10-17 21:06:17 -07:00
abdosi
7db8cb0e03
Taken the fix from BRCM SAI 4.2.1.3 (6.5.19 hsdk). (#5550)
The error message is updated to provide correct information
of Netdevice link being down.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2020-10-06 09:04:51 -07:00
Tamer Ahmed
2cc98b4bac
[platform] Add Support For Environment Variable File (#5010)
* [platform] Add Support For Environment Variable

This PR adds the ability to read an environment file from /etc/sonic.
The file contains immutable SONiC config attributes such as platform,
hwsku, version and device_type. The aim is to minimize calls made
into sonic-cfggen during boot time.
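Reading such a file can be sketched as a simple KEY=VALUE parser in Python. The commit does not spell out the file format, so this assumes one assignment per line, `#` comment lines, and optional double quotes around values; `parse_environment` and the sample keys and values are hypothetical.

```python
def parse_environment(text):
    """Parse KEY=VALUE lines from an environment file into a dict."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"')
    return env


sample = '''
# immutable attributes (hypothetical example values)
PLATFORM=x86_64-example-r0
HWSKU="Example-HwSku"
'''
print(parse_environment(sample))
# {'PLATFORM': 'x86_64-example-r0', 'HWSKU': 'Example-HwSku'}
```

Reading a flat file like this at boot avoids spawning a sonic-cfggen process for each attribute lookup.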

Signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2020-09-28 21:14:39 +00:00
Joe LeVeque
b70c6f72b2
[dockers][supervisor] Increase event buffer size for dependent-startup (#5247)
When stopping the swss, pmon or bgp containers, log messages like the following can be seen:

```
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,061 ERRO pool dependent-startup event buffer overflowed, discarding event 34
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,063 ERRO pool dependent-startup event buffer overflowed, discarding event 35
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,064 ERRO pool dependent-startup event buffer overflowed, discarding event 36
Aug 23 22:50:43.789760 sonic-dut INFO swss#supervisord 2020-08-23 22:50:10,066 ERRO pool dependent-startup event buffer overflowed, discarding event 37
```

This is due to the number of programs in the container managed by supervisor, all generating events at the same time. The default event queue buffer size in supervisor is 10. This patch increases that value in all containers in order to eliminate these errors. As more programs are added to the containers, we may need to further adjust these values. I increased all buffer sizes to 25 except for containers with more programs or templated supervisor.conf files which allow for a variable number of programs. In these cases I increased the buffer size to 50. One final exception is the swss container, where the buffer fills up to ~50, so I increased this buffer to 100.

Resolves https://github.com/Azure/sonic-buildimage/issues/5241
2020-09-28 16:12:53 +00:00
yozhao101
7580c846ad
[201911][Monit] Unmonitor processes in disabled containers (#5462)
We want Monit to unmonitor the processes in containers that are disabled in the `FEATURE` table so that
Monit will not write false alerting messages to the syslog.
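The selection step can be sketched in Python as a filter over the FEATURE table. The dict shape, the status field name (left as a parameter because it is an assumption) and `containers_to_unmonitor` are hypothetical; how the real change tells Monit to unmonitor those processes is not shown.

```python
def containers_to_unmonitor(feature_table, field="state"):
    """Return the features whose processes Monit should stop monitoring.

    `feature_table` is a hypothetical dict shaped like a FEATURE table
    dump: {feature_name: {<field>: "enabled" | "disabled"}}.
    """
    return sorted(
        name
        for name, attrs in feature_table.items()
        if attrs.get(field) == "disabled"
    )


feature_table = {
    "telemetry": {"state": "disabled"},
    "swss": {"state": "enabled"},
    "sflow": {"state": "disabled"},
}
print(containers_to_unmonitor(feature_table))  # ['sflow', 'telemetry']
```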

- Backport of https://github.com/Azure/sonic-buildimage/pull/5153 to the 201911 branch

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
2020-09-25 00:30:41 -07:00
Wirut Getbamrung
10534a39eb
[device/celestica]: Update DX010 platform APIs on 201911 branch (#5416)
* [device/celestica]: DX010 platform API update (#4608)

- Fix fancontrol.service path
- Fix return temp format in thermal API
- Improve init time in chassis API
- Upgrade sfp API

* [device/celestica]: Update DX010 reboot cause API (#4678)

- Add more cases support in DX010 reboot cause API
    - Add Thermal Overload reboot cause support
    - Add new Watchdog reboot cause support

* [device/celestica]: using sonic-py-common package
2020-09-24 10:20:15 -07:00
Aravind Mani
a637c7f9b7
[201911] Dell S6100 fix mux issue (#5415)
- Why I did it
To fix the PCA MUX attachment issue in the Dell S6100 platform.

- How I did it
Wait until the IOM MUX is properly powered up before starting I2C enumeration.
2020-09-22 15:29:21 -07:00
Arun Saravanan Balachandran
cb55779937
DellEMC S6100: Log HW reboot reason registers (#5361)
2020-09-19 14:09:31 -07:00
Samuel Angebault
4a185fc09b
[Arista] Update driver submodules. (#5408)
- Fix show platform firmware platform plugin error
- Fix import behavior for arista's sonic_platform implementation
- Fix fan led color and detection on Smartsville
2020-09-18 21:40:49 -07:00
judyjoseph
ef89cb96a0
[201911] Broadcom SAI 3.7.5.2 (#5330)
* Broadcom SAI 3.7.5.2, with the fixes for the following CSPs:

e5e06f4 Fix for CS00010914668(KB0029456/SDK-218585) and CS00010503275(KB0029315/SDK-213475)
cf4f8da Solution for CS00010775359 in 3.7
0348f03 Patch for CS00010897814
a2d2fdd Patch for CS00010817763
4d362e8 Patch for CS00010636736
557ddc6 Solution for CS00010443542
0f122f1 Port SDK SER fix for dynamic tables (SDK-175398 / SDK-221245) to SAI 3.7
37e5c5e Fix for CS00010790550
64daf8a Fix for CS00010726597
e7f000e Fix for CS00010697761
44b7ab3 Solution for CS00010617498.
1475c24 CSP10503275 request to pull KB0029314 into 3.7
2020-09-18 17:17:30 -07:00
Samuel Angebault
cde4e88f3a
[201911][Arista] Update arista drivers submodules (#5290)
- fix watchdog timeout units
- remove arista bind mounts for docker-snmp
- add python3 mounts for pmon
2020-09-01 20:19:13 -07:00
Aravind Mani
fef4626804
Dell S6100 Port I2C changes to 201911 branch (#5148)
2020-08-18 14:39:36 -07:00
Guohan Lu
da7e36e869
[docker-syncd-brcm]: use service dependency in supervisord to start services
2020-08-15 22:31:23 -07:00
Joe LeVeque
309a098b21
[201911][Python] Migrate applications/scripts to import sonic-py-common package (#5132)
As part of consolidating all common Python-based functionality into the new sonic-py-common package, this pull request:
1. Redirects all Python applications/scripts in sonic-buildimage repo which previously imported sonic_device_util or sonic_daemon_base to instead import sonic-py-common, which was added to the 201911 branch in https://github.com/Azure/sonic-buildimage/pull/5063
2. Replaces all calls to `sonic_device_util.get_platform_info()` to instead call `sonic_py_common.get_platform()` and removes any calls to `sonic_device_util.get_machine_info()` which are no longer necessary (i.e., those which were only used to pass the results to `sonic_device_util.get_platform_info()`).
3. Removes unused imports to the now-deprecated sonic-daemon-base package and sonic_device_util.py module

This is a step toward resolving https://github.com/Azure/sonic-buildimage/issues/4999
2020-08-13 16:35:53 -07:00
abdosi
ea1b25a210
[libsaibcm] updated libsaibcm debian package for (#5138)
3.7.5.1-3 in 201911

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2020-08-11 08:55:43 -07:00
Joe LeVeque
840be7732c
[201911][devices] Update SFP keys to align with new standard (#4976)
Align SFP key names with new standard defined in https://github.com/Azure/sonic-platform-common/pull/97

- hardwarerev -> hardware_rev
- serialnum -> serial
- manufacturename -> manufacturer
- modelname -> model
- Connector -> connector
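Code that still receives dictionaries with the old key names could translate them with a small lookup table built from the list above. `normalize_sfp_keys` and the sample values are hypothetical; only the old-to-new key pairs come from the commit.

```python
# Old -> new SFP key names, as listed in this commit.
SFP_KEY_RENAMES = {
    "hardwarerev": "hardware_rev",
    "serialnum": "serial",
    "manufacturename": "manufacturer",
    "modelname": "model",
    "Connector": "connector",
}


def normalize_sfp_keys(info):
    """Return a copy of `info` with legacy SFP key names renamed.

    Keys that are not in the rename table are passed through unchanged.
    """
    return {SFP_KEY_RENAMES.get(key, key): value for key, value in info.items()}


legacy = {"serialnum": "ABC123", "modelname": "QSFP28-X", "type": "QSFP28"}
print(normalize_sfp_keys(legacy))
# {'serial': 'ABC123', 'model': 'QSFP28-X', 'type': 'QSFP28'}
```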
2020-07-16 11:09:47 -07:00
Samuel Angebault
41ba95ee3f
[arista] update Arista drivers submodules (#4967)
Merge most of the changes that recently made it to master.
This will be the last such merge operation and future commits will only cherry-pick fixes and targeted features.

Major fixes and features:
- reboot cause enhancement with more hardware reboot cause reporting
- fix reboot cause parsing issue with 201811 release
- fix get_change_event logic
- fix error message on missing sysfs entry by our plugins
- final piece of the platform refactors for fan and sensor reporting through the platform API
2020-07-16 10:36:07 -07:00
arlakshm
7c699df654
Add support for bcmsh and bcmcmd utilities in multi ASIC devices (#4926)
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
This PR has changes to support accessing the bcmsh and bcmcmd utilities on multi ASIC devices
Changes:
- move the link of /var/run/sswsyncd from docker-syncd-brcm.mk to docker_image_ctl.j2
- update the bcmsh and bcmcmd scripts to take -n [ASIC_ID] as an argument on multi ASIC platforms
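A wrapper that honors the new `-n` option might choose its target container as sketched below in Python. The container naming (`syncd` on single-ASIC devices, `syncd<N>` per ASIC on multi-ASIC devices) and the `syncd_container` helper are assumptions for illustration, not the actual bcmsh/bcmcmd scripts.

```python
import argparse


def syncd_container(argv, num_asics):
    """Return the syncd container name a bcmcmd/bcmsh wrapper should target.

    Assumptions (hypothetical, for illustration only): single-ASIC devices
    run one container named "syncd"; on multi-ASIC devices the container for
    ASIC N is "syncdN" and must be selected with -n N.
    """
    parser = argparse.ArgumentParser(prog="bcmcmd")
    parser.add_argument("-n", dest="asic_id", type=int)
    # Everything not recognized (the actual shell command) is left over.
    args, _command = parser.parse_known_args(argv)
    if num_asics <= 1:
        return "syncd"
    if args.asic_id is None or not 0 <= args.asic_id < num_asics:
        parser.error("-n ASIC_ID is required on multi ASIC platforms")  # exits with status 2
    return "syncd%d" % args.asic_id


print(syncd_container(["-n", "1", "ps"], num_asics=2))  # syncd1
print(syncd_container(["ps"], num_asics=1))             # syncd
```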
2020-07-11 09:47:24 -07:00