Commit Graph

4247 Commits

Author SHA1 Message Date
Qi Luo
cb79de1b12 [build]: stop prompt during build (#6585)
Some commands used during build will prompt user interactively, but this is not expected during build. Since most output is collected into log file, user could not see the prompt and feel the build process hangs.

- How I did it

Use mv command in non interactive mode
Redirect stdin to null if command output is collected into log file.
2021-01-28 09:29:56 -08:00
Vaibhav Hemant Dixit
afd30ee170 [sonic-sairedis] advance submodule to include fix for syncd crash during shutdown (#6581)
Remove unregisterMessageHandler from NetMsgRegistrar thread (#779)
2021-01-28 09:29:43 -08:00
Guohan Lu
89eb0c4a3c [docker-fmp-frr]: remove blank lines in generated critical_process
Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-28 09:29:09 -08:00
Guohan Lu
f00bb52f7c [proc-exit-listener]: ignore blank lines
make proc-exit-listener more rebust

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-28 09:28:52 -08:00
yozhao101
cc9c3f567e [supervisord] Monitoring the critical processes with supervisord. (#6242)
- Why I did it
Initially, we used Monit to monitor critical processes in each container. If one of critical processes was not running
or crashed due to some reasons, then Monit will write an alerting message into syslog periodically. If we add a new process
in a container, the corresponding Monti configuration file will also need to update. It is a little hard for maintenance.

Currently we employed event listener of Supervisod to do this monitoring. Since processes in each container are managed by
Supervisord, we can only focus on the logic of monitoring.

- How I did it
We borrowed the event listener of Supervisord to monitor critical processes in containers. The event listener will take
following steps if it was notified one of critical processes exited unexpectedly:

The event listener will first check whether the auto-restart mechanism was enabled for this container or not. If auto-restart mechanism was enabled, event listener will kill the Supervisord process, which should cause the container to exit and subsequently get restarted.

If auto-restart mechanism was not enabled for this contianer, the event listener will enter a loop which will first sleep 1 minute and then check whether the process is running. If yes, the event listener exits. If no, an alerting message will be written into syslog.

- How to verify it
First, we need checked whether the auto-restart mechanism of a container was enabled or not by running the command show feature status. If enabled, one critical process should be selected and killed manually, then we need check whether the container will be restarted or not.

Second, we can disable the auto-restart mechanism if it was enabled at step 1 by running the commnad sudo config feature autorestart <container_name> disabled. Then one critical process should be selected and killed. After that, we will see the alerting message which will appear in the syslog every 1 minute.

- Which release branch to backport (provide reason below if selected)

 201811
 201911
[x ] 202006
2021-01-28 09:28:27 -08:00
Shi Su
fc825f9a58 [FRR] Create a separate script to wait zebra to be ready to receive connections (#6519)
The requirement for zebra to be ready to accept connections is a generic problem that is not 
specific to bgpd. Making the script to wait for zebra socket a separate script and let bgpd and 
staticd to wait for zebra socket.
2021-01-28 09:25:29 -08:00
Kebo Liu
7fc8caa36b [mellanox]: Update SAI to sonic2012 1.18.1.0 (#6566)
Changes in the new release:

1. Policy based hashing optimization
2. New attribute support for Max port headroom
3. Tunnel ECN map fixes
4. Tunnel EVPN skeleton extensions (peer attrib, maps)
5. Bridge port admin not affecting port admin (optimize port down time)
6. CRM new API for neighbors and tunnel termination entries
7. Improve FDB event for flush by bridge port (before, null bridge was reported to SONiC, now the bridge will be extracted from bridge port)
8. DHCP L2 v4+v6 traps (for ZTP use case)
9. Generic counter implementation

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2021-01-28 09:25:05 -08:00
dflynn-Nokia
e469460ef3 [docker-config-engine-stretch]: Fix dependency typo PYTHON2_SWSSCOMMON (#6568)
This commit fixes a typo in the fix delivered in PR #6538

syncd fails on the armhf platform within sonic-config-engine/portconfig.py when importing the following
'from swsscommon.swsscommon import ConfigDBConnector'
2021-01-28 09:24:55 -08:00
Guohan Lu
11a8e89f27 [build]: add _BUILD_ENV to specify env for dpkg-buildpackage
Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-28 09:23:36 -08:00
Guohan Lu
0b2abafcde [docker-ptf]: build docker ptf
- combine docker-ptf-saithrift into docker-ptf docker
- build docker-ptf under platform vs
- remove docker-ptf for other platforms

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-28 09:23:12 -08:00
Kebo Liu
3706b31b49 Add hw-mgmt patch to support SDK OFFLINE event for handling flow within service firmware upgrade (#6550)
During ISSU, "mlxsw_minimal" driver still trying to access firmware, in some cases FW could return some wrong critical threshold value which will cause switch shutdown.

**- How I did it**
In order to prevent "mlxsw_minimal" driver from accessing ASIC during ISSU, SDK will raise "OFFLINE" 'udev' event
at the early beginning of such flow. When this event is received, hw-management will remove "mlxsw_minimal" driver.
There is no need to implement the opposite "ONLINE" event since this flow is ended up with "kexec".

**- How to verify it**
repeatedly perform warm reboot, make sure there is no switch shutdown occurred.
2021-01-28 09:22:52 -08:00
bingwang-ms
be281536e1 [bgpmon]: Fix exception in bgpmon caused by duplicate bgp neighbor ID (#6546)
* Fix exception in bgpmon caused by duplicate keys
It is possible that BGP neighbors in IPv4 and IPv6 address families
share the same name (such as bgp monitor). However, such case is not
handled in bgpmon, and an Exception will be raised. This commit will
address the issue by Using set instead of list to avoid duplicate keys.
2021-01-28 09:22:39 -08:00
Kebo Liu
a0fd862620 [mellanox]: Update SDK to 4.4.2308, FW to *.2008.2308 (#6552)
Bugs fixes:
    All | Kernel | During system reload when CPU is loaded with heavy traffic, a Kernel Panic may occur.
    All | Modules, Port split | FW stuck when device rebooted with locked Optical Transceivers in split mode
    Spectrum-3 | PFC | On Spectrum-3 systems, slow reaction time to Rx pause packets on 40GbE ports may lead to buffer overflow on servers.
    Spectrum-3 | SN4700, Port Split | On rare occasion SN4700, conducting 100G split (4x25G) in NRZ when splitter port 1 or 2 are down, ports 3 and 4 will also go down.

Enahncments:
    All | Kernel | new notification on ISSU start, so other kernel drivers can disable any interface to ASIC

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2021-01-28 09:21:56 -08:00
Tamer Ahmed
80ba3a4f6f [dhcp-relay]: Launch DHCP Relay On L3 Vlan (#6527)
Recent changes brought l2 vlan concept which do not have DHCP
clients behind them and so DHCP relay is not required. Also,
dhcpmon fails to launch on those vlans as their interfaces
lack IP addresses. This PR limit launch of both DHCP relay
and dhcpmon to L3 vlans only.

singed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2021-01-28 09:21:49 -08:00
Guohan Lu
e6c98eab3e [submodule]: update sonic-swss
* 01c3abd 2021-01-23 | Enhance dynamic buffer calculation and bug fixes (#1601) (HEAD, origin/202012) [Stephen Sun]
* e7af660 2021-01-24 | [fpmsyncd] Skip routes to eth0 or docker0 (#1606) [Shi Su]

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-24 22:50:49 -08:00
lguohan
3b3161050e [submodule]: update sonic-sairedis (#6544)
* 20b9573 2021-01-24 | [SAI]: update SAI submodule (#775) (HEAD, origin/master, origin/HEAD) [lguohan]
* 667c33d 2021-01-22 | [syncd] Comparison logic add support to LABEL attribute with higher priority (#764) [Kamil Cudnik]
* aaf5b98 2021-01-22 | [vslib]: Fix missing MACsec Create Port action (#770) [Ze Gan]

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-24 22:43:58 -08:00
Zhenhong Zhao
154a6ab6c5 [frrcfgd] introduce frrcfgd to manage frr config when frr_mgmt_framework_config is true (#5142)
- Support for non-template based FRR configurations (BGP, route-map, OSPF, static route..etc) using config DB schema.
- Support for save & restore - Jinja template based config-DB data read and apply to FRR during startup

**- How I did it**

- add frrcfgd service
- when frr_mgmg_framework_config is set, frrcfgd starts in bgp container
- when user changed the BGP or other related table entries in config DB, frrcfgd will run corresponding VTYSH commands to program on FRR.
- add jinja template to generate FRR config file to be used by FRR daemons while bgp container restarted

**- How to verify it**
1. Add/delete data on config DB and then run VTYSH "show running-config" command to check if FRR configuration changed.
1. Restart bgp container and check if generated FRR config file is correct and run VTYSH "show running-config" command to check if FRR configuration is consistent with attributes in config DB

Co-authored-by: Zhenhong Zhao <zhenhong.zhao@dell.com>
2021-01-24 22:43:30 -08:00
Dmytro Shevchuk
7e2b7553bc [sonic-cfggen] parse optional fec and autoneg fields from hwsku.json (#6155)
**- Why I did it**

For now `hwsku.json` and `platform.json` dont support optional fields. For example no way to add `fec` or `autoneg` field using `platform.json` and `hwsku.json`.

**- How I did it**
Added parsing of optional fields from hwsku.json.

**- How to verify it**
Add optional field to `hwsku.json`. After first boot will be generated new `config_db.json` or you can generate it using `sonic-cfggen` command. In this file must be optional field from `hwsku.json` or check using command `redis-cli hgetall PORT_TABLE:Ethernet0`
Example of `hwsku.json`, that must be parsed:
```
{
    "interfaces": {
        "Ethernet0": {
            "default_brkout_mode": "1x100G[40G]",
            "fec": "rs",
            "autoneg": "0"
        },
...
}
```
Example of generated `config_db.json`:
```
    "PORT": {
        "Ethernet0": {
            "alias": "Ethernet0",
            "lanes": "0,1,2,3",
            "speed": "100000",
            "index": "1",
            "admin_status": "up",
            "fec": "rs",
            "autoneg": "0",
            "mtu": "9100"
        },
```
So, we can see this entries in redis db:
```
admin@sonic:~$ redis-cli hgetall PORT_TABLE:Ethernet0

 1) "alias"
 2) "Ethernet0"
 3) "lanes"
 4) "0,1,2,3"
 5) "speed"
 6) "100000"
 7) "index"
 8) "1"
 9) "admin_status"
10) "up"
11) "fec"
12) "rs"
13) "autoneg"
14) "0"
15) "mtu"
16) "9100"
17) "description"
18) ""
19) "oper_status"
20) "up"
```

Also its way to fix `show interface status`, `FEC` field but also need add `FEC` field to `hwsku.json`.
Before:
```
admin@sonic:~$ show interfaces status
  Interface            Lanes    Speed    MTU    FEC        Alias    Vlan    Oper    Admin             Type    Asym PFC
-----------  ---------------  -------  -----  -----  -----------  ------  ------  -------  ---------------  ----------
  Ethernet0          0,1,2,3     100G   9100     N/A    Ethernet0  routed      up       up  QSFP28 or later         N/A
```
After:
```
admin@sonic:~$ show interfaces status
  Interface            Lanes    Speed    MTU    FEC        Alias    Vlan    Oper    Admin             Type    Asym PFC
-----------  ---------------  -------  -----  -----  -----------  ------  ------  -------  ---------------  ----------
  Ethernet0          0,1,2,3     100G   9100     rs    Ethernet0  routed      up       up  QSFP28 or later         N/A
```
2021-01-24 22:43:10 -08:00
Antonina Melnyk
df76e33bb3 [barefoot] Fixes for platform API (#6487)
There was a mismatch with Eeprom class methods names and methods called from Eeprom class.

Signed-off-by: Antonina Melnyk antoninax.melnyk@intel.com
2021-01-24 22:42:31 -08:00
Joe LeVeque
c8b3a709f1 [sonic-host-services] Report unit test coverage (#6533)
To view unit test coverage of sonic-host-services package upon build
2021-01-23 21:16:23 -08:00
Tamer Ahmed
59fd5fd3b8 [build-docker-buster]: Install libboost 1.171 In Build Docker (#6532)
Installing newst buster version of libboost (v1.71) in build docker.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2021-01-23 21:16:01 -08:00
Joe LeVeque
ceecc8f9d4 Remove things needed for building Python 3 from source (#6441)
**- Why I did it**

Prior to SONiC using Debian Buster, we needed to build Python 3.5 or newer from source for installation in the SNMP container, becuase it wasn't available from the Debian repository for Jessie or Stretch. Now that all containers are based on Buster, we simply install Python 3.7 from the Debian repository in the host as well as all containers. We are no longer building Python 3 from source, so the Makefile is unused and we no longer need to install build dependencies in the slave containers.

**- How I did it**

- Remove Python 3 makefile
- No longer install Python 3 build dependencies in the slave containers.
2021-01-23 21:13:55 -08:00
Joe LeVeque
6a50042b66 [process-reboot-cause] Make process-reboot-cause executable (#6534)
process-reboot-cause script should be executable.
2021-01-23 21:06:30 -08:00
Joe LeVeque
50912ad966 [sonic-platform-daemons] Update submodule (#6535)
Submodule changes to be committed:

* src/sonic-platform-daemons 81318f7...e72f6cd (3):
  > [ledd] Minor refactor; add unit tests (#143)
  > [thermalctld] Report unit test coverage (#141)
  > [psud] Increase unit test coverage (#140)
2021-01-23 21:06:17 -08:00
Danny Allen
bedd05bb43 [DellEMC Z9332f] Remove duplicate ipmihelper.py script (#6536)
Fixes #6445

Because the ipmihelper.py script in the 9332 folder is slightly different than the common one (due to LGTM fixes), when the common one gets copied during build time it causes the workspace/build to become dirty.

Signed-off-by: Danny Allen <daall@microsoft.com>
2021-01-23 21:05:55 -08:00
Qi Luo
1239310e02 [docker-config-engine-stretch]: Add missing dependency PYTHON2_SWSSCOMMON (#6538)
Otherwise all the docker image derived from docker-config-engine-stretch will have broken SONIC_CONFIG_ENGINE_PY2
The bug is introduced in #6406
2021-01-23 21:05:45 -08:00
Samuel Angebault
9dfee1f09a [pmon]: Run ledd using python3 unless excluded (#6528)
**- Why I did it**

Ledd is the last daemon that is not enabled to run in python3.
Even though there is a plan to deprecate this daemon and to replace it by something else it's one simple step toward python2 deprecation.

**- How I did it**

Changed the `command=` line for `ledd` in the `supervisord` configuration of `pmon`.
Copied what was done for other daemons.

**- How to verify it**

Booting a product that has a `led_control.py` should now show the ledd running in python3.
I ran `python3 -m pylint` on all `led_control.py` plugin which means that most of them should be python3 compliant.
There is however still a risk that some might not work.
2021-01-22 11:42:22 -08:00
Prince Sunny
32c20bdbe4 Submodule update swss-common (#6525) 2021-01-22 10:57:28 -08:00
abdosi
dbe71dfdeb Updated BBR to use peer group name as prefix. (#6515)
To make BBR configured for peer-group if it's name starts with (prefixed) with the string define in constants.yml instead of exact string match.
2021-01-22 10:57:17 -08:00
Lawrence Lee
1923920b33 [minigraph.py]: Force /128 prefix for server IPv6 loopbacks (#6524)
Meet the requirement for the MUX_CABLE table that IPv6 loopbacks have a /128 prefix

Note that this change only affects the MUX_CABLE table, all other tables continue to use the loopback address provided in minigraph.

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2021-01-22 10:57:06 -08:00
Tamer Ahmed
4220f1c2dd [sonic-swss-common]: Update Submodule (#6508)
Update in this change:
640a218 [packaging]: Add Support For Libboost v1.71.0 (#449)

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2021-01-22 10:56:21 -08:00
Qi Luo
28b62bee3f sonic-config-engine uses libswsscommon instead of swsssdk (#6406)
**- Why I did it**
swsssdk will be deprecated. Migrate sonic-config-engine to use libswsscommon library instead

**- How to verify it**
Unit test
2021-01-22 10:56:13 -08:00
Qi Luo
77ec2ee14e [sonic-swss] Update submodule (#6336)
Including below commits:

36f7332 2021-01-14 | modified ERR log to NOTICE log for FDB notification failure after VLAN delete (#1595) [madhanmellanox]
c21c883 2021-01-12 | [ci]: download artifacts from master branch build (#1597) [lguohan]
a1d03a4 2021-01-12 | [fgnhgorch] Match mode changes for Fine Grained ECMP (#1565) [anish-n]
1b65f3d 2021-01-12 | [ci]: use sonicbld pool (#1594) [lguohan]
48ae866 2021-01-12 | [pfcwd] Update PFC storm detection logic for Mellanox platforms (#1586) [Volodymyr Samotiy]
850001f 2021-01-12 | [FPMSYNCD] Evpn/Vxlan related changes to support FRR7.5(#1585) [KISHORE KUNAL]
64ca9bb 2021-01-12 | [ci]: only copy artifacts when build is successful (#1590) [lguohan]
17d0dae 2021-01-11 | [Fdborch] Fix for arm compilation (#1592) [Prince Sunny]
693a02c 2021-01-08 | [gearbox] Add support for "hwinfo" field (#1547) [Baptiste Covolato]
7e3b2c6 2021-01-09 | [Evpn Warmreboot] Added Dependancy check logic in VrfMgr (#1466) [nkelapur]
a960e2e 2021-01-09 | [Orchagent]: FdbOrch changes for EVPN VXLAN (#1275) [Pankaj Jain]
097cfda 2021-01-08 | [swss test] update setup guide for swss tests (#1582) [Ying Xie]
b42253a 2021-01-05 | Fix for armhf build (#1580) [Qi Luo]
d8c1465 2021-01-05 | [dvs] Update/disable DVS tests with new FRR 7.5 behavior (#1579) [Danny Allen]
f6c7422 2021-01-05 | ASIC internal temperature sensors support (#1517) [Santhosh Kumar T]
0aa9ef2 2021-01-01 | Simply by auto iterator type, because we will tune the return types of library functions (#1577) [Qi Luo]
773238b 2020-12-31 | [build]: Fix format string for size_t (#1576) [Qi Luo]
7ba4e43 2020-12-30 | [fgnhgorch] add warm reboot support for fgnhg (#1538) [weixchen1215]
4cf6617 2020-12-30 | [ci]: add build for arm64 and armhf (#1572) [lguohan]
6ebc0ed 2020-12-29 | [ci]: add azure-pipeline for amd64 (#1571) [lguohan]
e32b9d0 2020-12-29 | [FDBSYNCD] Added pytest for fdbsyncd (#1560) [KISHORE KUNAL]
a43f6be 2020-12-30 | [crm] Add support for snat, dnat and ipmc crm resources (#1511) [Prabhu Sreenivasan]
7fc3888 2020-12-29 | PY Test script for EVPN L3 VxLAN (#1330) [Tapash Das]
6eb36d9 2020-12-27 | vlanmgr changes related to EVPN VxLan warmboot (#1460) [anilkpan]

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-22 10:40:51 -08:00
Qi Luo
c5b7370a8f [baseimage]: Cleanup sudoers file (#6518) 2021-01-21 08:41:23 -08:00
Tamer Ahmed
039f5c253c [submodule]: Update SONiC Utilities Submodule (#6507)
Changes in this update:
37695c8 [show]: Use TCP Connection For Muxcable Commands (#1371)
8119ba2 Validations checks while creating and deleting a Portchannel (#1326)
3df267e [config] Fix Breakout mode option and BREAKOUT_CFG table check method (#1270)
9bd709b [show] Fix show arp in case with FDB entries, linked to default VLAN (#1357)
bc2d27e [generate_dump]: fix syntax error

signed-of-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2021-01-21 08:40:43 -08:00
KISHORE KUNAL
15c5423522 [frr]: ADD L3 VNI EVPN Support for SONiC, Send RMAC and VLAN along with prefix to fpmsyncd. (#4806)
Currently FRR is send Prefix with VNI information to FPMSYNCD. This PR allows FRR to send RMAC with EVPN Type5 prefix to fpmsyncd. This is a temp fix. This patch will be removed once neighorch is ready to handle the Prefix and ARP (containing RMAC) separately.
2021-01-21 08:40:15 -08:00
shlomibitton
deea12403a [submodule]: update sonic-utilities (#6485)
- [route_check.py] - update includes checks on subscriptions (https://github.com/Azure/sonic-utilities/pull/1344)
- Validations checks while adding a member to PortChannel and removing a member from a Portchannel (https://github.com/Azure/sonic-utilities/pull/1328)
- [show] Add subcommand to show midplane status for modular chassis (https://github.com/Azure/sonic-utilities/pull/1267)
- [pytest][qos][config] Added pytests for "config qos reload" commands" (https://github.com/Azure/sonic-utilities/pull/1346)
- Drop explict 3 seconds pause between two object updates/deletes. (https://github.com/Azure/sonic-utilities/pull/1359)
- [show]fix for show muxcable status by replacing "hostname" to "peer_switch" for deriving tor ipv4_address (https://github.com/Azure/sonic-utilities/pull/1360)
- [PFCWD] Fix 'start' pfcwd command (https://github.com/Azure/sonic-utilities/pull/1345)

Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
2021-01-21 08:39:31 -08:00
Danny Allen
85bf62ad0d [submodule] Update sonic-sairedis submodule pointer (#6496)
[ci]: download artifacts from master branch (#768)
Do not create fabric port if mapping is not available (#769)
[syncd] Comparison logic log also current attr value on set operation (#763)
Add fabric port test to vslib (#737)
[ci]: use sonicbld pool (#766)
[tests] Remove exit command blocking all tests to run (#765)
[vslib]: adapt macsec sai 1.7.1 (#755)
Add support for SAI_SWITCH_ATTR_AVAILABLE_IPMC_ENTRY needed by CRM (#756)

Signed-off-by: Danny Allen <daall@microsoft.com>
2021-01-21 08:39:09 -08:00
Shi Su
5079de7647 [bgpd]: Check zebra is ready to connect when starting bgpd (#6478)
Fix #5026

There is a race condition between zebra server accepts connections and bgpd tries to connect. Bgpd has a chance to try to connect before zebra is ready. In this scenario, bgpd will try again after 10 seconds and operate as normal within these 10 seconds. As a consequence, whatever bgpd tries to sent to zebra will be missing in the 10 seconds. To avoid such a scenario, bgpd should start after zebra is ready to accept connections.
2021-01-19 01:11:50 -08:00
lguohan
9acbc591e1 [mellanox]: fix mellanox hw-management build (#6471)
use dpkg-buildpackage build with fakeroot

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-19 01:11:42 -08:00
lguohan
17148d70fc [kvm]: add debug cmd for build_kvm_image.sh (#6472)
dump netstat info on error

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-01-19 01:11:22 -08:00
Lawrence Lee
c7058a6d15 [minigraph.py]: Don't create mux table entries for servers w/o loopbacks (#6457)
Avoid sonic-cfggen crashing when a server does not have a configured loopback address in the minigraph

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2021-01-19 01:10:43 -08:00
dflynn-Nokia
0d93b233d2 [build arm] fix sonic-slave-buster build break (#6469)
When building the sonic-slave-buster docker container, the node.js package is
installed to meet the requirements of the Azure DevOPs pipleline
build. Recently this install of node.js has been failing.

This commit fixes that build break by upgrading the
sonic-slave-buster build to install version 14.x of node.js which is the
current LTS version for buster.
2021-01-19 01:10:25 -08:00
Kebo Liu
824d7adc2d [Mellanox] Make determine-reboot-cause service start after hw-management service (#6465)
**- Why I did it**

On the Mellanox platform, reboot cause is fetched from some certain sysfs which is created by the hw-management service. So determine-reboot-cause service shall start after hw-management, otherwise it could fail due to the related sysfs is not available yet.

**- How I did it**

Add a patch to the hw-management service to make sure determine-reboot-cause service should start after it.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2021-01-19 01:10:08 -08:00
Wirut Getbamrung
120a8da50d [device/celestica]: Add thermalctld support on DX010 platform APIs (#6089)
**- Why I did it**
- The thermalctld daemon on the Pmon docker requires support from the thermal manager API.

**- How I did it**
- Removed the old function for detecting a faulty fan.
- Removed the old function for detecting excess temperature.
- Implement thermal_manager APIs based on ThermalManagerBase
- Implement thermal_conditions APIs based on ThermalPolicyConditionBase
- Implement thermal_actions APIs based on ThermalPolicyActionBase
- Implement thermal_info APIs based on ThermalPolicyInfoBase
- Add thermal_policy.json
2021-01-19 01:09:54 -08:00
brandonchuang
c40e43aadb [device/accton] Fix accton driver not been installed (#6327)
Accton util applies lsmod to check if drivers are installed.
But lsmod may return error on startup and skip module installation.

Signed-off-by: Brandon Chuang <brandon_chuang@edge-core.com>
2021-01-15 08:23:04 -08:00
Roy Lee
29562d0a4b [device/accton]: As7816-64x, fix memory leakage on accton fan monitor. (#6168)
It's been reported that accton fan monitor process keeps consuming memory after few days.
The amount of memory occupied increases in linear and never leased.

Signed-off-by: roy_lee <roy_lee@edge-core.com>
2021-01-15 08:21:13 -08:00
Lawrence Lee
4f6e161079 [minigraph.py]: Check for empty cluster tag before parsing (#6440)
Some non-production minigraphs will have an empty ClusterName tag

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2021-01-15 08:21:08 -08:00
Kebo Liu
21d4df3dcd [mellanox][platform api] fix a missing import time module (#6458)
“time" module was missed to be imported and will cause an error when the branch hit.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2021-01-15 08:20:57 -08:00
Junchao-Mellanox
ed0ac08e44 [Mellanox] PSU and module thermals are no longer child of chassis (#6460)
In order to build up device hierachy, PSU and module thermals are no longer child of chassis. PSU thermal belongs to PSU objects and SFP thermals belong to SFP object now. Need align this change in platform.json. Move thermal objects to correct parent device
2021-01-15 08:20:43 -08:00