Commit Graph

4698 Commits

Author SHA1 Message Date
Joe LeVeque
c46bf41ea5 [sonic-host-services] Add 'parameterized' package as a test dependency (#7900)
#### Why I did it

Recently, the build started failing with messages like

```
2021-06-16T16:55:02.8675603Z tests/hostcfgd/hostcfgd_test.py:5: in <module>
2021-06-16T16:55:02.8676208Z     from parameterized import parameterized
2021-06-16T16:55:02.8677145Z E   ModuleNotFoundError: No module named 'parameterized'
```

Unit tests for hostcfgd depend on the `parameterized` Python package, but it was never added as a dependency to the setup.py file. The tests started depending on it ~3 months ago; I'm not sure why we only started seeing this failure recently.

#### How I did it

Add 'parameterized' package as a test dependency in setup.py for sonic-host-services package
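A minimal sketch of the kind of change involved, assuming setup.py lists test-only packages in a `tests_require` list. Only `parameterized` comes from the commit; the `pytest` entry and the `setup_kwargs` dict are illustrative, not copied from the real file.

```python
# Illustrative excerpt of sonic-host-services' setup.py (assumed layout).
tests_require = [
    'parameterized',  # needed by tests/hostcfgd/hostcfgd_test.py
    'pytest',         # assumed existing test dependency
]

setup_kwargs = dict(
    name='sonic-host-services',
    tests_require=tests_require,
)

# In the real file these kwargs would be passed to setuptools.setup(**setup_kwargs).
```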
2021-06-17 07:09:50 +00:00
Sujin Kang
d67a5b887f Support multiple pcie configuration files and change the pcie status table name to match with pcied changes (#7886)
Why I did it
Support multiple pcie configuration files and change the pcie status table name.
This matches the two PRs below:
Azure/sonic-platform-common#195
Azure/sonic-platform-daemons#189

How I did it
Look up pcie configuration files with a wildcard pattern and change the device status table name.

How to verify it
Restart with changes and see if the pcie check works as expected.
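A minimal sketch of a wildcard lookup for multiple PCIe configuration files. The `pcie*.yaml` pattern and the `find_pcie_config_files` helper are assumptions for illustration; the exact pattern used by sonic-platform-common may differ.

```python
import glob
import os


def find_pcie_config_files(platform_path):
    """Return every PCIe config file in platform_path, sorted by name.

    The 'pcie*.yaml' pattern is a hypothetical naming convention.
    """
    pattern = os.path.join(platform_path, "pcie*.yaml")
    return sorted(glob.glob(pattern))
```

Sorting the result keeps the processing order deterministic when several files match.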
2021-06-17 07:09:50 +00:00
Renuka Manavalan
e851a42db7 [Kubernetes]: The kube server could be used as http-proxy for docker (#7469)
Why I did it
SONiC switches get their docker images from a local repo, populated during install with the container images pre-built into the SONiC FW. With the introduction of Kubernetes, new docker images available in a remote repo can be deployed. This requires dockerd to be able to pull images from the remote repo.

Depending on the switch's network domain and config, it may or may not be able to reach the remote repo. Where the remote repo is unreachable, we could make the Kubernetes server also act as an http-proxy.

How I did it
When the admin explicitly enables it, the Kubernetes server can be configured as the docker proxy. But any update to the docker proxy has to go through an environment variable in the service-conf file, which means dockerd must be restarted. Restarting dockerd is very expensive, as it restarts all dockers, including the database docker.

To avoid a dockerd restart, pre-configure an http_proxy using an unused IP. When the k8s server is enabled to act as the http-proxy, an iptables entry is created to redirect all traffic destined for the configured (unused) proxy IP to the Kubernetes master IP. This way, any update to the Kubernetes master config only requires manipulating iptables, which is transparent to all modules until dockerd needs to download from the remote repo.
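A minimal sketch of the redirect rule described above. Locally generated traffic from dockerd passes through the `nat` table's `OUTPUT` chain, so a `DNAT` rule there can rewrite the unused proxy IP to the master IP. The proxy port 3128 and the `proxy_redirect_cmd` helper are assumptions, not taken from ctrmgrd.

```python
def proxy_redirect_cmd(proxy_ip, master_ip, port=3128, action="-A"):
    """Build an iptables command that DNATs proxy traffic to the k8s master.

    proxy_ip is the unused IP that dockerd's http_proxy points at.
    Port 3128 is an assumed proxy port; pass action="-D" to delete the rule.
    """
    return [
        "iptables", "-t", "nat", action, "OUTPUT",
        "-p", "tcp", "-d", proxy_ip, "--dport", str(port),
        "-j", "DNAT", "--to-destination", f"{master_ip}:{port}",
    ]
```

The list could be passed to `subprocess.run(cmd, check=True)` (root required). Changing the master IP then only means deleting and re-adding this rule; dockerd keeps pointing at the same proxy IP.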

How to verify it
Configure a switch such that image repo is unreachable
Pre-configure dockerd with http_proxy.conf using an unused IP (e.g. 172.16.1.1)
Update ctrmgrd.service to invoke ctrmgrd.py with the "-p" option.
Configure a k8s server, and deploy an image for a feature with set_owner="kube"
Check whether the switch can successfully download the image.
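A minimal sketch of rendering the dockerd http_proxy.conf mentioned in the second step, assuming a standard systemd drop-in that sets `HTTP_PROXY`/`HTTPS_PROXY` environment variables for docker.service. The port and the `render_docker_proxy_dropin` helper are hypothetical.

```python
def render_docker_proxy_dropin(proxy_ip, port=3128):
    """Render a systemd drop-in pointing dockerd at the pre-configured proxy IP.

    Would be written to something like
    /etc/systemd/system/docker.service.d/http_proxy.conf (assumed path).
    """
    url = f"http://{proxy_ip}:{port}/"
    return (
        "[Service]\n"
        f'Environment="HTTP_PROXY={url}"\n'
        f'Environment="HTTPS_PROXY={url}"\n'
    )
```

The drop-in only takes effect after a daemon-reload and one dockerd restart, which is why it is set up once in advance with the unused IP.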
2021-06-17 07:09:50 +00:00
DavidZagury
49388fd595 [Mellanox] Install MFT packages on Syncd container (#7844)
Make the MFT tools available in the syncd container on Mellanox switches, as needed by enhancements to the SAI dump API implementation.
2021-06-17 07:09:50 +00:00
Sudharsan Dhamal Gopalarathnam
199c75f36b
[202012][sonic-utilities] submodule update (#7891)
d86d765 [202012]Fixing db_migrator for Feature table (#1676)
440b0f4 [config] Sort Config Db When Saving (#1623) (#1651)
2021-06-16 18:33:41 +03:00
Blueve
4cbf7e975b [console][minigraph] Avoid generating config for self console port (#7817)
Signed-off-by: Jing Kan jika@microsoft.com
2021-06-16 12:46:25 +00:00
xumia
74955f5301 [build]: Fix missing the depended files of dpkg cache in config engine (#7840)
#### Why I did it
The PR checkers did not re-run the sonic-config-engine test cases because changes to some of the config files were not detected.

https://sonic-jenkins.westus2.cloudapp.azure.com/job/mellanox/job/buildimage-mlnx-all/660/console
```
…
07:13:24  ======================================================================
07:13:24  ERROR: test_bgpd_quagga (tests.test_j2files.TestJ2Files)
07:13:24  ----------------------------------------------------------------------
…
07:13:24  ======================================================================
07:13:24  ERROR: test_zebra_quagga (tests.test_j2files.TestJ2Files)
07:13:24  ----------------------------------------------------------------------
…
07:13:24  error: Test failed: <unittest.runner.TextTestResult run=161 errors=2 failures=0>
07:13:24  [  FAIL LOG END  ] [ target/python-wheels/sonic_config_engine-1.0-py2-none-any.whl ]
07:13:24  make: *** [slave.mk:603: target/python-wheels/sonic_config_engine-1.0-py2-none-any.whl] Error 1
07:13:24  Makefile.work:292: recipe for target 'target/sonic-mellanox.bin' failed
07:13:24  make[1]: *** [target/sonic-mellanox.bin] Error 2
07:13:24  make[1]: Leaving directory '/data2/johnar/workspace/mellanox/buildimage-mlnx-all'
07:13:24  Makefile:7: recipe for target 'target/sonic-mellanox.bin' failed
07:13:24  make: *** [target/sonic-mellanox.bin] Error 2
```

See PR: https://github.com/Azure/sonic-buildimage/pull/7476


#### How I did it
Add the missing dependent files.
See src/sonic-config-engine/tests/test_j2files.py
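The dpkg cache itself is driven by the build Makefiles; the Python sketch below only models the idea, assuming the cache key is a hash over the contents of the declared dependent files. The `cache_key` helper is hypothetical. It shows why an undeclared file cannot invalidate the cache.

```python
import hashlib


def cache_key(dep_files):
    """Hash the path and contents of every declared dependent file.

    Editing a file that is missing from dep_files leaves the key unchanged,
    so a stale cached package (and its skipped tests) would be reused.
    """
    h = hashlib.sha256()
    for path in sorted(dep_files):
        h.update(path.encode())
        with open(path, "rb") as f:
            h.update(f.read())
    return h.hexdigest()
```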
2021-06-16 12:44:53 +00:00
Lawrence Lee
11b2a607f8 [minigraph] Check for null VLAN MAC (#7854)
Explicitly check for a null VLAN MAC in the minigraph parser before setting it; if it is null, do not set the VLAN MAC attribute.
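A minimal sketch of the null check, assuming the VLAN interface element carries an un-namespaced `<MacAddress>` child (a simplification; the real minigraph uses XML namespaces). The `set_vlan_mac` helper is hypothetical.

```python
import xml.etree.ElementTree as ET


def set_vlan_mac(vintf, vlan_attributes):
    """Copy the VLAN MAC into vlan_attributes only if it is present and non-empty."""
    vmac = vintf.find("MacAddress")
    text = (vmac.text or "").strip() if vmac is not None else ""
    if text:  # null or empty MAC: leave the 'mac' attribute unset
        vlan_attributes["mac"] = text
    return vlan_attributes
```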
2021-06-16 12:44:52 +00:00
Stephen Sun
a2e729122d [Mellanox] Adjust Makefile for SDK/python-sdk-api to support both python2 and python3 (#7848)
- Why I did it
Adjust the Makefile for SDK/python-SDK-API to support both python2 and python3

- How to verify it
Build the image and check whether python2 and python3 are both supported by SDK API.

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2021-06-16 12:41:07 +00:00
Volodymyr Boiko
26c6f2a4b2 [barefoot][platform] Chassis.get_reboot_cause (#7794)
Fix the determine-reboot-cause service, which failed because get_reboot_cause raised a not-implemented error when the reboot was done with `sudo reboot` (cold reboot).

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
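A minimal sketch of a `get_reboot_cause` that no longer raises. It uses a local stand-in for `ChassisBase`, and the `REBOOT_CAUSE_NON_HARDWARE` constant and its value are assumptions modeled on sonic_platform_base; the real barefoot implementation may inspect hardware first.

```python
class ChassisBase:
    """Stand-in for sonic_platform_base's ChassisBase (assumed constant)."""
    REBOOT_CAUSE_NON_HARDWARE = "Non-Hardware"

    def get_reboot_cause(self):
        raise NotImplementedError


class Chassis(ChassisBase):
    def get_reboot_cause(self):
        # No hardware reboot-cause source is consulted in this sketch.
        # Reporting a non-hardware cause lets determine-reboot-cause fall back
        # to the software-recorded cause (e.g. `sudo reboot`) instead of failing.
        return (self.REBOOT_CAUSE_NON_HARDWARE, None)
```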
2021-06-16 12:38:30 +00:00
Andriy Yurkiv
2fe91ae30f Set default values only on the first start (#7735) 2021-06-16 12:38:30 +00:00
Shi Su
15bc3c3ae0 [bgpcfgd] Redistribute static routes (#7492)
Why I did it
Enable redistribution of static routes

How I did it
Enable redistribution of static routes when the first route is added to STATIC_ROUTE table of Config_DB and disable the redistribution when the last route is removed from STATIC_ROUTE table.
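A minimal sketch of the first-added / last-removed logic, assuming bgpcfgd tracks STATIC_ROUTE keys in a set and emits FRR commands only on the empty/non-empty transitions. The class name and ASN are hypothetical, and only the IPv4 unicast address family is shown.

```python
class StaticRedistributionTracker:
    """Turn 'redistribute static' on for the first route and off after the last."""

    def __init__(self, asn):
        self.asn = asn
        self.routes = set()  # keys currently present in STATIC_ROUTE

    def _commands(self, enable):
        prefix = "" if enable else "no "
        return [
            f"router bgp {self.asn}",
            " address-family ipv4 unicast",
            f"  {prefix}redistribute static",
        ]

    def add(self, route_key):
        """Return FRR commands to apply (empty unless this is the first route)."""
        was_empty = not self.routes
        self.routes.add(route_key)
        return self._commands(True) if was_empty else []

    def remove(self, route_key):
        """Return FRR commands to apply (empty unless the last route went away)."""
        if route_key not in self.routes:
            return []
        self.routes.remove(route_key)
        return [] if self.routes else self._commands(False)
```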
2021-06-16 03:53:19 +00:00
Guohan Lu
fa321f182c [ci]: set -ex for official build to exit on any build failures
Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-06-15 17:32:07 +08:00
gechiang
341e15b620
[202012] Bring in BRCM SAI changes from SAI 4.3.3.7 (#7850) 2021-06-14 17:52:35 -07:00
mssonicbld
99b03cff45
[ci/build]: Upgrade SONiC package versions (#7856) 2021-06-12 14:22:14 +00:00
Stephen Sun
79617d24fd
[submodule][202012] Advance submodule head for sonic-utilities (#7836)
Advance submodule head for sonic-utilities

b894c5b5 Fix build test failure caused by error module name (Azure/sonic-utilities#1662)
5a7c06a0 [config]][tacacs+] Change tacacs+ minimum timeout value base on spec (Azure/sonic-utilities#1631)
080a689c [202012] [db_migrator] fix old 1911 feature config migration to a new one. (Azure/sonic-utilities#1636)
43fff88c Change to use rvtysh when calling the show commands (Azure/sonic-utilities#1646)
88a823f0 [db_migrator][Mellanox] Update Mellanox buffer migrator with 2km-cable supported (Azure/sonic-utilities#1564)
d096ff78 [config]Static routes to config_db (1534)
a68d8d09 route_check: Updates  (Azure/sonic-utilities#1645)
2021-06-11 06:53:43 -07:00
mssonicbld
b5551f044e
[ci/build]: Upgrade SONiC package versions (#7805) 2021-06-11 12:58:22 +00:00
bingwang-ms
c1b380df73
[docker-teamd]: Increase teammgrd timeout to allow graceful shutdown. (#7662) (#7842)
The PR is a cherry-pick of #7662.

Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2021-06-10 12:49:18 -07:00
yozhao101
fb2c995f53
[202012][Monit] Deprecate the feature of monitoring the critical processes by Monit (#7823)
Signed-off-by: Yong Zhao yozhao@microsoft.com

Why I did it
We now leverage Supervisor to monitor the running status of critical processes in each container, which is more reliable and flexible than monitoring them with Monit. So we removed the functionality of monitoring critical processes with Monit.

How I did it
I removed the script process_checker and corresponding Monit configuration entries of critical processes.

How to verify it
I verified this on the device str-7260cx3-acs-1.
2021-06-09 09:04:22 -07:00
jostar-yang
a9824b73c6 [as7726-32x] Fix module_reset sysfs (#7715)
Fix the module_reset sysfs attribute, which needed to be reverted.

Signed-off-by: Jostar Yang <jostar_yang@accton.com.tw>
2021-06-09 08:28:13 +00:00
Stepan Blyshchak
ce3bdaf697 [nvidia/mellanox] add MLNX_SDK_DEB_VERSION to SDK packages flags list. (#7747)
This is because we use SONIC_OVERRIDE_BUILD_VARS internally
in our build jobs, and the caching framework does not account for it.
So we add MLNX_SDK_DEB_VERSION to force a rebuild if we change it via
SONIC_OVERRIDE_BUILD_VARS.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
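A minimal sketch of why the variable must be in the flags list, assuming the cache key includes a hash of the values of a package's listed flag variables. The `flags_hash` helper and the override value are hypothetical; the real framework lives in the build Makefiles.

```python
import hashlib


def flags_hash(flag_names, build_vars):
    """Hash the current values of the variables named in a package's flags list.

    A variable left out of flag_names cannot change the hash, so overriding it
    (e.g. via SONIC_OVERRIDE_BUILD_VARS) would silently reuse a stale cache entry.
    """
    h = hashlib.sha256()
    for name in sorted(flag_names):
        h.update(f"{name}={build_vars.get(name, '')}\n".encode())
    return h.hexdigest()
```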
2021-06-09 08:28:13 +00:00
Santhosh Kumar T
31a8b1c87a [DellEMC] Z9332: Change in i2c mapping (#7797)
#### Why I did it
- After the [sonic-linux-kernel#177](https://github.com/Azure/sonic-linux-kernel/pull/177) changes, the I2C mux channels of the Baseboard and Switchboard CPLDs moved from i2c-4 and i2c-5 to i2c-36 and i2c-37 respectively.
- This caused QSFP driver initialization on i2c-36 to i2c-41 to fail, which in turn caused ports Ethernet208 to Ethernet248 to fail.

#### How I did it
- The fix is to change which I2C mux channels the QSFP drivers are initialized on.
- Instead of i2c-10 to i2c-41, i2c-4 to i2c-35 are used.
- Also, change the i2c mux channel numbers for the Baseboard CPLD and Switchboard CPLD in the scripts that access them.
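A minimal sketch of the new port-to-bus mapping, assuming 32 QSFP ports mapped one-to-one and in order onto i2c-4 through i2c-35. The constants and the `qsfp_i2c_bus` helper are illustrative, not taken from the Z9332 scripts.

```python
QSFP_PORT_COUNT = 32
FIRST_QSFP_BUS = 4  # was 10 before the kernel change, per the description above


def qsfp_i2c_bus(port_index):
    """Map a 1-based QSFP port index to its I2C bus number (i2c-4 .. i2c-35)."""
    if not 1 <= port_index <= QSFP_PORT_COUNT:
        raise ValueError(f"port index out of range: {port_index}")
    return FIRST_QSFP_BUS + port_index - 1
```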
2021-06-09 08:27:19 +00:00
Volodymyr Boiko
44d9489b8d [barefoot][platform] Refactor chassis.py (#7704)
#### Why I did it
On our platforms, syncd must be up while sonic_platform is in use.
The problem is that the warm-reboot script first disables syncd and then instantiates Chassis, which tries to connect to syncd in __init__.

#### How I did it
Refactor Chassis to lazy initialize components.

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-06-09 08:25:30 +00:00
Volodymyr Boiko
384680a83f [platform][barefoot] Lazy initialize fans and thermals list (#7103)
Initialize fans and thermals lists on demand; make them properties in order to reduce Chassis object initialization time
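A minimal sketch of lazy initialization through a property: `__init__` does no expensive work (such as contacting syncd), and the fan list is built on first access and then cached. The class layout and `NUM_FANS` count are hypothetical, not the barefoot code.

```python
class Fan:
    def __init__(self, index):
        self.index = index


class Chassis:
    """Sketch: fans are discovered on first access instead of in __init__."""

    NUM_FANS = 4  # hypothetical count

    def __init__(self):
        self._fan_list = None  # nothing expensive happens at construction time

    @property
    def fans(self):
        if self._fan_list is None:
            # Expensive discovery (e.g. talking to syncd) would happen here.
            self._fan_list = [Fan(i) for i in range(self.NUM_FANS)]
        return self._fan_list

    def get_all_fans(self):
        return self.fans
```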

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-06-09 08:22:44 +00:00
Dror Prital
9b5e0694e3
[Mellanox][202012] Update FW version to 2008_3110 (#7807)
- Why I did it
Update FW version to 2008_3110 fixing SN3800 specific warm boot scenario:

1. Disable interface
2. Warm Boot
3. Enable Interface --> link will remain down.

- How I did it
Use new FW that contains the fix for the problem mentioned above

- How to verify it
Run the scenario mentioned above and make sure that the link is up after warm boot

Signed-off-by: Dror Prital <drorp@nvidia.com>
2021-06-08 14:06:14 +03:00
Stepan Blyshchak
4506525b61
[sonic-linux-kernel] submodule update (#7776)
Includes the commits below:
```
fcf7cdc [patch] add patch "net: sch_generic: fix the missing new qdisc assignment bug" (#213)
```

#### Why I did it
To bring the fix "net: sch_generic: fix the missing new qdisc assignment bug".

#### How I did it
Updated submodule.

#### How to verify it

Build and run.
Verify that flapping a LAG member port does not leave this member stuck in the disabled state.
2021-06-07 02:05:34 -07:00
Ying Xie
f0efc090f0 [7050] updating 7050 MMU configurations (#7801)
Why I did it
The 7050 S4Q31 MMU configuration is missing the ALPM configurations, so not enough memory is reserved for routes. Orchagent crashes on a nightly testbed with 6400 route entries.

How I did it
Add the missing ALPM configurations.

How to verify it
Loaded the configuration on the testbed and verified that the new configuration is present and the crash no longer occurs.

Signed-off-by: Ying Xie ying.xie@microsoft.com
2021-06-07 06:04:13 +00:00
Kebo Liu
33cb83cbd1 [Mellanox] Align PSU name convention returned from psu.get_name platform API (#7783)
Make PSU name returned from platform API aligned with the convention "PSU {X}" instead of "PSU{X}".
2021-06-07 06:02:32 +00:00
Renuka Manavalan
32e5137ab7 Add service to restore TACACS from old config (#7560)
Why I did it
In upgrade scenarios where config_db.json is not carried forward to the new image, the switch could be left without TACACS credentials.
Added a service that triggers 5 minutes after boot and restores TACACS if /etc/sonic/old_config/tacacs.json is present.

How I did it
By adding a service that fires 5 minutes after boot.
This service applies the TACACS configuration if available.

How to verify it
Upgrade and watch the status of tacacs.timer and tacacs.service.
You may create /etc/sonic/old_config/tacacs.json with updated credentials
(within 5 minutes after boot) and see that it appears in the config and is persisted too.

Which release branch to backport (provide reason below if selected)
 201911
 202006
 202012
2021-06-07 06:02:32 +00:00
mssonicbld
1e9cb30008
[ci/build]: Upgrade SONiC package versions (#7770) 2021-06-05 14:03:04 +00:00
Volodymyr Samotiy
754e4fea17
[Mellanox] Update SDK to 4.4.3106 and FW to xx.2008.3106 (#7785)
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2021-06-03 19:18:07 -07:00
Myron Sosyak
b65b59c1b5 [BFN] Enable syncd-rpc build (#7646)
Why I did it
To enable syncd-rpc for Barefoot build

How I did it
Set the flag

How to verify it
ENABLE_SYNCD_RPC=y make configure PLATFORM=barefoot
ENABLE_SYNCD_RPC=y make all
2021-06-03 12:13:42 +00:00
Volodymyr Boiko
6a90c09cc7 [barefoot][sonic-platform] Fix Fan.set_speed (#7763)
Fixed typo

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-06-03 12:13:42 +00:00
Volodymyr Boiko
202c31ebbe [barefoot][platform] Support fans and thermal (#7004)
Add support for fans and thermals to sonic-platform package for Montara platform

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2021-06-03 12:13:42 +00:00
Myron Sosyak
e7009513da [docker-database] Fix Python3 issue (#7700)
#### Why I did it
To avoid the following error
```
Traceback (most recent call last):
  File "/usr/local/bin/flush_unused_database", line 10, in <module>
    if 'PONG' in output:
TypeError: a bytes-like object is required, not 'str'
```
The `communicate` method returns strings if the streams were opened in text mode, and bytes otherwise.
In our case the `text` argument to Popen was not set, so `communicate` returned bytes.
#### How I did it
Set `text=True` to get strings instead of bytes
#### How to verify it
Run `/usr/local/bin/flush_unused_database` inside the database container.
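A minimal runnable illustration of the fix. The real script pings redis; here a Python child process printing `PONG` stands in for that command, so the example runs anywhere. With `text=True` (Python 3.7+; `universal_newlines=True` on older versions), `communicate()` returns `str`, so the `'PONG' in output` check works.

```python
import subprocess
import sys

# Stand-in for the redis ping command used by flush_unused_database.
proc = subprocess.Popen(
    [sys.executable, "-c", "print('PONG')"],
    stdout=subprocess.PIPE,
    text=True,  # decode stdout to str instead of returning bytes
)
output, _ = proc.communicate()
if 'PONG' in output:
    print("database is up")
```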
2021-06-02 02:39:31 +00:00
bingwang-ms
eb8c05c306 Fix lldpmgrd syntax issue (#7742)
Signed-off-by: bingwang <bingwang@microsoft.com>
2021-06-02 02:39:31 +00:00
Lawrence Lee
6a0e9078d4 [docker-orchagent]: Increase ndppd kernel poll interval (#7456)
Why I did it
ndppd by default reads /proc/net/ipv6_route every 30 seconds. Since T1s advertise so many routes to ToRs, this file is extremely large, and reading it causes ndppd's CPU usage to spike every 30 seconds.

How I did it
Increase the delay for reading this file to the maximum possible value (max integer value), which will result in CPU spikes every ~24 days instead of every 30 seconds

How to verify it
Start ndppd with the new config file, confirm that no CPU spikes are seen except at startup
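A quick check of the "~24 days" figure, assuming the setting is ndppd's `route-ttl` option (in milliseconds, default 30000) and that "max integer value" means the largest signed 32-bit integer.

```python
DEFAULT_ROUTE_TTL_MS = 30000   # ndppd default: reload every 30 s
MAX_ROUTE_TTL_MS = 2**31 - 1   # assumed: largest signed 32-bit integer

days = MAX_ROUTE_TTL_MS / 1000 / 86400  # ms -> s -> days
print(f"default: every {DEFAULT_ROUTE_TTL_MS / 1000:.0f} s; max: every {days:.1f} days")
```

2147483647 ms is about 2,147,484 seconds, or just under 25 days, which matches the interval quoted above.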

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2021-06-02 02:38:54 +00:00
novikauanton
556bb62db4
added barefoot to job filters (#7771)
Build failed with error "Failed to verify the package".
The failure was caused by version verification:
the version manager didn't find dependencies for the barefoot platform.
2021-06-02 09:18:00 +08:00
mssonicbld
eddce4d58b
[ci/build]: Upgrade SONiC package versions (#7755) 2021-05-31 13:02:58 +00:00
Stephen Sun
d387d75420 [Mellanox] Support buffer configuration for 2km cables (#7337)
#### Why I did it
Support 2km cables for Microsoft SKUs

#### How I did it
1. Update pg_profile_lookup.ini with 2000m cable supported
2. Update buffer configuration for t1 with uplink cable 2000m
  - For SN3800 platform:
    - C64:
      - t0: 32 100G down links and 32 100G up links.
      - t1: 56 100G down links and 8 100G up links with 2 km cable.
    - D112C8: 112 50G down links and 8 100G up links.
    - D24C52: 24 50G down links, 20 100G down links, and 32 100G up links.
    - D28C50: 28 50G down links, 18 100G down links, and 32 100G up links.
  - For SN2700 platform:
    - D48C8: 48 50G down links and 8 100G up links.
    - C32:
      - t0: 16 100G down links and 16 100G up links.
      - t1: 24 100G down links and 8 100G up links with 2 km cable.
  - For SN4600C platform:
    - D112C8: 112 50G down links and 8 100G up links.

#### How to verify it
Run regression test
2021-05-31 04:39:59 +00:00
Neetha John
b0f3ecb5cf Rename AristaQX-32S skus (#7751)
This PR contains the following changes
Original Arista-7050-QX-32S sku (32x40G ports) has been renamed to Arista-7050QX32S-Q32
Arista-7050-QX-32S is symlinked to Arista-7050QX-32S-S4Q31 (4x10G, 31x40G ports)

Signed-off-by: Neetha John <nejo@microsoft.com>
2021-05-31 04:38:19 +00:00
Neetha John
20b7654389 [minigraph] Parse bandwidth for DeviceMgmtLinks (#7744)
Why I did it
The current code skips parsing bandwidth for DeviceMgmtLinks. We have a use case to set the speed for this type of link based on the bandwidth attribute in the minigraph.

How to verify it
Ran sonic-cfggen on a minigraph and verified that interface of type DeviceMgmtLink has speed set in the PORT table from the bandwidth attribute in the minigraph
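A minimal sketch of taking speeds from DeviceMgmtLink bandwidth. The element names (`DeviceLinkBase`, `StartDevice`, `EndPort`, `Bandwidth`) and the un-namespaced layout are simplifying assumptions, not the exact minigraph schema, and the `mgmt_link_speeds` helper is hypothetical. Both link ends are checked because which end is local may vary.

```python
import xml.etree.ElementTree as ET

XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"


def mgmt_link_speeds(links_xml, hostname):
    """Return {local_port: bandwidth} for DeviceMgmtLink entries touching hostname."""
    speeds = {}
    for link in ET.fromstring(links_xml):
        if link.get(XSI_TYPE) != "DeviceMgmtLink":
            continue  # other link types are handled elsewhere
        bandwidth = link.findtext("Bandwidth")
        for dev_tag, port_tag in (("StartDevice", "StartPort"), ("EndDevice", "EndPort")):
            if bandwidth and link.findtext(dev_tag) == hostname:
                speeds[link.findtext(port_tag)] = bandwidth
    return speeds
```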
2021-05-31 04:38:18 +00:00
Wirut Getbamrung
85fccfe8bf [device/celestica]: Fix remaining failed test cases of Seastone-DX010 platform API (#7743)
**- Why I did it**
- To fix failed test cases of the Seastone-DX010 platform APIs found by the [platform_tests](https://github.com/Azure/sonic-mgmt/tree/master/tests/platform_tests/api) script

**- How I did it**
1. Add device/celestica/x86_64-cel_seastone-r0/platform.json
2. Update functions to support python3.7
3. Add more functions following the latest sonic_platform_base
4. Fix the bug
2021-05-31 04:38:18 +00:00
yozhao101
3af05fdffe [Monit] Restart telemetry container if memory usage is beyond the threshold (#7645)
Signed-off-by: Yong Zhao yozhao@microsoft.com

Why I did it
This PR aims to monitor the memory usage of streaming telemetry container and restart streaming telemetry container if memory usage is larger than the pre-defined threshold.

How I did it
I used Monit to run a script, memory_checker, which periodically checks the memory usage of the streaming telemetry container. If the memory usage of the telemetry container exceeds the pre-defined threshold 10 times within 20 cycles, an alert message is written to syslog and, at the same time, Monit runs the script restart_service to restart the streaming telemetry container.

How to verify it
I verified this implementation on device str-7260cx3-acs-1.
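A minimal sketch of the "10 times within 20 cycles" rule as a sliding window, roughly approximating Monit's semantics. The `MemoryAlarm` class and the threshold value are hypothetical; the real check is memory_checker driven by a Monit config.

```python
from collections import deque


class MemoryAlarm:
    """Fire when usage exceeded the threshold in >= `hits` of the last `window` checks."""

    def __init__(self, threshold_bytes, hits=10, window=20):
        self.threshold = threshold_bytes
        self.hits = hits
        self.history = deque(maxlen=window)  # one bool per check cycle

    def check(self, usage_bytes):
        """Record one cycle and return True if a restart should be triggered."""
        self.history.append(usage_bytes > self.threshold)
        return sum(self.history) >= self.hits
```

Counting over a window, rather than requiring 10 consecutive exceedances, tolerates brief dips below the threshold while still catching sustained growth.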
2021-05-31 04:38:18 +00:00
Junchao-Mellanox
74216f8710 [Mellanox] clear fan from chassis._fan_list (#7682)
#### Why I did it

According to the thermalctld HLD, each fan must belong to a fan drawer; if the fan drawer does not physically exist, the fan is put into a virtual fan drawer. This PR removes fans from chassis._fan_list.

#### How I did it

1. Don't add fans to chassis._fan_list
2. Always query fans from the fan drawer
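A minimal sketch of the drawer-first model: fans live only in (physical or virtual) drawers, and consumers collect them by walking `get_all_fan_drawers()`. The classes here are simplified stand-ins, and the `all_fans` helper is hypothetical.

```python
class Fan:
    def __init__(self, name):
        self.name = name


class FanDrawer:
    """A physical or virtual drawer that owns its fans."""

    def __init__(self, name, fans, virtual=False):
        self.name = name
        self._fan_list = list(fans)
        self.virtual = virtual  # True when no physical drawer exists

    def get_all_fans(self):
        return self._fan_list


class Chassis:
    def __init__(self, drawers):
        self._fan_drawer_list = list(drawers)
        self._fan_list = []  # fans are no longer registered at chassis level

    def get_all_fan_drawers(self):
        return self._fan_drawer_list


def all_fans(chassis):
    """Collect fans by walking the drawers, never chassis._fan_list."""
    return [fan for drawer in chassis.get_all_fan_drawers()
            for fan in drawer.get_all_fans()]
```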
2021-05-31 04:32:40 +00:00
mssonicbld
71d4b17ad0
[ci/build]: Upgrade SONiC package versions (#7746) 2021-05-29 14:22:37 +00:00
zzhiyuan
39a4cd9c3a
[202012][Arista] Update Arista submodule to include pmbus fix (#7737)
Why I did it
Microsoft reported occasional daemon crashes on devices running 201911. On close inspection, they were caused by PMBus reads failing with IOError on very rare occasions.

This is the fix for 202012 branch.

How I did it
Add try/except block on performing reads on PMBus GPIOs.
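A minimal sketch of guarding a PMBus GPIO read, assuming the value is read from a sysfs-style file. The path and the `read_pmbus_gpio` helper are hypothetical; the point is that a transient IOError becomes `None` ("unknown") instead of crashing the daemon.

```python
import logging


def read_pmbus_gpio(path):
    """Read an integer GPIO value, returning None if the read fails with IOError."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except IOError as err:  # IOError is an alias of OSError on Python 3
        logging.warning("PMBus read of %s failed: %s", path, err)
        return None
```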

How to verify it
Which release branch to backport (provide reason below if selected)

Co-authored-by: Zhi Yuan (Carl) Zhao <zyzhao@arista.com>
2021-05-28 15:43:30 -07:00
jostar-yang
bedcc44cb7
[as7726-32x] Support API2.0 (#7729)
Add platform API 2.0 support for as7726-32x platform

Signed-off-by: Jostar Yang <jostar_yang@accton.com.tw>
2021-05-28 12:23:20 -07:00
Ying Xie
8fc68f9781
[202012][swss][utilities] advance submodule heads (#7739)
sonic-utilities:
* 8b98d45 2021-05-25 | [show] support for show muxcable firmware version of only active banks (#1629) (HEAD -> 202012) [vdahiya12]
* afd0975 2021-05-20 | [show] add support for muxcable metrics (#1615) [vdahiya12]

sonic-swss
* 7611df5 2021-05-27 | [tunneldecaporch] Set default MTU for the overlay loopback interface (#1756) (HEAD -> 202012) [Volodymyr Samotiy]
* 22fbb5c 2021-05-27 | [202012] Resolve neighbor when nexthop does not exist (#1759) (github/202012) [Shi Su]
* ec7710c 2021-05-27 | [Bulk mode] Limit the size of bulker (#1760) [Shi Su]

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2021-05-28 08:41:09 -07:00
Kebo Liu
babaaaad6b [Mellanox] Add support for MSN4600 A1 system (#7732)
Add new sensor conf for MSN4600 A1 system
Add a Mellanox hw-management patch to support MSN4600 A1 system
2021-05-27 22:30:39 +00:00