Commit Graph

6893 Commits

Author SHA1 Message Date
yozhao101
d63d16ba58 [memory_checker] Do not check memory usage of containers which are not created (#11129)
Signed-off-by: Yong Zhao yozhao@microsoft.com

Why I did it
This PR aims to fix an issue (#10088) by enhancing the script memory_checker.

Specifically, if a container is not created successfully while the device is booting or rebooting, memory_checker does not need to check its memory usage.

How I did it
In the script memory_checker, a function is added to get the names of running containers. If the specified container name is not in the list of currently running containers, the script exits without checking its memory usage.
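
The check described above can be sketched roughly as follows. This is a minimal illustration and not the actual memory_checker code: the function names `get_running_container_names` and `should_check_memory` are hypothetical, and the sketch assumes the docker CLI is on PATH and that `docker ps --format '{{.Names}}'` prints one running container name per line.

```python
import subprocess


def get_running_container_names():
    """Return the names of currently running containers.

    Assumption: the docker CLI is available; `docker ps` lists only
    running containers, one name per line with this format string.
    """
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True,
    )
    return [name for name in result.stdout.splitlines() if name]


def should_check_memory(container_name, running_containers):
    """Memory usage is only checked for containers that are running."""
    return container_name in running_containers
```

A caller would exit early, after logging a message, when `should_check_memory(name, get_running_container_names())` returns False.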

How to verify it
I tested on a lab device by following the steps:

1. Stop the telemetry container with the command sudo systemctl stop telemetry.service
2. Remove the telemetry container with the command docker rm telemetry
3. Check whether the memory_checker script run by Monit generates a syslog message saying that it will exit without checking the memory usage of telemetry.
2022-06-19 08:01:18 +00:00
Ying Xie
36b54da653
[brcm docker build] remove extra line (#11182)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2022-06-17 07:51:35 -07:00
Ying Xie
95dc2e23ff
[202205][BRCM_SAI] update Brcm SAI dependencies (#11173)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2022-06-17 05:02:00 -07:00
xumia
90e56cc55b [Build] Improve docker build performance (#11111)
Why I did it
The docker storage driver vfs is not a good option for builds: it makes a “deep copy” when building each new layer, which leads to lower performance and more disk usage than other storage drivers.
A better choice is the default storage driver, overlay2, which is a modern union filesystem.
2022-06-17 03:31:53 +00:00
bingwang-ms
16c424b081 Update YANG for PORT_QOS_MAP to support switch level mapping (#11089)
Signed-off-by: bingwang <wang.bing@microsoft.com>

Co-authored-by: Neetha John <nejo@microsoft.com>
2022-06-17 03:31:43 +00:00
bingwang-ms
255d77e610 Generate switch level dscp_to_tc_map entry from qos_config template (#11087)
* Generate switch level dscp_to_tc_map

Signed-off-by: bingwang <wang.bing@microsoft.com>
2022-06-17 03:31:32 +00:00
shlomibitton
323aa791ec [Mellanox] [pmon] Fix for PMON service not starting when restarting SWSS service after fast/warm reboot (#10901)
- Why I did it
A recent change that delays the PMON service in case of fast/warm reboot introduced an issue when restarting only the SWSS service after a fast/warm reboot on Nvidia platforms.
Since the timer is triggered only when the system boots, if the user restarts the SWSS service after a fast/warm reboot, the PMON service is stopped as part of the syncd.sh script but the timer does not start again.

- How I did it
In the syncd.sh script, in case of a fast/warm indication, check whether pmon.timer is running.
If it is running, this is the first boot, so continue normally.
If it is not running, the service was restarted, so start the timer to keep the system behavior consistent.
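
The decision above can be sketched as a small piece of logic. This is only an illustration in Python (the real change lives in the syncd.sh shell script): the function names are hypothetical, and the sketch assumes that `systemctl is-active --quiet pmon.timer` is an acceptable way to test whether the timer is running.

```python
import subprocess


def is_unit_active(unit):
    """Return True if `systemctl is-active` reports the unit as active."""
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", unit], check=False
    )
    return result.returncode == 0


def should_start_pmon_timer(fast_or_warm_boot, timer_active):
    """Decide whether the timer must be started again.

    - No fast/warm indication: the timer is not touched.
    - Fast/warm indication and timer active: first boot, continue normally.
    - Fast/warm indication and timer inactive: the service was restarted,
      so the timer is started again.
    """
    return fast_or_warm_boot and not timer_active
```

In use, the second argument would come from `is_unit_active("pmon.timer")`.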

- How to verify it
Run fast/warm reboot.
service swss restart.
Observe PMON service starting.

Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
2022-06-17 03:31:18 +00:00
yozhao101
8a76cdc66e [hostcfgd] Initialize Restart= in feature's systemd config by the value of auto_restart in CONFIG_DB (#10915)
Why I did it
Recently the nightly testing pipeline found that the autorestart test case failed when run against the master image. The reason is that the Restart= field in each container's systemd configuration file was set to Restart=no even though the value of the auto_restart field in the FEATURE table of CONFIG_DB is enabled.

This issue introduced by #10168 can be reproduced by the following steps:

1. Issue the config command to disable the auto-restart feature of a container.
2. Run the command config reload or config reload minigraph to enable auto-restart of the container.
3. Check the Restart= field in the systemd config file of the container from step 1 by running the command
sudo systemctl cat <container_name>.service

PR #10168 was initially meant to revert the changes proposed in #8861. However, it did not fully revert all of them.

How I did it
When hostcfgd starts or is restarted, the Restart= field in each container's systemd configuration file is initialized according to the value of the auto_restart field in the FEATURE table of CONFIG_DB.
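
A rough sketch of this initialization, under stated assumptions: the function names are hypothetical, the FEATURE table is modeled as a plain dictionary, and the mapping of "enabled" to Restart=always is an assumption (the description only says that Restart=no was wrong when auto_restart is enabled).

```python
def restart_value(auto_restart):
    """Map an auto_restart value to a systemd Restart= value.

    Assumption: "enabled" maps to "always" and anything else to "no".
    """
    return "always" if auto_restart == "enabled" else "no"


def initial_restart_settings(feature_table):
    """Compute the Restart= value for every feature at startup.

    feature_table stands in for the FEATURE table of CONFIG_DB:
    {feature_name: {"auto_restart": "enabled" | "disabled", ...}}
    """
    return {
        name: restart_value(entry.get("auto_restart", "disabled"))
        for name, entry in feature_table.items()
    }
```

Computing the values from CONFIG_DB at startup, rather than keeping whatever the config file last contained, is what keeps the two from drifting apart after a config reload.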

How to verify it
I verified this change by running the auto-restart test case against a newly built master image and also ran the unit tests:
2022-06-17 00:58:10 +00:00
vdahiya12
bb8e12fe94
[202205][sonic-platform-daemons] submodule update (#11169)
The following commits are pushed

1f112b8 (HEAD -> 202205, origin/202205) [sonic-ycabled] fix grpc logic for timeout,cli HWSTATUS value retrival logic for active-active cable (#264)

Signed-off-by: vaibhav-dahiya vdahiya@microsoft.com
2022-06-16 16:01:14 -07:00
Ying Xie
9329c4b987
[202205][bcm sai] upgrade Broadcom SAI to 7.1.0.0-5 (#11159)
* [bcm sai] upgrade Broadcom SAI to 7.1.0.0-5

- Enable Microsoft AN/LT patch

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2022-06-16 16:00:17 -07:00
Ying Xie
f14d2ae5e3
[202205][linkmgr] advance submodule head (#11158)
linkmgrrd:
* d6518dd 2022-06-14 | Fix IP header checksum in handleSendSwitchCommand (#88) (HEAD -> 202205, github/202205) [Jing Zhang]

swss:
* 4430445 2022-06-03 | Add port counter sanity check (#2300) (HEAD -> 202205, github/202205) [Junhua Zhai]
* 01b017c 2022-05-28 | [counter] Support gearbox counters (#2218) [Junhua Zhai]

utilities:
* ce96543 2022-05-26 | [subinterface]Avoid removing the subinterface when last configured ip is removed (#2181) (HEAD -> 202205, github/202205) [Sudharsan Dhamal Gopalarathnam]
* ed97c6f 2022-05-26 | [subinterface] Fix route add command to accept subinterface as dev (#2180) [Sudharsan Dhamal Gopalarathnam]
* 53ff644 2022-06-09 | [gendump] Add Support to dump BCM-DNX commands (#1813) [saksarav-nokia]
* 0e31790 2022-06-15 | [config][muxcable] fix minor config DB logic issue (#2210) [vdahiya12]

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2022-06-16 15:59:55 -07:00
mssonicbld
1817c325d3
[ci/build]: Upgrade SONiC package versions (#11060)
Co-authored-by: mssonicbld <vsts@fv-az125-175.rkccfo2qup5e5ofdktzmdhpvwd.jx.internal.cloudapp.net>
2022-06-16 23:33:23 +08:00
zitingguo-ms
ae90bfae4b [AN/LT][Fix bug]:enable phy_an_lt_msft attribute on some platforms (#11147) 2022-06-16 02:13:22 +00:00
Jon Goldberg
3f12919dee [Nokia ixs7215] change var/log size to 4GB (#11122)
This makes use of #11121 to add support for configuration of VAR_LOG_SIZE on Nokia IXS7215
2022-06-16 02:12:59 +00:00
Jon Goldberg
b2685736e0 [installer]: fix armhf for installer.conf usage (#11121)
This fixes the build for armhf to be able to use '/device///installer.conf' files. Specifically, armhf needs support to be able to change the size of /var/log/ directory. It is hardcoded to 512 bytes on all armhf platforms currently. This change will allow any armhf platform to be able to use an installer.conf file to customize the installed image.
2022-06-16 02:12:59 +00:00
judyjoseph
8fc5c9b31f Cleanup macsec stateDB tables on restart (#11066)
Clean macsec tables in STATE_DB on start
2022-06-16 02:12:59 +00:00
StormLiangMS
a4c8290637
[202205] [submodule] Advanced sonic-swss (#11137)
submodule advance
Commit included:

54a9828 - (HEAD, public/202205) Combine PGs in buffermgrd (https://github.com/Azure/sonic-buildimage/pull/2281) (https://github.com/Azure/sonic-buildimage/pull/2329) (6 minutes ago)
2022-06-15 17:03:49 -07:00
Richard.Yu
3467f434e8 [Tunnel PFC][Fix bug] Fix bug and Tests for adding property 'sai_remap_prio_on_tnl_egress' (#11027)
* [Tunnel PFC] Tests for adding property 'sai_remap_prio_on_tnl_egress'

Add tests for adding property 'sai_remap_prio_on_tnl_egress'. This
property should only be added in a dual tor environment.

Test done:
Run test test_j2files.py

Co-authored-by: richardyu <richardyu@contoso.com>
2022-06-14 14:59:14 +00:00
Shilong Liu
933e0d11df
[build] Fix issue between reproducible build and dood. (#11084) 2022-06-13 11:15:00 +08:00
Saikrishna Arcot
921658c7a6 Add ping to swss-layer docker (#11093)
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2022-06-10 14:48:14 +00:00
Ying Xie
c7d8f51c68
[202205][linkmgrd][sairedis] advance submodule head (#11091)
linkmgrd:
* 2da783b 2022-06-07 | Check self's mux mode before switching peer to standby & add support for `detach` mode (#79) (HEAD -> 202205, github/202205) [Jing Zhang]

sairedis:
* 54642c7 2022-06-09 | [counter] Fix port flex counter  (#1052) (HEAD -> 202205, github/202205) [Junhua Zhai]
* b7f5f92 2022-06-06 | [ci] Paralize azure pipeline  (#1054) [Shilong Liu]

swss:
* 77043fb 2022-06-09 | [fpmsyncd] don't manipulate route weight (#2321) (HEAD -> 202205, github/202205) [Ying Xie]
* ae157f1 2022-06-10 | Fix test_warm_reboot issues blocking PR merge (#2309) (#2318) [Shilong Liu]

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2022-06-10 07:42:43 -07:00
Ying Xie
40a421913a [makefile] remove all fsroot folders (#11030)
Why I did it
Running make reset didn't clean up all fsroot folders.

How I did it
Remove all fsroot folders used during build.
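
As an illustration only (the actual change is in the makefile), the cleanup could look like the sketch below. The function name is hypothetical, and the `fsroot*` name pattern is an assumption based on the folder names mentioned in this description.

```python
import glob
import os
import shutil


def remove_fsroot_dirs(repo_root):
    """Delete every directory in repo_root whose name starts with 'fsroot'.

    Returns the names of the removed directories. Symbolic links are
    skipped because shutil.rmtree refuses to operate on them.
    """
    removed = []
    for path in sorted(glob.glob(os.path.join(repo_root, "fsroot*"))):
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
            removed.append(os.path.basename(path))
    return removed
```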

How to verify it
Run local build and local make reset:

sudo mkdir fsroot-test
sudo touch fsroot-test/foo
make reset
(Without this change, make reset cannot remove fsroot-test; with the change, the repo becomes clean after make reset.)

Signed-off-by: Ying Xie ying.xie@microsoft.com
2022-06-09 16:52:49 +00:00
xumia
e853f8e7ff [Build]: Fix the version files for armhf/arm64 not used issue (#11021)
Why I did it
[Build]: Fix the version files in host-base-image for armhf/arm64 not used issue
2022-06-09 16:51:03 +00:00
Kebo Liu
7af4efacb7 [Mellanox] Update SN2201 sai profile and platform reboot script (#10978)
- Why I did it
1. SN2201 sai profile needs to be updated according to the latest hardware.
2. The reboot script needs to use the common symbolic link to the power_cycle sysfs instead of accessing it directly, because the SN2201 sysfs differs from that of other platforms.
3. echo 1 > $SYSFS_PWR_CYCLE triggers the reboot immediately, so the following sleep 3 and echo 0 > $SYSFS_PWR_CYCLE are never executed and can be removed.

- How I did it
1. Replace the SN2201 sai profile with the latest one.
2. In the platform_reboot script, replace the direct sysfs path with the symbolic link path.
3. Remove the redundant code from platform_reboot

- How to verify it
Perform reboot on all the Nvidia platforms, and check all can be rebooted successfully.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2022-06-09 16:50:19 +00:00
Junchao-Mellanox
00d04dcb5f [Mellanox] optimize platform API import time (#10815)
- Why I did it
"import sonic_platform" takes about 600 ms to 1000 ms, which is slow. After this optimization, it takes about 100 ms. The benefit is that CLIs that do not need the slow imports are faster than before.

- How I did it
Find the slow imports and perform them only when needed.
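
The general technique, deferring an import until the code path that needs it actually runs, can be sketched as below. This is not the Mellanox platform code: the function names are hypothetical, and the standard-library `json` module merely stands in for a slow dependency.

```python
def parse_thermal_data(raw):
    """Parse a JSON string, importing the parser only on first use."""
    # Deferred import: the module is loaded when this function is first
    # called, not when the enclosing file is imported. `json` is only a
    # stand-in for a slow-to-import dependency.
    import json
    return json.loads(raw)


def platform_name():
    """A fast path that never triggers the deferred import."""
    return "example-platform"
```

Callers that only use `platform_name()` never pay for the deferred import, which is the effect the commit describes for CLIs that do not need the slow modules.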

- How to verify it
Measure the import time.
2022-06-09 16:50:12 +00:00
vdahiya12
d4c4993282
[202205][sonic-utilities] submodule update (#11065)
0fc6f47 (HEAD -> 202205, origin/202205) [config][muxcable] Add support for displaying soc_ipv4 and cable_type in config/show muxcable commands (#2189)

Signed-off-by: vaibhav-dahiya vdahiya@microsoft.com
2022-06-08 19:50:48 -07:00
Shilong Liu
edf5e445be
[build] Disable reproducible build in 202205. (#11071)
Why I did it
It seems that reproducible build and dood conflict.
Disable reproducible build first. Investigate the issue later.
2022-06-08 17:54:00 +08:00
mssonicbld
1c2e361080
[ci/build]: Upgrade SONiC package versions (#11048)
Upgrade SONiC Versions
Co-authored-by: mssonicbld <vsts@fv-az113-110.2axxbwkg0v3e1hk3nyhxwcxvsf.bx.internal.cloudapp.net>
2022-06-07 10:01:24 +08:00
Ying Xie
f6f0aaaad8
[202205][linkmgrd] advance submodule head (#11033)
linkmgrd:
* d27ca81 2022-06-05 |  Separate I2C mux state probing and gRPC forwarding state probing  (#86) (HEAD -> 202205) [Jing Zhang]
* 9d7d301 2022-06-01 | Revert "Update log level for mux probing and mux state chance (#23)" (#85) [Jing Zhang]
* 60d3d77 2022-06-05 | Fix peer mux wait back off factor (#84) [Longxiang Lyu]

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2022-06-05 08:34:33 -07:00
Ying Xie
dbb4a98046 [pr test] increase T1-lag PR test timeout to 5 hours (#11029)
Why I did it
Some PR tests are timing out on the T1-lag KVM test.

How I did it
Increase the timeout to 5 hours.

How to verify it
Test on this PR.

Signed-off-by: Ying Xie ying.xie@microsoft.com
2022-06-05 15:23:45 +00:00
Richard.Yu
af855033ec [Tunnel PFC] Add property for tunnel PFC (#10962)
* [Tunnel PFC] Add property for tunnel PFC

Replace the config.bcm file with j2 template file
- Add 'sai_remap_prio_on_tnl_egress=1' property when device metadata localhost subtype is 'dualtor'
- Change sai.profile for the new config.bcm.j2
2022-06-05 15:21:24 +00:00
bingwang-ms
76502c821e Update qos template to support SYSTEM_DEFAULT table (#10936)
* Update qos template to support SYSTEM_DEFAULT table

Signed-off-by: bingwang <wang.bing@microsoft.com>
2022-06-05 15:21:10 +00:00
xumia
043656dfe8 Support symcrypt fips config for aboot/uboot (#10729)
Why I did it
Support symcrypt fips config for aboot/uboot
2022-06-05 15:20:20 +00:00
Ying Xie
ea3df2a21a
[platform build] fix platform ycabled build (#11020)
* remove python2 wheel for sonic-platform-common

Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
2022-06-04 09:43:05 -07:00
mssonicbld
aecbf4718f
[ci/build]: Upgrade SONiC package versions (#11013)
Co-authored-by: mssonicbld <vsts@fv-az95-899.pq21ngt4mckezax5v03dvw0kka.ex.internal.cloudapp.net>
2022-06-03 09:08:13 +08:00
Ying Xie
0514923ea1
[azure pipeline] enable PR test for 202205 branch (#11017)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2022-06-02 12:00:50 -07:00
Ying Xie
0a01faa383
[202205][linkmgrd] advance submodule heads (#11016)
linkmgrd:
* 3c2b546 2022-05-31 | Add default route support to `active-active` state machine (#78) (github/202205, master, 202205) [Jing Zhang]
* 6fa892e 2022-05-27 | Degrade `LinkProberStateMachineBase` virtual function logging level (#80) [Longxiang Lyu]
* 7b695ca 2022-05-27 | Fix mux wait timer and peer mux wait timer (#81) [Longxiang Lyu]

platform-daemons:
* 0d90023 2022-05-31 | grpc client implementation for active-active dualtor (#248) (github/master, github/202205, master, 202205) [vdahiya12]
* 6b8bf69 2022-05-27 | [ycabled] Fix some syntax warnings in ycabled (#263) [vdahiya12]
* 2bcf936 2022-05-24 | [ycabled] fix the posting for mux_cable_static_info per downlink when ycabled is spawned; synchronizing executing Telemetry API (#257) [vdahiya12]
* ce217c0 2022-04-25 | Include changes from xcvr_api in transceiver_info table (#253) [qinchuanares]
* e0f8a35 2022-04-22 | Fix checkReplyType failed issue via recreating xcvr_table_helper on forking subprocess (#255) [Stephen Sun]

platform-common:
* f575a40 2022-05-24 | [Credo][Ycable] changes for synchronizing executing Telemetry API's when mux toggle is inprogress (#280) (github/202205, master, 202205) [vdahiya12]
* b043372 2022-05-11 | [sonic_ssd] Nokia-7215: "show platform ssdhealth" not showing health percent (#279) [bill-nokia]
* d62d3d6 2022-05-04 | [CMIS]Fix low-power to high power mode transition (#268) [Prince George]
* f918125 2022-05-02 | [syseeprom] Enable display of vendor extension TLV content (#270) [dflynn-Nokia]
* 4e08440 2022-04-14 | [Credo][Ycable] improve logging for Server Powered off/Faulty cables (#272) [vdahiya12]

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2022-06-02 11:49:34 -07:00
Ying Xie
b6aafc1fde
[202205][swss] advance swss submodule head (#11014)
Including change:

* 7ff8f75 2022-06-03 | Revert "[portsorch]: Prevent LAG member configuration when port has active ACL binding (#2165)" (#2306) (HEAD -> 202205, github/202205) [bingwang-ms]

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2022-06-02 10:36:52 -07:00
Ying Xie
845b27e91f [submodules] set submodule branches
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2022-05-31 19:26:13 +00:00
Alexander Allen
b4bc051267
Add logging for slave container builds (#10628)
#### Why I did it

No logs currently exist for sonic-slave-X containers, which makes them difficult to debug.

#### How I did it

Altered Makefile.work to create logs in the sonic-slave-X folder while still displaying the log to the screen to prevent interfering with any existing tooling. 
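
The idea of writing a log while still showing the output on screen can be sketched as follows. This is a Python illustration of the approach, not the Makefile.work change itself; the function name `run_and_log` and the log path are hypothetical.

```python
import subprocess
import sys


def run_and_log(command, log_path):
    """Run a command, echoing its output to stdout and to a log file.

    Returns the exit code of the command.
    """
    with open(log_path, "w") as log_file:
        process = subprocess.Popen(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # interleave stderr with stdout
            text=True,
        )
        for line in process.stdout:
            sys.stdout.write(line)  # keep the on-screen output unchanged
            log_file.write(line)    # keep a copy for later debugging
        return process.wait()
```

Because the output is still streamed to the screen, tooling that reads the build output keeps working while a copy is preserved on disk.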

#### How to verify it

Do `make configure` and verify that logs show up in `sonic-slave-buster/` and `sonic-slave-bullseye/`

#### Description for the changelog
Add logging for slave container builds

2022-05-31 09:59:52 -07:00
Lukas Stockner
c9b27cde71
[swss] Clear VXLAN tunnel table from State DB on startup (#10822)
* When reloading config after crashes, VTEP interfaces are sometimes not created since the tunnel still exists in the STATE_DB.
* Adding VXLAN_TUNNEL_TABLE to the list of tables to be cleaned in swss.sh fixes the problem.
2022-05-31 08:54:31 -07:00
Yakiv Huryk
7306d68411
[build][asan] make dpkg cache asan-aware (#10750)
Currently, the build with ASAN_ENABLE=y reuses the packages built with
ASAN_ENABLE=n (and vice versa). To address this issue, ASAN_ENABLE is added to DEP_FLAGS for asan-enabled packages (docker-syncd-mlnx, syncd, docker-orchagent, swss).

- Why I did it
To make dpkg cache use/rebuild the packages for ASAN_ENABLE=y/n.

- How I did it
Added ASAN_ENABLE to the DEP_FLAGS for asan-enabled packages.

- How to verify it
Built with ASAN_ENABLE=y/n and checked the .flags and .log files.

Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
2022-05-31 11:15:44 +03:00
Yakiv Huryk
bd91b2eef3
[asan] add debug package for asan-enabled containers (#10953)
This is to improve the readability of ASAN reports. The debug package adds function names and source code references to the backtrace (currently, there are only binary addresses of functions)

Another way to address this issue is to build the image with "INSTALL_DEBUG_TOOLS=y". The downside of this approach is that the image size and compilation time are unnecessarily big. Also, the idea is to make the "ENABLE_ASAN" self-sufficient, which would not be the case for this approach.

- Why I did it
To improve the readability of asan logs.

- How I did it
Added SYNCD_DBG and SWSS_DBG to corresponding docker images for ASAN_ENABLE=y build

- How to verify it
Add artificial memory leak
Build with ASAN_ENABLE=y
Test the image and check the ASAN report

Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
2022-05-31 09:24:18 +03:00
Eric Zhu
8c1ded61b0
[SONiC-CEL]: fix platform fancontrol testcase failure issue (#10934) 2022-05-31 10:54:55 +08:00
Samuel Angebault
912923f47b
[Arista] Update supervisor configurations (#10913)
* Removed unused default_config.json

* Remove asic.conf file from HW SKUs directories as they are not used by upstream code

* Enable dynamic PCI ID identification on Otterlake2

Co-authored-by: Maxime Lorrillere <mlorrillere@arista.com>
2022-05-30 13:34:55 -07:00
xumia
5072315c89
[Ci]: Fix the target directory not empty issue when publishing artifacts #10972
Why I did it
Fix the target directory not empty issue when publishing artifacts.
Some of the artifacts are published to $(Build.ArtifactStagingDirectory)/target/ before the source code is checked out.
2022-05-30 16:50:06 +08:00
Guohan Lu
73c5ac11ee
[CODEOWNERS]: update code owners for various repos (#10980)
Signed-off-by: Guohan Lu <lguohan@gmail.com>
2022-05-30 01:25:51 -07:00
Shilong Liu
650b00e41e
[ci] Publish logs when building image job is canceled by timeout. (#10919) 2022-05-30 16:02:27 +08:00
Jing Zhang
b3e33d4f45
[sonic-linkmgrd][master] submodule updates (#10925)
[sonic-linkmgrd][master] submodule updates

d744bfb Longxiang Lyu Wed May 25 08:40:42 2022 +0800 Support switch between using wellknown mac or server mac addr (#73)
684e989 Jing Zhang Wed May 18 09:59:02 2022 -0700 Avoid switching active when LinkState == Down (#77)
e4aa4fd Longxiang Lyu Tue May 17 09:13:23 2022 +0800 [Makefile] Remove redundant optimization options (#75)
4ec7505 Jing Zhang Thu May 12 08:19:20 2022 -0700 [ci]: uplift diff coverage threshold to 80% (#71)

sign-off: Jing Zhang zhangjing@microsoft.com
2022-05-29 22:53:46 -07:00
xumia
b9ecaa3234
[Build]: Support to use the base image version when a package version not specified (#10971)
Why I did it
It is to fix issue: #10952
[Build]: Support to use the base image version when a package version not specified
2022-05-30 12:06:32 +08:00