Commit Graph

6457 Commits

Author SHA1 Message Date
Neetha John
ef9fb9db05 [sonic-config-engine] Generate expected output with different cable len (#11092)
Why I did it
To address internal build failures where the cable length for some of the SKUs is set to 300m for all tiers.

How I did it
For the buffers test, generate a new expected output file from the original expected output, with the CABLE_LENGTH table updated to use 300m. The comparison logic compares the generated output against each of the expected output files; if any of them matches, the test case passes.
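
The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the actual test code: the function names and the assumed layout of the expected output (a `CABLE_LENGTH` table keyed by a profile name, then by port) are hypothetical.

```python
import copy


def with_cable_length(expected, length="300m"):
    """Return a copy of `expected` with every CABLE_LENGTH entry set to `length`."""
    variant = copy.deepcopy(expected)
    for ports in variant.get("CABLE_LENGTH", {}).values():
        for port in ports:
            ports[port] = length
    return variant


def matches_any(actual, expected_variants):
    """The test passes if the actual output equals any expected variant."""
    return any(actual == expected for expected in expected_variants)


# Hypothetical data: the original expected output and its 300m variant.
original = {"CABLE_LENGTH": {"AZURE": {"Ethernet0": "5m", "Ethernet4": "40m"}}}
variants = [original, with_cable_length(original)]

actual = {"CABLE_LENGTH": {"AZURE": {"Ethernet0": "300m", "Ethernet4": "300m"}}}
print(matches_any(actual, variants))
```

Generating the variant from the original keeps the two expected outputs in sync when the rest of the file changes.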

Signed-off-by: Neetha John <nejo@microsoft.com>
2022-06-23 02:33:57 +00:00
Neetha John
3304fcd3a5 [qos]: Adjust 7260 buffer sizes to accommodate extra lossless queues (#11018)
Why I did it
As part of PCBB changes, we need to enable 2 extra lossless queues. The changes in this PR are done to adjust only the reserved sizes on Th2 for the additional 2 lossless queues
Calculations are done based on 40 downlinks for T1 and 16 uplinks for dual ToR

How to verify it
Verified that the rendering works fine on Th2 dut
Unit tests have been updated to reflect the modified buffer sizes when pcbb is enabled. Existing test cases still cover the original buffer sizes when pcbb is disabled. With these changes, I was able to build the sonic-config-engine wheel successfully.

Signed-off-by: Neetha John <nejo@microsoft.com>
2022-06-23 02:33:48 +00:00
Lior Avramov
f9e93d2f31 Change severity of log messages for cases where docker container was stopped during service checker operation (#11188)
#### Why I did it
There might be a case where the service checker's periodic operation determines that a specific container is running, but by the time it tries to perform an operation on it, the container has already been stopped by the user. This is a valid flow and we should not log an error message; an informative warning is enough.

#### How I did it
I reduced the log severity.
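
A minimal sketch of the idea, not the actual service checker code: `ContainerGoneError`, `check_container`, and the logger name are hypothetical stand-ins for whatever error the checker sees when a container disappears between the "is it running" check and the operation.

```python
import logging

logger = logging.getLogger("service_checker_sketch")


class ContainerGoneError(Exception):
    """Hypothetical error: the container stopped between the check and the operation."""


def check_container(name, run_check):
    """Run `run_check(name)`; a container that vanished mid-check is only a warning."""
    try:
        return run_check(name)
    except ContainerGoneError:
        # Valid flow: the user stopped the container, so do not log an error.
        logger.warning("Container %s was stopped during the check; skipping it", name)
        return None


def stopped_midway(name):
    """Simulate a container that is stopped while it is being checked."""
    raise ContainerGoneError(name)


print(check_container("telemetry", stopped_midway))
```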

#### How to verify it
I verified it manually.
2022-06-22 23:09:39 +00:00
Ze Gan
40f7cec98d [azurepipeline]: Add t0-sonic pool back to Azp checker (#11181)
Why I did it
The t0-sonic pool has been fixed, so add it back to azp checker.

How I did it
Remove continueOnError in run-test-template.yml.

Signed-off-by: Ze Gan <ganze718@gmail.com>
2022-06-22 23:09:26 +00:00
Hua Liu
d97b336a5d [SSHD] Enable SSHD keepalive timeout feature (#11115)
#### Why I did it
The SSHD keepalive timeout feature is not enabled on SONiC.

#### How I did it
Enable the SSHD keepalive timeout feature by setting ClientAliveCountMax to 1.
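
For context, sshd drops an unresponsive client after roughly ClientAliveInterval × ClientAliveCountMax seconds. The sketch below computes that value from a config text. It is an illustration only: the function name is hypothetical, the interval of 900 seconds in the example is an assumed value (this change does not state one), and intervals are assumed to be plain seconds.

```python
def keepalive_timeout(sshd_config_text):
    """Approximate seconds before sshd drops an unresponsive client, or None if disabled."""
    defaults = {"clientaliveinterval": 0, "clientalivecountmax": 3}  # OpenSSH defaults
    seen = {}
    for line in sshd_config_text.splitlines():
        parts = line.split("#", 1)[0].split()
        if len(parts) == 2:
            key = parts[0].lower()
            # sshd uses the first value it obtains for each keyword.
            if key in defaults and key not in seen:
                seen[key] = int(parts[1])
    interval = seen.get("clientaliveinterval", defaults["clientaliveinterval"])
    count = seen.get("clientalivecountmax", defaults["clientalivecountmax"])
    return interval * count if interval > 0 and count > 0 else None


# Assumed example values; only ClientAliveCountMax 1 comes from this change.
example = "ClientAliveInterval 900\nClientAliveCountMax 1\n"
print(keepalive_timeout(example))  # 900
```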

#### How to verify it
Passed all E2E test cases.
Manually tested with the following steps:

1. Change config and restart sshd
2. Open an ssh connection with the -vvv option to show debug messages
3. Find the running ssh process and stop it:

```
azureuser@liuh-dev-vm-02:~$ ps -auxww | grep vvv
azureus+ 1614153  0.0  0.0  12244  6004 pts/1    S+   15:48   0:00 ssh admin@10.250.0.101 -vvv
azureus+ 1615570  0.0  0.0   8168  2424 pts/3    S+   15:49   0:00 grep --color=auto vvv
azureuser@liuh-dev-vm-02:~$ kill -Stop 1614153
```

4. Check the TCP status from the server side with the ss command:
https://man7.org/linux/man-pages/man8/ss.8.html

```
admin@vlab-01:~$ ss | grep -i ssh
tcp   ESTAB       0  0  10.250.0.101:ssh  10.250.0.1:58150
tcp   FIN-WAIT-2  0  0  10.250.0.101:ssh  10.250.0.1:58164
tcp   ESTAB       0  0  10.250.0.101:ssh  10.250.0.1:57978
```

FIN-WAIT-2 means the server has already terminated the connection and is waiting for the client's response:
https://kb.iu.edu/d/ajmi
.  FIN-WAIT-2  <-- <SEQ=300><ACK=101><CTL=ACK>  <-- CLOSE-WAIT

5. Checking again later shows that the session has been completely closed:

```
admin@vlab-01:~$ ss | grep -i ssh
tcp   ESTAB       0  0  10.250.0.101:ssh  10.250.0.1:58150
tcp   ESTAB       0  0  10.250.0.101:ssh  10.250.0.1:57978
```
2022-06-22 23:09:15 +00:00
saksarav-nokia
1976e55010 Update platform/broadcom/sonic-platform-modules-nokia (#11107) 2022-06-22 23:09:01 +00:00
saksarav-nokia
5a3c8d693f Updated Nokia device BCM and platform config (#11106) 2022-06-22 23:08:51 +00:00
Sudharsan Dhamal Gopalarathnam
379d77af42 [lldp]Fix lldp spawned after reboot when disabled (#11080)
- Why I did it
When LLDP is disabled through the feature command, it still gets spawned after reboot.

- How I did it
In syncd.sh, check whether the service is enabled before spawning it automatically during cold reboot.

- How to verify it
Disable the lldp feature, perform a cold reboot, and verify that it is not spawned.
2022-06-22 23:08:05 +00:00
Lior Avramov
e015232ebf Add IP interface loopback action related content to YANG models. (#11012)
Add IP interface loopback action related content to the required YANG models.
2022-06-22 23:06:18 +00:00
Andriy Yurkiv
d9f8af8e31 [Mellanox] Install MFT package on platform monitor (pmon) container (#10932)
- Why I did it
Need to execute mlxreg inside pmon docker

- How I did it
Add MFT package to pmon Makefile

- How to verify it
Install the image, enter pmon with docker exec -it pmon bash, and execute mlxreg.
Verify warm, fast and cold reboot while MFT is being called in pmon constantly.

Signed-off-by: Andriy Yurkiv <ayurkiv@nvidia.com>
2022-06-22 23:05:52 +00:00
bingwang-ms
6f713419ba Add two extra lossless queues for bounced back traffic (#10496)
Signed-off-by: bingwang <bingwang@microsoft.com>

Why I did it
This PR is to add two extra lossless queues for bounced back traffic.
HLD sonic-net/SONiC#950

SKUs include
Arista-7050CX3-32S-C32
Arista-7050CX3-32S-D48C8
Arista-7260CX3-D108C8
Arista-7260CX3-C64
Arista-7260CX3-Q64

How I did it
Update the buffers.json.j2 template and buffers_config.j2 template to generate new BUFFER_QUEUE table.

For T1 devices, queue 2 and queue 6 are set as lossless queues on T0 facing ports.
For T0 devices, queue 2 and queue 6 are set as lossless queues on T1 facing ports.
Queue 7 is added as a new lossy queue as DSCP 48 is mapped to TC 7, and then mapped into Queue 7
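
The queue roles listed above can be summarized in a small sketch. It does not reproduce the Jinja2 templates: the function name and return format are hypothetical, and it assumes queue 7 is lossy on every port, which the description does not state explicitly.

```python
EXTRA_LOSSLESS_QUEUES = (2, 6)

# Which neighbor role each device role treats as the "facing" side (from the text above).
FACING_NEIGHBOR = {"T1": "T0", "T0": "T1"}


def extra_queue_roles(device_role, neighbor_role):
    """Return the roles of the newly added queues for one port."""
    roles = {7: "lossy"}  # DSCP 48 -> TC 7 -> queue 7
    if FACING_NEIGHBOR.get(device_role) == neighbor_role:
        for queue in EXTRA_LOSSLESS_QUEUES:
            roles[queue] = "lossless"
    return roles


print(extra_queue_roles("T1", "T0"))
print(extra_queue_roles("T1", "T2"))
```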

How to verify it
Verified by UT
Verified by copying the new template and generating the buffer config with sonic-cfggen
2022-06-22 23:05:14 +00:00
Liu Shilong
57244dd24a
[build] Add version files to docker image dependencies (#11195)
* [ci] Support to skip vstest using include/exclude config file. (#11086)

example:
├── folderA
│  ├──  fileA (skip vstest)
│  ├──  fileB
│  └──  fileC
Suppose we want to skip vstest when /folderA/fileA changes, but not when fileB or fileC changes:

vstest-include:
^folderA/fileA

vstest-exclude:
^folderA

* [build] Add version files to docker image dependencies
2022-06-22 14:12:25 +08:00
Ying Xie
3ea8df3096
[202205][swss] advance submodule head (#11200)
swss:
* a3bfd96 2022-06-18 | Enhance mock test for dynamic buffer manager for port removing and qos reload flows (#2262) (HEAD -> 202205, github/202205) [Stephen Sun]
* b17d6c0 2022-05-28 | Support mock_test infra for dynamic buffer manager and fix issues found during mock test (#2234) [Stephen Sun]
* 3fb23a1 2022-06-16 | [aclorch] Fix and simplify DTel watchlist tables and entries (#2155) [Mickey Spiegel]
* 9ace643 2022-06-16 | [intfmgr]: Set proxy_arp kernel param (#2334) [Lawrence Lee]
* 013609a 2022-06-14 | [crmorch] Prevent exceededLogCounter from resetting when low and high values are equal (#2327) [Alexander Allen]
* 83a1306 2022-06-13 | Fix key generation in removeDecapTunnel (#2322) [Myron Sosyak]
* 3d018ad 2022-06-15 | Apply `DSCP_TO_TC_MAP` from `PORT_QOS_MAP|global` to switch level (#2314) [bingwang-ms]

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2022-06-21 09:28:46 -07:00
jingwenxie
7a22cbff28
[202205][utilities] advance utilities submodule head (#11184)
13ec600 [generic-config-updater] Add NTP validator (#2212)
4fc09b1 [GCU] Handling non-compliant leaf-list with string values (#2174)
ac89489 Modify override testcase to cover PORT admin_status (#2165)
d7953d2 [GCU] Validate peer_group_range ip_range are correct (#2145)
2022-06-20 09:02:24 -07:00
yozhao101
d63d16ba58 [memory_checker] Do not check memory usage of containers which are not created (#11129)
Signed-off-by: Yong Zhao yozhao@microsoft.com

Why I did it
This PR aims to fix an issue (#10088) by enhancing the script memory_checker.

Specifically, if a container is not created successfully while the device is booting or rebooting, memory_checker does not need to check its memory usage.

How I did it
In the script memory_checker, a function is added to get names of running containers. If the specified container name is not in current running container list, then this script will exit without checking its memory usage.
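
The check described above might look roughly like this. It is a sketch under stated assumptions, not the memory_checker source: the function names are hypothetical, and the running containers are listed through the docker CLI here, whereas the real script may obtain them differently.

```python
import subprocess


def get_running_container_names():
    """List the names of running containers using the docker CLI."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True,
    )
    return [name for name in result.stdout.splitlines() if name]


def should_check_memory(container_name, running_names):
    """Only check memory usage for containers that are currently running."""
    return container_name in running_names


# Example with a fixed list, so the sketch runs without docker installed.
print(should_check_memory("telemetry", ["swss", "syncd"]))  # False
```

In a real script, the caller would log a syslog message and exit when `should_check_memory` returns False.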

How to verify it
I tested on a lab device by following the steps:

1. Stop the telemetry container with the command sudo systemctl stop telemetry.service
2. Remove the telemetry container with the command docker rm telemetry
3. Check whether the memory_checker script run by Monit generates a syslog message saying that it will exit without checking the memory usage of telemetry.
2022-06-19 08:01:18 +00:00
Ying Xie
36b54da653
[brcm docker build] remove extra line (#11182)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2022-06-17 07:51:35 -07:00
Ying Xie
95dc2e23ff
[202205][BRCM_SAI] update Brcm SAI dependencies (#11173)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2022-06-17 05:02:00 -07:00
xumia
90e56cc55b [Build] Improve docker build performance (#11111)
Why I did it
The docker storage driver vfs is not a good option for builds: it makes a deep copy when building each new layer, which leads to lower performance and more disk space used than other storage drivers.
A better choice is overlay2, the default docker storage driver, which is a modern union filesystem.
2022-06-17 03:31:53 +00:00
bingwang-ms
16c424b081 Update YANG for PORT_QOS_MAP to support switch level mapping (#11089)
Signed-off-by: bingwang <wang.bing@microsoft.com>

Co-authored-by: Neetha John <nejo@microsoft.com>
2022-06-17 03:31:43 +00:00
bingwang-ms
255d77e610 Generate switch level dscp_to_tc_map entry from qos_config template (#11087)
* Generate switch level dscp_to_tc_map

Signed-off-by: bingwang <wang.bing@microsoft.com>
2022-06-17 03:31:32 +00:00
shlomibitton
323aa791ec [Mellanox] [pmon] Fix for PMON service not starting when restarting SWSS service after fast/warm reboot (#10901)
- Why I did it
A recent change to delay the PMON service on fast/warm reboot introduced an issue when only the SWSS service is restarted after a fast/warm reboot on Nvidia platforms.
The timer is triggered only when the system boots. If the system has gone through a fast/warm reboot and the user then restarts the SWSS service, the syncd.sh script stops the PMON service but the timer does not start again.

- How I did it
In the syncd.sh script, when there is a fast/warm indication, check whether pmon.timer is running.
If it is running, we are at the first boot and continue normally.
If it is not running, the service was restarted, so start the timer to keep the system behavior consistent.
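
The decision above can be sketched as follows. The actual change lives in the syncd.sh shell script; this Python version only illustrates the logic, and the function names are hypothetical.

```python
import subprocess


def is_unit_active(unit):
    """Return True if systemd reports the unit as active."""
    return subprocess.run(["systemctl", "is-active", "--quiet", unit]).returncode == 0


def should_start_pmon_timer(fast_or_warm_boot, timer_active):
    """Start the timer only when the service was restarted after a fast/warm reboot."""
    return fast_or_warm_boot and not timer_active


# First boot after fast/warm reboot: the timer is already running, nothing to do.
print(should_start_pmon_timer(True, True))   # False
# SWSS restarted later: the timer is no longer running, so start it again.
print(should_start_pmon_timer(True, False))  # True
```

On a real system the second argument would come from `is_unit_active("pmon.timer")`.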

- How to verify it
Run a fast/warm reboot.
Run service swss restart.
Observe that the PMON service starts.

Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
2022-06-17 03:31:18 +00:00
yozhao101
8a76cdc66e [hostcfgd] Initialize Restart= in feature's systemd config by the value of auto_restart in CONFIG_DB (#10915)
Why I did it
Recently the nightly testing pipeline found that the autorestart test case failed when run against the master image. The reason is that the Restart= field in each container's systemd configuration file was set to Restart=no even when the auto_restart field in the FEATURE table of CONFIG_DB was enabled.

This issue introduced by #10168 can be reproduced by the following steps:

1. Issue the config command to disable the auto-restart feature of a container
2. Run the command config reload or config reload minigraph to enable auto-restart of the container
3. Check the Restart= field in the systemd config file of the container from step 1 by running the command
sudo systemctl cat <container_name>.service

PR #10168 was originally intended to revert the changes proposed by #8861. However, it did not fully revert all of them.

How I did it
When hostcfgd starts or is restarted, the Restart= field in each container's systemd configuration file should be initialized according to the value of the auto_restart field in the FEATURE table of CONFIG_DB.
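
A rough sketch of that initialization, not the hostcfgd implementation: the function names and the sample FEATURE entries are hypothetical, and mapping `enabled` to `Restart=always` is an assumption (the description only says that `Restart=no` was wrong when auto_restart is enabled).

```python
def restart_policy(auto_restart):
    """Map the FEATURE table's auto_restart value to a systemd Restart= value."""
    # Assumption: "enabled" corresponds to Restart=always, anything else to Restart=no.
    return "always" if auto_restart == "enabled" else "no"


def render_restart_dropin(auto_restart):
    """Render the [Service] section that sets Restart= for one container."""
    return "[Service]\nRestart={}\n".format(restart_policy(auto_restart))


# Hypothetical FEATURE table contents read from CONFIG_DB at startup.
feature_table = {
    "telemetry": {"auto_restart": "enabled"},
    "snmp": {"auto_restart": "disabled"},
}
for name, config in feature_table.items():
    print(name, render_restart_dropin(config["auto_restart"]).splitlines()[1])
```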

How to verify it
I verified this change by running the auto-restart test case against a newly built master image and also ran the unit tests:
2022-06-17 00:58:10 +00:00
vdahiya12
bb8e12fe94
[202205][sonic-platform-daemons] submodule update (#11169)
The following commits are pushed

1f112b8 (HEAD -> 202205, origin/202205) [sonic-ycabled] fix grpc logic for timeout,cli HWSTATUS value retrival logic for active-active cable (#264)

Signed-off-by: vaibhav-dahiya vdahiya@microsoft.com
2022-06-16 16:01:14 -07:00
Ying Xie
9329c4b987
[202205][bcm sai] upgrade Broadcom SAI to 7.1.0.0-5 (#11159)
* [bcm sai] upgrade Broadcom SAI to 7.1.0.0-5

- Enable Microsoft AN/LT patch

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2022-06-16 16:00:17 -07:00
Ying Xie
f14d2ae5e3
[202205][linkmgr] advance submodule head (#11158)
linkmgrrd:
* d6518dd 2022-06-14 | Fix IP header checksum in handleSendSwitchCommand (#88) (HEAD -> 202205, github/202205) [Jing Zhang]

swss:
* 4430445 2022-06-03 | Add port counter sanity check (#2300) (HEAD -> 202205, github/202205) [Junhua Zhai]
* 01b017c 2022-05-28 | [counter] Support gearbox counters (#2218) [Junhua Zhai]

utilities:
* ce96543 2022-05-26 | [subinterface]Avoid removing the subinterface when last configured ip is removed (#2181) (HEAD -> 202205, github/202205) [Sudharsan Dhamal Gopalarathnam]
* ed97c6f 2022-05-26 | [subinterface] Fix route add command to accept subinterface as dev (#2180) [Sudharsan Dhamal Gopalarathnam]
* 53ff644 2022-06-09 | [gendump] Add Support to dump BCM-DNX commands (#1813) [saksarav-nokia]
* 0e31790 2022-06-15 | [config][muxcable] fix minor config DB logic issue (#2210) [vdahiya12]

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2022-06-16 15:59:55 -07:00
mssonicbld
1817c325d3
[ci/build]: Upgrade SONiC package versions (#11060)
Co-authored-by: mssonicbld <vsts@fv-az125-175.rkccfo2qup5e5ofdktzmdhpvwd.jx.internal.cloudapp.net>
2022-06-16 23:33:23 +08:00
zitingguo-ms
ae90bfae4b [AN/LT][Fix bug]:enable phy_an_lt_msft attribute on some platforms (#11147) 2022-06-16 02:13:22 +00:00
Jon Goldberg
3f12919dee [Nokia ixs7215] change var/log size to 4GB (#11122)
This makes use of #11121 to add support for configuration of VAR_LOG_SIZE on Nokia IXS7215
2022-06-16 02:12:59 +00:00
Jon Goldberg
b2685736e0 [installer]: fix armhf for installer.conf usage (#11121)
This fixes the build for armhf to be able to use '/device///installer.conf' files. Specifically, armhf needs support to be able to change the size of /var/log/ directory. It is hardcoded to 512 bytes on all armhf platforms currently. This change will allow any armhf platform to be able to use an installer.conf file to customize the installed image.
2022-06-16 02:12:59 +00:00
judyjoseph
8fc5c9b31f Cleanup macsec stateDB tables on restart (#11066)
Clean macsec tables in STATE_DB on start
2022-06-16 02:12:59 +00:00
StormLiangMS
a4c8290637
[202205] [submodule] Advanced sonic-swss (#11137)
submodule advance
Commit included:

54a9828 - (HEAD, public/202205) Combine PGs in buffermgrd (https://github.com/Azure/sonic-buildimage/pull/2281) (https://github.com/Azure/sonic-buildimage/pull/2329) (6 minutes ago)
2022-06-15 17:03:49 -07:00
Richard.Yu
3467f434e8 [Tunnel PFC][Fix bug] Fix bug and Tests for adding property 'sai_remap_prio_on_tnl_egress' (#11027)
* [Tunnel PFC] Tests for adding property 'sai_remap_prio_on_tnl_egress'

Add tests for adding property 'sai_remap_prio_on_tnl_egress', this
property should only be added in dual tor environment.

Test done:
Run test test_j2files.py

Co-authored-by: richardyu <richardyu@contoso.com>
2022-06-14 14:59:14 +00:00
Shilong Liu
933e0d11df
[build] Fix issue between reproducible build and dood. (#11084) 2022-06-13 11:15:00 +08:00
Saikrishna Arcot
921658c7a6 Add ping to swss-layer docker (#11093)
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2022-06-10 14:48:14 +00:00
Ying Xie
c7d8f51c68
[202205][linkmgrd][sairedis] advance submodule head (#11091)
linkmgrd:
* 2da783b 2022-06-07 | Check self's mux mode before switching peer to standby & add support for `detach` mode (#79) (HEAD -> 202205, github/202205) [Jing Zhang]

sairedis:
* 54642c7 2022-06-09 | [counter] Fix port flex counter  (#1052) (HEAD -> 202205, github/202205) [Junhua Zhai]
* b7f5f92 2022-06-06 | [ci] Paralize azure pipeline  (#1054) [Shilong Liu]

swss:
* 77043fb 2022-06-09 | [fpmsyncd] don't manipulate route weight (#2321) (HEAD -> 202205, github/202205) [Ying Xie]
* ae157f1 2022-06-10 | Fix test_warm_reboot issues blocking PR merge (#2309) (#2318) [Shilong Liu]

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2022-06-10 07:42:43 -07:00
Ying Xie
40a421913a [makefile] remove all fsroot folders (#11030)
Why I did it
Make reset didn't clean-up all fsroot folders.

How I did it
Remove all fsroot folders used during build.

How to verify it
Run local build and local make reset:

sudo mkdir fsroot-test
sudo touch fsroot-test/foo
make reset
(Without this change, make reset cannot remove fsroot-test; with the change, the repo becomes clean after make reset.)

Signed-off-by: Ying Xie ying.xie@microsoft.com
2022-06-09 16:52:49 +00:00
xumia
e853f8e7ff [Build]: Fix the version files for armhf/arm64 not used issue (#11021)
Why I did it
[Build]: Fix the version files in host-base-image for armhf/arm64 not used issue
2022-06-09 16:51:03 +00:00
Kebo Liu
7af4efacb7 [Mellanox] Update SN2201 sai profile and platform reboot script (#10978)
- Why I did it
1. SN2201 sai profile needs to be updated according to the latest hardware.
2. The reboot script needs to use the common symbolic link to the power_cycle sysfs node instead of accessing it directly, because the SN2201 sysfs layout differs from other platforms.
3. echo 1 > $SYSFS_PWR_CYCLE triggers the reboot immediately, so the following sleep 3 and echo 0 > $SYSFS_PWR_CYCLE are never executed and can be removed.

- How I did it
1. Replace the SN2201 sai profile with the latest one.
2. In the platform_reboot script, replace the direct sysfs path with the symbolic link path.
3. Remove the redundant code from platform_reboot

- How to verify it
Perform reboot on all the Nvidia platforms, and check all can be rebooted successfully.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2022-06-09 16:50:19 +00:00
Junchao-Mellanox
00d04dcb5f [Mellanox] optimize platform API import time (#10815)
- Why I did it
"import sonic_platform" takes about 600ms ~ 1000ms, which is slow. After this optimization, it takes about 100ms. CLIs that do not need the slow imports are therefore faster than before.

- How I did it
Find the slow imports and perform them only when needed.
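
The general technique is a deferred (lazy) import: move the import statement from module level into the function that needs it. The sketch below is illustrative only; `colorsys` is a stand-in for a slow platform module and the function name is hypothetical.

```python
import time


def hsv_to_rgb_lazy(h, s, v):
    """Import the helper module on first use instead of at module load time."""
    import colorsys  # stand-in for a slow import; only paid for by callers of this function
    return colorsys.hsv_to_rgb(h, s, v)


start = time.perf_counter()
result = hsv_to_rgb_lazy(0.0, 1.0, 1.0)
elapsed_ms = (time.perf_counter() - start) * 1000
print(result, "first call took %.3f ms" % elapsed_ms)
```

Code paths that never call the function never pay the import cost, and later calls reuse the module cached in `sys.modules`.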

- How to verify it
Measure the import time.
2022-06-09 16:50:12 +00:00
vdahiya12
d4c4993282
[202205][sonic-utilities] submodule update (#11065)
0fc6f47 (HEAD -> 202205, origin/202205) [config][muxcable] Add support for displaying soc_ipv4 and cable_type in config/show muxcable commands (#2189)

Signed-off-by: vaibhav-dahiya vdahiya@microsoft.com
2022-06-08 19:50:48 -07:00
Shilong Liu
edf5e445be
[build] Disable reproducible build in 202205. (#11071)
Why I did it
It seems that reproducible build and dood conflict.
Disable reproducible build for now and investigate the issue later.
2022-06-08 17:54:00 +08:00
mssonicbld
1c2e361080
[ci/build]: Upgrade SONiC package versions (#11048)
Upgrade SONiC Versions
Co-authored-by: mssonicbld <vsts@fv-az113-110.2axxbwkg0v3e1hk3nyhxwcxvsf.bx.internal.cloudapp.net>
2022-06-07 10:01:24 +08:00
Ying Xie
f6f0aaaad8
[202205][linkmgrd] advance submodule head (#11033)
linkmgrd:
* d27ca81 2022-06-05 |  Separate I2C mux state probing and gRPC forwarding state probing  (#86) (HEAD -> 202205) [Jing Zhang]
* 9d7d301 2022-06-01 | Revert "Update log level for mux probing and mux state chance (#23)" (#85) [Jing Zhang]
* 60d3d77 2022-06-05 | Fix peer mux wait back off factor (#84) [Longxiang Lyu]

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2022-06-05 08:34:33 -07:00
Ying Xie
dbb4a98046 [pr test] increase T1-lag PR test timeout to 5 hours (#11029)
Why I did it
Some PR tests are timing out on the T1-lag kvm test.

How I did it
Increase the timeout to 5 hours.

How to verify it
Test on this PR.

Signed-off-by: Ying Xie ying.xie@microsoft.com
2022-06-05 15:23:45 +00:00
Richard.Yu
af855033ec [Tunnel PFC] Add property for tunnel PFC (#10962)
* [Tunnel PFC] Add property for tunnel PFC

Replace the config.bcm file with a j2 template file
- Add the 'sai_remap_prio_on_tnl_egress=1' property when the device metadata localhost subtype is 'dualtor'
- Change sai.profile for the new config.bcm.j2
2022-06-05 15:21:24 +00:00
bingwang-ms
76502c821e Update qos template to support SYSTEM_DEFAULT table (#10936)
* Update qos template to support SYSTEM_DEFAULT table

Signed-off-by: bingwang <wang.bing@microsoft.com>
2022-06-05 15:21:10 +00:00
xumia
043656dfe8 Support symcrypt fips config for aboot/uboot (#10729)
Why I did it
Support symcrypt fips config for aboot/uboot
2022-06-05 15:20:20 +00:00
Ying Xie
ea3df2a21a
[platform build] fix platform ycabled build (#11020)
* remove python2 wheel for sonic-platform-common

Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
2022-06-04 09:43:05 -07:00
mssonicbld
aecbf4718f
[ci/build]: Upgrade SONiC package versions (#11013)
Co-authored-by: mssonicbld <vsts@fv-az95-899.pq21ngt4mckezax5v03dvw0kka.ex.internal.cloudapp.net>
2022-06-03 09:08:13 +08:00
Ying Xie
0514923ea1
[azure pipeline] enable PR test for 202205 branch (#11017)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2022-06-02 12:00:50 -07:00