Commit Graph

7298 Commits

Author SHA1 Message Date
mssonicbld
80c19c2874
[submodule] Update submodule sonic-sairedis to the latest HEAD automatically (#16291)
#### Why I did it
src/sonic-sairedis
```
* 2ebbd48 - (HEAD -> 202211, origin/202211) [syncd] Add pre match logic for acl entry (#1240) (11 hours ago) [Kamil Cudnik]
* 1db8726 - Use SAI_STATUS_ITEM_NOT_FOUND when key not found (#1224) (11 hours ago) [Lawrence Lee]
* 9e4071b - [CI]: Fix collect log error in azp template. (#1282) (4 days ago) [Nazarii Hnydyn]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-08-30 16:32:59 +08:00
mssonicbld
decbc0d39f
[submodule] Update submodule sonic-linux-kernel to the latest HEAD automatically (#16330)
#### Why I did it
src/sonic-linux-kernel
```
* 10d7946 - (HEAD -> 202211, origin/202211) PATCH] net: allow user to set metric on default route learned via Router Advertisement (#326) (8 hours ago) [abdosi]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-08-30 16:32:51 +08:00
Nazarii Hnydyn
99e1ce9987
[202211][PPI]: Enable global port late create for SPC-4. (#15801)
DEPENDS:

[202211][ppi]: Implement port bulk comparison logic (#2564)  sonic-swss#2821
HLD: sonic-net/SONiC#1084

Why I did it
Enabled port late create on SN5600, so that the switch boots up with no ports
Work item tracking
N/A
How I did it
Updated SAI xml config file
How to verify it
Run the sonic-mgmt fastboot tests
2023-08-30 16:05:58 +08:00
Kebo Liu
ba82b52a1a
[Mellanox] Update MFT to newer version 4.25.0-62 (#16149) (#16203)
- Why I did it
Update Mellanox MFT tool to version 4.25.0-62

- How I did it
Update the MFT tool make file

- How to verify it
Run full sonic-mgmt regression.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2023-08-30 16:04:07 +08:00
mssonicbld
207d0c38ca
[submodule] Update submodule sonic-platform-common to the latest HEAD automatically (#16332)
#### Why I did it
src/sonic-platform-common
```
* 05cf5c1 - (HEAD -> 202211, origin/202211) Change Y cable simulator log level from error to warning due to false alarm (11 hours ago) [ShiyanWangMS]
* 35ea290 - Update CMIS api's rendering max-duration (#375) (11 hours ago) [rajann]
* 33bd498 - Retrieve FW version using CDB command for CMIS transceivers + handle single bank FW versioning (#372) (11 hours ago) [mihirpat1]
* 2434362 - Render Media lane and Media assignment options info from Application Code (#368) (11 hours ago) [rajann]
* 862674b - Modify sfputil show fwversion to include build version for active/inactive FW version fields (#367) (11 hours ago) [mihirpat1]
* 8edfece - Adding electrical for 800G and 100G (#365) (11 hours ago) [mihirpat1]
* 8a1debf - SFF-8472: Fix tx_disable_channel to avoid write to read-only bit (#364) (11 hours ago) [mihirpat1]
* 223a231 - Update host electrical interface for 2x400G breakout cable (#363) (11 hours ago) [mihirpat1]
* baabd8f - fix get module hardware minor revision (#361) (11 hours ago) [Qingxiao Ren]
* 2ebabf5 - Prevent VDM dictionary related KeyError when a transceiver module is pulled while a bulk get method is interrogating said module (#360) (11 hours ago) [snider-nokia]
* 1498ed6 - [CMIS] Add API to get module power up duration (#354) (11 hours ago) [ChiouRung Haung]
* 1cae718 - Modify get_host_lane_assignment_option to return value based on application id (#352) (11 hours ago) [mihirpat1]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-08-30 14:33:05 +08:00
mssonicbld
4c82749c2c
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#16336) 2023-08-30 13:57:20 +08:00
mssonicbld
a25f722dea
[submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically (#16334) 2023-08-30 13:57:15 +08:00
Ye Jianquan
b2fbfbe7c3
[PRTest, 202211] Skip telemetry/test_telemetry.py::test_on_change_updates (#16314)
* [PRTest, 202211] Skip telemetry/test_telemetry.py::test_on_change_updates

* exclude test_warm_reboot
2023-08-30 10:43:57 +08:00
Junchao-Mellanox
f73d322081
Fix issue: watchdogutil command does not work (#16242)
Conflicts:
	platform/mellanox/mlnx-platform-api/sonic_platform/watchdog.py
	platform/mellanox/mlnx-platform-api/tests/test_watchdog.py
2023-08-28 23:58:15 +08:00
Vaibhav Hemant Dixit
e7ce179b73 Fix CONFIG_DB_INITIALIZED flag check logic and set/reset flag for warmboot (#15685)
* Fix CONFIG_DB_INITIALIZED flag check logic and set/reset flag for warm-reboot
* Fix db-cli usage
* Handle same image warm-reboot and generalize handling of INIT flag
* Cover boot from ONIE case: set config init flag when minigraph, config_db are missing
* Handle case: first boot of SONiC
* Check for config init flag
* Simplify logic, and do not call db_migrator for same image reboot
2023-08-25 02:32:24 +08:00
Vadym Hlushko
adb43ff1f4
[mlxtrace] Add mft-fwtrace-cfg.deb which contains fwtrace_cfg files for the mlxtrace utility (#15960)
Backport of #15961

Why I did it
Added the fwtrace config files so that the mlxtrace utility can be called during the show techsupport dump.

Work item tracking
Microsoft ADO (number only):
How I did it
Added fwtrace config files. Added path to these files to sai.profile for each mlnx device.

How to verify it
Execute the show techsupport command and check that the mlxtrace output is in the system dump.
2023-08-20 19:29:32 +08:00
Kebo Liu
4f403c9079
[202211] [Mellanox] Update SDK/FW/SAI to 4.6.1020/2012.1020/SAIBuild2211.25.1.0 (#16096) (#16095)
This is to backport #16096

Why I did it
SONiC changes:

Support Spectrum4 ASIC FW binary building.
Support new SDK sx-obj-desc lib building since new SAI need it.
Remove SX_SCEW debian package from Mellanox SDK build since we are no longer using it (we use libxml2 instead).
Update SAI, SDK, FW to version 4.6.1020/2012.1020/SAIBuild2211.25.1.0
SDK/FW bug fixes

In SPC-1 platforms: Fastboot mode is not operational for Split port with Force mode in 50G speed
SFP modules are kept in disabled state after set LPM (low power mode) on/off for at least 3 minutes.
When performing fast boot from an old SDK version (currently installed) to a newer one (target version), and the system was initially loaded with a new SDK version (past version), and the system has not been wiped, under specific conditions, the fast boot would use the past version's data and may fail.
SDK/FW Features

On SN2700 all ports can support y cable by credo
SAI bug Fixes

When creating an ACL rule with SAI_ACL_ENTRY_ATTR_FIELD_SRC_IP/SAI_ACL_ENTRY_ATTR_FIELD_DST_IP enabled, and then disabling the field by setting enable=false, a match on L3_type=IPv4 will remain programmed for the rule. Issue resolved after the fix.
Allow the max scale of virtual routers to be configured for SPC-1, SPC-2, SPC-3 when fastboot is enabled
Remove default hash key of SRC_MAC, DST_MAC and ETH_TYPE
SAI features

Port init profile
Dual ToR Active-Standby | Additional MAC support
Work item tracking
Microsoft ADO (number only):
How I did it
Update SDK/FW/SAI make files

How to verify it
Run full sonic-mgmt regression on Mellanox platform
2023-08-20 19:24:32 +08:00
mssonicbld
7109d7197a
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#16199)
#### Why I did it
src/sonic-utilities
```
* d69432d1 - (HEAD -> 202211, origin/202211) [202211][db_migrator] Add migration of FLEX_COUNTER_DELAY_STATUS during 1911->2211 upgrade + fast-reboot. Add UT. (#2838) (34 hours ago) [Vadym Hlushko]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-08-19 14:32:23 +08:00
mssonicbld
db27039aea
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#16181)
#### Why I did it
src/sonic-swss
```
* 859bd678 - (HEAD -> 202211, origin/202211) Fix error in peer response time when headroom is calculated for 800G (#2860) (2 days ago) [Stephen Sun]
* 5f294cf1 - [Dynamic Buffer][Mellanox] Skip PGs in pending deleting set while checking accumulative headroom of a port (#2871) (2 days ago) [Stephen Sun]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-08-18 14:32:35 +08:00
mssonicbld
651a13216f
[submodule] Update submodule sonic-platform-common to the latest HEAD automatically (#16180)
#### Why I did it
src/sonic-platform-common
```
* 2da4286 - (HEAD -> 202211, origin/202211) Add new SSD type support (#390) (14 hours ago) [Junchao-Mellanox]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-08-17 18:32:47 +08:00
mssonicbld
5ba7067c03
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#16183)
#### Why I did it
src/sonic-utilities
```
* 46b32daa - (HEAD -> 202211, origin/202211) [kdump] Fix API to read the current running image (#2217) (14 hours ago) [rajendra-dendukuri]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-08-17 16:32:51 +08:00
mssonicbld
0d2464d81e
Updated PG headroom settings for 40g port speed (#16038) (#16178) 2023-08-17 08:02:10 +08:00
xumia
cb408d3427 [Build] Fix some of the patches not applied issue (#15660)
Why I did it
Fix the issue where some of the patches in the .patches folder are not applied.
The command "quilt applied" only lists the patches that are already applied. If some of the patches failed to apply, they are not applied when you run the build command again.

Work item tracking
Microsoft ADO (number only): 24410730
How I did it
Run the command to apply the patches without any conditions.
If failed, check if the failure reason is "series fully applied".
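The two steps above can be sketched as follows. This is an illustration only, not the build system's actual code: it assumes the patches are applied with `quilt push -a`, and the helper names `patches_ok` and `apply_patches` are hypothetical.

```python
import subprocess

# Text that quilt prints when there is nothing left to apply.
FULLY_APPLIED = "series fully applied"


def patches_ok(returncode, output):
    """Return True if the push succeeded or every patch was already applied."""
    return returncode == 0 or FULLY_APPLIED in output


def apply_patches(workdir, run=subprocess.run):
    """Always try to apply the patches; tolerate only 'series fully applied'."""
    # Assumption: "quilt push -a" is the command used to apply the series.
    result = run(["quilt", "push", "-a"], cwd=workdir,
                 capture_output=True, text=True)
    output = (result.stdout or "") + (result.stderr or "")
    if not patches_ok(result.returncode, output):
        raise RuntimeError("quilt failed to apply patches:\n" + output)
```

Running the push unconditionally means a partially applied series is retried on the next build instead of being silently skipped.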
How to verify it
2023-08-16 14:33:20 +08:00
Longxiang Lyu
0112adef58 [YANG][vlan-sub-interface] Add vlan field (#15838)
* [YANG][vlan-sub-interface] Add `vlan` field

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>

* Fix typo

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>

* Fix UT

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>

---------

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
2023-08-16 14:33:15 +08:00
mssonicbld
fb86e65c5d
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#16060)
#### Why I did it
src/sonic-utilities
```
* ec37e5d4 - (HEAD -> 202211, origin/202211) [Techsupport] Update the message seen during the lock acquisition failure (#2897) (10 days ago) [Vivek]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-08-16 14:33:10 +08:00
Jing Zhang
25662c62b4 add service_mgmt (#15927)
Adding yang model for CONFIG_DB table MUX_LINKMGR|SERVICE_MGMT.

sign-off: Jing Zhang zhangjing@microsoft.com
2023-08-16 14:33:06 +08:00
mssonicbld
96148d575d
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#16104)
#### Why I did it
src/sonic-swss
```
* 63b08b59 - (HEAD -> 202211, origin/202211) [ASAN] Fix Indirect Mem Leaks in Orchagent (#2869) (2 days ago) [Vivek]
* 4248d01d - [muxorch] set mux state to init upon warm reboot (#2834) (5 days ago) [Nikola Dancejic]
* 3ca4b842 - Handle duplicate routes in a graceful manner (#2688) (5 days ago) [prabhataravind]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-08-16 14:33:01 +08:00
SuvarnaMeenakshi
9e43edf237 [SNMP][IPv6]: Fix to use link local IPv6 address as snmp agentAddress (#16013)
#### Why I did it
fixes: https://github.com/sonic-net/sonic-buildimage/issues/16001
Caused by: https://github.com/sonic-net/sonic-buildimage/pull/15487

The above PR introduced a change to use the Management and Loopback IPv4 and IPv6 addresses as the snmp agent address in the snmpd.conf file.
With this change, if a link-local IP address is configured as the management or Loopback IPv6 address, snmpd tries to open a socket on that IPv6 address and fails with the error below:
```
Error opening specified endpoint "udp6:[fe80::5054:ff:fe6f:16f0]:161"
Server Exiting with code 1
```
Per RFC 4007, to specify a non-global IPv6 address without ambiguity, the zone id must be given along with the IPv6 address: <address>%<zone_id>
Reference: https://datatracker.ietf.org/doc/html/rfc4007
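A minimal sketch of the `<address>%<zone_id>` rule, assuming the endpoint strings follow the `udp6:[...]:161` form seen in the error message above. The function name `agent_address` and the `udp:` prefix for IPv4 are assumptions made for illustration; this is not the template code from the PR.

```python
import ipaddress


def agent_address(ip, ifname, port=161):
    """Build an snmpd endpoint string, adding the zone id for link-local IPv6."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 4:
        return "udp:{}:{}".format(addr, port)
    if addr.is_link_local:
        # A link-local address is ambiguous without its zone (the interface).
        return "udp6:[{}%{}]:{}".format(addr, ifname, port)
    return "udp6:[{}]:{}".format(addr, port)


print(agent_address("fe80::5054:ff:fe6f:16f0", "eth0"))
```

Global addresses such as fc00:1::32 are left without a zone id, since they are unambiguous on their own.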

##### Work item tracking
- Microsoft ADO **(number only)**:

#### How I did it
Modify snmpd.conf file to use the %zone_id representation for ipv6 address.
#### How to verify it
In VS testbed, modify config_db to use link local ipv6 address as management address:
    "MGMT_INTERFACE": {
        "eth0|10.250.0.101/24": {
            "forced_mgmt_routes": [
                "172.17.0.1/24"
            ],
            "gwaddr": "10.250.0.1"
        },
        "eth0|fe80::5054:ff:fe6f:16f0/64": {
            "gwaddr": "fe80::1"
        }
    },

Execute config_reload after the above change.
After snmpd comes up, check that it is listening on the IPv4 and IPv6 addresses:
```
admin@vlab-01:~$ sudo netstat -tulnp | grep 161
tcp        0      0 127.0.0.1:3161          0.0.0.0:*               LISTEN      274060/snmpd        
udp        0      0 10.1.0.32:161           0.0.0.0:*                           274060/snmpd        
udp        0      0 10.250.0.101:161        0.0.0.0:*                           274060/snmpd        
udp6       0      0 fc00:1::32:161          :::*                                274060/snmpd        
udp6       0      0 fe80::5054:ff:fe6f::161 :::*                                274060/snmpd      -- Link local 
 
admin@vlab-01:~$ sudo ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.250.0.101  netmask 255.255.255.0  broadcast 10.250.0.255
        inet6 fe80::5054:ff:fe6f:16f0  prefixlen 64  scopeid 0x20<link>
        ether 52:54:00:6f:16:f0  txqueuelen 1000  (Ethernet)
        RX packets 36384  bytes 22878123 (21.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 261265  bytes 46585948 (44.4 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

admin@vlab-01:~$ docker exec -it snmp snmpget -v2c -c public fe80::5054:ff:fe6f:16f0 1.3.6.1.2.1.1.1.0
iso.3.6.1.2.1.1.1.0 = STRING: "SONiC Software Version: SONiC.master.327516-04a6031b2 - HwSku: Force10-S6000 - Distribution: Debian 11.7 - Kernel: 5.10.0-18-2-amd64"
```
Logs from snmpd:
```
Turning on AgentX master support.
NET-SNMP version 5.9
Connection from UDP/IPv6: [fe80::5054:ff:fe6f:16f0%eth0]:44308
```
Ran the test_snmp_loopback test to check that loopback IPv4 and IPv6 work:
```
./run_tests.sh -n vms-kvm-t0 -d vlab-01 -c snmp/test_snmp_loopback.py  -f vtestbed.yaml -i ../ansible/veos_vtb -e "--skip_sanity --disable_loganalyzer" -u
=== Running tests in groups ===
Running: pytest snmp/test_snmp_loopback.py --inventory ../ansible/veos_vtb --host-pattern vlab-01 --testbed vms-kvm-t0 --testbed_file vtestbed.yaml --log-cli-level warning --log-file-level debug --kube_master unset --showlocals --assert plain --show-capture no -rav --allow_recover --ignore=ptftests --ignore=acstests --ignore=saitests --ignore=scripts --ignore=k8s --ignore=sai_qualify --junit-xml=logs/tr.xml --log-file=logs/test.log --skip_sanity --disable_loganalyzer
..                                                                        

snmp/test_snmp_loopback.py::test_snmp_loopback[vlab-01] PASSED 
```

#### Which release branch to backport (provide reason below if selected)

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [x] 202012
- [x] 202106
- [x] 202111
- [x] 202205
- [x] 202211
- [x] 202305

2023-08-16 14:32:52 +08:00
mssonicbld
822cec71de
[submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically (#16154)
#### Why I did it
src/sonic-platform-daemons
```
* 8ea4de3 - (HEAD -> 202211, origin/202211) [PSU power threshold] Fix logic error: compare the system power with the PSU's power threshold (#367) (2 days ago) [Stephen Sun]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-08-16 14:32:41 +08:00
mssonicbld
cd6636d4d2
[Mellanox] Use Debian reboot in Nvidia platform reboot when it is invoked from kdump capture boot (#15701) (#16050) 2023-08-15 23:51:54 +08:00
mssonicbld
fb8f6265c0
Fix issue: set delayed attribute to true for platform monitor service (#15816) (#16042) 2023-08-15 23:26:26 +08:00
mssonicbld
d22c610874
Update the description message of PSU power threshold checking in system health (#15289) (#16131) 2023-08-15 20:09:50 +08:00
Zain Budhwani
c3b25660ed
Fix 202211 test_events to unblock PR checker (#16151)
* [yang] Change swss-event, dhcp-relay-event leafref to string (#13326)

Why I did it
Do not require leafref as part of yang. Only need string to compare whether string received from event matches what is possible for ifname.

How I did it
How to verify it
Run UT

* Add fix to monit_regex.json for catching mem_usage and cpu_usage (#14954)

Why I did it
The current regex is not able to capture the logs; modify the regex to capture the syslog messages

Work item tracking
Microsoft ADO (number only): 13366345
How I did it
Code change

How to verify it
sonic-mgmt test case

* Update usage leaf in sonic-events-host yang models (#15805)

#### Why I did it

The event yang models currently use int as the type of the usage leaf; it needs to be of type decimal64

##### Work item tracking
- Microsoft ADO **(number only)**:17747466

#### How I did it

Update yang models and UT

#### How to verify it

UT
2023-08-15 14:16:06 +08:00
mssonicbld
bc1688c4b3
[YANG] add yang model for MUX_LINKMGR|MUXLOGGER (#15884) (#16022)
Add yang model for MUX_LINKMGR|MUXLOGGER.

Co-authored-by: Jing Zhang <zhangjing@microsoft.com>
2023-08-07 09:55:22 -07:00
mssonicbld
1e19747091
[syncd.sh] Clear semaphore before updating firmware (#15818) (#16066) 2023-08-07 14:48:05 +08:00
Vadym Hlushko
73999dddca
[syncd.sh] Clear semaphore before updating firmware (#15819)
Signed-off-by: Vadym Hlushko <vadymh@nvidia.com>
2023-08-06 22:31:05 -07:00
mssonicbld
cd06ae9599
[E1031] fix pca9548 initializes failed occasionally (#15712) (#16053) 2023-08-06 23:52:44 +08:00
mssonicbld
4217e52dd3
[Mellanox] Remove unnecessary file manipulation in the SAI Make file (#15993) (#16044) 2023-08-06 14:12:06 +08:00
mssonicbld
946d276f96
[Mellanox] Remove reset_from_comex from reboot cause mapping (#15793) (#16039) 2023-08-06 14:03:59 +08:00
lerry-lee
86c1bf5c15
[CI/CD] Use remote PR test template from sonic-mgmt master to run PR test (#15979)
Why I did it
Use remote PR test template from sonic-mgmt master to run PR test.

How I did it
Modify PR test azure pipeline yml file.

How to verify it
PR test executing normally.

Signed-off-by: Chun'ang Li <chunangli@microsoft.com>
2023-08-01 16:20:18 +08:00
mssonicbld
0bcbd81f67
[Build] update python package docker in host image to 6.1.1 (#14993) (#15989) 2023-07-28 19:52:32 +08:00
mssonicbld
f0a2894fed
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#15983)
#### Why I did it
src/sonic-utilities
```
* 9e8df7e5 - (HEAD -> 202211, origin/202211) [build] Fix dependency issue between responses and urllib3 package. (#2928) (#2929) (20 hours ago) [Liu Shilong]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-07-28 16:32:39 +08:00
mssonicbld
1cc41c67bc
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#15943)
#### Why I did it
src/sonic-swss
```
* 6ec611dc - (HEAD -> 202211, origin/202211) [202211][ppi]: Implement port bulk comparison logic (#2564) (#2821) (19 hours ago) [Nazarii Hnydyn]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-07-24 14:35:15 +08:00
mssonicbld
b04c39f656
[Mellanox] Add support for BIOS update on Spectrum-4 (#15795) (#15941) 2023-07-23 22:42:39 +08:00
Kebo Liu
19fd6d5b8d
[202211] [Mellanox] Update SAI build procedure (#15728) (#15742)
Backport #15728

Why I did it
To optimize Mellanox platform SAI build

Work item tracking
Microsoft ADO (number only):
How I did it
SAI debs are now downloaded as Spectrum-SDK-Drivers-SONiC-Bins release.

How to verify it
Configure/build for Mellanox platform, check the image and ensure that correct SAI debs are included.
2023-07-23 21:12:44 +08:00
Masaru OKI
89e2cddb67 Pick dependency files in submodules. (#15142)
#### Why I did it

Failed to build sonic-dhcp6relay_1.0.0-0_amd64.deb

#### How I did it

src/dhcprelay contains a git submodule.
The dependency file list produced by "git ls-files" does not include files inside submodules.
Adding --recurse-submodules makes it work again.
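
As a rough illustration of the change, the dependency list can be collected as below. The function name `dependency_files` is hypothetical and the real build does this in its own scripts; only the `--recurse-submodules` flag of `git ls-files` is taken from the description above.

```python
import subprocess


def dependency_files(repo_dir, run=subprocess.run):
    """List tracked files, including those that live inside git submodules."""
    result = run(["git", "ls-files", "--recurse-submodules"],
                 cwd=repo_dir, capture_output=True, text=True, check=True)
    # One path per line; drop any empty trailing entry.
    return [line for line in result.stdout.splitlines() if line]
```

Without the flag, a submodule shows up only as a single entry for its directory, so changes to files inside it are not tracked as dependencies.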

#### How to verify it

make all
2023-07-22 14:32:57 +08:00
lixiaoyuner
27563c8fd2 [k8s]: Bypass the systemd service restart limit and do immediately restart when change to local mode (#15432)
Why I did it
During the upgrade process via k8s, the feature's systemd service restarts as well. Every feature's systemd service has a restart limit, and the limit is small: only three restarts. If a fallback happens during the upgrade, the start count reaches 2, and one more restart brings the systemd service down, so the limit needs to be bypassed. The restart function is called on local -> kube, kube -> kube, and kube -> local transitions, and each call must restart the service successfully, so we do reset-failed every time we restart.
When going back to local mode, we do the systemd restart immediately, without waiting for the default restart interval, to reduce container downtime.

Work item tracking
Microsoft ADO (number only):
24172368

How I did it
Before every restart for an upgrade, reset the feature's restart count. The count is reset to 0 to bypass the restart limit.
When going back to local mode, do the systemd restart immediately.
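
The reset-then-restart sequence can be sketched as follows. The helper names are hypothetical and the unit name is only an example; `systemctl reset-failed` and `systemctl restart` are the standard systemd commands the description refers to.

```python
import subprocess


def restart_commands(unit):
    """Commands to clear a unit's failure/rate-limit state and then restart it."""
    return [
        ["systemctl", "reset-failed", unit],
        ["systemctl", "restart", unit],
    ]


def restart_feature(unit, run=subprocess.run):
    """Run the commands in order, stopping at the first failure."""
    for cmd in restart_commands(unit):
        run(cmd, check=True)
```

For example, `restart_feature("snmp.service")` would clear the counter for that unit before restarting it, so earlier restarts during the upgrade do not count against the limit.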

How to verify it
The feature's systemd service can always be restarted successfully during the upgrade process via k8s.
2023-07-20 18:34:21 +08:00
Saikrishna Arcot
e76b4a0cc7 Upgrade scapy in the PTF's python3 virtualenv to 2.5.0 (#15573)
This is primarily to fix a bug in scapy hitting an error when trying to
listen on multiple interfaces in a single `sniff` call. This also
upgrades it to the current latest version.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2023-07-20 18:34:16 +08:00
mssonicbld
0e1a2a6101
[submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically (#15915)
#### Why I did it
src/sonic-platform-daemons
```
* ea8e5c7 - (HEAD -> 202211, origin/202211) [202211] chassisd: Fix crash on exit on linecard (#352) (6 hours ago) [Patrick MacArthur]
```
#### How I did it
#### How to verify it
#### Description for the changelog
2023-07-20 18:34:05 +08:00
xumia
7862722228 [Build] Fix the PyYang python package installation issue (#15890)
Why I did it
Fix the armhf build failure.
How to reproduce the issue:

docker run -it debian:bullseye bash
apt-get update && apt-get install -y python3-pip
pip3 install PyYAML==5.4.1
Error message:

Collecting PyYAML==5.4.1
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  ERROR: Command errored out with exit status 1:
   command: /usr/bin/python3 /tmp/tmp6xabslgb_in_process.py get_requires_for_build_wheel /tmp/tmp_er01ztl
....
      raise AttributeError(attr)
  AttributeError: cython_sources
  ----------------------------------------
WARNING: Discarding d63f2d7597/PyYAML-5.4.1.tar.gz (sha256)=607774cbba28732bfa802b54baa7484215f530991055bb562efbed5b2f20a45e (from https://pypi.org/simple/pyyaml/) (requires-python:>=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, !=3.5.*). Command errored out with exit status 1: /usr/bin/python3 /tmp/tmp6xabslgb_in_process.py get_requires_for_build_wheel /tmp/tmp_er01ztl Check the logs for full command output.
ERROR: Could not find a version that satisfies the requirement PyYAML==5.4.1
ERROR: No matching distribution found for PyYAML==5.4.1
root@fa2fa92edcfd:/# 
Adding the option --no-build-isolation resolves the error, see the fix:

install "PyYAML==5.4.1" --no-build-isolation
The same error occurs in multiple builds.

Work item tracking
Microsoft ADO (number only): 24567457

How I did it
Add a build option --no-build-isolation.
2023-07-20 04:32:58 +08:00
lixiaoyuner
cc818fcf09
[ctrmgr]: Container image clean up bug fix (#15772) (#15841)
Why I did it
The current code for cleaning up container images has two bugs that need to be fixed. In addition, some variable names are confusing, so they are renamed.

Work item tracking
Microsoft ADO (number only): 24502294

How I did it
We clean up after tag latest succeeds. Currently the tag latest function only returns 0 or 1: 0 means success and 1 means failure. On 1 we retry; on 0 we clean up. However, 0 also covers a case where no clean-up is needed: when we tag latest, the container image we want to tag may not be running, so we cannot tag latest and do not need to clean up. This case is now separated from 0 and returns -1.
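
The three return codes map to three follow-up actions, sketched below. The names `next_action`, `RETRY`, `CLEAN_UP`, and `SKIP` are hypothetical and do not come from the ctrmgr code; only the meaning of 0, 1, and -1 is taken from the description above.

```python
RETRY, CLEAN_UP, SKIP = "retry", "clean_up", "skip"


def next_action(tag_latest_rc):
    """Map the tag-latest return code to the follow-up step."""
    if tag_latest_rc == 0:
        return CLEAN_UP   # tagged successfully: old images can be removed
    if tag_latest_rc == 1:
        return RETRY      # tagging failed: try again later
    if tag_latest_rc == -1:
        return SKIP       # image to tag is not running: nothing to clean up
    raise ValueError("unexpected return code: {}".format(tag_latest_rc))
```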

When local mode (v1) -> kube mode (v2) happens, the local image must be handled, and there are two cases. In the first case, a kube v1 container was dry-run (because we don't replace the local container if kube version = local version): we remove the kube v1 image, tag the local version with the ACR prefix, and remove the local v1 tag. In the second case, no kube v1 container was dry-run: we remove the local v1 image directly, because the local v1 image should not be the last desired version.

The docker_id variable is confusing because it actually holds the docker image id, so it is renamed. The two dicts and the list are also renamed to be more readable.

How to verify it
Check tag latest and image clean up result.
2023-07-17 14:20:48 -07:00
mssonicbld
c2b7ed37d2
[ctgmgr]: do not remove label when do systemd service stop when service is in kube mode (#15642) (#15846) 2023-07-15 04:44:25 +08:00
mssonicbld
ffa84ce84f
Potential fix for Celestica E1031 device hang (#15822) (#15843) 2023-07-15 03:13:10 +08:00
lixiaoyuner
5a836b24f3 Add health check probe for k8s upgrade containers. (#15223)
#### Why I did it
After k8s upgrades a container, k8s only knows that the container is running; it does not know the status of the service inside the container. So we need a probe inside the container that k8s calls to check whether the container is really ready.
##### Work item tracking
- Microsoft ADO **(number only)**: 22453004
#### How I did it
Add a health check probe inside the config engine container. If the start service exists, the probe checks whether it exited normally; if the Python script is present, the probe calls it to perform container-specific checks. The Python script should be implemented by the feature owner if it is needed.

more details: [design doc](https://github.com/sonic-net/SONiC/blob/master/doc/kubernetes/health-check.md)
#### How to verify it
Check path /usr/bin/readiness_probe.sh inside container.

#### Which release branch to backport (provide reason below if selected)

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [x] 202205
- [x] 202211

#### Tested branch (Please provide the tested image version)
- [x] 20220531.28
2023-07-14 20:55:07 +08:00
SuvarnaMeenakshi
6d491fac2d [SNMP][IPv6]: Fix SNMP IPv6 reachability issue in certain scenarios (#15487)
Modify snmpd.conf to start snmpd to listen on specific management and loopback ips instead of listening on any ip.

#### Why I did it
SNMP over IPv6 does not work in all scenarios on single-asic platforms.
The expectation is that SNMP query over IPv6 should work over Management or Loopback0 addresses.
**Specific scenario where this issue is seen**
On a lab T0 device, when an SNMP request was sent from a directly connected T1 neighbor over the Loopback IP, the SNMP response was not received.
This was because the SRC IP address in SNMP response was not Loopback IP, it was the PortChannel IP connected to the neighboring device.
```
23:18:51.620897  In 22:26:27:e6:e0:07 ethertype IPv6 (0x86dd), length 105: fc00::72.41725 > **fc00:1::32**.161:  C="msft" **GetRequest**(28)  .1.3.6.1.2.1.1.1.0
23:18:51.621441 Out 28:99:3a:a0:97:30 ethertype IPv6 (0x86dd), length 241: **fc00::71**.161 > fc00::72.41725:  C="msft" **GetResponse**(162)  .1.3.6.1.2.1.1.1.0="SONiC Software Version: SONiC.xxx - HwSku: xx - Distribution: Debian 10.13 - Kernel: 4.19.0-12-2-amd64"
```
In case of IPv4, the SRC IP in SNMP response was correctly set to Loopback IP.
```
23:25:32.769712  In 22:26:27:e6:e0:07 ethertype IPv4 (0x0800), length 85: 10.0.0.57.56701 > **10.1.0.32**.161:  C="msft" **GetRequest**(28)  .1.3.6.1.2.1.1.1.0
23:25:32.975967 Out 28:99:3a:a0:97:30 ethertype IPv4 (0x0800), length 221: **10.1.0.32**.161 > 10.0.0.57.56701:  C="msft" **GetResponse**(162)  .1.3.6.1.2.1.1.1.0="SONiC Software Version: SONiC.xxx - HwSku: xx - Distribution: Debian 10.13 - Kernel: 4.19.0-12-2-amd64"
```

**Sequence of SNMP request and response**
1. The SNMP request is sent with SRC IP fc00::72 and DST IP fc00:1::32.
2. The SNMP request is received at the SONiC device and passed to snmpd, which is listening on port 161 (:::161).
3. The snmpd process parses the request, creates a response, and addresses it to DST IP fc00::72.
The snmpd process does not track the DST IP on which the SNMP request was received, which in this case is the Loopback IP.
The snmpd process only keeps track of the IP to which the response should be sent.
4. The snmpd process sends the response packet.
5. The kernel does a route lookup on the destination IP and finds the best path.
ip -6 route get fc00::72
fc00::72 from :: dev PortChannel101 proto kernel src fc00::71 metric 256 pref medium
6. Using the "src" IP from above, the response is sent out. This SRC IP is that of the PortChannel and not the device Loopback IP.

The same issue is seen when SNMP query is sent from a remote server over Management IP.
SONiC device eth0 --------- Remote server
SNMP request comes with SRC IP <Remote_server> DST IP <Mgmt IP>
If kernel finds best route to Remote_server_IP is via BGP neighbors, then it will send the response via front-panel interface with SRC IP as Loopback IP instead of Management IP.

The main issue is that, in case of IPv6, snmpd ignores the IP address to which the SNMP request was sent.
In case of IPv4, snmpd keeps track of the DST IP of the SNMP request, that is, whether the request was sent to the mgmt IP or the Loopback IP.
Later, this IP is used in ipi_spec_dst as the SRC IP, which helps the kernel find the route to the DST IP using the right SRC IP.
https://github.com/net-snmp/net-snmp/blob/master/snmplib/transports/snmpUDPBaseDomain.c#L300 
ipi.ipi_spec_dst.s_addr = srcip->s_addr
Reference: https://man7.org/linux/man-pages/man7/ip.7.html
```
If IP_PKTINFO is passed to sendmsg(2)
              and ipi_spec_dst is not zero, then it is used as the local
              source address for the routing table lookup and for
              setting up IP source route options.  When ipi_ifindex is
              not zero, the primary local address of the interface
              specified by the index overwrites ipi_spec_dst for the
              routing table lookup.
```

**Why is this issue not seen on multi-asic platforms?**
On a multi-asic platform, there are multiple network namespaces.
The SNMP docker with the snmpd process runs in the host namespace.
The management interface belongs to the host namespace.
Loopback0 is configured in the asic namespaces.
Additional information on how a packet arriving on the Loopback IP reaches the snmpd process running in the host namespace: https://github.com/sonic-net/sonic-buildimage/pull/5420
Because of this separation of network namespaces, the route lookup for the destination IP is confined to the routing table of the namespace where the packet was received.
If a packet is received over the management interface, the SNMP response is also sent out of the management interface. The same goes for a packet received over the Loopback IP.

##### Work item tracking
- Microsoft ADO **17537063**:

#### How I did it
On single-asic platforms, have snmpd listen on the specific Management and Loopback IPs instead of on any IP.

Before fix
```
admin@xx:~$ sudo netstat -tulnp | grep 161   
udp        0      0 0.0.0.0:161             0.0.0.0:*                           15631/snmpd         
udp6       0      0 :::161                  :::*                                15631/snmpd  
```
After fix
```
admin@device:~$ sudo netstat -tulnp | grep 161
udp        0      0 10.1.0.32:161           0.0.0.0:*                           215899/snmpd        
udp        0      0 10.3.1.1:161             0.0.0.0:*                           215899/snmpd        
udp6       0      0 fc00:1::32:161          :::*                                215899/snmpd        
udp6       0      0 fc00:2::32:161          :::*                                215899/snmpd  
``` 
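
One way to express "listen on specific IPs" is a net-snmp `agentAddress` directive that lists each address explicitly. The helper below is a hypothetical sketch that builds such a line from a list of Management and Loopback IPs; it is not the template used by this change, and the function name and default port argument are assumptions.

```python
import ipaddress


def agent_address_directive(listen_ips, port=161):
    """Build a net-snmp `agentAddress` line for specific listen addresses."""
    specs = []
    for ip in listen_ips:
        addr = ipaddress.ip_address(ip)
        if addr.version == 6:
            # IPv6 literals are bracketed so the port separator is unambiguous.
            specs.append(f"udp6:[{addr}]:{port}")
        else:
            specs.append(f"udp:{addr}:{port}")
    if not specs:
        raise ValueError("no listen addresses given")
    return "agentAddress " + ",".join(specs)


# Addresses taken from the "After fix" netstat output above.
print(agent_address_directive(["10.1.0.32", "10.3.1.1", "fc00:1::32", "fc00:2::32"]))
```

For the four addresses above this yields one directive with two `udp:` and two `udp6:` entries, matching the four sockets shown by netstat.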

**How does this change help with the issue?**
To see snmpd trace logs, change the snmpd start command in the supervisord.conf file to use the parameters below:
```
/usr/sbin/snmpd -f -LS0-7i -Lf /var/log/snmpd.log
```
When snmpd listens on any IP, snmpd binds to IPv4 and IPv6 sockets as below:
```
netsnmp_udpbase: binding socket: 7 to UDP: [0.0.0.0]:0->[0.0.0.0]:161
trace: netsnmp_udp6_transport_bind(): transports/snmpUDPIPv6Domain.c, 303:
netsnmp_udpbase: binding socket: 8 to UDP/IPv6: [::]:161
```

When an IPv4 response is sent, it goes out of fd 7, and an IPv6 response goes out of fd 8.
Since the IPv6 socket is bound to [::] and snmpd does not set the source address for IPv6, the IPv6 response does not carry the right SRC IP, which can lead to the issue described.

When snmpd listens on specific Loopback/Management IPs, snmpd binds to different sockets:
```
trace: netsnmp_udpipv4base_transport_bind(): transports/snmpUDPIPv4BaseDomain.c, 207:
netsnmp_udpbase: binding socket: 7 to UDP: [0.0.0.0]:0->[10.250.0.101]:161
trace: netsnmp_udpipv4base_transport_bind(): transports/snmpUDPIPv4BaseDomain.c, 207:
netsnmp_udpbase: binding socket: 8 to UDP: [0.0.0.0]:0->[10.1.0.32]:161
trace: netsnmp_register_agent_nsap(): snmp_agent.c, 1261:
netsnmp_register_agent_nsap: fd 8
netsnmp_udpbase: binding socket: 10 to UDP/IPv6: [fc00:1::32]:161
trace: netsnmp_register_agent_nsap(): snmp_agent.c, 1261:
netsnmp_register_agent_nsap: fd 10
netsnmp_ipv6: fmtaddr: t = (nil), data = 0x7fffed4c85d0, len = 28
trace: netsnmp_udp6_transport_bind(): transports/snmpUDPIPv6Domain.c, 303:
netsnmp_udpbase: binding socket: 9 to UDP/IPv6: [fc00:2::32]:161
```
When an SNMP request comes in via the Loopback IPv4 address, the SNMP response is sent out of fd 8:
```
trace: netsnmp_udpbase_send(): transports/snmpUDPBaseDomain.c, 511:
netsnmp_udp: send 170 bytes from 0x5581f2fbe30a to UDP: [10.0.0.33]:46089->[10.1.0.32]:161 on fd 8
```
When an SNMP request comes in via the Loopback IPv6 address, the SNMP response is sent out of fd 10:
```
netsnmp_ipv6: fmtaddr: t = (nil), data = 0x5581f2fc2ff0, len = 28
trace: netsnmp_udp6_send(): transports/snmpUDPIPv6Domain.c, 164:
netsnmp_udp6: send 170 bytes from 0x5581f2fbe30a to UDP/IPv6: [fc00::42]:43750 on fd 10
```
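
To follow which fd serves which listen address in traces like the ones above, the bind lines can be parsed into a small map. This is a hedged sketch: the regular expression is written against the trace lines quoted in this description only, and `bound_sockets` is a hypothetical helper, not part of net-snmp.

```python
import re

# Matches both "UDP: [0.0.0.0]:0->[10.1.0.32]:161" and "UDP/IPv6: [fc00:1::32]:161".
BIND_RE = re.compile(
    r"binding socket: (?P<fd>\d+) to UDP(?:/IPv6)?: "
    r"(?:\[[^\]]*\]:\d+->)?\[(?P<addr>[^\]]+)\]:(?P<port>\d+)"
)


def bound_sockets(trace_text):
    """Map socket fd -> (listen address, port) from snmpd bind trace lines."""
    return {
        int(m.group("fd")): (m.group("addr"), int(m.group("port")))
        for m in BIND_RE.finditer(trace_text)
    }


# Bind lines copied from the trace output above.
TRACE = """\
netsnmp_udpbase: binding socket: 7 to UDP: [0.0.0.0]:0->[10.250.0.101]:161
netsnmp_udpbase: binding socket: 8 to UDP: [0.0.0.0]:0->[10.1.0.32]:161
netsnmp_udpbase: binding socket: 10 to UDP/IPv6: [fc00:1::32]:161
netsnmp_udpbase: binding socket: 9 to UDP/IPv6: [fc00:2::32]:161
"""

for fd, (addr, port) in sorted(bound_sockets(TRACE).items()):
    print(fd, addr, port)
```

A response logged as sent "on fd 10" can then be tied back to the Loopback IPv6 address `fc00:1::32` that the socket was bound to.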

#### How to verify it
Verified on single-asic and multi-asic devices.
Single-asic SNMP query using the Loopback IP:
```
ARISTA01T1#bash snmpget -v2c -c xxx 10.1.0.32 1.3.6.1.2.1.1.1.0
SNMPv2-MIB::sysDescr.0 = STRING: SONiC Software Version: SONiC.xx - HwSku: Arista-7260xx - Distribution: Debian 10.13 - Kernel: 4.19.0-12-2-amd64
ARISTA01T1#bash snmpget -v2c -c xxx fc00:1::32 1.3.6.1.2.1.1.1.0
SNMPv2-MIB::sysDescr.0 = STRING: SONiC Software Version: SONiC.xx - HwSku: Arista-7260xxx - Distribution: Debian 10.13 - Kernel: 4.19.0-12-2-amd64
```

On multi-asic there is no change; snmpd still listens on any IP:
```
sudo netstat -tulnp | grep 161
udp        0      0 0.0.0.0:161             0.0.0.0:*                           17978/snmpd         
udp6       0      0 :::161                  :::*                                17978/snmpd 
```
Query result using Loopback IP from a directly connected BGP neighbor
```
ARISTA01T2#bash snmpget -v2c -c xxx 10.1.0.32 1.3.6.1.2.1.1.1.0
SNMPv2-MIB::sysDescr.0 = STRING: SONiC Software Version: SONiC.xx - HwSku: xx - Distribution: Debian 9.13 - Kernel: 4.9.0-14-2-amd64
ARISTA01T2#bash snmpget -v2c -c xxx fc00:1::32 1.3.6.1.2.1.1.1.0
SNMPv2-MIB::sysDescr.0 = STRING: SONiC Software Version: SONiC.xx - HwSku: xx - Distribution: Debian 9.13 - Kernel: 4.9.0-14-2-amd64  
```
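
The netstat checks above can also be scripted. The following sketch parses `netstat -tulnp` output and reports whether snmpd is bound to a wildcard address; the function names are hypothetical, and the parsing assumes the column layout shown in the outputs in this description.

```python
def snmpd_listeners(netstat_output, port=161):
    """Return local addresses snmpd is bound to, from `netstat -tulnp` output."""
    addrs = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expect: proto, recv-q, send-q, local, foreign, ..., PID/program.
        if len(fields) < 5 or not fields[-1].endswith("/snmpd"):
            continue
        host, _, local_port = fields[3].rpartition(":")
        if local_port == str(port):
            addrs.append(host)
    return addrs


def listens_on_wildcard(netstat_output):
    """True if snmpd is bound to 0.0.0.0 or :: on the SNMP port."""
    return any(a in ("0.0.0.0", "::") for a in snmpd_listeners(netstat_output))


# Sample lines copied from the outputs in this description.
BEFORE = """\
udp        0      0 0.0.0.0:161             0.0.0.0:*                           15631/snmpd
udp6       0      0 :::161                  :::*                                15631/snmpd
"""
AFTER = """\
udp        0      0 10.1.0.32:161           0.0.0.0:*                           215899/snmpd
udp6       0      0 fc00:1::32:161          :::*                                215899/snmpd
"""

print(listens_on_wildcard(BEFORE), listens_on_wildcard(AFTER))
```

On a single-asic device with the fix, the check is expected to report no wildcard listener; on multi-asic the wildcard binding remains, as shown above.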
2023-07-14 04:32:39 +08:00