Commit Graph

7370 Commits

Author SHA1 Message Date
Zhaohui Sun
a098b92591
[202205]Change orchagent pop batch size from 8192 to 1024 (#16126)
### Why I did it
Background running lua script may cause redis-server quite busy if batch size is 8192.
If handling time exceeded default 5s, the redis-server will not response to other process and will cause syncd crash.

```
Aug  9 07:46:29.512326 str-s6100-acs-5 INFO database#supervisord: redis 68:M 09 Aug 2023 07:46:29.511 # Lua slow script detected: still in execution after 5186 milliseconds. You can try killing the script using the SCRIPT KILL command. Script SHA1 is: 88270a7c5c90583e56425aca8af8a4b8c39fe757
Aug  9 07:46:29.523716 str-s6100-acs-5 ERR syncd#syncd: :- checkReplyType: Expected to get redis type 5 got type 6, err: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.
Aug  9 07:46:29.524818 str-s6100-acs-5 INFO syncd#supervisord: syncd terminate called after throwing an instance of '
Aug  9 07:46:29.525268 str-s6100-acs-5 ERR pmon#CCmisApi: :- checkReplyType: Expected to get redis type 5 got type 6, err: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.
Aug  9 07:46:29.526148 str-s6100-acs-5 INFO syncd#supervisord: syncd std::system_error'
Aug  9 07:46:29.528308 str-s6100-acs-5 ERR pmon#psud[32]: :- checkReplyType: Expected to get redis type 5 got type 6, err: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.
Aug  9 07:46:29.529048 str-s6100-acs-5 ERR lldp#python3: :- guard: RedisReply catches system_error: command: *2#015#012$3#015#012DEL#015#012$27#015#012LLDP_ENTRY_TABLE:Ethernet37#015#012, reason: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.: Input/output error
Aug  9 07:46:29.529720 str-s6100-acs-5 ERR snmp#python3: :- guard: RedisReply catches system_error: command: *2#015#012$7#015#012HGETALL#015#012$28#015#012COUNTERS:oid:0x100000000000a#015#012, reason: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.: Input/output error
```

88270a7c5c90583e56425aca8af8a4b8c39fe757 is /usr/share/swss/consumer_state_table_pops.lua

##### Work item tracking
- Microsoft ADO **24741990**:

#### How I did it
Change batch size from 8192 to 1024.

#### How to verify it
Run all test cases in sonic-mgmt to verify the system stability.

### Tested branch (Please provide the tested image version)

- [x] 20220531.36
2023-08-14 17:54:39 -07:00
mssonicbld
a7193556aa
[submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically (#16108)
src/sonic-platform-daemons

* f5a0ffc - (HEAD -> 202205, origin/202205) Update active application selected code in transceiver_info table aft… (#381) (4 hours ago) [Michael Wang - TW]
2023-08-11 13:24:51 -07:00
mssonicbld
c34e303c6f
Update the iSMART_64 tool (#15936) (#16103)
Why I did it
Updating the iSMART_64 tool for supporting latest debian releases.

How I did it
On branch new_ismart
Changes to be committed:
(use "git restore --staged ..." to unstage)
modified: platform/broadcom/sonic-platform-modules-dell/s6100/scripts/iSMART_64

How to verify it
In s6100, run the iSMART_64 tool.
md5sum - 24725730d7649769c7ba50971c1f2955

Co-authored-by: Santhosh Kumar T <53558409+santhosh-kt@users.noreply.github.com>
2023-08-11 11:08:34 -07:00
mssonicbld
6cc1846562
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#16110)
src/sonic-utilities

* 0f001c56 - (HEAD -> 202205, origin/202205) UT change: for db_migrator test do not check for RESTAPI cert values (#2919) (4 hours ago) [Vaibhav Hemant Dixit]
* 69d348d1 - [CLI][Show][BGP] Show BGP Change for no neighbor scenario (#2885) (6 hours ago) [Dev Ojha]
* 4c6af3c3 - [multi-asic] Refine [override config table] for corner cases (#2918) (6 hours ago) [wenyiz2021]
* bef3ffeb - [db_migrator] Set docker_routing_config_mode to the value obtained from minigraph parser (#2890) (#2922) (7 hours ago) [Vaibhav Hemant Dixit]
2023-08-11 08:44:17 -07:00
mssonicbld
776201cf30
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#16109)
src/sonic-swss

* 3e2974df - (HEAD -> 202205, origin/202205) [muxorch] set mux state to init upon warm reboot (#2834) (4 hours ago) [Nikola Dancejic]
2023-08-11 08:43:53 -07:00
mssonicbld
ba8a88a15d
[submodule] Update submodule sonic-platform-common to the latest HEAD automatically (#16107)
src/sonic-platform-common

* 3b993c5 - (HEAD -> 202205, origin/202205) [Credo][Ycable] enhancement and error exception for some APIs (#303) (7 hours ago) [Xinyu Lin]
* ab91fde - [ycable] add definitions of some new API's for Y-Cable infrastructure (#301) (7 hours ago) [vdahiya12]
* 2b551f2 - [Credo][Ycable] fix incorrect uart statistics (#296) (7 hours ago) [Xinyu Lin]
2023-08-11 08:42:29 -07:00
mssonicbld
7109ee0525
[submodule] Update submodule linkmgrd to the latest HEAD automatically (#16105)
src/linkmgrd

* 6ce71ba - (HEAD -> 202205, origin/202205) Add ADO to the PR template (#215) (4 hours ago) [Longxiang Lyu]
* 1010d93 - [active-standby] Write `unhealthy` is default route `N/A` (#214) (4 hours ago) [Longxiang Lyu]
* 15e9ca2 - [link prober] Increase pause/restart probe log verbosity (#213) (4 hours ago) [Longxiang Lyu]
2023-08-11 08:41:53 -07:00
mssonicbld
628e1ad981
[SNMP][IPv6]: Fix to use link local IPv6 address as snmp agentAddress (#16013) (#16102)
<!--
     Please make sure you've read and understood our contributing guidelines:
     https://github.com/Azure/SONiC/blob/gh-pages/CONTRIBUTING.md

     ** Make sure all your commits include a signature generated with `git commit -s` **

     If this is a bug fix, make sure your description includes "fixes #xxxx", or
     "closes #xxxx" or "resolves #xxxx"

     Please provide the following information:
-->

#### Why I did it
fixes: https://github.com/sonic-net/sonic-buildimage/issues/16001
Caused by: https://github.com/sonic-net/sonic-buildimage/pull/15487

The above PR introduced change to use Management and Loopback Ipv4 and ipv6 addresses as snmpagent address in snmpd.conf file.
With this change, if Link local IP address is configured as management or Loopback IPv6 address, then snmpd tries to open socket on that ipv6 address and fails with the below error:
```
Error opening specified endpoint "udp6:[fe80::5054:ff:fe6f:16f0]:161"
Server Exiting with code 1
```
From RFC4007, if we need to specify non-global ipv6 address without ambiguity, we need to use zone id along with the ipv6 address: <address>%<zone_id>
Reference: https://datatracker.ietf.org/doc/html/rfc4007

##### Work item tracking
- Microsoft ADO **(number only)**:

#### How I did it
Modify snmpd.conf file to use the %zone_id representation for ipv6 address.
#### How to verify it
In VS testbed, modify config_db to use link local ipv6 address as management address:
    "MGMT_INTERFACE": {
        "eth0|10.250.0.101/24": {
            "forced_mgmt_routes": [
                "172.17.0.1/24"
            ],
            "gwaddr": "10.250.0.1"
        },
        "eth0|fe80::5054:ff:fe6f:16f0/64": {
            "gwaddr": "fe80::1"
        }
    },

Execute config_reload after the above change.
snmpd comes up and check if snmpd is listening on ipv4 and ipv6 addresses:
```
admin@vlab-01:~$ sudo netstat -tulnp | grep 161
tcp        0      0 127.0.0.1:3161          0.0.0.0:*               LISTEN      274060/snmpd        
udp        0      0 10.1.0.32:161           0.0.0.0:*                           274060/snmpd        
udp        0      0 10.250.0.101:161        0.0.0.0:*                           274060/snmpd        
udp6       0      0 fc00:1::32:161          :::*                                274060/snmpd        
udp6       0      0 fe80::5054:ff:fe6f::161 :::*                                274060/snmpd      -- Link local 
 
admin@vlab-01:~$ sudo ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.250.0.101  netmask 255.255.255.0  broadcast 10.250.0.255
        inet6 fe80::5054:ff:fe6f:16f0  prefixlen 64  scopeid 0x20<link>
        ether 52:54:00:6f:16:f0  txqueuelen 1000  (Ethernet)
        RX packets 36384  bytes 22878123 (21.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 261265  bytes 46585948 (44.4 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

admin@vlab-01:~$ docker exec -it snmp snmpget -v2c -c public fe80::5054:ff:fe6f:16f0 1.3.6.1.2.1.1.1.0
iso.3.6.1.2.1.1.1.0 = STRING: "SONiC Software Version: SONiC.master.327516-04a6031b2 - HwSku: Force10-S6000 - Distribution: Debian 11.7 - Kernel: 5.10.0-18-2-amd64"
```
Logs from snmpd:
```
Turning on AgentX master support.
NET-SNMP version 5.9
Connection from UDP/IPv6: [fe80::5054:ff:fe6f:16f0%eth0]:44308
```
Ran test_snmp_loopback test to check if loopback ipv4 and ipv6 works:
```
./run_tests.sh -n vms-kvm-t0 -d vlab-01 -c snmp/test_snmp_loopback.py  -f vtestbed.yaml -i ../ansible/veos_vtb -e "--skip_sanity --disable_loganalyzer" -u
=== Running tests in groups ===
Running: pytest snmp/test_snmp_loopback.py --inventory ../ansible/veos_vtb --host-pattern vlab-01 --testbed vms-kvm-t0 --testbed_file vtestbed.yaml --log-cli-level warning --log-file-level debug --kube_master unset --showlocals --assert plain --show-capture no -rav --allow_recover --ignore=ptftests --ignore=acstests --ignore=saitests --ignore=scripts --ignore=k8s --ignore=sai_qualify --junit-xml=logs/tr.xml --log-file=logs/test.log --skip_sanity --disable_loganalyzer
..                                                                        

snmp/test_snmp_loopback.py::test_snmp_loopback[vlab-01] PASSED 
```
<!--
If PR needs to be backported, then the PR must be tested against the base branch and the earliest backport release branch and provide tested image version on these two branches. For example, if the PR is requested for master, 202211 and 202012, then the requester needs to provide test results on master and 202012.
-->

#### Which release branch to backport (provide reason below if selected)

<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [x] 202012
- [x] 202106
- [x] 202111
- [x] 202205
- [x] 202211
- [x] 202305

#### Tested branch (Please provide the tested image version)

<!--
- Please provide tested image version
- e.g.
- [x] 20201231.100
-->

- [ ] <!-- image version 1 -->
- [ ] <!-- image version 2 -->

#### Description for the changelog
<!--
Write a short (one line) summary that describes the changes in this
pull request for inclusion in the changelog:
-->

<!--
 Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.
-->

#### Link to config_db schema for YANG module changes
<!--
Provide a link to config_db schema for the table for which YANG model
is defined
Link should point to correct section on https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/doc/Configuration.md
-->

#### A picture of a cute animal (not mandatory but encouraged)

Co-authored-by: SuvarnaMeenakshi <50386592+SuvarnaMeenakshi@users.noreply.github.com>
2023-08-11 08:40:50 -07:00
mssonicbld
5092a37a5c
[Mellanox] Remove unnecessary file manipulation in the SAI Make file (#15993) (#16101)
Signed-off-by: Kebo Liu <kebol@nvidia.com>
Co-authored-by: Kebo Liu <kebol@nvidia.com>
2023-08-11 08:40:30 -07:00
mssonicbld
270820c1cf
[chassis]: removed dependency for bgp and swss for chassis supervisor (#15734) (#16099)
Fixes #15667 and #13293

Work item tracking
Microsoft ADO 24472854:

How I did it
On chassis supervisor bgp feature is disabled in hostcfgd. The dependency between swss and bgp causes the bgp containers to start even though the feature is disabled.

How to verify it
Tests on chassis supervisor and LC

Co-authored-by: Arvindsrinivasan Lakshmi Narasimhan <55814491+arlakshm@users.noreply.github.com>
2023-08-11 08:39:22 -07:00
mssonicbld
f835098361
Fix CONFIG_DB_INITIALIZED flag check logic and set/reset flag for warmboot (#15685) (#16098)
* Fix CONFIG_DB_INITIALIZED flag check logic and set/reset flag for warm-reboot
* Fix db-cli usage
* Handle same image warm-reboot and generalize handling of INIT flag
* Cover boot from ONIE case: set config init flag when minigraph, config_db are missing
* Handle case: first boot of SONiC
* Check for config init flag
* Simplify logic, and do not call db_migrator for same image reboot

Co-authored-by: Vaibhav Hemant Dixit <vaibhav.dixit@microsoft.com>
2023-08-11 08:38:59 -07:00
Bohan Yang
45880bc126
[Arista] Update phy-credo package to support 40G/100G speed change on 7800R3-48CQ2-LC (#16006)
* Update phy-credo script

1. Fix import SonicV2Connector
2. Determine medium type based on compliance code
3. Fix runtime error on unrelated platforms

* Use the 'raw' patch of phy-credo package
2023-08-10 16:47:32 -07:00
mssonicbld
c2a7dd1e19
add service_mgmt (#15927) (#16070)
Adding yang model for CONFIG_DB table MUX_LINKMGR|SERVICE_MGMT.

sign-off: Jing Zhang zhangjing@microsoft.com

Co-authored-by: Jing Zhang <zhangjing@microsoft.com>
2023-08-10 13:35:30 -07:00
mssonicbld
a134bfe0b2
[syncd.sh] Clear semaphore before updating firmware (#15818) (#16068)
Why I did it
The hw resources should be released before updating firmware.

How I did it
Added logic to release hw resources in syncd.sh script

Signed-off-by: Vadym Hlushko <vadymh@nvidia.com>
Co-authored-by: Vadym Hlushko <62022266+vadymhlushko-mlnx@users.noreply.github.com>
2023-08-10 13:34:52 -07:00
James An
5b05598cac
Update cisco-8000.ini (#16030) 2023-08-10 13:34:30 -07:00
mssonicbld
642358c474
[E1031] fix pca9548 initializes failed occasionally (#15712) (#15994)
Why I did it
[E1031] fix pca9548 initializes failed occasionally in stress test.
When failure happened, ismt i2c bus hang up and need power cycle to
recover it.

How I did it
Add 0.5s delay between setuping and configuring pca9548 i2c mux.

How to verify it
Reboot stress test at least 100 times without failure.

Co-authored-by: Ikki Zhu <79439153+qnos@users.noreply.github.com>
2023-08-10 13:33:37 -07:00
mssonicbld
d351e05f82
[monit][dualtor] Periodically check mux neighbors consistency (#15769) (#15954)
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
Co-authored-by: Longxiang Lyu <35479537+lolyu@users.noreply.github.com>
2023-08-10 13:33:09 -07:00
mssonicbld
5d250c6264
[ci/build]: Upgrade SONiC package versions (#15940) 2023-08-10 13:32:17 -07:00
mssonicbld
cc204641cf
[YANG] add yang model for MUX_LINKMGR|MUXLOGGER (#15884) (#16021)
Add yang model for MUX_LINKMGR|MUXLOGGER.

Co-authored-by: Jing Zhang <zhangjing@microsoft.com>
2023-08-07 09:51:35 -07:00
lerry-lee
f61334fb6e
[CI/CD] Use remote PR test template from sonic-mgmt master to run PR test (#15977)
Why I did it
Use remote PR test template from sonic-mgmt master to run PR test.

How I did it
Modify PR test azure pipeline yml file.

How to verify it
PR test executing normally.

Signed-off-by: Chun'ang Li <chunangli@microsoft.com>
2023-08-01 16:19:43 +08:00
Volodymyr Samotiy
5475f85619
[202205] [Mellanox] Advance SAI submodule pointer (#15970)
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2023-07-26 10:40:14 -07:00
mssonicbld
a03489a413
[ci/build]: Upgrade SONiC package versions (#15939) 2023-07-22 15:52:35 -07:00
mssonicbld
1f809c8476
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#15930)
src/sonic-utilities

* 99864640 - (HEAD -> 202205, origin/202205) [dualtor] Add script to verify consistency between kernel and ASIC  (#2840) (87 minutes ago) [Longxiang Lyu]
* a32ddc1b - [show][muxcable] update `show mux config` to print out `soc_ipv6` as well  (#2909) (88 minutes ago) [Jing Zhang]
* 0c6d0c51 - [202205] Flush RESTAPI db in fast-reboot shutdown path (#2921) (4 hours ago) [bingwang-ms]
2023-07-21 09:03:28 -07:00
mssonicbld
54b21c1917
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#15928)
src/sonic-swss

* 17c4d731 - (HEAD -> 202205, origin/202205) Remove system neighbor DEL operation in m_toSync if SET operation for (#2853) (3 hours ago) [Song Yuan]
2023-07-21 09:02:02 -07:00
mssonicbld
ab0768eb15
Update WRED profile on system ports (#15612) (#15914)
* Update WRED profile on system ports

Co-authored-by: vmittal-msft <46945843+vmittal-msft@users.noreply.github.com>
2023-07-20 08:39:54 -07:00
mssonicbld
c369e1fc17
[submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically (#15913)
src/sonic-platform-daemons

* 8147e25 - (HEAD -> 202205, origin/202205) Revert "Added PCIe transaction check for all peripherals on the bus (#331)" (12 hours ago) [Ying Xie]
2023-07-19 20:51:42 -07:00
abdosi
b4f86cd15d
[submodule update] sonic-sairedis (#15831)
56aee6c0dfc4705584764edd35b7b2977bb762eb (HEAD -> 202205, origin/202205) [202205] Advance SAI submodule to fix issue https://github.com/sonic-net/sonic-buildimage/issues/14706 (#1265)

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2023-07-19 18:12:54 -07:00
mssonicbld
8b2d1b2883
[submodule] Update submodule sonic-restapi to the latest HEAD automatically (#15886)
src/sonic-restapi

* a69ba06 - (HEAD -> 202205, origin/master, origin/HEAD, origin/202205, master) [actions] Support Semgrep by Github Actions (#144) (3 weeks ago) [Mai Bui]
* 6b242a3 - [Ci] Upgrade python 2 to python 3 (#145) (3 weeks ago) [xumia]
* 1c50caa - prevent downcasting of 64-bit integer (#142) (2 months ago) [Mai Bui]
* de26989 - Use -race detector when building and testing (#141) (3 months ago) [Lawrence Lee]
* 9fe2eff - [go] Update Go to version 1.15 (#140) (3 months ago) [Lawrence Lee]
2023-07-19 14:10:08 -07:00
mssonicbld
f55d574ec3
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#15887)
src/sonic-swss

* ed0ca898 - (HEAD -> 202205, origin/202205) Add missing parameter to on_switch_shutdown_request method. (#2567) (34 hours ago) [Hua Liu]
2023-07-19 14:09:20 -07:00
jcaiMR
07388fd1fa
advance dhcprelay to 6a6ce24, add default dhcpv6 dualtor source interface (#15881)
sonic-build image side change to fix source interface selection in dual tor scenario.
dhcprelay related PR:
[master]fix dhcpv6 relay dual tor source interface selection issue sonic-dhcp-relay#42

Announce dhcprelay submodule to 6a6ce24 to include PR #42
2023-07-19 14:08:48 -07:00
mssonicbld
0291dae68a
[ci/build]: Upgrade SONiC package versions (#15855) 2023-07-19 08:28:14 -07:00
xumia
d0648f81c7
[Build] Fix the PyYang python package installation issue (#15890) (#15907)
Why I did it
Fix the armhf build failure.
How to reproduce the issue:

docker run -it debain:bullseye bash
apt-get update && apt-get install -y python3-pip
pip3 install PyYAML==5.4.1
Error message:

Collecting PyYAML==5.4.1
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  ERROR: Command errored out with exit status 1:
   command: /usr/bin/python3 /tmp/tmp6xabslgb_in_process.py get_requires_for_build_wheel /tmp/tmp_er01ztl
....
      raise AttributeError(attr)
  AttributeError: cython_sources
  ----------------------------------------
WARNING: Discarding d63f2d7597/PyYAML-5.4.1.tar.gz (sha256)=607774cbba28732bfa802b54baa7484215f530991055bb562efbed5b2f20a45e (from https://pypi.org/simple/pyyaml/) (requires-python:>=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*, !=3.5.*). Command errored out with exit status 1: /usr/bin/python3 /tmp/tmp6xabslgb_in_process.py get_requires_for_build_wheel /tmp/tmp_er01ztl Check the logs for full command output.
ERROR: Could not find a version that satisfies the requirement PyYAML==5.4.1
ERROR: No matching distribution found for PyYAML==5.4.1
root@fa2fa92edcfd:/#
But if adding the option --no-build-isolation, then it is good, see fix.

install "PyYAML==5.4.1" --no-build-isolation
The same error can be found in the multiple builds.

Work item tracking
Microsoft ADO (number only): 24567457

How I did it
Add a build option --no-build-isolation.
2023-07-19 08:08:35 -07:00
StormLiangMS
1243a6e2f9
cherry-pick frr log enhancement for graceful restart (#15863)
Graceful restart is a key event for bgpd, related log print is debug level. To change it to info level to get more visibilities when this kind of event is triggered.
2023-07-19 07:32:43 -07:00
Tejaswini Chadaga
e179ba3b5e
[202205] Update Broadcom DNX SAI to 7.1.54.4 (#15850)
To pick the below fixes:

DNX fixes:

Temporarily revert fix for CS00012287482 - support for 1024 LAGs on DNX
CS00012297599 - [J2C+] sonic-mgmt failure in test_copp.py (test_no_policer[BGP])
CS00012293560 - ECN remark issue in SONiC
CS00012302371 - SONiC: V6 packets were mapped to wrong TC queue
CS00012288540 - Available ACL Entry and Counter is incorrect after removing ACL rules
Other changes (XGS fixes)

SID - L3 multicast packet drop due to wrong VFI derivation - SDK-350470
SID - SIGSEGV in linkscan callback delivery - SDK-287578
SID - Repeated VXLAN calls deletes vlan translation action profile SDK-313980
SER - error in IS_TDM_CALENDAR0/1 can cause traffic hit in TH
SID - L2_ENTRY Table Lookups May Miss
[CSP CS00012275452] sai_object_type_get_availability failed with SAI_STATUS_INVALID_PARAMETER
[CSP CS00012253527] sai_query_attribute_capability for obj type SAI_OBJECT_TYPE_SWITCH
2023-07-14 18:57:46 -07:00
lixiaoyuner
d72355d297 [ctgmgr]: do not remove label when do systemd service stop when service is in kube mode (#15642)
Why I did it
When sonic is managed by k8s, the sonic container is managed by k8s daemonset, daemonset identifies its members by labels. Currently when restarting a sonic service by systemctl, if the service's container is already managed by k8s, systemd script stops the container by removing the feature label to make it disjoin from k8s daemonset, and then starts it by adding the label to make it join k8s daemonset again.

This behavior would cause problem during k8s container upgrade. Containers in daemonset are upgraded in a rolling fashion, that means the daemonset version is updated first, then rollout the new version to containers with precheck/postcheck one by one. However, if a sonic device joins a daemonset, k8s will directly deploy a pod with the current version of daemonset, it is expected when a device joins k8s cluster at first time.

But for a device which has already joined k8s cluster, the re-joining daemonset will cause the container upgraded to new version without precheck, so if a systemd service is restarted during daemonset upgrade, the container may be upgraded without precheck and break rolling update policy. To fix it, we need to remove the logic about dropping k8s label in systemd service stop script for kube mode.

Work item tracking
Microsoft ADO (number only): 24304563

How I did it
Don't drop label in systemd service stop script when feature's set_owner is kube. Only drop label when feature's set_owner is local.

How to verify it
The label feature_enabled should be always true if the feature's set owner is kube.
2023-07-15 06:34:32 +08:00
mssonicbld
1b32bf6b2d
update rsyslog log size conf (#15821) (#15845) 2023-07-15 05:47:03 +08:00
mssonicbld
f02ca9a749
Potential fix for Celestica E1031 device hang (#15822) (#15844) 2023-07-15 05:29:15 +08:00
mssonicbld
7cfb71bc18
[SNMP][IPv6]: Fix SNMP IPv6 reachability issue in certain scenarios (#15487) (#15826)
Modify snmpd.conf to start snmpd to listen on specific management and loopback ips instead of listening on any ip.

#### Why I did it
SNMP over IPv6 is not working for all scenarios for a single asic platforms.
The expectation is that SNMP query over IPv6 should work over Management or Loopback0 addresses.
**Specific scenario where this issue is seen**
In case of Lab T0 device,  when SNMP request is sent from a directly connected T1 neighbor over Loopback IP, SNMP response was not received.
This was because the SRC IP address in SNMP response was not Loopback IP, it was the PortChannel IP connected to the neighboring device.
```
23:18:51.620897  In 22:26:27:e6:e0:07 ethertype IPv6 (0x86dd), length 105: fc00::72.41725 > **fc00:1::32**.161:  C="msft" **GetRequest**(28)  .1.3.6.1.2.1.1.1.0
23:18:51.621441 Out 28:99:3a:a0:97:30 ethertype IPv6 (0x86dd), length 241: **fc00::71**.161 > fc00::72.41725:  C="msft" **GetResponse**(162)  .1.3.6.1.2.1.1.1.0="SONiC Software Version: SONiC.xxx - HwSku: xx - Distribution: Debian 10.13 - Kernel: 4.19.0-12-2-amd64"
```
In case of IPv4, the SRC IP in SNMP response was correctly set to Loopback IP.
```
23:25:32.769712  In 22:26:27:e6:e0:07 ethertype IPv4 (0x0800), length 85: 10.0.0.57.56701 > **10.1.0.32**.161:  C="msft" **GetRequest**(28)  .1.3.6.1.2.1.1.1.0
23:25:32.975967 Out 28:99:3a:a0:97:30 ethertype IPv4 (0x0800), length 221: **10.1.0.32**.161 > 10.0.0.57.56701:  C="msft" **GetResponse**(162)  .1.3.6.1.2.1.1.1.0="SONiC Software Version: SONiC.xxx - HwSku: xx - Distribution: Debian 10.13 - Kernel: 4.19.0-12-2-amd64"
```

**Sequence of SNMP request and response**
1. SNMP request will be sent with SRC IP fc00::72 DST IP fc00:1::32
2. SNMP request is received at SONiC device is sent to snmpd which is listening on port 161 :::161/
3. snmpd process will parse the request create a response and sent to DST IP fc00::72. 
snmpd process does not track the DST IP on which the SNMP request was received, which in this case is Loopback IP.
snmpd process will only keep track what is tht IP to which the response should be sent to.
4. snmpd process will send the response packet.
5. Kernel will do a route look up on destination IP and find the best path.
ip -6 route get fc00::72
fc00::72 from :: dev PortChannel101 proto kernel src fc00::71 metric 256 pref medium
5. Using the "src" ip from about, the response is sent out. This SRC ip is that of the PortChannel and not the device Loopback IP.

The same issue is seen when SNMP query is sent from a remote server over Management IP.
SONiC device eth0 --------- Remote server
SNMP request comes with SRC IP <Remote_server> DST IP <Mgmt IP>
If kernel finds best route to Remote_server_IP is via BGP neighbors, then it will send the response via front-panel interface with SRC IP as Loopback IP instead of Management IP.

Main issue is that in case of IPv6, snmpd ignores the IP address to which SNMP request was sent, in case of IPv6.
In case of IPv4, snmpd keeps track of DST IP of SNMP request, it will keep track if the SNMP request was sent to mgmt IP or Loopback IP.
Later, this IP is used in ipi_spec_dst as SRC IP which helps kernel to find the route based on DST IP using the right SRC IP.
https://github.com/net-snmp/net-snmp/blob/master/snmplib/transports/snmpUDPBaseDomain.c#L300 
ipi.ipi_spec_dst.s_addr = srcip->s_addr
Reference: https://man7.org/linux/man-pages/man7/ip.7.html
```
If IP_PKTINFO is passed to sendmsg(2)
              and ipi_spec_dst is not zero, then it is used as the local
              source address for the routing table lookup and for
              setting up IP source route options.  When ipi_ifindex is
              not zero, the primary local address of the interface
              specified by the index overwrites ipi_spec_dst for the
              routing table lookup.
```

**This issue is not seen on multi-asic platform, why?**
on multi-asic platform, there exists different network namespaces.
SNMP docker with snmpd process runs on host namespace.
Management interface belongs to host namespace.
Loopback0 is configured on asic namespaces.
Additional inforamtion on how the packet coming over Loopback IP reaches snmpd process running on host namespace: https://github.com/sonic-net/sonic-buildimage/pull/5420
Because of this separation of network namespaces, the route lookup of destination IP is confined to routing table of specific namespace where packet is received.
if packet is received over management interface, SNMP response also is sent out of management interface. Same goes with packet received over Loopback Ip.

##### Work item tracking
- Microsoft ADO **17537063**:

#### How I did it
Have snmpd listen on specific Management and Loopback IPs specifically instead of listening on any IP for single-asic platform.

Before Fix
```
admin@xx:~$ sudo netstat -tulnp | grep 161   
udp        0      0 0.0.0.0:161             0.0.0.0:*                           15631/snmpd         
udp6       0      0 :::161                  :::*                                15631/snmpd  
```
After fix
```
admin@device:~$ sudo netstat -tulnp | grep 161
udp        0      0 10.1.0.32:161           0.0.0.0:*                           215899/snmpd        
udp        0      0 10.3.1.1:161             0.0.0.0:*                           215899/snmpd        
udp6       0      0 fc00:1::32:161          :::*                                215899/snmpd        
udp6       0      0 fc00:2::32:161          :::*                                215899/snmpd  
``` 

**How this change helps with the issue?**
To see snmpd trace logs, modify snmpd to start using the below parameters, in supervisord.conf file
```
/usr/sbin/snmpd -f -LS0-7i -Lf /var/log/snmpd.log
```
When snmpd listens on any IP, snmpd binds to IPv4 and IPv6 sockets as below:
```
netsnmp_udpbase: binding socket: 7 to UDP: [0.0.0.0]:0->[0.0.0.0]:161
trace: netsnmp_udp6_transport_bind(): transports/snmpUDPIPv6Domain.c, 303:
netsnmp_udpbase: binding socket: 8 to UDP/IPv6: [::]:161
```

When IPv4 response is sent, it goes out of fd 7 and IPv6 response goes out of fd 8.
When IPv6 response is sent, it does not have the right SRC IP and it can lead to the issue described.

When snmpd listens on specific Loopback/Management IPs, snmpd binds to different sockets:
```
trace: netsnmp_udpipv4base_transport_bind(): transports/snmpUDPIPv4BaseDomain.c, 207:
netsnmp_udpbase: binding socket: 7 to UDP: [0.0.0.0]:0->[10.250.0.101]:161
trace: netsnmp_udpipv4base_transport_bind(): transports/snmpUDPIPv4BaseDomain.c, 207:
netsnmp_udpbase: binding socket: 8 to UDP: [0.0.0.0]:0->[10.1.0.32]:161
trace: netsnmp_register_agent_nsap(): snmp_agent.c, 1261:
netsnmp_register_agent_nsap: fd 8
netsnmp_udpbase: binding socket: 10 to UDP/IPv6: [fc00:1::32]:161
trace: netsnmp_register_agent_nsap(): snmp_agent.c, 1261:
netsnmp_register_agent_nsap: fd 10
netsnmp_ipv6: fmtaddr: t = (nil), data = 0x7fffed4c85d0, len = 28
trace: netsnmp_udp6_transport_bind(): transports/snmpUDPIPv6Domain.c, 303:
netsnmp_udpbase: binding socket: 9 to UDP/IPv6: [fc00:2::32]:161
```
When SNMP request comes in via Loopback IPv4, SNMP response is sent out of fd 8
```
trace: netsnmp_udpbase_send(): transports/snmpUDPBaseDomain.c, 511:
netsnmp_udp: send 170 bytes from 0x5581f2fbe30a to UDP: [10.0.0.33]:46089->[10.1.0.32]:161 on fd 8
```
When SNMP request comes in via Loopback IPv6, SNMP response is sent out of fd 10
```
netsnmp_ipv6: fmtaddr: t = (nil), data = 0x5581f2fc2ff0, len = 28
trace: netsnmp_udp6_send(): transports/snmpUDPIPv6Domain.c, 164:
netsnmp_udp6: send 170 bytes from 0x5581f2fbe30a to UDP/IPv6: [fc00::42]:43750 on fd 10
```

#### How to verify it
Verified on single asic and multi-asic devices.
Single asic SNMP query with Loopback 
```
ARISTA01T1#bash snmpget -v2c -c xxx 10.1.0.32 1.3.6.1.2.1.1.1.0
SNMPv2-MIB::sysDescr.0 = STRING: SONiC Software Version: SONiC.xx - HwSku: Arista-7260xx - Distribution: Debian 10.13 - Kernel: 4.19.0-12-2-amd64
ARISTA01T1#bash snmpget -v2c -c xxx fc00:1::32 1.3.6.1.2.1.1.1.0
SNMPv2-MIB::sysDescr.0 = STRING: SONiC Software Version: SONiC.xx - HwSku: Arista-7260xxx - Distribution: Debian 10.13 - Kernel: 4.19.0-12-2-amd64
```

On multi-asic -- no change.
```
sudo netstat -tulnp | grep 161
udp        0      0 0.0.0.0:161             0.0.0.0:*                           17978/snmpd         
udp6       0      0 :::161                  :::*                                17978/snmpd 
```
Query result using Loopback IP from a directly connected BGP neighbor
```
ARISTA01T2#bash snmpget -v2c -c xxx 10.1.0.32 1.3.6.1.2.1.1.1.0
SNMPv2-MIB::sysDescr.0 = STRING: SONiC Software Version: SONiC.xx - HwSku: xx - Distribution: Debian 9.13 - Kernel: 4.9.0-14-2-amd64
ARISTA01T2#bash snmpget -v2c -c xxx fc00:1::32 1.3.6.1.2.1.1.1.0
SNMPv2-MIB::sysDescr.0 = STRING: SONiC Software Version: SONiC.xx - HwSku: xx - Distribution: Debian 9.13 - Kernel: 4.9.0-14-2-amd64  
```
<!--
If PR needs to be backported, then the PR must be tested against the base branch and the earliest backport release branch and provide tested image version on these two branches. For example, if the PR is requested for master, 202211 and 202012, then the requester needs to provide test results on master and 202012.
-->

Co-authored-by: SuvarnaMeenakshi <50386592+SuvarnaMeenakshi@users.noreply.github.com>
2023-07-14 10:16:50 -07:00
lixiaoyuner
65638f18da
[ctrmgr]: Container image clean up bug fix (#15772) (#15840)
Why I did it
When do clean up container images, current code has two bugs need to be fixed. And some variables' name maybe cause confused, change the variables' name.

Work item tracking
Microsoft ADO (number only): 24502294

How I did it
We do clean up after tag latest successfully. But currently tag latest function only return 0 and 1, 0 means succeed and 1 means failed, when we get 1, we will retry, when we get 0, we will do clean up. Actually the code 0 includes another case we don't need to do clean up. The case is that when we are doing tag latest, the container image we want to tag maybe not running, so we can not tag latest and don't need to cleanup, we need to separate this case from 0, return -1 now.

When local mode(v1) -> kube mode(v2) happens, one problem is how to handle the local image, there are two cases. one case is that there was one kube v1 container dry-run(cause we don't relace the local if kube version = local version), we will remove the kube v1 image and tag the local version with ACR prefix and remove local v1 local tag. Another case is that there was no kube v1 container dry-run, we remove the local v1 image directly, cause the local v1 image should not be the last desire version.

About the docker_id variable, it may cause confused, it's actually docker image id, so rename the variable. About the two dicts and the list, rename them to be more readable.

How to verify it
Check tag latest and image clean up result.
2023-07-14 08:46:11 -07:00
lixiaoyuner
665256fb42
[k8s]: Bypass the systemd service restart limit and do immediately restart when change to local mode (#15432) (#15839)
Why I did it
During the upgrade process via k8s, the feature's systemd service will restart as well, all of the feature systemd service has restart number limit, and the limit number is too small, only three times. if fallback happens when upgrade, the start count will be 2, just once again, the systemd service will be down. So, need to bypass this. This restart function will be called when do local -> kube, kube -> kube, kube ->local, each time call this function, we indeed need to restart successfully, so do reset-failed every time we do restart.
When need to go back to local mode, we do systemd restart immediately without waiting the default restart interval time so that we can reduce the container down time.

Work item tracking
Microsoft ADO (number only):
24172368

How I did it
Before every restart for upgrade, do reset feature's restart number. The restart number will be reset to 0 to bypass the restart limit.
When need to go back to local mode, we do systemd restart immediately.

How to verify it
Feature's systemd service can be always restarted successfully during upgrade process via k8s.
2023-07-14 08:43:23 -07:00
mssonicbld
786fdd087c
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#15835)
src/sonic-utilities

* bc7c7929 - (HEAD -> 202205, origin/202205) Add FEC correctable and uncorrectable port stats (#2027) (10 hours ago) [Prince George]
* 58db48ad - [show][muxcable] update `show mux tunnel-route` to check soc_ipv6 as well (10 hours ago) [Jing Zhang]
* 24fc1db8 - [dualtor][route_check] filter out `soc_ipv6`  (#2899) (10 hours ago) [Jing Zhang]
* d89d4832 - [route_check][dualtor] Ignore vlan neighbor route miss (#2888) (10 hours ago) [Longxiang Lyu]
2023-07-14 08:42:51 -07:00
mssonicbld
a3b0e7f4e4
[submodule] Update submodule sonic-sairedis to the latest HEAD automatically (#15833)
src/sonic-sairedis

* 56aee6c - (HEAD -> 202205, origin/202205) [202205] Advance SAI submodule to fix issue https://github.com/sonic-net/sonic-buildimage/issues/14706 (#1265) (10 hours ago) [abdosi]
2023-07-14 08:42:17 -07:00
mssonicbld
4a248c58f5
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#15834)
src/sonic-swss

* 493f66f5 - (HEAD -> 202205, origin/202205) Remove redundant updateFabricPortState (#2850) (6 hours ago) [kenneth-arista]
2023-07-14 08:41:34 -07:00
mssonicbld
01342856f7
[submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically (#15832)
src/sonic-platform-daemons

* bef58aa - (HEAD -> 202205, origin/202205) Added PCIe transaction check for all peripherals on the bus (#331) (10 hours ago) [Ashwin Srinivasan]
2023-07-14 08:41:04 -07:00
lixiaoyuner
fb14b987b4 Add health check probe for k8s upgrade containers. (#15223)
#### Why I did it
After k8s upgrade a container, k8s can only know the container is running, don't know the service's status inside container. So we need a probe inside container, k8s will call the probe to check whether the container is really ready.
##### Work item tracking
- Microsoft ADO **(number only)**: 22453004
#### How I did it
Add a health check probe inside config engine container, the probe will check whether the start service exit normally or not if the start service exists and call the python script to do container self-related specific checks if the script is there. The python script should be implemented by feature owner if it's needed.

more details: [design doc](https://github.com/sonic-net/SONiC/blob/master/doc/kubernetes/health-check.md)
#### How to verify it
Check path /usr/bin/readiness_probe.sh inside container.

#### Which release branch to backport (provide reason below if selected)

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [x] 202205
- [x] 202211

#### Tested branch (Please provide the tested image version)
- [x] 20220531.28
2023-07-14 04:32:45 +08:00
mssonicbld
9cd3319495
Pick dependency files in submodules. (#15142) (#15827) 2023-07-14 04:30:32 +08:00
zitingguo-ms
e73924ebc9
Upgrade XGS SAI version to 7.1.54.4 (#15820)
Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
2023-07-13 09:13:34 -07:00
mssonicbld
2d39daf95f
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#15809)
src/sonic-utilities

* 20853a6f - (HEAD -> 202205, origin/202205) Revert "[GCU Feature Update] Cherry-pick Platform Validator PR to 202205  (#2883)" (#2908) (6 hours ago) [isabelmsft]
2023-07-13 08:27:51 -07:00
mssonicbld
7c6a1612d1
[ci/build]: Upgrade SONiC package versions (#15766) 2023-07-13 08:27:25 -07:00
James An
c39e370592
Update cisco-8000.ini (#15813) 2023-07-13 08:26:36 -07:00