Commit Graph

7259 Commits

Author SHA1 Message Date
mssonicbld
624a5d489f
[Mellanox] Revise label name and fix typo in sensor.conf of 4600C (#16271) (#16467)
- Why I did it
Revise label name and fix typo in sensor.conf of 4600C

- How I did it
Revise label name and fix typo in sensor.conf of 4600C

- How to verify it
Manual test
sonic-mgmt test_sensors.py

Co-authored-by: Junchao-Mellanox <57339448+Junchao-Mellanox@users.noreply.github.com>
2023-09-06 16:08:27 -07:00
mssonicbld
0fe5c9fc7d
[platform]: Disable interrupt for intel i2c-i801 driver (#16309) (#16457)
On S6100 we are seeing almost 100K interrupts per second on Intel's i801 SMBus controller, which affects system performance.

We now disable the i801 driver interrupt and enable polling instead.

Microsoft ADO (number only): 24910530

How I did it
Disable the interrupt by passing the interrupt disable feature argument to i2c-i801 driver

How to verify it
This fix is NOT applicable to ARM-based platforms. It applies only to Intel-based platforms:

- On SN2700 it's already disabled in Mellanox hw-mgmt
- Celestica DX010 and E1031
- Dell S6100 verified the interrupts are no longer incrementing.
- Arista 7260CX3

Signed-off-by: Prince George <prgeor@microsoft.com>
Co-authored-by: Prince George <45705344+prgeor@users.noreply.github.com>
2023-09-06 09:49:58 -07:00
mssonicbld
07955af2ed
[ci/build]: Upgrade SONiC package versions (#16316) 2023-09-05 21:54:50 -07:00
mssonicbld
89f091eded
[Mellanox] set select timeout to no more than 1 sec to make sure fast shutdown (#13611) (#16449)
- Why I did it
Commit sonic-net/sonic-platform-daemons@153ea47 changed SfpStateUpdateTask from a Process to a Thread and raises an exception in SfpStateUpdateTask to make the shutdown flow fast. That does not work on the Nvidia platform, because the Nvidia platform passes the timeout parameter of get_change_event to select, and the Linux select function cannot be interrupted by a Python exception. The Nvidia platform had no such issue before that commit. However, to comply with the commit and keep the shutdown flow fast, we decided to change the Nvidia platform API implementation.

To fix issue #13591.

- How I did it
The select call in get_change_event should use no more than 1 second as its timeout parameter.
Outside the select call, add a while loop to make sure the timeout parameter of get_change_event works as expected.
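
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the Nvidia platform code: `wait_for_change`, `fds`, `should_stop` and `max_chunk_s` are hypothetical names, and the sketch assumes that a timeout of 0 means "block until an event arrives".

```python
import select
import time


def wait_for_change(fds, timeout_ms, should_stop, max_chunk_s=1.0):
    """Wait for readable fds, waking up at least once per max_chunk_s seconds.

    All names here are hypothetical. timeout_ms == 0 is assumed to mean
    "wait until an event arrives or should_stop() returns True".
    Returns the list of readable fds, or [] on timeout or stop request.
    """
    deadline = None if timeout_ms == 0 else time.monotonic() + timeout_ms / 1000.0
    while not should_stop():
        if deadline is None:
            chunk = max_chunk_s
        else:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break  # overall timeout of the outer loop has expired
            chunk = min(max_chunk_s, remaining)
        # Each select call blocks for at most max_chunk_s seconds, so a
        # stop request is noticed within roughly that time.
        readable, _, _ = select.select(fds, [], [], chunk)
        if readable:
            return readable
    return []
```

Because no single select call blocks for more than one second, the thread re-checks the stop condition about once per second while the caller-visible timeout is still honored by the outer loop.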

- How to verify it
Manual test

Co-authored-by: Junchao-Mellanox <57339448+Junchao-Mellanox@users.noreply.github.com>
2023-09-05 21:53:08 -07:00
mssonicbld
d5e2c0004f
Assign the higher metric value for Ipv6 default route learnt via RA message (#16367) (#16440)
* Fix the Loopback0 IPv6 address of LCs in a chassis not being reachable from peer devices
* Assign a higher metric value to the IPv6 default route learned via RA message so that the BGP-learned default route has higher priority.
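
As a rough illustration of why a higher metric lowers priority: among routes to the same prefix, the one with the lowest metric is preferred. The sketch below uses hypothetical metric values and a hypothetical `best_default_route` helper; it is not taken from the kernel patch.

```python
def best_default_route(routes):
    """Pick the preferred route: the lowest metric wins (hypothetical helper)."""
    return min(routes, key=lambda route: route["metric"])


# Hypothetical metric values, for illustration only.
routes = [
    {"proto": "ra", "via": "fe80::1", "metric": 2048},
    {"proto": "bgp", "via": "fc00::1", "metric": 20},
]
print(best_default_route(routes)["proto"])
```

With the RA-learned route given the larger metric, the BGP-learned default route is the one selected.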

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Co-authored-by: abdosi <58047199+abdosi@users.noreply.github.com>
2023-09-05 21:52:38 -07:00
mssonicbld
a9564286b2
[submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically (#16390)
src/sonic-platform-daemons

* 0258ecf - (HEAD -> 202205, origin/202205) [pmon][chassis][voq] Chassis DB cleanup when module is down (#394) (9 hours ago) [vganesan-nokia]
2023-09-05 21:48:11 -07:00
mssonicbld
8cac746a03
Fix openconfig_acl.py (#16303) (#16345)
How I did it
Fix the regex for L4 port range in openconfig_acl.py.

How to verify it
Build the image and install it on an Arista-720DT DUT, then try the repro steps in #16189 and confirm that the ACL rule is set up correctly:

Co-authored-by: Zhijian Li <zhijianli@microsoft.com>
2023-09-05 21:47:55 -07:00
James An
1d3d70986f
Update cisco-8000.ini (#16387)
Why I did it

Common Release Notes for 8102-64H, T0/DualTor, and 8101-32FH

Fix for an issue where drop counters were incrementing twice for packets with invalid tag
Fix for the ECC errors reported in SR 695600099
Fix for fwutil show updates failure

How I did it

Update platform version to 202205.2.2.11
2023-09-05 21:42:59 -07:00
jcaiMR
8787b71e03
fix counter log issue on 32bits platform (#16357)
Cherry pick sonic-net/sonic-dhcpmon#11 into 202205.
2023-09-05 09:42:56 -07:00
Junchao-Mellanox
874ca68060
Fix issue: set has_timer attribute to true for platform monitor service (#15624)
There is a redundant line in init_cfg.json.j2 that causes the pmon service to always have "has_timer=False". However, PMON has a timer now, so this change fixes the attribute.
2023-09-04 19:38:21 -07:00
Arvindsrinivasan Lakshmi Narasimhan
c4c2c00c11
submodule update sonic-platform-daemons (#16386)
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
2023-09-01 20:12:38 -07:00
mssonicbld
74dbafe728
[Nokia-IXR7250E] Modify the platform_ndk.json for Nokia-IXR7250E platform (#16355) (#16382)
Signed-off-by: mlok <marty.lok@nokia.com>
Co-authored-by: Marty Y. Lok <76118573+mlok-nokia@users.noreply.github.com>
2023-09-01 20:12:14 -07:00
Arvindsrinivasan Lakshmi Narasimhan
18fb27b84d
patch fix to ignore dup nh in netlink msg (#16385)
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
2023-09-01 20:11:43 -07:00
mssonicbld
f7f2e654c4
[chassis] Chassis DB cleanup when asic comes up (#16213) (#16378)
* [chassis]Chassis DB cleanup when asic comes up

Cleanup the entries from the following tables in chassis app db in
redis_chassis server in the supervisor
(1) SYSTEM_NEIGH
(2) SYSTEM_INTERFACE
(3) SYSTEM_LAG_MEMBER_TABLE
(4) SYSTEM_LAG_TABLE
As part of the clean up only those entries created by the asic that
is coming up are deleted. The LAG IDs used by the asics are also
de-allocated from SYSTEM_LAG_ID_TABLE and SYSTEM_LAG_ID_SET

- Added check to run the chassis db clean up only for voq switches.

Signed-off-by: vedganes <veda.ganesan@nokia.com>
Co-authored-by: vganesan-nokia <67648637+vganesan-nokia@users.noreply.github.com>
2023-09-01 16:20:31 -07:00
mssonicbld
88d692f987
[Nokia][DeviceData] Update the Nokia platform IXR-7250E device data (#16028) (#16381)
Why I did it
Update the platform_reboot of Nokia Platform IXR-7250E-36x400G to display the correct reboot-cause history when rebooting from the supervisor card.

Work item tracking
Microsoft ADO (number only):
How I did it
Modify the platform_reboot script to copy the correct reboot-cause.txt file from NDK to the /host/reboot-cause directory at the down cycle when the reboot is issued from the Supervisor (both for a reboot right after installing a new image and for a normal reboot).

Signed-off-by: mlok <marty.lok@nokia.com>
Co-authored-by: Marty Y. Lok <76118573+mlok-nokia@users.noreply.github.com>
2023-09-01 16:19:22 -07:00
Xichen96
35bb472601
[installer] add processor.max_cstate=1 to intel kernel cmdline for intel cpu (#16371)
This is a fix for PR #6051

The original PR disables the intel idle driver, but it cannot limit the max C-state to 1 because the system falls back to the ACPI idle driver.

Currently intel_idle.max_cstate=0 is already present, which disables the intel idle driver. With the added option, the common idle driver is disabled as well, so there is no idle management. This prevents a bug that can be triggered by the idle instruction on Intel platforms.

Work item tracking
Microsoft ADO (number only): 24867921

How I did it
Add the option to installer file beside intel_idle.max_cstate=0
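
One way to check the result on a running system is to look for both options on the kernel command line. The helper below is a minimal sketch; the function name `missing_cstate_options` is hypothetical and is not part of the installer.

```python
def missing_cstate_options(
    cmdline,
    required=("intel_idle.max_cstate=0", "processor.max_cstate=1"),
):
    """Return the required options that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [option for option in required if option not in present]


# On a live Linux system the command line can be read from /proc/cmdline:
#     with open("/proc/cmdline") as f:
#         print(missing_cstate_options(f.read()))
```

An empty list means both idle-related options are present.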

Signed-off-by: Xichen Lin <lukelin0907@gmail.com>
2023-09-01 11:05:12 -07:00
mssonicbld
896b8e7209
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#16333)
src/sonic-swss

* d787d50d - (HEAD -> 202205, origin/202205) Remove fabric queue counters. (#2862) (2 days ago) [jfeng-arista]
* 4579d43f - update portStatIds for cisco (#2876) (3 days ago) [Zhixin Zhu]
2023-09-01 09:10:46 -07:00
mssonicbld
95d7d440c2
[submodule] Update submodule sonic-linux-kernel to the latest HEAD automatically (#16331)
src/sonic-linux-kernel

* db00eb9 - (HEAD -> 202205, origin/202205) PATCH] net: allow user to set metric on default route learned via Router Advertisement (#326) (2 days ago) [abdosi]
2023-09-01 09:10:11 -07:00
vmittal-msft
134a22221c
Update CPU transmitted packets to queue 7 for chassis (#16349) 2023-08-31 08:57:02 -07:00
Tejaswini Chadaga
56d6ed14c0
[202205] Update Broadcom DNX SAI version to 7.1.60.4 (#16351)
To include the following fixes:

DNX:

CS00012287482 - Support for 1024 LAGs on DNX (Added back fix reverted in [202205] Update Broadcom DNX SAI version to 7.1.54.4 #15850)
CS00012302400 - New SAI 7.1.50.4 caused regression in sonic-mgmt ACL test &
ACL entry creation failing with SAI_STATUS_INVALID_PORT_NUMBER in SAI 7.1.50.4
(CS00012302347)
CS00012302163 - SAI_API_BRIDGE:_brcm_sai_bridge_port_learn_flag:1620 sai bridge lag port list get. failed with error -7.
CS00012296571 - LACP packets are queued to Queue 0 instead of Queue 7
CS00012301919 - The traffic is queued to VOQ 8 sometimes instead of destination port's VOQ
CS00012297160 - [SONIC] [J2C+] Traffic to unknown destination route getting enqueued on VOQ 10
CS00012298730 - [7.x][J2/J2C+] : Treat Q=0 as lowest priority and Q=7 as highest priority in Strict Priority Scheduling
Also includes -
XGS:

Port SONIC-62323 to SAI 7.1, Use single NH instead of ecmp
[SAI_BRANCH rel_ocp_sai_7_1] ECMP group expansion fail due to no resources
Fix capability for Hostif queue on SAI version 7.1
CS00012302193 - SAI_SWITCH_ATTR_SWITCH_HARDWARE_INFO attribute value changed
2023-08-31 08:32:13 -07:00
Samuel Angebault
4e87caf6bb
[202205][Arista] Ignore poll errors during get_event_change (#16304)
This is a backport of #16112
Handle exceptions gracefully within get_change_event
2023-08-29 11:43:20 -07:00
mssonicbld
d17ed9d9d6
[submodule] Update submodule sonic-sairedis to the latest HEAD automatically (#16293)
src/sonic-sairedis

* 70242e7 - (HEAD -> 202205, origin/202205) [CI]: Fix collect log error in azp template. (#1281) (2 days ago) [Nazarii Hnydyn]
2023-08-28 09:32:05 -07:00
mssonicbld
46e562b881
[ci/build]: Upgrade SONiC package versions (#16214) 2023-08-28 09:29:43 -07:00
mssonicbld
b0660ebb2d
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#16265)
src/sonic-utilities

* 1ed5b5a9 - (HEAD -> 202205, origin/202205) Add transceiver status CLI to show output from TRANSCEIVER_STATUS table (cherry-pick to 202205) (#2950) (4 days ago) [longhuan-cisco]
* ba327726 - Fix in config override when all asic namespaces not present in golden_config_db (#2946) (4 days ago) [judyjoseph]
2023-08-28 09:29:18 -07:00
mssonicbld
d264df3984
Dell S6100 Platform API 2.0 fixes (#16208) (#16252)
Why I did it
Dell S6100 Platform components need to be updated.

How I did it
Modified platform.json to fix the issue.

How to verify it
Run sonic-mgmt component test and check whether it passes.

Co-authored-by: Aravind Mani <53524901+aravindmani-1@users.noreply.github.com>
2023-08-25 17:05:37 -07:00
mssonicbld
f04206922a
[submodule] Update submodule sonic-platform-common to the latest HEAD automatically (#16264)
src/sonic-platform-common

* b41db16 - (HEAD -> 202205, origin/202205) Move tx_disable/tx_disabled_channel/rx_los/tx_fault  to get_transceiver_status API (#359) (#395) (32 hours ago) [longhuan-cisco]
2023-08-25 17:04:44 -07:00
mssonicbld
8757e6b8d9
[YANG SONIC-ACL] Fix Yang definition of IN_PORTS and OUT_PORTS (#16220) (#16235)
How I did it
Update Yang definition of IN_PORTS and OUT_PORTS to string.
Since we cannot split the string on commas (,) and validate that each substring is a valid SONiC port name, the only restriction is that the value must be a string.

How to verify it
Verified by building sonic_yang_models-1.0-py3-none-any.whl. While building the target package, unit tests were run and passed.
Built a SONiC image based on the 202205 branch and installed it on a physical DUT. Retried the steps in [Yang] Incorrect definition of IN_PORTS and OUT_PORTS in sonic-acl.yang #16190 and saw the success response below:

Co-authored-by: Zhijian Li <zhijianli@microsoft.com>
2023-08-25 17:03:46 -07:00
judyjoseph
d91565ba5e
sudo not required explicitly as /bin/ip netns identify is part of READ_ONLY_CMDS in sudoers file (#16258)
Cherry-pick of PR #16115
2023-08-25 17:02:26 -07:00
Junchao-Mellanox
d19d904f6a
[Mellanox] Fix issue: watchdogutil command does not work (#16091) (#16260)
- Why I did it
watchdogutil uses the platform API watchdog instance to control and query watchdog status. The Nvidia watchdog implementation caches the "armed" status in an object member, "WatchdogImplBase.armed". This does not work for the CLI infrastructure, because each CLI invocation creates a new watchdog instance and the status cached in the previous instance is lost. Consider the following commands:

admin@sonic:~$ sudo watchdogutil arm -s 100      =====> watchdog instance1, armed=True
Watchdog armed for 100 seconds
admin@sonic:~$ sudo watchdogutil status             ======> watchdog instance2, armed=False
Status: Unarmed
admin@sonic:~$ sudo watchdogutil disarm            =======> watchdog instance3, armed=False
Failed to disarm Watchdog

- How I did it
Use sysfs to query watchdog status
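
The difference between the two approaches can be sketched as below. This is an illustration only: `CachedWatchdog` and `is_armed_from_sysfs` are hypothetical names, and the state file path and the "active" value are assumptions modeled on the generic Linux watchdog sysfs `state` attribute, not on the actual Nvidia implementation.

```python
from pathlib import Path


class CachedWatchdog:
    """Keeps the armed flag in the object, so it is lost with each new instance."""

    def __init__(self):
        self.armed = False

    def arm(self):
        self.armed = True

    def is_armed(self):
        return self.armed


def is_armed_from_sysfs(state_file):
    """Read the armed status from a state file shared by every process.

    The path and the "active" value are assumptions for this sketch.
    """
    try:
        return Path(state_file).read_text().strip() == "active"
    except OSError:
        return False  # treat a missing or unreadable file as unarmed
```

A fresh `CachedWatchdog()` always reports unarmed, whereas the file-based query returns the same answer no matter which process or instance asks.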

- How to verify it
Manual test
Unit test
Conflicts:
	platform/mellanox/mlnx-platform-api/sonic_platform/watchdog.py
	platform/mellanox/mlnx-platform-api/tests/test_watchdog.py
2023-08-25 17:01:37 -07:00
Junchao-Mellanox
611449dc88
Fix issue: systemctl daemon-reload would sporadically cause udev handler fail (#15253) (#15959)
A workaround that backports the fix for a systemd issue.

The systemd issue: systemd/systemd#24668
The systemd PR to fix the issue: https://github.com/systemd/systemd/pull/24673/files

The formal solution is to upgrade systemd to a version that contains the fix. However, systemd is a very basic service, and upgrading it requires heavy testing.
2023-08-22 09:54:56 -07:00
mssonicbld
be818f146f
[submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically (#16196)
src/sonic-platform-daemons

* b01c88d - (HEAD -> 202205, origin/202205) [ycable] Curb log messages in active-active by changing verbosity level; fix missing namespaces in delete event handle (#391) (4 days ago) [vdahiya12]
2023-08-22 08:58:06 -07:00
James An
c63df22c7d
Update cisco-8000.ini (#16171)
Release Notes for Cisco T0 and 8102-64H.
• Fix for PSUD crash when PSUs are inserted in an operational system
• Fix for VxLAN counters not incrementing in 'show vxlan counter' and 'show platform npu vxlan counters'
• Fix for continuous error messages reported by thermalctld
• Fix for dshell client enable/disable causing syncd crash
• Support for 9100 TPID for Cisco fanout.
• Caveat: Drop counters for packets with invalid VLAN tag are counted twice.

Release Notes for Cisco 8101-32FH:
• Aikido FPD 1.89 Upgrade
2023-08-21 09:32:41 -07:00
Pavan-Nokia
5d4a201453
[armhf][Nokia-7215]Add HWSKU files for new SAI (#16175)
Add new easy bringup (EZB) files for new SAI 1.10.2-5
2023-08-18 11:23:23 -07:00
Rajkumar-Marvell
1d3b2b6383
[Marvell] Update armhf sai debian (#16172)
Added fix for IPv6 Egress ACL, dir_bcast testcase failures.

Signed-off-by: rajkumar38 <rpennadamram@marvell.com>
2023-08-18 11:22:40 -07:00
zitingguo-ms
0184109a8c
upgrade XGS SAI to 7.1.54.4-3 (#16201)
Update SAI xgs version to 7.1.54.4-3 to include the following XGS changes:

7.1.54.3-1: Port SONIC-62323 to SAI 7.1, Use single NH instead of ecmp
7.1.54.3-2: [SAI_BRANCH rel_ocp_sai_7_1] ECMP group expansion fail due to no resources
7.1.54.3-3: Fix capability for Hostif queue on SAI version 7.1

Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
2023-08-18 11:22:01 -07:00
mssonicbld
5be045beed
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#16198)
src/sonic-utilities

* 56a1ae24 - (HEAD -> 202205, origin/202205) clear: Fix clear queuecounters to also clear VOQ counters (#2879) (10 hours ago) [Patrick MacArthur]
2023-08-18 11:21:13 -07:00
mssonicbld
a5eda5aaa8
Updated PG headroom settings for 40g port speed (#16038) (#16177)
Co-authored-by: vmittal-msft <46945843+vmittal-msft@users.noreply.github.com>
2023-08-17 08:40:00 -07:00
mssonicbld
f95031b5ab
[ci/build]: Upgrade SONiC package versions (#16124) 2023-08-16 13:30:16 -07:00
mssonicbld
a61bb76026
[submodule] Update submodule sonic-platform-common to the latest HEAD automatically (#16153)
src/sonic-platform-common

* a6dd67e - (HEAD -> 202205, origin/202205) Comment out tx power validation check and program the passed value  (#389) (29 hours ago) [abdosi]
2023-08-16 13:29:34 -07:00
Zhaohui Sun
a098b92591
[202205]Change orchagent pop batch size from 8192 to 1024 (#16126)
### Why I did it
A Lua script running in the background may keep redis-server quite busy if the batch size is 8192.
If the handling time exceeds the default 5s, redis-server will not respond to other processes, which causes syncd to crash.

```
Aug  9 07:46:29.512326 str-s6100-acs-5 INFO database#supervisord: redis 68:M 09 Aug 2023 07:46:29.511 # Lua slow script detected: still in execution after 5186 milliseconds. You can try killing the script using the SCRIPT KILL command. Script SHA1 is: 88270a7c5c90583e56425aca8af8a4b8c39fe757
Aug  9 07:46:29.523716 str-s6100-acs-5 ERR syncd#syncd: :- checkReplyType: Expected to get redis type 5 got type 6, err: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.
Aug  9 07:46:29.524818 str-s6100-acs-5 INFO syncd#supervisord: syncd terminate called after throwing an instance of '
Aug  9 07:46:29.525268 str-s6100-acs-5 ERR pmon#CCmisApi: :- checkReplyType: Expected to get redis type 5 got type 6, err: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.
Aug  9 07:46:29.526148 str-s6100-acs-5 INFO syncd#supervisord: syncd std::system_error'
Aug  9 07:46:29.528308 str-s6100-acs-5 ERR pmon#psud[32]: :- checkReplyType: Expected to get redis type 5 got type 6, err: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.
Aug  9 07:46:29.529048 str-s6100-acs-5 ERR lldp#python3: :- guard: RedisReply catches system_error: command: *2#015#012$3#015#012DEL#015#012$27#015#012LLDP_ENTRY_TABLE:Ethernet37#015#012, reason: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.: Input/output error
Aug  9 07:46:29.529720 str-s6100-acs-5 ERR snmp#python3: :- guard: RedisReply catches system_error: command: *2#015#012$7#015#012HGETALL#015#012$28#015#012COUNTERS:oid:0x100000000000a#015#012, reason: BUSY Redis is busy running a script. You can only call SCRIPT KILL or SHUTDOWN NOSAVE.: Input/output error
```

88270a7c5c90583e56425aca8af8a4b8c39fe757 is /usr/share/swss/consumer_state_table_pops.lua
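
Redis identifies a cached script by the SHA1 digest of its source, so a digest from the log can be matched to a file by hashing candidate scripts. A minimal sketch, assuming the file on disk is byte-for-byte what was loaded into Redis (`redis_script_sha1` is a hypothetical helper name):

```python
import hashlib


def redis_script_sha1(script_source: bytes) -> str:
    """Return the hex SHA1 digest of a Lua script body."""
    return hashlib.sha1(script_source).hexdigest()


# Example usage on a switch (path taken from the message above):
#     with open("/usr/share/swss/consumer_state_table_pops.lua", "rb") as f:
#         print(redis_script_sha1(f.read()))
```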

##### Work item tracking
- Microsoft ADO **24741990**:

#### How I did it
Change batch size from 8192 to 1024.
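
The effect of a smaller batch size can be pictured with a plain queue: each pop handles at most `batch_size` entries, so the time spent in any single call is bounded. This is an illustrative sketch with a hypothetical `pop_batch` helper, not the orchagent or Lua implementation.

```python
from collections import deque


def pop_batch(pending, batch_size=1024):
    """Remove and return up to batch_size entries from the front of the queue."""
    batch = []
    while pending and len(batch) < batch_size:
        batch.append(pending.popleft())
    return batch


pending = deque(range(3000))
sizes = []
while pending:
    sizes.append(len(pop_batch(pending)))
print(sizes)  # [1024, 1024, 952]
```

The same amount of work is still done overall, but it is split into more, shorter calls.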

#### How to verify it
Run all test cases in sonic-mgmt to verify the system stability.

### Tested branch (Please provide the tested image version)

- [x] 20220531.36
2023-08-14 17:54:39 -07:00
mssonicbld
a7193556aa
[submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically (#16108)
src/sonic-platform-daemons

* f5a0ffc - (HEAD -> 202205, origin/202205) Update active application selected code in transceiver_info table aft… (#381) (4 hours ago) [Michael Wang - TW]
2023-08-11 13:24:51 -07:00
mssonicbld
c34e303c6f
Update the iSMART_64 tool (#15936) (#16103)
Why I did it
Updating the iSMART_64 tool for supporting latest debian releases.

How I did it
On branch new_ismart
Changes to be committed:
(use "git restore --staged ..." to unstage)
modified: platform/broadcom/sonic-platform-modules-dell/s6100/scripts/iSMART_64

How to verify it
In s6100, run the iSMART_64 tool.
md5sum - 24725730d7649769c7ba50971c1f2955

Co-authored-by: Santhosh Kumar T <53558409+santhosh-kt@users.noreply.github.com>
2023-08-11 11:08:34 -07:00
mssonicbld
6cc1846562
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#16110)
src/sonic-utilities

* 0f001c56 - (HEAD -> 202205, origin/202205) UT change: for db_migrator test do not check for RESTAPI cert values (#2919) (4 hours ago) [Vaibhav Hemant Dixit]
* 69d348d1 - [CLI][Show][BGP] Show BGP Change for no neighbor scenario (#2885) (6 hours ago) [Dev Ojha]
* 4c6af3c3 - [multi-asic] Refine [override config table] for corner cases (#2918) (6 hours ago) [wenyiz2021]
* bef3ffeb - [db_migrator] Set docker_routing_config_mode to the value obtained from minigraph parser (#2890) (#2922) (7 hours ago) [Vaibhav Hemant Dixit]
2023-08-11 08:44:17 -07:00
mssonicbld
776201cf30
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#16109)
src/sonic-swss

* 3e2974df - (HEAD -> 202205, origin/202205) [muxorch] set mux state to init upon warm reboot (#2834) (4 hours ago) [Nikola Dancejic]
2023-08-11 08:43:53 -07:00
mssonicbld
ba8a88a15d
[submodule] Update submodule sonic-platform-common to the latest HEAD automatically (#16107)
src/sonic-platform-common

* 3b993c5 - (HEAD -> 202205, origin/202205) [Credo][Ycable] enhancement and error exception for some APIs (#303) (7 hours ago) [Xinyu Lin]
* ab91fde - [ycable] add definitions of some new API's for Y-Cable infrastructure (#301) (7 hours ago) [vdahiya12]
* 2b551f2 - [Credo][Ycable] fix incorrect uart statistics (#296) (7 hours ago) [Xinyu Lin]
2023-08-11 08:42:29 -07:00
mssonicbld
7109ee0525
[submodule] Update submodule linkmgrd to the latest HEAD automatically (#16105)
src/linkmgrd

* 6ce71ba - (HEAD -> 202205, origin/202205) Add ADO to the PR template (#215) (4 hours ago) [Longxiang Lyu]
* 1010d93 - [active-standby] Write `unhealthy` is default route `N/A` (#214) (4 hours ago) [Longxiang Lyu]
* 15e9ca2 - [link prober] Increase pause/restart probe log verbosity (#213) (4 hours ago) [Longxiang Lyu]
2023-08-11 08:41:53 -07:00
mssonicbld
628e1ad981
[SNMP][IPv6]: Fix to use link local IPv6 address as snmp agentAddress (#16013) (#16102)
#### Why I did it
fixes: https://github.com/sonic-net/sonic-buildimage/issues/16001
Caused by: https://github.com/sonic-net/sonic-buildimage/pull/15487

The above PR introduced a change to use the management and Loopback IPv4 and IPv6 addresses as the snmp agent address in the snmpd.conf file.
With this change, if a link-local IP address is configured as the management or Loopback IPv6 address, snmpd tries to open a socket on that IPv6 address and fails with the error below:
```
Error opening specified endpoint "udp6:[fe80::5054:ff:fe6f:16f0]:161"
Server Exiting with code 1
```
Per RFC4007, to specify a non-global IPv6 address without ambiguity, we need to use the zone id along with the IPv6 address: <address>%<zone_id>
Reference: https://datatracker.ietf.org/doc/html/rfc4007

##### Work item tracking
- Microsoft ADO **(number only)**:

#### How I did it
Modify snmpd.conf file to use the %zone_id representation for ipv6 address.
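
The idea can be sketched as follows. This is not the template change itself: `agent_address` is a hypothetical helper, and the `udp:` / `udp6:[...]` endpoint strings are assumed from the error message quoted above. Only link-local IPv6 addresses get the `%<zone_id>` suffix, with the interface name used as the zone id.

```python
import ipaddress


def agent_address(ip, ifname, port=161):
    """Build an snmpd endpoint string, adding a zone id for link-local IPv6."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 4:
        return f"udp:{addr}:{port}"
    if addr.is_link_local:
        # A link-local address is ambiguous without its zone (interface).
        return f"udp6:[{addr}%{ifname}]:{port}"
    return f"udp6:[{addr}]:{port}"


print(agent_address("fe80::5054:ff:fe6f:16f0", "eth0"))
# udp6:[fe80::5054:ff:fe6f:16f0%eth0]:161
```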
#### How to verify it
In VS testbed, modify config_db to use link local ipv6 address as management address:
    "MGMT_INTERFACE": {
        "eth0|10.250.0.101/24": {
            "forced_mgmt_routes": [
                "172.17.0.1/24"
            ],
            "gwaddr": "10.250.0.1"
        },
        "eth0|fe80::5054:ff:fe6f:16f0/64": {
            "gwaddr": "fe80::1"
        }
    },

Execute config_reload after the above change.
snmpd comes up; check that snmpd is listening on the IPv4 and IPv6 addresses:
```
admin@vlab-01:~$ sudo netstat -tulnp | grep 161
tcp        0      0 127.0.0.1:3161          0.0.0.0:*               LISTEN      274060/snmpd        
udp        0      0 10.1.0.32:161           0.0.0.0:*                           274060/snmpd        
udp        0      0 10.250.0.101:161        0.0.0.0:*                           274060/snmpd        
udp6       0      0 fc00:1::32:161          :::*                                274060/snmpd        
udp6       0      0 fe80::5054:ff:fe6f::161 :::*                                274060/snmpd      -- Link local 
 
admin@vlab-01:~$ sudo ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.250.0.101  netmask 255.255.255.0  broadcast 10.250.0.255
        inet6 fe80::5054:ff:fe6f:16f0  prefixlen 64  scopeid 0x20<link>
        ether 52:54:00:6f:16:f0  txqueuelen 1000  (Ethernet)
        RX packets 36384  bytes 22878123 (21.8 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 261265  bytes 46585948 (44.4 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

admin@vlab-01:~$ docker exec -it snmp snmpget -v2c -c public fe80::5054:ff:fe6f:16f0 1.3.6.1.2.1.1.1.0
iso.3.6.1.2.1.1.1.0 = STRING: "SONiC Software Version: SONiC.master.327516-04a6031b2 - HwSku: Force10-S6000 - Distribution: Debian 11.7 - Kernel: 5.10.0-18-2-amd64"
```
Logs from snmpd:
```
Turning on AgentX master support.
NET-SNMP version 5.9
Connection from UDP/IPv6: [fe80::5054:ff:fe6f:16f0%eth0]:44308
```
Ran the test_snmp_loopback test to check that loopback IPv4 and IPv6 work:
```
./run_tests.sh -n vms-kvm-t0 -d vlab-01 -c snmp/test_snmp_loopback.py  -f vtestbed.yaml -i ../ansible/veos_vtb -e "--skip_sanity --disable_loganalyzer" -u
=== Running tests in groups ===
Running: pytest snmp/test_snmp_loopback.py --inventory ../ansible/veos_vtb --host-pattern vlab-01 --testbed vms-kvm-t0 --testbed_file vtestbed.yaml --log-cli-level warning --log-file-level debug --kube_master unset --showlocals --assert plain --show-capture no -rav --allow_recover --ignore=ptftests --ignore=acstests --ignore=saitests --ignore=scripts --ignore=k8s --ignore=sai_qualify --junit-xml=logs/tr.xml --log-file=logs/test.log --skip_sanity --disable_loganalyzer
..                                                                        

snmp/test_snmp_loopback.py::test_snmp_loopback[vlab-01] PASSED 
```
#### Which release branch to backport (provide reason below if selected)

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [x] 202012
- [x] 202106
- [x] 202111
- [x] 202205
- [x] 202211
- [x] 202305

Co-authored-by: SuvarnaMeenakshi <50386592+SuvarnaMeenakshi@users.noreply.github.com>
2023-08-11 08:40:50 -07:00
mssonicbld
5092a37a5c
[Mellanox] Remove unnecessary file manipulation in the SAI Make file (#15993) (#16101)
Signed-off-by: Kebo Liu <kebol@nvidia.com>
Co-authored-by: Kebo Liu <kebol@nvidia.com>
2023-08-11 08:40:30 -07:00
mssonicbld
270820c1cf
[chassis]: removed dependency for bgp and swss for chassis supervisor (#15734) (#16099)
Fixes #15667 and #13293

Work item tracking
Microsoft ADO 24472854:

How I did it
On chassis supervisor bgp feature is disabled in hostcfgd. The dependency between swss and bgp causes the bgp containers to start even though the feature is disabled.

How to verify it
Tests on chassis supervisor and LC

Co-authored-by: Arvindsrinivasan Lakshmi Narasimhan <55814491+arlakshm@users.noreply.github.com>
2023-08-11 08:39:22 -07:00
mssonicbld
f835098361
Fix CONFIG_DB_INITIALIZED flag check logic and set/reset flag for warmboot (#15685) (#16098)
* Fix CONFIG_DB_INITIALIZED flag check logic and set/reset flag for warm-reboot
* Fix db-cli usage
* Handle same image warm-reboot and generalize handling of INIT flag
* Cover boot from ONIE case: set config init flag when minigraph, config_db are missing
* Handle case: first boot of SONiC
* Check for config init flag
* Simplify logic, and do not call db_migrator for same image reboot

Co-authored-by: Vaibhav Hemant Dixit <vaibhav.dixit@microsoft.com>
2023-08-11 08:38:59 -07:00