Commit Graph

7379 Commits

Author SHA1 Message Date
Alpesh Patel
4ee9565064 qos template change for backend compute-ai deployment (#16150)
#### Why I did it

To enable qos config for a certain backend deployment mode, for resource-type "Compute-AI".
This deployment has the following requirements:

- Config below is enabled if DEVICE_TYPE is one of backend_device_types
- Config below is enabled if ResourceType is 'Compute-AI'
- 2 lossless TCs (2, 3)
- 2 lossy TCs (0, 1)
- DSCP to TC map uses 4 DSCP code points and maps them to the TCs as follows:
    "DSCP_TO_TC_MAP": {
        "AZURE": {
            "48": "0",
            "46": "1",
            "3": "3",
            "4": "4"
        }
    }

- WRED profile has green {min/max/mark%} as {2M/10M/5%}
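
As an illustration of how the DSCP to TC map above is read, here is a minimal Python sketch. The map values are copied from this commit message; the helper name `tc_for_dscp` and the choice to return `None` for an unmapped code point are assumptions made for the example, not behavior described by the PR.

```python
# DSCP-to-TC map copied from the commit message above.
DSCP_TO_TC_MAP = {
    "AZURE": {
        "48": "0",
        "46": "1",
        "3": "3",
        "4": "4",
    }
}


def tc_for_dscp(dscp, profile="AZURE"):
    """Return the traffic class for a DSCP code point, or None if unmapped.

    Hypothetical helper: returning None for unmapped values is an
    assumption of this sketch, not something the PR specifies.
    """
    tc = DSCP_TO_TC_MAP[profile].get(str(dscp))
    return None if tc is None else int(tc)


if __name__ == "__main__":
    for code_point in (48, 46, 3, 4):
        print(code_point, "->", tc_for_dscp(code_point))
```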

This required a template change (as in the PR) in addition to the vendor qos.json.j2 file (not included here).

### How I did it

#### How to verify it
- with the above change and the vendor config change, generated the qos.json file and verified that the objective stated in "Why I did it" was met

- verified no error

### Description for the changelog
Update qos_config.j2 for Compute-AI deployment on one of the backend device type roles
2023-09-21 18:34:15 +08:00
mssonicbld
996ce9b9ad
[submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically
src/sonic-platform-daemons

* 198f300 - (HEAD -> 202205, origin/202205) [pmon]chassisd crash fix (#396)
2023-09-20 12:14:37 -07:00
Aravind Mani
d2fe62322e [devices]: Dell S6100 API 2.0 fix (#16363)
Why I did it
sonic-mgmt test failure is seen for update_firmware component API

Microsoft ADO: 25208748

How I did it
Edited API 2.0 to fix this issue.

How to verify it
Run sonic-mgmt test after the fix and verify it passes.
2023-09-19 10:25:41 -07:00
vganesan-nokia
5281005304
[swss] Chassis db clean up optimization and bug fixes (#16454) (#16541)
* [swss] Chassis db clean up optimization and bug fixes

This commit includes the following changes:
    - Fix for regression failure due to error in finding CHASSIS_APP_DB in
    pizzabox (#PR 16451)
    - After attempting to delete the system neighbor entries from
    chassis db, and before starting to clear the system interface entries,
    wait for some time only if some system neighbors were deleted.
    If no system neighbor entries were deleted for the asic coming up,
    there is no need to wait.
    - Similar changes for system lag delete. Before deleting the
    system lag, wait for some time only if some system lag members were
    deleted. If no system lag members were deleted, there is no need to wait.
    - Flush the SYSTEM_NEIGH_TABLE from the local STATE_DB. While the asic
    is coming up, when system neigh entries are deleted from chassis app
    db (as part of chassis db clean up), there is no orch/process running to
    process the delete messages from chassis redis. Because of this, stale system
    neigh entries are present in the local STATE_DB. The stale entries result in
    creation of orphan (no corresponding data path/asic db entry) kernel neigh
    entries during STATE_DB:SYSTEM_NEIGH_TABLE entries processing by nbrmgr (after
    the swss service came up). This is avoided by flushing the SYSTEM_NEIGH_TABLE from
    the local STATE_DB when the service comes up.
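
The "wait only if something was deleted" rule described above can be sketched as follows. This is an illustration in Python under stated assumptions, not the code from the PR: a plain dict stands in for a chassis db table, and `cleanup_then_wait`, `owned_by_asic` and the injectable `sleep` are hypothetical names.

```python
import time


def cleanup_then_wait(entries, owned_by_asic, wait_seconds, sleep=time.sleep):
    """Delete entries owned by the asic coming up; wait only if any were deleted.

    `entries` is a dict standing in for a chassis db table (assumption).
    `owned_by_asic` is a hypothetical predicate on the entry key.
    Returns the number of deleted entries.
    """
    stale = [key for key in entries if owned_by_asic(key)]
    for key in stale:
        del entries[key]
    if stale:
        # Give consumers time to process the deletes before the next step.
        sleep(wait_seconds)
    return len(stale)
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; when nothing was deleted, the function returns immediately.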

Signed-off-by: vedganes <veda.ganesan@nokia.com>

* [swss] Chassis db clean up bug fixes review comment fix - 1

Debug logs added for deletion of other tables (SYSTEM_INTERFACE and SYSTEM_LAG_TABLE)

Signed-off-by: vedganes <veda.ganesan@nokia.com>

---------

Signed-off-by: vedganes <veda.ganesan@nokia.com>
(cherry picked from commit b13b41fc22)
2023-09-14 14:07:15 -07:00
anamehra
561c71de43 Chassis: fix pmon docker failure when DEVICE_METADATA is not available (#16527)
Signed-off-by: anamehra anamehra@cisco.com

Added a check for DEVICE_METADATA before accessing the data. This prevents the j2 failure when var is not available.
2023-09-14 09:29:06 +08:00
mssonicbld
b4ab3e01df
Run db_migrator for non first-time reboots (#16116) (#16520) 2023-09-12 18:40:30 +08:00
Rajendra Kumar Thirumurthi
dbfa8f9660
[frr]: lib: Fix corruption when routemap delete/add sequence happens (#16456)
Why I did it
A Zebra core is sometimes seen during config reload. A series of route-map deletions followed by re-adds triggers the hash table to realloc and grow to a larger size; subsequent route-map operations then run against a corrupted hash table.

The issue is seen when BFD is enabled on the static route table: static route-maps are created/deleted based on BFD session state. However, the issue itself is generic from the FRR perspective.

This issue has detailed core info: sonic-net/sonic-frr#37. This PR fixes this issue.
Fixes sonic-net/sonic-frr#37

Work item tracking
Microsoft ADO (17952227):

How I did it
This fix is already in Master frr/8.2.5. Porting this fix to 202205 branch to address this Zebra core.
sonic-net/sonic-frr@5f503e5

Solution:
The whole purpose of the delay of deletion and the storage of the route-map is to allow the using protocol the ability to process the route-map at a later time while still retaining the route-map name (for more efficient reprocessing). The problem exists because we are keeping multiple copies of deletion events that are indistinguishable from each other, causing hash havoc.

How to verify it
Verified running sonic-mgmt test, doing multiple config reloads.
2023-09-08 23:19:07 -07:00
anamehra
2b302e83c0 chassis-packet: Update arp_update script for FAILED and STALE check (#16311)

1. Fixing an issue with FAILED entry resolution retry.
Neighbor entries in arp table may sometimes enter a FAILED state when the far end is down and reports the state as follows:
2603:10e2:400:3::1 dev PortChannel19 router FAILED
While the arp_update script handles the entries for FAILED in the following format, the above was not handled due to the token location (extra router keyword at index 4):
2603:10e2:400:3::1 dev PortChannel19 FAILED

The former format may appear if an arp resolution is tried on a link that is known but the far end goes down, e.g., pinging a STALE entry while the far end is down.

2. Refreshing STALE entries to make sure the far end is reachable.
STALE entries for some backend ports may appear in chassis-packet when no traffic is received for a while on the port. When the far end goes down, BFD is expected to stop sending packets on the session for which the far end is not reachable. But as the entry is known as stale, on the Cisco chassis, BFD keeps sending packets. Refreshing the stale entry keeps active links reachable in the neighbor table, while entries whose far end is down enter the FAILED state. FAILED entries are retried and become reachable when the far end comes back up.
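
To illustrate the token-position problem from item 1, here is a small Python sketch that classifies neighbor lines by their last token instead of a fixed index, so both quoted formats are handled. It is only an illustration: the real arp_update script is not shown in this log, and the helper names below are hypothetical.

```python
def neighbor_state(line):
    """Return the state token of an `ip neigh`-style line, or None if empty.

    Assumes the state is the last whitespace-separated token, which holds
    for both formats quoted in the commit message:
      <ip> dev <interface> FAILED
      <ip> dev <interface> router FAILED
    """
    tokens = line.split()
    return tokens[-1] if tokens else None


def entries_to_refresh(lines):
    """Collect (ip, interface) pairs for entries in FAILED or STALE state."""
    result = []
    for line in lines:
        tokens = line.split()
        if len(tokens) >= 4 and tokens[1] == "dev" and tokens[-1] in ("FAILED", "STALE"):
            result.append((tokens[0], tokens[2]))
    return result
```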
2023-09-09 09:26:53 +08:00
mssonicbld
91382fe31c
[Nokia][sonic-platform] Update Nokia sonic-platform submodule (#16348) (#16503) 2023-09-09 09:03:31 +08:00
mssonicbld
32f23dd786
Update macsec CAK keys in profile for tests to change to type7 encoded format (#16388) (#16499) 2023-09-09 06:23:49 +08:00
mssonicbld
85f357e88a
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#16455)
src/sonic-swss

* 33d81e7f - (HEAD -> 202205, origin/202205) Support type7 encoded CAK key for macsec in config_db (#2892) (2 days ago) [judyjoseph]
2023-09-08 09:51:56 -07:00
vdahiya12
8f65b7874f
[minigraph] remove number of lanes check for changing speed from 400G to 100G and set speed setting before lane reconfiguration (#16452)
* [minigraph] remove number of lanes check for changing speed from 400G to 100G and set speed setting before lane reconfiguration   (#15721)

8111 800G interface, split to 2x400G (each has 4 lanes) fails to change interface speed from 400G to 100G during deploy mg. In minigraph.xml, the interface speed configuration is good, but fails to generate the right value to config_db.json.

In order to support this SKU the speed transitioning should support both 4 lanes and 8 lanes in the port_config.ini.

Why I did it

Before this change, for a 400G to 100G transition, in all cases except when the number of lanes is 8, we would continue and the line
ports.setdefault(port_name, {})['speed'] = port_speed_png[port_name]
would not be executed. Hence the default speed would never be set, and config_db would not be updated,
when the speed is transitioning from 400G to 100G or 40G but the number of lanes is not equal to 8.

For those cases to pass where the number of lanes is not specifically 8, we need this change.
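
The effect of the fix can be sketched in Python as below. Only the `ports.setdefault(...)` line is taken from the commit message; the wrapper function `apply_png_speeds` is a hypothetical name, and the lane reconfiguration that follows in the real parser is omitted.

```python
def apply_png_speeds(ports, port_speed_png):
    """Record the PNG speed for every port, regardless of lane count.

    Sketch of the post-fix behavior: there is no early `continue` that
    depends on the number of lanes, so the speed is always written.
    """
    for port_name in port_speed_png:
        ports.setdefault(port_name, {})['speed'] = port_speed_png[port_name]
    return ports


if __name__ == "__main__":
    # A 4-lane port moving from 400G to 100G now gets its speed updated.
    ports = {"Ethernet0": {"lanes": "1,2,3,4", "speed": "400000"}}
    print(apply_png_speeds(ports, {"Ethernet0": "100000"}))
```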

Work item tracking
24242657

Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>

* fix UT

Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>

---------

Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
2023-09-07 16:57:37 -07:00
mssonicbld
0bc0068163
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#16481)
src/sonic-utilities

* 787b4a32 - (HEAD -> 202205, origin/202205) Remove SFP index usage in generating list of SFP hw error (#2961) (6 hours ago) [Prince George]
2023-09-07 10:17:23 -07:00
mssonicbld
70ff54ccc4
[submodule] Update submodule sonic-platform-common to the latest HEAD automatically (#16475)
src/sonic-platform-common

* 6a38e71 - (HEAD -> 202205, origin/202205) Default implementation of under/over speed checks (#382) (10 minutes ago) [spilkey-cisco]
* 9f2f61d - Convert the tx/rx power unit to the dBm unit (#377) (11 minutes ago) [ChiouRung Haung]
2023-09-06 17:58:58 -07:00
mssonicbld
7a9c05c1e7
[yang] Add Bmc to Device Neighbor Metadata element type list (#16188) (#16470)
Bmc is a valid neighbor type in minigraph, but it was missing from the YANG model definition. Usually, a Bmc type device can be a neighbor of BmcMgmtToRRouter. This PR introduces this type.

Co-authored-by: Yaqiang Zhu <zyq1512099831@gmail.com>
2023-09-06 16:31:18 -07:00
mssonicbld
7f35f4c200
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#16476)
src/sonic-utilities

* 03292ffe - (HEAD -> 202205, origin/202205) Fix show acl table for masic (#2937) (6 minutes ago) [Arvindsrinivasan Lakshmi Narasimhan]
* 627a2f59 - [Techsupport] Update the message seen during the lock acquisition failure (#2897) (55 minutes ago) [Vivek]
2023-09-06 16:10:36 -07:00
mssonicbld
f2f8f5f7a9
[submodule] Update submodule linkmgrd to the latest HEAD automatically (#16474)
src/linkmgrd

* 4bf3ebb - (HEAD -> 202205, origin/202205) [active-standby] Fix extra toggle observed in `config reload` (#216) (53 minutes ago) [Longxiang Lyu]
2023-09-06 16:10:06 -07:00
mssonicbld
95f9f44958
[YANG][vlan-sub-interface] Add vlan field (#15838) (#16469)
* [YANG][vlan-sub-interface] Add `vlan` field



* Fix typo



* Fix UT



---------

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
Co-authored-by: Longxiang Lyu <35479537+lolyu@users.noreply.github.com>
2023-09-06 16:09:40 -07:00
mssonicbld
86b4d38bd3
[YANG SONIC-ACL] Fix Yang definition of ACL_TABLE_TYPE (#16247) (#16472)
How I did it
Update Yang definition of ACL_TABLE_TYPE.
Update existing testcase.
Add new testcase to cover lowercase key scenario.

How to verify it
Verified by building sonic_yang_models-1.0-py3-none-any.whl. While building the target package, unit tests were run and passed.

Co-authored-by: Zhijian Li <zhijianli@microsoft.com>
2023-09-06 16:09:17 -07:00
mssonicbld
4adaa2854e
[minigraph-parser] Update the definition of acl table type BMCDATA and BMCDATAV6 (#16249) (#16473)
Why I did it
According to ACL-Table-Type-HLD, the value type of MATCHES, ACTIONS and BIND_POINTS should be list instead of string. Opening this PR to update the definition of BMCDATA and BMCDATAV6.

How I did it
Update the definition of BMCDATA and BMCDATAV6 in minigraph-parser.

How to verify it
Verified by UT and build SONiC image.

Co-authored-by: Zhijian Li <zhijianli@microsoft.com>
2023-09-06 16:08:47 -07:00
mssonicbld
624a5d489f
[Mellanox] Revise label name and fix typo in sensor.conf of 4600C (#16271) (#16467)
- Why I did it
Revise label name and fix typo in sensor.conf of 4600C

- How I did it
Revise label name and fix typo in sensor.conf of 4600C

- How to verify it
Manual test
sonic-mgmt test_sensors.py

Co-authored-by: Junchao-Mellanox <57339448+Junchao-Mellanox@users.noreply.github.com>
2023-09-06 16:08:27 -07:00
mssonicbld
0fe5c9fc7d
[platform]: Disable interrupt for intel i2c-i801 driver (#16309) (#16457)
On S6100 we are seeing almost 100K interrupts per second on Intel's i801 SMBus controller, which affects system performance.

We now disable the i801 driver interrupt and enable polling instead.

Microsoft ADO (number only): 24910530

How I did it
Disable the interrupt by passing the interrupt disable feature argument to i2c-i801 driver

How to verify it
This fix is NOT applicable to ARM-based platforms. It applies only to Intel-based platforms:

- On SN2700 it's already disabled in Mellanox hw-mgmt
- Celestica DX010 and E1031
- Dell S6100 verified the interrupts are no longer incrementing.
- Arista 7260CX3

Signed-off-by: Prince George <prgeor@microsoft.com>
Co-authored-by: Prince George <45705344+prgeor@users.noreply.github.com>
2023-09-06 09:49:58 -07:00
mssonicbld
07955af2ed
[ci/build]: Upgrade SONiC package versions (#16316) 2023-09-05 21:54:50 -07:00
mssonicbld
89f091eded
[Mellanox] set select timeout to no more than 1 sec to make sure fast shutdown (#13611) (#16449)
- Why I did it
Commit sonic-net/sonic-platform-daemons@153ea47 changed SfpStateUpdateTask from Process to Thread. That commit raises an exception in SfpStateUpdateTask to make the shutdown flow fast. This does not work on the Nvidia platform because the Nvidia platform passes the timeout parameter of get_change_event to select, and the Linux select function cannot be interrupted by a Python exception. There was no such issue on the Nvidia platform before that commit. However, to comply with the commit and make the shutdown flow fast, we decided to change the Nvidia platform API implementation.

To fix issue #13591.

- How I did it
The select call in get_change_event should use no more than 1 second as its timeout parameter.
Outside the select call, add a while loop to make sure the timeout parameter of get_change_event works as expected.
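
The approach described here (a select timeout capped at 1 second, wrapped in a loop that honors the caller's overall timeout) can be sketched with Python's standard `select` module. This is a minimal sketch, not the Nvidia platform API code: `wait_readable`, `should_stop` and the use of seconds as the unit are assumptions made for the example.

```python
import select
import time

MAX_SELECT_TIMEOUT = 1.0  # seconds; per-call upper bound, as stated in the commit


def wait_readable(fds, timeout, should_stop=lambda: False):
    """Wait up to `timeout` seconds for any fd in `fds` to become readable.

    Each select call blocks for at most MAX_SELECT_TIMEOUT, so a stop
    request is noticed within about one second. `should_stop` is a
    hypothetical hook standing in for the shutdown signal.
    Returns the list of readable fds, or [] on timeout or stop.
    """
    deadline = time.monotonic() + timeout
    while not should_stop():
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        slice_timeout = min(remaining, MAX_SELECT_TIMEOUT)
        readable, _, _ = select.select(fds, [], [], slice_timeout)
        if readable:
            return readable
    return []
```

The outer loop preserves the caller's total timeout while keeping each blocking call short, which is the trade-off the commit describes.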

- How to verify it
Manual test

Co-authored-by: Junchao-Mellanox <57339448+Junchao-Mellanox@users.noreply.github.com>
2023-09-05 21:53:08 -07:00
mssonicbld
d5e2c0004f
Assign the higher metric value for Ipv6 default route learnt via RA message (#16367) (#16440)
* Fix the Loopback0 IPv6 address of LCs in chassis not being reachable from peer devices
* Assign a higher metric value to the IPv6 default route learnt via RA message so that the BGP-learnt default route has higher priority.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Co-authored-by: abdosi <58047199+abdosi@users.noreply.github.com>
2023-09-05 21:52:38 -07:00
mssonicbld
a9564286b2
[submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically (#16390)
src/sonic-platform-daemons

* 0258ecf - (HEAD -> 202205, origin/202205) [pmon][chassis][voq] Chassis DB cleanup when module is down (#394) (9 hours ago) [vganesan-nokia]
2023-09-05 21:48:11 -07:00
mssonicbld
8cac746a03
Fix openconfig_acl.py (#16303) (#16345)
How I did it
Fix the regex for L4 port range in openconfig_acl.py.

How to verify it
Built the image and installed it on an Arista-720DT DUT, then tried the repro steps in #16189 and confirmed that the ACL rule is set up correctly:

Co-authored-by: Zhijian Li <zhijianli@microsoft.com>
2023-09-05 21:47:55 -07:00
James An
1d3d70986f
Update cisco-8000.ini (#16387)
Why I did it

Common Release Notes for 8102-64H, T0/DualTor, and 8101-32FH

Fix for an issue where drop counters were incrementing twice for packets with invalid tag
Fix for the ECC errors reported in SR 695600099
Fix for fwutil show updates failure

How I did it

Update platform version to 202205.2.2.11
2023-09-05 21:42:59 -07:00
jcaiMR
8787b71e03
fix counter log issue on 32bits platform (#16357)
Cherry pick sonic-net/sonic-dhcpmon#11 into 202205.
2023-09-05 09:42:56 -07:00
Junchao-Mellanox
874ca68060
Fix issue: set has_timer attribute to true for platform monitor service (#15624)
There is a redundant line in init_cfg.json.j2 that causes the pmon service to always have "has_timer=False". However, PMON has a timer now, so this change fixes it.
2023-09-04 19:38:21 -07:00
Arvindsrinivasan Lakshmi Narasimhan
c4c2c00c11
submodule update sonic-platform-daemons (#16386)
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
2023-09-01 20:12:38 -07:00
mssonicbld
74dbafe728
[Nokia-IXR7250E] Modify the platform_ndk.json for Nokia-IXR7250E platform (#16355) (#16382)
Signed-off-by: mlok <marty.lok@nokia.com>
Co-authored-by: Marty Y. Lok <76118573+mlok-nokia@users.noreply.github.com>
2023-09-01 20:12:14 -07:00
Arvindsrinivasan Lakshmi Narasimhan
18fb27b84d
patch fix to ignore dup nh in netlink msg (#16385)
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
2023-09-01 20:11:43 -07:00
mssonicbld
f7f2e654c4
[chassis] Chassis DB cleanup when asic comes up (#16213) (#16378)
* [chassis]Chassis DB cleanup when asic comes up

Cleanup the entries from the following tables in chassis app db in
redis_chassis server in the supervisor
(1) SYSTEM_NEIGH
(2) SYSTEM_INTERFACE
(3) SYSTEM_LAG_MEMBER_TABLE
(4) SYSTEM_LAG_TABLE
As part of the clean up only those entries created by the asic that
is coming up are deleted. The LAG IDs used by the asics are also
de-allocated from SYSTEM_LAG_ID_TABLE and SYSTEM_LAG_ID_SET

- Added check to run the chassis db clean up only for voq switches.

Signed-off-by: vedganes <veda.ganesan@nokia.com>
Co-authored-by: vganesan-nokia <67648637+vganesan-nokia@users.noreply.github.com>
2023-09-01 16:20:31 -07:00
mssonicbld
88d692f987
[Nokia][DeviceData] Update the Nokia platform IXR-7250E device data (#16028) (#16381)
Why I did it
Update the platform_reboot of Nokia Platform IXR-7250E-36x400G to display the correct reboot-cause history when rebooting from the supervisor card.

Work item tracking
Microsoft ADO (number only):
How I did it
Modify the platform_reboot script to copy the correct reboot-cause.txt file from NDK to the /host/reboot-cause directory during the down cycle when the reboot is issued from the Supervisor (both for a reboot right after installing a new image and for a normal reboot)

Signed-off-by: mlok <marty.lok@nokia.com>
Co-authored-by: Marty Y. Lok <76118573+mlok-nokia@users.noreply.github.com>
2023-09-01 16:19:22 -07:00
Xichen96
35bb472601
[installer] add processor.max_cstate=1 to intel kernel cmdline for intel cpu (#16371)
This is a fix for PR #6051

The original PR disables the intel idle driver, but it cannot limit the max c-state to 1 because the system falls back to the acpi idle driver.

Currently intel_idle.max_cstate=0 is already present, which disables the intel idle driver. With the added option, the common idle driver is disabled as well, so there is no idle management. This prevents a bug that can be triggered by the idle instruction on the intel platform.

Work item tracking
Microsoft ADO (number only): 24867921

How I did it
Add the option to installer file beside intel_idle.max_cstate=0

Signed-off-by: Xichen Lin <lukelin0907@gmail.com>
2023-09-01 11:05:12 -07:00
mssonicbld
896b8e7209
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#16333)
src/sonic-swss

* d787d50d - (HEAD -> 202205, origin/202205) Remove fabric queue counters. (#2862) (2 days ago) [jfeng-arista]
* 4579d43f - update portStatIds for cisco (#2876) (3 days ago) [Zhixin Zhu]
2023-09-01 09:10:46 -07:00
mssonicbld
95d7d440c2
[submodule] Update submodule sonic-linux-kernel to the latest HEAD automatically (#16331)
src/sonic-linux-kernel

* db00eb9 - (HEAD -> 202205, origin/202205) PATCH] net: allow user to set metric on default route learned via Router Advertisement (#326) (2 days ago) [abdosi]
2023-09-01 09:10:11 -07:00
vmittal-msft
134a22221c
Update CPU transmitted packets to queue 7 for chassis (#16349) 2023-08-31 08:57:02 -07:00
Tejaswini Chadaga
56d6ed14c0
[202205] Update Broadcom DNX SAI version to 7.1.60.4 (#16351)
To include the following fixes:

DNX:

CS00012287482 - Support for 1024 LAGs on DNX (Added back fix reverted in [202205] Update Broadcom DNX SAI version to 7.1.54.4 #15850)
CS00012302400 - New SAI 7.1.50.4 caused regression in sonic-mgmt ACL test &
ACL entry creation failing with SAI_STATUS_INVALID_PORT_NUMBER in SAI 7.1.50.4
(CS00012302347)
CS00012302163 - SAI_API_BRIDGE:_brcm_sai_bridge_port_learn_flag:1620 sai bridge lag port list get. failed with error -7.
CS00012296571 - LACP packets are queued to Queue 0 instead of Queue 7
CS00012301919 - The traffic is queued to VOQ 8 sometimes instead of destination port's VOQ
CS00012297160 - [SONIC] [J2C+] Traffic to unknown destination route getting enqueued on VOQ 10
CS00012298730 - [7.x][J2/J2C+] : Treat Q=0 as lowest priority and Q=7 as highest priority in Strict Priority Scheduling
Also includes -
XGS:

Port SONIC-62323 to SAI 7.1, Use single NH instead of ecmp
[SAI_BRANCH rel_ocp_sai_7_1] ECMP group expansion fail due to no resources
Fix capability for Hostif queue on SAI version 7.1
CS00012302193 - SAI_SWITCH_ATTR_SWITCH_HARDWARE_INFO attribute value changed
2023-08-31 08:32:13 -07:00
Samuel Angebault
4e87caf6bb
[202205][Arista] Ignore poll errors during get_event_change (#16304)
This is a backport of #16112
Handle exceptions gracefully within get_change_event
2023-08-29 11:43:20 -07:00
mssonicbld
d17ed9d9d6
[submodule] Update submodule sonic-sairedis to the latest HEAD automatically (#16293)
src/sonic-sairedis

* 70242e7 - (HEAD -> 202205, origin/202205) [CI]: Fix collect log error in azp template. (#1281) (2 days ago) [Nazarii Hnydyn]
2023-08-28 09:32:05 -07:00
mssonicbld
46e562b881
[ci/build]: Upgrade SONiC package versions (#16214) 2023-08-28 09:29:43 -07:00
mssonicbld
b0660ebb2d
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#16265)
src/sonic-utilities

* 1ed5b5a9 - (HEAD -> 202205, origin/202205) Add transceiver status CLI to show output from TRANSCEIVER_STATUS table (cherry-pick to 202205) (#2950) (4 days ago) [longhuan-cisco]
* ba327726 - Fix in config override when all asic namespaces not present in golden_config_db (#2946) (4 days ago) [judyjoseph]
2023-08-28 09:29:18 -07:00
mssonicbld
d264df3984
Dell S6100 Platform API 2.0 fixes (#16208) (#16252)
Why I did it
Dell S6100 Platform components needs to be updated.

How I did it
Modified platform.json to fix the issue.

How to verify it
Run sonic-mgmt component test and check whether it passes.

Co-authored-by: Aravind Mani <53524901+aravindmani-1@users.noreply.github.com>
2023-08-25 17:05:37 -07:00
mssonicbld
f04206922a
[submodule] Update submodule sonic-platform-common to the latest HEAD automatically (#16264)
src/sonic-platform-common

* b41db16 - (HEAD -> 202205, origin/202205) Move tx_disable/tx_disabled_channel/rx_los/tx_fault  to get_transceiver_status API (#359) (#395) (32 hours ago) [longhuan-cisco]
2023-08-25 17:04:44 -07:00
mssonicbld
8757e6b8d9
[YANG SONIC-ACL] Fix Yang definition of IN_PORTS and OUT_PORTS (#16220) (#16235)
How I did it
Update Yang definition of IN_PORTS and OUT_PORTS to string.
Since we cannot split the string on commas (,) and validate that each substring is a valid SONiC port name, the only restriction is that the value must be a string.

How to verify it
Verified by building sonic_yang_models-1.0-py3-none-any.whl. While building the target package, unit tests were run and passed.
Built a SONiC image based on the 202205 branch and installed it on a physical DUT. Retried the steps in [Yang] Incorrect definition of IN_PORTS and OUT_PORTS in sonic-acl.yang #16190 and saw the success response below:

Co-authored-by: Zhijian Li <zhijianli@microsoft.com>
2023-08-25 17:03:46 -07:00
judyjoseph
d91565ba5e
sudo not required explicitly as /bin/ip netns identify is part of READ_ONLY_CMDS in sudoers file (#16258)
Cherry-pick PR: #16115
2023-08-25 17:02:26 -07:00
Junchao-Mellanox
d19d904f6a
[Mellanox] Fix issue: watchdogutil command does not work (#16091) (#16260)
- Why I did it
watchdogutil uses the platform API watchdog instance to control/query watchdog status. The Nvidia watchdog implementation caches the "armed" status in an object member, "WatchdogImplBase.armed". This does not work for the CLI infrastructure because each CLI invocation creates a new watchdog instance, so the status cached in the previous instance is lost. Consider the following commands:

admin@sonic:~$ sudo watchdogutil arm -s 100      =====> watchdog instance1, armed=True
Watchdog armed for 100 seconds
admin@sonic:~$ sudo watchdogutil status             ======> watchdog instance2, armed=False
Status: Unarmed
admin@sonic:~$ sudo watchdogutil disarm            =======> watchdog instance3, armed=False
Failed to disarm Watchdog

- How I did it
Use sysfs to query watchdog status
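
The difference between caching the state in an object and reading it from a shared location can be sketched as below. This is an illustration only: both classes are hypothetical, a regular file stands in for the sysfs attribute, the "active" file content is an assumption of the sketch, and in the real system the kernel driver (not `arm()`) would update the sysfs value.

```python
class CachedWatchdog:
    """Keeps `armed` in memory; the value is lost with the instance."""

    def __init__(self):
        self.armed = False

    def arm(self):
        self.armed = True

    def is_armed(self):
        return self.armed


class FileBackedWatchdog:
    """Reads `armed` from a state file, so every instance sees the same value.

    `state_path` stands in for a sysfs attribute (assumption of this sketch).
    """

    def __init__(self, state_path):
        self.state_path = state_path

    def arm(self):
        # Stand-in for the driver updating the sysfs attribute.
        with open(self.state_path, "w") as state_file:
            state_file.write("active\n")

    def is_armed(self):
        try:
            with open(self.state_path) as state_file:
                return state_file.read().strip() == "active"
        except FileNotFoundError:
            return False
```

With `CachedWatchdog`, arming one instance and querying a fresh one reports unarmed, which mirrors the three-command sequence above; with `FileBackedWatchdog`, a fresh instance pointed at the same path reports armed.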

- How to verify it
Manual test
Unit test
Conflicts:
	platform/mellanox/mlnx-platform-api/sonic_platform/watchdog.py
	platform/mellanox/mlnx-platform-api/tests/test_watchdog.py
2023-08-25 17:01:37 -07:00
Junchao-Mellanox
611449dc88
Fix issue: systemctl daemon-reload would sporadically cause udev handler fail (#15253) (#15959)
A workaround to back port the fix for a systemd issue.

The systemd issue: systemd/systemd#24668
The systemd PR to fix the issue: https://github.com/systemd/systemd/pull/24673/files

The formal solution is to upgrade systemd to a version that contains the fix. But systemd is a very basic service, and upgrading it requires heavy testing.
2023-08-22 09:54:56 -07:00