7369 Commits

Author SHA1 Message Date
mssonicbld
1a8a3ae880
[submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically (#16926)
src/sonic-platform-daemons

* 2bb8e6b - (HEAD -> 202205, origin/202205) Revert "Use vendor customizable fan speed threshold checks (#378)" (4 minutes ago) [Ying Xie]
2023-10-17 19:09:54 -07:00
mssonicbld
29dd1c2b69
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#16884)
src/sonic-utilities

* 0ad458cb - (HEAD -> 202205, origin/202205) Include /var/log.tmpfs in techsupport (#2979) (3 days ago) [mihirpat1]
2023-10-17 19:06:08 -07:00
mssonicbld
8e945fb211
Disable CPU C-States other than C1 (#16703) (#16887) 2023-10-14 15:48:43 +08:00
mssonicbld
b6f783ffa4
Revert "Move /var/log to RAM for Mellanox SN2700, Nokia 7215 and Dell S6100 (#15077)" (#16775) (#16886) 2023-10-14 15:38:25 +08:00
James An
b380d99222
Update cisco-8000.ini (#16883)
Release Notes for Cisco 8102-32FH-O:

Fixed platform_test failures in test_component.py
IOFPGA_SJTAG label under ‘fwutil show status’ changed to ‘IOFPGA’
Validated auto FPD upgrade
2023-10-13 19:01:15 -07:00
Hua Liu
cd64b60ec2
[202205] [TACACS] Improve per-command authorization performance by read passwd entry with getpwent (#16659)
Improve per-command authorization performance by reading the passwd entry with getpwent.
This is a manual cherry-pick of PR #16460.

Why I did it
Currently, per-command authorization checks whether a user is a remote user with the getpwnam API, which triggers tacplus-nss to authenticate with the TACACS server.
This is unnecessary, because the user information is already added to the local passwd file when the user logs in.
The getpwent API can read directly from the passwd file, which improves per-command authorization performance.
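
The idea can be illustrated with a minimal Python sketch. This is not the code from the PR (the actual change uses the C getpwent API). It only shows that a user can be looked up by scanning passwd-format entries directly, with no call to a remote server. The function name `find_passwd_entry`, the `PasswdEntry` type, and the sample entries are hypothetical.

```python
from typing import NamedTuple, Optional


class PasswdEntry(NamedTuple):
    name: str
    uid: int
    gid: int
    gecos: str
    home: str
    shell: str


def find_passwd_entry(passwd_text: str, username: str) -> Optional[PasswdEntry]:
    """Scan passwd-format text line by line and return the entry for username."""
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) != 7:
            continue  # skip malformed lines
        name, _password, uid, gid, gecos, home, shell = fields
        if name == username:
            return PasswdEntry(name, int(uid), int(gid), gecos, home, shell)
    return None


# Hypothetical sample data standing in for the local passwd file.
SAMPLE = """root:x:0:0:root:/root:/bin/bash
remote_user:x:1001:1001:remote_user:/home/remote_user:/bin/bash
"""

entry = find_passwd_entry(SAMPLE, "remote_user")
```

A user who is missing from the scanned text yields `None`, so the caller can decide how to treat unknown users without contacting the server.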
2023-10-13 18:36:45 -07:00
mssonicbld
1e3c23d23b
[submodule] Update submodule sonic-sairedis to the latest HEAD automatically (#16882)
src/sonic-sairedis

* 439b926 - (HEAD -> 202205, origin/202205) [syncd] Change sai discovery log priority to info (#1296) (3 minutes ago) [Kam
2023-10-13 17:19:15 -07:00
mssonicbld
aea2e19ad4
[snmp] Check intfmgrd running before start (#16588) (#16881)
Add pre start check to ensure intfmgrd is running.
The check will run for 20 seconds at most.

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
Co-authored-by: Longxiang Lyu <35479537+lolyu@users.noreply.github.com>
2023-10-13 17:16:00 -07:00
mssonicbld
956e970a13
[submodule] Update submodule sonic-snmpagent to the latest HEAD automatically (#16877)
src/sonic-snmpagent

* 17a8bb2 - (HEAD -> 202205, origin/202205) Add ifhighspeed UT (#296) (5 minutes ago) [Hua Liu]
* b5a52ff - Fix key missing exception when invalied transiver info in STATE_DB (#289) (5 minutes ago) [Hua Liu]
* 09bb0c2 - Fix FdbUpdater crash when SAI_FDB_ENTRY_ATTR_BRIDGE_PORT_ID attribute missing. (#286) (5 minutes ago) [Hua Liu]
* 792e403 - Support interface speed for PortChannels (#262) (5 minutes ago) [Lukas Stockner]
2023-10-13 16:02:49 -07:00
mssonicbld
990072da47
[submodule] Update submodule sonic-telemetry to the latest HEAD automatically (#16870)
src/sonic-telemetry

*   a399feb - (HEAD -> 202205, origin/202205) Merge pull request #155 from zbud-msft/cherry-pick-on-change-mode-202205 (48 minutes ago) [Ying Xie]
|\  
| *   d7ea9fe - Merge branch '202205' into cherry-pick-on-change-mode-202205 (2 hours ago) [Ying Xie]
| |\  
| |/  
|/|   
* |   7623da9 - Merge pull request #165 from zbud-msft/cherry-pick-prepare-state-db-202205 (2 hours ago) [Ying Xie]
|\ \  
| * | a561194 - Cherry pick files from on change deletion commit (17 hours ago) [Zain Budhwani]
|/ /  
| *   3a3f43c - Merge branch '202205' into cherry-pick-on-change-mode-202205 (19 hours ago) [Ying Xie]
| |\  
| |/  
|/|   
* | 818b345 - Merge pull request #162 from zbud-msft/202205_remove_download_image (19 hours ago) [Ying Xie]
* | 1b4d489 - Install necessary deb instead of entire image (19 hours ago) [Zain Budhwani]
 /  
* e494561 - Add key to on change updates (#138) (2 weeks ago) [Zain Budhwani]
2023-10-13 16:02:26 -07:00
mssonicbld
1f36540c5d
[submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically (#16869)
src/sonic-platform-daemons

* 6064369 - (HEAD -> 202205, origin/202205) Use vendor customizable fan speed threshold checks (#378) (3 hours ago) [spilkey-cisco]
2023-10-13 10:11:56 -07:00
mssonicbld
c774189d14
[submodule] Update submodule linkmgrd to the latest HEAD automatically (#16849)
src/linkmgrd

* d7ab364 - (HEAD -> 202205, origin/202205) [warmboot] config all interfaces back to `auto` if reconciliation times out  (#220) (29 minutes ago) [Jing Zhang]
2023-10-11 18:40:00 -07:00
SuvarnaMeenakshi
427a3325d1
[202205][SNMP][IPv6]: Revert PRs to support SNMP over IPv6 (#16650)
* Revert "[SNMP][IPv6]: Fix to use link local IPv6 address as snmp agentAddress (#16013) (#16102)"

This reverts commit 628e1ad981.

* Revert "[SNMP][IPv6]: Fix SNMP IPv6 reachability issue in certain scenarios (#15487) (#15826)"

This reverts commit 7cfb71bc18.
2023-10-11 12:02:33 -07:00
mssonicbld
36cf71f79a
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#16834)
src/sonic-swss

* 561cfd94 - (HEAD -> 202205, origin/202205) [202205][buffers] Add handler for the 'create_only_config_db_buffers' configuration knob (#2882) (11 hours ago) [Vadym Hlushko]
2023-10-11 11:56:28 -07:00
Samuel Angebault
87ab7a4e68
[202205][Arista] Update arista platform submodules (#16561)
* [202205][Arista] Update arista platform submodules

 - fix issue where platform debug info would no longer be in the dump
 - fix issue in scd-xcvr where active low bits couldn't be set
 - fix issue in scd-smbus where it performed an oob access
2023-10-11 11:55:51 -07:00
mssonicbld
37fe9cc4eb
[submodule] Update submodule sonic-platform-common to the latest HEAD automatically (#16530)
src/sonic-platform-common

* ade83aa - (HEAD -> 202205, origin/202205) [202205] Fix issue: should use 'Value' column to calculate the health percentage for Virtium SSD (#385) (4 weeks ago) [Junchao-Mellanox]
2023-10-11 09:39:57 -07:00
Nazarii Hnydyn
7b06d9b982
[hostcfgd] Fix issue: FeatureHandler might override user configuration (#16766)
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2023-10-11 09:14:00 -07:00
abdosi
6f93832a03 [chassisd]: Updated the API get_platform_info() to return running/detected ASIC's count (#16539)
Previously, get_num_asics() was used, which returns the maximum number of ASICs. However, asic_count
should be the actual number of ASICs populated, which can be obtained from get_asic_presence_list().

ADO: 25158825

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2023-10-11 09:30:32 +08:00
Vadym Hlushko
3ac09d544a
[202205][buffers] Add 'create_only_config_db_buffers.json' file for the Mellanox devices (not MSFT SKU) (#16232)
* [buffers] Add create_only_config_db_buffers.json for MLNX devices (not MSFT SKU), inject it at the start of the swss docker

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>

* [buffers] Align the sonic-device_metadata.yang

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>

---------

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
2023-10-10 09:28:00 -07:00
Qi Luo
827eb3dc3d Fix the dependency grpcio-tools version (#16776)
#### Why I did it
Fix the build break of marvell-armhf/sonic-ycabled
2023-10-10 10:33:03 +08:00
zitingguo-ms
d83ecab437
upgrade xgs SAI version to 7.1.62.4 (#16793)
Upgrade the xgs SAI version to 7.1.62.4 to include the following changes:

7.1.62.4: ECMP CRM fix - CS00012312907
7.1.61.4: Includes nexthop group scaling fix - CS00012304075
7.1.60.4: CS00012302193 - SAI_SWITCH_ATTR_SWITCH_HARDWARE_INFO attribute value changed
7.1.59.4: [CS00012302400 CS00012302347]backport SONIC-76986 to SAI7.1: Fix the issue--"empty LAG can't be added to ACL entry"
7.1.57.4: [CSP CS00012296571] Backport SONIC-75371 jira on SAI 7.1 branch
7.1.56.4: [CSP CS00012302193] backport SONIC-72912 jira on SAI 7.1 branch

Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
2023-10-09 13:57:50 -07:00
mssonicbld
8ce40ade8b
Update BMCDATAV6 Definition (#16634) (#16813) 2023-10-10 03:03:48 +08:00
Volodymyr Samotiy
6f18a2335b
[202205] [Mellanox] Update SDK/FW to 4.5.4318/2010.4316 and SAI to 2205.25.1.2 (#16590)
Update SDK/FW to 4.5.4318/2010.4316 and SAI to 2205.25.1.2 to include the fixes listed below.

SDK/FW

In some cases, when an ACL has two or more rules with a similar key, modifying/removing one of the rules may cause modification/removal of one of the similar-key rules, instead of the requested rule.
Using module SPQCELRCDFB when connected to a 3rd party switch, there may either be no link or a very long link up time (~2 minutes).
In some cases, warm boot from 201911 to 202205 might result in dataplane traffic loss.
When upgrading the SONiC version using warm boot from 201911/202012 to a newer version, then doing a cold boot back to the older version and upgrading again to the newer one, the warm boot might fail.
SAI

Added support for dynamic ordered ECMP group (SAI_NEXT_HOP_GROUP_TYPE_DYNAMIC_ORDERED_ECMP)
"store and forward" KV was added
Added Support for IPV6 link local debug counters

---------

Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2023-10-05 16:49:30 -07:00
mssonicbld
855c76d541
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#16532)
src/sonic-swss

* de7186c6 - (HEAD -> 202205, origin/202205) [202205][CodeQL]: Use dependencies with relevant versions in azp template. (#2905) (13 days ago) [Nazarii Hnydyn]
* 106dd9ed - [CodeQL]: Use dependencies with relevant versions in azp template. (#2845) (3 weeks ago) [Nazarii Hnydyn]
2023-10-05 16:35:20 -07:00
mssonicbld
6af29aa951
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#16717)
src/sonic-utilities

* ff8a0643 - (HEAD -> 202205, origin/202205) [202205][acl-loader] Identity ICMP v4/v6 based on IP_PROTOCOL for custom ACL table types (#3003) (6 days ago) [Zhijian Li]
* d9bc820e - Handle NotImplementedError exception while changing optoe write max (#2985) (8 days ago) [mihirpat1]
* 4bf29fe2 - [sonic-package-manager] Increate timeout for sonic-package-manager migrate (#2973) (8 days ago) [Yaqiang Zhu]
2023-10-05 08:29:18 -07:00
mssonicbld
d4e98e9ec7
[submodule] Update submodule sonic-linux-kernel to the latest HEAD automatically (#16756)
src/sonic-linux-kernel

* 246887e - (HEAD -> 202205, origin/202205) [202205] [Mellanox] Add patch for hw-mgmt 7.0020.4305 (#330) (6 days ago) [Junchao-Mellanox]
2023-10-05 08:28:41 -07:00
jhli-cisco
4b69efb461
Update cisco-8000.ini (#16778)
Why I did it
Fixes for
MIGSMSFT-333 / SR 696141124 - Fix ORDERED ECMP NHG drop when route is added before members are added
MIGSMSFT-333 / SR 696141124 – Fix port handling of empty ecmp group to drop packets
2023-10-04 18:20:00 -07:00
Junchao-Mellanox
5c80d3804b [Mellanox] wait reset cause ready (#16722)
Why I did it
The SONiC service determine-reboot-cause might run before the driver creates the reset cause files, in which case the reset cause is reported as "Unknown". This PR introduces a mechanism that waits for the reset cause sysfs files to be ready.

How I did it
The file /run/hw-management/config/reset_attr_ready indicates that all reset cause files are ready. The chassis.get_reboot_cause function now waits up to 45 seconds for this file to appear.
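
A bounded wait of this kind can be sketched as follows. This is an illustrative sketch, not the platform code from the PR: the helper name `wait_for_file` and the one-second polling interval are assumptions, while the path and the 45-second limit come from the description above.

```python
import os
import time

# Path and timeout taken from the commit description.
READY_FILE = "/run/hw-management/config/reset_attr_ready"
READY_TIMEOUT_S = 45.0


def wait_for_file(path: str, timeout_s: float = READY_TIMEOUT_S,
                  interval_s: float = 1.0) -> bool:
    """Poll until path exists; return False if timeout_s elapses first."""
    deadline = time.monotonic() + timeout_s
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)  # assumed polling interval
    return True
```

A caller would invoke `wait_for_file(READY_FILE)` before reading the reset cause files and fall back to its previous behavior if the call returns `False`.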

How to verify it
Manual test on master/202211/202205
2023-10-04 14:34:27 +08:00
Junchao-Mellanox
3bc0da4a3f
[202205] [Mellanox] upgrade hw-management package to 7.0020.4305 (#16483)
* [Mellanox] upgrade hw-management package to 7.0020.4304

* Update hw-management to 7.0020.4305
2023-10-03 18:57:47 -07:00
mssonicbld
a35649e853
[ci/build]: Upgrade SONiC package versions (#16698) 2023-10-03 08:38:07 -07:00
mssonicbld
ef7780d8f4
[fast-reboot] Fix regression: set FAST_REBOOT state_db flag to support fast-reboot from older images (#16733) (#16754) 2023-09-29 04:08:34 +08:00
Nazarii Hnydyn
214ea08777
[ssm]: Enable Store-And-Forward switching mode for SN2700/SN3800/SN4600C/SN4700. (#16662)
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2023-09-28 09:50:28 -07:00
vmittal-msft
9ffa4bdfff [nokia]: Updated total headroom pool size to accommodate 100G ports on T2 uplinks (#16690)
Microsoft ADO (25266920)

The sonic-mgmt xoff test was failing for [100g,120km]. The total headroom pool size needed to be updated for the case where a 100G line card is used as a T2 uplink.

The size had been calculated assuming 100G ports are used only for downlinks, with a cable length of 2km, but they can also be used for uplinks with a cable length of 120km. The calculation is therefore based on 120km rather than 2km. This wastes some buffer in the 2km scenario but covers both cases.
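
A back-of-the-envelope sketch shows why cable length matters so much here. This is not the vendor's headroom formula: it assumes a propagation delay of roughly 5 ns per metre in optical fibre and counts only the bytes a sender can emit at line rate during one cable round trip. The function name is hypothetical.

```python
PROPAGATION_NS_PER_M = 5  # assumption: roughly 5 ns per metre in optical fibre


def round_trip_inflight_bytes(speed_gbps: int, cable_m: int) -> int:
    """Bytes a sender can emit at line rate during one cable round trip."""
    round_trip_ns = 2 * cable_m * PROPAGATION_NS_PER_M
    bits = speed_gbps * round_trip_ns  # Gb/s multiplied by ns gives bits
    return bits // 8


uplink = round_trip_inflight_bytes(100, 120_000)   # 100G port, 120 km cable
downlink = round_trip_inflight_bytes(100, 2_000)   # 100G port, 2 km cable
```

Under these assumptions the cable-dependent share of the headroom scales linearly with length, so a pool sized for 2km ports falls far short once the same ports run over 120km.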
2023-09-27 12:32:28 +08:00
mssonicbld
4a75f1be0a
[chassis/multi-asic] Enable Sending BGP Community over internal neighbors over iBGP Session (#16705) (#16710) 2023-09-27 11:01:45 +08:00
snider-nokia
45d5701c4b [Nokia][sonic-platform] Update Nokia sonic-platform submodule - SFP support for CMIS CDB operations (#16572)
This fixes Nokia-ION/ndk#22
Note that this PR must be coupled with NDK version >= 22.9.13

Why I did it
To provide proper support for CMIS compliant transceiver module CDB operations (including FW related operations).

How I did it
Enhanced the transport subsystem so as to provide for up to 2k bytes of data to be passed to/from modules (as contrasted with the prior max of 128 bytes).

How to verify it
Ensure that new FW (firmware) can be programmed to CMIS compliant module(s) using the 'sfputil firmware ...' commands.
2023-09-26 09:31:10 +08:00
mssonicbld
d7c7261d01
[ci/build]: Upgrade SONiC package versions (#16506) 2023-09-25 09:12:08 -07:00
Nazarii Hnydyn
7c68be04e8
[Mellanox]: Update SKUs to enable SDK dumps. (#16286)
CHERRY-PICK: #7708

Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2023-09-23 00:39:23 -07:00
judyjoseph
5eed2054fe
Update Broadcom DNX SAI version to 7.1.60.4-1 (#16660)
Fixes: 16564
2023-09-22 19:41:07 -07:00
abdosi
7558d03611
[202205] Assign altname for bridge interface on chassis and iptables rules update to allow traffic on it. (#16504)
What I did:
Fixes: #16468

Why I did:
On some chassis there is no dedicated eth1-midplane interface on the supervisor for supervisor-to-LC communication; a Linux bridge, br1, is used instead. As a result, the changes made to whitelist traffic over eth1-midplane do not work.

How I did:
To fix this, we use the altname property of the ip link command to set eth1-midplane as an altname of the br interface. This keeps the design generic across chassis and between supervisor and LC. The iptables rules are updated to look up the parent/base interface name of eth1-midplane.
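
The two steps can be sketched in Python as below. This is an illustration under stated assumptions, not the script from the PR: the helper names are hypothetical, and the sample JSON is hand-written and trimmed to the fields used. The first helper builds the iproute2 `ip link property add` command; the second extracts the primary interface name from the JSON printed by `ip -j link show dev <name>`.

```python
import json
from typing import List


def altname_add_cmd(device: str, altname: str) -> List[str]:
    """Build the iproute2 command that attaches an alternative name to a link."""
    return ["ip", "link", "property", "add", "dev", device, "altname", altname]


def base_ifname(ip_json_output: str) -> str:
    """Return the primary interface name from `ip -j link show dev <name>` output."""
    links = json.loads(ip_json_output)
    return links[0]["ifname"]


# Hand-written sample, trimmed to the fields used here; real output has more.
SAMPLE_OUTPUT = '[{"ifindex": 5, "ifname": "br1", "altnames": ["eth1-midplane"]}]'

cmd = altname_add_cmd("br1", "eth1-midplane")
parent = base_ifname(SAMPLE_OUTPUT)
```

Resolving the base name first lets firewall rules refer to the real interface (`br1` in this sample) regardless of whether a chassis has a dedicated eth1-midplane device or a bridge carrying that altname.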

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2023-09-22 10:53:23 -07:00
James An
6ebfa3b34b
[cisco]: Update cisco-8000.ini (#16656)
Why I did it

Release Notes for 8102-64H
• Fix NHG drop when route is added before members are added (MIGSMSFT-333 / SR 696141124)
• Added a new system device property "acl_set_dscp_encap_outer_only"
• IN_DISCARD counters report back per-port counters only instead of all counters that are per-port and also that are shared.

How I did it

Update platform version to 202205.2.2.12
2023-09-21 23:02:38 -07:00
Alpesh Patel
4ee9565064 qos template change for backend compute-ai deployment (#16150)
#### Why I did it

To enable qos config for a certain backend deployment mode, for resource-type "Compute-AI".
This deployment has the following requirement:

- Config below is enabled if DEVICE_TYPE is one of backend_device_types
- Config below is enabled if ResourceType is 'Compute-AI'
- 2 lossless TCs (2, 3)
- 2 lossy TCs (0, 1)
- DSCP to TC map uses 4 DSCP code points and maps them to the TCs as follows:
    "DSCP_TO_TC_MAP": {
        "AZURE": {
            "48": "0",
            "46": "1",
            "3": "3",
            "4": "4"
        }
    }

- WRED profile has green {min/max/mark%} as {2M/10M/5%}

This required template change <as in the PR> in addition to the vendor qos.json.j2 file (not included here).

### How I did it

#### How to verify it
- with the above change and the vendor config change, generated the qos.json file and verified that the objective stated in "Why I did it" was met

- verified no error

### Description for the changelog
Update qos_config.j2 for Compute-AI deployment on one of backend device type roles
2023-09-21 18:34:15 +08:00
mssonicbld
996ce9b9ad
[submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically
src/sonic-platform-daemons

* 198f300 - (HEAD -> 202205, origin/202205) [pmon]chassisd crash fix (#396)
2023-09-20 12:14:37 -07:00
Aravind Mani
d2fe62322e [devices]: Dell S6100 API 2.0 fix (#16363)
Why I did it
sonic-mgmt test failure is seen for update_firmware component API

Microsoft ADO: 25208748

How I did it
Edited API 2.0 to fix this issue.

How to verify it
Run sonic-mgmt test after the fix and verify it passes.
2023-09-19 10:25:41 -07:00
vganesan-nokia
5281005304
[swss] Chassis db clean up optimization and bug fixes (#16454) (#16541)
* [swss] Chassis db clean up optimization and bug fixes

This commit includes the following changes:
    - Fix for regression failure due to error in finding CHASSIS_APP_DB in
    pizzabox (#PR 16451)
    - After attempting to delete the system neighbor entries from
    chassis db, before starting to clear the system interface entries,
    wait for some time only if some system neighbors were deleted.
    If no system neighbor entries were deleted for the asic coming up,
    there is no need to wait.
    - Similar changes for system lag delete. Before deleting the
    system lag, wait for some time only if some system lag members were
    deleted. If no system lag members were deleted, there is no need to wait.
    - Flush the SYSTEM_NEIGH_TABLE from the local STATE_DB. While the asic
    is coming up, when system neigh entries are deleted from chassis app
    db (as part of chassis db clean up), there is no orchs/process running to
    process the delete messages from chassis redis. Because of this, stale system
    neigh entries are present in the local STATE_DB. The stale entries result in
    creation of orphan (no corresponding data path/asic db entry) kernel neigh
    entries during STATE_DB:SYSTEM_NEIGH_TABLE entries processing by nbrmgr (after
    the swss service came up). This is avoided by flushing the SYSTEM_NEIGH_TABLE from
    the local STATE_DB when the service comes up.

Signed-off-by: vedganes <veda.ganesan@nokia.com>

* [swss] Chassis db clean up bug fixes review comment fix - 1

Debug logs added for deletion of other tables (SYSTEM_INTERFACE and SYSTEM_LAG_TABLE)

Signed-off-by: vedganes <veda.ganesan@nokia.com>

---------

Signed-off-by: vedganes <veda.ganesan@nokia.com>
(cherry picked from commit b13b41fc22)
2023-09-14 14:07:15 -07:00
anamehra
561c71de43 Chassis: fix pmon docker failure when DEVICE_METADATA is not available (#16527)
Signed-off-by: anamehra <anamehra@cisco.com>

Added a check for DEVICE_METADATA before accessing the data. This prevents the j2 failure when var is not available.
2023-09-14 09:29:06 +08:00
mssonicbld
b4ab3e01df
Run db_migrator for non first-time reboots (#16116) (#16520) 2023-09-12 18:40:30 +08:00
Rajendra Kumar Thirumurthi
dbfa8f9660
[frr]: lib: Fix corruption when routemap delete/add sequence happens (#16456)
Why I did it
A Zebra core is sometimes seen during config reload. A series of route-map deletions followed by re-adds triggers the hash table to realloc and grow to a larger size; subsequent route-map operations then run against a corrupted hash table.

The issue is seen when BFD is enabled on the static route table, where static route-maps are created/deleted based on BFD session state. However, the issue itself is generic from the FRR perspective.

This issue has detailed core info: sonic-net/sonic-frr#37. This PR fixes this issue.
Fixes sonic-net/sonic-frr#37

Work item tracking
Microsoft ADO (17952227):

How I did it
This fix is already in Master frr/8.2.5. Porting this fix to 202205 branch to address this Zebra core.
sonic-net/sonic-frr@5f503e5

Solution:
The whole purpose of delaying the deletion and storing the route-map is to let the using protocol process the route-map at a later time while still retaining the route-map name (for more efficient reprocessing). The problem exists because we are keeping multiple copies of deletion events that are indistinguishable from each other, causing hash havoc.

How to verify it
Verified running sonic-mgmt test, doing multiple config reloads.
2023-09-08 23:19:07 -07:00
anamehra
2b302e83c0 chassis-packet: Update arp_update script for FAILED and STALE check (#16311)

1. Fixing an issue with FAILED entry resolution retry.
Neighbor entries in arp table may sometimes enter a FAILED state when the far end is down and reports the state as follows:
2603:10e2:400:3::1 dev PortChannel19 router FAILED
While the arp_update script handles the entries for FAILED in the following format, the above was not handled due to the token location (extra router keyword at index 4):
2603:10e2:400:3::1 dev PortChannel19 FAILED

The former format may appear if an arp resolution is tried on a link that is known but the far end goes down, e.g., pinging a STALE entry while the far end is down.

2. Refreshing STALE entries to make sure the far end is reachable.
STALE entries for some backend ports may appear in chassis-packet when no traffic is received on the port for a while. When the far end goes down, BFD is expected to stop sending packets on the session for which the far end is not reachable. But because the entry is known as stale, BFD keeps sending packets on the Cisco chassis. Refreshing the stale entry keeps active links reachable in the neighbor table, while entries whose far end is down enter the FAILED state. FAILED entries are retried and become reachable when the far end comes back up.
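
The token-position problem in item 1 can be sketched in Python. This is an illustration only, not the arp_update script itself, and the function names are hypothetical. Reading the state from the last token, rather than from a fixed index, handles both line formats quoted above.

```python
from typing import Optional, Tuple


def parse_neigh_line(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (address, device, state) from one neighbor table line."""
    tokens = line.split()
    if len(tokens) < 4 or tokens[1] != "dev":
        return None
    # The state is the last token; flags such as "router" may precede it.
    return tokens[0], tokens[2], tokens[-1]


def needs_refresh(line: str) -> bool:
    """True for entries in the two states this change handles."""
    parsed = parse_neigh_line(line)
    return parsed is not None and parsed[2] in ("FAILED", "STALE")


with_router = parse_neigh_line("2603:10e2:400:3::1 dev PortChannel19 router FAILED")
without_router = parse_neigh_line("2603:10e2:400:3::1 dev PortChannel19 FAILED")
```

Both calls return the same address, device, and state, so the extra `router` keyword no longer hides a FAILED entry.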
2023-09-09 09:26:53 +08:00
mssonicbld
91382fe31c
[Nokia][sonic-platform] Update Nokia sonic-platform submodule (#16348) (#16503) 2023-09-09 09:03:31 +08:00
mssonicbld
32f23dd786
Update macsec CAK keys in profile for tests to change to type7 encoded format (#16388) (#16499) 2023-09-09 06:23:49 +08:00