Commit Graph

7364 Commits

Author SHA1 Message Date
mssonicbld
b278f161a9
[ci/build]: Upgrade SONiC package versions (#17541) 2023-12-18 14:25:47 -08:00
Vadym Hlushko
7294103e67
[202205][Mellanox] Add mlxtrace to techsupport (#15961) (#15982)
* [mlxtrace] Add mft-fwtrace-cfg.deb which contains fwtrace_cfg files for the mlxtrace utility

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>

* [mlxtrace] Remove mlxtrace support for SPC4

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>

---------

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
2023-12-18 14:24:02 -08:00
mssonicbld
603ed3e48c
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#17545)
src/sonic-swss

* 19affd32 - (HEAD -> 202205, origin/202205) [muxorch][202205] Fixing cache bug in updateRoute logic (#2994) (3 hours ago) [Nikola Dancejic]
2023-12-18 13:54:39 -08:00
mssonicbld
4d5604a9e4
[submodule] Update submodule linkmgrd to the latest HEAD automatically (#17536)
src/linkmgrd

* 6fa4adb - (HEAD -> 202205, origin/202205) [active-standby] Fix `show mux status` inconsistency introduced by orchagent rollback  (#225) (#226) (2 days ago) [Jing Zhang]
2023-12-18 13:53:43 -08:00
Kebo Liu
215545516f
[202205] [Mellanox] Revert LPM implementation to the old way (#17179)
* Revert "[202205] [Mellanox] Fix issue: user must set admin down before toggling LPM (#14370)"

This reverts commit f74c69e876.

* update copyright header

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2023-12-12 08:41:04 -08:00
Nazarii Hnydyn
489795344a
[mellanox]: Disable MFT bash autocompletion. (#17362)
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2023-12-12 08:40:20 -08:00
mssonicbld
3a191221f6
[submodule] Update submodule sonic-platform-common to the latest HEAD automatically (#17451)
src/sonic-platform-common

* 39ade8d - (HEAD -> 202205, origin/202205) [Credo][Ycable] Remove the thread locker protection from the thread-safe APIs (#388) (4 days ago) [Xinyu Lin]
2023-12-12 08:39:30 -08:00
mssonicbld
23a59cb28d
[submodule] Update submodule sonic-dbsyncd to the latest HEAD automatically (#17449)
src/sonic-dbsyncd

* cde84fa - (HEAD -> 202205, origin/202205) [lldp-syncd] Fix unexpected exception in snmp-subagent (#64) (4 days ago) [Zhaohui Sun]
2023-12-12 08:38:55 -08:00
Volodymyr Samotiy
185b03e9c2
[202205] [Mellanox] Update SAI to 2205.25.1.27 (#17444)
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2023-12-12 08:38:17 -08:00
anamehra
e754b3b32c
Fixed determine/process reboot-cause service dependency (#17462)
Signed-off-by: anamehra <anamehra@cisco.com>
2023-12-11 13:17:40 -08:00
mssonicbld
83a3562892
[submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically (#17453) 2023-12-10 14:35:24 +08:00
mssonicbld
1f14283add
[ci/build]: Upgrade SONiC package versions (#17388) 2023-12-06 10:11:01 -08:00
Ying Xie
ac32685247
Revert "[pmon] update gRPC version to 1.57.0 (#16257) (#17218)" (#17390)
This reverts commit 6b4bad0ab1.
2023-12-04 11:06:10 -08:00
mssonicbld
5ad93e6a88
[ci/build]: Upgrade SONiC package versions (#17144) 2023-11-30 13:39:21 -08:00
vdahiya12
6b4bad0ab1
[pmon] update gRPC version to 1.57.0 (#16257) (#17218)
* [pmon] update gRPC version to 1.57.0 (#16257)

Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>

* fix conflict

Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>

---------

Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
2023-11-30 13:38:52 -08:00
mssonicbld
c18b3f9947
[chassis/arista]: Increase LAG Ids to 1024 (#10519) (#17241)
Why I did it
Today at most 128 LAGs are supported. This is not sufficient if there are many LAGs with just a few ports each.

How I did it
Increase LAG Ids to 1024 for DNX device.

Co-authored-by: Song Yuan <64041228+ysmanman@users.noreply.github.com>
2023-11-30 13:36:28 -08:00
mssonicbld
6a5195ebd6
Revert iBGP GTSM feature for VOQ Chassis (#17037) (#17347)
What I did:

Revert the GTSM feature for VOQ iBGP session done as part of #16777.

Why I did:
On a VOQ chassis, BGP packets go over the Recycle Port and are then routed again in the ingress pipeline, which lowers the TTL to 254 and fails the single-hop check.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Co-authored-by: abdosi <58047199+abdosi@users.noreply.github.com>
2023-11-30 13:35:59 -08:00
mssonicbld
400717d392
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#17353)
src/sonic-utilities

* 8765fcef - (HEAD -> 202205, origin/202205) [GCU Bug Fix] Cherry-pick RDMA Platform Validator PR to 202205 (#3051) (3 hou
2023-11-30 13:35:18 -08:00
Arvindsrinivasan Lakshmi Narasimhan
63b6dedfcf
change the max lag_id to 1024 (#17336) 2023-11-29 10:09:07 -08:00
mssonicbld
530bc16005
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#17329)
#### Why I did it
src/sonic-swss
```
* fbab6b75 - (HEAD -> 202205, origin/202205) [Chassis][202205][orchagent] : Support WRED profiles on system ports (#2945) (9 hours ago) [vmittal-msft]
```
2023-11-29 16:34:32 +08:00
zitingguo-ms
edd094593c
Fix device type and add cluster in DEVICE_NEIGHBOR_METADATA yang model (#17049) (#17251)
Why I did it
The current DEVICE_NEIGHBOR_METADATA YANG model has two issues that block GCU operations when GCU checks whether the current config aligns with the YANG model:

- The cluster field is missing from the YANG model.
- The set of device types is incomplete: the YANG model does not include all device types.

Work item tracking
Microsoft ADO (number only): 25577813

How I did it
- Add the cluster field to the DEVICE_NEIGHBOR_METADATA YANG model.
- Change the device type to string.
- Fix the unit tests accordingly.

How to verify it
Build the image and verify that the unit tests pass.

Signed-off-by: zitingguo-ms <zitingguo@microsoft.com>
2023-11-27 18:06:37 -08:00
mssonicbld
03fd20410a
[Nokia][Nokia-IXR7250E-SUP-10] Update BCM config for supervisor card to reduce the CPU usage (#16790) (#17307) 2023-11-28 05:19:21 +08:00
JunhongMao
d917c6d169
[VOQ][saidump] Install rdbtools into the docker base related containers. (#16466)
Fix #13561

The existing saidump uses the https://github.com/sonic-net/sonic-swss-common/blob/master/common/table_dump.lua script, which loops over the ASIC_DB for more than 5 seconds and blocks other processes' access.

This solution uses the Redis SAVE command to save the snapshot of DB each time and recover later, instead of looping through each entry in the table.

Related PRs:
sonic-net/sonic-utilities#2972
sonic-net/sonic-sairedis#1288
sonic-net/sonic-sairedis#1298

How did I do it?
Use the Redis SAVE command to save a snapshot of the DB each time and recover it later, instead of looping through each entry in the table and saving it.

1. Updated dockers/docker-base-bullseye/Dockerfile.j2 to install the Python library rdbtools into all the docker-base-bullseye containers.

2. Updated sonic-buildimage/src/sonic-sairedis/saidump/saidump.cpp to add a new option -r, which converts the format of the JSON files output by rdbtools.

3. Added a new script file, syncd/scripts/saidump.sh, to the sairedis repo. This shell script performs the following steps:

  For each ASIC, such as ASIC0:

  3.1. Configure the Redis working directory, where the snapshot is written.
  redis-cli -h $hostname -p $port CONFIG SET dir $redis_dir > /dev/null

  3.2. Save the Redis data.
  redis-cli -h $hostname -p $port SAVE > /dev/null

  3.3. Run the rdb command to convert the dump file into a JSON file.
  rdb --command json $redis_dir/dump.rdb | tee $redis_dir/dump.json > /dev/null

  3.4. Run saidump -r to convert the JSON file to the same format as the previous saidump output, so that the saidump result is available on standard output.
  saidump -r $redis_dir/dump.json -m 100

  3.5. Remove the temporary files.
  rm -f $redis_dir/dump.rdb
  rm -f $redis_dir/dump.json

4. Updated sonic-buildimage/src/sonic-utilities/scripts/generate_dump to check the ASIC DB size: if it is larger than ROUTE_TAB_LIMIT_DIRECT_ITERATION entries (default value 24000), the dump uses Redis SAVE; otherwise it uses the old method of looping through each entry of the Redis DB.
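
The selection rule in step 4 and the per-ASIC command sequence in step 3 can be sketched as follows. This is a minimal illustration, not the code from generate_dump or saidump.sh: the function names and their arguments are hypothetical, and only the threshold name ROUTE_TAB_LIMIT_DIRECT_ITERATION, its default of 24000, and the command lines themselves come from the description above.

```python
# Illustrative sketch only; function names and arguments are hypothetical.
ROUTE_TAB_LIMIT_DIRECT_ITERATION = 24000  # default value quoted in the commit message


def use_redis_save(asic_db_entries, limit=ROUTE_TAB_LIMIT_DIRECT_ITERATION):
    """Return True when the ASIC DB is large enough to dump via Redis SAVE."""
    return asic_db_entries > limit


def saidump_commands(hostname, port, redis_dir):
    """Build the command lines of steps 3.1 to 3.5 for one ASIC, in order."""
    cli = f"redis-cli -h {hostname} -p {port}"
    return [
        f"{cli} CONFIG SET dir {redis_dir} > /dev/null",  # 3.1 set snapshot directory
        f"{cli} SAVE > /dev/null",                         # 3.2 write dump.rdb
        f"rdb --command json {redis_dir}/dump.rdb | tee {redis_dir}/dump.json > /dev/null",  # 3.3
        f"saidump -r {redis_dir}/dump.json -m 100",        # 3.4 reformat the JSON
        f"rm -f {redis_dir}/dump.rdb",                     # 3.5 clean up
        f"rm -f {redis_dir}/dump.json",
    ]
```

Building the commands as a list keeps the sketch free of side effects; a caller could print them for review or hand each one to a shell.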

How to verify it
On T2 setup with more than 96K routes, execute CLI command -- generate_dump
No error should be shown
Download the generate_dump result and verify the saidump file after unpacking it.
2023-11-21 12:34:06 +08:00
Junhua Zhai
0d4710ec07
[gearbox] use credo sai v0.9.3 (#16860)
Update credo sai package to the latest v0.9.3, which fixes the issue aristanetworks/sonic#92.
2023-11-21 10:41:57 +08:00
Deepak Singhal
894046e199
Disable systemd auto-restart of dependent services for spineRouters (#17203)
Currently the hostcfgd script overrides the systemd service files of features depending on whether auto_restart is enabled or disabled.
I am skipping "RESTART=Always" for dependent features (syncd and gbsyncd for now) so that they do not start immediately and are instead started by SWSS through the swss.sh script.
The syncd double-stop issue also applies to pizza box platforms, but no traffic impact is seen there, whereas on VOQ chassis we do see traffic impact due to the early start of the syncd service.
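
The rule described above can be sketched as a small decision function. This is an illustration under stated assumptions, not hostcfgd's actual code: the function name, the DEPENDENT_FEATURES set and the "SpineRouter" device-type string are hypothetical stand-ins for whatever the real script checks.

```python
# Illustrative sketch only; names below are hypothetical, not hostcfgd's API.
DEPENDENT_FEATURES = {"syncd", "gbsyncd"}  # features named in the commit message


def should_set_restart_always(feature, auto_restart_enabled, device_type):
    """Decide whether a feature's service gets the Restart=always override."""
    if not auto_restart_enabled:
        return False
    # Assumption: spine routers are identified by the string "SpineRouter".
    # Their dependent services are left for swss.sh to start.
    if device_type == "SpineRouter" and feature in DEPENDENT_FEATURES:
        return False
    return True
```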
2023-11-20 14:38:20 -08:00
judyjoseph
f59dc50eae
[brcm]: Update Brcm SAI for DNX platforms (#17108)
Update the Brcm SAI 7.0 with the following fixes:

Official Brcm SDK fix for memory leak
(CS00012315073 [7.0][J2C+] : PFCWD counter polling causing continuous mem leak on production device)

Official Brcm fix for CPU high
(CS00012317195 High CPU due to SDK calling soc_dnxc_port_resource_get for few stats counters even with bcmCNTR thread)

Official Brcm SAI fix for getting voq counters working.
CSP CS00012319503: DNX SAI 7.1.60.4 has broken Voq counters support

How to verify it
Validated by running the nightly pipeline on a chassis platform.

Validated the voq counters by sending traffic from T1 VM --> T3 VM

                              Port    Voq    Counter/pkts    Counter/bytes    Drop/pkts    Drop/bytes
----------------------------------  -----  --------------  ---------------  -----------  ------------
svcstr-xxxx-lc1-1|asic0|Ethernet48   VOQ0               0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet48   VOQ1              27             1968            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet48   VOQ2               0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet48   VOQ3               0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet48   VOQ4               0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet48   VOQ5               0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet48   VOQ6               0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet48   VOQ7               0                0            0             0
 
                              Port    Voq    Counter/pkts    Counter/bytes    Drop/pkts    Drop/bytes
----------------------------------  -----  --------------  ---------------  -----------  ------------
svcstr-xxxx-lc1-1|asic0|Ethernet56   VOQ0               0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet56   VOQ1            7099           625680            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet56   VOQ2               0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet56   VOQ3               0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet56   VOQ4               0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet56   VOQ5               0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet56   VOQ6               0                0            0             0
svcstr-xxxx-lc1-1|asic0|Ethernet56   VOQ7               0                0            0             0

---------------

The CPU usage has come down in SUP

System 'xxxx-sup-1'
  status                       Running
  monitoring status            Monitored
  monitoring mode              active
  on reboot                    start
  load average                 [7.94] [8.70] [7.54]
  cpu                          2.6%us 45.0%sy 0.0%wa     <<<<-- it is 45%
  memory usage                 8.9 GB [28.6%]
  swap usage                   0 B [0.0%]
  uptime                       21m
  boot time                    Fri, 17 Nov 2023 21:55:55
  data collected               Fri, 17 Nov 2023 22:16:59

-------------

syncd memory usage is not increasing.
2023-11-17 14:50:59 -08:00
Lawrence Lee
2765e8020f
[tph]: Detect LAG flaps from APPL_DB (#16879) (#17156)
Why I did it
A race condition exists while the TPH is processing a netlink message - if a second netlink message arrives during processing it will be missed since TPH is not listening for other messages.
Another bug was found where TPH was unnecessarily restarting since it was checking admin status instead of operational status of portchannels.

How I did it
Subscribe to APPL_DB for updates on LAG operational state
Track currently sniffed interfaces

How to verify it
Send tunnel packets with destination IP of an unresolved neighbor, verify that ping commands are run
Shut down a portchannel interface, verify that sniffer does not restart
Send tunnel packets, verify ping commands are still run
Bring up portchannel interface, verify that sniffer restarts
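
The tracking behaviour implied by the verification steps above can be sketched as follows. This is an illustration only, inferred from those steps rather than taken from the TPH source: the class name, the method name and the (name, oper_status) update format are hypothetical, and the real change receives its updates from an APPL_DB subscription.

```python
# Illustrative sketch only; the class and the update format are hypothetical.
class SniffTracker:
    """Track which portchannels are currently sniffed."""

    def __init__(self, sniffed=()):
        self.sniffed = set(sniffed)

    def on_lag_update(self, name, oper_status):
        """Process one LAG state update; return True if the sniffer must restart."""
        if oper_status == "up":
            if name in self.sniffed:
                return False  # already sniffed, nothing to do
            self.sniffed.add(name)
            return True       # a new interface came up: restart to include it
        # Operationally down: stop tracking it, but leave the sniffer running.
        self.sniffed.discard(name)
        return False
```

Keying the decision on operational status and on the set of sniffed interfaces means that shutting down a portchannel does not trigger a restart, while bringing it back up does.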

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2023-11-15 11:02:10 -08:00
wadoodkhan
c289af56d4
[Marvell] Update armhf sai debian (#17091)
Signed-off-by: Wadood A. Khan <wkhan@marvell.com>
2023-11-08 18:51:17 -08:00
mssonicbld
35c855bfa6
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#17116)
src/sonic-utilities

* 5d3c563a - (HEAD -> 202205, origin/202205) [dualtor_neighbor_check] Adjust zero-mac check condition (#3034) (5 minutes ago) [Longxiang Lyu]
2023-11-08 08:25:07 -08:00
mssonicbld
ff7e1967de
[ci/build]: Upgrade SONiC package versions (#17095) 2023-11-07 08:02:58 -08:00
mssonicbld
6897543c88
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#17094)
src/sonic-swss

* 01bf3b19 - (HEAD -> 202205, origin/202205) [muxorch] Fixing updateRoute logic (#2950) (5 hours ago) [Nikola Dancejic]
* 1e264e01 - Handle Mac address 'none' (#2593) (9 hours ago) [Prince Sunny]
* dc0e29b4 - [202205][teamd]: Clean teamd process if LAG creation fails (#2888) (#2932) (4 days ago) [Lawrence Lee]
2023-11-07 08:02:20 -08:00
vdahiya12
dfe45212ee
[DualToR][caclmgrd] Fix IPtables rules for multiple vlan interfaces for DualToR config (#17093)
* [DualToR][caclmgrd] Fix IPtables rules for multiple vlan interfaces for
DualToR config

Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
2023-11-06 10:08:56 -08:00
mssonicbld
213c18e966
[ci/build]: Upgrade SONiC package versions (#17050) 2023-11-02 08:21:01 -07:00
jhli-cisco
4ecd9869a3
Update cisco-8000.ini (#17066) 2023-11-02 08:20:33 -07:00
mssonicbld
1bd19a2e93
[ci/build]: Upgrade SONiC package versions (#17036) 2023-10-30 14:13:37 -07:00
Liu Shilong
9073c5d7aa
[ci] Fix build error when converting vhdx image. (#17029)
Why I did it
When using sonic-slave-buster to convert sonic-vs.img.gz to vhdx, it also needs reproducible options.
Otherwise it will rebuild sonic-slave-buster because the tag is different.

Work item tracking
Microsoft ADO (number only): 25615544
How I did it
Add build options to use same sonic-slave docker when generating vhdx image.

How to verify it
2023-10-27 18:00:33 +00:00
mssonicbld
4ee39d121f
[minigraph-parser] Disable unsupported counters on management devices (#16937) (#17028)
Why I did it
To avoid orchagent crashes such as sonic-net/sonic-swss#2935, disable unsupported counters on SONiC management devices.

Work item tracking
Microsoft ADO (number only): 25437720
How I did it
Update the minigraph parser to disable unsupported counters on management devices.

How to verify it
Verified by unittest.
Manually apply patch to DUT and do config load_minigraph

Co-authored-by: Zhijian Li <zhijianli@microsoft.com>
2023-10-27 09:28:21 -07:00
Saikrishna Arcot
4b38216e97
[202205] Update OpenSSH to 1:8.4p1-5+deb11u2 (#17027)
* [baseimage]: Update openssh to 1:8.4p1-5+deb11u2 (#16826)

Openssh in Debian Bullseye has been updated to 1:8.4p1-5+deb11u2 to fix CVE-2023-38408. 
Since we're building openssh with some patches, we need to update our version as well.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

* Remove main deb installation for derived deb build (#16859)

* Don't install dependencies of derived debs

When "building" a derived deb package, don't install the dependencies of
the package into the container. It's not needed at this stage.

* Re-add openssh-client and openssh-sftp-server as derived debs

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

---------

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

* Re-add missing dependency for derived debs. (#16896)

* Re-add missing dependency for derived debs.

My previous change removed the whole dependency on the main deb
existing, not just the installation of the main deb. Fix this by
re-adding a dependency on the main deb being built/pulled from cache.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

* Add the kernel and initramfs as dependencies for RFS build

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

---------

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

---------

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2023-10-26 22:37:30 -07:00
Kebo Liu
cb840c101d
[202205] Add special rsyslog filter for MSN2700 platform #16684 (#17020)
Signed-off-by: Kebo Liu <kebol@nvidia.com>
2023-10-26 08:00:52 -07:00
mssonicbld
cacbdbe99c
[submodule] Update submodule sonic-telemetry to the latest HEAD automatically (#17016)
src/sonic-telemetry

* 1a70b50 - (HEAD -> 202205, origin/202205) Merge pull request #168 from zbud-msft/cherry-pick-fix-panic-202205 (4 hours ago) [Ying Xie]
* 2eb9275 - Recover from potential panic when doing map to JSON serialization (#161) (7 days ago) [Zain Budhwani]
2023-10-26 08:00:17 -07:00
Prince Sunny
e367b00253
[Submodule] Update for sonic-restapi (#16993)
Submodule update for sonic-restapi

ccad4a2 - 2023-10-17 : [Tunnel] Support co-existence of IPv4 and IPv6 tunnels (#147) [Prince Sunny]
c8fa96b - 2023-10-12 : Remove command to install libhiredis deb file (#146) [Saikrishna Arcot]
2023-10-25 16:35:04 -07:00
abdosi
c9111122e4
[chassis/multi-asic] Make sure iBGP session established as directly connected (#16777)
What I did:
Make sure that internal iBGP peers are one hop away (directly connected) by using the Generalized TTL Security Mechanism (GTSM).

Why I did:
Without this change, on a packet chassis an i-BGP session can be established even if there is no direct connection. For example:

- Suppose we have three LCs (LC1, LC2 and LC3), each with an i-BGP session to the others over Loopback4096.
- Each LC has a static route towards the other LCs' Loopback4096 to establish the i-BGP sessions.
- LC1 learns the default route 0.0.0.0/0 from its e-BGP peers and sends it to LC2 and LC3 over i-BGP.
- Now suppose the static route on LC2 towards LC3 is removed or missing for some reason. We expect the i-BGP session between LC2 and LC3 to go down.
- However, the i-BGP session between LC2 and LC3 does not go down because of the ip nht-resolve-via-default feature, which lets LC2 use the default route to reach Loopback4096 of LC3. Because LC2 uses the default route, BGP packets from LC2 towards LC3 are first routed to LC1 and go to LC3 from there.

The above scenario can result in packet mis-forwarding on the data plane.

How I fixed it:

To make sure BGP packets between i-BGP peers do not take an extra routing hop, enable the GTSM feature:

neighbor PEER ttl-security hops NUMBER

This command enforces Generalized TTL Security Mechanism (GTSM), as specified in RFC 5082. With this command, only neighbors that are the specified number of hops away will be allowed to become neighbors. This command is mutually exclusive with ebgp-multihop.

We set the hop count to 1, which makes FRR reject the BGP connection if the received BGP packets have a TTL < 255. Setting this attribute also makes sure i-BGP frames are originated with an IP TTL of 255.
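
The TTL check described above can be sketched in a few lines. This is an illustration, not FRR's implementation: the helper names are hypothetical, and the general formula (minimum TTL = 255 - hops + 1) is an assumption that extends the hop-count-1 case stated above, where any TTL below 255 is rejected. The accepted range of 1 to 254 hops is likewise assumed.

```python
# Illustrative sketch only; helper names are hypothetical.
MAX_TTL = 255  # GTSM senders originate packets with the maximum IP TTL


def gtsm_min_ttl(hops):
    """Lowest IP TTL accepted from a peer expected at most `hops` hops away."""
    if not 1 <= hops <= 254:
        raise ValueError("hops must be between 1 and 254")
    return MAX_TTL - hops + 1


def gtsm_accepts(received_ttl, hops=1):
    """Return True if a packet with this TTL passes the GTSM check."""
    return received_ttl >= gtsm_min_ttl(hops)
```

With a hop count of 1, a packet that crossed one extra router arrives with TTL 254 and is rejected, which is the behaviour this change relies on.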

How I verify:

Manual verification of the above scenario: when BGP packets are received with IP TTL 254 (an additional routing hop), we see FIN TCP flags as BGP rejects the connection.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2023-10-25 12:32:27 +08:00
mssonicbld
39e67f0a73
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#16976)
src/sonic-swss

* 8b280d84 - (HEAD -> 202205, origin/202205) [202205][FlexCounters] Fixed orchagent crash issue#2395 (#2939) (4 hours ago) [Rajkumar-Marvell]
2023-10-23 19:01:37 -07:00
Samuel Angebault
f261de5652
[202205][Arista] Update arista platform submodules (#16892)
This change should have been part of #16561 but it was missed when updating the PR.
The update fixes an oob access in the scd-smbus kernel module.
2023-10-20 15:22:54 -07:00
mssonicbld
2639fa7f73
[submodule] Update submodule sonic-swss to the latest HEAD automatically (#16927)
src/sonic-swss

* 79dab014 - (HEAD -> 202205, origin/202205) [muxorch] Reorder the neighbor disable operations (#2917) (11 hours ago) [Longxiang Lyu]
2023-10-17 19:11:00 -07:00
mssonicbld
1a8a3ae880
[submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically (#16926)
src/sonic-platform-daemons

* 2bb8e6b - (HEAD -> 202205, origin/202205) Revert "Use vendor customizable fan speed threshold checks (#378)" (4 minutes ago) [Ying Xie]
2023-10-17 19:09:54 -07:00
mssonicbld
29dd1c2b69
[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#16884)
src/sonic-utilities

* 0ad458cb - (HEAD -> 202205, origin/202205) Include /var/log.tmpfs in techsupport (#2979) (3 days ago) [mihirpat1]
2023-10-17 19:06:08 -07:00
mssonicbld
8e945fb211
Disable CPU C-States other than C1 (#16703) (#16887) 2023-10-14 15:48:43 +08:00
mssonicbld
b6f783ffa4
Revert "Move /var/log to RAM for Mellanox SN2700, Nokia 7215 and Dell S6100 (#15077)" (#16775) (#16886) 2023-10-14 15:38:25 +08:00
James An
b380d99222
Update cisco-8000.ini (#16883)
Release Notes for Cisco 8102-32FH-O:

Fixed platform_test failures in test_component.py
IOFPGA_SJTAG label under ‘fwutil show status’ changed to ‘IOFPGA’
Validated auto FPD upgrade
2023-10-13 19:01:15 -07:00