Commit Graph

1252 Commits

Author SHA1 Message Date
Kevin Wang
be87afd312 [qos] change the template keyword from Compute-AI to ComputeAI (#17902)
Why I did it
Align the keywords to make qos configuration take effect

Work item tracking
Microsoft ADO (number only):
How I did it
Change the keyword to ComputeAI

How to verify it
reload minigraph and check the qos configuration
2024-01-30 00:35:24 +08:00
mssonicbld
2e096ff576
[ci/build]: Upgrade SONiC package versions (#17865) 2024-01-23 20:15:09 -08:00
mssonicbld
1239ef1d86
[ci/build]: Upgrade SONiC package versions (#17565) 2024-01-11 08:57:42 -08:00
Lawrence Lee
2757915e4f add timeout to ping6 command (#17729)
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2024-01-11 12:34:27 +08:00
mssonicbld
04d83c05ae
[docker_image_ctl.j2]: swss docker initialization improvements (#17628) (#17681) 2024-01-05 04:33:39 +08:00
mssonicbld
78c8f11086
Update backend_acl.py to specify ACL table name (#17553) (#17667) 2024-01-04 10:55:31 +08:00
mssonicbld
b278f161a9
[ci/build]: Upgrade SONiC package versions (#17541) 2023-12-18 14:25:47 -08:00
mssonicbld
1f14283add
[ci/build]: Upgrade SONiC package versions (#17388) 2023-12-06 10:11:01 -08:00
Ying Xie
ac32685247
Revert "[pmon] update gRPC version to 1.57.0 (#16257) (#17218)" (#17390)
This reverts commit 6b4bad0ab1.
2023-12-04 11:06:10 -08:00
mssonicbld
5ad93e6a88
[ci/build]: Upgrade SONiC package versions (#17144) 2023-11-30 13:39:21 -08:00
vdahiya12
6b4bad0ab1
[pmon] update gRPC version to 1.57.0 (#16257) (#17218)
* [pmon] update gRPC version to 1.57.0 (#16257)

Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>

* fix conflict

Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>

---------

Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
2023-11-30 13:38:52 -08:00
mssonicbld
ff7e1967de
[ci/build]: Upgrade SONiC package versions (#17095) 2023-11-07 08:02:58 -08:00
mssonicbld
213c18e966
[ci/build]: Upgrade SONiC package versions (#17050) 2023-11-02 08:21:01 -07:00
mssonicbld
1bd19a2e93
[ci/build]: Upgrade SONiC package versions (#17036) 2023-10-30 14:13:37 -07:00
Saikrishna Arcot
4b38216e97
[202205] Update OpenSSH to 1:8.4p1-5+deb11u2 (#17027)
* [baseimage]: Update openssh to 1:8.4p1-5+deb11u2 (#16826)

Openssh in Debian Bullseye has been updated to 1:8.4p1-5+deb11u2 to fix CVE-2023-38408. 
Since we're building openssh with some patches, we need to update our version as well.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

* Remove main deb installation for derived deb build (#16859)

* Don't install dependencies of derived debs

When "building" a derived deb package, don't install the dependencies of
the package into the container. It's not needed at this stage.

* Re-add openssh-client and openssh-sftp-server as derived debs

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

---------

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

* Re-add missing dependency for derived debs. (#16896)

* Re-add missing dependency for derived debs.

My previous changed removed the whole dependency on the main deb
existing, not just the installation of the main deb. Fix this by
readding a dependency on the main deb being built/pulled from cache.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

* Add the kernel and initramfs as dependencies for RFS build

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

---------

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

---------

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2023-10-26 22:37:30 -07:00
Kebo Liu
cb840c101d
[202205] Add special rsyslog filter for MSN2700 platform #16684 (#17020)
Signed-off-by: Kebo Liu <kebol@nvidia.com>
2023-10-26 08:00:52 -07:00
mssonicbld
8e945fb211
Disable CPU C-States other than C1 (#16703) (#16887) 2023-10-14 15:48:43 +08:00
mssonicbld
aea2e19ad4
[snmp] Check intfmgrd running before start (#16588) (#16881)
Add pre start check to ensure intfmgrd is running.
The check will run for 20 seconds at most.

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
Co-authored-by: Longxiang Lyu <35479537+lolyu@users.noreply.github.com>
2023-10-13 17:16:00 -07:00
Vadym Hlushko
3ac09d544a
[202205][buffers] Add 'create_only_config_db_buffers.json' file for the Mellanox devices (not MSFT SKU) (#16232)
* [buffers] Add create_only_config_db_buffers.json for MLNX devices (not MSFT SKU), inject it at the start of the swss docker

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>

* [buffers] Align the sonic-device_metadata.yang

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>

---------

Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>
2023-10-10 09:28:00 -07:00
mssonicbld
a35649e853
[ci/build]: Upgrade SONiC package versions (#16698) 2023-10-03 08:38:07 -07:00
mssonicbld
ef7780d8f4
[fast-reboot] Fix regression: set FAST_REBOOT state_db flag to support fast-reboot from older images (#16733) (#16754) 2023-09-29 04:08:34 +08:00
mssonicbld
d7c7261d01
[ci/build]: Upgrade SONiC package versions (#16506) 2023-09-25 09:12:08 -07:00
abdosi
7558d03611
[202205] Assign altname for bridge interface on chassis and iptables rules update to allow traffic on it. (#16504)
What I did:
Fixes: #16468

Why I did:
On Some chassis there is no dedicated eth1-midplane interface on supervisor for supervisor and LC communication but instead Linux bridge br1 is used for that. Because of this changes that were done to white-list traffic over eth1-midplane would not work.

How I did:
To fix this we are using altname property of ip link command to set eth1-midplane as altname of br interface. This is done to keep design generic across chassis and between supervisor and LC also. IP-table rules are updated to get parent/base interface name of eth1-midplane.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2023-09-22 10:53:23 -07:00
Alpesh Patel
4ee9565064 qos template change for backend compute-ai deployment (#16150)
#### Why I did it

To enable qos config for a certain backend deployment mode, for resource-type "Compute-AI".
This deployment has the following requirement:

- Config below enabled if DEVICE_TYPE as one of backend_device_types
- Config below enabled if ResourceType is 'Compute-AI'
- 2 lossless TCs' (2, 3)
- 2 lossy TCs' (0,1)
- DSCP to TC map uses 4 DSCP code points and maps to the TCs' as follows:
   "DSCP_TO_TC_MAP": {
        "AZURE": {
             "48" : "0",
            "46" : "1",
            "3"  : "3",
            "4"  : "4"
        }
    }

- WRED profile has green {min/max/mark%} as {2M/10M/5%}

This required template change <as in the PR> in addition to the vendor qos.json.j2 file (not included here).

### How I did it

#### How to verify it
- with the above change and the vendor config change, generated the qos.json file and verified that the objective stated in "Why I did it" was met

- verified no error

### Description for the changelog
Update qos_config.j2 for Comptue-AI deployment on one of backend device type roles
2023-09-21 18:34:15 +08:00
vganesan-nokia
5281005304
[swss] Chassis db clean up optimization and bug fixes (#16454) (#16541)
* [swss] Chassis db clean up optimization and bug fixes

This commit includes the following changes:
    - Fix for regression failure due to error in finding CHASSIS_APP_DB in
    pizzabox (#PR 16451)
    - After attempting to delete the system neighbor entries from
    chassis db, before starting clearing the system interface entries,
    wait for sometime only if some system neighbors were deleted.
    If there are no system neighbors entries deleted for the asic coming up,
    no need to wait.
    - Similar changes for system lag delete. Before deleting the
    system lag, wait for some time only if some system lag memebers were
    deleted. If there are no system lag members deleted no need to wait.
    - Flush the SYSTEM_NEIGH_TABLE from the local STATE_DB. While asic
    is coming up, when system neigh entries are deleted from chassis ap
    db (as part of chassis db clean up), there is no orchs/process running to
    process the delete messages from chassis redis. Because of this, stale system
    neigh are entries present in the local STATE_DB. The stale entries result in
    creation of orphan (no corresponding data path/asic db entry) kernel neigh
    entries during STATE_DB:SYSTEM_NEIGH_TABLE entries processing by nbrmgr (after
    the swss serive came up). This is avoided by flushing the SYSTEM_NEIGH_TABLE from
    the local STATE_DB when sevice comes up.

Signed-off-by: vedganes <veda.ganesan@nokia.com>

* [swss] Chassis db clean up bug fixes review comment fix - 1

Debug logs added for deletion of other tables (SYSTEM_INTERFACE and SYSTEM_LAG_TABLE)

Signed-off-by: vedganes <veda.ganesan@nokia.com>

---------

Signed-off-by: vedganes <veda.ganesan@nokia.com>
(cherry picked from commit b13b41fc22)
2023-09-14 14:07:15 -07:00
mssonicbld
b4ab3e01df
Run db_migrator for non first-time reboots (#16116) (#16520) 2023-09-12 18:40:30 +08:00
anamehra
2b302e83c0 chassis-packet: Update arp_update script for FAILED and STALE check (#16311)
chassis-packet: Update arp_update script for FAILED and STALE check (#16311)

1. Fixing an issue with FAILED entry resolution retry.
Neighbor entries in arp table may sometimes enter a FAILED state when the far end is down and reports the state as follows:
2603:10e2:400:3::1 dev PortChannel19 router FAILED
While the arp_update script handles the entries for FAILED in the following format, the above was not handled due to the token location (extra router keyword at index 4):
2603:10e2:400:3::1 dev PortChannel19 FAILED

The former format may appear if an arp resolution is tried on a link that is known but the far end goes down, e.g., pinging a STALE entry while the far end is down.

2. Refreshing STALE entries to make sure the far end is reachable.
STALE entries for some backend ports may appear in chassis-packet when no traffic is received for a while on the port. When the far end goes down, it is expected for BFD to stop sending packets on the session for which the far end is not reachable. But as the entry is known as stale, on the Cisco chassis, BFD keeps sending packets. Refreshing the stale entry will keep active links as reachable in the neighbor table while the entries for the far end down will enter a failed state. FAILED state entries will be retired and entered reachable when far end comes back up.
2023-09-09 09:26:53 +08:00
mssonicbld
0fe5c9fc7d
[platform]: Disable interrupt for intel i2c-i801 driver (#16309) (#16457)
On S6100 we are seeing almost 100K interrupts per second on intels i801 SMBUS controller which affects systems performance.

We now disable the i801 driver interrupt and instead enable polling

Microsoft ADO (number only): 24910530

How I did it
Disable the interrupt by passing the interrupt disable feature argument to i2c-i801 driver

How to verify it
This fix is NOT applicable for ARM based platforms. Applicable only for intel based platforms:-

- On SN2700 its already disabled in Mellanox hw-mgmt
- Celestica DX010 and E1031
- Dell S6100 verified the interrupts are no longer incrementing.
- Arista 7260CX3

Signed-off-by: Prince George <prgeor@microsoft.com>
Co-authored-by: Prince George <45705344+prgeor@users.noreply.github.com>
2023-09-06 09:49:58 -07:00
mssonicbld
07955af2ed
[ci/build]: Upgrade SONiC package versions (#16316) 2023-09-05 21:54:50 -07:00
mssonicbld
d5e2c0004f
Assign the higher metric value for Ipv6 default route learnt via RA message (#16367) (#16440)
* Fix the Loopback0 IPv6 address of LC's in chassis not reachable from peer device's
* Assign the metric vaule for Ipv6 default route learnt via RA message to higher value so that BGP learnt default route is higher priority.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
Co-authored-by: abdosi <58047199+abdosi@users.noreply.github.com>
2023-09-05 21:52:38 -07:00
Junchao-Mellanox
874ca68060
Fix issue: set has_timer attribute to true for platform monitor service (#15624)
There is a redundant line in init_cfg.json.j2. It would cause pmon service always has "has_timer=False". However, we know that PMON has a timer now. So, I try to fix it here.
2023-09-04 19:38:21 -07:00
mssonicbld
f7f2e654c4
[chassis] Chassis DB cleanup when asic comes up (#16213) (#16378)
* [chassis]Chassis DB cleanup when asic comes up

Cleanup the entries from the following tables in chassis app db in
redis_chassis server in the supervisor
(1) SYSTEM_NEIGH
(2) SYSTEM_INTERFACE
(3) SYSTEM_LAG_MEMBER_TABLE
(4) SYSTEM_LAG_TABLE
As part of the clean up only those entries created by the asic that
is coming up are deleted. The LAG IDs used by the asics are also
de-allocated from SYSTEM_LAG_ID_TABLE and SYSTEM_LAG_ID_SET

- Added check to run the chassis db clean up only for voq switches.

Signed-off-by: vedganes <veda.ganesan@nokia.com>
Co-authored-by: vganesan-nokia <67648637+vganesan-nokia@users.noreply.github.com>
2023-09-01 16:20:31 -07:00
mssonicbld
46e562b881
[ci/build]: Upgrade SONiC package versions (#16214) 2023-08-28 09:29:43 -07:00
Junchao-Mellanox
611449dc88
Fix issue: systemctl daemon-reload would sporadically cause udev handler fail (#15253) (#15959)
A workaround to back port the fix for a systemd issue.

The systemd issue: systemd/systemd#24668
The systemd PR to fix the issue: https://github.com/systemd/systemd/pull/24673/files

The formal solution should upgrade systemd to a version that contains the fix. But, systemd is a very basic service, upgrading systemd requires heavy test.
2023-08-22 09:54:56 -07:00
mssonicbld
f95031b5ab
[ci/build]: Upgrade SONiC package versions (#16124) 2023-08-16 13:30:16 -07:00
mssonicbld
270820c1cf
[chassis]: removed dependency for bgp and swss for chassis supervisor (#15734) (#16099)
Fixes #15667 and #13293

Work item tracking
Microsoft ADO 24472854:

How I did it
On chassis supervisor bgp feature is disabled in hostcfgd. The dependency between swss and bgp causes the bgp containers to start even though the feature is disabled.

How to verify it
Tests on chassis supervisor and LC

Co-authored-by: Arvindsrinivasan Lakshmi Narasimhan <55814491+arlakshm@users.noreply.github.com>
2023-08-11 08:39:22 -07:00
mssonicbld
f835098361
Fix CONFIG_DB_INITIALIZED flag check logic and set/reset flag for warmboot (#15685) (#16098)
* Fix CONFIG_DB_INITIALIZED flag check logic and set/reset flag for warm-reboot
* Fix db-cli usage
* Handle same image warm-reboot and generalize handling of INIT flag
* Cover boot from ONIE case: set config init flag when minigraph, config_db are missing
* Handle case: first boot of SONiC
* Check for config init flag
* Simplify logic, and do not call db_migrator for same image reboot

Co-authored-by: Vaibhav Hemant Dixit <vaibhav.dixit@microsoft.com>
2023-08-11 08:38:59 -07:00
mssonicbld
a134bfe0b2
[syncd.sh] Clear semaphore before updating firmware (#15818) (#16068)
Why I did it
The hw resources should be released before updating firmware.

How I did it
Added logic to release hw resources in syncd.sh script

Signed-off-by: Vadym Hlushko <vadymh@nvidia.com>
Co-authored-by: Vadym Hlushko <62022266+vadymhlushko-mlnx@users.noreply.github.com>
2023-08-10 13:34:52 -07:00
mssonicbld
d351e05f82
[monit][dualtor] Periodically check mux neighbors consistency (#15769) (#15954)
Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
Co-authored-by: Longxiang Lyu <35479537+lolyu@users.noreply.github.com>
2023-08-10 13:33:09 -07:00
mssonicbld
5d250c6264
[ci/build]: Upgrade SONiC package versions (#15940) 2023-08-10 13:32:17 -07:00
mssonicbld
a03489a413
[ci/build]: Upgrade SONiC package versions (#15939) 2023-07-22 15:52:35 -07:00
mssonicbld
ab0768eb15
Update WRED profile on system ports (#15612) (#15914)
* Update WRED profile on system ports

Co-authored-by: vmittal-msft <46945843+vmittal-msft@users.noreply.github.com>
2023-07-20 08:39:54 -07:00
mssonicbld
0291dae68a
[ci/build]: Upgrade SONiC package versions (#15855) 2023-07-19 08:28:14 -07:00
mssonicbld
1b32bf6b2d
update rsyslog log size conf (#15821) (#15845) 2023-07-15 05:47:03 +08:00
mssonicbld
7c6a1612d1
[ci/build]: Upgrade SONiC package versions (#15766) 2023-07-13 08:27:25 -07:00
mssonicbld
7e5156b64c
[ci/build]: Upgrade SONiC package versions (#15760) 2023-07-08 09:45:59 -07:00
mssonicbld
2d1efaec67
Revert "Revert "Fix for fast/cold-boot: call db_migrator only after old config is loaded (#14933)" (#15464)" (#15684) (#15746) 2023-07-08 07:21:28 +08:00
mssonicbld
6f6db96634
[ci/build]: Upgrade SONiC package versions (#15700) 2023-07-07 14:27:19 -07:00
lixiaoyuner
6922edba80
Move k8s script to docker-config-engine (#14788) (#15740)
Why I did it
To reduce the container's dependency from host system

Work item tracking
Microsoft ADO (number only):
17713469
How I did it
Move the k8s container startup script to config engine container, other than mount it from host.

How to verify it
Check file path(/usr/share/sonic/scripts/container_startup.py) inside config engine container.

Signed-off-by: Yun Li <yunli1@microsoft.com>
Co-authored-by: Qi Luo <qiluo-msft@users.noreply.github.com>
2023-07-07 09:22:58 -07:00
mssonicbld
7952fe7f4d
[arp_update]: Fix IPv6 neighbor race condition (#15583) (#15694) 2023-07-01 10:21:55 +08:00