Commit Graph

1306 Commits

Author SHA1 Message Date
Stephen Sun
610685d27b
Do not pass the option "device" in rsyslog.conf by default when syslog server's source address is configured (#17616)
### Why I did it

An in-band syslog server will not receive any syslog messages if it is configured without a VRF. This is because `eth0` is always specified as the `device` of a syslog server, so syslog packets are sent to `eth0` regardless of the destination IP address.

### How I did it

Pass the option "device" in rsyslog.conf only when the syslog server's source address is configured with a non-default VRF

#### How to verify it

Manually test:
1. Configuring a syslog server without VRF specified or with `default` as the VRF: no `device` passed in `rsyslog.conf`
2. Configuring a syslog server with non-default VRF: the configured VRF passed as `device` in `rsyslog.conf`
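The rule verified above can be sketched as a small helper. This is only an illustration: `device_option` is a hypothetical name, and the exact option syntax used in the real `rsyslog.conf` template may differ.

```python
from typing import Optional


def device_option(vrf: Optional[str]) -> str:
    """Return the `device` fragment for a syslog server, or "" to omit it.

    Hypothetical helper: the option is emitted only for a non-default VRF.
    """
    if vrf is None or vrf == "" or vrf == "default":
        return ""  # no VRF or the default VRF: let routing pick the interface
    return f'device="{vrf}"'
```

With this shape, a server without a VRF yields an empty fragment, so packets are no longer pinned to `eth0`.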
2024-03-23 17:04:00 -07:00
ganglv
9a6d6137a3
Remove UpdateGraphService feature (#18330)
### Why I did it
Remove UpdateGraphService feature from sonic image. The goal is to simplify the bootup process.

### How I did it
Remove updategraph service and updategraph script.
Update all related services, replace updategraph.service with config-setup.service.

#### How to verify it
Build and install new image, load minigraph and check all the services.
2024-03-14 13:12:26 -07:00
amulyan7
80448380e6
Set loglevel for crash kernel to reduce verbosity and improve overall router recovery time (#18285)
Why I did it
On certain routers with a baud rate of 9600, the crash kernel takes a long time, close to 5 minutes, to complete the kernel dump and reload the box. In contrast, on routers with a baud rate of 115200 the crash kernel dump completes in 35s-60s (depending on the platform). Currently, all debug and informational messages are printed on the console, which contributes to the delay. Unless the router is monitored on the console in real time, these messages are not very useful. Setting the loglevel to warning reduces the verbosity of logs on the console, which lets the crash kernel dump complete in a reasonable time and improves overall router recovery time.

How I did it
Setting loglevel attribute in crashkernel cmdline

How to verify it
Install a SONiC image whose crashkernel cmdline sets loglevel to warning, then induce a crash (sysrq-trigger).
The crash kernel boot and dump process completes in 20s-30s, depending on the platform.
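A rough back-of-the-envelope calculation shows why console verbosity matters so much at 9600 baud. The sketch assumes 8N1 serial framing (10 bits on the wire per byte) and that the serial line is the bottleneck; the message volume `LOG_BYTES` is a made-up figure for illustration, not a measurement.

```python
def console_seconds(num_bytes: int, baud: int, bits_per_byte: int = 10) -> float:
    """Time needed to push num_bytes through a serial console.

    Assumes 8N1 framing: 1 start + 8 data + 1 stop bit = 10 bits per byte.
    """
    return num_bytes * bits_per_byte / baud


LOG_BYTES = 200_000  # hypothetical volume of console messages, not a measured value

slow = console_seconds(LOG_BYTES, 9600)
fast = console_seconds(LOG_BYTES, 115200)
print(f"9600 baud: {slow:.0f} s, 115200 baud: {fast:.0f} s, ratio {slow / fast:.0f}x")
```

Whatever the real volume is, the same output takes 12 times longer at 9600 baud than at 115200, so cutting the number of messages is the main lever on slow consoles.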
2024-03-13 09:36:51 +08:00
Xincun Li
f886328897
[sn2700]: Add CPLD update. (#17376)
Why I did it
Port #12173 to master so that all versions after 201911 have the CPLD update files.

Microsoft ADO 25846069:

How I did it
Added Mellanox CPLD burn/refresh vme bundle for SN2700 platforms

How to verify it
Use the update_firmware script to install a private image that contains the CPLD VME files, with the UPDATE_MLNX_CPLD_FW parameter.

Before update, the CPLD version was 15
admin@str2-msn2700-spy-1:~$ sudo fwutil show status
Chassis    Module    Component    Version                Description
---------  --------  -----------  ---------------------  ----------------------------------------
MSN2700    N/A       ONIE         2016.11-5.1.0012-9600  ONIE - Open Network Install Environment
                     SSD          0115-000               SSD - Solid-State Drive
                     BIOS         0ABZS017_01.01.213     BIOS - Basic Input/Output System
                     CPLD1        CPLD000085_REV1501     CPLD - Complex Programmable Logic Device
                     CPLD2        CPLD000043_REV0400     CPLD - Complex Programmable Logic Device
                     CPLD3        CPLD000000_REV0100     CPLD - Complex Programmable Logic Device
Do Update
admin@str2-msn2700-spy-1:/tmp$ sudo ./update_firmware sonic-mellanox-xincun-cpld.bin UPDATE_MLNX_CPLD_FW=1
Available space: 8101 MB
Warning: 'sonic_installer' command is deprecated and will be removed in the future
Please use 'sonic-installer' instead
Current FW version: SONiC-OS-20201231.110
Target FW version number: add-cpld-2.83464431-a0237f7aef
Target FW version: SONiC-OS-add-cpld-2.83464431-a0237f7aef
expr: non-integer argument
NOTICE: Reset Drop caches to index 1
Warning: 'sonic_installer' command is deprecated and will be removed in the future
Please use 'sonic-installer' instead
Image SONiC-OS-add-cpld-2.83464431-a0237f7aef is already installed. Setting it as default...
Command: grub-set-default --boot-directory=/host 0

Command: sync;sync;sync

Command: sleep 3

Done
NOTICE: sonic_installer install successfully
Mellanox platform is detected: x86_64-mlnx_msn2700-r0
Mellanox ASIC maintenance...
Mellanox ASIC firmware is up to date
Mellanox CPLD maintenance...
NOTICE: Copy Mellanox firmware upgrade utility
'/tmp/image-add-cpld-2.83464431-a0237f7aef-fs//usr/bin/mlnx-fw-upgrade.sh' -> '/usr/bin/mlnx-fw-upgrade.sh'
NOTICE: Copy Mellanox cpldupdate utility
'/tmp/image-add-cpld-2.83464431-a0237f7aef-fs//usr/bin/cpldupdate' -> '/usr/bin/cpldupdate'
Mellanox CPLD firmware upgrade is required. Installing compatible version...
Current CPLD firmware version: 15
Target CPLD firmware version: 20
NOTICE: Upgrade MLNX CPLD FW from 15 to 20
CPLD burn firmware file: /tmp/tmp.42DXmW1pQS/FUI000193_Burn_Panther_CPLD000085_REV2000_CPLD000128_REV0600_CPLD000130_REV0300.vme
CPLD refresh firmware file: /tmp/tmp.42DXmW1pQS/FUI000193_Refresh_Panther_CPLD000085_REV2000_CPLD000128_REV0600_CPLD000130_REV0300.vme
[/] CPLD update...                 Lattice Semiconductor Corp.

             ispVME(tm) V12.2 Copyright 1998-2012.

               Customized for Mellanox products.

Processing virtual machine file (/tmp/tmp.42DXmW1pQS/FUI000193_Burn_Panther_CPLD000085_REV2000_CPLD000128_REV0600_CPLD000130_REV0300.vme)......

Diamond Deployment Tool 3.12
CREATION DATE: Tue Sep 20 09:41:49 2022


[|] CPLD update...+=======+
| PASS! |
+=======+
Power cycle the device, then check the CPLD version: it has changed to 20.
admin@str2-msn2700-spy-1:~$ sudo fwutil show status
Chassis    Module    Component    Version                Description
---------  --------  -----------  ---------------------  ----------------------------------------
MSN2700    N/A       ONIE         2016.11-5.1.0012-9600  ONIE - Open Network Install Environment
                     SSD          0115-000               SSD - Solid-State Drive
                     BIOS         0ABZS017_01.01.213     BIOS - Basic Input/Output System
                     CPLD1        CPLD000085_REV2000     CPLD - Complex Programmable Logic Device
                     CPLD2        CPLD000128_REV0600     CPLD - Complex Programmable Logic Device
                     CPLD3        CPLD000000_REV0000     CPLD - Complex Programmable Logic Device
2024-03-06 07:39:00 -08:00
lixiaoyuner
9fef78c4f0
Install and upgrade packages for k8s master image (#18159)
### Why I did it
- The k8s master image will use AAD for authentication from Python, so several Azure Key Vault related Python packages need to be pre-installed.
- Need to upgrade cri-dockerd to 0.3.10 to support bookworm
- Need to change netcat package name to netcat-openbsd for bookworm
- Remove the unnecessary apt-get update
##### Work item tracking
- Microsoft ADO **(number only)**: 26435886

#### How I did it
- pip3 install azure-keyvault-secrets
- apt-get -y install netcat-openbsd
- upgrade the cri-dockerd version for bookworm
#### How to verify it
- pip3 list to check if azure-keyvault-secrets is installed inside image
- dpkg -l to check if netcat-openbsd is installed inside image
- systemctl status cri-dockerd.service to check if it's running well
2024-02-27 18:09:12 -08:00
Liu Shilong
5e23a6bc93
[build] Use public storage for public resources. (#18038) 2024-02-27 17:45:49 -08:00
Nikola Dancejic
1bf2f72a48
[ebtables] Add multicast drop rule to ebtables (#18064)
Adding rule to ebtables to drop multicast packets in kernel. This was
done to address a bug where NS packets were flooding ports with
duplicate packets.

Signed-off-by: Nikola Dancejic <ndancejic@microsoft.com>
2024-02-27 13:11:58 -08:00
rajib-dutta1
4753953ed0
Ipmitool bookworm: Fix and patch enterprise-numbers URL (#17878)
### Why I did it

ipmitool utility is used to access various HW sensors. Some platforms use "ipmitool raw " to read specific addresses. 

ipmitool_1.8.19-4_amd64.deb, which is part of bookworm, has a defect: the package is missing the file enterprise.txt that the "raw read" code path expects.
This happens because the file that the .deb tries to download at build time does not have the necessary extension as it is available on the remote server: https://www.iana.org/assignments/enterprise-numbers.txt

### How I did it

The defect has been fixed through code changes in the next unstable version of Linux, and the fix is expected to be available in a future stable version of the OS. To keep the changes minimal, the .dsc file is downloaded and only the Makefile is modified to download the correct file. The changes needed to make this work as a patch are also made.

#### How to verify it
Build log is attached and installation of the file is noted at line #2274
When using vanilla bookworm on platforms like 5212 or 5224:
-------------------------------------------------------------------
root@sonic:~# ipmitool raw 0x04 0x2d 0x31
IANA PEN registry open failed: No such file or directory
00 c0 01 80

When fixed we should not see the above error:
--------------------------------------------------
root@sonic:/home/admin# ipmitool raw 0x04 0x2d 0x31
 00 c0 00 80

### Description for the changelog

This change addresses the ipmitool raw read issue. The patch must be removed once a stable Linux release that contains the fix is available.

1edb0e27e4
2024-02-26 17:49:06 -08:00
Prince George
0564ce48c9
[baseimage]: Update smartmontool version >= v7.4 (#17635)
Why I did it
Update smartmontools to version 7.4. This prevents the smartmontools service from exiting with a non-zero exit status on platforms that do not have an SSD/disk to be monitored.

Until Debian Bullseye (which had smartmontools 7.2), Debian had a patch applied that changed the default quit mode to never exit. A bug report was filed on Debian, saying that the source code patch isn't needed and could just be done via command line options, and also that smartmontools 7.3 has a new built-in option to exit with 0 if there are no monitorable devices found (which prevents systemd from treating it as a service failure). Because of that, Debian Bookworm (which also upgraded to 7.3) removed the patch and restored the default behavior of exiting with exit code 17 if there are no devices found.

Smartmontools v7.3 has this issue, because of which smartd exits with non-zero exit status even with "-q" option.

How I did it
Update the smartmontools to version 7.4 which has the fix for exiting gracefully if no monitoring device is found
Added smartd option "-q nodev0" to allow smartd to exit with status 0 if no monitoring device found
2024-02-12 09:37:12 -08:00
Stepan Blyshchak
cac73d80ca
[bootchart] enable command line recording (#17778)
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2024-02-12 08:36:44 -08:00
Zain Budhwani
c8439cdd4b
Disable eventd and rsyslog plugin in slim images (#17905)
### Why I did it

Disable eventd at buildtime for slim images

##### Work item tracking
- Microsoft ADO **(number only)**:26386286

#### How I did it

Add flags for disabling eventd and only copy rsyslog conf files when eventd is included and not slim image

#### How to verify it

Manual testing
2024-01-30 22:14:23 -08:00
Kevin Wang
5516381d7e
[qos] change the template keyword from Compute-AI to ComputeAI (#17902)
Why I did it
Align the keywords to make qos configuration take effect

Work item tracking
Microsoft ADO (number only):
How I did it
Change the keyword to ComputeAI

How to verify it
reload minigraph and check the qos configuration
2024-01-29 10:10:54 +08:00
ganglv
c798ea8e08
Change tcp port range to support telemetry and gnmi (#17907)
* Reserve tcp port for telemetry and gnmi

* Use ip_local_port_range instead

* Fix sysctl config
2024-01-26 09:31:09 -08:00
Hua Liu
bdb24676eb
Change orchagent stuck message from ERR to WARNING (#17872)
Change orchagent stuck message from ERR to WARNING

#### Why I did it
During switch initialization, Orchagent is sometimes busy for more than 40 seconds, which triggers the process-stuck watchdog error.
To mitigate this, change the watchdog error message to a warning message.

##### Work item tracking
- Microsoft ADO: 26517622

#### How I did it
Change orchagent stuck message from ERR to WARNING.

#### How to verify it
Pass all UT.

### Description for the changelog
Change orchagent stuck message from ERR to WARNING.
2024-01-26 00:01:50 -08:00
Zain Budhwani
b557488608
Remove echo log to /tmp/{$SERVICE}-debug.log in service_mgmt.sh (#17838)
### Why I did it

It is unnecessary to write logs to /tmp/${SERVICE}-debug.log because they are already written to syslog. Writing to this separate log is therefore removed, out of concern for memory space and because it can prevent some services from starting up in RO state.

##### Work item tracking
- Microsoft ADO **(number only)**:26458976

#### How I did it

Remove the DEBUGLOG definition and the line that echoes the message to the mentioned log file.

#### How to verify it

Manually verified, /tmp/${SERVICE}-debug.log files do not exist and log for service starting still appears in syslog
2024-01-25 17:14:21 -08:00
mssonicbld
1fb9732f41 [ci/build]: Upgrade SONiC package versions 2024-01-25 14:35:40 +08:00
Oleksandr Ivantsiv
c693e75f0f
[dns] Do not apply dynamic DNS configuration when MGMT interface has static IP address. (#17769)
### Why I did it
Fix the issue detected by the [TestStaticMgmtPortIP::test_dynamic_dns_not_working_when_static_ip_configured](https://github.com/sonic-net/sonic-mgmt/blob/master/tests/dns/static_dns/test_static_dns.py#L105C9-L105C63) test.

### How I did it
Query MGMT interface configuration. Do not apply dynamic DNS configuration when MGMT interface has static IP address.

#### How to verify it
Run `tests/dns/static_dns/test_static_dns.py` sonic-mgmt tests.
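A minimal sketch of the decision described above, assuming the MGMT interface configuration is available as a dict whose keys look like `eth0` (no address) or `eth0|10.0.0.5/24` (static address). Both function names are hypothetical and are not taken from the actual change.

```python
def has_static_mgmt_ip(mgmt_interface_table: dict) -> bool:
    """True if any MGMT interface key carries an IP address.

    Assumption: a static address shows up as "<ifname>|<ip>/<prefix>".
    """
    return any("|" in key for key in mgmt_interface_table)


def should_apply_dynamic_dns(mgmt_interface_table: dict) -> bool:
    """Dynamic DNS configuration is applied only without a static MGMT IP."""
    return not has_static_mgmt_ip(mgmt_interface_table)
```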
2024-01-23 16:29:55 -08:00
Hua Liu
c274be2e59
Fix IPV6 forced-mgmt-route not work issue (#17299)
Fix IPV6 forced-mgmt-route not work issue

Why I did it
IPV6 forced-mgmt-route does not work.

When adding an IPV6 route, the 'ip -6 rule add pref 32764 address' command should be used, but the '-6' parameter is currently missing in the template, so the IPV6 route is added to the IPV4 route table.

Also this PR depends on #17281 , which will fix the IPV6 'default' route table missing in IPV6 route lookup issue. 

Microsoft ADO (number only):24719238
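The address-family selection at the heart of this fix can be sketched as follows. `forced_mgmt_rule_cmd` is a hypothetical helper; the `to <prefix> lookup default` form is an assumption modeled on typical `ip rule list` output, not a copy of the actual template.

```python
import ipaddress


def forced_mgmt_rule_cmd(prefix: str, pref: int = 32764) -> list:
    """Build an `ip rule add` command for a forced management route.

    The address family flag is derived from the prefix itself, so an IPv6
    prefix can never end up in the IPv4 rule table.
    """
    iface = ipaddress.ip_interface(prefix)  # raises ValueError on bad input
    family = "-6" if iface.version == 6 else "-4"
    return ["ip", family, "rule", "add", "pref", str(pref),
            "to", str(iface), "lookup", "default"]
```

Deriving the flag from the parsed address, rather than passing it separately, removes the chance of the two getting out of sync.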
2024-01-22 09:59:12 -08:00
Nazarii Hnydyn
e173987a56
[swss/syncd]: Remove dependency on interfaces-config.service (#17739)
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
Co-authored-by: Stepan Blyshchak <38952541+stepanblyschak@users.noreply.github.com>
2024-01-18 08:04:00 -08:00
Liping Xu
d6e0bf66a6
disable restapi for leafRouter in slim image (#17713)
Why I did it
For some devices with small memory, after upgrading to the latest image, the available memory is not enough.

Work item tracking
Microsoft ADO (number only):
26324242
How I did it
Disable the restapi feature for LeafRouter devices that run the slim image.

How to verify it
verified on 7050qx T1 (slim image), restapi disabled
verified on 7050qx T0 (slim image), restapi enabled
verified on 7260 T1 (normal image), restapi enabled
2024-01-12 15:26:06 +08:00
Lawrence Lee
eb70bff4b7
add timeout to ping6 command (#17729)
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2024-01-10 14:40:15 -08:00
prabhataravind
c20abb9e28
[docker_image_ctl.j2]: swss docker initialization improvements (#17628)
* [docker_image_ctl.j2]: swss docker initialization improvements

This commit attempts to address the following:
 * Make sure swss container is indeed up and running before running any commands
   on it. In case where swss container is not fully up when swss.sh attempts to
   create swss:/ready file using "docker exec swss$DEV touch", the command can
   fail silently and can cause swssconfig to wait forever leading to missing IP
   decap configuration among other things. Add a wait so that docker commands
   are run only after swss container status is "Running"
*  Add a log when swss:/ready file is created or if the file creation fails so
   that it becomes easier to debug such scenarios in the future
* [docker_image_ctl.j2]: Use swss$DEV to accommodate multi ASIC platforms as well
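The wait described above can be sketched as a polling loop. The real change lives in a shell template; this Python version only illustrates the logic, and the function names, timeout, and interval are assumptions. `docker inspect -f '{{.State.Running}}'` prints `true` or `false` for an existing container.

```python
import subprocess
import time


def container_running(name: str) -> bool:
    """Ask Docker whether the named container is currently running."""
    result = subprocess.run(
        ["docker", "inspect", "-f", "{{.State.Running}}", name],
        capture_output=True, text=True,
    )
    return result.returncode == 0 and result.stdout.strip() == "true"


def wait_until_running(name, is_running=container_running,
                       timeout=60.0, interval=1.0) -> bool:
    """Poll until the container runs; return False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if is_running(name):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Only after `wait_until_running("swss")` returns True would a command such as creating the `/ready` file be attempted, and a False result is the natural place to emit the failure log.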

Signed-off-by: Prabhat Aravind <paravind@microsoft.com>
2024-01-03 17:44:22 -08:00
bingwang-ms
977e73d370
Update backend_acl.py to specify ACL table name (#17553) 2024-01-03 14:55:38 -08:00
prabhataravind
038ca267c8
[image_config]: Update DHCP rate-limit for mgmt TOR devices (#17630)
* [image_config]: Update DHCP rate-limit for mgmt TOR devices

    Change DHCP rate limit(queue4,group3) in SONiC copp configuration to 300 PPS
    for mgmt TORs while keeping the rate limit at 100 PPS for other topologies.

    Why I did it:
    Some mgmt TORs based on Marvell ASIC do not support 100 PPS CIR, so that led
    to these devices silently dropping DHCP packets.

    Microsoft ADO: **25820076**

    How to verify it:
    Send DHCP broadcast packets to an M0 DUT and verify that they are trapped to
    CPU at 300 PPS. On non-mgmt devices, the packets should be trapped at CIR of
    100 PPS. Also ran sonic-mgmt dhcp_relay test and confirmed that it passes.
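The per-topology choice can be illustrated with a small lookup. The device-type string `MgmtToRRouter` and the helper name are assumptions made for this sketch; only the 300 PPS and 100 PPS values come from the description above.

```python
# Assumed device-type name for mgmt TOR (M0) devices; adjust to the real value.
MGMT_TOR_TYPES = {"MgmtToRRouter"}

MGMT_TOR_DHCP_CIR_PPS = 300  # mgmt TORs, per the description above
DEFAULT_DHCP_CIR_PPS = 100   # all other topologies


def dhcp_cir_pps(device_type: str) -> int:
    """Return the DHCP trap rate limit (CIR, in PPS) for a device type."""
    if device_type in MGMT_TOR_TYPES:
        return MGMT_TOR_DHCP_CIR_PPS
    return DEFAULT_DHCP_CIR_PPS
```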

Signed-off-by: Prabhat Aravind <paravind@microsoft.com>
2024-01-02 21:29:34 -08:00
Junchao-Mellanox
f3f2972512
Optimize syslog rate limit feature for fast and warm boot (#17458)
- Why I did it
Optimize syslog rate limit feature for fast and warm boot

- How I did it
Optimize redis start time
Don't render rsyslog.conf in container startup script
Disable containercfgd by default. There is a new CLI to enable it (in another PR)

- How to verify it
Manual test
Regression test
2023-12-20 09:12:03 +02:00
Prince George
30ff77350f
Fix the fsck script that does filesystem repair (#17424)
Fix the fsck check which is not working. Potentially fixes #16938
Modified the fsck script to run the ext4 fsck on the disk where SONiC resides

Microsoft ADO: 26098631
2023-12-19 17:51:49 -08:00
Junhua Zhai
53be9de743
Fix syncd_request_shutdown coredump in config reload on KVM sonic (#17486)
The issue is related to #16812. Process syncd does not run in the container gbsyncd on kvm sonic with default hwsku.

Microsoft ADO : 26151608

How I did it
If syncd has not run in the gbsyncd container, there is no need to trigger a graceful shutdown of syncd.

How to verify it
No syncd_request_shutdown coredump occurs during config reload on KVM SONiC
2023-12-13 17:37:44 -08:00
Yevhen Fastiuk
5efb123ede
[NTP] Add NTP extended configuration (#15058)
hld [#1296](https://github.com/sonic-net/SONiC/pull/1296)
closes [#1254](https://github.com/sonic-net/SONiC/issues/1254)
depends-on [#60](https://github.com/sonic-net/sonic-host-services/pull/60), [#781](https://github.com/sonic-net/sonic-swss-common/pull/781), [#2835](https://github.com/sonic-net/sonic-utilities/pull/2835), [#10749](https://github.com/sonic-net/sonic-mgmt/pull/10749)

#### Why I did it
To cover the next AIs:
* Configure NTP global parameters
* Add/remove new NTP servers
* Change the configuration for NTP servers
* Show NTP status
* Show NTP configuration

### How I did it
* Add YANG model for a new configuration
* Extend configuration templates to support new knobs

### Description for the changelog
* Add ability to configure NTP global parameters such as authentication, dhcp, admin state
* Change the configuration for NTP servers
* Add an ability to show NTP configuration

#### Link to config_db schema for YANG module changes
[NTP configuration](https://github.com/sonic-net/sonic-buildimage/blob/master/src/sonic-yang-models/doc/Configuration.md#ntp-and-syslog-servers)
2023-12-11 13:31:35 -08:00
Stepan Blyshchak
b61528bee9
Revert "[swss/syncd] remove dependency on interfaces-config.service (#13084) (#14341)" (#15094) (#17367)
This reverts commit 499f57a7f7.

Co-authored-by: Nazarii Hnydyn <nazariig@nvidia.com>
2023-12-07 15:20:39 -08:00
Ying Xie
2e072beb41
Revert "[pmon] update gRPC version to 1.57.0 (#16257)" (#17401)
This reverts commit 45a852233b.
2023-12-07 11:01:47 -08:00
centecqianj
8ec4b53451
[Bookworm] Upgrade centec-arm64 platform to Bookworm. (#17411)
Why I did it
1. Upgrade centec-arm64 platform to Bookworm.
2. Solve the problem of compiling the docker-syncd-centec-rpc.gz error on the centec platform.

How I did it
1. Modified platform driver to comply with bookworm kernel.
2. Upgrade SONiC package versions of the centec platform.

How to verify it
1. Compile the centec-arm64 platform to generate sonic-centec-arm64.bin.
2. Compile the centec platform to generate docker-syncd-centec-rpc.gz.

Signed-off-by: centecqianj <qianj@centec.com>
2023-12-07 08:42:13 -08:00
Stepan Blyshchak
9555883e6f
[config-chassisdb] use cached variables (#17342)
- Why I did it
Improve boot performance, mostly needed for fast and warm boot

- How I did it
Use cached variable.

- How to verify it
Boot the system. Simply do "systemd-analyze blame" and look at service start time.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2023-12-07 15:24:21 +02:00
Stepan Blyshchak
6435df1056
[config-topology] use cached variables (#17343)
- Why I did it
Improve boot performance, mostly needed for fast and warm boot

- How I did it
Use cached variable.

- How to verify it
Boot the system. Simply do "systemd-analyze blame" and look at service start time.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2023-12-07 15:22:44 +02:00
Aaron Payment
0ecee5df05
[gbsyncd]: Set SYSLOG_CONFIG_FEATURE for gbsyncd (#17325)
Why I did it
SONiC Mgmt test syslog/test_syslog_rate_limit.py syslog.test_syslog_rate_limit test_syslog_rate_limit was failing on SKUs with gbsyncd. This includes Arista 720DT when testing on the 202305 branch.

How I did it
The issue was that "show syslog rate-limit-container" showed no value for gbsyncd,
because gbsyncd has no SYSLOG_CONFIG_FEATURE|gbsyncd entry in
config_db, which in turn is because the gbsyncd feature is not enabled
through init_cfg.json.j2.

How to verify it
Test is now passing on 720DT in 202305 branch.

Co-authored-by: Boyang Yu <byu@arista.com>
2023-12-06 22:04:21 -08:00
Junhua Zhai
048f2a7c39
[gbsyncd] Graceful shutdown of syncd process in container gbsyncd (#16812)
Fix #16608. Need to gracefully shutdown syncd/gbsyncd individually.
2023-12-06 21:43:13 -08:00
Hua Liu
164916681a
Fix can't access IPV6 address via management interface because 'default' route table does not add to route lookup issue. (#17281)
Fix can't access IPV6 address via management interface because 'default' route table does not add to route lookup issue.

#### Why I did it
When a device is configured with an IPV6 TACACS server address and all BGP sessions are shut down, the device can't connect to the TACACS server via the management interface.

After investigation, I found that the IPV6 'default' route table is not added to the route lookup:

admin@vlab-01:~$ ip -6 rule list
1001:   from all lookup local
32765:  from fec0::ffff:afa:1 lookup default
32766:  from all lookup main
admin@vlab-01:~$

For comparison:
admin@vlab-01:~$ ip -4 rule list
1001:   from all lookup local
32764:  from all to 172.17.0.1/24 lookup default
32765:  from 10.250.0.101 lookup default
32766:  from all lookup main
32767:  from all lookup default <== 'default' route table exist in IPV4 route lookup

The issue is fixed by adding the 'default' route table to the route lookup with the following command:
admin@vlab-01:~$ sudo ip -6 rule add pref 32767 lookup default
admin@vlab-01:~$ ip -6 rule list
1001:   from all lookup local
32765:  from fec0::ffff:afa:1 lookup default
32766:  from all lookup main
32767:  from all lookup default <== 'default' route table been added to IPV6 route lookup
admin@vlab-01:~$

##### Work item tracking
- Microsoft ADO: 25798732

#### How I did it
When the management interface uses the 'default' route table, add the 'default' route table to the IPV6 route lookup.
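The check behind this fix can be sketched as a parser over `ip -6 rule list` output. The helper name is hypothetical, and the sample text mirrors the listings in the description with the `<==` annotations removed.

```python
def has_catch_all_default_lookup(rule_listing: str) -> bool:
    """True if a 'from all lookup default' rule appears in the listing."""
    for line in rule_listing.splitlines():
        # Each line looks like "32767:  from all lookup default".
        # Split on the first colon only, since IPv6 addresses contain colons.
        _, _, selector = line.partition(":")
        if selector.split() == ["from", "all", "lookup", "default"]:
            return True
    return False


before = """1001:   from all lookup local
32765:  from fec0::ffff:afa:1 lookup default
32766:  from all lookup main"""
after = before + "\n32767:  from all lookup default"

print(has_catch_all_default_lookup(before), has_catch_all_default_lookup(after))
```

A rule such as `from fec0::ffff:afa:1 lookup default` does not count, because it only matches traffic from that one source address.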

#### How to verify it
Pass all UT.
Add new UT to cover this change.
Manually verify issue fixed:

### Tested branch (Please provide the tested image version)

- [x]  master-17281.417570-2133d58fa

#### Description for the changelog
Fix can't access IPV6 address via management interface because 'default' route table does not add to route lookup issue.
2023-12-05 11:51:56 -08:00
Ashwin Hiranniah
ada7c6a72e
Add pensando platform (#15978)
This commit adds support for pensando asic called ELBA. ELBA is used in pci based cards and in smartswitches.

#### Why I did it
This commit introduces pensando platform which is based on ELBA ASIC.
##### Work item tracking
- Microsoft ADO **(number only)**:

#### How I did it
Created platform/pensando folder and created makefiles specific to pensando.
This mainly creates the pensando docker (which OEMs need to download before building an image), which has all the userspace needed to initialize and use the DPU (ELBA ASIC).
Output of the build process creates two images which can be used from ONIE and goldfw.
The recommendation is to use ONIE.
#### How to verify it
Load the SONiC image via ONIE or goldfw and make sure the interfaces are UP.

##### Description for the changelog
Add pensando platform support.
2023-12-04 14:41:52 -08:00
Kebo Liu
4c699050e8
[Mellanox] Add special rsyslog filter for MSN2410 platform (#17365)
- Why I did it
Mellanox MSN2410 platforms have a non-functional error log: "ERR pmon#sensord: Error getting sensor data: dps460/#10: Can't read". This error is caused by a firmware issue with some PSUs, and the FW cannot be upgraded online. Since there is no functional impact, this error log can be ignored safely

- How I did it
Add a new rsyslog rule to the rsyslog-container.conf.j2, if the docker name is pmon and the platform name matches, the new rule will be inserted into the docker rsyslogd.conf

- How to verify it
Run regression on the MSN2410 platform to make sure the error log is not printed to the syslog.

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2023-12-03 15:32:56 +02:00
centecqianj
8db3a99d11
[Bookworm] Upgrade centec platforms to Bookworm (#17364)
How I did it
Modified platform driver to comply with bookworm kernel.
Modified python build commands for building whl packages.

How to verify it
Verify whether all the platform bookworm debs are built.
make target/debs/bookworm/platform-modules-v682-48y8c-d_1.0_amd64.deb
Load the platform debian into the device and install it in bookworm image.
Verify the platform related CLI and the functionality

Signed-off-by: centecqianj <qianj@centec.com>
2023-12-01 16:07:52 -08:00
Lawrence Lee
572af1dcdf
[arp_update]: Flush neighbors with incorrect MAC info (#17238)
[arp_update]: Flush MAC mismatch neighbors

- Check for MAC mismatch between neighbor entries in the kernel and APPL_DB
- Flush any entries with a mismatch
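The mismatch check can be sketched as a comparison of two neighbor tables. This assumes both views have already been read into dicts mapping neighbor IP to MAC string; the function name is hypothetical, and the actual flush step is left out.

```python
def mismatched_neighbors(kernel: dict, appl_db: dict) -> list:
    """Return neighbor IPs whose MAC differs between kernel and APPL_DB.

    Neighbors present in only one of the two tables are ignored here;
    only entries known to both sides can have a MAC mismatch.
    """
    stale = []
    for ip, kernel_mac in kernel.items():
        db_mac = appl_db.get(ip)
        # Compare case-insensitively, since MAC strings may differ in case.
        if db_mac is not None and db_mac.lower() != kernel_mac.lower():
            stale.append(ip)
    return sorted(stale)
```

The returned list is what a caller would then flush from the neighbor table.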
2023-11-30 14:23:05 -08:00
Xincun Li
f13081bfbd
Ensure that 'logrotate-config.service' is set as a dependency to start before 'logrotate.service'. (#17312)
* Ensure that 'logrotate-config.service' is set as a dependency to start before 'logrotate.service'.
2023-11-29 17:22:47 -08:00
Vivek
4727185648
[lldp] Clean up service start logic owing to port init start optimization (#17268)
Signed-off-by: Vivek Reddy <vkarri@nvidia.com>
2023-11-27 09:56:54 -08:00
prabhataravind
aea3c42f29
[image_config]: Update DHCP rate-limit (#17132)
Change DHCP rate limit in SONiC copp configuration to 100 PPS as this is
necessary to ensure that DHCP flood does not cause LACP/BGP flaps in all
scenarios

This is an extension to the change in image_config: copp: Enable rate limiting 
for bgp, lacp, dhcp, lldp, macsec and udld #14859 and sonic-mgmt change in 
[tests/copp]: Update copp mgmt tests to support new rate-limits sonic-mgmt#8199

Why I did it
300 PPS is not sufficient to prevent LACP/BGP flaps in all cases. 100 PPS seems to
provide better resiliency against DHCP traffic flood to CPU.

Microsoft ADO 25776614:

Send DHCP broadcast packets to DUT and verify that they are trapped to CPU at 100 PPS.

Signed-off-by: Prabhat Aravind <paravind@microsoft.com>
2023-11-22 15:02:17 -08:00
mssonicbld
52e304afcf [ci/build]: Upgrade SONiC package versions (#17035) 2023-11-21 18:53:15 -08:00
Saikrishna Arcot
318f3945be Modify the sudoers file to lecture RO users once
Debian changed the defaults of the sudo package to never lecture the
user when using an unauthorized sudo command, which breaks our use case
of lecturing once. Add a line to lecture once, which is the old
default.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2023-11-21 18:53:15 -08:00
Saikrishna Arcot
862bd794ee Fix container down event not sending out a notification
systemd changed the log message syntax for a container going down.
Update the regex for the new format.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2023-11-21 18:53:15 -08:00
Saikrishna Arcot
cae42998dd Fix PAM module configuration issue
pam-auth-update doesn't store local configuration, and it's meant to be
used by packages only. Because libpam-systemd was getting uninstalled
afterwards, this caused tacplus to get re-enabled.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2023-11-21 18:53:15 -08:00
Saikrishna Arcot
73605a98ef Modify rasdaemon service on amd64 only
Rasdaemon is not installed on armhf or arm64

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2023-11-21 18:53:15 -08:00
Saikrishna Arcot
0664c791ef For Bookworm, use non-free-firmware instead of non-free
Starting with Bookworm, Debian moved the non-free Linux firmware blobs
into a new non-free-firmware component, since they are frequently needed
by users and since they need to be updated frequently. Since the only
thing we currently install from the non-free component (that I can think
of) is the Linux firmware, have Bookworm use non-free-firmware instead
of non-free.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2023-11-21 18:53:15 -08:00
Saikrishna Arcot
ed5176107b Update Debian build script for Bookworm
Notable changes:
* Use j2cli from Debian repos instead of pip
* Use setuptools from Debian repos instead of pip
* Use wheel from Debian repos instead of pip
* Update grpcio and grpcio-tools python packages to match version in
  Bookworm
* Use m2crypto from Debian repos instead of pip

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2023-11-21 18:53:15 -08:00