sonic-buildimage

Archived

Author	SHA1	Message	Date
rajib-dutta1	4753953ed0	Ipmitool bookworm: Fix and patch enterprise-numbers URL (#17878 ) ### Why I did it The ipmitool utility is used to access various HW sensors. Some platforms use "ipmitool raw " to read specific addresses. ipmitool_1.8.19-4_amd64.deb, which is part of bookworm, has a defect. The package is missing the file enterprise.txt that is expected by the "raw read" code path. This is because the file the .deb tries to download at build time does not have the extension under which it is available on the remote server: https://www.iana.org/assignments/enterprise-numbers.txt ### How I did it The defect has been fixed by code changes in the next unstable version of Linux. The fix is expected to be available in a future stable version of the OS. Hence, to keep the changes minimal, the .dsc file is downloaded and only the Makefile is modified to download the correct file. To make it work as a patch, the necessary changes are made. #### How to verify it Build log is attached and installation of the file is noted at line #2274 When using vanilla bookworm on platforms like 5212 or 5224: ------------------------------------------------------------------- root@sonic:~# ipmitool raw 0x04 0x2d 0x31 IANA PEN registry open failed: No such file or directory 00 c0 01 80 When fixed we should not see the above error: -------------------------------------------------- root@sonic:/home/admin# ipmitool raw 0x04 0x2d 0x31 00 c0 00 80 ### Description for the changelog This change addresses the ipmitool raw read issue. This patch must be removed once the next stable Linux release that contains the fix is available. `1edb0e27e4`	2024-02-26 17:49:06 -08:00
Prince George	0564ce48c9	[baseimage]: Update smartmontool version >= v7.4 (#17635 ) Why I did it Update smartmontools version to 7.4. This is done to prevent the smartmontools service from exiting with a non-zero exit status on platforms that do not have an SSD/disk to be monitored. Until Debian Bullseye (which had smartmontools 7.2), Debian had a patch applied that changed the default quit mode to never exit. A bug report was filed on Debian, saying that the source code patch isn't needed and could just be done via command line options, and also that smartmontools 7.3 has a new built-in option to exit with 0 if there are no monitorable devices found (which prevents systemd from treating it as a service failure). Because of that, Debian Bookworm (which also upgraded to 7.3) removed the patch and restored the default behavior of exiting with exit code 17 if there are no devices found. Smartmontools v7.3 has an issue because of which smartd exits with a non-zero exit status even with the "-q" option. How I did it Update smartmontools to version 7.4, which has the fix for exiting gracefully if no monitoring device is found. Added smartd option "-q nodev0" to allow smartd to exit with status 0 if no monitoring device is found	2024-02-12 09:37:12 -08:00
Stepan Blyshchak	cac73d80ca	[bootchart] enable command line recording (#17778 ) Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2024-02-12 08:36:44 -08:00
Zain Budhwani	c8439cdd4b	Disable eventd and rsyslog plugin in slim images (#17905 ) ### Why I did it Disable eventd at buildtime for slim images ##### Work item tracking - Microsoft ADO (number only):26386286 #### How I did it Add flags for disabling eventd and only copy rsyslog conf files when eventd is included and not slim image #### How to verify it Manual testing	2024-01-30 22:14:23 -08:00
Kevin Wang	5516381d7e	[qos] change the template keyword from Compute-AI to ComputeAI (#17902 ) Why I did it Align the keywords to make qos configuration take effect Work item tracking Microsoft ADO (number only): How I did it Change the keyword to ComputeAI How to verify it reload minigraph and check the qos configuration	2024-01-29 10:10:54 +08:00
ganglv	c798ea8e08	Change tcp port range to support telemetry and gnmi (#17907 ) * Reserve tcp port for telemetry and gnmi * Use ip_local_port_range instead * Fix sysctl config	2024-01-26 09:31:09 -08:00
Hua Liu	bdb24676eb	Change orchagent stuck message from ERR to WARNING (#17872 ) Change orchagent stuck message from ERR to WARNING #### Why I did it During switch initialization, Orchagent is sometimes busy for more than 40 seconds, which triggers the process-stuck watchdog error. To mitigate this, change the watchdog error message to a warning message. ##### Work item tracking - Microsoft ADO: 26517622 #### How I did it Change orchagent stuck message from ERR to WARNING. #### How to verify it Pass all UT. ### Description for the changelog Change orchagent stuck message from ERR to WARNING.	2024-01-26 00:01:50 -08:00
Zain Budhwani	b557488608	Remove echo log to /tmp/{$SERVICE}-debug.log in service_mgmt.sh (#17838 ) ### Why I did it Unnecessary for logs to be written out to /tmp/${SERVICE}-debug.log as they are already being written to syslog. Therefore, removing writing to a new log in concern for memory space and not being able to startup some services in RO state. ##### Work item tracking - Microsoft ADO (number only):26458976 #### How I did it Remove DEBUGLOG definition and line that echo's message to mentioned log file. #### How to verify it Manually verified, /tmp/${SERVICE}-debug.log files do not exist and log for service starting still appears in syslog	2024-01-25 17:14:21 -08:00
mssonicbld	1fb9732f41	[ci/build]: Upgrade SONiC package versions	2024-01-25 14:35:40 +08:00
Oleksandr Ivantsiv	c693e75f0f	[dns] Do not apply dynamic DNS configuration when MGMT interface has static IP address. (#17769 ) ### Why I did it Fix the issue detected by[ TestStaticMgmtPortIP::test_dynamic_dns_not_working_when_static_ip_configured ](https://github.com/sonic-net/sonic-mgmt/blob/master/tests/dns/static_dns/test_static_dns.py#L105C9-L105C63) test. ### How I did it Query MGMT interface configuration. Do not apply dynamic DNS configuration when MGMT interface has static IP address. #### How to verify it Run `tests/dns/static_dns/test_static_dns.py` sonic-mgmt tests.	2024-01-23 16:29:55 -08:00
Hua Liu	c274be2e59	Fix IPV6 forced-mgmt-route not work issue (#17299 ) Fix IPV6 forced-mgmt-route not work issue Why I did it IPV6 forced-mgmt-route does not work. When adding an IPV6 route, the 'ip -6 rule add pref 32764 address' command should be used, but the '-6' parameter is currently missing from the template, so the IPV6 route is added to the IPV4 route table. This PR also depends on #17281 , which fixes the IPV6 'default' route table missing from IPV6 route lookup. Microsoft ADO (number only):24719238	2024-01-22 09:59:12 -08:00
Nazarii Hnydyn	e173987a56	[swss/syncd]: Remove dependency on interfaces-config.service (#17739 ) Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com> Co-authored-by: Stepan Blyshchak <38952541+stepanblyschak@users.noreply.github.com>	2024-01-18 08:04:00 -08:00
Liping Xu	d6e0bf66a6	disable restapi for leafRouter in slim image (#17713 ) Why I did it For some devices with small memory, the available memory is not enough after upgrading to the latest image. Work item tracking Microsoft ADO (number only): 26324242 How I did it Disable the restapi feature for LeafRouter devices running the slim image. How to verify it verified on 7050qx T1 (slim image), restapi disabled verified on 7050qx T0 (slim image), restapi enabled verified on 7260 T1 (normal image), restapi enabled	2024-01-12 15:26:06 +08:00
Lawrence Lee	eb70bff4b7	add timeout to ping6 command (#17729 ) Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2024-01-10 14:40:15 -08:00
prabhataravind	c20abb9e28	[docker_image_ctl.j2]: swss docker initialization improvements (#17628 ) * [docker_image_ctl.j2]: swss docker initialization improvements This commit attempts to address the following: * Make sure swss container is indeed up and running before running any commands on it. In case where swss container is not fully up when swss.sh attempts to create swss:/ready file using "docker exec swss$DEV touch", the command can fail silently and can cause swssconfig to wait forever leading to missing IP decap configuration among other things. Add a wait so that docker commands are run only after swss container status is "Running" * Add a log when swss:/ready file is created or if the file creation fails so that it becomes easier to debug such scenarios in the future * [docker_image_ctl.j2]: Use swss$DEV to accommodate multi ASIC platforms as well Signed-off-by: Prabhat Aravind <paravind@microsoft.com>	2024-01-03 17:44:22 -08:00
bingwang-ms	977e73d370	Update backend_acl.py to specify ACL table name (#17553 )	2024-01-03 14:55:38 -08:00
prabhataravind	038ca267c8	[image_config]: Update DHCP rate-limit for mgmt TOR devices (#17630 ) * [image_config]: Update DHCP rate-limit for mgmt TOR devices Change DHCP rate limit(queue4,group3) in SONiC copp configuration to 300 PPS for mgmt TORs while keeping the rate limit at 100 PPS for other topologies. Why I did it: Some mgmt TORs based on Marvell ASIC do not support 100 PPS CIR, so that led to these devices silently dropping DHCP packets. Microsoft ADO: 25820076 How to verify it: Send DHCP broadcast packets to an M0 DUT and verify that they are trapped to CPU at 300 PPS. On non-mgmt devices, the packets should be trapped at CIR of 100 PPS. Also ran sonic-mgmt dhcp_relay test and confirmed that it passes. Signed-off-by: Prabhat Aravind <paravind@microsoft.com>	2024-01-02 21:29:34 -08:00
Junchao-Mellanox	f3f2972512	Optimize syslog rate limit feature for fast and warm boot (#17458 ) - Why I did it Optimize syslog rate limit feature for fast and warm boot - How I did it Optimize redis start time Don't render rsyslog.conf in container startup script Disable containercfgd by default. There is a new CLI to enable it (in another PR) - How to verify it Manual test Regression test	2023-12-20 09:12:03 +02:00
Prince George	30ff77350f	Fix the fsck script that does filesystem repair (#17424 ) Fix the fsck check which is not working. Potentially fixes #16938 Modified fsck script to run on the ext4.fsck on the appropriate disk where SONiC resides Microsoft ADO: 26098631	2023-12-19 17:51:49 -08:00
Junhua Zhai	53be9de743	Fix syncd_request_shutdown coredump in config reload on KVM sonic (#17486 ) The issue is related to #16812. Process syncd does not run in the container gbsyncd on kvm sonic with default hwsku. Microsoft ADO : 26151608 How I did it If syncd is not running in container gbsyncd, there is no need to trigger a graceful shutdown of syncd. How to verify it No syncd_request_shutdown coredump occurs in config reload on KVM sonic	2023-12-13 17:37:44 -08:00
Yevhen Fastiuk	5efb123ede	[NTP] Add NTP extended configuration (#15058 ) hld [#1296](https://github.com/sonic-net/SONiC/pull/1296) closes [#1254](https://github.com/sonic-net/SONiC/issues/1254) depends-on [#60](https://github.com/sonic-net/sonic-host-services/pull/60), [#781](https://github.com/sonic-net/sonic-swss-common/pull/781), [#2835](https://github.com/sonic-net/sonic-utilities/pull/2835), [#10749](https://github.com/sonic-net/sonic-mgmt/pull/10749) #### Why I did it To cover the next AIs: * Configure NTP global parameters * Add/remove new NTP servers * Change the configuration for NTP servers * Show NTP status * Show NTP configuration ### How I did it * Add YANG model for a new configuration * Extend configuration templates to support new knobs ### Description for the changelog * Add ability to configure NTP global parameters such as authentication, dhcp, admin state * Change the configuration for NTP servers * Add an ability to show NTP configuration #### Link to config_db schema for YANG module changes [NTP configuration](https://github.com/sonic-net/sonic-buildimage/blob/master/src/sonic-yang-models/doc/Configuration.md#ntp-and-syslog-servers)	2023-12-11 13:31:35 -08:00
Stepan Blyshchak	b61528bee9	Revert "[swss/syncd] remove dependency on interfaces-config.service (#13084 ) (#14341 )" (#15094 ) (#17367 ) This reverts commit `499f57a7f7`. Co-authored-by: Nazarii Hnydyn <nazariig@nvidia.com>	2023-12-07 15:20:39 -08:00
Ying Xie	2e072beb41	Revert "[pmon] update gRPC version to 1.57.0 (#16257 )" (#17401 ) This reverts commit `45a852233b`.	2023-12-07 11:01:47 -08:00
centecqianj	8ec4b53451	[Bookworm] Upgrade centec-arm64 platform to Bookworm. (#17411 ) Why I did it 1. Upgrade centec-arm64 platform to Bookworm. 2. Solve the problem of compiling the docker-syncd-centec-rpc.gz error on the centec platform. How I did it 1. Modified platform driver to comply with bookworm kernel. 2. Upgrade SONiC package versions of the centec platform. How to verify it 1. Compile the centec-arm64 platform to generate sonic-centec-arm64.bin. 2. Compile the centec platform to generate docker-syncd-centec-rpc.gz. Signed-off-by: centecqianj <qianj@centec.com>	2023-12-07 08:42:13 -08:00
Stepan Blyshchak	9555883e6f	[config-chassisdb] use cached variables (#17342 ) - Why I did it Improve boot performance mostly needed for fast and warmboot - How I did it Use cached variable. - How to verify it Boot the system. Simply do "systemd-analyze blame" and look at service start time. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2023-12-07 15:24:21 +02:00
Stepan Blyshchak	6435df1056	[config-topology] use cached variables (#17343 ) - Why I did it Improve boot performance mostly needed for fast and warmboot - How I did it Use cached variable. - How to verify it Boot the system. Simply do "systemd-analyze blame" and look at service start time. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2023-12-07 15:22:44 +02:00
Aaron Payment	0ecee5df05	[gbsyncd]: Set SYSLOG_CONFIG_FEATURE for gbsyncd (#17325 ) Why I did it SONiC Mgmt test syslog/test_syslog_rate_limit.py syslog.test_syslog_rate_limit test_syslog_rate_limit was failing on SKUs with gbsyncd. This includes Arista 720DT when testing on the 202305 branch. How I did it The issue was that "show syslog rate-limit-container" had no value for gbsyncd, because gbsyncd does not have a SYSLOG_CONFIG_FEATURE|gbsyncd entry in config_db, which in turn is because the gbsyncd feature is not enabled through init_cfg.json.j2. How to verify it Test is now passing on 720DT in 202305 branch. Co-authored-by: Boyang Yu <byu@arista.com>	2023-12-06 22:04:21 -08:00
Junhua Zhai	048f2a7c39	[gbsyncd] Graceful shutdown of syncd process in container gbsyncd (#16812 ) Fix #16608. Need to gracefully shutdown syncd/gbsyncd individually.	2023-12-06 21:43:13 -08:00
Hua Liu	164916681a	Fix can't access IPV6 address via management interface because 'default' route table does not add to route lookup issue. (#17281 ) Fix can't access IPV6 address via management interface because 'default' route table does not add to route lookup issue. #### Why I did it When device set with IPV6 TACACS server address, and shutdown all BGP, device can't connect to TACACS server via management interface. After investigation, I found the IPV6 'default' route table does not add to route lookup: admin@vlab-01:~$ ip -6 rule list 1001: from all lookup local 32765: from fec0::ffff:afa:1 lookup default 32766: from all lookup main admin@vlab-01:~$ As compare: admin@vlab-01:~$ ip -4 rule list 1001: from all lookup local 32764: from all to 172.17.0.1/24 lookup default 32765: from 10.250.0.101 lookup default 32766: from all lookup main 32767: from all lookup default <== 'default' route table exist in IPV4 route lookup Issue fix by add 'default' route table to route lookup with following command: admin@vlab-01:~$ sudo ip -6 rule add pref 32767 lookup default admin@vlab-01:~$ ip -6 rule list 1001: from all lookup local 32765: from fec0::ffff:afa:1 lookup default 32766: from all lookup main 32767: from all lookup default <== 'default' route table been added to IPV6 route lookup admin@vlab-01:~$ ##### Work item tracking - Microsoft ADO: 25798732 #### How I did it When management interface using 'default' route table, add 'default' route table to IPV6 route lookup. #### How to verify it Pass all UT. Add new UT to cover this change. Manually verify issue fixed: ### Tested branch (Please provide the tested image version) - [x] master-17281.417570-2133d58fa #### Description for the changelog Fix can't access IPV6 address via management interface because 'default' route table does not add to route lookup issue.	2023-12-05 11:51:56 -08:00
Ashwin Hiranniah	ada7c6a72e	Add pensando platform (#15978 ) This commit adds support for the pensando asic called ELBA. ELBA is used in pci based cards and in smartswitches. #### Why I did it This commit introduces the pensando platform, which is based on the ELBA ASIC. ##### Work item tracking - Microsoft ADO (number only): #### How I did it Created the platform/pensando folder and created makefiles specific to pensando. This mainly creates the pensando docker (which OEMs need to download before building an image), which has all the userspace needed to initialize and use the DPU (ELBA ASIC). The build process outputs two images, which can be used from ONIE and goldfw. The recommendation is to use ONIE. #### How to verify it Load the SONiC image via ONIE or goldfw and make sure the interfaces are UP. ##### Description for the changelog Add pensando platform support.	2023-12-04 14:41:52 -08:00
Kebo Liu	4c699050e8	[Mellanox] Add special rsyslog filter for MSN2410 platform (#17365 ) - Why I did it Mellanox MSN2410 platforms have a non-functional error log: "ERR pmon#sensord: Error getting sensor data: dps460/#10: Can't read". This error is caused by a firmware issue with some PSUs, and we are not able to upgrade the FW online. Since there is no functional impact, this error log can be ignored safely. - How I did it Add a new rsyslog rule to rsyslog-container.conf.j2; if the docker name is pmon and the platform name matches, the new rule will be inserted into the docker rsyslogd.conf - How to verify it Run regression on the MSN2410 platform to make sure the error log is not printed to the syslog. Signed-off-by: Kebo Liu <kebol@nvidia.com>	2023-12-03 15:32:56 +02:00
centecqianj	8db3a99d11	[Bookworm] Upgrade centec platforms to Bookworm (#17364 ) How I did it Modified platform driver to comply with bookworm kernel. Modified python build commands for building whl packages. How to verify it Verify whether all the platform bookworm debs are built. make target/debs/bookworm/platform-modules-v682-48y8c-d_1.0_amd64.deb Load the platform debian into the device and install it in bookworm image. Verify the platform related CLI and the functionality Signed-off-by: centecqianj <qianj@centec.com>	2023-12-01 16:07:52 -08:00
Lawrence Lee	572af1dcdf	[arp_update]: Flush neighbors with incorrect MAC info (#17238 ) [arp_update]: Flush MAC mismatch neighbors - Check for MAC mismatch between neighbor entries in the kernel and APPL_DB - Flush any entries with a mismatch	2023-11-30 14:23:05 -08:00
Xincun Li	f13081bfbd	Ensure that 'logrotate-config.service' is set as a dependency to start before 'logrotate.service'. (#17312 ) * Ensure that 'logrotate-config.service' is set as a dependency to start before 'logrotate.service'.	2023-11-29 17:22:47 -08:00
Vivek	4727185648	[lldp] Clean up service start logic owing to port init start optimization (#17268 ) Signed-off-by: Vivek Reddy <vkarri@nvidia.com>	2023-11-27 09:56:54 -08:00
prabhataravind	aea3c42f29	[image_config]: Update DHCP rate-limit (#17132 ) Change DHCP rate limit in SONiC copp configuration to 100 PPS as this is necessary to ensure that DHCP flood does not cause LACP/BGP flaps in all scenarios This is an extension to the change in image_config: copp: Enable rate limiting for bgp, lacp, dhcp, lldp, macsec and udld #14859 and sonic-mgmt change in [tests/copp]: Update copp mgmt tests to support new rate-limits sonic-mgmt#8199 Why I did it 300 PPS is not sufficient to prevent LACP/BGP flaps in all cases. 100 PPS seems to provide better resiliency against DHCP traffic flood to CPU. Microsoft ADO 25776614: Send DHCP broadcast packets to DUT and verify that they are trapped to CPU at 100 PPS. Signed-off-by: Prabhat Aravind <paravind@microsoft.com>	2023-11-22 15:02:17 -08:00
mssonicbld	52e304afcf	[ci/build]: Upgrade SONiC package versions (#17035 )	2023-11-21 18:53:15 -08:00
Saikrishna Arcot	318f3945be	Modify the sudoers file to lecture RO users once. Debian changed the defaults of the sudo package to never lecture the user when using an unauthorized sudo command, which breaks our use case of lecturing once. Add a line to lecture once, which is the old default. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2023-11-21 18:53:15 -08:00
Saikrishna Arcot	862bd794ee	Fix container down event not sending out a notification systemd changed the log message syntax for a container going down. Update the regex for the new format. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2023-11-21 18:53:15 -08:00
Saikrishna Arcot	cae42998dd	Fix PAM module configuration issue pam-auth-update doesn't store local configuration, and it's meant to be used by packages only. Because libpam-systemd was getting uninstalled afterwards, this caused tacplus to get re-enabled. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2023-11-21 18:53:15 -08:00
Saikrishna Arcot	73605a98ef	Modify rasdaemon service on amd64 only Rasdaemon is not installed on armhf or arm64 Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2023-11-21 18:53:15 -08:00
Saikrishna Arcot	0664c791ef	For Bookworm, use non-free-firmware instead of non-free Starting with Bookworm, Debian moved the non-free Linux firmware blobs into a new non-free-firmware component, since they are frequently needed by users and since they need to be updated frequently. Since the only thing we currently install from the non-free component (that I can think of) is the Linux firmware, have Bookworm use non-free-firmware instead of non-free. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2023-11-21 18:53:15 -08:00
Saikrishna Arcot	ed5176107b	Update Debian build script for Bookworm Notable changes: * Use j2cli from Debian repos instead of pip * Use setuptools from Debian repos instead of pip * Use wheel from Debian repos instead of pip * Update grpcio and grpcio-tools python packages to match version in Bookworm * Use m2crypto from Debian repos instead of pip Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2023-11-21 18:53:15 -08:00
Saikrishna Arcot	34a1ac1a0f	Migrate from ntp to ntpsec Debian Bookworm no longer uses NTP, and instead uses NTPsec. Modify our files to update/replace the NTPsec files instead. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2023-11-21 18:53:15 -08:00
abdosi	4a7aa2634f	[chassis] Support advertisement of Loopback0 of all LC's across all e-BGP peers in TSA mode (#16714 ) What I did: In Chassis TSA mode Loopback0 Ip's of each LC's should be advertise through e-BGP peers of each remote LC's How I did: - Route-map policy to Advertise own/self Loopback IP to other internal iBGP peers with a community internal_community as define in constants.yml - Route-map policy to match on above internal_community when route is received from internal iBGP peers and set a internal tag as define in constants.yml and also delete the internal_community so we don't send to any of e-BGP peers - In TSA new route-map match on above internal tag and permit the route (Loopback0 IP's of remote LC's) and set the community to traffic_shift_community. - In TSB delete the above new route-map. How I verify: Manual Verification UT updated. sonic-mgmt PR: sonic-net/sonic-mgmt#10239 Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2023-11-20 09:42:02 -08:00
Ze Gan	9f08f88a0d	[dpu]: Add DPU database service (#17161 ) Sub PRs: sonic-net/sonic-host-services#84 #17191 Why I did it According to the design, the database instances of DPU will be kept in the NPU host. Microsoft ADO (number only): 25072889 How I did it To follow the multiple ASIC design, I assume a new platform environment variable NUM_DPU will be defined in the /usr/share/sonic/device/$PLATFORM/platform_env.conf. Based on this number, NPU host will launch a corresponding number of instances for the DPU database. Signed-off-by: Ze Gan <ganze718@gmail.com>	2023-11-17 09:10:03 -08:00
ganglv	c71fb3a30f	Share image for gnmi and telemetry (#16863 ) Why I did it Share docker image to support gnmi container and telemetry container Work item tracking Microsoft ADO 25423918: How I did it Create telemetry image from gnmi docker image. Enable gnmi container and disable telemetry container by default. How to verify it Run end to end test.	2023-11-08 08:54:36 +08:00
prabhataravind	7e49530459	[copp]: Enable rate limiting for bgp, lacp, dhcp, lldp, macsec and udld (#14859 ) Why I did it It was observed that a flood of DHCP packets without rate-limiting can cause BGP flaps or lacp keepalive losses. This change attempts to prevent or reduce such BGP flaps by enabling appropriate rate-limiting in SONiC for all traffic types. Work item tracking Microsoft ADO 17964421: How I did it Set a reasonable CIR/CBS value of 300 for queue4_group3 (dhcp, lldp, macsec) and 6000 for queue4_group1. The value 300 was arrived at after testing with dhcp flooding using ptf (using multiple threads). Throttling at this rate was necessary to ensure that dhcp flooding does not cause BGP flaps. How to verify it Verified with this script running from ptf, that BGP flaps don't happen when CBS/CIR is set at 300 for queue4_group3. import threading from scapy.all import * def send_dhcp_discover(intf): dhcp_discover = Ether(dst='ff:ff:ff:ff:ff:ff',src=RandMAC()) \ /IP(src='1.1.1.1',dst='255.255.255.255') \ /UDP(sport=68,dport=67) \ /DHCP(options=[('message-type','discover'),('end')]) sendp(dhcp_discover,count=100000,iface=intf) if __name__ == "__main__": t1 = threading.Thread(target=send_dhcp_discover, args=("eth1",)) t2 = threading.Thread(target=send_dhcp_discover, args=("eth2",)) t1.start() t2.start() t1.join() t2.join() Verified on Arista-7260CX3-D108C8 running 202012 that the copp rule for queue4_group1 and queue4_group3 do NOT affect BGP packets. To verify this using PTF, the copp rules were modified to set the "CBS" and "CIR" for queue4_group1 and queue4_group3 at 600pps and 50k packets each of "BGP open" and "DHCP Discover" were simultaneously sent from the same PTF port to the DUT. It was verified using "show c cpu" that packets are hitting the cpu queue at 1200 pps (double the configured CIR/CBS for these packet types). This helped conclude that throttling rate is per trap (or packet type) and not per queue. 
Verified with updated sonic-mgmt tests ([tests/copp]: Update copp mgmt tests to support new rate-limits sonic-mgmt#8199) on broadcom and mellanox platforms that these traffic types are rate-limited. Signed-off-by: Prabhat Aravind <paravind@microsoft.com>	2023-10-25 10:49:24 -07:00
Kebo Liu	31451295d5	Add special rsyslog filter for MSN2700 platform (#16684 ) - Why I did it Mellanox MSN2700 platforms have a non-functional error log: "ERR pmon#sensord: Error getting sensor data: dps460/#10: Can't read". This error is caused by a firmware issue with some PSUs, and we are not able to upgrade the FW online. Since there is no functional impact, this error log can be ignored safely. - How I did it Add a new rsyslog rule to rsyslog-container.conf.j2; if the docker name is pmon and the platform name matches, the new rule will be inserted into the docker rsyslogd.conf - How to verify it Run regression on the MSN2700 platform to make sure the error log is not printed to the syslog. Signed-off-by: Kebo Liu <kebol@nvidia.com>	2023-10-24 17:54:44 +03:00
Samuel Angebault	e4a497183a	Add build option to reduce final image size (#16729 ) * Reduce SONiC image filesystem size Add a build option to reduce the image size. The image reduction process is affecting the builds in 2 ways: - change some packages that are installed in the rootfs - apply a rootfs reduction script The script itself will perform a few steps: - remove file duplication by leveraging hardlinks - under /usr/share/sonic since the symlinks under the device folder are lost during the build. - under /var/lib/docker since the files there will only be mounted ro - remove some extra files (man, docs, licenses, ...) - some image specific space reduction (only for aboot images currently) The script can later be improved but for now it's reducing the rootfs size by ~30%. * restore fully featured vim package	2023-10-24 10:01:58 +08:00
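
Note on commit 164916681a (#17281): the check described there, whether the IPv6 policy rules include a catch-all lookup of the 'default' table, can be sketched roughly as below. This is only an illustration and not code from the repository. The names `IpRule`, `parse_ip_rules` and `has_catch_all_default` are hypothetical, and the sample text is the `ip -6 rule list` output quoted in that commit message.

```python
import re
from typing import List, NamedTuple


class IpRule(NamedTuple):
    """One line of `ip rule list` output (hypothetical helper type)."""
    pref: int
    selector: str
    table: str


# Matches lines such as "32766: from all lookup main".
_RULE_RE = re.compile(r"^(\d+):\s+(.*?)\s+lookup\s+(\S+)\s*$")


def parse_ip_rules(text: str) -> List[IpRule]:
    """Parse `ip [-4|-6] rule list` output into IpRule tuples."""
    rules = []
    for line in text.splitlines():
        match = _RULE_RE.match(line.strip())
        if match:
            rules.append(IpRule(int(match.group(1)), match.group(2), match.group(3)))
    return rules


def has_catch_all_default(rules: List[IpRule]) -> bool:
    """True if some rule sends all traffic ("from all") to table 'default'."""
    return any(r.selector == "from all" and r.table == "default" for r in rules)


# Output quoted in the commit message before the fix.
BEFORE = (
    "1001: from all lookup local\n"
    "32765: from fec0::ffff:afa:1 lookup default\n"
    "32766: from all lookup main\n"
)
# Same output after `ip -6 rule add pref 32767 lookup default`.
AFTER = BEFORE + "32767: from all lookup default\n"

if __name__ == "__main__":
    print(has_catch_all_default(parse_ip_rules(BEFORE)))
    print(has_catch_all_default(parse_ip_rules(AFTER)))
```

On a device, the input text would come from running `ip -6 rule list`; the sketch deliberately works on captured text so it can be tried without a SONiC system.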


1299 Commits