sonic-buildimage

Author	SHA1	Message	Date
Stepan Blyshchak	6435df1056	[config-topology] use cached variables (#17343 ) - Why I did it Improve boot performance mostly needed for fast and warmboot - How I did it Use cached variable. - How to verify it Boot the system. Simply do "systemd-analyze blame" and look at service start time. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2023-12-07 15:22:44 +02:00
Aaron Payment	0ecee5df05	[gbsyncd]: Set SYSLOG_CONFIG_FEATURE for gbsyncd (#17325 ) Why I did it SONiC Mgmt test syslog/test_syslog_rate_limit.py syslog.test_syslog_rate_limit test_syslog_rate_limit was failing on SKUs with gbsyncd. This includes Arista 720DT when testing on the 202305 branch. How I did it The issue was no value for gbsyncd in "show syslog rate-limit-container", because gbsyncd does not have a SYSLOG_CONFIG_FEATURE\|gbsyncd entry in config_db, which in turn is because the gbsyncd feature is not enabled through init_cfg.json.j2. How to verify it Test is now passing on 720DT in 202305 branch. Co-authored-by: Boyang Yu <byu@arista.com>	2023-12-06 22:04:21 -08:00
Junhua Zhai	048f2a7c39	[gbsyncd] Graceful shutdown of syncd process in container gbsyncd (#16812 ) Fix #16608. Need to gracefully shutdown syncd/gbsyncd individually.	2023-12-06 21:43:13 -08:00
Hua Liu	164916681a	Fix can't access IPV6 address via management interface because 'default' route table does not add to route lookup issue. (#17281 ) Fix can't access IPV6 address via management interface because 'default' route table does not add to route lookup issue. #### Why I did it When device set with IPV6 TACACS server address, and shutdown all BGP, device can't connect to TACACS server via management interface. After investigation, I found the IPV6 'default' route table does not add to route lookup: admin@vlab-01:~$ ip -6 rule list 1001: from all lookup local 32765: from fec0::ffff:afa:1 lookup default 32766: from all lookup main admin@vlab-01:~$ As compare: admin@vlab-01:~$ ip -4 rule list 1001: from all lookup local 32764: from all to 172.17.0.1/24 lookup default 32765: from 10.250.0.101 lookup default 32766: from all lookup main 32767: from all lookup default <== 'default' route table exist in IPV4 route lookup Issue fix by add 'default' route table to route lookup with following command: admin@vlab-01:~$ sudo ip -6 rule add pref 32767 lookup default admin@vlab-01:~$ ip -6 rule list 1001: from all lookup local 32765: from fec0::ffff:afa:1 lookup default 32766: from all lookup main 32767: from all lookup default <== 'default' route table been added to IPV6 route lookup admin@vlab-01:~$ ##### Work item tracking - Microsoft ADO: 25798732 #### How I did it When management interface using 'default' route table, add 'default' route table to IPV6 route lookup. #### How to verify it Pass all UT. Add new UT to cover this change. Manually verify issue fixed: ### Tested branch (Please provide the tested image version) - [x] master-17281.417570-2133d58fa #### Description for the changelog Fix can't access IPV6 address via management interface because 'default' route table does not add to route lookup issue.	2023-12-05 11:51:56 -08:00
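The condition described in commit 164916681a can be sketched in Python: given the text printed by `ip -6 rule list`, decide whether a catch-all `from all lookup default` rule is present. The helper name `has_default_lookup` is hypothetical and is not part of SONiC; the sample text is the output quoted in the commit message.

```python
def has_default_lookup(rule_list_output: str) -> bool:
    """Return True if some rule sends all traffic to the 'default' table."""
    for line in rule_list_output.splitlines():
        # Each rule looks like "32767: from all lookup default".
        # Split only at the first ':' so IPv6 addresses stay intact.
        _, _, rule = line.partition(":")
        if rule.split() == ["from", "all", "lookup", "default"]:
            return True
    return False


# Output quoted in the commit message, before and after the fix.
BEFORE_FIX = (
    "1001: from all lookup local\n"
    "32765: from fec0::ffff:afa:1 lookup default\n"
    "32766: from all lookup main\n"
)
AFTER_FIX = BEFORE_FIX + "32767: from all lookup default\n"

print(has_default_lookup(BEFORE_FIX))
print(has_default_lookup(AFTER_FIX))
```

The source-specific rule at preference 32765 is deliberately not counted, since it only applies to traffic from the management address.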
Ashwin Hiranniah	ada7c6a72e	Add pensando platform (#15978 ) This commit adds support for the pensando ASIC called ELBA. ELBA is used in PCI-based cards and in smartswitches. #### Why I did it This commit introduces the pensando platform, which is based on the ELBA ASIC. ##### Work item tracking - Microsoft ADO (number only): #### How I did it Created the platform/pensando folder and created makefiles specific to pensando. This mainly creates the pensando docker (which OEMs need to download before building an image), which has all the userspace needed to initialize and use the DPU (ELBA ASIC). The build process produces two images, which can be used from ONIE and goldfw. The recommendation is to use ONIE. #### How to verify it Load the SONiC image via ONIE or goldfw and make sure the interfaces are UP. ##### Description for the changelog Add pensando platform support.	2023-12-04 14:41:52 -08:00
Kebo Liu	4c699050e8	[Mellanox] Add special rsyslog filter for MSN2410 platform (#17365 ) - Why I did it Mellanox MSN2410 platforms have a non-functional error log: "ERR pmon#sensord: Error getting sensor data: dps460/#10: Can't read". This error is caused by a firmware issue with some PSUs, and we are not able to upgrade the FW online. Since there is no functional impact, this error log can be ignored safely. - How I did it Add a new rsyslog rule to rsyslog-container.conf.j2; if the docker name is pmon and the platform name matches, the new rule is inserted into the docker rsyslogd.conf. - How to verify it Run regression on the MSN2410 platform to make sure the error log is not printed to the syslog. Signed-off-by: Kebo Liu <kebol@nvidia.com>	2023-12-03 15:32:56 +02:00
centecqianj	8db3a99d11	[Bookworm] Upgrade centec platforms to Bookworm (#17364 ) How I did it Modified platform driver to comply with bookworm kernel. Modified python build commands for building whl packages. How to verify it Verify whether all the platform bookworm debs are built. make target/debs/bookworm/platform-modules-v682-48y8c-d_1.0_amd64.deb Load the platform debian into the device and install it in bookworm image. Verify the platform related CLI and the functionality Signed-off-by: centecqianj <qianj@centec.com>	2023-12-01 16:07:52 -08:00
Lawrence Lee	572af1dcdf	[arp_update]: Flush neighbors with incorrect MAC info (#17238 ) [arp_update]: Flush MAC mismatch neighbors - Check for MAC mismatch between neighbor entries in the kernel and APPL_DB - Flush any entries with a mismatch	2023-11-30 14:23:05 -08:00
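The comparison described in commit 572af1dcdf can be illustrated with a short Python sketch. It assumes both views have already been collected into plain dictionaries mapping neighbor IP to MAC; the function name `find_mac_mismatches` and the sample addresses are hypothetical, and this is not the arp_update script's own code.

```python
def find_mac_mismatches(kernel_neighbors, appl_db_neighbors):
    """Return the IPs whose MAC differs between the kernel and APPL_DB.

    Both arguments are hypothetical dicts of neighbor IP -> MAC string.
    Neighbors missing from APPL_DB are not reported as mismatches.
    """
    mismatched = []
    for ip, kernel_mac in kernel_neighbors.items():
        db_mac = appl_db_neighbors.get(ip)
        # Compare case-insensitively, since MAC notation may differ in case.
        if db_mac is not None and db_mac.lower() != kernel_mac.lower():
            mismatched.append(ip)
    return sorted(mismatched)


kernel = {"10.0.0.1": "aa:bb:cc:dd:ee:01", "10.0.0.2": "aa:bb:cc:dd:ee:02"}
appl_db = {"10.0.0.1": "AA:BB:CC:DD:EE:01", "10.0.0.2": "aa:bb:cc:dd:ee:ff"}

# Entries returned here are the candidates to flush.
print(find_mac_mismatches(kernel, appl_db))
```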
Xincun Li	f13081bfbd	Ensure that 'logrotate-config.service' is set as a dependency to start before 'logrotate.service'. (#17312 ) * Ensure that 'logrotate-config.service' is set as a dependency to start before 'logrotate.service'.	2023-11-29 17:22:47 -08:00
Vivek	4727185648	[lldp] Clean up service start logic owing to port init start optimization (#17268 ) Signed-off-by: Vivek Reddy <vkarri@nvidia.com>	2023-11-27 09:56:54 -08:00
prabhataravind	aea3c42f29	[image_config]: Update DHCP rate-limit (#17132 ) Change DHCP rate limit in SONiC copp configuration to 100 PPS as this is necessary to ensure that DHCP flood does not cause LACP/BGP flaps in all scenarios This is an extension to the change in image_config: copp: Enable rate limiting for bgp, lacp, dhcp, lldp, macsec and udld #14859 and sonic-mgmt change in [tests/copp]: Update copp mgmt tests to support new rate-limits sonic-mgmt#8199 Why I did it 300 PPS is not sufficient to prevent LACP/BGP flaps in all cases. 100 PPS seems to provide better resiliency against DHCP traffic flood to CPU. Microsoft ADO 25776614: Send DHCP broadcast packets to DUT and verify that they are trapped to CPU at 100 PPS. Signed-off-by: Prabhat Aravind <paravind@microsoft.com>	2023-11-22 15:02:17 -08:00
mssonicbld	52e304afcf	[ci/build]: Upgrade SONiC package versions (#17035 )	2023-11-21 18:53:15 -08:00
Saikrishna Arcot	318f3945be	Modify the sudoers file to lecture RO users once Debian changed the defaults of the sudo package to never lecture the user when using an unauthorized sudo command, which breaks our use case of lecturing once. Add a line to lecture once, which is the old defaults. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2023-11-21 18:53:15 -08:00
Saikrishna Arcot	862bd794ee	Fix container down event not sending out a notification systemd changed the log message syntax for a container going down. Update the regex for the new format. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2023-11-21 18:53:15 -08:00
Saikrishna Arcot	cae42998dd	Fix PAM module configuration issue pam-auth-update doesn't store local configuration, and it's meant to be used by packages only. Because libpam-systemd was getting uninstalled afterwards, this caused tacplus to get re-enabled. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2023-11-21 18:53:15 -08:00
Saikrishna Arcot	73605a98ef	Modify rasdaemon service on amd64 only Rasdaemon is not installed on armhf or arm64 Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2023-11-21 18:53:15 -08:00
Saikrishna Arcot	0664c791ef	For Bookworm, use non-free-firmware instead of non-free Starting with Bookworm, Debian moved the non-free Linux firmware blobs into a new non-free-firmware component, since they are frequently needed by users and since they need to be updated frequently. Since the only thing we currently install from the non-free component (that I can think of) is the Linux firmware, have Bookworm use non-free-firmware instead of non-free. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2023-11-21 18:53:15 -08:00
Saikrishna Arcot	ed5176107b	Update Debian build script for Bookworm Notable changes: * Use j2cli from Debian repos instead of pip * Use setuptools from Debian repos instead of pip * Use wheel from Debian repos instead of pip * Update grpcio and grpcio-tools python packages to match version in Bookworm * Use m2crypto from Debian repos instead of pip Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2023-11-21 18:53:15 -08:00
Saikrishna Arcot	34a1ac1a0f	Migrate from ntp to ntpsec Debian Bookworm no longer uses NTP, and instead uses NTPsec. Modify our files to update/replace the NTPsec files instead. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2023-11-21 18:53:15 -08:00
abdosi	4a7aa2634f	[chassis] Support advertisement of Loopback0 of all LC's across all e-BGP peers in TSA mode (#16714 ) What I did: In Chassis TSA mode Loopback0 Ip's of each LC's should be advertise through e-BGP peers of each remote LC's How I did: - Route-map policy to Advertise own/self Loopback IP to other internal iBGP peers with a community internal_community as define in constants.yml - Route-map policy to match on above internal_community when route is received from internal iBGP peers and set a internal tag as define in constants.yml and also delete the internal_community so we don't send to any of e-BGP peers - In TSA new route-map match on above internal tag and permit the route (Loopback0 IP's of remote LC's) and set the community to traffic_shift_community. - In TSB delete the above new route-map. How I verify: Manual Verification UT updated. sonic-mgmt PR: sonic-net/sonic-mgmt#10239 Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2023-11-20 09:42:02 -08:00
Ze Gan	9f08f88a0d	[dpu]: Add DPU database service (#17161 ) Sub PRs: sonic-net/sonic-host-services#84 #17191 Why I did it According to the design, the database instances of DPU will be kept in the NPU host. Microsoft ADO (number only): 25072889 How I did it To follow the multiple ASIC design, I assume a new platform environment variable NUM_DPU will be defined in the /usr/share/sonic/device/$PLATFORM/platform_env.conf. Based on this number, NPU host will launch a corresponding number of instances for the DPU database. Signed-off-by: Ze Gan <ganze718@gmail.com>	2023-11-17 09:10:03 -08:00
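As a rough illustration of commit 9f08f88a0d, the sketch below reads `NUM_DPU` from the text of a `platform_env.conf` file and derives one database instance per DPU. It assumes the file uses simple `KEY=VALUE` lines; the function names and the `dpu<N>` instance naming are hypothetical and are not taken from the commit.

```python
def parse_num_dpu(conf_text: str) -> int:
    """Return the NUM_DPU value from KEY=VALUE text, or 0 if it is absent."""
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "NUM_DPU":
            return int(value.strip())
    return 0


def dpu_instance_names(conf_text: str) -> list:
    """Hypothetical naming: one database instance label per DPU index."""
    return [f"dpu{i}" for i in range(parse_num_dpu(conf_text))]


sample_conf = "# platform environment\nNUM_DPU=2\n"
print(dpu_instance_names(sample_conf))
```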
ganglv	c71fb3a30f	Share image for gnmi and telemetry (#16863 ) Why I did it Share docker image to support gnmi container and telemetry container Work item tracking Microsoft ADO 25423918: How I did it Create telemetry image from gnmi docker image. Enable gnmi container and disable telemetry container by default. How to verify it Run end to end test.	2023-11-08 08:54:36 +08:00
prabhataravind	7e49530459	[copp]: Enable rate limiting for bgp, lacp, dhcp, lldp, macsec and udld (#14859 ) Why I did it It was observed that a flood of DHCP packets without rate-limiting can cause BGP flaps or lacp keepalive losses. This change attempts to prevent or reduce such BGP flaps by enabling appropriate rate-limiting in SONiC for all traffic types. Work item tracking Microsoft ADO 17964421: How I did it Set a reasonable CIR/CBS value of 300 for queue4_group3 (dhcp, lldp, macsec) and 6000 for queue4_group1. The value 300 was arrived at after testing with dhcp flooding using ptf (using multiple threads). Throttling at this rate was necessary to ensure that dhcp flooding does not cause BGP flaps. How to verify it Verified with this script running from ptf, that BGP flaps don't happen when CBS/CIR is set at 300 for queue4_group3. import threading from scapy.all import * def send_dhcp_discover(intf): dhcp_discover = Ether(dst='ff:ff:ff:ff:ff:ff',src=RandMAC()) \ /IP(src='1.1.1.1',dst='255.255.255.255') \ /UDP(sport=68,dport=67) \ /DHCP(options=[('message-type','discover'),('end')]) sendp(dhcp_discover,count=100000,iface=intf) if __name__ == "__main__": t1 = threading.Thread(target=send_dhcp_discover, args=("eth1",)) t2 = threading.Thread(target=send_dhcp_discover, args=("eth2",)) t1.start() t2.start() t1.join() t2.join() Verified on Arista-7260CX3-D108C8 running 202012 that the copp rule for queue4_group1 and queue4_group3 do NOT affect BGP packets. To verify this using PTF, the copp rules were modified to set the "CBS" and "CIR" for queue4_group1 and queue4_group3 at 600pps and 50k packets each of "BGP open" and "DHCP Discover" were simultaneously sent from the same PTF port to the DUT. It was verified using "show c cpu" that packets are hitting the cpu queue at 1200 pps (double the configured CIR/CBS for these packet types). This helped conclude that throttling rate is per trap (or packet type) and not per queue. 
Verified with updated sonic-mgmt tests ([tests/copp]: Update copp mgmt tests to support new rate-limits sonic-mgmt#8199) on broadcom and mellanox platforms that these traffic types are rate-limited. Signed-off-by: Prabhat Aravind <paravind@microsoft.com>	2023-10-25 10:49:24 -07:00
Kebo Liu	31451295d5	Add special rsyslog filter for MSN2700 platform (#16684 ) - Why I did it Mellanox MSN2700 platforms have a non-functional error log: "ERR pmon#sensord: Error getting sensor data: dps460/#10: Can't read". This error is caused by a firmware issue with some PSUs, and we are not able to upgrade the FW online. Since there is no functional impact, this error log can be ignored safely. - How I did it Add a new rsyslog rule to rsyslog-container.conf.j2; if the docker name is pmon and the platform name matches, the new rule is inserted into the docker rsyslogd.conf. - How to verify it Run regression on the MSN2700 platform to make sure the error log is not printed to the syslog. Signed-off-by: Kebo Liu <kebol@nvidia.com>	2023-10-24 17:54:44 +03:00
Samuel Angebault	e4a497183a	Add build option to reduce final image size (#16729 ) * Reduce SONiC image filesystem size Add a build option to reduce the image size. The image reduction process is affecting the builds in 2 ways: - change some packages that are installed in the rootfs - apply a rootfs reduction script The script itself will perform a few steps: - remove file duplication by leveraging hardlinks - under /usr/share/sonic since the symlinks under the device folder are lost during the build. - under /var/lib/docker since the files there will only be mounted ro - remove some extra files (man, docs, licenses, ...) - some image specific space reduction (only for aboot images currently) The script can later be improved but for now it's reducing the rootfs size by ~30%. * restore fully featured vim package	2023-10-24 10:01:58 +08:00
Samuel Angebault	d760fb928c	Disable CPU C-States other than C1 (#16703 ) Why I did it Networking devices need to be responsive. Such responsiveness is harmed when the CPU change state. There is a latency penalty when a CPU is idle (e.g C2) and need to exit this state to come back to C1 state. To prevent this from happening the CPU should be forced to remain in C1 state. How I did it Generalize the cstate forcing to C1 to all Arista products. This is done by adding processor.max_cstate=1 to the kernel cmdline for all CPUs. Additionally Intel CPUs also need intel_idle.max_cstate=0 to fallback to the acpi_idle driver. How to verify it Check that processor.max_cstate=1 is present on the cmdline for AMD CPUs Check that both processor.max_cstate=1 and intel_idle.max_cstate=0 are present on the cmdline for Intel CPUs	2023-10-13 20:24:39 -07:00
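The verification step in commit d760fb928c amounts to checking the kernel command line for the expected parameters. A minimal Python sketch follows; the function name `missing_cstate_params` is hypothetical, and on a real device the input string would come from reading `/proc/cmdline`.

```python
def missing_cstate_params(cmdline: str, intel_cpu: bool) -> list:
    """Return the C-state parameters that are expected but not on the cmdline."""
    required = ["processor.max_cstate=1"]
    if intel_cpu:
        # Intel CPUs additionally need intel_idle disabled so that the
        # acpi_idle driver is used instead.
        required.append("intel_idle.max_cstate=0")
    present = set(cmdline.split())
    return [param for param in required if param not in present]


# Illustrative command lines, not captured from a real device.
amd_cmdline = "root=/dev/sda1 rw processor.max_cstate=1"
intel_cmdline = "root=/dev/sda1 rw processor.max_cstate=1 intel_idle.max_cstate=0"

print(missing_cstate_params(amd_cmdline, intel_cpu=False))
print(missing_cstate_params(amd_cmdline, intel_cpu=True))
print(missing_cstate_params(intel_cmdline, intel_cpu=True))
```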
Longxiang Lyu	072eaed2e3	[snmp] Check intfmgrd running before start (#16588 ) Add pre start check to ensure intfmgrd is running. The check will run for 20 seconds at most. Signed-off-by: Longxiang Lyu <lolv@microsoft.com>	2023-10-13 16:00:51 -07:00
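A bounded pre-start check like the one in commit 072eaed2e3 can be sketched as a polling loop with a deadline. The 20-second limit comes from the commit message; the function name `wait_until`, the one-second polling interval, and the injectable `sleep` and `clock` parameters are assumptions made for illustration.

```python
import time


def wait_until(check, timeout_s=20.0, interval_s=1.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it returns True or timeout_s elapses.

    Returns True if the condition was met, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Stand-in for "is intfmgrd running?": succeeds on the third poll.
polls = []

def fake_check():
    polls.append(1)
    return len(polls) >= 3

print(wait_until(fake_check, timeout_s=5.0, interval_s=0.0))
```

Injecting `sleep` and `clock` keeps the loop testable without actually waiting 20 seconds.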
Saikrishna Arcot	469aed2cf7	[baseimage]: Update openssh to 1:8.4p1-5+deb11u2 (#16826 ) Openssh in Debian Bullseye has been updated to 1:8.4p1-5+deb11u2 to fix CVE-2023-38408. Since we're building openssh with some patches, we need to update our version as well. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2023-10-11 10:42:20 -07:00
Vadym Hlushko	3bd396043e	[buffers] Add 'create_only_config_db_buffers.json' file for the Mellanox devices (not MSFT SKU) (#16233 ) * [buffers] Add create_only_config_db_buffers.json for MLNX devices (not MSFT SKU), inject it at the start of the swss docker Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com> * [buffers] Align the sonic-device_metadata.yang Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com> --------- Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>	2023-10-03 08:35:57 -07:00
lixiaoyuner	bca2ce25ef	[k8master]: Install nc cmd for k8s master network issue debug (#16745 )	2023-09-30 01:16:51 -07:00
Vaibhav Hemant Dixit	07f8507911	[fast-reboot] Fix regression: set FAST_REBOOT state_db flag to support fast-reboot from older images (#16733 ) Why I did it Fix: #16699 Fast reboot is failing from old OS versions (eg., 201911 image) to latest (eg., master branch) after PR #15685 The system wide flag for FAST_REBOOT is still required when the base OS version does not support the new fast-reboot reconciliation logic (no db dump)	2023-09-28 09:37:21 -07:00
mssonicbld	4768b76610	[ci/build]: Upgrade SONiC package versions	2023-09-26 12:32:31 +08:00
Yevhen Fastiuk	52f6dd65a3	Improve remote fetch (#12795 ) ### Why I did it To fix those errors: One: ``` Connecting to urm.nvidia.com (urm.nvidia.com)\|...\|:443... connected. GnuTLS: Error in the pull function. Unable to establish SSL connection. Error 4 make[1]: Leaving directory '/sonic/src/smartmontools' [ target/debs/bullseye/smartmontools_6.6-1_amd64.deb ] ``` Second: ``` Get:90 https://debian-mirror-url buster/main amd64 librrd-dev amd64 1.7.1-2 [284 kB] Get:91 https://debian-mirror-url buster/main amd64 psmisc amd64 23.2-1+deb10u1 [126 kB] Get:92 https://debian-mirror-url buster/main amd64 python-smbus amd64 4.1-1 [12.2 kB] Get:93 https://debian-mirror-url buster/main amd64 python3.7-dev amd64 3.7.3-2+deb10u3 [510 kB] Get:94 https://debian-mirror-url buster/main amd64 python3-dev amd64 3.7.3-1 [1264 B] Get:95 https://debian-mirror-url buster/main amd64 python3-smbus amd64 4.1-1 [12.5 kB] Get:96 https://debian-mirror-url buster/main amd64 rrdtool amd64 1.7.1-2 [485 kB] Fetched 122 MB in 12s (9976 kB/s) [91mE: Failed to fetch https://debian-mirror-url/pool/main/p/python-defaults/python2-minimal_2.7.16-1_amd64.deb 500 Internal Server Error [IP: ... 443] E: Failed to fetch https://debian-mirror-url/pool/main/f/fontconfig/fontconfig-config_2.13.1-2_all.deb 500 Internal Server Error [IP: ... 443] E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing? [0mThe command '/bin/sh -c apt-get update && apt-get install -y build-essential python3-dev ipmitool librrd8 librrd-dev rrdtool python-smbus python3-smbus dmidecode i2c-tools psmisc libpci3' returned a non-zero code: 100 [ target/docker-platform-monitor.gz ] Error 1 ``` #### How I did it Add retry mechanism to apt, wget, and curl hooks	2023-09-23 18:07:04 -07:00
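The general idea behind the retry mechanism in commit 52f6dd65a3 is to repeat a transient-failure-prone fetch a few times before giving up. The sketch below shows that pattern in Python; it is not the apt, wget, or curl hook code from the commit, and the name `retry`, the attempt count, and the delay are all assumptions.

```python
import time


def retry(action, attempts=3, delay_s=1.0, sleep=time.sleep):
    """Call action() until it succeeds or attempts run out.

    Re-raises the last exception if every attempt fails.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:  # a real hook would retry on specific errors
            last_error = exc
            if attempt < attempts:
                sleep(delay_s)
    raise last_error


# Stand-in for a flaky download: fails twice, then succeeds.
tries = []

def flaky_fetch():
    tries.append(1)
    if len(tries) < 3:
        raise ConnectionError("500 Internal Server Error")
    return "fetched"

print(retry(flaky_fetch, attempts=5, delay_s=0.0))
```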
abdosi	2ce04ab91d	[chassisd]: Add alternate to the bridge interface created on chassis supervisor. (#16505 ) Add alternate name eth1-midplane to Linux bridge br1 created on supervisor on some chassis platforms. See description here: #16504 Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2023-09-23 00:27:21 -07:00
Hua Liu	11f5a75425	[tacacs]: Fix tcpdump report error when tacacs enabled (#16372 ) Fix tcpdump report error when tacacs enabled. Why I did it Fix tcpdump report error when tacacs enabled: Sep 1 09:25:18.189395 vlab-01 ERR tcpdump: nss_tacplus: /etc/tacplus_nss.conf fopen failed Sep 1 09:25:18.189606 vlab-01 ERR tcpdump: nss_tacplus: bad config or server line for nss_tacplus This is because debian add a patch create AppArmor profile for resource access control. The profile need update to allow tcpdump access /etc/tacplus_nss.conf. Work item tracking Microsoft ADO: 17667308 How I did it Modify tcpdump AppArmor profile, add new line to allow tcpdump access TACACS config file: /etc/tacplus_nss.conf r,	2023-09-23 00:07:53 -07:00
mssonicbld	a8af76f108	[ci/build]: Upgrade SONiC package versions	2023-09-22 12:32:57 +08:00
Zain Budhwani	382d68fe42	Move syncd events to syncd.conf (#15950 ) ### Why I did it syncd events should have tag sonic-events-syncd, not sonic-events-host. Created a new conf file which will have syncd events ##### Work item tracking - Microsoft ADO (number only):17747466 #### How I did it Code change #### How to verify it Pipeline	2023-09-19 17:29:49 -07:00
vdahiya12	45a852233b	[pmon] update gRPC version to 1.57.0 (#16257 ) Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>	2023-09-15 16:41:51 -07:00
Yaqiang Zhu	d11e0a214e	Add use_unix_socket_path to supervisor-proc-exit-listener (#16548 ) Why I did it ConfigDBConnector in supervisor-proc-exit-listener uses the default parameters to connect to CONFIG_DB (via 127.0.0.1:6379), which fails in containers that do not use host network mode, because they do not share the same network and socket. How I did it Add a new parameter use_unix_socket_path to this script to indicate whether to use a Unix socket to connect to CONFIG_DB. How to verify it Build the image and install it, kill critical processes in the container, and confirm that the container crashes.	2023-09-15 16:23:25 -07:00
Saikrishna Arcot	f207a9b0e0	Fix potentially not having any loopback address on lo interface (#16490 ) In #15080, there was a command added to re-add 127.0.0.1/8 to the lo interface when the networking configuration is being brought down. However, the trigger for that command is `down`, which, looking at ifupdown2 configuration files, runs immediately after 127.0.0.1/16 is removed. This means there may be a period of time where there are no loopback addresses assigned to the lo interface, and redis commands will fail. Fix this by changing this to pre-down, which should run well before 127.0.0.1/16 is removed, and should always leave lo with a loopback address. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2023-09-14 12:55:50 -07:00
ganglv	ce7145475d	Fix grpc package for ptf container (#16536 ) Why I did it PTF container needs to use new grpcio package. Work item tracking Microsoft ADO (number only): How I did it Update versions-py2 How to verify it Check pipeline artifact	2023-09-14 08:27:32 +08:00
Zain Budhwani	337a9dbcf4	Add rsyslog plugin support for frr log (#16192 ) ### Why I did it Currently there is only rsyslog plugin support for /var/log/syslog, meaning we do not detect events that occur in frr logs such as BGP Hold Timer Expiry that appears in frr/bgpd.log. ##### Work item tracking - Microsoft ADO (number only): 13366345 #### How I did it Add omprog action to frr/bgpd.log and frr/zebra.log. Add appropriate regex for both events. #### How to verify it sonic-mgmt test case	2023-09-12 16:53:45 -07:00
Yaqiang Zhu	76b7cb8b64	[dhcp_server] Add dhcp_server container (#14031 ) Why I did it Add dhcp_server ipv4 feature to SONiC. HLD: sonic-net/SONiC#1282 How I did it To clarify: this container is disabled by INCLUDE_DHCP_SERVER = n for now, which means the container is not built. Add INCLUDE_DHCP_SERVER to indicate whether to build the dhcp_server container. Add a docker file for dhcp_server; build and install kea-dhcp4 inside the container. Add a template file for the dhcp_server container services. Add an entry for dhcp_server to the FEATURE table in config_db. How to verify it Build the image with INCLUDE_DHCP_SERVER = y to verify: the image can be installed successfully without crashing. Run config feature state dhcp_server enabled to enable dhcp_server.	2023-09-11 09:15:56 -07:00
vganesan-nokia	b13b41fc22	[swss] Chassis db clean up optimization and bug fixes (#16454 ) * [swss] Chassis db clean up optimization and bug fixes This commit includes the following changes: - Fix for regression failure due to error in finding CHASSIS_APP_DB in pizzabox (#PR 16451) - After attempting to delete the system neighbor entries from chassis db, and before starting to clear the system interface entries, wait for some time only if some system neighbors were deleted. If no system neighbor entries were deleted for the asic coming up, there is no need to wait. - Similar changes for system lag delete. Before deleting the system lag, wait for some time only if some system lag members were deleted. If no system lag members were deleted, there is no need to wait. - Flush the SYSTEM_NEIGH_TABLE from the local STATE_DB. While the asic is coming up, when system neigh entries are deleted from chassis app db (as part of chassis db clean up), there are no orchs/processes running to process the delete messages from chassis redis. Because of this, stale system neigh entries are present in the local STATE_DB. The stale entries result in creation of orphan (no corresponding data path/asic db entry) kernel neigh entries during STATE_DB:SYSTEM_NEIGH_TABLE entries processing by nbrmgr (after the swss service comes up). This is avoided by flushing the SYSTEM_NEIGH_TABLE from the local STATE_DB when the service comes up. Signed-off-by: vedganes <veda.ganesan@nokia.com> * [swss] Chassis db clean up bug fixes review comment fix - 1 Debug logs added for deletion of other tables (SYSTEM_INTERFACE and SYSTEM_LAG_TABLE) Signed-off-by: vedganes <veda.ganesan@nokia.com> --------- Signed-off-by: vedganes <veda.ganesan@nokia.com>	2023-09-11 08:28:27 -07:00
lixiaoyuner	4f53819efa	Install parted package for k8s master (#16484 ) ### Why I did it Need a tool to extend disk size ##### Work item tracking - Microsoft ADO (number only): 25094467 #### How I did it Install parted package #### How to verify it Use apt list parted command to check if it's installed	2023-09-07 23:22:47 -07:00
Aman Singhal	e22136dd9f	[cisco]: Enable Kdump config by default for cisco-8000 (#16224 ) Why I did it Enabling kdump by default for cisco-8000 by setting crashkernel cmdline arg in device installer.conf. After bootup, sonic-kdump-config wipes crashkernel arg from /host/grub/grub.cfg, and resets USE_KDUMP in /etc/default/kdump-tools, so kdump will not be enabled on subsequent reboot. How I did it Setting kdump enable config as part of init_cfg.json for cisco-8000 platforms. How to verify it Install SONiC image with kdump enabled by default (device/hwsku/installer.conf), then reboot. Kdump config should persist on subsequent reboots and kdump loaded during bootup Signed-off-by: Aman Singhal <amans@cisco.com>	2023-09-07 01:30:24 -07:00
mssonicbld	204579a0cc	[ci/build]: Upgrade SONiC package versions	2023-09-06 12:32:47 +08:00
Prince George	a4e37a5cd6	[platform]: Disable interrupt for intel i2c-i801 driver (#16309 ) On S6100 we are seeing almost 100K interrupts per second on Intel's i801 SMBus controller, which affects system performance. We now disable the i801 driver interrupt and enable polling instead. Microsoft ADO (number only): 24910530 How I did it Disable the interrupt by passing the interrupt disable feature argument to the i2c-i801 driver. How to verify it This fix is NOT applicable to ARM-based platforms. It is applicable only to Intel-based platforms: - On SN2700 it is already disabled in Mellanox hw-mgmt - Celestica DX010 and E1031 - Dell S6100: verified the interrupts are no longer incrementing. - Arista 7260CX3 Signed-off-by: Prince George <prgeor@microsoft.com>	2023-09-05 10:23:57 -07:00
Stephen Sun	b5e8c16134	[Mellanox] Enhance FW upgrade mechanism (#16090 ) ### Why I did it 1. Enhance the diagnosis information collecting mechanism - If the option `-v` is fed, it will pass additional diagnosis flags to mlxfwmanager - Collect all the output from mlxfwmanager and print them to syslog if it fails 2. Abort syncd in case waiting for device or upgrading firmware fails Signed-off-by: Stephen Sun <stephens@nvidia.com> ### How I did it #### How to verify it Regression and manual test	2023-09-04 11:28:53 -07:00
anamehra	f6897bb585	chassis-packet: Update arp_update script for FAILED and STALE check (#16311 ) chassis-packet: Update arp_update script for FAILED and STALE check (#16311) 1. Fixing an issue with FAILED entry resolution retry. Neighbor entries in arp table may sometimes enter a FAILED state when the far end is down and reports the state as follows: 2603:10e2:400:3::1 dev PortChannel19 router FAILED While the arp_update script handles the entries for FAILED in the following format, the above was not handled due to the token location (extra router keyword at index 4): 2603:10e2:400:3::1 dev PortChannel19 FAILED The former format may appear if an arp resolution is tried on a link that is known but the far end goes down, e.g., pinging a STALE entry while the far end is down. 2. Refreshing STALE entries to make sure the far end is reachable. STALE entries for some backend ports may appear in chassis-packet when no traffic is received for a while on the port. When the far end goes down, it is expected for BFD to stop sending packets on the session for which the far end is not reachable. But as the entry is known as stale, on the Cisco chassis, BFD keeps sending packets. Refreshing the stale entry will keep active links as reachable in the neighbor table while the entries for the far end down will enter a failed state. FAILED state entries will be retired and entered reachable when far end comes back up.	2023-09-01 11:41:46 -07:00
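The parsing problem described in commit f6897bb585 is that the neighbor state is not always at the same token position, because an optional `router` keyword can precede it. A hedged Python sketch of position-independent parsing follows; it is not the arp_update script's own code, and the function name `failed_neighbors` is hypothetical. The sample lines are the two formats quoted in the commit message.

```python
def failed_neighbors(lines):
    """Return (ip, interface) pairs for neighbor entries in FAILED state.

    Expected line shape: "<ip> dev <interface> [router] <STATE>".
    The state is read from the last token, so the optional 'router'
    keyword does not shift it out of position.
    """
    result = []
    for line in lines:
        tokens = line.split()
        if len(tokens) >= 4 and tokens[1] == "dev" and tokens[-1] == "FAILED":
            result.append((tokens[0], tokens[2]))
    return result


sample = [
    "2603:10e2:400:3::1 dev PortChannel19 router FAILED",
    "2603:10e2:400:3::1 dev PortChannel19 FAILED",
]
print(failed_neighbors(sample))
```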

1274 Commits