sonic-buildimage

Author	SHA1	Message	Date
mssonicbld	7986aba097	[submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically (#16497 )	2023-09-08 14:57:35 +08:00
lixiaoyuner	4f53819efa	Install parted package for k8s master (#16484 ) ### Why I did it Need a tool to extend disk size ##### Work item tracking - Microsoft ADO (number only): 25094467 #### How I did it Install parted package #### How to verify it Use apt list parted command to check if it's installed	2023-09-07 23:22:47 -07:00
snider-nokia	2f69a0eaa6	[Nokia][sonic-platform] Update Nokia sonic-platform submodule (#16348 ) This likely fixes Nokia-ION/ndk#21 To fix a failure that results when edge condition results in MDIPC channel being freed with mismatched ownership.	2023-09-07 11:20:06 -07:00
Mai Bui	e07d435553	[telemetry] limit privileged flag for telemetry container (#16350 ) Signed-off-by: Mai Bui <maibui@microsoft.com>	2023-09-07 11:04:11 -07:00
Arun Saravanan Balachandran	154c0c628b	[build] Change raw image disk size to 1700MB (#16463 ) Maximum RAM availability for NOS to SONiC migration using raw image in Dell S6100 is 1700MB. Raw images larger than that cannot be used for NOS to SONiC migration.	2023-09-07 09:19:54 -07:00
Arun Saravanan Balachandran	d04e3523cd	[build] Remove compression of raw image (#16462 )	2023-09-07 09:19:17 -07:00
Arun Saravanan Balachandran	d758e44c2c	[build] Make the build to fail if raw image generation is not successful (#16461 )	2023-09-07 09:15:03 -07:00
Dror Prital	d7b85af18b	[Mellanox] Update SDK/FW to 4.6.1062/2012.1062 Update SDK/FW/SAI to 4.6.1062/2012.1062/SAIBuild2211.25.1.4 (#16478 ) - Why I did it SAI bug fixes: 1. When creating an ACL rule with SAI_ACL_ENTRY_ATTR_FIELD_SRC_IP/SAI_ACL_ENTRY_ATTR_FIELD_DST_IP enabled, and then disabling the field by setting enable=false, a match on L3_type=IPv4 would remain programmed for the rule. Issue resolved after the fix. 2. Allow the max scale of virtual routers to be configured for SPC-1, SPC-2, SPC-3, which is 255 when fastboot is enabled and 511 when fastboot is disabled. 3. Remove default hash key of SRC_MAC, DST_MAC and ETH_TYPE. SAI features: 1. Port init profile 2. Dual ToR Active-Standby | Additional MAC support. SDK/FW bug fixes: 1. When performing fast boot from an old SDK version (currently installed) to a newer one (target version), and the system was initially loaded with a new SDK version (past version), and the system has not been wiped, under specific conditions, the fast boot would use the past version's data and may fail. - How I did it Update SAI version to SAIBuild2211.25.1.4 Update SDK/FW version to 4.6.1062/2012.1062	2023-09-07 14:05:33 +03:00
mssonicbld	92d20cc9a3	[submodule] Update submodule sonic-gnmi to the latest HEAD automatically (#16480 ) #### Why I did it src/sonic-gnmi ``` * 6fd461c - (HEAD -> master, origin/master, origin/HEAD) Get origin from prefix (#149) (17 hours ago) [ganglv] ``` #### How I did it #### How to verify it #### Description for the changelog	2023-09-07 18:34:19 +08:00
Aman Singhal	e22136dd9f	[cisco]: Enable Kdump config by default for cisco-8000 (#16224 ) Why I did it Enabling kdump by default for cisco-8000 by setting crashkernel cmdline arg in device installer.conf. After bootup, sonic-kdump-config wipes crashkernel arg from /host/grub/grub.cfg, and resets USE_KDUMP in /etc/default/kdump-tools, so kdump will not be enabled on subsequent reboot. How I did it Setting kdump enable config as part of init_cfg.json for cisco-8000 platforms. How to verify it Install SONiC image with kdump enabled by default (device/hwsku/installer.conf), then reboot. Kdump config should persist on subsequent reboots and kdump loaded during bootup Signed-off-by: Aman Singhal <amans@cisco.com>	2023-09-07 01:30:24 -07:00
Liu Shilong	52568ceab0	[action] Update workflow to parse & monitor pending automation PRs. (#16446 ) Why I did it There are many automation PRs pending for PR checker failure issue. As PR number grows, github api to list prs comes to its limit. We need to monitor and send alert for these PRs. Work item tracking Microsoft ADO (number only): 25064441 How I did it For auto-cherry pick PRs: - more than 3 days, comment @author to check - more than 10 days, stop comment. - more than 28 days, comment @author PR will be closed - more than 30 days, close PR For submodule update HEAD PRs: - more than 3 days, send alert(submodule PR) How to verify it Which release bra	2023-09-07 13:34:34 +08:00
judyjoseph	7d2e3cb011	Update macsec CAK keys in profile for tests to change to type7 encoded format (#16388 ) * Change the CAK key length check in config plugin, macsec test profile changes * Fix the format in add_profile api The changes needed in various macsec unit tests and config plugin when we move to accept the type 7 encoded key format for macsec. This goes along with PR : sonic-net/sonic-swss#2892 raised earlier.	2023-09-06 21:11:02 -07:00
Saikrishna Arcot	065c35cc34	Add nlohmann-json3-dev package into the slave container (#16308 ) ### Why I did it The json.hpp header file from that package is used in the sonic-swss-common build. An old version of that header file (from 2016) has been checked into the sonic-swss-common repo. However, since then, there have been changes to that header file, and starting with GCC 12 in Bookworm, generates some errors about variables being possibly uninitialized before use. ##### Work item tracking - Microsoft ADO (number only): 25027439 #### How I did it To fix this, install the nlohmann-json3-dev package, and allow using the header file from the Debian package instead of a static checked-in version. The version in Debian Bullseye is much newer than this version. #### How to verify it With this change alone, sonic-swss-common will still be using the json.hpp file in its own codebase. The change to actually use the system header file instead of the local header file will happen in a separate PR in the necessary repoes.	2023-09-06 19:23:07 -07:00
Saikrishna Arcot	24ae0a9606	Don't build libhiredis anymore (#15633 ) ### Why I did it We're not adding any patch on top of hiredis, and there's no apparent reason to build this. Remove the build step here, and just install the package from the Debian repos. ##### Work item tracking - Microsoft ADO (number only): 24381590 #### How to verify it Build the SONiC image, and load it. Verify that services come up.	2023-09-06 16:23:34 -07:00
Kebo Liu	e286869b24	[Mellanox] Update HW-MGMT package to new version V.7.0030.1011 (#16239 ) - Why I did it 1. Update Mellanox HW-MGMT package to newer version V.7.0030.1011 2. Replace the SONiC PMON Thermal control algorithm with the one inside the HW-MGMT package on all Nvidia platforms 3. Support Spectrum-4 systems - How I did it 1. Update the HW-MGMT package version number and submodule pointer 2. Remove the thermal control algorithm implementation from Mellanox platform API 3. Revise the patch to HW-MGMT package which will disable HW-MGMT from running on SIMX 4. Update the downstream kernel patch list Signed-off-by: Kebo Liu <kebol@nvidia.com>	2023-09-06 11:32:08 +03:00
Konstantin Vasin	1e7db2ab01	[build]: Don't build ethtool from source (#15856 ) Why I did it There is no reason to build deb package ethtool from source code. We can install the same version from Debian bullseye mirror. How I did it Remove ethtool Makefiles from sonic-buildimage. Install ethtool via apt-get in pmon container.	2023-09-05 23:42:34 -07:00
mssonicbld	204579a0cc	[ci/build]: Upgrade SONiC package versions	2023-09-06 12:32:47 +08:00
Prince George	a4e37a5cd6	[platform]: Disable interrupt for intel i2c-i801 driver (#16309 ) On S6100 we are seeing almost 100K interrupts per second on Intel's i801 SMBus controller, which affects system performance. We now disable the i801 driver interrupt and enable polling instead. Microsoft ADO (number only): 24910530 How I did it Disable the interrupt by passing the interrupt disable feature argument to the i2c-i801 driver How to verify it This fix is NOT applicable to ARM-based platforms. Applicable only to Intel-based platforms: - On SN2700 it's already disabled in Mellanox hw-mgmt - Celestica DX010 and E1031 - Dell S6100: verified the interrupts are no longer incrementing. - Arista 7260CX3 Signed-off-by: Prince George <prgeor@microsoft.com>	2023-09-05 10:23:57 -07:00
Pavan-Nokia	31194124b5	[armhf][Nokia-7215]Add HWSKU files for new SAI (#16321 ) Add new easy bringup (EZB) files for new SAI 1.12.0	2023-09-05 10:21:53 -07:00
Rajkumar-Marvell	782a92213d	[Marvell] Update armhf sai debian to add SAI 1.12 support (#16299 ) - SAI 1.12 support Signed-off-by: rajkumar38 <rpennadamram@marvell.com>	2023-09-05 10:20:27 -07:00
jcaiMR	a522a63e25	[dhcp-relay]: dhcp/dhcpv6 per interface counter support (#16377 ) Why I did it Support DHCP/DHCPv6 per-interface counter, code change in sonic-build image. Work item tracking Microsoft ADO (17271822): How I did it - Introduce libjsoncpp-dev in dhcpmon and dhcprelay repo - Show CLI changes after counter format change How to verify it - Manually run show command - dhcpmon, dhcprelay integration tests	2023-09-05 10:16:39 -07:00
Stephen Sun	b5e8c16134	[Mellanox] Enhance FW upgrade mechanism (#16090 ) ### Why I did it 1. Enhance the diagnosis information collecting mechanism - If the option `-v` is fed, it will pass additional diagnosis flags to mlxfwmanager - Collect all the output from mlxfwmanager and print them to syslog if it fails 2. Abort syncd in case waiting for device or upgrading firmware fails Signed-off-by: Stephen Sun <stephens@nvidia.com> ### How I did it #### How to verify it Regression and manual test	2023-09-04 11:28:53 -07:00
Vadym Hlushko	78587cedc3	[Mellanox] Remove mlxtrace support for SPC4 (#16373 ) - Why I did it Because the Spectrum4 devices don't support mlxtrace utility. - How I did it Edit sai.profile and remove mlxtrace_spectrum4_itrace_*.cfg.ext files Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>	2023-09-04 10:53:20 +03:00
mssonicbld	c787d51f29	[submodule] Update submodule sonic-linux-kernel to the latest HEAD automatically (#16391 ) #### Why I did it src/sonic-linux-kernel ``` * 7ee50c9 - (HEAD -> master, origin/master, origin/HEAD) [Mellanox] Upstream kernel patches with HW-MGMT 7.0030.1011 (#327) (29 hours ago) [Kebo Liu] ``` #### How I did it #### How to verify it #### Description for the changelog	2023-09-03 18:33:09 +08:00
mssonicbld	ccfef69ac4	[submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically (#16392 ) #### Why I did it src/sonic-platform-daemons ``` * c1c43f6 - (HEAD -> master, origin/master, origin/HEAD) [pmon][chassis][voq] Chassis DB cleanup when module is down (#394) (2 days ago) [vganesan-nokia] ``` #### How I did it #### How to verify it #### Description for the changelog	2023-09-03 18:33:05 +08:00
Yoush	559151b41e	[centec]: update sonic master centec-sai reference to v1.12.0-1 (#16238 ) Signed-off-by: yoush <yoush@centec.com>	2023-09-01 23:22:00 -07:00
Vadym Hlushko	9e3fdded69	[Mellanox][SFP] Remove unused function parameter (#16318 ) Why I did it To avoid errors when the sfputil show error-status -hw is called from the host OS (not from the pmon docker). How I did it Remove the self.sdk_handle parameter from the _get_module_info() function. How to verify it Execute the sfputil show error-status -hw Signed-off-by: vadymhlushko-mlnx <vadymh@nvidia.com>	2023-09-01 23:06:04 -07:00
Mai Bui	ff5f46955c	[database] make Redis process runs as non-root user (#16326 ) Why I did it Running the Redis server as the "root" user is not recommended. It is suggested that the server should be operated by a non-privileged user. Work item tracking Microsoft ADO (number only): 15895240 How I did it Ensure the Redis process is operating under the 'redis' user in supervisord and make redis user own REDIS_DIR inside db container. How to verify it Built new image, verify redis process is running as 'redis' user and all containers are up. Signed-off-by: Mai Bui <maibui@microsoft.com>	2023-09-01 23:03:15 -07:00
Zain Budhwani	84cfc3bc69	[eventd]: Remove unnecessary log (#16166 ) Work item tracking Microsoft ADO (number only): 16789053	2023-09-01 23:01:46 -07:00
Riff	7c1d720a65	[sonic-mgmt]: Adding sshconf 0.2.5 into sonic-mgmt container. (#16344 ) Why I did it This change is to help us running SSH config generation for our testbed in mgmt container. Original PR in sonic-mgmt repo can be found here: sonic-net/sonic-mgmt#9773. Work item tracking Microsoft ADO (number only): 25007799 How I did it Updating sonic-mgmt docker file to add sshconf 0.2.5 into pip install under venv.	2023-09-01 22:58:27 -07:00
Andrew Sapronov	0405b369af	[Netberg][Barefoot] Added support for Aurora 750 (#16342 ) Why I did it Support Intel Tofino based platform Netberg Aurora 750 ASIC: Intel Tofino BFN-T10-064Q Ports: 64x 100G How I did it Added specification to device/netberg directory. Added platform/barefoot/sonic-platform-modules-netberg, which contains kernel modules, scripts and sonic_platform packages. Modified platform/barefoot/platform-modules-netberg.mk to include the Aurora 750 related ID. Signed-off-by: Andrew Sapronov <andrew.sapronov@gmail.com>	2023-09-01 22:52:39 -07:00
Guohan Lu	3bdfdd95ea	Revert "[Ragile]: Add new centec platform ra-b6010 (#14819 )" This reverts commit `75062436e8`.	2023-09-01 22:43:18 -07:00
anamehra	f6897bb585	chassis-packet: Update arp_update script for FAILED and STALE check (#16311 ) 1. Fixing an issue with FAILED entry resolution retry. Neighbor entries in the arp table may sometimes enter a FAILED state when the far end is down, and the state is reported as follows: 2603:10e2:400:3::1 dev PortChannel19 router FAILED While the arp_update script handles FAILED entries in the following format, the above was not handled due to the token location (extra router keyword at index 4): 2603:10e2:400:3::1 dev PortChannel19 FAILED The former format may appear if an arp resolution is tried on a link that is known but the far end goes down, e.g., pinging a STALE entry while the far end is down. 2. Refreshing STALE entries to make sure the far end is reachable. STALE entries for some backend ports may appear in chassis-packet when no traffic is received for a while on the port. When the far end goes down, BFD is expected to stop sending packets on the session for which the far end is not reachable. But as the entry is known as stale, on the Cisco chassis, BFD keeps sending packets. Refreshing the stale entry will keep active links reachable in the neighbor table, while the entries whose far end is down will enter a failed state. FAILED state entries will be retried and become reachable when the far end comes back up.	2023-09-01 11:41:46 -07:00
abdosi	566b5dfa1f	Assign the higher metric value for Ipv6 default route learnt via RA message (#16367 ) * Fix the Loopback0 IPv6 address of LCs in chassis not being reachable from peer devices * Assign a higher metric value to the IPv6 default route learnt via RA message so that the BGP-learnt default route has higher priority. Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2023-09-01 11:38:14 -07:00
Sudharsan Dhamal Gopalarathnam	238a50ff13	[P4RT]Disabling p4rt by default to overcome build issues (#16343 ) To fix #16015 P4RT is causing instability in build due to regular failures. Disabling P4RT by default	2023-09-01 11:07:50 -07:00
Marty Y. Lok	de7fb325ae	[Nokia-IXR7250E] Modify the platform_ndk.json for Nokia-IXR7250E platform (#16355 ) Signed-off-by: mlok <marty.lok@nokia.com>	2023-09-01 08:54:40 -07:00
mssonicbld	f78d25b11e	[ci/build]: Upgrade SONiC package versions	2023-09-01 16:32:44 +08:00
mssonicbld	162edc5c73	[submodule] Update submodule sonic-snmpagent to the latest HEAD automatically (#16368 )	2023-09-01 15:03:02 +08:00
vganesan-nokia	5fded5c51b	[chassis] Chassis DB cleanup when asic comes up (#16213 ) * [chassis]Chassis DB cleanup when asic comes up Cleanup the entries from the following tables in chassis app db in redis_chassis server in the supervisor (1) SYSTEM_NEIGH (2) SYSTEM_INTERFACE (3) SYSTEM_LAG_MEMBER_TABLE (4) SYSTEM_LAG_TABLE As part of the clean up only those entries created by the asic that is coming up are deleted. The LAG IDs used by the asics are also de-allocated from SYSTEM_LAG_ID_TABLE and SYSTEM_LAG_ID_SET - Added check to run the chassis db clean up only for voq switches. Signed-off-by: vedganes <veda.ganesan@nokia.com>	2023-08-31 23:38:56 -07:00
lixiaoyuner	410e6ff406	Install pyOpenSSL package for k8s master (#16361 ) ### Why I did it Need a tool to check certificate's detail of information. ##### Work item tracking - Microsoft ADO (number only): 25020260 #### How I did it Install pyOpenSSL package for k8s master #### How to verify it Pip3 list to check whether it's installed when include_kubernetes_master=y	2023-08-31 22:26:24 -07:00
Senthil Kumar Guruswamy	34e5d266e5	Handle service start-limit-hit failure event case in sysmonitor (#16174 )	2023-08-31 12:07:42 -07:00
Senthil Kumar Guruswamy	fdd5deb453	Fix for issue#14871 (#15433 ) Include valid input check for system status in test along with db update check	2023-08-31 12:04:48 -07:00
Alpesh Patel	cabdac17a5	qos template change for backend compute-ai deployment (#16150 ) #### Why I did it To enable qos config for a certain backend deployment mode, for resource-type "Compute-AI". This deployment has the following requirements: - Config below enabled if DEVICE_TYPE is one of backend_device_types - Config below enabled if ResourceType is 'Compute-AI' - 2 lossless TCs (2, 3) - 2 lossy TCs (0, 1) - DSCP to TC map uses 4 DSCP code points and maps to the TCs as follows: "DSCP_TO_TC_MAP": { "AZURE": { "48" : "0", "46" : "1", "3" : "3", "4" : "4" } } - WRED profile has green {min/max/mark%} as {2M/10M/5%} This required a template change <as in the PR> in addition to the vendor qos.json.j2 file (not included here). ### How I did it #### How to verify it - with the above change and the vendor config change, generated the qos.json file and verified that the objective stated in "Why I did it" was met - verified no error ### Description for the changelog Update qos_config.j2 for Compute-AI deployment on one of backend device type roles	2023-08-31 11:30:20 -07:00
Vadym Hlushko	43340cd58d	[memory_checker] Add a specific log message in a case when the docker service is not running. (#16018 ) #### Why I did it To fix the logic introduced by [[memory_checker] Do not check memory usage of containers which are not created #11129](https://github.com/sonic-net/sonic-buildimage/pull/11129). There could be a scenario before the reboot, where 1. The `docker service` has stopped 2. Within a very short period of time, the monit service performs the `root@sonic:/home/admin# monit status container_memory_telemetry` In such a scenario, the `memory_checker` script will throw an error to the syslog: ``` ERR memory_checker: Failed to retrieve the running container list from docker daemon! Error message is: 'Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))' ``` But this scenario is actually correct behavior, because when the docker service is stopped, the Unix socket is destroyed, and that is why we could see the `FileNotFoundError(2, 'No such file or directory'` exception in the syslog. #### How I did it Changed the log severity to warning and changed the return value. #### How to verify it It is really hard to catch the exact moment described in the `Why I did it` section. In order to check the logic: 1. Change the Unix socket path to a non-existent one in the [/usr/bin/memory_checker](`47742dfc2c/files/image_config/monit/memory_checker (L139)`) file on the switch. 2. Execute the `root@sonic:/home/admin# monit restart container_memory_telemetry` 3. Check the syslog for such messages: ``` WARNING memory_checker: Failed to retrieve the running container list from docker daemon! Error message is: 'Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))' INFO memory_checker: [memory_checker] Exits without checking memory usage since container 'telemetry' is not running! ```	2023-08-31 11:28:20 -07:00
Arvindsrinivasan Lakshmi Narasimhan	3237b2cfc8	[chassis][voq] Fix to ignore duplicate nexthop in zebra (#16275 ) Why I did it Fixes #15803 In SONiC chassis, routes have recursive nexthop resolution when the routes are learnt from remote linecard. In some cases after recursive nexthop resolution the number of nexthop for a route could reach 256. Zebra ran out of space when filling up 256 nexthops which causes zebra crash. Work item tracking Microsoft ADO (24997365): How I did it Create a patch to port FRRouting/frr#14096 which has change to ignore duplicate nexthop when filling up fpm message Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>	2023-08-31 11:06:33 -07:00
Junchao-Mellanox	0be57803e2	[Mellanox] Revise label name and fix typo in sensor.conf of 4600C (#16271 ) - Why I did it Revise label name and fix typo in sensor.conf of 4600C - How I did it Revise label name and fix typo in sensor.conf of 4600C - How to verify it Manual test sonic-mgmt test_sensors.py	2023-08-31 19:41:12 +03:00
Yaqiang Zhu	110dc1e247	[yang][dhcp_server] Add dhcp_server_ipv4 yang model (#16327 ) Why I did it #15955 imported sonic-vlan in the yang model, which would cause a YANG backlink issue, so #15955 was reverted by #16322. This PR is a resubmission of #15955 without the sonic-vlan import. Add yang model for IPv4 DHCP Server. How I did it Add yang model for IPv4 DHCP Server. Add four new tables: DHCP_SERVER_IPV4, DHCP_SERVER_IPV4_CUSTOMIZED_OPTIONS, DHCP_SERVER_IPV4_RANGE, DHCP_SERVER_IPV4_PORT. Add related unit test. HLD: https://github.com/yaqiangz/SONiC/blob/master_dhcp_server_hld/doc/dhcp_server/port_based_dhcp_server_high_level_design.md#rev-01 How to verify it Build sonic_yang_models packages.	2023-08-31 08:52:36 -07:00
Xichen96	a5e180552f	add processor.max_cstate=0 to intel cpu cmdline (#16339 ) Why I did it This is a fix for PR [kernel] Change grub cmdline to set c-states to 0 for "Intel" CPUs by shlomibitton · Pull Request #6051 · sonic-net/sonic-buildimage (github.com) The original PR will disable intel idle driver but it cannot limit the max c-state to 1 due to system will fall back to acpi idle driver. Currently intel_idle.max_cstate=0 is already present, which will disable intel idle driver. With the added option, common idle driver will be disabled as well, so there will not be idle management. This is to prevent a bug that can be triggered by idle instruction on intel platform. How I did it Add the option to installer file beside intel_idle.max_cstate=0	2023-08-31 08:47:46 -07:00
pettershao-ragilenetworks	75062436e8	[Ragile]: Add new centec platform ra-b6010 (#14819 ) What I did Add new platform arm64-ragile_ra-b6010-48gt4x-r0 (Centec) ASIC Vendor: Centec Switch ASIC: Centec Port Config: 48x1G+4x10G Why I did it Add new platform RA-B6010-48GT4X How I did it Add new platform RA-B6010-48GT4X Signed-off-by: pettershao-ragilenetworks <pettershao@ragilenetworks.com>	2023-08-31 08:38:24 -07:00
mssonicbld	2a48406f57	[submodule] Update submodule sonic-linux-kernel to the latest HEAD automatically (#16352 ) #### Why I did it src/sonic-linux-kernel ``` * 1800d11 - (HEAD -> master, origin/master, origin/HEAD) AMD-Pensando ELBA SOC support (#322) (23 hours ago) [Ashwin Hiranniah] ``` #### How I did it #### How to verify it #### Description for the changelog	2023-08-31 18:33:11 +08:00
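
The arp_update fix in commit f6897bb585 comes down to where the state token sits in a neighbor-table line: the extra `router` keyword pushes `FAILED` one position to the right. The Python sketch below illustrates the idea of reading the state from the last token instead of a fixed index. It is not the arp_update script itself, which this log does not show; the function name `failed_neighbors` and the third sample line are hypothetical, and the line format is assumed from the two examples quoted in the commit message.

```python
from typing import Iterable, Iterator, Tuple


def failed_neighbors(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (ip, device) for every neighbor entry whose state is FAILED.

    Assumption: each line looks like "<ip> dev <device> [...] <STATE>",
    so the state is always the last whitespace-separated token.
    """
    for line in lines:
        tokens = line.split()
        # Need at least "<ip> dev <device> <STATE>" to be a usable entry.
        if len(tokens) >= 4 and tokens[1] == "dev" and tokens[-1] == "FAILED":
            yield tokens[0], tokens[2]


sample = [
    "2603:10e2:400:3::1 dev PortChannel19 router FAILED",     # format that was missed
    "2603:10e2:400:3::1 dev PortChannel19 FAILED",            # format already handled
    "10.0.0.1 dev Ethernet0 lladdr 00:11:22:33:44:55 STALE",  # hypothetical, not FAILED
]

for ip, dev in failed_neighbors(sample):
    print(f"retry resolution for {ip} on {dev}")
```

Matching on the last token keeps both quoted formats in one code path, so further optional keywords before the state would not need another special case.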


7962 Commits
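
The pending-PR policy described in commit 52568ceab0 is a set of age thresholds, which can be sketched as a small decision function. This is only an illustration of the thresholds listed in that commit message, not the workflow's actual code: the function names and action labels are hypothetical, and reading "more than 10 days, stop comment" as "stay silent until the 28-day warning" is an assumption.

```python
def cherry_pick_pr_action(age_days: int) -> str:
    """Map the age of an auto-cherry-pick PR to an action label (labels are hypothetical)."""
    if age_days > 30:
        return "close"             # more than 30 days: close the PR
    if age_days > 28:
        return "warn_close"        # more than 28 days: tell the author it will be closed
    if age_days > 10:
        return "none"              # more than 10 days: stop commenting (assumed: silence)
    if age_days > 3:
        return "remind_author"     # more than 3 days: ask the author to check
    return "none"


def submodule_pr_action(age_days: int) -> str:
    """Map the age of a submodule-update PR to an action label."""
    return "alert" if age_days > 3 else "none"


for age in (2, 5, 12, 29, 31):
    print(age, cherry_pick_pr_action(age), submodule_pr_action(age))
```

Checking the largest threshold first keeps each branch a single comparison and makes the ordering of the rules explicit.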