sonic-buildimage

Archived

Author	SHA1	Message	Date
Saikrishna Arcot	f84dfd2345	Re-add 127.0.0.1/8 when bringing down the interfaces (#15080 ) * Re-add 127.0.0.1/8 when bringing down the interfaces With #5353, 127.0.0.1/16 was added to the lo interface, and then 127.0.0.1/8 was removed. However, when bringing down the lo interface, like during a config reload, 127.0.0.1/16 gets removed, but 127.0.0.1/8 isn't added back to the interface. This means that there's a period of time where 127.0.0.1 is not available at all, and services that need to connect to 127.0.0.1 (such as for redis DB) will fail. To fix this, when going down, add 127.0.0.1/8. Add this address before the existing configuration gets removed, so that 127.0.0.1 is available at all times. Note that running `ifdown lo` doesn't actually bring down the loopback interface; the interface always stays "physically" up. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2023-06-13 18:45:39 -07:00
Lior Avramov	c05d017091	[Mellanox] Remove iproute2 SDK patches from SONiC tree and consume them from SDK github (#15062 ) - Why I did it SDK patches for iproute2 were added to SONiC tree as a temporary solution. Now that SDK with the patches is available, I have removed the patches from SONiC tree and we consume them from SDK github during compilation. - How I did it During build we download SDK iproute2 patches from SDK github (or from the URL provided by user if compiling SDK from sources) and apply them before compilation. - How to verify it Compile and load on switch, verify interfaces network devices created successfully. Verify LLDP shows connections to neighbors. Verify ping between 2 hosts over 2 router ports is successful.	2023-06-13 15:17:52 +03:00
Stephen Sun	238e6ffcc1	[Mellanox] Adjust warning threshold implementation according to the latest algorithm update (#15092 ) - Why I did it Adjust the warning threshold implementation according to the latest algorithm update - How I did it Modify power warning and critical thresholds methods - How to verify it Unit test updated to cover the change Signed-off-by: Stephen Sun <stephens@nvidia.com>	2023-06-13 15:14:10 +03:00
Kebo Liu	3cb13226be	Update SN5600 platform.json with service port sfp (#15337 ) Signed-off-by: Kebo Liu <kebol@nvidia.com>	2023-06-13 14:15:15 +03:00
mssonicbld	1343b1eba3	[submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically	2023-06-13 18:32:53 +08:00
mssonicbld	d7e75f48bf	[submodule] Update submodule sonic-host-services to the latest HEAD automatically	2023-06-13 16:32:51 +08:00
mssonicbld	2227365107	[submodule] Update submodule sonic-mgmt-common to the latest HEAD automatically	2023-06-13 16:32:46 +08:00
mssonicbld	9ddb9d6852	[submodule] Update submodule sonic-mgmt-framework to the latest HEAD automatically	2023-06-13 16:32:42 +08:00
mssonicbld	713a8a8a7e	[submodule] Update submodule sonic-swss to the latest HEAD automatically	2023-06-13 16:32:34 +08:00
jingwenxie	54a1ad10f9	[yang] Change asn to start from 0 for bgp monitor (#15350 ) #### Why I did it The asn 0 in BGP_MONITOR is invalid by YANG definition. However, the asn 0 in BGP_MONITOR is found in many devices. It was introduced by minigraph where its value is set to 0. To unblock Config Updater test, the short term fix is to accept the asn 0 in BGP_MONITOR. We can revert this after NGS team make all the ASN change in minigraph. ##### Work item tracking - Microsoft ADO (24186140): #### How I did it Change the range #### How to verify it Unit test.	2023-06-12 21:59:34 -07:00
Hua Liu	05f1a5a31e	Add watchdog mechanism to swss service and generate alert when swss have issue. (#15429 ) Add watchdog mechanism to swss service and generate alert when swss have issue. Work item tracking Microsoft ADO (number only): 16578912 What I did Add an orchagent watchdog to monitor for and alert on a stuck orchagent. Why I did it Currently the SONiC monit system only checks whether the orchagent process exists. If the orchagent process gets stuck and stops processing, monit cannot detect or report it. How I verified it Pass all UT. Manually verified that process_monitoring/test_critical_process_monitoring.py passes. Added new UT https://github.com/sonic-net/sonic-mgmt/pull/8306 to check that the watchdog works correctly. Manual test: after pausing orchagent with 'kill -STOP <pid>', check that a warning message appears in the log: Apr 28 23:36:41.504923 vlab-01 ERR swss#supervisor-proc-watchdog-listener: Process 'orchagent' is stuck in namespace 'host' (1.0 minutes). Details if related Heartbeat message PR: https://github.com/sonic-net/sonic-swss/pull/2737 UT PR: https://github.com/sonic-net/sonic-mgmt/pull/8306	2023-06-12 17:53:54 -07:00
Alpesh Patel	633fff8c10	enable ethernet backplane port support in port config for packet mode T2 devices (#14533 ) For T2 systems using packet mode, the backplane interfaces (Ethernet-BP#) and the fabric card ethernet interfaces are not visible as neighbor interfaces. In packet mode, these interfaces need QoS and buffer config as well. This fix addresses that issue and adds the backplane interfaces to the PORTS_ACTIVE list	2023-06-12 14:02:22 -07:00
mssonicbld	cb9d9e57a6	[ci/build]: Upgrade SONiC package versions (#15431 ) Upgrade SONiC Versions	2023-06-12 22:27:29 +08:00
mssonicbld	c74629a83a	[submodule] Update submodule sonic-utilities to the latest HEAD automatically	2023-06-12 16:32:51 +08:00
mssonicbld	6b9c100974	[submodule] Update submodule sonic-host-services to the latest HEAD automatically	2023-06-11 16:32:32 +08:00
mssonicbld	50238d8039	[submodule] Update submodule sonic-platform-common to the latest HEAD automatically	2023-06-11 16:32:27 +08:00
mssonicbld	a45595158b	[ci/build]: Upgrade SONiC package versions (#15345 )	2023-06-10 20:38:13 +08:00
mssonicbld	df20467b29	[submodule] Update submodule sonic-swss-common to the latest HEAD automatically (#15425 )	2023-06-10 17:03:02 +08:00
mssonicbld	7f3d68f4c2	[submodule] Update submodule sonic-gnmi to the latest HEAD automatically	2023-06-10 16:32:55 +08:00
mssonicbld	bad9099fba	[submodule] Update submodule linkmgrd to the latest HEAD automatically	2023-06-10 16:32:50 +08:00
mssonicbld	5c18870688	[submodule] Update submodule sonic-sairedis to the latest HEAD automatically (#15402 )	2023-06-10 16:30:05 +08:00
mssonicbld	a48a813d08	[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#15370 )	2023-06-10 16:17:01 +08:00
mssonicbld	dc4eb9e90d	[submodule] Update submodule sonic-ztp to the latest HEAD automatically (#15426 )	2023-06-10 16:05:44 +08:00
mssonicbld	e662c480dc	[submodule] Update submodule sonic-swss to the latest HEAD automatically (#15403 )	2023-06-10 15:57:18 +08:00
mssonicbld	516e7930b2	[submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically (#15401 )	2023-06-10 15:30:27 +08:00
Liping Xu	78c41a1e58	allow docker_inram to kernel cmd list (#15374 ) Why I did it After docker_inram is enabled, the docker folder's default max size is 1.5G. It's not big enough for some tests which need to install additional docker images or install extra packages. Work item tracking Microsoft ADO 24199761: How I did it add docker_inram into cmdline_allowlist How to verify it sudo sh -c 'echo "docker_inram_size=3000M" >> kernel-cmdline-append' sudo reboot and check the docker folder size	2023-06-10 14:19:44 +08:00
Sudharsan Dhamal Gopalarathnam	162856ad9a	[sflow]Delay starting sflow service until ports are created (#15333 ) * [sflow]Delay starting sflow service until ports are created * Removing sflow from sonic.target dependency since it will be managed by hostcfgd	2023-06-09 16:28:15 -07:00
Saikrishna Arcot	d466994e91	teamd: Add support for custom retry counts for LACP sessions (#13453 ) Why I did it This is to add support for specifying custom retry counts for LACP sessions. This is to make warmboot easier on low-storage and low-memory platforms, by allowing more than 90 seconds of downtime. How I did it How to verify it Tested manually with these cases: Verify that changing the retry count using teamdctl PortChannel101 state item set runner.retry_count 5 takes effect Verify that the retry count change actually affects when the LAG goes down by forcefully killing teamd on one side (i.e. setting the retry count to 5 causes the LAG to go down after 150 seconds) Verify that the retry count gets reset to 3 after the LAG goes down for whatever reason Verify that the retry count gets reset to 3 after some period of time (30 seconds * retry count) Test cases are in sonic-net/sonic-mgmt#7961 and sonic-net/sonic-mgmt#8152. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2023-06-09 10:03:25 -07:00
mssonicbld	2b5c0dd0c6	[submodule] Update submodule sonic-swss-common to the latest HEAD automatically (#15404 )	2023-06-09 15:57:30 +08:00
Ye Jianquan	cec9d7b83a	Revert "Add watchdog mechanism to swss service and generate alert when swss have issue. (#14686 )" (#15390 ) This reverts commit `44427a2f6b`. Docker image not updated during PR validation and caused PR check failures. Force merge this revert. After cache is updated after this PR is merged, issue should be fixed.	2023-06-09 09:10:35 +08:00
Arvindsrinivasan Lakshmi Narasimhan	0f194c5a03	set the default value for the port fec to RS on J2 based LC (#15346 ) Why I did it Work item tracking Microsoft ADO (24182162): How I did it update the config.bcm to set the default fec RS 100G Linecard How to verify it Tests on chassis	2023-06-08 11:08:48 -07:00
Vivek	9d8ab1b8e4	[Mellanox] Added patchwork link to commit message (#15301 ) - Why I did it Add the patchwork link to the commit description for non-upstream patches if present - How I did it Parse the patchwork/<patch_name>.txt file from hw-mgmt	2023-06-08 18:51:58 +03:00
Liu Shilong	96cac8e918	[ci] Add marvell-arm64 build in PR checks. (#15356 ) Why I did it Add marvell-arm64 platform build in PR checks to avoid build break. Work item tracking Microsoft ADO (number only): 17257160 How I did it How to verify it	2023-06-08 09:40:20 +08:00
Ikki Zhu	9fcbd5ed1d	fix possible cpld race access issue (#15371 ) Why I did it fix possible cpld race read issue between watchdog and reboot cause process How I did it Use fcntl.flock to limit parallel access to cpld sys file How to verify it It can be simulated and verified with following python script	2023-06-07 11:29:18 -07:00

```python3
import fcntl
import signal
import threading

exit_flag = False

def get_cpld_reg_value(getreg_path, register):
    file = open(getreg_path, 'w+')
    # Acquire an exclusive lock on the file
    fcntl.flock(file, fcntl.LOCK_EX)
    try:
        file.write(register + '\n')
        file.flush()
        # Seek to the beginning of the file
        file.seek(0)
        # Read the content of the file
        result = file.readline().strip()
    finally:
        # Release the lock and close the file
        fcntl.flock(file, fcntl.LOCK_UN)
        file.close()
    return result

def cpld_read(thread_num, cpld_reg, expect_val):
    while not exit_flag:
        val = get_cpld_reg_value("/sys/devices/platform/dx010_cpld/getreg", cpld_reg)
        #print(f"Thread {thread_num}: get cpld reg {cpld_reg}, value {val}")
        if val != expect_val:
            print(f"Thread {thread_num}: get cpld reg {cpld_reg}, value {val}, expect_val {expect_val}")

def signal_handler(sig, frame):
    global exit_flag
    print("Ctrl+C detected. Quitting...")
    exit_flag = True

if __name__ == '__main__':
    # Register the signal handler for Ctrl+C
    signal.signal(signal.SIGINT, signal_handler)
    t1 = threading.Thread(target=cpld_read, args=(1, '0x103', '0x11',))
    t2 = threading.Thread(target=cpld_read, args=(2, '0x141', '0x00',))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
```

Yevhen Fastiuk	8a6d45227e	[Clock] Add timezone config YANG model (#14651 ) * Add the ability to configure timezone Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com> * Add YANG model for timezone Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com> * Add timezone reference Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com> --------- Signed-off-by: Yevhen Fastiuk <yfastiuk@nvidia.com>	2023-06-07 10:39:24 -07:00
abdosi	6139c525d2	updated internal route policy for chassis-packet (#15349 ) What I did: Workaround for the issue seen here: FRRouting/frr#13682. There appears to be a timing issue: when multiple recursive lookups are needed to resolve the nexthop of a route, resolution may not happen correctly, leaving the route in an inactive state. The issue is seen on chassis-packet because 2 levels of recursive lookup are needed for a given e-BGP learnt route - Level 1 to resolve the e-BGP peer (connected route via bgp) over Loopback4096 (i-BGP peering) - Level 2 to resolve Loopback4096 over backend port-channel next-hops. For VOQ chassis there is no e-BGP peer (connected route via bgp) resolution, as the route is added as a static route by orchagent over Ethernet-IB. Also, as part of this, remove the route-map policy from instance.conf.j2, as the same policy is defined in peer-group.j2. Microsoft ADO: https://msazure.visualstudio.com/One/_workitems/edit/24198507 How I verify: Functional verification done manually. Updated UT. We will be adding a sanity check in sonic-mgmt to make sure no routes are in an inactive state. Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2023-06-07 09:17:44 -07:00
Rajkumar-Marvell	94790bef04	[sflow] Add egress sflow support. (#14630 ) * [sflow] Add egress sflow support. - Updated sonic-yang-model - change hsflowd version to 2.0.45	2023-06-06 11:23:39 -07:00
mssonicbld	084d012749	[submodule] Update submodule sonic-mgmt-common to the latest HEAD automatically	2023-06-06 16:32:12 +08:00
mssonicbld	40eb97c2f3	[submodule] Update submodule sonic-utilities to the latest HEAD automatically (#15294 )	2023-06-06 14:44:24 +08:00
mssonicbld	f78261cbac	[submodule] Update submodule sonic-swss to the latest HEAD automatically (#15355 )	2023-06-06 14:34:37 +08:00
mssonicbld	ac56598db1	[submodule] Update submodule dhcprelay to the latest HEAD automatically	2023-06-06 14:33:15 +08:00
mssonicbld	ba241bbe3f	[submodule] Update submodule sonic-platform-daemons to the latest HEAD automatically	2023-06-06 14:33:06 +08:00
Hua Liu	44427a2f6b	Add watchdog mechanism to swss service and generate alert when swss have issue. (#14686 ) This PR depends on https://github.com/sonic-net/sonic-swss/pull/2737 being merged first. What I did Add an orchagent watchdog to monitor for and alert on a stuck orchagent. Why I did it Currently the SONiC monit system only checks whether the orchagent process exists. If the orchagent process gets stuck and stops processing, monit cannot detect or report it. How I verified it Pass all UT. Added new UT https://github.com/sonic-net/sonic-mgmt/pull/8306 to check that the watchdog works correctly. Manual test: after pausing orchagent with 'kill -STOP <pid>', check that a warning message appears in the log: Apr 28 23:36:41.504923 vlab-01 ERR swss#supervisor-proc-watchdog-listener: Process 'orchagent' is stuck in namespace 'host' (1.0 minutes). Details if related Heartbeat message PR: https://github.com/sonic-net/sonic-swss/pull/2737 UT PR: https://github.com/sonic-net/sonic-mgmt/pull/8306	2023-06-05 22:21:17 -07:00
siqbal1986	381cfe4485	Added VNET_MONITOR_TABLE,BFD_SESSION_TABLE,VNET_ROUTE_TUNNEL_TABLE to the list (#14992 ) * The 3 tables in state DB need to be cleaned up after SWSS restart to keep the state consistent.	2023-06-05 13:18:50 -07:00
Kalimuthu-Velappan	2627dcc5b4	07.Version Cache - Support for PIP (#14613 ) During build, lots of pip packages are getting installed through pip install command. This feature adds support for caching all the pip packages into local cache path, so that subsequent build always loads from the cache.	2023-06-05 12:02:33 -07:00
Marty Y. Lok	d4a81ea121	[Nokia-IXR7250E][Devicedata] update the device data for Nokia IXR7250E platform (#15216 ) Why I did it Update the device data files to support 1024 LAGs for Nokia IXR7250E platform fixes https://github.com/Nokia-ION/ndk/issues/15 How I did it Update the lag_id_end=1024 in chassisdb.conf file and add the trunk_group_max_members=16 in the BCM config file How to verify it check to allow to create lag ids up to 1024 with 16 port members Signed-off-by: mlok <marty.lok@nokia.com>	2023-06-05 12:02:05 -07:00
Kalimuthu-Velappan	9dce453552	06.Version Cache - Support for wget (#14612 ) When a package is referenced from the web through wget command, it downloads the package for every build. This feature caches all the packages that are being downloaded from the web, so that subsequent build always loads the cache instead of from web.	2023-06-05 12:00:58 -07:00
Aravind Mani	b26445cf7b	Dell FPGA driver fix (#15144 ) Why I did it FPGA driver crash was observed in Dell FPGA based platforms. How I did it Fixed FPGA crash How to verify it Load FPGA driver and check whether the kernel crashes.	2023-06-05 11:01:46 -07:00
mssonicbld	4335690de7	[ci/build]: Upgrade SONiC package versions	2023-06-05 20:51:47 +08:00
Liu Shilong	d2915969d4	[ci] Add OVERRIDE_BUILD_OPTIONS in image build template. (#15309 ) Why I did it Set build options in pipeline UI. Support setting reproducible build options to py2,py3 in release branch and none in master branch. Work item tracking Microsoft ADO (number only): 22335854 How I did it How to verify it	2023-06-05 18:42:06 +08:00

7613 Commits