sonic-buildimage

Archived

Author	SHA1	Message	Date
Junchao-Mellanox	b1162682cb	[system-health] [202012] No longer check critical process/service status via monit (#9367 ) Backport https://github.com/Azure/sonic-buildimage/pull/9068 to 202012 #### Why I did it Command `monit summary -B` can no longer display the status of each critical process, so system-health should not depend on it and needs another way to monitor the status of critical processes. This PR addresses that. monit is still used by system-health for the file system check as well as custom checks. #### How I did it 1. Get container names from the FEATURE table 2. For each container, collect critical process names from the file critical_processes 3. Use `docker exec -it <container_name> bash -c 'supervisorctl status'` to get the status of processes inside the container, parse the output, and check whether any critical process has exited #### How to verify it 1. Add unit test case to cover it 2. Adjust sonic-mgmt cases to cover it 3. Manual test	2021-11-24 15:36:14 -08:00
Jing Zhang	3e6cdfa3a6	[sonic-linkmgrd] submodule update (#9343 ) Submodule update for sonic-linkmgrd Incorporates: c11a576 (2021-11-22 09:38:46) [ci]: show code coverage in azure pipeline (#4) 4ceb01d (2021-11-18 20:24:20) Fix MUX toggling issue (#1) d640527 (2021-11-12 22:31:44) [ci]: fix artifact download b9f247d (2021-11-12 22:31:44) [ci]: use native arm64/armhf build 3059122 (2021-09-27 11:32:23) [linkgrd] Add Missing Apache License Header signed-off-by: Jing Zhang zhangjing@microsoft.com	2021-11-24 11:12:22 -08:00
tjchadaga	d3a5c5ccd0	[202012][sonic-sairedis] update submodule (#9364 ) Update sonic-sairedis submodule to get the below fixes: 7389704 [202012] Add ACL_TABLE object to break before make list (Azure/sonic-sairedis#971) f334349 Fix hung issue when installing linux kernel modules (Azure/sonic-sairedis#969)	2021-11-24 11:10:50 -08:00
xumia	415fd17689	[Build]: Fix the version not found issue (#9331 ) When we update a SAI package downloaded from a remote server, we currently need to update the version file as well. However, the reproducible build feature is not enabled in master, so a missing version can only be detected when merging the code into the release branches, such as 202106, 202012, etc. The reproducible build feature is meant to reduce build failures; there is no need to break the build when the version is not specified. If the version is not specified, the best choice is to accept the version from the remote server. Co-authored-by: Ubuntu <xumia@xumia-vm1.jqzc3g5pdlluxln0vevsg3s20h.xx.internal.cloudapp.net>	2021-11-24 01:16:37 +00:00
shlomibitton	2361e75cfb	[hostcfgd] [202012] Fix the delay type to 'boot' delay instead of a unit activation delay (#8896 ) #### Why I did it With the current code, the delay takes place even when a simple 'config reload' command is executed, which is not desired. This delay should be used only when fast-rebooting. #### How I did it Change the type of delay to OnBootSec instead of OnActiveSec. #### How to verify it Fast-reboot with this PR and observe the delay. Run the 'config-reload' command and observe that no delay occurs.	2021-11-23 15:21:07 -08:00
Vivek Reddy	edd6b847e9	[hostcfgd] [202012] Fixed the brief blackout in hostcfgd using SubscriberStateTable (#9228 ) #### Why I did it Backporting https://github.com/Azure/sonic-buildimage/pull/8861 to 202012	2021-11-22 21:57:07 -08:00
Qi Luo	7fb0f3f89f	[redis-py]: Fix redis version during pip3 install (#9329 ) The recent release of redis 4.0.0 or newer (for python3) breaks sonic-config-engine unit test. Fix to last known good version. ref: https://pypi.org/project/redis/#history	2021-11-22 11:06:12 -08:00
liuh-80	a5bf6fd874	[sonic-utilities] submodule update (#9342 ) Submodule update for sonic-utilities with following change: ec9e5ee Backport [generate_dump] remove secrets from dump files #1886 to 202012 (#1938) ce3b856 [fdbshow]: Handle FDB cleanup gracefully. (#1926) 1437bf2 [202012] Add DHCPv6 Relay counter and ipv6 helper CLI (#1917)	2021-11-22 09:45:10 -08:00
Renuka Manavalan	678100a7c4	Cherry pick of PR #9123 (#9310 ) [cherry-pick PR #9123 ] Why I did it: When sshd realizes that a login can't succeed due to internal device state or configuration, instead of failing right there, it proceeds to prompt for a password, so that the user does not get any clue as to where the failure point is. To ensure that the login does not proceed, sshd replaces the user-provided password with a specific pattern of characters matching the length of the user-provided password. This pattern is "<BS><LF><CR><DEL>INCORRECT", which is bound to fail. If the user-provided length is smaller than or equal to the pattern length, a prefix of the pattern of that length is used. If the user-provided length is greater, the pattern is repeated until the length is exhausted. But if the PAM-tacacs plugin sent this password to AAA, the user could get locked out by AAA for providing an incorrect value. How I did it: Hence this fix matches the obtained password against the pattern. If it matches, fail just before reaching the AAA server. How to verify it: Make sure tacacs is properly configured. Try logging in as, say, "user-A"; ensure it succeeds. Pick another user, say user-B, and ensure this user has not logged into this device before (look into /etc/passwd & folders under /home). Disable the monit service (as that could fix the issue using disk_check.py). Start TCP dump for all TACACS servers. Simulate a read-only disk. Try logging in using user-B. Verify it fails after 3 attempts. Stop TCP dump. The TCP dump should show "authentication" for user-A only.	2021-11-19 17:33:13 -08:00
Shilong Liu	4533e64fb4	[CI] Fix Azure pipeline set -e not work. (#9282 ) In the Azure pipeline template, 'set -e' does not work as expected.	2021-11-20 00:45:00 +00:00
tjchadaga	d5f95e5b2b	[sonic-utilites] submodule update (#9324 )	2021-11-19 13:03:26 -08:00
Prince Sunny	54a29b16c5	[swss] Update submodule for sonic-swss (#9314 ) c31a362 - 2021-11-18 : [202012][Mux orch] set default as standby, change mux orch priority (#2015) [Prince Sunny] 9a9e8e6 - 2021-11-18 : [202012] Check VS test failure (#2033) [Prince Sunny] 7eaabca - 2021-11-11 : [202012] Fix random failure in PR/CI build. (#2016) [Shilong Liu] 85230fe - 2021-11-04 : [orchagent] Fix group name of port-buffer-drop in flexcounterorch.cpp (#1967) [Junchao-Mellanox] a55c2ca - 2021-11-03 : [teammgrd]: Handle LAGs cleanup gracefully on Warm/Fast reboot. (#1934) [Nazarii Hnydyn]	2021-11-18 21:08:16 -08:00
Prince Sunny	d6ab409709	[202012] td2/td3 change cpu cos num to 10 (#9311 ) Cherry-pick from #9301	2021-11-18 12:48:20 -08:00
vdahiya12	a7a4980e45	[202012][sonic-platform-common] submodule update (#9297 ) 6f198d0 (HEAD -> 202012, origin/202012) [Y-Cable][Broadcom] upgrade to support Broadcom Y-Cable API to release (#230) 1c3e422 SSD Health: Retrieve SSD health and temperature values from generic SSD info (#229) Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>	2021-11-17 23:06:44 -08:00
gechiang	a5f4780c64	[202012] BRCM SAI 4.3.5.1-8 Pick up fix for PFCWD getting continuously triggered/restored when pause frames are sent continuously to both queues of a port (#9296 ) 1. CS00012211718 [4.3] Pfcwd getting continuously triggered/restored when pause frames are sent continuously to both queues of a port (TD2/Th/Th2/TD3) MSFT Default Preliminary tests look fine. BGP neighbors were all up with proper routes programmed interfaces are all up Manually ran the following test cases on 7050CX3 (TD3) T0 DUT and all passed: ``` fib/test_fib.py vxlan/test_vxlan_decap.py fdb/test_fdb.py decap/test_decap.py ipfwd/test_dip_sip.py ipfwd/test_dir_bcast.py acl/test_acl.py vlan/test_vlan.py platform_tests/test_reboot.py ```	2021-11-17 21:30:10 -08:00
Qi Luo	c87ec48993	[sonic-utilities] submodule update (#9266 ) 9dd3025 2021-05-11 \| [Command-Reference.md] Document new SNMP show and config commands (#1600) [Travis Van Duyn] be40767 2021-05-05 \| [show][config] Add new snmp commands (#1347) [Travis Van Duyn]	2021-11-15 21:06:43 -08:00
trzhang-msft	19008889de	update DHCP_PACKET_MARK schema (#9077 ) - update DHCP_PACKET_MARK schema in state_db - this is an update over PR: Add service mark_dhcp_packet to mux container #9015	2021-11-15 21:37:08 +00:00
trzhang-msft	86fa5eede2	Add service mark_dhcp_packet to mux container (#9015 ) - add a new service "mark_dhcp_packet" to mux container - apply packet marks on a per-interface basis in ebtables - write packet marks to "DHCP_PACKET_MARK" table in state_db	2021-11-15 21:36:29 +00:00
Renuka Manavalan	6cb7af73d9	add arista.log to logrotate (#9245 )	2021-11-15 21:32:03 +00:00
kellyyeh	2cbe6a7502	DHCPv6 Relay multivlan functionality support (#9178 ) Fix support for DHCPv6 Relay multi-VLAN functionality. Make sure the relayed packet is received at the correct interface. How I did it: Bind a socket to each VLAN interface's global and link-local address. The socket bound to the global address is used for relaying data from client to server and receiving data from servers. The socket bound to the link-local address is used for relaying data received from the server back to the client.	2021-11-15 21:31:58 +00:00
mssonicbld	36f1a547b1	[ci/build]: Upgrade SONiC package versions (#9255 )	2021-11-14 23:26:35 +00:00
mssonicbld	4d15a1c1f6	[ci/build]: Upgrade SONiC package versions (#9221 )	2021-11-13 23:37:09 +00:00
gechiang	7ac5b40f4b	[202012]BRCM SAI 4.3.5.1-7 Picked up fixes for CS00012209390, CS00012212995, SONIC-51583, CS00012215744, and SONIC-51638 (#9252 ) This is to pick up BRCM SAI 4.3.5.1-7 fixes which contains the following fixes: 1. CS00012209390: SONIC-50037, Used SAI_SWITCH_ATTR_QOS_DSCP_TO_TC_MAP as a default decap map for IPinIP tunnels. 2. CS00012212995: SONIC-50948 SAI_API_QUEUE:_brcm_sai_cosq_stat_get:1353 egress Min limit get failed with error Invalid parameter 3. SONIC-51583: Fixed acl group member creation failure with priority of -1 4. CS00012215744:SONIC-51395 [TH, TH2] WB 3.5 to 4.3 fails at APPLY_VIEW while setting SAI_PORT_ATTR_EGRESS_ACL 5. SONIC-51638: SDK-249337 ERROR: AddressSanitizer: heap-buffer-overflow in _tlv_print_array Preliminary tests look fine. BGP neighbors were all up with proper routes programmed interfaces are all up Manually ran the following test cases on 7050CX3 (TD3) T0 DUT and all passed: ``` fib/test_fib.py vxlan/test_vxlan_decap.py fdb/test_fdb.py decap/test_decap.py ipfwd/test_dip_sip.py ipfwd/test_dir_bcast.py acl/test_acl.py vlan/test_vlan.py platform_tests/test_reboot.py ```	2021-11-13 10:45:46 -08:00
Mykhailo Onipko	a7117b905f	[BFN]: Updated SDK packages to 20211112 (#9244 ) Signed-off-by: Mykhailo Onipko <monipko@barefootnetworks.com>	2021-11-12 21:47:56 -08:00
Qi Luo	2a7595169b	[sonic-swss-common] Update submodule (#9225 ) ead0d5a 2021-11-10 \| Exclude *.a files from python deb packages (#554) [Qi Luo] 3a660ac 2021-10-20 \| Fix the option missing in kernel config issue (#541) [xumia]	2021-11-11 00:50:17 -08:00
Lawrence Lee	b027e87ffb	[mux.service]: Remove pmon dependency (#9211 ) Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-11-11 02:56:27 +00:00
Lawrence Lee	f317d93cb0	Merged PR 4679112: [write_standby]: Ignore non-auto interfaces [write_standby]: Ignore non-auto interfaces * In the event that `write_standby.py` is used to automatically switchover interfaces when linkmgrd or bgp crashes, ignore any interfaces that are not configured to auto-switch Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-11-10 18:54:33 -08:00
Lawrence Lee	57ad50cfd9	Merged PR 4559560: [bgp]: Switch to standby if BGP container exits [bgp]: Switch mux to standby if BGP container exits Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-11-10 18:54:33 -08:00
Lawrence Lee	6a9c709336	[write_standby]: Improve logging Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-11-10 18:54:33 -08:00
Lawrence Lee	77378b4364	[mux]: Call write_standby from host only Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-11-10 18:54:33 -08:00
Lawrence Lee	25712c712e	[mux]: Make write_standby available on host Signed-off-by: Lawrence Lee <lawlee@microsoft.com> [write_standby]: Cleanup and fix build Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-11-10 18:54:33 -08:00
Lawrence Lee	84cd0e9471	[mux]: Initialize all mux ports as standby Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-11-10 18:54:33 -08:00
Tamer Ahmed	18d1f65339	Merged PR 4813977: [mux] Update Service Install With SONiC Target [mux] Update Service Install With SONiC Target A recent PR grouped all SONiC services into sonic.target. The install section of mux.service was not updated, and this causes delays when using config reload because the service's failed state is not being reset. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2021-11-10 18:54:33 -08:00
Lawrence Lee	70fbd6826c	Merged PR 4366316: [mux.service]: Bind to sonic.target [mux.service]: Bind to sonic.target Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-11-10 18:54:33 -08:00
Tamer Ahmed	b42aef68f3	Merged PR 4234524: [mux] Start Mux on Only Dual-ToR Platform [mux] Start Mux on Only Dual-ToR Platform The mux docker depends on the presence of mux cable hardware and is supposed to run only on Gemini ToRs. This PR changes the mux feature config in order to enable the mux docker based on device configuration. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2021-11-10 18:54:33 -08:00
Tamer Ahmed	ed40b53ee1	[linkmgrd] Relocate Linkmgrd to Github This PR deletes local-to-buildimage linkmgrd and creates new submodule pointing to github repo of sonic-linkmgrd. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2021-11-10 18:54:33 -08:00
Tamer Ahmed	b826630262	[mux] Add New Package Vars Adding a new packaging variable to the mux docker signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2021-11-10 18:54:33 -08:00
Tamer Ahmed	a8fdeb3907	[linkmgrd] Enhance Init And Switch State When Config Is Active During warm reboot, linkmgrd goes away and so heartbeats are lost. This would cause the standby links on the peer ToR to pull the link active. This is undesirable since we would not create a tunnel from the ToR that is being rebooted to the peer ToR. This PR implicitly locks the state of the mux if config is not set to auto. Also, orchagent does not initialize the MUX to its hardware state; rather, it initializes the MUX to the Unknown state. linkmgrd will detect this situation and probe the MUX state to correct the orchagent state. There is also a fix for the case when the state of a switched MUX is delayed. The PR will poll the MUX for the new state. This is required to update the state ds and hence create/tear down the tunnel. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2021-11-10 18:54:33 -08:00
Tamer Ahmed	b8f70f8986	Merged PR 3845699: [linkmgrd]: Introduce MUX cable linkmgrd Linkmgrd monitors link status, mux status, and link state. If the link becomes unhealthy, linkmgrd will trigger mux switchover on a standby ToR, ensuring uninterrupted service to servers/blades. This PR is the initial implementation of linkmgrd. Also, the docker-mux container holds packages related to maintaining and managing the mux cable. It currently runs the linkmgrd binary, which monitors and switches the mux if needed. This PR also introduces mux-container and starts linkmgrd at startup when the build is configured with INCLUDE_MUX=y Edit: linkmgrd PR will follow. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com> Related work items: #2315, #3146150	2021-11-10 18:54:33 -08:00
tjchadaga	8138a34a0b	[202012] sonic-platform-daemons submodule update (#9222 )	2021-11-10 17:28:51 -08:00
tjchadaga	1bc012ab1e	[202012] sonic-utilities submodule update (#9214 )	2021-11-10 09:00:32 -08:00
Rajkumar-Marvell	34e5243f64	[202012][Marvell] Update armhf SAI to ver 1.7.1-6 (#9205 ) Fixed SAI error reported in issue #9172 Signed-off-by: Rajkumar Pennadam Ramamoorthy <rpennadamram@marvell.com>	2021-11-10 08:34:46 -08:00
tjchadaga	9a1b1bc44e	Fix for additional intf flap during fast-reboot (#9166 )	2021-11-09 23:20:06 +00:00
Saikrishna Arcot	bea36d963e	dhcp6relay: remove line overwriting docker-dhcp-relay variable (#9179 ) The dhcp6relay rules file had a line overwriting a variable for docker-dhcp-relay. Remove that line. This line caused a limited impact where if some (many?) of the docker containers were already built, except for dhcp-relay, and the build failed or was interrupted, then dhcp-relay container would fail to build because this variable was overwritten and the python3-swsscommon wouldn't get installed into the slave container. Most builds would be fine, though. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2021-11-09 23:19:23 +00:00
mssonicbld	c15bae7c84	[ci/build]: Upgrade SONiC package versions (#9128 )	2021-11-09 22:52:26 +00:00
Vivek Reddy	1cd67bb27c	[202012] update sonic-utilities submodule (#9195 ) Submodule update for sonic-utilities ``` 48035d75 [202012] [techsupport] Techsupport Error Reporting pending fixes (#1854) 8b2ec09a Fix log_ssd_health hang issue (#1904) ac9c4254 Fix the option missing in kernel config issue (#1888) 5cc9417a disk_check: Script updated to run good in 201811 & 201911 (#1747) ```	2021-11-09 14:09:52 -08:00
trzhang-msft	7e8ebaabee	caclmgrd: support packet mark in DHCP chain (#9191 ) * caclmgrd:support packet mark in DHCP chain	2021-11-08 14:54:57 -08:00
gechiang	baa00e6969	[202012] Disable ALPM distributed hitbit thread that is used for debug purpose only but interfered with Other functional operations (#9190 ) This addresses an issue where SAI operations were observed to sometimes take a very long time to complete (over 45ms). It was determined that the ALPM distributed thread was causing this issue. The fix is to disable this debug thread, which has no functional purpose. Preliminary tests look fine. BGP neighbors were all up with proper routes programmed, and interfaces are all up. Manually ran the fib test cases on 7050CX3 (TD3), TD2, TH, TH2, and TH3 based platforms and they all passed.	2021-11-08 11:50:44 -08:00
Lawrence Lee	8ada006302	[swss]: Start ndppd after vlanmgrd (#9155 ) Why I did it During swss container startup, if ndppd starts up before/with vlanmgrd, ndppd will be pinned at nearly 100% CPU usage. How I did it Only start ndppd after vlanmgrd is running. Also, call ndppd directly instead of through bash for improved logging and to prevent orphaned processes. Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-11-05 00:39:10 +00:00
kellyyeh	d8dd68d2f4	Fix invalid destination address error (#9143 )	2021-11-05 00:38:36 +00:00
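
The system-health change in b1162682cb describes parsing `supervisorctl status` output to find critical processes that have exited. A minimal sketch of that parsing step follows; it is not the code from the PR. The function name, the sample output, and the choice to treat a process missing from the output as failed are all assumptions made for illustration.

```python
def find_failed_critical_processes(status_output, critical_processes):
    """Return the critical process names that are not reported as RUNNING.

    Hypothetical helper: status_output is the text printed by
    `supervisorctl status`, one "<name> <STATE> <details>" line per process.
    """
    states = {}
    for line in status_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, state = fields[0], fields[1]
        # supervisord prints grouped programs as "group:name"; keep the name part
        states[name.split(":")[-1]] = state
    # Assumption: a critical process absent from the output counts as failed
    return [p for p in critical_processes if states.get(p) != "RUNNING"]


# Illustrative sample output, not captured from a real device
sample = """\
orchagent                        RUNNING   pid 45, uptime 1:02:03
portsyncd                        EXITED    Nov 24 03:15 PM
rsyslogd                         RUNNING   pid 12, uptime 1:02:10
"""

print(find_failed_critical_processes(sample, ["orchagent", "portsyncd", "vlanmgrd"]))
# prints ['portsyncd', 'vlanmgrd']
```

In the flow described by the commit message, the list of critical processes would come from each container's critical_processes file and the status text from running `supervisorctl status` inside that container.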

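The sshd placeholder check described in 678100a7c4 can be illustrated as follows. This is a sketch under stated assumptions, not the fix from the PR, which lives in the PAM TACACS plugin: the function name is hypothetical, and rejecting an empty password as "not a placeholder" is an arbitrary choice made here.

```python
# The pattern quoted in the commit message: <BS><LF><CR><DEL>INCORRECT
JUNK = "\b\n\r\x7fINCORRECT"


def is_sshd_placeholder(password):
    """Return True if password is the junk pattern cut or repeated to its length.

    Hypothetical helper: a shorter password must equal a prefix of the
    pattern, a longer one must equal the pattern repeated cyclically.
    """
    if not password:
        return False  # assumption: an empty password is not treated as a placeholder
    return all(ch == JUNK[i % len(JUNK)] for i, ch in enumerate(password))


print(is_sshd_placeholder("\b\n\r\x7fINC"))   # prefix of the pattern
print(is_sshd_placeholder(JUNK + JUNK[:3]))   # pattern repeated past its length
print(is_sshd_placeholder("hunter2"))         # an ordinary password
```

A caller would skip the request to the AAA server whenever the check returns True, which is the behavior the commit message describes as failing just before reaching AAA.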

5068 Commits