sonic-buildimage

Author	SHA1	Message	Date
Prince George	ac1d392d4c	Disable brackted-paste mode off by default (#12285 ) * Disable brackted-paste mode off by default * address review comment	2022-10-06 07:55:09 -07:00
Aryeh Feigin	2c10ebb4fe	Use warm-boot infrastructure for fast-boot (#11594 ) This PR should be merged together with the sonic-utilities PR (sonic-net/sonic-utilities#2286) and sonic-sairedis PR (sonic-net/sonic-sairedis#1100). Use redis contents from dump file in fast-reboot. Improve fast-reboot flow by utilizing the warm-reboot infrastructure. This followes https://github.com/sonic-net/SONiC/blob/master/doc/fast-reboot/Fast-reboot_Flow_Improvements_HLD.md	2022-09-26 09:01:49 -07:00
Zain Budhwani	fd6a1b0ce2	Add events to host and create rsyslog_plugin deb pkg (#12059 ) Why I did it Create rsyslog plugin deb for other containers/host to install Add events for bgp and host events	2022-09-21 09:20:53 -07:00
Stepan Blyshchak	e662008f72	[services] kill container on stop in warm/fast mode (#10510 ) - Why I did it To optimize stop on warm boot. - How I did it Added kill for containers	2022-09-19 19:34:33 +03:00
Volodymyr Boiko	c243af0cce	[bgp][service] Start bgp service after interfaces-config service (#11827 ) - Why I did it interfaces-config service restarts networking service, during the restart loopback interface address is being removed and reassigned back, leaving loopback without an ipv4 address for a while. On SONiC startup and config reload interfaces-config and bgp services start in parallel and sometimes fpmsyncd in bgp attempts bind to loopback while it does not have an address, fails with the log Exception "Cannot assign requested address" had been thrown in daemon and exits with rc 0. root@sonic:/# supervisorctl status fpmsyncd EXITED Jul 20 05:04 AM zebra RUNNING pid 35, uptime 6:15:05 zsocket EXITED Jul 20 05:04 AM docker logs bgp INFO exited: fpmsyncd (exit status 0; expected) With fpmsyncd dead, configured routes do not appear in the database. - How I did it Added ordering dependency on interfaces-config service into bgp.config - How to verify it Itself the issue reproduces quite rarely, but one can gain the time interval between networking down and networking up in interfaces-config.sh like this: diff --git a/files/image_config/interfaces/interfaces-config.sh b/files/image_config/interfaces/interfaces-config.sh index f6aa4147a..87caceeff 100755 --- a/files/image_config/interfaces/interfaces-config.sh +++ b/files/image_config/interfaces/interfaces-config.sh @@ -63,7 +63,11 @@ done # Read sysctl conf files again sysctl -p /etc/sysctl.d/90-dhcp6-systcl.conf -systemctl restart networking +# systemctl restart networking + +systemctl start networking +sleep 10 +systemctl stop networking # Clean-up created files rm -f /tmp/ztp_input.json /tmp/ztp_port_data.json with this change the issue reproduces on every config reload. Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>	2022-09-19 17:25:10 +03:00
lixiaoyuner	a1b50cac41	Make client indentity by AME cert (#11946 ) * Make client indentity by AME cert * Join k8s cluster by ipv6 * Change join test cases * Test case bug fix * Improve read node label func * Configure kubelet and change test cases * For kubernetes version 1.22.2 * Fix undefine issue Signed-off-by: Yun Li <yunli1@microsoft.com>	2022-09-16 13:13:39 +08:00
Maxime Lorrillere	0a7dd50dcb	[Chassis][Voq]Configure midplane network on supervisor (#11725 ) Multi-asic Docker instances are created behind Docker's default bridge which doesn't allow talking to other Docker instances that are in the host network (like database-chassis). On linecards, we configure midplane interfaces to let per-asic docker containers talk to CHASSIS_DB on the supervisor through internal chassis network. On the supervisor we don't need to use chassis internal network, but we still need a similar setup in order to allow fabric containers to talk to database-chassis	2022-09-15 17:23:41 -07:00
Oleksandr Ivantsiv	549bb3d483	[services] Update "WantedBy=" section for tacacs-config.timer. (#11893 ) The timer execution may fail if triggered during a config reload (when the sonic.target is stopped). This might happen in a rare situation if config reload is executed after reboot in a small time slot (for 0 to 30 seconds) before the tacacs-config timer is triggered. To ensure that timer execution will be resumed after a config reload the WantedBy section of the systemd service is updated to describe relation to sonic.target. Signed-off-by: Oleksandr Ivantsiv <oivantsiv@nvidia.com> Signed-off-by: Oleksandr Ivantsiv <oivantsiv@nvidia.com>	2022-09-08 15:16:11 -07:00
Renuka Manavalan	31e750ee0b	Fix PR build failure (#11973 ) Some PR builds fails to find this file. Remove it temporarily until we root cause it	2022-09-06 15:13:05 -07:00
Zain Budhwani	6a54bc439a	Streaming structured events implementation (#11848 ) With this PR in, you flap BGP and use events_tool to see the published events. With telemetry PR #111 in and corresponding submodule update done in buildimage, one could run gnmi_cli to capture BGP flap events.	2022-09-03 07:33:25 -07:00
Ying Xie	a6843927d9	[mux] skip mux operations during warm shutdown (#11937 ) * [mux] skip mux operations during warm shutdown - Enhance write_standby.py script to skip actions during warm shutdown. - Expand the support to BGP service. - MuX support was added by a previous PR. - don't skip action during warm recovery Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2022-09-02 13:50:42 -07:00
Longxiang Lyu	6e878a36da	[mux] Exit to write `standby` state to `active-active` ports (#11821 ) [mux] Exit to write standby state to `active-active` ports Signed-off-by: Longxiang Lyu <lolv@microsoft.com>	2022-08-31 13:10:22 -07:00
lixiaoyuner	8d6431e754	Add k8s master feature (#11637 ) * Add k8s master feature Signed-off-by: Yun Li <yunli1@microsoft.com> * Update kubernetes version mistake and make variable passing clear Signed-off-by: Yun Li <yunli1@microsoft.com> * Add CRI-dockerd package Signed-off-by: Yun Li <yunli1@microsoft.com> * Update version variable passing logic Signed-off-by: Yun Li <yunli1@microsoft.com> * Upgrade the worker kubernetes version Signed-off-by: Yun Li <yunli1@microsoft.com> * Install xml file parse tool Signed-off-by: Yun Li <yunli1@microsoft.com> Signed-off-by: Yun Li <yunli1@microsoft.com>	2022-08-13 23:01:35 +08:00
Sudharsan Dhamal Gopalarathnam	5dc4eb9693	[vs]Preventing ebtables cfg to be applied on vs (#11585 ) *Preventing ebtables rules to be applied on KVM image. The ebtables rules in SONiC are added to prevent ARP as well as L2 forwarding to be blocked in linux kernel since the hardware will take care of the actual L2 forward. However this is not the case with KVM where linux needs to forward even L2 packets	2022-08-04 09:18:00 -07:00
bingwang-ms	dc799356aa	Support different `DSCP_TO_TC_MAP` for T1 in dualtor deployment (#11569 ) * Support different DSCP_TO_TC_MAP for T1 in dualtor deployment	2022-08-01 09:35:34 +08:00
abdosi	a380105461	Enable ARP Update Script for Packet based chassis. (#11465 ) What I did: Following changes done for packet based chassis:- 1> Run arp_update on LC's to resolve static route nexthops over backend port-channel interfaces. 2> On Supervisor make sure arp_update exit gracefully	2022-07-26 16:50:16 -07:00
gregshpit	5df09490dc	Ported Marvell armhf build on amd64 host for debian buster to use cross-comp… (#8035 ) * Ported Marvell armhf build on x86 for debian buster to use cross-compilation instead of qemu emulation Current armhf Sonic build on amd64 host uses qemu emulation. Due to the nature of the emulation it takes a very long time, about 22-24 hours to complete the build. The change I did to reduce the building time by porting Sonic armhf build on amd64 host for Marvell platform for debian buster to use cross-compilation on arm64 host for armhf target. The overall Sonic armhf building time using cross-compilation reduced to about 6 hours. Signed-off-by: marvell <marvell@cpss-build3.marvell.com> * Fixed final Sonic image build with dockers inside * Update Dockerfile.j2 Fixed qemu-user-static:x86_64-aarch64-5.0.0-2 . * Update cross-build-arm-python-reqirements.sh Added support for both armhf and arm64 cross-build platform using $PY_PLAT environment variable. * Update Makefile Added TARGET=<cross-target> for armhf/arm64 cross-compilation. * Reviewer's @qiluo-msft requests done Signed-off-by: marvell <marvell@cpss-build3.marvell.com> * Added new radius/pam patch for arm64 support * Update slave.mk Added missing back tick. * Added libgtest-dev: libgmock-dev: to the buster Dockerfile.j2. Fixed arm perl version to be generic * Added missing armhf/arm64 entries in /etc/apt/sources.list * fix libc-bin core dump issue from xumia:fix-libc-bin-install-issue commit * Removed unnecessary 'apt-get update' from sonic-slave-buster/Dockerfile.j2 * Fixed saiarcot895 reviewer's requests * Fixed README and replaced 'sed/awk' with patches * Fixed ntp build to use openssl * Unuse sonic-slave-buster/cross-build-arm-python-reqirements.sh script (put all prebuilt python packages cross-compilation/install inside Dockerfile.j2). Fixed src/snmpd/Makefile to use -j1 in all cases * Clean armhf cross-compilation build fixes * Ported cross-compilation armhf build to bullseye * Additional change for bullseye * Set CROSS_BUILD_ENVIRON default value n * Removed python2 references * Fixes after merge with the upstream * Deleted unused sonic-slave-buster/cross-build-arm-python-reqirements.sh file * Fixed 2 @saiarcot895 requests * Fixed @saiarcot895 reviewer's requests * Removed use of prebuilt python wheels * Incorporated saiarcot895 CC/CXX and other simplification/generalization changes Signed-off-by: marvell <marvell@cpss-build3.marvell.com> * Fixed saiarcot895 reviewer's additional requests * src/libyang/patch/debian-packaging-files.patch * Removed --no-deps option when installing wheels. Removed unnecessary lazy_object_proxy arm python3 package instalation Co-authored-by: marvell <marvell@cpss-build3.marvell.com> Co-authored-by: marvell <marvell@cpss-build2.marvell.com>	2022-07-21 14:15:16 -07:00
Stephen Sun	8f4a1b7b85	[Mellanox] Support Mellanox-SN4600C-C64 as T1 switch in dual-ToR scenario (#11261 ) - Why I did it Support Mellanox-SN4600C-C64 as T1 switch in dual-ToR scenario This is to port #11032 and #11299 from 202012 to master. Support additional queue and PG in buffer templates, including both traditional and dynamic model Support mapping DSCP 2/6 to lossless traffic in the QoS template. Add macros to generate additional lossless PG in the dynamic model Adjust the order in which the generic/dedicated (with additional lossless queues) macros are checked and called to generate buffer tables in common template buffers_config.j2 Buffer tables are rendered via using macros. Both generic and dedicated macros are defined on our platform. Currently, the generic one is called as long as it is defined, which causes the generic one always being called on our platform. To avoid it, the dedicated macrio is checked and called first and then the generic ones. Support MAP_PFC_PRIORITY_TO_PRIORITY_GROUP on ports with additional lossless queues. On Mellanox-SN4600C-C64, buffer configuration for t1 is calculated as: 40 * 100G downlink ports with 4 lossless PGs/queues, 1 lossy PG, and 3 lossy queues 16 * 100G uplink ports with 2 lossless PGs/queues, 1 lossy PG, and 5 lossy queues Signed-off-by: Stephen Sun <stephens@nvidia.com>	2022-07-20 09:48:15 +03:00
andywongarista	88d0ce5ce8	Add gbsyncd container for broncos (#11154 ) * Add docker-gbsyncd-broncos support * Address review comments * Add socket to gbsyncd * Upgrade gbsyncd-broncos to bullseye	2022-07-18 10:57:27 +08:00
bingwang-ms	52c36cd31f	Add flag to control the generation of `PORT_QOS_MAP\|global` entry (#11448 ) Why I did it This PR is to add a flag to control whether to generate PORT_QOS_MAP\|global entry or not. It's because for some HWSKU, such as BackEndToRRouter and BackEndLeafRouter, there is no DSCP_TO_TC_MAP defined. Hence, if the PORT_QOS_MAP\|global entry is generated, OA will report some error because the DSCP_TO_TC_MAP map AZURE can not be found. Jul 14 00:24:40.286767 str2-7050qx-32s-acs-03 ERR swss#orchagent: :- saiObjectTypeQuery: invalid object id oid:0x7fddb43605d0 Jul 14 00:24:40.286767 str2-7050qx-32s-acs-03 ERR swss#orchagent: :- meta_generic_validation_objlist: SAI_SWITCH_ATTR_QOS_DSCP_TO_TC_MAP:SAI_ATTR_VALUE_TYPE_OBJECT_ID object on list [0] oid 0x7fddb43605d0 is not valid, returned null object id Jul 14 00:24:40.286767 str2-7050qx-32s-acs-03 ERR swss#orchagent: :- applyDscpToTcMapToSwitch: Failed to apply DSCP_TO_TC QoS map to switch rv:-5 Jul 14 00:24:40.286767 str2-7050qx-32s-acs-03 ERR swss#orchagent: :- doTask: Failed to process QOS task, drop it This PR is to address the issue. How I did it Add a flag require_global_dscp_to_tc_map to control whether to generate the PORT_QOS_MAP\|global entry. The default value for require_global_dscp_to_tc_map is true. If the device type is storage backend, the value is changed to false. Then the PORT_QOS_MAP\|global entry is not generated. How to verify it Update the current test_qos_dscp_remapping_render_template to cover storage backend.	2022-07-14 10:19:00 -07:00
Alexander Allen	429254cb2d	[arm] Refactor installer and build to allow arm builds targeted at grub platforms (#11341 ) Refactors the SONiC Installer to support greater flexibility in building for a given architecture and bootloader. #### Why I did it Currently the SONiC installer assumes that if a platform is ARM based that it uses the `uboot` bootloader and uses the `grub` bootloader otherwise. This is not a correct assumption to make as ARM is not strictly tied to uboot and x86 is not strictly tied to grub. #### How I did it To implement this I introduce the following changes: * Remove the different arch folders from the `installer/` directory * Merge the generic components of the ARM and x86 installer into `installer/installer.sh` * Refactor x86 + grub specific functions into `installer/default_platform.conf` * Modify installer to call `default_platform.conf` file and also call `platform/[platform]/patform.conf` file as well to override as needed * Update references to the installer in the `build_image.sh` script * Add `TARGET_BOOTLOADER` variable that is by default `uboot` for ARM devices and `grub` for x86 unless overridden in `platform/[platform]/rules.mk` * Update bootloader logic in `build_debian.sh` to be based on `TARGET_BOOTLOADER` instead of `TARGET_ARCH` and to reference the grub package in a generic manner #### How to verify it This has been tested on a ARM test platform as well as on Mellanox amd64 switches as well to ensure there was no impact. #### Description for the changelog [arm] Refactor installer and build to allow arm builds targeted at grub platforms #### Link to config_db schema for YANG module changes N/A	2022-07-12 15:00:57 -07:00
tjchadaga	849eb4bf32	Changes to persist TSA/B state across reloads (#11257 )	2022-07-12 00:22:48 -07:00
Neetha John	7cd10019b8	Add backend acl template (#11220 ) Why I did it Storage backend has all vlan members tagged. If untagged packets are received on those links, they are accounted as RX_DROPS which can lead to false alarms in monitoring tools. Using this acl to hide these drops. How I did it Created a acl template which will be loaded during minigraph load for backend. This template will allow tagged vlan packets and dropped untagged How to verify it Unit tests Signed-off-by: Neetha John <nejo@microsoft.com>	2022-07-06 10:24:16 -07:00
Stepan Blyshchak	ef8675d7ab	[sonic_debian_extension] install systemd-bootchart (#11047 ) - Why I did it Implemented sonic-net/SONiC#1001 - How I did it Install systemd-bootchart tool and provide default config for it. - How to verify it Run build and verify systemd-bootchart is installed. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2022-07-06 14:03:31 +03:00
judyjoseph	baaf8b085f	Revert "Update include_macsec flag if type is SpineRouter (#11141 )" (#11306 ) This reverts commit `c9f36957db`.	2022-07-01 11:00:30 -07:00
davidpil2002	8b7d069880	Add support for Password Hardening (#10323 ) - Why I did it New security feature for enforcing strong passwords when login or changing passwords of existing users into the switch. - How I did it By using mainly Linux package named pam-cracklib that support the enforcement of user passwords, the daemon named hostcfgd, will support add/modify password policies that enforce and strengthen the user passwords. - How to verify it Manually Verification- 1. Enable the feature, using the new sonic-cli command passw-hardening or manually add the password hardening table like shown in HLD by using redis-cli command 2. Change password policies manually like in step 1. Notes: password hardening CLI can be found in sonic-utilities repo- P.R: Add support for Password Hardening sonic-utilities#2121 code config path: config/plugins/sonic-passwh_yang.py code show path: show/plugins/sonic-passwh_yang.py 3. Create a new user (using adduser command) or modify an existing password by using passwd command in the terminal. And it will now request a strong password instead of default linux policies. Automatic Verification - Unitest: This PR contained unitest that cover: 1. test default init values of the feature in PAM files 2. test all the types of classes policies supported by the feature in PAM files 3. test aging policy configuration in PAM files	2022-06-29 15:34:56 +03:00
bingwang-ms	ac86f71287	Add extra lossy PG profile for ports between T1 and T2 (#11157 ) Signed-off-by: bingwang <wang.bing@microsoft.com> Why I did it This PR brings two changes Add lossy PG profile for PG2 and PG6 on T1 for ports between T1 and T2. After PR Update qos config to clear queues for bounced back traffic #10176 , the DSCP_TO_TC_MAP and TC_TO_PG_MAP is updated when remapping is enable DSCP_TO_TC_MAP Before After Why do this change "2" : "1" "2" : "2" Only change for leaf router to map DSCP 2 to TC 2 as TC 2 will be used for lossless TC "6" : "1" "6" : "6" Only change for leaf router to map DSCP 6 to TC 6 as TC 6 will be used for lossless TC TC_TO_PRIORITY_GROUP_MAP Before After Why do this change "2" : "0" "2" : "2" Only change for leaf router to map TC 2 to PG 2 as PG 2 will be used for lossless PG "6" : "0" "6" : "6" Only change for leaf router to map TC 6 to PG 6 as PG 6 will be used for lossless PG So, we have two new lossy PGs (2 and 6) for the T2 facing ports on T1, and two new lossless PGs (2 and 6) for the T0 facing port on T1. However, there is no lossy PG profile for the T2 facing ports on T1. The lossless PGs for ports between T1 and T0 have been handled by buffermgrd .Therefore, We need to add lossy PG profiles for T2 facing ports on T1. We don't have this issue on T0 because PG 2 and PG 6 are lossless PGs, and there is no lossy traffic mapped to PG 2 and PG 6 Map port level TC7 to PG0 Before the PCBB change, DSCP48 -> TC 6 -> PG 0. After the PCBB change, DSCP48 -> TC 7 -> PG 7 Actually, we can map TC7 to PG0 to save a lossy PG. How I did it Update the qos and buffer template. How to verify it Verified by UT.	2022-06-28 12:50:33 -07:00
judyjoseph	c9f36957db	Update include_macsec flag if type is SpineRouter (#11141 ) Add the support to enable macsec when type is SpineRouter	2022-06-24 10:32:02 -07:00
xumia	fdef1f0342	[Build]: Support to use symbol links for lazy installation targets to reduce the image size (#10923 ) Why I did it Support to use symbol links in platform folder to reduce the image size. The current solution is to copy each lazy installation targets (xxx.deb files) to each of the folders in the platform folder. The size will keep growing when more and more packages added in the platform folder. For cisco-8000 as an example, the size will be up to 2G, while most of them are duplicate packages in the platform folder. How I did it Create a new folder in platform/common, all the deb packages are copied to the folder, any other folders where use the packages are the symbol links to the common folder. Why platform.tar? We have implemented a patch for it, see #10775, but the problem is the the onie use really old unzip version, cannot support the symbol links. The current solution is similar to the PR 10775, but make the platform folder into a tar package, which can be supported by onie. During the installation, the package.tar will be extracted to the original folder and removed.	2022-06-21 13:03:55 +08:00
Stepan Blyshchak	42576d2664	[auto-ts] add memory check (#10433 ) #### Why I did it To support automatic techsupport invokation in case memory usage is too high. #### How I did it Implemented according to https://github.com/Azure/SONiC/pull/939 #### How to verify it UT, manual test on the switch. DEPENDS on https://github.com/Azure/sonic-utilities/pull/2116	2022-06-20 09:39:05 -07:00
bingwang-ms	83f23e26ff	Generate switch level dscp_to_tc_map entry from qos_config template (#11087 ) * Generate switch level dscp_to_tc_map Signed-off-by: bingwang <wang.bing@microsoft.com>	2022-06-17 08:44:30 +08:00
bingwang-ms	1cc602c6af	Add two extra lossless queues for bounced back traffic (#10496 ) Signed-off-by: bingwang <bingwang@microsoft.com> Why I did it This PR is to add two extra lossless queues for bounced back traffic. HLD sonic-net/SONiC#950 SKUs include Arista-7050CX3-32S-C32 Arista-7050CX3-32S-D48C8 Arista-7260CX3-D108C8 Arista-7260CX3-C64 Arista-7260CX3-Q64 How I did it Update the buffers.json.j2 template and buffers_config.j2 template to generate new BUFFER_QUEUE table. For T1 devices, queue 2 and queue 6 are set as lossless queues on T0 facing ports. For T0 devices, queue 2 and queue 6 are set as lossless queues on T1 facing ports. Queue 7 is added as a new lossy queue as DSCP 48 is mapped to TC 7, and then mapped into Queue 7 How to verify it Verified by UT Verified by coping the new template and generate buffer config with sonic-cfggen	2022-06-02 13:03:27 -07:00
bingwang-ms	0c9bbee735	Update qos template to support SYSTEM_DEFAULT table (#10936 ) * Update qos template to support SYSTEM_DEFAULT table Signed-off-by: bingwang <wang.bing@microsoft.com>	2022-06-02 21:48:57 +08:00
Hua Liu	96954f0134	[swsscommon] Add c++ version sonic-db-cli from sonic-swss-common (#10825 ) #### Why I did it Fix sonic-db-cli high CPU usage on SONiC startup issue: https://github.com/Azure/sonic-buildimage/issues/10218 ETA of this issue will be 2022/05/31 #### How I did it Re-write sonic-cli with c++ in sonic-swss-common: https://github.com/Azure/sonic-swss-common/pull/607 Modify swss-common rules and slave.mk to install c++ version sonic-db-cli. #### How to verify it Pass all E2E test scenario. #### Which release branch to backport (provide reason below if selected) <!-- - Note we only backport fixes to a release branch, not features! - Please also provide a reason for the backporting below. - e.g. - [x] 202006 --> - [ ] 201811 - [ ] 201911 - [ ] 202006 - [ ] 202012 - [ ] 202106 - [ ] 202111 #### Description for the changelog Build and install c++ version sonic-db-cli from swss-common. #### Link to config_db schema for YANG module changes <!-- Provide a link to config_db schema for the table for which YANG model is defined Link should point to correct section on https://github.com/Azure/SONiC/wiki/Configuration. --> #### A picture of a cute animal (not mandatory but encouraged)	2022-06-01 08:05:53 +08:00
davidpil2002	ab0930313b	[YANG] Add support for Password Hardening (#10322 ) - Why I did it Yang Model about password hardening feature, the sonic CLI of this feature was autogenerated from this Yang model - How I did it Create new Yang model in src/sonic-yang-models/yang-models/sonic-passwh.yang. - How to verify it There are unitests(yang test) in this P.R covering all the passwords policies with good and bad values cases. Or is possible manually using the config/show password commands that were autogenerated from this Yang model. (this CLI code added in sonic-utilities)	2022-05-29 13:54:51 +03:00
xumia	f0dfd398a6	Revert "Reduce image size for lazy installation packages (#10775 )" (#10916 ) This reverts commit `15cf9b0d70`. Why I did it Revert the PR #10775, for it has impact on onie installation. It is caused by the symbol links not supported in some of the onie unzip. We will enable after fixing the issue, see #10914	2022-05-26 09:39:48 +08:00
abdosi	0285bfe42e	[chassis] Fix issues regarding database service failure handling and mid-plane connectivity for namespace. (#10500 ) What/Why I did: Issue1: By setting up of ipvlan interface in interface-config.sh we are not tolerant to failures. Reason being interface-config.service is one-shot and do not have restart capability. Scenario: For example if let's say database service goes in fail state then interface-services also gets failed because of dependency check but later database service gets restart but interface service will remain in stuck state and the ipvlan interface nevers get created. Solution: Moved all the logic in database service from interface-config service which looks more align logically also since the namespace is created here and all the network setting (sysctl) are happening here.With this if database starts we recreate the interface. Issue 2: Use of IPVLAN vs MACVLAN Currently we are using ipvlan mode. However above failure scenario is not handle correctly by ipvlan mode. Once the ipvlan interface is created and ip address assign to it and if we restart interface-config or database (new PR) service Linux Kernel gives error "Error: Address already assigned to an ipvlan device." based on this:https://github.com/torvalds/linux/blob/master/drivers/net/ipvlan/ipvlan_main.c#L978Reason being if we do not do cleanup of ip address assignment (need to be unique for IPVLAN) it remains in Kernel Database and never goes to free pool even though namespace is deleted. Solution: Considering this hard dependency of unique ip macvlan mode is better for us and since everything is managed by Linux Kernel and no dependency for on user configured IP address. Issue3: Namespace database Service do not check reachability to Supervisor Redis Chassis Server. Currently there is no explicit check as we never do Redis PING from namespace to Supervisor Redis Chassis Server. With this check it's possible we will start database and all other docker even though there is no connectivity and will hit the error/failure late in cycle Solution: Added explicit PING from namespace that will check this reachability. Issue 4:flushdb give exception when trying to accces Chassis Server DB over Unix Sokcet. Solution: Handle gracefully via try..except and log the message.	2022-05-24 16:54:12 -07:00
Senthil Kumar Guruswamy	f37dd770cd	System Ready (#10479 ) Why I did it At present, there is no mechanism in an event driven model to know that the system is up with all the essential sonic services and also, all the docker apps are ready along with port ready status to start the network traffic. With the asynchronous architecture of SONiC, we will not be able to verify if the config has been applied all the way down to the HW. But we can get the closest up status of each app and arrive at the system readiness. How I did it A new python based system monitor tool is introduced under system-health framework to monitor all the essential system host services including docker wrapper services on an event based model and declare the system is ready. This framework gives provision for docker apps to notify its closest up status. CLIs are provided to fetch the current system status and also service running status and its app ready status along with failure reason if any. How to verify it "show system-health sysready-status" click CLI Syslogs for system ready	2022-05-20 13:25:11 -07:00
Marty Y. Lok	23f9126f59	[VoQ][config] Multiasic Supervisor card fails to load config_db#.json in chassis when system is reboot (#10106 ) Supervisor card fails to load config_db#.json in chassis when system reboot. This is an intermittent issue, fixes #10105	2022-05-09 11:06:11 -07:00
xumia	15cf9b0d70	Reduce image size for lazy installation packages (#10775 ) Why I did it The image size is too large, when there are multiple lazy packages and multiple platforms. It is not necessary to keep the lazy installation packages in multiple copies. For cisco image, the image size will reduce from 3.5G to 1.7G. How I did it Use symbol links to only keep one package for each of the lazy package. Make a new folder fsroot/platform/common Copy the lazy packages into the folder. When using a package in each of the platform, such as x86_64-grub, x86_64-8800_rp-r0, x86_64-8201_on-r0, etc, only make a symbol link to the package in the common folder.	2022-05-09 08:26:09 -07:00
xumia	8ec8900d31	Support SONiC OpenSSL FIPS 140-3 based on SymCrypt engine (#9573 ) Why I did it Support OpenSSL FIPS 140-3, see design doc: https://github.com/Azure/SONiC/blob/master/doc/fips/SONiC-OpenSSL-FIPS-140-3.md. How I did it Install the fips packages. To build the fips packages, see https://github.com/Azure/sonic-fips Azure pipelines: https://dev.azure.com/mssonic/build/_build?definitionId=412 How to verify it Validate the SymCrypt engine: admin@sonic:~$ dpkg-query -W \| grep openssl openssl 1.1.1k-1+deb11u1+fips symcrypt-openssl 0.1 admin@sonic:~$ openssl engine -v \| grep -i symcrypt (symcrypt) SCOSSL (SymCrypt engine for OpenSSL) admin@sonic:~$	2022-05-06 07:21:30 +08:00
shlomibitton	4ec3af86af	[Fastboot] Delay PMON service for better fastboot performance (#10567 ) - Why I did it Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time. This parallel execution consume CPU time and the duration of create_switch is longer than it should be. Following this finding, and the motivation to ensure these services will not interfere in the future, PMON is delayed in 90 seconds until the system finish the init flow after fastboot. - How I did it Add a timer for PMON service. Exclude for MLNX platform the start trigger of PMON when SYNCD starts in case of fastboot. Copy the timer file to the host bin image. - How to verify it Run fast-reboot on MLNX platform and observe faster create_switch execution time.	2022-05-02 10:44:17 +03:00
shlomibitton	1d84e0d7df	[Fastboot] Delay LLDP service for better fastboot performance (#10568 ) - Why I did it Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time. This parallel execution consume CPU time and the duration of create_switch is longer than it should be. Following this finding, and the motivation to ensure these services will not interfere in the future, LLDP is delayed in 90 seconds until the system finish the init flow after fastboot. - How I did it Add a timer for LLDP service. Copy the timer file to the host bin image. - How to verify it Run fast-reboot on MLNX platform and observe faster create_switch execution time. This PR is dependent on PR: #10567	2022-04-28 10:35:14 +03:00
ganglv	9d7387a18e	[sonic-host-services]: Fix import and invalid path (#10660 ) Why I did it Can not start sonic-hostservice How I did it Install python3-dbus and systemd-python, and replace invalid path How to verify it Start the service with below commands: sudo systemctl start sonic-hostservice sudo systemctl status sonic-hostservice Signed-off-by: Gang Lv ganglv@microsoft.com	2022-04-27 07:14:51 +08:00
Saikrishna Arcot	64187a1b15	Remove SSH host keys after installing the custom version of sshd (#10633 ) * Remove SSH host keys after installing the custom version of sshd Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com> * Use an override for for sshd instead of overwriting the service file Don't overwrite upstream's .service file, and instead use an override file for making sure the host key(s) are generated. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2022-04-25 10:38:52 -07:00
bingwang-ms	3fc3259a35	Define qos map `AZURE_TUNNEL` for QoS remapping of tunnel traffic (#10565 ) * Add AZURE_TUNNEL map Signed-off-by: bingwang <wang.bing@microsoft.com>	2022-04-25 15:06:10 +08:00
kellyyeh	2a516a7763	[dhcp_relay] Enable dhcp_relay on EPMS, MgmtTsTor, MgmtToRRouter and BackEndToRRouter (#10474 )	2022-04-15 18:01:24 -07:00
Yakiv Huryk	d9117d9411	[Mellanox][asan] add address sanitizer support for syncd (#10266 ) Why I did it To support address sanitizer for Mellanox syncd How I did it /var/log/asan is mapped for syncd container (the same as for swss) container stop() has a timeout (60s) for syncd (the same as for swss) This is so libasan has enough time to generate a report. added ASAN's log path to Mellanox syncd supervisord.conf added "asan: yes" to sonic_version.yml How to verify it Added artificial memory leaks Compiled with ENABLE_ASAN=y Installed the image on DUT Rebooted the DUT Verified that /var/log/asan/syncd-asan.log contains the leaks Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>	2022-04-14 15:00:32 -07:00
byu343	f7a6553933	[docker-syncd]: Add optional shm-size to syncd container (#10516 ) Why I did it In the bringup of tomahawk4/trident4, we realized that such chips need a larger size of /dev/shm in syncd container, so we added the option --shm-size to the docker create for syncd. The default value for shm-size is 64m; after this change, people can add SYNCD_SHM_SIZE=128m to platform_env.conf to change it to 128m. How to verify it We verified that after this change, 1) on existing platforms without platform_env.conf, the size of /dev/shm in syncd container (df -h \| grep shm) is still the default 64M; 2) after we add SYNCD_SHM_SIZE=128m to platform_env.conf, /dev/shm in syncd becomes 128M.	2022-04-09 10:47:18 -07:00
bingwang-ms	b9dd1df372	Update qos config to clear queues for bounced back traffic (#10176 ) * Update qos config to clear queues for bounced back traffic Signed-off-by: bingwang <bingwang@microsoft.com>	2022-04-05 22:32:25 +08:00

1 2 3 4 5 ...

530 Commits