sonic-buildimage

Author	SHA1	Message	Date
bingwang-ms	e159998657	[202012][cherry-pick] Add two extra lossless queues for bounced back traffic (#10715 ) * Add extra lossless queues Signed-off-by: bingwang <bingwang@microsoft.com>	2022-06-04 19:25:02 +08:00
bingwang-ms	7ec6a60230	[cherry-pick] [202012] Update qos config to clear queues for bounced back traffic (#10608 ) * Update qos config to clear queues for bounced back traffic Signed-off-by: bingwang <wang.bing@microsoft.com>	2022-06-02 16:29:25 +08:00
xumia	06addae853	Revert "Reduce image size for lazy installation packages (#10775 )" (#10916 ) This reverts commit `15cf9b0d70`. Why I did it Revert PR #10775 because it affects ONIE installation: some ONIE unzip implementations do not support symbolic links. We will re-enable the change after fixing the issue, see #10914	2022-05-27 17:00:50 +00:00
shlomibitton	c71c91e2b0	[202012] [Fastboot] Delay PMON service for better fastboot performance (#10745 ) #### Why I did it Profiling the system state on init after fast-reboot shows that a few Python scripts run at the same time while the create_switch function executes. This parallel execution consumes CPU time, so create_switch takes longer than it should. Following this finding, and to ensure these services will not interfere in the future, PMON is delayed by 90 seconds until the system finishes the init flow after fastboot. #### How I did it Add a timer for the PMON service. On the MLNX platform, exclude the start trigger of PMON when SYNCD starts in the fastboot case. Copy the timer file to the host bin image. #### How to verify it Run fast-reboot on MLNX platform and observe faster create_switch execution time.	2022-05-15 23:31:32 -07:00
shlomibitton	bca8a244c6	[202012] [Fastboot] Delay LLDP service for better fastboot performance (#10568 ) (#10744 ) This PR backports the fix in #10568. This PR depends on PR #10745. - Why I did it Profiling the system state on init after fast-reboot shows that a few Python scripts run at the same time while the create_switch function executes. This parallel execution consumes CPU time, so create_switch takes longer than it should. Following this finding, and to ensure these services will not interfere in the future, LLDP is delayed by 90 seconds until the system finishes the init flow after fastboot. - How I did it Add a timer for the LLDP service. Copy the timer file to the host bin image. - How to verify it Run fast-reboot on MLNX platform and observe faster create_switch execution time.	2022-05-15 15:05:29 +03:00
xumia	951d93e362	Reduce image size for lazy installation packages (#10775 ) Why I did it The image size is too large when there are multiple lazy packages and multiple platforms. It is not necessary to keep multiple copies of the lazy installation packages. For the cisco image, the image size is reduced from 3.5G to 1.7G. How I did it Use symbolic links to keep only one copy of each lazy package. Make a new folder fsroot/platform/common. Copy the lazy packages into the folder. When a platform such as x86_64-grub, x86_64-8800_rp-r0, x86_64-8201_on-r0, etc. uses a package, only make a symbolic link to the package in the common folder.	2022-05-10 06:44:40 +00:00
Stepan Blyshchak	fa1e364f54	[services] kill container on stop in warm/fast mode (#10511 ) To optimize stop on warm boot, added kill for containers Use service "kill" in the shutdown path for fast and warm reboot. For all other reload methods, service "stop" is used. This is done to save time in shutdown path, and to overall improve the time spent in warm and fast reload. How - Use service_mgmt.sh to trigger common logic to initiate kill (fast/warm) or stop (cold) for database.sh, radv.sh, snmp.sh, telemetry.sh, mgmt-framework.sh Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>, Vaibhav H D <vaibhav.dixit@microsoft.com>	2022-04-18 14:27:48 -07:00
Saikrishna Arcot	aafb3d00e2	Start haveged before systemd-random-seed (#10328 ) The haveged service file in Debian Buster specifies that haveged should start after systemd-random-seed starts (this was removed in Bullseye after systemd changes caused a bootloop). This is a bit counterproductive, since haveged is meant to be used in environments with minimal sources of entropy, but one of the checks that systemd-random-seed does is to verify that entropy is present. Therefore, override the default .service file for haveged that moves systemd-random-seed to the Before list, allowing it to start before systemd-random-seed checks the system entropy level. (systemd doesn't allow removing items from dependency/ordering entries such as After= and Before=, so the entire .service file has to be overwritten.) Note that despite this, haveged takes up to two seconds to actually start working, so systemd-random-seed may still block for about two seconds. However, this still allows other work (such as running rc.local) to proceed a bit sooner. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2022-03-24 14:28:42 -07:00
xumia	67312ff635	[Build]: Use one debian mirror config (#10281 ) Why I did it Use one debian mirror config. The empty config in https://github.com/Azure/sonic-buildimage/blob/master/files/image_config/apt/sources.list overrides the file https://github.com/Azure/sonic-buildimage/blob/master/files/apt/sources.list.amd64 (armhf/arm64), it does not make sense. All the content in files/image_config/apt is no use, any one wants to add mirror config, please add in files/apt. How I did it Remove files/image_config/apt and the reference.	2022-03-21 17:04:19 +08:00
xumia	413ee3e219	[Build]: Fix /proc not mounted issue (#10164 ) (#10256 ) [Build]: Fix /proc not mounted issue	2022-03-19 22:19:06 +08:00
xumia	a8d844c83d	[build]: Fix marvell-armhf build hung issue (#10156 ) The marvell-armhf build hangs: it does not exit even after waiting for a long time. It is caused by the process /etc/entropy.py, which is started by the postinst script in target/debs/buster/sonic-platform-nokia-7215_1.0_armhf.deb $ cat postinst sh /usr/sbin/nokia-7215_plt_setup.sh ... $ cat usr/sbin/nokia-7215_plt_setup.sh | tail python /etc/entropy.py & $ cat etc/entropy.py if path.exists("/proc/sys/kernel/random/entropy_avail"): while 1: while avail() < 2048: with open('/dev/urandom', 'rb') as urnd, open("/dev/random", mode='wb') as rnd: d = urnd.read(512) t = struct.pack('ii', 4 * len(d), len(d)) + d fcntl.ioctl(rnd, RNDADDENTROPY, t) time.sleep(30) This is a workaround for the build issue; the debian package needs to be fixed, and then this change should be reverted.	2022-03-07 08:00:56 -08:00
vmittal-msft	304ec5b0cd	Updated traffic scheduler settings for HWSKUs : DellEMC-Z9332f-O32 & DellEMC-Z9332f-M-O16C64 (#9927 )	2022-02-15 16:15:20 -08:00
Lawrence Lee	b3a3aa0c38	[mux]: Fix `mark_dhcp_packet` (#9373 ) - Consolidate the two [Service] sections by moving the ExecStartPre line for mark_dhcp_packet.py to the first section and removing the second. - Make the mark_dhcp_packet.py file executable - Also clean up mark_dhcp_packet.py - Remove unused imports - Fix spacing and line lengths to conform to PEP8 Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-12-01 02:28:56 +00:00
Stephen Sun	fafd5327bd	[Reclaim buffer] Common infrastructure update for reclaiming buffer (#9133 ) - Why I did it This is to update the common sonic-buildimage infra for reclaiming buffer. - How I did it Render zero_profiles.j2 to zero_profiles.json for vendors that support reclaiming buffer The zero profiles will be referenced in PR [Reclaim buffer] Reclaim unused buffers by applying zero buffer profiles #8768 on Mellanox platforms and there will be test cases to verify the behavior there. Rendering is done here for passing azure pipeline. Load zero_profiles.json when the dynamic buffer manager starts Generate inactive port list to reclaim buffer Signed-off-by: Stephen Sun <stephens@nvidia.com>	2021-12-01 02:28:46 +00:00
trzhang-msft	86fa5eede2	Add service mark_dhcp_packet to mux container (#9015 ) - add a new service "mark_dhcp_packet" to mux container - apply packet marks on a per-interface basis in ebtables - write packet marks to "DHCP_PACKET_MARK" table in state_db	2021-11-15 21:36:29 +00:00
Lawrence Lee	b027e87ffb	[mux.service]: Remove pmon dependency (#9211 ) Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-11-11 02:56:27 +00:00
Lawrence Lee	57ad50cfd9	Merged PR 4559560: [bgp]: Switch to standby if BGP container exits [bgp]: Switch mux to standby if BGP container exits Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-11-10 18:54:33 -08:00
Lawrence Lee	77378b4364	[mux]: Call write_standby from host only Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-11-10 18:54:33 -08:00
Lawrence Lee	25712c712e	[mux]: Make write_standby available on host Signed-off-by: Lawrence Lee <lawlee@microsoft.com> [write_standby]: Cleanup and fix build Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-11-10 18:54:33 -08:00
Tamer Ahmed	18d1f65339	Merged PR 4813977: [mux] Update Service Install With SONiC Target [mux] Update Service Install With SONiC Target A recent PR grouped all SONiC services into sonic.target. The install section of mux.service was not updated, and this causes delays when using config reload because the service failed state is not being reset. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2021-11-10 18:54:33 -08:00
Lawrence Lee	70fbd6826c	Merged PR 4366316: [mux.service]: Bind to sonic.target [mux.service]: Bind to sonic.target Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-11-10 18:54:33 -08:00
Tamer Ahmed	b42aef68f3	Merged PR 4234524: [mux] Start Mux on Only Dual-ToR Platform [mux] Start Mux on Only Dual-ToR Platform The mux docker depends on the presence of mux cable hardware and is supposed to run only on Gemini ToRs. This PR changes the mux feature config in order to enable the mux docker based on device configuration. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2021-11-10 18:54:33 -08:00
Tamer Ahmed	b8f70f8986	Merged PR 3845699: [linkmgrd]: Introduce MUX cable linkmgrd Linkmgrd monitors link status, mux status, and link state. If the link becomes unhealthy, linkmgrd will trigger a mux switchover on a standby ToR, ensuring uninterrupted service to servers/blades. This PR is the initial implementation of linkmgrd. Also, the docker-mux container holds packages related to maintaining and managing the mux cable. It currently runs the linkmgrd binary, which monitors and switches the mux if needed. This PR also introduces mux-container and starts linkmgrd at startup when the build is configured with INCLUDE_MUX=y Edit: linkmgrd PR will follow. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com> Related work items: #2315, #3146150	2021-11-10 18:54:33 -08:00
tjchadaga	9a1b1bc44e	Fix for additional intf flap during fast-reboot (#9166 )	2021-11-09 23:20:06 +00:00
Vaibhav Hemant Dixit	636870d86f	Save DB dump after warm/fast reboot (#8803 ) As a part of warmboot, redis database is dumped: `c97fe546e5/scripts/fast-reboot (L269)` However, this dump file is deleted, after it is loaded back into db post reboot. The DB dump can be useful for debugging purpose, hence taking a backup of it can be useful. Instead of deleting the dump, rename and keep the dump.	2021-09-27 02:29:12 +00:00
Stephen Sun	d599450052	Use predefined macro as vendor information (#8361 ) #### Why I did it Use a predefined variable to get vendor information when the swss docker container is created #### How I did it Use `{{ sonic_asic_platform }}` instead of `$SONIC_CFGGEN -y /etc/sonic/sonic_version.yml -v asic_type` #### How to verify it Manually test.	2021-08-16 07:51:01 +00:00
Sudharsan Dhamal Gopalarathnam	ba2284c4c0	Grouping delayed services under a target for config reload checks (#7846 ) #### Why I did it Create a target for delayed service timers. A few services in SONiC are delayed to speed up the bring-up of the system and essential services. However, there is no way to track when they start. This is a problem when executing config reload, as config reload expects all services to be up. Hence all the timers that trigger the delayed services are grouped under one target so that they can be tracked in the 'config reload' command #### How I did it Created delay.target service and added a dependency on the delayed targets.	2021-08-16 07:50:56 +00:00
Longxiang Lyu	25f53289eb	[swss][arp_update] Send ipv6 pings over vlan sub interfaces (#8363 ) #### Why I did it * `arp_update` fails to ping those neighbors over vlan sub interfaces. #### How I did it * modify `arp_update_vars.j2` to get vlan sub interfaces with ipv6 addresses assigned. * modify `arp_update` to send ipv6 pings over those retrieved vlan sub interfaces. Signed-off-by: Longxiang Lyu <lolv@microsoft.com>	2021-08-07 12:43:51 +00:00
Guohan Lu	db8cc247e0	[build]: Fix docker pull on armhf platform armhf build uses native dockerd Signed-off-by: Guohan Lu <lguohan@gmail.com>	2021-08-06 23:35:25 -07:00
VenkatCisco	cb8ff6dba1	[baseimage]: add j2cli to sonic_debian_extension.j2 (#8019 ) j2cli provides access to jinja library. cisco platform.py requires j2cli to handle jinja template configuration files.	2021-08-05 15:22:57 +00:00
vdahiya12	5e594043ce	[pmon] create and mount firmware directory on PMON for firmware upgrade support on muxcable (#8283 ) This PR creates a directory firmware on the HOST with the path /usr/share/sonic/firmware, as well as this is mounted on PMON container with the same path /usr/share/sonic/firmware. This is required for firmware upgrade support for muxcable as currently by design all Y-Cable API's are called by xcvrd. As such if CLI has to transfer a file to PMON we need to mount a directory from host to PMON just for getting the firmware files. Hence we require this change. Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>	2021-08-05 15:22:41 +00:00
Renuka Manavalan	91f611157a	cherry-pick PR #8158 & PR #8205 into 202012 (#8235 )	2021-07-20 20:52:33 -07:00
Kebo Liu	86d64d2fef	mount 'mellanox' folder only instead of create each sub folder (#7830 ) #### Why I did it Following the discussion in another PR https://github.com/Azure/sonic-buildimage/pull/7708#discussion_r642933510 , since there will be multi subfolders under /var/log/mellanox, so we agreed to only mount this folder and the subfolders will be created afterward on demand. #### How I did it during the syncd docker creation, only mount folder /var/log/mellanox #### How to verify it build an Mellanox image and verify the related folder on the host and docker side.	2021-07-13 11:36:56 +00:00
Guohan Lu	d3e2983188	Revert "[Kubernetes]: The kube server could be used as http-proxy for docker (#7469 )" This reverts commit `e851a42db7`.	2021-07-01 18:41:21 -07:00
Renuka Manavalan	e851a42db7	[Kubernetes]: The kube server could be used as http-proxy for docker (#7469 ) Why I did it The SONiC switches get their docker images from a local repo, populated during install with container images pre-built into SONiC FW. With the introduction of kubernetes, new docker images available in a remote repo could be deployed. This requires dockerd to be able to pull images from the remote repo. Depending on the switch network domain & config, it may or may not be able to reach the remote repo. In the case where the remote repo is unreachable, we could potentially make the Kubernetes server also act as http-proxy. How I did it When the admin explicitly enables it, the kubernetes-server can be configured as docker-proxy. But any update to docker-proxy has to be via a service-conf file environment variable, implying a "service restart docker" is required. But a restart of dockerd is very expensive, as it would restart all dockers, including the database docker. To avoid a dockerd restart, pre-configure an http_proxy using an unused IP. When the k8s server is enabled to act as http-proxy, an IP table entry is created to direct all traffic for the configured-unused-proxy-ip to the kubernetes-master IP. This way any update to the Kubernetes master config is just a manipulation of IPTables, which is transparent to all modules, until dockerd needs to download from the remote repo. How to verify it Configure a switch such that image repo is unreachable Pre-configure dockerd with http_proxy.conf using an unused IP (e.g. 172.16.1.1) Update ctrmgrd.service to invoke ctrmgrd.py with "-p" option. Configure a k8s server, and deploy an image for feature with set_owner="kube" Check if switch could successfully download the image or not.	2021-06-17 07:09:50 +00:00
yozhao101	fb2c995f53	[202012][Monit] Deprecate the feature of monitoring the critical processes by Monit (#7823 ) Signed-off-by: Yong Zhao yozhao@microsoft.com Why I did it Currently we leveraged the Supervisor to monitor the running status of critical processes in each container and it is more reliable and flexible than doing the monitoring by Monit. So we removed the functionality of monitoring the critical processes by Monit. How I did it I removed the script process_checker and corresponding Monit configuration entries of critical processes. How to verify it I verified this on the device str-7260cx3-acs-1.	2021-06-09 09:04:22 -07:00
Renuka Manavalan	32e5137ab7	Add service to restore TACACS from old config (#7560 ) Why I did it In upgrade scenarios where config_db.json is not carried forward to the new image, the device could be left w/o TACACS credentials. Added a service to trigger 5 minutes after boot and restore TACACS, if /etc/sonic/old_config/tacacs.json is present. How I did it By adding a service that fires 5 mins after boot. This service applies tacacs if available. How to verify it Upgrade and watch the status of tacacs.timer & tacacs.service You may create /etc/sonic/old_config/tacacs.json with updated credentials (within 5 mins after boot) and see that it appears in config & is persisted too. Which release branch to backport (provide reason below if selected) 201911 202006 202012	2021-06-07 06:02:32 +00:00
yozhao101	3af05fdffe	[Monit] Restart telemetry container if memory usage is beyond the threshold (#7645 ) Signed-off-by: Yong Zhao yozhao@microsoft.com Why I did it This PR aims to monitor the memory usage of streaming telemetry container and restart streaming telemetry container if memory usage is larger than the pre-defined threshold. How I did it I borrowed the system tool Monit to run a script memory_checker which will periodically check the memory usage of streaming telemetry container. If the memory usage of telemetry container is larger than the pre-defined threshold for 10 times during 20 cycles, then an alerting message will be written into syslog and at the same time Monit will run the script restart_service to restart the streaming telemetry container. How to verify it I verified this implementation on device str-7260cx3-acs-1.	2021-05-31 04:38:18 +00:00
Lawrence Lee	cb3a9eec58	[swss.service]: Remove ordering with pmon (#7614 ) Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-05-27 22:29:35 +00:00
Neetha John	cb930c30cd	[qos]: modify dot1p to tc mapping (#7661 ) Map priority 0 to TC 1 and priority 1 to TC 0 Send traffic on priority 0 and 1 and verified that it gets mapped correctly in hw Signed-off-by: Neetha John <nejo@microsoft.com>	2021-05-24 22:25:47 +00:00
Nazarii Hnydyn	0e970582c1	[swss_vars]: Add 'resource_type' attribute. (#7188 ) Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>	2021-05-03 10:38:11 -07:00
guxianghong	a0fde3a626	[arm] support compile sonic arm image on arm server (#7285 ) - Support compile sonic arm image on arm server. If arm image compiling is executed on arm server instead of using qemu mode on x86 server, compile time can be saved significantly. - Add kernel argument systemd.unified_cgroup_hierarchy=0 for upgrade systemd to version 247, according to #7228 - rename multiarch docker to sonic-slave-${distro}-march-${arch} Co-authored-by: Xianghong Gu <xgu@centecnetworks.com> Co-authored-by: Shi Lei <shil@centecnetworks.com>	2021-05-02 08:11:56 -07:00
Stepan Blyshchak	ae574ab000	[systemd] disable default systemd udev rules for interfaces (#7369 ) Fix #7364 99-default.link - was always in SONiC, but previous systemd (<247) had an issue and it did not work due to issue systemd/systemd#3374. Now systemd 247 works. However, such policy overrides teamd provided mac address which causes teamd netdev to use a random mac address. Therefore, needs to be disabled. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2021-05-01 19:43:41 -07:00
Renuka Manavalan	a4d81f3c19	Copy dummy flannel.conf to get around absence of CNI Network (#6985 ) Why I did it We skip install of CNI plugin, as we don't need. But this leaves node in "not ready" state, upon joining master. To fix, we copy this dummy .conf file in /etc/cni/net.d How I did it Keep this file in /usr/share/sonic/templates and copy to /etc/cni/net.d upon joining k8s master. How to verify it Upon configuring master-IP and enable join, watch node join and move to ready state. You may verify using kubectl get nodes command	2021-03-10 09:32:49 -08:00
Sujin Kang	15aed52ef2	[pcie.yaml] Move pcie configuration file path to platform directory (#6475 ) - Why I did it The pcie configuration file location is under plugin directory not under platform directory. #6437 - How I did it Move all pcie.yaml configuration file from plugin to platform directory. Remove unnecessary timer to start pcie-check.service Move pcie-check.service to sonic-host-services - How to verify it Verify on the device	2021-03-04 21:23:05 +00:00
Stepan Blyshchak	7fb5a72d23	[services] introduce sonic.target (#5705 ) - Why I did it Group all SONiC services together and able to manage them together. Will be used in config reload command as much simpler and generic way to restart services. - How I did it Add services to sonic.target - How to verify it Together with Azure/sonic-utilities#1199 config reload -y Signed-off-by: Stepan Blyshchak <stepanb@nvidia.com>	2021-03-04 21:23:05 +00:00
dflynn-Nokia	e3ab6b0494	[armhf build] Fix azure-storage dependency on cryptography package (#6780 ) Fix marvell-armhf build break The azure-storage package depends on the cryptography package. Newer versions of cryptography require the rust compiler, the correct version for which is not readily available in buster. Hence we pre-install an older version here to satisfy the azure-storage dependency. Note: This is not a problem for other architectures as pre-built versions of cryptography are available for those. This sequence can be removed after upgrading to debian bullseye.	2021-03-01 09:40:00 -08:00
SuvarnaMeenakshi	b6aaeb979e	[multi_asic][vs]: Add dependency in teamd service to start after topology service(#6594 ) [multi_asic][vs]: Add dependency in teamd service to start after topology service. - Why I did it In multi-asic VS, topology service is run after database service to set up the internal asic topology. swss and syncd have a dependency to start after topology service is run so that the interfaces are moved to the right namespace and created in the right namespace. In case of multi-asic vs, during the initial boot up, when there is no configuration added, teamd service starts and swss/syncd do not start as topology service does not start. Upon loading configuration using config_db or minigraph, swss and syncd start up, but teamd is not restarted as swss is not stopped and started. This causes teamd to be in a bad state and requires a reload of config. - How I did it Add dependency in teamd service to start after topology service is completed. - How to verify it No change in single asic vs or platform. No change in multi-asic regular image. Change only in multi-asic VS. Bring up a multi-asic VS image without any configuration, teamd service will fail to start due to dependency failure. Load minigraph, start topology service, load configuration, ensure all services come up. Signed-off-by: SuvarnaMeenakshi <sumeenak@microsoft.com>	2021-02-23 23:56:01 +00:00
Joe LeVeque	d7517a704c	[PDDF] Build and install Python 3 package (#6286 ) - Make PDDF code compliant with both Python 2 and Python 3 - Align code with PEP8 standards using autopep8 - Build and install both Python 2 and Python 3 PDDF packages	2021-02-23 23:56:01 +00:00
Lawrence Lee	e0efbc1e14	[swss]: Clear MUX-related state DB tables on start (#6759 ) * Add MUX_CABLE_TABLE to set of tables to clear on SWSS start, which will clear HW_MUX_CABLE_TABLE and MUX_CABLE_TABLE * Order swss to start before pmon to ensure that DBs are cleared before xcvrd (running inside pmon) starts and re-populates the tables Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-02-16 15:33:03 -08:00
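
Commit 3af05fdffe describes restarting the telemetry container when its memory usage exceeds a threshold "for 10 times during 20 cycles". The sketch below only illustrates that sliding-window rule in plain Python; it is not the memory_checker script or Monit itself, and the class name, method name, and threshold value are hypothetical.

```python
from collections import deque


class ThresholdWindow:
    """Hypothetical helper: tracks whether a limit was exceeded
    at least `times` times within the last `cycles` observations."""

    def __init__(self, threshold_bytes, times=10, cycles=20):
        self.threshold_bytes = threshold_bytes
        self.times = times
        # Only the most recent `cycles` results are kept.
        self.window = deque(maxlen=cycles)

    def observe(self, usage_bytes):
        """Record one check cycle; return True when a restart would be due."""
        self.window.append(usage_bytes > self.threshold_bytes)
        return sum(self.window) >= self.times


# Usage: the threshold here is an arbitrary illustrative value.
checker = ThresholdWindow(threshold_bytes=100)
results = [checker.observe(150) for _ in range(10)]
```

With the defaults, the first nine over-threshold cycles return False and the tenth returns True, which is the point at which the commit message says Monit would log an alert and run restart_service.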

417 Commits
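
Commit 951d93e362 describes keeping one copy of each lazy package under fsroot/platform/common and symlinking it from each platform folder. A minimal Python sketch of that layout follows, assuming a POSIX filesystem; the function name and the example package file are hypothetical, and the real build does this in its own scripts. Note that commit 06addae853 later reverted the change because some ONIE unzip implementations do not support symbolic links.

```python
import os
import shutil
from pathlib import Path


def link_lazy_package(fsroot, platforms, package_path):
    """Hypothetical helper: store one copy of a package in
    <fsroot>/platform/common and symlink it into each platform folder."""
    common = Path(fsroot) / "platform" / "common"
    common.mkdir(parents=True, exist_ok=True)

    stored = common / Path(package_path).name
    if not stored.exists():
        shutil.copy2(package_path, stored)

    for platform in platforms:
        platform_dir = Path(fsroot) / "platform" / platform
        platform_dir.mkdir(parents=True, exist_ok=True)
        link = platform_dir / stored.name
        if link.is_symlink() or link.exists():
            link.unlink()
        # A relative target keeps the link valid if fsroot is relocated.
        link.symlink_to(os.path.relpath(stored, platform_dir))
    return stored
```

Each platform folder then holds only a link such as ../common/<package>, so the package contents are stored once regardless of how many platforms reference them.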