sonic-buildimage

Author	SHA1	Message	Date
davidpil2002	ab0930313b	[YANG] Add support for Password Hardening (#10322) - Why I did it YANG model for the password hardening feature; the SONiC CLI for this feature was autogenerated from this YANG model. - How I did it Created a new YANG model in src/sonic-yang-models/yang-models/sonic-passwh.yang. - How to verify it This PR includes unit tests (YANG tests) covering all the password policies with good and bad values. It is also possible to verify manually with the config/show password commands that were autogenerated from this YANG model (this CLI code was added in sonic-utilities).	2022-05-29 13:54:51 +03:00
xumia	f0dfd398a6	Revert "Reduce image size for lazy installation packages (#10775)" (#10916) This reverts commit `15cf9b0d70`. Why I did it Revert PR #10775 because it breaks ONIE installation: some ONIE unzip implementations do not support symbolic links. We will re-enable it after fixing the issue; see #10914.	2022-05-26 09:39:48 +08:00
abdosi	0285bfe42e	[chassis] Fix issues regarding database service failure handling and mid-plane connectivity for namespace. (#10500) What/Why I did: Issue 1: Setting up the ipvlan interface in interface-config.sh is not tolerant to failures, because interface-config.service is one-shot and has no restart capability. Scenario: if the database service goes into a failed state, the interface-config service also fails because of the dependency check. The database service later restarts, but the interface-config service remains stuck and the ipvlan interface never gets created. Solution: Moved the logic from the interface-config service into the database service. This is also more logical, since the namespace is created there and all the network settings (sysctl) are applied there. With this change, whenever the database service starts, the interface is recreated. Issue 2: Use of IPVLAN vs MACVLAN. Currently we use ipvlan mode, but the failure scenario above is not handled correctly in ipvlan mode. Once the ipvlan interface is created and an IP address is assigned to it, restarting the interface-config service (or, with this PR, the database service) makes the Linux kernel report "Error: Address already assigned to an ipvlan device." based on this: https://github.com/torvalds/linux/blob/master/drivers/net/ipvlan/ipvlan_main.c#L978 The reason is that if the IP address assignment (which must be unique for IPVLAN) is not cleaned up, it remains in the kernel database and never returns to the free pool, even though the namespace is deleted. Solution: Given this hard dependency on unique IPs, macvlan mode is better for us, since everything is managed by the Linux kernel and there is no dependency on the user-configured IP address. Issue 3: The namespace database service does not check reachability to the Supervisor Redis Chassis Server. Currently there is no explicit check, as we never send a Redis PING from the namespace to the Supervisor Redis Chassis Server. Without this check, we may start the database and all other dockers even though there is no connectivity, and hit the error/failure late in the cycle. Solution: Added an explicit PING from the namespace to check this reachability. Issue 4: flushdb raises an exception when trying to access the Chassis Server DB over a Unix socket. Solution: Handle it gracefully via try..except and log the message.	2022-05-24 16:54:12 -07:00
Maxime Lorrillere	392899682f	[Arista] Add support for Wolverine linecards (#8887) Add support for WolverineQCpu, WolverineQCpuMs, WolverineQCpuBk, WolverineQCpuBkMs Co-authored-by: Maxime Lorrillere <mlorrillere@arista.com>	2022-05-20 14:11:06 -07:00
Senthil Kumar Guruswamy	f37dd770cd	System Ready (#10479) Why I did it At present, in an event-driven model, there is no mechanism to know that the system is up with all the essential SONiC services and that all the docker apps, along with port ready status, are ready to start network traffic. With the asynchronous architecture of SONiC, we cannot verify that the config has been applied all the way down to the HW, but we can get the closest up status of each app and derive the system readiness from it. How I did it A new Python-based system monitor tool is introduced under the system-health framework. It monitors all the essential host services, including docker wrapper services, on an event-based model and declares when the system is ready. The framework lets docker apps notify their closest up status. CLIs are provided to fetch the current system status, as well as each service's running status and app ready status, along with the failure reason if any. How to verify it Run the "show system-health sysready-status" click CLI and check the syslogs for system ready.	2022-05-20 13:25:11 -07:00
Arun Saravanan Balachandran	f4b22f67a4	[initramfs]: SSD firmware upgrade in initramfs (#10748) Why I did it To upgrade SSD firmware in initramfs while rebooting from SONiC to SONiC and during NOS to SONiC migration. How I did it A new option 'ssd-upgrader-part' is introduced in the grub command line to indicate the partition, and its filesystem type, in which the SSD firmware updater is present. Syntax: ssd-upgrader-part=<partition>,<filesystem type>. Example: ssd-upgrader-part=/dev/sda8,ext4 A new initramfs script 'ssd-upgrade' is included in init-premount; it invokes the SSD firmware updater (ssd-fw-upgrade) present in the partition indicated by the boot option 'ssd-upgrader-part'. How to verify it In SONiC, the SSD firmware updater is copied to the "/host/" directory. Fast-reboot is initiated with the '-u' option ([scripts/fast-reboot] Add option to include ssd-upgrader-part boot option with SONiC partition sonic-utilities#2150). After reboot, while booting into SONiC, the SSD firmware updater is executed in initramfs.	2022-05-12 08:11:02 -07:00
Marty Y. Lok	23f9126f59	[VoQ][config] Multiasic Supervisor card fails to load config_db#.json in chassis when system is rebooted (#10106) The Supervisor card fails to load config_db#.json in the chassis when the system reboots. This is an intermittent issue. Fixes #10105	2022-05-09 11:06:11 -07:00
xumia	15cf9b0d70	Reduce image size for lazy installation packages (#10775) Why I did it The image size is too large when there are multiple lazy packages and multiple platforms, and it is not necessary to keep multiple copies of the lazy installation packages. For the Cisco image, the image size is reduced from 3.5G to 1.7G. How I did it Use symbolic links to keep only one copy of each lazy package: make a new folder fsroot/platform/common and copy the lazy packages into it. Each platform that uses a package, such as x86_64-grub, x86_64-8800_rp-r0, x86_64-8201_on-r0, etc., only gets a symbolic link to the package in the common folder.	2022-05-09 08:26:09 -07:00
xumia	8ec8900d31	Support SONiC OpenSSL FIPS 140-3 based on SymCrypt engine (#9573) Why I did it Support OpenSSL FIPS 140-3, see design doc: https://github.com/Azure/SONiC/blob/master/doc/fips/SONiC-OpenSSL-FIPS-140-3.md. How I did it Install the FIPS packages. To build the FIPS packages, see https://github.com/Azure/sonic-fips Azure pipelines: https://dev.azure.com/mssonic/build/_build?definitionId=412 How to verify it Validate the SymCrypt engine: admin@sonic:~$ dpkg-query -W | grep openssl openssl 1.1.1k-1+deb11u1+fips symcrypt-openssl 0.1 admin@sonic:~$ openssl engine -v | grep -i symcrypt (symcrypt) SCOSSL (SymCrypt engine for OpenSSL) admin@sonic:~$	2022-05-06 07:21:30 +08:00
Junchao-Mellanox	681c24878b	Fix race condition between networking service and interface-config service (#10573) Why I did it This PR fixes a bug where the mgmt port eth0 may lose its IP even if the user configured a static IP for eth0. The issue is not always reproducible; the reproduction flow is: 1. Systemd starts the networking service, which runs a DHCP-based configuration and assigns an IP from DHCP. 2. Systemd starts the interface-config service, which depends on the networking service. 3. The interface-config service runs "ifdown --force eth0", but the networking service is still running, so this command fails with the error "error: Another instance of this program is already running.". This error is printed by the ifupdown2 lib, which is the main process of the networking service, so ifdown does not actually take effect and the IP of eth0 is not brought down. 4. The interface-config service updates /etc/network/interfaces to a static configuration. 5. The interface-config service runs "systemctl restart networking". This command kills the previous networking-related processes (log: networking.service: Main process exited, code=killed, status=15/TERM) and tries to reconfigure the IP address with the static configuration. However, it detects that the configured IP and the existing IP are the same, so it does not actually configure the IP in the kernel, and the IP is still the one obtained from DHCP. (This could be a bug in ifupdown2: the previous IP came from DHCP and the new IP is static, but it treats them as the same instead of reconfiguring the IP.) 6. When the DHCP lease expires, the kernel removes the IP of eth0 and the issue reproduces. The issue is not always reproducible because the networking service usually runs fast enough that it does not hit step 3. How I did it Check the networking service state before running "ifdown --force eth0", and wait for it to finish if it is activating. How to verify it Manual test.	2022-05-05 15:21:44 -07:00
shlomibitton	4ec3af86af	[Fastboot] Delay PMON service for better fastboot performance (#10567) - Why I did it Profiling the system state on init after fast-reboot, during create_switch function execution, shows several Python scripts running at the same time. This parallel execution consumes CPU time and makes create_switch take longer than it should. Following this finding, and to ensure these services will not interfere in the future, PMON is delayed by 90 seconds until the system finishes the init flow after fastboot. - How I did it Add a timer for the PMON service. On the MLNX platform, exclude the trigger that starts PMON when SYNCD starts in case of fastboot. Copy the timer file to the host bin image. - How to verify it Run fast-reboot on the MLNX platform and observe a faster create_switch execution time.	2022-05-02 10:44:17 +03:00
shlomibitton	1d84e0d7df	[Fastboot] Delay LLDP service for better fastboot performance (#10568) - Why I did it Profiling the system state on init after fast-reboot, during create_switch function execution, shows several Python scripts running at the same time. This parallel execution consumes CPU time and makes create_switch take longer than it should. Following this finding, and to ensure these services will not interfere in the future, LLDP is delayed by 90 seconds until the system finishes the init flow after fastboot. - How I did it Add a timer for the LLDP service. Copy the timer file to the host bin image. - How to verify it Run fast-reboot on the MLNX platform and observe a faster create_switch execution time. This PR depends on PR #10567.	2022-04-28 10:35:14 +03:00
ganglv	9d7387a18e	[sonic-host-services]: Fix import and invalid path (#10660) Why I did it sonic-hostservice could not be started. How I did it Install python3-dbus and systemd-python, and replace the invalid path. How to verify it Start the service with the commands below: sudo systemctl start sonic-hostservice sudo systemctl status sonic-hostservice Signed-off-by: Gang Lv ganglv@microsoft.com	2022-04-27 07:14:51 +08:00
Saikrishna Arcot	64187a1b15	Remove SSH host keys after installing the custom version of sshd (#10633) * Remove SSH host keys after installing the custom version of sshd Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com> * Use an override for sshd instead of overwriting the service file Don't overwrite upstream's .service file; instead, use an override file to make sure the host key(s) are generated. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2022-04-25 10:38:52 -07:00
bingwang-ms	3fc3259a35	Define qos map `AZURE_TUNNEL` for QoS remapping of tunnel traffic (#10565) * Add AZURE_TUNNEL map Signed-off-by: bingwang <wang.bing@microsoft.com>	2022-04-25 15:06:10 +08:00
yozhao101	e24fe9bc60	[Monit] Fix the issue where Monit cannot reset its counter. (#10288) Signed-off-by: Yong Zhao <yozhao@microsoft.com> Why I did it This PR fixes the Monit issue where Monit cannot reset its counter when monitoring the memory usage of the telemetry container. Specifically, the Monit configuration for monitoring the memory usage of the telemetry container is as follows: check program container_memory_telemetry with path "/usr/bin/memory_checker telemetry 419430400" if status == 3 for 10 times within 20 cycles then exec "/usr/bin/restart_service telemetry" If the memory usage of the telemetry container is larger than 400MB for 10 times within 20 cycles (minutes), the container is restarted. Recently we observed that, after the telemetry container was restarted, its memory usage continuously increased from 400MB to 11GB within 1 hour, but it was not restarted again during this 1-hour sliding window. The reason is that Monit resets its counter if and only if the status of the monitored service changes from Status failed to Status ok, and during this 1-hour sliding window the status never changed from Status failed to Status ok. For each service monitored by Monit, there is an entry showing the monitoring status, monitoring mode, etc. For example, the following output from the command sudo monit status shows the status of the monitored service for the memory usage of telemetry: Program 'container_memory_telemetry' status Status ok monitoring status Monitored monitoring mode active on reboot start last exit value 0 last output - data collected Sat, 19 Mar 2022 19:56:26 Every minute, Monit runs the script to check the memory usage of telemetry and updates the counter if the memory usage is larger than 400MB. If Monit finds that the memory usage of telemetry was larger than 400MB for 10 times within 20 minutes, the telemetry container is restarted. The following is an example status of the monitored service: Program 'container_memory_telemetry' status Status failed monitoring status Monitored monitoring mode active on reboot start last exit value 0 last output - data collected Tue, 01 Feb 2022 22:52:55 After the telemetry container was restarted, we found that its memory usage increased rapidly, from around 100MB to more than 400MB within 1 minute, so the status of the monitored service did not have a chance to change from Status failed to Status ok. How I did it As a workaround for this issue, this PR uses the Monit exec syntax repeat every <n> cycles, which makes Monit execute the script again if the error persists for the given number of cycles. How to verify it I verified this change on lab device str-s6000-acs-12. Another pytest PR (Azure/sonic-mgmt#5492) is submitted in the sonic-mgmt repo for review.	2022-04-20 18:08:06 -07:00
Samuel Angebault	fb147764b5	[Arista] Fix arista-net initramfs hook (#10624) The interface renaming logic fails if one interface is missing. Because of the `set -e`, the whole initramfs hook would abort early on error. This change fixes the current behavior to make sure missing interfaces are properly skipped and existing interfaces are renamed.	2022-04-20 10:03:05 -07:00
Junhua Zhai	128d762af3	[gearbox] Add peer gbsyncd for swss if gearbox exists (#10504) Fixes issues #10501 and #9733. If a gearbox exists, we need to: * add gbsyncd as a peer, since swss also depends on gbsyncd * add the gbsyncd service to the FEATURE table if it is missing	2022-04-20 19:02:49 +08:00
kellyyeh	2a516a7763	[dhcp_relay] Enable dhcp_relay on EPMS, MgmtTsTor, MgmtToRRouter and BackEndToRRouter (#10474)	2022-04-15 18:01:24 -07:00
Yakiv Huryk	d9117d9411	[Mellanox][asan] add address sanitizer support for syncd (#10266) Why I did it To support address sanitizer for Mellanox syncd. How I did it /var/log/asan is mapped into the syncd container (the same as for swss). Container stop() has a timeout (60s) for syncd (the same as for swss), so that libasan has enough time to generate a report. Added ASAN's log path to the Mellanox syncd supervisord.conf. Added "asan: yes" to sonic_version.yml. How to verify it Added artificial memory leaks. Compiled with ENABLE_ASAN=y. Installed the image on the DUT. Rebooted the DUT. Verified that /var/log/asan/syncd-asan.log contains the leaks. Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>	2022-04-14 15:00:32 -07:00
Saikrishna Arcot	12ebe3ffa0	Run tune2fs during initramfs instead of image install (#10536) If it is run during image install, it's not guaranteed that the installation environment will have tune2fs available. Therefore, run it during initramfs instead. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2022-04-12 16:24:13 -07:00
byu343	f7a6553933	[docker-syncd]: Add optional shm-size to syncd container (#10516) Why I did it During the bringup of tomahawk4/trident4, we realized that such chips need a larger /dev/shm in the syncd container, so we added the option --shm-size to the docker create for syncd. The default value for shm-size is 64m; after this change, platforms can add SYNCD_SHM_SIZE=128m to platform_env.conf to change it to 128m. How to verify it We verified that after this change, 1) on existing platforms without platform_env.conf, the size of /dev/shm in the syncd container (df -h | grep shm) is still the default 64M; 2) after adding SYNCD_SHM_SIZE=128m to platform_env.conf, /dev/shm in syncd becomes 128M.	2022-04-09 10:47:18 -07:00
Vivek R	ed14eb5263	[interfaces-config] "main exception: cannot find interfaces: eth0" error log avoided (#10463) - Why I did it Fixes #9628. During bootup, this error log is seen: Dec 22 04:26:29 sonic interfaces-config.sh[2546]: error: main exception: cannot find interfaces: eth0 (interface was probably never up ?) It is non-functional in nature and doesn't affect the flow. - How I did it Don't run ifdown if it is not needed. - How to verify it Verified during reboot: the log did not appear and the IP was acquired on eth0 as expected. Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>	2022-04-06 16:59:47 +03:00
bingwang-ms	b9dd1df372	Update qos config to clear queues for bounced back traffic (#10176) * Update qos config to clear queues for bounced back traffic Signed-off-by: bingwang <bingwang@microsoft.com>	2022-04-05 22:32:25 +08:00
judyjoseph	8e642848c2	Introduce the asic_subtype field for adding the sub platform variants. (#10235) * Introduce the asic_subtype field for adding the sub platform variants. It uses the value of the TARGET_MACHINE variable in slave.mk.	2022-03-28 11:22:32 -07:00
Santhosh Kumar T	e2502edefd	Refactoring DELL platform init to reduce rc.local processing time porting changes in master (#10318) Why I did it To reduce the processing time of rc.local, refactor the S6100 platform initialization. Porting changes from the 202012 branch: [202012] Refactoring DELL platform init to reduce rc.local processing time #10171	2022-03-24 11:14:37 -07:00
xumia	e9ac13678d	[Build]: Fix armhf mirrors not existing issue (#10312) Why I did it The mirror endpoint debian-archive.trafficmanager.net does not support armhf, so change to use deb.debian.org and security.debian.org.	2022-03-22 15:24:15 +08:00
Kostiantyn Yarovyi	bf4ab4a338	[Barefoot][Syncd] Restart the interface used for communication between SONiC and openBMC to clean its txqueue (#9941) Why I did it Improve the startup of the Barefoot SDK. How I did it Restart the interface through which SONiC and openBMC communicate, to clean its txqueue. How to verify it Run the SONiC autorestart tests.	2022-03-21 10:07:20 -07:00
Samuel Angebault	e4b507fa03	[Arista] rename management interface in initrd (#9856) On some products, PCI enumeration adds randomness to which NIC gets initialized first. Because SONiC doesn't use deterministic interface naming but instead old-style interface naming, this leads to eth0 not always being the management port. To make sure eth0 is always the management port (a SONiC expectation), rename the interfaces in the initramfs for Arista products.	2022-03-21 17:55:23 +05:30
xumia	1017ee6002	[Build]: Use one debian mirror config (#10274) Why I did it Use one Debian mirror config. The empty config in https://github.com/Azure/sonic-buildimage/blob/master/files/image_config/apt/sources.list overrides the file https://github.com/Azure/sonic-buildimage/blob/master/files/apt/sources.list.amd64 (armhf/arm64), which does not make sense. None of the content in files/image_config/apt is used; anyone who wants to add a mirror config should add it in files/apt. How I did it Remove files/image_config/apt and the references to it.	2022-03-21 16:47:20 +08:00
Saikrishna Arcot	5617b1ae3e	Image disk space reduction (#10172) # Why I did it Reduce the disk space taken up during bootup and runtime. # How I did it 1. Remove python package cache from the base image and from the containers. 2. During bootup, if logs are to be stored in memory, then don't create the `var-log.ext4` file just to delete it later during bootup. 3. For the partition containing `/host`, don't reserve any blocks for just the root user. This makes sure all disk space is available for all users, if needed during upgrades (for example). * Remove pip2 and pip3 caches from some containers Only containers which appeared to have a significant pip cache size are included here. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com> * Don't create var-log.ext4 if we're storing logs in memory Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com> * Run tune2fs on the device containing /host to not reserve any blocks for just the root user Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2022-03-15 18:12:49 -07:00
Stepan Blyshchak	18d00dfbe7	[teamd.sh] kill teamd docker on warm shutdown for faster shutdown (#10219) This can save 6 seconds for teamd LAG restoration, the time between: ``` Mar 9 13:51:10.467757 r-panther-13 WARNING teamd#teamd_PortChannel1[28]: Got SIGUSR1. Mar 9 13:52:33.310707 r-panther-13 INFO teamd#teamd_PortChannel1[27]: carrier changed to UP ``` - Why I did it Optimize warm boot; specifically, reduce the time needed for LAG restoration. - How I did it Kill the teamd docker after graceful shutdown of the teamd processes. - How to verify it Run warm reboot. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2022-03-15 09:20:36 +02:00
xumia	0243ed9538	[build]: Fix marvell-armhf build hung issue (#10156) (#10229) Why I did it The marvell-armhf build hangs: it does not exit even after waiting for a long time. It is caused by the process /etc/entropy.py, which is started by the postinst script in target/debs/buster/sonic-platform-nokia-7215_1.0_armhf.deb.	2022-03-15 10:03:54 +08:00
Saikrishna Arcot	d7c3ce0045	Specify the filesystem type when mounting to /host (#10169) When mounting the partition that contains `/host` during initramfs, the mount binary available there (coming from busybox) tries each filesystem in `/proc/filesystems` and sees which one succeeds. During this time, there may be some error messages logged into dmesg because some of the incorrect filesystems failed to mount the partition. Specify the filesystem type explicitly so that initramfs knows it's that type, and we know what filesystem will always get used there. Fixes #9998 Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2022-03-14 11:34:02 -07:00
Stepan Blyshchak	2919b4820f	[hostcfgd] record feature state in STATE DB (#9842) - Why I did it To implement blocking feature state change. - How I did it Record the actual feature state in STATE DB from hostcfgd. - How to verify it UT + verification by running on the switch and checking STATE DB. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2022-03-14 13:45:27 +02:00
xumia	eea3cc7ad1	[Build]: only install grpc in amd64 (#10212) [Build]: only install grpc in amd64 Unblock marvell-armhf build.	2022-03-14 13:41:37 +08:00
Samuel Angebault	8d419ca2c5	[Arista] Remove arista.log from rsyslog default logrotate (#9731) Why I did it In parallel with this change, Arista added a custom logrotate configuration as part of its driver library. Having two logrotate configurations for the same log file triggers an issue. Fixes aristanetworks/sonic#38 How I did it Arista merged a few changes in sonic-buildimage which added a logrotate configuration aristanetworks/sonic@e43c797. It is therefore correct to remove the arista.log line from the logrotate.d/rsyslog configuration. How to verify it Logrotate works without any error message, arista log rotation happens, and arista daemons still append logs once the file is truncated.	2022-03-11 08:09:07 -08:00
xumia	9cdf81230b	[Build]: Fix /proc not mounted issue (#10164) [Build]: Fix /proc not mounted issue	2022-03-11 09:23:37 +08:00
Song Yuan	01798447ab	[Chassis][QoS template] Skip configuring buffer and QoS config on recirc ports (#7869) * Added test case to verify the template changes.	2022-03-09 16:04:36 -08:00
Kebo Liu	fe0a7693f4	[smartmontools] Install smartmontools with apt-get and upgrade it to 7.2-1 (#10087) Why I did it Smartmontools 6.6 has an issue reading the SMART info of NVMe SSDs, and smartmontools can be installed with apt-get, so there is no need to build and install it. How I did it Use apt-get to install smartmontools 7.2-1. Remove the previous make files for smartmontools 6.6. How to verify it Verify that "smartctl" can read the correct SMART info on an NVMe SSD. Verify that "show platform ssdhealth" still works. Signed-off-by: Kebo Liu <kebol@nvidia.com>	2022-03-07 09:39:33 -08:00
Marty Y. Lok	c40f04f0e2	[chassis][supervisor]monit container-checker failed due to unexpected "database-chassis" docker running #9042 (#9043) Why I did it Fixed the monit container_checker failure caused by the unexpected "database-chassis" docker running on the Supervisor card in the VOQ chassis. Fixes #9042 How I did it Added database-chassis to the always-running docker list if the platform is a supervisor card. How to verify it Execute the CLI command "sudo monit status container_checker" Signed-off-by: mlok <marty.lok@nokia.com>	2022-03-03 17:56:08 -08:00
Aravind Mani	1740beb1f2	[sonic-cfggen]: Fix sonic-cfggen build failures for armhf (#10132) Why I did it The armhf build fails while building the sonic-config-engine whl package: https://dev.azure.com/mssonic/be1b070f-be15-4154-aade-b1d3bfb17054/_apis/build/builds/77089/logs/9 The failure is caused by a newline generated at the top of the file in the buffer config test cases when building for Broadcom-based platforms; this issue is not seen on Marvell-based platforms. How I did it Removed the newline from all the buffer test cases, since it is not needed, and changed buffer_config.j2, where the newline is generated, accordingly.	2022-03-02 13:06:20 -08:00
Lawrence Lee	a50d1f1fc8	[write_standby]: Increase timeout to 60s (#10065) - Avoid scenarios where the script times out before orchagent can establish the IPinIP tunnel Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2022-02-24 14:55:45 -08:00
wenyiz2021	2d0b063191	Update container_checker for multi-asic devices when state is 'always_enabled' (#10067) * Update container_checker for multi-asic devices Update container_checker for multi-asic devices to add database containers to always_running_containers. The previous change only covered single-ASIC devices, and database containers were not considered features when writing to state_db. * Update container_checker Fix an indentation	2022-02-23 18:06:30 -08:00
vmittal-msft	bc1dfea619	Updated traffic scheduler settings for HWSKUs : DellEMC-Z9332f-O32 and DellEMC-Z9332f-M-O16C64 (#9828)	2022-02-23 17:22:41 -08:00
Stepan Blyshchak	fb752a4ae5	[rsyslog.j2] fix typo in VAR_LOG_SIZE_KB (#9954) This issue causes a negative threshold value, and thus log files are deleted even when there is enough space. - Why I did it To fix an issue where log files get deleted even if there is enough space. - How I did it Fixed a typo. - How to verify it Run the portion of the script that calculates the threshold and check that the threshold is calculated correctly. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2022-02-17 10:16:44 +02:00
byu343	155220be9b	Support multi-asic on macsec container (#9921) This change enables the support of running multiple macsec containers, each for one ASIC.	2022-02-13 22:45:24 -08:00
Oleksandr Ivantsiv	25a0ce5eb1	[asan] Add address sanitizer support. (#9857) Implement infrastructure that allows enabling address sanitizer for docker containers. Enable address sanitizer for the SWSS container. - Why I did it To make it possible to compile SONiC applications with address sanitizer (ASAN). ASAN is a memory error detector for C/C++. It finds: 1. Use after free (dangling pointer dereference) 2. Heap buffer overflow 3. Stack buffer overflow 4. Global buffer overflow 5. Use after return 6. Use after scope 7. Initialization order bugs 8. Memory leaks - How I did it By adding a new ENABLE_ASAN configuration option. - How to verify it By default ASAN is disabled and the SONiC image is not affected. When ASAN is enabled, it inspects all allocations, deallocations, and memory accesses that the application performs at run time. To check whether the application has memory errors, run tests that exercise the application's memory usage; ideally, run the whole regression test suite. Memory leak reports are placed in the /var/log/asan/ directory of the SONiC host OS. Signed-off-by: Oleksandr Ivantsiv <oivantsiv@nvidia.com>	2022-02-09 13:29:18 +02:00
Prince George	ff14aebef9	Close console session due to user inactivity (#9890) Signed-off-by: Prince George <prgeor@microsoft.com>	2022-02-02 09:41:21 +05:30
tbgowda	4e32f85a31	Enable SAI_SWITCH_ATTR_UNINIT_DATA_PLANE_ON_REMOVAL attribute (#9419) Why I did it Fixes #8980 partly. The corresponding changes in sonic-sairedis are here: Azure/sonic-sairedis#975 How I did it Include changes from both repos and build an image for verification. How to verify it Trigger fast-reboot with the changes and see the attribute SAI_SWITCH_ATTR_UNINIT_DATA_PLANE_ON_REMOVAL being set at the SAI level. Signed-off-by: Thushar Gowda <24815472+tbgowda@users.noreply.github.com>	2022-02-01 08:44:17 -08:00
Alexander Allen	8a07af95e5	[Mellanox] Modified Platform API to support all firmware updates in single boot (#9608) Why I did it Microsoft's requirements for fwutil update all state that all firmware which supports this upgrade flow must support upgrading within a single boot cycle. This conflicted with a number of Mellanox upgrade flows, which have been revised to safely meet this requirement. How I did it Added --no-power-cycle flags to the SSD and ONIE firmware scripts. Modified the Platform API to call the firmware upgrade flows with this new flag during fwutil update all. Added a script to our reboot plugin to install firmware in the correct order prior to reboot. How to verify it Populate platform_components.json with firmware for CPLD / BIOS / ONIE / SSD. Execute fwutil update all fw --boot cold. CPLD will burn / ONIE and BIOS images will stage / SSD will schedule for reboot. Reboot the switch. SSD will install / CPLD will refresh / switch will power cycle into ONIE. The ONIE installer will upgrade ONIE and BIOS / switch will reboot back into SONiC. In SONiC, run fwutil show status to check that all firmware upgrades were successful.	2022-01-24 00:56:38 -08:00
dflynn-Nokia	b6939b9927	[firsttime boot] suppress error message on platforms not supporting kdump (#9521) Why I did it Eliminate benign firsttime boot error reported when running on platforms that do not support kdump. How I did it Change rc.local to check for presence of the file /etc/default/kdump-tools before referencing it. How to verify it Install a new image on an armhf or arm64 platform and check for a failed reference to /etc/default/kdump-tools on firsttime boot.	2022-01-20 18:27:10 -08:00
Shyam	20f32dc072	Added gbsyncd infra for multi-ASIC, multi-PHY mode (#9722 ) - External PHY is managed via gearbox (gbsyncd docker container) in SONiC - Enhanced 'External PHY management' from SONiC's single-ASIC environment to multi-ASIC - Enhanced gbsyncd docker container from single-namespace to multi-namespace mode - Added gbsyncd.service.j2 on a per-namespace basis. - Each namespace/ASIC now has its own gbsyncd<ASIC#> docker container with its own Gearbox table and redis-DB Signed-off-by: Shyam Kumar <shyakuma@cisco.com>	2022-01-21 10:08:16 +08:00
Alexander Allen	5f596aef63	[pmon] Move smartctl from pmon to host (#9607 ) Why I did it Need to be able to run smartctl when pmon docker is not running. How I did it Removed the pmon dependency for pmon as well as the command wrapper and added it to the debian-extension. How to verify it Stop pmon Run smartctl from the host and verify it runs without error	2022-01-19 10:53:10 -08:00
liuh-80	f166b991a7	[image]: Prevent radius passkey and snmp community string into syslog. (#9727 ) #### Why I did it Prevent the radius passkey and snmp community string from being written to syslog. #### How I did it Add radius and snmp config commands to PASSWD_CMDS #### How to verify it Run and pass all UTs. #### Description for the changelog Add radius and snmp config commands to PASSWD_CMDS to prevent the radius passkey and snmp community string from being written to syslog.	2022-01-17 16:26:22 +08:00
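A minimal Python sketch of the idea behind PASSWD_CMDS: before a command line is logged, commands that carry a secret are masked. The prefix list and the `redact_for_syslog` helper are hypothetical illustrations; the real PASSWD_CMDS contents and matching logic are not shown in this log.

```python
# Hypothetical sensitive command prefixes; the actual PASSWD_CMDS list is an assumption here.
SENSITIVE_PREFIXES = (
    "config radius passkey",
    "config snmp community",
)


def redact_for_syslog(command: str) -> str:
    """Return the command with everything after a sensitive prefix masked."""
    for prefix in SENSITIVE_PREFIXES:
        if command.startswith(prefix):
            # Keep the command name for auditing, drop the secret argument(s).
            return prefix + " ****"
    return command
```

For example, `redact_for_syslog("config radius passkey s3cret")` yields `config radius passkey ****`, while unrelated commands pass through unchanged.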
Sudharsan Dhamal Gopalarathnam	bd0a19aa17	[rsyslog]Setting log file size to 16Mb (#9504 ) Why I did it The existing log file size in sonic is 1 Mb. Over a period of time this leads to a huge number of log files, which becomes difficult for monitoring applications to handle. Instead of a large number of small files, the size of the log file is now set to 16 Mb, which reduces the number of files over a period of time. How I did it Changed the size parameter and related macros in logrotate config for rsyslog How to verify it Execute logrotate manually and verify the limit when the file gets rotated. Signed-off-by: Sudharsan Dhamal Gopalarathnam <sudharsand@nvidia.com>	2022-01-14 10:24:07 -08:00
Marty Y. Lok	04a4b8dcb1	[multiasic][database]database.sh failed to create the database for namespace (#9502 ) Why I did it database.sh failed to create the database for namespace in multiasic platform. The latest code uses Docker version 20.10.x, in which the command "docker create" no longer accepts the optional "NET=" with an empty value, so the current docker create command in database.sh fails with a syntax error. Issue #9503 How I did it Modify docker_image_ctl.j2 to set the default network setting NET="bridge" instead of empty for the namespace database.	2021-12-13 10:17:05 -08:00
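A minimal Python sketch of the fix's idea: never pass an empty network value to `docker create`, and fall back to `bridge` instead. The `docker_create_cmd` helper and the container/image names are hypothetical; the real change lives in the `docker_image_ctl.j2` shell template.

```python
def docker_create_cmd(image, name, net=""):
    """Build a `docker create` argv that never contains an empty --net= value."""
    # Newer Docker rejects "--net=" with no value, so default to the bridge network,
    # mirroring the NET="bridge" default described in the commit.
    network = net or "bridge"
    return ["docker", "create", "--net=" + network, "--name=" + name, image]
```

Options are placed before the image name, as `docker create [OPTIONS] IMAGE` expects.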
Qi Luo	cf4011d526	Revert "CRM init config for SRV6 Nexthop and MY_SID resource (#9238 )" (#9506 ) This reverts commit `8187d473af`.	2021-12-12 12:16:39 -08:00
Samuel Angebault	d499455752	[Arista] Update driver submodules (#9393 ) - Use SfpOptoeBase by default to leverage new `sonic_xcvr` refactor - Add support for `Woodleaf` product - Move `libsfp-eeprom.so` to a different `.deb` package - Add new logrotate configuration for arista logs - Improve logging mechanism for the drivers (IO loglevel, fix syslog duplicates) - Initialize chassis cards in parallel - Refactor of `get_change_event` to fix interrupts treated as presence change	2021-12-08 11:33:36 -08:00
Brian O'Connor	46bcda359c	[PINS] Build P4RT container for PINS (#9083 ) - Add INCLUDE_PINS to config to enable/disable container - Add Docker files and supporting resources - Add sonic-pins submodule and associated make files Submission containing materials of a third party: Copyright Google LLC; Licensed under Apache 2.0 #### Why I did it Adds P4RT container to SONiC for PINS The P4RT app is covered by this HLD: https://github.com/pins/SONiC/blob/master/doc/pins/p4rt_app_hld.md #### How I did it Followed the pattern and templates used for other SONiC applications #### How to verify it Build SONiC with INCLUDE_P4RT set to "y". Verify that the resulting build has a container called "p4rt" running. You can verify that the service is up by running the following command on the SONiC switch: ```bash sudo netstat -lpnt | grep p4rt ``` You should see the service listening on TCP port 9559. #### Which release branch to backport (provide reason below if selected) None #### Description for the changelog Build P4RT container for PINS	2021-12-07 11:11:25 -08:00
Marty Y. Lok	cb4c66ae98	[chassis][multiasic] fixed rsyslogd FATAL issue in the database container in multi-asic box (#8390 ) Why I did it Fix for issue #8389 How I did it /etc/rsyslog.conf is an empty file, which causes rsyslogd to fail with FATAL in the global-instance database container. The function updateSyslogConf() should only generate rsyslog.conf for containers in a namespace, not for containers in the global instance; those should use the default rsyslog.conf. This matters especially for the database container, because updateSyslogConf() is called before the database container is created, which caused sonic-cfggen to fail to generate rsyslog.conf. Signed-off-by: mlok <marty.lok@nokia.com>	2021-12-01 07:16:49 -08:00
liuh-80	739c45645c	[TACACS+] Add audisp-tacplus for per-command accounting. (#8750 ) This pull request integrates audisp-tacplus into SONiC for per-command accounting. #### Why I did it To support TACACS per-command accounting, we integrate the audisp-tacplus project into SONiC. #### How I did it 1. Add auditd service to SONiC 2. Port and patch audisp-tacplus to SONiC #### How to verify it UT with CUnit to cover all new code in usersecret-filter.c Also pass all current UT. #### Which release branch to backport (provide reason below if selected) N/A #### Description for the changelog Add audisp-tacplus for per-command accounting.	2021-12-01 11:50:09 +08:00
noaOrMlnx	0908f9ec49	[CoPP] Add always_enabled field (#9302 ) *Add the "always_enabled" field to copp_cfg.j2 file, in order to allow traps without an entry in features table, to be installed automatically.	2021-11-30 11:04:15 -08:00
Kumaresh Perumal	8187d473af	CRM init config for SRV6 Nexthop and MY_SID resource (#9238 ) *Enable CRM for SRV6 Nexthop and SRV6 MY_SID entries.	2021-11-30 09:21:19 -08:00
Shi Su	4b357044b3	[bgpcfgd] Add bgpcfgd support to advertise routes (#9197 ) Why I did it Add bgpcfgd support to advertise routes. How I did it Make bgpcfgd subscribe to the ADVERTISE_NETWORK table in STATE_DB and configure route advertisement accordingly. How to verify it Added unit tests in bgpcfgd and verify on KVM about route advertisement.	2021-11-29 23:17:57 -08:00
Lawrence Lee	6e1a477ce0	[mux]: Fix `mark_dhcp_packet` (#9373 ) - Consolidate the two [Service] sections by moving the ExecStartPre line for mark_dhcp_packet.py to the first section and removing the second. - Make the mark_dhcp_packet.py file executable - Also clean up mark_dhcp_packet.py - Remove unused imports - Fix spacing and line lengths to conform to PEP8 Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-11-29 12:04:06 -08:00
Brian O'Connor	002827f08e	[PINS] Add APPL_STATE_DB and response path log (#9082 ) - Add APPL_STATE_DB to database_config.json - Clear APPL_STATE_DB during SwSS container restarts - Add response path log file to logrotate config: responsepublisher.rec Co-authored-by: PINS Working Group <sonic-pins-subgroup@googlegroups.com>	2021-11-24 10:31:06 -08:00
Stephen Sun	b3ccef9c08	[Reclaim buffer] Common infrastructure update for reclaiming buffer (#9133 ) - Why I did it This is to update the common sonic-buildimage infra for reclaiming buffer. - How I did it Render zero_profiles.j2 to zero_profiles.json for vendors that support reclaiming buffer The zero profiles will be referenced in PR [Reclaim buffer] Reclaim unused buffers by applying zero buffer profiles #8768 on Mellanox platforms and there will be test cases to verify the behavior there. Rendering is done here for passing azure pipeline. Load zero_profiles.json when the dynamic buffer manager starts Generate inactive port list to reclaim buffer Signed-off-by: Stephen Sun <stephens@nvidia.com>	2021-11-24 15:00:23 +02:00
Junhua Zhai	240596ec7d	[gearbox] provide common gbsyncd.service.j2 to start for platform specific gbsyncd docker (#9332 ) Why I did it Fix #9059. It provides common gbsyncd.service.j2 to start for platform specific gbsyncd docker, which must be named 'gbsyncd'. How I did it All of platform specific gbsyncd dockers use a common name 'gbsyncd' Use a unique systemd service template gbsyncd.service.j2 for gbsyncd docker	2021-11-23 10:44:29 -08:00
Guohan Lu	f3faf6111b	Revert "[gearbox] provide common gbsyncd.service.j2 to start for platform specific gbsyncd docker (#9286 )" This reverts commit `1d2a11bbb8`.	2021-11-19 10:10:55 -08:00
Junhua Zhai	1d2a11bbb8	[gearbox] provide common gbsyncd.service.j2 to start for platform specific gbsyncd docker (#9286 ) Why I did it Fix #9059. It provides common gbsyncd.service.j2 to start for platform specific gbsyncd docker, which must be named 'gbsyncd'. How I did it All of platform specific gbsyncd dockers use a common name 'gbsyncd' Use a unique systemd service template gbsyncd.service.j2 for gbsyncd docker	2021-11-17 23:49:49 -08:00
Vivek Reddy	ff32ac3ed4	[Auto Techsupport] Event driven Techsupport Changes (#8670 ) #### Why I did it Changes required for feature "Event Driven TechSupport Invocation & CoreDump Mgmt". [HLD](https://github.com/Azure/SONiC/pull/818) Requires: https://github.com/Azure/sonic-utilities/pull/1796. Merging in any order would be fine. Summary of the changes: - Added the YANG Models for the new tables introduced as a part of this feature. - Enhanced init_cfg.json with the required default config - Added a compile-time flag which enables/disables the config required for this feature inside init_cfg.json - Enhanced the supervisor-proc-exit-listener script to populate `<feature>:<critical_proc> = <comm>:<pid>` info in STATE_DB when it observes a proc exit notification for the critical processes running inside the docker.	2021-11-15 21:56:37 -08:00
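A minimal Python sketch of the `<feature>:<critical_proc> = <comm>:<pid>` entry format that the exit listener records. The helper names are hypothetical, the STATE_DB table name is not given in this log, and no database write is performed here.

```python
def make_exit_entry(feature, critical_proc, comm, pid):
    """Build the (key, value) pair in the <feature>:<critical_proc> = <comm>:<pid> format."""
    return f"{feature}:{critical_proc}", f"{comm}:{pid}"


def parse_exit_entry(key, value):
    """Split an entry back into its parts; the pid is the text after the last ':'."""
    feature, critical_proc = key.split(":", 1)
    comm, pid = value.rsplit(":", 1)
    return feature, critical_proc, comm, int(pid)
```

`rsplit` is used for the value so that a command name containing `:` would still yield the trailing pid.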
Renuka Manavalan	a685fe1765	add arista.log to logrotate (#9245 )	2021-11-15 07:29:30 -08:00
liuh-80	ff09b8b8ed	[TACACS+] Add Bash TACACS+ plugin for per-command authorization. (#8715 ) This pull request adds a bash plugin for TACACS+ per-command authorization #### Why I did it 1. To support TACACS per-command authorization, we check the user command before executing it. 2. Fix the issue where libtacsupport.so can't parse tacplus_nss.conf correctly: Support the debug=on setting. Support putting the server address and secret in the same row. 3. Fix the parse_config_file method not resetting the server list before parsing the config file. #### How I did it The bash plugin will be called before every user command, and checks the user command with the remote TACACS+ server for per-command authorization. #### How to verify it UT with CUnit cover all code in this plugin. Also pass all current UT. #### Which release branch to backport (provide reason below if selected) N/A #### Description for the changelog Add Bash TACACS+ plugin.	2021-11-13 09:57:30 +08:00
Stepan Blyshchak	a2c2d67098	[ACL] enable ACL FC when generating config from minigraph but disable by default (#8908 ) * [ACL] enable ACL FC when generating config from minigraph but disable by default Why I did it To support ACL counters on Flex Counter Infrastructure. How I did it Enable ACL FC in init_cfg and minigraph. Disable when generating configuration from preset. How to verify it Together with the dependent PRs, run the ACL/Everflow test suite. Signed-off-by: Stepan Blyshchak <stepanb@nvidia.com>	2021-11-11 09:07:54 +08:00
Guohan Lu	5f11eb320e	Revert "sysready (#8889 )" This reverts commit `d7e5372e54`.	2021-11-10 15:36:20 -08:00
Alexander Allen	2847265bfd	Mellanox bullseye merge (#1 ) Allow mellanox platform to build and successfully switch packets in Debian 11 Upgraded * Mellanox SDK * Mellanox Hardware Management * Mellanox Firmware * Mellanox Kernel Patches Adjusted build system to support host system running bullseye and dockers running buster.	2021-11-10 15:27:22 -08:00
LuiSzee	5b284767f6	Update Centec platform support for Bullseye and 5.10 kernel (#7 ) 1. Fix build for armhf and arm64 2. upgrade centec tsingma bsp support to 5.10 kernel 3. modify centec platform driver for linux 5.10 Co-authored-by: Shi Lei <shil@centecnetworks.com>	2021-11-10 15:27:22 -08:00
Saikrishna Arcot	1d00613305	Add support for building Mellanox image ISSU will likely be broken. As of right now, the issu-version file is not being generated during build. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2021-11-10 15:27:22 -08:00
Saikrishna Arcot	33e4b7f90e	Fix Python 3 syntax in SONiC container startup scripts The common startup script used for SONiC containers is calling an inline python command that uses Python 2 syntax, and thus errors out when run with Python 3. Make this work with Python 3. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2021-11-10 15:27:22 -08:00
Saikrishna Arcot	fb03bd2440	Get packages for the base image from the main repos instead of our mirror There appears to be some network issue in the pipeline builds when downloading packages from our mirror. Change the source to be from the main debian repos to try to get around this issue. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2021-11-10 15:27:22 -08:00
Saikrishna Arcot	2b0ad74db6	Update kdump-tools for bullseye Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2021-11-10 15:27:22 -08:00
Saikrishna Arcot	a1d30e3aa0	Python 2 removal/cleanup Remove Python 2 package installation from the base image. For container builds, reference Python 2 packages only if we're not building for Bullseye. For libyang, don't build Python 2 bindings at all, since they don't seem to be used. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2021-11-10 15:27:22 -08:00
Saikrishna Arcot	b8a7a6355b	Update the base Debian system installation script to get Bullseye Python 2 is no longer available, so remove those packages, and remove the pip2 commands. For picocom and systemd, just install from the regular repo, since there's no backports yet. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2021-11-10 15:27:22 -08:00
Senthil Kumar Guruswamy	d7e5372e54	sysready (#8889 )	2021-11-10 14:52:52 -08:00
Lawrence Lee	475bfc9625	[mux.service]: Remove pmon dependency (#9211 ) Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-11-10 08:08:03 -08:00
tjchadaga	8544147a70	Fix for additional intf flap during fast-reboot (#9166 )	2021-11-08 15:21:11 -08:00
abdosi	ea91a72b79	[multi-asic] fix syslog not getting generated. (#9160 ) Fixes #9159	2021-11-03 18:29:09 -07:00
trzhang-msft	689c101095	update DHCP_PACKET_MARK schema (#9077 ) - update DHCP_PACKET_MARK schema in state_db - this is an update over PR: Add service mark_dhcp_packet to mux container #9015	2021-11-02 15:55:50 -07:00
Stepan Blyshchak	2ef97bb5df	[dockers] change RPC, DBG dockers version: put RPC, DBG sign in build metadata part of the version (#8920 ) - Why I did it If an app.ext requires the dependency syncd^1.0.0, the RPC version of syncd will not satisfy this constraint, since 1.0.0-rpc < 1.0.0. It is not correct to put 'rpc' as a prerelease identifier. Instead, put 'rpc' as build metadata in the version: 1.0.0+rpc, which satisfies the constraint ^1.0.0. - How I did it Changed how the versions of the RPC and DBG images are constructed. - How to verify it Install app.ext with a syncd^1.0.0 dependency on a switch with the RPC syncd docker. Signed-off-by: Stepan Blyshchak <stepanb@nvidia.com>	2021-11-01 19:02:57 +02:00
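A simplified Python sketch of why `1.0.0+rpc` satisfies `^1.0.0` while `1.0.0-rpc` does not. Under Semantic Versioning 2.0.0, build metadata is ignored for precedence, and a pre-release sorts before the release with the same core version. The helpers are illustrative only: they are not the SONiC package manager's code, they do not compare two pre-release identifiers against each other, and they assume a base version of at least 1.0.0.

```python
import re

SEMVER_RE = re.compile(
    r"^(\d+)\.(\d+)\.(\d+)"        # major.minor.patch
    r"(?:-([0-9A-Za-z.-]+))?"      # optional pre-release, e.g. "-rpc"
    r"(?:\+([0-9A-Za-z.-]+))?$"    # optional build metadata, e.g. "+rpc"
)


def precedence_key(version):
    m = SEMVER_RE.match(version)
    if m is None:
        raise ValueError(f"not a semantic version: {version!r}")
    major, minor, patch, pre, _build = m.groups()
    # Build metadata (_build) is ignored. A pre-release gets 0 so it sorts
    # before the release with the same major.minor.patch (1.0.0-rpc < 1.0.0).
    return (int(major), int(minor), int(patch)), 0 if pre is not None else 1


def satisfies_caret(version, base):
    """Simplified ^base check: base <= version < next major (pre-releases excluded)."""
    lower = precedence_key(base)
    upper = ((lower[0][0] + 1, 0, 0), 0)  # like "<2.0.0-0" for ^1.0.0
    return lower <= precedence_key(version) < upper
```

With these rules `satisfies_caret("1.0.0-rpc", "1.0.0")` is False and `satisfies_caret("1.0.0+rpc", "1.0.0")` is True, matching the reasoning in the commit.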
Cosmin-Jinga-MS	dfc1697045	[CBF] Added configuration templates to generate configs for CBF (#8689 ) Updated CBF config packaging [build_templates]: Added default configuration file for CBF [rules]: Added loading rule for CBF config The CBF default config is required to load default start-up config on CBF capable platforms	2021-10-29 17:18:57 -07:00
Sachin Naik	99dcc831f2	[gearbox] Add gbsyncd container for Credo gearbox chips (#9009 ) Enable gbsyncd support for cisco platforms Signed-off-by: Sachin Naik sachnaik@cisco.com Why I did it To enable cisco gbsyncd container for cisco gearbox hardwares. How I did it Create symlink to gbsyncd.service.j2 to start gearbox systemd service. How to verify it Verify that the gbsyncd-cisco container started for x86_64-88_lc0_36fh_mo-r0 Line card root@localhost:/home/cisco# docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 50d309ea9967 docker-sonic-telemetry:latest "/usr/local/bin/supe…" 26 minutes ago Up 6 minutes telemetry 65cebc9e181b docker-sonic-mgmt-framework:latest "/usr/local/bin/supe…" 26 minutes ago Up 6 minutes mgmt-framework 5a9b510da24d docker-snmp:latest "/usr/local/bin/supe…" 26 minutes ago Up 6 minutes snmp c291b0a1fc87 26195cc7c042 "/usr/bin/docker_ini…" 26 minutes ago Up 6 minutes dhcp_relay d85aa5e6b78c docker-router-advertiser:latest "/usr/bin/docker-ini…" 28 minutes ago Up 6 minutes radv 46c787329374 docker-lldp:latest "/usr/bin/docker-lld…" 28 minutes ago Up 6 minutes lldp 6643f53e4ceb docker-gbsyncd-cisco:latest "/usr/local/bin/supe…" 28 minutes ago Up 6 minutes gbsyncd-cisco f05ae8af4aaa docker-syncd:latest "/usr/local/bin/supe…" 28 minutes ago Up 6 minutes syncd 02e0e53b62cf docker-teamd:latest "/usr/local/bin/supe…" 28 minutes ago Up 6 minutes teamd fc7bc2dbb6a9 docker-orchagent:latest "/usr/bin/docker-ini…" 28 minutes ago Up 6 minutes swss 5c5147c986c9 docker-fpm-frr:latest "/usr/bin/docker_ini…" 28 minutes ago Up 6 minutes bgp 63b5ce3d4c80 docker-platform-monitor:latest "/usr/bin/docker_ini…" 28 minutes ago Up 6 minutes pmon 7e6f34dca0e5 docker-database:latest "/usr/local/bin/dock…" 28 minutes ago Up 29 minutes database Signed-off-by: Sachin Naik <sachnaik@cisco.com> Co-authored-by: Sachin Naik <sachnaik@cisco.com>	2021-10-27 12:35:47 +08:00
Stepan Blyshchak	4ad5f2af3f	[swss.sh] fix an issue that dependent services are not read from a file (#8943 ) This is due to the SERVICE variable being declared after reading a file #### Why I did it To fix an issue that dhcp_relay does not restart with swss. #### How I did it Fixed in the swss.sh script #### How to verify it sudo systemctl restart swss verify dhcp_relay restarts as well.	2021-10-26 19:01:30 -07:00
Maxime Lorrillere	81f4fca3dc	Allow database instances on multi-asic linecards to connect to chassis DB (#8583 ) Add code to interfaces-config.sh to configure eth1 in multi-asic containers so that they can access midplane subnet. Co-authored-by: Maxime Lorrillere <mlorrillere@arista.com>	2021-10-26 18:27:09 -07:00
Marty Y. Lok	b91190d82d	[Nokia] Add protobuf and grpc C++ and python lib to support Nokia IXR7250E platform (#8366 ) #### Why I did it Nokia IXR7250E platform requires grpcio, grpcio-tools python library, and libprotobuf-dev, libgrpc++ library #### How I did it Modified build_debian.sh to install libprotobuf-dev and libgrpc++ to support nokia ndk Modified the sonic_debian_extension.j2 to install the grpcio and grpcio-tools in the host Modified the docker-platform-monitor/Dockerfile.js to install grpcio and grpcio-tools for the pmon container. #### How to verify it Image running success.	2021-10-26 18:09:32 -07:00
trzhang-msft	4e0c4fb832	Add service mark_dhcp_packet to mux container (#9015 ) - add a new service "mark_dhcp_packet" to mux container - apply packet marks on a per-interface basis in ebtables - write packet marks to "DHCP_PACKET_MARK" table in state_db	2021-10-26 14:10:13 -07:00
Nazarii Hnydyn	453346f8df	[teamd]: Send USR1/USR2 only to subscribers. (#8856 ) To fix teamd signal handling, without which Process 'tlm_teamd' exited unexpectedly	2021-10-26 09:12:07 -07:00
Sumukha Tumkur Vani	3971c20001	Flush RESTAPI_DB when config reload is performed (#9037 )	2021-10-22 11:45:19 -07:00
Lawrence Lee	d5834fcb1b	Merged PR 4679112: [write_standby]: Ignore non-auto interfaces [write_standby]: Ignore non-auto interfaces * In the event that `write_standby.py` is used to automatically switchover interfaces when linkmgrd or bgp crashes, ignore any interfaces that are not configured to auto-switch Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-10-15 09:59:59 -07:00
Lawrence Lee	17cbfc44e6	Merged PR 4559560: [bgp]: Switch to standby if BGP container exits [bgp]: Switch mux to standby if BGP container exits Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-10-15 09:59:59 -07:00
Lawrence Lee	69bae5b27a	[write_standby]: Improve logging Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-10-15 09:59:59 -07:00
Lawrence Lee	fad5ec47b4	[mux]: Call write_standby from host only Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-10-15 09:59:59 -07:00
Lawrence Lee	5232647b33	[mux]: Make write_standby available on host Signed-off-by: Lawrence Lee <lawlee@microsoft.com> [write_standby]: Cleanup and fix build Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-10-15 09:59:59 -07:00
Tamer Ahmed	b880f9d973	Merged PR 4813977: [mux] Update Service Install With SONiC Target [mux] Update Service Install With SONiC Target A recent PR grouped all SONiC services into sonic.target. The install section of mux.service was not updated, and this causes delays when using config reload because the service's failed state is not reset. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2021-10-15 09:59:59 -07:00
Lawrence Lee	0295c832c2	Merged PR 4366316: [mux.service]: Bind to sonic.target [mux.service]: Bind to sonic.target Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-10-15 09:59:59 -07:00
Tamer Ahmed	bff785ec49	Merged PR 4234524: [mux] Start Mux on Only Dual-ToR Platform [mux] Start Mux on Only Dual-ToR Platform mux docker depends on the presence of mux cable hardware and is supposed to run only on Gemini ToRs. This PR changes the mux feature config in order to enable the mux docker based on device configuration. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2021-10-15 09:59:59 -07:00
Tamer Ahmed	c9c2826520	Merged PR 3845699: [linkmgrd]: Introduce MUX cable linkmgrd Linkmgrd monitors link status, mux status, and link state. If the link becomes unhealthy, linkmgrd will trigger a mux switchover on a standby ToR, ensuring uninterrupted service to servers/blades. This PR is the initial implementation of linkmgrd. Also, the docker-mux container holds packages related to maintaining and managing the mux cable. It currently runs the linkmgrd binary that monitors and switches the mux if needed. This PR also introduces mux-container and starts linkmgrd at startup when the build is configured with INCLUDE_MUX=y Edit: linkmgrd PR will follow. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com> Related work items: #2315, #3146150	2021-10-15 09:59:59 -07:00
Ying Xie	638c287837	[copp] bind copp-config.service to sonic.target (#8969 ) copp-config service needs to be started after sonic.target so that it could render the copp-config with the latest information. It also needs to be restarted when config reload or load_minigraph is invoked. Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2021-10-13 21:07:44 -07:00
liuh-80	7d40384c58	[TACACS+] Add plugin support to bash. (#8660 ) This pull request adds a plugin support library to bash. We will create a TACACS+ plugin for bash in another PR, which will bring the per-command authorization feature to bash. Why I did it To support TACACS per-command authorization, we check the user command before executing it. How I did it Add plugin support to bash. How to verify it UT with CUnit under bash project cover all new code in plugin.c. Also pass all current UT. Which release branch to backport (provide reason below if selected) N/A Description for the changelog Add plugin support to bash.	2021-10-11 15:20:51 +08:00
Ashok Daparthi-Dell	6cbdf11e53	SONIC QOS YANG - Remove qos tables field value reference format (#7752 ) Depends on Azure/sonic-utilities#1626 Depends on Azure/sonic-swss#1754 QOS tables in config db used the ABNF format, i.e. "[TABLE_NAME|name]", to refer a field value to other qos tables. Example: Config DB: "Ethernet92|3": { "scheduler": "[SCHEDULER|scheduler.1]", "wred_profile": "[WRED_PROFILE|AZURE_LOSSLESS]" }, "Ethernet0|0": { "profile": "[BUFFER_PROFILE|ingress_lossy_profile]" }, "Ethernet0": { "dscp_to_tc_map": "[DSCP_TO_TC_MAP|AZURE]", "pfc_enable": "3,4", "pfc_to_queue_map": "[MAP_PFC_PRIORITY_TO_QUEUE|AZURE]", "tc_to_pg_map": "[TC_TO_PRIORITY_GROUP_MAP|AZURE]", "tc_to_queue_map": "[TC_TO_QUEUE_MAP|AZURE]" }, This format is not consistent with the other DB schemas followed in SONiC, and this reference in the DB is not required, since it is taken care of by the YANG "leafref". Removed this format from all platform files to be consistent with the other SONiC DB schemas. Example: "Ethernet92|3": { "scheduler": "scheduler.1", "wred_profile": "AZURE_LOSSLESS" }, Dependent pull requests: #7752 - To modify platform files #7281 - Yang model Azure/sonic-utilities#1626 - DB migration Azure/sonic-swss#1754 - swss change to remove ABNF format	2021-09-28 09:21:24 -07:00
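A minimal Python sketch of the value transformation described above: a legacy reference such as `[SCHEDULER|scheduler.1]` becomes the bare key `scheduler.1`, and non-reference values like `"3,4"` are left alone. This is a hypothetical illustration, not the sonic-utilities DB migration code, and it only handles a single reference per field.

```python
import re

# Matches the legacy "[TABLE_NAME|key]" reference format.
_ABNF_REF = re.compile(r"^\[([A-Z0-9_]+)\|([^\]]+)\]$")


def strip_abnf_reference(value):
    """Return the bare key from "[TABLE|key]", or the value unchanged."""
    m = _ABNF_REF.match(value)
    return m.group(2) if m else value


def migrate_fields(fields):
    """Apply the conversion to every field of one config DB entry."""
    return {name: strip_abnf_reference(v) for name, v in fields.items()}
```

Applied to the `Ethernet92|3` entry from the example, this produces `{"scheduler": "scheduler.1", "wred_profile": "AZURE_LOSSLESS"}`.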
Vaibhav Hemant Dixit	ee9250e8cc	Save DB dump after warm/fast reboot (#8803 ) As a part of warmboot, redis database is dumped: `c97fe546e5/scripts/fast-reboot (L269)` However, this dump file is deleted, after it is loaded back into db post reboot. The DB dump can be useful for debugging purpose, hence taking a backup of it can be useful. Instead of deleting the dump, rename and keep the dump.	2021-09-23 23:53:22 -07:00
kellyyeh	62a1f5eb19	Add CLI Support for IPv6 Helpers and DHCPv6 Relay Counters (#8593 )	2021-09-23 22:01:26 -07:00
abdosi	13ec43bc68	[baseimage]: Logrotate for wtmp and btmp files. (#8743 ) Added logrotate file for wtmp and btmp to override default conf and set size cap as 100K as done in PR: #865. For buster this is controlled by the separate files wtmp and btmp. Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2021-09-15 23:28:27 -07:00
Sudharsan Dhamal Gopalarathnam	db529af203	Removing execute permission from copp config file (#8680 ) *Removed execute permissions from the systemd copp-config.service file. Without this we will get a warning: "Configuration file /lib/systemd/system/copp-config.service is marked executable. Please remove executable permission bits. Proceeding anyway."	2021-09-13 09:10:21 -07:00
Ying Xie	41643a9729	[202012][fstrim] delay fstrim timer after sonic.target (#8737 ) Why I did it fstrim has dependency on pmon docker. How I did it start fstrim timer after sonic.target. How to verify it local test and PR test. Signed-off-by: Ying Xie ying.xie@microsoft.com	2021-09-13 07:37:46 -07:00
byu343	50a9587e6e	[gbsyncd] Flush GB_ASIC_DB for gbsyncd cold restart (#8633 ) This is to flush the state in GB_ASIC_DB when running 'config reload'. Otherwise, the left state affects the cold restart of gbsyncd.	2021-08-31 15:52:48 -07:00
Samuel Angebault	57e7b941ab	[Arista] Fix flash size computation for Lodoga (#8622 ) The Lodoga platform also matched crow, which was hardcoding the flash size to 3700. This change enables autodetect on Clearlake, which in turn allows autodetect for Lodoga. The threshold was bumped from 3700 to 4000 because the size computation can differ slightly and report slightly above 3700.	2021-08-30 15:26:56 -07:00
Samuel Angebault	48ba459f9f	[Arista] Rely on automatic flash size detection for Lodoga (#8608 ) Lodoga actually has an 8GB storage device. The LodogaSsd variant has a 30GB SSD drive. However, in boot0 both were mishandled and assigned 4GB for legacy reasons. Remove the hardcoding of the flash size and let boot0 autodetect the available space.	2021-08-26 19:02:10 -07:00
dflynn-Nokia	7bae388e2f	[Nokia ixs7215] Add support for changing the console baud rate (#8595 ) This commit adds support for changing the default console baud rate configured within the U-Boot bootloader. That default baud rate is exposed via the value of the U-Boot 'baudrate' environment variable. This commit removes logic that hardcoded the console baud rate to 115200 and instead ensures that the U-Boot 'baudrate' variable is always used when constructing the Linux kernel boot arguments used when booting Sonic. A change is also made to rc.local to ensure that the specified baud rate is set correctly in the serial getty service.	2021-08-26 07:14:34 -07:00
byu343	cdfb4855dc	[macsec] Add eapol to copp config (#8416 ) This change enables the control packets of MACsec to be processed by CPU.	2021-08-23 18:56:23 -07:00
Volodymyr Samotiy	e3a30deea9	[monit] Periodically monitor VNET route consistency (#8266 ) To run VNET route consistency check periodically. For any failure, the monit will raise alert based on return code. Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>	2021-08-19 16:29:25 -07:00
abdosi	2348794ef0	Enable sysctl fib_multipath_use_neigh (#8502 ) Enable fib_multipath_use_neigh for v4 https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt Why I did it: Without this option, the kernel can forward multipath traffic to a next hop whose neighbor is unreachable. With this option enabled, the kernel uses the neighbor state when selecting the next hop for multipath routes, so only valid neighbors are used.	2021-08-18 15:53:17 -07:00
Stephen Sun	c895677507	Use predefined macro as vendor information (#8361 ) #### Why I did it Use a predefined variable to get vendor information when the swss docker container is created #### How I did it Use `{{ sonic_asic_platform }}` instead of `$SONIC_CFGGEN -y /etc/sonic/sonic_version.yml -v asic_type` #### How to verify it Manually test.	2021-08-16 00:36:48 -07:00
Ying Xie	71e8b0caed	[aboot] use ram partition for /var/log for devices with 3.7G disks (#8400 ) Master/202012 image size grew quite a bit. 3.7G harddrive can no longer hold one image and safely upgrade to another image. Every bit of harddrive space is precious to save now. Also sh syntax seemingly changed, [ condition ] && action was a legit syntax in 201911 branch but it is an error when condition not met with 202012 or later images. Change the syntax to if statement to avoid the issue. Signed-off-by: Ying Xie ying.xie@microsoft.com	2021-08-13 09:01:34 -07:00
Vladyslav Morokhovych	80e0627acc	[swss] Fix arp_update script (#8412 ) Fix #7968 Issue is detected on SONiC.20201231.11 In test_static_route.py::test_static_route_ecmp static routes are configured, but neighbors are not resolved after config reload even after 10 minutes. It looks like the arp_update script is starting to ping when Vlan1000 is not fully configured. When issue is reproduced, stuck ping6 process is observed in swss container : USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 180 0.1 0.0 6296 1272 pts/0 S 17:03 0:03 ping6 -I Vlan1000 -n -q -i 0 -c 1 -W 0 ff02::1 And when arp_update script successfully resolves neighbors, we observe sleep 300 instead of ping process	2021-08-12 23:29:22 -07:00
Saikrishna Arcot	c8b5daed27	Upgrade to ifupdown2 3.0.0 with a patch to fix using broadcast addresses In version 3.0.0, if a broadcast address is specified in /etc/network/interfaces, then when ifup is run, it will fail with an error saying `'str' object has no attribute 'packed'`. This appears to be because it expects all attributes for an interface to be "packable" into a compact binary representation. However, it doesn't actually convert the broadcast address into an IPNetwork object (other addresses are handled). Therefore, convert the broadcast address it reads in from a str to an IPNetwork object. Also explicitly specify the scope of the loopback address in /etc/network/interfaces as host scope. Otherwise, it will get added as global scope by default. As part of this, use JSON to parse ip's output instead of text, for robustness. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2021-08-12 23:18:01 -07:00
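A minimal Python sketch of both ideas in this commit: parsing `ip -json addr show` output instead of text, and converting the broadcast string into an address object so that, like the other addresses, it exposes `.packed`. ifupdown2 uses its own IPNetwork class; the standard-library `ipaddress` module is swapped in here as a stand-in, and the sample JSON is abbreviated, hand-written data rather than captured output.

```python
import ipaddress
import json

# Abbreviated, hand-written sample in the shape of `ip -json addr show` output.
SAMPLE = """[
  {"ifname": "lo",
   "addr_info": [{"family": "inet", "local": "127.0.0.1",
                  "prefixlen": 8, "scope": "host"}]},
  {"ifname": "eth0",
   "addr_info": [{"family": "inet", "local": "10.0.0.5", "prefixlen": 24,
                  "broadcast": "10.0.0.255", "scope": "global"}]}
]"""


def addresses(ip_json):
    """Return (ifname, interface address, scope, broadcast address or None) tuples."""
    result = []
    for link in json.loads(ip_json):
        for info in link.get("addr_info", []):
            iface = ipaddress.ip_interface(f'{info["local"]}/{info["prefixlen"]}')
            brd = info.get("broadcast")
            # Convert the broadcast str to an address object instead of keeping the
            # raw string, which has no .packed attribute.
            brd_addr = ipaddress.ip_address(brd) if brd else None
            result.append((link["ifname"], iface, info.get("scope"), brd_addr))
    return result
```

Reading the `scope` field from JSON also makes it easy to check that the loopback address ended up with host scope.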
Stepan Blyshchak	14da7a1663	[sonic_debian_extension.j2] export DOCKER_HOST so that clients can use it to connect to dockerd (#8398 ) Use DOCKER_HOST. Every client including docker command and python docker API uses this environment variable to connect to dockerd. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2021-08-10 11:11:45 -07:00
lguohan	cf73e22d52	[build]: add branch and release name in sonic_version.yml (#6356 ) The branch refers to the branch name that the commit is in, for example master, 202012, 201911, ... If there is no branch, the name will be HEAD. The release is encoded in the /etc/sonic/sonic_release file. The file is only available for a release branch; it is not available in the master branch. Example for the master branch ``` build_version: 'master.602-6efc0a88' debian_version: '10.7' kernel_version: '4.19.0-9-2-amd64' asic_type: vs commit_id: '6efc0a88' branch: 'master' release: 'none' build_date: Tue Dec 29 06:54:02 UTC 2020 build_number: 602 built_by: johnar@jenkins-worker-23 ``` Example for the 202012 release branch ``` build_version: '202012.602-6efc0a88' debian_version: '10.7' kernel_version: '4.19.0-9-2-amd64' asic_type: vs commit_id: '6efc0a88' branch: '202012' release: '202012' build_date: Tue Dec 29 06:54:02 UTC 2020 build_number: 602 built_by: johnar@jenkins-worker-23 ``` Signed-off-by: Guohan Lu <lguohan@gmail.com>	2021-08-08 20:44:02 -07:00
Guohan Lu	0b155c003e	[build]: Fix docker pull on armhf platform armhf build uses native dockerd Signed-off-by: Guohan Lu <lguohan@gmail.com>	2021-08-06 23:33:40 -07:00
Longxiang Lyu	6283716e1d	[swss][arp_update] Send ipv6 pings over vlan sub interfaces (#8363 ) #### Why I did it * `arp_update` fails to ping those neighbors over vlan sub interfaces. #### How I did it * modify `arp_update_vars.j2` to get vlan sub interfaces with ipv6 addresses assigned. * modify `arp_update` to send ipv6 pings over those retrieved vlan sub interfaces. Signed-off-by: Longxiang Lyu <lolv@microsoft.com>	2021-08-06 21:14:18 -07:00
byu343	2fccf0661a	[gearbox] Add gbsyncd container for Credo gearbox chips (#8144 ) This change is to add a gbsyncd container to accommodate the syncd process and the SAI libraries for the Credo gearbox chips. How I did it This container works similarly to the existing Broadcom syncd container. Its main difference is that the SAI-related dynamic libraries are replaced by the ones for Credo gearbox chips, and the container only reacts to SAI events for the gearbox chips. The SAI libraries will be provided by the package libsai-credo_1.0_amd64.deb. For the image build, the added container will be built and included in the Broadcom platform image after $(LIBSAI_CREDO)_URL = is set to the correct value. For now, as $(LIBSAI_CREDO)_URL is empty, the container build is skipped in the image build. After the container is included in the image, at runtime the container will begin by checking for the existence of /usr/share/sonic/hwsku/gearbox_config.json; if that file is not provided, the container will exit by itself. Therefore, platforms unrelated to the Credo chips will not be affected by this change as long as they do not provide the file.	2021-08-04 16:05:53 -07:00
VenkatCisco	d294807b6b	[baseimage]: add j2cli to sonic_debian_extension.j2 (#8019 ) j2cli provides access to jinja library. cisco platform.py requires j2cli to handle jinja template configuration files.	2021-08-03 18:06:59 -07:00
vdahiya12	498422968f	[pmon] create and mount firmware directory on PMON for firmware upgrade support on muxcable (#8283 ) This PR creates a directory named firmware on the HOST at /usr/share/sonic/firmware and mounts it into the PMON container at the same path, /usr/share/sonic/firmware. This is required for firmware upgrade support for muxcable because, by design, all Y-Cable APIs are currently called by xcvrd. As such, if the CLI has to transfer a file to PMON, we need to mount a directory from the host to PMON just for getting the firmware files. Hence we require this change. Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>	2021-08-03 09:39:39 -07:00
mprabhu-nokia	3fd6e8d500	[systemd] ASIC status based service bringup on VOQ chassis (#7477 ) Changes to allow starting per-asic services like swss and syncd only if the platform vendor code detects the asic and notifies that it is detected. The systemd services ordering we want is database->database@->pmon->swss@->syncd@->teamd@->lldp@ There is also a requirement that management, telemetry, and snmp dockers can start even if not all asic services are up. Why I did it For VOQ chassis, the fabric cards will have 1-N asics. Also, there could be multiple removable fabric cards. On the supervisor, swss and syncd containers need to be started only if the fabric-card is in Online state and the respective asics are detected by the kernel. Using systemd, the dependent services can be in inactive state. How I did it Introduce a mechanism where all ASIC-dependent services wait for their state to be published via PMON to REDIS. Once the subscription is received, the service proceeds to create the respective dockers. For fixed platforms, systemd is unchanged, i.e. the service bringup and docker creation happen in the start()/ExecStartPre routine of the .sh scripts. For VOQ chassis platforms on the supervisor, the service bringup skips docker creation in the start() routine, but does it in the wait()/ExecStart routine of the .sh scripts. Management dockers are decoupled from ASIC docker creation.	2021-07-27 23:02:49 -07:00
賓少鈺	aa59bfeab7	[PDE]: introduce the SONiC Platform Development Env (#7510 ) The PDE silicon test harness and platform test harness can be found in src/sonic-platform-pdk-pde	2021-07-24 16:24:43 -07:00
Renuka Manavalan	3a96eb933e	Get Docker proxy info from config (#8205 ) This helps not to hard code the docker proxy IP, but take it from config file during build time.	2021-07-19 21:17:47 -07:00
Renuka Manavalan	c5dff0c640	Revert "Revert "[Kubernetes]: The kube server could be used as http-proxy for docker (#7469 )" (#8023 )" (#8158 ) This reverts commit `7236fa98e8`. Restore original PR #7469	2021-07-15 19:48:55 -07:00
Stepan Blyshchak	b3b6938fda	[dhcp-relay] make DHCP relay an extension (#6531 ) - Why I did it Make the DHCP relay docker an extension. DHCP relay now carries the DHCP relay commands CLI plugin and has a complete manifest. It is installed as an extension if INCLUDE_DHCP_RELAY is set to y. DEPENDS on #5939 - How I did it Modify the DHCP relay docker makefile and dockerfile. Make changes to sonic_debian_extension.j2 to install sonic packages. I moved DHCP-related CLI tests from sonic-utilities to the DHCP relay docker. This PR introduces a way to write a plugin as part of a docker image and run the tests from the cli-plugin-tests directory under the docker directory. The test result is available in target/docker-dhcp-relay.gz.log: [ REASON ] : target/docker-dhcp-relay.gz does not exist NON-EXISTENT PREREQUISITES: docker-start target/docker-config-engine-buster.gz-load target/python-wheels/sonic_utilities-1.2-py3-none-any.whl-install target/debs/buster/python3-swsscommon_1.0.0_amd64.deb-install [ FLAGS FILE ] : [] [ FLAGS DEPENDS ] : [] [ FLAGS DIFF ] : [] ============================= test session starts ============================== platform linux -- Python 3.7.3, pytest-3.10.1, py-1.7.0, pluggy-0.8.0 -- /usr/bin/python3 cachedir: .pytest_cache rootdir: /sonic/dockers/docker-dhcp-relay/cli-plugin-tests, inifile: plugins: cov-2.6.0 collecting ... collected 10 items test_config_dhcp_relay.py::TestConfigVlanDhcpRelay::test_plugin_registration PASSED [ 10%] test_config_dhcp_relay.py::TestConfigVlanDhcpRelay::test_config_vlan_add_dhcp_relay_with_nonexist_vlanid PASSED [ 20%] test_config_dhcp_relay.py::TestConfigVlanDhcpRelay::test_config_vlan_add_dhcp_relay_with_invalid_vlanid PASSED [ 30%] test_config_dhcp_relay.py::TestConfigVlanDhcpRelay::test_config_vlan_add_dhcp_relay_with_invalid_ip PASSED [ 40%] test_config_dhcp_relay.py::TestConfigVlanDhcpRelay::test_config_vlan_add_dhcp_relay_with_exist_ip PASSED [ 50%] test_config_dhcp_relay.py::TestConfigVlanDhcpRelay::test_config_vlan_add_del_dhcp_relay_dest PASSED [ 60%] test_config_dhcp_relay.py::TestConfigVlanDhcpRelay::test_config_vlan_remove_nonexist_dhcp_relay_dest PASSED [ 70%] test_config_dhcp_relay.py::TestConfigVlanDhcpRelay::test_config_vlan_remove_dhcp_relay_dest_with_nonexist_vlanid PASSED [ 80%] test_show_dhcp_relay.py::TestVlanDhcpRelay::test_plugin_registration PASSED [ 90%] test_show_dhcp_relay.py::TestVlanDhcpRelay::test_dhcp_relay_column_output PASSED [100%] =============================== warnings summary =============================== /usr/local/lib/python3.7/dist-packages/tabulate.py:7 /usr/local/lib/python3.7/dist-packages/tabulate.py:7: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working from collections import namedtuple, Iterable -- Docs: https://docs.pytest.org/en/latest/warnings.html ==================== 10 passed, 1 warnings in 0.35 seconds =====================	2021-07-15 10:35:56 -07:00
Blueve	3da6f12b0b	[port_config] Introduce ad-hoc mport_config.json file (#8066 ) Signed-off-by: Jing Kan jika@microsoft.com	2021-07-15 08:56:35 +08:00
Stepan Blyshchak	3a2b8c6ba5	[SONiC Application Extension] support warm/fast reboot for extension packages (#7286 ) #### Why I did it I made this change to support warm/fast reboot for SONiC extension packages as per HLD Azure/SONiC#682. #### How I did it I extended manifest.json.j2 with new warm/fast reboot related fields and also extended sonic_debian_extension.j2 script template to generate the shutdown order files for warm and fast reboot.	2021-07-11 06:58:05 -07:00
Stepan Blyshchak	a294bfb03b	[sonic_debian_extension] fix packages.json generation and make the build fail when packages.json is not generated (#8044 ) After https://github.com/Azure/sonic-buildimage/pull/7598 the packages.json generation is broken. This change fixes it and makes the whole build fail in case the generation fails. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2021-07-09 12:29:33 -07:00
shlomibitton	776a446d76	[dhcp_relay] Disable dhcp_relay for ToRRouter switches type by the feature manager (#7789 ) - Why I did it Currently, DHCP packets are disabled by the COPP manager for non-ToRRouter switch types. Even if the feature is enabled, DHCP packets won't reach the CPU, since the COPP manager will not trap them. This change disables dhcp_relay by default for non-ToRRouter switches via init_cfg.json. With this approach, if the user wants to enable the feature on non-ToRRouter switches, manual enablement is required through the 'feature' configuration. This keeps the current approach for the MSFT production issue with dhcp relay on non-ToRRouter switches and lets the user decide whether to use it. - How I did it Configure dhcp_relay as 'disabled' by default in init_cfg.json for non-ToRRouter switches. Remove the exclusion of dhcp packets from copp_cfg.json - How to verify it Enable the dhcp_relay feature on a non-ToRRouter switch. Unit tests were modified so the default value for dhcp_relay in the mocked CONFIG DB in 'test_vectors.py' is 'disabled'. This follows from the change to 'init_cfg.json.j2'. For ToRRouter the state will change from 'disabled' to 'enabled'. Another test case was added for a 'ToR' switch type to test that the state is 'enabled' if the user configured it so.	2021-07-08 09:10:46 +03:00
rajendra-dendukuri	f4b0c8fe4e	[kdump] Fix kdump error message when a reboot is issued (#7985 ) dash doesn't support the += operator to append to a variable's value. Use KDUMP_CMDLINE_APPEND="${KDUMP_CMDLINE_APPEND} " instead. The below error message is seen when a reboot is issued. [ 342.439096] kdump-tools[13655]: /etc/init.d/kdump-tools: 117: /etc/default/kdump-tools: KDUMP_CMDLINE_APPEND+= panic=10 debug hpet=disable pcie_port=compat pci=nommconf sonic_platform=x86_64-accton_as7326_56x-r0: not found	2021-07-01 11:52:38 -07:00
Samuel Angebault	17f0217f30	[Arista] Chassis device configurations (#7529 ) Add configurations for the following chassis elements Fabrics 7804R3-FM, 7808R3-FM and 7808R3A-FM Linecard 7800R3-48CQ2 Supervisor 7800-SUP*	2021-06-30 18:16:20 -07:00
vganesan-nokia	ffca17da0b	[voqsystemlagid] Fix for timing issue in setting system lag id boundary (#7911 ) The voq system lag id boundary is set in redis-chassis. Changes include setting this from the database-chassis container. This fixes a timing issue in finding the database_config.json file in the redis directory, which is created by the database container. Since the database container usually starts after the database-chassis container, the existence of this file is unreliable while running the command. Running the command under the database-chassis container makes sure that database_config.json from the redis-chassis directory is guaranteed to be available, and hence fixes the timing issue. Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com>	2021-06-30 09:57:08 -07:00
Ying Xie	7236fa98e8	Revert "[Kubernetes]: The kube server could be used as http-proxy for docker (#7469 )" (#8023 ) This change causes the nightly test to fail because the fake proxy IP is not reachable. Reverts #7469 This reverts commit `f7ed82f44a`.	2021-06-29 18:43:53 -07:00
Stepan Blyshchak	9de7e6860b	[sonic-app-ext] support app extensions installation during build (#7593 ) Signed-off-by: Stepan Blyschak stepanb@mellanox.com Why I did it To support building DHCP relay as extension and installing it during build time. How I did it Created infrastructure. Users need to define their packages in rules/sonic-packages.mk How to verify it Together with #6531	2021-06-29 09:07:33 -07:00
Stepan Blyshchak	9ce7c6d9fe	[hostcfgd] Configure service auto-restart in hostcfgd. (#5744 ) Before this change, a process running inside every SONiC container read the FEATURE table 'auto_restart' field and, depending on its value, decided whether the container had to be killed. If it was killed, the service auto-restart mechanism restarted the container. This change moves the logic from the container to the host daemon - hostcfgd. The 'auto_restart' handling is kept in supervisor-proc-exit-listener, but it is no longer required for a container that wants to support the auto-restart feature. hostcfgd refactoring - move feature handling into another class. override systemd service Restart= setting from hostcfgd. remove default systemd Restart=always. Signed-off-by: Stepan Blyshchak stepanb@nvidia.com - Why I did it Remove the need to deal with container orchestration logic from the container itself. Leave this logic to the orchestrator - host OS. - How I did it hostcfgd configures the 'Restart=' value for the systemd service. - How to verify it root@r-tigon-11:/home/admin# sudo config feature autorestart lldp enabled root@r-tigon-11:/home/admin# show feature status | grep lldp lldp enabled enabled root@r-tigon-11:/home/admin# docker exec -it lldp pkill -9 lldpd root@r-tigon-11:/home/admin# docker ps -a | grep lldp 65058396277c docker-lldp:latest "/usr/bin/docker-lld…" 2 days ago Exited (0) 20 seconds ago lldp root@r-tigon-11:/home/admin# docker ps -a | grep lldp 65058396277c docker-lldp:latest "/usr/bin/docker-lld…" 2 days ago Up 5 seconds lldp root@r-tigon-11:/home/admin# sudo config feature autorestart lldp disabled root@r-tigon-11:/home/admin# docker exec -it lldp pkill -9 lldpd root@r-tigon-11:/home/admin# docker ps -a | grep lldp 65058396277c docker-lldp:latest "/usr/bin/docker-lld…" 2 days ago Up 35 seconds lldp root@r-tigon-11:/home/admin# docker ps -a | grep lldp 65058396277c docker-lldp:latest "/usr/bin/docker-lld…" 2 days ago Exited (0) 3 seconds ago lldp root@r-tigon-11:/home/admin# docker ps -a | grep lldp 65058396277c docker-lldp:latest "/usr/bin/docker-lld…" 2 days ago Exited (0) 39 seconds ago lldp root@r-tigon-11:/home/admin#	2021-06-29 09:06:21 -07:00
xumia	5c503b81ae	Fix vtysh shell-injection security issue (#7759 ) Fix vtysh shell-injection security issue Only expose a limited set of parameters for the vtysh show command.	2021-06-28 09:57:08 +08:00
Santhosh Kumar T	f8eb5b0958	Flashrom refactoring for broadcom platforms (#7693 ) #### Why I did it - To build flashrom properly with dependency tracking. #### How I did it - Moved flashrom code from platform/broadcom/sonic-platform-modules-dell/tools directory to src/flashrom directory. - At the end, a flashrom_0.9.7_amd64.deb package is built, which will be installed on the devices. - Currently flashrom builds only for Dell S6100 platforms.	2021-06-22 15:29:21 -07:00
judyjoseph	3ad830eb49	New sonic-buildimage images for Broadcom DNX ASIC family. (#7598 ) Introduce new sonic-buildimage images for Broadcom DNX ASIC family. sonic-broadcom-dnx.bin sonic-aboot-broadcom-dnx.swi How I did it NO CHANGE to existing make commands make init; make configure PLATFORM=broadcom; make target/sonic-aboot-broadcom.swi; make target/sonic-broadcom.bin The difference now is that it will result in new broadcom images for DNX asic family as well. sonic-broadcom.bin, sonic-broadcom-dnx.bin sonic-aboot-broadcom.swi, sonic-aboot-broadcom-dnx.swi Note: This PR also adds support for Broadcom SAI 5.0 (based on 1.8 SAI ) for DNX based platform + changes in platform x86_64-arista_7280cr3_32p4 bcm config files and platform_env.conf files	2021-06-22 11:12:22 -07:00
Kebo Liu	078e0e0410	mount 'mellanox' folder only instead of create each sub folder (#7830 ) #### Why I did it Following the discussion in another PR https://github.com/Azure/sonic-buildimage/pull/7708#discussion_r642933510 , since there will be multiple subfolders under /var/log/mellanox, we agreed to only mount this folder; the subfolders will be created afterward on demand. #### How I did it During the syncd docker creation, only mount the folder /var/log/mellanox #### How to verify it Build a Mellanox image and verify the related folder on the host and docker side.	2021-06-22 06:27:56 -07:00
Sudharsan Dhamal Gopalarathnam	c88c3c7ba5	Grouping delayed services under a target for config reload checks (#7846 ) #### Why I did it Create a target for delayed service timers. A few services in SONiC are delayed to speed up the bring-up of the system and essential services. However, there is no way to track when they start. This is a problem when executing config reload, as config reload expects all services to be up. Hence all the timers that trigger the delayed services are grouped under one target so that they can be tracked by the 'config reload' command. #### How I did it Created the delay.target unit and set up the dependency between it and the delayed service timers.	2021-06-21 11:55:02 -07:00
Sujin Kang	ecc5073731	Support multiple pcie configuration file and change the pcie status table name to match with pcied changes (#7886 ) Why I did it Support multiple pcie configuration files and change the pcie status table name. This is to match the two PRs below. Azure/sonic-platform-common#195 Azure/sonic-platform-daemons#189 How I did it Check for the pcie configuration file with a wildcard and change the device status table name How to verify it Restart with the changes and see if the pcie check works as expected.	2021-06-16 16:05:48 -07:00
Ann Pokora	3d629233bf	[MPLS][libnl3] libnl patches for supporting MPLS * New accessors in libnl3 for MPLS attributes * contains patch files for bug fixes in libnl3 for MPLS attribute parsing	2021-06-16 15:08:23 -07:00
Renuka Manavalan	f7ed82f44a	[Kubernetes]: The kube server could be used as http-proxy for docker (#7469 ) Why I did it The SONiC switches get their docker images from a local repo, populated during install with container images pre-built into SONiC FW. With the introduction of kubernetes, new docker images available in a remote repo could be deployed. This requires dockerd to be able to pull images from the remote repo. Depending on the switch network domain and config, it may or may not be able to reach the remote repo. In the case where the remote repo is unreachable, we could potentially make the Kubernetes server also act as an http-proxy. How I did it When the admin explicitly enables it, the kubernetes-server could be configured as docker-proxy. But any update to docker-proxy has to be done via a service-conf file environment variable, implying a "service restart docker" is required. But restarting dockerd is very expensive, as it would restart all dockers, including the database docker. To avoid a dockerd restart, pre-configure an http_proxy using an unused IP. When the k8s server is enabled to act as http-proxy, an IP table entry would be created to direct all traffic for the configured unused proxy IP to the kubernetes-master IP. This way any update to the Kubernetes master config would just be a matter of manipulating IPTables, which is transparent to all modules until dockerd needs to download from the remote repo. How to verify it Configure a switch such that the image repo is unreachable Pre-configure dockerd with http_proxy.conf using an unused IP (e.g. 172.16.1.1) Update ctrmgrd.service to invoke ctrmgrd.py with "-p" option. Configure a k8s server, and deploy an image for a feature with set_owner="kube" Check whether the switch could successfully download the image.	2021-06-16 07:46:01 -07:00
arlakshm	4d07bbbec6	[Yang][cfggen] update sonic-cfggen to generate config_db from Yang data (#7712 ) Why I did it This PR adds changes in sonic-config-engine to consume configuration data in SONiC Yang schema and generate config_db entries How I did it Add a new file sonic_yang_cfg_generator . This file has the functions to parse yang data json and convert it into config_db json format. Validate the converted config_db entries to make sure all the dependencies and constraints are met. Add a new option -Y to the sonic-cfggen command for this purpose Add unit tests This capability is supported only in the sonic-config-engine Python3 package.	2021-06-10 12:03:33 -07:00
yozhao101	1a3cab43ac	[Monit] Deprecate the feature of monitoring the critical processes by Monit (#7676 ) Signed-off-by: Yong Zhao yozhao@microsoft.com Why I did it Currently we leveraged the Supervisor to monitor the running status of critical processes in each container and it is more reliable and flexible than doing the monitoring by Monit. So we removed the functionality of monitoring the critical processes by Monit. How I did it I removed the script process_checker and corresponding Monit configuration entries of critical processes. How to verify it I verified this on the device str-7260cx3-acs-1.	2021-06-04 10:16:53 -07:00
Renuka Manavalan	73447efc31	Add service to restore TACACS from old config (#7560 ) Why I did it In upgrade scenarios where config_db.json is not carried forward to the new image, it could be left w/o TACACS credentials. Added a service to trigger 5 minutes after boot and restore TACACS, if /etc/sonic/old_config/tacacs.json is present. How I did it By adding a service that fires 5 mins after boot. This service applies tacacs if available. How to verify it Upgrade and watch the status of tacacs.timer & tacacs.service You may create /etc/sonic/old_config/tacacs.json with updated credentials (before 5 mins after boot) and see that it appears in the config and is persisted too. Which release branch to backport (provide reason below if selected) 201911 202006 202012	2021-06-03 20:07:17 -07:00
Andriy Kokhan	6931a45ecf	Fixed typos in config-setup (#7754 ) Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>	2021-06-03 08:59:38 -07:00
yozhao101	37863ac854	[Monit] Restart telemetry container if memory usage is beyond the threshold (#7645 ) Signed-off-by: Yong Zhao yozhao@microsoft.com Why I did it This PR aims to monitor the memory usage of streaming telemetry container and restart streaming telemetry container if memory usage is larger than the pre-defined threshold. How I did it I borrowed the system tool Monit to run a script memory_checker which will periodically check the memory usage of streaming telemetry container. If the memory usage of telemetry container is larger than the pre-defined threshold for 10 times during 20 cycles, then an alerting message will be written into syslog and at the same time Monit will run the script restart_service to restart the streaming telemetry container. How to verify it I verified this implementation on device str-7260cx3-acs-1.	2021-05-28 11:13:44 -07:00
Stepan Blyshchak	d7b96dfdf1	[sonic-sdk] add sonic sdk and sonic sdk buildenv (#6712 ) - Why I did it To give SONiC Application Extension developers an environment to run and develop their apps. - How I did it Created sonic-sdk and sonic-sdk-buildenv dockers and their dbg versions. - How to verify it Build: $ make -f slave target/sonic-sdk.gz target/sonic-sdk-buildenv.gz	2021-05-28 10:16:02 -07:00
Renuka Manavalan	2cd61bc136	Invoke disk check periodically. (#7374 ) Why I did it Helps with periodic scanning of the disk for RO state. If found, this script applies a transient fix and raises an error message.	2021-05-26 17:59:08 -07:00
Lawrence Lee	79914f5336	[swss.service]: Remove ordering with pmon (#7614 ) Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-05-26 09:12:54 -07:00
Prince Sunny	556a1dc9a8	[Mux] Do not clean-up HW_MUX_CABLE_TABLE from State DB (#7710 ) Co-authored-by: Ubuntu <prsunny@prince-vm.vzw1i4tqyeburcdz5lrgulxi2c.yx.internal.cloudapp.net>	2021-05-26 09:12:34 -07:00
shlomibitton	9930e738fe	Remove 'vm.panic_on_oom=1' (#7678 ) #### Why I did it If a process limits the nodes it uses via mempolicy/cpusets, and those nodes reach memory exhaustion, one process may be killed by the oom-killer. No panic occurs in this case, because other nodes' memory may be free. This means the overall system state may not be fatal yet. #### How I did it Remove 'vm.panic_on_oom=1' kernel flag from 'vmcore-sysctl.conf'	2021-05-24 17:23:49 -07:00
Alexander Allen	da7533aad4	[ntp] Fix ntp.conf template to allow setting of source port in CONFIG_DB (#7586 ) Why I did it Currently, there is a bug in the ntp.conf jinja2 template where it will ignore the src_intf directive in CONFIG_DB if there are multiple IP addresses associated with an interface. This code change fixes that bug and allows the template to select the correct source interface for NTP. How I did it I did this by modifying the macro in ntp.conf.j2 which determines if there is an ip address associated with an interface to set a state variable when it detects a valid interface entry in CONFIG_DB instead of outputting "true" directly (which could result in multiple "trues" outputted for interfaces with multiple valid IP addresses). How to verify it Add two ipv4 addresses to an interface in SONiC Add the following configuration to config_db.json { "NTP": { "global": { "src_intf": "Ethernet1" } } } Replace Ethernet1 with the interface name of the one you assigned the IP addresses to. Run sudo config reload -y Open /etc/ntp.conf and verify that the following line exists ... interface listen Ethernet1 ... The interface specified should be the one set in the previous steps. Description for the changelog [ntp] Fix ntp.conf template to allow setting of source port in CONFIG_DB	2021-05-23 13:40:43 -07:00
Neetha John	3b06f44555	[qos]: modify dot1p to tc mapping (#7661 ) Map priority 0 to TC 1 and priority 1 to TC 0 Send traffic on priority 0 and 1 and verified that it gets mapped correctly in hw Signed-off-by: Neetha John <nejo@microsoft.com>	2021-05-20 10:36:39 -07:00
Sujin Kang	c6462577a9	add config-setup.service as dependency for pcie-check.service (#7599 ) Why I did it start pcie-check.service after config-setup.service since pcie_util depends on device_info which is available with config db metadata. How I did it Add config-setup.service as a dependency of pcie-check.service How to verify it Upon reboot, check if the pcie-check.sh throws the platform api error which is dependent on DEVICE_METADATA	2021-05-18 14:19:02 -07:00
Ze Gan	8f883fee67	[macsec]: Bind macsec service to sonic.target (#7642 ) MACsec service cannot be enabled by "sudo config feature state macsec enabled" Signed-off-by: Ze Gan <ganze718@gmail.com>	2021-05-18 11:44:21 -07:00
Renuka Manavalan	7a575b3d00	[container_checker] Use Feature table to get running containers (#7474 ) Why I did it Finding running containers through "docker ps" breaks when kubernetes deploys a container, as the names are mangled. How I did it The data is available from the FEATURE table, which takes care of kubernetes deployment too. How to verify it Deploy a feature via kubernetes and don't expect an error from container_check.	2021-05-07 08:42:15 -07:00
Nazarii Hnydyn	6e264d8ac9	[swss_vars]: Add 'resource_type' attribute. (#7526 ) Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>	2021-05-06 12:14:21 -07:00
Stepan Blyshchak	cd2c86eab6	[dockers] label SONiC Docker with manifest (#5939 ) Signed-off-by: Stepan Blyschak stepanb@nvidia.com This PR is part of SONiC Application Extension Depends on #5938 - Why I did it To provide an infrastructure change in order to support SONiC Application Extension feature. - How I did it Label every installable SONiC Docker with a minimal required manifest and auto-generate packages.json file based on installed SONiC images. - How to verify it Build an image, execute the following command: admin@sonic:~$ docker inspect docker-snmp:1.0.0 | jq '.[0].Config.Labels["com.azure.sonic.manifest"]' -r | jq cat /var/lib/sonic-package-manager/packages.json file to verify all dockers are listed there.	2021-04-26 13:51:50 -07:00
Guohan Lu	27a635a15a	Revert "Flashrom refactoring (#6922 )" This reverts commit `7dd9d1f3f2`.	2021-04-25 11:51:35 -07:00
xumia	56bdd750ab	Support readonly vtysh for sudoers (#7383 ) Why I did it Support a readonly version of the command vtysh How I did it Check whether the command starts with "show", and verify that the script contains only a single command.	2021-04-25 16:32:02 +08:00
a-barboza	ec9101f9c5	RADIUS Management User Authentication Feature (#7284 ) Why I did it HLD: https://github.com/Azure/SONiC/blob/master/doc/aaa/radius_authentication.md CLI: In a separate PR. How I did it How to verify it UT: src/sonic-host-services/tests/hostcfgd/hostcfgd_radius_test.py	2021-04-23 19:09:41 -07:00
Stepan Blyshchak	ae339c95d2	[systemd] disable default systemd udev rules for interfaces (#7369 ) Fix #7364 99-default.link - was always in SONiC, but previous systemd (<247) had an issue and it did not work due to issue systemd/systemd#3374. Now systemd 247 works. However, such a policy overrides the teamd-provided mac address, which causes the teamd netdev to use a random mac address. Therefore, it needs to be disabled. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2021-04-21 17:50:00 -07:00
Santhosh Kumar T	7dd9d1f3f2	Flashrom refactoring (#6922 ) #### Why I did it To build flashrom properly with dependency tracking. #### How I did it Moved flashrom code from platform/broadcom/sonic-platform-modules-dell/tools directory to src/flashrom directory. At the end, a flashrom_0.9.7_amd64.deb package is built, which will be installed on the devices.	2021-04-20 15:24:44 -07:00
Samuel Angebault	96690faa5b	[Arista] Fix dockerd issue on Arista platforms (#7376 ) Why I did it Recent systemd upgrade from #7228 requires an extra cmdline parameter for dockerd to start properly. Updating boot0 was missed as part of the systemd upgrade change. How I did it Just added the missing cmdline parameter in files/Aboot/boot0.j2 This change fixes #7372 How to verify it Boot the image and dockerd should start normally.	2021-04-20 14:55:14 -07:00
Kuanyu Chen	01f2b5f250	[config-setup]: Fix a bug in checking if updategraph is enabled (#7093 ) An error is encountered during "config-setup boot" if updategraph is enabled. How I did it Correct the code inside the config-setup script. Remove the spaces around the assignment operator. How to verify it Remove /etc/sonic/config_db.json and reboot the device. Originally, it returns the following error after bootup: rv: command not found After the modification, it can correctly parse the status of updategraph without error.	2021-04-19 11:40:52 -07:00
guxianghong	6fe6d7394d	[arm] support compile sonic arm image on arm server (#7285 ) - Support compile sonic arm image on arm server. If arm image compiling is executed on arm server instead of using qemu mode on x86 server, compile time can be saved significantly. - Add kernel argument systemd.unified_cgroup_hierarchy=0 for upgrade systemd to version 247, according to #7228 - rename multiarch docker to sonic-slave-${distro}-march-${arch} Co-authored-by: Xianghong Gu <xgu@centecnetworks.com> Co-authored-by: Shi Lei <shil@centecnetworks.com>	2021-04-18 08:17:57 -07:00
Stepan Blyshchak	4369361894	[sonic_debian_extension.j2] fix systemd version not from buster-backports (#7322 ) Install systemd explicitly from backports and install libsystemd* packages from backports. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2021-04-18 08:07:02 -07:00
jmmikkel	43342b33b8	[chassis] Add templates and code to support VoQ chassis iBGP peers (#5622 ) This commit has the following changes: * Add templates and code to support VoQ chassis iBGP peers * Add support to convert a new VoQChassisInternal element in the BGPSession element of the minigraph to a new BGP_VOQ_CHASSIS_NEIGHBOR table in CONFIG_DB. * Add a new set of "voq_chassis" templates to docker-fpm-frr * Add a new BGP peer manager to bgpcfgd to add neighbors from the BGP_VOQ_CHASSIS_NEIGHBOR table using the voq_chassis templates. * Add a test case for minigraph.py, making sure the VoQChassisInternal element creates a BGP_VOQ_CHASSIS_NEIGHBOR entry, but not if its value is "false". * Add a set of test cases for the new voq_chassis templates in sonic-bgpcfgd tests. Note that the templates expect the new "bgp bestpath peer-type multipath-relax" bgpd configuration to be available. Signed-off-by: Joanne Mikkelson <jmmikkel@arista.com>	2021-04-16 11:11:32 -07:00
yozhao101	2737c9681f	[container_checker] Exclude the 'always_disabled' container from expected running container list (#7217 ) Signed-off-by: Yong Zhao yozhao@microsoft.com Why I did it Since we introduced a new value always_disabled for the state field in the FEATURE table, the expected running container list should exclude the always_disabled containers. This bug was found by the nightly test and reported in an issue. This PR fixes #7210. How I did it I added a condition that checks whether the value of the state field of a container is always_disabled. How to verify it I verified this on the device str-dx010-acs-1. Which release branch to backport (provide reason below if selected) 201811 201911 202006 [ x] 202012	2021-04-02 08:05:46 -07:00
vganesan-nokia	b313d4d092	[systemlag] Lag id boundary set for system lag (#6488 ) Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com> Changes for setting the platform-specific lag id boundary in the chassis app db. The platform-specific lag id boundaries are supplied via chassisdb.conf. The lag_id_start and lag_id_end boundary values sourced from this file are set in the chassis app db, where they are used by the lag id allocator to allocate unique lag ids atomically.	2021-03-30 23:21:53 -07:00
Stepan Blyshchak	1f7d9e2698	[docker_img_ctl.j2] make tmpfs mounts optional and add ability to run container by image id (#6439 ) - Why I did it I made the docker_img_ctl.j2 applicable for more dockers (including application extensions dockers) by adding an option not to mount tmpfs on /tmp/ and /var/tmp/. In some applications /tmp/ is a different docker volume which can't be tmpfs. Also, I added the ability to pass REPO[:TAG]|[@digest]/IMAGE_ID instead of just a REPO name. - How I did it Modified docker_img_ctl.j2 and docker makefiles. - How to verify it Run it on the switch.	2021-03-16 17:03:12 +02:00
Stepan Blyshchak	2b8941e716	[sonic_debian_extension] add docker script to SONiC filesystem (#5935 ) - Why I did it To allow SONiC Package Migration during SONiC-2-SONiC upgrade we need to start docker daemon in chroot-ed environment in new SONiC filesystem. Later this script will be used to start dockerd in chroot environment on SONiC - How I did it Install a docker service script into /usr/lib/docker/ in SONiC filesystem. - How to verify it Install SONiC image on the switch, mount squashfs to some directory, mount overlay rw layer over squashfs, mount procfs and sysfs, mount docker library. Start the docker using: root@sonic:~$ /usr/lib/docker/docker.sh start Signed-off-by: Stepan Blyshchak <stepanb@nvidia.com>	2021-03-14 14:15:42 +02:00
Tamer Ahmed	51ab39fcb2	[hostcfgd]: Add Ability To Configure Feature During Run-time (#6700 ) Features may be enabled/disabled for the same topology based on run-time configuration. This PR adds the ability to enable/disable feature based on config db data. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2021-03-13 05:56:27 -08:00
Renuka Manavalan	6f7cd8d772	Copy dummy flannel.conf to get around absence of CNI Network (#6985 ) Why I did it We skip installing the CNI plugin, as we don't need it. But this leaves the node in "not ready" state upon joining master. To fix this, we copy this dummy .conf file into /etc/cni/net.d How I did it Keep this file in /usr/share/sonic/templates and copy it to /etc/cni/net.d upon joining the k8s master. How to verify it Upon configuring master-IP and enabling join, watch the node join and move to ready state. You may verify using the kubectl get nodes command	2021-03-09 19:49:54 -08:00
yozhao101	21f5e1280d	[Supervisord] Deduplicate the alerting messages of critical processes from Supervisord. (#6849 ) Signed-off-by: Yong Zhao yozhao@microsoft.com Why I did it In the configuration of rsyslog, duplicate messages will be suppressed and reported in the format of message repeated n times. Due to this behavior, if a critical process in a container exited unexpectedly, the alerting message will be written into syslog once and not be written into syslog anymore until the second critical process exited. This PR aims to differentiate these alerting messages such that they will not be suppressed by rsyslogd and can appear in the syslog periodically. How I did it This PR adds a counter into the alerting message and shows how many minutes a critical process was not running. How to verify it I verified and test this implementation on a physical DUT.	2021-02-25 14:35:29 -08:00
Stepan Blyshchak	12c03c4f25	[sonic_debian_extension] install docker_image_ctl.j2 template in the image templates (#5937 ) SONiC Package Manager will need to auto-generate the start script using that template. For that, we need this template to be recorded in the SONiC filesystem. Signed-off-by: Stepan Blyshchak <stepanb@nvidia.com>	2021-02-25 09:11:12 -08:00
Stepan Blyshchak	e179ec2fae	[services] introduce sonic.target (#5705 ) - Why I did it Group all SONiC services together to be able to manage them together. Will be used in the config reload command as a much simpler and more generic way to restart services. - How I did it Add services to sonic.target - How to verify it Together with Azure/sonic-utilities#1199 config reload -y Signed-off-by: Stepan Blyshchak <stepanb@nvidia.com>	2021-02-25 14:26:24 +02:00
Ze Gan	4068944202	[MACsec]: Set MACsec feature to be auto-start (#6678 ) 1. Add supervisord as the entrypoint of docker-macsec 2. Add wpa_supplicant conf into docker-macsec 3. Set the macsecmgrd as the critical_process 4. Configure supervisor to monitor macsecmgrd 5. Set macsec in the features list 6. Add config variable `INCLUDE_MACSEC` 7. Add macsec.service - How to verify it Change the `/etc/sonic/config_db.json` as follows ``` { "PORT": { "Ethernet0": { ... "macsec": "test" } } ... "MACSEC_PROFILE": { "test": { "priority": 64, "cipher_suite": "GCM-AES-128", "primary_cak": "0123456789ABCDEF0123456789ABCDEF", "primary_ckn": "6162636465666768696A6B6C6D6E6F707172737475767778797A303132333435", "policy": "security" } } } ``` After executing `sudo config reload -y`, we should find the following new items inserted in app_db of redis ``` 127.0.0.1:6379> keys MAC 1) "MACSEC_EGRESS_SC_TABLE:Ethernet0:72152375678227538" 2) "MACSEC_PORT_TABLE:Ethernet0" 127.0.0.1:6379> hgetall "MACSEC_EGRESS_SC_TABLE:Ethernet0:72152375678227538" 1) "ssci" 2) "" 3) "encoding_an" 4) "0" 127.0.0.1:6379> hgetall "MACSEC_PORT_TABLE:Ethernet0" 1) "enable" 2) "false" 3) "cipher_suite" 4) "GCM-AES-128" 5) "enable_protect" 6) "true" 7) "enable_encrypt" 8) "true" 9) "enable_replay_protect" 10) "false" 11) "replay_window" 12) "0" ``` Signed-off-by: Ze Gan <ganze718@gmail.com>	2021-02-23 13:22:45 -08:00
arlakshm	f77157f09d	[baseimage] add ipintutil in sudoer file (#6845 ) show ip interfaces was recently enhanced to support multi ASIC platforms in this PR: https://github.com/Azure/sonic-utilities/pull/1396. The ipintutil script has to run as sudo user to get the ip interfaces from each namespace. Add this script to the sudoers file so that the show ip interface command is available for users with read-only permissions Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>	2021-02-22 23:34:28 -08:00
Sujin Kang	d5238ae8dd	[pcie.yaml] Move pcie configuration file path to platform directory (#6475 ) - Why I did it The pcie configuration file location is under plugin directory not under platform directory. #6437 - How I did it Move all pcie.yaml configuration file from plugin to platform directory. Remove unnecessary timer to start pcie-check.service Move pcie-check.service to sonic-host-services - How to verify it Verify on the device	2021-02-21 08:27:37 -08:00
Samuel Angebault	5fb374b03d	[Arista] Driver and platform update (#6468 ) - Add support for `DCS-7050SX3-48YC8` and `DCS-7050SX3-48C8` platform - Add support for more variants of `DCS-7280CR3-32[PD]4` - Add Supervisor to Linecard consutil support - Complete Watchdog platform API support - Fix some PSU behavior on `DCS-7050QX-32` and `DCS-7060CX-32S` - Fix SEU management on `DCS-7060CX-32S` - Allow kernel modules to build up to linux 5.10 - Rename led color `orange` to `amber` - Miscellaneous fixes	2021-02-19 10:48:52 -08:00
SuvarnaMeenakshi	5a49a0f499	[multi-asic][vs]: Update topology script to retrieve hwsku from minigraph (#6219 ) Update topology script to retrieve hwsku from minigraph if hwsku information is not available in config_db. Fix clean up of interfaces in msft_multi_asic_vs hwsku topology script. - Why I did it When bringing up a multi-asic VS switch, the topology service is started during boot up. The topology service starts a shell script which runs the topology script present in /usr/share/sonic/device// directory. To invoke the hwsku-specific script, the topology script tries to retrieve hwsku information from config_db. During initial boot up config_db might not be populated. In order to start the topology service before config_db is updated, update the topology script to get hwsku information from minigraph.xml if it is available. This will be helpful to bring up a multi-asic VS testbed by loading minigraph and starting the topology service. - How I did it Update topology.sh script to retrieve hwsku information from minigraph.xml. Fix clean up function in msft_multi_asic_vs topology script. - How to verify it single-asic VS - no change; topology service is only enabled for multi-asic VS. multi-asic VS - Bring up multi-asic VS image, copy minigraph to vs image, start topology service. Topology service should be successful. To test the clean up function fix, start topology service - make sure interfaces are created and moved to the right namespaces. stop topology service - make sure the namespaces do not have any interfaces and all front end interfaces are present in the default namespace.	2021-02-18 22:02:29 -08:00
xumia	2ef5bd2e90	Add mirrors for reproducible build (#6813 )	2021-02-18 14:59:52 +08:00
shlomibitton	f6bee7306e	Stop teamd service before syncd (#6755 ) - What I did All SWSS dependent services should stop before the SWSS service to avoid possible future issues. For example, the 'teamd' service will stop first to allow the driver to unload the netdev gracefully. This is to stop all LAGs before restarting the syncd service when running the 'config reload' command. - How I did it Change the order of dependent services of SWSS. - How to verify it Run 'config reload' command. Previously the operation failed when a large number of PortChannels were configured on the system. Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>	2021-02-15 16:05:34 +02:00
Lawrence Lee	97c605f1f7	[swss]: Clear MUX-related state DB tables on start (#6759 ) * Add MUX_CABLE_TABLE to set of tables to clear on SWSS start, which will clear HW_MUX_CABLE_TABLE and MUX_CABLE_TABLE * Order swss to start before pmon to ensure that DBs are cleared before xcvrd (running inside pmon) starts and re-populates the tables Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2021-02-14 12:43:49 -08:00
dflynn-Nokia	88961f1339	[armhf build] Fix azure-storage dependency on cryptography package (#6780 ) Fix marvell-armhf build break The azure-storage package depends on the cryptography package. Newer versions of cryptography require the rust compiler, the correct version for which is not readily available in buster. Hence we pre-install an older version here to satisfy the azure-storage dependency. Note: This is not a problem for other architectures as pre-built versions of cryptography are available for those. This sequence can be removed after upgrading to debian bullseye.	2021-02-14 10:36:04 -08:00
Lior Avramov	6f8c31554f	[systemd] Increase syncd startup script timeout to support FW upgrade on init. (#6709 ) - Why I did it To support FW upgrade on init. - How I did it Change timeout value - How to verify it I manually changed ASIC and Gearbox FW followed by hard reset in order for FW upgrade to take place on init. Signed-off-by: liora <liora@nvidia.com>	2021-02-11 12:53:36 +02:00
Arun Saravanan Balachandran	3015de1dd0	[sonic-host-service] Move to sonic-host-services package (#6273 ) - Why I did it To move ‘sonic-host-service’ which is currently built as a separate package to ‘sonic-host-services' package. - How I did it - Moved 'sonic-host-server' to 'src/sonic-host-services' and included it as part of the python3 wheel. - Other files were moved to 'src/sonic-host-services-data' and included as part of the deb package. - Changed build option ‘INCLUDE_HOST_SERVICE’ to ‘ENABLE_HOST_SERVICE_ON_START’ for enabling sonic-hostservice at boot-up by default.	2021-02-08 19:35:08 -08:00
SuvarnaMeenakshi	62a599a5b3	[multi_asic][vs]: Add dependency in teamd service to start after topology service(#6594 ) [multi_asic][vs]: Add dependency in teamd service to start after topology service. - Why I did it In multi-asic VS, the topology service is run after the database service to set up the internal asic topology. swss and syncd have a dependency to start after the topology service is run so that the interfaces are created in, or moved to, the right namespace. In case of multi-asic vs, during the initial boot up, when there is no configuration added, the teamd service starts but swss/syncd do not start as the topology service does not start. Upon loading configuration using config_db or minigraph, swss and syncd start up, but teamd is not restarted as swss is not stopped and started. This leaves teamd in a bad state and requires a config reload. - How I did it Add dependency in teamd service to start after topology service is completed. - How to verify it No change in single asic vs or platform. No change in multi-asic regular image. Change only in multi-asic VS. Bring up a multi-asic VS image without any configuration; the teamd service will fail to start due to dependency failure. Load minigraph, start topology service, load configuration, ensure all services come up. Signed-off-by: SuvarnaMeenakshi <sumeenak@microsoft.com>	2021-02-04 14:10:56 -08:00
Joe LeVeque	820d350301	[pcie-check] Update underlying pcieutil command and add to sudoers file (#6682 ) - Why I did it As of Azure/sonic-utilities#1297, subcommands of pcieutil have changed to remove the redundant pcie- prefix. This PR adapts calling applications (pcie-check) to the new syntax. Resolves #6676 - How I did it Remove pcie- prefix from pcieutil subcommands in calling applications Also add pcieutil * to sudoers file, as pcieutil requires elevated permissions	2021-02-04 12:14:08 -08:00
Guohan Lu	3f2a39d583	[proc-exit-listener]: fix syntax error the bug is introduced in commit `34cca20c` Signed-off-by: Guohan Lu <lguohan@gmail.com>	2021-02-02 03:58:20 -08:00
Samuel Angebault	0c4d4ace76	[kdump] Fix OOM events in crashkernel (#6447 ) A few issues were discovered with crashkernel on Arista platforms. 1) platforms using `docker_inram=on` would end up OOM in the kdump environment. This happens because the same initramfs is used by SONiC and the crashkernel. With `docker_inram=on` the `dockerfs.tar.gz` is extracted in a `tmpfs` created for the occasion. Since `dockerfs.tar.gz` weighs more than 1.5G, it doesn't fit into the kdump environment and ends up OOM. This OOM event can in turn trigger a panic. 2) Arista platforms with `secureboot` enabled would fail to load the crashkernel because the kernel parameter would be discarded on boot. This happens because the `boot0` in secureboot mode is strict about kernel parameter injection. 3) The secureboot path allowlist would remove kernel crash reports. 4) The kdump service would fail on Arista products since `/boot/` is empty in `secureboot` - How I did it 1) To prevent an OOM event in the crashkernel the fix is to avoid the codepaths in `union-mount` that create tmpfs and populate them. Some more codepaths specific to Arista devices are also skipped to make the kdump process faster. This relies on detecting that the initramfs is starting in a kdump environment and skipping some initialization. The `/usr/sbin/kdump-config` tool appends a few kernel cmdline arguments when loading the crashkernel. The most distinctive one is `systemd.unit=kdump-tools.service` which is used in a few initramfs hooks to set `in_kdump`. 2) To allow `kdump` to work in a `secureboot` environment the cmdline generation in boot0 was slightly modified. The codepath to load kernel parameters changed by SONiC now also runs when booting in secure mode. It was altered to prevent an append-only behavior which would grow the `kernel-cmdline` at every reboot. This ever-growing behavior would cause `kexec` to fail to load the kernel due to a cmdline that is too long. 
3) To get the kernel crash under /var/crash this path has to be added to `allowlist_paths` 4) The `/host/image-XXX/boot` folder is now populated in `secureboot` mode but not used. - How to verify it Regular boot: - enable kdump - enable docker_inram=on via kernel-params - reboot - generate a crash `echo c > /proc/sysrq-trigger` - before: witness OOM events on the console - after: crash kernel works and crash available under /var/crash Secure boot: - enable kdump - reboot - generate a crash `echo c > /proc/sysrq-trigger` - before: witness no kdump - after: crash kernel works and crash available under /var/crash Co-authored-by: Boyang Yu <byu@arista.com>	2021-02-02 01:55:09 -08:00
arlakshm	b5225407ef	[baseimage]: add docker ps to the sudoer file (#6604 ) fixes Azure/sonic-utilities#1389 With the recent changes in the sudoers file, the show commands fail for read-only users. The problem is that 'docker ps' fails in the function get_routing_stack() (show/main.py, L54 at `8a1109ed30`), and therefore all the CLI commands fail. Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>	2021-01-29 08:16:32 -08:00
arlakshm	ff8cc49b18	[multi asic] add ip netns identify command to sudoer (#6591 ) Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com> - Why I did it The command sudo ip netns identify <pid> is used in the function get_current_namespace to check whether the CLI command is running in the host context or within a namespace. This function is used for every CLI command, so sudo ip netns identify <pid> needs to be added to the sudoers file to allow users with RO access to run show CLI commands. This problem does not exist on single asic platforms. - How I did it Add ip netns identify [0-9]* to sudoers file.	2021-01-28 23:12:01 -08:00
Guohan Lu	34cca20cb6	[proc-exit-listener]: ignore blank lines make proc-exit-listener more robust Signed-off-by: Guohan Lu <lguohan@gmail.com>	2021-01-27 19:41:59 -08:00
abdosi	cfa8fbbf1a	[baseimage]: Updates for Ebtables and support for multi-asic (#6542 ) The following changes were made for ebtables: - Support for multi-asic platforms. On multi-asic platforms, ebtables filters are installed in the namespaces, not on the host; on single-asic platforms they are installed on the host. - On multi-asic platforms we don't want to install them on the host, otherwise namespace-to-namespace communication does not happen since ARP requests are not forwarded. - Updated to restore ebtables rules from a text file instead of the binary format. Rules are restored as part of database docker init instead of rc.local - Removed the ebtables service files for buster as they are not needed, since filters are restored/installed as part of database docker init. All the binaries are pre-installed; the ebtables* binaries are the same as ebtables-legacy-* Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2021-01-27 08:36:10 -08:00
judyjoseph	46b3bd5503	[teamd]: Increase wait timeout for teamd docker stop to clean Port channels. (#6537 ) The Portchannels were not getting cleaned up as the cleanup activity was taking more than 10 secs, which is the default docker timeout after which a SIGKILL is sent. Fixes #6199 To check: does this also fix the issue in 201911? #6503 This issue is seen much more often on the master branch than on 201911 because the Portchannel cleanup takes more time in master. Test on a DUT with 8 Port Channels. master admin@str-s6000-acs-8:~$ time sudo systemctl stop teamd real 0m15.599s user 0m0.061s sys 0m0.038s Sonic 201911.v58 admin@str-s6000-acs-8:~$ time sudo systemctl stop teamd real 0m5.541s user 0m0.020s sys 0m0.028s	2021-01-23 20:57:52 -08:00
arlakshm	0e12ca81c7	[Multi Asic] support of swss.rec and sairedis.rec for multi asic (#6310 ) Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan arlakshm@microsoft.com - Why I did it This PR has the changes to support having different swss.rec and sairedis.rec for each asic. The logrotate script is updated as well - How I did it Update the orchagent.sh script to use the logfile name options in these PRs(Azure/sonic-swss#1546 and Azure/sonic-sairedis#747) In multi asic platforms the record files will be different for each asic, with the format swss.asic{x}.rec and sairedis.asic{x}.rec Update the logrotate script for multiasic platform .	2021-01-22 09:42:19 -08:00
yozhao101	be3c036794	[supervisord] Monitoring the critical processes with supervisord. (#6242 ) - Why I did it Initially, we used Monit to monitor critical processes in each container. If one of the critical processes was not running or crashed for some reason, Monit would write an alerting message into syslog periodically. If we added a new process in a container, the corresponding Monit configuration file would also need to be updated, which made maintenance harder. Now we employ the event listener of Supervisord to do this monitoring. Since processes in each container are managed by Supervisord, we only need to focus on the monitoring logic. - How I did it We borrowed the event listener of Supervisord to monitor critical processes in containers. The event listener will take the following steps if it is notified that one of the critical processes exited unexpectedly: The event listener will first check whether the auto-restart mechanism is enabled for this container. If the auto-restart mechanism is enabled, the event listener will kill the Supervisord process, which should cause the container to exit and subsequently get restarted. If the auto-restart mechanism is not enabled for this container, the event listener will enter a loop which will first sleep 1 minute and then check whether the process is running. If yes, the event listener exits. If no, an alerting message will be written into syslog. - How to verify it First, check whether the auto-restart mechanism of a container is enabled by running the command show feature status. If enabled, select one critical process and kill it manually, then check whether the container is restarted. Second, disable the auto-restart mechanism if it was enabled at step 1 by running the command sudo config feature autorestart <container_name> disabled. Then one critical process should be selected and killed. 
After that, we will see the alerting message which will appear in the syslog every 1 minute. - Which release branch to backport (provide reason below if selected) 201811 201911 [x ] 202006	2021-01-21 12:57:49 -08:00
Qi Luo	25e4d773b9	[baseimage]: Cleanup sudoers file (#6518 )	2021-01-21 08:28:32 -08:00
Ying Xie	054f5b7a53	[warm boot finalizer] only wait for enabled components to reconcile (#6454 ) * [warm boot finalizer] only wait for enabled components to reconcile Define the component with its associated service. Only wait for components that have associated service enabled to reconcile during warm reboot. Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2021-01-15 07:48:11 -08:00
yozhao101	04cd1d61e8	[Monit] Monitoring the running status of containers. (#6251 ) - Why I did it This PR aims to monitor the running status of each container. Currently the auto-restart feature is enabled. If a critical process exits unexpectedly, the container will be restarted. If the container is restarted 3 times within 20 minutes, it will not run anymore unless the flag is cleared manually using the command `sudo systemctl reset-failed <container_name>`. - How I did it We will employ Monit to monitor a script. This script will generate the expected running container list and compare it with the current running containers. If there are containers which were expected to run but were not running, then an alerting message will be written into syslog. - How to verify it I tested this feature on a lab device `str-a7050-acs-3` which has single ASIC and `str2-n3164-acs-3` which has a Multi-ASIC. First I manually stopped a container by running the command `sudo systemctl stop <container_name>`, then I checked whether there was an alerting message in the syslog. Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2021-01-07 19:52:22 -08:00
Renuka Manavalan	dbc6718408	Take a copy of existing TACACS credentials and restore it during upgrade (#6285 ) In scenario where upgrade gets config from minigraph, it could miss tacacs credentials as they are not in minigraph. Hence restore explicitly upon load-minigraph, if present. - Why I did it Upon boot, when config migration is required, the switch could load config from minigraph. The config-load from minigraph would wipe off TACACS key and disable login via TACACS, which would disable all remote user access. This change would re-configure TACACS if there is a saved copy available. - How I did it When config is loaded from minigraph, look for a TACACS credentials backup (tacacs.json) under /etc/sonic/old_config. If present, load the credentials into running config, before config-save is called. - How to verify it Remove /etc/sonic/config_db.json and do an image update. Upon reboot, w/o this change, you would not be able to ssh in as a remote user. You may login as admin and check out, "show tacacs" & "show aaa" to verify that tacacs-key is missing and login is not enabled for tacacs. With this change applied, remove /etc/sonic/config_db.json, but save tacacs & aaa credentials as tacacs.json in /etc/sonic/. Upon reboot, you should see remote user access possible.	2021-01-07 16:45:38 -08:00
Joe LeVeque	e52581e919	[PDDF] Build and install Python 3 package (#6286 ) - Make PDDF code compliant with both Python 2 and Python 3 - Align code with PEP8 standards using autopep8 - Build and install both Python 2 and Python 3 PDDF packages	2021-01-07 10:03:29 -08:00
Akhilesh Samineni	62e7c452d0	After first bootup, the FEATURE table is not present in CONFIG_DB (#5911 ) Fix the issue where, after first bootup (onie-install), the FEATURE table is not present in CONFIG_DB. The fix is to call config reload.	2021-01-05 09:22:16 -08:00
Joe LeVeque	566ea4f601	[system-health] Convert to Python 3 (#5886 ) - Convert system-health scripts to Python 3 - Build and install system-health as a Python 3 wheel - Also convert newlines from DOS to UNIX	2020-12-29 14:04:09 -08:00
Joe LeVeque	62662acbd5	No longer install some unnecessary Python 2 packages in host (#6301 ) - No longer install Python 2 packages in host: - libpython2.7-dev - docker - ipaddress - netifaces - azure-storage - watchdog - futures - Install Python 3 versions of the following packages in host: - docker - azure-storage - watchdog - redis - swsssdk (install unconditionally)	2020-12-29 13:02:11 -08:00
lguohan	162f0fdfe1	[init_cfg]: allow enable/disable swss/teamd/syncd services (#6291 ) swss/teamd/syncd services were changed to always enabled in commit `fad481edc1` as a workaround for not letting hostcfgd start service during the bootup process. commit `317a4b3410` introduce wait till full system bootup before updating feature states in hostcfgd. Thus, workaround introduced in commit `fad481ed` can be removed Signed-off-by: Guohan Lu <lguohan@gmail.com>	2020-12-28 10:33:46 -08:00
Samuel Angebault	163ed6acff	[Arista] Better handle arbitrary tmpfs in boot0 (#6274 ) To limit IO and space usage on the flash device the boot0 script makes sure the SWI is in memory. Because SONiC maps /tmp on the flash, some logic is required to make sure of it. However it is possible for some provisioning mechanism to already download the swi in a memory file system. This case was not handled properly by the boot0 script. It now detects if the image is on a tmpfs or a ramfs and keeps it there if that is the case. The cleanup method has been updated accordingly and will only cleanup the mount path if it's below /tmp/ so as not to affect user mounted paths. - How I did it Check the filesystem on which the SWI pointed to by swipath lies. If this filesystem is a ramfs or a tmpfs the move_swi_to_tmpfs becomes a no-op. Made sure the cleanup logic would not behave unexpectedly. - How to verify it In SONiC: Download the swi under /tmp and make sure it gets moved to /tmp/tmp-swi which gets mounted for that purpose. Make sure /tmp/tmp-swi gets unmounted once the install process is done. Create a new mountpoint under /ram using either ramfs or tmpfs and download the swi there. Install the swi using sonic-installer and make sure the image doesn't get moved by looking at the logs.	2020-12-23 22:38:59 -08:00
Prabhu Sreenivasan	df13245b9f	[CRM] Add support for snat, dnat and ipmc crm resources (#6012 ) Signed-off-by: Prabhu Sreenivasan prabhu.sreenivasan@broadcom What I did Added support for snat, dnat and ipmc resources under CRM module. How I did it The new NAT feature adds new resources snat_entry and dnat_entry that need to be monitored. ipmc_entry tracks IP multicast resources used by the switch. How to verify it sonic-utilities tests and crm spytest	2020-12-23 06:15:53 -08:00
lguohan	aa1cc848e2	[sonic-yang-mgmt-py2]: remove sonic-yang-mgmt py2 (#6262 ) No longer needed as sonic-utilities has been moved to python3 Signed-off-by: Guohan Lu <lguohan@gmail.com>	2020-12-22 21:05:33 -08:00
Renuka Manavalan	ba02209141	First cut image update for kubernetes support. (#5421 ) * First cut image update for kubernetes support. With this, 1) dockers dhcp_relay, lldp, pmon, radv, snmp, telemetry are enabled for kube management init_cfg.json configure set_owner as kube for these 2) Each docker's start.sh updated to call container_startup.py to register going up As part of this call, it registers the current owner as local/kube and its version The images are built with its version ingrained into image during build 3) Update all docker's bash script to call 'container start/stop/wait' instead of 'docker start/stop/wait'. For all locally managed containers, it calls docker commands, hence no change for locally managed. 4) Introduced a new ctrmgrd service, that helps with transition between owners as kube & local and carry over any labels update from STATE-DB to API server 5) hostcfgd updated to handle owner change 6) Reboot scripts are updated to tag kube running images as local, so upon reboot they run the same image. 7) Added kube_commands.py to handle all updates with Kubernetes API server -- dedicated for k8s interaction only.	2020-12-22 08:01:33 -08:00
Prabhu Sreenivasan	df2a4ded98	[ntp]: Source interface support for NTP (#6033 ) Added source interface support for NTP. Also made NTP start on Mgmt-VRF by default when configured. - How I did it 1) Updated hostcfg to listen to global config NTP and NTP_SERVER tables and restart ntp whenever the configuration changes. NTP table includes source interface configuration. 2) The ntp script is updated to start on Mgmt-VRF by default when configured. Signed-off-by: Prabhu Sreenivasan <prabhu.sreenivasan@broadcom>	2020-12-21 05:34:13 -08:00
abdosi	0755f29fe7	Telemetry Certificate Copy Across Image Upgrade. (#6252 ) To copy telemetry certificate during image upgrade from previous image to new image	2020-12-19 08:24:03 -08:00
arheneus@marvell.com	e88c7d11ca	[ntp][apparmor] Allow apparmor read permission for ntpd under rw mount path of rootfs (#6040 ) Certain platform-specific packages (sonic-platform-xyz) install files onto the rootfs, which are placed on the read-write mount path /host/image-name/rw/... When ntpd starts it tries to read /usr/bin, /usr/sbin/ and /usr/local/bin, which in turn also link to the read-write mount path, so ntpd gets the following AppArmor warning messages. LOG:- audit: type=1400 audit(1606226503.240:21): apparmor="DENIED" operation="open" profile="/usr/sbin/ntpd" name="/image-HEAD-dirty-20201111.173951/rw/usr/local/bin/" pid=3733 comm="ntpd" requested_mask="r" denied_mask="r" fsuid=0 ouid=0 audit: type=1400 audit(1606226503.240:22): apparmor="DENIED" operation="open" profile="/usr/sbin/ntpd" name="/image-HEAD-dirty-20201111.173951/rw/usr/sbin/" pid=3733 comm="ntpd" requested_mask="r" denied_mask="r" fsuid=0 ouid=0 audit: type=1400 audit(1606226503.240:23): apparmor="DENIED" operation="open" profile="/usr/sbin/ntpd" name="/image-HEAD-dirty-20201111.173951/rw/usr/bin/" pid=3733 comm="ntpd" requested_mask="r" denied_mask="r" fsuid=0 ouid=0 Fix: Add the rw/.. mount path, similar to the root path access already provided for ntpd in /etc/apparmor.d/usr.sbin.ntpd Signed-off-by: Antony Rheneus <arheneus@marvell.com>	2020-12-18 04:57:35 -08:00
Lawrence Lee	03ad30d2ab	[build_templates]: Start SNMP timer after SWSS service (#6195 ) Fixes #5663 - Why I did it It's currently possible for the SNMP timer to conflict with config reload (specifically if the timer triggers while config reload is stopping the SWSS service). config reload triggers SWSS to shutdown, which causes SNMP to shutdown, which conflicts with the SNMP timer causing SNMP to startup. See the linked issue for more details. - How I did it Including the After ordering dependency forces the SNMP timer to wait until SWSS finishes stopping, preventing the conflict. If there is an ordering dependency between two units (e.g. one unit is ordered After another), if one unit is shutting down while the other is starting up, the shutdown will always be ordered before the startup. In this case, that means that the SNMP timer is forced to wait for the SWSS shutdown to complete. Only then can the SNMP timer proceed. See here for more details. It's important to note that the After dependency will not cause SWSS to be started when the SNMP timer fires (assuming that SWSS has not yet been started). The existing Requisite dependency in the SNMP service will also not cause SWSS to be started, instead it will cause the SNMP service to fail if SWSS is not active. Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2020-12-16 16:39:14 -08:00
Joe LeVeque	c829e6914a	Install 'wheel' package in host OS; upgrade pip and setuptools (#6187 ) Install the 'wheel' package in host OS (along with python3 and python3-distutils which are also needed for building some Python packages) to eliminate error messages like the following: ``` Running setup.py bdist_wheel for watchdog: started Running setup.py bdist_wheel for watchdog: finished with status 'error' Complete output from command /usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-install-Qd3K08/watchdog/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/pip-wheel-0AHpMe --python-tag cp27: usage: -c [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...] or: -c --help [cmd1 cmd2 ...] or: -c --help-commands or: -c cmd --help error: invalid command 'bdist_wheel' ---------------------------------------- Failed building wheel for watchdog ``` These error messages appear to have no impact on the image build, because the Python package seems to still get installed successfully afterward, just the building of a wheel package fails. Therefore, this is more of a cosmetic fix than an actual bug. This is an addendum to https://github.com/Azure/sonic-buildimage/pull/6182. Also upgrade pip and install more recent version of setuptools package via PyPI.	2020-12-16 16:38:15 -08:00
mprabhu-nokia	41012f791e	In modular chassis, add CHASSIS_STATE_DB on control card (#5624 ) HLD: Azure/SONiC#646 In modular chassis, add CHASSIS_STATE_DB on control card Why I did it Modular Chassis has control-cards, line-cards and fabric-cards along with other peripherals. The control-card CHASSIS_STATE_DB will be the central DB that maintains any state information of cards that is accessible to the control card. How I did it Adding another DB on an existing REDIS instance running on port 6380.	2020-12-15 17:15:00 -08:00
shlomibitton	a6aaffd2ad	[kdump] Add more kernel panic conditions for vmcore dump (#6095 ) Create new file to "sysctl.d" with desired panic conditions. It will trigger a vmcore dump using kdump-tools on these situations. Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>	2020-12-15 08:54:13 -08:00
rajendra-dendukuri	b60448a006	kdump: Add default kdump command line arguments (#6180 ) The default /etc/default/kdump-tools file provided by the kdump-tools package doesn't set a value for KDUMP_CMDLINE_APPEND. The default kdump command line arguments need to be set in order to extend them to use additional arguments required for SONiC platforms. Signed-off-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>	2020-12-15 08:52:23 -08:00
Sabareesh-Kumar-Anandan	9f4ca01388	[sonic-config-engine] Adding dependent pkgs needed for arm compilation (#6186 ) libxslt-dev and libz-dev are dependencies for lxml==4.6.1 which is required for pyangbind==0.8.1 lxml-4.6.2-cp37-cp37m-manylinux1_x86_64.whl is directly downloaded in amd64 whereas in arm this is built from lxml-4.6.2.tar.gz Signed-off-by: Sabareesh Kumar Anandan <sanandan@marvell.com>	2020-12-15 08:44:46 -08:00
Stephen Sun	e010d83fc3	[Dynamic buffer calc] Support dynamic buffer calculation (#6194 ) - Why I did it To support dynamic buffer calculation. This PR also depends on the following PRs for sub modules - [sonic-swss: [buffermgr/bufferorch] Support dynamic buffer calculation #1338](https://github.com/Azure/sonic-swss/pull/1338) - [sonic-swss-common: Dynamic buffer calculation #361](https://github.com/Azure/sonic-swss-common/pull/361) - [sonic-utilities: Support dynamic buffer calculation #973](https://github.com/Azure/sonic-utilities/pull/973) - How I did it 1. Introduce field `buffer_model` in `DEVICE_METADATA|localhost` to represent which buffer model is running in the system currently: - `dynamic` for the dynamic buffer calculation model - `traditional` for the traditional model in which the `pg_profile_lookup.ini` is used 2. Add the tables required for the feature: - ASIC_TABLE in platform/<vendor>/asic_table.j2 - PERIPHERAL_TABLE in platform/<vendor>/peripheral_table.j2 - PORT_PERIPHERAL_TABLE on a per-platform basis in device/<vendor>/<platform>/port_peripheral_config.j2 for each platform with gearbox installed. - DEFAULT_LOSSLESS_BUFFER_PARAMETER and LOSSLESS_TRAFFIC_PATTERN in files/build_templates/buffers_config.j2 - Add lossless PGs (3-4) for each port in files/build_templates/buffers_config.j2 3. Copy the newly introduced j2 files into the image and render them when the system starts 4. Update the CLI options for buffermgrd so that it can start with dynamic mode 5. Fetch the ASIC vendor name in orchagent: - fetch the vendor name when creating the docker and pass it as a docker environment variable - `buffermgrd` can use this passed-in variable 6. Clear buffer related tables from STATE_DB when swss docker starts 7. Update the src/sonic-config-engine/tests/sample_output/buffers-dell6100.json according to the buffer_config.j2 8. Remove buffer pool sizes for ingress pools and egress_lossy_pool Update the buffer settings for dynamic buffer calculation	2020-12-13 11:35:39 -08:00
Junchao-Mellanox	51c77b179f	[Mellanox] Add python3 support for Mellanox platform API (#6175 ) python2 is end of life and SONiC is going to support python3. This PR is going to support: 1. Mellanox SONiC platform API python3 support 2. Install both python2 and python3 versions of Mellanox SONiC platform API for pmon and the host side	2020-12-11 10:51:31 -08:00
Prabhu Sreenivasan	77afb8e54d	[ntp]: ntp-systemd-wrapper file is getting overwritten (#6179 ) ntp-systemd-wrapper file from files/image_config/ntp was not getting picked up. Added a line on sonic_debian_extension.j2 to copy over the file from files/image_config/ntp after installing the debian package. Signed-off-by: Prabhu Sreenivasan <prabhu.sreenivasan@broadcom.com>	2020-12-10 23:20:41 -08:00
judyjoseph	6d9ecbcfd8	Move frr logs from syslog to /var/log/frr/*.log (#5988 ) - Why I did it Move frr logs written by syslog from /var/log/quagga/*.log to /var/log/frr/*.log - How I did it Updated the rsyslog config files. - How to verify it Verified that the logs go to zebra.log and bgpd.log in the directory /var/log/frr/	2020-12-10 08:44:34 -08:00
rajendra-dendukuri	31ce20ac38	[kdump]: Kdump usability and reliability improvements (#6113 ) - Allow platform specific reboot script to be called after crash kernel has finished copying the kernel vmcore - Disable pcie advanced features when running crash kernel. This improves reliability of the crash kernel to successfully create a vmcore and also reboot - Allow crash kernel to reboot if a panic is seen while it is generating a vmcore - Fix crash kernel to use the SONiC specific /usr/local/bin/reboot script instead of the Linux reboot command /sbin/reboot - Use sonic_platform as the kernel command line parameter to pass platform identifier string Signed-off-by: Rajendra Dendukuri <rajendra.dendukuri@broadcom.com>	2020-12-10 01:32:37 -08:00
shlomibitton	6762f526d9	[NVMe] Add NVMe SSD disc type support to installer.sh script (#6142 ) To install a SONiC image properly on an NVMe SSD with ONIE, the disk type must be configured in the installer.sh script. Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>	2020-12-09 19:03:27 -08:00
Samuel Angebault	44f4c2ed66	[Arista] Update driver submodules (#6151 ) - Enhance eeprom parsing robustness on corrupted fields - Add chassis provisioning service - Disable CPU sleep state on some systems - Complete refactor for FanSlots - Fix module unload while still in use	2020-12-08 11:17:28 -08:00
abdosi	59c1e3a78a	[multi-asic] Enhancing monit process checker for multi-asic. (#6100 ) Added process checker support for multi-asic platforms.	2020-12-04 10:39:43 -08:00
Samuel Angebault	468aac92b7	[Arista] Update platform configurations for 7060DX4 and 7060PX4 (#6084 ) Current support for the 7060PX4-32 and 7060DX4 was broken. With this change, ports are now linking fine. Co-authored-by: Zhi Yuan Carl Zhao <zyzhao@arista.com>	2020-12-04 10:11:06 -08:00
Samuel Angebault	8576911a57	[database-chassis]: Fix the way database-chassis start (#6099 ) The service crashes when the platform boots due to missing waits. /usr/bin/database.sh tries to operate on a missing socket and fails. We now wait for the chassis database to be ready the same way we do for database.	2020-12-04 10:09:35 -08:00
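The fix above is a readiness wait before using the chassis database socket. The actual shell logic is not reproduced here; the following is a minimal Python sketch of a bounded poll, assuming the caller supplies the readiness check (for example, whether a socket path exists; any such path is hypothetical).

```python
import time


def wait_until(ready, timeout=60.0, interval=1.0):
    """Poll ready() until it returns True or `timeout` seconds elapse.

    Returns True if ready() succeeded, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, `wait_until(lambda: os.path.exists(sock_path))`, where `sock_path` is whatever socket the service depends on. Bounding the wait with a deadline lets the caller fail loudly instead of hanging forever.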
Prabhu Sreenivasan	2895b79482	[ntp]: NTP service ordering (#6115 ) Make sure ntp-config service is executed before ntpd Updated ntp-config service files to force dependency with ntp service. Also resolved circular dependency with --no-block flag. (needed as ntp-config service internally invokes systemd to restart ntp which in turn waits for ntp-config to complete) Signed-off-by: Prabhu Sreenivasan <prabhu.sreenivasan@broadcom.com>	2020-12-04 08:49:20 -08:00
Joe LeVeque	83f0d8240e	[pmon]: Install vanilla 'thrift' Python 2 and 3 packages for Barefoot in host and PMon (#6080 ) Barefoot platform vendors' sonic_platform packages import the Python 'thrift' library. Previously, our custom-built package was being installed in the PMon container and host OS. However, we are only building a Python 2 version of that package, which was only intended for use with saithrift. Fixes #6077	2020-12-04 08:41:17 -08:00
Joe LeVeque	905a5127bb	[Python] Align files in root dir, dockers/ and files/ with PEP8 standards (#6109 ) - Why I did it Align style with slightly modified PEP8 standards (extend maximum line length to 120 chars). This will also help in the transition to Python 3, where it is more strict about whitespace, plus it helps unify style among the SONiC codebase. Will tackle other directories in separate PRs. - How I did it Using `autopep8 --in-place --max-line-length 120` and some manual tweaks.	2020-12-03 15:57:50 -08:00
Sabareesh-Kumar-Anandan	fe524c37e7	[platform][marvell] Arm 32-bit Arch support changes (#5749 ) - Added Arm 32-bit arch build fixes - Added marvell armhf platform specific changes Signed-off-by: Sabareesh Kumar Anandan <sanandan@marvell.com>	2020-12-03 12:38:50 -08:00
Garrick He	fc0e6af337	[sflow] Fix race-condition seen with mVRF configured (#6102 ) Under certain conditions, the sFlow service can start before interface configurations are successfully applied. This will cause hsflowd to get a socket error. This fix ensures all interface configurations are successfully applied before the sFlow service (hsflowd) starts. During testing we saw this error from hsflowd if interface configs were not successfully applied before hsflowd started. ERR sflow#hsflowd: socket sendto error: Network is unreachable no FLOW samples can be seen. This is consistently reproducible if you force the sFlow service to start before interface-config.service. Signed-off-by: Garrick He <garrick_he@dell.com>	2020-12-03 01:33:10 -08:00
lguohan	4812953468	[ntp]: build ntp with various fixes (#6037 ) - NTP Bug 1970 (UNLINK_EXPR_SLIST empty list) Fix - ENOBUFS log message level set to WARN - Fix audit message seen on console apparmor - add force-confold option when install ntp Signed-off-by: Guohan Lu <lguohan@gmail.com> Co-authored-by: Prabhu Sreenivasan <prabhu.sreenivasan@broadcom>	2020-12-02 15:02:50 -08:00
Joe LeVeque	7f4ab8fbd8	[sonic-utilities] Update submodule; Build and install as a Python 3 wheel (#5926 ) Submodule updates include the following commits: * src/sonic-utilities 9dc58ea...f9eb739 (18): > Remove unnecessary calls to str.encode() now that the package is Python 3; Fix deprecation warning (#1260) > [generate_dump] Ignoring file/directory not found Errors (#1201) > Fixed porstat rate and util issues (#1140) > fix error: interface counters is mismatch after warm-reboot (#1099) > Remove unnecessary calls to str.decode() now that the package is Python 3 (#1255) > [acl-loader] Make list sorting compliant with Python 3 (#1257) > Replace hard-coded fast-reboot with variable. And some typo corrections (#1254) > [configlet][portconfig] Remove calls to dict.has_key() which is not available in Python 3 (#1247) > Remove unnecessary conversions to list() and calls to dict.keys() (#1243) > Clean up LGTM alerts (#1239) > Add 'requests' as install dependency in setup.py (#1240) > Convert to Python 3 (#1128) > Fix mock SonicV2Connector in python3: use decode_responses mode so caller code will be the same as python2 (#1238) > [tests] Do not trim from PATH if we did not append to it; Clean up/fix shebangs in scripts (#1233) > Updates to bgp config and show commands with BGP_INTERNAL_NEIGHBOR table (#1224) > [cli]: NAT show commands newline issue after migrated to Python3 (#1204) > [doc]: Update Command-Reference.md (#1231) > Added 'import sys' in feature.py file (#1232) * src/sonic-py-swsssdk 9d9f0c6...1664be9 (2): > Fix: no need to decode() after redis client scan, so it will work for both python2 and python3 (#96) > FieldValueMap `contains`(`in`) will also work when migrated to libswsscommon(C++ with SWIG wrapper) (#94) - Also fix Python 3-related issues: - Use integer (floor) division in config_samples.py (sonic-config-engine) - Replace print statement with print function in eeprom.py plugin for x86_64-kvm_x86_64-r0 platform - Update all platform plugins to be compatible with both Python 2 and Python 3 - Remove shebangs from plugins files which are not intended to be executable - Replace tabs with spaces in Python plugin files and fix alignment, because Python 3 is more strict - Remove trailing whitespace from plugins files	2020-11-25 10:28:36 -08:00
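One of the Python 3 fixes listed above is switching to integer (floor) division. The snippet below is an illustration rather than code from config_samples.py; `split_ports` is a hypothetical helper. In Python 3, `/` always returns a float, so any result that is later used as a count or passed to `range()` must use `//`.

```python
def split_ports(num_ports, num_groups):
    """Hypothetical helper: number of ports per group, as an int."""
    # Python 2: 32 / 4 == 8 (int). Python 3: 32 / 4 == 8.0 (float), and
    # range(8.0) raises TypeError. Floor division gives an int in both.
    return num_ports // num_groups


ports_per_group = split_ports(32, 4)
print(type(32 / 4).__name__, type(ports_per_group).__name__)  # float int
```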
abdosi	fad481edc1	Enhanced Feature table to support 'always_enabled' value for state and auto-restart fields. (#6000 ) Added new flag value 'always_enabled' for the state and auto-restart fields of the feature table. init_cfg.json is updated to initialize the state field of the database/swss/syncd/teamd features and the auto-restart field of the database feature as always_enabled. Once the state/auto-restart value is initialized as "always_enabled" it is immutable and cannot be changed via feature config commands (config feature..). PR#Azure/sonic-utilities#1271 hostcfgd will not take any action if the state field value is 'always_enabled'. Since we have an always_enabled value for auto-restart, supervisor-proc-exit-listener is updated to drop the special check for database and always rely on the value from the Feature table.	2020-11-25 08:41:11 -08:00
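The immutability rule for 'always_enabled' can be sketched as a guarded update. This is a hypothetical model, not the hostcfgd or sonic-utilities code: the FEATURE table is represented as a plain dict, and the field names used in the test (`state`) are assumptions based on the description.

```python
ALWAYS_ENABLED = "always_enabled"


def apply_feature_update(feature_table, feature, field, new_value):
    """Hypothetical sketch: update one FEATURE table field unless it is pinned.

    Returns True if the value was changed, False if the update was rejected.
    """
    entry = feature_table.setdefault(feature, {})
    if entry.get(field) == ALWAYS_ENABLED:
        # Once initialized to 'always_enabled' (e.g. from init_cfg.json),
        # the field is treated as immutable and the update is ignored.
        return False
    entry[field] = new_value
    return True
```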
Blueve	6a6e583b06	[bash.bashrc] Add reverse SSH script to bash.bashrc (#5438 ) * [bash.bashrc] Add reverse SSH script to bash.bashrc * Fix command issue and add empty line before EOF * Add checks for SSH_TARGET_CONSOLE_LINE Signed-off-by: Jing Kan jika@microsoft.com	2020-11-24 14:11:53 +08:00
Sudharsan Dhamal Gopalarathnam	98a434e8c1	Copp Manager Changes (#4861 ) *Introduce CoPP Manager infrastructure Copp service to generate initial copp config template file Co-authored-by: dgsudharsan <sudharsan_gopalarat@dell.com>	2020-11-23 09:31:42 -08:00
Sujin Kang	5b31996f7b	[reboot-history] Add reboot history to state db (#5933 ) - Why I did it Add reboot history to State db so that it can be used by the telemetry service - How I did it Split the process-reboot-cause service into determine-reboot-cause and process-reboot-cause: determine-reboot-cause determines the reboot cause, and process-reboot-cause parses the reboot cause files and puts the reboot history into state db. Moved to sonic-host-service* packages - How to verify it Performed unit test and tested on DUT	2020-11-20 20:08:18 -08:00
Joe LeVeque	23247514f9	Fix a number of LGTM alerts (#5952 ) Fix 259 alerts reported by the LGTM tool: - 245 for Unused import - 7 for Testing equality to None - 5 for Duplicate key in dict literal - 1 for Module is imported more than once - 1 for Unused local variable	2020-11-20 10:58:48 -08:00
JiangboHe	461e43649b	fix error: interface counters is mismatch after warm-reboot (#5346 ) - Why I did it There is an issue with counters after warm-reboot: if I clear counters with the command "sonic-clear counters" and then execute 'warm-reboot', then when SONiC restarts, "show interface counters" still shows the old counters from before "sonic-clear". These counters are wrong because the counters file in '/tmp' is lost during the warm-reboot process. - How I did it I fixed it by saving the '/tmp/portstat-0' folder in '/host/' before executing 'warm-reboot' (in pull request Azure/sonic-utilities#1099 ), and restoring the counters folder back to '/tmp/' after the warm-reboot process is finished. - How to verify it Clear counters with 'sonic-clear': sonic-clear counters sonic-clear dropcounters sonic-clear pfccounters sonic-clear queuecounters sonic-clear rifcounters Execute 'warm-reboot' Use the command ‘show interface counters’ to check that the counters are correct.	2020-11-20 10:37:45 -08:00
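The save/restore approach described above can be sketched in Python. The real change lives in the warm-reboot scripts in sonic-utilities and in the image's startup path, so the two functions below are hypothetical and only show the idea: copy the `/tmp/portstat-0` snapshot to persistent `/host/` storage before the reboot, then move it back once the system is up.

```python
import os
import shutil


def save_counters(src="/tmp/portstat-0", dst_dir="/host"):
    """Copy the cleared-counter snapshot to persistent storage before warm-reboot."""
    if not os.path.isdir(src):
        return None  # nothing was cleared, so there is nothing to preserve
    dst = os.path.join(dst_dir, os.path.basename(src))
    if os.path.exists(dst):
        shutil.rmtree(dst)  # drop a stale copy from an earlier reboot
    shutil.copytree(src, dst)
    return dst


def restore_counters(saved, dst="/tmp/portstat-0"):
    """Move the snapshot back into /tmp once warm-reboot has finished."""
    if saved is None or not os.path.isdir(saved):
        return False
    if os.path.exists(dst):
        shutil.rmtree(dst)
    shutil.move(saved, dst)
    return True
```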
Joe LeVeque	7bf05f7f4f	[supervisor] Install vanilla package once again, install Python 3 version in Buster container (#5546 ) - Why I did it We were building a custom version of Supervisor because I had added patches to prevent hangs and crashes if the system clock ever rolled backward. Those changes were merged into the upstream Supervisor repo as of version 3.4.0 (http://supervisord.org/changes.html#id9), therefore, we should be able to simply install the vanilla package via pip. This will also allow us to easily move to Python 3, as Python 3 support was added in version 4.0.0. - How I did it - Remove Makefiles and patches for building supervisor package from source - Install Python 3 supervisor package version 4.2.1 in Buster base container - Also install Python 3 version of supervisord-dependent-startup in Buster base container - Debian package installed binary in `/usr/bin/`, but pip package installs in `/usr/local/bin/`, so rather than update all absolute paths, I changed all references to simply call `supervisord` and let the system PATH find the executable to prevent future need for changes just in case we ever need to switch back to build a Debian package, then we won't need to modify these again. - Install Python 2 supervisor package >= 3.4.0 in Stretch and Jessie base containers	2020-11-19 23:41:32 -08:00
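The commit stops hardcoding `/usr/bin/supervisord` and lets the system PATH find the executable, since pip installs it under `/usr/local/bin/`. The change itself is in scripts and config files; as an illustration only, the same lookup from Python uses `shutil.which`. The helper name `find_supervisord` is hypothetical.

```python
import shutil


def find_supervisord(path=None):
    """Locate supervisord via PATH instead of a hardcoded absolute path.

    The Debian package installs /usr/bin/supervisord, the pip package
    /usr/local/bin/supervisord; a PATH lookup finds either one.
    `path` overrides PATH (an os.pathsep-separated string) when given.
    """
    exe = shutil.which("supervisord", path=path)
    if exe is None:
        raise FileNotFoundError("supervisord not found on PATH")
    return exe
```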
pavel-shirshov	a92732fe5d	[bgpcfgd]: Fixes for BBR (#5956 ) * Add explicit default state into the constants.yml * Enable/disable only peer-groups, available in the config * Retrieve updates from frr before using configuration Co-authored-by: Pavel Shirshov <pavel.contrib@gmail.com>	2020-11-19 00:07:58 -08:00
heidinet2007	7c17c58b83	Move teamd warm reboot code to service script (#5163 ) Summary: Move teamd functions to a new service script Motivation: To segregate teamd functions in one common place. fast-reboot script calls teamd functions that should ideally be replaced by a simple call to a service script. Changes: New teamd service script and path modification from /usr/bin/teamd.sh to /usr/local/bin/teamd.sh fast-reboot script (in sonic-utilities) modification (to use new teamd.sh to stop teamd) should follow soon after this change. Verification: VS image tests. Signed-off-by: Vaibhav Hemant Dixit <vaibhav.dixit@microsoft.com> Co-authored-by: heidi.ou@alibaba-inc.com <heidi.ou@alibaba-inc.com> Co-authored-by: Ying Xie <ying.xie@microsoft.com>	2020-11-13 13:34:18 -08:00
fk410167	a3dd3f55f9	Platform Driver Developement Framework (PDDF) (#4756 ) This change introduces PDDF which is described here: https://github.com/Azure/SONiC/pull/536 Most of the platform bring up effort goes into developing the platform device drivers and SONiC platform APIs, and validating them. Typically each platform vendor writes their own drivers and platform APIs, which are tailor-made for that platform. This involves writing code, building, installing it on the target platform devices and testing. Many of the details of the platform, taken from the HW spec, are hard coded into these drivers. Vendors repeat this cycle until everything works and is validated before upstreaming the code. PDDF aims to make this platform driver and platform API development process much simpler by providing a data driven development framework. This is enabled by: JSON descriptor files for platform data; generic data-driven drivers for various devices; generic SONiC platform APIs; vendor-specific extensions for customisation and extensibility. Signed-off-by: Fuzail Khan <fuzail.khan@broadcom.com>	2020-11-12 10:22:38 -08:00
Lawrence Lee	ae69fdf312	[buffers_config.j2]: Use correct cable lengths for backend devices (#5905 ) * Remove 'backend' from device type strings so that backend devices ('BackEndToRRouter' and 'BackEndLeafRouter') are given the same cable lengths as regular device types. Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2020-11-12 09:03:59 -08:00
Lawrence Lee	d0f16c0d79	Make backend device checking more robust (#5730 ) Treat devices that are ToRRouters (ToRRouters and BackEndToRRouters) the same when rendering templates Except for BackEndToRRouters belonging to a storage cluster, since these devices have extra sub-interfaces created Treat devices that are LeafRouters (LeafRouters and BackEndLeafRouters) the same when rendering templates Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2020-11-10 15:06:35 -08:00
Prince Sunny	1eaaf64ed2	Set preference for forced mgmt routes (#5844 ) When forced mgmt routes are present, the issue fixed as part of #5754 is not complete. Added a preference(priority) field to forced mgmt route ip rules	2020-11-10 14:20:13 -08:00
arlakshm	2b41f6bd5c	Add the vtysh command with newly added "-n" option for multi asic to the read_only_cmds (#5845 ) On multi asic platforms the "show ip bgp summary" command is not available to users with read-only privileges. To fix this, the vtysh command with the new "-n" option, added for multi asic platforms, needs to be added to the READ_ONLY_COMMANDS list in the sudoers file. This commit adds the command vtysh -n [0-9] -c show * to the list of READ_ONLY_COMMANDS in the sudoers file. Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>	2020-11-10 12:18:49 -08:00
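To see which command lines the new sudoers entry admits, the pattern can be tried with Python's `fnmatch`. This is only an approximation: sudoers does its own wildcard matching on the command path and arguments, but `*` and `[0-9]` behave as glob syntax in both. Note that `[0-9]` matches exactly one character, so a namespace index of 10 or higher would not match.

```python
from fnmatch import fnmatchcase

# Argument pattern from the commit; fnmatch only approximates sudoers matching.
READ_ONLY_PATTERN = "vtysh -n [0-9] -c show *"


def is_read_only(cmdline):
    return fnmatchcase(cmdline, READ_ONLY_PATTERN)


print(is_read_only("vtysh -n 0 -c show ip bgp summary"))   # True
print(is_read_only("vtysh -n 1 -c configure terminal"))    # False
print(is_read_only("vtysh -n 10 -c show ip bgp summary"))  # False: one digit only
```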
abdosi	4f82463670	[multi-asic] Fixed the docker mount point check for multi-asic (#5848 ) The getMount() API was not updated to handle multi-asic platforms. Updated getMount() to return abspath() for the Docker mount point and use it for the mount point comparison. Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2020-11-09 13:03:00 -08:00
Joe LeVeque	e0fdf45ad0	[update_chassisdb_config] Convert to Python 3 (#5838 ) - Convert update_chassisdb_config script to Python 3 - Reorganize imports per PEP8 standard - Two blank lines precede functions per PEP8 standard	2020-11-09 08:35:36 -08:00
Guohan Lu	ad2e18e856	[baseimage]: install psutil for python3 psutil is needed by process_checker which is using python3 Signed-off-by: Guohan Lu <lguohan@gmail.com>	2020-11-09 00:29:10 -08:00
Praveen Chaudhary	6156cb2805	[sonic-yang-mgmt] Build PY3 & PY2 packages (#5559 ) Moving sonic-yang-mgmt to PY3 to support move of sonic-utilities to PY3. Signed-off-by: Praveen Chaudhary<pchaudhary@linkedin.com>	2020-11-07 13:03:41 -08:00
Joe LeVeque	04d0e8ab00	[hostcfgd] Convert to Python 3; Add to sonic-host-services package (#5713 ) To consolidate host services and install via packages instead of file-by-file, also as part of migrating all of SONiC to Python 3, as Python 2 is no longer supported.	2020-11-07 12:48:19 -08:00
Joe LeVeque	9e7e092610	[Monit process_checker] Convert to Python 3 (#5836 ) Convert process_checker script to Python 3	2020-11-07 12:46:23 -08:00
lguohan	e6796da141	[init_cfg.json.j2]: only enable gbsyncd feature for vs platform (#5815 ) currently only the vs platform has the gbsyncd feature built Signed-off-by: Guohan Lu <lguohan@gmail.com>	2020-11-07 00:46:18 -08:00
Stepan Blyshchak	9bc693ce6e	[hostcfgd] If feature state entry not in the cache, add a default state (#5777 ) Our use case is to register new features in runtime. The previous change which introduced the cache broke this capability and caused hostcfgd crash. Signed-off-by: Stepan Blyshchak <stepanb@nvidia.com>	2020-11-06 10:24:31 -08:00
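The fix is to fall back to a default when a feature registered at runtime is missing from the cache, instead of crashing on a missing key. Below is a minimal sketch of that pattern. The class name and the `"disabled"` default are assumptions for illustration and are not taken from hostcfgd.

```python
DEFAULT_STATE = "disabled"  # assumed default for features registered at runtime


class FeatureStateCache:
    """Hypothetical sketch of a hostcfgd-style cache of FEATURE states."""

    def __init__(self, initial=None):
        self._states = dict(initial or {})

    def update(self, feature, new_state):
        # A feature registered at runtime may not be cached yet; insert a
        # default state instead of raising KeyError.
        old_state = self._states.setdefault(feature, DEFAULT_STATE)
        self._states[feature] = new_state
        return old_state != new_state  # True if the state actually changed
```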
Joe LeVeque	13ff7b38d5	[docker-wait-any] Convert to Python 3, install dependency in host OS (#5784 ) - Convert docker-wait-any script to Python 3 - Install Python 3 Docker Engine API in host OS	2020-11-05 11:23:00 -08:00
Joe LeVeque	d8045987a6	[core_uploader.py] Convert to Python 3; Use logger from sonic-py-common for uniform logging (#5790 ) - Convert core_uploader.py script to Python 3 - Use logger from sonic-py-common for uniform logging - Reorganize imports alphabetically per PEP8 standard - Two blank lines precede functions per PEP8 standard - Remove unnecessary global variable declarations	2020-11-05 11:19:26 -08:00
Joe LeVeque	522a071ffb	[core_cleanup.py] Convert to Python 3; Fix bug; Improve code reuse (#5781 ) - Convert to Python 3 - Fix bug: `CORE_FILE_DIR` previously was set to `os.path.basename(__file__)`, which would resolve to the script name. Fix this by hardcoding to `/var/core/` instead - Remove locally defined logging functions; use Logger class from sonic-py-common instead	2020-11-05 10:01:12 -08:00
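The bug above is easy to reproduce: `os.path.basename()` strips the directory part and returns only the final path component. The sketch uses a hypothetical install path in place of `__file__`, and `list_core_files` is a hypothetical helper (not from core_cleanup.py) that shows the fixed directory in use.

```python
import os

# Buggy version: basename() of the script path is just the file name,
# so it can never point at the core-file directory.
BUGGY_CORE_FILE_DIR = os.path.basename("/usr/bin/core_cleanup.py")  # hypothetical path

# Fixed version, as described in the commit: hardcode the directory.
CORE_FILE_DIR = "/var/core/"


def list_core_files(core_dir=CORE_FILE_DIR):
    """Return regular files in core_dir, oldest modification time first."""
    paths = [os.path.join(core_dir, name) for name in os.listdir(core_dir)]
    files = [p for p in paths if os.path.isfile(p)]
    return sorted(files, key=os.path.getmtime)
```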
Joe LeVeque	d3262d10f7	[generate_asic_config_checksum.py] Convert to Python 3 (#5783 ) - Convert script to Python 3 - Need to open file in binary mode before hashing due to new string data type in Python 3 being unicode by default. This should probably have been done regardless. - Reorganize imports alphabetically - When running the script, don't explicitly call `python`. Instead let the program loader use the interpreter specified in the shebang (which is now `python3`).	2020-11-04 15:06:44 -08:00
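The binary-mode point above matters because in Python 3, `hashlib` only accepts bytes, and text mode yields `str`. Here is a minimal sketch of checksumming a set of files. SHA-1 is an assumption here, and the script's actual algorithm and file list may differ.

```python
import hashlib


def checksum_files(paths):
    """Hash the contents of several files into one hex digest.

    Files are opened in binary mode ('rb'): text mode would yield str, which
    hashlib rejects with TypeError, and could also apply newline translation.
    """
    digest = hashlib.sha1()  # assumed algorithm for illustration
    for path in sorted(paths):  # sort so the digest does not depend on input order
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                digest.update(chunk)
    return digest.hexdigest()
```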
Lawrence Lee	10ab46f7a0	Revert "[docker-base]: Rate limit priority INFO and lower in syslog" (#5763 ) * This was a temporary fix for orchagent spamming log messages and causing rate limiting, leading to critical messages being dropped for the syslog. No longer needed since Azure/sonic-sairedis#680 was merged.	2020-11-02 08:49:40 -08:00
Blueve	698b5544c9	[openssh] Introduce custom openssh-server package for supporting reverse console SSH (#5717 ) * Build and install openssh from source * Copy openssh deb package to dest folder * Update make rule * Update sonic debian extension * Append empty line before EOF * Update openssh patch * Add openssh-server to base image dependency * Fix indent type * Fix comments * Use commit id instead of tag id and add comment Signed-off-by: Jing Kan jika@microsoft.com	2020-11-02 10:31:15 +08:00
lguohan	c8a00eda95	[mgmt ip]: mvrf ip rule priority change to 32765 (#5754 ) Fix Azure/SONiC#551 When the eth0 IP address is configured, an ip rule is added for the eth0 IP address through the interfaces.j2 template. This eth0 ip rule creates an issue when a VRF (data VRF or management VRF) is also created in the system. When any VRF (data VRF or management VRF) is created, the kernel automatically adds a new rule "1000: from all lookup [l3mdev-table]". This l3mdev IP rule is never deleted, even if the VRF is deleted. Once this l3mdev IP rule is added, if the user configures an IP address for the eth0 interface, interfaces.j2 adds an eth0 IP rule "1000:from 100.104.47.74 lookup default ". Priority 1000 is automatically chosen by the kernel, and hence this rule gets higher priority than the already existing rule "1001:from all lookup local ". This results in the issue "ping from console to eth0 IP does not work once a VRF is created", as explained in Issue 551. More details and possible solutions are explained in the comments on Issue 551. This PR resolves the issue by always using the low priority 32765 for the IP rule that is created for the eth0 IP address. Tested with various combinations of VRF creation, deletion and IP address configuration along with ping from console to eth0 IP address. Co-authored-by: Kannan KVS <kannan_kvs@dell.com>	2020-10-31 20:45:59 -07:00
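The effect of the priority change can be seen with a simplified model of policy routing rule evaluation. In the model, rules are checked in ascending priority order, and ties keep insertion order. The model ignores route matching inside each table and the fact that the l3mdev rule only applies to VRF-bound traffic. The rule set mirrors the values quoted in the commit message.

```python
def lookup_order(rules, src_ip):
    """Return the routing tables consulted, in order, for a packet from src_ip.

    Each rule is (priority, selector, table); selector is "all" or a source IP.
    Lower priority values are evaluated first; sorted() is stable on ties.
    """
    ordered = sorted(rules, key=lambda rule: rule[0])
    return [table for _, selector, table in ordered if selector in ("all", src_ip)]


ETH0_IP = "100.104.47.74"  # example address from the commit message
BASE_RULES = [
    (1000, "all", "l3mdev"),   # added by the kernel once a VRF is created
    (1001, "all", "local"),
    (32766, "all", "main"),
    (32767, "all", "default"),
]

before = lookup_order(BASE_RULES + [(1000, ETH0_IP, "default")], ETH0_IP)
after = lookup_order(BASE_RULES + [(32765, ETH0_IP, "default")], ETH0_IP)
print(before)  # ['l3mdev', 'default', 'local', 'main', 'default']
print(after)   # ['l3mdev', 'local', 'default', 'main', 'default']
```

With priority 1000, the eth0 rule sends lookups to the default table before the local table is consulted. With 32765, the local table (which holds the eth0 address) is consulted first.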
abdosi	dddf96933c	[monit] Adding patch to enhance syslog error message generation for monit alert action when status is failed. (#5720 ) Why/How I did: Make sure first error syslog is triggered based on FAULT TOLERANCE condition. Added support of repeat clause with alert action. This is used as trigger for generation of periodic syslog error messages if error is persistent Updated the monit conf files with repeat every x cycles for the alert action	2020-10-31 17:29:49 -07:00
Renuka Manavalan	8d8aadb615	Load config after subscribe (#5740 ) - Why I did it update_all_feature_states can run anywhere from 20+ seconds to one minute. With the load of AAA & Tacacs preceding it, any DB updates in AAA/TACACS during the long-running feature updates would get missed. To avoid this, switch the order. - How I did it Do a load after updating all feature states. - How to verify it Not an easy one. Have a script that restarts hostcfgd, sleeps 2s, and runs a redis-cli/config command to update the AAA/TACACS table. Run the script above and watch the file /etc/pam.d/common-auth-sonic for a minute. - When it repros: the updates will not be reflected in /etc/pam.d/common-auth-sonic	2020-10-31 16:38:32 -07:00
Joe LeVeque	6333bb73b0	Explicitly call `pip2` rather than `pip` in locations where both pip2 and pip3 are installed (#5747 ) As part of the transition from Python 2 to Python 3, we are installing both pip2 and pip3 in the slave and config-engine containers. This PR replaces calls to `pip` in these containers with an explicit call to `pip2` to ensure the proper version of pip is executed, no matter which version of pip is aliased to `pip`, as we no longer rely on that alias. Also some other pip-related cleanup	2020-10-30 09:43:14 -07:00
Joe LeVeque	e111204206	[caclmgrd] Convert to Python 3; Add to sonic-host-services package (#5739 ) To consolidate host services and install them via packages instead of file-by-file, and as part of migrating all of SONiC to Python 3 (Python 2 is no longer supported), convert caclmgrd to Python 3 and add it to the sonic-host-services package	2020-10-29 16:29:12 -07:00
Shi Su	5ee5c13f32	Enable synchronous mode by default and add in minigraph parser (#5735 )	2020-10-29 09:15:12 -07:00
judyjoseph	6088bd59de	[multi-ASIC] BGP internal neighbor table support (#5520 ) * Initial commit for BGP internal neighbor table support. > Add new template named "internal" for the internal BGP sessions > Add a new table in database "BGP_INTERNAL_NEIGHBOR" > The internal BGP sessions will be stored in this new table "BGP_INTERNAL_NEIGHBOR" * Changes in template generation tests with the introduction of internal neighbor template files.	2020-10-28 16:41:27 -07:00
lguohan	07748a939f	[gbsyncd]: add gbsyncd to FEATURE table (#5683 ) Remove syncd from the critical process list because the gbsyncd process exits on platforms without a gearbox. Closes #5623 Signed-off-by: Guohan Lu <lguohan@gmail.com>	2020-10-27 11:40:23 -07:00
bingwang-ms	36c52cca2b	Fix 'NoSuchProcess' exception in process_checker (#5716 ) The psutil library used in process_checker creates a cache for each process when process_iter is called. As a result, a process may exist when process_iter is called but no longer exist when cmdline is called, which raises a NoSuchProcess exception. This commit fixes the issue. Signed-off-by: bingwang <bingwang@microsoft.com>	2020-10-27 09:25:35 +08:00
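The race can be sketched without psutil by walking `/proc` directly. The directory listing is a snapshot, and a process may exit before its `cmdline` file is read. With psutil, the equivalent guard wraps `proc.cmdline()` in `try`/`except psutil.NoSuchProcess`. This is not the process_checker code. The function name is hypothetical, and the `proc_root` parameter exists only so the sketch can be exercised against a fake directory.

```python
import os


def running_cmdlines(proc_root="/proc"):
    """Return {pid: argv} for processes that were still alive when read (sketch)."""
    result = {}
    try:
        entries = os.listdir(proc_root)  # snapshot; may be stale immediately
    except FileNotFoundError:
        return result  # no procfs available
    for name in entries:
        if not name.isdigit():
            continue  # skip entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, name, "cmdline"), "rb") as f:
                raw = f.read()
        except (FileNotFoundError, ProcessLookupError, NotADirectoryError, PermissionError):
            # The process exited between listdir() and open()/read(): skip it
            continue
        argv = [arg.decode(errors="replace") for arg in raw.split(b"\0") if arg]
        result[int(name)] = argv
    return result
```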
Joe LeVeque	9e34003136	[sonic-config-engine] Clean up dependencies, pin versions; install Python 3 package in Buster container (#5656 ) To clean up the image build procedure, and let setuptools/pip[3] implicitly install Python dependencies. Also use ipaddress package instead of ipaddr.	2020-10-26 13:48:50 -07:00
Shi Su	67408c85aa	[synchronous-mode] Add template file for synchronous mode (#5644 ) The orchagent and syncd need to have the same default synchronous mode configuration. This PR adds a template file to translate the default value in CONFIG_DB (empty field) to an explicit mode so that the orchagent and syncd could have the same default mode.	2020-10-23 13:08:35 -07:00
Joe LeVeque	3a4435eb53	Add sonic-host-services and sonic-host-services-data packages (#5694 ) - Why I did it Install all host services and their data files in package format rather than file-by-file - How I did it - Create sonic-host-services Python wheel package, currently including procdockerstatsd - Also add the framework for unit tests by adding one simple procdockerstatsd test case - Create sonic-host-services-data Debian package which is responsible for installing the related systemd unit files to control the services in the Python wheel. This package will also be responsible for installing any Jinja2 templates and other data files needed by the host services.	2020-10-23 09:52:29 -07:00
judyjoseph	ace7f24cba	[docker-teamd]: Add teamd as a dependent service to swss (#5628 ) - Why I did it On teamd docker restart, swss and syncd need to be restarted because there are dependent resources. - How I did it Added teamd as a dependent service of swss. Updated the docker-wait script to handle the service and its dependent services separately. Handled the case of warm-restart for the dependent service. - How to verify it Verified the following scenarios with the following testbed: VM1 ----------------------------[DUT 6100] -----------------------VM2, with continuous ping traffic between the VMs. 1. Stop teamd docker alone > swss, syncd dockers seen going away > The LAG reference count error messages seen for a while till swss docker stops. > Dockers back up. 2. Enable WR mode for teamd. Stop teamd docker alone > swss, syncd dockers not removed. > The LAG reference count error messages not seen > Repeated stop teamd docker test - same result, no effect on swss/syncd. 3. Stop swss docker. > swss, teamd, syncd go off - dockers come back correctly, interfaces up 4. Enable WR mode for swss. Stop swss docker > swss goes off without affecting syncd/teamd dockers. 5. Config reload > no reference counter error seen, dockers come back correctly, with interfaces up 6. Warm reboot, observations below > swss docker goes off first > teamd + syncd go off at the end of the WR process. > dockers come back up fine. > ping traffic between VMs was NOT HIT 7. Fast reboot, observations below > teamd goes off first (confirmed swss doesn't exit here) > swss goes off next > syncd goes away at the end of the FR process > dockers come back up fine. > there is a traffic HIT, as expected for fast-reboot 8. Verified the tests above, other than the WR/FB scenarios, on a multi-ASIC platform	2020-10-23 00:41:16 -07:00
yozhao101	af97e23686	[hostcfgd] Enable/disable the container service only when the feature state was changed. (#5689 ) - Why I did it If we ran the CLI commands `sudo config feature autorestart snmp disabled/enabled` or `sudo config feature autorestart swss disabled/enabled`, the SNMP container would be stopped and started. This behavior was unexpected, since we updated the `auto_restart` field, not the `state` field, in the `FEATURE` table. The cause was that whenever either the `state` field or the `auto_restart` field was updated, the function `update_feature_state(...)` was invoked, which then started the snmp.timer service. The snmp.timer service first stops snmp.service and later starts snmp.service. To solve this issue, the function `update_feature_state(...)` is now invoked only if the `state` field in the `FEATURE` table was updated. - How I did it When the daemon `hostcfgd` is activated, the value of the `state` field in the `FEATURE` table is cached for each container. Each time the function `feature_state_handler(...)` is invoked, it determines whether the `state` field of a container was changed. If it was changed, the function `update_feature_state(...)` is invoked and the cached value is also updated. Otherwise, nothing is done. - How to verify it We can run the CLI commands `sudo config feature autorestart snmp disabled/enabled` or `sudo config feature autorestart swss disabled/enabled` to check whether the SNMP container is stopped and started. We can also run the CLI commands `sudo config feature state snmp disabled/enabled` or `sudo config feature state swss disabled/enabled` to check whether the container is stopped and restarted. Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2020-10-22 20:01:07 -07:00
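The caching approach described above can be sketched as a small change detector. The class and method names are hypothetical, and this is not the hostcfgd implementation. The callback stands in for `update_feature_state(...)`. An update that changes only `auto_restart` leaves the cached `state` untouched, so the callback does not run.

```python
class FeatureStateCache:
    """Invoke a callback only when a feature's `state` field changes (sketch)."""

    def __init__(self, initial_table, on_state_change):
        # Cache the `state` field of every feature at daemon start-up
        self._states = {name: entry.get("state") for name, entry in initial_table.items()}
        self._on_state_change = on_state_change

    def handle(self, feature, entry):
        """Process one FEATURE table update; return True if the callback ran."""
        new_state = entry.get("state")
        if self._states.get(feature) == new_state:
            return False  # e.g. only `auto_restart` changed: nothing to do
        self._states[feature] = new_state  # keep the cache current
        self._on_state_change(feature, new_state)
        return True
```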
pavel-shirshov	c94f93f046	[bgpcfgd]: Dynamic BBR support (#5626 ) - Why I did it To introduce dynamic support of BBR functionality into bgpcfgd. BBR adds `neighbor PEER_GROUP allowas-in 1` for all BGP peer-groups which point to T0s. Now we can add and remove this configuration based on a CONFIG_DB entry. - How I did it I introduced a new CONFIG_DB entry: - table name: "BGP_BBR" - key value: "all". Currently only "all" is supported, which means that all peer-groups which point to T0s will be updated - data value: a dictionary: {"status": "status_value"}, where status_value could be either "enabled" or "disabled" Initially, when bgpcfgd starts, it reads initial BBR status values from the [constants.yml](https://github.com/Azure/sonic-buildimage/pull/5626/files#diff-e6f2fe13a6c276dc2f3b27a5bef79886f9c103194be4fcb28ce57375edf2c23cR34). Then you can control BBR status by changing the "BGP_BBR" table in the CONFIG_DB (see examples below). bgpcfgd knows which peer-groups to change from [constants.yml](https://github.com/Azure/sonic-buildimage/pull/5626/files#diff-e6f2fe13a6c276dc2f3b27a5bef79886f9c103194be4fcb28ce57375edf2c23cR39). The dictionary contains peer-group names as keys, and a list of address-families as values. So when bgpcfgd gets a request to change the BBR state, it changes the state only for peer-groups listed in the constants.yml dictionary (and only for address families from the peer-group value). - How to verify it Initially, when SONiC starts, FRR has BBR enabled for PEER_V4 and PEER_V6: ``` admin@str-s6100-acs-1:~$ vtysh -c 'show run' | egrep 'PEER_V.? allowas' neighbor PEER_V4 allowas-in 1 neighbor PEER_V6 allowas-in 1 ``` Then we apply the following configuration to the db: ``` admin@str-s6100-acs-1:~$ cat disable.json { "BGP_BBR": { "all": { "status": "disabled" } } } admin@str-s6100-acs-1:~$ sonic-cfggen -j disable.json -w ``` The log output is: ``` Oct 14 18:40:22.450322 str-s6100-acs-1 DEBUG bgp#bgpcfgd: Received message : '('all', 'SET', (('status', 'disabled'),))' Oct 14 18:40:22.450620 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-f', '/tmp/tmpmWTiuq']'. Oct 14 18:40:22.681084 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V4 soft in']'. Oct 14 18:40:22.904626 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V6 soft in']'. ``` Check the FRR configuration and see that no allowas parameters are there: ``` admin@str-s6100-acs-1:~$ vtysh -c 'show run' | egrep 'PEER_V.? allowas' admin@str-s6100-acs-1:~$ ``` Then we apply the enabling configuration back: ``` admin@str-s6100-acs-1:~$ cat enable.json { "BGP_BBR": { "all": { "status": "enabled" } } } admin@str-s6100-acs-1:~$ sonic-cfggen -j enable.json -w ``` The log output: ``` Oct 14 18:40:41.074720 str-s6100-acs-1 DEBUG bgp#bgpcfgd: Received message : '('all', 'SET', (('status', 'enabled'),))' Oct 14 18:40:41.074720 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-f', '/tmp/tmpDD6SKv']'. Oct 14 18:40:41.587257 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V4 soft in']'. Oct 14 18:40:42.042967 str-s6100-acs-1 DEBUG bgp#bgpcfgd: execute command '['vtysh', '-c', 'clear bgp peer-group PEER_V6 soft in']'. ``` Check the FRR configuration and see that the BBR configuration is back: ``` admin@str-s6100-acs-1:~$ vtysh -c 'show run' | egrep 'PEER_V.? allowas' neighbor PEER_V4 allowas-in 1 neighbor PEER_V6 allowas-in 1 ``` * The test coverage * Below is the test coverage ``` ---------- coverage: platform linux2, python 2.7.12-final-0 ---------- Name Stmts Miss Cover ---------------------------------------------------- bgpcfgd/__init__.py 0 0 100% bgpcfgd/__main__.py 3 3 0% bgpcfgd/config.py 78 41 47% bgpcfgd/directory.py 63 34 46% bgpcfgd/log.py 15 3 80% bgpcfgd/main.py 51 51 0% bgpcfgd/manager.py 41 23 44% bgpcfgd/managers_allow_list.py 385 21 95% bgpcfgd/managers_bbr.py 76 0 100% bgpcfgd/managers_bgp.py 193 193 0% bgpcfgd/managers_db.py 9 9 0% bgpcfgd/managers_intf.py 33 33 0% bgpcfgd/managers_setsrc.py 45 45 0% bgpcfgd/runner.py 39 39 0% bgpcfgd/template.py 64 11 83% bgpcfgd/utils.py 32 24 25% bgpcfgd/vars.py 1 0 100% ---------------------------------------------------- TOTAL 1128 530 53% ``` - Which release branch to backport (provide reason below if selected) - [ ] 201811 - [x] 201911 - [x] 202006	2020-10-22 11:04:21 -07:00
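The flow described above can be sketched as follows. The "BGP_BBR" status selects `neighbor ... allowas-in 1` or its `no` form. The change applies to each peer-group and address family listed in a constants.yml-style mapping, and each peer-group is then soft-cleared. The function name, the ASN 65100, and the exact layout of the generated FRR text are assumptions for illustration, not bgpcfgd's actual template.

```python
def bbr_commands(status, bgp_asn, peer_groups):
    """Build FRR config lines and clear commands for a BBR status change (sketch).

    status:      "enabled" or "disabled" (value of BGP_BBR|all "status")
    peer_groups: {peer_group_name: [address_family, ...]}, as in constants.yml
    """
    if status not in ("enabled", "disabled"):
        raise ValueError("status must be 'enabled' or 'disabled'")
    prefix = "" if status == "enabled" else "no "
    lines = ["router bgp %s" % bgp_asn]
    for peer_group, families in sorted(peer_groups.items()):
        for family in families:
            lines.append(" address-family %s" % family)
            lines.append("  %sneighbor %s allowas-in 1" % (prefix, peer_group))
            lines.append(" exit-address-family")
    # Soft inbound reset so the new policy applies to already-received routes
    clears = ["clear bgp peer-group %s soft in" % pg for pg in sorted(peer_groups)]
    return lines, clears
```

In the log above, the config lines would correspond to the temporary file passed to `vtysh -f`, and the clear commands to the `vtysh -c` calls.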
BrynXu	29928c93a1	[chassis]: Use correct path for chassisdb.conf file (#5632 ) Use the correct chassisdb.conf path while bringing up the chassis_db service on a VoQ modular switch. Resolves #5631 Signed-off-by: Honggang Xu <hxu@arista.com>	2020-10-21 01:40:04 -07:00
Lawrence Lee	207587d97c	[docker-base]: Rate limit priority INFO and lower in syslog (#5666 ) There is currently a bug where messages from swss with priority lower than the current log level are still counted against the syslog rate-limiting threshold. This causes syslog to rate-limit even when the rate-limiting conditions have not been met, which makes several sonic-mgmt tests fail because they depend on LogAnalyzer. It also omits potentially useful information from the syslog. Rate-limiting only messages of level INFO and lower allows these tests to pass. Signed-off-by: Lawrence Lee <lawlee@microsoft.com>	2020-10-20 11:52:46 -07:00
pavel-shirshov	d19d1dd569	[bgpcfgd]: Change prefix-list generation for "Allow prefix" feature (#5639 ) - Why I did it I was asked to change the "Allow list" prefix-list generation rule. Previously we generated the rules using the following method: ``` For each {prefix}/{masklen} we would generate the prefix-rule permit {prefix}/{masklen} ge {masklen}+1 Example: Prefix 1.2.3.4/24 would have the following prefix-list entry generated permit 1.2.3.4/24 ge 25 ``` But we discovered the old rule doesn't work for all the cases we have. So we introduced the new rule: ``` For ipv4 entry, For mask < 32 , we will add ‘le 32’ to cover all prefix masks to be sent by T0 For mask =32 , we will not add any ‘le mask’ For ipv6 entry, we will add le 128 to cover all prefix masks to be sent by T0 For mask < 128 , we will add ‘le 128’ to cover all prefix masks to be sent by T0 For mask = 128 , we will not add any ‘le mask’ ``` - How I did it I changed the prefix-list entry generation function. I also introduced a test for the changed function. - How to verify it 1. Build an image and put it on your DUT. 2. Create a file test_schema.conf with the test configuration ``` { "BGP_ALLOWED_PREFIXES": { "DEPLOYMENT_ID|0|1010:1010": { "prefixes_v4": [ "10.20.0.0/16", "10.50.1.0/29" ], "prefixes_v6": [ "fc01:10::/64", "fc02:20::/64" ] }, "DEPLOYMENT_ID|0": { "prefixes_v4": [ "10.20.0.0/16", "10.50.1.0/29" ], "prefixes_v6": [ "fc01:10::/64", "fc02:20::/64" ] } } } ``` 3. Apply the configuration by command ``` sonic-cfggen -j test_schema.conf --write-to-db ``` 4. Check that your bgp configuration has the following prefix-list entries: ``` admin@str-s6100-acs-1:~$ show runningconfiguration bgp | grep PL_ALLOW ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V4 seq 10 deny 0.0.0.0/0 le 17 ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V4 seq 20 permit 127.0.0.1/32 ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V4 seq 30 permit 10.20.0.0/16 le 32 ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V4 seq 40 permit 10.50.1.0/29 le 32 ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V4 seq 10 deny 0.0.0.0/0 le 17 ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V4 seq 20 permit 127.0.0.1/32 ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V4 seq 30 permit 10.20.0.0/16 le 32 ip prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V4 seq 40 permit 10.50.1.0/29 le 32 ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V6 seq 10 deny ::/0 le 59 ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V6 seq 20 deny ::/0 ge 65 ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V6 seq 30 permit fc01:10::/64 le 128 ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_1010:1010_V6 seq 40 permit fc02:20::/64 le 128 ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V6 seq 10 deny ::/0 le 59 ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V6 seq 20 deny ::/0 ge 65 ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V6 seq 30 permit fc01:10::/64 le 128 ipv6 prefix-list PL_ALLOW_LIST_DEPLOYMENT_ID_0_COMMUNITY_empty_V6 seq 40 permit fc02:20::/64 le 128 ``` Co-authored-by: Pavel Shirshov <pavel.contrib@gmail.com>	2020-10-20 00:38:09 -07:00
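The new rule can be sketched with Python's `ipaddress` module. A prefix shorter than the family's maximum length gets `le 32` (IPv4) or `le 128` (IPv6), and a host prefix gets no `le` clause. The function name is hypothetical, and the sketch produces only the permit clause. The `seq` numbers, the deny entries, and the prefix-list names in the output above are outside its scope. `strict=False` normalizes host bits, so `1.2.3.4/24` is rendered as `1.2.3.0/24`.

```python
import ipaddress


def allow_list_entry(prefix):
    """Return the permit clause for one allowed prefix under the new rule (sketch)."""
    net = ipaddress.ip_network(prefix, strict=False)
    max_len = net.max_prefixlen  # 32 for IPv4, 128 for IPv6
    if net.prefixlen < max_len:
        # Cover every more-specific mask T0 may send
        return "permit %s le %d" % (net, max_len)
    return "permit %s" % net  # host route: no 'le' clause
```

For the test configuration above, `allow_list_entry("10.20.0.0/16")` gives `permit 10.20.0.0/16 le 32` and `allow_list_entry("fc01:10::/64")` gives `permit fc01:10::/64 le 128`, matching the seq 30 entries.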
Joe LeVeque	edf4971b16	[caclmgrd] Prevent unnecessary iptables updates (#5312 ) When a large number of changes occur to the ACL table of Config DB, caclmgrd will get flooded with notifications, and previously, it would regenerate and apply the iptables rules for each change, which is unnecessary, as the iptables rules should only get applied once after the last change notification is received. If the ACL table contains a large number of control plane ACL rules, this could cause a large delay in caclmgrd getting the rules applied. This patch causes caclmgrd to delay updating the iptables rules until it has not received a change notification for at least 0.5 seconds.	2020-10-19 11:11:30 -07:00
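The 0.5-second quiet period can be sketched as a debouncer that the daemon's main loop polls. Each notification only records a timestamp, and the expensive update runs once no notification has arrived for `quiet` seconds. The class name and the injectable clock are assumptions for illustration, not caclmgrd's actual structure.

```python
import time


class Debouncer:
    """Run `apply_fn` once after `quiet` seconds without new notifications (sketch)."""

    def __init__(self, apply_fn, quiet=0.5, clock=time.monotonic):
        self._apply = apply_fn
        self._quiet = quiet
        self._clock = clock
        self._last_event = None  # time of the newest notification not yet applied

    def notify(self):
        """Record a change notification; the update itself is deferred."""
        self._last_event = self._clock()

    def poll(self):
        """Call periodically; return True if the deferred update ran."""
        if self._last_event is None:
            return False
        if self._clock() - self._last_event < self._quiet:
            return False  # still inside the quiet window: keep waiting
        self._last_event = None
        self._apply()  # e.g. regenerate and apply iptables rules once
        return True
```

A burst of notifications therefore collapses into a single `apply_fn` call, issued 0.5 s after the last one.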
Joe LeVeque	678b66359d	[procdockerstatsd] Convert to Python 3 (#5657 ) Make procdockerstatsd Python 3-compliant and set interpreter to python3 in shebang. Also some other cleanup to improve code reuse.	2020-10-19 09:46:02 -07:00
Rajkumar-Marvell	5708e32ccf	Set sock rx Buf size to 3MB. (#5566 ) * Set sock rx Buf size to 3MB.	2020-10-15 14:40:59 -07:00
BrynXu	a2e3d2fcea	[ChassisDB]: bring up ChassisDB service (#5283 ) bring up chassisdb service on sonic switch according to the design in Distributed Forwarding in VoQ Arch HLD Signed-off-by: Honggang Xu <hxu@arista.com> - Why I did it To bring up the new ChassisDB service in sonic as designed in ['Distributed forwarding in a VOQ architecture HLD' ](`90c1289eaf/doc/chassis/architecture.md`). - How I did it Implement the section 2.3.1 Global DB Organization of the VOQ architecture HLD. - How to verify it ChassisDB service won't start without the chassisdb.conf file on the existing platforms. ChassisDB service is accessible with the global.conf file in the distributed architecture. Signed-off-by: Honggang Xu <hxu@arista.com>	2020-10-14 15:15:24 -07:00
Joe LeVeque	88c1d66c27	[python-click] No longer build our own package, let pip/setuptools install vanilla (#5549 ) We were building our own python-click package because we needed features/bug fixes available as of version 7.0.0, but the most recent version available from Debian was in the 6.x range. "Click" is needed for building/testing and installing sonic-utilities. Now that we are building sonic-utilities as a wheel, with Click specified as a dependency in the setup.py file, setuptools will install a more recent version of Click in the sonic-slave-buster container when building the package, and pip will install a more recent version of Click in the host OS of SONiC when installing the sonic-utilities package. Also, we don't need to worry about installing the Python 2 or 3 version of the package, as the proper one will be installed as necessary.	2020-10-14 10:16:35 -07:00
abdosi	9094e2176f	Optimize ACL Table/Rule notification handling (#5621 ) * Optimize ACL Table/Rule notification handling to loop pop() until empty, consuming all the data in a batch. This way we prevent multiple calls to iptables updates Signed-off-by: Abhishek Dosi <abdosi@microsoft.com> * Address review comments Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2020-10-14 08:05:33 -07:00
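The batching idea above can be sketched as follows: pop until the source is empty, then run one update for the whole batch. In the daemon, the source is the Config DB subscription. Here a `queue.Queue` stands in for it as an illustrative assumption, and the function name and keys are hypothetical.

```python
import queue


def drain_and_apply(notifications, apply_fn):
    """Pop every pending notification, then apply them in one call (sketch)."""
    batch = []
    while True:
        try:
            batch.append(notifications.get_nowait())
        except queue.Empty:
            break  # nothing left: the batch is complete
    if batch:
        apply_fn(batch)  # one iptables regeneration for the whole batch
    return len(batch)
```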
Junchao-Mellanox	1c97a03b81	[system-health] Add support for monitoring system health (#4835 ) * system health first commit * system health daemon first commit * Finish healthd * Changes due to lower layer logic change * Get ASIC temperature from TEMPERATURE_INFO table * Add system health make rule and service files * fix bugs found during manual test * Change make file to install system-health library to host * Set system LED to blink on bootup time * Caught exceptions in system health checker to make it more robust * fix issue that fan/psu presence will always be true * fix issue for external checker * move system-health service to right after rc-local service * Set system-health service start after database service * Get system up time via /proc/uptime * Provide more information in stat for CLI to use * fix typo * Set default category to External for external checker * If external checker reported OK, save it to stat too * Trim string for external checker output * fix issue: PSU voltage check always return OK * Add unit test cases for system health library * Fix LGTM warnings * fix demo comments: 1. get boot up timeout from monit configuration file; 2. set system led in library instead of daemon * Remove boot_timeout configuration because it will get from monit config file * Fix argument miss * fix unit test failure * fix issue: summary status is not correct * Fix format issues found in code review * rename th to threshold to make it clearer * Fix review comment: 1. add a .dep file for system health; 2. deprecated daemon_base and uses sonic-py-common instead * Fix unit test failure * Fix LGTM alert * Fix LGTM alert * Fix review comments * Fix review comment * 1. Add relevant comments for system health; 2. 
rename external_checker to user_define_checker * Ignore check for unknown service type * Fix unit test issue * Rename user define checker to user defined checker * Rename user_define_checkers to user_defined_checkers for configuration file * Rename file user_define_checker.py -> user_defined_checker.py * Fix typo * Adjust import order for config.py Co-authored-by: Joe LeVeque <jleveque@users.noreply.github.com> * Adjust import order for src/system-health/health_checker/hardware_checker.py Co-authored-by: Joe LeVeque <jleveque@users.noreply.github.com> * Adjust import order for src/system-health/scripts/healthd Co-authored-by: Joe LeVeque <jleveque@users.noreply.github.com> * Adjust import orders in src/system-health/tests/test_system_health.py * Fix typo * Add new line after import * If system health configuration file does not exist, healthd should exit * Fix indent and enable pytest coverage * Fix typo * Fix typo * Remove global logger and use log functions inherited from super class * Change info level logger to notice level Co-authored-by: Joe LeVeque <jleveque@users.noreply.github.com>	2020-10-12 11:12:49 +03:00
abdosi	01fceb6f79	Optimized caclmgrd Notification handling. Previously (#5560 ) any event on the ACL Rule Table (e.g. DATAACL rules being programmed) caused the control plane default action to be triggered. Now the control plane action is triggered only for a) ACL rules belonging to a control plane ACL table Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2020-10-08 11:31:09 -07:00
jon-nokia	d03de95e81	[build]: fix pip installation for sonic utilities whl package (#5498 ) The problem was that the proxy was missing on "pip install". This fixes the build behind a proxy. Signed-off-by: Jon Goldberg <jon.goldberg@nokia.com>	2020-10-06 15:47:50 -07:00
Ying Xie	ec0153008a	[rc.local] separate configuration migration and grub installation logic (#5528 ) To address issue #5525 Explicitly control the grub installation requirement when it is needed. We have a scenario where configuration migration happened but grub installation is not required. Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2020-10-03 23:00:39 -07:00
pavel-shirshov	ffae82f8be	[bgp] Add 'allow list' manager feature (#5513 ) implements a new feature: "BGP Allow list." This feature allows us to control which IP prefixes are going to be advertised via ebgp from the routes received from EBGP neighbors.	2020-10-02 10:06:04 -07:00
anish-n	e15e6a8313	[config-reload]: Add logic to clean up FG_ROUTE state db table during reload (#5518 ) Cleanup FG_ROUTE state db table during reload	2020-10-02 09:25:29 -07:00
Tamer Ahmed	110f7b7817	[cfggen] Build Python 2 And Python 3 Wheel Packages This builds Python 2&3 wheel packages for the sonic-cfggen script. Signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-09-30 07:07:43 -07:00
Volodymyr Boiko	d71a4efe3b	[sonic-platform-common] Install Python 3 package in host OS and PMon container (#5461 ) Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>	2020-09-29 13:57:54 -07:00
Guohan Lu	e412338743	Revert "[bgp] Add 'allow list' manager feature (#5309 )" This reverts commit `6eed0820c8`.	2020-09-28 22:00:29 -07:00
pavel-shirshov	6eed0820c8	[bgp] Add 'allow list' manager feature (#5309 ) implements a new feature: "BGP Allow list." This feature allows us to control which IP prefixes are going to be advertised via ebgp from the routes received from EBGP neighbors.	2020-09-27 10:47:43 -07:00
judyjoseph	4006ce711f	[Multi-Asic] Forward SNMP requests received on front panel interface to SNMP agent in host. (#5420 ) * [Multi-Asic] Forward SNMP requests destined to loopback IP, and coming in through the front panel interface present in the network namespace, to SNMP agent running in the linux host. * Updates based on comments * Further updates in docker_image_ctl.j2 and caclmgrd * Change the variable for net config file. * Updated the comments in the code. * No need to clean up the existing NAT rules if present, which could be created by some other process. * Delete our rule first and add it back, to take care of caclmgrd restart. Another benefit is that we delete only our rules, rather than the earlier approach of "iptables -F" which cleans up all rules. * Keeping the original logic to clean the NAT entries, to revisit when the NAT feature is added in namespace. * Missing updates to log_info call.	2020-09-26 12:14:30 -07:00
Syd Logan	0311a4a037	Add gearbox phy device files and a new physyncd docker to support VS gearbox phy feature (#4851 ) * buildimage: Add gearbox phy device files and a new physyncd docker to support VS gearbox phy feature * scripts and configuration needed to support a second syncd docker (physyncd) * physyncd supports gearbox device and phy SAI APIs and runs multiple instances of syncd, one per phy in the device * support for VS target (sonic-sairedis vslib has been extended to support a virtual BCM81724 gearbox PHY). HLD is located at `b817a12fd8/doc/gearbox/gearbox_mgr_design.md` - Why I did it This work is part of the gearbox phy joint effort between Microsoft and Broadcom, and is based on multi-switch support in sonic-sairedis. - How I did it The overall feature was implemented across several projects. The collective pull requests (some in late stages of review at this point): https://github.com/Azure/sonic-utilities/pull/931 - CLI (merged) https://github.com/Azure/sonic-swss-common/pull/347 - Minor changes (merged) https://github.com/Azure/sonic-swss/pull/1321 - gearsyncd, config parsers, changes to orchagent to create gearbox phy on supported systems https://github.com/Azure/sonic-sairedis/pull/624 - physyncd, virtual BCM81724 gearbox phy added to vslib - How to verify it In a vslib build: root@sonic:/home/admin# show gearbox interfaces status PHY Id Interface MAC Lanes MAC Lane Speed PHY Lanes PHY Lane Speed Line Lanes Line Lane Speed Oper Admin -------- ----------- --------------- ---------------- --------------- ---------------- ------------ ----------------- ------ ------- 1 Ethernet48 121,122,123,124 25G 200,201,202,203 25G 204,205 50G down down 1 Ethernet49 125,126,127,128 25G 206,207,208,209 25G 210,211 50G down down 1 Ethernet50 69,70,71,72 25G 212,213,214,215 25G 216 100G down down In addition, docker ps | grep phy should show a physyncd docker running. Signed-off-by: syd.logan@broadcom.com	2020-09-25 08:32:44 -07:00
bingwang-ms	584e2223dc	Fix exception when attempting to write a datetime to db (#5467 ) redis-py 3.0, used in the master branch, only accepts user data as bytes, strings or numbers (ints, longs and floats). Attempting to specify a key or a value as any other type will raise a DataError exception. This PR addresses the issue by converting datetime to str	2020-09-25 20:19:18 +08:00
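The fix amounts to converting values before they reach redis-py. The hypothetical helper below makes the accepted types explicit. It passes through bytes, str, int and float, converts `datetime` with `str()`, and rejects anything else. `bool` is rejected up front because it is a subclass of `int` and redis-py 3.x refuses it with a DataError. This is a sketch, not the code changed in the PR.

```python
import datetime


def to_redis_value(value):
    """Coerce a field value into a type redis-py 3.x accepts (sketch)."""
    if isinstance(value, bool):
        # bool subclasses int, but redis-py 3.x rejects it: convert explicitly
        raise TypeError("convert bool explicitly before writing to redis")
    if isinstance(value, (bytes, str, int, float)):
        return value
    if isinstance(value, datetime.datetime):
        # str() gives "YYYY-MM-DD HH:MM:SS[.ffffff]"
        return str(value)
    raise TypeError("unsupported type for redis: %s" % type(value).__name__)
```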
yozhao101	13cec4c486	[Monit] Unmonitor the processes in containers which are disabled. (#5153 ) We want Monit to unmonitor the processes in containers which are disabled in the `FEATURE` table, so that Monit does not generate false alerting messages in the syslog. Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2020-09-25 00:28:28 -07:00
Venkatesan Mahalingam	418e437d79	[caclmgrd] Add support to allow/deny any IP/IPv6 protocol packets coming to CPU based on source IP (#4591 ) Add support to allow/deny packets coming to CPU based on source IP, regardless of destination port	2020-09-23 09:55:09 -07:00
abdosi	0483255e82	Fix the build issue when port2cable length is defined in (#5437 ) buffer_default_*.j2, because of which the internal cable length never gets defined, causing a failure in test case test_multinpu_cfggen.py Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net> Co-authored-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>	2020-09-23 08:07:09 -07:00
abdosi	75e4258508	Enhanced Feature Table state enable/disable for multi-asic platforms. (#5358 ) * Enhanced Feature Table state enable/disable for multi-asic platforms. On multi-asic platforms, some features can have a service per ASIC, so we need to get the list of all services. Also updated the logic to return if any one of the systemctl commands returns failure, and to make sure the syslog message for a feature being enabled/disabled is only emitted when all commands are successful. Moved the service list get API from sonic-util to sonic-py-common Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net> * Make sure to return None for both service lists in case of error. Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net> * Return empty list as fail condition Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net> * Address Review Comments. Made init_cfg.json.j2 aware of whether a Feature service is global scope or per-ASIC scope Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net> * Fix merge conflict * Address Review Comment. Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net> Co-authored-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>	2020-09-22 08:34:02 -07:00
abdosi	a7f4bfa96d	Enabling ipv6 support on docker container network. This is needed (#5418 ) for ipv6 communication between container and host on multi-asic platforms. Addresses are assigned from the private address space fd::/80. The prefix length 80 was chosen so that the last 48 bits can be the container MAC address, which prevents NDP neighbor cache invalidation issues in the Docker layer. Ref: https://docs.docker.com/config/daemon/ipv6/ Ref: https://medium.com/@skleeschulte/how-to-enable-ipv6-for-docker-containers-on-ubuntu-18-04-c68394a219a2 Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net> Co-authored-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>	2020-09-22 08:32:17 -07:00
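The reason for the /80 length is arithmetic: 128 - 80 = 48 bits remain, exactly the width of a MAC address, so the MAC can occupy the host part of the address. The sketch below performs that composition with the `ipaddress` module. The function name, the prefix `fd00::/80`, and the sample MAC are illustrative assumptions. The commit only states that the last 48 bits can carry the container MAC.

```python
import ipaddress


def mac_based_ipv6(prefix, mac):
    """Place a 48-bit MAC address in the host bits of a /80 IPv6 prefix (sketch)."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 80:
        raise ValueError("expected a /80 prefix (128 - 80 = 48 host bits)")
    mac_bits = int(mac.replace(":", ""), 16)
    if mac_bits >> 48:
        raise ValueError("MAC address must fit in 48 bits")
    return net.network_address + mac_bits
```

For example, `mac_based_ipv6("fd00::/80", "02:42:ac:11:00:02")` yields `fd00::242:ac11:2`.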
Volodymyr Boiko	97aee026de	[logrotate] create separate logrotate.d config for update-alternatives (#5382 ) To fix the following error when running `logrotate /etc/logrotate.conf` : ``` error: dpkg:10 duplicate log entry for /var/log/alternatives.log error: found error in file dpkg, skipping ``` update-alternatives is provided with dedicated logrotate config in newer dpkg package versions (probably starting from buster) Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>	2020-09-22 01:23:42 -07:00
Joe LeVeque	3987cbd80a	[sonic-utilities] Build and install as a Python wheel package (#5409 ) We are moving toward building all Python packages for SONiC as wheel packages rather than Debian packages. This will also make it easier to transition to Python 3. Python files are now packaged in the "sonic-utilities" Python wheel. Data files are now packaged in the "sonic-utilities-data" Debian package. - How I did it - Build and install sonic-utilities as a Python package - Remove explicit installation of wheel dependencies, as these will now get installed implicitly by pip when installing sonic-utilities as a wheel - Build and install new sonic-utilities-data package to install data files required by sonic-utilities applications - Update all references to sonic-utilities scripts/entrypoints to either reference the new /usr/local/bin/ location or remove absolute path entirely where applicable Submodule updates: * src/sonic-utilities aa27dd9...2244d7b (5): > Support building sonic-utilities as a Python wheel package instead of a Debian package (#1122) > [consutil] Display remote device name in show command (#1120) > [vrf] fix check state_db error when vrf moving (#1119) > [consutil] Fix issue where the ConfigDBConnector's reference is missing (#1117) > Update to make config load/reload backward compatible. (#1115) * src/sonic-ztp dd025bc...911d622 (1): > Update paths to reflect new sonic-utilities install location, /usr/local/bin/ (#19)	2020-09-20 20:16:42 -07:00
Tamer Ahmed	2de3afaf35	[swss] Enhance ARP Update to Call Sonic Cfggen Once (#5398 ) This PR limited the number of calls to sonic-cfggen to one call per iteration instead of current 3 calls per iteration. The PR also installs jq on host for future scripts if needed. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-09-18 18:44:23 -07:00
abdosi	d12e9cbbc6	[Multi-Asic] Fix for multi-asic where we should allow docker local (#5364 ) communication on docker eth0 ip . Without this TCP Connection to Redis does not happen in namespace. Signed-off-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net> Co-authored-by: Abhishek Dosi <abdosi@abdosi-ubuntu-vm0.nwp1qucpfg5ejooejenqshkj3e.cx.internal.cloudapp.net>	2020-09-16 11:32:35 -07:00
Stepan Blyshchak	6de9390bb0	[build] Add a parameter to specify sonic version during build (#5278 ) Introduced a new build parameter 'SONIC_IMAGE_VERSION' that allows build system users to build SONiC image with a specific version string. If 'SONIC_IMAGE_VERSION' was not passed by the user, SONIC_IMAGE_VERSION will be set to the output of functions.sh:sonic_get_version function. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2020-09-16 10:47:26 -07:00
Joe LeVeque	c7186a2d39	[process-reboot-cause] Use Logger class from sonic-py-common package (#5384 ) Eliminate duplicate logging code by importing Logger class from sonic-py-common package.	2020-09-16 10:35:19 -07:00
Samuel Angebault	9bf4b0a93e	[baseimage]: Change the loopback mask from /8 to /16 (#5353 ) As per the VOQ HLDs, internal networking between the linecards and supervisor is required within a chassis. Allocating 127.X/16 subnets for private communication within a chassis is a good candidate. It doesn't require any external IP allocation, and it ensures that the traffic will not leave the chassis. References: https://github.com/Azure/SONiC/pull/622 https://github.com/Azure/SONiC/pull/639 - How I did it Changed the `interfaces.j2` file to add `127.0.0.1/16` as the `lo` ip address. Then, once the interface is up, the post-up command removes the `127.0.0.1/8` ip address. The order in which the netmask change is made matters for `127.0.0.1` to be reachable at all times. - How to verify it ``` root@sonic:~# ip address show dev lo 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/16 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host valid_lft forever preferred_lft forever ``` Co-authored-by: Baptiste Covolato <baptiste@arista.com>	2020-09-15 15:29:48 -07:00
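The following minimal Python sketch uses the standard `ipaddress` module to compare the two `lo` masks named in this commit. It reasons only about prefixes and does not configure any interface. The sample address `127.100.0.1` and the helper `covered_by_lo` are arbitrary illustrations. It shows that, after the change, the other 127.X/16 subnets no longer fall inside the `lo` subnet. That is presumably what allows them to be allocated for chassis-internal communication.

```python
import ipaddress

OLD_LO = ipaddress.IPv4Interface("127.0.0.1/8")
NEW_LO = ipaddress.IPv4Interface("127.0.0.1/16")

# 127.0.0.0/8 splits into 256 /16 subnets (127.0.0.0/16 through 127.255.0.0/16).
loopback_16s = list(OLD_LO.network.subnets(new_prefix=16))


def covered_by_lo(address: str, lo: ipaddress.IPv4Interface) -> bool:
    """Hypothetical helper: True if the address falls inside lo's subnet."""
    return ipaddress.ip_address(address) in lo.network


if __name__ == "__main__":
    print(len(loopback_16s))                     # 256
    print(covered_by_lo("127.100.0.1", OLD_LO))  # True
    print(covered_by_lo("127.100.0.1", NEW_LO))  # False
    # 127.0.0.1 itself stays covered under either mask.
    print(covered_by_lo("127.0.0.1", NEW_LO))    # True
```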
Petro Bratash	558ec53aa6	Fix bug with pcie-check.service (#5368 ) * Change STATE_DB key (PCIE_STATUS\|PCIE_DEVICES -> PCIE_DEVICES) Signed-off-by: Petro Bratash <petrox.bratash@intel.com> * [pcie-check.service] Add dependency on database.service Signed-off-by: Petro Bratash <petrox.bratash@intel.com>	2020-09-15 15:21:31 -07:00
Joe LeVeque	1ac146dd97	[caclmgrd] Inherit DaemonBase class from sonic-py-common package (#5373 ) Eliminate duplicate logging code by inheriting from DaemonBase class in sonic-py-common package.	2020-09-15 13:34:41 -07:00
Joe LeVeque	3a901eeae0	[procdockerstatsd] Inherit DaemonBase class from sonic-py-common package (#5372 ) Eliminate duplicate logging code by inheriting from DaemonBase class in sonic-py-common package.	2020-09-14 16:36:37 -07:00
noaOrMlnx	353003f6ee	Change update_feature_state call to pass False as default if feature has no 'has_timer' field (#5260 ) * Pass False as default if feature has no timer field * Update hostcfgd to fit the newly merged changes. The new changes can be found in PR #5248	2020-09-14 11:28:24 -07:00
Samuel Angebault	0b4191fe2a	[Arista] Updating driver submodules (#5352 ) - Merge chassis codebase upstream - Add support for Otterlake supervisor - Add support for NorthFace and Camp chassis - Add support for Eldridge, Dragonfly and Brooks fabrics - Add support for Clearwater2 and Clearwater2Ms linecards - Add new arista Cli to power on/off cards - Add new arista show Cli to inspect supervisor, chassis, fabrics and linecards	2020-09-10 01:34:38 -07:00
shi-su	339cfbf9af	Remove the configuration of synchronous mode from init_cfg.json (#5308 ) Remove the configuration of synchronous mode from init_cfg.json	2020-09-10 01:26:10 -07:00
Blueve	01fb32fa08	[conf] append nos-config-part for s6100 (#5234 ) * [conf] append nos-config-part for s6100 * modify rc.local Signed-off-by: Guohan Lu <lguohan@gmail.com> * Update rc.local Co-authored-by: Blueve <jika@microsoft.com> Co-authored-by: Guohan Lu <lguohan@gmail.com> Co-authored-by: Ying Xie <yxieca@users.noreply.github.com>	2020-09-08 12:29:02 -07:00
arheneus@marvell.com	f136fd0623	[ebtables] Replace binary config file with text config file for ebtables (#5252 ) Issue: The binary ebtables config file is CPU-architecture dependent. Fix: Load the text config during first-time boot and generate the binary persistent atomic file. Signed-off-by: Antony Rheneus <arheneus@marvell.com>	2020-09-03 17:27:07 -07:00
Tamer Ahmed	fdb9d028e9	[redis] Add redis Group And Grant Read/Write Access to Members (#5289 ) sonic-cfggen is now using a Unix Domain Socket for Redis DB. The socket is created using the root account. As a result, services that are started as admin fail to start. This PR creates a redis group and adds the admin user to it. It also grants read/write access on redis.sock to redis group members. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-09-02 23:40:22 -07:00
abdosi	dd908c2ee2	[sonic-swsscommon] submodule update with commits (#5300 ) [schema] Make schema header support C project (#373) Removed DB-specific get APIs from Selectable class (#378) With the change in #378, caclmgrd needs to be updated to use the new client-side Get API to access namespaces. Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2020-09-02 18:09:03 -07:00
Joe LeVeque	07b9d7f44d	[pcie-check] Make pcie-check.sh executable (#5256 ) The pcie-check.sh script was added in https://github.com/Azure/sonic-buildimage/pull/4771, but was not given executable permission. Therefore, we would see messages like: ``` Aug 26 22:54:05.536248 sonic ERR systemd[664]: pcie-check.service: Failed to execute command: Permission denied Aug 26 22:54:05.536386 sonic ERR systemd[664]: pcie-check.service: Failed at step EXEC spawning /usr/bin/pcie-check.sh: Permission denied Aug 26 22:54:05.536600 sonic WARNING systemd[1]: pcie-check.service: Failed with result 'exit-code'. ```	2020-08-29 10:29:42 -07:00
Stepan Blyshchak	b31050d60e	[services][mgmt-framework] delay mgmt-framework service on boot (#5226 ) The management framework provides management plane services such as REST and CLI, which are not needed right after boot. Delaying this service gives more CPU to data plane and control plane services during fast/warm boot. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2020-08-27 21:53:58 +03:00
Tamer Ahmed	7d3ec60b1f	[hostcfgd] Fix Boolean String Evaluation (#5248 ) The new attribute 'has_timer' introduced to init_cfg.json does not evaluate as a bool; it evaluates as a string. This PR fixes this issue. This PR also fixes an issue with system config units (snmp, telemetry) that have no installation config settings (WantedBy=, RequiredBy=, Also=, Alias=) in the [Install] section. In that case, the .service should not be enabled. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-08-27 06:50:03 -07:00
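The bug fixed here is a common Python pitfall: any non-empty string, including `"False"`, is truthy. The following minimal sketch shows an explicit conversion. The helper name `is_true` and the single accepted spelling (case-insensitive `"true"`) are assumptions for illustration. This is not the code merged into hostcfgd.

```python
def is_true(value) -> bool:
    """Hypothetical helper: interpret a config field that may be a bool or a string."""
    if isinstance(value, bool):
        return value
    # Config values stored as strings must be compared, not tested for truthiness.
    return str(value).strip().lower() == "true"


if __name__ == "__main__":
    print(bool("False"))     # True: naive truthiness is wrong for strings
    print(is_true("False"))  # False
    print(is_true("True"))   # True
    print(is_true(True))     # True
```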
shi-su	f3feb56c8a	Add switch for synchronous mode (#5237 ) Add a master switch so that the sync/async mode can be configured. Example usage of the switch: 1. Configure mode while building an image `make ENABLE_SYNCHRONOUS_MODE=y <target>` 2. Configure when the device is running Change CONFIG_DB with `sonic-cfggen -a '{"DEVICE_METADATA":{"localhost": {"synchronous_mode": "enable"}}}' --write-to-db` Restart swss with `systemctl restart swss`	2020-08-24 14:04:10 -07:00
Baptiste Covolato	cd486a82a4	[arista/aboot]: Zero out 1st MB before repartitioning (#5220 ) The first partition starting point was changed to 1M as part of this commit: `6ba2f97f1e`. On systems that are misaligned before conversion (the partition starts at the first sector), the remnant of the old partition left in the first MB can cause problems in Aboot and result in corruption of the filesystem on the new aligned partition. Zeroing this old remnant ensures that nothing of the old partition is left lying around, so there is no risk of Aboot corrupting the new filesystem because of it. Signed-off-by: Baptiste Covolato <baptiste@arista.com>	2020-08-22 18:46:30 -07:00
nirenjan	bb57ccecd4	[sonic-host-service]: Add SONiC Host Services infrastructure (#4840 ) - Why I did it When SONiC is configured with the management framework and/or telemetry services, the applications running inside those containers need to access some functionality on the host system. The following is a non-exhaustive list of such functionality: image management; configuration save and load; ZTP enable/disable and status; show tech support. - How I did it The host service is a Python process that listens for requests via D-Bus. It then services those requests and sends a response back to the requestor. This PR only introduces the host service infrastructure. Applications that need access to the host services must add applets that register on D-Bus endpoints to service the appropriate functionality. - How to verify it - Description for the changelog Add SONiC Host Service for container to execute select commands in host Signed-off-by: Nirenjan Krishnan <Nirenjan.Krishnan@dell.com>	2020-08-21 15:34:14 -07:00
Tamer Ahmed	90cbb4d78c	[hostcfgd] Handle Both Service And Timer Units (#5228 ) Commit `e484ae9dd` introduced systemd .timer units to hostcfgd. However, when stopping a service that has a timer, the timer may not be running, in which case the service would not be stopped. This PR addresses this situation by handling both .timer and .service units. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-08-21 09:51:41 -07:00
abdosi	1a805e7409	Fix unwanted python exception in syslog during database container startup (#5227 ) The exception was raised when doing a redis PING because database_config.json, which is generated from a jinja2 template, was not ready yet. Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2020-08-21 07:33:19 -07:00
abdosi	74d8b4a6be	[caclmgrd] Add support for multi-ASIC platforms (#5022 ) * Support for Control Plane ACLs for Multi-asic Platforms. Following changes were done: 1) Moved from using blocking listen() on Config DB to the select() model via python-swsscommon, since we have to wait on events from multiple Config DBs 2) Since python-swsscommon is not available on the host, added libswsscommon, python-swsscommon, and dependent packages to the base image (host environment) 3) Made iptables programmed in all namespaces using ip netns exec Signed-off-by: Abhishek Dosi <abdosi@microsoft.com> * Address Review Comments Signed-off-by: Abhishek Dosi <abdosi@microsoft.com> * Fix Review Comments * Fix Comments * Added Change for Multi-asic to have iptables rules to accept internal docker tcp/udp traffic needed for syslog and redis-tcp connection. Signed-off-by: Abhishek Dosi <abdosi@microsoft.com> * Fix Review Comments * Added more comments on logic. * Fixed all warning/errors reported by http://pep8online.com/ other than line > 80 characters. * Fix Comment Signed-off-by: Abhishek Dosi <abdosi@microsoft.com> * Verified with swsscommon package. Fix issue for single asic platforms. * Moved to new python package * Address Review Comments. Signed-off-by: Abhishek Dosi <abdosi@microsoft.com> * Address Review Comments.	2020-08-20 15:11:42 -07:00
Tamer Ahmed	e484ae9dda	[services] Fix Delay Start of SNMP And Telemetry (#5211 ) SNMP and Telemetry services are not critical to switch startup. They also cause fast-reboot to miss its timing requirements. To delay their start, these services are associated with systemd timer units; however, when hostcfgd initiates a service start, it starts the service and not the timer. This PR fixes this issue by starting the timer associated with the systemd unit. signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>	2020-08-19 19:27:59 -07:00
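This commit stops both the timer and the service unit. Commit `e484ae9dda` in this log starts the timer rather than the service. The following minimal sketch expresses both rules as a pure function that only selects unit names and runs nothing. The function name and the `<feature>.timer` / `<feature>.service` naming are assumptions for illustration. This is not hostcfgd's actual code.

```python
def units_for(feature: str, has_timer: bool, action: str) -> list:
    """Hypothetical helper: choose which systemd units to act on for a feature."""
    service = f"{feature}.service"
    timer = f"{feature}.timer"
    if not has_timer:
        return [service]
    if action == "start":
        # Start the timer; by default it activates the same-named .service later.
        return [timer]
    if action == "stop":
        # Stop both: the timer may no longer be running (e.g. it already fired),
        # so stopping only the timer could leave the service running.
        return [timer, service]
    raise ValueError(f"unsupported action: {action}")


if __name__ == "__main__":
    for unit in units_for("snmp", True, "stop"):
        print(["systemctl", "stop", unit])
```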
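Point 3) of this commit programs iptables in every namespace with `ip netns exec`, which amounts to prefixing each command with the namespace. The following sketch only builds argument lists and does not execute them. Several details are illustrative assumptions: the namespace names `asic0`/`asic1`, the sample rule, and the use of an empty string for the host's default namespace (a convention of this sketch, not of caclmgrd).

```python
def iptables_cmds(namespaces, rule_args):
    """Hypothetical helper: build one iptables command per network namespace.

    An empty namespace name stands for the host's default namespace, where
    iptables is invoked directly without `ip netns exec`.
    """
    cmds = []
    for ns in namespaces:
        prefix = ["ip", "netns", "exec", ns] if ns else []
        cmds.append(prefix + ["iptables"] + list(rule_args))
    return cmds


if __name__ == "__main__":
    rule = ["-A", "INPUT", "-p", "tcp", "--dport", "22", "-j", "ACCEPT"]
    for cmd in iptables_cmds(["", "asic0", "asic1"], rule):
        print(" ".join(cmd))
```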


1272 Commits