In #15080, a command was added to re-add 127.0.0.1/8 to the lo
interface when the networking configuration is brought down.
However, the trigger for that command is `down`, which, looking at
ifupdown2 configuration files, runs immediately after 127.0.0.1/16 is
removed. This means there may be a period of time where there are no
loopback addresses assigned to the lo interface, and redis commands will
fail.
Fix this by changing the trigger to `pre-down`, which should run well
before 127.0.0.1/16 is removed, and should always leave lo with a
loopback address.
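For illustration, the relevant lo stanza would end up looking something like this (a minimal sketch assuming standard ifupdown2 hook syntax; the exact template in the image may differ):
```
auto lo
iface lo inet loopback
    address 127.0.0.1/16
    pre-down ip addr add 127.0.0.1/8 dev lo
```
With `pre-down`, the /8 address is installed before ifupdown2 starts removing the configured addresses, so 127.0.0.1 stays reachable throughout.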
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Co-authored-by: Saikrishna Arcot <sarcot@microsoft.com>
#### Why I did it
To fix the logic introduced by [[memory_checker] Do not check memory usage of containers which are not created #11129](https://github.com/sonic-net/sonic-buildimage/pull/11129).
There could be a scenario before the reboot, where
1. The `docker service` has stopped
2. Within a very short period of time, the monit service runs `monit status container_memory_telemetry`
In this scenario, the `memory_checker` script will throw an error to the syslog:
```
ERR memory_checker: Failed to retrieve the running container list from docker daemon! Error message is: 'Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))'
```
But this is actually expected behavior: when the docker service is stopped, the Unix socket is destroyed, which is why we see the `FileNotFoundError(2, 'No such file or directory')` exception in the syslog.
#### How I did it
Changed the log severity to warning and changed the return value.
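As a rough sketch of the idea (names and structure here are illustrative, not the actual memory_checker code), the docker query is wrapped so that an unreachable daemon is logged as a warning and an empty list is returned:
```
import syslog

import docker

def get_running_container_names():
    """Hypothetical sketch: return running container names, or an empty
    list when dockerd is unreachable (e.g. already stopped before reboot)."""
    try:
        docker_client = docker.from_env()
        containers = docker_client.containers.list(filters={"status": "running"})
    except docker.errors.DockerException as err:
        # The Unix socket is gone once the docker service stops, so this is
        # expected around a reboot: warn instead of logging an error.
        syslog.syslog(syslog.LOG_WARNING,
                      "Failed to retrieve the running container list from "
                      "docker daemon! Error message is: '{}'".format(err))
        return []
    return [container.name for container in containers]
```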
#### How to verify it
It is really hard to catch the exact moment described in the `Why I did it` section.
In order to check the logic:
1. Change the Unix socket path to a non-existent one in the [/usr/bin/memory_checker](47742dfc2c/files/image_config/monit/memory_checker (L139)) file on the switch.
2. Execute `monit restart container_memory_telemetry`
3. Check the syslog for such messages:
```
WARNING memory_checker: Failed to retrieve the running container list from docker daemon! Error message is: 'Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))'
INFO memory_checker: [memory_checker] Exits without checking memory usage since container 'telemetry' is not running!
```
Cherry-pick of #15685
MSFT ADO: 24274591
Why I did it
Two changes:
1. Fix a day-1 issue where the check that waits for CONFIG_DB_INITIALIZED is incorrect.
The same incorrect logic is used in multiple places.
The current logic (`until [[ $($SONIC_DB_CLI CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]];`) always passes immediately, irrespective of the result of the GET operation, because `[[ string ]]` only tests whether the string is non-empty, and GET prints "0" or "1" either way.
```
root@str2-7060cx-32s-29:~# sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED"
1
root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]]; do echo "entered here"; done
root@str2-7060cx-32s-29:~#
root@str2-7060cx-32s-29:~#
root@str2-7060cx-32s-29:~# sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED"
0
root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") ]]; do echo "entered here"; done
root@str2-7060cx-32s-29:~#
```
Fix this by checking that the value of the flag is "1":
```
root@str2-7060cx-32s-29:~# until [[ $(sonic-db-cli CONFIG_DB GET "CONFIG_DB_INITIALIZED") -eq 1 ]]; do echo "entered here"; done
entered here
entered here
entered here
```
This gap in logic was highlighted when another fix was merged: #14933
The issue being fixed here caused warmboot-finalizer to not wait until config-db is initialized.
2. Set and unset CONFIG_DB_INITIALIZED for the warm-reboot case.
Currently, during warm shutdown, CONFIG_DB_INITIALIZED's value is stored in the redis DB backup and is restored when the dump is loaded during warm recovery.
So the value of CONFIG_DB_INITIALIZED does not reflect config DB's actual state; it remains whatever it was before the reboot.
Fix this by setting CONFIG_DB_INITIALIZED to 0 when the DB dump is loaded, and setting it to 1 after db_migrator is done.
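A minimal sketch of the intended ordering (hypothetical commands; the actual changes live in the DB load and db_migrator flow):
```
sonic-db-cli CONFIG_DB SET "CONFIG_DB_INITIALIZED" "0"   # right after the warm-reboot DB dump is loaded
# ... db_migrator runs ...
sonic-db-cli CONFIG_DB SET "CONFIG_DB_INITIALIZED" "1"   # once migration is complete
```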
Work item tracking
Microsoft ADO (number only):
How I did it
How to verify it
cherry-pick: #14513
depends: https://github.com/sonic-net/sonic-utilities/pull/2939
* Add an ability to configure remote syslog servers
* Add an initial configuration for remote syslog
* Extend YANG module and add unit tests
#### Why I did it
Adding the following functionality to rsyslog feature:
* Configure remote syslog servers: protocol, filter, severity level
* Update global syslog configuration: severity level, message format
#### How I did it
Added parameters to the syslog server and global configuration.
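For example, a remote server entry could be provisioned along these lines (hypothetical key and field names inferred from the description; the authoritative schema is sonic-syslog.yang):
```
sonic-db-cli CONFIG_DB hset "SYSLOG_SERVER|10.0.0.5" "protocol" "udp"
sonic-db-cli CONFIG_DB hset "SYSLOG_SERVER|10.0.0.5" "severity" "notice"
sonic-db-cli CONFIG_DB hset "SYSLOG_SERVER|10.0.0.5" "filter" "include"
```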
#### How to verify it
Create a syslog server using the CLI or by adding it to the redis DB.
Verify the server is added to the /etc/rsyslog.conf file and is functional.
#### Description for the changelog
Extended rsyslog capabilities: added server and global configuration parameters.
#### Link to config_db schema for YANG module changes
[sonic-syslog.yang](https://github.com/sonic-net/sonic-buildimage/blob/master/src/sonic-yang-models/yang-models/sonic-syslog.yang)
* Re-add 127.0.0.1/8 when bringing down the interfaces
With #5353, 127.0.0.1/16 was added to the lo interface, and then
127.0.0.1/8 was removed. However, when bringing down the lo interface,
like during a config reload, 127.0.0.1/16 gets removed, but 127.0.0.1/8
isn't added back to the interface. This means that there's a period of
time where 127.0.0.1 is not available at all, and services that need to
connect to 127.0.0.1 (such as for redis DB) will fail.
To fix this, when going down, add 127.0.0.1/8. Add this address before
the existing configuration gets removed, so that 127.0.0.1 is available
at all times.
Note that running `ifdown lo` doesn't actually bring down the loopback
interface; the interface always stays "physically" up.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Why I did it
Fix the issue where db_migrator is called before the DB is loaded with config. This leads db_migrator to:
Not find anything, and proceed to incorrectly migrate every missing config.
This is not expected; migration should happen after the old config is loaded, and only new schema changes should need migration.
Also, since the DB does not have anything when the migrator is called, db_migrator fails when some APIs return None.
The reason for the incorrect call is that:
The database service starts db_migrator as part of its startup sequence.
The config-setup service loads data from old-config/minigraph, but it has Requires=database.service.
Hence, config-setup starts only after the database service is started, and the database service is considered started only once db_migrator has completed.
Fixed by:
Checking whether this is a first-time boot via the pending_config_migration flag.
If pending_config_migration is enabled, not calling db_migrator as part of database service startup.
Letting the database service start, which triggers the config-setup service to start.
Calling db_migrator after the config-setup service loads old-config/minigraph.
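A rough sketch of the resulting gating in the database service startup (the flag name comes from the description; treating it as a file, and the paths, are assumptions):
```
if [ -e /tmp/pending_config_migration ]; then
    # First boot: old-config/minigraph is not loaded yet; config-setup
    # will invoke db_migrator after it loads the configuration.
    echo "Deferring db_migrator to config-setup"
else
    db_migrator.py -o migrate
fi
```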
Why I did it
Support for SONIC chassis isolation using TSA and un-isolation using TSB from supervisor module
Work item tracking
Microsoft ADO (number only): 17826134
How I did it
When TSA is run on the supervisor, it triggers TSA on each of the linecards using the secure rexec infrastructure introduced in sonic-net/sonic-utilities#2701. User password is requested to allow secure login to linecards through ssh, before execution of TSA/TSB on the linecards
TSA of the chassis withdraws routes from all the external BGP neighbors on each linecard, in order to isolate the entire chassis. No route withdrawal is done from the internal BGP sessions between the linecards to prevent transient drops during internal route deletion. With these changes, complete isolation of a single linecard using TSA will not be possible (a separate CLI/script option will be introduced at a later time to achieve this)
Changes also include no-stats option with TSC for quick retrieval of the current system isolation state
This PR also reverts changes in #11403
How to verify it
These changes have a dependency on sonic-net/sonic-utilities#2701 for testing
Run TSA from supervisor module and ensure transition to Maintenance mode on each linecard
Verify that all routes are withdrawn from eBGP neighbors on all linecards
Run TSB from supervisor module and ensure transition to Normal mode on each linecard
Verify that all routes are re-advertised from eBGP neighbors on all linecards
Run TSC no-stats from supervisor and verify that just the system maintenance state is returned from all linecards
Part of sonic-net/sonic-utilities#2760
Similar to #14295
- Why I did it
To clear the teamd timer when fast-reboot is finalized, to prevent any further effect.
- How I did it
Deleted the teamd timer from config-db in the fast-reboot finalizer.
The config save call is moved to after the teamd timer is cleared, so that the timer does not persist in the saved config either.
- How to verify it
Verified manually that the entry was deleted after fast-reboot was finalized.
rasdaemon is a tool to log hardware errors. It takes 100% CPU during
boot for a few seconds. It impacts fast/warm boot by delaying control
plane restoration for 5 sec on some platforms.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
- Why I did it
To solve an issue with fast-reboot upgrades that include a FW upgrade, which has existed since fast-reboot moved onto the warm-reboot infrastructure.
This also introduces fast-reboot finalizing logic to determine when fast-reboot is done.
- How I did it
Added logic to the finalize-warmboot script to handle fast-reboot as well; this makes sense because fast-reboot runs over the warm-reboot infrastructure, so this script is invoked anyway. The script now clears the fast-reboot entry from state-db, replacing the previous implementation that relied on a timer. In some scenarios the timer could expire before fast-reboot had finished, causing a fallback to cold-reboot and possible crashes.
This PR also updates all services/scripts reading the fast-reboot state-db entry to look for the updated value representing that fast-reboot is active.
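A consumer would then check the flag along these lines (key and field names here are assumptions based on the description, not necessarily the exact schema):
```
# hypothetical check for an active fast-reboot
if [[ "$(sonic-db-cli STATE_DB hget 'FAST_RESTART_ENABLE_TABLE|system' enable)" == "true" ]]; then
    echo "fast-reboot in progress"
fi
```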
- How to verify it
Run fast-reboot and check that the fast-reboot entry exists in state-db right after startup and is cleared when warm-reboot is finalized, not by a timer.
Improve sudo cat command for RO user.
#### Why I did it
The RO user could use the sudo cat command to show non-syslog files.
#### How I did it
Improve sudo cat command for RO user.
#### How to verify it
Pass all UT.
Manually checked that the fixed code works correctly.
#### Description for the changelog
Improve sudo cat command for RO user.
Why I did it
After warm reboot, show environment prints the following error:
failed to import plugin show.plugins.macsec: [Errno 13] Permission denied: '/tmp/cache/macsec'
How I did it
Set the owner back to admin after restoring the counters folder.
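A minimal sketch of the fix, assuming the cache path from the error message and admin as the owning user:
```
chown -R admin:admin /tmp/cache
```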
How to verify it
sudo warm-reboot, then ensure show environment does not print errors.
Signed-off-by: Oleksandr Kolomeiets <oleksandrx.kolomeiets@intel.com>
Why I did it
This PR addresses the issue mentioned above by loading the acl config as a service on a storage backend device
How I did it
The new acl service is a oneshot service which starts after swss and retries a few times to ensure that the SWITCH_CAPABILITY info is present before attempting to load the acl rules. The service is also bound to sonic targets, which ensures that it gets restarted during minigraph reload and config reload.
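A hypothetical sketch of such a oneshot unit (unit and script names are assumptions; only the ordering and target binding come from the description):
```
[Unit]
Description=Load backend ACL configuration
After=swss.service
Requires=swss.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/bin/load-backend-acl.sh

[Install]
WantedBy=sonic.target
```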
How to verify it
Built an image with these changes and ran the following tests:
Verified that acl is loaded successfully on a storage backend device after a switch boot up
Verified that acl is loaded successfully on a storage backend ToR after minigraph load and config reload
Verified that acl is not loaded if the device is not a storage backend ToR or the device does not have a DATAACL table
Signed-off-by: Neetha John <nejo@microsoft.com>
#### Why I did it
Following PR https://github.com/sonic-net/sonic-swss-common/pull/739, increase the netlink buffer size in the Linux kernel.
The error "out of memory on reading a netlink socket" is seen in fdbsyncd when the kernel sends 10k remote MACs to fdbsyncd.
#### How I did it
Increased the netlink buffer size from 3MB to 16MB.
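Illustratively, if applied as a kernel sysctl, the change would look like this (whether it is set via sysctl or directly on the socket is an assumption here):
```
# allow netlink receive buffers up to 16MB (previously 3MB)
sysctl -w net.core.rmem_max=16777216
```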
#### How to verify it
Verified with 10k remote MACs while restarting the fdbsyncd process, so that the kernel sends the bridge fdb dump to fdbsyncd.
Verified that the netlink buffer error is not reported in the syslog.
- Why I did it
In to-sonic and multi-asic KVM tests, pretest sometimes failed. The reason is that the rsyslogd process cannot start in the teamd container, because rsyslog.conf is empty when sonic-cfggen fails to execute.
- How I did it
If `sonic-cfggen -d` fails, execute it without `-d`, because the template file has default values.
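A minimal sketch of the fallback (the template path is illustrative):
```
TEMPLATE=/usr/share/sonic/templates/rsyslog.conf.j2
if ! sonic-cfggen -d -t "$TEMPLATE" > /etc/rsyslog.conf; then
    # ConfigDB may be unavailable; fall back to the template defaults.
    sonic-cfggen -t "$TEMPLATE" > /etc/rsyslog.conf
fi
```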
- How to verify it
Built the image and ran the test over 40 times; all runs passed pretest.
Signed-off-by: Chun'ang Li <chunangli@microsoft.com>
* Add support for platform topology configuration service
This service invokes the platform plugin for platform specific topology
configuration.
The path for platform plugin script is:
/usr/share/sonic/device/$PLATFORM/plugins/config-topology.sh
If the platform plugin is not available, this service does nothing.
Signed-off-by: anamehra <anamehra@cisco.com>
- Why I did it
Fixes #12907
When the management interface IP address configuration changes from dynamic to static, the DNS configuration (retrieved from the DHCP server) in /etc/resolv.conf remains uncleared. This leads to a DNS configuration pointing to the wrong nameserver. To make the behavior clear, the DNS configuration received from DHCP should be cleared.
- How I did it
Use resolvconf package for managing DNS configuration. It is capable of tracking the source of DNS configuration and puts the configuration retrieved from the DHCP servers into a separate file. This allows the implementation of DNS configuration cleanup retrieved from DHCP during networking reconfiguration.
- How to verify it
Ensure that the management interface has no static configuration.
Check that /etc/resolv.conf has DNS configuration.
Configure a static IP address on the management interface.
Verify that /etc/resolv.conf has no DNS configuration.
Remove the static IP address from the management interface.
Verify that /etc/resolv.conf has DNS configuration retrieved from the DHCP server.
Fixes #12408
Why I did it
We are running into #12408 very frequently.
This results in no syslogs from any containers, as the rsyslog server could not start.
Some of the sonic-mgmt scripts look for log statements and error out if a log is not present.
The interfaces-config service configures the loopback interface along with other interfaces. rsyslog-config reads the IP address of the loopback interface and generates /etc/rsyslog.conf. When this race condition happens, the lo interface IP is not yet programmed, and rsyslog-config ends up writing a null UDP server into /etc/rsyslog.conf.
How I did it
The rsyslog-config service is started after the interfaces-config service.
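In systemd terms this amounts to an ordering dependency along these lines (a sketch; unit names from the description):
```
# ordering added for rsyslog-config.service
[Unit]
After=interfaces-config.service
```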
How to verify it
Did multiple reboots and verified that $UDPServerAddress is valid.
Adding /usr/local/bin/storyteller to READ_ONLY_CMDS, so no write access or password prompt is needed to run storyteller.
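A hypothetical sudoers fragment showing the shape of the change (the other alias contents are elided):
```
Cmnd_Alias READ_ONLY_CMDS = ..., /usr/local/bin/storyteller
```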
Tested on 202205 clusters; a user who didn't request write access was able to grep logs using storyteller.
sign-off: Jing Zhang zhangjing@microsoft.com
#### Why I did it
A segfault was occurring when running memory_checker
#### How I did it
Deinit publisher immediately after publishing
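Conceptually, the publish path becomes (a sketch using the swsscommon events API as commonly used in SONiC scripts; the source, tag, and parameters are illustrative):
```
from swsscommon import swsscommon

events_handle = swsscommon.events_init_publisher("sonic-events-host")
swsscommon.event_publish(events_handle, "event-memory-usage",
                         {"ctr_name": "telemetry"})
# Deinit immediately after publishing, rather than at interpreter exit,
# to avoid the teardown-time segfault.
swsscommon.events_deinit_publisher(events_handle)
```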
#### How to verify it
Manual testing
The main issue is that the pip/pip3 command cannot be found when the package is being installed by apt-get.
When using dpkg install, the search path is PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
When using apt-get install, the search path is PATH=/usr/sbin:/usr/bin:/sbin:/bin
But the default pip/pip3 location is /usr/local/bin, so dpkg works but apt-get does not.
How I did it
Export the path /usr/local/bin for pip/pip3.
This makes the deb packages installable by apt-get.
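Illustratively (a sketch, not the exact packaging change):
```
# apt-get runs maintainer scripts with PATH=/usr/sbin:/usr/bin:/sbin:/bin,
# while pip/pip3 live in /usr/local/bin, so prepend it:
export PATH=/usr/local/bin:$PATH
```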
The console of the centec-arm64 board is ttyAMA0. The current regular expression cannot parse it correctly.
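For illustration, a pattern covering both console types could look like this (hypothetical; the actual expression and its location are not shown in this description):
```
# match console=ttyS0 as well as console=ttyAMA0 on the kernel command line
grep -Eo 'console=tty(S|AMA)[0-9]+' /proc/cmdline
```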
Signed-off-by: centecqianj <qianj@centec.com>
- Why I did it
Support syslog rate limit configuration feature
- How I did it
Remove unused rsyslog.conf from containers
Modify docker startup script to generate rsyslog.conf from template files
Add metadata/init data for syslog rate limit configuration
- How to verify it
Manual test
New sonic-mgmt regression cases
Debian is shipping a systemd timer unit for logrotate, but we're also
packaging in a cron job, which means both of them will run, potentially
at the same time. Remove our cron file, and add an override to the
shipped timer file so that it runs every 10 minutes.
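The override can be a small drop-in for the shipped timer, along these lines (a sketch; the exact drop-in in the image may differ):
```
# /etc/systemd/system/logrotate.timer.d/override.conf
[Timer]
# clear the shipped schedule, then run every 10 minutes
OnCalendar=
OnCalendar=*:0/10
```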
Fixes #12392.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Why I did it
The current lazy installer relies on a filename sort for both unpack and configuration steps. When systemd services are configured [started] by multiple packages the order is by filename not by the declared package dependencies. This can cause the start order of services to differ between first-boot and subsequent boots. Declared systemd service dependencies further exacerbate the issue (e.g. blocking the first-boot script).
The current installer leaves packages un-configured if the package dependency order does not match the filename order.
This also fixes a trivial bug in [Build]: Support to use symbol links for lazy installation targets to reduce the image size #10923 where externally downloaded dependencies are duplicated across lazy package device directories.
How I did it
Changed the staging and first-boot scripts to use apt-get:
```
dpkg -i /host/image-$SONIC_VERSION/platform/$platform/*.deb
```
becomes
```
apt-get -y install /host/image-$SONIC_VERSION/platform/$platform/*.deb
```
when dependencies are detected during image staging.
How to verify it
Apt-get critical rules:
Add a `Depends=` entry to the control information of a package, then grep the syslog for rc.local between images and observe that the configuration order of packages changes.
Why I did it
nameserver and domain entries from the build system fsroot get into the sonic image.
How I did it
Clear /etc/resolv.conf before building image
How to verify it
Built an image with this change and verified after installation that /etc/resolv.conf is empty.
- Why I did it
Fix the logrotate firstaction script to reflect the correct size. The size was modified to change dynamically based on disk size; however, this variable was not updated.
#9504
- How I did it
Updated the variable based on disk size
- How to verify it
Verify in the generated rsyslog file that the variable is correctly generated from the jinja template.
Signed-off-by: maipbui <maibui@microsoft.com>
#### Why I did it
`subprocess` is used with `shell=True`, which is very dangerous for shell injection.
`os` - not secure against maliciously constructed input and dangerous if used to evaluate dynamic content
#### How I did it
Remove `shell=True`, use `shell=False`.
Replace `os` with `subprocess`.
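A generic before/after of the pattern (the command is illustrative, not taken from the actual diff):
```
import subprocess

container = "telemetry"

# Unsafe: the whole string goes through the shell, so crafted input can
# inject additional commands.
out = subprocess.check_output("docker ps | grep {}".format(container), shell=True)

# Safer: pass an argument list; no shell is involved.
out = subprocess.check_output(["docker", "ps", "--filter", "name={}".format(container)])
```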
* Fix to improve hostname handling
If config_db.json is missing the hostname entry, hostname-config.sh ends
up deleting the existing entry too, and the hostname changes to the default 'localhost'
* Default the hostname to 'sonic' if missing in the config file
There's an odd crash that intermittently happens after the teamd container
exits, and a signal is raised to the main thread to exit. This thread (watching
teamd) continues execution because it's in a `while True`. The subsequent wait
call on the teamd container very likely returns immediately, and it calls
`is_warm_restart_enabled` and `is_fast_reboot_enabled`. In either of these
cases, sometimes, there is a crash in the transition from C code to Python code
(after the function gets executed). Python sees that this thread got a signal
to exit, because the main thread is exiting, and tells pthread to exit the
thread. However, during the stack unwinding, _something_ is telling the
unwinder to call `std::terminate`. The reason is unknown.
This then results in a python3 SIGABRT, and systemd then doesn't call the stop
script to actually stop the container (possibly because the main process exited
with a SIGABRT, so it's a hard crash). This means that the container doesn't
actually get stopped or restarted, resulting in an inconsistent state
afterwards.
The workaround appears to be that if we know the main thread needs to exit,
just return here, and don't continue execution. This at least tries to avoid it
from getting into the problematic code path. However, it's still feasible to
get a SIGABRT, depending on thread/process timings (i.e. teamd exits, signals
the main thread to exit, and then syncd exits, and syncd calls one of the two C
functions, potentially hitting the issue).
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
It could happen that a container has already crashed but docker-wait-any
will wait forever till it starts. It should, however, immediately exit
to make the service restart.
#### Why I did it
It is observed in some circumstances that the auto-restart mechanism does not work. Specifically for ```swss.service```, ```orchagent``` had crashed before ```docker-wait-any``` started in ```swss.sh```. This led ```docker-wait-any``` to wait forever for ```swss``` to be in the ```"Running"``` state, and it results in:
```
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
1abef1ecebff bcbca2b74df6 "/usr/local/bin/supe…" 22 hours ago Up 22 hours what-just-happened
3c924d405cd5 docker-lldp:latest "/usr/bin/docker-lld…" 22 hours ago Up 22 hours lldp
eb2b12a98c13 docker-router-advertiser:latest "/usr/bin/docker-ini…" 22 hours ago Up 22 hours radv
d6aac4a46974 docker-sonic-mgmt-framework:latest "/usr/local/bin/supe…" 22 hours ago Up 22 hours mgmt-framework
d880fd07aab9 docker-platform-monitor:latest "/usr/bin/docker_ini…" 22 hours ago Up 22 hours pmon
75f9e22d4fdd docker-snmp:latest "/usr/local/bin/supe…" 22 hours ago Up 22 hours snmp
76d570a4bd1c docker-sonic-telemetry:latest "/usr/local/bin/supe…" 22 hours ago Up 22 hours telemetry
ee49f50344b3 docker-syncd-mlnx:latest "/usr/local/bin/supe…" 22 hours ago Up 22 hours syncd
1f0b0bab3687 docker-teamd:latest "/usr/local/bin/supe…" 22 hours ago Up 22 hours teamd
917aeeaf9722 docker-orchagent:latest "/usr/bin/docker-ini…" 22 hours ago Exited (0) 22 hours ago swss
81a4d3e820e8 docker-fpm-frr:latest "/usr/bin/docker_ini…" 22 hours ago Up 22 hours bgp
f6eee8be282c docker-database:latest "/usr/local/bin/dock…" 22 hours ago Up 22 hours database
```
The check for the ```"Running"``` state is not needed, because for the cold boot case we do ```start_peer_and_dependent_services```, and for the warm boot case the loop will retry waiting for the container if that container is doing a warm boot:
d01a91a569/files/image_config/misc/docker-wait-any (L56)
#### How I did it
Removed the check for ```"Running"```.
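Schematically, the wait loop now relies on docker's wait call alone (a simplified sketch of the idea; the helper names come from the surrounding description, the structure is assumed):
```
def wait_for_container(docker_client, container_name):
    while True:
        # Blocks until the container stops; returns immediately if it has
        # already exited, instead of first waiting for a "Running" state.
        docker_client.wait(container_name)
        if not is_warm_restart_enabled(container_name) and not is_fast_reboot_enabled():
            # Cold boot case: exit so the service can be restarted
            # (start_peer_and_dependent_services handles dependents).
            return
        # Warm/fast boot: loop and wait for the container again.
```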
#### How to verify it
Kill swss before ```docker-wait-any``` is reached and verify that auto restart will restart the swss service.
#### Why I did it
To deprecate swsssdk, remove all dependency to it.
#### How I did it
Remove swsssdk from rules and build image scripts.
#### How to verify it
Pass all UT and E2E test case
#### Description for the changelog
Remove swsssdk from rules and build image scripts.
Why I did it
On a supervisor card in a chassis, syncd/teamd/swss/lldp etc. dockers are created for each Switch Fabric card. However, not all chassis have all the Switch Fabric cards present; in that case, only dockers for the Switch Fabrics that are present are created.
The monit 'container_checker' fails in this scenario, as it expects dockers for all Switch Fabrics (based on NUM_ASIC defined in the asic.conf file).