sonic-buildimage

Author	SHA1	Message	Date
Nikola Dancejic	1bf2f72a48	[ebtables] Add multicast drop rule to ebtables (#18064 ) Adding rule to ebtables to drop multicast packets in kernel. This was done to address a bug where NS packets were flooding ports with duplicate packets. Signed-off-by: Nikola Dancejic <ndancejic@microsoft.com>	2024-02-27 13:11:58 -08:00
Prince George	0564ce48c9	[baseimage]: Update smartmontool version >= v7.4 (#17635 ) Why I did it Update smartmontool version to 7.4. This is done to prevent the smartmontools service from exiting with a non-zero exit status on platforms that do not have an SSD/disk to be monitored. Until Debian Bullseye (which had smartmontools 7.2), Debian had a patch applied that changed the default quit mode to never exit. A bug report was filed on Debian, saying that the source code patch isn't needed and could just be done via command line options, and also that smartmontools 7.3 has a new built-in option to exit with 0 if there are no monitorable devices found (which prevents systemd from treating it as a service failure). Because of that, Debian Bookworm (which also upgraded to 7.3) removed the patch and restored the default behavior of exiting with exit code 17 if there are no devices found. Smartmontools v7.3 has this issue, because of which smartd exits with non-zero exit status even with "-q" option. How I did it Update the smartmontools to version 7.4 which has the fix for exiting gracefully if no monitoring device is found Added smartd option "-q nodev0" to allow smartd to exit with status 0 if no monitoring device found	2024-02-12 09:37:12 -08:00
Zain Budhwani	c8439cdd4b	Disable eventd and rsyslog plugin in slim images (#17905 ) ### Why I did it Disable eventd at buildtime for slim images ##### Work item tracking - Microsoft ADO (number only):26386286 #### How I did it Add flags for disabling eventd and only copy rsyslog conf files when eventd is included and not slim image #### How to verify it Manual testing	2024-01-30 22:14:23 -08:00
ganglv	c798ea8e08	Change tcp port range to support telemetry and gnmi (#17907 ) * Reserve tcp port for telemetry and gnmi * Use ip_local_port_range instead * Fix sysctl config	2024-01-26 09:31:09 -08:00
Oleksandr Ivantsiv	c693e75f0f	[dns] Do not apply dynamic DNS configuration when MGMT interface has static IP address. (#17769 ) ### Why I did it Fix the issue detected by[ TestStaticMgmtPortIP::test_dynamic_dns_not_working_when_static_ip_configured ](https://github.com/sonic-net/sonic-mgmt/blob/master/tests/dns/static_dns/test_static_dns.py#L105C9-L105C63) test. ### How I did it Query MGMT interface configuration. Do not apply dynamic DNS configuration when MGMT interface has static IP address. #### How to verify it Run `tests/dns/static_dns/test_static_dns.py` sonic-mgmt tests.	2024-01-23 16:29:55 -08:00
Hua Liu	c274be2e59	Fix IPV6 forced-mgmt-route not work issue (#17299 ) Why I did it IPV6 forced-mgmt-route does not work. When adding an IPV6 route, the 'ip -6 rule add pref 32764 address' command should be used, but the '-6' parameter is currently missing from the template, so the IPV6 route is added to the IPV4 route table. Also this PR depends on #17281 , which fixes the issue of the IPV6 'default' route table missing from the IPV6 route lookup. Microsoft ADO (number only):24719238	2024-01-22 09:59:12 -08:00
bingwang-ms	977e73d370	Update backend_acl.py to specify ACL table name (#17553 )	2024-01-03 14:55:38 -08:00
prabhataravind	038ca267c8	[image_config]: Update DHCP rate-limit for mgmt TOR devices (#17630 ) * [image_config]: Update DHCP rate-limit for mgmt TOR devices Change DHCP rate limit(queue4,group3) in SONiC copp configuration to 300 PPS for mgmt TORs while keeping the rate limit at 100 PPS for other topologies. Why I did it: Some mgmt TORs based on Marvell ASIC do not support 100 PPS CIR, so that led to these devices silently dropping DHCP packets. Microsoft ADO: 25820076 How to verify it: Send DHCP broadcast packets to an M0 DUT and verify that they are trapped to CPU at 300 PPS. On non-mgmt devices, the packets should be trapped at CIR of 100 PPS. Also ran sonic-mgmt dhcp_relay test and confirmed that it passes. Signed-off-by: Prabhat Aravind <paravind@microsoft.com>	2024-01-02 21:29:34 -08:00
Junchao-Mellanox	f3f2972512	Optimize syslog rate limit feature for fast and warm boot (#17458 ) - Why I did it Optimize syslog rate limit feature for fast and warm boot - How I did it Optimize redis start time Don't render rsyslog.conf in container startup script Disable containercfgd by default. There is a new CLI to enable it (in another PR) - How to verify it Manual test Regression test	2023-12-20 09:12:03 +02:00
Yevhen Fastiuk	5efb123ede	[NTP] Add NTP extended configuration (#15058 ) hld [#1296](https://github.com/sonic-net/SONiC/pull/1296) closes [#1254](https://github.com/sonic-net/SONiC/issues/1254) depends-on [#60](https://github.com/sonic-net/sonic-host-services/pull/60), [#781](https://github.com/sonic-net/sonic-swss-common/pull/781), [#2835](https://github.com/sonic-net/sonic-utilities/pull/2835), [#10749](https://github.com/sonic-net/sonic-mgmt/pull/10749) #### Why I did it To cover the next AIs: * Configure NTP global parameters * Add/remove new NTP servers * Change the configuration for NTP servers * Show NTP status * Show NTP configuration ### How I did it * Add YANG model for a new configuration * Extend configuration templates to support new knobs ### Description for the changelog * Add ability to configure NTP global parameters such as authentication, dhcp, admin state * Change the configuration for NTP servers * Add an ability to show NTP configuration #### Link to config_db schema for YANG module changes [NTP configuration](https://github.com/sonic-net/sonic-buildimage/blob/master/src/sonic-yang-models/doc/Configuration.md#ntp-and-syslog-servers)	2023-12-11 13:31:35 -08:00
Stepan Blyshchak	9555883e6f	[config-chassisdb] use cached variables (#17342 ) - Why I did it Improve boot performance mostly needed for fast and warmboot - How I did it Use cached variable. - How to verify it Boot the system. Simply do "systemd-analyze blame" and look at service start time. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2023-12-07 15:24:21 +02:00
Stepan Blyshchak	6435df1056	[config-topology] use cached variables (#17343 ) - Why I did it Improve boot performance mostly needed for fast and warmboot - How I did it Use cached variable. - How to verify it Boot the system. Simply do "systemd-analyze blame" and look at service start time. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2023-12-07 15:22:44 +02:00
Hua Liu	164916681a	Fix can't access IPV6 address via management interface because 'default' route table does not add to route lookup issue. (#17281 ) Fix can't access IPV6 address via management interface because 'default' route table does not add to route lookup issue. #### Why I did it When device set with IPV6 TACACS server address, and shutdown all BGP, device can't connect to TACACS server via management interface. After investigation, I found the IPV6 'default' route table does not add to route lookup: admin@vlab-01:~$ ip -6 rule list 1001: from all lookup local 32765: from fec0::ffff:afa:1 lookup default 32766: from all lookup main admin@vlab-01:~$ As compare: admin@vlab-01:~$ ip -4 rule list 1001: from all lookup local 32764: from all to 172.17.0.1/24 lookup default 32765: from 10.250.0.101 lookup default 32766: from all lookup main 32767: from all lookup default <== 'default' route table exist in IPV4 route lookup Issue fix by add 'default' route table to route lookup with following command: admin@vlab-01:~$ sudo ip -6 rule add pref 32767 lookup default admin@vlab-01:~$ ip -6 rule list 1001: from all lookup local 32765: from fec0::ffff:afa:1 lookup default 32766: from all lookup main 32767: from all lookup default <== 'default' route table been added to IPV6 route lookup admin@vlab-01:~$ ##### Work item tracking - Microsoft ADO: 25798732 #### How I did it When management interface using 'default' route table, add 'default' route table to IPV6 route lookup. #### How to verify it Pass all UT. Add new UT to cover this change. Manually verify issue fixed: ### Tested branch (Please provide the tested image version) - [x] master-17281.417570-2133d58fa #### Description for the changelog Fix can't access IPV6 address via management interface because 'default' route table does not add to route lookup issue.	2023-12-05 11:51:56 -08:00
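The difference between the rule listings quoted in commit 164916681a can be checked mechanically. The following is a minimal Python sketch, not code from the PR: `parse_rules` and `has_default_catch_all` are hypothetical helpers that read text in the `priority: selector lookup table` shape shown in the commit message and report whether a catch-all `from all lookup default` rule is present. It assumes each rule ends with the table name, so rules carrying extra trailing options are not handled.

```python
def parse_rules(listing):
    """Parse rule-list text into (priority, selector, table) tuples.

    Assumes every line looks like '<prio>: <selector> lookup <table>'.
    """
    rules = []
    for line in listing.strip().splitlines():
        # Split on the first ':' only, so IPv6 addresses stay intact.
        prio, _, rest = line.partition(":")
        selector, _, table = rest.strip().rpartition(" lookup ")
        rules.append((int(prio), selector, table))
    return rules


def has_default_catch_all(listing):
    """Return True if some rule sends all traffic to the 'default' table."""
    return any(selector == "from all" and table == "default"
               for _, selector, table in parse_rules(listing))


# Listings taken from the commit message (before and after the fix).
IPV6_BEFORE = """\
1001: from all lookup local
32765: from fec0::ffff:afa:1 lookup default
32766: from all lookup main
"""
IPV6_AFTER = IPV6_BEFORE + "32767: from all lookup default\n"

print(has_default_catch_all(IPV6_BEFORE))
print(has_default_catch_all(IPV6_AFTER))
```

On a device, the same check could be fed the captured output of `ip -6 rule list` instead of the hard-coded strings.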
Kebo Liu	4c699050e8	[Mellanox] Add special rsyslog filter for MSN2410 platform (#17365 ) - Why I did it Mellanox MSN2410 platforms have a non-functional error log: "ERR pmon#sensord: Error getting sensor data: dps460/#10: Can't read". This error is caused by a firmware issue with some PSUs, and we are not able to upgrade the FW online. Since there is no functional impact, this error log can be ignored safely - How I did it Add a new rsyslog rule to the rsyslog-container.conf.j2, if the docker name is pmon and the platform name matches, the new rule will be inserted into the docker rsyslogd.conf - How to verify it Run regression on the MSN2410 platform to make sure the error log is not printed to the syslog. Signed-off-by: Kebo Liu <kebol@nvidia.com>	2023-12-03 15:32:56 +02:00
Xincun Li	f13081bfbd	Ensure that 'logrotate-config.service' is set as a dependency to start before 'logrotate.service'. (#17312 ) * Ensure that 'logrotate-config.service' is set as a dependency to start before 'logrotate.service'.	2023-11-29 17:22:47 -08:00
prabhataravind	aea3c42f29	[image_config]: Update DHCP rate-limit (#17132 ) Change DHCP rate limit in SONiC copp configuration to 100 PPS as this is necessary to ensure that DHCP flood does not cause LACP/BGP flaps in all scenarios This is an extension to the change in image_config: copp: Enable rate limiting for bgp, lacp, dhcp, lldp, macsec and udld #14859 and sonic-mgmt change in [tests/copp]: Update copp mgmt tests to support new rate-limits sonic-mgmt#8199 Why I did it 300 PPS is not sufficient to prevent LACP/BGP flaps in all cases. 100 PPS seems to provide better resiliency against DHCP traffic flood to CPU. Microsoft ADO 25776614: Send DHCP broadcast packets to DUT and verify that they are trapped to CPU at 100 PPS. Signed-off-by: Prabhat Aravind <paravind@microsoft.com>	2023-11-22 15:02:17 -08:00
Saikrishna Arcot	318f3945be	Modify the sudoers file to lecture RO users once Debian changed the defaults of the sudo package to never lecture the user when using an unauthorized sudo command, which breaks our use case of lecturing once. Add a line to lecture once, which is the old defaults. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2023-11-21 18:53:15 -08:00
Saikrishna Arcot	cae42998dd	Fix PAM module configuration issue pam-auth-update doesn't store local configuration, and it's meant to be used by packages only. Because libpam-systemd was getting uninstalled afterwards, this caused tacplus to get re-enabled. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2023-11-21 18:53:15 -08:00
Saikrishna Arcot	ed5176107b	Update Debian build script for Bookworm Notable changes: * Use j2cli from Debian repos instead of pip * Use setuptools from Debian repos instead of pip * Use wheel from Debian repos instead of pip * Update grpcio and grpcio-tools python packages to match version in Bookworm * Use m2crypto from Debian repos instead of pip Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2023-11-21 18:53:15 -08:00
Saikrishna Arcot	34a1ac1a0f	Migrate from ntp to ntpsec Debian Bookworm no longer uses NTP, and instead uses NTPsec. Modify our files to update/replace the NTPsec files instead. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2023-11-21 18:53:15 -08:00
abdosi	4a7aa2634f	[chassis] Support advertisement of Loopback0 of all LC's across all e-BGP peers in TSA mode (#16714 ) What I did: In Chassis TSA mode Loopback0 Ip's of each LC's should be advertise through e-BGP peers of each remote LC's How I did: - Route-map policy to Advertise own/self Loopback IP to other internal iBGP peers with a community internal_community as define in constants.yml - Route-map policy to match on above internal_community when route is received from internal iBGP peers and set a internal tag as define in constants.yml and also delete the internal_community so we don't send to any of e-BGP peers - In TSA new route-map match on above internal tag and permit the route (Loopback0 IP's of remote LC's) and set the community to traffic_shift_community. - In TSB delete the above new route-map. How I verify: Manual Verification UT updated. sonic-mgmt PR: sonic-net/sonic-mgmt#10239 Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2023-11-20 09:42:02 -08:00
ganglv	c71fb3a30f	Share image for gnmi and telemetry (#16863 ) Why I did it Share docker image to support gnmi container and telemetry container Work item tracking Microsoft ADO 25423918: How I did it Create telemetry image from gnmi docker image. Enable gnmi container and disable telemetry container by default. How to verify it Run end to end test.	2023-11-08 08:54:36 +08:00
prabhataravind	7e49530459	[copp]: Enable rate limiting for bgp, lacp, dhcp, lldp, macsec and udld (#14859 ) Why I did it It was observed that a flood of DHCP packets without rate-limiting can cause BGP flaps or lacp keepalive losses. This change attempts to prevent or reduce such BGP flaps by enabling appropriate rate-limiting in SONiC for all traffic types. Work item tracking Microsoft ADO 17964421: How I did it Set a reasonable CIR/CBS value of 300 for queue4_group3 (dhcp, lldp, macsec) and 6000 for queue4_group1. The value 300 was arrived at after testing with dhcp flooding using ptf (using multiple threads). Throttling at this rate was necessary to ensure that dhcp flooding does not cause BGP flaps. How to verify it Verified with this script running from ptf, that BGP flaps don't happen when CBS/CIR is set at 300 for queue4_group3. import threading from scapy.all import * def send_dhcp_discover(intf): dhcp_discover = Ether(dst='ff:ff:ff:ff:ff:ff',src=RandMAC()) \ /IP(src='1.1.1.1',dst='255.255.255.255') \ /UDP(sport=68,dport=67) \ /DHCP(options=[('message-type','discover'),('end')]) sendp(dhcp_discover,count=100000,iface=intf) if __name__ == "__main__": t1 = threading.Thread(target=send_dhcp_discover, args=("eth1",)) t2 = threading.Thread(target=send_dhcp_discover, args=("eth2",)) t1.start() t2.start() t1.join() t2.join() Verified on Arista-7260CX3-D108C8 running 202012 that the copp rule for queue4_group1 and queue4_group3 do NOT affect BGP packets. To verify this using PTF, the copp rules were modified to set the "CBS" and "CIR" for queue4_group1 and queue4_group3 at 600pps and 50k packets each of "BGP open" and "DHCP Discover" were simultaneously sent from the same PTF port to the DUT. It was verified using "show c cpu" that packets are hitting the cpu queue at 1200 pps (double the configured CIR/CBS for these packet types). This helped conclude that throttling rate is per trap (or packet type) and not per queue. 
Verified with updated sonic-mgmt tests ([tests/copp]: Update copp mgmt tests to support new rate-limits sonic-mgmt#8199) on broadcom and mellanox platforms that these traffic types are rate-limited. Signed-off-by: Prabhat Aravind <paravind@microsoft.com>	2023-10-25 10:49:24 -07:00
Kebo Liu	31451295d5	Add special rsyslog filter for MSN2700 platform (#16684 ) - Why I did it Mellanox MSN2700 platforms have a non-functional error log: "ERR pmon#sensord: Error getting sensor data: dps460/#10: Can't read". This error is caused by a firmware issue with some PSUs, and we are not able to upgrade the FW online. Since there is no functional impact, this error log can be ignored safely. - How I did it Add a new rsyslog rule to the rsyslog-container.conf.j2, if the docker name is pmon and the platform name matches, the new rule will be inserted into the docker rsyslogd.conf - How to verify it Run regression on the MSN2700 platform to make sure the error log is not printed to the syslog. Signed-off-by: Kebo Liu <kebol@nvidia.com>	2023-10-24 17:54:44 +03:00
lixiaoyuner	bca2ce25ef	[k8master]: Install nc cmd for k8s master network issue debug (#16745 )	2023-09-30 01:16:51 -07:00
Saikrishna Arcot	f207a9b0e0	Fix potentially not having any loopback address on lo interface (#16490 ) In #15080, there was a command added to re-add 127.0.0.1/8 to the lo interface when the networking configuration is being brought down. However, the trigger for that command is `down`, which, looking at ifupdown2 configuration files, runs immediately after 127.0.0.1/16 is removed. This means there may be a period of time where there are no loopback addresses assigned to the lo interface, and redis commands will fail. Fix this by changing this to pre-down, which should run well before 127.0.0.1/16 is removed, and should always leave lo with a loopback address. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2023-09-14 12:55:50 -07:00
Zain Budhwani	337a9dbcf4	Add rsyslog plugin support for frr log (#16192 ) ### Why I did it Currently there is only rsyslog plugin support for /var/log/syslog, meaning we do not detect events that occur in frr logs such as BGP Hold Timer Expiry that appears in frr/bgpd.log. ##### Work item tracking - Microsoft ADO (number only): 13366345 #### How I did it Add omprog action to frr/bgpd.log and frr/zebra.log. Add appropriate regex for both events. #### How to verify it sonic-mgmt test case	2023-09-12 16:53:45 -07:00
lixiaoyuner	4f53819efa	Install parted package for k8s master (#16484 ) ### Why I did it Need a tool to extend disk size ##### Work item tracking - Microsoft ADO (number only): 25094467 #### How I did it Install parted package #### How to verify it Use apt list parted command to check if it's installed	2023-09-07 23:22:47 -07:00
lixiaoyuner	410e6ff406	Install pyOpenSSL package for k8s master (#16361 ) ### Why I did it Need a tool to check certificate's detail of information. ##### Work item tracking - Microsoft ADO (number only): 25020260 #### How I did it Install pyOpenSSL package for k8s master #### How to verify it Pip3 list to check whether it's installed when include_kubernetes_master=y	2023-08-31 22:26:24 -07:00
Vadym Hlushko	43340cd58d	[memory_checker] Add a specific log message in a case when the docker service is not running. (#16018 ) #### Why I did it To fix the logic introduced by [[memory_checker] Do not check memory usage of containers which are not created #11129](https://github.com/sonic-net/sonic-buildimage/pull/11129). There could be a scenario before the reboot, where 1. The `docker service` has stopped 2. In a very short period of time, the monit service performs the `root@sonic:/home/admin# monit status container_memory_telemetry` In such scenario, the `memory_checker` script will throw an error to the syslog: ``` ERR memory_checker: Failed to retrieve the running container list from docker daemon! Error message is: 'Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))' ``` But, actually, this scenario is a correct behavior, because when the docker service is stopped, the Unix socket is destroyed and that is why we could see the `FileNotFoundError(2, 'No such file or directory')` exception in the syslog. #### How I did it Change the log severity to the warning and changed the return value. #### How to verify it It is really hard to catch the exact moment described in the `Why I did it` section. In order to check the logic: 1. Change the Unix socket path to non-existing in [/usr/bin/memory_checker](`47742dfc2c/files/image_config/monit/memory_checker (L139)`) file on the switch. 2. Execute the `root@sonic:/home/admin# monit restart container_memory_telemetry` 3. Check the syslog for such messages: ``` WARNING memory_checker: Failed to retrieve the running container list from docker daemon! Error message is: 'Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))' INFO memory_checker: [memory_checker] Exits without checking memory usage since container 'telemetry' is not running! ```	2023-08-31 11:28:20 -07:00
Vaibhav Hemant Dixit	e127701660	Fix CONFIG_DB_INITIALIZED flag check logic and set/reset flag for warmboot (#15685 ) * Fix CONFIG_DB_INITIALIZED flag check logic and set/reset flag for warm-reboot * Fix db-cli usage * Handle same image warm-reboot and generalize handling of INIT flag * Cover boot from ONIE case: set config init flag when minigraph, config_db are missing * Handle case: first boot of SONiC * Check for config init flag * Simplify logic, and do not call db_migrator for same image reboot	2023-08-04 16:00:26 -07:00
Longxiang Lyu	dc139cfc32	[monit][dualtor] Periodically check mux neighbors consistency (#15769 ) Signed-off-by: Longxiang Lyu <lolv@microsoft.com>	2023-07-24 21:16:49 -07:00
lixiaoyuner	10b65d9826	Add k8s master code new (#15716 ) Why I did it Currently, k8s master image is generated from a separate branch which we created by ourselves, not release ones. We need to commit these k8s master related code to master branch for a better way to do k8s master image build out. Work item tracking Microsoft ADO (number only): 19998138 How I did it Install k8s dashboard docker images Install geneva mds and mdsd and fluentd docker images and tag them as latest, tagging latest will help create container always with the latest version Install azure-storage-blob and azure-identity, this will help do etcd backup and restore. Install kubernetes python client packages, this will help read worker and container state, we can send these metric to Geneva. Remove mdm debian package, will replace it with the mdm docker image Add k8s master entrance script, this script will be called by rc-local service when system startup. we have some master systemd services in compute-move repo, when VMM service create master VM, VMM will copy all master service files inside VM, the entrance script will setup all services according to the service files. When the entrance script content changed, the PR build will set include_kubernetes_master=y to help do validation for k8s master related code change. The default value of include_kubernetes_master should be always n for public master branch. We will generate master image from internal master branch How to verify it Build with INCLUDE_KUBERNETES_MASTER = y	2023-07-25 07:44:59 +08:00
guangyao6	9567c06570	Add BGP configuration for BGPSentinel peer (#15714 ) Why I did it For route registry service, in order to block hijacked routes, IBGP session needs to be set up from BGP sentinel service to SONiC, and BGP sentinel service advertise the same route with higher local-preference and no export community. So that SONiC takes the route from BGP sentinel as the best path and does not advertise the route to EBGP peers. In order to do that, new route-maps are needed. So this change adds a new set of templates, keeping BGPSentinel peers out of the other templates. Work item tracking Microsoft ADO (number only): 24451346 How I did it Add sentinel_community in constants.yml, route from BGPSentinel do not match this community will be denied. Add support to convert BGPSentinel related configuration in the BGPPeerPassive element of the minigraph to a new BGP_SENTINELS table in CONFIG_DB Add a new set of "sentinels" templates to docker-fpm-frr Add a new BGP peer manager to bgpcfgd, to add neighbors from the BGP_SENTINELS table using the "sentinels" templates Add a test case for minigraph.py, making sure the BGPSentinel and BGPSentinelV6 elements create BGP_SENTINELS DB entry. Add a set of test cases for the new sentinels templates in sonic-bgpcfgd tests. Add sonic-bgp-sentinel.yang and a set of testcases for the yang file. How to verify it Testcases and UT newly added would pass. Setup IPv4 and IPv6 BGPSentinel services in minigraph, and load minigraph, show CONFIG_DB and "show runningconfig bgp", configuration would be loaded successfully. Using t1-lag topo and setup IBGP session from BGPSentinel to SONiC loopback address, IBGP session would up. Advertise route from BGPSentinel to T1 with sentinel_community, higher local-preference and no-export community. In T1, show bgp route, the result is "Not advertise to any EBGP peer". Withdraw the route in BGPSentinel, in T1, route would advertise to EBGP peers. 
Advertise route from T1 that does not match sentinel_community, in T1, would not see the route in show bgp route.	2023-07-21 09:32:29 +08:00
Liping Xu	95d11976bd	update rsyslog log size conf (#15821 ) Why I did it For some devices whose log folder size is larger than 200M, for example 256M, LOG_FILE_ROTATE_SIZE_KB is set to 16M, and THRESHOLD_KB = $((USABLE_SPACE_KB - (NUM_LOGS_TO_ROTATE * LOG_FILE_ROTATE_SIZE_KB * 2))) = $(( ((VAR_LOG_SIZE_KB * 90 / 100) - RESERVED_SPACE_KB) - (NUM_LOGS_TO_ROTATE * LOG_FILE_ROTATE_SIZE_KB * 2) )) = $(( ((256M * 90 / 100) - 4096) - (8 * 16M * 2) )), so the result would be a negative value. Work item tracking Microsoft ADO (number only): 24524827 How I did it Add a case for 400M: if the log folder size is between 200M and 400M, set the log file size to 2M. How to verify it Run "sudo logrotate -f /etc/logrotate.conf" on a DUT whose /var/log folder size is 256M, and check the syslog.	2023-07-14 15:44:17 +08:00
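The threshold arithmetic in commit 95d11976bd can be worked through directly. Below is a minimal Python sketch, assuming the constants quoted in the commit message (90% usable space, 4096 KB reserved, 8 rotated logs) and integer division as in shell arithmetic; `threshold_kb` is a hypothetical helper name, not a function from the repository.

```python
RESERVED_SPACE_KB = 4096   # value quoted in the commit message
NUM_LOGS_TO_ROTATE = 8     # value quoted in the commit message


def threshold_kb(var_log_size_kb, log_file_rotate_size_kb):
    """Mirror the THRESHOLD_KB formula from the commit message."""
    # Usable space is 90% of the log partition minus the reserved space.
    usable_space_kb = var_log_size_kb * 90 // 100 - RESERVED_SPACE_KB
    return usable_space_kb - NUM_LOGS_TO_ROTATE * log_file_rotate_size_kb * 2


var_log_kb = 256 * 1024  # a 256M log folder

# With a 16M rotate size the threshold goes negative.
print(threshold_kb(var_log_kb, 16 * 1024))  # -30311

# With the 2M rotate size introduced for the 200M-400M case it stays positive.
print(threshold_kb(var_log_kb, 2 * 1024))   # 199065
```

The negative result for the 16M case is the failure the commit describes; the 2M case leaves roughly 194M of headroom on the same partition.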
Mohammedz93	28b9299445	Support Reset factory (#14105 ) #### Why I did it Support reset factory in Sonic OS [Reset Factory HLD](https://github.com/sonic-net/SONiC/pull/1231) [Sonic-mgmt tests](https://github.com/sonic-net/sonic-mgmt/pull/7652) #### How I did it - Added new script "/usr/bin/reset-factory" * It generates a new config_db.json file with factory configurations * It clears system files and logs * It removes all docker containers on system except database * It clears non-default users and restores default users password - Dump the default users info to a new file during build "/etc/sonic/default_users.json" - Supported new type "Keep-basic" in "config-setup factory" - Add new conf file for config-setup "/etc/config-setup/config-setup.conf" #### How to verify it - Run reset-factory script with all types: < none | keep-all-config | only-config | keep-basic > - Run config-setup factory with parameters < none | keep-basic > #### Description for the changelog Support reset factory in Sonic OS #### Ensure to add label/tag for the feature raised. example - PR#2174 under sonic-utilities repo. where, Generic Config and Update feature has been labelled as GCU.	2023-07-11 16:14:17 -07:00
iavraham	72021fdb0f	Add remote syslog configuration (#14513 ) * Add an ability to configure remote syslog servers * Add an initial configuration for remote syslog * Extend YANG module and add unit tests #### Why I did it Adding the following functionality to rsyslog feature: - Configure remote syslog servers: protocol, filter, severity level - Update global syslog configuration: severity level, message format #### How I did it added parameters to syslog server and global configuration. #### How to verify it create syslog server using CLI/adding to Redis-DB verify server is added to file /etc/rsyslog.conf and server is functional. #### Description for the changelog extend rsyslog capabilities, added server and global configuration parameters. #### Link to config_db schema for YANG module changes https://github.com/iavraham/sonic-buildimage/blob/master/src/sonic-yang-models/yang-models/sonic-syslog.yang	2023-07-10 11:40:08 -07:00
Vaibhav Hemant Dixit	ddb3086620	Revert "Revert "Fix for fast/cold-boot: call db_migrator only after old config is loaded (#14933 )" (#15464 )" (#15684 ) This reverts commit `9649a44470`.	2023-07-06 17:34:35 -07:00
Junchao-Mellanox	b07957bdad	Fix issue: systemctl daemon-reload would sporadically cause udev handler fail (#15253 ) #### Why I did it A workaround to back port the fix for a systemd issue. The systemd issue: https://github.com/systemd/systemd/issues/24668 The systemd PR to fix the issue: https://github.com/systemd/systemd/pull/24673/files The formal solution should upgrade systemd to a version that contains the fix. But, systemd is a very basic service, upgrading systemd requires heavy test. #### How I did it Copy the correct systemd-udevd.service file in build time #### Tested branch (Please provide the tested image version) - [x] 202211 - [ ] <!-- image version 2 --> ``` SONiC Software Version: SONiC.fix-udev.3-b65c7bdec_Internal SONiC OS Version: 11 Distribution: Debian 11.7 Kernel: 5.10.0-18-2-amd64 Build commit: `b65c7bdec` Build date: Mon Jun 19 10:54:50 UTC 2023 Built by: sw-r2d2-bot@r-build-sonic-ci02-241 Platform: x86_64-mlnx_msn4700-r0 HwSKU: ACS-MSN4700 ASIC: mellanox ASIC Count: 1 Serial Number: MT2022X08597 Model Number: MSN4700-WS2FO Hardware Revision: A1 Uptime: 08:10:11 up 1 min, 1 user, load average: 1.81, 0.67, 0.24 Date: Sun 25 Jun 2023 08:10:11 Docker images: REPOSITORY TAG IMAGE ID SIZE docker-fpm-frr fix-udev.3-b65c7bdec_Internal a7b911e7cb6f 346MB docker-fpm-frr latest a7b911e7cb6f 346MB docker-platform-monitor fix-udev.3-b65c7bdec_Internal 94c5178cf80b 731MB docker-platform-monitor latest 94c5178cf80b 731MB docker-orchagent fix-udev.3-b65c7bdec_Internal 46b393e0ace8 328MB docker-orchagent latest 46b393e0ace8 328MB docker-syncd-mlnx fix-udev.3-b65c7bdec_Internal 1f5c6c23e33a 734MB docker-syncd-mlnx latest 1f5c6c23e33a 734MB docker-sflow fix-udev.3-b65c7bdec_Internal 7e45992c8c59 317MB docker-sflow latest 7e45992c8c59 317MB docker-teamd fix-udev.3-b65c7bdec_Internal e4d905592cda 316MB docker-teamd latest e4d905592cda 316MB docker-nat fix-udev.3-b65c7bdec_Internal 7fe799367580 319MB docker-nat latest 7fe799367580 319MB docker-macsec latest 
d702a5554171 318MB docker-snmp fix-udev.3-b65c7bdec_Internal 3bce8fcf71cd 338MB docker-snmp latest 3bce8fcf71cd 338MB docker-sonic-telemetry fix-udev.3-b65c7bdec_Internal f13949cbc817 597MB docker-sonic-telemetry latest f13949cbc817 597MB docker-dhcp-relay latest 153d9072805d 306MB docker-router-advertiser fix-udev.3-b65c7bdec_Internal aed642b9a6bc 299MB docker-router-advertiser latest aed642b9a6bc 299MB docker-sonic-p4rt fix-udev.3-b65c7bdec_Internal a3cae5ca65a7 870MB docker-sonic-p4rt latest a3cae5ca65a7 870MB docker-mux fix-udev.3-b65c7bdec_Internal b81f0401b9a8 347MB docker-mux latest b81f0401b9a8 347MB docker-eventd fix-udev.3-b65c7bdec_Internal c5917d0e801f 298MB docker-eventd latest c5917d0e801f 298MB docker-lldp fix-udev.3-b65c7bdec_Internal fd5dc14a7976 341MB docker-lldp latest fd5dc14a7976 341MB docker-database fix-udev.3-b65c7bdec_Internal 438c2715a1dd 299MB docker-database latest 438c2715a1dd 299MB docker-sonic-mgmt-framework fix-udev.3-b65c7bdec_Internal 5c50b115fbcd 414MB docker-sonic-mgmt-framework latest ```	2023-06-25 16:58:14 -07:00
Oleksandr Ivantsiv	475fe27c0b	[dns] Add support for static DNS configuration. (#14549 ) - Why I did it Add support for static DNS configuration. According to sonic-net/SONiC#1262 HLD. - How I did it Add a new resolv-config.service that is responsible for transferring configuration from Config DB into /etc/resolv.conf file that is consumed by various subsystems in Linux to resolve domain names into IP addresses. - How to verify it Run the image compilation. Each component related to the static DNS feature is covered with the unit tests. Run sonic-mgmt tests. Static DNS feature will be covered with the system tests. Install the image and run manual tests.	2023-06-22 19:12:30 +03:00
Vaibhav Hemant Dixit	9649a44470	Revert "Fix for fast/cold-boot: call db_migrator only after old config is loaded (#14933 )" (#15464 ) This reverts commit `02b17839c3`. Reverts #14933 The earlier commit caused a race condition that particularly broke cross branch warm upgrade. Issue happens when db_migrator is still migrating the DB and finalizer is checking DB for list of components to reconcile. If migration is not complete, finalizer get an empty list to wait for. Due to this, finalizer concludes warmboot (deletes system wide warmboot flag) and cause all the services to do cold restart. ADO: 24274591	2023-06-16 13:58:38 -07:00
Saikrishna Arcot	f84dfd2345	Re-add 127.0.0.1/8 when bringing down the interfaces (#15080 ) * Re-add 127.0.0.1/8 when bringing down the interfaces With #5353, 127.0.0.1/16 was added to the lo interface, and then 127.0.0.1/8 was removed. However, when bringing down the lo interface, like during a config reload, 127.0.0.1/16 gets removed, but 127.0.0.1/8 isn't added back to the interface. This means that there's a period of time where 127.0.0.1 is not available at all, and services that need to connect to 127.0.0.1 (such as for redis DB) will fail. To fix this, when going down, add 127.0.0.1/8. Add this address before the existing configuration gets removed, so that 127.0.0.1 is available at all times. Note that running `ifdown lo` doesn't actually bring down the loopback interface; the interface always stays "physically" up. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2023-06-13 18:45:39 -07:00
Vaibhav Hemant Dixit	02b17839c3	Fix for fast/cold-boot: call db_migrator only after old config is loaded (#14933 ) Why I did it Fix the issue where db_migrator is called before the DB is loaded with config. As a result, db_migrator finds nothing and incorrectly migrates every missing config. This is not expected: migration should happen after the old config is loaded, and only new schema changes need migration. Since the DB is empty when the migrator is called, db_migrator fails when some APIs return None. The reason for the incorrect call is that the database service starts db_migrator as part of its startup sequence. The config-setup service loads data from old-config/minigraph. However, since it has Requires=database.service, config-setup starts only when the database service is started, and the database service is started when db_migrator is completed. Fixed by: Check if this is first time boot by checking the pending_config_migration flag. If pending_config_migration is enabled, then do not call db_migrator as part of database service startup. Let the database service start, which triggers the config-setup service to start. Then call db_migrator after the config-setup service loads old-config/minigraph	2023-05-30 10:16:21 -07:00
judyjoseph	efeae03ea3	Add override_config to load_minigraph in config-setup service (#14834 ) This PR is to handle the override minigraph config by golden_config_db.json file if it is present in the backup location.	2023-05-10 11:54:33 -07:00
Ying Xie	72c52bc677	Revert "Clear DNS configuration received from DHCP during networking reconfiguration in Linux. (#13516 )" (#14902 ) This reverts commit `c7ecd92c54`.	2023-05-01 17:12:38 -07:00
Tejaswini Chadaga	ca224863cb	Changes to support TSA from supervisor (#14691 ) Why I did it Support for SONIC chassis isolation using TSA and un-isolation using TSB from supervisor module. Work item tracking Microsoft ADO (number only): 17826134 How I did it When TSA is run on the supervisor, it triggers TSA on each of the linecards using the secure rexec infrastructure introduced in sonic-net/sonic-utilities#2701. User password is requested to allow secure login to linecards through ssh, before execution of TSA/TSB on the linecards. TSA of the chassis withdraws routes from all the external BGP neighbors on each linecard, in order to isolate the entire chassis. No route withdrawal is done from the internal BGP sessions between the linecards, to prevent transient drops during internal route deletion. With these changes, complete isolation of a single linecard using TSA will not be possible (a separate CLI/script option will be introduced at a later time to achieve this). Changes also include a no-stats option with TSC for quick retrieval of the current system isolation state. This PR also reverts changes in #11403. How to verify it These changes have a dependency on sonic-net/sonic-utilities#2701 for testing. Run TSA from supervisor module and ensure transition to Maintenance mode on each linecard. Verify that all routes are withdrawn from eBGP neighbors on all linecards. Run TSB from supervisor module and ensure transition to Normal mode on each linecard. Verify that all routes are re-advertised from eBGP neighbors on all linecards. Run TSC no-stats from supervisor and verify that just the system maintenance state is returned from all linecards.	2023-04-28 16:28:06 +08:00
Aryeh Feigin	039a9c998a	[Fast-boot] Clear teamd-timer when finalizing fast-reboot (#14583 ) Part of sonic-net/sonic-utilities#2760 Similar to #14295 - Why I did it To clear the teamd timer when fast-reboot is finalized, to prevent any further effect. - How I did it Deleted the teamd timer from config-db in the fast-reboot finalizer. The config save call is moved to after clearing teamd-timer so it won't have any further effect either. - How to verify it Verified manually that the entry was deleted after fast-reboot was finalized.	2023-04-18 09:15:42 +03:00
Stepan Blyshchak	d73c810e86	[image_config] add rasdaemon.timer (#14300 ) rasdaemon is a tool to log hardware errors. It takes 100% CPU during boot for a few seconds. It impacts fast/warm boot by delaying control plane restoration for 5 sec on some platforms. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2023-04-17 08:58:45 -07:00
Aryeh Feigin	41a9813018	Finalize fast-reboot in warmboot finalizer (#14238 ) - Why I did it To solve an issue with upgrade with fast-reboot including FW upgrade, which has been present since moving to fast-reboot over warm-reboot infrastructure. This also introduces fast-reboot finalizing logic to determine when fast-reboot is done. - How I did it Added logic to the finalize-warmboot script to handle fast-reboot as well; this makes sense because, with fast-reboot running over warm-reboot, this script will be invoked. The script clears the fast-reboot entry from state-db, instead of the previous implementation that relied on a timer. The timer could expire in some scenarios before fast-reboot finished, causing fallback to cold-reboot and possible crashes. This PR also updates all services/scripts reading the fast-reboot state-db entry to look for the updated value representing that fast-reboot is active. - How to verify it Run fast-reboot and check that the fast-reboot entry exists in state-db right after startup and is cleared when warm-reboot is finalized, not due to a timer.	2023-04-09 16:59:15 +03:00
Hua Liu	4c059d8eb5	Improve sudo cat command for RO user. (#14428 ) Improve sudo cat command for RO user. #### Why I did it RO user can use the sudo command to show non-syslog files. #### How I did it Improve sudo cat command for RO user. #### How to verify it Pass all UT. Manually check that the fixed code works correctly. #### Description for the changelog Improve sudo cat command for RO user.	2023-03-27 17:08:14 -07:00

500 Commits