sonic-buildimage

Author	SHA1	Message	Date
liuh-80	bb65791060	Add j2 template for enable pam_limit and limit SSH session (#10298 ) #### Why I did it When too many users log in concurrently and run commands, SONiC may kernel panic on some devices which have very limited memory. #### How I did it Add j2 template to set up the pam_limit plugin to limit SSH sessions per user. #### How to verify it Manually validate that the j2 template generates the correct config file. #### Which release branch to backport (provide reason below if selected) - [x] 201811 - [ ] 201911 - [ ] 202006 - [x] 202012 - [x] 202106 - [x] 202111 #### Description for the changelog Add j2 template to set up the pam_limit plugin to limit SSH sessions per user.	2022-03-23 16:52:09 +08:00
Ying Xie	af6ad545a3	Revert "[201811] Check platform reboot cause to see if any reset happened during fast/warm-reboot (#8912 )" (#10076 ) This reverts commit `a80319e2d0`.	2022-02-24 07:27:30 -08:00
Renuka Manavalan	7910108fd8	porting PR #8223 , which uses one shot timer to reload tacacs config (#9987 ) Why I did it There is a small window between load & listen to config-DB. If TACACS config got updated during that gap, the listen will not show it, hence hostcfgd would miss it, until another update. How I did it porting PR #8223, which uses one shot timer to reload tacacs config.	2022-02-17 08:16:03 -08:00
Samuel Angebault	2ed7f537d4	[201811][Arista] Add emmc quirks for Upperlake (#9970 ) Why I did it Fix some unreliability seen on emmc device with some AMD CPUs How I did it Added a kernel parameter to add quirks to It depends on a sonic-linux-kernel change to work properly but will be a no-op without it. Description for the changelog Add emmc quirks for Upperlake	2022-02-11 13:26:19 -08:00
Sujin Kang	a80319e2d0	[201811] Check platform reboot cause to see if any reset happened during fast/warm-reboot (#8912 ) [201811] Check platform reboot cause to see if any reset happened during fast/warm-reboot Why I did it To recover syncd and swss from any cold reset during fast/warm-reboot How I did it Check platform reboot-cause to see if any cold reset happens for fast-reboot power up How to verify it Manual test	2021-12-01 10:50:55 -08:00
Renuka Manavalan	2a41e0f96b	[201811] disk_check.py: Change path to /usr/bin (#9074 ) The scripts from sonic-utilities are installed into /usr/bin in 201811. Hence correct path for disk_check.py to /usr/bin/	2021-10-26 18:22:10 -07:00
Ying Xie	6483bf48f6	[warmboot finalizer] load dhcpv6 copp rules when missing (#9048 ) Why I did it Need to enable DHCPv6 COPP rules. How I did it Load the separate DHCPv6 COPP rules after warm reboot if the rules are missing. How to verify it Warm reboot from an image doesn't have DHCPv6 COPP rules installed. Warm reboot from an image have DHCPv6 COPP rules already installed. In either case, the script did the right thing and only install the COPP rules if it is missing. Signed-off-by: Ying Xie ying.xie@microsoft.com	2021-10-25 08:05:55 -07:00
Vaibhav Hemant Dixit	f1d817ae54	Save DB dump after warm/fast reboot (#8913 ) Back porting the master branch change - #8803 Save the redis DB dump after warm reboot.	2021-10-22 10:51:43 -07:00
Renuka Manavalan	52366b099d	[201811] Invoke disk check periodically (#8951 ) * Invoke disk check periodically. (#7374) Why I did it Helps with periodic scan of disk for RO state. If found, this script makes transient fix and raise error message.	2021-10-15 19:43:05 -07:00
abdosi	f86b028b07	Logrotate for wtmp and btmp files to fix size getting too large. (#8744 ) Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>	2021-10-15 17:29:38 +00:00
Renuka Manavalan	77892832b7	Add service to restore TACACS from old config (#7560 ) (#8233 ) Why I did it In upgrade scenarios where config_db.json is not carried forward to the new image, the device could be left without TACACS credentials. Added a service to trigger 5 minutes after boot and restore TACACS, if /etc/sonic/old_config/tacacs.json is present. How I did it By adding a service that fires 5 minutes after boot. This service applies the TACACS config if available. How to verify it Upgrade and watch the status of tacacs.timer & tacacs.service. You may create /etc/sonic/old_config/tacacs.json with updated credentials (within 5 minutes after boot) and see that it appears in the config and is persisted too.	2021-08-02 10:33:55 -07:00
Blueve	22b5ebd792	[port_config] Introduce ad-hoc mport_config.json file (#8275 ) Signed-off-by: Jing Kan jika@microsoft.com	2021-07-29 10:41:31 +08:00
xumia	a7725e6480	Fix vtysh shell-ingestion security issue (#7991 ) Fix vtysh shell-ingestion security issue Only expose the limited parameters of the command vtysh show.	2021-06-30 19:32:21 +08:00
xumia	78f90ac7a9	Support readonly vtysh for sudoers (#7383 ) (#7573 ) * Support readonly vtysh for sudoers (#7383) Why I did it Support readonly version of the command vtysh How I did it Check if the command starting with "show", and verify only contains single command in script. * Fix the type issue in rvtysh	2021-05-19 09:02:33 +08:00
Sumukha Tumkur Vani	b6ca3bd5bb	add EPMS devicetype (#7255 )	2021-04-09 12:44:31 -07:00
rkdevi27	6c2fd18f51	Fixed S6000 abrupt reboot in 201811 (#6923 ) Why I did it The S6000 devices, the cold reboot is abrupt and it is likely to cause issues which will cause the device to land into EFI shell. Hence the platform reboot will happen after graceful unmount of all the filesystems as in S6100. How I did it Moved the platform_reboot to platform_reboot_override and hooked it to the systemd shutdown services as in S6100. Fixed the "/host unmount failed" issue as well in 201811. How to verify it Issue "reboot" command to verify if the reboot is happening gracefully.	2021-03-12 11:09:54 -08:00
arlakshm	ddbfe0631d	[baseimage]: add docker ps to the sudoer file (#6604 ) fixes Azure/sonic-utilities#1389 With the recent changes in sudoer files, the show commands fail for read-only users. The problem is that 'docker ps' fails in the function get_routing_stack() (`8a1109ed30/show/main.py`, L54), so all the CLI commands fail. Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>	2021-01-29 08:38:47 -08:00
Qi Luo	a6295f82be	Cleanup sudoers file (#6523 ) Same as https://github.com/Azure/sonic-buildimage/pull/6518 For 201811 branch	2021-01-21 14:42:10 -08:00
lguohan	bcda39f394	[sonic-linux-kernel]: update kernel to 4.9.246 (#6461 ) kernel ABI from 4.9.0-12 -> 4.9.0-14 Signed-off-by: Guohan Lu <lguohan@gmail.com> Co-authored-by: Samuel Angebault <angebault.samuel@gmail.com>	2021-01-16 12:33:23 -08:00
Renuka Manavalan	b2e3ba800e	[tacacs]: Restore from TACACS backup if present, upon load-minigraph during update-graph action. (#6407 ) Why I did it During upgrade, if config is loaded from minigraph, it would miss TACACS credentials. This leads to device losing remote user accessibility - How I did it During update graph, when config is loaded from minigraph, look for TACACS credentials back-up and load that if available - How to verify it Remove /etc/sonic/config-db.json, save TACACS credentials in /etc/sonic/tacacs.json and do a Image upgrade. Do image upgrade and boot into new image. Verify remote user access is available. NOTE: This change is available in master via PR #6285	2021-01-11 13:57:20 -08:00
Ying Xie	9ea38c417c	[rc.local] separate configuration migration and grub installation logic (#5528 ) To address issue #5525 Explicitly control the grub installation requirement when it is needed. We have scenario where configuration migration happened but grub installation is not required. Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2020-10-05 15:11:35 +00:00
Blueve	55d2d15e4e	[conf] append nos-config-part for s6100 (#5234 ) * [conf] append nos-config-part for s6100 * modify rc.local Signed-off-by: Guohan Lu <lguohan@gmail.com> * Update rc.local Co-authored-by: Blueve <jika@microsoft.com> Co-authored-by: Guohan Lu <lguohan@gmail.com> Co-authored-by: Ying Xie <yxieca@users.noreply.github.com>	2020-09-08 19:30:33 +00:00
Joe LeVeque	c909422abc	[caclmgrd] Always restart service upon process termination (#5065 )	2020-08-31 20:31:13 +00:00
Joe LeVeque	4547ea022d	[caclmgrd] Improve code reuse (#4931 ) Improve code reuse in `generate_block_ip2me_traffic_iptables_commands()` function.	2020-08-31 20:30:54 +00:00
Baptiste Covolato	c706a1079f	[arista/aboot]: Zero out 1st MB before repartitioning (#5220 ) The first partition starting point was changed to be 1M as part of this commit: `6ba2f97f1e`. On systems that are misaligned before conversion (partition start is the first sector), the relica partition that is left in the first MB can cause problems in Aboot and result in corruption of the filesystem on the new aligned partition. Zeroing this old relica makes sure that there is nothing left of the old partition lying around. There won't be any risk of having Aboot corrupt the new filesystem because of the old relica. Signed-off-by: Baptiste Covolato <baptiste@arista.com>	2020-08-22 18:48:10 -07:00
Joe LeVeque	6120145bf1	[caclmgrd] remove default DROP rule on FORWARD chain (#5034 )	2020-07-24 19:09:32 +00:00
Joe LeVeque	cf142e7e6c	[caclmgrd] Filter DHCP packets based on dest port only (#4995 )	2020-07-17 18:17:27 +00:00
padmanarayana	062fd849b3	[DELL]: FTOS to SONiC fast conversion fixes (#4807 ) While migrating to SONiC 20181130, identified a couple of issues: 1. union-mount needs /host/machine.conf parameters for vendor specific checks : however, in case of migration, the /host/machine.conf is extracted from ONIE only in https://github.com/Azure/sonic-buildimage/blob/master/files/image_config/platform/rc.local#L127. 2. Since grub.cfg is updated to have net.ifnames=0 biosdevname=0, 70-persistent-net.rules changes are no longer required.	2020-06-19 22:35:29 +00:00
Joe LeVeque	d9b8bed916	[caclmgrd] Don't limit connection tracking to TCP (#4796 ) Don't limit iptables connection tracking to TCP protocol; allow connection tracking for all protocols. This allows services like NTP, which is UDP-based, to receive replies from an NTP server even if the port is blocked, as long as it is in reply to a request sent from the device itself.	2020-06-19 04:33:50 +00:00
Ying Xie	4cd54ed58c	[ntp] disable ntp long jump (#4748 ) Found another syncd timing issue related to clock going backwards. To be safe disable the ntp long jump. Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2020-06-11 22:03:22 +00:00
Joe LeVeque	7ae30d7898	[caclmgrd] Get first VLAN host IP address via next() (#4685 ) I found that with IPv4Network types, calling list(ip_ntwrk.hosts()) is reliable. However, when doing the same with an IPv6Network, I found that the conversion to a list can hang indefinitely. This appears to me to be a bug in the ipaddress.IPv6Network implementation. However, I could not find any other reports on the web. This patch changes the behavior to call next() on the ip_ntwrk.hosts() generator instead, which returns the IP address of the first host.	2020-06-09 16:30:45 +00:00
Joe LeVeque	494701a0ee	[caclmgrd] Allow more ICMP types (#4625 )	2020-06-09 16:07:51 +00:00
yozhao101	aa949cdc74	[docker-syncd] Add timeout to force stop syncd container (#4617 ) - Why I did it When I tested the auto-restart feature of the swss container by manually killing one of its critical processes, swss was stopped. The syncd container, as the peer container, should then also be stopped. However, I found that sometimes the syncd container could be stopped and sometimes it could not. The reason is that the process (/usr/local/bin/syncd.sh stop) executing the stop() function gets stuck between lines 164–167. Systemd will wait for 90 seconds and then kill this process. 164 # wait until syncd quit gracefully 165 while docker top syncd$DEV | grep -q /usr/bin/syncd; do 166 sleep 0.1 167 done The first thing I did was to profile how long this while loop spins if the syncd container can be stopped normally after the swss container is stopped. The result is 5 or 6 seconds. If the syncd container can be stopped normally, two messages are written to syslog: str-a7050-acs-3 NOTICE syncd#dsserve: child /usr/bin/syncd exited status: 134 str-a7050-acs-3 INFO syncd#supervisord: syncd [5] child /usr/bin/syncd exited status: 134 The second thing I did was to add a timer to the condition of the while loop to ensure the loop is forced to exit after 20 seconds. After that, the testing result is that the syncd container can be stopped normally if swss is stopped first. One more thing I want to mention: if the syncd container is stopped within 5 or 6 seconds, the two log messages can still be seen in syslog. However, if the while loop runs longer than 20 seconds and is forced to exit, the syncd container is still stopped, but I did not see these two messages in syslog. Further, although I observed that the auto-restart feature of the swss container works correctly right now, I cannot be sure that the issue of the syncd container not stopping will not occur in future. - How I did it I added a timer around the while loop in the stop() function. This while loop will exit after spinning for 20 seconds. Signed-off-by: Yong Zhao <yozhao@microsoft.com>	2020-06-09 16:07:24 +00:00
Joe LeVeque	7da0c15af5	[caclmgrd] Ignore keys in interface-related tables if no IP prefix is present (#4581 ) Since the introduction of VRF, interface-related tables in ConfigDB will have multiple entries, one of which only contains the interface name and no IP prefix. Thus, when iterating over the keys in the tables, we need to ignore the entries which do not contain IP prefixes.	2020-06-09 16:05:40 +00:00
Joe LeVeque	3ee9c5d1e3	[caclmgrd] Add some default ACCEPT rules and lastly drop all incoming packets (#4412 ) Modified caclmgrd behavior to enhance control plane security as follows: Upon starting or receiving notification of ACL table/rule changes in Config DB: 1. Add iptables/ip6tables commands to allow all incoming packets from established TCP sessions or new TCP sessions which are related to established TCP sessions 2. Add iptables/ip6tables commands to allow bidirectional ICMPv4 ping and traceroute 3. Add iptables/ip6tables commands to allow bidirectional ICMPv6 ping and traceroute 4. Add iptables/ip6tables commands to allow all incoming Neighbor Discovery Protocol (NDP) NS/NA/RS/RA messages 5. Add iptables/ip6tables commands to allow all incoming IPv4 DHCP packets 6. Add iptables/ip6tables commands to allow all incoming IPv6 DHCP packets 7. Add iptables/ip6tables commands to allow all incoming BGP traffic 8. Add iptables/ip6tables commands for all ACL rules for recognized services (currently SSH, SNMP, NTP) 9. For all services which we did not find configured ACL rules, add iptables/ip6tables commands to allow all incoming packets for those services (allows the device to accept SSH connections before the device is configured) 10. Add iptables rules to drop all packets destined for loopback interface IP addresses 11. Add iptables rules to drop all packets destined for management interface IP addresses 12. Add iptables rules to drop all packets destined for point-to-point interface IP addresses 13. Add iptables rules to drop all packets destined for our VLAN interface gateway IP addresses 14. Add iptables/ip6tables commands to allow all incoming packets with TTL of 0 or 1 (This allows the device to respond to tools like tcptraceroute) 15. If we found control plane ACLs in the configuration and applied them, we lastly add iptables/ip6tables commands to drop all other incoming packets	2020-06-09 04:21:27 +00:00
lguohan	8e014bb7e7	[baseimage]: pin down package version for azure-storage, watchdog and futures (#4575 ) Signed-off-by: Guohan Lu <lguohan@gmail.com>	2020-05-13 05:05:29 +00:00
Ying Xie	f52e59a032	[ntp] enable/disable NTP long jump according to reboot type (#4582 ) - Enable NTP long jump after cold reboot. - Disable NTP long jump after warm/fast reboot. Signed-off-by: Ying Xie <ying.xie@microsoft.com>	2020-05-12 12:23:47 -07:00
Neetha John	3d41c271a4	[qos]: Alpha and ECN settings change for Th (#4564 ) Dynamic threshold setting changed to 0 and WRED profile green min threshold set to 250000 for Tomahawk devices Changed the dynamic threshold settings in pg_profile_lookup.ini Added a macro for WRED profiles in qos.json.j2 for Tomahawk devices Necessary changes made in qos.config.j2 to use the macro if present Signed-off-by: Neetha John <nejo@microsoft.com>	2020-05-09 18:25:17 +00:00
Joe LeVeque	ceb878414d	[process-reboot-cause] If software reboot cause is unknown add note if first boot into new image (#4538 )	2020-05-08 20:37:22 +00:00
Nazarii Hnydyn	096a0e1e18	[mellanox]: Add SSD FW update tool (#4352 ) * [mellanox]: Add SSD FW update tool. Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com> * [mellanox]: Update SSD tool. Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>	2020-04-13 18:12:16 +03:00
SuvarnaMeenakshi	fba321ae6c	[ntp]: Add "tinker panic 0" in ntp.conf to avoid ntpd from panic (#4263 ) - What I did Add configuration to prevent ntpd from panicking and exiting if the drift between the new time and the current system time is large. - How I did it Added "tinker panic 0" in ntp.conf file. - How to verify it [this assumes that there is a valid NTP server IP in config_db/ntp.conf] 1. Change the current system time to a bad time with a large drift from the time on the NTP server; drift should be greater than 1000s. 2. Reboot the device. Before the fix: 3. Upon reboot, ntp-config service comes up fine, ntp service goes to active (exited) state without any error message. This is because the offset between the new time (from the NTP server) and the current system time is very large, so ntpd goes to panic mode and exits. The system continues to show the bad time. After the fix: 3. Upon reboot, ntp-config comes up fine, ntp service comes up fine and stays in active (running) state. The system clock gets synced with the NTP server time.	2020-04-03 19:42:17 +00:00
Joe LeVeque	cbf7c7d80d	[rsyslog] Suppress duplicate messages from base image and all Docker containers (#2497 )	2020-04-02 21:42:01 +00:00
Stepan Blyshchak	a4dd0aa09f	[mellanox] add hardware watchdog script (#4274 ) admin@sonic:~$ sudo hw-management-wd.sh Usage: hw-management-wd.sh start [timeout] | stop | tleft | check_reset | help start - start watchdog timeout is optional. Default value will be used in case if it's omitted timeout provided in seconds stop - stop watchdog tleft - check watchdog timeout left check_reset - check if previous reset was caused by watchdog Prints only in case of watchdog reset help -this help Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>	2020-03-31 20:34:55 -07:00
yozhao101	1cc6141a93	[Monit] Delay start of monitoring for 5 minutes (#4281 )	2020-03-19 22:49:04 +00:00
zhenggen-xu	19c1ad36a5	[201811] [interfaces-config.sh] Flush the loopback interface addresses (#4234 ) * [interfaces-config.sh] Flush the loopback interface before configure it Without this, you may end up with more and more ip addresses on loopback interface after you change the loopback ip and do config reload Signed-off-by: Zhenggen Xu <zxu@linkedin.com>	2020-03-09 16:14:59 -07:00
Prince Sunny	320dcf2008	Sleep done before mismatch handler (#4165 ) * Sleep done before mismatch handler	2020-02-25 16:39:33 +00:00
byu343	50db98e2b3	[arista]: Fix convertfs condition for booting from EOS (#4139 ) Fix the issue of incorrectly skipping the convertfs hook when fast-reboot from EOS, by adding an extra kernel cmdline param "prev_os" to differentiate fast-reboot from EOS and from SONiC. This is because we still do disk conversion for fast reboot from eos to sonic, like format the disk.	2020-02-25 16:38:56 +00:00
Stephen Sun	726fecaf8b	[process-reboot-cause] Clean up the process-reboot-cause as required in issue 3927 (#4128 )	2020-02-14 19:37:30 +00:00
Joe LeVeque	4af3e5066d	[interfaces-config.sh] Force lo interface down (#4149 ) Force "lo" interface down in interfaces-config.sh to prevent interface-config.service from failing with the following error: ``` -- The result is failed. systemd[1]: networking.service: Unit entered failed state. systemd[1]: networking.service: Failed with result 'exit-code'. interfaces-config.sh[29232]: Job for networking.service failed because the control process exited with error code. interfaces-config.sh[29232]: See "systemctl status networking.service" and "journalctl -xe" for details. interfaces-config.sh[29232]: ifdown: interface lo not configured interfaces-config.sh[29232]: RTNETLINK answers: File exists interfaces-config.sh[29232]: ifup: failed to bring up lo systemd[1]: interfaces-config.service: Main process exited, code=exited, status=1/FAILURE systemd[1]: Failed to start Update interfaces configuration. -- Subject: Unit interfaces-config.service has failed ``` Failure to bring down the interface will result in a failure to subsequently bring the interface back up.	2020-02-13 22:38:21 -08:00
Prince Sunny	53a2934fc5	Added timeout to ping command (#4123 )	2020-02-06 17:41:38 -08:00
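
Commit 7ae30d7898 above describes replacing `list(ip_ntwrk.hosts())` with a `next()` call so that only the first host address is produced. The following is a minimal sketch of that idea using Python 3's standard `ipaddress` module; it is not the caclmgrd code itself. The function name `first_host` and the sample prefixes are hypothetical, chosen only for illustration.

```python
import ipaddress


def first_host(prefix):
    """Return the first usable host address in `prefix`, or None if there is none.

    `first_host` is a hypothetical helper, not a function from caclmgrd.
    """
    # strict=False lets an interface address such as 192.168.0.1/24 be
    # accepted and reduced to its containing network.
    network = ipaddress.ip_network(prefix, strict=False)
    # hosts() may return a generator or, for very small networks, a list;
    # iter() covers both. next() pulls a single element instead of
    # materializing every address in the network.
    return next(iter(network.hosts()), None)


if __name__ == "__main__":
    print(first_host("192.168.0.1/24"))   # IPv4 example prefix (assumed)
    print(first_host("fc02:1000::1/64"))  # IPv6 example prefix (assumed)
```

A /64 IPv6 network contains 2^64 addresses, so building a full list of its hosts is impractical regardless of implementation details, while taking one element stays constant-time.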

460 Commits