Commit Graph

171 Commits

Author SHA1 Message Date
Danny Allen
97c675c6d5 [cron.d] Add cron job to periodically clean-up core files (#3449)
* [cron.d] Create cron job to periodically clean-up core files
* Create script to scan /var/core and clean-up older core files
* Create cron job to run clean-up script

Signed-off-by: Danny Allen <daall@microsoft.com>

* Update interval for running cron job

* Respond to feedback

* Change syslog id
2019-09-13 10:50:31 -07:00
lguohan
95a72b4e39
[baseimage]: fix monit configuration (#3448)
- monit config broke by one monit upgrade
- abandon sed approach since it is suspestible to monit config changes
- use unixsocket instead of httpd due to a bug in 5.20.0
2019-09-12 22:48:40 -07:00
Joe LeVeque
a27f12773b [baseimage]: Log message containing SONiC version to syslog at boot (#3416) 2019-09-09 14:18:23 -07:00
Ying Xie
d6b4223bdd [control plane assistant] stop control plane assistant after warm reboot (#3337)
Delay saving configuration so that the control assistant configurations
won't be persisted.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-08-15 00:45:54 -07:00
Renuka Manavalan
fcdf62f5f6
Fix to ensure that tacacs servers are ordered (reverse) by priority in pam.d's config. (#3322)
Present: Servers are listed in the same order as in redis-db
Fix: Save the sort o/p, hence use sorted list to write into pam.d's conf.
     As well convert priority to integer for use by sort.
2019-08-09 11:46:46 -07:00
arheneus@marvell.com
50fe458592 [build]: SONiC buildimage ARM arch support (#2980)
ARM Architecture support in SONIC

make configure platform=[ASIC_VENDOR_ARCH] PLATFORM_ARCH=[ARM_ARCH]
SONIC_ARCH: default amd64
armhf - arm32bit
arm64 - arm64bit

Signed-off-by: Antony Rheneus <arheneus@marvell.com>
2019-07-25 22:06:41 -07:00
Harish Venkatraman
3e69427ac0 [baseimage] management VRF support via l3mdev (#2585)
This commit adds support for New feature management VRF using L3mdev.  Added
commands to enable/disable management VRF. Config vrf add mgmt will enable
management VRF, enslave the eth0 device to the master device mgmt and restart
interfaces-configs in mgmt-vrf context.

management interface (eth0) can be configured using config interface eth0 ip
add command and removed using config interface eth0 ip remove command.

Requirement and design are covered in mgmt vrf design document.  Currently show
command displays linux command output; will update show command display in next
PR after concluding what would be the output for the show commands. Added
metric for default routes in dhcp and static, any changes for metric will be
addressed subsequently after discussing.

Signed-off-by: Harish Venkatraman <harish_venkatraman@dell.com>
2019-07-24 16:18:40 -07:00
Ying Xie
9d64ce761f
[warm reboot] save configuration after warm reboot (#3200)
* [warm reboot] save configuration after warm reboot

After warm reboot, save a copy of in memory database to config_db.json,
upgrade procedure might have removed config_db.json to force new image
to reload minigraph. However, reload minigraph is skipped during warm
reboot. Missing config_db.json would cause device to fault in next
non-upgrading cold/fast reboot.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>

* Update finalize-warmboot.sh
2019-07-24 09:59:47 -07:00
zzhiyuan
e4c041b57f [baseimage]: Fix process-reboot-cause possibly throwing OSError (#3159)
In case of going from previous iteration of SONiC, and the last reboot
was hardware, REBOOT_CAUSE_FILE may not be present and the service may
throw an error.
2019-07-16 08:34:11 -07:00
Joe LeVeque
5e2ab9dd03
[process-reboot-cause] Handle case if platform does not yet have sonic_platform implementation (#3126) 2019-07-05 17:53:49 -07:00
Joe LeVeque
e5a2beb13b [reboot-cause]: Move reboot cause processing to its own service, 'process-reboot-cause' (#3102) 2019-07-03 10:38:20 -07:00
Joe LeVeque
319d854e46 [baseimage]: Increase TMOUT for serial port connections to 15 minutes (#3032)
Increase TMOUT value in order to close inactive serial console connections after 900 seconds (15 minutes) of inactivity
2019-06-19 00:16:01 -07:00
Prince Sunny
231d309b69
Generate interface table to have an entry designated to default VRF. (#2848)
* Generate default VRF table for router interfaces

* Updated jinja2 template to have prefix filter
2019-06-10 14:02:55 -07:00
Myron Sosyak
3ec95e17c8 [build_templates] [hostcfgd] Keep containers hostname up to date (#2924)
* Add updateHostName function to docker_image_ctl.j2
* Add hostname specification on container creating step
* Add listener for hostname changes in hostcfgd

Signed-off-by: Myron Sosyak <msosyak@barefootnetworks.com>
2019-06-06 00:41:30 -07:00
Joe LeVeque
3ec3e20e5a [logrotate] Enhance robustness (#2942)
* [logrotate] Decrease frequency to every 10 minutes; kill any lingering logrotate processes

* [logrotate] Delete all *.1.gz files as firstaction; Remove note about init-system-helpers < 1.47 workaround

However, continue to send SIGHUP directly to rsyslogd process
because 'service rsyslog rotate' still doesn't work properly with
init-system-helpers version 1.48
2019-05-25 18:00:18 -07:00
Ying Xie
222706120d [updategraph] set DB version after minigraph reload (#2917)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-05-18 22:08:41 -07:00
Renuka Manavalan
a357693f52 [tacacs]: skip accessing tacacs servers for local non-tacacs users (#2843)
* Switch the nss look up order as "compat" followed by "tacplus".
This helps use the legacy passwd file for user info and go to tacacs only if not found.
This means, we never contact tacacs for local users like "admin".
This isolates local users from any issues with tacacs servers.
W/o this fix, the sudo commands by local users could take <count of servers> * <tacacs timeout> seconds, if the tacacs servers are unreachable.

* Skip tacacs server access for local non-tacacs users.
Revert the order of 'compat tacplus' to original 'tacplus compat' as tacplus
access is required for all tacacs users, who also get created locally.
2019-05-09 14:36:32 -07:00
Ying Xie
9efcf1759a
[ebtables] install ebtables in base image and install filter rules (#2805)
- Add ebtables package, and install some filter rules:
  1. ebtables -A FORWARD -d BGA -j DROP
  2. ebtables -A FORWARD -p ARP -j DROP

Basically, we let the ARP packets in the VLAN being forwarded by the ASIC,
kernel gets a copy of these ARP packets and the forwarding from Kenerl gets
dropped. So there is always only one copy of ARP/response in the VLAN.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-05-09 09:44:41 -07:00
Joe LeVeque
6eca27e564 [services] Restart SwSS service upon unexpected critical process exit (#2845)
* [service] Restart SwSS Docker container if orchagent exits unexpectedly

* Configure systemd to stop restarting swss if it attempts to restart more than 3 times in 20 minutes

* Move supervisor-proc-exit-listener script

* [docker-dhcp-relay] Enhance wait_for_intf.sh.j2 to utilize STATEDB

* Ensure dependent services stop/start/restart with SwSS

* Change 'StartLimitInterval' to 'StartLimitIntervalSec', as Stretch installs systemd 232 (>= v230)

* Also update journald.conf options

* Remove 'PartOf' option from unit files

* Add '$(SUPERVISOR_PROC_EXIT_LISTENER_SCRIPT)' to new shared docker-orchagent makefile

* Make supervisor-proc-exit-listener script read from 'critical_processes' file inside container

* Update critical_processes file for swss container
2019-05-01 08:02:38 -07:00
Joe LeVeque
2736da97c7 [sudoers] Add /usr/bin/teamshow to READ_ONLY_CMDS (#2846) 2019-05-01 08:01:44 -07:00
zhenggen-xu
75964ef243 [baseimage]: Add fstrim service and fstrim timer by default (#2804)
This service (weekly) will let SSD firmware to do the garbage collection
after file-system deleted files. It could avoid slowness or
even READ-ONLY error due to SSD not being able to free the pages
even though the file system thinks there was a lot of space left.

Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
2019-04-21 14:21:16 -07:00
Ying Xie
f583f57af6
[service] add warmboot finializer service (#2715)
After warm reboot is done, we need to disable warm reboot flag and
tear down anything setup for warm reboot and persisted across.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-04-12 15:45:58 -07:00
Renuka Manavalan
6d7ecc426c [hostcfgd] -- Fix the default for failthrough as false.
This implies that by default, if TACACS is configured properly and it reported auth_err, then don't try fail through to traditional unix authentication through /etc/passwd.

If this failthrough is intended, make it explicit through "sudo config aaa authentication failthrough enable"

Removed an unused variable "aaa.fallback"

Tested manually. Note the presence of 'auth_err=die' in all cases except when failthrough is explicitly enabled.

admin@str-s6000-acs-13:~$ sudo config aaa authentication failthrough default; date
Wed Apr  3 23:05:18 UTC 2019
admin@str-s6000-acs-13:~$ ls -lrt /etc/pam.d/common-auth-sonic ; grep 123 /etc/pam.d/common-auth-sonic
-rw-r--r-- 1 root root 1316 Apr  3 23:05 /etc/pam.d/common-auth-sonic
auth    [success=done new_authtok_reqd=done default=ignore auth_err=die]        pam_tacplus.so server=100.127.20.22:49 secret=testing123 login=login timeout=5 try_first_pass
auth    [success=done new_authtok_reqd=done default=ignore auth_err=die]        pam_tacplus.so server=100.127.20.21:49 secret=testing123 login=login timeout=5 try_first_pass

admin@str-s6000-acs-13:~$ sudo config aaa authentication failthrough enable; date ; h4 "AAA|authentication"
Wed Apr  3 23:06:37 UTC 2019
admin@str-s6000-acs-13:~$ ls -lrt /etc/pam.d/common-auth-sonic ; grep 123 /etc/pam.d/common-auth-sonic
-rw-r--r-- 1 root root 1294 Apr  3 23:06 /etc/pam.d/common-auth-sonic
auth    [success=done new_authtok_reqd=done default=ignore]     pam_tacplus.so server=100.127.20.22:49 secret=testing123 login=login timeout=5 try_first_pass
auth    [success=done new_authtok_reqd=done default=ignore]     pam_tacplus.so server=100.127.20.21:49 secret=testing123 login=login timeout=5 try_first_pass

admin@str-s6000-acs-13:~$ sudo config aaa authentication failthrough disable; date ; h4 "AAA|authentication"
Wed Apr  3 23:07:09 UTC 2019
admin@str-s6000-acs-13:~$ ls -lrt /etc/pam.d/common-auth-sonic ; grep 123 /etc/pam.d/common-auth-sonic
-rw-r--r-- 1 root root 1321 Apr  3 23:07 /etc/pam.d/common-auth-sonic
auth    [success=done new_authtok_reqd=done default=ignore auth_err=die]        pam_tacplus.so server=100.127.20.22:49 secret=testing123 login=login timeout=5 try_first_pass
auth    [success=done new_authtok_reqd=done default=ignore auth_err=die]        pam_tacplus.so server=100.127.20.21:49 secret=testing123 login=login timeout=5 try_first_pass
2019-04-03 23:16:56 +00:00
Pavlo Yadvichuk
11c2e9ee3d [barefoot]: Allow configuration of platform-specific interfaces used for internal purposes (#2631)
- Why it is required
since SONiC master switches ifupdown package to the new implementation (ifupdown2), it is required to change the configuration of a platform-specific interface for wedge100bf_32x and wedge100bf_65x platforms (bc of ifupdown2 doesn't support auto mode for inet6 protocol).

Also, need to make some refactoring and remove if platform == smth then.. from the system level scripts.

- What I did

removed customization of /usr/bin/interfaces-config.sh
explicitly created directory /etc/network/interfaces.d
added "source" to the /etc/network/interfaces generation template (to include platform-specific interfaces processing)
added platform-specific interfaces config itself (for wedge100bf_32x and wedge100bf_65x)
fixed testcase in sonic-config-engine
- How to verify it

build image for wedge100bf_32x
perform sudo config reload -y on new installation
check the correct configuration of usb0 interface
- Description for the changelog

Allow configuration of platform-specific interfaces
2019-03-09 06:22:32 -08:00
RAMA CHANDRA REDDY GADDAM
b9edb7153d [aaa] Fix common-auth-sonic.j2 template issue (#2613) 2019-03-02 15:36:35 -08:00
Jipan Yang
ff74daaf13 Move warm_restart enable/disable config to stateDB WARM_RESTART_ENABLE_TABLE (#2538)
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
2019-02-19 17:06:56 -08:00
Renuka Manavalan
fa7c46611e [hostcfgd]: Promote logs for update-notifications-from-DB from DEBUG to INFO (#2576)
* Add a log message for each notification of add/del TACACS server.

Signed-off-by: Renuka Manavalan <remanava@microsoft.com>

* Moved another syslog message from DEBUG to INFO to be able to see those notifications.

All these changes are to help with a one-time-seen-bug, that hostcfgd did not act upon changes to redis for TACACS servers. We could not repro the bug.

Signed-off-by: Renuka Manavalan <remanava@microsoft.com>
2019-02-16 10:17:13 -08:00
zhenggen-xu
982eddfaa4 [updategraph] After system upgrade, restore files/directories with original attributes etc. (#2368)
* [updategraph] After system upgrade, restore files/directories with
original attributes etc.
Restore a few more files that was missed before.
Restore FRR configuration directory if exists on old system

Signed-off-by: Zhenggen Xu <zxu@linkedin.com>

* Removed deployment_id_asn_map.yml from copy list

Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
2019-02-02 12:50:19 -08:00
Joe LeVeque
39b60d2a50 [reboot cause] Move reboot-cause files to /host directory so they persist across SONiC upgrades (#2490)
* [reboot cause] Move reboot-cause files to /host directory so they persist across SONiC upgrades

* [sonic-utilities] Update submodule to include related changes
2019-01-29 03:42:19 -08:00
Joe LeVeque
8f43cad061 [rsyslog] Suppress duplicate messages from base image and all Docker containers (#2497) 2019-01-29 03:41:40 -08:00
Joe LeVeque
116ddb996a [caclmgrd] Don't crash if we find empty/null rule_props (#2475)
* [caclmgrd] Don't crash if we find empty/null rule_props
2019-01-23 18:47:05 -08:00
Prabhu Sreenivasan
f28a670097 [baseimage]: Avoid removing localhost entry from /etc/hosts file (#2452)
- What I did
This fix removes the possibility of 'localhost' entry getting removed from /etc/hosts file by hostname-config service.

Without this change, whenever we change the hostname from 'localhost' to any other name on the config_db.json and reload the config, /etc/hosts file will only have the new hostname on it. But there are multiple sonic utilities (eg: swssconfig) which relies on the hard coded 'localhost' name and they tend to stop working.

- How I did it
Added a new check on hostname-config.sh script to avid blindly deleting the line containing the old hostname from /etc/hosts file. Now it will delete the old hostname only if its not localhost or when the hostname is not changing.

- How to verify it

Bring up SONiC on a device with hostname as localhost
Edit /etc/sonic/config_db.json to update the 'hostname' filed under DEVICE_METADATA from "hostname" : "localhost" --> "hostname" : "sonic"
run config reload -y to reflect the hostname change done on config_db.json file.
cat /etc/hosts and check whether both 127.0.0.1 localhost and 127.0.0.1 sonic entry are present on the file.
ping localhost should work fine.
- Description for the changelog
Make hostname-config service more robust in handling SONiC hostname change from localhost to anything else.
2019-01-17 22:47:19 -08:00
Ying Xie
6ba93acd9c
[update graph] adapt to warm reboot scenario (#2353)
* [update graph] adapt to warm reboot scenario

When migrating configuration, always copy config files from old_config
to /etc/sonic. But if warm reboot is detected, then skip configuration
operations.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>

* log file copies and misses
2018-12-06 10:24:50 -08:00
kannankvs
a9a7ce1091 tacacs management vrf changes (#2217) 2018-12-04 10:22:48 -08:00
Joe LeVeque
298d2ad8f4
[boot] Refactor: All services which start Docker containers start before ntp-config service (#2335) 2018-12-03 16:01:44 -08:00
Ying Xie
84bde1511a
[sonic boot] disable dhcp during boot up, until updategraph service is running (#2316)
* [sonic] disable management port eth0 during boot up

Signed-off-by: Ying Xie <ying.xie@microsoft.com>

* [updategraph] enable dhcp client on management port eth0

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2018-11-29 08:34:22 -08:00
Joe LeVeque
d1c9b0cb77 [boot] Start ntp-config service after all Docker containers are started (#2303) 2018-11-28 00:12:03 -08:00
Ying Xie
873df9d8e8
[bde driver] black list linux_kernel_bde driver (#2284)
This driver should be loaded by sonic service. If kernel tries to load
it, the driver would be loaded with default parameters, which is not
right for sonic.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2018-11-21 08:08:37 -08:00
Joe LeVeque
f126000cc9
[sudoers] Add 'SONIC_CLI_IFACE_MODE' to env_keep to ensure variable is made available to sudo calls (#2249) 2018-11-15 15:16:06 -08:00
Ying Xie
5cff136951 [console speed] lock console speed to start up speed (#1734)
Auto negotiating console speed could cause sonic to lock on a wrong
speed under rare conditions. The only way to come out of the wrong
speed is to issue line break or restart console service with forced
speed, or reboot sonic.

Lock down the console speed to avoid these situations.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2018-11-01 15:12:22 -07:00
Taoyu Li
2897686de8
[updategraph] Use empty configuration when DHCP graphurl option is missing (#2185) 2018-10-29 12:16:00 -07:00
Joe LeVeque
1e1add90f9
Remove Arista-specific service ACL solution; All platforms now use caclmgrd (#2202) 2018-10-29 10:25:18 -07:00
Wenda Ni
09ae9a8965 In the case of upgrade, have pfcwd enabled on the upgraded sonic (#2192)
Signed-off-by: Wenda <wenni@microsoft.com>
2018-10-26 09:13:45 -07:00
Shuotian Cheng
7313e7d9bc [teamd]: Add teammgrd in docker-teamd (#2064)
Remove the teamd.j2 templates used for starting the teamd. Add
teammgrd instead to manage all port channel related configuration
changes. Remove front panel port related configurations in
interfaces.j2 templates as well.

Remove teamd.sh script and use teammgrd to start all the teamd
processes. Remove all the logics in the start.sh script as well.

Update the sonic-swss submodule.

Signed-off-by: Shu0T1an ChenG <shuche@microsoft.com>
2018-10-19 03:41:53 -07:00
Taoyu Li
2a24a303ec [tacplus nss conf] tacplus should be before compat (#2163) 2018-10-18 12:42:24 -07:00
Taoyu Li
018b5899be [updategraph] add support to use preset config instead of default minigraph (#2050)
* [updategraph] add support to use preset config instead of default minigraph

* Fix variable case

* Remove default minigraph case

* Remove default minigraphs and add default_sku files
2018-09-21 22:01:10 -07:00
Taoyu Li
47c9542c63 Don't reuse init_cfg.json from old image during upgrade (#2036) 2018-09-11 21:26:51 -07:00
Shuotian Cheng
9413fa9a7b
[interfaces]: Move IP/MTU information from interfaces file into database (#1908)
- Move front panel ports and port channels MTU and IP configurations out of
the current /etc/network/interfaces file and store them in the configuration
database.

- The default MTU value for both front panel ports and the port channels is
9100. They are set via the minigraph or 9100 by default.

- Introduce portmgrd which will pick up the MTU configurations from the
configuration database.

- The updated intfmgrd will pick up IP address changes from the configuration
database.

- Update sonic-swss submodule

Signed-off-by: Shu0T1an ChenG <shuche@microsoft.com>
2018-08-20 11:19:16 -07:00
Joe LeVeque
98082d56a0 [baseimage]: Download picocom version 3.1-2 from stretch-backports; No longer build from source (#1946) 2018-08-17 17:38:20 -07:00
lguohan
f3ca7c422f
[rsyslog]: use # to separate container name and program name in syslog message (#1918)
Previously use / to separate container name and program name.

However, in rsyslogd:

Precisely, the programname is terminated by either (whichever occurs first):

end of tag
nonprintable character
‘:’
‘[‘
‘/’
The above definition has been taken from the FreeBSD syslogd sources.

Signed-off-by: Guohan Lu <gulv@microsoft.com>
2018-08-12 22:23:58 -07:00