Commit Graph

190 Commits

Author SHA1 Message Date
pra-moh
bfa96bbce3 Add daemon which periodically pushes process and docker stats to State DB (#3525) 2019-11-27 15:35:41 -08:00
pra-moh
d3a1555f30 [hostcfgd] Add support to enable/disable optional features (#3653) 2019-11-26 14:11:12 -08:00
kannankvs
4007d9ba9c [ntp]: modified ntp script to hide the error related to cfggen (#3745)
This PR is to handle the issue 3527.
When device boots up, NTP throws a traceback as explained in the issue 3527.

- Traceback will be seen when MGMT_VRF_CONFIG does not exist in the database. Traceback is coming from the script “/etc/init.d/ntp”.

- Traceback does not affect the NTP functionality with/without management VRF. When MGMT_VRF_CONFIG does not exist or when MGMT_VRF_CONFIG’s mgmtVrfEnabled is configured to “false”, “NTP” will be started in the “default VRF” context, which is working fine even with this traceback.

- This traceback error will be hidden by redirecting the error to /dev/null without affecting functionality.
2019-11-14 00:06:54 -08:00
Joe LeVeque
c50c390eb4 [rsyslog] Add support for IPv6 remote addresses (#3754) 2019-11-14 00:00:55 -08:00
Tyler Li
c07ae3b16f Loopback ip addresses move to intfmgrd for supporting VRF 2019-11-10 02:27:33 -08:00
Joe LeVeque
85b0de3df1 [docker-syncd]: Restart SwSS, syncd and dependent services if a critical process in syncd container exits unexpectedly (#3534)
Add the same mechanism I developed for the SwSS service in #2845 to the syncd service. However, in order to cause the SwSS service to also exit and restart in this situation, I developed a docker-wait-any program which the SwSS service uses to wait for either the swss or syncd containers to exit.
2019-11-09 10:26:39 -08:00
lguohan
6d46badbdc
[aboot]: preserve snmp.yml and acl.json for eos to sonic fast reboot (#3716) 2019-11-06 20:18:31 -08:00
Neetha John
95466c3ab7 [pfcwd]: Do not start pfc watchdog on Management Tor (#3719)
Signed-off-by: Neetha John <nejo@microsoft.com>
2019-11-06 18:51:02 -08:00
pavel-shirshov
d5af096f41
[TSA]: Add community to the loopback prefix, when isolated (#3708)
* Rename asn/deployment_id_asn_map.yaml to constants/constants.yaml

* Fix bgp templates

* Add community for loopback when bgpd is isolated

* Use correct community value
2019-11-06 16:07:28 -08:00
Ying Xie
5961e031e1
[hostname-config] improve hostname-config process (#3676)
We noticed in tests/production that there is a low probability failure
where /etc/hosts could have some garbage characters before the entry for
local host name. The consequence is that all sudo command would be very
slow. In extreme cases it would prevent some services from starting
properly.

I suspect that the /etc/hosts file might be opened by some process causing
the issue. Editing contents with new file level and replace the whole file
should be safer.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-10-29 08:30:27 -07:00
Danny Allen
63328814fc
[core_cleanup] Fix issue where core_cleanup job runs too frequently (#3659)
Signed-off-by: Danny Allen <daall@microsoft.com>
2019-10-23 15:55:47 -07:00
pavel-shirshov
9b8f5c9c9a [ntp]: Use loopback address when we don't have MGMT interface (#3566)
Added configuration to use Loopback ip if a switch doesn't have MGMT_PORT.
2019-10-07 07:49:25 -07:00
Ying Xie
cd85e2148b
[updategraph] enhance update graph handling (#3549)
- after reloading minigraph, write latest version string in the DB.
- if old config_db.json file exists, use it and migrate to latest version.
- only reload minigraph when config_db.json doesn't exist and minigraph
  exists.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-10-02 13:58:44 -07:00
Ying Xie
d5262a3621
[first boot] sync file system after moving/copying files (#3550)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-10-02 13:58:34 -07:00
Long Ou
b6a09999de [hostcfgd] hostcfgd will exit when set hostname in DEVICE_METADATA (#3394)
Signed-off-by: ouxiaolong <ouxiaolong@asterfusion.com>
2019-09-24 17:36:02 -07:00
Harish Venkatraman
9d2d617264 [SNMP] management VRF SNMP support (#2608)
* [SNMP] management VRF SNMP support

This commit adds SNMP support for Management VRF using l3mdev.
The patch included provides VRF support, there is no single
"listendevice" configuration, rather multiple agentaddress
config options can each have their own "interface" to bind to
using "ip%interface". The snmpd.conf file is accordingly
generated using the snmp.yml file and redis database info.

Adding below the comments of SNMP patch 1376
--------------------------------------------
Since the Linux kernel added support for Virtual Routing
and Forwarding (VRF) in version 4.3
(Note: these won't compile on non-linux platforms)

https://www.kernel.org/doc/Documentation/networking/vrf.txt

Linux users could not use snmpd in its current form to
bind specific listening IP addresses to specific VRF
devices. A simplified description of a VRF inteface
is an interface that is a master (a container of sorts)
that collects a set of physicalinterfaces to form a
routing table.

This set of two patches (one for V5-7-patches and one
for V5-8-patches branches) is almost identical to patch
single "listendevice" configuration. Rather, multiple
agentAddress config options can each have their own
"interface" to bind to using the <ip>%<interface>
syntax.</interface></ip>
-------------------------------------------

Signed-off-by: Harish Venkatraman <harish_venkatraman@dell.com>
2019-09-18 17:26:45 -07:00
Prince Sunny
8ca1eb289e
Install Iptables rules to set TCPMSS for 'lo' interface (#3452)
* Install Iptables rules to set TCPMSS for lo interface
* Moved implementation to hostcfgd to maintain at one place
2019-09-18 10:12:28 -07:00
sridhar-ravindran
3c0b56a709 [DELL] S6100 Support PowerCycle in Last Reboot Reason (#3403)
* [DELL] S6100 Support PowerCycle in Last Reboot Reason

* handle first time boot properly

* S6000 Last Reboot Reason Fix
2019-09-17 16:51:46 -07:00
Harish Venkatraman
31d1a76197 [baseimage]: Management vrf ntp support (#3204)
This commit adds NTP support for management VRF using L3mdev. Config vrf add
mgmt will enable management VRF, enslave the eth0 device to the master device
mgmt, stop ntp service in default, restart interfaces-configs and restart ntp
service in mgmt-vrf context. Requirement and design are covered in mgmt vrf
design document.

Signed-off-by: Harish Venkatraman <harish_venkatraman@dell.com>
2019-09-16 10:21:06 -07:00
Danny Allen
97c675c6d5 [cron.d] Add cron job to periodically clean-up core files (#3449)
* [cron.d] Create cron job to periodically clean-up core files
* Create script to scan /var/core and clean-up older core files
* Create cron job to run clean-up script

Signed-off-by: Danny Allen <daall@microsoft.com>

* Update interval for running cron job

* Respond to feedback

* Change syslog id
2019-09-13 10:50:31 -07:00
lguohan
95a72b4e39
[baseimage]: fix monit configuration (#3448)
- monit config broke by one monit upgrade
- abandon sed approach since it is suspestible to monit config changes
- use unixsocket instead of httpd due to a bug in 5.20.0
2019-09-12 22:48:40 -07:00
Joe LeVeque
a27f12773b [baseimage]: Log message containing SONiC version to syslog at boot (#3416) 2019-09-09 14:18:23 -07:00
Ying Xie
d6b4223bdd [control plane assistant] stop control plane assistant after warm reboot (#3337)
Delay saving configuration so that the control assistant configurations
won't be persisted.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-08-15 00:45:54 -07:00
Renuka Manavalan
fcdf62f5f6
Fix to ensure that tacacs servers are ordered (reverse) by priority in pam.d's config. (#3322)
Present: Servers are listed in the same order as in redis-db
Fix: Save the sort o/p, hence use sorted list to write into pam.d's conf.
     As well convert priority to integer for use by sort.
2019-08-09 11:46:46 -07:00
arheneus@marvell.com
50fe458592 [build]: SONiC buildimage ARM arch support (#2980)
ARM Architecture support in SONIC

make configure platform=[ASIC_VENDOR_ARCH] PLATFORM_ARCH=[ARM_ARCH]
SONIC_ARCH: default amd64
armhf - arm32bit
arm64 - arm64bit

Signed-off-by: Antony Rheneus <arheneus@marvell.com>
2019-07-25 22:06:41 -07:00
Harish Venkatraman
3e69427ac0 [baseimage] management VRF support via l3mdev (#2585)
This commit adds support for New feature management VRF using L3mdev.  Added
commands to enable/disable management VRF. Config vrf add mgmt will enable
management VRF, enslave the eth0 device to the master device mgmt and restart
interfaces-configs in mgmt-vrf context.

management interface (eth0) can be configured using config interface eth0 ip
add command and removed using config interface eth0 ip remove command.

Requirement and design are covered in mgmt vrf design document.  Currently show
command displays linux command output; will update show command display in next
PR after concluding what would be the output for the show commands. Added
metric for default routes in dhcp and static, any changes for metric will be
addressed subsequently after discussing.

Signed-off-by: Harish Venkatraman <harish_venkatraman@dell.com>
2019-07-24 16:18:40 -07:00
Ying Xie
9d64ce761f
[warm reboot] save configuration after warm reboot (#3200)
* [warm reboot] save configuration after warm reboot

After warm reboot, save a copy of in memory database to config_db.json,
upgrade procedure might have removed config_db.json to force new image
to reload minigraph. However, reload minigraph is skipped during warm
reboot. Missing config_db.json would cause device to fault in next
non-upgrading cold/fast reboot.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>

* Update finalize-warmboot.sh
2019-07-24 09:59:47 -07:00
zzhiyuan
e4c041b57f [baseimage]: Fix process-reboot-cause possibly throwing OSError (#3159)
In case of going from previous iteration of SONiC, and the last reboot
was hardware, REBOOT_CAUSE_FILE may not be present and the service may
throw an error.
2019-07-16 08:34:11 -07:00
Joe LeVeque
5e2ab9dd03
[process-reboot-cause] Handle case if platform does not yet have sonic_platform implementation (#3126) 2019-07-05 17:53:49 -07:00
Joe LeVeque
e5a2beb13b [reboot-cause]: Move reboot cause processing to its own service, 'process-reboot-cause' (#3102) 2019-07-03 10:38:20 -07:00
Joe LeVeque
319d854e46 [baseimage]: Increase TMOUT for serial port connections to 15 minutes (#3032)
Increase TMOUT value in order to close inactive serial console connections after 900 seconds (15 minutes) of inactivity
2019-06-19 00:16:01 -07:00
Prince Sunny
231d309b69
Generate interface table to have an entry designated to default VRF. (#2848)
* Generate default VRF table for router interfaces

* Updated jinja2 template to have prefix filter
2019-06-10 14:02:55 -07:00
Myron Sosyak
3ec95e17c8 [build_templates] [hostcfgd] Keep containers hostname up to date (#2924)
* Add updateHostName function to docker_image_ctl.j2
* Add hostname specification on container creating step
* Add listener for hostname changes in hostcfgd

Signed-off-by: Myron Sosyak <msosyak@barefootnetworks.com>
2019-06-06 00:41:30 -07:00
Joe LeVeque
3ec3e20e5a [logrotate] Enhance robustness (#2942)
* [logrotate] Decrease frequency to every 10 minutes; kill any lingering logrotate processes

* [logrotate] Delete all *.1.gz files as firstaction; Remove note about init-system-helpers < 1.47 workaround

However, continue to send SIGHUP directly to rsyslogd process
because 'service rsyslog rotate' still doesn't work properly with
init-system-helpers version 1.48
2019-05-25 18:00:18 -07:00
Ying Xie
222706120d [updategraph] set DB version after minigraph reload (#2917)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-05-18 22:08:41 -07:00
Renuka Manavalan
a357693f52 [tacacs]: skip accessing tacacs servers for local non-tacacs users (#2843)
* Switch the nss look up order as "compat" followed by "tacplus".
This helps use the legacy passwd file for user info and go to tacacs only if not found.
This means, we never contact tacacs for local users like "admin".
This isolates local users from any issues with tacacs servers.
W/o this fix, the sudo commands by local users could take <count of servers> * <tacacs timeout> seconds, if the tacacs servers are unreachable.

* Skip tacacs server access for local non-tacacs users.
Revert the order of 'compat tacplus' to original 'tacplus compat' as tacplus
access is required for all tacacs users, who also get created locally.
2019-05-09 14:36:32 -07:00
Ying Xie
9efcf1759a
[ebtables] install ebtables in base image and install filter rules (#2805)
- Add ebtables package, and install some filter rules:
  1. ebtables -A FORWARD -d BGA -j DROP
  2. ebtables -A FORWARD -p ARP -j DROP

Basically, we let the ARP packets in the VLAN being forwarded by the ASIC,
kernel gets a copy of these ARP packets and the forwarding from Kenerl gets
dropped. So there is always only one copy of ARP/response in the VLAN.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-05-09 09:44:41 -07:00
Joe LeVeque
6eca27e564 [services] Restart SwSS service upon unexpected critical process exit (#2845)
* [service] Restart SwSS Docker container if orchagent exits unexpectedly

* Configure systemd to stop restarting swss if it attempts to restart more than 3 times in 20 minutes

* Move supervisor-proc-exit-listener script

* [docker-dhcp-relay] Enhance wait_for_intf.sh.j2 to utilize STATEDB

* Ensure dependent services stop/start/restart with SwSS

* Change 'StartLimitInterval' to 'StartLimitIntervalSec', as Stretch installs systemd 232 (>= v230)

* Also update journald.conf options

* Remove 'PartOf' option from unit files

* Add '$(SUPERVISOR_PROC_EXIT_LISTENER_SCRIPT)' to new shared docker-orchagent makefile

* Make supervisor-proc-exit-listener script read from 'critical_processes' file inside container

* Update critical_processes file for swss container
2019-05-01 08:02:38 -07:00
Joe LeVeque
2736da97c7 [sudoers] Add /usr/bin/teamshow to READ_ONLY_CMDS (#2846) 2019-05-01 08:01:44 -07:00
zhenggen-xu
75964ef243 [baseimage]: Add fstrim service and fstrim timer by default (#2804)
This service (weekly) will let SSD firmware to do the garbage collection
after file-system deleted files. It could avoid slowness or
even READ-ONLY error due to SSD not being able to free the pages
even though the file system thinks there was a lot of space left.

Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
2019-04-21 14:21:16 -07:00
Ying Xie
f583f57af6
[service] add warmboot finializer service (#2715)
After warm reboot is done, we need to disable warm reboot flag and
tear down anything setup for warm reboot and persisted across.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-04-12 15:45:58 -07:00
Renuka Manavalan
6d7ecc426c [hostcfgd] -- Fix the default for failthrough as false.
This implies that by default, if TACACS is configured properly and it reported auth_err, then don't try fail through to traditional unix authentication through /etc/passwd.

If this failthrough is intended, make it explicit through "sudo config aaa authentication failthrough enable"

Removed an unused variable "aaa.fallback"

Tested manually. Note the presence of 'auth_err=die' in all cases except when failthrough is explicitly enabled.

admin@str-s6000-acs-13:~$ sudo config aaa authentication failthrough default; date
Wed Apr  3 23:05:18 UTC 2019
admin@str-s6000-acs-13:~$ ls -lrt /etc/pam.d/common-auth-sonic ; grep 123 /etc/pam.d/common-auth-sonic
-rw-r--r-- 1 root root 1316 Apr  3 23:05 /etc/pam.d/common-auth-sonic
auth    [success=done new_authtok_reqd=done default=ignore auth_err=die]        pam_tacplus.so server=100.127.20.22:49 secret=testing123 login=login timeout=5 try_first_pass
auth    [success=done new_authtok_reqd=done default=ignore auth_err=die]        pam_tacplus.so server=100.127.20.21:49 secret=testing123 login=login timeout=5 try_first_pass

admin@str-s6000-acs-13:~$ sudo config aaa authentication failthrough enable; date ; h4 "AAA|authentication"
Wed Apr  3 23:06:37 UTC 2019
admin@str-s6000-acs-13:~$ ls -lrt /etc/pam.d/common-auth-sonic ; grep 123 /etc/pam.d/common-auth-sonic
-rw-r--r-- 1 root root 1294 Apr  3 23:06 /etc/pam.d/common-auth-sonic
auth    [success=done new_authtok_reqd=done default=ignore]     pam_tacplus.so server=100.127.20.22:49 secret=testing123 login=login timeout=5 try_first_pass
auth    [success=done new_authtok_reqd=done default=ignore]     pam_tacplus.so server=100.127.20.21:49 secret=testing123 login=login timeout=5 try_first_pass

admin@str-s6000-acs-13:~$ sudo config aaa authentication failthrough disable; date ; h4 "AAA|authentication"
Wed Apr  3 23:07:09 UTC 2019
admin@str-s6000-acs-13:~$ ls -lrt /etc/pam.d/common-auth-sonic ; grep 123 /etc/pam.d/common-auth-sonic
-rw-r--r-- 1 root root 1321 Apr  3 23:07 /etc/pam.d/common-auth-sonic
auth    [success=done new_authtok_reqd=done default=ignore auth_err=die]        pam_tacplus.so server=100.127.20.22:49 secret=testing123 login=login timeout=5 try_first_pass
auth    [success=done new_authtok_reqd=done default=ignore auth_err=die]        pam_tacplus.so server=100.127.20.21:49 secret=testing123 login=login timeout=5 try_first_pass
2019-04-03 23:16:56 +00:00
Pavlo Yadvichuk
11c2e9ee3d [barefoot]: Allow configuration of platform-specific interfaces used for internal purposes (#2631)
- Why it is required
since SONiC master switches ifupdown package to the new implementation (ifupdown2), it is required to change the configuration of a platform-specific interface for wedge100bf_32x and wedge100bf_65x platforms (bc of ifupdown2 doesn't support auto mode for inet6 protocol).

Also, need to make some refactoring and remove if platform == smth then.. from the system level scripts.

- What I did

removed customization of /usr/bin/interfaces-config.sh
explicitly created directory /etc/network/interfaces.d
added "source" to the /etc/network/interfaces generation template (to include platform-specific interfaces processing)
added platform-specific interfaces config itself (for wedge100bf_32x and wedge100bf_65x)
fixed testcase in sonic-config-engine
- How to verify it

build image for wedge100bf_32x
perform sudo config reload -y on new installation
check the correct configuration of usb0 interface
- Description for the changelog

Allow configuration of platform-specific interfaces
2019-03-09 06:22:32 -08:00
RAMA CHANDRA REDDY GADDAM
b9edb7153d [aaa] Fix common-auth-sonic.j2 template issue (#2613) 2019-03-02 15:36:35 -08:00
Jipan Yang
ff74daaf13 Move warm_restart enable/disable config to stateDB WARM_RESTART_ENABLE_TABLE (#2538)
Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
2019-02-19 17:06:56 -08:00
Renuka Manavalan
fa7c46611e [hostcfgd]: Promote logs for update-notifications-from-DB from DEBUG to INFO (#2576)
* Add a log message for each notification of add/del TACACS server.

Signed-off-by: Renuka Manavalan <remanava@microsoft.com>

* Moved another syslog message from DEBUG to INFO to be able to see those notifications.

All these changes are to help with a one-time-seen-bug, that hostcfgd did not act upon changes to redis for TACACS servers. We could not repro the bug.

Signed-off-by: Renuka Manavalan <remanava@microsoft.com>
2019-02-16 10:17:13 -08:00
zhenggen-xu
982eddfaa4 [updategraph] After system upgrade, restore files/directories with original attributes etc. (#2368)
* [updategraph] After system upgrade, restore files/directories with
original attributes etc.
Restore a few more files that was missed before.
Restore FRR configuration directory if exists on old system

Signed-off-by: Zhenggen Xu <zxu@linkedin.com>

* Removed deployment_id_asn_map.yml from copy list

Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
2019-02-02 12:50:19 -08:00
Joe LeVeque
39b60d2a50 [reboot cause] Move reboot-cause files to /host directory so they persist across SONiC upgrades (#2490)
* [reboot cause] Move reboot-cause files to /host directory so they persist across SONiC upgrades

* [sonic-utilities] Update submodule to include related changes
2019-01-29 03:42:19 -08:00
Joe LeVeque
8f43cad061 [rsyslog] Suppress duplicate messages from base image and all Docker containers (#2497) 2019-01-29 03:41:40 -08:00
Joe LeVeque
116ddb996a [caclmgrd] Don't crash if we find empty/null rule_props (#2475)
* [caclmgrd] Don't crash if we find empty/null rule_props
2019-01-23 18:47:05 -08:00