Commit Graph

363 Commits

Author SHA1 Message Date
Joe LeVeque
da57e8db36 Revert back to 'import sonic_platform' (#3249) 2019-07-31 16:44:17 -07:00
Joe LeVeque
29bbd86862
[services] Restart SwSS service upon unexpected critical process exit (#2845) (#2852) 2019-07-29 18:10:26 -07:00
Ying Xie
7cf90ec441 [warm reboot] save configuration after warm reboot (#3200)
* [warm reboot] save configuration after warm reboot

After warm reboot, save a copy of in memory database to config_db.json,
upgrade procedure might have removed config_db.json to force new image
to reload minigraph. However, reload minigraph is skipped during warm
reboot. Missing config_db.json would cause device to fault in next
non-upgrading cold/fast reboot.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>

* Update finalize-warmboot.sh
2019-07-24 17:45:07 +00:00
Stephen Sun
7a9d04ee73 [Mellanox] Backporting reboot cause to 201811 (#3198)
* backport new platform api to 201811, reboot cause part

* install new platform api on host

* 1. remove chassis's dependency on sonic_platform_daemon.
2. add some mellanox-specific hardware reboot causes.
3. fix typo in files/image_config/process-reboot-cause/process-reboot-cause.

* 1. add dependency of sonic_platform for base image
2. handle the case of reboot cause file not found

* adjust log message.
2019-07-23 07:05:35 -07:00
Ying Xie
f1478818a1 Revert "[database] save configuration after DB migration (#3143)" (#3199)
This reverts commit b5a4527cb0.
2019-07-23 01:59:46 +00:00
zzhiyuan
0869fd3925 [baseimage]: Fix process-reboot-cause possibly throwing OSError (#3159)
In case of going from previous iteration of SONiC, and the last reboot
was hardware, REBOOT_CAUSE_FILE may not be present and the service may
throw an error.
2019-07-16 21:38:46 +00:00
lguohan
094f7ed9e0
Merge pull request #3015 : add kvm image support for all skus
[kvm]: add kvm image support for all skus
2019-07-16 08:26:29 -07:00
Ying Xie
a79dd716c4 [database] save configuration after DB migration (#3143)
- Make sure that migrated DB contents persisted for next boot
- Make sure that db saved after warm reboot.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-07-16 03:54:14 +00:00
lguohan
6b42f753c6 [vs]: Force10-S6000 buffer settings for virtual switch (#2515)
Signed-off-by: Guohan Lu <gulv@microsoft.com>
2019-07-13 19:49:50 +00:00
Joe LeVeque
c3932e501b [process-reboot-cause] Handle case if platform does not yet have sonic_platform implementation (#3126) 2019-07-10 23:06:43 +00:00
Stepan Blyshchak
4b5abd048b [swss.sh]: Cleanup LAG entries in STATE DB (#3114)
Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
2019-07-10 23:04:33 +00:00
Joe LeVeque
1115c8431d [reboot-cause]: Move reboot cause processing to its own service, 'process-reboot-cause' (#3102) 2019-07-10 23:02:57 +00:00
Stepan Blyshchak
c932302892 fix fast reboot compatibility (#3083)
* fix fast reboot compatibility

We should handle both cases for backward-compatible with 201803:
 - fast-reboot
 - SONIC_BOOT_TYPE=fast-reboot

* handle review comments
* add a comment that getBootType code snippet is shared between two files
2019-07-10 22:53:47 +00:00
Qi Luo
588c687a27
[fast-reboot] fix fast reboot compatibility (#3083) and advance sai-redis/201811 point (#3089)
* fix fast reboot compatibility (#3083) and advance sai-redis/201811 point
* Repoint the submodule
2019-06-26 22:02:21 -07:00
Qi Luo
0ea679e297
[submodule] update sonic-linux-kernel (#3038)
* [submodule] update sonic-linux-kernel (#2985)
* Fix many version strings
* Update minor version
* Update arista-drivers submodule (#9)
* Rebuild SDK on new kernel (#10)
2019-06-20 21:21:36 -07:00
Joe LeVeque
02fc1306b0 [baseimage]: Increase TMOUT for serial port connections to 15 minutes (#3032)
Increase TMOUT value in order to close inactive serial console connections after 900 seconds (15 minutes) of inactivity
2019-06-19 19:07:36 +00:00
Joe LeVeque
8ae67c4c5d [logrotate] Enhance robustness (#2942)
* [logrotate] Decrease frequency to every 10 minutes; kill any lingering logrotate processes

* [logrotate] Delete all *.1.gz files as firstaction; Remove note about init-system-helpers < 1.47 workaround

However, continue to send SIGHUP directly to rsyslogd process
because 'service rsyslog rotate' still doesn't work properly with
init-system-helpers version 1.48
2019-05-29 00:53:13 +00:00
Stepan Blyshchak
fae35536c3 [swss.sh] flush FDB table during cold start (#2933)
Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
2019-05-29 00:51:09 +00:00
Ying Xie
5975a9c25b [updategraph] set DB version after minigraph reload (#2917)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-05-20 19:05:29 +00:00
Renuka Manavalan
238db1e06a [tacacs]: skip accessing tacacs servers for local non-tacacs users (#2843)
* Switch the nss look up order as "compat" followed by "tacplus".
This helps use the legacy passwd file for user info and go to tacacs only if not found.
This means, we never contact tacacs for local users like "admin".
This isolates local users from any issues with tacacs servers.
W/o this fix, the sudo commands by local users could take <count of servers> * <tacacs timeout> seconds, if the tacacs servers are unreachable.

* Skip tacacs server access for local non-tacacs users.
Revert the order of 'compat tacplus' to original 'tacplus compat' as tacplus
access is required for all tacacs users, who also get created locally.
2019-05-20 18:59:26 +00:00
Ying Xie
dc2fb747a5 [ebtables] install ebtables in base image and install filter rules
- Add ebtables package, and install some filter rules:
  1. ebtables -A FORWARD -d BGA -j DROP
  2. ebtables -A FORWARD -p ARP -j DROP

Basically, we let the ARP packets in the VLAN being forwarded by the ASIC,
kernel gets a copy of these ARP packets and the forwarding from Kenerl gets
dropped. So there is always only one copy of ARP/response in the VLAN.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-05-06 22:13:03 +00:00
Joe LeVeque
cc90d7f5ee [sudoers] Add /usr/bin/teamshow to READ_ONLY_CMDS (#2846) 2019-05-01 15:51:13 +00:00
Ying Xie
3b02eec933 [db migrator] migrate the DB to latest schema when needed (#2808)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-04-30 23:43:52 +00:00
Qi Luo
dd31c2d84a Remove unused packages in docker images and host (#2807)
* Remove unneeded packages in docker images and host
* Remove libpython3.6 from snmp docker image
2019-04-30 19:12:00 +00:00
Ying Xie
edc8685e1e [teamd service] start teamd service after swss (#2829)
SWSS clears DB tables, if teamd is not started after swss, there is a
race condition that swss might clear vital teamd information.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-04-26 22:14:14 +00:00
Andriy Moroz
5004d2b4fe Increase syncd start timeout (#2776)
* Increase syncd start timeout

Signed-off-by: Andriy Moroz <c_andriym@mellanox.com>

* Replace TimeoutSec to TimeoutStartSec

Signed-off-by: Andriy Moroz <c_andriym@mellanox.com>
2019-04-26 15:27:11 +00:00
Stepan Blyshchak
08fed3c125 [snmp.service] Make swss.service a requisite (#2790) 2019-04-18 16:55:55 +00:00
Renuka Manavalan
6c1a0ce58c [hostcfgd] -- Fix the default for failthrough as false.
This implies that by default, if TACACS is configured properly and it reported auth_err, then don't try fail through to traditional unix authentication through /etc/passwd.

If this failthrough is intended, make it explicit through "sudo config aaa authentication failthrough enable"

Removed an unused variable "aaa.fallback"

Tested manually. Note the presence of 'auth_err=die' in all cases except when failthrough is explicitly enabled.

admin@str-s6000-acs-13:~$ sudo config aaa authentication failthrough default; date
Wed Apr  3 23:05:18 UTC 2019
admin@str-s6000-acs-13:~$ ls -lrt /etc/pam.d/common-auth-sonic ; grep 123 /etc/pam.d/common-auth-sonic
-rw-r--r-- 1 root root 1316 Apr  3 23:05 /etc/pam.d/common-auth-sonic
auth    [success=done new_authtok_reqd=done default=ignore auth_err=die]        pam_tacplus.so server=100.127.20.22:49 secret=testing123 login=login timeout=5 try_first_pass
auth    [success=done new_authtok_reqd=done default=ignore auth_err=die]        pam_tacplus.so server=100.127.20.21:49 secret=testing123 login=login timeout=5 try_first_pass

admin@str-s6000-acs-13:~$ sudo config aaa authentication failthrough enable; date ; h4 "AAA|authentication"
Wed Apr  3 23:06:37 UTC 2019
admin@str-s6000-acs-13:~$ ls -lrt /etc/pam.d/common-auth-sonic ; grep 123 /etc/pam.d/common-auth-sonic
-rw-r--r-- 1 root root 1294 Apr  3 23:06 /etc/pam.d/common-auth-sonic
auth    [success=done new_authtok_reqd=done default=ignore]     pam_tacplus.so server=100.127.20.22:49 secret=testing123 login=login timeout=5 try_first_pass
auth    [success=done new_authtok_reqd=done default=ignore]     pam_tacplus.so server=100.127.20.21:49 secret=testing123 login=login timeout=5 try_first_pass

admin@str-s6000-acs-13:~$ sudo config aaa authentication failthrough disable; date ; h4 "AAA|authentication"
Wed Apr  3 23:07:09 UTC 2019
admin@str-s6000-acs-13:~$ ls -lrt /etc/pam.d/common-auth-sonic ; grep 123 /etc/pam.d/common-auth-sonic
-rw-r--r-- 1 root root 1321 Apr  3 23:07 /etc/pam.d/common-auth-sonic
auth    [success=done new_authtok_reqd=done default=ignore auth_err=die]        pam_tacplus.so server=100.127.20.22:49 secret=testing123 login=login timeout=5 try_first_pass
auth    [success=done new_authtok_reqd=done default=ignore auth_err=die]        pam_tacplus.so server=100.127.20.21:49 secret=testing123 login=login timeout=5 try_first_pass
2019-04-08 23:41:51 +00:00
Ying Xie
4eaa4dabff Revert "[teamd service] teamd service should start after syncd (#2724)" (#2733)
This reverts commit 0d1efb131c.
2019-04-04 15:22:44 +00:00
paavaanan
27f1aa7e09 removing dhcp- turn- off option from initrd (#2555)
* removing dhcp changes from initrd

* removing mgmt-intf-dhcp file
2019-04-04 15:22:14 +00:00
Ying Xie
13a643bb3e [teamd service] teamd service should start after syncd (#2724)
* [teamd service] teamd service should start after syncd

Signed-off-by: Ying Xie <ying.xie@microsoft.com>

* combine after lines
2019-04-01 22:47:47 +00:00
Ying Xie
681e34a2b1
[service] add warmboot finializer service (#2725)
After warm reboot is done, we need to disable warm reboot flag and
tear down anything setup for warm reboot and persisted across.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-04-01 14:16:31 -07:00
Qi Luo
3d8d4aeef0 [security] Do not generate ssh server keys for non RSA protocols (#2718) 2019-03-29 22:37:47 +00:00
Ying Xie
f29e6230e5 [docker script] skip docker mount point checking for database container (#2683)
database container doesn't mount hwsku folder.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-03-22 15:53:41 +00:00
Joe LeVeque
ecec579933 [services] Services which start containers now use 'docker wait' instead of 'docker attach' (#2661) 2019-03-19 03:05:37 +00:00
Wenda Ni
f720c2a9a3 [qos]: Map tc 1, 2, 5, and 6 back to pg 0 (#2650)
Lossy traffic does not need to be mapped to different ingress PGs. They can all share the same ingress PG.

Signed-off-by: Wenda Ni <wenni@microsoft.com>
2019-03-19 03:04:46 +00:00
Nadiia Stetskovych
4998609c2f [minigraph]: Do not fail for minigraphs which do not have neighbors listed in <Devices> section (#2522)
Signed-off-by: Nadiya.Stetskovych <nstetskovych@barefootnetworks.com>
2019-03-19 03:02:33 +00:00
Wenda Ni
0b13c45774 Add hook to allow customizing link cable lengths
Signed-off-by: Wenda Ni <wenni@microsoft.com>
2019-03-07 03:32:56 +00:00
Ying Xie
deab95cff6 [swss/syncd] cold start syncd service in swss in attach method (#2639)
start() is called by service startPre method, which is blocking. Starting
syncd service here is causing deadlock.

attach() is called by service start method, which is non-blocking.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-03-07 03:30:34 +00:00
Joe LeVeque
c6ccb80803 [services] Ensure swss and syncd services start before dependent services (#2634)
* [services] Ensure swss and syncd services start before dependent services

* Add 'attach' functions to scripts which get installed to /usr/local/bin so that services only reference the one script each

* Add 'After=swss.service' to syncd.service
2019-03-07 03:23:13 +00:00
Ying Xie
d5250ad4b4 Revert "[baseimage] Delay ntp-config service to start after 5 minutes (#2494)" (#2590)
This reverts commit 33fe8d298e.
2019-02-21 18:28:04 +00:00
lguohan
c5b0c59b78 [swss]: flush asic db in swss.sh for non warm-boot (#2582)
need to flush asic db in swss.sh instead of syncd.sh

orchagent might already started in swss.sh and put commands
into asic db before asic db is flushed in syncd.sh. This
causes race condition such as INIT_VIEW not passing to syncd.

Signed-off-by: Guohan Lu <gulv@microsoft.com>
2019-02-21 18:23:58 +00:00
Renuka Manavalan
def2780f18 [hostcfgd]: Promote logs for update-notifications-from-DB from DEBUG to INFO (#2576)
* Add a log message for each notification of add/del TACACS server.

Signed-off-by: Renuka Manavalan <remanava@microsoft.com>

* Moved another syslog message from DEBUG to INFO to be able to see those notifications.

All these changes are to help with a one-time-seen-bug, that hostcfgd did not act upon changes to redis for TACACS servers. We could not repro the bug.

Signed-off-by: Renuka Manavalan <remanava@microsoft.com>
2019-02-21 18:14:04 +00:00
Stepan Blyshchak
e5daf216fd [syncd.sh] Don't stop sxdkernel during warm shutdown on Mellanox platform (#2572)
/etc/init.d/sxdkernel stop may take up to 15 sec which has impact on
control plane downtime

Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
2019-02-21 18:11:37 +00:00
Ying Xie
4faa5f2f92
[warm boot] cherry-pick PR #2538 and advance related sub-modules (#2569)
PR#2538 cannot merge due to master branch status. It has been tested
against 201811 branch.

Submodule src/sonic-sairedis 21f4a49..d57222a:
  > Add more specific logic for ingress ACL and buffer profile (#421)
  > Move warm_restart enable/disable config to stateDB WARM_RESTART_ENABLE_TABLE (#418)
  > Add support for vlan tagged frames in virtual switch (#417)

Submodule src/sonic-swss 1590030..584490c:
  > Move warm_restart enable/disable config to stateDB WARM_RESTART_ENABLE_TABLE (#786)
  > [vstest]: Potential fix for timing issue in warm_reboot's routing UT (#788)

Submodule src/sonic-swss-common 594f4e8..286ef34:
  > Move warm_restart enable/disable config to stateDB WARM_RESTART_ENABLE_TABLE (#260)

Submodule src/sonic-utilities c6666e2..b44b462:
  > Move warm_restart enable/disable config to stateDB WARM_RESTART_ENABL… (#458)
  > [aclshow] output only counters per table/rule (#442)

Signed-off-by: Ying Xie <ying.xie@microsoft.com>

[PR 2538] Move warm_restart enable/disable config to stateDB WARM_RESTART_ENABLE_TABLE

Signed-off-by: Jipan Yang <jipan.yang@alibaba-inc.com>
2019-02-14 12:12:55 -08:00
Ying Xie
24bce77def [swss/syncd] log swss/syncd service script activities (#2545)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-02-14 17:04:21 +00:00
Wenda Ni
b1bdecb1c0 [QoS]: Link pg 2 and 6 to lossy buffer profile (#2511)
* Link pg 2 and 6 to lossy buffer profile

Signed-off-by: Wenda <wenni@microsoft.com>
2019-02-03 04:41:19 +00:00
zhenggen-xu
4a24103206 [updategraph] After system upgrade, restore files/directories with original attributes etc. (#2368)
* [updategraph] After system upgrade, restore files/directories with
original attributes etc.
Restore a few more files that was missed before.
Restore FRR configuration directory if exists on old system

Signed-off-by: Zhenggen Xu <zxu@linkedin.com>

* Removed deployment_id_asn_map.yml from copy list

Signed-off-by: Zhenggen Xu <zxu@linkedin.com>
2019-02-02 20:54:58 +00:00
Prince Sunny
e9125b944a [swss]: Change VrfMgrd startup order, cleanup VRF_TABLE from state DB (#2510) 2019-02-02 19:39:42 +00:00
Joe LeVeque
f167e670fd [baseimage] Delay ntp-config service to start after 5 minutes (#2494) 2019-02-02 19:35:27 +00:00