Commit Graph

409 Commits

Author SHA1 Message Date
Joe LeVeque
2e43e6bc6c [caclmgrd] Fix application of IPv6 service ACL rules (part 2) (#4036) 2020-01-18 01:44:42 +00:00
Sujin Kang
956b8fd7c7 [reboot cause]: Delay process-reboot-cause service until network connection is stable (#4003) 2020-01-11 01:09:08 +00:00
yozhao101
27a2e0692b [Monit] Change the monitoring period from 120 seconds to 60 seconds. (#3974)
* [Monit] Change the monitoring period of monit from 120 seconds to 60
seconds and also at the same time double the interval for existing sonic monit config file in
host.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
2020-01-11 01:01:34 +00:00
Joe LeVeque
0eab6a4c25 [201811][apt] Instruct apt-get to NOT check the "Valid Until" date in Release files (#3975) 2020-01-08 08:34:45 -08:00
Ying Xie
5ea7372dbe
[201811][monit] address build issue: hard code ARCH to amd64 (#3982)
* [201811][monit] address build issue: hard code ARCH to amd64

- also hard code the debian package path as in 201811 branch.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2020-01-07 07:41:40 -08:00
Joe LeVeque
640023ec57 [caclmgrd] Fix application of IPv6 service ACL rules (#3917) 2020-01-06 21:04:52 +00:00
Renuka Manavalan
da7db51259 corefile uploader: Updates per review comments offline (#3915)
* Updates per review comments
1) core_uploader service waits for syslog.service
2) core_uploader service enabled for restart on failure
3) Use mtime instead of file size + ample time to be robust.

* Avoid reloading already uploaded file, by marking the names with a prefix.

* Updated failing path.
1) If rc file is missing or required data missing, it periodically logs error in forever loop.
2) If upload fails, retry every hour with a error log, forever.

* Fix few bugs

* The binary update_json.py will come from sonic-utilities.
2020-01-06 21:03:40 +00:00
Renuka Manavalan
6db0c76a06 Corefile uploader service (#3887)
* Corefile uploader service

1) A service is added to watch /var/core and upload to Azure storage
2) The service is disabled on boot. One may enable explicitly.
3) The .rc file to be updated with acct credentials and http proxy to use.
4) If service is enabled with no credentials, it would sleep, with periodic log messages
5) For any update in .rc, the service has to be restarted to take effect.

* Remove rw permission for .rc file for group & others.

* Changes per review comments.
Re-ordered .rc file per JSON.dump order.
Added a script to enable partial update of .rc, which HWProxy would use to add acct key.

* Azure storage upload requires python module futures, hence added it to install list.

* Removed trailing spaces.

* A mistake in name corrected.
Copy the .rc updater script to /usr/bin.
2020-01-06 21:02:14 +00:00
Joe LeVeque
9ee8eba77c [monit] Build from source and patch to use MemAvailable value if available on system (#3875) 2020-01-06 20:59:32 +00:00
Samuel Angebault
e9e6bc58a7 [arista] Improve platform detection mechanism (#3921)
Rely on platform= and sid= on the command line to detect the platform rather than the eeprom
The platform will now properly initialize even if the system eeprom died or is unreachable.

Add support for the 7260CX3-64E
This is a variant of the 7260CX3-64 with no real difference for software.
2019-12-18 22:46:26 -08:00
Ying Xie
9583a74b47 [swss service] flush fast-reboot enabled flag upon swss stopping (#3908)
If we need to stop swss during fast-reboot procedure on the boot up path,
it means that something went wrong, like syncd/orchagent crashed already,
we are stopping and restarting swss/syncd to re-initialize. In this case,
we should proceed as if it is a cold reboot.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-12-16 16:04:10 +00:00
Stephen Sun
49869aa6fa [process-reboot-cause]Address the issue: Incorrect reboot cause returned when warm reboot follows a hardware caused reboot (#3880)
* [process-reboot-cause]Address the issue: Incorrect reboot cause returned when warm reboot follows a hardware caused reboot
1. check whether /proc/cmdline indicates warm/fast reboot.
   if yes the software reboot cause file will be treated as the reboot cause.
   finish
2. check whether platform api returns a reboot cause.
   if yes it is treated as the reboot cause.
   finish.
3. check whether /hosts/reboot-cause contains a cause.
   if yes it is treated as the cause otherwise return unknown.

* [process-reboot-cause]Fix review comments

* [process-reboot-cause]address comments
1. use "with" statement
2. update fast/warm reboot BOOT_ARG

* [process-reboot-cause]address comments

* refactor the code flow

* Remove escape

* Remove extra ':'
2019-12-14 17:44:02 +00:00
Sujin Kang
0510fc7258 Correct the watch-control service to call the right script (#3906)
* Correct the watch-control service to call the right script

* make watchdog-control.sh executable (chmod +x)
2019-12-14 09:42:36 -08:00
Ying Xie
ca1c5bc0c4 [hostcfgd] avoid in place editing config file contents (#3904)
In place editing (sed -i) seems having some issues with filesystem
interaction. It could leave 0 size file or corrupted file behind.

It would be safer to sed the file contents into a new file and switch
new file with the old file.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-12-14 03:27:39 +00:00
Sujin Kang
aea18165a8
Add watchdog-control service to disable watchdog during bootup (#3877)
* Add watchdog-control service to disable watchdog during bootup

Disable only if it's applicable and the watchdog is enabled.

* Address the review comment

* Correct the watchdog start script name

* Change to call common watchdog api instead of platform specific

* Start watchdog control service after swss starts

* advance sonic-utility submodule
2019-12-13 12:44:11 -08:00
pavel-shirshov
b28dd1db7b [fast-reboot]: Save fast-reboot state into the db [Nov] (#3892)
- Port changes #3741
2019-12-13 06:07:13 -08:00
Ying Xie
ba88f9c0ae Revert "[swss.sh] When starting, call 'systemctl restart' on dependents, not (#3807)" (#3835)
This reverts commit 351410ea8c.
2019-12-02 23:56:04 +00:00
Joe LeVeque
3920ac2368 [services] Remove explicit dependencies from dhcp_relay service file, control in swss.sh (#3823) 2019-11-27 02:21:00 +00:00
Joe LeVeque
8e86a157ff [swss.sh] When starting, call 'systemctl restart' on dependents, not (#3807)
'systemctl start'
2019-11-24 03:26:03 +00:00
Wenda Ni
8788f4f783 cherry-picking diff between #3628 and #3561
Revert "Configure buffer profile to all ports (#3561)" (#3628)
Configure buffer profile to all ports (#3561)

This reverts commit 8861cbe98e.

Signed-off-by: Wenda Ni <wenni@microsoft.com>
2019-11-08 03:12:59 +00:00
Neetha John
6d23e4c8d7 [pfcwd]: Do not start pfc watchdog on Management Tor (#3719)
Signed-off-by: Neetha John <nejo@microsoft.com>
2019-11-07 21:41:32 +00:00
lguohan
9167f9da46 [aboot]: preserve snmp.yml and acl.json for eos to sonic fast reboot (#3716) 2019-11-07 21:40:20 +00:00
Wenda Ni
0ea82d8735 Fix syntax error for qos_config template (#3619)
Signed-off-by: Wenda Ni <wenni@microsoft.com>
2019-11-07 00:22:50 +00:00
Wenda Ni
f616cec7f4 Adopt per-port buffer and qos profile (#3542)
Signed-off-by: Wenda Ni <wenni@microsoft.com>
2019-11-07 00:21:52 +00:00
lguohan
d16dbbb1d3
[bgp]: start bgp service after interfaces-config service (#3702)
interfaces-config service configures lo address. If bgp service
starts before lo address is configured, then following config
in zebra will not be applied.

route-map RM_SET_SRC permit 10
 set src 10.1.0.32

The adds a few seconds delay in bgp service start
2019-11-04 22:09:00 -08:00
Ying Xie
f764a167ac [hostname-config] improve hostname-config process (#3676)
We noticed in tests/production that there is a low probability failure
where /etc/hosts could have some garbage characters before the entry for
local host name. The consequence is that all sudo command would be very
slow. In extreme cases it would prevent some services from starting
properly.

I suspect that the /etc/hosts file might be opened by some process causing
the issue. Editing contents with new file level and replace the whole file
should be safer.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-10-29 15:42:23 +00:00
Prabhu Sreenivasan
ff137a8e56 [baseimage]: Avoid removing localhost entry from /etc/hosts file (#2452)
- What I did
This fix removes the possibility of 'localhost' entry getting removed from /etc/hosts file by hostname-config service.

Without this change, whenever we change the hostname from 'localhost' to any other name on the config_db.json and reload the config, /etc/hosts file will only have the new hostname on it. But there are multiple sonic utilities (eg: swssconfig) which relies on the hard coded 'localhost' name and they tend to stop working.

- How I did it
Added a new check on hostname-config.sh script to avid blindly deleting the line containing the old hostname from /etc/hosts file. Now it will delete the old hostname only if its not localhost or when the hostname is not changing.

- How to verify it

Bring up SONiC on a device with hostname as localhost
Edit /etc/sonic/config_db.json to update the 'hostname' filed under DEVICE_METADATA from "hostname" : "localhost" --> "hostname" : "sonic"
run config reload -y to reflect the hostname change done on config_db.json file.
cat /etc/hosts and check whether both 127.0.0.1 localhost and 127.0.0.1 sonic entry are present on the file.
ping localhost should work fine.
- Description for the changelog
Make hostname-config service more robust in handling SONiC hostname change from localhost to anything else.
2019-10-29 15:42:04 +00:00
Danny Allen
818ab7fdaa [core_cleanup] Fix issue where core_cleanup job runs too frequently (#3659)
Signed-off-by: Danny Allen <daall@microsoft.com>
2019-10-24 17:04:16 +00:00
Ying Xie
c7a096b6b9
[201811][ntp] removed undefined filter (#3594)
pfx_filter is not defined in 201811 branch.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-10-11 19:46:14 -07:00
Nazarii Hnydyn
41ce07e75c [mellanox]: Add CPLD update for SN2700 (#3570)
* [mellanox]: Add CPLD update for SN2700.

Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>
2019-10-09 17:36:45 -07:00
pavel-shirshov
53ec9124bc [ntp]: Use loopback address when we don't have MGMT interface (#3566)
Added configuration to use Loopback ip if a switch doesn't have MGMT_PORT.
2019-10-07 16:56:00 +00:00
Ying Xie
37b78826ee [updategraph] enhance update graph handling (#3549)
- after reloading minigraph, write latest version string in the DB.
- if old config_db.json file exists, use it and migrate to latest version.
- only reload minigraph when config_db.json doesn't exist and minigraph
  exists.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-10-02 21:04:39 +00:00
Ying Xie
e4f8a3946c [first boot] sync file system after moving/copying files (#3550)
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-10-02 21:04:39 +00:00
Renuka Manavalan
0493b6274e
Map /src & /debug for debug docker containers (#3470)
* Add debug docker for SNMP.

* Removed a redundant install of debug packages.
Propagate the debug flag to template file to mount /dbg & /src to debug containers.

* Revert the last change to retain the original
2019-09-19 09:09:25 -07:00
Prince Sunny
4ef5ce74e4 Install Iptables rules to set TCPMSS for 'lo' interface (#3452)
* Install Iptables rules to set TCPMSS for lo interface
* Moved implementation to hostcfgd to maintain at one place
2019-09-19 01:08:44 +00:00
Danny Allen
ba77de12ac [cron.d] Add cron job to periodically clean-up core files (#3449)
* [cron.d] Create cron job to periodically clean-up core files
* Create script to scan /var/core and clean-up older core files
* Create cron job to run clean-up script

Signed-off-by: Danny Allen <daall@microsoft.com>

* Update interval for running cron job

* Respond to feedback

* Change syslog id
2019-09-13 17:52:10 +00:00
lguohan
87cb1e307e [baseimage]: fix monit configuration (#3448)
- monit config broke by one monit upgrade
- abandon sed approach since it is suspestible to monit config changes
- use unixsocket instead of httpd due to a bug in 5.20.0
2019-09-13 06:08:30 +00:00
sridhar-ravindran
d4758afdde [DELL] S6100 Add PowerCycle Support for Last Reset Reason (#3402)
* [DELL] S6100 Add PowerCycle Support for Last Reset Reason

* handle first time boot properly

* S6000 Last Reboot Reason Fix
2019-09-09 22:33:32 -07:00
Danny Allen
541208fca2 [build_debian] Include checksum of ASIC config files in SONiC filesystem (#3384)
[build_debian] Generate checksum of ASIC config files

* Adds script to generate checksums for ASIC config files
* Adds step to build_debian that copies ASIC config checksum into SONiC filesystem

Signed-off-by: Danny Allen daall@microsoft.com
2019-09-09 18:53:15 +00:00
Joe LeVeque
aee7d86fc9 [201811] Log message containing SONiC version to syslog at boot (#3417) 2019-09-08 12:33:08 -07:00
pavel-shirshov
b715ec89c4 [Fast-Reboot]: FR mode is active only first 3 minutes after start. (#3352)
* Fast reboot mode should be enabled only 3 minutes after restart

* Advance sonic-quagga submodule
2019-08-21 21:48:33 +00:00
Ying Xie
d821cb84b8 [radv service] radv service should be a cold only dependent of swss (#3348)
radv should be left alone during warm restart of swss. Otherwise it will
announce departure and cause hosts to lose default gateway.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-08-16 19:46:37 +00:00
Ying Xie
2b8eca5ebb [control plane assistant] stop control plane assistant after warm reboot (#3337)
Delay saving configuration so that the control assistant configurations
won't be persisted.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-08-15 20:28:42 +00:00
Renuka Manavalan
b80d60c277 Fix to ensure that tacacs servers are ordered (reverse) by priority in pam.d's config. (#3322)
Present: Servers are listed in the same order as in redis-db
Fix: Save the sort o/p, hence use sorted list to write into pam.d's conf.
     As well convert priority to integer for use by sort.
2019-08-14 21:20:01 +00:00
Ying Xie
a41d9a5d3f [service dependent] describe non-warm-reboot dependency outside systemd (#3311)
* [service dependent] describe non-warm-reboot dependency outside systemctl

When dependency was described with systemctl, it will kick in all the time,
including under warm reboot/restart scenarios. This is not what we always
want. For components that are capable of warm reboot/start, they need to
describe dependency in service files.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>

* [service] teamd service should not require swss service

Adding require swss will cause teamd to be killed by systemctl when swss
stops. This is not what we want in warm reboot.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>

* refactoring code

* rename functions to match other functions in the file
2019-08-08 22:46:06 +00:00
lguohan
36c9d99434 [build]: enable docker in ram option for small disk device (#3279)
when device disk is small, do not unzip dockerfs.tar.gz on disk.
keep the tar file on the disk, unzip to tmpfs in the initrd phase.

enabled this for 7050-qx32

Signed-off-by: Guohan Lu <gulv@microsoft.com>
2019-08-07 06:07:46 +00:00
Joe LeVeque
da57e8db36 Revert back to 'import sonic_platform' (#3249) 2019-07-31 16:44:17 -07:00
Joe LeVeque
29bbd86862
[services] Restart SwSS service upon unexpected critical process exit (#2845) (#2852) 2019-07-29 18:10:26 -07:00
Ying Xie
7cf90ec441 [warm reboot] save configuration after warm reboot (#3200)
* [warm reboot] save configuration after warm reboot

After warm reboot, save a copy of in memory database to config_db.json,
upgrade procedure might have removed config_db.json to force new image
to reload minigraph. However, reload minigraph is skipped during warm
reboot. Missing config_db.json would cause device to fault in next
non-upgrading cold/fast reboot.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>

* Update finalize-warmboot.sh
2019-07-24 17:45:07 +00:00
Stephen Sun
7a9d04ee73 [Mellanox] Backporting reboot cause to 201811 (#3198)
* backport new platform api to 201811, reboot cause part

* install new platform api on host

* 1. remove chassis's dependency on sonic_platform_daemon.
2. add some mellanox-specific hardware reboot causes.
3. fix typo in files/image_config/process-reboot-cause/process-reboot-cause.

* 1. add dependency of sonic_platform for base image
2. handle the case of reboot cause file not found

* adjust log message.
2019-07-23 07:05:35 -07:00