Commit Graph

2581 Commits

Author SHA1 Message Date
Renuka Manavalan
da7db51259 corefile uploader: Updates per review comments offline (#3915)
* Updates per review comments
1) core_uploader service waits for syslog.service
2) core_uploader service enabled for restart on failure
3) Use mtime instead of file size + ample time to be robust.

* Avoid reloading already uploaded file, by marking the names with a prefix.

* Updated failing path.
1) If the rc file or required data is missing, it periodically logs an error in a forever loop.
2) If upload fails, retry every hour with an error log, forever.

* Fix a few bugs

* The binary update_json.py will come from sonic-utilities.
2020-01-06 21:03:40 +00:00
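The mtime check and the uploaded-file prefix described in this commit might look roughly like the sketch below. It is not the actual core_uploader code: the prefix string, the age threshold, and both function names are assumptions made for illustration.

```python
import os
import time

UPLOAD_PREFIX = "UPLOADED_"   # hypothetical marker prefix, not taken from the commit
MIN_AGE_SECONDS = 60          # hypothetical "ample time" threshold


def pending_core_files(core_dir, now=None):
    """Return core files that look complete and have not been uploaded yet."""
    now = time.time() if now is None else now
    pending = []
    for name in sorted(os.listdir(core_dir)):
        path = os.path.join(core_dir, name)
        # Skip anything already marked as uploaded, and anything not a file.
        if name.startswith(UPLOAD_PREFIX) or not os.path.isfile(path):
            continue
        # An mtime that has not changed for a while suggests the dump is done.
        if now - os.path.getmtime(path) >= MIN_AGE_SECONDS:
            pending.append(path)
    return pending


def mark_uploaded(path):
    """Rename the file with the prefix so the next scan skips it."""
    directory, name = os.path.split(path)
    marked = os.path.join(directory, UPLOAD_PREFIX + name)
    os.rename(path, marked)
    return marked
```

A service loop would call pending_core_files() periodically, upload each returned path, and call mark_uploaded() only after the upload succeeds.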
Renuka Manavalan
6db0c76a06 Corefile uploader service (#3887)
* Corefile uploader service

1) A service is added to watch /var/core and upload to Azure storage
2) The service is disabled on boot. One may enable explicitly.
3) The .rc file to be updated with acct credentials and http proxy to use.
4) If service is enabled with no credentials, it would sleep, with periodic log messages
5) For any update in .rc, the service has to be restarted to take effect.

* Remove rw permission for .rc file for group & others.

* Changes per review comments.
Re-ordered .rc file per JSON.dump order.
Added a script to enable partial update of .rc, which HWProxy would use to add acct key.

* Azure storage upload requires python module futures, hence added it to install list.

* Removed trailing spaces.

* A mistake in name corrected.
Copy the .rc updater script to /usr/bin.
2020-01-06 21:02:14 +00:00
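A partial update of a JSON .rc file, with permissions restricted to the owner, could be sketched as follows. This is only an illustration under assumptions: the function names and the key layout used in the usage note are hypothetical and are not the layout of the real .rc file or of the updater script.

```python
import json
import os


def merge(base, update):
    """Recursively copy keys from `update` into `base`, keeping other keys."""
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            merge(base[key], value)
        else:
            base[key] = value
    return base


def update_rc(path, update):
    """Apply a partial update to a JSON rc file and keep it owner-only."""
    with open(path) as f:
        data = json.load(f)
    merge(data, update)
    with open(path, "w") as f:
        json.dump(data, f, indent=4)
    # Remove all access for group and others, as the commit describes.
    os.chmod(path, 0o600)
```

For example, update_rc(path, {"storage": {"key": "..."}}) would set only a (hypothetical) nested key and leave every other entry untouched. As the commit notes, the service has to be restarted for a change to take effect.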
Joe LeVeque
9ee8eba77c [monit] Build from source and patch to use MemAvailable value if available on system (#3875) 2020-01-06 20:59:32 +00:00
Sudharsan D.G
7271f9d17c [devices]: Poller to detect Intel Rangeley LPC failure for dell z9100/s6100 (#3065)
- What I did
Added a daemon to log LPC bus degradation on Intel C2000 processors. Intel Rangeley C2000 processors with revision less than or equal to 2 have an issue where the LPC bus degrades over time in some processors. To identify and report the problem, a daemon has been added that logs when it encounters the issue.

- How I did it
Added a daemon which validates the CPLD scratch (0x102) and SMF scratch (0x202) registers by writing and reading values at regular polling intervals (300 seconds). If there is a discrepancy between the value written and the value read, a critical log is emitted.

- How to verify it
The infra is verified by simulating the issue: the value in the register is modified between the write and the read, and the appearance of the log is checked.

- Description for the changelog

Added a daemon to identify the LPC bus degradation issue and report it through syslog on Dell S6100 and Z9100 platforms. This daemon will only run on processors with revision less than or equal to 2.
2020-01-06 18:58:18 +00:00
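The write-then-read-back check this commit describes can be sketched as below. Register access is abstracted behind caller-supplied functions, because the real daemon's I/O code is not shown in the log. The test patterns, the function name, and the log format are assumptions.

```python
POLL_INTERVAL_SECONDS = 300      # polling interval stated in the commit message
TEST_PATTERNS = (0x55, 0xAA)     # hypothetical test values


def check_registers(registers, log_critical, patterns=TEST_PATTERNS):
    """Write each pattern to each scratch register and read it back.

    `registers` maps a name to a (write_fn, read_fn) pair. Returns the names
    of the registers whose read-back value differed from the value written.
    """
    failed = []
    for name, (write_reg, read_reg) in registers.items():
        for value in patterns:
            write_reg(value)
            read_back = read_reg()
            if read_back != value:
                log_critical("%s: wrote 0x%02x, read 0x%02x"
                             % (name, value, read_back))
                failed.append(name)
                break  # one mismatch is enough to flag this register
    return failed
```

A daemon built on this would call check_registers() once every POLL_INTERVAL_SECONDS, passing accessors for the CPLD scratch and SMF scratch registers and a function that writes a critical message to syslog.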
paavaanan
1f210771d1 [devices]: DellEMC S6000 PSU Temperature (#3954) 2019-12-31 17:22:20 -08:00
Samuel Angebault
e9e6bc58a7 [arista] Improve platform detection mechanism (#3921)
Rely on platform= and sid= on the command line to detect the platform rather than the eeprom
The platform will now properly initialize even if the system eeprom died or is unreachable.

Add support for the 7260CX3-64E
This is a variant of the 7260CX3-64 with no real difference for software.
2019-12-18 22:46:26 -08:00
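Reading platform= and sid= from the kernel command line, rather than from the EEPROM, amounts to a simple key=value parse. The sketch below assumes the command line has already been read into a string (on Linux it is available in /proc/cmdline). The function names are hypothetical and this is not the Arista driver code.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a dict of key=value parameters."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore bare flags such as "quiet"
            params[key] = value
    return params


def detect_platform(cmdline):
    """Return the (platform, sid) pair, with None for a missing entry."""
    params = parse_cmdline(cmdline)
    return params.get("platform"), params.get("sid")
```

Because nothing here touches the system EEPROM, detection still works when the EEPROM is dead or unreachable, which is the behavior the commit is after.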
Ying Xie
9583a74b47 [swss service] flush fast-reboot enabled flag upon swss stopping (#3908)
If we need to stop swss during the fast-reboot procedure on the boot-up path,
it means that something went wrong (for example, syncd/orchagent already
crashed) and we are stopping and restarting swss/syncd to re-initialize. In
this case, we should proceed as if it is a cold reboot.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-12-16 16:04:10 +00:00
Stephen Sun
49869aa6fa [process-reboot-cause]Address the issue: Incorrect reboot cause returned when warm reboot follows a hardware caused reboot (#3880)
* [process-reboot-cause]Address the issue: Incorrect reboot cause returned when warm reboot follows a hardware caused reboot
1. check whether /proc/cmdline indicates warm/fast reboot.
   if yes the software reboot cause file will be treated as the reboot cause.
   finish
2. check whether platform api returns a reboot cause.
   if yes it is treated as the reboot cause.
   finish.
3. check whether /hosts/reboot-cause contains a cause.
   if yes it is treated as the cause otherwise return unknown.

* [process-reboot-cause]Fix review comments

* [process-reboot-cause]address comments
1. use "with" statement
2. update fast/warm reboot BOOT_ARG

* [process-reboot-cause]address comments

* refactor the code flow

* Remove escape

* Remove extra ':'
2019-12-14 17:44:02 +00:00
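The three-step decision order from this commit can be written as a small pure function. The sketch below is not the process-reboot-cause script: the boot-type markers are hypothetical (the commit mentions a BOOT_ARG but the log does not give its value), and falling back to "Unknown" when the software cause is empty on a warm/fast boot is an assumption.

```python
UNKNOWN = "Unknown"
# Hypothetical markers; the real BOOT_ARG values are not shown in this log.
WARM_FAST_MARKERS = ("SONIC_BOOT_TYPE=warm", "SONIC_BOOT_TYPE=fast")


def is_warm_or_fast_boot(cmdline, markers=WARM_FAST_MARKERS):
    """True if any marker appears as a whole token on the command line."""
    tokens = cmdline.split()
    return any(marker in tokens for marker in markers)


def determine_reboot_cause(cmdline, software_cause, platform_cause):
    """Pick the reboot cause following the order described in the commit."""
    # 1. Warm/fast reboot: the software reboot-cause file wins.
    if is_warm_or_fast_boot(cmdline):
        return software_cause or UNKNOWN
    # 2. Otherwise prefer a cause reported by the platform API.
    if platform_cause:
        return platform_cause
    # 3. Otherwise fall back to the software cause file, else unknown.
    return software_cause or UNKNOWN
```

Step 1 is what fixes the reported bug: a stale hardware cause from an earlier reboot no longer overrides the software cause recorded for the warm reboot.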
Sujin Kang
0510fc7258 Correct the watch-control service to call the right script (#3906)
* Correct the watch-control service to call the right script

* make watchdog-control.sh executable (chmod +x)
2019-12-14 09:42:36 -08:00
Ying Xie
ca1c5bc0c4 [hostcfgd] avoid in place editing config file contents (#3904)
In-place editing (sed -i) seems to have some issues with filesystem
interaction. It could leave a zero-size or corrupted file behind.

It would be safer to sed the file contents into a new file and then swap
the new file in for the old one.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-12-14 03:27:39 +00:00
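The write-to-a-new-file-then-swap approach can be illustrated in Python as follows. The actual hostcfgd change uses sed, so this is a sketch of the same idea, not the committed code. The function name is hypothetical.

```python
import os
import re
import shutil
import tempfile


def rewrite_file(path, pattern, replacement):
    """Apply a regex substitution by writing a new file and swapping it in."""
    with open(path) as src:
        new_contents = re.sub(pattern, replacement, src.read(),
                              flags=re.MULTILINE)
    # Create the temporary file in the same directory so the final rename
    # stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(new_contents)
            tmp.flush()
            os.fsync(tmp.fileno())      # push the data to disk before the swap
        shutil.copymode(path, tmp_path)  # keep the original permission bits
        os.replace(tmp_path, path)       # swap the new file in for the old one
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If anything fails before the final os.replace() call, the original file is left untouched and only the temporary file is discarded.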
pavel-shirshov
a43425794f [lldpd]: Port a few fixes from lldpd master (#3889)
* lldpctl: put a lock around some commands to avoid race conditions

* Read all notifications in lldpctl_recv

* lib: fix memory leak

* lib: fix memory leak when handling I/O

* Update series
2019-12-14 01:05:13 +00:00
paavaanan
848c5961f8 DellEMC S6000 sensor.conf update (#3870) 2019-12-13 15:01:20 -08:00
Sujin Kang
aea18165a8 Add watchdog-control service to disable watchdog during bootup (#3877)
* Add watchdog-control service to disable watchdog during bootup

Disable only if it's applicable and the watchdog is enabled.

* Address the review comment

* Correct the watchdog start script name

* Change to call common watchdog api instead of platform specific

* Start watchdog control service after swss starts

* advance sonic-utility submodule
2019-12-13 12:44:11 -08:00
Volodymyr Samotiy
a26809a223 [Mellanox]: Update SAI pointer (#3884)
Signed-off-by: Volodymyr Samotiy <volodymyrs@mellanox.com>
2019-12-13 11:29:26 -08:00
Ying Xie
06c69ee75e [201811][swss] advance swss submodule head (#3897)
Submodule src/sonic-swss 8ef513c..f6bfe77:
  > [aclorch] Enable DSCP rules on IPv6 mirror tables (#1146)

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-12-13 10:37:51 -08:00
Qi Luo
4becd5299a Update submodule: sonic-snmpagent (#3894) 2019-12-13 09:04:29 -08:00
pavel-shirshov
b28dd1db7b [fast-reboot]: Save fast-reboot state into the db [Nov] (#3892)
- Port changes #3741
2019-12-13 06:07:13 -08:00
Ying Xie
68f3b95505 [201811][utilities] advance utilities submodule head (#3876)
Submodule src/sonic-utilities ae274e5..8237848:
  > [fast/warm reboot] ignore errors after shutting down critical service(s) (#761)
  > [neighbor advertiser] raise exception when http endpoint return failure (#758)

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-12-12 14:20:09 -08:00
Joe LeVeque
5615991822 Revert "[dhcp_relay] Add extra sleep before starting relay agent processes (#3824)" (#3857)
This reverts commit 7622a30d98.
2019-12-12 00:16:09 +00:00
paavaanan
11920b37a0 [devices]: DellEMC S6100 Watchdog support (#2835) 2019-12-11 17:45:34 +00:00
Joe LeVeque
4efaeef31c [isc-dhcp-relay] Patch to allow DHCP relay to discover interfaces even if they are down (#3852)
Patch isc-dhcp-relay in order to allow the relay agent to discover configured interfaces even if they are down.

Without this patch, the relay agent will not discover configured interfaces if they are down when the relay agent starts up. If the interface(s) are then brought up after the relay has started, the relay will discard packets received on these interfaces and log the message "Discarding packet received on <iface_name> interface that has no IPv4 address assigned." This led to race conditions when starting SONiC (or loading configuration). To resolve this, the relay agent would need to be restarted with all configured interfaces up.

With this patch, the relay agent will discover all configured interfaces, whether or not they are up at the time the relay agent starts. Thus, the state of the configured interfaces can be down when the relay agent starts and brought up during the lifetime of the relay agent process, and the relay agent will relay packets as expected; it will not discard them.
2019-12-07 11:27:22 -08:00
Renuka Manavalan
92df547d83 Build debug docker for fpm-quagga. (#3855) 2019-12-06 20:51:46 -08:00
Renuka Manavalan
d087306411 Added debug symbol to dhcp-relay. (#3850)
* Added debug symbol to dhcp-relay.
Note: Master is different; Hence explicitly for 201811 only.

* Include debug symbols of isc-dhcp in its debug docker.
Include isc-dhcp src in source archive.
2019-12-06 20:51:31 -08:00
paavaanan
8ad48a5243 DellEMC S6100 CPLD upgrade support (#3834)
* DellEMC S6100 CPLD upgrade support

* Typo: CPLD
2019-12-06 10:54:45 -08:00
Ying Xie
ba88f9c0ae Revert "[swss.sh] When starting, call 'systemctl restart' on dependents, not (#3807)" (#3835)
This reverts commit 351410ea8c.
2019-12-02 23:56:04 +00:00
Ying Xie
5fa79fedd0 [201811][swss][utilities] advance submodule heads (#3836)
Submodule src/sonic-swss 1bc989a..8ef513c:
  > [teamsyncd]: Add retry logic in teamsyncd to avoid team handler init failure (#854)

Submodule src/sonic-utilities e548793..ae274e5:
  > [neighbor advertiser] catch all exceptions while trying https endpoint (#757)

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-12-02 15:55:22 -08:00
pavel-shirshov
6d1530f753 [docker-fpm-quagga]: Enable sending ipv6 prefixes over ipv4 BGPMON session. (#3828)
* Enable ipv6 prefixes over ipv4 BGPMON session

* Fix testcases

* Update bgpd_quagga.conf
2019-11-30 22:28:46 -08:00
Ying Xie
ddba4fe322 [201811][utilities] advance sonic-utilities submodule head (#3827)
Submodule src/sonic-utilities 4f87e4d..e548793:
  > Fix a bug in idempotent check. (#755)

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-11-30 09:57:15 -08:00
Joe LeVeque
8c4f7e5933 [dhcp_relay] Add extra sleep before starting relay agent processes (#3824) 2019-11-27 02:21:42 +00:00
Joe LeVeque
3920ac2368 [services] Remove explicit dependencies from dhcp_relay service file, control in swss.sh (#3823) 2019-11-27 02:21:00 +00:00
Ying Xie
d0237ece11 [201811][utilities] advance submodule utilities (#3813)
Submodule src/sonic-utilities 3ed25a4..4f87e4d:
  > [neighbor_advertiser] Adds initial support for HTTPS to neighbor advertiser (#750)

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-11-24 09:59:48 -08:00
Joe LeVeque
8e86a157ff [swss.sh] When starting, call 'systemctl restart' on dependents, not (#3807)
'systemctl start'
2019-11-24 03:26:03 +00:00
Ying Xie
3136fd6018 [bcm SAI] upgrade Broadcom SAI to 3.5.3.3-1 (#3781)
- Broadcom SAI GA release 20191115.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-11-19 21:33:55 -08:00
Qi Luo
0848a31893 Update submodule: sonic-snmpagent (#3783) 2019-11-19 13:09:58 -08:00
Ying Xie
29339773d2 [201811][sairedis] advance sairedis submodule head (#3780)
Submodule src/sonic-sairedis 627e6bc..4b11836:
  > Disable Fast-Reboot start if uptime is greater than 3 minutes (#534)

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-11-18 21:11:33 -08:00
Ying Xie
9b3c178fc8 [201811][kernel] advance kernel submodule head (#3773)
Submodule src/sonic-linux-kernel f6a4391..4fed1cb:
  > [kernel] add patch for mlx-platform: Fix parent device in i2c-mux-reg device registration (#112)
2019-11-18 11:29:59 -08:00
zzhiyuan
6a6ce50813 Update arista submodule for smbus reliability (#3772) 2019-11-16 20:08:47 -08:00
Nazarii Hnydyn
e546c64c76 [mellanox] Extend Mellanox FW utils with CPLD update (#3723)
* [mellanox] Extend Mellanox FW utils with CPLD update
* [mellanox] Fix FW utils review comments
2019-11-15 10:43:17 -08:00
Ying Xie
45f5270399 Revert "[build] clear dpkg cache and update sources (#3737)" (#3749)
This reverts commit 9871e043ec.
2019-11-14 07:46:36 -08:00
Ying Xie
9871e043ec [build] clear dpkg cache and update sources (#3737)
This change is intended to fix the issue with dpkg-query during build
process.

The symptom is that dpkg-query fails to open a package info file, usually
/var/lib/dpkg/updates/000?

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-11-12 07:23:13 -08:00
Wenda Ni
8788f4f783 cherry-picking diff between #3628 and #3561
Revert "Configure buffer profile to all ports (#3561)" (#3628)
Configure buffer profile to all ports (#3561)

This reverts commit 8861cbe98e.

Signed-off-by: Wenda Ni <wenni@microsoft.com>
2019-11-08 03:12:59 +00:00
Ying Xie
d6d389d7a1 [201811][utilities] advance utilities submodule head (#3724)
Submodule src/sonic-utilities 2ca1ae1..3ed25a4:
  > Do not start pfcwd for M0 devices (#726)
  > Make configlet application script idempotent for updates. (#728)

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2019-11-07 13:53:24 -08:00
Neetha John
6d23e4c8d7 [pfcwd]: Do not start pfc watchdog on Management Tor (#3719)
Signed-off-by: Neetha John <nejo@microsoft.com>
2019-11-07 21:41:32 +00:00
lguohan
9167f9da46 [aboot]: preserve snmp.yml and acl.json for eos to sonic fast reboot (#3716) 2019-11-07 21:40:20 +00:00
pavel-shirshov
b9b56c91ff [minigraph.py]: Use default namespace for <Address> (#3695)
* [minigraph.py]: Use default namespace for <Address>
2019-11-07 21:36:43 +00:00
pavel-shirshov
90fb363958 Add NEIGHBOR_METADATA info into render (#3688) 2019-11-07 20:09:47 +00:00
pavel-shirshov
a96ed09ff3 Downport BGPM and addrack patches to configlet_201811 branch (#3669)
* BGPm for 201811 (#3601)

* Feature is downported

* Add monitors to the test minigraphs

* Test

* No pfx filter

* Fix bgp sample

* Quagga requires activating the peer-group before configuration

* Add bgpcfgd and bgpd.peer template

* Catch exception if rendering external template

* Fix tests
2019-11-07 20:08:02 +00:00
Danny Allen
aa6adc1384 [minigraph.py] Update minigraph parsing logic to include only active ports for mirror tables (#3592) (#3634)
* Update minigraph.py to filter out front-panel ports that are not active
* Update cfggen tests to reflect new behavior

Signed-off-by: Danny Allen <daall@microsoft.com>

* Incorporate PR comments
- Update t0 tests to include additional device neighbors
- Refactor xml parsing logic
2019-11-07 00:24:07 +00:00
Wenda Ni
c1e17b3579 Adopt per-port buffer & qos profile apply on mellanox (#3543)
Signed-off-by: Wenda Ni <wenni@microsoft.com>
2019-11-07 00:23:10 +00:00
Wenda Ni
0ea82d8735 Fix syntax error for qos_config template (#3619)
Signed-off-by: Wenda Ni <wenni@microsoft.com>
2019-11-07 00:22:50 +00:00