Commit Graph

3360 Commits

Author SHA1 Message Date
Abhishek Dosi
85eb651d17 [submodule update] sonic-platfrom-daemons
[syseepromd] Prevent the syseepromd from termination (#56)
 [thermalctld] Fix invalid warning status (#58)
2020-06-16 10:00:44 -07:00
Abhishek Dosi
96a8e24055 [Submodule update] sonic-swss
Revert "[portsorch] Enable port-level buffer drop counters (#1237)"
(#1308) Add more log message, fix test code (#1239)
2020-06-16 09:12:41 -07:00
Abhishek Dosi
c656b4c582 [Submodule update] sonic-util
[201911][thermal control] Backport changes from master branch (#929)
     [201911][config] Support abbreviation (#933)
       Add 'hw-management-generate-dump.sh' to 'show techsupport'
       command (#934)
       [fwutil]: Update fwutil to v2.0.0.0. (#942)
       Fixes bug for PFCWD feature parameters (#838)
     Fixed fast-reboot for BFN platform (#871)
     [config] Add 'interface transceiver' subgroup with 'lpmode' and
     'reset' subcommands (#904)
      [warm-reboot]: added pre-check for ISSU file (#915)
       [config] Don't attempt to restart disabled services (#944)
2020-06-16 09:09:22 -07:00
yozhao101
4846fc0337 [docker-syncd] Add timeout to force stop syncd container (#4617)
**- Why I did it**
When I tested auto-restart feature of swss container by manually killing one of critical processes in it, swss will be stopped. Then syncd container as the peer container should also be
stopped as expected. However, I found sometimes syncd container can be stopped, sometimes
it can not be stopped. The reason why syncd container can not be stopped is the process
(/usr/local/bin/syncd.sh stop) to execute the stop() function will be stuck between the lines 164 –167. Systemd will wait for 90 seconds and then kill this process.

164 # wait until syncd quit gracefully
165 while docker top syncd$DEV | grep -q /usr/bin/syncd; do
166 sleep 0.1
167 done

The first thing I did is to profile how long this while loop will spin if syncd container can be
normally stopped after swss container is stopped. The result is 5 seconds or 6 seconds. If syncd
container can be normally stopped, two messages will be written into syslog:

str-a7050-acs-3 NOTICE syncd#dsserve: child /usr/bin/syncd exited status: 134
str-a7050-acs-3 INFO syncd#supervisord: syncd [5] child /usr/bin/syncd exited status: 134

The second thing I did was to add a timer in the condition of while loop to ensure this while loop will be forced to exit after 20 seconds:

After that, the testing result is that syncd container can be normally stopped if swss is stopped
first. One more thing I want to mention is that if syncd container is stopped during 5 seconds or 6 seconds, then the two log messages can be still seen in syslog. However, if the execution 
time of while loop is longer than 20 seconds and is forced to exit, although syncd container can be stopped, I did not see these two messages in syslog. Further, although I observed the auto-restart feature of swss container can work correctly right now, I can not make sure the issue which syncd container can not stopped will occur in future.

**- How I did it**
I added a timer around the while loop in stop() function. This while loop will exit after spinning
20 seconds.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
2020-06-16 08:21:15 -07:00
Dong Zhang
143e4f524c [MultiDB] Add REDIS_TIMEOUT_MSECS back which is removed by mistake (#4757) 2020-06-16 08:19:38 -07:00
Renuka Manavalan
f8a9a1b805 [k8s]: switching to Flannel from Calico. (#4768)
Switching to Flannel from Calico which brings down the image size by around 500+MB.
2020-06-16 08:18:54 -07:00
arlakshm
c5807c2dd2 [bgp]:Add redistribution connected for ipv6 also for Frontend ASICs (#4767)
* fix redistribution connected for ipv6 also

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
2020-06-16 08:18:19 -07:00
Joe LeVeque
c625e0e3e6 [build] Enable telemetry service by default (#4760)
**- Why I did it**
To ensure telemetry service is enabled by default after installing a fresh SONiC image

**- How I did it**
Set telemetry feature status to "enabled" when generating init_cfg.json file
2020-06-16 08:17:47 -07:00
shlomibitton
eb2fe4b16e Fix MSN4700 sensors (#4753)
Signed-off-by: Shlomi Bitton <shlomibi@mellanox.com>
2020-06-16 08:16:55 -07:00
Prince Sunny
b4f45e9c15 Submodule update - sonic-restapi (#4749) 2020-06-16 08:16:11 -07:00
Arun Saravanan Balachandran
030570de81 [DellEMC]: EEPROM decoder for S6000, S6000-ON (#4718)
**- Why I did it**

For decoding system EEPROM of S6000 based on Dell offset format and S6000-ON’s system EEPROM in ONIE TLV format.

**- How I did it**

- Differentiate between S6000 and S6000-ON using the product name available in ‘dmi’  ( “/sys/class/dmi/id/product_name” )
- For decoding S6000 system EEPROM in Dell offset format and updating the redis DB with the EEPROM contents, added a new class ‘EepromS6000’ in eeprom.py, 
- Renamed certain methods in both Eeprom, EepromS6000 classes to accommodate the plugin-specific methods.

**- How to verify it**

- Use 'decode-syseeprom' command to list the system EEPROM details.
- Wrote a python script to load chassis class and call the appropriate methods.

UT Logs: [S6000_eeprom_logs.txt](https://github.com/Azure/sonic-buildimage/files/4735515/S6000_eeprom_logs.txt), [S6000-ON_eeprom_logs.txt](https://github.com/Azure/sonic-buildimage/files/4735461/S6000-ON_eeprom_logs.txt)
Test script: [eeprom_test_py.txt](https://github.com/Azure/sonic-buildimage/files/4735509/eeprom_test_py.txt)
2020-06-16 08:15:28 -07:00
Ying Xie
aecebac86b [ntp] disable ntp long jump (#4748)
Found another syncd timing issue related to clock going backwards.
To be safe disable the ntp long jump.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2020-06-16 08:15:00 -07:00
Junchao-Mellanox
d10b597f50 [Mellanox] Upgrade mft to 4.14.1-8 (#4701) 2020-06-16 08:14:18 -07:00
Joe LeVeque
ed0e6aed1c [hostcfgd] Get service enable/disable feature working (#4676)
Fix hostcfgd so that changes to the "FEATURE" table in ConfigDB are properly handled. Three changes here:

1. Fix indenting such that the handling of each key actually occurs in the for key in status_data.keys(): loop
2. Add calls to sudo systemctl mask and sudo systemctl unmask as appropriate to ensure changes persist across reboots
3. Substitute returns with continues so that even if one service fails, we still try to handle the others

Note that the masking is persistent, even if the configuration is not saved. We may want to consider only calling systemctl enable/disable in hostcfgd when the DB table changes, and only call systemctl mask/unmask upon calling config save.
2020-06-16 08:13:32 -07:00
Joe LeVeque
42bc14f44c [systemd] Relocate all SONiC unit files to /usr/lib/systemd/system (#4673)
This will allow us to disable services and have it persist across reboots by using the `systemctl mask` operation
2020-06-16 08:12:47 -07:00
Olivier Singla
18bbbb3c02 [baseimage]: Run fsck filesystem check support prior mounting filesystem (#4431)
* Run fsck filesystem check support prior mounting filesystem

If the filesystem become non clean ("dirty"), SONiC does not run fsck to
repair and mark it as clean again.

This patch adds the functionality to run fsck on each boot, prior to the
filesystem being mounted. This allows the filesystem to be repaired if
needed.

Note that if the filesystem is maked as clean, fsck does nothing and simply
return so this is perfectly fine to call fsck every time prior to mount the
filesystem.

How to verify this patch (using bash):

Using an image without this patch:

Make the filesystem "dirty" (not clean)
[we are making the assumption that filesystem is stored in /dev/sda3 - Please adjust depending of the platform]
[do this only on a test platform!]

dd if=/dev/sda3 of=superblock bs=1 count=2048
printf "$(printf '\\x%02X' 2)" | dd of="superblock" bs=1 seek=1082 count=1 conv=notrunc &> /dev/null
dd of=/dev/sda3 if=superblock bs=1 count=2048

Verify that filesystem is not clean
tune2fs -l /dev/sda3 | grep "Filesystem state:"

reboot and verify that the filesystem is still not clean
Redo the same test with an image with this patch, and verify that at next reboot the filesystem is repaired and becomes clean.

fsck log is stored on syslog, using the string FSCK as markup.
2020-06-16 08:12:11 -07:00
Junchao-Mellanox
62690f504a
[Mellanox] Initialize system LED color to green for 201911 (#4743)
* [Mellanox] Initialize system LED color to green for 201911

* Rename variable to make it more readable
2020-06-16 15:38:17 +03:00
Nazarii Hnydyn
50f4e7de5f
[Mellanox] Add ONIE and SSD platform components. (#4764)
Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>
2020-06-15 13:04:44 +03:00
Arun Saravanan Balachandran
093d7731ab
[201911] DellEMC: Skip thermalctld and thermal platform API changes (#4752)
**- Why I did it**

- Skip thermalctld in DellEMC S6000, S6100, Z9100 and Z9264 platforms.
- Change the return type of thermal Platform APIs in DellEMC S6000, S6100 and Z9100 platforms to 'float'.

**- How I did it**

- Add 'skip_thermalctld:true' in pmon_daemon_control.json for DellEMC S6000, S6100, Z9100 and Z9264 platforms.
- Made changes in thermal.py, for 'get_temperature', 'get_high_threshold' and 'get_low_threshold' to return 'float' value.

**- How to verify it**

- Check thermalctld is not running in 'pmon'.
- Wrote a python script to load Chassis class and then call the APIs accordingly and verify the return type.
2020-06-11 10:48:27 -07:00
Junchao-Mellanox
0a70571011
[201911][thermal control] Backport feature from master branch (#4677)
Backport thermal control feature from master branch to 201911 branch by cherry-picking commits and manually resolving conflicts.
2020-06-08 11:20:43 -07:00
Abhishek Dosi
b5a419e1c8 [submodule update] sonic-swss
Corrected the copp rule as per NAT HLD (#1300)
2020-06-07 20:47:51 -07:00
judyjoseph
ccf12d2ff7
SAI 3.7.5.1 (#4710) 2020-06-07 20:45:12 -07:00
Volodymyr Samotiy
e73a5f1375
[Mellanox] Update SAI, SDK 4.4.0928 and FW xx.2007.1208 (#4704)
Signed-off-by: Volodymyr Samotiy <volodymyrs@mellanox.com>
2020-06-04 13:28:12 -07:00
Kebo Liu
fe8d4616e3
[submodule] Update sonic-linux-kernel pointer to pick up new kernel patches (#4696) 2020-06-04 11:55:33 +03:00
Abhishek Dosi
007eeec0e0 [Submodule update] sonic-util
Make sure db_migrator is run after all config are loaded during (#926)
Vnet alias mapping (#924)
Changes to make lldp show command for multi-npu platforms. (#914)
[Mellanox] Fix thermal control issue: use natural sort for fan
status and thermal status (#836)
[Mellanox] add document for thermal control related cli (#832)
2020-06-03 15:51:28 -07:00
Abhishek Dosi
8a66d9a902 [Submodule update] sonic-swss-common
Fix memory leak in pyext when Selectable is returned to Python (#343)
2020-06-03 15:50:32 -07:00
Abhishek Dosi
3530695ae7 [Submodule update] sonic-swss
[aclorch] Add support for creating ingress and egress MIRROR tables
 concurrently (#1286)
[proxy_arp] Implement proxy ARP feature (#1302)
Fix LAG member test case (#1304)
[orchagent] Set default MTU for the underlay loopback interface (#1299)
2020-06-03 15:49:05 -07:00
Joe LeVeque
f9bb26fe8a [build] 'make reset' target will continue recursive operations if any fail (#4675)
This change allows the recursive `git clean` and `git reset` commands to continue even if they encounter an error in one of the submodules. Previously, if an error was encountered, the operation would terminate with a message similar to the following:

Stopping at 'src/sonic-mgmt-framework'; script returned non-zero status.
2020-06-03 15:42:14 -07:00
Sumukha Tumkur Vani
6ca112e7e6 Update sonic-restapi (#4692)
Auto restart restapi server after cert rollover
2020-06-03 15:40:30 -07:00
Srideep
0de051ab9f [DellEmc] Changes to suppot new portmap for s5232f t0 config (#4670)
To support t0 config
2020-06-03 15:38:47 -07:00
Joe LeVeque
913d380f6b [caclmgrd] Get first VLAN host IP address via next() (#4685)
I found that with IPv4Network types, calling list(ip_ntwrk.hosts()) is reliable. However, when doing the same with an IPv6Network, I found that the conversion to a list can hang indefinitely. This appears to me to be a bug in the ipaddress.IPv6Network implementation. However, I could not find any other reports on the web.

This patch changes the behavior to call next() on the ip_ntwrk.hosts() generator instead, which returns the IP address of the first host.
2020-06-03 15:38:11 -07:00
abdosi
e00d038774 [sonic-slave]: add debian packages needed to compile BRCM SAI3.7 (#4672)
both for sonic-slave-stretch and sonic-slave-buster
2020-06-03 15:36:52 -07:00
Joe LeVeque
f2c0ed8e21 [caclmgrd] Allow more ICMP types (#4625) 2020-06-03 15:35:49 -07:00
Joe LeVeque
1e59be8941 [caclmgrd] Ignore keys in interface-related tables if no IP prefix is present (#4581)
Since the introduction of VRF, interface-related tables in ConfigDB will have multiple entries, one of which only contains the interface name and no IP prefix. Thus, when iterating over the keys in the tables, we need to ignore the entries which do not contain IP prefixes.
2020-06-03 15:35:10 -07:00
Joe LeVeque
ac957a0c7a [caclmgrd] Add some default ACCEPT rules and lastly drop all incoming packets (#4412)
Modified caclmgrd behavior to enhance control plane security as follows:

Upon starting or receiving notification of ACL table/rule changes in Config DB:
1. Add iptables/ip6tables commands to allow all incoming packets from established TCP sessions or new TCP sessions which are related to established TCP sessions
2. Add iptables/ip6tables commands to allow bidirectional ICMPv4 ping and traceroute
3. Add iptables/ip6tables commands to allow bidirectional ICMPv6 ping and traceroute
4. Add iptables/ip6tables commands to allow all incoming Neighbor Discovery Protocol (NDP) NS/NA/RS/RA messages
5. Add iptables/ip6tables commands to allow all incoming IPv4 DHCP packets
6. Add iptables/ip6tables commands to allow all incoming IPv6 DHCP packets
7. Add iptables/ip6tables commands to allow all incoming BGP traffic
8. Add iptables/ip6tables commands for all ACL rules for recognized services (currently SSH, SNMP, NTP)
9. For all services which we did not find configured ACL rules, add iptables/ip6tables commands to allow all incoming packets for those services (allows the device to accept SSH connections before the device is configured)
10. Add iptables rules to drop all packets destined for loopback interface IP addresses
11. Add iptables rules to drop all packets destined for management interface IP addresses
12. Add iptables rules to drop all packets destined for point-to-point interface IP addresses
13. Add iptables rules to drop all packets destined for our VLAN interface gateway IP addresses
14. Add iptables/ip6tables commands to allow all incoming packets with TTL of 0 or 1 (This allows the device to respond to tools like tcptraceroute)
15. If we found control plane ACLs in the configuration and applied them, we lastly add iptables/ip6tables commands to drop all other incoming packets
2020-06-03 09:41:52 -07:00
Kebo Liu
618d529ef4
[201911][Mellanox] Update hw-mgmt package to V.7.0010.1000 (#4688) 2020-06-02 14:53:09 -07:00
Sumukha Tumkur Vani
a693f02362 Read cloudtype info from minigraph (#4642) 2020-05-27 18:08:34 -07:00
pavel-shirshov
5969470ab8 [sonic-slave]: Install pympler to find the memory leaks in python (#4652) 2020-05-27 18:07:44 -07:00
Arun Saravanan Balachandran
98b8d1eee1 DellEMC: get_change_event Platform API implementation for S6000, S6100 and Z9100 (#4593)
For detecting transceiver change events through xcvrd in DellEMC S6000, S6100 and Z9100 platforms.

- In S6000, rename 'get_transceiver_change_event' in chassis.py to 'get_change_event' and return appropriate values.
- In S6100, implement 'get_change_event' through polling method (poll interval = 1 second) in chassis.py (Transceiver insertion/removal does not generate interrupts due to a CPLD bug)
- In Z9100, implement 'get_change_event' through interrupt method using select.epoll().
2020-05-27 18:00:45 -07:00
shlomibitton
3ca65bdb67 [Mellanox] Fix 'sensors.conf' mapping for MSN4700 (#4511)
* [Mellanox] Fix 'sensors.conf' mapping for SN4700

Signed-off-by: Shlomi Bitton <shlomibi@mellanox.com>

* Fix some labels name
2020-05-27 17:58:55 -07:00
Tyler Li
16d9fc8848 Fix vrf test failed after frr update to 7.2 (#3763) 2020-05-27 07:11:23 +00:00
Nazarii Hnydyn
12fb95c561
[201911][sonic-platform-common]: Advance submodule (#4635)
Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>
2020-05-23 12:21:16 -07:00
pavel-shirshov
209febcf56 Update golang version for 1.11.5 to 1.14.2 (#4520) 2020-05-21 23:53:10 -07:00
Abhishek Dosi
6ebc79e297 [submodule update] sonic-rest API's
PR#39  Setup module versioning
Add support for get all Vlans (#37)
2020-05-21 22:20:18 -07:00
Abhishek Dosi
7b5fb95fc7 [submodule update] sonic-util
Revert "[config] Add 'interface transceiver' subgroup with 'lpmode' and
 'reset' subcommands (#904)"
  Multi-asic changes for config bgp commands and utilities. (#910)
2020-05-20 22:44:14 -07:00
Abhishek Dosi
a197dd46e0 [submodule update] sonic-swss with PR
[vnet] Fix IP2ME route creation logic for BITMAP VNET interface (#1284)
2020-05-20 22:44:14 -07:00
Ying Xie
14b3f0022b [ntp] enable/disable NTP long jump according to reboot type (#4577)
* [ntp] enable/disable NTP long jump according to reboot type

- Enable NTP long jump after cold reboot.
- Disable NTP long jump after warrm/fast reboot.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>

* fix typo

* further refactoring

* use sonic-db-cli instead
2020-05-20 22:44:14 -07:00
Samuel Angebault
dcb780e2e9 [arista]: remove the soc property disabling sram scan (#4623) 2020-05-20 22:44:14 -07:00
arlakshm
fd2831c15c [config]: Fix the device type and internal bgp session status for multi NPU platforms (#4600)
* The following changes for multi-npu platforms are done
- Set the type in device_metadata for asic configuration to be same as host
- Set the admin-status of internal bgp sessions as up
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
2020-05-20 22:44:14 -07:00
abdosi
bb60e2b670 Changes to support config-setup service for multi-npu (#4609)
* Changes to support config-setup service for multi-npu
platforms. For Multi-npu we are not supporting as of
now config initializtion and ZTP. It will support creating
config db from minigraph or using  config db from previous
file system

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Address Review Comments.

* Address Review comments

* Address Review Comments of using pyhton based config load_minigraph/
config save/config reload from shell scripts so that we don't duplicate
code. Also while running from shell we will skip stop/start services
done by those commands.

* Updated to use python command so no code duplication.
2020-05-20 22:44:14 -07:00