Commit Graph

3341 Commits

Author SHA1 Message Date
isabelmsft
c56ddf0dba [Kubernetes Setup] Remove flannel, kube-proxy images (#5098)
Removes installation of kube-proxy (117 MB) and flannel (53 MB) images from Kubernetes-enabled devices. These images are tested to be unnecessary for our use case, as we do not rely on ClusterIPs for Kubernetes Services or a CNI for pod networking.
2020-08-09 10:48:59 -07:00
Joe LeVeque
d7a601dd8f [.gitignore] Ignore src/sonic-py-common/.eggs/ directory (#5114)
Add generated src/sonic-py-common/.eggs/ directory to .gitignore file
2020-08-09 10:44:48 -07:00
shlomibitton
fad242480c Add support for 'Extended Specification Compliance' for QSFP cables parser (#5096)
Signed-off-by: Shlomi Bitton <shlomibi@mellanox.com>
2020-08-09 10:44:14 -07:00
Joe LeVeque
890e1f38cc [sonic-py-common] get_platform(): Refactor method of retrieving platform identifier (#5094)
Applications running in the host OS can read the platform identifier from /host/machine.conf. When loading configuration, sonic-config-engine *needs* to read the platform identifier from machine.conf, as it it responsible for populating the value in Config DB.

When an application is running inside a Docker container, the machine.conf file is not accessible, as the /host directory is not mounted. So we need to retrieve the platform identifier from Config DB if get_platform() is called from inside a Docker 
container. However, we can't simply check that we're running in a Docker container because the host OS of the SONiC virtual switch is running inside a Docker container. So I refactored `get_platform()` to:
    1. Read from the `PLATFORM` environment variable if it exists (which is defined in a virtual switch Docker container)
    2. Read from machine.conf if possible (works in the host OS of a standard SONiC image, critical for sonic-config-engine at boot)
    3. Read the value from Config DB (needed for Docker containers running in SONiC, as machine.conf is not accessible to them)

- Also fix typo in daemon_base.py
- Also changes to align `get_hwsku()` with `get_platform()`
2020-08-09 10:40:20 -07:00
pavel-shirshov
a15d79b6f1 [bgpcfgd]: Clarify error messages on reset Loopback0 ip address (#5062)
To clarify error messages in case the ip address for Loopback is already set. It doesn't make sense to call correct ip address as ambiguous in this case
2020-08-09 10:39:28 -07:00
rkdevi27
5ddfc13a75 [baseimage]: /host unmount timeout issue during reboot. (#5032)
Fix for the host unmount issue through PR https://github.com/Azure/sonic-buildimage/pull/4558 and https://github.com/Azure/sonic-buildimage/pull/4865 creates the timeout of syslog.socket closure during reboot since the journald socket closure has been included in syslog.socket

Removed the journal socket closure. The host unmount is fixed with just stopping the services which gets restarted only after /var/log unmount and not causing the unmount issues.
2020-08-09 10:38:33 -07:00
rkdevi27
652aa3b072 [baseimage]: /host unmount failed in VM during reboot (#4865)
Added a check further to make the services to stop appropriately before unmount.

Fix #4651
2020-08-09 10:37:12 -07:00
rkdevi27
f1bbda19f0 Fix "/host unmount failure" during reboot (#4558) 2020-08-09 10:34:02 -07:00
Guohan Lu
544aa236c4 [submodule]: update sonic-utilities
ef0b1fa 2020-07-21 | [config] Restart telemetry service upon config (re)load (#992)

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2020-08-05 17:30:35 -07:00
Stephen Sun
b76f8fafdb [Mellanox] Update the buffer setting (#4989)
* Update the buffer size based on the latest excel

Signed-off-by: Stephen Sun <stephens@mellanox.com>

* Align the buffer configuration with the latest formula:

- reduce redundant "*2" in formula
- use port MTU for local sending the PFC frame and peer lossless MTU for peer sending lossless traffic

Buffer pool size updated accordingly.

Signed-off-by: Stephen Sun <stephens@mellanox.com>
2020-08-03 23:06:21 -07:00
Joe LeVeque
6556c40040
[201911] Introduce sonic-py-common package (#5063)
Consolidate common SONiC Python-language functionality into one shared package (sonic-py-common) and eliminate duplicate code.

The package currently includes four modules:
- daemon_base
- device_info
- logger
- task_base

NOTE: This is a combination of all changes from https://github.com/Azure/sonic-buildimage/pull/5003, https://github.com/Azure/sonic-buildimage/pull/5049 and some changes from https://github.com/Azure/sonic-buildimage/pull/5043 backported to align with the 201911 branch. As part of the 201911 port, I am not installing the Python 3 package in the base image or in the VS container, because we do not have pip3 installed, and we do not intend to migrate to Python 3 in 201911.
2020-08-03 11:50:06 -07:00
Nazarii Hnydyn
4e558bca25
[201911][Mellanox] Update MFT to 4.15.0-104 (#5077)
* [Mellanox] Update MFT to 4.15.0-104.

Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>

* [Mellanox] Remove build system W/A.

Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>

* [Mellanox] Add MFT DKMS build support.

Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>
2020-08-03 13:53:33 +03:00
Andriy Kokhan
fbf3cb13a5
[bfn] Updated BFN SDK packages to 20200731 with SAI v1.5.2 (#5082)
Signed-off-by: Andriy Kokhan <akokhan@barefootnetworks.com>
2020-08-03 02:12:55 -07:00
Abhishek Dosi
3c224e7ce3 [submodule update] sonic-swss
Revert "Refine getDbId() calling to fix build after swss-common change (#1245)"

 This shoudl fix VS build.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2020-08-01 17:41:30 -07:00
Abhishek Dosi
35ff8b3e12 [submodule update] sonic-platform-daemons
[xcvrd] Fix bailing out on platforms that do not support QSFP-DD (#78)

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2020-08-01 14:24:13 -07:00
Abhishek Dosi
ae65de6cac [submodule update] sonic-swss
Remove 00-copp.config.json from swss debian package. (#1366)
2020-07-31 17:30:09 -07:00
pavel-shirshov
f757a5d6eb Fix for ipv6 local-addr problem (#4876)
Co-authored-by: Pavel Shirshov <pavel.contrib@gmail.com>
2020-07-31 17:26:07 -07:00
abdosi
e3eddede1e Changes to add template support for copp.json. (#5053)
* Changes to add template support for copp.json.
This is needed so that we can install differnt type of
Traps based on Device Role (Tor/Leaf/Mgmt/etc...).

Initial use case is to install DHCP/DHCPv6 tarp only
for tor router.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Fixed based on review comments.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>

* Fixed based on review comment.
2020-07-31 17:24:45 -07:00
Joe LeVeque
c96c3cd311 [caclmgrd] Always restart service upon process termination (#5065) 2020-07-31 17:23:48 -07:00
Abhishek Dosi
a4d399c1a9 [submodule update] sonic-platform-common
Fix import issue for python3 whl building (#105)
2020-07-31 10:24:53 -07:00
Abhishek Dosi
b576443af9 [submodule update] sonic-platform-daemons
[xcvrd] Add support for QSFP-DD cables (#66)
2020-07-30 12:12:03 -07:00
Abhishek Dosi
a2aa5c4d8c [submodule update] sonic-platform-common
[Transceiver] Add parser for QSFP-DD cable type and dictionaries for
 QSFP-DD codes (#101)
2020-07-30 12:10:20 -07:00
Kebo Liu
c8c4493a96
Update SAI to 1.16.6, SDK to 4.4.1014, FW to *.2008.1032 (#5056)
SAI:
    Fix ECMP max groups logic
    add set issu log level for spc2/spc3, as now issu is supported
    set vlan max swid = 0 on sdk init, as only single swid is needed, for efficient resource usage
    Fix traffic lost during FFB related to buffer config + optimize buffer config timing for FB
    Add ACL fields BTH, IP flags
    Add ACL infrastructure of different fields per ASIC type
    Add port stat ether rx/tx oversize pkts
  SDK/FW:
    Added support for Finisar 100GbE SWDM Transceiver FTLC9152RGPL.
    Spectrum-2 Added support for 10G BaseT modules
    Added link LED support for SN4600C.
    Counters | In SDK debug dump, the incorrect counter type appears for vtraps.
    WJH | Without any traffic or events on the idle system, the CPU load is constantly above 4%
    WJH | WJH filter currently cannot filter by PORT for buffer drop reason.
    Spectrum | ACL, Unbind, Lazy Delete | Running Lazy Delete together with auto_unbind may cause rate condition errors. To work work with Lazy Delete use new INIT parameter "acl_manual_unbind" so that ACLs will notbe removed automatically when binding point is deleted.
    Spectrum | ISSU | In ISSU mode, when querying for the number of configurable buffers, using the API sx_api_cos_port_buff_type_get with the count parameter as 0, the API returns the number for NORMAL mode instead.
    Spectrum-2 | BER | BER monitor counts raw errors instead of effective errors
    Spectrum-2 | BER | Connecting to ConnectX-5 adapter card with copper splitter cable MCP7H50-V001R30 in 1
    Spectrum-2 | Cables | Link flaps in 200GbE with AOM Optic cable MMA1T00-VS
    Spectrum-3 | Speeds, Link | When moving from a 400GbE link to a 1GbE link, packets may drop for 1msec right after link up
    Spectrum-3 | Cables, Speeds | Using 400GbE with 3rd party systems is not supported
    Spectrum-3 | LAG | After a while, LAG members become out of sync with one another
    Spectrum-3 | VLAN, Ports | Packets with VLAN headers are sent to
2020-07-30 13:37:54 +03:00
Kebo Liu
b4f010f042
[kernel]: Update sonic-linux-kernel to pick up new fix (#5044)
Azure/sonic-linux-kernel#154 patch psample to safely unload the module
2020-07-28 15:06:51 -07:00
Abhishek Dosi
a163583d93 [submodule update] sonic-swss-common
This is fix for compilation error also on 201911.
[schema]: Add a new table "NAT_DNAT_POOL_TABLE" to hold the DNAT Pool
entries.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2020-07-27 08:58:42 -07:00
Stephen Sun
33a3cc8861
[submodule]: update submodule head for sonic-sairedis on 201911 (#5041)
Update the meta code to support DNAT Pool changes (#616)
[syncd] Fix notification on shutdown request (#637)
Advance the submodule head of SAI (#641)

Signed-off-by: Stephen Sun <stephens@mellanox.com>
2020-07-26 13:43:01 -07:00
Abhishek Dosi
940e5c0db0 [submodule update] sonic-utilities
[config] Add 'config interface mtu' command #793
add fec configuration 'config interface fec [OPTIONS] <interface_name>
<interface_fec>' #764
2020-07-26 11:29:10 -07:00
Abhishek Dosi
6216d32db0 [submodule update] sonic-swss
[201911] Update nat entries to use nat_type to support DNAT Pool
  changes. (#1297)
    [201911] Update nat entries to use nat_type to support DNAT Pool
    changes. (#1297)
2020-07-26 11:22:41 -07:00
Nazarii Hnydyn
7fe359747a [orchagent]: Fix platform string export. (#4993)
Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>
2020-07-26 11:20:56 -07:00
Tamer Ahmed
755319c37c [docker-orchagent] Call sonic-cfggen Once (#4936)
Optimizing number of calls made to sonic-cfggen during service
start up as it adds to total system boot up time.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>

**- Why I did it**
sonic-cfggen call is slow and it adds to system start up time

**- How I did it**
places all required variable into single template and called into sonic-cfggen using this template

**- How to verify it**
***-Test 1***
there is an average saving of .5 to 1 sec between old script and new script
```
root@str-s6000-acs-14:/# time ./orchagent_old.sh
/usr/bin/orchagent -d /var/log/swss -b 8192 -m f4:8e:38:16:bc:8d

real	0m3.546s
user	0m2.365s
sys	0m0.585s

root@str-s6000-acs-14:/# time ./orchagent_new.sh
/usr/bin/orchagent -d /var/log/swss -b 8192 -m f4:8e:38:16:bc:8d

real	0m2.058s
user	0m1.650s
sys	0m0.363s
```
***-Test 2***
Built an image with this change and orchagent is running with intended params:
```
admin@str-s6000-acs-14:~$ ps -ef | grep orchagent
root      2988  1901  1 02:09 pts/0    00:00:02 /usr/bin/orchagent -d /var/log/swss -b 8192 -m f4:8e:38:16:bc:8d
```

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2020-07-26 11:19:15 -07:00
Akhilesh Samineni
dd26117bf2 [NAT]: Update the conntrack entries timeout to Max value after warmboot (#4596)
Signed-off-by: Akhilesh Samineni <akhilesh.samineni@broadcom.com>

All new NAT conntrack entries are added to kernel with max entry timeout of 432000 and setting the same timeout during system warm reboot also
2020-07-26 11:18:42 -07:00
shlomibitton
9385775803 [Mellanox] Change fan tolerance to 50% (#5018)
Mellanox platforms fan tolerance should change to 50%

Signed-off-by: Shlomi Bitton <shlomibi@mellanox.com>
2020-07-26 11:18:00 -07:00
anish-n
733f7091ac [bgpcfgd]: Add Vlan prefix list to the FRR templates (#5005)
add the Vlan prefix list to the FRR templates
2020-07-26 11:17:29 -07:00
madhanmellanox
130aeb4cc1 [caclmgrd] Log error message if IPv4 ACL table contains IPv6 rule and vice-versa (#4498)
* Defect 2082949: Handling Control Plane ACLs so that IPv4 rules and IPv6 rules are not added to the same ACL table

* Previous code review comments of coming up with functions for is_ipv4_rule and is_ipv6_rule is addressed and also raising Exceptions instead of simply aborting when the conflict occurs is handled

* Addressed code review comment to replace duplicate code with already existing functions

* removed raising Exception when rule conflict in Control plane ACLs are found

* added code to remove the rule_props if it is conflicting ACL table versioning rule

* addressed review comment to add ignoring rule in the error statement

Co-authored-by: Madhan Babu <madhan@arc-build-server.mtr.labs.mlnx>
2020-07-26 11:16:30 -07:00
shlomibitton
d0be3ebe16 Add support for QSFP-DD cables on MLNX platform API (#4965)
Signed-off-by: Shlomi Bitton <shlomibi@mellanox.com>
2020-07-26 11:15:05 -07:00
vdahiya12
2b86e51026 [daemon_base] fix to not reregister signal handler (#4998)
* [daemon_base] fix to not reregister signal handler

-src/sonic-daemon-base/sonic_daemon_base/daemon_base.py
Problem:
Currently all daemons inherit from daemon_base class, and for
signal handling functionality they register the signal_handler() by
overriding the siganl_handler() in daemon_base by their own
implmentation.
But some sonic_platform instances also can invoke the daemon_base
constructor while trying to instantiate the common utilities
for example
platform_chassis = sonic_platform.platform.Platform().get_chassis()
This will cause the re registration of signal_handler which will
cause base class signal_handler() to be invoked when the daemon
gets a signal, whereas their own signal_handler should have been
invoked.

Fix:
We only register the siganl_handler once, and if signal_handler has
been registered, not re register it.

Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>

* [daemon_base] fix to not reregister signal handler

Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
2020-07-26 11:14:17 -07:00
Stepan Blyshchak
b100ec559e [services] remove swss from WantedBy for nat service (#4991)
Otherwise, it may cause issues for warm restarts, warm reboot.
Warm restart of swss will start nat which is not expected for warm
restart. Also it is observed that during warm-reboot script execution
nat container gets started after it was killed. This causes removal of
nat dump generated by nat previously:

A check [ -f /host/warmboot/nat/nat_entries.dump ] || echo "NAT dump
does not exists" was added right before kexec:

```
Fri Jul 17 10:47:16 UTC 2020 Prepare MLNX ASIC to fastfast-reboot:
install new FW if required
Fri Jul 17 10:47:18 UTC 2020 Pausing orchagent ...
Fri Jul 17 10:47:18 UTC 2020 Stopping nat ...
Fri Jul 17 10:47:18 UTC 2020 Stopped nat ...
Fri Jul 17 10:47:18 UTC 2020 Stopping radv ...
Fri Jul 17 10:47:19 UTC 2020 Stopping bgp ...
Fri Jul 17 10:47:19 UTC 2020 Stopped bgp ...
Fri Jul 17 10:47:21 UTC 2020 Initialize pre-shutdown ...
Fri Jul 17 10:47:21 UTC 2020 Requesting pre-shutdown ...
Fri Jul 17 10:47:22 UTC 2020 Waiting for pre-shutdown ...
Fri Jul 17 10:47:24 UTC 2020 Pre-shutdown succeeded ...
Fri Jul 17 10:47:24 UTC 2020 Backing up database ...
Fri Jul 17 10:47:25 UTC 2020 Stopping teamd ...
Fri Jul 17 10:47:25 UTC 2020 Stopped teamd ...
Fri Jul 17 10:47:25 UTC 2020 Stopping syncd ...
Fri Jul 17 10:47:35 UTC 2020 Stopped syncd ...
Fri Jul 17 10:47:35 UTC 2020 Stopping all remaining containers ...
Warning: Stopping telemetry.service, but it can still be activated by:
  telemetry.timer
Fri Jul 17 10:47:37 UTC 2020 Stopped all remaining containers ...
NAT dump does not exists
Fri Jul 17 10:47:39 UTC 2020 Rebooting with /sbin/kexec -e to
SONiC-OS-201911.140-08245093 ...
```

With this change, executed warm-reboot 10 times without hitting this
issue, while without this change the issue is easily reproducible almost
every warm-reboot run.

Signed-off-by: Stepan Blyschak <stepanb@mellanox.com>
2020-07-26 11:11:35 -07:00
anish-n
b0ccf58682 [bgpcfgd]: Add fix to bgpcfgd to ignore NEIGHBOR_METADATA entries for dynamic peers (#5008)
This fix removes the requirement to have a NEIGHBOR_METADATA for dynamic peers. The change is made since it is not necessary for NEIGHBOR_METADATA entries be present for the dynamic neighbors
2020-07-26 11:10:24 -07:00
Kebo Liu
0701da4145 [Mellanox] remove code which instructs hw-mgmt to skip mlsw_minimal probing in fast-boot flow (#5011) 2020-07-26 11:09:26 -07:00
isabelmsft
ca844ec6b3 Update Kubernetes and kubernetes-cni versions (#5024)
This PR updates kubernetes version to 1.18.6 and kubernetes-cni version to 0.8.6

signed-off by: Isabel Li isabel.li@microsoft.com

Why I did it
Previous kubernetes-cni version (0.7.5) introduced Kubernetes Man In The Middle Vulnerability. “A vulnerability was found in all versions of containernetworking/plugins before version 0.8.6, that allows malicious containers in Kubernetes clusters to perform man-in-the-middle (MitM) attacks. A malicious container can exploit this flaw by sending rogue IPv6 router advertisements to the host or other containers, to redirect traffic to the malicious container.”

How I did it
Defined kubernetes-cni version to be 0.8.6 and updated kubernetes version to be 1.18.6

How to verify it
Check versions by running dpkg -l | grep kube
2020-07-26 11:08:21 -07:00
Joe LeVeque
4a2db8e216 [caclmgrd] remove default DROP rule on FORWARD chain (#5034) 2020-07-26 11:07:42 -07:00
Nazarii Hnydyn
0ec979dd30
[Mellanox] Fix SN3700 platform string. (#5035)
Signed-off-by: Nazarii Hnydyn <nazariig@mellanox.com>
2020-07-25 03:04:03 -07:00
Joe LeVeque
3f3fcd3253 [caclmgrd] Filter DHCP packets based on dest port only (#4995) 2020-07-21 10:13:17 +00:00
Joe LeVeque
52e45e823e
[201911][sudoers] Add sonic_installer list to read-only commands (#4997)
`sonic_installer list` is a read-only command. Specify it as such in the sudoers file.

This will also ensure the new `show boot` command, which calls `sudo sonic_installer list` under the hood doesn't fail due to permissions.
2020-07-17 20:13:42 -07:00
Joe LeVeque
5591131bba
[201911][sonic-telemetry] Update submodule (#4987)
Point submodule to new 201911 branch of sonic-telemetry and update pointer to the current HEAD of the 201911 branch

* src/sonic-telemetry aaa9188...01b5365 (1):
  > [testdata] Update SFP keys to align with new standard (#39)
2020-07-17 17:37:21 -07:00
Joe LeVeque
840be7732c
[201911][devices] Update SFP keys to align with new standard (#4976)
Align SFP key names with new standard defined in https://github.com/Azure/sonic-platform-common/pull/97

- hardwarerev -> hardware_rev
- serialnum -> serial
- manufacturename -> manufacturer
- modelname -> model
- Connector -> connector
2020-07-16 11:09:47 -07:00
Samuel Angebault
41ba95ee3f
[arista] update Arista drivers submodules (#4967)
Merge most of the changes that recently made it to master.
This will be the last such merge operation and future commits will only cherry-pick fixes and targeted features.

Major fixes and features,
- reboot cause enhancement with more hardware reboot cause reporting
- fix reboot cause parsing issue with 201811 release
- fix get_change_event logic
- fix error message on missing sysfs entry by our plugins
- final piece of the platform refactors for fan and sensor reporting through the platform API
2020-07-16 10:36:07 -07:00
Danny Allen
0824509373 [docker-ptf] Add support for spytest to ptf container (#4410)
- Install apt and pip dependencies
- Define traffic generator service

Signed-off-by: Danny Allen <daall@microsoft.com>
2020-07-14 22:06:05 -07:00
Abhishek Dosi
0d6754d140 [Submodule update] sonic-snmpagent
[201911] Fix interface counters in RFC1213 (#144)
2020-07-14 22:05:13 -07:00
abdosi
b725572023
Fix the below frr start.sh jija2 exception in 201911 image syslog: (#4958)
File "/usr/local/bin/sonic-cfggen", line 380, in <module>
     main()
   File "/usr/local/bin/sonic-cfggen", line 354, in main
     print(template.render(data))
   File "/usr/local/lib/python2.7/dist-packages/jinja2/environment.py", line 1090, in render
     self.environment.handle_exception()
   File "/usr/local/lib/python2.7/dist-packages/jinja2/environment.py", line 832, in handle_exception
     reraise(*rewrite_traceback_stack(source=source))
   File "<template>", line 1, in top-level template code
   File "/usr/local/lib/python2.7/dist-packages/jinja2/environment.py", line 471, in getattr
     return getattr(obj, attribute)
 jinja2.exceptions.UndefinedError: 'WARM_RESTART' is undefined

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2020-07-14 08:02:32 -07:00