Signed-off-by: Yong Zhao yozhao@microsoft.com
Why I did it
Since we introduced a new value always_disabled for the state field in FEATURE table, the expected running container list
should exclude the always_diabled containers. This bug was found by nightly test and posted at here: issue. This PR fixes#7210.
How I did it
I added a logic condition to decide whether the value of state field of a container was always_disabled or not.
How to verify it
I verified this on the device str-dx010-acs-1.
Which release branch to backport (provide reason below if selected)
201811
201911
202006
[ x] 202012
Feb 17 Fix tests failing due to duplicate vxlan tunnel creation (#75)
Mar 11 Update route api to specify limitation (#77)
Apr 01 Add host_ifname field while adding entry in VLAN table (#80)
Unset CONFIG_THERMAL_STATISTICS to prevent kernel crash (#199)
[net] Disable prio and cls cgroups to make working cgroup2 sock matching (#198)
[doc]: Fix typos in README (#206)
[Mellanox] Backport patch to remove critical trip point from thermal zones (#201)
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Why I did it
These ports are being enumerated by the latest SAI. But they are not defined in port_config.ini.
SONiC end up trying to delete these 3 ports and hit SAI error and crash.
How I did it
Add the GbE and the 2 HiGig ports in the port_config.ini.
How to verify it
Put the port_config.ini on a device crashing with port deleting. load minigraph and the crash stopped.
Signed-off-by: Ying Xie ying.xie@microsoft.com
#### Why I did it
Plexus-utils before 3.0.16 is vulnerable to command injection because it does not correctly process the contents of double quoted strings.
#### How I did it
Upgrade to 3.0.16
c5be3ca4 [psud] Increase unit test coverage; Refactor mock platform (#154)
450b7d78 Bug fix: the fields that are not supported by vendor should be "N/A" in STATE_DB (#168)
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* 7260cx3 DualToR config.bcm support based on DualToR setting in device metadata at boot time.
For HWSKU Arista-7260CX3-C64 the MMU setting SOC for T0/T1 is also combined into the config.bcm.j2 logic so use just one config file and adding delta based on Switch Roles.
Signed-off-by: Yong Zhao yozhao@microsoft.com
Why I did it
If device reboot was caused by kernel panic, then we need retrieve and store the key information into the symbol file previous-reboot-cause.json. The CLI show reboot-cause will read this file to get the reason of previous reboot.
This PR is related to PR in sonic-utilities repo: Azure/sonic-utilities#1486
How I did it
The string variable previous_reboot_cause will be parsed to check whether it contains the keyword Kernel Panic. If it did, then store the keyword and time information into a dictionary.
How to verify it
I verified this change on a virtual testbed.
admin@vlab-01:/host/reboot-cause$ more previous-reboot-cause.json
{"gen_time": "2021_03_24_23_22_35", "cause": "Kernel Panic", "user": "N/A", "time": "Wed 24 Mar 2021 11:22:03 PM UTC", "comment": "N/A"}
admin@vlab-01:/host/reboot-cause$ show reboot-cause
Kernel Panic [Time: Wed 24 Mar 2021 11:22:03 PM UTC]
* Add manageability to the yang model tests by splitting the tests
and config data for the tests into multiple files.
The "tests" directory contains all the tests and the "tests_config"
directory contains the configs used for the tests.
New tests can be added in new json files.
Signed-off-by: Joyas Joseph <joyas_joseph@dell.com>
Update the sonic-swss submodule to include failure notification for orchagent. The following is the new commit in the submodule.
fa983d2 Add failure notification for orchagent
Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com>
Changes for setting platfrom specific lag id boundary id in the chassis
app db. The platfrom specific lag id boundaries are supplied via
chassisdb.conf. The lag_id_start and lag_id_end boundary values sourced
from this file are set in chassis app db which will be used by lag id
allocator to allocate unique lag id in atomic fashion
Integrate hw-management package V.7.0010.2002
Bug fixes:
Removing critical thermal zones to prevent unexpected software system shutdown:
*Kernel 4.9 -0071-mlxsw-core-Remove-critical-trip-point-from-thermal-z.patch
*Kernel 4.19 -076-mlxsw-core-Remove-critical-trip-point-from-thermal-z.patch
Removing redundant link for cpld3 for fixed systems (SN2100, SN2010).
Fix an issue with missed attribute for cpld3 (port CPLD) for SN2700, SN2410.
Signed-off-by: Stephen Sun <stephens@nvidia.com>
this PR updates the following commits in sonic-platform-daemons
260cf2d [xcvrd] change firmware information fields name inside MUX_CABLE_INFO table for Y cable (#165)
cfa600f [thermalctld] Initialize fan led in thermalctld for the first run (#167)
8509f43 [thermalctld] Refactor to allow for greater unit test coverage; Add more unit tests (#157)
70f4e7b [syseepromd] Update warning message to be more informative (#160)
Signed-off-by: vaibhav-dahiya <vdahiya@microsoft.com>
[SFlowMgr] Sflow Crash on 200G ports handled (#1683)
Remove PGs from an administratively down port. (#1677)
Stablize the test case (#1679)
Revert "Revert "[buffermgr] Support maximum port headroom checking (#1607)" (#1675)" (#1682)
Signed-off-by: Stephen Sun <stephens@nvidia.com>
py2/py3/deb packages names are case insensitive, and the versions map
key should be the same for packages whose name can have different cases.
For example, in files/build/versions/default/versions-py3, package
"click==7.1.2" is pinned; and in
files/build/versions/dockers/docker-sonic-vs/versions-py3, package
"Click==7.0" is pinned.
Without this fix, the aggregated versions-py3 file used for building
docker-sonic-vs looks like below:
...
click==7.1.2
Click==7.0
...
However, we actually want "click==7.0" to overwrite "click==7.1.2" for
docker-sonic-vs build.
Dynamic Port Breakout fall in case "autoneg" field exist in config_db.
- How I did it
Added "autoneg" field in sonic-port yang model.
- How to verify it
Add "autoneg" field into config_db like this:
"Ethernet8": {
"index": "2",
"lanes": "8,9,10,11",
"fec": "rs",
"pfc_asym": "off",
"mtu": "9100",
"alias": "Ethernet8",
"admin_status": "up",
"autoneg": "on",
"speed": "100000",
},
Changes:
-- YANG models for PORTCHANNEL_MEMBER table.
-- Yang Model Test.
-- Yang Mgmt Test with PORTCHANNEL_MEMBER table in config_db.json
Signed-off-by: Praveen Chaudhary <pchaudhary@linkedin.com>
Build Marvell kernel driver for prestera sai sdk
Builds interrupt and dma kernel driver
Removed the older method pre-compiled kernel module debian package and its makefile
* [yang-models]: Remove PLY Extensions and change translation code.
With assumption that TABLE_SEPARATOR and ENTRY_SEPARATOR for configDB is always "|",
translation from configDB.json to sonicYang.json can be done based on keys specified
in YANG Lists inside YANG models. So removing extensions is good idea.
Changes:
-- Remove use of regex in Translation code.
-- Remove regex Extensions from YANG models.
-- Improved debugging i.e. log on stdout in case of any Exception from sonic-yang-mgmt,
so that failed tests can be debugged faster. Also this is good to debug Dynamic
port breakout issues.
-- Minor Test changes.
Co-authored-by: lguohan <lguohan@gmail.com>
IPV4ANY is not valid value, fix to IPv4ANY
without this change, test case failed sometimes when the validation on IP_TYPE happens first and then PACKET_ACTION.
To add latest SAI drop REL_4.3.3.3 to SONIC which addresses the following CSP cases:
CS00012058054: [4.3][IPinIP][TTL-PIPE] IPinIP TTL Pipe Mode is NOT working it is behaving UNIFORM mode even programed as PIPE mode
CS00011227466: [4.3] Warmboot support with tunnel encap
* Updating version of hsflow daemon to apply
fix, which resolves problem of switching
between IPv4 and IPv6, in case when the
IPv4 has deleted for the interface.
The new release of hsflowd contains the fix for the issue: sflow/host-sflow@2703ecb
How I did it
HSFLOWD_VERSION env variable has changed in the rules to be pointed to the latest release of hsflowd.
How to verify it
sudo config sflow enable
sudo config loopback add Loopback1
sudo config int ip add Loopback1 a84f:97ff:fea7:33a5::fe80/64
sudo config int ip add Loopback1 192.168.101.1/24
sudo config sflow agent-id add Loopback1
sudo config sflow collector add Collector1 192.168.101.1
sudo config sflow collector add Collector2 a84f:97ff:fea7:33a5::fe80
use sudo sflowtool -p 6343 -l for checking sflow data
remove and add again the ipv4 entry of Loopback1.
hsflowd should change agent ip from IPv4 to IPv6 and wise versa, depending on IPv4 entry present or not.
Switching between IPs is being performed by hsflowd, based on IP address priority ranking.
Signed-off-by: Maksym Belei <Maksym_Belei@jabil.com>
The file device/mellanox/x86_64-mlnx_msn4410-r0/plugins/sfputil.py is not a software link for device/mellanox/x86_64-mlnx_msn2700-r0/plugins/sfputil.py. And it is still using python2 syntex which causes some SFP CLI error. The PR is to change it to a softlink and add 4410 support in device/mellanox/x86_64-mlnx_msn2700-r0/plugins/sfputil.py.
Fix the following issues:
Spectrum-2, Spectrum-3 | Port | Fix link issue when using 25 GbE rate between two ports while one is on Spectrum-2-based system and the other is on Spectrum-3-based system
All | warmboot | fail to upgrade from earlier SONiC versions with official SDK/FW 4.4.2306 (was on SONiC 201911)
All | What-Just-Happened | When enabling or disabling WJH under high traffic load to the host CPU, in very specific and low probability conditions, an error could occur, that may result in loss of data, channel failure or in extreme cases SW failure
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
* 1ee04fb (HEAD -> master, origin/master, origin/HEAD) Modified the tests to use mock functionality of get_child_port function under portconfig utility (#1464)
* 99d251f Enable PFCWD only on ports where PFC is enabled (#1508)
* eb7945f Warmboot script improvements - timeout exec, disable swss autorestart, remove trap (#1495)
* c7d4947 [show] Fix int status of LAGs, configured as Vlan members (#1478)
Signed-off-by: Maksym Belei <Maksym_Belei@jabil.com>
To improve management of docker-gbsyncd-vs. gbsyncd_startup.py simply spawned syncd processes and then exited. In that case, supervisord would no longer manage any processes in the container, and thus there was no way to know if a critical process had exited.
I recently created gbsyncdmgrd to be a more complete, robust replacement for gbsyncd_startup.py.
NOTE: This PR is dependent on the inclusion of gbsyncdmgrd in the sonic-sairedis repo. A submodule update is pending at
#7089
The default bgp connect retry timer is 120 seconds. A reconnection will happen 120 seconds if the initial connection fails. This PR aims to allow a more frequent retry.
The psample module was not loaded on barefoot platform. The loading of this module is a prerequisite for testing SFlow.
* add `.gitignore` to the `barefoot` subdirectory to overwrite ignore "platform/**/debian/*" in the root directory
Initialize fans and thermals lists on demand; make them properties in order to reduce Chassis object initialization time
Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
Update sonic-sairedis submodule and also update sonic-swss submodule as there are interdependent changes.
* src/sonic-sairedis 13474d1...bc58b0f (12):
> Add gbsyncdmgrd; deprecate gbsyncd_startup.py (#809)
> Remove gbsyncd_start.sh (#808)
> [gbsyncd] Fix shebang in gbsyncd_startup.py; Make script executable (#807)
> [saiasiccmp] Add saiasiccmp tool to compare 2 asic views (#791)
> [configure] Add -Wno-psabi to remove "passing argument changed in GCC 7.1" (#799)
> Update FlexCounter.cpp, use m_pollInterval in MUTEX lock (#797)
> [vs] Add special warm boot logic to populate default attributes (#796)
> [ci]: add vstest (#795)
> [tests] Add macsec unittest (#782)
> [debian/control] libsairedis-dev depends on libzmq5-dev (#794)
> [ci]: use build template (#793)
> Rename duplicate file name (#773)
* src/sonic-swss 0b0d24c...5adb73e (47):
> Initialize system port type variable (#1681)
> [Dynamic Buffer Calc] Enhance the field checking in table handling (#1680)
> Handle the clear request for 'Q_SHARED_ALL' (#1653)
> [MuxOrch] FDB ageout safety check (#1674)
> Deactivate mirror session only when session status is true in updateLagMember (#1666)
> Revert "[buffermgr] Support maximum port headroom checking (#1607)" (#1675)
> reduce severity of log to info in case of flush on non-existing member (#1669)
> Revert "[Dynamic buffer calc] Bug fix: Remove PGs from an administratively down port. (#1652)" (#1676)
> [Dynamic buffer calc] Bug fix: Remove PGs from an administratively down port. (#1652)
> [acl] Move ACL table constants to acltable.h (#1671)
> [nbrmgrd] added function to parse IP address from APP_DB (#1672)
> [MUX/PFCWD] Use in_ports for acls instead of seperate ACL table (#1670)
> [vog/systemlag] Voq lagid allocator (#1603)
> Add table descriptions for dynamic buffer calculation to the documents (#1664)
> [vstest/subintf] Add vs test case to validate processing sequence of APPL DB keys (#1663)
> Remove vxlanmgrd dependency on orchagent (#1647)
> Keep attribute order in bulk mode (#1659)
> [mux] VS test for neigh, route and fdb (#1656)
> [linksync] Netdev oper status determination using IFF_RUNNING (#1568)
> [portorch] parse on/off value from autoneg (#1658)
> [intfsorch] Create subport with the entry contains necessary attributes (#1650)
> [ci]: Purge swss before install (#1654)
> Update StateDB with error if state change failed, Update APP_DB in all state chg req (#1662)
> Added changes to handle dependency check in FdbSyncd and FpmSyncd for warm-boot (#1556)
> [synchronous mode] Add failure notification for SAI failures in synchronous mode (#1596)
> [acl] Enable VLAN ID qualifier for ACL rules (#1648)
> Updated PFCWD to use single ACL table for PFCWD and MUX (#1620)
> [orchagent] Increase SAI REDIS response timeout to support FW upgrade during init (Mellanox only). (#1637)
> [vstest/nhg]: use dvs_route fixture to make test_nhg more robust
> [vstest]: add dvs_route fixture
> [vstest/subintf] Update vs tests to validate physical port host interface vlan tag attribute (#1634)
> Remove useless header in macsecorch (#1628)
> Add SAI_INGRESS_PRIORITY_GROUP_STAT_DROPPED_PACKETS counter, create new FlexCounter group (#1600)
> fixed unsupported resource issue (#1641)
> [test_virtual_chassis]: use wait_for to make test more robust (#1640)
> spell check fixes (#1630)
> [bufferorch] Handle NOT IMPLEMENTED status returned during set attr operation (#1639)
> [ci]: run vstest
> [test_virtual_chassis]: use wait_for function to improve test robustness
> [Mux] Neighbor handling based on FDB entry (#1631)
> [ci]: use build template (#1633)
> Log level change from ERR to INFO for fetch systemports issue (#1632)
> Migrate serdes programming to port serdes object (#1611)
> [tests] Remove legacy saiattributelist.h dependency (#1608)
> [buffermgr] Support maximum port headroom checking (#1607)
> Support shared headroom pool on top of dynamic buffer calculation (#1581)
> Fix the compiling errors in gcc9 (#1621)
- Why I did it
The existing Fan led and Psu led object initialize itself to green color in init method. However, there are multiple daemons calls sonic platform API and there could be a case that:
A PSU is removed from system
Reboot switch
psud detects that 1 PSU is missing and set PSU led to red
Other daemon just start up and call sonic platform API, the API set PSU led to green by call PsuLed.init
This PR is a partial fix for the issue. As we also need guarantee that the led is initialized with a correct value. I checked existing psud and thermalctld code. psud always initialize the PSU led color on boot up, thermalcltd need some changes to initialize led color on the first run
- How I did it
Remove the led color initialization code from FanLed.init and PsuLed.init
- How to verify it
Manual test