Commit Graph

5266 Commits

Author SHA1 Message Date
Junchao-Mellanox
e925339a7e [Mellanox] Read PSU fan max/min speed per PSU (#8563)
#### Why I did it
New PSU could install different type of fan, so fan max/min speed should be read per PSU

#### How I did it
The existing implementation read PSU max/min fan speed from a common file, change it to read from per PSU file

#### How to verify it
Manual test
2021-09-02 15:45:26 -07:00
Aravind Mani
af98b9baf4 DellEMC: Z9332f fix LED issue (#8639) 2021-09-02 15:39:12 -07:00
Samuel Angebault
ae15bb953c [Arista] Fix flash size computation for Lodoga (#8622)
The Lodoga platform also matched crow which was hardcoding the flash
size to 3700. This change enables autodetect on Clearlake which in turns
allows autodetect for Lodoga.

The threshold was bumped from 3700 to 4000 because size computation can
differ slightly and report slightly above 3700.
2021-09-02 15:39:02 -07:00
Ying Xie
6b7fdd1bb8 [7050] define hwsku.json for Arista-7050QX-32S-S4Q31 to skip SFP checks for first 4 ports (#8624)
Why I did it
The first 4 ports on this dut are breakout ports. They might not always be connected in lab. Mark them as 'RJ45' to skip the SFP check since they are by default disabled.

How to verify it
run platform test_reboot.py

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2021-09-02 15:38:40 -07:00
Song Yuan
c9e01cf6ef [chassis] Set LAG Id range for 7800 chassis (#8052)
Configure LAG Id range in chassisdb.conf for 7800 chassis.
2021-09-02 15:38:28 -07:00
richardyu
1808d5894e [202012][saiserver docker]adds saiserver dependences (#8447)
Co-authored-by: richardyu-ms <richard.yu@microsoft.com>
2021-09-02 15:38:17 -07:00
Samuel Angebault
7ba0d3497f [Arista] Rely on automatic flash size detection for Lodoga (#8608)
Lodoga actually has a 8GB storage device.
LodogaSsd variant has a 30GB SSD drive.
However, in boot0 both were mishandled and assigned 4GB for legacy reasons.

Remove the hardcoding of the flash size and let boot0 autodetect the available space.
2021-09-02 15:38:04 -07:00
dflynn-Nokia
7a950cf49c [Nokia ixs7215] Add support for changing the console baud rate (#8595)
This commit adds support for changing the default console baud rate configured
within the U-Boot bootloader. That default baud rate is exposed via the value
of the U-Boot 'baudrate' environment variable. This commit removes logic that
hardcoded the console baud rate to 115200 and instead ensures that the U-Boot
'baudrate' variable is always used when constructing the Linux kernel boot
arguments used when booting Sonic.

A change is also made to rc.local to ensure that the specified baud rate is set
correctly in the serial getty service.
2021-09-02 15:37:54 -07:00
Shilong Liu
90e7f6b201 [build] Fix reproducible build issues (#8548)
* [build] Fix reproducible build issues
2021-09-02 15:37:42 -07:00
gechiang
8527e3fb18 BRCM Disable ACL Drop counted towards interface RX_DRP counters part II (#8596) 2021-09-02 15:37:17 -07:00
Samuel Angebault
6c6bfde24e
[202106][Arista] Update platform library submodules (#8643)
Update 202106 release with the Arista drivers from master.
This update is mostly targeted at chassis and is deemed stable.

Most notable chassis improvements:
 - fix psu reporting in platform api
 - powercycle fabrics on supervisor reboot
 - improve card powercycle reliability
 - fix led plugin when dealing with `Ethernet-Rec` and `Ethernet-IB`
 - fix system-health cli reporting
 - unset provisioning mode once the linecard has started
 - fix `show version` when running as `admin`
 - fix race between loading the eeprom module and sysfs file availablity
 - implement `get_all_asics` platform API
2021-09-01 23:23:11 -07:00
Judy Joseph
d7f5dded12 Revert "[bgpcfgd][voq] Fix for unit test failure in bgp config for voq switch (#8278)"
This reverts commit 2353c7decd.
2021-08-26 09:33:23 -07:00
Judy Joseph
dbcb55be66 Update sonic-swss submodule with
88a38f7 Ignore ALREADY_EXIST error in FDB creation (#1815)
b1c23f3 Change rif_rates.lua and port_rates.lua scripts to calculate rates correct (#1848)

Update sonic-utilities submodule with

cbc25d6 [config reload] Call systemctl reset-failed for snmp,telemetry,mgmt-framework services (#1773)
04dcd07 Improve config error handling on version_info (#1760)
e567a60 Load the database global_db. (#1752)
c15fb8f [sfputil] Gracefully handle improper 'specification_compliance' field (#1741)
39350f8 [dhcp_relay] Update CLI reference document and add a new API for ip address type (#1717)
18f13c6 [sonic-package-manager] switch from poetry-semver to semantic_version due to bugs found in poetry-semver (#1710)
b16724a [voq][chassis] VOQ cli show commands implementation (#1689)
9427cd6 [debug dump util] Match Infrastructure (#1666)
d9fb39b [route_check] Filter out VNET routes (#1612)
2021-08-25 22:52:58 -07:00
Judy Joseph
043e02e606 Update the sonic-linux-kernel submodule 2021-08-25 14:51:48 -07:00
byu343
4830ec99ae [Arista] Update phy-credo service for config load (#8005)
phy-credo.service will be restarted when running 'config reload'

Signed-off-by: Boyang Yu <byu@arista.com>
2021-08-25 14:41:36 -07:00
shlomibitton
edd6f4086c [dhcp_relay] Adapt config/show CLI commands to support DHCPv6 relay (#8211)
#### Why I did it
- Adapt config/show CLI commands to support DHCPv6 relay
- Support multiple dhcp servers assignment in one command
- Fix IP validation
- Adapt UT and add new UT cases

#### How I did it
- Modify config/show dhcp relay files
- Modify config/show UT files

#### How to verify it
This PR has a dependency on PR https://github.com/Azure/sonic-utilities/pull/1717
Build an image with the dependent PR and this PR
Use config/show DHCPv6 relay commands.
2021-08-25 12:45:03 -07:00
Christian Svensson
c7d4f5b8d8 [mgmt-framework]: Fix typo in mgmt_vars.j2 (#8475)
Signed-off-by: Christian Svensson <blue@cmd.nu>
2021-08-25 12:44:41 -07:00
vganesan-nokia
2353c7decd [bgpcfgd][voq] Fix for unit test failure in bgp config for voq switch (#8278)
The unit test failure was due to missing bgp graceful restart select
defer time configuration in voq_chassis.conf. Modified sample output
data file voq_chassis.conf to include this configuration.

Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com>
2021-08-25 12:44:23 -07:00
Vivek Reddy
b0f8633604 [Mellanox][master][SKU] sonic interface names are aligned to 4 instead of 8 for 4600/4600C platforms (#8155)
*Edited platform.json for 4600 & 4600C
*Edited hwsku.json and port_config.ini files for all the SKU's present under these platforms
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
2021-08-25 12:44:06 -07:00
Vivek Reddy
9a711645da [Mellanox] [master] Added D48C40 SKU for 4600C platform (#8201)
*Added new SKU for SN4600C Platform: Mellanox-SN4600C-D48C40
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
2021-08-25 12:43:47 -07:00
madhanmellanox
3057266ff6 Adding SKU Mellanox-SN3800-D100C12S2 (#8441)
* Adding SKU Mellanox-SN3800-D100C12S2

Co-authored-by: Madhan Babu <madhan@l-csi-0241l.mtl.labs.mlnx>
2021-08-25 12:43:16 -07:00
Kostiantyn Yarovyi
a83480f749 [Pcied] run by python 3
Why I did it
Pcied running by python 2.

How I did it
dropped python2 support and add python3 support for pcied in file docker-pmon.supervisord.conf.j2

How to verify it
docker exec pmon supervisorctl status
2021-08-25 12:38:14 -07:00
Alexander Allen
9a3e35ec83 [Mellanox] Upgrade Mellanox firmware tools to 4.17.0 (#8299)
- Why I did it
New release of MFT has the following changelog / RN
 Fixed an issue that resulted in getting MVPD read errors from the mlxfwmanager during fast reboot.
 Fixed mlxuptime sometimes generating a time less than previous due the wrong frequency calculation

- How I did it
Update makefile pointer to new version.

- How to verify it
Manually tested on all Mellanox platforms.
2021-08-25 12:37:36 -07:00
Volodymyr Samotiy
055d497799 [monit] Periodically monitor VNET route consistency (#8266)
*To run VNET route consistency check periodically.
*For any failure, the monit will raise alert based on return code.
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2021-08-25 12:37:19 -07:00
Alexander Allen
50dcc82ee0 [Mellanox] Add 2x100G and 4x50G breakout modes to MSN4410 platform (#8419)
- Why I did it
The MSN4410 platform was missing 2x100G and 4x50G supported breakout modes in platform.json

- How I did it
Added the aforementioned breakout modes to platform.json

- How to verify it
Run show interface breakout on a MSN4410 and verify that 2x100G and 4x50G are listed in the supported breakout modes for ports 1-24.
2021-08-25 12:36:31 -07:00
Samuel Angebault
8de67e0051 [Arista] Add VOQ information for Clearwater2 (#8508)
This change introduces 3 columns in the port_config.ini file.
These are coreId, corePortId and numVoq.
The ports for inband and recirc were also renamed properly.
2021-08-25 12:25:22 -07:00
Marty Y. Lok
f33b659aff Added Nokia IXR7250E support (#7809)
Why I did it
Support Nokia ixr7250E IMM and Supervisor cards

How I did it
Added modules x86_64-nokia_ixr7250e_sup-r0 and x86_64-nokia_ixr7250e_36x400g-r0 ../device/nokia directory.
Modified the platform/broadcom/one-image.mk to include NOKIA_IXR7250_PLATFORM_MODULE
Modified the platform/broadcom/rule.mk to include the platform-module-nokia.mk
2021-08-25 12:21:39 -07:00
abdosi
f9a2c096ab Enable sysctl fib_multipath_use_neigh (#8502)
Enable fib_multipath_use_neigh for v4
https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt

Why I did:
This is helpful if the neighbor are not directly connected then Kernel forward to unreachable neighbor option. With this option forwarding using neighbor state to be valid.
2021-08-25 12:21:20 -07:00
Stepan Blyshchak
9e9e2abdf7 [libteam][warm-reboot] fix issue in teamd warm-reboot that teamd starts (#8227)
with state of tdport from previous warm-reboot.

In case LAG was down before reboot, lacp->wr is not cleared.
In lacp_event_watch_port_flush_data we incremented nr_of_tdports and add
tdport to lacp->wr.state. In case lacp->wr.state already had this tdport
we do not set new state for tdport but appened a new item in
lacp->wr.state. In case we preformed warm-reboot and PortChannel member
was down, after reboot PortChannel member became up next warm-reboot
will initialize teamd with PortChannel member in down state.

Fix this issue by calling stop_wr_mode() when LAG was down. This was probably intended but missed.

#### Why I did it

To fix an issue seen in warm-reboot-sad test cases.

#### How I did it

I fixed it in SONiC libteam patch that adds warm-reboot support. Details in commit description.

#### How to verify it

Run warm-reboot-sad test on t0-56 topology.
2021-08-25 12:20:57 -07:00
Qi Luo
65c135a1ae Replace swsssdk with swsscommon in sonic-host-services (#8034)
#### Why I did it
swsssdk will be deprecated. Use swsscommon instead.

#### How to verify it
Unit test
2021-08-25 12:20:32 -07:00
minionatwork
1a41e76c45 [multi-asic] Fix for sonic-cfggen exception during platform string read (#8229)
Fix for sonic-cfggen exception during platform string read during fresh install and start of sonic in multi asic, /var/run/redisX/ is created after database docker is started.
2021-08-25 12:20:09 -07:00
Stephen Sun
288c49a085 Use predefined macro as vendor information (#8361)
#### Why I did it
Use a predefined variable to get vendor information when the swss docker container is created

#### How I did it
Use `{{ sonic_asic_platform }}` instead of `$SONIC_CFGGEN -y /etc/sonic/sonic_version.yml -v asic_type`

#### How to verify it
Manually test.
2021-08-25 12:19:09 -07:00
tomer-israel
69e32890e2 [Mellanox] fix syseeprom info values on mellanox simulator platforms msn4700 and msn4800 (#8387)
#### Why I did it
The values of the syseeprom were not valid

#### How I did it
I took the correct hexdump values from a real switch and created this hex file again

#### How to verify it
decode-syseeprom will display the new values
2021-08-25 12:18:37 -07:00
Ying Xie
34487eef5d [aboot] use ram partition for /var/log for devices with 3.7G disks (#8400)
Master/202012 image size grew quite a bit. 3.7G harddrive can no longer hold one image and safely upgrade to another image. Every bit of harddrive space is precious to save now.

Also sh syntax seemingly changed, [ condition ] && action was a legit syntax in 201911 branch but it is an error when condition not met with 202012 or later images. Change the syntax to if statement to avoid the issue.

Signed-off-by: Ying Xie ying.xie@microsoft.com
2021-08-25 12:18:19 -07:00
gechiang
c679ebf931 Reapply the fix to address setting MTU > 1500 causing portmgrd crash on BRCM platforms (#8472) 2021-08-25 12:17:56 -07:00
Kebo Liu
0108c7de58 [Mellanox] Upgrade hw-mgmt to 7.0100.2344 (#8463)
To pick up new PSU fan support from new hw-mgmt release

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2021-08-25 12:17:36 -07:00
carl-nokia
aef7c85695 [Nokia ixs7215] sfputil support + component tests (#8445)
Deliver sfputil support for sfputil show eeprom and sfputil reset along with some component test case fixes

Co-authored-by: Carl Keene <keene@nokia.com>
2021-08-25 12:17:07 -07:00
Vladyslav Morokhovych
47496ec8a4 [swss] Fix arp_update script (#8412)
Fix #7968

Issue is detected on SONiC.20201231.11

In test_static_route.py::test_static_route_ecmp static routes are configured, but neighbors are not resolved after config reload even after 10 minutes.
It looks like the arp_update script is starting to ping when Vlan1000 is not fully configured.
When issue is reproduced, stuck ping6 process is observed in swss container :

USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         180  0.1  0.0   6296  1272 pts/0    S    17:03   0:03 ping6 -I Vlan1000 -n -q -i 0 -c 1 -W 0 ff02::1
And when arp_update script successfully resolves neighbors, we observe sleep 300 instead of ping process
2021-08-25 12:16:42 -07:00
Shi Su
499ad9141b [FRR]: Upgrade FRR to frr-7.5.1-s1 tag (#8443)
Update FRR 7.5.1 head.
2021-08-25 12:16:19 -07:00
carl-nokia
61dfcae4e0 [Nokia] Add hwsku.json for the Nokia-7215 (#8372)
* add hwsku.json for the Nokia-7215
* added required default_brkout_mode to hwsku as its not optional
* remove tabs from the file so spacing consistent

Co-authored-by: Carl Keene <keene@nokia.com>
2021-08-25 12:16:01 -07:00
shlomibitton
cbb825cb4b [hostcfgd] Delay hostcfgd and aaastatsd for faster boot time (#7965)
#### Why I did it
hostcfgd is starting at the same time as 'create_switch' method is called on orchagent process.
This introduce a degradation on the function execution time which eventually cause the fast-boot flow and a boot scenarion in general to run slower (~6 seconds).
This change will delay the start time of this daemon.
The aaastatsd will delay as well since it has a dependency on hostcfgd, so it is required to delay both.
90 seconds determined as the maximum allowed downtime for control plane to come back up on fast-boot flow.

#### How I did it
Add two timers for hostcfgd and aaastatsd  services in order to delay the startup of these services.

#### How to verify it
Install an image with this change and observe the daemons start 90 seconds after the system boot.
2021-08-25 12:15:16 -07:00
jerseyang
418ee0cd38 enable the emc2305 fan controller and NCP power controller 30ms timeout mechanism (#8138)
Why I did it
fix the dx010 system eeprom unavailable issue

How I did it
enable the i2c slave 30ms timeout mechanism

How to verify it
i2cstress test in DX010 iSMT controller bus

Co-authored-by: nicwu-cel <nicwu@celestica.com>
2021-08-25 12:14:59 -07:00
carl-nokia
43fa47d486 [sonic-device-data]: add port_type to OPTIONAL_PORT_ATTRIBUTES (#8370)
enable automated test suites to selectively run relevant tests ( or not run tests ) based upon a new port_type identifier in hwsku.json

How I did it
Modified the valid optional fields in validity check for hwsku.json per recommendation from Joe in
https://github.com/Azure/sonic-mgmt/pull/2654/files

Co-authored-by: Carl Keene <keene@nokia.com>
2021-08-25 12:14:17 -07:00
dflynn-Nokia
fb072d84cb [Nokia ixs7215] Watchdog timer support (#8377) 2021-08-25 12:13:44 -07:00
Rajkumar-Marvell
31f4154787 [reboot-cause] Fixed determine-reboot-cause.service failure. (#8210)
Signed-off-by: Rajkumar Pennadam Ramamoorthy rpennadamram@marvell.com

Why I did it
Install sonic image from ONIE. Once system is up, execute "config reload" command.

Root cause is that "determine-reboot-cause.service" was in failed state.
root@sonic:/host/reboot-cause# systemctl list-units --failed
UNIT LOAD ACTIVE SUB DESCRIPTION
● determine-reboot-cause.service loaded failed failed Reboot cause determination service

How I did it
Fixed the issue by setting default reason to "REBOOT_CAUSE_UNKNOWN" instead of "None".

How to verify it
Check " determine-reboot-cause.service' loaded successfully post image installation from ONIE.
Verify "reboot-cause.txt" file is created and config reload succeeds.
2021-08-25 12:13:15 -07:00
Shilong Liu
9fb60721f8 Reproducible build add docker image debian* to white list. (#8330)
#### Why I did it
1. Add version control for debian* docker image to white list.
2. Always record docker image sha256 value, regardless of white list.
2021-08-25 12:12:42 -07:00
Kebo Liu
dd9a9ba4c3 [Mellanox] Add new sensor conf to support SN4410 A1 system (#8379)
#### Why I did it

New SN410 A1 system has a different sensor layout with A0 system, needs a new sensor conf file to support it.

#### How I did it

Since the SN4410 A1 system use exactly the same sensor layout as the SN4700 A1 system, so add a symbol link linking to the SN4700 A1 sensor conf file to reuse.

#### How to verify it

Run sensor test against the SN4410 A1 system;
Run platform related regression test against the SN4410 A1 system
2021-08-25 12:12:18 -07:00
tjchadaga
8b780d68a9 Fix TH3 Warm-reboot failure due to Tunnel termination SAI failure (#8395) 2021-08-25 12:12:00 -07:00
gechiang
280df2ee46 BRCM Disable ACL Drop counted towards interface RX_DRP counters (#8382)
* BRCM Disable ACL Drop counted towards interface RX_DRP counters
2021-08-25 12:11:23 -07:00
judyjoseph
6bbfafb045 [build]: Update the make cache mode for opennsl-module-dnx (#8391)
Fix warning shown during compilation

[ DPKG ] Cache is not enabled for opennsl-modules-dnx_5.0.0.4_amd64.deb package
2021-08-25 12:10:59 -07:00