Commit Graph

5074 Commits

Author SHA1 Message Date
Samuel Angebault
d410d26e26 [Arista] Fix Clearwater2 phy initialization when no configuration is provided (#8271)
Why I did it
Fix an issue on the Clearwater2 linecard.
When the linecard is started with a fresh image without configuration, phys would not be initialized.

How I did it
Added default_sku for Clearwater2 which prevents config-setup from failing to create a default config_db.json.
Added some extra logic in the phy-credo-init script to run the phy_config.sh of the hwsku pointed by default_sku if the DEVICE_METADATA.localhost.hwsku information is not populated in CONFIG_DB.

How to verify it
Booting an image with this change and without configuration will lead to the phys being initialized using the phy_config.sh from default_sku.
2021-09-14 09:36:55 -07:00
Nazarii Hnydyn
e3a827a14d [Mellanox] Advance hw-mgmt to V.7.0010.2346. (#8667)
Commits on Sep 01, 2021
hw-mgmt: attributes: Add PSU power sensor attributes d8fce39

Commits on Sep 02, 2021
Remove MFT package flint tool from hw-management dump generation. 53d06b2
hw-mgmt: debug: Add timeout to generate-dump.sh b661fa3 

Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2021-09-14 09:36:49 -07:00
Judy Joseph
1ca6dc9963 Submodule updates
sonic-swss

73f6f68 [Flex Counters] Delay flex counters even if tables are present in the DB (#1877)
5edb9e5 [buffer orch] Bugfix: Don't query counter SAI_BUFFER_POOL_STAT_XOFF_ROOM_WATERMARK_BYTES on a pool where it is not supported (#1857)
fce0c60 [crm] Fix for Issue Azure/sonic-buildimage#8036 (#1829)

sonic-utilities

2630ac1 [Fast-reboot] Set flex counters delay indicator to prevent flex counters enablement after fast-reboot (#1768)
606f1b1 [portstat pfcstat] Unify the packet number format in the output of portstat and pfcstat in all cases (#1755)
2c6a15e [ecnconfig] Fix exception seen during display and add unit tests (#1784)
9b1995e Fix logic in RIF counters print (#1732)

sonic-swss-comon

3e7b81f Add a new field for FLEX_COUNTER_TABLE to indicate delay for flex counters (#523)
2021-09-02 16:33:40 -07:00
Ann Pokora
e5e273d8a8 [yang]: sonic-yang-models updates for MPLS (#7881)
SONiC YANG model support in buildimage for MPLS:

sonic-yang-model support for MPLS enable/disable
sonic-yang-model support for MPLS CRM thresholds
2021-09-02 15:46:49 -07:00
Kebo Liu
17d0dc3a81 [Mellanox] remove sensor conf for SN4600 A0 platform due to EOL (#8629)
- Why I did it
SN4600 A0 platform was EOL, so there is no need to support it, sensor conf can be removed and we don't need to maintain 2 sensor conf files, only A1 platform is needed.

- How I did it
Remove get_sensors_conf_path which intends to load correct sensor conf for different(A0/A1) platforms.
Remove the sensor conf for A0 platform, rename previous sensor.conf.a1 to sensor.conf

- How to verify it
Run sensor test on the SN4600 platform.
2021-09-02 15:46:37 -07:00
richardyu
0f1d58a0c9 [SAIServer] sai server reads config from hwsku folder (#8625)
To enable saiserver docker on different platforms, it needs different configuration files. make the saiserver docker mount them in hwsku folder.

Co-authored-by: Ubuntu <richardyu@richardyu-ubuntu-vm0.trsxrdzozv2e1czsze2t05vqzh.ix.internal.cloudapp.net>
2021-09-02 15:46:23 -07:00
carl-nokia
75ef115e8e [Nokia ixs7215] sfp get_name test case fix (#8507)
Account for sfputil_helper indexing being 0 based

Co-authored-by: Carl Keene <keene@nokia.com>
2021-09-02 15:46:12 -07:00
shlomibitton
be5236b3b5 [Flex Counters] Reset flex counters delay flag on config DB when enable_counters script is called (#8500)
#### Why I did it
Reset flex counters delay flag on config DB when enable_counters script is called to allow enablement of flex counters in orchagent.

#### How I did it
Push to config DB 'false' value for delay indication when enable_counters script is called before enabling the counters.

#### How to verify it
Observe counters are created when enable_counters script is called.
2021-09-02 15:46:01 -07:00
Junchao-Mellanox
e925339a7e [Mellanox] Read PSU fan max/min speed per PSU (#8563)
#### Why I did it
New PSU could install different type of fan, so fan max/min speed should be read per PSU

#### How I did it
The existing implementation read PSU max/min fan speed from a common file, change it to read from per PSU file

#### How to verify it
Manual test
2021-09-02 15:45:26 -07:00
Aravind Mani
af98b9baf4 DellEMC: Z9332f fix LED issue (#8639) 2021-09-02 15:39:12 -07:00
Samuel Angebault
ae15bb953c [Arista] Fix flash size computation for Lodoga (#8622)
The Lodoga platform also matched crow which was hardcoding the flash
size to 3700. This change enables autodetect on Clearlake which in turns
allows autodetect for Lodoga.

The threshold was bumped from 3700 to 4000 because size computation can
differ slightly and report slightly above 3700.
2021-09-02 15:39:02 -07:00
Ying Xie
6b7fdd1bb8 [7050] define hwsku.json for Arista-7050QX-32S-S4Q31 to skip SFP checks for first 4 ports (#8624)
Why I did it
The first 4 ports on this dut are breakout ports. They might not always be connected in lab. Mark them as 'RJ45' to skip the SFP check since they are by default disabled.

How to verify it
run platform test_reboot.py

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2021-09-02 15:38:40 -07:00
Song Yuan
c9e01cf6ef [chassis] Set LAG Id range for 7800 chassis (#8052)
Configure LAG Id range in chassisdb.conf for 7800 chassis.
2021-09-02 15:38:28 -07:00
richardyu
1808d5894e [202012][saiserver docker]adds saiserver dependences (#8447)
Co-authored-by: richardyu-ms <richard.yu@microsoft.com>
2021-09-02 15:38:17 -07:00
Samuel Angebault
7ba0d3497f [Arista] Rely on automatic flash size detection for Lodoga (#8608)
Lodoga actually has a 8GB storage device.
LodogaSsd variant has a 30GB SSD drive.
However, in boot0 both were mishandled and assigned 4GB for legacy reasons.

Remove the hardcoding of the flash size and let boot0 autodetect the available space.
2021-09-02 15:38:04 -07:00
dflynn-Nokia
7a950cf49c [Nokia ixs7215] Add support for changing the console baud rate (#8595)
This commit adds support for changing the default console baud rate configured
within the U-Boot bootloader. That default baud rate is exposed via the value
of the U-Boot 'baudrate' environment variable. This commit removes logic that
hardcoded the console baud rate to 115200 and instead ensures that the U-Boot
'baudrate' variable is always used when constructing the Linux kernel boot
arguments used when booting Sonic.

A change is also made to rc.local to ensure that the specified baud rate is set
correctly in the serial getty service.
2021-09-02 15:37:54 -07:00
Shilong Liu
90e7f6b201 [build] Fix reproducible build issues (#8548)
* [build] Fix reproducible build issues
2021-09-02 15:37:42 -07:00
gechiang
8527e3fb18 BRCM Disable ACL Drop counted towards interface RX_DRP counters part II (#8596) 2021-09-02 15:37:17 -07:00
Samuel Angebault
6c6bfde24e
[202106][Arista] Update platform library submodules (#8643)
Update 202106 release with the Arista drivers from master.
This update is mostly targeted at chassis and is deemed stable.

Most notable chassis improvements:
 - fix psu reporting in platform api
 - powercycle fabrics on supervisor reboot
 - improve card powercycle reliability
 - fix led plugin when dealing with `Ethernet-Rec` and `Ethernet-IB`
 - fix system-health cli reporting
 - unset provisioning mode once the linecard has started
 - fix `show version` when running as `admin`
 - fix race between loading the eeprom module and sysfs file availablity
 - implement `get_all_asics` platform API
2021-09-01 23:23:11 -07:00
Judy Joseph
d7f5dded12 Revert "[bgpcfgd][voq] Fix for unit test failure in bgp config for voq switch (#8278)"
This reverts commit 2353c7decd.
2021-08-26 09:33:23 -07:00
Judy Joseph
dbcb55be66 Update sonic-swss submodule with
88a38f7 Ignore ALREADY_EXIST error in FDB creation (#1815)
b1c23f3 Change rif_rates.lua and port_rates.lua scripts to calculate rates correct (#1848)

Update sonic-utilities submodule with

cbc25d6 [config reload] Call systemctl reset-failed for snmp,telemetry,mgmt-framework services (#1773)
04dcd07 Improve config error handling on version_info (#1760)
e567a60 Load the database global_db. (#1752)
c15fb8f [sfputil] Gracefully handle improper 'specification_compliance' field (#1741)
39350f8 [dhcp_relay] Update CLI reference document and add a new API for ip address type (#1717)
18f13c6 [sonic-package-manager] switch from poetry-semver to semantic_version due to bugs found in poetry-semver (#1710)
b16724a [voq][chassis] VOQ cli show commands implementation (#1689)
9427cd6 [debug dump util] Match Infrastructure (#1666)
d9fb39b [route_check] Filter out VNET routes (#1612)
2021-08-25 22:52:58 -07:00
Judy Joseph
043e02e606 Update the sonic-linux-kernel submodule 2021-08-25 14:51:48 -07:00
byu343
4830ec99ae [Arista] Update phy-credo service for config load (#8005)
phy-credo.service will be restarted when running 'config reload'

Signed-off-by: Boyang Yu <byu@arista.com>
2021-08-25 14:41:36 -07:00
shlomibitton
edd6f4086c [dhcp_relay] Adapt config/show CLI commands to support DHCPv6 relay (#8211)
#### Why I did it
- Adapt config/show CLI commands to support DHCPv6 relay
- Support multiple dhcp servers assignment in one command
- Fix IP validation
- Adapt UT and add new UT cases

#### How I did it
- Modify config/show dhcp relay files
- Modify config/show UT files

#### How to verify it
This PR has a dependency on PR https://github.com/Azure/sonic-utilities/pull/1717
Build an image with the dependent PR and this PR
Use config/show DHCPv6 relay commands.
2021-08-25 12:45:03 -07:00
Christian Svensson
c7d4f5b8d8 [mgmt-framework]: Fix typo in mgmt_vars.j2 (#8475)
Signed-off-by: Christian Svensson <blue@cmd.nu>
2021-08-25 12:44:41 -07:00
vganesan-nokia
2353c7decd [bgpcfgd][voq] Fix for unit test failure in bgp config for voq switch (#8278)
The unit test failure was due to missing bgp graceful restart select
defer time configuration in voq_chassis.conf. Modified sample output
data file voq_chassis.conf to include this configuration.

Signed-off-by: vedganes <vedavinayagam.ganesan@nokia.com>
2021-08-25 12:44:23 -07:00
Vivek Reddy
b0f8633604 [Mellanox][master][SKU] sonic interface names are aligned to 4 instead of 8 for 4600/4600C platforms (#8155)
*Edited platform.json for 4600 & 4600C
*Edited hwsku.json and port_config.ini files for all the SKU's present under these platforms
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
2021-08-25 12:44:06 -07:00
Vivek Reddy
9a711645da [Mellanox] [master] Added D48C40 SKU for 4600C platform (#8201)
*Added new SKU for SN4600C Platform: Mellanox-SN4600C-D48C40
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
2021-08-25 12:43:47 -07:00
madhanmellanox
3057266ff6 Adding SKU Mellanox-SN3800-D100C12S2 (#8441)
* Adding SKU Mellanox-SN3800-D100C12S2

Co-authored-by: Madhan Babu <madhan@l-csi-0241l.mtl.labs.mlnx>
2021-08-25 12:43:16 -07:00
Kostiantyn Yarovyi
a83480f749 [Pcied] run by python 3
Why I did it
Pcied running by python 2.

How I did it
dropped python2 support and add python3 support for pcied in file docker-pmon.supervisord.conf.j2

How to verify it
docker exec pmon supervisorctl status
2021-08-25 12:38:14 -07:00
Alexander Allen
9a3e35ec83 [Mellanox] Upgrade Mellanox firmware tools to 4.17.0 (#8299)
- Why I did it
New release of MFT has the following changelog / RN
 Fixed an issue that resulted in getting MVPD read errors from the mlxfwmanager during fast reboot.
 Fixed mlxuptime sometimes generating a time less than previous due the wrong frequency calculation

- How I did it
Update makefile pointer to new version.

- How to verify it
Manually tested on all Mellanox platforms.
2021-08-25 12:37:36 -07:00
Volodymyr Samotiy
055d497799 [monit] Periodically monitor VNET route consistency (#8266)
*To run VNET route consistency check periodically.
*For any failure, the monit will raise alert based on return code.
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2021-08-25 12:37:19 -07:00
Alexander Allen
50dcc82ee0 [Mellanox] Add 2x100G and 4x50G breakout modes to MSN4410 platform (#8419)
- Why I did it
The MSN4410 platform was missing 2x100G and 4x50G supported breakout modes in platform.json

- How I did it
Added the aforementioned breakout modes to platform.json

- How to verify it
Run show interface breakout on a MSN4410 and verify that 2x100G and 4x50G are listed in the supported breakout modes for ports 1-24.
2021-08-25 12:36:31 -07:00
Samuel Angebault
8de67e0051 [Arista] Add VOQ information for Clearwater2 (#8508)
This change introduces 3 columns in the port_config.ini file.
These are coreId, corePortId and numVoq.
The ports for inband and recirc were also renamed properly.
2021-08-25 12:25:22 -07:00
Marty Y. Lok
f33b659aff Added Nokia IXR7250E support (#7809)
Why I did it
Support Nokia ixr7250E IMM and Supervisor cards

How I did it
Added modules x86_64-nokia_ixr7250e_sup-r0 and x86_64-nokia_ixr7250e_36x400g-r0 ../device/nokia directory.
Modified the platform/broadcom/one-image.mk to include NOKIA_IXR7250_PLATFORM_MODULE
Modified the platform/broadcom/rule.mk to include the platform-module-nokia.mk
2021-08-25 12:21:39 -07:00
abdosi
f9a2c096ab Enable sysctl fib_multipath_use_neigh (#8502)
Enable fib_multipath_use_neigh for v4
https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt

Why I did:
This is helpful if the neighbor are not directly connected then Kernel forward to unreachable neighbor option. With this option forwarding using neighbor state to be valid.
2021-08-25 12:21:20 -07:00
Stepan Blyshchak
9e9e2abdf7 [libteam][warm-reboot] fix issue in teamd warm-reboot that teamd starts (#8227)
with state of tdport from previous warm-reboot.

In case LAG was down before reboot, lacp->wr is not cleared.
In lacp_event_watch_port_flush_data we incremented nr_of_tdports and add
tdport to lacp->wr.state. In case lacp->wr.state already had this tdport
we do not set new state for tdport but appened a new item in
lacp->wr.state. In case we preformed warm-reboot and PortChannel member
was down, after reboot PortChannel member became up next warm-reboot
will initialize teamd with PortChannel member in down state.

Fix this issue by calling stop_wr_mode() when LAG was down. This was probably intended but missed.

#### Why I did it

To fix an issue seen in warm-reboot-sad test cases.

#### How I did it

I fixed it in SONiC libteam patch that adds warm-reboot support. Details in commit description.

#### How to verify it

Run warm-reboot-sad test on t0-56 topology.
2021-08-25 12:20:57 -07:00
Qi Luo
65c135a1ae Replace swsssdk with swsscommon in sonic-host-services (#8034)
#### Why I did it
swsssdk will be deprecated. Use swsscommon instead.

#### How to verify it
Unit test
2021-08-25 12:20:32 -07:00
minionatwork
1a41e76c45 [multi-asic] Fix for sonic-cfggen exception during platform string read (#8229)
Fix for sonic-cfggen exception during platform string read during fresh install and start of sonic in multi asic, /var/run/redisX/ is created after database docker is started.
2021-08-25 12:20:09 -07:00
Stephen Sun
288c49a085 Use predefined macro as vendor information (#8361)
#### Why I did it
Use a predefined variable to get vendor information when the swss docker container is created

#### How I did it
Use `{{ sonic_asic_platform }}` instead of `$SONIC_CFGGEN -y /etc/sonic/sonic_version.yml -v asic_type`

#### How to verify it
Manually test.
2021-08-25 12:19:09 -07:00
tomer-israel
69e32890e2 [Mellanox] fix syseeprom info values on mellanox simulator platforms msn4700 and msn4800 (#8387)
#### Why I did it
The values of the syseeprom were not valid

#### How I did it
I took the correct hexdump values from a real switch and created this hex file again

#### How to verify it
decode-syseeprom will display the new values
2021-08-25 12:18:37 -07:00
Ying Xie
34487eef5d [aboot] use ram partition for /var/log for devices with 3.7G disks (#8400)
Master/202012 image size grew quite a bit. 3.7G harddrive can no longer hold one image and safely upgrade to another image. Every bit of harddrive space is precious to save now.

Also sh syntax seemingly changed, [ condition ] && action was a legit syntax in 201911 branch but it is an error when condition not met with 202012 or later images. Change the syntax to if statement to avoid the issue.

Signed-off-by: Ying Xie ying.xie@microsoft.com
2021-08-25 12:18:19 -07:00
gechiang
c679ebf931 Reapply the fix to address setting MTU > 1500 causing portmgrd crash on BRCM platforms (#8472) 2021-08-25 12:17:56 -07:00
Kebo Liu
0108c7de58 [Mellanox] Upgrade hw-mgmt to 7.0100.2344 (#8463)
To pick up new PSU fan support from new hw-mgmt release

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2021-08-25 12:17:36 -07:00
carl-nokia
aef7c85695 [Nokia ixs7215] sfputil support + component tests (#8445)
Deliver sfputil support for sfputil show eeprom and sfputil reset along with some component test case fixes

Co-authored-by: Carl Keene <keene@nokia.com>
2021-08-25 12:17:07 -07:00
Vladyslav Morokhovych
47496ec8a4 [swss] Fix arp_update script (#8412)
Fix #7968

Issue is detected on SONiC.20201231.11

In test_static_route.py::test_static_route_ecmp static routes are configured, but neighbors are not resolved after config reload even after 10 minutes.
It looks like the arp_update script is starting to ping when Vlan1000 is not fully configured.
When issue is reproduced, stuck ping6 process is observed in swss container :

USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         180  0.1  0.0   6296  1272 pts/0    S    17:03   0:03 ping6 -I Vlan1000 -n -q -i 0 -c 1 -W 0 ff02::1
And when arp_update script successfully resolves neighbors, we observe sleep 300 instead of ping process
2021-08-25 12:16:42 -07:00
Shi Su
499ad9141b [FRR]: Upgrade FRR to frr-7.5.1-s1 tag (#8443)
Update FRR 7.5.1 head.
2021-08-25 12:16:19 -07:00
carl-nokia
61dfcae4e0 [Nokia] Add hwsku.json for the Nokia-7215 (#8372)
* add hwsku.json for the Nokia-7215
* added required default_brkout_mode to hwsku as its not optional
* remove tabs from the file so spacing consistent

Co-authored-by: Carl Keene <keene@nokia.com>
2021-08-25 12:16:01 -07:00
shlomibitton
cbb825cb4b [hostcfgd] Delay hostcfgd and aaastatsd for faster boot time (#7965)
#### Why I did it
hostcfgd is starting at the same time as 'create_switch' method is called on orchagent process.
This introduce a degradation on the function execution time which eventually cause the fast-boot flow and a boot scenarion in general to run slower (~6 seconds).
This change will delay the start time of this daemon.
The aaastatsd will delay as well since it has a dependency on hostcfgd, so it is required to delay both.
90 seconds determined as the maximum allowed downtime for control plane to come back up on fast-boot flow.

#### How I did it
Add two timers for hostcfgd and aaastatsd  services in order to delay the startup of these services.

#### How to verify it
Install an image with this change and observe the daemons start 90 seconds after the system boot.
2021-08-25 12:15:16 -07:00
jerseyang
418ee0cd38 enable the emc2305 fan controller and NCP power controller 30ms timeout mechanism (#8138)
Why I did it
fix the dx010 system eeprom unavailable issue

How I did it
enable the i2c slave 30ms timeout mechanism

How to verify it
i2cstress test in DX010 iSMT controller bus

Co-authored-by: nicwu-cel <nicwu@celestica.com>
2021-08-25 12:14:59 -07:00