Commit Graph

5096 Commits

Author SHA1 Message Date
madhanmellanox
3057266ff6 Adding SKU Mellanox-SN3800-D100C12S2 (#8441)
* Adding SKU Mellanox-SN3800-D100C12S2

Co-authored-by: Madhan Babu <madhan@l-csi-0241l.mtl.labs.mlnx>
2021-08-25 12:43:16 -07:00
Kostiantyn Yarovyi
a83480f749 [Pcied] run by python 3
Why I did it
Pcied running by python 2.

How I did it
dropped python2 support and add python3 support for pcied in file docker-pmon.supervisord.conf.j2

How to verify it
docker exec pmon supervisorctl status
2021-08-25 12:38:14 -07:00
Alexander Allen
9a3e35ec83 [Mellanox] Upgrade Mellanox firmware tools to 4.17.0 (#8299)
- Why I did it
New release of MFT has the following changelog / RN
 Fixed an issue that resulted in getting MVPD read errors from the mlxfwmanager during fast reboot.
 Fixed mlxuptime sometimes generating a time less than previous due the wrong frequency calculation

- How I did it
Update makefile pointer to new version.

- How to verify it
Manually tested on all Mellanox platforms.
2021-08-25 12:37:36 -07:00
Volodymyr Samotiy
055d497799 [monit] Periodically monitor VNET route consistency (#8266)
*To run VNET route consistency check periodically.
*For any failure, the monit will raise alert based on return code.
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
2021-08-25 12:37:19 -07:00
Alexander Allen
50dcc82ee0 [Mellanox] Add 2x100G and 4x50G breakout modes to MSN4410 platform (#8419)
- Why I did it
The MSN4410 platform was missing 2x100G and 4x50G supported breakout modes in platform.json

- How I did it
Added the aforementioned breakout modes to platform.json

- How to verify it
Run show interface breakout on a MSN4410 and verify that 2x100G and 4x50G are listed in the supported breakout modes for ports 1-24.
2021-08-25 12:36:31 -07:00
Samuel Angebault
8de67e0051 [Arista] Add VOQ information for Clearwater2 (#8508)
This change introduces 3 columns in the port_config.ini file.
These are coreId, corePortId and numVoq.
The ports for inband and recirc were also renamed properly.
2021-08-25 12:25:22 -07:00
Marty Y. Lok
f33b659aff Added Nokia IXR7250E support (#7809)
Why I did it
Support Nokia ixr7250E IMM and Supervisor cards

How I did it
Added modules x86_64-nokia_ixr7250e_sup-r0 and x86_64-nokia_ixr7250e_36x400g-r0 ../device/nokia directory.
Modified the platform/broadcom/one-image.mk to include NOKIA_IXR7250_PLATFORM_MODULE
Modified the platform/broadcom/rule.mk to include the platform-module-nokia.mk
2021-08-25 12:21:39 -07:00
abdosi
f9a2c096ab Enable sysctl fib_multipath_use_neigh (#8502)
Enable fib_multipath_use_neigh for v4
https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt

Why I did:
This is helpful if the neighbor are not directly connected then Kernel forward to unreachable neighbor option. With this option forwarding using neighbor state to be valid.
2021-08-25 12:21:20 -07:00
Stepan Blyshchak
9e9e2abdf7 [libteam][warm-reboot] fix issue in teamd warm-reboot that teamd starts (#8227)
with state of tdport from previous warm-reboot.

In case LAG was down before reboot, lacp->wr is not cleared.
In lacp_event_watch_port_flush_data we incremented nr_of_tdports and add
tdport to lacp->wr.state. In case lacp->wr.state already had this tdport
we do not set new state for tdport but appened a new item in
lacp->wr.state. In case we preformed warm-reboot and PortChannel member
was down, after reboot PortChannel member became up next warm-reboot
will initialize teamd with PortChannel member in down state.

Fix this issue by calling stop_wr_mode() when LAG was down. This was probably intended but missed.

#### Why I did it

To fix an issue seen in warm-reboot-sad test cases.

#### How I did it

I fixed it in SONiC libteam patch that adds warm-reboot support. Details in commit description.

#### How to verify it

Run warm-reboot-sad test on t0-56 topology.
2021-08-25 12:20:57 -07:00
Qi Luo
65c135a1ae Replace swsssdk with swsscommon in sonic-host-services (#8034)
#### Why I did it
swsssdk will be deprecated. Use swsscommon instead.

#### How to verify it
Unit test
2021-08-25 12:20:32 -07:00
minionatwork
1a41e76c45 [multi-asic] Fix for sonic-cfggen exception during platform string read (#8229)
Fix for sonic-cfggen exception during platform string read during fresh install and start of sonic in multi asic, /var/run/redisX/ is created after database docker is started.
2021-08-25 12:20:09 -07:00
Stephen Sun
288c49a085 Use predefined macro as vendor information (#8361)
#### Why I did it
Use a predefined variable to get vendor information when the swss docker container is created

#### How I did it
Use `{{ sonic_asic_platform }}` instead of `$SONIC_CFGGEN -y /etc/sonic/sonic_version.yml -v asic_type`

#### How to verify it
Manually test.
2021-08-25 12:19:09 -07:00
tomer-israel
69e32890e2 [Mellanox] fix syseeprom info values on mellanox simulator platforms msn4700 and msn4800 (#8387)
#### Why I did it
The values of the syseeprom were not valid

#### How I did it
I took the correct hexdump values from a real switch and created this hex file again

#### How to verify it
decode-syseeprom will display the new values
2021-08-25 12:18:37 -07:00
Ying Xie
34487eef5d [aboot] use ram partition for /var/log for devices with 3.7G disks (#8400)
Master/202012 image size grew quite a bit. 3.7G harddrive can no longer hold one image and safely upgrade to another image. Every bit of harddrive space is precious to save now.

Also sh syntax seemingly changed, [ condition ] && action was a legit syntax in 201911 branch but it is an error when condition not met with 202012 or later images. Change the syntax to if statement to avoid the issue.

Signed-off-by: Ying Xie ying.xie@microsoft.com
2021-08-25 12:18:19 -07:00
gechiang
c679ebf931 Reapply the fix to address setting MTU > 1500 causing portmgrd crash on BRCM platforms (#8472) 2021-08-25 12:17:56 -07:00
Kebo Liu
0108c7de58 [Mellanox] Upgrade hw-mgmt to 7.0100.2344 (#8463)
To pick up new PSU fan support from new hw-mgmt release

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2021-08-25 12:17:36 -07:00
carl-nokia
aef7c85695 [Nokia ixs7215] sfputil support + component tests (#8445)
Deliver sfputil support for sfputil show eeprom and sfputil reset along with some component test case fixes

Co-authored-by: Carl Keene <keene@nokia.com>
2021-08-25 12:17:07 -07:00
Vladyslav Morokhovych
47496ec8a4 [swss] Fix arp_update script (#8412)
Fix #7968

Issue is detected on SONiC.20201231.11

In test_static_route.py::test_static_route_ecmp static routes are configured, but neighbors are not resolved after config reload even after 10 minutes.
It looks like the arp_update script is starting to ping when Vlan1000 is not fully configured.
When issue is reproduced, stuck ping6 process is observed in swss container :

USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         180  0.1  0.0   6296  1272 pts/0    S    17:03   0:03 ping6 -I Vlan1000 -n -q -i 0 -c 1 -W 0 ff02::1
And when arp_update script successfully resolves neighbors, we observe sleep 300 instead of ping process
2021-08-25 12:16:42 -07:00
Shi Su
499ad9141b [FRR]: Upgrade FRR to frr-7.5.1-s1 tag (#8443)
Update FRR 7.5.1 head.
2021-08-25 12:16:19 -07:00
carl-nokia
61dfcae4e0 [Nokia] Add hwsku.json for the Nokia-7215 (#8372)
* add hwsku.json for the Nokia-7215
* added required default_brkout_mode to hwsku as its not optional
* remove tabs from the file so spacing consistent

Co-authored-by: Carl Keene <keene@nokia.com>
2021-08-25 12:16:01 -07:00
shlomibitton
cbb825cb4b [hostcfgd] Delay hostcfgd and aaastatsd for faster boot time (#7965)
#### Why I did it
hostcfgd is starting at the same time as 'create_switch' method is called on orchagent process.
This introduce a degradation on the function execution time which eventually cause the fast-boot flow and a boot scenarion in general to run slower (~6 seconds).
This change will delay the start time of this daemon.
The aaastatsd will delay as well since it has a dependency on hostcfgd, so it is required to delay both.
90 seconds determined as the maximum allowed downtime for control plane to come back up on fast-boot flow.

#### How I did it
Add two timers for hostcfgd and aaastatsd  services in order to delay the startup of these services.

#### How to verify it
Install an image with this change and observe the daemons start 90 seconds after the system boot.
2021-08-25 12:15:16 -07:00
jerseyang
418ee0cd38 enable the emc2305 fan controller and NCP power controller 30ms timeout mechanism (#8138)
Why I did it
fix the dx010 system eeprom unavailable issue

How I did it
enable the i2c slave 30ms timeout mechanism

How to verify it
i2cstress test in DX010 iSMT controller bus

Co-authored-by: nicwu-cel <nicwu@celestica.com>
2021-08-25 12:14:59 -07:00
carl-nokia
43fa47d486 [sonic-device-data]: add port_type to OPTIONAL_PORT_ATTRIBUTES (#8370)
enable automated test suites to selectively run relevant tests ( or not run tests ) based upon a new port_type identifier in hwsku.json

How I did it
Modified the valid optional fields in validity check for hwsku.json per recommendation from Joe in
https://github.com/Azure/sonic-mgmt/pull/2654/files

Co-authored-by: Carl Keene <keene@nokia.com>
2021-08-25 12:14:17 -07:00
dflynn-Nokia
fb072d84cb [Nokia ixs7215] Watchdog timer support (#8377) 2021-08-25 12:13:44 -07:00
Rajkumar-Marvell
31f4154787 [reboot-cause] Fixed determine-reboot-cause.service failure. (#8210)
Signed-off-by: Rajkumar Pennadam Ramamoorthy rpennadamram@marvell.com

Why I did it
Install sonic image from ONIE. Once system is up, execute "config reload" command.

Root cause is that "determine-reboot-cause.service" was in failed state.
root@sonic:/host/reboot-cause# systemctl list-units --failed
UNIT LOAD ACTIVE SUB DESCRIPTION
● determine-reboot-cause.service loaded failed failed Reboot cause determination service

How I did it
Fixed the issue by setting default reason to "REBOOT_CAUSE_UNKNOWN" instead of "None".

How to verify it
Check " determine-reboot-cause.service' loaded successfully post image installation from ONIE.
Verify "reboot-cause.txt" file is created and config reload succeeds.
2021-08-25 12:13:15 -07:00
Shilong Liu
9fb60721f8 Reproducible build add docker image debian* to white list. (#8330)
#### Why I did it
1. Add version control for debian* docker image to white list.
2. Always record docker image sha256 value, regardless of white list.
2021-08-25 12:12:42 -07:00
Kebo Liu
dd9a9ba4c3 [Mellanox] Add new sensor conf to support SN4410 A1 system (#8379)
#### Why I did it

New SN410 A1 system has a different sensor layout with A0 system, needs a new sensor conf file to support it.

#### How I did it

Since the SN4410 A1 system use exactly the same sensor layout as the SN4700 A1 system, so add a symbol link linking to the SN4700 A1 sensor conf file to reuse.

#### How to verify it

Run sensor test against the SN4410 A1 system;
Run platform related regression test against the SN4410 A1 system
2021-08-25 12:12:18 -07:00
tjchadaga
8b780d68a9 Fix TH3 Warm-reboot failure due to Tunnel termination SAI failure (#8395) 2021-08-25 12:12:00 -07:00
gechiang
280df2ee46 BRCM Disable ACL Drop counted towards interface RX_DRP counters (#8382)
* BRCM Disable ACL Drop counted towards interface RX_DRP counters
2021-08-25 12:11:23 -07:00
judyjoseph
6bbfafb045 [build]: Update the make cache mode for opennsl-module-dnx (#8391)
Fix warning shown during compilation

[ DPKG ] Cache is not enabled for opennsl-modules-dnx_5.0.0.4_amd64.deb package
2021-08-25 12:10:59 -07:00
Longxiang Lyu
9cc4b7b406 [swss][arp_update] Send ipv6 pings over vlan sub interfaces (#8363)
#### Why I did it
* `arp_update` fails to ping those neighbors over vlan sub interfaces.

#### How I did it
* modify `arp_update_vars.j2` to get vlan sub interfaces with ipv6 addresses assigned.
* modify `arp_update` to send ipv6 pings over those retrieved vlan sub interfaces.

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
2021-08-25 12:10:33 -07:00
Blueve
02bce90933 [ARM] Fix issue whre the ping6 tool is missing from orchagent docker (#8345)
Signed-off-by: Jing Kan jika@microsoft.com
2021-08-25 12:10:06 -07:00
Guohan Lu
52a59f827e [ci]: fix artifact download syntax error for vstest (#8547)
Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-08-21 14:31:49 -07:00
lguohan
b9d6eb0678 [openssh]: move build dep installation to sonic-slave-buster (#8381)
install build dep causes dpkg lock issue in parallel build

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-08-20 16:07:02 +08:00
Judy Joseph
1e81d10b9c sonic-swss and sonic-utilities submodule update
sonic-swss

e892dda Fix warmboot issue PR##8367 (#1866)
9c6023d Mclag enhacements support code changes. (#1331)

sonic-utilities

5465ea0 [MPLS][CLI] added config/show CLI for MPLS interface, MPLS CRM threshold config, updated CLI reference manual
3bac779  mclag enhancements as per HLD at Azure/SONIC#596 (#1138)
2021-08-19 23:15:33 -07:00
Praveen-Brcm
44a2cd8b1a MCLAG enhacements ICCPd initial code commit (#4819)
* MCLAG enhacements ICCPd initial code commit
* Resolving the merge conflicts with orighin
* L3 MCLAG Enhancements and Unique IP Changes.
* Addressed review comments

Co-authored-by: Tapash Das <tapash.das@broadcom.com>
2021-08-19 22:14:09 -07:00
richardyu
debe310c66 PTF adds unittest-xml-reporting (#8417)
Co-authored-by: richardyu-ms <richard.yu@microsoft.com>
2021-08-19 21:49:53 -07:00
Judy Joseph
c95a9d1db7 Update sonic-platform-common with following commits
1d3a810 [python coverage] fix result color bar (#202)
 3f7b359 Add a template function that returns list of asics on module (#185)
 abc2709 Fix decode error when parsing EEPROM fields (#199)
 789b41e Load interval from thermal_policy.json (#178)
 540ed1c Fix Xcvrd crash due to invalid key access in type_of_media_interface, host_electrical_interface, connector_dict (#206)
 716caf8 Unifying the platform api for get_pcie_aer_stats with PcieBase (#197)

Update sonic-utilities with following commit

 3f3974e [show priority-group drop counters] Add user info output when user want to check PG counters and polling are disabled (#1678)
 16606de Global and Interface commands for IPv6 Link local address enhancements (#1159)
2021-08-19 21:33:22 -07:00
Judy Joseph
cbca676c2b Update sonic-swss module with the following commits
0dcb2b6 Open record file in append mode (#1845)
03ce2ee [vnet/vxlan] Add support of multiple mappers for the VxLAN tunnel (#1843)
c5e90ab VOQ: Nexthop for remote VOQ LC should be created on inband OIF. (#1823)
834c5c8 Td2: Reclaim buffer from unused ports (#1830)
a5ad55c [Dynamic Buffer Calc] Bug fix: Don't create lossless buffer profile for active ports without speed configured (#1822)
f50368f [cfgmgr] Update Makefile.am to consume lib zmq (#1865)
2021-08-17 19:38:01 -07:00
Stepan Blyshchak
752117875c [sonic_debian_extension.j2] export DOCKER_HOST so that clients can use it to connect to dockerd (#8398)
Use DOCKER_HOST. Every client including docker command and python docker API uses this environment variable to connect to dockerd.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2021-08-14 17:51:51 -07:00
Guohan Lu
251c04c24f [build]: Fix docker pull on armhf platform
armhf build uses native dockerd

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-08-14 17:49:17 -07:00
Judy Joseph
e0f72470bf Update to sonic-swss-202106 branch, and incldue the following commit
97a108f Code changes to support IPv6 Link local enhancements (#1463)
2021-08-10 11:25:47 -07:00
lguohan
b65846ad00 [build]: add debug info for dpkg frontend lock (#8375)
print out the process that hold the dpkg frontend lock.

Signed-off-by: Guohan Lu <lguohan@gmail.com>
2021-08-08 20:42:18 -07:00
Neetha John
c6a3a58cf7 Revert "Revert "Update default cable len to 0m for TD2"" (#8354)
* Update default cable len to 0m for TD2 (#8298)
* Update sonic-cfggen tests with the correct cable len

Signed-off-by: Neetha John <nejo@microsoft.com>

As part of the buffer reclamation efforts for TD2, setting the default cable len to 0m which means unused ports will have a cable len of 0m.

Why I did it
To align with the changes in Azure/sonic-swss#1830

How to verify it
- With the default cable len set to 0m and the associated changes in swss, CABLE_LENGTH table had '0m' set for unused ports and accordingly more space was reserved for the shared pool
- Cfggen tests passed with the cable len update
2021-08-06 20:54:40 -07:00
Arun Saravanan Balachandran
a6b843c035 DellEMC: Add pcie.yaml for Z9332f (#8329)
Why I did it
To support "pcied" and "pcieutil" commands in DellEMC Z9332f.

How I did it
Add 'pcie.yaml' in device/dell/[PLATFORM]/ directory.

How to verify it
Execute "pcieutil check" command.
Logs: UT_logs.txt
2021-08-06 20:54:24 -07:00
Sujin Kang
c8db8d266a [pmon]: Enable Autorestart of the daemons in PMON for unexpected exit cases (#8326)
Remove the daemon list from the critical_process which prevent the PMON
from restarting when the individual daemon crashes.
2021-08-06 20:54:08 -07:00
jusherma
356c3d4e83 [build] Always use -j1 for libsnmp to avoid race condition (#8324)
I have been seeing intermittent (~40%) build failures with the same error described in PR https://github.com/Azure/sonic-buildimage/pull/6592, even with that fix present

```
/usr/bin/ld: mibgroup/ip-forward-mib/ipCidrRouteTable/.libs/ipCidrRouteTable_interface.o: file not recognized: file truncated
...
libtool:   error: 'mibgroup/ip-forward-mib/inetCidrRouteTable/inetCidrRouteTable_interface.lo' is not a valid libtool object
make[5]: *** [Makefile:1020: libnetsnmpmibs.la] Error 1
make[5]: *** Waiting for unfinished jobs....
```

#### How I did it

Use `-j1` for the libsnmp build regardless of the value of `$(MULTIARCH_QEMU_ENVIRON)`

#### How to verify it

Performed 10 builds of the libsnmp target (`target/debs/buster/libsnmp-base_5.7.3+dfsg-5_all.deb`) with and without this change. Without the change, hit the error 40% of the time. With the change did not see the error at all

Signed-off-by: Justin Sherman <jusherma@cisco.com>
2021-08-06 20:53:54 -07:00
DavidZagury
03da44aea6 [Mellanox][Pcie] Fix issue on pcied with an id that contains only decimal digits was treated as a decimal number (#8309)
A device that contains only decimal digits was mistreated as a decimal integer resulting in failure to find it in the id to bus map.
2021-08-06 20:53:41 -07:00
VenkatCisco
8093ab2024 Platform/cisco-8000 module for sonic-buildimage (#8172)
Why I did it
Update Makefile, so it does the following:
For a given platform, verify if platform/checkout/.ini exists and hence run the platform/checkout/template.j2. This allows platform code to be checked out during the 'make configure' stage.

How I did it
git clone git@github.com:Azure/sonic-buildimage.git
mkdir platform/cisco-8000

make init
make configure PLATFORM=cisco-8000
make all
2021-08-06 20:42:10 -07:00
Aravind Mani
402b0732ff Dell S6100: Monitor serial-getty service (#8304)
Why I did it
serial-getty service exited in Dell S6100 device randomly.

How I did it
Added serial-getty to monit services.

How to verify it
Stop serial-getty in ssh session and check whether the service restarts or not.
2021-08-06 20:38:59 -07:00