#### Why I did it
During a SONiC upgrade, the image is extracted to tmpfs for installation, so the tmpfs size must be larger than the image size. Image installation fails if the image is larger than tmpfs.
We hit the error below while installing a debug image larger than tmpfs, which is 1.5 GB on the Marvell armhf platform.
sonic-installer install <url>
New image will be installed, continue? [y/N]: y
Downloading image...
...99%, 1744 MB, 708 KB/s, 0 seconds left...
Installing image SONiC-OS-202012.0-dirty-20210311.224845 and setting it as default...
Command: bash /tmp/sonic_image
tar: installer/fs.zip: Wrote only 7680 of 10240 bytes
tar: installer/onie-image-arm64.conf: Cannot write: No space left on device
tar: Exiting with failure status due to previous errors
Verifying image checksum ... OK.
Preparing image archive ...
#### How I did it
Compare the downloaded image size with the tmpfs size; if tmpfs is smaller than the image, resize tmpfs according to the image size.
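The size check can be sketched roughly as below. This is an illustration only, not the code in this PR: the function names, the `headroom` parameter, and the `/tmp` mount point are assumptions, and the remount command is built but never executed.

```python
import os


def required_tmpfs_size(image_size, tmpfs_size, headroom=0):
    """Return the new tmpfs size in bytes, or None if tmpfs is already big enough.

    headroom is a hypothetical safety margin; it is not taken from the PR.
    """
    needed = image_size + headroom
    if tmpfs_size >= needed:
        return None
    return needed


def tmpfs_total_bytes(mount_point="/tmp"):
    """Total size in bytes of the filesystem mounted at mount_point."""
    st = os.statvfs(mount_point)
    return st.f_frsize * st.f_blocks


def remount_command(new_size, mount_point="/tmp"):
    """Build (but do not run) a mount command that resizes a tmpfs."""
    return ["mount", "-o", "remount,size={}".format(new_size), mount_point]
```

With a 1.9 GB image and a 1.5 GB tmpfs, `required_tmpfs_size` returns the image size, which would then be passed to `remount_command`.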
#### How to verify it
Install an image larger than tmpfs. We verified by installing a 1.9 GB debug image, which is larger than the 1.5 GB tmpfs.
Update sonic-utilities submodule. This includes the following commits:
285960d [config]: Update environment file during config reload (#1673)
3f0ecd5 [config] Remove "reset failed" print lines from config reload (#1654)
a1c8751 Make the soft-reboot available in the SONiC image on master (#1681)
45e7b71 [Mellanox] Add all results from saisdkdump to the techsupport on Mellanox switches (#1660)
- Why I did it
The default breakout mode according to hwsku.json for the MSN4410 is 1x400G and this is not a supported breakout mode according to its platform.json
This causes a conflict on boot of this platform and no containers on the switch will init successfully.
- How I did it
Referenced the platform specification files and updated platform.json
- How to verify it
Install master version of SONiC on MSN4410
Boot switch and verify swss is successfully running using docker ps
- Why I did it
Remove EEPROM cache file and use DB instead
- How I did it
Read EEPROM data from DB if possible
If data is not ready in DB, read from hardware using a visitor pattern
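A minimal sketch of the DB-first read with a visitor-based fallback, under heavy simplification: the `EEPROM_INFO` key, the dict standing in for the DB, all class and function names, and the two-byte type/length layout are assumptions for illustration and do not reflect the real TlvInfo format or the platform API.

```python
class DictVisitor:
    """Collects every visited TLV entry into a plain dict."""

    def __init__(self):
        self.info = {}

    def visit_tlv(self, code, value):
        self.info[code] = value


def walk_tlvs(raw, visitor):
    """Walk a simplified (type, length, value) byte sequence.

    The parser knows only the layout; what happens with each entry
    is decided by the visitor passed in.
    """
    i = 0
    while i + 2 <= len(raw):
        code, length = raw[i], raw[i + 1]
        visitor.visit_tlv(code, bytes(raw[i + 2:i + 2 + length]))
        i += 2 + length


def read_eeprom(db, read_hardware):
    """Prefer the copy in the DB; fall back to parsing hardware bytes."""
    cached = db.get("EEPROM_INFO")  # hypothetical key, db is a dict stand-in
    if cached:
        return cached
    visitor = DictVisitor()
    walk_tlvs(read_hardware(), visitor)
    return visitor.info
```

Separating the walk from the visitor means a different visitor (for example one that writes entries to the DB) can reuse the same parsing loop.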
- How to verify it
Manual test and regression
Why I did it
There is a regression on 2700 platform where the evidence points to the flex counter change.
How I did it
Back track the swss submodule head to exclude:
[flex-counters] Delay flex counters stats init for faster boot time (#1749)
Verified that an image built from this PR doesn't trigger the crash.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
7670b49 [sonic_platform/sfp_base] Add common definition for get SFP error status (#194)
1336598 [CI] sonic-config-engine now depends on SONiC YANG packages (#198)
f57fee4 Add to check pcie configuration revision to get the right configuration. (#195)
4e3a0a0 Fix typo for midplane APIs. (#196)
fc2e9e2 [eeprom_tlv_info] Optimize EEPROM data process by using visitor pattern (#193)
Why I did it
SONiC YANG model support for BGP & route-map features.
How I did it
Defined various BGP and route-map YANG containers and lists based on config-DB schema.
How to verify it
Built the following successfully with various BGP & route-map unit test cases.
make target/python-wheels/sonic_yang_models-1.0-py3-none-any.whl
make target/python-wheels/sonic_yang_mgmt-1.0-py3-none-any.whl
Updates sonic-platform-daemons submodule. Includes the following commits:
```
eb8a223 [xcvrd] Force cleanup of chassis global variable on deinit (#193)
a6903c0 [CI] sonic-config-engine now depends on SONiC YANG packages (#194)
bf60a27 Replace swsssdk.SonicV2Connector with swsscommon implementation (#191)
```
19615e3 Fixing db_migrator for Feature table (#1674)
d1c1c61 [tests]: skip some dynamic port breakout unit tests (#1677)
25669c3 [CI] sonic-config-engine now depends on SONiC YANG packages (#1675)
3ff68c4 [neighbor-advertiser] delete the tunnel maps appropriately (#1663)
a425ca2 [config] support for configuring muxcable to manual mode of operation (#1642)
25e17de [show platform summary] Add chassis hardware info to platform summary and version (#1624)
f5f2a00 [db_migrator] fix old 1911 feature config migration to a new one. (#1635)
56db162 [config] Fix config int add incorrect ip (#1414)
1da879c [db_migrator][Mellanox] Update Mellanox buffer migrator with 2km-cable supported (#1564)
c2b760f [sonic_package_manager] flush once finished saving docker image into temporary file (#1638)
cd69473 Replace swsssdk.ConfigDBConnector and SonicDBConfig with swsscommon implementation (#1620)
5f20365 Change to use rvtysh when calling the show commands (#1572)
51d6bf5 Fix Aboot breakage in sonic package manager in sonic-installer (#1625)
18bed46 [console][show] Force refresh all lines status during show line (#1641)
b616cd9 [TPID CONFIG] Added TPID configuration CLI support (#1618)
01eb4b1 [show] support for show muxcable firmware version of only active banks (#1629)
7744c8d [fdb]cli: fdb entries are cleared according to vlan or port or vlan&&port (#657)
e23c5ee Add psu hardware revision to psushow table (#1601)
f1726fe Make advance_version_for_expected_database available for other db migrator test cases as well (#1614)
5d1ad05 [show] add support for muxcable metrics (#1615)
feeab29 [config] Sort Config Db When Saving (#1623)
#### Why I did it
To allow SSH connections from IPv6 addresses
Resolves https://github.com/Azure/sonic-buildimage/issues/7668
#### How I did it
In build_debian.sh, modify sshd_config file so as to enable listening for IPv6 connections
#### Why I did it
Recently, the build started failing with messages like
```
2021-06-16T16:55:02.8675603Z tests/hostcfgd/hostcfgd_test.py:5: in <module>
2021-06-16T16:55:02.8676208Z from parameterized import parameterized
2021-06-16T16:55:02.8677145Z E ModuleNotFoundError: No module named 'parameterized'
```
Unit tests for hostcfgd depend on the `parameterized` Python package, but it was never added as a dependency in the setup.py file. The tests started using this package ~3 months ago. I'm not sure why we only started seeing this failure recently.
#### How I did it
Add 'parameterized' package as a test dependency in setup.py for sonic-host-services package
Why I did it
Support multiple pcie configuration files and change the pcie status table name.
This matches the two PRs below:
Azure/sonic-platform-common#195
Azure/sonic-platform-daemons#189
How I did it
Check for the pcie configuration file using a wildcard and change the device status table name
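The wildcard check might look something like the sketch below. The `pcie*.yaml` pattern and the function name are assumptions for illustration; the actual file naming used by the platform code may differ.

```python
import fnmatch


def match_pcie_configs(filenames, pattern="pcie*.yaml"):
    """Return, in sorted order, the file names that match the wildcard."""
    return sorted(fnmatch.filter(filenames, pattern))
```

In practice the list of names would come from listing the platform directory, and an empty result would mean no pcie configuration file is present.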
How to verify it
Restart with changes and see if the pcie check works as expected.
Why I did it
The Mellanox platform is required to support the fwutil auto-update feature defined here
This allows switches performing SONiC upgrades to choose whether to perform firmware upgrades that may interrupt the data plane through a cold boot.
How I did it
Two methods were added to the component implementations for Mellanox.
In the base Component class we add a default function that chooses to skip the installation of any firmware unless the cold boot option is provided. This is because the Mellanox platform, by default, does not support installing firmware on ONIE, the CPLD, or the BIOS "on-the-fly".
In the ComponentSSD class we add a function that behaves similarly but uses the Mellanox specific SSD firmware upgrade tool to check if the current SSD supports being upgraded on the fly in order to decide whether to skip or perform the installation.
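The decision logic described above can be sketched as follows. This is not the code from the PR: the method names, the `"cold"` boot-action string, the returned status strings, and the `supports_live_upgrade` callable (standing in for the Mellanox SSD firmware tool) are all hypothetical.

```python
class Component:
    """Base behaviour: install firmware only when a cold boot is allowed."""

    def auto_update_firmware(self, image_path, boot_action):
        if boot_action != "cold":
            # ONIE, CPLD and BIOS cannot be upgraded on the fly by default
            return "skipped"
        return self.install(image_path)

    def install(self, image_path):
        # Placeholder for the real installation step
        return "installed"


class ComponentSSD(Component):
    """SSD behaviour: also install when the SSD reports on-the-fly support."""

    def __init__(self, supports_live_upgrade):
        # Callable standing in for a query to the SSD firmware upgrade tool
        self._supports_live_upgrade = supports_live_upgrade

    def auto_update_firmware(self, image_path, boot_action):
        if boot_action == "cold" or self._supports_live_upgrade(image_path):
            return self.install(image_path)
        return "skipped"
```

For example, `ComponentSSD(lambda path: False).auto_update_firmware("fw.bin", "warm")` skips the installation, while the same call with `"cold"` performs it.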
How to verify it
Unit tests are included with this PR. These tests run on build of target sonic-mellanox.bin
You may also perform fwutil auto-update ... commands after Azure/sonic-utilities#1242 is merged in.
If we build docker-sonic-mgmt in the sonicbld pool, the UserID differs from that of the image built by Jenkins.
So we build this image in the sonictest pool. There is a sonictmp user with UserID 1001.
This makes the docker image the same as the image built in Jenkins.
Why I did it
SONiC switches get their docker images from a local repo, populated during install with container images pre-built into the SONiC FW. With the introduction of kubernetes, new docker images available in a remote repo can be deployed. This requires dockerd to be able to pull images from the remote repo.
Depending on the switch's network domain and config, it may or may not be able to reach the remote repo. When the remote repo is unreachable, we could potentially make the Kubernetes server also act as an http-proxy.
How I did it
When the admin explicitly enables it, the kubernetes-server can be configured as docker-proxy. But any update to docker-proxy has to go through a service-conf file environment variable, implying that a "service restart docker" is required. Restarting dockerd is very expensive, as it restarts all dockers, including the database docker.
To avoid a dockerd restart, pre-configure an http_proxy using an unused IP. When the k8s server is enabled to act as http-proxy, an iptables entry is created to redirect all traffic destined for the configured unused proxy IP to the kubernetes-master IP. This way any update to the Kubernetes master config only manipulates iptables, which is transparent to all modules until dockerd needs to download from the remote repo.
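One way such a redirect could be expressed is a DNAT rule in the nat table, sketched below. The function name, the choice of the OUTPUT chain, the port 3128, and the master address 10.0.0.10 are assumptions for illustration; the rule actually installed by ctrmgrd may differ. The command is only built, not executed.

```python
def proxy_redirect_rule(proxy_ip, proxy_port, master_ip, master_port, action="-A"):
    """Build an iptables command that rewrites traffic sent to the
    placeholder proxy address so it reaches the kubernetes master.

    action is "-A" to add the rule or "-D" to delete it again.
    """
    return [
        "iptables", "-t", "nat", action, "OUTPUT",
        "-p", "tcp", "-d", proxy_ip, "--dport", str(proxy_port),
        "-j", "DNAT",
        "--to-destination", "{}:{}".format(master_ip, master_port),
    ]


# Example: placeholder proxy 172.16.1.1 redirected to a hypothetical master
rule = proxy_redirect_rule("172.16.1.1", 3128, "10.0.0.10", 3128)
```

Changing the master then means deleting the old rule with `action="-D"` and adding a new one, with no change to the dockerd configuration.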
How to verify it
Configure a switch such that image repo is unreachable
Pre-configure dockerd with http_proxy.conf using an unused IP (e.g. 172.16.1.1)
Update ctrmgrd.service to invoke ctrmgrd.py with "-p" option.
Configure a k8s server, and deploy an image for feature with set_owner="kube"
Check if switch could successfully download the image or not.
What I did:
Updated 7260 MMU Profile based on latest MSFT Tier 1 Tomahawk2_MMU_Setting_48x100G_40m_16x100G_300m_v1.0 and
TH2_PGHdrm_MSFT.
How I verify:
Made sure image is up/traffic is flowing/mmu dump looked fine.
The SAI qos test will need to be updated to support this SKU.
#### Why I did it
The PR checkers do not re-run the sonic-config-engine test cases because changes to some of the config files are not detected.
https://sonic-jenkins.westus2.cloudapp.azure.com/job/mellanox/job/buildimage-mlnx-all/660/console
…
07:13:24 ======================================================================
07:13:24 ERROR: test_bgpd_quagga (tests.test_j2files.TestJ2Files)
07:13:24 ----------------------------------------------------------------------
…
07:13:24 ======================================================================
07:13:24 ERROR: test_zebra_quagga (tests.test_j2files.TestJ2Files)
07:13:24 ----------------------------------------------------------------------
…
07:13:24 error: Test failed: <unittest.runner.TextTestResult run=161 errors=2 failures=0>
07:13:24 [ FAIL LOG END ] [ target/python-wheels/sonic_config_engine-1.0-py2-none-any.whl ]
07:13:24 make: *** [slave.mk:603: target/python-wheels/sonic_config_engine-1.0-py2-none-any.whl] Error 1
07:13:24 Makefile.work:292: recipe for target 'target/sonic-mellanox.bin' failed
07:13:24 make[1]: *** [target/sonic-mellanox.bin] Error 2
07:13:24 make[1]: Leaving directory '/data2/johnar/workspace/mellanox/buildimage-mlnx-all'
07:13:24 Makefile:7: recipe for target 'target/sonic-mellanox.bin' failed
07:13:24 make: *** [target/sonic-mellanox.bin] Error 2
See PR: https://github.com/Azure/sonic-buildimage/pull/7476
#### How I did it
Add the files that the tests depend on.
See src/sonic-config-engine/tests/test_j2files.py
Signed-off-by: Mykola Gerasymenko <mykolax.gerasymenko@intel.com>
Why I did it
Dynamic Port Breakout fails because the PG_DROP yang model is missing
How I did it
Add the PG_DROP yang model and check this field in the yang model unit test
How to verify it
First, perform DPB (2x50G) on port Ethernet0:
sudo config interface breakout Ethernet0 2x50G -f
Then perform DPB (1x100G[40G]) on port Ethernet0:
sudo config interface breakout Ethernet0 1x100G[40G] -f
Both commands should work correctly.
- Why I did it
Enhance Python 3 support for the platform API. Originally, some platform APIs called SDK APIs that didn't support Python 3. Now that the Python 3 APIs are supported in SDK 4.4.3XXX, Python 3 is completely supported by the platform API
- How I did it
Start all platform daemons from python3
1. Remove #!/usr/bin/env python at the beginning of each platform API file, as the platform API files are not started as daemons but imported by other daemons.
2. Adjust SDK API calls accordingly
- How to verify it
Manually test and run regression platform test
Signed-off-by: Stephen Sun <stephens@nvidia.com>
- Why I did it
Adjust the Makefile for SDK/python-SDK-API to support both python2 and python3
- How to verify it
Build the image and check whether python2 and python3 are both supported by SDK API.
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Added support for BRCM SAI 5.0.0.1.
Major changes here:
CS00012019568 Link Training (all 100G ASICs - TH families and TD3)
CS00012184310 [attribute_capability| for port SAI_PORT_ATTR_TPID returns CREATE_IMP=false|SET_IMP=true|GET_IMP=true
CS00012182145 [IPinIP][Tunnel Delete] If IPinIP tunnel delete is performed observed following SYNCd error: ERR syncd#syncd: [none] SAI_API_TUNNEL:brcm_sai_tnl_mp_remove_tunnel_term_table_entry:4026 _brcm_sai_mptnl_sip_tnl_lookup failed with error -7.
CS00012182148 Rate Limit Parity error message to syncd/sonic.
CS00012178692 ACL drops counted as interface drops
CS00012183901 [WARMBOOT] WARMReboot with active traffic causes port flap reported during warm reboot
CS00012023263 TD3/TH2 : Support 4 lossless queues(2 SW PFCWD and 2 HW PFCWD)
CS00012019578 Pre FEC bit-error rate (BER) - DNX and XGS (TD and TH 50/100G)
Fix the determine-reboot-cause service, which was failing because a not-implemented error was thrown from get_reboot_cause when the reboot was done with `sudo reboot` (cold reboot)
Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
Why I did it
This PR adds changes in sonic-config-engine to consume configuration data in SONiC Yang schema and generate config_db entries
How I did it
Add a new file sonic_yang_cfg_generator.
This file has the functions to
Parse yang data json and convert it to config_db json format.
Validate the converted config_db entries to make sure all the dependencies and constraints are met.
Add a new option -Y to the sonic-cfggen command for this purpose
Add unit tests
This capability is supported only in the sonic-config-engine Python3 package
Why I did it
* For SAI - Upgrade to Version 1.19.0
- Add support for VxLAN encap TTL uniform model on SPC2/3
- Add ACL entry actions set VRF, set do not learn, add VLAN ID, add VLAN priority
- Add ACL field has VLAN tag
- Bulk counters (improve port statistics performance)
- Create async dump extra as part of debug generate dump
- Create irisc dump on severe health event
- Support 0 port systems (modify get switch mac to work accordingly)
- Set interface vlan up state for ping tool in SONiC
- Support attributes SAI_PORT_ATTR_QOS_SCHEDULER_PROFILE_ID, SAI_PORT_ATTR_QOS_INGRESS_BUFFER_PROFILE_LIST,
SAI_PORT_ATTR_QOS_EGRESS_BUFFER_PROFILE_LIST, SAI_PORT_ATTR_POLICER_ID as part of port create
* For SDK\FW - Upgrade to Version SDK 4.4.3106, FW 2008_3110
Added Features:
- Increased ACL table
- Enhanced PSAMPLE support
- Added support for Finisar SR4 module in SN3700 systems
- Added support for Python 3.0 in examples.
Fix bugs:
- On LR4 transceivers 00YD278, the firmware incorrectly identified the transceiver
- Reduce memory consumption for virtual LAG
- Fixed PSAMPLE listeners cleanup on SDK drivers unloading.
- On Spectrum-2 and Spectrum-3 systems, slow reaction time to Rx pause packets may lead to buffer overflow on servers.
- BER may be experienced when using 5m DAC cables between SN4700 and SN2700 in 100GbE speed.
- On very rare occasion, when connecting DR4 PAM4 transceiver to 100GbE DR1 NRZ, low BER may be experienced.
- Unexpected packet drops on the port ingress buffer may be experienced when working in 400GbE mode.
Note: When performing ISSU from an older version, this fix won't be applied. For fix to apply, a non-ISSU reset is required.
- Fix SN3800 specific warm boot scenario: Disable interface, Warm Boot, Enable Interface --> link will remain down.
Signed-off-by: Dror Prital <drorp@nvidia.com>
Process pcied failed on Arista-7170-32CD-C32
```
root@sonic:/# supervisorctl
chassis_db_init EXITED Jun 03 08:48 AM
dependent-startup EXITED Jun 03 08:48 AM
ledd RUNNING pid 28, uptime 3:07:49
lm-sensors EXITED Jun 03 08:48 AM
pcied FATAL Exited too quickly (process log may have details)
```
Signed-off-by: Andriy Kokhan <andriyx.kokhan@intel.com>
This is due to the fact that we use SONIC_OVERRIDE_BUILD_VARS internally
in our build jobs and this is not accounted for in the caching framework.
So we add MLNX_SDK_DEB_VERSION to force rebuild if we changed it via
SONIC_OVERRIDE_BUILD_VARS.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>