Why I did it
#4021 describes an issue that is still being observed on master image whereby sensord does not start in pmon due to missing service.
How I did it
Updated the lm-sensors install patch with a case for systemd
How to verify it
Verified that sensord is up in pmon after boot
Co-authored-by: Boyang Yu <byu@arista.com>
* Add smartmontools to pmon docker
* Set smartmontools to install version 7.2-1 in pmon to match host; clean up smartmontools build files
* Add comments on smartmontools version for both host and pmon
#### Why I did it
To add new SKU Mellanox-SN2700-D44C10 with following requirements:
| Port configuration | Value |
| ------ |--------- |
| Breakout mode for each port |**Defined in port mapping** |
| Speed of the port | **Defined in Port mapping** |
| Auto-negotiation enable/disable | **No setting required** |
| FEC mode | **No setting required** |
|Type of transceiver used | **Not needed**|
Buffer configuration | Value
------ |---------
Shared headroom | **Enabled**
Shared headroom pool factor | **2**
Dynamic Buffer | **Disable**
In static buffer scenario how many uplinks and downlinks? | **44 x50G and 2x100G Downlinks 8x100G uplinks**
2km cable support required? | **No**
Switch configuration | Value
------ |---------
Warmboot enabled? | **yes**
Should warmboot be added to SAI profile when enabled? | **yes**
Is VxLAN source port range set? | **No**
Should Vxlan source port range be added to SAI profile when set. | **No**
Is Static Policy Based Hashing enabled? | **No**
Port Mapping
| Ports | Mode |
| ------ |--------- |
| 1,2 | 1x100G |
| 3-6 | 2x50G |
| 7-10 | 1x100G |
| 11-22 | 2x50G |
| 23-26 | 1x100G |
| 27-32 | 2x50G |
Number of Uplinks / Downlinks:
TO topology: **44 x50G and 2x100G Downlinks 8x100G uplinks**.
#### How I did it
Defined the SKU as per requirements
#### How to verify it
Load the SKU and verify if all links come up and traffic passes.
changed the platform device name under nokia directory; we now need to specify marvell armhf/arm64 to provide more accurate platform identity. otherwise onie discovery won't recognize the asic being installed.
Why I did it
when we load images using onie discovery, the process was failing because of marvell ASIC mismatch
How I did it
replace the platform asic with marvell-armhf under 7215
How to verify it
load a new image using http server and verify that the image can be loaded successfully
Why I did it
Fix apt-get remove/purge version not locked issue when the apt-get options not specified.
How I did it
Add a space character before and after the command line parameters.
- Why I did it
Fixes#11431
- How I did it
dhcp6relay binds to ipv6 addresses configured on these vlan interfaces
Thus check if they are ready before launching dhcp6relay
- How to verify it
Unit Tests
Tested on a live device
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
Signed-off-by: Neetha John nejo@microsoft.com
Why I did it
slb and bgp mon peers are not needed for storage backend. These neighbor are present in the minigraph.
How I did it
After minigraph parsing, remove these neighbors if it is a storage backend device
How to verify it
Unit tests
Verified on the device that once these tables are removed, these peers don't show up in "show runningconfig bgp" output
co-authorized by: jianquanye@microsoft.com
Migrate the t0 and t1-lag test jobs in buildimage repo to TestbedV2.
Why I did it
Migrate the t0 and t1-lag test jobs in buildimage repo to TestbedV2.
How I did it
Migrate the t0 and t1-lag test jobs in buildimage repo to TestbedV2.
Remove ceos type setting
Use 202205 branch as sonic-mgmt branch
Why I did it
Change the path of sonic submodules that point to "Azure" to point to "sonic-net"
How I did it
Replace "Azure" with "sonic-net" on all relevant paths of sonic submodules
fix linecard provisioning issue (500 error)
fix some value types for get_system_eeprom_info API
refactor code to leverage pci topology (enabling dynamic Pcie plugin)
refactor asic declaration logic to new style
misc fixes
Why I did it
BGP service has always been starting after interface-config. However, recently we discovered an issue where some BGP sessions are unable to establish due to BGP daemon not able to read the interface IP.
This issue was clearly observed after upgrading to FRR 8.2.2. See more details in #12380.
How I did it
Delaying starting BGP seems to be a workaround for this issue.
However, caution is that this delay might impact warm reboot timing and other timing sequences.
This workaround is reducing the probability of hitting the issue by close to 100X. However, this workaround is not bulletproof as test shows. It is still preferrable to have a proper FRR fix and revert this change in the future.
How to verify it
Continuously issuing config reload and check BGP session status afterwards.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Why I did it
When sending a PR only CI change, as expected, the target target/python-wheels/buster/sonic_config_engine-1.0-py2-none-any.whl should be from the cache, because the depended files were not changed, but it rebuilt.
How I did it
Sort the files by name.
There's an odd crash that intermittently happens after the teamd container
exits, and a signal is raised to the main thread to exit. This thread (watching
teamd) continues execution because it's in a `while True`. The subsequent wait
call on the teamd container very likely returns immediately, and it calls
`is_warm_restart_enabled` and `is_fast_reboot_enabled`. In either of these
cases, sometimes, there is a crash in the transition from C code to Python code
(after the function gets executed). Python sees that this thread got a signal
to exit, because the main thread is exiting, and tells pthread to exit the
thread. However, during the stack unwinding, _something_ is telling the
unwinder to call `std::terminate`. The reason is unknown.
This then results in a python3 SIGABRT, and systemd then doesn't call the stop
script to actually stop the container (possibly because the main process exited
with a SIGABRT, so it's a hard crash). This means that the container doesn't
actually get stopped or restarted, resulting in an inconsistent state
afterwards.
The workaround appears to be that if we know the main thread needs to exit,
just return here, and don't continue execution. This at least tries to avoid it
from getting into the problematic code path. However, it's still feasible to
get a SIGABRT, depending on thread/process timings (i.e. teamd exits, signals
the main thread to exit, and then syncd exits, and syncd calls one of the two C
functions, potentially hitting the issue).
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
To get following fixes:
be7da6b [sonic-installer] use host docker startup arguments when running dockerd in chroot (#2179) (#2407)
d112f7c [202205][auto-ts] add memory check (#2116) (#2413)
This is being done for 201911 -> 202205 warmboot upgrade, and Mellanox
platforms need to be able to mount the squashfs file in the old image.
This reverts commit d5365928d4.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
- Why I did it
get_rx_los and get_tx_fault is not supported via the exisitng interface used, need provide dummy implementation for them.
NOTE: in later releases we will get them back via different interface.
- How I did it
Return False * lane_num for get_rx_los and get_tx_fault
- How to verify it
Added unit test
Why I did it
Add components data for sonic-mgmt testing
How I did it
Update platform.json and add platform_components.json
How to verify it
Ran sonic-mgmt tests (test_chassis and test_component)
Why I did it
Fix the build unstable issue caused by the kvm 9000 port is not ready to use in 2 seconds.
2022-09-02T10:57:30.8122304Z + /usr/bin/kvm -m 8192 -name onie -boot order=cd,once=d -cdrom target/files/bullseye/onie-recovery-x86_64-kvm_x86_64_4_asic-r0.iso -device e1000,netdev=onienet -netdev user,id=onienet,hostfwd=:0.0.0.0:3041-:22 -vnc 0.0.0.0:0 -vga std -drive file=target/sonic-6asic-vs.img,media=disk,if=virtio,index=0 -drive file=./sonic-installer.img,if=virtio,index=1 -serial telnet:127.0.0.1:9000,server
2022-09-02T10:57:30.8123378Z + sleep 2.0
2022-09-02T10:57:30.8123889Z + '[' -d /proc/284923 ']'
2022-09-02T10:57:30.8124528Z + echo 'to kill kvm: sudo kill 284923'
2022-09-02T10:57:30.8124994Z to kill kvm: sudo kill 284923
2022-09-02T10:57:30.8125362Z + ./install_sonic.py
2022-09-02T10:57:30.8125720Z Trying 127.0.0.1...
2022-09-02T10:57:30.8126041Z telnet: Unable to connect to remote host: Connection refused
How I did it
Waiting more time until the tcp port 9000 is ready, waiting for 60 seconds in maximum.
Why I did it
test_sai_qos failed because of the following error:
"stderr_lines": [
"Traceback (most recent call last):",
" File \"/usr/bin/ptf\", line 522, in <module>",
" test_modules = load_test_modules(config)",
" File \"/usr/bin/ptf\", line 413, in load_test_modules",
" mod = imp.load_module(modname, *imp.find_module(modname, [root]))",
" File \"saitests/switch.py\", line 19, in <module>",
" import switch_sai_thrift",
"ImportError: No module named switch_sai_thrift"
],
It's because test_sai_qos runs ptf script which imports switch_sai_thrift, switch_sai_thrift is installed from python-saithrift_0.9.4_amd64.deb.
For master image, the deb file is for python3, but ptf only has virtual python3 environment, that's why we add --system-site-packages to allow virtual env to access system site-packeges.
Add thrift package in docker ptf virtual python3 env, because currently env-python3 doesn't have thrift module which is needed in switch_sai_thrift.
How I did it
Enable --system-site-packages for virtual py3 env in ptf docker and install thrift for test_qos_sai
How to verify it
load and login ptf conatiner
dpkg - i python-saithrift_0.9.4_amd64.deb
source /root/env-python3/bin/activate
python
import switch_sai_thrift.switch_sai_rpc
Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com>