Co-authored-by: Zhi Yuan (Carl) Zhao <zyzhao@arista.com>
Why I did it
Arista 7060 platform has a rare and unreproduceable PCIe timeout that could possibly be solved with increasing the switch PCIe timeout value. To do this we'll call a script for this platform to increase the PCIe timeout on boot-up.
No issues would be expected from the setpci command. From the PCIe spec:
"Software is permitted to change the value in this field at any
time. For Requests already pending when the Completion
Timeout Value is changed, hardware is permitted to use either
the new or the old value for the outstanding Requests, and is
permitted to base the start time for each Request either on when
this value was changed or on when each request was issued. "
How I did it
Add "platform-init" support in swss docker similar to how "hwsku-init" is called, only this would be for any device belonging to a platform. Then the script would reside in device data folder.
Additionally, add pciutils dependency to docker-orchagent so it can run the setpci commands.
How to verify it
On bootup of an Arista 7060, can execute:
lspci -vv -s 01:00.0 | grep -i "devctl2"
In order to check that the timeout has changed.
#### Why I did it
Sonic yang model for BUM storm control
#### How I did it
Added yang model and the corresponding test cases.
#### How to verify it
yang model test case for storm control
- Why I did it
To fix an issue that hw-mgmt patches were not applied. One patch was already in upstream hw-mgmt package thus applying it again caused an error and no other patches were applied. Also, I did it to improve the Makefile, so that the make will fail in case patches fail to apply.
- How I did it
Removed obsolete patch, made applying patches a hard failure in the build.
- How to verify it
Run the make and verify patches are applied.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
<!--
Please make sure you've read and understood our contributing guidelines:
https://github.com/Azure/SONiC/blob/gh-pages/CONTRIBUTING.md
** Make sure all your commits include a signature generated with `git commit -s` **
If this is a bug fix, make sure your description includes "fixes #xxxx", or
"closes #xxxx" or "resolves #xxxx"
Please provide the following information:
-->
#### Why I did it
1. Fix auditd log file path, because known issue: https://github.com/Azure/sonic-buildimage/issues/9548
2. When SONiC change to based on bullseye, auditd version upgrade from 2.8.4 to 3.0.2, and in auditd 3.0.2 the plugin file path changed to /etc/audit/plugins.d, however the upstream auditisp-tacplus project not follow-up this change, it still install plugin config file to /etc/audit/audisp.d. so the plugin can't be launch correctly, the code change in src/tacacs/audisp/patches/0001-Porting-to-sonic.patch fix this issue.
#### How I did it
Fix tacacs plugin config file path.
Create /var/log/audit folder for auditd.
#### How to verify it
Pass all UT, also run per-command acccounting UT to validate plugin loaded.
#### Which release branch to backport (provide reason below if selected)
<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->
- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
#### Description for the changelog
<!--
Write a short (one line) summary that describes the changes in this
pull request for inclusion in the changelog:
-->
Fix tacacs plugin config file path.
Create /var/log/audit folder for auditd.
#### A picture of a cute animal (not mandatory but encouraged)
#### Why I did it
Fix issue https://github.com/Azure/sonic-utilities/issues/1962
The problem is current implementation of [sonic-yang-mgmt::find_data_dependencies](f2774b635d/src/sonic-yang-mgmt/sonic_yang.py (L518)) does not get referrers if they are using `must` statement, it has to use `leafref`.
For now we can convert `must` to `leafref` if possible. In the future we will investigate get referrers by `must` statements as well https://github.com/Azure/sonic-buildimage/issues/9534
#### How I did it
Instead of `must` use `leafref`
#### How to verify it
unit-test
#### Which release branch to backport (provide reason below if selected)
- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
[LLDPD] fix to port remove and immediately create a problem - on delete link events it will immediately execute delete without using aggregate events mechanism.
This patch was added to LLDPD on this PR:
lldpd/lldpd#492
Signed-off-by: tomeri <tomeri@nvidia.com>
- Why I did it
The capability files were incorrect in comparison to the marketing spec of the SN4410 platform.
- How I did it
Aligned the capability files according to the marketing spec.
- How to verify it
Basic manual sanity checks:
1. Check if critical docker containers were UP
2. Check if interfaces were created and were UP
3. Check if interfaces created in the syncd docker container by executing – sx_api_ports_dump.py script
4. Check the logs from the start of the switch – everything was OK
5. Verified the port breakout
Signed-off-by: Vadym Hlushko <vadymh@nvidia.com>
- Why I did it
Rename platform x86_64-mlnx_msn4800 to x86_64-nvidia_msn4800
- How I did it
Rename platform folder as well as all code that reference the platform name
- How to verify it
Manual test
- Create a script in the orchagent docker container which listens for these encapsulated packets which are trapped to CPU (indicating that they cannot be routed/no neighbor info exists for the inner packet). When such a packet is received, the script will issue a ping command to the packet's inner destination IP to start the neighbor learning process.
- This script is also resilient to portchannel status changes (i.e. interface going up or down). An interface going down does not affect traffic sniffing on interfaces which are still up. When an interface comes back up, we restart the sniffer to start capturing traffic on that interface again.
Why I did it
To enable test support for BFD-related features, the PTF docker needs to have the proper support for BFD. This PR aims to add BFD support in ptf docker.
How I did it
Clone and build OpenBFDD for PTF docker.
How to verify it
Build locally and verify BFD is supported.
- Why I did it
To have an ability to use PRM sniffer.
- How I did it
Enabled the option in configure flags.
- How to verify it
Built and ran on switch. Enabled the feature in runtime and checked the sniffer recording.
Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
* [arm64]: Fix registration of the qemu interpreters
The current code doesn't properly run the container that registers the
qemu interpreters. It checks to see if the container is "known" by
Docker, but that doesn't indicate whether it's been run or not.
Therefore, just always register the qemu interpreters in the kernel, to
make sure the binary that's in the slave images that we build is used.
* [build]: Reduce the number of python calls
Modify the BLDENV and PROJECT_ROOT variables in slave.mk to be
immediate execution instead of lazy execution. Neither of these
variables should be changing for the duration of the build in each slave
container, so just run it once instead of every time they're referenced.
When running `make configure` for broadcom arm64 (where all of the slave
images are already built) on an amd64 host, this reduces the time spent
in each slave container from 4.5-5 minutes to 2 minutes.
* [sonic-slave]: Upgrade the qemu used for Bullseye arm64 to 6.1.0
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Why I did it
database.sh failed to create the database for namespace in multiasic platform.
The latest code Docker version 20.10.x, command "docker create" no longer takes optional "NET=" with empty value. Syntax error show with current docker create command in database.sh. Issue #9503
How I did it
Modify the docker_image_ctl.j2 to set default network setting NET="bridge" instead of empty for namespace database.
What I did:
Updated Jinja Template to enable BGP Graceful Restart based on device role. By default it will be enable only if the device role type is TorRouter.
Why I did:-
By default FRR is configured in Graceful Helper mode. Graceful Restart is needed on T0/TorRouter only since the device can go for warm-reboot. For T1/LeafRouter it need to be in Helper mode only
This interface type is used for recirculation on chassis.
The definition is required to prevent this interface from being
considered a physical interface in sonic-platform-common and
sonic-platform-daemon
* Remove the rw folder from the image after installing in KVM
When the image is installed from within KVM and then loaded, some files
(such as timer stamp files) are created as part of that bootup that then
get into the final image. This can cause some side effects, such as
systemd thinking that some persistent timers need to run because the
last trigger time got missed.
Therefore, at the end of the check_install.py script, remove the rw
folder so that it doesn't exist in the image, and that when this image
is started up in a KVM setup for the first time, it starts with a truly
clean slate.
Without this change, the issue seen was that for fstrim.timer, a stamp
file would be present in /var/lib/systemd/timers (and for other timers
that are marked as persistent). This would then cause fstrim.service to
get started immediately when starting a QEMU setup if the timer for that
service missed a trigger, and not wait 10 minutes after bootup. In the
case of fstrim.timer, that means if the image was started in QEMU after
next Monday, since that timer is scheduled to be triggered weekly.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
* Split installation of SONiC and test bootup into two separate scripts
Just removing the rw directory causes other issues, since the first boot
tasks no longer run since that file isn't present. Also, just recreating
that file doesn't completely help, because there are some files that are
moved from the /host folder into the base filesystem layer, and so are
no longer available.
Instead, split the installation of SONiC and doing the test bootup into
two separate scripts and two separate KVM instances. The first KVM
instance is the one currently being run, while the second one has the
`-snapshot` flag added in, which means any changes to the disk image
don't take effect.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Fixes#9326
#### Why I did it
When we try execute DPB from CLI we have error:
`libyang[0]: Invalid value "False" in "has_global_scope" element. (path: /sonic-feature:sonic-feature/FEATURE/FEATURE_LIST[name='bgp']/has_global_scope)`
The reason for this issue is that has_global_scope and other have been stored in redis database with value False or True form capital letter:
```
"FEATURE":{
"bgp":{
"auto_restart":"enabled",
"has_global_scope":"False",
"has_per_asic_scope":"True",
"has_timer":"False",
"high_mem_alert":"disabled",
"state":"enabled"
}
```
But yang model support boolean just in lowercase letters (https://datatracker.ietf.org/doc/html/rfc6020#section-9.5.1).
#### How I did it
Added boolean to sonic-types as typedef with different literal cases.
#### How to verify it
Run the command config interface breakout <breakout_mode>
**NOTE:**
To verify this fix, the following PRs that fix other problems in SONiC must be merged into master:
1) Azure/sonic-buildimage/pull/9075
2) Azure/sonic-buildimage/pull/9276
Signed-off-by: Neetha John <nejo@microsoft.com>
Bring back the changes in #9226 that were reverted. Unable to do a revert-revert.
Why I did it
Few device types were missing in the DEVICE_METADATA type field
How I did it
Added missing device types to the device metadata yang
#### Why I did it
Fixing issue #9294
#### How I did it
Updating ACL yang model
#### How to verify it
Validating issue with `config patch-apply` is fixed.
- Start a KVM
- Add file `add-ctrl-plane-tbl.json-patch ` with content:
```json
[
{
"op": "add",
"path": "/ACL_TABLE/ACTRLPLANETABLE",
"value": {
"policy_desc": "ACTRLPLANETABLE",
"services": [
"SSH"
],
"stage": "ingress",
"type": "CTRLPLANE"
}
}
]
```
- Run `sudo config apply-patch add-ctrl-plane-tbl.json-patch`
Before:
```
Patch Applier: The patch was sorted into 4 changes:
Patch Applier: * [{"op": "add", "path": "/ACL_TABLE/ACTRLPLANETABLE", "value": {"type": "CTRLPLANE"}}]
Patch Applier: * [{"op": "add", "path": "/ACL_TABLE/ACTRLPLANETABLE/policy_desc", "value": "ACTRLPLANETABLE"}]
Patch Applier: * [{"op": "add", "path": "/ACL_TABLE/ACTRLPLANETABLE/services", "value": ["SSH"]}]
Patch Applier: * [{"op": "add", "path": "/ACL_TABLE/ACTRLPLANETABLE/stage", "value": "ingress"}]
```
After:
```
Patch Applier: The patch was sorted into 1 change:
Patch Applier: * [{"op": "add", "path": "/ACL_TABLE/ACTRLPLANETABLE", "value": {"policy_desc": "ACTRLPLANETABLE", "services": ["SSH"], "stage": "ingress", "type": "CTRLPLANE"}}]
```
#### Which release branch to backport (provide reason below if selected)
<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->
- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
#### Description for the changelog
<!--
Write a short (one line) summary that describes the changes in this
pull request for inclusion in the changelog:
-->
#### A picture of a cute animal (not mandatory but encouraged)
#### Why I did it
Add the configuration for the set_owner in the `feature` yang model
#### How I did it
Add new leaf `set_pwner` to the `feature` yang model
#### How to verify it
compile `sonic_yang_mgmt-1.0-py3-none-any.whl`
- Why I did it
Fixes#8898
Dockerd for multiarch build by default use host OS config file ("/etc/docker/daemon.json").
Default configuration used on host OS may not work for containers run inside sonic-slave.
"DOCKER_CONFIG_FILE_FOR_MULTIARCH" variable allows overriding path to the
config file that will be used for multiarch dockerd.
- How I did it
Added "DOCKER_CONFIG_FILE_FOR_MULTIARCH" to Makefile.work file that allow to
override path to dockerd config file through environment variable:
DOCKER_CONFIG_FILE_FOR_MULTIARCH=${path_to_file}/daemon.json make ...
If the env variable is not set build the system preserves its default behavior.
- How to verify it
Set DOCKER_CONFIG_FILE_FOR_MULTIARCH env variable
Run build
While build is running execute ps -eo pid,cmd | grep "[0-9] dockerd.*march" command
Verify that --config-file parameter is set to the same path that was specified in DOCKER_CONFIG_FILE_FOR_MULTIARCH variable.
The same docker image is built multiple times after upgrading to bullseye, the build time is increased to about 15 hours from 6 hours.
See log: https://dev.azure.com/mssonic/be1b070f-be15-4154-aade-b1d3bfb17054/_apis/build/builds/50390/logs/9
Line 1437: 2021-11-11T11:15:02.7094923Z [ building ] [ target/docker-sonic-telemetry.gz ]
Line 1446: 2021-11-11T11:37:41.1073304Z [ finished ] [ target/docker-sonic-telemetry.gz ]
Line 1459: 2021-11-11T11:38:20.6293007Z [ building ] [ target/docker-sonic-telemetry.gz-load ]
Line 1462: 2021-11-11T11:38:28.1250201Z [ finished ] [ target/docker-sonic-telemetry.gz-load ]
Line 2906: 2021-11-11T18:57:42.8207365Z [ building ] [ target/docker-sonic-telemetry.gz ]
Line 2917: 2021-11-11T19:43:47.1860961Z [ finished ] [ target/docker-sonic-telemetry.gz ]
Line 3997: 2021-11-11T22:49:35.0196252Z [ building ] [ target/docker-sonic-telemetry.gz ]
Line 4002: 2021-11-11T23:14:00.4127728Z [ finished ] [ target/docker-sonic-telemetry.gz ]
How I did it
Place the python wheels in another folder relative to the build distribution.
Co-authored-by: Ubuntu <xumia@xumia-vm1.jqzc3g5pdlluxln0vevsg3s20h.xx.internal.cloudapp.net>
#### Why I did it
Currently only IP ACL and related model is defined. Support for MAC ACL is missing. Added support for it.
#### How I did it
ACL_RULE table is added with new MAC ACL related fields namely Source MAC, Destination MAC, Ethertype (Pattern updated to match any valid Ethertypes), VLAN, PCP, DEI
#### How to verify it
Yang model tests are attached.
Why I did it
Add YANG model support for table CABLE_LENGTH
How I did it
Add the YANG model file
Add the test description file and config file
add list CABLE_LENGTH_LIST to the qos_maps_model list in sonic-yang-ext, as it has an inner list.
How to verify it
Build sonic-yang-model and sonic-yang-mgmt
- Why I did it
Add new Spectrum-4 system support SN5600 on top of Nvidia ASIC simulator.
- How I did it
Add all relevant system and simulator SKU.
Updated syseeprom.hex and related directories to reflect Nvidia SN5600 brand name.
- How to verify it
Tested init flow, basic show commands, up interfaces, traffic test.
Signed-off-by: Raphael Tryster <raphaelt@nvidia.com>
- Why I did it
Fix sonic-config-engine unit test failure
- How I did it
* Do not use pytest fixture in the test since it is not compatible with unittest framework which is used by all of the rest test cases.
* Supply 2 missing files
- How to verify it
Run unit test or compile the module (when the unit test will run automatically)
Signed-off-by: Stephen Sun <stephens@nvidia.com>
- Why I did it
To include latest fixes.
SAI
1. Reclaim buffers for port which is admin down
2. Support for Spectrum-4 os Nvidia ASIC simulation
3. Support for SN2201
4. Fix host interface table entry, one channel per trap (fix sflow double registration)
5. 2 new queue counters - ecn marked packets + shared current occupancy
6. Fix storm policer unknown unicast
7. Add key/value for accuflow counters
8. Add MAC move
9. Add mirror congestion mode attribute
SDK
1. Under various circumstances, Ethernet ports falsely showed that InfiniBand cables were connected.
2. In SN4600C, at times, the link up time in both DAC and optics cables may, in the worst case, take up to 15 seconds.
3. Using SN4600C with copper or optics loopback cables in NRZ speeds, link may raise in long link up times
4. When ECMP has high amount of next-hops based on VLAN interfaces, in some rare cases, packets will get a wrong VLAN tag and will be dropped.
5. When connecting Spectrum devices with optical transceivers that support RXLOS, remote side port down might cause the switch firmware to get stuck and cause unexpected switch behavior.
6. Aggregation event is missing for WJH L2 drop reason 'Unicast egress port list is empty'.
7. Tying the SCL and SDA of the optical modules to 3.3V causes errors.
8. On SN4600, there was a delay of more than 10 seconds from the time a data packet is sent from CPU until it is transmitted through one of the switch ports.
9. While using SN4600C system with Finisar FTLC1157RGPL 100GbE CWDM4 modules, intermittent link flaps across multiple ports may be observed.
10. In Spectrum-2 and Spectrum-3 systems, link did not work in auto-negotiation when connected to Marvell PHY. KR mechanism has been enhanced to integrate with Marvell PHY.
11. The tunnel counter counts the drop packets now for Spectrum-2 and Spectrum-3 and consistent with Spectrum behavior and count the ECN dropped packets as well.
12. When connecting SN3800 to Cisco-9000, fast-linkup flow will fail and will rise in the normal flow.
13. Race condition in WJH library: when multiple threads load the LAG shared memory concurrently, the program may crash.
14. Add WJH L2 drop reason 'Unicast egress port list is empty' as a new drop reason.
15. Fixed a memory leak in sx_api_port_sflow_statistics_get API.
16. During initialization flow, the command interface that is used by the minimal driver and SDK caused the collision in the firmware since the same buffer is used in the firmware for the two interfaces.
17. Fix route issue on Kernel 5.10
- How I did it
Updated SDK/SAI submodule and relevant makefiles with the required versions.
- How to verify it
Build an image and run tests from "sonic-mgmt".
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
Why I did it
#9122
DEVICE_METADATA does not have cloudtype and region.
How I did it
Add cloudtype and region to DEVICE_METADATA.
How to verify it
Follow the steps in #9122.
Build sonic-yang-model.
Signed-off-by: Gang Lv ganglv@microsoft.com
Update Nokia platform sonic-pmon submoduel to the latest with the following commits
c41c823 Fix transceiver module dynamic insertion/removal operations
21a1df6 Fixed pcied process FATAl issue
7fc1fd4 Fix midplane status nokia_cmd
a14ee1c Override get_module() api in chassis
8a457fc SON-326: Watchdog logger changes and file scrubbing
7250eb1 SON-410: Fix missing eeprom access routine
7a70c42 Allow only reboot of self card for OC API test
6ab5d96 Fixed the flake8 compliant issues
807de95 APIs to set thermal threshold to return false
9b38265 SON-382: platform-dump with common techsupport
3f83a67 Add model, base_mac, system_eeprom and serial number support in moduel.py
848d311 SFP: Add get_error_description and fix return status for set_lpmode
1fcb5de PSU check presence of psu instance for APIs
7c68da3 Fixed the eagle and hornet card description
0c01d07 Module support for reboot API
Why I did it
Nvidia platform API does not support set LED to orange
How I did it
Allow user to set LED to orange
How to verify it
Added unit test
Manual test
- Use SfpOptoeBase by default to leverage new `sonic_xcvr` refactor
- Add support for `Woodleaf` product
- Move `libsfp-eeprom.so` to a different `.deb` package
- Add new logrotate configuration for arista logs
- Improve logging mechanism for the drivers (IO loglevel, fix syslog duplicates)
- Initialize chassis cards in parallel
- Refactor of `get_change_event` to fix interrupts treated as presence change
Why I did it
To implement fan control using thermalctld in DellEMC S6000 platform
Requires: Azure/sonic-linux-kernel#241
How I did it
Add thermal policies in 'thermal_policy.json'
Implemented thermal_manager.py and the necessary modules to perform fan control via thermalctld
Removed fancontrol.sh
How to verify it
Verified that the fan speeds are set based on the fan and temperature status.
Logs: S6000_fan_control_test_logs.txt
#### What I did
[sonic-linkmgrd][master] submodule update
6c6151b Fix unstable unit tests (state change handler wasn't invoked) (#8)
2f7dc0a support code diff coverage (#5)
83f0002 Force mux state switch to standby if triggered from Cli (#6)
signed-off-by: Jing Zhang zhangjing@microsoft.com
- Why I did it
Also recalculated all parameters with the latest algorithm with per-speed peer response time taken into account
- How I did it
Detailed information of each SKU:
C64:
t0: 32 100G downlinks and 32 100G uplinks
t1: 56 100G downlinks and 8 100G uplinks with 2km-cable supported
D112C8: 112 50G downlinks and 8 100G uplinks.
D48C40: 48 50G downlinks, 32 100G downlinks, and 8 100G uplinks
D100C12S2: 4 100G downlinks, 2 10G downlinks, 100 50G downlinks, and 8 100G uplinks
2km cable is supported for C64 on t1 only
- How to verify it
Run regression test (QoS)
Signed-off-by: Stephen Sun <stephens@nvidia.com>
#### Why I did it
DPB falls due to missing POLL_INTERVAL in sonic-flex_counter yang model.
#### How I did it
Added POLL_INTERVAL leaf to ACL container in sonic-flex_counter yang model.
#### How to verify it
Run the command config interface breakout <interface> <breakout_mode>
**NOTE:**
To verify this fix, a PR ([add set_owner to feature yang](https://github.com/Azure/sonic-buildimage/pull/9075)) that fix another bug in SONiC should be merged to master.
Why I did it
Add yang model for syslog server
How I did it
Add new file sonic-syslog.yang and new files for tests
How to verify it
Compile target/python-wheels/sonic_yang_mgmt-1.0-py3-none-any.whl
Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan arlakshm@microsoft.com
- Add INCLUDE_PINS to config to enable/disable container
- Add Docker files and supporting resources
- Add sonic-pins submodule and associated make files
Submission containing materials of a third party:
Copyright Google LLC; Licensed under Apache 2.0
#### Why I did it
Adds P4RT container to SONiC for PINS
The P4RT app is covered by this HLD:
https://github.com/pins/SONiC/blob/master/doc/pins/p4rt_app_hld.md
#### How I did it
Followed the pattern and templates used for other SONiC applications
#### How to verify it
Build SONiC with INCLUDE_P4RT set to "y".
Verify that the resulting build has a container called "p4rt" running.
You can verify that the service is up by running the following command on the SONiC switch:
```bash
sudo netstat -lpnt | grep p4rt
```
You should see the service listening on TCP port 9559.
#### Which release branch to backport (provide reason below if selected)
None
#### Description for the changelog
Build P4RT container for PINS
- Why I did it
To fix the above error when running make slave.mk with PLATFORM=vs.
- How I did it
Instead of:
export BUILD_MULTIASIC_KVM=$(BUILD_MULTIASIC_KVM)
do just the export:
export BUILD_MULTIASIC_KVM
BUILD_MULTIASIC_KVM is already defined to be either empty, or from rules/config or from the environment - from Makefile.work. No need to dereference the variable in the export statement.
- How to verify it
PLATFORM=vs make -f slave.mk list # verify no error and BUILD_MULTIASIC_KVM is empty in the output
PLATFORM=vs BUILD_MULTIASIC_KVM=y make -f slave.mk list # verify no error and BUILD_MULTIASIC_KVM is set to y in the output
Signed-off-by: Stepan Blyshchak <stepanb@nvidia.com>