Why I did it
Support to trigger a pipeline to download and publish artifacts to storage and container registry.
Support to specify the patterns which docker images to upload.
How I did it
Pass the pipeline information and the artifact information by pipeline parameters to the pipeline which will be triggered a new build. It is to decouple the artifacts generation and the publish logic, how and where the artifacts/docker images will be published, depends on the triggered pipeline.
How to verify it
- Why I did it
Platform_reboot files for simx doesn't do aything different apart from calling /sbin/reboot. which is anyway done in the /usr/local/bin/reboot script i.e. the parent script which calls the platform specific reboot scripts if present.
Moreover, /sbin/reboot invoked in the platform specific reboot script is a non-blocking call and thus it returns back to the original script (although /sbin/reboot does it job in the background) and we see messages like this.
Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>
In Makefile.cache, for $(1)_DEP_PKGS_SHA, the intention is to include
the DEP_MOD_SHA and MOD_HASH of each of the current package's
dependencies. However, there's a level of dereferencing missing; instead
of grabbing the value of $(dfile)_DEP_MOD_SHA, it is literally using the
variable name $(dfile)_DEP_MOD_SHA. This means that the value of this
variable will not change when some dependency changes.
The impact of this is in transitive dependencies. For a specific
example, if there is some change in sairedis, then sairedis will be
rebuilt (because there's a change within that component), and swss will
be rebuilt (because it's a direct dependency), but
docker-swss-layer-buster will not get rebuilt, because only the direct
dependencies are effectively being checked, and those aren't changing.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Fix issue: Non compliant leaf list in config_db schema: https://github.com/Azure/sonic-buildimage/issues/9801
The basic flow of DPB is like:
1. Transfer config db json value to YANG json value, name it “yangIn”
2. Validate “yangIn” by libyang
3. Generate a YANG json value to represent the target configuration, name it “yangTarget”
4. Do diff between “yangIn” and “yangTarget”
5. Apply the diff to CONFIG DB json and save it back to DB
The fix:
• For step #1, If value of a leaf-list field string type, transfer it to a list by splitting it with “,” the purpose here is to make step#2 happy. We also need to save <table_name>.<key>.<field_name> to a set named “leaf_list_with_string_value_set”.
• For step#5, loop “leaf_list_with_string_value_set” and change those fields back to a string.
1. Manual test
2. Changed sample config DB and unit test passed
Conflicts:
src/sonic-yang-mgmt/sonic_yang_ext.py
079f80a (HEAD -> 202111, origin/202111) Fix: if routestr does not exist, skip (#257)
8fd0fe1 Fix: not to use blocking get_all() after keys() (#255)
981107a Add VoQ Recirc interface (i.e., Ethernet-Rec) to interface maps for S… (#244)
f4ecfb6 (HEAD -> 202111, origin/202111) Removing Vnet with scope default (#2239)
Why I did it
The PR is aimed to fix a bug that mgmt port eth0 may loss IP even if user configured static IP of eth0. This is not a always reproduceable issue, the reproducing flow is like:
Systemd starts networking service, which runs a dhcp based configuration and assigned an ip from dhcp.
Systemd starts interface-config service who depends on networking service
Interface-config service runs command “ifdown –force eth0”, check line. but networking service is still running so that this line failed with error: “error: Another instance of this program is already running.”. This error is printed by ifupdown2 lib who is the main process of networking service. So, ifdown actually does not work here, the ip of eth0 is not down.
Interface-config service updates /etc/networking/interface to static configuration.
Interface-config service runs command “systemctl restart networking”. This command kills the previous networking related processes (log: networking.service: Main process exited, code=killed, status=15/TERM), and try to reconfigure the ip address with static configuration. But it detects that the configured IP and the existing IP are the same, and it does not really configure the ip to kernel. Hence, the ip is still getting from dhcp. (this could be a bug of ifupdown2: previous ip is from dhcp, new ip is a static ip, it treats them as same instead of re-configuring the IP)
When the lease of the ip expires, the ip of eth0 is removed by kernel and the issue reproduces.
The issue is not always reproduceable because networking service usually runs fast so that it won't hit step#3.
How I did it
Check networking service state before running "ifdown –force eth0", wait for it done if it is activating.
How to verify it
Manual test.
Why I did it
When lldpmgrd handled events of other tables besides PORT_TABLE, error message was printed to log.
How I did it
Handle event according to its file descriptor instead of looping all registered selectables for each coming event.
How to verify it
I verified same events are being handled by printing events key and operation, before and after the change.
Also, before the change, in init flow after config reload, when lldpmgrd handled events of other tables besides PORT_TABLE, error messages were printed to log, this issue is solved now.
- Why I did it
Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time.
This parallel execution consume CPU time and the duration of create_switch is longer than it should be.
Following this finding, and the motivation to ensure these services will not interfere in the future, PMON is delayed in 90 seconds until the system finish the init flow after fastboot.
- How I did it
Add a timer for PMON service.
Exclude for MLNX platform the start trigger of PMON when SYNCD starts in case of fastboot.
Copy the timer file to the host bin image.
- How to verify it
Run fast-reboot on MLNX platform and observe faster create_switch execution time.
#### Why I did it
Need to pass LY_CTX_DISABLE_SEARCHDIR_CWD to Context in order to disable automatically searching for schemas in current working directory (which is by default searched automatically)
#### How I did it
add additional attribute into YANG context
#### How to verify it
Create some invalid link on switch :
1) **ln -s /usr/abc xxx**
2) run **spm list**
--> There should not be these messages:
```
libyang[1]: Unable to get information about "xxx" file in "/tmp" when searching for (sub)modules (No such file or directory)
libyang[1]: Unable to get information about "xxx" file in "/tmp" when searching for (sub)modules (No such file or directory)
libyang[1]: Unable to get information about "xxx" file in "/tmp" when searching for (sub)modules (No such file or directory)
libyang[1]: Unable to get information about "xxx" file in "/tmp" when searching for (sub)modules (No such file or directory)
```
- Why I did it
Profiling the system state on init after fast-reboot during create_switch function execution, it is possible to see few python scripts running at the same time.
This parallel execution consume CPU time and the duration of create_switch is longer than it should be.
Following this finding, and the motivation to ensure these services will not interfere in the future, LLDP is delayed in 90 seconds until the system finish the init flow after fastboot.
- How I did it
Add a timer for LLDP service.
Copy the timer file to the host bin image.
- How to verify it
Run fast-reboot on MLNX platform and observe faster create_switch execution time.
This PR is dependent on PR: #10567
* Remove SSH host keys after installing the custom version of sshd
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
* Use an override for for sshd instead of overwriting the service file
Don't overwrite upstream's .service file, and instead use an override
file for making sure the host key(s) are generated.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
e46b243b Fix checkReplyType failed issue via recreating xcvr_table_helper on forking subprocess (#255) (#256)
Signed-off-by: Stephen Sun <stephens@nvidia.com>
fc29641 [pbh] [aclorch] Fixed a bug causes by updating the flow-counter value for the PBH rule (#2226)
6c38ef7 [QoS] Resolve an issue in the sequence where a referenced object removed and then the referencing object deleting and then re-adding (#2210)
- Why I did it
InvalidPsuVolWA.run might raise exception if user power off PSU when it is running. This exception is not caught and will be raised to psud which causes psud failed to update PSU data to DB.
- How I did it
1. Change the log level when WA does not work. This could happen when user power off PSU, hence changing the log level from error to warning is better
2. Change the wait time from 5 to 1 to avoid introduce too much delay in psud. 1 second is usually enough per my test
3. Give a default return value for function get_voltage_low_threshold and get_voltage_high_threshold to avoid exception reach to psud
- How to verify it
Manual test.
Run sonic-mgmt regression
#### Why I did it
The test plan described in the `How to verify it` section caused an issue when 3 images (instead of 2) were present when executing `show boot` or `sonic-installer list` commands:
```
root@sonic:/home/admin# show boot
Current: SONiC-OS-master.0-dirty-20220118.165941
Next: SONiC-OS-master.0-dirty-20220118.165941
Available:
SONiC-OS-master.0-dirty-20220118.165941
SONiC-OS-202012.201-a0376a6e5_Internal
SONiC-OS-202012.201-a0376a6e5_Internal_RPC
```
#### How I did it
Fixed the `sed` pattern to match the current image revision in the `install.sh` script.
#### How to verify it
Test plan:
1. Install the `imageA` by using ONIE
2. Install the `imageA-rpc` by using `sonic-installer`
3. Reboot the switch
4. Swap to the `imageA` - `sonic-installer set-default imageA`
5. Reboot the switch
6. Install the `imageB` by using `sonic-installer`
7. Check an installed images - `show boot`
8. Reboot the switch
9. Check an installed images - `show boot`
Why I did it
To sign SONiC kernel image and allow secure boot based system to verify SONiC image before loading into the system.
How I did it
Pass following parameter to rules/config.user
Ex:
SONIC_ENABLE_SECUREBOOT_SIGNATURE := y
SIGNING_KEY := /path/to/key/private.key
SIGNING_CERT := /path/to/public/public.cert
How to verify it
Secure boot enabled system enrolled with right public key of the, image in the platform UEFI database will able to verify image before load.
Alternatively one can verify with offline sbsign tool as below.
export SBSIGN_KEY=/abc/bcd/xyz/
sbverify --cert $SBSIGN_KEY/public_cert.cert fsroot-platform-XYZ/boot/vmlinuz-5.10.0-8-2-amd64 mage
O/P:
Signature verification OK
* [CG-Fix-CVE-2021-44906] Patching on thrift.0.14.1 for package minimist
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
* add more information in patch
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
* Update 0003-Remove-minimist-packages.patch
* change the thrift 0.14.1 to package download
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
* use the series file for patching
* fix a code defect
Co-authored-by: Richard.Yu <richard.yu@microsoft.com>
Why I did it
Minigraph parser added a new field 'cluster' to device_metadata, and then yang validation is blocked.
How I did it
Add 'cluster' to device_metadata yang models.
How to verify it
Run UT for sonc-yang-models.
Use minigraph parser to generate ConfigDB schema and run yang validation.
Signed-off-by: Gang Lv ganglv@microsoft.com
Why I did it
dhcp_server is introduced, and need to update yang model.
How I did it
Update yang models and add unit test.
How to verify it
Run unit test for sonic-yang-models.
Signed-off-by: Gang Lv ganglv@microsoft.com
The interface renaming logic fails if one interface is missing.
Because of the `set -e` the whole initramfs hook would abort early on
error.
This change fixes the current behavior to make sure missing interfaces
are properly skipped and ensure existing interface are renamed.
On some products the pci enumeration adds randomness into which nic gets
initialized first.
Because SONiC doesn't use deterministic interface naming but instead old
style interface naming, this leads to eth0 not always being the
management port.
To make sure eth0 is always the management port (SONiC expectation)
rename the interfaces in the initramfs for Arista products.
- Why I did it
There is a hardware bug that PSU voltage threshold sysfs returns incorrect value. The workaround is to call "sensor -s" to refresh it.
- How I did it
Call "sensor -s" when the threshold value is not incorrect and PSU is "DELTA 1100"
- How to verify it
Unit test and Manual test
Why I did it
minigraph parser has introduced new type.
How I did it
Update yang models to support BmcMgmtToRRouter.
How to verify it
Run unit test for sonic-yang-models
Signed-off-by: Gang Lv ganglv@microsoft.com
Why I did it
Need to run yang validation for sonic-cfggen unit test, and many unit test does not provide speed for port table.
How I did it
Update minigraph xml.
How to verify it
Run sonic-cfggen unit test.
Signed-off-by: Gang Lv ganglv@microsoft.com
Why I did it
ASN range is from 1 to 4294967295, need to remove invalid ASN.
How I did it
Update unit test and replace ASN 0.
How to verify it
Run unit test for sonic-config-engine.
Signed-off-by: Gang Lv ganglv@microsoft.com
Why I did it
sonic-config-engine unit test is using invalid switch_type
How I did it
Update xml with correct switch_type
How to verify it
Run UT for sonic-config-engine
Signed-off-by: Gang Lv ganglv@microsoft.com
Why I did it
Need to run yang validation for sonic-cfggen unit test, and many unit test does not provide lanes for port table.
How I did it
Update port config file.
How to verify it
Run sonic-cfggen unit test,
Use below PR to verify
#10228
Signed-off-by: Gang Lv ganglv@microsoft.com