Why I did it
Makefile needs some dependencies from the Internet. It will fail for network related issue.
Retries will fix most of these issues.
How I did it
Add retries when running commands which maybe related with networking.
How to verify it
Cherry-pick PR: #11846
Signed-off-by: Saikrishna Arcot sarcot@microsoft.com
Why I did it
The current error handling code for when a deb package fails to be
installed currently has a chain of commands linked together by && and
ends with exit 1. The assumption is that the commands would succeed,
and the last exit 1 would end it with a non-zero return code, thus
fully failing the target and causing the build to stop because of bash's
-e flag.
However, if one of the commands prior to exit 1 returns a non-zero
return code, then bash won't actually treat it as a terminating error.
From bash's man page:
-e Exit immediately if a pipeline (which may consist of a single simple
command), a list, or a compound command (see SHELL GRAMMAR above),
exits with a non-zero status. The shell does not exit if the
command that fails is part of the command list immediately
following a while or until keyword, part of the test following the
if or elif reserved words, part of any command executed in a && or
|| list except the command following the final && or ||, any
command in a pipeline but the last, or if the command's return
value is being inverted with !. If a compound command other than a
subshell returns a non-zero status because a command failed while
-e was being ignored, the shell does not exit.
The part part of any command executed in a && or || list except the command following the final && or || says that if the failing command
is not the exit 1 that we have at the end, then bash doesn't treat it
as an error and exit immediately. Additionally, since this is a compound
command, but isn't in a subshell (subshell are marked by ( and ),
whereas { and } just tells bash to run the commands in the current
environment), bash doesn't exist. The result of this is that in the
deb-install target, if a package installation fails, it may be
infinitely stuck in that while-loop.
This was seen when the snmpd package upgrade happened, and
builds were failing to install the mismatching libsnmp-dev package,
the builds did not immediately terminate; instead, the installation
was retried again and again, suggesting it was stuck in some infinite
loop. The build jobs finally terminated only because of the timeout
specified for the jobs.
How I did it
There are two fixes for this: change to using a subshell, or use ;
instead of &&. Using a subshell would, I think, require exporting any
shell variables used in the subshell, so I chose to change the && to
;. In addition, at the start of the subshell, set +e is added in,
which removes the exit-on-error handling of bash. This makes sure that
all commands are run (the output of which may help for debugging) and
that it still exits with 1, which will then fully fail the target.
How to verify it
Why I did it
Change the path of sonic submodules that point to "Azure" to point to "sonic-net"
How I did it
Replace "Azure" with "sonic-net" on all relevant paths of sonic submodules
Why I did it
Fix the official build not triggered correctly issue, caused by the azp template path not existing.
How I did it
Change the azp template path.
Why I did it
Current isc-dhcp uses below code to remove DHCP option:
memmove(sp, op, op[1] + 2);
sp += op[1] + 2;
sp points to the option to be stripped, we can call it as option S.
op points to the option after options S, we can call it as option O.
DHCP option is a typical type-length-value structure, the first byte is type, the second byte is length, and remain parts are value.
In this case, option O length is bigger than option S, and more than 2 bytes, after the memmove, we will get this result:
Now Option S and Option O are overwritten, op[1] was the length of Option O, and it's modified after memmove.
But current implementation is still using op[1] as length to update sp (sp+=op[1]+2), so we get the wrong sp.
How I did it
Create patch from https://github.com/isc-projects/dhcp
The new impelementation use mlen to store the length of Option O before memmove, that's how it fixed the bug.
size_t mlen = op[1] + 2;
memmove(sp, op, mlen);
sp += mlen;
How to verify it
I have a PR for sonic-mgmt to cover this issue:
sonic-net/sonic-mgmt#6330
Signed-off-by: Gang Lv ganglv@microsoft.com
Signed-off-by: Gang Lv ganglv@microsoft.com
Co-authored-by: ganglv <88995770+ganglyu@users.noreply.github.com>
Why I did it
Fix apt-get remove/purge version not locked issue when the apt-get options not specified.
How I did it
Add a space character before and after the command line parameters.
Co-authored-by: xumia <59720581+xumia@users.noreply.github.com>
Why I did it
Cherry-pick #12009, and fix code conflict.
Fix the dbus-pyhon installation failure when building docker-sonic-vs, caused by the command dbus-run-session not found.
The command "dbus-run-session" should be the new dependency introduced in dbus-python 1.3.2, the old version 1.2.18 does not have the issue.
How I did it
Install the dbus debian package which contains the command dbus-run-session.
It is not a blocking issue on release branches. The release branches with reproducible build feature can avoid such issue in official builds and PR builds, it only block the version upgrade (trying to upgrade from 1.2.18 to 1.3.2).
How to verify it
- Why I did it
Update SDK/FW version - 4.5.2318/2010_2318 to pick up new fixes:
1. Cr space timeout on Hold and Release GW - at warm boot
2. SPC-1 Port in stuck PHY_UP after peer side rebooted
3. memory leak in sx_api_router_ecmp_update_set
- How I did it
update the make file with the new version number
update submodule Switch-SDK-drivers pointer
- How to verify it
run sonic regression
Signed-off-by: Kebo Liu <kebol@nvidia.com>
* [snmpd]: Update to 5.9+dfsg-4+deb11u1 to match Debian version
This brings in some security fixes.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Co-authored-by: Saikrishna Arcot <sarcot@microsoft.com>
- Why I did it
This is for the eventual support of multiple architectures for the mellanox platform.
- How I did it
Change the location of the binaries in Switch-SDK-drivers so that the path specifies the target architecture in addition to the target distribution that the debians are built for.
This is the most straightforward way to separate binaries built against different architectures and selectively target them for installation in the mellanox SONiC image.
- How to verify it
Build SONiC for mellanox and verify it compiles successfully.
Why I did it
Fix the missing debian package for reproducible build issue.
The gnupg2 should be added into the version file.
https://dev.azure.com/mssonic/build/_build/results?buildId=118139&view=logs&j=88ce9a53-729c-5fa9-7b6e-3d98f2488e3f&t=8d99be27-49d0-54d0-99b1-cfc0d47f0318
The following packages have unmet dependencies:
gnupg2 : Depends: gnupg (>= 2.2.27-2+deb11u2) but 2.2.27-2+deb11u1 is to be installed
E: Unable to correct problems, you have held broken packages.
The issue was caused by the gnupg2 removed, and not detected.
sonic-buildimage/build_debian.sh
Line 250 in 4fb6cf0
sudo LANG=C chroot $FILESYSTEM_ROOT apt-get -y remove software-properties-common gnupg2 python3-gi
Why I did it
Fix the openssh build issue, upgrade from 8.4p1-5 to 8.4p1-5+deb11u1.
https://dev.azure.com/mssonic/build/_build/results?buildId=120209&view=logs&j=88ce9a53-729c-5fa9-7b6e-3d98f2488e3f&t=8d99be27-49d0-54d0-99b1-cfc0d47f0318
+ sudo dpkg --root=./fsroot-broadcom -i target/debs/bullseye/openssh-server_8.4p1-5_amd64.deb
dpkg: warning: downgrading openssh-server from 1:8.4p1-5+deb11u1 to 1:8.4p1-5
(Reading database ... 44818 files and directories currently installed.)
Preparing to unpack .../openssh-server_8.4p1-5_amd64.deb ...
Unpacking openssh-server (1:8.4p1-5) over (1:8.4p1-5+deb11u1) ...
dpkg: dependency problems prevent configuration of openssh-server:
openssh-server depends on openssh-client (= 1:8.4p1-5); however:
Version of openssh-client on system is 1:8.4p1-5+deb11u1.
dpkg: error processing package openssh-server (--install):
dependency problems - leaving unconfigured
Errors were encountered while processing:
openssh-server
+ clean_sys
How I did it
Upgrade openssh from 8.4p1-5 to 8.4p1-5+deb11u1.
How to verify it
Why I did it
When any of the test job failed in the test stage, the rerun will not work, the test stage will be skipped automaticall, so we do not have chance to rerun the test stage again, and the checks of the test will be always in failed status, block the PR to merge forever.
It should be caused by the condition in the Test stage, we should specify the status of the BuildVS stage.
How I did it
Fix stage dependency logic.