Commit Graph

1068 Commits

Author SHA1 Message Date
judyjoseph
ab713dcfb6 Use the macsec_enabled flag in platform to enable macsec feature state (#11998)
* Use the macsec_enabled flag in platform to enable macsec feature state
* Add macsec supported metadata in DEVICE_RUNTIME_METADATA
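A quick way to inspect the resulting feature state (illustrative only; the FEATURE table key name is assumed from the commit title):

```
# Illustrative check of the macsec feature state in CONFIG_DB (key name assumed)
sonic-db-cli CONFIG_DB HGET 'FEATURE|macsec' state
```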
2022-11-10 18:08:42 +00:00
mssonicbld
584aaa7058
[ci/build]: Upgrade SONiC package versions (#12612) 2022-11-06 22:25:30 +08:00
mssonicbld
98c3e24770
[ci/build]: Upgrade SONiC package versions (#12606) 2022-11-05 00:36:02 +08:00
mssonicbld
1463af1227
[ci/build]: Upgrade SONiC package versions (#12584) 2022-11-03 00:12:56 +08:00
mssonicbld
fe62175aa6
[ci/build]: Upgrade SONiC package versions (#12571) 2022-11-02 01:18:10 +08:00
mssonicbld
ae681eabb8
[ci/build]: Upgrade SONiC package versions (#12556) 2022-11-01 03:49:12 +08:00
mssonicbld
483257d88c
[ci/build]: Upgrade SONiC package versions (#12543) 2022-10-28 23:15:39 +08:00
Samuel Angebault
8e44292d74
[202205][Arista] Fix cmdline generation during warm-reboot from 201811/201911 (#12371)
* [202012][Arista] Fix cmdline generation during warm-reboot from 201811/201911 (#11161)

Issue fixed: when performing a warm-reboot or fast-reboot from 201811 or 201911 to 202012, the kernel command line contains duplicate information. This issue is related to a change that was made to make the 202012 boot0 file more futureproof.
A cold reboot brings everything back to a clean slate, though that is not always desirable.

Changes done:
Added some logic to properly detect the end of the Aboot cmdline when the cmdline-aboot-end delimiter is not set (clean case)
Added some logic to regenerate the Aboot cmdline when cmdline-aboot-end is set but duplicate parameters exist before it (dirty case). Reorganized some code to handle duplicate parameters in the allowlist.

* Fix cmdline generation due to sonic_fips
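A minimal sketch of the dedup idea (illustrative only, not the actual boot0 code): keep the first occurrence of each kernel parameter key and drop later duplicates.

```
# Hypothetical helper: drop duplicate key=value kernel parameters, keeping the first
dedup_cmdline() {
    local seen="" out="" tok key
    for tok in $1; do
        key="${tok%%=*}"
        case " $seen " in
            *" $key "*) continue ;;   # duplicate key, skip it
        esac
        seen="$seen $key"
        out="$out $tok"
    done
    printf '%s\n' "${out# }"
}
# e.g. dedup_cmdline "$(cat /proc/cmdline)"
```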
2022-10-27 10:14:26 -07:00
Samuel Angebault
b1c0d8d5e4
Add emmc quirks to boot0 (#9989) (#12373)
Why I did it
Fix some unreliability seen on emmc device with some AMD CPUs

How I did it
Added a kernel parameter to add quirks to the eMMC device.
It depends on a sonic-linux-kernel change to work properly but will be a no-op without it.

Description for the changelog
Add emmc quirks for Upperlake
2022-10-27 07:09:03 -07:00
Devesh Pathak
17c213a264 Fix to improve hostname handling (#12064)
* Fix to improve hostname handling
If config_db.json is missing the hostname entry, hostname-config.sh ends
up deleting the existing entry too, and the hostname changes to the default 'localhost'

* default hostname to 'sonic' if missing in config file
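A minimal sketch of the fallback logic (assumed; the real change lives in hostname-config.sh):

```
# Hypothetical fallback: never wipe the hostname just because config_db.json lacks one
HOSTNAME=$(sonic-cfggen -d -v "DEVICE_METADATA['localhost']['hostname']" 2>/dev/null)
if [ -z "$HOSTNAME" ]; then
    HOSTNAME=sonic          # default when the entry is missing
fi
echo "$HOSTNAME" > /etc/hostname
hostname -F /etc/hostname
```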
2022-10-25 21:52:42 +00:00
Samuel Angebault
94c8107f5e Fix extraction of platform.tar.gz for firsttime (#11935) 2022-10-25 20:43:32 +00:00
cytsao1
8930d70972 [pmon] Add smartmontools to pmon docker (#11837)
* Add smartmontools to pmon docker

* Set smartmontools to install version 7.2-1 in pmon to match host; clean up smartmontools build files

* Add comments on smartmontools version for both host and pmon
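Version pinning with apt generally looks like the following (illustrative; the actual Dockerfile lines may differ):

```
# Illustrative: pin smartmontools to the same version as the host image
apt-get update && apt-get install -y --no-install-recommends smartmontools=7.2-1
```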
2022-10-25 20:41:26 +00:00
xumia
db2128564b
[202205] Change submodule path from Azure to sonic-net (#12308)
Why I did it
Change the path of sonic submodules that point to "Azure" to point to "sonic-net"

How I did it
Replace "Azure" with "sonic-net" on all relevant paths of sonic submodules
2022-10-24 13:13:14 +08:00
mssonicbld
abc92c6248
[ci/build]: Upgrade SONiC package versions (#12452) 2022-10-20 03:23:45 +08:00
mssonicbld
5d2db5068c
[ci/build]: Upgrade SONiC package versions (#12437) 2022-10-18 22:19:35 +08:00
mssonicbld
cfc9af71ef
[ci/build]: Upgrade SONiC package versions (#12418) 2022-10-16 22:24:10 +08:00
mssonicbld
b4e6a06d1a
[ci/build]: Upgrade SONiC package versions (#12409) 2022-10-14 23:51:03 +08:00
Ying Xie
a1365b44c3 [BGP] starting BGP service after swss (#12381)
Why I did it
BGP service has always been starting after interfaces-config. However, recently we discovered an issue where some BGP sessions are unable to establish due to the BGP daemon not being able to read the interface IP.

This issue was clearly observed after upgrading to FRR 8.2.2. See more details in #12380.

How I did it
Delaying the start of BGP seems to be a workaround for this issue.

However, a caution is that this delay might impact warm reboot timing and other timing sequences.

This workaround reduces the probability of hitting the issue by close to 100X. However, it is not bulletproof, as testing shows. It is still preferable to have a proper FRR fix and revert this change in the future.

How to verify it
Continuously issue config reload and check BGP session status afterwards.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
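The ordering change can be pictured as a systemd drop-in like the following (illustrative only; the actual change lives in the SONiC service templates):

```
# Hypothetical drop-in expressing "start bgp after swss"
mkdir -p /etc/systemd/system/bgp.service.d
cat > /etc/systemd/system/bgp.service.d/ordering.conf <<'EOF'
[Unit]
After=swss.service
EOF
systemctl daemon-reload
```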
2022-10-13 16:34:10 +00:00
mssonicbld
3435a8a305
[ci/build]: Upgrade SONiC package versions (#12372) 2022-10-13 02:58:26 +08:00
mssonicbld
1b5d61246a
[ci/build]: Upgrade SONiC package versions (#12324) 2022-10-09 21:44:14 +08:00
Stepan Blyshchak
06f8b1f98a
[auto-ts] add memory check (#10433) (#12291)
#### Why I did it

To support automatic techsupport invocation in case memory usage is too high.

#### How I did it

Implemented according to https://github.com/Azure/SONiC/pull/939

#### How to verify it

UT, manual test on the switch.

*DEPENDS* on https://github.com/Azure/sonic-utilities/pull/2116
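The core idea, sketched in shell (threshold and trigger command are illustrative, not the actual implementation):

```
# Hypothetical memory-threshold check that triggers a techsupport dump
THRESHOLD_PCT=90
used_pct=$(free | awk '/^Mem:/ {printf "%d", $3 * 100 / $2}')
if [ "$used_pct" -ge "$THRESHOLD_PCT" ]; then
    show techsupport    # collect a dump while the high-memory condition is live
fi
```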
2022-10-06 08:06:46 -07:00
Prince George
fab37239dd Disable bracketed-paste mode by default (#12285)
* Disable bracketed-paste mode by default

* address review comment
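Bracketed paste is a readline option, so the change amounts to something like this (illustrative):

```
# Illustrative: turn bracketed paste off for all readline users
echo 'set enable-bracketed-paste off' >> /etc/inputrc
bind -f /etc/inputrc   # pick up the change in the current bash session
```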
2022-10-06 14:58:46 +00:00
Saikrishna Arcot
ac19e2a8ba [docker-wait-any]: Exit worker thread if main thread is expected to exit (#12255)
There's an odd crash that intermittently happens after the teamd container
exits, and a signal is raised to the main thread to exit. This thread (watching
teamd) continues execution because it's in a `while True`. The subsequent wait
call on the teamd container very likely returns immediately, and it calls
`is_warm_restart_enabled` and `is_fast_reboot_enabled`. In either of these
cases, sometimes, there is a crash in the transition from C code to Python code
(after the function gets executed).  Python sees that this thread got a signal
to exit, because the main thread is exiting, and tells pthread to exit the
thread.  However, during the stack unwinding, _something_ is telling the
unwinder to call `std::terminate`.  The reason is unknown.

This then results in a python3 SIGABRT, and systemd then doesn't call the stop
script to actually stop the container (possibly because the main process exited
with a SIGABRT, so it's a hard crash). This means that the container doesn't
actually get stopped or restarted, resulting in an inconsistent state
afterwards.

The workaround appears to be that if we know the main thread needs to exit,
just return here, and don't continue execution. This at least tries to avoid it
from getting into the problematic code path. However, it's still feasible to
get a SIGABRT, depending on thread/process timings (i.e. teamd exits, signals
the main thread to exit, and then syncd exits, and syncd calls one of the two C
functions, potentially hitting the issue).

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2022-10-06 14:57:53 +00:00
mssonicbld
204cf58221
[ci/build]: Upgrade SONiC package versions (#12278) 2022-10-05 20:38:20 +08:00
Ying Xie
76f7d7fa53 Revert "[auto-ts] add memory check (#10433)"
This reverts commit a2cd0f5d4c.
2022-10-04 21:53:45 +00:00
mssonicbld
1a08069d40
[ci/build]: Upgrade SONiC package versions (#12268) 2022-10-04 21:09:24 +08:00
Stepan Blyshchak
a2cd0f5d4c [auto-ts] add memory check (#10433)
#### Why I did it

To support automatic techsupport invocation in case memory usage is too high.

#### How I did it

Implemented according to https://github.com/Azure/SONiC/pull/939

#### How to verify it

UT, manual test on the switch.

*DEPENDS* on https://github.com/Azure/sonic-utilities/pull/2116
2022-10-03 18:58:38 +00:00
mssonicbld
89643d4717
[ci/build]: Upgrade SONiC package versions (#12245) 2022-10-02 21:13:07 +08:00
mssonicbld
a7d088c47c
[ci/build]: Upgrade SONiC package versions (#12191) 2022-09-28 23:25:55 +08:00
mssonicbld
1c5abca0a6
[ci/build]: Upgrade SONiC package versions (#12187) 2022-09-27 08:41:31 +08:00
mssonicbld
99f9c53d19
[ci/build]: Upgrade SONiC package versions (#12142) 2022-09-25 21:57:18 +08:00
Volodymyr Boiko
3d620370f7 [bgp][service] Start bgp service after interfaces-config service (#11827)
- Why I did it
interfaces-config service restarts the networking service. During the restart the loopback interface address is removed and reassigned, leaving loopback without an IPv4 address for a while.
On SONiC startup and config reload, the interfaces-config and bgp services start in parallel, and sometimes
fpmsyncd in bgp attempts to bind to loopback while it does not have an address, fails with the log
Exception "Cannot assign requested address" had been thrown in daemon
and exits with rc 0.

root@sonic:/# supervisorctl status
fpmsyncd                         EXITED    Jul 20 05:04 AM
zebra                            RUNNING   pid 35, uptime 6:15:05
zsocket                          EXITED    Jul 20 05:04 AM
docker logs bgp
INFO exited: fpmsyncd (exit status 0; expected)
With fpmsyncd dead, configured routes do not appear in the database.

- How I did it
Added an ordering dependency on the interfaces-config service into bgp.config

- How to verify it
The issue itself reproduces quite rarely, but one can widen the time interval between networking going down and coming back up in interfaces-config.sh like this:

diff --git a/files/image_config/interfaces/interfaces-config.sh b/files/image_config/interfaces/interfaces-config.sh
index f6aa4147a..87caceeff 100755
--- a/files/image_config/interfaces/interfaces-config.sh
+++ b/files/image_config/interfaces/interfaces-config.sh
@@ -63,7 +63,11 @@ done
 # Read sysctl conf files again
 sysctl -p /etc/sysctl.d/90-dhcp6-systcl.conf

-systemctl restart networking
+# systemctl restart networking
+
+systemctl start networking
+sleep 10
+systemctl stop networking

 # Clean-up created files
 rm -f /tmp/ztp_input.json /tmp/ztp_port_data.json
With this change the issue reproduces on every config reload.
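One illustrative way to confirm the new ordering once the change is in place (not part of the PR):

```
# Check that interfaces-config now appears in bgp's ordering dependencies
systemctl show bgp.service -p After | tr ' ' '\n' | grep interfaces-config
systemd-analyze critical-chain bgp.service   # shows what bgp waited on during boot
```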

Signed-off-by: Volodymyr Boyko <volodymyrx.boiko@intel.com>
2022-09-21 21:15:08 +00:00
Maxime Lorrillere
458b12b4af [Chassis][Voq]Configure midplane network on supervisor (#11725)
Multi-asic Docker instances are created behind Docker's default bridge
which doesn't allow talking to other Docker instances that are in the
host network (like database-chassis).

On linecards, we configure midplane interfaces to let per-asic docker
containers talk to CHASSIS_DB on the supervisor through internal chassis
network.

On the supervisor we don't need to use chassis internal network, but we
still need a similar setup in order to allow fabric containers to talk
to database-chassis
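The underlying mismatch can be seen with docker itself (illustrative; per-asic container names such as swss0 are assumptions):

```
# Host-network containers share the supervisor's network stack, bridge containers do not
docker inspect -f '{{.HostConfig.NetworkMode}}' database-chassis   # -> host
docker inspect -f '{{.HostConfig.NetworkMode}}' swss0              # -> default bridge (name assumed)
```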
2022-09-21 21:12:40 +00:00
mssonicbld
77b469d7c8
[ci/build]: Upgrade SONiC package versions (#12121) 2022-09-20 21:24:25 +08:00
Oleksandr Ivantsiv
c9ba827773
[202205] [services] Update "WantedBy=" section for tacacs-config.timer. (#11893) (#12080)
Manually cherry-picking #11893

- Why I did it
The timer execution may fail if triggered during a config reload (when the sonic.target is stopped). This might happen in a rare situation if config reload is executed after reboot in a small time window (0 to 30 seconds) before the tacacs-config timer is triggered:

systemctl status tacacs-config.timer
tacacs-config.timer - Delays tacacs apply until SONiC has started
Loaded: loaded (/lib/systemd/system/tacacs-config.timer; enabled-runtime; vendor preset: enabled)
Active: failed (Result: resources) since Mon 2022-08-29 15:53:03 IDT; 1min 28s ago
Trigger: n/a
Triggers: tacacs-config.service

Aug 29 15:47:53 r-boxer-sw01 systemd[1]: Started Delays tacacs apply until SONiC has started.
Aug 29 15:53:03 r-boxer-sw01 systemd[1]: tacacs-config.timer: Failed to queue unit startup job: Transaction for tacacs-config.service/start is destructive (mgmt-framework.timer has 's>
Aug 29 15:53:03 r-boxer-sw01 systemd[1]: tacacs-config.timer: Failed with result 'resources'.

- How I did it
To ensure that timer execution will be resumed after a config reload, the WantedBy section of the systemd service is updated to describe its relation to sonic.target.

- How to verify it
Reboot the system
After reboot, monitor the tacacs-config.timer status. 30 seconds before the timer activation, run the "config reload -y" command.
Check system status.

Signed-off-by: Oleksandr Ivantsiv <oivantsiv@nvidia.com>
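In unit-file terms the fix is in the [Install] section, roughly like this (illustrative, not the exact unit shipped by SONiC):

```
# Hypothetical override tying the timer to sonic.target
mkdir -p /etc/systemd/system/tacacs-config.timer.d
cat > /etc/systemd/system/tacacs-config.timer.d/override.conf <<'EOF'
[Install]
WantedBy=sonic.target
EOF
systemctl daemon-reload && systemctl reenable tacacs-config.timer
```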
2022-09-19 09:20:10 +03:00
mssonicbld
f361c029c5
[ci/build]: Upgrade SONiC package versions (#11980) 2022-09-19 12:31:16 +08:00
Aryeh Feigin
b8c6e2a45d
Use warm-boot infrastructure for fast-boot (#12026) 2022-09-14 21:23:34 +03:00
Saikrishna Arcot
f1243bad1b
Pin version of bazelisk to v1.13.0 (#12027)
* Pin version of bazelisk to v1.13.0

This tries to avoid build failures due to the latest version of
bazelisk changing and causing hash mismatches.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
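Pinning a release instead of "latest" typically looks like the following (illustrative; the real change is in the build Dockerfiles):

```
# Illustrative: fetch a fixed bazelisk release rather than the latest one
BAZELISK_VERSION=v1.13.0
wget -O /usr/local/bin/bazelisk \
    "https://github.com/bazelbuild/bazelisk/releases/download/${BAZELISK_VERSION}/bazelisk-linux-amd64"
chmod +x /usr/local/bin/bazelisk
```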
2022-09-08 21:15:35 -07:00
Ying Xie
ee40402ab7 Revert "[build] Fix version of bazelisk which was lost accidentally (#12012)"
This reverts commit 36c5787daf.
2022-09-09 04:14:59 +00:00
Liu Shilong
36c5787daf
[build] Fix version of bazelisk which was lost accidentally (#12012)
Why I did it
The bazelisk package with hash value 1227b24db77557d552701f6add122edc was deleted from the GitHub release.
The reproducible build only cached the hash value; the package file itself was not cached, because they are handled in different pipelines.
Use the latest package hash instead.
2022-09-09 07:24:44 +08:00
Ze Gan
0a54c46a0d [docker-macsec]: Add dependencies of MACsec (#11770)
Why I did it
If the SWSS service is restarted, the MACsec service should also be restarted. Otherwise the data in wpa_supplicant and orchagent will not be consistent.

How I did it
Add dependency in docker-macsec.mk.

How to verify it
Manually check by 'sudo service swss restart'.

The MACsec container should be started after swss; the syslog will look like:


Sep  8 14:36:29.562953 sonic INFO swss.sh[9661]: Starting existing swss container with HWSKU Force10-S6000
Sep  8 14:36:30.024399 sonic DEBUG container: container_start: BEGIN
...
Sep  8 14:36:33.391706 sonic INFO systemd[1]: Starting macsec container...
Sep  8 14:36:33.392925 sonic INFO systemd[1]: Starting Management Framework container...


Signed-off-by: Ze Gan <ganze718@gmail.com>
2022-09-08 15:50:06 +00:00
Ying Xie
b4bf4aca3f [mux] skip mux operations during warm shutdown (#11937)
* [mux] skip mux operations during warm shutdown

- Enhance write_standby.py script to skip actions during warm shutdown.
- Expand the support to BGP service.
- MUX support was added by a previous PR.
- don't skip action during warm recovery

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2022-09-08 15:48:56 +00:00
Lawrence Lee
12e6b89d80 [arp_update]: Set failed IPv6 neighbors to incomplete (#11919)
After pinging any failed IPv6 neighbor entries, set the remaining failed/incomplete entries to a permanent INCOMPLETE state. This manual setting to INCOMPLETE prevents these entries from automatically transitioning to FAILED state, and since they are now incomplete, any subsequent NA messages for these neighbors are able to resolve the entry in the cache.

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
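In `ip neigh` terms, the state change looks like this (address and interface below are placeholders):

```
# Illustrative: pin a stuck entry to INCOMPLETE so a later NA can still resolve it
ip -6 neigh replace fc00::72 dev Vlan1000 nud incomplete
ip -6 neigh show dev Vlan1000 nud incomplete
```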
2022-09-08 15:48:05 +00:00
Stepan Blyshchak
8431d3ab36 [docker-wait-any] immediately start to wait (#11595)
It could happen that a container has already crashed but docker-wait-any
will wait forever till it starts. It should, however, immediately exit
to make the service restart.

#### Why I did it

It is observed in some circumstances that the auto-restart mechanism does not work. Specifically for ```swss.service```, ```orchagent``` had crashed before ```docker-wait-any``` started in ```swss.sh```. This led ```docker-wait-any``` to wait forever for ```swss``` to be in the ```"Running"``` state, and it results in:

```
CONTAINER ID   IMAGE                                COMMAND                  CREATED        STATUS                    PORTS     NAMES
1abef1ecebff   bcbca2b74df6                         "/usr/local/bin/supe…"   22 hours ago   Up 22 hours                         what-just-happened
3c924d405cd5   docker-lldp:latest                   "/usr/bin/docker-lld…"   22 hours ago   Up 22 hours                         lldp
eb2b12a98c13   docker-router-advertiser:latest      "/usr/bin/docker-ini…"   22 hours ago   Up 22 hours                         radv
d6aac4a46974   docker-sonic-mgmt-framework:latest   "/usr/local/bin/supe…"   22 hours ago   Up 22 hours                         mgmt-framework
d880fd07aab9   docker-platform-monitor:latest       "/usr/bin/docker_ini…"   22 hours ago   Up 22 hours                         pmon
75f9e22d4fdd   docker-snmp:latest                   "/usr/local/bin/supe…"   22 hours ago   Up 22 hours                         snmp
76d570a4bd1c   docker-sonic-telemetry:latest        "/usr/local/bin/supe…"   22 hours ago   Up 22 hours                         telemetry
ee49f50344b3   docker-syncd-mlnx:latest             "/usr/local/bin/supe…"   22 hours ago   Up 22 hours                         syncd
1f0b0bab3687   docker-teamd:latest                  "/usr/local/bin/supe…"   22 hours ago   Up 22 hours                         teamd
917aeeaf9722   docker-orchagent:latest              "/usr/bin/docker-ini…"   22 hours ago   Exited (0) 22 hours ago             swss
81a4d3e820e8   docker-fpm-frr:latest                "/usr/bin/docker_ini…"   22 hours ago   Up 22 hours                         bgp
f6eee8be282c   docker-database:latest               "/usr/local/bin/dock…"   22 hours ago   Up 22 hours                         database
```

The check for the ```"Running"``` state is not needed because for the cold boot case we do ```start_peer_and_dependent_services```, and for the warm boot case the loop will retry waiting for the container if that container is doing a warm boot:
d01a91a569/files/image_config/misc/docker-wait-any (L56)

#### How I did it

Removed the check for ```"Running"```.

#### How to verify it

Kill swss before ```docker-wait-any``` is reached and verify that auto restart will restart the swss service.
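An illustrative (approximate) reproduction of that verification step, using the container and service names from the description:

```
# Crash the container, then confirm the service is restarted instead of hanging
docker kill swss
sleep 60
systemctl is-active swss                       # should report "active" again
docker ps --filter name=swss --format '{{.Status}}'
```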
2022-09-08 15:47:27 +00:00
mssonicbld
dc987ebd2c
[ci/build]: Upgrade SONiC package versions (#11951) 2022-09-05 14:42:32 +08:00
mssonicbld
613d3431d1
[ci/build]: Upgrade SONiC package versions (#11913)
Upgrade SONiC Versions
2022-09-01 15:47:48 +08:00
abdosi
72852cdd02 Address Review Comment to define SONIC_GLOBAL_DB_CLI in gbsyncd.sh (#11857)
As part of PR #11754

A change was added to use the variable SONIC_DB_NS_CLI for the
namespace, but that will not work since ./files/scripts/syncd_common.sh
uses SONIC_DB_CLI. So revert back to using SONIC_DB_CLI and define a new
variable SONIC_GLOBAL_DB_CLI for global/host DB CLI access.

Also fixed DB_CLI not working for the namespace.
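The split between the two variables is roughly as follows (illustrative; the real definitions are in gbsyncd.sh and syncd_common.sh):

```
# Illustrative distinction between namespace-scoped and global DB access
SONIC_DB_CLI="sonic-db-cli -n $NAMESPACE"   # per-asic namespace database
SONIC_GLOBAL_DB_CLI="sonic-db-cli"          # global/host database
```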
2022-09-01 00:12:56 +00:00
Longxiang Lyu
d7f049ebf0 [mux] Exit to write standby state to active-active ports (#11821)
[mux] Exit to write standby state to `active-active` ports

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
2022-09-01 00:11:09 +00:00
andywongarista
0adfd724e6
[202205][Arista] Add initial support for 720DT-48S (#10656) (#11860)
Added initial set of config files to allow for booting and partial traffic testing in SONiC on the 720DT-48S.

How to verify it
- Switch boots
- show interfaces status shows links up on interfaces Ethernet24-51
- Traffic flows with no errors on interfaces Ethernet24-51
2022-08-30 12:39:26 +08:00
Stepan Blyshchak
c60d78dd1f [syncd.sh] 'sxdkernel start' => 'sxdkernel restart' (#11718)
Change `sxdkernel start` to `sxdkernel restart`. If the `syncd` service crashes in `ExecStartPre`, systemd will not call `ExecStop` and thus will not call `sxdkernel stop`. Use of `sxdkernel restart` is more robust in terms of guarantees to restore the system after an unexpected crash.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2022-08-27 16:16:17 +00:00