Commit Graph

1033 Commits

Author SHA1 Message Date
Oleksandr Ivantsiv
549bb3d483
[services] Update "WantedBy=" section for tacacs-config.timer. (#11893)
The timer execution may fail if triggered during a config reload
(when the sonic.target is stopped). This might happen in a rare
situation if config reload is executed after reboot in a small
time slot (for 0 to 30 seconds) before the tacacs-config timer
is triggered. To ensure that timer execution will be resumed after
a config reload the WantedBy section of the systemd service is updated
to describe relation to sonic.target.

Signed-off-by: Oleksandr Ivantsiv <oivantsiv@nvidia.com>

Signed-off-by: Oleksandr Ivantsiv <oivantsiv@nvidia.com>
2022-09-08 15:16:11 -07:00
Ze Gan
016f671857
[docker-macsec]: Add dependencies of MACsec (#11770)
Why I did it
If the SWSS services was restarted, the MACsec service should also be restarted. Otherwise the data in wpa_supplicant and orchagent will not be consistent.

How I did it
Add dependency in docker-macsec.mk.

How to verify it
Manually check by 'sudo service swss restart'.

The MACsec container should be started after swss, the syslog will look like


Sep  8 14:36:29.562953 sonic INFO swss.sh[9661]: Starting existing swss container with HWSKU Force10-S6000
Sep  8 14:36:30.024399 sonic DEBUG container: container_start: BEGIN
...
Sep  8 14:36:33.391706 sonic INFO systemd[1]: Starting macsec container...
Sep  8 14:36:33.392925 sonic INFO systemd[1]: Starting Management Framework container...


Signed-off-by: Ze Gan <ganze718@gmail.com>
2022-09-08 23:45:06 +08:00
Renuka Manavalan
31e750ee0b
Fix PR build failure (#11973)
Some PR builds fails to find this file. Remove it temporarily until we root cause it
2022-09-06 15:13:05 -07:00
Stepan Blyshchak
a8b2a538a5
[docker-wait-any] immediately start to wait (#11595)
It could happen that a container has already crashed but docker-wait-any
will wait forever till it starts. It should, however, immediately exit
to make the serivce restart.

#### Why I did it

It is observed in some circumstances that the auto-restart mechanism does not work. Specifically for ```swss.service```, ```orchagent``` had crashed before ```docker-wait-any``` started in ```swss.sh```. This led ```docker-wait-any``` wait forever for ```swss``` to be in ```"Running"``` state and it results in:

```
CONTAINER ID   IMAGE                                COMMAND                  CREATED        STATUS                    PORTS     NAMES
1abef1ecebff   bcbca2b74df6                         "/usr/local/bin/supe…"   22 hours ago   Up 22 hours                         what-just-happened
3c924d405cd5   docker-lldp:latest                   "/usr/bin/docker-lld…"   22 hours ago   Up 22 hours                         lldp
eb2b12a98c13   docker-router-advertiser:latest      "/usr/bin/docker-ini…"   22 hours ago   Up 22 hours                         radv
d6aac4a46974   docker-sonic-mgmt-framework:latest   "/usr/local/bin/supe…"   22 hours ago   Up 22 hours                         mgmt-framework
d880fd07aab9   docker-platform-monitor:latest       "/usr/bin/docker_ini…"   22 hours ago   Up 22 hours                         pmon
75f9e22d4fdd   docker-snmp:latest                   "/usr/local/bin/supe…"   22 hours ago   Up 22 hours                         snmp
76d570a4bd1c   docker-sonic-telemetry:latest        "/usr/local/bin/supe…"   22 hours ago   Up 22 hours                         telemetry
ee49f50344b3   docker-syncd-mlnx:latest             "/usr/local/bin/supe…"   22 hours ago   Up 22 hours                         syncd
1f0b0bab3687   docker-teamd:latest                  "/usr/local/bin/supe…"   22 hours ago   Up 22 hours                         teamd
917aeeaf9722   docker-orchagent:latest              "/usr/bin/docker-ini…"   22 hours ago   Exited (0) 22 hours ago             swss
81a4d3e820e8   docker-fpm-frr:latest                "/usr/bin/docker_ini…"   22 hours ago   Up 22 hours                         bgp
f6eee8be282c   docker-database:latest               "/usr/local/bin/dock…"   22 hours ago   Up 22 hours                         database
```

The check for ```"Running"``` state is not needed because for cold boot case we do ```start_peer_and_dependent_services``` and for warm boot case the loop will retry to wait for container if this container is doing warm boot:
d01a91a569/files/image_config/misc/docker-wait-any (L56)

#### How I did it

Removed the check for ```"Running"```.

#### How to verify it

Kill swss before ```docker-wait-any``` is reached and verify auto restart will restart swss serivce.
2022-09-06 09:26:54 -07:00
Zain Budhwani
6a54bc439a
Streaming structured events implementation (#11848)
With this PR in, you flap BGP and use events_tool to see the published events.
With telemetry PR #111 in and corresponding submodule update done in buildimage, one could run gnmi_cli to capture BGP flap events.
2022-09-03 07:33:25 -07:00
Ying Xie
a6843927d9
[mux] skip mux operations during warm shutdown (#11937)
* [mux] skip mux operations during warm shutdown

- Enhance write_standby.py script to skip actions during warm shutdown.
- Expand the support to BGP service.
- MuX support was added by a previous PR.
- don't skip action during warm recovery

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2022-09-02 13:50:42 -07:00
Lawrence Lee
a762b35cbc
[arp_update]: Set failed IPv6 neighbors to incomplete (#11919)
After pinging any failed IPv6 neighbor entries, set the remaining failed/incomplete entries to a permanent INCOMPLETE state. This manual setting to INCOMPLETE prevents these entries from automatically transitioning to FAILED state, and since they are now incomplete any subsequent NA messages for these neighbors is able to resolve the entry in the cache.

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2022-09-02 13:40:40 -07:00
Longxiang Lyu
6e878a36da
[mux] Exit to write standby state to active-active ports (#11821)
[mux] Exit to write standby state to `active-active` ports

Signed-off-by: Longxiang Lyu <lolv@microsoft.com>
2022-08-31 13:10:22 -07:00
abdosi
3bf1abb2dc
Address Review Comment to define SONIC_GLOBAL_DB_CLI in gbsyncd.sh (#11857)
As part of PR #11754

    Change was added to use variable SONIC_DB_NS_CLI for
    namespace but that will not work since ./files/scripts/syncd_common.sh
    uses SONIC_DB_CLI. So revert back to use SONIC_DB_CLI and define new
    variable for SONIC_GLOBAL_DB_CLI for global/host db cli access

   Also fixed DB_CLI not working for namespace.
2022-08-29 08:19:28 -07:00
Hua Liu
214e394ac0
Remove swsssdk from rules and image. (#11469)
#### Why I did it
To deprecate swsssdk, remove all dependency to it. 

#### How I did it
Remove swsssdk from rules and build image scripts.

#### How to verify it
Pass all UT and E2E test case

#### Which release branch to backport (provide reason below if selected)

<!--
- Note we only backport fixes to a release branch, *not* features!
- Please also provide a reason for the backporting below.
- e.g.
- [x] 202006
-->

- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [ ] 202205

#### Description for the changelog
Remove swsssdk from rules and build image scripts.

#### Link to config_db schema for YANG module changes
<!--
Provide a link to config_db schema for the table for which YANG model
is defined
Link should point to correct section on https://github.com/Azure/sonic-buildimage/blob/master/src/sonic-yang-models/doc/Configuration.md
-->

#### A picture of a cute animal (not mandatory but encouraged)
2022-08-25 08:35:51 +08:00
anamehra
f404ce60e0
container_checker on supervisor should check containers based on asic presence (#11442)
Why I did it
On a supervisor card in a chassis, syncd/teamd/swss/lldp etc dockers are created for each Switch Fabric card. However, not all chassis would have all the switch fabric cards present. In this case, only dockers for Switch Fabrics present would be created.

The monit 'container_checker' fails in this scenario as it is expecting dockers for all Switch Fabrics (based on NUM_ASIC defined in asic.conf file).
2022-08-22 10:08:29 -07:00
abdosi
535612f808
Added support to add gbsyncd in Feature Table of Host Config DB (#11754)
Why I did:

In case of multi-asic platforms gbsyncd is not getting added to Feature Table of Host Config DB. Without this container_checker complains of not needed gbsyncd container's are running.

How I did:
Update Both Host and Namespace config db when gbsyncd docker is starting.

How I verify:
Verified on Multi-asic platforms.
2022-08-17 14:02:21 -07:00
Stepan Blyshchak
a66941a6ce
[syncd.sh] 'sxdkernel start' => 'sxdkernel restart' (#11718)
Change `sxdkernel start` to `sxdkernel restart`. If `syncd` service crashes in `ExecStartPre` systemd will not call `ExecStop` and thus will not call `sxdkernel stop`. Use of `sxdkernel restart` is more robust in terms of guarantees to restore the system after unexpected crashes.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2022-08-15 13:35:34 -07:00
lixiaoyuner
8d6431e754
Add k8s master feature (#11637)
* Add k8s master feature

Signed-off-by: Yun Li <yunli1@microsoft.com>

* Update kubernetes version mistake and make variable passing clear

Signed-off-by: Yun Li <yunli1@microsoft.com>

* Add CRI-dockerd package

Signed-off-by: Yun Li <yunli1@microsoft.com>

* Update version variable passing logic

Signed-off-by: Yun Li <yunli1@microsoft.com>

* Upgrade the worker kubernetes version

Signed-off-by: Yun Li <yunli1@microsoft.com>

* Install xml file parse tool

Signed-off-by: Yun Li <yunli1@microsoft.com>

Signed-off-by: Yun Li <yunli1@microsoft.com>
2022-08-13 23:01:35 +08:00
Nikola Dancejic
23dcfdf9b6
[swss] Adding conditional for bgp when on multi ASIC platform (#11691)
bgp should be a per-asic service, and runs for each namespace on
multi-asic platforms. However, putting bgp in MULTI_INST_DEPENDENT
causes swss to be restarted as well as bgp. this is causing issues after #11000

Issue: #11653

This fix:

removes bgp from dependents list
adds a conditional that either adds bgp, or bgp@$DEV to separate
between single and multi-asic platforms
2022-08-12 11:34:10 -07:00
Stepan Blyshchak
2d4299308d
[swss.sh/syncd.sh] Trap only on EXIT (#11590)
When using trap on SIGTERM the script will not react to the SIGTERM signal sent while a child is executing.
I.e, the following script does not react on SIGTERM sent to it if it is
waiting for sleep to finish:

```

trap "echo Handled SIGTERM" 0 2 3 15

echo "Before sleep"
sleep inf
echo "After sleep"
```

Instead, trap only on EXIT which covers also a scenario with exit on
SIGINT, SIGTERM.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2022-08-10 20:57:07 -07:00
Lawrence Lee
889741c9bc
[arp_update]: Resolve failed neighbors on dualtor (#11615)
In arp_update, check for FAILED or INCOMPLETE kernel neighbor entries and manually ping them to try and resolve the neighbor

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2022-08-09 16:19:42 -07:00
Ying Xie
a3e3530d1d
[write_standby] update write_standby.py script (#11650)
Why I did it
The initial value has to be present for the state machines to work. In active-standby dual-tor scenario, or any hardware mux scenario, the value will be updtaed eventually with a delay.

However, in active-active dual-tor scenario, there is no other mechanism to initialize the value and get state machines started.
So this script will have to write something at start up time.

For active-active dualtor, 'active' is a more preferred initial value, the state machine will switch the state to standby soon if
link prober found link not in good state.

How I did it
Update the script to always provide initial values.

How to verify it
Tested on active-active dual-tor testbed.

Signed-off-by: Ying Xie ying.xie@microsoft.com
2022-08-09 14:21:29 -07:00
Sudharsan Dhamal Gopalarathnam
5dc4eb9693
[vs]Preventing ebtables cfg to be applied on vs (#11585)
*Preventing ebtables rules to be applied on KVM image. The ebtables rules in SONiC are added to prevent ARP as well as L2 forwarding to be blocked in linux kernel since the hardware will take care of the actual L2 forward. However this is not the case with KVM where linux needs to forward even L2 packets
2022-08-04 09:18:00 -07:00
bingwang-ms
dc799356aa
Support different DSCP_TO_TC_MAP for T1 in dualtor deployment (#11569)
* Support different DSCP_TO_TC_MAP for T1 in dualtor deployment
2022-08-01 09:35:34 +08:00
Nikola Dancejic
8f6b568acf
[swss] Adding bgp container as dependent of swss (#11000)
What I did:
Added bgp as a dependent of swss

Why I did it:
bgp container was not restarting on swss crash. When swss crashes, linkmgrd
doesn't initate a switchover because it cannot access the default route from
orchagent. Bringing down bgp with swss will isolate the ToR, causing linkmgrd
to initiate a switchover to the peer ToR avoiding significant packet loss.

How I did it:
Added bgp to DEPENDENT

Signed-off-by: Nikola Dancejic <ndancejic@microsoft.com>
2022-07-29 16:22:20 -07:00
Jing Zhang
626919e250
Update WARM START FINALIZER to wait for linkmgrd to reconcile (#11477)
Spanning from sonic-net/sonic-linkmgrd#76, this PR is to update warm restart finalizer to wait for linkmgrd to be reconciled.

sign-off: Jing Zhang zhangjing@microsoft.com

Why I did it
To make sure finalizer save config after linkmgrd's reconciliation.

How I did it
Add linkmgrd to the reconciliation wait list of warmboot finalizer.

How to verify it
Verified on lab device, linkmgrd reconciled as expected.
2022-07-28 09:08:53 -07:00
Stepan Blyshchak
925a393e3d
[swss.sh] clear counters cache folder on swss cold/fast reload (#11244)
A change in sonic-utilities makes all cache files be saved into a
/tmp/cache. On swss restart this cache has to be removed in case swss
starts in cold or fast mode. A related cache restoration in the warmboot
finalizer script is also updated to use new location.

- Why I did it
To fix #9817. Clear the cache directory on swss.sh except for warm start.
Also, adopted finalize-warmboot script to take the cache directory.

- How I did it
A change in sonic-utilities makes all cache files be saved into a /tmp/cache. On swss restart this cache has to be removed in case swss starts in cold or fast mode. A related cache restoration in the warmboot finalizer script is also updated to use new location.

- How to verify it
Run togather with Azure/sonic-utilities#2232. Verify counters cache is removed on config reload, cold/fast reboots, swss restart.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2022-07-28 12:03:22 +03:00
Lior Avramov
069b3a4669
[memory_checker] Do not check memory usage of containers if docker daemon is not running (#11476)
Fix in Monit memory_checker plugin. Skip fetching running containers if docker engine is down (can happen in deinit).
This PR fixes issue #11472.

Signed-off-by: liora liora@nvidia.com

Why I did it
In the case where Monit runs during deinit flow, memory_checker plugin is fetching the running containers without checking if Docker service is still running. I added this check.

How I did it
Use systemctl is-active to check if Docker engine is still running.

How to verify it
Use systemctl to stop docker engine and reload Monit, no errors in log and relevant print appears in log.

Which release branch to backport (provide reason below if selected)
The fix is required in 202205 and 202012 since the PR that introduced the issue was cherry picked to those branches (#11129).
2022-07-27 16:18:36 -07:00
abdosi
a380105461
Enable ARP Update Script for Packet based chassis. (#11465)
What I did:

    Following changes done for packet based chassis:-
    1> Run arp_update on LC's to resolve static route nexthops over backend
    port-channel interfaces.
    2> On Supervisor make sure arp_update exit gracefully
2022-07-26 16:50:16 -07:00
Iris Hsu
f323f56c54
flush VRF_OBJECT_TABLE table on state db when swss start (#11509)
*flush VRF_OBJECT_TABLE table on state db when swss start
2022-07-21 18:01:39 -07:00
gregshpit
5df09490dc
Ported Marvell armhf build on amd64 host for debian buster to use cross-comp… (#8035)
* Ported Marvell armhf build on x86 for debian buster to use cross-compilation instead of qemu emulation

Current armhf Sonic build on amd64 host uses qemu emulation. Due to the
nature of the emulation it takes a very long time, about 22-24 hours to
complete the build. The change I did to reduce the building time by
porting Sonic armhf build on amd64 host for Marvell platform for debian
buster to use cross-compilation on arm64 host for armhf target. The
overall Sonic armhf building time using cross-compilation reduced to
about 6 hours.

Signed-off-by: marvell <marvell@cpss-build3.marvell.com>

* Fixed final Sonic image build with dockers inside

* Update Dockerfile.j2

Fixed qemu-user-static:x86_64-aarch64-5.0.0-2 .

* Update cross-build-arm-python-reqirements.sh

Added support for both armhf and arm64 cross-build platform using $PY_PLAT environment variable.

* Update Makefile

Added TARGET=<cross-target> for armhf/arm64 cross-compilation.

* Reviewer's @qiluo-msft requests done

Signed-off-by: marvell <marvell@cpss-build3.marvell.com>

* Added new radius/pam patch for arm64 support

* Update slave.mk

Added missing back tick.

* Added libgtest-dev: libgmock-dev: to the buster Dockerfile.j2. Fixed arm perl version to be generic

* Added missing armhf/arm64 entries in /etc/apt/sources.list

* fix libc-bin core dump issue from xumia:fix-libc-bin-install-issue commit

* Removed unnecessary 'apt-get update' from sonic-slave-buster/Dockerfile.j2

* Fixed saiarcot895 reviewer's requests

* Fixed README and replaced 'sed/awk' with patches

* Fixed ntp build to use openssl

* Unuse sonic-slave-buster/cross-build-arm-python-reqirements.sh script (put all prebuilt python packages cross-compilation/install inside Dockerfile.j2). Fixed src/snmpd/Makefile to use -j1 in all cases

* Clean armhf cross-compilation build fixes

* Ported cross-compilation armhf build to bullseye

* Additional change for bullseye

* Set CROSS_BUILD_ENVIRON default value n

* Removed python2 references

* Fixes after merge with the upstream

* Deleted unused sonic-slave-buster/cross-build-arm-python-reqirements.sh file

* Fixed 2 @saiarcot895 requests

* Fixed @saiarcot895 reviewer's requests

* Removed use of prebuilt python wheels

* Incorporated saiarcot895 CC/CXX and other simplification/generalization changes

Signed-off-by: marvell <marvell@cpss-build3.marvell.com>

* Fixed saiarcot895 reviewer's  additional requests

* src/libyang/patch/debian-packaging-files.patch

* Removed --no-deps option when installing wheels. Removed unnecessary lazy_object_proxy arm python3 package instalation

Co-authored-by: marvell <marvell@cpss-build3.marvell.com>
Co-authored-by: marvell <marvell@cpss-build2.marvell.com>
2022-07-21 14:15:16 -07:00
Nazarii Hnydyn
e4e3adcbc2
[ssip]: Update config generator (#10991)
- Why I did it

To implement Syslog Source IP feature
In order to include the following commit: 8e5d478 [ssip]: Add CLI (#2191)

- How I did it
Updated syslog config template
Advanced submodule sonic-utilities

ea11b22 [sonic-bootchart] add sonic-bootchart (#2195)
8e5d478 [ssip]: Add CLI (#2191)
1dacb7f Replace pyswsssdk with swsscommon (#2251)

- How to verify it
make configure PLATFORM=mellanox
make target/sonic-mellanox.bin

Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
2022-07-20 10:05:13 +03:00
Stephen Sun
8f4a1b7b85
[Mellanox] Support Mellanox-SN4600C-C64 as T1 switch in dual-ToR scenario (#11261)
- Why I did it
Support Mellanox-SN4600C-C64 as T1 switch in dual-ToR scenario
This is to port #11032 and #11299 from 202012 to master.

Support additional queue and PG in buffer templates, including both traditional and dynamic model
Support mapping DSCP 2/6 to lossless traffic in the QoS template.
Add macros to generate additional lossless PG in the dynamic model
Adjust the order in which the generic/dedicated (with additional lossless queues) macros are checked and called to generate buffer tables in common template buffers_config.j2
Buffer tables are rendered via using macros.
Both generic and dedicated macros are defined on our platform. Currently, the generic one is called as long as it is defined, which causes the generic one always being called on our platform. To avoid it, the dedicated macrio is checked and called first and then the generic ones.
Support MAP_PFC_PRIORITY_TO_PRIORITY_GROUP on ports with additional lossless queues.
On Mellanox-SN4600C-C64, buffer configuration for t1 is calculated as:

40 * 100G downlink ports with 4 lossless PGs/queues, 1 lossy PG, and 3 lossy queues
16 * 100G uplink ports with 2 lossless PGs/queues, 1 lossy PG, and 5 lossy queues

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2022-07-20 09:48:15 +03:00
andywongarista
88d0ce5ce8
Add gbsyncd container for broncos (#11154)
* Add docker-gbsyncd-broncos support
* Address review comments
* Add socket to gbsyncd
* Upgrade gbsyncd-broncos to bullseye
2022-07-18 10:57:27 +08:00
bingwang-ms
52c36cd31f
Add flag to control the generation of PORT_QOS_MAP|global entry (#11448)
Why I did it
This PR is to add a flag to control whether to generate PORT_QOS_MAP|global entry or not.
It's because for some HWSKU, such as BackEndToRRouter and BackEndLeafRouter, there is no DSCP_TO_TC_MAP defined.
Hence, if the PORT_QOS_MAP|global entry is generated, OA will report some error because the DSCP_TO_TC_MAP map AZURE can not be found.

Jul 14 00:24:40.286767 str2-7050qx-32s-acs-03 ERR swss#orchagent: :- saiObjectTypeQuery: invalid object id oid:0x7fddb43605d0
Jul 14 00:24:40.286767 str2-7050qx-32s-acs-03 ERR swss#orchagent: :- meta_generic_validation_objlist: SAI_SWITCH_ATTR_QOS_DSCP_TO_TC_MAP:SAI_ATTR_VALUE_TYPE_OBJECT_ID object on list [0] oid 0x7fddb43605d0 is not valid, returned null object id
Jul 14 00:24:40.286767 str2-7050qx-32s-acs-03 ERR swss#orchagent: :- applyDscpToTcMapToSwitch: Failed to apply DSCP_TO_TC QoS map to switch rv:-5
Jul 14 00:24:40.286767 str2-7050qx-32s-acs-03 ERR swss#orchagent: :- doTask: Failed to process QOS task, drop it
This PR is to address the issue.

How I did it
Add a flag require_global_dscp_to_tc_map to control whether to generate the PORT_QOS_MAP|global entry. The default value for require_global_dscp_to_tc_map is true. If the device type is storage backend, the value is changed to false. Then the PORT_QOS_MAP|global entry is not generated.

How to verify it
Update the current test_qos_dscp_remapping_render_template to cover storage backend.
2022-07-14 10:19:00 -07:00
Alexander Allen
429254cb2d
[arm] Refactor installer and build to allow arm builds targeted at grub platforms (#11341)
Refactors the SONiC Installer to support greater flexibility in building for a given architecture and bootloader. 

#### Why I did it

Currently the SONiC installer assumes that if a platform is ARM based that it uses the `uboot` bootloader and uses the `grub` bootloader otherwise. This is not a correct assumption to make as ARM is not strictly tied to uboot and x86 is not strictly tied to grub. 

#### How I did it

To implement this I introduce the following changes:
* Remove the different arch folders from the `installer/` directory
* Merge the generic components of the ARM and x86 installer into `installer/installer.sh`
* Refactor x86 + grub specific functions into `installer/default_platform.conf` 
* Modify installer to call `default_platform.conf` file and also call `platform/[platform]/patform.conf` file as well to override as needed
* Update references to the installer in the `build_image.sh` script
* Add `TARGET_BOOTLOADER` variable that is by default `uboot` for ARM devices and `grub` for x86 unless overridden in `platform/[platform]/rules.mk`
* Update bootloader logic in `build_debian.sh` to be based on `TARGET_BOOTLOADER` instead of `TARGET_ARCH` and to reference the grub package in a generic manner

#### How to verify it

This has been tested on a ARM test platform as well as on Mellanox amd64 switches as well to ensure there was no impact. 

#### Description for the changelog
[arm] Refactor installer and build to allow arm builds targeted at grub platforms

#### Link to config_db schema for YANG module changes

N/A
2022-07-12 15:00:57 -07:00
geogchen
5171589f4d
Add support to generate /e/n/i when there are multiple MGMT_INTERFACE (#11368)
Why I did it
Currently interfaces.j2 hardcodes to eth0 even when there are multiple interfaces in MGMT_INTERFACE. This change adds support to generate /e/n/i when there are multiple interfaces in MGMT_INTERFACE.

How I did it
By removing hardcoded eth0 when looping through MGMT_INTERFACE.

How to verify it
Verified through unit test.

Which release branch to backport (provide reason below if selected)
 201811
 201911
 202006
 202012
 202106
 202111
 202205
Description for the changelog
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)
2022-07-12 14:55:15 -07:00
tjchadaga
4f95974669
Add load_minigraph option to include traffic-shift-away during config migration (#11403) 2022-07-12 10:08:58 -07:00
tjchadaga
849eb4bf32
Changes to persist TSA/B state across reloads (#11257) 2022-07-12 00:22:48 -07:00
Neetha John
7cd10019b8
Add backend acl template (#11220)
Why I did it
Storage backend has all vlan members tagged. If untagged packets are received on those links, they are accounted as RX_DROPS which can lead to false alarms in monitoring tools. Using this acl to hide these drops.

How I did it
Created a acl template which will be loaded during minigraph load for backend. This template will allow tagged vlan packets and dropped untagged

How to verify it
Unit tests

Signed-off-by: Neetha John <nejo@microsoft.com>
2022-07-06 10:24:16 -07:00
Stepan Blyshchak
ef8675d7ab
[sonic_debian_extension] install systemd-bootchart (#11047)
- Why I did it
Implemented sonic-net/SONiC#1001

- How I did it
Install systemd-bootchart tool and provide default config for it.

- How to verify it
Run build and verify systemd-bootchart is installed.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2022-07-06 14:03:31 +03:00
judyjoseph
baaf8b085f
Revert "Update include_macsec flag if type is SpineRouter (#11141)" (#11306)
This reverts commit c9f36957db.
2022-07-01 11:00:30 -07:00
Jing Zhang
5d03b5d0df
Avoid write_standby in warm restart context (#11283)
Avoid write_standby in warm restart context.

sign-off: Jing Zhang zhangjing@microsoft.com

Why I did it
In warm restart context, we should avoid mux state change.

How I did it
Check warm restart flag before applying changes to app db.

How to verify it
Ran write_standby in table missing, key missing, field missing scenarios.
Did a warm restart, app db changes were skipped. Saw this in syslog:
WARNING write_standby: Taking no action due to ongoing warmrestart.
2022-06-29 21:34:02 -07:00
andywongarista
6e0559d5fa
[Arista] Add initial support for 720DT-48S (#10656)
Added initial set of config files to allow for booting and partial traffic testing in SONiC on the 720DT-48S.

How to verify it
- Switch boots
- show interfaces status shows links up on interfaces Ethernet24-51
- Traffic flows with no errors on interfaces Ethernet24-51
2022-06-29 09:56:24 -07:00
davidpil2002
8b7d069880
Add support for Password Hardening (#10323)
- Why I did it
New security feature for enforcing strong passwords when login or changing passwords of existing users into the switch.

- How I did it
By using mainly Linux package named pam-cracklib that support the enforcement of user passwords, the daemon named hostcfgd, will support add/modify password policies that enforce and strengthen the user passwords.

- How to verify it
Manually Verification-
1. Enable the feature, using the new sonic-cli command passw-hardening or manually add the password hardening table like shown in HLD by using redis-cli command

2. Change password policies manually like in step 1.
Notes:
password hardening CLI can be found in sonic-utilities repo-
P.R: Add support for Password Hardening sonic-utilities#2121
code config path: config/plugins/sonic-passwh_yang.py
code show path: show/plugins/sonic-passwh_yang.py

3. Create a new user (using adduser command) or modify an existing password by using passwd command in the terminal. And it will now request a strong password instead of default linux policies.

Automatic Verification - Unitest:
This PR contained unitest that cover:
1. test default init values of the feature in PAM files
2. test all the types of classes policies supported by the feature in PAM files
3. test aging policy configuration in PAM files
2022-06-29 15:34:56 +03:00
bingwang-ms
ac86f71287
Add extra lossy PG profile for ports between T1 and T2 (#11157)
Signed-off-by: bingwang <wang.bing@microsoft.com>

Why I did it
This PR brings two changes

Add lossy PG profile for PG2 and PG6 on T1 for ports between T1 and T2.
After PR Update qos config to clear queues for bounced back traffic #10176 , the DSCP_TO_TC_MAP and TC_TO_PG_MAP is updated when remapping is enable

DSCP_TO_TC_MAP
Before	After	Why do this change
"2" : "1"	"2" : "2"	Only change for leaf router to map DSCP 2 to TC 2 as TC 2 will be used for lossless TC
"6" : "1"	"6" : "6"	Only change for leaf router to map DSCP 6 to TC 6 as TC 6 will be used for lossless TC

TC_TO_PRIORITY_GROUP_MAP
Before	After	Why do this change
"2" : "0"	"2" : "2"	Only change for leaf router to map TC 2 to PG 2 as PG 2 will be used for lossless PG
"6" : "0"	"6" : "6"	Only change for leaf router to map TC 6 to PG 6 as PG 6 will be used for lossless PG

So, we have two new lossy PGs (2 and 6) for the T2 facing ports on T1, and two new lossless PGs (2 and 6) for the T0 facing port on T1.
However, there is no lossy PG profile for the T2 facing ports on T1. The lossless PGs for ports between T1 and T0 have been handled by buffermgrd .Therefore, We need to add lossy PG profiles for T2 facing ports on T1.

We don't have this issue on T0 because PG 2 and PG 6 are lossless PGs, and there is no lossy traffic mapped to PG 2 and PG 6

Map port level TC7 to PG0
Before the PCBB change, DSCP48 -> TC 6 -> PG 0.
After the PCBB change, DSCP48 -> TC 7 -> PG 7
Actually, we can map TC7 to PG0 to save a lossy PG.

How I did it
Update the qos and buffer template.

How to verify it
Verified by UT.
2022-06-28 12:50:33 -07:00
judyjoseph
c9f36957db
Update include_macsec flag if type is SpineRouter (#11141)
Add the support to enable macsec when type is SpineRouter
2022-06-24 10:32:02 -07:00
geogchen
6a9c058a92
Revert "Add support for generating interface configuration in /etc/network/interfaces for multiple management interfaces (#11204)" (#11241)
This reverts commit 90a849ea85.

#### Why I did it
The interfaces unit test did not cover some of the conditions in interfaces.j2 that was changed in #11204.  Therefore reverting the change and add the tests before making the change to interfaces.j2.

#### How I did it
Git revert.

#### How to verify it

#### Which release branch to backport (provide reason below if selected)


- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [ ] 202205

#### Description for the changelog

#### Link to config_db schema for YANG module changes

#### A picture of a cute animal (not mandatory but encouraged)
2022-06-24 06:30:33 -07:00
Sudharsan Dhamal Gopalarathnam
9452095e25
[lldp]Fix lldp spawned after reboot when disabled (#11080)
- Why I did it
When LLDP is disabled through feature command, it gets spawned after reboot.

- How I did it
In syncd.sh check if the service is enabled before spawning automatically during cold reboot.

- How to verify it
Disable lldp feature. Perform cold reboot and verify its not spawned.
2022-06-22 03:11:41 +03:00
geogchen
90a849ea85
Add support for generating interface configuration in /etc/network/interfaces for multiple management interfaces (#11204)
* [Interfaces] Modify template to support multiple management interfaces

* Modify minigraph to process interfaces in sorted order

Signed-off-by: Ubuntu <gechen@gechen-sonic-dev.d0r25nej54guppclip4gpy5b5a.jx.internal.cloudapp.net>

* Add UT minigraph

Signed-off-by: Ubuntu <gechen@gechen-sonic-dev.d0r25nej54guppclip4gpy5b5a.jx.internal.cloudapp.net>

* make case insensitve comparison

Signed-off-by: George Chen <gechen@microsoft.com>

* Use natural sort

Signed-off-by: George Chen <gechen@microsoft.com>

Co-authored-by: Ubuntu <gechen@gechen-sonic-dev.d0r25nej54guppclip4gpy5b5a.jx.internal.cloudapp.net>
2022-06-21 10:16:10 -07:00
xumia
fdef1f0342
[Build]: Support to use symbol links for lazy installation targets to reduce the image size (#10923)
Why I did it
Support to use symbol links in platform folder to reduce the image size.
The current solution is to copy each lazy installation targets (xxx.deb files) to each of the folders in the platform folder. The size will keep growing when more and more packages added in the platform folder. For cisco-8000 as an example, the size will be up to 2G, while most of them are duplicate packages in the platform folder.

How I did it
Create a new folder in platform/common, all the deb packages are copied to the folder, any other folders where use the packages are the symbol links to the common folder.

Why platform.tar?
We have implemented a patch for it, see #10775, but the problem is the the onie use really old unzip version, cannot support the symbol links.
The current solution is similar to the PR 10775, but make the platform folder into a tar package, which can be supported by onie. During the installation, the package.tar will be extracted to the original folder and removed.
2022-06-21 13:03:55 +08:00
jingwenxie
fdc65d7600
Remove minigraph loading in updategraph script (#11146)
Why I did it
Minigraph will be deprecated in the future. So minigraph related reload should be deleted.

How I did it
Remove unused load_minigraph
2022-06-21 08:57:57 +08:00
Stepan Blyshchak
42576d2664
[auto-ts] add memory check (#10433)
#### Why I did it

To support automatic techsupport invokation in case memory usage is too high.

#### How I did it

Implemented according to https://github.com/Azure/SONiC/pull/939

#### How to verify it

UT, manual test on the switch.

*DEPENDS* on https://github.com/Azure/sonic-utilities/pull/2116
2022-06-20 09:39:05 -07:00
yozhao101
241f4454b4
[memory_checker] Do not check memory usage of containers which are not created (#11129)
Signed-off-by: Yong Zhao yozhao@microsoft.com

Why I did it
This PR aims to fix an issue (#10088) by enhancing the script memory_checker.

Specifically, if container is not created successfully during device is booted/rebooted, then memory_checker do not need check its memory usage.

How I did it
In the script memory_checker, a function is added to get names of running containers. If the specified container name is not in current running container list, then this script will exit without checking its memory usage.

How to verify it
I tested on a lab device by following the steps:

Stops telemetry container with command sudo systemctl stop telemetry.service

Removes telemetry container with command docker rm telemetry

Checks whether the script memory_checker ran by Monit will generate the syslog message saying it will exit without checking memory usage of telemetry.
2022-06-17 12:13:18 -07:00