Commit Graph

1015 Commits

Author SHA1 Message Date
Junchao-Mellanox
4f326e8779
Fix race condition between networking service and interface-config service (#10573) (#10766)
Backport https://github.com/Azure/sonic-buildimage/pull/10573 to 202012.

#### Why I did it

This PR fixes a bug in which the management port eth0 can lose its IP address even though the user has configured a static IP for it. The issue is not always reproducible; the flow that triggers it is:

1. Systemd starts the networking service, which runs a DHCP-based configuration and is assigned an IP address by DHCP.
2. Systemd starts the interface-config service, which depends on the networking service.
3. The interface-config service runs the command "ifdown --force eth0" (see [line](16717d2dc5/files/image_config/interfaces/interfaces-config.sh (L4))). Because the networking service is still running, this [line](ac32bec0e2/ifupdown2/ifupdown/main.py (L74)) fails with "error: Another instance of this program is already running.". The error is printed by the ifupdown2 library, which is the main process of the networking service. As a result, ifdown has no effect here and the IP address of eth0 is not removed.
4. The interface-config service updates /etc/network/interfaces to the static configuration.
5. The interface-config service runs the command "systemctl restart networking". This kills the previous networking-related processes (log: networking.service: Main process exited, code=killed, status=15/TERM) and tries to reconfigure the IP address using the static configuration. However, it detects that the configured IP and the existing IP are the same, so it does not actually program the address into the kernel. Hence, the address in use is still the one obtained from DHCP. (This could be a bug in ifupdown2: the previous IP came from DHCP and the new one is static, yet it treats them as the same instead of reconfiguring the address.)
6. When the DHCP lease expires, the kernel removes the IP address from eth0 and the issue reproduces.

The issue is not always reproducible because the networking service usually finishes quickly enough that step 3 is never hit.

#### How I did it

Check the state of the networking service before running "ifdown --force eth0"; if it is still activating, wait for it to finish.
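
The wait described above can be sketched as a small polling loop. This is only an illustration in Python, not the code from the PR (the actual change lives in a shell script): the function names, the 60-second timeout, and the 1-second poll interval are all assumptions. The only external command used is `systemctl is-active`, which prints the unit's active state.

```python
import subprocess
import time
from typing import Callable


def networking_state() -> str:
    """Return the active state of the networking unit, e.g. 'activating'."""
    # `systemctl is-active` prints the state and exits non-zero when the
    # unit is not active, so check=False is used on purpose.
    result = subprocess.run(
        ["systemctl", "is-active", "networking"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout.strip()


def wait_while_activating(
    get_state: Callable[[], str],
    timeout_s: float = 60.0,   # assumed value, not taken from the PR
    poll_s: float = 1.0,       # assumed value, not taken from the PR
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the unit leaves 'activating' or the timeout expires."""
    waited = 0.0
    state = get_state()
    while state == "activating" and waited < timeout_s:
        sleep(poll_s)
        waited += poll_s
        state = get_state()
    return state
```

On a real system this would be called as `wait_while_activating(networking_state)` before the ifdown command is issued; passing the state getter as a parameter keeps the loop testable without systemd.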

#### How to verify it

Manual test.
2022-05-14 14:58:24 -07:00
xumia
951d93e362 Reduce image size for lazy installation packages (#10775)
Why I did it
The image is too large when there are multiple lazy packages and multiple platforms. There is no need to keep multiple copies of the lazy installation packages.
For the Cisco image, the size drops from 3.5G to 1.7G.

How I did it
Use symbolic links so that only one copy of each lazy package is kept.
Make a new folder fsroot/platform/common.
Copy the lazy packages into that folder.
When a platform (such as x86_64-grub, x86_64-8800_rp-r0, x86_64-8201_on-r0, etc.) uses a package, only create a symbolic link to the package in the common folder.
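
The approach above can be sketched roughly as follows. This is a minimal Python illustration, not the build system's actual code: the function name, the per-platform directory layout under fsroot/platform, and the use of a relative link target are assumptions.

```python
import os
import shutil


def link_lazy_package(fsroot: str, platform: str, package_path: str) -> str:
    """Keep one copy of a package in platform/common and symlink it per platform."""
    name = os.path.basename(package_path)

    # Single shared copy of the package.
    common_dir = os.path.join(fsroot, "platform", "common")
    os.makedirs(common_dir, exist_ok=True)
    shared = os.path.join(common_dir, name)
    if not os.path.exists(shared):
        shutil.copy2(package_path, shared)

    # Per-platform entry is only a symbolic link (assumed layout).
    platform_dir = os.path.join(fsroot, "platform", platform)
    os.makedirs(platform_dir, exist_ok=True)
    link = os.path.join(platform_dir, name)
    if os.path.lexists(link):
        os.remove(link)
    # A relative target stays valid if fsroot is later mounted elsewhere.
    os.symlink(os.path.join("..", "common", name), link)
    return link
```

Calling this once per platform for the same package leaves one real file on disk, which is where the size reduction comes from.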
2022-05-10 06:44:40 +00:00
Samuel Angebault
705d3c0804 [Arista] Remove arista.log from rsyslog default logrotate (#9731)
Why I did it
In parallel with this change, Arista added a custom logrotate configuration as part of its driver library.
Having 2 logrotate configurations for the same log file triggers an issue.

Fixes aristanetworks/sonic#38

How I did it
Arista merged a few changes in sonic-buildimage which added a logrotate configuration aristanetworks/sonic@e43c797
It is therefore the right path to remove the arista.log line from the logrotate.d/rsyslog configuration.

How to verify it
Logrotate works without any error message, arista log rotation happens, and arista daemons still append logs once the file has been truncated.
2022-04-28 23:58:41 +00:00
mssonicbld
1c9cdc4c7a
[ci/build]: Upgrade SONiC package versions (#10594) 2022-04-27 15:25:14 +00:00
yozhao101
e6c18fa6dd [Monit] Fix the issue which shows Monit can not reset its counter. (#10288)
Signed-off-by: Yong Zhao <yozhao@microsoft.com>

Why I did it
This PR aims to fix the Monit issue which shows Monit can't reset its counter when monitoring memory usage of telemetry container.

Specifically, the Monit configuration for monitoring the memory usage of the telemetry container is as follows:

  check program container_memory_telemetry with path "/usr/bin/memory_checker telemetry 419430400"
      if status == 3 for 10 times within 20 cycles then exec "/usr/bin/restart_service telemetry"
If memory usage of telemetry container is larger than 400MB for 10 times within 20 cycles (minutes), then it will be restarted.
Recently we observed, after telemetry container was restarted, its memory usage continuously increased from 400MB to 11GB within 1 hour, but it was not restarted anymore during this 1 hour sliding window.

The reason is that Monit could not reset its counter and start counting again. Monit resets its counter if and only if the status of the monitored service changes from Status failed to Status ok. During this 1-hour sliding window, however, the status never changed from Status failed to Status ok.

Currently for each service monitored by Monit, there will be an entry showing the monitoring status, monitoring mode etc. For example, the following output from command sudo monit status shows the status of monitored service to monitor memory usage of telemetry:

    Program 'container_memory_telemetry'
         status                             Status ok
         monitoring status          Monitored
         monitoring mode          active
         on reboot                      start
         last exit value                0
         last output                    -
         data collected               Sat, 19 Mar 2022 19:56:26
Every minute, Monit runs the script to check the memory usage of telemetry and updates the counter if memory usage is larger than 400MB. If Monit finds that memory usage exceeded 400MB 10 times within 20 minutes, the telemetry container is restarted. The following is an example status of the monitored service:

    Program 'container_memory_telemetry'
         status                             Status failed
         monitoring status          Monitored
         monitoring mode          active
         on reboot                      start
         last exit value                0
         last output                    -
         data collected               Tue, 01 Feb 2022 22:52:55
After the telemetry container was restarted, we found that its memory usage increased rapidly from around 100MB to more than 400MB within 1 minute, so the status of the monitored service did not have a chance to change from Status failed to Status ok.

How I did it
As a workaround for this issue, Monit recently introduced another syntax, repeat every <n> cycles, for use with exec. This syntax lets Monit execute the background script repeatedly if the error persists for a given number of cycles.
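
The difference can be illustrated with a deliberately simplified model of the behavior described above. This is not Monit's implementation: it assumes the action fires only on the transition from ok to failed, and that with a repeat interval it fires again every n cycles while the failure persists. The function name and the repeat value used in the usage note are hypothetical.

```python
from collections import deque
from typing import Iterable, List, Optional

# The threshold passed to memory_checker in the configuration above is 400 MiB.
LIMIT_BYTES = 400 * 1024 * 1024  # 419430400


def exec_cycles(
    exit_codes: Iterable[int],
    times: int = 10,
    window: int = 20,
    repeat_every: Optional[int] = None,
) -> List[int]:
    """Return the 1-based cycles at which the exec action runs (simplified model)."""
    recent = deque(maxlen=window)   # pass/fail results of the last `window` cycles
    failed = False                  # current status of the monitored check
    since_exec = 0                  # cycles elapsed since the last exec
    fired = []
    for cycle, code in enumerate(exit_codes, start=1):
        recent.append(code == 3)    # exit status 3 means "over the limit"
        now_failed = sum(recent) >= times
        if now_failed and not failed:
            fired.append(cycle)     # ok -> failed transition
            since_exec = 0
        elif now_failed and repeat_every:
            since_exec += 1
            if since_exec >= repeat_every:
                fired.append(cycle) # failure persists: repeat the action
                since_exec = 0
        failed = now_failed
    return fired
```

In this model, a container that stays above the limit for 14 consecutive cycles triggers one restart without a repeat interval (`exec_cycles([3] * 14)` gives `[10]`), while a hypothetical `repeat_every=2` gives `[10, 12, 14]`.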

How to verify it
I verified this change on lab device str-s6000-acs-12. Another pytest PR (Azure/sonic-mgmt#5492) is submitted in sonic-mgmt repo for review.
2022-04-21 22:00:42 +00:00
Samuel Angebault
9de6b2ca12
[Arista] Fix arista-net initramfs hook (#10626)
The interface renaming logic fails if one interface is missing.
Because of the `set -e` the whole initramfs hook would abort early on
error.
This change fixes the current behavior to make sure missing interfaces
are properly skipped and existing interfaces are renamed.
2022-04-20 10:03:37 -07:00
Jing Kan
4ee75f490e
[202012][copp_cfg] Enable dhcp trap for BmcMgmtToRRouter (#10596)
Signed-off-by: Jing Kan jika@microsoft.com
2022-04-19 15:59:20 +08:00
Stepan Blyshchak
fa1e364f54
[services] kill container on stop in warm/fast mode (#10511)
To optimize stop on warm boot, added kill for containers

Use service "kill" in the shutdown path for fast and warm reboot. For all other reload methods, service "stop" is used.
This is done to save time in shutdown path, and to overall improve the time spent in warm and fast reload.

How - Use service_mgmt.sh to trigger common logic to initiate kill (fast/warm) or stop (cold) for database.sh, radv.sh, snmp.sh, telemetry.sh, mgmt-framework.sh
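
The selection logic described above can be sketched as follows. This is an illustrative Python model only: the real logic is in service_mgmt.sh, and the function names and the reboot-type strings used here are assumptions.

```python
from typing import List, Tuple

# Service scripts named in the commit message above.
SERVICES = ["database", "radv", "snmp", "telemetry", "mgmt-framework"]


def shutdown_action(reboot_type: str) -> str:
    """Pick 'kill' for warm/fast reboot and 'stop' for every other reload method."""
    return "kill" if reboot_type in ("warm", "fast") else "stop"


def shutdown_plan(reboot_type: str) -> List[Tuple[str, str]]:
    """Pair each service with the action chosen for this reboot type."""
    action = shutdown_action(reboot_type)
    return [(name, action) for name in SERVICES]
```

Keeping the decision in one shared helper mirrors the idea of a common script: each service wrapper asks the same question instead of duplicating the check.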

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>, Vaibhav H D <vaibhav.dixit@microsoft.com>
2022-04-18 14:27:48 -07:00
Ying Xie
6af3de4372
[202012][copp cfg] enable dhcp trap for a couple more devices (#10582)
* [copp cfg] enable copp trap for a couple more devices

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2022-04-15 11:47:02 -07:00
Saikrishna Arcot
29b6f62902
[202012] Run tune2fs during initramfs instead of image install (#10558)
If it is run during image install, it's not guaranteed that the
installation environment will have tune2fs available. Therefore, run it
during initramfs instead.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2022-04-12 19:59:24 -07:00
mssonicbld
e0fa07307a
[ci/build]: Upgrade SONiC package versions (#10395)
2022-04-10 17:00:00 +08:00
kellyyeh
b68f4dd74c
Enable dhcp copp trap for EPMS and MgmtTsToR (#10439) 2022-04-06 09:46:08 -07:00
Saikrishna Arcot
e9db38594d
Image disk space reduction (#10172) (#10371)
Reduce the disk space taken up during bootup and runtime.

1. Remove python package cache from the base image and from the containers.
2. During bootup, if logs are to be stored in memory, then don't create the `var-log.ext4` file just to delete it later during bootup.
3. For the partition containing `/host`, don't reserve any blocks for just the root user. This just makes sure all disk space is available for all users, if needed during upgrades (for example).

* Remove pip2 and pip3 caches from some containers

Only containers which appeared to have a significant pip cache size are
included here.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

* Don't create var-log.ext4 if we're storing logs in memory

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>

* Run tune2fs on the device containing /host to not reserve any blocks for just the root user

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
(cherry picked from commit 5617b1ae3e)
2022-03-29 10:11:28 -07:00
mssonicbld
873689ef6e
[ci/build]: Upgrade SONiC package versions (#10373) 2022-03-28 23:08:38 +00:00
mssonicbld
e71c14502d
[ci/build]: Upgrade SONiC package versions (#10331)
Upgrade SONiC Versions
2022-03-25 15:09:37 +08:00
Saikrishna Arcot
aafb3d00e2
Start haveged before systemd-random-seed (#10328)
The haveged service file in Debian Buster specifies that haveged should
start after systemd-random-seed starts (this was removed in Bullseye
after systemd changes caused a bootloop). This is a bit
counterproductive, since haveged is meant to be used in environments
with minimal sources of entropy, but one of the checks that
systemd-random-seed does is to verify that entropy is present.

Therefore, override the default .service file for haveged that moves
systemd-random-seed to the Before list, allowing it to start before
systemd-random-seed checks the system entropy level. (systemd doesn't
allow removing items from dependency/ordering entries such as After= and
Before=, so the entire .service file has to be overwritten.)

Note that despite this, haveged takes up to two seconds to actually
start working, so systemd-random-seed may still block for about two
seconds. However, this still allows other work (such as running
rc.local) to proceed a bit sooner.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2022-03-24 14:28:42 -07:00
noaOrMlnx
4f021c44c2
Update docker-sonic-vs infrastructure in order to run CoPP UT (#10230)
*Changes to run CoPP UT in docker-sonic-vs
2022-03-21 21:55:24 -07:00
xumia
67312ff635
[Build]: Use one debian mirror config (#10281)
Why I did it
Use one debian mirror config.
The empty config in https://github.com/Azure/sonic-buildimage/blob/master/files/image_config/apt/sources.list overrides the file https://github.com/Azure/sonic-buildimage/blob/master/files/apt/sources.list.amd64 (armhf/arm64), which does not make sense.
None of the content in files/image_config/apt is used. Anyone who wants to add a mirror config should add it in files/apt.

How I did it
Remove files/image_config/apt and the reference.
2022-03-21 17:04:19 +08:00
gechiang
a984757b9d
[202012 BRCM SAI 4.3.5.3-3] Picked up fixes that makes up BRCM SAI version 4.3.5.3-3 (#10255) 2022-03-19 17:18:50 -07:00
xumia
413ee3e219
[Build]: Fix /proc not mounted issue (#10164) (#10256)
[Build]: Fix /proc not mounted issue
2022-03-19 22:19:06 +08:00
mssonicbld
03d058efe4
[ci/build]: Upgrade SONiC package versions (#10283)
[ci/build]: Upgrade SONiC package versions
2022-03-19 11:09:51 +08:00
Stepan Blyshchak
8ce5e4e77b [teamd.sh] kill teamd docker on warm shutdown for faster shutdown (#10219)
This can save 6 sec for teamd LAG restoration - the time between:

```
Mar  9 13:51:10.467757 r-panther-13 WARNING teamd#teamd_PortChannel1[28]: Got SIGUSR1.
Mar  9 13:52:33.310707 r-panther-13 INFO teamd#teamd_PortChannel1[27]: carrier changed to UP
```

- Why I did it
Optimize warm boot. Specifically reduce the time needed for LAG restoration.

- How I did it
Kill teamd docker after graceful shutdown of teamd processes.

- How to verify it
Run warm reboot.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2022-03-16 22:22:26 +00:00
wenyiz2021
5878cfdb06 Update container_checker for multi-asic devices when state is 'always_enabled' (#10067)
* Update container_checker for multi-asic devices 

Update container_checker for multi-asic devices to add database containers in always_running_containers. 
The previous change was made for single-asic devices, where database containers were not treated as a feature when writing to state_db.

* Update container_checker

Update an indent
2022-03-14 23:01:43 +00:00
mssonicbld
1c4364222d
[ci/build]: Upgrade SONiC package versions (#10214) 2022-03-11 15:13:09 +00:00
Santhosh Kumar T
e83955599d
[202012] Refactoring DELL platform init to reduce rc.local processing time (#10171)
Why I did it
To reduce the processing time of rc.local, refactoring s6100 platform initialization.
Fixing [warm-upgrade][202012] Slow DELL platform init in rc.local causes lacp-teardown #10150
How I did it
On branch 202012-s6100-rclocal
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
        modified:   ../../../../files/image_config/platform/rc.local
        modified:   ../debian/platform-modules-s6100.install
        modified:   scripts/fast-reboot_plugin
        modified:   scripts/s6100_platform.sh
        renamed:    scripts/s6100_i2c_enumeration.sh -> scripts/s6100_platform_startup.sh
        renamed:    systemd/s6100-i2c-enumerate.service -> systemd/s6100-platform-startup.service
2022-03-10 18:51:07 -08:00
mssonicbld
7fe1489061
[ci/build]: Upgrade SONiC package versions (#10194) 2022-03-09 22:41:51 +00:00
mssonicbld
063882cf87
[ci/build]: Upgrade SONiC package versions (#10069)
2022-03-08 21:32:36 +08:00
xumia
a8d844c83d
[build]: Fix marvell-armhf build hung issue (#10156)
The marvell-armhf build hangs; it does not exit even after a long wait.
It is caused by the process /etc/entropy.py which is started by the postinst script in target/debs/buster/sonic-platform-nokia-7215_1.0_armhf.deb

$ cat postinst 
sh /usr/sbin/nokia-7215_plt_setup.sh
...

$ cat usr/sbin/nokia-7215_plt_setup.sh | tail

    python /etc/entropy.py &


$ cat etc/entropy.py 
if path.exists("/proc/sys/kernel/random/entropy_avail"):
    while 1:
        while avail() < 2048:
            with open('/dev/urandom', 'rb') as urnd, open("/dev/random", mode='wb') as rnd:
                d = urnd.read(512)
                t = struct.pack('ii', 4 * len(d), len(d)) + d
                fcntl.ioctl(rnd, RNDADDENTROPY, t)
        time.sleep(30)

This is a workaround for the build issue. The Debian package still needs to be fixed, after which this change should be reverted.
2022-03-07 08:00:56 -08:00
roman_savchuk
4d6f9f2de7
[ BFN ] update SDE package for BFN platform (#10049)
Updated SDE package for Barefoot platform with fixes for:

- NAT
- VRF
2022-03-04 20:43:08 -08:00
Qi Luo
04925df451
[build] Fix the urllib3 version in sonic-mgmt-framework (#10149)
Fix the urllib3 version in the sonic-mgmt-framework constraints file because it is already updated in the Dockerfile
2022-03-04 20:34:23 -08:00
gechiang
7fb546dce4
[202012]BRCM SAI 4.3.5.3-2 Fixes CS00012228504, SONIC-55963:SID, CS00012209080, CS00012220761, and CS00012222414 (#10155) 2022-03-04 16:24:59 -08:00
Lawrence Lee
4d1abbc09b [write_standby]: Increase timeout to 60s (#10065)
- Avoid scenarios where script times out before orchagent can establish IPinIP tunnel

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2022-03-01 22:49:17 +00:00
noaOrMlnx
7a35504ff7
[202012] [CoPP] Add always_enabled field (#9999)
Add the "always_enabled" field to copp_cfg.j2 file, in order to allow traps without an entry in features table, to be installed automatically.

This is a cherry-pick of https://github.com/Azure/sonic-buildimage/pull/9302

- Why I did it
In order to allow traps without an entry in features table, to be installed automatically.

- How I did it
Add always_enabled field to traps without a feature
2022-02-20 12:42:39 +02:00
mssonicbld
a23aac25d3
[ci/build]: Upgrade SONiC package versions (#10023)
[ci/build]: Upgrade SONiC package versions
2022-02-19 08:10:17 +08:00
Samuel Angebault
b32d7eedaf
Add emmc quirks to boot0 (#9989)
Why I did it
Fix some unreliability seen on emmc device with some AMD CPUs

How I did it
Added a kernel parameter to add quirks to
It depends on a sonic-linux-kernel change to work properly but will be a no-op without it.

Description for the changelog
Add emmc quirks for Upperlake
2022-02-17 08:55:01 -08:00
vmittal-msft
304ec5b0cd
Updated traffic scheduler settings for HWSKUs : DellEMC-Z9332f-O32 & DellEMC-Z9332f-M-O16C64 (#9927) 2022-02-15 16:15:20 -08:00
mssonicbld
f746d27c7d
[ci/build]: Upgrade SONiC package versions (#9933) 2022-02-09 00:59:47 +00:00
Prince George
c1a0871fe9 Close console session due to user inactivity (#9890)
Signed-off-by: Prince George <prgeor@microsoft.com>
2022-02-08 19:07:29 +00:00
tbgowda
78dc2d8a7b Enable SAI_SWITCH_ATTR_UNINIT_DATA_PLANE_ON_REMOVAL attribute (#9419)
Why I did it
Fixes #8980 partly.

The corresponding changes in sonic-sairedis is here :
Azure/sonic-sairedis#975

How I did it
Include changes from both repos and build an image for verification.

How to verify it
Trigger fast-reboot with the changes, see the attribute SAI_SWITCH_ATTR_UNINIT_DATA_PLANE_ON_REMOVAL being set at the SAI level.

Signed-off-by: Thushar Gowda <24815472+tbgowda@users.noreply.github.com>
2022-02-08 19:07:08 +00:00
vmittal-msft
7435613216
[202012] BRCM SAI 4.3.5.3-1 Fix for CS00012218555 (#9923) 2022-02-07 08:02:57 -08:00
Shi Su
4191889803
[bgpcfgd] Add bgpcfgd support to advertise routes (#9197) (#9697)
Why I did it
Cherry pick changes in #9197 to 202012 branch
Add bgpcfgd support to advertise routes.

How I did it
Make bgpcfgd subscribe to the ADVERTISE_NETWORK table in STATE_DB and configure route advertisement accordingly.

How to verify it
Added unit tests in bgpcfgd and verified route advertisement on KVM.
2022-01-26 14:38:04 -08:00
mssonicbld
3dae536de4
[ci/build]: Upgrade SONiC package versions (#9834) 2022-01-23 22:13:50 +00:00
mssonicbld
ae7514b1bd
[ci/build]: Upgrade SONiC package versions (#9832) 2022-01-22 16:01:17 +00:00
dflynn-Nokia
c715bdbf56 [firsttime boot] suppress error message on platforms not supporting kdump (#9521)
Why I did it
Eliminate benign firsttime boot error reported when running on platforms that do not support kdump.

How I did it
Change rc.local to check for presence of the file /etc/default/kdump-tools before referencing it.

How to verify it
Install a new image on an armhf or arm64 platform and check for a failed reference to /etc/default/kdump-tools on firsttime boot.
2022-01-21 02:39:17 +00:00
gechiang
090ef33ca2
[202012]BRCM SAI 4.3.5.3 Fixes CS00012218100,CS00012215529,CS00012208995,CS00012220761,CS00012211718,CS00012208995,CS00012220761, and CS00012225760 (#9815) 2022-01-20 15:28:34 -08:00
mssonicbld
2eb8fe3a2c
[ci/build]: Upgrade SONiC package versions (#9799) 2022-01-19 22:46:23 +00:00
gechiang
bdc7ce86de
[202012] BRCM SAI 4.3.5.2 Fixes CS00012205357, CS00012214196, CS00012213974 (#9754) 2022-01-13 11:40:43 -08:00
mssonicbld
a0376a6e59
[ci/build]: Upgrade SONiC package versions (#9680) 2022-01-07 22:12:12 +00:00
mssonicbld
9b1a3971bd
[ci/build]: Upgrade SONiC package versions (#9645) 2021-12-26 23:30:40 +00:00
mssonicbld
813a6387c5
[ci/build]: Upgrade SONiC package versions (#9543) 2021-12-24 17:05:45 +00:00
vmittal-msft
724037ebc3
BRCM SAI 4.3.5.1-9 for enabling SAI_SWITCH_ATTR_QOS_DSCP_TO_TC_MAP capability (#9463) 2021-12-14 09:56:21 -08:00
Lawrence Lee
b3a3aa0c38 [mux]: Fix mark_dhcp_packet (#9373)
- Consolidate the two [Service] sections by moving the ExecStartPre line for mark_dhcp_packet.py to the first section and removing the second.
- Make the mark_dhcp_packet.py file executable
- Also clean up mark_dhcp_packet.py
    - Remove unused imports
    - Fix spacing and line lengths to conform to PEP8
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2021-12-01 02:28:56 +00:00
Stephen Sun
fafd5327bd [Reclaim buffer] Common infrastructure update for reclaiming buffer (#9133)
- Why I did it
This is to update the common sonic-buildimage infra for reclaiming buffer.

- How I did it
Render zero_profiles.j2 to zero_profiles.json for vendors that support reclaiming buffer
The zero profiles will be referenced in PR [Reclaim buffer] Reclaim unused buffers by applying zero buffer profiles #8768 on Mellanox platforms and there will be test cases to verify the behavior there.
Rendering is done here for passing azure pipeline.
Load zero_profiles.json when the dynamic buffer manager starts
Generate inactive port list to reclaim buffer

Signed-off-by: Stephen Sun <stephens@nvidia.com>
2021-12-01 02:28:46 +00:00
gechiang
a5f4780c64
[202012] BRCM SAI 4.3.5.1-8 Pick up fix for PFCWD getting continuously triggered/restored when pause frames are sent continuously to both queues of a port (#9296)
1.  CS00012211718 [4.3] Pfcwd getting continuously triggered/restored when pause frames are sent continuously to both queues of a port (TD2/Th/Th2/TD3) MSFT Default

Preliminary tests look fine. BGP neighbors were all up with proper routes programmed
interfaces are all up
Manually ran the following test cases on 7050CX3 (TD3) T0 DUT and all passed:
```
     fib/test_fib.py
     vxlan/test_vxlan_decap.py
     fdb/test_fdb.py
     decap/test_decap.py
     ipfwd/test_dip_sip.py 
     ipfwd/test_dir_bcast.py
     acl/test_acl.py
     vlan/test_vlan.py
     platform_tests/test_reboot.py
```
2021-11-17 21:30:10 -08:00
trzhang-msft
19008889de update DHCP_PACKET_MARK schema (#9077)
- update DHCP_PACKET_MARK schema in state_db
- this is an update over PR: Add service mark_dhcp_packet to mux container #9015
2021-11-15 21:37:08 +00:00
trzhang-msft
86fa5eede2 Add service mark_dhcp_packet to mux container (#9015)
- add a new service "mark_dhcp_packet" to mux container
- apply packet marks on a per-interface basis in ebtables
- write packet marks to "DHCP_PACKET_MARK" table in state_db
2021-11-15 21:36:29 +00:00
Renuka Manavalan
6cb7af73d9 add arista.log to logrotate (#9245) 2021-11-15 21:32:03 +00:00
mssonicbld
36f1a547b1
[ci/build]: Upgrade SONiC package versions (#9255) 2021-11-14 23:26:35 +00:00
mssonicbld
4d15a1c1f6
[ci/build]: Upgrade SONiC package versions (#9221) 2021-11-13 23:37:09 +00:00
gechiang
7ac5b40f4b
[202012]BRCM SAI 4.3.5.1-7 Picked up fixes for CS00012209390, CS00012212995, SONIC-51583, CS00012215744, and SONIC-51638 (#9252)
This is to pick up BRCM SAI 4.3.5.1-7 fixes which contains the following fixes:

1.  CS00012209390: SONIC-50037, Used SAI_SWITCH_ATTR_QOS_DSCP_TO_TC_MAP as a default decap map for IPinIP tunnels.
2.  CS00012212995: SONIC-50948 SAI_API_QUEUE:_brcm_sai_cosq_stat_get:1353 egress Min limit get failed with error Invalid parameter 
3.  SONIC-51583: Fixed acl group member creation failure with priority of -1
4.  CS00012215744:SONIC-51395 [TH, TH2] WB 3.5 to 4.3 fails at APPLY_VIEW while setting SAI_PORT_ATTR_EGRESS_ACL
5.  SONIC-51638: SDK-249337 ERROR: AddressSanitizer: heap-buffer-overflow in _tlv_print_array

Preliminary tests look fine. BGP neighbors were all up with proper routes programmed
interfaces are all up
Manually ran the following test cases on 7050CX3 (TD3) T0 DUT and all passed:
```
     fib/test_fib.py
     vxlan/test_vxlan_decap.py
     fdb/test_fdb.py
     decap/test_decap.py
     ipfwd/test_dip_sip.py 
     ipfwd/test_dir_bcast.py
     acl/test_acl.py
     vlan/test_vlan.py
     platform_tests/test_reboot.py
```
2021-11-13 10:45:46 -08:00
Mykhailo Onipko
a7117b905f
[BFN]: Updated SDK packages to 20211112 (#9244)
Signed-off-by: Mykhailo Onipko <monipko@barefootnetworks.com>
2021-11-12 21:47:56 -08:00
Lawrence Lee
b027e87ffb [mux.service]: Remove pmon dependency (#9211)
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2021-11-11 02:56:27 +00:00
Lawrence Lee
f317d93cb0 Merged PR 4679112: [write_standby]: Ignore non-auto interfaces
[write_standby]: Ignore non-auto interfaces

* In the event that `write_standby.py` is used to automatically switchover interfaces when linkmgrd or bgp crashes, ignore any interfaces that are not configured to auto-switch
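
The filtering step can be sketched like this. It is a hypothetical illustration rather than the code in `write_standby.py`: the function name, the dictionary shape, and the `state` field with the value `auto` are assumptions about how the mux cable configuration is represented.

```python
from typing import Dict, List


def auto_switch_interfaces(mux_cable: Dict[str, Dict[str, str]]) -> List[str]:
    """Return the interfaces configured for automatic switchover."""
    # Assumed schema: each entry carries a 'state' field, and only 'auto'
    # entries may be switched to standby automatically.
    return sorted(
        name for name, entry in mux_cable.items()
        if entry.get("state") == "auto"
    )
```

For a table such as `{"Ethernet0": {"state": "auto"}, "Ethernet4": {"state": "manual"}}`, only `Ethernet0` would be switched when linkmgrd or bgp crashes.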

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2021-11-10 18:54:33 -08:00
Lawrence Lee
57ad50cfd9 Merged PR 4559560: [bgp]: Switch to standby if BGP container exits
[bgp]: Switch mux to standby if BGP container exits

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2021-11-10 18:54:33 -08:00
Lawrence Lee
6a9c709336 [write_standby]: Improve logging
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2021-11-10 18:54:33 -08:00
Lawrence Lee
77378b4364 [mux]: Call write_standby from host only
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2021-11-10 18:54:33 -08:00
Lawrence Lee
25712c712e [mux]: Make write_standby available on host
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>

[write_standby]: Cleanup and fix build

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2021-11-10 18:54:33 -08:00
Tamer Ahmed
18d1f65339 Merged PR 4813977: [mux] Update Service Install With SONiC Target
[mux] Update Service Install With SONiC Target

A recent PR grouped all SONiC services into sonic.target. The install section
of mux.service was not updated, and this causes delays when using config
reload as the service failed state is not being reset.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2021-11-10 18:54:33 -08:00
Lawrence Lee
70fbd6826c Merged PR 4366316: [mux.service]: Bind to sonic.target
[mux.service]: Bind to sonic.target

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
2021-11-10 18:54:33 -08:00
Tamer Ahmed
b42aef68f3 Merged PR 4234524: [mux] Start Mux on Only Dual-ToR Platform
[mux] Start Mux on Only Dual-ToR Platform

mux docker depends on the presence of mux cable hardware and is
supposed to run only on Gemini ToRs. This PR changes the mux feature
config in order to enable mux docker based on device configuration.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>
2021-11-10 18:54:33 -08:00
Tamer Ahmed
b8f70f8986 Merged PR 3845699: [linkmgrd]: Introduce MUX cable linkmgrd
Linkmgrd monitors link status, mux status, and link state. If
the link becomes unhealthy, linkmgrd will trigger mux switchover
on a standby ToR, ensuring uninterrupted service to servers/blades.
This PR is the initial implementation of linkmgrd.

Also, the docker-mux container holds packages related to maintaining and managing
the mux cable. It currently runs the linkmgrd binary that monitors and switches
the mux if needed.
This PR also introduces mux-container and starts linkmgrd at startup when
the build is configured with INCLUDE_MUX=y

Edit: linkmgrd PR will follow.

signed-off-by: Tamer Ahmed <tamer.ahmed@microsoft.com>

Related work items: #2315, #3146150
2021-11-10 18:54:33 -08:00
tjchadaga
9a1b1bc44e Fix for additional intf flap during fast-reboot (#9166) 2021-11-09 23:20:06 +00:00
mssonicbld
c15bae7c84
[ci/build]: Upgrade SONiC package versions (#9128) 2021-11-09 22:52:26 +00:00
gechiang
400e40f255
[202012] BRCM SAI 4.3.5.1-6 Picked up fixes for CS00012213351, CS00012182162, and CS00012210826 (#9158)
This is to pick up BRCM SAI 4.3.5.1-6 fixes which contains the following fixes:

1.  CS00012213351 SONIC-50679: [TH, TH2] Warm-reboot from 3.5 to 4.3 fails due to null objects discovered
2.  CS00012182162: SONIC-49805 TD3 MMU config profile optimization changes 
3.  CS00012210826:SONIC-50205/760c60fc: Should read MMU_INTFI_MMU_PORT_TO_MMU_QUEUES_FC_BKP for TH3

Preliminary tests look fine. BGP neighbors were all up with proper routes programmed
interfaces are all up
Manually ran the following test cases on 7050CX3 (TD3) T0 DUT and all passed:
```
     fib/test_fib.py
     vxlan/test_vxlan_decap.py
     fdb/test_fdb.py
     decap/test_decap.py
     ipfwd/test_dip_sip.py 
     ipfwd/test_dir_bcast.py
     acl/test_acl.py
     vlan/test_vlan.py
     platform_tests/test_reboot.py
```
2021-11-03 07:24:33 -07:00
Sumukha Tumkur Vani
65626c8925
Flush RESTAPI DB upon config reload (#9093) 2021-10-28 09:31:38 -07:00
Nazarii Hnydyn
0cbda8d362 [teamd]: Send USR1/USR2 only to subscribers. (#8856)
To fix teamd signal handling, without which Process 'tlm_teamd' exited unexpectedly
2021-10-27 03:54:58 +00:00
mssonicbld
1c86196411
[ci/build]: Upgrade SONiC package versions (#9050) 2021-10-25 17:09:12 +00:00
gechiang
c95178157d
[202012]BRCM SAI 4.5.3.1-5 picked up SAI fixes for several CSP cases (#9003) 2021-10-19 14:08:31 -07:00
Ying Xie
f1d5aaced0 [copp] bind copp-config.service to sonic.target (#8969)
copp-config service needs to be started after sonic.target so that it could
render the copp-config with the latest information.

It also needs to be restarted when config reload or load_minigraph is invoked.

Signed-off-by: Ying Xie <ying.xie@microsoft.com>
2021-10-15 00:40:05 +00:00
gechiang
eca9020a48
[202012] BRCM SAI 4.5.3.1-4 Fixes dscp-uniform mode, th3 debug counter bmp crash (#8968)
* [202012] BRCM SAI 4.5.3.1-4 Fixes dscp-uniform mode, th3 debug counter bmp crash
2021-10-13 08:25:44 -07:00
mssonicbld
b11d6cf5ee
[ci/build]: Upgrade SONiC package versions (#8919) 2021-10-09 19:12:09 +00:00
mssonicbld
0f48239167
[ci/build]: Upgrade SONiC package versions (#8894) 2021-10-02 19:01:40 +00:00
mssonicbld
d790caecbc
[ci/build]: Upgrade SONiC package versions (#8867) 2021-09-29 17:11:32 +00:00
Vaibhav Hemant Dixit
636870d86f Save DB dump after warm/fast reboot (#8803)
As a part of warmboot, redis database is dumped:
c97fe546e5/scripts/fast-reboot (L269)
However, this dump file is deleted after it is loaded back into the DB post reboot.
The DB dump can be useful for debugging purposes, so keeping a backup of it is worthwhile.
Instead of deleting the dump, rename it and keep it.
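
The rename-instead-of-delete step can be sketched as below. This is a minimal Python illustration under assumptions: the function name and the `.bak` suffix are hypothetical, and the commit itself may implement the step differently.

```python
import os
from typing import Optional


def keep_dump(dump_path: str, suffix: str = ".bak") -> Optional[str]:
    """Rename the loaded DB dump instead of deleting it; return the new path."""
    if not os.path.exists(dump_path):
        return None  # nothing to preserve
    backup_path = dump_path + suffix  # assumed naming scheme
    # os.replace overwrites an older backup, so only the latest dump is kept.
    os.replace(dump_path, backup_path)
    return backup_path
```

Overwriting the previous backup keeps disk usage bounded, at the cost of retaining only the dump from the most recent warm or fast reboot.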
2021-09-27 02:29:12 +00:00
gechiang
ac9feadbf1
[202012] BRCMSAI 4.3.5.1-3 fix CS00012203600, CS00012202255, CS00012208537 (#8840) 2021-09-25 17:09:34 -07:00
mssonicbld
667fe3702c
[ci/build]: Upgrade SONiC package versions (#8829) 2021-09-23 17:34:56 +00:00
mssonicbld
c988a7766c
[ci/build]: Upgrade SONiC package versions (#8800) 2021-09-20 12:48:20 +00:00
mssonicbld
7ce529ea35
[ci/build]: Upgrade SONiC package versions (#8795) 2021-09-19 15:26:49 +00:00
mssonicbld
f716745d76
[ci/build]: Upgrade SONiC package versions (#8637) 2021-09-17 16:40:09 +00:00
abdosi
7732fa95bb [baseimage]: Logrotate for wtmp and btmp files. (#8743)
Added a logrotate file for wtmp and btmp to override the default conf and set the size cap to 100K, as done in
PR: #865. For buster this is controlled by separate wtmp and btmp files.

Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
2021-09-17 08:24:10 +00:00
Sudharsan Dhamal Gopalarathnam
9c5917d8dd Removing execute permission from copp config file (#8680)
*Removed execute permissions from the systemd copp-config.service file. 
Without this we will get a warning: "Configuration file /lib/systemd/system/copp-config.service is marked executable. Please remove executable permission bits. Proceeding anyway."
2021-09-14 08:59:21 +00:00
Ying Xie
e8b8012818 [202012][fstrim] delay fstrim timer after sonic.target (#8737)
Why I did it
fstrim has dependency on pmon docker.

How I did it
start fstrim timer after sonic.target.

How to verify it
local test and PR test.

Signed-off-by: Ying Xie ying.xie@microsoft.com
2021-09-14 08:59:17 +00:00
gechiang
84b5659372
[202012] BRCM SAI 4.3.5.1-2 Fix BRCM SAI regression due to ACL Egress Mirroring Action capability (#8682) 2021-09-06 22:12:59 -07:00
Samuel Angebault
96f2eaaadb [Arista] Fix flash size computation for Lodoga (#8622)
The Lodoga platform also matched crow which was hardcoding the flash
size to 3700. This change enables autodetect on Clearlake which in turn
allows autodetect for Lodoga.

The threshold was bumped from 3700 to 4000 because size computation can
differ slightly and report slightly above 3700.
2021-09-01 01:40:45 +00:00
mssonicbld
7eb4a345fa
[ci/build]: Upgrade SONiC package versions (#8584)
Co-authored-by: mssonicbld <vsts@fv-az232-326.x3jni0md3anuvcz2px3t3ecixa.bx.internal.cloudapp.net>
2021-08-30 16:24:18 +08:00
Samuel Angebault
01117d58b5 [Arista] Rely on automatic flash size detection for Lodoga (#8608)
Lodoga actually has an 8GB storage device.
LodogaSsd variant has a 30GB SSD drive.
However, in boot0 both were mishandled and assigned 4GB for legacy reasons.

Remove the hardcoding of the flash size and let boot0 autodetect the available space.
2021-08-27 02:27:15 +00:00
dflynn-Nokia
2c91efcd15 [Nokia ixs7215] Add support for changing the console baud rate (#8595)
This commit adds support for changing the default console baud rate configured
within the U-Boot bootloader. That default baud rate is exposed via the value
of the U-Boot 'baudrate' environment variable. This commit removes logic that
hardcoded the console baud rate to 115200 and instead ensures that the U-Boot
'baudrate' variable is always used when constructing the Linux kernel boot
arguments used when booting SONiC.

A change is also made to rc.local to ensure that the specified baud rate is set
correctly in the serial getty service.
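
The idea of deriving the console setting from the U-Boot environment rather than a hardcoded value can be sketched as follows. This is a hypothetical Python illustration, not the platform's boot script: the function name, the dictionary standing in for the U-Boot environment, and the `ttyS0` device are assumptions. The `console=<device>,<baud>` form is the standard Linux kernel parameter syntax.

```python
from typing import Dict


def console_boot_arg(uboot_env: Dict[str, str], tty: str = "ttyS0") -> str:
    """Build the kernel console argument from the U-Boot 'baudrate' variable."""
    # No fallback to 115200 here: the point of the change is that the
    # value from U-Boot is always the one that gets used.
    baud = uboot_env["baudrate"]
    return f"console={tty},{baud}"
```

With `{"baudrate": "9600"}` this yields `console=ttyS0,9600`, and the same value would then be applied to the serial getty service so that both ends agree.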
2021-08-27 02:27:06 +00:00
gechiang
fcdd63835b
[202012]BRCM SAI 4.3.5.1-1 Fix configurable drop counter out of resource (#8601)
* [202012]BRCM SAI 4.3.5.1 Fix for configurable drop counter out of resource
2021-08-26 14:30:22 -07:00
mssonicbld
98dd76c485
[ci/build]: Upgrade SONiC package versions (#8561) 2021-08-24 14:53:16 +00:00
mssonicbld
8f604998b4
[ci/build]: Upgrade SONiC package versions (#8556) 2021-08-23 17:32:59 +00:00