Commit Graph

5805 Commits

Author SHA1 Message Date
Judy Joseph
babd131e72 Update sonic-utilities
f19762b (HEAD -> 202111, origin/202111) Display the correct default poll interval for watermark counters (#2082)
2022-03-07 09:59:08 -08:00
Judy Joseph
32189d23ce Update sonic-swss submodule
91d7558 (HEAD -> 202111, origin/202111) Allow IPv4 link-local nexthops (#1903)
ceb5161 Fix for 2053, Fix IPv6 BGP multipath-relax peer-type. (#2062)
b3b279a [crm] Use sai_object_type_get_availability() API to get counters (#2098)
28955f4 Try get port operational speed from STATE DB (#2119)
2022-03-07 09:43:05 -08:00
Marty Y. Lok
76ee6448b5 [chassis][supervisor]monit container-checker failed due to unexpected "database-chassis" docker running #9042 (#9043)
Why I did it
Fixed the monit container_checker fails due to unexpected "database-chassis" docker running on Supervisor card in the VOQ chassis. fixes #9042

How I did it
Added database-chassis to the always running docker list if platform is supervisor card.

How to verify it
Execute the CLI command "sudo monit status container_checker"


Signed-off-by: mlok <marty.lok@nokia.com>
2022-03-07 09:21:32 -08:00
wenyiz2021
043fcfd099 Update container_checker for multi-asic devices when state is 'always_enabled' (#10067)
* Update container_checker for multi-asic devices 

Update container_checker for multi-asic devices to add database containers in always_running_containers. 
Previous change was made for single-asic, and that database containers were not considered as feature when writing to state_db.

* Update container_checker

Update an indent
2022-03-07 09:21:28 -08:00
Alexander Allen
8165b24ce2 [pmon] Clean up supervisord chassis_db_init entry and fix startsecs (#10071)
Why I did it
Code review was still in progress when #9858 was merged and upon further testing I have arrived at a better solution.

How I did it
Modified supervisord configuration j2 template for pmon to require no minimum uptime for chassisd_db_init and to remove the redundant exit_codes directive

How to verify it
Boot switch and verify in syslog that there are no errors related to chassis_db_init
2022-03-07 09:21:23 -08:00
Aravind Mani
0ab166a823 [sonic-cfggen]: Fix sonic-cfggen build failures for armhf (#10132)
Why I did it
amrhf build fails while building sonic-config-engine whl package
https://dev.azure.com/mssonic/be1b070f-be15-4154-aade-b1d3bfb17054/_apis/build/builds/77089/logs/9

The reason for the failure is due to the fact that there is a new line generated at the top of the file in buffer config test cases while building for broadcom based platform and this issue is not seen in Marvell based platforms.

How I did it
Removed the new line for all the buffer test cases as there is no need to add it and accordingly changed the buffer_config.j2 where the new line is generated.
2022-03-07 09:21:17 -08:00
gechiang
a187686c17 [BRCMSAI 6.0.0.13-1] Fix Cancun file directory at new location causing TD3 platform boot issue (#9922) 2022-03-07 09:21:12 -08:00
Sudharsan Dhamal Gopalarathnam
3b6bd5bffb [containerd]Fixing container commands when mode is local and state is disabled (#9986)
Why I did it
During warm-reboot and fast-reboot the below error logs appear
Feb 3 22:05:15.187408 r-lionfish-13 ERR container: docker cmd: kill for nat failed with 404 Client Error for http+docker://localhost/v1.41/containers/nat/json: Not Found ("No such container: nat")

The container command when called for local mode doesn't check if it is enabled before calling docker kill which throws the above errors.
b6ca76b482/scripts/fast-reboot (L699)

How I did it
Checking feature state if local mode and returning error exit code along with valid debug message.

How to verify it
Manually tested with warm-reboot and fast-reboot
Added UT to verify it.
2022-03-07 09:21:07 -08:00
Saikrishna Arcot
c437cad4f0 [build_debian.sh]: Fix /var/log having 0750 permissions instead of 0755 (#10031)
PR #9481 changed auditd's log directory to be /var/log instead of
/var/log/audit, because SONiC mounts a disk image at /var/log during
runtime, and so the /var/log/audit directory might not exist (since it
would've been created during package installation, mounting another
partition at /var/log will hide it). However, for security reasons,
auditd changes the log directory to have 0750 permissions, so that not
everyone knows about the audit logs or read them.

To fix this, revert the change to auditd's log directory, and tell
systemd to create the audit log directory at runtime if it doesn't
exist. Because the disk image gets mounted during initramfs (before
systemd starts), systemd will make sure that the /var/log/audit
directory will exist.

Fixes #9548 and #10015

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2022-03-07 09:21:03 -08:00
Alexander Allen
53be3f7623 [Mellanox] Add patch to hw-mgmt to prevent loading of non-existent kernel modules (#10073)
- Why I did it
The latest upgrade of Mellanox hw-mgmt V7.0020.1300 introduced a couple new kernel modules for new Mellanox platforms that have yet to be upstreamed to the linux kernel.

As these new platforms do not have SONiC support we elected not to upstream these new drivers to sonic-linux-kernel but hw-mgmt expects them to exist which is causing a non-functional error on switch boot.

Feb 15 00:09:55.374130 r-leopard-simx-74 ERR systemd-modules-load[269]: Failed to find module 'emc2305'
Feb 15 00:09:55.374141 r-leopard-simx-74 ERR systemd-modules-load[269]: Failed to find module 'ads1015'
To resolve this we can patch hw-mgmt to no longer attempt to load these modules by default.

- How I did it
Added a SONiC patch to Mellanox hw-mgmt in order to remove the unused kernel modules which were not upstreamed to sonic-linux-kernel

- How to verify it
Boot switch and verify there are no error logs regarding kernel modules failing to load.
2022-03-07 09:20:58 -08:00
Judy Joseph
04b3af36c6 Update sonic-swss
commit 6e5ed1c0c1920e429fa07cded6867dfcb098a705 [chassis][syncd][sai] Adjusting response timeout during syncd init (#2159)
2022-03-01 09:13:42 -08:00
Junchao-Mellanox
cabcb78024 Stop PMON before swss during warm reboot (#10046)
- Why I did it
Stopping swss and syncd causes some driver module unloading. Those driver modules are depended by PMON. This could trigger ERROR logs in syslog.

- How I did it
Adjust warmboot shutdown order in make file

- How to verify it
Manual test
2022-02-27 20:45:51 -08:00
saksarav-nokia
78371d99a1 [Nokia][platform]Modify BCM config & platform_reboot for Nokia-IXR7250E-36x400G (#9990)
Signed-off-by: Sakthivadivu Saravanaraj <sakthivadivu.saravanaraj@nokia.com>
2022-02-27 20:45:48 -08:00
Junchao-Mellanox
ba4da1597a [Mellanox] Fix issue: thermal zone threshold value 0 causes fan speed stuck at 100% (#10057)
- Why I did it
In SONiC thermal control algorithm, it compares thermal zone temperature with thermal zone threshold. Previously, a thermal zone with no thermal sensor can still get its threshold. However, a recently driver patch changes this behavior: a thermal zone with no thermal sensor will return 0 for threshold. We need to ignore such thermal zone.

- How I did it
Ignore thermal zones whose temperature is 0.

- How to verify it
Added unit test case and Manual test
2022-02-27 20:45:45 -08:00
Junchao-Mellanox
00940941b2 [system-health] Fix file handle leak (#10059)
- Why I did it
swsscommon.ConfigDBConnector does not automatically close connection when the instance is recycled by python. So, it should not create this instance each time calling check_services. It will cause error like Failed to read from file /var/run/hw-management/led/led_status_capability - OSError(24, 'Too many open files')

- How I did it
Only connect DB once in init

- How to verify it
Manual test
2022-02-27 20:45:41 -08:00
arlakshm
b58a42a8d4 [chassis][bgp] create v4 and v6 peer group for VoQ internal neighbors (#9693)
Why I did it
In the recent minigraph changes we add separate BGP session configuration for V4 and V6 internal VoQ neighbors.
This PR is adding different Peer groups for V4 and V6 neighbors

How I did it
Add VOQ_CHASSIS_V4_PEER and VOQ_CHASSIS_V6_PEER groups
Add extra Unit tests

How to verify it

Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>
2022-02-27 20:45:37 -08:00
Judy Joseph
c6774ad683 update sonic-sairedis
5331ecd [vslib]: Fix MACsec bug in SCI and XPN (#1003)
ac04509 Fix build issues on gcc-10 (#999)
1b8ce97 [pipeline] Download swss common artifact in a separated directory (#995)
7a2e096 Change sonic-buildimage.vs artifact source from CI build to official build. (#992)
d5866a3 [vslib]: fix create MACsec SA error (#986)
f36f7ce Added Support for enum query capability of Nexthop Group Type. (#989)
323b89b Support for MACsec statistics (#892)
26a8a12 Prevent other notification event storms to keep enqueue unchecked and drained all memory that leads to crashing the switch router (#968)
0cb253a Fix object availability conversion (#974)
2022-02-27 20:41:15 -08:00
Saikrishna Arcot
5b7e55dfac
Package debugging and hardening for dhcpmon and dhcp6relay (#9862) (#10063)
Enable dbgsym package for dhcpmon.

Allow CFLAGS and LDFLAGS from environment variables to be used
in the dhcp6relay build. This makes sure that the -O2 flag from
dpkg-buildflags gets used.

Finally, enable all hardening flags in dpkg-buildflags for
dhcp6relay and dhcpmon. The change from the default set of flags is that
during linking, immediate binding of symbols is done instead of lazy
binding.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
2022-02-25 11:09:15 -08:00
xumia
5d19c74f1f [Security]: Upgrade urllib3 to fix CVE-2021-33503
See https://security.archlinux.org/CVE-2021-33503
2022-02-25 09:15:08 +00:00
Judy Joseph
38967a4ebd Update sonic-swss, sonic-utilites submodules
sonic-swss

1aa40f7 Remove port serdes object before removing port (#2152)
876d690 [doc] Updating Policer config in Configuration manual (#2144)

sonic-utilities
dfed952 show_platfom_info not run for simx (#2042)
71fdee7 [aclshow] fix aclshow when clear is called before counters are populated (#2037)
a48a027 [sonic-package-manager] implement blocking feature state change (#2035)
c51871d [ci] Fix python dependencies reference path. (#2060)
2022-02-21 23:53:01 -08:00
kellyyeh
5d4031918b [radv] Support multiple ipv6 prefixes per vlan interface (#9934)
Why I did it
Radvd.conf.j2 template creates two copies of the vlan interface when there are more than one ipv6 address assigned to a single vlan interface. Changed the format to add prefixes under the same vlan interface block.

How I did it
Modifies radvd.conf.j2 and added unit tests

How to verify it
Configure multiple ipv6 address to the same vlan, start radvd
Unit test will check if radvd.conf with multiple ipv6 addresses is formed correctly
2022-02-21 23:48:22 -08:00
Stepan Blyshchak
e5e1b70966 [rsyslog.j2] fix typo in VAR_LOG_SIZE_KB (#9954)
This issue causes negative threshold value and thus deleting log files even when there is enough space.

This issue causes negative threshold value and thus deleting log files even when there is enough space.

- Why I did it
To fix an issue when log files get deleted even if there is enough space.

- How I did it
Fixed an typo.

- How to verify it
Run the portion of the script that calculates threshold, see that the threshold is calculated correctly.

Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>
2022-02-21 23:48:19 -08:00
kellyyeh
a0bb7a0485 [dhcp_relay] Check payload size to prevent buffer overflow in dhcpv6 option (#9740) 2022-02-21 23:48:15 -08:00
Ying Xie
53a8d99baa [dhcp6relay] a couple memory access protections (#9851)
Why I did it
the strcpy and buffer allocation is not safe, it corrupts 1 byte on the stack. Depending on the memory layout, it may or may not cause issue immediately.
message type is not validated before updating the counter. Which could cause segment fault.

How I did it
Remove the unsafe strcpy, use config->interface.c_str() instead.
Check message type before updating counters.

How to verify it
The issue (1) caused segment fault on a specific platform. The fix was validated there. Issue (2) was precautionary. Added log in case it triggers.
2022-02-21 23:48:06 -08:00
kellyyeh
4df0aa348c [dhcp6relay] Support relaying Relay-Forward message (#9887) 2022-02-21 23:48:02 -08:00
Dror Prital
a6eb72cbf1 [Mellanox] Upgrade ASIC FW tool to 4.18.1-16 (#9981)
- Why I did it
Update MFT to version 4.18.1-16 for bugs fixes and new SN2201 support

- How I did it
Advance to MFT tool version to 4.18.1-16

- How to verify it
Manually tested on all Mellanox platforms (ASIC FW Upgrade, link debug tools, CPLD upgrade, etc.)
2022-02-21 23:47:58 -08:00
Alexander Allen
1f566bf98e [pmon] Fix chassis_db_init exit not being expected (#9858)
- Why I did it
Error log was shown on switches during boot
pmon#supervisord 2021-12-22 04:27:16,709 INFO exited: chassis_db_init (exit status 0; not expected)

- How I did it
Add exit code zero as an expected exit code and also disable autorestart.

- How to verify it
Boot the switch and ensure the above log line does not appear.
2022-02-21 23:47:54 -08:00
Sudharsan Dhamal Gopalarathnam
48bf7526fe [yang]YANG model for policer table (#9948)
#### Why I did it
Added yang model for policer table
Fixes https://github.com/Azure/sonic-buildimage/issues/9742 and https://github.com/Azure/sonic-buildimage/issues/9743
#### How I did it
Creating yang model for policer

#### How to verify it
Added UT to verify the yang model

The configuration schema for policer is added in the pull request https://github.com/Azure/sonic-swss/pull/2144
2022-02-21 23:47:50 -08:00
xumia
dce0e7270c [Build]: Fix marvell sai package version parsing issue
Fix marvell sai package version parsing issue (#10009)
2022-02-19 04:25:08 +00:00
Sudharsan Dhamal Gopalarathnam
da5899b748
[yang] Adding not-provisioned to type field in DEVICE_METADATA table (#9951) (#9993)
Cherry-pick of https://github.com/Azure/sonic-buildimage/pull/9951 into 202111 branch
#### Why I did it
Fixing the issue https://github.com/Azure/sonic-buildimage/issues/9915


#### How I did it
Added 'not-provisioned' as a supported value for type field in DEVICE_METADATA type. This value is set during initial ZTP bring up

#### How to verify it
Added UT to verify it.
2022-02-16 15:20:39 -08:00
Judy Joseph
063256f7b3 Merge branch '202111' of https://github.com/Azure/sonic-buildimage into 202111 2022-02-15 11:34:59 -08:00
Judy Joseph
e6adf71c20 Update sonic-platform-daemons
cb3ddf5 [pmon][xcvrd] xcvrd process show backtrace on the internal port. Port PR233 (#236)
5b4c9e1 Fix python wheels path downloaded from vs official build. (#244)
2022-02-15 11:34:30 -08:00
Dror Prital
30b69ed621
[202111] Add sonic_release file in order to have "release 202111" on sonic_version.yml (#9985)
- Why I did it
Fix Issue 9972: Incorrect information about release version in sonic_version.yml

- How I did it
Add "sonic_release" file to /sonic-buildimage/files/image_config/

- How to verify it
Install the image and run: cat /etc/sonic/sonic_version.yml
Verify the following item on sonic_version.yml file: release: '202111'
2022-02-15 10:46:42 +02:00
byu343
9b5bb887c8 Support multi-asic on macsec container (#9921)
This change enables the support of running multiple macsec containers, each for one ASIC.
2022-02-13 22:46:33 -08:00
Judy Joseph
472f2ff772 Update sonic-platform-common submodule 2022-02-13 18:10:19 -08:00
Judy Joseph
97c2b2ff4b Update sonic-swss submodule 2022-02-13 18:09:30 -08:00
Junchao-Mellanox
b37470d45e [Mellanox] Fix issue: SN4600C has 4 CPU core temperature sensors (#9930)
- Why I did it
platform.json of 4600C only has 2 CPU core thermal sensors, but there are 4 actually

- How I did it
Added thermal sensors for CPU core 2 and core 3.

- How to verify it
Build.
2022-02-13 18:04:40 -08:00
Kebo Liu
8c5db31627 fix MSN4410 chassis name in platform_components.json (#9939)
- Why I did it
The chassis name in MSN4410 platform_components.json is not correct

- How I did it
Fix the chassis name

- How to verify it
Run relevant platform API test

Signed-off-by: Kebo Liu <kebol@nvidia.com>
2022-02-13 18:04:37 -08:00
Marty Y. Lok
af300f5e04 Update Nokia sonic-platform submodule (#9963)
ded0344 Return both 'vendor_rev' and 'hardware_rev' keys from get_transceiver_info to support both earlier and later versions of xcvrd
b3442cc Change log_error to log_info when transceiver module is transitioned
3d3a73c Fix problem introduced with new SFP caching paradigm
2915746 Add script to reboot all IMMs
17ad221 Fix the position_in_parent for the psu entity
207e731 No longer force read pages at SFP init time
b387921 Fixed the voltage and power with rounding to 2 digit decimals

Signed-off-by: mlok <marty.lok@nokia.com>
2022-02-13 18:04:32 -08:00
saksarav-nokia
977cb98479 [Nokia][platform] Modified the bcm config file for Nokia-IXR7250E-36x400G (#9622)
Why I did it
Updated the BCM config recommended by Broadcom for Nokia-IXR7250E-36x400G

How I did it
Updated the BCM config file

How to verify it
Verified running the image with this BCM config in Nokia-IXR7250E-36x400G and ensured that the syncd container was stable, ports were up and passing the traffic.

Signed-off-by: Sakthivadivu Saravanaraj <sakthivadivu.saravanaraj@nokia.com>
2022-02-13 18:04:28 -08:00
xumia
bf74a68cb9 [Build]: Fix hundreds of thousands lines of logs printed in marvell-armhf (#9980)
[Build]: Fix hundreds of thousands lines of logs printed in marvell-armhf
It is caused by the bad format of the marvell sai package mrvllibsai_armhf_1.7.1-6.deb, increasing the waiting time to reduce the logs, and reduce the waste of the CPU.
2022-02-13 18:04:23 -08:00
Andriy Yurkiv
b29a3a0d4c [Mellanox][vxlan] remove old static config for VXLAN src port range feature (#9956)
- Why I did it
Need to remove old static configs from sai.profile files.
New implementation: Azure/sonic-swss#1959
New configuration: #9658

- How I did it
Remove SAI_VXLAN_SRCPORT_RANGE_ENABLE=1 lines from files per HWSKU

- How to verify it
When static config is removed following test will fail (src port will be in range 0-255)

py.test vxlan/test_vnet_vxlan.py --inventory "../ansible/inventory, ../ansible/veos" --host-pattern (testbed)-t0 --module-path ../ansible/library/ --testbed (testbed)-t0 --testbed_file ../ansible/testbed.csv --allow_recover --assert plain --log-cli-level info --show-capture=no -ra --showlocals --disable_loganalyzer --skip_sanity --upper_bound_udp_port 65535 --lower_bound_udp_port 64128
2022-02-13 18:04:19 -08:00
Stephen Sun
192409a7db Fix issue: module id got from get_change_event is wrong (#9961)
Signed-off-by: Stephen Sun <stephens@nvidia.com>
2022-02-13 18:04:13 -08:00
Travis Van Duyn
05791708f1 updated jinja template for snmp contact python2 vs python3 issue (#9949) 2022-02-13 18:04:09 -08:00
Andriy Yurkiv
84a986bc27 [Mellanox][VXLAN] add params to vxlan.json file in order to configure VXLAN src port range feature (#9658)
- Why I did it
Remove obsolete parameter that enables static VXLAN src port range
provide functionality no generate json config file according to appropriate parameter in config_db
Done for
SN3800:
• Mellanox-SN3800-D28C50
• Mellanox-SN3800-C64
• Mellanox-SN3800-D28C49S1 (New 10G SKU)

SN2700:
• Mellanox-SN2700-D48C8

- How I did it
Remove SAI_VXLAN_SRCPORT_RANGE_ENABLE=1 from appropriate sai.profile files
Created vxlan.json file and added few params that depends on DEVICE_METADATA.localhost.vxlan_port_range

- How to verify it
File /etc/swss/config.d/vxlan.json should be generated inside swss docker when it restart
[
    {
        "SWITCH_TABLE:switch": {
            "vxlan_src": "0xFF00",
            "vxlan_mask": "8"
        },
        "OP": "SET"
    }
]
Signed-off-by: Andriy Yurkiv <ayurkiv@nvidia.com>
2022-02-13 18:01:06 -08:00
Alexander Allen
825fc9e2dd [Mellanox] Update mellanox hw-mgmt submodule and versions to V.7.0020.1300 (#9860)
- Why I did it
New version of mellanox platform management code available adding support for new platforms and fixing bugs.

- How I did it
1. Updated the submodule
2. Updated makefile version references
3. Regenerated SONiC patches
2022-02-13 18:01:02 -08:00
saksarav-nokia
86bd652009 [Nokia][platform] Modified Nokia device data to support midplane (#9914)
Added midplane_subnet in chassisdb.conf for interfaces-config.sh to create midplane interface in multi-asic namespaces.

Signed-off-by: Sakthivadivu Saravanaraj <sakthivadivu.saravanaraj@nokia.com>
2022-02-13 18:00:58 -08:00
Prince George
fa0de261aa Close console session due to user inactivity (#9890)
Signed-off-by: Prince George <prgeor@microsoft.com>
2022-02-13 18:00:54 -08:00
abdosi
c4254b53f0 Updated Internal BGP Templates for chassis packet (#9674)
Fixes: https://github.com/Azure/sonic-buildimage/issues/9610
2022-02-13 18:00:50 -08:00
Ashok Daparthi-Dell
741d047982 [yang] Fix for sonic-scheduler.yang name pattern (#9873)
#### Why I did it

PR9611 - sonic-scheduler.yang pattern issue

#### How I did it
Modified the scheduler name pattern string to accept any string 

#### How to verify it

Sonic yang tests
2022-02-13 18:00:43 -08:00