sonic-buildimage

Archived

Author	SHA1	Message	Date
Samuel Angebault	5419e5de71	Add platform.json configs for all denali SKUs (#9717 )	2022-03-01 20:17:56 +00:00
Junchao-Mellanox	bc9b39a2fe	Stop PMON before swss during warm reboot (#10046 ) - Why I did it Stopping swss and syncd causes some driver module unloading. Those driver modules are depended by PMON. This could trigger ERROR logs in syslog. - How I did it Adjust warmboot shutdown order in make file - How to verify it Manual test	2022-03-01 20:13:09 +00:00
Junchao-Mellanox	74c49a7682	[system-health] Fix file handle leak (#10059 ) - Why I did it swsscommon.ConfigDBConnector does not automatically close connection when the instance is recycled by python. So, it should not create this instance each time calling check_services. It will cause error like Failed to read from file /var/run/hw-management/led/led_status_capability - OSError(24, 'Too many open files') - How I did it Only connect DB once in init - How to verify it Manual test	2022-03-01 20:12:03 +00:00
Alexander Allen	212cdfbe80	[pmon] Fix chassis_db_init exit not being expected (#9858 ) - Why I did it Error log was shown on switches during boot pmon#supervisord 2021-12-22 04:27:16,709 INFO exited: chassis_db_init (exit status 0; not expected) - How I did it Add exit code zero as an expected exit code and also disable autorestart. - How to verify it Boot the switch and ensure the above log line does not appear.	2022-03-01 03:49:55 +00:00
Alexander Allen	127a93c201	[Mellanox] Add 2x40G support to MSN4700 platform (#9485 ) - Why I did it MSN4700 platform has 8 lanes per port and thus can support 2x40G with each lane running at 10G - How I did it Added 40G to 2x200G breakout mode in platform.json - How to verify it Run config int break Ethernet0 2x40G[200G,100G,50G,25G,10G,1G] And verify the command runs successfully and the port speed was set to 40G with a 2x breakout.	2022-03-01 03:48:25 +00:00
Stephen Sun	84942c10d6	Fix typo and missing files in SN3800 and SN4600C's buffer templates (#9537 ) Why I did it Fix typo and missing files in SN3800 and SN4600C's buffer templates How I did it ingress_lossless_xoff_size => ingress_lossless_pool_xoff add missing files for SN4600C-D100C12S2 How to verify it Deploy the fix and verify whether the device can be up. Signed-off-by: Stephen Sun <stephens@nvidia.com>	2022-03-01 03:42:27 +00:00
noaOrMlnx	be7f31e6d2	[CoPP] Add always_enabled field (#9302 ) *Add the "always_enabled" field to copp_cfg.j2 file, in order to allow traps without an entry in features table, to be installed automatically.	2022-03-01 03:40:40 +00:00
Mahesh Maddikayala	87d18eb6c3	[libsaibcm] Fix the expiry date on the libsaibcm debian packages. (#10090 )	2022-02-28 08:52:11 -08:00
Shilong Liu	d05765cc06	[build] Increase vs platform kvm disk size (#10001 ) Info: Attempting file://dev/vdb/onie-installer ... Info: Attempting file://dev/vdb/onie-installer.bin ... cp: write error: No space left on device Failure: local_fs_run():/dev/vdb Unable to copy /tmp/tmp.CPY0ad/onie-installer.bin to tmpfs vs image is failing. Increase kvm device space.	2022-02-25 10:26:46 -08:00
xumia	a3733384bf	[Security]: Upgrade urllib3 to fix CVE-2021-33503 See https://security.archlinux.org/CVE-2021-33503	2022-02-25 09:13:48 +00:00
Mahesh Maddikayala	2153ea41a5	[libsaibcm] Update libsaibcm with PFC patches (#10066 )	2022-02-23 14:24:10 -08:00
xumia	93a22d7df5	[Build]: Fix marvell sai package version parsing issue Fix marvell sai package version parsing issue (#10009)	2022-02-19 04:23:29 +00:00
Nazarii Hnydyn	293ca0b870	[build]: Fix SAE to support debug flavor build. (#9435 ) setup PACKAGE_NAME, PATH, MACHINE, VERSION info for debug docker Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>	2022-02-09 14:20:22 +08:00
Saikrishna Arcot	aa69f6b94c	[docker-mgmt-framework]: Don't overwrite /etc/passwd and /etc/group with symlinks (#9375 ) Fixes #9376 Because /etc/passwd and /etc/group have been overwritten with symlinks to /host_etc/passwd and /host_etc/group, the debug container build fails. This is because the debug container is built without /etc being mounted at /host_etc in the container (which does happen at runtime). Because of that, /etc/passwd and /etc/group don't exist, which causes some package installation errors when openssh-client tries to create a group. This is a partial revert of `1347f29178`. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>	2022-02-08 13:14:41 +08:00
Shilong Liu	0e6eb338eb	Fix rules/functions.generage_manifest. (#9340 ) Why I did it Fix a bug in sonic debug image build. That bug is imported in the following PR: #8920	2022-02-07 14:09:58 +08:00
Volodymyr Samotiy	97bd2bf82f	[Mellanox][202106] Update SAI to 1.20.2 and SDK/FW to 4.5.1208/2010.1218 (#9618 ) - Why I did it To include latest fixes. 1. On CMIS modules, after low power configuration, the firmware waited for the module state to be ModuleReady instead of ModuleLowPower causing delays. 2. When connecting SN4600C, 100GbE port with CWDM4 module (Gen 3.0), link up time is 30 seconds. - How I did it Updated SDK/SAI submodule and relevant makefiles with the required versions. - How to verify it Build an image and run tests from "sonic-mgmt". Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>	2022-01-26 10:59:39 +02:00
Shilong Liu	f5bef56d5e	[CI] Fix Azure pipeline set -e not work. (#9282 ) In azure pipeline template 'set -e' not works as expected.	2022-01-24 15:26:06 +08:00
Junchao-Mellanox	c0f0694236	[Mellanox] [202106] Optimize thermal policies (#9451 ) - Why I did it Optimize thermal control policies to simplify the logic and add more protection code in policies to make sure it works even if kernel algorithm does not work. - How I did it Reduce unused thermal policies Add timely ASIC temperature check in thermal policy to make sure ASIC temperature and fan speed is coordinated Minimum allowed fan speed now is calculated by max of the expected fan speed among all policies Move some logic from fan.py to thermal.py to make it more readable - How to verify it 1. Manual test 2. Regression	2022-01-19 11:43:55 +02:00
arlakshm	8ef7030f4f	remove staticd.conf.j2 (#9182 ) Why I did it resolves #8979 and #9055 How I did it Remove the file static.conf.j2,which adds the default route on eth0 from bgp docker Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>	2022-01-19 02:09:31 +00:00
Stephen Sun	c0df856847	Update sonic-swss-common (#9787 ) d00a25bb [ci] refer 202106 branch resources rather than master branch. (#573) Signed-off-by: Stephen Sun <stephens@nvidia.com>	2022-01-18 22:11:20 +02:00
Stepan Blyshchak	559ce14277	[nvidia] fail the build when hw-mgmt patches do not apply (#9565 ) - Why I did it To fix an issue that hw-mgmt patches were not applied. One patch was already in upstream hw-mgmt package thus applying it again caused an error and no other patches were applied. Also, I did it to improve the Makefile, so that the make will fail in case patches fail to apply. - How I did it Removed obsolete patch, made applying patches a hard failure in the build. - How to verify it Run the make and verify patches are applied. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2022-01-16 08:10:25 +02:00
Qi Luo	c4bc9933fe	Update sonic-swss submodule (#9609 ) Includes below commits ``` 38b616ba 2021-12-09 \| [LGTM] lgtm to use 202106 branches of swss-common and sairedis (#2074) [Stephen Sun] c68bad07 2021-12-07 \| Fix random failure in PR/CI build. #2006 [Shilong Liu] ba17675b 2021-08-23 \| [ci]: fix artifacts download from swss-common and sairedis (#1882) [Guohan Lu] ```	2021-12-30 17:23:38 -08:00
Judy Joseph	1aa225cd0c	Update sonic-utilities submodule 74d2a09 [portstat] check TX/RX utilization calculation correctness (#1840)	2021-12-22 09:09:09 -08:00
Stephen Sun	b479bcd941	[Mellanox] Adjust buffer parameters with 2km cable supported for 4600C non-generic SKUs (#9215 ) - Why I did it Also recalculated all parameters with the latest algorithm with per-speed peer response time taken into account - How I did it Detailed information of each SKU: C64: t0: 32 100G downlinks and 32 100G uplinks t1: 56 100G downlinks and 8 100G uplinks with 2km-cable supported D112C8: 112 50G downlinks and 8 100G uplinks. D48C40: 48 50G downlinks, 32 100G downlinks, and 8 100G uplinks D100C12S2: 4 100G downlinks, 2 10G downlinks, 100 50G downlinks, and 8 100G uplinks 2km cable is supported for C64 on t1 only - How to verify it Run regression test (QoS) Signed-off-by: Stephen Sun <stephens@nvidia.com>	2021-12-22 09:06:33 -08:00
Stepan Blyshchak	4ebafdaf28	[Mellanox][SDK] Build SDK with PRM sniffer support (#9500 ) - Why I did it To have an ability to use PRM sniffer. - How I did it Enabled the option in configure flags. - How to verify it Built and ran on switch. Enabled the feature in runtime and checked the sniffer recording. Signed-off-by: Stepan Blyschak <stepanb@nvidia.com>	2021-12-22 09:05:49 -08:00
Vadym Hlushko	79d2a9a225	[Mellanox] [SN4410] [202106] Fixed capability files - port_config.ini, hwsku.json, platform.json (#9541 ) - Why I did it The capability files were incorrect in comparison to the marketing spec of the SN4410 platform. - How I did it Aligned the capability files according to the marketing spec. - How to verify it Did basic manual sanity checks: 1. Check if critical docker containers were UP 2. Check if interfaces were created and were UP 3. Check if interfaces created in the syncd docker container by executing – sx_api_ports_dump.py script 4. Check the logs from the start of the switch – everything was OK 5.Verified the port breakout Signed-off-by: Vadym Hlushko <vadymh@nvidia.com>	2021-12-16 17:39:17 +02:00
Junchao-Mellanox	73a17d8876	[Mellanox] [202106] Fix issue: SFP might not be re-initialized after cable plug in (#9387 ) - Why I did it Fix issue: SFP might not be re-initialized after cable plug in. There could be case like: 1. The SFP object was initialized as QSFP type 2. User use adapter to connect a SFP to a QSFP port 3. The SFP object treat the SFP as QSFP and failed to process EEPROM - How I did it If a new SFP is plugged, always re-initialize the SFP - How to verify it Added unit test case	2021-12-14 16:27:26 +02:00
Junchao-Mellanox	3ae3eeda35	[Mellanox] Allow user to set LED to orange (#9259 ) (#9515 ) Backport #9259 to 202106 - Why I did it Nvidia platform API does not support set LED to orange. - How I did it Allow user to set LED to orange - How to verify it Manual test and unit testing	2021-12-14 13:33:23 +02:00
Stephen Sun	5c91f233ef	[Reclaim buffer][202106] Reclaim unused buffers by applying zero buffer profiles (#9062 ) This is to backport community PR #8768 to 202106 branch Why I did it Support zero buffer profiles Add buffer profiles and pool definition for zero buffer profiles Support applying zero profiles on INACTIVE PORTS Enable dynamic buffer manager to load zero pools and profiles from a JSON file Signed-off-by: Stephen Sun stephens@nvidia.com How I did it Add buffer profiles and pool definition for zero buffer profiles If the buffer model is static: - Apply normal buffer profiles to admin-up ports - Apply zero buffer profiles to admin-down ports If the buffer model is dynamic: - Apply normal buffer profiles to all ports - buffer manager will take care when a port is shut down - Update buffers_config.j2 to support INACTIVE PORTS by extending the existing macros to generate the various buffer objects, including PGs, queues, ingress/egress profile lists Originally, all the macros to generate the above buffer objects took active ports only as an argument Now that buffer items need to be generated on inactive ports as well, an extra argument representing the inactive ports need to be added To be backward compatible, a new series of macros are introduced to take both active and inactive ports as arguments The original version (with active ports only) will be checked first. If it is not defined, then the extended version will be called Only vendors who support zero profiles need to change their buffer templates Enable buffer manager to load zero pools and profiles from a JSON file: The JSON file is provided on a per-platform basis It is copied from platform/<vendor> folder to /usr/share/sonic/temlates folder in compiling time and rendered when the swss container is being created. To make code clean and reduce redundant code, extract common macros from buffer_defaults_t{0,1}.j2 of all SKUs to two common files: One in Mellanox-SN2700-D48C8 for single ingress pool mode The other in ACS-MSN2700 for double ingress pool mode Those files of all other SKUs will be symbol link to the above files Update sonic-cfggen test accordingly: - Adjust example output file of JSON template for unit test - Add unit test in for Mellanox's new buffer templates. How to verify it Regression test. Unit test in sonic-cfggen Run regression test and manually test.	2021-12-13 10:51:50 -08:00
Samuel Angebault	08c2c07fc0	[202106][Arista] Update arista platform library (#9483 )	2021-12-09 18:29:59 -08:00
Arvindsrinivasan Lakshmi Narasimhan	e2b8e2d1da	submodule update swss Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>	2021-12-08 00:25:09 +00:00
Vivek Reddy	4856f98716	[Mellanox] [SKU] Fix the shared headroom for 4600C-C64 SKU (#8242 ) Removed ingress_lossy_pool from the BUFFER_POOL list Fx the the egress_lossless_pool_size value Signed-off-by: Vivek Reddy Karri <vkarri@nvidia.com>	2021-12-08 00:23:07 +00:00
Volodymyr Samotiy	b22db5a52b	[Mellanox] [202106] Update SAI to v1.20.0.1 and SDK/FW to v4.5.1156/v2010.1152 (#9431 ) - Why I did it To include latest fixes. SAI * Reduce verbosity of warning message on shared memory already existing * accuflow allocation support by key value SDK * Under various circumstances, Ethernet ports falsely showed that InfiniBand cables were connected. * In SN4600C, at times, the link up time in both DAC and optics cables may, in the worst case, take up to 15 seconds. * Using SN4600C with copper or optics loopback cables in NRZ speeds, link may raise in long link up times * When ECMP has high amount of next-hops based on VLAN interfaces, in some rare cases, packets will get a wrong VLAN tag and will be dropped. * When connecting Spectrum devices with optical transceivers that support RXLOS, remote side port down might cause the switch firmware to get stuck and cause unexpected switch behavior. * Aggregation event is missing for WJH L2 drop reason 'Unicast egress port list is empty'. * Tying the SCL and SDA of the optical modules to 3.3V causes errors. * On SN4600, there was a delay of more than 10 seconds from the time a data packet is sent from CPU until it is transmitted through one of the switch ports. * While using SN4600C system with Finisar FTLC1157RGPL 100GbE CWDM4 modules, intermittent link flaps across multiple ports may be observed. * In Spectrum-2 and Spectrum-3 systems, link did not work in auto-negotiation when connected to Marvell PHY. KR mechanism has been enhanced to integrate with Marvell PHY. * The tunnel counter counts the drop packets now for Spectrum-2 and Spectrum-3 and consistent with Spectrum behavior and count the ECN dropped packets as well. * When connecting SN3800 to Cisco-9000, fast-linkup flow will fail and will rise in the normal flow. * Race condition in WJH library: when multiple threads load the LAG shared memory concurrently, the program may crash. * Add WJH L2 drop reason 'Unicast egress port list is empty' as a new drop reason. * Fixed a memory leak in sx_api_port_sflow_statistics_get API. * During initialization flow, the command interface that is used by the minimal driver and SDK caused the collision in the firmware since the same buffer is used in the firmware for the two interfaces. - How I did it Updated SDK/SAI submodule and relevant makefiles with the required versions. - How to verify it Build an image and run tests from "sonic-mgmt". Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>	2021-12-06 21:54:14 +02:00
Mahesh Maddikayala	3f3dceb96a	[broadcom]: update bcm dnx gpl module pointer (#9442 ) saibcm_modules_dnx submodule update contains a fix for kernel crash	2021-12-03 21:21:07 -08:00
Judy Joseph	70b24ad9c5	Update sonic-utilities submodule 9514857 [config reload][202106] Update command reference (#1944)	2021-12-01 19:21:45 -08:00
Judy Joseph	41a2d3e290	Update sonic-swss submodule [8522f4f] Don't handle buffer pool watermark during warm reboot reconciling (#1987)	2021-12-01 11:13:38 -08:00
Junchao-Mellanox	f5c847bdbc	[system-health] [202106] No longer check critical process/service status via monit (#9366 )	2021-12-01 10:26:06 -08:00
Junchao-Mellanox	e4ff4d2e3a	[Mellanox] Fan speed should not be 100% when PSU is powered off (#9258 ) - Why I did it When PSU is powered off, the PSU is still on the switch and the air flow is still the same. In this case, it is not necessary to set FAN speed to 100%. - How I did it When PSU is powered of, don't treat it as absent. - How to verify it Adjust existing unit test case Add new case in sonic-mgmt	2021-12-01 09:47:26 -08:00
Stephen Sun	fa0ae42e69	[Reclaim buffer] Common infrastructure update for reclaiming buffer (#9133 ) - Why I did it This is to update the common sonic-buildimage infra for reclaiming buffer. - How I did it Render zero_profiles.j2 to zero_profiles.json for vendors that support reclaiming buffer The zero profiles will be referenced in PR [Reclaim buffer] Reclaim unused buffers by applying zero buffer profiles #8768 on Mellanox platforms and there will be test cases to verify the behavior there. Rendering is done here for passing azure pipeline. Load zero_profiles.json when the dynamic buffer manager starts Generate inactive port list to reclaim buffer Signed-off-by: Stephen Sun <stephens@nvidia.com>	2021-12-01 09:47:18 -08:00
shlomibitton	1ebe52847a	[DHCPv6 relay] [202106] Fix DHCPv6 design to support multiple VLANS (#9163 ) - Why I did it If multiple Vlans are configured to have DHCPv6 relay, only one relay instance is able to capture DHCP packets received from upstream, this is as a result of kernel design to operate this way (SO_REUSEPORT). DHCPv6 transmit unicast packets to clients, only multicast packets can be captured on multiple application listening on the same UDP port. This issue causing only one Vlan interface to get packets from servers. - How I did it Change the design to neglect Vlan isolation and run only one relay instance serving all Vlans with all configured DHCP servers. - How to verify it Run DHCPv6 relay test with 2 Vlans configured do have a DHCP relay. Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>	2021-11-18 19:40:48 +02:00
gechiang	e0209f745a	[202106]Disable ALPM distributed hitbit thread that is used for debug purpose only but interfered with Other functional operations (#9293 ) This is to address an issue where it was observed that SAI operations sometime may take a very long to time complete (over 45ms). It was determined that the ALPM distributed thread was causing this issue. The fix is to disable this debug thread that has no functional purpose. Preliminary tests looks fine. BGP neighbors were all up with proper routes programmed interfaces are all up Manually ran the fib test cases on 7050CX3 (TD3), TD2, TH, TH2, and TH3 based platforms and thy all passed. Note: the testing was done over 20201230 image and are porting this change to master branch. No need to port this to 20201230 branch as a separate PR was already done for that branch. (#9190) this PR is created to port the changes made by (#9199) but could not be cherry picked directly to 202106 branch.	2021-11-17 20:58:25 -08:00
Arvindsrinivasan Lakshmi Narasimhan	84226bdc57	sonic-utilities submodule update Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>	2021-11-18 01:27:29 +00:00
Arvindsrinivasan Lakshmi Narasimhan	d4ed9e7e62	Swss submodule update Signed-off-by: Arvindsrinivasan Lakshmi Narasimhan <arlakshm@microsoft.com>	2021-11-18 00:39:15 +00:00
Judy Joseph	ff4613035e	Update sub-modules sonic-snmpagent 7e46eb1 [201911][RFC1213]: Initialize lag oid map in reinit_data (#234) aa98ded CPU Spike because of redundant and flooded keyspace notifis handled (#230) sonic-swss bc4e334 [Mux orch] Handle setting unknown mux state (#1984) bd3630b [tunnel decap] Change tunnel orch order (#1977) 87a673a Fix the option missing in kernel config issue (#1973) 57967a1 [orchagent] Fix group name of port-buffer-drop in flexcounterorch.cpp (#1967) sonic-utilities 181e8b0 Fix the option missing in kernel config issue (#1888) 21c0cc0 [watermarkstat] Fix for error in processing empty array from couters db (#1810) 7f15755 [chassis][supervisor][show][interfaces]show interfaces command warning on Supervisor card (#1771)	2021-11-14 15:57:36 -08:00
zzhiyuan	2b3cca6e86	[Arista] Fix 7060 flex HWSKU SFP ports and Ethernet8/1 (#9173 ) * [Arista] Fix 7060 flex HWSKU SFP ports and Ethernet8/1 * [Arista] Fix polarity flips for Arista 7060 on non-leading intfs Co-authored-by: Zhi Yuan (Carl) Zhao <zyzhao@arista.com>	2021-11-14 15:25:39 -08:00
Shilong Liu	606d64378f	Add artifacts for failure build to debug. (#9213 )	2021-11-14 15:25:03 -08:00
tjchadaga	7050792f63	Fix for additional intf flap during fast-reboot (#9166 )	2021-11-14 15:19:43 -08:00
Neetha John	f6511086e5	[minigraph] Add tagged vlan member support for storage backend (#9045 ) Signed-off-by: Neetha John <nejo@microsoft.com> Why I did it Storage T0's have all vlan members as tagged How I did it Since currently minigraph does not have a unique way to identify if a vlan member is tagged/untagged and to ensure other scenarios are not broken, the logic used is to just update the vlan member type as 'tagged' when we determine that it is a storage backend device. This change will apply only to storage backend T0's since storage backend T1's will not have vlan member information How to verify it Updated the storage backend T0 testcases to check for tagged vlan members Added testcase to check if a T1 and backend T1 device generates an empty vlan member table Existing vlan member testcases are good enough for checking if any regression has been caused for regular T0's Build sonic_config_engine-1.0-py3-none-any.whl successfully	2021-11-14 15:17:02 -08:00
dflynn-Nokia	33fce6afd1	[Nokia ixs7215] Platform API fixes (#9025 ) * [Nokia ixs7215] Platform API fixes This commit delivers the following fixes - Fix bug preventing access to second PSU eeprom - Fix bug preventing updates to front panel PSU status led - Fix SFP reset test case failure * Fix LGTM alert	2021-11-14 15:16:14 -08:00
dflynn-Nokia	030551ba27	[Nokia ixs7215] Add new platform capabilities to platform.json (#9032 ) This commit more fully declares the HW capabilities of the Nokia-7215 platform. For example, support for the threshold values associated with each thermal sensor is described. The intent here is to inform the sonic-mgmt platform test cases of which HW features are supported. This commit must align with PR# 4521 within the sonic-mgmt git repo which is currently under review. Any changes to that PR will need to be reflected in this commit.	2021-11-14 15:15:56 -08:00

1 2 3 4 5 ...

5205 Commits