Why I did it
Add two platforms that support the S3IP framework
How I did it
Add two platforms supporting S3IP SYSFS (TCS8400, TCS9400)
How to verify it
Manual test
Adding platform support for FS s5800-48t4s and s5800-48t8s-mars8p.
Both s5800-48t4s and s5800-48t8s-mars8p have 48 * 10/100/1000 Base-T ports, 4 * 10GE SFP+ Ports on Centec TsingMa.
s5800-48t4s is different from s5800-48t8s-mars8p in that:
The phy chip used by s5800-48t4s is Marvell 88e1680;
The phy chip used by s5800-48t8s-mars8p is Centec ctc21108;
Why I did it
Fixes #12634
Observing the following error while running 'sfputil show lpmode' command.
AttributeError: 'Sfp' object has no attribute 'get_power_set'
Root Cause: get_power_set() is defined for QSFP28 and QSFP+, i.e. Sff8636 and Sff8436. However, the function is not defined in the optoe_base class.
How I did it
To use get_power_set(), we need to initialise the 'api' via get_xcvr_api() and then use it to run get_power_set().
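A minimal sketch of the call pattern described above. `FakeXcvrApi` and `FakeSfp` are hypothetical stand-ins for the real platform classes; only `get_xcvr_api()` and `get_power_set()` are names taken from this description.

```python
class FakeXcvrApi:
    """Hypothetical stand-in for a transceiver API object (e.g. Sff8636)."""

    def __init__(self, power_set):
        self._power_set = power_set

    def get_power_set(self):
        return self._power_set


class FakeSfp:
    """Hypothetical stand-in for the platform Sfp object."""

    def __init__(self, api):
        self._api = api

    def get_xcvr_api(self):
        # Assumed to return None when the module type cannot be identified.
        return self._api


def read_power_set(sfp):
    """Ask the transceiver API, not the Sfp object itself, for the power-set bit."""
    api = sfp.get_xcvr_api()
    if api is None:
        return None
    return api.get_power_set()
```

The point of the sketch is only that `get_power_set()` is reached through the object returned by `get_xcvr_api()`, so a missing API is handled in one place.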
- Consolidating multiple read functions in a PSU driver on the basis of byte, word or block read,
- Enhancing PDDF parsing script support for CPU and PCH temperature reading,
- Adding missing methods in PDDF common APIs
Why I did it
- PSU driver changes are to optimize the code and increase the code coverage
- PDDF parser script enhancements to accommodate the CPU and PCH temp reading using hwmon device path
- Some of the new APIs were missing from the PDDF common platform classes
How I did it
Added code changes and verified them on AS7816 and AS7726 platforms.
Why I did it
Provide CPLD and FPGA driver framework that complies with s3ip sysfs specification
How I did it
1. The framework module provides the register and unregister interfaces and their implementation.
2. The framework creates the sysfs nodes for you.
How to verify it
A demo driver based on this framework displays the sysfs nodes, which conform to the s3ip sysfs specification
Why I did it
Update ECMP calculator README file with new instructions how to run the calculator.
How I did it
Update README file.
How to verify it
Read README file.
docker-sonic-vs doesn't have the infra needed for the syslog rate limit
configuration, so it's not going to be rendering jinja templates to
overwrite /etc/rsyslog.conf. This also means that syslog messages would
get logged twice (because both the default /etc/rsyslog.conf file and
/etc/rsyslog.d/50-default.conf are telling it to log to syslog).
Therefore, keep the custom static /etc/rsyslog.conf file for docker-sonic-vs.
Fixes sonic-net/sonic-swss#2570.
Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
Why I did it
A demo driver based on this framework displays the sysfs nodes, which conform to the s3ip sysfs specification
How I did it
1. The demo driver calls the s3ip kernel framework interface
How to verify it
Run the demo; it displays the sysfs nodes, which conform to the s3ip sysfs specification
Why I did it
The user framework module complies with s3ip sysfs specification
How I did it
1. Create an s3ip_sysfs service
2. The s3ip_sysfs service calls "s3ip_sysfs_tool.sh" to install the kernel module and run s3ip_load.py
3. s3ip_load.py parses the s3ip_sysfs_conf.json configuration file and creates the /sys_switch/ directory
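The third step could look roughly like the sketch below. The layout of s3ip_sysfs_conf.json is not given in this description, so the `s3ip_paths`, `path`, and `value` keys, as well as both function names, are assumptions made only for illustration.

```python
import json
import os


def plan_links(conf_text, root="/"):
    """Return (link, target) pairs for every entry of the assumed config layout."""
    conf = json.loads(conf_text)
    pairs = []
    for entry in conf["s3ip_paths"]:
        # Assumed keys: "path" is the node under /sys_switch, "value" its source.
        link = os.path.join(root, entry["path"].lstrip("/"))
        pairs.append((link, entry["value"]))
    return pairs


def create_links(pairs):
    """Create the parent directories and one symlink per planned pair."""
    for link, target in pairs:
        os.makedirs(os.path.dirname(link), exist_ok=True)
        if not os.path.lexists(link):
            os.symlink(target, link)
```

Splitting planning from creation keeps the parsing step testable without touching the filesystem.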
How to verify it
A demo driver based on this framework displays the sysfs nodes, which conform to the s3ip sysfs specification
Why I did it
Provide slot and switch_rootsysfs driver framework that complies with s3ip sysfs specification
How I did it
1. The framework module provides the register and unregister interfaces and their implementation.
2. The framework creates the sysfs nodes for you.
How to verify it
A demo driver based on this framework displays the sysfs nodes, which conform to the s3ip sysfs specification
Why I did it
Provide SYSLED and watchdog driver framework that complies with s3ip sysfs specification
How I did it
1. The framework module provides the register and unregister interfaces and their implementation.
2. The framework creates the sysfs nodes for you.
How to verify it
A demo driver based on this framework displays the sysfs nodes, which conform to the s3ip sysfs specification
Why I did it
Provide a sensor driver framework that complies with s3ip sysfs specification
How I did it
1. The framework module provides the register and unregister interfaces and their implementation.
2. The framework creates the sysfs nodes for you.
How to verify it
A demo driver based on this framework displays the sysfs nodes, which conform to the s3ip sysfs specification
Why I did it
Provide a transceiver driver framework that complies with s3ip sysfs specification
How I did it
1. The framework module provides the register and unregister interfaces and their implementation.
2. The framework creates the sysfs nodes for you.
How to verify it
A demo driver based on this framework displays the sysfs nodes, which conform to the s3ip sysfs specification
Why I did it
Provide a Fan driver framework that complies with s3ip sysfs specification
How I did it
1. The framework module provides the register and unregister interfaces and their implementation.
2. The framework creates the sysfs nodes for you.
How to verify it
A demo driver based on this framework displays the sysfs nodes, which conform to the s3ip sysfs specification
Why I did it
Provide a PSU driver framework that complies with s3ip sysfs specification
How I did it
1. The framework module provides the register and unregister interfaces and their implementation.
2. The framework creates the sysfs nodes for you.
How to verify it
A demo driver based on this framework displays the sysfs nodes, which conform to the s3ip sysfs specification
Why I did it
Platform interface doesn't provide all sensors and using it isn't effective
How I did it
Request sensors via http from BMC server and parse the result
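As a rough illustration of "request and parse": the BMC endpoint and its response format are not described here, so the JSON shape (`sensors`, `name`, `value`) and both function names below are hypothetical.

```python
import json
from urllib.request import urlopen


def parse_sensors(body):
    """Turn an assumed JSON payload into a {sensor_name: reading} mapping."""
    data = json.loads(body)
    return {s["name"]: float(s["value"]) for s in data.get("sensors", [])}


def fetch_sensors(url, timeout=5):
    """Fetch the payload from the BMC over HTTP and parse it."""
    with urlopen(url, timeout=timeout) as resp:
        return parse_sensors(resp.read().decode("utf-8"))
```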
How to verify it
Related daemon in pmon populates redis db, run this command to view the contents
- Why I did it
Remove TODO comments which are no longer needed
- How I did it
Remove TODO comments which are no longer needed
- How to verify it
Only comment change
This feature caches all the deb files during docker build and stores them
into version cache.
If a cache file already exists in the version cache, it is loaded and the extracted
deb files are copied from it into the Debian cache path (/var/cache/apt/archives).
The apt install step always installs a deb file from the cache if it exists there, which
avoids unnecessary package downloads from the repo and speeds up the overall build.
The cache file is selected based on the SHA value of version dependency
files.
Why I did it
How I did it
How to verify it
* 03.Version-cache - framework environment settings
It defines and passes the necessary version cache environment variables
to the caching framework.
It adds the utils script for shared cache file access.
It also adds the post-cleanup logic for cleaning the unwanted files from
the docker/image after the version cache creation.
* 04.Version cache - debug framework
Added DBGOPT Make variable to enable the cache framework
scripts in trace mode. This option takes the part name of the script to
enable the particular shell script in trace mode.
Multiple shell script names can also be given.
Eg: make DBGOPT="image|docker"
Added verbose mode to dump the version merge details during
build/dry-run mode.
Eg: scripts/versions_manager.py freeze -v \
'dryrun|cmod=docker-swss|cfile=versions-deb|cname=all|stage=sub|stage=add'
* 05.Version cache - docker dpkg caching support
This feature caches all the deb files during docker build and stores them
into version cache.
If a cache file already exists in the version cache, it is loaded and the extracted
deb files are copied from it into the Debian cache path (/var/cache/apt/archives).
The apt install step always installs a deb file from the cache if it exists there, which
avoids unnecessary package downloads from the repo and speeds up the overall build.
The cache file is selected based on the SHA value of version dependency
files.
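The cache selection can be pictured with the sketch below. The hash variant (SHA-1 here), the file-name pattern, and the function names are assumptions; the description only states that the cache file is chosen from the SHA value of the version dependency files.

```python
import hashlib


def cache_key(dep_blobs):
    """Hash the contents of all dependency files into one hex digest."""
    digest = hashlib.sha1()
    for blob in dep_blobs:
        digest.update(blob)
    return digest.hexdigest()


def cache_file_for(paths, prefix="cache"):
    """Read the dependency files in a stable order and derive the cache file name."""
    blobs = []
    for path in sorted(paths):
        with open(path, "rb") as fh:
            blobs.append(fh.read())
    return "{}-{}.tgz".format(prefix, cache_key(blobs))
```

Because the key depends only on file contents, any change to a version dependency file selects a different cache file, and an unchanged set reuses the existing one.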
Signed-off-by: maipbui <maibui@microsoft.com>
Dependency: [PR (#12065)](https://github.com/sonic-net/sonic-buildimage/pull/12065) needs to merge first.
#### Why I did it
1. `eval()` - not secure against maliciously constructed input, can be dangerous if used to evaluate dynamic content. This may be a code injection vulnerability.
2. `subprocess()` - dangerous when used with `shell=True`. Calling a subprocess function with a non-static string can lead to command injection.
3. `os` - not secure against maliciously constructed input and dangerous if used to evaluate dynamic content.
4. `is` operator - strings should not be compared by reference equality.
5. `globals()` - extremely dangerous because it may allow an attacker to execute arbitrary code on the system
#### How I did it
1. `eval()` - use `literal_eval()`
2. `subprocess()` - use `shell=False` instead and pass the command as a list of strings. Ref: [https://semgrep.dev/docs/cheat-sheets/python-command-injection/#mitigation](https://semgrep.dev/docs/cheat-sheets/python-command-injection/#mitigation)
3. `os` - use with `subprocess`
4. `is` - replace by `==` operator for value equality
5. `globals()` - avoid the use of globals()
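Two of the replacements above, sketched in isolation. The function names are illustrative and not taken from the changed code.

```python
import ast


def parse_literal(text):
    """Parse a Python literal (dict, list, number, string) without executing code."""
    return ast.literal_eval(text)


def is_enabled(value):
    # Compare string values with ==; `is` only checks object identity
    # and can be False for two equal strings.
    return value == "enabled"
```

Unlike `eval()`, `ast.literal_eval()` rejects anything that is not a plain literal, so an expression such as a function call raises an error instead of running.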
Why I did it
Initial implementation of Watchdog platform plugin for BMC-based boards
How I did it
How to verify it
Run platform_tests/test_reload_config.py
* Replaced BRCM SDK's psample support flag (PSAMPLE_SUPPORT) with the Linux kernel psample module support config flag (CONFIG_PSAMPLE) in saibcm-modules.
* Replaced the BUILD_PSAMPLE conditional check with CONFIG_PSAMPLE to build the psample callback library (psample-cb.o) only if psample is enabled in the Linux kernel config.
* Cleaned up commented-out code related to PSAMPLE_SUPPORT.
Signed-off-by: haris@celestica.com
Why I did it
SIGTERM takes more than 10 seconds to be processed, so psud is stopped by SIGKILL. This causes unexpected behavior because the database is not cleared.
How I did it
Decorate get_presence api to cancel it on SIGTERM signal in order to avoid long processing.
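One way such a decorator could work is sketched below. This is not the code from the change: `cancel_on_sigterm`, `Cancelled`, and `get_presence_slow` are hypothetical names, and the sketch assumes the call runs in the main thread, where Python delivers signals.

```python
import functools
import os
import signal
import time


class Cancelled(Exception):
    """Raised inside the wrapped call when SIGTERM arrives."""


def cancel_on_sigterm(func):
    """Abort `func` as soon as SIGTERM is delivered (main thread only)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        def _on_sigterm(signum, frame):
            raise Cancelled("interrupted by SIGTERM")

        previous = signal.signal(signal.SIGTERM, _on_sigterm)
        try:
            return func(*args, **kwargs)
        finally:
            # Restore the handler that was installed before the call.
            if previous is not None:
                signal.signal(signal.SIGTERM, previous)
    return wrapper


@cancel_on_sigterm
def get_presence_slow(delay=2.0):
    """Demo only: a slow presence check that signals itself to simulate a stop."""
    os.kill(os.getpid(), signal.SIGTERM)
    time.sleep(delay)
    return True
```

A caller would catch `Cancelled` and proceed to its normal cleanup, so the daemon can exit before the supervisor escalates to SIGKILL.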
How to verify it
test_pmon_psud_stop_and_start_status
test_pmon_psud_term_and_start_status
1d53bf4 Skip platform NDK health check two times in watchdog.sh
d68297c Added code to shutdown the channel after the grpc call also fixed the show fp-status command
0769efe Implemented the module API to return the correct eeprom info for fabric card.
171569c Remove explicit logger identifier for transceiver module operations; use inherited id
6c4d651 Corrected the log messages for firmware install
Signed-off-by: mlok <marty.lok@nokia.com>
- Why I did it
Added ECMP calculator tool.
- How I did it
New files were added.
- How to verify it
Manual tests performed according to tests chapter in HLD
Automated tests will be added by verification.
Why I did it
Support a multi-platform device tree, because the default dtb may not be suitable for all vendor hardware designs.
How I did it
Use the onie_platform variable to load the device tree blob
Why I did it
The smartctl tool is available only in the PMON docker, so it may be inaccessible in case the PMON docker goes down.
Using iSMART_64 tool to fetch the SSD firmware version and device model information.
How I did it
Replacing smartctl with iSMART_64.
Signed-off-by: maipbui <maibui@microsoft.com>
Dependency: [https://github.com/sonic-net/sonic-buildimage/pull/12065](https://github.com/sonic-net/sonic-buildimage/pull/12065)
#### Why I did it
`subprocess.Popen()` and `subprocess.run()` are used with `shell=True`, which exposes the code to shell injection.
`os` - not secure against maliciously constructed input and dangerous if used to evaluate dynamic content
`getstatusoutput` is dangerous because it contains `shell=True` in the implementation
#### How I did it
Replace `os` by `subprocess`, use with `shell=False`
Remove unused functions
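A minimal sketch of running a command with `shell=False`. `run_noshell` is a hypothetical helper written for this illustration, not a function from the changed code.

```python
import subprocess
import sys


def run_noshell(cmd):
    """Run `cmd` (a list of arguments) without a shell; return (exit_code, output)."""
    proc = subprocess.run(cmd, shell=False, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, universal_newlines=True)
    return proc.returncode, proc.stdout.rstrip("\n")


if __name__ == "__main__":
    # Each list element reaches the program as one argument; nothing is
    # interpreted by a shell, so ';' or '|' inside an argument stays literal.
    print(run_noshell([sys.executable, "--version"]))
```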
Signed-off-by: maipbui <maibui@microsoft.com>
#### Why I did it
The [xml.etree.ElementTree](https://docs.python.org/3/library/xml.etree.elementtree.html#module-xml.etree.ElementTree) module is not secure against maliciously constructed data.
`os` - not secure against maliciously constructed input and dangerous if used to evaluate dynamic content
`subprocess.getstatusoutput` is dangerous because its implementation uses shell=True
#### How I did it
Remove xml. Use the [lxml](https://pypi.org/project/lxml/) XML parser package, which prevents potentially malicious operations.
Replace `os` by `subprocess`
Use command as an array instead of string
Use `getstatusoutput_noshell` in `sonic_py_common` lib
- Why I did it
Add support for compiling Spectrum-4 ASIC firmware to the SONiC image
Add support for Spectrum-4 ASIC firmware upgrade
- How I did it
Update Mellanox fw make files to include Spectrum-4 ASIC firmware binaries.
Update firmware upgrade scripts to be able to detect Spectrum-4 ASIC.
- How to verify it
Run regression tests
Signed-off-by: Kebo Liu <kebol@nvidia.com>
- Why I did it
Add SDK hash calculator Debian and update SDK makefile to compile it.
- How I did it
SDK hash calculator Debian will be used by ECMP calculator (PR #12482)
- How to verify it
Compile sonic-buildimage and verify SDK hash calculator Debian exist in target folder.
* Support power threshold
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* get_psu_power_warning_threshold => get_psu_power_warning_suppress_threshold
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Fix comments
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Make syncd rpc docker which supports sai-ptf v2
Build the targets locally:
NOSTRETCH=y NOJESSIE=y make configure PLATFORM=vs
NOSTRETCH=y NOJESSIE=y NOBULLSEYE=y SAITHRIFT_V2=y make target/docker-ptf-sai.gz
NOSTRETCH=y NOJESSIE=y make configure PLATFORM=vs
NOSTRETCH=y NOJESSIE=y NOBULLSEYE=y make target/docker-ptf.gz
NOSTRETCH=y NOJESSIE=y make configure PLATFORM=broadcom
NOSTRETCH=y NOJESSIE=y ENABLE_SYNCD_RPC=y SAITHRIFT_V2=y make target/docker-syncd-brcm-rpcv2.gz
NOSTRETCH=y NOJESSIE=y ENABLE_SYNCD_RPC=y SAITHRIFT_V2=y make target/docker-saiserverv2-brcm.gz
Test done:
#12619
NOSTRETCH=y NOJESSIE=y make configure PLATFORM=broadcom
NOSTRETCH=y NOJESSIE=y ENABLE_SYNCD_RPC=y make target/docker-syncd-brcm-rpc.gz
NOSTRETCH=y NOJESSIE=y ENABLE_SYNCD_RPC=y make target/docker-saiserver-brcm.gz
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
add partial reboot cause support for linecards
add watchdog support for linecards
add power draw information for chassis
properly implement Chassis.get_port_or_cage_type
fix pcieutil on chassis with powered off cards
fix watchdog-control.service crash
misc fixes and cleanups
Why I did it
Move armhf syncd docker compilation to bullseye.
How I did it
compile syncd docker for armhf platform using below commands,
NOJESSIE=1 NOSTRETCH=1 NOBUSTER=1 BLDENV=bullseye make configure PLATFORM=marvell-armhf PLATFORM_ARCH=armhf
NOJESSIE=1 NOSTRETCH=1 NOBUSTER=1 BLDENV=bullseye make target/docker-syncd-mrvl.gz
How to verify it
upgrade the syncd docker and verify ports are up.
Signed-off-by: rajkumar38 <rpennadamram@marvell.com>
* [SAI PTF] SAI PTF docker support sai-ptf v2
Publish the sai-ptf docker.
Takes part of the change from the previous PR #11610 (already reverted because of a cache issue).
#11610 added two new targets: sai-ptf, and syncd-rpc with sai-ptf v2. To give each upgrade a clearer target, this PR carries only the sai-ptf one.
Test done:
NOSTRETCH=y NOJESSIE=y make configure PLATFORM=vs
NOSTRETCH=y NOJESSIE=y NOBULLSEYE=y SAITHRIFT_V2=y make target/docker-ptf-sai.gz
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
* remove useless change
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
* remove useless parameters
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
* remove useless change
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
* Update azure-pipelines-build.yml
remove a useless option
Signed-off-by: richardyu-ms <richard.yu@microsoft.com>
This PR is part of the following HLD:
Persistent loglevel HLD: sonic-net/SONiC#1041
- Why I did it
After the Logger tables moved from the LOGLEVEL_DB to the CONFIG_DB and the jinja2_cache was deleted the LOGLEVEL_DB is not in use.
- How I did it
Removed the LOGLEVEL_DB from the SONiC code
- How to verify it
All tests were passed
Why I did it
syseepromd in pmon crashes because of a missing import in the Python script and never reaches the running state
How I did it
Fix missing import issue to avoid python script failing
How to verify it
Boot system and wait till syseepromd gets into running state
Which release branch to backport (provide reason below if selected)
201811
201911
202006
202012
202106
202111
202205
* Build docker-gbsyncd-broncos image
* Correct typo in LIBSAI_BRONCOS_URL_PREFIX
* Update docker-gbsyncd-broncos/Dockerfile.j2
* Enable debug shell support on docker-gbsyncd-broncos
* Include bcmsh in docker-gbsyncd-broncos
Why I did it
In docker-gbsyncd-broncos image, enable debug shell support for BRCM broncos PHY.
How I did it
How to verify it
Note: support for the SAI_SWITCH_ATTR_SWITCH_SHELL_ENABLE attribute must be enabled in the BCM PAI library
# bcmsh
Press Enter to show prompt.
Press Ctrl+C to exit.
NOTICE: Only one bcmsh or bcmcmd can connect to the shell at same time.
BRCM:> help
help
List of available commands
- h or help => Print command menu
- l => Print list of active ports on the PHY
- ps <port_id> <options> => Print port status
<options> => 1 -> Link status
=> 2 -> Link training failure status
=> 3 -> Link training RX status
=> 4 -> PRBS lock status
=> 5 -> PRBS lock loss status
- rd <port_id> <addr> <no of registers to read> => Read register contents
- wr <port_id> <addr> <data> => Write register data
- rrd <lanemap> <if_side> <addr> <no of registers to read> => Raw read register contents using lanemap and if_side (line = 0, system = 1)
- rwr <lanemap> <if_side> <addr> <data> => Raw write register data using lanemap and if_side (line = 0, system = 1)
- fw or firmware => Print firmware version of the PHY
- pd or port_dump <port_id> <flags> => Dump port status
- eyescan <port_id> => Display eye scan
- fec_status <port_id> => Get fec status of the port
- polarity <lanemap> <if_side> <TX polarity> <RX Polarity> => Set TX and RX polarity
<lanemap> => 0xF, 0xFF, or 0xFFFF based on number of lanes
<if_side > => Line = 0, System = 1
<TX/RX Polarity> =>_TX/RX Polarity bitmap of all lanes
Each bit represents a lane number.
E.g. Lane 0's polarity value (0 or 1) is populated in Bit 0.
- polarity <lanemap> <if_side> => Print TX and RX polarity
- lb <port_id> <lb_value> => Enable loopback on the port
lb_value = 0 -> Disable, 1 -> PHY, 2 -> MAC
- lb <port_id> => Print loopback configuration of the port
- prbs <port_id> <options> <val> => Set/Get PRBS configuration
<options> => 1 -> Get PRBS state and polynomial
2 -> Set PRBS Polynomial, <val> - PRBS Polynomial
Please refer to phy/chip documentation for valid values
3 -> Enable PRBS
<val> => 0 Disable PRBS
1 Enable both PRBS Transmitter and Receiver
2 Enable PRBS Receiver
3 Enable PRBS Transmitter
exit or q => Exit the diagnostic shell
- Why I did it
Update SN2201 dynamic minimum fan speed table according to data provided by the thermal team.
- How I did it
Update the thermal table in device_data.py
- How to verify it
Run platform related regression
Signed-off-by: Kebo Liu <kebol@nvidia.com>
Signed-off-by: maipbui <maibui@microsoft.com>
#### Why I did it
`subprocess.Popen()` and `subprocess.run()` are used with `shell=True`, which exposes the code to shell injection.
`os` - not secure against maliciously constructed input and dangerous if used to evaluate dynamic content
#### How I did it
Replace `os` by `subprocess`
Remove unused functions
Why I did it
In case the device contains more than one FAN drawer, the FAN names were incorrect.
How I did it
Passed max fan value to FAN object.
Fixed get_name() FAN API
How to verify it
show platform fan
Why I did it
SONiC reports a kernel dump while the system reboots on the Belgite platform.
How I did it
Cause:
An invalid cdev container pointer from the inode is accessed in the misc
device open, which causes memory corruption in the slub.
Because of the slub corruption, random crashes are seen during reboot.
Fix: Instead of the cdev pointer from the inode, the mdev container pointer
from the file->private_data member is used.
Action: update the pddf_custom_wdt driver.
How to verify it
Run the reboot stress test and check whether a kernel dump occurs during the reboot
- Why I did it
Update SDK/FW version - 4.5.3186/2010_3186 in order to have the following changes:
New functionality:
1. Added support for 6.5W (Class 8) in ports 49-50, 53-54, 57-58, and 61-62 on SN4600 system
Fix the following issues:
1. On very rare occasions (~1/100K), during an I2C transaction with MMS1V50-WM and MMS1V90-WR modules on the SN4700 system, the module may send an unexpected stop, which violates the I2C specification and may affect the link-up flow
2. When running 1GbE speeds on SN4600 system, the port remained active while peer side was closed
3. While toggling the cable with ‘sfputil lpmode on/off’, error msg like “ERR pmon#xcvrd: Receive PMPE error event on module 1: status {X} error type {y}” could be received
4. When toggling many ports of the Spectrum devices while raising 10GbE link up and link maintenance is enabled, the switch may get stuck and may need to be rebooted
5. When trying to reconfigure the Flex Parser header and Flex transition parameters after ISSU, the switch returned an error even if the configuration was identical to that done before performing the ISSU
6. While moving from lossless to lossy mode while shared headroom was used, reduction of the shared headroom can only be done prior to pool type change and when shared headroom is not utilized
7. SLL configuration is missing in SDK dump
8. If TTL_CMD_COPY is used in Encap direction for a packet with no TTL, then the value passed in the ttl data structure will be used if non-zero (default 255 if zero)
9. PCI calibration changes from a static to a dynamic mechanism
10. Layer 4 port information is not initialized for BFD packet event. To address the issue, remote peer UDP port information was added in BFD packet event
11. SDK returned error when FEC mode is set on twisted pair, when FEC was set to None
- How I did it
Update pointer for the SDK/FW
- How to verify it
Run regression tests
Signed-off-by: dprital <drorp@nvidia.com>
Why I did it
syseepromd in pmon crashes because of a missing import in the Python script and never reaches the running state
How I did it
Fix missing import issue to avoid python script failing
How to verify it
Boot system and wait till syseepromd gets into running state
Signed-off-by: maipbui <maibui@microsoft.com>
#### Why I did it
`os` - not secure against maliciously constructed input and dangerous if used to evaluate dynamic content
#### How I did it
Replace `os` by `subprocess`
This fixes the following error
```
admin@sonic:~$ sudo fwutil show status
mount: /mnt/onie-fs: special device /dev/sda2
does not exist.
Error: Command '['mount', '-n', '-r', '-t', 'ext4', '/dev/sda2\n', '/mnt/onie-fs']' returned non-zero exit status 32.. Aborting...
Aborted!
admin@sonic:~$ sudo vi /usr/local/lib/python3.9/dist-packages/sonic_platform/
```
It seems that the rstrip('\n') was removed in #11877, probably by mistake.
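The failure comes down to a trailing newline in captured command output ending up as part of an argument. A small sketch of the idea, where `build_mount_cmd` and `raw_device` are hypothetical names and the mount arguments are copied from the error message above:

```python
def build_mount_cmd(raw_device, mount_point="/mnt/onie-fs"):
    """Build the mount argument list from raw command output."""
    # Captured output usually ends with '\n'; with shell=False the newline
    # would otherwise become part of the device path argument.
    device = raw_device.rstrip("\n")
    return ["mount", "-n", "-r", "-t", "ext4", device, mount_point]
```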
Signed-off-by: Stephen Sun <stephens@nvidia.com>
fix linecard provisioning issue (500 error)
fix some value types for get_system_eeprom_info API
refactor code to leverage pci topology (enabling dynamic Pcie plugin)
refactor asic declaration logic to new style
misc fixes
Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
What I did
Adding the dynamic headroom calculation support for Barefoot platforms.
Why I did it
Enabling dynamic mode for barefoot case.
How I verified it
The community tests are adjusted and pass.
Remove swsssdk from sonic OS image and docker image
#### Why I did it
swsssdk is deprecated, so it needs to be removed from the image.
#### How I did it
Update config file to remove swsssdk from image.
#### How to verify it
All test cases pass.
#### Which release branch to backport (provide reason below if selected)
- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [ ] 202205
#### Description for the changelog
Remove swsssdk from sonic OS image and docker image
Signed-off-by: maipbui <maibui@microsoft.com>
#### Why I did it
`os` - not secure against maliciously constructed input and dangerous if used to evaluate dynamic content.
#### How I did it
`os` - use with `subprocess`
#### How to verify it
Signed-off-by: maipbui <maibui@microsoft.com>
Dependency: [PR (#12065)](https://github.com/sonic-net/sonic-buildimage/pull/12065) needs to merge first.
#### Why I did it
`subprocess.Popen()` and `subprocess.check_output()` are used with `shell=True`, which exposes the code to shell injection.
#### How I did it
Disable `shell=True`, enable `shell=False`
#### How to verify it
Tested on DUT, compare and verify the output between the original behavior and the new changes' behavior.
[testresults.zip](https://github.com/sonic-net/sonic-buildimage/files/9550867/testresults.zip)
- Why I did it
To update MFT package to the latest version.
- How I did it
Updated MFT_VERSION & MFT_REVISION in platform/mellanox/mft.mk.
- How to verify it
Build an image and deploy to the switch
Check MFT version by dpkg -l | grep mft
Verify that all the SONiC services up and running
Run regression testing using tests from sonic-mgmt
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
- Why I did it
To include latest fixes and new functionality
SAI fixes and new features
Fix #3205239, incorrect object type returned for SG child list
Fix VRF-VNI map entries remove issue
ECC health event and logging
[Port Buffers] restore default queue and pg configuration when all user pools are deleted
Fix EVPN type3 error on removal of uc/bc flood group
Fix EVPN type2 MAC move from local to remote results in SAI failure
Fix Disable learning on VXLAN tunnel
Fix error on VXLAN v6 tunnel removal
Fix port cannot apply schedule group when it is a lag member
Fix BFD add more detailed message on BFD packet not related to any existing session
gcc10 compilation fixes
Disable learning on VXLAN tunnel
Support BFD remote-disc exchange in negotiation stage
Tunnel Loopback packet action attribute implementation (for Dual TOR)
Add KVD resources MIN/MAX functionality (pending CRM issue with MIN only)
Support for CRC2 hash algorithm
Bulk counter support for PGs, queues
Support mirror sample rate attribute (SPC2+)
[Functional] [QoS] | Unable to remove SCHEDULE profile table even if there is no object referencing it
Next hop group optimized bulk API
Reduce verbosity of shared database already exists print
Span mirror policer (SPC2+), optimize pipeline for acl mirror action with policer on SPC2+
use same size descriptor pool for rx/tx
fix bfd - notify Sonic for admin-down event
2201 - empty list for supported fec for RJ45 ports
Fix don't disable used tunnel underlay interfaces
SDK fixes
100GbE FCI DAC (10137628-4050LF/HPE PN: 845408-B21) was recognized by mistake as supporting "cable burning", which caused the switch firmware to read page 0x9f (which is unsupported in the cable) and to report this cable as having "bad eeprom".
Added remote peer UDP port information in BFD packet event.
After editing an ECMP, the resilient ECMP next-hop counter may not count correctly.
Fixed potential memory leaks in some APIs related to LPM
If TTL_CMD_COPY is used in Encap direction for a packet with no TTL, then the value passed in the ttl data structure will be used if non-zero (default 255 if zero).
In SN2201: When configuring Force mode, user should configure Speed and FEC on both sides
In Flex Tunnel encapsulation flow, if the encapsulation is with an IPv6 header, the flow label field may not be updated as expected.
In some cases, when changing speed to 400GbE over 8 lanes, the first few packets would be dropped.
In some traffic patterns involving small packets, the PortRcvErrors counter may mistakenly count events of local physical errors due to an internal flow in the hardware that involves link packets.
On Spectrum systems, sometimes during link failure, not all previous firmware indications cleared properly, potentially affecting the next link up attempt.
On the NVIDIA Spectrum-2 switch, when receiving a packet with Symbol Errors on ports that are configured to cut-through mode, a pipeline might get stuck.
PCI calibration changes from a static to a dynamic mechanism.
SDK debug dump shows "Unknown" Counter in RFC3635 Counter Group.
SDK debug dump shows "Unknown" Counter in the PPCNT Traffic Class Counter Group.
SDK Dump missing column headers in some GC tables may result in difficulty understanding the dump.
SLL configuration is missing in SDK dump.
Spectrum-2 systems do not support 1GbE on supported 40GbE modules.
When binding a UDP port which is already in use for BFD TX session, the error message appears incorrectly.
When Flex Tunnel was used, Flex Modifier sometimes experienced a brief mis-configuration during ISSU.
When many ports are active (e.g. 70 ports up), and the configuration of shared buffer is applied on the fly, occasionally, the firmware might get stuck.
When running 1GbE speeds on SN4600 system, the port remained active while peer side was closed.
When toggling many ports of the Spectrum devices while raising 10GbE link up and link maintenance is enabled, the switch may get stuck and may need to be rebooted.
When trying to reconfigure the Flex Parser header and Flex transition parameters after ISSU, the switch returned an error even if the configuration was identical to that done before performing the ISSU.
While toggling the cable, and the low power mode is set to ON, an unexpected PMPE event error is received.
- How I did it
Updated SDK/SAI submodule and relevant makefiles with the required versions.
- How to verify it
Build an image and run tests from "sonic-mgmt".
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
- Why I did it
get_rx_los and get_tx_fault are not supported via the existing interface, so dummy implementations need to be provided for them.
NOTE: in later releases we will get them back via different interface.
- How I did it
Return False * lane_num for get_rx_los and get_tx_fault
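Read as a per-lane list, the dummy implementation might look like the sketch below. `DummySfp` is a hypothetical class, and interpreting "False * lane_num" as `[False] * lane_num` is an assumption.

```python
class DummySfp:
    """Hypothetical SFP object that reports no RX LOS and no TX fault."""

    def __init__(self, lane_num):
        self.lane_num = lane_num

    def get_rx_los(self):
        # One entry per lane; always "no loss of signal".
        return [False] * self.lane_num

    def get_tx_fault(self):
        # One entry per lane; always "no fault".
        return [False] * self.lane_num
```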
- How to verify it
Added unit test
* Move qsfp eeprom reading to new cached api
* provide reading multiple pages in recursive manner
* workaround with flat memory on cmis
* remove workaround with memory model
* Remove unused imports
- Why I did it
Fix a typo in chassis platform API which causes the following error
>>> import sonic_platform as P
>>> c = P.platform.Platform().get_chassis()
>>> sl = c.get_all_sfps()
>>> sl[0].get_lpmode()
Sep 28 07:48:33 INFO LOG: Initializing SX log with STDOUT as output file.
False
>>> del c
Exception ignored in: <function Chassis.__del__ at 0x7f1d166ef8b0>
Traceback (most recent call last):
File "/usr/local/lib/python3.9/dist-packages/sonic_platform/chassis.py", line 126, in __del__
self.sfp_module.deinitialize_sdk_handle(sfp_module.SFP.shared_sdk_handle)
NameError: name 'sfp_module' is not defined
- How I did it
Use self while using the SDK handle
- How to verify it
Manual test
Signed-off-by: Stephen Sun <stephens@nvidia.com>
This reverts commit 9750cb4.
There is a PR to handle 202205 branch revert: #12184
- Why I did it
The PR to be reverted introduced many notice logs every 1 minute if SFP is not plugged:
Cannot get module EEPROM information: Input/output error
Before the "bad" PR, the message format is like this:
INFO pmon#supervisord: xcvrd Cannot get module EEPROM information: Input/output error
It was truncated by rsyslog because every message is the same. However, the "bad" PR introduces SFP index to the message:
NOTICE pmon#xcvrd: Failed to get EEPROM data for sfp 39: Cannot get module EEPROM information: Input/output error
Rsyslog no longer truncates such logs, so many of these messages flood the syslog.
- How I did it
Revert the PR
- How to verify it
Manual test
- Why I did it
Update SDK/FW version - 4.5.2320/2010_2320 in order to have the following fixes:
• Spectrum-3 | PCI calibration changes from a static to a dynamic mechanism.
• [VxLAN] TTL was set to 0 for non IP traffic (such as ARP)
- How I did it
Update pointer for the SDK/FW
- How to verify it
Run regression tests
- Why I did it
ethtool prints error logs when the EEPROM of an SFP is not available. It prints errors like this:
INFO pmon#/supervisord: xcvrd Cannot get module EEPROM information: Input/output error
INFO pmon#/supervisord: xcvrd Cannot get Module EEPROM data: Invalid argument
However, this log does not contain the relevant SFP index, which makes it hard for developers/QA to find the exact SFP.
- How I did it
Redirect ethtool stderr to subprocess and log it better
- How to verify it
Manual test
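One way to capture a child process's stderr and log it together with the SFP index is sketched below. This is an illustration under assumptions, not the code from the change: the helper name `run_and_log_stderr`, the logger name, and the message format are hypothetical.

```python
import logging
import subprocess

logger = logging.getLogger("sfp-sketch")


def run_and_log_stderr(cmd, sfp_index):
    """Run cmd, capture its stderr, and log failures with the SFP index.

    Returns the command's stdout on success, or None on failure.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except OSError as err:
        # The executable itself could not be started.
        logger.warning("sfp %d: cannot run %s: %s", sfp_index, cmd[0], err)
        return None
    if result.returncode != 0:
        # stderr no longer leaks to the parent's stderr; it is logged here.
        logger.warning("Failed to get EEPROM data for sfp %d: %s",
                       sfp_index, result.stderr.strip())
        return None
    return result.stdout
```

A call could look like `run_and_log_stderr(["ethtool", "-m", "sfp39"], 39)`, where the interface name `sfp39` is only an example.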
* Adding support for get/set low power mode for QSFPs in PDDF common APIs
* Adding support for get/set low power mode for QSFPs in PDDF common APIs - Review comments
Why I did it
To gracefully unmount filesystems and stop containers while performing a cold reboot.
Unmount ONIE-BOOT if mounted during fast/soft/warm reboot
How I did it
Override systemd-reboot service to perform a cold reboot.
Unmount ONIE-BOOT if mounted using fast/soft/warm-reboot plugins.
How to verify it
On reboot, verify that the container stop and filesystem unmount services have completed execution before the platform reboot.
Why I did it
The directory /var/warmboot, the top directory for the warmboot feature, is also needed in docker gbsyncd. Some vendor SAI might save data under it. Without it, SAI init/creation API failures occurred on the PikeZ platform.
How I did it
Mount the host directory /host/warmboot as /var/warmboot in docker gbsyncd, the same as is done for docker syncd.
Why I did it
S5296F - Platform API 2.0 changes
How I did it
Implemented the functional API's needed for Platform API 2.0
How to verify it
Used the API 2.0 test suite to validate the test cases.
- Why I did it
Update SDK/FW version - 4.5.2318/2010_2318 to pick up new fixes:
1. Cr space timeout on Hold and Release GW - at warm boot
2. Spectrum Port in stuck PHY_UP after peer side rebooted
3. Memory leak in sx_api_router_ecmp_update_set
- How I did it
Update the make file with the new version number
Update submodule Switch-SDK-drivers pointer
- How to verify it
Run sonic regression
Signed-off-by: Kebo Liu <kebol@nvidia.com>
#### Why I did it
Update scripts in sonic-buildimage from py-swsssdk to swsscommon
#### How I did it
Replace swsssdk with swsscommon in centec devices.
#### How to verify it
Pass all E2E test case
#### Description for the changelog
Replace swsssdk with swsscommon in centec devices.
Why I did it
It solves a swss orchagent crash issue on PikeZ device, due to link-training setting of external PHY port.
How I did it
Catch up the fix for CS00012257483 in version 7.1.7.2.
* draft upgrade to deb11 of syncd and syncd-rpc
* upgrade to python3
* revert workaround with libsaithrift
* Provide urls for sai and platform debs
* Downgrade python3 to python2
* Remove saithrift-patches
* Upgrade modules
* remove unnecessary lib
* remove more unnecessary modules
* Update sdk reference
* remove unnecessary packages from syncd-rpc
# Why I did it
platform-modules-belgite's deb requests linux-image-5.10.0-8-2-amd64-unsigned, which does not match the runtime kernel version
# How I did it
update the belgite's deb configuration in deb's control
# How to verify it
Check the first-time boot log on the belgite platform
Co-authored-by: nicwu-cel <nicwu@celestica.com>
* Update BRCM KNET module to support new psample definitions from sflow dropmon feature
* Advance saibcm-modules-dnx
- Add Watchdog remaining time API
- Add support for non-swappable fans via a FixedDrawer
- Add ASIC voltage tweaks for PikeZ product
- Add better pylint support
- Fix reboot-cause decision issue for future products
- Fix thermal issue for RJ45 ports
- Deprecate Catalina prototype support
- Why I did it
Update HW-MGMT to V.7.0020.3006
1. Support new system SN2201
2. Add COMEX BRDWL respin support
- How I did it
Update the version number of the makefile
Advance the hw-mgmt submodule pointer
- How to verify it
Run full regression on Nvidia platforms
Signed-off-by: Kebo Liu <kebol@nvidia.com>
- Why I did it
Add PSU input voltage and input current to mlnx platform api.
- How I did it
Implement 2 functions for getting the PSU input voltage and PSU input current:
Get the values from "power/psu{}_curr_in" , "power/psu{}_volt_in"
- How to verify it
Manual test.
Run sonic-mgmt regression
Signed-off-by: orfar1994 <orfar1994@gmail.com>
- Why I did it
Add more logs to sysfs reads to improve debuggability
- How I did it
Log the relevant file path and error number when a sysfs read returns None
- How to verify it
Manual test
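A minimal sketch of a sysfs read helper that logs the file path and error number on failure. The function name `read_sysfs` and the logger name are hypothetical, not the names used in the platform code.

```python
import logging

logger = logging.getLogger("sysfs-sketch")


def read_sysfs(path, default=None):
    """Read a sysfs node; on failure, log path and errno and return default."""
    try:
        with open(path, "r") as f:
            return f.read().strip()
    except OSError as err:
        # Include the path and errno so a failing node is easy to locate.
        logger.error("Failed to read %s: errno=%s (%s)",
                     path, err.errno, err.strerror)
        return default
```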
To reduce rc.local script execution time. Porting changes from [DellEMC] S6100 Platform Service optimization #10989
Changes:
Moving platform-modules-s6100.service and s6100-lpc-monitor.service asynchronous to rc.local script.
this upgrade contains two changes:
1. Add the following MacSec Initialization Condition:
- When MacSec feature is not included MacSec block should not be brought out of reset irrespective of the value of the newly added config variable.
- When included its initialization is controlled by the newly added config variable.
2. DNX bug fix: increase _BRCM_SAI_MAX_ACL_TABLES to 128
Signed-off-by: zitingguo <zitingguo@microsoft.com>
Why I did it
Enable syncd container autorestart for Innovium platforms
How I did it
Add critical_process file and supervisord.conf entry
How to verify it
Tested with autorestart/test_container_autorestart.py::test_containers_autorestart
PASSED autorestart/test_container_autorestart.py::test_containers_autorestart[sonic-xxx-dut-sonic-xxx-dut|syncd]
Signed-off-by: rck-innovium rck@innovium.com
* Ported Marvell armhf build on x86 for debian buster to use cross-compilation instead of qemu emulation
The current armhf SONiC build on an amd64 host uses qemu emulation. Due to the
nature of the emulation, the build takes a very long time, about 22-24 hours.
This change reduces the build time by porting the SONiC armhf build on an amd64
host for the Marvell platform on debian buster to use cross-compilation on
arm64 host for armhf target. With cross-compilation, the overall SONiC armhf
build time is reduced to about 6 hours.
Signed-off-by: marvell <marvell@cpss-build3.marvell.com>
* Fixed final Sonic image build with dockers inside
* Update Dockerfile.j2
Fixed qemu-user-static:x86_64-aarch64-5.0.0-2 .
* Update cross-build-arm-python-reqirements.sh
Added support for both armhf and arm64 cross-build platform using $PY_PLAT environment variable.
* Update Makefile
Added TARGET=<cross-target> for armhf/arm64 cross-compilation.
* Reviewer's @qiluo-msft requests done
Signed-off-by: marvell <marvell@cpss-build3.marvell.com>
* Added new radius/pam patch for arm64 support
* Update slave.mk
Added missing back tick.
* Added libgtest-dev: libgmock-dev: to the buster Dockerfile.j2. Fixed arm perl version to be generic
* Added missing armhf/arm64 entries in /etc/apt/sources.list
* fix libc-bin core dump issue from xumia:fix-libc-bin-install-issue commit
* Removed unnecessary 'apt-get update' from sonic-slave-buster/Dockerfile.j2
* Fixed saiarcot895 reviewer's requests
* Fixed README and replaced 'sed/awk' with patches
* Fixed ntp build to use openssl
* Unuse sonic-slave-buster/cross-build-arm-python-reqirements.sh script (put all prebuilt python packages cross-compilation/install inside Dockerfile.j2). Fixed src/snmpd/Makefile to use -j1 in all cases
* Clean armhf cross-compilation build fixes
* Ported cross-compilation armhf build to bullseye
* Additional change for bullseye
* Set CROSS_BUILD_ENVIRON default value n
* Removed python2 references
* Fixes after merge with the upstream
* Deleted unused sonic-slave-buster/cross-build-arm-python-reqirements.sh file
* Fixed 2 @saiarcot895 requests
* Fixed @saiarcot895 reviewer's requests
* Removed use of prebuilt python wheels
* Incorporated saiarcot895 CC/CXX and other simplification/generalization changes
Signed-off-by: marvell <marvell@cpss-build3.marvell.com>
* Fixed saiarcot895 reviewer's additional requests
* src/libyang/patch/debian-packaging-files.patch
* Removed --no-deps option when installing wheels. Removed unnecessary lazy_object_proxy arm python3 package installation
Co-authored-by: marvell <marvell@cpss-build3.marvell.com>
Co-authored-by: marvell <marvell@cpss-build2.marvell.com>
Fix an issue with front panel port led introduced in previous PR
Implement status led for linecards
Implement full power cycle for linecards
Improve reboot cause reporting for Ucd devices
Add fan support for PikeZ
Miscellaneous fixes and improvements
- Why I did it
Support get_port_or_cage_type for RJ45 ports
- How I did it
Implement the new platform API get_port_or_cage_type
Fix the issue: unable to import SFP when chassis object is destructed
- How to verify it
Manually test and regression test
Signed-off-by: Stephen Sun <stephens@nvidia.com>
- Why I did it
Fix bug: pmon report error on start up because some SKUs do not have hwsku.json
- How I did it
If hwsku.json does not exist, do not extract RJ45 port information
- How to verify it
Manual test.
Unit test.
- Why I did it
Add support for mellanox platform building for target architecture arm64.
- How I did it
Contains the following changes:
1. Change instances of hard-coded amd64 to $(CONFIGURED_ARCH)
2. Add logic to download correct binary for MFT package
3. Add TARGET_BOOTLOADER=grub definition to rules.mk to override default arm64 bootloader
- How to verify it
Build mellanox platform with TARGET_ARCH set as arm64
Give more room for the kernel image in memory
Change-Id: I015856d173d50d94e30d8c555590efb70eb712ae
Signed-off-by: Pavan Naregundi <pnaregundi@marvell.com>
- Why I did it
Advance to a new SAI version for bug fixes as well as new features/enhancements:
New:
- ARM64 support
- FG ECMP performance optimization
- Support setting empty list for port ingress/egress buffer profile list
- Add service port for SN5600
- Add CR8/SR8/LR8/KR8 interface type
- Disable mlxtrace during debug dump
Fixes:
- Fix SAI_ACL_ENTRY_ATTR_FIELD_TC
- Fix Packets loop back if no member in portchannel
- Fix optimize descriptors apply time (and fast boot time)
- Add flush fdb entries for vxlan tunnel bridge port
- Don't disable used tunnel underlay interfaces
- How I did it
Advanced SAI submodule
- How to verify it
make configure PLATFORM=mellanox
make target/sonic-mellanox.bin
Signed-off-by: Nazarii Hnydyn <nazariig@nvidia.com>
#### Why I did it
Fix some bugs on centec tsingma bsp and v682 sonic_platform package.
#### How I did it
1. add module license for centec mars phy driver
2. Fix i2c function ability setting for tsingma soc i2c controller
3. Fix eeprom read error on v682 sonic_platform sfp module
#### How to verify it
Build SONiC image and verify it on centec E530-48T4X and V682-48Y8C board.
Why I did it
Fix Centec-Arm64 compile error: the Centec SAI Dev package reference is wrong
How I did it
Modify sai.mk of arm64 platform for Centec
How to verify it
Build centec amd64 and arm64 sonic image
Signed-off-by: Sudharsan Dhamal Gopalarathnam sudharsand@nvidia.com
Why I did it
During the system boot up when 'show platform status' or 'show version' command is executed before STATE_DB CHASSIS_INFO table is populated, the show will try to fallback to use the platform API. The DMI file in mellanox platforms require root permission for access. So if the show commands are executed as admin or any other user, the following error log will appear in the syslog
Jun 28 17:21:25.612123 sonic ERR show: Fail to decode DMI /sys/firmware/dmi/entries/2-0/raw due to PermissionError(13, 'Permission denied')
How I did it
Check the file permission before accessing it.
How to verify it
Added UT to verify. Manually verified that the error log no longer appears.
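A sketch of checking read permission before opening the DMI file, so an unprivileged caller gets `None` instead of a logged error. The function name `read_dmi_raw` is hypothetical; the path is the one from the log line above.

```python
import os

# Path taken from the syslog message quoted above.
DMI_FILE = "/sys/firmware/dmi/entries/2-0/raw"


def read_dmi_raw(path=DMI_FILE):
    """Return the raw DMI bytes, or None if the file is not readable."""
    # os.access checks the calling user's permission without opening the file.
    if not os.access(path, os.R_OK):
        return None
    with open(path, "rb") as f:
        return f.read()
```

A small race remains between the check and the `open` call; for a sysfs file whose permissions do not change at runtime this is usually acceptable.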
Why I did it
Support Intel Tofino based platforms Netberg Aurora 610
ASIC: Intel Tofino BFN-T10-032D-020
Ports: 48x 25G + 8x 100G
How I did it
Added specification to device/netberg directory
Added platform/barefoot/sonic-platform-modules-netberg, which contains kernel modules, scripts and sonic_platform packages.
Modified the platform/barefoot/one-image.mk and platform/barefoot/rule.mk to include Aurora 610 related ID and files.
How to verify it
Build SONiC
Install the image on the device and verify the related components are installed and shown correctly.
Why I did it
Add 6512-32r support for Wistron platform
Update sw-to3200k for newer branch
How I did it
Add code in device and platform folder for 6512-32r
Update sw-to3200k code both in device and platform folder
How to verify it
Install on Wistron device and run command to verify
Signed-off-by: RogerChang Roger_Chang@wistron.com
Why I did it
To return 'False' in update_firmware component API in DellEMC Z9332f platform, if the firmware image is not present in the provided image path.
How I did it
Updated 'update_firmware' in component.py to return False if image is not found in location provided by 'image_path'
How to verify it
Verified that the API returns False when an invalid image path is specified.
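A minimal sketch of the early-return check described above. The `install` callback is a hypothetical placeholder for the platform-specific flashing step and is not part of the real component API.

```python
import os


def update_firmware(image_path, install=lambda path: True):
    """Return False if the image is missing, else the install result."""
    if not os.path.isfile(image_path):
        # Nothing to install: report failure instead of raising.
        return False
    return bool(install(image_path))
```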
- Why I did it
This is for the eventual support of multiple architectures for the mellanox platform.
- How I did it
Change the location of the binaries in Switch-SDK-drivers so that the path specifies the target architecture in addition to the target distribution that the debians are built for.
This is the most straightforward way to separate binaries built against different architectures and selectively target them for installation in the mellanox SONiC image.
- How to verify it
Build SONiC for mellanox and verify it compiles successfully.
- Why I did it
To provide an ability to suppress ASAN false positives and have a clean ASAN report for docker-sonic-vs/mlnx-syncd/orchagent docker
- How I did it
Added the "print_suppressions=0" to ASAN configs.
- How to verify it
add a suppression to some ASAN-enabled component (the suppression should catch some leak)
build with ENABLE_ASAN=y
run a test and see that the ASAN report is empty instead of having the suppression summary
Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
To ensure that ASAN logs are always generated. Currently, the way to get the logs is to map the "/var/log/asan" outside of a container, which doesn't work for DVS test run with "--imgname" option.
Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
* Support new platform SN2201 and RJ45 port
Signed-off-by: Kebo Liu <kebol@nvidia.com>
* remove unused import and redundant function
Signed-off-by: Kebo Liu <kebol@nvidia.com>
* fix error introduced by rebase
Signed-off-by: Kebo Liu <kebol@nvidia.com>
* Revert the special handling of RJ45 ports (#56)
* Revert the special handling of RJ45 ports
sfp.py
sfp_event.py
chassis.py
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Remove deadcode
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Support CPLD update for SN2201
A new class is introduced, deriving from ComponentCPLD and overloading _install_firmware
Change _install_firmware from private (starting with __) to protected, making it overloadable
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Initialize component BIOS/CPLD
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Remove swb_amb which doesn't exist on DVT board any more
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Remove the nonexistent sensor - switch board ambient - from platform.json
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Do not report error on receiving unknown status on RJ45 ports
Translate it to disconnect for RJ45 ports
Report error for xSFP ports
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Add reinit for RJ45 to avoid exception
Signed-off-by: Stephen Sun <stephens@nvidia.com>
Co-authored-by: Stephen Sun <5379172+stephenxs@users.noreply.github.com>
Co-authored-by: Stephen Sun <stephens@nvidia.com>
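The SN2201 CPLD change listed above depends on Python name mangling: a method whose name starts with two underscores cannot be overridden by a subclass in the usual way, while a single-underscore method can. The sketch below illustrates this with placeholder bodies. The subclass name `ComponentCPLDSN2201` and the `__private_step` method are hypothetical.

```python
class ComponentCPLD:
    def install(self):
        # Single underscore: resolved on the instance, so subclasses can override.
        return self._install_firmware()

    def _install_firmware(self):
        return "generic"

    def report(self):
        # Double underscore: mangled to _ComponentCPLD__private_step here.
        return self.__private_step()

    def __private_step(self):
        return "base-private"


class ComponentCPLDSN2201(ComponentCPLD):
    def _install_firmware(self):
        return "sn2201"

    def __private_step(self):
        # Mangled to _ComponentCPLDSN2201__private_step, so the base class
        # never calls it.
        return "sub-private"
```

With these definitions, `ComponentCPLDSN2201().install()` returns `"sn2201"`, while `ComponentCPLDSN2201().report()` still returns `"base-private"`.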
Why I did it
S5212F - Platform API 2.0 changes
S5224F - Platform API 2.0 changes
How I did it
Implemented the functional API's needed for Platform API 2.0
Added media_settings.json, pcie.yaml, platform.json, system_health_monitoring_config.json files.
How to verify it
Used the API 2.0 test suite to validate the test cases.
Why I did it
Added support for the device Z9432F
How I did it
Implemented the support for the platform Z9432F
Switch Vendor: DellEMC
Switch SKU: Z9432F-ON
ASIC Vendor: Broadcom
SONiC Image: sonic-broadcom.bin
This fixes the build for armhf to be able to use '/device///installer.conf' files. Specifically, armhf needs support to be able to change the size of /var/log/ directory. It is hardcoded to 512 bytes on all armhf platforms currently. This change will allow any armhf platform to be able to use an installer.conf file to customize the installed image.
- Why I did it
"import sonic_platform" takes about 600ms ~ 1000ms, which is rather slow. After this optimization, the time is about 100ms. The benefit is that CLIs which do not need the slow imports run faster than before.
- How I did it
Find the slow imports and perform them only when needed.
- How to verify it
Measure the import time.
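One common way to defer a slow import until first use is a small proxy object. This is a sketch of the general technique, not the code from the change; `LazyModule` is a hypothetical helper, and `json` stands in for a slow module.

```python
import importlib


class LazyModule:
    """Proxy that imports the named module on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # __getattr__ only runs for attributes not found on the proxy itself,
        # so _name and _module never trigger the import.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


# `json` stands in for a slow-to-import dependency.
lazy_json = LazyModule("json")
```

Code paths that never touch `lazy_json` never pay the import cost. A plain `import` statement inside the function that needs the module achieves the same effect with less machinery.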
- Why I did it
To include latest fixes:
1. Warmboot | When trying to reconfigure the Flex Parser header and Flex transition parameters after ISSU, the switch returned an error even if the configuration was identical to the one applied before performing the ISSU.
2. Link Up | When toggling many ports of the Spectrum devices while raising 10GbE link up and link maintenance is enabled, the switch may get stuck and may need to be rebooted.
3. Shared buffer | While moving from lossless to lossy while shared headroom was used, reduction of the shared headroom can only be done prior to pool type change and when shared headroom is not utilized.
- How I did it
Updated SDK submodule along with the relevant Makefiles
- How to verify it
Build an image and run tests from "sonic-mgmt".
Signed-off-by: Volodymyr Samotiy <volodymyrs@nvidia.com>
Currently, the build with ASAN_ENABLE=y reuses the packages built with
ASAN_ENABLE=n (and vice versa). To address this issue, ASAN_ENABLE is added to DEP_FLAGS for asan-enabled packages (docker-syncd-mlnx, syncd, docker-orchagent, swss).
- Why I did it
To make dpkg cache use/rebuild the packages for ASAN_ENABLE=y/n.
- How I did it
Added ASAN_ENABLE to the DEP_FLAGS for asan-enabled packages.
- How to verify it
Built with ASAN_ENABLE=y/n and checked the .flags .log files.
Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
This is to improve the readability of ASAN reports. The debug package adds function names and source code references to the backtrace (currently, there are only binary addresses of functions)
Another way to address this issue is to build the image with "INSTALL_DEBUG_TOOLS=y". The downside of this approach is that the image size and compilation time are unnecessarily big. Also, the idea is to make the "ENABLE_ASAN" self-sufficient, which would not be the case for this approach.
- Why I did it
To improve the readability of asan logs.
- How I did it
Added SYNCD_DBG and SWSS_DBG to corresponding docker images for ASAN_ENABLE=y build
- How to verify it
Add artificial memory leak
Build with ASAN_ENABLE=y
Test the image and check the ASAN report
Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
- Why I did it
Update MFT to newer version
- How I did it
Update MFT_VERSION in platform/mellanox/mft.mk
- How to verify it
Check version via dpkg -l | grep mft
Signed-off-by: Andriy Yurkiv <ayurkiv@nvidia.com>
Why I did it
add celestica belgite platform
How I did it
add belgite platform in celestica
Co-authored-by: nicwu-cel <nicwu@celestica.com>
Co-authored-by: anjian <anjian@celestica.com>
Co-authored-by: sandycelestica <sandyli@celestica.com>
This update has following changes
Refactor pci topology logic for chassis (fixes some chassis commands and chassisd on linecard)
Introduce new cooling algorithm
Fix linecard poweroff logic when supervisor is going down
Fix linecard status led leading to system-health crashing
Misc fixes
- Why I did it
Script fails when there is an exception while reading
- How I did it
Add more logs and checks. Fix wrong variable naming and messages.
- How to verify it
Provoke exception while read_eeprom() and check that it is handled properly
Why I did it
To include ONIE version in show platform firmware status command output in DellEMC S6100 and Z9332f platforms.
How I did it
Include ‘ONIE’ in the list of components provided by platform APIs in DellEMC S6100 and Z9332f.
Unmount ONIE-BOOT if mounted using fast/soft/warm-reboot plugins in DellEMC S6100.
Fixes #9279
- Why I did it
Part of larger effort to move all SONiC systems to bullseye
- How I did it
1. Update container makefiles with correct dependencies
2. Update container Dockerfile with correct base image
3. Update container Dockerfile with correct apt dependencies
4. Update any other makefiles with dependencies to remove python2 support
5. Minor changes to support bullseye / python3
- How to verify it
Run regression on the switch:
1. Verify PTF community tests work
2. Verify syncd runs and all ports come up / pass traffic
3. Verify all platform tests succeed
Update SDK/FW to 4.5.1500/2010.1500 and SAI version to 1.21.1.1
SDK/FW features:
1. Added support for Finisar DR4 (FTCD4523E2PCM) on Spectrum-2 and Spectrum-3 systems.
SAI Features:
1. ECMP overlay support for IPv6
2. BFD offloading / 4K scale
3. Host interface user traps + improved trap registration (table entry)
4. gcc11 compilation fixes
5. Read support for ACL redirect action
6. Optimize ECMP DB size
7. Buffer descriptors new defaults
8. Updated port mapping for SN2201
SAI Fixes:
1. Debug counter removal when configured with all drop reasons
- Why I did it
Upgrade Mellanox SDK and SAI versions to latest
- How I did it
Updated submodule pointers
- How to verify it
Regression tested
Currently, the build dockers are created as user dockers (docker-base-stretch-<user>, etc.) that are
specific to each user. But the sonic dockers (docker-database, docker-swss, etc.) are
created with a fixed docker name and are common to all users.
docker-database:latest
docker-swss:latest
When multiple builds are triggered on the same build server, this creates a parallel build issue because
all the build jobs try to create the same docker with the latest tag.
This happens only when sonic dockers are built using the native host dockerd for sonic docker image creation.
This patch creates all sonic dockers as user sonic dockers and then, while
saving and loading the user sonic dockers, renames the user sonic
dockers to the correct sonic dockers with the tag latest.
docker-database:latest <== SAVE/LOAD ==> docker-database-<user>:tag
The user sonic docker names are derived from the 'DOCKER_USERNAME' and 'DOCKER_USERTAG' make env
variables, and a Jinja template replaces the FROM docker name with the correct user sonic docker name for
loading and saving the docker image.
Why I did it
Migrate the ptftests scripts to python3. To make the migration incremental, first add a python virtual environment and install all required python packages in it.
Then migrate the ptftests scripts from python2 to python3 one by one, to avoid impacting unchanged scripts.
Signed-off-by: Zhaohui Sun zhaohuisun@microsoft.com
How I did it
Add python3 virtual environment for docker-ptf.
Add submodule ptf-py3 and install patched ptf 0.9.3 into virtual environment as well, two ptf issues were reported here:
p4lang/ptf#173, p4lang/ptf#174
Signed-off-by: Zhaohui Sun <zhaohuisun@microsoft.com>
On vs platform, egress_lossless_pool's mode is static.
So the corresponding profile should be of static_th as well.
Signed-off-by: Stephen Sun <stephens@nvidia.com>
* Optimize dx010 sonic platform init script to speed up init process
* Merge issue #10152: [warm-upgrade][202012] Slow Celestica platform init
in rc.local causes lacp-teardown fix into master branch
Signed-off-by: Eric Zhu <erzhu@celestica.com>
- Why I did it
To support docker-sonic-vs image with ASAN.
- How I did it
1. Made the supervisord.conf a template
2. Added the 'log_path' environment variable for ASAN-enabled daemons
3. Added supervisord.conf.j2 generation and ASAN lib to the docker-sonic-vs/Dockerfile.j2
- How to verify it
1. Made a build with ENABLE_ASAN=y
2. Run the tests, checked ASAN reports
Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
- Why I did it
InvalidPsuVolWA.run might raise an exception if the user powers off the PSU while it is running. This exception is not caught and propagates to psud, which causes psud to fail to update PSU data to the DB.
- How I did it
1. Change the log level when the WA does not work. This can happen when the user powers off the PSU, so the log level is changed from error to warning.
2. Change the wait time from 5 to 1 to avoid introducing too much delay in psud. 1 second is usually enough per my test.
3. Give a default return value for get_voltage_low_threshold and get_voltage_high_threshold to keep exceptions from reaching psud.
- How to verify it
Manual test.
Run sonic-mgmt regression
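The third item above, returning a default instead of letting an exception reach psud, can be sketched as a small wrapper. The helper name `read_threshold` and the logger name are hypothetical; the real change applies to get_voltage_low_threshold and get_voltage_high_threshold.

```python
import logging

logger = logging.getLogger("psu-sketch")


def read_threshold(read_func, default=None):
    """Call read_func and fall back to default if it raises."""
    try:
        return read_func()
    except Exception as err:
        # Warning rather than error: the PSU may simply have been powered off.
        logger.warning("PSU threshold read failed: %s", err)
        return default
```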
Fix the issues #10501 and #9733
If having gearbox, we need:
* add gbsyncd as a peer since swss also has dependency on gbsyncd
* add service gbsyncd to FEATURE table if it is missing
- Why I did it
Implement newly added reboot causes in PR Azure/sonic-platform-common#277
- How I did it
Map the reboot cause sysfs to the newly added reboot causes.
- How to verify it
manual test, check whether the reboot cause is correct after rebooting the switch in various ways.
run the community reboot test to see whether the reboot cause checker is passing.
Signed-off-by: Kebo Liu <kebol@nvidia.com>
The v0.7.5 release has a bug fix for the support of gearbox port and macsec counters. It also includes an owl firmware update with owl.lz4.fw.1.94.0.bin.
How I did it
Update credo sai url for v0.7.5
Update gearbox_config.json with using firmware owl.lz4.fw.1.94.0.bin instead of owl.lz4.fw.1.92.1.bin
How to verify it
Test gearbox port and macsec counter successfully on A7280.
Why I did it
To support address sanitizer for Mellanox syncd
How I did it
/var/log/asan is mapped for syncd container (the same as for swss)
container stop() has a timeout (60s) for syncd (the same as for swss)
This is so libasan has enough time to generate a report.
added ASAN's log path to Mellanox syncd supervisord.conf
added "asan: yes" to sonic_version.yml
How to verify it
Added artificial memory leaks
Compiled with ENABLE_ASAN=y
Installed the image on DUT
Rebooted the DUT
Verified that /var/log/asan/syncd-asan.log contains the leaks
Signed-off-by: Yakiv Huryk <yhuryk@nvidia.com>
- Why I did it
There is a hardware bug that PSU voltage threshold sysfs returns incorrect value. The workaround is to call "sensor -s" to refresh it.
- How I did it
Call "sensor -s" when the threshold value is incorrect and PSU is "DELTA 1100"
- How to verify it
Unit test and Manual test
Why I did it
Prevent the i2c bus from getting locked.
How I did it
Add sysfs driver to access ioport.
Command to reset i2c mux:
echo 1 > /sys/devices/platform/as9716_32d_ioport/i2c_mux_rst
Command to bring i2c mux out of reset:
echo 0 > /sys/devices/platform/as9716_32d_ioport/i2c_mux_rst
Signed-off-by: Brandon Chuang <brandon_chuang@edge-core.com>
Why I did it
For trident4/tomahawk4, linux_ngknet.ko and linux_ngknetcb.ko have to be installed. Also, the kernel modules to load on such chips are different from existing ones, so we add an option is_ltsw_chip to determine the kernel modules to load. The option is_ltsw_chip is controlled by adding 'is_ltsw_chip=1' to platform_env.conf or not.
How to verify it
We verified that existing platforms still work after this change; and for platforms with trident4/tomahawk4, we can load the different kernel modules as expected after adding 'is_ltsw_chip=1' to platform_env.conf
b67d479 Fixed the sfp refactor issue
827c5a6 Added nokia_cmd command nokia_common grpc support for power down/up SFM module
aeb7f56 Added the nokia cli commands for midplane
c57d083 Fix the get_my_module issue and the thermal_infos exception issue.
0536293 Change the output of "show chassis module status"
63212d7 Enhance the help display for nokia_cmd command
e8d2599 Fix the sonic_install_ndk_service script issue
d52bdcf Add command nokia_cmd show sfm-eeprom support
Signed-off-by: mlok <marty.lok@nokia.com>
* [BFN] Fix for run fwutil without sudo
SONiC has a concept of "platform components"
this may include - CPLD, FPGA, BIOS, BMC, etc.
These changes are needed to read the version of the BIOS and BMC component.
What I did
The previous implementation of component.py expects fwutil to be run with sudo.
When fwutil is run without sudo, there is an exception:
```
Traceback (most recent call last):
File "/usr/local/bin/fwutil", line 5, in <module>
from fwutil.main import cli
File "/usr/local/lib/python3.9/dist-packages/fwutil/__init__.py", line 3, in <module>
from . import main
File "/usr/local/lib/python3.9/dist-packages/fwutil/main.py", line 40, in <module>
pdp = PlatformDataProvider()
File "/usr/local/lib/python3.9/dist-packages/fwutil/lib.py", line 159, in __init__
self.__platform = Platform()
File "/usr/local/lib/python3.9/dist-packages/sonic_platform/platform.py", line 21, in __init__
self._chassis = Chassis()
File "/usr/local/lib/python3.9/dist-packages/sonic_platform/chassis.py", line 48, in __init__
self.__initialize_components()
File "/usr/local/lib/python3.9/dist-packages/sonic_platform/chassis.py", line 136, in __initialize_components
component = Components(index)
File "/usr/local/lib/python3.9/dist-packages/sonic_platform/component.py", line 184, in __init__
self.version = get_bios_version()
File "/usr/local/lib/python3.9/dist-packages/sonic_platform/component.py", line 19, in get_bios_version
return subprocess.check_output(['dmidecode', '-s', 'bios-version']).strip().decode()
File "/usr/lib/python3.9/subprocess.py", line 424, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/usr/lib/python3.9/subprocess.py", line 505, in run
with Popen(*popenargs, **kwargs) as process:
File "/usr/lib/python3.9/subprocess.py", line 951, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/usr/lib/python3.9/subprocess.py", line 1823, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'dmidecode'
```
How I did it
Modification of dmidecode command
How to verify it
Run manually 'fwutil' (without sudo)
Previous command output had exception
New command output:
Root privileges are required
Signed-off-by: Taras Keryk <tarasx.keryk@intel.com>
* Rewrite the dmidecode call for the case when fwutil is run without sudo
Signed-off-by: Taras Keryk <tarasx.keryk@intel.com>
* Why I did it
The previous implementation of component.py expected fwutil to be run with sudo.
When fwutil is run without sudo, the same exception as above is raised:
FileNotFoundError: [Errno 2] No such file or directory: 'dmidecode'
The previous implementation of eeprom.py also expected fwutil to be run with sudo.
When fwutil is run without sudo, an exception is raised:
Traceback (most recent call last):
File "/usr/lib/python3.9/logging/config.py", line 564, in configure
handler = self.configure_handler(handlers[name])
File "/usr/lib/python3.9/logging/config.py", line 745, in configure_handler
result = factory(**kwargs)
File "/usr/lib/python3.9/logging/handlers.py", line 153, in __init__
BaseRotatingHandler.__init__(self, filename, mode, encoding=encoding,
File "/usr/lib/python3.9/logging/handlers.py", line 58, in __init__
logging.FileHandler.__init__(self, filename, mode=mode,
File "/usr/lib/python3.9/logging/__init__.py", line 1142, in __init__
StreamHandler.__init__(self, self._open())
File "/usr/lib/python3.9/logging/__init__.py", line 1171, in _open
return open(self.baseFilename, self.mode, encoding=self.encoding,
PermissionError: [Errno 13] Permission denied: '/var/log/platform.log'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/bin/fwutil", line 5, in <module>
from fwutil.main import cli
File "/usr/local/lib/python3.9/dist-packages/fwutil/__init__.py", line 3, in <module>
from . import main
File "/usr/local/lib/python3.9/dist-packages/fwutil/main.py", line 41, in <module>
pdp = PlatformDataProvider()
File "/usr/local/lib/python3.9/dist-packages/fwutil/lib.py", line 162, in __init__
self.chassis_component_map = self.__get_chassis_component_map()
File "/usr/local/lib/python3.9/dist-packages/fwutil/lib.py", line 168, in __get_chassis_component_map
chassis_name = self.__chassis.get_name()
File "/usr/local/lib/python3.9/dist-packages/sonic_platform/chassis.py", line 146, in get_name
return self._eeprom.modelstr()
File "/usr/local/lib/python3.9/dist-packages/sonic_platform/chassis.py", line 54, in _eeprom
self.__eeprom = Eeprom()
File "/usr/local/lib/python3.9/dist-packages/sonic_platform/eeprom.py", line 50, in __init__
logging.config.dictConfig(config_dict)
File "/usr/lib/python3.9/logging/config.py", line 809, in dictConfig
dictConfigClass(config).configure()
File "/usr/lib/python3.9/logging/config.py", line 571, in configure
raise ValueError('Unable to configure handler '
ValueError: Unable to configure handler 'file'
How I did it
Modified the call of the dmidecode command.
Added a step that adjusts the access attributes of the log files before they are opened.
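The log file handling could look roughly like the sketch below. It is only an illustration under stated assumptions: `ensure_log_file` is a hypothetical name, the default mode `0o666` is a guess, and the real helper in this PR may escalate privileges where this version simply reports failure through its return value.

```python
import os


def ensure_log_file(path, mode=0o666):
    """Hypothetical helper: create `path` if missing and try to apply `mode`.

    Returns True if the current user can write to the file afterwards.
    """
    try:
        # Create the parent directory if it does not exist yet.
        os.makedirs(os.path.dirname(path) or '.', exist_ok=True)
        if not os.path.isfile(path):
            # Append mode creates the file without truncating existing content.
            with open(path, 'a'):
                pass
        os.chmod(path, mode)
    except OSError:
        # Not permitted (for example, the file is owned by root); fall through
        # and let the return value tell the caller whether logging to it works.
        pass
    return os.path.isfile(path) and os.access(path, os.W_OK)
```

A caller would run this before `logging.config.dictConfig(...)` and skip or replace the file handler when it returns False, so the PermissionError on '/var/log/platform.log' never reaches fwutil.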
How to verify it
Run manually 'fwutil' (without sudo)
The new command output contains no exception.
Signed-off-by: Taras Keryk <tarasx.keryk@intel.com>
* Added file_check for checking access to log files for eeprom.py
Signed-off-by: Taras Keryk <tarasx.keryk@intel.com>
* Removed unused import
* Added logfile_create to eeprom.py and chassis.py
Signed-off-by: Taras Keryk <tarasx.keryk@intel.com>
* Created platform_utils.py
Signed-off-by: Taras Keryk <tarasx.keryk@intel.com>
* Added interpreter string to platform_utils.py
Signed-off-by: Taras Keryk <tarasx.keryk@intel.com>