This change was submitted directly to 202205 but it's also needed in master and 202305 with SAI9.x
#13346
There has been a couple CSPs for this as well:
CS00012273013 - [7.1][J2, J2c+] Disable SA Equals DA trap on DNX
CS00012320965 - SAI9.2: iBGP doesn't work due to SA_EQUALS_DA trap
If SA_EQUALS_DA trap is enabled iBGP won't work as the Ethernet-IB0 ports are expected to get packets with SA==DA.
In the VOQ chassis design, for outgoing control plane packets, the packets goes the recycle port for routing, therefore the dmac of the packet should be the asic router mac. The source mac is assigned by the kernel, so it is also the asic router mac.
SAI 9.x requires a SYNCD_SHM_SIZE specified otherwise it will default to 64mb which is insufficient for syncd.
E.G. of a few failures seen when insufficient shmem was set
ha_init: The file: warmboot_data_0 is of size=762[MB] and is beyond the directory: /dev/shm available storage of size=64[MB]#015
syncd.sh[26074]: Cannot get SYNCD_SHM_SIZE for chip: [869] in /usr/share/sonic/device/x86_64-broadcom_common/syncd_shm.ini. Skip set SYNCD_SHM_SIZE.
Syncd hangs here:
syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_shr_ha_section_resize:536 start=0x7f6e641b4000, end=0x7f6e645b4000, len=302276608, free=0x7f6e641b4000
Broadcom recommended using 1gb for DNX devices.
Since currently we don't use SAI9.x on master and 202305 this change won't fix anything until we upgrade the SAI on those branches.
These devices will not reliabily report the proper devid and vendorid
when reading it is read directly from the pci config space.
It can be read but shouldn't be compared against some fixed value like
the one stored in pcie.yaml.
Since this makes pcied unhappy, the simplest path forward is to just
remove this device from monitoring.
Added support data for fabric monitoring in CONFIG_DB
The CONFIG_DB now has the FABRIC_MONITOR|FABRIC_MONITOR_DATA table for default value for fabric port monitoring. An example output of getting this table is:
sonic-db-cli CONFIG_DB hgetall "FABRIC_MONITOR|FABRIC_MONITOR_DATA"
{'monErrThreshCrcCells': '1', 'monErrThreshRxCells': '61035156', 'monPollThreshIsolation': '1', 'monPollThreshRecovery': '8'}
The CONFIG_DB now also has a table for each fabric port for its isolate status.
An example output of getting this table is:
sonic-db-cli CONFIG_DB hgetall "FABRIC_PORT|Fabric20"
{'alias': 'Fabric20', 'isolateStatus': 'False', 'lanes': '20'}
Why I did it
Work item tracking
Microsoft ADO (24182162):
How I did it
update the config.bcm to set the default fec RS 100G Linecard
How to verify it
Tests on chassis
* Update PG headroom settings ports based on port speed/cable length
* Updated XOFF settings to use chip level numbers than core
* Updated PG headroom based on uplink/downlink side
* fix for sonic-config-gen tests
* More fixes for unit test cases
* more test fixes
* Merged multiple functions into one
Why I did it
Update ECN settings for T2 chassis
How I did it
Updated qos config file to load these settings during switch bootup
How to verify it
Verified on line card on T2 chassis
Why I did it
Support Egress Mirroring on supported Arista platforms
How I did it
Add necessary soc properties for egress mirroring recycle ports to be created
Signed-off-by: Nathan Wolfe <nwolfe@arista.com>
Updated asic_port_names for all Arista LC SKUs to follow latest naming
conventions to remove redundant ASICx suffix. For
Arista-7800R3-48CQ2-C48, added the asic_port_name mapping.
Provide platform-components.json for Clearwater2 and Wolverine
These files are needed for fwutil platform sonic-mgmt tests to pass.
Fix PikeZ platform_components.json
Co-authored-by: Patrick MacArthur <pmacarthur@arista.com>
Co-authored-by: Andy Wong <andywong@arista.com>
To support 64 cores on arista skus. Fixesaristanetworks/sonic#77
Remapped recycle ports to lowers core port ids and set appl_param_nof_ports_per_modid to 64.
Eariler the SDK stat polling was erroneously set to once every msec
which is far more frequent than required by SWSS. The new setting, which
is consistent with other vendor SKUs, is once a second. The net result
is reduced CPU MHz by syncd.
Why I did it
On DNX (J2/J2c/J2c+) platforms, Single Path Nexthops and ECMp Nexthop resources(FECs) are shared. BRCM SAI do not have partition of this resource, and hence more single path Nexthop entries, causes ECMP programming to fail in scaled setup.
How I did it
Broadcom provided SAI changes to reserve resources for single path nexthop entries(More details in CSP: https://brcmsemiconductor-csm.wolkenservicedesk.com/wolken-support/allcases/request-details?requestId=CS00012251649).
Along with SAI changes, they provided configurable Macro/flag to reserve NON_ECMP entries.
This PR is to add that flag in various sai.profile files wherever applicable.
PS: We are reserving 3072 single path Nexthop entries on each Linecard. Calculation is as follows.
Max Slots per chassis: 8
Max No of Ports(each LC): 64
MyIP/Subnet Entries per port: 4(v4/v6)
Nbr Entries Per port: 2(v4/v6)
Total Non_ECMP Count: 8x64x(4+2) = 3072
How to verify it
Without this change, the ECMP group count will be shown as Max_count in 'crm show resources all' command, and with this change the ECMP group count will be decreased by 24(3072/128).
For 7800 LCs, set LAG mode to support 1024 number of 16-member system
LAGs.
Why I did it
The SOC property changes are necessary to match #10519 which increases the number of system LAG IDs to 1024.
Description for the changelog
For 7800 LCs, set LAG mode to support 1024 number of 16-member system
LAGs.
Update the bcm config file system_ref_core_clock_khz param to handlesystems with J2cplus linecards.
We need system_ref_core_clock_khz to be set to 1600000 for supporting j2 and j2cplus linecards on the same chassis.
* Removed unused default_config.json
* Remove asic.conf file from HW SKUs directories as they are not used by upstream code
* Enable dynamic PCI ID identification on Otterlake2
Co-authored-by: Maxime Lorrillere <mlorrillere@arista.com>
Why I did it
Fixes some pmon errors/warnings by providing missing configuration files
How I did it
Add missing pcie.yaml and sensors.conf for supported linecards
How to verify it
pcie-check should pass
sensors should display proper sensor names
Why I did it
The buffer pool & profile setting in buffer template was not correct and caused the errors like the following:
ERR swss#orchagent: :- parseReference: malformed reference:[BUFFER_PROFILE|ingress_lossless_profile]. Must not be surrounded by [ ]
How I did it
Fix the buffer pool & profile setting by removing "[]".
How to verify it
Loaded image with this fix in a switch and made sure the error was not seen anymore.
#### Why I did it
Add platform_asic file to each platform folder in sonic-device-data package. The file content will be used as the ground truth of mapping from PLATFORM_STRING to switch ASIC family.
One use case of the mapping is to prevent installing a wrong image, which targets for other ASIC platforms. For example, currently we have several ONIE images naming as sonic-*.bin, it's easy to mistakenly install the wrong image. With this mapping built into image, we could fetch the ONIE platform string, and figure out which ASIC it is using, and check we are installing the correct image.
After this PR merged, each platform vendor has to add one mandatory text file `device/PLATFORM_VENDOR/PLATFORM_STRING/platform_asic`, with the content of the platform's switch ASIC family.
I will update https://github.com/Azure/SONiC/wiki/Porting-Guide after this PR is merged.
You can get a list of the ASIC platforms by `ls -b platform | cat`. Currently the options are
```
barefoot
broadcom
cavium
centec
centec-arm64
generic
innovium
marvell
marvell-arm64
marvell-armhf
mellanox
nephos
p4
vs
```
Also support
```
broadcom-dnx
```
#### How I did it
#### How to verify it
Test one image on DUT. And check the folders under `/usr/share/sonic/device`
Why I did it
Fix an issue on the Clearwater2 linecard.
When the linecard is started with a fresh image without configuration, phys would not be initialized.
How I did it
Added default_sku for Clearwater2 which prevents config-setup from failing to create a default config_db.json.
Added some extra logic in the phy-credo-init script to run the phy_config.sh of the hwsku pointed by default_sku if the DEVICE_METADATA.localhost.hwsku information is not populated in CONFIG_DB.
How to verify it
Booting an image with this change and without configuration will lead to the phys being initialized using the phy_config.sh from default_sku.
This change introduces 3 columns in the port_config.ini file.
These are coreId, corePortId and numVoq.
The ports for inband and recirc were also renamed properly.
Prevent system-healthd from service from failing at boot time due to missing configuration.
Also adds basic support for healthd.
The following caveat exists with this placeholder configuration:
- No PSU monitoring (sensors/fans)
- No ASIC temperature monitoring
We don't need a custom platform reboot on Clearwater2(Ms). They are expected to be rebooted via a normal linux soft reboot.
Remove symlink to the arista common platform reboot for those 2 platforms.
bring up chassisdb service on sonic switch according to the design in
Distributed Forwarding in VoQ Arch HLD
Signed-off-by: Honggang Xu <hxu@arista.com>
**- Why I did it**
To bring up new ChassisDB service in sonic as designed in ['Distributed forwarding in a VOQ architecture HLD' ](90c1289eaf/doc/chassis/architecture.md).
**- How I did it**
Implement the section 2.3.1 Global DB Organization of the VOQ architecture HLD.
**- How to verify it**
ChassisDB service won't start without chassisdb.conf file on the existing platforms.
ChassisDB service is accessible with global.conf file in the distributed arichitecture.
Signed-off-by: Honggang Xu <hxu@arista.com>
- Merge chassis codebase upstream
- Add support for Otterlake supervisor
- Add support for NorthFace and Camp chassis
- Add support for Eldridge, Dragonfly and Brooks fabrics
- Add support for Clearwater2 and Clearwater2Ms linecards
- Add new arista Cli to power on/off cards
- Add new arista show Cli to inspect supervisor, chassis, fabrics and linecards