The race condition could happen like this:
When an interface is enslaved into the port channel immediately after
it is created, the order of creating the ifinfo and linking the ifinfo to
the port is not guaranteed.
Please check the patch commit message to get full details.
Signed-off-by: Shu0T1an ChenG <shuche@microsoft.com>
ARM architecture support in SONiC
make configure platform=[ASIC_VENDOR_ARCH] PLATFORM_ARCH=[ARM_ARCH]
SONIC_ARCH: default amd64
armhf - arm32bit
arm64 - arm64bit
Signed-off-by: Antony Rheneus <arheneus@marvell.com>
* Added debug symbols to many debug dockers.
* For debug images *only*:
1) Archive source files into debug image
2) Archived source is copied into /src
3) Created an empty dir /debug
4) Mount both /src as ro & /debug as rw into every docker
5) Login banner will give some details on /src & /debug
6) Devs can copy core file into /debug and view it from inside a container.
7) Dev may create all gdb logs and other data directly into /debug.
* Dropped redundant REDIS_TOOLS per review comments.
* Added debug symbols to frr package and hence FRR based BGP docker.
* 1) Moved dbg_files.sh to scripts/
2) Src directories to archive are now collected from individual Makefiles.
3) Added a few more debug symbols.
4) Added a few more debug dockers.
Hereafter, no more changes except per review comments.
To debug:
Install required version of debug image in Switch or VM.
Copy core file into /debug of host
Get into Docker
gdb /usr/bin/<daemon> -c /debug/<your core file>
set directory /src/... <-- inside gdb to get the source
For non-in-depth debugging:
Download corresponding debug Docker image (docker-...-dbg.gz) to your VM
Load the image
Run image with entrypoint as 'bash' with dir containing core mapped in.
Run gdb on the core.
Port libteam patch which fixes the race condition we observed during
warm reboot.
Remove early patches: 0006, 0008, 0009.
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
Backport of
54f137c105
According to 6.4.15 of IEEE 802.1AX-2014, Figure 6-22, selecting the
port moves the MUX state from DETACHED to ATTACHED.
But the ATTACHED state does not mean that the port can send and receive
user frames; that is the COLLECTING_DISTRIBUTION state. To move the MUX
state from ATTACHED to COLLECTING_DISTRIBUTION, the partner state must be
in sync and the port must be selected.
In function lacp_port_actor_update(), only INFO_STATE_SYNCHRONIZATION
should be set in actor.state when the port is selected.
INFO_STATE_COLLECTING and INFO_STATE_DISTRIBUTING should be false in the
ATTACHED state and set to true once INFO_STATE_SYNCHRONIZATION is set in
partner.state.
In function lacp_port_should_be_{enabled, disabled}(), we also need to
check the INFO_STATE_SYNCHRONIZATION bit of partner.state.
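The state rules above can be sketched in a few lines of Python. This is
only an illustration of the described behavior, not the libteam code: the
names MuxState, mux_state, ActorState and actor_state are hypothetical,
and the actor bits are modeled as plain booleans.

```python
from dataclasses import dataclass
from enum import Enum, auto


class MuxState(Enum):
    DETACHED = auto()
    ATTACHED = auto()
    COLLECTING_DISTRIBUTION = auto()


def mux_state(selected: bool, partner_sync: bool) -> MuxState:
    """Derive the MUX state from port selection and partner sync."""
    if not selected:
        return MuxState.DETACHED
    if not partner_sync:
        # Selected, but the partner is not in sync yet: no user frames.
        return MuxState.ATTACHED
    return MuxState.COLLECTING_DISTRIBUTION


@dataclass(frozen=True)
class ActorState:
    synchronization: bool
    collecting: bool
    distributing: bool


def actor_state(selected: bool, partner_sync: bool) -> ActorState:
    """Actor bits: sync follows selection, the rest follow the MUX state."""
    forwarding = mux_state(selected, partner_sync) is MuxState.COLLECTING_DISTRIBUTION
    return ActorState(
        synchronization=selected,
        collecting=forwarding,
        distributing=forwarding,
    )
```

With this model, a selected port whose partner is not yet in sync reports
synchronization only, and collecting/distributing stay off until the
partner sync bit is seen.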
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
When adding a LAG member dynamically after the system boots up, the teamd
port priv change handler could re-enter itself, causing the add
operation to fail.
While handling the PORT_CHANGE event, the teamd_per_port.c port priv
change handler is called. It then calls runner_lacp to add the port to
the LAG, which causes IFINFO_CHANGE to be notified and the priv change
handler to be called again. This re-entrance causes runner_lacp
port_added to be called a second time, which disrupts the add sequence
already in progress and makes the LAG member add operation fail.
Preventing re-entrance of the per-port priv change handler solves the
problem.
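A minimal sketch of such a re-entrance guard, in Python rather than the C
of the actual patch. The class name PortPrivChangeHandler and its callback
are hypothetical; the point is only that a nested call for a port that is
already being handled returns early instead of running port_added again.

```python
class PortPrivChangeHandler:
    """Runs a port_added callback at most once per in-flight port."""

    def __init__(self, on_port_added):
        self._on_port_added = on_port_added
        self._in_progress = set()  # ports currently being handled

    def handle(self, port_name):
        """Return True if the callback ran, False if the call was nested."""
        if port_name in self._in_progress:
            # Re-entered from a notification raised by our own callback.
            return False
        self._in_progress.add(port_name)
        try:
            self._on_port_added(port_name)
        finally:
            # Always clear the marker so later events are handled again.
            self._in_progress.discard(port_name)
        return True
```

If the callback itself triggers another handle() for the same port, as the
IFINFO_CHANGE notification does in the description above, the nested call
is ignored and the add sequence runs once.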
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
A race condition has been noticed after warm reboot: sometimes, when
the port_changed notification was received, the link message didn't
have the device name. Without the device name, creating the team port
would fail.
Register for the interface information change notification so that,
when the device name later becomes available, team port creation is
retried.
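The retry can be pictured as follows. This is a sketch under assumptions,
not the patch itself: TeamPortCreator and its method names are
hypothetical, and ports are keyed by ifindex for illustration.

```python
class TeamPortCreator:
    """Defers team port creation until the device name is known."""

    def __init__(self):
        self.pending = set()  # ifindexes seen without a device name
        self.ports = {}       # ifindex -> device name of created ports

    def on_port_changed(self, ifindex, ifname):
        """Create the port, or remember it if the name is missing."""
        if not ifname:
            self.pending.add(ifindex)
            return False
        self.ports[ifindex] = ifname
        self.pending.discard(ifindex)
        return True

    def on_ifinfo_changed(self, ifindex, ifname):
        """Retry creation for a pending port once its name shows up."""
        if ifindex in self.pending and ifname:
            return self.on_port_changed(ifindex, ifname)
        return False
```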
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
When using actor port number 0 in lag configuration, IO cannot be sent to
peer. Increase actor port number by 1 to keep uniqueness and at the same
time, avoid using actor port number 0.
Ref. 802.1AX 6.3.4 Port identification
Signed-off-by: Ying Xie <ying.xie@microsoft.com>
- What I did
Fixed a vanilla teamd bug which prevented teamd from having a correct view of the kernel state. See bug #2 in the message.
Changed schema for LACP port id.
Changed severity of an error message.
Removed the logic that disabled warm_start_read mode when teamd started. It didn't work in system restart mode because interfaces are added one by one, and it is impossible to tell when all of them have been added.
- How I did it
I've added team_refresh() on every port addition
I extract the port id from the port name. Currently only the "EthernetX" scheme is supported; more schemes need to be added if the port naming scheme changes.
_err -> _info
...
- How to verify it
Build the image, install it on your DUT, reboot it once, then reboot it in WR mode while checking the LACP state on the remote side. The state shouldn't flip.
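For illustration, extracting a port id from an "EthernetX" name could look
like the sketch below. The function name lacp_port_id is hypothetical, and
the offset of 1 is an assumption carried over from the earlier change that
avoids actor port number 0; the actual patch may map names differently.

```python
import re

_ETHERNET_RE = re.compile(r"Ethernet(\d+)")


def lacp_port_id(port_name):
    """Map 'EthernetX' to X + 1 (assumed offset, so the id is never 0)."""
    match = _ETHERNET_RE.fullmatch(port_name)
    if match is None:
        # Only the "EthernetX" scheme is handled in this sketch.
        raise ValueError(f"unsupported port name scheme: {port_name!r}")
    return int(match.group(1)) + 1
```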
* Don't put down LAG interface when it starts in WR mode
* Change logic. Don't touch the carrier in WR mode until it could be in UP mode
* Change control plane restore logic in WR mode
* Fix teamd behavior for Warm-reboot mode
* Don't save 'read' state into the struct. Try to read a lacp file every time a port starts.
* Fix filename for access()
* [libteam] Add fallback support for single-member-port LAG
* Allow the port to be selected if the LAG is configured
with fallback and port is in defaulted state due to missing
LACP PDUs from remote end
* Only enable port if LAG is admin up and the member port
is link up
* [team] Add lacp fallback config to teamd.j2 template
* [teamd] Resolve config conflict between fallback and minlink
* Remove min_link config if fallback is configured
* Add support for fallback config in minigraph
* [teamd] Only enable fallback if it is single-member-port LAG
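The fallback rules listed above can be summarized in a small Python
sketch. The config keys "fallback" and "min_links" and both function names
are hypothetical stand-ins for illustration; they are not the keys used by
teamd.j2 or the minigraph parser.

```python
def resolve_lag_config(config, member_count):
    """Apply the fallback rules to a copy of a LAG config dict."""
    resolved = dict(config)
    # Fallback is only honored for a single-member-port LAG.
    if resolved.get("fallback") and member_count != 1:
        resolved["fallback"] = False
    # Fallback and a minimum-links setting conflict; fallback wins.
    if resolved.get("fallback"):
        resolved.pop("min_links", None)
    return resolved


def fallback_allows_selection(fallback, port_defaulted):
    """A defaulted port (no LACP PDUs from the peer) may still be selected."""
    return bool(fallback and port_defaulted)
```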
Signed-off-by: Haiyang Zheng <haiyang.z@alibaba-inc.com>
* [teamd] Removing the admin status check in lacp_port_link_update
Will submit another pull request to fix this issue.
Signed-off-by: Haiyang Zheng <haiyang.z@alibaba-inc.com>
* [config]: Add SONIC_CONFIG_MAKE_JOBS
This config option allows user to specify -j value that will be passed
to each package build.
Signed-off-by: marian-pritsak <marianp@mellanox.com>
* Build improvements
Fix dependencies
Add configuration options
Automatically build sonic-slave
* Set default number of jobs to 1
* Auto generate target/debs directory
Signed-off-by: marian-pritsak <marianp@mellanox.com>
* Automatically remove sonic-slave container after exit
* Silence clean-logs
* Add SONIC_CLEAN_TARGETS to clean
* Use second expansion for clean dependencies
* Avoid creating empty log files
Remove log file on flush instead of writing empty string
* Put dpkg install inside lock
Use same lock as debian install targets do to avoid
race condition in dpkg installation
* Remove redirect to log from docker save
* Add .platform dependency to all and clean targets
* Remove header and footer from clean targets
* Disable messages for SONIC_CLEAN_TARGETS
* Exit with error if dpkg-buildpackage fails
* Set new location for debs in build_debian.sh
* Add recipe for docker-database
* Update redis version to 3.2.4
* Add support for p4 platform
* Add recipe for snmpd
* Add slave targets to phony and make all target default
* Remove build.sh from thrift
* Add versioning to team, nl, hiredis and initramfs
* Change sonic-slave to support snmpd build from sources
* Remove src/tenjin
* Add recipe for lldpd
* Add recipe for mpdecimal
* Remove hiredis directory on rebuild
* Add recipe for Mellanox hw management
* Remove generic image from all targets for Mellanox
* Add support for python wheels
* Add lldp and snmp dockers
* Sync docker-database to include libjemalloc
* Fix asyncsnmp variable name
* Change default build configuration
Redirect output to log files by default
Set number of jobs to nproc value
Do not print dependencies
Fix logging to print log of failed job into console
* Use docker inspect to check if sonic-slave image exists
* Use config in slave.mk directly
* Disable color output by default
* Remove sswsdk dependency from lldp and snmp dockers
* Fix comment in py wheels install targets
* Add dependency between two versions of sswsdk
* Add containers to mellanox platform
lldp, snmp and database containers
* Add recipe for team docker
* Add team docker to mellanox platform
* Encrypt password passed to build_debian.sh
* Update mellanox SAI version
Make version and revision setting only in main recipe
* Fix error handling in makefiles
As makefiles use .ONESHELL we should add -e
option to shell options in order to exit after any command fails
* Add recipe for platform monitor image
* Add platform monitor to mellanox targets
* Ignore submodules when building base image