We'll discuss the segmentation fault bug that appears when more than one processor is attached to the same sensor, and what happens to a Capture that is processed by two processors and therefore associated with different floating frames in each of them.
There are a few things needed to understand the problem.
WOLF nodes always connect upwards and downwards to other nodes: Frames link downwards to Captures, and Captures link upwards to Frames.
WOLF processors, and in particular ProcessorTracker, work with three Captures (see the sketch after this list):
- Origin: links to a keyframe in the WOLF tree.
- Last: links to a non-keyframe present in the same Processor. We call these "floating frames", since they are not present in the WOLF Trajectory, only local to the processors.
- Incoming: usually not linked to any frame, since it is the one just being received by the processor. It may be linked temporarily, before Incoming becomes Last at the end of the processing cycle.
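As a rough sketch, these three Captures live as pointers in the processor. Only `last_ptr_` is quoted verbatim later in this thread; the other names and types are assumptions for illustration:

```cpp
// Illustrative sketch of ProcessorTracker's three capture pointers.
// Only last_ptr_ appears verbatim in this thread; origin_ptr_ and
// incoming_ptr_ are assumed names following the same convention.
class ProcessorTracker : public ProcessorBase
{
  protected:
    CaptureBasePtr origin_ptr_;   // linked to a keyframe in the WOLF tree
    CaptureBasePtr last_ptr_;     // linked to a floating (non-key) frame, local to this processor
    CaptureBasePtr incoming_ptr_; // usually not linked to any frame yet
};
```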
The issue is that if two processors share the same Capture, then there are two floating frames (one per processor) but only one Capture, and a Capture cannot be linked to two Frames at once.
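To illustrate the failure mode, here is a minimal hypothetical sequence; `link()` and `getFrame()` stand in for the actual WOLF linking calls, and `frame_floating_A`/`frame_floating_B` are the two processors' floating frames:

```cpp
// Hypothetical sketch: two processors share one Capture and each hangs
// it under its own floating frame. Since a Capture holds a single
// parent Frame, the second link clobbers the first.
capture->link(frame_floating_A); // processor A: capture->getFrame() == frame_floating_A
capture->link(frame_floating_B); // processor B: the up-link to frame_floating_A is lost
// Any code in processor A that dereferences capture->getFrame() now
// reaches the wrong frame, or a dangling one: hence the segfaults.
```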
The solution we envision is to remove the non-keyframes from the processors, and use only keyframes.
Also, we need to check where appropriate whether the Capture is already present and linked to another keyframe, and act accordingly. This is what we took care of a couple of weeks ago, but the solution might need to live somewhere other than where we put it then.
For this, the whole ProcessorTracker::processCapture() function needs to be rewritten. If you look at it, you'll see that there are 6 possible cases to handle: FIRST_TIME, SECOND_TIME, and RUNNING, each of them WITH and WITHOUT KEYFRAME (sketched below). The WITH_KEYFRAME and WITHOUT_KEYFRAME variants refer to whether or not a keyframe has been created by another processor, in which case we need to decide if we join it or not.
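For reference, the six cases could be enumerated along these lines. Only SECOND_TIME_WITH_KEYFRAME is named verbatim below; the remaining names are extrapolated from the FIRST_TIME / SECOND_TIME / RUNNING × WITH / WITHOUT pattern:

```cpp
// Sketch of the cases ProcessorTracker::processCapture() must handle.
enum class ProcessingStep
{
    FIRST_TIME_WITH_KEYFRAME,    // first capture; a keyframe from another processor exists
    FIRST_TIME_WITHOUT_KEYFRAME, // first capture; no external keyframe
    SECOND_TIME_WITH_KEYFRAME,   // should never appear; caught with a debug message
    SECOND_TIME_WITHOUT_KEYFRAME,
    RUNNING_WITH_KEYFRAME,       // decide whether to join the external keyframe
    RUNNING_WITHOUT_KEYFRAME
};
```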
This makes the work delicate in two senses:
- The algorithm is not trivial, since each of the 6 cases needs special attention to detail.
- Any mistake in this class will compromise the functioning of all classes deriving from it.
In any case, all the unit tests in the tests/ directory need to keep passing after the fix. Since there are some 70 test files, each with many tests, I hope that passing them all is a reasonable guarantee that the job was well done.
Then, in principle, we just need to take care of the situation at the time of a keyframe_callback. So only the three WITH_KEYFRAME cases are concerned. Of these, SECOND_TIME_WITH_KEYFRAME should never appear, and it is only there to catch it with a debug message in case it does. This leaves only two cases to address (see the sketch below).
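One possible shape of that check inside the two remaining WITH_KEYFRAME branches; this is a sketch, and where this logic should actually live is precisely the open question above. `keyframe_from_callback` is a hypothetical name:

```cpp
// Sketch: when a keyframe_callback arrives, first check whether our
// capture was already linked to a keyframe by another processor.
if (incoming_ptr_->getFrame() != nullptr)
{
    // Already linked by another processor: join that keyframe instead
    // of linking again (a second link is exactly the bug above).
}
else
{
    // Normal path: link the capture to the keyframe of the callback.
    incoming_ptr_->link(keyframe_from_callback);
}
```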
Since capture_processing and frame_floating are not linked, all the instances of the following code anywhere else in WOLF immediately fail:
```cpp
processor->getLast()->getFrame(); // from outside the processor
this->last_ptr_->getFrame();      // within the processor, using the method
```
They should be replaced by the new ProcessorBase::getLastFrame():
```cpp
processor->getLastFrame(); // from outside the processor
this->getLastFrame();      // within the processor, using the method
this->last_frame_ptr_;     // within the processor, using the attribute
```
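A minimal sketch of the new accessor, assuming last_frame_ptr_ is a FrameBasePtr member of ProcessorBase as the attribute form above suggests:

```cpp
// Sketch; the exact signature is an assumption.
FrameBasePtr ProcessorBase::getLastFrame() const
{
    return last_frame_ptr_; // a keyframe now, instead of a floating frame
}
```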
I would revise #432 and what is mentioned about #218 (closed). There could be some "exceptions" that were made because of the floating frames and that could now be forbidden again.
Many thanks for your continued attention to this issue, @jsola. I just have two questions:
I assume that you've worked on the branch 489-multiple-processors-per-sensor-2. When I pull it, build it, and run the unit tests on my computer, I get a few failures:
```
93% tests passed, 5 tests failed out of 70

Total Test time (real) =   5.05 sec

The following tests FAILED:
	 21 - gtest_processor_base (SEGFAULT)
	 53 - gtest_map_yaml (Failed)
	 57 - gtest_processor_landmark_external (SEGFAULT)
	 62 - gtest_processor_tracker_feature_dummy (SEGFAULT)
	 63 - gtest_processor_tracker_landmark_dummy (SEGFAULT)
Errors while running CTest
```
This is more than what I get when I do the same thing on other branches (e.g., main, devel, and 489-multiple-processors-per-sensor):
```
99% tests passed, 1 tests failed out of 70

Total Test time (real) =   5.02 sec

The following tests FAILED:
	 53 - gtest_map_yaml (Failed)
Errors while running CTest
```
I'm wondering if this situation is anticipated (and natural), or something to be addressed. My intuition says it is plausible for some unit test results to vary depending on the running environment, but this is something we should avoid if possible.
When I do the same roslaunch runs I did last time (that is, using the Vision and Apriltag modules at the same time), I get the following logging file [logging.txt], which is different from what I posted on Element a week ago but still throws a segfault, in a different context. I'm curious whether this will go away when all plugins are updated to follow the changes in wolfcore (as you've addressed above), or whether it is something we should address at this point before moving forward.
I figured out why the unit test failures happen. An old wolfcore library resided in /usr and the unit tests were linking against it. I removed it completely, rebuilt, and reran the same unit tests, and now they pass without any issues.
I found a minor bug in the apriltag plugin regarding getLast()->getFrame(), which cannot be called now. I substituted it with getLastFrame().
In the plugins and in your ROS modules, this kind of bug needs to be searched for and fixed systematically in all Processors that derive from ProcessorTracker; see the sketch below.
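The fix itself is mechanical; in each derived processor, the pattern looks like this (using only the calls quoted above):

```cpp
// Before: fails now, since the capture and the floating frame are no
// longer linked.
// FrameBasePtr frame = getLast()->getFrame();

// After: use the new accessor provided by ProcessorBase.
FrameBasePtr frame = getLastFrame();
```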