We'll discuss the segmentation fault bug that appears when more than one processor is attached to the same sensor, and what happens to a Capture that is processed by two processors and therefore associated with different floating frames in each of them.
There are a few things needed to understand the problem.
WOLF nodes always connect upwards and downwards to other nodes: Frames link downwards to Captures, and Captures link upwards to Frames.
WOLF processors, and in particular ProcessorTracker, work with three Captures (see the sketch after this list):
- Origin: links to a keyframe in the WOLF tree.
- Last: links to a non-keyframe present in the same Processor. We call these "floating frames", since they are not present in the WOLF Trajectory, only local to the processors.
- Incoming: usually not linked to any frame, since it is the one just being received by the processor. It may be linked temporarily, before Incoming becomes Last at the end of the processing cycle.
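As a rough sketch, these three Captures live as pointers in the processor. Only `last_ptr_` is quoted verbatim later in this thread; the other names and types are assumptions for illustration:

```cpp
// Illustrative sketch of ProcessorTracker's three capture pointers.
// Only last_ptr_ appears verbatim in this thread; origin_ptr_ and
// incoming_ptr_ are assumed names following the same convention.
class ProcessorTracker : public ProcessorBase
{
  protected:
    CaptureBasePtr origin_ptr_;   // linked to a keyframe in the WOLF tree
    CaptureBasePtr last_ptr_;     // linked to a floating (non-key) frame, local to this processor
    CaptureBasePtr incoming_ptr_; // usually not linked to any frame yet
};
```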
The issue is that if two processors share the same Capture, then there are two floating frames (one per processor) but only one Capture, and a Capture cannot be linked to two Frames at once.
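To illustrate the failure mode, here is a minimal hypothetical sequence; `link()` and `getFrame()` stand in for the actual WOLF linking calls, and `frame_floating_A`/`frame_floating_B` are the two processors' floating frames:

```cpp
// Hypothetical sketch: two processors share one Capture and each hangs
// it under its own floating frame. Since a Capture holds a single
// parent Frame, the second link clobbers the first.
capture->link(frame_floating_A); // processor A: capture->getFrame() == frame_floating_A
capture->link(frame_floating_B); // processor B: the up-link to frame_floating_A is lost
// Any code in processor A that dereferences capture->getFrame() now
// reaches the wrong frame, or a dangling one: hence the segfaults.
```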
The solution we envision is to remove the non-keyframes from the processors, and use only keyframes.
Also, we need to check where appropriate whether the Capture is already present and linked to another keyframe, and act accordingly. This is what we took care of a couple of weeks ago, but the solution might need to live somewhere other than where we put it then.
For this, the whole ProcessorTracker::processCapture() function needs to be rewritten. If you look at it, you'll see that there are 6 possible cases to handle: FIRST_TIME, SECOND_TIME, and RUNNING, each of them WITH and WITHOUT KEYFRAME (sketched below). The WITH_KEYFRAME and WITHOUT_KEYFRAME variants refer to whether or not a keyframe has been created by another processor, in which case we need to decide if we join it or not.
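For reference, the six cases could be enumerated along these lines. Only SECOND_TIME_WITH_KEYFRAME is named verbatim below; the remaining names are extrapolated from the FIRST_TIME / SECOND_TIME / RUNNING × WITH / WITHOUT pattern:

```cpp
// Sketch of the cases ProcessorTracker::processCapture() must handle.
enum class ProcessingStep
{
    FIRST_TIME_WITH_KEYFRAME,    // first capture; a keyframe from another processor exists
    FIRST_TIME_WITHOUT_KEYFRAME, // first capture; no external keyframe
    SECOND_TIME_WITH_KEYFRAME,   // should never appear; caught with a debug message
    SECOND_TIME_WITHOUT_KEYFRAME,
    RUNNING_WITH_KEYFRAME,       // decide whether to join the external keyframe
    RUNNING_WITHOUT_KEYFRAME
};
```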
This makes the work delicate in two senses:
- The algorithm is not trivial, since each of the 6 cases needs special attention to detail.
- Any mistake in this class will compromise the functioning of all classes deriving from it.
In any case, all the unit tests in the tests/ directory need to keep passing after the fix. Since there are some 70 test files, each with many tests, I hope that passing them all is a reasonable guarantee that the job was well done.
Then, in principle, we just need to take care of the situation at the time of a keyframe_callback. So only the three WITH_KEYFRAME cases are concerned. Of these, SECOND_TIME_WITH_KEYFRAME should never appear, and it is only there to catch it with a debug message in case it does. This leaves only two cases to address (see the sketch below).
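One possible shape of that check inside the two remaining WITH_KEYFRAME branches; this is a sketch, and where this logic should actually live is precisely the open question above. `keyframe_from_callback` is a hypothetical name:

```cpp
// Sketch: when a keyframe_callback arrives, first check whether our
// capture was already linked to a keyframe by another processor.
if (incoming_ptr_->getFrame() != nullptr)
{
    // Already linked by another processor: join that keyframe instead
    // of linking again (a second link is exactly the bug above).
}
else
{
    // Normal path: link the capture to the keyframe of the callback.
    incoming_ptr_->link(keyframe_from_callback);
}
```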
Since capture_processing and frame_floating are not linked, all the instances of the following code anywhere else in WOLF immediately fail:
```cpp
processor->getLast()->getFrame(); // from outside the processor
this->last_ptr_->getFrame();      // within the processor, using the method
```
They should be replaced by the new ProcessorBase::getLastFrame():
```cpp
processor->getLastFrame(); // from outside the processor
this->getLastFrame();      // within the processor, using the method
this->last_frame_ptr_;     // within the processor, using the attribute
```
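A minimal sketch of the new accessor, assuming last_frame_ptr_ is a FrameBasePtr member of ProcessorBase as the attribute form above suggests:

```cpp
// Sketch; the exact signature is an assumption.
FrameBasePtr ProcessorBase::getLastFrame() const
{
    return last_frame_ptr_; // a keyframe now, instead of a floating frame
}
```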
I would revise #432 and what is mentioned about #218 (closed). There could be some "exceptions" that were made because of the floating frames and that could now be forbidden again.
Many thanks for your continued attention to this issue, @jsola. I just have two questions:
I assume that you've worked on the branch 489-multiple-processors-per-sensor-2. When I pull it, build it, and run the unit tests on my computer, I get a few failures:
```
93% tests passed, 5 tests failed out of 70

Total Test time (real) =   5.05 sec

The following tests FAILED:
	 21 - gtest_processor_base (SEGFAULT)
	 53 - gtest_map_yaml (Failed)
	 57 - gtest_processor_landmark_external (SEGFAULT)
	 62 - gtest_processor_tracker_feature_dummy (SEGFAULT)
	 63 - gtest_processor_tracker_landmark_dummy (SEGFAULT)
Errors while running CTest
```
This is more than what I get when I do the same thing on other branches (e.g., main, devel, and 489-multiple-processors-per-sensor):
```
99% tests passed, 1 tests failed out of 70

Total Test time (real) =   5.02 sec

The following tests FAILED:
	 53 - gtest_map_yaml (Failed)
Errors while running CTest
```
I'm wondering if this situation is anticipated (and natural), or something to be addressed. My intuition says it is plausible for some unit test results to vary depending on the running environment, but this is something we should avoid if possible.
When I do the same roslaunch runs I did last time (that is, using the Vision and Apriltag modules at the same time), I get the following logging file [logging.txt], which is different from what I posted on Element a week ago but still throws a segfault, in a different context. I'm curious whether this will go away when all plugins are updated to follow the changes in wolfcore (as you've addressed above), or whether it is something we should address at this point before moving forward.
I figured out why the unit test failures happen. An old wolfcore library resided in /usr and the unit tests were linking against it. I removed it completely, rebuilt, and reran the same unit tests, and now they pass without any issues.
I found a minor bug in the apriltag plugin regarding getLast()->getFrame(), which cannot be called now. I substituted it with getLastFrame().
In the plugins and in your ROS modules, this kind of bug needs to be searched for and fixed systematically in all Processors that derive from ProcessorTracker; see the sketch below.
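The fix itself is mechanical; in each derived processor, the pattern looks like this (using only the calls quoted above):

```cpp
// Before: fails now, since the capture and the floating frame are no
// longer linked.
// FrameBasePtr frame = getLast()->getFrame();

// After: use the new accessor provided by ProcessorBase.
FrameBasePtr frame = getLastFrame();
```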