(a) DAVIS interaction track results
Results for ATNet are obtained from its open-source code.
Interactions are provided by the official DAVIS evaluation robot.
bmx-trees, DAVIS 2017 validation set.
kite-surf, DAVIS 2017 validation set.
pigs, DAVIS 2017 validation set.
scooter-black, DAVIS 2017 validation set.
soapbox, DAVIS 2017 validation set.
(b) Real user-interaction processes
These videos show the entire interaction process with our algorithm.
VIDEO
For objects with complex structure,
users can combine different interaction techniques (clicks, scribbles, local control)
to achieve accurate results.
VIDEO
Clicks with f-BRS can be highly efficient for annotating objects with a clear structure.
VIDEO
Scribbles and clicks are easy to use together.
(c) Real user-interaction results
We test the generalizability and robustness of our method on videos collected from the Internet.
Our method works well even outside of the DAVIS dataset.
151 frames with 6 objects. User time: ~180s.
The two fighters have nearly mirrored appearances.
216 frames with 4 objects. User time: just under 6 seconds!
130 frames with 2 objects. User time: ~60s.
Thin structures, such as the legs of the chair, are captured well.
252 frames with 3 objects. User time: ~60s.
Interaction between moving objects does not pose a major challenge to our method.
181 frames with 3 objects. User time: ~35s.
All the pandas look alike, but our method still handles them efficiently.
168 frames with 1 object. User time: ~40s.
Occlusion by objects with a similar appearance is difficult for iVOS methods to handle.
132 frames with 3 objects. User time: ~20s.