We test the generalizability and robustness of our method on videos collected from the Internet.
Our method works well even outside the DAVIS dataset.
151 frames with 6 objects. User time: ~180s.
The two fighters have nearly mirrored appearances.
216 frames with 4 objects. User time: just under 6 seconds!
130 frames with 2 objects. User time: ~60s.
Thin structures, such as the legs of the chair, are captured well.
252 frames with 3 objects. User time: ~60s.
Interactions between moving objects do not pose a major challenge to our method.
181 frames with 3 objects. User time: ~35s.
All the pandas look alike, but our method still handles them efficiently.
168 frames with 1 object. User time: ~40s.
Occlusion by objects of similar appearance is difficult for iVOS methods to handle.