Because the acquisition and annotation of real-world data are complex, computer vision datasets capture only a fraction of our continuous world. To cope with unseen conditions, fight biases in the training data, or simply reduce our dependency on data, algorithms may be trained in a weakly or unsupervised fashion. Recently, novel avenues of research have emerged to relax supervision (fewer labels, less data), for example using multimodal models, generative AI, transfer learning, and continual learning. These advances let us foresee new frontiers of computer vision, holding immense potential for African society. This 3rd WSCV edition will gather leading figures in computer vision for keynotes and lightning talks on topics such as zero-shot training, multimodal and foundation models, open vocabulary, self-/un-supervised training, diffusion models, and robustness and uncertainty estimation, as well as talks on African initiatives.

📢 The workshop will have a poster session showcasing participants' work on computer vision.
Prizes will be awarded 🏆


News 08/21: Paper submission is now closed. DLI participants can still present their DLI poster at the workshop (see contact at the bottom). We have announced the prize details ;-)

Invited Speakers


Vicky Kalogeiton

École Polytechnique

Daniel Omeiza

University of Oxford

Joyce Nakatumba-Nabende

Makerere University

Candace Ross

Meta AI

Daniel Ajisafe

University of British Columbia

Pierluigi Zama Ramirez

University of Bologna

Program

Times are local Dakar time (GMT).
08:30am
- Workshop start
- Opening remarks. Raoul de Charette, Inria
08:30am
- Oriane Siméoni, Valeo.ai
Object localization (almost) for free harnessing self-supervised features
[+] Abstract
The localization of objects in images is today at the heart of many perception systems. However, training object detectors requires large and expensive annotation campaigns for a finite, pre-defined vocabulary. Instead, being able to discover objects in images without knowing in advance which objects populate a dataset is an exciting prospect. In this talk, we will discuss solutions that exploit self-supervised pre-trained features to perform class-agnostic object localization with zero annotation, without requiring object proposals or expensive exploration of image collections. Then, we will investigate ways to unite unsupervised object localization with the open-vocabulary features of VLMs, leading to good-quality open-vocabulary semantic segmentation with no extra annotation.
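For readers new to this line of work, here is a rough, self-contained sketch in the spirit of LOST (an illustration of the idea, not necessarily the speaker's exact method): frozen self-supervised DINO features alone can localize the main object in an image. The image path example.jpg is a placeholder.

    # Rough LOST-style sketch: class-agnostic object localization from frozen
    # self-supervised DINO features, with zero annotation.
    import torch
    import torchvision.transforms as T
    from PIL import Image

    # Frozen self-supervised ViT-S/16 backbone from the facebookresearch/dino repo.
    model = torch.hub.load("facebookresearch/dino:main", "dino_vits16").eval()

    img = Image.open("example.jpg").convert("RGB")  # placeholder image path
    x = T.Compose([
        T.Resize((480, 480)),
        T.ToTensor(),
        T.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
    ])(img).unsqueeze(0)

    with torch.no_grad():
        # Patch tokens of the last layer; index 0 is the [CLS] token, dropped here.
        feats = model.get_intermediate_layers(x, n=1)[0][0, 1:]  # (900, 384)

    feats = torch.nn.functional.normalize(feats, dim=-1)
    sim = feats @ feats.T  # patch-to-patch cosine similarity

    # LOST heuristic: the seed is the patch positively correlated with the fewest
    # others, i.e. likely inside an object rather than the self-similar background.
    seed = (sim > 0).sum(dim=1).argmin()

    # Patches positively correlated with the seed give a coarse object mask.
    mask = (sim[seed] > 0).reshape(480 // 16, 480 // 16)
    print(mask.int())

The full method adds a careful seed-expansion step and can train a detector on the discovered regions; the point here is only that no labels enter the pipeline.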
09:30am
- Pierluigi Zama Ramirez, University of Bologna
Neural Processing of 3D Neural Fields
[+] Abstract
In recent years, Neural Fields have emerged as an effective tool for encoding diverse continuous signals such as images, videos, and 3D shapes. However, given that Neural Fields are essentially neural networks, it remains unclear whether and how they can be seamlessly integrated into deep learning pipelines for solving downstream tasks. This presentation delves into the novel research problem of Neural Fields processing through deep learning pipelines, exploring techniques for leveraging this data representation to perform tasks such as classification, segmentation, and even more complex tasks like natural language understanding of the neural field content.
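As a concrete, deliberately naive illustration of the problem (an assumption for exposition, not the speaker's pipeline): a neural field is just an MLP mapping coordinates to signal values, so the simplest conceivable way to "process" one is to feed its flattened weights to a downstream model.

    # Toy sketch: fit a tiny neural field, then classify its raw weight vector.
    import torch
    import torch.nn as nn

    # A tiny neural field: 2D coordinates -> grayscale intensity.
    field = nn.Sequential(
        nn.Linear(2, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, 1),
    )

    # Fit the field to a target signal (random samples stand in for an image).
    coords, values = torch.rand(1024, 2), torch.rand(1024, 1)
    opt = torch.optim.Adam(field.parameters(), lr=1e-3)
    for _ in range(200):
        opt.zero_grad()
        loss = nn.functional.mse_loss(field(coords), values)
        loss.backward()
        opt.step()

    # Naive downstream processing: flatten all weights into one vector and
    # classify it with a linear head (10 hypothetical classes).
    weights = torch.cat([p.detach().flatten() for p in field.parameters()])
    classifier = nn.Linear(weights.numel(), 10)
    logits = classifier(weights)
    print(logits.shape)  # torch.Size([10])

Raw weight vectors are sensitive to neuron permutations and initialization, which is precisely why this research direction studies representations and architectures better suited to consuming neural fields.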
10:30am
☕ Coffee Break

11:00am
- Daniel Ajisafe, University of British Columbia
Behind the Scenes - Learning Human Body Pose, Shape and Appearance from Mirror Videos
[+] Abstract
Humans exist as an essential part of the world, and it is important to develop algorithms that can reconstruct humans in their full digital form. While prior works have used marker suits or multiple cameras to collect 3D data for this purpose, mirrors are an affordable and widely available alternative, producing a reflection of a person that is temporally synchronized with the direct view. In this talk, I will uncover what is behind the scenes, specifically our main contributions: extending articulated neural radiance fields to include a notion of the mirror, and making them sample-efficient over potential occlusion regions. I will also demonstrate the benefit of learning a complete body model from mirror scenes.
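To make the mirror cue concrete, here is the underlying geometry (an illustration, not the speaker's pipeline): a planar mirror yields a second, virtual view of the person, and a 3D point p reflects across the mirror plane via the Householder map p' = p - 2((p - o) · n) n, where o is a point on the plane and n its unit normal.

    # Illustrative geometry: reflect 3D body joints across a mirror plane.
    import numpy as np

    def reflect(points, plane_point, plane_normal):
        """Reflect (N, 3) points across the plane (plane_point, plane_normal)."""
        n = plane_normal / np.linalg.norm(plane_normal)
        d = (points - plane_point) @ n        # signed distance to the plane
        return points - 2.0 * d[:, None] * n  # mirror image of each point

    # Toy example: two joints reflected across a vertical mirror at x = 1.
    joints = np.array([[0.2, 1.5, 0.0], [0.4, 1.0, 0.1]])
    mirrored = reflect(joints,
                       plane_point=np.array([1.0, 0.0, 0.0]),
                       plane_normal=np.array([1.0, 0.0, 0.0]))
    print(mirrored)  # x maps to 2 - x: [[1.8 1.5 0. ], [1.6 1. 0.1]]

Treating the reflection as an extra, time-synchronized camera is what makes single-view mirror footage informative enough to constrain 3D pose.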
11:30am
- Joyce Nakatumba-Nabende, Makerere AI Lab
(TBD) Computer vision and African Initiatives at Makerere AI Lab
[+] Abstract
TBD
12:00pm
- Candace Ross, Meta AI
(TBD) Vision and Language
[+] Abstract
TBD
🍽️ Lunch break / 🎓 Mentoring lunch (upon registration at the workshop)
2:00pm
- Daniel Omeiza, University of Oxford
Providing Explanations for Responsible Autonomous Driving
[+] Abstract
The increasing development of sophisticated AI models over the last few years has been characterised by rising societal concerns around safety and trust. Groups promoting responsible AI (RAI) have advocated for explainability as a desirable requirement for AI technologies, including agent-based systems such as autonomous vehicles (AVs). AVs should be able to explain what they have ‘seen’, done, and might do in the environments in which they operate, and do so in intelligible forms. In this talk, I will motivate the need for explainability in autonomous driving, describe existing efforts in CV and NLP to make AVs explainable, and discuss research efforts from our group at Oxford to make AVs explainable and safer.
2:45pm
- Spotlight presentations

  • BioNAS: Incorporating Bio-inspired Learning Rules to Neural Architecture Search, by Imane Hamzaoui
  • RGB UAV Imagery Segmentation: Comparative Study, by Mathews Jahnical Jere
  • AJA-pose: A Framework for Animal Pose Estimation based on VHR Network Architecture, by Austin Kaburia Kibaara, Joan Kabura, Antony M. Gitau, Ciira wa Maina
3:00pm
👁️🗣️🤝🏾 Poster sessions (20 posters) + ☕ Coffee Break

4:00pm
- Vicky Kalogeiton, École Polytechnique
Multimodality for story-level understanding and generation of visual data
[+] Abstract
In this talk, I will address the importance of multimodality (i.e. using more than one modality, such as video, audio, text, masks and clinical data) for story-level recognition and generation. First, I will focus on story-level multimodal video understanding, as audio, faces, and visual temporal structure come naturally with the videos, and we can exploit them for free (FunnyNet-W and Short Film Dataset). Then, I will show some examples of visual generation from text and other modalities (ET, CAD, DynamicGuidance).
5:00pm
- Panel + 🏆 Announcement of the Awards 🏆

5:30pm
- Workshop end

Call for Papers

We welcome submissions of short or regular papers on any computer vision topic, for presentation at the poster session. Submissions may be original or recently published work.

Submission deadline: August 11th 2024 (Anywhere on Earth).
Submission deadline (EXTENSION): August 20th 2024 (Anywhere on Earth).
🏆 Prizes will be awarded. 🏆
$500 cash (gift from Snap)
GoPro HERO12 (gift from GoPro)
Submission website:
https://openreview.net/group?id=DeepLearningIndaba.com/2024/Workshop/WSCV

Instructions:
Submissions should be 4 to 8 pages (excluding reference pages).
We encourage submissions to use our double-column LaTeX kit, but we will accept single- or double-column submissions in any format. Anonymity is optional.
We accept submissions that are original, under review, or already published.

⚠️ If you don't yet have an OpenReview account, note that account creation can take a few days for validation. Create your account as soon as possible.


The topics of interest include, but are not limited to: zero-shot training, multimodal and foundation models, open-vocabulary perception, self-/un-supervised training, diffusion models, robustness and uncertainty estimation, and computer vision initiatives in Africa.

Organizers

Fabio Pizzati

Oxford Uni.

Tuan-Hung Vu

Valeo.ai

Andrei Bursuc

Valeo.ai

Sileye Ba

L'Oréal

Volunteers

Lama Moukheiber

University at Buffalo

Benjamin Rukundo

Makerere University

Volunteers are welcome to help at the workshop. Just contact us.

📢 Want to volunteer? Any questions? Contact Raoul de Charette.


Thanks to our award sponsors