Since the acquisition and annotation of real-world data is complex, computer vision datasets only capture a fraction of our continuous world. To cope with unseen conditions, fight biases in the training data, or simply reduce our dependency on data, algorithms may be trained in a weakly-/un-supervised fashion. Recently, novel avenues of research have emerged to relax supervision (fewer labels, less data), for example using multimodal models, generative AI, transfer learning, continual learning, etc. This lets us foresee new frontiers of computer vision, holding immense potential for African society.
This 3rd WSCV edition will gather leading computer vision figures, with keynotes and lightning talks on topics such as:
zero-shot training, multimodal models / foundation models, open-vocabulary, self-/un-supervised training, diffusion models, robustness and uncertainty estimation; as well as talks on African initiatives.
📢 The workshop will have a poster session showcasing participants' work on computer vision.
Prizes will be awarded 🏆
News 08/21: Paper submission is now closed. DLI participants can still present their DLI poster at the workshop (cf. contact at the bottom). Prize details have been announced ;-)
Invited Speakers
Vicky Kalogeiton
École Polytechnique
Daniel Omeiza
University of Oxford
Joyce Nakatumba-Nabende
Makerere University
Oriane Siméoni
valeo.ai
Candace Ross
Meta AI
Daniel Ajisafe
University of British Columbia
Pierluigi Zama Ramirez
University of Bologna
Program
Times are local Dakar time (GMT).
08:30am
- Workshop start, opening remarks (Raoul de Charette, Inria)
08:30am
- Oriane Siméoni, valeo.ai: Object localization (almost) for free harnessing self-supervised features
[+] Abstract
The localization of objects in images is today at the heart of many perception systems. However, training object detectors requires large and expensive annotation campaigns for a finite, pre-defined vocabulary. Instead, being able to discover objects in images without knowing in advance which objects populate a dataset is an exciting prospect. In this talk we will discuss solutions that exploit self-supervised pre-trained features to perform class-agnostic object localization with zero annotation, without requiring object proposals or expensive exploration of image collections. Then, we will investigate means to unite unsupervised object localization with VLM open-vocabulary features, leading to good-quality open-vocabulary semantic segmentation with no extra annotation.
09:30am
- Pierluigi Zama Ramirez, University of Bologna: Neural Processing of 3D Neural Fields
[+] Abstract
In recent years, Neural Fields have emerged as an effective tool for encoding diverse continuous signals such as images, videos, and 3D shapes. However, given that Neural Fields are essentially neural networks, it remains unclear whether and how they can be seamlessly integrated into deep learning pipelines for solving downstream tasks. This presentation delves into the novel research problem of Neural Fields processing through deep learning pipelines, exploring techniques for leveraging this data representation to perform tasks such as classification, segmentation, and even more complex tasks like natural language understanding of the neural field content.
10:30am
☕ Coffee Break
11:00am
- Daniel Ajisafe, University of British Columbia: Behind the Scenes - Learning Human Body Pose, Shape and Appearance from Mirror Videos
[+] Abstract
Humans exist as an essential part of the world, and it is important to develop algorithms that can reconstruct humans in their full digital form. While prior works have attempted to collect 3D data using marker suits or to achieve this purpose with multiple cameras, mirrors are an affordable and widely available alternative, producing a reflection of the person that is temporally synchronized. In this talk, I will uncover what is behind the scenes: specifically, our main contributions, which extend articulated neural radiance fields to include a notion of the mirror and make them sample-efficient over potential occlusion regions. I will also demonstrate the benefit of learning a complete body model from mirror scenes.
11:30am
- Joyce Nakatumba-Nabende, Makerere AI Lab: (TBD) Computer vision and African Initiatives at Makerere AI Lab
[+] Abstract
TBD
🍽️ Lunch break / 🎓 Mentoring lunch (upon registration at the workshop)
2:00pm
- Daniel Omeiza, University of Oxford: Providing Explanations for Responsible Autonomous Driving
[+] Abstract
The increasing development of sophisticated AI models over the last few years has been characterised by rising societal concerns around safety and trust. Proponents of responsible AI (RAI) have advocated for explainability, a desirable requirement for AI technologies, including agent-based systems such as autonomous vehicles (AVs). AVs should be able to explain what they have ‘seen’, done, and might do in the environments in which they operate, and do so in intelligible forms. In this talk, I will motivate the need for explainability in autonomous driving, talk about existing efforts in CV and NLP to make AVs explainable, and discuss some research efforts from our group at Oxford to make AVs explainable and safer.
2:45pm
- Spotlight presentations
- BioNAS: Incorporating Bio-inspired Learning Rules to Neural Architecture Search
- RGB UAV Imagery Segmentation: Comparative Study
- AJA-pose: A Framework for Animal Pose Estimation based on VHR Network Architecture
3:00pm
👁️🗣️🤝🏾 Poster session (20 posters) + ☕ Coffee Break
4:00pm
- Vicky Kalogeiton, École Polytechnique: Multimodality for story-level understanding and generation of visual data
[+] Abstract
In this talk, I will address the importance of multimodality (i.e. using more than one modality, such as video, audio, text, masks and clinical data) for story-level recognition and generation. First, I will focus on story-level multimodal video understanding, as audio, faces, and visual temporal structure come naturally with the videos, and we can exploit them for free (FunnyNet-W and Short Film Dataset). Then, I will show some examples of visual generation from text and other modalities (ET, CAD, DynamicGuidance).
5:00pm
- Panel + 🏆 Announcement of the Awards 🏆
5:30pm
- Workshop end
Call for Papers
We welcome submissions of short/regular papers on any computer vision topic, for presentation at the poster session.
Submissions can be original or recently published work.
Submission deadline: August 11th 2024 (Anywhere on Earth), extended to August 20th 2024 (Anywhere on Earth).
🏆 Prizes will be awarded. 🏆
$500 cash (gift from Snap)
GoPro HERO12 (gift from GoPro)
Submission website: OpenReview.
Instructions:
Submissions should be 4 to 8 pages (excluding reference pages).
We encourage submissions to use our double-column LaTeX kit, but we will accept single- or double-column submissions in any format. Anonymity is optional.
We accept submissions that are original, under review, or already published.
⚠️ If you don't yet have an OpenReview account, note that account creation can take a few days for validation. Create your account as soon as possible.
The topics of interest include, but are not limited to:
- 3D computer vision
- Adversarial learning, adversarial attack for vision algorithms
- Autonomous agents with vision (reinforcement/imitation learning)
- Biometrics, face, gesture, body pose
- Computational photography, image and video synthesis
- Explainable, fair, accountable, privacy-preserving, ethical computer vision
- Foundation models, multimodal large language models, etc.
- Image recognition and understanding (object detection, categorization, segmentation, scene modeling, visual reasoning)
- Low-level and physics-based vision
- Semi-/Self-/Un-supervised learning and Few-/Zero-shot algorithms
- Transfer learning (domain adaptation, etc.)
- Video understanding (tracking, action recognition, etc.)
- Multi-modal vision (image+text, image+sound, etc.)
Organizers
Raoul de Charette
Inria
Fabio Pizzati
Oxford Uni.
Tuan-Hung Vu
valeo.ai
Andrei Bursuc
valeo.ai
Sileye Ba
L'Oréal
Volunteers
Lama Moukheiber
University at Buffalo
Benjamin Rukundo
Makerere University
Loyani Loyani Kisula
NM-AIST
Volunteers are welcome to help at the workshop. Just contact us.
Important dates
- Submission deadline: August 11, 2024 (AoE), extended to August 20, 2024 (AoE).
- Decision notification: August 23, 2024.
- Workshop date: September 6, 2024.
📢 Want to volunteer? Any questions? Contact Raoul de Charette.