Phong Nguyen

I am a PhD student at the Center for Machine Vision and Signal Analysis (CMVS), University of Oulu, Finland, where I am co-advised by Prof. Janne Heikkilä and Prof. Esa Rahtu.

I have an MS in Electronics and Electrical Engineering (Autonomous AI Drone) from Dongguk University, South Korea, where I was a research assistant for Prof. Kang Ryoung Park. I have a BS in Mechanical Engineering from HUST, Vietnam.

From May to November 2021, I was a research intern at Reality Labs Research in Sausalito, working with Nikolaos Sarafianos, Christoph Lassner, and Tony Tung. I was also very lucky to spend the summer of 2022 as an intern at the NVIDIA Toronto AI Lab, working with Sanja Fidler and Sameh Khamis.

I am currently looking for full-time research and engineering positions related to neural rendering and 3D generative models. Feel free to reach out by email or LinkedIn about future opportunities.

Email  /  CV  /  Google Scholar  /  Twitter  /  Github


I'm interested in 3D reconstruction, novel view synthesis, and neural rendering. My research combines 3D computer vision and deep learning.

Free-Viewpoint RGB-D Human Performance Capture and Rendering
Phong Nguyen, Nikolaos Sarafianos, Christoph Lassner, Janne Heikkilä, Tony Tung
ECCV, 2022
arxiv / bibtex / project page / poster / video

We propose an architecture that learns dense features in novel views obtained by sphere-based neural rendering and creates complete renders using a global context inpainting model. Additionally, an enhancer network improves the overall fidelity, even in areas occluded in the original view, producing crisp renders with fine details. Our method produces high-quality novel images and generalizes to unseen human actors at inference time.

RGBD-Net: Predicting Color and Depth images for Novel Views Synthesis
Phong Nguyen, Animesh Karnewar, Lam Huynh, Esa Rahtu, Jiri Matas, Janne Heikkila
3DV, 2021
code / bibtex / video

We propose a new cascaded architecture for novel view synthesis, called RGBD-Net, which consists of two core components: a hierarchical depth regression network and a depth-aware generator network. The former predicts depth maps of the target views using adaptive depth scaling, while the latter leverages the predicted depths to render spatially and temporally consistent target images.

Lightweight Monocular Depth with a Novel Neural Architecture Search Method
Lam Huynh, Phong Nguyen, Esa Rahtu, Jiri Matas, Janne Heikkila
WACV, 2021
arxiv / bibtex

This paper presents a novel neural architecture search method, called LiDNAS, for generating lightweight monocular depth estimation models. Unlike previous neural architecture search (NAS) approaches, where finding optimized networks is computationally highly demanding, the introduced Assisted Tabu Search enables efficient architecture exploration.

Monocular Depth Estimation Primed by Salient Point Detection and Hessian Loss
Lam Huynh, Matteo Pedone, Phong Nguyen, Esa Rahtu, Jiri Matas, Janne Heikkila
3DV, 2021
arxiv / bibtex

This work proposes an accurate and lightweight framework for monocular depth estimation based on a self-attention mechanism stemming from salient point detection. Specifically, we utilize a sparse set of keypoints to train a FuSaNet model that consists of two major components: Fusion-Net and Saliency-Net.

Boosting Monocular Depth Estimation with Lightweight 3D Point Fusion
Lam Huynh, Phong Nguyen, Esa Rahtu, Jiri Matas, Janne Heikkila
ICCV, 2021
project page / arxiv / bibtex

In this paper, we propose enhancing monocular depth estimation by adding 3D points as depth guidance. Unlike existing depth completion methods, our approach performs well on extremely sparse and unevenly distributed point clouds, which makes it agnostic to the source of the 3D points.

Sequential View Synthesis with Transformer
Phong Nguyen, Lam Huynh, Esa Rahtu, Janne Heikkila
ACCV, 2020

We introduce the Transformer-based Generative Query Network (T-GQN), which uses multi-view attention learning between context images to obtain multiple implicit scene representations. A sequential rendering decoder then predicts multiple target images based on the learned representations. T-GQN not only gives consistent predictions but also requires no retraining for fine-tuning.

Guiding Monocular Depth Estimation Using Depth-Attention Volume
Lam Huynh, Phong Nguyen, Esa Rahtu, Jiri Matas, Janne Heikkila
ECCV, 2020
project page / arxiv / bibtex

In this paper, we propose guiding depth estimation to favor planar structures that are ubiquitous especially in indoor environments. This is achieved by incorporating a non-local coplanarity constraint to the network with a novel attention mechanism called depth-attention volume (DAV).

Predicting Novel Views Using Generative Adversarial Query Network
Phong Nguyen, Lam Huynh, Esa Rahtu, Janne Heikkila
SCIA, 2019 (Best Paper Award)

We introduce the Generative Adversarial Query Network (GAQN), a general learning framework for novel view synthesis that combines the Generative Query Network (GQN) and Generative Adversarial Networks (GANs).

LightDenseYOLO: A Fast and Accurate Marker Tracker for Autonomous UAV Landing by Visible Light Camera Sensor on Drone
Phong Nguyen, Muhammad Arsalan, Ja Hyung Koo, Rizwan Ali Naqvi, Noi Quang Truong, Kang Ryoung Park
Sensors, 2018

We propose LightDenseYOLO, a novel marker detector for autonomous drone landing based on deep neural networks.

Remote Marker-Based Tracking for UAV Landing Using Visible-Light Camera Sensor
Phong Nguyen, Ki Wan Kim, Young Won Lee, Kang Ryoung Park
Sensors, 2017

In this research, we determined how to safely land a drone in the absence of GPS signals using our remote marker-based tracking algorithm, which relies on a visible-light camera sensor.

Reading Group for Vietnamese

In my free time, I make videos explaining exciting computer vision papers on the Cracking Papers 4 VN YouTube channel. Here are some examples:

Credit for this website template goes to Jon Barron. Thank you!