Visual Question Answering
VQA, PyTorch, Deep Learning, Transformer
In this project, we used the VQA v1.0 Open-Ended dataset and took the 5216 most frequent answers, plus an "others" class, as the target classes for supervised training. We used ResNet-18 for image feature extraction and RoBERTa for text feature extraction; the two features are concatenated and fed to a fully connected layer to compute the multi-label loss. To improve performance, we introduced a Transformer-based VQA module inserted between the two feature extractors (before concatenation) and the fully connected layer. More specifically, it consists of 3 cross-encoder layers, each made up of a self-attention and a cross-attention sub-layer. The resulting model achieves an accuracy of around 67%.
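Below is a minimal PyTorch sketch of this architecture. It assumes torchvision's `resnet18` and Hugging Face's `RobertaModel` (`roberta-base`) as the feature extractors; the hidden size of 768, the residual/LayerNorm layout of `CrossEncoderLayer`, the mean pooling before concatenation, and the use of `BCEWithLogitsLoss` are illustrative assumptions, not the exact configuration used in the project.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18
from transformers import RobertaModel

NUM_CLASSES = 5216 + 1  # 5216 most frequent answers + "others"

class CrossEncoderLayer(nn.Module):
    """One cross-encoder block: self-attention over question tokens,
    then cross-attention from the question to image features."""
    def __init__(self, dim=768, heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, text_feats, image_feats):
        # self-attention on the question tokens
        attn_out, _ = self.self_attn(text_feats, text_feats, text_feats)
        text_feats = self.norm1(text_feats + attn_out)
        # cross-attention: text queries attend to image regions
        attn_out, _ = self.cross_attn(text_feats, image_feats, image_feats)
        return self.norm2(text_feats + attn_out)

class VQAModel(nn.Module):
    def __init__(self, dim=768, num_layers=3):
        super().__init__()
        cnn = resnet18(weights="IMAGENET1K_V1")
        self.cnn = nn.Sequential(*list(cnn.children())[:-2])  # keep the spatial feature map
        self.img_proj = nn.Linear(512, dim)                   # ResNet-18 channels -> shared dim
        self.text_encoder = RobertaModel.from_pretrained("roberta-base")
        self.cross_layers = nn.ModuleList(
            [CrossEncoderLayer(dim) for _ in range(num_layers)]  # 3 cross-encoder layers
        )
        self.classifier = nn.Linear(dim * 2, NUM_CLASSES)     # fully connected layer after concatenation

    def forward(self, images, input_ids, attention_mask):
        # image features: (B, 512, 7, 7) -> (B, 49, dim)
        img = self.cnn(images).flatten(2).transpose(1, 2)
        img = self.img_proj(img)
        # text features: (B, T, dim)
        txt = self.text_encoder(input_ids=input_ids,
                                attention_mask=attention_mask).last_hidden_state
        for layer in self.cross_layers:
            txt = layer(txt, img)
        # pool both modalities and concatenate before the classifier
        fused = torch.cat([txt.mean(dim=1), img.mean(dim=1)], dim=-1)
        return self.classifier(fused)

# multi-label loss over soft answer targets
criterion = nn.BCEWithLogitsLoss()
```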
[Github]