• Joshua's Home
  • Computer Science
  • Art Portfolio

Plus Ultra

Visual Question Answering

VQA, PyTorch, Deep Learning, Transformer

In this project, we used the VQA v1.0 Open-Ended dataset and took the 5216 most frequent answers as target classes, plus an "others" class, for supervised training. We used ResNet-18 for image feature extraction and RoBERTa for text feature extraction; the two features are concatenated and fed to a fully connected layer that computes a multi-label loss. To improve performance, we introduced a Transformer-based VQA module, inserted between the two feature extractors (before concatenation) and the fully connected layer. More specifically, it consists of 3 Cross-Encoder layers, each made up of a self-attention and a cross-attention sublayer. The resulting model reaches an accuracy of around 67%.
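
The fusion described above can be sketched in NumPy as a minimal illustration, not the project's PyTorch code. Assumptions to note: both modalities are already projected to a shared hypothetical width `d`, each Cross-Encoder layer is single-head and projection-free (no learned Q/K/V weights, layer norm, or feed-forward block), the two streams update each other symmetrically, tokens are mean-pooled before concatenation, and the multi-label loss is taken to be binary cross-entropy with logits.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Single-head scaled dot-product attention: q is (Lq, d), k and v are (Lk, d)."""
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return weights @ v

def cross_encoder_layer(x, context):
    """Self-attention within x, then cross-attention from x to context (residuals only)."""
    x = x + attention(x, x, x)
    return x + attention(x, context, context)

def vqa_logits(text, image, W, b, num_layers=3):
    """text: (Lt, d) token features; image: (Li, d) grid features; both already width d."""
    t, v = text, image
    for _ in range(num_layers):
        # both streams are updated from the previous layer's outputs
        t, v = cross_encoder_layer(t, v), cross_encoder_layer(v, t)
    fused = np.concatenate([t.mean(axis=0), v.mean(axis=0)])  # pool, then concatenate
    return fused @ W + b  # fully connected classifier

def multilabel_bce(logits, targets):
    """Mean binary cross-entropy with logits, written stably as log(1 + e^x) - y * x."""
    return float(np.mean(np.logaddexp(0.0, logits) - targets * logits))

rng = np.random.default_rng(0)
d, num_classes = 64, 5216 + 1          # hypothetical width; top answers + "others"
text = rng.normal(size=(12, d))        # stand-in for projected RoBERTa token features
image = rng.normal(size=(49, d))       # stand-in for a projected 7x7 ResNet-18 grid
W = rng.normal(scale=0.01, size=(2 * d, num_classes))
b = np.zeros(num_classes)
targets = np.zeros(num_classes)
targets[42] = 1.0                      # hypothetical ground-truth answer index
logits = vqa_logits(text, image, W, b)
print(logits.shape, multilabel_bce(logits, targets))
```

Real Cross-Encoder layers add learned projections, multiple heads, and feed-forward sublayers; they were left out here so the data flow (attend, pool, concatenate, classify) stays visible.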

[Github]    

Ground-robot Navigation

Path Planning, Ground-robot, SLAM

We built the Husky ground-robot system with a Velodyne-16 lidar, an Xsens IMU, a second 2D lidar, an RGB RealSense camera, and, most importantly, on-board computing hardware with a modern GPU. We then configured ROS with AMCL, Gmapping, Dijkstra* and DWA for 2D-lidar SLAM, path planning, and navigation that avoids dynamic obstacles. This lays a foundation for more advanced research such as reinforcement learning and more accurate pedestrian detection. In addition, we managed to run a 3D lidar-based SLAM algorithm in real time.
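
The global-planning step mentioned above can be illustrated with Dijkstra's algorithm on a 2D occupancy grid. This is a minimal sketch, not the ROS planner: it assumes a binary grid (0 = free, 1 = occupied) and a uniform cost of 1 per 4-connected move, whereas ROS costmaps use graded costs and obstacle inflation.

```python
import heapq

def dijkstra_grid(grid, start, goal):
    """Shortest 4-connected path on a binary occupancy grid.
    Returns the list of (row, col) cells from start to goal, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    parent = {}
    frontier = [(0, start)]
    while frontier:
        d, cell = heapq.heappop(frontier)
        if cell == goal:
            path = [cell]
            while cell in parent:  # walk parents back to the start
                cell = parent[cell]
                path.append(cell)
            return path[::-1]
        if d > dist[cell]:
            continue  # stale queue entry; a shorter route was already found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                nd = d + 1  # uniform step cost (assumption)
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    parent[(nr, nc)] = cell
                    heapq.heappush(frontier, (nd, (nr, nc)))
    return None

grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
print(dijkstra_grid(grid, (0, 0), (2, 0)))
```

In a full navigation stack, a global path like this is handed to a local planner such as DWA, which reacts to dynamic obstacles the static grid does not contain.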

Weakly Supervised Object Detection

Object Detection, PyTorch, Deep Learning, Computer Vision

We used an ImageNet-pretrained model to perform rough object detection. We first implemented a naive AlexNet-based object detection network, and then implemented the WSDDN model, in which object region proposals are passed through spatial pyramid pooling into both a classification stream and a detection stream. We used RoI pooling to extract features for the Selective Search proposals and implemented a classifier that produces both bounding boxes and class scores for each region, then applied non-maximum suppression (NMS) based on intersection over union (IoU) to reduce the number of bounding boxes. Finally, we implemented mAP and recall to evaluate the model.
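
The two-stream scoring and the IoU-based NMS described above can be sketched in NumPy. This follows the WSDDN formulation from Bilen and Vedaldi (softmax over classes in the classification stream, softmax over regions in the detection stream, summed over regions for an image-level score), but the per-region logits here are random stand-ins for the pooled features, and the single-class NMS with a 0.3 threshold is an assumption.

```python
import numpy as np

def softmax(x, axis):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def wsddn_scores(cls_logits, det_logits):
    """cls_logits, det_logits: (num_regions, num_classes) outputs of the two streams."""
    region_scores = softmax(cls_logits, axis=1) * softmax(det_logits, axis=0)
    image_scores = region_scores.sum(axis=0)  # in (0, 1); trained with image-level labels
    return region_scores, image_scores

def iou(box, boxes):
    """IoU between one box and an (N, 4) array of boxes, all as [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_threshold=0.3):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it, repeat."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= iou_threshold]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
print(nms(boxes, np.array([0.9, 0.8, 0.7])))  # the second box overlaps the first and is dropped
```

In practice NMS is run separately for each class on that class's region scores.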

[Github]    

Pose Estimation in an Exercise App

iOS Development, SwiftUI, Pose Detection, Pose Classification

In this project, we explored Google MediaPipe's BlazePose for real-time pose detection and tracking, and tried different pose classification methods, such as classic KNN and deep learning networks trained with TensorFlow, to classify human poses and compare them against a reference. We also added a repetition-counting mechanism that only counts repetitions whose poses match the reference; its threshold can be adjusted per use case, for example to set how strictly the user's form is judged so that users are pushed to perform the exercise properly. Finally, we ported the whole experiment from PyTorch and TensorFlow to Swift for iOS development.
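
A KNN classifier and a threshold-based repetition counter of the kind described above can be sketched in plain Python. This is an illustration under assumptions, not the app's Swift code: pose features are toy 2D vectors standing in for normalized BlazePose landmarks, confidence is the fraction of the k nearest reference samples that match the target pose, and the counter uses two hypothetical thresholds (enter/exit) so that jitter around a single threshold is not counted twice. Raising `enter_threshold` makes the counter stricter.

```python
import math
from collections import Counter

def nearest_samples(query, samples, k):
    """samples: list of (feature_vector, label); returns the k nearest by Euclidean distance."""
    return sorted(samples, key=lambda s: math.dist(query, s[0]))[:k]

def knn_classify(query, samples, k=3):
    votes = Counter(label for _, label in nearest_samples(query, samples, k))
    return votes.most_common(1)[0][0]

def knn_confidence(query, samples, target_label, k=3):
    """Fraction of the k nearest reference samples labeled target_label."""
    nearest = nearest_samples(query, samples, k)
    return sum(label == target_label for _, label in nearest) / len(nearest)

class RepetitionCounter:
    """Counts one repetition each time confidence rises to enter_threshold
    after having dropped to exit_threshold (hysteresis)."""

    def __init__(self, enter_threshold=0.8, exit_threshold=0.4):
        self.enter = enter_threshold
        self.exit = exit_threshold
        self.in_pose = False
        self.count = 0

    def update(self, confidence):
        if not self.in_pose and confidence >= self.enter:
            self.in_pose = True
            self.count += 1
        elif self.in_pose and confidence <= self.exit:
            self.in_pose = False
        return self.count

samples = [((0.0, 0.0), "down"), ((0.1, 0.0), "down"), ((0.0, 0.1), "down"),
           ((1.0, 1.0), "up"), ((0.9, 1.0), "up"), ((1.0, 0.9), "up")]
counter = RepetitionCounter(enter_threshold=0.8, exit_threshold=0.4)
for confidence in [0.1, 0.9, 0.95, 0.3, 0.85, 0.5, 0.9, 0.2]:
    counter.update(confidence)
print(knn_classify((0.95, 0.95), samples), counter.count)
```

In the confidence sequence above, the dip to 0.5 stays above the exit threshold, so the following 0.9 is not counted as a new repetition.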

3D Lidar SLAM, Pedestrian Detection and Tracking

SLAM, Pedestrian Detection and Tracking, Reinforcement Learning, ROS

In this project, we experimented with state-of-the-art (SOTA) 3D lidar-based SLAM methods such as LOAM, LeGO-LOAM, and LIO-SAM, configured them for our customized Velodyne-16 and Xsens IMU setup, and ran them on the Husky ground robot. We also combined them with HDL_Localization to detect and track pedestrians near the robot, and adapted the path planning method to follow a person while avoiding obstacles. The pipeline runs in real time on the Husky's onboard computing units.
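
Tracking a pedestrian so that a planner can follow them is commonly done with a Kalman filter. The sketch below is a constant-velocity Kalman filter over a 2D state `[x, y, vx, vy]`. It is a swapped-in, textbook technique for illustration, not the tracker used in the project; the time step, the simplified diagonal process noise, and the measurement noise are hypothetical values.

```python
import numpy as np

class ConstantVelocityKF:
    """Kalman filter for one pedestrian with state [x, y, vx, vy] and position measurements."""

    def __init__(self, dt=0.1, process_var=1.0, meas_var=0.05):
        self.x = np.zeros(4)                       # initial state estimate
        self.P = np.eye(4) * 10.0                  # large initial uncertainty
        self.F = np.eye(4)
        self.F[0, 2] = self.F[1, 3] = dt           # position += velocity * dt
        self.H = np.zeros((2, 4))
        self.H[0, 0] = self.H[1, 1] = 1.0          # only position is measured
        # simplified diagonal process noise (assumption; omits cross terms)
        self.Q = process_var * np.diag([dt**4 / 4, dt**4 / 4, dt**2, dt**2])
        self.R = meas_var * np.eye(2)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                          # predicted position

    def update(self, z):
        y = z - self.H @ self.x                    # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)   # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x

# A pedestrian walking at (1.0, 0.5) m/s, observed every 0.1 s without noise.
kf = ConstantVelocityKF(dt=0.1)
for k in range(1, 101):
    kf.predict()
    kf.update(np.array([1.0 * k * 0.1, 0.5 * k * 0.1]))
print(kf.x)
```

A follow behavior could then use the predicted position a short horizon ahead as the planner's moving goal, while obstacle avoidance is left to the local planner.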

NeRF-W in Historic Architecture

NeRF, Deep Learning, Synthesized Image

In this project, we explored and experimented with existing NeRF-related neural networks. We are interested in using NeRF to synthesize authentic-looking photos from free viewpoints, especially of large real-world objects whose unoccluded views are hard to capture, such as large buildings with occlusions, or viewpoints too high to photograph without a drone.
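
The core of NeRF-style view synthesis is volume rendering along each camera ray: with densities $\sigma_i$, colors $c_i$ and sample spacings $\delta_i$, the pixel color is $\hat{C} = \sum_i T_i (1 - e^{-\sigma_i \delta_i}) c_i$, where $T_i = \prod_{j<i} e^{-\sigma_j \delta_j}$. The NumPy sketch below implements only this standard compositing step; NeRF-W's per-image appearance embeddings and transient components are not shown, and the sample values are made up.

```python
import numpy as np

def render_ray(sigmas, colors, t_vals):
    """sigmas: (N,) densities, colors: (N, 3) RGB, t_vals: (N,) increasing sample depths."""
    deltas = np.append(np.diff(t_vals), 1e10)           # last interval extends to "infinity"
    alphas = 1.0 - np.exp(-sigmas * deltas)             # opacity of each segment
    # transmittance T_i = prod_{j<i} (1 - alpha_j): light surviving up to sample i
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = trans * alphas
    rgb = (weights[:, None] * colors).sum(axis=0)
    depth = (weights * t_vals).sum()                    # expected termination depth
    return rgb, weights, depth

# Empty space, then an opaque green sample, then a blue sample hidden behind it.
rgb, weights, depth = render_ray(np.array([0.0, 1e6, 0.0]), np.eye(3), np.array([1.0, 2.0, 3.0]))
print(rgb, weights, depth)
```

Because the blue sample lies behind an opaque one, its transmittance is zero and it contributes nothing to the pixel.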

[Github]     [Web]

Neural Style Transfer

Deep Learning, Image Synthesis, Style Transfer

Neural Style Transfer is built on a VGG-19 network; it uses mean squared error (MSE) for the loss function and L-BFGS to optimize the input image (initialized from noise). Only the feature-extraction part of VGG-19 is used, and only in evaluation mode (its weights are frozen and receive no gradient updates); instead, the optimization happens at the two ends, the loss function and the input image. The loss consists of two parts, a content loss and a style loss, which we implement separately first and then combine with assigned weights.
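
The content and style losses described above can be sketched in NumPy on precomputed feature maps. This is a minimal illustration: the feature maps would come from chosen VGG-19 layers, the Gram-matrix normalization by C*H*W is one common convention, and the default weights (1 for content, 1e6 for style) are hypothetical. The L-BFGS loop over the input image is not shown.

```python
import numpy as np

def gram_matrix(features):
    """features: (C, H, W) feature map -> (C, C) Gram matrix normalized by C*H*W."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (c * h * w)

def content_loss(gen_feat, content_feat):
    """MSE between feature maps of the generated image and the content image."""
    return float(np.mean((gen_feat - content_feat) ** 2))

def style_loss(gen_feats, style_feats):
    """Sum over layers of the MSE between Gram matrices (feature correlations)."""
    return float(sum(np.mean((gram_matrix(g) - gram_matrix(s)) ** 2)
                     for g, s in zip(gen_feats, style_feats)))

def total_loss(gen_content, content, gen_styles, styles, content_weight=1.0, style_weight=1e6):
    return (content_weight * content_loss(gen_content, content)
            + style_weight * style_loss(gen_styles, styles))

rng = np.random.default_rng(0)
content = rng.normal(size=(8, 16, 16))      # stand-in for one VGG-19 layer's activations
generated = rng.normal(size=(8, 16, 16))
print(total_loss(generated, content, [generated], [content]))
```

The Gram matrix discards spatial layout and keeps only channel correlations, which is why it captures texture and style rather than content.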

[Github]     [Web]

CycleGAN

CycleGAN, DC-GAN, Deep Learning

This project implements two well-known GAN architectures: DC-GAN and CycleGAN. It is written in PyTorch; the main code builds the discriminator and generator networks, the loss functions, and the forward and backward passes. It also explores methods that help GANs generate better results, such as data augmentation, differentiable augmentation, variants of the loss functions, and variants of the discriminators, and applies the networks to different datasets to check their robustness.
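
The CycleGAN objective combines adversarial losses for both mapping directions with a cycle-consistency term. The NumPy sketch below uses the least-squares adversarial loss and the cycle weight of 10 from the CycleGAN paper; the generators `G`, `F` and discriminators `D_X`, `D_Y` are placeholder callables, not the project's networks.

```python
import numpy as np

def lsgan_generator_loss(d_fake):
    """Least-squares GAN generator loss: push D(fake) toward the 'real' label 1."""
    return float(np.mean((d_fake - 1.0) ** 2))

def lsgan_discriminator_loss(d_real, d_fake):
    """Least-squares discriminator loss: real -> 1, fake -> 0."""
    return float(0.5 * (np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)))

def cycle_consistency_loss(x, y, G, F):
    """L1 cycle loss: F(G(x)) should reconstruct x and G(F(y)) should reconstruct y."""
    return float(np.mean(np.abs(F(G(x)) - x)) + np.mean(np.abs(G(F(y)) - y)))

def cyclegan_generator_objective(x, y, G, F, D_X, D_Y, lambda_cycle=10.0):
    """G maps X -> Y (judged by D_Y); F maps Y -> X (judged by D_X)."""
    adversarial = lsgan_generator_loss(D_Y(G(x))) + lsgan_generator_loss(D_X(F(y)))
    return adversarial + lambda_cycle * cycle_consistency_loss(x, y, G, F)

# Toy check: mutually inverse "generators" and discriminators fooled completely.
rng = np.random.default_rng(0)
x, y = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
G = lambda a: 2.0 * a
F = lambda b: b / 2.0
fooled = lambda img: np.ones(len(img))
print(cyclegan_generator_objective(x, y, G, F, fooled, fooled))
```

The cycle term is what lets CycleGAN train on unpaired data: without it, G could map every input to any convincing image of the target domain.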

[Github]     [Web]

Poisson Blending

Image Synthesis, Poisson Blending

The project explores gradient-domain processing for image blending, tone mapping, and non-photorealistic rendering, focusing mainly on the Poisson blending algorithm. The tasks include basic gradient minimization, 4-neighbor Poisson blending, mixed-gradient Poisson blending, and a Color2Gray method that preserves grayscale intensity. The whole project is implemented in Python.
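
4-neighbor Poisson blending can be sketched as one linear equation per masked pixel $p$: $4v_p - \sum_{q \in N(p) \cap \Omega} v_q = \sum_{q \in N(p)} (s_p - s_q) + \sum_{q \in N(p) \setminus \Omega} t_q$, so the result follows the source gradients inside the mask $\Omega$ and matches the target on its boundary. The NumPy sketch below handles a single grayscale channel, assumes the mask does not touch the image border, and uses a dense solver that only suits tiny images (a sparse solver such as `scipy.sparse.linalg.spsolve` would be used in practice). The `mixed` flag keeps whichever of the source or target gradient is stronger.

```python
import numpy as np

def poisson_blend(source, target, mask, mixed=False):
    """Blend grayscale `source` into `target` inside boolean `mask` (mask must not touch the border)."""
    ys, xs = np.nonzero(mask)
    index = -np.ones(target.shape, dtype=int)
    index[ys, xs] = np.arange(len(ys))              # unknown-variable index per masked pixel
    n = len(ys)
    A = np.zeros((n, n))
    b = np.zeros(n)
    for i, (y, x) in enumerate(zip(ys, xs)):
        A[i, i] = 4.0
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            ds = source[y, x] - source[ny, nx]      # guidance gradient from the source
            if mixed:
                dt = target[y, x] - target[ny, nx]
                ds = ds if abs(ds) >= abs(dt) else dt  # keep the stronger gradient
            b[i] += ds
            if mask[ny, nx]:
                A[i, index[ny, nx]] = -1.0          # neighbor is also unknown
            else:
                b[i] += target[ny, nx]              # known boundary value (Dirichlet)
    result = target.astype(float).copy()
    result[ys, xs] = np.linalg.solve(A, b)
    return result

rng = np.random.default_rng(0)
target = rng.uniform(0, 1, size=(8, 8))
source = target + 5.0                               # same gradients, different brightness
mask = np.zeros((8, 8), dtype=bool)
mask[2:5, 2:5] = True
print(np.abs(poisson_blend(source, target, mask) - target).max())
```

The toy check shows the key property of gradient-domain blending: a brightness offset in the source disappears, because only its gradients are transferred.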

[Github]     [Web]

CMU 15-213/15-513

Computer System, Cache, Malloc, Stack, Web Proxy

This course covers fundamental computer systems topics, from compilers to linkers, including the stack, the heap, caches, a simple network implementation, and the use of debuggers and disassemblers. The assignments are written in C.
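
As an illustration of the cache topic, here is a minimal set-associative cache simulator with LRU replacement. It is written in Python to match the other examples on this page rather than in C, and it is not the course's assignment code; the `s`, `E`, `b` parameters follow the common CS:APP notation (2^s sets, E lines per set, 2^b-byte blocks).

```python
from collections import OrderedDict

class CacheSim:
    """Set-associative cache with LRU replacement, counting hits, misses, and evictions."""

    def __init__(self, s, E, b):
        self.s, self.E, self.b = s, E, b
        self.sets = [OrderedDict() for _ in range(1 << s)]  # per set: tag -> present, in LRU order
        self.hits = self.misses = self.evictions = 0

    def access(self, address):
        set_index = (address >> self.b) & ((1 << self.s) - 1)  # middle bits select the set
        tag = address >> (self.s + self.b)                     # high bits form the tag
        lines = self.sets[set_index]
        if tag in lines:
            self.hits += 1
            lines.move_to_end(tag)             # mark as most recently used
        else:
            self.misses += 1
            if len(lines) == self.E:
                lines.popitem(last=False)      # evict the least recently used line
                self.evictions += 1
            lines[tag] = True

# Direct-mapped cache: 2 sets, 1 line each, 16-byte blocks.
cache = CacheSim(s=1, E=1, b=4)
for addr in [0x00, 0x04, 0x20, 0x00, 0x10]:
    cache.access(addr)
print(cache.hits, cache.misses, cache.evictions)
```

Addresses 0x00 and 0x20 map to the same set with different tags, so in this direct-mapped configuration they keep evicting each other (a conflict miss pattern).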

[Github]    

Scotty3D

Computer Graphics

Scotty3D is a 3D modelling package developed by CMU for educational purposes. Its tasks include using the half-edge mesh data structure to implement mesh-editing operations such as edge flip, beveling, Catmull-Clark subdivision, isotropic remeshing, linear subdivision, simplification, and triangulation; path-tracing components such as camera-ray intersection with objects, BVH acceleration, path tracing, materials, and direct and environment lighting; and, finally, animation with skeleton kinematics, skinning, splines, and some particle effects.
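
The camera-ray intersection step of a path tracer can be illustrated with the Möller-Trumbore ray/triangle test. This NumPy sketch shows the standard algorithm, not Scotty3D's C++ interface; in a full renderer a BVH first culls triangles whose bounding boxes the ray misses, so this test runs only on the few remaining candidates.

```python
import numpy as np

def ray_triangle_intersect(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore test. Returns (t, u, v) for a hit with t > eps, else None.
    (u, v) are barycentric coordinates of the hit point within the triangle."""
    e1 = v1 - v0
    e2 = v2 - v0
    p = np.cross(direction, e2)
    det = np.dot(e1, p)
    if abs(det) < eps:
        return None                      # ray is parallel to the triangle's plane
    inv_det = 1.0 / det
    s = origin - v0
    u = np.dot(s, p) * inv_det
    if u < 0.0 or u > 1.0:
        return None
    q = np.cross(s, e1)
    v = np.dot(direction, q) * inv_det
    if v < 0.0 or u + v > 1.0:
        return None
    t = np.dot(e2, q) * inv_det          # distance along the ray
    return (t, u, v) if t > eps else None

v0, v1, v2 = np.array([0.0, 0, 0]), np.array([1.0, 0, 0]), np.array([0.0, 1, 0])
hit = ray_triangle_intersect(np.array([0.25, 0.25, -1.0]), np.array([0.0, 0, 1]), v0, v1, v2)
print(hit)
```

The barycentric `(u, v)` values are also what a renderer uses to interpolate normals and texture coordinates at the hit point.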

[Github]     [Web]

Draw SVG

Computer Graphics

As the name implies, this project's goal is to draw SVG images; the main tasks are rasterizing lines and triangles, supersampling, transforms, trilinear filtering, alpha compositing, scaling, etc.
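
Triangle rasterization with supersampling can be sketched with edge functions: each pixel is split into a grid of sub-samples, each sub-sample is tested against the triangle's three edges, and the pixel's coverage is the fraction of samples inside. This plain-Python sketch is an illustration, not the project's code; it assumes `sample_rate` is a perfect square and counts samples exactly on an edge as inside.

```python
import math

def edge(ax, ay, bx, by, px, py):
    """Signed area term: its sign tells which side of the directed edge a -> b point p is on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def point_in_triangle(px, py, tri):
    (x0, y0), (x1, y1), (x2, y2) = tri
    e0 = edge(x0, y0, x1, y1, px, py)
    e1 = edge(x1, y1, x2, y2, px, py)
    e2 = edge(x2, y2, x0, y0, px, py)
    # inside if all edge functions agree in sign (works for either winding order)
    return (e0 >= 0 and e1 >= 0 and e2 >= 0) or (e0 <= 0 and e1 <= 0 and e2 <= 0)

def rasterize_triangle(tri, width, height, sample_rate=4):
    """Return a height x width grid of coverage values in [0, 1]."""
    n = math.isqrt(sample_rate)
    if n * n != sample_rate:
        raise ValueError("sample_rate must be a perfect square")
    coverage = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            hits = 0
            for sy in range(n):
                for sx in range(n):
                    # sub-sample centers on an n x n grid inside the pixel
                    if point_in_triangle(x + (sx + 0.5) / n, y + (sy + 0.5) / n, tri):
                        hits += 1
            coverage[y][x] = hits / (n * n)
    return coverage

full = rasterize_triangle(((-10, -10), (100, -10), (-10, 100)), 4, 4)
half = rasterize_triangle(((0, 0), (2, 0), (0, 2)), 2, 2, sample_rate=4)
print(half)  # fractional coverage appears along the diagonal edge
```

Averaging the sub-samples (a box filter) is what turns the jagged edge into smoothly varying coverage, which is then used as alpha when compositing the fill color.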

[Github]     [Web]

Dataset and Benchmarking of Modern Visual SLAM Systems

SLAM, Realistic Rendering, Computational Photography

We discuss the structure of the representations and optimisation problems involved in Spatial AI, and propose new synthetic datasets that include accurate ground truth information about the scene composition as well as individual object shapes and poses. We furthermore propose evaluation metrics for all aspects of such joint geometric-semantic representations and apply them to a new semantic SLAM framework.
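
Benchmarking SLAM systems typically starts from trajectory metrics. As one widely used example (not the new metrics proposed in this work), the absolute trajectory error (ATE) is the RMSE between estimated and ground-truth positions after a rigid alignment. The NumPy sketch below assumes time-synchronized `(N, 3)` position arrays and uses Kabsch alignment without scale.

```python
import numpy as np

def align_rigid(est, gt):
    """Kabsch alignment: rotation R and translation t minimizing sum ||R est_i + t - gt_i||^2."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    H = (est - mu_e).T @ (gt - mu_g)             # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against a reflection solution
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_g - R @ mu_e
    return R, t

def ate_rmse(est, gt, align=True):
    """Root-mean-square position error, optionally after rigid alignment."""
    if align:
        R, t = align_rigid(est, gt)
        est = est @ R.T + t
    return float(np.sqrt(np.mean(np.sum((est - gt) ** 2, axis=1))))

rng = np.random.default_rng(0)
gt = rng.normal(size=(50, 3))
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])  # 90 degrees about z
est = gt @ Rz.T + np.array([5.0, -2.0, 1.0])   # same trajectory in a different frame
print(ate_rmse(est, gt), ate_rmse(est, gt, align=False))
```

Alignment is needed because a SLAM system's map frame is arbitrary; without it, a perfect trajectory expressed in a different frame would report a large error.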

[Github]     [Web]

Dense Object Reconstruction from RGBD Images with Embedded Deep Shape Representations

AutoEncoder, SLAM, 3D Reconstruction

With an application to dense object modeling from RGB-D images, our work aims at taking the best of both worlds by embedding modern higher-order object shape priors into classical iterative residual minimization objectives. We demonstrate a general ability to improve mapping accuracy with respect to each modality alone, and present a successful application to real data.
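
The idea of embedding a learned shape prior into an iterative residual minimization can be illustrated with a toy example: a latent shape code `z` is decoded into shape values, and Gauss-Newton adjusts `z` so that the decoded values match the observed ones, optionally with a prior term on `z`. Everything here is hypothetical: the `tanh` linear decoder stands in for a trained AutoEncoder decoder, and the observations stand in for measurements derived from RGB-D data. It is not this work's formulation.

```python
import numpy as np

def decode(z, W, b):
    """Toy nonlinear decoder (hypothetical stand-in for a learned shape decoder)."""
    return np.tanh(W @ z + b)

def fit_latent_code(W, b, observed_idx, observations, prior_weight=0.0, iters=20):
    """Gauss-Newton on sum over observed i of (decode(z)_i - y_i)^2 + prior_weight * ||z||^2."""
    z = np.zeros(W.shape[1])
    Wo, bo = W[observed_idx], b[observed_idx]
    for _ in range(iters):
        pred = np.tanh(Wo @ z + bo)
        r = pred - observations                  # data residuals
        J = (1.0 - pred ** 2)[:, None] * Wo      # Jacobian, using d tanh(a)/da = 1 - tanh(a)^2
        lhs = J.T @ J + prior_weight * np.eye(len(z))
        rhs = J.T @ r + prior_weight * z
        z = z - np.linalg.solve(lhs, rhs)        # Gauss-Newton step
    return z

rng = np.random.default_rng(0)
W = rng.normal(scale=0.3, size=(20, 3))
b = rng.normal(scale=0.1, size=20)
z_true = np.array([0.5, -0.3, 0.8])
observed = np.arange(0, 20, 2)                   # only half of the shape values are observed
y = decode(z_true, W, b)[observed]
z_hat = fit_latent_code(W, b, observed, y)
completed = decode(z_hat, W, b)                  # the prior fills in the unobserved values
print(z_hat)
```

The payoff of the learned prior is completion: once the low-dimensional code fits the observed part, decoding it also predicts the unobserved part of the shape.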

[Web]

An RGB-based SLAM System

SLAM

A fundamental SLAM system with tracking, mapping, and pose optimization, implemented in MATLAB. It includes SIFT and Harris feature extraction, the 7-point and 8-point algorithms, a homography method, and Levenberg-Marquardt minimization of the average error for pose optimization.
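
The 8-point algorithm mentioned above estimates the fundamental matrix $F$ from correspondences satisfying $x_2^\top F x_1 = 0$. The sketch below is the normalized (Hartley) variant in Python/NumPy rather than the project's MATLAB code: normalize the points, solve the linear system by SVD, enforce rank 2, and undo the normalization. The synthetic two-camera scene (identity intrinsics, a small rotation about the y axis, and a sideways translation) is made up for the check.

```python
import numpy as np

def normalize_points(pts):
    """Hartley normalization: move the centroid to the origin, mean distance to sqrt(2)."""
    centroid = pts.mean(axis=0)
    s = np.sqrt(2) / np.linalg.norm(pts - centroid, axis=1).mean()
    T = np.array([[s, 0.0, -s * centroid[0]],
                  [0.0, s, -s * centroid[1]],
                  [0.0, 0.0, 1.0]])
    homog = np.column_stack([pts, np.ones(len(pts))])
    return (T @ homog.T).T, T

def eight_point(pts1, pts2):
    """Estimate F (unit Frobenius norm) from N >= 8 correspondences given as (N, 2) arrays."""
    x1, T1 = normalize_points(pts1)
    x2, T2 = normalize_points(pts2)
    # each row expresses x2^T F x1 = 0 as a linear equation in F's 9 entries (row-major)
    A = np.column_stack([x2[:, 0] * x1[:, 0], x2[:, 0] * x1[:, 1], x2[:, 0],
                         x2[:, 1] * x1[:, 0], x2[:, 1] * x1[:, 1], x2[:, 1],
                         x1[:, 0], x1[:, 1], np.ones(len(x1))])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)                     # null vector of A
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0                                   # enforce rank 2
    F = U @ np.diag(S) @ Vt
    F = T2.T @ F @ T1                            # undo the normalization
    return F / np.linalg.norm(F)

rng = np.random.default_rng(1)
X = rng.uniform([-1, -1, 4], [1, 1, 8], size=(20, 3))     # 3D points in front of camera 1
a = 0.1
R = np.array([[np.cos(a), 0, np.sin(a)], [0, 1, 0], [-np.sin(a), 0, np.cos(a)]])
t = np.array([0.5, 0.0, 0.0])
x1 = X[:, :2] / X[:, 2:]                                    # camera 1: [I | 0]
Xc2 = X @ R.T + t
x2 = Xc2[:, :2] / Xc2[:, 2:]                                # camera 2: [R | t]
F = eight_point(x1, x2)
h1 = np.column_stack([x1, np.ones(20)])
h2 = np.column_stack([x2, np.ones(20)])
residuals = np.abs(np.sum(h2 * (h1 @ F.T), axis=1))         # |x2^T F x1| per correspondence
print(residuals.max())
```

With noisy real matches, this linear estimate is usually wrapped in RANSAC and then refined by nonlinear least squares such as Levenberg-Marquardt.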

[Github]