From Pixels to Wireframes: 3D Reconstruction via CLIP-Based Sketch Abstraction

2 minute read

Date: May 30, 2025

The project explores a novel approach to abstracting 3D surfaces into wireframe-like representations by leveraging CLIP-based semantic guidance and Bézier curve parameterizations. The project website can be found at this link.

Project Structure

clipasso3d/
├── CLIP_/                    
├── data/
├── notebooks/
├── source/
├── .gitignore
├── .gitmodules
├── README.md
└── requirements.txt

Setup & Installation

Clone the repository:

   git clone --recurse-submodules https://github.com/tarhanefe/clipasso3d.git
   cd clipasso3d

Create a new Conda environment with python version 3.10:

   conda create -n 3dsketch python=3.10 -y
   conda activate 3dsketch

Install Pytorch for Cuda version 12.1:

   pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121

Install library requirements:

   pip install -r requirements.txt

How to Use ?

1- Trial Notebook

Use WireframeTrial.ipynb notebook inside the notebooks folder to interactively examine the training process and obtained results.

2- Run file

  cd clipasso3d
  python run.py \
  --data_name rose \
  --save_dir training_frames \
  --output_gif training_evolution.gif \
  --batch_size 1 \
  --epochs 10 \
  --inner_steps 30 \
  --learning_rate 0.005 \
  --n_curves 25 \
  --thickness 0.02 \
  --radius 0.8 \
  --length 0.02 \
  --overlap 0.6 \
  --clip_weight 1.0 \
  --clip_conv_loss 1.0 \
  --clip_fc_loss_weight 0.1 \
  --clip_conv_layer_weights 0 0 1.0 1.0 0.0 \
  --gif_fps 120 \
  --rotation_time 6.0 \
  --revolutions 3.0 \
  --output_semigif semihelical.gif

where

🔧 Argument Descriptions

Argument	Description
`--data_name`	Name of dataset folder inside `../data/` (e.g., `rose`).
`--save_dir`	Directory to save per-batch visualization frames.
`--output_gif`	Path to save the training evolution GIF.
`--batch_size`	Number of images processed per training batch.
`--epochs`	Number of training epochs.
`--inner_steps`	Inner optimization steps per batch.
`--learning_rate`	Learning rate for Adam optimizer.
`--n_curves`	Number of Bézier curves initialized in the scene.
`--thickness`	Radius (thickness) of each rendered curve/sphere segment.
`--radius`	Distance from the scene center to place initial curve points.
`--length`	Length of the each initial curve segment.
`--overlap`	Degree of allowed overlap between nearby spherical gaussians on a curve.
`--clip_weight`	Total CLIP loss weight.
`--clip_conv_loss`	Weight of the convolutional CLIP loss.
`--clip_fc_loss_weight`	Weight of CLIP’s final-layer (semantic) similarity loss.
`--clip_conv_layer_weights`	Weights for each CLIP convolutional layer (ViT-B/32 has 5 layers).
`--gif_fps`	Frame rate (FPS) of generated GIFs.
`--rotation_time`	Duration (in seconds) of 3D camera rotation in the fly-around GIF.
`--revolutions`	Number of full revolutions in the semihelical fly-around.
`--output_semigif`	Output path for the 3D rendered semihelical GIF animation.