Campanile-like Pitch: UXUI for progressive pose estimation

1. Problem Statement

Our goal is to create a pleasurable UX/UI for manually calibrating images through progressive constraints, rather than requiring users to define 7 to 10 correspondences upfront before running an optimization such as Levenberg-Marquardt. The current approach can feel rigid and non-interactive; instead, we want a workflow that allows for incremental refinement, letting users place an image in the scene and gradually improve its alignment as they add correspondences.

This calibration process does not require an explicit 3D model. Instead, the system should be flexible enough to work with different sources of constraints, such as:

- Geospatial coordinates (latitude, longitude, height)
- Image-based UV correspondences across cameras
- Direct depth measurements

The key challenge is designing an intuitive sequence of adjustments based on the available constraints. The first step may be placing an image at a rough location, followed by a single correspondence that refines translation. A second point may refine depth and focal length, while a third introduces yaw and pitch corrections.

As more correspondences are added, adjustments shift from coarse manual placement to fine-tuned corrections, ensuring a fluid and interactive calibration experience. The system should also be able to blend constraints from multiple sources, whether those constraints are geospatial (lat/long/height), image-based (UV correspondences across cameras), or direct depth measurements.

By structuring the workflow to provide immediate, real-time feedback and refine the image placement with each interaction, we avoid a trial-and-error process and instead create a progressive and intuitive calibration experience.

2. Appetite

3. Solution Concept

We propose a stepwise refinement approach that introduces camera parameters in a natural order as more correspondences are added.

Step 1: Translation from First Correspondence

The first correspondence (the user clicks a 3D point and drags it to its matching location in the photo) primarily affects **camera translation** \((X_C, Y_C, Z_C)\).

Given the projection equation:

p_x = f * (X_P - X_C) / (Z_P - Z_C) + c_x
p_y = f * (Y_P - Y_C) / (Z_P - Z_C) + c_y
        

The user drag introduces a UV displacement \(\Delta u_x, \Delta u_y\). With a single correspondence, motion along the viewing direction is unobservable, so we hold \(Z_C\) fixed and update only the lateral translation:

ΔX_C = -Δu_x * (Z_P - Z_C) / f
ΔY_C = -Δu_y * (Z_P - Z_C) / f

This establishes an initial alignment for the first correspondence by updating the camera’s **translation**.
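The Step 1 update can be sketched in a few lines of Python. This is an illustrative sketch, not an existing implementation: `project` and `translate_from_drag` are hypothetical names, and the camera is assumed unrotated at this stage, matching the projection equation above.

```python
import numpy as np

def project(P, C, f, c):
    """Pinhole projection of 3D point P with camera center C,
    focal length f, and principal point c (no rotation yet)."""
    d = P - C
    return f * d[:2] / d[2] + c

def translate_from_drag(C, P, du, f):
    """First-correspondence update: the drag du (pixels) moves the
    camera laterally; depth Z_C is unobservable from one point, so
    it is held fixed."""
    depth = P[2] - C[2]
    dXC = -du[0] * depth / f
    dYC = -du[1] * depth / f
    return C + np.array([dXC, dYC, 0.0])
```

Because the projection is linear in \(X_C, Y_C\) at fixed depth, this update reproduces the drag exactly, not just to first order: re-projecting the point after the update lands it precisely where the user dropped it.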

Step 2: Adjusting Field of View (FOV)

Once a second correspondence is introduced, the system may need to refine **focal length (f)** or **Field of View (FOV)** to maintain projection consistency across a larger region.

The FOV adjustment rescales the focal length so that the projected separation of the two correspondences matches their observed separation in the image:

f' = f * ( ||u_2 - u_1|| / ||p'_2 - p'_1|| )

Where:

- \(u_1, u_2\) are the observed image locations of the two correspondences
- \(p'_1, p'_2\) are their projections under the current estimate of \(f\)
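A minimal sketch of this focal update, assuming all quantities are in pixels; `refine_focal` is a hypothetical helper name:

```python
import numpy as np

def refine_focal(f, u1, u2, p1, p2):
    """Rescale focal length so the projected separation of two
    correspondences matches the user-observed separation.
    u1, u2: observed image points; p1, p2: current projections."""
    observed = np.linalg.norm(u2 - u1)
    projected = np.linalg.norm(p2 - p1)
    return f * observed / projected
```

This works because, for a fixed principal point, the separation between any two projected points scales linearly with f.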

Step 3: Introducing Yaw and Pitch

With a third correspondence, the system can begin refining **yaw (\(\theta\)) and pitch (\(\phi\))**.

Using the first-order Jacobians of the projected point with respect to yaw and pitch (rotations about the camera's Y and X axes, respectively):

J_θ = ∂u_x/∂θ = f * ( 1 + ((X_P - X_C) / (Z_P - Z_C))^2 )
J_φ = ∂u_y/∂φ = f * ( 1 + ((Y_P - Y_C) / (Z_P - Z_C))^2 )

We compute small-angle updates by dividing the residual drag by these (scalar) sensitivities:

Δθ = Δu_x / J_θ
Δφ = Δu_y / J_φ
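The yaw/pitch correction can be sketched as follows; `angle_updates` is a hypothetical helper, and the small-angle, rotation-about-camera-axes assumption applies:

```python
import numpy as np

def angle_updates(P, C, f, du):
    """First-order yaw/pitch corrections from the residual drag du
    (pixels) of the third correspondence."""
    x_hat = (P[0] - C[0]) / (P[2] - C[2])   # normalized image x
    y_hat = (P[1] - C[1]) / (P[2] - C[2])   # normalized image y
    J_theta = f * (1.0 + x_hat ** 2)        # d u_x / d yaw
    J_phi = f * (1.0 + y_hat ** 2)          # d u_y / d pitch
    return du[0] / J_theta, du[1] / J_phi
```

For points near the principal axis (x̂, ŷ ≈ 0) this reduces to Δθ ≈ Δu_x / f, the familiar pixels-to-radians conversion.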

Step 4: Refining Principal Points (c_x, c_y)

As additional correspondences are added, adjustments to the **principal point (\(c_x, c_y\))** account for any misalignment due to lens distortions or sensor miscalibration.

Adjusting the principal point shifts it by the mean pixel residual remaining after the other parameters have been updated (no division by \(f\), since both \(c_x, c_y\) and \(\Delta u\) are in pixels):

c_x' = c_x + mean( Δu_x )
c_y' = c_y + mean( Δu_y )
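This step amounts to averaging the leftover residuals; `refine_principal_point` below is a hypothetical helper illustrating it:

```python
import numpy as np

def refine_principal_point(c, residuals):
    """Shift the principal point by the mean pixel residual left over
    after translation, focal length, yaw and pitch have been applied.
    residuals: (N, 2) array of (observed - projected), in pixels."""
    return c + np.asarray(residuals).mean(axis=0)
```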

Step 5: Final Optimization with Levenberg-Marquardt

Once the system becomes **over-constrained**, a nonlinear optimization method like Levenberg-Marquardt fine-tunes all parameters jointly:

min Σ_i || u_i - Projected(P_i; X_C, Y_C, Z_C, f, θ, φ, c_x, c_y) ||²

where \(u_i\) is the observed image location of the \(i\)-th correspondence \(P_i\).

This step ensures that all correspondences contribute to an optimal camera pose solution.
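One way to sketch this final polish is with SciPy's `least_squares` in Levenberg-Marquardt mode. The parameter packing, the `rotation` convention (yaw about Y, pitch about X), and the function names are assumptions for illustration, not a committed design:

```python
import numpy as np
from scipy.optimize import least_squares

def rotation(theta, phi):
    """Rotation from world to camera frame: yaw about Y, then pitch about X."""
    cy, sy = np.cos(theta), np.sin(theta)
    cp, sp = np.cos(phi), np.sin(phi)
    R_yaw = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    R_pitch = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    return R_pitch @ R_yaw

def residuals(params, Ps, us):
    """Reprojection residuals for params = [X_C, Y_C, Z_C, f, theta, phi, c_x, c_y]."""
    C, f, theta, phi, c = params[:3], params[3], params[4], params[5], params[6:8]
    d = (rotation(theta, phi) @ (Ps - C).T).T        # points in camera frame
    proj = f * d[:, :2] / d[:, 2:3] + c              # pinhole projection
    return (proj - us).ravel()

def refine_pose(params0, Ps, us):
    """Levenberg-Marquardt polish once over-constrained
    (>= 4 correspondences for the 8 parameters)."""
    return least_squares(residuals, params0, args=(Ps, us), method="lm").x
```

With θ = φ = 0 the rotation is the identity and `residuals` reduces to the projection equation from Step 1, so the earlier incremental updates serve directly as the initial guess `params0`.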

4. Rabbit Holes and Scope Cut

5. No-Gos

6. Risks and Unknowns

7. Mockups

8. Integration Plan

9. Success Criteria

10. Next Steps