Colorizing the Prokudin-Gorskii Photo Collection
Prokudin-Gorskii's early-1900s collection features almost 2,000 scenes, each captured as 3 glass negatives (one per color channel). The hope was to be able to align these plates and view the scenes in RGB!
Glass negatives of a scene (blue, green, and red channels from top to bottom)

Main Idea
If we crop each plate into its three channels (blue, green, and red from top to bottom) and align those channels, we should get a coherent RGB image.
Initial Approach
We can do a naive 3-way vertical split to get rough crops of our 3 color channels: simply divide the plate's height into thirds. If we define B to be our baseline, we can align R to B and G to B, then stack these results with our original B channel to get our RGB image.

To align these cropped channels, we perform a sweep of candidate shifts, score each with a loss, and choose the shift that minimizes it. In this particular implementation, I sweep over a range of (-15, 15) in both Cartesian directions and roll the pixel values such that any pixels pushed past one edge wrap around to the other edge of the image.

The choice of loss matters. The sum of squared differences (SSD) simply compares raw pixel values/brightnesses, but the three plates record different exposures of the same scene, so minimizing SSD doesn't necessarily converge to an aligned image: image pixels aren't simply a function of the object's brightness. As a solution, we can use a normalized cross-correlation (NCC) loss, which subtracts each channel's mean and normalizes its magnitude, so that it measures the structural similarity of objects in the image rather than their absolute brightness.
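The exhaustive sweep described above can be sketched as follows; this is a minimal NumPy version under my own naming (`ncc`, `align` are illustrative, not the project's actual code):

```python
import numpy as np

def ncc(a, b):
    # Normalized cross-correlation: compare zero-mean, unit-norm versions
    # of the two channels so the score reflects structure, not brightness.
    a = a - a.mean()
    b = b - b.mean()
    return (a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

def align(channel, base, radius=15):
    # Sweep shifts in [-radius, radius] along both axes, rolling pixels
    # that fall off one edge onto the other, and keep the shift that
    # scores the highest NCC against the base channel.
    best_shift, best_score = (0, 0), -np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            score = ncc(shifted, base)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

Maximizing NCC plays the role of minimizing the loss here; `np.roll` gives the wrap-around behavior described above.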
Large Images: Refining Our Approach
A search space of 30 * 30 = 900 candidate shifts is not feasible for large images on the order of 80+ MB, since each candidate requires a full-image loss evaluation. I optimized my code as much as possible through vectorization and cropping, but the search still proved too exhaustive. As a solution, we can employ an image pyramid, which performs sweeps over a larger search space as we decrease the image resolution. If we perform an exhaustive enough search at a low resolution, we can take our displacement vector and scale it by our rescaling factor, giving us a displacement estimate for the next-finest resolution, which then only needs a small local refinement. We can do this for an arbitrary number of levels, but I primarily stick with 3 for small images (< 1 MB) and 5 for large images (well over 1 MB). Another clever trick I was recommended: align the red channel with green, then align green with blue, and fold that additional shift into the red offset. In doing so, we have a chained 3-way alignment scheme instead of both red and green aligning directly to blue.
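The coarse-to-fine recursion can be sketched like this, assuming the single-level NCC sweep from before (function names `align`, `pyramid_align`, and `downscale` are my own, and the simple 2x2 block-mean downscale stands in for a proper anti-aliased resize):

```python
import numpy as np

def ncc(a, b):
    # Zero-mean, unit-norm correlation score (higher is better).
    a, b = a - a.mean(), b - b.mean()
    return (a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

def align(channel, base, radius):
    # Exhaustive single-level sweep with wrap-around rolling.
    best_shift, best_score = (0, 0), -np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            score = ncc(np.roll(channel, (dy, dx), axis=(0, 1)), base)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

def downscale(img):
    # Halve resolution by averaging 2x2 blocks.
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def pyramid_align(channel, base, levels=3, radius=15):
    # Recurse to a coarser level, double the coarse shift to bring it
    # to this resolution, then refine with a small local sweep.
    if levels == 1:
        return align(channel, base, radius)
    cy, cx = pyramid_align(downscale(channel), downscale(base),
                           levels - 1, radius)
    dy, dx = 2 * cy, 2 * cx
    shifted = np.roll(channel, (dy, dx), axis=(0, 1))
    ry, rx = align(shifted, base, radius=2)
    return (dy + ry, dx + rx)
```

Because each level only refines within a tiny radius, the total work is dominated by the cheap exhaustive sweep at the coarsest resolution.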
Results
Onion Church - Red: (112, 48), Green: (48, 32)
Boat - Red: (144, 32), Green: (64, 16)
Icon - Red: (96, 16), Green: (48, 16)
Train - Red: (96, 32), Green: (48, 0)
Castle - Red: (144, 32), Green: (64, 16)
Melons - Red: (176, 16), Green: (80, 16)
Shed - Red: (80, 32), Green: (32, 16)
Three Generations - Red: (112, 16), Green: (48, 16)
Church - Red: (48, -16), Green: (16, 0)
Emir - Red: (96, 48), Green: (48, 32)
Lady - Red: (112, 16), Green: (48, 16)
Sculpture - Red: (144, -32), Green: (32, -16)
Monastery - Red: (-2, 0), Green: (-6, 0)
Cathedral - Red: (12, 4), Green: (4, 4)
Tobolsk - Red: (8, 4), Green: (4, 4)
Ranch - Red: (128, 64), Green: (64, 32)
Harvesters - Red: (128, 16), Green: (64, 16)
Self Portrait - Red: (176, 32), Green: (80, 32)

Bells and Whistles
I still noticed some misalignment artifacts after the pyramid search and decided to use edge detection for better alignment. I tried 3 different filters: Canny, Gaussian, and Sobel. I found Canny to be the best, and you can find the results displayed below. If you look closely at the left-most person and the outline of their clothes, you can notice that the red fringe has been removed!
Side-by-side comparison of alignment with and without edge detection

Edge detection is far more robust than matching raw color channels, especially when those channels aren't strong signals with respect to each other. By comparing edges extracted via filtering rather than raw brightness, we circumvent this issue.
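To illustrate the idea, here is a minimal edge-map sketch in plain NumPy. The writeup found Canny best (e.g., scikit-image's `skimage.feature.canny`), but a Sobel gradient magnitude keeps the example dependency-free; `conv2d` and `edge_map` are illustrative names, and the alignment step would simply score NCC on these edge maps instead of the raw channels:

```python
import numpy as np

# Horizontal Sobel kernel; its transpose detects vertical gradients.
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

def conv2d(img, kernel):
    # Direct valid-mode 3x3 convolution (flip the kernel, then slide).
    k = np.flipud(np.fliplr(kernel))
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            out += k[i, j] * img[i:i + h - 2, j:j + w - 2]
    return out

def edge_map(img):
    # Gradient magnitude: strong where brightness changes sharply,
    # near zero in flat regions, regardless of absolute brightness.
    gx = conv2d(img, SOBEL_X)
    gy = conv2d(img, SOBEL_X.T)
    return np.hypot(gx, gy)
```

Because an edge map discards absolute brightness and keeps only structure, two channels with very different exposures still produce similar maps, which is exactly what the alignment search needs.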