Sorting Images using EMD¶

Consider a set of pictures from 360° camera mounted inside a merry-go-round. They were taken at night and only one seat is visible which emits light – the seat in the shape of a jelly fish. We know that the merry-go-round rotates clockwise and that all the pictures were taken during a single cycle of merry-go-round. Given that the first picture is P1, your task is to sort the rest in the chronological order. Assume that the move of jelly fish is monotonous in horizontal direction, i.e., it always moves in clockwise direction, never backwards.

Input files: text files, each of them with 10 rows and 80 columns representing the brightness level in the given parts of the picture. Jelly fish can be recognized by high brightness (value 1 to 9) on a black background (value 0).

00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000005950000000000000000000000000000000000000000
00000000000000000000000000000000000009990000000000000000000000000000000000000000
00000000000000000000000000000000000005950000000000000000000000000000000000000000
00000000000000000000000000000000000003930000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000

Directions¶

In order to capture the movement of the jelly fish between two pictures, use Earth-mover distance, also known as Wasserstein distance. It can be formulated as a min-cost flow problem on a certain network. Use NetworkX package (https://networkx.org) to perform the optimization required to find this flow.

About Earth-mover distance: Consider two distributions over a discrete set of points $P$: $μ$ in $[0,1]^P$ and $λ$ in $[0,1]^P$. Since both are distributions and sum of $μ(p)$ over all points in $P$ is $1$ (and the same holds for $λ$), there is surely a way to transform $μ$ into $λ$: we denote $f_{p,q}$ the probability mass transfered from point $p$ to $q$ for each pair of points $p,q$ from $P$, in order to transform $μ$ into $λ$:

For each $p$ in $P$: $μ(p) = \sum q f_{p,q}$ and for each $q$ in $P$: $λ(q) = Σ \sum p f_{p,q}$.

Note that $f_{p,p}$ denotes the probability mass which remains at the same place. Let $d(p,q)$ denote the distance between $p$ and $q$. Then, the cost of the transformation $f$ is

$\sum\limits_{p,q} d(p,q) f_{p,q}$

The earth-mover distance of μ and λ is then the cost of the cheapest transformation between $μ$ and $λ$.


Solution¶

In [ ]:
import networkx as nx
import numpy as np

def comp_dist(f,g):
    # if the two matrices are the same, we want the distance to be zero
    if f == g:
        distance = 0.0
    else:
        # initialise matrices of first and second file
        m1 = np.zeros((10,80))
        m2 = np.zeros((10,80))

        # create matrix for first file
        with open(f) as file:
            for i, line in enumerate(file):
                if line.strip():  # check if line is not empty
                    m1[i] = [int(x) for x in line.strip()]

        # create matrix for second file
        with open(g) as file:
            for i, line in enumerate(file):
                if line.strip():  # check if line is not empty
                    m2[i] = [int(x) for x in line.strip()]

        a = int(np.sum(m1))
        b = int(np.sum(m2))
        for i in range(10):
            for j in range(80):
                m1[i,j] = round((m1[i,j] / a)*10000)
                m2[i,j] = round((m2[i,j] / b)*10000)
        # Let l1 and l2 be the lists of bright pixels and their relative positions in the corresponding matrices
        l1 = []
        l2 = []
        for i in range(10):
            for j in range(80):
                if m1[i][j] != 0:
                    l1.append((m1[i][j],i,j))
        for i in range(10):
            for j in range(80):
                if m2[i][j] != 0:
                    l2.append((m2[i][j],i,j))
        g = nx.DiGraph()

        for i in range(len(l1)):
            g.add_node(l1[i], demand = -l1[i][0])
        for i in range(len(l2)):
            g.add_node(l2[i], demand =  l2[i][0])

        for i in range(len(l1)):
            for j in range(len(l2)):
                g.add_edge(l1[i], l2[j], weight = (l2[j][2]-l1[i][2])%80)
        distance = nx.min_cost_flow_cost(g)
    return(distance)

def sort_files():
    files = []
    for i in range(1,16):
        files.append(f"P{i}.txt")
    sorted_files = sorted(files, key = lambda x: comp_dist("P1.txt", x))
    return(sorted_files)