I am trying to learn more about agglomerative clustering from resources online by coming up with an algorithm rather than borrowing scipy routines; for my example, I am using the single-link criteria, which merges the two closest points together iteratively.

Using my algorithm, I have obtained the necessary data to form a dendrogram. Drawing one using matplotlib seems overly complicated, so I thought I'd borrow the scipy routine `dendrogram`

just for plotting. In order to do so, I must input my result in such a way that the scipy routine understands. Below is a comparison of my algorithm and the scipy algorithm. I do not understand how their output is obtained. I do not think I've made a mistake, but please correct me if I am wrong.

```
## GET EXAMPLE DATA
x = np.array([0, 5, 25, 125, 625])
y = np.array([0, 4, 16, 64, 256])
z = np.array([1, 1, 1, 5, 1000])
# data = np.array([x, y])
data = np.array([x, y, z])
def get_all_possible_differences_per_dim(data, dim):
""" """
## ADD DIMENSION AND BROADCAST SUBTRACTION
x = np.reshape(data[dim], (len(data[dim]), 1))
return x - x.transpose()
def get_euclidean_distance(points):
""" """
## STANDARD DISTANCE METRIX
return np.sqrt(np.sum(np.square(points), axis=0))
def single_link(data):
""" """
## GET DISTANCE MATRIX
ndim = data.shape[0]
all_distances = np.array([get_all_possible_differences_per_dim(data, dim) for dim in range(ndim)])
distance_matrix = get_euclidean_distance(all_distances)
## PAD DISTANCE MATRIX AT/BELOW DIAGONAL WITH LARGE DIFFERENCES
## GET ARGSORTED INDICES OF CLOSEST DATA POINTS USING NUMBER OF ELEMENTS CONTAINED IN ABOVE-DIAGONAL DISTANCES
## BELOW-DIAGONAL ENTRIES[i, j] = (-1) * ABOVE-DIAGONAL ENTRIES[j, i]; AVOID OVER-COUNTING
for row in range(len(distance_matrix)):
for col in range(len(distance_matrix[row])):
if row >= col:
distance_matrix[row, col] = np.inf
res = {}
## FLATTEN COPY OF DISTANCE MATRIX TO ONE DIMENSION
arr = np.reshape(distance_matrix, -1)
n = 2*distance_matrix.shape[0]
## USE ARGSORTED INDEX IN CONJUNCTION WITH NUMBER OF ROWS AND COLUMNS
## LOCATE i-th DATAPOINT (via ROW) WITH THE j-th DATA POINT (Via COLUMN)
rows, cols = np.divmod(np.argsort(arr)[:n], distance_matrix.shape[0])
## GET INDICES
indices = np.array([rows, cols]).T
res['indices'] = indices
## GET DISTANCES AT INDICES
res['distances'] = distance_matrix[indices[:, 0], indices[:, 1]]
res['distance matrix'] = distance_matrix
return res
```

To view the results obtained from the algorithm above,

```
my_result = single_link(data)
for key, value in my_result.items():
print("\n .. {} ({}):\n{}\n".format(key, value.shape, value))
```

produces the output:

```
.. distances ((10,)):
[ 6.40312424 23.32380758 29.68164416 110.9954954 134.22369388
140.48843369 1129.99513273 1189.79031766 1202.45789947 1205.88639598]
.. indices ((10, 2)):
[[0 1]
[1 2]
[0 2]
[2 3]
[1 3]
[0 3]
[3 4]
[2 4]
[1 4]
[0 4]]
.. distance matrix ((5, 5)):
[[ inf 6.40312424 29.68164416 140.48843369 1205.88639598]
[ inf inf 23.32380758 134.22369388 1202.45789947]
[ inf inf inf 110.9954954 1189.79031766]
[ inf inf inf inf 1129.99513273]
[ inf inf inf inf inf]]
```

From the above, we identify the first pair of `indices`

as `[0, 1]`

, which corresponds to the 0-th row and 1-st column of the distance matrix. The i-th pair of indices correspond to the i-th shortest distances away.

To view the result from the scipy algorithm, run this:

```
scipy_result = linkage(data, method='single')
print("\n .. SCIPY RESULT ({}):\n{}\n".format(scipy_result.shape, scipy_result))
```

which produces:

```
.. SCIPY RESULT ((2, 4)):
[[ 0. 1. 374.11762856 2. ]
[ 2. 3. 394.48447371 3. ]]
```

To view the figure obtained via scipy,

```
fig = plt.figure()
dn = dendrogram(scipy_result)
plt.show()
plt.close(fig)
```

I can see that the scipy array and my array `my_result['distances']`

have similar shape and likely correspond to each other. Clearly, they have reduced a dimension. But since I am argsorting, I should be collecting the same results in a different shape. Can someone help me see what I am doing wrong?

```
[1]: https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.linkage.html#scipy.cluster.hierarchy.linkage
```