Sunday, August 24, 2014

Cortical Learning Algorithm for Morse code - part 1

Cortical Learning Algorithm Overview 

Humans can perform many tasks that computers currently cannot do. For example understanding spoken language in noisy environment, walking down a path in complex terrain or winning in CQWW WPX CW contest are tasks currently not feasible for computers (and might be difficult for humans, too).
Despite decades of machine learning & artificial intelligence research, we have few viable algorithms for achieving human-like performance on a computer. Morse decoding at the best human performance level would be a good target to test these new algorithms.

Numenta Inc.  has developed technology called Cortical Learning Algorithm (CLA) that was recently made available as open source project  NuPIC.  This software provides an online learning system that learns from every data point fed into the system. The CLA is constantly making predictions which are continually verified as more data arrives. As the underlying patterns in the data change the CLA adjusts accordingly. CLA uses Sparse Distributed Representation (SDR) in similar fashion as neocortex in human brain stores information. SDR has many advantages over traditional ways of storing memories, such as ability to associate and retrieve information using noisy data.

Detailed description on how CLA works can be found from this whitepaper.

Experiment 

To learn more how CLA works I decided to start with a very simple experiment.  I created a Python script that uses Morse code book and calculates Sparse Distributed Representation (SDR) for each character.  Figure 1 below shows the Morse alphabet and numbers 0...9 converted to SDRs.

Fig 1. SDR for Morse characters A...Z, 0...9


NuPIC requires input vector to be a binary representation of the input signal.  I created a function that packs "dits" and "dahs" into 36x1  vector, see two examples below. Each "dit"  is represented as 1. followed by 0. and each "dah" is represented by  1. 1. 1. followed by 0.  to accomodate 1:3 timing ratio between "dit" and "dah".  This preserves the semantic structure of Morse code that is important from sequence learning perspective.

H ....
[ 1.  0.  1.  0.  1.  0.  1.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]

O ---
[ 1.  1.  1.  0.  1.  1.  1.  0.  1.  1.  1.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]

The Spatial Pooler uses 64 x 64 vector giving SDR of size 4096. As you calculate the SDR random bits get active on this vector. I plotted all active cells (value = 1) per columns 0...4096 for each letters and numbers as displayed in Fig 1. above. The respective character is shown on the right most column.

To see better the relationships between SDR and Morse character set I created another SDR map with letters  'EISH5' and 'TMO0'  next to each other.  These consequent letters and numbers differ from each other only by one "dit" or one "dah".  See Fig 2.  for SDR visualization of these characters.

There is no obvious visible patterns across these Morse characters, all values look quite different.  In the Numenta CLA whitepaper page 21 it says "Imagine now that the input pattern changes. If only a few input bits change, some columns will receive a few more or a few less inputs in the “on” state, but the set of active columns will not likely change much. Thus similar input patterns (ones that have a significant number of active bits in common) will map to a relatively stable set of active columns."

This doesn't seem to apply in these experiments so I need to investigate this a bit further.


Fig 2. SDR for Morse characters EISH5 and TMO0










I did another experiment by reducing SDR size to only 16x16  so 256 cells per SDR. In Fig 3.  it is now easier to see common patterns between similar characters - for example compare C with K and Y.  These letters have 3 common cells active.

Fig 3. SDR  map with reduced size 16x16 = 256 cells
























The Python software to create the SDR pictures is below:

"""A simple program that demonstrates the working of the spatial pooler"""
import numpy as np
from matplotlib import pyplot as plt
from random import randrange, random
from nupic.research.spatial_pooler import SpatialPooler as SP


CODE = {
    ' ': '',
    'A': '.-',
    'B': '-...',
    'C': '-.-.', 
    'D': '-..',
    'E': '.',
    'F': '..-.',
    'G': '--.',    
    'H': '....',
    'I': '..',
    'J': '.---',
    'K': '-.-',
    'L': '.-..',
    'M': '--',
    'N': '-.',
    'O': '---',
    'P': '.--.',
    'Q': '--.-',
    'R': '.-.',
    'S': '...',
    'T': '-',
    'U': '..-',
    'V': '...-',
    'W': '.--',
    'X': '-..-',
    'Y': '-.--',
    'Z': '--..',
    '0': '-----',
    '1': '.----',
    '2': '..---',
    '3': '...--',
    '4': '....-',
    '5': '.....',
    '6': '-....',
    '7': '--...',
    '8': '---..',
    '9': '----.' }



class Morse():
  
  def __init__(self, inputShape, columnDimensions):
    """
     Parameters:
     ----------
     _inputShape : The size of the input. (m,n) will give a size m x n
     _columnDimensions : The size of the 2 dimensional array of columns
     """
    self.inputShape = inputShape
    self.columnDimensions = columnDimensions
    self.inputSize = np.array(inputShape).prod()
    self.columnNumber = np.array(columnDimensions).prod()
    self.inputArray = np.zeros(self.inputSize)
    self.activeArray = np.zeros(self.columnNumber)

    self.sp = SP(self.inputShape, 
self.columnDimensions,
potentialRadius = self.inputSize,
numActiveColumnsPerInhArea = int(0.02*self.columnNumber),
globalInhibition = True,
synPermActiveInc = 0.01
)

    
  def createInputVector(self,elem,chr):
        
    print "\n\nCreating a Morse codebook input vector for character: " + chr + " " + elem
    
    #clear the inputArray to zero before creating a new input vector
    self.inputArray[0:] = 0
    j = 0
    i  = 0

    while j < len(elem) :
      if elem[j] == '.' :
        self.inputArray[i] = 1
        self.inputArray[i+1] = 0
        i = i + 2
      if elem[j] == '-' :
        self.inputArray[i] = 1
        self.inputArray[i+1] = 1
        self.inputArray[i+2] = 1
        self.inputArray[i+3] = 0
        i = i + 4
      j = j + 1
               
    print self.inputArray
     
     
  def createSDRs(self,row,x,y,ch):
    """Run the spatial pooler with the input vector"""
    print "\nComputing the SDR for character: " + ch
    
    #activeArray[column]=1 if column is active after spatial pooling
    self.sp.compute(self.inputArray, True, self.activeArray)
    
    # plot each row above previous one
    z = self.activeArray * row

    # Plot the SDR vectors - these are 4096 columns in the plot, with active cells visible
    plt.plot(x,y,z,'o')
    plt.text(4120,row-0.5,ch,fontsize=14);
    print self.activeArray.nonzero()
    
    
# Testing NuPIC for Morse decoding  
# Create SDRs from Morse Codebook input vectors
print "\n \nCreate SDRs from Morse Codebook input vectors"

# vector size 36x1 for input,  64x64 = 4096 for SDR 
example = Morse((36, 1), (64, 64))

x,y = np.meshgrid(np.linspace(1,1,4096), np.linspace(1,1,32))
plt.ylim([0,38])
plt.xlim([0,4170])

# Select the characters to be converted to SDRs
#str = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
str = 'EISH5 TMO0'
row = 1
for ch in str:
  example.createInputVector(CODE[ch],ch)
  example.createSDRs(row,x,y,ch)
  row = row +1

plt.show()
plt.clf()



Conclusions

CLA provides promising new technology that is now open for ham radio experimenters to start tinkering with. Building a CLA based Morse decoder would be a good performance benchmark for CLA technology. There are plenty of existing Morse decoders available to compare with and it is fairly easy to test decoder accuracy under noisy conditions.  There is also a plenty of audio test material available, including streaming sources like WebSDR stations.

73
Mauri 




Popular Posts