#!pip install ANNarchy
Logging with tensorboard
The tensorboard extension allows logging various information (scalars, images, etc.) during training, for later visualization with tensorboard.
The extension has to be explicitly imported after ANNarchy:
import numpy as np
import matplotlib.pyplot as plt
import ANNarchy as ann
from ANNarchy.extensions.tensorboard import Logger
ANNarchy 4.8 (4.8.2) on darwin (posix).
As this is just for demonstration purposes, we will use an extremely simplified model of the basal ganglia learning, through reinforcement learning, to solve a stimulus-response task with 4 stimuli and 2 responses (left and right). The first two stimuli should be answered with left, the other two with right.
stimuli = [
    ([1, 0, 0, 0], 0), # A : left
    ([0, 1, 0, 0], 0), # B : left
    ([0, 0, 1, 0], 1), # C : right
    ([0, 0, 0, 1], 1), # D : right
]
We keep the model as simple as possible here. It is inspired by the rate-coded model described here:
Vitay J, Hamker FH. 2010. A computational model of Basal Ganglia and its role in memory retrieval in rewarded visual memory tasks. Frontiers in computational neuroscience 4. doi:10.3389/fncom.2010.00013
The input population is composed of 4 static neurons to represent the inputs:
cortex = ann.Population(4, ann.Neuron(parameters="r=0.0"))
The cortex projects on the striatum, which is composed of 10 neurons integrating excitatory and inhibitory inputs:
msn = ann.Neuron(
    parameters="tau = 10.0 : population; noise = 0.1 : population",
    equations="""
        tau*dv/dt + v = sum(exc) - sum(inh) + noise * Uniform(-1, 1)
        r = clip(v, 0.0, 1.0)
    """)

striatum = ann.Population(10, msn)
The striatum projects inhibitorily on GPi, whose neurons are tonically active (high baseline). Normally, GPi would project on the thalamus and back to the cortex, but here we read the output of the network directly in GPi: if the first neuron (corresponding to the left action) is less active than the second neuron, the selected action is left.
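This readout rule (GPi is inhibitory, so the least active neuron corresponds to the selected action) can be sketched in plain NumPy; `gpi_rates` is a hypothetical vector holding the two GPi firing rates:

```python
import numpy as np

def select_action(gpi_rates):
    """Return the index of the least active GPi neuron.

    GPi tonically inhibits its targets, so the most inhibited
    (least active) neuron corresponds to the selected action:
    index 0 = left, index 1 = right.
    """
    return int(np.argmin(gpi_rates))

# The left neuron is more inhibited than the right one -> left (0)
print(select_action(np.array([0.2, 0.9])))  # -> 0
```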
gp_neuron = ann.Neuron(
    parameters="tau = 10.0 : population; B = 1.0",
    equations="tau*dv/dt + v = B - sum(inh); r = pos(v)")

gpi = ann.Population(2, gp_neuron)
Learning occurs at the cortico-striatal synapses, using a reward-modulated Hebbian learning rule, with Oja regularization:
corticostriatal = ann.Synapse(
    parameters="""
        eta = 0.1 : projection
        alpha = 0.5 : projection
        dopamine = 0.0 : projection""",
    equations="w += eta*(dopamine * pre.r * post.r - alpha*w*post.r*post.r) : min=0.0"
)

cx_str = ann.Projection(cortex, striatum, "exc", corticostriatal)
cx_str.connect_all_to_all(weights=ann.Uniform(0.0, 0.5))
<ANNarchy.core.Projection.Projection at 0x12695f8f0>
Some lateral competition between the striatal neurons:
str_str = ann.Projection(striatum, striatum, "inh")
str_str.connect_all_to_all(weights=0.6)
<ANNarchy.core.Projection.Projection at 0x12695da90>
One half of the striatal population is connected to the left GPi neuron, the other half to the right neuron:
str_gpi1 = ann.Projection(striatum[:int(striatum.size/2)], gpi[0], 'inh').connect_all_to_all(1.0)
str_gpi2 = ann.Projection(striatum[int(striatum.size/2):], gpi[1], 'inh').connect_all_to_all(1.0)
We add a monitor on GPi and compile:
m = ann.Monitor(gpi, 'r')

ann.compile()
Compiling ... OK
Each trial is very simple: we get a stimulus x from the stimuli array and a correct response t, reset the network for 40 ms, set the input and simulate for 50 ms, observe the activity in GPi to decide what the answer of the network is, provide reward accordingly to the corticostriatal projection and let it learn for 10 ms.
Here the “dopamine” signal is directly the reward (+1 for success, -1 for failure), not the reward prediction error, but it is just for demonstration.
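For illustration, a reward prediction error could be obtained by subtracting a running estimate of the expected reward from the received reward. This is a minimal sketch (not part of the model above), assuming an exponentially smoothed baseline as the prediction; names and the learning rate are illustrative:

```python
class RunningRPE:
    """Reward prediction error against an exponentially smoothed baseline."""

    def __init__(self, lr=0.5):
        self.lr = lr          # learning rate of the baseline
        self.expected = 0.0   # running estimate of the reward

    def __call__(self, reward):
        rpe = reward - self.expected      # prediction error
        self.expected += self.lr * rpe    # move the baseline towards the reward
        return rpe

rpe = RunningRPE(lr=0.5)
print(rpe(1.0))  # -> 1.0 : the reward is fully unexpected at first
print(rpe(1.0))  # -> 0.5 : the baseline has already moved to 0.5
```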
def training_trial(x, t):
    # Delay period
    cortex.r = 0.0
    cx_str.dopamine = 0.0
    ann.simulate(40.0)

    # Set inputs
    cortex.r = np.array(x)
    ann.simulate(50.0)

    # Read output
    output = gpi.r
    answer = np.argmin(output)

    # Provide reward
    reward = 1.0 if answer == t else -1.0
    cx_str.dopamine = reward
    ann.simulate(10.0)

    # Get recordings
    data = m.get('r')

    return reward, data
The whole training procedure will simply iterate over the four stimuli for 100 trials:
for trial in range(100):
    # Get a stimulus
    x, t = stimuli[trial%len(stimuli)]
    # Perform a trial
    reward, data = training_trial(x, t)
We use the Logger class of the tensorboard extension to keep track of various data:
with Logger() as logger:
    for trial in range(100):
        # Get a stimulus
        x, t = stimuli[trial%len(stimuli)]
        # Perform a trial
        reward, data = training_trial(x, t)
        # Log data...
Note that it would be equivalent to manually close the Logger after training:
logger = Logger()
for trial in range(100):
    # Get a stimulus
    x, t = stimuli[trial%len(stimuli)]
    # Perform a trial
    reward, data = training_trial(x, t)
    # Log data...
logger.close()
We log different quantities here, just to demonstrate the different methods of the Logger class:
- The reward received after each trial:
logger.add_scalar("Reward", reward, trial)
The tag “Reward” will be the name of the plot in tensorboard. reward is the value that will be displayed, while trial is the index of the current trial (x-axis).
- The activity of the two GPi cells at the end of the trial, in separate plots depending on the stimulus:
if trial%len(stimuli) == 0:
    label = "GPi activity/A"
elif trial%len(stimuli) == 1:
    label = "GPi activity/B"
elif trial%len(stimuli) == 2:
    label = "GPi activity/C"
elif trial%len(stimuli) == 3:
    label = "GPi activity/D"
logger.add_scalars(label, {"Left neuron": gpi.r[0], "Right neuron": gpi.r[1]}, trial)
The four plots will be grouped under the label “GPi activity”, with the titles A, B, C or D. Note that add_scalars() requires a dictionary of values that will be plotted together.
- The activity in the striatum as a 2*5 image:
logger.add_image("Activity/Striatum", striatum.r.reshape((2, 5)), trial)
The activity should be reshaped to the correct dimensions. Note that activity in the striatum is bounded between 0 and 1, so there is no need for equalization.
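If the logged values were not already bounded, a simple min-max rescaling to [0, 1] would be needed before logging the image. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def normalize_image(a, eps=1e-12):
    """Linearly rescale an array to [0, 1] for image logging."""
    a = np.asarray(a, dtype=float)
    lo, hi = a.min(), a.max()
    return (a - lo) / (hi - lo + eps)

img = normalize_image(np.array([[-2.0, 0.0], [2.0, 6.0]]))
print(img.min(), img.max())  # -> 0.0 and (almost exactly) 1.0
```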
- A histogram of the preference of striatal cells for the stimuli A and B:
w = np.array(cx_str.w)
logger.add_histogram("Cortico-striatal weights/Left - AB/CD", np.mean(w[:5, :2] - w[:5, 2:], axis=1), trial)
logger.add_histogram("Cortico-striatal weights/Right - AB/CD", np.mean(w[5:, :2] - w[5:, 2:], axis=1), trial)
We make here two plots, one for the first 5 striatal cells, the other for the rest. We plot the difference between the mean weights of each cell for the stimuli A and B, and the mean weights for the stimuli C and D. If learning goes well, the first five striatal cells should have stronger weights for A and B than for C and D, as they project to the left GPi cell.
- A matplotlib figure showing the time course of the two GPi cells (as recorded by the monitor):
fig = plt.figure(figsize=(10, 8))
plt.plot(data[:, 0], label="left")
plt.plot(data[:, 1], label="right")
plt.legend()
logger.add_figure("Activity/GPi", fig, trial)
Note that the figure will be automatically closed by the logger, so there is no need to call show(). Logging figures is extremely slow, so use that feature wisely.
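Because figure logging is so slow, a common pattern is to log figures only every N trials; a minimal sketch of such a throttle (the interval is arbitrary):

```python
def should_log_figure(trial, every=10):
    """Return True only every `every` trials, to throttle expensive logging."""
    return trial % every == 0

print([t for t in range(30) if should_log_figure(t)])  # -> [0, 10, 20]
```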
By default, the logs are saved in the subfolder runs/, but this can be changed when creating the Logger:
with Logger("/tmp/experiment") as logger:
Each run of the network will be saved in this folder. You may want to delete the folder before each run, in order to only visualize the last run:
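Outside of a notebook (where the %rm magic is unavailable), deleting the log folder can be done with the standard library; a minimal sketch:

```python
import shutil
from pathlib import Path

def clean_logdir(path="runs"):
    """Delete a previous tensorboard log directory, if it exists."""
    p = Path(path)
    if p.exists():
        shutil.rmtree(p)

clean_logdir("runs")
```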
%rm -rf runs

with Logger() as logger:
    for trial in range(100):
        # Get a stimulus
        x, t = stimuli[trial%len(stimuli)]

        # Perform a trial
        reward, data = training_trial(x, t)

        # Log received rewards
        logger.add_scalar("Reward", reward, trial)

        # Log outputs depending on the task
        if trial%len(stimuli) == 0:
            label = "GPi activity/A"
        elif trial%len(stimuli) == 1:
            label = "GPi activity/B"
        elif trial%len(stimuli) == 2:
            label = "GPi activity/C"
        elif trial%len(stimuli) == 3:
            label = "GPi activity/D"
        logger.add_scalars(label, {"Left neuron": gpi.r[0], "Right neuron": gpi.r[1]}, trial)

        # Log striatal activity as a 2*5 image
        logger.add_image("Activity/Striatum", striatum.r.reshape((2, 5)), trial)

        # Log histogram of cortico-striatal weights
        w = np.array(cx_str.w)
        logger.add_histogram("Cortico-striatal weights/Left - AB/CD", np.mean(w[:5, :2] - w[:5, 2:], axis=1), trial)
        logger.add_histogram("Cortico-striatal weights/Right - AB/CD", np.mean(w[5:, :2] - w[5:, 2:], axis=1), trial)

        # Log matplotlib figure of GPi activity
        fig = plt.figure(figsize=(10, 8))
        plt.plot(data[:, 0], label="left")
        plt.plot(data[:, 1], label="right")
        plt.legend()
        logger.add_figure("Activity/GPi", fig, trial)
Logging in runs/May29_14-02-54_Juliens-MBP
You can now visualize the logged information by running tensorboard in a separate terminal and opening the corresponding page:
tensorboard --logdir runs
or directly in the notebook if you have the tensorboard
extension installed:
%load_ext tensorboard
%tensorboard --logdir runs --samples_per_plugin images=100
You should see a tensorboard page with four tabs: Scalars, Images, Distributions and Histograms.
The Reward plot shows that the network successfully learns to solve the task, as it consistently gets rewards of +1 (note that this may vary from run to run, depending on weight initialization):
The GPi activity tab shows that the two GPi cells quickly learn to be inhibited for the appropriate stimuli.
In the Images tab, the plot for the striatum allows visualizing activity at the end of each trial, showing that only one cell in the correct subpopulation is active:
The matplotlib figure for the GPi activity shows what happens during a trial, especially at the end of the reset period:
In the histograms tab, we can see that the left striatal population has acquired a preference (stronger weights) for the stimuli A and B, as the values are positive. The right population has negative values, so the neurons have stronger weights to the stimuli C and D. Note that some neurons in the right population still have stronger weights from A and B, but they are probably inhibited by the left population, so they do not impair performance.