
2.4 Neural Encoding: Variability

siliconvalleystudent 2022. 10. 4. 02:39



We'll finish up this week's material by considering a final couple of model upgrades.

Let's go back and stare at this data again.

There are a couple of issues that we haven't yet addressed.

One is that we modeled only a time-varying firing rate.

And of course this data is in the form of spike times.

In what sense the precise patterns of these spike trains might be meaningful is something that we'll return to in a couple of weeks. But in this section we'll directly address the hidden assumptions of models like the ones we've been developing, about the relationship between that time-varying firing rate, r(t), and the occurrences of single spikes.

And we'll try to deal with the fact that there does appear to be some fine structure, here maybe, in the spike trains that a smooth function r(t) can miss.

But first we'll talk about the fact that this data was produced by showing the retina a natural movie and not white noise, which was the stimulus that we used in our previous discussion.

In real life, neurons aren't living in a world of white noise, and it turns out that the statistics of the stimulus that you use to fit a model do affect the model that you arrive at.

So we chose to use white noise rather than some more natural stimulus because, no matter how you filter it, it's always Gaussian, which means that there's no special structure, no special directions, in the stimulus set itself.

Since it's already come up and it will be coming up again let me just remind you what a Gaussian function is.
So it's defined as follows: some coefficient multiplied by an exponential factor, which involves x minus some parameter, x₀, squared, divided by 2σ²; that is, p(x) = A·exp(−(x − x₀)² / 2σ²).

So here, x₀ is the center of this function.

And sigma is a measure of its width.

So if we think of this function p(x) as a Gaussian probability distribution over x,

then its mean, x̄, which is the average of x, is x₀.

And its variance, defined as the average of (x − x̄)², is equal to σ².

So the standard deviation is just the square root of that, which is σ.
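As a quick sanity check on those definitions, here is a minimal Python sketch (my own illustration, with arbitrary parameter values) showing that samples drawn from a Gaussian with center x₀ and width σ have mean x₀ and variance σ²:

```python
import numpy as np

# Arbitrary example parameters (not from the lecture)
x0, sigma = 2.0, 0.5

# Draw many samples from a Gaussian with mean x0 and standard deviation sigma
rng = np.random.default_rng(0)
x = rng.normal(loc=x0, scale=sigma, size=100_000)

print("sample mean     ≈", x.mean())   # should be close to x0 = 2.0
print("sample variance ≈", x.var())    # should be close to sigma^2 = 0.25
```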

Now, if you add together two or more Gaussian random numbers, the new random number also has a Gaussian distribution and that's just what you're doing by filtering.

Taking linear combinations of the values of the white noise at different time points.

So with white noise, when we're using geometrical techniques like PCA, we're making sure that we have a stimulus that's as symmetric as possible with respect to those coordinate transformations that filtering gives us.

There are no special stimulus dimensions built into the prior, into the stimulus ensemble itself.
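To see that closure under filtering concretely, here is a small sketch (my own, with an arbitrary filter kernel) showing that linearly filtered Gaussian white noise is still Gaussian, as judged by its near-zero skewness and excess kurtosis:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Gaussian white noise and an arbitrary linear filter (illustrative choice)
noise = rng.normal(size=200_000)
kernel = np.array([0.5, 1.0, -0.3, 0.2])

# Filtering = taking linear combinations of white-noise values at different times
filtered = np.convolve(noise, kernel, mode="valid")

# For a Gaussian distribution, both of these should be near zero
print("skewness        ≈", stats.skew(filtered))
print("excess kurtosis ≈", stats.kurtosis(filtered))
```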

Let's go back to the question that we posed last time.

When have we found a good feature?

When have we identified a good filter, f?

We answered that by looking for the response function with respect to that stimulus component f, an input output curve that is interesting or has some structure.

So, recall, I showed you these, these two cases.

In this one, we have the Gaussian prior, the distribution of the filtered stimulus,

and the conditional distribution, those values of the filtered stimulus that are conditioned on the arrival time of a spike.

In this case, those two distributions are very similar, so when we take their ratio to compute the input output
function, we get just a flat curve.

Now, when those two distributions are very distinct, when they're very different, then their ratio has some interesting structure.

So, instead of taking the average or doing PCA to find that filter, could we just go directly to these quantities, these distributions, the prior and the conditional distribution, and ask: can I find an f,

a choice of f, such that when I project the stimulus onto it, the conditional distribution and the prior are as different as possible?

So, what would it mean to be as different as possible?

There's a standard measure that we use for evaluating the difference between two probability distributions, and that's called the Kullback-Leibler divergence.

So here is the definition of the Kullback-Leibler divergence, D_KL,

the divergence between two distributions P(s) and Q(s).

It's given by integrating over the random variable, in this case s: P(s), multiplied by the logarithm of the ratio of those two distributions. That is, D_KL(P, Q) = ∫ ds P(s) log[P(s)/Q(s)].

So what do we get if we use this D_KL between the prior and the spike-conditional distribution as a measure of the success of the choice of f, and just try to find an f that maximizes this quantity directly?
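As an illustration of the quantity itself (my own sketch, not code from the lecture), here is one way to estimate D_KL between a spike-conditional distribution and a prior from histograms of a projected stimulus:

```python
import numpy as np

def kl_divergence(p_samples, q_samples, bins=50, eps=1e-12):
    """Estimate D_KL(P || Q) from samples, using a shared histogram binning."""
    edges = np.linspace(min(p_samples.min(), q_samples.min()),
                        max(p_samples.max(), q_samples.max()), bins + 1)
    p, _ = np.histogram(p_samples, bins=edges)
    q, _ = np.histogram(q_samples, bins=edges)
    p = p / p.sum()
    q = q / q.sum()
    return np.sum(p * np.log((p + eps) / (q + eps)))

# Toy example: the spike-conditional distribution is shifted relative to the prior
rng = np.random.default_rng(2)
prior = rng.normal(0.0, 1.0, 50_000)        # filtered stimulus at all times
spike_cond = rng.normal(1.5, 0.8, 5_000)    # filtered stimulus at spike times
print("D_KL(spike-conditional || prior) ≈", kl_divergence(spike_cond, prior))
```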

So this is the approach that was developed by Tatyana Sharpee and Bill Bialek.

So now, I'm taking some arbitrary stimulus distribution.

And here I've drawn it in a, you know, pseudo high dimensional space.

P of s is the, is the distribution of all possible stimuli.

We're going to take some filter, again a vector, in this high dimensional space, that's f1, and project all of the stimuli
onto it to compute the prior here in gray.

And now we'll project the spike-triggering stimuli, which here are pictured in yellow, to compute the spike-conditional distribution, also in yellow.

And now, one can vary f around.

Right, so we can take different directions of this f.

And repeat this procedure and compute the DKL between this prior and the spike conditional distribution.

Here's another example of a different choice of f, f2.

In that case our prior has a slightly different shape because the stimulus distribution has a different shape in that direction and the spike conditional distribution also has a different shape.

You can see that these two distributions are much more similar than these two are.

And so, we would prefer f1 as a better choice of our filter than we would f2.

And so one can move around in this space, keep evaluating these two distributions, and search for an f that maximizes the difference between those two distributions.

Now, this turns out to be equivalent to maximizing the mutual information between the spike
and the stimulus.

So we're trying to find a stimulus component that is as informative as possible.

So observing a spike pins down our estimate for the stimulus much better for the f1 component, in this case, than
it does for the f2 component.
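A toy version of that search might look like the sketch below (my own simplification, not the actual Sharpee/Bialek implementation, which uses a careful gradient-based optimization in high dimensions): try candidate directions f, project the stimuli, and keep the direction where the spike-conditional distribution and the prior differ most.

```python
import numpy as np
from scipy.stats import entropy   # entropy(p, q) gives the KL divergence of p from q

rng = np.random.default_rng(3)

# Toy non-Gaussian 2-D stimulus and an assumed "true" relevant direction
stim = rng.laplace(size=(100_000, 2))
true_f = np.array([0.8, 0.6])
spike_prob = 1.0 / (1.0 + np.exp(-(stim @ true_f - 1.0)))   # made-up toy neuron
spikes = rng.random(len(stim)) < spike_prob

def dkl_spike_vs_prior(proj, spikes, bins=40):
    """D_KL between spike-conditional and prior histograms of a 1-D projection."""
    edges = np.linspace(proj.min(), proj.max(), bins + 1)
    p_spike, _ = np.histogram(proj[spikes], bins=edges)
    p_prior, _ = np.histogram(proj, bins=edges)
    return entropy(p_spike + 1, p_prior + 1)   # +1 smoothing avoids empty bins

# Search over candidate directions and keep the most informative one
candidates = [np.array([np.cos(t), np.sin(t)]) for t in np.linspace(0, np.pi, 60)]
best = max(candidates, key=lambda f: dkl_spike_vs_prior(stim @ f, spikes))
print("best direction (should be close to ±(0.8, 0.6)):", best)
```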

So notice that the stimulus here is no longer Gaussian, as we mentioned. It's no longer a nice, symmetric ball, and I've drawn it like that because there's nothing about this technique that demands that our stimulus be white noise.

Since this is a stimulus with some arbitrary distribution, both the prior and the spike-conditional distributions vary with the direction of f, but that is okay.

The fact that this method can be applied to arbitrary inputs means that this technique has been used to derive models using natural stimuli.

So one can then take this to the next step and compute the input-output function from the ratio of the conditional distribution and the prior.

So it's a powerful technique.

It generalizes to, to complex stimuli.

However, one of its downsides is that the maximization step is not guaranteed to converge to a
unique maximum.

That is, it is a difficult optimization problem.

So to summarize, we saw how to build a model with a single filter, by taking the spike triggered average.

We saw that we could generalize that to multiple filters using PCA.

And finally, we introduced an information-theoretic method that uses the whole distribution of stimuli to compute an optimal filter, and this last method removed the requirement for Gaussian stimuli.

So our next task, as foreshadowed, is to deal with the issue of the relationship between our time varying r of t and the arrival time of spikes.

So to go from r of t to spikes, the assumption that we'll be making is that every spike is generated independently, with a probability that is scaled by that time-varying r of t.

What does this mean and how can we test it?

Let's start from the most elementary random process, the flip of a coin.

It has probability 1/2 of landing heads and probability 1/2 of landing tails.

Now, let's take a biased coin: it only has some small probability, p, of landing heads up, and that's when the system spikes.

So now we can think of the arrival times of spikes as obeying something as simple as that.

We have some time, t. We divide it into many time bins of size delta t.

Let's say there's n of them, right, n is t over delta t.

And that gives us a sequence of n time bins, and let's assume there's the same probability, p, of firing in each of those bins.

Now we'd like to know: how many spikes will occur in the total time t? This is, of course, a random number. It will vary on every trial.

This random number has what's called a binomial distribution.

Binomial meaning two values, and those two values have the probability of firing, p, and the probability of not firing, 1 − p.

So, what's the probability that we'll see some particular number of spikes, k spikes, in those n time bins?

How do we compute this?

All we need to do is count. What's the probability that there's a spike in exactly k bins? It's the probability, bin by bin, that a spike occurred, so we need p to the power k.

And then the probability that a spike didn't occur in the remaining bins, so 1 − p. How many bins did a spike not happen in? That's n − k, so we need (1 − p) to the power n − k.

And we don't really care which of the k bins it occurred in, so we need to count up the number of different ways that we could arrange those k spikes among the n bins.
And that's a quantity often called "n choose k", which we can write as n! / (k! (n − k)!).
Where factorial, to give an example: 3 factorial is 3 × 2 × 1.

So n factorial is n × (n − 1) × (n − 2), all the way down to 1.

So write this here. Now what's the average number of spikes?

That's just n times p: the number of bins times the probability that there'll be a spike in a bin.

What's the variance in the number of spikes?

That turns out to be given by np(1 − p).
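Here is a small numerical check of those three facts (my own sketch, with made-up values of n and p): the binomial formula for the probability of k spikes, the mean np, and the variance np(1 − p).

```python
import numpy as np
from math import comb

n, p = 100, 0.05          # number of bins and per-bin spike probability (example values)
rng = np.random.default_rng(4)

# Simulate many trials: count spikes in n independent "coin flips" per trial
counts = (rng.random((200_000, n)) < p).sum(axis=1)

k = 5
analytic = comb(n, k) * p**k * (1 - p)**(n - k)   # P(k spikes) = C(n,k) p^k (1-p)^(n-k)
print("P(k=5):   simulated", np.mean(counts == k), " analytic", analytic)
print("mean:     simulated", counts.mean(), " analytic", n * p)            # np
print("variance: simulated", counts.var(), " analytic", n * p * (1 - p))   # np(1-p)
```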

Now, in the limit that there are many time bins and the probability of a spike in any bin becomes very small, one can show that the binomial distribution has a limit.

That's the following form.

So we go from the distribution that we just derived, in the limit of very small time bins, and we introduce a parameter r, which is the probability in a time bin divided by the size of the time bin.

The probability for a given time bin is going to become very small as the bin size becomes very small.

So what we want to do is choose the parameter r such that it stays finite as the time bin gets very small.

And so that's the rate or probability per unit of time.

So now that becomes our parameter in this distribution.

So one can start with the binomial distribution from before, do some calculations, and end up with an expression like this: P_t(k) = (rt)^k e^(−rt) / k!.

Some of you might like to try that for yourself or perhaps look it up on, on Wikipedia.

This new distribution is called the Poisson Distribution.

I've subscripted it now, not by the number of bins but by the total time, t, since we've again assumed that we've taken the limit where delta t becomes very small.

So what are the properties of the Poisson distribution? It has a mean of r times t, which hopefully feels intuitive.

The number of spikes is the rate times the total time.

Slightly less intuitively, it also has a variance that's given by r times t.

So you might notice that that's the same as the mean.

That is a very unusual property, and because of that, a quantity called the Fano factor, which is the ratio of the variance to the mean, has become a way to test whether a distribution is Poisson or not.

If it has a value of one, then it's Poisson.
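A quick simulation sketch (my own, with example parameters) of that limit: with many tiny bins of size Δt and per-bin probability p = rΔt, the spike count behaves like a Poisson variable with mean and variance rt, so the Fano factor is about 1.

```python
import numpy as np

rng = np.random.default_rng(5)
r, t_total, dt = 20.0, 1.0, 1e-3       # example rate (Hz), window (s), bin size (s)
n, p = int(t_total / dt), r * dt       # many bins, small per-bin probability p = r*dt

# Spike counts over many trials from independent per-bin "coin flips"
counts = (rng.random((100_000, n)) < p).sum(axis=1)

print("mean        ≈", counts.mean(), " (expect r*t =", r * t_total, ")")
print("variance    ≈", counts.var())
print("Fano factor ≈", counts.var() / counts.mean(), " (≈ 1 for Poisson)")
```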

Finally, if spikes have been generated through a Poisson process, which fundamentally expresses the idea we started from, that they're generated in every time bin, delta t, as though they were independent, with probability r times delta t,

then they'll also have the property that the intervals between successive spikes have an exponential distribution.

You can get some intuition for why this is by considering the distribution above, but evaluated just for one spike, as a function now of the time, t.

You'll see the appearance of the exponential, and the factorial goes away.

So comparing between them, the interval distribution doesn't have this factor t out the front because it has to be normalized over all time while the expression above doesn't.

Now, the probability of seeing k spikes in a chunk of time, t, depends on the firing rate in this way; this is the Poisson distribution.

So these are two strong characteristics of a Poisson distribution.

One, that the Fano factor is 1. And second, that the interval distribution should look like an exponential distribution of times.
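Both diagnostics can be checked on a simulated spike train; here is a sketch (my own illustration, with an arbitrary constant rate) that generates Poisson spikes and looks at the Fano factor and the shape of the interspike-interval distribution.

```python
import numpy as np

rng = np.random.default_rng(6)
r, dt, duration = 30.0, 1e-4, 200.0    # example rate (Hz), bin size (s), total time (s)

# Poisson spiking: each tiny bin spikes independently with probability r*dt
spike_bins = rng.random(int(duration / dt)) < r * dt
spike_times = np.nonzero(spike_bins)[0] * dt
isis = np.diff(spike_times)

# Diagnostic 1: Fano factor of spike counts in 1-second windows
counts = spike_bins.reshape(-1, int(1.0 / dt)).sum(axis=1)
print("Fano factor ≈", counts.var() / counts.mean())            # ≈ 1

# Diagnostic 2: exponential ISIs have mean 1/r and a coefficient of variation of 1
print("ISI mean ≈", isis.mean(), " (expect 1/r =", 1 / r, ")")
print("ISI CV   ≈", isis.std() / isis.mean(), " (≈ 1 for an exponential)")
```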

So here are some examples of the Poisson distribution for a few different choices of the firing rate.

For low firing rate, the distribution is almost exponential, whereas as the rate gets higher, the Poisson distribution
looks more and more Gaussian.

Now in general, the rate is varying as a function of time.

So if we want to see if this idea is reasonable by looking at data, we need to allow r, the rate, to vary in time.

Here is data from a neuron in monkey MT cortex, which is sensitive to motion. The monkey is watching variable patterns drift across the screen, and we're going to look at this experiment in more detail next week.

The same pattern is being shown over and over again.

You see that, as in the retina, there is an overall modulation in the firing rate over time.

But if you zoom in here on a short interval of time, that's drawn up here, the spikes are very
variable.

Now, if you split the data up into these little windows of time and plot the mean number of spikes in a time bin against the variance in that time bin, what would you expect to see?

In every bin, if the spikes are Poisson but with a different rate, you could plot the mean against the variance.

What would you expect?

Remember that the slope of that plot would be the Fano factor.

So you'd expect, if it were Poisson, a constant slope of about one.

And in the data you see that that is very close to being true.

Here is the line, the line of slope 1.

You see that the data is very close to that.

So, even though the firing rate is changing in each short time chunk, the cell's response looks Poisson.
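Here's a sketch of that mean-versus-variance analysis (my own illustration, with a made-up time-varying rate): even though the rate changes across windows, the points should lie near the unit-slope line if spiking is Poisson within each window.

```python
import numpy as np

rng = np.random.default_rng(7)
n_trials, n_windows = 200, 50
rates = 5.0 + 40.0 * rng.random(n_windows)   # a different (made-up) rate per window, in Hz
window = 0.05                                # 50 ms windows

# Spike counts per trial and window; Poisson in each window with its own rate
counts = rng.poisson(lam=rates * window, size=(n_trials, n_windows))

means = counts.mean(axis=0)
variances = counts.var(axis=0)

# Fano factor per window, and the slope of variance vs. mean across windows
print("Fano factors (first 5 windows):", variances[:5] / means[:5])
slope = np.polyfit(means, variances, 1)[0]
print("slope of variance vs. mean ≈", slope, " (≈ 1 for Poisson)")
```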

Where does this kind of variability come from?

It's likely that while the neuron is receiving a mean input that's proportional to the stimulus, it's also
receiving a barrage of background input.

Remember that a cortical neuron gets inputs from around 10,000 other neurons.

If that input is balanced, that is, if it varies around zero, to be both positive and negative, it won't add much to the average firing rate.

But it will jitter the spikes.

For example, here's the behavior of a neuron model, that's driven by white noise.

It also looks very close to Poisson, in the sense that the interspike interval distribution looks very close to exponential.

I've emphasized that by plotting the number of intervals on a log scale against the interval itself.

Which, if it's an exponential distribution, should look like a straight line with a negative slope, given by the firing rate.

So the Poisson nature of firing and the randomness that we need to build into our response models takes care of the effects of random unobserved background noise.

Still, let's zoom in on these very short intervals.

At short intervals, the distribution stops looking exponential.

This is for the very good reason that a neuron is unable to fire arbitrarily rapidly.

There are biophysical processes that prevent a neuron from firing immediately after an action potential, and you see here that's caused a gap of maybe a minimum of 10 milliseconds, in this case, between successive spikes.

So we're going to talk about those processes in a few weeks from now.

So, we might want to improve our model yet more, by taking these intrinsic limitations in firing seriously.

This can be very helpful, as these intrinsic processes going on inside the neuron, might add quite a bit of
structure to the spike trains.

For example, there may be some resonance such that the neuron likes to fire at a certain frequency, independent of the fluctuations of the stimulus.

So these intrinsic effects can be built into coding models.

These are elaborations of the models we've been looking at, called generalized linear models.

Here the setup is very similar: the stimulus comes in, is filtered through some feature, and is processed through a nonlinearity.

Here the nonlinearity is drawn as exponential.

I'll talk about that in a minute.

And then there's an explicit spike generation step, an explicit Poisson spike generation step.

If the random process generates a spike, then a so-called post-spike filter, drawn here, is injected back into the input that's going into the nonlinearity.

So, for example, if the system is refractory, what you'd want for this waveform is that it quickly moves you away from threshold and holds you away from it for some time, so you want a big negative pulse that might decay back over time.

So we might want to add in something like this that decays back over time.

So that would draw the neuron away from spiking with an initial big dip, and then relax back over the refractory period.

The one that's drawn here, taken from this very nice paper, is a little bit more sophisticated.

It first draws the neuron away from spiking, with a big initial dip, so it has the refractory property built in, but then it becomes positive, which is going to promote spiking at some time after the previous spike.

So that could give a neuron that has a slight tendency to fire periodically which is very nice.

So the spiking probability is now proportional to an exponential of the filtered stimulus, as before, plus the filtered spiking activity, as we've written out right here.
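To make the structure concrete, here is a minimal simulation sketch of such a model (my own toy filters and parameters, not those of the paper shown): the conditional rate is the exponential of the filtered stimulus plus a post-spike filter driven by the cell's own recent spikes, and spikes are drawn from that rate bin by bin.

```python
import numpy as np

rng = np.random.default_rng(8)
dt, n_steps = 1e-3, 20_000                      # 1 ms bins, 20 s of simulation

# Toy stimulus filter and post-spike filter (illustrative shapes only)
t_k = np.arange(0, 0.1, dt)
k = np.exp(-t_k / 0.02) * np.sin(2 * np.pi * t_k / 0.05)      # stimulus filter
t_h = np.arange(dt, 0.1, dt)
h = -5.0 * np.exp(-t_h / 0.01) + 0.5 * np.exp(-t_h / 0.05)    # dip, then mild rebound

stim = rng.normal(size=n_steps)
drive = np.convolve(stim, k, mode="full")[:n_steps]           # filtered stimulus
spikes = np.zeros(n_steps, dtype=bool)
b = -3.0                                                      # baseline log-rate

for t in range(n_steps):
    # Post-spike feedback: sum the filter h over this neuron's recent spikes
    hist = sum(h[lag - 1] for lag in range(1, min(t, len(h)) + 1) if spikes[t - lag])
    rate = np.exp(b + drive[t] + hist)         # exponential nonlinearity
    spikes[t] = rng.random() < rate * dt       # Poisson spike generation in this bin

print("simulated firing rate ≈", spikes.sum() / (n_steps * dt), "Hz")
```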

So why this exponential non-linearity?

In the models that I've shown before, we've allowed the nonlinearity to be something that we've computed directly from the data, whereas here it's a fixed nonlinearity.

Liam Paninski showed that by fixing the nonlinearity to be exponential, or to be in the exponential family, you become able to find all the parameters of this model, all the values of these filters, using an optimization scheme that's now globally convergent.

So you've sacrificed some generality for a model that's more complete in another way.

You get more power in that you can add more filters, and it's guaranteed to be solved reliably and repeatably.
So if we're going to keep adding factors that can influence the spiking probability, why stop there?

As Emery Brown and colleagues pointed out, one can also include many other intrinsic and extrinsic factors. In this paper, the group included the influence, not only of refractory effects, but also of the firing of other neurons in the network and applied this to the type of data that you saw from the retina.

So including both the neuron's own firing history, the output of the neuron itself, and also the effects of the firing of other neurons allowed them to predict the spike patterns.

So they were able to capture the detailed spike interval patterns that we saw in the retinal data, but also the correlations between the neurons in the network, and they were able to do that with amazing accuracy.

So, I'll finish up with another beautiful idea from Emery Brown's group.

We can use this Poisson nature of firing to test whether we have captured everything that we can about
the inputs in our model.

Let's say we have a model like the GLM, where the output depends on many influences: on the stimulus, on the history of firing in the neuron that we're recording from, and on the history of firing in other neurons as well.

Then we can take our output spike intervals and scale them by the firing rate that's predicted by the model.

So we take these intervals times between successive spikes, we scale them by the firing rate that our model predicted given all the interactions that, that we've incorporated.

If this predicted rate does truly account for all the influences on the firing, even ones due to previous spiking, then these new scaled intervals should be distributed like a pure Poisson process, with an effective rate of one, that is as a single clean exponential.

So this is called the time-rescaling theorem, and it's used as a way to test how well one has done in capturing all the influences on spiking with one's model.
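Here is a sketch of that test (my own illustration, using a known time-varying rate in place of a fitted model's prediction): rescale each interspike interval by the integral of the predicted rate over it, and check that the rescaled intervals look like a unit-rate exponential.

```python
import numpy as np

rng = np.random.default_rng(9)
dt, duration = 1e-3, 500.0
t = np.arange(0, duration, dt)

# A time-varying "true" rate; here it plays the role of the model's predicted rate
rate = 20.0 + 15.0 * np.sin(2 * np.pi * t / 3.0)

# Generate spikes from an inhomogeneous Poisson process with that rate
spikes = rng.random(len(t)) < rate * dt
spike_idx = np.nonzero(spikes)[0]

# Time rescaling: integrate the predicted rate between successive spikes
cum = np.cumsum(rate) * dt
rescaled = np.diff(cum[spike_idx])

# If the rate accounts for everything, the rescaled intervals are unit-rate exponential
print("mean of rescaled intervals ≈", rescaled.mean(), " (expect 1)")
print("CV of rescaled intervals   ≈", rescaled.std() / rescaled.mean(), " (expect 1)")
```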

So, we've reached the end of this stretch.

We've looked at some classical, and some more modern ways, of thinking about what spikes represent and how one can predict them from, from data.

I'd like to emphasize that some of these models and methods are a very powerful way of thinking about the neural code.

But there is a lot that they ignore.

These models, in particular, give the impression that neurons represent a particular thing, and
that's it.

In fact, neural responses are modulated by many other influences: by how the animal is using its body to deploy its senses, by what it expects to see in the environment, and by the context in which the stimulus appears.

We'll have a look at one example of such influences in a later lecture.

But you should always keep in mind that, while I'm trying to give you an overview of current approaches to understanding the brain, and while these methods have made huge progress in allowing us to make sense of a lot of data, even if under rather limited circumstances, it's likely that some of these ideas might be overturned completely by a much more general approach.

So the field is still really wide open to new ideas and concepts that will provide a richer and more powerful understanding.

So to wrap up, I know this week has started to exercise maybe some math muscles that might be rusty.

So please refer to the supplementary materials online to see if there's anything that can help you, and do hit the
forums.

There are a lot of knowledgeable people among you, and it's great to see questions being answered and discussions developing there.

And, of course, our team is standing by, ready to pitch in and to help, as well.

For next week, I hope you'll join us again as we start to learn how to use decoding to read minds.

Back next week.