Decoding Spike Trains Instant by Instant Using Order Statistics and the Mixture-of-Poissons Model
The Journal of Neuroscience, 2003, Issue 6
Laboratory of Neuropsychology, National Institute of Mental Health, National Institutes of Health, Department of Health and Human Services, Bethesda, Maryland 20892-4415

ABSTRACT

    In the brain, spike trains are generated in time and presumably also interpreted as they unfold in time. Recent work (Oram et al., 1999; Baker and Lemon, 2000) suggests that in several areas of the monkey brain, individual spike times carry information because they reflect an underlying rate variation. Constructing a model based on this stochastic structure allows us to apply order statistics to decode spike trains instant by instant as spikes arrive or do not. Order statistics are time-consuming to compute in the general case. We demonstrate that data from neurons in primary visual cortex are well fit by a mixture of Poisson processes; in this special case, our computations are substantially faster. In these data, spike timing contributed information beyond that available from the spike count throughout the trial. At the end of the trial, a decoder based on the mixture-of-Poissons model correctly decoded about three times as many trials as expected by chance, compared with approximately twice as many as expected by chance using the spike count only. If our model perfectly described the spike trains, and enough data were available to estimate model parameters, then our Bayesian decoder would be optimal. For four-fifths of the sets of stimulus-elicited responses, the observed spike trains were consistent with the mixture-of-Poissons model. Most of the error in estimating stimulus probabilities is attributable to not having enough data to specify the parameters of the model rather than to misspecification of the model itself.

Key words: visual cortex; information; coding; modeling; statistics; timing

Introduction

Because responses are presumably interpreted as they unfold in time, our goal is to decode spike trains instant by instant depending on whether a spike arrived. If spike trains are thought of as words in a language, then decoding (figuring out what stimulus elicited a particular observed spike train) can be thought of as looking up words in a neural dictionary. Ideally this dictionary should allow us, at any point in time, to translate a spike train into the best possible guess of which stimulus elicited it.

    One approach to constructing a neural dictionary is to make as few assumptions as possible. Another approach, which we take here, is to model spike trains using (and checking) certain assumptions about their structure. Recent work shows that in spike trains from three areas of the monkey brain (the lateral geniculate nucleus, primary visual cortex (V1), and primary motor cortex) spike times appear to have been thrown down at random, with probabilities determined by the firing rate profile over time, the peristimulus time histogram (PSTH) (Oram et al., 1999, 2001; Baker and Lemon, 2000). Oram et al. (1999) and Baker and Lemon (2000) simulated stochastic (but non-Poisson) spike trains with relations among spikes that were indistinguishable from those observed in experimental data. Using these models, it is in principle possible to determine how likely each stimulus is to have elicited every possible spike train. However, compiling a dictionary in this way would require simulating very large data sets, which, although less difficult than gathering enough experimental data, would still be prohibitively time-consuming.

Here we show that for spike trains with the stochastic structure described by Oram et al. (1999), we can use order statistics (Arnold et al., 1992), instead of simulation, to calculate directly how likely each stimulus is to have elicited each spike train. Order statistics give the probabilities of individual spike times in trains with this stochastic structure based on the spike count and the firing rate profile, exactly the parameters used by Oram et al. (1999). Using order statistics, we can update the stimulus probabilities at each instant depending on whether a spike arrives. Once we have these probabilities, we can guess, for example, that the stimulus with the highest probability elicited the spike train. This is something like looking up a word letter by letter and discarding words that do not begin with the letters seen so far.

    Although much faster than simulation, decoding using order statistics is, in the general case, still more time-consuming than we would like. However, calculating order statistics is much simpler and faster in the special case in which the spike trains are considered as arising from a mixture of a small number of Poisson processes. We show that our data are in fact well fit by such a model. The recognition of this structure in the data substantially simplifies and speeds decoding of single-neuron spike trains instant by instant.

Materials and Methods

Decoding. Decoding, that is, guessing which stimulus elicited a particular spike train, has two steps. We first estimate the probability that each stimulus elicited the spike train and then choose a stimulus based on those estimated probabilities. Here we minimize expected error by choosing the most probable stimulus. If some mistakes are more important than others, a decision rule minimizing an appropriate cost function could be used.

With the decision rule fixed, decoding a spike train as it unfolds in time requires estimating the probability ps(tj|Rj): the probability, estimated at time tj, that stimulus s elicited the response Rj = (r1, ..., rj), where each ri is 1 if a spike occurred in time bin i and 0 if no spike occurred in time bin i. The presence or absence of a spike is measured in 1 msec bins; time ti is the time in the ith bin. Any bin width can be used as long as there can be at most one spike in any bin. All of our calculations could be recast in continuous time in a straightforward manner.

Using Bayes' rule, estimating ps(tj|Rj) can be transformed into estimating the probability that a spike occurs in the next bin (which may depend on the history of the spike train):

ps(tj|Rj) = P(rj = 1|s, Rj−1) ps(tj−1|Rj−1) / Z    (Equation 1)

if a spike is fired in bin j, or

ps(tj|Rj) = [1 − P(rj = 1|s, Rj−1)] ps(tj−1|Rj−1) / Z    (Equation 2)

if no spike is fired in bin j. P(rj = 1|s, Rj−1) is the probability that a spike appears in time bin j (rj = 1) if stimulus s has been presented and the response through time tj−1 is Rj−1. We use P for the probability of response given stimulus and p for probability of stimulus given response to create a typographical distinction. Z, here and throughout, is a normalizing term; that is, it is the sum, across the appropriate variable, usually the stimulus s, of terms in the numerator of whatever fraction it appears in.
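As an illustration of this recursion, a minimal Python/NumPy sketch of one bin of the update is shown below; the function and argument names are ours, and the per-stimulus spike probabilities P(rj = 1|s, Rj−1) are assumed to be supplied by the response model.

```python
import numpy as np

def update_posterior(prior, p_spike, spiked):
    """One bin of the recursive Bayes update (Equations 1 and 2).

    prior   : p_s(t_{j-1} | R_{j-1}) for each stimulus s
    p_spike : P(r_j = 1 | s, R_{j-1}) for each stimulus s
    spiked  : True if a spike occurred in bin j, False otherwise
    """
    likelihood = p_spike if spiked else 1.0 - p_spike
    unnormalized = likelihood * prior
    return unnormalized / unnormalized.sum()  # division by the normalizing term Z

# Example: two stimuli with equal prior probability; a spike arrives in a bin
# where stimulus 1 is five times more likely than stimulus 0 to fire.
print(update_posterior(np.array([0.5, 0.5]), np.array([0.01, 0.05]), spiked=True))
```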

Using Equations 1 and 2, we proceed bin by bin, updating the stimulus probabilities depending on whether a spike occurs in each bin. For efficiency, responses can be decoded in small jumps rather than instant by instant. The probability P(rj, ..., rj+k|s) for any response can be calculated using the distribution of first spike times, P(1|s):

where the response through time bin j is assumed known. Below we will show how to calculate the distribution of first spike times (Arnold et al., 1992).

We avoid overfitting using three-way cross-validation: the trials were divided into three subsets, with two-thirds used to fit the model and the remaining third used to test decoding. Each third of the data was used twice to fit and once to test; we averaged over the three splits.

Order statistics for spike trains. By considering each spike as the "next first spike," the distribution of first spike times P(1 = tj|s) was calculated for each stimulus using the order statistic formulas below.

For a spike train with n spikes, the (n, k) order statistic describes the distribution of the kth of n draws, after those draws are sorted; in our case, this will be P(k|s, n), the time of the kth of n spikes in a train elicited by stimulus s:

P(k = tj|s, n) = [n! / ((k − 1)!(n − k)!)] Fs(tj)^(k−1) fs(tj) [1 − Fs(tj)]^(n−k)

When we focus on the first of n spikes, i.e., k = 1, this simplifies to describe the (n, 1) or first order statistic P(1|s, n):

P(1 = tj|s, n) = n fs(tj) [1 − Fs(tj)]^(n−1)

Here fs(tj) is the normalized spike density function or firing rate profile over time; Fs(tj) is the corresponding cumulative firing rate profile; and the factor [1 − Fs(tj)]^(n−k) is the probability of the remaining n − k spikes arriving after time tj. As noted above, we discretize time into 1 msec bins (the same precision with which spike times were recorded).
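A minimal sketch of these order statistic formulas, with the normalized spike density f and its cumulative profile F supplied as per-bin NumPy arrays (the function names are ours):

```python
import numpy as np
from math import comb

def kth_spike_time_dist(f, F, n, k):
    """(n, k) order statistic: distribution over 1 msec bins of the kth of n
    spikes, given the normalized spike density f and its cumulative profile F."""
    return k * comb(n, k) * F**(k - 1) * f * (1.0 - F)**(n - k)

def first_spike_time_dist(f, F, n):
    """(n, 1) order statistic: distribution of the first of n spikes."""
    return n * f * (1.0 - F)**(n - 1)
```

For k = 1 the factor (1 − F)^(n − 1) pushes the weight toward earlier bins as n grows, matching the behavior described above.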

The figure below illustrates estimating these quantities from data for spike trains elicited by two stimuli (coded black and gray throughout). One stimulus elicits spikes beginning ~25 msec earlier than the other. For each stimulus s, the firing rate profile is fs(t), and Fs(t) is the corresponding cumulative probability. From fs(t) and Fs(t), we can calculate P(1|s, n) for any spike count n. As n increases, the weight of the first order statistic shifts left; i.e., the first spike is likely to occur earlier.

[Figure omitted.]

Construction of count-weighted order statistics. Labels correspond to equations in Materials and Methods. x-Axis (except E), Time from stimulus onset (milliseconds), truncated 150 msec after stimulus onset for visibility (although the measured responses extend to 300 msec after stimulus onset); vertical lines show 75, 90, and 95 msec after stimulus onset. x-Axis (E), Spike count. y-Axes: A, trials; B-F, probability. A, Rasters of responses to two stimuli (black, gray throughout). Each row of dots represents a trial; each dot shows a spike time. B, Firing rate profiles, fs(t), estimated using local regression (Loader, 1999) with a data window including 10% of the data. C, Cumulative firing rate profiles, Fs(t) (they do not reach 1 because the x-axis is truncated). D, Order statistics, P(1|s, n), for n = 2, 5, and 10. E, Count histograms, s(n), with mixture-of-Poissons approximations used to mitigate the effects of using a small data set. F, Count-weighted order statistics, P(1|s).

To avoid requiring the decoder to know in advance how many spikes n will arrive in a particular trial, we average the first order statistics P(1|s, n), weighted with the expected distribution of spike counts s(n) for stimulus s (we use s(n) to distinguish it from ps(t), the estimated probability of stimulus s at time t). This count-weighted first order statistic, P(1|s), gives the expected probability of the first spike occurring at time t for spike density fs(t) and the expected distribution of spike counts for stimulus s:

P(1 = tj|s) = Σn≥1 s(n) P(1 = tj|s, n)

In other words, the count-weighted first order statistic gives the distribution of the next interspike interval. Note that P(1|s) sums to 1 − s(0); the term corresponding to no more spikes (n = 0) is left out.
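A sketch of the count-weighted average; the spike count distribution is passed as a mapping from count to probability, and all names are ours:

```python
import numpy as np

def count_weighted_first_spike_dist(f, F, count_dist):
    """Average the (n, 1) order statistics over the expected spike count
    distribution for one stimulus. The n = 0 term is omitted, so the result
    sums to 1 minus the probability of an empty trial."""
    f = np.asarray(f, dtype=float)
    F = np.asarray(F, dtype=float)
    out = np.zeros(len(f))
    for n, w in count_dist.items():
        if n >= 1:
            out += w * n * f * (1.0 - F)**(n - 1)
    return out
```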

The second order statistic conditioned on the first spike time is calculated as the first order statistic of a restricted response, created by left-truncating fs(t) at the time of the previous spike and renormalizing it to sum to 1. The (n, k + 1) order statistic can be calculated as the (n − k, 1) order statistic of the remaining response (Arnold et al., 1992).

Order statistics show a Markov property such that future spike times depend on past spike times only through the number of spikes already observed and the most recent spike time. This dependence on the number of spikes already observed means that the process is not a renewal process. The figure below shows first order statistics calculated at the beginning of a trial and after the first and fifth spikes in the response.

[Figure omitted.]

Probabilities of first spike times calculated at different times during a trial. The four rastergrams in the first row show responses to four stimuli (taken, for this example, from one of the neurons in the experiment using 16 stimuli). Each row represents a trial; each dot in a row represents the time of a spike in that trial. The colors of the rasters correspond to the colors used in the graphs below. The bottom three panels show the first spike time probabilities calculated at stimulus onset, 102 msec after stimulus onset, and 164 msec after stimulus onset; under each plot, vertical lines show the times of spikes in the train being decoded (which comes from the stimulus whose responses are shown in black). The distribution of next spike times is initially similar for the red, green, and blue stimuli, but later in the trial, when the blue stimulus fires much less than the red or green stimulus, the probabilities diverge.

The next figure shows the decoding of several spike trains for the two-stimulus example above.

[Figure omitted.]

Decoding responses for the two-stimulus example above. Labels correspond to equations in Materials and Methods. Black and gray are as in that example. x-Axis, Time (milliseconds); vertical lines, 75, 90, and 95 msec after stimulus onset. y-Axes, Probability. A, Stimulus probabilities, ps(t), from decoding a spike train with a single spike at 75 msec, when only one stimulus ever elicits spikes. B, Decoding a spike train with a single spike at 90 msec. The probability of the black stimulus rises between 70 and 90 msec after onset, because early spikes are more likely from the gray stimulus, and none appear. C, D, Interpretation of a spike 95 msec after stimulus onset depends on whether there was an earlier spike. The decoding algorithm does not look into the future, so B-D are identical until 90 msec after stimulus onset, and B and D are identical until 95 msec after stimulus onset.

Simplifying order statistics using a mixture of Poisson distributions. The count-weighted first order statistic above can be calculated for any distribution of spike counts. If the spike count distribution is a mixture of a finite number of Poisson distributions, our calculations become much simpler.

For a Poisson process with time-varying mean rate λs fs(t), the probability that the first spike occurs at time tj simplifies to:

P(1 = tj|s, λs) = λs fs(tj) e^(−λs Fs(tj−1))

where fs(tj) is the spike density function (normalized PSTH) for stimulus s and λs is a rate multiplier. Because we use discrete time bins, the first term, λs fs(tj), which represents the probability of a spike at time tj, can be >1 for some combinations of fs and λs. To avoid this problem, we use the approximation:

P(1 = tj|s, λs) ≈ [1 − e^(−λs fs(tj))] e^(−λs Fs(tj−1))

because e^(−λs fs(tj)) is the probability of no spike occurring in time bin j and is guaranteed to be between 0 and 1.
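Under a single Poisson component, the first-spike-time calculation can be sketched as follows (lam is the rate multiplier λs; the survival factor uses the cumulative profile through the previous bin):

```python
import numpy as np

def poisson_first_spike_dist(f, lam):
    """First-spike-time distribution for an inhomogeneous Poisson process with
    per-bin intensity lam * f, using 1 - exp(-lam * f) as the per-bin spike
    probability so that it never exceeds 1."""
    f = np.asarray(f, dtype=float)
    p_spike = 1.0 - np.exp(-lam * f)                     # >= 1 spike in bin j
    F_prev = np.concatenate(([0.0], np.cumsum(f)[:-1]))  # cumulative profile before bin j
    survival = np.exp(-lam * F_prev)                     # no spikes before bin j
    return p_spike * survival
```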

In effect, the Poisson form above computes the sum of count-specific first order statistics P(1|s, n) for each n, weighted with the appropriate Poisson probabilities; the sum simplifies in the case of a simple Poisson distribution of spike counts. However, the distributions of stimulus-elicited spike counts from several brain areas are not adequately modeled by a Poisson distribution (Baddeley et al., 1997; Gershon et al., 1998; Wiener et al., 2001).

A mixture of a finite number of Poisson distributions is more flexible than a single Poisson distribution for modeling stimulus-elicited spike count distributions and almost as simple. To model the distribution of spike counts elicited by stimulus s as a mixture of Poisson distributions with mean rates λs,i and weights summing to 1, we add the first order statistics for each mean rate λs,i, weighted by the probability p(λs,i) of observing that rate:

P(1 = tj|s) = Σi p(λs,i) P(1 = tj|s, λs,i)

    In each mixture, the constituent Poisson processes share a single firing rate profile fs(t) and differ only in the rate by which the profile is multiplied. Thus the model is separable: estimation of the firing rate profile does not interact with estimation of the distribution of mean counts. This model provides a good description of our data (see Results).

The mixture-of-Poissons calculations are conceptually identical to the general count-weighted calculations. However, because the sum over spike counts can be taken analytically in the case of a Poisson distribution, the calculations using a mixture of Poisson distributions are approximately an order of magnitude faster for our data sets. The speedup comes because we calculate order statistics for only a few Poisson means rather than for many spike counts. The two methods differ slightly in how they deal with the discretization of time into bins (the exponential approximation above is unnecessary in the general order statistic formulation). Below we will show that the effect of this difference on the estimated stimulus probabilities is negligible.

Model parameters on subintervals. The decoding procedures outlined above iteratively treat each new spike as the first spike in a shorter response. This requires estimating model parameters for the shorter response from the parameters for the full model. For the multiple Poisson formulation, this is simple and quick (whereas in the general case, it is computationally intensive). The rate function of a Poisson process on a subinterval is the original rate function restricted to that interval. The weights p(λs,i|s) are continuously updated using Bayes' rule, because each weight is the probability with which we believe the spike train being decoded comes from the particular process. The Poisson processes differ only by a multiplier of the rate, so our evidence of which process the train comes from is the number of spikes we have seen:

p(λs,i)(t) ∝ p(λs,i)(0) e^(−λs,i Fs(t)) [λs,i Fs(t)]^m / m!

where p(λs,i)(0) denotes the weights at the beginning of the trial and m is the number of spikes observed by time t.
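A sketch of this reweighting, assuming that the number of spikes seen so far and the elapsed fraction Fs(t) of the firing rate profile are available (argument names are ours):

```python
import numpy as np
from scipy.stats import poisson

def update_component_weights(weights0, rates, n_spikes_so_far, F_t):
    """Bayes update of the mixture weights: each component's likelihood is the
    Poisson probability of the observed spike count given that component's
    expected count so far (rate multiplier times the elapsed profile F_t)."""
    like = poisson.pmf(n_spikes_so_far, np.asarray(rates) * F_t) + 1e-300  # floor avoids 0/0
    w = np.asarray(weights0) * like
    return w / w.sum()
```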

Generating surrogate data. To compare the performance of our decoder on recorded data with its performance on data with the structure for which the decoder was designed, we simulate spike trains using a procedure similar to that used for decoding. We calculate the distribution P(1|s) of next spike times (above) and randomly select a spike time. The simulation also proceeds iteratively: after each spike time is chosen, the distribution for the next is calculated. The probabilities in Equations 7 and 10 do not sum to 1, because they omit the probability of no additional spikes. For implementation, the probability that no more spikes occur in the interval is assigned to an additional bin. When this bin is selected, we consider the spike train under construction to be complete. By construction, such trains have the structure specified by the model.

Surrogate spike trains matched to observed data are generated using the observed spike density and spike count distribution for each stimulus. The same number of trials as observed is generated for each stimulus.
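For the plain mixture-of-Poissons model, before the refractory adjustment introduced below, an equivalent and simpler way to draw a surrogate train is to choose a component process for the trial and then draw each 1 msec bin independently; the sketch below uses that shortcut rather than the iterative next-spike procedure, which becomes necessary once the recovery function r is included.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_surrogate_train(f, rates, weights):
    """Draw one surrogate spike train from the mixture-of-Poissons model
    (no refractory factor): pick a component, then flip a biased coin per bin."""
    lam = rng.choice(np.asarray(rates, dtype=float), p=weights)  # component for this trial
    p_bin = 1.0 - np.exp(-lam * np.asarray(f, dtype=float))      # per-bin spike probability
    return np.flatnonzero(rng.random(len(f)) < p_bin)            # spike times (bin indices)
```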

Incorporating a refractory period. Real spike train data often depart from the simple mixture-of-Poissons model by becoming either less or more likely to fire for a short time after a spike (exhibiting a refractory or rebound period, respectively). To adjust, we depart slightly from classical order statistics and replace the spike density function f(t) with f(t)r(t − τ), where τ is the time of the previous spike.

We estimate r by comparing the observed interspike interval distribution with the interspike interval distribution of a set of surrogate trains. Because interval distribution is related to spike count (more spikes mean shorter intervals), we actually generate more surrogate trains than needed and subselect to match the observed spike count distribution. If the interspike interval distribution in the generated data is significantly different from the observed interspike interval distribution (χ2 test, p < 0.05), then r(1) is set to gobs(1)/ggen(1), where gobs and ggen are the interspike interval distributions in the observed and generated data. A new set of trials is simulated with this new refractory period (in which the frequency of interspike intervals of length 1 now matches the frequency in the observed data), and the process is repeated for subsequent intervals until the interspike interval distribution of the generated data is indistinguishable from the interspike interval distribution of the experimental data.
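One pass of this estimate, for a single interval length tau, might look like the following sketch (gobs and ggen are interval histograms normalized to probabilities; in the full procedure the update is applied only when the chi-square test rejects, new surrogate trains are generated, and the process moves on to the next interval length):

```python
import numpy as np

def update_recovery_function(r, tau, obs_isis, gen_isis, max_isi):
    """One iteration of the recovery-function estimate: set r(tau) to the ratio
    of observed to generated interspike-interval frequencies."""
    g_obs = np.bincount(np.asarray(obs_isis, dtype=int), minlength=max_isi + 1) / max(len(obs_isis), 1)
    g_gen = np.bincount(np.asarray(gen_isis, dtype=int), minlength=max_isi + 1) / max(len(gen_isis), 1)
    r = np.asarray(r, dtype=float).copy()
    if g_gen[tau] > 0:
        r[tau] = g_obs[tau] / g_gen[tau]
    return r
```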

Estimating the spike density function from data. The spike density function measures rate variation over time. To estimate the spike density function for a particular stimulus, we create a histogram at 1 msec precision (the experimental sampling rate) across trials for that stimulus and smooth using local regression (Loader, 1999) with a window using 10% of the data. The spike density functions obtained using this window and the optimal smoothing window determined by generalized cross-validation (Craven and Wahba, 1979; Loader, 1999) are similar: median correlation, 0.98; interquartile range (iqr), 0.95-0.99. The smoothed histogram is normalized so that its sum across bins is 1.
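A rough stand-in for this step; a fixed-width Gaussian kernel smoother is used here in place of the local regression of Loader (1999), and the bandwidth is purely illustrative:

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def estimate_spike_density(trials, n_bins, sigma_ms=5.0):
    """Build a 1 msec PSTH across trials, smooth it, and normalize it to sum to 1.
    `trials` is a list of spike-time lists (bin indices) for one stimulus."""
    hist = np.zeros(n_bins)
    for spikes in trials:
        hist += np.bincount(np.asarray(spikes, dtype=int), minlength=n_bins)[:n_bins]
    smoothed = gaussian_filter1d(hist, sigma=sigma_ms)
    return smoothed / smoothed.sum()
```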

Fitting a mixture of Poisson distributions to observed spike counts. The probability of drawing a particular count n from a mixture of Poisson distributions is the weighted sum of the probabilities of drawing the count n from each of the individual Poisson distributions in the mixture:

s(n) = Σi=1..k p(λs,i) e^(−λs,i) λs,i^n / n!

where k is the number of distributions in the mixture; λs,i ≥ 0 is the mean of the ith distribution in the mixture for stimulus s; and the weights p(λs,i) are positive and sum to 1.

For each stimulus s, the parameters λs,i and p(λs,i) are estimated by maximizing the log likelihood of the observed spike counts given the corresponding model. We seek the most parsimonious description of each distribution. Thus, for each stimulus, we first find the best single Poisson distribution (k = 1). If the observed distribution could reasonably have come from the fit Poisson (χ2 test, p > 0.05), we use this model; otherwise, we fit a mixture of two Poisson distributions and again check for consistency. In general, if the data are inconsistent with a mixture of k Poisson distributions, we fit a mixture with k + 1 Poisson distributions. In this work, we did not require mixtures of more than five Poisson distributions (see Results).
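A sketch of the maximum likelihood fit for a fixed number of components k, using EM; initialization, convergence checking, and the chi-square-driven choice of k are simplified or omitted:

```python
import numpy as np
from scipy.stats import poisson

def fit_poisson_mixture(counts, k, n_iter=200):
    """EM fit of a k-component Poisson mixture to observed spike counts.
    Returns the component means and their weights."""
    counts = np.asarray(counts, dtype=float)
    lam = np.quantile(counts, np.linspace(0.2, 0.8, k)) + 0.1  # crude initial means
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        resp = w * poisson.pmf(counts[:, None], lam) + 1e-300  # E-step: responsibilities
        resp /= resp.sum(axis=1, keepdims=True)
        w = resp.mean(axis=0)                                  # M-step: weights
        lam = (resp * counts[:, None]).sum(axis=0) / resp.sum(axis=0)
    return lam, w
```

In the procedure described above, one would start with k = 1 and refit with k + 1 components until a chi-square test no longer rejects the fitted count distribution.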

Information calculations. Transmitted information (Shannon and Weaver, 1949; Cover and Thomas, 1991) is defined as I(R, S) = Σr,s p(r, s) log(p(s|r)/p(s)), where the stimulus probabilities p(s) are taken from the experiment, and p(s|r) is calculated using Equation 1 or 2. Using response models avoids some estimation problems associated with small data sets (Panzeri and Treves, 1996; Golomb et al., 1997) and yields estimates of information comparable with those using other validated methods (Gershon et al., 1998; Wiener and Richmond, 1998).
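As a sketch, the plug-in estimate averages, over decoded trials, the divergence between the decoded posterior p(s|r) and the prior p(s):

```python
import numpy as np

def transmitted_information(p_stim, posteriors):
    """I(R, S) = sum_{r,s} p(r, s) log2[ p(s|r) / p(s) ], estimated by averaging
    over trials; each trial contributes its decoded posterior p(s|r)."""
    p_stim = np.asarray(p_stim, dtype=float)
    total = 0.0
    for post in posteriors:
        post = np.asarray(post, dtype=float)
        nz = post > 0
        total += np.sum(post[nz] * np.log2(post[nz] / p_stim[nz]))
    return total / len(posteriors)
```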

Data sets. We decoded responses recorded from monkey primary visual cortex in two previously reported experiments (Kjaer et al., 1997; Wiener et al., 2001). In each experiment, responses were recorded using standard single-electrode techniques from complex cells in primary visual cortex of awake rhesus monkeys. At the beginning of each trial, a fixation point appeared on a screen. One hundred milliseconds after the monkey fixated, a stimulus was flashed on the receptive field of the neuron for 300 msec and then replaced with the background. The monkey was not required to react to the stimulus. If the monkey fixated within ~2° of the fixation point during the entire period from the appearance of the fixation point until the stimulus disappeared, it was rewarded with a drop of liquid when the stimulus disappeared. If the monkey shifted its gaze further than 2° from the fixation point, the trial was aborted, and the monkey received no reward. There was a delay of 300 msec between trials, during which the monkey was not required to fixate.

In one set of experiments (Wiener et al., 2001), 128 stimuli were shown: 32 oriented bars, 32 sine-wave gratings, 32 Walsh patterns, and 32 photographic images. In another experiment (Kjaer et al., 1997), 16 Walsh patterns were used.

[Figure omitted.]

Stimuli used in the experiments. In one set of experiments (Wiener et al., 2001), the stimulus set consisted of 32 oriented bars (A), 32 sine-wave gratings (B), 32 Walsh patterns (C), and 32 photographic images (D). In the other set of experiments, 16 Walsh patterns were used (the 8 shown in E and their contrast-reversed counterparts).

Computation. The calculations presented in this paper were performed in the R statistical computing environment (Ihaka and Gentleman, 1996).

Results

    Our original analyses were performed on data from the 29 neurons from the experiment with 16 stimuli (Kjaer et al., 1997). In a single experiment (recording from a single neuron), each stimulus was presented approximately the same number of times; the number of presentations per stimulus varied from neuron to neuron, ranging from 19 to 230 (median, 42). The mixture-of-Poissons model made decoding sufficiently fast that we were able to decode data from the 17 neurons from the experiment with 128 stimuli (Wiener et al., 2001). In these experiments, the median number of presentations per stimulus ranged from 8 to 52 (median 14) in different neurons; each stimulus was presented approximately the same number of times. Below we will show that decoding results using the two methods on the data from the experiment with 16 stimuli were very similar. Except where stated otherwise, the results in this paper were obtained using the mixture-of-Poissons model.

Mixtures of Poisson distributions for spike count

We examined the mixture-of-Poissons model for 2636 spike count distributions in two different sets of V1 data (Kjaer et al., 1997; Wiener et al., 2001); 50.8% of the spike count distributions were adequately fit with a single Poisson distribution, 39.4% with a mixture of two Poisson distributions, 7.4% with three, 2.0% with four, and 0.4% with five. No distribution required a mixture of more than five distributions. Examples of mixture models with one, two, and three components are shown below.

[Figure omitted.]

     Spike count distributions fit by mixtures of Poisson distributions. The top, middle, and bottom panels show spike count distributions fit by a single Poisson distribution, a mixture of two Poisson distributions, and a mixture of three Poisson distributions, respectively. In each panel, the histogram (gray bars) shows the observed distribution of spike counts. The dots connected by lines show the fitted values, and the solid and dashed lines show the component Poisson distributions. A, Single Poisson distribution with mean of 12.4. B, Mixture of two Poisson distributions with means of 0.4 (weight, 0.56) and 3.1 (weight, 0.44). C, Mixture of three Poisson distributions with means of 0.2 (weight, 0.11), 5.8 (weight, 0.31), and 14.4 (weight, 0.58). Note the different scales on the x- and y-axes of the three panels.

The modeled distributions are, by design, smoother than the observed distributions. Various spike counts are more or less likely in the model than in the data. Spike counts that are very frequent in the data may be assigned less probability in the model, and spike counts not actually observed in the data may be assigned nonzero probability (as is the case for a count of four in one of the examples above). Fitting a mixture of Poisson distributions to the observed spike count is, among other things, a form of smoothing (meaning we believe that if we gathered enough data we would encounter a response with four spikes elicited by the stimulus corresponding to that example). As expected when smoothing, the means of the modeled distributions are very close to the observed means (median difference across all neurons, 0% of the measured mean; iqr, −3 to +4%), but the variances are lower (median difference, −5% of the measured variance; iqr, −24 to +9%).

In the Poisson decoding, we model the distribution of mean spike counts in any subinterval of the trial (from any particular time to the end of the trial) on the basis of the distribution of mean spike counts in the entire trial [p(λs,i) at the start of the trial] and the firing rate modulation over time (i.e., how many of the spikes are expected to have occurred by now; see Materials and Methods). If we use the mixture-of-Poissons model to predict, on five successively shrinking subintervals (beginning 50, 100, 150, 200, and 250 msec after stimulus onset), the variance of spike counts elicited by each stimulus and take a linear regression of log(predicted variance) versus log(observed variance), we find a negative intercept (median across 46 neurons, −0.13; iqr, −0.25 to −0.03) and a slope near 1 (median, 0.98; iqr, 0.95-1.02). The r2 values of the regressions are high (median, 0.94; iqr, 0.91-0.96). Below we will examine the effect on decoding accuracy of using the estimated subinterval spike count distributions compared with using spike count distributions directly measured on the subintervals.

Decoding

The figure below shows examples of the development of stimulus probabilities over time (i.e., the probability that each stimulus elicited the observed spike train) when decoding using the mixture-of-Poissons method: one example from an experiment in which 16 Walsh patterns were shown (Kjaer et al., 1997) and one from an experiment in which 128 stimuli were shown (Wiener et al., 2001). The spike trains decoded are shown underneath each panel. The stimulus probabilities at each time depend only on the spike train observed up to that time; the decoding algorithm does not look into the future. Although the probabilities change more abruptly when spikes arrive, they change even when no spikes are fired. Thus the absence of spikes, as well as their presence, is informative (cf. Sherlock Holmes's dog that did not bark).

[Figure omitted.]

Decoding responses of neurons in monkey V1 in an experiment using 16 stimuli (Kjaer et al., 1997) (top) or 128 stimuli (Wiener et al., 2001) (bottom). x-Axis, Time from stimulus onset. y-Axis, Stimulus probabilities. Stimulus probabilities from decoding single spike trains (black vertical lines below each panel). Each line shows the probability assigned by the decoding algorithm to a single stimulus. A, Decoding a response in an experiment using 16 stimuli (Kjaer et al., 1997). The top line shows the probability of the stimulus that elicited the spike train (second Walsh pattern from the left). B, Decoding a response in an experiment using 128 stimuli (Wiener et al., 2001). The top line shows the probability of the stimulus that elicited the spike train (first row, second Walsh pattern from the left).

Panels A and B of the figure above show correctly decoded trials in which one stimulus is far more probable than all others. In these cases, the probability of the guessed stimulus is high. The table below summarizes the distribution of the maximum estimated probability, that is, the probability of the guessed stimulus, at the end of our decoding window (300 msec after stimulus onset). It also shows the difference between the probabilities of the most probable and second most probable stimulus (the "margin of victory" over the "runner-up"). The winning probabilities are higher in the experiments with 16 stimuli than in the experiments with 128 stimuli, because the total probability (p = 1) is divided among fewer stimuli.

[Table omitted.]

Probability of most probable (guessed) stimulus when spike trains are decoded using the mixture-of-Poissons model

    

    Intuitively, it should become easier to distinguish among stimuli as the probability of one stimulus rises and the probabilities of others fall. We evaluate the decoding in two ways: by measuring the amount of information obtained from the spike train and also by seeing how well the decoding lets us guess which stimulus elicited the observed response. As stated in Materials and Methods, in all our calculations we avoided overfitting using three-way cross-validation: the available trials were divided into three subsets, with two-thirds used to fit the model and the remaining third used to test decoding. Each third of the data was used twice to fit and once to test; the results we present are averaged over the three sets.

Information theory (Shannon and Weaver, 1949) shows that the amount of information in a set of spike trains depends on how they are interpreted, that is, on the nature of the neural code. Panels A and C of the figure below show that for both sets of experiments, decoding using only the number of spikes in a response (dark boxes) yields substantially less information than considering both spike count and spike timing using the mixture-of-Poissons model (light boxes).

[Figure omitted.]

Decoding using timing (through the mixture-of-Poissons model) is more effective than decoding using spike count alone. A, C, Distribution across 29 (A) or 17 (C) neurons of information transmitted by the neuronal responses (spike trains) about which stimulus was shown, using expanding windows starting at stimulus onset. Dark boxes, Spike count only; light boxes, spike count and timing together. B, D, Distribution across 29 (B) or 17 (D) neurons of the percent of trials correctly decoded by guessing the stimulus with highest probability of having elicited the observed response. For comparability, figures are presented as multiples of the percent of trials that would be correctly decoded by chance (B, 1/16 = 6.25%; D, 1/128 = 0.78%). The dotted line shows the percent of trials correctly decoded by chance. We avoided overfitting using three-way cross-validation: the available trials were divided into three subsets, with two-thirds used to fit the model and the remaining third used to test decoding. Each third of the data was used twice to fit and once to test; the results shown here are averaged over the three sets. Boxes: median, line and indentation at center; interquartile range, bottom and top. If notches do not overlap, corresponding medians are different (p < 0.05). Whiskers have been eliminated, and the dark boxes have been made wider, for visibility.

Does this extra information provided by paying attention to timing actually help us decode more accurately? It is natural to guess that the stimulus with the highest probability elicited the observed spike train (as it did in both examples above). Panels B and D show the distribution (across 29 and 17 neurons, respectively) of the percent of trials correctly decoded, that is, the trials for which the highest-probability stimulus did elicit the response. For comparability between the two experiments, the figures are shown as multiples of the percent of trials that could be correctly decoded simply by guessing (1/16 = 6.25%, and 1/128 = 0.78%). In both experiments, more trials were correctly decoded using both spike count and timing via the mixture-of-Poissons method than using spike count alone.

When timing is taken into account, the amount of information transmitted by the neuronal responses is larger in the experiment with 128 stimuli than in the experiment with 16 stimuli. This difference could be because one experiment uses more stimuli than the other, or it could be because the larger experiment used stimuli of several different kinds. To check whether the number of stimuli was the crucial factor, we randomly selected subsets of 64, 32, and 16 stimuli, allowing stimuli of all four kinds to enter each set. The figure below shows that this has very little effect on the amount of information transmitted by the responses. It also compares the amount of information transmitted by responses to the randomly selected groups of 32 stimuli with the amount of information transmitted by the four subsets of 32 stimuli consisting of the bars, gratings, Walsh patterns, and photographic images, respectively. Responses to the Walsh patterns and photographs used in these experiments transmit less information than do the responses to bars and gratings.

[Figure omitted.]

Differences in stimulus set, rather than in the number of stimuli, account for differences in the results of the two sets of experiments. x-Axis, Time from stimulus onset (milliseconds). y-Axis, Transmitted information (bits). A, Information transmitted by responses of a single neuron to different numbers of stimuli: 128 (solid circles), 64 (plus symbols), 32 (open triangles), and 16 (open circles). For all except the full set of 128 stimuli, the lines represent the median value of transmitted information over 100 random samples (from the full set of 128) of the appropriate number of stimuli. Transmitted information drops only slightly with the number of stimuli. B, Information transmitted by responses of the neuron to random subsamples of 32 stimuli (filled circles; same as triangles in A) and to subsets of 32 stimuli consisting of bars (open circles), gratings (open triangles), Walsh patterns (plus symbols), and photographic stimuli (times symbols). Much less information is transmitted by responses to Walsh patterns or photographic stimuli than by responses to bars, gratings, and random subsamples.

Wiener et al. (2001), in an analysis using spike count alone, have previously shown that the difference in information transmission in neuronal responses to stimuli of the four different kinds is primarily attributable to the fact that responses to Walsh patterns and photographs have a smaller dynamic range than responses to bars and gratings. This new result shows that the difference persists in an analysis that takes into account spike timing as well. The persistent difference in results obtained using different kinds of stimuli reinforces a point made by Wiener et al. (2001): conclusions based on experiments using a narrow set of stimuli may not generalize well to experiments using a larger set. The need for a large number of stimuli unfortunately conflicts with the need for many trials per stimulus in an experiment with a limited number of trials.

Relation between decoding success and transmitted information

Intuitively, we expect transmitted information and decoding success to be related: if more information is transmitted by the responses, more responses should be correctly decoded. Above we showed that in both sets of experiments (with 16 and 128 stimuli), transmitted information and percent of trials correctly decoded both increase as the trials progress. The figure below shows that, for individual neurons from the two sets of experiments, more information transmitted generally corresponds to a greater portion of trials correctly decoded.

[Figure omitted.]

Relation across neurons between transmitted information and percent of trials correctly decoded for individual neurons. x-Axis, Transmitted information (bits). y-Axis, Percent of trials correctly decoded (as a multiple of the number of trials that would be correctly decoded by chance). Each symbol shows transmitted information and the percent of trials correctly decoded 300 msec after stimulus presentation for a single neuron. The percent of trials correctly decoded increases with increasing information transmitted, except when decoding using the mixture-of-Poissons method in the experiment with 128 stimuli. Circles, Decoding using the spike count in the experiment with 16 stimuli; triangles, decoding using the spike count only in the experiment with 128 stimuli; plus symbols, decoding using the mixture-of-Poissons method in the experiment with 16 stimuli; times symbols, decoding using the mixture-of-Poissons method in the experiment with 128 stimuli.

Effect of refractory period

Because we estimate the spike count distribution from data, we incorporate the effect of refractory period on spike count. We found that the effect of any additional refractory or rebound period on the number of trials correctly decoded is very small: 0.2% fewer trials are correctly decoded when the refractory period is included (median across neurons; iqr, −2.7 to 2.2%). The effect on individual stimulus probabilities was also small: in the experiments using 16 stimuli, the median correlation between the estimated stimulus probabilities obtained when taking the additional refractory effects into account and those obtained when not taking the additional refractory effects into account was 0.997 (median across all trials in 29 neurons), and the fifth percentile of correlations was 0.96. The median correlation in the experiments using 128 stimuli was 0.999 (median across all trials in 17 neurons), and the fifth percentile was 0.99. Except where otherwise noted, the results throughout this study were calculated using the refractory period.

Decoding signal versus decoding noise

When there are no differences in timing among responses elicited by different stimuli, we would expect the decoder to reduce to a spike count decoder. To check this, we simulated two sets of responses with identical spike density functions (normalized PSTH) but different distributions of spike count (Poisson distributions with means of 4 and 10). The only nonrandom differences in timing arise from the differences in spike count. A trial with seven spikes should, at the end of the trial, be assigned with probability 0.4 to the Poisson distribution with mean 4 and with probability 0.6 to the Poisson distribution with mean 10. In fact, depending on the data used to create the Poisson models, different spike trains of length 7 will be assigned different probabilities, but we expect the assigned probabilities to converge to p = 0.4 and 0.6. The figure below shows that when spike timing is known to be random, decoding on the basis of timing (using the mixture-of-Poissons model; white boxes) converges to the correct answer more slowly than the optimal decoding using spike count alone (black boxes).
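The 0.4/0.6 split quoted above follows directly from the two Poisson likelihoods for a count of seven spikes, for example:

```python
from scipy.stats import poisson

# Posterior over the two candidate mean counts for a trial with 7 spikes,
# assuming equal prior probability for the two simulated "stimuli".
like = poisson.pmf(7, [4, 10])
print(like / like.sum())  # approximately [0.40, 0.60]
```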

[Figure omitted.]

     Decoding using the mixture-of-Poissons method is inferior to spike count decoding when there are no timing differences among responses to different stimuli. A, Mixture-of-Poissons decoding converges to the correct answer less quickly than does spike count decoding in a two-stimulus example with identical PSTH for both stimuli. x-Axis, Number of trials per stimulus. y-Axis, Distribution of estimated probabilities that a train with seven spikes arose from a homogeneous Poisson process with a mean of 4 spikes rather than from one with a mean of 10 spikes. The left (open) box in each set shows results using the mixture-of-Poissons decoding method. The right (filled) box shows results when decoding on the basis of spike count alone. The simulated data were generated by a homogeneous Poisson process with a mean of 4 for one stimulus and a homogeneous Poisson process with mean of 10 for the other stimulus. The mean number of spikes for each process was held fixed while the number of trials changed. Similar results were obtained using inhomogeneous Poisson processes as long as the rates varied in time in the same way (i.e., the normalized PSTHs were the same for the two processes). The left box in each set represents 2500 values, the results of decoding 500 randomly generated test trains using models derived from five different sets of simulated data. The right box represents 250 values, the result of decoding the response "seven spikes" according to the models derived from 250 sets of simulated data (using additional sets of simulated data does not change the results). Interpretation of box plots: The line at the center of each box shows the median estimated probability, and the notch shows a 5% confidence interval for the median. The bottom and top of the box show the 25th and 75th percentiles, and the bottom and top whiskers show the 5th and 95th percentiles. B, x-Axis, Time from stimulus onset (milliseconds). y-axis, Number of trials correctly decoded (as a multiple of chance) using the spike count only (dark boxes) and the mixture-of-Poissons method (light boxes). Each box plot represents results from 25 artificial data sets with spike distributions modeled on those from one of the neurons from the experiment with 16 stimuli and no differences in the timing of stimuli as above. Interpretation of box plots is as above (whiskers removed for visibility).

A similar effect can be observed in surrogate data generated to match the spike count distribution seen in one of the neurons from the experiment with 16 stimuli but with spikes equally likely at any time (i.e., with a flat PSTH) for all stimuli. Here again, spike timing is purely random and carries no information about which stimulus generated the response. Panel B of the figure above shows that more trials are correctly decoded using the spike count alone than using the full mixture-of-Poissons model. Again, attempting to decode using random timing degrades performance.

    In the cases just presented, the spike density function is (by construction) of no help in decoding but still must be estimated. When a small amount of data is available, random spike time variations cause differences in the estimates of the spike density functions for different stimuli, and these spurious differences interfere with decoding. The random differences in the spike density functions become smaller when more data are available to estimate them (although true differences in spike density functions can be detectable even with modest amounts of data). The need for data to eliminate spurious correlations would only increase if more complicated timing features were to be taken into account. Thus it is important to avoid trying to decode on the basis of complicated timing features that can be accounted for as stochastic consequences of the spike count and simpler timing features.

Comparing different decoding variants

Trains from the neurons from the experiment with 16 stimuli were decoded using both the order statistic method (with the spike count distributions represented using mixtures of Poisson distributions) and the simplified mixture-of-Poissons method. The correlation between the two sets of probabilities at the end of the trial (that is, 300 msec after stimulus onset) was 0.997 (median across neurons; iqr, 0.992-0.999). This verifies that the two methods give nearly identical results (up to certain issues related to time discretization near the end of a trial). As a result, the same stimulus was guessed by the two methods in 95% of the cases (median across neurons; iqr, 94-98%). It may seem odd that different stimuli could be guessed when the correlation between probabilities is so high. The table above shows that the difference between the probability of the winning stimulus and the next most probable stimulus can be quite small; thus small discrepancies can change the guessed stimulus in a small number of trials. As would be expected, the differences in the largest and second-largest probabilities in trials when the two methods made the same guess were ~5 times as large as the differences in trials when the two methods made different guesses. We did not decode the experiments with 128 stimuli using the order statistic method because of the computational burden.

As previously presented, when using the mixture-of-Poissons model, we have modeled the distribution of mean spike counts in any subinterval of the trial (from any particular time to the end of the trial) on the basis of the distribution of mean spike counts in the entire trial [λs,i and p(λs,i) at the start of the trial] and the firing rate modulation over time (Eq. 11). It is possible, at substantial expense in both computation time and additional model complexity, to avoid this approximation by measuring the distribution of spike counts in each subinterval and finding a new mixture of Poisson distributions to model each distribution. Despite the additional model complexity and computation, decoding using measured subinterval distributions gives results similar, but not identical, to those obtained when using estimated subinterval distributions: the correlation between the two sets of probabilities at the end of trials was 0.88 (iqr, 0.83-0.94), and the same stimulus was guessed in 75% of the trials (median across neurons; iqr, 65-82%). Approximately the same number of trials were correctly decoded using the two methods, although using the measured distributions yielded slightly more transmitted information.

[Figure omitted.]

Decoding results using the mixture-of-Poissons model to estimate spike count distributions on subintervals (dark boxes) are similar to those obtained when directly measuring spike count distributions on subintervals (light boxes). A, The portion of trials correctly decoded is similar for the two methods. x-Axis, Time from stimulus onset (milliseconds). y-Axis, Percent of trials correctly decoded (as a multiple of chance). The dotted line shows the approximate percent of trials correctly decoded by chance. B, Using measured distributions results in higher transmitted information than using estimated distributions. x-Axis, Time from stimulus onset (milliseconds). y-Axis, Transmitted information. Light boxes are narrower for visibility only. This graph uses the data from the experiment with 16 stimuli (Kjaer et al., 1997). Interpretation of box plots is as in the earlier figures.

    Decoding using the order statistic method or the mixture-of-Poissons method with directly measured spike count distributions on subintervals takes substantially longer (by approximately an order of magnitude) than using the mixture-of-Poissons model when estimating the count distributions on subintervals. The results of this section show that the extra time produces little change in decoding.

How good is the mixture-of-Poissons model?

Time-rescaling goodness-of-fit test

    Barbieri et al. (2001) and Brown et al. (2002) presented a method for checking the goodness of fit of spike-train models. Their method is based on the time-rescaling theorem, which shows that any point process can be transformed, by a suitable rescaling of time, into a homogeneous Poisson process. To apply their method to our data, we randomly chose two-thirds of the trains to estimate the model, withholding one-third of the trains to use as a test set. Each model was tested using the Kolmogorov-Smirnov test presented by Brown et al. (2002), with p = 0.95. In the neurons from the experiment with 128 stimuli, the test spike trains were consistent with the model for 109 stimuli (median across neurons; iqr, 104-112). In the neurons from the experiment with 16 stimuli, the test spike trains were consistent with the model for 11 stimuli (median across neurons; iqr, 10-13). When the refractory-rebound effect was ignored, the results were very similar: the test spike trains were still consistent with the model for 108 of 128 stimuli (median across neurons; iqr, 104-111) and in 11 of 16 stimuli (median across neurons; iqr, 9-13). Overall, 82% of stimuli were consistent with the model when the refractory period was included in the model, and 78% were still consistent when the refractory period was ignored. Thus the mixture of Poisson distributions with different mean rates, and therefore different temporal structures, seems to account for much of the observed temporal structure of the spike trains. This may explain why our decoding results with or without the refractory period are so similar.
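A sketch of the time-rescaling check for a single test train; the model's conditional intensity per 1 msec bin is assumed to be available, and in practice the rescaled intervals from all test trains for a stimulus are pooled before the Kolmogorov-Smirnov comparison:

```python
import numpy as np
from scipy.stats import kstest

def rescaled_intervals(spike_bins, intensity):
    """Time-rescaling (Brown et al., 2002): integrate the model's conditional
    intensity over each interspike interval; under a correct model the values
    z = 1 - exp(-integrated intensity) are uniform on (0, 1)."""
    cum = np.concatenate(([0.0], np.cumsum(intensity)))
    z, prev = [], 0
    for t in spike_bins:
        z.append(1.0 - np.exp(-(cum[t + 1] - cum[prev])))
        prev = t + 1
    return np.asarray(z)

def ks_against_uniform(z):
    """Kolmogorov-Smirnov comparison of pooled rescaled intervals with Uniform(0, 1)."""
    return kstest(z, "uniform")
```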

Number of trials correctly decoded

Another way to examine how well our model describes the data is to compare the performance of the decoder on real data with its performance on artificially generated surrogate data that match the real data in many ways and are known to have the structure specified by the model. If our model describes the data well, we would expect to decode real spike trains nearly as well as we decode artificially generated spike trains. If our model describes the data poorly, we would expect to decode artificial spike trains much more accurately than real spike trains.

We generated cell-matched surrogate data as described in Materials and Methods. The figure below compares the percent of trials correctly decoded in real and surrogate data, both for the spike count code and using the mixture-of-Poissons model. When decoding using the spike count only, we correctly decode 15% (median across neurons; iqr, 7-20%) more trials from surrogate data than from real data in the experiments with 16 stimuli (circles) and 18% (median; iqr, 10-23%) more in the experiments with 128 stimuli (triangles). The decoder does nearly as well when decoding data from the experiments with 16 stimuli using the mixture-of-Poissons method (plus symbols): we correctly decode 17% (median; iqr, 7-23%) more trials from surrogate data than from real data. However, when the mixture-of-Poissons method is used with data from the experiments with 128 stimuli (times symbols), many more trials are correctly decoded from surrogate data than from real data (median, 100% more; iqr, 75-150%). Panel B of that figure shows that the performance of the mixture-of-Poissons decoder is related to the amount of data available: the more data available to estimate the model parameters in the first place, the more closely the number of trials correctly decoded in real data approaches the number correctly decoded in surrogate data.

[Figure omitted.]

     Performance of the decoder on real data and matched surrogate data. A, Percent of trials correctly decoded (in multiples of chance) for the real data (x-axis) and for matched surrogate data (y-axis). Results are shown both when decoding using the spike count only (circles, 16 stimuli; triangles, 128 stimuli) and when decoding using the mixture-of-Poissons method (plus symbols, 16 stimuli; times symbols, 128 stimuli). Each symbol shows results for a single neuron, and each neuron is represented twice, once for decoding using the spike count and once for decoding using the mixture-of-Poissons method. The figures for surrogate data are medians across 10 artificial data sets. Real data are decoded nearly as well as surrogate data when using the spike count code and even when using the mixture-of-Poissons method for the experiment with 16 stimuli. However, for data from the experiment with 128 stimuli, the decoder performs much better on surrogate data than real data. B, The performance of the decoder depends on the number of trials per stimulus available. x-Axis, Median number of trials per stimulus. y-Axis, Ratio of the number of trials correctly decoded for surrogate data to the number of trials correctly decoded in real data. Only data from the experiment with 128 stimuli decoded using the mixture-of-Poissons method are shown (times symbols are used for consistency with A).

    To check whether the number of stimuli involved (and therefore the complexity of the problem) contributes to the difference between the two sets of results, we created 16-stimulus neurons from 128-stimulus neurons by taking the responses from 16 randomly chosen stimuli and discarding the rest. Twenty-five subneurons were created from each of the two neurons with the most trials per stimulus (50 and 52). In these artificial neurons, 16% more trials are correctly decoded from surrogate data than from the real data (median; iqr, 11-20%), not significantly different from that seen for the neurons from the experiment that actually had only 16 stimuli (Kruskal-Wallis test, p = 0.7). There is still no significant difference if we try to minimize the influence of the number of trials per stimulus by comparing the results for the subsampled data only with results from the seven neurons from the experiment with 16 stimuli with between 40 and 60 trials per stimulus (Kruskal-Wallis test, p = 0.2). Thus correctly choosing among 128 stimuli requires more data to parameterize the model than correctly choosing among 16 stimuli.

Accuracy of individual stimulus probabilities: model error and estimation error

The most stringent possible test of the performance of the decoder assesses the accuracy of all the estimated stimulus probabilities. For example, if we look at all the stimuli assigned probabilities near p = 0.1 in various trials, we would hope that those stimuli were actually the stimuli that elicited those trials 10% of the time. This is essentially the same test we might use to evaluate the quality of a weather-forecasting service: it should rain on half the days with a 50% chance of rain, one-fourth of the days with a 25% chance of rain, and so on. To test this, we bin the estimated probabilities (we have one estimated probability for each stimulus for each trial) and then ask how often the stimuli represented in each bin really did elicit the corresponding trial.
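A sketch of this reliability check; each decoded trial contributes one estimated probability per stimulus, together with a flag indicating whether that stimulus actually elicited the trial:

```python
import numpy as np

def calibration_curve(est_probs, did_elicit, n_bins=10):
    """Bin the estimated stimulus probabilities and report, per bin, how often
    the corresponding stimulus really elicited the trial."""
    est_probs = np.asarray(est_probs, dtype=float)
    did_elicit = np.asarray(did_elicit, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(est_probs, edges) - 1, 0, n_bins - 1)
    observed = [did_elicit[idx == i].mean() if np.any(idx == i) else np.nan
                for i in range(n_bins)]
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, np.asarray(observed)
```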

    Departure from accurate prediction can come from two sources. The first possible source of error is model misspecification: the mixture-of-Poissons model may not capture the true structure of the spike trains. The second possible source of error is difficulty fitting the model on the basis of the amount of data available. We will call these model error and estimation error. To separate these errors, we must examine how accurately the model estimates stimulus probabilities in three different kinds of data: the real data; artificial data, generated according to the model, with the same number of trials per stimulus as the real data; and artificial data, generated according to the model, with many more trials per stimulus than available in the real data. Model error can be examined by comparing results from the real data and the smaller set of artificial data: because the two sets have equal numbers of trials per stimulus, any difference in our ability to decode must derive from the fact that one set of data is known to be consistent with the model, whereas the other is not. Estimation error can be examined by comparing results from the larger and smaller artificial data sets: because both artificial data sets are known to be consistent with the model, any difference in our ability to decode must derive from difference in the amount of data available.
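    A sketch of the surrogate-generation step, under the assumption that spike counts follow a fitted mixture of Poisson distributions (parameter values here are hypothetical), is shown below; matched-size and oversized artificial data sets differ only in the number of trials drawn.

import numpy as np

rng = np.random.default_rng(1)

def sample_counts(weights, means, n_trials):
    # Draw spike counts from a mixture of Poisson distributions.
    component = rng.choice(len(weights), size=n_trials, p=weights)
    return rng.poisson(np.asarray(means)[component])

fitted = {"weights": [0.6, 0.4], "means": [3.0, 12.0]}   # hypothetical fitted mixture

matched = sample_counts(fitted["weights"], fitted["means"], n_trials=50)   # as many trials as the real data
large = sample_counts(fitted["weights"], fitted["means"], n_trials=500)    # many more trials

# Decoding the matched set vs. the real data isolates model error; decoding the
# large set vs. the matched set isolates estimation error.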

    Figure 13A shows that, for the data from the experiment with 16 stimuli, the predicted stimulus probabilities are close to the true stimulus probabilities. Small predicted stimulus probabilities tend to be slightly too large, and large predicted stimulus probabilities tend to be slightly too small. This is true whether we decode using the spike count only (filled squares) or using the full mixture-of-Poissons model (filled circles), although the difference is smaller for spike count only. The same effect is seen in artificial data with the same number of trials per stimulus as seen in the real data, although it is substantially less pronounced (open squares, open circles). This indicates that some of our error is attributable to the model not quite fitting the data. However, in artificial data with many more trials per stimulus than in our actual data (500 trials per stimulus), the predicted probabilities are almost exactly equal to the observed probabilities both when the spike trains are decoded using the spike count only (plus symbols) and when they are decoded using the full model (times symbols). The excellent agreement between estimated and actual stimulus probabilities obtained on large artificial data sets serves as confirmation that the decoder works as it is supposed to and also indicates that part of the error seen with real data sets is attributable to difficulty correctly estimating the model parameters (the spike density functions and the mixtures of Poissons for spike count distributions) using the amount of data available.

    [Figure omitted; legend follows.]

     Accuracy of estimated stimulus probabilities. x-Axis, Estimated stimulus probability. y-Axis, Observed probability. Observed probabilities are calculated by binning the estimated probabilities (one for each stimulus for each trial) and then asking how often the stimuli represented in each bin really did elicit the corresponding trial. If our model described the data perfectly, and we had sufficient data to estimate the model, the estimated and observed probabilities would be identical. In both panels, the number of probabilities in each bin drops sharply as the estimated probability increases. A, Results from 29 neurons from the experiment with 16 stimuli. Filled symbols show results for the real data; open symbols show results for artificial data with the same number of trials per stimulus as the real data; times and plus symbols show results for very large artificial data sets (500 trials per stimulus). Squares and plus symbols show results when decoding using the spike count only; circles and times symbols show results when decoding using the full mixture-of-Poissons model. B, Results from the two neurons with the most (50) trials per stimulus from the experiment with 128 stimuli. Symbols are as in A. Results for extremely large artificial data sets are not included because of the computational burden. Note different scales in the two panels.

    Figure 13B is similar to Figure 13A but shows data from the two neurons with the most (50) trials per stimulus in the experiments with 128 stimuli. Here the predicted probabilities are almost always lower than the true probabilities, whether decoding using the spike count only or using the mixture-of-Poissons model. Because of the amount of computation involved, we do not include results for extremely large artificial data sets in Figure 13B, but the results in Figure 13A imply that the decoder works properly, so for a sufficiently large artificial data set the probabilities would lie along the identity line. Figure 13B shows that most of our misestimation of probabilities can be ascribed to limited sample size rather than to misspecification of the model (i.e., the distance between the filled and open symbols is smaller than the distance between the open symbols and the identity line), particularly for the full mixture-of-Poissons model. Excluding the three smallest probability bins (in which the error is extremely small), estimation error accounts for approximately three-fifths of the total error when decoding using the spike count only and for almost four-fifths of the error when decoding using the full mixture-of-Poissons model. The fact that the decoder estimates probabilities less well in the data with 128 stimuli than in the data with 16 stimuli again shows that more data are required to solve the more difficult problem accurately.
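    The attribution of error can be illustrated with a toy calculation (all numbers hypothetical): the gap between the size-matched artificial-data estimates and the identity line stands in for estimation error, and the gap between real-data and size-matched estimates stands in for model error.

import numpy as np

true_prob = np.array([0.20, 0.40, 0.60])           # identity line
real_estimate = np.array([0.12, 0.28, 0.45])       # decoder on real data (filled symbols)
matched_estimate = np.array([0.15, 0.31, 0.49])    # decoder on size-matched artificial data (open symbols)

total_error = np.abs(real_estimate - true_prob).sum()
estimation_error = np.abs(matched_estimate - true_prob).sum()   # limited data, correct model
model_error = np.abs(real_estimate - matched_estimate).sum()    # model misspecification

print(f"estimation error / total error = {estimation_error / total_error:.2f}")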

    Discussion

    In this paper we present the order statistic model of spike trains, which is based on the observation that in several brain areas, single-neuron spike trains are almost entirely described as stochastic samples from the firing rate profile (Oram et al., 1999, 2001; Baker and Lemon, 2000). In accordance with these observations, in the order statistic model, individual spike times are informative because they reflect underlying rate variation. This model requires estimating only the spike count distribution, the spike density function, and the interspike interval distribution from data. All of these can be reasonably estimated using amounts of data gathered in virtually every experiment. Our results show that for decoding problems with many conditions, decoding performance is quite good with amounts of data frequently collected, but very large amounts of data may be needed for optimal decoding.
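    The three ingredients can be estimated with a few lines of code. The sketch below uses a hypothetical data layout and deliberately simple estimators (the paper's own spike density functions, for instance, are smoothed rather than binned); it takes a list of spike-time arrays for one stimulus and returns a spike count distribution, a binned PSTH, and the pooled interspike intervals.

import numpy as np

def model_ingredients(spike_times_per_trial, trial_duration, bin_width=0.005):
    # spike_times_per_trial: list of arrays of spike times (in seconds) for one stimulus.
    counts = np.array([len(t) for t in spike_times_per_trial])

    # 1. Spike count distribution.
    count_dist = np.bincount(counts) / counts.size

    # 2. Spike density function, here a simple binned PSTH in spikes per second.
    edges = np.arange(0.0, trial_duration + bin_width, bin_width)
    all_spikes = np.concatenate([np.asarray(t) for t in spike_times_per_trial])
    psth = np.histogram(all_spikes, bins=edges)[0] / (len(spike_times_per_trial) * bin_width)

    # 3. Interspike interval distribution (pooled across trials).
    isi_list = [np.diff(t) for t in spike_times_per_trial if len(t) > 1]
    isis = np.concatenate(isi_list) if isi_list else np.array([])
    return count_dist, psth, isis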

    The order statistic model is general and can be used with any estimate of the spike count distribution, modeled or measured. The mixture-of-Poissons model gives up some of this generality, modeling each spike count distribution as a mixture of Poisson distributions and the spike trains as instances of a mixture of Poisson processes. In exchange, the Poisson model speeds up the calculation of order statistics; we calculate distributions of next spike times for a few (up to five) Poisson means instead of for many (up to 50) spike counts. For our data, decoding using the Poisson model was approximately an order of magnitude faster, and the extra generality of the order statistic model was not needed: >98% of the spike count distributions were fit with mixtures of three or fewer Poisson distributions, and all were fit by mixtures of five or fewer. A mixture of Poisson processes may or may not fit a particular data set. Other structures identified in particular data sets might help simplify calculating order statistics in a similar way, although the details of the calculation would be different.
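    A generic way to fit such a mixture is expectation-maximization; the sketch below is a standard EM recipe, not the authors' implementation, and fits a fixed number of Poisson components to a set of spike counts.

import numpy as np
from scipy.stats import poisson

def fit_poisson_mixture(counts, n_components=2, n_iter=200):
    counts = np.asarray(counts)
    weights = np.full(n_components, 1.0 / n_components)
    means = np.linspace(0.5, 1.5, n_components) * (counts.mean() + 1e-3)
    for _ in range(n_iter):
        # E step: responsibility of each component for each observed count.
        resp = weights * poisson.pmf(counts[:, None], means)
        resp = resp / np.maximum(resp.sum(axis=1, keepdims=True), 1e-300)
        # M step: update the mixing weights and the Poisson means.
        weights = resp.mean(axis=0)
        means = (resp * counts[:, None]).sum(axis=0) / np.maximum(resp.sum(axis=0), 1e-300)
    return weights, means

# Example: weights, means = fit_poisson_mixture([0, 2, 3, 1, 11, 9, 12, 10, 2, 1], n_components=2)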

    Any decoding method must decide which aspects of a response are important, that is, carry information that can be used to distinguish among stimuli, and which can be ignored. Ignoring spike timing that does carry information reduces decoding accuracy, but so does paying attention to timing that does not carry information. When spike timing is taken into account, there are so many distinct responses that a great deal of data is needed to determine whether apparent correlations between stimulus and response are reliable or merely sampling artifacts (as in the examples shown earlier). Thus decoding using timing is difficult precisely because timing presents great opportunities for encoding. Here we avoid the difficulties of directly counting how often each response is elicited by using a model that captures the structure of observed responses, including high-order correlations among spike times (Oram et al., 1999, 2001). This model leads to simple decoding methods that are much more effective than decoding based on spike count alone.
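    The decoding step itself is ordinary Bayesian inversion once a response model supplies likelihoods. The sketch below is generic: the log-likelihood function is a placeholder to be supplied by whatever model is in use, for example the mixture-of-Poissons model.

import numpy as np

def decode(response, stimuli, log_likelihood, log_prior=None):
    # Posterior probability of each stimulus given one observed response.
    # log_likelihood(response, stimulus) must be provided by the response model.
    logp = np.array([log_likelihood(response, s) for s in stimuli], dtype=float)
    if log_prior is not None:
        logp = logp + np.asarray(log_prior, dtype=float)
    logp -= logp.max()                     # guard against numerical underflow
    posterior = np.exp(logp)
    return posterior / posterior.sum()

# The decoded stimulus is the one with the highest posterior probability:
# best = stimuli[int(np.argmax(decode(resp, stimuli, model_log_likelihood)))]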

    The decoding scheme described here assumes that the response being decoded was generated by one of the stimuli; we checked, in a small amount of data from area TE of the monkey temporal lobe, that the decoder works even when spikes occur between trials. We can also treat the no-stimulus, or intertrial, condition as just another experimental condition. Even in this case, the decoding algorithm assumes that the time of stimulus onset is known. Several groups have suggested that eye movements or other causes of large-scale change in the visual field evoke neural responses that can be used as synchronizing or reset signals (Sobotka and Ringo, 1997; Sobotka et al., 1997; Huang and Paradiso, 2000; Greschner et al., 2002).

    The question of response alignment is related to the question of response latency. Once responses are aligned on stimulus onset, systematic differences in response latency are reflected in the firing rate profile over time and possibly also in the spike count distribution. Earlier figures show examples, based on real data, of constructing order statistics and decoding responses with a latency difference. Rhythmic firing phase-locked to stimulus onset would also be reflected in the PSTH. Rhythmic firing not phase-locked to stimulus onset would not show up in the PSTH but might be accounted for by the refractory-rebound term. Any underlying statistical structure different from that assumed by the model might require additional modeling work to accommodate.

    Other models and methods

    The models presented here can be estimated using modest amounts of data, because they assume that spike trains have a certain stochastic structure. This contrasts with the "direct method," which makes no assumptions about the form of the neural code but requires very large data sets, because it directly counts how often each stimulus elicits each response. The direct method has been used to calculate information transmission rates in the fly (Strong et al., 1998), in anesthetized monkeys (Reinagel and Reid, 2000), and in isolated retina (Nirenberg et al., 2001). However, in awake monkeys the amount of data gathered has not been sufficient to estimate information transmitted by spike timing; only calculations using the spike count in single bins have been possible (Reich et al., 2001). Thus, at least in awake monkeys, the direct method cannot currently substitute for finding some principle allowing us to compactly describe spike trains, that is, a model.
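    For contrast, a bare-bones version of the direct method is sketched below (hypothetical data layout): responses are reduced to discrete "words," the stimulus-response table is counted directly, and a plug-in estimate of the mutual information is computed. The combinatorial growth in the number of possible words is what makes the data demands so severe, and the plug-in estimate is biased upward for small samples (Panzeri and Treves, 1996; Golomb et al., 1997).

import numpy as np
from collections import Counter

def direct_information(words_by_stimulus):
    # words_by_stimulus: dict mapping stimulus -> list of response words
    # (e.g. tuples of 0/1 spike counts in small time bins).
    joint = Counter()
    for s, words in words_by_stimulus.items():
        for w in words:
            joint[(s, w)] += 1
    n = sum(joint.values())
    p_s, p_w = Counter(), Counter()
    for (s, w), c in joint.items():
        p_s[s] += c / n
        p_w[w] += c / n
    # Plug-in mutual information in bits.
    return sum((c / n) * np.log2((c / n) / (p_s[s] * p_w[w])) for (s, w), c in joint.items())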

    Other researchers have modeled spike trains using variants of Poisson processes. Lansky and Vaillant (2000) and Lansky et al. (2001) model the responses of hippocampal place cells using mixtures of two Poisson processes. Kass and Ventura (2001) model firing probability as a product of time from stimulus onset and a refractory term depending on the time of the last spike (i.e., an inhomogeneous Poisson process with a refractory period); they estimate the two components together using a generalized linear model, whereas we estimated them separately. Barbieri et al. (2001) model spike trains by characterizing interspike intervals using gamma and inverse Gaussian distributions (generalizing the exponential distribution of intervals arising from a Poisson process). They use time rescaling to account for inhomogeneities in firing rate, inducing a Markov dependence similar to that seen in the order statistic and mixture-of-Poissons models presented here. One possible use of statistical models such as the mixture-of-Poissons model and the others described above is to provide a benchmark against which to evaluate spike trains produced by biophysical models that address the cellular mechanisms of neuronal information processing.

    Another way to take timing into account when decoding is to sequentially decode the spike count in different time bins. In the limit of bins that can contain at most one spike, the decoder simply asks, at each time, "Did a spike occur now? . . . now? . . . now?" This is equivalent to decoding using order statistics or the mixture-of-Poissons model only if the responses in all bins are independent. The Markov dependence built into the order statistic and mixture-of-Poissons models seems to incorporate most of the correlations observed in data (Oram et al., 1999).
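    A sketch of the bin-by-bin variant, under exactly the independence assumption discussed above, is given below. With small bins, an inhomogeneous firing rate makes each bin approximately Bernoulli with probability rate times dt, and the log likelihoods simply add across bins.

import numpy as np

def binwise_log_likelihood(spike_bins, rate_profile, dt):
    # spike_bins: 0/1 array saying whether a spike occurred in each small bin.
    # rate_profile: expected firing rate (spikes/s) in each bin for one stimulus.
    p = np.clip(np.asarray(rate_profile) * dt, 1e-12, 1.0 - 1e-12)   # P(spike in bin)
    spike_bins = np.asarray(spike_bins)
    return np.sum(spike_bins * np.log(p) + (1 - spike_bins) * np.log1p(-p))

# Evaluating this for each stimulus's rate profile and taking the maximum gives a
# decoder equivalent to the order-statistics decoder only when the bins are truly
# independent; the Markov dependence in the models above captures correlations
# that this version ignores.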

    We decode to determine which of a categorical set of stimuli was presented, without regard to stimulus characteristics such as bar or grating orientation. Other groups have developed approaches to reconstruct continuously varying signals (Gabbiani and Koch, 1996; Rieke et al., 1997; Brown et al., 1998; Schwartz and Moran, 2000; Wessberg et al., 2000; Manwani et al., 2001; Serruya et al., 2002). An appropriate discretization of the continuous signal might allow our methods to be applied to these problems. This might require finding a time scale on which the PSTH gave information about the signal (that is, how long a section of signal should be treated as a single stimulus in our formulation), as well as imposing probabilistic continuity conditions (as by Brown et al., 1998).

    Conclusion

    The results of Oram et al. (1999) suggest that, at least in V1 and the lateral geniculate nucleus, individual spike times are random and carry information only by reflecting underlying rate variation. Other features of precise spike timing can be predicted from the spike count and the PSTH and therefore cannot carry information unavailable from these coarser measures. In this context, order statistics (Arnold et al., 1992), whether calculated directly or in special cases such as a mixture of Poisson processes, formalize the intimate connection between spike timing and spike count (Wiener and Richmond, 1999; Oram et al., 1999). Because spike count distributions are often non-Poisson (Baddeley et al., 1997; Gershon et al., 1998), single inhomogeneous Poisson models cannot be expected to match features of precise timing. Therefore, the mixture-of-Poissons model is a more appropriate null hypothesis when searching neuronal responses for timing relations that are unexpected, and which therefore may carry unique information.

    References

    Arnold BC, Balakrishnan N, Nagaraja HN (1992) A first course in order statistics. New York: Wiley.

    Baddeley R, Abbott LF, Booth MCA, Sengpiel F, Freeman T, Wakeman EA, Rolls ET (1997) Responses of neurons in primary and inferior temporal visual cortices to natural scenes. Proc R Soc Lond B Biol Sci 264:1775-1783.

    Baker SN, Lemon RN (2000) Precise spatiotemporal repeating patterns in monkey primary and supplementary motor areas occur at chance levels. J Neurophysiol 84:1770-1780.

    Barbieri R, Quirk MC, Frank LM, Wilson MA, Brown EN (2001) Construction and analysis of non-Poisson stimulus-response models of neural spiking activity. J Neurosci Methods 105:25-37.

    Brown EN, Frank LM, Tang D, Quirk MC, Wilson MA (1998) A statistical paradigm for neural spike train decoding applied to position prediction from ensemble firing patterns of rat hippocampal place cells. J Neurosci 18:7411-7425.

    Brown EN, Barbieri R, Ventura V, Kass RE, Frank LM (2002) The time-rescaling theorem and its application to neural spike train data analysis. Neural Comput 14:325-346.

    Cover TM, Thomas JA (1991) Elements of information theory. New York: Wiley.

    Craven P, Wahba G (1979) Smoothing noisy data with spline functions. Numerische Mathematik 31:377-403.

    Gabbiani F, Koch C (1996) Coding of time-varying signals in spike trains of integrate-and-fire neurons with random threshold. Neural Comput 8:44-66.

    Gershon ED, Wiener MC, Latham PE, Richmond BJ (1998) Coding strategies in monkey V1 and inferior temporal cortices. J Neurophysiol 79:1135-1144.

    Golomb D, Hertz J, Panzeri S, Treves A, Richmond BJ (1997) How well can we estimate the information carried in neuronal responses from limited samples? Neural Comput 9:649-665.

    Greschner M, Bongard M, Rujan P, Ammermuller J (2002) Retinal ganglion cell synchronization by fixational eye movements improves feature estimation. Nat Neurosci 5:341-347.

    Huang X, Paradiso MA (2000) Stimulus structure and expectation reflected in the delayed responses of macaque V1 neurons. Soc Neurosci Abstr 26:559.

    Ihaka R, Gentleman R (1996) R: a language for data analysis and graphics. J Comput Graphical Stat 5:299-314.

    Kass RE, Ventura V (2001) A spike-train probability model. Neural Comput 13:1713-1720.

    Kjaer TW, Gawne TJ, Hertz JA, Richmond BJ (1997) Insensitivity of V1 complex cells to small shifts in the retinal image of complex patterns. J Neurophysiol 78:3187-3197.

    Lansky P, Vaillant J (2000) Stochastic model of the overdispersion in the place cell discharge. Biosystems 58:27-32.

    Lansky P, Fenton AA, Vaillant J (2001) The overdispersion in activity of place cells. Neurocomputing 38:1393-1399.

    Loader C (1999) Local regression and likelihood. New York: Springer.

    Manwani A, Steinmetz PN, Koch C (2001) The impact of spike timing variability on the signal-encoding performance of neural spiking models. Neural Comput 14:347-367.

    Nirenberg S, Carcieri SM, Jacobs AL, Latham PE (2001) Retinal ganglion cells act largely as independent encoders. Nature 411:698-701.

    Oram MW, Wiener MC, Lestienne R, Richmond BJ (1999) The stochastic nature of precisely timed spike patterns in visual system neuronal responses. J Neurophysiol 81:3021-3033.

    Oram MW, Hatsopoulos NG, Richmond BJ, Donoghue JP (2001) Synchrony in motor cortical neurons provides direction information that is redundant with the information from coarse temporal response measures. J Neurophysiol 86:1700-1716.

    Panzeri S, Treves A (1996) Analytical estimates of limited sampling biases in different information measures. Network 7:87-107.

    Reich DS, Mechler F, Victor JD (2001) Formal and attribute-specific information in primary visual cortex. J Neurophysiol 85:305-318.

    Reinagel P, Reid RC (2000) Temporal coding of visual information in the thalamus. J Neurosci 20:5392-5400.

    Rieke F, Warland D, de Ruyter van Steveninck RR, Bialek W (1997) Spikes: exploring the neural code. Cambridge, MA: MIT.

    Schwartz AB, Moran DW (2000) Arm trajectory and representation of movement processing in motor cortical activity. Eur J Neurosci 12:1851-1856.

    Serruya MD, Hatsopoulos NG, Paninski L, Fellows MR, Donoghue JP (2002) Brain-machine interface: instant neural control of a movement signal. Nature 416:141-142.

    Shannon CE, Weaver W (1949) The mathematical theory of communication. Urbana, IL: University of Illinois.

    Sobotka S, Ringo JL (1997) Saccadic eye movements, even in darkness, generate event-related potentials recorded in medial septum and medial temporal cortex. Brain Res 756:168-173.

    Sobotka S, Nowicka A, Ringo JL (1997) Activity linked to externally cued saccades in single units recorded from hippocampal, parahippocampal, and inferotemporal areas of macaques. J Neurophysiol 78:2156-2163.

    Strong SP, Koberle R, de Ruyter van Steveninck RR, Bialek W (1998) Entropy and information in neural spike trains. Phys Rev Lett 80:197-200.

    Wessberg J, Stambaugh CR, Kralik JD, Beck PD, Laubach M, Chapin JK, Kim J, Biggs SJ, Srinivasan MA, Nicolelis MAL (2000) Real-time prediction of hand trajectory by ensembles of cortical neurons in primates. Nature 408:361-365.

    Wiener MC, Richmond BJ (1998) Using response models to study coding strategies in monkey visual cortex. Biosystems 48:279-286.

    Wiener MC, Richmond BJ (1999) Using response models to estimate channel capacity for neuronal classification of stationary visual stimuli using temporal coding. J Neurophysiol 82:2861-2875.

    Wiener MC, Oram MW, Liu Z, Richmond BJ (2001) Consistency of encoding in monkey visual cortex. J Neurosci 21:8210-8221.

    (Matthew C. Wiener and Barry J. Richmond)