LT^2C^2: A language of thought with Turing-computable Kolmogorov complexity

In this paper, we present a theoretical effort to connect the theory of program size to psychology by implementing a concrete language of thought with Turing-computable Kolmogorov complexity (LT^2C^2) satisfying the following requirements: 1) to be simple enough so that the complexity of any given finite binary sequence can be computed, 2) to be based on tangible operations of human reasoning (printing, repeating,...), 3) to be sufficiently powerful to generate all possible sequences but not too powerful as to identify regularities which would be invisible to humans. We first formalize LT^2C^2, giving its syntax and semantics and defining an adequate notion of program size. Our setting leads to a Kolmogorov complexity function relative to LT^2C^2 which is computable in polynomial time, and it also induces a prediction algorithm in the spirit of Solomonoff's inductive inference theory. We then prove the efficacy of this language by investigating regularities in strings produced by participants attempting to generate random strings. Participants had a profound understanding of randomness and hence avoided typical misconceptions such as exaggerating the number of alternations. We reasoned that remaining regularities would express the algorithmic nature of human thoughts, revealed in the form of specific patterns. Kolmogorov complexity relative to LT^2C^2 passed three expected tests examined here: 1) human sequences were less complex than control PRNG sequences, 2) human sequences were not stationary, showing decreasing values of complexity resulting from fatigue, 3) each individual showed traces of algorithmic stability since fitting of partial sequences was more effective to predict subsequent sequences than average fits. This work extends on previous efforts to combine notions of Kolmogorov complexity theory and algorithmic information theory to psychology, by explicitly ...


I. Introduction
Although people feel they understand the concept of randomness [1], humans are unable to produce random sequences, even when instructed to do so [2][3][4][5][6], and they perceive randomness in a way that is inconsistent with probability theory [7][8][9][10]. For instance, random sequences are not perceived by participants as such because runs appear too long to be random [11,12] and, similarly, sequences produced by participants aiming to be random have too many alternations [13,14]. This bias, known as the gambler's fallacy, is thought to result from an expectation of local representativeness (LR) of randomness [10], which ascribes to chance a self-correcting mechanism that promptly restores the balance whenever it is disrupted. In the words of Tversky and Kahneman [5], people apply the law of large numbers too hastily, as if it were the law of small numbers. The gambler's fallacy leads to classic psychological illusions in real-world situations, such as the hot-hand perception, by which people assume specific states of high performance, while analyses of records show that sequences of hits and misses are largely compatible with a Bernoulli (random) process [15,16].
Despite massive evidence that the perception and production of randomness show systematic distortions, a mathematical and psychological theory of randomness remains partly elusive. From a mathematical point of view, as discussed below, a notion of randomness for finite sequences presents a major challenge.
From a psychological point of view, it remains difficult to determine whether the inability to produce and perceive randomness adequately results from a genuine misunderstanding of randomness or is, instead, a consequence of the algorithmic nature of human thoughts, which is revealed in the form of patterns and, hence, in the impossibility of producing genuine chance.
In this work, we address both issues by developing a framework based on a specific language of thought by instantiating a simple device which induces a computable (and efficient) definition of algorithmic complexity [17][18][19].
The notion of algorithmic complexity is described in greater detail below but, in short, it assigns a measure of complexity to a given sequence as the length of the shortest program capable of producing it. If a sequence is algorithmically compressible, it implies that there may be a certain pattern embedded (described succinctly by the program) and hence it is not random. For instance, the binary version of Champernowne's sequence [20]

$$01101110010111011110001001101010111100\ldots$$

consisting of the concatenation of the binary representation of all the natural numbers, one after another, is known to be normal in the scale of 2, which means that every finite word of length n occurs with a limit frequency of $2^{-n}$; e.g., the string 1 occurs with probability $2^{-1}$, the string 10 with probability $2^{-2}$, and so on. Although this sequence may seem random based on its probability distribution, every prefix of length n is produced by a program much shorter than n.
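To make the last point concrete, the following minimal Python sketch generates any prefix of the binary Champernowne sequence. The function name `champernowne_binary` is ours and purely illustrative; the point is only that the program text has a fixed size while the prefix it prints can be arbitrarily long.

```python
from itertools import count

def champernowne_binary(n):
    """Return the first n symbols of the binary Champernowne sequence."""
    pieces = []
    length = 0
    for k in count(0):             # k = 0, 1, 2, 3, ...
        word = format(k, "b")      # binary representation of k, no prefix
        pieces.append(word)
        length += len(word)
        if length >= n:            # enough symbols collected
            break
    return "".join(pieces)[:n]

print(champernowne_binary(38))
```

Only the argument `n` changes with the length of the requested prefix, and writing it down takes about log n symbols.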
The theory of program size, developed simultaneously in the 1960s by Kolmogorov [17], Solomonoff [21] and Chaitin [22], had a major influence on theoretical computer science. Its practical relevance was rather obscure because most notions, tools and problems were undecidable and, above all, because it did not apply to finite sequences. A problem at the heart of this theory is that the complexity of any given sequence depends on the chosen language. For instance, the sequence

$$x_1 = 1100101001111000101000110101100110011100,$$

which seems highly complex, may be trivially accounted for by a single character if there is a symbol (or instruction of a programming language) which accounts for this sequence. This has its psychological analog in the kind of regularities people often extract:

$$x_2 = 1010101010101010101010101010101010101010$$

is obviously a non-random sequence, as it can succinctly be expressed as

$$\texttt{repeat 20 times: print '10'.} \tag{1}$$

Instead, the sequence

$$x_3 = 0010010000111111011010101000100010000101$$

appears more random and yet it is highly compressible, as it consists of the first 40 binary digits of π after the decimal point. This regularity is simply not extracted by the human compressor, and it demonstrates how the exceptions to randomness reveal natural patterns of thought [23].
The genesis of a practical (computable) algorithmic information theory [24] has had an influence (although not yet a major impact) in psychology. Variants of Kolmogorov complexity have been applied to human concept learning [25], to general theories of cognition [26] and to subjective randomness [23,27]. In this last line of work, Falk and Konold showed that a simple measure, inspired by algorithmic notions, was a good correlate of perceived randomness [27]. Griffiths & Tenenbaum developed statistical models that incorporate the detection of certain regularities, which are classified in terms of the Chomsky hierarchy [23]. They showed the existence of motifs (repetition, symmetry) and related their probability distributions to Kolmogorov complexity via Levin's coding theorem (cf. section VII for more details).
The main novelty of our work is to develop a class of specific programming languages (or Turing machines) which allows us to stick to the theory of program size developed by Kolmogorov, Solomonoff and Chaitin. We use the patterns in sequences produced by humans aiming to generate random strings to fit, for each individual, the language which captures these regularities.

II. Mathematical theory of randomness
The idea behind Kolmogorov complexity theory is to study the length of the descriptions that a formal language can produce to identify a given string. All descriptions are finite words over a finite alphabet, and hence each description has a finite length or, more generally, a suitable notion of size. One string may have many descriptions, but any description should describe one and only one string. Roughly, the Kolmogorov complexity [17] of a string x is the length of the shortest description of x. So a string is 'simple' if it has at least one short description, and it is 'complex' if all its descriptions are long. Random strings are those with high complexity.
As we have mentioned, Kolmogorov complexity uses programming languages to describe strings. Some programming languages are Turing complete, which means that any partial computable function can be represented in them. The commonly used programming languages, like C++ or Java, are all Turing complete. However, there are also Turing incomplete programming languages, which are less powerful but more convenient for specific tasks.
In any reasonable imperative language, one can describe $x_2$ above with a program like (1), of length 26, which is considerably smaller than 40, the size of the described string. It is clear that $x_2$ is 'simple'. The case of $x_3$ is a bit tricky. Although at first sight it seems to have a complete lack of structure, it contains a hidden pattern: it consists of the first forty binary digits of π after the decimal point. This pattern could hardly be recognized by the reader, but once it is revealed to us, we agree that $x_3$ must also be tagged as 'simple'. Observe that the underlying programming language is central: $x_3$ is 'simple' with the proviso that the language is strong enough to represent (in a reasonable way) an algorithm for computing the bits of π, a language to which humans are not likely to have access when they try to find patterns in a string. Finally, for $x_1$, the best way to describe it seems to be something like print '1100101001111000101000110101100110011100', which includes the string in question verbatim and has length 48. Hence $x_1$ only has long descriptions, and so it is 'complex'.
In general, both the string of length n which alternates 0s and 1s and the string which consists of the first n binary digits of π after the decimal point can be computed by a program of length ≈ log n, and this applies to any computable sequence. The idea of the algorithmic randomness theory is that a truly random string of length n necessarily needs a program of length ≈ n (cf. section II.ii. for details).
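As an illustration of how a sufficiently expressive language makes both regular strings 'simple', the sketch below produces the alternating string and the first binary digits of π after the point with short, fixed programs. It is not part of the original study: the helper names are ours, and the digits of π are obtained with Machin's formula in integer arithmetic (with a few guard bits), which is just one of many possible methods.

```python
def arctan_inverse(x, scale):
    """Integer approximation of scale * arctan(1/x) using the Gregory series."""
    total, power, k = 0, scale // x, 0
    while power:
        term = power // (2 * k + 1)
        total += term if k % 2 == 0 else -term   # alternating signs
        power //= x * x                          # next odd power of 1/x
        k += 1
    return total

def pi_fraction_bits(n, guard=32):
    """First n binary digits of pi after the point (guard bits absorb rounding)."""
    scale = 1 << (n + guard)
    # Machin's formula: pi = 4 * (4 * arctan(1/5) - arctan(1/239))
    pi_scaled = 4 * (4 * arctan_inverse(5, scale) - arctan_inverse(239, scale))
    fraction = pi_scaled - 3 * scale             # drop the integer part (3)
    return format(fraction >> guard, "0{}b".format(n))

alternating = "10" * 20          # 40 alternating symbols
pi_bits = pi_fraction_bits(40)   # 40 binary digits of pi after the point
print(alternating)
print(pi_bits)
```

In both cases only the number of requested symbols depends on n, which is what the ≈ log n bound reflects. The second program, however, relies on arithmetic that a person looking for patterns "in their head" is unlikely to carry out.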

i. Languages, Turing machines and Kolmogorov complexity
Any programming language L can be formalized with a Turing machine $M_L$, so that programs of L are represented as inputs of $M_L$ via an adequate binary codification. If L is Turing complete then the corresponding machine $M_L$ is called universal, which is equivalent to saying that $M_L$ can simulate any other Turing machine. Let $\{0,1\}^*$ denote the set of finite words over the binary alphabet. Given a Turing machine M, a program p and a string x ($p, x \in \{0,1\}^*$), we say that p is an M-description of x if M(p) = x, i.e., the program p, when executed on the machine M, computes x. Here we do not care about the time that the computation needs, or the memory it consumes. The Kolmogorov complexity of $x \in \{0,1\}^*$ relative to M is defined as the length of the shortest M-description of x. More formally,

$$K_M(x) \stackrel{\text{def}}{=} \min\{|p| : M(p) = x\},$$

where |p| denotes the length of p. Here M is any given Turing machine, possibly one with a very specific behavior, so it may be the case that a given string x does not have any M-description at all. In this case, $K_M(x) = \infty$. In practical terms, a machine M is a useful candidate to measure complexity if it computes a surjective function. In this case, every string x has at least one M-description and therefore $K_M(x) < \infty$.

ii. Randomness for finite words
The strength of Kolmogorov complexity appears when M is set to any universal Turing machine U. The invariance theorem states that $K_U$ is minimal, in the sense that for every Turing machine M there is a constant $c_M$ such that for all $x \in \{0,1\}^*$ we have $K_U(x) \le K_M(x) + c_M$. Here, $c_M$ can be seen as the specification of the language M in U (i.e., the information contained in $c_M$ tells U that the machine to be simulated is M). If U and U′ are two universal Turing machines then $K_U$ and $K_{U'}$ differ at most by a constant. In a few words, $K_U(x)$ represents the length of the ultimate compressed version of x, performed by means of algorithmic processes.
For the analysis of arbitrarily long sequences, $c_M$ becomes negligible and hence for non-practical aspects of the theory the choice of the machine is not relevant. However, for short sequences, as we study here, this becomes a fundamental problem, as notions of complexity are highly dependent on the choice of the underlying machine through the constant $c_M$. The most trivial example, as mentioned in the introduction, is that for any given sequence, say $x_1$, there is a machine M for which $x_1$ has minimal complexity.

iii. Solomonoff induction
Here we have presented compression as a framework to understand randomness. Another very influential paradigm, proposed by Schnorr, is to use the notion of martingale (roughly, a betting strategy), by which a sequence is random if there is no computable martingale capable of predicting forthcoming symbols (say, of a binary alphabet {0, 1}) better than chance [28,29]. In the 1960s, Solomonoff [21] proposed a universal prediction method which successfully approximates any distribution µ, with the only requirement of µ being computable.
This theory brings together concepts of algorithmic information, Kolmogorov complexity and probability theory. Roughly, the idea is that amongst all 'explanations' of x, those which are 'simple' are more relevant, hence following Occam's razor principle: amongst all hypotheses that are consistent with the data, choose the simplest. Here the 'explanations' are formalized as programs computing x, and 'simple' means low Kolmogorov complexity.
Solomonoff's theory builds on the notion of monotone (and prefix) Turing machines. Monotone machines are ordinary Turing machines with a one-way read-only input tape, some work tapes, and a one-way write-only output tape. The output is written one symbol at a time, and no erasing is possible in it. The output can be finite if the machine halts, or infinite in case the machine computes forever. The output head of monotone machines can only "print and move to the right", so they are well suited for the problem of inference of forthcoming symbols based on partial (and finite) states of the output sequence. Any monotone machine N has the monotonicity property (hence its name) with respect to extension: if $p, q \in \{0,1\}^*$ then N(p) is a prefix of N(pq), where pq denotes the concatenation of p and q.
One of Solomonoff's fundamental results is that, given a finite observed sequence $x \in \{0,1\}^*$, the most likely finite continuation y is the one for which the concatenation xy is least complex in a Kolmogorov sense. This is formalized in the following result (see theorem 5.2.3 of [24]): for almost all infinite binary sequences X (in the sense of µ) we have

$$-\lim_{n\to\infty} \log \mu(y \mid X{\upharpoonright}n) = \lim_{n\to\infty} Km_U\big((X{\upharpoonright}n)\,y\big) - Km_U(X{\upharpoonright}n) + O(1) < \infty.$$

Here, $X{\upharpoonright}n$ represents the first n symbols of X, and $Km_U$ is the monotone Kolmogorov complexity relative to a monotone universal machine U. That is, $Km_U(x)$ is defined as the length of the shortest program p such that the output of U(p) starts with x, and possibly has a (finite or infinite) continuation.
In other words, Solomonoff inductive inference leads to a method of prediction based on data compression, whose idea is that whenever the source has output the string x, it is a good heuristic to choose the extrapolation y of x that minimizes $Km_U(xy)$. For instance, if one has observed $x_2$, it is more likely for the continuation to be 1010 rather than 0101, as the former can be succinctly described by a program like

$$\texttt{repeat 22 times: print '10'.} \tag{2}$$

and the latter looks more difficult to describe; indeed the shortest program describing it seems to be something like

$$\texttt{repeat 20 times: print '10'; print '0101'.} \tag{3}$$
Intuitively, as program (2) is shorter than (3), $x_2 1010$ is more probable than $x_2 0101$. Hence, if we have seen $x_2$, it seems to be a better strategy to predict 1.

III. A framework for human thoughts
The notion of thought is not well grounded. We lack an operative working definition and, as also happens with other terms in neuroscience (consciousness, self, ...), the word thought is highly polysemic in common language. It may refer, for example, to a belief, to an idea or to the content of the conscious mind. Due to this difficulty, the mere notion of thought has not been a principal or directed object of study in neuroscience, although of course it is always present implicitly, vaguely, without a formal definition.
Here we do not intend to elaborate an extensive review of the philosophical and biological conceptions of thoughts (see [30] for a good review on thoughts). Nor are we in a theoretical position to provide a full formal definition of a thought. Instead, we point to the key assumptions of our framework about the nature of thoughts. This amounts to defining constraints on the class of thoughts which we aim to describe. In other words, we do not claim to provide a general theory of human thoughts (which is not feasible at this stage, lacking a full definition of the class) but rather of a subset of thoughts which satisfy certain constraints defined below.
For instance, E.B. Titchener and W. Wundt, the founders of the structuralist school in psychology (seeking structure in the mind without invoking metaphysical conceptions, a tradition which we inherit and to which we adhere), believed that thoughts were images (there are no imageless thoughts) and hence can be broken down into elementary sensations [30]. While we do not necessarily agree with these propositions (see Carey [31] for more contemporary versions denying the sensory foundations of conceptual knowledge), here we do not intend to explain all possible thoughts but rather a subset, a simpler class which, in agreement with Wundt and Titchener, can be expressed in images. More precisely, we develop a theory which may account for Boole's [32] notion of thoughts as propositions and statements about the world which can be represented symbolically. Hence, a first and crucial assumption of our framework is that thoughts are discrete. Elsewhere we have extensively discussed [33][34][35][36][37][38][39] how the human brain, whose architecture is quite different from Turing machines, can give rise to a form of computation which is discrete, symbolic and resembles Turing devices.
Second, here we focus on the notion of "prop-less" mental activity, i.e., whatever (symbolic) computations can be carried out by humans without resorting to external aids such as paper, marbles, computers or books. This is done by actually asking participants to perform the task "in their heads". Again, this is not intended to set a proposition about the universality of human thoughts but, instead, to delimit a narrower set of thoughts which we conceive to be theoretically addressable in this mathematical framework.
Summarizing:
1. We think we do not yet have a good mathematical (or even philosophical) conception of thoughts as mental structures.
2. Intuitively (and philosophically), we adhere to a materialistic and computable approach to thoughts. Broadly, one can think (to picture, not to provide a formal framework) that thoughts are formations of the mind with a certain stability which defines distinguishable clusters or objects [40][41][42].
3. While the set of such objects and the rules of their transitions may be of many different forms (analogous, parallel, unconscious, unlinked to sensory experience, non-linguistic, non-symbolic), here we work on a subset of thoughts, a class defined by Boole's attempt to formalize thought as symbolic propositions about the world.
4. These states, which may correspond to human "conscious rational thoughts", the seed of Boole and Turing foundations [34,34], are discrete, defined by symbols and potentially represented by a Turing device.
5. We focus on an even narrower space of thoughts: binary formations (right or left, zero or one), in order to ask what kind of language better describes these transitions. This work can be naturally extended to understand discrete transitions in conceptual formations [43][44][45].
6. We concentrate on prop-less mental activity to understand limitations of the human mind when it does not have evident external support (paper, computer, ...).

IV. Implementing a language of thought with Turing-computable complexity
As explained in section II.i., Kolmogorov complexity considers all possible computable compressors and assigns to a string x the length of the shortest of the corresponding compressions. This seems to be a perfect theory of compression but it has a drawback: the function $K_U$ is not computable, that is, there is no effective procedure to calculate $K_U(x)$ given x.
On the other hand, the definition of randomness introduced in section II.i., having very deep and intricate connections with algorithmic information and computability theories, is simply too strong to explain our own perception of randomness. To detect that $x_3$ consists of the first forty bits of π is incompatible with human patterns of thought.
Hence, the intrinsic algorithms (or observed patterns) which make human sequences not random are too restricted to be accounted for by a universal machine and may be better described by a specific machine. Furthermore, our hypothesis is that each person uses their own specific machine or algorithm to generate a random string.
As a first step in this complicated enterprise, we propose to work with a specific language LT^2C^2 which meets the following requirements:
• LT^2C^2 must reflect some plausible features of our mental activity when finding succinct descriptions of words. For instance, finding repetitions in a sequence such as $x_2$ seems to be something easy for our brain, but detecting numerical dependencies between its digits as in $x_3$ seems to be very unlikely.
• LT^2C^2 must be able to describe any string in $\{0,1\}^*$. This means that the map given by the induced machine $N \stackrel{\text{def}}{=} N_{LT^2C^2}$ must be surjective.
• N must be simple enough so that $K_N$, the Kolmogorov complexity relative to N, becomes computable. This requirement clearly makes LT^2C^2 Turing incomplete, but as we have seen before, this is consistent with human deviations from randomness.
• The rate of compression given by $K_N$ must be sensible for very short strings, since our experiments will produce such strings. For instance, the approach, followed in [46], of using the size of the compressed file via general-purpose compressors like Lempel-Ziv based dictionary (gzip) or block based (bzip2) to approximate the Kolmogorov complexity does not work in our setting. This method works best for long files.
• LT^2C^2 should have certain degrees of freedom, which can be adjusted in order to approximate the specific machine that each individual follows during the process of randomness generation.
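The fourth requirement can be illustrated with a small experiment. The sketch below is ours, not the procedure of [46]: it packs a 120-symbol binary sequence into 15 bytes and compresses it with Python's standard `zlib` (a Lempel-Ziv based, gzip-like codec) and `bz2` modules. The helper names `pack_bits` and `compressed_bits` are hypothetical.

```python
import bz2
import zlib

def pack_bits(bits):
    """Pack a string of '0'/'1' symbols into bytes, 8 symbols per byte."""
    padded = bits + "0" * (-len(bits) % 8)       # pad to a whole number of bytes
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))

def compressed_bits(bits, compress):
    """Size, in bits, of the bit-packed sequence after compression."""
    return 8 * len(compress(pack_bits(bits)))

periodic = "10" * 60   # 120 symbols, the length of one experimental block
print(len(periodic),
      compressed_bits(periodic, zlib.compress),
      compressed_bits(periodic, bz2.compress))
```

Both formats carry fixed headers and checksums, so even for this perfectly periodic input the compressed size is of the same order as the 120 symbols themselves. At this scale the measure mostly reflects container overhead rather than structure in the sequence.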
We will not go into the details on how to codify the instructions of LT^2C^2 into binary strings of N: for the sake of simplicity we take N as a surjective total mapping $LT^2C^2 \to \{0,1\}^*$. We restrict ourselves to describing the grammar and semantics of our proposed programming language LT^2C^2. It is basically an imperative language with only two classes of instructions: a sort of print i, which prints the bit i in the output; and a sort of repeat n times P, which, for a fixed $n \in \mathbb{N}$, repeats the program P n times. The former is simply represented as i and the latter as $(P)^n$.
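Formally, we set the alphabet $\{0, 1, (, ), {}^0, \ldots, {}^9\}$ and define LT^2C^2 over such alphabet with the following grammar:

$$P ::= \epsilon \mid 0P \mid 1P \mid (P)^{n} P,$$

where the superscript n > 1 is the decimal representation of $n \in \mathbb{N}$ and $\epsilon$ denotes the empty string. The semantics of LT^2C^2 is given through the behavior of N as follows:

$$N(P) \stackrel{\text{def}}{=} \begin{cases} \epsilon & \text{if } P = \epsilon; \\ i\,N(P') & \text{if } P = iP' \text{ with } i \in \{0,1\}; \\ N(P_1)^n\,N(P_2) & \text{if } P = (P_1)^n P_2, \end{cases}$$

where $w^n$ denotes the concatenation of n copies of the word w.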
N is not universal, but every string x has a program in N which describes it: namely x itself. Furthermore, N is monotone in the sense that if $p, q \in LT^2C^2$ then N(p) is a prefix of N(pq). In Table 1, the first column shows some examples of N-programs which compute 1001001001.
program               size
1001001001            10
(100)^2 1 (0)^2 1     6.6
(100)^3 1             4.5
1 ((0)^2 1)^3         3.8

Table 1: Some N-descriptions of 1001001001 and their sizes for b = r = 1.

i. Kolmogorov complexity for LT^2C^2
The Kolmogorov complexity relative to N (and hence to the language LT^2C^2) is defined as

$$K_N(x) \stackrel{\text{def}}{=} \min\{\|p\| : N(p) = x\},$$

where $\|p\|$, the size of a program p, is inductively defined as:

$$\|\epsilon\| = 0, \qquad \|iP\| = b + \|P\| \ \ (i \in \{0,1\}), \qquad \|(P_1)^n P_2\| = r \cdot \log n + \|P_1\| + \|P_2\|.$$

In the above definition, $b \in \mathbb{N}$ and $r \in \mathbb{R}$ are two parameters that control the relative weight of the print operation and the repeat n times operation. The sizes listed in Table 1 correspond to taking the logarithm in base 10. In the sequel, we drop the subindex of $K_N$ and simply write $K \stackrel{\text{def}}{=} K_N$. Table 1 shows some examples of the size of N-programs when b = r = 1. Observe that for all x we have $K(x) \le \|x\|$.
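The semantics of N and the size of a program can be sketched in a few lines of Python. This is an illustration under our own assumptions, not the authors' implementation: a program is encoded as a list whose items are either a bit (`"0"` or `"1"`) or a pair `(body, n)` standing for "repeat n times body", and the logarithm is taken in base 10, which reproduces the sizes of Table 1 for b = r = 1.

```python
from math import log10

def run(program):
    """Output of the machine N on a program (list of bits and (body, n) pairs)."""
    out = []
    for instr in program:
        if instr in ("0", "1"):        # print instruction
            out.append(instr)
        else:                          # repeat instruction
            body, n = instr
            out.append(run(body) * n)
    return "".join(out)

def size(program, b=1.0, r=1.0):
    """Program size: b per print, r * log10(n) per repeat, plus the body size."""
    total = 0.0
    for instr in program:
        if instr in ("0", "1"):
            total += b
        else:
            body, n = instr
            total += r * log10(n) + size(body, b, r)
    return total

# The four N-descriptions of 1001001001 listed in Table 1.
p1 = list("1001001001")
p2 = [(list("100"), 2), "1", (["0"], 2), "1"]
p3 = [(list("100"), 3), "1"]
p4 = ["1", ([(["0"], 2), "1"], 3)]

for p in (p1, p2, p3, p4):
    print(run(p), round(size(p), 1))
```

All four programs print the same string, and their rounded sizes are 10, 6.6, 4.5 and 3.8, as in Table 1.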
It is not difficult to see that K(x) depends only on the values of K(y), where y is any nonempty and proper substring of x. Since $\|\cdot\|$ is computable in polynomial time, using dynamic programming one can calculate K(x) in polynomial time. This, of course, is a major difference with respect to the Kolmogorov complexity relative to a universal machine, which is not computable.
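A minimal sketch of such a dynamic program is given below. It is our reading of the definitions rather than the authors' code: the factory `make_complexity` is a hypothetical name, the logarithm is assumed to be in base 10 (as suggested by Table 1), and memoization over substrings stands in for an explicit table.

```python
from functools import lru_cache
from math import log10

def make_complexity(b=1.0, r=1.0):
    """Return a memoized function K for the given print/repeat weights."""
    @lru_cache(maxsize=None)
    def K(x):
        if not x:
            return 0.0
        # Option 1: print the first symbol, then describe the rest.
        best = b + K(x[1:])
        # Option 2: x = u^n v with n >= 2; describe u once and then v.
        for m in range(1, len(x) // 2 + 1):
            u = x[:m]
            n = 1
            while x.startswith(u, n * m):    # one more copy of u follows
                n += 1
                best = min(best, r * log10(n) + K(u) + K(x[n * m:]))
        return best
    return K

K = make_complexity(b=1.0, r=1.0)
print(round(K("1001001001"), 3))
```

Each call only recurses on strictly shorter substrings of its argument, so the number of distinct subproblems is polynomial in the length of x. For the string of Table 1 the minimum is attained, for instance, by 1((0)^2 1)^3, of size 3 + log 2 + log 3 ≈ 3.778.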

ii. From compression to prediction
As one can imagine, the perfect universal prediction method described in section II.iii. is, again, non-computable. We define a computable prediction algorithm based on Solomonoff's theory of inductive inference but using K, the Kolmogorov complexity relative to LT^2C^2, instead of $Km_U$ (which depends on a universal machine). To predict the next symbol of $x \in \{0,1\}^*$, we follow the idea described in section II.iii.: amongst all extrapolations y of x we choose the one that minimizes K(xy). If such y starts with 1, we predict 1, else we predict 0. Since we cannot examine the infinitely many extrapolations, we restrict to those up to a fixed given length $\ell_F$. Also, we do not take into account the whole x but only a suffix of length $\ell_P$. Both $\ell_F$ and $\ell_P$ are parameters which control, respectively, how many extrapolation bits are examined ($\ell_F$ many Future bits) and how many bits of the tail of x ($\ell_P$ many Past bits) are considered.
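Let $\{0,1\}^n$ (resp. $\{0,1\}^{\le n}$) be the set of words over the binary alphabet {0, 1} of length n (resp. at most n). Formally, the prediction method is as follows. Suppose $x = x_1 \cdots x_n$ ($x_i \in \{0,1\}$) is a string, and let $z = x_{n-\ell_P+1} \cdots x_n$ be its suffix of length $\ell_P$. The next symbol $\hat{x}_{n+1}$ is determined as follows:

$$\hat{x}_{n+1} = \begin{cases} 0 & \text{if } m_0 < m_1; \\ 1 & \text{if } m_1 < m_0; \\ g(z) & \text{otherwise,} \end{cases}$$

where for $i \in \{0,1\}$,

$$m_i = \min\{K(z\,i\,y) : i\,y \in \{0,1\}^{\le \ell_F}\},$$

and $g : \{0,1\}^{\ell_P} \to \{0,1\}$ is defined as g(z) = i if the number of occurrences of i in z is greater than the number of occurrences of 1 − i in z; in case the numbers of occurrences of 1s and 0s in z coincide, g(z) is defined as the last bit of z.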
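The prediction rule can be sketched as follows. This is an illustrative reading of the description above, not the code used in the experiment: `make_complexity`, `predict_next`, `l_past` and `l_future` are our names, the logarithm is assumed to be in base 10, the observed string is assumed to be non-empty, and ties between the two minima are detected up to a small floating-point tolerance.

```python
from functools import lru_cache
from itertools import product
from math import log10

def make_complexity(b=1.0, r=1.0):
    """Memoized Kolmogorov complexity relative to the print/repeat language."""
    @lru_cache(maxsize=None)
    def K(x):
        if not x:
            return 0.0
        best = b + K(x[1:])                          # print first symbol
        for m in range(1, len(x) // 2 + 1):          # or x = u^n v, n >= 2
            u, n = x[:m], 1
            while x.startswith(u, n * m):
                n += 1
                best = min(best, r * log10(n) + K(u) + K(x[n * m:]))
        return best
    return K

def predict_next(x, K, l_past, l_future):
    """Predict the symbol following x from its last l_past symbols."""
    z = x[-l_past:]
    m = {}
    for i in "01":
        # All continuations of length 1..l_future that start with i.
        m[i] = min(K(z + i + "".join(y))
                   for k in range(l_future)
                   for y in product("01", repeat=k))
    if abs(m["0"] - m["1"]) > 1e-9:
        return "0" if m["0"] < m["1"] else "1"
    # Tie: majority symbol of z, or its last symbol if 0s and 1s are balanced.
    ones, zeros = z.count("1"), z.count("0")
    if ones != zeros:
        return "1" if ones > zeros else "0"
    return z[-1]

K = make_complexity()
print(predict_next("10" * 20, K, l_past=8, l_future=4))
```

For the alternating string the continuation starting with 1 keeps the suffix describable as a single repetition of 10, so the sketch predicts 1, in line with the example of section II.iii.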

V. Methods
Thirty-eight volunteers (mean age = 24) participated in an experiment to examine the capacity of LT^2C^2 to identify regularities in the production of binary sequences. Participants were asked to produce random sequences, without further instruction. All the participants were college students or graduates with programming experience and knowledge of the theoretical foundations of randomness and computability. This was intended to test these ideas in a hard sample, where we did not expect the typical errors which result from a misunderstanding of chance.
The experiment was divided into four blocks. In each block the participant freely pressed the left or right arrow 120 times.
After each key press, the participant received a notification with a green square which progressively filled a line to indicate to the participant the number of choices made. At the end of the block, participants were given feedback on how many times the predictor method had correctly predicted their input. After this point, a new trial would start.
38 participants performed 4 sequences, yielding a total of 152 sequences.14 sequences were excluded from analysis because they had an extremely high level of predictability.Including these sequences would have actually improved all the scores reported here.

i. Law of large numbers
Any reasonable notion of randomness for strings in base 2 should imply Borel's normality, or the law of large numbers, in the sense that if $x \in \{0,1\}^n$ is random then the number of occurrences of any given string y in x divided by n should tend to $2^{-|y|}$ as n goes to infinity.
A well-known result obtained in some investigations on generation or perception of randomness in binary sequences is that people tend to increase the number of alternations of symbols with respect to the expected value [27]. Given a string x of length n with r runs, there are n − 1 transitions between successive symbols and the number of alternations between symbol types is r − 1. The probability of alternation of the string x is defined as

$$P(x) = \frac{r - 1}{n - 1}.$$

In our experiment, the average P(x) of participants was 0.51, very close to the expected probability of alternation of a random sequence, which should be 0.5. A t-test on the P(x) of the strings produced by participants, where the null hypothesis is that they are a random sample from a normal distribution with mean 0.5, shows that the hypothesis cannot be rejected, as the p-value is 0.31 and the confidence interval on the mean is [0.49, 0.53]. This means that the probability of alternation is not a good measure to distinguish participants' strings from random ones or, at least, that the participants in this particular experiment can bypass this validation.
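As a small worked example of this statistic, the following sketch (our own helper, assuming strings of length at least 2) counts runs and returns the proportion of transitions that are alternations.

```python
def probability_of_alternation(x):
    """(r - 1) / (n - 1), where r is the number of runs in the string x."""
    n = len(x)
    # A new run starts at every position whose symbol differs from the previous one.
    runs = 1 + sum(a != b for a, b in zip(x, x[1:]))
    return (runs - 1) / (n - 1)

print(probability_of_alternation("1100"))
```

A strictly alternating string such as 1010 yields 1, a constant string yields 0, and 1100 (two runs, three transitions) yields 1/3.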
Although the probability of alternation was close to the expected value in a random string, participants tend to produce n-grams of length ≥ 2 with probability distributions which are not equiprobable (see Fig. 1). Strings containing more alternations (like 1010, 0101, 010, 101) and 3- and 4-runs have a higher frequency than expected by chance. This might be seen as an effort from participants to keep the probability of alternation close to 0.5 by compensating the excess of alternations with blocks of repetitions of the same symbol.

ii. Comparing human randomness with other random sources
We asked whether K, the Kolmogorov complexity relative to LT^2C^2 defined in section IV.i., is able to detect and compress more patterns in strings generated by participants than in strings produced by other sources, which are considered random for many practical purposes. In particular, we studied strings originated by two sources: a Pseudo-Random Number Generator (PRNG) and Atmospheric Noise (AN). For the PRNG source, we chose the Mersenne Twister algorithm [47] (specifically, the second revision from 2002 that is currently implemented in the GNU Scientific Library). The atmospheric noise was taken from the random.org site (property of Randomness and Integrity Services Limited), which also runs real-time statistical tests recommended by the US National Institute of Standards and Technology to ensure the random quality of the numbers produced over time.
In Table 2, we summarize our results using b = 1 and r = 1 for the parameters of K as defined in section IV.i. The mean and median of K are higher for PRNG and AN strings than for participants' strings. This difference was significant, as confirmed by a t-test (p-value of $4.9 \times 10^{-11}$ when comparing the participants' sample with the PRNG one, a p-value of $1.2 \times 10^{-15}$ when comparing the participants' sample with the AN one, and a p-value of $1.4 \times 10^{-2}$ when comparing the PRNG sample with the AN one).

Therefore, despite the simplicity of LT^2C^2, based merely on prints and repeats, it is rich enough to identify regularities of human sequences. The K function relative to LT^2C^2 is an effective and significant measure to distinguish strings produced by participants with a profound understanding of the mathematics of randomness from PRNG and AN strings. As expected, humans produce less complex (i.e., less random) strings than those produced by PRNG or atmospheric noise sources.

iii. Mental fatigue
On cognitively demanding tasks, fatigue affects performance by deteriorating the capacity to organize behavior [48][49][50][51][52]. Specifically, Weiss claims that boredom may be a factor that increases non-randomness [48]. Hence, as another test of the ability of K relative to LT^2C^2 to identify idiosyncratic elements of human regularities, we asked whether the random quality of the participants' strings deteriorated with time.
For each of the 138 strings $x = x_1 \cdots x_{120}$ ($x_i \in \{0,1\}$) produced by the participants, we measured the K complexity of all the sub-strings of length 30.
Specifically, we calculated the average K(x i • • • x i+30 ) from the 138 strings for each i ∈ [0, 90] (see Fig. 2), using the same parameters as in section VI.ii.(b = r = 1), and compared to the same sliding average procedure for PRNG (Fig. 3) and AN sources (Fig. 4).
The only source that showed a significant linear regression was the human-generated data (see Table 3) which, as expected, showed a negative correlation, indicating that participants produced less complex (less random) strings over time (slope −0.007, p < 0.02).
The finding of a fatigue-related effect shows that the unpropped, i.e., resource-limited, human Turing machine is not only limited in terms of the language it can parse, but also in terms of the amount of time it can dedicate to a particular task.
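The sliding-window procedure described above can be sketched as follows. This is a minimal illustration and not the authors' code: `run_count` is a hypothetical stand-in for the complexity function (it is not K relative to LT^2C^2, whose definition lies in section IV.i), and the function names and the toy data are assumptions introduced here.

```python
from statistics import mean
from typing import Callable, List, Sequence


def run_count(s: str) -> float:
    """Stand-in complexity (assumption, NOT the paper's K):
    number of maximal runs of equal symbols in s."""
    return float(sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1]))


def sliding_mean_complexity(strings: Sequence[str],
                            complexity: Callable[[str], float],
                            window: int = 30) -> List[float]:
    """For every window start position, average the complexity of the
    window over all strings."""
    length = min(len(s) for s in strings)
    return [mean(complexity(s[i:i + window]) for s in strings)
            for i in range(length - window + 1)]


def least_squares_slope(values: Sequence[float]) -> float:
    """Slope of the ordinary least-squares line through (0, v0), (1, v1), ..."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    sxy = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(values))
    sxx = sum((i - x_bar) ** 2 for i in range(n))
    return sxy / sxx


# Toy data (made up): one string that becomes regular over time, one constant.
toy = ["01" * 30 + "0" * 60, "0" * 120]
curve = sliding_mean_complexity(toy, run_count)
print(len(curve), least_squares_slope(curve) < 0)
```

With strings of length 120 and windows of length 30 there are 91 start positions, which agrees with the range of i used in the text. A negative slope of the averaged curve is the kind of trend the text reports for the human data; the significance test itself is not reproduced here.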

Papers in Physics, vol. 5, art. 050001 (2013) / S Romano et al.

In section IV.ii, we introduced a prediction method with two parameters: $\ell_F$ and $\ell_P$. A predictor based on LT^2C^2 achieved levels of predictability close to 56%, which were highly significant (see Table 4). The predictor, as expected, performed at chance for the control PRNG and AN data. This fit was relatively insensitive to the values of $\ell_P$ and $\ell_F$, contrary to our intuition that there may be a memory scale which would correspond in this framework to a given length.

Table 3: Predictability
A very important aspect of this investigation, in line with the prior work of [23], is to inquire whether specific parameters are stable for a given individual. To this aim, we optimized, for each participant, the parameters using the first 80 symbols of the sequence and then tested these parameters in the second half of each segment (last 80 symbols of the sequence). After this optimization procedure, mean predictability increased significantly to 58.14% (p < 0.002, see Table 5). As expected, the optimization based on partial data of PRNG and AN resulted in no improvement in the classifier, which remained at chance with no significant difference (p < 0.3, p < 0.2, respectively). Hence, while the specific parameters for compression vary widely across individuals, they show stability in the time-scale of this experiment.
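The fit-then-test logic of this paragraph can be illustrated with a short sketch. Everything named here is hypothetical: `repeat_lag` is a toy predictor with a single parameter and is not the LT^2C^2 predictor of section IV.ii, the candidate parameter list is invented, and testing on the symbols after position 80 is an assumption about how the held-out part is delimited.

```python
from typing import Callable, Sequence, Tuple

# A predictor maps (history, parameter) to a guess for the next symbol.
Predictor = Callable[[str, int], str]


def accuracy(seq: str, predictor: Predictor, param: int,
             start: int, stop: int) -> float:
    """Fraction of positions t in [start, stop) whose symbol is guessed
    correctly from the history seq[:t]."""
    hits = sum(predictor(seq[:t], param) == seq[t] for t in range(start, stop))
    return hits / (stop - start)


def fit_then_test(seq: str, predictor: Predictor, params: Sequence[int],
                  split: int = 80) -> Tuple[int, float]:
    """Pick the parameter that predicts best on seq[:split], then report
    its accuracy on the held-out remainder seq[split:]."""
    best = max(params, key=lambda p: accuracy(seq, predictor, p, 1, split))
    return best, accuracy(seq, predictor, best, split, len(seq))


def repeat_lag(history: str, lag: int) -> str:
    """Toy predictor (assumption): repeat the symbol seen `lag` steps ago."""
    return history[-lag] if len(history) >= lag else "0"


# A made-up periodic sequence of 120 symbols: the fitted lag should be 3.
best, held_out = fit_then_test("011" * 40, repeat_lag, [1, 2, 3, 4])
print(best, held_out)
```

If a participant's parameters were not stable over the session, the value chosen on the first part would give no advantage on the held-out part, which is the comparison the paragraph draws against the PRNG and AN controls.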

VII. Discussion
Here we analyzed strings produced by participants attempting to generate random strings. Participants had a profound understanding of randomness and hence avoided typical misconceptions such as exaggerating the number of alternations. We reasoned that remaining regularities would express the algorithmic nature of human thoughts, revealed in the form of specific patterns.
Our effort here was to bridge the gap between Kolmogorov theory and psychology, developing a concrete language, LT^2C^2, satisfying the following requirements: 1) to be simple enough so that the complexity of any given sequence can be computed, 2) to be based on tangible operations of human reasoning (printing, repeating, ...), 3) to be sufficiently powerful to generate all possible sequences but not too powerful as to identify regularities which would be invisible to humans.
More specifically, our aim is to develop a class of languages with certain degrees of freedom which can then be fit to an individual (or an individual in a specific context and time). Here, we opted for a comparably easier strategy by only allowing the relative cost of each operation to vary. However, a natural extension of this framework is to generate classes of languages where structural and qualitative aspects of the language are free to vary. For instance, one can devise a program structure for repeating portions of (not necessarily neighboring) code, or consider the more general framework of for-programs, where the repetitions are more general than in our setting: for i=1 to n do P(i), where P is a program that uses the successive values of i = 1, 2, ..., n in each iteration. For instance, the following program

    for i=1 to 6 do
        print '0'
        repeat i times: print '1'

would describe the string 010110111011110111110111111.
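As a quick check of the for-program example, the following sketch evaluates it directly in Python. The function name `for_program` is an assumption made for illustration; it hard-codes the body of this one example and is not an interpreter for LT^2C^2 or for for-programs in general.

```python
def for_program(n: int) -> str:
    """Evaluate: for i=1 to n do { print '0'; repeat i times: print '1' }."""
    out = []
    for i in range(1, n + 1):
        out.append("0")       # print '0'
        out.append("1" * i)   # repeat i times: print '1'
    return "".join(out)


print(for_program(6))  # 010110111011110111110111111
```

The output has 2 + 3 + ... + 7 = 27 symbols, while the program text stays the same size for any n, which is the kind of regularity a plain print-and-repeat language cannot express as compactly.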
The challenge from the computational theoretical point of view is to define an extension which induces a computable (and, whenever possible, feasible) Kolmogorov complexity. For instance, adding simple control structures like conditional jumps or allowing the use of imperative program variables may make the language Turing-complete, with the theoretical consequences that we already mentioned. The aim is to keep the language simple and yet include structures to compact some patterns which are compatible with the human language of thought.
We emphasize that our aim here was not to generate an optimal predictor of human sequences. Clearly, restricting LT^2C^2 to a very rudimentary language is not the way to go to identify vast classes of patterns. Our goal, instead, was to use human sequences to calibrate a language which expresses and captures specific patterns of human thought in a tangible and concrete way.
Our model is based on ideas from Kolmogorov complexity and Solomonoff's induction. It is important to compare it to what we think is the closest approach in previous studies: the work of Griffiths and Tenenbaum [23]. Griffiths and Tenenbaum devise a series of statistical models that account for different kinds of regularities. Each model Z is fixed and assigns to every binary string x a probability $P_Z(x)$. This probabilistic approach is connected to Kolmogorov complexity theory via Levin's famous Coding Theorem, which points out a remarkable numerical relation between the algorithmic probability $P_U(x)$ (the probability that the universal prefix Turing machine U outputs x when the input is filled up with the results of coin tosses) and the (prefix) Kolmogorov complexity $K_U$ described in section II.i. Formally, the theorem states that there is a constant c such that, for any string $x \in \{0, 1\}^*$,

$$\left| -\log_2 P_U(x) - K_U(x) \right| \le c \qquad (4)$$

(the reader is referred to section 4.3.4 of [24] for more details). Griffiths and Tenenbaum's bridge to Kolmogorov complexity is only established through this last theoretical result: replacing $P_U$ by $P_Z$ in Eq. (4) should automatically give us some Kolmogorov complexity $K_Z$ with respect to some underlying Turing machine Z.
While there is hence a formal relation to Kolmogorov complexity, there is no explicit definition of the underlying machine, and hence no notion of program.
On the contrary, we propose a specific language of thought, formalized as the programming language LT^2C^2 or, alternatively, as a Turing machine N, which assigns formal semantics to each program. Semantics are given, precisely, through the behavior of N. The fundamental introduction of program semantics and the clear distinction between inputs (programs of N) and outputs (binary strings) allows us to give a straightforward definition of Kolmogorov complexity relative to N, denoted $K_N$, which, because of the choice of LT^2C^2, becomes computable in polynomial time.
Once we have a complexity function, we apply Solomonoff's ideas of inductive inference to obtain a predictor which tries to guess the continuation of a given string under the assumption that the most probable one is the most compressible in terms of LT^2C^2-Kolmogorov complexity. As in [23], we also make use of the Coding Theorem (4), but in the opposite direction: given the complexity $K_N$, we derive an algorithmic probability $P_N$. This work is mainly a theoretical development: a framework to adapt Kolmogorov's ideas into a constructive procedure (i.e., defining an explicit language) to identify regularities in human sequences. The theory was validated experimentally, as three tests were satisfied: 1) human sequences were less complex than control PRNG sequences, 2) human sequences were non-stationary, showing decreasing values of complexity, 3) each individual showed traces of algorithmic stability, since fitting of partial data was more effective to predict subsequent data than average fits. Our hope is that this theory may constitute, in the future, a useful framework to ground and describe the patterns of human thoughts.
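Going from a complexity function to a probability over continuations can be sketched as below. This is an illustration under stated assumptions, not the predictor of section IV.ii: each candidate continuation is weighted by 2 to the power of minus its complexity and the weights are normalized, and `run_count` is a hypothetical stand-in complexity rather than $K_N$.

```python
from typing import Callable, Dict


def run_count(s: str) -> float:
    """Stand-in complexity (assumption, NOT K_N): number of maximal runs."""
    return float(sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1]))


def next_symbol_probs(history: str,
                      complexity: Callable[[str], float]) -> Dict[str, float]:
    """Weight each one-symbol continuation by 2**(-complexity) and normalize,
    so the more compressible continuation is the more probable one."""
    weights = {b: 2.0 ** -complexity(history + b) for b in "01"}
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}


probs = next_symbol_probs("000", run_count)
guess = max(probs, key=probs.get)
print(probs, guess)
```

Under the stand-in complexity, continuing "000" with another "0" keeps a single run, so that continuation receives the larger probability and becomes the guess. Plugging in a complexity relative to an explicit language is what makes such a predictor constructive.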

Figure 1: Frequency of sub-strings up to length 4

Table 2: Values of K(x), where x is a string produced by participants, PRNG or AN sources