The practical utility of computing machinery, the automatic computation of mathematical tables and information technology have become extremely important in the modern world. The introduction of Boolean logic, Bayesian decision theory, the conception of levels of abstraction (Turing) and the hierarchy of ascending orders of languages gave a strong impetus to the development of computation. Aspirations are now lifting off through the concurrence of artificial intelligence, symbolic and sub-symbolic methods, and the exponential growth of data for machine learning, in the search for a master algorithm.

At the beginning of his Mathematical Analysis of Logic (2009; 1847), George Boole stressed that modern Symbolical Algebra does not depend on the interpretation of the symbols, but solely upon the laws of their combination. The interpretation of the symbols is irrelevant, since the “business of Logic is with the relations of classes, and with the modes in which the mind contemplates those relations.”[1] The product xy expresses “x and y”, the intersection of two classes. With the law xx = x² = x and its consequences x − xx = 0 and x(1 − x) = 0, we reach the nodal point of Boolean algebra as an analysis of class relations. Furthermore, the operation inclusive-or is written x + y, which means “all the x or all the y or x and y”. The operation exclusive-or is today written x ⊕ y, which means “all the x or all the y but not both x and y”. This symbolism was further developed with the so-called De Morgan’s theorems, the Peirce function, the Sheffer stroke function, and so on. The truth tables of the above-mentioned functions can be vividly visualized by means of Euler and Venn diagrams and Karnaugh maps.
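
A minimal sketch in Python (my own illustration, not Boole’s notation) that checks these laws and connectives over the two-valued domain {0, 1}:

```python
from itertools import product

# Verify Boole's laws and the derived connectives on the two-element domain {0, 1}.
for x, y in product((0, 1), repeat=2):
    assert x * x == x                      # the index law: xx = x^2 = x
    assert x * (1 - x) == 0                # hence x(1 - x) = 0: nothing is both x and not-x
    inclusive_or = x + y - x * y           # "all the x or all the y or x and y"
    exclusive_or = (x + y) % 2             # "all the x or all the y but not both", today x XOR y
    assert inclusive_or == (1 if (x or y) else 0)
    assert exclusive_or == (1 if x != y else 0)
    # One of De Morgan's theorems: not(x and y) = (not x) or (not y)
    assert 1 - (x * y) == (1 - x) + (1 - y) - (1 - x) * (1 - y)
print("Boolean identities hold over {0, 1}")
```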

The next decisive turn came in the 1920s with the introduction of the vacuum tube and of electronics, “a new power source,” which spread telephone lines and communication networks across the world as never before. The vacuum tube could be used to rectify, to amplify, to generate (as in the radio transmitter), to control, to transform light into electric current, and to transform electric current into light. Its successor, the transistor, was smaller, more reliable, more power-efficient and more resilient. In the same period, Claude Shannon contributed to the foundations of computer science: in 1948, at Bell Labs, he developed his mathematical theory of communication (also called information theory), the foundation of our understanding of digital transmission. Shannon’s switching theory depended on Boolean algebra, while both Boole and Shannon grounded part of their research in the probability calculus. Shannon started his career at the MIT Department of Electrical Engineering, working part-time on Vannevar Bush’s famous electromechanical differential analyzer. His MIT master’s thesis, “A Symbolic Analysis of Relay and Switching Circuits,” was published in 1938 in the Transactions of the American Institute of Electrical Engineers.

Shannon defined the entropy of a coding system as the average informational value of its answers. The measure of the information content of a message is the reduction of uncertainty achieved by receiving it. Just as the entropy of a message limits its minimum coding length, so too the complexity of the message determines its compressibility. Modern information theories are based on Turing machines and on the clarification of the concepts of probability and randomness. Kolmogorov, Solomonoff and Chaitin introduced the algorithmic theory of information, which measures information content by the length of the shortest program that computes it.
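
A minimal sketch of Shannon’s measure (the sample messages below are invented for illustration): the entropy of the symbol frequencies, H = Σ p(x) log₂(1/p(x)), bounds from below the average number of bits needed per symbol.

```python
import math
from collections import Counter

def shannon_entropy(message: str) -> float:
    """Average information per symbol, in bits: H = sum p(x) * log2(1/p(x))."""
    counts = Counter(message)
    n = len(message)
    return sum((c / n) * math.log2(n / c) for c in counts.values())

# A uniform alphabet maximises uncertainty; repetition reduces it,
# and with it the minimum coding length per symbol.
print(shannon_entropy("abcd"))   # 2.0 bits per symbol
print(shannon_entropy("aaab"))   # about 0.81 bits per symbol
print(shannon_entropy("aaaa"))   # 0.0 bits: no uncertainty, nothing to learn
```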

All and only effective procedures are computable functions[2]

Turing’s definition of computability is based on his argument that a particular logical system is not decidable.[3] Computationalists hold that minds may be paralleled with formal systems, collections of such systems, and computations. Mental operations are treated as computations and the mind is seen as a particular register machine. The demand for functional characterization is plausibly the main underlying commitment of computationalism. The mind may be viewed as equivalent to a formal system, an algorithm, or a register machine program, according to Carter (2007). These are, therefore, alternative ways of speaking about the mind, as the Church/Turing thesis suggests: all and only effective procedures are computable functions; hence, all and only effective procedures are algorithmic. The Turing computable functions are exactly the recursive functions of positive integers, which Church (1936) identified with the effectively calculable functions.
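
The register-machine picture can be made concrete with a toy interpreter. This is only a sketch under my own assumptions (a hypothetical two-instruction, Minsky-style format; the `run` helper and the addition program are invented for illustration), not the formalism of Carter or Church:

```python
def run(program, registers):
    """Execute a toy register-machine program.
    Instructions: ("inc", r, next)            -- add 1 to register r, jump to `next`
                  ("decjz", r, next, on_zero) -- if r > 0, subtract 1 and jump to `next`;
                                                 otherwise jump to `on_zero`.
    The machine halts when the program counter leaves the program."""
    pc = 0
    while 0 <= pc < len(program):
        op = program[pc]
        if op[0] == "inc":
            registers[op[1]] += 1
            pc = op[2]
        else:
            if registers[op[1]] > 0:
                registers[op[1]] -= 1
                pc = op[2]
            else:
                pc = op[3]
    return registers

# Addition as an effective procedure: move the contents of register 1 into register 0.
add = [
    ("decjz", 1, 1, 2),   # 0: if r1 > 0, take one unit from it; else halt (jump past the end)
    ("inc", 0, 0),        # 1: add that unit to r0 and loop
]
print(run(add, {0: 3, 1: 4}))   # {0: 7, 1: 0}
```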

Symbolic modeling, and computational modeling generally, is regarded as compatible with computationalism. Dynamicism, an alternative conception to computationalism, conjectures that cognition depends upon the state evolutions of a cognitive system. The computationalist theory is compatible only with the symbolic models of cognitive science, but is incompatible with many connectionist and non-connectionist continuous models. Cognition is, essentially, a matter of the particular state evolutions that a cognitive system undergoes in certain situations, as Giunti (1996; 2006) argues.

Dynamical systems are arbitrary deterministic systems that may be regarded as computational, under the requirement that they admit an effective, uniform recursive representation. Nervous activity, for instance, embodies the “all-or-none” informational content of the logical constants and is therefore representable by means of two-valued propositional logic (McCulloch and Pitts, 1943). The age-old observation that cutting the optic stalk leaves the eye blind is regarded by McCulloch (1953) as the beginning of neurophysiology and psychology as statistical and experimental sciences. “The eye is not only the most important of sense organs. It is the most complicated, being in reality an invaginated evagination of the brain itself”, as McCulloch (op. cit. p. 96) claimed. McCulloch is considered a trailblazer of the interdisciplinary approach to understanding the brain, through neuropsychiatry and cybernetics (Abraham, 2016).
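
A minimal sketch of such an “all-or-none” unit (the weights and thresholds below are the usual textbook choices, not McCulloch and Pitts’s own notation): a neuron fires, outputting 1, exactly when its weighted input reaches a threshold, which suffices to realize the two-valued logical constants.

```python
def neuron(inputs, weights, threshold):
    """An all-or-none threshold unit: fires (1) iff the weighted sum reaches the threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

AND = lambda x, y: neuron((x, y), (1, 1), 2)
OR  = lambda x, y: neuron((x, y), (1, 1), 1)
NOT = lambda x:    neuron((x,), (-1,), 0)

pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
assert [AND(x, y) for x, y in pairs] == [0, 0, 0, 1]
assert [OR(x, y) for x, y in pairs] == [0, 1, 1, 1]
assert [NOT(x) for x in (0, 1)] == [1, 0]
```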

Computation is an objective phenomenon and only some systems can be regarded as computational, as Fresco (2015) stresses. The subjective view of the computational phenomenon is interconnected with pancomputationalism, anti-realism, and observer-relativity of computation (Putnam, 1988; Searle, 2008).

The basic idea of the computer model of the mind is that the mind is the program and the brain the hardware of a computational system. The slogan one often sees is “the mind is to the brain as the program is to the hardware” (Searle, 2008: p. 86).

Searle does not consider the mind to be a computer program. The Chinese Room Argument imagines a person in a room equipped with a maximally efficient, purely syntactical program: a thought experiment aimed at strong artificial intelligence, or computer functionalism. The occupant can respond to any Chinese question without knowing the language; neither the person inside the Chinese room nor any other part of the system literally understands Chinese. The conclusion is that “instantiating a computer program is never by itself a sufficient condition for intentionality.” Explaining how the brain produces intentionality cannot be achieved merely by instantiating a computer program; it demands internal causal powers equivalent to those of brains (Searle, 1980: p. 417).

The Turing test is crucial for later formulations of the question “Can a machine think?” If a computer can perform in such a way that a professional cannot distinguish between the performance of the computer and that of a human who has a certain cognitive ability, then the computer has that ability as well. Such a computer program would not be merely a model of the mind; “it would literally be a mind” (Searle, 1990: p. 26).

Further on, there is a distinction between instructional (computationally nontrivial) information and trivial computation. Nonempty, meaningful and well-formed data comprise instructional information. However, Turing machines, quantum computers, relativistic computers, connectionist networks etc. do not conform to the same definition of computation. The perceptual system is more efficient and reliable than any computer in cases of unconscious inference, such as perceptual constancies and the underdetermination problem. The disagreement over computationalism becomes even more interesting when viewed from the perspective of decision making and reasoning under uncertainty, where Bayesian decision theory and the fundamental concept of subjective probability play an outstandingly significant role.

Bayesian models are based on the computation of four factors: prior probabilities, prior likelihoods, sensory input, and the utility function (Rescorla, 2015a). An important property of the probability functions is that the probability of any consequence b of a proposition a is at least as great as that of a itself:

P(a) ⩽ P(b)

Furthermore, Bayes’s Theorem states that the probability P(h|e) of the hypothesis conditional on the evidence (or the posterior probability of the hypothesis) is equal to the probability P(e|h) of the data conditional on the hypothesis (or the likelihood of the hypothesis) times the probability P(h) (the so-called prior probability) of the hypothesis, all divided by the probability P(e) of the data.
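
A minimal sketch of the theorem as just stated, with P(e) expanded by the law of total probability; the numbers are invented purely for illustration:

```python
def bayes_posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """P(h|e) = P(e|h) * P(h) / P(e), with P(e) = P(e|h)P(h) + P(e|~h)P(~h)."""
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)
    return p_e_given_h * prior_h / p_e

# A hypothesis with prior 0.1 whose predicted evidence is observed:
# the posterior rises from 0.1 to about 0.33.
print(bayes_posterior(prior_h=0.1, p_e_given_h=0.9, p_e_given_not_h=0.2))
```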

If the posterior probability P(h|e) is greater than the prior probability P(h), the evidence confirms or supports the hypothesis. If the posterior probability is smaller, the evidence disconfirms the hypothesis. A hypothesis is therefore maximally disconfirmed when it is refuted by the absence of its predicted evidence (P(h|~e)=0). A theory can also be confirmed by the verification of its consequences, as follows from Bayes’s theorem.

The observational counterparts of the theoretical notion of “probability distribution” are the relative frequencies (Howson and Urbach, 2005). Moreover, the so-called Gap Argument refers to the bridging of the gap between linguistic and non-linguistic entities, such as natural numbers, real numbers and other domains, by citing representational relations between linguistic and non-linguistic items (Rescorla, 2015b).

“What is a symbol, that intelligence may use it, and intelligence, that it may use a symbol?”[4]

The physical symbol system hypothesis of Allen Newell and Herbert A. Simon (1976) holds that a physical symbol system has the necessary and sufficient means for general intelligent action. Human thinking, therefore, is considered a kind of symbol storage and manipulation, since “symbols lie at the root of intelligent action”. Newell and Simon developed heuristic programming, heuristic search, means-ends analysis and methods of induction in artificial intelligence. They are the “inventors of list processing, and have been major contributors to both software technology and the development of the concept of the computer as a system of manipulating symbolic structures” (Bernard A. Galler, Chairman of the Turing Award Committee, 1975).
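
The flavor of heuristic search and means-ends analysis can be conveyed with a toy sketch (not Newell and Simon’s programs: the greedy best-first strategy, the numeric task and the operator names below are my own illustrative assumptions). The search always expands the state whose “difference” from the goal is smallest:

```python
import heapq

def heuristic_search(start, goal, operators, difference, limit=10_000):
    """Greedy best-first search: expand the state with the smallest difference from the goal."""
    frontier = [(difference(start, goal), start, [])]
    seen = {start}
    while frontier and limit > 0:
        limit -= 1
        _, state, path = heapq.heappop(frontier)
        if state == goal:
            return path
        for name, apply_op in operators:
            nxt = apply_op(state)
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (difference(nxt, goal), nxt, path + [name]))
    return None  # no solution found within the limit

# Toy symbol-manipulation task: reach 65 from 2 using the operators "+3" and "*2",
# guided by the numeric distance to the goal.
ops = [("+3", lambda n: n + 3), ("*2", lambda n: n * 2)]
print(heuristic_search(2, 65, ops, difference=lambda s, g: abs(g - s)))
```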

Whereas the symbolic approaches of Good Old Fashioned Artificial Intelligence (GOFAI) produce logical conclusions, allow for human intervention and handle small data better, the sub-symbolic methods produce associative results, give priority to the data and can handle huge datasets. The symbolic approaches have the advantage of explainability, providing AI with rules, decision trees and planning schemes. By contrast, neural networks (Hebb, 1949), machine intelligence and statistical learning methods, such as Bayesian learning, deep learning, backpropagation, and genetic algorithms, are the basis of the non-symbolic, connectionist movement.[5] Intentionality is one of the most important instances of the limitations of the symbolic approaches, on account of the epistemological and ethical difficulties of mirroring human behavior through AI.
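
The explainability contrast can be made concrete with a hedged sketch (it assumes scikit-learn is available; the four-example dataset is invented): a symbolic learner such as a decision tree returns human-readable if/then rules, whereas a trained neural network would expose only numeric weights.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[0, 0], [0, 1], [1, 0], [1, 1]]   # two Boolean features
y = [0, 0, 0, 1]                       # target concept: both features must hold

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print(export_text(tree, feature_names=["x1", "x2"]))   # explicit, inspectable rules
```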

The five tribes of machine learning offer relevant comparisons. While symbolists take learning to be the inverse of deduction, inspired by philosophy, logic and psychology, connectionists try to reverse-engineer the brain, influenced by physics and neuroscience. Evolutionaries simulate evolution on the computer, following genetic and biological models, analogizers generalize from similarity judgments, drawing on mathematical optimization, and Bayesians suggest that learning is a probabilistic activity. The master algorithm of the symbolists is inverse deduction, the connectionists’ is backpropagation, the evolutionaries’ is genetic programming, the Bayesians’ is Bayesian inference and the analogizers’ is the support vector machine. The hybridization or cohabitation of symbolic and sub-symbolic methods has also produced many in-between approaches, which are trying to find the ultimate master algorithm.

The problems of induction and explanation

Induction is a type of learning. The problem of induction refers to the demand for high accuracy in the classification of previously unseen examples of general concepts. Induction of decision trees, rule induction, instance-based learning, Bayesian classification, neural networks, genetic algorithms and statistical methods are among the practical applications of induction. Cross-validation and the combination of automated inductive methods (multi-strategy learning) can alleviate the knowledge-acquisition bottleneck of knowledge bases.
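
Cross-validation, mentioned above, can be sketched in a few lines (assuming scikit-learn; the Iris dataset is used only as a stand-in for “previously unseen examples”): the model is induced from part of the data and its accuracy is measured on the held-out remainder.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Five-fold cross-validation: each fold is classified by a tree induced from the other four.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
print(scores.mean())   # average accuracy on previously unseen examples
```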

More specifically, rule induction and instance-based learning can be unified in a way that alleviates their respective weaknesses, as Domingos (1997) showed in his study of supervised concept learning. Unsupervised concept learning, on the other hand, aims at concept formation or clustering. Furthermore, regression and probability estimation are applications of induction to an infinite number of classes.

Algorithms are devices and theories which secure rigor in science. The search for the master algorithm is a feasible task, since every algorithm, no matter how complex, can be reduced to the three operations AND, OR, and NOT, as Domingos (2015) points out.
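
As a small illustration of that reduction (my own example, not Domingos’s), a one-bit full adder, the building block of binary arithmetic, can be written with nothing but AND, OR and NOT:

```python
AND = lambda a, b: a & b
OR  = lambda a, b: a | b
NOT = lambda a: 1 - a

# XOR derived from the three primitive operations.
XOR = lambda a, b: OR(AND(a, NOT(b)), AND(NOT(a), b))

def full_adder(a, b, carry_in):
    """Add three bits; return (sum_bit, carry_out)."""
    sum_bit = XOR(XOR(a, b), carry_in)
    carry_out = OR(AND(a, b), AND(carry_in, XOR(a, b)))
    return sum_bit, carry_out

print(full_adder(1, 1, 1))   # (1, 1): binary 11, i.e. decimal 3
```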

Machine learning is practically the inverse of programming; it is its empirical counterpart. The greater part of our knowledge is statistical. Machine learning comprises pattern recognition, statistical modeling, data mining, knowledge discovery, predictive analytics, data science, adaptive systems, self-organizing systems, and so on. Learning algorithms are the matchmakers of the internet market, offering ways of handling the flood of data, limitless choice, low cost, and so on.

Finally, the exponential growth of data is a precondition of scientific progress, as in Newtonian physics and in contemporary molecular biology, digital sky surveys, particle accelerators and artificial intelligence. Causal explanations are vital for bottom-up scientific research, as Kitcher (1985) showed, distinguishing it from the top-down approach that pertains to prediction. Machine learning can, for example, predict the effectiveness of new drugs. Prediction and explanation, based on the distinction between antecedents and consequences (causes and effects), should be treated as standard tasks of epistemology.

 

Bibliography

Tara H. Abraham (2016). “Modelling the mind: the case of Warren S. McCulloch.” Canadian Medical Association Journal 188(13): pp. 974-975.

George Boole (1848). “The Calculus of Logic.” Cambridge and Dublin Mathematical Journal, Vol. III, pp. 183-98.

—————— (1854). An Investigation of the Laws of Thought, on which Are Founded the Mathematical Theories of Logic and Probabilities. London: Walton and Maberley.

—————— (2009; 1847). The Mathematical Analysis of Logic: Being an Essay Towards a Calculus of Deductive Reasoning. Cambridge: Cambridge University Press.

Alonzo Church (1936). “An Unsolvable Problem of Elementary Number Theory.” American Journal of Mathematics, 58: 345-63.

Donald Hebb (1949). The organisation of behaviour. New York: Wiley & Sons.

Colin Howson & Peter Urbach (2005). Scientific Reasoning: The Bayesian Approach (3rd ed.). Chicago: Open Court.

Eleni Ilkou & Maria Koutraki (2020). “Symbolic Vs Sub-symbolic AI Methods: Friends or Enemies?” Proceedings of the CIKM 2020 Workshops, October 19-20, Galway, Ireland.

Warren S. McCulloch (1953). “Information in the Head.” In: McMillan et al., Current Trends in Information Theory. Pittsburgh: University of Pittsburgh Press.

—————— (1961). “What is a number, that a man may know it, and a man, that he may know a number.” General Semantics Bulletin, 26 and 27: 7-18.

W. S. McCulloch & W. Pitts, (1943). “A logical calculus of the ideas immanent in nervous activity.” Bulletin of Mathematical Biophysics, 5, 115–137.

Matt Carter (2007). Minds and Computers: An Introduction to the Philosophy of Artificial Intelligence. Edinburgh: Edinburgh University Press.

Pedro Domingos (1997). A Unified Approach to Concept Learning. Ph.D. Dissertation, Department of Information and Computer Science, University of California, Irvine.

—————— (2015). The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World. New York: Basic Books.

Nir Fresco (2015). “Objective Computation Versus Subjective Computation.” Erkenntnis 80(5): 1031-1053.

Marco Giunti (1996). “Beyond computationalism.” In: Garrison W. Cottrell; Cognitive Science Society (U.S.), Proceedings of the Eighteenth Annual Conference of the Cognitive Science Society: July 12-15, 1996, University of California, San Diego. Mahwah, NJ: Lawrence Erlbaum Associates, pp. 71-75.

——————  (2006). “Is being computational an intrinsic property of a dynamical system?” In: Gianfranco Minati, Eliano Pessa & Mario Abram (Eds.), Systemics of Emergence: Research and Development. New York: Springer, pp. 683-694.

P. Kitcher (1985). “Two approaches to explanation.” The Journal of Philosophy, 82(11): 632-39.

Allen Newell & Herbert A. Simon (1962). “Computer Simulation of Human Thinking and Problem Solving.” Monographs of the Society for Research in Child Development, 27(2), Thought in the Young Child: Report of a Conference on Intellective Development with Particular Attention to the Work of Jean Piaget, pp. 137-150.

—————— (1964). “An example of human chess play in the light of chess playing programs.” Carnegie Institute of Technology.

—————— (1976). “Computer Science as Empirical Inquiry: Symbols and Search”, The ACM Turing Award Lecture. Communications of the Association for Computing Machinery, 19(3): 113-126.

Herbert Ohlman (1990). “Information: Timekeeping, Computing, Telecommunications and Audiovisual Technologies.” In: Ian McNeil, An Encyclopaedia of the History of Technology (London, New York: Routledge): p. 703.

H. Putnam (1988). Representation and reality. Cambridge: The MIT Press.

Michael Rescorla (2014a). “A Theory of Computational Implementation.” Synthese 191: pp. 1277-1307.

——————  (2014b). “Computational Modeling of the Mind: What Role for Mental Representation?” Wiley Interdisciplinary Reviews: Cognitive Science 6 (2014): pp. 65-73.

—————— (2015a). “Bayesian Perceptual Psychology.” In: The Oxford Handbook of the Philosophy of Perception (edited by Mohan Matthen). Oxford University Press: pp. 694-716.

——————  (2015b). “The Representational Foundations of Computation.” Philosophia Mathematica 23: pp. 338-366.

—————— (2016). “Bayesian Sensorimotor Psychology.” Mind and Language 31: pp. 3-36.

D. E. Rumelhart & J. L. McClelland (Eds.), (1986). Parallel distributed processing. 2 vols. Cambridge, MA: The MIT Press.

John R. Searle (1980). “Minds, brains, and programs.” The Behavioral and Brain Sciences 3: pp. 417-424.

——————  (1990). “Is the Brain’s Mind a Computer Program? No. A program merely manipulates symbols, whereas a brain attaches meaning to them.” Scientific American 262(1): pp. 26-31.

——————  (1992). The Rediscovery of the Mind. Cambridge, MA and London: MIT Press.

——————  (2008). “Is the brain a digital computer?” In: Philosophy in a New Century: Selected Essays, 5, pp. 86-106. Cambridge: Cambridge University Press.

C. E. Shannon (1940). A Symbolic Analysis of Relay and Switching Circuits. Master Thesis, Massachusetts Institute of Technology.

——————  (1948). “A Mathematical Theory of Communication.” Bell System Technical Journal 27: pp. 379-423.

——————  (1949). “Communication Theory of Secrecy Systems.” Bell System Technical Journal 28: pp. 656-715.

——————  (1993). Collected Papers (edited by N. Sloane and A. Wyner). New York: IEEE Press.

C. E. Shannon & D. W. Hagelbarger (1956). “Concavity of Resistance Functions.” Journal of Applied Physics 27(1): pp. 42–43.

C. E. Shannon & W. Weaver (1949). The Mathematical Theory of Communication. Urbana and Chicago: University of Illinois Press.

Herbert A. Simon (1979). “Rational Decision Making in Business Organizations.” The American Economic Review, 69(4): 493-513.

Alan M. Turing (1936). “On Computable Numbers, with an Application to the Entscheidungsproblem.” Proceedings of the London Mathematical Society 2(42): pp. 230–65; 2(43): pp. 544–46.

 

 


[1] George Boole (1848). “The Calculus of Logic.” Cambridge and Dublin Mathematical Journal, Vol. III, p. 183.

[2] The Church/Turing thesis (Carter, 2007, p. 87).

[3] Alan M. Turing (1937-38). “On Computable Numbers, with an Application to the Entscheidungsproblem.” Proceedings of the London Mathematical Society 2(42): 230–65.

[4] Allen Newell and Herbert A. Simon (1976), p. 115.

[5] D. E. Rumelhart & J. L. McClelland (Eds.), (1986). Parallel distributed processing. 2 vols. Cambridge, MA: The MIT Press.
