{"id":544,"date":"2018-10-30T11:17:08","date_gmt":"2018-10-30T10:17:08","guid":{"rendered":"https:\/\/blog.sido-lyon.com\/?p=544"},"modified":"2018-10-30T11:17:08","modified_gmt":"2018-10-30T10:17:08","slug":"needs-based-architecture-for-intelligent-agents","status":"publish","type":"post","link":"https:\/\/blog.sido-lyon.com\/en\/2018\/10\/30\/needs-based-architecture-for-intelligent-agents\/","title":{"rendered":"Needs-Based Architecture for Intelligent Agents"},"content":{"rendered":"<p>Original article from <a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A\">Frontiersin.org<\/a> The past few years have seen considerable progress in the deployment of voice-enabled personal assistants, first on smartphones (such as Apple\u2019s\u00a0Siri) and most recently as standalone devices in people\u2019s homes (such as Amazon\u2019s\u00a0Alexa). Such \u2018intelligent\u2019 communicative agents are distinguished from the previous generation of speech-based systems in that they claim to offer access to services and information via\u00a0conversational\u00a0interaction (rather than simple voice commands). In reality, conversations with such agents have limited depth and, after initial enthusiasm, users typically revert to more traditional ways of getting things done. It is argued here that one source of the problem is that the standard architecture for a contemporary spoken language interface fails to capture the fundamental\u00a0teleological\u00a0properties of human spoken language. As a consequence, users have difficulty engaging with such systems, primarily due to a gross mismatch in\u00a0intentional\u00a0priors. 
This paper presents an alternative needs-driven cognitive architecture which models speech-based interaction as an emergent property of coupled hierarchical feedback-control processes in which a speaker has in mind the\u00a0needs\u00a0of a listener and a listener has in mind the\u00a0intentions\u00a0of a speaker. The implications of this architecture for future spoken language systems are illustrated using results from a new type of \u2018intentional speech synthesiser\u2019 that is capable of optimising its pronunciation in unpredictable acoustic environments as a function of its perceived communicative success. It is concluded that such purposeful behavior is essential to the facilitation of meaningful and productive spoken language interaction between human beings and autonomous social agents (such as robots). However, it is also noted that persistent mismatched priors may ultimately impose a fundamental limit on the effectiveness of speech-based human\u2013robot interaction. <\/p>\n<h3>1. Introduction<\/h3>\n<p> Recent years have seen tremendous progress in the deployment of practical spoken language systems (see Figure\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#F1\">1<\/a>). 
Commencing in the 1980s with the appearance of specialised isolated-word recognition (IWR) systems for military command-and-control equipment, spoken language technology has evolved from large-vocabulary continuous speech recognition (LVCSR) for dictating documents (such as Dragon\u2019s\u00a0Naturally Speaking\u00a0and IBM\u2019s\u00a0Via Voice) released in the late 1990s, through telephone-based interactive voice response (IVR) systems, to the surprise launch in 2011 of\u00a0Siri\u00a0(Apple\u2019s voice-enabled personal assistant for the iPhone).\u00a0Siri\u00a0was quickly followed by Google\u00a0Now\u00a0and Microsoft\u2019s\u00a0Cortana, and these contemporary systems not only represent the successful culmination of over 50 years of laboratory-based speech technology research (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B37\">Pieraccini, 2012<\/a>) but also signify that speech technology has finally become \u201cmainstream\u201d (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B13\">Huang, 2002<\/a>) and has entered into general public awareness. <a href=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_m\/frobt-04-00066-g001.jpg\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_t\/frobt-04-00066-g001.gif\" alt=\"www.frontiersin.org\" \/><\/a> <strong>Figure 1<\/strong>. The evolution of spoken language technology applications from specialised military \u2018command-and-control\u2019 systems of the 1980\/90s to contemporary \u2018voice-enabled personal assistants\u2019 (such as\u00a0Siri) and future \u2018autonomous social agents\u2019 (such as robots). 
Research is now focused on verbal interaction with embodied conversational agents (such as on-screen avatars) or physical devices (such as Amazon\u00a0Echo, Google\u00a0Home, and, most recently, Apple\u00a0HomePod) based on the assumption that spoken language will provide a \u2018natural\u2019 interface between human beings and future (so-called)\u00a0intelligent\u00a0systems. As Figure\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#F1\">1<\/a>\u00a0shows, the ultimate goal is seen as\u00a0conversational\u00a0interaction between users and autonomous social agents (such as robots), and first-generation devices (such as\u00a0Jibo<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#note1\">1<\/a>\u00a0and\u00a0Olly<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#note2\">2<\/a>) are now beginning to enter the commercial marketplace. <\/p>\n<h3>1.1. 
Limitations of Current Systems<\/h3>\n<p> However, while the raw technical performance of contemporary spoken language systems has improved significantly in recent years [as evidenced by corporate giants such as Microsoft and IBM continuing to issue claim and counter-claim as to whose system has the lowest word error rates (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B56\">Xiong et al., 2016<\/a>;\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B41\">Saon et al., 2017<\/a>)], in reality, users\u2019 experiences with such systems are often less than satisfactory. Not only can real-world conditions (such as noisy environments, strong accents, older\/younger users or non-native speakers) lead to very poor speech recognition accuracy, but the \u2018understanding\u2019 exhibited by contemporary systems is rather shallow. As a result, after initial enthusiasm, users often lose interest in talking to\u00a0Siri\u00a0or\u00a0Alexa, and they revert to more traditional interface technologies for completing their tasks (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B28\">Moore et al., 2016<\/a>). One possible explanation for this state of affairs is that, while component technologies such as automatic speech recognition and text-to-speech synthesis are subject to continuous ongoing improvements, the overall architecture of a spoken language system has not changed for quite some time. 
Indeed, there is a W3C \u2018standard\u2019 architecture to which most systems conform (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B53\">W3C-SIF, 2000<\/a>) (see Figure\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#F2\">2<\/a>). Of course, standardisation is helpful because it promotes interoperability and expands markets. However, it can also stifle innovation by prescribing sub-optimal solutions. <a href=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_m\/frobt-04-00066-g002.jpg\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_t\/frobt-04-00066-g002.gif\" alt=\"www.frontiersin.org\" \/><\/a><strong>Figure 2<\/strong>. Structure of the W3C \u2018standard\u2019\u00a0Speech Interface Framework\u00a0(<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B53\">W3C-SIF, 2000<\/a>). In the context of spoken language, there are a number of issues with the standard architecture depicted in Figure\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#F2\">2<\/a>. 1. The standard architecture reflects a traditional open-loop stimulus\u2013response (\u2018behaviorist\u2019) view of interaction; the user utters a request, the system replies. 
This is known as the \u2018tennis match\u2019 metaphor for language, where discrete messages are passed back and forth between interlocutors\u2014a stance that is nowadays regarded as somewhat restrictive and old-fashioned (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B3\">Bickhard, 2007<\/a>;\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B7\">Fusaroli et al., 2014<\/a>). Contemporary \u2018enactive\u2019 perspectives regard spoken language interaction as being analogous to the continuous coordinated synchronous behavior exhibited by coupled dynamical systems: that is, more like a three-legged race than a tennis match (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B5\">Cummins, 2011<\/a>). 2. The standard architecture suggests complete independence between the input and output components, whereas there is growing evidence of the importance of \u2018sensorimotor overlap\u2019 between perception and production in living systems (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B54\">Wilson and Knoblich, 2005<\/a>;\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B43\">Sebanz et al., 2006<\/a>;\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B35\">Pickering and Garrod, 2007<\/a>). 3. 
The standard architecture fails to emphasise the importance of \u2018user modeling\u2019 in managing an interactive dialog: that is, successful interaction is not only conditioned on knowledge about users\u2019 directly observable characteristics and habits but it also depends on inferring their internal beliefs, desires, and\u00a0intentions\u00a0(<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B6\">Friston and Frith, 2015<\/a>;\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B42\">Scott-Phillips, 2015<\/a>). 4. The standard architecture neglects the crucial teleological\/compensatory nature of behavior in living systems (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B38\">Powers, 1973<\/a>). In particular, it fails to acknowledge that speakers and listeners continuously balance the effectiveness of communication against the\u00a0effort\u00a0required to communicate effectively (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B19\">Lombard, 1911<\/a>)\u2014behavior that leads to a \u2018contrastive\u2019 (as opposed to signal-based) form of communication (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B18\">Lindblom, 1990<\/a>). 
As an example of the latter,\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B11\">Hawkins (2003)<\/a>\u00a0provides an informative illustration of such\u00a0regulatory\u00a0behavior in everyday conversational interaction. On hearing a verbal enquiry from a family member as to the whereabouts of some mislaid object, the listener might reply with any of the following utterances: \u201cI! \u2026 DO! \u2026 NOT! \u2026 KNOW!\u201d \u201cI do not know\u201d \u201cI don\u2019t know\u201d \u201cI dunno\u201d \u201cdunno\u201d [<img decoding=\"async\" src=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_n\/frobt-04-00066-i001.gif\" alt=\"yes\" \/>] \u2026 where the last utterance is barely more than a series of nasal grunts! Which utterance is spoken would depend on the communicative context; the first might be necessary if the TV was playing loudly, whereas the last would be normal behavior for familiar interlocutors in a quiet environment. Such responses would be both inappropriate and ineffective if the situations were reversed; shouting in a quiet environment is unnecessary (and would be regarded as socially unacceptable), and a soft grunt in a noisy environment would not be heard (and might be regarded as an indication of laziness). Such\u00a0adaptive\u00a0behavior is the basis of Lindblom\u2019s \u2018H&amp;H\u2019 (Hypo-and-Hyper) theory of speech production (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B18\">Lindblom, 1990<\/a>), and it provides a key motivation for what follows. <\/p>\n<h3>1.2. A Potential Solution<\/h3>\n<p> Many of the limitations identified above are linked, and closing the loops between speaking-and-listening and speaker-and-listener appears to be key. 
Therefore, what seems to be required going forward is an architecture for spoken language interaction that replaces the traditional open-loop stimulus\u2013response arrangement with a\u00a0closed-loop\u00a0dynamical framework; a framework in which intentions lead to actions, actions lead to consequences, and perceived consequences are compared to intentions (in a continuous cycle of synchronous\u00a0regulatory\u00a0behavior). This paper presents such a framework; speech-based interaction is modeled as an emergent property of coupled hierarchical feedback-control processes in which a speaker has in mind the\u00a0needs\u00a0of a listener and a listener has in mind the\u00a0intentions\u00a0of a speaker, and in which information is shared across sensorimotor channels. Section 2 introduces the theoretical basis for the proposed new architecture, and Section 3 presents a practical instantiation in the form of a new type of\u00a0intentional\u00a0speech synthesiser which is capable of adapting its pronunciation in unpredictable acoustic environments. Section 4 then discusses the wider implications of the new architecture in the context of human\u2013machine interaction, and Section 5 draws conclusions on the potential effectiveness of future spoken language systems. <\/p>\n<h3>2. An Architecture for Intentional Communicative Interaction<\/h3>\n<p> Motivated by the arguments outlined above, an architecture for intentional communicative interaction was originally proposed by\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B24\">Moore (2007b)<\/a>. 
Known variously as \u2018PRESENCE\u2019 (PREdictive SENsorimotor Control and Emulation) (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B23\">Moore, 2007a<\/a>) and \u2018MBDIAC\u2019 (Mutual Beliefs, Desires, Intentions, and Consequences) (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B25\">Moore, 2014<\/a>), the core principle is the notion of closed-loop hierarchical feedback-control. As a result, it has many parallels with \u2018Perceptual Control Theory\u2019 (PCT) (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B22\">Moore, 2018<\/a>;\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B39\">Powers et al., 1960<\/a>;\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B38\">Powers, 1973<\/a>;\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B20\">Mansell and Carey, 2015<\/a>). The core principles of the architecture are reprised here in order to contextualise the design of the\u00a0intentional\u00a0speech synthesiser presented in Section 3. <\/p>\n<h3>2.1. Actions and Consequences<\/h3>\n<p> First, consider a \u2018world\u2019 that obeys the ordinary Laws of Physics. 
The world\u00a0W\u00a0has a set of possible states\u00a0S, and its state\u00a0s[t] at time\u00a0t\u00a0is some function of its previous states from\u00a0s[\u2013\u221e] to\u00a0s[t\u20131]. The world can thus be viewed as a form of dynamical system that evolves from state to state over time. These state transitions can be expressed as a\u00a0transform\u00a0\u2026 fW:s[\u2212\u221e],\u2026,s[t\u22121]\u2192s[t],\u2003(1) where\u00a0fW\u00a0is some function that transforms the states of the world up to time\u00a0t\u20131 to the state of the world at time\u00a0t. This means that the evolution of events in the world constitutes a continuous cycle of \u2018cause-and-effect.\u2019 Events follow a time course in which it can be said that\u00a0actions\u00a0(i.e., the sequence of events in the past) lead to\u00a0consequences\u00a0(i.e., events in the future) which constitute further actions, leading to further consequences, and so on (see Figure\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#F3\">3<\/a>) \u2026 Consequences=fW(Actions).\u2003(2) <a href=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_m\/frobt-04-00066-g003.jpg\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_t\/frobt-04-00066-g003.gif\" alt=\"www.frontiersin.org\" \/><\/a><strong>Figure 3<\/strong>. Illustration of the continuous cycle of cause-and-effect in a world that obeys the ordinary Laws of Physics. \u00a0Of course, the state-space\u00a0S\u00a0of possible actions and consequences would be immense due to the complexity of the world\u00a0W. This makes it impossible to model the world in its entirety. 
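The cause-and-effect cycle of equations (1) and (2) can be sketched in a few lines of Python. This is an illustrative toy (the particular transform below is my own assumption, not from the article): a world whose next state is a function of its state history, so that past events (actions) generate future events (consequences), which in turn become further actions.

```python
# A toy world transform fW per equation (1): the next state is computed
# from the history of previous states. The specific coefficients here are
# purely illustrative assumptions.
def f_w(history):
    """Next state as a damped combination of the two most recent states."""
    previous = history[-2] if len(history) > 1 else 0.0
    return 0.5 * history[-1] + 0.25 * previous

def evolve(initial_state, steps):
    """Iterate the cause-and-effect cycle of equation (2): each new state
    (consequence) joins the history that causes all subsequent states."""
    history = [initial_state]
    for _ in range(steps):
        history.append(f_w(history))
    return history

states = evolve(1.0, 5)
```

Any deterministic function of the history would serve equally well; the point is only that the world evolves as a dynamical system driven by its own past.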
In practice, some parts of the world might have very little influence on other parts. So it is appropriate to consider a subset of the world\u00a0w\u00a0that has a minimal dependency on the rest. <\/p>\n<h3>2.2. An Agent Manipulating the World<\/h3>\n<p> Now consider the presence of an intentional agent\u00a0a\u00a0(natural or artificial) that seeks to effect a change in the world (the reason\u00a0why\u00a0the agent wishes to change the state of the world is addressed in Section 2.7). In this case, the agent\u2019s\u00a0intentions\u00a0are converted into actions which are, in turn, transformed into consequences \u2026 Consequences=fw(ga(Intentions)),\u2003(3) \u00a0where\u00a0g\u00a0is some function that transforms agent\u00a0a\u2019s intentions into actions (a process known in robotics as \u2018action selection\u2019). This situation corresponds to an open-loop\u00a0stimulus\u2013response\u00a0configuration, hence the accuracy with which an agent can achieve its intended consequences is critically dependent on it having precise information about both\u00a0f\u00a0and\u00a0g. In this situation, the best method for achieving the required consequences is for the agent to employ an\u00a0inverse transform\u00a0in which\u00a0g\u00a0is replaced by\u00a0f\u22121\u00a0(commonly referred to as \u2018inverse kinematics\u2019). It is possible to discuss at length how information about the transforms\u00a0g,\u00a0f, or\u00a0f\u22121\u00a0could be acquired; for example, using machine learning techniques on extensive quantities of training data. However, regardless of the approach taken, the final outcome would not only be sensitive to any inaccuracies in calibrating the relevant model parameters, but it would also be unable to tolerate unforeseen noise and\/or disturbances present in the agent or in the world. This is a fundamental limitation on any \u2018open-loop\u2019 approach. 
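The open-loop limitation can be made concrete with a minimal sketch (the linear world transform and the numbers below are illustrative assumptions, not from the article): the agent maps its intention straight to an action through an assumed inverse model, so any miscalibration of that model, or any disturbance in the world, produces an error the agent can never correct.

```python
# Open-loop control: intention -> inverse model -> action -> world.
# There is no feedback path, so errors go uncorrected.

def f_world(action, disturbance=0.0):
    """The true world transform: consequence = 2 * action + disturbance."""
    return 2.0 * action + disturbance

def f_inverse_estimate(intention, gain_estimate=2.0):
    """The agent's assumed inverse transform f^-1; it is only correct
    when gain_estimate matches the true world gain of 2.0."""
    return intention / gain_estimate

intention = 10.0
# Perfect model, no disturbance: the intention is achieved exactly.
exact = f_world(f_inverse_estimate(intention))
# Miscalibrated model (agent believes the gain is 2.5): persistent bias.
biased = f_world(f_inverse_estimate(intention, gain_estimate=2.5))
# Unforeseen disturbance: the open loop has no way to compensate.
disturbed = f_world(f_inverse_estimate(intention), disturbance=3.0)
```

Only the first case reaches the intended consequence of 10.0; the other two miss it, and no amount of extra training data removes the sensitivity to calibration error or disturbance.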
Control theory (and thus Perceptual Control Theory) provides an alternative\u00a0closed-loop\u00a0solution that is not dependent on knowing\u00a0f\u00a0or\u00a0f\u22121. An agent simply needs to be able to judge whether the consequences of its actions\u00a0match\u00a0its intentions (and adjust its behavior accordingly). An agent thus needs to be able to choose actions that minimise the difference between its intentions and the perceived consequences of its actions (a process known as \u2018negative feedback control\u2019) (see Figure\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#F4\">4<\/a>). In practice, it takes time to minimise the difference (since physical actions cannot take place instantaneously). So the process typically\u00a0iterates\u00a0toward a solution. This means that, although closed-loop control does not require information about\u00a0f\u00a0or\u00a0f\u22121, it does need to know about\u00a0g\u2014the mapping between the error (the difference between intentions and consequences in perceptual space) and the appropriate control action.<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#note3\">3<\/a>\u00a0This is either known in advance, or it has to be discovered (learnt) by active exploration; for example, using \u2018reinforcement learning\u2019 (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B45\">Sutton and Barto, 1998<\/a>) or the process referred to in Perceptual Control Theory as \u2018reorganisation\u2019 (<a 
href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B38\">Powers, 1973<\/a>). <a href=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_m\/frobt-04-00066-g004.jpg\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_t\/frobt-04-00066-g004.gif\" alt=\"www.frontiersin.org\" \/><\/a><strong>Figure 4<\/strong>. Illustration of an intentional agent manipulating the world in a\u00a0closed-loop\u00a0negative-feedback configuration. \u00a0In many situations, negative feedback-control is able to employ an optimisation technique known as\u00a0gradient descent\u00a0in which the difference between the intentions and the perceived consequences is a continuous variable that can be reduced monotonically to zero. Hence, in the general case, negative feedback-control can be viewed as an iterative\u00a0search\u00a0over possible actions to find those which give rise to the best match between intentions and perceived consequences \u2026 Actions\u02c6=arg\u2009minActions(Intentions\u2212PerceivedConsequences),\u2003(4) \u00a0where\u00a0Actions\u02c6\u00a0represents an estimate of the actions required to minimise the difference between intentions and perceived consequences. However, this configuration will only function correctly if two conditions are met: (i) the agent can observe the consequences of its actions, and (ii) the search space contains only one\u00a0global\u00a0minimum. If the consequences of an agent\u2019s actions are hidden (for example, the internal states of another agent), then the loop can still function, but only if the agent is able to estimate the consequences of\u00a0possible\u00a0actions. 
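The negative feedback loop of equation (4) can be sketched as follows. The agent never models the world transform f (or its inverse); it only perceives the consequence of its last action and nudges the action in proportion to the intention/consequence error via the mapping g. The particular world transform and gain below are illustrative assumptions.

```python
# Closed-loop negative feedback control per equation (4): iteratively
# minimise (Intentions - PerceivedConsequences) without knowing f or f^-1.

def f_true(action):
    """World transform, unknown to the agent (note the unmodelled offset)."""
    return 2.0 * action - 4.0

def closed_loop(intention, g=0.3, steps=50):
    """Perceive the consequence, compute the error, adjust the action.
    g is the mapping from perceptual error to control action."""
    action = 0.0
    for _ in range(steps):
        error = intention - f_true(action)  # intention vs. perceived consequence
        action += g * error                 # negative feedback correction
    return action

action = closed_loop(10.0)
consequence = f_true(action)
```

Despite the agent knowing nothing about the offset in f_true, the loop converges on an action whose consequence matches the intention, which is precisely the robustness to disturbance that the open-loop scheme lacks.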
Likewise, if the search space has many local minima, then an iterative search can avoid getting stuck by exploring the space\u00a0in advance.<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#note4\">4<\/a>\u00a0In other words, in both of these cases, an agent would benefit from an ability to\u00a0predict\u00a0the consequences of possible actions. This means that an intentional agent needs to be able to (i) estimate the relationship between available actions and potential consequences (fw), (ii) perform a search over hypothetical actions, and then (iii) execute those actions that are found to minimise the estimated error. In this case \u2026 Actions\u02c6=arg\u2009minActions\u02dc(Intentions\u2212fw\u02c6(Actions\u02dc)),\u2003(5) where\u00a0fw\u02c6\u00a0is the estimate of\u00a0fw\u00a0and\u00a0Actions\u02dc\u00a0is the set of available actions (see Figure\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#F5\">5<\/a>). <a href=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_m\/frobt-04-00066-g005.jpg\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_t\/frobt-04-00066-g005.gif\" alt=\"www.frontiersin.org\" \/><\/a><strong>Figure 5<\/strong>. Illustration of an intentional agent manipulating the world in a\u00a0closed-loop\u00a0configuration in the situation where the agent is unable to directly observe the consequences of its actions (in space or time). 
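The three steps above, culminating in equation (5), amount to a search over hypothetical actions through the agent's own forward model. A minimal sketch (the quadratic forward model and the candidate set are illustrative assumptions): when the real consequences cannot be observed, the agent executes the action whose predicted consequence best matches the intention.

```python
# Forward-model search per equation (5): arg min over available actions of
# the error between the intention and the *predicted* consequence fw^(a).

def fw_hat(action):
    """The agent's forward model: its estimate of the world transform fw."""
    return action ** 2

def select_action(intention, available_actions):
    """Search the candidate actions for the best predicted match."""
    return min(available_actions,
               key=lambda a: abs(intention - fw_hat(a)))

candidates = [0.0, 1.0, 2.0, 3.0, 4.0]
best = select_action(intention=9.5, available_actions=candidates)
```

The exhaustive `min` over a small candidate set stands in for whatever search the agent actually performs; a deeper or wider search corresponds to greater effort, as noted below.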
\u00a0What is interesting in this arrangement is that the estimated transform\u00a0fw\u02c6\u00a0can be interpreted as a form of\u00a0mental simulation\u00a0(or predictor) that emulates the consequences of possible actions prior to action selection (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B12\">Hesslow, 2002<\/a>;\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B9\">Grush, 2004<\/a>). In other words, searching over\u00a0fw\u02c6(Actions\u02dc)\u00a0is equivalent to\u00a0planning\u00a0in the field of Artificial Intelligence and to \u2018imagination mode\u2019 in Perceptual Control Theory (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B38\">Powers, 1973<\/a>). Another insight to emerge from this approach is that the depth of the search can be regarded as analogous to\u00a0effort, i.e., the amount of energy devoted to finding a solution. <\/p>\n<h3>2.3. An Agent Interpreting the World<\/h3>\n<p> Now, consider the complementary situation in which an agent\u00a0a\u00a0is attempting to\u00a0interpret\u00a0the world\u00a0w. In this case, interpretation is defined as an agent deriving potentially hidden actions\/causes of events by observing their visible effects\/consequences \u2026 Actions\u02c6=ha(fw(Actions)),\u2003(6) \u00a0where\u00a0h\u00a0is some perceptual function that transforms observed effects (i.e., the evolution of states resulting from\u00a0Actions\u00a0in the world\u00a0w) into estimated causes. 
Given that consequences are caused by actions via the transform\u00a0fw, it is possible, in principle, to compute the actions directly from the observed consequences using the inverse transform\u00a0fw\u22121. However, in practice,\u00a0fw\u22121\u00a0is not known and very hard to estimate. A more tractable solution is to construct an estimate of\u00a0fw\u00a0(known as a \u2018forward\/generative model\u2019) and to compare its output with the observed signals. Such a configuration (based on a generative model) is known as a \u2018maximum likelihood\u2019 or \u2018Bayesian\u2019 classifier, and mathematically it is the optimum way to estimate hidden variables given uncertainty in both the observations and the underlying process. It is also a standard result in the field of statistical estimation that the parameters of forward\/generative models are much easier to derive using maximum likelihood (ML) or maximum\u00a0a posteriori\u00a0(MAP) estimation techniques. The agent thus interprets the world by searching over possible actions\/causes to find the best match between the predicted and the observed consequences \u2026 Actions\u02c6=arg\u2009minActions(Consequences\u2212fw\u02c6(Actions)).\u2003(7) This process is illustrated in Figure\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#F6\">6<\/a>, and what is immediately apparent is that, like manipulation, the process of interpretation is also construed as a negative feedback-control loop; in this case, it is a search over possible causes (rather than effects). 
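Equation (7) can be sketched with the same search machinery turned around (the linear generative model, the noise, and the candidate causes below are illustrative assumptions): the observer recovers a hidden action by running candidate causes through its generative model and keeping the one whose predicted consequence best matches what was actually observed.

```python
# Analysis-by-synthesis per equation (7): arg min over candidate causes of
# the mismatch between the observed consequence and the generative model's
# prediction fw^(cause).

def fw_hat(action):
    """Forward/generative model: predicted consequence of an action."""
    return 3.0 * action + 1.0

def interpret(observed_consequence, candidate_actions):
    """Search over possible hidden causes for the best generative match."""
    return min(candidate_actions,
               key=lambda a: abs(observed_consequence - fw_hat(a)))

# Suppose the true (hidden) action was 2.0, so the world produced 7.0,
# observed here with a little noise.
inferred = interpret(observed_consequence=7.2,
                     candidate_actions=[0.0, 1.0, 2.0, 3.0])
```

Note the symmetry with the manipulation sketch for equation (5): the same forward model and the same arg-min search, applied to causes rather than effects.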
In fact, the architecture illustrated in Figure\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#F6\">6<\/a> is a standard model-based recognition framework in which the recognition\/interpretation\/inference of the (hidden) cause of observed behavior is viewed as a search over possible outputs from a forward model that is capable of generating that behavior (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B54\">Wilson and Knoblich, 2005<\/a>;\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B36\">Pickering and Garrod, 2013<\/a>): an approach known more generally as analysis-by-synthesis. Again, the depth of the search is analogous to effort. <a href=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_m\/frobt-04-00066-g006.jpg\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_t\/frobt-04-00066-g006.gif\" alt=\"www.frontiersin.org\" \/><\/a><strong>Figure 6<\/strong>. Illustration of an agent a attempting to infer the hidden causes\/actions from their observable effects\/consequences by using a negative feedback-control loop to search the outputs from fw^ (a forward estimate of fw). <\/p>\n<h3>2.4. One Agent Communicating Its Intentions to Another Agent<\/h3>\n<p> The foregoing establishes a remarkably symmetric framework for agents manipulating and interpreting the world in the presence of uncertainty and unknown disturbances.
The processes of both manipulation and interpretation employ negative feedback-control loops that perform a search over the potential outputs of a forward model. We now consider the case where the world contains more than one agent: a world in which a sending agent s is attempting to change the mental state of a receiving agent r (that is, communicating its intentions without being able to directly observe whether those intentions have been perceived). For the sending agent s \u2026

Actions_s = gs(Intentions_s),   (8)

where gs is the transform from intentions to behavior, and for the receiving agent r \u2026

Interpretations_r = hr(Actions_s),   (9)

where hr is the transform from observed behavior to interpretations. Hence, for agent s attempting to communicate its intentions to agent r, the arguments put forward in Section 2.2 suggest that, if there is no direct feedback from agent r, then agent s needs to compute appropriate behavior (actions) based on

Actions_s^ = arg min_Actions_s~ (Intentions_s \u2212 hr^(Actions_s~)),   (10)

which is a negative feedback-control loop performing a search over possible behaviors by agent s and their interpretations<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#note5\">5<\/a> by agent r as estimated by agent s. This process can be regarded as synthesis-by-analysis. <\/p>\n<h3>2.5.
One Agent Interpreting the Behavior of Another Agent<\/h3>\n<p> For agent r attempting to interpret the intentions of agent s, the arguments put forward in Section 2.3 suggest that agent r needs to compare the observed actions of agent s with the output of a forward model of agent s \u2026

Intentions_s^ = arg min_Intentions_s (Actions_s \u2212 gs^(Intentions_s)),   (11)

which is a negative feedback-control loop performing a search over the possible intentions of agent s and their realisations by agent s as estimated by agent r. As in Figure\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#F6\">6<\/a>, this process is analysis-by-synthesis. In fact, this particular configuration is exactly how the previous generation of algorithms for automatic speech recognition was formulated using \u2018hidden Markov models\u2019 (HMMs) (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B8\">Gales and Young, 2007<\/a>) as an appropriate forward\/generative model for speech.
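The mirror symmetry of equations (10) and (11) can be sketched with toy discrete inventories. The transforms gs and hr below are illustrative stand-ins, and each agent's estimate of the other is taken to be exact for simplicity:

```python
# Mirrored searches from eqs. (10) and (11): a sender picks the action whose
# predicted interpretation matches its intention; a receiver picks the
# intention whose predicted realisation matches the observed action.
# g_s, h_r and the intention/action inventories are illustrative assumptions.

INTENTIONS = ["greet", "warn", "ask"]
ACTIONS = ["wave", "shout", "point"]

def g_s(intention):
    """Sender's transform from intention to action."""
    return {"greet": "wave", "warn": "shout", "ask": "point"}[intention]

def h_r(action):
    """Receiver's transform from observed action to interpretation."""
    return {"wave": "greet", "shout": "warn", "point": "ask"}[action]

def choose_action(intention):
    """Eq. (10): search actions so the estimated interpretation matches."""
    return min(ACTIONS, key=lambda a: 0 if h_r(a) == intention else 1)

def infer_intention(action):
    """Eq. (11): search intentions so the estimated action matches."""
    return min(INTENTIONS, key=lambda i: 0 if g_s(i) == action else 1)

action = choose_action("warn")       # sender: synthesis-by-analysis
recovered = infer_intention(action)  # receiver: analysis-by-synthesis
print(action, recovered)             # -> shout warn
```

Both functions are the same search pattern pointed in opposite directions, which is exactly the symmetry the text emphasises.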
Interestingly, such an approach to speech recognition is not only reminiscent of the \u2018Motor Theory\u2019 of speech perception (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B16\">Liberman et al., 1967<\/a>), but it is also supported by neuroimaging data (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B14\">Kuhl et al., 2014<\/a>;\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B44\">Skipper, 2014<\/a>). <\/p>\n<h3>2.6. Using \u2018Self\u2019 to Model \u2018Other\u2019<\/h3>\n<p> The configurations outlined in Sections 2.4 and 2.5 lead to an important observation: both require one agent to have a model of some aspect of the other agent. The sending agent s selects its actions by searching over possible interpretations by the receiving agent r using an estimate of the receiving agent\u2019s transform from observations to interpretation (hr^). The receiving agent r infers the intentions of the sending agent s by searching over possible intentions using an estimate of the sending agent\u2019s transform from intentions to actions (gs^). So this leads to an important question: where do the transforms hr^ and gs^ come from? More precisely, how might their parameters be estimated? Obviously they could be derived using a variety of different learning procedures.
However, one intriguing possibility is that, if the agents are very similar to each other (for example, conspecifics), then each agent could approximate these functions using information recruited from their own structures\u2014exactly as proposed by\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B6\">Friston and Frith (2015)<\/a>. In other words, hr^ \u2190 hs (which can be searched using gs rather than gr^) and gs^ \u2190 gr (which can be searched using hr rather than hs^) (see Figure\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#F7\">7<\/a>). <a href=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_m\/frobt-04-00066-g007.jpg\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_t\/frobt-04-00066-g007.gif\" alt=\"www.frontiersin.org\" \/><\/a><strong>Figure 7<\/strong>. Illustration of a world containing one agent (a sender s) communicating with another (a receiver r) where each makes use of a model of the other by exploiting knowledge of themselves. This arrangement, in which both agents exploit sensorimotor knowledge of themselves to model each other, can be thought of as synthesis-by-analysis-by-synthesis for the sending agent and analysis-by-synthesis-by-analysis for the receiving agent.
Combining both into a single communicative agent gives rise to a structure where perception and production are construed as parallel recursive control-feedback processes (both of which employ search as the underlying mechanism for optimisation), and in which the intentions of \u2018self\u2019 and the intentions of \u2018other\u2019 are linked to the behavior of \u2018self\u2019 and the observations of \u2018other,\u2019 respectively (see Figure\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#F8\">8<\/a>). <a href=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_m\/frobt-04-00066-g008.jpg\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_t\/frobt-04-00066-g008.gif\" alt=\"www.frontiersin.org\" \/><\/a><strong>Figure 8<\/strong>. Illustration of a communicative agent that is capable of optimising the signaling of its own intentions (PRODUCTION) and inferring the intentions of others (PERCEPTION). <\/p>\n<h3>2.7. A Needs-Driven Communicative Agent<\/h3>\n<p> The preceding arguments provide novel answers to two key questions: how can an agent (i) optimise its behavior in order to communicate its intentions and (ii) infer the intentions of another agent by observing their behavior? However, thus far, it has been assumed that intentionality is a key driver of communicative interaction\u2014but whither the intentions? Perceptual Control Theory suggests that purposeful behavior exists at every level in a\u00a0hierarchy\u00a0of control systems. 
So, by invoking intentionality as a manifestation of purposeful goal-driven behavior, it is possible to make a direct link with established agent-based modeling approaches such as \u2018BDI\u2019 (Beliefs-Desires-Intentions) (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B40\">Rao and Georgeff, 1995<\/a>;\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B55\">Wooldridge, 2000<\/a>) and \u2018DAC\u2019 (Distributed Adaptive Control) (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B33\">Pfeifer and Verschure, 1992<\/a>;\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B52\">Verschure, 2012<\/a>). In particular, the DAC architecture emphasises that behaviors are ultimately driven by a motivational system based on an agent\u2019s fundamental needs (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B21\">Maslow, 1943<\/a>). Likewise, intrinsic motivations are thought to play a crucial role in driving learning (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B32\">Oudeyer and Kaplan, 2007<\/a>;\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B2\">Baldassarre et al., 2014<\/a>).
Putting all this together, it is possible to formulate a generic and remarkably symmetric architecture for a needs-driven communicative agent that is both a sender and a receiver. In this framework, it is proposed that a communicative agent\u2019s behavior is conditioned on appropriate motivational and deliberative belief states:\u00a0motivation\u00a0\u21d2\u00a0expression\u00a0\u21d2\u00a0production. Likewise, the intentions, desires, and needs of another agent are inferred via a parallel interpretive structure:\u00a0perception\u00a0\u21d2\u00a0interpretation\u00a0\u21d2\u00a0comprehension. At each level, optimisation involves search and, thereby, a mechanism for managing \u2018effort.\u2019 This canonical configuration is illustrated in Figure\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#F9\">9<\/a>. <a href=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_m\/frobt-04-00066-g009.jpg\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_t\/frobt-04-00066-g009.gif\" alt=\"www.frontiersin.org\" \/><\/a><strong>Figure 9<\/strong>. Illustration of the derived architecture for a needs-driven communicative agent (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B22\">Moore, 2018<\/a>). Such a needs-driven architecture is founded on a model of interaction in which each speaker\/listener has in mind the needs and intentions of the other speaker\/listener(s). As such, the proposed solution is entirely neutral with respect to the nature of the speaking\/listening agents; that is, it applies whether they are living or artificial systems.
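A skeletal rendering of the agent in Figure 9, with each stage reduced to a placeholder function purely to show the two parallel three-stage pipelines and their symmetry (all names and behaviors here are hypothetical, not the paper's implementation):

```python
# Skeleton of the needs-driven communicative agent of Figure 9:
#   PRODUCTION: motivation  => expression     => production
#   PERCEPTION: perception  => interpretation => comprehension
# Every stage below is a placeholder illustrating the data flow only.

class NeedsDrivenAgent:
    def __init__(self, need):
        self.need = need                       # fundamental driver of behavior

    # --- production pathway (signalling own intentions) ----------------
    def motivation(self):
        return f"intend({self.need})"          # need -> intention
    def expression(self, intention):
        return f"plan({intention})"            # intention -> linguistic plan
    def production(self, plan):
        return f"utter({plan})"                # plan -> overt behavior

    def speak(self):
        return self.production(self.expression(self.motivation()))

    # --- perception pathway (inferring the other's intentions) ---------
    def perception(self, signal):
        return f"percept({signal})"
    def interpretation(self, percept):
        return f"meaning({percept})"
    def comprehension(self, meaning):
        return f"inferred_need({meaning})"     # recover the other's need

    def listen(self, signal):
        return self.comprehension(self.interpretation(self.perception(signal)))

sender = NeedsDrivenAgent("company")
receiver = NeedsDrivenAgent("rest")
utterance = sender.speak()
print(receiver.listen(utterance))  # sender's need survives the round trip
```

In a real system each stage would itself be a feedback-control loop with a managed search depth; the skeleton only shows how the two pipelines mirror each other.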
Hence, the derived architecture not only captures important features of human speech but also provides a potential blueprint for a new type of spoken language system. For example, the proposed architecture suggests an approach to automatic speech recognition which incorporates a generative model of speech whose output is compared with incoming speech data. Of course, this is exactly how HMM-based automatic speech recognition systems are constructed (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B8\">Gales and Young, 2007<\/a>)\u2014the difference is that the architecture derived above not only suggests a richer generative model [in line with the \u2018Motor Theory\u2019 of speech perception (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B17\">Liberman and Mattingly, 1985<\/a>) and previous attempts to implement \u2018recognition-by-synthesis\u2019 (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B4\">Bridle and Ralls, 1985<\/a>)] but also that such an embedded model of speech generation should be derived not from the voice of the speaker but from the voice of the listener (which, in this case, is a machine!). Thus far, no one has attempted such a radical approach to automatic speech recognition. The proposed architecture also provides a framework for a new type of intentional speech synthesiser which listens to its own output and modifies its behavior as a function of how well it thinks it is achieving its communicative goals: for example, talking louder in a noisy environment and actively altering its pronunciation to maximise intelligibility and minimise potential confusion.
In particular, the architecture makes an analogy between the depth of each search process and \u2018motivation\/effort,\u2019 thereby reflecting the behavior illustrated by the \u201cI do not know\u201d example presented in Section 1.1 where a speaker trades effort against intelligibility. The key insight here is that the behavioral \u2018target\u2019 is not a signal but a perception (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B38\">Powers, 1973<\/a>). Hence, the solution maps very nicely into a hierarchical feedback-control process which aims to maintain sufficient contrast at the highest pragmatic level of communication by means of suitable regulatory compensations at the lower semantic, syntactic, lexical, phonemic, phonetic, and acoustic levels, balanced against the effort of doing so. Such an innovative approach to speech synthesis has been implemented by the authors and is described below. <\/p>\n<h3>3. A Next-Generation Intentional Speech Synthesiser<\/h3>\n<p> The ideas outlined above have been used to construct a new type of intentional speech synthesiser known as \u2018C2H\u2019 (Computational model for H&amp;H theory), which is capable of adapting its pronunciation in unpredictable acoustic environments as a function of its perceived communicative success (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B29\">Moore and Nicolao, 2011<\/a>;\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B31\">Nicolao et al., 2012<\/a>).
The \u2018synthesis-by-analysis\u2019 model (based on the principles outlined in Section 2.4) consists of a speech production system [inspired by\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B15\">Levelt (1989)<\/a>\u00a0and\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B10\">Hartsuiker and Kolk (2001)<\/a>] and a negative feedback loop which, respectively, generates utterances and measures the environmental effects on the outcome, such that adjustments based on articulatory effort can be made dynamically according to the results of the analysis (see Figure\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#F10\">10<\/a>). The perceptual feedback consists of an emulation of a listener\u2019s auditory perceptual apparatus that measures the environmental state and returns information that is used to control the degree of modification to speech production. <a href=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_m\/frobt-04-00066-g010.jpg\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_t\/frobt-04-00066-g010.gif\" alt=\"www.frontiersin.org\" \/><\/a> <strong>Figure 10<\/strong>. Illustration of the C2H model of speech production (speech synthesiser on the left, auditory feedback loop on the right, adaptive control in the center). <\/p>\n<h3>3.1.
Implementation<\/h3>\n<p> The C2H model was implemented using \u2018HTS\u2019: the state-of-the-art parametric speech synthesiser developed by\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B49\">Tokuda et al. (2007<\/a>,\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B48\">2013<\/a>). HTS is based on hidden Markov modeling, and a recursive search algorithm was added to adapt the model statistics at the frame (rather than whole utterance) level (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B47\">Tokuda et al., 1995<\/a>). This allowed the energy distribution and organisation in automatic speech production to be obtained through active manipulation of the synthesis parameters. An adaptation transform covering both the acoustic and durational statistics was trained using \u2018maximum likelihood linear regression\u2019 (MLLR); only the mean vectors were transformed. An implementation of the standard ANSI \u2018Speech Intelligibility Index\u2019 (SII) (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B1\">American National Standards Institute, 1997<\/a>) was used to estimate the intelligibility of the resulting synthesised speech (i.e., the artificial speaker\u2019s model of the human listener). <\/p>\n<h3>3.2. 
Actively Managing Phonetic Contrast<\/h3>\n<p> Inspired by the \u2018H&amp;H\u2019 principles espoused by\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B18\">Lindblom (1990)<\/a>, the adaptation of the synthesiser output was motivated by both articulatory and energetic manifestations of phonetic contrast. In particular, we introduce the notion of low-energy attractors: minimally contrastive acoustic realisations toward which at least two competing phones tend to converge. For example, in the utterances \u201cThis is my pet\u201d versus \u201cThis is my pot,\u201d the ease with which a listener can distinguish between \u201cpet\u201d and \u201cpot\u201d depends on the effort put into the pronunciation of the vowel by the speaker. With poor contextual support (including the history of the interaction) and\/or environmental noise, a speaker is likely to produce very clear high-effort hyper-articulated output: [<img decoding=\"async\" src=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_n\/frobt-04-00066-i002.gif\" alt=\"yes\" \/>] or [<img decoding=\"async\" src=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_n\/frobt-04-00066-i003.gif\" alt=\"yes\" \/>].
However, if the context is strong and\/or the environment is quiet, then a speaker is likely to produce a much less clear low-effort hypo-articulated output: close to [<img decoding=\"async\" src=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_n\/frobt-04-00066-i004.gif\" alt=\"yes\" \/>] (the neutral schwa vowel) for both \u201cpet\u201d and \u201cpot.\u201d In HMM-based speech synthesis, the acoustic realisation of any particular phone can be altered continuously (using a reasonably simple adaptation) in any direction in the high-dimensional space that is defined by its parametric representation. Therefore, once identified, a low-energy attractor in the acoustic space defines a specific direction along which the parametric representation of each phone can be moved in order to decrease or increase the degree of articulation. The hypothesis is thus that, by manipulating the acoustic distance between the realisations of different phones, it is possible to vary the output from hypo-articulated speech (i.e., by moving toward the attractor) to hyper-articulated speech (i.e., by moving in the opposite direction, away from the attractor), with appropriate consequences for the intelligibility of the resulting output. It is well established that hyper-articulated speech corresponds to an expansion of a speaker\u2019s vowel space and, conversely, hypo-articulated speech corresponds to a contraction of their vowel space (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B50\">van Bergem, 1995<\/a>).
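Under these assumptions, moving a vowel toward or away from a low-energy attractor is a linear interpolation in acoustic space. The sketch below does this in a two-dimensional formant space with rough, made-up F1 and F2 values (the actual system operates on the full HMM parametric representation, not raw formants):

```python
# Move a vowel's (F1, F2) realisation toward (alpha > 0, hypo-articulation)
# or away from (alpha < 0, hyper-articulation) a schwa-like attractor.
# All formant values (Hz) are rough illustrative figures, not measured data.

SCHWA = (500.0, 1500.0)            # assumed low-energy attractor position

def articulate(vowel, alpha):
    """Linear interpolation: alpha = 0 neutral, alpha = 1 collapses to schwa."""
    f1, f2 = vowel
    a1, a2 = SCHWA
    return (f1 + alpha * (a1 - f1), f2 + alpha * (a2 - f2))

vowel = (550.0, 1800.0)            # an illustrative front vowel

hypo = articulate(vowel, 0.8)      # contracted toward the attractor
hyper = articulate(vowel, -0.5)    # expanded away from the attractor
print(hypo, hyper)
```

Applied to a whole vowel inventory, positive alpha contracts the vowel space toward the attractor and negative alpha expands it, mirroring the hypo/hyper pattern described above.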
Hence, in the work reported here, the mid-central schwa vowel [<img decoding=\"async\" src=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_n\/frobt-04-00066-i005.gif\" alt=\"yes\" \/>] was defined as the low-energy attractor for\u00a0all\u00a0vowels (see Figure\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#F11\">11<\/a>A). However, for consonants it is not possible to define such a single low-energy attractor (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B51\">van Son and Pols, 1999<\/a>). In this case, each consonant was considered to have a particular competitor that is acoustically very close, and hence potentially\u00a0confusable. Therefore, the minimum-contrastive point for each confusable pair of consonants was defined to be half-way between their citation realisations (see Figure\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#F11\">11<\/a>B). <a href=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_m\/frobt-04-00066-g011.jpg\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_t\/frobt-04-00066-g011.gif\" alt=\"www.frontiersin.org\" \/><\/a><strong>Figure 11<\/strong>. 
Graphical representation of the transformations required to achieve hyper-articulated (red arrows, THYP) or hypo-articulated (blue arrows, THYO) output for\u00a0<strong>(A)<\/strong>\u00a0a vowel midway between [<img decoding=\"async\" src=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_n\/frobt-04-00066-i007.gif\" alt=\"yes\" \/>] and [<img decoding=\"async\" src=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_n\/frobt-04-00066-i008.gif\" alt=\"yes\" \/>] and\u00a0<strong>(B)<\/strong>\u00a0a pair of contrastive consonants [<img decoding=\"async\" src=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_n\/frobt-04-00066-i009.gif\" alt=\"yes\" \/>] and [<img decoding=\"async\" src=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_n\/frobt-04-00066-i010.gif\" alt=\"yes\" \/>] (LC signifies the minimum-contrastive configurations). <\/p>\n<h3>3.3. MLLR Transforms<\/h3>\n<p> The MLLR transformations were estimated using a corpus of synthetic hypo-articulated speech. This consisted of speech generated using the HTS system trained on the CMU-ARCTIC SLT corpus<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#note6\">6<\/a> and forcing its input control sequences to have only low-energy attractors. All vowels were substituted with schwa [<img decoding=\"async\" src=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_n\/frobt-04-00066-i006.gif\" alt=\"yes\" \/>], while consonants were changed into their specific competitors. Using decision-tree-based clustering, HTS found the most likely acoustic model according to the phonetic and prosodic context for all of the phones, even those unseen in its original training corpus.
Adaptations of both the acoustic and duration models were trained to match the characteristics of the hypo-articulation reference, and a set of transformations was obtained which modified the mean vectors in the relevant HMMs. The covariance vectors were not considered. The linear transform can be written as \u2026

\u03bc\u2032_i = A_i \u03bc_i + b_i,   (12)

where A_i is a P \u00d7 P matrix, b_i is a P \u00d7 1 bias vector for the i-th model, and P is the size of the parametric representation. In practice, the MLLR transformations are scaled with different strengths. So, given the full-strength transform toward the low-energy attractor \u03bc\u2032_i, the scaled mean vector \u03bc_i(\u03b1) is computed as \u2026

\u03bc_i(\u03b1) = \u03bc_i + \u03b1(\u03bc\u2032_i \u2212 \u03bc_i) = \u03b1\u03bc\u2032_i + (1 \u2212 \u03b1)\u03bc_i,   (13)

where \u03b1 is a weighting factor; \u03b1 \u2265 0 scales the transform toward the attractor, with \u03b1 = 1 applying it at full strength. The transformation toward hyper-articulated speech is defined as the inverse of the trained transformation, which simply means that \u03b1 < 0.
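Equations (12) and (13) amount to a matrix-vector transform followed by a linear blend of the original and transformed mean vectors. A minimal pure-Python sketch with toy dimensions and made-up parameter values (real HTS models use much larger P):

```python
# Eqs. (12)-(13): apply a trained MLLR mean transform mu' = A.mu + b, then
# blend mu(alpha) = alpha*mu' + (1 - alpha)*mu.  alpha > 0 moves toward the
# low-energy attractor (hypo); alpha < 0 moves away from it (hyper).
# A, b and mu are toy values chosen purely for illustration.

def mat_vec(A, v):
    """Plain matrix-vector product."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def mllr_mean(A, b, mu):
    """Eq. (12): mu' = A.mu + b."""
    return [m + bi for m, bi in zip(mat_vec(A, mu), b)]

def scaled_mean(mu, mu_prime, alpha):
    """Eq. (13): mu(alpha) = alpha*mu' + (1 - alpha)*mu."""
    return [alpha * mp + (1.0 - alpha) * m for m, mp in zip(mu, mu_prime)]

A = [[0.9, 0.0], [0.1, 0.8]]       # toy 2x2 regression matrix
b = [0.5, -0.2]                    # toy bias vector
mu = [1.0, 2.0]                    # toy mean vector (P = 2)

mu_prime = mllr_mean(A, b, mu)             # full-strength transform
half = scaled_mean(mu, mu_prime, 0.5)      # half-strength (toward hypo)
hyper = scaled_mean(mu, mu_prime, -0.5)    # inverse direction (hyper)
print(mu_prime, half, hyper)
```

Note that alpha = 0 returns the untransformed mean and alpha = 1 returns mu' exactly, so a single scalar continuously spans the hypo-to-hyper continuum.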
The net result, given that the MLLR transform is part of a synthesis-by-analysis closed loop (as proposed in Section 2.4 and illustrated in Figure\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#F10\">10<\/a>), is that the strength of the modification (controlled by\u00a0\u03b1) can be adjusted continuously as a function of the perceived intelligibility of the speech<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#note7\">7<\/a>\u00a0in the environment where the communication takes place. The adjustment of\u00a0\u03b1\u00a0thus controls the dynamic expansion\/compression of the acoustic space in order to achieve the communicative\u00a0intentions\u00a0of the synthesiser. <\/p>\n<h3>3.4. System Evaluation<\/h3>\n<p> In order to test the effectiveness of the intentional speech synthesiser, the C2H model was used to synthesise speech in the presence of various interfering noises at a range of signal-to-noise ratios. Experiments were conducted with different-strength MLLR adaptations (different values of\u00a0\u03b1), and objective SII speech intelligibility measurements (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B1\">American National Standards Institute, 1997<\/a>) were made for each condition. 
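The closed loop can be caricatured as a simple proportional controller on alpha driven by an intelligibility estimate. The listener model and gain below are made-up stand-ins for the system's SII-based feedback, purely to show how alpha would settle at a hyper-articulated operating point in noise:

```python
# Closed-loop control of articulation strength alpha: when estimated
# intelligibility falls below target, push alpha negative (hyper-articulate);
# when above target, relax back toward alpha = 0 to save effort.
# The intelligibility model and gain are illustrative stand-ins for the
# system's SII-based listener model, not the published implementation.

ALPHA_MIN, ALPHA_MAX = -0.8, 1.0   # clamp to an acceptable range (vowels)

def estimated_intelligibility(alpha, noise):
    """Stand-in listener model: hyper-articulation (alpha < 0) helps,
    environmental noise hurts; result clipped to [0, 1]."""
    return max(0.0, min(1.0, 0.8 - 0.3 * alpha - noise))

def update_alpha(alpha, noise, target=0.7, gain=0.5):
    """One proportional-control step on the perceived shortfall."""
    error = target - estimated_intelligibility(alpha, noise)
    alpha -= gain * error          # shortfall -> spend more effort
    return max(ALPHA_MIN, min(ALPHA_MAX, alpha))

alpha = 0.0
for _ in range(20):                # noisy environment: loop settles hyper
    alpha = update_alpha(alpha, noise=0.3)
print(round(alpha, 2))             # settles below zero (hyper-articulated)
```

In quiet (low noise) the same loop drifts alpha back toward zero or above, reproducing the effort-versus-intelligibility trade-off the architecture is designed to manage.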
SII was selected as not only is it a standard protocol for objective intelligibility assessment [and has been shown to have a good correlation with human perception (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B46\">Tang et al., 2016<\/a>)] but it also formed the basis of the system\u2019s model of the listener. Phonetic analysis was provided by the standard Festival toolkit,<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#note8\">8<\/a>\u00a0and the duration control was left to the statistical model and its adaptations. 200 sentences from the 2010 Blizzard Challenge<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#note9\">9<\/a>\u00a0evaluation test were used to generate the full-strength forward transformation (\u03b1 = max\u00a0\u03b1) and full-strength inverse transformation (\u03b1 = min\u00a0\u03b1) samples. A standard speech synthesis (\u03b1 = 0) was generated as a reference. It turns out that the range of values for \u03b1 is not easily defined, and there is a significant risk that the transformation could produce unnatural speech phenomena (particularly as there is no lower limit for the value of \u03b1). In practice, the boundary values for \u03b1 were determined experimentally, and an acceptable range of values was found to be \u03b1 = [\u20130.8, 1] for vowels and \u03b1 = [\u20130.7, 0.6] for consonants.
As an example of the effectiveness of these transformations, Figure\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#F12\">12<\/a>\u00a0illustrates the consequences for the distribution of formant resonance frequencies for a range of different vowel sounds. As can be seen, the vowel space is severely reduced for\u00a0hypo-articulated speech and somewhat expanded for\u00a0hyper-articulated speech. This pattern successfully replicates established results obtained by comparing natural spontaneous speech with read speech (cf. Figure 2 in\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B51\">van Son and Pols (1999)<\/a>). <a href=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_m\/frobt-04-00066-g012.jpg\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_t\/frobt-04-00066-g012.gif\" alt=\"www.frontiersin.org\" \/><\/a><strong>Figure 12<\/strong>. Illustration of the effect of hyper\/hypo transformations on the F1\u2013F2 distribution of vowel formant resonance frequencies (in hertz). The distribution for\u00a0untransformed\u00a0vowels is shown with black lines. The distribution for\u00a0hypo\u00a0vowels is shown with blue-dashed lines\u00a0<strong>(A)<\/strong>, and the distribution for\u00a0hyper\u00a0vowels is shown with red-dashed lines\u00a0<strong>(B)<\/strong>. 
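One way to quantify the vowel-space compression and expansion shown in Figure 12 is the area of the convex hull of the mean (F1, F2) points, a common vowel-space metric in phonetics. The sketch below computes that area from scratch; the formant values are invented for illustration and are not taken from the paper.

```python
# Quantify vowel-space size as the area of the convex hull of mean
# (F1, F2) points.  The formant values below are invented, not the
# paper's data.

def hull_area(points):
    """Area of the convex hull of 2-D points (Andrew's monotone chain
    followed by the shoelace formula)."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return 0.0
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    hull = lower[:-1] + upper[:-1]
    area = 0.0
    for i in range(len(hull)):
        x1, y1 = hull[i]
        x2, y2 = hull[(i + 1) % len(hull)]
        area += x1*y2 - x2*y1
    return abs(area) / 2.0

# Invented mean (F1, F2) values in Hz for three corner vowels.
standard = [(300, 2300), (700, 1200), (350, 800)]
# Hypo-articulation: shrink each vowel 30% towards a central
# (500, 1500) Hz point, mimicking vowel-space reduction.
hypo = [(0.7*f1 + 150, 0.7*f2 + 450) for f1, f2 in standard]
print(hull_area(hypo) < hull_area(standard))  # hypo space is smaller: True
```

With a uniform 0.7 shrink the hull area falls to 0.49 of the original, mirroring the compression seen for hypo-articulated speech.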
Formant frequencies were extracted using \u2018Praat\u2019 (<a href=\"http:\/\/www.fon.hum.uva.nl\/praat\/\">http:\/\/www.fon.hum.uva.nl\/praat\/<\/a>) and phone labels are displayed using the \u2018CMU Pronouncing Phoneme Set\u2019 (<a href=\"http:\/\/www.speech.cs.cmu.edu\/cgi-bin\/cmudict\">http:\/\/www.speech.cs.cmu.edu\/cgi-bin\/cmudict<\/a>). \u00a0In terms of speech intelligibility, Figure\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#F13\">13<\/a>\u00a0shows the consequences of varying between hypo- and hyper-articulation for synthesised speech competing with speech-babble noise at a challenging signal-to-noise ratio of 0 dB. The figure plots the difference in performance for hypo-articulated (HYO) speech or hyper-articulated (HYP) speech normalised with respect to the standard synthesiser settings (STD). The results clearly show a reduction in intelligibility for hypo-articulated speech and an increase in intelligibility for hyper-articulated speech. On average, the results indicate that the intelligibility of the synthesised speech can be reduced by 25% in hypo-articulated speech<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#note10\">10<\/a>\u00a0and increased by 25% in hyper-articulated speech. <a href=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_m\/frobt-04-00066-g013.jpg\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_t\/frobt-04-00066-g013.gif\" alt=\"www.frontiersin.org\" \/><\/a><strong>Figure 13<\/strong>. 
Distribution of the SII differences (in percent) between\u00a0hypo\u00a0and\u00a0standard\u00a0speech (HYO-STD, blue-crossed histograms), and between\u00a0hyper\u00a0and\u00a0standard\u00a0speech (HYP-STD, red-dotted histograms) for\u00a0<strong>(A)<\/strong>\u00a0vowels and\u00a0<strong>(B)<\/strong>\u00a0consonants. \u00a0Overall, the results of the evaluation show that we were able to successfully implement the core components of a new form of\u00a0intentional\u00a0speech synthesiser, based on the derived needs-driven architecture, that is capable of dynamically adapting its output as a function of its perceived communicative success modulated by articulatory effort. <\/p>\n<h3>4. Discussion<\/h3>\n<p>The needs-driven cognitive architecture described in Section 2 does appear to capture several important elements of communicative interaction that are missing from the \u2018standard\u2019 W3C-style model shown in Figure\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#F2\">2<\/a>. Not only does the new architecture suggest a more structured approach to advanced forms of both automatic speech recognition and speech synthesis (the latter being demonstrated in Section 3) but it also applies to\u00a0all\u00a0forms of teleological\/communicative interaction. That is, the derived architecture is not specific to speech-based interactivity, but is also relevant to sign language and any other mode of communicative behavior, by mind or machine. 
In particular, two of the key concepts embedded in the architecture illustrated in Figure\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#F9\">9<\/a>\u00a0are (i) an agent\u2019s ability to \u2018infer\u2019 (using search) the consequences of their actions when they cannot be observed directly and (ii) the use of a\u00a0forward model\u00a0of \u2018self\u2019 to model \u2018other.\u2019 Both of these features align well with the contemporary view of language as \u201costensive inferential recursive mind-reading\u201d (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B42\">Scott-Phillips, 2015<\/a>), so this is a very positive outcome. On the other hand, the intentional speech synthesiser described in Section 3 represents only one facet of the full needs-driven architecture. For example, while the implications of the framework for other aspects (such as automatic speech recognition) are discussed in Section 2.7, they have not yet been validated experimentally. Hence, while the derived architecture may be an appropriate model of communicative interaction between conspecifics (in this case, human beings), no\u00a0artificial\u00a0agent yet has such an advanced structure. This means that there is currently a gross mismatch in priors between humans and artificial agents, which is probably one explanation as to why users have difficulty engaging with contemporary speech-based systems. Following the analogy cited in Section 1.1, language-based interaction between users and current speech-based systems is more like a three-legged race where one partner has fallen over and is being dragged along the ground! 
Indeed, the richness of the derived architecture makes it clear that successful language-based interaction between human beings is founded on substantial shared priors (see Figure\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#F14\">14<\/a>A). However, for human\u2013machine interaction, the fundamentally different situated and embodied real-world experiences of the interlocutors mean that it may not be possible simply to \u2018upgrade\u2019 from one to the other (see Figure\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#F14\">14<\/a>B). In other words, there may be a fundamental limit to the complexity of the interaction that can take place between\u00a0mismatched\u00a0partners such as a human being and an autonomous social agent (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B27\">Moore, 2016b<\/a>). 
Although it is certainly possible to instantiate a speech-based communicative interface<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#note11\">11<\/a>\u00a0between humans and machines \u2026 \u201cThe assumption of continuity between a fully coded communication system at one end, and language at the other, is simply not justified.\u201d (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B42\">Scott-Phillips, 2015<\/a>) <a href=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_m\/frobt-04-00066-g014.jpg\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_t\/frobt-04-00066-g014.gif\" alt=\"www.frontiersin.org\" \/><\/a><strong>Figure 14<\/strong>. Pictographic representation (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B26\">Moore, 2016a<\/a>) of language-based coupling (dialog) between interlocutors. In\u00a0<strong>(A)<\/strong>, communicative interaction between human beings is founded on two-way ostensive recursive mind-reading (including mutual Theory-of-Mind). In\u00a0<strong>(B)<\/strong>, the artificial agent lacks the capability of ostensive recursive mind-reading (it has no Theory-of-Mind), so the interaction is inevitably constrained. \u00a0This notion of a potential discontinuity between simple command-based interaction and \u2018natural\u2019 human language has been a concern of the spoken language dialog systems (SLDS) community for some time. 
For example,\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B34\">Phillips (2006)<\/a>\u00a0speculated about a non-linear relationship between\u00a0flexibility\u00a0and\u00a0usability\u00a0of an SLDS; as flexibility increases with advancing technology, so usability increases until users no longer know what they can and cannot say, at which point usability tumbles and interaction falls apart (see Figure\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#F15\">15<\/a>). Interestingly, the shape of the curve illustrated in Figure\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#F15\">15<\/a>\u00a0is virtually identical to the famous \u2018uncanny valley effect\u2019 (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B30\">Mori, 1970<\/a>) in which a near human-looking artifact (such as a humanoid robot) can trigger feelings of eeriness and repulsion in an observer; as human likeness increases, so affinity increases until a point where artifacts start to appear creepy and affinity goes negative. <a href=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_m\/frobt-04-00066-g015.jpg\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" src=\"https:\/\/www.frontiersin.org\/files\/Articles\/277317\/frobt-04-00066-HTML\/image_t\/frobt-04-00066-g015.gif\" alt=\"www.frontiersin.org\" \/><\/a><strong>Figure 15<\/strong>. 
Illustration of the consequences of increasing the flexibility of spoken language dialog systems; increasing flexibility can lead to a\u00a0habitability gap\u00a0where usability drops catastrophically (reproduced, with permission, from\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B34\">Phillips (2006)<\/a>). This means that it is surprisingly difficult to deliver a technology corresponding to the point marked \u2018??\u2019. (Contemporary systems such as\u00a0Siri\u00a0or\u00a0Alexa\u00a0correspond to the point marked \u2018Add NL\/Dialog.\u2019) \u00a0Evidence for this unintended consequence of mismatched priors was already referred to in Section 1.1 in terms of its manifestation in low usage statistics for contemporary voice-enabled systems. This perspective is also supported by early experience with\u00a0Jibo\u00a0for which it has been reported that \u201cUsers had trouble discovering what Jibo could do\u201d.<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#note12\">12<\/a>\u00a0Clearly, understanding how to bridge this \u2018habitability gap\u2019 (<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B27\">Moore, 2016b<\/a>) is a critical aspect of ongoing research into the development of effective spoken language-based interaction between human beings and autonomous social agents (such as robots). Finally, it is worth noting that there is an important difference between mismatched priors\/beliefs and misaligned needs\/intentions. The former leads to the habitability issues discussed above, but the latter can give rise to conflict rather than cooperation. 
Based on an earlier version of the architecture presented herein,\u00a0<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B23\">Moore (2007a<\/a>,<a href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#B24\">b<\/a>) concludes that, in order to facilitate cooperative interaction, an agent\u2019s needs and intentions must be subservient to its user\u2019s needs and intentions. <\/p>\n<h3>5. Conclusion<\/h3>\n<p>This paper has presented an alternative needs-driven cognitive architecture which models speech-based interaction as an emergent property of coupled hierarchical feedback-control processes in which a speaker has in mind the\u00a0needs\u00a0of a listener and a listener has in mind the\u00a0intentions\u00a0of a speaker. The architecture has been derived from basic principles underpinning agent\u2013world and agent\u2013agent interaction and, as a consequence, it goes beyond the standard behaviorist stimulus\u2013response model of interactive dialog currently deployed in contemporary spoken language systems. The derived architecture reflects contemporary views on the nature of spoken language interaction, including sensorimotor overlap and the power of exploiting models of \u2018self\u2019 to understand\/influence the behavior of \u2018other.\u2019 The implications of this architecture for future spoken language systems have been illustrated through the development of a new type of\u00a0intentional\u00a0speech synthesiser that is capable of adapting its pronunciation in unpredictable acoustic environments as a function of its perceived communicative success. Results have confirmed that, by actively managing phonetic contrast, the synthesiser is able to increase\/decrease intelligibility by up to 25%. 
The research presented herein confirms that intentional behavior is essential to the facilitation of meaningful and productive communicative interaction between human beings and autonomous social agents (such as robots). However, it is also pointed out that there is currently a gross mismatch in intentional priors between humans and artificial agents, and that this may ultimately impose a fundamental limit on the effectiveness of speech-based human\u2013robot interaction. <\/p>\n<h3>Author Contributions<\/h3>\n<p>RM developed the overall architecture, MN implemented and tested the speech synthesiser, and both authors contributed to the written manuscript. <\/p>\n<h3>Conflict of Interest Statement<\/h3>\n<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The reviewer, TB, and handling editor declared their shared affiliation. <\/p>\n<h3>Funding<\/h3>\n<p>This work was supported by the European Commission [EU-FP6-507422, EU-FP6-034434, EU-FP7-213850, EU-FP7-231868, and EU-FP7-611971] and the UK Engineering and Physical Sciences Research Council [EP\/I013512\/1]. 
<\/p>\n<h3>Footnotes<\/h3>\n<li><strong><a title=\"\" href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#note1a\">^<\/a><\/strong><a href=\"https:\/\/www.jibo.com\/\">https:\/\/www.jibo.com<\/a>.<\/li>\n<li><strong><a title=\"\" href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#note2a\">^<\/a><\/strong><a href=\"https:\/\/www.heyolly.com\/\">https:\/\/www.heyolly.com<\/a>.<\/li>\n<li><strong><a title=\"\" href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#note3a\">^<\/a><\/strong>For example, the \u2018wrong\u2019 sign for g would lead to positive feedback and an unstable system.<\/li>\n<li><strong><a title=\"\" href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#note4a\">^<\/a><\/strong>Also, it might be safer and\/or less costly to avoid physical exploration in favour of virtual exploration.<\/li>\n<li><strong><a title=\"\" href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#note5a\">^<\/a><\/strong>Note that this assumes \u2018honest\u2019 communication in which intentions and interpretations are the same. 
Relaxing this assumption is an interesting topic, but is beyond the scope of the work reported herein.<\/li>\n<li><strong><a title=\"\" href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#note6a\">^<\/a><\/strong><a href=\"http:\/\/festvox.org\/cmu_arctic\">http:\/\/festvox.org\/cmu_arctic<\/a>.<\/li>\n<li><strong><a title=\"\" href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#note7a\">^<\/a><\/strong>As measured by the SII-based simulated \u2018listener.\u2019<\/li>\n<li><strong><a title=\"\" href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#note8a\">^<\/a><\/strong><a href=\"http:\/\/www.cstr.ed.ac.uk\/projects\/festival\/\">http:\/\/www.cstr.ed.ac.uk\/projects\/festival\/<\/a>.<\/li>\n<li><strong><a title=\"\" href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#note9a\">^<\/a><\/strong><a href=\"http:\/\/www.synsig.org\/index.php\/Blizzard_Challenge_2010\">http:\/\/www.synsig.org\/index.php\/Blizzard_Challenge_2010<\/a>.<\/li>\n<li><strong><a title=\"\" href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#note10a\">^<\/a><\/strong>It might appear strange that an artificial talker would seek to minimise communicative effort\u2014why not speak maximally clearly all the time? 
However, not only is hypo-articulated speech often used in human communication as a strategy to overcome social and formal barriers, but there is also a correlation between \u03b1 and effort for both the speaker and the listener. In particular, hyper-articulation increases the length and amplitude of an utterance, and speech that is too loud or takes too long is tiring for a listener (i.e., it requires additional perceptual effort).<\/li>\n<li><strong><a title=\"\" href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#note11a\">^<\/a><\/strong>Essentially a voice-operated button-pressing system.<\/li>\n<li><strong><a title=\"\" href=\"https:\/\/www.frontiersin.org\/articles\/10.3389\/frobt.2017.00066\/full?utm_source=F-AAE&amp;utm_medium=EMLF&amp;utm_campaign=MRK_496986_72_Roboti_20171226_arts_A#note12a\">^<\/a><\/strong><a href=\"https:\/\/www.slashgear.com\/jibo-delayed-to-2017-as-social-robot-hits-more-hurdles-20464725\/\">https:\/\/www.slashgear.com\/jibo-delayed-to-2017-as-social-robot-hits-more-hurdles-20464725\/<\/a>.<\/li>\n<h3>References<\/h3>\n<p>American National Standards Institute. (1997). American National Standard Methods for Calculation of the Speech Intelligibility Index, ANSI S3.5-1997. New York, NY: ANSI. 
Baldassarre, G., Stafford, T., Mirolli, M., Redgrave, P., Ryan, R. M., and Barto, A. (2014). Intrinsic motivations and open-ended development in animals, humans, and robots: an overview. Front. Psychol. 5:985. doi:10.3389\/fpsyg.2014.00985 
Bickhard, M. H. (2007). Language as an interaction system. New Ideas Psychol. 25, 171\u2013187. doi:10.1016\/j.newideapsych.2007.02.006 
Bridle, J. S., and Ralls, M. P. (1985). \u201cAn approach to speech recognition using synthesis by rule,\u201d in Computer Speech Processing, eds F. Fallside and W. Woods (London, UK: Prentice Hall), 277\u2013292. 
Cummins, F. (2011). Periodic and aperiodic synchronization in skilled action. Front. Hum. Neurosci. 5:170. doi:10.3389\/fnhum.2011.00170 
Friston, K., and Frith, C. (2015). A duet for one. Conscious. Cogn. 36, 390\u2013405. doi:10.1016\/j.concog.2014.12.003 
Fusaroli, R., Raczaszek-Leonardi, J., and Tyl\u00e9n, K. (2014). Dialog as interpersonal synergy. New Ideas Psychol. 32, 147\u2013157. doi:10.1016\/j.newideapsych.2013.03.005 
Gales, M., and Young, S. J. (2007). The application of hidden Markov models in speech recognition. Found. Trends Sig. Process. 1, 195\u2013304. doi:10.1561\/2000000004 
Grush, R. (2004). The emulation theory of representation: motor control, imagery, and perception. Behav. Brain Sci. 27, 377\u2013442. doi:10.1017\/S0140525X04000093 
Hartsuiker, R. J., and Kolk, H. H. J. (2001). Error monitoring in speech production: a computational test of the perceptual loop theory. Cogn. Psychol. 42, 113\u2013157. doi:10.1006\/cogp.2000.0744 
Hawkins, S. (2003). Roles and representations of systematic fine phonetic detail in speech understanding. J. Phon. 31, 373\u2013405. doi:10.1016\/j.wocn.2003.09.006 
Hesslow, G. (2002). Conscious thought as simulation of behaviour and perception. Trends Cogn. Sci. 6, 242\u2013247. doi:10.1016\/S1364-6613(02)01913-7 
Huang, X. D. (2002). Making Speech Mainstream. Redmond, WA: Microsoft Speech Technologies Group. 
Kuhl, P. K., Ramirez, R. R., Bosseler, A., Lin, J.-F. L., and Imada, T. (2014). Infants\u2019 brain responses to speech suggest analysis by synthesis. Proc. Natl. Acad. Sci. U.S.A. 111, 11238\u201311245. doi:10.1073\/pnas.1410963111 
Levelt, W. J. M. (1989). Speaking: From Intention to Articulation. Cambridge, MA: The MIT Press. 
Liberman, A., Cooper, F. S., Shankweiler, D. P., and Studdert-Kennedy, M. (1967). Perception of the speech code. Psychol. Rev. 74, 431\u2013461. doi:10.1037\/h0020279 
Liberman, A. M., and Mattingly, I. G. (1985). The motor theory of speech perception revised. Cognition 21, 1\u201336. doi:10.1016\/0010-0277(85)90021-6 
Lindblom, B. (1990). \u201cExplaining phonetic variation: a sketch of the H&amp;H theory,\u201d in Speech Production and Speech Modelling, eds W. J. Hardcastle and A. Marchal (Berlin: Kluwer Academic Publishers), 403\u2013439. 
Lombard, E. (1911). Le signe de l\u2019\u00e9l\u00e9vation de la voix. Ann. Maladies Oreille Larynx Nez Pharynx 37, 101\u2013119. 
Mansell, W., and Carey, T. A. (2015). A perceptual control revolution. Psychologist 28, 896\u2013899. 
Maslow, A. H. (1943). A theory of human motivation. Psychol. Rev. 50, 370\u2013396. doi:10.1037\/h0054346 
Moore, R. K. (2018). \u201cPCT and beyond: towards a computational framework for \u2018intelligent\u2019 systems,\u201d in Living Control Systems IV: Perceptual Control Theory and the Future of the Life and Social Sciences, eds A. McElhone and W. Mansell (Benchmark Publications Inc). Available at: <a href=\"https:\/\/arxiv.org\/abs\/1611.05379\">https:\/\/arxiv.org\/abs\/1611.05379<\/a>. 
Moore, R. K. (2007a). PRESENCE: a human-inspired architecture for speech-based human-machine interaction. IEEE Trans. Comput. 56, 1176\u20131188. doi:10.1109\/TC.2007.1080 
Moore, R. K. (2007b). Spoken language processing: piecing together the puzzle. Speech Commun. 49, 418\u2013435. 
doi:10.1016\/j.specom.2007.01.011 <a href=\"https:\/\/doi.org\/10.1016\/j.specom.2007.01.011\" target=\"_blank\" rel=\"noopener noreferrer\">CrossRef Full Text<\/a>\u00a0|\u00a0<a href=\"http:\/\/scholar.google.com\/scholar_lookup?title=Spoken+language+processing:+piecing+together+the+puzzle&amp;author=R.+K.+Moore&amp;journal=Speech+Commun.&amp;publication_year=2007b&amp;volume=49&amp;pages=418%E2%80%93435&amp;doi=10.1016\/j.specom.2007.01.011\" target=\"_blank\" rel=\"noopener noreferrer\">Google Scholar<\/a> Moore, R. K. (2014). \u201cSpoken language processing: time to look outside?\u201d in\u00a02nd International Conference on Statistical Language and Speech Processing (SLSP 2014), Lecture Notes in Computer Science, Vol. 8791, eds L. Besacier, A.-H. Dediu, and C. Mart\u00edn-Vide (Grenoble: Springer), 21\u201336. <a href=\"http:\/\/scholar.google.com\/scholar_lookup?title=Spoken+language+processing:+time+to+look+outside?&amp;author=R.+K.+Moore&amp;conference=2nd+International+Conference+on+Statistical+Language+and+Speech+Processing+(SLSP+2014),+Lecture+Notes+in+Computer+Science&amp;publication_year=2014&amp;volume=8791&amp;pages=21%E2%80%9336\" target=\"_blank\" rel=\"noopener noreferrer\">Google Scholar<\/a> Moore, R. K. (2016a). Introducing a pictographic language for envisioning a rich variety of enactive systems with different degrees of complexity.\u00a0Int. J. Adv. Robot. Syst. 13. Available at:\u00a0<a href=\"https:\/\/wwws.sagepub.com\/doi\/pdf\/10.5772\/62244\">https:\/\/wwws.sagepub.com\/doi\/pdf\/10.5772\/62244<\/a> <a href=\"http:\/\/scholar.google.com\/scholar_lookup?title=Introducing+a+pictographic+language+for+envisioning+a+rich+variety+of+enactive+systems+with+different+degrees+of+complexity&amp;author=R.+K.+Moore&amp;publication_year=2016a\" target=\"_blank\" rel=\"noopener noreferrer\">Google Scholar<\/a> Moore, R. K. (2016b). \u201cIs spoken language all-or-nothing? 
Implications for future speech-based human-machine interaction,\u201d in\u00a0Dialogues with Social Robots \u2013 Enablements, Analyses, and Evaluation, eds K. Jokinen and G. Wilcock (Springer Lecture Notes in Electrical Engineering (LNEE)), 281\u2013291. Available at:\u00a0<a href=\"http:\/\/arxiv.org\/abs\/1607.05174\">http:\/\/arxiv.org\/abs\/1607.05174<\/a> <a href=\"http:\/\/scholar.google.com\/scholar_lookup?title=Is+spoken+language+all-or-nothing?+Implications+for+future+speech-based+human-machine+interaction&amp;author=R.+K.+Moore&amp;publication_year=2016b&amp;pages=281%E2%80%93291\" target=\"_blank\" rel=\"noopener noreferrer\">Google Scholar<\/a> Moore, R. K., Li, H., and Liao, S.-H. (2016). \u201cProgress and prospects for spoken language technology: what ordinary people think,\u201d in\u00a0INTERSPEECH\u00a0(San Francisco, CA), 3007\u20133011. <a href=\"http:\/\/scholar.google.com\/scholar_lookup?title=Progress+and+prospects+for+spoken+language+technology:+what+ordinary+people+think&amp;author=R.+K.+Moore&amp;author=H.+Li&amp;author=S.+H.+Liao&amp;publication_year=2016\" target=\"_blank\" rel=\"noopener noreferrer\">Google Scholar<\/a> Moore, R. K., and Nicolao, M. (2011). \u201cReactive speech synthesis: actively managing phonetic contrast along an H&amp;H continuum,\u201d in\u00a017th International Congress of Phonetics Sciences (ICPhS)\u00a0(Hong Kong), 1422\u20131425. <a href=\"http:\/\/scholar.google.com\/scholar_lookup?title=Reactive+speech+synthesis:+actively+managing+phonetic+contrast+along+an+H&amp;H+continuum&amp;author=R.+K.+Moore&amp;author=M.+Nicolao&amp;conference=17th+International+Congress+of+Phonetics+Sciences+(ICPhS)&amp;publication_year=2011&amp;pages=1422%E2%80%931425\" target=\"_blank\" rel=\"noopener noreferrer\">Google Scholar<\/a> Mori, M. (1970). Bukimi no tani (the uncanny valley).\u00a0Energy\u00a07, 33\u201335. 
<a href=\"http:\/\/scholar.google.com\/scholar_lookup?title=Bukimi+no+tani+(the+uncanny+valley)&amp;author=M.+Mori&amp;journal=Energy&amp;publication_year=1970&amp;volume=7&amp;pages=33%E2%80%9335\" target=\"_blank\" rel=\"noopener noreferrer\">Google Scholar<\/a> Nicolao, M., Latorre, J., and Moore, R. K. (2012). \u201cC2H: a computational model of H&amp;H-based phonetic contrast in synthetic speech,\u201d in\u00a0INTERSPEECH\u00a0(Portland, USA). <a href=\"http:\/\/scholar.google.com\/scholar_lookup?title=C2H:+a+computational+model+of+H&amp;H-based+phonetic+contrast+in+synthetic+speech&amp;author=M.+Nicolao&amp;author=J.+Latorre&amp;author=R.+K.+Moore&amp;publication_year=2012\" target=\"_blank\" rel=\"noopener noreferrer\">Google Scholar<\/a> Oudeyer, P.-Y., and Kaplan, F. (2007). What is intrinsic motivation? A typology of computational approaches.\u00a0Front. Neurorobot.\u00a01:6. doi:10.3389\/neuro.12.006.2007 <a href=\"http:\/\/www.ncbi.nlm.nih.gov\/sites\/entrez?Db=pubmed&amp;Cmd=ShowDetailView&amp;TermToSearch=18958277\" target=\"_blank\" rel=\"noopener noreferrer\">PubMed Abstract<\/a>\u00a0|\u00a0<a href=\"https:\/\/doi.org\/10.3389\/neuro.12.006.2007\" target=\"_blank\" rel=\"noopener noreferrer\">CrossRef Full Text<\/a>\u00a0|\u00a0<a href=\"http:\/\/scholar.google.com\/scholar_lookup?title=What+is+intrinsic+motivation?+A+typology+of+computational+approaches&amp;author=P.+Y.+Oudeyer&amp;author=F.+Kaplan&amp;journal=Front.+Neurorobot.&amp;publication_year=2007&amp;volume=1&amp;pages=6&amp;doi=10.3389\/neuro.12.006.2007&amp;pmid=18958277\" target=\"_blank\" rel=\"noopener noreferrer\">Google Scholar<\/a> Pfeifer, R., and Verschure, P. (1992). \u201cDistributed adaptive control: a paradigm for designing autonomous agents,\u201d in\u00a0First European Conference on Artificial Life, eds F. J. Varela and P. Bourgine (Cambridge, MA), 21\u201330. 
<a href=\"http:\/\/scholar.google.com\/scholar_lookup?title=Distributed+adaptive+control:+a+paradigm+for+designing+autonomous+agents&amp;author=R.+Pfeifer&amp;author=P.+Verschure&amp;conference=First+European+Conference+on+Artificial+Life&amp;publication_year=1992&amp;pages=21%E2%80%9330\" target=\"_blank\" rel=\"noopener noreferrer\">Google Scholar<\/a> Phillips, M. (2006). \u201cApplications of spoken language technology and systems,\u201d in\u00a0IEEE\/ACL Workshop on Spoken Language Technology (SLT), eds M. Gilbert and H. Ney (Aruba: IEEE). <a href=\"http:\/\/scholar.google.com\/scholar_lookup?title=Applications+of+spoken+language+technology+and+systems&amp;author=M.+Phillips&amp;conference=IEEE\/ACL+Workshop+on+Spoken+Language+Technology+(SLT)&amp;publication_year=2006\" target=\"_blank\" rel=\"noopener noreferrer\">Google Scholar<\/a> Pickering, M. J., and Garrod, S. (2007). Do people use language production to make predictions during comprehension?\u00a0Trends Cogn. Sci.\u00a011, 105\u2013110. doi:10.1016\/j.tics.2006.12.002 <a href=\"http:\/\/www.ncbi.nlm.nih.gov\/sites\/entrez?Db=pubmed&amp;Cmd=ShowDetailView&amp;TermToSearch=17254833\" target=\"_blank\" rel=\"noopener noreferrer\">PubMed Abstract<\/a>\u00a0|\u00a0<a href=\"https:\/\/doi.org\/10.1016\/j.tics.2006.12.002\" target=\"_blank\" rel=\"noopener noreferrer\">CrossRef Full Text<\/a>\u00a0|\u00a0<a href=\"http:\/\/scholar.google.com\/scholar_lookup?title=Do+people+use+language+production+to+make+predictions+during+comprehension?&amp;author=M.+J.+Pickering&amp;author=S.+Garrod&amp;journal=Trends+Cogn.+Sci.&amp;publication_year=2007&amp;volume=11&amp;pages=105%E2%80%93110&amp;doi=10.1016\/j.tics.2006.12.002&amp;pmid=17254833\" target=\"_blank\" rel=\"noopener noreferrer\">Google Scholar<\/a> Pickering, M. J., and Garrod, S. (2013). Forward models and their implications for production, comprehension, and dialogue.\u00a0Behav. Brain Sci.\u00a036, 377\u2013392. 
doi:10.1017\/S0140525X12003238 <a href=\"https:\/\/doi.org\/10.1017\/S0140525X12003238\" target=\"_blank\" rel=\"noopener noreferrer\">CrossRef Full Text<\/a>\u00a0|\u00a0<a href=\"http:\/\/scholar.google.com\/scholar_lookup?title=Forward+models+and+their+implications+for+production,+comprehension,+and+dialogue&amp;author=M.+J.+Pickering&amp;author=S.+Garrod&amp;journal=Behav.+Brain+Sci.&amp;publication_year=2013&amp;volume=36&amp;pages=377%E2%80%93392&amp;doi=10.1017\/S0140525X12003238\" target=\"_blank\" rel=\"noopener noreferrer\">Google Scholar<\/a> Pieraccini, R. (2012).\u00a0The Voice in the Machine. Cambridge, MA: MIT Press. <a href=\"http:\/\/scholar.google.com\/scholar_lookup?title=The+Voice+in+the+Machine&amp;author=R.+Pieraccini&amp;publication_year=2012\" target=\"_blank\" rel=\"noopener noreferrer\">Google Scholar<\/a> Powers, W. T. (1973).\u00a0Behavior: The Control of Perception. Hawthorne, NY: Aldine. <a href=\"http:\/\/scholar.google.com\/scholar_lookup?title=Behavior:+The+Control+of+Perception&amp;author=W.+T.+Powers&amp;publication_year=1973\" target=\"_blank\" rel=\"noopener noreferrer\">Google Scholar<\/a> Powers, W. T., Clark, R. K., and McFarland, R. L. (1960). A general feedback theory of human behavior: part II.\u00a0Percept. Mot. Skills\u00a011, 71\u201388. doi:10.2466\/pms.1960.11.3.309 <a href=\"https:\/\/doi.org\/10.2466\/pms.1960.11.3.309\" target=\"_blank\" rel=\"noopener noreferrer\">CrossRef Full Text<\/a>\u00a0|\u00a0<a href=\"http:\/\/scholar.google.com\/scholar_lookup?title=A+general+feedback+theory+of+human+behavior:+part+II&amp;author=W.+T.+Powers&amp;author=R.+K.+Clark&amp;author=R.+L.+McFarland&amp;journal=Percept.+Mot.+Skills&amp;publication_year=1960&amp;volume=11&amp;pages=71%E2%80%9388&amp;doi=10.2466\/pms.1960.11.3.309\" target=\"_blank\" rel=\"noopener noreferrer\">Google Scholar<\/a> Rao, A., and Georgoff, M. (1995).\u00a0BDI Agents: from Theory to Practice. Melbourne: Australian Artificial Intelligence Institute. 
Technical report. <a href=\"http:\/\/scholar.google.com\/scholar_lookup?title=BDI+Agents:+from+Theory+to+Practice&amp;author=A.+Rao&amp;author=M.+Georgoff&amp;publication_year=1995\" target=\"_blank\" rel=\"noopener noreferrer\">Google Scholar<\/a> Saon, G., Kurata, G., Sercu, T., Audhkhasi, K., Thomas, S., Dimitriadis, D., et al. (2017).\u00a0English Conversational Telephone Speech Recognition by Humans and Machines. Available at:\u00a0<a href=\"https:\/\/arxiv.org\/abs\/1703.02136\">https:\/\/arxiv.org\/abs\/1703.02136<\/a> <a href=\"http:\/\/scholar.google.com\/scholar_lookup?title=English+Conversational+Telephone+Speech+Recognition+by+Humans+and+Machines&amp;author=G.+Saon&amp;author=G.+Kurata&amp;author=T.+Sercu&amp;author=K.+Audhkhasi&amp;author=S.+Thomas&amp;author=D.+Dimitriadis&amp;publication_year=2017\" target=\"_blank\" rel=\"noopener noreferrer\">Google Scholar<\/a> Scott-Phillips, T. (2015).\u00a0Speaking Our Minds: Why Human Communication is Different, and How Language Evolved to Make It Special. London, New York: Palgrave MacMillan. <a href=\"http:\/\/scholar.google.com\/scholar_lookup?title=Speaking+Our+Minds:+Why+Human+Communication+is+Different,+and+How+Language+Evolved+to+Make+It+Special&amp;author=T.+Scott-Phillips&amp;publication_year=2015\" target=\"_blank\" rel=\"noopener noreferrer\">Google Scholar<\/a> Sebanz, N., Bekkering, H., and Knoblich, G. (2006). Joint action: bodies and minds moving together.\u00a0Trends Cogn. Sci.\u00a010, 70\u201376. 
doi:10.1016\/j.tics.2005.12.009 <a href=\"http:\/\/www.ncbi.nlm.nih.gov\/sites\/entrez?Db=pubmed&amp;Cmd=ShowDetailView&amp;TermToSearch=16406326\" target=\"_blank\" rel=\"noopener noreferrer\">PubMed Abstract<\/a>\u00a0|\u00a0<a href=\"https:\/\/doi.org\/10.1016\/j.tics.2005.12.009\" target=\"_blank\" rel=\"noopener noreferrer\">CrossRef Full Text<\/a>\u00a0|\u00a0<a href=\"http:\/\/scholar.google.com\/scholar_lookup?title=Joint+action:+bodies+and+minds+moving+together&amp;author=N.+Sebanz&amp;author=H.+Bekkering&amp;author=G.+Knoblich&amp;journal=Trends+Cogn.+Sci.&amp;publication_year=2006&amp;volume=10&amp;pages=70%E2%80%9376&amp;doi=10.1016\/j.tics.2005.12.009&amp;pmid=16406326\" target=\"_blank\" rel=\"noopener noreferrer\">Google Scholar<\/a> Skipper, J. I. (2014). Echoes of the spoken past: how auditory cortex hears context during speech perception.\u00a0Phil. Trans. R. Soc. B\u00a0369, 20130297. doi:10.1098\/rstb.2013.0297 <a href=\"http:\/\/www.ncbi.nlm.nih.gov\/sites\/entrez?Db=pubmed&amp;Cmd=ShowDetailView&amp;TermToSearch=25092665\" target=\"_blank\" rel=\"noopener noreferrer\">PubMed Abstract<\/a>\u00a0|\u00a0<a href=\"https:\/\/doi.org\/10.1098\/rstb.2013.0297\" target=\"_blank\" rel=\"noopener noreferrer\">CrossRef Full Text<\/a>\u00a0|\u00a0<a href=\"http:\/\/scholar.google.com\/scholar_lookup?title=Echoes+of+the+spoken+past:+how+auditory+cortex+hears+context+during+speech+perception&amp;author=J.+I.+Skipper&amp;journal=Phil.+Trans.+R.+Soc.+B&amp;publication_year=2014&amp;volume=369&amp;pages=20130297&amp;doi=10.1098\/rstb.2013.0297&amp;pmid=25092665\" target=\"_blank\" rel=\"noopener noreferrer\">Google Scholar<\/a> Sutton, R. S., and Barto, A. G. (1998).\u00a0Reinforcement Learning: An Introduction. Cambridge, MA: The MIT Press. 
<a href=\"http:\/\/scholar.google.com\/scholar_lookup?title=Reinforcement+Learning:+An+Introduction&amp;author=R.+S.+Sutton&amp;author=A.+G.+Barto&amp;publication_year=1998\" target=\"_blank\" rel=\"noopener noreferrer\">Google Scholar<\/a> Tang, Y., Cooke, M., and Valentini-Botinhao, C. (2016). Evaluating the predictions of objective intelligibility metrics for modified and synthetic speech.\u00a0Comput. Speech Lang.\u00a035, 73\u201392. doi:10.1016\/j.csl.2015.06.002 <a href=\"https:\/\/doi.org\/10.1016\/j.csl.2015.06.002\" target=\"_blank\" rel=\"noopener noreferrer\">CrossRef Full Text<\/a>\u00a0|\u00a0<a href=\"http:\/\/scholar.google.com\/scholar_lookup?title=Evaluating+the+predictions+of+objective+intelligibility+metrics+for+modified+and+synthetic+speech&amp;author=Y.+Tang&amp;author=M.+Cooke&amp;author=C.+Valentini-Botinhao&amp;journal=Comput.+Speech+Lang.&amp;publication_year=2016&amp;volume=35&amp;pages=73%E2%80%9392&amp;doi=10.1016\/j.csl.2015.06.002\" target=\"_blank\" rel=\"noopener noreferrer\">Google Scholar<\/a> Tokuda, K., Masuko, T., Yamada, T., Kobayashi, T., and Imai, S. (1995). \u201cAn algorithm for speech parameter generation from continuous mixture HMMs with dynamic features,\u201d in\u00a0EUROSPEECH 1995\u00a0(Madrid, Spain), 757\u2013760. <a href=\"http:\/\/scholar.google.com\/scholar_lookup?title=An+algorithm+for+speech+parameter+generation+from+continuous+mixture+HMMs+with+dynamic+features&amp;author=K.+Tokuda&amp;author=T.+Masuko&amp;author=T.+Yamada&amp;author=T.+Kobayashi&amp;author=S.+Imai&amp;publication_year=1995\" target=\"_blank\" rel=\"noopener noreferrer\">Google Scholar<\/a> Tokuda, K., Nankaku, Y., Toda, T., Zen, H., Yamagishi, J., and Oura, K. (2013). Speech synthesis based on hidden Markov models.\u00a0Proc. IEEE\u00a0101, 1234\u20131252. 
doi:10.1109\/JPROC.2013.2251852 <a href=\"https:\/\/doi.org\/10.1109\/JPROC.2013.2251852\" target=\"_blank\" rel=\"noopener noreferrer\">CrossRef Full Text<\/a>\u00a0|\u00a0<a href=\"http:\/\/scholar.google.com\/scholar_lookup?title=Speech+synthesis+based+on+hidden+Markov+models&amp;author=K.+Tokuda&amp;author=Y.+Nankaku&amp;author=T.+Toda&amp;author=H.+Zen&amp;author=J.+Yamagishi&amp;author=K.+Oura&amp;journal=Proc.+IEEE&amp;publication_year=2013&amp;volume=101&amp;pages=1234%E2%80%931252&amp;doi=10.1109\/JPROC.2013.2251852\" target=\"_blank\" rel=\"noopener noreferrer\">Google Scholar<\/a> Tokuda, K., Zen, H., Yamagishi, J., Masuko, T., Sako, S., Black, A. W., et al. (2007). \u201cThe HMM-based speech synthesis system (HTS),\u201d in\u00a06th ISCA Workshop on Speech Synthesis, Bonn. <a href=\"http:\/\/scholar.google.com\/scholar_lookup?title=The+HMM-based+speech+synthesis+system+(HTS)&amp;author=K.+Tokuda&amp;author=H.+Zen&amp;author=J.+Yamagishi&amp;author=T.+Masuko&amp;author=S.+Sako&amp;author=A.+W.+Black&amp;publication_year=2007\" target=\"_blank\" rel=\"noopener noreferrer\">Google Scholar<\/a> van Bergem, D. R. (1995). Perceptual and acoustic aspects of lexical vowel reduction, a sound change in progress.\u00a0Speech Commun.\u00a016, 329\u2013358. doi:10.1016\/0167-6393(95)00003-7 <a href=\"https:\/\/doi.org\/10.1016\/0167-6393(95)00003-7\" target=\"_blank\" rel=\"noopener noreferrer\">CrossRef Full Text<\/a>\u00a0|\u00a0<a href=\"http:\/\/scholar.google.com\/scholar_lookup?title=Perceptual+and+acoustic+aspects+of+lexical+vowel+reduction,+a+sound+change+in+progress&amp;author=D.+R.+van+Bergem&amp;journal=Speech+Commun.&amp;publication_year=1995&amp;volume=16&amp;pages=329%E2%80%93358&amp;doi=10.1016\/0167-6393(95)00003-7\" target=\"_blank\" rel=\"noopener noreferrer\">Google Scholar<\/a> van Son, R. J. J. H., and Pols, L. C. W. (1999). An acoustic description of consonant reduction.\u00a0Speech Commun.\u00a028, 125\u2013140. 
doi:10.1016\/S0167-6393(99)00009-6 <a href=\"https:\/\/doi.org\/10.1016\/S0167-6393(99)00009-6\" target=\"_blank\" rel=\"noopener noreferrer\">CrossRef Full Text<\/a>\u00a0|\u00a0<a href=\"http:\/\/scholar.google.com\/scholar_lookup?title=An+acoustic+description+of+consonant+reduction&amp;author=R.+J.+J.+H.+van+Son&amp;author=L.+C.+W.+Pols&amp;journal=Speech+Commun.&amp;publication_year=1999&amp;volume=28&amp;pages=125%E2%80%93140&amp;doi=10.1016\/S0167-6393(99)00009-6\" target=\"_blank\" rel=\"noopener noreferrer\">Google Scholar<\/a> Verschure, P. F. M. J. (2012). Distributed adaptive control: a theory of the mind, brain, body nexus.\u00a0Biol. Inspired Cognit. Archit.\u00a01, 55\u201372. doi:10.1016\/j.bica.2012.04.005 <a href=\"https:\/\/doi.org\/10.1016\/j.bica.2012.04.005\" target=\"_blank\" rel=\"noopener noreferrer\">CrossRef Full Text<\/a>\u00a0|\u00a0<a href=\"http:\/\/scholar.google.com\/scholar_lookup?title=Distributed+adaptive+control:+a+theory+of+the+mind,+brain,+body+nexus&amp;author=P.+F.+M.+J.+Verschure&amp;journal=Biol.+Inspired+Cognit.+Archit.&amp;publication_year=2012&amp;volume=1&amp;pages=55%E2%80%9372&amp;doi=10.1016\/j.bica.2012.04.005\" target=\"_blank\" rel=\"noopener noreferrer\">Google Scholar<\/a> W3C-SIF. (2000).\u00a0Introduction and Overview of W3C Speech Interface Framework. Available at:\u00a0<a href=\"http:\/\/www.w3.org\/TR\/voice-intro\/\">http:\/\/www.w3.org\/TR\/voice-intro\/<\/a> <a href=\"http:\/\/scholar.google.com\/scholar_lookup?title=Introduction+and+Overview+of+W3C+Speech+Interface+Framework&amp;author=W3C-SIF&amp;publication_year=2000\" target=\"_blank\" rel=\"noopener noreferrer\">Google Scholar<\/a> Wilson, M., and Knoblich, G. (2005). The case for motor involvement in perceiving conspecifics.\u00a0Psychol. Bull.\u00a0131, 460\u2013473. 
doi:10.1037\/0033-2909.131.3.460 <a href=\"http:\/\/www.ncbi.nlm.nih.gov\/sites\/entrez?Db=pubmed&amp;Cmd=ShowDetailView&amp;TermToSearch=15869341\" target=\"_blank\" rel=\"noopener noreferrer\">PubMed Abstract<\/a>\u00a0|\u00a0<a href=\"https:\/\/doi.org\/10.1037\/0033-2909.131.3.460\" target=\"_blank\" rel=\"noopener noreferrer\">CrossRef Full Text<\/a>\u00a0|\u00a0<a href=\"http:\/\/scholar.google.com\/scholar_lookup?title=The+case+for+motor+involvement+in+perceiving+conspecifics&amp;author=M.+Wilson&amp;author=G.+Knoblich&amp;journal=Psychol.+Bull.&amp;publication_year=2005&amp;volume=131&amp;pages=460%E2%80%93473&amp;doi=10.1037\/0033-2909.131.3.460&amp;pmid=15869341\" target=\"_blank\" rel=\"noopener noreferrer\">Google Scholar<\/a> Wooldridge, M. (2000).\u00a0Reasoning About Rational Agents. Cambridge, MA: The MIT Press. <a href=\"http:\/\/scholar.google.com\/scholar_lookup?title=Reasoning+About+Rational+Agents&amp;author=M.+Wooldridge&amp;publication_year=2000\" target=\"_blank\" rel=\"noopener noreferrer\">Google Scholar<\/a> Xiong, W., Droppo, J., Huang, X., Seide, F., Seltzer, M., Stolcke, A., et al. (2016).\u00a0Achieving Human Parity in Conversational Speech Recognition. Available at:\u00a0<a href=\"https:\/\/arxiv.org\/abs\/1610.05256\">https:\/\/arxiv.org\/abs\/1610.05256<\/a> <a href=\"http:\/\/scholar.google.com\/scholar_lookup?title=Achieving+Human+Parity+in+Conversational+Speech+Recognition&amp;author=W.+Xiong&amp;author=J.+Droppo&amp;author=X.+Huang&amp;author=F.+Seide&amp;author=M.+Seltzer&amp;author=A.+Stolcke&amp;publication_year=2016\" target=\"_blank\" rel=\"noopener noreferrer\">Google Scholar<\/a> Keywords:\u00a0communicative agents, spoken language processing, hierarchical control, intentional speech synthesis, autonomous social agents, mismatched priors Citation:\u00a0Moore RK and Nicolao M (2017) Toward a Needs-Based Architecture for \u2018Intelligent\u2019 Communicative Agents: Speaking with Intention.\u00a0Front. Robot. AI\u00a04:66. 
doi: 10.3389\/frobt.2017.00066 Received:\u00a030 April 2017;\u00a0Accepted:\u00a021 November 2017; Published:\u00a020 December 2017 Edited by: <a href=\"http:\/\/www.frontiersin.org\/people\/u\/8886\">Serge Thill<\/a>, Plymouth University, United Kingdom Reviewed by: <a href=\"http:\/\/www.frontiersin.org\/people\/u\/9158\">Tony Belpaeme<\/a>, Plymouth University, United Kingdom <a href=\"http:\/\/www.frontiersin.org\/people\/u\/27968\">Paul Baxter<\/a>, University of Lincoln, United Kingdom Copyright:\u00a0\u00a9 2017 Moore and Nicolao. This is an open-access article distributed under the terms of the\u00a0<a href=\"http:\/\/creativecommons.org\/licenses\/by\/4.0\/\" target=\"_blank\" rel=\"license noopener noreferrer\">Creative Commons Attribution License (CC BY)<\/a>. The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. *Correspondence:\u00a0Roger K. 
Moore,\u00a0<a href=\"mailto:r.k.moore@sheffield.ac.uk\">r.k.moore@sheffield.ac.uk<\/a><\/p>\n","protected":false}}
lyon","isPartOf":{"@id":"https:\/\/blog.sido-lyon.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/blog.sido-lyon.com\/en\/2018\/10\/30\/needs-based-architecture-for-intelligent-agents\/#primaryimage"},"image":{"@id":"https:\/\/blog.sido-lyon.com\/en\/2018\/10\/30\/needs-based-architecture-for-intelligent-agents\/#primaryimage"},"thumbnailUrl":"https:\/\/blog.sido-lyon.com\/app\/uploads\/2026\/03\/COMMUNICATIVEAGENTS.jpg","datePublished":"2018-10-30T10:17:08+00:00","author":{"@id":"https:\/\/blog.sido-lyon.com\/#\/schema\/person\/32a1bfd9b5a826079ba4f86a57e4c710"},"breadcrumb":{"@id":"https:\/\/blog.sido-lyon.com\/en\/2018\/10\/30\/needs-based-architecture-for-intelligent-agents\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/blog.sido-lyon.com\/en\/2018\/10\/30\/needs-based-architecture-for-intelligent-agents\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/blog.sido-lyon.com\/en\/2018\/10\/30\/needs-based-architecture-for-intelligent-agents\/#primaryimage","url":"https:\/\/blog.sido-lyon.com\/app\/uploads\/2026\/03\/COMMUNICATIVEAGENTS.jpg","contentUrl":"https:\/\/blog.sido-lyon.com\/app\/uploads\/2026\/03\/COMMUNICATIVEAGENTS.jpg","width":1440,"height":550},{"@type":"BreadcrumbList","@id":"https:\/\/blog.sido-lyon.com\/en\/2018\/10\/30\/needs-based-architecture-for-intelligent-agents\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/blog.sido-lyon.com\/en\/"},{"@type":"ListItem","position":2,"name":"Needs-Based Architecture for Intelligent Agents"}]},{"@type":"WebSite","@id":"https:\/\/blog.sido-lyon.com\/#website","url":"https:\/\/blog.sido-lyon.com\/","name":"Blog sido lyon","description":"Acc\u00e9l\u00e9rateur de transformation num\u00e9rique Industrie &amp; Services. 
De la donn\u00e9e terrain \u00e0 la performance d\u00e9ployable","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/blog.sido-lyon.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/blog.sido-lyon.com\/#\/schema\/person\/32a1bfd9b5a826079ba4f86a57e4c710","name":"serviceweb","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/7039866eaefe045ad09c69f377af732e90b75e8223b3251e531ea69b54adfb74?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/7039866eaefe045ad09c69f377af732e90b75e8223b3251e531ea69b54adfb74?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/7039866eaefe045ad09c69f377af732e90b75e8223b3251e531ea69b54adfb74?s=96&d=mm&r=g","caption":"serviceweb"},"sameAs":["https:\/\/blog.sido-lyon.com"]}]}},"_links":{"self":[{"href":"https:\/\/blog.sido-lyon.com\/en\/wp-json\/wp\/v2\/posts\/544","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.sido-lyon.com\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.sido-lyon.com\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.sido-lyon.com\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.sido-lyon.com\/en\/wp-json\/wp\/v2\/comments?post=544"}],"version-history":[{"count":0,"href":"https:\/\/blog.sido-lyon.com\/en\/wp-json\/wp\/v2\/posts\/544\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.sido-lyon.com\/en\/wp-json\/wp\/v2\/media\/463"}],"wp:attachment":[{"href":"https:\/\/blog.sido-lyon.com\/en\/wp-json\/wp\/v2\/media?parent=544"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.sido-lyon.com\/en\/wp-json\/wp\/v2\/categories?post=544"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.sid
o-lyon.com\/en\/wp-json\/wp\/v2\/tags?post=544"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}